Exploring the Token Count Optimization Feature in Repomix

This week, I had to extend my repo-contextr project with some additional features. The catch this time was that we didn't have a feature requirement beforehand; our professor simply gave us a link to a CLI tool called Repomix.

Repomix is a command-line tool that helps developers analyze and visualize their codebase for AI processing. It measures metrics like token usage, file composition, and repository structure, allowing users to optimize how their code is represented when interacting with large language models (LLMs).

While going through the user guide, I got interested in the Token Count Optimization feature. It caught my attention because my project already had a token-counting feature, though it was a rough estimate that treated each word as a single token. Working on this project taught me that LLM tokenization doesn't work that way: tokenizers split text into sub-word units, so a single identifier like `calculateMetrics` can easily span several tokens.

To elaborate on the Token Count Optimization feature: it's used to understand how much of your codebase would "cost" in terms of LLM context tokens when it's processed alongside your query. Running

```
repomix --token-count-tree
```

produces a hierarchical visualization showing token counts across your project structure. You can also apply a threshold, such as

```
repomix --token-count-tree 1000
```

to focus on larger files. This helps identify token-heavy files, optimize file selection patterns, and plan compression strategies when preparing code for AI analysis.

Diving into the Implementation

First, I used GitHub's code search to look for "token count tree", which led me to the configuration schema and the main orchestration in `calculateMetrics.ts`. GitHub's search gave me an overview of the file structure, implementation details, and references to where the feature was used in other files.

However, I quickly realized that the feature was quite complex, and navigating through multiple parts of the program in the browser-based GitHub interface was becoming difficult. That’s when I decided to set up the project locally in my code editor.

After setting it up, I ran

```
git grep "tokenCountTree"
```

from the terminal, which showed me everywhere this configuration option appeared. This revealed the data flow: CLI parsing → configuration → metrics calculation → output formatting.
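To make that flow concrete, here is a deliberately simplified sketch of the pipeline as I understood it. Everything below is hypothetical except the `--token-count-tree` flag itself; the real Repomix modules are more involved and named differently:

```ts
// Hypothetical, simplified model of the flow: CLI parsing → configuration
// → metrics calculation → output formatting. Real Repomix code differs.
type Config = { tokenCountTree: boolean | number };

function parseCli(argv: string[]): Config {
  const i = argv.indexOf('--token-count-tree');
  if (i === -1) return { tokenCountTree: false };
  const threshold = Number(argv[i + 1]);
  // The bare flag enables the tree; a trailing number sets a threshold.
  return { tokenCountTree: Number.isFinite(threshold) ? threshold : true };
}

function calculateMetrics(
  tokenCounts: Record<string, number>,
  config: Config,
): Record<string, number> {
  if (config.tokenCountTree === false) return {};
  const min = typeof config.tokenCountTree === 'number' ? config.tokenCountTree : 0;
  // Keep only files at or above the threshold (cf. `--token-count-tree 1000`).
  return Object.fromEntries(
    Object.entries(tokenCounts).filter(([, count]) => count >= min),
  );
}

function formatOutput(tree: Record<string, number>): string {
  return Object.entries(tree)
    .map(([path, count]) => `${path}: ${count} tokens`)
    .join('\n');
}

const config = parseCli(['--token-count-tree', '1000']);
console.log(formatOutput(calculateMetrics({ 'src/index.ts': 4200, 'README.md': 300 }, config)));
```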

For each major component, I opened the file in VS Code and used "Go to Definition" on every import and function call. This helped me build a mental map of how the different modules connect. Whenever I encountered an unfamiliar pattern, I asked GitHub Copilot Chat to clarify it instead of getting stuck, which made the learning process much smoother; jumping from one concept definition to another on the internet can get quite overwhelming.

Understanding the Architecture

To give you an overview, Repomix uses a vertical slice architecture, where each type of metric calculation has its own self-contained module while sharing common infrastructure.
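To illustrate what a slice might look like (the names here are mine, not Repomix's), you can picture each metric as a small self-contained calculator behind one thin shared interface:

```ts
// Hypothetical sketch of a vertical slice: every metric owns its whole
// calculation and only shares a thin common interface.
interface MetricCalculator {
  name: string;
  calculate(files: Map<string, string>): number;
}

const fileCountMetric: MetricCalculator = {
  name: 'fileCount',
  calculate: (files) => files.size,
};

const charCountMetric: MetricCalculator = {
  name: 'charCount',
  calculate: (files) =>
    [...files.values()].reduce((sum, content) => sum + content.length, 0),
};

// Shared infrastructure: iterate the slices and collect their results.
function runMetrics(files: Map<string, string>, metrics: MetricCalculator[]) {
  return Object.fromEntries(metrics.map((m) => [m.name, m.calculate(files)]));
}

const files = new Map([['src/index.ts', 'console.log("hi");']]);
console.log(runMetrics(files, [fileCountMetric, charCountMetric]));
```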

However, token calculation is handled differently: it relies on the tiktoken library from OpenAI. You can see its implementation in the Repomix source.
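The core idea is small enough to sketch. Assuming the `tiktoken` npm package, counting tokens the way an LLM actually sees them looks roughly like this; Repomix wraps the same idea in its own metrics pipeline:

```ts
import { get_encoding } from 'tiktoken';

// Minimal sketch: count tokens with a real BPE encoding instead of words.
export function countTokens(text: string): number {
  const encoding = get_encoding('cl100k_base'); // encoding used by GPT-4-era models
  try {
    return encoding.encode(text).length;
  } finally {
    encoding.free(); // the WASM-backed encoder must be freed explicitly
  }
}
```

Even a short, identifier-heavy line usually encodes to more tokens than its word count would suggest, which is exactly the gap my word-based estimate misses.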

In my repo-contextr project, I implemented a similar feature that calculates the total token count of the entire codebase — although in my case, it’s equivalent to the total number of words.

Exploring Parallelism and Task Execution

One thing I’m still figuring out is how workers run tasks in parallel and the role of the TaskRunner abstraction.

After tracing through `processConcurrency.ts`, I discovered that Repomix uses Tinypool to manage a pool of reusable worker threads.

The TaskRunner wraps Tinypool with a simple `run(task)` API: when you call it, Tinypool queues the task and assigns it to an available worker.

The clever part is `TASKS_PER_THREAD = 100`: the pool is sized from the workload rather than from the hardware alone. With 500 files on 8 cores, 500 / 100 = 5, so it creates only 5 workers instead of 8, avoiding unnecessary thread startup overhead.
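Here is a minimal sketch of that sizing logic, using Tinypool's public API (`new Tinypool({ filename, minThreads, maxThreads })` and `pool.run()`). The constant and the arithmetic mirror what I described above, but the surrounding code is my own reconstruction, not Repomix's:

```ts
import os from 'node:os';
import Tinypool from 'tinypool';

const TASKS_PER_THREAD = 100;

// Size the pool from the workload: with 500 files on an 8-core machine,
// ceil(500 / 100) = 5 workers are spawned instead of 8.
export function createTaskRunner(taskCount: number): Tinypool {
  const cpuCount = os.availableParallelism();
  const threads = Math.max(1, Math.min(cpuCount, Math.ceil(taskCount / TASKS_PER_THREAD)));
  return new Tinypool({
    filename: new URL('./tokenCountWorker.js', import.meta.url).href, // hypothetical worker module
    minThreads: threads,
    maxThreads: threads,
  });
}

// Each run() call is queued by Tinypool and handed to the next free worker:
// const pool = createTaskRunner(files.length);
// const counts = await Promise.all(files.map((file) => pool.run(file)));
```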

What still confuses me is Tinypool's internal scheduling: when `Promise.all()` submits 500 tasks simultaneously, how does it decide which worker gets which task? I also don't fully understand when to use `runtime: 'worker_threads'` versus `runtime: 'child_process'`.

My Plan for Repo-Contextr

I liked the idea of reporting a separate token count for each file, so I plan to implement this feature in my repo-contextr project as well.

However, I won’t be implementing actual tokenization in the context of LLMs or using parallelism at this stage.
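That said, a per-file breakdown is easy to sketch. This is written in TypeScript for consistency with the examples above (repo-contextr's actual language may differ), and it keeps my word-count approximation and a plain sequential loop:

```ts
import { readFileSync } from 'node:fs';

// My current rough estimate: whitespace-separated "words" as a token proxy.
function estimateTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Planned feature: a separate (approximate) token count per file,
// computed sequentially, with no real tokenizer and no worker pool yet.
export function tokenCountPerFile(paths: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const path of paths) {
    counts.set(path, estimateTokens(readFileSync(path, 'utf8')));
  }
  return counts;
}
```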

Still, exploring Repomix has given me a clearer understanding of how large-scale tools are structured — from CLI parsing to concurrent task orchestration — and it has definitely influenced how I’ll approach the next iteration of my project.
