AI Distiller (aid)
Note: This is the very first version of this tool. We would be very grateful for any feedback in the form of a discussion or by creating an issue on GitHub. Thank you!
MCP Server Available: Install the Model Context Protocol server for AI Distiller from NPM: @janreges/ai-distiller-mcp - seamlessly integrate with Claude, Cursor, and other MCP-compatible AI tools!
Why AI Distiller?
Do you work with large-scale projects that have thousands of files and functions? Do you struggle with AI tools like Claude Code, Gemini, Copilot, or Cursor frequently "hallucinating" and generating code that looks correct at first glance but is actually incompatible with your project?
The problem is context. AI models have a limited context window and cannot take in your entire codebase. Instead, AI agents search files, "grep" for keywords, look at a few lines before and after each match, and try to guess the interface of your classes and functions, often but not always correctly. The result is code that guesses parameters, returns incorrect data types, and ignores the existing architecture. If you are an experienced user of AI agents (a vibe coder), you know you can compensate by instructing the agent to consistently write and run tests, use static code analysis, pre-commit hooks, and so on. The agent will usually fix the code itself, but it may take 20 steps and 5 minutes to get there. On the other hand, if you pay for each AI request (and a large context is expensive) and are not pressed for time, this limited-context approach may be acceptable.
AI Distiller (aid for short) helps solve this problem. Its main function is code "distillation": it extracts from the entire project (ideally the main source folder, or a specific module subdirectory for extremely large projects) only the essential information the AI needs to write code correctly on the first try. The distilled output is usually only 5-20% of the original source code volume, small enough for AI tools to include in their context. As a result, the AI uses the existing code as it was designed, not by trial and error.
Put simply, distillation keeps the public parts of the interface and the input and output data types; by default it discards method implementations and non-public structures. All of this is configurable via CLI options.
Table of Contents
- Why AI Distiller?
- Key Features
- How It Works
- Dependency-Aware Distillation
- Quick Start
- Example Output
- Guides & Examples
- Complete CLI Reference
- Advanced Usage
- Limitations
- Security Considerations
- FAQ
- Contributing
- License
- Acknowledgments
Key Features
| Feature | Description |
|---|---|
| Extreme Speed | Processes tens of megabytes of code in hundreds of milliseconds. By default, it uses 80% of available CPU cores, but can be configured, e.g., with --workers=1 to use only a single CPU core. |
| Intelligent Distillation | Understands 12+ programming languages and extracts only public APIs (methods, properties, types). |
| High Configurability | Allows including private, protected, and internal members, implementation, or comments. |
| AI Prompt Generation | Generates ready-to-use prompts with distilled code for AI analysis. The tool creates files with prompts that AI agents can then execute for security audits, refactoring, etc. See the --ai-action switch. |
| Analysis Automation | Creates a complete checklist and directory structure for AI agents, which can then systematically analyze the entire project. See the flow-for-* actions for the --ai-action switch. |
| Git Analysis | Processes commit history and prepares data for in-depth analysis of development quality and team dynamics. |
| Multi-platform | A single binary file with no dependencies for Windows, Linux, and macOS (x64 & ARM). |
| Integration via MCP | Can be integrated into tools like Claude Code, VS Code, Cursor, Windsurf and others thanks to the included MCP server. |
Intelligent Filtering
Control exactly what to include with the granular flag system:
Visibility Control:
- --public=1 (default) - Include public members
- --protected=0 (default) - Exclude protected members
- --internal=0 (default) - Exclude internal/package-private members
- --private=0 (default) - Exclude private members
Content Control:
- --comments=0 (default) - Exclude comments
- --docstrings=1 (default) - Include documentation
- --implementation=0 (default) - Exclude function/method bodies
- --imports=1 (default) - Include import/use statements
Default behavior: Shows only public API signatures with basic documentation - perfect for AI understanding while maintaining maximum compression.
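To make the default behavior concrete, here is a minimal sketch of the idea using Python's standard ast module. It is not how aid is implemented (aid uses tree-sitter parsers); the helper names distill, signature, and is_public are hypothetical, and treating a leading underscore as "non-public" is an assumption borrowed from Python convention. The sketch keeps public signatures and the first docstring line, and drops bodies and underscore-prefixed members.

```python
import ast

SOURCE = '''
class Cache:
    """In-memory cache."""

    def get(self, key: str) -> bytes | None:
        """Return the cached value, if any."""
        return self._store.get(key)

    def _evict(self) -> None:
        self._store.clear()

def connect(url: str) -> Cache:
    return Cache()
'''

def is_public(name: str) -> bool:
    # Assumption: a leading underscore marks a non-public member.
    return not name.startswith("_")

def doc_line(node, indent):
    # Keep only the first docstring line (roughly what --docstrings=1 suggests).
    doc = ast.get_docstring(node)
    return [f'{indent}"""{doc.splitlines()[0]}"""'] if doc else []

def signature(fn, indent=""):
    # Emit the signature without the body (roughly --implementation=0).
    returns = f" -> {ast.unparse(fn.returns)}" if fn.returns else ""
    head = f"{indent}def {fn.name}({ast.unparse(fn.args)}){returns}"
    return [head, *doc_line(fn, indent + "    ")]

def distill(source):
    out = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and is_public(node.name):
            out += signature(node)
        elif isinstance(node, ast.ClassDef) and is_public(node.name):
            out.append(f"class {node.name}")
            out += doc_line(node, "    ")
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and is_public(item.name):
                    out += signature(item, "    ")
    return "\n".join(out)

print(distill(SOURCE))
```

Running it prints the Cache class with its get signature and the connect signature, while _evict and all function bodies are omitted. The sketch ignores async functions and nested classes, and requires Python 3.9+ for ast.unparse.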
AI-Powered Analysis Prompt Generation
AI Distiller generates specialized prompts combined with distilled code for AI-driven analysis:
- --ai-action=flow-for-deep-file-to-file-analysis - Generates task lists and prompts for systematic file-by-file analysis
- --ai-action=flow-for-multi-file-docs - Creates documentation workflow prompts with code structure
- Output to files - Prompts are saved to the .aid/ directory (or use --stdout for small codebases)
- Ready for AI execution - Generated files contain both the analysis prompt and distilled code
- AI agent instructions - Output includes guidance for AI agents to read and process the generated files
- Gemini advantage - 1M token context window perfect for larger codebase analysis
Note: AI Distiller doesn't perform the analysis itself - it prepares optimized prompts that AI agents (Claude, Gemini, ChatGPT) then execute. Users often need to explicitly ask their AI agent to process the generated file or copy its contents to web-based AI tools.
Multiple Output Formats
- Text (--format text) - Ultra-compact for AI consumption (default)
- Markdown (--format md) - Clean, structured Markdown
- JSON Structured (--format json-structured) - Rich semantic data for tools
- JSONL (--format jsonl) - Streaming format
- XML (--format xml) - Legacy system compatible
Smart Summary Output
After each distillation, AI Distiller displays a summary showing compression efficiency and processing speed:
# Default: Visual progress bar for interactive terminals (green dots = saved, red dots = remaining)
Distilled 970 files [●●●●●●●●●●●●●●●] 98% (10M → 256K) in 231ms ~2.4M tokens saved (~64k remaining)
# Choose your preferred format with --summary-type
aid ./src --summary-type=stock-ticker
AID 97.6% ▲ | SIZE: 10M→256K | TIME: 231ms | EST: ~2.4M tokens saved
# JSON output
aid ./src --summary-type=json
{
"original_bytes": 70020,
"distilled_bytes": 8244,
"savings_pct": 88.22622107969151,
"duration_ms": 6,
"tokens_before": 17505,
"tokens_after": 2061,
"tokens_saved": 15444,
"token_savings_pct": 88.22622107969151,
"file_count": 9,
"output_path": "/home/user/project/.aid/aid.processor.txt",
"tokenizer": "cl100k_base"
}
Available formats:
- visual-progress-bar (default) - Shows compression as a progress bar
- stock-ticker - Compact stock market style display
- speedometer-dashboard - Multi-line dashboard with metrics
- minimalist-sparkline - Single line with all essential info
- ci-friendly - Clean format for CI/CD pipelines
- json - Machine-readable JSON output
- off - Disable summary output
Use --no-emoji to remove emojis from any format.
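The fields of the JSON summary relate to each other by simple arithmetic, which makes the output easy to check in a script. The sketch below is only an illustration: it uses the sample values shown above, and the helpers savings_pct and meets_budget (plus the token budget itself) are hypothetical names, not part of aid.

```python
import json
import math

# Subset of the sample summary printed by: aid ./src --summary-type=json
SAMPLE = """
{
  "original_bytes": 70020,
  "distilled_bytes": 8244,
  "savings_pct": 88.22622107969151,
  "tokens_before": 17505,
  "tokens_after": 2061,
  "tokens_saved": 15444,
  "token_savings_pct": 88.22622107969151
}
"""

def savings_pct(before: int, after: int) -> float:
    # Percentage of the original volume removed by distillation.
    return (before - after) / before * 100

def meets_budget(summary: dict, max_tokens: int) -> bool:
    # Hypothetical gate: does the distilled output fit a token budget?
    return summary["tokens_after"] <= max_tokens

summary = json.loads(SAMPLE)

# tokens_saved is the difference between the two token counts.
assert summary["tokens_saved"] == summary["tokens_before"] - summary["tokens_after"]

# savings_pct matches (original - distilled) / original * 100.
computed = savings_pct(summary["original_bytes"], summary["distilled_bytes"])
assert math.isclose(computed, summary["savings_pct"], rel_tol=1e-9)

print(f"{computed:.1f}% saved, fits 4000-token budget: {meets_budget(summary, 4000)}")
```

A check like meets_budget could be used in a CI step to flag a distilled context that has grown past what a given model can accept; the threshold is yours to choose.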
Smart Project Root Detection
AI Distiller automatically detects your project root and centralizes all outputs in a .aid/ directory:
- Automatic detection: Searches upward for .aidrc, go.mod, package.json, .git, etc.
- Consistent location: All outputs go to <project-root>/.aid/ regardless of where you run aid
- Cache management: MCP cache stored in .aid/cache/ for better organization
- Easy cleanup: Add .aid/ to .gitignore to keep outputs out of version control
Detection priority:
1. .aidrc file - Create this empty file to explicitly mark your project root
2. Language markers - go.mod, package.json, pyproject.toml, etc.
3. Version control - .git directory
4. Environment variable - AID_PROJECT_ROOT (fallback if no markers found)
5. Current directory - Final fallback with warning
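The priority order above can be sketched as an upward search over marker tiers. This is an illustration under stated assumptions, not aid's actual code: the function name find_project_root and the MARKER_TIERS table are hypothetical, the language-marker tier lists only the three files named above, and the sketch assumes each tier is checked across all ancestor directories before the next tier is tried.

```python
import os
from pathlib import Path

# Assumed tiers, highest priority first (only markers named in this README).
MARKER_TIERS = [
    [".aidrc"],
    ["go.mod", "package.json", "pyproject.toml"],
    [".git"],
]

def find_project_root(start: Path) -> Path:
    start = start.resolve()
    for tier in MARKER_TIERS:
        # Walk from the starting directory up to the filesystem root.
        for directory in [start, *start.parents]:
            if any((directory / marker).exists() for marker in tier):
                return directory
    # No marker found: fall back to the environment variable, if set.
    env_root = os.environ.get("AID_PROJECT_ROOT")
    if env_root:
        return Path(env_root)
    # Final fallback: the starting directory (aid also prints a warning here).
    return start

print(find_project_root(Path.cwd()))
```

Under this assumption, an .aidrc file in a parent directory wins over a package.json found closer to the starting directory, which is why creating .aidrc is the most explicit way to pin the root.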
# Mark a specific directory as project root (recommended)
touch /my/project/.aidrc
# Run from anywhere in your project - outputs always go to project root
cd deep/nested/directory
aid ../../../src # Output: <project-root>/.aid/aid.src.txt
# Use environment variable as fallback (useful for CI/CD)
AID_PROJECT_ROOT=/build/workspace aid src/
Language Support
Currently supports 12 languages via tree-sitter:
- Full Support: Python, Go, JavaScript, PHP, Ruby
- Beta: TypeScript, Java, C#, Rust, Kotlin, Swift, C++
- Coming Soon: Zig, Scala, Clojure
Language-Specific Documentation:
- C++ - C++11/14/17/20 support with templates, namespaces, modern features
- C# - Complete C# 12 support with records, nullable reference types, pattern matching
- Go - Full Go support with interfaces, goroutines, generics (1.18+)
- Java - Java 8-21 support with records, sealed classes, pattern matching
- JavaScript - ES6+ support with classes, modules, async/await
- Kotlin - Kotlin 1.x support with coroutines, data classes, sealed classes
- PHP - PHP 7.4+ with PHP 8.x features (attributes, union types, enums)
- Python - Full Python 3.x support with type hints, async/await, decorators
- Ruby - Ruby 2.x/3.x support with blocks, modules, metaprogramming
- Rust - Rust 2018/2021 editions with traits, lifetimes, async
- Swift - Swift 5.x support with protocols, extensions, property wrappers
- TypeScript - TypeScript 4.x/5.x with generics, decorators, type system
How It Works
- Scans your codebase recursively for supported file types (12 languages)
- Parses each file using language-specific tree-sitter parsers (all bundled, no dependencies)
- Extracts only what you need: public APIs, type signatures, class hierarchies
-
β¦