Commit 82b2cd7

docs: add missing feature documentation to README files
Add custom instructions, token count, output splitting, agent skills generation sections. Expand remote URL formats, comment removal language list, config JSON with input/token_count fields, and jq examples.
1 parent: 45bcec4 · commit: 82b2cd7

2 files changed

Lines changed: 296 additions & 18 deletions

README.md

Lines changed: 148 additions & 9 deletions
@@ -140,6 +140,9 @@ Create a `repomix.config.json` file in your project root for custom configuratio
140140

141141
```json
142142
{
143+
"input": {
144+
"max_file_size": 52428800
145+
},
143146
"output": {
144147
"file_path": "repomix-output.md",
145148
"style": "markdown",
@@ -185,12 +188,16 @@ Create a `repomix.config.json` file in your project root for custom configuratio
185188
"url": "",
186189
"branch": ""
187190
},
188-
"include": []
191+
"include": [],
192+
"token_count": {
193+
"encoding": "o200k_base"
194+
}
189195
}
190196
```
191197
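As a rough illustration of how the new `input` and `token_count` fields might be consumed, the sketch below layers a user's `repomix.config.json` over fallback values. It is not Repomix's own loader: `FALLBACKS`, `merge`, and `load_settings` are hypothetical names, and treating the example values above as defaults is an assumption.

```python
import json
from pathlib import Path

# Fallback values mirror the example config; treating them as defaults is an assumption.
FALLBACKS = {
    "input": {"max_file_size": 50 * 1024 * 1024},  # 52428800 bytes, i.e. 50 MiB
    "token_count": {"encoding": "o200k_base"},
}


def merge(base, override):
    """Recursively overlay `override` on `base` without mutating either."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_settings(path="repomix.config.json"):
    """Return the config file's contents layered over the fallbacks."""
    config_file = Path(path)
    user = {}
    if config_file.exists():
        user = json.loads(config_file.read_text(encoding="utf-8"))
    return merge(FALLBACKS, user)
```

Calling `load_settings()` in a directory without a config file simply returns the fallbacks, so callers can read `settings["token_count"]["encoding"]` without checking for missing keys.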

192198
> [!NOTE]
193-
> *Note on `remove_comments`*: This feature is language-aware, correctly handling comment syntax for various languages like Python, JavaScript, C++, HTML, etc., rather than using a simple generic pattern.*
199+
> *Note on `remove_comments`*: This feature is language-aware, correctly handling comment syntax for various languages rather than using a simple generic pattern. Supported languages:
200+
> Python, JavaScript, TypeScript, JSX, TSX, Vue, Svelte, Java, C, C++, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, HTML, CSS, XML, YAML
194201
195202
#### Remote Repository Configuration
196203

@@ -201,6 +208,25 @@ The `remote` section allows you to configure remote repository processing:
201208

202209
When a remote URL is specified in the configuration, Repomix will process the remote repository instead of the local directory. This can be overridden by CLI parameters (`--remote-branch`).
203210

211+
You can use various URL formats with `--remote`:
212+
213+
```bash
214+
# GitHub shorthand
215+
repomix --remote user/repo
216+
217+
# Full GitHub URL
218+
repomix --remote https://github.com/user/repo
219+
220+
# Specific branch
221+
repomix --remote https://github.com/user/repo --remote-branch feature-branch
222+
223+
# Specific tag
224+
repomix --remote https://github.com/user/repo --remote-branch v1.0.0
225+
226+
# Specific commit
227+
repomix --remote https://github.com/user/repo --remote-branch abc123
228+
```
229+
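The shorthand and full-URL forms above can be thought of as two spellings of the same repository. The sketch below shows one way such input might be normalized; `normalize_remote` is a hypothetical helper for illustration, not a function exposed by Repomix, and the assumption that shorthand always resolves to GitHub comes only from the "GitHub shorthand" comment above.

```python
import re

# Matches "owner/name" with no scheme, host, or extra path segments.
_SHORTHAND = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def normalize_remote(value):
    """Expand 'user/repo' shorthand to a full GitHub URL; leave other values unchanged."""
    if _SHORTHAND.fullmatch(value):
        return f"https://github.com/{value}"
    return value
```

A branch, tag, or commit is still passed separately through `--remote-branch`, so this helper deliberately ignores it.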
204230
**Command Line Options**
205231

206232
- `repomix [directories...]`: Target directories (defaults to current directory). Supports multiple directories.
@@ -259,11 +285,111 @@ Disable checks via configuration or CLI:
259285
repomix --no-security-check
260286
```
261287

262-
### 4.4 Code Compression
288+
### 4.4 Custom Instructions
289+
290+
You can add custom instructions to the output file that will guide AI tools on how to interpret and use the packed codebase.
291+
292+
Create a markdown file (e.g., `repomix-instruction.md`) with your instructions:
293+
294+
```markdown
295+
## Project Context
296+
This is a Python web application using FastAPI.
297+
Please follow PEP 8 conventions when suggesting code changes.
298+
```
299+
300+
Then specify the path via CLI or configuration:
301+
302+
```bash
303+
# Via CLI
304+
repomix --instruction-file-path repomix-instruction.md
305+
306+
# Via configuration (repomix.config.json)
307+
```
308+
309+
```json
310+
{
311+
"output": {
312+
"instruction_file_path": "repomix-instruction.md"
313+
}
314+
}
315+
```
316+
317+
The instruction content will be included in the "Instruction" section of the output file.
318+
319+
### 4.5 Token Count
320+
321+
Repomix provides token counting to help you understand the size of your codebase in terms of AI model tokens.
322+
323+
#### Choosing an Encoding
324+
325+
Use `--token-count-encoding` to select the tokenizer encoding:
326+
327+
```bash
328+
# Use GPT-4o encoding (default)
329+
repomix --token-count-encoding o200k_base
330+
331+
# Use GPT-3.5/4 encoding
332+
repomix --token-count-encoding cl100k_base
333+
```
334+
335+
#### Visualizing Token Distribution
336+
337+
Use `--token-count-tree` to display a file tree with token counts for each file:
338+
339+
```bash
340+
# Show all files with token counts
341+
repomix --token-count-tree
342+
343+
# Show only files with 100 or more tokens
344+
repomix --token-count-tree 100
345+
```
346+
347+
### 4.6 Splitting Output for Large Codebases
348+
349+
For large codebases that exceed AI model context limits, you can split the output into multiple files:
350+
351+
```bash
352+
# Split into files of approximately 500KB each
353+
repomix --split-output 500kb
354+
355+
# Split into files of approximately 2MB each
356+
repomix --split-output 2mb
357+
```
358+
359+
Output files will be numbered sequentially (e.g., `repomix-output.1.md`, `repomix-output.2.md`, etc.). Files are split at directory boundaries to keep related files together.
360+
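To make "split at directory boundaries" concrete, here is a minimal sketch of one possible grouping strategy. It is not Repomix's actual algorithm: `parse_size` and `split_by_directory` are hypothetical names, sizes are assumed to use 1024-byte kilobytes, and only top-level directories are treated as boundaries.

```python
import re
from collections import OrderedDict

# Assumption: "kb" and "mb" are binary units (1024-based).
_UNITS = {"kb": 1024, "mb": 1024 ** 2}


def parse_size(text):
    """Turn a size string such as '500kb' or '2mb' into a byte count."""
    match = re.fullmatch(r"(\d+)\s*(kb|mb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def split_by_directory(files, limit):
    """Group (path, size) pairs into chunks without splitting a top-level directory.

    A directory larger than the limit still becomes its own chunk.
    """
    # Collect files per top-level directory, preserving first-seen order.
    groups = OrderedDict()
    for path, size in files:
        top = path.split("/", 1)[0] if "/" in path else "."
        groups.setdefault(top, []).append((path, size))

    chunks, current, current_size = [], [], 0
    for members in groups.values():
        group_size = sum(size for _, size in members)
        # Start a new chunk when adding this directory would exceed the limit.
        if current and current_size + group_size > limit:
            chunks.append(current)
            current, current_size = [], 0
        current.extend(path for path, _ in members)
        current_size += group_size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then correspond to one numbered output file, such as `repomix-output.1.md`.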
361+
### 4.7 Agent Skills Generation
362+
363+
Repomix can generate [Claude Agent Skills](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/skills) format output, which provides structured reference materials for AI coding agents.
364+
365+
```bash
366+
# Generate skills with auto-detected name
367+
repomix --skill-generate
368+
369+
# Generate skills with a custom name
370+
repomix --skill-generate my-project
371+
372+
# Specify output directory directly
373+
repomix --skill-output ./my-skills-dir
374+
```
375+
376+
This creates the following directory structure:
377+
378+
```
379+
.claude/skills/<name>/
380+
├── SKILL.md # Entry point with usage guide
381+
└── references/
382+
├── summary.md # Purpose, format, and statistics
383+
├── project-structure.md # Directory tree with line counts
384+
├── files.md # All file contents
385+
└── tech-stack.md # Languages, frameworks, dependencies
386+
```
387+
388+
### 4.8 Code Compression
263389

264390
Repomix provides advanced code compression capabilities to reduce output size while preserving essential information. This feature is particularly useful when working with large codebases or when you need to focus on specific aspects of your code.
265391

266-
#### 4.4.1 Compression Modes
392+
#### 4.8.1 Compression Modes
267393

268394
**Interface Mode** (`keep_interfaces: true`)
269395
- Preserves function and class signatures with their complete type annotations
@@ -282,7 +408,7 @@ Repomix provides advanced code compression capabilities to reduce output size wh
282408
- Keeps only global variables, imports, and module-level code
283409
- Maximum compression for focusing on configuration and constants
284410

285-
#### 4.4.2 Configuration Options
411+
#### 4.8.2 Configuration Options
286412

287413
```json
288414
{
@@ -295,7 +421,7 @@ Repomix provides advanced code compression capabilities to reduce output size wh
295421
}
296422
```
297423

298-
#### 4.4.3 Usage Examples
424+
#### 4.8.3 Usage Examples
299425

300426
**Generate API Documentation:**
301427
```bash
@@ -315,7 +441,7 @@ repomix --config-override '{"compression": {"enabled": true, "keep_interfaces":
315441
repomix --config-override '{"compression": {"enabled": true, "keep_signatures": false}}'
316442
```
317443

318-
#### 4.4.4 Language Support
444+
#### 4.8.4 Language Support
319445

320446
Currently, advanced compression features are fully supported for:
321447
- **Python**: Complete AST-based compression with all modes
@@ -331,7 +457,7 @@ Currently, advanced compression features are fully supported for:
331457
- **CSS**: Tree-sitter based compression
332458
- **Other Languages**: Basic compression with warnings (future enhancement planned)
333459

334-
#### 4.4.5 Example Output
460+
#### 4.8.5 Example Output
335461

336462
**Original Python Code:**
337463
```python
@@ -370,7 +496,7 @@ def calculate_sum(a: int, b: int) -> int:
370496
pass
371497
```
372498

373-
### 4.5 Ignore Patterns
499+
### 4.9 Ignore Patterns
374500

375501
Repomix provides multiple methods to set ignore patterns for excluding specific files or directories during the packing process:
376502

@@ -604,6 +730,19 @@ JSON format is ideal for:
604730
- Programmatic processing of codebase analysis
605731
- Building custom pipelines and workflows
606732

733+
You can use `jq` to extract specific information from the JSON output:
734+
735+
```bash
736+
# Extract file paths
737+
cat repomix-output.json | jq '.files[].path'
738+
739+
# Get total token count
740+
cat repomix-output.json | jq '.summary.total_tokens'
741+
742+
# Find files with more than 1000 tokens
743+
cat repomix-output.json | jq '.files[] | select(.tokens > 1000) | {path, tokens}'
744+
```
745+
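Where `jq` is not available, the same three queries can be sketched with Python's standard library. The field names (`files`, `path`, `tokens`, `summary.total_tokens`) are taken from the `jq` filters above; the function names are hypothetical and the exact JSON schema is assumed to match those filters.

```python
import json
from pathlib import Path


def load_report(path):
    """Read a Repomix JSON output file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def file_paths(report):
    """Equivalent of: jq '.files[].path'"""
    return [entry["path"] for entry in report.get("files", [])]


def total_tokens(report):
    """Equivalent of: jq '.summary.total_tokens'"""
    return report.get("summary", {}).get("total_tokens")


def files_over(report, threshold=1000):
    """Equivalent of: jq '.files[] | select(.tokens > 1000) | {path, tokens}'"""
    return [
        {"path": entry["path"], "tokens": entry["tokens"]}
        for entry in report.get("files", [])
        if entry.get("tokens", 0) > threshold
    ]
```

For example, `files_over(load_report("repomix-output.json"), 1000)` lists the files that contribute most to the token budget.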
607746
## 🛠️ 6. Advanced Usage
608747

609748
### 6.1 Library Usage
