# TunePrompt

**Stop guessing. Start measuring. Automate your prompt engineering.**

## Overview

TunePrompt is a local-first CLI testing framework for Large Language Model (LLM) prompts. Because LLM outputs are non-deterministic, exact-match testing often fails even when a response is correct; TunePrompt instead treats prompts like unit tests, scoring outputs with deterministic and semantic checks that can run directly in your CI/CD pipeline.

## Features

- **Multi-provider Support**: Test prompts across OpenAI, Anthropic, OpenRouter, and other LLM providers
- **Semantic Testing**: Compare outputs using semantic similarity rather than exact matches
- **JSON Validation**: Validate structured JSON outputs
- **LLM-based Judging**: Use advanced LLMs to evaluate prompt quality
- **Watch Mode**: Automatically re-run tests when files change
- **CI/CD Integration**: Seamlessly integrate with GitHub Actions, GitLab CI, and other pipelines
- **Cloud Sync**: Upload results to the TunePrompt Cloud dashboard
- **Auto-fix Engine**: Premium feature to automatically fix failing prompts using AI
- **Detailed Reporting**: Comprehensive test reports with scores, methods, and durations

## Installation

```bash
npm install -g tuneprompt
```

Or use npx without installing:

```bash
npx tuneprompt@latest run
```

## Quick Start

1. Initialize a new project (this creates `tuneprompt.config.js` and a sample `tests` directory):

   ```bash
   tuneprompt init
   ```

2. Create test files in the `tests` directory with your prompts and expectations

3. Run tests:

   ```bash
   tuneprompt run
   ```

4. Run tests with cloud sync (requires activation):

   ```bash
   tuneprompt run --cloud
   ```

## Commands

- `tuneprompt init`: Initialize a new TunePrompt project
- `tuneprompt run`: Run prompt tests
- `tuneprompt run --watch`: Run tests in watch mode, re-running on file changes
- `tuneprompt run --cloud`: Run tests and upload results to the cloud
- `tuneprompt run --ci`: Run tests in CI mode (exits with code 1 on test failure)
- `tuneprompt fix`: Auto-fix failing prompts (Premium feature)
- `tuneprompt history`: View test run history
- `tuneprompt activate [subscription-id]`: Activate your Premium license
- `tuneprompt status`: Check license status

Additional flags include `--verbose` for detailed output, and `--provider` or `--threshold` to override the configured defaults for a single run.
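
The `--ci` flag is meant to gate pull requests: the run exits non-zero when any test fails, so the build blocks the merge. A minimal GitHub Actions workflow (e.g. `.github/workflows/prompt-test.yml`) could look like:

```yaml
name: Prompt Integrity Check
on: [pull_request]
jobs:
  test-prompts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
      - name: Install & Test
        run: |
          npm install -g tuneprompt
          tuneprompt run --ci
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

The same two commands (`npm install -g tuneprompt` and `tuneprompt run --ci`) work in GitLab CI or any other runner that can execute npm, with the API key supplied as a CI variable.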

## Configuration

TunePrompt reads its providers and default settings from a configuration file, `tuneprompt.config.js`, in your project root.

Example configuration:
```javascript
// tuneprompt.config.js
module.exports = {
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY,
      model: 'gpt-4o',
    },
    anthropic: {
      apiKey: process.env.ANTHROPIC_API_KEY,
      model: 'claude-3-opus-20240229',
    },
    openrouter: {
      apiKey: process.env.OPENROUTER_API_KEY,
      model: 'openai/gpt-4o',
    }
  },
  threshold: 0.85,     // Default semantic similarity threshold
  testDir: './tests',  // Directory containing test files
  outputFormat: 'table'
};
```

## Test File Format

Tests are defined as JSON files in the `tests` directory. Each test file contains an array of test cases:

```json
[
  {
    "description": "User onboarding welcome message",
    "prompt": "Generate a friendly welcome message for a user named {{name}}.",
    "variables": {
      "name": "Alice"
    },
    "expect": "Welcome, Alice! We are glad you are here.",
    "config": {
      "threshold": 0.85,
      "method": "semantic",
      "model": "gpt-4o",
      "provider": "openai"
    }
  }
]
```

Each test case supports:

- `description`: Human-readable description of the test
- `prompt`: The input prompt, with optional `{{variable}}` placeholders
- `variables`: Values substituted into the prompt's placeholders
- `expect`: Expected output to compare against
- `config`: Per-test options:

| Option | Type | Description |
|--------|------|-------------|
| `threshold` | Number | Similarity threshold between 0.0 and 1.0 |
| `method` | String | Scoring method (`exact`, `semantic`, `json`, `llm-judge`) |
| `provider` | String | LLM provider to use for this test |
| `model` | String | Model to use for this test |
| `timeout` | Number | Request timeout in milliseconds |
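
The `{{name}}` placeholder in `prompt` is filled from `variables` before the prompt is sent to the provider. As a sketch of that substitution (note: `renderPrompt` is a hypothetical helper for illustration, not part of TunePrompt's API):

```javascript
// Minimal sketch of {{variable}} substitution, as suggested by the
// test file format above. `renderPrompt` is a hypothetical helper.
function renderPrompt(template, variables) {
  // Replace each {{key}} with its value; leave unknown keys untouched.
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in variables ? String(variables[key]) : match
  );
}

const prompt = renderPrompt(
  'Generate a friendly welcome message for a user named {{name}}.',
  { name: 'Alice' }
);
console.log(prompt);
// → Generate a friendly welcome message for a user named Alice.
```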

## Testing Methods

- `exact`: Exact string match
- `semantic`: Semantic similarity comparison using embeddings
- `json`: JSON structure validation
- `llm-judge`: LLM-based evaluation of response quality
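
For intuition, the `semantic` method boils down to embedding both texts and comparing the vectors by cosine similarity against the configured threshold. The sketch below uses a toy bag-of-words `embed` only so the example is self-contained; TunePrompt itself uses real embedding models:

```javascript
// Toy bag-of-words embedding: maps a text to word counts.
// Stand-in for a real embedding model, for illustration only.
function embed(text) {
  const counts = {};
  for (const word of text.toLowerCase().match(/\w+/g) ?? []) {
    counts[word] = (counts[word] ?? 0) + 1;
  }
  return counts;
}

// Cosine similarity between two sparse vectors (plain objects).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const k of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[k] ?? 0, y = b[k] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// A semantic test passes when similarity meets the threshold.
function semanticPass(output, expected, threshold) {
  return cosineSimilarity(embed(output), embed(expected)) >= threshold;
}
```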

## Cloud Integration

TunePrompt can sync test results to a cloud dashboard for storage and review. To use cloud features:

1. Purchase a subscription at [TunePrompt website]
2. Activate your license:

   ```bash
   tuneprompt activate [your-subscription-id]
   ```

3. Run tests with cloud sync:

   ```bash
   tuneprompt run --cloud
   ```

## Privacy

Semantic comparisons run locally using open-source embedding models, and your prompts and expected outputs never leave your machine unless you opt into cloud features. API keys are sent only to the corresponding LLM providers.

## Premium Features

- **Auto-fix Engine**: Analyzes each failure and iteratively rewrites the prompt until it passes (`tuneprompt fix`)
- **Cloud sync & team collaboration**: Store results in the cloud and collaborate with your team
- **Advanced diagnostics**: Detailed insights and recommendations

## Environment Variables

Create a `.env` file in your project root with your API keys:

```env
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
OPENROUTER_API_KEY=your_openrouter_api_key
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT. Premium features require an active license (see `tuneprompt activate`). For commercial use and enterprise support, contact [contact@tuneprompt.com](mailto:contact@tuneprompt.com).