## ⚡ The Problem

LLMs are non-deterministic. Traditional "Exact Match" testing fails because a model might be 100% correct but use different wording. This makes it difficult to maintain consistent quality and reliability in production applications that depend on LLM outputs.
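A toy sketch of why this matters: exact string comparison fails on answers that are equivalent in meaning, while a similarity score passes them. The word-overlap function below is a crude stand-in for real embedding similarity, not tuneprompt's actual scorer:

```javascript
// Exact match: brittle against rephrasing.
function exactMatch(a, b) {
  return a === b;
}

// Crude word-overlap similarity (illustrative stand-in for embedding similarity).
function overlapSimilarity(a, b) {
  const tokenize = (s) => new Set(s.toLowerCase().replace(/[^\w\s]/g, "").split(/\s+/));
  const ta = tokenize(a);
  const tb = tokenize(b);
  const shared = [...ta].filter((t) => tb.has(t)).length;
  return shared / Math.max(ta.size, tb.size);
}

const expected = "The capital of France is Paris.";
const actual = "Paris is the capital of France.";

console.log(exactMatch(expected, actual));        // false — same meaning, different wording
console.log(overlapSimilarity(expected, actual)); // 1 — identical word set, different order
```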

## 🚀 The Solution

### Key Features

* **Semantic Scoring (FREE):** Uses local embeddings to verify output meaning against expectations.
* **CI/CD Native:** Integrates seamlessly with GitHub Actions, GitLab CI, and other CI/CD platforms to block regressions.
* **Watch Mode:** Iterate on prompts in real-time with instant feedback.
* **Multi-Provider Support:** Works with OpenAI, Anthropic, OpenRouter, and custom LLM endpoints.
* **Auto-Fix (Premium):** Don't just find errors—fix them. Our engine rewrites failing prompts automatically.
* **Performance Metrics:** Track latency, token usage, and cost alongside semantic accuracy.

---

## 📦 Installation

```bash
npm install -g tuneprompt
```

Or use npx without installation:

```bash
npx tuneprompt@latest run
```

---

## 🛠️ Quick Start

### 1. Initialize your project

```bash
tuneprompt init
```

This creates a `tuneprompt.config.js` and a sample test directory.

### 2. Define a Test Case (`tests/onboarding.json`)

```json
{
  "description": "Onboarding welcome message",
  "prompt": "Write a short welcome message for a new user named Alex.",
  "expect": "A friendly greeting that welcomes Alex to the product.",
  "config": {
    "threshold": 0.8,
    "method": "semantic"
  }
}
```

### 3. Run the Suite

```bash
tuneprompt run
```

### 4. Watch Mode for Development

Iterate on prompts with live feedback:

```bash
tuneprompt watch
```

---

## 🧪 Test Configuration

Tests are defined as JSON files in your test directory. Each test includes:

- `description`: Human-readable description of the test
- `prompt`: The input prompt to test
- `expect`: Expected output for semantic comparison
- `config`: Test-specific configuration options

### Configuration Options

| Option | Type | Description |
|--------|------|-------------|
| `threshold` | Number | Semantic similarity threshold (0.0 - 1.0) |
| `method` | String | Scoring method (`semantic`, `exact`, `regex`) |
| `provider` | String | LLM provider to use for this test |
| `timeout` | Number | Request timeout in milliseconds |

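A test that overrides several of these options might look like this (field values are illustrative):

```json
{
  "description": "Refund policy summary",
  "prompt": "Summarize our refund policy in two sentences.",
  "expect": "A two-sentence summary of the refund policy.",
  "config": {
    "threshold": 0.85,
    "method": "semantic",
    "provider": "anthropic",
    "timeout": 15000
  }
}
```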
---

## 🤖 Continuous Integration

Ensure prompt integrity on every Pull Request. Add this to `.github/workflows/prompt-test.yml`:

```yaml
name: Prompt Tests
on: [pull_request]

jobs:
  prompt-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
      - name: Install & Test
        run: |
          npm install -g tuneprompt
          tuneprompt run --ci
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

For GitLab CI, add this to `.gitlab-ci.yml`:

```yaml
prompt-tests:
  stage: test
  script:
    - npm install -g tuneprompt
    - tuneprompt run --ci
  variables:
    OPENAI_API_KEY: $CI_OPENAI_API_KEY
```

---

## 🔧 Auto-Fix (Premium)

```bash
tuneprompt fix
```

The engine analyzes the failure, extracts constraints, and uses iterative meta-prompting to **rewrite your prompt** until it passes the test.
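Conceptually, that loop can be sketched like this. This is an illustration only, not the actual engine: `score` and `rewrite` are stub functions standing in for the semantic scorer and the meta-prompting rewrite step.

```javascript
// Illustrative sketch of an iterative fix loop (NOT tuneprompt's real engine).
function autoFix(prompt, expected, score, rewrite, { threshold = 0.8, maxIters = 5 } = {}) {
  let current = prompt;
  for (let i = 0; i < maxIters; i++) {
    const s = score(current, expected);
    // Stop as soon as the candidate prompt clears the threshold.
    if (s >= threshold) return { prompt: current, score: s, iterations: i };
    // Otherwise feed the failure back into a rewriting step.
    current = rewrite(current, expected, s);
  }
  return { prompt: current, score: score(current, expected), iterations: maxIters };
}

// Toy demo: "rewriting" appends a constraint until the mock scorer passes.
const result = autoFix(
  "Greet the user.",
  "A friendly one-sentence greeting.",
  (p) => (p.includes("one sentence") ? 0.9 : 0.5),
  (p) => p + " Use one sentence."
);
console.log(result); // passes after one rewrite
```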

---

## 📜 Configuration

Create a `tuneprompt.config.js` file in your project root:

```javascript
// tuneprompt.config.js
module.exports = {
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY,
      model: 'gpt-4o',
      temperature: 0.7
    },
    anthropic: {
      apiKey: process.env.ANTHROPIC_API_KEY,
      model: 'claude-3-5-sonnet'
    },
    openrouter: {
      apiKey: process.env.OPENROUTER_API_KEY,
      model: 'openai/gpt-4o'
    }
  },
  testDir: './tests',  // Directory containing test files
  threshold: 0.8,      // Default semantic similarity
  timeout: 30000,      // Default request timeout (ms)
  concurrency: 5,      // Number of concurrent requests
  verbose: false       // Enable detailed logging
};
```

### Environment Variables

Set these environment variables to configure your LLM providers:

- `OPENAI_API_KEY` - OpenAI API key
- `ANTHROPIC_API_KEY` - Anthropic API key
- `OPENROUTER_API_KEY` - OpenRouter API key
- `TUNEPROMPT_LICENSE_KEY` - Premium license key

---

## 🛠️ CLI Commands

| Command | Description |
|---------|-------------|
| `tuneprompt init` | Initialize a new project with config and sample tests |
| `tuneprompt run` | Run all tests once |
| `tuneprompt watch` | Watch for changes and run tests automatically |
| `tuneprompt fix` | Auto-fix failing prompts (premium) |
| `tuneprompt report` | Generate detailed test reports |
| `tuneprompt activate` | Activate premium features |

### CLI Options

- `--ci` - CI mode (exits with code 1 on test failure)
- `--verbose` - Show detailed output
- `--report` - Generate test reports in various formats
- `--provider` - Override default provider for this run
- `--threshold` - Override default threshold for this run

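For example, to override the provider and threshold for a single run (an illustrative invocation combining the flags above; adjust values to your setup):

```bash
tuneprompt run --provider anthropic --threshold 0.9 --verbose
```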
---

## 📊 Reporting

Generate detailed reports in multiple formats:

```bash
# Generate HTML report
tuneprompt run --report html

# Generate JSON report
tuneprompt run --report json

# Generate JUnit XML for CI integration
tuneprompt run --report junit
```

---

## 🔐 Privacy & Security

- All semantic comparisons happen locally using open-source embedding models
- Your prompts and expected outputs never leave your machine (unless using premium cloud features)
- API keys are only sent to the respective LLM providers
- Premium features offer optional cloud processing for faster results

---

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details on how to get started.

---

## ⚖️ License

MIT © Tuneprompt. Premium features require a license key via `tuneprompt activate`.

For commercial use and enterprise support, contact us at [contact@tuneprompt.com](mailto:contact@tuneprompt.com).

---