diff --git a/README.md b/README.md
index ed6f6b6..fba182a 100644
--- a/README.md
+++ b/README.md
@@ -169,6 +169,8 @@
 Run it on your own hardware to know exactly where your setup stands. Running on a **Mac M1 Mini 8GB**: local Qwen3.5-4B scores **39/54** (72%), cloud GPT-5.2 scores **46/48** (96%), and the hybrid config reaches **53/54** (98%). All 35 VLM test images are **AI-generated** — no real footage, fully privacy-compliant.
 
+The benchmark supports multiple cloud LLM providers including **OpenAI**, **[MiniMax](https://www.minimaxi.com)** (M2.7, M2.5), and any OpenAI-compatible endpoint — set `AEGIS_LLM_API_TYPE=minimax` with your `MINIMAX_API_KEY` to benchmark MiniMax models against your local setup.
+
 📄 [Read the Paper](docs/paper/home-security-benchmark.pdf) · 🔬 [Run It Yourself](skills/analysis/home-security-benchmark/) · 📋 [Test Scenarios](skills/analysis/home-security-benchmark/fixtures/)
 
 ---
 
diff --git a/skills/analysis/home-security-benchmark/SKILL.md b/skills/analysis/home-security-benchmark/SKILL.md
index 03ccafb..2612e4e 100644
--- a/skills/analysis/home-security-benchmark/SKILL.md
+++ b/skills/analysis/home-security-benchmark/SKILL.md
@@ -49,6 +49,14 @@ node scripts/run-benchmark.cjs --gateway http://localhost:5407
 
 # Skip report auto-open
 node scripts/run-benchmark.cjs --no-open
+
+# Benchmark with MiniMax Cloud API
+AEGIS_LLM_API_TYPE=minimax MINIMAX_API_KEY=your-key \
+  node scripts/run-benchmark.cjs
+
+# MiniMax with a specific model
+AEGIS_LLM_API_TYPE=minimax MINIMAX_API_KEY=your-key AEGIS_LLM_MODEL=MiniMax-M2.7-highspeed \
+  node scripts/run-benchmark.cjs
 ```
 
 ## Configuration
@@ -59,10 +67,11 @@ node scripts/run-benchmark.cjs --no-open
 |----------|---------|-------------|
 | `AEGIS_GATEWAY_URL` | `http://localhost:5407` | LLM gateway (OpenAI-compatible) |
 | `AEGIS_LLM_URL` | — | Direct llama-server LLM endpoint |
-| `AEGIS_LLM_API_TYPE` | `openai` | LLM provider type (builtin, openai, etc.) |
+| `AEGIS_LLM_API_TYPE` | `openai` | LLM provider type (`builtin`, `openai`, `minimax`) |
 | `AEGIS_LLM_MODEL` | — | LLM model name |
 | `AEGIS_LLM_API_KEY` | — | API key for cloud LLM providers |
 | `AEGIS_LLM_BASE_URL` | — | Cloud provider base URL (e.g. `https://api.openai.com/v1`) |
+| `MINIMAX_API_KEY` | — | MiniMax API key (fallback when `AEGIS_LLM_API_KEY` is not set) |
 | `AEGIS_VLM_URL` | *(disabled)* | VLM server base URL |
 | `AEGIS_VLM_MODEL` | — | Loaded VLM model ID |
 | `AEGIS_SKILL_ID` | — | Skill identifier (enables skill mode) |
@@ -77,6 +86,8 @@ This skill includes a [`config.yaml`](config.yaml) that defines user-configurabl
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
 | `mode` | select | `llm` | Which suites to run: `llm` (96 tests), `vlm` (47 tests), or `full` (143 tests) |
+| `llmProvider` | select | `builtin` | LLM provider: `builtin` (local), `openai`, or `minimax` |
+| `minimaxModel` | select | `MiniMax-M2.7` | MiniMax model to benchmark (requires `llmProvider=minimax`) |
 | `noOpen` | boolean | `false` | Skip auto-opening the HTML report in browser |
 
 Platform parameters like `AEGIS_GATEWAY_URL` and `AEGIS_VLM_URL` are auto-injected by Aegis — they are **not** in `config.yaml`. See [Aegis Skill Platform Parameters](../../../docs/skill-params.md) for the full platform contract.
@@ -141,5 +152,14 @@ Results are saved to `~/.aegis-ai/benchmarks/` as JSON. An HTML report with cros
 
 - Node.js ≥ 18
 - `npm install` (for `openai` SDK dependency)
-- Running LLM server (llama-server, OpenAI API, or any OpenAI-compatible endpoint)
+- Running LLM server (llama-server, OpenAI API, MiniMax Cloud API, or any OpenAI-compatible endpoint)
 - Optional: Running VLM server for scene analysis tests (47 tests)
+
+## Supported LLM Providers
+
+| Provider | API Type | Models | Notes |
+|----------|----------|--------|-------|
+| Local (llama-server) | `builtin` | Any GGUF model | Default — runs on your hardware |
+| OpenAI | `openai` | GPT-5.4, etc. | Requires `AEGIS_LLM_API_KEY` |
+| **MiniMax** | `minimax` | MiniMax-M2.7, M2.7-highspeed, M2.5, M2.5-highspeed | Auto-configured base URL, temperature clamped to [0, 1] |
+| Any OpenAI-compatible | — | — | Set `AEGIS_LLM_BASE_URL` + `AEGIS_LLM_API_KEY` |
diff --git a/skills/analysis/home-security-benchmark/config.yaml b/skills/analysis/home-security-benchmark/config.yaml
index c643fb7..b978e39 100644
--- a/skills/analysis/home-security-benchmark/config.yaml
+++ b/skills/analysis/home-security-benchmark/config.yaml
@@ -6,6 +6,20 @@ params:
     default: llm
     description: "Which test suites to run: llm-only, vlm-only, or full"
 
+  - key: llmProvider
+    label: LLM Provider
+    type: select
+    options: [builtin, openai, minimax]
+    default: builtin
+    description: "LLM provider: builtin (local llama-server), openai, or minimax (MiniMax Cloud API)"
+
+  - key: minimaxModel
+    label: MiniMax Model
+    type: select
+    options: [MiniMax-M2.7, MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.5-highspeed]
+    default: MiniMax-M2.7
+    description: "MiniMax model to benchmark (requires llmProvider=minimax and MINIMAX_API_KEY)"
+
   - key: noOpen
     label: Don't auto-open report
     type: boolean
diff --git a/skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs b/skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs
index 8598be1..cb69083 100644
--- a/skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs
+++ b/skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs
@@ -63,10 +63,20 @@ Options:
   -h, --help  Show this help message
 
 Environment Variables (set by Aegis):
-  AEGIS_GATEWAY_URL   LLM gateway URL
-  AEGIS_VLM_URL       VLM server base URL
-  AEGIS_SKILL_ID      Skill identifier (enables skill mode)
-  AEGIS_SKILL_PARAMS  JSON params from skill config
+  AEGIS_GATEWAY_URL   LLM gateway URL
+  AEGIS_LLM_API_TYPE  LLM provider: builtin, openai, minimax
+  AEGIS_LLM_MODEL     LLM model name
+  AEGIS_LLM_API_KEY   API key for cloud providers
+  AEGIS_LLM_BASE_URL  Cloud provider base URL
+  AEGIS_VLM_URL       VLM server base URL
+  AEGIS_SKILL_ID      Skill identifier (enables skill mode)
+  AEGIS_SKILL_PARAMS  JSON params from skill config
+  MINIMAX_API_KEY     MiniMax API key (fallback for AEGIS_LLM_API_KEY)
+
+Providers:
+  minimax   MiniMax Cloud API (auto-configured, models: M2.7, M2.7-highspeed, M2.5, M2.5-highspeed)
+  openai    OpenAI API
+  builtin   Local llama-server (default)
 
 Tests: 131 total (96 LLM + 35 VLM) across 16 suites
 `.trim());
@@ -93,21 +103,55 @@ const FIXTURES_DIR = path.join(__dirname, '..', 'fixtures');
 // API type and model info from Aegis (or defaults for standalone)
 const LLM_API_TYPE = process.env.AEGIS_LLM_API_TYPE || 'openai';
 const LLM_MODEL = process.env.AEGIS_LLM_MODEL || '';
-const LLM_API_KEY = process.env.AEGIS_LLM_API_KEY || '';
+const LLM_API_KEY = process.env.AEGIS_LLM_API_KEY || process.env.MINIMAX_API_KEY || '';
 const LLM_BASE_URL = process.env.AEGIS_LLM_BASE_URL || '';
 const VLM_API_TYPE = process.env.AEGIS_VLM_API_TYPE || 'openai-compatible';
 const VLM_MODEL = process.env.AEGIS_VLM_MODEL || '';
 
+// ─── Provider Presets ───────────────────────────────────────────────────────
+// Auto-configure known cloud LLM providers by API type.
+// MiniMax uses an OpenAI-compatible API at https://api.minimax.io/v1
+const PROVIDER_PRESETS = {
+  minimax: {
+    baseUrl: 'https://api.minimax.io/v1',
+    defaultModel: 'MiniMax-M2.7',
+    models: ['MiniMax-M2.7', 'MiniMax-M2.7-highspeed', 'MiniMax-M2.5', 'MiniMax-M2.5-highspeed'],
+  },
+  openai: {
+    baseUrl: 'https://api.openai.com/v1',
+    defaultModel: '',
+    models: [],
+  },
+};
+
+const isMiniMaxProvider = LLM_API_TYPE === 'minimax'
+  || LLM_BASE_URL.includes('api.minimax.io')
+  || LLM_BASE_URL.includes('minimax');
+
 // ─── OpenAI SDK Clients ─────────────────────────────────────────────────────
 const OpenAI = require('openai');
 
 // Resolve LLM base URL — priority: cloud provider → direct llama-server → gateway
 const strip = (u) => u.replace(/\/v1\/?$/, '');
-const llmBaseUrl = LLM_BASE_URL
-  ? `${strip(LLM_BASE_URL)}/v1`
-  : LLM_URL
-    ? `${strip(LLM_URL)}/v1`
-    : `${GATEWAY_URL}/v1`;
+
+// Auto-resolve base URL for known providers (e.g. minimax → api.minimax.io)
+function resolveProviderBaseUrl() {
+  if (LLM_BASE_URL) return `${strip(LLM_BASE_URL)}/v1`;
+  const preset = PROVIDER_PRESETS[LLM_API_TYPE];
+  if (preset) return preset.baseUrl;
+  if (LLM_URL) return `${strip(LLM_URL)}/v1`;
+  return `${GATEWAY_URL}/v1`;
+}
+
+// Auto-resolve default model for known providers
+function resolveProviderModel() {
+  if (LLM_MODEL) return LLM_MODEL;
+  const preset = PROVIDER_PRESETS[LLM_API_TYPE];
+  return preset ? preset.defaultModel : '';
+}
+
+const llmBaseUrl = resolveProviderBaseUrl();
+const effectiveModel = resolveProviderModel();
 
 const llmClient = new OpenAI({
   apiKey: LLM_API_KEY || 'not-needed', // Local servers don't require auth
@@ -167,7 +211,7 @@ async function llmCall(messages, opts = {}) {
     throw new Error(opts.vlm ? 'VLM client not configured' : 'LLM client not configured');
   }
 
-  const model = opts.model || (opts.vlm ? VLM_MODEL : LLM_MODEL) || undefined;
+  const model = opts.model || (opts.vlm ? VLM_MODEL : effectiveModel) || undefined;
   // For JSON-expected tests, use low temperature + top_p to encourage
   // direct JSON output without extended reasoning.
   // NOTE: Do NOT inject assistant prefill — Qwen3.5 rejects prefill
@@ -220,12 +264,18 @@ async function llmCall(messages, opts = {}) {
   // Determine the correct max-tokens parameter name:
   // - OpenAI cloud (GPT-5.4+): requires 'max_completion_tokens', rejects 'max_tokens'
   // - Local llama-server: requires 'max_tokens', may not understand 'max_completion_tokens'
-  const isCloudApi = !opts.vlm && (LLM_API_TYPE === 'openai' || LLM_BASE_URL.includes('openai.com') || LLM_BASE_URL.includes('api.anthropic'));
+  const isCloudApi = !opts.vlm && (LLM_API_TYPE === 'openai' || LLM_API_TYPE === 'minimax' || isMiniMaxProvider || LLM_BASE_URL.includes('openai.com') || LLM_BASE_URL.includes('api.anthropic'));
 
   // No max_tokens for any API — the streaming loop's 2000-token hard cap is the safety net.
   // Sending max_tokens to thinking models (Qwen3.5) starves actual output since
   // reasoning_content counts against the limit.
 
+  // MiniMax temperature clamping: API accepts [0, 1.0]
+  let temperature = opts.temperature;
+  if (isMiniMaxProvider && temperature !== undefined) {
+    temperature = Math.max(0, Math.min(1.0, temperature));
+  }
+
   // Build request params
   const params = {
     messages,
@@ -234,8 +284,8 @@ async function llmCall(messages, opts = {}) {
     // llama-server crashes with "Failed to parse input" when stream_options is present)
     ...(isCloudApi && { stream_options: { include_usage: true } }),
     ...(model && { model }),
-    ...(opts.temperature !== undefined && { temperature: opts.temperature }),
-    ...(opts.expectJSON && opts.temperature === undefined && { temperature: 0.7 }),
+    ...(temperature !== undefined && { temperature }),
+    ...(opts.expectJSON && temperature === undefined && { temperature: 0.7 }),
     ...(opts.expectJSON && { top_p: 0.8 }),
     ...(opts.tools && { tools: opts.tools }),
   };
@@ -2320,13 +2370,9 @@ async function main() {
   log('║ Home Security AI Benchmark Suite • DeepCamera / SharpAI ║');
   log('╚═════════════════════════════════════════════════════════╝');
 
   // Resolve the LLM endpoint that will actually be used
-  const effectiveLlmUrl = LLM_BASE_URL
-    ? LLM_BASE_URL.replace(/\/v1\/?$/, '')
-    : LLM_URL
-      ? LLM_URL.replace(/\/v1\/?$/, '')
-      : GATEWAY_URL;
+  const effectiveLlmUrl = llmBaseUrl.replace(/\/v1\/?$/, '');
-  log(`  LLM: ${LLM_API_TYPE} @ ${effectiveLlmUrl}${LLM_MODEL ? ' → ' + LLM_MODEL : ''}`);
+  log(`  LLM: ${LLM_API_TYPE} @ ${effectiveLlmUrl}${effectiveModel ? ' → ' + effectiveModel : ''}`);
   log(`  VLM: ${VLM_URL || '(disabled — use --vlm URL to enable)'}${VLM_MODEL ? ' → ' + VLM_MODEL : ''}`);
   log(`  Results: ${RESULTS_DIR}`);
   log(`  Mode: ${IS_SKILL_MODE ? 'Aegis Skill' : 'Standalone'} (streaming, ${IDLE_TIMEOUT_MS / 1000}s idle timeout)`);
@@ -2335,7 +2381,7 @@ async function main() {
   // Healthcheck — ping the LLM endpoint via SDK
   try {
     const ping = await llmClient.chat.completions.create({
-      ...(LLM_MODEL && { model: LLM_MODEL }),
+      ...(effectiveModel && { model: effectiveModel }),
       messages: [{ role: 'user', content: 'ping' }],
     });
     results.model.name = ping.model || 'unknown';
diff --git a/skills/analysis/home-security-benchmark/tests/minimax-provider.test.cjs b/skills/analysis/home-security-benchmark/tests/minimax-provider.test.cjs
new file mode 100644
index 0000000..d7d25b0
--- /dev/null
+++ b/skills/analysis/home-security-benchmark/tests/minimax-provider.test.cjs
@@ -0,0 +1,433 @@
+/**
+ * Unit & Integration tests for MiniMax LLM provider support
+ * in the Home Security AI Benchmark.
+ *
+ * Run: node tests/minimax-provider.test.cjs
+ */
+
+const assert = require('assert');
+const path = require('path');
+
+// ─── Test Framework ─────────────────────────────────────────────────────────
+
+let passed = 0;
+let failed = 0;
+const failures = [];
+
+function test(name, fn) {
+  try {
+    fn();
+    passed++;
+    console.log(`  ✅ ${name}`);
+  } catch (err) {
+    failed++;
+    failures.push({ name, error: err.message });
+    console.log(`  ❌ ${name}: ${err.message}`);
+  }
+}
+
+async function asyncTest(name, fn) {
+  try {
+    await fn();
+    passed++;
+    console.log(`  ✅ ${name}`);
+  } catch (err) {
+    failed++;
+    failures.push({ name, error: err.message });
+    console.log(`  ❌ ${name}: ${err.message}`);
+  }
+}
+
+function suite(name, fn) {
+  console.log(`\n📦 ${name}`);
+  fn();
+}
+
+// ─── Helpers: simulate the provider resolution logic from run-benchmark.cjs ─
+
+const PROVIDER_PRESETS = {
+  minimax: {
+    baseUrl: 'https://api.minimax.io/v1',
+    defaultModel: 'MiniMax-M2.7',
+    models: ['MiniMax-M2.7', 'MiniMax-M2.7-highspeed', 'MiniMax-M2.5', 'MiniMax-M2.5-highspeed'],
+  },
+  openai: {
+    baseUrl: 'https://api.openai.com/v1',
+    defaultModel: '',
+    models: [],
+  },
+};
+
+function resolveProviderBaseUrl(apiType, baseUrl, llmUrl, gatewayUrl) {
+  const strip = (u) => u.replace(/\/v1\/?$/, '');
+  if (baseUrl) return `${strip(baseUrl)}/v1`;
+  const preset = PROVIDER_PRESETS[apiType];
+  if (preset) return preset.baseUrl;
+  if (llmUrl) return `${strip(llmUrl)}/v1`;
+  return `${gatewayUrl}/v1`;
+}
+
+function resolveProviderModel(apiType, model) {
+  if (model) return model;
+  const preset = PROVIDER_PRESETS[apiType];
+  return preset ? preset.defaultModel : '';
+}
+
+function isMiniMaxProvider(apiType, baseUrl) {
+  return apiType === 'minimax'
+    || baseUrl.includes('api.minimax.io')
+    || baseUrl.includes('minimax');
+}
+
+function clampTemperature(temp, isMiniMax) {
+  if (isMiniMax && temp !== undefined) {
+    return Math.max(0, Math.min(1.0, temp));
+  }
+  return temp;
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// UNIT TESTS
+// ═══════════════════════════════════════════════════════════════════════════
+
+suite('Provider Preset Resolution', () => {
+  test('minimax API type resolves to api.minimax.io base URL', () => {
+    const url = resolveProviderBaseUrl('minimax', '', '', 'http://localhost:5407');
+    assert.strictEqual(url, 'https://api.minimax.io/v1');
+  });
+
+  test('openai API type resolves to api.openai.com base URL', () => {
+    const url = resolveProviderBaseUrl('openai', '', '', 'http://localhost:5407');
+    assert.strictEqual(url, 'https://api.openai.com/v1');
+  });
+
+  test('explicit base URL overrides provider preset', () => {
+    const url = resolveProviderBaseUrl('minimax', 'https://custom.example.com/v1', '', 'http://localhost:5407');
+    assert.strictEqual(url, 'https://custom.example.com/v1');
+  });
+
+  test('unknown API type falls back to gateway URL', () => {
+    const url = resolveProviderBaseUrl('builtin', '', '', 'http://localhost:5407');
+    assert.strictEqual(url, 'http://localhost:5407/v1');
+  });
+
+  test('direct LLM URL takes precedence over gateway for unknown providers', () => {
+    const url = resolveProviderBaseUrl('builtin', '', 'http://localhost:8080', 'http://localhost:5407');
+    assert.strictEqual(url, 'http://localhost:8080/v1');
+  });
+
+  test('strips trailing /v1 from explicit base URL before appending', () => {
+    const url = resolveProviderBaseUrl('minimax', 'https://api.minimax.io/v1/', '', 'http://localhost:5407');
+    assert.strictEqual(url, 'https://api.minimax.io/v1');
+  });
+});
+
+suite('Provider Model Resolution', () => {
+  test('minimax API type defaults to MiniMax-M2.7', () => {
+    const model = resolveProviderModel('minimax', '');
+    assert.strictEqual(model, 'MiniMax-M2.7');
+  });
+
+  test('explicit model overrides default', () => {
+    const model = resolveProviderModel('minimax', 'MiniMax-M2.5-highspeed');
+    assert.strictEqual(model, 'MiniMax-M2.5-highspeed');
+  });
+
+  test('openai API type has no default model', () => {
+    const model = resolveProviderModel('openai', '');
+    assert.strictEqual(model, '');
+  });
+
+  test('unknown API type has no default model', () => {
+    const model = resolveProviderModel('builtin', '');
+    assert.strictEqual(model, '');
+  });
+});
+
+suite('MiniMax Provider Detection', () => {
+  test('detects minimax via API type', () => {
+    assert.strictEqual(isMiniMaxProvider('minimax', ''), true);
+  });
+
+  test('detects minimax via base URL containing api.minimax.io', () => {
+    assert.strictEqual(isMiniMaxProvider('openai', 'https://api.minimax.io/v1'), true);
+  });
+
+  test('detects minimax via base URL containing minimax', () => {
+    assert.strictEqual(isMiniMaxProvider('openai', 'https://minimax-proxy.example.com'), true);
+  });
+
+  test('does not detect non-minimax providers', () => {
+    assert.strictEqual(isMiniMaxProvider('openai', 'https://api.openai.com'), false);
+  });
+
+  test('does not detect builtin as minimax', () => {
+    assert.strictEqual(isMiniMaxProvider('builtin', ''), false);
+  });
+});
+
+suite('Temperature Clamping', () => {
+  test('clamps temperature > 1.0 to 1.0 for MiniMax', () => {
+    assert.strictEqual(clampTemperature(1.5, true), 1.0);
+  });
+
+  test('clamps temperature < 0 to 0 for MiniMax', () => {
+    assert.strictEqual(clampTemperature(-0.1, true), 0);
+  });
+
+  test('temperature=0 is valid for MiniMax', () => {
+    assert.strictEqual(clampTemperature(0, true), 0);
+  });
+
+  test('temperature=1.0 is valid for MiniMax', () => {
+    assert.strictEqual(clampTemperature(1.0, true), 1.0);
+  });
+
+  test('temperature=0.5 passes through for MiniMax', () => {
+    assert.strictEqual(clampTemperature(0.5, true), 0.5);
+  });
+
+  test('undefined temperature passes through for MiniMax', () => {
+    assert.strictEqual(clampTemperature(undefined, true), undefined);
+  });
+
+  test('does not clamp temperature for non-MiniMax providers', () => {
+    assert.strictEqual(clampTemperature(2.0, false), 2.0);
+  });
+
+  test('temperature=0.1 (common benchmark value) passes through', () => {
+    assert.strictEqual(clampTemperature(0.1, true), 0.1);
+  });
+});
+
+suite('MiniMax Model Catalog', () => {
+  test('MiniMax preset includes M2.7 model', () => {
+    assert.ok(PROVIDER_PRESETS.minimax.models.includes('MiniMax-M2.7'));
+  });
+
+  test('MiniMax preset includes M2.7-highspeed model', () => {
+    assert.ok(PROVIDER_PRESETS.minimax.models.includes('MiniMax-M2.7-highspeed'));
+  });
+
+  test('MiniMax preset includes M2.5 model', () => {
+    assert.ok(PROVIDER_PRESETS.minimax.models.includes('MiniMax-M2.5'));
+  });
+
+  test('MiniMax preset includes M2.5-highspeed model', () => {
+    assert.ok(PROVIDER_PRESETS.minimax.models.includes('MiniMax-M2.5-highspeed'));
+  });
+
+  test('MiniMax preset has exactly 4 models', () => {
+    assert.strictEqual(PROVIDER_PRESETS.minimax.models.length, 4);
+  });
+
+  test('MiniMax base URL uses https', () => {
+    assert.ok(PROVIDER_PRESETS.minimax.baseUrl.startsWith('https://'));
+  });
+});
+
+suite('Cloud API Detection', () => {
+  test('minimax API type is recognized as cloud API', () => {
+    // isCloudApi logic: LLM_API_TYPE === 'minimax' || isMiniMaxProvider
+    const apiType = 'minimax';
+    const isCloud = apiType === 'openai' || apiType === 'minimax';
+    assert.strictEqual(isCloud, true);
+  });
+
+  test('openai API type is recognized as cloud API', () => {
+    const apiType = 'openai';
+    const isCloud = apiType === 'openai' || apiType === 'minimax';
+    assert.strictEqual(isCloud, true);
+  });
+
+  test('builtin API type is not cloud API', () => {
+    const apiType = 'builtin';
+    const isCloud = apiType === 'openai' || apiType === 'minimax';
+    assert.strictEqual(isCloud, false);
+  });
+});
+
+suite('Config File Validation', () => {
+  const fs = require('fs');
+  const configPath = path.join(__dirname, '..', 'config.yaml');
+
+  test('config.yaml exists', () => {
+    assert.ok(fs.existsSync(configPath));
+  });
+
+  test('config.yaml contains llmProvider parameter', () => {
+    const content = fs.readFileSync(configPath, 'utf8');
+    assert.ok(content.includes('llmProvider'));
+  });
+
+  test('config.yaml contains minimax option for llmProvider', () => {
+    const content = fs.readFileSync(configPath, 'utf8');
+    assert.ok(content.includes('minimax'));
+  });
+
+  test('config.yaml contains minimaxModel parameter', () => {
+    const content = fs.readFileSync(configPath, 'utf8');
+    assert.ok(content.includes('minimaxModel'));
+  });
+
+  test('config.yaml lists MiniMax-M2.7 model', () => {
+    const content = fs.readFileSync(configPath, 'utf8');
+    assert.ok(content.includes('MiniMax-M2.7'));
+  });
+
+  test('config.yaml lists MiniMax-M2.5-highspeed model', () => {
+    const content = fs.readFileSync(configPath, 'utf8');
+    assert.ok(content.includes('MiniMax-M2.5-highspeed'));
+  });
+});
+
+suite('SKILL.md Documentation', () => {
+  const fs = require('fs');
+  const skillMdPath = path.join(__dirname, '..', 'SKILL.md');
+
+  test('SKILL.md documents minimax API type', () => {
+    const content = fs.readFileSync(skillMdPath, 'utf8');
+    assert.ok(content.includes('minimax'));
+  });
+
+  test('SKILL.md documents MINIMAX_API_KEY env var', () => {
+    const content = fs.readFileSync(skillMdPath, 'utf8');
+    assert.ok(content.includes('MINIMAX_API_KEY'));
+  });
+
+  test('SKILL.md includes MiniMax standalone usage example', () => {
+    const content = fs.readFileSync(skillMdPath, 'utf8');
+    assert.ok(content.includes('AEGIS_LLM_API_TYPE=minimax'));
+  });
+
+  test('SKILL.md lists supported providers table', () => {
+    const content = fs.readFileSync(skillMdPath, 'utf8');
+    assert.ok(content.includes('Supported LLM Providers'));
+  });
+});
+
+suite('Script Source Validation', () => {
+  const fs = require('fs');
+  const scriptPath = path.join(__dirname, '..', 'scripts', 'run-benchmark.cjs');
+  const source = fs.readFileSync(scriptPath, 'utf8');
+
+  test('script defines PROVIDER_PRESETS with minimax', () => {
+    assert.ok(source.includes("minimax: {"));
+    assert.ok(source.includes("baseUrl: 'https://api.minimax.io/v1'"));
+  });
+
+  test('script detects MiniMax via isMiniMaxProvider', () => {
+    assert.ok(source.includes('isMiniMaxProvider'));
+  });
+
+  test('script implements temperature clamping for MiniMax', () => {
+    assert.ok(source.includes('Math.max(0, Math.min(1.0, temperature))'));
+  });
+
+  test('script reads MINIMAX_API_KEY as fallback', () => {
+    assert.ok(source.includes('MINIMAX_API_KEY'));
+  });
+
+  test('script uses resolveProviderBaseUrl function', () => {
+    assert.ok(source.includes('resolveProviderBaseUrl'));
+  });
+
+  test('script uses resolveProviderModel function', () => {
+    assert.ok(source.includes('resolveProviderModel'));
+  });
+
+  test('script includes MiniMax in isCloudApi check', () => {
+    assert.ok(source.includes("LLM_API_TYPE === 'minimax'"));
+  });
+});
+
+// ═══════════════════════════════════════════════════════════════════════════
+// INTEGRATION TESTS (require MINIMAX_API_KEY)
+// ═══════════════════════════════════════════════════════════════════════════
+
+async function runIntegrationTests() {
+  const apiKey = process.env.MINIMAX_API_KEY;
+  if (!apiKey) {
+    console.log('\n⏭️  Skipping integration tests (set MINIMAX_API_KEY to enable)');
+    return;
+  }
+
+  console.log('\n═══════════════════════════════════════════════════════════');
+  console.log('  INTEGRATION TESTS (live MiniMax API)');
+  console.log('═══════════════════════════════════════════════════════════');
+
+  const OpenAI = require('openai');
+  const client = new OpenAI({
+    apiKey,
+    baseURL: 'https://api.minimax.io/v1',
+  });
+
+  await asyncTest('MiniMax API responds to simple chat completion', async () => {
+    const response = await client.chat.completions.create({
+      model: 'MiniMax-M2.7',
+      messages: [{ role: 'user', content: 'Reply with exactly: BENCHMARK_OK' }],
+      temperature: 0.1,
+      max_tokens: 20,
+    });
+    assert.ok(response.choices[0].message.content.includes('BENCHMARK_OK'));
+  });
+
+  await asyncTest('MiniMax API supports streaming', async () => {
+    const stream = await client.chat.completions.create({
+      model: 'MiniMax-M2.7',
+      messages: [{ role: 'user', content: 'Reply with exactly one word: hello' }],
+      temperature: 0,
+      stream: true,
+      stream_options: { include_usage: true },
+    });
+    let content = '';
+    let hasUsage = false;
+    for await (const chunk of stream) {
+      if (chunk.choices?.[0]?.delta?.content) content += chunk.choices[0].delta.content;
+      if (chunk.usage) hasUsage = true;
+    }
+    assert.ok(content.toLowerCase().includes('hello'), `Expected "hello" in: ${content}`);
+    assert.ok(hasUsage, 'Expected usage data in stream');
+  });
+
+  await asyncTest('MiniMax API accepts temperature=0', async () => {
+    // Verify the API does not reject temperature=0 (no error thrown)
+    const response = await client.chat.completions.create({
+      model: 'MiniMax-M2.7',
+      messages: [{ role: 'user', content: 'Reply with exactly: TEMP_ZERO_OK' }],
+      temperature: 0,
+      max_tokens: 20,
+    });
+    assert.ok(response.choices && response.choices.length > 0, 'Expected at least one choice');
+    assert.ok(response.choices[0].message.content.length > 0, 'Expected non-empty response');
+  });
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// MAIN
+// ═══════════════════════════════════════════════════════════════════════════
+
+async function main() {
+  console.log('╔═════════════════════════════════════════════════════╗');
+  console.log('║ MiniMax Provider Tests • Home Security AI Benchmark ║');
+  console.log('╚═════════════════════════════════════════════════════╝');
+
+  // Integration tests
+  await runIntegrationTests();
+
+  // Summary
+  console.log(`\n${'═'.repeat(60)}`);
+  console.log(`  RESULTS: ${passed} passed, ${failed} failed (${passed + failed} total)`);
+  console.log(`${'═'.repeat(60)}`);
+
+  if (failures.length > 0) {
+    console.log('\n  Failures:');
+    for (const f of failures) {
+      console.log(`    ❌ ${f.name}: ${f.error}`);
+    }
+  }
+
+  process.exit(failed > 0 ? 1 : 0);
+}
+
+main();
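
---

Taken together, the patch routes every endpoint decision through the provider presets and clamps temperature only for MiniMax. Below is a minimal standalone sketch of that combined behavior; the function names mirror the patch, but the Aegis environment plumbing and the OpenAI client are deliberately omitted, so treat it as an illustration rather than the shipped code.

```javascript
// Sketch of the provider-resolution and temperature-clamping logic added in
// the patch above (mirrors run-benchmark.cjs; Aegis env handling omitted).

const PROVIDER_PRESETS = {
  minimax: { baseUrl: 'https://api.minimax.io/v1', defaultModel: 'MiniMax-M2.7' },
  openai: { baseUrl: 'https://api.openai.com/v1', defaultModel: '' },
};

// Normalize by removing any trailing /v1 before re-appending it once
const strip = (u) => u.replace(/\/v1\/?$/, '');

// Priority: explicit base URL > provider preset > direct llama-server > gateway
function resolveBaseUrl({ apiType, baseUrl = '', llmUrl = '', gatewayUrl }) {
  if (baseUrl) return `${strip(baseUrl)}/v1`;
  const preset = PROVIDER_PRESETS[apiType];
  if (preset) return preset.baseUrl;
  if (llmUrl) return `${strip(llmUrl)}/v1`;
  return `${gatewayUrl}/v1`;
}

// MiniMax accepts temperature in [0, 1]; other providers pass through untouched
function clampTemperature(temp, isMiniMax) {
  if (isMiniMax && temp !== undefined) return Math.max(0, Math.min(1, temp));
  return temp;
}

console.log(resolveBaseUrl({ apiType: 'minimax', gatewayUrl: 'http://localhost:5407' }));
// → https://api.minimax.io/v1
console.log(clampTemperature(1.5, true));  // → 1
console.log(clampTemperature(1.5, false)); // → 1.5
```

Note that an explicit base URL still wins over the preset, so the `minimax` API type can be pointed at a proxy without code changes.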