
Develop#175

Merged
solderzzc merged 12 commits into master from develop on Mar 25, 2026

Conversation

@solderzzc
Member

No description provided.

solderzzc and others added 4 commits March 20, 2026 23:53
…anity check

- Add MODEL_FAMILIES config table with per-model API params and server flags
- Add getModelApiParams() helper to inject reasoning_effort:none for Mistral
- Add delta.thinking fallback in streaming loop to capture thinking tokens
- Add streaming sanity check before benchmark run (detects empty-token loops)
- Add test-model-config.cjs with 17 unit tests for model detection logic
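The `MODEL_FAMILIES` table and `getModelApiParams()` helper from this commit can be sketched roughly as below. The table structure, match rule, and the `--jinja` flag are assumptions; only the names `MODEL_FAMILIES`, `getModelApiParams`, and the `reasoning_effort: "none"` injection for Mistral come from the commit message.

```javascript
// Sketch of the per-family config table described above.
// Field names beyond apiParams/serverFlags are assumptions.
const MODEL_FAMILIES = {
  mistral: {
    match: /mistral/i,
    // The benchmark injects reasoning_effort: "none" into every
    // request for Mistral models (per the commit message).
    apiParams: { reasoning_effort: "none" },
    serverFlags: ["--jinja"], // hypothetical flag list
  },
  qwen: {
    match: /qwen/i,
    apiParams: {},
    serverFlags: [],
  },
};

// Return the extra API params for a model name, or {} when the
// model belongs to no known family.
function getModelApiParams(modelName) {
  for (const family of Object.values(MODEL_FAMILIES)) {
    if (family.match.test(modelName)) return { ...family.apiParams };
  }
  return {};
}
```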
Add Mistral Small 4 (119B, IQ1_M + Q2_K_XL), NVIDIA Nemotron-3-Nano
(4B + 30B), Liquid LFM2 (1.2B + 24B), Qwen3.5-9B BF16, and
Qwen3.5-27B Q8_K_XL to the benchmark paper.

Key updates:
- Abstract: 7→16 models, 5 families, best local now 95.8%
- Models Under Test table: grouped by family, 16 rows
- Overall Scorecard: full 16-model ranking
- Key Finding 3: quantization precision > parameter count
- Conclusion: Qwen3.5-27B Q8 at 95.8%, Mistral-119B at 89.6%
Add Nemotron and LFM2 model families to MODEL_FAMILIES with
minTemperature: 1.0 — these models reject temperature < 1.0 with
HTTP 400. The benchmark now clamps temperature to the family minimum
before sending the request.

- Refactor getModelApiParams → getModelFamily (returns full config)
- Add resolveTemperature logic in llmCall params builder
- Update test-model-config.cjs: 27 tests including temperature clamp
- Fix Mistral serverFlags to match current llm-server-manager.cjs
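The temperature clamp described in this commit can be sketched as follows. The family entries and helper signatures are assumptions inferred from the commit message; `getModelFamily` and `resolveTemperature` are the names it mentions.

```javascript
// Families whose servers reject temperature < 1.0 with HTTP 400
// get a minTemperature floor (per the commit message).
const FAMILIES = [
  { match: /nemotron/i, minTemperature: 1.0 },
  { match: /lfm2/i, minTemperature: 1.0 },
];

// Return the full family config for a model name, or null.
function getModelFamily(modelName) {
  return FAMILIES.find((f) => f.match.test(modelName)) || null;
}

// Clamp the requested temperature to the family minimum before the
// request is sent, so the server never sees an out-of-range value.
function resolveTemperature(modelName, requested) {
  const family = getModelFamily(modelName);
  if (family && family.minTemperature !== undefined) {
    return Math.max(requested, family.minTemperature);
  }
  return requested;
}
```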

@codeCraft-Ritik left a comment

Great work! The implementation is clean and easy to understand.

solderzzc and others added 8 commits March 23, 2026 15:21
llama-server with Qwen3.5/Claude-distilled models outputs thinking
as 'Let me analyze...' plain text in delta.content (no <think> tags,
no separate reasoning field). The JSON-expect abort was firing at
50 chars, killing the request after 8-10 tokens before the model
could output actual JSON.

Changes:
- Raised JSON content check threshold from 50 to 200 chars
- Strip common plain-text reasoning prefixes before checking
- Only abort if 200+ chars of non-JSON, non-reasoning content
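A minimal sketch of the relaxed abort check, assuming a helper of this shape exists in the streaming loop. The 200-char threshold and the "Let me analyze..." prefix come from the commit message; the prefix list and function name are assumptions.

```javascript
// Plain-text reasoning prefixes stripped before the length check.
// "Let me analyze" is from the commit; the rest are assumed examples.
const REASONING_PREFIXES = [
  /^Let me analyze/i,
  /^Let's think/i,
  /^First,/i,
];

// Decide whether a JSON-expected stream should be aborted given the
// content buffered so far.
function shouldAbortJsonStream(buffered) {
  // Never abort once a JSON start token has appeared.
  if (buffered.includes("{") || buffered.includes("[")) return false;
  // Strip a leading plain-text reasoning prefix before measuring.
  let text = buffered;
  for (const prefix of REASONING_PREFIXES) {
    text = text.replace(prefix, "");
  }
  // Abort only after 200+ chars of non-JSON, non-reasoning content;
  // the old 50-char threshold killed requests after 8-10 tokens.
  return text.trim().length >= 200;
}
```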
For JSON-expected tests, delta.content text arriving before any
JSON start ({/[) is now routed to reasoningContent, not content.
This handles llama-server with Qwen3.5-Claude models where thinking
appears as plain text in delta.content without <think> tags.

The model's thinking is logged/shown but NOT evaluated as output.
Only the actual JSON content (after the first {/[) is treated as
the model's response for test evaluation.
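The routing above can be sketched as a small state machine over streamed chunks. The state field names (`jsonStarted`, `reasoningContent`, `content`) are assumptions; the split-at-first-`{`/`[` behavior is what the commit describes.

```javascript
// Route one streamed delta.content chunk: before the first { or [,
// text is treated as thinking; after it, text is the model's answer.
function routeDelta(state, deltaText) {
  if (!state.jsonStarted) {
    const idx = deltaText.search(/[{\[]/);
    if (idx === -1) {
      // Pure plain-text thinking: logged/shown, but kept out of the
      // content that gets evaluated.
      state.reasoningContent += deltaText;
      return;
    }
    // JSON starts mid-chunk: split at the first brace/bracket.
    state.reasoningContent += deltaText.slice(0, idx);
    state.content += deltaText.slice(idx);
    state.jsonStarted = true;
    return;
  }
  state.content += deltaText;
}
```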
- Docker-only deployment for all platforms (Linux, macOS, Windows)
- Docker Desktop 4.35+ USB/IP for macOS/Windows USB passthrough
- YOLO26n Edge TPU model (INT8, 320x320, ~4ms inference)
- pycoral-based inference with CPU fallback
- JSONL stdin/stdout protocol (same as yolo-detection-2026)
- deploy.sh/deploy.bat for autonomous Docker image build
- Colab/Kaggle compilation script for Edge TPU model
- TPU device selector and clock speed config
- Docker deployment using official openvino/ubuntu22_runtime image
- Supports Intel NCS2 (MYRIAD), Intel GPU (iGPU/Arc), and CPU
- AUTO device selector lets OpenVINO pick best available
- FP16/INT8/FP32 precision options
- YOLO26n with Ultralytics OpenVINO backend
- JSONL stdin/stdout protocol (same as yolo-detection-2026)
- Colab script for model export (runs on any platform)
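Both detection skills speak the same JSONL stdin/stdout protocol: one JSON request per line in, one JSON response per line out. A per-line handler might look like the sketch below (the message fields `id`, `detections`, and the error shape are assumptions; in the real skill this would be wired to `process.stdin` via `readline`).

```javascript
// Handle one line of the JSONL protocol: parse the request, return
// the serialized response. Invalid JSON is reported rather than
// silently dropped (matching the detect.py logging change below).
function handleJsonlLine(line) {
  let request;
  try {
    request = JSON.parse(line);
  } catch {
    return JSON.stringify({ error: "invalid JSON", raw: line });
  }
  // A real skill would run inference on the requested frame here;
  // this sketch just echoes an empty detection list.
  return JSON.stringify({ id: request.id, detections: [] });
}
```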
… update

OpenVINO detect.py:
- Add file_read timing metric (matches Coral TPU)
- Add frame-not-found guard in main loop (empty detections response)
- Add invalid JSON log message instead of silent continue

OpenVINO SKILL.md:
- Add description fields to all parameters
- Add Platform Setup (Linux/macOS/Windows) section
- Add Model section with compile instructions
- Add Bounding Box Format section

OpenVINO deploy.sh:
- Add find_docker() function pattern
- Add exit code 2 for partial success (CPU-only)
- Add architecture to platform progress event
- Add accelerator_found field in complete event

OpenVINO deploy.bat:
- Add Docker version reporting
- Add device probe result checking

New: scripts/compile_model.py
- Local model export (--model, --size, --precision, --output)
- FP16/INT8/FP32 via YOLO.export(format=openvino)

README.md:
- Add Coral TPU and OpenVINO to Skill Catalog (🧪 Testing)
- Add Detection & Segmentation Skills architecture section
- Add mermaid diagram showing native vs Docker detection paths
- Add LLM-Assisted Skill Installation explanation
feat: add YOLO 2026 Coral TPU detection skill (Docker-based)
@solderzzc merged commit a275e40 into master on Mar 25, 2026
1 check passed
2 participants