diff --git a/README.md b/README.md index 3b6630a..12d80bc 100644 --- a/README.md +++ b/README.md @@ -30,6 +30,8 @@ Dyana is a sandbox environment using Docker and [Tracee](https://github.com/aquasecurity/tracee) for loading, running and profiling a wide range of files, including machine learning models, ELF executables, Pickle serialized files, Javascripts [and more](https://docs.dreadnode.io/open-source/dyana/topics/loaders). It provides detailed insights into GPU memory usage, filesystem interactions, network requests, and security related events. +It also includes a lightweight host-side planning command, `dyana fit`, that recommends models likely to fit the current machine based on available RAM, GPU memory, and detected local runtimes. + ## Installation Install with: @@ -70,6 +72,24 @@ uv run pytest dyana See our docs on dyana usage [here](https://docs.dreadnode.io/open-source/dyana/basic-usage) +Quick example: + +```bash +dyana fit --use-case coding --top-k 5 +``` + +Constrain the recommendation surface: + +```bash +dyana fit --use-case coding --runtime ollama --max-memory-gb 12 --explain-excluded +``` + +Get Dyana-native recommendations for the `automodel` loader: + +```bash +dyana fit --use-case coding --runtime automodel +``` + ## License Dyana is released under the [MIT license](LICENSE). Tracee is released under the [Apache 2.0 license](third_party_licenses/APACHE2.md). 
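The recommendations above come from a simple bytes-per-parameter memory heuristic rather than benchmark-backed throughput estimates. A minimal standalone sketch of that estimate (constants mirror `QUANTIZATION_BYTES_PER_PARAM` and `estimate_model_memory_gb` in `dyana/fit/engine.py` later in this diff; the ~15% overhead factor is the planner's allowance for KV cache and runtime buffers, not a measured value):

```python
# Sketch of the fit planner's memory heuristic; the constants below mirror
# QUANTIZATION_BYTES_PER_PARAM in dyana/fit/engine.py from this change.
QUANTIZATION_BYTES_PER_PARAM = {
    "Q4_K_M": 0.62,
    "Q6_K": 0.85,
    "Q8_0": 1.05,
    "F16": 2.10,
}


def estimate_model_memory_gb(params_b: float, quantization: str) -> float:
    # weights at the quantized width, plus ~15% for KV cache and runtime buffers
    return round(params_b * QUANTIZATION_BYTES_PER_PARAM[quantization] * 1.15, 1)


print(estimate_model_memory_gb(7.0, "Q4_K_M"))  # 7.0 * 0.62 * 1.15 -> 5.0
```

This is why a 7B model shows up as roughly 5 GiB at Q4 but roughly 17 GiB at F16 in the fit output.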
diff --git a/docs/basic-usage.md b/docs/basic-usage.md index 796b442..9268ef9 100644 --- a/docs/basic-usage.md +++ b/docs/basic-usage.md @@ -12,6 +12,30 @@ Show help for a specific loader: dyana help automodel ``` +Plan model choices against the current machine before tracing: + +```bash +dyana fit --use-case coding --top-k 5 +``` + +Emit machine-readable fit recommendations: + +```bash +dyana fit --use-case general --json +``` + +Restrict the planner to a specific runtime and memory budget: + +```bash +dyana fit --use-case coding --runtime ollama --max-memory-gb 12 --explain-excluded +``` + +Plan specifically for Dyana's built-in automodel loader: + +```bash +dyana fit --use-case coding --runtime automodel +``` + Create a trace file for a loader run: ```bash @@ -36,6 +60,8 @@ Show a summary of a trace file: dyana summary --trace-path trace.json ``` +`dyana fit` is host-side only. It does not start Docker, pull models, or execute artifacts. It is intended as a quick planning step before a real traced run. + ## Default Safeguards Network access is disabled by default for loader containers. Allow it explicitly when needed: diff --git a/docs/fit.md b/docs/fit.md new file mode 100644 index 0000000..7571c52 --- /dev/null +++ b/docs/fit.md @@ -0,0 +1,90 @@ +# Fit Planning + +`dyana fit` recommends a small set of models that are likely to fit the current machine. + +Unlike `dyana trace`, this command is host-side only: + +- it does not start Docker +- it does not run loaders +- it does not download models +- it does not execute artifacts + +It is meant to answer a narrower question first: what is even worth trying on this hardware? 
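For automation, the `--json` view shown above can be post-processed directly, since it follows the `FitResult` model added in `dyana/fit/models.py`. A sketch with an abbreviated, hand-written payload standing in for real `dyana fit --json` output (values borrowed from this change's test fixtures):

```python
import json

# Abbreviated, hypothetical `dyana fit --json` payload; field names follow the
# FitResult / FitRecommendation models introduced in this change.
payload = """
{
  "use_case": "coding",
  "recommendations": [
    {"model": "Qwen2.5-Coder 7B Instruct", "runtime": "ollama",
     "quantization": "Q8_0", "estimated_memory_gb": 8.5,
     "invocation_hint": "ollama run qwen2.5-coder:7b"}
  ],
  "excluded": []
}
"""

result = json.loads(payload)
for rec in result["recommendations"]:
    # print each recommendation with its ready-to-run invocation hint
    print(f"{rec['model']} ({rec['quantization']}, ~{rec['estimated_memory_gb']} GiB): {rec['invocation_hint']}")
```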
+ +## What It Uses + +The current prototype looks at: + +- total system RAM +- detected NVIDIA GPU memory, if present +- Apple Silicon unified memory heuristics on `Darwin arm64` +- detected runtimes such as Dyana `automodel`, `ollama`, `llama.cpp`, and `mlx` +- a packaged local model and provider catalog + +## Examples + +Recommend coding-oriented models: + +```bash +dyana fit --use-case coding --top-k 5 +``` + +Get JSON output for automation: + +```bash +dyana fit --use-case general --top-k 3 --json +``` + +Limit results to a specific runtime and budget: + +```bash +dyana fit --use-case coding --runtime ollama --max-memory-gb 12 +``` + +Prefer a Dyana-native execution path: + +```bash +dyana fit --use-case coding --runtime automodel +``` + +Explain why some candidates were excluded: + +```bash +dyana fit --use-case coding --explain-excluded +``` + +## Output + +The text view shows: + +- detected hardware summary +- detected runtimes +- ranked recommendations +- estimated memory use +- runtime and quantization choice +- a short rationale for each recommendation +- provider-specific artifact and invocation hints +- optional exclusion reasons for rejected candidates + +The JSON view includes the same information in a machine-readable structure. + +## Preferences + +The planner supports a small set of opinionated controls: + +- `--runtime` to limit results to `automodel`, `ollama`, `mlx`, or `llama_cpp` +- `--max-memory-gb` to cap the effective memory budget +- `--preference balanced|quality|speed` to nudge quantization ranking +- `--explain-excluded` to include a short rejection reason for excluded candidates + +## Current Scope + +This is intentionally lightweight. The prototype: + +- uses simple fit heuristics instead of benchmark-backed throughput estimates +- ranks a packaged local catalog rather than a large external model index +- focuses on fit and practical starting points, not exhaustive provider support + +The command is a planning tool. 
For real artifact execution and profiling, continue to use `dyana trace`. + +When the selected provider is `automodel`, the recommendation includes a Dyana invocation hint using `dyana trace --loader automodel`. diff --git a/docs/index.md b/docs/index.md index 1d190ef..a9e9fc2 100644 --- a/docs/index.md +++ b/docs/index.md @@ -2,12 +2,19 @@ Dyana is a sandbox environment using Docker and [Tracee](https://github.com/aquasecurity/tracee) for loading, running, and profiling a wide range of files, including machine learning models, ELF executables, pickle files, JavaScript, and more. +In addition to trace-time inspection, Dyana includes a small host-side planning surface for choosing models that are likely to fit your hardware before you run anything. + It provides visibility into: - GPU memory usage - Filesystem interactions - Network requests - Security-relevant runtime events +- Model fit recommendations for the current host + +## Fit Planning + +Use [`dyana fit`](fit.md) to rank a compact set of model recommendations against the current machine's RAM, GPU or unified memory budget, and detected local runtimes such as Ollama or MLX. 
## Loaders diff --git a/dyana/cli.py b/dyana/cli.py index 6ba696f..7ded4ec 100644 --- a/dyana/cli.py +++ b/dyana/cli.py @@ -2,6 +2,11 @@ import pathlib import platform as platform_pkg +import typer +from rich import box +from rich import print as rich_print +from rich.table import Table + try: import cysimdjson @@ -9,17 +14,14 @@ except ImportError: _HAS_CYSIMDJSON = False -import typer -from rich import box -from rich import print as rich_print -from rich.table import Table - import dyana.loaders as loaders_pkg +from dyana.fit import detect_hardware, fit_result_json, recommend_models from dyana.loaders.loader import Loader from dyana.tracer.tracee import Tracer from dyana.view import ( view_disk_events, view_disk_usage, + view_fit, view_gpus, view_header, view_imports, @@ -45,6 +47,35 @@ ) +@cli.command(help="Recommend models that fit the current machine.") +def fit( + use_case: str = typer.Option(help="Target workload, e.g. coding, chat, reasoning, general.", default="general"), + top_k: int = typer.Option(help="Number of recommendations to return.", default=5), + runtime: str | None = typer.Option(help="Limit results to a specific runtime, e.g. 
automodel, ollama, mlx, llama_cpp.", default=None), + max_memory_gb: float | None = typer.Option(help="Override the available memory budget in GiB.", default=None), + preference: str = typer.Option(help="Ranking preference: balanced, quality, or speed.", default="balanced"), + explain_excluded: bool = typer.Option(False, help="Include a short explanation for excluded candidates."), + json_output: bool = typer.Option(False, "--json", help="Emit recommendations as JSON."), +) -> None: + if preference not in {"balanced", "quality", "speed"}: + raise typer.BadParameter("preference must be one of: balanced, quality, speed") + + result = recommend_models( + detect_hardware(), + use_case=use_case, + top_k=top_k, + runtime=runtime, + max_memory_gb=max_memory_gb, + preference=preference, + explain_excluded=explain_excluded, + ) + + if json_output: + rich_print(fit_result_json(result)) + else: + view_fit(result.model_dump()) + + @cli.command( help="Show the available loaders.", ) diff --git a/dyana/cli_test.py b/dyana/cli_test.py index 433eb49..4354ae2 100644 --- a/dyana/cli_test.py +++ b/dyana/cli_test.py @@ -49,6 +49,15 @@ def test_loaders_help(self) -> None: assert result.exit_code == 0 assert "--build" in _strip_ansi(result.output) + def test_fit_help(self) -> None: + result = runner.invoke(cli, ["fit", "--help"]) + assert result.exit_code == 0 + output = _strip_ansi(result.output) + assert "--use-case" in output + assert "--top-k" in output + assert "--runtime" in output + assert "--max-memory-gb" in output + class TestSummaryCommand: def test_summary_with_modern_trace(self, tmp_path: t.Any) -> None: @@ -137,6 +146,72 @@ def test_summary_missing_file(self) -> None: assert result.exit_code != 0 + +class TestFitCommand: + def test_fit_text_output(self) -> None: + fake_result: dict[str, t.Any] = { + "hardware": { + "platform": "Linux", + "arch": "x86_64", + "total_ram_gb": 64.0, + "gpu_name": "RTX 4090", + "gpu_count": 1, + "total_vram_gb": 24.0, + "unified_memory": False, + 
"runtimes": {"automodel": True, "ollama": True, "llama_cpp": False, "mlx": False}, + }, + "use_case": "coding", + "runtime_filter": None, + "max_memory_gb": None, + "recommendations": [ + { + "model_id": "qwen25-coder-7b", + "model": "Qwen2.5-Coder 7B Instruct", + "runtime": "ollama", + "provider": "Ollama", + "quantization": "Q8_0", + "mode": "gpu", + "estimated_memory_gb": 8.5, + "score": 92, + "rationale": "Fits comfortably.", + "artifact_hint": "Use an Ollama model tag.", + "invocation_hint": "ollama run qwen2.5-coder:7b", + } + ], + "excluded": [], + } + + with ( + patch("dyana.cli.detect_hardware"), + patch("dyana.cli.recommend_models") as mock_recommend, + ): + mock_recommend.return_value.model_dump.return_value = fake_result + result = runner.invoke(cli, ["fit", "--use-case", "coding"]) + + assert result.exit_code == 0 + output = _strip_ansi(result.output) + assert "Hardware" in output + assert "Qwen2.5-Coder 7B Instruct" in output + assert "ollama run qwen2.5-coder:7b" in output + assert "automodel" in output + + def test_fit_json_output(self) -> None: + payload = json.dumps({"use_case": "general", "recommendations": [], "excluded": []}) + with ( + patch("dyana.cli.detect_hardware"), + patch("dyana.cli.recommend_models"), + patch("dyana.cli.fit_result_json", return_value=payload), + ): + result = runner.invoke(cli, ["fit", "--json"]) + + assert result.exit_code == 0 + assert json.loads(result.output)["use_case"] == "general" + + def test_fit_rejects_invalid_preference(self) -> None: + result = runner.invoke(cli, ["fit", "--preference", "unknown"]) + assert result.exit_code == 2 + assert "preference must be one of" in _strip_ansi(result.output) + + def _noop_loader_init(self: t.Any, **kwargs: t.Any) -> None: self.name = kwargs.get("name", "automodel") self.settings = None diff --git a/dyana/data/__init__.py b/dyana/data/__init__.py new file mode 100644 index 0000000..a43adf0 --- /dev/null +++ b/dyana/data/__init__.py @@ -0,0 +1 @@ +# Package data container 
for Dyana. diff --git a/dyana/data/models.json b/dyana/data/models.json new file mode 100644 index 0000000..7a890f7 --- /dev/null +++ b/dyana/data/models.json @@ -0,0 +1,68 @@ +[ + { + "id": "qwen25-coder-7b", + "name": "Qwen2.5-Coder 7B Instruct", + "family": "qwen", + "use_cases": ["coding", "chat"], + "params_b": 7.0, + "context_k": 128, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["qwen2.5-coder:7b"] + }, + { + "id": "qwen25-coder-14b", + "name": "Qwen2.5-Coder 14B Instruct", + "family": "qwen", + "use_cases": ["coding", "chat"], + "params_b": 14.0, + "context_k": 128, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["qwen2.5-coder:14b"] + }, + { + "id": "deepseek-r1-distill-qwen-7b", + "name": "DeepSeek-R1-Distill-Qwen-7B", + "family": "deepseek", + "use_cases": ["reasoning", "coding"], + "params_b": 7.0, + "context_k": 32, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["deepseek-r1-distill-qwen:7b"] + }, + { + "id": "llama31-8b", + "name": "Llama 3.1 8B Instruct", + "family": "llama", + "use_cases": ["chat", "general"], + "params_b": 8.0, + "context_k": 128, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["llama3.1:8b"] + }, + { + "id": "qwen25-3b", + "name": "Qwen2.5 3B Instruct", + "family": "qwen", + "use_cases": ["chat", "general"], + "params_b": 3.0, + "context_k": 32, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["qwen2.5:3b"] + }, + { + "id": "gemma3-4b", + "name": "Gemma 3 4B Instruct", + "family": "gemma", + "use_cases": ["chat", "general", "coding"], + 
"params_b": 4.0, + "context_k": 128, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["gemma3:4b"] + } +] diff --git a/dyana/data/providers.json b/dyana/data/providers.json new file mode 100644 index 0000000..4c831fa --- /dev/null +++ b/dyana/data/providers.json @@ -0,0 +1,42 @@ +[ + { + "id": "automodel", + "name": "Dyana AutoModel", + "runtime_key": "automodel", + "supported_modes": ["gpu", "cpu", "unified"], + "preferred_on": ["gpu", "unified", "cpu"], + "quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "artifact_hint": "Use a local Hugging Face model directory or weights path with Dyana's automodel loader.", + "invocation_template": "dyana trace --loader automodel -- --model /path/to/{model_id}" + }, + { + "id": "mlx", + "name": "MLX", + "runtime_key": "mlx", + "supported_modes": ["unified"], + "preferred_on": ["unified"], + "quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "artifact_hint": "Use MLX-converted weights or an MLX-native model package.", + "invocation_template": "mlx_lm.generate --model {model_id}" + }, + { + "id": "ollama", + "name": "Ollama", + "runtime_key": "ollama", + "supported_modes": ["gpu", "cpu", "unified"], + "preferred_on": ["gpu", "unified"], + "quantizations": ["Q4_K_M", "Q6_K", "Q8_0"], + "artifact_hint": "Use an Ollama model tag or local Modelfile import.", + "invocation_template": "ollama run {model_id}" + }, + { + "id": "llama_cpp", + "name": "llama.cpp", + "runtime_key": "llama_cpp", + "supported_modes": ["gpu", "cpu", "unified"], + "preferred_on": ["gpu", "cpu", "unified"], + "quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "artifact_hint": "Use a GGUF artifact compatible with llama.cpp.", + "invocation_template": "llama-cli -m /path/to/{model_id}.gguf" + } +] diff --git a/dyana/fit/__init__.py b/dyana/fit/__init__.py new file mode 100644 index 0000000..0940509 --- /dev/null +++ b/dyana/fit/__init__.py @@ -0,0 +1,30 @@ 
+from dyana.fit.catalog import load_catalog +from dyana.fit.engine import estimate_model_memory_gb, fit_result_json, recommend_models +from dyana.fit.hardware import detect_hardware, detect_nvidia_gpu +from dyana.fit.models import ( + ExcludedCandidate, + FitCatalog, + FitRecommendation, + FitResult, + HardwareProfile, + ModelSpec, + ProviderSpec, + RuntimeAvailability, +) + +__all__ = [ + "FitCatalog", + "FitRecommendation", + "FitResult", + "HardwareProfile", + "ModelSpec", + "ProviderSpec", + "RuntimeAvailability", + "detect_hardware", + "detect_nvidia_gpu", + "estimate_model_memory_gb", + "ExcludedCandidate", + "fit_result_json", + "load_catalog", + "recommend_models", +] diff --git a/dyana/fit/catalog.py b/dyana/fit/catalog.py new file mode 100644 index 0000000..191ee41 --- /dev/null +++ b/dyana/fit/catalog.py @@ -0,0 +1,18 @@ +from __future__ import annotations + +import json +from importlib.resources import files +from typing import Any, cast + +from dyana.fit.models import FitCatalog + + +def _read_catalog_file(name: str) -> list[dict[str, Any]]: + resource = files("dyana.data").joinpath(name) + return cast(list[dict[str, Any]], json.loads(resource.read_text())) + + +def load_catalog() -> FitCatalog: + providers = _read_catalog_file("providers.json") + models = _read_catalog_file("models.json") + return FitCatalog.model_validate({"providers": providers, "models": models}) diff --git a/dyana/fit/engine.py b/dyana/fit/engine.py new file mode 100644 index 0000000..958b3b1 --- /dev/null +++ b/dyana/fit/engine.py @@ -0,0 +1,216 @@ +from __future__ import annotations + +from dyana.fit.catalog import load_catalog +from dyana.fit.models import ( + ExcludedCandidate, + FitCatalog, + FitRecommendation, + FitResult, + HardwareProfile, + ModelSpec, + ProviderSpec, +) + +QUANTIZATION_BYTES_PER_PARAM: dict[str, float] = { + "Q4_K_M": 0.62, + "Q6_K": 0.85, + "Q8_0": 1.05, + "F16": 2.10, +} + + +def _safe_round(value: float) -> float: + return round(value, 1) + + +def 
estimate_model_memory_gb(params_b: float, quantization: str) -> float: + bytes_per_param = QUANTIZATION_BYTES_PER_PARAM[quantization] + return _safe_round(params_b * bytes_per_param * 1.15) + + +def _runtime_enabled(hardware: HardwareProfile, provider: ProviderSpec) -> bool: + return bool(getattr(hardware.runtimes, provider.runtime_key, False)) + + +def _mode_capacity_gb(hardware: HardwareProfile, mode: str) -> float: + if mode == "unified": + return _safe_round(hardware.total_ram_gb * 0.7) + if mode == "gpu": + return hardware.total_vram_gb or 0.0 + return _safe_round(hardware.total_ram_gb * 0.6) + + +def _provider_viable_modes(hardware: HardwareProfile, provider: ProviderSpec) -> list[str]: + modes: list[str] = [] + for mode in provider.supported_modes: + if mode == "unified" and hardware.unified_memory: + modes.append(mode) + elif mode == "gpu" and hardware.total_vram_gb and hardware.total_vram_gb > 0: + modes.append(mode) + elif mode == "cpu": + modes.append(mode) + return modes + + +def _use_case_bonus(model: ModelSpec, requested_use_case: str) -> int: + if requested_use_case in model.use_cases: + return 18 + if requested_use_case == "coding" and "reasoning" in model.use_cases: + return 8 + return 0 + + +def _runtime_bonus(provider: ProviderSpec, mode: str) -> int: + score = 0 + if mode in provider.preferred_on: + score += 6 + runtime_bonuses = {"mlx": 4, "ollama": 3, "llama_cpp": 2} + return score + runtime_bonuses.get(provider.runtime_key, 0) + + +def _provider_map(catalog: FitCatalog) -> dict[str, ProviderSpec]: + return {provider.id: provider for provider in catalog.providers} + + +def _preferred_quantizations(preference: str) -> list[str]: + if preference == "quality": + return ["F16", "Q8_0", "Q6_K", "Q4_K_M"] + if preference == "speed": + return ["Q4_K_M", "Q6_K", "Q8_0", "F16"] + return ["Q8_0", "Q6_K", "Q4_K_M", "F16"] + + +def _quantization_bonus(quantization: str, preference: str) -> int: + quality_bonus = {"F16": 7, "Q8_0": 5, "Q6_K": 3, "Q4_K_M": 
1} + speed_bonus = {"Q4_K_M": 7, "Q6_K": 5, "Q8_0": 3, "F16": 1} + balanced_bonus = {"Q8_0": 5, "Q6_K": 4, "Q4_K_M": 3, "F16": 2} + if preference == "quality": + return quality_bonus[quantization] + if preference == "speed": + return speed_bonus[quantization] + return balanced_bonus[quantization] + + +def recommend_models( + hardware: HardwareProfile, + use_case: str = "general", + top_k: int = 5, + runtime: str | None = None, + max_memory_gb: float | None = None, + preference: str = "balanced", + explain_excluded: bool = False, + catalog: FitCatalog | None = None, +) -> FitResult: + active_catalog = catalog or load_catalog() + providers = _provider_map(active_catalog) + recommendations: list[FitRecommendation] = [] + excluded: list[ExcludedCandidate] = [] + + for model in active_catalog.models: + best: FitRecommendation | None = None + model_excluded_reasons: list[ExcludedCandidate] = [] + for provider_id in model.supported_providers: + provider = providers[provider_id] + if runtime and provider.runtime_key != runtime: + model_excluded_reasons.append( + ExcludedCandidate( + model_id=model.id, + model=model.name, + provider=provider.runtime_key, + reason=f"runtime filter excludes provider '{provider.runtime_key}'", + ) + ) + continue + if not _runtime_enabled(hardware, provider): + model_excluded_reasons.append( + ExcludedCandidate( + model_id=model.id, + model=model.name, + provider=provider.runtime_key, + reason=f"runtime '{provider.runtime_key}' is not available on this host", + ) + ) + continue + + for mode in _provider_viable_modes(hardware, provider): + capacity_gb = _mode_capacity_gb(hardware, mode) + if max_memory_gb is not None: + capacity_gb = min(capacity_gb, max_memory_gb) + + shared_quants = [quant for quant in _preferred_quantizations(preference) if quant in model.supported_quantizations and quant in provider.quantizations] + if not shared_quants: + model_excluded_reasons.append( + ExcludedCandidate( + model_id=model.id, + model=model.name, + 
provider=provider.runtime_key, + reason="no shared quantization between model and provider", + ) + ) + continue + for quantization in shared_quants: + estimated_memory_gb = estimate_model_memory_gb(model.params_b, quantization) + headroom_gb = _safe_round(capacity_gb - estimated_memory_gb) + if headroom_gb < 0: + if explain_excluded: + model_excluded_reasons.append( + ExcludedCandidate( + model_id=model.id, + model=model.name, + provider=provider.runtime_key, + reason=( + f"{quantization} needs ~{estimated_memory_gb} GiB but only " + f"{capacity_gb} GiB is available in {mode} mode" + ), + ) + ) + continue + + score = 20 + score += _use_case_bonus(model, use_case) + score += _runtime_bonus(provider, mode) + score += min(int(headroom_gb * 1.5), 12) + score += min(model.context_k // 32, 6) + score += _quantization_bonus(quantization, preference) + + rationale = ( + f"Fits in {mode} memory with ~{headroom_gb} GiB headroom using {quantization}; " + f"good match for {use_case} via {provider.name}." 
+ ) + candidate = FitRecommendation( + model_id=model.id, + model=model.name, + family=model.family, + use_case=use_case, + runtime=provider.runtime_key, + provider=provider.name, + quantization=quantization, + mode=mode, + estimated_memory_gb=estimated_memory_gb, + headroom_gb=headroom_gb, + score=min(score, 100), + rationale=rationale, + artifact_hint=provider.artifact_hint, + invocation_hint=provider.invocation_template.format(model_id=model.aliases[0] if model.aliases else model.id), + ) + if best is None or candidate.score > best.score: + best = candidate + + if best is not None: + recommendations.append(best) + elif explain_excluded and model_excluded_reasons: + excluded.append(model_excluded_reasons[0]) + + recommendations.sort(key=lambda item: (-item.score, item.estimated_memory_gb, item.model)) + return FitResult( + hardware=hardware, + use_case=use_case, + recommendations=recommendations[:top_k], + runtime_filter=runtime, + max_memory_gb=max_memory_gb, + excluded=excluded[:top_k] if explain_excluded else [], + ) + + +def fit_result_json(result: FitResult) -> str: + return result.model_dump_json(indent=2) diff --git a/dyana/fit/hardware.py b/dyana/fit/hardware.py new file mode 100644 index 0000000..7f2b1ac --- /dev/null +++ b/dyana/fit/hardware.py @@ -0,0 +1,87 @@ +from __future__ import annotations + +import os +import platform +import shutil +import subprocess + +from dyana.fit.models import HardwareProfile, RuntimeAvailability + + +def _safe_round(value: float) -> float: + return round(value, 1) + + +def detect_total_ram_gb() -> float: + if hasattr(os, "sysconf") and "SC_PAGE_SIZE" in os.sysconf_names and "SC_PHYS_PAGES" in os.sysconf_names: + page_size = int(os.sysconf("SC_PAGE_SIZE")) + pages = int(os.sysconf("SC_PHYS_PAGES")) + return _safe_round((page_size * pages) / (1024**3)) + + return 0.0 + + +def detect_runtimes() -> RuntimeAvailability: + return RuntimeAvailability( + automodel=True, + ollama=shutil.which("ollama") is not None, + 
llama_cpp=shutil.which("llama-cli") is not None or shutil.which("llama-server") is not None, + mlx=platform.system() == "Darwin" and platform.machine() == "arm64", + ) + + +def detect_nvidia_gpu() -> tuple[str | None, int, float | None]: + binary = shutil.which("nvidia-smi") + if not binary: + return None, 0, None + + try: + output = subprocess.check_output( + [ + binary, + "--query-gpu=name,memory.total", + "--format=csv,noheader,nounits", + ], + text=True, + ) + except Exception: + return None, 0, None + + rows = [row.strip() for row in output.splitlines() if row.strip()] + if not rows: + return None, 0, None + + names: list[str] = [] + total_mb = 0.0 + for row in rows: + name, mem = [part.strip() for part in row.split(",", maxsplit=1)] + names.append(name) + total_mb += float(mem) + + return names[0], len(rows), _safe_round(total_mb / 1024) + + +def detect_hardware() -> HardwareProfile: + system = platform.system() + arch = platform.machine() + ram_gb = detect_total_ram_gb() + gpu_name, gpu_count, total_vram_gb = detect_nvidia_gpu() + runtimes = detect_runtimes() + unified_memory = system == "Darwin" and arch == "arm64" + + if unified_memory and total_vram_gb is None: + total_vram_gb = _safe_round(ram_gb * 0.7) + if gpu_name is None: + gpu_name = "Apple Silicon" + gpu_count = 1 + + return HardwareProfile( + platform=system, + arch=arch, + total_ram_gb=ram_gb, + gpu_name=gpu_name, + gpu_count=gpu_count, + total_vram_gb=total_vram_gb, + unified_memory=unified_memory, + runtimes=runtimes, + ) diff --git a/dyana/fit/models.py b/dyana/fit/models.py new file mode 100644 index 0000000..f7344be --- /dev/null +++ b/dyana/fit/models.py @@ -0,0 +1,82 @@ +from __future__ import annotations + +from pydantic import BaseModel + + +class RuntimeAvailability(BaseModel): + automodel: bool = True + ollama: bool = False + llama_cpp: bool = False + mlx: bool = False + + +class HardwareProfile(BaseModel): + platform: str + arch: str + total_ram_gb: float + gpu_name: str | None = None 
+ gpu_count: int = 0 + total_vram_gb: float | None = None + unified_memory: bool = False + runtimes: RuntimeAvailability + + +class ProviderSpec(BaseModel): + id: str + name: str + runtime_key: str + supported_modes: list[str] + preferred_on: list[str] = [] + quantizations: list[str] + artifact_hint: str + invocation_template: str + + +class ModelSpec(BaseModel): + id: str + name: str + family: str + use_cases: list[str] + params_b: float + context_k: int + supported_providers: list[str] + supported_quantizations: list[str] + aliases: list[str] = [] + + +class FitCatalog(BaseModel): + providers: list[ProviderSpec] + models: list[ModelSpec] + + +class FitRecommendation(BaseModel): + model_id: str + model: str + family: str + use_case: str + runtime: str + provider: str + quantization: str + mode: str + estimated_memory_gb: float + headroom_gb: float + score: int + rationale: str + artifact_hint: str + invocation_hint: str + + +class ExcludedCandidate(BaseModel): + model_id: str + model: str + provider: str + reason: str + + +class FitResult(BaseModel): + hardware: HardwareProfile + use_case: str + recommendations: list[FitRecommendation] + runtime_filter: str | None = None + max_memory_gb: float | None = None + excluded: list[ExcludedCandidate] = [] diff --git a/dyana/fit_test.py b/dyana/fit_test.py new file mode 100644 index 0000000..1f92eab --- /dev/null +++ b/dyana/fit_test.py @@ -0,0 +1,190 @@ +from unittest.mock import patch + +from dyana.fit import ( + HardwareProfile, + RuntimeAvailability, + detect_hardware, + detect_nvidia_gpu, + estimate_model_memory_gb, + load_catalog, + recommend_models, +) + + +class TestEstimateModelMemory: + def test_q4_estimate(self) -> None: + assert estimate_model_memory_gb(7.0, "Q4_K_M") > 0 + + def test_f16_larger_than_q4(self) -> None: + assert estimate_model_memory_gb(7.0, "F16") > estimate_model_memory_gb(7.0, "Q4_K_M") + + +class TestDetectNvidiaGpu: + def test_no_binary(self) -> None: + with 
+        with patch("dyana.fit.hardware.shutil.which", return_value=None):
+            assert detect_nvidia_gpu() == (None, 0, None)
+
+    def test_parses_multiple_gpus(self) -> None:
+        with (
+            patch("dyana.fit.hardware.shutil.which", return_value="/usr/bin/nvidia-smi"),
+            patch(
+                "dyana.fit.hardware.subprocess.check_output",
+                return_value="NVIDIA RTX 4090, 24564\nNVIDIA RTX 4090, 24564\n",
+            ),
+        ):
+            name, count, total_vram_gb = detect_nvidia_gpu()
+            assert name == "NVIDIA RTX 4090"
+            assert count == 2
+            assert total_vram_gb is not None
+            assert total_vram_gb > 40
+
+
+class TestDetectHardware:
+    def test_detects_apple_unified_memory(self) -> None:
+        with (
+            patch("dyana.fit.hardware.platform.system", return_value="Darwin"),
+            patch("dyana.fit.hardware.platform.machine", return_value="arm64"),
+            patch("dyana.fit.hardware.detect_total_ram_gb", return_value=64.0),
+            patch("dyana.fit.hardware.detect_nvidia_gpu", return_value=(None, 0, None)),
+            patch("dyana.fit.hardware.detect_runtimes", return_value=RuntimeAvailability(mlx=True)),
+        ):
+            hardware = detect_hardware()
+            assert hardware.unified_memory is True
+            assert hardware.gpu_name == "Apple Silicon"
+            assert hardware.total_vram_gb == 44.8
+
+    def test_detects_standard_linux_host(self) -> None:
+        with (
+            patch("dyana.fit.hardware.platform.system", return_value="Linux"),
+            patch("dyana.fit.hardware.platform.machine", return_value="x86_64"),
+            patch("dyana.fit.hardware.detect_total_ram_gb", return_value=32.0),
+            patch("dyana.fit.hardware.detect_nvidia_gpu", return_value=("RTX 4090", 1, 24.0)),
+            patch(
+                "dyana.fit.hardware.detect_runtimes",
+                return_value=RuntimeAvailability(ollama=True, llama_cpp=False, mlx=False),
+            ),
+        ):
+            hardware = detect_hardware()
+            assert hardware.platform == "Linux"
+            assert hardware.gpu_count == 1
+            assert hardware.total_vram_gb == 24.0
+            assert hardware.runtimes.ollama is True
+
+
+class TestRecommendModels:
+    def test_catalog_loads_from_data_files(self) -> None:
+        catalog = load_catalog()
+        assert len(catalog.providers) >= 1
+        assert len(catalog.models) >= 1
+
+    def test_prefers_coding_models(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=64.0,
+            gpu_name="RTX 4090",
+            gpu_count=1,
+            total_vram_gb=24.0,
+            runtimes=RuntimeAvailability(ollama=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", top_k=3)
+
+        assert len(result.recommendations) == 3
+        assert any("Coder" in recommendation.model for recommendation in result.recommendations)
+        assert result.recommendations[0].score >= result.recommendations[-1].score
+        assert result.recommendations[0].artifact_hint
+        assert result.recommendations[0].invocation_hint
+
+    def test_automodel_runtime_filter_works(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=64.0,
+            gpu_name="RTX 4090",
+            gpu_count=1,
+            total_vram_gb=24.0,
+            runtimes=RuntimeAvailability(automodel=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", runtime="automodel", top_k=2)
+
+        assert len(result.recommendations) == 2
+        assert all(recommendation.runtime == "automodel" for recommendation in result.recommendations)
+        assert all("dyana trace --loader automodel" in recommendation.invocation_hint for recommendation in result.recommendations)
+
+    def test_returns_no_recommendations_for_tiny_machine(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=1.0,
+            gpu_name=None,
+            gpu_count=0,
+            total_vram_gb=None,
+            runtimes=RuntimeAvailability(),
+        )
+
+        result = recommend_models(hardware, use_case="general", top_k=5)
+        assert result.recommendations == []
+
+    def test_cpu_only_mode_works(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=24.0,
+            gpu_name=None,
+            gpu_count=0,
+            total_vram_gb=None,
+            runtimes=RuntimeAvailability(llama_cpp=True),
+        )
+
+        result = recommend_models(hardware, use_case="general", top_k=2)
+        assert len(result.recommendations) == 2
+        assert all(recommendation.mode == "cpu" for recommendation in result.recommendations)
+
+    def test_runtime_filter_limits_results(self) -> None:
+        hardware = HardwareProfile(
+            platform="Darwin",
+            arch="arm64",
+            total_ram_gb=24.0,
+            gpu_name="Apple Silicon",
+            gpu_count=1,
+            total_vram_gb=16.8,
+            unified_memory=True,
+            runtimes=RuntimeAvailability(ollama=True, mlx=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", runtime="ollama", top_k=2)
+        assert len(result.recommendations) == 2
+        assert all(recommendation.runtime == "ollama" for recommendation in result.recommendations)
+
+    def test_max_memory_budget_restricts_recommendations(self) -> None:
+        hardware = HardwareProfile(
+            platform="Darwin",
+            arch="arm64",
+            total_ram_gb=24.0,
+            gpu_name="Apple Silicon",
+            gpu_count=1,
+            total_vram_gb=16.8,
+            unified_memory=True,
+            runtimes=RuntimeAvailability(mlx=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", max_memory_gb=6.0, top_k=5)
+        assert result.recommendations
+        assert all(recommendation.estimated_memory_gb <= 6.0 for recommendation in result.recommendations)
+
+    def test_explain_excluded_returns_reasons(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=4.0,
+            gpu_name=None,
+            gpu_count=0,
+            total_vram_gb=None,
+            runtimes=RuntimeAvailability(ollama=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", top_k=3, explain_excluded=True)
+        assert result.excluded
+        assert result.excluded[0].reason
diff --git a/dyana/view.py b/dyana/view.py
index 284d081..e6e601a 100644
--- a/dyana/view.py
+++ b/dyana/view.py
@@ -102,6 +102,69 @@ def severity_fmt(level: int) -> str:
     return "[bold dim]no severity[/]"
 
 
+def view_fit(result: dict[str, t.Any]) -> None:
+    hardware = result["hardware"]
+    rich_print("[bold cyan]Hardware:[/]")
+    rich_print(f"  Platform   : {hardware['platform']} ({hardware['arch']})")
+    rich_print(f"  System RAM : {hardware['total_ram_gb']} GiB")
+    if hardware.get("gpu_name"):
+        rich_print(
+            f"  GPU        : {hardware['gpu_name']} x{hardware['gpu_count']} "
+            f"({hardware.get('total_vram_gb', 0)} GiB)"
+        )
+    else:
+        rich_print("  GPU        : none detected")
+
+    runtimes = [name for name, enabled in hardware["runtimes"].items() if enabled]
+    rich_print(f"  Runtimes   : {', '.join(runtimes) if runtimes else 'none detected'}")
+    if result.get("runtime_filter"):
+        rich_print(f"  Runtime Filter: {result['runtime_filter']}")
+    if result.get("max_memory_gb") is not None:
+        rich_print(f"  Max Memory : {result['max_memory_gb']} GiB")
+    rich_print()
+
+    rich_print(f"[bold cyan]Recommendations For {result['use_case'].title()}:[/]")
+    recommendations = result.get("recommendations", [])
+    if not recommendations:
+        rich_print("  No viable recommendations found for the detected hardware.")
+        rich_print()
+        return
+
+    table = Table(box=box.ROUNDED)
+    table.add_column("Model", style="green")
+    table.add_column("Runtime", style="cyan")
+    table.add_column("Quant")
+    table.add_column("Mode")
+    table.add_column("Est. Mem")
+    table.add_column("Score")
+    table.add_column("Hint")
+
+    for recommendation in recommendations:
+        table.add_row(
+            recommendation["model"],
+            recommendation["runtime"],
+            recommendation["quantization"],
+            recommendation["mode"],
+            f"{recommendation['estimated_memory_gb']} GiB",
+            str(recommendation["score"]),
+            recommendation["artifact_hint"],
+        )
+
+    rich_print(table)
+    rich_print()
+    for recommendation in recommendations:
+        rich_print(f"  * [bold]{recommendation['model']}[/] - {recommendation['rationale']}")
+        rich_print(f"    next step: {recommendation['invocation_hint']}")
+
+    excluded = result.get("excluded", [])
+    if excluded:
+        rich_print()
+        rich_print("[bold cyan]Excluded:[/]")
+        for item in excluded:
+            rich_print(f"  * [bold]{item['model']}[/] via {item['provider']} - {item['reason']}")
+    rich_print()
+
+
 def view_header(trace: dict[str, t.Any], is_legacy: bool) -> None:
     run = trace["run"]
     if is_legacy:
diff --git a/mkdocs.yaml b/mkdocs.yaml
index c17f56f..36d4686 100644
--- a/mkdocs.yaml
+++ b/mkdocs.yaml
@@ -9,6 +9,7 @@ nav:
   - Overview: index.md
   - Install: install.md
   - Basic Usage: basic-usage.md
+  - Fit Planning: fit.md
   - Topics:
     - Loaders: topics/loaders.md