diff --git a/README.md b/README.md index 3b6630a..12d80bc 100644 --- a/README.md +++ b/README.md @@ -30,6 +30,8 @@ Dyana is a sandbox environment using Docker and [Tracee](https://github.com/aquasecurity/tracee) for loading, running and profiling a wide range of files, including machine learning models, ELF executables, Pickle serialized files, Javascripts [and more](https://docs.dreadnode.io/open-source/dyana/topics/loaders). It provides detailed insights into GPU memory usage, filesystem interactions, network requests, and security related events. +It also includes a lightweight host-side planning command, `dyana fit`, that recommends models likely to fit the current machine based on available RAM, GPU memory, and detected local runtimes. + ## Installation Install with: @@ -70,6 +72,24 @@ uv run pytest dyana See our docs on dyana usage [here](https://docs.dreadnode.io/open-source/dyana/basic-usage) +Quick example: + +```bash +dyana fit --use-case coding --top-k 5 +``` + +Constrain the recommendation surface: + +```bash +dyana fit --use-case coding --runtime ollama --max-memory-gb 12 --explain-excluded +``` + +Get Dyana-native recommendations for the `automodel` loader: + +```bash +dyana fit --use-case coding --runtime automodel +``` + ## License Dyana is released under the [MIT license](LICENSE). Tracee is released under the [Apache 2.0 license](third_party_licenses/APACHE2.md). 
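The recommendations above come from a simple bytes-per-parameter memory heuristic rather than benchmark-backed throughput estimates. A minimal standalone sketch of that estimate (constants mirror `QUANTIZATION_BYTES_PER_PARAM` and `estimate_model_memory_gb` in `dyana/fit/engine.py` later in this diff; the ~15% overhead factor is the planner's allowance for KV cache and runtime buffers, not a measured value):

```python
# Sketch of the fit planner's memory heuristic; the constants below mirror
# QUANTIZATION_BYTES_PER_PARAM in dyana/fit/engine.py from this change.
QUANTIZATION_BYTES_PER_PARAM = {
    "Q4_K_M": 0.62,
    "Q6_K": 0.85,
    "Q8_0": 1.05,
    "F16": 2.10,
}


def estimate_model_memory_gb(params_b: float, quantization: str) -> float:
    # weights at the quantized width, plus ~15% for KV cache and runtime buffers
    return round(params_b * QUANTIZATION_BYTES_PER_PARAM[quantization] * 1.15, 1)


print(estimate_model_memory_gb(7.0, "Q4_K_M"))  # 7.0 * 0.62 * 1.15 -> 5.0
```

This is why a 7B model shows up as roughly 5 GiB at Q4 but roughly 17 GiB at F16 in the fit output.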
diff --git a/docs/basic-usage.md b/docs/basic-usage.md index 796b442..9268ef9 100644 --- a/docs/basic-usage.md +++ b/docs/basic-usage.md @@ -12,6 +12,30 @@ Show help for a specific loader: dyana help automodel ``` +Plan model choices against the current machine before tracing: + +```bash +dyana fit --use-case coding --top-k 5 +``` + +Emit machine-readable fit recommendations: + +```bash +dyana fit --use-case general --json +``` + +Restrict the planner to a specific runtime and memory budget: + +```bash +dyana fit --use-case coding --runtime ollama --max-memory-gb 12 --explain-excluded +``` + +Plan specifically for Dyana's built-in automodel loader: + +```bash +dyana fit --use-case coding --runtime automodel +``` + Create a trace file for a loader run: ```bash @@ -36,6 +60,8 @@ Show a summary of a trace file: dyana summary --trace-path trace.json ``` +`dyana fit` is host-side only. It does not start Docker, pull models, or execute artifacts. It is intended as a quick planning step before a real traced run. + ## Default Safeguards Network access is disabled by default for loader containers. Allow it explicitly when needed: diff --git a/docs/fit.md b/docs/fit.md new file mode 100644 index 0000000..7571c52 --- /dev/null +++ b/docs/fit.md @@ -0,0 +1,90 @@ +# Fit Planning + +`dyana fit` recommends a small set of models that are likely to fit the current machine. + +Unlike `dyana trace`, this command is host-side only: + +- it does not start Docker +- it does not run loaders +- it does not download models +- it does not execute artifacts + +It is meant to answer a narrower question first: what is even worth trying on this hardware? 
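For automation, the `--json` view shown above can be post-processed directly, since it follows the `FitResult` model added in `dyana/fit/models.py`. A sketch with an abbreviated, hand-written payload standing in for real `dyana fit --json` output (values borrowed from this change's test fixtures):

```python
import json

# Abbreviated, hypothetical `dyana fit --json` payload; field names follow the
# FitResult / FitRecommendation models introduced in this change.
payload = """
{
  "use_case": "coding",
  "recommendations": [
    {"model": "Qwen2.5-Coder 7B Instruct", "runtime": "ollama",
     "quantization": "Q8_0", "estimated_memory_gb": 8.5,
     "invocation_hint": "ollama run qwen2.5-coder:7b"}
  ],
  "excluded": []
}
"""

result = json.loads(payload)
for rec in result["recommendations"]:
    # print each recommendation with its ready-to-run invocation hint
    print(f"{rec['model']} ({rec['quantization']}, ~{rec['estimated_memory_gb']} GiB): {rec['invocation_hint']}")
```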
+ +## What It Uses + +The current prototype looks at: + +- total system RAM +- detected NVIDIA GPU memory, if present +- Apple Silicon unified memory heuristics on `Darwin arm64` +- detected runtimes such as Dyana `automodel`, `ollama`, `llama.cpp`, and `mlx` +- a packaged local model and provider catalog + +## Examples + +Recommend coding-oriented models: + +```bash +dyana fit --use-case coding --top-k 5 +``` + +Get JSON output for automation: + +```bash +dyana fit --use-case general --top-k 3 --json +``` + +Limit results to a specific runtime and budget: + +```bash +dyana fit --use-case coding --runtime ollama --max-memory-gb 12 +``` + +Prefer a Dyana-native execution path: + +```bash +dyana fit --use-case coding --runtime automodel +``` + +Explain why some candidates were excluded: + +```bash +dyana fit --use-case coding --explain-excluded +``` + +## Output + +The text view shows: + +- detected hardware summary +- detected runtimes +- ranked recommendations +- estimated memory use +- runtime and quantization choice +- a short rationale for each recommendation +- provider-specific artifact and invocation hints +- optional exclusion reasons for rejected candidates + +The JSON view includes the same information in a machine-readable structure. + +## Preferences + +The planner supports a small set of opinionated controls: + +- `--runtime` to limit results to `automodel`, `ollama`, `mlx`, or `llama_cpp` +- `--max-memory-gb` to cap the effective memory budget +- `--preference balanced|quality|speed` to nudge quantization ranking +- `--explain-excluded` to include a short rejection reason for excluded candidates + +## Current Scope + +This is intentionally lightweight. The prototype: + +- uses simple fit heuristics instead of benchmark-backed throughput estimates +- ranks a packaged local catalog rather than a large external model index +- focuses on fit and practical starting points, not exhaustive provider support + +The command is a planning tool. 
For real artifact execution and profiling, continue to use `dyana trace`. + +When the selected provider is `automodel`, the recommendation includes a Dyana invocation hint using `dyana trace --loader automodel`. diff --git a/docs/index.md b/docs/index.md index 1d190ef..a9e9fc2 100644 --- a/docs/index.md +++ b/docs/index.md @@ -2,12 +2,19 @@ Dyana is a sandbox environment using Docker and [Tracee](https://github.com/aquasecurity/tracee) for loading, running, and profiling a wide range of files, including machine learning models, ELF executables, pickle files, JavaScript, and more. +In addition to trace-time inspection, Dyana includes a small host-side planning surface for choosing models that are likely to fit your hardware before you run anything. + It provides visibility into: - GPU memory usage - Filesystem interactions - Network requests - Security-relevant runtime events +- Model fit recommendations for the current host + +## Fit Planning + +Use [`dyana fit`](fit.md) to rank a compact set of model recommendations against the current machine's RAM, GPU or unified memory budget, and detected local runtimes such as Ollama or MLX. 
## Loaders diff --git a/dyana/cli.py b/dyana/cli.py index 6ba696f..7ded4ec 100644 --- a/dyana/cli.py +++ b/dyana/cli.py @@ -2,6 +2,11 @@ import pathlib import platform as platform_pkg +import typer +from rich import box +from rich import print as rich_print +from rich.table import Table + try: import cysimdjson @@ -9,17 +14,14 @@ except ImportError: _HAS_CYSIMDJSON = False -import typer -from rich import box -from rich import print as rich_print -from rich.table import Table - import dyana.loaders as loaders_pkg +from dyana.fit import detect_hardware, fit_result_json, recommend_models from dyana.loaders.loader import Loader from dyana.tracer.tracee import Tracer from dyana.view import ( view_disk_events, view_disk_usage, + view_fit, view_gpus, view_header, view_imports, @@ -45,6 +47,35 @@ ) +@cli.command(help="Recommend models that fit the current machine.") +def fit( + use_case: str = typer.Option(help="Target workload, e.g. coding, chat, reasoning, general.", default="general"), + top_k: int = typer.Option(help="Number of recommendations to return.", default=5), + runtime: str | None = typer.Option(help="Limit results to a specific runtime, e.g. 
automodel, ollama, mlx, llama_cpp.", default=None), + max_memory_gb: float | None = typer.Option(help="Override the available memory budget in GiB.", default=None), + preference: str = typer.Option(help="Ranking preference: balanced, quality, or speed.", default="balanced"), + explain_excluded: bool = typer.Option(False, help="Include a short explanation for excluded candidates."), + json_output: bool = typer.Option(False, "--json", help="Emit recommendations as JSON."), +) -> None: + if preference not in {"balanced", "quality", "speed"}: + raise typer.BadParameter("preference must be one of: balanced, quality, speed") + + result = recommend_models( + detect_hardware(), + use_case=use_case, + top_k=top_k, + runtime=runtime, + max_memory_gb=max_memory_gb, + preference=preference, + explain_excluded=explain_excluded, + ) + + if json_output: + rich_print(fit_result_json(result)) + else: + view_fit(result.model_dump()) + + @cli.command( help="Show the available loaders.", ) diff --git a/dyana/cli_test.py b/dyana/cli_test.py index 433eb49..4354ae2 100644 --- a/dyana/cli_test.py +++ b/dyana/cli_test.py @@ -49,6 +49,15 @@ def test_loaders_help(self) -> None: assert result.exit_code == 0 assert "--build" in _strip_ansi(result.output) + def test_fit_help(self) -> None: + result = runner.invoke(cli, ["fit", "--help"]) + assert result.exit_code == 0 + output = _strip_ansi(result.output) + assert "--use-case" in output + assert "--top-k" in output + assert "--runtime" in output + assert "--max-memory-gb" in output + class TestSummaryCommand: def test_summary_with_modern_trace(self, tmp_path: t.Any) -> None: @@ -137,6 +146,72 @@ def test_summary_missing_file(self) -> None: assert result.exit_code != 0 + +class TestFitCommand: + def test_fit_text_output(self) -> None: + fake_result: dict[str, t.Any] = { + "hardware": { + "platform": "Linux", + "arch": "x86_64", + "total_ram_gb": 64.0, + "gpu_name": "RTX 4090", + "gpu_count": 1, + "total_vram_gb": 24.0, + "unified_memory": False, + 
"runtimes": {"automodel": True, "ollama": True, "llama_cpp": False, "mlx": False}, + }, + "use_case": "coding", + "runtime_filter": None, + "max_memory_gb": None, + "recommendations": [ + { + "model_id": "qwen25-coder-7b", + "model": "Qwen2.5-Coder 7B Instruct", + "runtime": "ollama", + "provider": "Ollama", + "quantization": "Q8_0", + "mode": "gpu", + "estimated_memory_gb": 8.5, + "score": 92, + "rationale": "Fits comfortably.", + "artifact_hint": "Use an Ollama model tag.", + "invocation_hint": "ollama run qwen2.5-coder:7b", + } + ], + "excluded": [], + } + + with ( + patch("dyana.cli.detect_hardware"), + patch("dyana.cli.recommend_models") as mock_recommend, + ): + mock_recommend.return_value.model_dump.return_value = fake_result + result = runner.invoke(cli, ["fit", "--use-case", "coding"]) + + assert result.exit_code == 0 + output = _strip_ansi(result.output) + assert "Hardware" in output + assert "Qwen2.5-Coder 7B Instruct" in output + assert "ollama run qwen2.5-coder:7b" in output + assert "automodel" in output + + def test_fit_json_output(self) -> None: + payload = json.dumps({"use_case": "general", "recommendations": [], "excluded": []}) + with ( + patch("dyana.cli.detect_hardware"), + patch("dyana.cli.recommend_models"), + patch("dyana.cli.fit_result_json", return_value=payload), + ): + result = runner.invoke(cli, ["fit", "--json"]) + + assert result.exit_code == 0 + assert json.loads(result.output)["use_case"] == "general" + + def test_fit_rejects_invalid_preference(self) -> None: + result = runner.invoke(cli, ["fit", "--preference", "unknown"]) + assert result.exit_code == 2 + assert "preference must be one of" in _strip_ansi(result.output) + + def _noop_loader_init(self: t.Any, **kwargs: t.Any) -> None: self.name = kwargs.get("name", "automodel") self.settings = None diff --git a/dyana/data/__init__.py b/dyana/data/__init__.py new file mode 100644 index 0000000..a43adf0 --- /dev/null +++ b/dyana/data/__init__.py @@ -0,0 +1 @@ +# Package data container 
for Dyana. diff --git a/dyana/data/models.json b/dyana/data/models.json new file mode 100644 index 0000000..7a890f7 --- /dev/null +++ b/dyana/data/models.json @@ -0,0 +1,68 @@ +[ + { + "id": "qwen25-coder-7b", + "name": "Qwen2.5-Coder 7B Instruct", + "family": "qwen", + "use_cases": ["coding", "chat"], + "params_b": 7.0, + "context_k": 128, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["qwen2.5-coder:7b"] + }, + { + "id": "qwen25-coder-14b", + "name": "Qwen2.5-Coder 14B Instruct", + "family": "qwen", + "use_cases": ["coding", "chat"], + "params_b": 14.0, + "context_k": 128, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["qwen2.5-coder:14b"] + }, + { + "id": "deepseek-r1-distill-qwen-7b", + "name": "DeepSeek-R1-Distill-Qwen-7B", + "family": "deepseek", + "use_cases": ["reasoning", "coding"], + "params_b": 7.0, + "context_k": 32, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["deepseek-r1-distill-qwen:7b"] + }, + { + "id": "llama31-8b", + "name": "Llama 3.1 8B Instruct", + "family": "llama", + "use_cases": ["chat", "general"], + "params_b": 8.0, + "context_k": 128, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["llama3.1:8b"] + }, + { + "id": "qwen25-3b", + "name": "Qwen2.5 3B Instruct", + "family": "qwen", + "use_cases": ["chat", "general"], + "params_b": 3.0, + "context_k": 32, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["qwen2.5:3b"] + }, + { + "id": "gemma3-4b", + "name": "Gemma 3 4B Instruct", + "family": "gemma", + "use_cases": ["chat", "general", "coding"], + 
"params_b": 4.0, + "context_k": 128, + "supported_providers": ["automodel", "mlx", "ollama", "llama_cpp"], + "supported_quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "aliases": ["gemma3:4b"] + } +] diff --git a/dyana/data/providers.json b/dyana/data/providers.json new file mode 100644 index 0000000..4c831fa --- /dev/null +++ b/dyana/data/providers.json @@ -0,0 +1,42 @@ +[ + { + "id": "automodel", + "name": "Dyana AutoModel", + "runtime_key": "automodel", + "supported_modes": ["gpu", "cpu", "unified"], + "preferred_on": ["gpu", "unified", "cpu"], + "quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "artifact_hint": "Use a local Hugging Face model directory or weights path with Dyana's automodel loader.", + "invocation_template": "dyana trace --loader automodel -- --model /path/to/{model_id}" + }, + { + "id": "mlx", + "name": "MLX", + "runtime_key": "mlx", + "supported_modes": ["unified"], + "preferred_on": ["unified"], + "quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "artifact_hint": "Use MLX-converted weights or an MLX-native model package.", + "invocation_template": "mlx_lm.generate --model {model_id}" + }, + { + "id": "ollama", + "name": "Ollama", + "runtime_key": "ollama", + "supported_modes": ["gpu", "cpu", "unified"], + "preferred_on": ["gpu", "unified"], + "quantizations": ["Q4_K_M", "Q6_K", "Q8_0"], + "artifact_hint": "Use an Ollama model tag or local Modelfile import.", + "invocation_template": "ollama run {model_id}" + }, + { + "id": "llama_cpp", + "name": "llama.cpp", + "runtime_key": "llama_cpp", + "supported_modes": ["gpu", "cpu", "unified"], + "preferred_on": ["gpu", "cpu", "unified"], + "quantizations": ["Q4_K_M", "Q6_K", "Q8_0", "F16"], + "artifact_hint": "Use a GGUF artifact compatible with llama.cpp.", + "invocation_template": "llama-cli -m /path/to/{model_id}.gguf" + } +] diff --git a/dyana/fit/__init__.py b/dyana/fit/__init__.py new file mode 100644 index 0000000..0940509 --- /dev/null +++ b/dyana/fit/__init__.py @@ -0,0 +1,30 @@ 
+from dyana.fit.catalog import load_catalog +from dyana.fit.engine import estimate_model_memory_gb, fit_result_json, recommend_models +from dyana.fit.hardware import detect_hardware, detect_nvidia_gpu +from dyana.fit.models import ( + ExcludedCandidate, + FitCatalog, + FitRecommendation, + FitResult, + HardwareProfile, + ModelSpec, + ProviderSpec, + RuntimeAvailability, +) + +__all__ = [ + "FitCatalog", + "FitRecommendation", + "FitResult", + "HardwareProfile", + "ModelSpec", + "ProviderSpec", + "RuntimeAvailability", + "detect_hardware", + "detect_nvidia_gpu", + "estimate_model_memory_gb", + "ExcludedCandidate", + "fit_result_json", + "load_catalog", + "recommend_models", +] diff --git a/dyana/fit/catalog.py b/dyana/fit/catalog.py new file mode 100644 index 0000000..191ee41 --- /dev/null +++ b/dyana/fit/catalog.py @@ -0,0 +1,18 @@ +from __future__ import annotations + +import json +from importlib.resources import files +from typing import Any, cast + +from dyana.fit.models import FitCatalog + + +def _read_catalog_file(name: str) -> list[dict[str, Any]]: + resource = files("dyana.data").joinpath(name) + return cast(list[dict[str, Any]], json.loads(resource.read_text())) + + +def load_catalog() -> FitCatalog: + providers = _read_catalog_file("providers.json") + models = _read_catalog_file("models.json") + return FitCatalog.model_validate({"providers": providers, "models": models}) diff --git a/dyana/fit/engine.py b/dyana/fit/engine.py new file mode 100644 index 0000000..958b3b1 --- /dev/null +++ b/dyana/fit/engine.py @@ -0,0 +1,216 @@ +from __future__ import annotations + +from dyana.fit.catalog import load_catalog +from dyana.fit.models import ( + ExcludedCandidate, + FitCatalog, + FitRecommendation, + FitResult, + HardwareProfile, + ModelSpec, + ProviderSpec, +) + +QUANTIZATION_BYTES_PER_PARAM: dict[str, float] = { + "Q4_K_M": 0.62, + "Q6_K": 0.85, + "Q8_0": 1.05, + "F16": 2.10, +} + + +def _safe_round(value: float) -> float: + return round(value, 1) + + +def 
estimate_model_memory_gb(params_b: float, quantization: str) -> float: + bytes_per_param = QUANTIZATION_BYTES_PER_PARAM[quantization] + return _safe_round(params_b * bytes_per_param * 1.15) + + +def _runtime_enabled(hardware: HardwareProfile, provider: ProviderSpec) -> bool: + return bool(getattr(hardware.runtimes, provider.runtime_key, False)) + + +def _mode_capacity_gb(hardware: HardwareProfile, mode: str) -> float: + if mode == "unified": + return _safe_round(hardware.total_ram_gb * 0.7) + if mode == "gpu": + return hardware.total_vram_gb or 0.0 + return _safe_round(hardware.total_ram_gb * 0.6) + + +def _provider_viable_modes(hardware: HardwareProfile, provider: ProviderSpec) -> list[str]: + modes: list[str] = [] + for mode in provider.supported_modes: + if mode == "unified" and hardware.unified_memory: + modes.append(mode) + elif mode == "gpu" and hardware.total_vram_gb and hardware.total_vram_gb > 0: + modes.append(mode) + elif mode == "cpu": + modes.append(mode) + return modes + + +def _use_case_bonus(model: ModelSpec, requested_use_case: str) -> int: + if requested_use_case in model.use_cases: + return 18 + if requested_use_case == "coding" and "reasoning" in model.use_cases: + return 8 + return 0 + + +def _runtime_bonus(provider: ProviderSpec, mode: str) -> int: + score = 0 + if mode in provider.preferred_on: + score += 6 + runtime_bonuses = {"mlx": 4, "ollama": 3, "llama_cpp": 2} + return score + runtime_bonuses.get(provider.runtime_key, 0) + + +def _provider_map(catalog: FitCatalog) -> dict[str, ProviderSpec]: + return {provider.id: provider for provider in catalog.providers} + + +def _preferred_quantizations(preference: str) -> list[str]: + if preference == "quality": + return ["F16", "Q8_0", "Q6_K", "Q4_K_M"] + if preference == "speed": + return ["Q4_K_M", "Q6_K", "Q8_0", "F16"] + return ["Q8_0", "Q6_K", "Q4_K_M", "F16"] + + +def _quantization_bonus(quantization: str, preference: str) -> int: + quality_bonus = {"F16": 7, "Q8_0": 5, "Q6_K": 3, "Q4_K_M": 
1} + speed_bonus = {"Q4_K_M": 7, "Q6_K": 5, "Q8_0": 3, "F16": 1} + balanced_bonus = {"Q8_0": 5, "Q6_K": 4, "Q4_K_M": 3, "F16": 2} + if preference == "quality": + return quality_bonus[quantization] + if preference == "speed": + return speed_bonus[quantization] + return balanced_bonus[quantization] + + +def recommend_models( + hardware: HardwareProfile, + use_case: str = "general", + top_k: int = 5, + runtime: str | None = None, + max_memory_gb: float | None = None, + preference: str = "balanced", + explain_excluded: bool = False, + catalog: FitCatalog | None = None, +) -> FitResult: + active_catalog = catalog or load_catalog() + providers = _provider_map(active_catalog) + recommendations: list[FitRecommendation] = [] + excluded: list[ExcludedCandidate] = [] + + for model in active_catalog.models: + best: FitRecommendation | None = None + model_excluded_reasons: list[ExcludedCandidate] = [] + for provider_id in model.supported_providers: + provider = providers[provider_id] + if runtime and provider.runtime_key != runtime: + model_excluded_reasons.append( + ExcludedCandidate( + model_id=model.id, + model=model.name, + provider=provider.runtime_key, + reason=f"runtime filter excludes provider '{provider.runtime_key}'", + ) + ) + continue + if not _runtime_enabled(hardware, provider): + model_excluded_reasons.append( + ExcludedCandidate( + model_id=model.id, + model=model.name, + provider=provider.runtime_key, + reason=f"runtime '{provider.runtime_key}' is not available on this host", + ) + ) + continue + + for mode in _provider_viable_modes(hardware, provider): + capacity_gb = _mode_capacity_gb(hardware, mode) + if max_memory_gb is not None: + capacity_gb = min(capacity_gb, max_memory_gb) + + shared_quants = [quant for quant in _preferred_quantizations(preference) if quant in model.supported_quantizations and quant in provider.quantizations] + if not shared_quants: + model_excluded_reasons.append( + ExcludedCandidate( + model_id=model.id, + model=model.name, + 
provider=provider.runtime_key, + reason="no shared quantization between model and provider", + ) + ) + continue + for quantization in shared_quants: + estimated_memory_gb = estimate_model_memory_gb(model.params_b, quantization) + headroom_gb = _safe_round(capacity_gb - estimated_memory_gb) + if headroom_gb < 0: + if explain_excluded: + model_excluded_reasons.append( + ExcludedCandidate( + model_id=model.id, + model=model.name, + provider=provider.runtime_key, + reason=( + f"{quantization} needs ~{estimated_memory_gb} GiB but only " + f"{capacity_gb} GiB is available in {mode} mode" + ), + ) + ) + continue + + score = 20 + score += _use_case_bonus(model, use_case) + score += _runtime_bonus(provider, mode) + score += min(int(headroom_gb * 1.5), 12) + score += min(model.context_k // 32, 6) + score += _quantization_bonus(quantization, preference) + + rationale = ( + f"Fits in {mode} memory with ~{headroom_gb} GiB headroom using {quantization}; " + f"good match for {use_case} via {provider.name}." 
+ ) + candidate = FitRecommendation( + model_id=model.id, + model=model.name, + family=model.family, + use_case=use_case, + runtime=provider.runtime_key, + provider=provider.name, + quantization=quantization, + mode=mode, + estimated_memory_gb=estimated_memory_gb, + headroom_gb=headroom_gb, + score=min(score, 100), + rationale=rationale, + artifact_hint=provider.artifact_hint, + invocation_hint=provider.invocation_template.format(model_id=model.aliases[0] if model.aliases else model.id), + ) + if best is None or candidate.score > best.score: + best = candidate + + if best is not None: + recommendations.append(best) + elif explain_excluded and model_excluded_reasons: + excluded.append(model_excluded_reasons[0]) + + recommendations.sort(key=lambda item: (-item.score, item.estimated_memory_gb, item.model)) + return FitResult( + hardware=hardware, + use_case=use_case, + recommendations=recommendations[:top_k], + runtime_filter=runtime, + max_memory_gb=max_memory_gb, + excluded=excluded[:top_k] if explain_excluded else [], + ) + + +def fit_result_json(result: FitResult) -> str: + return result.model_dump_json(indent=2) diff --git a/dyana/fit/hardware.py b/dyana/fit/hardware.py new file mode 100644 index 0000000..7f2b1ac --- /dev/null +++ b/dyana/fit/hardware.py @@ -0,0 +1,87 @@ +from __future__ import annotations + +import os +import platform +import shutil +import subprocess + +from dyana.fit.models import HardwareProfile, RuntimeAvailability + + +def _safe_round(value: float) -> float: + return round(value, 1) + + +def detect_total_ram_gb() -> float: + if hasattr(os, "sysconf") and "SC_PAGE_SIZE" in os.sysconf_names and "SC_PHYS_PAGES" in os.sysconf_names: + page_size = int(os.sysconf("SC_PAGE_SIZE")) + pages = int(os.sysconf("SC_PHYS_PAGES")) + return _safe_round((page_size * pages) / (1024**3)) + + return 0.0 + + +def detect_runtimes() -> RuntimeAvailability: + return RuntimeAvailability( + automodel=True, + ollama=shutil.which("ollama") is not None, + 
llama_cpp=shutil.which("llama-cli") is not None or shutil.which("llama-server") is not None, + mlx=platform.system() == "Darwin" and platform.machine() == "arm64", + ) + + +def detect_nvidia_gpu() -> tuple[str | None, int, float | None]: + binary = shutil.which("nvidia-smi") + if not binary: + return None, 0, None + + try: + output = subprocess.check_output( + [ + binary, + "--query-gpu=name,memory.total", + "--format=csv,noheader,nounits", + ], + text=True, + ) + except Exception: + return None, 0, None + + rows = [row.strip() for row in output.splitlines() if row.strip()] + if not rows: + return None, 0, None + + names: list[str] = [] + total_mb = 0.0 + for row in rows: + name, mem = [part.strip() for part in row.split(",", maxsplit=1)] + names.append(name) + total_mb += float(mem) + + return names[0], len(rows), _safe_round(total_mb / 1024) + + +def detect_hardware() -> HardwareProfile: + system = platform.system() + arch = platform.machine() + ram_gb = detect_total_ram_gb() + gpu_name, gpu_count, total_vram_gb = detect_nvidia_gpu() + runtimes = detect_runtimes() + unified_memory = system == "Darwin" and arch == "arm64" + + if unified_memory and total_vram_gb is None: + total_vram_gb = _safe_round(ram_gb * 0.7) + if gpu_name is None: + gpu_name = "Apple Silicon" + gpu_count = 1 + + return HardwareProfile( + platform=system, + arch=arch, + total_ram_gb=ram_gb, + gpu_name=gpu_name, + gpu_count=gpu_count, + total_vram_gb=total_vram_gb, + unified_memory=unified_memory, + runtimes=runtimes, + ) diff --git a/dyana/fit/models.py b/dyana/fit/models.py new file mode 100644 index 0000000..f7344be --- /dev/null +++ b/dyana/fit/models.py @@ -0,0 +1,82 @@ +from __future__ import annotations + +from pydantic import BaseModel + + +class RuntimeAvailability(BaseModel): + automodel: bool = True + ollama: bool = False + llama_cpp: bool = False + mlx: bool = False + + +class HardwareProfile(BaseModel): + platform: str + arch: str + total_ram_gb: float + gpu_name: str | None = None 
+ gpu_count: int = 0 + total_vram_gb: float | None = None + unified_memory: bool = False + runtimes: RuntimeAvailability + + +class ProviderSpec(BaseModel): + id: str + name: str + runtime_key: str + supported_modes: list[str] + preferred_on: list[str] = [] + quantizations: list[str] + artifact_hint: str + invocation_template: str + + +class ModelSpec(BaseModel): + id: str + name: str + family: str + use_cases: list[str] + params_b: float + context_k: int + supported_providers: list[str] + supported_quantizations: list[str] + aliases: list[str] = [] + + +class FitCatalog(BaseModel): + providers: list[ProviderSpec] + models: list[ModelSpec] + + +class FitRecommendation(BaseModel): + model_id: str + model: str + family: str + use_case: str + runtime: str + provider: str + quantization: str + mode: str + estimated_memory_gb: float + headroom_gb: float + score: int + rationale: str + artifact_hint: str + invocation_hint: str + + +class ExcludedCandidate(BaseModel): + model_id: str + model: str + provider: str + reason: str + + +class FitResult(BaseModel): + hardware: HardwareProfile + use_case: str + recommendations: list[FitRecommendation] + runtime_filter: str | None = None + max_memory_gb: float | None = None + excluded: list[ExcludedCandidate] = [] diff --git a/dyana/fit_test.py b/dyana/fit_test.py new file mode 100644 index 0000000..1f92eab --- /dev/null +++ b/dyana/fit_test.py @@ -0,0 +1,190 @@ +from unittest.mock import patch + +from dyana.fit import ( + HardwareProfile, + RuntimeAvailability, + detect_hardware, + detect_nvidia_gpu, + estimate_model_memory_gb, + load_catalog, + recommend_models, +) + + +class TestEstimateModelMemory: + def test_q4_estimate(self) -> None: + assert estimate_model_memory_gb(7.0, "Q4_K_M") > 0 + + def test_f16_larger_than_q4(self) -> None: + assert estimate_model_memory_gb(7.0, "F16") > estimate_model_memory_gb(7.0, "Q4_K_M") + + +class TestDetectNvidiaGpu: + def test_no_binary(self) -> None: + with 
+        with patch("dyana.fit.hardware.shutil.which", return_value=None):
+            assert detect_nvidia_gpu() == (None, 0, None)
+
+    def test_parses_multiple_gpus(self) -> None:
+        with (
+            patch("dyana.fit.hardware.shutil.which", return_value="/usr/bin/nvidia-smi"),
+            patch(
+                "dyana.fit.hardware.subprocess.check_output",
+                return_value="NVIDIA RTX 4090, 24564\nNVIDIA RTX 4090, 24564\n",
+            ),
+        ):
+            name, count, total_vram_gb = detect_nvidia_gpu()
+            assert name == "NVIDIA RTX 4090"
+            assert count == 2
+            assert total_vram_gb is not None
+            assert total_vram_gb > 40
+
+
+class TestDetectHardware:
+    def test_detects_apple_unified_memory(self) -> None:
+        with (
+            patch("dyana.fit.hardware.platform.system", return_value="Darwin"),
+            patch("dyana.fit.hardware.platform.machine", return_value="arm64"),
+            patch("dyana.fit.hardware.detect_total_ram_gb", return_value=64.0),
+            patch("dyana.fit.hardware.detect_nvidia_gpu", return_value=(None, 0, None)),
+            patch("dyana.fit.hardware.detect_runtimes", return_value=RuntimeAvailability(mlx=True)),
+        ):
+            hardware = detect_hardware()
+            assert hardware.unified_memory is True
+            assert hardware.gpu_name == "Apple Silicon"
+            assert hardware.total_vram_gb == 44.8
+
+    def test_detects_standard_linux_host(self) -> None:
+        with (
+            patch("dyana.fit.hardware.platform.system", return_value="Linux"),
+            patch("dyana.fit.hardware.platform.machine", return_value="x86_64"),
+            patch("dyana.fit.hardware.detect_total_ram_gb", return_value=32.0),
+            patch("dyana.fit.hardware.detect_nvidia_gpu", return_value=("RTX 4090", 1, 24.0)),
+            patch(
+                "dyana.fit.hardware.detect_runtimes",
+                return_value=RuntimeAvailability(ollama=True, llama_cpp=False, mlx=False),
+            ),
+        ):
+            hardware = detect_hardware()
+            assert hardware.platform == "Linux"
+            assert hardware.gpu_count == 1
+            assert hardware.total_vram_gb == 24.0
+            assert hardware.runtimes.ollama is True
+
+
+class TestRecommendModels:
+    def test_catalog_loads_from_data_files(self) -> None:
+        catalog = load_catalog()
+        assert len(catalog.providers) >= 1
+        assert len(catalog.models) >= 1
+
+    def test_prefers_coding_models(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=64.0,
+            gpu_name="RTX 4090",
+            gpu_count=1,
+            total_vram_gb=24.0,
+            runtimes=RuntimeAvailability(ollama=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", top_k=3)
+
+        assert len(result.recommendations) == 3
+        assert any("Coder" in recommendation.model for recommendation in result.recommendations)
+        assert result.recommendations[0].score >= result.recommendations[-1].score
+        assert result.recommendations[0].artifact_hint
+        assert result.recommendations[0].invocation_hint
+
+    def test_automodel_runtime_filter_works(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=64.0,
+            gpu_name="RTX 4090",
+            gpu_count=1,
+            total_vram_gb=24.0,
+            runtimes=RuntimeAvailability(automodel=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", runtime="automodel", top_k=2)
+
+        assert len(result.recommendations) == 2
+        assert all(recommendation.runtime == "automodel" for recommendation in result.recommendations)
+        assert all("dyana trace --loader automodel" in recommendation.invocation_hint for recommendation in result.recommendations)
+
+    def test_returns_no_recommendations_for_tiny_machine(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=1.0,
+            gpu_name=None,
+            gpu_count=0,
+            total_vram_gb=None,
+            runtimes=RuntimeAvailability(),
+        )
+
+        result = recommend_models(hardware, use_case="general", top_k=5)
+        assert result.recommendations == []
+
+    def test_cpu_only_mode_works(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=24.0,
+            gpu_name=None,
+            gpu_count=0,
+            total_vram_gb=None,
+            runtimes=RuntimeAvailability(llama_cpp=True),
+        )
+
+        result = recommend_models(hardware, use_case="general", top_k=2)
+        assert len(result.recommendations) == 2
+        assert all(recommendation.mode == "cpu" for recommendation in result.recommendations)
+
+    def test_runtime_filter_limits_results(self) -> None:
+        hardware = HardwareProfile(
+            platform="Darwin",
+            arch="arm64",
+            total_ram_gb=24.0,
+            gpu_name="Apple Silicon",
+            gpu_count=1,
+            total_vram_gb=16.8,
+            unified_memory=True,
+            runtimes=RuntimeAvailability(ollama=True, mlx=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", runtime="ollama", top_k=2)
+        assert len(result.recommendations) == 2
+        assert all(recommendation.runtime == "ollama" for recommendation in result.recommendations)
+
+    def test_max_memory_budget_restricts_recommendations(self) -> None:
+        hardware = HardwareProfile(
+            platform="Darwin",
+            arch="arm64",
+            total_ram_gb=24.0,
+            gpu_name="Apple Silicon",
+            gpu_count=1,
+            total_vram_gb=16.8,
+            unified_memory=True,
+            runtimes=RuntimeAvailability(mlx=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", max_memory_gb=6.0, top_k=5)
+        assert result.recommendations
+        assert all(recommendation.estimated_memory_gb <= 6.0 for recommendation in result.recommendations)
+
+    def test_explain_excluded_returns_reasons(self) -> None:
+        hardware = HardwareProfile(
+            platform="Linux",
+            arch="x86_64",
+            total_ram_gb=4.0,
+            gpu_name=None,
+            gpu_count=0,
+            total_vram_gb=None,
+            runtimes=RuntimeAvailability(ollama=True),
+        )
+
+        result = recommend_models(hardware, use_case="coding", top_k=3, explain_excluded=True)
+        assert result.excluded
+        assert result.excluded[0].reason
diff --git a/dyana/view.py b/dyana/view.py
index 284d081..e6e601a 100644
--- a/dyana/view.py
+++ b/dyana/view.py
@@ -102,6 +102,69 @@ def severity_fmt(level: int) -> str:
     return "[bold dim]no severity[/]"
 
 
+def view_fit(result: dict[str, t.Any]) -> None:
+    hardware = result["hardware"]
+    rich_print("[bold cyan]Hardware:[/]")
+    rich_print(f"  Platform   : {hardware['platform']} ({hardware['arch']})")
+    rich_print(f"  System RAM : {hardware['total_ram_gb']} GiB")
+    if hardware.get("gpu_name"):
+        rich_print(
+            f"  GPU        : {hardware['gpu_name']} x{hardware['gpu_count']} "
+            f"({hardware.get('total_vram_gb', 0)} GiB)"
+        )
+    else:
+        rich_print("  GPU        : none detected")
+
+    runtimes = [name for name, enabled in hardware["runtimes"].items() if enabled]
+    rich_print(f"  Runtimes   : {', '.join(runtimes) if runtimes else 'none detected'}")
+    if result.get("runtime_filter"):
+        rich_print(f"  Runtime Filter: {result['runtime_filter']}")
+    if result.get("max_memory_gb") is not None:
+        rich_print(f"  Max Memory : {result['max_memory_gb']} GiB")
+    rich_print()
+
+    rich_print(f"[bold cyan]Recommendations For {result['use_case'].title()}:[/]")
+    recommendations = result.get("recommendations", [])
+    if not recommendations:
+        rich_print("  No viable recommendations found for the detected hardware.")
+        rich_print()
+        return
+
+    table = Table(box=box.ROUNDED)
+    table.add_column("Model", style="green")
+    table.add_column("Runtime", style="cyan")
+    table.add_column("Quant")
+    table.add_column("Mode")
+    table.add_column("Est. Mem")
+    table.add_column("Score")
+    table.add_column("Hint")
+
+    for recommendation in recommendations:
+        table.add_row(
+            recommendation["model"],
+            recommendation["runtime"],
+            recommendation["quantization"],
+            recommendation["mode"],
+            f"{recommendation['estimated_memory_gb']} GiB",
+            str(recommendation["score"]),
+            recommendation["artifact_hint"],
+        )
+
+    rich_print(table)
+    rich_print()
+    for recommendation in recommendations:
+        rich_print(f"  * [bold]{recommendation['model']}[/] - {recommendation['rationale']}")
+        rich_print(f"    next step: {recommendation['invocation_hint']}")
+
+    excluded = result.get("excluded", [])
+    if excluded:
+        rich_print()
+        rich_print("[bold cyan]Excluded:[/]")
+        for item in excluded:
+            rich_print(f"  * [bold]{item['model']}[/] via {item['provider']} - {item['reason']}")
+    rich_print()
+
+
 def view_header(trace: dict[str, t.Any], is_legacy: bool) -> None:
     run = trace["run"]
     if is_legacy:
diff --git a/mkdocs.yaml b/mkdocs.yaml
index c17f56f..36d4686 100644
--- a/mkdocs.yaml
+++ b/mkdocs.yaml
@@ -9,6 +9,7 @@ nav:
   - Overview: index.md
   - Install: install.md
   - Basic Usage: basic-usage.md
+  - Fit Planning: fit.md
   - Topics:
     - Loaders: topics/loaders.md