Merged
16 commits
5070951
feat(explorer): consolidate diagnostic content into per-collection YAML
lewisjared Mar 6, 2026
9e89a6f
docs: Update collections from MBTT
lewisjared Mar 6, 2026
d529ae5
refactor(aft): use collection YAML files as single source of truth
lewisjared Mar 6, 2026
61adb9e
fix(collections): fix theme loading and improve error handling
lewisjared Mar 6, 2026
b5cefac
feat(explorer): add collection headers, plain language toggle, and ta…
lewisjared Mar 6, 2026
d4e5c34
chore: add a envrc file to automatically set the .env
lewisjared Mar 6, 2026
63d0d5c
feat: enhance SeriesMetadata with dimensions and simplify slug format…
lewisjared Mar 7, 2026
4da37d1
feat(series): rewrite series chart with canvas rendering and dark mode
lewisjared Mar 7, 2026
8cecc75
refactor(explorer): rename ExplorerCard to ExplorerCardGroup and impr…
lewisjared Mar 7, 2026
54355e7
feat(explorer): sort box-whisker chart categories by known orderings
lewisjared Mar 7, 2026
4c0394a
refactor(diagnostics): remove dead code and add memoization to figure…
lewisjared Mar 7, 2026
5e6a5ac
feat(collections): add explorer card content and remove unused OHC co…
lewisjared Mar 7, 2026
9040507
refactor(diagnostics): split metadata into per-provider files and sur…
lewisjared Mar 7, 2026
e95eaac
docs: update content
lewisjared Mar 8, 2026
8d4caca
merge: resolve conflict in figureGallery.tsx after merging origin/main
lewisjared Mar 8, 2026
aa0c02c
fix: resolve CI failures in frontend typecheck and backend AFT tests
lewisjared Mar 8, 2026
1 change: 1 addition & 0 deletions .envrc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
dotenv_if_exists .env
71 changes: 44 additions & 27 deletions README.md
@@ -3,10 +3,11 @@
This repository contains the API and Frontend for the Climate Rapid Evaluation Framework (REF). This system enables comprehensive benchmarking and evaluation of Earth system models against observational data, integrating with the `climate-ref` core library.

This is a full-stack application that consists of a:
* **Backend**: FastAPI API (Python 3.11+)
* FastAPI, Pydantic, SQLAlchemy, OpenAPI documentation
* **Frontend**: React frontend (React 19, TypeScript)
* Vite, Tanstack Router, Tanstack Query, Tailwind CSS, Shadcn/ui, Recharts

**Status**: Alpha

@@ -15,29 +16,41 @@ This is a full-stack application that consists of a:
[![Last Commit](https://img.shields.io/github/last-commit/Climate-REF/ref-app.svg)](https://github.com/Climate-REF/climate-ref/commits/main)
[![Contributors](https://img.shields.io/github/contributors/Climate-REF/ref-app.svg)](https://github.com/Climate-REF/ref-app/graphs/contributors)


## Overview

The Climate REF Web Application provides researchers and scientists with tools to:
* Enable rapid model evaluation and near real-time assessment of climate model performance.
* Provide standardized, reproducible evaluation metrics across different models and datasets.
* Make complex climate model diagnostics accessible through an intuitive web interface.
* Ensure evaluation processes are transparent and results are traceable.
* Consolidate various diagnostic tools into a unified framework.
* Automate the execution of diagnostics when new datasets are available.
* Help researchers find and understand available datasets and their evaluation status.
* Enable easy comparison of model performance across different versions and experiments.

## Updating Diagnostic Content

Display metadata for each AFT diagnostic collection (descriptions, explanations, plain-language summaries)
is maintained in YAML files under [`backend/static/collections/`](backend/static/collections/).
See the [collections README](backend/static/collections/README.md) for the full schema and instructions.

Diagnostic-level metadata overrides (display names, reference datasets, tags) are split into per-provider
YAML files under `backend/static/diagnostics/` (e.g. `pmp.yaml`, `esmvaltool.yaml`, `ilamb.yaml`),
which can be regenerated from the provider registry with `make generate-metadata`.

After changing content fields or adding new collections, regenerate the frontend TypeScript client with `make generate-client`.
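
A per-collection YAML file under `backend/static/collections/` might look like the following sketch. The field names (`id`, `title`, `description`, `plain_language_summary`, `tags`) are illustrative assumptions, not confirmed against the real schema; see the collections README for the authoritative format:

```yaml
# Hypothetical collection metadata file, e.g. sea-surface-temperature.yaml.
# Field names are illustrative only — consult backend/static/collections/README.md.
id: sea-surface-temperature
title: Sea Surface Temperature
description: >-
  Benchmarks modelled sea surface temperature against observational
  reference datasets.
plain_language_summary: >-
  How closely does each model reproduce observed ocean surface temperatures?
tags:
  - ocean
  - aft
```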

## Getting Started

### Prerequisites

* Python 3.11+ (with `uv` for package management)
* Node.js v20 and npm (for frontend)
* Database: SQLite (development/test) or PostgreSQL (production)
* Docker and Docker Compose (optional, for containerized deployment)

1. **Clone the repository**

```bash
git clone https://github.com/Climate-REF/ref-app.git
```

@@ -46,7 +59,7 @@ The Climate REF Web Application provides researchers and scientists with tools t

### Backend Setup

1. **Set up environment variables**

Create a `.env` file in the project root by copying the `.env.example` file.

@@ -56,35 +69,35 @@ The Climate REF Web Application provides researchers and scientists with tools t

Modify the `.env` to your needs. The `REF_CONFIGURATION` variable should point to the configuration directory for the REF, which defines the database connection string and other REF-specific settings.

2. **Install dependencies**

```bash
cd backend
make virtual-environment
```

3. **Start the backend server**

```bash
make dev
```

### Frontend Setup

1. **Generate Client**

```bash
make generate-client
```

2. **Install dependencies**

```bash
cd frontend
npm install
```

3. **Start the frontend server**

```bash
npm run dev
```

@@ -104,6 +117,9 @@ ref-app/
│ │ │ └── main.py # API router aggregation
│ │ ├── core/ # Core application logic (config, file handling, REF initialization)
│ │ └── models.py # Pydantic models for API responses
│ ├── static/
│ │ ├── collections/ # Per-collection YAML metadata (see collections/README.md)
│ │ └── diagnostics/ # Diagnostic metadata overrides
│ ├── tests/ # Backend test suite
│ ├── pyproject.toml # Python dependencies and project metadata
│ └── uv.lock # uv lock file for reproducible dependencies
@@ -125,6 +141,7 @@ ref-app/
## API Documentation

When the backend is running, API documentation is available at:
* Swagger UI: <http://localhost:8001/docs>
* ReDoc: <http://localhost:8001/redoc>
* OpenAPI JSON: <http://localhost:8001/openapi.json>
2 changes: 1 addition & 1 deletion backend/pyproject.toml
@@ -11,7 +11,7 @@ dependencies = [
"psycopg[binary]<4.0.0,>=3.1.13",
"pydantic-settings<3.0.0,>=2.2.1",
"sentry-sdk[fastapi]>=2.0.0",
"climate-ref[aft-providers,postgres]>=0.12.0",
"climate-ref[aft-providers,postgres]>=0.12.2",
"loguru",
"pyyaml>=6.0",
"fastapi-sqlalchemy-monitor>=1.1.3",
80 changes: 44 additions & 36 deletions backend/scripts/generate_metadata.py
@@ -1,16 +1,17 @@
"""
Generate diagnostic metadata YAML from the current provider registry.

This script bootstraps or updates the metadata.yaml file by iterating all
registered diagnostics and capturing their current state (display_name,
description, tags, reference_datasets). Existing values in metadata.yaml
This script bootstraps or updates the per-provider metadata files by iterating
all registered diagnostics and capturing their current state (display_name,
description, tags, reference_datasets). Existing values in the metadata files
take precedence over auto-generated values.

Usage:
cd backend && uv run python scripts/generate_metadata.py

Options:
--output PATH Write to a specific file (default: static/diagnostics/metadata.yaml)
--output PATH Write to a specific file (default: writes per-provider files
into static/diagnostics/)
--dry-run Print to stdout instead of writing to file
"""

@@ -72,45 +73,35 @@ def _build_entry(


def generate_metadata(output_path: Path | None = None, *, dry_run: bool = False) -> None:
"""Generate metadata.yaml from the provider registry, merging with existing values."""
"""Generate per-provider metadata YAML files from the provider registry, merging with existing values."""
settings = Settings()
ref_config = get_ref_config(settings)
database = get_database(ref_config)
provider_registry = get_provider_registry(ref_config)

# Load existing metadata (existing values take precedence)
default_metadata_path = backend_dir / "static" / "diagnostics" / "metadata.yaml"
metadata_path = output_path or default_metadata_path
existing_metadata = load_diagnostic_metadata(metadata_path)
# Load existing metadata from the directory (existing values take precedence)
default_metadata_dir = backend_dir / "static" / "diagnostics"
metadata_dir = output_path or default_metadata_dir
existing_metadata = load_diagnostic_metadata(metadata_dir)

# Iterate all registered diagnostics
generated: dict[str, dict[str, Any]] = {}
# Group diagnostics by provider
by_provider: dict[str, dict[str, dict[str, Any]]] = {}

with database.session.connection():
for provider_slug, diagnostics in provider_registry.metrics.items():
for diagnostic_slug, concrete_diagnostic in diagnostics.items():
key = f"{provider_slug}/{diagnostic_slug}"
generated[key] = _build_entry(key, diagnostic_slug, concrete_diagnostic, existing_metadata)
entry = _build_entry(key, diagnostic_slug, concrete_diagnostic, existing_metadata)
by_provider.setdefault(provider_slug, {})[key] = entry

# Also include any entries from existing metadata that weren't found in the registry
for key, metadata in existing_metadata.items():
if key not in generated:
generated[key] = _metadata_to_dict(metadata)

# Sort by key for consistent output
sorted_metadata = dict(sorted(generated.items()))

# Generate YAML output
yaml_content = yaml.dump(
sorted_metadata,
default_flow_style=False,
sort_keys=False,
allow_unicode=True,
width=120,
)
provider_slug = key.split("/")[0]
if key not in by_provider.get(provider_slug, {}):
by_provider.setdefault(provider_slug, {})[key] = _metadata_to_dict(metadata)

header = (
"# Diagnostic Metadata\n"
"# {provider} Diagnostic Metadata\n"
"#\n"
"# Auto-generated by: cd backend && uv run python scripts/generate_metadata.py\n"
"#\n"
@@ -120,15 +111,32 @@ def generate_metadata(output_path: Path | None = None, *, dry_run: bool = False)
"#\n\n"
)

output = header + yaml_content
total = 0
for provider_slug, entries in sorted(by_provider.items()):
sorted_entries = dict(sorted(entries.items()))
total += len(sorted_entries)

yaml_content = yaml.dump(
sorted_entries,
default_flow_style=False,
sort_keys=False,
allow_unicode=True,
width=120,
)

output = header.format(provider=provider_slug) + yaml_content

if dry_run:
print(f"--- {provider_slug}.yaml ---")
print(output)
else:
metadata_dir.mkdir(parents=True, exist_ok=True)
file_path = metadata_dir / f"{provider_slug}.yaml"
file_path.write_text(output)
print(f"Generated metadata written to {file_path} ({len(sorted_entries)} diagnostics)")

if dry_run:
print(output)
else:
metadata_path.parent.mkdir(parents=True, exist_ok=True)
metadata_path.write_text(output)
print(f"Generated metadata written to {metadata_path}")
print(f"Total diagnostics: {len(sorted_metadata)}")
if not dry_run:
print(f"Total diagnostics across all providers: {total}")


def main() -> None:
@@ -137,7 +145,7 @@ def main() -> None:
"--output",
type=Path,
default=None,
help="Output file path (default: static/diagnostics/metadata.yaml)",
help="Output directory (default: static/diagnostics/)",
)
parser.add_argument(
"--dry-run",
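The core change in this refactor is grouping `provider/diagnostic` keys into per-provider buckets before emitting one YAML file per provider. A minimal standalone sketch of that grouping step (entry contents are illustrative placeholders, not real registry data):

```python
from typing import Any


def group_by_provider(
    entries: dict[str, dict[str, Any]],
) -> dict[str, dict[str, dict[str, Any]]]:
    """Group entries keyed as 'provider/diagnostic' into per-provider buckets."""
    by_provider: dict[str, dict[str, dict[str, Any]]] = {}
    for key, entry in entries.items():
        # The provider slug is everything before the first '/'.
        provider_slug = key.split("/")[0]
        by_provider.setdefault(provider_slug, {})[key] = entry
    return by_provider


# Illustrative entries only — not actual diagnostic metadata.
grouped = group_by_provider(
    {
        "pmp/enso": {"display_name": "ENSO"},
        "ilamb/gpp": {"display_name": "GPP"},
        "pmp/mjo": {"display_name": "MJO"},
    }
)
print(sorted(grouped))         # ['ilamb', 'pmp']
print(sorted(grouped["pmp"]))  # ['pmp/enso', 'pmp/mjo']
```

Each bucket is then sorted and dumped to `<provider_slug>.yaml`, which keeps diffs localized to the provider whose metadata actually changed.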
3 changes: 2 additions & 1 deletion backend/src/ref_backend/api/main.py
@@ -1,11 +1,12 @@
from fastapi import APIRouter

from ref_backend.api.routes import aft, datasets, diagnostics, executions, results, utils
from ref_backend.api.routes import aft, datasets, diagnostics, executions, explorer, results, utils

api_router = APIRouter()
api_router.include_router(aft.router)
api_router.include_router(datasets.router)
api_router.include_router(diagnostics.router)
api_router.include_router(executions.router)
api_router.include_router(explorer.router)
api_router.include_router(results.router)
api_router.include_router(utils.router)
40 changes: 40 additions & 0 deletions backend/src/ref_backend/api/routes/explorer.py
@@ -0,0 +1,40 @@
from fastapi import APIRouter, HTTPException

from ref_backend.core.collections import (
AFTCollectionDetail,
AFTCollectionSummary,
ThemeDetail,
ThemeSummary,
get_collection_by_id,
get_collection_summaries,
get_theme_by_slug,
get_theme_summaries,
)

router = APIRouter(prefix="/explorer", tags=["Explorer"])


@router.get("/collections/", response_model=list[AFTCollectionSummary])
async def list_collections() -> list[AFTCollectionSummary]:
return get_collection_summaries()


@router.get("/collections/{collection_id}", response_model=AFTCollectionDetail)
async def get_collection(collection_id: str) -> AFTCollectionDetail:
result = get_collection_by_id(collection_id)
if result is None:
raise HTTPException(status_code=404, detail=f"Collection '{collection_id}' not found")
return result


@router.get("/themes/", response_model=list[ThemeSummary])
async def list_themes() -> list[ThemeSummary]:
return get_theme_summaries()


@router.get("/themes/{theme_slug}", response_model=ThemeDetail)
async def get_theme(theme_slug: str) -> ThemeDetail:
result = get_theme_by_slug(theme_slug)
if result is None:
raise HTTPException(status_code=404, detail=f"Theme '{theme_slug}' not found")
return result