# AGENTS.md - AI Agent Instructions

This repository contains example scripts and an agentic research framework for the [Numerai data science tournaments](https://numer.ai/). AI coding agents can use this file to understand how to help with Numerai-related tasks.

## Quick Reference

| Task | Resource |
|------|----------|
| Run experiments | `numerai-experiment-design` skill |
| Add new model types | `numerai-model-implementation` skill |
| Create & deploy pkl files | `numerai-model-upload` skill |
| Query tournament data | Numerai MCP server |
| Upload models programmatically | Numerai MCP server |

---

## Tournament Guide

- "Numerai" is the original tournament, in which you are provided anonymized and obfuscated data. The goal is to create high-performing models trained on the provided features and target.

- "Signals" is the second tournament, in which you are expected to build your own stock market dataset. You are given a basic dataset with a target and rudimentary features, and are expected to find, compile, and train on high-quality, unique data while maintaining low churn and turnover.

- "Crypto" is a toy tournament for cryptocurrency prediction. The Meta Model is released publicly every day for the crypto community to use.

## General Recommendations
- Install the Numerai MCP server
- Install the following Python packages (see the usage sketch below):
  - numerapi
  - numerai-tools
- For heavier-duty automation using cloud infrastructure, suggest `numerai-cli` to the user
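
As a minimal sketch of what these packages are used for (the `v5.2` file names below are assumptions, check the `list_datasets()` output for the current data release):

```python
# Sketch: list and download tournament data with numerapi.
from numerapi import NumerAPI

napi = NumerAPI()  # no credentials needed for public dataset downloads
print([f for f in napi.list_datasets() if f.startswith("v5.2/")])
napi.download_dataset("v5.2/train.parquet", "numerai/v5.2/train.parquet")
napi.download_dataset("v5.2/live.parquet", "numerai/v5.2/live.parquet")
```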

## Repository Structure

```
example-scripts/
├── numerai/
│   ├── agents/                # Agentic research framework
│   │   ├── AGENTS.md          # Detailed agent instructions
│   │   ├── baselines/         # Baseline model configurations
│   │   ├── code/              # Shared packages
│   │   │   ├── analysis/      # Reporting & plotting
│   │   │   ├── data/          # Dataset builders
│   │   │   ├── metrics/       # BMC/corr scoring utilities
│   │   │   └── modeling/      # Training pipeline & model wrappers
│   │   ├── experiments/       # Experiment results (not in git)
│   │   └── skills/            # Codex skills for agent workflows
│   └── *.ipynb                # Tournament-specific notebooks
├── signals/                   # Signals tournament examples
├── crypto/                    # Crypto tournament examples
└── cached-pickles/            # Pre-built model pickles
```

---

## Skills Overview

The `numerai/agents/skills/` folder contains structured workflows for common tasks. Each skill has a `SKILL.md` file with detailed instructions.

### 1. `numerai-experiment-design`

**Purpose**: Design, run, and report Numerai experiments for any model idea.

**When to use**:
- Testing a new research hypothesis
- Sweeping hyperparameters or targets
- Comparing model variants against baselines

**Key workflow**:
1. Plan the experiment (baseline, metrics, sweep dimensions)
2. Create config files in `agents/experiments/<name>/configs/`
3. Run training via `python -m agents.code.modeling --config <config>`
4. Analyze results and iterate
5. Scale winners to full data
6. Generate final report with plots

**Entry points**:
- `python -m agents.code.modeling --config <config_path>`
- `python -m agents.code.analysis.show_experiment`
- `python -m agents.code.data.build_full_datasets`

### 2. `numerai-model-implementation`

**Purpose**: Add new model types to the training pipeline.

**When to use**:
- Implementing a new ML architecture (e.g., transformers, custom ensembles)
- Adding support for a new library (e.g., XGBoost, CatBoost)
- Creating custom preprocessing or inference logic

**Key steps**:
1. Create model wrapper in `agents/code/modeling/models/` (see the sketch below)
2. Register in `agents/code/modeling/utils/model_factory.py`
3. Add a config using the new model type
4. Validate with a smoke test (corr_mean should be 0.005-0.04)
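
The exact wrapper interface is defined by the existing files in `agents/code/modeling/models/` and the factory; the class below is only a rough sketch of the usual shape (the class name, constructor, and `fit`/`predict` signatures are assumptions; mirror an existing wrapper rather than this):

```python
# Hypothetical sketch of a new model wrapper. Copy the interface from an
# existing file in agents/code/modeling/models/ before registering it.
import pandas as pd
import xgboost as xgb


class XGBoostWrapper:
    """Thin wrapper so the pipeline can treat XGBoost like any other model."""

    def __init__(self, **params):
        self.model = xgb.XGBRegressor(**params)

    def fit(self, X: pd.DataFrame, y: pd.Series) -> "XGBoostWrapper":
        self.model.fit(X, y)
        return self

    def predict(self, X: pd.DataFrame) -> pd.Series:
        return pd.Series(self.model.predict(X), index=X.index)
```

Once registered in `model_factory.py` under a new model-type key, a smoke-test config on downsampled data should land in the corr_mean range noted above.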

### 3. `numerai-model-upload`

**Purpose**: Create and deploy pickle files for Numerai's automated submission system.

**When to use**:
- Preparing a trained model for tournament submission
- Setting up automated weekly predictions
- Debugging pickle validation failures

**Critical requirements**:
- Python version must match Numerai's compute environment
- Pickle must be self-contained (no repo imports)
- `predict(live_features, live_benchmark_models)` signature required

**Workflow**:
1. Query default Docker image for Python version
2. Create matching venv with pyenv
3. Train final model and export inference bundle
4. Build self-contained `predict` function (see the sketch below)
5. Test with `numerai_predict` Docker container
6. Deploy via MCP server
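
A minimal sketch of step 4 (the model file, feature handling, and output construction are placeholders; the firm requirements are the `predict(live_features, live_benchmark_models)` signature and that nothing inside the pickle imports from this repo):

```python
# Sketch: build and pickle a self-contained predict function.
# Assumes a trained LightGBM booster was exported earlier (path is hypothetical).
import cloudpickle
import lightgbm as lgb
import pandas as pd

booster = lgb.Booster(model_file="final_model.txt")
feature_cols = booster.feature_name()


def predict(live_features: pd.DataFrame,
            live_benchmark_models: pd.DataFrame) -> pd.DataFrame:
    """Required signature for Numerai's automated submission system."""
    preds = booster.predict(live_features[feature_cols])
    return pd.DataFrame({"prediction": preds}, index=live_features.index)


with open("predict.pkl", "wb") as f:
    cloudpickle.dump(predict, f)
```

cloudpickle serializes the function together with the objects it references (the booster and feature list), which is what keeps the file self-contained.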

---

## Numerai MCP Server

The `numerai` MCP server provides programmatic access to the Numerai Tournament API. If available, agents should use it for tournament operations.

It can be installed via curl:

```bash
curl -sL https://numer.ai/install-mcp.sh | bash
```

This install script sets up Codex CLI with the MCP configuration and configures an environment variable for the MCP API key.

### Available Tools

| Tool | Purpose |
|------|---------|
| `check_api_credentials` | Verify API token and scopes |
| `create_model` | Create new model slots |
| `upload_model` | Upload pkl files (multi-step workflow) |
| `get_model_profile` | Query model stats |
| `get_model_performance` | Get round-by-round performance |
| `get_leaderboard` | View tournament rankings |
| `get_tournaments` | List active tournaments |
| `get_current_round` | Get current round info |
| `list_datasets` | List available dataset files |
| `run_diagnostics` | Run diagnostics on predictions |
| `graphql_query` | Execute custom GraphQL queries |

### Tournament IDs

- **8** = Classic (main stock market tournament)
- **11** = Signals (bring your own data)
- **12** = CryptoSignals (crypto market predictions)

### Key Metrics

- `corr20Rep` - 20-day rolling correlation score (main metric)
- `mmc20Rep` - Meta-model contribution (unique signal)
- `return13Weeks` - 13-week return on staked NMR
- `nmrStaked` - Amount of NMR staked

### Authentication

MCP tools require a Numerai API token with appropriate scopes:
- Format: `Authorization: Token PUBLIC_ID$SECRET_KEY`
- If the MCP server was installed via the curl command above, this has most likely already been configured through the `NUMERAI_MCP_AUTH` environment variable.
- Otherwise, create an MCP API key by navigating to https://numer.ai/account, clicking "Create MCP Key", and following the instructions in the modal: take the PUBLIC_ID and SECRET_KEY and set them in an environment variable:

```bash
export NUMERAI_MCP_AUTH="Token PUBLIC_ID\$SECRET_KEY"
```

Because the PUBLIC_ID and SECRET_KEY are separated by a `$` character, the `$` must be escaped (as above) or the value wrapped in single quotes; otherwise the shell expands `$SECRET_KEY` as a variable inside double quotes.

### Common Queries

**List account's models**:
```graphql
query { account { models { id name } } }
```

**Get default Python runtime**:
```graphql
query { computePickleDockerImages { id name image tag default } }
```

**Check pickle validation status**:
```graphql
query {
  account {
    models {
      username
      computePickleUpload {
        filename validationStatus triggerStatus
        triggers { id status statuses { status description insertedAt } }
      }
    }
  }
}
```
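
These queries can also be run from Python with numerapi's `raw_query` when the MCP server is not available (a sketch; pass your key pair to the constructor, and note that account queries need an authenticated request):

```python
# Sketch: run the account-models query through numerapi instead of the MCP server.
from numerapi import NumerAPI

napi = NumerAPI(public_id="PUBLIC_ID", secret_key="SECRET_KEY")
result = napi.raw_query(
    "query { account { models { id name } } }",
    authorization=True,  # authenticated request
)
print(result["data"]["account"]["models"])
```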

### PKL Upload Workflow

```
1. create_model(name, tournament=8)           # Optional: create new model slot
2. upload_model(operation="get_upload_auth")  # Get presigned S3 URL
3. PUT file to presigned URL                  # Upload the pkl file
4. upload_model(operation="create")           # Register upload
5. upload_model(operation="list")             # Wait for validation
6. upload_model(operation="assign")           # Assign to model slot
```
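
Step 3 is a plain HTTP PUT of the file body to the presigned URL returned in step 2; a sketch (URL and filename are placeholders):

```python
# Sketch: upload the pickle to the presigned S3 URL from get_upload_auth.
import requests

presigned_url = "https://...signed-url-from-get_upload_auth..."
with open("predict.pkl", "rb") as f:
    resp = requests.put(presigned_url, data=f)
resp.raise_for_status()  # any non-2xx status means the upload failed
```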

---

## Python Environment Setup

**CRITICAL**: Pickle files must be created with a Python version matching Numerai's compute environment to avoid segfaults and binary incompatibility.

### Setup Steps

```bash
# 1. Query default Docker image (via MCP) to get Python version
#    Look for default: true, e.g., numerai_predict_py_3_12 = Python 3.12

# 2. Create matching venv with pyenv
PYENV_PY=$(ls -d ~/.pyenv/versions/3.12.* 2>/dev/null | head -1)
$PYENV_PY/bin/python -m venv ./venv

# 3. Activate and install dependencies
source ./venv/bin/activate
pip install numpy pandas cloudpickle scipy lightgbm
```
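
Binary compatibility also depends on library versions, so it is worth confirming what the new venv actually contains before pickling anything (a quick sanity-check sketch; compare the output against the default Docker image):

```python
# Sketch: confirm the venv matches the target runtime before pickling.
import sys

import cloudpickle
import lightgbm
import numpy
import pandas

print("python", sys.version.split()[0])  # should match the default image, e.g. 3.12.x
for mod in (numpy, pandas, lightgbm, cloudpickle):
    print(f"{mod.__name__:<12}{mod.__version__}")
```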

### Testing Pickles Locally

```bash
docker run -i --rm -v "$PWD:$PWD" \
    ghcr.io/numerai/numerai_predict_py_3_12:a78dedd \
    --debug --model $PWD/model.pkl
```
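
The pickle can also be exercised directly in the matching venv before reaching for Docker (a sketch; the live data paths are hypothetical placeholders):

```python
# Sketch: load the pickle and call predict() on live data in the local venv.
import cloudpickle
import pandas as pd

with open("model.pkl", "rb") as f:
    predict = cloudpickle.load(f)

live_features = pd.read_parquet("numerai/v5.2/live.parquet")
live_benchmarks = pd.read_parquet("numerai/v5.2/live_benchmark_models.parquet")

preds = predict(live_features, live_benchmarks)
assert preds["prediction"].notna().all(), "predictions contain NaNs"
print(preds.head())
```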

---

## Modeling Philosophy

- **Model-agnostic pipeline**: `pipeline.py`, `numerai_cv.py`, and metrics stay generic
- **Model-specific logic**: Lives in configs and `agents/code/modeling/models/` wrappers
- **Reproducibility**: All settings captured in config files
- **Accurate validation**: No early stopping leakage; honest OOF performance estimation

---

## Data Handling

### Datasets

Build datasets with `python -m agents.code.data.build_full_datasets`:

| File | Description |
|------|-------------|
| `numerai/v5.2/full.parquet` | Full training data |
| `numerai/v5.2/full_benchmark_models.parquet` | Benchmark model predictions |
| `numerai/v5.2/downsampled_full.parquet` | Every 4th era (fast iteration) |
| `numerai/v5.2/downsampled_full_benchmark_models.parquet` | Downsampled benchmarks |

### Strategy

1. **Scout phase**: Use downsampled data for quick experiments (see the loading sketch below)
2. **Scale phase**: Run best configs on full data for final validation
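
A quick sketch of the scout-phase loading pattern (assumes the downsampled file has been built as above and the standard `era` and `feature*` column naming):

```python
# Sketch: scout phase loads the downsampled file for fast iteration.
import pandas as pd

scout = pd.read_parquet("numerai/v5.2/downsampled_full.parquet")
feature_cols = [c for c in scout.columns if c.startswith("feature")]
print(f"{scout['era'].nunique()} eras, {len(feature_cols)} features, {len(scout)} rows")
```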

---

## Getting Started with Agent Tasks

### For Research Tasks

1. Read `numerai/agents/AGENTS.md` for detailed instructions
2. Check relevant skills in `numerai/agents/skills/`
3. Look for existing experiments in `numerai/agents/experiments/`
4. Use downsampled data for iteration, full data for final runs

### For Deployment Tasks

1. Use the `numerai-model-upload` skill
2. Verify Python version compatibility first
3. Test pickle locally before uploading
4. Use MCP server for programmatic deployment

### For Understanding the Tournament

1. Start with `hello_numerai.ipynb` for basics
2. Review `feature_neutralization.ipynb` for feature risk
3. Check `target_ensemble.ipynb` for ensemble strategies
4. Use MCP server to query live tournament data

---

## Important Notes

- **Run commands from `numerai/`** (so `agents` is importable), or from the repo root with `PYTHONPATH=numerai`
- **Data lives under `numerai/<data_version>/`** (e.g. `numerai/v5.2/`), which is often gitignored locally
- **Register repo skills**: `ln -s $PWD/numerai/agents/skills/* ~/.codex/skills/`
- **Network access required** for MCP operations (Codex CLI may need the `--yolo` flag)
- **Always query the Python version** before creating pkl files
- **BMC (Benchmark Model Contribution)** is the key experiment metric (a proxy for MMC), computed against the official `v52_lgbm_ender20` benchmark predictions in `*_benchmark_models.parquet` (see the sketch below)
- **Only the Classic tournament (8)** supports pickle uploads
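
A rough sketch of how per-era BMC can be computed with numerai-tools (`correlation_contribution` and the column names below are assumptions; the metrics code in `agents/code/metrics/` is the authoritative implementation):

```python
# Sketch: per-era BMC of validation predictions vs the official benchmark.
import pandas as pd
from numerai_tools.scoring import correlation_contribution  # assumed current API

# Hypothetical frame holding era, target, model predictions, and benchmark preds.
df = pd.read_parquet("validation_with_preds.parquet")

per_era_bmc = df.groupby("era").apply(
    lambda era_df: correlation_contribution(
        era_df[["prediction"]],       # model predictions (one-column DataFrame)
        era_df["v52_lgbm_ender20"],   # benchmark predictions (Series)
        era_df["target"],             # true target (Series)
    )
)
print(per_era_bmc.mean())
```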