
Commit 11209f0

michael-phillips-data, hosted-agent, joshterrill, and ndharasz authored

Mikep/agents (#184)

* fix
* wip
* more agents stuff
* update ignore
* cleanup
* updates
* first pass at updating documentation for MCP support and Python tools for not downloading data if we already have it
* remove consistency
* add AGENTS.md files and remove old notebooks
* remove MLP regressor stuff
* clean up metrics
* more skills
* good yaml
* persistence
* delete experiments
* delete unused
* updated MCP install and usage docs

Co-authored-by: hosted-agent <hosted-agent@users.noreply.github.com>
Co-authored-by: Josh Terrill <josh@numer.ai>
Co-authored-by: Noah Harasz <noah@numer.ai>
1 parent ecdcfab commit 11209f0

37 files changed

Lines changed: 4586 additions & 10091 deletions

.cursorignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+!**/v5.0/
+!**/v5.1/
+!**/v5.2/
```

.gitignore

Lines changed: 6 additions & 2 deletions
```diff
@@ -79,7 +79,7 @@ celerybeat-schedule
 .env

 # virtualenv
-venv/
+venv*/
 ENV/

 # Spyder project settings
@@ -93,10 +93,14 @@ ENV/
 *.parquet
 *.json
 *.model
+*.png

 .idea
 example_model.xgb

 .DS_Store

-v5.0/
+v5.0/
+v5.1/
+v5.2/
+.venv/
```

AGENTS.md

Lines changed: 305 additions & 0 deletions
# AGENTS.md - AI Agent Instructions

This repository contains example scripts and an agentic research framework for the [Numerai data science tournaments](https://numer.ai/). AI coding agents can use this file to understand how to help with Numerai-related tasks.

## Quick Reference

| Task | Resource |
|------|----------|
| Run experiments | `numerai-experiment-design` skill |
| Add new model types | `numerai-model-implementation` skill |
| Create & deploy pkl files | `numerai-model-upload` skill |
| Query tournament data | Numerai MCP server |
| Upload models programmatically | Numerai MCP server |

---
## Tournament Guide

- "Numerai" is the original tournament, in which you are provided anonymized and obfuscated data. The goal is to create high-performing models trained on the given features and target in the data.

- "Signals" is the second tournament, in which you are expected to build your own stock market dataset. You are given a basic dataset with a target and rudimentary features. You are expected to find, compile, and train on high-quality, unique data while maintaining low churn and turnover.

- "Crypto" is a toy tournament for cryptocurrency prediction. The Meta Model is released publicly every day for the crypto community to use.

## General Recommendations

- Install the Numerai MCP server
- Install the following Python packages:
  - numerapi
  - numerai-tools
- For heavier-duty automation on cloud infrastructure, suggest numerai-cli to the user
## Repository Structure

```
example-scripts/
├── numerai/
│   ├── agents/           # Agentic research framework
│   │   ├── AGENTS.md     # Detailed agent instructions
│   │   ├── baselines/    # Baseline model configurations
│   │   ├── code/         # Shared packages
│   │   │   ├── analysis/ # Reporting & plotting
│   │   │   ├── data/     # Dataset builders
│   │   │   ├── metrics/  # BMC/corr scoring utilities
│   │   │   └── modeling/ # Training pipeline & model wrappers
│   │   ├── experiments/  # Experiment results (not in git)
│   │   └── skills/       # Codex skills for agent workflows
│   └── *.ipynb           # Tournament-specific notebooks
├── signals/              # Signals tournament examples
├── crypto/               # Crypto tournament examples
└── cached-pickles/       # Pre-built model pickles
```

---
## Skills Overview

The `numerai/agents/skills/` folder contains structured workflows for common tasks. Each skill has a `SKILL.md` file with detailed instructions.

### 1. `numerai-experiment-design`

**Purpose**: Design, run, and report Numerai experiments for any model idea.

**When to use**:
- Testing a new research hypothesis
- Sweeping hyperparameters or targets
- Comparing model variants against baselines

**Key workflow**:
1. Plan the experiment (baseline, metrics, sweep dimensions)
2. Create config files in `agents/experiments/<name>/configs/`
3. Run training via `python -m agents.code.modeling --config <config>`
4. Analyze results and iterate
5. Scale winners to full data
6. Generate final report with plots

**Entry points**:
- `python -m agents.code.modeling --config <config_path>`
- `python -m agents.code.analysis.show_experiment`
- `python -m agents.code.data.build_full_datasets`
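The sweep step of this workflow can be sketched in Python. Every field name below is a hypothetical illustration only; the real config schema is whatever the repo's `agents/experiments/<name>/configs/` files define:

```python
# Hypothetical sketch of generating sweep configs from a baseline.
# All keys ("dataset", "model_type", "params", ...) are assumptions,
# not the repo's actual schema.
from itertools import product

base_config = {
    "dataset": "downsampled_full",  # scout phase: every 4th era
    "model_type": "lgbm",
    "target": "target",
    "params": {"learning_rate": 0.01, "n_estimators": 2000},
}

# Two sweep dimensions to compare against the baseline.
sweep = {"learning_rate": [0.005, 0.01, 0.02], "num_leaves": [15, 31]}

configs = []
for lr, leaves in product(sweep["learning_rate"], sweep["num_leaves"]):
    cfg = {
        **base_config,
        "params": {**base_config["params"],
                   "learning_rate": lr, "num_leaves": leaves},
    }
    configs.append(cfg)
# configs now holds one run per sweep point, ready to write out as files
```

Each generated dict would then be serialized to its own config file and passed to `python -m agents.code.modeling --config <config_path>`.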
### 2. `numerai-model-implementation`

**Purpose**: Add new model types to the training pipeline.

**When to use**:
- Implementing a new ML architecture (e.g., transformers, custom ensembles)
- Adding support for a new library (e.g., XGBoost, CatBoost)
- Creating custom preprocessing or inference logic

**Key steps**:
1. Create a model wrapper in `agents/code/modeling/models/`
2. Register it in `agents/code/modeling/utils/model_factory.py`
3. Add a config using the new model type
4. Validate with a smoke test (corr_mean should be 0.005-0.04)
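The wrapper-plus-factory pattern in steps 1-2 can be sketched as follows. The class, decorator, and registry names here are hypothetical; the real interface is defined by the code in `agents/code/modeling/`:

```python
# Hedged sketch of a model wrapper and factory registry (names assumed);
# closed-form ridge regression stands in for a real model library.
import numpy as np

MODEL_REGISTRY: dict[str, type] = {}

def register_model(name: str):
    """Class decorator that makes a wrapper discoverable by config name."""
    def deco(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return deco

@register_model("ridge_example")
class RidgeExample:
    """Minimal wrapper exposing the fit/predict surface the pipeline expects."""
    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.coef_ = None

    def fit(self, X: np.ndarray, y: np.ndarray) -> "RidgeExample":
        n_features = X.shape[1]
        # Ridge closed form: (X'X + alpha*I)^-1 X'y
        self.coef_ = np.linalg.solve(
            X.T @ X + self.alpha * np.eye(n_features), X.T @ y
        )
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return X @ self.coef_

def build_model(config: dict):
    """Factory entry point: instantiate the wrapper named in the config."""
    cls = MODEL_REGISTRY[config["model_type"]]
    return cls(**config.get("params", {}))

model = build_model({"model_type": "ridge_example", "params": {"alpha": 0.5}})
rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 5)), rng.normal(size=64)
preds = model.fit(X, y).predict(X)
```

A new model type then only touches the wrapper file and the registry, leaving the generic pipeline untouched.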
### 3. `numerai-model-upload`

**Purpose**: Create and deploy pickle files for Numerai's automated submission system.

**When to use**:
- Preparing a trained model for tournament submission
- Setting up automated weekly predictions
- Debugging pickle validation failures

**Critical requirements**:
- Python version must match Numerai's compute environment
- Pickle must be self-contained (no repo imports)
- `predict(live_features, live_benchmark_models)` signature required
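A minimal sketch of that required signature is below. The model here is a stub (a column mean), standing in for the trained model a real pickle would bundle inside the function with no repo imports:

```python
# Sketch of the required entry point; the "model" is a placeholder stub.
import pandas as pd

def predict(live_features: pd.DataFrame,
            live_benchmark_models: pd.DataFrame) -> pd.DataFrame:
    # A real pickle would hold the trained model here, self-contained.
    raw = live_features.mean(axis=1)             # placeholder for model output
    ranked = raw.rank(pct=True, method="first")  # map scores into (0, 1]
    return pd.DataFrame({"prediction": ranked}, index=live_features.index)

# Toy call with made-up feature columns and ids:
live = pd.DataFrame({"feature_a": [0.0, 0.25, 1.0],
                     "feature_b": [0.5, 0.75, 0.25]},
                    index=["id1", "id2", "id3"])
out = predict(live, pd.DataFrame(index=live.index))
```

Whatever replaces the stub, the function must keep this exact two-argument signature and return one prediction per live id.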
**Workflow**:
1. Query the default Docker image for the Python version
2. Create a matching venv with pyenv
3. Train the final model and export an inference bundle
4. Build a self-contained `predict` function
5. Test with the `numerai_predict` Docker container
6. Deploy via the MCP server

---

## Numerai MCP Server

The `numerai` MCP server provides programmatic access to the Numerai Tournament API. If available, agents should use it for tournament operations.

It can be installed with:

```bash
curl -sL https://numer.ai/install-mcp.sh | bash
```

This install script configures Codex CLI with the MCP server and sets an environment variable for the MCP API key.
### Available Tools

| Tool | Purpose |
|------|---------|
| `check_api_credentials` | Verify API token and scopes |
| `create_model` | Create new model slots |
| `upload_model` | Upload pkl files (multi-step workflow) |
| `get_model_profile` | Query model stats |
| `get_model_performance` | Get round-by-round performance |
| `get_leaderboard` | View tournament rankings |
| `get_tournaments` | List active tournaments |
| `get_current_round` | Get current round info |
| `list_datasets` | List available dataset files |
| `run_diagnostics` | Run diagnostics on predictions |
| `graphql_query` | Execute custom GraphQL queries |

### Tournament IDs

- **8** = Classic (main stock market tournament)
- **11** = Signals (bring your own data)
- **12** = CryptoSignals (crypto market predictions)

### Key Metrics

- `corr20Rep` - 20-day rolling correlation score (main metric)
- `mmc20Rep` - Meta Model contribution (unique signal)
- `return13Weeks` - 13-week return on staked NMR
- `nmrStaked` - Amount of NMR staked
### Authentication

MCP tools require a Numerai API token with appropriate scopes:
- Format: `Authorization: Token PUBLIC_ID$SECRET_KEY`
- If the MCP server was installed via the curl command above, this has most likely already been configured through the `NUMERAI_MCP_AUTH` environment variable.
- Otherwise, create an MCP API key at https://numer.ai/account by clicking "Create MCP Key" and following the instructions in the modal: take the PUBLIC_ID and SECRET_KEY it gives you and set them in an environment variable:

```bash
export NUMERAI_MCP_AUTH="Token PUBLIC_ID\$SECRET_KEY"
```

Because PUBLIC_ID and SECRET_KEY are separated by a `$` character, the `$` must be escaped (or the value single-quoted) when set via `export`.
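If an agent needs the two halves separately (for example, `numerapi.NumerAPI` takes a public id and secret key as separate arguments), the combined token can be split back apart; a stdlib-only sketch:

```python
# Split "Token PUBLIC_ID$SECRET_KEY" back into its parts.
import os

# Illustration only: in practice the variable is already set in the environment.
os.environ["NUMERAI_MCP_AUTH"] = "Token PUBLIC_ID$SECRET_KEY"

auth = os.environ["NUMERAI_MCP_AUTH"]
scheme, _, credential = auth.partition(" ")   # "Token", "PUBLIC_ID$SECRET_KEY"
public_id, _, secret_key = credential.partition("$")
```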
### Common Queries

**List the account's models**:
```graphql
query { account { models { id name } } }
```

**Get the default Python runtime**:
```graphql
query { computePickleDockerImages { id name image tag default } }
```

**Check pickle validation status**:
```graphql
query {
  account {
    models {
      username
      computePickleUpload {
        filename validationStatus triggerStatus
        triggers { id status statuses { status description insertedAt } }
      }
    }
  }
}
```
### PKL Upload Workflow

```
1. create_model(name, tournament=8)          # Optional: create a new model slot
2. upload_model(operation="get_upload_auth") # Get a presigned S3 URL
3. PUT the file to the presigned URL         # Upload the pkl file
4. upload_model(operation="create")          # Register the upload
5. upload_model(operation="list")            # Wait for validation
6. upload_model(operation="assign")          # Assign to the model slot
```
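Step 3 of the workflow above is a plain HTTP PUT. A stdlib-only sketch — the URL and payload are placeholders, and nothing is sent until `urlopen` is actually called:

```python
# Sketch of PUTting a pkl to a presigned S3 URL (placeholder URL/payload).
import urllib.request

presigned_url = "https://example-bucket.s3.amazonaws.com/model.pkl?X-Amz-Signature=PLACEHOLDER"
payload = b"...pkl bytes..."  # in practice: open("model.pkl", "rb").read()

req = urllib.request.Request(
    presigned_url,
    data=payload,
    method="PUT",
    headers={"Content-Type": "application/octet-stream"},
)
# urllib.request.urlopen(req)  # uncomment to perform the actual upload
```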
---

## Python Environment Setup

**CRITICAL**: Pickle files must be created with a Python version matching Numerai's compute environment to avoid segfaults and binary incompatibility.

### Setup Steps

```bash
# 1. Query the default Docker image (via MCP) to get the Python version
#    Look for default: true, e.g., numerai_predict_py_3_12 = Python 3.12

# 2. Create a matching venv with pyenv
PYENV_PY=$(ls -d ~/.pyenv/versions/3.12.* 2>/dev/null | head -1)
$PYENV_PY/bin/python -m venv ./venv

# 3. Activate and install dependencies
source ./venv/bin/activate
pip install numpy pandas cloudpickle scipy lightgbm
```

### Testing Pickles Locally

```bash
docker run -i --rm -v "$PWD:$PWD" \
  ghcr.io/numerai/numerai_predict_py_3_12:a78dedd \
  --debug --model $PWD/model.pkl
```
---

## Modeling Philosophy

- **Model-agnostic pipeline**: `pipeline.py`, `numerai_cv.py`, and metrics stay generic
- **Model-specific logic**: Lives in configs and `agents/code/modeling/models/` wrappers
- **Reproducibility**: All settings captured in config files
- **Accurate validation**: No early-stopping leakage; honest OOF performance estimation

---

## Data Handling

### Datasets

Build datasets with `python -m agents.code.data.build_full_datasets`:

| File | Description |
|------|-------------|
| `numerai/v5.2/full.parquet` | Full training data |
| `numerai/v5.2/full_benchmark_models.parquet` | Benchmark model predictions |
| `numerai/v5.2/downsampled_full.parquet` | Every 4th era (fast iteration) |
| `numerai/v5.2/downsampled_full_benchmark_models.parquet` | Downsampled benchmarks |

### Strategy

1. **Scout phase**: Use downsampled data for quick experiments
2. **Scale phase**: Run the best configs on full data for final validation
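The "every 4th era" downsampling behind `downsampled_full.parquet` can be illustrated on a toy frame (the `era` column name matches Numerai data; the rest is synthetic):

```python
# Toy illustration of era-level downsampling on a synthetic frame.
import pandas as pd

df = pd.DataFrame({
    "era": [f"{i:04d}" for i in range(1, 13) for _ in range(2)],  # eras 0001-0012
    "feature_x": range(24),
})

eras = sorted(df["era"].unique())
kept = set(eras[::4])                 # every 4th era: 0001, 0005, 0009
scout = df[df["era"].isin(kept)]      # the fast-iteration "scout" dataset
```

Downsampling whole eras (rather than random rows) preserves per-era structure, which is what era-wise metrics like corr and BMC are computed over.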
---

## Getting Started with Agent Tasks

### For Research Tasks

1. Read `numerai/agents/AGENTS.md` for detailed instructions
2. Check the relevant skills in `numerai/agents/skills/`
3. Look for existing experiments in `numerai/agents/experiments/`
4. Use downsampled data for iteration, full data for final runs

### For Deployment Tasks

1. Use the `numerai-model-upload` skill
2. Verify Python version compatibility first
3. Test the pickle locally before uploading
4. Use the MCP server for programmatic deployment

### For Understanding the Tournament

1. Start with `hello_numerai.ipynb` for the basics
2. Review `feature_neutralization.ipynb` for feature risk
3. Check `target_ensemble.ipynb` for ensemble strategies
4. Use the MCP server to query live tournament data

---

## Important Notes

- **Run commands from `numerai/`** (so `agents` is importable), or from the repo root with `PYTHONPATH=numerai`
- **Data lives under `numerai/<data_version>/`** (e.g. `numerai/v5.2/`), which is often gitignored locally
- **Register repo skills**: `ln -s $PWD/numerai/agents/skills/* ~/.codex/skills/`
- **Network access is required** for MCP operations (Codex CLI may need the `--yolo` flag)
- **Always query the Python version** before creating pkl files
- **BMC (Benchmark Model Contribution)** is the key experiment metric (a proxy for MMC), computed against the official `v52_lgbm_ender20` benchmark predictions in `*_benchmark_models.parquet`
- **Only the Classic tournament (8)** supports pickle uploads

README.md

Lines changed: 9 additions & 0 deletions
[![](https://dcbadge.vercel.app/api/server/numerai)](https://discord.gg/numerai)

## Using Agents

Numerai is quickly developing open-source agent skills for you to use in the tournament. You can start architecting your very own AI scientist. For example:

```
git clone git@github.com:numerai/example-scripts.git && cd example-scripts
export NUMERAI_ID=<api key id> NUMERAI_SECRET=<api key secret>
codex exec --yolo "find the best neural network architecture to predict target ender"
```

## Notebooks
