A project pack teaches the intelligence engine about a specific research category.
Create a directory under `src/intelligence/projects/<pack_name>/` with this layout:

```text
<pack_name>/
├── __init__.py              # Docstring only, e.g. """My pack."""
├── config/
│   └── project.yaml         # Pack metadata (name, domain, purpose, keyword_groups)
├── keywords/
│   └── seed_keywords.csv    # CSV with group,keyword columns (at least 1 keyword row)
└── templates/               # At least one .md template (report.md recommended)
    └── report.md
```
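If you create packs often, a small script can lay down this skeleton for you. The sketch below is a hypothetical helper (`scaffold_pack` is not part of the codebase). It writes placeholder contents that mirror the examples in this guide and never overwrites files that already exist. The `report.md` placeholder is an assumption: check an existing pack's template for the syntax the report renderer expects.

```python
from pathlib import Path

# Hypothetical placeholder contents; edit them before running validate-pack.
PROJECT_YAML = """\
name: {name}
domain: xiaohongshu
purpose: my_research_purpose
contract:
  config: config/project.yaml
  keywords: keywords/seed_keywords.csv
  templates: templates/
  examples: optional
keyword_groups:
  - group_one
"""

# Header plus one keyword row, since the layout requires at least one.
SEED_CSV = "group,keyword\ngroup_one,example_keyword\n"


def _write_if_missing(path: Path, text: str) -> None:
    """Write text to path only if the file does not exist yet."""
    if not path.exists():
        path.write_text(text, encoding="utf-8")


def scaffold_pack(projects_root: Path, pack_name: str) -> Path:
    """Create the minimal pack layout under projects_root/pack_name."""
    pack_dir = projects_root / pack_name
    for sub in ("config", "keywords", "templates"):
        (pack_dir / sub).mkdir(parents=True, exist_ok=True)
    _write_if_missing(pack_dir / "__init__.py", f'"""{pack_name} pack."""\n')
    _write_if_missing(pack_dir / "config" / "project.yaml", PROJECT_YAML.format(name=pack_name))
    _write_if_missing(pack_dir / "keywords" / "seed_keywords.csv", SEED_CSV)
    # Placeholder template; match the format used by existing packs.
    _write_if_missing(pack_dir / "templates" / "report.md", f"# {pack_name} report\n")
    return pack_dir
```

From the repository root, `scaffold_pack(Path("src/intelligence/projects"), "my_pack")` creates the skeleton; then fill in the real metadata, keywords, and template.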
Example `config/project.yaml`:

```yaml
name: my_pack
domain: xiaohongshu
purpose: my_research_purpose
contract:
  config: config/project.yaml
  keywords: keywords/seed_keywords.csv
  templates: templates/
  examples: optional
keyword_groups:
  - group_one
  - group_two
```

Example `keywords/seed_keywords.csv`:

```csv
group,keyword
group_one,关键词一
group_one,关键词二
group_two,keyword_three
```

Create `src/intelligence/workflows/<pack_name>_pack.py` containing:
- A `_bucket_scores(sample: CanonicalSample) -> dict[str, float]` function with your category-specific heuristics
- A `ScoringConfig` defining bucket weights, confidence rules, and classification rules
- A `PackSpec` instance wiring the above together
- A thin `run_<pack_name>_pack()` wrapper calling `run_pack_flow()`

See `jade_pack.py` or `streetwear_pack.py` for working examples.
Add your `PackSpec` to `_PACK_SPECS` in `src/intelligence/cli.py`:

```python
# Adjust the module path to match the workflow file you created.
from .workflows.my_pack import my_pack_spec

_PACK_SPECS = {
    "jade": jade_pack_spec,
    "designer_streetwear": streetwear_pack_spec,
    "my_pack": my_pack_spec,
}
```

Create a small JSONL fixture at `tests/fixtures/mediacrawler_<pack_name>_export.jsonl` with representative samples. Include at least one on-category and one off-category row to test scoring differentiation.
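A fixture can be written by hand, but a short script keeps every line valid JSON and the Chinese text unescaped. This is a sketch: the field names (`note_id`, `title`, `desc`) are placeholders, not a documented MediaCrawler schema. Copy the keys from a real export so that normalization recognizes them.

```python
import json
from pathlib import Path

# Placeholder rows; replace the keys with those of a real MediaCrawler export.
FIXTURE_ROWS = [
    # On-category: contains seed keywords ("开箱" = unboxing, "细节" = details).
    {"note_id": "on-1", "title": "关键词一 开箱", "desc": "关键词二 细节"},
    # Off-category: "今日早餐" means "today's breakfast", unrelated to the pack.
    {"note_id": "off-1", "title": "今日早餐", "desc": "daily life"},
]


def write_jsonl(path: Path, rows: list[dict]) -> None:
    """Write one JSON object per line, keeping non-ASCII text readable."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Usage: `write_jsonl(Path("tests/fixtures/mediacrawler_my_pack_export.jsonl"), FIXTURE_ROWS)`.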
Validate the pack:

```bash
python -m intelligence validate-pack my_pack
```

This checks that all required assets exist and are well-formed.
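For a quick local check while editing, the sketch below approximates the structural requirements from the directory layout. It is a hypothetical helper, not the `validate-pack` implementation, and may be stricter or looser than it; in particular it does not parse `project.yaml`.

```python
import csv
from pathlib import Path


def check_pack_layout(pack_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    problems: list[str] = []
    for rel in ("__init__.py", "config/project.yaml", "keywords/seed_keywords.csv"):
        if not (pack_dir / rel).is_file():
            problems.append(f"missing {rel}")

    csv_path = pack_dir / "keywords" / "seed_keywords.csv"
    if csv_path.is_file():
        # utf-8-sig tolerates a byte-order mark from spreadsheet exports.
        with csv_path.open(newline="", encoding="utf-8-sig") as fh:
            reader = csv.DictReader(fh)
            if reader.fieldnames is None or not {"group", "keyword"} <= set(reader.fieldnames):
                problems.append("seed_keywords.csv needs group,keyword columns")
            elif not any(row["group"] and row["keyword"] for row in reader):
                problems.append("seed_keywords.csv has no keyword rows")

    templates = pack_dir / "templates"
    if not templates.is_dir() or not any(templates.glob("*.md")):
        problems.append("templates/ needs at least one .md template")
    return problems
```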
Run the pack:

```bash
# With fixture fallback
python -m intelligence run-pack my_pack --output-dir /tmp/my-output

# With real input
python -m intelligence run-pack my_pack --input path/to/export.jsonl --output-dir /tmp/my-output
```

Every pack produces the same five files in the output directory:

- `normalized_samples.json`: canonical schema samples
- `scored_samples.json`: samples with bucket scores, weighted score, confidence, and classification
- `report.json`: structured report
- `report.md`: human-readable Markdown report
- `report.html`: HTML report
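After a run, a small check like the one below can confirm that the output directory holds all five files and that the JSON ones parse; it could back a smoke test that runs `run-pack` against your fixture. `check_outputs` is a hypothetical helper, not part of the CLI.

```python
import json
from pathlib import Path

# The five files every pack is documented to produce.
EXPECTED_OUTPUTS = (
    "normalized_samples.json",
    "scored_samples.json",
    "report.json",
    "report.md",
    "report.html",
)


def check_outputs(output_dir: Path) -> list[str]:
    """Return problems with the run outputs; an empty list means all looks fine."""
    problems: list[str] = []
    for name in EXPECTED_OUTPUTS:
        path = output_dir / name
        if not path.is_file():
            problems.append(f"missing {name}")
        elif path.suffix == ".json":
            # JSON outputs should at least be parseable.
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"{name} is not valid JSON: {exc}")
    return problems
```

Usage: `assert not check_outputs(Path("/tmp/my-output"))` after the `run-pack` command above.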