Mine repeated structural patterns in text, compile them into LLM-ready specs, validate new drafts against those specs, and render review dashboards.
flowchart TD
D["DOCUMENT (6)"]
S["SECTION (5)"]
C["CHUNK (4)"]
P["PARAGRAPH (3)"]
L["LINE / Sentence (2)"]
PH["PHRASE (1)"]
D --> S --> C --> P --> L --> PH
A pattern is a recurring shape at a given level, defined by how a node at that level is composed of children from the level below. The mined library is the evidence layer; compiled specs, validation bundles, and dashboards sit on top of it.
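To make the idea of a pattern concrete, here is a minimal sketch of how a signature could be derived from a node's children and counted across a document. The `Node` class, the `signature` format, and `count_support` are all hypothetical names chosen for illustration; the format only loosely echoes the template ID `tmpl-l3-s-s-s` used in the CLI examples and is not taken from the mplm source.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; level follows the hierarchy (1 = phrase, 6 = document)."""
    level: int
    kind: str
    children: list["Node"] = field(default_factory=list)


def signature(node: Node) -> str:
    """Describe a node by its level and the ordered kinds of its direct children."""
    child_kinds = "-".join(child.kind for child in node.children) or "leaf"
    return f"L{node.level}:{child_kinds}"


def count_support(nodes: list[Node]) -> Counter:
    """Count how often each signature occurs; the count plays the role of support."""
    return Counter(signature(n) for n in nodes)


def sentence() -> Node:
    return Node(level=2, kind="S")


# Three paragraphs: two with three sentences each, one with a single sentence.
paragraphs = [
    Node(level=3, kind="P", children=[sentence(), sentence(), sentence()]),
    Node(level=3, kind="P", children=[sentence(), sentence(), sentence()]),
    Node(level=3, kind="P", children=[sentence()]),
]

support = count_support(paragraphs)
for sig, n in support.most_common():
    print(sig, n)
```

In this sketch, a minimum-support threshold of 2 would keep the three-sentence paragraph shape and discard the one-off.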
flowchart LR
F["📄 File"] --> IT["io.load_text()"]
IT --> RP["run_pipeline()"]
RP --> NP["Normalizer.parse()"]
NP --> BD["BoundaryDetector.detect()"]
BD --> PM["PatternMiner.mine()"]
PM --> O["📊 patterns.yml"]
O --> CS["compile-spec"]
CS --> SPEC["🧭 compiled-pattern-spec.yml"]
SPEC --> VI["validate-index"]
VI --> BUNDLE["📦 validation-index.json / .md"]
BUNDLE --> DASH["🖥️ validation-dashboard.html"]
- Markdown and plain-text parsing with heading, list, and table awareness
- Signature-based pattern clustering — no ML required
- YAML and JSON export with source locations and example excerpts
- Compiled bridge specs for templates, writer guides, generation programs, and validation contracts
- Draft validation against compiled structural, lexical, and policy checks
- Aggregate JSON and Markdown validation bundles with per-draft detail reports
- Static HTML dashboards for filtered review, inline draft diagnostics, and CSV export
- Corpus-wide aggregation via corpus_mine.py
- Human-readable pattern library reports via generate_pattern_library_md.py
- Extensible boundary detection (Strategy pattern; drop in spaCy or NLTK)
- Unit-tested CLI and library modules
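The Strategy pattern mentioned for boundary detection can be sketched as follows. This is an illustration under stated assumptions, not the project's actual interface: `SentenceSplitter`, `RegexSplitter`, `NltkSplitter`, and `detect_sentences` are hypothetical names, and the real `BoundaryDetector` may be organized differently. The only third-party call used is NLTK's `sent_tokenize`, imported lazily so the default path has no extra dependencies.

```python
import re
from typing import Optional, Protocol


class SentenceSplitter(Protocol):
    """Hypothetical strategy interface: turn a paragraph into sentence strings."""

    def split(self, text: str) -> list[str]: ...


class RegexSplitter:
    """Dependency-free default: break after ., ! or ? followed by whitespace."""

    _boundary = re.compile(r"(?<=[.!?])\s+")

    def split(self, text: str) -> list[str]:
        return [s for s in self._boundary.split(text.strip()) if s]


class NltkSplitter:
    """Optional strategy; requires NLTK and its sentence tokenizer data."""

    def split(self, text: str) -> list[str]:
        # Imported lazily so NLTK stays an optional dependency.
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)


def detect_sentences(text: str, splitter: Optional[SentenceSplitter] = None) -> list[str]:
    """Run whichever strategy the caller supplies, falling back to the regex one."""
    return (splitter or RegexSplitter()).split(text)


print(detect_sentences("Mine patterns. Compile a spec! Validate drafts?"))
```

Swapping in a different strategy is then a matter of passing another object with a `split` method, for example `detect_sentences(text, NltkSplitter())`.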
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

# 1) Mine structural patterns from a source document
mplm mine examples/example1.md --out patterns.yml --min-support 1
# 2) Compile the mined library into an LLM-ready bridge spec
mplm compile-spec patterns.yml --out compiled-pattern-spec.yml
# 3) Validate a folder of drafts and write a JSON review bundle
mplm validate-index compiled-pattern-spec.yml examples/dashboard-drafts \
--template-id tmpl-l3-s-s-s \
--out validation-index.json
# 4a) Render an HTML dashboard from the JSON bundle
mplm render-dashboard validation-index.json --out validation-dashboard.html
# 4b) Or render the dashboard directly from the compiled spec plus drafts
mplm render-dashboard \
--spec-path compiled-pattern-spec.yml \
--drafts-root examples/dashboard-drafts \
--template-id tmpl-l3-s-s-s \
--out validation-dashboard.direct.html

Useful supporting commands:
# Preview the parsed AST
mplm preview examples/example1.md
# Mine with INFO logging to see pipeline stages
mplm --log-level INFO mine examples/example1.md --out patterns.yml
# Run tests
pytest -q

from mplm import run_pipeline
from mplm.io import load_text, save_library
text = load_text("examples/example1.md")
lib = run_pipeline(text, source="example1.md", min_support=2)
save_library(lib, "patterns.yml")
for p in lib.patterns:
    print(p.id, p.structure["signature"], p.support)

The project is best understood as a layered workflow:
source text
-> patterns.yml
-> compiled-pattern-spec.yml
-> validation-index.json / validation-index.md
-> validation-dashboard.html
Typical uses:
- derive template systems and writer guidance from mined patterns
- validate generated or human-authored drafts against structural contracts
- review batches of failures through JSON, Markdown, or HTML artifacts
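As a rough illustration of validating a draft against a structural contract, the sketch below checks that every paragraph has an expected number of sentences and emits a JSON-serializable report. Everything here is an assumption made for the example: the `CONTRACT` dictionary, its keys, the `validate_draft` function, and the reading of `tmpl-l3-s-s-s` as "three sentences per paragraph" are hypothetical and do not reflect the actual compiled spec or validation bundle schema.

```python
import json
import re

# Hypothetical contract: each paragraph must contain exactly three sentences.
CONTRACT = {"template_id": "tmpl-l3-s-s-s", "sentences_per_paragraph": 3}


def validate_draft(text: str, contract: dict) -> dict:
    """Return a JSON-serializable report listing paragraphs that break the contract."""
    expected = contract["sentences_per_paragraph"]
    failures = []
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p]
    for index, paragraph in enumerate(paragraphs, start=1):
        # Same light regex heuristic: split after ., ! or ? followed by whitespace.
        found = len([s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s])
        if found != expected:
            failures.append({"paragraph": index, "expected": expected, "found": found})
    return {
        "template_id": contract["template_id"],
        "passed": not failures,
        "failures": failures,
    }


draft = "One. Two. Three.\n\nOnly one sentence here."
report = validate_draft(draft, CONTRACT)
print(json.dumps(report, indent=2))
```

A report shaped like this could be aggregated per draft into a single index, which is the role the JSON and Markdown bundles play in the workflow above.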
Mine patterns across a whole directory and aggregate support counts:
# Aggregated corpus library
python3 corpus_mine.py ./corpus ./out/patterns.yml --min-support 2
# Aggregated + per-file outputs
python3 corpus_mine.py ./corpus ./out/ --min-support 2 --aggregate ./out/corpus.yml

Generate a human-readable Markdown report with optional Mermaid graphs:
python3 generate_pattern_library_md.py \
--in patterns.yml \
--out pattern-library.md \
--mermaid
# Also split into per-level files
python3 generate_pattern_library_md.py \
--in patterns.yml \
--out pattern-library.md \
--split-dir docs/patterns_by_level \
--mermaid

- Phrase-level detection defaults to a regex heuristic to keep dependencies light. Enable advanced parsing by installing spaCy or NLTK.
- The top-level mined artifact is still patterns.yml, but most downstream workflows now continue through compiled-pattern-spec.yml, validation bundles, and dashboards.
- See DESIGN.md for the full architecture and design pattern justification.
MIT