MultiLevel Pattern Miner

Mine repeated structural patterns in text, compile them into LLM-ready specs, validate new drafts against those specs, and render review dashboards.

flowchart TD
    D["DOCUMENT (6)"]
    S["SECTION (5)"]
    C["CHUNK (4)"]
    P["PARAGRAPH (3)"]
    L["LINE / Sentence (2)"]
    PH["PHRASE (1)"]
    D --> S --> C --> P --> L --> PH

A pattern is a recurring shape at a given level, defined by how a node at that level is composed of children from the level below. The mined library is the evidence layer; compiled specs, validation bundles, and dashboards are built on top of it.
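
To make the definition concrete, here is a minimal sketch of the hierarchy and of a pattern "shape". The `Level`, `Node`, and `signature` names and the `S` / `P` kind tags are illustrative assumptions, not the actual classes in `mplm`; only the level names and numbers come from the diagram.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hierarchy levels, numbered as in the diagram."""
    PHRASE = 1
    LINE = 2
    PARAGRAPH = 3
    CHUNK = 4
    SECTION = 5
    DOCUMENT = 6


@dataclass
class Node:
    """Hypothetical tree node: a level, a short kind tag, and child nodes."""
    level: Level
    kind: str
    children: list[Node] = field(default_factory=list)


def signature(node: Node) -> str:
    """Describe a node by the kinds of its direct children, in order."""
    return "-".join(child.kind for child in node.children)


# A paragraph (level 3) composed of three sentences (level 2).
paragraph = Node(Level.PARAGRAPH, "P", [Node(Level.LINE, "S") for _ in range(3)])
print(paragraph.level.name, signature(paragraph))
```

Under these assumptions, two paragraphs share a pattern when their signatures match, regardless of the words they contain.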

Pipeline

flowchart LR
    F["📄 File"] --> IT["io.load_text()"]
    IT --> RP["run_pipeline()"]
    RP --> NP["Normalizer.parse()"]
    NP --> BD["BoundaryDetector.detect()"]
    BD --> PM["PatternMiner.mine()"]
    PM --> O["📊 patterns.yml"]
    O --> CS["compile-spec"]
    CS --> SPEC["🧭 compiled-pattern-spec.yml"]
    SPEC --> VI["validate-index"]
    VI --> BUNDLE["📦 validation-index.json / .md"]
    BUNDLE --> DASH["🖥️ validation-dashboard.html"]

Features

Markdown and plain-text parsing with heading, list, and table awareness
Signature-based pattern clustering (no ML required)
YAML and JSON export with source locations and example excerpts
Compiled bridge specs for templates, writer guides, generation programs, and validation contracts
Draft validation against compiled structural, lexical, and policy checks
Aggregate JSON and Markdown validation bundles with per-draft detail reports
Static HTML dashboards for filtered review, inline draft diagnostics, and CSV export
Corpus-wide aggregation via corpus_mine.py
Human-readable pattern library reports via generate_pattern_library_md.py
Extensible boundary detection (Strategy pattern; drop in spaCy or NLTK)
Unit-tested CLI and library modules
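
As a rough illustration of what signature-based clustering without ML can look like, the sketch below groups observations by an exact signature string and keeps the groups that meet a minimum support. The function name, the `(signature, location)` tuple shape, and the sample data are made up for the example and are not taken from the project's code.

```python
from collections import defaultdict


def cluster_by_signature(observations, min_support=2):
    """Group (signature, location) pairs; keep signatures seen often enough."""
    groups = defaultdict(list)
    for sig, location in observations:
        groups[sig].append(location)
    return {sig: locs for sig, locs in groups.items() if len(locs) >= min_support}


# Illustrative observations only: signature plus a source location.
observed = [
    ("S-S-S", "example1.md:3"),
    ("S-S", "example1.md:9"),
    ("S-S-S", "example1.md:14"),
    ("S-S-S", "example1.md:22"),
]
clusters = cluster_by_signature(observed, min_support=2)
for sig, locs in clusters.items():
    print(sig, len(locs), locs[0])
```

Because grouping is by exact string equality, the result is deterministic and easy to trace back to source locations. That is the trade-off against fuzzier, learned clustering.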

Install

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

Quick Start

# 1) Mine structural patterns from a source document
mplm mine examples/example1.md --out patterns.yml --min-support 1

# 2) Compile the mined library into an LLM-ready bridge spec
mplm compile-spec patterns.yml --out compiled-pattern-spec.yml

# 3) Validate a folder of drafts and write a JSON review bundle
mplm validate-index compiled-pattern-spec.yml examples/dashboard-drafts \
  --template-id tmpl-l3-s-s-s \
  --out validation-index.json

# 4a) Render an HTML dashboard from the JSON bundle
mplm render-dashboard validation-index.json --out validation-dashboard.html

# 4b) Or render the dashboard directly from the compiled spec plus drafts
mplm render-dashboard \
  --spec-path compiled-pattern-spec.yml \
  --drafts-root examples/dashboard-drafts \
  --template-id tmpl-l3-s-s-s \
  --out validation-dashboard.direct.html

Useful supporting commands:

# Preview the parsed AST
mplm preview examples/example1.md

# Mine with INFO logging to see pipeline stages
mplm --log-level INFO mine examples/example1.md --out patterns.yml

# Run tests
pytest -q

Python API

from mplm import run_pipeline
from mplm.io import load_text, save_library

text = load_text("examples/example1.md")
lib = run_pipeline(text, source="example1.md", min_support=2)
save_library(lib, "patterns.yml")

for p in lib.patterns:
    print(p.id, p.structure["signature"], p.support)

Core Workflow

The project is best understood as a layered workflow:

source text
  -> patterns.yml
  -> compiled-pattern-spec.yml
  -> validation-index.json / validation-index.md
  -> validation-dashboard.html

Typical uses:

derive template systems and writer guidance from mined patterns
validate generated or human-authored drafts against structural contracts
review batches of failures through JSON, Markdown, or HTML artifacts
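
The idea of validating a draft against a structural contract can be sketched as follows. This is a simplified stand-in, not the `validate-index` implementation: the `CheckResult` type, the naive sentence splitter, and the reading of an `S-S-S` template as "three sentences" are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class CheckResult:
    passed: bool
    expected: str
    actual: str


def sentence_signature(paragraph: str) -> str:
    """Naive split on '.', '!' or '?' followed by whitespace; one 'S' per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    return "-".join("S" for _ in sentences)


def check_paragraph(paragraph: str, expected: str) -> CheckResult:
    """Compare the paragraph's observed signature with the expected one."""
    actual = sentence_signature(paragraph)
    return CheckResult(actual == expected, expected, actual)


good = check_paragraph("State the claim. Give the evidence. Draw the conclusion.", "S-S-S")
bad = check_paragraph("Only one sentence here.", "S-S-S")
print(good.passed, bad.passed, bad.actual)
```

A real contract would presumably add the lexical and policy checks mentioned under Features. This sketch covers only the structural comparison.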

Corpus Mining

Mine patterns across a whole directory and aggregate support counts:

# Aggregated corpus library
python3 corpus_mine.py ./corpus ./out/patterns.yml --min-support 2

# Aggregated + per-file outputs
python3 corpus_mine.py ./corpus ./out/ --min-support 2 --aggregate ./out/corpus.yml
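
Aggregating support counts across files can be pictured as summing per-file counters. The sketch below is an assumption about the general idea, not a description of `corpus_mine.py`: the function name, the input shape, and applying the threshold after aggregation are all hypothetical.

```python
from collections import Counter


def aggregate_support(per_file, min_support=2):
    """Sum per-file signature counts and drop signatures below min_support."""
    total = Counter()
    for counts in per_file.values():
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return {sig: n for sig, n in total.items() if n >= min_support}


# Illustrative per-file counts only.
per_file = {
    "a.md": {"S-S-S": 1, "S-S": 1},
    "b.md": {"S-S-S": 2},
}
corpus = aggregate_support(per_file, min_support=2)
print(corpus)
```

With a threshold applied after aggregation, a shape that appears once in each of several files can still qualify at the corpus level even though it would be filtered out of any single file.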

Pattern Library Report

Generate a human-readable Markdown report with optional Mermaid graphs:

python3 generate_pattern_library_md.py \
  --in patterns.yml \
  --out pattern-library.md \
  --mermaid

# Also split into per-level files
python3 generate_pattern_library_md.py \
  --in patterns.yml \
  --out pattern-library.md \
  --split-dir docs/patterns_by_level \
  --mermaid

Notes

Phrase-level detection defaults to a regex heuristic to keep dependencies light. Enable advanced parsing by installing spaCy or NLTK.
The top-level mined artifact is still patterns.yml, but most downstream workflows now continue through compiled-pattern-spec.yml, validation bundles, and dashboards.
See DESIGN.md for the full architecture and the rationale behind each design pattern used.
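
The first note, together with the Strategy pattern mentioned under Features, suggests a shape like the one below: a small interface with a regex default that a heavier parser could replace. All class names here are hypothetical, and the punctuation-based split is only one plausible regex heuristic, not necessarily the one `mplm` uses.

```python
from __future__ import annotations

import re
from typing import Optional, Protocol


class PhraseStrategy(Protocol):
    """Anything with a split() method can act as a phrase-boundary strategy."""
    def split(self, sentence: str) -> list[str]: ...


class RegexPhraseStrategy:
    """Dependency-free default: split on commas, semicolons, and colons."""
    _boundary = re.compile(r"\s*[,;:]\s*")

    def split(self, sentence: str) -> list[str]:
        return [part for part in self._boundary.split(sentence.strip()) if part]


class PhraseDetector:
    """Delegates to whichever strategy it is given; defaults to the regex one."""
    def __init__(self, strategy: Optional[PhraseStrategy] = None) -> None:
        self.strategy = strategy or RegexPhraseStrategy()

    def detect(self, sentence: str) -> list[str]:
        return self.strategy.split(sentence)


detector = PhraseDetector()
phrases = detector.detect("First, mine the patterns; then compile the spec.")
print(phrases)
```

A spaCy- or NLTK-backed strategy would only need to provide the same `split` method, so the detector itself would not change.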

License

MIT
