Final-State-Press/Pattern-Miner

MultiLevel Pattern Miner

Mine repeated structural patterns in text, compile them into LLM-ready specs, validate new drafts against those specs, and render review dashboards.

flowchart TD
    D["DOCUMENT (6)"]
    S["SECTION (5)"]
    C["CHUNK (4)"]
    P["PARAGRAPH (3)"]
    L["LINE / Sentence (2)"]
    PH["PHRASE (1)"]
    D --> S --> C --> P --> L --> PH

A pattern is a recurring shape at a given level, defined by how a node at that level is composed of children from the level below. The mined library is the evidence layer; compiled specs, validation bundles, and dashboards sit on top of it.
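
To make the definition concrete, here is a minimal sketch of deriving a signature for a node from its children one level down. The `Node` class, the single-letter `label` codes, and the hyphen-joined signature format are assumptions made for illustration; they are not taken from the project's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; not the library's actual class."""
    level: int   # 1 = PHRASE ... 6 = DOCUMENT, as in the diagram above
    label: str   # assumed short code for the node's shape, e.g. "S"
    children: list["Node"] = field(default_factory=list)


def signature(node: Node) -> str:
    """Describe a node by the ordered labels of its direct children."""
    return "-".join(child.label for child in node.children)


# A paragraph (level 3) composed of three sentence-like lines (level 2).
paragraph = Node(level=3, label="P", children=[Node(2, "S"), Node(2, "S"), Node(2, "S")])
print(signature(paragraph))  # S-S-S
```

Two paragraphs with the same signature would count as two occurrences of the same pattern under this scheme.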

Pipeline

flowchart LR
    F["📄 File"] --> IT["io.load_text()"]
    IT --> RP["run_pipeline()"]
    RP --> NP["Normalizer.parse()"]
    NP --> BD["BoundaryDetector.detect()"]
    BD --> PM["PatternMiner.mine()"]
    PM --> O["📊 patterns.yml"]
    O --> CS["compile-spec"]
    CS --> SPEC["🧭 compiled-pattern-spec.yml"]
    SPEC --> VI["validate-index"]
    VI --> BUNDLE["📦 validation-index.json / .md"]
    BUNDLE --> DASH["🖥️ validation-dashboard.html"]

Features

  • Markdown and plain-text parsing with heading, list, and table awareness
  • Signature-based pattern clustering — no ML required
  • YAML and JSON export with source locations and example excerpts
  • Compiled bridge specs for templates, writer guides, generation programs, and validation contracts
  • Draft validation against compiled structural, lexical, and policy checks
  • Aggregate JSON and Markdown validation bundles with per-draft detail reports
  • Static HTML dashboards for filtered review, inline draft diagnostics, and CSV export
  • Corpus-wide aggregation via corpus_mine.py
  • Human-readable pattern library reports via generate_pattern_library_md.py
  • Extensible boundary detection (Strategy pattern; drop in spaCy or NLTK)
  • Unit-tested CLI and library modules
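
The signature-based clustering named in the feature list can be pictured as plain grouping and counting. The sketch below is an assumption about the general idea, not the miner's implementation: the function name, the `(signature, location)` input shape, and the location strings are all hypothetical.

```python
from collections import defaultdict


def cluster_by_signature(observations, min_support=2):
    """Group (signature, location) pairs; keep signatures seen at least min_support times."""
    groups = defaultdict(list)
    for sig, location in observations:
        groups[sig].append(location)
    return {sig: locs for sig, locs in groups.items() if len(locs) >= min_support}


# Hypothetical observations: a signature plus where it was found.
observed = [
    ("S-S-S", "example1.md:3"),
    ("S-S", "example1.md:9"),
    ("S-S-S", "example1.md:14"),
]
patterns = cluster_by_signature(observed, min_support=2)
print(patterns)  # {'S-S-S': ['example1.md:3', 'example1.md:14']}
```

Because the grouping key is an exact string, no model training is involved; lowering `min_support` to 1 keeps every observed signature.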

Install

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

Quick Start

# 1) Mine structural patterns from a source document
mplm mine examples/example1.md --out patterns.yml --min-support 1

# 2) Compile the mined library into an LLM-ready bridge spec
mplm compile-spec patterns.yml --out compiled-pattern-spec.yml

# 3) Validate a folder of drafts and write a JSON review bundle
mplm validate-index compiled-pattern-spec.yml examples/dashboard-drafts \
  --template-id tmpl-l3-s-s-s \
  --out validation-index.json

# 4a) Render an HTML dashboard from the JSON bundle
mplm render-dashboard validation-index.json --out validation-dashboard.html

# 4b) Or render the dashboard directly from the compiled spec plus drafts
mplm render-dashboard \
  --spec-path compiled-pattern-spec.yml \
  --drafts-root examples/dashboard-drafts \
  --template-id tmpl-l3-s-s-s \
  --out validation-dashboard.direct.html

Useful supporting commands:

# Preview the parsed AST
mplm preview examples/example1.md

# Mine with INFO logging to see pipeline stages
mplm --log-level INFO mine examples/example1.md --out patterns.yml

# Run tests
pytest -q

Python API

from mplm import run_pipeline
from mplm.io import load_text, save_library

text = load_text("examples/example1.md")
lib = run_pipeline(text, source="example1.md", min_support=2)
save_library(lib, "patterns.yml")

for p in lib.patterns:
    print(p.id, p.structure["signature"], p.support)

Core Workflow

The project is best understood as a layered workflow:

source text
  -> patterns.yml
  -> compiled-pattern-spec.yml
  -> validation-index.json / validation-index.md
  -> validation-dashboard.html

Typical uses:

  • derive template systems and writer guidance from mined patterns
  • validate generated or human-authored drafts against structural contracts
  • review batches of failures through JSON, Markdown, or HTML artifacts
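
As a rough illustration of what checking a draft against a structural contract could involve, the sketch below compares an expected child signature with the one found in a draft. The `check_structure` helper and the hyphen-joined signature strings are hypothetical; the real checks performed by `validate-index` are not described here and may differ.

```python
def check_structure(expected: str, actual: str) -> list[str]:
    """Return human-readable issues; an empty list means the draft matches."""
    exp, act = expected.split("-"), actual.split("-")
    issues = []
    if len(exp) != len(act):
        issues.append(f"expected {len(exp)} children, found {len(act)}")
    # Compare position by position over the shared prefix.
    for i, (e, a) in enumerate(zip(exp, act), start=1):
        if e != a:
            issues.append(f"child {i}: expected {e}, found {a}")
    return issues


print(check_structure("S-S-S", "S-S-S"))  # []
print(check_structure("S-S-S", "S-L"))
# ['expected 3 children, found 2', 'child 2: expected S, found L']
```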

Corpus Mining

Mine patterns across a whole directory and aggregate support counts:

# Aggregated corpus library
python3 corpus_mine.py ./corpus ./out/patterns.yml --min-support 2

# Aggregated + per-file outputs
python3 corpus_mine.py ./corpus ./out/ --min-support 2 --aggregate ./out/corpus.yml
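
Aggregating support counts across files amounts to summing per-file counts for each signature. The sketch below illustrates that idea only; the `aggregate_support` helper and its input shape are hypothetical, and it assumes the threshold is applied to the corpus-wide totals, which may not match how `corpus_mine.py` applies `--min-support`.

```python
from collections import Counter


def aggregate_support(per_file, min_support=2):
    """Sum signature counts across files, then filter by corpus-wide support."""
    total = Counter()
    for counts in per_file.values():
        total.update(counts)  # adds counts rather than replacing them
    return {sig: n for sig, n in total.items() if n >= min_support}


# Hypothetical per-file signature counts.
per_file = {
    "a.md": {"S-S-S": 1, "H-P": 2},
    "b.md": {"S-S-S": 1, "L-L": 1},
}
corpus = aggregate_support(per_file)
print(corpus)  # {'S-S-S': 2, 'H-P': 2}
```

Note that `S-S-S` appears only once in each file but still reaches the threshold once the files are combined.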

Pattern Library Report

Generate a human-readable Markdown report with optional Mermaid graphs:

python3 generate_pattern_library_md.py \
  --in patterns.yml \
  --out pattern-library.md \
  --mermaid

# Also split into per-level files
python3 generate_pattern_library_md.py \
  --in patterns.yml \
  --out pattern-library.md \
  --split-dir docs/patterns_by_level \
  --mermaid

Notes

  • Phrase-level detection defaults to a regex heuristic to keep dependencies light. Enable advanced parsing by installing spaCy or NLTK.
  • The top-level mined artifact is still patterns.yml, but most downstream workflows now continue through compiled-pattern-spec.yml, validation bundles, and dashboards.
  • See DESIGN.md for the full architecture and design pattern justification.
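
The first note, together with the Strategy-based boundary detection listed under Features, suggests a design along the following lines. This is a sketch under assumptions: the `BoundaryStrategy` protocol, the `RegexPhraseSplitter` class, and the punctuation-based regex are illustrative stand-ins, not the project's actual interface or heuristic.

```python
import re
from typing import Protocol


class BoundaryStrategy(Protocol):
    """Hypothetical interface: split text into smaller units."""

    def split(self, text: str) -> list[str]: ...


class RegexPhraseSplitter:
    """Dependency-free heuristic: break on commas, semicolons, and colons."""

    _pattern = re.compile(r"\s*[,;:]\s*")

    def split(self, text: str) -> list[str]:
        return [part for part in self._pattern.split(text.strip()) if part]


def detect_phrases(text: str, strategy: BoundaryStrategy) -> list[str]:
    """Delegate splitting to whichever strategy the caller supplies."""
    return strategy.split(text)


print(detect_phrases("Mine patterns, compile specs; validate drafts", RegexPhraseSplitter()))
# ['Mine patterns', 'compile specs', 'validate drafts']
```

A spaCy- or NLTK-backed class exposing the same `split` method could be passed to `detect_phrases` without changing the caller.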

License

MIT

About

Identifies repeated structural patterns in text across six hierarchical levels: phrase, line, paragraph/list/table/title, chunk, section, and document. Higher levels are built from recurring lower-level patterns, revealing the structural DNA of the content. Outputs results in YAML or JSON for analysis, templating, and validation.
