Add documentation extractor for Confluence and Notion #503

@ravisuhag

Context

Business context that explains why a table exists or what a metric means often lives outside the data stack: in Confluence pages, Notion docs, or internal wikis. AI agents need these connections to reason about meaning, not just structure.

Scope

New extractors for documentation platforms:

Confluence

  • Extract page metadata: title, space, author, last modified, labels
  • Extract page hierarchy and parent-child relationships
  • Emit documented_by relationships linking pages to data assets (via mentions, links, or labels)
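
As a rough illustration of the scope above, the sketch below models one extracted page and the edges derived from it. Everything here is an assumption made for illustration: the `Page` and `Edge` field names, the `child_of` edge type, the `urn:confluence:page:<id>` scheme, and the `label_to_asset` mapping are hypothetical and not part of any existing extractor API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    # Metadata fields from the scope list; the field names are illustrative.
    id: str
    title: str
    space: str
    author: str
    last_modified: str
    labels: list[str] = field(default_factory=list)
    parent_id: Optional[str] = None


@dataclass(frozen=True)
class Edge:
    source: str  # URN of the node the edge starts from
    target: str  # URN of the node the edge points to
    type: str    # "child_of" (hypothetical name) or "documented_by"


def page_urn(page_id: str) -> str:
    # Hypothetical URN scheme, used only for this example.
    return f"urn:confluence:page:{page_id}"


def edges_for(page: Page, label_to_asset: dict[str, str]) -> list[Edge]:
    """Build the hierarchy edge and label-based documented_by edges for one page."""
    edges = []
    if page.parent_id is not None:
        edges.append(Edge(page_urn(page.id), page_urn(page.parent_id), "child_of"))
    for label in page.labels:
        asset = label_to_asset.get(label)
        if asset is not None:
            # Read as: the asset is documented_by the page.
            edges.append(Edge(asset, page_urn(page.id), "documented_by"))
    return edges
```

Labels with no entry in the mapping are skipped rather than guessed at, so a page with only unrelated labels yields at most its hierarchy edge.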

Notion

  • Extract page/database metadata: title, workspace, author, properties
  • Extract page hierarchy
  • Emit documented_by relationships where inferrable
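
Page hierarchy could be reconstructed by following parent pointers. The sketch below assumes each fetched page is a dict with a `parent` field holding a `type` key (`page_id`, `database_id`, or `workspace`) and a matching id key; that shape should be checked against the Notion API version in use. The function names `parent_id` and `ancestry` are hypothetical.

```python
from typing import Optional


def parent_id(page: dict) -> Optional[str]:
    """Return the parent page or database id, or None for a top-level page."""
    parent = page.get("parent", {})
    kind = parent.get("type")
    if kind in ("page_id", "database_id"):
        return parent.get(kind)
    return None  # for example, a page that sits directly under the workspace


def ancestry(page_id: str, pages: dict[str, dict]) -> list[str]:
    """Walk parent pointers from a page toward its root, nearest parent first."""
    chain = []
    seen = {page_id}  # guards against cycles in malformed input
    current = parent_id(pages[page_id])
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        if current not in pages:
            break  # parent was filtered out or never fetched
        current = parent_id(pages[current])
    return chain
```

A parent that was not fetched still appears in the chain, so the edge to it can be emitted even when workspace filtering excludes the parent page itself.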

Design Considerations

  • No full-text content ingestion: extract metadata and graph edges, not document bodies
  • Relationship inference: scan page content for URNs, table names, or other asset references to auto-link pages to assets; the scanned content itself is not stored
  • Support filtering by space/workspace to avoid extracting irrelevant documentation
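
A minimal sketch of the inference and filtering ideas above, under stated assumptions: the URN pattern (`urn:` followed by two or more colon-separated segments), the `known_tables` set, and the function names are all hypothetical. Only the matched references are returned; the page body is not kept.

```python
import re

# Hypothetical URN shape: "urn:" plus at least two colon-separated segments.
URN_PATTERN = re.compile(r"\burn:[A-Za-z0-9_.\-]+(?::[A-Za-z0-9_.\-]+)+")


def find_asset_refs(body: str, known_tables: set[str]) -> set[str]:
    """Return URNs and known table names mentioned in a page body."""
    # Strip a trailing period so a URN at the end of a sentence stays clean.
    refs = {urn.rstrip(".") for urn in URN_PATTERN.findall(body)}
    for table in known_tables:
        # Boundary checks so "orders" does not match inside "reorders"
        # or as the tail of a qualified name such as "sales.orders".
        if re.search(rf"(?<![\w.]){re.escape(table)}(?!\w)", body):
            refs.add(table)
    return refs


def in_scope(space: str, include: set[str]) -> bool:
    """Keep a page only if its space/workspace is allow-listed; empty means all."""
    return not include or space in include
```

Matching against a set of known table names, rather than guessing at anything that looks like an identifier, keeps false links low at the cost of missing assets the catalog does not yet know about.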

Why

AI agents that only see technical metadata miss the "why" behind design decisions. Documentation extractors close the gap between technical and business context.
