Expand ingestion coverage for top cable brands (Shopify + non-Shopify) #96

@anand-testcompare

Why

Catalog quality is limited by source coverage. We need broader brand/vendor ingestion to improve identification accuracy and search relevance.

Outcome

Build a repeatable ingestion program for top cable companies across Shopify and non-Shopify sites.

In Scope

  • Define a target source list (top cable brands + key retailers).
  • Support both Shopify and non-Shopify source types.
  • Add source provenance fields (source URL, crawl/import timestamp, source identifier).
  • Add normalization + validation rules for core cable fields.
  • Add quality scoring/reporting per source (coverage + parse success + critical-field completeness).
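One possible shape for the provenance fields and the per-source quality report is sketched below. Everything here is an assumption for illustration: the type names, the field names, the choice of critical fields, and the use of an expected product count as the coverage denominator are hypothetical and not taken from the existing codebase.

```typescript
// Hypothetical shapes: names are illustrative, not from the existing schema.
interface SourceProvenance {
  sourceUrl: string;  // page or API URL the record came from
  sourceId: string;   // stable identifier for the brand/retailer source
  importedAt: string; // crawl/import timestamp, ISO 8601
}

interface IngestedRecord {
  provenance: SourceProvenance;
  parsed: boolean;                       // whether the parser produced a usable record
  fields: Record<string, string | null>; // null means "unknown", never a guessed default
}

interface SourceQualityReport {
  sourceId: string;
  total: number;
  coverage: number;                  // records seen / expected product count
  parseSuccessRate: number;          // parsed records / records seen
  criticalFieldCompleteness: number; // parsed records with every critical field / parsed records
}

// Assumed set of critical fields; the real list is for the normalization work to decide.
const CRITICAL_FIELDS: string[] = ["connectorFrom", "connectorTo", "lengthMeters"];

function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function buildQualityReport(
  sourceId: string,
  records: IngestedRecord[],
  expectedCount: number,
  criticalFields: string[] = CRITICAL_FIELDS,
): SourceQualityReport {
  const own = records.filter((r) => r.provenance.sourceId === sourceId);
  const parsed = own.filter((r) => r.parsed);
  const complete = parsed.filter((r) =>
    criticalFields.every((f) => {
      const value = r.fields[f];
      return value !== undefined && value !== null && value !== "";
    }),
  );
  return {
    sourceId,
    total: own.length,
    coverage: ratio(own.length, expectedCount),
    parseSuccessRate: ratio(parsed.length, own.length),
    criticalFieldCompleteness: ratio(complete.length, parsed.length),
  };
}
```

Keeping the three ratios separate, rather than folding them into one score, makes it easier to tell a source that is barely crawled from one that is crawled fully but parsed badly.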

Out of Scope

  • Perfect extraction of every long-tail product variant in one pass.
  • New UI features unrelated to source quality/coverage.

Implementation Plan

  1. Build prioritized source inventory (tier 1/2) with owner + cadence.
  2. Expand Shopify ingestion configs/connectors for selected Shopify brands.
  3. Add non-Shopify extractors (HTML/JSON-LD/API where available).
  4. Standardize normalization pipeline for connector/wattage/data/video/length fields.
  5. Add per-source validation + anomaly reporting.
  6. Add replayable seed process for preview and local QA.
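For step 3, a non-Shopify extractor could start from schema.org `Product` nodes embedded as JSON-LD in the page HTML. The following is a minimal sketch under stated assumptions: `extractJsonLdProducts`, `JsonLdProduct`, and `ExtractResult` are hypothetical names, the regex-based script lookup stands in for a real HTML parser, and nested `@graph` containers are not handled.

```typescript
// Hypothetical extractor sketch; not the project's actual connector API.
interface JsonLdProduct {
  name: string;
  sku: string | null;
  brand: string | null;
}

type ExtractResult =
  | { status: "ok"; products: JsonLdProduct[] }
  | { status: "error"; reason: string };

function isProductNode(node: unknown): node is Record<string, unknown> {
  if (typeof node !== "object" || node === null) return false;
  const type = (node as Record<string, unknown>)["@type"];
  return type === "Product" || (Array.isArray(type) && type.indexOf("Product") !== -1);
}

// schema.org allows brand as a plain string or as a nested { name } object.
function readBrand(value: unknown): string | null {
  if (typeof value === "string") return value;
  if (typeof value === "object" && value !== null) {
    const name = (value as Record<string, unknown>)["name"];
    return typeof name === "string" ? name : null;
  }
  return null;
}

function extractJsonLdProducts(html: string): ExtractResult {
  // Regex created per call so the global flag's lastIndex never leaks between calls.
  const scriptPattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const products: JsonLdProduct[] = [];
  let match: RegExpExecArray | null;
  while ((match = scriptPattern.exec(html)) !== null) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(match[1] ?? "");
    } catch {
      // Fail loudly instead of skipping the block.
      return { status: "error", reason: "invalid JSON in JSON-LD block" };
    }
    const nodes: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const node of nodes) {
      if (!isProductNode(node)) continue;
      const name = node["name"];
      if (typeof name !== "string" || name.trim() === "") {
        return { status: "error", reason: "Product node without a name" };
      }
      const sku = node["sku"];
      products.push({
        name: name.trim(),
        sku: typeof sku === "string" ? sku : null,
        brand: readBrand(node["brand"]),
      });
    }
  }
  if (products.length === 0) {
    return { status: "error", reason: "no Product JSON-LD found" };
  }
  return { status: "ok", products };
}
```

Returning an explicit error variant rather than an empty list keeps a page with no usable markup distinguishable from a source that genuinely has no products.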

Test Plan

  • convex: parser/normalizer fixture tests per source type.
  • convex: invariants for required fields + unknown handling (no silent coercion).
  • manual: run ingest for each tier-1 source and verify coverage + critical fields.
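The "no silent coercion" invariant could be expressed as a normalizer that returns an explicit unknown result instead of a guessed value, plus a required-field check that throws. This is a sketch only: `normalizeLengthMeters`, `assertRequiredFields`, the supported unit list, and the field names are assumptions made for illustration.

```typescript
// Hypothetical normalizer sketch; unit list and names are assumptions.
type Normalized<T> =
  | { status: "ok"; value: T }
  | { status: "unknown"; raw: string }; // raw input is kept for anomaly reporting

const METERS_PER_UNIT: Record<string, number> = {
  m: 1,
  cm: 0.01,
  mm: 0.001,
  ft: 0.3048,
  in: 0.0254,
};

function normalizeLengthMeters(raw: string): Normalized<number> {
  // Accepts inputs such as "2m", "1.5 m", "150cm", "6 ft"; anything else is unknown.
  const match = /^\s*(\d+(?:\.\d+)?)\s*(m|cm|mm|ft|in)\s*$/i.exec(raw);
  if (match === null) return { status: "unknown", raw };
  const factor = METERS_PER_UNIT[(match[2] ?? "").toLowerCase()];
  if (factor === undefined) return { status: "unknown", raw };
  return { status: "ok", value: Number(match[1]) * factor };
}

// Throws with the full list of missing fields so a failed ingest is explicit.
function assertRequiredFields(
  record: Record<string, unknown>,
  required: string[],
): void {
  const missing = required.filter((field) => {
    const value = record[field];
    return value === undefined || value === null || value === "";
  });
  if (missing.length > 0) {
    throw new Error("missing required fields: " + missing.join(", "));
  }
}
```

A fixture test per source type can then feed recorded raw strings through the normalizer and assert that unparseable inputs come back as `unknown` rather than as `0` or `NaN`.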

Acceptance Criteria

  • A tier-1 source list exists and every source on it has a working ingestion path.
  • Both Shopify and non-Shopify ingestion paths are running in CI-usable workflows.
  • Source-level quality report exists with parse success and critical-field completeness.
  • Ingest failures are explicit (no silent fallback behavior).
