Open
Description
Why
Catalog quality is limited by source coverage. We need broader brand/vendor ingestion to improve identification accuracy and search relevance.
Outcome
Build a repeatable ingestion program for top cable companies across Shopify and non-Shopify sites.
In Scope
- Define a target source list (top cable brands + key retailers).
- Support both Shopify and non-Shopify source types.
- Add source provenance fields (source URL, crawl/import timestamp, source identifier).
- Add normalization + validation rules for core cable fields.
- Add quality scoring/reporting per source (coverage + parse success + critical-field completeness).
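As a rough illustration of the provenance fields and the per-source report, here is a minimal sketch. All names in it (`SourceProvenance`, `IngestedRecord`, `CRITICAL_FIELDS`, `buildQualityReport`) are hypothetical and not taken from the existing codebase, and the choice of critical fields is an assumption. Coverage is left out because it needs an expected product count per source, which this issue does not define.

```typescript
// Hypothetical shapes for illustration only; field names are assumptions.
type SourceType = "shopify" | "non-shopify";

interface SourceProvenance {
  sourceId: string; // stable source identifier
  sourceType: SourceType;
  sourceUrl: string; // page or feed the record came from
  importedAt: string; // crawl/import timestamp, ISO 8601
}

interface IngestedRecord {
  provenance: SourceProvenance;
  parsed: boolean; // false when the parser rejected the raw item
  fields: Record<string, string | number | null>;
}

// Assumed critical fields; the real list would come from the catalog schema.
const CRITICAL_FIELDS = ["connector", "lengthMeters"] as const;

interface SourceQualityReport {
  sourceId: string;
  total: number;
  parseSuccessRate: number; // parsed records / all records for the source
  criticalFieldCompleteness: number; // parsed records with all critical fields / parsed records
}

function buildQualityReport(
  sourceId: string,
  records: IngestedRecord[],
): SourceQualityReport {
  const own = records.filter((r) => r.provenance.sourceId === sourceId);
  const parsed = own.filter((r) => r.parsed);
  const complete = parsed.filter((r) =>
    CRITICAL_FIELDS.every((f) => {
      const value = r.fields[f];
      return value !== null && value !== undefined;
    }),
  );
  return {
    sourceId,
    total: own.length,
    parseSuccessRate: own.length === 0 ? 0 : parsed.length / own.length,
    criticalFieldCompleteness:
      parsed.length === 0 ? 0 : complete.length / parsed.length,
  };
}
```

Completeness is measured against parsed records only, so a source with many parse failures shows up in `parseSuccessRate` rather than being counted twice.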
Out of Scope
- Perfect extraction of every long-tail product variant in one pass.
- New UI features unrelated to source quality/coverage.
Implementation Plan
- Build prioritized source inventory (tier 1/2) with owner + cadence.
- Expand Shopify ingestion configs/connectors for selected Shopify brands.
- Add non-Shopify extractors (HTML/JSON-LD/API where available).
- Standardize normalization pipeline for connector/wattage/data/video/length fields.
- Add per-source validation + anomaly reporting.
- Add replayable seed process for preview and local QA.
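For the non-Shopify path, one possible JSON-LD extractor is sketched below. It assumes the page embeds schema.org `Product` data in `<script type="application/ld+json">` blocks; `extractJsonLdProducts`, `JsonLdProduct`, and `ExtractResult` are hypothetical names. The regex scan is a simplification (a real extractor would likely use an HTML parser), and nested `@graph` containers are not handled here.

```typescript
// Hypothetical result types; only name and sku are pulled out in this sketch.
interface JsonLdProduct {
  name: string;
  sku: string | null;
}

type ExtractResult =
  | { ok: true; products: JsonLdProduct[] }
  | { ok: false; error: string };

function extractJsonLdProducts(html: string): ExtractResult {
  // Simplified scan for JSON-LD script blocks.
  const scriptRe =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const products: JsonLdProduct[] = [];
  let blockIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = scriptRe.exec(html)) !== null) {
    blockIndex++;
    let data: unknown;
    try {
      data = JSON.parse(m[1] ?? "");
    } catch {
      // Surface malformed JSON instead of skipping the block silently.
      return { ok: false, error: `invalid JSON-LD in block ${blockIndex}` };
    }
    const nodes: unknown[] = Array.isArray(data) ? data : [data];
    for (const node of nodes) {
      if (typeof node !== "object" || node === null) continue;
      const n = node as Record<string, unknown>;
      if (n["@type"] !== "Product") continue;
      const name = n["name"];
      const sku = n["sku"];
      if (typeof name !== "string" || name.trim() === "") {
        return { ok: false, error: "Product node without a name" };
      }
      products.push({
        name: name.trim(),
        sku: typeof sku === "string" ? sku : null,
      });
    }
  }
  if (products.length === 0) {
    return { ok: false, error: "no JSON-LD Product found" };
  }
  return { ok: true, products };
}
```

Returning a tagged result rather than an empty array keeps "page had no product data" distinguishable from "page parsed fine", which the per-source anomaly report can then count.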
Test Plan
- convex: parser/normalizer fixture tests per source type.
- convex: invariants for required fields + unknown handling (no silent coercion).
- manual: run ingest for each tier-1 source and verify coverage + critical fields.
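To make "no silent coercion" concrete, a normalizer can return an explicit unknown marker instead of a guessed value. The sketch below is an assumption about how that could look for the length field; `Normalized`, `normalizeLengthMeters`, and `assertRequiredFields` are hypothetical names, the supported units are illustrative, and the test runner wiring is omitted.

```typescript
// A normalized field is either a real value or an explicit "unknown"
// that keeps the raw input for later review.
type Normalized<T> =
  | { kind: "value"; value: T }
  | { kind: "unknown"; raw: string };

// Parses strings such as "2 m", "6ft", or "30 cm" into meters.
// Anything that does not match is reported as unknown, never coerced to 0.
function normalizeLengthMeters(raw: string): Normalized<number> {
  const m = /^\s*(\d+(?:\.\d+)?)\s*(m|cm|ft|in)\s*$/i.exec(raw);
  if (m === null) return { kind: "unknown", raw };
  const amount = Number(m[1]);
  const unit = (m[2] ?? "").toLowerCase();
  const factors: Record<string, number> = {
    m: 1,
    cm: 0.01,
    ft: 0.3048,
    in: 0.0254,
  };
  const factor = factors[unit];
  if (factor === undefined || !Number.isFinite(amount)) {
    return { kind: "unknown", raw };
  }
  return { kind: "value", value: amount * factor };
}

// Invariant check: throws if any required field is absent or empty.
function assertRequiredFields(
  record: Record<string, unknown>,
  required: string[],
): void {
  const missing = required.filter((f) => {
    const value = record[f];
    return value === undefined || value === null || value === "";
  });
  if (missing.length > 0) {
    throw new Error(`missing required fields: ${missing.join(", ")}`);
  }
}
```

A fixture test per source type could then feed captured raw strings through `normalizeLengthMeters` and assert on `kind`, so that an input like "long" is expected to come back as unknown rather than as a number.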
Acceptance Criteria
- Tier-1 source list exists and every source on it has a working ingestion path.
- Both Shopify and non-Shopify ingestion paths are running in CI-usable workflows.
- Source-level quality report exists with parse success and critical-field completeness.
- Ingest failures are explicit (no silent fallback behavior).