teadata is a snapshot-first Python engine for Texas education data.

It provides:

- `District` and `Campus` domain models
- a fluent query DSL using `>>`
- geospatial lookups (nearest charter, campuses in district boundaries, private-school overlap)
- config-driven enrichment from TAPR, accountability, transfers, PEIMS financials, and closure datasets
- sidecar SQLite stores for fast boundary/map/entity lookup
Install from PyPI:

```
pip install teadata
```

Or work from a source checkout:

```
git clone https://github.com/adpena/teadata.git
cd teadata
uv sync --all-extras
```

Quickstart:

```python
from teadata import DataEngine

# Preferred runtime path: load the latest discovered snapshot.
engine = DataEngine.from_snapshot(search=True)

# District lookup by district number, campus number, or name.
aldine = engine.get_district("101902")
print(aldine.name)

# Campuses physically inside district boundaries.
for campus in aldine.campuses[:5]:
    print(campus.name, campus.campus_number)
```

Primary imports:
```python
from teadata import DataEngine, District, Campus
```

Core behaviors:

- `DataEngine.from_snapshot(...)` supports `.pkl` and `.pkl.gz` snapshots and multiple payload shapes.
- Snapshot discovery checks explicit paths, env vars, the package `.cache`, and parent `.cache` directories.
- `District` and `Campus` support dynamic metadata attributes through `meta`.
- `Campus.to_dict()` always includes `percent_enrollment_change` (numeric when available, otherwise `"N/A"`).
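The `to_dict()` invariant can be sketched with a toy stand-in (a hypothetical `CampusSketch`, not the real `Campus` class):

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class CampusSketch:
    """Toy stand-in for Campus, showing only the serialization invariant."""
    name: str
    campus_number: str
    percent_enrollment_change: Optional[float] = None

    def to_dict(self) -> dict:
        # The key is always present: numeric when available, "N/A" otherwise.
        change: Union[float, str] = (
            self.percent_enrollment_change
            if self.percent_enrollment_change is not None
            else "N/A"
        )
        return {
            "name": self.name,
            "campus_number": self.campus_number,
            "percent_enrollment_change": change,
        }

print(CampusSketch("Example ES", "101902101").to_dict()["percent_enrollment_change"])       # N/A
print(CampusSketch("Example ES", "101902101", 4.2).to_dict()["percent_enrollment_change"])  # 4.2
```

An always-present key keeps downstream DataFrames schema-stable even when the metric is missing for a campus.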
teadata is intentionally cache-first.
Artifacts typically used at runtime:
- `repo_*.pkl` / `repo_*.pkl.gz` (engine snapshot)
- `boundaries_*.sqlite` (boundary WKB sidecar)
- `map_payloads_*.sqlite` (map payload sidecar)
- `entities_*.sqlite` (entity lookup sidecar)
If snapshot/store files are missing locally or are Git LFS pointers, the runtime asset resolvers can fetch the real files when URL environment variables are provided:

- `TEADATA_SNAPSHOT`: explicit snapshot path.
- `TEADATA_SNAPSHOT_URL`: URL used when the snapshot candidate is missing or a Git LFS pointer.
- `TEADATA_BOUNDARY_STORE`: explicit boundary SQLite path.
- `TEADATA_BOUNDARY_STORE_URL`: URL fallback for the boundary store.
- `TEADATA_MAP_STORE`: explicit map SQLite path.
- `TEADATA_MAP_STORE_URL`: URL fallback for the map store.
- `TEADATA_ENTITY_STORE`: explicit entity SQLite path.
- `TEADATA_ENTITY_STORE_URL`: URL fallback for the entity store.
- `TEADATA_ASSET_CACHE_DIR`: override the cache directory used for downloaded assets.
- `TEADATA_DISABLE_INDEXES`: disable the default spatial acceleration indexes.
- `TEADATA_LOG_MEMORY`: enable memory snapshot logging.
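The precedence these variables imply (explicit local path first, URL fallback second) can be sketched as follows; `resolve_store` is an illustrative helper, not teadata's actual resolver:

```python
import os
from pathlib import Path
from typing import Optional, Union

def resolve_store(name: str) -> Union[Path, str, None]:
    """Illustrative lookup precedence for one sidecar store.

    1. TEADATA_<NAME>_STORE: use the explicit local path if it exists.
    2. TEADATA_<NAME>_STORE_URL: hand back the URL so a downloader can
       hydrate the file into the asset cache (TEADATA_ASSET_CACHE_DIR).
    3. Otherwise: no store is available.
    """
    explicit = os.environ.get(f"TEADATA_{name}_STORE")
    if explicit and Path(explicit).exists():
        return Path(explicit)
    url = os.environ.get(f"TEADATA_{name}_STORE_URL")
    if url:
        return url
    return None

os.environ.pop("TEADATA_MAP_STORE", None)
os.environ["TEADATA_MAP_STORE_URL"] = "https://example.com/map_payloads_2024.sqlite"
print(resolve_store("MAP"))  # falls back to the URL
```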
`DataEngine` and `Query` chains compose with `>>`:

```python
# Resolve district then expand to district-operated campuses.
q = engine >> ("district", "ALDINE ISD") >> ("campuses_in",)

# Filter, sort, and take.
top = (
    q
    >> ("filter", lambda c: (c.enrollment or 0) > 1000)
    >> ("sort", lambda c: c.enrollment or 0, True)
    >> ("take", 10)
)

rows = top.to_df(columns=["name", "campus_number", "enrollment"])
```

Supported lookup semantics include:
- case-insensitive district and campus name matching
- wildcard patterns (`*`, `?`, SQL-like `%`/`_`)
- normalized district number handling (for example `"123"` and `"'000123"`)
Spatial and transfer helpers include:

- nearest-campus/nearest-charter queries such as `nearest_charter_same_type(...)`
- transfer graph methods such as `transfers_out(...)` / `transfers_in(...)`
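Conceptually, the transfer methods group directed (origin, destination, count) records by the opposite endpoint. A dict-based sketch of that grouping (the real methods run over the engine's transfer graph, not a flat list):

```python
from collections import defaultdict

# Toy directed transfer records: (from_district, to_district, student_count).
TRANSFERS = [
    ("101902", "101912", 250),
    ("101902", "057829", 75),
    ("101912", "101902", 40),
]

def transfers_out(district: str) -> dict:
    """Students leaving `district`, grouped by destination."""
    out = defaultdict(int)
    for src, dst, n in TRANSFERS:
        if src == district:
            out[dst] += n
    return dict(out)

def transfers_in(district: str) -> dict:
    """Students arriving at `district`, grouped by origin."""
    incoming = defaultdict(int)
    for src, dst, n in TRANSFERS:
        if dst == district:
            incoming[src] += n
    return dict(incoming)

print(transfers_out("101902"))  # {'101912': 250, '057829': 75}
print(transfers_in("101902"))   # {'101912': 40}
```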
`teadata/enrichment` provides registered enrichers for district and campus datasets.
Included enrichers cover:
- district accountability and district TAPR profile data
- campus accountability, TAPR profile/historical enrollment, PEIMS financials
- planned closure overlays
- charter network augmentation
Pipeline behavior is fault-tolerant by design: dataset-level failures are generally logged and do not hard-stop the full build.
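That fault tolerance amounts to a guarded loop over registered enrichers; a sketch of the pattern (the names here are illustrative, not teadata's actual API):

```python
import logging

log = logging.getLogger("teadata.enrichment")

def run_enrichers(engine: dict, enrichers) -> list:
    """Apply enrichers one dataset at a time; log failures and continue."""
    applied = []
    for name, fn in enrichers:
        try:
            fn(engine)
            applied.append(name)
        except Exception as exc:  # dataset-level failure is non-fatal
            log.warning("enricher %r failed, skipping: %s", name, exc)
    return applied

def add_tapr(engine):
    engine["tapr"] = True

def add_closures(engine):
    raise FileNotFoundError("missing closure dataset")

print(run_enrichers({}, [("tapr", add_tapr), ("closures", add_closures)]))  # ['tapr']
```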
`teadata/load_data.py` builds a full `DataEngine` and updates cached artifacts:

```
uv run python -m teadata.load_data
```

At a high level, it:

- resolves year-aware source paths from `teadata/teadata_sources.yaml`
- warm-loads a compatible snapshot cache when signatures match
- otherwise builds districts/campuses from spatial files
- applies enrichment datasets
- writes the snapshot + SQLite sidecars back to `.cache/`
`teadata/teadata_config.py` provides YAML/TOML config loading, year resolution, schema checks, and dataset joins.

CLI entrypoint:

```
uv run teadata-config --help
```

Subcommands:

- `init <out.yaml>`
- `resolve <cfg> <section> <dataset> <year>`
- `report <cfg> [--json] [--min N] [--max N]`
- `join <cfg> <year> [--datasets a,b,c] [--parquet out.parquet] [--duckdb out.duckdb --table t]`
Run the test suite with:

```
uv run pytest
```

Current tests cover:

- snapshot gzip and fallback loading
- query DSL semantics and chaining
- nearest-charter behavior and transfer grouping
- store discovery and asset-cache behavior
- entity serialization invariants (`percent_enrollment_change`)
PyPI's currently documented default limits:

- per-file upload limit: 100 MB
- total project limit: 10 GB

Reference: https://docs.pypi.org/project-management/storage-limits/
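A quick pre-upload check of built artifacts against the per-file limit can look like this (a hypothetical helper, not part of teadata):

```python
from pathlib import Path

PER_FILE_LIMIT = 100 * 1024 * 1024  # PyPI's documented 100 MB default

def oversized(dist_dir: Path = Path("dist")) -> list:
    """Return names of built teadata artifacts that exceed the per-file limit."""
    if not dist_dir.is_dir():
        return []
    return sorted(
        p.name for p in dist_dir.glob("teadata-*") if p.stat().st_size > PER_FILE_LIMIT
    )

print(oversized())  # [] when nothing exceeds the limit
```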
Before the packaging trim, 0.0.118 artifacts were above the per-file limit:

- wheel: about 448 MB
- sdist: about 446 MB

Current slimmed artifacts are below the limit:

- wheel: `dist/teadata-0.0.118-py3-none-any.whl`, about 74 MB
- sdist: `dist/teadata-0.0.118.tar.gz`, about 72 MB
To stay under PyPI file limits while preserving runtime behavior:

- PyPI package data now includes compressed snapshots (`.pkl.gz`) and selected sidecars (`boundaries_*.sqlite`, `entities_*.sqlite`).
- Uncompressed `.pkl` files are excluded from distributions.
- `map_payloads_*.sqlite` is excluded from distributions; provide it at runtime via `TEADATA_MAP_STORE` or `TEADATA_MAP_STORE_URL`.
- URL-based store discovery supports snapshot-derived sidecar paths, so `TEADATA_*_URL` can hydrate missing local sidecars automatically.
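Snapshot-derived sidecar paths presumably share the snapshot's tag; a sketch of that derivation, assuming the `repo_*`/`boundaries_*`/`entities_*`/`map_payloads_*` naming convention from the artifact list above (the real resolver may differ):

```python
from pathlib import Path

def derive_sidecar_paths(snapshot: Path) -> dict:
    """Map a repo_<tag>.pkl[.gz] snapshot to same-tag sidecar filenames."""
    stem = snapshot.name
    for suffix in (".pkl.gz", ".pkl"):
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    tag = stem[len("repo_"):] if stem.startswith("repo_") else stem
    return {
        "boundaries": snapshot.with_name(f"boundaries_{tag}.sqlite"),
        "entities": snapshot.with_name(f"entities_{tag}.sqlite"),
        "map_payloads": snapshot.with_name(f"map_payloads_{tag}.sqlite"),
    }

paths = derive_sidecar_paths(Path(".cache/repo_2024.pkl.gz"))
print(paths["boundaries"].name)  # boundaries_2024.sqlite
```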
- Versioning uses thousandths-place tags (`v0.0.101`, `v0.0.102`, ...).
- Keep only the 3 most recent release tags/assets.
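Pruning to the three newest tags needs a numeric sort, not a lexicographic one, once patch numbers vary in digit count (`v0.0.99` sorts after `v0.0.118` as a string). A sketch:

```python
def tags_to_prune(tags, keep=3):
    """Sort vX.Y.Z tags numerically and return all but the newest `keep`."""
    ordered = sorted(tags, key=lambda t: tuple(int(p) for p in t.lstrip("v").split(".")))
    return ordered[:-keep] if len(ordered) > keep else []

tags = ["v0.0.99", "v0.0.118", "v0.0.101", "v0.0.117", "v0.0.102"]
print(tags_to_prune(tags))  # ['v0.0.99', 'v0.0.101']
```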
Apache License 2.0. See LICENSE.