Releases: NeotomaDB/DataBUS
DataBUS v2.0.0
[2.0.0] - 2026-03-05
Added
- Universal YAML template (data/template_example.yml).
- Example CSV data file (data/data_example.csv) demonstrating the full column set.
- Comprehensive test suite with coverage reporting via Codecov.
- CI pipeline with Ruff linting, pytest + coverage, and Codecov upload (.github/workflows/ci.yml).
- MkDocs documentation site with auto-generated API reference via mkdocstrings.
- Tutorials rewritten to reflect the actual two-pass workflow (databus_example.py).
- OpenSSF Best Practices badge tracking.
Changed
- Major refactor of the validation/upload architecture (BU-334, BU-349): each validator now also handles insertion when a populated databus dict is supplied, eliminating the separate neotomaUploader module and reducing code duplication.
- Refactored pull_params into smaller, testable helper functions in utils.py, removing the dependency on pandas.
- Contact handling consolidated: all contact types (PI, collector, processor, analyst) now go through valid_contact, with chronology modeler assignment handled within valid_chronologies. This significantly reduces repeated code.
- Data upload now tracks inserted IDs so that data uncertainties can be linked correctly.
- Chronology handling improved to properly manage calendar years, default chronologies, and sample age linkage.
- Geopolitical unit insertion updated to handle entities like Scotland under the UK.
- Improved logging with logging_dict and per-file .valid.log output.
- Adopted Ruff as the sole linter and formatter, replacing previous tooling.
- Switched to uv for dependency management and script execution.
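The two-pass pattern behind the validator refactor can be sketched as follows. This is a minimal illustration, not DataBUS code: the Response class, the valid_site_sketch function, the "sitename" key, and the in-memory "insert" are all hypothetical stand-ins, and the real validators talk to a database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    """Hypothetical result object; not DataBUS's actual response class."""
    valid: bool = True
    messages: list = field(default_factory=list)
    inserted_id: Optional[int] = None


def valid_site_sketch(row, databus=None):
    """Validate a site row; also 'insert' it when a databus dict is supplied."""
    response = Response()
    if not row.get("sitename"):
        response.valid = False
        response.messages.append("Missing sitename")
        return response
    response.messages.append("Site is valid")
    if databus is not None:
        # Second pass: an in-memory stand-in for a real database INSERT.
        sites = databus.setdefault("sites", [])
        response.inserted_id = len(sites) + 1
        sites.append(response.inserted_id)
        response.messages.append(f"Inserted site {response.inserted_id}")
    return response


row = {"sitename": "Example Lake"}
first_pass = valid_site_sketch(row)                    # validation only
second_pass = valid_site_sketch(row, {"sites": []})    # validation + insertion
```

Keeping validation and insertion in one function means the checks that guard an insert are the same checks run in the dry pass, which is presumably where the reduction in duplicated code comes from.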
Fixed
- Chron controls now handle calendar years properly.
- U-Th series insertion works correctly when the number of geochron indices differs from sample indices.
- Fixed dataset–publication and dataset–database linking during upload.
- Fixed collector insertion for NODE community datasets.
- Fixed variable validation to handle null values without comparing null against null.
- Numerous typos across chroncontrols.py, sample.py, Chronology.py, and others.
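One way to read the null-handling fix is that fields left empty should be dropped from a comparison rather than matched against other empty fields. The sketch below illustrates that reading only; the function names and the "variablename"/"units" keys are hypothetical, and the actual DataBUS logic may differ.

```python
def non_null_fields(candidate):
    """Keep only the fields that actually carry a value."""
    return {key: value for key, value in candidate.items() if value is not None}


def matches(candidate, record):
    """Compare a candidate to a record using only the candidate's non-null fields."""
    filters = non_null_fields(candidate)
    if not filters:
        # Nothing to compare on: treat as no match instead of matching everything.
        return False
    return all(record.get(key) == value for key, value in filters.items())


candidate = {"variablename": "Pinus", "units": None}
same_taxon = matches(candidate, {"variablename": "Pinus", "units": "NISP"})
other_taxon = matches(candidate, {"variablename": "Picea", "units": None})
```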
DataBUSv1.0.0
DataBUS is a Python-based bulk uploader tool for the Neotoma Paleoecology Database. It helps users prepare, validate, and upload large sets of paleoecological records in bulk — using a YAML + CSV template, validation routines, and an upload script that pushes data into a temporary holding database for subsequent ingestion into Neotoma. 
Key Features (v1.0.0)
- Template-based uploads: Define data using a standardized YAML + CSV template structure that maps CSV columns to Neotoma DB schema (tables/columns) via a “cross-walk.” This enables consistent and repeatable bulk uploads. 
- Validation suite: Before upload, DataBUS validates submitted CSV data against the template definitions. This includes checks for Site, Collection Unit, Analysis Unit, Dataset, Sample, Data values, dating horizons, and more, reducing the risk of malformed or invalid uploads.
- Automated upload script: Once validated, users can run a single command (python3 data_upload.py) to push data into the neotomaholdingtank or neotoma proper database.
- Open-source & MIT licensed: DataBUS is released under the MIT license, enabling free use, modification, and redistribution under standard open-source terms.
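The cross-walk idea from the template feature can be sketched in a few lines. The mapping below is written as a plain Python dict rather than YAML to keep the example standard-library only, and the column names, target names, and apply_crosswalk helper are illustrative assumptions, not the actual DataBUS template format.

```python
import csv
import io

# Hypothetical cross-walk: CSV column name -> "table.column" style target.
CROSSWALK = {
    "Site Name": "ndb.sites.sitename",
    "Latitude": "ndb.sites.latitude",
}

CSV_TEXT = "Site Name,Latitude,Notes\nExample Lake,45.5,unmapped column\n"


def apply_crosswalk(csv_text, crosswalk):
    """Rename mapped CSV columns to their schema targets; drop unmapped ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {crosswalk[col]: value for col, value in row.items() if col in crosswalk}
        for row in reader
    ]


rows = apply_crosswalk(CSV_TEXT, CROSSWALK)
```

Values stay as strings here; type checks and vocabulary checks would belong to the validation step.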
Known limitations / Scope & Considerations
- DataBUS currently expects templates to be prepared in YAML and data to be supplied as CSV.
- Users must follow template rules carefully (column names, vocabularies, types, etc.) — misconfigured templates or CSVs may result in validation failures.
- Because this is the first official release, the tool will still evolve; future versions might include usability enhancements, more automated checks, or UI tooling.
What's Changed
- NSF Badge Update by @ErikZepeda59 in #1
- Dev by @sedv8808 in #2
- Main developer @sedv8808
New Contributors
- @ErikZepeda59 made their first contribution in #1
- @sedv8808 made their first contribution in #2
Full Changelog: v0.0.1...v1.0.0
Alpha DataBUS release
This is the alpha release of DataBUS, including template generation and initial package development.
Full Changelog: https://github.com/NeotomaDB/DataBUS/commits/v0.0.1