Implement (migrate) constraints and validation from PostgreSQL BETYdb

## Background

The PostgreSQL implementation of BETYdb had a number of data constraints. These did a great job of enforcing data integrity (value ranges, foreign keys, uniqueness, etc.). After the migration to a CSV-based dataset, these database constraints are no longer enforced automatically.

Constraints are:
- Described and documented in [Constraints for BETYdb](https://www.overleaf.com/articles/constraints-for-betydb/wxptyrksypkx.pdf).
- Enumerated in the [Constraints Spreadsheet](https://docs.google.com/spreadsheets/d/1fJgaOSR0egq5azYPCP0VRIWw1AazND0OCduyjONH9Wk/edit?pli=1&gid=956483089#gid=956483089).
- Implemented in the PostgreSQL schema, [db/structure.sql](https://github.com/PecanProject/bety/blob/develop/db/structure.sql) in the [bety repository](https://github.com/PecanProject/bety/), although they were not [completely implemented](https://github.com/PecanProject/bety/issues?q=is%3Aissue%20state%3Aopen%20constraints%20label%3Aconstraints).

## Scope

It is not necessary to replicate all constraints. There are a lot of them, and replicating every one would be a lot of work with diminishing returns.

- Translate PostgreSQL constraints into validation in R, run by GitHub Actions.
- Focus on useful constraints only.
- Only cover tables included in this repository.

⚠️ Do not use the original design and implementation as a checklist.

⚠️ Don't overthink it. The original set of constraints took a lot of time to develop.

## Approach

1. Prioritize constraints
2. Discuss implementation approach here. Some options:
    1. validation functions, e.g. in `data-raw/validation.R`, called by `data-raw/make-data.R`
    2. testthat tests, e.g. `tests/testthat/test-data-constraints.R`
    3. combination of the above
    4. other
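
To make the options above concrete, here is a minimal base R sketch of how one validation helper could serve both `data-raw/make-data.R` and a testthat test. The function `validate_table()`, the check names, and the example columns (`id`, `mean`, `n`) are hypothetical and only illustrate the shape of the approach, not an agreed design.

```r
# Hypothetical contents of data-raw/validation.R.
# A "check" is a function(data) returning a logical vector: TRUE means the row is OK.
validate_table <- function(data, checks, table_name) {
  failures <- lapply(names(checks), function(check_name) {
    ok <- checks[[check_name]](data)
    # Treat NA results as failures so missing values are never silently accepted.
    bad_rows <- which(!ok | is.na(ok))
    data.frame(
      table = rep(table_name, length(bad_rows)),
      check = rep(check_name, length(bad_rows)),
      row = bad_rows,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, failures)
}

# Illustrative data; these column names are assumptions, not the real schema.
traits <- data.frame(
  id = c(1, 2, 3),
  mean = c(12.5, -4, 30),
  n = c(3, 5, -1)
)

trait_checks <- list(
  n_non_negative = function(d) d$n >= 0,
  mean_present = function(d) !is.na(d$mean)
)

violations <- validate_table(traits, trait_checks, "traits")
print(violations)

# Option 1: data-raw/make-data.R could fail the build on any violation:
# if (nrow(violations) > 0) stop("Data constraint violations found")

# Option 2: tests/testthat/test-data-constraints.R could reuse the same helper:
# testthat::test_that("traits satisfy constraints", {
#   testthat::expect_equal(nrow(validate_table(traits, trait_checks, "traits")), 0)
# })
```

Returning a data frame of violations rather than stopping at the first failure means a single CI run can report every problem at once.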

Large chunks of this work, especially translating constraints to R, may be well suited to LLMs, because the constraints are formally defined and the most important ones are already implemented in SQL.

### Priorities

**General Approach**

Prioritize constraints that:

- prevent real data corruption
- avoid complex cross-table logic
- are easy to understand and maintain

**Value constraints**

* numeric ranges (e.g., percentages between 0–100, precipitation ≥0)
* non-negative counts (`n ≥ 0`)
* sanity bounds for variables (min, max from variables table)
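
As a sketch of the value constraints listed above, the range check against per-variable bounds could be a join followed by a comparison. The table layouts and column names below (`variable_id`, `trait_id`, `mean`, `min`, `max`) are assumptions for illustration and may not match the CSV files in this repository.

```r
# Illustrative data; column names are assumptions, not the repository's schema.
variables <- data.frame(
  variable_id = c(1, 2),
  name = c("percent_cover", "precipitation"),
  min = c(0, 0),
  max = c(100, Inf),
  stringsAsFactors = FALSE
)

traits <- data.frame(
  trait_id = 1:4,
  variable_id = c(1, 1, 2, 2),
  mean = c(45, 120, 800, -5)
)

# Hypothetical helper: return the rows whose value falls outside [min, max]
# for their variable. Rows with a missing value or missing bounds are skipped
# here and would be left to a separate non-NULL check.
check_value_ranges <- function(values, variables) {
  joined <- merge(values, variables, by = "variable_id", all.x = TRUE)
  out_of_range <- !is.na(joined$mean) & !is.na(joined$min) & !is.na(joined$max) &
    (joined$mean < joined$min | joined$mean > joined$max)
  joined[out_of_range, c("trait_id", "name", "mean", "min", "max")]
}

bad <- check_value_ranges(traits, variables)
print(bad)
```

In this example the percentage of 120 and the negative precipitation value are flagged; an upper bound of `Inf` expresses a one-sided constraint such as "≥ 0".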

**Uniqueness constraints**

* natural keys that prevent duplicate rows
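
A uniqueness check on a natural key can be sketched with base R's `duplicated()`. The helper name `find_duplicate_keys()` and the choice of `genus` plus `species` as a natural key are hypothetical and serve only as an example.

```r
# Hypothetical helper: return every row that shares its key with another row.
find_duplicate_keys <- function(data, key_cols) {
  keys <- data[, key_cols, drop = FALSE]
  # Scan from both directions so that all members of a duplicate group are
  # reported, not only the second and later occurrences.
  is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
  data[is_dup, , drop = FALSE]
}

# Illustrative data; the key columns are an assumption.
species <- data.frame(
  genus = c("Panicum", "Panicum", "Miscanthus"),
  species = c("virgatum", "virgatum", "sinensis"),
  notes = c("first entry", "re-entered", ""),
  stringsAsFactors = FALSE
)

dups <- find_duplicate_keys(species, c("genus", "species"))
print(dups)
```

Reporting both copies of a duplicate makes it easier to decide which row to keep.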

**Non-NULL constraints**

* critical fields required to interpret measurements
* columns that make up natural keys
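
Because CSV files have no `NOT NULL`, a non-NULL check has to decide what counts as missing. The sketch below treats both `NA` and empty or whitespace-only strings as missing. The helper `count_missing()` and the column names are hypothetical.

```r
# Hypothetical helper: count missing entries per required column.
# NA and blank strings are both treated as missing.
count_missing <- function(data, required_cols) {
  vapply(required_cols, function(col) {
    values <- data[[col]]
    blank <- is.character(values) & trimws(as.character(values)) == ""
    sum(is.na(values) | blank)
  }, integer(1))
}

# Illustrative data; column names are assumptions, not the repository's schema.
measurements <- data.frame(
  site_id = c(10, NA, 12),
  units = c("Mg/ha", "", "Mg/ha"),
  mean = c(8.1, 7.4, NA),
  stringsAsFactors = FALSE
)

missing_counts <- count_missing(measurements, c("site_id", "units", "mean"))
print(missing_counts)
```

Any non-zero count could then be turned into a failure in CI.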

**Standardization**

* whitespace normalization
* canonical values for units or categorical variables
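
Standardization could combine a normalizing step with a check against a list of accepted values. In this sketch the function names and the list of allowed units are made up for illustration; the real canonical values would come from the project.

```r
# Hypothetical helper: collapse runs of whitespace and trim both ends.
normalize_whitespace <- function(x) {
  trimws(gsub("[[:space:]]+", " ", x))
}

# Hypothetical helper: list distinct values that are not in the allowed set.
find_noncanonical <- function(x, allowed) {
  unique(x[!is.na(x) & !(x %in% allowed)])
}

# Illustrative values only.
units_raw <- c(" Mg/ha", "Mg/ha ", "kg  m-2", "mg/HA")

units_clean <- normalize_whitespace(units_raw)
print(units_clean)
# "Mg/ha" "Mg/ha" "kg m-2" "mg/HA"

unexpected <- find_noncanonical(units_clean, allowed = c("Mg/ha", "kg m-2"))
print(unexpected)
# "mg/HA"
```

Whitespace can be fixed automatically, while a value such as `mg/HA` is only reported, since correcting it is a judgment call.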

## Deliverables

* R validation functions implementing key constraints
* automated tests that run the validation functions in CI
* documentation describing which constraints are enforced

