Add organism classification (acellular/unicellular/multicellular) for assay metadata #2381

Closed
SatoryKono wants to merge 1 commit into main from codex/add-organism-classification-function-jzh1tb

@SatoryKono
Owner

Motivation

  • Add deterministic organism classification based on the assay_organism and assay_taxonomy_id fields, with taxonomy_id taking priority and conflicts between the two sources reported explicitly.
  • Provide a simple, extensible implementation (name normalization, aliases, a base taxonomy_id → class lookup table) with no external HTTP calls and a fully typed public API.

Description

  • Added a new domain module, src/bioetl/domain/mapping/organism_classification.py, exported through bioetl.domain.mapping, which implements:
    • OrganismClass (Enum) with members ACELLULAR, UNICELLULAR, MULTICELLULAR;
    • OrganismClassificationResult (frozen dataclass) with fields organism_class, normalized_organism, taxonomy_id, source, source_conflict, reason;
    • the function classify_organism(assay_organism, assay_taxonomy_id), which gives taxonomy_id priority and states the reason and any conflict explicitly;
    • the function normalize_organism_name (trim, lower, remove parenthetical fragments, collapse spaces);
    • built-in mappings TAXONOMY_CLASS_MAP, ORGANISM_ALIAS_MAP, and ORGANISM_NAME_CLASS_MAP, plus a keyword rule for heuristics (strains/isolates, keywords such as virus, bacter, etc.).
  • Updated the facade src/bioetl/domain/mapping/__init__.py to export OrganismClass, OrganismClassificationResult, classify_organism, and normalize_organism_name.
  • Added unit tests in tests/unit/domain/mapping/test_organism_classification.py covering: classification by taxonomy_id, classification by name (including strains), aliases (hiv, rice, eel, monkey), conflict behavior, and invalid/empty input.
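
The public API described above can be sketched roughly as follows. This is a minimal illustration, not the module from the PR: the table entries, the `source` values ("taxonomy_id", "organism_name", "none"), the `reason` strings, and the helper `_classify_by_name` are all assumptions made for the example, and the keyword rule is reduced to two keywords.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum


class OrganismClass(Enum):
    ACELLULAR = "acellular"
    UNICELLULAR = "unicellular"
    MULTICELLULAR = "multicellular"


@dataclass(frozen=True)
class OrganismClassificationResult:
    organism_class: OrganismClass | None
    normalized_organism: str | None
    taxonomy_id: int | None
    source: str  # assumed values: "taxonomy_id", "organism_name", "none"
    source_conflict: bool
    reason: str


# Illustrative entries only; the PR's actual tables are not shown in the description.
TAXONOMY_CLASS_MAP = {
    9606: OrganismClass.MULTICELLULAR,  # Homo sapiens
    562: OrganismClass.UNICELLULAR,     # Escherichia coli
    11676: OrganismClass.ACELLULAR,     # Human immunodeficiency virus 1
}
ORGANISM_ALIAS_MAP = {"hiv": "human immunodeficiency virus 1", "rice": "oryza sativa"}
ORGANISM_NAME_CLASS_MAP = {
    "homo sapiens": OrganismClass.MULTICELLULAR,
    "oryza sativa": OrganismClass.MULTICELLULAR,
    "escherichia coli": OrganismClass.UNICELLULAR,
    "human immunodeficiency virus 1": OrganismClass.ACELLULAR,
}


def normalize_organism_name(name: str | None) -> str | None:
    """Trim, lowercase, drop parenthetical fragments, collapse whitespace."""
    if name is None:
        return None
    text = re.sub(r"\([^)]*\)", " ", name.strip().lower())
    text = re.sub(r"\s+", " ", text).strip()
    return text or None


def _classify_by_name(normalized: str | None) -> OrganismClass | None:
    """Hypothetical helper: alias lookup, exact name lookup, then keyword fallback."""
    if normalized is None:
        return None
    canonical = ORGANISM_ALIAS_MAP.get(normalized, normalized)
    if canonical in ORGANISM_NAME_CLASS_MAP:
        return ORGANISM_NAME_CLASS_MAP[canonical]
    if "virus" in canonical:
        return OrganismClass.ACELLULAR
    if "bacter" in canonical:
        return OrganismClass.UNICELLULAR
    return None


def classify_organism(
    assay_organism: str | None, assay_taxonomy_id: int | None
) -> OrganismClassificationResult:
    normalized = normalize_organism_name(assay_organism)
    by_name = _classify_by_name(normalized)
    by_tax = TAXONOMY_CLASS_MAP.get(assay_taxonomy_id) if assay_taxonomy_id is not None else None

    if by_tax is not None:
        # taxonomy_id wins; a disagreeing name is only flagged, never used.
        conflict = by_name is not None and by_name is not by_tax
        reason = "taxonomy_id overrides conflicting name" if conflict else "matched taxonomy_id"
        return OrganismClassificationResult(
            by_tax, normalized, assay_taxonomy_id, "taxonomy_id", conflict, reason
        )
    if by_name is not None:
        return OrganismClassificationResult(
            by_name, normalized, assay_taxonomy_id, "organism_name", False, "matched organism name"
        )
    return OrganismClassificationResult(
        None, normalized, assay_taxonomy_id, "none", False, "no mapping found"
    )
```

In this sketch, `classify_organism("Escherichia coli", 9606)` returns the class mapped to the taxonomy ID and sets `source_conflict` to true, because the name alone would have resolved to a different class.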

Testing

  • Unit tests for the new module passed: uv run python -m pytest tests/unit/domain/mapping/test_organism_classification.py -q ✅.
  • Repository-wide tests: uv run python -m pytest tests/ -x -q failed on a pre-existing ruff formatting check that flags an unrelated test file (tests/unit/infrastructure/config/test_contract_policy_loader.py); the failure is unrelated to the new mapping logic ❌.
  • Type check: uv run python -m mypy --strict src/bioetl/ reported pre-existing mypy errors in unrelated files (src/bioetl/__init__.py, src/bioetl/infrastructure/config/contract_policy_loader.py, src/bioetl/infrastructure/system/memory_monitor.py), so the strict type check did not pass for the repository as a whole ❌.
  • Lint/format: ruff format was applied to the new files; the new module now passes ruff checks.

Possible next steps: load the taxonomy mapping table and aliases from a config file or reference CSV, and address the unrelated formatting and mypy issues that block the full-suite gates in follow-up PRs.
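
As one way to picture the CSV-backed mapping mentioned in the next steps, here is a small sketch. The column names `taxonomy_id` and `organism_class`, the function `load_taxonomy_class_map`, and the sample rows are assumptions for illustration; the PR does not define a file format.

```python
from __future__ import annotations

import csv
import io
from enum import Enum


class OrganismClass(Enum):
    ACELLULAR = "acellular"
    UNICELLULAR = "unicellular"
    MULTICELLULAR = "multicellular"


def load_taxonomy_class_map(csv_text: str) -> dict[int, OrganismClass]:
    """Parse CSV text with assumed columns taxonomy_id and organism_class."""
    mapping: dict[int, OrganismClass] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Enum lookup by value raises ValueError on an unknown class name,
        # which surfaces bad rows at load time instead of at classification time.
        mapping[int(row["taxonomy_id"])] = OrganismClass(row["organism_class"].strip().lower())
    return mapping


# Hypothetical reference data; a real file would be read from disk instead.
EXAMPLE_CSV = "taxonomy_id,organism_class\n9606,multicellular\n562,unicellular\n11676,acellular\n"
TAXONOMY_CLASS_MAP = load_taxonomy_class_map(EXAMPLE_CSV)
```

Loading the table once at import time would keep classification free of I/O, in line with the no-external-calls goal stated in the motivation.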


@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@SatoryKono
Owner Author

Superseded by #2380.

@SatoryKono SatoryKono closed this Mar 2, 2026
@SatoryKono SatoryKono deleted the codex/add-organism-classification-function-jzh1tb branch March 3, 2026 09:56