
[codex] Add initial ENERO registry entries #108

Draft
cmungall wants to merge 3 commits into main from issue-107-enero-integration

@cmungall
Collaborator

Summary

Adds an initial ENERO Foundry integration slice to the SemSQL ontology registry.

This PR adds:

  • cepo as a registry entry backed by the upstream production OWL artifact
  • oto as a registry entry backed by the upstream production Turtle artifact
  • a focused registry-generation test covering the new download/build rules

Why

Issue #107 tracks broader ENERO ontology onboarding.

This PR intentionally starts with the ontologies that have publicly reachable machine-readable sources and can be wired into the existing SemSQL registry model with minimal surface area.

Notes

  • cepo builds via the normal robot merge path because its published OWL imports resolve.
  • oto needed a custom build_command because the published ontology imports http://w3id.org/oto/develop/oto-shared.ttl, which currently resolves to HTML rather than the shared ontology file. The build step rewrites that import to the raw GitHub artifact before running robot merge.
  • This does not close #107 (Integrate ENERO Foundry ontologies into Semantic-SQL); it is the first implementation slice.
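For readers who do not read perl, the import rewrite described in the notes above can be sketched in Python. This is only an illustration, not the build command in the registry entry: the function name `rewrite_oto_import` is hypothetical, and it uses a literal string replacement where the perl one-liner uses a regex substitution. The two URLs are the ones quoted in this thread.

```python
# Illustrative sketch only; the registry entry itself uses a perl one-liner.
# The helper name below is hypothetical.

# Import IRI that currently resolves to HTML instead of the shared ontology.
BROKEN_IMPORT = "http://w3id.org/oto/develop/oto-shared.ttl"

# Raw GitHub artifact that the build step points the import at instead.
RAW_IMPORT = (
    "https://raw.githubusercontent.com/OpenEnergyPlatform/"
    "OpenTransportOntology/production/src/ontology/edits/oto-shared.ttl"
)


def rewrite_oto_import(turtle_text: str) -> str:
    """Replace every occurrence of the unresolvable import IRI."""
    return turtle_text.replace(BROKEN_IMPORT, RAW_IMPORT)


if __name__ == "__main__":
    # Hypothetical one-line Turtle fragment, used only to exercise the helper.
    sample = (
        "<http://example.org/oto> owl:imports "
        "<http://w3id.org/oto/develop/oto-shared.ttl> ."
    )
    print(rewrite_oto_import(sample))
```

Text that does not contain the broken IRI passes through unchanged, which is the property that makes the workaround safe to leave in place until upstream fixes the redirect.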

Validation

  • uv run pytest tests/test_builder/test_registry.py -q
  • manual robot merge smoke test against the upstream cepo.owl
  • manual robot merge smoke test against oto.ttl after the import rewrite encoded in the registry entry

Refs #107

@claude

claude bot commented Mar 30, 2026

Claude finished @cmungall's task


PR Review: ENERO Registry Integration ✅

Review Tasks:

  • Analyze registry file changes (ontologies.yaml)
  • Review new test file implementation
  • Check code quality and best practices
  • Assess security considerations
  • Evaluate test coverage
  • Provide comprehensive feedback

Overall Assessment: APPROVED

This is a well-implemented, focused integration that follows project conventions and includes appropriate safeguards.

Code Quality & Best Practices ✅

Registry Changes (src/semsql/builder/registry/ontologies.yaml:209-220):

  • Consistent formatting: Follows existing YAML structure and indentation
  • Proper categorization: Added under logical "ENERO Foundry" section with descriptive comment
  • Standard fields: Both entries include required url, has_imports: true, and prefixmap
  • Format specification: oto correctly specifies format: turtle for non-OWL source
  • Prefix mappings: Follow established patterns (CEPO, OTO namespace conventions)

Test Implementation (tests/test_builder/test_registry.py):

  • Modern pytest style: Properly uses @pytest.mark.parametrize instead of unittest (follows CLAUDE.md guidelines)
  • Comprehensive coverage: Tests both download rules AND build commands
  • Regex validation: Properly escapes makefile syntax in test patterns
  • Module-scoped fixture: Efficient registry compilation shared across tests
  • Clear test structure: Well-named functions with descriptive parameters

Security Considerations ✅

URL Sources:

  • HTTPS only: Both URLs use secure HTTPS protocols
  • Trusted sources: Official OpenEnergyPlatform repositories
  • Production branches: Using production branch, not unstable main/develop

Build Commands:

  • perl command analysis: The oto build command safely rewrites import paths using standard regex substitution
  • No unsafe operations: Only performs text replacement and standard robot/merge operations
  • Temporary file cleanup: Properly removes temporary files (rm $@.tmp.ttl)
  • Input validation: Uses make variables properly ($<, $@)

Performance Considerations ✅

Build Efficiency:

  • Appropriate complexity: cepo uses efficient default build, oto includes necessary preprocessing
  • Import resolution: Both marked has_imports: true indicating proper dependency handling
  • Temporary file pattern: oto build creates intermediate files efficiently, cleans up properly

Technical Implementation ✅

Custom Build Command Analysis:
The oto perl command addresses a real upstream issue:

perl -0pe 's@http://w3id.org/oto/develop/oto-shared.ttl@https://raw.githubusercontent.com/OpenEnergyPlatform/OpenTransportOntology/production/src/ontology/edits/oto-shared.ttl@g'
  • Necessary workaround: Resolves broken import that currently points to HTML instead of ontology
  • Conservative approach: Only changes the problematic import URL, nothing else
  • Follows precedent: Similar perl preprocessing used in 6 other registry entries (e.g. lines 510, 569, 737, 745, 776)

Test Coverage Analysis:

  • Download validation: Ensures makefile generates correct curl commands for both ontologies
  • Build validation: Validates both default robot merge (cepo) and custom perl preprocessing (oto)
  • Pattern matching: Uses appropriate regex escaping for makefile syntax validation
  • Parametrized design: Easy to extend for additional ENERO ontologies
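The point about regex escaping can be illustrated with a small self-contained sketch. It does not reproduce the tests in this PR: the `GENERATED` rule text and the `rule_contains` helper are hypothetical stand-ins, and the real generated Makefile may look different. It only shows why make variables such as `$@` and `$<` need `re.escape` before being used in a pattern.

```python
import re

# Hypothetical generated rule text; the real Makefile output may differ.
GENERATED = (
    "db/oto.owl: download/oto.owl\n"
    "\tperl -0pe 's@a@b@g' $< > $@.tmp.ttl"
    " && robot merge -i $@.tmp.ttl -o $@"
    " && rm $@.tmp.ttl\n"
)


def rule_contains(makefile_text: str, literal_fragment: str) -> bool:
    """Check for a literal fragment, escaping regex metacharacters first."""
    return re.search(re.escape(literal_fragment), makefile_text) is not None


if __name__ == "__main__":
    # Escaped: "$" and "." are treated as literal characters.
    print(rule_contains(GENERATED, "rm $@.tmp.ttl"))
    # Unescaped: "$" is an end-of-string anchor, so the pattern cannot match.
    print(re.search("rm $@.tmp.ttl", GENERATED) is not None)
```

Without escaping, `$` anchors to the end of the string and `.` matches any character, so a pattern copied verbatim from a Makefile either fails to match or matches more than intended.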

Minor Suggestions

  1. Documentation: Consider adding inline comments explaining the import rewrite reasoning (though the PR description covers this well)

  2. Future Enhancement: Once upstream fixes the oto-shared.ttl import, this custom build command can be removed

Conclusion

This PR represents excellent engineering practices:

  • ✅ Minimal surface area implementation
  • ✅ Comprehensive test coverage
  • ✅ Addresses real downstream integration issues
  • ✅ Follows all established project conventions
  • ✅ Safe, auditable security posture
  • ✅ Clear documentation of trade-offs

Recommendation: APPROVE - Ready to merge


@claude

claude bot commented Mar 30, 2026

Claude finished @cmungall's task


PR Review: ENERO Registry Integration ✅

Review Tasks:

  • Analyze registry file changes (ontologies.yaml)
  • Review build commands in Makefile
  • Check prefix mappings additions
  • Review new test file implementation
  • Check code quality and best practices
  • Assess security considerations
  • Evaluate test coverage
  • Provide comprehensive feedback

Overall Assessment: APPROVED

This is a well-implemented, focused integration that follows project conventions and includes appropriate safeguards.

Code Quality & Best Practices ✅

Registry Changes (src/semsql/builder/registry/ontologies.yaml:209-223):

  • Consistent formatting: Follows existing YAML structure and indentation
  • Proper categorization: Added under logical "ENERO Foundry" section with descriptive comment
  • Standard fields: Both entries include required url, has_imports: true, and prefixmap
  • Format specification: oto correctly specifies format: turtle for non-OWL source
  • Prefix mappings: Follow established patterns (CEPO, OTO namespace conventions)
  • Documentation: Clear comments explain the custom build workaround with upstream issue link

Test Implementation (tests/test_builder/test_registry.py):

  • Modern pytest style: Properly uses @pytest.mark.parametrize instead of unittest (follows CLAUDE.md guidelines)
  • Comprehensive coverage: Tests both download rules AND build commands
  • Regex validation: Properly escapes makefile syntax in test patterns
  • Module-scoped fixture: Efficient registry compilation shared across tests
  • Clear test structure: Well-named functions with descriptive parameters

Prefix Files (src/semsql/builder/prefixes/*.csv):

  • Consistent format: Added CEPO and OTO entries following CSV structure
  • Synchronized: Properly added to both prefixes.csv and prefixes_local.csv

Security Considerations ✅

URL Sources:

  • HTTPS only: Both URLs use secure HTTPS protocols
  • Trusted sources: Official OpenEnergyPlatform repositories
  • Production branches: Using production branch, not unstable main/develop

Build Commands:

  • perl command analysis: The oto build command safely rewrites import paths using standard regex substitution
  • No unsafe operations: Only performs text replacement and standard robot/merge operations
  • Temporary file cleanup: Properly removes temporary files (rm $@.tmp.ttl)
  • Input validation: Uses make variables properly ($<, $@)
  • Established pattern: Similar perl preprocessing used in 6 other registry entries (e.g. lines 513, 572, 740, 748, 779)

Performance Considerations ✅

Build Efficiency:

  • Appropriate complexity: cepo uses efficient default build, oto includes necessary preprocessing
  • Import resolution: Both marked has_imports: true indicating proper dependency handling
  • Temporary file pattern: oto build creates intermediate files efficiently, cleans up properly

Technical Implementation ✅

Custom Build Command Analysis:
The oto perl command addresses a real upstream issue:

perl -0pe 's@http://w3id.org/oto/develop/oto-shared.ttl@https://raw.githubusercontent.com/OpenEnergyPlatform/OpenTransportOntology/production/src/ontology/edits/oto-shared.ttl@g'
  • Necessary workaround: Resolves broken import that currently points to HTML instead of ontology
  • Conservative approach: Only changes the problematic import URL, nothing else
  • Follows precedent: Similar perl preprocessing used in 6 other registry entries

Test Coverage Analysis:

  • Download validation: Ensures makefile generates correct curl commands for both ontologies
  • Build validation: Validates both default robot merge (cepo) and custom perl preprocessing (oto)
  • Pattern matching: Uses appropriate regex escaping for makefile syntax validation
  • Parametrized design: Easy to extend for additional ENERO ontologies

Minor Suggestions

  1. Documentation: Consider adding inline comments explaining the import rewrite reasoning (though the PR description covers this well)

  2. Future Enhancement: Once upstream fixes the oto-shared.ttl import, this custom build command can be removed

Conclusion

This PR represents excellent engineering practices:

  • ✅ Minimal surface area implementation
  • ✅ Comprehensive test coverage
  • ✅ Addresses real downstream integration issues
  • ✅ Follows all established project conventions
  • ✅ Safe, auditable security posture
  • ✅ Clear documentation of trade-offs

Recommendation: APPROVE - Ready to merge


@cmungall
Collaborator Author

Follow-up from local runoak exploration of the built DBs in this branch:

  • OEO is the large reference layer here. Native OEO content is 1272 classes (1260 with text definitions); the built DB has 2118 classes total. It spans energy carriers, sectors, technologies, power plants, and modeling/software/data concepts.
  • CEPO is a compact policy module. Native CEPO content is 143 classes (139 with text definitions); the built DB has 471 classes total because of imports. It is mostly a goals/targets branch plus a large policy-instrument hierarchy. policy instrument has 135 descendants in the current build.
  • OTO is a mid-sized transport module. Native OTO content is 206 classes (183 with text definitions); the built DB has 241 classes total. It covers vehicles, transport infrastructure, traffic/transport concepts, and some emissions/measurement terms.
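The counts above can be turned into two derived figures per ontology: how many classes in each built DB are not native, and what share of native classes carry a text definition. The sketch below is just that arithmetic. It assumes each "total" includes the native classes, so the difference is imported or otherwise non-native content; the `ClassCounts` type is a hypothetical helper for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassCounts:
    """Class counts as reported in this comment (hypothetical helper type)."""

    name: str
    native: int
    native_with_definition: int
    total_in_db: int

    @property
    def non_native(self) -> int:
        # Assumes the DB total includes the native classes.
        return self.total_in_db - self.native

    @property
    def definition_coverage(self) -> float:
        # Fraction of native classes that have a text definition.
        return self.native_with_definition / self.native


COUNTS = [
    ClassCounts("OEO", 1272, 1260, 2118),
    ClassCounts("CEPO", 143, 139, 471),
    ClassCounts("OTO", 206, 183, 241),
]

if __name__ == "__main__":
    for c in COUNTS:
        print(
            f"{c.name}: {c.non_native} non-native classes, "
            f"{c.definition_coverage:.1%} of native classes defined"
        )
```

On these numbers OTO has the lowest definition coverage of the three and the smallest non-native share.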

Overlap / integration readout:

  • OEO already has a small policy branch (policy instrument has 13 descendants, including carbon tax, education instrument, feed-in tariff, market premium, levy, etc.).
  • CEPO appears complementary rather than redundant: it expands the policy area much more deeply, e.g. detailed target types plus a much finer instrument taxonomy (carbon tax, feed-in premium, grant, loan guarantee, trading scheme, disclosure standard, grid access and priority for renewables, etc.).
  • OTO overlaps with OEO mainly through mappings, not shared imports. In the current release I saw explicit skos:exactMatch links from OTO terms into OEO and some other vocabularies.

Quality notes from the same exploration:

  • CEPO looks fairly clean and definition-rich.
  • OEO also looks mature.
  • OTO currently has some curation issues in the published artifact: a couple of skos:exactMatch values are literal sentences rather than CURIEs/URIs, and at least one term (OTO:00020063 vehicle) has an OEO URL string showing up as a definition. So keeping the current OTO workaround is reasonable, but I would treat OTO as lower-trust than CEPO/OEO for now.
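The two OTO issues noted above (mapping values that are sentences, and a definition that is only a URL) are the kind of thing a small lint pass could flag. The following is a rough sketch under stated assumptions, not something run for this PR: the regexes are deliberately loose heuristics, the function names are hypothetical, and the input is assumed to be plain (subject, predicate, object) string tuples rather than any particular OAK or SemSQL data structure.

```python
import re

# Loose heuristics, not full CURIE or IRI grammars.
CURIE_RE = re.compile(r"^[A-Za-z_][\w.\-]*:\S+$")
IRI_RE = re.compile(r"^https?://\S+$")


def looks_like_identifier(value: str) -> bool:
    """True if the value resembles a CURIE or an http(s) IRI."""
    value = value.strip()
    return bool(IRI_RE.match(value) or CURIE_RE.match(value))


def suspicious_mappings(rows):
    """Return skos:exactMatch rows whose object is not identifier-like.

    rows is an iterable of (subject, predicate, object) string tuples.
    """
    return [
        (s, p, o)
        for s, p, o in rows
        if p == "skos:exactMatch" and not looks_like_identifier(o)
    ]


def definition_is_bare_url(definition: str) -> bool:
    """True if a text definition consists of nothing but a URL."""
    return bool(IRI_RE.match(definition.strip()))


if __name__ == "__main__":
    # Hypothetical rows for illustration; not copied from the OTO release.
    rows = [
        ("OTO:0000001", "skos:exactMatch", "OEO:00000001"),
        ("OTO:0000002", "skos:exactMatch", "same idea as the term in OEO"),
    ]
    print(suspicious_mappings(rows))
    print(definition_is_bare_url("http://example.org/ontology/OEO_00000001"))
```

A check like this only catches shape problems; it says nothing about whether a well-formed mapping is semantically correct.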

Representative commands used:

  • runoak -i db/cepo.db info i^CEPO:
  • runoak -i db/cepo.db labels .desc//p=i 'policy instrument'
  • runoak -i db/oto.db mappings i^OTO:
  • runoak -i db/oeo.db statistics --group-by-prefix -O csv
  • runoak -i db/oeo.db search 't~model'

Net result: this PR’s current scope makes sense as a first slice. OEO is the broad anchor, CEPO adds a useful fine-grained policy module, and OTO is usable but needs more caution because of upstream artifact quality.

@claude

claude bot commented Mar 30, 2026

Claude finished @cmungall's task


PR Review: ENERO Registry Integration ✅

Review Tasks:

  • Analyze registry file changes (ontologies.yaml)
  • Review build commands in Makefile
  • Check prefix mappings additions
  • Review new test file implementation
  • Check code quality and best practices
  • Assess security considerations
  • Evaluate test coverage
  • Provide comprehensive feedback

Overall Assessment: APPROVED

This is a well-implemented, focused integration that follows project conventions and includes appropriate safeguards.

Code Quality & Best Practices ✅

Registry Changes (src/semsql/builder/registry/ontologies.yaml:208-222):

  • Consistent formatting: Follows existing YAML structure and indentation
  • Proper categorization: Added under logical "ENERO Foundry" section with descriptive comment
  • Standard fields: Both entries include required url, has_imports: true, and prefixmap
  • Format specification: oto correctly specifies format: turtle for non-OWL source
  • Prefix mappings: Follow established patterns (CEPO, OTO namespace conventions)
  • Documentation: Clear comments explain the custom build workaround with upstream issue link

Test Implementation (tests/test_builder/test_registry.py):

  • Modern pytest style: Properly uses @pytest.mark.parametrize instead of unittest (follows CLAUDE.md guidelines)
  • Comprehensive coverage: Tests both download rules AND build commands
  • Regex validation: Properly escapes makefile syntax in test patterns
  • Module-scoped fixture: Efficient registry compilation shared across tests
  • Clear test structure: Well-named functions with descriptive parameters

Prefix Files (src/semsql/builder/prefixes/*.csv):

  • Consistent format: Added CEPO and OTO entries following CSV structure
  • Synchronized: Properly added to both prefixes.csv and prefixes_local.csv

Security Considerations ✅

URL Sources:

  • HTTPS only: Both URLs use secure HTTPS protocols
  • Trusted sources: Official OpenEnergyPlatform repositories
  • Production branches: Using production branch, not unstable main/develop

Build Commands:

  • perl command analysis: The oto build command safely rewrites import paths using standard regex substitution
  • No unsafe operations: Only performs text replacement and standard robot/merge operations
  • Temporary file cleanup: Properly removes temporary files (rm $@.tmp.ttl)
  • Input validation: Uses make variables properly ($<, $@)
  • Established pattern: Similar perl preprocessing used in 6 other registry entries (e.g. lines 513, 572, 740, 748, 779)

Performance Considerations ✅

Build Efficiency:

  • Appropriate complexity: cepo uses efficient default build, oto includes necessary preprocessing
  • Import resolution: Both marked has_imports: true indicating proper dependency handling
  • Temporary file pattern: oto build creates intermediate files efficiently, cleans up properly

Technical Implementation ✅

Custom Build Command Analysis:
The oto perl command addresses a real upstream issue:

perl -0pe 's@http://w3id.org/oto/develop/oto-shared.ttl@https://raw.githubusercontent.com/OpenEnergyPlatform/OpenTransportOntology/production/src/ontology/edits/oto-shared.ttl@g'
  • Necessary workaround: Resolves broken import that currently points to HTML instead of ontology
  • Conservative approach: Only changes the problematic import URL, nothing else
  • Follows precedent: Similar perl preprocessing used in 6 other registry entries

Test Coverage Analysis:

  • Download validation: Ensures makefile generates correct curl commands for both ontologies
  • Build validation: Validates both default robot merge (cepo) and custom perl preprocessing (oto)
  • Pattern matching: Uses appropriate regex escaping for makefile syntax validation
  • Parametrized design: Easy to extend for additional ENERO ontologies

Minor Suggestions

  1. Documentation: Consider adding inline comments explaining the import rewrite reasoning (though the PR description covers this well)

  2. Future Enhancement: Once upstream fixes the oto-shared.ttl import, this custom build command can be removed

Conclusion

This PR represents excellent engineering practices:

  • ✅ Minimal surface area implementation
  • ✅ Comprehensive test coverage
  • ✅ Addresses real downstream integration issues
  • ✅ Follows all established project conventions
  • ✅ Safe, auditable security posture
  • ✅ Clear documentation of trade-offs

Recommendation: APPROVE - Ready to merge


@cmungall
Collaborator Author

Follow-up: I added meno and muno as standalone registry entries on top of the earlier oeo / cepo / oto slice.

Why they were not effectively present before:

  • MENO only appeared as a prefix mapping nested under oeo; it was not a standalone ontology entry and there was no db/meno.db target.
  • MUNO was not present at all.

What changed:

  • meno now builds directly from https://raw.githubusercontent.com/stap-m/midlevel-energy-ontology/main/ontology/src/midlevel-energy.owl
  • muno now builds from https://raw.githubusercontent.com/stap-m/muno/main/src/muno.owl
  • muno needs a temporary import rewrite, similar in spirit to oto, because the published muno.owl imports two placeholder semanticweb.org/.../untitled-ontology-* IRIs that only resolve via the repo-local Protégé catalog.
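For context on the muno problem: a Protégé catalog is an OASIS XML catalog whose `uri` elements map an import IRI (`name`) to a local file (`uri`). The sketch below shows how such a catalog could be read and used to rewrite placeholder imports to fetchable URLs. It is not the rewrite encoded in the registry entry, which is not shown in this thread. The function names, the sample IRIs, the file names, and the base URL are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by OASIS XML catalogs such as Protégé's catalog-v001.xml.
CATALOG_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def catalog_mappings(catalog_xml: str) -> dict[str, str]:
    """Map each import IRI (name) to its catalog-relative file (uri)."""
    root = ET.fromstring(catalog_xml)
    return {
        el.attrib["name"]: el.attrib["uri"]
        for el in root.iter(f"{CATALOG_NS}uri")
        if "name" in el.attrib and "uri" in el.attrib
    }


def rewrite_imports(owl_text: str, mappings: dict[str, str], base_url: str) -> str:
    """Replace placeholder IRIs with URLs built from a base URL.

    Longer IRIs are replaced first so that an IRI which is a prefix of
    another does not corrupt the longer one.
    """
    base = base_url.rstrip("/")
    for iri in sorted(mappings, key=len, reverse=True):
        relative = mappings[iri].removeprefix("./")
        owl_text = owl_text.replace(iri, f"{base}/{relative}")
    return owl_text


if __name__ == "__main__":
    # Hypothetical catalog and IRIs, for illustration only.
    catalog = (
        '<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">'
        '<uri name="http://www.semanticweb.org/example/untitled-ontology-1"'
        ' uri="imports/part-a.owl"/>'
        "</catalog>"
    )
    owl = '<owl:imports rdf:resource="http://www.semanticweb.org/example/untitled-ontology-1"/>'
    print(rewrite_imports(owl, catalog_mappings(catalog), "https://example.org/muno/src"))
```

Deriving the rewrite from the catalog keeps the placeholder-to-file mapping in one place, at the cost of depending on the catalog file being published alongside the ontology.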

Upstream issue for that muno import problem:

Validation in this branch:

  • uv run pytest tests/test_builder/test_registry.py -q
  • make db/meno.db
  • make db/muno.db
  • runoak -i db/meno.db info i^MENO:
  • runoak -i db/muno.db info i^MUNO:

The two new DBs build successfully. Representative normalized IDs now come back as expected, e.g. MENO:01001 and MUNO:01001.

One existing repo issue is still present and unrelated to these entries: after successful semsql make db/... completion, the CLI still tries to execute ontology metadata keys as shell commands (RUNNING: id, RUNNING: description, etc.). The DBs are already written before that noise.

Development

Successfully merging this pull request may close these issues.

Integrate ENERO Foundry ontologies into Semantic-SQL
