Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new OAI-PMH ingestion pathway to TeSS to import Events and Materials from remote OAI-PMH endpoints, with support for both Dublin Core and Bioschemas RDF metadata (MR11 / #1179).
Changes:
- Introduce
Ingestors::OaiPmhIngestorwith RDF-first (Bioschemas) parsing and Dublin Core fallback. - Register the new ingestor in
IngestorFactory. - Add unit tests covering empty endpoints, Dublin Core materials/events, mixed records, and Bioschemas RDF parsing.
- Fix HTML
bodytagclassattribute quoting in the application layout.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
lib/ingestors/oai_pmh_ingestor.rb |
New OAI-PMH ingestor implementation for RDF (Bioschemas) and Dublin Core ingestion. |
lib/ingestors/ingestor_factory.rb |
Registers Ingestors::OaiPmhIngestor as an available ingestor. |
test/unit/ingestors/oai_pmh_test.rb |
Adds unit tests for OAI-PMH ingestion behavior and parsing outcomes. |
app/views/layouts/application.html.erb |
Quotes body class attribute to produce valid HTML when multiple classes are emitted. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| def read_oai_dublin_core(client) | ||
| count = 0 | ||
| client.list_records.full.each do |record| |
There was a problem hiding this comment.
read_oai_dublin_core calls client.list_records without a metadata_prefix. In OAI-PMH, ListRecords requests require metadataPrefix, and many servers/gems will return a badArgument / raise when it’s missing. Consider explicitly requesting Dublin Core (typically oai_dc) here so this works against spec-compliant endpoints.
| client.list_records.full.each do |record| | |
| client.list_records(metadata_prefix: 'oai_dc').full.each do |record| |
| end | ||
|
|
||
| def read(source_url) | ||
| client = OAI::Client.new source_url, headers: { 'From' => config[:mail] } |
There was a problem hiding this comment.
config[:user_agent] is defined but not used when instantiating OAI::Client. Consider passing it via headers (alongside From) so requests identify TeSS consistently, matching how other ingestors set User-Agent on outbound HTTP requests.
| client = OAI::Client.new source_url, headers: { 'From' => config[:mail] } | |
| client = OAI::Client.new source_url, headers: { 'From' => config[:mail], 'User-Agent' => config[:user_agent] } |
| @@ -0,0 +1,208 @@ | |||
| require 'open-uri' | |||
There was a problem hiding this comment.
require 'open-uri' doesn’t appear to be used in this ingestor (it doesn’t call open_url or URI.open). Consider removing the unused require to keep dependencies minimal.
| require 'open-uri' |
| def list_records(metadata_prefix: nil) | ||
| if metadata_prefix == 'rdf' | ||
| @rdf_response | ||
| else | ||
| @dc_response | ||
| end |
There was a problem hiding this comment.
FakeClient#list_records treats a missing metadata_prefix (nil) as Dublin Core. Since OAI-PMH ListRecords requests normally require an explicit metadataPrefix, consider making the mock require/handle the explicit DC prefix (e.g., oai_dc) so the tests exercise the same call pattern as production code.
| assert_equal @ingestor.materials, [] | ||
| assert_equal @ingestor.events, [] |
There was a problem hiding this comment.
These assertions have the expected/actual arguments reversed. Swapping them to assert_equal [], @ingestor.materials (and similarly for events) will produce clearer failure messages and matches the assertion style used elsewhere in the test suite.
| assert_equal @ingestor.materials, [] | |
| assert_equal @ingestor.events, [] | |
| assert_equal [], @ingestor.materials | |
| assert_equal [], @ingestor.events |
Summary of changes
Motivation and context
This is a relevant step in the mTeSS-X project. In particular, it is MR11. See #1179.
Checklist
to license it to the TeSS codebase under the
BSD license.