Skip to content

ETT-1288 Run fixity check on new storage#175

Open
moseshll wants to merge 15 commits intomainfrom
ETT-1288_fixity
Open

ETT-1288 Run fixity check on new storage#175
moseshll wants to merge 15 commits intomainfrom
ETT-1288_fixity

Conversation

@moseshll
Copy link
Contributor

@moseshll moseshll commented Mar 3, 2026

  • Add storage_name column to feed_audit and feed_audit_detail
  • main_repo_audit.pl now requires a --storage_name parameter
    • Expected values s3-truenas-ictc and s3-truenas-macc
  • HTFeed/Storage/LocalPairtree.pm and HTFeed/StorageAudit.pm also write feed_audit.storage_namd and feed_audit_detail.storage_name, respectivelt, but I'm not sure what values that can or should provide
  • Add a couple of happy path tests for the script as a whole.

moseshll added 8 commits March 3, 2026 11:38
- Add `storage_name` column to `feed_audit` and `feed_audit_detail`
- `main_repo_audit.pl` now requires a `--storage_name` parameter
  - Expected values `s3-truenas-ictc` and `s3-truenas-macc`
- `HTFeed/Storage/LocalPairtree.pm` and `HTFeed/StorageAudit.pm` also write `feed_audit.storage_namd` and `feed_audit_detail.storage_name`, respectivelt, but I'm not sure what values that can or should provide
- Add a couple of happy path tests for the script as a whole.
- New code is truenas_audit.pl
- TODO: several FIXMEs in the code, possible leftovers from main_repo_audit.pl inside truenas_audit.pl
@aelkiss aelkiss self-requested a review March 11, 2026 16:57
Copy link
Member

@aelkiss aelkiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at this some; I want to look at the tests more, not sure I'll get to that today though.

Copy link
Member

@aelkiss aelkiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look good; see comments about the backticks. Not critical, but maybe worth considering.

@moseshll
Copy link
Contributor Author

@aelkiss Since the scope of this has changed to focus on feed_storage instead of feed_audit I would suggest backing out changes to the feed_audit DB schema on this branch. I suspect the changes may still be disruptive for feed_internal but if this is to be the MVP for this round then I'd like to return to feed_audit after getting some utility from what's here.
Thoughts?

@moseshll moseshll marked this pull request as ready for review March 13, 2026 18:35
@moseshll moseshll requested a review from aelkiss March 13, 2026 18:35
Copy link
Member

@aelkiss aelkiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are definitely some things here to think about in the future:

  • how we use the checksums saved in feed_storage
  • questions about feed_audit going forward
  • in the future, merging this with StorageAudit and making it more testable

I think this will get us what we need for now, and we can make some follow-up issues for the remaining issues.

localtime( ( stat($metsfile) )[9] ) );
}

my $last_touched = $zip_seconds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$last_touched doesn't appear to be used



#insert
execute_stmt(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe wait to do this until after the file checks? Maybe worth comparing to the backup audits to see behavior -- that is, do we update lastchecked when we see it or after we verify some part of it?

is($db_data->[0]->{storage_name}, $storage_name, 'correct storage_name');
ok($db_data->[0]->{zip_size} > 0, 'nonzero zip_size');
ok($db_data->[0]->{mets_size} > 0, 'nonzero mets_size');
ok(!defined $db_data->[0]->{saved_md5sum}, 'not defined saved_md5sum');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want it to save this? Is it worth using if it's there? Probably fine for now to leave as-is, but maybe worth thinking about in the future or otherwise making it more consistent with what we do for the backup auditing.

CREATE TABLE IF NOT EXISTS `feed_audit` (
`namespace` varchar(10) NOT NULL,
`id` varchar(30) NOT NULL,
`storage_name` varchar(32) NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to go ahead and roll back these changes

my $stmt =
"insert into feed_audit (namespace, id, sdr_partition, zip_size, zip_date, mets_size, mets_date, lastchecked, lastmd5check, md5check_ok) \
values(?,?,?,?,?,?,?,CURRENT_TIMESTAMP,CURRENT_TIMESTAMP,1) \
"insert into feed_audit (namespace, id, storage_name, sdr_partition, zip_size, zip_date, mets_size, mets_date, lastchecked, lastmd5check, md5check_ok) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here.

However, we also need to think about once we turn off the old storage that uses LocalPairtree, we probably still want feed_audit populated? Maybe we need to mark one storage as "primary" for the purposes of feed_audit, or have some kind of view. Regardless, we don't need to do it right now in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants