Skip to content

ENG-2292 - Add distance column to StagedResourceAncestor#7326

Merged
vcruces merged 2 commits intomainfrom
ENG-2292
Feb 11, 2026
Merged

ENG-2292 - Add distance column to StagedResourceAncestor#7326
vcruces merged 2 commits intomainfrom
ENG-2292

Conversation

@vcruces
Copy link
Copy Markdown
Contributor

@vcruces vcruces commented Feb 5, 2026

Ticket ENG-2292

Description Of Changes

This PR adds a distance column to the StagedResourceAncestor table to track the hierarchical distance between resources in their ancestor relationships. The distance is calculated based on URN segment depth (e.g., database → schema = 1, database → table = 2, database → field = 3).

The implementation includes a database migration, backfill script to populate existing records, and tests.

Code Changes

  • Added distance column (nullable integer) to StagedResourceAncestor table
  • Created migration with conditional index creation based on table size
  • Implemented backfill script backfill_stagedresrouceancestor_distance.py that calculates distance from URN segment counts
  • Integrated distance backfill into post-upgrade process and admin backfill status endpoint
  • Updated StagedResourceAncestor.create_all_staged_resource_ancestor_links() to accept and store distance values
  • Added ix_staged_resource_ancestor_desc_anc_dist index on [descendant_urn, ancestor_urn, distance]
  • Updated all related tests to handle distance field in ancestor relationships

Steps to Confirm

  1. Run the migration and verify distance column is added to stagedresourceancestor table
  2. Verify backfill process correctly calculates distances based on URN segments
  3. Check admin endpoint /api/v1/admin/backfills/status includes distance backfill status
  4. Verify downgrade migration properly removes the column and clears backfill history

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Feb 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Feb 11, 2026 10:03pm
fides-privacy-center Ignored Ignored Feb 11, 2026 10:03pm

Request Review

@vcruces vcruces changed the title Eng 2292 ENG-2292 - Add distance column to StagedResourceAncestor Feb 5, 2026
@vcruces vcruces changed the base branch from main to ENG-2293 February 5, 2026 21:46
@vcruces vcruces force-pushed the ENG-2293 branch 2 times, most recently from a980566 to 864d4f6 Compare February 8, 2026 23:48
@vcruces vcruces added the do not merge Please don't merge yet, bad things will happen if you do label Feb 9, 2026
@vcruces vcruces requested a review from adamsachs February 10, 2026 13:32
@vcruces vcruces marked this pull request as ready for review February 10, 2026 13:32
@vcruces vcruces requested a review from a team as a code owner February 10, 2026 13:32
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Feb 10, 2026

Greptile Overview

Greptile Summary

This PR adds a nullable distance integer column to stagedresourceancestor to represent hierarchical distance between a descendant and ancestor URN, plus a composite index on (descendant_urn, ancestor_urn, distance). It also introduces a post-upgrade batched backfill that computes distance from URN segment counts, wires the backfill into the post-upgrade runner, and exposes the pending-count via the admin backfill status endpoint.

The main correctness risk is an API/signature change: StagedResourceAncestor.create_all_staged_resource_ancestor_links() now expects a mapping of descendant URN to (ancestor_urn, distance) tuples; if any existing production callers still pass Set[str] as before, they will break at runtime. There are also a couple of repo-rule issues (mid-file imports) and a timing-based test that is prone to flakiness.

Confidence Score: 3/5

  • This PR has a couple of merge-blocking integration/testing issues to address before it’s safe to merge.
  • Score reduced due to a likely breaking signature change in a model method with no verified production caller updates, plus a flaky timing-based test and import-structure violations that should be cleaned up.
  • src/fides/api/models/detection_discovery/core.py; tests/ops/models/detection_discovery/test_core.py; tests/ops/migration_tests/backfill_scripts.py/test_backfill_stagedresourceancestor_distance.py; src/fides/api/api/v1/endpoints/admin.py

Important Files Changed

Filename Overview
.fides/db_dataset.yml Added distance field metadata to the stagedresourceancestor dataset entry.
changelog/7326-stagedresourceancestor-distance.yaml Added changelog entry for new distance column (db-migration label present).
src/fides/api/alembic/migrations/versions/xx_2026_02_08_2339_f85bd4c08401_add_is_leaf_to_stagedresource.py Adjusted migration logging message for existing conditional index creation.
src/fides/api/alembic/migrations/versions/xx_2026_02_09_0010_d304f57aea6d_add_distance_to_stagedresourceancestor.py Adds nullable distance column and conditionally creates a composite index; downgrade drops index/column and deletes backfill_history row.
src/fides/api/api/v1/endpoints/admin.py Backfill status endpoint now reports pending count for the new distance backfill; introduces a mid-file import that should be moved.
src/fides/api/migrations/backfill_scripts/backfill_stagedresrouceancestor_distance.py New batched backfill calculates distance via URN segment-count SQL and exposes pending-count helper.
src/fides/api/migrations/post_upgrade_backfill.py Wires the new distance backfill into the post-upgrade backfill runner; comment references an incorrect migration id.
src/fides/api/migrations/post_upgrade_index_creation.py Registers deferred concurrent index creation for (descendant_urn, ancestor_urn, distance) on stagedresourceancestor.
src/fides/api/models/detection_discovery/core.py Adds distance column, adds ORM index, and changes create_all_staged_resource_ancestor_links to require (ancestor_urn, distance) tuples; also introduces a mid-file import.
tests/ctl/api/test_admin.py Updates admin backfill status tests to include the new distance pending count.
tests/ops/migration_tests/backfill_scripts.py/test_backfill_stagedresourceancestor_distance.py Adds tests for pending-count query and distance backfill calculation; depends on ORM behavior for auto-generated IDs.
tests/ops/models/detection_discovery/test_core.py Updates ancestor-link tests to include distance and adds/retains a timing-based test using time.sleep, which is flaky.

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

Comment thread src/fides/api/models/detection_discovery/core.py
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Feb 10, 2026

Additional Comments (4)

src/fides/api/api/v1/endpoints/admin.py
Mid-file import added

from enum import StrEnum is introduced mid-file (admin.py:36) even though this module already imports from enum at the top. This violates the repo guidance to keep imports at the top and can also confuse import ordering tools; it should be moved to the existing import block and the duplicate enum import consolidated.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


src/fides/api/models/detection_discovery/core.py
Import not at top

from enum import StrEnum is added below other code (core.py:68) instead of within the top import section, which violates the repo rule against mid-file imports. This should be moved up with the other enum imports to keep module import structure consistent.

Context Used: Rule from dashboard - Python imports should always be placed at the top of the file, not near the code that uses them. (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


tests/ops/models/detection_dis/recovery/test_core.py
Timing-based test flakiness

This test uses time.sleep(0.1) to force differing timestamps (test_core.py:1094-1097). This is inherently flaky under CI load and also contradicts the repo guidance to freeze time (freezegun) rather than relying on timing/tolerances. As written, slow scheduling or clock granularity can cause intermittent failures even when the code is correct.

Context Used: Rule from dashboard - When dealing with flaky tests due to timing issues, prefer using freezegun to freeze time rather tha... (source)


src/fides/api/migrations/post_upgrade_backfill.py
Incorrect migration ID comment

The comment says the distance backfill was added in migration 7bf63ecd48c2 (post_upgrade_backfill.py:84), but the actual Alembic revision in this PR is d304f57aea6d (xx_2026_02_09_0010_d304f57aea6d_add_distance_to_stagedresourceancestor.py). This is misleading for operators trying to trace schema/backfill provenance and should be corrected.

Base automatically changed from ENG-2293 to main February 10, 2026 15:59
@vcruces vcruces force-pushed the ENG-2292 branch 2 times, most recently from 3bb7875 to 85899ae Compare February 10, 2026 17:28
Copy link
Copy Markdown
Contributor

@adamsachs adamsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this is so straightforward to review given the solid backfill framework already in place! 👏

just a couple of very small, non-blocking comments.

Comment thread src/fides/api/models/detection_discovery/core.py Outdated
@vcruces vcruces added this pull request to the merge queue Feb 11, 2026
@vcruces vcruces removed the do not merge Please don't merge yet, bad things will happen if you do label Feb 11, 2026
Merged via the queue into main with commit 2de0c6c Feb 11, 2026
53 of 54 checks passed
@vcruces vcruces deleted the ENG-2292 branch February 11, 2026 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants