Skip to content

HDDS-14517. [Recon] Include all storage report fields in CSV report for Capacity Distribution#9681

Open
priyeshkaratha wants to merge 2 commits intoapache:masterfrom
priyeshkaratha:HDDS-14517
Open

HDDS-14517. [Recon] Include all storage report fields in CSV report for Capacity Distribution#9681
priyeshkaratha wants to merge 2 commits intoapache:masterfrom
priyeshkaratha:HDDS-14517

Conversation

@priyeshkaratha
Copy link
Contributor

What changes were proposed in this pull request?

This pull request simplifies how datanode storage metrics are downloaded. Instead of having multiple endpoints, the download logic is now handled by a single improved endpoint. The old endpoint that only supported pending deletion data has been removed. The new endpoint generates a detailed CSV with all relevant storage metrics, giving a more complete view of datanode health and usage.

  • Refactored Datanode Metrics Download: The dedicated /download endpoint for pending deletion metrics in PendingDeletionEndpoint has been removed.
  • Consolidated Storage Report: A new /download endpoint has been introduced in StorageDistributionEndpoint to provide a comprehensive CSV report.
  • Expanded Report Fields: The new CSV report now includes all datanode storage fields such as Capacity, Used Space, Remaining Space, Committed Space, Reserved Space, Minimum Free Space, and Pending Block Size.
  • Frontend URL Update: The frontend application (capacity.tsx) has been updated to use the new consolidated download URL.
  • Updated Test Coverage: Corresponding unit tests have been removed from TestPendingDeletionEndpoint and new, comprehensive tests have been added in TestStorageDistributionEndpoint for the new download functionality.

What is the link to the Apache JIRA

HDDS-14517

How was this patch tested?

Tested using modified testcases.

@priyeshkaratha priyeshkaratha changed the title HDDS-14517. Include all storage report fields in downloaded CSV report for Capacity Distribution HDDS-14517. [Recon] Include all storage report fields in downloaded CSV report for Capacity Distribution Jan 28, 2026
@priyeshkaratha
Copy link
Contributor Author

Hi @devabhishekpal @devmadhuu @ArafatKhan2198 can you please review the changes?

@adoroszlai adoroszlai changed the title HDDS-14517. [Recon] Include all storage report fields in downloaded CSV report for Capacity Distribution HDDS-14517. [Recon] Include all storage report fields in CSV report for Capacity Distribution Jan 28, 2026
@devabhishekpal devabhishekpal requested review from ArafatKhan2198, devabhishekpal and devmadhuu and removed request for ArafatKhan2198 January 30, 2026 08:26
Copy link
Contributor

@devabhishekpal devabhishekpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good to me, but it also needs frontend change.
The download button currently calls the pendingDeletion/download API endpoint.
However this changes it to storageDistribution/download so the download button won't work

@priyeshkaratha
Copy link
Contributor Author

This change looks good to me, but it also needs frontend change. The download button currently calls the pendingDeletion/download API endpoint. However this changes it to storageDistribution/download so the download button won't work

Thanks @devabhishekpal for the review. calling storageDistribution/download is already part of the PR. Did I miss anything specific?

Copy link
Contributor

@devabhishekpal devabhishekpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch, lgtm @priyeshkaratha

@priyeshkaratha priyeshkaratha marked this pull request as ready for review February 8, 2026 05:55
Copy link
Contributor

@devmadhuu devmadhuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @priyeshkaratha . Some comments, pls check.

}

@GET
@Path("/download")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are adding new APIs, ReadMe also to be updated along with Swagger API doc
cc: @devabhishekpal

}

Map<String, DatanodeStorageReport> reportByUuid =
collectDatanodeReports().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. collectDatanodeReports() calls nodeManager.getAllNodes() which iterates all datanodes
  2. For each datanode, it calls nodeManager.getNodeStat() and nodeManager.getTotalFilesystemUsage()
  3. This happens synchronously during the HTTP request

For 1000+ datanodes, this could be a bottleneck, can we use already-collected metrics from DataNodeMetricsService, because chances are that user may click to download csv as soon as he see the data in table and if this is not feasible, then can we add some kind of timeout protection ?

for (DatanodePendingDeletionMetrics metric : pendingDeletionMetrics) {
DatanodeStorageReport report = reportByUuid.get(metric.getDatanodeUuid());
if (report == null) {
continue; // skip if report is missing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a datanode has pending deletion metrics but no storage report, the row is silently dropped from the CSV. This could mask operational issues.
Scenarios Where This Fails:

  1. Datanode just registered but hasn't sent full report yet
  2. Datanode in STALE state
  3. Race condition between metrics collection and report generation
    Impact: Operators lose visibility into datanodes with pending deletions, which is exactly what they're trying to monitor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants