
Throw user-visible error on opening a dataset which is too large to visualize#469

Open
mattjala wants to merge 4 commits into master from large_dset_error

Conversation

@mattjala (Collaborator) commented Feb 20, 2026

When a dataset is large enough along one axis that the number of elements times the number of pixels per cell overflows a Java int (~2.1B limit), NatTable/SWT breaks: the table window opens with headers but no data rows, and no error is thrown. This can be reproduced with a dataset of ~340M rows (at 24 px per row, the vertical extent exceeds the ~2.1B pixel limit) or ~26M columns (~26M columns * 80 px per column =~ 2.1B).

Since the bug lives in an underlying library and is outside our power to easily fix, I've added checks that detect this case and throw a user-visible error instead of failing silently.
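The arithmetic behind the silent failure can be illustrated in isolation (a standalone sketch; the 24 px row height is the default cell size assumed in this thread, not a value read from HDFView):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        long rows = 340_000_000L; // ~340M rows, as in the report above
        int rowHeightPx = 24;     // assumed per-row pixel height

        // 32-bit arithmetic wraps past Integer.MAX_VALUE (~2.147B)
        // with no exception, which is why the table fails silently.
        int wrapped = (int) rows * rowHeightPx;

        // 64-bit arithmetic shows the true vertical extent in pixels.
        long actual = rows * (long) rowHeightPx;

        System.out.println("wrapped = " + wrapped); // negative (wrapped around)
        System.out.println("actual  = " + actual);  // 8160000000
        System.out.println("fits    = " + (actual <= Integer.MAX_VALUE)); // false
    }
}
```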

Resolves #466


Important

Adds checks in Dataset.java to prevent overflow errors when visualizing large datasets, and updates Maven plugin versions.

  • Behavior:
    • Adds checks in getData() in Dataset.java to throw an exception if dataset dimensions exceed MAX_DISPLAY_ROWS or MAX_DISPLAY_COLS.
    • Displays user-visible error when dataset is too large to visualize.
  • Constants:
    • Defines MAX_DISPLAY_ROWS and MAX_DISPLAY_COLS in Dataset.java to limit dataset dimensions for display.
  • Build:
    • Updates maven-resources-plugin version to 3.3.1 in object/pom.xml and pom.xml.
    • Changes resource directory path in pom.xml.

This description was created by Ellipsis for 8e22c09.

Changed the resource directory in the parent pom.xml from . (the entire module root) to src/main/resources (the standard Maven convention). The old setting caused the resources plugin to scan everything, including target/, where it would choke on binary .class files. Also upgraded the resources plugin in object/pom.xml from the ancient 2.3 to 3.3.1, which has better default exclusions as an extra safety net.
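The resource-directory change described above would look roughly like this in the parent pom.xml (a sketch; the surrounding build configuration in HDFView's actual pom.xml is assumed):

```xml
<build>
  <resources>
    <resource>
      <!-- was <directory>.</directory>, which made the resources plugin scan
           the whole module root, including target/ and its binary .class files -->
      <directory>src/main/resources</directory>
    </resource>
  </resources>
</build>
```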
long displayRows = selectedDims[selectedIndex[0]];
long displayCols = (rank > 1) ? selectedDims[selectedIndex[1]] : 1;
if (displayRows > MAX_DISPLAY_ROWS) {
    throw new Exception("Too many rows to display (" + displayRows + " rows, limit is " + MAX_DISPLAY_ROWS + ")");
}
Collaborator:

This is something that should be fixed at the tableview abstraction level, not the object library Dataset abstraction level.
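A check at the table/view layer could use the actual configured cell sizes rather than guessed ones. A minimal sketch, assuming the view can supply its row height and column width in pixels (`fitsInIntPixels` and its parameters are hypothetical, not existing HDFView API):

```java
public class PixelFit {
    // SWT/NatTable compute pixel coordinates in 32-bit ints, so the total
    // extent along each axis must stay at or below Integer.MAX_VALUE.
    // (Assumes the 64-bit products themselves cannot overflow, which holds
    // for any realistic dimension and cell size.)
    public static boolean fitsInIntPixels(long rows, long cols,
                                          int rowHeightPx, int colWidthPx) {
        return rows * (long) rowHeightPx <= Integer.MAX_VALUE
            && cols * (long) colWidthPx <= Integer.MAX_VALUE;
    }
}
```

With the defaults discussed in this thread (24 px rows, 80 px columns), this rejects the ~340M-row case while allowing anything the widget can actually draw.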


// SWT uses int pixel coordinates; dimension * pixelsPerCell overflows int, silently breaking display.
public static final long MAX_DISPLAY_ROWS = 80_000_000L; // headroom below the ~89M-row overflow point at 24px row height
public static final long MAX_DISPLAY_COLS = 25_000_000L; // just below the ~26M-column overflow point at 80px column width
Collaborator:

Hard-coding a limit to check against seems far too fragile.

@mattjala (Author) commented Feb 20, 2026

In what sense? The pixel values aren't exact, but at least for rows it's hard to imagine a user wanting much less than 24 px of height per row. Column widths might vary more for datasets with large numeric values, but in that case the columns would be wider than expected, which makes this check useless (the int limit on pixel size is hit well before ~25M columns) rather than actively harmful.

Collaborator:

I think making predictions here based on pixel count will only get us into trouble; I could easily see the row and column sizes going in either direction. Based on https://github.com/eclipse-nattable/nattable/wiki/FAQ#how-much-data-can-you-put-behind-it, I think it would be better to simply check the number of rows and columns against 32-bit sizes, and not try to throw errors for any conditions beyond that. Any remaining problems are a downstream issue that we must accept, possibly updating to a newer version of NatTable if available or filing a bug.
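The simpler check suggested here would reject only what NatTable structurally cannot hold. A sketch, under that interpretation (`checkDisplayable` is a hypothetical helper, not code from the PR):

```java
public class DisplaySizeCheck {
    // Reject only row/column counts that cannot fit in a 32-bit int at all;
    // per the NatTable FAQ, anything below that is the library's problem.
    public static void checkDisplayable(long displayRows, long displayCols) throws Exception {
        if (displayRows > Integer.MAX_VALUE) {
            throw new Exception("Too many rows to display (" + displayRows
                    + " rows; limit is " + Integer.MAX_VALUE + ")");
        }
        if (displayCols > Integer.MAX_VALUE) {
            throw new Exception("Too many columns to display (" + displayCols
                    + " columns; limit is " + Integer.MAX_VALUE + ")");
        }
    }
}
```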

@mattjala (Author) commented Feb 20, 2026

If I understand you correctly, a direct row/col check against the 32-bit limit would leave a pretty large range of sizes where HDFView still fails to display with no error message (between 25M-2.1B columns or 80M-2.1B rows). Maybe it's not truly HDFView's responsibility to worry about this, but it's a case real users have run into and it's fairly easy to check and give a useful error. On top of that, it doesn't seem like NatTable is likely to change this in the short-to-medium term (and if it were, it would take time to propagate the version update into HDFView anyway).

Collaborator:

If I understand you correctly, a direct row/col check against the 32-bit limit would leave a pretty large range of sizes where HDFView still fails to display with no error message (between 25M-2.1B columns or 80M-2.1B rows).

Correct, but I would treat this as a downstream problem to be reported, as there is no good way we're going to be able to predict based off of pixel sizes.

On top of that, it doesn't seem like NatTable is likely to change this in the short-to-medium term (and if it were, it would take time to propagate the version update into HDFView anyway).

Maybe, but NatTable still seems to be under active development (https://github.com/eclipse-nattable/nattable), so we could easily (excepting source changes that may need to be made) update to a newer version and try this again. Or at least report an issue.

@github-actions (Contributor) commented:

🔍 Quality Analysis Report

Metric          Status      Value
Code Coverage   Unknown     -
Quality Gate    ✅ Passed   0 issues

Details

  • Coverage Threshold: 60% minimum (non-blocking - JaCoCo integration issue)
  • PMD Violations: Maximum 4000 allowed (progressive reduction target)
  • Checkstyle: No errors allowed


Development

Successfully merging this pull request may close these issues.

Problem Opening Very Long Table
