Throw user-visible error on opening a dataset which is too large to visualize #469
Conversation
Changed the resource directory in the parent pom.xml from . (the entire module root) to src/main/resources (the standard Maven convention). The old setting caused the resources plugin to scan everything including target/, where it would choke on binary .class files. Also upgraded the resources plugin in object/pom.xml from the ancient 2.3 to 3.3.1, which has better default exclusions as an extra safety net.
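For reference, the resulting configuration looks roughly like this. This is a sketch of the conventional Maven layout described above, not the exact diff from the PR:

```xml
<build>
  <!-- Standard Maven convention: only src/main/resources is scanned for
       resources, so target/ and its binary .class files are never touched
       by the resources plugin. -->
  <resources>
    <resource>
      <directory>src/main/resources</directory>
    </resource>
  </resources>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-resources-plugin</artifactId>
      <version>3.3.1</version>
    </plugin>
  </plugins>
</build>
```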
```java
long displayRows = selectedDims[selectedIndex[0]];
long displayCols = (rank > 1) ? selectedDims[selectedIndex[1]] : 1;
if (displayRows > MAX_DISPLAY_ROWS) {
    throw new Exception("Too many rows to display (" + displayRows + " rows, limit is " +
```
This is something that should be fixed at the tableview abstraction level, not the object library Dataset abstraction level.
```java
// SWT uses int pixel coordinates; dimension * pixelsPerCell overflows int, silently breaking display.
public static final long MAX_DISPLAY_ROWS = 80_000_000L; // ~89M at 24px row height
public static final long MAX_DISPLAY_COLS = 25_000_000L; // ~26M at 80px column width
```
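To make the failure mode concrete, here is a hedged sketch of the overflow these limits guard against. The 24px row height is the assumption discussed in this thread, not a guaranteed NatTable value, and `PixelOverflowDemo` is illustrative, not HDFView code:

```java
public class PixelOverflowDemo {
    // Assumed cell size; actual NatTable defaults may differ.
    static final int ROW_HEIGHT_PX = 24;

    // Compute the total pixel extent in long so we can see what int arithmetic would lose.
    static long totalHeightPx(long rows) {
        return rows * ROW_HEIGHT_PX;
    }

    static boolean fitsInInt(long v) {
        return v <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 80M rows stays under the int pixel limit; 90M rows does not.
        System.out.println(fitsInInt(totalHeightPx(80_000_000L))); // 1,920,000,000 px: fits
        System.out.println(fitsInInt(totalHeightPx(90_000_000L))); // 2,160,000,000 px: does not fit
        // The same product done in int arithmetic silently wraps negative,
        // which is why the table renders headers but no rows instead of erroring.
        System.out.println((int) (90_000_000L * ROW_HEIGHT_PX) < 0);
    }
}
```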
Hard-coding a limit like this to check against seems way too fragile.
In what sense? The pixel values aren't exact, but at least for rows it's hard to imagine a user wanting much less than 24px of height per row. Column widths might vary more for datasets with large numeric values, but in that case the columns would be wider than expected, which makes this check useless (we'd hit the int limit on pixel size sooner than at ~25M columns) but not actively harmful.
I think making predictions here based on pixel counts will only get us into trouble; I could easily see the row and column sizes going in either direction. Based on https://github.com/eclipse-nattable/nattable/wiki/FAQ#how-much-data-can-you-put-behind-it, I think it would be better to check the number of rows and columns against 32-bit sizes and not try to throw errors for any conditions outside of that. Any remaining problems are a downstream issue that we must accept, possibly updating to a newer version of NatTable if one is available or filing a bug.
If I understand you correctly, a direct row/col check against the 32-bit limit would leave a pretty large range of sizes where HDFView still fails to display with no error message (between 25M-2.1B columns or 80M-2.1B rows). Maybe it's not truly HDFView's responsibility to worry about this, but it's a case real users have run into and it's fairly easy to check and give a useful error. On top of that, it doesn't seem like NatTable is likely to change this in the short-to-medium term (and if it were, it would take time to propagate the version update into HDFView anyway).
> If I understand you correctly, a direct row/col check against the 32-bit limit would leave a pretty large range of sizes where HDFView still fails to display with no error message (between 25M-2.1B columns or 80M-2.1B rows).
Correct, but I would treat this as a downstream problem to be reported, as there is no good way we're going to be able to predict based off of pixel sizes.
> On top of that, it doesn't seem like NatTable is likely to change this in the short-to-medium term (and if it were, it would take time to propagate the version update into HDFView anyway).
Maybe, but NatTable still seems to be under active development (https://github.com/eclipse-nattable/nattable), so we could easily (excepting source changes that may need to be made) update to a newer version and try this again. Or at least report an issue.
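A minimal sketch of the reviewer's suggested alternative: check raw row/column counts against the 32-bit limit only, and treat any failure below that threshold as a downstream NatTable/SWT issue. `DimensionCheck` and `checkDisplayable` are illustrative names, not HDFView's actual API:

```java
public class DimensionCheck {
    /**
     * Reject only dimensions that cannot be represented as an int at all;
     * anything smaller is passed through, and any rendering failure there
     * is treated as a downstream NatTable/SWT problem to be reported.
     */
    static void checkDisplayable(long rows, long cols) {
        if (rows > Integer.MAX_VALUE || cols > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Dataset too large to display: " + rows + " x " + cols +
                " exceeds 32-bit table index limits");
        }
    }

    public static void main(String[] args) {
        checkDisplayable(80_000_000L, 1L); // passes: both fit in an int
        try {
            checkDisplayable(3_000_000_000L, 1L); // 3B rows cannot be an int index
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The trade-off debated above is visible here: this version never rejects a 90M-row dataset, so the silent-failure range between the pixel overflow point and `Integer.MAX_VALUE` remains.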
When a dataset dimension is large enough along one axis that the number of elements times the number of pixels per cell overflows a Java int (~2.1B limit), NatTable/SWT breaks: the table window opens with headers but no data rows, and no error is thrown. This can be reproduced with a dataset of ~90M rows (× 24 pixels vertically ≈ 2.1B) or ~26M columns (× 80 pixels horizontally ≈ 2.1B).
Since the error exists in an underlying library and outside of our power to easily fix, I've added checks to detect this case and throw a user-visible error instead of silently failing.
Resolves #466
Important

Adds checks in `Dataset.java` to prevent overflow errors when visualizing large datasets, and updates Maven plugin versions.
- `getData()` in `Dataset.java` now throws an exception if dataset dimensions exceed `MAX_DISPLAY_ROWS` or `MAX_DISPLAY_COLS`.
- Adds `MAX_DISPLAY_ROWS` and `MAX_DISPLAY_COLS` to `Dataset.java` to limit dataset dimensions for display.
- Updates the `maven-resources-plugin` version to `3.3.1` in `object/pom.xml` and `pom.xml`.
- Changes the resource directory in the parent `pom.xml` to `src/main/resources`.