feat: Added describe-parquet command to print info of parquet files #65

Merged: subkanthi merged 10 commits into master from 59-provide-convenient-introspection-of-parquet-files on Feb 11, 2026

@subkanthi
Collaborator

… files.

@subkanthi subkanthi linked an issue Oct 15, 2025 that may be closed by this pull request
@subkanthi
Collaborator Author

Testing:
local file

 describe-parquet iris.parquet -a
2025-10-28 19:00:00 [2119751-100] INFO c.a.i.r.c.i.r.RESTCatalogServlet > @token:anonymous GET v1/config
---
summary:
  rows: 150
  rowGroups: 1
  compressedSize: 1885
  uncompressedSize: 0
  createdBy: "DuckDB"
  columnCount: 5
columns:
- name: "sepal.length"
  type: "DOUBLE"
  repetition: "OPTIONAL"
- name: "sepal.width"
  type: "DOUBLE"
  repetition: "OPTIONAL"
- name: "petal.length"
  type: "DOUBLE"
  repetition: "OPTIONAL"
- name: "petal.width"
  type: "DOUBLE"
  repetition: "OPTIONAL"
- name: "variety"
  type: "BINARY"
  repetition: "OPTIONAL"
  logicalType: "STRING"
rowGroups:

@subkanthi
Collaborator Author

remote file

describe-parquet   https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet
2025-11-06 20:34:28 [12843412-49] INFO c.a.i.r.c.i.r.RESTCatalogServlet > @token:anonymous GET v1/config
---
summary:
  rows: 3475226
  rowGroups: 4
  compressedSize: 59138125
  uncompressedSize: 92602414
  createdBy: "parquet-cpp-arrow version 16.1.0"
  columnCount: 20

@subkanthi
Collaborator Author

 describe-parquet -a   s3://aws-public-blockchain/v1.0/btc/transactions/date=2025-01-01/part-00000-33e8d075-2099-409b-a806-68dd17217d39-c000.snappy.parquet
2025-11-06 20:41:15 [12843412-78] INFO c.a.i.r.c.i.r.RESTCatalogServlet > @token:anonymous GET v1/config
---
summary:
  rows: 292213
  rowGroups: 5
  compressedSize: 539840873
  uncompressedSize: 717049859
  createdBy: "parquet-mr version 1.10.1 (build 65f31597b18a0f2718a129fd2d69af0168952c55)"
  columnCount: 19
columns:
- name: "hash"
  type: "BINARY"
  repetition: "OPTIONAL"
  logicalType: "STRING"
- name: "version"
  type: "INT64"
  repetition: "OPTIONAL"
- name: "size"

@subkanthi subkanthi marked this pull request as ready for review November 7, 2025 02:42
@subkanthi subkanthi requested a review from shyiko November 7, 2025 02:42
@subkanthi subkanthi requested a review from xieandrew January 16, 2026 18:48
Collaborator

@xieandrew xieandrew left a comment

Tested the commands and everything seems to work. Just a few comments.

@subkanthi subkanthi requested a review from xieandrew February 10, 2026 19:23
Collaborator

@xieandrew xieandrew left a comment

Looks good overall, just a couple of comments about imports.

throws IOException {
setAWSRegion(s3Region);
try (RESTCatalog catalog = loadCatalog()) {
var options = new java.util.ArrayList<DescribeParquet.Option>();
Collaborator

This can just be new ArrayList (already imported)

Collaborator Author

done
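
For readers following the thread, the suggestion above amounts to importing `java.util.ArrayList` once and using the simple name. A minimal sketch, assuming a hypothetical `Option` enum and hypothetical boolean flags in place of the real CLI options (neither is taken from this PR):

```java
import java.util.ArrayList;
import java.util.List;

public class OptionListSketch {
    // Hypothetical stand-in for DescribeParquet.Option; the real constants are not shown in this thread.
    enum Option { ALL, SUMMARY, COLUMNS }

    // Builds the option list from hypothetical boolean CLI flags.
    static List<Option> collect(boolean all, boolean summary, boolean columns) {
        // With java.util.ArrayList imported, the simple name is enough.
        var options = new ArrayList<Option>();
        if (all) options.add(Option.ALL);
        if (summary) options.add(Option.SUMMARY);
        if (columns) options.add(Option.COLUMNS);
        return options;
    }

    public static void main(String[] args) {
        System.out.println(collect(true, false, false)); // prints [ALL]
    }
}
```

The behavior is identical to the fully qualified form; only readability changes.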

}

private static ParquetInfo extractParquetInfo(ParquetMetadata metadata, Option... options) {
var optionsSet = java.util.Set.of(options);
Collaborator

This should be imported as Set

Collaborator Author

done
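
As an illustration of the reviewed line with the import applied, here is a minimal sketch. The `Option` enum and both helper methods are hypothetical and not taken from this PR. It also shows a general property of `Set.of`: it throws `IllegalArgumentException` when given duplicate elements, so a caller that might pass the same option twice could use an `EnumSet` instead. Whether that matters for `extractParquetInfo` depends on how its callers build the options, which this thread does not show.

```java
import java.util.EnumSet;
import java.util.Set;

public class OptionSetSketch {
    // Hypothetical stand-in for the Option type used by extractParquetInfo.
    enum Option { ALL, SUMMARY, COLUMNS }

    // Mirrors the reviewed line, using the imported simple name Set.
    static Set<Option> strict(Option... options) {
        return Set.of(options); // immutable; rejects duplicates and nulls
    }

    // Hypothetical alternative that silently collapses duplicates.
    static Set<Option> lenient(Option... options) {
        var set = EnumSet.noneOf(Option.class);
        for (Option o : options) {
            set.add(o);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(strict(Option.SUMMARY).contains(Option.SUMMARY)); // prints true
        System.out.println(lenient(Option.ALL, Option.ALL)); // prints [ALL]
        try {
            strict(Option.ALL, Option.ALL);
        } catch (IllegalArgumentException e) {
            System.out.println("duplicate rejected");
        }
    }
}
```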

@xieandrew xieandrew changed the title from "Added functionality similar to parquet-tools to print info of parquet…" to "feat: Added describe-parquet command to print info of parquet files" Feb 11, 2026
@subkanthi subkanthi requested a review from xieandrew February 11, 2026 19:49
Collaborator

@xieandrew xieandrew left a comment

LGTM

@xieandrew xieandrew added the "ice" label (Relates to ice) Feb 11, 2026
@subkanthi subkanthi merged commit 0e83163 into master Feb 11, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

Provide convenient introspection of Parquet files
