fix: bundle Arrow deps to fix build on Debian 12 #171

Merged
timosachsenberg merged 1 commit into master from fix/arrow-bundle-missing-deps
Mar 5, 2026

timosachsenberg (Contributor) commented Mar 5, 2026

Summary

  • Bundle Arrow's missing dependencies (Snappy, zstd, Thrift, xsimd, RapidJSON) using Arrow's per-dependency _SOURCE=BUNDLED mechanism instead of requiring them as system packages
  • Remove libthrift-dev / thrift from CI prerequisites since Thrift is now bundled
  • Add documentation comment explaining the bundled dependency strategy

Context

Arrow from contrib does not build on Debian 12 (and likely other minimal Linux installations) because it requires Snappy, zstd, Thrift, xsimd, and RapidJSON, which are not built by the contrib system and may not be installed as system packages.

Previously, the CI worked around this by installing libthrift-dev (Ubuntu) and thrift (macOS), and relied on GitHub runners having other deps pre-installed. This is fragile and undocumented.

The fix uses Arrow's built-in <pkg>_SOURCE=BUNDLED flags to tell Arrow to download and build these dependencies itself during the contrib build. Dependencies already provided by contrib (zlib, bzip2, boost) continue to be found via CMAKE_PREFIX_PATH.

Note: This requires internet access during the contrib build for the bundled dependency downloads.
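
A minimal sketch of how the per-dependency flags could be assembled, runnable as a standalone script with `cmake -P`. The `<pkg>_SOURCE=BUNDLED` option names come from the description above; the variable names `ARROW_BUNDLED_DEP_NAMES` and `ARROW_BUNDLED_DEP_ARGS` and the script file name are illustrative assumptions, and this is not the code merged in libraries.cmake/arrow.cmake.

```cmake
# Sketch only; run with: cmake -P bundled_flags.cmake (file name is hypothetical)
cmake_minimum_required(VERSION 3.16)

# Dependencies that Arrow needs but contrib does not build
# (names taken from the PR description).
set(ARROW_BUNDLED_DEP_NAMES Snappy zstd Thrift xsimd RapidJSON)

# Turn each name into Arrow's per-dependency source selector,
# e.g. Snappy -> -DSnappy_SOURCE=BUNDLED.
set(ARROW_BUNDLED_DEP_ARGS "")
foreach(dep IN LISTS ARROW_BUNDLED_DEP_NAMES)
  list(APPEND ARROW_BUNDLED_DEP_ARGS "-D${dep}_SOURCE=BUNDLED")
endforeach()

# Print the flags that would be appended to Arrow's configure command.
foreach(arg IN LISTS ARROW_BUNDLED_DEP_ARGS)
  message(STATUS "${arg}")
endforeach()
```

Dependencies that contrib already builds (zlib, bzip2, boost) are left out of the list on purpose, so they are still resolved through CMAKE_PREFIX_PATH.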

Fixes OpenMS/OpenMS#8797

Test plan

  • Verify contrib CI passes on all platforms (Linux, macOS, Windows)
  • Test building Arrow from contrib on a clean Debian 12 system without Snappy/zstd/Thrift installed
  • Verify OpenMS builds and Parquet tests pass with the updated contrib

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Removed system package dependencies from installation workflows for Linux and macOS.
    • Updated build configuration to use bundled versions of Arrow dependencies (Snappy, zstd, Thrift, xsimd, RapidJSON) across all platforms.

…x build on Debian 12

Arrow requires Snappy, zstd, Thrift, xsimd, and RapidJSON which are not
built by the contrib system and may not be available as system packages
(e.g. on Debian 12). Use Arrow's per-dependency BUNDLED source mechanism
to have Arrow download and build these during the contrib build.

Dependencies already provided by contrib (zlib, bzip2, boost) continue
to be found via CMAKE_PREFIX_PATH.

Remove libthrift-dev/thrift from CI prerequisites since Thrift is now
bundled.

Fixes OpenMS/OpenMS#8797

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot (Contributor) commented Mar 5, 2026

📝 Walkthrough

Removed the system Thrift package dependencies from the CI workflows and configured Apache Arrow to build the dependencies that contrib does not provide (Snappy, zstd, Thrift, xsimd, RapidJSON) from bundled sources instead of relying on system installations on Linux, macOS, and Windows.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Package Removal: .github/workflows/main.yml | Removed libthrift-dev from Linux apt-get install and thrift from macOS brew install lists to eliminate system-level Thrift package dependencies. |
| Arrow Bundled Dependencies: libraries.cmake/arrow.cmake | Added CMake configuration directives to build Arrow's dependencies (Snappy, zstd, Thrift, xsimd, RapidJSON) from bundled sources across MSVC, Linux, and macOS paths with documentation explaining the approach. |

Possibly related issues

  • #8797: Arrow from contrib does not build on Debian12 — These changes directly address the root cause by configuring Arrow to use bundled Snappy, zstd, and Thrift sources instead of requiring system packages, resolving the CMake failure reported on Debian12.

Suggested reviewers

  • poshul

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: bundle Arrow deps to fix build on Debian 12' directly matches the main objective of bundling Arrow dependencies to resolve build failures on Debian 12, as confirmed by the changes and PR objectives. |
| Linked Issues check | ✅ Passed | The PR addresses all coding requirements from issue #8797: configures bundled dependencies for Snappy, zstd, Thrift, xsimd, and RapidJSON; removes libthrift-dev/thrift from CI; and documents the strategy, enabling Arrow builds on Debian 12. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to bundling Arrow dependencies and removing system package prerequisites; no unrelated modifications are present outside the stated objectives. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
libraries.cmake/arrow.cmake (1)

45-49: Centralize bundled dependency flags to avoid branch drift.

The same *_SOURCE=BUNDLED flags are duplicated in both OS branches. Defining them once and reusing the list will make future updates safer.

♻️ Suggested refactor
+  set(ARROW_BUNDLED_DEP_ARGS
+      "-DSnappy_SOURCE=BUNDLED"
+      "-Dzstd_SOURCE=BUNDLED"
+      "-DThrift_SOURCE=BUNDLED"
+      "-Dxsimd_SOURCE=BUNDLED"
+      "-DRapidJSON_SOURCE=BUNDLED")
...
-                          -D Snappy_SOURCE=BUNDLED
-                          -D zstd_SOURCE=BUNDLED
-                          -D Thrift_SOURCE=BUNDLED
-                          -D xsimd_SOURCE=BUNDLED
-                          -D RapidJSON_SOURCE=BUNDLED
+                          ${ARROW_BUNDLED_DEP_ARGS}
...
-                          "-DSnappy_SOURCE=BUNDLED"
-                          "-Dzstd_SOURCE=BUNDLED"
-                          "-DThrift_SOURCE=BUNDLED"
-                          "-Dxsimd_SOURCE=BUNDLED"
-                          "-DRapidJSON_SOURCE=BUNDLED"
+                          ${ARROW_BUNDLED_DEP_ARGS}

Also applies to: 150-154

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libraries.cmake/arrow.cmake` around lines 45 - 49, Extract the repeated -D
Snappy_SOURCE=BUNDLED, -D zstd_SOURCE=BUNDLED, -D Thrift_SOURCE=BUNDLED, -D
xsimd_SOURCE=BUNDLED and -D RapidJSON_SOURCE=BUNDLED flags into a single CMake
variable (e.g., ARROW_BUNDLED_DEPS) and use that variable in both OS branches
and the other duplicated location (the second block containing the same flags)
so the flags are defined once and reused to avoid branch drift.

📥 Commits

Reviewing files that changed from the base of the PR and between 3420d62 and f49b484.

📒 Files selected for processing (2)
  • .github/workflows/main.yml
  • libraries.cmake/arrow.cmake

timosachsenberg merged commit 52b8eca into master Mar 5, 2026
6 checks passed

jpfeuffer (Contributor) commented Mar 5, 2026

You are not specifying where Arrow should get AWSSDK and its other optional dependencies from, so the result might differ from other systems like brew or conda, which include all optional dependencies. Just so you know. Not strictly required.

timosachsenberg (Contributor, Author)

Yeah it is still not 100% there... thanks
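
One way the open point about optional dependencies could be made explicit is to pin the optional Arrow features in the same way as the bundled flags. The sketch below is an assumption, not part of this PR: the thread does not say which optional features contrib enables, and `ARROW_OPTIONAL_FEATURE_ARGS` is a hypothetical variable name. `ARROW_S3`, `ARROW_GCS`, and `AWSSDK_SOURCE` are Arrow build options; the script runs standalone with `cmake -P`.

```cmake
# Sketch only; not taken from libraries.cmake/arrow.cmake.
cmake_minimum_required(VERSION 3.16)

# Hypothetical variable: switch off the optional features explicitly so the
# contrib build does not depend on what happens to be installed on the host.
set(ARROW_OPTIONAL_FEATURE_ARGS
    "-DARROW_S3=OFF"    # S3 filesystem support is the feature that needs AWSSDK
    "-DARROW_GCS=OFF")  # Google Cloud Storage filesystem support

# Alternative if S3 support were wanted: use "-DARROW_S3=ON" instead of
# "-DARROW_S3=OFF" and add "-DAWSSDK_SOURCE=BUNDLED" so Arrow builds the SDK itself.

# Print the flags that would be appended to Arrow's configure command.
foreach(arg IN LISTS ARROW_OPTIONAL_FEATURE_ARGS)
  message(STATUS "${arg}")
endforeach()
```

Stating the choice either way keeps the contrib build reproducible, at the cost of diverging from brew or conda packages that ship with every optional feature enabled.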

Development

Successfully merging this pull request may close these issues.

Arrow from contrib does not build on Debian12
