Add parallel API calls via furrr/future #734

Closed
maciekbanas wants to merge 22 commits into master from mb/furrr-parallel-api-calls

Conversation

@maciekbanas
Member

@maciekbanas maciekbanas commented Mar 12, 2026

Description

API calls across repos and orgs are currently sequential: each purrr::map loop waits for one HTTP response before starting the next. For setups with many repos/orgs, total run time is dominated by network latency.

This PR adds optional parallel execution using furrr/future. When a user enables parallelism via set_parallel(), API requests run concurrently across repos/orgs. When no parallel plan is set, behavior is identical to before (falls back to purrr::map).

Related Issue(s)

Fixes #736

Key changes

  • gitstats_map(), gitstats_map2(), gitstats_map_chr() — drop-in wrappers in utils.R that check future::plan() and dispatch to either purrr::map (sequential) or furrr::future_map (parallel)
  • set_parallel(workers) — exported convenience function to configure parallelism. Accepts TRUE (auto-detect cores), an integer, or FALSE/0 to disable
  • ~35 API-bound loops replaced across GitHost.R, GitHostGitHub.R, GitHostGitLab.R, EngineGraphQL.R, EngineGraphQLGitHub.R, EngineGraphQLGitLab.R, EngineRestGitHub.R, EngineRestGitLab.R
  • Data transformation loops and setup-time validation loops left as purrr::map (no benefit from parallelism)
  • furrr and future added to Imports
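
The dispatch described in the first bullet can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: the names `is_parallel_plan()` and `gitstats_map_sketch()` are hypothetical, and testing the plan with `inherits(future::plan(), "sequential")` is an assumed way of detecting that no parallel plan is set. The `!isFALSE(.progress)` coercion follows the commit note that furrr expects a single logical.

```r
library(purrr)
library(future)
library(furrr)

# Hypothetical helper: TRUE when the active future plan is not sequential.
is_parallel_plan <- function() {
  !inherits(future::plan(), "sequential")
}

# Hypothetical wrapper: same calling convention as purrr::map().
gitstats_map_sketch <- function(.x, .f, ..., .progress = FALSE) {
  if (is_parallel_plan()) {
    furrr::future_map(
      .x, .f, ...,
      # furrr wants a single logical; a character label counts as "on".
      .progress = !isFALSE(.progress),
      .options = furrr::furrr_options(seed = NULL)
    )
  } else {
    # purrr (>= 1.0.0) accepts FALSE, TRUE, or a character label here.
    purrr::map(.x, .f, ..., .progress = .progress)
  }
}
```

Under the default sequential plan, a call such as `gitstats_map_sketch(repos, fetch_repo)` (both names hypothetical) goes through `purrr::map()`, so existing call sites would behave as before.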

Usage

library(GitStats)

# Enable parallel processing (4 workers)
set_parallel(4)

my_gitstats <- create_gitstats() |>
  set_github_host(
    token = Sys.getenv("GITHUB_PAT"),
    orgs = c("r-world-devs", "openpharma")
  )

# These now run API calls in parallel across repos
get_commits(my_gitstats, since = "2024-01-01")
get_repos(my_gitstats)

# Revert to sequential
set_parallel(FALSE)
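
For readers wondering what a helper like set_parallel() might do internally, here is a hedged sketch built on `future::plan()`. The function name `set_parallel_sketch()` is hypothetical, and the choice of `future::multisession` as the parallel backend is an assumption; the PR text only says that workers run as separate R sessions.

```r
library(future)

# Hypothetical stand-in for set_parallel(); not the PR's implementation.
set_parallel_sketch <- function(workers = TRUE) {
  if (isFALSE(workers) || identical(as.numeric(workers), 0)) {
    # FALSE or 0: go back to sequential execution.
    future::plan(future::sequential)
  } else if (isTRUE(workers)) {
    # TRUE: use as many workers as there are available cores.
    future::plan(future::multisession, workers = future::availableCores())
  } else {
    # Otherwise treat the value as an explicit worker count.
    future::plan(future::multisession, workers = as.integer(workers))
  }
  invisible(NULL)
}
```

Because the plan is global session state, calling the helper with FALSE at the end of a script restores the sequential default for any later code.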

Design decisions

  • Opt-in parallelism: Default behavior unchanged. Users must call set_parallel() to enable.
  • No <<- in parallel: Loops using <<- for side effects (e.g., get_orgs_from_orgs_and_repos in GitLab) are left sequential.
  • seed = NULL: Passed to furrr_options since these are deterministic API calls, not stochastic computations.
  • Dev mode support: get_furrr_options() detects whether GitStats is installed or loaded via devtools::load_all() and adjusts how internal functions are exported to workers.
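
The dev mode handling in the last bullet could look something like the sketch below. It is an illustration under assumptions, not the PR's code: `get_furrr_options_sketch()` is a hypothetical name, the installed check via `find.package()` with an explicit `lib.loc` is an assumed detection method, and exporting the whole namespace as a named list of globals is one possible reading of "export the namespace contents".

```r
library(furrr)

# Hypothetical variant of get_furrr_options(); the package name is a
# parameter so the sketch can be tried with any package.
get_furrr_options_sketch <- function(pkg = "GitStats") {
  # With an explicit lib.loc, find.package() searches only the library
  # paths, so a package loaded from source is not reported as installed.
  installed <- length(find.package(pkg, lib.loc = .libPaths(), quiet = TRUE)) > 0
  if (installed) {
    # Workers can attach the installed package themselves.
    furrr::furrr_options(seed = NULL, packages = pkg)
  } else {
    # Assumption: ship every namespace object to the workers as globals.
    ns_objects <- as.list(asNamespace(pkg), all.names = TRUE)
    furrr::furrr_options(seed = NULL, globals = ns_objects)
  }
}
```

Shipping the whole namespace is simple but can be costly for large packages, which is presumably why the installed path prefers `packages`.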

How to test

  1. Install with devtools::install() (requires furrr and future)
  2. Run existing tests — all should pass unchanged (sequential by default)
  3. Test parallel mode:
    set_parallel(2)
    # Run any get_* function and compare timing vs sequential
    set_parallel(FALSE)

maciekbanas and others added 10 commits March 12, 2026 10:05
Replace sequential purrr::map calls with parallel-aware gitstats_map
wrappers in all org-level and repo-level API iteration loops. When a
non-sequential future plan is active (via set_parallel()), API requests
run concurrently across repos/orgs. Falls back to purrr::map when no
parallel plan is set, so existing behavior is unchanged by default.

Changes:
- Add furrr and future to Imports
- Add gitstats_map, gitstats_map2, gitstats_map_chr helpers in utils.R
- Add set_parallel() exported function for users
- Replace purrr::map with gitstats_map in ~35 API-bound loops across
  GitHost, GitHostGitHub, GitHostGitLab, EngineGraphQL*, EngineRest*

Co-authored-by: Ona <no-reply@ona.com>

furrr::future_map requires .progress to be a single logical, but
set_progress_bar() returns a character label (e.g. 'GitHub') when
progress is enabled. Use !isFALSE(.progress) to coerce: character
or TRUE -> TRUE, FALSE -> FALSE.

Co-authored-by: Ona <no-reply@ona.com>

furrr spawns separate R sessions that don't have access to package
internals like url_decode(), show_message(), etc. Adding
packages = 'GitStats' to furrr_options ensures each worker loads
the package namespace.

Co-authored-by: Ona <no-reply@ona.com>

When GitStats is installed, use packages='GitStats' to load it on
workers. When running via devtools::load_all (package not installed),
export the namespace contents as explicit globals so workers can
access internal functions like url_decode, show_message, cli_icons.

Co-authored-by: Ona <no-reply@ona.com>
…rld-devs/gitstats into mb/furrr-parallel-api-calls

# Conflicts:
#	R/EngineRestGitLab.R
@codecov

codecov Bot commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 65.67164% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.37%. Comparing base (dea2177) to head (e6e7793).
⚠️ Report is 27 commits behind head on master.

Files with missing lines    Patch %    Lines
R/utils.R                   33.33%     12 Missing ⚠️
R/gitstats_functions.R      0.00%      11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #734      +/-   ##
==========================================
- Coverage   91.82%   91.37%   -0.45%     
==========================================
  Files          26       26              
  Lines        4600     4628      +28     
==========================================
+ Hits         4224     4229       +5     
- Misses        376      399      +23     


@maciekbanas maciekbanas self-assigned this Mar 17, 2026
maciekbanas and others added 7 commits March 17, 2026 08:55
maciekbanas added a commit that referenced this pull request Mar 18, 2026
Add parallel API calls via mirai (alternative to #734)
@maciekbanas
Member Author

maciekbanas commented Mar 18, 2026

Closing, as the mirai-based solution (#737) was merged.

@maciekbanas maciekbanas deleted the mb/furrr-parallel-api-calls branch March 27, 2026 09:09
Development

Successfully merging this pull request may close these issues.

Add parallel API calls to speed up data fetching
