refactor: bioconductor submission by mtran-code · Pull Request #80 · bhklab/AnnotationGx

mtran-code · 2026-02-17T16:51:47Z

Cleanup package and address issues brought up by BiocCheck::BiocCheck('new-package'=TRUE) to prepare for submission to Bioconductor.

Summary by CodeRabbit

New Features
- Improved API input validation and safer request/error flows; added interactive examples across docs.
Bug Fixes
- More robust query construction, response handling and retry/throttling behavior for external services.
Documentation
- Expanded examples in vignettes and man pages; updated authorship and bug report metadata.
Chores
- Bumped package to 0.99.1, raised R minimum to (>= 4.5.0), added several runtime packages, refreshed CI workflow and applied broad code-style cleanups.

coderabbitai · 2026-02-17T16:52:14Z

📝 Walkthrough

Walkthrough

This PR bumps the package to 0.99.1, updates metadata (authors, R requirement, imports, BugReports, biocViews), removes a CI workflow and adds a Bioconductor-focused CI workflow, standardizes formatting (including vapply replacements), tightens PubChem request/error flows, normalizes boolean literals, and updates documentation, vignettes, tests, and data-raw scripts.

Changes

Cohort / File(s)	Summary
Package metadata & ignores `DESCRIPTION`, `.Rbuildignore`, `.gitignore`	Version bump to 0.99.1; maintainer role moved to Michael Tran; R >= 4.5.0; new Imports (httr2, data.table, readr, xml2, methods, memoise); BugReports URL added; LazyData removed; biocViews added; minor whitespace/newline fixes in ignore files.
CI workflows `.github/workflows/R-CMD-check.yaml`, `.github/workflows/check-bioc.yml`	Removed legacy multi-job CI; added new Bioconductor-aware workflow that builds a dynamic matrix, supports per-OS/R/bioc checks, and optional Docker build/push steps.
Core API & query logic `R/biomart_core.R`, `R/pubchem_rest.R`, `R/pubchem_view.R`, `R/pubchem_status.R`	Extensive internal changes: sapply → vapply for type-safe operations, query construction reflowed, PubChem flows restructured with explicit query-only early returns, tryCatch-based error handling, throttling header capture and retry logic, and more explicit response validation. Public signatures unchanged but internal control flow updated.
Utilities & helpers `R/utils-*.R`, `R/utils-matchNested.R`, `R/utils-split.R`, `R/utils-standardize_names.R`	Formatting and some input validation added (.strSplitFinite), assertion boolean style standardized, nested matching reflowed; core behavior preserved.
Formatting-only / stylistic `R/chembl.R`, `R/oncotree.R`, `R/cellosaurus*.R`, `R/pubchem_view_helpers.R`, `R/unichem_helpers.R`, many tests and data-raw files	Signature reflowing, indentation, whitespace normalization, explicit TRUE/FALSE replacements, and other non-functional style changes across many files and tests.
Documentation & vignettes `man/.Rd`, `R/AnnotationGx-data.R`, `vignettes/.Rmd`, `vignettes/articles/*`	Roxygen cleanup, added interactive examples in multiple Rd files, renamed knitr chunk labels, expanded/explained vignette text, added sessionInfo() chunks; no API changes.
Tests & examples `tests/testthat/`, `inst/extdata/`, `data-raw/*`	Mostly formatting updates and boolean literal normalization in tests and example/data-raw scripts; a few added example blocks for documentation and small test formatting tweaks.
Styling for pkgdown `pkgdown/extra.css`	Minor CSS tweak: h3 font-weight set to bold (replacing font-size: medium).

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Client
    participant PubFn as getPubchemCompound
    participant Builder as .build_pubchem_rest_query
    participant HTTP as HTTP Handler
    participant Err as Error Handler

    Caller->>PubFn: call with id/domain/namespace (query_only?)
    PubFn->>Builder: build query list (named fields)
    Builder-->>PubFn: return query list

    alt query_only = TRUE
        PubFn-->>Caller: return query list
    else
        PubFn->>HTTP: execute requests (tryCatch)
        alt request success
            HTTP-->>PubFn: responses
            PubFn->>PubFn: detect failures via vapply, parse responses
            PubFn-->>Caller: return parsed results
        else request failure
            HTTP->>Err: raise/format error
            Err-->>PubFn: error details
            PubFn-->>Caller: return error state / message
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: include_query and name mapping in cellosaurus #66 — touches Cellosaurus mapping code; likely related to formatting/behavior in R/cellosaurus* files.

Poem

🐰 A hop, a patch, version 0.99.1,
vapply lined up, tidy and done,
TRUEs in place, errors caught with care,
Vignettes polished, CI found a new lair.
Thump—this rabbit applauds the tidy run!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'refactor: bioconductor submission' accurately summarizes the main objective of the changeset - preparing the package for Bioconductor submission.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch refactor/bioconductor-submission

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)

R/pubchem_status.R (2)
29-38: ⚠️ Potential issue | 🟠 Major

Unbounded retry loop can hang indefinitely.

The comment on line 26 acknowledges that X-Throttling-Control may be absent from a 200 response, yet this is exactly the condition that makes the while (is.null(message)) loop infinite — a successful response without the header will keep looping forever. Similarly, persistent non-200 responses cause an infinite loop with only a 1-second sleep.

Consider adding a maximum retry count and/or exponential backoff.
Proposed fix
   message <- NULL
+  max_retries <- 5
+  attempt <- 0
 
-  while (is.null(message)) {
+  while (is.null(message) && attempt < max_retries) {
+    attempt <- attempt + 1
     response <- httr2::req_perform(request)
 
     if (httr2::resp_status(response) == 200) {
       message <- response$headers[["X-Throttling-Control"]]
     } else {
       .warn("Request failed. Retrying...")
-      Sys.sleep(1)
+      Sys.sleep(min(2^attempt, 30))
     }
   }
+  if (is.null(message)) {
+    .warn(funContext, " Could not retrieve throttling status after ", max_retries, " attempts.")
+    return(invisible(NULL))
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/pubchem_status.R` around lines 29 - 38, The retry loop in pubchem_status.R
using while(is.null(message)) can hang indefinitely if a 200 response lacks the
X-Throttling-Control header or if responses remain non-200; modify the loop
around req_perform to use a bounded retry strategy (e.g., a max_retries counter)
and exponential backoff (e.g., sleep_time doubling each retry) and break with an
error or return NA after retries exhausted; ensure you check
httr2::resp_status(response) and presence of
response$headers[["X-Throttling-Control"]] each iteration, increment retry
counters in both success-without-header and failure branches, and surface a
clear failure via .warn/.stop or a returned status when max_retries reached.
39-43: ⚠️ Potential issue | 🟡 Minor

Implicit NULL return when returnMessage = FALSE.

The @return documentation states it returns "the status code," but when returnMessage is FALSE the function falls off the end and returns NULL invisibly. If this is intentional, consider return(invisible(httr2::resp_status(response))) or updating the docs to clarify.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/pubchem_status.R` around lines 39 - 43, The function currently only returns
parsed_info when returnMessage is TRUE, causing an implicit NULL when FALSE;
either make the function explicitly return the HTTP status (and keep it
invisible) or update the docs. Fix by adding an explicit return for the
non-message branch — for example return(invisible(httr2::resp_status(response)))
— or change the roxygen `@return` to state that NULL is returned; update
references in this function around parsed_info, .checkThrottlingStatus2,
returnMessage, and the httr2::resp_status(response) call accordingly.
man/AnnotationGx-package.Rd (1)
19-25: ⚠️ Potential issue | 🟡 Minor

Regenerate man/ files after DESCRIPTION maintainer change.

The DESCRIPTION file correctly identifies Michael Tran as maintainer (cre role), but man/AnnotationGx-package.Rd line 19 still lists Jermiah Joseph as \strong{Maintainer}. Since this .Rd file is auto-generated by roxygen2, it must be regenerated to reflect the current DESCRIPTION. Run roxygen2::roxygenise() to update the documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@man/AnnotationGx-package.Rd` around lines 19 - 25, The man page
man/AnnotationGx-package.Rd still shows "\strong{Maintainer}: Jermiah Joseph"
even though DESCRIPTION now lists Michael Tran as the maintainer (cre role);
regenerate the roxygen2 documentation so the .Rd files reflect DESCRIPTION by
running roxygen2::roxygenise() (or the project’s documented build step) to
update man/AnnotationGx-package.Rd and replace the old \strong{Maintainer} entry
with the current maintainer from DESCRIPTION.
data-raw/sample_pipeline.Rmd (1)
181-184: ⚠️ Potential issue | 🟡 Minor

Pre-existing syntax error on Line 184.

"TRASH/CTRP_result.csv"****) has trailing **** that will cause a parse error. This wasn't introduced by this PR, but since the file is being touched it's worth fixing.
-data.table::fread("TRASH/CTRP_result.csv"****)
+data.table::fread("TRASH/CTRP_result.csv")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data-raw/sample_pipeline.Rmd` around lines 181 - 184, The CTRP_result fread
call contains stray characters and an extra token causing a syntax error; update
the CTRP_result line (matching the other imports like GDSC_result, gCSI_result,
CCLE_result) to call data.table::fread with a single string argument
"TRASH/CTRP_result.csv" and remove the trailing **** so the parentheses are
balanced (i.e., mirror the other lines such as GDSC_result and CCLE_result).
DESCRIPTION (1)
22-30: ⚠️ Potential issue | 🟠 Major

parallel package is used but not declared in Imports.

parallel::mclapply is used in multiple files (R/pubchem_view.R:158, R/cellosaurus_annotations.R:38 & 61, R/cellosaurus.R:143 & 174), but parallel is not listed in Imports or Depends. BiocCheck will flag this — all packages accessed via :: must be declared in DESCRIPTION, including base packages.
Proposed fix
 Imports: 
     checkmate,
     crayon,
     httr2,
     data.table,
+    parallel,
     readr,
     xml2,
     methods,
     memoise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESCRIPTION` around lines 22 - 30, The package uses parallel::mclapply in
multiple places (e.g., R/pubchem_view.R, R/cellosaurus_annotations.R,
R/cellosaurus.R), but DESCRIPTION does not declare parallel in Imports; add
"parallel" to the Imports field of DESCRIPTION so BiocCheck/R CMD check
recognizes the dependency, and then re-run checks; you can continue using
parallel::mclapply calls (no NAMESPACE change required) or alternatively add an
importFrom(parallel, mclapply) if you prefer explicit NAMESPACE import.
tests/testthat/test_pubchem_view.R (1)
78-87: ⚠️ Potential issue | 🟡 Minor

fake_parser is referenced before it's defined — test passes for the wrong reason.

On line 81, fake_parser is passed to annotatePubchemCompound inside expect_error(), but fake_parser isn't defined until line 84. The test passes because R raises object 'fake_parser' not found, not because the function validates the heading. This pre-dates the current PR but was touched during reformatting.
Suggested fix: move definition before the assertion
+  fake_parser <- function(x) {
+    return(data.table::data.table(Heading = "CAS", Value = "fake_value"))
+  }
+
   expect_error(annotatePubchemCompound(
     CID,
     heading = "fake_placeholder",
     parse_function = fake_parser
   ))
 
-  fake_parser <- function(x) {
-    return(data.table::data.table(Heading = "CAS", Value = "fake_value"))
-  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/testthat/test_pubchem_view.R` around lines 78 - 87, The test calls
annotatePubchemCompound(..., parse_function = fake_parser) before fake_parser is
defined, causing the test to pass for the wrong reason (missing object) instead
of validating heading handling; move the fake_parser definition above both
expect_error(...) and the subsequent successful call so that expect_error
invokes annotatePubchemCompound with a real parse_function, ensuring it fails
due to the invalid heading rather than an undefined symbol (update the
fake_parser function used in the test that returns
data.table::data.table(Heading="CAS", Value="fake_value")).

🧹 Nitpick comments (9)

R/utils-split.R (2)
67-70: Inconsistent checkmate function naming convention.

Lines 67–68 use camelCase (assertString, assertFlag), while lines 69–70 use snake_case (assert_integerish, assert_character). Similarly, .strSplitInfinite on line 137 uses assertCharacter (camelCase). checkmate supports both styles, but picking one consistently improves readability.
♻️ Suggested: unify to camelCase
   checkmate::assertString(split)
   checkmate::assertFlag(fixed)
-  checkmate::assert_integerish(n, lower = 2L, upper = Inf)
-  checkmate::assert_character(x)
+  checkmate::assertIntegerish(n, lower = 2L, upper = Inf)
+  checkmate::assertCharacter(x)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils-split.R` around lines 67 - 70, The checkmate assertions use mixed
naming styles; standardize them to camelCase for consistency: replace
assert_integerish and assert_character with assertIntegerish and assertCharacter
in the function that contains checkmate::assertString and checkmate::assertFlag,
and also change the assertCharacter call in .strSplitInfinite to the camelCase
variant if it's not already; update all checkmate::... calls in R/utils-split.R
to use the camelCase names (assertString, assertFlag, assertIntegerish,
assertCharacter) so the file uses a single naming convention.
161-165: Remove or restore the commented-out assertions.

This block has been commented out. If the old DFrame/CharacterList/isString checks are no longer needed (replaced by other validation), consider removing the dead code entirely. If they're still intended, replace them with checkmate equivalents consistent with the rest of the file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils-split.R` around lines 161 - 165, Either remove the dead commented
assertions or restore them using checkmate to match the rest of the file:
replace the commented assert(...) block with checkmate::assert_class(object,
"DFrame"), checkmate::assert_class(object[[colName]], "CharacterList") (or
another appropriate checkmate assertion if CharacterList is represented
differently in this codebase), and checkmate::assert_string(split); ensure you
import/namespace checkmate consistently with other checks in the file.
R/cellosaurus.R (1)
576-582: sapply on line 579 was not converted to vapply.

The PR commits mention "prefer vapply over sapply" as a targeted fix, but this sapply call in .formatComments was missed. sapply can return unpredictable types (list vs. matrix vs. vector) depending on input, and BiocCheck flags this.

Note that strsplit always returns a list, so sapply(test_, strsplit, ...) will return a list here anyway—but for consistency with the stated PR goal and to satisfy BiocCheck, consider switching to lapply (since the result is a list of character vectors, vapply wouldn't directly apply).
Suggested fix
-  test_ <- sapply(test_, strsplit, split = "; ")
+  test_ <- lapply(test_, strsplit, split = "; ")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/cellosaurus.R` around lines 576 - 582, Replace the use of sapply in
.formatComments when processing test_ (the call sapply(test_, strsplit, split =
"; ")) with lapply to ensure a stable list output; locate the block that creates
test_ from strSplit(object[["CC"]], ": ", ...) and change the sapply call to
lapply(test_, strsplit, split = "; ") so object[["CC"]] is consistently assigned
a list of character vectors and BiocCheck warnings are avoided.
R/pubchem_view_helpers.R (1)
100-103: Remove commented-out dead code.

This commented-out alternative implementation should be removed as part of the cleanup for Bioconductor submission. If it's needed for reference, version control history preserves it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/pubchem_view_helpers.R` around lines 100 - 103, Remove the commented-out
dead code lines containing the alternative implementation (the three commented
lines "url", "httr2::url_build()", and ".build_request()") from
pubchem_view_helpers.R; simply delete those commented lines so only the active
implementation remains (you can rely on VCS history if the alternative is ever
needed).
R/utils-standardize_names.R (1)
77-80: Prefer != over !x == for clarity.

While !space_action == "-" is correct in R (comparison binds tighter than !), using != is more immediately readable and avoids a common source of confusion in other languages.
♻️ Suggested simplification
-  if (!space_action == "-") {
+  if (space_action != "-") {
     name <- gsub("-", "", name)
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils-standardize_names.R` around lines 77 - 80, The condition currently
uses the confusing negation form `!space_action == "-"`; change it to the
clearer inequality operator `space_action != "-"` in the block that removes
hyphens (the code that assigns to `name` via `gsub("-", "", name)`), so the
behavior remains the same but the intent is clearer and avoids confusion with
operator precedence on `space_action`.
R/utils-matchNested.R (2)
28-31: keep_duplicates in the generic's signature argument is unusual if not used for dispatch.

The setGeneric declares signature = c("x", "table", "keep_duplicates"), but none of the setMethod calls dispatch on keep_duplicates. Including it in signature without corresponding method specialization is harmless but misleading — it suggests dispatch occurs on that argument. Consider removing it from signature to clarify intent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils-matchNested.R` around lines 28 - 31, The generic matchNested
currently lists keep_duplicates in the setGeneric signature even though no
setMethod dispatches on that argument; remove "keep_duplicates" from the
signature argument in the setGeneric call so signature = c("x", "table") (or
omit signature entirely) and confirm existing setMethod definitions for
matchNested continue to match correctly; update any related
documentation/comments if they imply dispatch on keep_duplicates.
34-83: The list, data.table, and data.frame method bodies are nearly identical — consider extracting common logic.

All three implementations share the same pattern: convert → deduplicate → lookup. The only difference is that data.table and data.frame use apply(..., MARGIN=1) with a type assertion, while list uses lapply. A single internal helper accepting the pre-processed list would eliminate the triplication.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils-matchNested.R` around lines 34 - 83, Three nearly identical methods
(matchNested,list; matchNested,data.table; matchNested,data.frame) should share
a single helper to avoid duplication: create an internal function (e.g.,
.matchNested_from_unlisted) that accepts a list of unlisted rows and
keep_duplicates, calls .convert_nested_list_to_dt(), applies the deduplication
when keep_duplicates is FALSE, and returns the idx lookup for x; then change
matchNested,list to build the list via lapply(unlistNested, ...) and call the
helper, and change matchNested,data.table and matchNested,data.frame to assert
types, build the list via apply(..., FUN = unlistNested, simplify = FALSE), and
call the same helper (referencing unlistNested, .convert_nested_list_to_dt, and
keep_duplicates).
R/oncotree.R (1)
15-15: Use HTTPS for the Oncotree API URL.

"http://oncotree.mskcc.org" should be changed to "https://oncotree.mskcc.org". The HTTPS endpoint is available (responds with 200), and HTTP redirects to HTTPS anyway (301). Bioconductor BiocCheck flags HTTP URLs, so this should be addressed before submission.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/oncotree.R` at line 15, The URL assigned to the variable url in Oncotree.R
uses an insecure scheme; update the literal "http://oncotree.mskcc.org" to use
HTTPS ("https://oncotree.mskcc.org") wherever url is defined or constructed
(refer to the url variable in Oncotree.R) so requests use the secure endpoint
and avoid BiocCheck HTTP warnings.
R/biomart_core.R (1)
354-358: @export combined with @keywords internal on BioMartClient.

This combination is contradictory: @export adds the class to NAMESPACE while @keywords internal hides it from the documentation index. If the intent is to export the class for programmatic use but discourage casual use, this is acceptable. Otherwise, consider removing one of the two directives. BiocCheck may flag this.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/biomart_core.R` around lines 354 - 358, The Roxygen block for the R6 class
BioMartClient currently has both `@export` and `@keywords` internal which conflict;
decide whether BioMartClient should be exported or internal and update the
Roxygen: if the class should be available to package users keep `@export` and
remove `@keywords` internal (or change to `@keywords` internal only if it should not
be exported), then re-document to regenerate NAMESPACE; reference the R6 class
name BioMartClient and the Roxygen directives `@export` and `@keywords` internal
when making the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@inst/extdata/test_cellosaurus_detailed.R`:
- Around line 71-78: Remove the stray debug call da() from the test block and
either convert this snippet into a real unit test or delete it (since
inst/extdata/ is for data, not tests); if keeping the test, replace the da()
call and add meaningful assertions that call .formatComments(dt) and check its
output with expect_* helpers (e.g., expect_equal/expect_true) to validate
formatted comment fields and edge cases; locate the code in the test_that block
referencing .formatComments and dt and update it accordingly.

In `@R/cellosaurus_annotations.R`:
- Around line 38-48: The call inside the mclapply is hardcoding query_only =
FALSE instead of forwarding the function parameter, making the parameter a
no-op; update the anonymous function that builds requests so it passes the
enclosing query_only value into .build_cellosaurus_request (e.g., replace
query_only = FALSE with query_only = query_only) so the caller's flag is honored
by requests <- parallel::mclapply(...).

In `@R/pubchem_view.R`:
- Around line 187-193: Replace the sapply call on parsed_responses with a safer
vapply (or lapply if responses are non-scalar) and change .replace_null to use a
scalar if/else: locate the call to sapply(parsed_responses, .replace_null) and
replace with vapply(parsed_responses, .replace_null, FUN.VALUE = character(1))
if every response is a single character, otherwise use lapply(parsed_responses,
.replace_null); update the .replace_null function to use a scalar conditional
(if (is.null(x)) NA_character_ else x) instead of ifelse to avoid type-coercion
issues.

In `@R/utils-logging.R`:
- Around line 30-36: The logging checks use options("log_level") (which returns
a named list) so comparisons fail; update each logging function (.info, .debug,
.warn, .err) to call getOption("log_level") instead and compare its scalar value
(e.g., replace options("log_level") == "INFO" with getOption("log_level") ==
"INFO" and replace (options("log_level") %in% c(...)) with
getOption("log_level") %in% c(...)); optionally handle NULL by using
getOption("log_level", "INFO") or explicit NULL-safe checks so the membership
and equality tests behave correctly in .info, .debug, .warn, and .err.

In `@R/utils-matchNested.R`:
- Around line 132-150: The second duplicate setMethod registration for
matchNested incorrectly repeats the signature (x = "character", table =
"data.frame"); change that second setMethod to register the numeric dispatch so
it becomes signature(x = "numeric", table = "data.frame") so matchNested called
with a numeric x will dispatch correctly; locate the duplicate setMethod blocks
for matchNested,data.frame and update the signature in the second definition to
use x = "numeric".

In `@vignettes/Cellosaurus.Rmd`:
- Around line 47-51: The sentence contains a duplicated word "the the" in the
explanation of return columns; edit the vignette text so the phrase reads "will
return the Standardized Identifier as `cellLineName` and the Cellosaurus
Accession ID `accession`" (keep the surrounding mention of `from = "idsy"` and
`query` unchanged).

In `@vignettes/ChEMBL.Rmd`:
- Around line 77-78: The Markdown link for "API Documentation" is broken by
stray bold markers; edit the paragraph containing the link (the line with [API
Documentation](...)) to remove the extra asterisks so the URL reads
https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services
and ensure the link is formatted as [API
Documentation](https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services).

In `@vignettes/Introduction.Rmd`:
- Around line 26-28: Change the article before "R" in the sentence mentioning
the package: replace "a `R` package hosted on Bioconductor" with "an `R` package
hosted on Bioconductor" so the line referencing `R` and `AnnotationGx` uses the
correct indefinite article.

In `@vignettes/OncoTree.Rmd`:
- Around line 65-69: The vignette text and code call different function names:
the prose mentions getCancerSubtypes while the code chunk invokes
getOncotreeTumorTypes(); update the vignette so they match by either (A)
changing the descriptive text to refer to getOncotreeTumorTypes(), or (B)
changing the code chunk to call getCancerSubtypes()—choose the name that matches
the actual exported function in the package (getCancerSubtypes or
getOncotreeTumorTypes) and update both the sentence and the code chunk to use
that same identifier.

---

Outside diff comments:
In `@data-raw/sample_pipeline.Rmd`:
- Around line 181-184: The CTRP_result fread call contains stray characters and
an extra token causing a syntax error; update the CTRP_result line (matching the
other imports like GDSC_result, gCSI_result, CCLE_result) to call
data.table::fread with a single string argument "TRASH/CTRP_result.csv" and
remove the trailing **** so the parentheses are balanced (i.e., mirror the other
lines such as GDSC_result and CCLE_result).

In `@DESCRIPTION`:
- Around line 22-30: The package uses parallel::mclapply in multiple places
(e.g., R/pubchem_view.R, R/cellosaurus_annotations.R, R/cellosaurus.R), but
DESCRIPTION does not declare parallel in Imports; add "parallel" to the Imports
field of DESCRIPTION so BiocCheck/R CMD check recognizes the dependency, and
then re-run checks; you can continue using parallel::mclapply calls (no
NAMESPACE change required) or alternatively add an importFrom(parallel,
mclapply) if you prefer explicit NAMESPACE import.

In `@man/AnnotationGx-package.Rd`:
- Around line 19-25: The man page man/AnnotationGx-package.Rd still shows
"\strong{Maintainer}: Jermiah Joseph" even though DESCRIPTION now lists Michael
Tran as the maintainer (cre role); regenerate the roxygen2 documentation so the
.Rd files reflect DESCRIPTION by running roxygen2::roxygenise() (or the
project’s documented build step) to update man/AnnotationGx-package.Rd and
replace the old \strong{Maintainer} entry with the current maintainer from
DESCRIPTION.

In `@R/pubchem_status.R`:
- Around line 29-38: The retry loop in pubchem_status.R using
while(is.null(message)) can hang indefinitely if a 200 response lacks the
X-Throttling-Control header or if responses remain non-200; modify the loop
around req_perform to use a bounded retry strategy (e.g., a max_retries counter)
and exponential backoff (e.g., sleep_time doubling each retry) and break with an
error or return NA after retries exhausted; ensure you check
httr2::resp_status(response) and presence of
response$headers[["X-Throttling-Control"]] each iteration, increment retry
counters in both success-without-header and failure branches, and surface a
clear failure via .warn/.stop or a returned status when max_retries reached.
- Around line 39-43: The function currently only returns parsed_info when
returnMessage is TRUE, causing an implicit NULL when FALSE; either make the
function explicitly return the HTTP status (and keep it invisible) or update the
docs. Fix by adding an explicit return for the non-message branch — for example
return(invisible(httr2::resp_status(response))) — or change the roxygen `@return`
to state that NULL is returned; update references in this function around
parsed_info, .checkThrottlingStatus2, returnMessage, and the
httr2::resp_status(response) call accordingly.

In `@tests/testthat/test_pubchem_view.R`:
- Around line 78-87: The test calls annotatePubchemCompound(..., parse_function
= fake_parser) before fake_parser is defined, causing the test to pass for the
wrong reason (missing object) instead of validating heading handling; move the
fake_parser definition above both expect_error(...) and the subsequent
successful call so that expect_error invokes annotatePubchemCompound with a real
parse_function, ensuring it fails due to the invalid heading rather than an
undefined symbol (update the fake_parser function used in the test that returns
data.table::data.table(Heading="CAS", Value="fake_value")).

---

Duplicate comments:
In `@man/getPubchemProperties.Rd`:
- Around line 17-23: In the examples section for getPubchemProperties(), replace
the runtime-only guard if (interactive()) { getPubchemProperties() } with a Rd
\donttest{ getPubchemProperties() } block so Bioconductor/CRAN run checks won't
skip it, and remove the extra blank line before the closing brace so the example
block is tidy; locate the example around getPubchemProperties() in the Rd
examples and make these two edits.

---

Nitpick comments:
In `@R/biomart_core.R`:
- Around line 354-358: The Roxygen block for the R6 class BioMartClient
currently has both `@export` and `@keywords` internal which conflict; decide whether
BioMartClient should be exported or internal and update the Roxygen: if the
class should be available to package users keep `@export` and remove `@keywords`
internal (or change to `@keywords` internal only if it should not be exported),
then re-document to regenerate NAMESPACE; reference the R6 class name
BioMartClient and the Roxygen directives `@export` and `@keywords` internal when
making the change.

In `@R/cellosaurus.R`:
- Around line 576-582: Replace the use of sapply in .formatComments when
processing test_ (the call sapply(test_, strsplit, split = "; ")) with lapply to
ensure a stable list output; locate the block that creates test_ from
strSplit(object[["CC"]], ": ", ...) and change the sapply call to lapply(test_,
strsplit, split = "; ") so object[["CC"]] is consistently assigned a list of
character vectors and BiocCheck warnings are avoided.

In `@R/oncotree.R`:
- Line 15: The URL assigned to the variable url in Oncotree.R uses an insecure
scheme; update the literal "http://oncotree.mskcc.org" to use HTTPS
("https://oncotree.mskcc.org") wherever url is defined or constructed (refer to
the url variable in Oncotree.R) so requests use the secure endpoint and avoid
BiocCheck HTTP warnings.

In `@R/pubchem_view_helpers.R`:
- Around line 100-103: Remove the commented-out dead code lines containing the
alternative implementation (the three commented lines "url",
"httr2::url_build()", and ".build_request()") from pubchem_view_helpers.R;
simply delete those commented lines so only the active implementation remains
(you can rely on VCS history if the alternative is ever needed).

In `@R/utils-matchNested.R`:
- Around line 28-31: The generic matchNested currently lists keep_duplicates in
the setGeneric signature even though no setMethod dispatches on that argument;
remove "keep_duplicates" from the signature argument in the setGeneric call so
signature = c("x", "table") (or omit signature entirely) and confirm existing
setMethod definitions for matchNested continue to match correctly; update any
related documentation/comments if they imply dispatch on keep_duplicates.
- Around line 34-83: Three nearly identical methods (matchNested,list;
matchNested,data.table; matchNested,data.frame) should share a single helper to
avoid duplication: create an internal function (e.g.,
.matchNested_from_unlisted) that accepts a list of unlisted rows and
keep_duplicates, calls .convert_nested_list_to_dt(), applies the deduplication
when keep_duplicates is FALSE, and returns the idx lookup for x; then change
matchNested,list to build the list via lapply(unlistNested, ...) and call the
helper, and change matchNested,data.table and matchNested,data.frame to assert
types, build the list via apply(..., FUN = unlistNested, simplify = FALSE), and
call the same helper (referencing unlistNested, .convert_nested_list_to_dt, and
keep_duplicates).

In `@R/utils-split.R`:
- Around line 67-70: The checkmate assertions use mixed naming styles;
standardize them to camelCase for consistency: replace assert_integerish and
assert_character with assertIntegerish and assertCharacter in the function that
contains checkmate::assertString and checkmate::assertFlag, and also change the
assertCharacter call in .strSplitInfinite to the camelCase variant if it's not
already; update all checkmate::... calls in R/utils-split.R to use the camelCase
names (assertString, assertFlag, assertIntegerish, assertCharacter) so the file
uses a single naming convention.
- Around line 161-165: Either remove the dead commented assertions or restore
them using checkmate to match the rest of the file: replace the commented
assert(...) block with checkmate::assert_class(object, "DFrame"),
checkmate::assert_class(object[[colName]], "CharacterList") (or another
appropriate checkmate assertion if CharacterList is represented differently in
this codebase), and checkmate::assert_string(split); ensure you import/namespace
checkmate consistently with other checks in the file.

In `@R/utils-standardize_names.R`:
- Around line 77-80: The condition currently uses the confusing negation form
`!space_action == "-"`; change it to the clearer inequality operator
`space_action != "-"` in the block that removes hyphens (the code that assigns
to `name` via `gsub("-", "", name)`), so the behavior remains the same but the
intent is clearer and avoids confusion with operator precedence on
`space_action`.

coderabbitai · 2026-02-17T16:59:23Z

inst/extdata/test_cellosaurus_detailed.R

 test_that(".formatComments works as expected", {
-    # cc_column <- AnnotationGx:::.formatComments(x)
+  # cc_column <- AnnotationGx:::.formatComments(x)

-    dt <- data.table::data.table(rbind(x))
-
-    da(); .formatComments(dt)
+  dt <- data.table::data.table(rbind(x))

+  da()
+  .formatComments(dt)
 })


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check if `da` is defined anywhere in the package rg -n --type=r '\bda\b\s*(<-|=)\s*function' # Also check if it's used elsewhere rg -n --type=r '\bda' # Let's also check what other functions are in this test file echo "=== Content of the test file ===" head -n 100 inst/extdata/test_cellosaurus_detailed.R

Repository: bhklab/AnnotationGx

Length of output: 1736

Remove da() debug artifact and add assertions to complete the test.

da() on line 76 is undefined in the package and appears only in this location—it's a leftover debug call. More critically, the test_that block lacks any expect_* assertions, so it validates nothing. The test creates data but doesn't check the output of .formatComments(dt).

Either remove the test_that wrapper (since inst/extdata/ is for package data, not tests), or add meaningful assertions to validate .formatComments() behavior.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@inst/extdata/test_cellosaurus_detailed.R` around lines 71 - 78, Remove the stray debug call da() from the test block and either convert this snippet into a real unit test or delete it (since inst/extdata/ is for data, not tests); if keeping the test, replace the da() call and add meaningful assertions that call .formatComments(dt) and check its output with expect_* helpers (e.g., expect_equal/expect_true) to validate formatted comment fields and edge cases; locate the code in the test_that block referencing .formatComments and dt and update it accordingly.

coderabbitai · 2026-02-17T16:59:24Z

R/cellosaurus_annotations.R

+  requests <- parallel::mclapply(accessions, function(accession) {
+    .build_cellosaurus_request(
+      query = accession,
+      to = to,
+      numResults = 1,
+      apiResource = "search/cell-line",
+      output = "TXT",
+      sort = NULL,
+      query_only = FALSE
    )
-    names(responses) <- accessions
-    if (raw) {
-        return(responses)
-    }
+  })


⚠️ Potential issue | 🟡 Minor

query_only parameter is accepted but never forwarded.

The function declares query_only as a parameter (Line 32) but hardcodes query_only = FALSE on Line 46 instead of passing the caller's value. This makes the parameter a no-op.

.build_cellosaurus_request( query = accession, to = to, numResults = 1, apiResource = "search/cell-line", output = "TXT", sort = NULL, - query_only = FALSE + query_only = query_only )

This appears to be a pre-existing issue, but since the lines were reformatted it's worth addressing.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/cellosaurus_annotations.R` around lines 38 - 48, The call inside the mclapply is hardcoding query_only = FALSE instead of forwarding the function parameter, making the parameter a no-op; update the anonymous function that builds requests so it passes the enclosing query_only value into .build_cellosaurus_request (e.g., replace query_only = FALSE with query_only = query_only) so the caller's flag is honored by requests <- parallel::mclapply(...).

coderabbitai · 2026-02-17T16:59:24Z

R/pubchem_view.R

  sapply(parsed_responses, .replace_null)
-
 }

 # helper function to replace NULL with NA
 .replace_null <- function(x) {
  ifelse(is.null(x), NA_character_, x)
-}
+}


🛠️ Refactor suggestion | 🟠 Major

sapply remains on line 187, and .replace_null uses ifelse instead of if/else.

Given the PR's goal of replacing sapply with vapply, line 187 is an oversight. Additionally, .replace_null uses vectorized ifelse() on a scalar check (is.null(x)), which has unintuitive type-coercion behavior for non-atomic inputs. A simple if/else is safer here.

Proposed fix

- sapply(parsed_responses, .replace_null) + vapply(parsed_responses, .replace_null, FUN.VALUE = character(1)) } # helper function to replace NULL with NA .replace_null <- function(x) { - ifelse(is.null(x), NA_character_, x) + if (is.null(x)) NA_character_ else x }

Note: The FUN.VALUE = character(1) assumes all parsed responses are scalar character. If responses can be other types (lists, data.tables), vapply may not be appropriate here and lapply would be the safer alternative.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

sapply(parsed_responses, .replace_null)

}

# helper function to replace NULL with NA

.replace_null <- function(x) {

ifelse(is.null(x), NA_character_, x)

}

\ No newline at end of file

}

vapply(parsed_responses, .replace_null, FUN.VALUE = character(1))

}

# helper function to replace NULL with NA

.replace_null <- function(x) {

if (is.null(x)) NA_character_ else x

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/pubchem_view.R` around lines 187 - 193, Replace the sapply call on parsed_responses with a safer vapply (or lapply if responses are non-scalar) and change .replace_null to use a scalar if/else: locate the call to sapply(parsed_responses, .replace_null) and replace with vapply(parsed_responses, .replace_null, FUN.VALUE = character(1)) if every response is a single character, otherwise use lapply(parsed_responses, .replace_null); update the .replace_null function to use a scalar conditional (if (is.null(x)) NA_character_ else x) instead of ifelse to avoid type-coercion issues.

coderabbitai · 2026-02-17T16:59:24Z

R/utils-logging.R

 .info <- function(...) {
  msg <- .log_fmt("INFO", ...)
  optionIsTRUE <- options("log_level") == "INFO" || (options("log_level") %in% c("WARN", "DEBUG", "ERROR"))
-  if(optionIsTRUE) {
+  if (optionIsTRUE) {
    message(crayon::green(msg))
  }
 }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n R/utils-logging.R

Repository: bhklab/AnnotationGx

Length of output: 3128

🌐 Web query:

R programming options() function returns list vs getOption() scalar value

💡 Result:

In base R, the two functions are designed with different return contracts:

options(...) always returns a named list, even if you ask for a single option (e.g., options("digits") → a list of length 1 named "digits"). This is because options() is a multi-purpose API: it can query many options at once, set options, and when setting, it returns the previous values (again as a list). [1]

getOption("name") returns the value of one option (a scalar, vector, function, etc.), or NULL (or your provided default=) if it’s not set. It’s the recommended way to fetch one option. [2]

Examples:

options("digits") # list(digits = 7) getOption("digits") # 7 options("digits")$digits # 7 (extract from the list) options()[["digits"]] # 7

Rule of thumb: use getOption() when you want “the option value”; use options() when you want “a list of option(s)” or you’re setting options and want the old values back. [1][2]

Sources:
[1] R options() documentation (Value section: returns a list even of length 1) (docs.tibco.com)
[2] R getOption() documentation (returns single option value, supports default=) (rdrr.io)

Citations:

1: https://docs.tibco.com/pub/enterprise-runtime-for-R/6.0.0/doc/html/Language_Reference/base/options.html?utm_source=openai

2: https://rdrr.io/r/base/options.html?utm_source=openai

Replace options() with getOption() in all logging functions.

In R, options("log_level") returns a named list (list(log_level = ...)), not a scalar value. Comparisons like options("log_level") == "INFO" will not behave as intended. Use getOption("log_level") instead, which returns the actual value.

This bug exists in all four logging functions: .info() (line 32), .debug() (line 55), .warn() (line 65), and .err() (line 73).

Suggested fix for .info (similar pattern applies to .debug, .warn, .err)

- optionIsTRUE <- options("log_level") == "INFO" || (options("log_level") %in% c("WARN", "DEBUG", "ERROR")) + optionIsTRUE <- getOption("log_level") == "INFO" || (getOption("log_level") %in% c("WARN", "DEBUG", "ERROR"))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/utils-logging.R` around lines 30 - 36, The logging checks use options("log_level") (which returns a named list) so comparisons fail; update each logging function (.info, .debug, .warn, .err) to call getOption("log_level") instead and compare its scalar value (e.g., replace options("log_level") == "INFO" with getOption("log_level") == "INFO" and replace (options("log_level") %in% c(...)) with getOption("log_level") %in% c(...)); optionally handle NULL by using getOption("log_level", "INFO") or explicit NULL-safe checks so the membership and equality tests behave correctly in .info, .debug, .warn, and .err.

coderabbitai · 2026-02-17T16:59:24Z

R/utils-matchNested.R

 setMethod(
-    f = "matchNested",
-    signature = signature(
-        x = "character",
-        table = "data.frame"
-    ),
-    definition = `matchNested,data.frame`
+  f = "matchNested",
+  signature = signature(
+    x = "character",
+    table = "data.frame"
+  ),
+  definition = `matchNested,data.frame`
 )

 #' @rdname matchNested
 #' @export
 setMethod(
-    f = "matchNested",
-    signature = signature(
-        x = "character",
-        table = "data.frame"
-    ),
-    definition = `matchNested,data.frame`
+  f = "matchNested",
+  signature = signature(
+    x = "character",
+    table = "data.frame"
+  ),
+  definition = `matchNested,data.frame`
 )


⚠️ Potential issue | 🔴 Critical

Duplicate setMethod for (character, data.frame) — likely should be (numeric, data.frame).

Lines 132–139 and 143–150 register the exact same signature (x = "character", table = "data.frame"). The second registration silently overwrites the first. Based on the pattern established by the list and data.table variants (which each have both character and numeric dispatches), the second one should use x = "numeric".

This means matchNested(123, some_data_frame) will fail to dispatch to the intended method.

🐛 Proposed fix

#' `@rdname` matchNested #' `@export` setMethod( f = "matchNested", signature = signature( - x = "character", + x = "numeric", table = "data.frame" ), definition = `matchNested,data.frame` )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

setMethod(

f = "matchNested",

signature = signature(

x = "character",

table = "data.frame"

),

definition = `matchNested,data.frame`

f = "matchNested",

signature = signature(

x = "character",

table = "data.frame"

),

definition = `matchNested,data.frame`

)

#' @rdname matchNested

#' @export

setMethod(

f = "matchNested",

signature = signature(

x = "character",

table = "data.frame"

),

definition = `matchNested,data.frame`

f = "matchNested",

signature = signature(

x = "character",

table = "data.frame"

),

definition = `matchNested,data.frame`

)

setMethod(

f = "matchNested",

signature = signature(

x = "character",

table = "data.frame"

),

definition = `matchNested,data.frame`

)

#' `@rdname` matchNested

#' `@export`

setMethod(

f = "matchNested",

signature = signature(

x = "numeric",

table = "data.frame"

),

definition = `matchNested,data.frame`

)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/utils-matchNested.R` around lines 132 - 150, The second duplicate setMethod registration for matchNested incorrectly repeats the signature (x = "character", table = "data.frame"); change that second setMethod to register the numeric dispatch so it becomes signature(x = "numeric", table = "data.frame") so matchNested called with a numeric x will dispatch correctly; locate the duplicate setMethod blocks for matchNested,data.frame and update the signature in the second definition to use x = "numeric".

coderabbitai · 2026-02-17T16:59:24Z

vignettes/Cellosaurus.Rmd

+By default, the function will try to map using the common identifiers and 
+synonyms (`from = "idsy"`) and will return the the Standardized Identifier as 
+`cellLineName` and the Cellosaurus Accession ID `accession`. The function also 
+returns an additional column `query` which can be used to identify the original 
+query if needed. 


⚠️ Potential issue | 🟡 Minor

Typo: "the the Standardized" — duplicate word.

Line 48 has a repeated "the".

Proposed fix

-By default, the function will try to map using the common identifiers and -synonyms (`from = "idsy"`) and will return the the Standardized Identifier as +By default, the function will try to map using the common identifiers and +synonyms (`from = "idsy"`) and will return the Standardized Identifier as

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

By default, the function will try to map using the common identifiers and

synonyms (`from = "idsy"`) and will return the the Standardized Identifier as

`cellLineName` and the Cellosaurus Accession ID `accession`. The function also

returns an additional column `query` which can be used to identify the original

query if needed.

By default, the function will try to map using the common identifiers and

synonyms (`from = "idsy"`) and will return the Standardized Identifier as

`cellLineName` and the Cellosaurus Accession ID `accession`. The function also

returns an additional column `query` which can be used to identify the original

query if needed.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@vignettes/Cellosaurus.Rmd` around lines 47 - 51, The sentence contains a duplicated word "the the" in the explanation of return columns; edit the vignette text so the phrase reads "will return the Standardized Identifier as `cellLineName` and the Cellosaurus Accession ID `accession`" (keep the surrounding mention of `from = "idsy"` and `query` unchanged).

coderabbitai · 2026-02-17T16:59:24Z

vignettes/ChEMBL.Rmd

+More information can be found at the 
+[API Documentation](https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-**services******)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n vignettes/ChEMBL.Rmd | sed -n '70,85p'

Repository: bhklab/AnnotationGx

Length of output: 808

🌐 Web query:

ChEMBL API documentation gitbook URL

💡 Result:

The ChEMBL API documentation on GitBook is hosted here:

https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services [1]

(Documentation home: https://chembl.gitbook.io/chembl-interface-documentation [2])

Broken URL — extra asterisks corrupt the link.

The URL on line 78 contains web-**services******) which is malformed markdown. The bold markers (**) are embedded inside the URL, breaking both the rendered link and the hyperlink target.

Proposed fix

-More information can be found at the -[API Documentation](https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-**services******) +More information can be found at the +[API Documentation](https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

More information can be found at the

[API Documentation](https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-**services******)

More information can be found at the

[API Documentation](https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@vignettes/ChEMBL.Rmd` around lines 77 - 78, The Markdown link for "API Documentation" is broken by stray bold markers; edit the paragraph containing the link (the line with [API Documentation](...)) to remove the extra asterisks so the URL reads https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services and ensure the link is formatted as [API Documentation](https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services).

coderabbitai · 2026-02-17T16:59:24Z

vignettes/Introduction.Rmd

+`R` is an open-source statistical environment which can be easily modified to 
+enhance its functionality via packages. `AnnotationGx` is a `R` package hosted
+on Bioconductor and can be installed using `BiocManager`.


⚠️ Potential issue | 🟡 Minor

Minor grammar: "a R package" → "an R package".

Since "R" is pronounced with a vowel sound, the article should be "an".

Proposed fix

-`R` is an open-source statistical environment which can be easily modified to -enhance its functionality via packages. `AnnotationGx` is a `R` package hosted -on Bioconductor and can be installed using `BiocManager`. +`R` is an open-source statistical environment which can be easily modified to +enhance its functionality via packages. `AnnotationGx` is an `R` package hosted +on Bioconductor and can be installed using `BiocManager`.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

`R` is an open-source statistical environment which can be easily modified to

enhance its functionality via packages. `AnnotationGx` is a `R` package hosted

on Bioconductor and can be installed using `BiocManager`.

`R` is an open-source statistical environment which can be easily modified to

enhance its functionality via packages. `AnnotationGx` is an `R` package hosted

on Bioconductor and can be installed using `BiocManager`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@vignettes/Introduction.Rmd` around lines 26 - 28, Change the article before "R" in the sentence mentioning the package: replace "a `R` package hosted on Bioconductor" with "an `R` package hosted on Bioconductor" so the line referencing `R` and `AnnotationGx` uses the correct indefinite article.

coderabbitai · 2026-02-17T16:59:24Z

vignettes/OncoTree.Rmd

+The `getCancerSubtypes` function retrieves the subtypes of a specific cancer 
+type.
+```{r cancer-subtypes}
 getOncotreeTumorTypes()
 ```


⚠️ Potential issue | 🟡 Minor

Vignette text references a non-existent function name.

Line 65 describes the function as getCancerSubtypes, but the code chunk on line 68 actually calls getOncotreeTumorTypes(). The text should reference the correct function name to avoid confusing readers.

📝 Proposed fix

-The `getCancerSubtypes` function retrieves the subtypes of a specific cancer -type. +The `getOncotreeTumorTypes` function retrieves the subtypes of a specific cancer +type.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@vignettes/OncoTree.Rmd` around lines 65 - 69, The vignette text and code call different function names: the prose mentions getCancerSubtypes while the code chunk invokes getOncotreeTumorTypes(); update the vignette so they match by either (A) changing the descriptive text to refer to getOncotreeTumorTypes(), or (B) changing the code chunk to call getCancerSubtypes()—choose the name that matches the actual exported function in the package (getCancerSubtypes or getOncotreeTumorTypes) and update both the sentence and the code chunk to use that same identifier.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

.github/workflows/check-bioc.yml (1)
23-25: Consider restricting trigger branches to avoid unnecessary CI runs.

on: push and on: pull_request without branch filters will trigger this workflow on every branch push and every PR. For a Bioconductor submission, you likely only need this on main/devel and PRs targeting those branches.
 on:
   push:
+    branches: [main, devel]
   pull_request:
+    branches: [main, devel]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/check-bioc.yml around lines 23 - 25, The workflow
currently triggers on all branches via the top-level triggers "on: push" and
"on: pull_request"; restrict these to only the relevant branches (e.g., main and
devel) by replacing those triggers with branch filters such as using "push:
branches: [main, devel]" and "pull_request: branches: [main, devel]" so CI runs
only on pushes to and PRs targeting the intended branches.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/check-bioc.yml:
- Line 11: Fix the three typos in the workflow comments: replace "operating
system.s" with "operating systems", change "Finding the the R version" to
"Finding the R version", and correct "isntalling" to "installing" so the
comments read correctly; search for those exact strings in the
.github/workflows/check-bioc.yml and update the text in the comment lines
surrounding them (no code changes required).
- Around line 251-254: The DISPLAY env value in the "Run CMD check" step is
incorrect (set as 99.0) and must be changed to a quoted string with a colon
prefix; update the env key DISPLAY in the Run CMD check job to use ":99.0"
(including quotes to prevent YAML from parsing it as a number) and apply the
same fix to the other DISPLAY occurrence in the workflow. Ensure you only modify
the env value for DISPLAY and keep the surrounding step name "Run CMD check" and
other env entries unchanged.
- Around line 136-142: The cache keys reference the non-existent workflow input
inputs.cache-version causing an empty cache suffix and the restore-keys has a
stray double dash; update the "Restore R package cache" step to use the
environment variable cache-version instead (replace ${{ inputs.cache-version }}
with ${{ env.cache-version }}) for both key and restore-keys in that step, and
fix the restore-keys separator from `--` to a single `-` so the key and
restore-keys are consistent (refer to the step name "Restore R package cache"
and the key/restore-keys fields).
- Around line 36-43: The workflow env variable has_testthat is set to 'false'
which skips the "Reveal testthat details" step; change has_testthat to 'true' so
the existing testthat reveal step runs and test output is shown on
failure—update the env entry has_testthat: 'true' (keeping other env keys
unchanged) to enable testthat output for the tests in tests/testthat/ and the
declared Suggests dependency.
- Line 112: Update the GitHub Actions steps to use currently supported release
tags: change actions/checkout@v3 to actions/checkout@v6,
actions/upload-artifact@master to actions/upload-artifact@v6,
docker/setup-qemu-action@v2 to docker/setup-qemu-action@v3,
docker/setup-buildx-action@v2 to docker/setup-buildx-action@v3,
docker/login-action@v2 to docker/login-action@v3, and
docker/build-push-action@v4 to docker/build-push-action@v6; edit the workflow
step definitions that reference these action identifiers to use the new version
pins so you stop tracking unstable/master and meet the required supported
versions.

---

Nitpick comments:
In @.github/workflows/check-bioc.yml:
- Around line 23-25: The workflow currently triggers on all branches via the
top-level triggers "on: push" and "on: pull_request"; restrict these to only the
relevant branches (e.g., main and devel) by replacing those triggers with branch
filters such as using "push: branches: [main, devel]" and "pull_request:
branches: [main, devel]" so CI runs only on pushes to and PRs targeting the
intended branches.

coderabbitai · 2026-02-17T17:24:46Z