Add accept3_uk() for UK-specific CPRD recalibration #21
jeenatm wants to merge 5 commits into resplab:master
Pull request overview
This PR introduces accept3_uk(), a UK-specific COPD exacerbation risk prediction entrypoint that recalibrates ACCEPT 2.0 using CPRD-derived Cox recalibration parameters and a UK-specific sequential (triangular) imputation for optional predictors.
Changes:
- Added accept3_uk() with CPRD-based sequential imputation for missing optional predictors.
- Added UK-specific Cox recalibration step applied to ACCEPT 2.0 predicted risks.
- Minor formatting-only adjustments in existing accept() and accept3() code blocks.

    #' @importFrom dplyr tibble mutate select starts_with
    #' @export
    accept3_uk <- function(patientData,

The roxygen @export tag adds a new public function, but this PR doesn’t update the generated NAMESPACE (and corresponding .Rd in man/). As-is, accept3_uk() won’t be exported/accessible when installing from source. Regenerate and commit NAMESPACE/documentation via roxygen2 (or add them manually if roxygen isn’t part of the workflow).
R/predict.R
Outdated

    accept3_uk <- function(patientData,
                           prediction_interval = FALSE,
                           return_predictors = FALSE) {

prediction_interval is documented and exposed in the signature but never used, so callers will get identical output regardless of its value. Either implement interval propagation (e.g., recalibrate accept2()’s *_lower_PI/*_upper_PI columns and include them when prediction_interval=TRUE) or remove the parameter and its documentation to avoid a misleading API.
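A minimal sketch of the first option, interval propagation. The recalibration form `1 - exp(-H0 * (-log(1 - p))^beta)` is an assumption about how the Cox parameters are applied (the PR text does not spell it out), and `recalibrate_cox()`, `recalibrate_with_intervals()`, and the exact `*_lower_PI`/`*_upper_PI` column names are hypothetical. Under that assumption the transform is monotone in `p` for positive `H0` and `beta`, so applying it to each bound keeps the bounds ordered around the point estimate.

```r
# Assumed Cox recalibration on the complementary log-log scale.
# This functional form is an illustration, not confirmed by the PR.
recalibrate_cox <- function(p, H0, beta) {
  1 - exp(-H0 * (-log(1 - p))^beta)
}

# Hypothetical helper: recalibrate the point estimate and, on request,
# the matching prediction-interval columns of an accept2()-style result.
recalibrate_with_intervals <- function(preds, H0, beta,
                                       col = "predicted_exac_probability",
                                       prediction_interval = FALSE) {
  out <- data.frame(recalibrate_cox(preds[[col]], H0, beta))
  names(out) <- col
  if (prediction_interval) {
    for (suffix in c("_lower_PI", "_upper_PI")) {
      pi_col <- paste0(col, suffix)  # assumed column naming
      out[[pi_col]] <- recalibrate_cox(preds[[pi_col]], H0, beta)
    }
  }
  out
}

# Toy input standing in for accept2() output (values are made up).
preds <- data.frame(
  predicted_exac_probability          = c(0.20, 0.50),
  predicted_exac_probability_lower_PI = c(0.10, 0.30),
  predicted_exac_probability_upper_PI = c(0.40, 0.70)
)

# Moderate-to-severe parameters as listed in the roxygen block.
recalibrate_with_intervals(preds, H0 = 0.676, beta = 0.986,
                           prediction_interval = TRUE)
```

If interval support is out of scope for this PR, dropping the parameter is the simpler fix.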

    #' CPRD Cox model (Table in manuscript):
    #' \itemize{
    #'   \item Moderate-to-severe: \eqn{H_0 = 0.676}, \eqn{\beta = 0.986}
    #'   \item Severe: \eqn{H_0 = 1.124}, \eqn{\beta = 0.482}
    #' }

The severe-outcome recalibration parameters are inconsistent within this function’s documentation/comments vs the actual constants used. The roxygen block lists severe H0 = 1.124 and beta = 0.482, while the implementation/comment block uses H0_sev <- 0.482 and beta_sev <- 1.124. Please verify the correct CPRD values and make the roxygen formula section, in-code comments, and constants agree.

    BMI = list(
      binary = FALSE,
      clamp_low = 10,
      clamp_hi = 70,
      coef = c(28.944, -0.074, 0.195, 0.421, 0.027, -0.527, 0.010,
               0.451, -0.093, -0.051, -0.105, 1.629) # +LABA, oxygen, ICS, LAMA, statin

BMI imputation clamps to a maximum of 70, but the package documentation for samplePatients (and accept3 parameter docs) specifies BMI is expected in the 10–60 range. Clamping above the validated range may yield out-of-domain predictions; consider clamping to 60 (or explicitly documenting/justifying the higher cap if the UK recalibration supports it).
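For illustration, clamping to the documented 10–60 range could look like the sketch below. `clamp()` is a hypothetical helper, not a function in the package, and the BMI values are made up.

```r
# Hypothetical helper: restrict values to [low, high]; NA values stay NA.
clamp <- function(x, low, high) {
  pmin(pmax(x, low), high)
}

imputed_bmi <- c(8.2, 27.5, 64.9)        # made-up imputed values
clamp(imputed_bmi, low = 10, high = 60)  # 10.0 27.5 60.0
```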

    # Use mMRC if available, otherwise back-transform from SGRQ
    if (!"mMRC" %in% colnames(patientData) && "SGRQ" %in% colnames(patientData)) {
      patientData$mMRC <- (patientData$SGRQ - 20.43) / 14.77
    }

mMRC is treated as a mandatory predictor for the imputation models, but the code only back-calculates mMRC when the column is entirely absent. If mMRC exists but has NA values (while SGRQ is present), those rows will keep NA in X, causing lp/imputations to become NA. Consider filling mMRC where it is missing using SGRQ when available (and/or explicitly error if mandatory predictors contain missing values).
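A sketch of a row-wise fill, reusing the back-transform from the snippet above. `fill_mmrc_from_sgrq()` is a hypothetical name; erroring when mMRC is still missing afterwards is one possible policy, not necessarily the one the authors want.

```r
# Hypothetical helper: fill missing mMRC per row from SGRQ, then fail fast
# if any mMRC value is still missing.
fill_mmrc_from_sgrq <- function(patientData) {
  if (!"mMRC" %in% colnames(patientData)) {
    patientData$mMRC <- rep(NA_real_, nrow(patientData))
  }
  if ("SGRQ" %in% colnames(patientData)) {
    # Only touch rows where mMRC is missing and SGRQ is available.
    fill_idx <- is.na(patientData$mMRC) & !is.na(patientData$SGRQ)
    patientData$mMRC[fill_idx] <- (patientData$SGRQ[fill_idx] - 20.43) / 14.77
  }
  if (anyNA(patientData$mMRC)) {
    stop("mMRC is missing for some rows and cannot be derived from SGRQ.")
  }
  patientData
}

# Row 2 has no mMRC but does have SGRQ, so it is back-calculated.
fill_mmrc_from_sgrq(data.frame(mMRC = c(2, NA), SGRQ = c(50, 35.2)))
```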

    # 3. ACCEPT 2.0 predictions
    accept2_preds <- accept2(patientData = patientData)

    p2_modsev <- accept2_preds$predicted_exac_probability
    p2_sev    <- accept2_preds$predicted_severe_exac_probability

Unlike accept(), this function doesn’t validate LastYrSevExacCount <= LastYrExacCount. Since those fields directly influence both imputation and downstream accept2() predictions, add the same guard here (or call through the existing accept() validation path) to fail fast with a clear error for inconsistent history inputs.
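Such a guard could be sketched as follows. `check_exac_history()` and its error message are hypothetical and do not reproduce the wording used in accept(). Rows where either count is NA are skipped here, which is an assumption about the desired behaviour.

```r
# Hypothetical guard: severe exacerbations are a subset of all exacerbations,
# so the severe count can never exceed the total count.
check_exac_history <- function(patientData) {
  # which() drops NA comparisons, so rows with a missing count are not flagged.
  bad <- which(patientData$LastYrSevExacCount > patientData$LastYrExacCount)
  if (length(bad) > 0) {
    stop("LastYrSevExacCount exceeds LastYrExacCount in row(s): ",
         paste(bad, collapse = ", "))
  }
  invisible(patientData)
}

# Consistent history passes silently.
check_exac_history(data.frame(LastYrExacCount = c(2, 0),
                              LastYrSevExacCount = c(1, 0)))
```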

    accept3_uk <- function(patientData,
                           prediction_interval = FALSE,
                           return_predictors = FALSE) {

      if (!tibble::is_tibble(patientData)) {
        stop("patientData must be a tibble. Use as_tibble() to convert.")
      }

This adds substantial new behavior (UK-specific sequential imputation + recalibration) but there are no tests covering accept3_uk() in tests/testthat/test-predict.R. Add tests for: (1) baseline output columns/shape, (2) expected predictions on fixed inputs, and (3) at least one missing-optional-predictor scenario to assert imputation + determinism.

    if (!vname %in% colnames(patientData)) {
      patientData[[vname]] <- pred_vals
      message(paste0("accept3_uk: '", vname, "' not found - imputed using UK model."))
    } else {
      na_idx <- is.na(patientData[[vname]])
      patientData[[vname]][na_idx] <- pred_vals[na_idx]
      if (any(na_idx))
        message(paste0("accept3_uk: ", sum(na_idx), " missing value(s) in '",
                       vname, "' imputed using UK model."))
    }

The sequential imputation emits multiple message() calls (per missing column / per NA count), which can be noisy in batch prediction and hard to suppress selectively. Consider switching to a single aggregated warning()/message() at the end (or add a quiet/verbose flag consistent with the rest of the API) so callers can control console output more easily.
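One way to aggregate the output is sketched below. `impute_with_summary()` and its `verbose` flag are hypothetical. For brevity the sketch takes precomputed predictions per variable, whereas the real sequential imputation computes each variable's predictions from the variables imputed before it.

```r
# Hypothetical sketch: collect one note per imputed variable and emit a single
# message at the end, which callers can silence with verbose = FALSE.
impute_with_summary <- function(patientData, predictions, verbose = TRUE) {
  notes <- character(0)
  for (vname in names(predictions)) {
    pred_vals <- predictions[[vname]]
    if (!vname %in% colnames(patientData)) {
      patientData[[vname]] <- pred_vals
      notes <- c(notes, sprintf("'%s' not found (all rows imputed)", vname))
    } else {
      na_idx <- is.na(patientData[[vname]])
      if (any(na_idx)) {
        patientData[[vname]][na_idx] <- pred_vals[na_idx]
        notes <- c(notes, sprintf("'%s': %d missing value(s) imputed",
                                  vname, sum(na_idx)))
      }
    }
  }
  if (verbose && length(notes) > 0) {
    message("accept3_uk: imputed using UK model: ",
            paste(notes, collapse = "; "))
  }
  patientData
}

# Made-up inputs: BMI has one NA, statin is absent altogether.
df <- data.frame(BMI = c(NA, 25))
preds <- list(BMI = c(27, 27), statin = c(0, 1))
impute_with_summary(df, preds)
```

Because the summary goes through a single message() call, callers who want no output at all can also wrap the call in suppressMessages().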
Should the function be called
@aminadibi What about accept3_cprd? I can change accordingly. Or we can say that accept3_UK is based on primary care UK data whereas accept3 with GBR is for secondary/tertiary care?
This PR adds accept3_uk(), a new function that provides
UK-specific COPD exacerbation risk predictions by recalibrating
ACCEPT 2.0 using the CPRD primary-care dataset.
Key features: