
ReproStat

Reproducibility Diagnostics for Statistical Modeling

ReproStat helps you answer a practical question that standard model summaries do not answer well:

If the data changed a little, would the substantive result still look the same?

The package repeatedly perturbs a dataset, refits the model, and measures how stable the outputs remain. It summarizes that behavior through:

  • coefficient stability
  • p-value stability
  • selection stability
  • prediction stability
  • a composite Reproducibility Index (RI) on a 0-100 scale
  • cross-validation ranking stability for comparing multiple candidate models

This makes ReproStat useful when you want to move beyond single-fit inference and assess how sensitive your modeling conclusions are to ordinary data variation.


What ReproStat Is For

ReproStat is designed for analysts who want to know whether a model result is:

  • stable under bootstrap resampling
  • sensitive to sample composition
  • sensitive to small measurement noise
  • robust across competing modeling choices

Typical use cases include:

  • regression diagnostics for research analyses
  • checking whether selected predictors remain important across perturbations
  • comparing candidate models by how consistently they rank best in repeated CV
  • producing a transparent reproducibility summary for reports, theses, or papers

ReproStat does not claim to prove scientific reproducibility on its own. It quantifies the stability of model outputs under controlled perturbation schemes so you can assess how fragile or dependable the modeling result appears.


Installation

If the package is available on CRAN:

install.packages("ReproStat")

To install the development version from GitHub (requires the remotes package):

# install.packages("remotes")
remotes::install_github("ntiGideon/ReproStat")

Optional dependencies

Package   Purpose
MASS      robust M-estimation backend via backend = "rlm"
glmnet    penalized regression backend via backend = "glmnet"
ggplot2   ggplot2-based plotting helpers

Core Idea

The package follows a simple workflow:

original data
    ->
perturb data many times
    ->
refit the model each time
    ->
measure how much results change
    ->
summarize the stability of those changes

If coefficient signs, significance decisions, selected variables, predictions, and model rankings remain similar across perturbations, the analysis is more reproducible in the sense ReproStat measures. If they vary substantially, the result may be fragile.
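
The loop above can be sketched in a few lines of base R. This is an illustration of the idea only, independent of ReproStat's internals (the package's actual perturbation and summary logic may differ): bootstrap-resample the rows, refit, and record how often each coefficient keeps the sign of the original fit.

```r
# Minimal base-R sketch of the perturb -> refit -> measure loop
# (illustration only; not ReproStat's implementation).
set.seed(1)
B <- 200
form <- mpg ~ wt + hp + disp
orig_sign <- sign(coef(lm(form, data = mtcars)))   # signs from the original fit

coefs <- matrix(NA_real_, nrow = B, ncol = length(orig_sign),
                dimnames = list(NULL, names(orig_sign)))
for (b in seq_len(B)) {
  idx <- sample(nrow(mtcars), replace = TRUE)      # bootstrap perturbation
  coefs[b, ] <- coef(lm(form, data = mtcars[idx, ]))
}

# Per-coefficient sign agreement with the original fit (1 = perfectly stable)
agree <- sapply(names(orig_sign),
                function(j) mean(sign(coefs[, j]) == orig_sign[j]))
round(agree, 2)
```

A coefficient whose sign agreement is near 1 is stable in the sense ReproStat measures; values well below 1 flag a fragile estimate.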


Quick Start

library(ReproStat)

set.seed(1)

diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

print(diag_obj)
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)
reproducibility_index(diag_obj)
ri_confidence_interval(diag_obj, R = 500, seed = 1)

Main Workflow

1. Fit the diagnostic object

run_diagnostics() is the main entry point. It fits the original model, perturbs the data B times, refits the model, and stores the results for downstream summaries.

diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

2. Inspect individual stability dimensions

coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)

These functions answer different questions:

  • coef_stability(): Do coefficient magnitudes fluctuate a lot?
  • pvalue_stability(): Do significance decisions stay consistent?
  • selection_stability(): Do predictors keep the same sign or selection pattern?
  • prediction_stability(): Do predictions stay similar across perturbations?

3. Summarize with the Reproducibility Index

ri <- reproducibility_index(diag_obj)
ri

The RI is a compact summary of multiple stability components, scaled to 0-100.

4. Visualize the result

oldpar <- par(mfrow = c(2, 2))
plot_stability(diag_obj, "coefficient")
plot_stability(diag_obj, "pvalue")
plot_stability(diag_obj, "selection")
plot_stability(diag_obj, "prediction")
par(oldpar)

Interpreting the Outputs

Perturbation methods

Method        What it tests                      Good when you want to assess
"bootstrap"   resampling variability             ordinary data-driven stability
"subsample"   sample composition sensitivity     robustness to who enters the sample
"noise"       measurement perturbation           sensitivity to noisy recorded values
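
The three schemes can be pictured as three ways of disturbing a data frame. The sketch below is schematic: the function name `perturb` and the arguments `keep_frac` and `noise_frac` are illustrative placeholders, not ReproStat parameters.

```r
# Schematic versions of the three perturbation schemes
# (illustrative names and defaults; not ReproStat's actual arguments).
perturb <- function(d, method = c("bootstrap", "subsample", "noise"),
                    keep_frac = 0.8, noise_frac = 0.05) {
  method <- match.arg(method)
  switch(method,
    # resample rows with replacement: same n, repeated/omitted cases
    bootstrap = d[sample(nrow(d), replace = TRUE), , drop = FALSE],
    # keep a random fraction of rows: tests sample-composition sensitivity
    subsample = d[sample(nrow(d), floor(keep_frac * nrow(d))), , drop = FALSE],
    # jitter numeric columns: tests sensitivity to measurement noise
    noise = {
      num <- vapply(d, is.numeric, logical(1))
      d[num] <- lapply(d[num],
                       function(x) x + rnorm(length(x), sd = noise_frac * sd(x)))
      d
    })
}

nrow(perturb(mtcars, "subsample"))
```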

Reproducibility Index

The RI is best treated as an interpretive summary, not as a hard universal threshold.

RI range   Interpretation
90-100     Highly stable under the chosen perturbation design
70-89      Moderately stable; the overall pattern is fairly dependable
50-69      Mixed stability; some conclusions may be sensitive
< 50       Low stability; investigate model dependence and data fragility

Two important cautions:

  • The RI depends on your perturbation scheme, backend, and tuning choices.
  • RI values are not directly comparable across all backends, especially glmnet, because not all components are defined the same way.

Supported Backends

ReproStat supports four model-fitting backends through the same interface:

Backend    Model family               Notes
"lm"       ordinary least squares     default
"glm"      generalized linear models  use family = ...
"rlm"      robust regression          requires MASS
"glmnet"   penalized regression       requires glmnet

Examples:

# Logistic regression
diag_glm <- run_diagnostics(
  am ~ wt + hp + qsec,
  data = mtcars,
  B = 100,
  family = stats::binomial()
)

# Robust regression
if (requireNamespace("MASS", quietly = TRUE)) {
  diag_rlm <- run_diagnostics(
    mpg ~ wt + hp,
    data = mtcars,
    B = 100,
    backend = "rlm"
  )
}

# Penalized regression
if (requireNamespace("glmnet", quietly = TRUE)) {
  diag_lasso <- run_diagnostics(
    mpg ~ wt + hp + disp + qsec,
    data = mtcars,
    B = 100,
    backend = "glmnet",
    en_alpha = 1
  )
}

Comparing Models with CV Ranking Stability

ReproStat also helps evaluate model-selection stability, not just single-model stability.

cv_ranking_stability() repeatedly runs cross-validation across competing formulas and records how often each model ranks best.

models <- list(
  baseline = mpg ~ wt + hp,
  medium   = mpg ~ wt + hp + disp,
  full     = mpg ~ wt + hp + disp + qsec
)

cv_obj <- cv_ranking_stability(models, mtcars, v = 5, R = 50)
cv_obj$summary
plot_cv_stability(cv_obj, metric = "top1_frequency")
plot_cv_stability(cv_obj, metric = "mean_rank")

This matters when the model with the lowest average error is not the one that most consistently ranks best.
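
The idea behind repeated-CV ranking can be sketched in base R, independently of cv_ranking_stability()'s internals: run v-fold CV R times with fresh fold assignments, rank the candidate models by CV error each time, and count how often each model comes out on top.

```r
# Base-R sketch of repeated-CV ranking (illustration only).
set.seed(1)
models <- list(baseline = mpg ~ wt + hp,
               full     = mpg ~ wt + hp + disp + qsec)
v <- 5; R <- 20
top1 <- setNames(numeric(length(models)), names(models))

for (r in seq_len(R)) {
  # fresh random fold assignment each repetition
  folds <- sample(rep(seq_len(v), length.out = nrow(mtcars)))
  err <- sapply(models, function(f) {
    mean(sapply(seq_len(v), function(k) {
      fit <- lm(f, data = mtcars[folds != k, ])
      mean((mtcars$mpg[folds == k] - predict(fit, mtcars[folds == k, ]))^2)
    }))
  })
  best <- names(which.min(err))
  top1[best] <- top1[best] + 1
}

top1 / R   # top-1 frequency per model across repetitions
```

A model that wins nearly every repetition is a stable choice; a near 50/50 split suggests the selection itself is sensitive to how the folds fall.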


Example: End-to-End Analysis

library(ReproStat)

set.seed(42)

diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

# Individual summaries
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)$mean_variance

# Composite summary
ri <- reproducibility_index(diag_obj)
cat(sprintf("RI = %.1f\n", ri$index))
ri_confidence_interval(diag_obj, R = 500, seed = 42)

# Visuals
oldpar <- par(mfrow = c(2, 2))
plot_stability(diag_obj, "coefficient")
plot_stability(diag_obj, "pvalue")
plot_stability(diag_obj, "selection")
plot_stability(diag_obj, "prediction")
par(oldpar)

Learn More on the pkgdown Site

The pkgdown site is organized for different user needs:

  • Get started: a narrative introduction and first analysis
  • Articles: interpretation guidance, backend choices, and workflow patterns
  • Reference: individual function documentation
  • Changelog: package evolution over releases

Key pages:

  • vignette("ReproStat-intro")
  • the interpretation article
  • the backend guide
  • the workflow patterns article

Citation

If you use ReproStat in published work, cite the package and any associated manuscript or software record that accompanies the release you used.


License

GPL (>= 3)

About

R package for diagnosing reproducibility of statistical model outputs under bootstrap, subsample, and noise perturbations.
