Deconvolution with Iterative Completion for Estimating cellular Proportions from RNA-seq data.
Bulk RNA-seq deconvolution infers the proportions of distinct cell types from a mixed gene expression profile. Most existing methods assume that the reference signature matrix is complete — i.e., that every cell population present in the bulk sample is represented. In practice this assumption rarely holds, leading to biased estimates.
dicepro addresses this limitation through an iterative joint optimisation that simultaneously:
- estimates cell-type proportions for known populations (supervised step, via CIBERSORTx, FARDEEP, DCQ, CDSeq, or BayesPrism), and
- discovers and quantifies unknown populations using Non-Negative Matrix Factorisation (NMF) with L-BFGS-B optimisation (unsupervised step).
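The bias caused by an incomplete reference can be illustrated with a small base-R sketch. This is not dicepro's algorithm: the helper `fit_props` is hypothetical, the data are simulated, and the fit is a plain non-negative least-squares problem solved with `optim(method = "L-BFGS-B")`. It only shows that, once a cell type is missing from the reference, its share is pushed onto the known populations.

```r
set.seed(1)

n_genes <- 50
# Toy signature matrix: genes x 3 cell types (all values positive)
W_full <- matrix(rexp(n_genes * 3, rate = 1 / 10), nrow = n_genes,
                 dimnames = list(NULL, c("A", "B", "C")))
h_true <- c(A = 0.5, B = 0.3, C = 0.2)   # true proportions
b <- as.vector(W_full %*% h_true)        # noiseless bulk profile

# Hypothetical helper: non-negative least squares via L-BFGS-B,
# followed by rescaling so the estimates sum to 1.
fit_props <- function(W, b) {
  loss <- function(h) sum((b - W %*% h)^2)
  grad <- function(h) as.vector(-2 * t(W) %*% (b - W %*% h))
  fit <- optim(rep(1 / ncol(W), ncol(W)), loss, grad,
               method = "L-BFGS-B", lower = 0)
  h <- fit$par
  setNames(h / sum(h), colnames(W))
}

h_complete   <- fit_props(W_full, b)                 # all three types known
h_incomplete <- fit_props(W_full[, c("A", "B")], b)  # type C missing

round(h_complete, 3)
round(h_incomplete, 3)
```

With the full reference the estimates should land close to `h_true`; with type C removed, the two remaining proportions are forced to sum to 1 even though their true total is 0.8.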
Key features
- Incomplete-reference robustness: recovers cell types absent from the reference matrix.
- Method-agnostic supervised step: plug in any supported deconvolution backend.
- Automated hyperparameter search: random search over a log-uniform grid with Pareto-optimal selection.
- Bundled benchmark data: BlueCode (34-cell-type reference) and CellMixtures (12 experimentally mixed bulk samples) included.
- Rich diagnostics: Pareto plot and hyperparameter scatter matrix saved automatically to output_path/report/.
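The idea behind a log-uniform random search with Pareto-optimal selection can be sketched in base R as follows. Everything here is illustrative: the bounds `1e-3` to `1e3`, the two toy objective formulas, and the helper `is_dominated` are assumptions, and dicepro's actual search space and objectives may differ.

```r
set.seed(42)

n_trials <- 200
# Log-uniform sampling: uniform on the log10 scale, then exponentiate.
# The bounds 1e-3 and 1e3 are illustrative assumptions.
trials <- data.frame(
  lambda = 10^runif(n_trials, min = -3, max = 3),
  gamma  = 10^runif(n_trials, min = -3, max = 3)
)

# Hypothetical stand-ins for the two quantities being traded off.
trials$loss       <- (log10(trials$lambda) - 1)^2 + 0.1 * log10(trials$gamma)^2
trials$constraint <- (log10(trials$lambda) + 1)^2 + 0.1 * log10(trials$gamma)^2

# A trial is dominated if another trial is at least as good on both
# objectives and strictly better on at least one (both are minimised).
is_dominated <- function(i, df) {
  any(df$loss <= df$loss[i] & df$constraint <= df$constraint[i] &
      (df$loss < df$loss[i] | df$constraint < df$constraint[i]))
}
trials$pareto <- !vapply(seq_len(n_trials), is_dominated, logical(1), df = trials)

# Non-dominated trials, ordered along the front
pareto_front <- trials[trials$pareto, ]
pareto_front[order(pareto_front$loss), c("lambda", "gamma", "loss", "constraint")]
```

Sampling on the log scale spreads trials evenly across orders of magnitude, which suits regularisation-type hyperparameters whose useful range is rarely known in advance.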
install.packages("dicepro")

# install.packages("remotes")
remotes::install_github("kalidouBA/dicepro")

library(dicepro)
set.seed(2101)
# 1. Simulate reference, proportions, and noisy bulk
sim <- simulation(
  scenario   = "hierarchical",
  nSample    = 30,
  nGenes     = 200,
  nCellsType = 10,
  sigma_bio  = 0.07,
  sigma_tech = 0.07
)
# 2. Run dicepro
out <- dicepro(
  reference             = as.matrix(sim$W),
  bulk                  = as.matrix(sim$B),
  methodDeconv          = "FARDEEP",
  bulkName              = "SimBulk",
  refName               = "SimRef",
  hp_max_evals          = 100L,
  hspaceTechniqueChoose = "all",
  output_path           = tempdir()
)
# 3. Inspect results
class(out)
out$hyperparameters # best lambda / gamma
head(out$H) # estimated proportions
out$plot # interactive Pareto plot
out$plot_hyperopt # hyperparameter scatter matrix

library(dicepro)
data(BlueCode) # 34-cell-type reference (G x 34)
data(CellMixtures) # 12 mixed bulk samples (G x 12)
out <- dicepro(
  reference             = BlueCode,
  bulk                  = CellMixtures,
  methodDeconv          = "FARDEEP",
  bulkName              = "CellMixtures",
  refName               = "BlueCode",
  hp_max_evals          = 100L,
  hspaceTechniqueChoose = "all",
  output_path           = tempdir()
)
head(out$H)

CIBERSORTx (methodDeconv = "CSx") requires Docker and a personal token.
Step 1: Install Docker Desktop
Download from https://www.docker.com/products/docker-desktop, open it, log in, then pull the CIBERSORTx image from a terminal:
docker pull cibersortx/fractions

Step 2: Obtain a token
Request a token at https://cibersortx.stanford.edu/getoken.php. Tokens are tied to your account and expire periodically; request a new one when the existing token has expired.

Step 3: Run dicepro with CIBERSORTx
out <- dicepro(
  reference        = BlueCode,
  bulk             = CellMixtures,
  methodDeconv     = "CSx",
  cibersortx_email = "your@email.com",
  cibersortx_token = "your_token_here",
  bulkName         = "CellMixtures",
  refName          = "BlueCode",
  output_path      = tempdir()
)

Other supported deconvolution backends can be listed with ?running_method.
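Returning to the CIBERSORTx credentials used in Step 3: rather than hard-coding the email and token in a script, one option is to read them from environment variables. The sketch below is an assumption-laden illustration, not part of dicepro: the helper `get_cibersortx_credentials` and the variable names `CIBERSORTX_EMAIL` and `CIBERSORTX_TOKEN` are hypothetical.

```r
# Hypothetical helper: read CIBERSORTx credentials from environment
# variables (e.g. set in ~/.Renviron) instead of hard-coding them.
get_cibersortx_credentials <- function() {
  email <- Sys.getenv("CIBERSORTX_EMAIL", unset = "")
  token <- Sys.getenv("CIBERSORTX_TOKEN", unset = "")
  if (!nzchar(email) || !nzchar(token)) {
    stop("Set CIBERSORTX_EMAIL and CIBERSORTX_TOKEN before using methodDeconv = \"CSx\".")
  }
  list(email = email, token = token)
}

# Demonstration with placeholder values only
Sys.setenv(CIBERSORTX_EMAIL = "your@email.com",
           CIBERSORTX_TOKEN = "your_token_here")
creds <- get_cibersortx_credentials()
names(creds)
```

The values can then be supplied as `cibersortx_email = creds$email` and `cibersortx_token = creds$token` in the `dicepro()` call, which keeps the token out of version control.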
dicepro() returns an S3 object of class "dicepro" with the following elements:

| Element | Description |
|---|---|
| $hyperparameters | Best hyperparameters (lambda, gamma) |
| $metrics | Loss and constraint values at the optimum |
| $trials | data.frame of all evaluated hyperparameter configurations |
| $W | Optimised reference matrix (including unknown cell types) |
| $H | Estimated cell-type proportions (samples x cell types) |
| $plot | Pareto frontier plot (interactive) |
| $plot_hyperopt | Hyperparameter scatter matrix (ggplot2) |
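As a hedged illustration of how the proportions element might be inspected, the sketch below uses a mock matrix with the documented orientation (samples x cell types). The sample names, cell-type names, and values are invented, and whether the rows of a real `out$H` sum exactly to 1 is an assumption to verify on your own results.

```r
# Mock stand-in for out$H: 3 samples x 3 cell types (values invented)
H <- matrix(c(0.6, 0.3, 0.1,
              0.2, 0.5, 0.3,
              0.1, 0.2, 0.7),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("sample", 1:3),
                            c("Tcell", "Bcell", "Unknown")))

# Row sums show whether each sample's proportions add up to 1
row_totals <- rowSums(H)

# Most abundant cell type per sample
dominant <- colnames(H)[max.col(H, ties.method = "first")]
names(dominant) <- rownames(H)

row_totals
dominant
```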
BlueCode is a gene x 34 cell-type reference signature matrix derived from sorted bulk RNA-seq profiles spanning five major tissue compartments: Immune (9), Stromal (8), Endothelial (3), Epithelial (5), and Muscle (9).
data(BlueCode)
dim(BlueCode)
colnames(BlueCode)

CellMixtures is a gene x 12 bulk RNA-seq matrix of experimentally constructed cell mixtures (samples A–L), paired with BlueCode for benchmarking.
data(CellMixtures)
dim(CellMixtures)
colnames(CellMixtures)

See ?BlueCode and ?CellMixtures for full documentation.
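Because both bundled objects are gene x column matrices, it can help to confirm that a reference and a bulk matrix share gene identifiers in their row names before deconvolution. The sketch below uses small invented matrices; whether dicepro() performs this alignment internally is not stated here, so treat it as an optional pre-check.

```r
# Toy matrices standing in for a reference (genes x cell types)
# and a bulk matrix (genes x samples); all values are invented.
ref  <- matrix(1:12, nrow = 4,
               dimnames = list(c("G1", "G2", "G3", "G4"), c("ct1", "ct2", "ct3")))
bulk <- matrix(1:10, nrow = 5,
               dimnames = list(c("G2", "G3", "G4", "G5", "G6"), c("s1", "s2")))

# Keep only genes present in both, in the same order
common <- intersect(rownames(ref), rownames(bulk))
ref_aligned  <- ref[common, , drop = FALSE]
bulk_aligned <- bulk[common, , drop = FALSE]

length(common)
identical(rownames(ref_aligned), rownames(bulk_aligned))
```

Using `drop = FALSE` keeps both results as matrices even if only one gene is shared.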
Two vignettes provide step-by-step walkthroughs:
vignette("vignette-simulation", package = "dicepro")
vignette("vignette-real-data", package = "dicepro")

If you use dicepro in your research, please cite:
*When Less Is Not More: Mitigates the Impact of Incomplete Reference Matrices on Cellular Frequency Deconvolution.* Bioinformatics. doi:XXXX