Running copyKAT on large datasets #138

Dear all,
I am trying to integrate scRNA-seq datasets (2 million cells). For the copy number variation step I am using copyKAT, which, as you know, can take a long time to run, especially when it is run per dataset.
I started running it per sample, as recommended in the tutorial. I was also wondering whether it would be acceptable to subsample cells after clustering and then use transfer learning for label transfer to speed up the process.
That is, cells are first clustered, a representative subset of cells from each cluster is run through copyKAT, and the resulting copy number labels are then transferred from the annotated cells to the remaining cells.
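To make the idea concrete, here is a minimal Python sketch of what I have in mind. It is not copyKAT code and not a tested pipeline. The function names (`subsample_per_cluster`, `transfer_labels`) and the parameters are hypothetical, the embedding is assumed to be something like PCA coordinates from the integrated object, and the label transfer is a plain k-nearest-neighbour majority vote that stands in for whatever transfer method would actually be used.

```python
import numpy as np


def subsample_per_cluster(clusters, n_per_cluster, seed=0):
    """Pick up to n_per_cluster random cells from every cluster (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    clusters = np.asarray(clusters)
    picked = []
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        k = min(n_per_cluster, idx.size)  # small clusters are taken whole
        picked.append(rng.choice(idx, size=k, replace=False))
    return np.sort(np.concatenate(picked))


def transfer_labels(embedding, labeled_idx, labels, n_neighbors=15, chunk=1000):
    """Majority vote over the k nearest labeled cells in the embedding.

    labels[i] is the copy number label of cell labeled_idx[i]
    (for example the aneuploid/diploid call obtained on the subset).
    Returns the predicted label and the vote fraction for every cell.
    """
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    ref = embedding[labeled_idx]
    classes, codes = np.unique(labels, return_inverse=True)
    k = min(n_neighbors, ref.shape[0])
    ref_sq = (ref ** 2).sum(axis=1)

    predicted = np.empty(embedding.shape[0], dtype=classes.dtype)
    confidence = np.empty(embedding.shape[0])
    # Work in chunks so the distance matrix stays small for millions of cells.
    for start in range(0, embedding.shape[0], chunk):
        block = embedding[start:start + chunk]
        # Squared Euclidean distances, shape (cells in block, labeled cells).
        d2 = (block ** 2).sum(axis=1)[:, None] + ref_sq[None, :] - 2.0 * block @ ref.T
        nn = np.argpartition(d2, k - 1, axis=1)[:, :k]
        votes = np.stack([(codes[nn] == j).sum(axis=1) for j in range(classes.size)], axis=1)
        predicted[start:start + chunk] = classes[votes.argmax(axis=1)]
        confidence[start:start + chunk] = votes.max(axis=1) / k
    return predicted, confidence
```

The vote fraction could be used to flag cells with ambiguous neighbourhoods, so that only those are rerun through copyKAT directly.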
Alternatively, would it make sense to apply Metacell-2 to aggregate cells, reducing noise and the number of data points, before running copyKAT?
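For the aggregation route, the sketch below shows the kind of input I would end up passing to copyKAT. It does not use the Metacell-2 API. It only sums raw counts within groups that are assumed to be already assigned (metacells or fine clusters), and `aggregate_counts` is a hypothetical helper name.

```python
import numpy as np


def aggregate_counts(counts, groups):
    """Sum raw counts (cells x genes) within each group.

    Returns the sorted group ids and a (groups x genes) matrix of summed counts.
    """
    counts = np.asarray(counts)
    group_ids, codes = np.unique(np.asarray(groups), return_inverse=True)
    summed = np.zeros((group_ids.size, counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add, so rows that share a group accumulate correctly.
    np.add.at(summed, codes, counts)
    return group_ids, summed
```

As far as I understand, copyKAT expects a genes-by-cells raw count matrix, so the result would need to be transposed, with one column per aggregated group.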

Which of these approaches do you think would speed up the process? If you have any other suggestions, please let me know.
