Tutorials

scRNA unseen gene perturbation prediction (RNA_Gene Folder)
scRNA unseen drug perturbation prediction (RNA_Drug Folder)
scATAC unseen gene perturbation prediction (ATAC_Gene Folder)

Tips

Before running inference, prepare a test CSV template that specifies the unseen genes to predict and the number of cells to generate for each. The file must follow this format exactly:

It must contain exactly two columns: target_gene and n_cells.
The target_gene column lists the names of the perturbation genes (unseen during training). It does not need to include “control”, because the inference pipeline always uses real, experimentally observed control cells rather than synthetic controls generated by the model.
The n_cells column indicates the number of cells to generate for each corresponding target gene.

Ensure the CSV follows this structure precisely to avoid errors during training or prediction.

target_gene	n_cells
non-targeting	38176
PRCP	4331
MED13	2787
USP22	2491
CAST	2252
PTPN1	2195
……	……

The non-targeting entry (control cells) is optional in target_gene and is ignored if present.
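
The format rules above can be checked before running inference. The following is a minimal sketch using only the Python standard library; validate_template is a hypothetical helper (not part of DeSCOPE), and it assumes a comma-delimited file and non-targeting as the control label.

```python
import csv
import io

# The two column names required by the template, in order.
REQUIRED_COLUMNS = ["target_gene", "n_cells"]


def validate_template(csv_text, control_label="non-targeting"):
    """Hypothetical helper: parse template text and return {gene: n_cells}.

    The control label is skipped, mirroring the note that it is ignored.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        raise ValueError(
            f"expected columns {REQUIRED_COLUMNS}, got {reader.fieldnames}"
        )
    targets = {}
    for row in reader:
        gene = (row["target_gene"] or "").strip()
        if not gene:
            raise ValueError("empty target_gene entry")
        try:
            n_cells = int(row["n_cells"])
        except (TypeError, ValueError):
            raise ValueError(f"n_cells for {gene!r} is not an integer") from None
        if n_cells <= 0:
            raise ValueError(f"n_cells for {gene!r} must be positive")
        if gene == control_label:
            continue  # optional control entry, ignored
        if gene in targets:
            raise ValueError(f"duplicate target_gene {gene!r}")
        targets[gene] = n_cells
    return targets


# Example text using rows from the table above (comma-delimited by assumption).
template_text = "target_gene,n_cells\nnon-targeting,38176\nPRCP,4331\nMED13,2787\n"
targets = validate_template(template_text)
print(targets)
```

For a file on disk, read its text first (for example with open(path).read()) and pass it to the helper.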

Additionally, please ensure that all datasets use the same control cell label, as required by DeSCOPE_LOO for correct execution.
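
A quick consistency check for the control label might look like the sketch below. It is illustrative only: datasets_missing_control is a hypothetical helper, the dataset names are made up, and the label lists stand in for whatever perturbation column your datasets actually use.

```python
def datasets_missing_control(labels_by_dataset, control_label):
    """Hypothetical helper: names of datasets whose labels lack control_label."""
    return sorted(
        name
        for name, labels in labels_by_dataset.items()
        if control_label not in set(labels)
    )


# Made-up example: the second dataset uses a different control label.
labels_by_dataset = {
    "dataset_a": ["non-targeting", "PRCP", "MED13"],
    "dataset_b": ["control", "USP22"],
}
missing = datasets_missing_control(labels_by_dataset, "non-targeting")
print(missing)
```

Any dataset reported here would need its control cells relabelled before use.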

Evaluation Metrics Calculation for Single-Cell Perturbation Prediction

from descope.inference import CellEvalMixin

eval_mixin = CellEvalMixin()

You can inspect all built-in evaluation metrics supported by cell-eval via eval_mixin.all_metrics.

print(eval_mixin.all_metrics)

['pearson_delta',
 'mse',
 'mae',
 'mse_delta',
 'mae_delta',
 'discrimination_score_l1',
 'discrimination_score_l2',
 'discrimination_score_cosine',
 'pearson_edistance',
 'overlap_at_N',
 'overlap_at_50',
 'overlap_at_100',
 'overlap_at_200',
 'overlap_at_500',
 'precision_at_N',
 'precision_at_50',
 'precision_at_100',
 'precision_at_200',
 'precision_at_500',
 'de_spearman_sig',
 'de_direction_match',
 'de_spearman_lfc_sig',
 'de_sig_genes_recall',
 'de_nsig_counts',
 'pr_auc',
 'roc_auc',
 'clustering_agreement']

The eval_mixin.all_metrics attribute contains two categories of evaluation metrics:

DE-based (differential expression) metrics (eval_mixin.all_de_metrics)
AnnData-pair-based metrics (eval_mixin.all_anndata_pair_metrics)

You can inspect each category individually:

print(eval_mixin.all_de_metrics)

['overlap_at_N',
 'overlap_at_50',
 'overlap_at_100',
 'overlap_at_200',
 'overlap_at_500',
 'precision_at_N',
 'precision_at_50',
 'precision_at_100',
 'precision_at_200',
 'precision_at_500',
 'de_spearman_sig',
 'de_direction_match',
 'de_spearman_lfc_sig',
 'de_sig_genes_recall',
 'de_nsig_counts',
 'pr_auc',
 'roc_auc']


print(eval_mixin.all_anndata_pair_metrics)

['pearson_delta',
 'mse',
 'mae',
 'mse_delta',
 'mae_delta',
 'discrimination_score_l1',
 'discrimination_score_l2',
 'discrimination_score_cosine',
 'pearson_edistance',
 'clustering_agreement']

Select the desired metrics from eval_mixin.all_metrics, place them in a list, and pass that list to the .compute_metrics() method.

results, agg_results, evaluator = eval_mixin.compute_metrics(
    adata_pred=...,
    adata_real=...,
    metrics_to_calculate=...,  # Select the desired metrics from eval_mixin.all_metrics
    de_pred=None,
    de_real=None,
    control_pert=...,
    pert_col=...,
    de_method="wilcoxon",
    num_threads=32,
    outdir="./cell-eval-outdir"
)
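
Before calling .compute_metrics(), it can help to check the selected names against the available ones. The sketch below is a hypothetical helper, not part of CellEvalMixin; it takes the two category lists as plain arguments, so in practice you would pass eval_mixin.all_de_metrics and eval_mixin.all_anndata_pair_metrics. The short lists here are excerpts of the outputs shown above.

```python
def split_selection(requested, de_metrics, pair_metrics):
    """Hypothetical helper: validate metric names and group them by category."""
    known = set(de_metrics) | set(pair_metrics)
    unknown = [name for name in requested if name not in known]
    if unknown:
        raise ValueError(f"unknown metrics: {unknown}")
    de_selected = [name for name in requested if name in de_metrics]
    pair_selected = [name for name in requested if name in pair_metrics]
    return de_selected, pair_selected


# Excerpts of the category lists printed earlier in this section.
de_metrics = ["overlap_at_N", "de_direction_match", "pr_auc"]
pair_metrics = ["pearson_delta", "mse", "mae"]

requested = ["pearson_delta", "mse", "de_direction_match"]
de_selected, pair_selected = split_selection(requested, de_metrics, pair_metrics)
print(de_selected, pair_selected)
```

The validated requested list is what goes into metrics_to_calculate; a misspelled name fails here rather than partway through evaluation.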

(Optional) CellEvalMixin extends cell-eval with four extra metrics: pearson, pearson_delta_on_topk_de, direction_match_on_topk_de, and edistance. The first three take the evaluator.anndata_pair object returned by .compute_metrics().

# pearson
results, mean_results = eval_mixin.extra_metrics_func.pearson(evaluator.anndata_pair)

# pearson_delta_on_topk_de
results, mean_results = eval_mixin.extra_metrics_func.pearson_delta_on_topk_de(
    data=evaluator.anndata_pair,
    de_real=...,
    topk=...
)

# direction_match_on_topk_de
## separate_up_down_regulated=False
results, mean_results = eval_mixin.extra_metrics_func.direction_match_on_topk_de(
    data=evaluator.anndata_pair,
    de_real=...,
    topk=...,
    separate_up_down_regulated=False
)

## separate_up_down_regulated=True
up_results, down_results, mean_up_results, mean_down_results = eval_mixin.extra_metrics_func.direction_match_on_topk_de(
    data=evaluator.anndata_pair,
    de_real=...,
    topk=...,
    separate_up_down_regulated=True
)

# edistance
results = eval_mixin.extra_metrics_func.edistance(
    adata=...,
    control_pert=...,
    pert_col=...,
    metric="euclidean"
)