Tutorials

See examples for:

  • scRNA unseen gene perturbation prediction

  • scATAC unseen gene perturbation prediction

Tips

Prepare a test CSV template that specifies the unseen genes to predict and the number of cells to generate for each. The file must follow this format:

  • It must contain exactly two columns: target_gene and n_cells.

  • The target_gene column lists the names of the perturbation genes (unseen during training). It does not need to include “control”, because the inference pipeline always uses real, experimentally observed control cells rather than synthetic controls generated by the model.

  • The n_cells column indicates the number of cells to generate for each corresponding target gene.

Ensure the CSV follows this structure precisely to avoid errors during training or prediction.

target_gene      n_cells
non-targeting    38176
PRCP             4331
MED13            2787
USP22            2491
CAST             2252
PTPN1            2195
……               ……

  • A non-targeting entry (control cells) is optional in target_gene and will be ignored if present.
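The format rules above can be checked before running inference. The following is a minimal sketch using only the Python standard library; the helper name validate_template and its control_label argument are hypothetical and are not part of DeSCOPE. It assumes the control label is "non-targeting", as in the example table.

```python
import csv
import io

REQUIRED_COLUMNS = {"target_gene", "n_cells"}


def validate_template(csv_text, control_label="non-targeting"):
    """Hypothetical helper: check the two-column layout and return
    a {target_gene: n_cells} mapping with the control row dropped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # The template must contain exactly the two required columns.
    if len(header) != 2 or set(header) != REQUIRED_COLUMNS:
        raise ValueError(f"expected columns {sorted(REQUIRED_COLUMNS)}, got {header}")

    plan = {}
    for row in reader:
        gene = row["target_gene"].strip()
        n_cells = int(row["n_cells"])  # raises ValueError on non-integer text
        if n_cells <= 0:
            raise ValueError(f"n_cells must be positive for {gene!r}")
        if gene == control_label:
            # Control rows are optional and ignored, as described above.
            continue
        plan[gene] = n_cells
    return plan


template = "target_gene,n_cells\nnon-targeting,38176\nPRCP,4331\nMED13,2787\n"
plan = validate_template(template)
print(plan)
```

To check a file on disk, read its text first and pass it to the helper.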

Additionally, please ensure that all datasets use the same control cell label, as required by DeSCOPE_LOO for correct execution.

Evaluation Metrics Calculation for Single-Cell Perturbation Prediction

from descope.inference import CellEvalMixin

eval_mixin = CellEvalMixin()

You can inspect all built-in evaluation metrics supported by cell-eval via eval_mixin.all_metrics.

print(eval_mixin.all_metrics)

['pearson_delta',
 'mse',
 'mae',
 'mse_delta',
 'mae_delta',
 'discrimination_score_l1',
 'discrimination_score_l2',
 'discrimination_score_cosine',
 'pearson_edistance',
 'overlap_at_N',
 'overlap_at_50',
 'overlap_at_100',
 'overlap_at_200',
 'overlap_at_500',
 'precision_at_N',
 'precision_at_50',
 'precision_at_100',
 'precision_at_200',
 'precision_at_500',
 'de_spearman_sig',
 'de_direction_match',
 'de_spearman_lfc_sig',
 'de_sig_genes_recall',
 'de_nsig_counts',
 'pr_auc',
 'roc_auc',
 'clustering_agreement']

The eval_mixin.all_metrics attribute contains two categories of evaluation metrics:

  • DE-based (differential expression) metrics (eval_mixin.all_de_metrics)

  • anndata-pair-based metrics (eval_mixin.all_anndata_pair_metrics)

You can inspect each category individually:

print(eval_mixin.all_de_metrics)

['overlap_at_N',
 'overlap_at_50',
 'overlap_at_100',
 'overlap_at_200',
 'overlap_at_500',
 'precision_at_N',
 'precision_at_50',
 'precision_at_100',
 'precision_at_200',
 'precision_at_500',
 'de_spearman_sig',
 'de_direction_match',
 'de_spearman_lfc_sig',
 'de_sig_genes_recall',
 'de_nsig_counts',
 'pr_auc',
 'roc_auc']


print(eval_mixin.all_anndata_pair_metrics)

['pearson_delta',
 'mse',
 'mae',
 'mse_delta',
 'mae_delta',
 'discrimination_score_l1',
 'discrimination_score_l2',
 'discrimination_score_cosine',
 'pearson_edistance',
 'clustering_agreement']

Select the desired metrics from eval_mixin.all_metrics, place them in a list, and pass that list to the .compute_metrics() method.

results, agg_results, evaluator = eval_mixin.compute_metrics(
    adata_pred=...,
    adata_real=...,
    metrics_to_calculate=...,  # Select the desired metrics from eval_mixin.all_metrics
    de_pred=None,
    de_real=None,
    control_pert=...,
    pert_col=...,
    de_method="wilcoxon",
    num_threads=32,
    outdir="./cell-eval-outdir"
)
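Before passing a list as metrics_to_calculate, it can help to confirm that every name is supported and to see which category it belongs to. The sketch below is illustrative only: the metric names are copied from the printed output above so that the snippet runs without descope installed, and split_by_category is a hypothetical helper, not part of CellEvalMixin. In practice the two lists would be read from eval_mixin.all_de_metrics and eval_mixin.all_anndata_pair_metrics.

```python
# Names copied from the printed lists in this section (an assumption that
# they match the installed version).
ALL_DE_METRICS = [
    "overlap_at_N", "overlap_at_50", "overlap_at_100", "overlap_at_200",
    "overlap_at_500", "precision_at_N", "precision_at_50", "precision_at_100",
    "precision_at_200", "precision_at_500", "de_spearman_sig",
    "de_direction_match", "de_spearman_lfc_sig", "de_sig_genes_recall",
    "de_nsig_counts", "pr_auc", "roc_auc",
]
ALL_ANNDATA_PAIR_METRICS = [
    "pearson_delta", "mse", "mae", "mse_delta", "mae_delta",
    "discrimination_score_l1", "discrimination_score_l2",
    "discrimination_score_cosine", "pearson_edistance", "clustering_agreement",
]


def split_by_category(selected):
    """Hypothetical helper: reject unknown names, then split the selection
    into (DE-based, anndata-pair-based) lists, preserving order."""
    known = set(ALL_DE_METRICS) | set(ALL_ANNDATA_PAIR_METRICS)
    unknown = [m for m in selected if m not in known]
    if unknown:
        raise ValueError(f"unsupported metrics: {unknown}")
    de_part = [m for m in selected if m in ALL_DE_METRICS]
    pair_part = [m for m in selected if m in ALL_ANNDATA_PAIR_METRICS]
    return de_part, pair_part


metrics_to_calculate = ["pearson_delta", "mse", "overlap_at_N", "roc_auc"]
de_part, pair_part = split_by_category(metrics_to_calculate)
print(de_part, pair_part)
```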

(Optional) To extend the evaluation capabilities of cell-eval, we have added additional metrics to CellEvalMixin, including pearson, pearson_delta_on_topk_de, direction_match_on_topk_de, and edistance.

# pearson
results, mean_results = eval_mixin.extra_metrics_func.pearson(evaluator.anndata_pair)

# pearson_delta_on_topk_de
results, mean_results = eval_mixin.extra_metrics_func.pearson_delta_on_topk_de(
    data=evaluator.anndata_pair,
    de_real=...,
    topk=...
)

# direction_match_on_topk_de
## separate_up_down_regulated=False
results, mean_results = eval_mixin.extra_metrics_func.direction_match_on_topk_de(
    data=evaluator.anndata_pair,
    de_real=...,
    topk=...,
    separate_up_down_regulated=False
)

## separate_up_down_regulated=True
up_results, down_results, mean_up_results, mean_down_results = eval_mixin.extra_metrics_func.direction_match_on_topk_de(
    data=evaluator.anndata_pair,
    de_real=...,
    topk=...,
    separate_up_down_regulated=True
)

# edistance
results = eval_mixin.extra_metrics_func.edistance(
    adata=...,
    control_pert=...,
    pert_col=...,
    metric="euclidean"
)
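For intuition about what an energy distance with a Euclidean metric measures, here is a minimal pure-Python sketch of one common definition: twice the mean cross-group distance minus the mean within-group distance of each group. This is an assumption for illustration; how CellEvalMixin computes edistance internally is not shown in this section, and implementations differ (for example, some use squared Euclidean distances). The function names are hypothetical.

```python
import math
from itertools import product


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (x, y) with x in a, y in b."""
    total = sum(math.dist(x, y) for x, y in product(a, b))
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """One common definition: 2*E|X-Y| - E|X-X'| - E|Y-Y'|.
    Self-pairs (distance 0) are included in the within-group means."""
    return (
        2.0 * mean_pairwise_distance(x, y)
        - mean_pairwise_distance(x, x)
        - mean_pairwise_distance(y, y)
    )


# Toy "cells" in a 2-D feature space (made-up values, not real data).
control = [[0.0, 0.0], [1.0, 0.0]]
perturbed = [[3.0, 4.0], [4.0, 4.0]]

print(energy_distance(control, perturbed))
print(energy_distance(control, control))  # identical groups give 0.0
```

Under this definition the value is zero for identical groups and grows as the perturbed population moves away from the control population.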