Tutorials
See examples for:
scRNA unseen gene perturbation prediction
scATAC unseen gene perturbation prediction
Tips
Please note that you need to prepare a test CSV template specifying the unseen genes to be predicted and the number of cells to generate for each. This file must strictly adhere to the following format:
It must contain exactly two columns: target_gene and n_cells.
The target_gene column lists the names of the perturbation genes (unseen during training). It does not need to include “control”, as the inference pipeline will always use real, experimentally observed control cells, not synthetic controls generated by the model.
The n_cells column indicates the number of cells to generate for each corresponding target gene.
Ensure the CSV follows this structure precisely to avoid errors during training or prediction.
| target_gene | n_cells |
|---|---|
| non-targeting | 38176 |
| PRCP | 4331 |
| MED13 | 2787 |
| USP22 | 2491 |
| CAST | 2252 |
| PTPN1 | 2195 |
| …… | …… |
The non-targeting entry (control cells) is optional in target_gene and will be ignored if present.
Additionally, please ensure that all datasets use the same control cell label, as required by DeSCOPE_LOO for correct execution.
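The template rules above can be checked before running inference. The following is a minimal sketch using only the Python standard library; `read_template` and its `control_label` parameter are hypothetical helpers for illustration and are not part of DeSCOPE. The checks for positive cell counts and duplicate genes are assumptions beyond what the format description states.

```python
import csv
import io

REQUIRED_COLUMNS = ["target_gene", "n_cells"]


def read_template(text, control_label="non-targeting"):
    """Parse a test CSV template and return {target_gene: n_cells}.

    Rows whose target_gene equals control_label are skipped, mirroring the
    note that control entries are optional and ignored.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        raise ValueError(
            f"expected columns {REQUIRED_COLUMNS}, got {reader.fieldnames}"
        )
    plan = {}
    for row in reader:
        gene = row["target_gene"].strip()
        n_cells = int(row["n_cells"])  # raises ValueError on non-integer text
        if gene == control_label:
            continue  # control rows are optional and ignored
        if n_cells <= 0:  # assumption: a non-positive count is a mistake
            raise ValueError(f"n_cells must be positive for {gene!r}")
        if gene in plan:  # assumption: each gene should appear once
            raise ValueError(f"duplicate target_gene {gene!r}")
        plan[gene] = n_cells
    return plan


template = """target_gene,n_cells
non-targeting,38176
PRCP,4331
MED13,2787
"""
plan = read_template(template)
print(plan)  # {'PRCP': 4331, 'MED13': 2787}
```

To check a file on disk, pass its contents, for example `read_template(open(path).read())`, where `path` points to your own template.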
Evaluation Metrics Calculation for Single-Cell Perturbation Prediction
from descope.inference import CellEvalMixin
eval_mixin = CellEvalMixin()
You can inspect all built-in evaluation metrics supported by cell-eval via eval_mixin.all_metrics.
print(eval_mixin.all_metrics)
['pearson_delta',
'mse',
'mae',
'mse_delta',
'mae_delta',
'discrimination_score_l1',
'discrimination_score_l2',
'discrimination_score_cosine',
'pearson_edistance',
'overlap_at_N',
'overlap_at_50',
'overlap_at_100',
'overlap_at_200',
'overlap_at_500',
'precision_at_N',
'precision_at_50',
'precision_at_100',
'precision_at_200',
'precision_at_500',
'de_spearman_sig',
'de_direction_match',
'de_spearman_lfc_sig',
'de_sig_genes_recall',
'de_nsig_counts',
'pr_auc',
'roc_auc',
'clustering_agreement']
The eval_mixin.all_metrics attribute contains two categories of evaluation metrics:
DE-based metrics (eval_mixin.all_de_metrics)
anndata-pair-based metrics (eval_mixin.all_anndata_pair_metrics)
You can inspect each category individually:
print(eval_mixin.all_de_metrics)
['overlap_at_N',
'overlap_at_50',
'overlap_at_100',
'overlap_at_200',
'overlap_at_500',
'precision_at_N',
'precision_at_50',
'precision_at_100',
'precision_at_200',
'precision_at_500',
'de_spearman_sig',
'de_direction_match',
'de_spearman_lfc_sig',
'de_sig_genes_recall',
'de_nsig_counts',
'pr_auc',
'roc_auc']
print(eval_mixin.all_anndata_pair_metrics)
['pearson_delta',
'mse',
'mae',
'mse_delta',
'mae_delta',
'discrimination_score_l1',
'discrimination_score_l2',
'discrimination_score_cosine',
'pearson_edistance',
'clustering_agreement']
Select the desired metrics from eval_mixin.all_metrics, place them in a list, and pass that list to the .compute_metrics() method.
results, agg_results, evaluator = eval_mixin.compute_metrics(
adata_pred=...,
adata_real=...,
metrics_to_calculate=..., # Select the desired metrics from eval_mixin.all_metrics
de_pred=None,
de_real=None,
control_pert=...,
pert_col=...,
de_method="wilcoxon",
num_threads=32,
outdir="./cell-eval-outdir"
)
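Before building the metrics_to_calculate list, it can help to check the selection against the two categories listed earlier. The sketch below copies the metric names from the printed output above instead of reading them from the library, and `split_selection` is a hypothetical helper, not a DeSCOPE or cell-eval function.

```python
# Metric names copied from the printed listings above (not read from the library).
DE_METRICS = [
    "overlap_at_N", "overlap_at_50", "overlap_at_100", "overlap_at_200",
    "overlap_at_500", "precision_at_N", "precision_at_50", "precision_at_100",
    "precision_at_200", "precision_at_500", "de_spearman_sig",
    "de_direction_match", "de_spearman_lfc_sig", "de_sig_genes_recall",
    "de_nsig_counts", "pr_auc", "roc_auc",
]
ANNDATA_PAIR_METRICS = [
    "pearson_delta", "mse", "mae", "mse_delta", "mae_delta",
    "discrimination_score_l1", "discrimination_score_l2",
    "discrimination_score_cosine", "pearson_edistance", "clustering_agreement",
]


def split_selection(selected):
    """Split selected metric names into (DE-based, anndata-pair-based) lists.

    Raises ValueError if a name is in neither category.
    """
    known = set(DE_METRICS) | set(ANNDATA_PAIR_METRICS)
    unknown = [name for name in selected if name not in known]
    if unknown:
        raise ValueError(f"unknown metrics: {unknown}")
    de_based = [name for name in selected if name in DE_METRICS]
    pair_based = [name for name in selected if name in ANNDATA_PAIR_METRICS]
    return de_based, pair_based


de_based, pair_based = split_selection(["mse", "pearson_delta", "pr_auc"])
print(de_based)    # ['pr_auc']
print(pair_based)  # ['mse', 'pearson_delta']
```

In real use, the two lists would come from eval_mixin.all_de_metrics and eval_mixin.all_anndata_pair_metrics, so the check stays in sync with the installed version.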
(Optional) To extend the evaluation capabilities of cell-eval, we have added additional metrics to CellEvalMixin, including pearson, pearson_delta_on_topk_de, direction_match_on_topk_de, and edistance.
# pearson
results, mean_results = eval_mixin.extra_metrics_func.pearson(evaluator.anndata_pair)
# pearson_delta_on_topk_de
results, mean_results = eval_mixin.extra_metrics_func.pearson_delta_on_topk_de(
data=evaluator.anndata_pair,
de_real=...,
topk=...
)
# direction_match_on_topk_de
## separate_up_down_regulated=False
results, mean_results = eval_mixin.extra_metrics_func.direction_match_on_topk_de(
data=evaluator.anndata_pair,
de_real=...,
topk=...,
separate_up_down_regulated=False
)
## separate_up_down_regulated=True
up_results, down_results, mean_up_results, mean_down_results = eval_mixin.extra_metrics_func.direction_match_on_topk_de(
data=evaluator.anndata_pair,
de_real=...,
topk=...,
separate_up_down_regulated=True
)
# edistance
results = eval_mixin.extra_metrics_func.edistance(
adata=...,
control_pert=...,
pert_col=...,
metric="euclidean"
)
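For intuition about what an energy distance with a Euclidean metric measures, here is a minimal pure-Python sketch of the standard definition, 2·E|X−Y| − E|X−X'| − E|Y−Y'|, with each expectation estimated as a mean over all pairs (self-pairs included). The source does not state which estimator edistance uses, so this illustrates the general concept and is not a reimplementation of the library function; the function names and toy data are made up for the example.

```python
import math
from itertools import product


def mean_pairwise_distance(xs, ys):
    """Mean Euclidean distance over all pairs (x, y), x from xs and y from ys."""
    total = sum(math.dist(x, y) for x, y in product(xs, ys))
    return total / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Energy distance between two samples of points (tuples of floats)."""
    between = mean_pairwise_distance(xs, ys)
    within_x = mean_pairwise_distance(xs, xs)
    within_y = mean_pairwise_distance(ys, ys)
    return 2 * between - within_x - within_y


# Toy one-dimensional "cells": two control cells and two perturbed cells.
control = [(0.0,), (2.0,)]
perturbed = [(1.0,), (1.0,)]
print(energy_distance(control, perturbed))  # 1.0
print(energy_distance(control, control))    # 0.0
```

A larger value means the perturbed population sits further from control relative to the spread within each group; identical samples give zero.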