Tutorial3: Using scMagnifier on spatial transcriptomic datasets

This tutorial demonstrates how to use scMagnifier+STAGATE on spatial transcriptomic data. The workflow consists of data preprocessing, GRN construction, TF perturbation, consensus clustering, and cluster merging. It uses two conda environments: scMagnifier and stagate_scMagnifier.
Here we use a breast cancer dataset (BRCA1) from 10X Genomics Visium as our example.

Step1: Data preprocessing

In this step, we use the stagate_scMagnifier conda environment. The spatial_preprocess() function preprocesses the dataset. First, pass the path to the prepared h5ad file via input_path="". You can then tune the parameters used for STAGATE by setting "alpha=" (default: 0), "stagate_epochs=" (default: 300), and "k_cutoff=" (default: 6). You can also adjust the clustering resolution by setting "resolution=".
For datasets from 10X Genomics Visium, the default parameters are sufficient. For datasets from other platforms, adjust k_cutoff to meet STAGATE's requirements; you may also need to set "spot_size=" for spatial plotting.
See the spatial_preprocess_core.py file for the remaining parameters.
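
To make the role of k_cutoff concrete, the following minimal sketch builds a directed k-nearest-neighbour graph on toy coordinates. It assumes that the spatial graph links each spot to its k_cutoff nearest spots, which is consistent with the log below (22788 edges = 3798 spots × 6) but is not taken from the scMagnifier or STAGATE source code; knn_edges and the square toy grid are hypothetical.

```python
import math


def knn_edges(coords, k):
    """Return directed edges (i, j) linking each spot i to its k nearest spots."""
    edges = []
    for i, p in enumerate(coords):
        # Rank all other spots by Euclidean distance to spot i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


# Toy 5 x 5 square grid of spots (an assumption; Visium uses a hexagonal layout).
coords = [(x, y) for x in range(5) for y in range(5)]
edges = knn_edges(coords, k=6)
print(len(edges), len(edges) / len(coords))
```

With k=6 on 25 spots the sketch yields 150 directed edges, that is, 6 neighbours per spot, mirroring the "neighbors per cell on average" line in the log.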

# stagate_scMagnifier conda environment
from stagate_scMagnifier import spatial_preprocess
spatial_preprocess(input_path="/mnt/disk1/hzh/BRCA1.h5ad", resolution=0.2)
Original data shape: (3798, 36601)
Data shape after filtering blank spots: (3798, 36601)
Starting STAGATE ...
------Calculating spatial graph...
The graph contains 22788 edges, 3798 cells.
6.0000 neighbors per cell on average.
Size of Input:  (3798, 2000)


2026-01-21 10:41:44.392593: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-01-21 10:41:44.759680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22256 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:21:00.0, compute capability: 8.6
2026-01-21 10:41:44.760514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22453 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:81:00.0, compute capability: 8.6
2026-01-21 10:41:44.784478: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:357] MLIR V1 optimization pass is not enabled
  0%|          | 0/300 [00:00<?, ?it/s]
2026-01-21 10:41:44.968677: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:630] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
100%|██████████| 300/300 [00:06<00:00, 43.39it/s]


Preprocessing completed! Results saved to: preprocessed_result/preprocessed.h5ad

After preprocessing, the preprocessed h5ad file and the generated images are available in the preprocessed_result directory.
Below, the clustering results are shown on the UMAP and on the spatial plot.

# You do not need to execute this code
from IPython.display import HTML
import base64

def img_to_base64(img_path):
    with open(img_path, "rb") as f:
        return base64.b64encode(f.read()).decode()

img_path1 = "/mnt/disk1/hzh/Tutorial3/preprocessed_result/umap_leiden.png"
img_path2 = "/mnt/disk1/hzh/Tutorial3/preprocessed_result/spatial_leiden.png"
img1_b64 = img_to_base64(img_path1)
img2_b64 = img_to_base64(img_path2)
HTML(f"""
<div style="display: flex; gap: 20px; scope: local;">
    <img src="data:image/png;base64,{img1_b64}" width="400">
    <img src="data:image/png;base64,{img2_b64}" width="300">
</div>
""")

Step2: GRN construction

In this step, we use the scMagnifier conda environment. This step uses the same GRN() function as the one used for single-batch datasets.

# scMagnifier conda environment
from scMagnifier import GRN
GRN()

After GRN construction, we use the spatial_preperturb() function to extract the GRN required for the subsequent perturbation process.

# scMagnifier conda environment
from scMagnifier import spatial_preperturb
spatial_preperturb()
[INFO] Loading oracle file ...
[INFO] Found coefficient matrix in oracle.coef_matrix_per_cluster
[SAVED] GRN coefficients saved to GRN/celloracle_grn_coef.npz
[DONE] Spatial preperturb process completed

Step3: TF perturbation

In this step, we use the stagate_scMagnifier conda environment. The spatial_perturb() function simulates the propagation of gene perturbations. If you customized the resolution parameter during preprocessing, set resolution="" to the same value here. You can also adjust the perturbation multiplier with the multiple="" parameter. If you changed alpha, stagate_epochs, k_cutoff, or spot_size during preprocessing, pass the same values here to stay consistent.
See the spatial_perturb_core.py file for the remaining parameters.
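
One way to keep these settings consistent across steps is to define them once and reuse them. The sketch below only builds the keyword arguments and does not call scMagnifier; the helper build_kwargs and the variable names are hypothetical, while the parameter names and default values are those listed in this tutorial.

```python
# Values shared by spatial_preprocess() and spatial_perturb(), as listed in this tutorial.
STAGATE_PARAMS = {"alpha": 0, "stagate_epochs": 300, "k_cutoff": 6}
RESOLUTION = 0.2   # clustering resolution used during preprocessing
SPOT_SIZE = None   # None leaves spot_size unset (Visium); set a number for other platforms


def build_kwargs(base, spot_size=None, **extra):
    """Hypothetical helper: merge shared settings with step-specific ones."""
    kwargs = dict(base)
    kwargs.update(extra)
    if spot_size is not None:
        kwargs["spot_size"] = spot_size
    return kwargs


preprocess_kwargs = build_kwargs(
    STAGATE_PARAMS, SPOT_SIZE,
    input_path="/mnt/disk1/hzh/BRCA1.h5ad", resolution=RESOLUTION,
)
perturb_kwargs = build_kwargs(STAGATE_PARAMS, SPOT_SIZE, resolution=RESOLUTION)

# In the stagate_scMagnifier environment these would be used as:
#   spatial_preprocess(**preprocess_kwargs)
#   spatial_perturb(**perturb_kwargs)
print(perturb_kwargs)
```

Changing a value in STAGATE_PARAMS, RESOLUTION, or SPOT_SIZE then updates both calls at once.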

# stagate_scMagnifier conda environment
from stagate_scMagnifier import spatial_perturb
spatial_perturb(resolution=0.2)

After perturbation, the perturb_results directory stores the clustering result for each gene perturbation (e.g., cluster_AFP_0.1.csv), the perturbed gene expression matrix (e.g., perturbed_matrix_AFP_0.1.csv), and plots of the perturbed clustering on the new UMAP, the original UMAP, and the spatial plot.
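
When collecting these outputs for inspection, the gene name and perturbation multiplier can be recovered from the file name. The sketch below assumes the cluster_<gene>_<multiple>.csv pattern suggested by the example cluster_AFP_0.1.csv; parse_cluster_filename is a hypothetical helper, not part of scMagnifier.

```python
from pathlib import Path


def parse_cluster_filename(path):
    """Split 'cluster_<gene>_<multiple>.csv' into (gene, multiple)."""
    stem = Path(path).stem                  # e.g. "cluster_AFP_0.1"
    prefix, rest = stem.split("_", 1)       # "cluster", "AFP_0.1"
    if prefix != "cluster":
        raise ValueError(f"unexpected file name: {path}")
    gene, multiple = rest.rsplit("_", 1)    # split at the last underscore only
    return gene, float(multiple)


gene, multiple = parse_cluster_filename("cluster_AFP_0.1.csv")
print(gene, multiple)
```

Splitting at the last underscore keeps gene names that themselves contain underscores intact.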

Next, we show the new clustering results after perturbation of the AFP gene on the new UMAP, the original UMAP, and the spatial plot.

# You do not need to execute this code
img_path1 = "/mnt/disk1/hzh/Tutorial3/perturb_results/0p1/umap_new_AFP_0.1.png"
img_path2 = "/mnt/disk1/hzh/Tutorial3/perturb_results/0p1/umap_old_on_new_AFP_0.1.png"
img_path3 = "/mnt/disk1/hzh/Tutorial3/perturb_results/0p1/spatial_AFP_0.1.png"
img1_b64 = img_to_base64(img_path1)
img2_b64 = img_to_base64(img_path2)
img3_b64 = img_to_base64(img_path3)
HTML(f"""
<div style="display: flex; gap: 20px; scope: local;">
    <img src="data:image/png;base64,{img1_b64}" width="400">
    <img src="data:image/png;base64,{img2_b64}" width="400">
    <img src="data:image/png;base64,{img3_b64}" width="300">
</div>
""")

Step4: Consensus clustering

In this step, we use the stagate_scMagnifier conda environment. The spatial_consensus() function performs consensus clustering. If cell annotations are available, pass the name of the annotation column via label_key="". The default resolution here is 0.3; to change it, set resolution="". Finally, if you set the spot_size parameter earlier, pass the same value here as well.
See the spatial_consensus_core.py file for the remaining parameters.

# stagate_scMagnifier conda environment
from stagate_scMagnifier import spatial_consensus
spatial_consensus()
[INFO] Found 106 cluster CSV files.
[INFO] Loaded h5ad with 3798 cells.
[INFO] 3798 cells after alignment (intersection across all CSV files).
[INFO] One-hot matrix shape: (3798, 1109)
[INFO] Using STAGATE embedding with first 20 dimensions.
[INFO] STAGATE matrix shape: (3798, 20)
[INFO] Computing STAGATE (Euclidean) distance matrix ...
[INFO] Computing One-hot (cosine) distance matrix ...
[INFO] Running new UMAP (precomputed distances) ...
[INFO] New UMAP embedding shape: (3798, 2)
[SAVED] Custom UMAP + combined_distance saved to consensus_result/adata_with_rpcumap.h5ad

[INFO] Building kNN graph from the combined distance matrix (precomputed)...
[INFO] kNN graph (connectivities + distances) placed into adata_cluster.obsp
[INFO] Will use scanpy default spot size for spatial plots (if available).

[INFO] Running leiden clustering at resolution=0.3
[SAVED] consensus_result/leiden_res0.3.csv
[INFO] Plotting spatial graph for leiden resolution=0.3 (using scanpy default spot_size)

[DONE] All resolutions processed.

After consensus clustering, the consensus_result folder contains the h5ad file with the rpcUMAP embedding (saved under "X_umap_custom"), the consensus clustering results in leiden_res0.3.csv, and plots of the consensus clusters on the original UMAP, the rpcUMAP, and the spatial plot.
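
The log above mentions a one-hot matrix and a cosine distance. As a minimal illustration of that idea (a sketch, not the scMagnifier implementation), the code below concatenates one-hot encodings of the cluster labels from several runs and computes the cosine distance between two cells. The function names and the toy labels are hypothetical.

```python
import math


def one_hot_rows(runs):
    """Concatenate one-hot encodings of each run's labels, one row per cell."""
    n_cells = len(runs[0])
    rows = [[] for _ in range(n_cells)]
    for labels in runs:
        categories = sorted(set(labels))
        for i, label in enumerate(labels):
            rows[i].extend(1.0 if label == c else 0.0 for c in categories)
    return rows


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


# Two toy clustering runs over four cells (labels are made up).
runs = [["0", "0", "1", "1"],
        ["0", "0", "0", "1"]]
rows = one_hot_rows(runs)
print(cosine_distance(rows[0], rows[1]),
      cosine_distance(rows[0], rows[2]),
      cosine_distance(rows[0], rows[3]))
```

Because every cell has exactly one label per run, this distance equals the fraction of runs in which the two cells fall into different clusters: cells 0 and 1 always agree (distance close to 0), cells 0 and 2 differ in one of two runs (0.5), and cells 0 and 3 never agree (1).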

Next, we present the visualizations of the consensus clustering results on the original UMAP, rpcUMAP and spatial plots.

# You do not need to execute this code
img_path1 = "/mnt/disk1/hzh/Tutorial3/consensus_result/leiden_UMAP_res0.3.png"
img_path2 = "/mnt/disk1/hzh/Tutorial3/consensus_result/leiden_rpcUMAP_res0.3.png"
img_path3 = "/mnt/disk1/hzh/Tutorial3/consensus_result/spatial_leiden_res0.3.png"
img1_b64 = img_to_base64(img_path1)
img2_b64 = img_to_base64(img_path2)
img3_b64 = img_to_base64(img_path3)
HTML(f"""
<div style="display: flex; gap: 20px; scope: local;">
    <img src="data:image/png;base64,{img1_b64}" width="400">
    <img src="data:image/png;base64,{img2_b64}" width="400">
    <img src="data:image/png;base64,{img3_b64}" width="300">
</div>
""")

Step5: Cluster merging

In this step, we use the stagate_scMagnifier conda environment. The spatial_merge() function performs cluster merging. The default value of min_size_fraction is 0.01. In practice, you can lower this value, especially for tasks that involve identifying rare cells, for which you can set it to 0.001. You can also adjust min_size_fraction to control the number of clusters in the final output. If you set the spot_size parameter earlier, pass the same value here as well.
See the spatial_merge_core.py file for the remaining parameters.
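
As a quick check of what min_size_fraction means in cells, the sketch below converts the fraction into a minimum cluster size. Rounding up is an assumption (the rounding rule is not stated here), although it reproduces the 38-cell threshold (1.00% of 3798 cells) reported in the log of this step; min_cluster_size is a hypothetical helper.

```python
import math


def min_cluster_size(n_cells, min_size_fraction):
    """Minimum cluster size in cells, assuming the threshold is rounded up."""
    return math.ceil(n_cells * min_size_fraction)


default_threshold = min_cluster_size(3798, 0.01)   # default fraction
rare_threshold = min_cluster_size(3798, 0.001)     # value suggested for rare cells
print(default_threshold, rare_threshold)
```

For this dataset the threshold drops from 38 cells to 4 cells, so considerably smaller clusters could be kept as separate clusters.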

# stagate_scMagnifier conda environment
from stagate_scMagnifier import spatial_merge
spatial_merge()
[INFO] Auto-selected cluster CSV: consensus_result/leiden_res0.3.csv
[INFO] Loading h5ad and cluster CSV ...
[INFO] Read cluster CSV with 3798 rows.
[INFO] 3798 cells after alignment (intersection).
[INFO] Found 16 initial D-clusters.
[INFO] Centroid coords shape (cells x features): (3798, 2000)
[INFO] Computed 16 centroids based on HVG expression.
[INFO] THM raw = 5.51845, th_scaler = 0.75, THM_scaled = 4.13884
[INFO] After threshold merging, 16 merged clusters created.
[INFO] 16 clusters after the first merge step.
[INFO] Minimum cluster size threshold = 38 cells (1.00%).
[INFO] 15 clusters after merging small clusters.
[SAVED] Merged cluster CSV -> merged_result/merged_clusters.csv
[SAVED] Old UMAP merged plot -> merged_result/umap_old_merged.png
[SAVED] New UMAP merged plot -> merged_result/umap_new_merged.png
[INFO] Plotting spatial with scanpy default spot_size
[SAVED] Spatial merged plot -> merged_result/spatial_merged.png
[DONE] Merge + UMAP + spatial visualization complete.

After cluster merging, we obtain the final results: the final clustering saved in merged_clusters.csv, and plots of the clustering on the original UMAP, the rpcUMAP, and the spatial plot.

Next, we show the final clustering results on the original UMAP, the rpcUMAP, and the spatial plot.

# You do not need to execute this code
img_path1 = "/mnt/disk1/hzh/Tutorial3/merged_result/umap_old_merged.png"
img_path2 = "/mnt/disk1/hzh/Tutorial3/merged_result/umap_new_merged.png"
img_path3 = "/mnt/disk1/hzh/Tutorial3/merged_result/spatial_merged.png"
img1_b64 = img_to_base64(img_path1)
img2_b64 = img_to_base64(img_path2)
img3_b64 = img_to_base64(img_path3)
HTML(f"""
<div style="display: flex; gap: 20px; scope: local;">
    <img src="data:image/png;base64,{img1_b64}" width="400">
    <img src="data:image/png;base64,{img2_b64}" width="400">
    <img src="data:image/png;base64,{img3_b64}" width="300">
</div>
""")