Optimal transport modeling uncovers spatial domain dynamics in spatiotemporal transcriptomics
Authors:
Wenjing Ma, Department of Biostatistics, University of Michigan
Siyu Hou, Department of Biostatistics, University of Michigan
Lulu Shang, Department of Biostatistics, MD Anderson Cancer Center
Jiaying Lu, Center for Data Science, School of Nursing, Emory University
Xiang Zhou, Department of Biostatistics, University of Michigan
Maintainer: Wenjing Ma (wenjinma@umich.edu)
Latest revision: 05/03/2025
Spatiotemporal transcriptomics is an emerging and powerful approach that adds a temporal dimension to traditional spatial transcriptomics, thus enabling the characterization of dynamic changes in tissue architecture during development or disease progression. Tissue architecture is generally organized into spatial domains – regions within a tissue characterized by relatively similar gene expression profiles that often correspond to distinct biological functions. Crucially, these spatial domains are not static; rather, they undergo complex temporal dynamics during development, differentiation, and disease progression, resulting in emergence, disappearance, splitting, and merging of domains over time. Therefore, we develop SpaDOT (Spatial DOmain Transition detection), a novel and scalable machine learning method for identifying spatial domains and inferring their temporal dynamics in spatiotemporal transcriptomics studies.
(The figure illustrates how SpaDOT works. SpaDOT adopts an integration of two complementary encoders, a Gaussian Process kernel and a Graph Attention Transformer, within one variational autoencoder framework to obtain spatially aware latent representations. The latent representations are further constrained by clustering within each time point and optimal transport (OT) coupling across time points, enabling SpaDOT to identify spatial domains and capture domain transition dynamics. )
In this tutorial, we provide detailed instructions for SpaDOT by utilizing two real data applications: a developing chicken heart sequenced by 10X Vision and a developing mouse brain sequenced by Stereo-seq.
Step 1: SpaDOT is developed as a Python package. You will have to install Python, and the recommended version is Python 3.9. SpaDOT also incorporates an R package SPARK-X to perform feature selection as an option. Having the spatial variable genes selected is a practice that we test can generate better results. Therefore, installiation of R and SPARK-X is recommended.
Step 2: Use the following command to install SpaDOT:
pip install SpaDOT
The installation will take seconds to finish and the software dependencies have been taken care of by pip. We have tested our package on Windows, MacOS, and Linux.
You will also have to install your version (based on your pytorch and cuda version) of pyg
or torch-sparse
and then install torch-geometric
, more details can be found here. In my scenario:
pip install pyg-lib torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.5.0+cu124.html
pip install torch-geometric -f https://data.pyg.org/whl/torch-2.5.0+cu124.html
To make life easier, I would highly recommend using a conda environment.
conda create -n SpaDOT python=3.9
pip install SpaDOT
Then, install pyg-lib
and torch-geometric
based on your import torch; torch.__version__
.
Step 3: If you have successfully installed SpaDOT, you can try the following command:
## Check the help documentation of SpaDOT
SpaDOT -h
You should see the following console output:
usage: SpaDOT [-h] {preprocess,train,analyze} ...
SpaDOT: Optimal transport modeling uncovers spatial domain dynamics in spatiotemporal transcriptomics.
positional arguments:
{preprocess,train,analyze}
sub-command help.
preprocess (Recommended but optional) Perform data preprocessing and feature selection.
train Train a SpaDOT model.
analyze Analyze the latent representations generated by SpaDOT model.
optional arguments:
-h, --help show this help message and exit
In each module, you can use -h
to show related help pages. For example, SpaDOT train -h
:
usage: SpaDOT train [-h] [-i DATA] [-o OUTPUT_DIR] [--prefix PREFIX] [--config CONFIG] [--device DEVICE] [--save_model]
optional arguments:
-h, --help show this help message and exit
-i DATA, --data DATA A anndata object containing time point information and spatial coordinates.
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Output directory to store latent representations. Default: the same as where the data locates.
--prefix PREFIX Prefix for output latent representations. Default: ''
--config CONFIG Path to the config file, in a yaml format.
--device DEVICE Device to use for training. Default: cuda:0
--save_model Whether saving the trained model. If specified, the trained model will be stored in the output_dir as 'model.pt'.
Step 4 (Optional): If you would like to use SpaDOT with-in program, you can do the following and pass the correct arguments into corresponding functions.
import SpaDOT
data_dir = "./examples"
# create arguments for preprocessing
class Args:
data = os.path.join(data_dir, "ChickenHeart.h5ad")
prefix = "preprocessed_"
feature_selection = True
args = Args()
# create output directory if not exists
if 'output_dir' not in args.__dict__:
args.output_dir = os.path.dirname(args.data)
if not os.path.exists(args.output_dir):
os.makedirs(args.output_dir)
SpaDOT.preprocess(args)
# create arguments for training
class Args:
data = os.path.join(data_dir, "preprocessed_ChickenHeart.h5ad")
prefix = ""
config = None
save_model = True
args = Args()
SpaDOT.train(args)
# create arguments for analyses
class Args:
data = os.path.join(data_dir, "latent.h5ad")
prefix = ""
n_clusters = [5, 7, 7, 6]
args = Args()
SpaDOT.analyze(args)
The developing chicken heart is measured by 10X Visium and collected from four stages: Day 4, Day 7, Day 10 and Day 14. In this dataset, SpaDOT accurately identifies valvulogenesis - a valve splits into artrioventricular valve and semilunar valve at Day 14.
Step 1: obtain example data
First, we downloaded the spatial transcritpomics data from GSE149457 and selected
GSM4502482_chicken_heart_spatial_RNAseq_D4_filtered_feature_bc_matrix.h5
GSM4502483_chicken_heart_spatial_RNAseq_D7_filtered_feature_bc_matrix.h5
GSM4502484_chicken_heart_spatial_RNAseq_D10_filtered_feature_bc_matrix.h5
GSM4502485_chicken_heart_spatial_RNAseq_D14_filtered_feature_bc_matrix.h5
Second, we downloaded spatial coordinates from the analysis code shared by the paper on Github:
chicken_heart_spatial_RNAseq_D4_tissue_positions_list.csv
chicken_heart_spatial_RNAseq_D7_tissue_positions_list.csv
chicken_heart_spatial_RNAseq_D10_tissue_positions_list.csv
chicken_heart_spatial_RNAseq_D14_tissue_positions_list.csv
Third, we used the script process_ChickenHeart.py
provided here to preprocess the data by integrating them into one anndata with timepoint
in anndata observations (obs) as one-hot encoder indicating four time points, 0
, 1
, 2
and 3
indicate Day 4, Day 7, Day 10 and Day 14, respectively. We have also put the spatial coordinates with keyword spatial
as a numpy array inside anndata observation metadata (obsm).
After running the process_ChickenHeart.py
, we will obtain the file ChickenHeart.h5ad
. For your convenience, you can download the processed data here.
Step 2: perform data preprocessing
After obtaining ChickenHeart.h5ad
, we the perform the data preprocessing. Here, we use command line as demonstration.
Rscript run_SPARKX.R --data ./ChickenHeart.h5ad
SpaDOT preprocess --data ./ChickenHeart.h5ad
run_SPARKX.R
can be found here.
If spatially variable genes selection is not desired, you can add an additional option --feature_selection False
, which would use all genes from the data. However, we again recommend having feature selection to generate better results.
SpaDOT preprocess --data ./ChickenHeart.h5ad --feature_selection False
After data preprocessing, we will have processed_ChickenHeart.h5ad
in the directory.
Step 3: train SpaDOT to obtain latent representations
Then, we can train the SpaDOT model to obtain latent representations by using
SpaDOT train --data preprocessed_ChickenHeart.h5ad
Here, we train with default parameters. For overriding default parameters purpose, a yaml
file is needed. We have provided an example config.yaml with our default parameters for your reference. Again, we do recommend using our default parameters to achieve the best performance.
SpaDOT train --data preprocessed_ChickenHeart.h5ad --config config.yaml
Step 4: infer spatial domains and domain dynamics based on region number
Once the training stage finishes, we can obtain spatial domains and generate domain dynamics. If we have prior knowledge on how many domains we might have (given by the original study), we can run:
SpaDOT analyze --data latent.h5ad --n_clusters 5,7,7,6
Timepoint | Day 4 | Day 7 | Day 10 | Day 14 |
---|---|---|---|---|
Spatial Domains | ![]() |
![]() |
![]() |
![]() |
Timepoint | Day 4 –> Day 7 | Day 7 –> Day 10 | Day 10 –> Day 14 |
---|---|---|---|
OT transition | ![]() |
![]() |
![]() |
Step 5: infer spatial domains and domain dynamics based on Elbow method (Optional)
Or, we can leave out the --n_clusters
option, then SpaDOT would automatically run Elbow method to determine the number of clusters
SpaDOT analyze --data latent.h5ad
We then have the plot of calculating the within-cluster sum of squares (WSS) of KMeans with the number of clusters ranging from 5 to 20. We then detect the Elbow point and select the corresponding cluster number.
Timepoint | Day 4 | Day 7 | Day 10 | Day 14 |
---|---|---|---|---|
WSS per cluster | ![]() |
![]() |
![]() |
![]() |
Spatial Domains | ![]() |
![]() |
![]() |
![]() |
Timepoint | Day 4 –> Day 7 | Day 7 –> Day 10 | Day 10 –> Day 14 |
---|---|---|---|
OT transition | ![]() |
![]() |
![]() |
SpaDOT provides efficient and accurate spatial domain detection for spatiotemporal transcriptomics studies and offers insights into domain transition dynamics. Its contributions are three-fold:
Capturing domain dynamics via optimal transport (OT) constraints: Across time points, OT constraints guide the alignment of functionally similar domains while separating dissimilar ones, enabling SpaDOT to accurately identify both shared and time-specific domains and infer their biological relationships over time.
Modeling both global and local spatial patterns for enhanced structural embedding: Within each time point, SpaDOT integrates a Gaussian Process (GP) prior and Graph Attention Transformer (GAT) within a variational autoencoder (VAE) framework, capturing both global spatial continuity and local structural heterogeneity.
Eliminating the need to predefine the number of spatial domains: Unlike existing methods, SpaDOT does not require the number of domains to be specified in advance, reducing dependence on prior knowledge and improving ease of use.
We hope that SpaDOT will be a useful tool for your research. For questions or comments, please open an issue on Github.