library(TESSA)
library(dplyr)
library(ggplot2)
library(Seurat)
library(SingleCellExperiment)
library(slingshot)
We used sample S2A from the PDAC dataset as a main example. First, load PDAC S2A sample as a Seurat object and signature genes for normal ductal cells, tumor ductal cells and stroma cancer associated fibroblast(CAF) cells.
seu.obj <- readRDS('./data/PDAC_S2A_seurat.rds')
seu.obj
## An object of class Seurat
## 17943 features across 3142 samples within 1 assay
## Active assay: Spatial (17943 features, 0 variable features)
## 1 layer present: counts
## 1 spatial field of view present: slice1
##
mklist <- list(
'NormalDuctal' = c('KRT19', 'EPCAM', 'GATA6', 'CFTR', 'MUC1', 'KRT7', 'CDH1', 'HNF1A', 'FOXA2'),
'TumorDuctal' = c('KRT19', 'MUC1', 'MUC5AC', 'CEACAM6', 'S100P', 'SOX9', 'ZEB1', 'SNAI2', 'TP63', 'FN1', 'CD44', 'VIM','TFF1','TFF2'),
'StromaCAF' = c('COL1A1', 'FN1', 'SPARC', 'ACTA2', 'PDGFRB', 'POSTN', 'LOXL2', 'MMP2', 'MMP9', 'FAP', 'TGFB1', 'CXCL12', 'CCL2')
)
mkgenes = unique(unlist(mklist))
mkgenes = intersect(mkgenes, rownames(seu.obj))
str(mkgenes)
## chr [1:31] "EPCAM" "GATA6" "CFTR" "MUC1" "KRT7" "CDH1" "HNF1A" "FOXA2" ...
Here, we use signature genes to estimate pseudotime. Because we only have 33 signature genes, we set the number of principal components as 10.
seu.obj <- SCTransform(seu.obj, assay = "Spatial", verbose = FALSE, residual.features = NULL, return.only.var.genes = FALSE )
seu.obj <- RunPCA(seu.obj, assay = "SCT", verbose = FALSE, features = mkgenes)
## Warning in svd.function(A = t(x = object), nv = npcs, ...): You're computing
## too large a percentage of total singular values, use a standard svd instead.
## Warning in svd.function(A = t(x = object), nv = npcs, ...): did not
## converge--results might be invalid!; try increasing work or maxit
npcs = 10
seu.obj <- FindNeighbors(seu.obj, dims = 1:npcs,verbose = F)
seu.obj <- FindClusters(seu.obj,resolution = 0.25,verbose = F)
seu.obj <- RunUMAP(seu.obj, dims = 1:npcs,verbose = F)
sim <- SingleCellExperiment(assays = seu.obj[['Spatial']]$counts, colData = seu.obj@meta.data)
embedding <- Embeddings(seu.obj, reduction = "pca")
reducedDims(sim) <- SimpleList(PCA = embedding[,1:npcs])
sim <- slingshot(sim, clusterLabels = 'seurat_clusters', reducedDim = 'PCA',start.clus="4" )
summary(sim@colData@listData)
## Length Class Mode
## orig.ident 3142 -none- character
## nCount_Spatial 3142 -none- numeric
## nFeature_Spatial 3142 -none- numeric
## nCount_SCT 3142 -none- numeric
## nFeature_SCT 3142 -none- numeric
## SCT_snn_res.0.25 3142 factor numeric
## seurat_clusters 3142 factor numeric
## slingshot 3142 PseudotimeOrdering S4
## slingPseudotime_1 3142 -none- numeric
## slingPseudotime_2 3142 -none- numeric
seu.obj@meta.data$slingshot_t1 = sim@colData@listData$slingPseudotime_1
seu.obj@meta.data$slingshot_t2 = sim@colData@listData$slingPseudotime_2
Visualization of pseudotime on both umap and spatial coordinates.
options(repr.plot.width = 10, repr.plot.height = 5)
pt.size = 2.4; cont_cols <- c("#fafbd4",'#5c92b6')
FeaturePlot(seu.obj, features = 'slingshot_t1', cols = cont_cols) + NoLegend() & theme(plot.title = element_blank()) |
FeaturePlot(seu.obj, features = 'slingshot_t2', cols = cont_cols)+ NoLegend() & theme(plot.title = element_blank())
SpatialFeaturePlot(seu.obj, features = 'slingshot_t1', pt.size = pt.size, alpha = 1) &
ggplot2::scale_fill_gradientn(colors = cont_cols) & theme(legend.title = element_text(size = 20))|
SpatialFeaturePlot(seu.obj, features = 'slingshot_t2', pt.size = pt.size) &
ggplot2::scale_fill_gradientn(colors = cont_cols)& theme(legend.title = element_text(size = 20))
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
The required input is a raw count matrix counts
, a
metadata dataframe including spatial coordinates, estimated pseudotime
meta_df
and sample ID, and signature genes used to estimate
pseudotime mkgenes
.
counts = seu.obj[['Spatial']]$counts
locations = GetTissueCoordinates(seu.obj)[,c('x','y')]
meta_df = data.frame(x = GetTissueCoordinates(seu.obj)$x,
y = GetTissueCoordinates(seu.obj)$y,
lineage1 = seu.obj$slingshot_t1,
lineage2 = seu.obj$slingshot_t2,
Sample_ID = 'S2A')
Firstly, We preprocess the data by filtering out low quality spots
with less than 10 genes detected and genes that expressed in less than
10% of the spots by data_preprocess()
. Secondly, we
construct spatial and pseudotime kernels by
build_kernelMatrix
.
Tessa.obj = CreateTessaObject(counts = counts, meta_df = meta_df,
signature_genes = mkgenes,
covariates = NULL, normalized = FALSE)
Tessa.obj = data_preprocess(object = Tessa.obj, spot.threshold = 10, gene.threshold = 0.1)
## The location and pseudotime have been scaled.
## 0 spots with gene expression less than 10 have been removed.
## 6828 genes with gene expression rate less than 0.1 have been removed.
## Normalizing the count data...
Tessa.obj = build_kernelMatrix(Tessa.obj)
## Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
## Also defined by 'spam'
## Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
## Also defined by 'spam'
We first run stage 1 overall test to identify unified TSVGs (uTSVGs),
refers to genes exhibits either temporal and spatial patterns.
run_Test1()
run stage 1 test for all lineages in the
metadata and get_Test1_result()
extracts the p-value
restult. To address the double dipping issue, refers that genes used
twice for both pseudotime estimation and testing, leading to seriously
inflated type I errors, we apply Leave-One-Out(LOO) correction by
setting parameter as LOO = TRUE
. The genes with
< 0.05
are idenfied as uTSVGs.
system.time(
Tessa.obj <- run_Test1(Tessa.obj,LOO = FALSE, npcs = 10)
)
## Identifying uTSVGs for lineage1 ...
## Identifying uTSVGs for lineage2 ...
## user system elapsed
## 330.792 2.869 236.711
system.time(
Tessa.obj <- run_Test1(Tessa.obj, LOO = TRUE, npcs = 10)
)
## Identifying uTSVGs for lineage1 ...
## Identifying uTSVGs for lineage2 ...
## user system elapsed
## 3122.051 25.690 659.479
uTSVGs_result <- get_Test1_result(Tessa.obj) %>% filter(pvs_adj_LOO < 0.05)
head(uTSVGs_result)
## geneid pvs pvs_adj pvs_LOO pvs_adj_LOO lineage
## 1 HES4 1.555476e-08 6.318679e-07 1.555476e-08 6.318679e-07 lineage1
## 2 ISG15 7.886800e-32 6.293653e-30 7.886800e-32 6.284532e-30 lineage1
## 3 AGRN 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 lineage1
## 4 TNFRSF4 3.572649e-06 1.100766e-04 3.572649e-06 1.100766e-04 lineage1
## 5 UBE2J2 9.039010e-04 1.931533e-02 9.039010e-04 1.931533e-02 lineage1
## 6 INTS11 1.616058e-04 3.953469e-03 1.616058e-04 3.953469e-03 lineage1
## uTSVGs for each lineage
uTSVGs_list <- split(uTSVGs_result$geneid, uTSVGs_result$lineage)
str(uTSVGs_list)
## List of 2
## $ lineage1: chr [1:5643] "HES4" "ISG15" "AGRN" "TNFRSF4" ...
## $ lineage2: chr [1:6605] "HES4" "ISG15" "AGRN" "C1orf159" ...
## uTSVGs for all lineages
uTSVGs <- unique(unlist(uTSVGs_list))
We could further dissect whether a uTSVG is temporally variable
gene(TVG), or spatially variable gene(SVG) by conduction stage 2
individual test. To save time, here we run stage 2 test for 2 genes by
run_Test2_lineage()
as a quick demo.
system.time(
Tessa.obj <- run_Test2_lineage(Tessa.obj, LOO = TRUE,lineage = 'lineage1',genes = c("HES4", "ISG15"))
)
## run Test2 on user defined 2 uTSVGs on lineage1
## user system elapsed
## 73.780 2.913 35.822
Test2_result <- get_Test2_result(Tessa.obj)
Test2_result
## geneid TVG_pvs TVG_pvs_adj SVG_pvs SVG_pvs_adj
## 1 HES4 0.427729880169835 0.6415948 9.67304871006445e-17 2.901915e-16
## 2 ISG15 0.112403134734778 0.3372094 1.27758564218014e-12 1.916378e-12
## 3 <NA> <NA> NA <NA> NA
## lineage
## 1 lineage1
## 2 lineage1
## 3 lineage2
Double dipping issue, refers that genes used twice for both pseudotime estimation and testing, leading to seriously inflated type I errors. To address it, LOO is based on a simple gene-splitting idea that if we want to test gene A, we re-estimate the pseudotime without gene A. In theory, LOO can be applied to any pseudotime estimation algorithm, but in practice, we set Slingshot as the default pseudotime estimation algorithm to apply LOO correction in Tessa package. To construct a meaningful pseudotime and reduce the confounding between spatial and temporal effect, we recommend user to infer pseudotime using gene expression based pseudotime inference method like Slingshot, with a concise and strong signature genes. For example, in this PDAC sample, we are interested in the ductal tumor progression, so we choose 31 signature genes related to normal ductal, tumor ductal and stromaCAF.