tutorial

library(TESSA)
library(dplyr)
library(ggplot2)
library(Seurat)
library(SingleCellExperiment)
library(slingshot)

Load data and signature genes

We used sample S2A from the PDAC dataset as a main example. First, load PDAC S2A sample as a Seurat object and signature genes for normal ductal cells, tumor ductal cells and stroma cancer associated fibroblast(CAF) cells.

seu.obj <- readRDS('./data/PDAC_S2A_seurat.rds')
seu.obj

## An object of class Seurat 
## 17943 features across 3142 samples within 1 assay 
## Active assay: Spatial (17943 features, 0 variable features)
##  1 layer present: counts
##  1 spatial field of view present: slice1

## 
mklist <- list(
  'NormalDuctal' = c('KRT19', 'EPCAM', 'GATA6', 'CFTR', 'MUC1', 'KRT7', 'CDH1', 'HNF1A', 'FOXA2'),
  'TumorDuctal' = c('KRT19', 'MUC1', 'MUC5AC', 'CEACAM6', 'S100P', 'SOX9', 'ZEB1', 'SNAI2', 'TP63', 'FN1', 'CD44', 'VIM','TFF1','TFF2'),
  'StromaCAF' = c('COL1A1', 'FN1', 'SPARC', 'ACTA2', 'PDGFRB', 'POSTN', 'LOXL2', 'MMP2', 'MMP9', 'FAP', 'TGFB1', 'CXCL12', 'CCL2')
)
mkgenes = unique(unlist(mklist))
mkgenes = intersect(mkgenes, rownames(seu.obj))
str(mkgenes)

##  chr [1:31] "EPCAM" "GATA6" "CFTR" "MUC1" "KRT7" "CDH1" "HNF1A" "FOXA2" ...

Pseudotime inference by slingshot

Here, we use signature genes to estimate pseudotime. Because we only have 33 signature genes, we set the number of principal components as 10.

seu.obj <- SCTransform(seu.obj, assay = "Spatial", verbose = FALSE, residual.features = NULL, return.only.var.genes = FALSE ) 
seu.obj <- RunPCA(seu.obj, assay = "SCT", verbose = FALSE, features = mkgenes)

## Warning in svd.function(A = t(x = object), nv = npcs, ...): You're computing
## too large a percentage of total singular values, use a standard svd instead.

## Warning in svd.function(A = t(x = object), nv = npcs, ...): did not
## converge--results might be invalid!; try increasing work or maxit

npcs = 10
seu.obj <- FindNeighbors(seu.obj, dims = 1:npcs,verbose = F)
seu.obj <- FindClusters(seu.obj,resolution = 0.25,verbose = F)
seu.obj <- RunUMAP(seu.obj, dims = 1:npcs,verbose = F)


sim <- SingleCellExperiment(assays = seu.obj[['Spatial']]$counts, colData =  seu.obj@meta.data)
embedding <-  Embeddings(seu.obj, reduction = "pca") 
reducedDims(sim) <- SimpleList(PCA = embedding[,1:npcs])
sim  <- slingshot(sim, clusterLabels = 'seurat_clusters', reducedDim = 'PCA',start.clus="4" )  
summary(sim@colData@listData)

##                   Length Class              Mode     
## orig.ident        3142   -none-             character
## nCount_Spatial    3142   -none-             numeric  
## nFeature_Spatial  3142   -none-             numeric  
## nCount_SCT        3142   -none-             numeric  
## nFeature_SCT      3142   -none-             numeric  
## SCT_snn_res.0.25  3142   factor             numeric  
## seurat_clusters   3142   factor             numeric  
## slingshot         3142   PseudotimeOrdering S4       
## slingPseudotime_1 3142   -none-             numeric  
## slingPseudotime_2 3142   -none-             numeric

seu.obj@meta.data$slingshot_t1 = sim@colData@listData$slingPseudotime_1
seu.obj@meta.data$slingshot_t2 = sim@colData@listData$slingPseudotime_2

Visualization of pseudotime on both umap and spatial coordinates.

options(repr.plot.width = 10, repr.plot.height = 5)
pt.size = 2.4; cont_cols <- c("#fafbd4",'#5c92b6')

FeaturePlot(seu.obj, features = 'slingshot_t1', cols = cont_cols) + NoLegend()  &   theme(plot.title = element_blank())  | 
FeaturePlot(seu.obj, features = 'slingshot_t2', cols = cont_cols)+ NoLegend() &   theme(plot.title = element_blank())

SpatialFeaturePlot(seu.obj, features = 'slingshot_t1', pt.size = pt.size, alpha = 1) & 
  ggplot2::scale_fill_gradientn(colors = cont_cols)  & theme(legend.title = element_text(size = 20))|
SpatialFeaturePlot(seu.obj, features = 'slingshot_t2', pt.size = pt.size) & 
  ggplot2::scale_fill_gradientn(colors = cont_cols)& theme(legend.title = element_text(size = 20))

## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.

Create TessaObject

Prepare input data

The required input is a raw count matrix counts, a metadata dataframe including spatial coordinates, estimated pseudotime meta_df and sample ID, and signature genes used to estimate pseudotime mkgenes.

counts = seu.obj[['Spatial']]$counts
locations = GetTissueCoordinates(seu.obj)[,c('x','y')]
meta_df = data.frame(x = GetTissueCoordinates(seu.obj)$x,
                     y = GetTissueCoordinates(seu.obj)$y,
                     lineage1 = seu.obj$slingshot_t1,
                     lineage2 = seu.obj$slingshot_t2,
                     Sample_ID = 'S2A')

Firstly, We preprocess the data by filtering out low quality spots with less than 10 genes detected and genes that expressed in less than 10% of the spots by data_preprocess(). Secondly, we construct spatial and pseudotime kernels by build_kernelMatrix.

Tessa.obj =  CreateTessaObject(counts = counts, meta_df = meta_df,
                               signature_genes = mkgenes,
                               covariates = NULL, normalized = FALSE)
Tessa.obj = data_preprocess(object = Tessa.obj, spot.threshold = 10, gene.threshold = 0.1)

## The location and pseudotime have been scaled.
## 0 spots with gene expression less than 10 have been removed. 
## 6828 genes with gene expression rate less than 0.1 have been removed. 
## Normalizing the count data...

Tessa.obj = build_kernelMatrix(Tessa.obj)

## Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

## Also defined by 'spam'

## Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

## Also defined by 'spam'

TESSA Stage 1 Test

We first run stage 1 overall test to identify unified TSVGs (uTSVGs), refers to genes exhibits either temporal and spatial patterns. run_Test1() run stage 1 test for all lineages in the metadata and get_Test1_result() extracts the p-value restult. To address the double dipping issue, refers that genes used twice for both pseudotime estimation and testing, leading to seriously inflated type I errors, we apply Leave-One-Out(LOO) correction by setting parameter as LOO = TRUE. The genes with < 0.05 are idenfied as uTSVGs.

system.time(
  Tessa.obj <-  run_Test1(Tessa.obj,LOO = FALSE, npcs = 10)
)

## Identifying uTSVGs for  lineage1  ... 
## Identifying uTSVGs for  lineage2  ...

##    user  system elapsed 
## 330.792   2.869 236.711

system.time(
  Tessa.obj <-  run_Test1(Tessa.obj, LOO = TRUE, npcs = 10)  
)

## Identifying uTSVGs for  lineage1  ... 
## Identifying uTSVGs for  lineage2  ...

##     user   system  elapsed 
## 3122.051   25.690  659.479

uTSVGs_result <- get_Test1_result(Tessa.obj) %>% filter(pvs_adj_LOO < 0.05)
head(uTSVGs_result)

##    geneid          pvs      pvs_adj      pvs_LOO  pvs_adj_LOO  lineage
## 1    HES4 1.555476e-08 6.318679e-07 1.555476e-08 6.318679e-07 lineage1
## 2   ISG15 7.886800e-32 6.293653e-30 7.886800e-32 6.284532e-30 lineage1
## 3    AGRN 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 lineage1
## 4 TNFRSF4 3.572649e-06 1.100766e-04 3.572649e-06 1.100766e-04 lineage1
## 5  UBE2J2 9.039010e-04 1.931533e-02 9.039010e-04 1.931533e-02 lineage1
## 6  INTS11 1.616058e-04 3.953469e-03 1.616058e-04 3.953469e-03 lineage1

## uTSVGs for each lineage
uTSVGs_list <- split(uTSVGs_result$geneid, uTSVGs_result$lineage)
str(uTSVGs_list)

## List of 2
##  $ lineage1: chr [1:5643] "HES4" "ISG15" "AGRN" "TNFRSF4" ...
##  $ lineage2: chr [1:6605] "HES4" "ISG15" "AGRN" "C1orf159" ...

## uTSVGs for all lineages
uTSVGs <- unique(unlist(uTSVGs_list))

TESSA Stage 2 Test

We could further dissect whether a uTSVG is temporally variable gene(TVG), or spatially variable gene(SVG) by conduction stage 2 individual test. To save time, here we run stage 2 test for 2 genes by run_Test2_lineage() as a quick demo.

system.time(
Tessa.obj <-  run_Test2_lineage(Tessa.obj, LOO = TRUE,lineage = 'lineage1',genes = c("HES4", "ISG15"))
)

## run Test2 on user defined  2  uTSVGs on  lineage1

##    user  system elapsed 
##  73.780   2.913  35.822

Test2_result <- get_Test2_result(Tessa.obj)
Test2_result

##   geneid           TVG_pvs TVG_pvs_adj              SVG_pvs  SVG_pvs_adj
## 1   HES4 0.427729880169835   0.6415948 9.67304871006445e-17 2.901915e-16
## 2  ISG15 0.112403134734778   0.3372094 1.27758564218014e-12 1.916378e-12
## 3   <NA>              <NA>          NA                 <NA>           NA
##    lineage
## 1 lineage1
## 2 lineage1
## 3 lineage2

Notes for Pseudotime estimation and LOO double dipping correction

Double dipping issue, refers that genes used twice for both pseudotime estimation and testing, leading to seriously inflated type I errors. To address it, LOO is based on a simple gene-splitting idea that if we want to test gene A, we re-estimate the pseudotime without gene A. In theory, LOO can be applied to any pseudotime estimation algorithm, but in practice, we set Slingshot as the default pseudotime estimation algorithm to apply LOO correction in Tessa package. To construct a meaningful pseudotime and reduce the confounding between spatial and temporal effect, we recommend user to infer pseudotime using gene expression based pseudotime inference method like Slingshot, with a concise and strong signature genes. For example, in this PDAC sample, we are interested in the ductal tumor progression, so we choose 31 signature genes related to normal ductal, tumor ductal and stromaCAF.