Jared Huzar, Hannah Kim, Sudhir Kumar, Sayaka Miura.
Abstract
In cancer, somatic mutations occur continuously, causing cell populations to evolve. These mutations drive the evolution of cellular gene expression patterns, which can also change through epigenetic modifications and environmental changes. By exploring the concordance of gene expression changes with the molecular evolutionary trajectories of cells, we can examine the role of somatic variation in the evolution of gene expression patterns. We present Multi-Omics Concordance Analysis (MOCA) software to jointly analyze gene expression and genetic variation from single-cell RNA sequencing profiles. MOCA outputs cells and genes showing convergent and divergent gene expression patterns in functional genomics.
Keywords: cellular phylogeny; gene expression trajectory; multi-omics analyses; single-cell RNA sequencing; tumor evolution
Year: 2022 PMID: 35432484 PMCID: PMC9009314 DOI: 10.3389/fgene.2022.831040
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Overview of MOCA. (A) MOCA workflow. MOCA analyzes cellular genetic variations and gene expression profiles from scRNA-seq data. (B) MOCA takes a phylogenetic tree as input if the genetic ancestry of each cell is not provided. If the tree shape (balanced or unbalanced) is unclear, MOCA’s TreeBalance function first suggests the tree’s shape. Based on the tree shape, MOCA suggests using either the BalancedAnnotation or the UnbalancedAnnotation function to identify groups of genetically similar cells, which are defined as genetic ancestries. The phylogeny was inferred by BEAM analysis of 139 variants in the MGH26 data. (C) MOCA’s AncestryComparison function visualizes and quantifies the relationship between the genetic ancestry annotations inferred from different phylogenies. For each pair of trees (genetic ancestry annotations), the Cramér’s V effect size and its p-value are reported. (D) Using the genetic ancestries together with the gene expression matrix (input), MOCA’s PhyloTrajectory function infers the expression trajectory. From the inferred trajectory, PhyloTrajectory calculates the Sub-concordance index (SCI) for each genetic ancestry and the Overall concordance index (OCI). The SCI is the count of expression states that are largely unique to a given genetic ancestry (e.g., >80% of cells). The OCI is the ratio of the number of expression states that are largely unique to any single ancestry to the number of all expression states identified. (E) The SCI for each genetic ancestry for different numbers of genes, 200–1,000. (F) The OCI of the tumor across gene sets, 200–1,000. These indices are produced for each tree.
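The Cramér's V statistic mentioned for panel (C) can be illustrated with a short Python sketch. This is not MOCA's AncestryComparison code; the function name `cramers_v` and its inputs (two per-cell lists of ancestry labels, one from each tree) are assumptions made for illustration. It builds the contingency table of the two annotations, computes the Pearson chi-square statistic, and scales it to the 0 to 1 range.

```python
from collections import Counter
from math import sqrt


def cramers_v(labels_a, labels_b):
    """Cramér's V between two per-cell ancestry annotations.

    Hypothetical helper, not part of MOCA. labels_a[i] and labels_b[i]
    are the ancestry labels assigned to cell i under two different trees.
    Returns (V, chi-square statistic, degrees of freedom).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotations must cover the same cells")
    n = len(labels_a)
    rows = Counter(labels_a)               # cells per ancestry, tree A
    cols = Counter(labels_b)               # cells per ancestry, tree B
    joint = Counter(zip(labels_a, labels_b))
    k = min(len(rows), len(cols))
    if n == 0 or k < 2:
        raise ValueError("each annotation needs at least two ancestries")

    # Pearson chi-square: sum over all cells of the contingency table.
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(cols) - 1)
    v = sqrt(chi2 / (n * (k - 1)))
    return v, chi2, dof


# Two annotations that agree perfectly up to relabeling.
v, chi2, dof = cramers_v([1, 1, 2, 2], ["x", "x", "y", "y"])
print(v, chi2, dof)
```

A p-value could then be taken from the chi-square distribution with `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(chi2, dof)`; whether MOCA derives its p-value this way is not stated in the caption.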
FIGURE 2. Hou data analysis. (A,B) Inferred phylogenies for datasets to which 60% (A) and 70% (B) SNV filtering cutoffs were applied. BEAM was used to build the cellular phylogenies to account for missing data and mutation calling errors. The genetic ancestry annotation obtained from the analysis of DNA data is shown next to the phylogenies. The DNA-based annotation was obtained from Hou et al. (2016). (C–E) The inferred expression trajectories using genetic ancestries annotated on the tree from the 60% SNV filtering cutoff dataset (C), the tree from the 70% SNV filtering cutoff dataset (D), and the DNA-based genetic ancestry annotation (E). The 1,000 most differentially expressed genes between the genetic ancestries were used in each analysis.
FIGURE 3. Glioblastoma tumor data analysis. (A–C) Analysis of MGH26 tumor data. (A) Schematic of genetic ancestries that are consistent across four phylogenies (Supplementary Figure S2). (B) Sub-concordance index (SCI) of genetic ancestries 1 and 3. (C) The number of all expression states and of states that are unique to genetic ancestries, allowing a few exceptions (>80% of cells from an ancestry share the same expression state). (D–G) Analysis of the MGH31 tumor dataset. (D) Schematic of genetic ancestries. Copy number aberrations were used for genetic ancestry annotation of the MGH31 data. Cells of genetic ancestry 1 contained 13q and 14q deletions, while cells of genetic ancestry 2 contained 1p and 5q amplifications. (E) The inferred expression trajectory using the 1,000 most differentially expressed genes between the genetic ancestries. All normal cells were predicted to have the same gene expression state. (F) The Sub-concordance index (SCI) of each genetic ancestry. Normal cells, which always had a concordance index equal to one, were excluded. (G) The Overall concordance index (OCI). The expression state for normal cells was included. The MGH26 and MGH31 datasets are from Patel et al. (2014).
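The SCI and OCI described in the Figure 1 caption and plotted here can be illustrated with a minimal Python sketch. It is not MOCA's PhyloTrajectory implementation, and the function name `concordance_indices` and its arguments are hypothetical. The captions leave the direction of the 80% criterion open, so the sketch assumes one reading: an expression state counts as largely unique to an ancestry when more than 80% of the cells in that state belong to that ancestry.

```python
from collections import Counter, defaultdict


def concordance_indices(ancestry, state, threshold=0.8):
    """Sketch of SCI per ancestry and OCI (hypothetical helper, not MOCA).

    ancestry[i] is the genetic ancestry of cell i and state[i] is its
    inferred expression state. ASSUMPTION: a state is "largely unique"
    to an ancestry when more than `threshold` of the cells in that state
    come from that ancestry.
    """
    if len(ancestry) != len(state):
        raise ValueError("ancestry and state must have one entry per cell")

    # Tally, for every expression state, how many cells of each ancestry it holds.
    per_state = defaultdict(Counter)
    for a, s in zip(ancestry, state):
        per_state[s][a] += 1

    # SCI: number of states largely unique to each ancestry.
    sci = {a: 0 for a in set(ancestry)}
    for counts in per_state.values():
        top_ancestry, top_count = counts.most_common(1)[0]
        if top_count / sum(counts.values()) > threshold:
            sci[top_ancestry] += 1

    # OCI: states unique to any single ancestry, divided by all states.
    n_states = len(per_state)
    oci = sum(sci.values()) / n_states if n_states else 0.0
    return sci, oci


# Toy data: state 1 holds only ancestry A, state 3 only ancestry B,
# and state 2 is shared equally between the two.
ancestry = ["A"] * 5 + ["B"] * 5
state = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
sci, oci = concordance_indices(ancestry, state)
print(sci, oci)
```

In this toy input each ancestry has one largely unique state and the shared state counts toward neither, so two of the three states contribute to the OCI. Under the same assumption, a group such as the normal cells in panel (E), which all fall into a single state of their own, would receive an SCI of one.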