Daniel Shnier, Mircea A Voineagu, Irina Voineagu.
Abstract
Persistent homology methods have found applications in the analysis of multiple types of biological data, particularly imaging data or data with a spatial and/or temporal component. However, few studies have assessed the use of persistent homology for the analysis of gene expression data. Here we apply persistent homology methods to investigate the global properties of gene expression in post-mortem brain tissue (cerebral cortex) of individuals with autism spectrum disorders (ASD) and matched controls. We observe a significant difference in the geometry of inter-sample relationships between autism and healthy controls as measured by the sum of the death times of zero-dimensional components and the Euler characteristic. This observation is replicated across two distinct datasets, and we interpret it as evidence for an increased heterogeneity of gene expression in autism. We also assessed the topology of gene-level point clouds and did not observe significant differences between ASD and control transcriptomes, suggesting that the overall transcriptome organization is similar in ASD and healthy cerebral cortex. Overall, our study provides a novel framework for persistent homology analyses of gene expression data for genetically complex disorders.
Keywords: autism; gene expression; persistent homology; topology; transcriptome
Year: 2019 PMID: 31551047 PMCID: PMC6769309 DOI: 10.1098/rsif.2019.0531
Source DB: PubMed Journal: J R Soc Interface ISSN: 1742-5662 Impact factor: 4.118
Figure 1. Study overview. For each gene expression dataset, the ASD and control groups were analysed by generating either a gene-level or a sample-level distance matrix (1-Pearson correlation). Distance matrices were used to compute persistence diagrams and their corresponding Betti number and Euler characteristic. The difference in these topological invariants between ASD and controls was then assessed for significance by random permutation of sample labels. (Online version in colour.)
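The first steps of the workflow in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy profiles are hypothetical, and it relies on the standard fact that, for a Vietoris–Rips filtration, the finite death times of zero-dimensional components equal the edge weights of a minimum spanning tree of the distance matrix. Software packages may report the scale as ε or ε/2, so absolute values can differ by a factor of 2 between tools.

```python
import math

def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def distance_matrix(profiles):
    """Symmetric matrix of pairwise 1 - Pearson distances."""
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = pearson_distance(profiles[i], profiles[j])
    return dist

def h0_death_times(dist):
    """Finite H0 death times, computed as MST edge weights (Prim's algorithm).

    The one component that never dies (infinite persistence) is omitted.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each point to the tree
    best[0] = 0.0
    deaths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:        # the starting point joins the tree without an edge
            deaths.append(best[u])
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return sorted(deaths)

# Toy expression profiles (rows = samples, columns = genes); invented values.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
]
deaths = h0_death_times(distance_matrix(profiles))
sdt = sum(deaths)  # sum of death times of zero-dimensional components
print(deaths, sdt)
```

A point cloud of n samples yields n - 1 finite death times, so the sum grows when samples are, on average, less correlated with their nearest neighbours.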
Figure 2. Schematic representation of basic persistent homology concepts. (a) Vietoris–Rips simplicial complexes VR(V, ε) formed by a cloud V of four points, at increasing ε values (ε is arbitrary, for illustration purposes). (b) Persistence diagram of the point cloud shown in (a). Zero-dimensional components are shown as red circles, one-dimensional components are shown as green triangles. For each component, the x-axis represents the ε value at which it is born (i.e. persistence interval start), and the y-axis represents the ε value at which it dies (i.e. persistence interval end). Persistent components are those located away from the diagonal. (c) Hypothetical examples of two point clouds with different degrees of heterogeneity. Both point clouds contain the same number of points (13). The bottom example is more heterogeneous than the top example. Using circles of the same radius (ε/2), the associated Vietoris–Rips complex VR(V, ε) has 2 connected components in the top example and 13 connected components in the bottom example. The more heterogeneous point cloud therefore yields more connected components. (Online version in colour.)
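The component counts described in panel (c) can be reproduced with a small union-find sketch. The coordinates below are invented for illustration and are not taken from the figure. Two points are joined when their distance is at most ε, which is equivalent to circles of radius ε/2 touching or overlapping.

```python
import math

def count_components(points, eps):
    """Number of connected components of the Vietoris-Rips complex VR(V, eps)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two components
    return len({find(i) for i in range(len(points))})

# Hypothetical homogeneous cloud: 13 points in two tight clusters.
cluster_a = [(0.5 * x, 0.5 * y) for x in range(3) for y in range(2)]
cluster_b = [(10 + 0.5 * x, 0.5 * y) for x in range(3) for y in range(2)]
cluster_b.append((10.5, 1.0))
tight_cloud = cluster_a + cluster_b

# Hypothetical heterogeneous cloud: 13 points spaced far apart.
spread_cloud = [(3.0 * i, 0.0) for i in range(13)]

eps = 1.0
print(count_components(tight_cloud, eps))   # 2
print(count_components(spread_cloud, eps))  # 13
```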
Figure 3. Persistent homology analysis of sample-level point clouds. (a) Persistence diagrams of ASD and control groups, based on the microarray dataset. (b) The same persistence diagrams as in (a) are plotted with a zoomed-in y-axis, to better visualize components with dimension greater than 0. (c) (i) Density plot of SDT difference between ASD and controls (DSDT) generated by 100 000 random permutations of sample labels. Vertical red line: observed DSDT value. (ii) Density plot of Euler characteristic difference between ASD and controls (D) generated by 100 000 random permutations of sample labels. Vertical red line: observed D value. (d) Persistence diagrams of ASD and control groups, based on the RNA-seq dataset. (e) The same persistence diagrams as in (d) are plotted with a zoomed-in y-axis, to better visualize components with dimension greater than 0. (f) (i) Density plot of SDT difference between ASD and controls (DSDT) generated by 1000 random permutations of sample labels. Vertical red line: observed DSDT value. (ii) Density plot of Euler characteristic difference between ASD and controls (D) generated by 1000 random permutations of sample labels. Vertical red line: observed D value. (Online version in colour.)
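The label-permutation procedure behind panels (c) and (f) can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the one-sided comparison, the (hits + 1) / (n_perm + 1) p-value convention, the function names, and the input values are all assumptions. The `toy_sdt` statistic is a stand-in that works only for points on a line, where the zero-dimensional death times are the gaps between consecutive sorted values and therefore sum to the range.

```python
import random

def permutation_p_value(group_a, group_b, statistic, n_perm=999, seed=0):
    """One-sided permutation test for statistic(group_a) - statistic(group_b)."""
    rng = random.Random(seed)
    observed = statistic(group_a) - statistic(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = statistic(pooled[:n_a]) - statistic(pooled[n_a:])
        if diff >= observed:
            hits += 1
    # Count the observed labelling as one permutation so p is never zero.
    return observed, (hits + 1) / (n_perm + 1)

def toy_sdt(values):
    """Stand-in for SDT on 1-D points: the sum of sorted gaps equals the range."""
    return max(values) - min(values)

# Invented values: one widely spread group and one tightly clustered group.
spread_group = [0.0, 4.0, 9.0, 15.0, 20.0, 25.0]
tight_group = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5]

observed, p_value = permutation_p_value(spread_group, tight_group, toy_sdt)
print(observed, p_value)
```

For real data, `statistic` would be replaced by a function that builds the group's distance matrix and returns its SDT or Euler characteristic; the density plots in the figure correspond to the distribution of `diff` across permutations.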