Yang Yang, Hongjian Sun, Yu Zhang, Tiefu Zhang, Jialei Gong, Yunbo Wei, Yong-Gang Duan, Minglei Shu, Yuchen Yang, Di Wu, Di Yu.
Abstract
Transcriptomic analysis plays a key role in biomedical research. Linear dimensionality reduction methods, especially principal-component analysis (PCA), are widely used to detect sample-to-sample heterogeneity, while recently developed non-linear methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), can efficiently cluster heterogeneous samples in single-cell RNA sequencing analysis. However, the application of t-SNE and UMAP to bulk transcriptomic analysis, and their comparison with conventional methods, has not been reported. We compare four major dimensionality reduction methods (PCA, multidimensional scaling [MDS], t-SNE, and UMAP) in analyzing 71 large bulk transcriptomic datasets. UMAP is superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space. Importantly, UMAP generates sample clusters that uncover biological features and clinical meaning. We recommend deploying UMAP to visualize and analyze sizable bulk transcriptomic datasets to reinforce sample heterogeneity analysis.
Keywords: PCA; UMAP; bulk transcriptomics; clustering structure; dimensionality reduction; heterogeneity analysis; t-SNE
Year: 2021 PMID: 34320340 DOI: 10.1016/j.celrep.2021.109442
Source DB: PubMed Journal: Cell Rep Impact factor: 9.423
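
To make the kind of comparison described in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It embeds a synthetic samples-by-genes matrix in two dimensions with the two conventional methods (PCA and classical MDS) using NumPy only, and scores how well pre-defined groups separate in the embedding. The toy data generator, the function names, and the nearest-centroid score are all assumptions introduced for illustration; the abstract does not state which MDS variant or evaluation metric the study used. t-SNE and UMAP are omitted because they need external packages (for example `sklearn.manifold.TSNE` and `umap.UMAP` from umap-learn), whose 2D output could be passed to the same scoring function.

```python
import numpy as np


def pca_2d(X):
    """Project samples (rows of X) onto the top two principal components."""
    Xc = X - X.mean(axis=0)                      # center each gene
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :2] * S[:2]                      # sample scores on PC1, PC2


def classical_mds_2d(X):
    """Classical (Torgerson) MDS on Euclidean sample-to-sample distances."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared distances
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    B = -0.5 * J @ D2 @ J                              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:2]                   # two largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def nearest_centroid_accuracy(emb, labels):
    """Fraction of samples whose nearest 2D group centroid is their own group."""
    groups = np.unique(labels)
    centroids = np.stack([emb[labels == g].mean(axis=0) for g in groups])
    dist = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(groups[dist.argmin(axis=1)] == labels))


def make_toy_expression(n_per_group=30, n_genes=200, seed=0):
    """Hypothetical data: three groups, each with its own block of 10 shifted genes."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(3), n_per_group)
    X = rng.normal(size=(labels.size, n_genes))
    for g in range(3):
        X[labels == g, g * 10:(g + 1) * 10] += 3.0 + g  # group-specific shift
    return X, labels


if __name__ == "__main__":
    X, labels = make_toy_expression()
    for name, embed in [("PCA", pca_2d), ("classical MDS", classical_mds_2d)]:
        emb = embed(X)
        print(name, "nearest-centroid accuracy:", nearest_centroid_accuracy(emb, labels))
```

On Euclidean distances, classical MDS reproduces the PCA scores up to a sign flip per axis, so the two methods score identically in this sketch; MDS run on a different dissimilarity (for example a correlation-based distance) would generally give a different embedding.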