Li Lin, Yufeng Zhang, Weizhou Qian, Yao Liu, Yingkun Zhang, Fanghe Lin, Cenxi Liu, Guangxing Lu, Di Sun, Xiaoxu Guo, YanLing Song, Jia Song, Chaoyong Yang, Jin Li.
Abstract
Single-cell RNA-sequencing (scRNA-seq) has become a powerful tool for biomedical research, yielding a wide range of information as computational tools have advanced. Lineage analysis based on scRNA-seq provides key insights into the fate of individual cells in various systems. However, such analysis is limited by several technical challenges. In addition to considerable computational expertise and resources, these analyses require specific types of matching data, such as exogenous barcode information or bulk assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) data. To overcome these challenges, we developed a user-friendly computational algorithm called "LINEAGE" (label-free identification of endogenous informative single-cell mitochondrial RNA mutation for lineage analysis). LINEAGE screens label-free scRNA-seq data for endogenous lineage markers located on mitochondrial reads and uses them for lineage inference. It integrates a marker selection strategy based on feature subspace separation and de novo identification of "low cross-entropy subspaces." In this process, both the mutation type and the subspace-subspace "cross-entropy" of features are taken into consideration. On two standard datasets, LINEAGE outperformed three other methods designed for similar tasks in terms of biological accuracy and computational efficiency. Applied to a label-free scRNA-seq dataset of BRAF-mutated cancer cells, LINEAGE also revealed genes that contribute to BRAF inhibitor resistance. LINEAGE removes most of the technical hurdles of lineage analysis, which will accelerate the discovery of important genes or cell-lineage clusters from scRNA-seq data.
Keywords: BRAF inhibitor resistance; lineage analysis; single-cell RNA-seq
Year: 2022 PMID: 35086932 PMCID: PMC8812554 DOI: 10.1073/pnas.2119767119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. A schematic representation of LINEAGE. (A) The overall analysis workflow of LINEAGE. Using a full-length scRNA-seq dataset as input, mitochondrial RNA variants are called and a variant-frequency matrix is generated for lineage inference. (B) Feature selection. LINEAGE first screens for variants that are highly variable across cells, treating each mutation type separately, and then separates the merged highly variable variant-frequency matrix into subspaces according to their dynamic frequency patterns across cells. Subspace–subspace cross-entropy is then calculated from the ARI among clusters from different subspaces to identify the “low cross-entropy subspaces,” which show higher consensus with one another than the remaining subspaces do. (C) Consensus clustering. LINEAGE learns a strong, informative similarity matrix from the similarity and cell group information in the selected low–cross-entropy subspaces. LINEAGE then applies the learned similarity to initial cell clustering and group marker identification. The group markers are then used as lineage-related mutations to refine the inference.
Fig. 2. Low–cross-entropy subspaces selected by LINEAGE. (A) Low–cross-entropy subspaces of an scRNA-seq dataset (TF1 clones) containing 70 cells with exogenous barcoding from three clones. Clones are shown in different colors. The distinctive clone groups in the subspaces are circled in red. (B) Low–cross-entropy subspaces of an scRNA-seq dataset (TF1 barcoding) containing 158 cells with exogenous barcoding from 11 clones. Clones are shown in different colors. The distinctive clone groups in the subspaces are circled in red.
Fig. 3. Performance comparison among four methods. (A) Performance comparison on a standard dataset with three clones. Clone information is shown in the “Clones” annotation bars above the heatmap. The cluster groups inferred by TBSP and Seurat are shown in the “Group” annotation bars above the heatmap. (B) Performance comparison on a standard dataset with 11 clones. Clone information is shown in the “Clones” annotation bars above the heatmap. The cluster groups inferred by TBSP and Seurat are shown in the “Group” annotation bars above the heatmap. (C) Performance comparison based on NNE, inferred from the t-SNE distribution produced by each of the four methods.
Fig. 4. LINEAGE identified important clonal evolution–related genes from a cancer dataset. (A) The lineage analysis result from LINEAGE, visualized as a t-SNE plot. Cells are labeled according to their BRAF inhibitor resistance status. (B) The lineage tree from LINEAGE. Cells are labeled according to their BRAF inhibitor resistance status and clonal status. (C) The expression levels of DCT and GSTP1 across resistance and clonal status (P = 0.39 for "ns" in cluster A, P = 0.22 for "ns" in cluster B). (D) Gene ontology enrichment results for 64 differentially expressed genes. (E) Cell viability under treatment with the BRAF inhibitor Vemurafenib (V) and the GST inhibitor GSTO-IN-2 (G) in two melanoma cell lines carrying the BRAF V600E mutation (A2058: 4 μM V, 1.25 μM G; A375: 8 μM V, 2.5 μM G. *P < 0.05, ****P < 0.0001).
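The subspace consensus idea described in Fig. 1B can be illustrated with a minimal Python sketch. It is not the LINEAGE implementation: it assumes each subspace has already been clustered into per-cell labels, and it uses the mean pairwise adjusted Rand index (ARI) as a simple stand-in for the consensus score, since the exact cross-entropy formula is not given here. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    # Pairs of cells grouped together in both clusterings.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of cells grouped together within each clustering.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both clusterings are a single group).
        return 1.0
    return (index - expected) / (max_index - expected)


def rank_subspaces_by_consensus(subspace_labels):
    """Return (subspace name, mean ARI to all other subspaces), highest first.

    Assumption: mean pairwise ARI is used as the consensus score.
    """
    totals = {name: 0.0 for name in subspace_labels}
    for a, b in combinations(subspace_labels, 2):
        ari = adjusted_rand_index(subspace_labels[a], subspace_labels[b])
        totals[a] += ari
        totals[b] += ari
    others = max(len(subspace_labels) - 1, 1)
    scores = [(name, total / others) for name, total in totals.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy cluster labels for six cells in three hypothetical subspaces.
toy_labels = {
    "s1": [0, 0, 0, 1, 1, 1],
    "s2": [1, 1, 1, 0, 0, 0],  # same partition as s1, labels swapped
    "s3": [0, 1, 0, 1, 0, 1],  # disagrees with s1 and s2
}
ranked = rank_subspaces_by_consensus(toy_labels)
```

In this toy case, s1 and s2 agree perfectly with each other and rank above s3, so a selection step that keeps the top-ranked subspaces would retain s1 and s2.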