
LINEAGE: Label-free identification of endogenous informative single-cell mitochondrial RNA mutation for lineage analysis.

Li Lin1, Yufeng Zhang2, Weizhou Qian1, Yao Liu3, Yingkun Zhang1, Fanghe Lin1, Cenxi Liu2, Guangxing Lu2, Di Sun4, Xiaoxu Guo1, YanLing Song1, Jia Song5, Chaoyong Yang6,4, Jin Li7,8.   

Abstract

Single-cell RNA-sequencing (scRNA-seq) has become a powerful tool for biomedical research by providing a variety of valuable information with the advancement of computational tools. Lineage analysis based on scRNA-seq provides key insights into the fate of individual cells in various systems. However, such analysis is limited by several technical challenges. On top of the considerable computational expertise and resources, these analyses also require specific types of matching data such as exogenous barcode information or bulk assay for transposase-accessible chromatin with high throughput sequencing (ATAC-seq) data. To overcome these technical challenges, we developed a user-friendly computational algorithm called "LINEAGE" (label-free identification of endogenous informative single-cell mitochondrial RNA mutation for lineage analysis). Aiming to screen out endogenous markers of lineage located on mitochondrial reads from label-free scRNA-seq data to conduct lineage inference, LINEAGE integrates a marker selection strategy by feature subspace separation and de novo "low cross-entropy subspaces" identification. In this process, the mutation type and subspace-subspace "cross-entropy" of features were both taken into consideration. LINEAGE outperformed three other methods, which were designed for similar tasks as testified with two standard datasets in terms of biological accuracy and computational efficiency. Applied on a label-free scRNA-seq dataset of BRAF-mutated cancer cells, LINEAGE also revealed genes that contribute to BRAF inhibitor resistance. LINEAGE removes most of the technical hurdles of lineage analysis, which will remarkably accelerate the discovery of the important genes or cell-lineage clusters from scRNA-seq data.
Copyright © 2022 the Author(s). Published by PNAS.

Keywords:  BRAF inhibitor resistance; lineage analysis; single-cell RNA-seq

Year:  2022        PMID: 35086932      PMCID: PMC8812554          DOI: 10.1073/pnas.2119767119

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   12.779


Lineage analysis is an important assay for developmental biology, cancer biology, etc. Classical lineage analysis in developmental biology study is hypothesis driven and relies on the “pulse-chase” model with an inducible Cre-LoxP system (1, 2). This system labels the progenitors permanently so the source of a mature cell type can be identified via lineage tracing. Lineage analysis also can be used to identify the genes correlated to the clonal evolution of cancer cells upon treatment, which can be done on either primary cancer samples or cancer cell lines based on the somatic mutations. Traditionally, lineage analysis is time consuming, technically challenging, and in demand of much preexisting knowledge. A tool is desired to simplify the lineage analysis on complex tissues. Single-cell RNA-sequencing (scRNA-seq) has become a powerful tool for biomedical research (3–6). Initially employed as an assay to identify clusters of cells with distinct transcriptomic features, scRNA-seq data now can provide additional information with the advancement of computational tools (7–10). Because scRNA-seq can profile many cell types simultaneously, potentially, it is a useful tool for lineage analysis. However, this application normally relies on exogenous barcodes created by transforming barcode libraries (11) or Cas9-based genome-editing tools (12). These cellular barcodes therefore are used as reference for clonal clustering. Regardless of the complexity of performing such experiments, it is impossible to directly analyze the lineage information of clinical samples with this strategy. We therefore aim to simplify the lineage analysis by developing a user-friendly computational algorithm based on endogenous markers of scRNA-seq data. The whole-genome RNA single nucleotide polymorphisms (SNP) has been used as endogenous markers for lineage study (13), though the requirement of high sequencing coverage limits its application. 
In comparison, the mitochondrial genome is relatively small, so mitochondrial RNA variants are a promising source of endogenous markers. We have shown that lineage analysis with mitochondrial RNA variants is possible on scRNA-seq data from human islets (14). The challenge in this approach is to identify the informative variants (mutations) for subsequent inference without knowing the features of the clones in the cells. Ludwig et al. (15) addressed this problem by first analyzing bulk assay for transposase-accessible chromatin with high throughput sequencing (ATAC-seq) data from the same sample, which provides reliable and informative mitochondrial genome variants as a reference. Clustering is then performed on scRNA-seq data based on the variants identified from the bulk ATAC-seq data. The requirement of parallel bulk ATAC-seq limits the application of this strategy, and a tool based solely on scRNA-seq is more desirable. The bottleneck of performing de novo, label-free lineage analysis solely on scRNA-seq is to identify the informative variants for clones, the so-called clonal features, without preexisting knowledge. Clonal features are sparser than expression features and more sensitive to sequencing errors and coverage. Because of these properties, clonal features cannot be identified with the strategies used for expression features of scRNA-seq data. We therefore developed an algorithm to call clonal features efficiently based on “low cross-entropy subspace” separation and identification. Instead of using whole-genome SNPs as endogenous markers (13), we used mitochondrial RNA SNPs to avoid the limitation of sequencing coverage. The initial feature calling still showed a non-negligible level of noise due to sequencing errors and low coverage. 
We therefore improved the feature-selection process by taking mutation type and subspace–subspace cross-entropy into consideration. Cross-entropy is a loss function that can be used to quantify the difference between two probability distributions (16). Based on this concept from information theory, we defined a “cross-entropy” measure to quantify the difference of the embedded cluster structures among subspaces. The “low cross-entropy subspaces” discovered with this strategy could capture lineage information from scRNA-seq well. We also integrated an optimized consensus-clustering process with a refinement step to further capture and refine the lineage structure from the “low cross-entropy subspaces.” This computational algorithm, called label-free identification of endogenous informative single-cell mitochondrial RNA mutation for lineage analysis (LINEAGE), performs de novo label-free lineage analysis of scRNA-seq data based on the selection of endogenous lineage markers. We applied LINEAGE to a label-free BRAF inhibitor resistance study to identify and validate genes associated with resistance in melanoma. We expect the application of LINEAGE to dramatically accelerate lineage analysis–related studies.

Results

Working Principle of the Lineage Analysis in LINEAGE.

Full-length scRNA-seq data generated by the Smart-seq2 (17) protocol were used for lineage analysis. The mitochondrial RNA variants were called, and the allele frequency (AF_{x,b}) was calculated to produce the variant-frequency matrix (Fig. 1; our preprocessing code can be downloaded at https://github.com/songjiajia2018/ppl).
Fig. 1.

A schematic representation of LINEAGE. (A) The whole analysis process of LINEAGE. Using full-length scRNA-seq dataset as input, mitochondrial RNA variants are called and the variant-frequency matrix is generated for lineage inference. (B) Feature selection. LINEAGE firstly screens highly variable variants across cells with different mutation types and then separates the merged highly variable variant-frequency matrix into subspaces according to their dynamic frequency patterns across cells. Subspace–subspace cross-entropy calculation is then conducted based on ARI calculation among clusters from different subspaces to find out the “low cross-entropy subspaces,” which show higher consensus among subspaces than other subspaces. (C) Consensus clustering. LINEAGE learns a strong, informative similarity matrix by using similarity and cell group information from selected low–cross-entropy subspaces. LINEAGE then applies the learned similarity for initial cell-clustering and group marker identification. The group markers are then used as lineage-related mutations to refine the inference.

Due to its nature, variant frequency is much noisier than gene expression. This noise may come from many sources, including sequencing errors and low coverage. Therefore, variant-related analysis (18–22) normally needs bulk sequencing data with high coverage. However, scRNA-seq data are highly sparse, which makes feature selection difficult. Here, we performed feature selection on the variant frequency from cells containing distinct clones with Seurat version 3.0/version 4.0 (8, 23) and Entropy subspace separation-based clustering for noise reduction (ENCORE). Unfortunately, neither of these computational algorithms managed to identify features with clonal information. These results suggested that traditional feature-selection strategies, which were designed for screening expression features, need to be revised for the analysis of variant frequency. 
To address this point, LINEAGE (https://github.com/songjiajia2018/LINEAGE) uses a feature-selection strategy that efficiently picks out lineage-informative variants (defined as clonal features) from scRNA-seq datasets. First, the variant-frequency matrix is separated into 12 submatrices according to mutation type, and the 20 most highly variable sites are identified in each submatrix. A merged frequency matrix with 12 × 20 = 240 highly variable variants is generated. This process guarantees that the initially selected highly variable features contain variants with different mutation types, so that the downstream analysis is less likely to be misled by mutation type–specific systematic noise such as sequencing errors. Then, we applied the same hypothesis as ENCORE, namely that features with similar dynamic patterns tend to capture similar cell cluster structures. Subspace separation was therefore performed on the merged matrix to generate 20 subspaces based on the dynamic patterns of variant frequency. In this way, variants with similar frequency dynamic patterns are clustered into the same subspace, and cluster signals resulting from different events (noise, lineage, or other events) tend to be clearly separated. To find informative feature subspaces for lineage inference, LINEAGE defines a “cross-entropy” among subspaces and picks out “low cross-entropy subspaces” as informative subspaces. In detail, the consensus status among subspaces is indicated by “cross-entropy,” which is defined based on the adjusted Rand index (ARI, detailed in ). This is based on the hypothesis that cluster structures with more consensus among subspaces are more likely to be generated by informative events such as lineage structures. By default, the six subspaces with the lowest cross-entropy are selected for downstream analysis (Fig. 1). 
Then LINEAGE performed consensus clustering to get the initial clusters with clonal information based on these “low cross-entropy subspaces.” Candidate endogenous markers were subsequently identified for each candidate cluster. To improve the accuracy, LINEAGE applied a refinement process by refining the distance between cells based on these marker variants (Fig. 1). The final result of clonal identification was presented as t-distributed stochastic neighbor embedding (t-SNE)/Uniform Manifold Approximation and Projection (UMAP) plot as well as heatmap. The details of variant-frequency matrix generation, subspace separation, low cross-entropy subspace selection, and consensus clustering are described in .

LINEAGE Identified Clones from ScRNA-Seq Data in a De Novo and Label-Free Fashion.

We first tested LINEAGE on a simulated dataset, which consists of two human melanoma cell lines, A375 and 451Lu (24) (dataset description is detailed in ). LINEAGE separated the cells from the different cell lines accurately. To further test the performance of LINEAGE on datasets with cells from close lineages, we tested LINEAGE on a scRNA-seq dataset (named TF1 clones) containing 70 cells with exogenous barcoding from three clones and a more complicated scRNA-seq dataset (named TF1 barcoding) containing 158 cells with exogenous barcoding from 11 clones (15) (dataset description is detailed in ). In both cases, six subspaces were selected. We found that the lineage information of different clones was captured in different subspaces (Fig. 2). In comparison, the unselected subspaces (high–cross-entropy subspaces) showed little lineage structure.
Fig. 2.

Low–cross-entropy subspaces selected by LINEAGE. (A) Low–cross-entropy subspaces of a scRNA-seq dataset (TF1 clones) containing 70 cells with exogenous barcoding from three clones. The different clones are labeled in different colors. The distinctive clone groups in the subspaces are circled in red. (B) Low–cross-entropy subspaces of a scRNA-seq dataset (TF1 barcoding) containing 158 cells with exogenous barcoding from 11 clones. The different clones are labeled in different colors. The distinctive clone groups in the subspaces are circled in red.

The capability of accurate lineage inference was compared among LINEAGE and three other methods: Ludwig et al. (15) (mitochondrial SNPs from both bulk ATAC-seq + scRNA-seq), trajectory inference based on SNP information (TBSP) (13) (whole-genome RNA SNPs + expression), and Seurat version 3 (23) (gene expression). In the dataset containing 70 cells/three clones, LINEAGE outperformed the other methods by correctly sorting all the cells into their corresponding clones (Fig. 3). For the dataset containing 158 cells/11 clones, LINEAGE identified clones with accuracy comparable to that of Ludwig et al. (15) and much higher accuracy than the other two methods (Fig. 3). Notably, there is a large overlap between the markers selected by Ludwig et al. (15) and those selected by LINEAGE, although LINEAGE does not require high-depth bulk ATAC-seq data from the same samples.
Fig. 3.

Performance comparison among four methods. (A) Performance comparison on a standard dataset with three clones. Clone information is labeled by “Clones” annotation bars above the heatmap. The cluster groups inferred by TBSP and Seurat are also labeled by the “Group” annotation bars above the heatmap. (B) Performance comparison on a standard dataset with 11 clones. Clone information is labeled by “Clones” annotation bars above the heatmap. The cluster groups inferred by TBSP and Seurat are also labeled by the “Group” annotation bars above the heatmap. (C) Performance comparison based on NNE, which was inferred with the t-SNE distribution, resulted from all of the four methods.

The performance of the four methods was also quantified by the Nearest Neighbor Error (NNE) (25), which represents the erroneous neighbor relationships among cells captured by each method (Fig. 3). The calculation process of NNE is detailed in . LINEAGE performed as well as Ludwig et al. (15), as their NNE scores were comparable. The running time and the required input data of these four methods are shown in . LINEAGE has higher computational efficiency than the other variant-based methods. In general, LINEAGE can perform lineage analysis well on label-free scRNA-seq data alone, without requiring preexisting bulk ATAC-seq data or exogenous barcodes.

LINEAGE Reveals Transcriptomic Features of BRAF Inhibitor–Resistant Clones in Cancer Cells with BRAF V600E Mutation.

We then applied LINEAGE on a scRNA-seq dataset of BRAF V600E mutated melanoma cells 451Lu, which contains parental cells and BRAF inhibitor–resistant cells (24). By performing lineage analysis, we aimed to analyze the clonal evolution with BRAF inhibitor treatment and identify the genes correlated to BRAF inhibitor resistance. Two clusters with distinct clonal features were discovered (Fig. 4 and ), defined as either sensitive Cluster A (the majority are parental cells) or resistant Cluster B (the majority are BRAF inhibitor–resistant cells). The differential distribution of parental and resistant cells in Clusters A and B indicated that the clonal evolution process happened in the selection process. Gene-expression comparison was performed between Clusters A and B. In total, 64 significantly changed genes were found (; method is detailed in ). The originally reported resistant gene DCT showed elevation in both sensitive and resistant clones, indicating that it may not be directly correlated to the clonal evolution (Fig. 4).
Fig. 4.

LINEAGE identified important clonal evolution–related genes from a cancer dataset. (A) The lineage analysis result visualized by t-SNE plot from LINEAGE. Cells are labeled according to its BRAF inhibitor resistance status. (B) The lineage tree from LINEAGE. Cells are labeled according to its BRAF inhibitor resistance status as well as clonal status. (C) The expression levels of DCT and GSTP1 across resistant and clonal status (P = 0.39 for "ns" in cluster A, P = 0.22 for "ns" in cluster B). (D) Gene ontology enrichment results of 64 differential expressed genes. (E) Cell viability determination under treatment of BRAF inhibitor Vemurafenib and GST inhibitor GSTO-IN-2 in two melanoma cell lines carrying BRAF V600E mutation (A2058: 4 μM V,1.25 μM G and A375: 8 μM V, 2.5 μM G. *P < 0.05, ****P < 0.0001).

Many of these genes were enriched in the gene ontology (GO) term “GO_CC: MITOCHONDRION” with Gene Set Enrichment Analysis (Fig. 4) (26). Since the connection between mitochondria and BRAF inhibitor resistance has been widely observed (27, 28), this result supports the validity of LINEAGE. We then explored the detailed mechanism of the mitochondria–BRAF connection by focusing on the top differentially expressed gene GSTP1 (Fig. 4), which encodes an important redox regulator, glutathione-S-transferase (GST). It is known that GST inhibitors can regulate pigment generation in melanocytes (29). It has also been reported that disruption of redox balance, which is an important function of GST, may overcome resistance to BRAF inhibitors in melanoma cells (30). Consistently, combined treatment with the BRAF inhibitor Vemurafenib and the GST inhibitor GSTO-IN-2 (31) synergistically decreased the cell viability of two melanoma cell lines (Fig. 4). GST may therefore be a target to induce synthetic lethality in BRAF V600E mutated cancer cells.

Discussion

Many lineage analysis–related studies, such as carcinogenesis studies, cancer resistance studies (32), or even developmental biology studies (33), have used scRNA-seq to understand the detailed mechanism. However, the requirement of exogenous barcode prevents many scientists from performing such kind of studies. Although several computational algorithms claimed that they can perform lineage analysis without exogenous barcodes, the requirement of preexisting knowledge such as parallel bulk WGS/ATAC-seq data on the same samples is certainly a technical challenge. In addition, the demand of extensive computational expertise and resource is also beyond many laboratories’ capability. We created a “low cross-entropy subspace” separation and consensus clustering–based analysis as LINEAGE. LINEAGE uses informative mitochondrial RNA variants as endogenous markers, which has relatively small size, and its polymorphism is a frequent event across different tissues and ages (34). In comparison to the method from Ludwig et al. (15), LINEAGE simplifies the endogenous markers identification process and can be applied to scRNA-seq studies without preexisting bulk WGS/ATAC-seq data. In comparison to TBSP, LINEAGE requires shorter running time and has largely improved performance. We tested LINEAGE on a classical clonal evolution study of BRAF inhibitor–resistant melanoma cells. By analyzing the label-free scRNA-seq dataset from this study, we discovered the sensitive and resistant clones. Differential expression analysis identified GSTP1 as a BRAF V600E mutation resistance–related gene. GST inhibitor can sensitize melanoma cells to BRAF inhibitor treatment. Therefore, it may serve as the target to develop synthetic lethality therapies for BRAF V600E mutated cancer cells if this result can be validated by in vivo experiment. 
In summary, LINEAGE removes most of the technical hurdles of performing lineage analysis through a “low cross-entropy subspace” separation and consensus clustering–based analysis. Because variant calling requires sufficient sequencing depth, it is still difficult to perform lineage analysis on data from 3′/5′-end-directed scRNA-seq technologies. However, with LINEAGE, it is possible to perform lineage analysis on the many existing Smart-seq2 datasets if desired. Biologists can spend their time and energy on answering biological questions instead of establishing complex labeling systems or performing intensive computational analysis. The application of LINEAGE may remarkably accelerate the discovery of important genes or cell clusters in diverse contexts of biomedical research.

Materials and Methods

Data Preprocessing.

Read alignment.

All scRNA-seq datasets used in this study were obtained from National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/) with following accessions: Gene Expression Omnibus (GEO): GSE115218 (TF1_clones_scRNA, TF1_barcoding_scRNA) (15) and GEO: GSE108383 (scRNA-seq of A375 and 451Lu cell lines were used) (24). The reads were aligned to the GRCh38 human genome and its associated annotations (GRCh38.98) using Spliced Transcripts Alignment to a Reference (STAR) version 2.7.1a (35) with default parameters.

Mitochondrial genotype matrix generating.

A bam file consisting of mitochondrial DNA records, which were extracted from the alignment result with Samtools (version 1.9) (36), was obtained. The total number of reads aligned to each allele at each site of the mitochondrial genome was counted using a Python script (15). Here, nucleotides with a base quality or read alignment quality <30 were filtered out. The variant frequency ($AF_{x,b}$) was defined as follows:

$$AF_{x,b} = \frac{R_{x,b}}{\sum_{b'} R_{x,b'}}$$

where $R_{x,b}$ is the number of reads holding base $b$ at position $x$, and the denominator is the total coverage of position $x$. The mitochondrial genotype matrix (M), in which a column represents a single cell and a row represents the variant frequency of a specific mitochondrial genotype, was thus generated.
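The variant frequency described above is a per-position ratio of reads carrying one base to the total coverage. The following is a minimal sketch of that calculation, not the authors' preprocessing script: the function name, the dictionary input layout, and the handling of zero-coverage positions are assumptions made for illustration.

```python
BASES = "ACGT"

def allele_frequencies(read_counts):
    """Return the frequency of every base at one position of one cell.

    read_counts maps a base to the number of quality-filtered reads
    carrying that base (a hypothetical input layout for this sketch).
    """
    coverage = sum(read_counts.get(base, 0) for base in BASES)
    if coverage == 0:
        # Assumption: positions without coverage get frequency 0.
        return {base: 0.0 for base in BASES}
    return {base: read_counts.get(base, 0) / coverage for base in BASES}

# Toy counts for one cell at one mitochondrial position.
example_af = allele_frequencies({"A": 90, "G": 10})
```

Repeating this for every position and cell yields one column of the genotype matrix per cell.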

Highly variable site identification.

To screen out highly variable sites with different mutation types, M was split into 12 submatrices (Mi, i = 1, 2, 3…12) according to mutation types. The 20 highest-variable sites were then called by identifying the rows with highest SD across cells in each submatrix. Considering the sparsity of the matrix, which heavily affected the SD values, LINEAGE transformed the zeroes into ones when the median of the nonzero frequencies in the same row ≥0.6. Thus, a merged submatrix M240 of M was obtained by merging the resulted highly variable sites into a matrix.
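The selection step can be sketched as follows. This is an illustrative reading of the description, not the LINEAGE source code: the `(position, ref, alt)` row keys, the function names, the use of the population SD, and applying the zero-to-one transformation only when ranking sites are all assumptions.

```python
import statistics
from collections import defaultdict

def fill_sparse_zeros(freqs, threshold=0.6):
    """Turn zeroes into ones when the median nonzero frequency is >= threshold."""
    nonzero = [f for f in freqs if f > 0]
    if nonzero and statistics.median(nonzero) >= threshold:
        return [1.0 if f == 0 else f for f in freqs]
    return list(freqs)

def top_variable_sites(matrix, n_top=20):
    """matrix maps (position, ref, alt) to a list of per-cell frequencies.

    Sites are grouped by mutation type (ref, alt) and the n_top sites with
    the highest SD across cells are kept from each group.
    """
    by_type = defaultdict(list)
    for (pos, ref, alt), freqs in matrix.items():
        sd = statistics.pstdev(fill_sparse_zeros(freqs))
        by_type[(ref, alt)].append((sd, (pos, ref, alt)))
    selected = []
    for sites in by_type.values():
        sites.sort(key=lambda item: item[0], reverse=True)
        selected.extend(key for _, key in sites[:n_top])
    return selected

# Toy matrix with two mutation types and four cells.
toy_matrix = {
    (100, "A", "G"): [0.0, 0.9, 0.0, 0.8],  # zeroes become ones before the SD
    (200, "A", "G"): [0.0, 0.5, 0.0, 0.4],
    (300, "C", "T"): [0.1, 0.1, 0.1, 0.1],
}
selected = top_variable_sites(toy_matrix, n_top=1)
```

In the toy matrix, the site at position 100 loses its apparent variability once its zeroes are replaced, so position 200 is kept for the A>G type.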

Subspace Separation.

Hierarchical clustering was carried out to re-separate M240 into 20 submatrices according to the frequency dynamic patterns. In detail, the similarity between pairs of rows in M240 was quantified by Pearson's correlation. The distance matrix (D) was then generated as:

$$D = J - S$$

where J is an all-ones matrix and S is the similarity matrix. Hierarchical clusters were obtained based on this distance matrix, and the feature spaces with features (variants) from each resulting cluster were defined as “feature subspaces.” In this way, highly variable sites with similar frequency dynamic patterns are grouped into the same feature subspace; thus, similar clonal cluster signals can be concentrated in the same subspace.
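A small sketch of this step is given below, taking the distance between two variants as one minus their Pearson correlation (the all-ones matrix J minus the similarity matrix S). The linkage criterion is not stated in the text, so average linkage is used here purely as an assumption, and all function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long frequency rows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: constant rows are treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def distance_matrix(rows):
    """Distance between rows i and j is 1 - Pearson similarity."""
    n = len(rows)
    return [[1.0 - pearson(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def average_linkage(dist, k):
    """Agglomerative clustering of row indices, stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

# Four toy variants across four cells: two opposite frequency patterns.
toy_rows = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.1, 0.9, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.1],
]
subspaces = average_linkage(distance_matrix(toy_rows), k=2)
```

Each resulting group of row indices plays the role of one feature subspace; with 240 variants, `k` would be 20.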

Entropy evaluation.

In each subspace, LINEAGE used t-SNE followed by k-means clustering to realize cell-clustering (we defined these cluster results as Csubs). The k, which indicated the number of clusters, in the k-means clustering process was set to 3 for datasets with cells >100 and otherwise was set to 2. To effectively identify subspaces that might contain clone lineage information, LINEAGE calculated ARI between pairs of subspaces as consensus indicator and got the consensus index matrix as A. LINEAGE defined entropy based on the cell distribution similarity among subspaces, so called “cross-entropy” here. The subspaces with lowest “cross-entropy” (I, i = 1, 2, 3…20), which indicated highest consensus with other subspaces, were selected for subsequent consensus clustering. In this study, six low–cross-entropy subspaces were selected in all cases, and the number of subspaces can be adjusted by parameter. Here, I was defined as:
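The consensus index matrix A can be illustrated with a plain-Python ARI. The sketch below only builds A and ranks subspaces by their mean ARI with the other subspaces; this ranking is a stand-in chosen for illustration and is not the cross-entropy definition used by LINEAGE. Function names are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both clusterings are trivial and identical
    return (sum_ij - expected) / (max_index - expected)

def consensus_matrix(clusterings):
    """Pairwise ARI between the cluster labels of every pair of subspaces."""
    m = len(clusterings)
    return [[adjusted_rand_index(clusterings[i], clusterings[j])
             for j in range(m)] for i in range(m)]

def rank_by_consensus(A):
    """Order subspace indices from highest to lowest mean ARI with the others.

    Assumption: mean ARI is used here only as a simple consensus proxy.
    """
    m = len(A)
    mean_ari = [sum(A[i][j] for j in range(m) if j != i) / (m - 1)
                for i in range(m)]
    return sorted(range(m), key=lambda i: mean_ari[i], reverse=True)

# Cluster labels of six cells in three toy subspaces.
toy_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels swapped
    [0, 1, 0, 1, 0, 1],  # unrelated partition
]
A = consensus_matrix(toy_clusterings)
ranking = rank_by_consensus(A)
```

The first two subspaces agree perfectly despite swapped labels, while the third shows no consensus with either and is ranked last.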

Consensus Clustering.

Then, LINEAGE generated a combined distance matrix as follows:where D is the distance matrix from a selected subspace. Meanwhile, LINEAGE also characterized consensus information across subspaces by calculating a consensus-factor matrix based on the clustering results (Csubs) from the selected subspaces: By integrating distance information and consensus information, LINEAGE generated a more-integrative distance matrix D as follows:where J is an all-ones matrix. Basing on distance matrix D, t-SNE followed by k-means clustering strategy was used to infer the initial consensus cell clone clusters. Here, an adaptive density peak detection algorithm implemented in the ADPclust package in R (37) was integrated to accurately infer the number of clusters.

Marker Variants Identification.

Group marker identification.

To screen out the marker variants, LINEAGE transformed the consensus-clustering result into a binary cluster: for a cell group, if a cell is in this group, set its cluster label as 1; if not, set as 0. For each highly variable variant, a receiver operating characteristic (38) curve was built, and the area under the receiver operating characteristic (AUC) score was thus calculated with the frequency distribution as predictor and the binary cluster labels of each group as response. Pearson's correlation coefficient was also calculated between the binary cluster labels and the frequency distribution. Variants with P values <0.05 were considered as cell group markers. These markers were ranked by AUC scores, since higher AUC scores indicated more-reliable markers. Subsequently, 10 to 20 markers with the highest AUC scores were used to refine the consensus-clustering result.
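The AUC-based ranking can be sketched with the pairwise (Mann-Whitney) form of the AUC, using the variant frequency as predictor and the binary group label as response. The correlation P value filter is omitted here, and the variant names and function names are hypothetical.

```python
def auc_score(frequencies, in_group):
    """AUC of a variant's frequencies for separating group cells (1) from the rest (0)."""
    pos = [f for f, label in zip(frequencies, in_group) if label == 1]
    neg = [f for f, label in zip(frequencies, in_group) if label == 0]
    if not pos or not neg:
        raise ValueError("both group and non-group cells are required")
    # Count pairs where the group cell has the higher frequency; ties count half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def rank_markers(variant_freqs, in_group, n_top=10):
    """Return the n_top (AUC, variant) pairs with the highest AUC."""
    scored = [(auc_score(freqs, in_group), name)
              for name, freqs in variant_freqs.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n_top]

# Six toy cells; the first three belong to the cell group under test.
group_labels = [1, 1, 1, 0, 0, 0]
toy_variants = {
    "variant_1": [0.9, 0.8, 0.7, 0.1, 0.0, 0.2],  # separates the group
    "variant_2": [0.5, 0.1, 0.5, 0.5, 0.1, 0.5],  # uninformative
}
ranked = rank_markers(toy_variants, group_labels)
```

A variant that perfectly separates the group scores 1.0, while an uninformative one scores about 0.5.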

Refinement based on marker variants.

After the frequency submatrix consisting of the frequencies of the markers (Mm) was obtained, cells with zeroes on all markers were removed. LINEAGE integrated both t-SNE and UMAP (39) for dimension reduction and obtained refined consensus-clustering results from each dimension-reduction result separately.

Iterative optimization.

Considering the randomness from clustering and dimension-reduction processes, an iteration process was implemented in LINEAGE to guarantee a more-stable and -reliable cell-clustering/clone-tracing result. Based on the assumption that real clone clusters always show more-reliable markers and cell cluster information from more-effective subspaces with larger information capacity, a measurement Sscore was defined for optimization as follows:where is the sum of the 10 greatest AUC scores of inferred markers. Dscore is a score calculated in the refinement process. Concretely, LINEAGE calculated ARIs between the refined clustering results (resulted from t-SNE or UMAP-Kmeans procedures) and the clustering results in the selected subspaces (Csubs). Subspaces with ARI > 0.1 were recorded as effective subspaces and the number of effective subspaces was labeled as n. To evaluate the information capacity of the consensus results, a Dscore was defined to reflect the consensus among the refined results and the selected subspaces: where Bmax represents the maximum ARI between the refined consensus-clustering result and the clusters in effective subspaces; thus, n+Bmax indicates the consensus information capacity from effective subspaces. Meanwhile, represents the average of a submatrix of A, which consists of ARI values among effective subspaces, and DARI represents the information overlap status among effective subspaces. Lower DARI means higher overlap. In this way, consensus result-containing cluster structures from various subspaces with low-overlap cluster information is more preferred. Among the iteration with same or different parameters, the one with highest was reserved as the best result.

Methods Performance Evaluation and Comparison.

A simulated dataset was generated by mixing two human melanoma cell lines, A375 and 451Lu. The LINEAGE steps described above (Data Preprocessing, Subspace Separation, Consensus Clustering, and Marker Variants Identification) were carried out on the simulated dataset and the two standard benchmark datasets. The code for Seurat versions 3 and 4 (8, 23), ENCORE (40), the method developed by Ludwig et al. (15), and TBSP (13) was downloaded and run according to the respective manuals on the two benchmark datasets.

Cell Culture.

A2058 and A375 cells were cultured in Dulbecco's modified Eagle medium (Thermo Fisher Scientific) with 10% fetal bovine serum (Yeasen) and 5% penicillin/streptomycin (Gibco) at 37 °C with 5% CO2. Cell viability was assessed with Cell Counting Kit-8 (CCK-8; Yeasen) according to the manufacturer's instructions. Briefly, cells were seeded into 96-well plates at a density of 1,000 cells per well. Cells were treated with different chemical combinations (Vemurafenib, GSTO-IN-2; MCE) and examined at 0, 24, 48, and 72 h. At each time point, CCK-8 (10%) was added to the wells, and after incubation for 1 h at 37 °C, absorbance was measured at 450 nm with a Microplate Reader Infinite F50 (Tecan).
