Meng Yuan1, Xujiang Yang1, Jinghua Lin2, Xiaolong Cao1, Feng Chen1, Xiaoyu Zhang1, Zizhang Li1, Guifeng Zheng2, Xueqin Wang3, Xiaoshu Chen4, Jian-Rong Yang5. 1. Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China. 2. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China. 3. Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China. 4. Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China. Electronic address: chenxshu3@mail.sysu.edu.cn. 5. Department of Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; RNA Biomedical Institute, Sun Yat-Sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China; Key Laboratory of Tropical Disease Control, Ministry of Education, Sun Yat-sen University, Guangzhou 510080, China. Electronic address: yangjianrong@mail.sysu.edu.cn.
Abstract
A full understanding of the developmental process requires fine-scale characterization of cell divisions and cell types, which are naturally organized as the developmental cell lineage tree (CLT). Technological breakthroughs facilitated determination of more CLTs, but complete comprehension of the data remains difficult without quantitative comparison among CLTs. We hereby quantified phenotypic similarity between CLTs using a novel computational method that exhaustively searches for optimal correspondence between individual cells meanwhile retaining their topological relationships. The revealed CLT similarities allowed us to infer functional similarity at the transcriptome level, identify cell fate transformations, predict functional relationships between mutants, and find evolutionary correspondence between cell types of different species. By allowing quantitative comparison between CLTs, our work is expected to greatly enhance the interpretability of relevant data and help answer the myriad of questions surrounding the developmental process.
A full understanding of the developmental process requires fine-scale characterization of cell divisions and cell types, which are naturally organized as the developmental cell lineage tree (CLT). Technological breakthroughs facilitated determination of more CLTs, but complete comprehension of the data remains difficult without quantitative comparison among CLTs. We hereby quantified phenotypic similarity between CLTs using a novel computational method that exhaustively searches for optimal correspondence between individual cells meanwhile retaining their topological relationships. The revealed CLT similarities allowed us to infer functional similarity at the transcriptome level, identify cell fate transformations, predict functional relationships between mutants, and find evolutionary correspondence between cell types of different species. By allowing quantitative comparison between CLTs, our work is expected to greatly enhance the interpretability of relevant data and help answer the myriad of questions surrounding the developmental process.
The life of multicellular organisms typically starts from a zygote, which undergoes multiple rounds of cell divisions and simultaneous differentiation and eventually develops into an individual organism with multiple types of cells. The developmental cell lineage tree (CLT) is a record of both the differentiation result of the cells appearing at a specific developmental time point (cell types the terminal nodes of the CLT) and the cell division events since the zygote that led to these cells (topology of the CLT) (Figure 1A). A more generalized CLT does not necessarily root at the zygote but may start from any dividable cell, in which case it is a subtree of the CLT rooted at the zygote or a sub-CLT (Figure 1A). As one of the most important traits of multicellular organisms, the CLT is the key to resolving many significant problems in the life sciences. For example, developmental CLTs record the process of development (Du et al., 2014, 2015; Junker et al., 2017; Kalhor et al., 2017; McKenna et al., 2016; Santella et al., 2016; Sulston et al., 1983) and help understanding the mechanism of developmental robustness (Reizel et al., 2012; Salipante et al., 2010; Yang et al., 2014). Other types of CLTs reveal the origin of relapsed or metastatic tumor cell populations (Frumkin et al., 2008; Shlush et al., 2012), the risk of carcinogenesis attributable to the number of cell divisions since the zygote (Tomasetti et al., 2017; Wasserstrom et al., 2008), and the origin and evolution of cell types and lineages (Arendt et al., 2016; Lescroart et al., 2015).
Figure 1
Overview of the DELTA Algorithm
(A) A very simple typical developmental CLT rooted at the zygote (“Z”). Cells undergoing further division are represented by internal nodes and others by terminal nodes. The cell types of the terminal nodes are indicated by the colors in the legend on the right, whereas the type of the internal cells, as inferred from the cell type of the two daughter cells, are also indicated by different node colors. The depth, or the number of divisions since the zygote, of a cell is indicated by the vertical axis. A subtree, or sub-CLT, is outlined by a dotted box.
(B) Two CLTs to be aligned, Q and S, are presented with their terminal cell types color coded. DELTA aligns them globally (DELTA-g), where all cells in respective CLTs are either pruned or aligned, or locally (DELTA-l), where only pairs of sub-CLTs with good enough alignments are reported (see Figure S1A for more details).
(C) The DELTA alignment of the C. elegans CLT of standard anatomical terminal cell type annotation, with an isomorphic version of itself, where 30% of randomly chosen sister sub-CLT pairs were swapped. The resulting CLT alignments were visualized by our newly developed R package, “ggVITA” (see also Figure S1C). See also Figure S1.
Overview of the DELTA Algorithm(A) A very simple typical developmental CLT rooted at the zygote (“Z”). Cells undergoing further division are represented by internal nodes and others by terminal nodes. The cell types of the terminal nodes are indicated by the colors in the legend on the right, whereas the type of the internal cells, as inferred from the cell type of the two daughter cells, are also indicated by different node colors. The depth, or the number of divisions since the zygote, of a cell is indicated by the vertical axis. A subtree, or sub-CLT, is outlined by a dotted box.(B) Two CLTs to be aligned, Q and S, are presented with their terminal cell types color coded. DELTA aligns them globally (DELTA-g), where all cells in respective CLTs are either pruned or aligned, or locally (DELTA-l), where only pairs of sub-CLTs with good enough alignments are reported (see Figure S1A for more details).(C) The DELTA alignment of the C. elegans CLT of standard anatomical terminal cell type annotation, with an isomorphic version of itself, where 30% of randomly chosen sister sub-CLT pairs were swapped. The resulting CLT alignments were visualized by our newly developed R package, “ggVITA” (see also Figure S1C). See also Figure S1.Since the resolution of the first complete cell lineage tree in Caenorhabditis elegans (Sulston et al., 1983), technological advancements ranging from 3D time-lapse imaging (Gritti et al., 2016) to genome editing in combination with single-cell high-throughput sequencing (Junker et al., 2017; Kalhor et al., 2017; McKenna et al., 2016; Raj et al., 2018a, 2018b) had fueled the accumulation of more CLT data. However, a general computational framework for quantitative comparison of CLTs has been lacking. Take the classical CLT of Caenorhabditis elegans for example, phenotypic comparison and functional inference were previously made on the predefined lists of developmental phenotypes (Gunsalus et al., 2005; Piano et al., 2002). This approach had not fully utilized the rich information embedded in the CLT and cannot reveal finer scale correspondence between individual cells. Quantitative comparison of CLTs should facilitate quality assessment of CLT data, relating new observations to the known, disentangling variation from the consensus, and evolutionary comparative studies.To address this critical demand, we designed Developmental Cell Lineage Tree Alignment (DELTA), an algorithm that aligns a pair of CLTs and quantifies the similarity between them by identifying homeomorphic sub-CLTs, with the assumption that similar genetic (or developmental) programs should give rise to similar sub-CLTs (Azevedo et al., 2005; Yang et al., 2014). A critical feature of DELTA is its compatibility with classical CLTs (such as those of nematodes) and the genome-editing-based lineage CLTs, as it only required the topology and the terminal cell types of the CLT to work. Using simulated CLTs (Lohaus et al., 2007) and real CLTs from C. elegans (Murray et al., 2012), we showed that homeomorphic sub-CLTs found by DELTA have highly similar expression profiles. Comparisons among CLTs of the wild-type and single-gene knockdown strains of C. elegans (Santella et al., 2016) revealed both known (Du et al., 2015) and novel homeotic transformations of cell fates in the knockdown strains and suggested for the knockdown genes functional relationships compatible with evolutionary and experimental evidence. Finally, we compared the developmental CLTs of two nematodes and were able to pinpoint the evolutionary changes in fates between cells in these two CLTs. By maximizing the alignment score between real CLTs of the two species, we found biologically interpretable correspondence between their nonuniformly defined cell types, highlighting a conceptually new way of inferring the evolutionary relationship between cell types. Together, these results recapitulated known developmental patterns and demonstrated the usefulness of DELTA. In the way that sequence alignment algorithms fundamentally transformed genetics, CLT comparison/alignment enabled by DELTA will likely lead to new opportunities for a deeper understanding of the biology of multicellular organisms, such as assessing the repeatability of differentiation, linking sub-CLTs to developmental programs, and distinguishing autonomous and regulatory components involved in development.
Results
Overview of the DELTA Algorithm
A typical developmental CLT, as analyzed here, is a binary tree (Figure 1A), where each node represents a single cell and each branch represents a descendant relationship from a mother cell to one of its daughter cells. The cells in the tree can be divided into internal or terminal cells/nodes based on whether they undergo further division as recorded by the CLT. A subtree rooted at any of the cells is a sub-CLT. The terminal cells of the CLT are all labeled by their cell types, which could be anatomically defined as, for example, muscle or neural cells, or defined by the expression state of one or more genes such as CD4+ cells. Note that, in contrast to the CLTs commonly discussed in nematodes such as C. elegans, we ignored the temporal duration of the cell cycle and the order of sister cells to ensure compatibility with CLTs determined by genome editing. In other words, the length of the branch contains no information about how long each cell exists and swapping any pair of sister sub-CLTs (i.e., two sub-CLTs whose roots are a pair of sister cells divided from the same mother) will not change the CLT.We designed the DELTA algorithm with the purpose of identifying similarities in developmental programs using the phenotypic information represented by the CLT. Here the developmental program is a succession of cell fate choices made at every division event recorded by the CLT. We assumed that the developmental state of a cell is reflected by the states of its daughter cells, which are further defined by their own daughter cells until reaching the terminal cells with known cell types (Figure 1A, color of nodes). In other words, a pair of cells is similar if the two sub-CLTs rooted at them resemble each other in topology and lineal organization of terminal cell types. This assumption was deemed useful in demonstrating the simplicity (Azevedo et al., 2005) and robustness (Yang et al., 2014) of metazoan CLTs, as well as in identifying homeotic transformation of cell fates (Du et al., 2015). DELTA compares every sub-CLT from a query CLT with those from a subject CLT and exhaustively searches for their maximal resemblance in terms of topology and lineal organization of terminal cell types via a dynamic programming strategy (Figure S1A). As an analogy, sequence alignment algorithms align residues in biological sequences with the constraint of their sequential order, whereas DELTA aligns terminal and internal cells in CLTs with the constraint of their lineal organization. DELTA can align CLTs globally (DELTA-g), where all cells in respective CLTs are either pruned or aligned, or locally (DELTA-l), where only pairs of sub-CLTs with good enough alignments are reported, very similar to global and local sequence alignment, respectively (Figure 1B). DELTA also estimates the statistical significance of each CLT alignment relative to random pairs of CLTs with the same sizes and terminal cell type compositions as the aligned CLTs. More algorithmic details of DELTA are given in the Transparent Methods section and Supplemental Information.As a basic validation of DELTA, we aligned a C. elegans CLT with an isomorphic version of itself, where 30% of randomly chosen sister sub-CLT pairs were swapped. DELTA-g successfully aligned the isomorphic CLT with the original CLT by matching all terminal nodes, yielding the same DELTA score as that of the alignment between two identical C. elegans CLTs (Figure 1C; see also Figure S1B). We also developed an accompanying R package named ggtree-based visualization of tree alignments (ggVITA) for the visualization of DELTA alignments (Figure 1C; see also Figure S1C).
CLT Simulations Suggest that DELTA Can Identify Developmental Similarities
To further demonstrate that DELTA alignment can indeed reveal developmental similarities, we simulated CLTs using a previously published model (Lohaus et al., 2007), in which the gene expression status (on/off) of each gene in each cell and at each discrete time point was calculated by a predefined regulatory network (Figures 2A and 2B; see Transparent Methods). DELTA-l was used to align the simulated CLT with itself. This process was repeated with 1,000 different simulated CLTs, and the top ten local alignments from each simulation were examined to assess the performance of DELTA. Several results in support of the capability of DELTA were observed. First, we found that, for each simulated CLT, the self-alignment always had the highest DELTA score in the local alignment result (Figure 2C, left-most red bar). Second, DELTA tended to find alignments between large sub-CLTs, which contained more developmental information (Figure 2D, red bars). Third, the CLT alignments were statistically highly significant, indicating that DELTA scores were much higher than those between random CLTs of similar sizes and terminal cell type compositions (Figure 2E, red bars). Fourth, the differences in gene expression status, as measured by the Hamming distance (the number of genes with differences in expression status between two cells), between the roots of the aligned sub-CLTs was much smaller than that between two randomly chosen internal cells and tended to be lower for those with higher DELTA scores (Figure 2F, red bars). Fifth, by comparing the expression status of each gene for all aligned (terminal and internal) cells in a pair of (sub-)CLTs, we found that their expression trajectories were much more similar (Hamming distance scaled to [0,1], therefore having an expectation of 0.5) between aligned internal or terminal cells than expected (Figure 2G), suggesting that not only the initial state but also the subsequence changes in expression were highly similar between the aligned CLTs. These results demonstrated that DELTA can indeed pair internal cells with similar gene expression status.
Figure 2
Validating DELTA by Simulated CLTs
(A) An example of a transcriptional regulatory network used to simulate CLTs. There are eight genes, each of which is regulated by an average of four other genes. Red and blue lines represent activations and repressions, respectively. Values in the matrix of regulatory interactions are detailed in Figure S2A. The sectors underneath the genes represent the sectors shown for each cell in (B).
(B) An example of a simulated CLT. Lowercase letters a to h represent eight terminal cell types based on the ON/OFF state of eight genes, which is shown as red (ON) or white (OFF) in the corresponding sector around the letter. The development of the lineage tree stops when one of the terminal cells reaches 12 rounds of divisions or the 50th discrete time point of the simulation. CLTs were also simulated under other parameter settings, as listed in Figure S2B and explained in the Transparent Methods.
(C) The score of the top 10 CLT alignments found by DELTA-l from self-comparison of simulated CLTs. In addition to the full simulated CLT (red bars), perturbations were added to mimic experimental/biological noises, such as the 35% loss of terminal cells (green bars), non-autonomous cell fate (purple bars), or both (blue bars). Each bar shows the average score and its standard error assessed by 1,000 simulations with different regulatory networks and initial expression states.
(D–F) Similar to (C), except that the number of terminal cells in the aligned CLT (D), statistical significance of the alignment (E) (P values truncated at 10−300), and difference in expression status between roots of aligned CLTs as measured by the Hamming distance (F) are plotted. Please see Figure S3 for results from simulations with cell loss rate of 5%, 10%, 20%, 50%, or 90%.
(G) For the same set of top 10 CLT alignments presented in (C) (x axis), the expression trajectory dissimilarity of every gene except genes 1 and 2 (y axis) is shown. The expression states of a specific gene at the last time point of every cell were compared for every pair of matched cells (terminal or internal) from the two aligned CLTs. The resulting Hamming distance is normalized by the number of matched cell pairs, giving rise to the expression trajectory dissimilarity, the value of which is scaled based on the color scale bar to the right. All expression trajectory dissimilarities are average values from the DELTA-l results of 1,000 simulations. The expression trajectory dissimilarities between the aligned cells are clearly much lower than the null expectation 0.5 for the normalized Hamming distance. See also Figures S2 and S3.
Validating DELTA by Simulated CLTs(A) An example of a transcriptional regulatory network used to simulate CLTs. There are eight genes, each of which is regulated by an average of four other genes. Red and blue lines represent activations and repressions, respectively. Values in the matrix of regulatory interactions are detailed in Figure S2A. The sectors underneath the genes represent the sectors shown for each cell in (B).(B) An example of a simulated CLT. Lowercase letters a to h represent eight terminal cell types based on the ON/OFF state of eight genes, which is shown as red (ON) or white (OFF) in the corresponding sector around the letter. The development of the lineage tree stops when one of the terminal cells reaches 12 rounds of divisions or the 50th discrete time point of the simulation. CLTs were also simulated under other parameter settings, as listed in Figure S2B and explained in the Transparent Methods.(C) The score of the top 10 CLT alignments found by DELTA-l from self-comparison of simulated CLTs. In addition to the full simulated CLT (red bars), perturbations were added to mimic experimental/biological noises, such as the 35% loss of terminal cells (green bars), non-autonomous cell fate (purple bars), or both (blue bars). Each bar shows the average score and its standard error assessed by 1,000 simulations with different regulatory networks and initial expression states.(D–F) Similar to (C), except that the number of terminal cells in the aligned CLT (D), statistical significance of the alignment (E) (P values truncated at 10−300), and difference in expression status between roots of aligned CLTs as measured by the Hamming distance (F) are plotted. Please see Figure S3 for results from simulations with cell loss rate of 5%, 10%, 20%, 50%, or 90%.(G) For the same set of top 10 CLT alignments presented in (C) (x axis), the expression trajectory dissimilarity of every gene except genes 1 and 2 (y axis) is shown. The expression states of a specific gene at the last time point of every cell were compared for every pair of matched cells (terminal or internal) from the two aligned CLTs. The resulting Hamming distance is normalized by the number of matched cell pairs, giving rise to the expression trajectory dissimilarity, the value of which is scaled based on the color scale bar to the right. All expression trajectory dissimilarities are average values from the DELTA-l results of 1,000 simulations. The expression trajectory dissimilarities between the aligned cells are clearly much lower than the null expectation 0.5 for the normalized Hamming distance. See also Figures S2 and S3.Two practical considerations prompted us to further scrutinize the performance of DELTA using revised CLT simulation models. First, the aforementioned model used for simulated CLTs implicitly assumed that all cells differentiate autonomously, whereas the real differentiation process is believed to be highly regulated. To mimic such regulation by external signals from other cells or the environment, we introduced a probability (5%) of randomly flipping the gene expression status of a gene at every time point during the simulated development by negating its expression level (see Transparent Methods), excluding the cell cycle and asymmetric division regulator. Second, current experimental techniques are not perfect in capturing all cells, making the experimentally reconstructed CLTs incomplete. For example, the state-of-the-art lineage tree reconstruction method used a droplet-based single-cell sequencing system (Raj et al., 2018a) and the most commonly used commercial single-cell sequencing platform (10x Chromium) has a cell loss rate of ∼35%. To reflect such technical limitations, we simulated two CLTs with identical initial parameters (regulatory network and expression of the root), randomly removed 35% of terminal cells from each CLT, reconstructed the simulated CLTs following the actual lineal relationship among the remaining terminal cells (see Transparent Methods), and aligned them by DELTA. As expected, these two perturbations reduced the DELTA score (Figure 2C, green and blue bars), CLT size (Figure 2D, green and blue bars), statistical significance (Figure 2E, green and blue bars), and gene expression similarity between the aligned sub-CLTs (green and blue bars in Figure 2F and the two panel on the right of Figure 2G). Nevertheless, DELTA is still capable of identifying statistically significant sub-CLTs with highly similar gene expression status (Figures 2C–2G). We found by additional simulations that statistically significant alignments could readily be found with 5%, 10%, 20%, or 50% cell losses, but not 90% (Figure S2). These results suggest that, despite the detection power reduction due to stochastic perturbations, associating CLT phenotypes with underlying gene expression status by DELTA remains feasible as long as ≥50% of the terminal cells of the real CLT were captured by the reconstructed CLT. Note, however, recent genome-editing-based CLT reconstruction efforts have not yielded CLTs with cell capture rate ≥50%; we therefore refrained from trying DELTA on genome-editing-based CLTs (see more details in Limitations of the Study).
Comparison of C. elegans CLTs by DELTA Reveals Cells with Highly Similar Developmental Programs
Next, we sought to test the performance of DELTA using real CLTs from C. elegans. The C. elegans embryonic CLT contains 31 pairs of bilaterally symmetrical sub-CLTs (Sulston et al., 1983), 8 of which have 10 or more terminal cells (Figure 3A). We aligned these symmetric pairs of sub-CLTs by DELTA-g and found that their DELTA scores were highly significant (Figure 3B, gray bars) and always higher than those from the alignments between one sub-CLT from the symmetric pairs and another sub-CLT with a similar number of terminal cells (Figure 3B, blue dots; see Transparent Methods). For the 17 symmetric pairs of sub-CLTs with three to nine terminal cells, all but one (ABarappa versus ABalpaaa) pair gave rise to statistically significant CLT alignments (Table S1). The remaining six symmetric pairs of sub-CLTs cannot be aligned by DELTA as they have <3 terminal cells (Table S1).
(A) Bilaterally symmetric sub-CLTs with at least 10 terminal cells in the C. elegans CLT are highlighted by different colors, whereas the lineal names of their root are also marked below every sub-CLT.
(B) The alignment score (dots, scaled by the left y axis) and statistical significance (gray boxes, scaled by the right y axis, see Transparent Methods) found by DELTA-g alignment between the symmetric sub-CLTs are indicated by red dots. As controls, sub-CLTs with a number of terminal cells differing from the symmetric sub-CLTs by no more than 10% were also compared with one of the symmetric sub-CLTs, and the resulting DELTA scores are indicated by blue dots. Please see Table S1 for DELTA results between other bilaterally symmetric sub-CLTs with less than 10 terminal cells. See also Table S1.
Bilaterally Symmetric sub-CLTs Yielded Highly Significant DELTA Alignments(A) Bilaterally symmetric sub-CLTs with at least 10 terminal cells in the C. elegans CLT are highlighted by different colors, whereas the lineal names of their root are also marked below every sub-CLT.(B) The alignment score (dots, scaled by the left y axis) and statistical significance (gray boxes, scaled by the right y axis, see Transparent Methods) found by DELTA-g alignment between the symmetric sub-CLTs are indicated by red dots. As controls, sub-CLTs with a number of terminal cells differing from the symmetric sub-CLTs by no more than 10% were also compared with one of the symmetric sub-CLTs, and the resulting DELTA scores are indicated by blue dots. Please see Table S1 for DELTA results between other bilaterally symmetric sub-CLTs with less than 10 terminal cells. See also Table S1.To further assess the capability of DELTA to find similarities in developmental programs, we take advantage of the Expression Patterns in C. elegans (EPIC) database, where the expression of 130 genes is tracked in each cell during the embryonic development of C. elegans, from the zygote to the last round of embryonic cleavage (the 350-cell stage) (Murray et al., 2012). We collected all statistically significant (nominal P < 0.05) entries from the top 2,000 sub-CLT alignments in the DELTA-l results of a C. elegans CLT versus itself and calculated the Pearson's correlation coefficient R between the aligned cells in the sub-CLTs (See Transparent Methods). Since the sizes of different alignments vary, the Pearson's R values are standardized by Fisher's r-to-z transformation before being compared. In support of the usefulness of DELTA, a higher z is observed in CLT alignments with higher DELTA scores (Figure 4A, Pearson's R = 0.951, P < 10−300, Spearman's ρ = 0.926, P < 10−300) and more significant alignment P values (Figure 4B, Pearson's R = 0.505, P < 10−98, Spearman's ρ = 0.177, P < 10−11). In combination with the results from the simulated CLTs, we demonstrated that DELTA can indeed identify CLTs with highly similar developmental programs.
Figure 4
Expression Similarities between Aligned sub-CLTs in C. elegans
The top 2,000 local alignments between sub-CLTs of C. elegans were found by DELTA-l, and those with a nominal P < 0.05 were checked for expression similarities between aligned cells. For each sub-CLT alignment, the expression levels of 130 genes recorded in the EPIC database were compared for all aligned cells, giving rise to the expression similarity between the two aligned sub-CLTs in the form of Pearson's correlation coefficient (point size). The expression similarities were further processed by Fisher's r-to-z transformation to ensure comparability between sub-CLT alignments with different numbers of cells and then shown to be highly correlated with the DELTA score (A) and p value (B) of the sub-CLT alignment. A full list of all 1,526 results with nominal P < 0.05 within the top 2,000 local alignments was given as Table S2. Note that 30 alignments with DELTA score >250 were truncated from this figure to facilitate a better visualization of the trends for alignments with small DELTA scores. See also Table S2.
Expression Similarities between Aligned sub-CLTs in C. elegansThe top 2,000 local alignments between sub-CLTs of C. elegans were found by DELTA-l, and those with a nominal P < 0.05 were checked for expression similarities between aligned cells. For each sub-CLT alignment, the expression levels of 130 genes recorded in the EPIC database were compared for all aligned cells, giving rise to the expression similarity between the two aligned sub-CLTs in the form of Pearson's correlation coefficient (point size). The expression similarities were further processed by Fisher's r-to-z transformation to ensure comparability between sub-CLT alignments with different numbers of cells and then shown to be highly correlated with the DELTA score (A) and p value (B) of the sub-CLT alignment. A full list of all 1,526 results with nominal P < 0.05 within the top 2,000 local alignments was given as Table S2. Note that 30 alignments with DELTA score >250 were truncated from this figure to facilitate a better visualization of the trends for alignments with small DELTA scores. See also Table S2.
Phenotypic Differences in Knockdown CLTs Quantified by DELTA Reveal Functional Relationships among Underlying Genes
Inspired by the capability of DELTA to identify similarities in developmental programs by homeomorphic (sub-)CLTs, we continued to test whether DELTA can associate CLT changes with their underlying genetic mechanisms. The Digital Development database (Santella et al., 2016), where CLTs are recorded for C. elegans strains with ∼200 conserved genes individually knocked down (knockdown strains) provides a unique opportunity to compare phenotypic changes in CLTs with underlying genetic differences. Specifically, homeotic transformations, where a cell x in a knockdown strain adopted the fate used by cell y in the wild-type strain (an x-to-y transformation), were previously observed using this dataset (Du et al., 2015). For a homeotic transformation of x-to-y, we extracted the sub-CLT rooted at x from the knockdown strain as well as the sub-CLT rooted at y from the wild-type strain. Using a scoring matrix defined by the number of markers with shared expression (Figure 5A) and a pruning cost of 1, we used DELTA-g to align all the extracted pairs of sub-CLTs from the homeotic transformations, i.e., the sub-CLT with an altered fate in the mutant strain and the sub-CLT from the wild-type strain representing the new fate resulting from transformation. We found that 87.3% of the pairs gave rise to statistically significant (P < 0.05) alignments (Figures 5B and 5C; see also Table S3), suggesting that DELTA can indeed identify homeotic cell fate transformations. Moreover, DELTA showed the correspondence between the terminal cells of these aligned CLTs (Figure 5D, alignments on the left), revealing the subtle differences between wild-type and transformed sub-CLTs.
Figure 5
DELTA Reveals Homeotic Cell Fate Transformation in C. elegans Mutants
(A) The scoring matrix used in DELTA analyses of cell types annotated in the Digital Development database.
(B and C) For the 131 homeotic cell fate transformation events found by Du et al. between CLTs with at least five terminal cells, the DELTA score (B) and p value (C) between the sub-CLTs in mutant strains and those in a wild-type strain representing the adopted cell fate are shown.
(D) Detailed sub-CLT alignments visualized by our newly developed R package “ggVITA”. The two alignments on the left were previously marked by Du et al., and the two on the right were newly found by DELTA. Terminal cell types are indicated by colors as in (A). Sub-CLT pruning was shown as red circles on the corresponding internal nodes, in which the number of pruned nodes was indicated. See also Table S3.
DELTA Reveals Homeotic Cell Fate Transformation in C. elegans Mutants(A) The scoring matrix used in DELTA analyses of cell types annotated in the Digital Development database.(B and C) For the 131 homeotic cell fate transformation events found by Du et al. between CLTs with at least five terminal cells, the DELTA score (B) and p value (C) between the sub-CLTs in mutant strains and those in a wild-type strain representing the adopted cell fate are shown.(D) Detailed sub-CLT alignments visualized by our newly developed R package “ggVITA”. The two alignments on the left were previously marked by Du et al., and the two on the right were newly found by DELTA. Terminal cell types are indicated by colors as in (A). Sub-CLT pruning was shown as red circles on the corresponding internal nodes, in which the number of pruned nodes was indicated. See also Table S3.We further examined the top 100 DELTA-l results between the wild-type and each mutant strain for homeotic transformation events. Some known homeotic transformations are among the top-ranking local alignments. For example, the cell fate transformation of the ABar lineage in the MOM-2 knockdown strain into ABal in the wild-type strain had the sixth highest DELTA score in the local alignment between the wild-type and MOM-2 knockdown CLTs, the detailed alignment of which between individual cells found by DELTA was visualized by ggVITA (Figure 5D, top left alignment). Similarly, the E cell in GLD-2 knockdown strains takes the cell fate of wild-type MS, which corresponds to the 32nd top alignment in the DELTA-l results between the wild-type and GLD-2 knockdown CLTs (Figure 5D, bottom left alignment). Furthermore, we found some alignments between sub-CLTs that likely correspond to additional homeotic cell fate transformations that have not been previously reported, such as the transformation of P1 into ABp when CAMT-1 is knocked down and P1 into ABa when C01A2.5 is knocked down (Figure 5D, right alignments). Note, however, that most local alignments found in DELTA-l between wild-type and mutant strains are between sub-CLTs unchanged by the gene knockdown or highly similar in the original CLT. Nevertheless, these results suggest that homeotic cell fate transformation can be readily found by DELTA.Given the above results, we further hypothesized that the phenotypic impact of the gene knockdown, as approximated by the DELTA score between wild-type and knockdown CLTs, can reflect the functional importance of the underlying genes. To test this hypothesis, we compared the DELTA score, which was calculated from the global alignment between wild-type and knockdown CLTs, with the evolutionary rates of the genes being knocked down, since functionally more important genes generally evolve more slowly and thus are more conserved (Zhang and Yang, 2015). Here we used the dN/dS ratio to measure the protein evolutionary rate, where dN is the number of nonsynonymous nucleotide substitutions per nonsynonymous site and dS is the number of synonymous nucleotide substitutions per synonymous site (Nei and Kumar, 2000). We split the genes with knockdown CLTs into two groups with high or low DELTA scores with the wild-type CLT and compared the average evolutionary rates of the two groups. As we divided the two groups based on greater DELTA score differences, the deviation of the evolutionary rate of the two groups continued to increase up to ∼ 8-fold when genes with DELTA scores >7400 and <2600 were compared (Figure 6A). Since genes with a more dramatic functional impact upon deletion are generally more constrained by natural selection (Zhang and Yang, 2015), this observation suggests that DELTA comparisons between knockdown and wild-type CLTs can indeed quantify phenotypic changes in CLTs in relation to the functional importance of the knockdown gene.
Figure 6
DELTA Relates Phenotypic Changes in CLTs to Underlying Genetic Mechanisms
(A–D) C. elegans single-gene knockdown (KD) CLTs were categorized into high (dark color) or low (light color) DELTA score groups by different thresholds (x axis) according to their DELTA score of global alignments with the wild-type CLT. The differences between the high and low DELTA score groups become more dramatic for dN/dS (A) and dN (B), but not the dS (B) or expression level (D) of the knockdown genes. The error bar indicates the standard deviation assessed by 1,000 bootstraps of the genes.
(E and F) Enrichment with experimentally determined interactions (E) and coexpression (F) in pairs of genes with knockdown CLTs having a high DELTA score relative to those with a low DELTA score was assessed by an odds ratio from the Mantel-Haenszel test, where the thresholds for high and low DELTA scores are indicated on the x axis. The experimentally determined interactions and coexpression relationships were extracted from the STRING database with different confidence thresholds, as indicated by the color scale. The error bar indicates the 95% confidence interval of the odds ratio given by the Mantel-Haenszel test.
DELTA Relates Phenotypic Changes in CLTs to Underlying Genetic Mechanisms(A–D) C. elegans single-gene knockdown (KD) CLTs were categorized into high (dark color) or low (light color) DELTA score groups by different thresholds (x axis) according to their DELTA score of global alignments with the wild-type CLT. The differences between the high and low DELTA score groups become more dramatic for dN/dS (A) and dN (B), but not the dS (B) or expression level (D) of the knockdown genes. The error bar indicates the standard deviation assessed by 1,000 bootstraps of the genes.(E and F) Enrichment with experimentally determined interactions (E) and coexpression (F) in pairs of genes with knockdown CLTs having a high DELTA score relative to those with a low DELTA score was assessed by an odds ratio from the Mantel-Haenszel test, where the thresholds for high and low DELTA scores are indicated on the x axis. The experimentally determined interactions and coexpression relationships were extracted from the STRING database with different confidence thresholds, as indicated by the color scale. The error bar indicates the 95% confidence interval of the odds ratio given by the Mantel-Haenszel test.To better understand this observation, we compared dN and dS separately with the DELTA score. We found that differences in DELTA scores were predictive of deviation in dN (Figure 6B) but not dS (Figure 6C). Since dS is primarily determined by the mutation rate, whereas dN is determined jointly by the mutation rate and natural selection (Nei and Kumar, 2000), these results suggest that the DELTA score indeed captured phenotypic changes in CLT that are subject to natural selection acting on the function of the gene, instead of a mutational bias in favor of less important genes. In addition, the gene expression level was found to be unrelated to the DELTA score (Figure 6D). Therefore, the correspondence between the evolutionary conservation of a gene and the impact of its knockdown quantified by DELTA is not confounded (Zhang and Yang, 2015) by the expression level of the gene.We next asked whether comparisons between CLTs of two knockdown strains can reveal a functional relationship between the knockdown genes, as has been shown by CLT comparison using methods other than DELTA (Gunsalus et al., 2005; Lee et al., 2008; Piano et al., 2002). Since the DELTA score quantifies the phenotypic similarity between two CLTs, we hypothesized that, if the DELTA score between two knockdown CLTs is higher, the genes knocked down in these strains will more likely be functionally related. To test our hypothesis, we assessed the enrichment of experimentally determined interactions recorded by STRING (Szklarczyk et al., 2017) in pairs of genes with knockdown CLTs having a high DELTA score relative to those with a low DELTA score. To avoid interdependence between gene pairs due to involvement of the same gene, we constructed a 2x2 contingency table for each of the 204 genes separately based on (1) whether their DELTA scores with the 203 other knockdown CLTs were higher or lower than some thresholds (Figure 6E, x axis) and (ii) whether the confidence of experimentally determined interactions between the pair of knockdown genes in comparison surpassed a selected confidence threshold (Figure 6E, color scale). The 2x2 contingency tables for all genes were summarized by the Mantel-Haenszel procedure to calculate a combined odds ratio to reflect the enrichment of the interaction, such that a larger odds ratio indicates that the gene pairs underlying the knockdown CLTs are more likely to be coexpressed. A similar analysis was also performed for coexpression between knockdown genes (Figure 6F). For both interaction and coexpression, we found that the odds ratio increased as the DELTA score difference between the two groups of gene pairs became larger, regardless of the confidence threshold used. For example, the enrichment of experimentally determined interactions with a confidence value >0.5 yielded an odds ratio of 25.5 when gene pairs with a DELTA score >6,000 were compared with those with a DELTA score <4,000. These results suggest that genes whose knockdown yields similar phenotypic outcomes for CLTs tend to be functionally related. Our observations also provided novel CLT-based results that were consistent with previous observation derived from image-based developmental phenotypes (Gunsalus et al., 2005; Piano et al., 2002), yet offering an analytical method that is more versatile than a predefined list of image-based phenotypes. Altogether, our comparative analyses among knockdown and wild-type CLTs by DELTA successfully associate CLT phenotypes with the underlying genotypes.
CLT Comparison between Species by DELTA Hints at Evolutionary Correspondence between Cell Identities and Cell Types
The diversity of cell types is a significant feature of multicellular organisms, but how it evolves remains largely unexplored. Similar to the necessity of finding orthologous genes between species, the study of cell type evolution is impossible without mapping cell types or constructing the “cell type orthology” between different species. Traditionally, cell types from two species are considered “orthologous” according to structure- and/or function-based cell type definitions, such as neural or muscle cells. Recent technological development of single-cell RNA sequencing and other high-throughput approaches (Schwartzman and Tanay, 2015; Shapiro et al., 2013; Stegle et al., 2015) has promoted research efforts to revise cell type definitions by molecular similarities at the transcriptome or epigenome level, such as in the Human Cell Atlas Project (Regev et al., 2017). However, inferring cell type orthology by structural/functional/molecular similarities may not be reliable because such similarities could have emerged from phenotypic convergence (Arendt et al., 2016), as found for the striated muscles of vertebrates and Drosophila melanogaster (Brunet et al., 2016). Alternatively, we hypothesized that the DELTA score between CLTs from two species with nonuniformly defined cell types would be maximized by a scoring matrix based on the actual evolutionary relationship between specific cell types.To test our hypothesis, we compared the CLTs of Pellioditis marina and C. elegans, the cell type identities of which were previously defined by structure and function nonuniformly in the two species (Azevedo et al., 2005). By a greedy strategy (see Transparent Methods, Figures 7A and S4), we optimized the scoring matrix such that the DELTA score from global alignments between the two CLTs was maximally higher than the scores from pairs of control CLTs created by relabeling all cells of a type as another random type. Intriguingly, the high matching scores in the optimized scoring matrix indeed hint at biologically reasonable correspondence between cell types from the two species (Figure 7B). For example, the optimized matrix suggests that the cells labeled as “Muscle,” “Death,” “Intestinal,” and “Germ” have exact matches between the two species. On the other hand, cells labeled as “Nervous System” in P. marina correspond to the cells labeled as “Neurons” and “Structural” (neuronal structural) cells, and the cells labeled as “Pharynx” in P. marina correspond to “Gland,” “Epithelial,” “Muscle,” and “Neuron” cells in C. elegans. Finally, the 30 cells in P. marina with the “Other” fate were suggested by the optimized matrix to be epithelial cells in C. elegans.
Figure 7
DELTA Comparison across Species Highlights the Evolutionary Correspondence of Cell Fates
(A) Optimization of the scoring matrix between cell types of two species. With the cell types from two species, we first defined a scoring matrix M, the row and column labels (cell types) of which were randomly permutated to generate control matrices M1, M2, …, and M. These 1 + l matrices were individually used by DELTA to align the CLTs from the two species, giving rise to 1 + l corresponding DELTA scores w, w1, w2, …, and w. The deviation of w from its random expectation , i.e., , is optimized by a greedy strategy. See Transparent Methods and Figure S4 for more details.
(B) The optimal scoring matrix for comparisons between C. elegans and P. marina.
(C) The alignment between C. elegans and P. marina found by DELTA using the scoring matrix shown in (B). Note that the small circled numbers on some internal branches indicate the size of the pruned sub-CLT.
(D) A sub-alignment from (C); note that the Epiderm “P1aabab” cell in P. marina is apoptotic in C. elegans.
(E) The number of aligned cell pairs in (E) that fall into each cell type pairs between P. marina (x axis) and C. elegans (y axis) is shown by the number and the color within the grid of a heatmap, with the color scale indicated by the scale bar to the right. See also Table S4 and Figure S4.
DELTA Comparison across Species Highlights the Evolutionary Correspondence of Cell Fates(A) Optimization of the scoring matrix between cell types of two species. With the cell types from two species, we first defined a scoring matrix M, the row and column labels (cell types) of which were randomly permutated to generate control matrices M1, M2, …, and M. These 1 + l matrices were individually used by DELTA to align the CLTs from the two species, giving rise to 1 + l corresponding DELTA scores w, w1, w2, …, and w. The deviation of w from its random expectation , i.e., , is optimized by a greedy strategy. See Transparent Methods and Figure S4 for more details.(B) The optimal scoring matrix for comparisons between C. elegans and P. marina.(C) The alignment between C. elegans and P. marina found by DELTA using the scoring matrix shown in (B). Note that the small circled numbers on some internal branches indicate the size of the pruned sub-CLT.(D) A sub-alignment from (C); note that the Epiderm “P1aabab” cell in P. marina is apoptotic in C. elegans.(E) The number of aligned cell pairs in (E) that fall into each cell type pairs between P. marina (x axis) and C. elegans (y axis) is shown by the number and the color within the grid of a heatmap, with the color scale indicated by the scale bar to the right. See also Table S4 and Figure S4.Furthermore, the DELTA-g alignment based on the optimized scoring matrix revealed the detailed correspondence between terminal cells of the two species (Figure 7C). Some evolutionary events of cell fate changes are clearly highlighted, such as an Epiderm cell in P. marina becoming apoptotic in C. elegans (Figure 7D. See Table S4 for a full list of cell pairs between P. marina and C. elegans aligned by DELTA-g). On a broader scale, the CLT alignment between P. marina and C. elegans showed results highly consistent with those found in previous reports (Houthoofd et al., 2003). For example, it was previously found that gut and somatic muscle of P. marina is highly conserved in comparison to C. elegans (Houthoofd et al., 2003). The same pattern was quite apparent if we count, for the CLT alignment by DELTA-g between P. marina and C. elegans, the number of aligned terminal cell pairs that fall into each combination of cell types between P. marina and C. elegans (Figure 7E). Indeed, all 20 intestine cells in P. marina were aligned to intestine cells in C. elegans, and 80 of 81 somatic muscle cells in P. marina were aligned to muscle cells in C. elegans. Given that DELTA only used the CLT (tree topology and terminal cell type), but not other information such as the cellular position, this observation therefore again supported the utility of DELTA.
Discussion
A Computational Framework for Comparative Studies of CLTs
In this study, we designed and implemented DELTA, a computational framework for the alignment of developmental CLTs. Using simulated CLTs and real CLTs from C. elegans, we showed that DELTA can find sub-CLTs with highly similar developmental programs, such as bilaterally symmetric lineage pairs, and lineages with highly similar expression trajectories. Furthermore, DELTA alignments among knockdown and wild-type C. elegans strains identified homeotic cell fate transformations, showed more dramatic phenotypic changes for the knockdown of more important genes, and revealed higher CLT similarities between strains where functionally related genes are individually knocked down. Finally, we found that the scoring matrix, optimized for the DELTA score between CLTs from two species, shed light on the evolution of cell types and CLTs of the two species. Together, these results not only recapitulated previously known developmental patterns and therefore demonstrated the utility of DELTA but also pointed to some novel biological patterns that merit further investigation.DELTA relies on two critical pieces of information to align CLTs, i.e., the topology and the terminal cell types of the CLT. This feature of DELTA ensures its compatibility with both classical CLTs (such as those of nematodes) and genome-editing-based CLTs. To the best of our knowledge, there is currently no other tool designed to quantify the similarity of CLTs other than DELTA. Algorithms for the quantitative comparison of other types of trees do exist. For example, similarities between phylogenetic trees can be measured by the edit distance between two trees with identical sets of unique leaves (species) (Nye et al., 2006). Additionally, an algorithm has been designed to find similarities between RNA structure trees, which are unrooted trees with nodes (internal or terminal) representing one of four nucleotides (A/U/G/C) (Milo et al., 2013). Both these methods are apparently not suitable for alignment and similarity quantification between CLTs, which are rooted trees with different non-uniquely labeled leaves. In conclusion, DELTA is expected to open new paths to the analysis of CLTs, which include both classical CLT (Houthoofd et al., 2003; Sulston et al., 1983) and novel genome-editing based CLT that will rapidly accumulate (Junker et al., 2017; Kalhor et al., 2017; McKenna et al., 2016; Raj et al., 2018a, 2018b).
Potential Applications of DELTA
As more CLTs are being determined, the application of DELTA to them shall provide critical insight into several important biological questions. First, the repeatability of development can be assessed by comparing CLTs from different individuals or CLTs that root at different cells from an otherwise homogeneous cell population. This analysis is particularly relevant for the efficiency of induced pluripotent stem (iPS) cells, where a seemingly homogeneous population of cells are similarly treated but only a very small fraction are successfully transformed to the pluripotent state, with an even smaller fraction capable of growth into organoids. Comparisons between the CLTs rooted at failed or successful iPS cells may hint at the mechanisms underlying their differences.Second, as demonstrated with simulated and wild-type C. elegans CLTs, DELTA is capable of associating CLT phenotypes with the genetic states of individual cells. Theoretically, the phenotypic consequence of genetic states, i.e., genotype-phenotype mapping (GPM), will become more complex when more cells are involved. As an intermediate phenotype to the phenotypes of single cells and of tissues/organisms, DELTA offers a novel path of bridging GPM at the unicellular and multicellular levels.Third, we have shown in our study that DELTA can be used to find the evolutionary correspondence between cell types in two species. The advancement of single-cell transcriptome profiling experiments has allowed cell type classification at the finest scale with molecular signatures of gene expression. Experimental limitations aside, this approach for cell type identity determination has two biological difficulties. On the one hand, the similarity of the transcriptional profiles may arise from both cell type homology owing to inheritance from the common precursor and phenotypic convergence, which might lead to false combinations of different cell types into one. On the other hand, the stochastic nature of gene expression (Elowitz et al., 2002) may lead to erroneous separation of the homogeneous cell population of one type into two. This problem was recently reported (Arendt et al., 2016), where an evolutionary definition of cell types based on the “core regulatory complex” (CoRC) of transcription factors was proposed. As a complementary approach, DELTA utilizes the biological information in a CLT to find the evolutionary correspondence between cell types and simultaneously reveals how the CLT itself evolves.Fourth, one critical unanswered question in development is the relative prevalence of autonomous and regulatory development. In nematodes such as C. elegans, development is autonomous except for a small number of sub-CLTs (Sulston and White, 1980). In most other animals, however, it is generally believed that regulatory development is the prevailing mode and autonomy is the exception. A direct quantitative answer to this question would emerge by examining the result from DELTA local alignments for the frequency of identical sub-CLTs, should a CLT or sub-CLT be available for those species.Fifth, an ideal CLT comprises complete longitudinal and horizontal data. However, during experimental assessment of CLTs by single-cell transcriptome profiling and lineage barcoding, cell lysis and loss are inevitable even in the state-of-the-art method, which dictates that CLT is longitudinally (because the cells are killed at the time of experimentation) and horizontally incomplete. DELTA may provide a resolution to this problem by allowing assembly of temporally “sliced” incomplete CLTs, just as sequence alignment allows the assembly of the genome from short reads.
Limitations of the Study
There are several potential caveats in our study that are worth discussing. The DELTA result critically relies on the choice of its two parameters: the scoring matrix between cell types and the pruning cost. Although we have carefully chosen biologically informed parameters (see Transparent Methods), there is no objective estimate of how good or bad they are, which is likely impossible to obtain before more CLT data become available, similar to the refinement of the substitution matrix for sequence alignment when more sequences were determined (Dayhoff, 1972). Additionally, the value of the pruning cost relative to the matching score also affects the DELTA results. On the one hand, a lower pruning cost makes DELTA more sensitive because pruning of small sub-CLTs improves alignment compared with terminal cell type mismatches. On the other hand, a higher pruning cost makes DELTA more specific since terminal cell type mismatches are more likely to be retained than pruned. Nevertheless, a poor choice of parameters likely reduces biological signals. The significance of all patterns we have shown in this paper would thus be stronger if the parameters were further optimized, further enhancing the value of DELTA.We have considered the qualitative cell type or gene expression status in our definition of CLTs. However, the dynamic programming scheme for comparison between CLTs is readily adaptable to the quantitative definition of cell types made by high-throughput experiments, such as single-cell transcriptomics. In this case, instead of using the scoring matrix between qualitatively defined cell types, the alignment score between a pair of terminal cells is calculated by the similarities between their transcriptome profiles, using quantitative metrics such as correlation coefficients or negated Euclidean distances.In the current study, CLTs were used without considering the temporal duration of the cell cycle for each cell. On the one hand, this is a necessary trade-off to ensure DELTA's compatibility with genome-editing-based CLTs, which contains minimal (if any) information on the temporal duration of the cell cycle. On the other hand, discarding the information of cell cycle duration did not prevent us from discovering the similarities in developmental programs by analyzing CLTs. There are two potential explanations for this observation. First, the molecular pathway that regulates cell cycles might be tightly coupled to the developmental program, such that cell cycle duration is an intrinsic property of the cell type. In other words, most cell types have their own specific cell cycle duration, such that the definition of cell types already contains the information concerning cell cycle duration. Second, the definition of cell types might have nothing to do with the cell cycle duration, in which case DELTA needs to be further improved by allowing cell cycle duration adjustment as an additional CLT revision (in addition to sub-CLT pruning), with properly defined costs to the DELTA score.For both simulated CLTs and real CLTs from C. elegans, the developmental process is mostly autonomous, whereas in most complex organisms (except nematodes), it appears to be largely non-deterministic/regulatory. The critical difference between autonomous and regulatory development is whether the gene expression status of individual cells can be altered by external cues, such as environmental stress or signals from other cells. For example, when isolated from the 8-cell-stage mouse blastomere, one cell can grow into one individual mouse (Kelly, 1977) but not one-eighth of a mouse, as would be predicted by autonomous development. However, autonomous cell fate determination is certainly not absent in complex organisms, especially toward the end of the developmental process. Meiosis is one such example of re-occurring sub-CLT, where a primary oocyte divides twice (three division events) and creates one mature ovum and three polar bodies. As long as such autonomous sub-CLTs exist, DELTA alignment will be possible and informative, as demonstrated by our simulated CLTs with perturbed gene expression (Figure 2).In the current study, we revisited previously identified homeotic cell fate transformation caused by gene knockdown. Although we found that >87% of previously found homeotic transformations gave rise to a statistically significant DELTA alignment, i.e., despite providing high sensitivity, the specificity of DELTA in identifying homeotic transformations was low because most local alignments found in DELTA-l between wild-type and mutant strains are between sub-CLTs that are unchanged by the gene knockdown or highly similar in the original CLT. We caution readers that additional systematic examination of these local alignments is required to exclude those false positives before one can identify candidate homeotic transformation warranting further investigation.Last but not least, current single-cell high-throughput experiments suffer from the loss of cells; thus, the CLT reconstructed by lineage tracing of DNA barcodes is likely incomplete, with the majority of terminal cells missing. The robustness of DELTA to such data quality issues dictates the applicability of DELTA. As shown in our DELTA analysis by simulated CLTs with randomly dropped terminal cells (Figure S3), a cell loss rate of 5%, 10%, 20%, or 50%, but not 90%, might still give rise to statistically significant DELTA alignments with high gene expression similarity. In other words, the loss of terminal cells decreases the sensitivity of DELTA. Unfortunately, to the best of our knowledge, none of the CLTs reconstructed by genome editing and single-cell sequencing so far have achieved a cell loss rate ≤50%, which is why we refrained from analyzing genome-editing-based CLT in this current study. Nevertheless, we do believe this situation will be much improved in the near future, which is exactly why we decided to develop DELTA at this time. Moreover, if more CLT data were to become available, this limitation could potentially be alleviated by assembling small fragmented CLTs into big complete CLTs (DELTA or similar algorithms should be required in such assembling efforts), as short sequence fragments with sufficiently high coverage can be assembled into the full-length sequence.Overall, DELTA establishes a computational foundation for the alignment of CLTs and potentiates systematic analyses of lineage trees. Albeit the limitations, DELTA will likely illuminate the connection between phenotypes represented by CLTs and their underlying genotypes, providing novel insights into many unresolved biological questions.
Resource Availability
Lead Contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Jian-Rong Yang (yangjianrong@mail.sysu.edu.cn).
Materials Availability
No new materials were generated in this study.
Data and Code Availability
This current study conducted computational analyses on publicly available datasets, whose source were all explicitly mentioned in main text. As for the source code, DELTA has been deposited in GitHub (https://github.com/yxj17173/DELTA), ggVITA has been deposited in GitHub (https://github.com/helloicyvodka/ggvita), scripts for various DELTA-based analyses presented in this study have been deposited in Github (https://github.com/helloicyvodka/DELTA_code).
Methods
All methods can be found in the accompanying Transparent Methods supplemental file.