Yibi Chen, Raúl A González-Pech, Timothy G Stephens, Debashish Bhattacharya, Cheong Xin Chan.
Abstract
Comparative algal genomics often relies on predicted genes from de novo assembled genomes. However, the artifacts introduced by different gene-prediction approaches, and their impact on comparative genomic analysis, remain poorly understood. Here, using available genome data from six dinoflagellate species in the Symbiodiniaceae, we identified methodological biases in the published genes, which were predicted using different approaches, and putative contaminant sequences in the published genome assemblies. We developed and applied a comprehensive customized workflow to predict genes from these genomes. The observed variation among predicted genes resulting from our workflow agreed with current understanding of phylogenetic relationships among these taxa, whereas the variation among the previously published genes was largely biased by the distinct approaches used in each instance. Importantly, these biases affect the inference of homologous gene families and synteny among genomes, thus impacting biological interpretation of these data. Our results demonstrate that a consistent gene-prediction approach is critical for comparative analysis of dinoflagellate genomes.
Year: 2020; PMID: 31713873; PMCID: PMC7065002; DOI: 10.1111/jpy.12947
Source DB: PubMed; Journal: J Phycol; ISSN: 0022-3646; Impact factor: 2.923
Figure 1. Variation among α and β genes from six Symbiodiniaceae genomes. (a) PCA plot based on ten metrics of the predicted genes, with the α genes shown in orange and the β genes in purple for each of the six genomes (denoted by different symbols, as indicated in the legend). The two Cladocopium and the two Symbiodinium species are highlighted for clarity. (b) Tree topology depicting the phylogenetic relationship among the six taxa, based on LaJeunesse et al. (2018). [Color figure can be viewed at http://wileyonlinelibrary.com]
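As a rough illustration of the kind of analysis behind a PCA plot such as Figure 1a, the sketch below projects a table of gene sets by metrics onto its first two principal components. It is not the authors' code. The function name `pca_scores`, the standardization of each metric, and the data are all assumptions: the values are synthetic random numbers standing in for ten unnamed metrics across twelve gene sets (six genomes, two sets of predictions each).

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each metric (column) so that metrics on larger scales
    # do not dominate the components. This step is an assumption.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix: the rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    # Fraction of total variance carried by each retained component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic placeholder data: 12 gene sets (rows) by 10 metrics (columns).
rng = np.random.default_rng(0)
metrics = rng.normal(size=(12, 10))
scores, explained = pca_scores(metrics)
```

Each row of `scores` gives the coordinates of one gene set in the plane of the first two components, which is what a scatter plot of this type displays; `explained` gives the share of variance along each axis.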
Figure 2. Conserved synteny and homologous sets among six Symbiodiniaceae genomes. The number of collinear syntenic gene blocks between each genome‐pair is shown for those inferred based on (a) α and (b) β genes; the upper bar chart shows the number of blocks, the lower bar chart shows the number of implicated genes in these blocks, and the middle panel shows the genome‐pairs corresponding to each bar with a line joining the dots that represent the implicated taxa. The number of homologous sets inferred from (c) α and (d) β genes is shown, in which the taxa represented in the set corresponding to each bar are indicated in the bottom panel. The most remarkable differences between (a) and (b), and (c) and (d), focusing on Symbiodinium and Cladocopium species, are highlighted in red. [Color figure can be viewed at http://wileyonlinelibrary.com]