James F Fleming1,2, Roberto Feuda1, Nicholas W Roberts3, Davide Pisani1,3.
Abstract
Our ability to correctly reconstruct a phylogenetic tree is strongly affected by both systematic errors and the amount of phylogenetic signal in the data. Current approaches to tackling tree reconstruction artifacts, such as the use of parameter-rich models, do not translate readily to single-gene alignments. This, coupled with the limited amount of phylogenetic information contained in single-gene alignments, makes gene trees particularly difficult to reconstruct. Opsin phylogeny illustrates this problem clearly. Opsins are G-protein-coupled receptors used in photoreceptive processes across Metazoa, and their protein sequences are roughly 300 amino acids long. A number of incongruent opsin phylogenies have been published, and opsin evolution remains poorly understood. Here, we present a novel approach, the canary sequence approach, to investigate and potentially circumvent errors in single-gene phylogenies. First, we demonstrate our approach using simulations and two well-understood cases of long-branch attraction in single-gene data sets. We then apply our approach to a large collection of well-characterized opsins to clarify the relationships of the three main opsin subfamilies.
Keywords: nonbilaterian metazoan; opsin; phylogeny; systematic error
Year: 2020 PMID: 32031627 PMCID: PMC7058159 DOI: 10.1093/gbe/evaa015
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Competing hypotheses on the phylogenetic affinities of canonical opsin families. (A) Porter et al. (2012); (B) Feuda et al. (2012, 2014) and Hering and Mayer (2014); (C) Ramirez et al. (2016).
Fig. 2.—A flowchart illustrating the key steps in the canary sequence approach. The first stage of the methodology (red) shows how sequences are classified as members of the base data set or as sequences of interest. In the second stage (orange), sequences are assessed using checking data sets to determine whether or not they are canary sequences. In the final stage (green), noncanary sequences are assessed using canary checking data sets to generate a minimal data set and its associated minimal tree. Stages are numbered with reference to the steps in the pipeline described in the "The Canary Sequence Approach to Identify Problematic Sequences" section of this paper. A more detailed description of the methodology is available in supplementary figure S1, Supplementary Material online.
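The three stages in the flowchart can be sketched as a short Python filter. This is a minimal, hypothetical sketch under stated assumptions, not the authors' pipeline: it assumes a canary is a sequence whose addition to the base data set breaks relationships already recovered from the base data set, it represents a tree as a set of clades (each clade a set of taxon names), and it replaces real phylogenetic inference with a caller-supplied function `infer`. The names `backbone_intact`, `find_canaries`, `build_minimal` and `toy_infer` are illustrative only, and the final stage is simplified to a greedy filter.

```python
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

# A clade is represented as the frozenset of taxon (sequence) names it contains.
Clade = FrozenSet[str]
# Hypothetical stand-in for a real tree-inference run: it takes sequence names
# and returns the clades of the inferred tree.
InferClades = Callable[[Sequence[str]], Set[Clade]]


def backbone_intact(clades: Set[Clade], base: Set[str], expected: Set[Clade]) -> bool:
    """True if every expected clade of base taxa is still recovered after
    pruning the added (non-base) taxa from each inferred clade."""
    restricted = {c & base for c in clades}
    return expected <= restricted


def find_canaries(base: List[str], candidates: List[str],
                  expected: Set[Clade], infer: InferClades) -> List[str]:
    """Second stage (assumed criterion): a candidate is flagged as a canary if
    adding it alone to the base data set breaks any expected relationship."""
    base_set = set(base)
    return [s for s in candidates
            if not backbone_intact(infer(base + [s]), base_set, expected)]


def build_minimal(base: List[str], candidates: List[str],
                  expected: Set[Clade], infer: InferClades) -> Tuple[List[str], Set[Clade]]:
    """Final stage (simplified greedy variant): add noncanary sequences one at
    a time, keeping each only if the expected backbone is still recovered."""
    canaries = set(find_canaries(base, candidates, expected, infer))
    base_set = set(base)
    kept = list(base)
    for s in candidates:
        if s in canaries:
            continue
        if backbone_intact(infer(kept + [s]), base_set, expected):
            kept.append(s)
    return kept, infer(kept)


# Toy example: a mock "inference" in which sequence L mimics long-branch
# attraction by pulling A away from its true sister B.
def toy_infer(names: Sequence[str]) -> Set[Clade]:
    present = set(names)
    if "L" in present:
        return {frozenset({"A", "L"}), frozenset({"B", "C"})}
    clades = {frozenset({"A", "B"})}
    if "X" in present:
        clades.add(frozenset({"C", "X"}))
    return clades


base = ["A", "B", "C", "D"]
expected = {frozenset({"A", "B"})}
print(find_canaries(base, ["X", "L"], expected, toy_infer))     # ['L']
print(build_minimal(base, ["X", "L"], expected, toy_infer)[0])  # ['A', 'B', 'C', 'D', 'X']
```

Pruning added taxa before comparing clades is a deliberate choice in this sketch: a new sequence that nests inside an expected clade (for example, joining A and B) is not counted as breaking that clade, whereas one that splits A from B is.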
Fig. 3.—The canary sequence methodology applied to the opsin data set, showing the number of sequences at each stage of the canary sequence approach when applied to the nonbilaterian opsin sequences. Each stage is color-coded to correspond to the stages depicted in figure 2.
Fig. 4.—The minimal opsin tree recovered under GTR+G. Support values (Bayesian posterior probabilities, PPs) are reported only for key nodes.
Fig. 5.—Synopsis of opsin evolution. (A) Phylogenetic distribution of canonical opsins in Eumetazoa. (B) Duplication pattern of opsin genes in Eumetazoa. Dashed lines indicate lineage-specific losses.