ORTHOSCOPE: An Automatic Web Tool for Phylogenetically Inferring Bilaterian Orthogroups with User-Selected Taxa.

Jun Inoue, Noriyuki Satoh.

Abstract

Identification of orthologous or paralogous relationships of coding genes is fundamental to all aspects of comparative genomics. For accurate identification of orthologs among deeply diversified bilaterian lineages, precise estimation of gene trees is indispensable, given the complicated histories of genes over millions of years. By estimating gene trees, orthologs can be identified as members of an orthogroup, a set of genes descended from a single gene in the last common ancestor of all the species being considered. In addition to comparisons with a given species tree, purposeful taxonomic sampling increases the accuracy of gene tree estimation and orthogroup identification. Although some major phylogenetic relationships of bilaterians are gradually being unraveled, the scattering of published genomic data among separate web databases is becoming a significant hindrance to identification of orthogroups with appropriate taxonomic sampling. By integrating more than 250 metazoan gene models predicted in genome projects, we developed a web tool called ORTHOSCOPE to identify orthogroups of specific protein-coding genes within major bilaterian lineages. ORTHOSCOPE allows users to employ several sequences of a specific molecule and broadly accepted nodes included in a user-specified species tree as queries and to evaluate the reliability of estimated orthogroups based on topologies and node support values of estimated gene trees. A test analysis using data from 36 bilaterians was accomplished within 140 s. ORTHOSCOPE results can be used to evaluate orthologs identified by other stand-alone programs using genome-scale data. ORTHOSCOPE is freely available at https://www.orthoscope.jp or https://github.com/jun-inoue/orthoscope (last accessed December 28, 2018).
© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  ORTHOSCOPE; bilaterians; gene tree; orthogroup; orthology; species tree

Year:  2019        PMID: 30517749      PMCID: PMC6389317          DOI: 10.1093/molbev/msy226

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

Identifying orthology and paralogy is fundamental to all aspects of molecular biological research, including cross-species comparisons (Fitch 1970). Given that orthologs are genes derived by speciation, they are used to infer gene functions in nonmodel organisms (Gabaldon and Koonin 2013) and phylogenetic analysis of species (Moritz and Hillis 1996). Considering the complicated history of genes that have diverged via speciation and gene gain (duplication) or loss, the most reliable approach for distinguishing orthologs from paralogs is by explicit phylogenetic inference (Gabaldon 2008; Sonnhammer et al. 2014; Kuraku et al. 2016), especially among distantly related groups of bilaterians. By estimating gene trees, orthologs can be identified as members of an orthogroup (Li et al. 2003; Chen et al. 2006), a set of genes descended from a single gene in the last common ancestor of all the species being considered (Emms and Kelly 2015). However, identifying an orthogroup by estimating gene trees involves large computational costs, especially for genome-scale data sets. To reduce the computational burden of gene tree estimation, stand-alone programs, such as OrthoMCL (Li et al. 2003) and OrthoFinder (Emms and Kelly 2015), compute sequence similarity scores in multiple species comparisons by employing all-versus-all Blast searches. Then the MCL clustering algorithm (Van Dongen 2000) is used for ortholog identification. On another front, some databases such as EnsemblCompara (Vilella et al. 2008) and PhylomeDB (Huerta-Cepas et al. 2014) store and curate genome-scale orthology hypotheses derived from phylogenetic gene trees. These databases, however, cannot accommodate researchers’ demands to estimate gene trees using their own sequences and purposeful taxonomic sampling. The use of a species tree, in addition to a gene alignment, yields better gene trees than methods that only consider gene alignments (Szöllősi et al. 2015). 
Recently, some major phylogenetic relationships of bilaterians have gradually begun to be unraveled (Dunn et al. 2014). However, the scattering of genome resources among databases, such as NCBI (https://www.ncbi.nlm.nih.gov/), Ensembl (http://www.ensembl.org/), and other independent project-based web sites (e.g., OIST Marine Genomics Unit: http://marinegenomics.oist.jp/), hinders the appropriate taxonomic sampling needed to increase the accuracy of phylogenetic estimation (Heath et al. 2008). Kuraku et al. (2013) integrated scattered protein-coding sequences and created a web tool, aLeaves/MAFFT. With this system, ortholog candidates can be collected from selected databases, which are classified into 13 groups. For purposeful taxon sampling, however, the user must then select sequences and estimate a gene tree manually to identify an orthogroup. In the course of evolutionary studies of teleost and chordate genes, we constructed databases of genome-scale protein-coding gene sequences, enabling purposeful taxonomic sampling so as to bisect possible long branches. Moreover, we developed an analytical pipeline to identify orthogroups by estimating gene trees and comparing them with their corresponding species trees. This analytical pipeline successfully identified not only orthologs derived from teleost genome duplication (TGD) (Inoue et al. 2015) but also those that contributed to the formation of chordate characteristics (Inoue et al. 2017; Inoue and Satoh 2018).

New Approaches

In the process of developing our analytical pipeline to identify orthogroups of major bilaterian lineages, we created a web tool called ORTHOSCOPE. It enables biologists interested in specific molecules to identify orthogroups and to count numbers of orthogroup members in each species/lineage. For this purpose, the database consists of gene models predicted in genome/transcriptome sequencing projects, in an elementary sense. In order to exclude transcript variants of single loci, the database does not incorporate individually reported gene sequences from each species without full genomic or transcriptomic data. Orthogroup identification using ORTHOSCOPE has the following characteristics: Users can 1) employ several query sequences (fig. 1) to collect diverse genes derived from ancestral gene/species separation, 2) select from >250 metazoan species with decoded genomes (fig. 2 and supplementary fig. S1, Supplementary Material online) and one of four taxonomic groups (Deuterostomia, Protostomia, Vertebrata, and Actinopterygii) to employ broadly accepted nodes for orthogroup identification, 3) refer to a hypothetical metazoan species tree reconstructed from a literature survey in order to make their own species trees, and 4) evaluate reliability of orthogroups using topologies, node support values, and functions attached to some sequences shown in estimated gene trees.
Fig. 1.

An overview of the interface of the ORTHOSCOPE web server. (A) The front page. Ortholog identification is conducted by selecting one of four focal groups of species: Actinopterygii, Vertebrata, Deuterostomia, or Protostomia. The user can select species for orthogroup/tree estimation. (B) The resultant tree of Brachyury gene analysis using the focal group Deuterostomia (supplementary fig. S2A2, Supplementary Material online). White letters on a navy blue background (Homo sapiens Brachyury gene sequence) indicate the first query (ENST00000296946.6 from Ensembl) and those on a gray background indicate the other queries (BRAFL121413 from JGI and ENST00000389554.7). The smallest bilaterian clade that includes the first query sequence is identified as the orthogroup (connected by thick branches). The orthogroup is shown with a vertical bar consisting of a black segment (focal group: deuterostome genes) and a gray segment (its sister group: a protostome gene). The basal node denotes the basal split of the orthogroup. Nodes marked with an “r” were rearranged using NOTUNG during comparisons with the species tree, because they had lower bootstrap support values than the user-defined threshold (60%).

Fig. 2.

Phylogenetic relationships of bilaterian lineages and the number of species included in the ORTHOSCOPE database. For each focal group of species (A–D), an orthogroup is identified in an estimated gene tree by finding the broadly accepted node (marked with a black circle and the letter of the focal group): basal splits of Bilateria (Dunn et al. 2014), Deuterostomia (Satoh 2016), Chordata/Olfactores (Satoh 2016), Vertebrata, and bony vertebrates (Meyer and Zardoya 2003). These broadly accepted nodes are appropriate as orthogroup basal nodes because they are insulated from the influence of whole genome duplications when the corresponding nodes are identified in gene trees. Phylogenetic positions of whole genome duplications (VGD, vertebrate genome duplication; TGD, teleost genome duplication) follow Braasch and Postlethwait (2012). Whether the second vertebrate genome duplication (VGD2) occurred before or after divergence of jawless fish remains controversial. Black segments denote focal groups of species and gray segments denote their sister groups. These species groups are used for orthogroup identification by finding their basal nodes (key nodes) in gene trees. Triangles indicate species groups in which monophyly is supported (black) or unsupported (white). For details, see supplementary figure S1, Supplementary Material online.

Results and Discussion

Interface and Analytical Pipeline

ORTHOSCOPE can be accessed via a web browser (fig. 1). To start an analysis, ORTHOSCOPE requires a set of sequences consisting only of coding (DNA) or amino acid sequences of protein-coding genes. When sequences (FASTA format) and a species tree (NEWICK format) are provided by the user (fig. 3), ORTHOSCOPE estimates a gene tree and an orthogroup within several minutes (e.g., 57 s in a case study of deuterostome Brachyury) without the need for user input. Users can modify the species tree in reference to a hypothesis that can be obtained from the ORTHOSCOPE front page.
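As an illustration of the input described above, the following minimal sketch parses FASTA-formatted query sequences and guesses whether each one is a coding (DNA) or an amino acid sequence. It is not ORTHOSCOPE's own code: the function names `parse_fasta` and `guess_alphabet` and the nucleotide symbol set are assumptions made for this example.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    (Hypothetical helper; not part of ORTHOSCOPE.)
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header without an identifier")
            current = tokens[0]
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line)
    return {name: "".join(parts).upper() for name, parts in records.items()}


def guess_alphabet(sequence):
    """Return 'dna' if only nucleotide symbols occur, otherwise 'protein'.

    The symbol set below is an assumption chosen for this sketch.
    """
    nucleotide_symbols = set("ACGTUN-")
    return "dna" if set(sequence) <= nucleotide_symbols else "protein"


if __name__ == "__main__":
    queries = parse_fasta(">query1 coding sequence\nATGGCC\nTTT\n>query2\nMSSPGTE\n")
    for name, seq in queries.items():
        print(name, guess_alphabet(seq), len(seq))
```

A check of this kind is useful before submission because the server accepts either coding or amino acid sequences, but not a mixture of unrelated sequence types.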
Fig. 3.

An overview of the ORTHOSCOPE analytical pipeline of orthogroup identification. Query sequences and a species tree are uploaded (input); after an ORTHOSCOPE analysis, the estimated gene tree and candidate ortholog sequences can be downloaded as text/PDF files (output). The species tree (F) consists of a focal group (black segment) and its sister group (gray segment). In the rearranged gene tree (G), the orthogroup consists of a focal group of genes (black segment) and its sister group (gray segment). Nodes marked with “D” are duplication nodes whereas those with no mark are speciation nodes. Refer to the main text for details of each procedure.

Before starting an analysis, the user needs to select one of the four “Focal groups” of species to identify orthologs with a focal gene in a specific lineage (fig. 1). The user can set parameters in “Sequence collection” for the BlastP search (fig. 3). A threshold (Aligned site rate) in “Alignment” (fig. 3) is used to remove extremely short sequences when such sequences prevent estimation of data matrices for phylogenetic analysis (fig. 3). Parameters in “Tree search” are used for gene tree estimation (fig. 3). Taxonomic sampling is determined by selecting species in “Genome taxon sampling” (fig. 1). In order to count orthologs, ORTHOSCOPE employs a genome-scale protein-coding gene database (coding and amino acid sequence data sets) constructed for each species using only the longest sequence when transcript variants exist for a single locus. If a species targeted by a query sequence is not present in the ORTHOSCOPE database, the user needs to add the species name to their species tree.
When the analysis starts (fig. 3), ORTHOSCOPE first collects amino acid sequences of ortholog candidates by performing a BlastP search (fig. 3) against selected protein sequence databases. Corresponding coding sequences are also selected from the database. The collected sequences (fig. 3) are aligned using MAFFT (Katoh and Standley 2013). The resultant multiple sequence alignment is trimmed by removing poorly aligned regions using trimAl (Capella-Gutierrez et al. 2009) with the option “gappyout.” Corresponding coding sequences are forced onto the amino acid alignment using PAL2NAL (Suyama et al. 2006) to generate nucleotide alignments for subsequent comparative analysis.
To achieve faster analysis speed than is possible with the maximum likelihood method, phylogenetic analyses (fig. 3) employ the neighbor joining (NJ) method (Saitou and Nei 1987) implemented in APE in R (Popescu et al. 2012) for DNA alignments and FastME (Lefort et al. 2015) for amino acid alignments. For analyses of DNA alignments, the most parameter-rich model in the program, the TN93 model (Tamura and Nei 1993), is applied with a gamma-distributed rate for site heterogeneity (Yang 1994). For analyses of amino acid alignments, a widely used substitution model for nuclear gene analysis, the WAG model (Whelan and Goldman 2001), is applied with the gamma model. To evaluate robustness of internal branches, 100 bootstrap replications are calculated for each data set.
Resultant gene trees (fig. 3), however, often have weakly supported nodes. In such cases, one can revise these ambiguous nodes in comparison with a specific species tree. For this purpose, ORTHOSCOPE conducts rearrangement/reconciliation analysis using a method implemented in NOTUNG (Chen et al. 2000) for the NJ gene tree (fig. 3) in comparison with the uploaded species tree (fig. 3). As a first step, NOTUNG rearranges weakly supported nodes of the gene tree, to minimize duplication and extinction of genes, using parsimony with equal weights and the threshold parameter for bootstrap support values of nodes (fig. 1). Then, the rearranged gene tree is reconciled with the species tree. Finally, an orthogroup is identified (fig. 3).
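The step in which coding sequences are forced onto the amino acid alignment can be illustrated with a small sketch. This is not PAL2NAL itself and does not reproduce its options; it only shows the underlying idea of placing one codon under each aligned residue and three gap characters under each alignment gap. The function name `thread_codons` is hypothetical, and the sketch assumes the standard stop codons and does not verify that each codon actually translates to the residue above it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption of this sketch)


def thread_codons(aligned_protein, coding_sequence, gap="-"):
    """Map an unaligned coding sequence onto an aligned protein sequence.

    Each residue in the protein alignment consumes one codon; each gap
    column becomes three gap characters in the nucleotide alignment.
    """
    codons = [coding_sequence[i:i + 3] for i in range(0, len(coding_sequence), 3)]
    residues = aligned_protein.replace(gap, "")

    # Tolerate a single trailing stop codon that has no residue counterpart.
    if len(codons) == len(residues) + 1 and codons[-1].upper() in STOP_CODONS:
        codons = codons[:-1]

    if len(codons) != len(residues) or any(len(c) != 3 for c in codons):
        raise ValueError("coding sequence length does not match the protein sequence")

    codon_iter = iter(codons)
    return "".join(gap * 3 if ch == gap else next(codon_iter) for ch in aligned_protein)


if __name__ == "__main__":
    # Toy example: protein 'MKT' aligned with one gap column.
    print(thread_codons("M-KT", "ATGAAAACC"))
```

Working at the codon level keeps the nucleotide alignment consistent with the trimmed amino acid alignment, so DNA and amino acid analyses refer to the same homologous positions.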

Orthogroup of ORTHOSCOPE

Orthogroups are defined as sets of genes descended from single genes in the last common ancestor of all the species being considered (Emms and Kelly 2015). In gene trees, ancestral states of genes are single at speciation nodes (fig. 3). For this reason, the basal node of an orthogroup should be a speciation node in the gene tree. However, considering the presence of duplication nodes and weak resolution of gene tree nodes, identification of the orthogroup basal node is difficult without a priori information about species relationships and phylogenetic positions of genome duplication events related to the node (fig. 2). As the node corresponding to the orthogroup basal node (fig. 3), ORTHOSCOPE uses a key node (fig. 3), one of the broadly accepted nodes of a species tree (fig. 2). From a given species tree (fig. 3), ORTHOSCOPE identifies focal and sister groups for two species lineages separated at a key node. Accordingly, an orthogroup identified by ORTHOSCOPE (fig. 3) contains genes not only of the focal group of species, but also of its sister group species. Therefore, when comparing genes within a focal group of genes, some relationships are paralogous. However, when comparing genes between a focal group of genes and its sister group, all relationships are orthologous. In the deuterostome Brachyury analysis, ORTHOSCOPE identifies deuterostomes as the focal group and protostomes as their sister group (fig. 3). In this case, the separation between deuterostomes and protostomes is used as the key node. By finding the node corresponding to this key node in the rearranged gene tree (fig. 3), ORTHOSCOPE identifies an orthogroup, a bilaterian gene clade including the first query sequence. The bootstrap value of the basal node can be used to evaluate the accuracy of orthogroup identification.
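The idea of locating the orthogroup as the smallest clade that contains the first query and spans both the focal group and its sister group can be sketched as follows. This is a simplified illustration, not ORTHOSCOPE's implementation: it ignores duplication/speciation labels and bootstrap values, represents the gene tree as nested tuples, and assumes leaf labels of the form `Species|gene`. The function names and the toy gene labels are hypothetical.

```python
def leaves(node):
    """Return all leaf labels under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


def species_of(leaf):
    """Extract the species tag from a 'Species|gene' leaf label (assumed format)."""
    return leaf.split("|")[0]


def find_orthogroup(tree, first_query, focal_species, sister_species):
    """Walk from the root toward the first query and keep the smallest clade
    that still contains genes of both the focal group and the sister group."""
    best = None
    node = tree
    while not isinstance(node, str):
        members = leaves(node)
        species = {species_of(m) for m in members}
        if species & focal_species and species & sister_species:
            best = members
        # Descend into the child that contains the first query.
        for child in node:
            if first_query in leaves(child):
                node = child
                break
        else:
            raise ValueError("first query not found in the gene tree")
    return best


if __name__ == "__main__":
    # Toy rearranged gene tree; gene labels are illustrative only.
    gene_tree = (
        ("Hsap|Tbr1", "Bflo|Tbr1"),
        ("Dmel|Bra", (("Hsap|Bra", "Ggal|Bra"), "Bflo|Bra")),
    )
    focal = {"Hsap", "Ggal", "Bflo"}   # deuterostomes (focal group)
    sister = {"Dmel"}                  # protostome (sister group)
    print(find_orthogroup(gene_tree, "Hsap|Bra", focal, sister))
```

In this toy tree the returned clade excludes the paralogous clade that does not contain the first query, which mirrors why a query placed outside the orthogroup does not affect the result.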

Case Studies

We demonstrate the utility of ORTHOSCOPE using case studies with four focal groups of species. In each case, to show novelty in ORTHOSCOPE, resultant orthogroups were compared with those estimated using two pioneering tools in this field, OrthoFinder (ver. 2.2.6) and aLeaves (last access date: June 24, 2018). Although these two programs also facilitate ortholog estimation, their scopes are different from that of ORTHOSCOPE: 1) OrthoFinder estimates orthogroups for all protein-coding genes at one time using user-specified data sets; and 2) aLeaves collects as many ortholog candidates as possible for a particular molecule using their database including individually reported gene sequences from each species without full genomic/transcriptomic data.

Deuterostome Brachyury

ORTHOSCOPE can identify orthologs of a gene that creates morphological novelty in deuterostomes (fig. 2). The Brachyury gene encodes a member of the T-box transcription factor family and is crucial for notochord formation in chordates (Satoh 2016). Using Brachyury gene sequences of Homo sapiens and Branchiostoma floridae (Florida lancelet) as queries (fig. 1), ORTHOSCOPE identified orthologs from all five deuterostome lineages (table 1A, fig. 4, and supplementary fig. S2A, Supplementary Material online). As suggested in Inoue et al. (2017), two copies of the Brachyury ortholog were identified in each of two cephalochordate species. We confirmed that one of the three queries, H. sapiens TBR1 (ENST00000389554.7 in Ensembl), is placed outside the vertebrate Brachyury orthogroup because the orthogroup was identified based solely on the first query, H. sapiens Brachyury.
Table 1. Taxon Samplings and Estimated Numbers of Orthogroup Members.

                                        No. of orthogroup members
Taxon sampling                          ORTHOSCOPE    OrthoFinder
A. Deuterostome Brachyury                             Bilaterians (a)
Protostomia
 Spiralia
   Pinctada fucata                      0             0
 Ecdysozoa
   Drosophila melanogaster              1             1
Deuterostomia
 Hemichordata
   Saccoglossus kowalevskii             1             1
   Ptychodera flava                     1             1
 Echinodermata
   Strongylocentrotus purpuratus        1             1
   Acanthaster planci                   1             1
 Cephalochordata
   Branchiostoma floridae               2             2
   B. belcheri                          2             2
 Urochordata
   Oikopleura dioica                    1             1
   Botryllus schlosseri                 1             1
   Ciona savignyi                       1             1
   C. intestinalis                      1             1
 Vertebrata
   Gallus gallus                        2             2
   Homo sapiens                         2             2
B. Protostome Brachyury                               Bilaterians (a)
Deuterostomia
   Gallus gallus                        2             17
   Homo sapiens                         2             16
Protostomia
 Spiralia
  Rotifera
   Adineta vaga                         3             28
  Platyhelminthes
   Schistosoma mansoni                  0             6
  Annelida
   Capitella teleta                     1             8
   Helobdella robusta                   1             18
  Nemertea
   Notospermus geniculatus              1             9
  Phoronida
   Phoronis australis                   1             6
  Brachiopoda
   Lingula anatina                      1             7
  Cephalopoda
   Octopus bimaculoides                 1             10
  Gastropoda
   Lottia gigantea                      1             11
   Biomphalaria glabrata                1             9
   Aplysia californica                  1             12
  Bivalvia
   Crassostrea virginica                1             13
   Crassostrea gigas                    1             11
   Mizuhopecten yessoensis              1             12
   Pinctada fucata                      0             7
 Ecdysozoa
  Priapulida
   Priapulus caudatus                   1             5
  Nematoda
   Trichinella spiralis                 0             4
   Strongyloides ratti                  0             4
   Onchocerca volvulus                  0             4
   Loa loa                              0             9
   Brugia malayi                        0             4
   Pristionchus pacificus               0             6
   Caenorhabditis japonica              0             13
   C. brenneri                          0             9
   C. remanei                           0             14
   C. briggsae                          0             13
   C. elegans                           0             8
  Chelicerata
   Limulus polyphemus                   1             32
   Stegodyphus mimosarum                0             12
  Myriapoda
   Strigamia maritima                   2             6
  Crustacea
   Daphnia pulex                        1             7
  Hexapoda
   Nasonia vitripennis                  1             5
   Bombyx mori                          1             11
   Drosophila melanogaster              1             8
C. Vertebrate ALDH1A                                  Vertebrates (a)
Urochordata
   Ciona savignyi                       1
   C. intestinalis                      1
Vertebrata
 Chondrichthyes
   Callorhinchus milii                  3             3
   Rhincodon typus                      3             3
 Actinopterygii
   Lepisosteus oculatus                 3             3
   Danio rerio                          2             2
   Salmo salar                          5             5
   Oncorhynchus mykiss                  3             5
   Tetraodon nigroviridis               1             2
   Oreochromis niloticus                2             2
   Oryzias latipes                      1             1
 Sarcopterygii
  Amphibia
   Xenopus tropicalis                   3             3
   Tylototriton wenxianensis (b)        4 (c)         5
  Lepidosauria
   Anolis carolinensis                  3             3
  Testudines
   Pelodiscus sinensis                  3             3
  Aves
   Gallus gallus                        3             3
  Mammalia
   Bos taurus                           3             3
   Mus musculus                         4             4
   Homo sapiens                         3             4
D. Actinopterygian PLCB1                              Actinops (a)
Chondrichthyes
   Callorhinchus milii                  1
   Rhincodon typus                      1
Sarcopterygii
   Gallus gallus                        1
   Homo sapiens                         1
Actinopterygii
 Neopterygii
  Lepisosteidae
   Lepisosteus oculatus                 1             1
 Teleostei
  Osteoglossomorpha
   Scleropages formosus                 2             2
   Paramormyrops kingsleyae             2             2
  Otomorpha
   Astyanax mexicanus                   2             2
   Danio rerio                          0             0
   Cyprinus carpio                      2             3
  Protacanthopterygii
   Esox lucius                          2             2
   Coregonus lavaretus (b)              2 (c)         2
   Salmo salar                          3             4
  Acanthomorphata
   Gadus morhua                         2             2
   Takifugu rubripes                    2             2
   Oreochromis niloticus                2             2
   Oryzias latipes                      1             1

(a) Taxon sampling (supplementary fig. S2, Supplementary Material online).
(b) Databases constructed from NCBI transcriptome shotgun assembly (TSA).
(c) Numbers manually counted.

Fig. 4.

Schematic of estimated gene trees using ORTHOSCOPE (supplementary fig. S2A–D, Supplementary Material online). (A) Deuterostome Brachyury gene tree. (B) Protostome Brachyury gene tree. (C) Vertebrate ALDH1A gene tree. An asterisk indicates that the orthogroup was not supported by the 60% bootstrap value criterion for the basal node of orthogroup (basal chordate vs. vertebrate lineages). (D) Actinopterygian PLCB1 gene tree. Orthogroups are shown with black (focal group of genes) and gray (sister group of genes) segments.

When the same amino acid databases were used, OrthoFinder produced exactly the same orthogroup as that estimated by ORTHOSCOPE (table 1A and supplementary fig. S2A, Supplementary Material online), identifying Brachyury orthologs in every deuterostome lineage. Moreover, orthogroup members identified by ORTHOSCOPE were also the same as those estimated based on sequences collected by aLeaves (supplementary fig. S2A4, Supplementary Material online), except for hemichordates, which were not included in the aLeaves database. The main difference between the results of ORTHOSCOPE and aLeaves lies in the number of species with identified orthologs from vertebrates and protostomes due to the limitation of purposeful taxonomic sampling.

Protostome Brachyury

ORTHOSCOPE can also evaluate the presence or absence of orthologs in morphologically and genetically diverse protostomes (fig. 2). A Brachyury ortholog has not been identified in the C. elegans (nematode worm) genome (Hejnol and Martin-Duran 2015; Inoue et al. 2017). In order to confirm whether this lack of a Brachyury ortholog is shared among other nematodes, an ORTHOSCOPE analysis was conducted using protostome Brachyury gene sequences and a C. elegans mab-9 sequence (T27A1.6 in WormBase: https://www.wormbase.org), which is related to Brachyury (Woollard and Hodgkin 2000) as queries (supplementary fig. S2B1–B3, Supplementary Material online). The resultant tree confirmed that no Brachyury ortholog is found in 11 nematode species (table 1B, fig. 4, and supplementary fig. S2B, Supplementary Material online). In addition, no Brachyury ortholog was found in platyhelminth genomes, as reported previously (Martin-Duran and Romero 2011; Hejnol and Martin-Duran 2015). To evaluate results indicating the absence of Brachyury orthologs in the nematode and platyhelminth genomes, we estimated the protostome Brachyury orthogroup using OrthoFinder and aLeaves. OrthoFinder identified Brachyury orthologs from nematodes and platyhelminths (table 1B and supplementary table S1B, Supplementary Material online), conflicting with results from ORTHOSCOPE. Divergent protostome sequences and analyses without the broadly accepted node, the basal split of bilaterians, may prevent OrthoFinder from delineating the protostome Brachyury orthogroup. On the other hand, the resultant tree based on sequences collected by aLeaves identified no Brachyury ortholog in either lineage and supported the ORTHOSCOPE results (supplementary fig. S2B4, Supplementary Material online).

Vertebrate ALDH1A

From a transcriptome assembly, ORTHOSCOPE can identify orthologs of genes that experienced ancient whole genome duplications. A comparative genomic study suggested that 20–30% of duplicate genes (Makino and McLysaght 2010) derived from vertebrate genome duplications (VGDs) are still retained in the human genome, even after several hundred million years (fig. 2). Their duplicates, called ohnologs, complicate identification of vertebrate orthologs (Kuraku et al. 2016). The vertebrate ALDH1A (retinaldehyde dehydrogenase 1 A) gene is thought to have been foundational for the emergence of vertebrates (Duester 2008). In vertebrates, the ALDH1A gene encodes cytosolic enzymes capable of metabolizing all-trans-retinaldehyde to retinoic acid, a molecular signal that guides vertebrate development and adipogenesis (Holmes 2015). In order to identify ALDH1A orthologs of Tylototriton wenxianensis (wenxian knobby newt), ORTHOSCOPE analysis was conducted. At first, using a Blast search, five candidate sequences similar to the H. sapiens ALDH1A1 gene sequence (ENST00000297785.7) were selected from the T. wenxianensis transcriptome assembly (GESS00000000 in NCBI). Then an ORTHOSCOPE analysis was conducted using these five sequences as queries. As a result, four out of the five sequences were identified as members of the vertebrate ALDH1A orthogroup (table 1C, fig. 4, and supplementary fig. S2C, Supplementary Material online). A phylogenetic analysis focusing on orthogroup members (supplementary fig. S2C3, Supplementary Material online) indicated that the T. wenxianensis sequences distributed among these ALDH1A gene lineages were duplicated during VGD events. Although the analysis did not provide strong support for relationships among ALDH1A1-3 genes of T. wenxianensis, orthology can be identified by means of conserved synteny. In fact, a syntenic analysis (Canestro et al. 2009) suggests a closer relationship between ALDH1A-1 and ALDH1A-2 gene lineages and the loss of the ALDH1A-3 gene lineage counterpart just after VGDs.
We compared the ORTHOSCOPE result with that of OrthoFinder analysis using the same T. wenxianensis transcriptome assembly. Under taxonomic sampling comprising only vertebrates (table 1C), OrthoFinder identified the same four orthologs found by ORTHOSCOPE (supplementary fig. S2C2, Supplementary Material online). However, the OrthoFinder analysis also identified additional sequences, including T. wenxianensis (supplementary table S1C, Supplementary Material online, GESS01039398.1) with an extremely short sequence (324 bp) compared with the others (1,539–1,722 bp). In order to determine their phylogenetic positions, by including this additional T. wenxianensis sequence as one of the queries (supplementary fig. S2C4, Supplementary Material online), an ORTHOSCOPE analysis was conducted with the top five Blast hits. As a result, although these additional sequences were included in the vertebrate ALDH1A gene lineage, except for a short sequence (777 bp) of H. sapiens (ENST00000546840.2), the sequence of T. wenxianensis was not grouped with the other T. wenxianensis sequences within the same gene lineage. The sequence alignment and the resultant gene tree produced by ORTHOSCOPE highlighted an ambiguity in the assembly of this extremely short sequence.
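The length contrast reported above (324 bp versus 1,539–1,722 bp) suggests a simple screening step that a user could apply to candidate sequences before interpreting a gene tree. The sketch below flags candidates that are much shorter than the median candidate length. It is an illustration only: the function name `flag_short_sequences`, the candidate labels, and the 0.5 cutoff are assumptions, and the check is not the “Aligned site rate” threshold used by ORTHOSCOPE.

```python
from statistics import median


def flag_short_sequences(lengths, min_fraction=0.5):
    """Return names of sequences shorter than min_fraction of the median length.

    lengths: dict of {sequence name: length in bp}
    min_fraction: hypothetical cutoff chosen for this sketch
    """
    typical = median(lengths.values())
    return sorted(name for name, n in lengths.items() if n < min_fraction * typical)


if __name__ == "__main__":
    # Lengths taken from the values quoted in the text; labels are placeholders.
    candidates = {
        "short_candidate": 324,
        "candidate_min": 1539,
        "candidate_max": 1722,
    }
    print(flag_short_sequences(candidates))
```

Sequences flagged this way are not necessarily wrong, but, as in the case described above, they deserve a closer look at the alignment and the assembly before they are counted as orthogroup members.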

Actinopterygian Phospholipase C Beta 1

ORTHOSCOPE can also identify orthologs from genes that experienced TGD (see fig. 2). To identify Phospholipase C beta 1 (PLCB1) orthologs of Coregonus lavaretus (common whitefish) from a transcriptome assembly (GESS00000000), an ORTHOSCOPE analysis was conducted using three ortholog candidates of the C. lavaretus PLCB1 gene as queries. The resultant tree showed that two of the three candidate sequences were found in the actinopterygian PLCB1 orthogroup and placed in two gene lineages, teleost PLCB1-1 and -2 (table 1D, fig. 4, and supplementary fig. S2D, Supplementary Material online). Teleost gene lineages PLCB1-1 and -2 are thought to have been derived from TGD, according to phylogenetic and synteny analyses (figs. S27 and S64 in Sato et al. 2009, respectively). Moreover, the ORTHOSCOPE analysis identified duplicated genes in the lineages leading to Cyprinus carpio (common carp) and to Salmo salar (Atlantic salmon)/C. lavaretus in the teleost PLCB1-2 gene lineage (supplementary fig. S2D3, Supplementary Material online). These duplicates may have been derived from the carp genome duplication and the salmonid genome duplication, respectively (supplementary fig. S1D, Supplementary Material online). From the same transcriptome assembly, OrthoFinder identified the same orthologs of C. lavaretus under a taxonomic sampling comprising only actinopterygians (table 1D and supplementary fig. S2D2, Supplementary Material online). Among the other orthogroup members, however, OrthoFinder included two additional sequences, C. carpio (XP018928569.1) and S. salar (XP014066076.1) (supplementary table S1D, Supplementary Material online). When the top five sequences of the Blast search were employed (supplementary fig. S2D4, Supplementary Material online), the ORTHOSCOPE analysis included the C. carpio sequence in the actinopterygian PLCB1 gene lineage as an orthogroup member. Again, in comparison with the three C. lavaretus sequences (3,597–3,768 bp), the short length of this sequence (1,497 bp) may have prevented its inclusion among the top three Blast hits. In contrast, the S. salar sequence was not included in the bony vertebrate PLCB1 gene lineage, probably due to its long branch (supplementary fig. S2D4, Supplementary Material online). In the alignment produced by ORTHOSCOPE, a highly diversified region was found in this long S. salar sequence (6,328 bp). A possible mis-assembly made ortholog identification of this sequence difficult.
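
The effect of retaining the top three versus the top five Blast hits can be illustrated with a minimal Python sketch. It assumes that hits are available in the standard 12-column BLAST tabular format (`-outfmt 6`, with the bit score in the last column) and that hits are ranked by bit score; ORTHOSCOPE's internal ranking may differ. The function `top_hits`, the helper `make_line`, and all hit names and scores are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def top_hits(tabular_lines, n):
    """Keep the n best-scoring subject IDs per query from BLAST tabular output."""
    best = defaultdict(dict)  # query -> {subject: best bit score seen}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if bitscore > best[query].get(subject, float("-inf")):
            best[query][subject] = bitscore
    return {
        query: [s for s, _ in
                sorted(subjects.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for query, subjects in best.items()
    }

def make_line(query, subject, bitscore):
    """Build a 12-column line; the nine unused columns get dummy values."""
    return "\t".join([query, subject] + ["0"] * 9 + [str(bitscore)])

# Invented hits: three long subjects score higher than one short subject.
hits = [
    make_line("query1", "long_a", 900),
    make_line("query1", "long_b", 850),
    make_line("query1", "long_c", 800),
    make_line("query1", "short_d", 300),
]

print(top_hits(hits, 3)["query1"])  # the short subject is absent
print(top_hits(hits, 5)["query1"])  # the short subject is retained
```

Under these assumptions, a short sequence that aligns over only part of the query tends to receive a lower bit score, so it can fall outside a small hit cutoff and reappear when the cutoff is relaxed.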

Conclusions

ORTHOSCOPE, a fully automatic web pipeline, successfully identified orthologs in the four example analyses presented here, consistent with manual identifications in prior research. As shown in the present study, ORTHOSCOPE can be used to evaluate orthologs identified in genome-scale analyses by other programs. ORTHOSCOPE users can evaluate the reliability of orthogroups using estimated gene trees in light of their knowledge of species/gene evolutionary histories, even when the same orthogroups are identified by different programs. In addition to inferring gene function from model to nonmodel organisms (but see Gabaldon and Koonin 2013), orthogroups identified by ORTHOSCOPE can be applied to evolutionary studies of gene regulatory networks (Marti-Solans et al. 2016) and local synteny (Inoue et al. 2017) that include nonmodel organisms. Moreover, with regard to genes derived from VGDs or TGD, ORTHOSCOPE can be used to evaluate phylogenetic markers in vertebrates or teleosts by identifying the presence or absence of ohnologs, which complicate phylogenetic analyses. We will include newly published genome-wide protein-coding sequences from various metazoan species and expand focal groups in ORTHOSCOPE (e.g., spiralians and urochordates) in response to user requests.

Materials and Methods

The server runs on the Linux operating system and an Apache HTTP Server provides web services. Python scripts process all data and requests from users. All these resources have been extensively used and are well supported.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.
  33 in total

1.  ape 3.0: New tools for distance-based phylogenetics and evolutionary analysis in R.

Authors:  Andrei-Alin Popescu; Katharina T Huber; Emmanuel Paradis
Journal:  Bioinformatics       Date:  2012-04-11       Impact factor: 6.937

2.  EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates.

Authors:  Albert J Vilella; Jessica Severin; Abel Ureta-Vidal; Li Heng; Richard Durbin; Ewan Birney
Journal:  Genome Res       Date:  2008-11-24       Impact factor: 9.043

Review 3.  Retinoic acid synthesis and signaling during early organogenesis.

Authors:  Gregg Duester
Journal:  Cell       Date:  2008-09-19       Impact factor: 41.582

4.  Distinguishing homologous from analogous proteins.

Authors:  W M Fitch
Journal:  Syst Zool       Date:  1970-06

5.  Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods.

Authors:  Z Yang
Journal:  J Mol Evol       Date:  1994-09       Impact factor: 2.395

6.  Evolutionary implications of morphogenesis and molecular patterning of the blind gut in the planarian Schmidtea polychroa.

Authors:  José María Martín-Durán; Rafael Romero
Journal:  Dev Biol       Date:  2011-02-03       Impact factor: 3.582

7.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

8.  PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments.

Authors:  Mikita Suyama; David Torrents; Peer Bork
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

9.  Consequences of lineage-specific gene loss on functional evolution of surviving paralogs: ALDH1A and retinoic acid signaling in vertebrate genomes.

Authors:  Cristian Cañestro; Julian M Catchen; Adriana Rodríguez-Marí; Hayato Yokoi; John H Postlethwait
Journal:  PLoS Genet       Date:  2009-05-29       Impact factor: 5.917

10.  trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Authors:  Salvador Capella-Gutiérrez; José M Silla-Martínez; Toni Gabaldón
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

