Abstract
Orthobench is the standard benchmark for assessing the accuracy of orthogroup inference methods. It contains 70 expert-curated reference orthogroups (RefOGs) that span the Bilateria and cover a range of different challenges for orthogroup inference. Here, we leveraged improvements in tree inference algorithms and computational resources to reinterrogate these RefOGs and carry out an extensive phylogenetic delineation of their composition. This phylogenetic revision altered the membership of 31 of the 70 RefOGs: 24 required extensive revision and 7 required minor changes. We then used these revised RefOGs to assess the orthogroup inference accuracy of widely used orthogroup inference methods. Finally, we provide an open-source benchmarking suite to support the future development and use of the Orthobench benchmark.
Keywords: benchmark; orthogroup; orthology
Year: 2020 PMID: 33022036 PMCID: PMC7738749 DOI: 10.1093/gbe/evaa211
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Evaluation and revision of RefOGs from Orthobench. (A) Summary of the corrections made to the RefOG data set. (B) Reasons for major corrections to RefOGs from the previous study. (C) The species tree. The green shaded area shows the 12 Bilaterian species for which the Bilateria-level orthogroups (RefOGs) were defined. One outgroup species, which appears in the gene trees in the figure, is also shown. (D) Example of a major improvement in which clades had been missing from the original RefOG: the RefOG 63 gene tree as determined in the original study. (E) Gene tree from this study showing the corrected RefOG 63 orthogroup shaded green. Phylogenetic analysis revealed that the original RefOG 32 comprises two separate orthogroups that diverged at a gene duplication event preceding the divergence of the vertebrates. (F) Example of a major improvement in which extra clades of genes had been included in the original RefOG: the RefOG 32 gene tree as determined in the original study. (G) Gene tree from this study showing the corrected RefOG 32 orthogroup. Phylogenetic analysis revealed that these genes diverged from the remaining genes in the tree at a gene duplication event predating the divergence of the Deuterostomes and Protostomes. Gene trees show the previously identified orthogroup containing the newly delimited orthogroup from this study (green shaded clade). Genes are colored according to species. Corresponding genes identified as members of the orthogroup in both studies are underlined (including where identifiers have been updated). Red dot = 100% bootstrap support.
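The caption delimits orthogroups by locating gene duplication events relative to speciation events. The revisions in the study came from expert phylogenetic analysis of bootstrapped gene trees; the sketch below instead uses the species-overlap rule, a common heuristic swapped in purely for illustration: an internal node of a rooted gene tree is treated as a duplication if two of its child clades contain genes from the same species. The `Node` class, the helper functions, and the toy gene IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Node of a rooted gene tree (hypothetical); leaves carry a gene ID and its species."""
    gene: Optional[str] = None
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def leaf(gene: str, species: str) -> Node:
    """Convenience constructor for a leaf (hypothetical helper)."""
    return Node(gene=gene, species=species)


def leaves(node: Node) -> List[Node]:
    """All leaf nodes below (and including) `node`."""
    if not node.children:
        return [node]
    return [lf for child in node.children for lf in leaves(child)]


def leaf_species(node: Node) -> Set[str]:
    """Set of species represented among the leaves below `node`."""
    return {lf.species for lf in leaves(node)}


def duplication_nodes(root: Node) -> List[Node]:
    """Species-overlap rule: an internal node is labelled a duplication
    if any two of its child clades contain genes from the same species."""
    found: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        seen: Set[str] = set()
        for child in node.children:
            species = leaf_species(child)
            if seen & species:
                found.append(node)
                break
            seen |= species
        stack.extend(node.children)
    return found


# Toy tree: one duplication at the root, then a human/fly speciation in each copy.
tree = Node(children=[
    Node(children=[leaf("HsA", "human"), leaf("DmA", "fly")]),
    Node(children=[leaf("HsB", "human"), leaf("DmB", "fly")]),
])
for dup in duplication_nodes(tree):
    print([sorted(lf.gene for lf in leaves(c)) for c in dup.children])
# prints [['DmA', 'HsA'], ['DmB', 'HsB']]
```

In the toy tree, both child clades of the root contain a human (deuterostome) and a fly (protostome) gene, so the root duplication predates the deuterostome/protostome split. Under this rule the two child clades would be delimited as separate Bilateria-level orthogroups, which is the situation described for panel G.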
Fig. 2. Benchmark results for the methods tested. (A) Precision, recall, and F-score. (B) Number of orthogroups predicted exactly, with no extra or missing genes.
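The caption does not spell out how predicted orthogroups are matched to RefOGs when computing these scores, so the following is only a minimal sketch under one stated assumption: each RefOG is compared with the single predicted orthogroup that shares the most genes with it, and true positives, false positives, and false negatives are pooled over all RefOGs before precision, recall, and F-score are computed. It also assumes predictions are restricted to the same species as the RefOGs. The function name `score` and the gene IDs are hypothetical, not the API of the benchmarking suite.

```python
from typing import Dict, Iterable, Set, Tuple


def score(refogs: Dict[str, Set[str]],
          predicted: Iterable[Set[str]]) -> Tuple[float, float, float, int]:
    """Pooled precision, recall, F-score, and count of exactly predicted RefOGs.

    Assumption (not taken from the source): each RefOG is matched to the one
    predicted orthogroup that shares the most genes with it.
    """
    predictions = [set(og) for og in predicted]
    tp = fp = fn = exact = 0
    for ref in refogs.values():
        # Best-overlapping prediction; empty set if nothing was predicted.
        best = max(predictions, key=lambda og: len(og & ref), default=set())
        if not best & ref:
            best = set()  # no shared genes: count the whole RefOG as missed
        tp += len(best & ref)  # RefOG genes recovered
        fp += len(best - ref)  # extra genes in the matched prediction
        fn += len(ref - best)  # RefOG genes missing from the matched prediction
        if best == ref:
            exact += 1  # no extra and no missing genes (panel B)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score, exact


# Hypothetical toy data: gene "d" is split off into a separate predicted group.
refogs = {"RefOG_A": {"a", "b", "c", "d"}, "RefOG_B": {"e", "f"}}
predicted = [{"a", "b", "c"}, {"d", "x"}, {"e", "f"}]
print(score(refogs, predicted))
```

With this matching, a predicted orthogroup that is not the best match for any RefOG (here `{"d", "x"}`) adds no false positives; its RefOG genes are counted only as false negatives of the RefOG they belong to.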
Fig. 3. Breakdown of the precision (P), recall (R), and F-score (F) of the methods under different levels of technical challenges to orthogroup inference. (A–C) RefOG size, N. (A) Low, N < 15; (B) medium, 15 ≤ N < 31; and (C) high, N ≥ 31. (D–F) Evolutionary rate, measured by mean sequence identity, I. (D) Low evolutionary rate, I > 73.8%; (E) medium, 62.4% < I ≤ 73.8%; and (F) high, I ≤ 62.4%. (G–I) Alignment quality, Q = norMD. (G) Low, Q < 0.88; (H) medium, 0.88 ≤ Q ≤ 1.15; and (I) high, Q > 1.15. (J–L) Number of domains, D. (J) Low, D = 1; (K) medium, 2 ≤ D ≤ 3; and (L) high, D > 3.
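The thresholds in this caption translate directly into bin-assignment rules. A minimal sketch, with one classifier per challenge following the inequalities as stated (a low evolutionary rate corresponds to a high mean identity, so those comparisons run the other way). The function names and RefOG IDs are hypothetical, and the per-RefOG values of N, I, Q, and D are assumed to have been computed beforehand.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def size_class(n: int) -> str:
    """RefOG size N: low N < 15, medium 15 <= N < 31, high N >= 31."""
    if n < 15:
        return "low"
    return "medium" if n < 31 else "high"


def rate_class(identity: float) -> str:
    """Evolutionary rate from mean sequence identity I (%).
    Higher identity means slower evolution: low I > 73.8,
    medium 62.4 < I <= 73.8, high I <= 62.4."""
    if identity > 73.8:
        return "low"
    return "medium" if identity > 62.4 else "high"


def alignment_class(q: float) -> str:
    """Alignment quality Q (norMD): low Q < 0.88, medium 0.88 <= Q <= 1.15, high Q > 1.15."""
    if q < 0.88:
        return "low"
    return "medium" if q <= 1.15 else "high"


def domain_class(d: int) -> str:
    """Number of domains D: low D = 1, medium 2 <= D <= 3, high D > 3."""
    if d < 1:
        raise ValueError("the caption defines bins only for D >= 1")
    if d == 1:
        return "low"
    return "medium" if d <= 3 else "high"


def bin_refogs(values: Dict[str, float],
               classify: Callable[[float], str]) -> Dict[str, List[str]]:
    """Group RefOG IDs by the bin their value falls into."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for refog_id, value in sorted(values.items()):
        bins[classify(value)].append(refog_id)
    return dict(bins)


# Hypothetical RefOG sizes.
print(bin_refogs({"RefOG_1": 12, "RefOG_2": 40, "RefOG_3": 20}, size_class))
# prints {'low': ['RefOG_1'], 'high': ['RefOG_2'], 'medium': ['RefOG_3']}
```

Each bin's RefOG IDs could then be scored separately to produce one panel per challenge level.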