Xavier Grau-Bové, Arnau Sebé-Pedrós.
Abstract
Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]) is a tool that automates the identification of clusters of orthologous genes from precomputed phylogenetic trees and the classification of gene families. It identifies orthology relationships between genes with the species overlap algorithm, which infers taxonomic information (speciation versus duplication events) from the gene tree topology, and then uses MCL to identify orthology clusters and provide annotated gene families. Our benchmarking shows that this approach, when provided with accurate phylogenies, identifies manually curated orthogroups with very high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner and provides reusable outputs and phylogeny-aware gene annotations that can inform comparative genomics and gene family evolution analyses.
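The species overlap rule mentioned in the abstract can be illustrated with a small, self-contained sketch. It is a simplified textbook version of the rule, not Possvm's own code, and it assumes a rooted gene tree stored as nested tuples plus a hypothetical naming convention in which every gene name begins with a species code and an underscore. A node whose child subtrees share no species is treated as a speciation event, and every pair of genes split across it is reported as orthologous.

```python
from itertools import combinations, product

# A rooted gene tree as nested tuples: a leaf is a gene name (str), an
# internal node is a tuple of child subtrees.
# ASSUMPTION (hypothetical convention): each gene name starts with a species
# code followed by an underscore, e.g. "Hsap_HOXA1".


def species_of(gene):
    """Return the species code encoded in a gene name (hypothetical convention)."""
    return gene.split("_", 1)[0]


def leaves(node):
    """Return all gene names below a node."""
    if isinstance(node, str):
        return [node]
    return [gene for child in node for gene in leaves(child)]


def ortholog_pairs(node, pairs=None):
    """Species overlap rule: if no two child subtrees of a node share a species,
    the node is a speciation event and every pair of genes split across it is
    orthologous; otherwise the node is a duplication and yields no pairs."""
    if pairs is None:
        pairs = set()
    if isinstance(node, str):
        return pairs
    child_genes = [leaves(child) for child in node]
    child_species = [{species_of(g) for g in genes} for genes in child_genes]
    # Any shared species between two children marks a duplication node.
    is_duplication = any(a & b for a, b in combinations(child_species, 2))
    if not is_duplication:
        for genes_a, genes_b in combinations(child_genes, 2):
            for a, b in product(genes_a, genes_b):
                pairs.add(tuple(sorted((a, b))))
    # Recurse so that deeper speciation nodes are also evaluated.
    for child in node:
        ortholog_pairs(child, pairs)
    return pairs


tree = ((("Hsap_A1", "Mmus_A1"), ("Hsap_A2", "Mmus_A2")), "Nvec_A")
for pair in sorted(ortholog_pairs(tree)):
    print(pair)
```

In this toy tree the node joining the A1 and A2 clades contains human and mouse genes on both sides, so it is a duplication and A1/A2 genes are not paired with each other, while the root separates the single Nvec gene from all four others and is treated as a speciation.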
Keywords: comparative genomics; gene phylogenetics; orthology inference
Year: 2021 PMID: 34352080 PMCID: PMC8557443 DOI: 10.1093/molbev/msab234
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1(A) Summary of the main steps in Possvm. The final step produces an annotated table with the orthology group assignments of each gene, as well as, optionally, their orthologs in a reference species (human in this example). (B) Example of the iterative midpoint rooting procedure. In this example, the original root (r1) results in the identification of four orthogroups whereas the second iteration (r2) results in two.
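The clustering step that turns orthologous pairs into the orthogroup assignments summarized in Fig. 1A can be sketched with a plain-Python version of the standard MCL loop: expansion (squaring a column-stochastic matrix) alternated with inflation (raising entries to a power and renormalizing columns) until the matrix stops changing. This is an illustrative sketch, not Possvm's implementation; the function name `mcl`, the unweighted edges, the added self-loops, and the inflation value of 2.0 are all assumptions chosen for the example.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (a column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Hypothetical helper: cluster an unweighted graph of (gene, gene) edges."""
    nodes = sorted({g for edge in edges for g in edge})
    index = {g: i for i, g in enumerate(nodes)}
    n = len(nodes)
    # Adjacency matrix with self-loops, a common MCL preprocessing step.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        m[index[a]][index[b]] = m[index[b]][index[a]] = 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: spread flow along longer walks
        inflated = normalize_columns(  # inflation: boost strong flows
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that keeps mass (an attractor) defines one cluster.
    clusters = []
    for row in m:
        members = sorted(nodes[j] for j in range(n) if row[j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


pairs = [("Hsap_A1", "Mmus_A1"), ("Hsap_A1", "Nvec_A"),
         ("Mmus_A1", "Nvec_A"), ("Hsap_A2", "Mmus_A2")]
print(mcl(pairs))
```

Because flow never crosses between disconnected components, this toy input yields one cluster per component; on denser ortholog graphs, the inflation parameter controls how finely MCL splits loosely connected groups.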
Fig. 2(A) Precision, recall, and F-score values for 43 ANTP families defined in HomeoDB. Mean values are weighted by family size. (B) Effect of gene misplacement on precision, recall, and F-score for the ANTP data set. (C) Distribution of accuracy statistics (precision, recall, and adjusted Rand index) for the ANTP families, using various methods (details in supplementary material S2, Note 1, Supplementary Material online). (D) Effect of the iterative rooting strategy on precision, recall, and F-score for the Orthobench tree collection. The pie plot shows the number of inflated pairs of trees that had the same or different roots and orthology solutions under each rooting strategy. The bar plots show how often iterative or midpoint rooting improved recall or precision in the subset of trees with different roots and overall accuracies. Source data are available in supplementary material S3, Supplementary Material online.
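The size-weighted averaging mentioned in Fig. 2A can be made concrete with a short sketch. The per-family counting scheme used here (genes correctly assigned, wrongly assigned, or missed relative to the curated family) is an assumption for illustration, as are the `FamilyCounts` and `summarize` names; the paper's exact counting is described in its supplementary material. Per-family precision, recall, and F-score (the harmonic mean of the two) are computed first and then averaged with family size as the weight.

```python
from dataclasses import dataclass


@dataclass
class FamilyCounts:
    """Hypothetical per-family benchmark counts against a curated reference."""
    tp: int    # genes correctly assigned to the family's orthogroup (assumed scheme)
    fp: int    # genes wrongly assigned to it
    fn: int    # reference genes that were missed
    size: int  # reference family size, used as the averaging weight


def precision(c):
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c):
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def summarize(families):
    """Per-family metrics, averaged with family size as the weight."""
    ps = [precision(c) for c in families]
    rs = [recall(c) for c in families]
    fs = [f_score(p, r) for p, r in zip(ps, rs)]
    ws = [c.size for c in families]
    return {"precision": weighted_mean(ps, ws),
            "recall": weighted_mean(rs, ws),
            "f_score": weighted_mean(fs, ws)}


families = [FamilyCounts(tp=10, fp=0, fn=0, size=10),
            FamilyCounts(tp=1, fp=1, fn=1, size=2)]
print(summarize(families))
```

With these made-up counts, the large, perfectly recovered family dominates: each weighted mean is 11/12 (about 0.92), whereas an unweighted mean over the two families would be 0.75.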
Fig. 3(A) Global phylogeny of ANTP genes in bilaterians, cnidarians, and placozoans. (B) Summary of annotated genes in Cnidaria and in Nematostella vectensis. (C–E) Three examples of Possvm annotations from the ANTP phylogeny, including evolutionary relationships reported at the gene-pair level.