Rahul V Rane, John G Oakeshott, Thu Nguyen, Ary A Hoffmann, Siu F Lee.
Abstract
BACKGROUND: Distinguishing orthologous and paralogous relationships between genes across multiple species is essential for comparative genomic analyses. Various computational approaches have been developed to resolve these evolutionary relationships, but strong trade-offs between precision and recall of orthologue prediction remain an ongoing challenge.
Keywords: Gene birth; Gene duplication; Inparalogue; Orthologue
Year: 2017 PMID: 28859620 PMCID: PMC5580312 DOI: 10.1186/s12864-017-4079-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Process flow diagram for the Orthonome pipeline. The five-step pipeline combines the power of heuristic and greedy algorithms to improve the accuracy and recall from both well-annotated and draft genomes
Fig. 2 Species tree discordance and orthologue recall tests for six pipelines on the 12 and 20 Drosophila species data sets. The circular markers represent the results for the 20 species data set while the square markers represent those for the 12 high-quality FlyBase genomes. The yellow circular markers and grey square markers represent pipelines that produce orthogonal orthologue sets with only one gene per species in a cluster. Orthonome provides a superior combination of low average tree error and high recall with both data sets
Fig. 3 Comparison between Orthonome, MSOAR and OrthoDB for orthologue identification across syntenically supported regions and fast-evolving gene families. a A comparison of orthologue capture success in a 40 kb syntenic region between D. sechellia and D. melanogaster. Orthologous relationships are indicated by vertical lines. Black lines = orthologous pairs supported by OrthoDB, MSOAR and Orthonome; blue lines = orthologous pairs detected only by MSOAR and Orthonome; red line = orthologous pair recovered only by Orthonome. b The newly recovered orthologous pair FBgn0004554 and FBgn0170274 in panel a is supported by high (>90%) amino acid identity. c Orthonome is able to split OrthoDB orthogroup EOG7KHP47 (dotted orange clade), consisting of P450 monooxygenases, into three independent orthogroups (solid blue clade). The three D. melanogaster genes are highlighted in green and the six additional genes that were allocated to the orthogroups by Orthonome only are marked in red. OrthoDB was unable to identify orthology to these six genes
Numbers of orthologues, orthogroups, inparalogues and gene births identified by Orthonome, OrthoDB, MultiMSOAR2, OMA groups and reciprocal best hit (RBH) using the same input data
| Comparison | Measures | Orthonome | OrthoDB | MultiMSOAR2 | OMA | RBH |
|---|---|---|---|---|---|---|
| Twelve FlyBase genomes | Average number of genes per species | 15,446 | | | | |
| | Average number of orthologues per species | 13,517 | 13,270 | 13,310 | 12,934 | 13,545 |
| | Average number of inparalogues per species | 1519 | NA | 1543 | NA | NA |
| | Average number of gene births per species | 400 | 2805 | 593 | NA | NA |
| | Number of 1:1 (n=all) orthogroups | 9538 | 6621 | 9595 | 5380 | 8643 |
| | Number of 1:1 (1<n<all) orthogroups | 6711 | 10,231 | 6042 | 15,493 | 9756 |
| Twenty genomes (Twelve FlyBase + eight modENCODE) | Average number of genes per species | 15,055 | | | | |
| | Average number of orthologues per species | 13,272 | 13,091 | 12,964 | 12,565 | 13,504 |
| | Average number of inparalogues per species | 1305 | NA | 1422 | NA | NA |
| | Average number of gene births per species | 468 | 2343 | 670 | NA | NA |
| | Number of 1:1 (n=all) orthogroups | 6541 | 3912 | 7491 | 2555 | 5757 |
| | Number of 1:1 (1<n<all) orthogroups | 14,047 | 16,681 | 11,880 | 25,694 | 16,799 |
Inparalogue predictions were carried out only in Orthonome and MultiMSOAR2. NA denotes the lack of inparalogue identification by OrthoDB and values that could not be calculated for MSOAR2 (since it has a different scoring method than Orthonome)
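
The reciprocal best hit (RBH) baseline in the table can be illustrated with a minimal sketch. This is not the implementation used in the study: the function names, the gene identifiers and the similarity scores below are hypothetical, and the input is assumed to be a pair of dictionaries mapping (query, subject) gene pairs to an alignment score such as a bit score.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene.

    Ties are resolved in favour of the first pair encountered.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return gene pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between genes of species A (a1..a3) and species B (b1, b2).
a_vs_b = {("a1", "b1"): 900.0, ("a1", "b2"): 300.0,
          ("a2", "b2"): 500.0, ("a3", "b2"): 700.0}
b_vs_a = {("b1", "a1"): 880.0, ("b2", "a3"): 690.0, ("b2", "a2"): 480.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input, a2's best hit is b2, but b2's best hit is a3, so a2 is left without an orthologue. RBH yields at most one partner per gene for each species pair, so it does not report inparalogues.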