Justin B Miller, Brandon D Pickett, Perry G Ridge.
Abstract
MOTIVATION: Orthologous gene identification is fundamental to all aspects of biology. For example, ortholog identification between species can provide functional insights for genes of unknown function and is a necessary step in phylogenetic inference. Currently, most ortholog identification algorithms require all-versus-all BLAST comparisons, which are time-consuming and memory intensive.
Year: 2019 PMID: 30084941 PMCID: PMC6378933 DOI: 10.1093/bioinformatics/bty669
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Estimated time of species divergence
| Species 1 | Species 2 | Estimated time | Median time | Confidence interval |
|---|---|---|---|---|
| | | 6.65 MYA | 6.4 MYA | 6.23–7.07 MYA |
| | | 96 MYA | 94 MYA | 91–102 MYA |
| | | 312 MYA | 320 MYA | 297–326 MYA |
| | | 96 MYA | 94 MYA | 91–102 MYA |
| | | 312 MYA | 320 MYA | 297–326 MYA |
| | | 312 MYA | 320 MYA | 297–326 MYA |
Note: Species divergence times are the average estimates from the studies included in TimeTree (Hedges et al., 2015; Kumar and Hedges, 2011; Kumar et al.).
Ortholog groups recovered using JustOrthologs and CombineOrthoGroups
| Genes with the same annotation | Genes with other annotations | Genes with unknown annotations | Total genes | Reason for other annotations |
|---|---|---|---|---|
| 127 | 0 | 63 | 190 | N/A |
| 178 | 0 | 7 | 185 | N/A |
| 172 | 1 | 7 | 180 | XP_018109801.1 has 100% BLAST identity with NP_001087532.1, which is annotated the same as the other 172 genes |
| 155 | 2 | 21 | 178 | The nucleotide composition and exon length of XP_001959559.1 and XP_002071834.1 are similar to XP_010179458.1. However, the alignment is very different. These two genes are probably incorrectly reported as orthologous by JustOrthologs |
| 169 | 0 | 9 | 178 | N/A |
| 169 | 1 | 5 | 175 | XP_414807.2 has a 99% BLAST identity with XP_015732072.1 from a closely related species, which is annotated the same as the other 169 genes |
| 166 | 0 | 5 | 171 | N/A |
| 165 | 1 | 5 | 171 | NP_068697.1 is annotated Trp53inp1 instead of TP53INP1 |
| 163 | 1 | 6 | 170 | XP_014347657.1 is annotated LRRC8E instead of LRRC8C |
| 165 | 0 | 4 | 169 | N/A |
| 161 | 0 | 7 | 168 | N/A |
| 162 | 0 | 5 | 167 | N/A |
| 161 | 1 | 4 | 166 | XP_020368157.1 is incorrectly reported as orthologous by JustOrthologs. The CDS region lengths matched some exons in XP_005866852.1, but the alignment of the sequences was very poor |
| 163 | 0 | 3 | 166 | N/A |
| 152 | 1 | 13 | 166 | XP_018123052.1 is annotated grb10.L instead of GRB10 |
| 161 | 0 | 4 | 165 | N/A |
| 156 | 0 | 9 | 165 | N/A |
| 159 | 0 | 6 | 165 | N/A |
| 160 | 0 | 5 | 165 | N/A |
| 160 | 0 | 4 | 164 | N/A |
| 159 | 0 | 5 | 164 | N/A |
| 158 | 0 | 5 | 163 | N/A |
| 156 | 1 | 5 | 162 | XP_017312051.1 is incorrectly reported as orthologous by JustOrthologs. The CDS region lengths matched several exons within XP_020920808.1, but the alignment of the sequences was poor |
| 156 | 0 | 5 | 161 | N/A |
| 158 | 0 | 3 | 161 | N/A |
| 153 | 0 | 7 | 160 | N/A |
| 149 | 0 | 9 | 158 | N/A |
| 154 | 0 | 3 | 157 | N/A |
| 146 | 0 | 11 | 157 | N/A |
| 153 | 0 | 4 | 157 | N/A |
Note: The first 30 ortholog groups are ordered from the most genes to the fewest genes. The first column shows the number of genes with the same annotations. The second column shows the number of genes with a different annotation than the genes in the first column. The third column shows the number of genes without annotations. The fourth column shows the total number of genes in the ortholog group. The fifth column is an analysis of why genes in the second column were not annotated the same as genes in the first column but were reported as orthologous by JustOrthologs. Each gene comes from a different species.
Whole genome comparison of different species
| Species 1 | Species 2 | Number of genes in species 1 | Number of genes in species 2 | Number of shared ortholog annotations from HGNC | True positives reported | False positives reported | Unnamed genes reported in orthologous pairs | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|---|---|
| | | 20 088 | 17 900 | 14 653 | 14 119 | 462 | 905 | 96.83 | 96.36 |
| | | 20 088 | 16 691 | 12 725 | 8229 | 150 | 246 | 98.21 | 64.67 |
| | | 20 088 | 12 643 | 10 659 | 841 | 38 | 35 | 95.68 | 7.89 |
| | | 16 420 | 12 643 | 9163 | 5132 | 139 | 597 | 97.36 | 56.01 |
| | | 21 920 | 22 408 | 5832 | 683 | 296 | 688 | 69.77 | 11.71 |
| | | 19 450 | 22 408 | 5699 | 199 | 104 | 205 | 65.68 | 3.49 |
| | | 30 680 | 40 642 | 2800 | 2424 | 183 | 18 300 | 92.98 | 86.57 |
| | | 27 785 | 21 832 | 8645 | 8326 | 94 | 9857 | 98.88 | 96.31 |
| | | 17 492 | 13 837 | 10 993 | 10 238 | 4 | 1615 | 99.96 | 93.13 |
| | | 21 815 | 21 481 | 15 199 | 12 183 | 720 | 279 | 94.42 | 80.16 |
| | | 17 980 | 19 208 | 12 894 | 11 929 | 97 | 1337 | 99.19 | 92.52 |
| | | 17 980 | 16 297 | 11 411 | 7991 | 18 | 502 | 99.78 | 70.03 |
| | | 12 225 | 14 150 | 9825 | 7041 | 15 | 662 | 99.79 | 71.66 |
| | | 12 225 | 11 852 | 8770 | 6565 | 14 | 695 | 99.79 | 74.86 |
| | | 24 179 | 22 628 | 0 | 0 | 0 | 14 004 | N/A | N/A |
Note: All available genes are compared between various species. The first two columns are the names of the species being compared. Columns three and four indicate how many genes are present in each species. Column five shows how many genes have the same ortholog annotations in both species. Column six shows the number of true positives JustOrthologs identifies. Column seven shows the number of false positives identified by JustOrthologs. Column eight shows the number of genes reported as orthologous by JustOrthologs but not named by the HGNC. Columns nine and ten report the precision and recall of the compared species, respectively.
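The precision and recall columns can be reproduced from the counts in the same row. The sketch below assumes precision is true positives divided by all named pairs reported (true plus false positives) and recall is true positives divided by the number of shared HGNC annotations. These formulas are inferred from the tabulated values rather than stated in the note, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(true_pos, false_pos, shared_annotations):
    """Return (precision %, recall %) rounded to two decimals.

    Assumed definitions (inferred from the table, not stated in the source):
      precision = TP / (TP + FP)
      recall    = TP / shared ortholog annotations
    A value is None when its denominator is zero (the table's N/A case).
    """
    reported = true_pos + false_pos
    precision = round(100 * true_pos / reported, 2) if reported else None
    recall = (
        round(100 * true_pos / shared_annotations, 2)
        if shared_annotations
        else None
    )
    return precision, recall


# (true positives, false positives, shared annotations) copied from the
# first, second, and last rows of the table above.
rows = [
    (14119, 462, 14653),
    (8229, 150, 12725),
    (0, 0, 0),
]

for tp, fp, shared in rows:
    print(precision_recall(tp, fp, shared))
```

Under these assumptions, the first two rows give 96.83/96.36 and 98.21/64.67, matching the table, and the last row has no defined value for either metric. Unnamed genes reported in orthologous pairs (column eight) do not enter either formula.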