Guanqun Shi, Liqing Zhang, Tao Jiang.
Abstract
BACKGROUND: Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model). However, in practice, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model).
Year: 2010 PMID: 20053291 PMCID: PMC2821317 DOI: 10.1186/1471-2105-11-10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. An outline of MSOAR.
Figure 2. Genes .
Figure 3. Genes .
Figure 4. An outline of MSOAR 2.0.
Figure 5. Simulation results on the parameter set (*, 50, 75%, 50%) where the parameter .
Figure 6. Simulation results on the parameter set (10, *, 75%, 50%) where the parameter .
Figure 7. Simulation results on the parameter set (10, 50, *, 50%) where the parameter .
Figure 8. Simulation results on the parameter set (10, 50, 75%, *) where the parameter .
Contributions of the major steps in MSOAR 2.0.
| Pair of Species | Inparalogs in TAGs Identified by | Orthologs Assigned | Orthologs Assigned after |
|---|---|---|---|
| human vs mouse | 1,232/2,675 | 16,661 | 16,774 |
| human vs rat | 1,354/2,216 | 15,830 | 15,942 |
Comparison of the performance of five programs using gene symbol validation.
| Pair of Species | Program | Assignable | Total Assigned | True Positives | Unknowns | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| human vs mouse | InParanoid | 14,341 | 16,058 | 13,216 | 1,394 | 92.16% | 90.13% |
| human vs mouse | Ensembl | 14,341 | 20,670 | 13,619 | 2,850 | 94.97% | 76.43% |
| human vs mouse | MultiZ | 14,341 | 16,543 | 13,136 | 1,433 | 91.60% | 86.94% |
| human vs mouse | MSOAR | 14,341 | 16,769 | 13,528 | 1,554 | 94.33% | 88.91% |
| human vs mouse | MSOAR 2.0 | 14,341 | 16,774 | 13,625 | 1,551 | 95.01% | 89.50% |
| human vs rat | InParanoid | 12,688 | 15,197 | 11,750 | 1,529 | 92.61% | 85.97% |
| human vs rat | Ensembl | 12,688 | 18,814 | 12,004 | 2,490 | 94.61% | 73.54% |
| human vs rat | MultiZ | 12,688 | 16,102 | 11,600 | 1,570 | 91.42% | 79.82% |
| human vs rat | MSOAR | 12,688 | 15,883 | 11,970 | 1,723 | 94.34% | 84.53% |
| human vs rat | MSOAR 2.0 | 12,688 | 15,942 | 12,085 | 1,765 | 95.25% | 85.24% |
In order to assess the accuracy of InParanoid, we take the first pair of genes in each ortholog group (i.e., the main ortholog pair of the group) as a one-to-one ortholog pair. For the Ensembl ortholog database, we directly download all the ortholog pairs from Ensembl Biomart Browser, which includes one-to-one, one-to-many, and many-to-many orthology relationships. In order to extract the orthology information from MultiZ, we download the whole-genome multiple alignment for human, mouse and rat from UCSC genome browser, and map the annotated genes to the alignment based on their coordinates on each genome.
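The sensitivity and specificity columns of the comparison table above are numerically consistent with sensitivity = true positives / assignable pairs and specificity = true positives / (total assigned - unknowns). The sketch below checks that reading against the first InParanoid and MSOAR 2.0 rows. The formulas are inferred from the reported counts rather than quoted from the paper, and the function names are illustrative.

```python
def sensitivity(true_positives: int, assignable: int) -> float:
    """Fraction of assignable ortholog pairs that were recovered."""
    return true_positives / assignable


def specificity(true_positives: int, total_assigned: int, unknowns: int) -> float:
    """Fraction of validated assignments (total minus unknowns) that are correct."""
    return true_positives / (total_assigned - unknowns)


# Counts copied from the first InParanoid and MSOAR 2.0 rows of the table.
rows = {
    "InParanoid": {"assignable": 14341, "total": 16058, "tp": 13216, "unknown": 1394},
    "MSOAR 2.0": {"assignable": 14341, "total": 16774, "tp": 13625, "unknown": 1551},
}

for program, r in rows.items():
    sens = sensitivity(r["tp"], r["assignable"])
    spec = specificity(r["tp"], r["total"], r["unknown"])
    print(f"{program}: sensitivity {sens:.2%}, specificity {spec:.2%}")
```

Under this reading, pairs whose status is unknown are excluded from the specificity denominator, so a program is not penalized for assignments that gene symbols cannot confirm or refute.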
Differences between the ortholog pairs assigned by MSOAR 2.0 and those by the other programs.
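| Pair of Species | MSOAR 2.0 vs InParanoid: TPs in MSOAR 2.0 (a) | MSOAR 2.0 vs InParanoid: Not BBHs (b) | MSOAR 2.0 vs Ensembl: FPs in Ensembl (c) | MSOAR 2.0 vs Ensembl: Inparalogs in TAGs (d) | MSOAR 2.0 vs MSOAR: FPs in MSOAR (e) | MSOAR 2.0 vs MSOAR: Inparalogs in TAGs (f) |
|---|---|---|---|---|---|---|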
| human vs mouse | 487 | 408 | 2,997 | 2,664 | 330 | 312 |
| human vs rat | 429 | 400 | 2,681 | 2,366 | 311 | 299 |
(a) This column lists the number of TPs found by MSOAR 2.0 but missed by InParanoid. (b) This column lists the number of TPs in the previous column that are not BBHs. (c) This column lists the number of FPs found by Ensembl but not by MSOAR 2.0. (d) This column lists the number of FPs in the previous column that are inparalogs occurring in TAGs. (e) This column lists the number of FPs found by MSOAR but not by MSOAR 2.0. (f) This column lists the number of FPs in the previous column that are inparalogs occurring in TAGs.
Figure 9. A real example of non-BBH true one-to-one ortholog pairs in the human-mouse comparison caught by MSOAR 2.0 in the post-processing step. Four one-to-one ortholog pairs were assigned by MSOAR between two corresponding orthologous blocks on human chromosome 10 (7,244,255 bp-7,900,507 bp) and mouse chromosome 2 (9,977,663 bp-10,636,794 bp). The genes ITIH2 and Itih2 were not assigned orthology by MSOAR, since ITIH2 is not among the top hits of Itih2. However, because Itih2 is the best hit of ITIH2 and the genes are located in corresponding "gaps", MSOAR 2.0 outputs them as an additional one-to-one ortholog pair.
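The gap-based rescue described in the Figure 9 caption can be illustrated with a minimal sketch. It assumes that the already assigned ortholog pairs (the anchors) appear in the same order on both blocks, that a "gap" is the interval between consecutive anchors, and that a pair is rescued when both genes fall in the same-numbered gap and one gene is the best hit of the other. The function names, data layout, and toy coordinates are hypothetical; this is not the MSOAR 2.0 implementation.

```python
from bisect import bisect_right


def gap_index(position, anchor_positions):
    """Number of anchors at or before `position`, i.e. which gap it falls in."""
    return bisect_right(anchor_positions, position)


def rescue_in_gaps(anchors, genes_a, genes_b, best_hit):
    """Return unassigned pairs lying in corresponding gaps with a one-way best hit.

    anchors  : list of (pos_a, pos_b) for assigned pairs, increasing in both genomes
    genes_a  : {gene name: position} for unassigned genes in genome A
    genes_b  : {gene name: position} for unassigned genes in genome B
    best_hit : {gene name: best-hit gene name in the other genome}
    """
    a_anchor = [a for a, _ in anchors]
    b_anchor = [b for _, b in anchors]
    rescued = []
    for gene_a, pos_a in genes_a.items():
        for gene_b, pos_b in genes_b.items():
            same_gap = gap_index(pos_a, a_anchor) == gap_index(pos_b, b_anchor)
            one_way_hit = (best_hit.get(gene_a) == gene_b
                           or best_hit.get(gene_b) == gene_a)
            if same_gap and one_way_hit:
                rescued.append((gene_a, gene_b))
    return rescued


# Toy coordinates (hypothetical, not the real positions from the figure).
anchors = [(100, 1000), (200, 1100), (400, 1300), (500, 1400)]
human_unassigned = {"ITIH2": 300}
mouse_unassigned = {"Itih2": 1200}
best_hit = {"ITIH2": "Itih2"}  # one-directional: Itih2 is the best hit of ITIH2

print(rescue_in_gaps(anchors, human_unassigned, mouse_unassigned, best_hit))
```

In this toy input both genes sit between the second and third anchors, so the pair is rescued even though the best-hit relation holds in one direction only, mirroring the ITIH2/Itih2 case.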
Figure 10. Comparison of ortholog assignments made by Ensembl, MSOAR and MSOAR 2.0 for the two segments of human chromosome 2 (178,123,219 bp-178,685,428 bp) and mouse chromosome 2 (75,773,906 bp-76,192,000 bp). Among the 7 pairs of genes illustrated in the figure, only (TTC30B, Ttc30b) and (PDE11A, Pde11a) are known one-to-one ortholog pairs according to gene symbols, as indicated by solid lines. Since the Ensembl ortholog database includes many-to-many relationships, it outputs 7 ortholog pairs, i.e., (TTC30B, Ttc30b), (TTC30B, Ttc30a2), (TTC30B, Ttc30a1), (TTC30A, Ttc30b), (TTC30A, Ttc30a2), (TTC30A, Ttc30a1), and (PDE11A, Pde11a), introducing 5 false ortholog pairs, as indicated by dashed lines. MSOAR assigns three one-to-one ortholog pairs as indicated by the arrows in the figure, i.e., (TTC30B, Ttc30b), (TTC30A, Ttc30a1), and (PDE11A, Pde11a), including one false one-to-one ortholog pair. MSOAR 2.0, however, identifies TTC30A as an inparalog of TTC30B on the human genome and Ttc30a2 and Ttc30a1 as inparalogs of Ttc30b on the mouse genome during the phylogenetic analysis of TAGs, and removes them before invoking MSOAR. Thus, MSOAR 2.0 only outputs two one-to-one ortholog pairs, i.e., (TTC30B, Ttc30b) and (PDE11A, Pde11a), both of which are true positives.
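The counts in the Figure 10 caption can be reproduced with a small gene symbol check. The sketch assumes that a pair counts as a true positive when the human and mouse symbols are identical ignoring case, and as a false positive otherwise; the "Unknowns" category used in the full validation is not modeled here, and the function names are illustrative.

```python
def is_symbol_match(human_gene: str, mouse_gene: str) -> bool:
    """Assumed validation rule: symbols agree when compared case-insensitively."""
    return human_gene.lower() == mouse_gene.lower()


def count_tp_fp(pairs):
    """Return (true positives, false positives) for a list of (human, mouse) pairs."""
    tp = sum(is_symbol_match(h, m) for h, m in pairs)
    return tp, len(pairs) - tp


# Ortholog pairs listed in the Figure 10 caption for each program.
ensembl = [
    ("TTC30B", "Ttc30b"), ("TTC30B", "Ttc30a2"), ("TTC30B", "Ttc30a1"),
    ("TTC30A", "Ttc30b"), ("TTC30A", "Ttc30a2"), ("TTC30A", "Ttc30a1"),
    ("PDE11A", "Pde11a"),
]
msoar = [("TTC30B", "Ttc30b"), ("TTC30A", "Ttc30a1"), ("PDE11A", "Pde11a")]
msoar2 = [("TTC30B", "Ttc30b"), ("PDE11A", "Pde11a")]

for name, pairs in [("Ensembl", ensembl), ("MSOAR", msoar), ("MSOAR 2.0", msoar2)]:
    tp, fp = count_tp_fp(pairs)
    print(f"{name}: {tp} true positives, {fp} false positives")
```

With these inputs the check gives 2 true and 5 false pairs for Ensembl, 2 true and 1 false for MSOAR, and 2 true and 0 false for MSOAR 2.0, matching the caption.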
Figure 11. Orthologs assigned between human and chimpanzee.
Figure 12. Orthologs assigned between human and macaque.
Support of the MSOAR 2.0 one-to-one ortholog pairs by the other two programs.
| Support | human vs | human vs | human vs | human |
|---|---|---|---|---|
| By both programs | 94.72% | 90.69% | 89.93% | 87.71% |
| By at least one program | 98.97% | 97.15% | 96.98% | 96.48% |
Inparalogs found in human and the other species by MSOAR 2.0.
| Inparalogs found by MSOAR 2.0 | human vs | human vs | human vs | human |
|---|---|---|---|---|
| Inparalogs in human | 3,161 | 4,103 | 4,390 | 5,222 |
| Inparalogs in the other species | 569 | 3,962 | 6,454 | 6,548 |