Guanqun Shi, Meng-Chih Peng, Tao Jiang.
Abstract
The identification of orthologous genes shared by multiple genomes plays an important role in evolutionary studies and gene functional analyses. Building on MSOAR 2.0, a recently developed and accurate tool that uses genome rearrangement to assign orthologs between a pair of closely related genomes, we present a new system, MultiMSOAR 2.0, that identifies ortholog groups among multiple genomes. The system constructs gene families for all the genomes using sequence similarity search and clustering, runs MSOAR 2.0 on all pairs of genomes to obtain the pairwise orthology relationships, and partitions each gene family into disjoint sets of orthologous genes (called super ortholog groups, or SOGs) such that each SOG contains at most one gene from each genome. For each SOG, we label the leaves of the species tree with 1 or 0 to indicate whether or not the SOG contains a gene from the corresponding species. The resulting tree is called a tree of ortholog groups (TOG). We then label the internal nodes of each TOG based on the parsimony principle and some biological constraints. Ortholog groups are finally identified from each fully labeled TOG. In comparison with the popular tool MultiParanoid on simulated data, MultiMSOAR 2.0 shows significantly higher prediction accuracy. It also outperforms MultiParanoid, the Roundup multi-ortholog repository and the Ensembl ortholog database in real data experiments that use gene symbols as a validation tool. In addition to ortholog group identification, MultiMSOAR 2.0 also provides information about gene births, duplications and losses in evolution, which may be of independent biological interest. Our experiments on simulated data demonstrate that MultiMSOAR 2.0 infers these evolutionary events much more accurately than the well-known software tool Notung. The software MultiMSOAR 2.0 is available to the public for free.
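The leaf-labeling step described in the abstract can be illustrated with a minimal sketch. It assumes a species tree written as nested pairs and an SOG stored as a mapping from species to gene; the species names `A` to `D`, the gene names, and both function names are hypothetical. The internal nodes are scored here with plain Fitch parsimony on a binary presence/absence character. This is a generic stand-in and does not model the biological constraints that MultiMSOAR 2.0 applies.

```python
from typing import Dict, Set, Tuple, Union

# A species tree is either a leaf (species name) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def label_leaves(tree: Tree, sog: Dict[str, str]) -> Dict[str, int]:
    """Label each leaf 1 if the SOG has a gene from that species, else 0."""
    if isinstance(tree, str):
        return {tree: 1 if tree in sog else 0}
    left, right = tree
    return {**label_leaves(left, sog), **label_leaves(right, sog)}


def fitch(tree: Tree, leaf_labels: Dict[str, int]) -> Tuple[Set[int], int]:
    """Bottom-up Fitch pass: candidate states at the subtree root and
    the minimum number of 0/1 changes needed inside the subtree."""
    if isinstance(tree, str):
        return {leaf_labels[tree]}, 0
    left_states, left_cost = fitch(tree[0], leaf_labels)
    right_states, right_cost = fitch(tree[1], leaf_labels)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # Disjoint child states force one change on an edge below this node.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical four-species tree and an SOG with genes from A and B only.
species_tree: Tree = (("A", "B"), ("C", "D"))
sog = {"A": "a1", "B": "b1"}

leaf_labels = label_leaves(species_tree, sog)
root_states, min_changes = fitch(species_tree, leaf_labels)
print(leaf_labels, root_states, min_changes)
```

In this toy case the two clades disagree, so both 0 and 1 are equally parsimonious at the root and a single change explains the pattern. Resolving such ties is where additional constraints would come into play.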
Year: 2011 PMID: 21712981 PMCID: PMC3119667 DOI: 10.1371/journal.pone.0020892
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. An example of genome evolution and TOGs.
(a) The species tree for four species: . (b) An example of genome evolution for the four species in (a). (c) The TOG for genes in (b). (d) The TOG for genes in (b). (e) The TOG for gene in (b). Note that, in this paper, we will only be interested in ortholog groups containing at least two genes, and singleton ortholog groups will be ignored since they consist of only inparalogs from individual genomes. (f) The TOG for genes in (b). (g) An example of a TOG labeling. The labeling suggests two ortholog groups in the TOG, one consisting of two genes from the two leftmost species and the other two genes from the last three species.
Figure 2. An outline of MultiMSOAR 2.0.
Figure 3. Comparison of MultiMSOAR 2.0 and MultiParanoid on simulated data.
(a) Simulation results on the parameter set where the parameter is varied. (b) Simulation results on the parameter set where the parameter is varied. (c) Simulation results on the parameter set where the parameter is varied. (d) Simulation results on the parameter set where the parameter is varied.
Performance of the four programs on human, mouse and rat.
| Program | Assignable TPs | TPs | FPs | Unknowns | Total | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MultiMSOAR 2.0 | 15,598 | 14,051 | 2,399 | 2,919 | 19,369 | 90.08% | 85.42% |
| MultiParanoid | 15,598 | 13,697 | 2,609 | 2,328 | 18,634 | 87.81% | 84.00% |
| Ensembl | 15,598 | 13,474 | 2,495 | 2,091 | 18,060 | 86.38% | 84.38% |
| Roundup | 14,616 | 10,094 | 2,424 | 6,790 | 19,308 | 69.06% | 80.66% |
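The percentage columns can be checked against the counts with a short sketch. It assumes that sensitivity is TPs divided by assignable TPs and that specificity is TPs divided by TPs plus FPs. These definitions are inferred from the table and are not stated in this excerpt, and the function names are illustrative.

```python
def sensitivity(tps: int, assignable_tps: int) -> float:
    """Assumed definition: share of assignable true positives recovered (%)."""
    return 100.0 * tps / assignable_tps


def specificity(tps: int, fps: int) -> float:
    """Assumed definition: share of validated predictions that are correct (%)."""
    return 100.0 * tps / (tps + fps)


# (assignable TPs, TPs, FPs) copied from the table above.
rows = {
    "MultiMSOAR 2.0": (15598, 14051, 2399),
    "MultiParanoid": (15598, 13697, 2609),
    "Ensembl": (15598, 13474, 2495),
    "Roundup": (14616, 10094, 2424),
}

for name, (assignable, tps, fps) in rows.items():
    print(f"{name}: sensitivity {sensitivity(tps, assignable):.2f}%, "
          f"specificity {specificity(tps, fps):.2f}%")
```

Under these assumed definitions the MultiMSOAR 2.0, MultiParanoid and Ensembl rows reproduce the tabulated percentages to two decimal places, and Total equals TPs + FPs + Unknowns in every row. The Roundup specificity computes to about 80.64% rather than the tabulated 80.66%, so the definition used in the paper may differ slightly from this one.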
Ortholog groups shared by MultiMSOAR 2.0, MultiParanoid and Ensembl on the seven mammalian genomes.
| Programs | 7 genomes | 6 genomes | 5 genomes | 4 genomes | 3 genomes | 2 genomes |
| --- | --- | --- | --- | --- | --- | --- |
| MultiMSOAR 2.0 | 12,034 | 3,772 | 1,337 | 584 | 875 | 3,195 |
| MultiParanoid | 11,397 | 3,311 | 1,127 | 609 | 800 | 2,728 |
| Ensembl | 13,566 | 2,002 | 493 | 270 | 363 | 991 |
| MultiMSOAR 2.0 and MultiParanoid | 9,075 | 2,237 | 633 | 239 | 348 | 1,483 |
| MultiMSOAR 2.0 and Ensembl | 8,722 | 1,003 | 225 | 104 | 131 | 524 |
| MultiParanoid and Ensembl | 8,438 | 983 | 237 | 117 | 143 | 587 |
| All three programs | 7,763 | 872 | 202 | 92 | 119 | 505 |
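One way to tabulate shared groups like those above is sketched below. It assumes that a group counts as shared only when two programs report exactly the same set of genes; the excerpt does not say how the paper matches groups. The gene identifiers, species assignments and function name are hypothetical toy data, not results from the paper.

```python
from collections import Counter
from typing import FrozenSet, Set, Tuple

# An ortholog group is a set of (species, gene) pairs.
Group = FrozenSet[Tuple[str, str]]


def shared_by_genome_count(*predictions: Set[Group]) -> Counter:
    """Count groups reported identically by every program,
    keyed by the number of distinct genomes in the group."""
    shared = set.intersection(*predictions)
    return Counter(len({species for species, _ in group}) for group in shared)


# Toy predictions from two hypothetical programs.
program_a = {
    frozenset({("human", "G1"), ("mouse", "g1"), ("rat", "r1")}),
    frozenset({("human", "G2"), ("mouse", "g2")}),
}
program_b = {
    frozenset({("human", "G1"), ("mouse", "g1"), ("rat", "r1")}),
    frozenset({("human", "G2"), ("rat", "r2")}),
}

shared_counts = shared_by_genome_count(program_a, program_b)
print(dict(shared_counts))
```

Exact matching is the strictest choice. A looser criterion, such as overlap above a threshold, would yield larger shared counts.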