Abstract
Genomes and genes diversify during evolution; however, it is unclear to what extent individual genes still retain the relationships among species. Model species for molecular phylogenetic studies include yeasts and viruses whose genomes have been sequenced, as well as plants for which fossil-supported true phylogenetic trees are available. In this study, we generated single-gene trees for seven yeast species and for nine baculovirus species using all the orthologous genes among the species compared. Homologous genes among seven known plants were used to validate the findings. Four algorithms were used: maximum parsimony (MP), minimum evolution (ME), maximum likelihood (ML), and neighbor-joining (NJ). Trees were reconstructed before and after weighting the DNA and protein sequence lengths among genes. A single gene rarely generates the "true tree" under all four algorithms. However, the most frequent gene tree, termed the "maximum gene-support tree" (MGS tree, or WMGS tree for the weighted one), in yeasts, baculoviruses, or plants was consistently found to be the "true tree" among the species. The results provide insights into the overall degree of divergence of orthologous genes of the genomes analyzed and suggest the following: 1) the true tree relationship among the species studied is still maintained by the largest group of orthologous genes; 2) there are usually more orthologous genes with higher similarities between genetically closer species than between genetically more distant ones; and 3) the maximum gene-support tree reflects the phylogenetic relationship among the species compared.
Keywords: gene evolution; genome; molecular phylogeny; true tree
Year: 2008 PMID: 19204816 PMCID: PMC2614190 DOI: 10.4137/ebo.s652
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Maximum gene-support (MGS), weighted maximum gene-support (WMGS), the second highest gene-support (2nd HGS), weighted second highest gene-support (2nd WHGS), number of unique trees (NUT), and threshold gene number (TGN) required to overcome incongruence based on a data set of 106 genes from seven yeast species*.
| Method | MGS | WMGS | 2nd HGS | 2nd WHGS | NUT | TGN |
|---|---|---|---|---|---|---|
| MP | 37(35%) | 42(40%) | 10(9%) | 13(12%) | 31 | 15 |
| ME | 33(31%) | 32(30%) | 16(15%) | 17(16%) | 23 | 26 |
| NJ | 25(24%) | 25(23%) | 23(22%) | 23(22%) | 20 | 106 |
| ML | 28(26%) | 34(32%) | 9(9%) | 12(11%) | 38 | 25 |
| MP | 14(13%) | 18(17%) | 11(10%) | 9(9%) | 51 | 55 |
| ME | 14(14%) | 14(14%) | 10(9%) | 10(9%) | 40 | 50 |
| NJ | 17(16%) | 17(16%) | 8(8%) | 10(9%) | 40 | 50 |
Gene-support: the number of genes that infer a given unique tree. Gene-support percentage (in parentheses): the gene-support divided by the total number of genes. Number of unique trees (NUT): the number of unique trees inferred from the 106 genes. Threshold gene number (TGN): the minimum number of genes required to overcome incongruence.
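The definitions above can be illustrated with a minimal Python sketch that tallies gene support from a list of per-gene tree topologies. It assumes each gene tree has already been reduced to a canonical topology label; the labels `T1`, `T2`, `T3` and the function name `gene_support_summary` are hypothetical and not taken from the study. Weighting by sequence length (WMGS) and the threshold gene number are not modelled here.

```python
from collections import Counter


def gene_support_summary(topologies):
    """Summarise gene support from one topology label per gene.

    `topologies` is a list with one entry per gene; genes that infer
    the same tree must carry the same label (an assumption of this sketch).
    """
    counts = Counter(topologies)
    total = len(topologies)
    ranked = counts.most_common()  # sorted by descending gene support
    mgs_tree, mgs = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    return {
        "mgs_tree": mgs_tree,                      # most frequent gene tree
        "mgs": mgs,                                # maximum gene-support
        "mgs_pct": 100.0 * mgs / total,            # gene-support percentage
        "second_hgs": second,                      # second highest gene-support
        "second_hgs_pct": 100.0 * second / total,
        "nut": len(counts),                        # number of unique trees
    }


# Toy input: 7 hypothetical genes supporting 3 distinct topologies.
toy = ["T1"] * 4 + ["T2"] * 2 + ["T3"]
summary = gene_support_summary(toy)
print(summary)
```

Applied to the first MP row of the table, the same arithmetic gives 37/106, which rounds to the reported 35%.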
Figure 1. The rooted tree with the maximum gene-support inferred from 106 genes of seven yeast species. The outgroup in the analysis was C. albicans. The single gene trees were recovered using bootstrap consensus with a 50% majority rule.
Figure 2. Distribution of sequence length of 106 genes.
Figure 3. The correlation between gene lengths and the symmetric differences from the gene trees to the MGS tree.
Figure 4. Relationship between gene-support percentage of unique trees and symmetric distances of the trees from the maximum gene-support tree. The symmetric difference is the number of steps required to convert between two trees. MP trees inferred from nucleotides were used here (data for the other methods are not shown).
*Indicates statistical significance at the p = 0.05 level. Top panel: full dataset; bottom panel: after removing the last point.
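As a rough illustration of the symmetric difference used in these figures, the sketch below counts the clades that appear in one rooted tree but not in the other. This clade-based count is one common way to compute the measure and is assumed here; the source does not state which implementation was used. Trees are written as nested tuples, and the leaf names `A` to `E` are hypothetical placeholders, not the species analyzed.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Each clade is a frozenset of leaf names. Single leaves and the
    full leaf set (the root) are excluded because every tree shares them.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)
    return found


def symmetric_difference(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two toy topologies that differ only in how A, B and C are grouped.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
distance = symmetric_difference(t1, t2)
print(distance)
```

A distance of zero means the two topologies are identical under this representation, so a gene tree at distance zero from the MGS tree is the MGS tree itself.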
The number of sampled genes, the maximum gene-supports (MGS), the second highest gene-supports (2nd_HGS), the differences between MGS and 2nd_HGS (DMGS), the maximum gene-support percentages (MGSP), the second highest gene-support percentages (2nd_HGSP), the differences of MGSP and 2nd_HGSP (DMGSP), and precisions*.
| Genes | MGS | 2nd_HGS | DMGS | MGSP % | 2nd_HGSP % | DMGSP % | Precision % |
|---|---|---|---|---|---|---|---|
| 5 | 1.6(0.5) | 1.0(0) | 0.6(0.5) | 32.0(11.0) | 20.0(0) | 12.0(11.0) | 60 |
| 10 | 3.2(1.4) | 1.4(0.5) | 1.8(1.3) | 32.0(14.0) | 14.0(5.2) | 18.0(13.2) | 80 |
| 15 | 4.0(1.6) | 2.4(0.7) | 1.6(2.0) | 26.7(10.4) | 16.0(4.7) | 10.7(13.0) | 60 |
| 20 | 5.3(2.3) | 2.5(0.5) | 2.8(2.6) | 26.5(11.6) | 12.5(2.6) | 14.0(13.0) | 90 |
| 24 | 6.6(1.8) | 2.7(0.7) | 3.9(2.2) | 27.5(7.7) | 11.3(2.8) | 16.3(9.1) | 90 |
| 25 | 6.5(1.5) | 2.9(0.6) | 3.6(1.8) | 26.0(6.0) | 11.6(2.3) | 14.4(7.4) | 100 |
| 30 | 8.3(1.6) | 3.1(0.7) | 5.2(2.1) | 27.7(5.2) | 10.3(2.5) | 17.3(7.2) | 100 |
| 106 | 28 | 9 | 19 | 26.4 | 8.5 | 17.9 | 100 |
| r | 0.91 | 0.86 | 0.78 | −0.11 | −0.44 | 0.07 | 0.55 |
| df | 64 | 64 | 64 | 64 | 64 | 64 | 6 |
ML trees inferred from nucleotides (data not shown for other methods). Sample replicates: 10. Precision: the percentage of the number of congruent trees divided by the total number of trees.
Values in parentheses are standard deviations.
Significant correlation at the P ≤ 0.001 level. r: correlation coefficient. df: degrees of freedom.
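The subsampling behind the precision column can be sketched as follows. This is a minimal illustration under stated assumptions: a replicate is treated as "congruent" when the most frequent tree in the sampled genes equals the most frequent tree of the full gene set, genes are sampled without replacement, and ties are broken by first occurrence. The source does not spell out these details, and the topology labels and function names are hypothetical.

```python
import random
from collections import Counter


def mgs_tree(topologies):
    """Most frequent topology label; ties resolve to the first one seen."""
    return Counter(topologies).most_common(1)[0][0]


def subsample_precision(topologies, sample_size, replicates=10, seed=0):
    """Percentage of replicates whose MGS tree matches the full-data MGS tree.

    Assumption: genes are drawn without replacement in each replicate.
    """
    rng = random.Random(seed)
    reference = mgs_tree(topologies)
    hits = 0
    for _ in range(replicates):
        sample = rng.sample(topologies, sample_size)
        if mgs_tree(sample) == reference:
            hits += 1
    return 100.0 * hits / replicates


# Toy gene set: 28 genes support T1, 9 support T2, the rest are unique trees.
toy_genes = ["T1"] * 28 + ["T2"] * 9 + [f"U{i}" for i in range(69)]
for size in (5, 10, 30):
    print(size, subsample_precision(toy_genes, size))
```

With few sampled genes the most frequent tree in a replicate can easily differ from the full-data tree, which is consistent with the lower precision values in the smaller-sample rows of the table.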
Figure 5. The rooted maximum gene-support tree based on 36 genes from seven plant species. G. biloba was specified as the outgroup.
Figure 6. Phylogenetic analyses of the concatenated alignments of 106 genes from seven yeast species. Numbers above branches are bootstrap values (ME on amino acids/NJ on amino acids/NJ on nucleotides).