Miklos Csuros, Igor B Rogozin, Eugene V Koonin.
Abstract
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities differ widely between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6-7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. The MCMC and ML inferences agree closely, whereas Dollo parsimony introduces a noticeable bias in the estimates, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches, including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.
Year: 2011 PMID: 21935348 PMCID: PMC3174169 DOI: 10.1371/journal.pcbi.1002150
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Reconstruction of intron gains and losses in the evolution of eukaryotes and intron density in ancestral eukaryote forms.
Branch widths are proportional to intron density, which is shown next to terminal taxa and some deep ancestors, in units of intron count per 1 kbp of coding sequence. Human (Hsap) is marked by a blue dot. Edges are colored by the relative amount of intron gain and loss, as indicated in the inset scatter plot where each point corresponds to an edge in the tree. Gain% is the percentage of introns gained in the given lineage from the parent node; loss% is the percentage of the parent's introns lost within the same lineage. Species names and abbreviations: Aureococcus anophagefferens (Aano), Aedes aegypti (Aaeg), Agaricus bisporus (Abis), Anopheles gambiae (Agam), Allomyces macrogynus ATCC 38327 (Amac), Apis mellifera (Amel), Aspergillus nidulans FGSC A4 (Anid), Acyrthosiphon pisum (Apis), Arabidopsis thaliana (Atha), Babesia bovis (Bbov), Batrachochytrium dendrobatidis (Bden), Branchiostoma floridae (Bflo), Botryotinia fuckeliana B05.10 (Bfuc), Brugia malayi (Bmal), Bombyx mori (Bmor), Coccomyxa sp. C-169 (C169), Chlorella sp. NC64a (C64a), Caenorhabditis briggsae (Cbri), Caenorhabditis elegans (Cele), Coprinopsis cinerea okayama7#130 (Ccin), Cochliobolus heterostrophus C5 (Chet), Coccidioides immitis RS (Cimm), Ciona intestinalis (Cint), Cryptococcus neoformans var. neoformans (Cneo), Chlamydomonas reinhardtii (Crei), Capitella teleta (Ctel), Capsaspora owczarzaki ATCC 30864 (Cowc), Dictyostelium discoideum (Ddis), Dictyostelium purpureum (Dpur), Drosophila melanogaster (Dmel), Drosophila mojavensis (Dmoj), Daphnia pulex (Dpul), Danio rerio (Drer), Entamoeba dispar (Edis), Entamoeba histolytica (Ehis), Emiliania huxleyi (Ehux), Fragilariopsis cylindrus (Fcyl), Phanerochaete chrysosporium (Fchr), Phaeodactylum tricornutum (Ftri), Gallus gallus (Ggal), Gibberella zeae PH-1 (Gzea), Hydra magnipapillata (Hmag), Helobdella robusta (Hrob), Homo sapiens (Hsap), Ixodes scapularis (Isca), Laccaria bicolor (Lbic), Lottia gigantea (Lgig), Micromonas sp. RCC299 (M299), Monosiga brevicollis (Mbre), Mucor circinelloides (Mcir), Mycosphaerella fijiensis (Mfij), Mycosphaerella graminicola (Mgra), Magnaporthe grisea 70-15 (Mgri), Melampsora laricis-populina (Mlar), Micromonas pusilla CCMP1545 (Mpus), Neurospora crassa OR74A (Ncra), Nematostella vectensis (Nvec), Nasonia vitripennis (Nvit), Ostreococcus sp. RCC809 (O809), Ostreococcus lucimarinus (Oluc), Oryza sativa japonica (Osat), Ostreococcus tauri (Otau), Phytophthora capsici (Pcap), Plasmodium falciparum (Pfal), Puccinia graminis (Pgra), Pediculus humanus (Phum), Phaeosphaeria nodorum SN15 (Pnod), Physcomitrella patens subsp. patens (Ppat), Phytophthora ramorum (Pram), Pyrenophora tritici-repentis Pt-1C-BFP (Prep), Proterospongia sp. ATCC 50818 (Prsp), Phytophthora sojae (Psoj), Paramecium tetraurelia (Ptet), Plasmodium vivax (Pviv), Plasmodium yoelii yoelii (Pyoe), Rhizopus oryzae (Rory), Sorghum bicolor (Sbic), Saccharomyces cerevisiae (Scer), Schizosaccharomyces japonicus yFS175 (Sjap), Schistosoma mansoni (Sman), Selaginella moellendorffii (Smoe), Schizosaccharomyces pombe (Spom), Spizellomyces punctatus DAOM BR1173 (Spun), Strongylocentrotus purpuratus (Spur), Sporobolomyces roseus (Sros), Sclerotinia sclerotiorum 1980 UF-70 (Sscl), Trichoplax adhaerens (Tadh), Theileria annulata (Tann), Tribolium castaneum (Tcas), Toxoplasma gondii (Tgon), Taeniopygia guttata (Tgut), Theileria parva (Tpar), Thalassiosira pseudonana (Tpse), Tetrahymena thermophila (Tthe), Ustilago maydis 521 (Umay), Uncinocarpus reesii 1704 (Uree), Volvox carteri (Vcar), Vitis vinifera (Vvin).
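The gain% and loss% values defined in the Figure 1 legend can be illustrated with a small Python sketch. The legend does not state the denominator for gain%, so this example assumes it is the intron count at the child node, while loss% uses the parent's count as the legend says. The function name and its arguments are hypothetical and are not taken from the authors' software.

```python
def edge_gain_loss_percent(parent_introns, gained, lost):
    """Return (gain%, loss%) for one tree edge.

    Assumptions (not confirmed by the source):
      - gain% = gained introns as a share of the child's introns
      - loss% = lost introns as a share of the parent's introns
    Counts may be fractional, e.g. posterior expectations.
    """
    if min(parent_introns, gained, lost) < 0:
        raise ValueError("counts must be non-negative")
    if lost > parent_introns:
        raise ValueError("cannot lose more introns than the parent has")

    # Introns at the child node: what survived from the parent plus new gains.
    child_introns = parent_introns - lost + gained

    gain_pct = 100.0 * gained / child_introns if child_introns else 0.0
    loss_pct = 100.0 * lost / parent_introns if parent_introns else 0.0
    return gain_pct, loss_pct


# Illustrative numbers only, not data from the paper.
gain_pct, loss_pct = edge_gain_loss_percent(parent_introns=200, gained=50, lost=100)
print(f"gain% = {gain_pct:.1f}, loss% = {loss_pct:.1f}")
```

Under these assumptions, an edge with high loss% and low gain% is loss-dominated, which is the pattern the abstract reports for most of the tree.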
Figure 2. Inferred ancestral intron densities and confidence intervals.
The plots for 9 key ancestral forms show the posterior distributions of the ancestral intron density inferred from the sampling chains. On each plot, the red dot marks the median and the horizontal red line spans the 95% (±47.5%) confidence interval around it, estimated from 50,000 sampled MCMC steps.
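As a rough illustration of how a median and a central 95% interval (47.5% of the samples on each side of the median) can be read off a set of MCMC samples, here is a minimal Python sketch. The function name `central_interval` and the linear-interpolation quantile rule are assumptions made for this example; the source does not say how the authors computed their intervals.

```python
from statistics import median


def central_interval(samples, mass=0.95):
    """Return (lower, median, upper) for the central interval holding
    `mass` of the samples, i.e. mass/2 on each side of the median.

    Hypothetical helper for illustration; quantiles use linear
    interpolation between neighbouring order statistics.
    """
    if not samples:
        raise ValueError("need at least one sample")
    xs = sorted(samples)
    tail = (1.0 - mass) / 2.0  # probability left out in each tail

    def quantile(q):
        pos = q * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return quantile(tail), median(xs), quantile(1.0 - tail)


# Toy samples standing in for sampled intron densities (not real data).
toy_samples = list(range(101))
lower, mid, upper = central_interval(toy_samples)
print(lower, mid, upper)
```

With real chains, `samples` would hold one ancestral intron density per sampled MCMC step for a given node.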
Figure 3. Inferred intron site histories in prohibitin orthologs (KOG3083).
The tree from Figure 1 is used as the template for the reconstruction. Vertical bars mark intron sites and are placed along the X axis in proportion to their position in the underlying alignment. The height of green bars is proportional to the probability of intron presence; the height of red bars is proportional to the probability of intron gain in the lineage leading to the node.