Abstract
In this work, 21 completely sequenced eukaryotic genomes were analyzed using an intragene comparison approach. We found that all of these genomes show a significant 5'-biased distribution of introns in protein-coding genes. This finding differs from previous studies based on the intergene method, in which introns were biased towards the 5' end of genes only in intron-poor genomes and were evenly distributed in intron-rich genomes. In addition, we analyzed the intron distribution patterns of a well-compiled set of human housekeeping genes and of their respective orthologs in the other genomes, identified by a bidirectional best BLAST hit method. For each genome, the 5'-biased trend of intron positions in the housekeeping genes is much more skewed than that of all genes in the same genome, and few, if any, of the housekeeping genes examined have an extremely 3'-biased distribution, in which all introns of a gene are located only in the 3' portion of the gene. The most parsimonious explanation for our findings may be the model in which intron loss is caused by homologous recombination between the genomic copy of a gene and a reverse transcriptase product of a spliced mRNA.
Year: 2005 PMID: 16314314 PMCID: PMC1292992 DOI: 10.1093/nar/gki970
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The intron characteristics of interest in the 21 completely sequenced genomes
| Species | Number of CDSs | Number of introns | Introns per CDS | Number of intronless CDSs | Percentage of intronless CDSs |
|---|---|---|---|---|---|
| Canis familiaris | 16 827 | 198 889 | 13 | 1518 | 9 |
| Gallus gallus | 14 250 | 152 758 | 11.4 | 799 | 5.6 |
| Homo sapiens | 20 552 | 154 358 | 8.8 | 2977 | 14.5 |
| Rattus norvegicus | 21 053 | 162 464 | 8.7 | 2333 | 11.1 |
| Pan troglodytes | 21 673 | 165 510 | 8.4 | 1988 | 9.2 |
| Mus musculus | 23 913 | 154 360 | 7.7 | 3865 | 16.2 |
| Apis mellifera | 5798 | 38 069 | 6.8 | 167 | 2.9 |
| Caenorhabditis elegans | 11 754 | 67 455 | 5.9 | 313 | 2.7 |
| Arabidopsis thaliana | 26 258 | 111 649 | 5.4 | 5638 | 21.5 |
| Drosophila melanogaster | 13 181 | 37 216 | 3.6 | 2850 | 21.6 |
| Anopheles gambiae | 1953 | 4021 | 2.6 | 399 | 20.4 |
| Schizosaccharomyces pombe | 1688 | 1662 | 2.2 | 921 | 54.6 |
| Plasmodium falciparum | 567 | 621 | 2 | 258 | 45.5 |
| Yarrowia lipolytica | 6058 | 703 | 1.1 | 5424 | 89.5 |
| Debaryomyces hansenii | 6697 | 354 | 1.1 | 6362 | 95 |
| Encephalitozoon cuniculi | 1468 | 15 | 1.1 | 1454 | 99 |
| Saccharomyces cerevisiae | 4491 | 260 | 1 | 4238 | 94.4 |
| Eremothecium gossypii | 4708 | 220 | 1 | 4492 | 95.4 |
| Kluyveromyces lactis | 5217 | 128 | 1 | 5091 | 97.6 |
| Candida glabrata | 5255 | 84 | 1 | 5173 | 98.4 |
| Guillardia theta | 214 | 15 | 1 | 199 | 93 |
All genes annotated either as hypothetical or with incomplete exon positions were excluded. See the Materials and Methods section for details.
(a) The number of introns per CDS was calculated as the total number of introns divided by the number of intron-containing CDSs (the total number of CDSs minus the number of intronless CDSs) within the genome studied. This is different from the definition by Mourier and Jeffares (3).
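The calculation behind the "Introns per CDS" column can be illustrated with a minimal Python sketch. The function name `introns_per_cds` is hypothetical and not from the paper; the inputs are taken directly from the Canis familiaris and Schizosaccharomyces pombe rows of the table, and the denominator is the number of intron-containing CDSs (total CDSs minus intronless CDSs).

```python
def introns_per_cds(total_introns: int, total_cds: int, intronless_cds: int) -> float:
    """Mean number of introns per intron-containing CDS (hypothetical helper)."""
    intron_containing = total_cds - intronless_cds
    if intron_containing <= 0:
        raise ValueError("genome has no intron-containing CDSs")
    return total_introns / intron_containing


# Counts copied from the table rows for C. familiaris and S. pombe.
canis = introns_per_cds(198_889, 16_827, 1_518)
pombe = introns_per_cds(1_662, 1_688, 921)
print(round(canis, 1), round(pombe, 1))
```

Rounded to one decimal place, both results agree with the tabulated values (13 and 2.2).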
The distributions of intron locations for housekeeping genes (HKGs) in each genome
| Species | Total HKGs | 5′-biased HKGs | 3′-biased HKGs | Equally distributed HKGs | Ratio of 5′/3′-bias | Extremely 3′-biased |
|---|---|---|---|---|---|---|
| C.familiaris | 61 | 29 | 22 | 10 | 1.3 | 0 |
| G.gallus | 55 | 23 | 17 | 15 | 1.4 | 0 |
| H.sapiens | 86 | 45 | 16 | 25 | 2.8 | 0 |
| R.norvegicus | 66 | 35 | 12 | 19 | 2.9 | 0 |
| P.troglodytes | 59 | 22 | 20 | 17 | 1.1 | 0 |
| M.musculus | 70 | 34 | 15 | 21 | 2.3 | 1 |
| A.mellifera | 34 | 15 | 10 | 9 | 1.5 | 3 |
| C.elegans | 48 | 19 | 10 | 19 | 1.9 | 5 |
| A.thaliana | 38 | 13 | 12 | 13 | 1.1 | 2 |
| D.melanogaster | 46 | 24 | 9 | 13 | 2.7 | 5 |
| A.gambiae | 4 | 2 | 0 | 2 | – | 0 |
| S.pombe | 36 | 16 | 2 | 18 | 8 | 1 |
| P.falciparum | 11 | 7 | 0 | 4 | – | 0 |
| Y.lipolytica | 36 | 20 | 0 | 16 | – | 0 |
| D.hansenii | 34 | 8 | 0 | 26 | – | 0 |
| E.cuniculi | 16 | 1 | 0 | 15 | – | 0 |
| S.cerevisiae | 34 | 7 | 0 | 27 | – | 0 |
| E.gossypii | 33 | 7 | 0 | 26 | – | 0 |
| K.lactis | 35 | 6 | 0 | 29 | – | 0 |
| C.glabrata | 33 | 6 | 0 | 27 | – | 0 |
| G.theta | 8 | 1 | 0 | 7 | – | 0 |
(a) The total number of housekeeping genes in each genome except Homo sapiens was determined using the bidirectional best BLASTP hit approach, which is explained in detail in the Materials and Methods section.
(b) Because our knowledge of the roles of genes is still accumulating and evolving, it may be difficult to identify a true set of housekeeping genes for each specific genome.
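The bidirectional best hit idea mentioned in the footnote can be sketched as follows. This is only an illustration of the general technique under stated assumptions: the score dictionaries, gene identifiers, and function names are hypothetical, and the actual BLASTP parameters and thresholds used by the authors are described in their Materials and Methods section, not here.

```python
from __future__ import annotations


def best_hits(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def bidirectional_best_hits(
    a_vs_b: dict[str, dict[str, float]],
    b_vs_a: dict[str, dict[str, float]],
) -> list[tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]


# Hypothetical toy bit scores: human housekeeping genes vs. another proteome.
human_vs_other = {"hkg1": {"x1": 250.0, "x2": 90.0}, "hkg2": {"x2": 120.0}}
other_vs_human = {"x1": {"hkg1": 245.0}, "x2": {"hkg1": 95.0, "hkg2": 80.0}}

pairs = bidirectional_best_hits(human_vs_other, other_vs_human)
print(pairs)
```

In this toy input only `hkg1` and `x1` choose each other in both directions, so `hkg2` would have no ortholog assigned, which is consistent with the Total HKGs column being smaller for every genome other than Homo sapiens.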
The observed and expected distributions of intron positions and results of χ2-test for biased intron distribution for each genome
| Species | Total CDSs | Observed equally distributed | Expected 5′-biased | Observed 5′-biased | Expected 3′-biased | Observed 3′-biased | χ2-value (a) | χ2-value (b) | Ratio of 5′- to 3′-biased |
|---|---|---|---|---|---|---|---|---|---|
| C.familiaris | 16 827 | 2501 | 6404 | 7214 | 6404 | 5594 | 204.9 | 118.9 | 1.3 |
| G.gallus | 14 250 | 2558 | 5446 | 5936 | 5446 | 4957 | 88 | 48.7 | 1.2 |
| H.sapiens | 20 552 | 3334 | 7120 | 8362 | 7120 | 5879 | 432.9 | 256.6 | 1.4 |
| R.norvegicus | 21 053 | 3822 | 7449 | 8954 | 7449 | 5944 | 608.1 | 385 | 1.5 |
| P.troglodytes | 21 673 | 3711 | 7987 | 8787 | 7987 | 7187 | 160.3 | 84.8 | 1.2 |
| M.musculus | 23 913 | 4035 | 8006 | 9535 | 8006 | 6478 | 583.6 | 364.7 | 1.5 |
| A.mellifera | 5798 | 1370 | 2130 | 2506 | 2130 | 1755 | 132.4 | 103.8 | 1.4 |
| C.elegans | 11 754 | 2769 | 4336 | 5676 | 4336 | 2996 | 828.2 | 712.1 | 1.9 |
| A.thaliana | 26 258 | 3799 | 8410 | 8749 | 8410 | 8072 | 27.2 | 0.1 | 1.1 |
| D.melanogaster | 13 181 | 1986 | 4172 | 5354 | 4172 | 2991 | 669.1 | 290 | 1.8 |
| A.gambiae | 1953 | 271 | 642 | 689 | 642 | 594 | 7 | 0.4 | 1.2 |
| S.pombe | 1688 | 101 | 333 | 528 | 333 | 138 | 228.4 | 137.9 | 3.8 |
| P.falciparum | 567 | 21 | 144 | 199 | 144 | 89 | 42 | 2.5 | 2.2 |
| Y.lipolytica | 6058 | 6 | 314 | 582 | 314 | 46 | 457.5 | 35.9 | 12.7 |
| D.hansenii | 6697 | 2 | 166 | 304 | 166 | 29 | 227.1 | 7.1 | 10.5 |
| E.cuniculi | 1468 | 1 | 6 | 13 | 6 | 0 | 13 | – | – |
| S.cerevisiae | 4491 | 2 | 126 | 239 | 126 | 12 | 205.3 | – | 19.9 |
| E.gossypii | 4708 | 1 | 108 | 200 | 108 | 15 | 159.2 | – | 13.3 |
| K.lactis | 5217 | 0 | 63 | 119 | 63 | 7 | 99.6 | – | 17 |
| C.glabrata | 5255 | 1 | 40 | 75 | 40 | 6 | 58.8 | – | 12.5 |
| G.theta | 214 | 0 | 8 | 15 | 8 | 0 | 15 | – | – |
(a) The χ2-test was performed with df = 1; the critical χ2-value is 10.83 at an α level of 0.001 and 6.63 at an α level of 0.01.
(b) This χ2-value was calculated after excluding all CDSs with only a single intron.
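The first χ2-value column can be reproduced from the observed counts with a short Python sketch. It assumes, as the table values suggest, that the expected 5′-biased and 3′-biased counts are each half of the observed biased CDSs (for example, (7214 + 5594) / 2 = 6404 for C. familiaris). The function name and the constant `CRITICAL_0_001` are illustrative, with the critical value taken from the table footnote.

```python
CRITICAL_0_001 = 10.83  # critical value for df = 1 at alpha = 0.001 (from the footnote)


def chi_square_bias(observed_5p: int, observed_3p: int) -> float:
    """Chi-square statistic against an even 5'/3' split of the biased CDSs."""
    expected = (observed_5p + observed_3p) / 2
    return (
        (observed_5p - expected) ** 2 + (observed_3p - expected) ** 2
    ) / expected


# Observed counts copied from the C. familiaris and A. gambiae rows.
canis_chi2 = chi_square_bias(7214, 5594)
gambiae_chi2 = chi_square_bias(689, 594)

print(round(canis_chi2, 1), canis_chi2 > CRITICAL_0_001)
print(round(gambiae_chi2, 1), gambiae_chi2 > CRITICAL_0_001)
```

The rounded statistics agree with the tabulated values (204.9 and 7). The A. gambiae value falls between the two critical values in the footnote, so it passes the 0.01 threshold but not the 0.001 threshold.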