Stephen P Moss, Domino A Joyce, Stuart Humphries, Katherine J Tindall, David H Lunt.
Abstract
We have developed a bioinformatics pipeline for the comparative evolutionary analysis of Ensembl genomes and have used it to analyze the introns of the five available teleost fish genomes. We show that the pipeline is a powerful tool for revealing variation between genomes that simple summary statistics may overlook. We find that the zebrafish, Danio rerio, has an unusual distribution of intron sizes, with a greater number of larger introns in general and a notable peak in the frequency of introns of approximately 500 to 2,000 bp, in contrast to the monotonically decreasing frequency distributions of the other fish. We determine that 47% of D. rerio intron sequence is composed of repetitive sequences, whereas the remainder, over 331 Mb, is not. Because repetitive elements may be the origin of the majority of all noncoding DNA, it is likely that the remaining D. rerio intronic sequence has an ancient repetitive origin and has since accumulated so many mutations that it can no longer be recognized as such. Studying such an ancient expansion of repeats in the Danio lineage will require further comparative analysis of fish genomes incorporating a broader distribution of teleost lineages.
Year: 2011 PMID: 21920901 PMCID: PMC3205604 DOI: 10.1093/gbe/evr090
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
The Summary Statistics for the Five Teleost Fish
| Genome size | 1,412,464,843 | 461,533,448 | 868,983,502 | 393,312,790 | 358,618,246 |
| Number of genes | 32,312 | 22,456 | 20,422 | 19,388 | 20,562 |
| Number of transcripts | 51,569 | 29,245 | 25,397 | 48,706 | 24,078 |
| Protein coding genes | 24,803 | 20,109 | 18,920 | 17,876 | 18,872 |
| Canonical transcripts | 24,803 | 20,109 | 18,920 | 17,876 | 18,872 |
| Introns per gene | 8.93 | 9.93 | 9.80 | 10.51 | 9.96 |
| Number of introns | 221,589 | 199,624 | 185,494 | 187,962 | 187,875 |
| Maximum intron length | 378,145 | 175,269 | 295,125 | 93,537 | 631,227 |
| Total intron length | 622,476,590 | 151,619,269 | 219,591,667 | 108,524,412 | 90,447,562 |
| Mean length | 2,809 | 760 | 1,184 | 577 | 481 |
| Median length | 984 | 219 | 252 | 143 | 118 |
| Mode length | 84 | 85 | 77 | 78 | 76 |
| 25th percentile length | 138 | 104 | 90 | 84 | 80 |
| 75th percentile length | 2,563 | 615 | 1,026 | 450 | 350 |
| GC content | 50.58% | 50.48% | 47.10% | 40.39% | 49.21% |
| Percentage of genome | 44.07% | 32.85% | 25.27% | 27.59% | 25.22% |
NOTE.—We include total genome size, total number of genes, and total number of transcripts, but our study focuses on the introns found within the genes matching Ensembl’s protein_coding biotype.
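The per-genome intron statistics in this table can be recomputed from a plain list of intron lengths. The following is a minimal sketch, not the authors' pipeline: the function name `intron_summary`, the dictionary keys, and the choice of the "inclusive" percentile method are assumptions made for illustration, and the source does not say how percentiles were calculated.

```python
from collections import Counter
from statistics import median, quantiles


def intron_summary(lengths, genome_size):
    """Summarise a list of intron lengths (bp) for one genome.

    `lengths` needs at least two values so that quartiles are defined.
    The percentile method is an assumption; the source does not state one.
    """
    if len(lengths) < 2:
        raise ValueError("need at least two intron lengths")
    total = sum(lengths)
    # Quartile cut points: 25th percentile, median, 75th percentile.
    q1, _, q3 = quantiles(lengths, n=4, method="inclusive")
    # Most frequent length; ties resolve to the first value encountered.
    mode_length = Counter(lengths).most_common(1)[0][0]
    return {
        "number_of_introns": len(lengths),
        "maximum_length": max(lengths),
        "total_length": total,
        "mean_length": total / len(lengths),
        "median_length": median(lengths),
        "mode_length": mode_length,
        "percentile_25": q1,
        "percentile_75": q3,
        "percentage_of_genome": 100 * total / genome_size,
    }


if __name__ == "__main__":
    # Toy data, not taken from any real genome.
    print(intron_summary([80, 80, 100, 200, 1000], genome_size=10_000))
```

As a consistency check on the published values, the first column (whose totals match the D. rerio figures quoted in the abstract) gives 622,476,590 / 221,589 ≈ 2,809 bp for the mean length and 622,476,590 / 1,412,464,843 ≈ 44.07% for the percentage of the genome.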
Fig. (a) A frequency distribution plot of intron size in the five teleost fish. Each point represents the mean of intron sizes within a 25-bp sliding window. The lower and upper dashed lines represent the 5% and 95% confidence intervals, respectively. All fish present an initial peak of approximately 80 bp and then decay in a similar pattern, with the exception of Danio rerio, which has a second peak between 500 and 2,000 bp and, subsequently, decays parallel to the others. (b) A frequency distribution plot of unique intron size in the five teleost fish, representing the intron sizes after removal of repeat sequences.
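One way to produce a smoothed curve of this kind is to count introns at each length and then average those counts over a 25-bp window that slides one base at a time. The sketch below shows that reading of the caption. It is an interpretation, not the authors' code: the function name `windowed_frequency`, the one-base step, and the use of the window midpoint as the x coordinate are all assumptions.

```python
from collections import Counter


def windowed_frequency(lengths, window=25, max_length=None):
    """Return (window midpoint, mean count per bp) pairs for intron lengths.

    Each window covers `window` consecutive lengths, starting at 1 bp and
    sliding by one base. Lengths with no introns count as zero.
    """
    counts = Counter(lengths)
    if max_length is None:
        max_length = max(lengths)
    points = []
    for start in range(1, max_length - window + 2):
        # Sum the counts of every length that falls inside this window.
        in_window = sum(counts[size] for size in range(start, start + window))
        midpoint = start + (window - 1) / 2
        points.append((midpoint, in_window / window))
    return points


if __name__ == "__main__":
    # Toy lengths clustered near 80 bp, with a few longer introns.
    toy = [78, 80, 80, 81, 85, 90, 120, 600, 900, 1500]
    for midpoint, freq in windowed_frequency(toy, window=25, max_length=130)[:5]:
        print(midpoint, freq)
```

A monotonically decaying distribution would show a single peak in this output, whereas a second peak between 500 and 2,000 bp, as described for D. rerio, would appear as a second rise in the windowed means.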
A Summary of Repeat Element Content in the Five Teleost Fish, Determined Using the WindowMasker Software
| Number of repeat elements | 4,583,943 | 891,753 | 1,498,499 | 591,789 | 509,271 |
| Length of repeat elements | 291,676,913 | 31,910,164 | 74,289,913 | 20,701,619 | 20,313,082 |
| Number of repeat elements per intron | 20.69 | 4.47 | 8.08 | 3.15 | 2.71 |
| Percentage of intron length | 46.86% | 21.05% | 33.83% | 19.08% | 22.46% |
| Length of unique introns | 330,799,677 | 119,709,105 | 145,301,754 | 87,822,793 | 70,134,480 |
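The derived rows of this table follow from three raw totals per genome: the number of repeat elements, their combined length, and the total intron length reported in the summary statistics table. The sketch below makes those relationships explicit. The function name `repeat_summary` and its argument names are illustrative assumptions, not part of the authors' pipeline.

```python
def repeat_summary(total_intron_length, repeat_length, n_repeats, n_introns):
    """Derive the per-genome repeat statistics from raw totals (all in bp)."""
    return {
        "repeats_per_intron": n_repeats / n_introns,
        "percentage_of_intron_length": 100 * repeat_length / total_intron_length,
        # Unique intron sequence is whatever the repeat masker left unmasked.
        "unique_intron_length": total_intron_length - repeat_length,
    }


if __name__ == "__main__":
    # First column of the tables; its totals match the D. rerio figures
    # quoted in the abstract.
    stats = repeat_summary(
        total_intron_length=622_476_590,
        repeat_length=291_676_913,
        n_repeats=4_583_943,
        n_introns=221_589,
    )
    for name, value in stats.items():
        print(name, round(value, 2))
```

For the first column this reproduces the tabulated 20.69 repeat elements per intron, 46.86% of intron length, and 330,799,677 bp of unique intron sequence.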
Fig. (a) A frequency distribution of individual repeat element sizes in introns between 500 and 2,000 bp in size. Each point represents the mean of intron sizes within a 25-bp sliding window. (b) Frequency distribution of cumulative repeat element size produced by pooling all repeat elements within individual introns.
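The pooling step described for panel (b) can be sketched as a per-intron sum of repeat element lengths. The data layout here is hypothetical: repeats are assumed to arrive as `(intron_id, repeat_length)` pairs and intron sizes as a dictionary keyed by the same identifiers. Restricting panel (b) to introns of 500 to 2,000 bp is also an assumption carried over from panel (a).

```python
from collections import defaultdict


def pooled_repeat_length(repeats, intron_lengths, min_size=500, max_size=2000):
    """Sum repeat element lengths per intron, keeping introns in a size range.

    repeats        -- iterable of (intron_id, repeat_length_bp) pairs
    intron_lengths -- dict mapping intron_id to intron length in bp
    """
    pooled = defaultdict(int)
    for intron_id, repeat_len in repeats:
        size = intron_lengths.get(intron_id)
        # Skip repeats whose intron is unknown or outside the size range.
        if size is not None and min_size <= size <= max_size:
            pooled[intron_id] += repeat_len
    return dict(pooled)


if __name__ == "__main__":
    # Toy identifiers and lengths, for illustration only.
    intron_lengths = {"i1": 600, "i2": 3000, "i3": 1500}
    repeats = [("i1", 50), ("i1", 70), ("i2", 400), ("i3", 10)]
    print(pooled_repeat_length(repeats, intron_lengths))
```

The individual lengths in `repeats` correspond to the quantity plotted in panel (a), and the pooled per-intron totals returned by the function correspond to the cumulative sizes plotted in panel (b).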