Suzhi Wang, Marcé D Lorenzen, Richard W Beeman, Susan J Brown.
Abstract
BACKGROUND: Insect genomes vary widely in size, a large fraction of which is often devoted to repetitive DNA. Re-association kinetics indicate that up to 42% of the genome of the red flour beetle, Tribolium castaneum, is repetitive. Analysis of the abundance and distribution of repetitive DNA in the recently sequenced genome of T. castaneum is important for understanding the structure and function of its genome.
Year: 2008 PMID: 18366801 PMCID: PMC2397513 DOI: 10.1186/gb-2008-9-3-r61
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Abundance and average density of microsatellites, minisatellites and satellites in the D. melanogaster and T. castaneum genomes identified by TRF
| | Number of base pairs | Percentage of genome | Number of loci | Average density* (loci/Mb) |
| T. castaneum | | | | |
| Microsatellites | 591,105 | 0.4 | 17,328 | 114 |
| Minisatellites | 3,112,304 | 2.1 | 120,474 | 796 |
| Satellites | 3,775,523 | 2.5 | 4,272 | 28 |
| Total tandem repeats | 7,478,923 | 4.9 | 142,074 | 939 |
| Genome | 151,333,735 | | | |
| D. melanogaster | | | | |
| Microsatellites | 1,442,241 | 1.0 | 52,906 | 367 |
| Minisatellites | 3,590,753 | 2.5 | 126,237 | 876 |
| Satellites | 1,075,701 | 0.7 | 1,343 | 9 |
| Total tandem repeats | 6,108,695 | 4.2 | 180,486 | 1,253 |
| Genome | 143,955,363† | | | |
*For the Tribolium genome, average density = number of repeats/151 Mb; for the Drosophila genome, average density = number of repeats/144 Mb. †The size of the Drosophila genome was calculated by summing the euchromatin (124,006,872 bp) and heterochromatin (19,948,491 bp) not including sequence gaps.
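The footnote's density definition (number of loci divided by genome size in Mb) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name is hypothetical, and it divides by the exact assembly size from the table rather than the rounded 151 Mb, so results may differ slightly from the published, rounded values.

```python
# Illustrative helper (hypothetical name); inputs are taken from the table above.
T_CASTANEUM_GENOME_BP = 151_333_735


def tandem_repeat_stats(total_bp, loci, genome_bp):
    """Return (percentage of genome, loci per Mb) for one repeat class."""
    percent = 100.0 * total_bp / genome_bp
    density = loci / (genome_bp / 1_000_000)  # genome size converted to Mb
    return percent, density


# T. castaneum microsatellites: 591,105 bp in 17,328 loci.
percent, density = tandem_repeat_stats(591_105, 17_328, T_CASTANEUM_GENOME_BP)
print(f"microsatellites: {percent:.1f}% of genome, {density:.1f} loci/Mb")
```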
Figure 1. Distribution of microsatellites, minisatellites and satellites on each chromosome of the T. castaneum genome.
Figure 2. Frequencies of microsatellites per million base pairs in the D. melanogaster and T. castaneum genomes.
Summary of LTR and non-LTR retrotransposons and DNA transposons identified by TEpipe in the T. castaneum genome assembly
| Class | TE library* (kb) | Number of families | Percentage of genome† | TE length range (bp) | Average length (bp) | Copy number (range) | Average copy number | GC content range (%) | Average GC content (%) |
| Non-LTR | 238.1 | 69 | 2.0 | 786-6,820 | 3,363 | 1-2,556 | 161 | 27.15-57.94 | 38.14 |
| LTR | 290.2 | 48 | 1.7 | 3,292-11,097 | 6,019 | 1-1,634 | 202 | 30.61-53.21 | 39.31 |
| DNA transposons | 78.6 | 45 | 2.2 | 456-4,878 | 1,746 | 1-8,949 | 420 | 30.90-46.08 | 37.22 |
*Non-LTR, LTR and DNA transposon TE libraries were produced by TEpipe, which is based on sequence similarity searches using conserved domains from reverse transcriptase and transposase. †To calculate the abundance of TEs in the Tribolium genome assembly, RepeatMasker was run using our TEpipe libraries.
Comparison of repetitive DNA in D. melanogaster and T. castaneum identified by RepeatScout
| Genome | Assembled genome size (Mb) | RepeatScout library size (Mb) | Number of repeat families | Amount of genome (Mb) | Percentage of genome | GC content of library (%) | GC content of the genome (%) |
| D. melanogaster | 144 | 2.51 | 3,297 | 29.3 | 20 | 59.94 | 41.44 |
| T. castaneum | 151 | 1.41 | 4,475 | 38.9 | 26 | 34.52 | 33.87 |
Analysis of the Tribolium repeat library produced by RepeatScout
| Repeat class | Total repeat family length (kb) | Number of repeat families | Percentage of RepeatScout library | Percentage of genome* | Repeat family length range (bp) | Repeat family average length (bp) | Repeat family copy number range | Repeat family average copy number | Repeat family GC content range (%) | Repeat family average GC content (%) |
| HighA† | 26.1 | 31 | 1.9 | 7.1 | 160-1,771 | 841 | 323-4,337 | 1,368 | 23.05-33.75 | 28.37 |
| Mid‡ | 220.3 | 304 | 15.6 | 7.4 | 67-4,881 | 725 | 11-1,746 | 204 | 13.46-47.51 | 30.19 |
| Low§ | 738.2 | 3,237 | 52.3 | 4.7 | 51-4,520 | 228 | 3-215 | 14 | 12.28-71.15 | 33.61 |
| HighB¶ | 4.6 | 5 | 0.3 | 1.6 | 982-1,277 | 921 | 432-3,531 | 1,306 | 26.58-31.32 | 29.67 |
| 360 bp satellite¥ | 0.4 | 1 | 0.2 | 0.3 | - | - | 1,122 | - | - | 26.31 |
| Transposable elements# | 406.2 | 896 | 28.9 | 4.4 | 51-11,289 | 453.3 | 3-2,471 | 27 | 15.28-65.93 | 38.59 |
*RepeatMasker was used to determine the percent of the genome occupied by each repeat class. †High repetitive A, 31 repeat sequences that each masked >0.1% of the genome. ‡Middle repetitive, 304 repeat sequences that each masked >0.01% and <0.1% of the genome. §Low repetitive, 3,237 repeat sequences that each masked <0.01% of the genome. ¶High repetitive B, repeat sequences that each masked >0.1% of the genome, but show a different distribution pattern to the HighA repeat sequences. ¥360 bp satellite was removed from the HighA class for separate analysis. #Transposable elements were removed from the HighA, Mid, and Low repetitive classes for separate analysis.
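The abundance thresholds in the footnote can be expressed as a small classifier. This is a sketch under stated assumptions: the function names are hypothetical, and the footnote does not say how a family masking exactly 0.1% or 0.01% is treated, so the boundary handling below is an arbitrary choice. The HighA/HighB split depends on chromosomal distribution, and the separate handling of the 360 bp satellite and transposable elements is also not modelled here.

```python
def percent_of_genome(masked_bp, genome_bp):
    """Percentage of the assembly masked by one repeat family."""
    return 100.0 * masked_bp / genome_bp


def classify_repeat(percent_masked):
    """Assign an abundance class from the percentage of genome masked.

    Assumption: values exactly on a threshold fall into the lower class,
    because the source does not specify the boundary cases.
    """
    if percent_masked > 0.1:
        return "High"
    if percent_masked > 0.01:
        return "Mid"
    return "Low"


# Hypothetical family masking 200 kb of a 151,333,735 bp assembly.
example = percent_of_genome(200_000, 151_333_735)
print(classify_repeat(example))
```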
Figure 3. Distribution of repetitive elements and transposable elements identified by RepeatScout and TEpipe on the Tribolium chromosomes. Repeat elements in the RepeatScout library were classified into High, Mid and Low classes based on the percent of the genome (in bp) that they masked. High repetitive, 37 repeat sequences that each masked >0.1% of the genome. Middle repetitive, 352 repeat sequences that each masked >0.01% and <0.1% of the genome. Low repetitive, 3,179 repeat sequences that each masked <0.01% of the genome.
Estimated total repetitive DNA in T. castaneum genome assembly
| Tools | Percentage of genome masked | Percentage of masked genome overlapping with RepeatScout |
| RepeatScout | 25.7 | N/A |
| TRF | 4.9 | 1.5 |
| TEpipe | 5.8 | 5.2 |
| Total | 36.4 | 6.7 |
| Total repetitive DNA in T. castaneum | 36.4 - 6.7 = 29.7 | |
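The arithmetic in the last row can be reproduced directly: sum the percentages masked by each tool, then subtract the portions of the TRF and TEpipe results that overlap RepeatScout. The sketch below only mirrors the table's calculation; the variable names are illustrative.

```python
# Percentages copied from the table above.
masked = {"RepeatScout": 25.7, "TRF": 4.9, "TEpipe": 5.8}
overlap_with_repeatscout = {"TRF": 1.5, "TEpipe": 5.2}

total_masked = round(sum(masked.values()), 1)
total_overlap = round(sum(overlap_with_repeatscout.values()), 1)

# Subtract the overlap so sequence found by two tools is counted once.
non_redundant = round(total_masked - total_overlap, 1)
print(total_masked, total_overlap, non_redundant)
```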
Figure 4. Density and distribution of repetitive DNA on each chromosome of T. castaneum. The total length (kb) of repetitive DNA in each 500 kb interval along the chromosome is plotted. The 300 kb placeholders were not included in the chromosomes. Sequencing gaps are included in the calculation if they are ≥50 bp. The length cutoff for parsing the RepeatMasker results was 50 bp. The HighA class includes the 360 bp satellite. Gene number, gap length and distribution of other repetitive classes within the 500 kb intervals are shown below the main graph for each chromosome. The combined average of HighA repeats and TE per 500 kb along the chromosome is depicted as a black line.
The distribution of repetitive DNA in putative heterochromatin and euchromatin in assembled anchored genome of T. castaneum
| Repeat element | Total length (kb) | Amount in heterochromatin (kb) | Amount in euchromatin (kb) | Percentage in heterochromatin | Percentage in euchromatin |
| Total anchored DNA | 137,758 | 54,754 | 83,004 | 39.70 | 60.30 |
| HighA | 8,729 | 5,633 | 3,096 | 64.53 | 35.47 |
| Mid | 8,769 | 5,174 | 3,595 | 59.00 | 41.00 |
| Low | 4,915 | 2,893 | 2,022 | 58.86 | 41.14 |
| HighB | 2,045 | 267 | 1,778 | 13.06 | 86.94 |
| Non-LTR | 1,370 | 962 | 408 | 70.22 | 29.78 |
| LTR | 1,208 | 896 | 312 | 74.17 | 25.83 |
| DNA transposon | 2,579 | 1,963 | 616 | 76.11 | 23.89 |
| Microsatellite | 439 | 188 | 251 | 42.82 | 57.18 |
| Minisatellite | 2,593 | 1,152 | 1,441 | 44.43 | 55.57 |
| Tandem satellites | 2,621 | 646 | 1,975 | 24.65 | 75.35 |
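The percentage columns follow from the heterochromatin and euchromatin amounts, and comparing a repeat class with the anchored-genome baseline shows whether it is over-represented in putative heterochromatin. The sketch below is illustrative only: the function names and the "enrichment" ratio are assumptions introduced here, not quantities reported in the source.

```python
def heterochromatin_share(het_kb, eu_kb):
    """Percentage of a repeat class that lies in putative heterochromatin."""
    return 100.0 * het_kb / (het_kb + eu_kb)


def enrichment(het_kb, eu_kb, baseline_het_kb, baseline_eu_kb):
    """Ratio of a class's heterochromatin share to the genome-wide share.

    Values above 1 suggest over-representation in heterochromatin.
    """
    baseline = heterochromatin_share(baseline_het_kb, baseline_eu_kb)
    return heterochromatin_share(het_kb, eu_kb) / baseline


# Amounts (kb) copied from the table: total anchored DNA, HighA and HighB.
ANCHORED = (54_754, 83_004)
print(round(heterochromatin_share(5_633, 3_096), 2))
print(round(enrichment(5_633, 3_096, *ANCHORED), 2))
print(round(enrichment(267, 1_778, *ANCHORED), 2))
```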
Nonparametric one-sample runs test for randomness of distribution of heterochromatin and euchromatin blocks
| CH | n | n1 | n2 | r | Interval sequence* |
| CH1 | 15 | 5 | 10 | 2† | 000000000011111 |
| CH2 | 30 | 12 | 18 | 6† | 111111111101000100000000000000 |
| CH3 | 61 | 24 | 37 | 11† | 0000000000000000000000111111111111101111110011011001000000000 |
| CH4 | 25 | 8 | 17 | 5† | 0000001000000000011111110 |
| CH5 | 29 | 11 | 18 | 4† | 11111111100000000001100000000 |
| CH6 | 18 | 7 | 11 | 4† | 000000000010111111 |
| CH7 | 30 | 8 | 22 | 8† | 100000000010011000000000011110 |
| CH8 | 28 | 12 | 16 | 6† | 1111011111101100000000000000 |
| CH9 | 31 | 16 | 15 | 7† | 0101111100111111111100000000000 |
| CH10 | 15 | 7 | 8 | 4† | 111111000010000 |
Columns: CH, chromosome; n, total number of intervals; n1, the number of observations of 1; n2, the number of observations of 0; r, the total number of runs. *We calculated the average density of TEs and HighA satellites per 500 kb for each chromosome and then compared the observed density in each 500 kb interval across the chromosome to this average. If the ratio of observed density to average density was >1, the interval was considered putative heterochromatin and denoted 1. If the ratio was ≤1, the interval was considered euchromatin and denoted 0. †P < 0.05.
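The interval coding and the runs statistics in this table can be sketched as below. The source does not state whether exact tables or a large-sample approximation were used to obtain P, so the normal approximation to the Wald-Wolfowitz runs test shown here is an assumption, and the function names are hypothetical. For chromosomes with few intervals an exact test may be more appropriate.

```python
import math


def classify_intervals(densities):
    """Code each interval 1 if its density exceeds the chromosome average, else 0."""
    average = sum(densities) / len(densities)
    return "".join("1" if d > average else "0" for d in densities)


def runs_test(sequence):
    """Return (n, n1, n2, r, z, p) for a string of 0s and 1s.

    p is a two-sided P value from the normal approximation (an assumption).
    """
    n1 = sequence.count("1")
    n2 = sequence.count("0")
    n = n1 + n2
    # A new run starts wherever adjacent symbols differ.
    r = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (r - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return n, n1, n2, r, z, p


# CH1 interval sequence from the table above.
n, n1, n2, r, z, p = runs_test("000000000011111")
print(n, n1, n2, r, p < 0.05)
```

Far fewer runs than expected (a strongly negative z) indicates that heterochromatin-like and euchromatin-like intervals are clustered into blocks rather than randomly interspersed.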
Analysis of density, average size and GC content of genes, exons and introns in putative heterochromatin and euchromatin of T. castaneum
| | Heterochromatin | Euchromatin | Average in anchored genome |
| Length (Mb) | 54.7 | 83.0 | - |
| Percentage in anchored scaffolds | 40 | 60 | 100 |
| GC content (%) | 32.4 | 35.1 | 34.0 |
| Average gene size (kb) | 6.5 | 5.0 | 5.5 |
| Gene* size/Mb (kb) | 546 | 602 | 579 |
| Number of genes/Mb | 83 | 120 | 105 |
| Gene GC content (%) | 33.6 | 36.5 | 35.4 |
| Average exon size (bp) | 312 | 329 | 314 |
| Exon* size/gene (bp) | 1,272 | 1,501 | 1,429 |
| Number of exons/gene | 4.1 | 4.6 | 4.4 |
| Number of exons/Mb | 340 | 547 | 465 |
| Exon GC content (%) | 44.8 | 46.3 | 45.9 |
| Average intron size (bp) | 2,711 | 1,705 | 1,999 |
| Intron* size/gene (bp) | 5,238 | 3,694 | 4,180 |
| Number of introns/gene | 3.1 | 3.6 | 3.4 |
| Number of introns/Mb | 339 | 543 | 462 |
| Intron GC content (%) | 30.8 | 32.8 | 32.0 |
*Genes, exons and introns from the GLEAN gene prediction data were used in this analysis.
Recombination rate as reflected in physical size of recombination units in putative heterochromatin and euchromatin in the Tribolium genome assembly
| Linkage group | Heterochromatin (kb/cM) | Euchromatin (kb/cM) | P |
| CH1* | - | - | - |
| CH2 | Range: 721.1, 523.9, 463.9, 208.3 | Range: 130.7, 153.1, 218.5 | <0.01 |
| | Average: 479.3 | Average: 167.4 | |
| CH3 | Range: 322.5† | Range: 184.4, 226.8, 198.6, 176.4 | <0.01 |
| | Average: 322.5 | Average: 196.6 | |
| CH4 | Range: 346.3, 1440.7 | Range: 141.2, 200.6, 318.3 | <0.01 |
| | Average: 893.5 | Average: 220.3 | |
| CH5* | - | Range: 247.7, 320.9, 176.4, 397.2, 225.0 | - |
| | | Average: 273.4 | |
| CH6 | Range: 145.1, 244.5 | Range: 191.4, 38.7 | <0.01 |
| | Average: 194.8 | Average: 130.2 | |
| CH7 | Range: 440.8† | Range: 132.2, 257.0, 31.6, 255.8 | <0.01 |
| | Average: 440.8 | Average: 169.2 | |
| CH8 | Range: 318.5, 543.2 | Range: 165.5, 110.3, 98.2, 367.3 | <0.01 |
| | Average: 426.4 | Average: 185.4 | |
| CH9‡ | Range: 195.9, 326.2, 296.0 | Range: 234.3, 336.6, 164.1 | - |
| | Average: 272.7 | Average: 245.0 | |
| CH10 | Range: 241.7† | Range: 237.9, 127.5 | <0.01 |
| | Average: 241.7 | Average: 182.7 | |
*Not enough genetic markers for analysis. †Recombination was calculated for one scaffold that falls in heterochromatin. ‡No significant difference was observed in the average physical size of a recombination unit in heterochromatin versus euchromatin (P = 0.179).
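The physical size of a recombination unit is physical distance divided by genetic distance (kb/cM), so a larger value means less recombination per unit of DNA. The sketch below shows that calculation and a plain mean over the CH2 values listed in the table. The function name and the example distances are hypothetical, and averaging the listed values with equal weight is an assumption about how the table's averages were formed.

```python
from statistics import mean


def kb_per_cm(physical_kb, genetic_cm):
    """Physical size of one recombination unit for a marker interval."""
    return physical_kb / genetic_cm


# Hypothetical interval: 1,000 kb spanning 5 cM.
print(kb_per_cm(1_000, 5))

# CH2 values (kb/cM) copied from the table above.
ch2_heterochromatin = [721.1, 523.9, 463.9, 208.3]
ch2_euchromatin = [130.7, 153.1, 218.5]

het_avg = mean(ch2_heterochromatin)
eu_avg = mean(ch2_euchromatin)

# A ratio above 1 means more DNA per cM, i.e. less recombination,
# in putative heterochromatin than in euchromatin.
ratio = het_avg / eu_avg
print(round(het_avg, 1), round(eu_avg, 1), round(ratio, 2))
```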