Lucia Natali, Rosa Maria Cossu, Elena Barghini, Tommaso Giordani, Matteo Buti, Flavia Mascagni, Michele Morgante, Navdeep Gill, Nolan C Kane, Loren Rieseberg, Andrea Cavallini.
Abstract
BACKGROUND: Next-generation sequencing provides a powerful tool to study genome structure in species whose genomes are far from being completely sequenced. In this work we describe and compare different computational approaches to evaluate the repetitive component of the sunflower genome, using medium/low coverage Illumina or 454 libraries.
Year: 2013 PMID: 24093210 PMCID: PMC3852528 DOI: 10.1186/1471-2164-14-686
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Characteristics of contig sets obtained by CLC Bio Workbench and Minimus2 assemblies after different splitting of Illumina and 454 reads
| Illumina | 1 | 0.86 x | 28,283 | 505.4 | 22.1 | 487 |
| | 565 | 0.0015 x | 4,599 | 542.9 | 606.1 | 519 |
| Total | | | 32,377 | 513.4 | 205.8 | 496 |
| 454 (large) | 1 | 1.25 x | 144,755 | 627.7 | 13.8 | 678 |
| | 26 | 0.048 x | 133,900 | 595.3 | 39.2 | 610 |
| Total | | | 227,160 | 662 | 42.6 | 718 |
| 454 (small) | 1 | 0.55 x | 42,964 | 457.6 | 16.6 | 449 |
| | 18 | 0.031 x | 28,984 | 466.1 | 57.5 | 455 |
| Total | | | 59,923 | 484.7 | 106.1 | 478 |
Figure 1. Distributions of mapped Illumina reads to the six sequence sets obtained by assembling original Illumina or 454 reads.
Figure 2. Functional composition of the assembled sequence sets obtained by assembling the original Illumina or 454 read sets (first row, unsplit); by assembling the same read sets after a preliminary splitting into subpackages of reads (second row, split); by assembling the two sequence sets previously obtained from each of the Illumina, 454 large and 454 small read sets (third row, total); and by assembling the three sequence sets described in the third row (fourth row, WGSAS).
Figure 3. Distribution of mapped Illumina reads in the WGSAS. Sequences were subdivided into redundant and unique (low redundant) using an arbitrary threshold of five times the mean coverage of five putatively unique gene sequences.
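The thresholding rule described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `classify_by_coverage`, the coverage values, and the strict `>` comparison at the threshold are all assumptions made for the example.

```python
from statistics import mean


def classify_by_coverage(contig_coverage, reference_gene_coverage, fold=5.0):
    """Split contigs into redundant and unique (low redundant) sets.

    The threshold is `fold` times the mean coverage of a few putatively
    unique reference genes. Treating coverage strictly above the threshold
    as redundant is an assumption; the source does not state the boundary.
    """
    threshold = fold * mean(reference_gene_coverage)
    redundant = {name for name, cov in contig_coverage.items() if cov > threshold}
    unique = set(contig_coverage) - redundant
    return threshold, redundant, unique


# Hypothetical coverage values, chosen only to exercise the rule.
reference = [10.0, 12.0, 8.0, 11.0, 9.0]   # five putatively unique genes
contigs = {"contig_a": 600.0, "contig_b": 30.0, "contig_c": 51.0}

threshold, redundant, unique = classify_by_coverage(contigs, reference)
print(threshold, sorted(redundant), sorted(unique))
```

With these made-up numbers the reference mean is 10, so any contig covered more than 50-fold is counted as redundant.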
Functional distribution of the sequences in the SUNREP database
| DNA transposons | Unclassified | 373 |
| | Tc1 Mariner | 5 |
| | hAT | 67 |
| | Mutator | 101 |
| | PIF-Harbinger | 18 |
| | CACTA | 64 |
| | Helitron | 324 |
| | MITE | 382 |
| Retrotransposons | Unclassified | 192 |
| | LTR-Copia | 8,605 |
| | LTR-Gypsy | 19,726 |
| | LTR-Unknown | 5,636 |
| | Non-LTR | 261 |
| | Pararetrovirus | 11 |
| Tandem repeats and SSR | | 385 |
| rDNA | | 84 |
| Putative genes | | 483 |
| Unknown repeats | Unclassified | 4,739 |
| | Contig 61 type | 957 |
| No hits found | | 5,511 |
| Total | | 47,924 |
Figure 4. Size distribution of Gypsy, Copia and unknown LTR-RE families, of non-LTR RE families, and of DNA transposon families obtained by an all-by-all BLAST analysis. For each superfamily, the histograms depict the number of families (Y-axis) containing a specified number of contigs. The total numbers of families and of singletons (i.e. families represented by one contig) are also reported.
Figure 5. Number of sequences composing the 30 most numerous families of LTR-REs (above) and DNA transposons (below).
The most abundant gene families represented in the SUNREP database
| NBS-LRR Disease Resistance Protein | 24 |
| DNAJ-like Protein | 18 |
| Protein Kinase Domain Containing Protein | 13 |
| F-box Motif Containing Protein | 10 |
| Serine/Threonine/Tyrosine Protein Kinase | 9 |
Statistics of the mapping of Illumina reads to the WGSAS
| Matched nuclear reads | Repeated | 61,860,742 | 52.13 | 80.92 |
| | Unique or low redundant | 14,586,474 | 12.29 | 19.08 |
| | Total | 76,447,216 | 64.42 | 100.00 |
| Not matched nuclear reads | | 42,217,465 | 35.58 | |
| Total nuclear reads | | 118,664,681 | 100.00 | |
| Organellar reads | | 11,680,927 | | |
| Total reads | | 130,345,608 | | |
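The percentages in the mapping table follow directly from the read counts. The short check below assumes that the fourth column is the percentage of total nuclear reads and the fifth the percentage of matched nuclear reads (the column headers are not shown here); the helper `pct` is hypothetical.

```python
# Read counts taken from the table above.
repeated = 61_860_742        # matched nuclear reads, repeated
unique_or_low = 14_586_474   # matched nuclear reads, unique or low redundant
not_matched = 42_217_465     # nuclear reads not matched to the WGSAS
organellar = 11_680_927      # organellar reads

matched = repeated + unique_or_low
nuclear = matched + not_matched
total = nuclear + organellar


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)


print("of nuclear reads:", pct(repeated, nuclear), pct(unique_or_low, nuclear),
      pct(matched, nuclear), pct(not_matched, nuclear))
print("of matched reads:", pct(repeated, matched), pct(unique_or_low, matched))
```

Under that reading of the columns, the recomputed sums and percentages agree with the values reported in the table.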
Percentage distribution of different functional classes of non-coding DNA sequences in the sunflower genome, based on the mapping of the WGSAS
| DNA transposons | Unclassified | 521,152 | 0.68 |
| | Subclass I | 445,239 | 0.58 |
| | Subclass II | 348,166 | 0.46 |
| | MITE | 641,043 | 0.84 |
| | Total | 1,955,600 | 2.56 |
| Retrotransposons | Unclassified | 347,042 | 0.45 |
| | LTR-Copia | 14,693,697 | 19.22 |
| | LTR-Gypsy | 37,625,059 | 49.22 |
| | LTR-Unknown | 7,569,830 | 9.90 |
| | Non-LTR | 541,494 | 0.71 |
| | Pararetrovirus | 20,624 | 0.03 |
| | Total | 60,797,746 | 79.53 |
| Tandem repeats | | 457,613 | 0.60 |
| rDNA | | 266,528 | 0.35 |
| Unknown repeats | Unclassified | 3,888,190 | 5.09 |
| | Contig 61 type | 2,148,599 | 2.81 |
| | Total | 6,036,789 | 7.90 |
| Total matched reads excluding organellar ones | | 76,447,216 | |
Average coverage of a sample of full-length sunflower LTR-retrotransposons measured separately on LTR and inter-LTR regions
| Copia | DESRLC1 | 1037.85 | 830.65 | 1.25 |
| | DHNRLC1 | 186.63 | 3303.45 | 0.06 |
| | LTPRLC1 | 62.72 | 13.47 | 4.66 |
| | LTPRLC2 | 12.54 | 1367.95 | 0.01 |
| | LTPRLC3 | 340.37 | 415.11 | 0.82 |
| | mean | | | 1.36 |
| Gypsy | DESRLG1f | 9239.29 | 2288.53 | 4.04 |
| | DESRLG2 | 2337.58 | 659.22 | 3.55 |
| | DESRLG3 | 14003.10 | 3932.65 | 3.56 |
| | DHNRLG1 | 2345.90 | 560.36 | 4.19 |
| | DHNRLG2 | 976.16 | 1776.89 | 0.55 |
| | LTPRLG1 | 6267.76 | 10258.53 | 0.61 |
| | LTPRLG2 | 823.73 | 125.25 | 6.58 |
| | LTPRLG3 | 6016.77 | 1024.56 | 5.87 |
| | mean | | | 3.62 |
| Unknown | DESRLX1 | 234.20 | 104.46 | 2.24 |
| | DESRLX2 | 2064.22 | 1827.43 | 1.13 |
| | DHNRLX1 | 702.73 | 5577.56 | 0.13 |
| | DHNRLX2 | 519.09 | 791.34 | 0.66 |
| | LTPRLX1 | 1053.57 | 1875.62 | 0.56 |
| | LTPRLX2 | 956.69 | 356.17 | 2.69 |
| | mean | | | 1.23 |
| Mean | | | | 2.27 |
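The last column of the table above appears to be the ratio between the two coverage columns. The sketch below recomputes it for the first group of elements (DESRLC1 to LTPRLC3), assuming that the first numeric column is the average coverage of the LTRs and the second that of the inter-LTR region; that column order is inferred from the caption, not stated in the table.

```python
from statistics import mean

# (element, assumed LTR coverage, assumed inter-LTR coverage), from the table.
rlc_elements = [
    ("DESRLC1", 1037.85, 830.65),
    ("DHNRLC1", 186.63, 3303.45),
    ("LTPRLC1", 62.72, 13.47),
    ("LTPRLC2", 12.54, 1367.95),
    ("LTPRLC3", 340.37, 415.11),
]

# Ratio of LTR coverage to inter-LTR coverage for each element.
ratios = {name: ltr / inter for name, ltr, inter in rlc_elements}
rounded = {name: round(value, 2) for name, value in ratios.items()}
group_mean = round(mean(ratios.values()), 2)

for name, value in rounded.items():
    print(name, value)
print("mean", group_mean)
```

The recomputed ratios and group mean match the tabulated values for these five elements.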