Grit Haseneyer, Thomas Schmutzer, Michael Seidel, Ruonan Zhou, Martin Mascher, Chris-Carolin Schön, Stefan Taudien, Uwe Scholz, Nils Stein, Klaus Fx Mayer, Eva Bauer.
Abstract
BACKGROUND: The improvement of agricultural crops with regard to yield, resistance and environmental adaptation is a perpetual challenge for both breeding and research. Exploration of the genetic potential and implementation of genome-based breeding strategies for efficient rye (Secale cereale L.) cultivar improvement have been hampered by the lack of genome sequence information. To overcome this limitation we sequenced the transcriptomes of five winter rye inbred lines using Roche/454 GS FLX technology.
Year: 2011 PMID: 21951788 PMCID: PMC3191334 DOI: 10.1186/1471-2229-11-131
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Figure 1. Pipeline for the assembly of Roche/454 sequence reads. After data generation [A], sequence (fasta), quality (qual) and trace file information were extracted. Low-quality regions and vector and adaptor sequences were removed from the raw reads [B]. Preprocessing concluded with a line-specific assembly of the trimmed reads. To establish the SNP resource Sce_Assembly02 [C], only reads that had been assembled into contigs in the line-specific assemblies were merged in a second assembly with Mira. To establish the EST resource Sce_Assembly03 [D], assemblies were computed separately for each of the five lines with CLC Assembly Cell, Mira, and Newbler and merged by a CAP3 assembly. The consensus sequences of all lines were then passed to a second CAP3 assembly that combined sequences across lines. The resulting sequence set comprises contigs supported by consensus sequences from two to five lines (multi-line contigs) and contigs containing reads from a single line (single-line contigs).
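The preprocessing step [B] can be illustrated with a minimal sketch of read cleaning. This is not the tool the authors used: it assumes an exact 5' adaptor match, Phred end-trimming with a hypothetical threshold `min_q=20`, and a hypothetical minimum length `min_len`; vector screening is not shown. All function names are illustrative.

```python
from __future__ import annotations


def strip_adaptor(seq: str, quals: list[int], adaptor: str) -> tuple[str, list[int]]:
    """Remove `adaptor` from the 5' end if the read starts with it (exact match only)."""
    if adaptor and seq.startswith(adaptor):
        n = len(adaptor)
        return seq[n:], quals[n:]
    return seq, quals


def trim_low_quality_ends(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Drop bases from both ends while their Phred quality is below `min_q`."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def preprocess_read(seq: str, quals: list[int], adaptor: str,
                    min_q: int = 20, min_len: int = 50) -> tuple[str, list[int]] | None:
    """Adaptor removal, then end trimming; reads shorter than `min_len` are discarded."""
    seq, quals = strip_adaptor(seq, quals, adaptor)
    seq, quals = trim_low_quality_ends(seq, quals, min_q)
    return (seq, quals) if len(seq) >= min_len else None


# Hypothetical read: a 4-base adaptor followed by 8 bases with low-quality ends.
read = "TCAGACGTACGT"
quals = [30, 30, 30, 30, 10, 30, 30, 30, 30, 30, 30, 10]
print(preprocess_read(read, quals, adaptor="TCAG", min_len=5))
# ('CGTACG', [30, 30, 30, 30, 30, 30])
```

Trimming from the ends rather than splitting reads keeps each cleaned read as one contiguous sequence, which is what a downstream assembler expects.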
Descriptive statistics of five independent Roche/454 GS FLX sequencing runs.
| Statistic | Lo7 | Lo152 | Lo225 | P87 | P105 |
|---|---|---|---|---|---|
| Raw reads: number of sequences | 364,343 | 469,345 | 572,518 | 488,829 | 681,787 |
| Raw reads: average read length [bp] | 239 | 248 | 242 | 240 | 244 |
| Trimmed reads: number of sequences | 363,681 | 469,208 | 571,433 | 488,132 | 681,136 |
| Trimmed reads: average read length [bp] | 207 | 220 | 213 | 208 | 214 |
| Trimmed reads: total bp | 75,281,967 | 103,225,760 | 121,715,229 | 101,531,456 | 145,763,104 |
| Trimmed reads: 25% quantile [bp] | 203 | 210 | 208 | 203 | 207 |
| Trimmed reads: median [bp] | 213 | 222 | 218 | 213 | 217 |
| Trimmed reads: 75% quantile [bp] | 223 | 236 | 229 | 223 | 228 |
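The descriptive statistics above (number of sequences, average read length, total bp and quartiles) can be computed from per-read lengths as in the sketch below. It is an illustration only: it assumes Python's `statistics.quantiles` with its default `'exclusive'` method, which may differ from the quantile definition used for the published table, and the function name is hypothetical.

```python
import statistics


def read_length_summary(lengths: list[int]) -> dict[str, float]:
    """Descriptive statistics for one sequencing run, given per-read lengths in bp."""
    q25, _, q75 = statistics.quantiles(lengths, n=4)  # default 'exclusive' method
    return {
        "number_of_sequences": len(lengths),
        "average_read_length": round(statistics.fmean(lengths)),
        "total_bp": sum(lengths),
        "q25": q25,
        "median": statistics.median(lengths),
        "q75": q75,
    }


print(read_length_summary([200, 210, 220, 230]))
```

As a consistency check on the table itself, each Total bp value equals the lower of the two sequence counts multiplied by the matching average read length, for example Lo7: 363,681 × 207 = 75,281,967.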
Description of the Sce_Assembly03.
| | Multi-line contigs | Single-line contigs |
|---|---|---|
| | 2,000,855 | 286,386 |
| | 60 | 3 |
| | 1,527 | 505 |
| | 1,070 | 333 |
| | 727 | 247 |
| Number of contigs | 33,352 | 82,048 |
| Contigs < 500 bp | 11,188 | 71,581 |
| Contigs 501-1000 bp | 12,679 | 8,347 |
| Contigs 1001-2000 bp | 7,693 | 1,952 |
| Contigs 2001-5000 bp | 1,767 | 166 |
| Contigs > 5000 bp | 25 | 2 |
| | 8,636 | 5,721 |
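The contig length classes in this table can be reproduced with a simple binning step. The sketch below is a hypothetical helper, not the authors' code; counting a contig of exactly 500 bp in the "< 500 bp" class is an assumption, because the printed class borders leave 500 bp unassigned.

```python
from collections import Counter

# Length classes as labelled in the table. Counting a contig of exactly 500 bp
# in "< 500 bp" is an assumption: the printed class borders leave 500 bp unassigned.
LENGTH_CLASSES = [
    ("< 500 bp", 1, 500),
    ("501-1000 bp", 501, 1000),
    ("1001-2000 bp", 1001, 2000),
    ("2001-5000 bp", 2001, 5000),
    ("> 5000 bp", 5001, float("inf")),
]


def length_class(length: int) -> str:
    """Return the label of the length class containing `length`."""
    for label, low, high in LENGTH_CLASSES:
        if low <= length <= high:
            return label
    raise ValueError(f"contig length must be a positive integer, got {length}")


def summarize_length_classes(contig_lengths: list[int]) -> dict:
    counts = Counter(length_class(n) for n in contig_lengths)
    return {
        "contigs": len(contig_lengths),
        # keep the table's row order and report empty classes as 0
        "by_class": {label: counts[label] for label, _, _ in LENGTH_CLASSES},
    }


print(summarize_length_classes([100, 500, 501, 999, 1500, 3000, 6000]))
```

Because every contig falls into exactly one class, the class counts must add up to the contig total. The table satisfies this: 11,188 + 12,679 + 7,693 + 1,767 + 25 = 33,352 for multi-line contigs and 71,581 + 8,347 + 1,952 + 166 + 2 = 82,048 for single-line contigs, matching the values listed just above the length classes.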
BlastN comparisons of the five line-specific assemblies generated with CAP3 and the Sce_Assembly03.
| Subject (rows) / Query (columns) | Lo7 | Lo152 | Lo225 | P87 | P105 | Multi-line contigs | Single-line contigs |
|---|---|---|---|---|---|---|---|
| Lo7 | - | 52.2 | 56.1 | 61.8 | 56.9 | 76.1 | 35.5 |
| Lo152 | 67.7 | - | 54.3 | 59.6 | 56.0 | 77.1 | 49.5 |
| Lo225 | 77.6 | 58.3 | - | 68.7 | 63.8 | 84.2 | 53.5 |
| P87 | 74.4 | 55.4 | 59.9 | - | 60.9 | 82.8 | 40.6 |
| P105 | 78.7 | 59.5 | 63.8 | 70.2 | - | 87.8 | 47.5 |
| Multi-line contigs | 85.2 | 64.4 | 69.6 | 78.0 | 72.3 | - | 35.3 |
| Single-line contigs | 59.1 | 64.4 | 67.3 | 59.2 | 62.4 | 58.5 | - |
Values are the percentage of query sequences with a hit in the subject set, counting only the first best hit in each comparison.
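Counting "the first best hit" per query can be sketched as follows. This assumes BLAST+ tabular output (`-outfmt 6` or `-outfmt 7`), whose first two columns are the query and subject IDs and in which each query's hits are listed best first; the helper names are hypothetical and this is not the authors' script.

```python
def first_best_hits(blast_tab_lines):
    """Map each query ID to the subject ID on its first line in BLAST tabular output."""
    hits = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (-outfmt 7)
        qseqid, sseqid = line.split("\t")[:2]
        hits.setdefault(qseqid, sseqid)  # keep only the first hit per query
    return hits


def percent_queries_with_hit(blast_tab_lines, n_queries: int) -> float:
    """Queries without any hit never appear in the output, so the total query
    count has to be supplied separately."""
    return 100.0 * len(first_best_hits(blast_tab_lines)) / n_queries


lines = ["q1\ts1\t98.5\n", "q1\ts2\t91.0\n", "q2\ts3\t88.0\n"]
print(percent_queries_with_hit(lines, n_queries=4))  # 2 of 4 queries have a hit
```

Reducing each query to a single hit keeps the percentage bounded by 100 even when a query matches many subject sequences.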
Figure 2. Heatmap of (t)BlastX analysis results against public model grass genomes and . Contig sequences from the line-specific assemblies generated by CAP3 and from Sce_Assembly03 were aligned to public barley and wheat EST and flcDNA sequences and to Brachypodium, maize, rice, and sorghum genomic sequences. Percent hits to individual databases were counted using a 70% similarity cutoff and are shown as colours (colour code on the right).
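The heatmap values (percent hits per database at a 70% similarity cutoff) can be sketched as below. Assumptions, none confirmed by the source: each query contributes the similarity of its best hit in a database, a hit counts when similarity is at least the cutoff, and the query-set and database names are illustrative only.

```python
def percent_hits_by_database(best_similarity, n_queries, cutoff=70.0):
    """best_similarity maps database -> {query_id: similarity (%) of the query's best hit}.
    Returns database -> percentage of all queries whose best hit reaches `cutoff`."""
    return {
        database: 100.0 * sum(1 for s in per_query.values() if s >= cutoff) / n_queries
        for database, per_query in best_similarity.items()
    }


def heatmap_rows(query_sets, databases, cutoff=70.0):
    """One row per query set (e.g. an assembly), one column per database, ready for plotting.
    query_sets maps a name to (number of queries, best_similarity dict)."""
    rows = []
    for name, (n_queries, best_similarity) in query_sets.items():
        pct = percent_hits_by_database(best_similarity, n_queries, cutoff)
        rows.append([name] + [round(pct.get(db, 0.0), 1) for db in databases])
    return rows


sims = {"rice": {"c1": 85.0, "c2": 60.0}, "barley": {"c1": 95.0, "c2": 72.0, "c3": 70.0}}
print(heatmap_rows({"example_assembly": (4, sims)}, ["barley", "rice", "maize"]))
```

Dividing by the total number of queries, not by the number of queries with any hit, keeps values comparable across databases that different numbers of queries reached.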
Heterozygosity of five sequenced rye inbred lines after genotyping with the Rye5K array.
| Statistic | Lo7 | Lo152 | Lo225 | P87 | P105 |
|---|---|---|---|---|---|
| Loci total | 3,145 | 3,133 | 3,134 | 3,148 | 3,127 |
| Homozygous loci | 3,004 | 3,005 | 2,987 | 2,997 | 2,988 |
| Heterozygous loci | 141 | 128 | 147 | 151 | 139 |
| Generation | F7 | F7 | F7 | F7:10 | F6:9 |
| Expected heterozygosity [%] | 1.6 | 1.6 | 1.6 | 1.6 | 3.1 |
| Observed heterozygosity [%] | 4.5*** | 4.1*** | 4.7*** | 4.8*** | 4.4* |
Significant deviations from the expected level of heterozygosity are marked (***: p-value < 0.01; *: p-value < 0.05).
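The expected and observed heterozygosity rows can be checked with a short sketch. It assumes the textbook selfing expectation, in which heterozygosity starts at 100% in F1 and halves each generation, so generation F_n retains (1/2)^(n-1). Treating F7:10 as F7 and F6:9 as F6 with this formula reproduces the tabulated 1.6% and 3.1%, but the source does not state the formula, and the significance test behind the asterisks is not described, so it is not reproduced here. Function names are hypothetical.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected percentage of heterozygous loci in generation F_n under continued
    selfing from an F1 that is heterozygous at every locus (halves each generation)."""
    if generation < 1:
        raise ValueError("generation must be at least 1 (F1)")
    return 100.0 * 0.5 ** (generation - 1)


def observed_heterozygosity(heterozygous_loci: int, total_loci: int) -> float:
    """Percentage of genotyped loci called heterozygous."""
    return 100.0 * heterozygous_loci / total_loci


# (line, heterozygous loci, total loci, selfing generation) taken from the table above
for line, het, total, gen in [("Lo7", 141, 3145, 7), ("P105", 139, 3127, 6)]:
    print(line, round(expected_heterozygosity(gen), 1), round(observed_heterozygosity(het, total), 1))
# Lo7 1.6 4.5
# P105 3.1 4.4
```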
Figure 3. Distribution of allele frequencies for evaluable SNPs on the Rye5K SNP array. Allele frequencies were observed in total and separately in the rye breeding seed parent and pollen parent pools. A value belongs to a category if it is greater than the category's left border and less than or equal to its right border; allele frequencies equal to 0 and 1 fall into the first and last category, respectively.
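The category rule in the Figure 3 caption (left-open, right-closed intervals, with 0 placed in the first category and 1 in the last) can be sketched as below. Ten equal-width categories is an assumption; the source does not state how many categories the figure uses. Calling `frequency_histogram` once on all SNPs and once per pool would give the overall, seed parent and pollen parent distributions.

```python
from bisect import bisect_left


def frequency_category(p: float, n_categories: int = 10) -> int:
    """0-based index of the category (k/n, (k+1)/n] containing allele frequency p.
    p == 0 is placed in the first category; p == 1 lands in the last one."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"allele frequency must lie in [0, 1], got {p}")
    right_borders = [(k + 1) / n_categories for k in range(n_categories)]
    # bisect_left returns the first border >= p, i.e. the right-closed category;
    # p == 0 gives index 0 automatically because every border is > 0
    return bisect_left(right_borders, p)


def frequency_histogram(freqs, n_categories: int = 10) -> list[int]:
    """Count allele frequencies per category."""
    counts = [0] * n_categories
    for p in freqs:
        counts[frequency_category(p, n_categories)] += 1
    return counts


print(frequency_histogram([0.0, 0.05, 0.1, 0.15, 1.0]))
```

Searching the right borders with `bisect_left` implements the "> left border and ≤ right border" rule directly, so values lying exactly on a border go to the lower category without any special casing beyond the range check.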