Stéphane Deschamps, Kishore Nannapaneni, Yun Zhang, Kevin Hayes.
Abstract
The use of next-generation DNA sequencing technologies has greatly facilitated reference-guided variant detection in complex plant genomes. However, complications may arise when regions adjacent to a read of interest are used for marker assay development, or when reference sequences are incomplete, as short reads alone may not be long enough to ascertain their uniqueness. Here, the possibility of generating longer sequences in discrete regions of the large and complex genome of maize is demonstrated, using a modified version of a paired-end RAD library construction strategy. Reads are generated from DNA fragments first digested with a methylation-sensitive restriction endonuclease, sheared, enriched with biotin and a selective PCR amplification step, and then sequenced at both ends. Sequences are locally assembled into contigs by subgrouping pairs based on the identity of the read anchored by the restriction site. This strategy applied to two maize inbred lines (B14 and B73) generated 183,609 and 129,018 contigs, respectively, out of which at least 76% were >200 bps in length. A subset of putative single nucleotide polymorphisms from contigs aligning to the B73 reference genome with at least one mismatch was resequenced, and 90% of those in B14 were confirmed, indicating that this method is a potent approach for variant detection and marker development in species with complex genomes or lacking extensive reference sequences.
Year: 2012 PMID: 23093955 PMCID: PMC3474217 DOI: 10.1155/2012/360598
Source DB: PubMed Journal: Int J Plant Genomics ISSN: 1687-5389
Figure 1. Preparation of paired-end reduced representation libraries. Genomic DNA is digested with methyl-sensitive restriction endonuclease PstI. After random shearing, DNA fragments containing the PstI end are selected via biotin selection and end-sequenced. Resulting sequences are assembled locally to create large contig sequences.
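The subgrouping step that precedes local assembly can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes that read 2 is the read anchored at the PstI site, that pairs are grouped by exact read 2 identity, and that a group is kept only if it holds a minimum number of pairs. The names `group_pairs_by_anchor` and `min_pairs` are hypothetical.

```python
from collections import defaultdict


def group_pairs_by_anchor(pairs, min_pairs=2):
    """Group read 1 mates by the sequence of their anchored read 2.

    pairs: iterable of (read1, read2) sequence tuples.
    min_pairs: hypothetical threshold; groups with fewer pairs are dropped.
    Returns a dict mapping each anchored read 2 sequence to its read 1 mates.
    """
    groups = defaultdict(list)
    for read1, read2 in pairs:
        # Assumption: identical read 2 sequences mark the same restriction site.
        groups[read2].append(read1)
    return {
        anchor: mates
        for anchor, mates in groups.items()
        if len(mates) >= min_pairs
    }
```

Each returned group (the anchored read plus its randomly sheared mates) would then be handed to an assembler of choice to build one local contig.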
Run metrics. The numbers of paired reads 2, paired reads 2 containing the 10 bp “TGCAGGTGCA” signature sequence at their 5′ ends, and paired reads 2 aligning to the public B73RefGen_v2 reference genome sequence are indicated.
| Run metrics | B73 | B14 |
|---|---|---|
| Number of paired reads 2 | 63,964,770 | 94,976,365 |
| Number of paired reads 2 with signature sequence | 61,512,151 | 92,262,878 |
| Alignment against the B73 reference genome sequence (a) | | |
| Align once | 31,121,355 | 57,009,568 |
| Align more than once | 13,306,206 | 22,869,021 |
| Do not align | 17,084,590 | 12,384,289 |

(a) Best match to reference sequence of reads aligning uniquely or multiple times to the reference sequence with no more than 2 mismatches.
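The run metrics can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are copied from the table, while the helper names `has_signature` and `signature_fraction` are hypothetical and the exact-prefix test is an assumption about how the signature filter might be applied.

```python
SIGNATURE = "TGCAGGTGCA"  # 10 bp signature expected at the 5' end of read 2

# Counts copied from the run metrics table.
RUN_METRICS = {
    "B73": {"reads2": 63_964_770, "with_signature": 61_512_151,
            "once": 31_121_355, "multiple": 13_306_206, "none": 17_084_590},
    "B14": {"reads2": 94_976_365, "with_signature": 92_262_878,
            "once": 57_009_568, "multiple": 22_869_021, "none": 12_384_289},
}


def has_signature(read, signature=SIGNATURE):
    """Return True if the read starts with the signature (assumed exact match)."""
    return read.upper().startswith(signature)


def signature_fraction(metrics):
    """Fraction of paired reads 2 that carry the signature sequence."""
    return metrics["with_signature"] / metrics["reads2"]


for line, m in RUN_METRICS.items():
    aligned_total = m["once"] + m["multiple"] + m["none"]
    print(f"{line}: {signature_fraction(m):.1%} of reads 2 carry the signature; "
          f"alignment categories sum to {aligned_total:,}")
```

For both lines, the three alignment categories add up to the number of reads 2 with the signature sequence, which suggests that only signature-bearing reads were aligned.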
Figure 2. Read 2 coverage of regions by percentages of reads 2 sequences. The percentage of high quality paired reads 2 uniquely aligned to the B73 reference genome is shown in relation to their presence in regions with variable coverage. Y-axis: percentage of all high quality reads 2 uniquely aligned to B73 reference genome; X-axis: variations in sequencing coverage for regions covered by high quality paired read 2 (e.g., 1–10 = sequencing coverage varying from one read 2 to ten reads 2 for each covered region).
Position overlap between B73 and B14. The numbers of distinct B73 and B14 reads overlapping at the same genomic position (as determined by the B73_RefGen v2.0 reference genome) are shown, including redundant and nonredundant positions.
| Positions | B73 | B14 |
|---|---|---|
| Redundant positions | | |
| Not overlapping | 657,961 | 421,056 |
| Overlapping (single value reported for both lines) | 2,367,323 | |
| Nonredundant positions | | |
| Not overlapping | 731,837 | 407,130 |
| Overlapping (single value reported for both lines) | 1,616,620 | |
Figure 3. Contig length distribution. The number of contigs generated de novo is shown in relation to their length in bps. Y-axis: number of contigs generated by assembling paired read 1 and read 2 data extracted from regions with at least 100 stacked read 2 sequences uniquely aligned to the B73 reference genome; X-axis: contig length distribution (in bps) (e.g., 1–100 = contigs <100 bps in length).
Contig alignment to the B73 reference genome. The number of contigs uniquely aligned to the B73 reference genome assembly and exhibiting 0, 1, or 2 mismatches in relation to the reference are shown.
| Number of contigs | B73 | B14 |
|---|---|---|
| 0 mismatch | 22,436 | 42,279 |
| 1 mismatch | 4,142 | 11,875 |
| 2 mismatches | 4,185 | 8,370 |
| Total | 30,763 | 62,524 |
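As a quick consistency check on the contig alignment table, the share of uniquely aligned contigs carrying one or two mismatches can be derived from the counts. The sketch below is a worked calculation only; the counts come from the table, and the helper name `mismatch_summary` is hypothetical.

```python
# Counts of uniquely aligned contigs by number of mismatches, from the table.
CONTIGS = {
    "B73": {0: 22_436, 1: 4_142, 2: 4_185},
    "B14": {0: 42_279, 1: 11_875, 2: 8_370},
}


def mismatch_summary(counts):
    """Return (total contigs, fraction with at least one mismatch)."""
    total = sum(counts.values())
    with_mismatch = total - counts[0]
    return total, with_mismatch / total


for line, counts in CONTIGS.items():
    total, frac = mismatch_summary(counts)
    print(f"{line}: {total:,} contigs, {frac:.1%} with 1 or 2 mismatches")
```

The per-line totals reproduce the "Total" row, and the mismatch-bearing contigs are the pool from which putative single nucleotide polymorphisms would be drawn.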