Laura-Jayne Gardiner (1,2), Thomas Brabbs (1), Alina Akhunov (3), Katherine Jordan (3), Hikmet Budak (4), Todd Richmond (5), Sukhwinder Singh (6), Leah Catchpole (1), Eduard Akhunov (3), Anthony Hall (1,7).
Abstract
BACKGROUND: Whole-genome shotgun resequencing of wheat is expensive because of its large, repetitive genome. Moreover, sequence data can fail to map uniquely to the reference genome, making it difficult to assign variation unambiguously. Resequencing using target capture enables large numbers of individuals to be sequenced at high coverage, so that variants associated with important agronomic traits can be identified reliably. Previous studies have used complementary DNA/exon or gene-based probe sets in which promoter and intron sequences are largely missing, as are newly characterized genes from the recently improved reference sequences.
Keywords: gene capture; plant genomes; promoter capture; wheat
Year: 2019 PMID: 30715311 PMCID: PMC6461119 DOI: 10.1093/gigascience/giz018
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Design of the wheat gene and putative promoter capture probe sets. The TGAC Chinese Spring, IWGSC Chinese Spring, emmer, and A. tauschii reference sets of gene and putative promoter sequences were processed to generate a final design space for the wheat gold-standard putative promoter and gene capture probe sets, i.e., a non-redundant (NR), high-complexity sequence set (Methods).
Probe set designs and predicted performance metrics
| Probe set | Design space, Ns removed (bp) | Probe space (bp) | Design space covered by probes (%) | Estimated design space coverage if each 75-bp probe captures 200 bp, bp (%) |
|---|---|---|---|---|
| Putative promoter | 277,010,676 | 154,920,447 | 55.9 | 249,749,794 (90.2) |
| Putative promoter-2 | 282,328,008 | 160,237,779 | 56.8 | 247,535,534 (87.7) |
| Gene | 508,560,490 | 16,796,494 | 31.8 | 465,988,638 (91.6) |
Size of the putative promoter and gene capture design spaces and probe spaces, with the estimated percentage of each design space covered after sequencing, assuming that each probe captures DNA sequencing library fragments of 200 bp.
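The estimate in the last column of the table above can be approximated with interval arithmetic. The following is a minimal Python sketch, not the authors' procedure: it assumes each probe is a half-open interval on a single design-space sequence and that the ~200 bp captured by a probe is centered on it (extended symmetrically and clipped at sequence ends). The function names `captured_windows`, `merged_length`, and `estimated_coverage` are hypothetical.

```python
from typing import List, Tuple

# Half-open [start, end) interval in bp on a single design-space sequence.
Interval = Tuple[int, int]


def captured_windows(probes: List[Interval], target_len: int,
                     fragment_len: int = 200) -> List[Interval]:
    """Widen each probe to a fragment_len window centered on it, clipped to the sequence."""
    windows = []
    for start, end in probes:
        pad = max(0, fragment_len - (end - start))
        left = pad // 2
        right = pad - left
        windows.append((max(0, start - left), min(target_len, end + right)))
    return windows


def merged_length(intervals: List[Interval]) -> int:
    """Total bp in the union of half-open intervals (overlaps counted once)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current run and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def estimated_coverage(probes: List[Interval], target_len: int,
                       fragment_len: int = 200) -> Tuple[int, float]:
    """Return (bp covered, % of sequence covered) under the centered-window assumption."""
    covered = merged_length(captured_windows(probes, target_len, fragment_len))
    return covered, 100.0 * covered / target_len


if __name__ == "__main__":
    # Toy 1,000-bp sequence with two non-overlapping 75-bp probes.
    probes = [(100, 175), (400, 475)]
    print(merged_length(probes), estimated_coverage(probes, 1000))  # prints: 150 (400, 40.0)
```

For a full design space, the same calculation would be run per contig and the covered bp summed before dividing by the total design-space size. Merging the widened windows matters because neighboring probes capture overlapping fragments, which should not be counted twice.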
Coverage statistics for Chinese Spring
| Reference | Reference size (bp) | % Reads aligned uniquely after duplicate removal | % Duplicate reads | No. of reference contigs | No. of reference contigs mapped | Reference contigs mapped (%) | Mean depth of coverage per reference contig | bp mapped at ≥1× (% reference covered) | bp mapped at ≥5× (% reference covered) | bp mapped at ≥10× (% reference covered) |
|---|---|---|---|---|---|---|---|---|---|---|
| Gene capture: Chinese Spring, 426,725,926 reads |  |  |  |  |  |  |  |  |  |  |
| Probe design space | 426,246,621 | 75.2 | 4.52 | 254,950 | 220,837 | 86.6 | 99.15 | 403,219,923 (94.6%) | 395,727,063 (92.8%) | 377,795,215 (88.6%) |
| TGAC gene targets | 440,066,424 | 71.9 | 5.37 | 114,247 | 112,275 | 98.3 | 73.83 | 426,719,705 (97.0%) | 419,176,417 (95.2%) | 400,456,907 (91.0%) |
| TGAC whole genome | 13,427,354,022 | 89.9 | 3.34 | 735,943 | 733,488 | 99.7 | 5.95 | 10,258,685,302 | 2,361,028,858 | 996,680,117 |
| Subset (a) | 440,066,424 |  |  |  |  |  |  | (97.4%) | (93.8%) | (87.6%) |
| Subset (b) | 808,769,138 |  |  |  |  |  |  | (90.3%) | (69.4%) | (58.1%) |
| Subset (c) | 711,198,745 |  |  |  |  |  |  | (93.1%) | (78.2%) | (68.4%) |
| Subset (d) | 1,345,755,884 |  |  |  |  |  |  | (87.0%) | (58.6%) | (45.7%) |
| Subset (e) | 219,982,922 |  |  |  |  |  |  | (83.0%) | (42.7%) | (26.1%) |
| Putative promoter capture: Chinese Spring, 232,437,854 reads |  |  |  |  |  |  |  |  |  |  |
| Probe design space | 232,172,120 | 71.9 | 4.32 | 249,698 | 210,176 | 84.2 | 91.39 | 215,230,363 (92.7%) | 208,424,502 (89.8%) | 194,378,612 (83.7%) |
| TGAC promoter targets | 219,982,922 | 68.1 | 4.64 | 112,999 | 112,600 | 99.6 | 97.51 | 213,924,917 (97.2%) | 207,575,996 (94.4%) | 194,868,834 (88.6%) |
| TGAC whole genome | 13,427,354,022 | 90.4 | 3.02 | 735,943 | 720,291 | 97.9 | 3.83 | 7,746,539,630 | 1,221,249,873 | 620,781,923 |
| Subset (e) | 219,982,922 |  |  |  |  |  |  | (95.4%) | (87.2%) | (78.2%) |
| Subset (f) | 625,932,059 |  |  |  |  |  |  | (76.3%) | (48.6%) | (37.7%) |
| Subset (a) | 440,066,424 |  |  |  |  |  |  | (57.8%) | (23.5%) | (14.0%) |
| Subset (g) | 401,070,091 |  |  |  |  |  |  | (85.7%) | (64.4%) | (53.5%) |
| Subset (h) | 1,093,175,155 |  |  |  |  |  |  | (72.9%) | (40.1%) | (28.9%) |
| Subset (i) | 327,780,890 |  |  |  |  |  |  | (86.3%) | (68.1%) | (57.6%) |
| Subset (j) | 592,416,169 |  |  |  |  |  |  | (80.0%) | (53.2%) | (41.6%) |
Sequencing reads from the gene and putative promoter captures were aligned separately to their respective design spaces, to their respective targets, and to the full TGAC wheat genome assembly. For alignments to the probe design space, percentages exclude non–Chinese Spring sequence. For alignments to the gene and putative promoter targets, percentages are calculated using NR sequence. For alignments to the full wheat genome, metrics are shown for coverage of the following subsets: (a) high-confidence genes; (b) high-confidence genes with 2000 bp upstream and downstream; (c) high- and low-confidence genes; (d) high- and low-confidence genes with 2000 bp upstream and downstream; (e) high-confidence putative promoter sequences; (f) high-confidence putative promoters with 2000 bp upstream and downstream; (g) high- and low-confidence putative promoters; (h) high- and low-confidence putative promoters with 2000 bp upstream and downstream; (i) high-confidence putative promoters with 1000 bp downstream; and (j) high- and low-confidence putative promoters with 1000 bp downstream.
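The "bp mapped at ≥1×/≥5×/≥10×" columns and the mean depth can be derived from per-base sequencing depths. Below is a minimal sketch, assuming the depths arrive as tab-separated chrom/position/depth lines such as those written by `samtools depth -a` (which also reports zero-depth positions). The helper names are hypothetical, and the mean is taken over all supplied positions, a simplification of the table's per-contig mean.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def depths_from_lines(lines: Iterable[str]) -> List[int]:
    """Parse 'chrom<TAB>pos<TAB>depth' lines into a list of per-base depths."""
    return [int(line.rstrip("\n").split("\t")[2]) for line in lines if line.strip()]


def coverage_breadth(depths: Sequence[int],
                     thresholds: Iterable[int] = (1, 5, 10)) -> Dict[int, Tuple[int, float]]:
    """Map each depth threshold to (bp at >= threshold, % of positions rounded to 0.1)."""
    n = len(depths)
    result = {}
    for t in thresholds:
        bp = sum(1 for d in depths if d >= t)
        result[t] = (bp, round(100.0 * bp / n, 1))
    return result


def mean_depth(depths: Sequence[int]) -> float:
    """Mean per-base depth over all supplied positions, including zero-depth ones."""
    return sum(depths) / len(depths)


if __name__ == "__main__":
    # Hypothetical depth lines for four positions on one reference contig.
    lines = ["chr7A\t1\t0\n", "chr7A\t2\t6\n", "chr7A\t3\t12\n", "chr7A\t4\t3\n"]
    depths = depths_from_lines(lines)
    print(coverage_breadth(depths), mean_depth(depths))
```

Restricting the input lines to one of the footnoted subsets (for example, high-confidence genes) before calling `coverage_breadth` would give the subset percentages reported for the whole-genome alignments.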
Figure 2: Coverage of the MYB transcription factor gene triplet using an island probe design approach. Sequencing depth is shown per base pair across 3 chromosomal intervals corresponding to a trio of homoeologous MYB transcription factor genes (TraesCS7A01G179900 on chr7A at 134491245–134492378 bp, TraesCS7B01G085100 on chr7B at 97192168–97193300 bp, and TraesCS7D01G181400 on chr7D at 135357355–135358494 bp).
Sequencing recommendations for gene and putative promoter capture probe sets
| Capture probe set | Approximate read number required, standard protocol | Approximate read number required, optimized protocol | Expected % coverage of target (≥1×) | Expected % coverage of target (≥5×) | Expected % coverage of target (≥10×) | Mean coverage across target region (a) |
|---|---|---|---|---|---|---|
| Gene | 100,000,000 | 55,000,000 | 94.3 | 69.8 | 35.4 | 9.05 |
| Gene | 200,000,000 | 105,000,000 | 96.4 | 86.9 | 68.3 | 17.13 |
| Gene | 300,000,000 | 160,000,000 | 97.1 | 91.8 | 81.6 | 25.42 |
| Gene | 400,000,000 | 210,000,000 | 97.4 | 93.8 | 87.6 | 34.06 |
| Putative promoter | 50,000,000 | 30,000,000 | 87.3 | 43.4 | 9.9 | 5.27 |
| Putative promoter | 100,000,000 | 55,000,000 | 93.2 | 78.2 | 52.6 | 12.05 |
| Putative promoter | 150,000,000 | 80,000,000 | 94.2 | 83.3 | 66.3 | 15.82 |
| Putative promoter | 200,000,000 | 105,000,000 | 95.4 | 87.2 | 78.2 | 21.59 |
Projected coverage of the gene and putative promoter capture target sequences (high-confidence gene and promoter sequences, respectively) at varying numbers of sequencing reads, together with the predicted read numbers needed to achieve the same coverage with our optimized capture protocol (rounded to the nearest 5 million reads). Read numbers are total paired-end reads; halve them to obtain the number of read clusters.
(a) Target region is defined as all gene or putative promoter sequences across which the probe sets are tiled, i.e., including the padding between probes.
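To estimate expected coverage for a read budget that falls between the tabulated rows, one simple option is linear interpolation between neighboring rows. This is an illustrative assumption, not the projection method used to build the table. The sketch below uses the gene probe set's ≥10× column for the standard protocol (values copied from the table) and halves paired-end read numbers to obtain read clusters, as the caption describes. `interpolate_pct` and `read_clusters` are hypothetical names.

```python
from bisect import bisect_left
from typing import List, Tuple

# (paired-end reads, expected % of target at >=10x) for the gene probe set,
# standard protocol, copied from the sequencing recommendations table.
GENE_10X: List[Tuple[int, float]] = [
    (100_000_000, 35.4),
    (200_000_000, 68.3),
    (300_000_000, 81.6),
    (400_000_000, 87.6),
]


def interpolate_pct(reads: int, table: List[Tuple[int, float]]) -> float:
    """Linearly interpolate between tabulated rows; clamp outside the tabulated range."""
    xs = [x for x, _ in table]
    if reads <= xs[0]:
        return table[0][1]
    if reads >= xs[-1]:
        return table[-1][1]
    i = bisect_left(xs, reads)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (reads - x0) / (x1 - x0)


def read_clusters(paired_end_reads: int) -> int:
    """Paired-end read counts are halved to give the number of read clusters."""
    return paired_end_reads // 2


if __name__ == "__main__":
    budget = 250_000_000  # hypothetical read budget (paired-end reads)
    print(read_clusters(budget), round(interpolate_pct(budget, GENE_10X), 2))
```

Clamping rather than extrapolating is deliberate: the coverage curves flatten with increasing read number, so a straight-line extension beyond 400 million reads would likely overstate the gain.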
Figure 3: Summary statistics for the 8-plex gene and putative promoter capture tests. We aligned reads for the 8 CIMMYT samples to the full Chinese Spring genome. For the (a) gene capture and (b) putative promoter capture probe sets, box-and-whisker plots show, from left to right: the percentage of sequencing reads per sample identified as duplicates, the percentage of reads mapping uniquely to the whole-genome reference sequence, the percentage of reads defined as "on target" (i.e., aligned to the capture probe design space), the mean depth of coverage per sample, and the coefficient of variation per sample.
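The per-sample metrics plotted in Figure 3 can be sketched as follows, assuming that counts of total, duplicate, uniquely mapped, and on-target reads are available for each sample along with a list of its per-base depths. Two choices here are assumptions rather than details given in the figure: every percentage uses total reads as its denominator, and the coefficient of variation is the population standard deviation of per-base depth divided by the mean depth. `SampleSummary` and `summarize` are hypothetical names.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SampleSummary:
    pct_duplicates: float
    pct_unique: float
    pct_on_target: float
    mean_depth: float
    cv: float


def summarize(total_reads: int, duplicate_reads: int, unique_reads: int,
              on_target_reads: int, depths: Sequence[float]) -> SampleSummary:
    """Per-sample metrics; all percentages use total_reads as the denominator (assumption)."""
    mean = statistics.fmean(depths)
    # Coefficient of variation: population standard deviation of depth divided by mean depth.
    cv = statistics.pstdev(depths) / mean if mean else float("nan")
    return SampleSummary(
        pct_duplicates=100.0 * duplicate_reads / total_reads,
        pct_unique=100.0 * unique_reads / total_reads,
        pct_on_target=100.0 * on_target_reads / total_reads,
        mean_depth=mean,
        cv=cv,
    )


if __name__ == "__main__":
    # Hypothetical counts and per-base depths for one sample.
    print(summarize(1_000, 50, 800, 600, [5, 15, 10, 10]))
```

A lower coefficient of variation indicates more uniform capture across the target, which is why it is plotted alongside mean depth: two samples with the same mean depth can differ substantially in how evenly that depth is distributed.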