Roshan Kulkarni, Ratan Chopra, Jennifer Chagoya, Charles E Simpson, Michael R Baring, Andrew Hillhouse, Naveen Puppala, Kelly Chamberlin, Mark D Burow.
Abstract
The use of molecular markers in plant breeding has become routine practice, but the cost per accession can hinder the routine use of Quantitative Trait Loci (QTL) identification in breeding programs. In this study, we demonstrate targeted resequencing as a proof of concept for a cost-effective approach to retrieving highly informative allele information, and we develop a bioinformatics strategy to capture the genome-specific information of a polyploid species. SNPs were identified by aligning raw transcriptome reads (2 × 50 bp) to a synthetic tetraploid genome using BWA, followed by a GATK pipeline. Regions containing highly polymorphic SNPs in both the A and B genomes were selected as targets for the resequencing study. Targets were amplified using multiplex PCR followed by sequencing on an Illumina HiSeq. Eighty-one percent of the SNP calls in diploids and 68% of the SNP calls in tetraploids were confirmed. These results were also confirmed by KASP validation. Based on this study, we find that targeted resequencing technologies have potential for obtaining maximum allele information in allopolyploids at reduced cost.
Keywords: allopolyploid; heterozygous SNP calls; targeted resequencing; tetraploids
Year: 2020; PMID: 33080972; PMCID: PMC7650781; DOI: 10.3390/genes11101220
Source DB: PubMed; Journal: Genes (Basel); ISSN: 2073-4425; Impact factor: 4.096
Summary of target selection from different approaches for the Fluidigm experiment (targets selected from GATK, filtered through SWEEP, OLin as reference and SNP chip as control).
| Approach | No. of Targets |
|---|---|
| GATK (Genome reference) | 18 |
| SWEEP (Genome reference) | 10 |
| OLin (OLin reference) | 14 |
| SNP chip (control) | 6 |
Figure 1. GATK pipeline for SNP identification and target selection. For the OLin method, the SNP identification pipeline is the same except that reads are aligned to the OLin transcriptome sequence instead of the synthetic tetraploid genome reference.
Summary of SNPs identified from the SNP identification pipeline for target design.
| Criterion | SNP Comparison | Number of Genome Sequence Scaffolds | Number of SNPs | SWEEP (No. of Genome Sequence Scaffolds) | SWEEP (No. of SNPs) | SNPs in Common to GATK and SWEEP |
|---|---|---|---|---|---|---|
| Raw SNPs | NA | NA | 1.6 million | NA | NA | NA |
| Filter for MAF 1 | A Genome SNPs among tetraploids | 1157 | 222,387 | 881 | 55,234 | NA |
| Filter for MAF | B Genome SNPs among tetraploids | 247 | 64,090 | 231 | 16,343 | NA |
| Filter for RD 2 (100), PIC 3 (0.3) | A Genome SNPs among tetraploids | 617 | 9884 | 452 | 6292 | 4618 |
| Filter for RD 2 (100), PIC 3 (0.3) | B Genome SNPs among tetraploids | 184 | 5684 | 152 | 2548 | 2128 |
| SNPs from common contigs (+/− 100 bp) | SNPs common to A genome of tetraploids and A genome diploids | 418 | 10,533 | 309 | 7153 | 3004 |
| SNPs from common contigs (+/− 100 bp) | SNPs common to tetraploid A and B genomes and also to the parents of the A genome diploid population | 129 | 502 | 60 | 192 | 179 |
| Best BLAST Hit * (Alignment length > 350 bps) | SNPs common to both A and B tetraploid genomes | NA | 517 | NA | 212 | NA |
1 MAF—Minor Allele Frequency filter > 10%; 2 RD—Read Depth; 3 PIC—Polymorphic Information Content. ‘*’—BLAST was performed to identify homoeologous SNP containing scaffolds in order to target SNPs common to both A and B genome tetraploids. ‘NA’—Represents either not applicable or it was not calculated for that particular comparison.
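The filtering thresholds in the footnote above (MAF > 10%, read depth of 100, PIC of 0.3) can be illustrated with a minimal sketch. It assumes PIC is computed with the commonly used Botstein et al. (1980) formula, PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j²; the record does not state which formula or software the authors used. The function names, the `allele_counts` input, and the strict `>` comparisons for read depth and PIC are hypothetical choices made for illustration.

```python
from itertools import combinations


def pic(freqs):
    """PIC (Botstein et al. 1980 form) for allele frequencies that sum to 1."""
    sum_sq = sum(p * p for p in freqs)
    cross = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1.0 - sum_sq - cross


def passes_filters(allele_counts, read_depth,
                   min_maf=0.10, min_depth=100, min_pic=0.3):
    """Return True if a SNP passes the MAF, read depth, and PIC thresholds.

    allele_counts: hypothetical per-allele counts across accessions.
    Strict '>' for depth and PIC is an assumption, not taken from the source.
    """
    total = sum(allele_counts)
    if total == 0 or len(allele_counts) < 2:
        return False
    freqs = sorted((c / total for c in allele_counts), reverse=True)
    maf = freqs[1]  # frequency of the second most common allele
    return maf > min_maf and read_depth > min_depth and pic(freqs) > min_pic
```

Under this formula a biallelic SNP reaches its maximum PIC of 0.375 when both alleles are equally frequent, so a 0.3 threshold retains only SNPs with fairly balanced allele frequencies.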
Summary of SNPs identified from the OLin tetraploid reference method.
| Read Depth | Polymorphic Information Content (PIC) | Number of SNPs |
|---|---|---|
| >100 | >0.5 | 15 |
| >100 | 0.4–0.49 | 592 |
| >100 | 0.3–0.39 | 258 |
Figure 2. Distribution of reads by peanut accession for the MiSeq sequencing run (2 × 250 bp). Total reads: 30,153,796; PF reads (reads that passed the Illumina quality filter): 24,430,536. Each index corresponds to a different plant accession, given in Supplemental Table S5. Red bars indicate accessions that failed to amplify satisfactorily.
Summary of percentage validation of SNPs from different methods.
| Genome | Criterion | GATK | SWEEP | SNP Chip | OLin | Sum | GATK + SWEEP + SNP Chip |
|---|---|---|---|---|---|---|---|
| A genome diploid | Total no. of targets | 19 | . | 6 | . | 25 | 25 |
| A genome diploid | No. of missing targets | 4 | . | 0 | . | 4 | 4 |
| A genome diploid | No. of targets assayed | 15 | . | 6 | . | 21 | 21 |
| A genome diploid | No. of targets validated | 12 | . | 5 | . | 17 | 17 |
| A genome diploid | % Validation (of assayed) | 80.0 | . | 83.3 | . | 80.9 | 80.9 |
| A genome diploid | % Validation (of targets) | 63.1 | . | 83.3 | . | 68.0 | 68.0 |
| Tetraploids | Total no. of targets | 19 | 9 | 6 | 14 | 48 | 34 |
| Tetraploids | No. of missing targets | 5 | 1 | 0 | 6 | 12 | 6 |
| Tetraploids | No. of targets assayed | 14 | 8 | 6 | 8 | 36 | 28 |
| Tetraploids | No. of targets validated | 10 | 5 | 4 | 4 | 23 | 19 |
| Tetraploids | % Validation (of assayed) | 71.4 | 62.5 | 66.6 | 50 | 50 | 68 |
| Tetraploids | % Validation (of targets) | 52.6 | 55.5 | 66.7 | 28.5 | 47.9 | 55.8 |
‘.’—Not targeted.
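The two validation percentages in the table above follow from the counts: targets assayed equals total targets minus missing targets, and the validated count is divided either by the assayed count or by the total. A minimal sketch of that arithmetic, using the per-method tetraploid counts from the table, is shown here; the function and variable names are hypothetical, and the table's percentages appear to be truncated rather than rounded, so the last digit can differ slightly.

```python
def validation_rates(total, missing, validated):
    """Compute validation percentages from target counts."""
    assayed = total - missing  # targets that amplified and could be scored
    return {
        "assayed": assayed,
        "pct_of_assayed": 100.0 * validated / assayed,
        "pct_of_targets": 100.0 * validated / total,
    }


# (total targets, missing targets, validated targets) for tetraploids
tetraploid_counts = {
    "GATK": (19, 5, 10),
    "SWEEP": (9, 1, 5),
    "SNP chip": (6, 0, 4),
    "OLin": (14, 6, 4),
}

rates = {method: validation_rates(*counts)
         for method, counts in tetraploid_counts.items()}

for method, r in rates.items():
    print(f"{method}: {r['pct_of_assayed']:.1f}% of assayed, "
          f"{r['pct_of_targets']:.1f}% of targets")
```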
Summary of SNPs obtained from MiSeq reads mapped to the 48 targets.
| Genome | No. of SNPs (48 target regions) |
|---|---|
| Tetraploid | 111 |
| Tetraploid (PIC > 0.3) | 81 |
| A genome diploid | 246 |
| B genome diploid | 313 |
KASP validation.
| Primer No. | Source Details | Validation Percentage |
|---|---|---|
| P01 | MiSeq sequence derived from GATK compared against 48 target references | 73 |
| P03 | MiSeq sequence derived from the SNP chip | monomorphic |
| P04 | MiSeq sequence derived from SWEEP | 100 |
| P05 | MiSeq sequence derived from SWEEP | 100 |
| P06 | MiSeq sequence derived from GATK compared against 48 target references | 73 |
| P07 | MiSeq sequence derived from GATK aligned against the whole genome reference | 100 |
| P08 | MiSeq sequence derived from GATK aligned against the whole genome reference | 92 |
| P10 | MiSeq sequence derived from GATK aligned against the whole genome reference | 70 |
Figure 3. Example of validation of SNP calls derived from MiSeq using KASP markers.
Figure 4. Target sequence with multiple SNPs and indels. (a) An example showing homoeologous (top) and homologous (bottom) SNPs in tetraploids. Yellow rectangles at left denote tetraploids. Red, blue, and green rectangles (top) denote A, B, and K genome diploids, and red and blue ovals denote A and B genomes in tetraploids. Allele calls are shown above the figure. At bottom are homologous SNPs differentiating tetraploids; SNPs are denoted above the figure as alleles 1 or 2 (orange and purple ovals at left), and 3 or 4 (pink or cyan ovals at right). The order of sequences is the same in the top and bottom figures. (b) An example of a target sequence in the IGV viewer showing multiple SNPs. Two SNPs appear in all accessions (blue and orange) and may be referred to as anchor SNPs. The target SNP is highlighted by a red circle.
Cost analysis for amplicon sequencing for different chemistries for 48 plate reactions.
| Item | Fluidigm 2012 | Fluidigm 2020 | Generic |
|---|---|---|---|
| 48.48 IFC | 238 | 477 | 0 |
| PCR plates | 10 | 10 | 10 |
| Integrated Reagents Pkg | 0 | 481 | 0 |
| Loading and other solutions | 15 | 0 | 0 |
| Primers (48) | 730 | 2400 | 384 |
| Barcodes | 21 | 0 | 58 |
| Taq | 4 | 0 | 6 |
| Beads | 5 | 0 | 36 |
| Sequencing | 357 | 357 | 357 |
| Total, MiSeq 48 samples | $1380 | $3725 | $850 |
| Per sample, MiSeq | $28.75 | $77.60 | $17.72 |
Cost analysis for amplicon sequencing for different chemistries for 384 plate reactions.
| Item | Fluidigm 2012 | Fluidigm 2020 | Generic |
|---|---|---|---|
| 48.48 IFC | 3808 | 3816 | 0 |
| PCR plates | 80 | 80 | 100 |
| Integrated Reagents Pkg | 0 | 3848 | 0 |
| Loading and other solutions | 15 | 0 | 0 |
| Primers (960) | 2359 | 14,746 | 2359 |
| Barcodes | 341 | 0 | 418 |
| Taq | 35 | 0 | 971 |
| Beads | 73 | 0 | 179 |
| Sequencing | 2365 | 1381 | 1381 |
| Total, HiSeq/NovaSeq 384 samples | $9076 | $23,871 | $5408 |
| Per sample, HiSeq/NovaSeq | $23.64 | $62.16 | $14.08 |
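The totals and per-sample costs in the 384-sample table can be reproduced by summing the line items for each chemistry and dividing by the number of samples. The sketch below does this with the values from the table; the dictionary layout and function name are illustrative only.

```python
# Line items in table order: IFC, PCR plates, integrated reagents,
# loading solutions, primers, barcodes, Taq, beads, sequencing.
costs_384 = {
    "Fluidigm 2012": [3808, 80, 0, 15, 2359, 341, 35, 73, 2365],
    "Fluidigm 2020": [3816, 80, 3848, 0, 14746, 0, 0, 0, 1381],
    "Generic": [0, 100, 0, 0, 2359, 418, 971, 179, 1381],
}
N_SAMPLES = 384


def cost_summary(items, n_samples):
    """Return (total cost, cost per sample rounded to cents)."""
    total = sum(items)
    return total, round(total / n_samples, 2)


summary = {name: cost_summary(items, N_SAMPLES)
           for name, items in costs_384.items()}

for name, (total, per_sample) in summary.items():
    print(f"{name}: total ${total:,}, per sample ${per_sample:.2f}")
```

On these figures, primers account for most of the Fluidigm 2020 total (14,746 of 23,871), which is the main reason the generic approach is cheaper per sample.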