| Literature DB >> 27562535 |
Emily Humble1,2, Michael A S Thorne3, Jaume Forcada3, Joseph I Hoffman4.
Abstract
BACKGROUND: Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not reliably indicate how many will successfully validate with a given genotyping technology. To predict validation success, it may be necessary to account for factors such as the method used for SNP discovery, the type of sequence data from which the SNPs originate, the suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays.
Keywords: Antarctic fur seal; Arctocephalus gazella; Illumina HiSeq sequencing; Marine mammal; Roche 454 sequencing; Single nucleotide polymorphism; Transcriptome; Validation success
Year: 2016 PMID: 27562535 PMCID: PMC5000416 DOI: 10.1186/s13104-016-2209-x
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 Circular plot showing the hybrid transcriptome assembly. The inner track shows the breakdown of the transcriptome into 454 (purple) and Illumina (blue) components. The middle and outer tracks show the depth of coverage of the 454 and Illumina reads, plotted on a log scale. Transcripts are sorted by average Illumina coverage. Because we required at least tenfold Illumina coverage of a given nucleotide to call a SNP, the Illumina coverage of transcripts with less than tenfold average coverage has been truncated to zero
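The tenfold depth requirement in the Fig. 1 caption amounts to a simple per-nucleotide filter, plus a truncation rule for plotting. The Python sketch below is illustrative only: it assumes per-position Illumina read depths are already available as a list of integers, and the helper names `callable_positions` and `plotted_coverage` are hypothetical, not taken from the study's pipeline.

```python
from typing import List

MIN_DEPTH = 10  # minimum Illumina read depth required to call a SNP (Fig. 1 caption)


def callable_positions(depths: List[int], min_depth: int = MIN_DEPTH) -> List[int]:
    """Return the 0-based positions whose Illumina read depth meets the threshold."""
    return [pos for pos, depth in enumerate(depths) if depth >= min_depth]


def plotted_coverage(mean_depth: float, min_depth: int = MIN_DEPTH) -> float:
    """Truncate a transcript's average coverage to zero when it is below the
    threshold, as described for the Illumina track in Fig. 1."""
    return mean_depth if mean_depth >= min_depth else 0.0


# Toy example (hypothetical depths, not study data).
print(callable_positions([3, 10, 25, 9]))  # [1, 2]
print(plotted_coverage(7.5))  # 0.0
```

A position covered by exactly ten reads passes, matching "at least tenfold".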
Fig. 2 Venn diagram showing the extent of overlap among SNPs called using four different methods (see ‘Methods’ section for details)
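Counting how many of the four methods called each SNP underlies both the Venn diagram and the later breakdown by number of sharing methods. A minimal Python sketch, assuming each SNP has already been reduced to a common hashable key, here a hypothetical `(transcript_id, position)` tuple that is comparable across the 454 and hybrid assemblies. Matching SNPs across different assemblies is the hard part in practice and is not shown.

```python
from collections import Counter
from typing import Dict, Hashable, Set


def sharing_counts(calls: Dict[str, Set[Hashable]]) -> Dict[int, int]:
    """Map 'number of methods that called a SNP' to 'number of distinct SNPs'."""
    methods_per_snp: Counter = Counter()
    for snps in calls.values():
        methods_per_snp.update(snps)  # a set adds at most 1 per SNP per method
    return dict(sorted(Counter(methods_per_snp.values()).items()))


# Hypothetical SNP keys (transcript_id, position); toy data, not study results.
calls = {
    "BWA": {("t1", 10), ("t1", 55), ("t2", 7)},
    "BOWTIE2": {("t1", 10), ("t2", 7)},
    "NEWBLER": {("t1", 10), ("t3", 3)},
    "SWAP454": {("t1", 10)},
}
print(sharing_counts(calls))  # {1: 2, 2: 1, 4: 1}
```

Using sets per method guarantees that a SNP reported twice by the same method is still counted once for that method.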
Fig. 3 Variation in SNP minor allele frequency (MAF) and depth of sequence coverage. The upper panels correspond to the 4679 SNPs that were called from both the 454 and Illumina datasets, with panel a showing the 454 parameter space and panel b the corresponding Illumina parameter space. The lower panels correspond to all SNPs called from the 454 and Illumina data (20,426 and 18,971, respectively), with panel c showing the 454 parameter space and panel d the corresponding Illumina parameter space
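Each point in Fig. 3 places a SNP in a two-dimensional space of read depth and MAF. The sketch below assumes a biallelic SNP whose MAF is estimated from allele read counts pooled across the sequenced individuals. This is a common sequence-based estimate, not necessarily the exact estimator used in the study, and `depth_and_maf` is a hypothetical helper.

```python
from typing import Tuple


def depth_and_maf(ref_count: int, alt_count: int) -> Tuple[int, float]:
    """Return (read depth, minor allele frequency) for a biallelic SNP,
    estimating MAF as the share of reads carrying the rarer allele."""
    depth = ref_count + alt_count
    if depth == 0:
        raise ValueError("no reads cover this position")
    return depth, min(ref_count, alt_count) / depth


print(depth_and_maf(30, 10))  # (40, 0.25)
```

Taking the minimum of the two counts means the result is the same whichever allele is labelled as the reference.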
Fig. 4 Flow diagram showing the number of SNPs remaining after each step of the SNP detection pipeline for both an Illumina Infinium iSelect HD array (blue circles) and an Affymetrix Axiom array (purple circles)
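A flow diagram like Fig. 4 records how many SNPs survive each successive filter. The sketch below applies an ordered list of filters and logs the count after each one. Running it once per array, with that array's filters, would yield two parallel series. The field names (`flank_bp`, `depth`) and thresholds in the example are hypothetical placeholders, not the study's steps or either vendor's actual design requirements.

```python
from typing import Callable, Dict, List, Sequence, Tuple

SNP = Dict[str, float]
Step = Tuple[str, Callable[[SNP], bool]]


def run_pipeline(snps: Sequence[SNP], steps: Sequence[Step]) -> List[Tuple[str, int]]:
    """Apply filters in order and record how many SNPs survive each step."""
    remaining = list(snps)
    counts = [("input", len(remaining))]
    for name, keep in steps:
        remaining = [snp for snp in remaining if keep(snp)]
        counts.append((name, len(remaining)))
    return counts


# Hypothetical SNP records and thresholds, for illustration only.
snps = [
    {"flank_bp": 60, "depth": 12},
    {"flank_bp": 20, "depth": 30},
    {"flank_bp": 80, "depth": 5},
]
steps = [
    ("flank_bp >= 50", lambda s: s["flank_bp"] >= 50),
    ("depth >= 10", lambda s: s["depth"] >= 10),
]
print(run_pipeline(snps, steps))
# [('input', 3), ('flank_bp >= 50', 2), ('depth >= 10', 1)]
```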
Proportion of SNPs from each discovery method predicted to successfully validate on an Illumina Infinium array and on an Affymetrix Axiom array, using predictive modeling and simple filtering approaches
| Discovery method | Infinium, predictive (%) | Infinium, filtering (%) | Axiom, predictive (%) | Axiom, filtering (%) |
|---|---|---|---|---|
| BOWTIE2 | 75.7 | 45.6 | 83.3 | 61.7 |
| BWA | 72.1 | 39.7 | 78.5 | 54.6 |
| NEWBLER | 46.8 | 27.3 | 48.9 | 35.5 |
| SWAP454 | 57.0 | 34.6 | 61.8 | 45.9 |
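Each cell in the table above is a percentage of the SNPs called by one method. Because a SNP called by several methods enters each of those methods' denominators, the rows are not mutually exclusive. A hedged sketch, assuming per-SNP predicted outcomes (True = predicted to validate) are already available from whichever predictive model or filter is in use. The method names and flags below are made-up toy values, not study data.

```python
from typing import Dict, List


def percent_by_method(flags: Dict[str, List[bool]]) -> Dict[str, float]:
    """Percentage of each method's SNPs predicted to validate, to one decimal."""
    result = {}
    for method, passes in flags.items():
        if not passes:
            raise ValueError(f"{method} has no SNPs")
        result[method] = round(100.0 * sum(passes) / len(passes), 1)
    return result


# Toy flags (True = predicted to validate); a SNP called by two methods
# would appear in both methods' lists.
flags = {"method_A": [True, True, False, True], "method_B": [False, True]}
print(percent_by_method(flags))  # {'method_A': 75.0, 'method_B': 50.0}
```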
Proportion of SNPs called by one, two, three, or all four calling methods predicted to successfully validate on an Illumina Infinium array and on an Affymetrix Axiom array, using predictive modeling and simple filtering approaches
| Methods sharing the SNP | Infinium, predictive (%) | Infinium, filtering (%) | Axiom, predictive (%) | Axiom, filtering (%) |
|---|---|---|---|---|
| One | 66.8 | 30.7 | 91.7 | 57.0 |
| Two | 92.9 | 57.2 | 96.7 | 72.6 |
| Three | 89.3 | 52.5 | 93.2 | 68.0 |
| Four | 89.9 | 54.0 | 93.3 | 68.2 |
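Unlike the per-method table, stratifying by the number of methods that called a SNP places every SNP in exactly one group. The sketch below combines the sharing count with a per-SNP predicted outcome. As before, it assumes a hypothetical common `(transcript_id, position)` key and precomputed True/False predictions; the data are toy values.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


def percent_by_share_level(
    calls: Dict[str, Set[Hashable]], passes: Dict[Hashable, bool]
) -> Dict[int, float]:
    """Group SNPs by how many methods called them and report the percentage
    predicted to validate within each group."""
    n_methods: Dict[Hashable, int] = defaultdict(int)
    for snps in calls.values():
        for snp in snps:
            n_methods[snp] += 1
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for snp, k in n_methods.items():
        totals[k] += 1
        hits[k] += int(passes[snp])
    return {k: round(100.0 * hits[k] / totals[k], 1) for k in sorted(totals)}


# Toy data: hypothetical keys and predictions, not study results.
calls = {
    "BWA": {("t1", 10), ("t1", 55), ("t2", 7)},
    "BOWTIE2": {("t1", 10), ("t2", 7)},
    "NEWBLER": {("t1", 10), ("t3", 3)},
    "SWAP454": {("t1", 10)},
}
passes = {("t1", 10): True, ("t1", 55): False, ("t2", 7): True, ("t3", 3): True}
print(percent_by_share_level(calls, passes))  # {1: 50.0, 2: 100.0, 4: 100.0}
```

A share level that no SNP reaches (three, in this toy example) is simply absent from the result rather than reported as zero.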