Ilaria Milano, Massimiliano Babbucci, Frank Panitz, Rob Ogden, Rasmus O Nielsen, Martin I Taylor, Sarah J Helyar, Gary R Carvalho, Montserrat Espiñeira, Miroslava Atanassova, Fausto Tinti, Gregory E Maes, Tomaso Patarnello, Luca Bargelloni
Abstract
The growing accessibility of genomic resources through next-generation sequencing (NGS) technologies has revolutionized the application of molecular genetic tools to ecological and evolutionary studies of non-model organisms. Here we present the case study of the European hake (Merluccius merluccius), one of the most important demersal resources of European fisheries. Two sequencing platforms, the Roche 454 FLX (454) and the Illumina Genome Analyzer (GAII), were used for single nucleotide polymorphism (SNP) discovery in the hake muscle transcriptome. De novo transcriptome assembly into unique contigs, annotation, and in silico SNP detection were carried out in parallel for the 454 and GAII sequence data. High-throughput genotyping with the Illumina GoldenGate assay was performed to validate 1,536 putative SNPs. Validation results were analysed to compare the performance of the 454 and GAII methods and to evaluate the role of several variables (e.g. sequencing depth, intron-exon structure, sequence quality and annotation). Despite well-known differences in sequence length and throughput, the two approaches showed similar assay conversion rates (approximately 43%) and percentages of polymorphic loci (67.5% and 63.3% for GAII and 454, respectively). Both NGS platforms therefore proved suitable for large-scale identification of SNPs in transcribed regions of non-model species, although the lack of a reference genome profoundly affects the genotyping success rate. The overall efficiency, however, can be improved by applying strict quality and filtering criteria for SNP selection (sequence quality, intron-exon structure, target region score).
Year: 2011 PMID: 22132191 PMCID: PMC3222667 DOI: 10.1371/journal.pone.0028008
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Geographic location of the four sampling sites.
In brackets: the number of specimens in the discovery panel (454 and GAII) and the number of individuals subsequently genotyped in the validation step. NTHS: North Sea (59°19′N, 1°39′E); ATIB: Iberian Atlantic coast (43°20′N, 8°56′W); TYRS: Tyrrhenian Sea (42°32′N, 10°9′E); AEGS: Aegean Sea (40°19′N, 24°33′E).
Summary statistics of sequence assembly.
| 454 | GAII | |
|
| 506,772 | 50,685,405 |
|
| 206 (6–457) bp | 74 bp |
|
| 462,489 | 50,685,405 |
|
| 5,702 | 3,756 |
|
| 331 (100–5,103) bp | 190 (100–3,063) bp |
|
| 4,221 (74.03) | 2,644 (70.39) |
Figure 2. Distribution of SNPs across 454 and GAII contigs.
On the x-axis, number of SNPs per contig; on the y-axis, the percentage of contigs showing a specific number of SNPs.
Summary statistics of SNP discovery and selection.
| 454 | GAII | |
|
| 4,034 | 8,606 |
|
| 889 | 2,384 |
|
| 617.9 (101–5103) bp | 212.3 (100–3063) bp |
|
| 89 (4–3,678) | 674 (8–33,079) |
|
| 3,621 | 4,684 |
|
| 0.76 | 0.82 |
|
| 3,437 (94.92%) | 4,637 (99%) |
| | 1,322 (38.46%) | 3,389 (73.09%) |
| | 851 (24.76%) | 468 (10.09%) |
*Intron/exon boundary pipeline result.
Figure 3. Enrichment of SNP-containing contigs in GO terms.
Differential distribution of GO terms in SNP-containing contigs (test set) compared to all contigs (reference set) in 454 data (A) and GAII data (B).
Summary statistics for SNP validation.
| 454 | GAII | |
|
| 966 (829) | 707 (570) |
|
| 944 (817) | 684 (557) |
|
| 409 (334) | 296 (221) |
|
| 259 (195) | 200 (136) |
| | 130 (97) | 73 (45) |
| | 110 (83) | 60 (37) |
| | 20 (14) | 13 (8) |
|
| 150 (139) | 96 (85) |
|
| 535 (483) | 388 (336) |
In brackets: the number of SNPs after excluding the set of common loci.
*Data referring to nuclear SNPs.
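The percentages quoted in the abstract can be approximately reproduced from the counts in the validation table above. The sketch below is only an illustration: the row labels are missing in this copy, so the role assigned to each count (assayed, successfully converted, polymorphic) is an assumption inferred from the arithmetic, and the names `counts` and `rates` are hypothetical.

```python
# Counts copied from the validation table above. The meaning attached to each
# count is an ASSUMPTION inferred from the arithmetic, because the row labels
# are missing in this copy of the table.
counts = {
    "454": {"assayed": 944, "converted": 409, "polymorphic": 259},
    "GAII": {"assayed": 684, "converted": 296, "polymorphic": 200},
}


def rates(c):
    """Return (conversion rate, polymorphic fraction), both in percent."""
    conversion = 100.0 * c["converted"] / c["assayed"]
    polymorphic = 100.0 * c["polymorphic"] / c["converted"]
    return conversion, polymorphic


summary = {platform: rates(c) for platform, c in counts.items()}

for platform, (conversion, polymorphic) in summary.items():
    print(f"{platform}: conversion {conversion:.1f}%, polymorphic {polymorphic:.1f}%")
```

Under this reading, both platforms convert at roughly 43%, and the polymorphic fractions land close to the 63.3% (454) and 67.5% (GAII) reported in the abstract.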
Figure 4. Minor allele frequency distribution.
Box plot of minor sequence allele frequency (MSAF) in the discovery panel and minor allele frequency (MAF) in the validation panel for 454 data (A) and GAII data (B).
Figure 5. Observed heterozygosity distribution.
Box plot of observed heterozygosity (Ho) calculated for the discovery panel and the validation panel of 454 data (A) and GAII data (B).
Predictor variables for 454 SNP data (failed/successful), backward stepwise elimination.
| B | Wald | df | P | |
|
| 1.735 | 14.607 | 1 |
|
|
| −0.361 | 6.545 | 1 |
|
|
| 1.415 | 5.418 | 1 |
|
|
| 0.075 | 2.919 | 1 | 0.088 |
|
| −0.007 | 5.205 | 1 |
|
|
| −2.339 | 13.067 | 1 | 0.000 |
B: regression coefficient for individual variable; Wald: Wald χ2 statistic; P: associated probability.
(e.g. average SNP_score for successful SNPs is 0.794, whereas mean SNP_score is 0.759 for “failed” assays).
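The columns of the logistic regression tables are linked by standard formulas: with df = 1, the probability associated with a Wald statistic is the upper tail of a chi-square distribution with one degree of freedom, and exp(B) gives the odds ratio for a one-unit change in the predictor. The sketch below checks this against the two rows of the 454 failed/successful table whose P value survives in this copy; the function names are hypothetical, and the source does not state which software produced the tables.

```python
import math


def wald_p_value(wald):
    """Upper-tail probability of a chi-square statistic with 1 df.

    For 1 df, P(X > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    """
    return math.erfc(math.sqrt(wald / 2.0))


def odds_ratio(b):
    """Odds ratio implied by a logistic regression coefficient B."""
    return math.exp(b)


# (B, Wald, reported P) for the rows of the 454 failed/successful table
# that still show a P value in this copy.
rows = [
    (0.075, 2.919, 0.088),
    (-2.339, 13.067, 0.000),
]

for b, wald, reported_p in rows:
    p = wald_p_value(wald)
    print(f"B={b:+.3f}  odds ratio={odds_ratio(b):.3f}  "
          f"P from Wald={p:.4f}  reported P={reported_p:.3f}")
```

The same relation holds for the other regression tables, since every row reports df = 1 except the single row with df = 2, which would need the two-degree-of-freedom tail instead.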
Predictor variables for GAII SNP data (failed/successful), backward stepwise elimination.
| B | Wald | df | P | |
|
| 0.938 | 2.506 | 1 | 0.113 |
|
| 1.203 | 8.795 | 1 |
|
|
| −1.206 | 7.427 | 1 |
|
|
| −2.196 | 6.785 | 1 | 0.009 |
B: regression coefficient for individual variable; Wald: Wald χ2 statistic; P: associated probability.
Predictor variables for 454 SNP data (monomorphic/polymorphic), backward stepwise elimination.
| B | Wald | df | P | |
|
| −0.127 | 6.369 | 1 |
|
|
| 6.977 | 2 |
| |
|
| 0.676 | 4.232 | 1 |
|
|
| 0.841 | 4.983 | 1 |
|
|
| −0.224 | 3.489 | 1 | 0.062 |
|
| 0.802 | 10.433 | 1 |
|
|
| −0.044 | 8.907 | 1 |
|
|
| −2.740 | 5.915 | 1 | 0.015 |
B: regression coefficient for individual variable; Wald: Wald χ2 statistic; P: associated probability.
Predictor variables for GAII SNP data (monomorphic/polymorphic), backward stepwise elimination.
| B | Wald | df | P | |
|
| 2.247 | 8.272 | 1 |
|
|
| 0.345 | 2.735 | 1 | 0.098 |
|
| 1.805 | 5.162 | 1 |
|
|
| −2.188 | 5.581 | 1 |
|
|
| −1.577 | 1.361 | 1 | 0.243 |
B: regression coefficient for individual variable; Wald: Wald χ2 statistic; P: associated probability.
Figure 6. ROC curve of Q score predicting monomorphic/polymorphic SNPs in 454 data.