Robert Kofler, Viola Nolte, Christian Schlötterer.
Abstract
Sequencing pools of individuals (Pool-Seq) is a cost-effective method for obtaining genome-wide allele frequency estimates. Given the importance of meta-analyses that combine data sets, we determined the influence of different genomic library preparation protocols on the consistency of allele frequency estimates. We found that typically no more than 1% of the variation in allele frequency estimates could be attributed to differences in library preparation. Read length also had only a minor effect on the consistency of allele frequency estimates. By far the most pronounced influence was that of sequence coverage: increasing the coverage from 30- to 50-fold improved the consistency of allele frequency estimates by at least 27%. We conclude that Pool-Seq data can be easily combined across different library preparation methods, but sufficient sequence coverage is key to reliable results.
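The "at least 27%" figure can be checked against the average allele frequency differences (d within, d between) reported in the table of within- and between-protocol differences. This minimal sketch assumes that an improvement in consistency is measured as the relative drop in d between 30- and 50-fold coverage; the helper name `relative_improvement` is hypothetical, not the authors'.

```python
# d values at 30x and 50x coverage, copied from the table of average
# allele frequency differences within and between library protocols.
D_30X_50X = {
    ("Dmel", "within"): (0.0548, 0.0400),
    ("Dmel", "between"): (0.0551, 0.0402),
    ("Dsim", "within"): (0.0597, 0.0427),
    ("Dsim", "between"): (0.0600, 0.0431),
}


def relative_improvement(d_low_cov, d_high_cov):
    """Percent reduction in mean allele frequency difference when coverage rises.

    Assumption: 'consistency' is quantified as this relative drop in d.
    """
    return 100 * (d_low_cov - d_high_cov) / d_low_cov


improvements = {key: relative_improvement(*pair) for key, pair in D_30X_50X.items()}
for (species, kind), pct in sorted(improvements.items()):
    print(f"{species} {kind}: {pct:.1f}%")
print(f"minimum: {min(improvements.values()):.1f}%")
```

Under this assumption the reductions are about 27.0% for Dmel and 28.2 to 28.5% for Dsim, so the smallest value agrees with the abstract's "at least 27%".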
Keywords: Drosophila; NGS libraries; Pool-Seq; population genetics-empirical
Year: 2015 PMID: 26014582 PMCID: PMC4744716 DOI: 10.1111/1755-0998.12432
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 8.678
Mapping statistics for Pool‐Seq data generated with different library preparation protocols from genomic DNA of Drosophila melanogaster (Dmel) and Drosophila simulans (Dsim); data are shown for two PCR‐free protocols (−) and two protocols using PCR amplification (+); Rep.: replicate; Reads: number of reads in millions; m.: mapped reads in percent; br.p.: broken pairs, that is, paired‐end fragments not mapped as a proper pair, in percent; Error: sequencing error in percent (including polymorphism); Indel: indel error in percent (including polymorphism); Chi.: chimeras, that is, paired‐end fragments whose reads map to discordant positions, in percent; Dup.: duplicates in percent; Cov. CV: coefficient of variation of the coverage
| Species | Protocol | Rep. | Reads (M) | m. (%) | br.p. (%) | Error (%) | Indel (%) | Chi. (%) | Dup. (%) | Cov. CV |
|---|---|---|---|---|---|---|---|---|---|---|
| Dmel | NEBNext Ultra (+) | 1 | 79 | 96.3 | 2.0 | 0.76 | 0.055 | 0.88 | 4.03 | 0.29 |
| Dmel | NEBNext Ultra (+) | 2 | 103 | 96.3 | 2.0 | 0.77 | 0.055 | 0.94 | 4.00 | 0.29 |
| Dmel | NEXTflex (−) | 1 | 162 | 95.0 | 4.7 | 0.65 | 0.046 | 3.04 | 2.46 | 0.40 |
| Dmel | NEXTflex (−) | 2 | 199 | 94.2 | 4.2 | 0.65 | 0.047 | 2.5 | 2.07 | 0.34 |
| Dmel | NEBNext DNA (+) | 1 | 90 | 96.2 | 2.3 | 0.73 | 0.054 | 1.03 | 3.76 | 0.25 |
| Dmel | NEBNext DNA (+) | 2 | 96 | 96.1 | 2.1 | 0.73 | 0.053 | 0.92 | 3.22 | 0.24 |
| Dmel | TruSeq (−) | 1 | 76 | 96.8 | 2.4 | 0.76 | 0.056 | 1.22 | 2.15 | 0.27 |
| Dmel | TruSeq (−) | 2 | 84 | 96.7 | 3.0 | 0.76 | 0.056 | 1.74 | 1.95 | 0.27 |
| Dsim | NEBNext Ultra (+) | 1 | 74 | 85.6 | 2.1 | 1.36 | 0.101 | 0.76 | 2.64 | 0.41 |
| Dsim | NEBNext Ultra (+) | 2 | 64 | 85.6 | 2.1 | 1.36 | 0.102 | 0.77 | 2.94 | 0.42 |
| Dsim | NEXTflex (−) | 1 | 133 | 87.5 | 5.6 | 1.21 | 0.086 | 3.26 | 1.49 | 0.53 |
| Dsim | NEXTflex (−) | 2 | 137 | 87.4 | 5.0 | 1.25 | 0.094 | 2.82 | 2.54 | 0.45 |
| Dsim | NEBNext DNA (+) | 1 | 79 | 85.3 | 2.5 | 1.32 | 0.100 | 0.93 | 2.79 | 0.41 |
| Dsim | NEBNext DNA (+) | 2 | 90 | 85.9 | 2.2 | 1.32 | 0.100 | 0.8 | 2.22 | 0.41 |
| Dsim | TruSeq (−) | 1 | 92 | 86.5 | 2.6 | 1.37 | 0.104 | 0.93 | 0.68 | 0.37 |
| Dsim | TruSeq (−) | 2 | 90 | 86.5 | 2.9 | 1.37 | 0.103 | 1.19 | 0.68 | 0.37 |
For statistics that are sensitive to differences in coverage, we subsampled every sample to 26 930 986 proper pairs.
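Two of the quantities above, the coefficient of variation of coverage (Cov. CV) and subsampling every library to a common number of proper pairs, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: per-site coverage is assumed to be available as a list of integers, the CV uses the population standard deviation, and read pairs are represented by hypothetical identifiers; `coverage_cv` and `subsample_pairs` are hypothetical helpers.

```python
import random
from statistics import mean, pstdev


def coverage_cv(coverages):
    """Coefficient of variation of per-site coverage: standard deviation / mean.

    Assumption: the population standard deviation is used.
    """
    mu = mean(coverages)
    if mu == 0:
        raise ValueError("mean coverage is zero; CV is undefined")
    return pstdev(coverages, mu) / mu


def subsample_pairs(pair_ids, n, seed=0):
    """Draw n proper pairs uniformly at random without replacement.

    Giving every library the same number of pairs removes coverage
    differences before computing coverage-sensitive statistics.
    """
    if n > len(pair_ids):
        raise ValueError("cannot subsample more pairs than are available")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return rng.sample(pair_ids, n)


# Toy usage with hypothetical data.
print(coverage_cv([5, 15]))
print(len(subsample_pairs(list(range(100)), 10)))
```

Seeding a dedicated `random.Random` instance keeps the subsample reproducible without touching the global random state. For the data above, `n` would be 26 930 986.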
Average allele frequency difference within (d within) and between (d between) library preparation protocols at different coverages (Cov.) in data from Drosophila melanogaster (Dmel) and Drosophila simulans (Dsim); E: error due to library preparation (%); SNPs: number of SNPs analysed
| Species | Cov. | SNPs | d within | d between | E (%) |
|---|---|---|---|---|---|
| Dmel | 30 | 2 977 317 | 0.0548 | 0.0551 | 0.5558 |
| Dmel | 40 | 1 954 383 | 0.0466 | 0.0468 | 0.578 |
| Dmel | 50 | 592 354 | 0.04 | 0.0402 | 0.5384 |
| Dsim | 30 | 2 992 813 | 0.0597 | 0.06 | 0.5153 |
| Dsim | 40 | 1 032 206 | 0.0502 | 0.0504 | 0.4042 |
| Dsim | 50 | 119 047 | 0.0427 | 0.0431 | 0.7862 |
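A sketch of how d within, d between and E could be computed from per-SNP allele frequencies follows. It assumes that d is the mean absolute allele frequency difference over shared SNPs, averaged over all pairs of samples (replicate pairs of the same protocol for d within, pairs of different protocols for d between), and that E = 100 × (d between − d within) / d between. These definitions are assumptions: the reported E values agree with this formula only to within the rounding of the d values shown, and the exact formula used by the authors may differ. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_abs_diff(freqs_a, freqs_b):
    """Mean absolute allele frequency difference across SNPs shared by two samples."""
    if len(freqs_a) != len(freqs_b):
        raise ValueError("samples must cover the same SNPs")
    return mean(abs(a - b) for a, b in zip(freqs_a, freqs_b))


def d_within_between(samples):
    """samples: dict mapping protocol name -> list of replicate frequency vectors.

    Returns (d_within, d_between): the average pairwise difference for pairs
    from the same protocol and for pairs from different protocols.
    """
    labelled = [(proto, freqs) for proto, reps in samples.items() for freqs in reps]
    within, between = [], []
    for (proto_a, freqs_a), (proto_b, freqs_b) in combinations(labelled, 2):
        target = within if proto_a == proto_b else between
        target.append(mean_abs_diff(freqs_a, freqs_b))
    return mean(within), mean(between)


def library_error(d_within, d_between):
    """Assumed definition of E (%): share of the between-protocol difference
    that exceeds replicate-to-replicate noise."""
    return 100 * (d_between - d_within) / d_between


# Toy data: two protocols, two replicates each, two SNPs.
toy = {"A": [[0.1, 0.5], [0.2, 0.5]], "B": [[0.1, 0.6], [0.2, 0.6]]}
dw, db = d_within_between(toy)
print(dw, db, library_error(dw, db))
```

With four protocols and two replicates each, as in the mapping table, this pairing scheme yields 4 within-protocol pairs and 24 between-protocol pairs. Applied to the rounded table values for Dsim at 40-fold coverage, `library_error(0.0502, 0.0504)` gives about 0.40%, close to the reported 0.4042%.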
Assignment of accession numbers to sequencing libraries
| Species | Protocol | r1 | r2 |
|---|---|---|---|
| Dmel | NEBNext Ultra (+) | | |
| Dmel | NEXTflex (−) | | |
| Dmel | NEBNext DNA (+) | | |
| Dmel | TruSeq (−) | | |
| Dsim | NEBNext Ultra (+) | | |
| Dsim | NEXTflex (−) | | |
| Dsim | NEBNext DNA (+) | | |
| Dsim | TruSeq (−) | | |
Dmel, Drosophila melanogaster; Dsim, Drosophila simulans; r1, replicate 1; r2, replicate 2.