Soo Hyung Eo, J Andrew DeWoody.
Abstract
BACKGROUND: Next-generation sequencing methods have contributed to rapid progress in the fields of genomics and population genetics. Using this high-throughput, cost-effective technology, a number of studies have estimated single nucleotide polymorphism (SNP) frequency by calculating the mean number of SNPs per unit sequence length (e.g., mean SNPs/kb). However, both read length and contig depth are highly variable, which casts doubt on such simple methods of SNP frequency estimation.
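For concreteness, the simple "mean SNPs/kb" estimator mentioned above can be written as follows (the notation is ours, not the authors'): for $n$ assembled contigs, with $s_i$ the number of SNPs called in contig $i$ and $\ell_i$ its length in base pairs,

$$
\hat{f}_{\mathrm{SNP}} = 1000 \times \frac{\sum_{i=1}^{n} s_i}{\sum_{i=1}^{n} \ell_i} \qquad \text{(SNPs per kb)}
$$

Nothing in this ratio depends on the read depth $d_i$ of each contig, so two contigs of equal length contribute equally whether they were covered by 10 reads or by 100, even though the results below show SNP counts rising with depth.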
Year: 2012 PMID: 22716167 PMCID: PMC3416719 DOI: 10.1186/1471-2164-13-259
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary statistics of assembled contigs (with depth of ≥10 reads) and SNPs
| | All contigs | | |
|---|---|---|---|
| Number of contigs | 4552 | 2146 | 2406 |
| Mean length (bp) | 811 | 977 | 664 |
| Mean depth | 28 | 36 | 21 |
| Number of SNPs | 2980 | 1744 | 1236 |
| Number of transitions | 1917 | 1139 | 778 |
| Number of transversions | 1063 | 605 | 458 |
| | | | |
| Number of contigs | 3181 | 1762 | 1419 |
| Mean length (bp) | 1003 | 1095 | 889 |
| Mean depth | 32 | 39 | 23 |
| Number of SNPs | 2515 | 1555 | 960 |
| Number of transitions | 1603 | 1009 | 594 |
| Number of transversions | 912 | 546 | 366 |
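As a rough illustration of the simple per-kilobase approach the abstract questions, the sketch below recomputes a naive SNPs/kb rate and the transition/transversion (Ts/Tv) ratio for each column of the summary table. Total sequence length is approximated as number of contigs × mean length, so the rates inherit the rounding of the printed means. The dictionary keys (`block 1`, `col 2`, and so on) are hypothetical labels, because the original block and column headers were not recovered; only the first column is identifiable, as the sum of the other two.

```python
# Naive SNP-rate summary from the published counts (a sketch, not the authors' code).
# Each entry: (number of contigs, mean length in bp, SNPs, transitions, transversions).
# Keys are hypothetical labels: the original block/column headers were lost.
SUMMARY = {
    ("block 1", "all contigs"): (4552, 811, 2980, 1917, 1063),
    ("block 1", "col 2"): (2146, 977, 1744, 1139, 605),
    ("block 1", "col 3"): (2406, 664, 1236, 778, 458),
    ("block 2", "all contigs"): (3181, 1003, 2515, 1603, 912),
    ("block 2", "col 2"): (1762, 1095, 1555, 1009, 546),
    ("block 2", "col 3"): (1419, 889, 960, 594, 366),
}


def snps_per_kb(n_snps: int, n_contigs: int, mean_length_bp: float) -> float:
    """Mean SNPs per kb, approximating total length as contigs x mean length."""
    total_kb = n_contigs * mean_length_bp / 1000.0
    return n_snps / total_kb


def ts_tv_ratio(transitions: int, transversions: int) -> float:
    """Ratio of transitions to transversions."""
    return transitions / transversions


for (block, column), (n, length, snps, ts, tv) in SUMMARY.items():
    rate = snps_per_kb(snps, n, length)
    print(f"{block}, {column}: {rate:.3f} SNPs/kb, Ts/Tv = {ts_tv_ratio(ts, tv):.2f}")
```

Applied to the first column, this gives roughly 0.81 SNPs/kb (2980 SNPs over about 3.69 Mb) and a Ts/Tv ratio of about 1.80. The naive rate shifts between subsets that differ in mean length and depth, which is the kind of dependence the regression models below are meant to account for.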
Figure 1. Correlations between the number of SNPs and both length and depth of contigs. Positive correlations between the number of SNPs and both (A) length and (B) depth of contigs (depth ≥10 reads; length ≥501 bp). In both cases, P < 0.001.
Model selection among candidate models predicting frequencies of SNPs, transitions, and transversions in contigs
| Response | Model | Variables | AIC | ΔAIC | w |
|---|---|---|---|---|---|
| SNPs | M1 (best, full model) | Intercept**, LENGTH*, DEPTH**, C/NC† | 7283.4 | 0.0 | 0.904 |
| | M2 | Intercept**, DEPTH**, C/NC | 7287.9 | 4.5 | 0.096 |
| | M3 | Intercept**, LENGTH**, C/NC | 7540.7 | 257.3 | 1.2 × 10⁻⁵⁶ |
| | M4 | Intercept**, C/NC** | 7748.3 | 464.9 | 1.0 × 10⁻¹⁰¹ |
| Transitions | M1 (best, full model) | Intercept**, LENGTH†, DEPTH**, C/NC | 5682.4 | 0.0 | 0.681 |
| | M2 | Intercept**, DEPTH**, C/NC | 5683.9 | 1.5 | 0.319 |
| | M3 | Intercept**, LENGTH**, C/NC† | 5896.1 | 213.7 | 2.7 × 10⁻⁴⁷ |
| | M4 | Intercept**, C/NC** | 6061.0 | 378.5 | 6.3 × 10⁻⁸³ |
| Transversions | M1 (best, full model) | Intercept**, LENGTH**, DEPTH**, C/NC* | 4077.1 | 0.0 | 0.991 |
| | M2 | Intercept**, DEPTH**, C/NC† | 4086.6 | 9.5 | 0.009 |
| | M3 | Intercept**, LENGTH**, C/NC | 4183.8 | 106.7 | 6.7 × 10⁻²⁴ |
| | M4 | Intercept**, C/NC* | 4310.4 | 233.3 | 2.2 × 10⁻⁵¹ |
Model selection among candidate negative binomial regression models predicting the frequency of SNPs, transitions, and transversions in contigs (depth ≥10 reads; length ≥501 bp), using Akaike's Information Criterion (AIC).
a LENGTH and DEPTH are the length and depth of contigs, respectively, and C/NC is a dummy variable for transcript type (protein-coding, coded as 1; non-coding, coded as 0). Symbols give the significance of each variable in each model: **, P < 0.01; *, P < 0.05; †, P < 0.1.
b ΔAIC is the difference between the AIC of each model and that of the best-fitting model.
c w is the Akaike weight of each model.
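Footnotes b and c correspond to the usual definitions ΔAIC_i = AIC_i − AIC_min and w_i = exp(−ΔAIC_i / 2) / Σ_j exp(−ΔAIC_j / 2). The sketch below applies these textbook formulas to the printed AIC values for the first block of models (M1 to M4); the function name `akaike_weights` is ours. Because the AICs are rounded to one decimal, the recomputed weights can differ from the table in the third decimal place (about 0.905 rather than 0.904 for M1).

```python
import math


def akaike_weights(aics):
    """Return (delta_AIC list, Akaike weight list) for candidate models.

    delta_i = AIC_i - min(AIC);  w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)
    """
    best = min(aics)
    deltas = [aic - best for aic in aics]
    # Relative likelihood of each model given the data
    relative_likelihoods = [math.exp(-d / 2.0) for d in deltas]
    total = sum(relative_likelihoods)
    weights = [r / total for r in relative_likelihoods]
    return deltas, weights


# AIC values for models M1-M4 in the first block of the table, as printed.
snp_aics = [7283.4, 7287.9, 7540.7, 7748.3]
deltas, weights = akaike_weights(snp_aics)
for name, d, w in zip(["M1", "M2", "M3", "M4"], deltas, weights):
    print(f"{name}: dAIC = {d:.1f}, w = {w:.3g}")
```

The weights sum to 1 across the candidate set, so they read as the relative support for each model among those compared, not as absolute probabilities that a model is correct.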
Parameter estimates from the best-fitting model predicting frequencies of SNPs, transitions, and transversions in contigs
| Response | Variable | Estimate | Lower 95% CI | Upper 95% CI | P |
|---|---|---|---|---|---|
| SNPs | Intercept | −0.9828 | −1.1137 | −0.8520 | < 0.0001 |
| | LENGTH | 0.0002 | 0.0000 | 0.0003 | 0.0106 |
| | DEPTH | 0.0146 | 0.0127 | 0.0165 | < 0.0001 |
| | C/NC | −0.0999 | −0.2136 | 0.0139 | 0.0854 |
| Transitions | Intercept | −1.4039 | −1.5489 | −1.2588 | < 0.0001 |
| | LENGTH | 0.0001 | 0.0000 | 0.0003 | 0.0594 |
| | DEPTH | 0.0139 | 0.0119 | 0.0158 | < 0.0001 |
| | C/NC | −0.0398 | −0.1683 | 0.0886 | 0.5435 |
| Transversions | Intercept | −2.0007 | −2.1868 | −1.8147 | < 0.0001 |
| | LENGTH | 0.0003 | 0.0001 | 0.0005 | 0.0006 |
| | DEPTH | 0.0123 | 0.0099 | 0.0147 | < 0.0001 |
| | C/NC | −0.1883 | −0.3539 | −0.0226 | 0.0259 |
Estimates of the variables in the best-fitting candidate model predicting the frequency of SNPs, transitions, and transversions in contigs (depth ≥10 reads; length ≥501 bp).
a LENGTH and DEPTH are the length and depth of contigs, respectively, and C/NC is a dummy variable for transcript type (protein-coding, coded as 1; non-coding, coded as 0).
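The two bounds printed beside each estimate are symmetric about it, which is what a Wald-type 95% interval (estimate ± 1.96 × SE) produces. Assuming that is how they were computed (the table does not say so), the standard error and a two-sided P value can be recovered from the bounds, as sketched below. The function name `wald_from_ci` is ours, and the response labels follow the order of the blocks in the table. For the C/NC rows this reproduces the printed P values to within rounding (about 0.085, 0.54, and 0.026).

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_from_ci(estimate, lower, upper, z=Z_975):
    """Recover SE, Wald z statistic and two-sided P from a symmetric 95% CI.

    Assumes the interval was constructed as estimate +/- z * SE.
    """
    se = (upper - lower) / (2.0 * z)
    z_stat = estimate / se
    # Two-sided P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return se, z_stat, p_value


# C/NC rows of the parameter-estimate table: (estimate, lower bound, upper bound)
cnc_rows = {
    "SNPs": (-0.0999, -0.2136, 0.0139),
    "transitions": (-0.0398, -0.1683, 0.0886),
    "transversions": (-0.1883, -0.3539, -0.0226),
}
for response, (est, lo, hi) in cnc_rows.items():
    se, z_stat, p = wald_from_ci(est, lo, hi)
    print(f"{response}: SE = {se:.4f}, z = {z_stat:.2f}, P = {p:.4f}")
```

The LENGTH rows are printed to only four decimals, so their bounds are too coarse for this back-calculation to be informative.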
Figure 2. Estimates of the number of SNPs in transcripts based on contig length and depth. Estimates and comparison of the number of SNPs in protein-coding and non-coding transcripts across a range of contig lengths and depths (depth ≥10 reads; length ≥501 bp), using the best-fitting models (Table 3).
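Figure 2 is built from the best-fitting models. Negative binomial regression is usually fitted with a log link; assuming that link here (the text does not state it), the expected count for a contig is exp(b0 + b1·LENGTH + b2·DEPTH + b3·C/NC). The sketch below evaluates that expression with the coefficients printed in the parameter-estimate table. Because those coefficients are rounded to four decimals (LENGTH in particular), the values are approximate and are no substitute for the authors' fitted curves. The dictionary `COEFFICIENTS` and the function `expected_count` are hypothetical names.

```python
import math

# (intercept, LENGTH, DEPTH, C/NC) coefficients as printed, rounded to 4 decimals.
COEFFICIENTS = {
    "SNPs": (-0.9828, 0.0002, 0.0146, -0.0999),
    "transitions": (-1.4039, 0.0001, 0.0139, -0.0398),
    "transversions": (-2.0007, 0.0003, 0.0123, -0.1883),
}


def expected_count(response, length_bp, depth, protein_coding):
    """Expected count per contig under an ASSUMED log-link NB regression.

    Only meaningful inside the fitted range (depth >= 10 reads, length >= 501 bp).
    """
    b0, b_length, b_depth, b_cnc = COEFFICIENTS[response]
    cnc = 1 if protein_coding else 0  # dummy coding: protein-coding = 1, non-coding = 0
    eta = b0 + b_length * length_bp + b_depth * depth + b_cnc * cnc
    return math.exp(eta)


for depth in (10, 30, 50):
    coding = expected_count("SNPs", 1000, depth, protein_coding=True)
    noncoding = expected_count("SNPs", 1000, depth, protein_coding=False)
    print(f"1000 bp, depth {depth}: protein-coding {coding:.2f}, non-coding {noncoding:.2f}")
```

Under the log link, the C/NC coefficient acts multiplicatively: at any length and depth, the predicted SNP count for a protein-coding contig is exp(−0.0999) ≈ 0.90 times that of a non-coding contig.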