Ellen Knierim, Barbara Lucke, Jana Marie Schwarz, Markus Schuelke, Dominik Seelow.
Abstract
Next Generation Sequencing (NGS) technologies are gaining importance in routine clinical diagnostics, so it is desirable to simplify the workflow for high-throughput diagnostics. Fragmentation of DNA is a crucial step in the preparation of template libraries, and various methods are available. Here we evaluated the effect of nebulization, sonication and random enzymatic digestion of long-range PCR products on NGS results. All three methods produced high-quality sequencing libraries for the 454 platform. However, when long-range PCR products of different lengths were pooled equimolarly, sequence coverage dropped drastically for fragments below 3,000 bp. All three methods performed equally well with regard to overall sequence quality (PHRED) and read length. Enzymatic fragmentation showed the highest consistency among three library preparations but performed slightly worse than sonication and nebulization with regard to insertions/deletions in the raw sequence reads. After filtering for homopolymer errors, enzymatic fragmentation performed best when compared with the results of classic Sanger sequencing. As the overall performance of the three methods was equal, with only minor differences, a fragmentation method can be chosen solely according to lab facilities, feasibility and experimental design.
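The abstract notes that long-range PCR products of different lengths were pooled equimolarly, meaning every product contributes the same number of molecules rather than the same mass. The Python sketch below shows that arithmetic under stated assumptions: the 660 g/mol per base pair factor is a common rule of thumb for double-stranded DNA, and the function names, amplicon names, lengths, concentrations and the 100 fmol target are hypothetical, not values or tools from the study.

```python
"""Sketch: volumes for pooling PCR products equimolarly (hypothetical inputs)."""

# Rule-of-thumb average molar mass of one base pair of double-stranded DNA (g/mol).
BP_MOLAR_MASS = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/µl to nM (1 ng/µl == 1e-3 g/l)."""
    return conc_ng_per_ul * 1e6 / (length_bp * BP_MOLAR_MASS)


def pooling_volumes(products: dict[str, tuple[int, float]],
                    fmol_each: float) -> dict[str, float]:
    """Volume (µl) of each product that contributes `fmol_each` femtomoles.

    `products` maps a name to (length in bp, concentration in ng/µl).
    """
    volumes = {}
    for name, (length_bp, conc) in products.items():
        nM = ng_per_ul_to_nM(conc, length_bp)  # 1 nM == 1 fmol/µl
        volumes[name] = fmol_each / nM
    return volumes


if __name__ == "__main__":
    # Hypothetical amplicons at the same mass concentration but different lengths.
    amplicons = {"amp_10kb": (10_000, 50.0), "amp_5kb": (5_000, 50.0), "amp_2.5kb": (2_500, 50.0)}
    for name, vol in pooling_volumes(amplicons, fmol_each=100.0).items():
        print(f"{name}: {vol:.2f} µl")
```

At equal mass concentrations the required volume scales linearly with amplicon length (13.20, 6.60 and 3.30 µl in this example), because a longer molecule carries proportionally more mass per mole.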
Year: 2011 PMID: 22140562 PMCID: PMC3227650 DOI: 10.1371/journal.pone.0028240
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Workflow for fragmentation and NGS sequencing of long-range PCR fragments.
(A) Graphical illustration of the entire workflow. Red arrows mark measurement and DNA-quantification steps. (B) Analysis of fragment lengths by PAGE before (left panel) and after (right panel) removal of small fragments (<500 bp) with AMPure™ columns. The red boxes mark the desired size range of 600 to 1,000 bp. Neb, nebulization; Son, sonication; Enz, enzymatic fragmentation.
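Panel B compares fragment lengths before and after removing fragments shorter than 500 bp, with 600 to 1,000 bp as the target range. The idealized Python sketch below shows how such a clean-up shifts the share of fragments in the target window; it assumes a hard length cut-off (real bead or column clean-ups have a soft one), and the function names and fragment lengths are hypothetical.

```python
"""Idealized sketch of a size-selection step on a fragment-length list (hypothetical data)."""


def size_select(lengths: list[int], min_bp: int = 500) -> list[int]:
    """Keep fragments of at least `min_bp` bp.

    Real bead- or column-based clean-ups have a soft, not sharp, cut-off;
    the hard threshold here is a simplifying assumption.
    """
    return [n for n in lengths if n >= min_bp]


def fraction_in_range(lengths: list[int], lo: int = 600, hi: int = 1000) -> float:
    """Fraction of fragments whose length lies within [lo, hi] bp."""
    if not lengths:
        return 0.0
    return sum(lo <= n <= hi for n in lengths) / len(lengths)


if __name__ == "__main__":
    # Hypothetical fragment lengths (bp) after shearing; not data from the study.
    fragments = [200, 450, 650, 800, 950, 1200]
    kept = size_select(fragments)
    print(f"before: {fraction_in_range(fragments):.2f}")  # 3 of 6 in range
    print(f"after:  {fraction_in_range(kept):.2f}")       # 3 of 4 in range
```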
Figure 2. Coverage, sequence quality and read lengths of the 454 sequence run.
(A) Sequence coverage over the entire genomic region of the LPPR4 gene, colored by fragmentation method. The gray bars above the graph show the location and length (in kbp) of the PCR fragments, and the red squares mark the seven coding exons of transcript ENST00000370185. For all three fragmentation methods, sequence coverage drops considerably for PCR fragments shorter than 3,000 bp. (B) Comparison of the sequence quality scores (PHRED) at the 3′ ends of the reads generated with the three fragmentation methods. The bars show the mean and standard deviation of three replicates per fragmentation method, averaged over stretches of 5 base pairs. No significant difference was found between the three fragmentation methods. (C) Number of sequence reads by read length (averaged over stretches of 50 base pairs). The error bars show the standard deviation of three replicates per fragmentation method. The variation between technical replicates is larger than that between the averages of the three fragmentation methods; no significant difference was found between the methods.
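Panels B and C summarize per-base quality in 5-bp stretches, read counts in 50-bp length bins, and the spread across three technical replicates. The Python sketch below illustrates these summaries on hypothetical numbers. The PHRED relation Q = -10·log10(P_error) is the standard definition; the function names are hypothetical, the windows are assumed to be non-overlapping, and the use of the sample (rather than population) standard deviation is an assumption, since the caption does not say which was used.

```python
"""Sketch of the per-window and per-bin summaries described for Figure 2 (hypothetical data)."""

from collections import Counter
from statistics import mean, stdev


def phred_to_error_prob(q: float) -> float:
    """Invert the PHRED definition Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def window_means(values: list[float], width: int = 5) -> list[float]:
    """Mean of consecutive, non-overlapping stretches of `width` positions.

    The final stretch may be shorter than `width`.
    """
    return [mean(values[i:i + width]) for i in range(0, len(values), width)]


def length_histogram(read_lengths: list[int], bin_bp: int = 50) -> Counter:
    """Number of reads per length bin, keyed by the lower bin edge in bp."""
    return Counter((n // bin_bp) * bin_bp for n in read_lengths)


def replicate_summary(per_replicate: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across technical replicates (assumption)."""
    return mean(per_replicate), stdev(per_replicate)


if __name__ == "__main__":
    print(phred_to_error_prob(20))  # Q20 -> 1 error in 100 bases
    print(window_means([30, 30, 30, 30, 30, 20, 20, 20, 20, 20]))
    print(sorted(length_histogram([412, 430, 455, 499, 501]).items()))
    print(replicate_summary([1.0, 2.0, 3.0]))
```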
Figure 3. Comparison of the percentage of missense, deletion and insertion errors in individual sequence reads.
Error frequencies were calculated according to Method#1 (see Materials and Methods) for each fragmentation method. Error bars show the standard deviation. A position on a sequence read was classified as erroneous only if the coverage at that position exceeded 20-fold and the alternative (erroneous) allele accounted for less than 20% of the calls. *, p<0.05.
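The two thresholds in the Figure 3 legend (coverage above 20-fold, alternative allele below 20%) can be illustrated with a small Python sketch. This does not reproduce Method#1, which is described only in the Materials and Methods; the pileup data model ('-' for a deletion, a leading '+' for an insertion), the treatment of the figure's "missense" category as a base mismatch, the choice of denominator, and all function names are assumptions for illustration.

```python
"""Sketch of the Figure 3 thresholds: coverage > 20-fold and alternative allele < 20 %.

Hypothetical data model: a 'pileup' is the list of calls from every read covering
one reference position; '-' marks a deletion, a leading '+' marks an insertion.
"""

from collections import Counter

MIN_COVERAGE = 20        # position must be covered > 20-fold
MAX_ALT_FRACTION = 0.20  # an alternative allele counts as an error only below 20 %
ERROR_TYPES = ("mismatch", "deletion", "insertion")


def error_type(call: str) -> str:
    """Map a non-reference call to an error category (mismatch ~ 'missense')."""
    if call == "-":
        return "deletion"
    if call.startswith("+"):
        return "insertion"
    return "mismatch"


def erroneous_calls(pileup: list[str], ref: str) -> Counter:
    """Counts of non-reference calls treated as sequencing errors at one position."""
    depth = len(pileup)
    if depth <= MIN_COVERAGE:
        return Counter()  # too shallow to judge
    counts = Counter(pileup)
    return Counter({call: n for call, n in counts.items()
                    if call != ref and n / depth < MAX_ALT_FRACTION})


def error_percentages(pileups: list[list[str]], refs: list[str]) -> dict[str, float]:
    """Error calls per type as a percentage of all calls at positions passing the coverage filter."""
    per_type: Counter = Counter()
    evaluated = 0
    for pileup, ref in zip(pileups, refs):
        if len(pileup) <= MIN_COVERAGE:
            continue
        evaluated += len(pileup)
        for call, n in erroneous_calls(pileup, ref).items():
            per_type[error_type(call)] += n
    if evaluated == 0:
        return {t: 0.0 for t in ERROR_TYPES}
    return {t: 100 * per_type[t] / evaluated for t in ERROR_TYPES}


if __name__ == "__main__":
    deep = ["A"] * 22 + ["-", "C"]  # 24 reads: one deletion, one mismatch
    shallow = ["A"] * 10 + ["T"]    # 11 reads: skipped by the coverage filter
    het = ["A"] * 12 + ["G"] * 12   # heterozygous SNP: G at 50 % is not an error
    print(error_percentages([deep, shallow, het], ["A", "A", "A"]))
```

Because a true heterozygous allele sits near 50%, the 20% ceiling keeps genuine SNPs from being counted as sequencing errors.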
NGS results in comparison with classic Sanger sequencing.
| Genomic position on Chr 1 (hg19) | RefSeq allele | Sanger sequencing | ALL | Nebulization (Neb) | | | Sonication (Son) | | | Enzymatic (Enz) | | |
| | | | | Exp#1 | Exp#2 | Exp#3 | Exp#1 | Exp#2 | Exp#3 | Exp#1 | Exp#2 | Exp#3 |
| 99,748,311 | T | T/G (het) | T/G; Σ 753; T = 51% | T/G; Σ 44; T = 59% | T/G; Σ 143; T = 52% | T/G; Σ 71; T = 52% | T/G; Σ 150; T = 51% | T/G; Σ 36; T = 64% | T/G; Σ 96; T = 47% | T/G; Σ 98; T = 55% | T/G; Σ 53; T = 43% | T/G; Σ 62; T = 40% |
| 99,748,324 | C | C/T (het) | C/T; Σ 729; C = 63% | C/T; Σ 49; C = 67% | C/T; Σ 140; C = 65% | C/T; Σ 71; C = 69% | C/T; Σ 144; C = 57% | C/T; Σ 35; C = 69% | C/T; Σ 85; C = 61% | C/T; Σ 87; C = 71% | C/T; Σ 55; C = 65% | C/T; Σ 63; C = 52% |
| 99,748,522 | A | A/G (het) | A/G; Σ 779; A = 51% | A/G; Σ 62; A = 60% | A/G; Σ 151; A = 52% | A/G; Σ 78; A = 54% | A/G; Σ 157; A = 46% | A/G; Σ 43; A = 49% | A/G; Σ 102; A = 53% | A/G; Σ 75; A = 53% | A/G; Σ 49; A = 49% | A/G; Σ 62; A = 45% |
| 99,762,338–99,762,339 | AA | AA/AA (hom) | AA/AA; Σ 1221; AA = 99% | | AA/AA; Σ 228; AA = 100% | AA/AA; Σ 146; AA = 98% | AA/AA; Σ 205; AA = 99% | | AA/AA; Σ 160; AA = 100% | AA/AA; Σ 123; AA = 99% | AA/AA; Σ 99; AA = 99% | AA/AA; Σ 121; AA = 99% |
| 99,764,728 | T | T/T (hom) | T/T; Σ 482; T = 100% | T/T; Σ 22; T = 100% | T/T; Σ 109; T = 100% | T/T; Σ 62; T = 100% | T/T; Σ 84; T = 99% | | T/T; Σ 67; T = 100% | T/T; Σ 42; T = 100% | T/T; Σ 42; T = 100% | T/T; Σ 37; T = 97% |
| 99,767,383 | C | C/G (het) | C/G; Σ 138; C = 49% | C/G; Σ 13; C = 31% | C/G; Σ 29; C = 59% | C/G; Σ 21; C = 29% | C/G; Σ 16; C = 56% | | C/G; Σ 22; C = 45% | C/G; Σ 11; C = 64% | C/G; Σ 12; C = 50% | C/G; Σ 10; C = 60% |
| 99,772,437–99,772,438 | TT | TT/TT (hom) | TT/TT; Σ 675; TT = 100% | TT/TT; Σ 37; TT = 100% | TT/TT; Σ 109; TT = 100% | TT/TT; Σ 132; TT = 100% | TT/TT; Σ 109; TT = 100% | TT/TT; Σ 22; TT = 100% | | TT/TT; Σ 82; TT = 99% | TT/TT; Σ 54; TT = 99% | TT/TT; Σ 78; TT = 100% |
The seven positions shown (out of 4,096 analyzed) comprise four heterozygous SNPs and three homozygous positions that were miscalled in at least one experiment. Each NGS cell gives the sequence call after alignment and homopolymer filtering; Σ, the number of sequence reads covering the position; and the percentage of calls for the RefSeq allele. Erroneous positions are highlighted in bold face.
*SAMtools reports the allele counts before applying the homopolymer filter but calls the genotype afterwards, so the allele frequencies and the predicted genotype may differ.
**SAMtools reports InDels only when they are supported by a sufficient number of reads. For subsets in which this was not the case, we calculated the percentage of the RefSeq allele from the combined counts at both positions and report the mean coverage over both positions as Σ.
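The second footnote describes how Σ and the RefSeq-allele percentage were derived for two-base InDel sites that SAMtools did not report. The Python sketch below spells out that arithmetic, along with the single-position percentage for comparison. The function names and the read counts are hypothetical; they are not SAMtools functions or data from the table.

```python
"""Sketch of the per-cell summaries in the table above (hypothetical read counts)."""


def refseq_percentage(counts: dict[str, int], ref: str) -> float:
    """Percentage of calls at one position that match the RefSeq allele."""
    total = sum(counts.values())
    return 100 * counts.get(ref, 0) / total if total else 0.0


def two_position_summary(counts_1: dict[str, int], counts_2: dict[str, int],
                         ref_1: str, ref_2: str) -> tuple[float, float]:
    """Σ and RefSeq percentage for a two-base InDel site.

    Following the second table footnote: the RefSeq percentage uses the pooled
    counts of both positions, and Σ is the mean coverage over the two positions.
    """
    ref_reads = counts_1.get(ref_1, 0) + counts_2.get(ref_2, 0)
    total = sum(counts_1.values()) + sum(counts_2.values())
    sigma = total / 2
    pct = 100 * ref_reads / total if total else 0.0
    return sigma, pct


if __name__ == "__main__":
    print(refseq_percentage({"T": 51, "G": 49}, "T"))                     # 51.0
    print(two_position_summary({"A": 98, "-": 2}, {"A": 100}, "A", "A"))  # (100.0, 99.0)
```

Pooling the counts before dividing weights each position by its coverage, which is why the footnote reports a mean coverage alongside a single combined percentage.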