Martin Kircher, Susanna Sawyer, Matthias Meyer.
Abstract
Due to the increasing throughput of current DNA sequencing instruments, sample multiplexing is necessary for making economical use of available sequencing capacities. A widely used multiplexing strategy for the Illumina Genome Analyzer utilizes sample-specific indexes, which are embedded in one of the library adapters. However, this and similar multiplex approaches come with a risk of sample misidentification. By introducing indexes into both library adapters (double indexing), we have developed a method that reveals the rate of sample misidentification within current multiplex sequencing experiments. At ~0.3%, these rates are orders of magnitude higher than expected and may severely confound applications in cancer genomics and other fields requiring accurate detection of rare variants. We identified the occurrence of mixed clusters on the flow cell as the predominant source of error. The accuracy of sample identification is further impaired if indexed oligonucleotides are cross-contaminated or if indexed libraries are amplified in bulk. Double indexing eliminates these problems and increases both the scope and accuracy of multiplex sequencing on the Illumina platform.
Year: 2011 PMID: 22021376 PMCID: PMC3245947 DOI: 10.1093/nar/gkr771
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Regular Illumina multiplex library design. The grafting sequences (P5 and P7) are used for template immobilization and amplification. Three distinct sequence reads (forward read, index read, reverse read) are primed from different adapter sites. (B) Double-index library design with an additional index incorporated into the second adapter. Here, four distinct sequence reads are performed.
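The way a second index exposes misassignment can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the index sequences, the sample names and the function names `classify_pair` and `pair_fractions` are all hypothetical, and the sketch assumes that each sample carries exactly one expected combination of the two indexes. A read whose two indexes are both in use but do not belong together counts as a false pair; a read with an unrecognized index (for example, one containing a sequencing error) is kept separate.

```python
from collections import Counter

# Hypothetical sample sheet: one expected (index 1, index 2) pair per sample.
# The sequences are invented for illustration only.
EXPECTED_PAIRS = {
    ("ACGTACG", "TTGCAAG"): "sample_A",
    ("GATCGTA", "CCATGGA"): "sample_B",
}


def classify_pair(index1, index2, expected=EXPECTED_PAIRS):
    """Label one read as 'correct', 'false' or 'unknown' from its two indexes."""
    known_first = {pair[0] for pair in expected}
    known_second = {pair[1] for pair in expected}
    if (index1, index2) in expected:
        return "correct"
    if index1 in known_first and index2 in known_second:
        # Both indexes were used in the experiment, but not together.
        return "false"
    # At least one index matches nothing on the sample sheet.
    return "unknown"


def pair_fractions(reads):
    """Return the percentage of reads in each class for (index1, index2) tuples."""
    counts = Counter(classify_pair(i1, i2) for i1, i2 in reads)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}
```

With a single index, the second read in a call such as `pair_fractions([("ACGTACG", "TTGCAAG"), ("ACGTACG", "CCATGGA")])` would be assigned to a sample without any warning; with two indexes it is flagged as a false pair.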
Numerical summary of the false-assignment rates and fractions of false index pairs observed for the three different experiments no-CAP, SP-CAP and MP-CAP
| | no-CAP | SP-CAP | MP-CAP |
|---|---|---|---|
| Total number of raw reads | 34 241 955 | 48 546 372 | 34 684 183 |
| Index pairs in raw data | |||
| Correct pairs (%)* | 89.14 | 78.83 | 89.38 |
| False pairs (%) | 0.582 | 0.509 | 0.760 |
| False index pairs after PF-filtering of raw reads | |||
| Total number of PF-filtered reads | 27 466 817 | 37 586 292 | 27 220 161 |
| False pairs (%) | 0.523 | 0.387 | 0.691 |
| False index pairs after quality score based filtering of index reads | |||
| Average index quality filter (~PF) (%) | 0.059 | 0.192 | 0.423 |
| Minimum index quality filter (~PF) (%) | 0.060 | 0.177 | 0.422 |
| Minimum index quality score of 15 (%) | 0.060 | 0.138 | 0.428 |
| False index pairs after quality score based filtering of template reads | |||
| Read quality filter on both reads (~PF) (%) | 0.362 | 0.439 | 0.614 |
| Read quality filter on the forward read (~PF) (%) | 0.389 | 0.394 | 0.593 |
| Read quality filter on the reverse read (~PF) (%) | 0.298 | – | – |
| Quantifying cross-contamination, mixed clusters and jumping PCR | |||
| False pairs due to contamination (%) | 0.042 | 0.104 | 0.038 |
| False pairs due to mixed clusters / jump. PCR (%) | 0.018 | 0.034 | 0.390 |
~PF indicates values for a quality score cutoff that removes fewer raw reads than Illumina's Pass Filter (PF) flag.
*The fraction of correct index pairs is strongly affected by loading density. Denser loading in experiment SP-CAP led to a higher sequencing error rate and hence reduced the fraction of correct index pairs.
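As a quick arithmetic check on the table, the read counts can be turned into the share of raw reads removed by the PF flag and into an approximate absolute number of false pairs. This is only a sketch: the function names are hypothetical, and it assumes that the false-pair percentages in the raw data are expressed relative to the total number of raw reads, which the table does not state explicitly.

```python
# Values copied from the table above.
RAW_READS = {"no-CAP": 34_241_955, "SP-CAP": 48_546_372, "MP-CAP": 34_684_183}
PF_READS = {"no-CAP": 27_466_817, "SP-CAP": 37_586_292, "MP-CAP": 27_220_161}
FALSE_PAIRS_RAW_PCT = {"no-CAP": 0.582, "SP-CAP": 0.509, "MP-CAP": 0.760}


def pf_removed_percent(experiment):
    """Percentage of raw reads that do not carry the Pass Filter flag."""
    return 100.0 * (1.0 - PF_READS[experiment] / RAW_READS[experiment])


def false_pairs_raw_count(experiment):
    """Approximate number of false pairs in the raw data.

    Assumes the percentage refers to all raw reads (an assumption).
    """
    return round(RAW_READS[experiment] * FALSE_PAIRS_RAW_PCT[experiment] / 100.0)


for name in RAW_READS:
    print(name, round(pf_removed_percent(name), 1), false_pairs_raw_count(name))
```

For all three experiments the PF flag removes roughly a fifth of the raw reads, which is the reference point that the ~PF cutoffs in the table are matched against.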
Figure 2. Changes in the fraction of false (blue) and correct (red) index pairs when applying two different types of base quality filters on the index reads (minimum accepted quality score and average quality score). The fraction remaining after PF is indicated by a green ‘sun’ symbol. Black circles denote the fraction of reads remaining when considering quality score cutoffs that remove slightly less raw data than the Pass Filter flag (green lines, ~20% of the data). Both filter criteria remove considerably more false pairs than PF filtering does. While no-CAP, SP-CAP and MP-CAP show similar trajectories, the fraction of false pairs is always considerably higher for the MP-CAP experiment, in which samples have been enriched and amplified in a multiplex setup. Quality score cutoffs for the SP-CAP experiment are lower than for the other two experiments due to the 40% higher cluster density of this experiment.
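The two index-read filters compared in Figure 2 can be sketched as follows. The function names are hypothetical, and the Phred+33 quality encoding is an assumption; Illumina pipelines of that period often used an offset of 64, so the `offset` parameter would need to match the actual data. The cutoff of 15 mirrors the minimum index quality score used in the table.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into a list of Phred scores.

    The default offset of 33 is an assumption about the encoding.
    """
    return [ord(char) - offset for char in quality_string]


def passes_min_quality(quality_string, cutoff=15, offset=33):
    """Keep the index read only if every base reaches the cutoff."""
    return min(phred_scores(quality_string, offset)) >= cutoff


def passes_avg_quality(quality_string, cutoff=15, offset=33):
    """Keep the index read only if its mean quality reaches the cutoff."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) >= cutoff


def keep_double_indexed_read(index1_quality, index2_quality, cutoff=15, offset=33):
    """For double-indexed reads, require both index reads to pass the minimum filter."""
    return (passes_min_quality(index1_quality, cutoff, offset)
            and passes_min_quality(index2_quality, cutoff, offset))
```

The two criteria differ when a single base is poor: an index read with six high-quality bases and one base below the cutoff fails the minimum filter but can still pass the average filter.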
Figure 3. Heat map with the counts observed for all index combinations in the experiments no-CAP, SP-CAP and MP-CAP after applying a minimum quality score filter of 15 to the index reads. Only indexes that were actually used in the experiments are plotted; forward indexes on the horizontal and reverse indexes on the vertical axis. Color frequencies are provided for each of the experiments in the top right graphs.
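The count matrix underlying such a heat map can be built with a short helper. This is a sketch under stated assumptions: `index_count_matrix` is a hypothetical name, reads are assumed to arrive as (forward index, reverse index) tuples that have already passed the quality filter, and the layout follows the caption, with forward indexes as columns and reverse indexes as rows.

```python
from collections import Counter


def index_count_matrix(reads, forward_indexes, reverse_indexes):
    """Count every (forward, reverse) index combination.

    Returns a list of rows, one per reverse index (vertical axis),
    each holding one count per forward index (horizontal axis).
    Combinations involving indexes outside the two lists are ignored.
    """
    counts = Counter(reads)
    return [
        [counts[(forward, reverse)] for forward in forward_indexes]
        for reverse in reverse_indexes
    ]


# Toy example with invented index names.
reads = [("F1", "R1"), ("F1", "R1"), ("F2", "R2"), ("F1", "R2")]
matrix = index_count_matrix(reads, ["F1", "F2"], ["R1", "R2"])
```

If each sample were tagged with matching forward and reverse indexes, correct pairs would fall on the diagonal of this matrix and every off-diagonal count would represent a false pair.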