Qiaoling Li, Xia Zhao, Wenwei Zhang, Lin Wang, Jingjing Wang, Dongyang Xu, Zhiying Mei, Qiang Liu, Shiyi Du, Zhanqing Li, Xinming Liang, Xiaman Wang, Hanmin Wei, Pengjuan Liu, Jing Zou, Hanjie Shen, Ao Chen, Snezana Drmanac, Jia Sophie Liu, Li Li, Hui Jiang, Yongwei Zhang, Jian Wang, Huanming Yang, Xun Xu, Radoje Drmanac, Yuan Jiang.
Abstract
BACKGROUND: Massively parallel sequencing, coupled with sample multiplexing, has made genetic tests broadly affordable. However, intractable index mis-assignments (commonly exceeding 1%) have been repeatedly reported on some widely used sequencing platforms.
Keywords: DNA nanoball technology; Multiplex sequencing; NGS; Rare index mis-assignment
Year: 2019 PMID: 30866797 PMCID: PMC6416933 DOI: 10.1186/s12864-019-5569-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Mechanisms of index hopping on different sequencing platforms. (a) Sequencing using DNA nanoball technology is accomplished through Phi29 and RCR linear amplification; each copy is amplified independently using the same template ssCir. In this case, error reads from index hopping cannot accumulate, and most of the signal originates from correct indexes. (b) Bridge PCR or ExAmp chemistry utilizes exponential amplification, and index hopping can accumulate as amplification proceeds through each cycle, resulting in mis-assigned samples. Green, correct index; red, wrong index
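The contrast described in this caption can be illustrated with a toy expected-value model. It is only a sketch under assumptions that do not come from the paper: a hypothetical per-copy hopping probability `p`; in the linear case every copy is made from the original template, so wrong copies are never re-copied; in the exponential case every molecule present, including mis-indexed ones, is copied once per cycle.

```python
def wrong_fraction_linear(p: float, copies: int) -> float:
    """Expected wrong-index fraction when all copies come from one original template.

    Toy model (assumption): each copy independently carries a wrong index
    with probability p, and wrong copies never serve as templates.
    """
    if copies <= 0:
        raise ValueError("copies must be positive")
    expected_wrong = p * copies
    return expected_wrong / copies  # stays at p regardless of copy number


def wrong_fraction_exponential(p: float, cycles: int) -> float:
    """Expected wrong-index fraction after `cycles` rounds of doubling.

    Toy model (assumption): every molecule is copied once per cycle; a copy
    of a correct molecule becomes wrong with probability p, and a copy of a
    wrong molecule is always wrong.
    """
    correct, wrong = 1.0, 0.0
    for _ in range(cycles):
        new_correct = correct * (1.0 - p)
        new_wrong = correct * p + wrong
        correct += new_correct
        wrong += new_wrong
    return wrong / (correct + wrong)


if __name__ == "__main__":
    p = 0.001  # hypothetical hopping probability per copy
    print("linear:", wrong_fraction_linear(p, copies=1000))
    print("exponential, 30 cycles:", wrong_fraction_exponential(p, cycles=30))
```

In this model the exponential case follows the closed form 1 − (1 − p/2)^n, so the wrong-index fraction grows with the number of cycles, whereas the linear case stays at p.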
Fig. 2 Library preparation workflows. (a) “Standard PCR-based WGS”-like library; (b) PCR-free library; (c) two-step PCR library. Pooling after each step, indicated by red arrows, is examined for different library preparation strategies. Gray rectangle, adapter; colored rectangle, unique index assigned to a particular sample; gray vertical lines, unique sample index; white rectangle, UID
Observed frequencies of read mis-assignment in controls
| Experiments | Mis-assignment causes | Index # | Total reads mapped to 8 gene regions (Repeat 1) | Total reads mapped to 8 gene regions (Repeat 2) | Total reads mapped to 8 gene regions (Repeat 3) | Mis-assignment rate per index |
|---|---|---|---|---|---|---|
| Experimental groups | N.A. | Barcode 1–8 | 41,686,373 | 44,974,964 | 42,874,988 | N.A. |
| Empty controls | Physical barcode hopping | Barcode 33–40 | 9 | 14 | 6 | 1 in 36 million reads |
| Balancing library controls | Total mis-assignments occur after ssCir | Barcode 41–48 | 612 | 650 | 724 | 1 in 0.5 million reads |
| All groups | All above | All indexes above | 41,686,994 | 44,975,628 | 42,875,718 | N.A. |
Experimental groups, WGS-like libraries prepared separately using indexes 1 to 8; empty controls, indexes 33–40 and reagents used but without sample DNA; balancing library controls, samples prepared and indexed with indexes 41–48 independently and pooled with test samples after ssCir formation; all groups, total reads of all the indexes. Reads were presented after applying a Q30 > 60% filter
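The per-index rates in the last column can be approximated from the read counts in the table. The source does not state the formula, so the sketch below assumes one: the mean number of control reads per repeat is divided across the eight control indexes and compared with the mean number of reads in the experimental groups. The function and constant names are illustrative.

```python
from statistics import mean

# Read counts copied from the table above (three repeats each).
EXPERIMENTAL_READS = [41_686_373, 44_974_964, 42_874_988]
EMPTY_CONTROL_READS = [9, 14, 6]
BALANCING_CONTROL_READS = [612, 650, 724]
N_INDEXES = 8  # each control group uses eight barcodes


def reads_per_misassignment(control_reads, experimental_reads, n_indexes):
    """Return N such that about one read in N lands on a single control index.

    Assumed formula (not stated in the source):
    N = mean(experimental reads) / (mean(control reads) / n_indexes)
    """
    per_index = mean(control_reads) / n_indexes
    if per_index == 0:
        raise ValueError("no mis-assigned reads observed; rate is undefined")
    return mean(experimental_reads) / per_index


empty = reads_per_misassignment(EMPTY_CONTROL_READS, EXPERIMENTAL_READS, N_INDEXES)
balancing = reads_per_misassignment(BALANCING_CONTROL_READS, EXPERIMENTAL_READS, N_INDEXES)
print(f"empty controls: 1 in {empty / 1e6:.1f} million reads")
print(f"balancing controls: 1 in {balancing / 1e6:.1f} million reads")
```

Under this assumption the results round to about 1 in 36 million reads for the empty controls and 1 in 0.5 million reads for the balancing library controls, in line with the tabulated values.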
Fig. 3 (a) Total contamination rates for each pooling scenario. Three replicates are presented with different types of bars. Wider bars with dashed borders represent the average of the three replicates, the exact values of which are labeled on top. The exact values are shown in Additional file 1: Table S3. (b) Index split rates when pooling was performed after PCR amplification. Average ± standard deviation (SD) of three replicates is presented. The theoretical split rate for each index is 0.125. (c) Index contamination matrix when pooling occurred after PCR purification. Indexes 1 to 8 were assigned to Notch1, EFEMP2, Lox, USP9Y, HIST1H1D, C7orf61, GXYLT2, and TM9SF4, respectively. Read numbers and percentages are shown with or without Q30 filter application. Green shading, proper combinations; brown and yellow shading, improper combinations; yellow shading, improper combinations likely resulting from contamination during oligo synthesis. Index contamination rates were calculated by dividing the sum of contaminated reads by the sum of total reads for all eight indexes
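The two quantities named in this caption can be sketched for an index-by-target read matrix. The counts below are hypothetical and use three indexes instead of eight to keep the example short; the assumption is that row i holds reads carrying index i, column j holds reads mapped to the gene assigned to index j, and the diagonal therefore holds the proper combinations.

```python
def contamination_rate(matrix):
    """Sum of improper (off-diagonal) reads divided by the sum of all reads."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix contains no reads")
    proper = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - proper) / total


def split_rates(matrix):
    """Fraction of all reads assigned to each index (row totals / grand total)."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix contains no reads")
    return [sum(row) / total for row in matrix]


# Hypothetical read counts, not taken from the paper.
reads = [
    [1000, 1, 0],
    [2, 1000, 0],
    [0, 1, 996],
]
print("contamination rate:", contamination_rate(reads))
print("split rates:", split_rates(reads))
```

With n equally pooled indexes, each split rate is expected to be close to 1/n, which for eight indexes is the theoretical value of 0.125 quoted in the caption.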
Fig. 4 The effect of the Q30 filter on total contamination rate and percent of remaining reads. Reads from libraries pooled after PCR amplification were filtered. Total contamination rate is shown in red and percent of remaining reads is shown in blue. Reads with index 7 were excluded from the calculation. Mapped reads were filtered by different criteria for the Q30 score. Averages ± SD of three replicates are presented. The average values are labeled on top
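A Q30-based read filter of the kind varied in this figure can be sketched as follows. The source does not specify how the criterion is evaluated, so this example assumes that a read passes when the fraction of its bases with Phred quality of at least 30 exceeds a threshold, and that qualities use the common Phred+33 ASCII encoding. The read names and quality strings are made up.

```python
def q30_fraction(quality: str, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is >= 30 (Phred+33 encoding assumed)."""
    if not quality:
        raise ValueError("empty quality string")
    high = sum(1 for ch in quality if ord(ch) - offset >= 30)
    return high / len(quality)


def filter_reads(reads, min_fraction):
    """Keep reads whose Q30 fraction exceeds min_fraction.

    Returns the kept reads and the fraction of reads remaining.
    """
    kept = [(name, qual) for name, qual in reads if q30_fraction(qual) > min_fraction]
    remaining = len(kept) / len(reads) if reads else 0.0
    return kept, remaining


# Hypothetical reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
reads = [("read_a", "IIII5"), ("read_b", "I5555"), ("read_c", "IIIII")]
kept, remaining = filter_reads(reads, min_fraction=0.60)
print([name for name, _ in kept], remaining)
```

Raising `min_fraction` removes more low-quality reads, which is the trade-off between contamination rate and remaining reads that the figure plots.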
Level of contamination for PCR-free library on BGISEQ-500
| a. Sample arrangement of PCR-free library (HPV) | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Template | YH-1 | | YH-1 | YH-1 | YH-1 | YH-1 | | YH-1 | YH-1 | YH-1 | YH-1 | YH-1 |
| Barcode 1 | | | | | | | | | | | | |
| Template | YH-2 | YH-2 | | YH-2 | YH-2 | YH-2 | YH-2 | YH-2 | YH-2 | | YH-2 | YH-2 |
| Barcode 2 | | | | | | | | | | | | |
| Template | YH-3 | YH-3 | YH-3 | YH-3 | | YH-3 | YH-3 | YH-3 | YH-3 | YH-3 | YH-3 | YH-3 |
| Barcode 3 | | | | | | | | | | | | |
| Template | YH-4 | YH-4 | YH-4 | YH-4 | YH-4 | YH-4 | | YH-4 | YH-4 | YH-4 | YH-4 | YH-4 |
| Barcode 4 | | | | | | | | | | | | |
| Template | | YH-5 | YH-5 | YH-5 | YH-5 | | YH-5 | YH-5 | YH-5 | YH-5 | YH-5 | YH-5 |
| Barcode 5 | | | | | | | | | | | | |
| Template | YH-6 | YH-6 | YH-6 | YH-6 | YH-6 | YH-6 | YH-6 | | | YH-6 | YH-6 | YH-6 |
| Barcode 6 | | | | | | | | | | | | |

| b. Performance of SeqHPV | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Library | Sample | Total | Mapped | Mapped | Major Types | HBB Score | HPV Score | False | False |
| 1 | MGIP002 | 2470768 | 1800287 | 72.90% | HPV11,HBB | 10 | 10 | 0 | 0 |
| 2 | MGIP022 | 2653747 | 2526477 | 95.20% | HPV18,HBB | 10 | 10 | 0 | 0 |
| 3 | MGIP029 | 1793620 | 690665 | 94.30% | HPV31,HBB | 10 | 10 | 0 | 0 |
| 4 | MGIP043 | 1511740 | 1210189 | 80.10% | HPV33,HBB | 10 | 10 | 0 | 0 |
| 5 | MGIP049 | 1641545 | 1447782 | 88.20% | HPV52,HBB | 10 | 10 | 0 | 0 |
| 6 | MGIP069 | 2800830 | 1942883 | 69.40% | HPV45,HPV11,HBB | 10 | 10 | 0 | 0 |

| c. Index contamination rate of PCR-free libraries | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | Library | HBB | HPV11 | HPV18 | HPV31 | HPV33 | HPV52 | HPV45 |
| Read depth | 1 | 2994608 | | | | | | |
| | 2 | 2722311 | | | | | | |
| | 3 | 1891540 | | | | | | |
| | 4 | 2936888 | | | | | | |
| | 5 | 2289158 | | | | | | |
| | 6 | 1747934 | | | | | | |
| | 8 | 27 | | | | | | |
| Percent of read depth | 1 | | | | | | | |
| | 2 | | | | | | | |
| | 3 | | | | | | | |
| | 4 | | | | | | | |
| | 5 | | | | | | | |
| | 6 | | | | | | | |
| | 8 | | | | | | | |

a. Positive samples are in italic bold, negative samples with YH genome only are in black font, water controls are bolded, and sample indexes are in italic. b. Empty controls are in italic. Index 7 data were excluded because of contamination during oligo synthesis. c. Italic bold, proper combinations; italic, improper combinations. The average sample-to-sample mis-assignment rate is 0.0004% without any filtering
Contamination rate of PCR-introduced adapter library preparation method using MGI lung cancer kit
| a. Contamination rate before removing duplication | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Index | Repeats | EGFR (L858R) | | | KRAS (G12D) | | | EGFR (19del) | | | NRAS (p.Q61H) | | |
| | | Reference reads | Mut reads | Mut allele rate | Reference reads | Mut reads | Mut allele rate | Reference reads | Mut reads | Mut allele rate | Reference reads | Mut reads | Mut allele rate |
| 1 | Repeat 1 | 1,423,408 | 4 | negative | 52,589 | 34 | negative | 31,150 | 0 | negative | 188,086 | 0 | negative |
| | Repeat 2 | 1,158,060 | 4 | negative | 54,331 | 33 | negative | 31,047 | 0 | negative | 201,147 | 0 | negative |
| 2 | Repeat 1 | | | | 59,590 | 39 | negative | 40,077 | 0 | negative | 205,321 | 0 | negative |
| | Repeat 2 | | | | 57,175 | 27 | negative | 36,381 | 0 | negative | 192,472 | 0 | negative |
| 3 | Repeat 1 | 1,604,176 | 6 | negative | | | | 32,294 | 0 | negative | 199,296 | 2 | negative |
| | Repeat 2 | 1,430,975 | 5 | negative | | | | 36,961 | 0 | negative | 200,989 | 4 | negative |
| 4 | Repeat 1 | 1,321,771 | 3 | negative | 56,766 | 20 | negative | | | | 150,478 | 0 | negative |
| | Repeat 2 | 1,275,573 | 7 | negative | 59,610 | 31 | negative | | | | 204,544 | 0 | negative |

| b. Contamination rate after removing duplication | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Index | Repeats | EGFR (L858R) | | | KRAS (G12D) | | | EGFR (19del) | | | NRAS (p.Q61H) | | |
| | | Reference templates | Mut templates | Mut allele rate | Reference templates | Mut templates | Mut allele rate | Reference templates | Mut templates | Mut allele rate | Reference templates | Mut templates | Mut allele rate |
| 1 | Repeat 1 | 26,824 | 0 | negative | 6889 | 2 | negative | 5295 | 0 | negative | 10,798 | 0 | negative |
| | Repeat 2 | 21,904 | 0 | negative | 6209 | 1 | negative | 5088 | 0 | negative | 9617 | 0 | negative |
| 2 | Repeat 1 | | | | 6903 | 3 | negative | 5509 | 0 | negative | 10,770 | 0 | negative |
| | Repeat 2 | | | | 6757 | 2 | negative | 5565 | 0 | negative | 9911 | 0 | negative |
| 3 | Repeat 1 | 23,017 | 0 | negative | | | | 4622 | 0 | negative | 8788 | 0 | negative |
| | Repeat 2 | 23,485 | 0 | negative | | | | 5274 | 0 | negative | 9391 | 0 | negative |
| 4 | Repeat 1 | 31,688 | 0 | negative | 7203 | 0 | negative | | | | 13,032 | 0 | negative |
| | Repeat 2 | 30,261 | 0 | negative | 8300 | 1 | negative | | | | 13,937 | 0 | negative |

Correct positive calls are in bold italic. Theoretical percentages are indicated in brackets
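For the cells reported only as "negative", the underlying mutant fraction can be estimated from the counts. The sketch below assumes the usual definition, mutant count divided by mutant plus reference count; the source does not give the formula or the calling threshold, so no call is made here. The counts are the KRAS (G12D) values for index 1, repeat 1, before and after duplicate removal.

```python
def mutant_fraction(reference: int, mutant: int) -> float:
    """Mutant count divided by total (reference + mutant) count.

    Assumed definition; the source does not state how the rate is computed.
    """
    total = reference + mutant
    if total == 0:
        raise ValueError("no reads or templates at this site")
    return mutant / total


# KRAS (G12D), index 1, repeat 1, copied from the tables above.
before_dedup = mutant_fraction(reference=52_589, mutant=34)  # reads
after_dedup = mutant_fraction(reference=6_889, mutant=2)     # unique templates

print(f"before removing duplication: {before_dedup:.4%}")
print(f"after removing duplication:  {after_dedup:.4%}")
```

For this site the fraction is lower when counting unique templates than when counting raw reads.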