Austin N Southard-Smith1, Alan J Simmons1, Bob Chen1,2, Angela L Jones3, Marisol A Ramirez Solano4, Paige N Vega1, Cherie' R Scurrah1, Yue Zhao5, Michael J Brenan6, Jiekun Xuan5, Martha J Shrubsole7,8, Ely B Porter5, Xi Chen5, Colin J H Brenan6, Qi Liu4, Lauren N M Quigley9, Ken S Lau10,11,12,13.
Abstract
BACKGROUND: The increasing demand for single-cell RNA-sequencing (scRNA-seq) experiments, in both the number of experiments and the number of cells queried per experiment, necessitates higher sequencing depth coupled with high data quality. New high-throughput sequencers, such as the Illumina NovaSeq 6000, enable this demand to be met in a cost-effective manner. However, current scRNA-seq library designs present compatibility challenges with newer sequencing technologies, such as index hopping, and their ability to generate high-quality data has yet to be systematically evaluated.
Keywords: Exclusion amplification; Index hopping; Multiplexing; Next-generation sequencing; NovaSeq; Single-cell RNA sequencing; TruSeq; inDrop
Year: 2020 PMID: 32616006 PMCID: PMC7331155 DOI: 10.1186/s12864-020-06843-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Mechanism for index hopping and its effects on sequencing library demultiplexing. a-e Illustration of index hopping due to (a) free adapter molecules remaining after purification post-PCR, resulting in (b) mis-priming of a single-stranded library molecule. c The mis-primed library molecule is extended via ExAmp polymerase to generate (d) a complete library molecule with an incorrect sample index assigned. e Both correct and index-hopped molecules can form clusters on the flow cell. f-i Demultiplexing runs with single- or dual-indexed libraries with index hopping. f The case with a single index and no index hopping, where the read(s) for a cluster are associated with a specific sample index (green with green and blue with blue) added to each molecule during library preparation, allowing reads to be assigned to their correct library of origin. g The case as above but with index hopping (a blue index now marks a green cluster), where that read will be assigned to the wrong library. h A unique dual-indexed strategy associates 2 indexes with each library molecule of a single sample. Here, library 1 = yellow + green, library 2 = purple + blue. i The case as above but with index hopping, which results in reads displaying unanticipated combinations of indexes (e.g., purple + green). The reads associated with unanticipated index combinations can then be filtered out
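The filtering step described in panels h and i can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the index sequences are taken from the TruDrop rows of the NovaSeq table, while the function names, the exact-match rule (no mismatch tolerance), and the assumption that the i5 read is already in the listed orientation are all hypothetical simplifications.

```python
from collections import Counter

# Expected (i7, i5) pairs. Sequences come from the TruDrop rows of the
# NovaSeq table; treating them as exact-match keys is an assumption.
EXPECTED_PAIRS = {
    ("CCGCGGTT", "AGCGCTAG"): "TruDrop Mouse 4",
    ("TTATAACC", "GATATCGA"): "TruDrop Mouse 5",
}


def assign_library(i7, i5, expected=EXPECTED_PAIRS):
    """Return the library name for an index pair, or None if the pair is unexpected."""
    return expected.get((i7, i5))


def demultiplex(index_reads, expected=EXPECTED_PAIRS):
    """Count reads per library; unexpected index combinations are counted as discarded."""
    counts = Counter()
    for i7, i5 in index_reads:
        library = assign_library(i7, i5, expected)
        counts[library if library is not None else "discarded"] += 1
    return counts


# Hypothetical index reads: two valid pairs and one hopped pair
# (i7 of Mouse 5 combined with i5 of Mouse 4).
reads = [
    ("CCGCGGTT", "AGCGCTAG"),
    ("TTATAACC", "GATATCGA"),
    ("TTATAACC", "AGCGCTAG"),
]
print(demultiplex(reads))
```

With a single index, the hopped read in this toy input would carry a valid i7 and be silently assigned to the wrong library; with unique dual indexes it falls outside the expected pairs and can be dropped.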
Sequencing yield and quality of V2 inDrop with/without standard Illumina libraries
| Sequencing run | Sequencer | Sequencing kit | Targeted inDrop read depth | Observed inDrop read depth | Mean transcript quality score | Mean barcode and UMI quality score |
|---|---|---|---|---|---|---|
| V2 structure mouse 1 | NextSeq | Mid-throughput | 130,000,000 | 148,238,920 | 30.72 | 30.55 |
| V2 structure mouse 1 + 10% Illumina PhiX | MiSeq^a | Nano | 900,000 | 745,903 | 34.94 | 32.24 |
| V2 structure mouse 2 and 3 + 15% Illumina PhiX | NextSeq | Mid-throughput | 110,500,000^b | 122,520,660 | 33.09 | 33.08 |
^a The inDrop read count for the MiSeq test (745,903) was likely lower than the expected 1 million reads because the loading concentration of inDrop libraries has been optimized on the NextSeq but not on the MiSeq. On the NextSeq, we have found that loading the inDrop libraries at 1.5x the listed optimal loading concentration improves clustering efficiency on the flow cell. For this MiSeq run, the inDrop libraries were loaded at the standard loading concentration
^b The targeted read depth is slightly lower here than for V2 structure mouse 1 because 15% of the read depth is expected to be taken up by PhiX
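The arithmetic behind footnote b can be written out as a short sketch. It assumes the run's total target was the same 130,000,000 reads as for V2 structure mouse 1 (the footnote implies this but does not state it), and the function name is hypothetical.

```python
def targeted_library_reads(total_reads, phix_percent):
    """Reads left for the inDrop libraries after reserving a PhiX spike-in share.

    Integer arithmetic is used so the result is exact for whole percentages.
    """
    if not 0 <= phix_percent <= 100:
        raise ValueError("phix_percent must be between 0 and 100")
    return total_reads * (100 - phix_percent) // 100


# Assumed total target of 130,000,000 reads with a 15% PhiX spike-in.
nextseq_target = targeted_library_reads(130_000_000, 15)
print(f"{nextseq_target:,}")
```

Under that assumption the result matches the 110,500,000 reads listed in the table.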
Fig. 2 Quality of single-indexed inDrop libraries sequenced alongside Illumina libraries and data loss from index hopping. a The base calling accuracy plot for an inDrop V2 library on a NextSeq sequencing run, depicting the spread of quality scores as each base is sequenced. This plot consists of a series of box plots, where each box plot maps the percent of clusters in each image of the flow cell with quality scores ≥30 (called Q30) in each cycle. The first 100 cycles correspond to the transcript read; the next 6 correspond to the i7 index read; the final 50 correspond to the cell barcode + UMI reads. The last 6 cycles read into the poly A tail due to the variable length of the inDrop cell barcodes. b The base calling accuracy plot for an inDrop V2 library sequenced alongside the control Illumina library, PhiX, on a NextSeq. When sequencing alongside PhiX, the 7-base-long i7 and i5 index reads are used so that PhiX reads can be filtered out and discarded during demultiplexing. c Plot of the calculated proportion of cell barcodes that need to be discarded from single-indexed sequencing runs at different levels of multiplexing. We assume each sample will contain ~3000 cell barcodes
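As a reminder of what the Q30 metric in panels a and b measures: a Phred quality score Q corresponds to a base-call error probability of 10^(-Q/10), so Q30 means roughly 1 error in 1000 calls. The sketch below computes the percent of clusters at or above Q30 per cycle; the function names and the toy quality matrix are hypothetical, and it ignores the per-image grouping used for the box plots.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def percent_q30_per_cycle(quality_by_cluster, threshold=30):
    """For each cycle, return the percent of clusters with quality >= threshold.

    quality_by_cluster is a list of per-cluster quality lists, all the same length.
    """
    n_clusters = len(quality_by_cluster)
    n_cycles = len(quality_by_cluster[0])
    return [
        100.0 * sum(1 for read in quality_by_cluster if read[cycle] >= threshold) / n_clusters
        for cycle in range(n_cycles)
    ]


# Hypothetical quality scores: 4 clusters x 3 cycles.
toy_qualities = [
    [35, 31, 12],
    [36, 29, 30],
    [30, 33, 8],
    [38, 30, 31],
]
print(percent_q30_per_cycle(toy_qualities))
print(phred_to_error_probability(30))
```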
Evaluation of the raw yield and quality of TruDrop libraries when sequenced on the NovaSeq
| Library | Sequencer | i7 | i5 | Targeted inDrop Read Depth | Observed inDrop Reads | Average % of the lane | Percent perfect index reads | Mean transcript Quality Score | Mean Barcodes and UMI quality score |
|---|---|---|---|---|---|---|---|---|---|
| TruDrop Mouse 4 | NovaSeq 6000 | CCGCGGTT | AGCGCTAG | 50,000,000 | 53,655,662 | 0.64% | 96.99% | 35.57 | 36.22 |
| TruDrop Mouse 5 | NovaSeq 6000 | TTATAACC | GATATCGA | 50,000,000 | 44,554,464 | 0.53% | 94.13% | 35.53 | 36.19 |
| V2 Mouse 2 + 15% Illumina PhiX | NextSeq | GATATCGA | – | 65,000,000 | 57,847,546 | 37.68% | 91.72% | 33.06 | 33.02 |
| V2 Mouse 3 + 15% Illumina PhiX | NextSeq | GCCAAT | – | 65,000,000 | 64,673,114 | 42.14% | 92.64% | 33.12 | 33.14 |
| V2 Mouse 4 + 99% Illumina PhiX | NovaSeq 6000 | CTTGTA | – | 50,000,000 | 10,985,817 | 0.09% | – | – | – |
| V2 Mouse 5 + 99% Illumina PhiX | NovaSeq 6000 | GTGAAA | – | 50,000,000 | 10,144,573 | 0.08% | – | – | – |
| V2 Mouse 4 + 0% Illumina PhiX | NextSeq | CTTGTA | – | 100,000,000 | 97,872,275 | 24% | – | 31.89 | 28.67 |
| V2 Mouse 5 + 0% Illumina PhiX | NextSeq | GTGAAA | – | 100,000,000 | 92,742,820 | 23% | – | 32.03 | 28.09 |
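One way to read the table above is to express each observed read count as a percentage of its target. The sketch below does this for a few rows; the values are copied from the table, while the function name and the choice of rows are illustrative only.

```python
def percent_of_target(observed_reads, targeted_reads):
    """Observed read count expressed as a percentage of the targeted read depth."""
    return 100.0 * observed_reads / targeted_reads


# (library, targeted reads, observed reads), copied from the table rows.
rows = [
    ("TruDrop Mouse 4 (NovaSeq 6000)", 50_000_000, 53_655_662),
    ("TruDrop Mouse 5 (NovaSeq 6000)", 50_000_000, 44_554_464),
    ("V2 Mouse 4 + 99% PhiX (NovaSeq 6000)", 50_000_000, 10_985_817),
    ("V2 Mouse 5 + 99% PhiX (NovaSeq 6000)", 50_000_000, 10_144_573),
]

for library, targeted, observed in rows:
    print(f"{library}: {percent_of_target(observed, targeted):.1f}% of target")
```

On these numbers the TruDrop libraries land near their target on the NovaSeq, while the V2 libraries on the same instrument reach only about a fifth of theirs.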
Fig. 3 Variations of inDrop library structures from the perspective of sequencing. a A standard Illumina library contains P7 and P5 adapter sites that are used to bind Illumina sequencing flow cells. i7- and i5-indexes are incorporated onto the P7 and P5 sides, respectively, to adopt a dual-indexing strategy. On either side of the insert are sites (R1 and R2) where standard Illumina sequencing primers are used to read across both sides of the insert. The reverse complement of these read priming sites then allows for the priming and subsequent reading of the i7 and i5 sample indexes. b The inDrop V2 library structure also incorporates the P7 and P5 flow cell adapter binding sites, with a single i7 index. The V2 structure utilizes an R1 priming site that is a truncated version of the standard R2 priming site, and an R2 priming site that is a deprecated R2 priming site. In addition, the R1 and R2 of the V2 structure are flipped, so that the insert is read backwards relative to a normal Illumina library. c The TruSeq-inDrop (TruDrop) structure incorporates a second (i5) index and the standard Illumina R1 and R2 priming sites that are used in all Illumina TruSeq libraries
Fig. 4 Sequencing quality of TruDrop libraries on ExAmp chemistry sequencers. a The base calling accuracy plot for two dual-indexed TruDrop libraries on iSeq alongside PhiX. Cycles 1–50 depict the quality scores for the cell barcode + UMI read. Cycles 51–151 are sequence data that will be trimmed and discarded during analysis. Cycles 152–159 correspond to the i7 index read. Cycles 160–167 are the i5 index read. Cycles 168–318 are the transcript read. For the purpose of direct comparison, only cycles 168–267 are marked as transcript, as only 100 bases of transcript were sequenced for the V2 libraries. b The base calling accuracy plot for the same two TruDrop libraries when sequenced on the NovaSeq alongside 107 other libraries. c The base calling accuracy plot for 24 dual-indexed TruDrop libraries sequenced on a NovaSeq alongside 186 other libraries
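The cycle layout given for panel a can be captured as a small lookup, which makes the segment lengths explicit (for example, both index reads span 8 cycles). The cycle ranges are taken from the caption; the data structure and function names are a hypothetical sketch.

```python
# (first cycle, last cycle, segment) as listed in the caption for panel a.
SEGMENTS = [
    (1, 50, "cell barcode + UMI"),
    (51, 151, "trimmed"),
    (152, 159, "i7 index"),
    (160, 167, "i5 index"),
    (168, 318, "transcript"),
]


def segment_for_cycle(cycle):
    """Return the name of the read segment that a 1-based cycle number falls in."""
    for first, last, name in SEGMENTS:
        if first <= cycle <= last:
            return name
    raise ValueError(f"cycle {cycle} is outside the run (1-318)")


def segment_lengths():
    """Number of cycles in each segment (ranges are inclusive)."""
    return {name: last - first + 1 for first, last, name in SEGMENTS}


print(segment_lengths())
print(segment_for_cycle(155))
```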
Comparison of data alignment quality of the inDrop V2 and TruDrop structures
| Sample | Sequencer | Sequencing depth (reads) | Mapped reads (%) | Uniquely aligned reads (%) |
|---|---|---|---|---|
| V2 Mouse | NextSeq | 98,606,967 | 96.38 | 67.15 |
| TruDrop Mouse | NovaSeq | 43,657,381 | 96.56 | 73.48 |
| V2 Human | NextSeq | 55,507,773 | 96.11 | 84.44 |
| TruDrop Human | NovaSeq | 188,061,057 | 95.12 | 87.23 |
Fig. 5 Comparison of cell types identified between inDrop V2 libraries on NextSeq and TruDrop on NovaSeq. a and c Combined t-SNE analysis of cells identified from a TruDrop and a V2 library prepared from the same samples of (a) mouse and (c) human tumors. b and d sc-UniFrac tree representations of subpopulation structures for the libraries presented in a and c, respectively. Cell groups enriched using V2 NextSeq libraries have red branches, while those enriched using TruDrop NovaSeq have blue branches. Thickness of branches represents level of enrichment. Distance values range from 0 to 1, with 0 representing complete overlap between two datasets