Yi Fang1, Amogh Changavi1, Manyun Yang2, Luo Sun1, Aihua Zhang1, Daniel Sun1, Zhiyi Sun1, Boce Zhang3, Ming-Qun Xu1.
Abstract
The requirement of a large input amount (500 ng) for Nanopore direct RNA-seq presents a major challenge for low input transcriptomic analysis and early pathogen surveillance. The high RNA input requirement is attributed to significant sample loss associated with library preparation using solid-phase reversible immobilization (SPRI) beads. A novel solid-phase catalysis strategy for RNA library preparation to circumvent the need for SPRI bead purification to remove enzymes is reported here. This new approach leverages concurrent processing of non-polyadenylated transcripts with immobilized poly(A) polymerase and T4 DNA ligase, followed by directly loading the prepared library onto a flow cell. Whole transcriptome sequencing, using the human pathogen Listeria monocytogenes as a model, demonstrates that this new method displays little sample loss, takes much less time, and generates higher sequencing throughput correlated with reduced nanopore fouling compared to the current library preparation for 500 ng input. Consequently, this approach enables Nanopore low-input direct RNA-seq, improving pathogen detection and transcript identification in a microbial community standard with spike-in transcript controls. In addition, as is evident in the bioinformatic analysis, the new method provides accurate RNA consensus with high fidelity and identifies higher numbers of expressed genes for both high and low input RNA amounts.
Keywords: Oxford Nanopore Technologies; direct RNA-seq; foodborne pathogen; immobilized enzymes; next-generation sequencing; transcriptome
Year: 2021 PMID: 34837482 PMCID: PMC8787394 DOI: 10.1002/advs.202103373
Source DB: PubMed Journal: Adv Sci (Weinh) ISSN: 2198-3844 Impact factor: 16.806
Figure 1. Schematic illustration of flowcharts for various library preparation protocols. A) Standard protocol using soluble enzymes with a reverse transcription step (Sol‐RT) and two bead‐based purification steps. B) Protocol using immobilized enzymes for coupled enzymatic reactions (Im‐cpl) without bead‐based purification. C) Protocol using soluble enzymes for sequential reactions (Sol‐seq) with single bead‐based purification. D) Protocol using soluble enzymes for coupled enzymatic reactions (Sol‐cpl) with single bead‐based purification. E) Protocol using soluble enzymes for sequential reactions without bead‐based purification [Sol (‐BP)].
Comparison of library metrics of Nanopore direct RNA‐seq using soluble or immobilized enzymes by sequential or coupled enzymatic protocols using 500 ng of RNA input
| RNA input: 500 ng | Sol‐seq | Sol‐cpl | Sol‐RT | Im‐cpl (1/3) | Im‐cpl | Sol (‐BP) |
|---|---|---|---|---|---|---|
| Bead‐based purification | Yes | Yes | Yes | No | No | No |
| Recovery rate [%] | 34 | 34 | 24 | 106 | 131 | 100 |
| Loading [ng] | 186.4 | 208.8 | 112.0 | 190.3 | 623.2 | 164.4 |
| Loading [µL] | 20 | 20 | 20 | 13.4 | 38 | 10 |
| Loading percentage [%] | 100 | 100 | 100 | 34 | 100 | 25 |
| Reads generated [K]/flow cell | 141.4 | 699.3 | 1185.3 | 1325.0 | – | 327.9 |
| Reads generated [K]/library | 141.4 | 699.3 | 1185.3 | – | 2566.6 | 327.9 |
| RNA‐seq time [h] | 20.2 | 48.8 | 53.0 | 65.8 | – | 31.8 |
| Mean read quality | 10.2 | 10.1 | 10.3 | 10.2 | – | 10.3 |
| Median read quality | 10.4 | 10.2 | 10.4 | 10.2 | – | 10.5 |
| Mean read quality (‐rRNA) | 9.2 | 9.0 | – | 9.1 | – | 9.1 |
| Median read quality (‐rRNA) | 9.2 | 8.8 | – | 8.8 | – | 8.9 |
Recovery rate is the amount of RNA after library preparation expressed as a percentage of the amount before preparation, measured for each method with a minimum of two replicates.
The entire library prepared by each soluble enzyme protocol, i.e., Sol‐seq, Sol‐cpl, and Sol‐RT, was loaded on a single flow cell. For Im‐cpl (1/3), one‐third of the library was loaded on a single flow cell; for Im‐cpl, the entire library was split into three parts and loaded on three flow cells. A total of 20 µL collected after bead‐based purification was loaded on a MinION R9.4 flow cell.
Because of the limited library loading volume of each flow cell, the maximum loading volume in our tests was 15 µL. The amount of library loaded per sequencing run for Im‐cpl (1/3) and Sol (‐BP) was comparable to that for Sol‐seq. The entire Im‐cpl library, except for 2 µL, was loaded.
All protocols generated a mapped rate of ≈99% before the removal of rRNA sequences.
RNA‐seq time is the time the sequencing run continues until the flow cell retains fewer than 10 active pores.
rRNA reads were removed in the data analysis.
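The two footnote definitions above (recovery rate and RNA‐seq time) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `(hours, active_pores)` input format, and the example numbers are hypothetical and are not taken from the study's data or software.

```python
def recovery_rate(input_ng, recovered_ng):
    """Percentage of the input RNA mass that is present after library preparation."""
    if input_ng <= 0:
        raise ValueError("input_ng must be positive")
    return 100.0 * recovered_ng / input_ng


def rna_seq_time(pore_log, min_active=10):
    """Return the first time point (hours) with fewer than `min_active` active pores.

    `pore_log` is an iterable of (hours, active_pores) pairs sorted by time
    (an assumed format). Returns None if the threshold is never crossed.
    """
    for hours, active in pore_log:
        if active < min_active:
            return hours
    return None


# Hypothetical values chosen for illustration only.
print(recovery_rate(500.0, 170.0))                       # 34.0
print(rna_seq_time([(0, 500), (10, 120), (20.2, 9)]))    # 20.2
```

A recovery rate above 100%, as reported for the protocols without bead‐based purification, simply means that the measured mass after preparation exceeds the input mass.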
Figure 2. Direct RNA reads and nanopore fouling rate. A) Direct RNA reads generated from single flow cell run of each L. monocytogenes total RNA library using various protocols as depicted in Figure 1. B) Reads generated with low RNA input from 100, 50, 20, and 10 ng. 100 ng of RNA input was also used for soluble enzyme with a reverse transcription step (Sol‐RT 100). C) Nanopore fouling demonstrated by nanopore retaining percentage plotted against flow cell run time using various library preparation protocols.
Bioinformatics analysis of the community standard sample using Im‐cpl protocol
| Library preparation protocol | | Im‐cpl | Sol‐RT |
|---|---|---|---|
| Community standard sample | Total input [ng] | 50 | |
| | | 39 (78%) | |
| | | 10 (20%) | |
| | | 1 (2%) | |
| | Total yield [reads] | 151 877 | 121 037 |
| | Failed reads for analysis | 16 331 (10.8%) | 37 228 (30.8%) |
| | Mean read length [nt] | 537 | 129 |
| Bioinformatics pipelines (MG‐RAST*) | | 12 164 (55.7%) | 158 (69.0%) |
| | | 4158 (19.0%) | 32 (14.0%) |
| | | 2036 (9.3%) | 10 (4.4%) |
Figure 3. Poly(A) length analysis of Im‐cpl and Sol‐RT. Normalized 3′ poly(A) length subgroups were generated using the datasets produced with 500 ng L. monocytogenes RNA by the four protocols, depicted in Figure 2, with 10 nt bin size. The mean and median poly(A) lengths of different datasets are shown in Table S2 of the Supporting Information.
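The 10 nt binning and normalization described in this caption can be illustrated with a short Python sketch. It assumes that per‐read poly(A) lengths have already been estimated by an upstream tool; the function name and the example lengths are hypothetical, and the study's actual analysis code is not shown in the source.

```python
from collections import Counter


def normalized_polya_bins(lengths, bin_size=10):
    """Map each bin start to the fraction of reads whose poly(A) length
    falls in [start, start + bin_size)."""
    if not lengths:
        return {}
    # Integer division assigns every length to the lower edge of its bin.
    counts = Counter((int(n) // bin_size) * bin_size for n in lengths)
    total = len(lengths)
    return {start: counts[start] / total for start in sorted(counts)}


# Hypothetical poly(A) lengths (nt) for eight reads.
example = [5, 12, 18, 25, 31, 37, 39, 44]
print(normalized_polya_bins(example))
# {0: 0.125, 10: 0.25, 20: 0.125, 30: 0.375, 40: 0.125}
```

Normalizing by the total number of reads makes datasets of different sequencing depth directly comparable, which is presumably why the subgroups in the figure are reported as normalized values.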
Figure 4. Venn diagrams and transcripts per million (TPM) Pearson correlations of RNA‐seq data from the Im‐cpl and Sol‐RT protocols and different L. monocytogenes RNA input amounts. A) Im‐cpl 500 (1/3) (single flow cell run of one‐third of the RNA library) and Sol‐RT 500. B) Im‐cpl 500 (the entire library run on three flow cells) and Sol‐RT 500. C) Im‐cpl 100 and Sol‐RT 100.
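As a rough illustration of the quantities compared in this figure, the sketch below computes TPM with the commonly used length‐normalized definition, finds the genes shared by two datasets (the Venn overlap), and computes a Pearson correlation over those shared genes. The function names and toy counts are hypothetical. The source does not state whether TPM was length‐normalized or log‐transformed before correlation, so both choices here are assumptions.

```python
import math


def tpm(counts, lengths_nt):
    """Transcripts per million: length-normalized counts rescaled to sum to 1e6."""
    rates = {g: counts[g] / lengths_nt[g] for g in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    return {g: 1e6 * r / total for g, r in rates.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical read counts and gene lengths for two protocols.
lengths = {"geneA": 1000, "geneB": 500, "geneC": 2000, "geneD": 800}
im_cpl = tpm({"geneA": 100, "geneB": 40, "geneC": 300, "geneD": 8}, lengths)
sol_rt = tpm({"geneA": 90, "geneB": 50, "geneC": 280}, lengths)

shared = sorted(set(im_cpl) & set(sol_rt))   # overlap region of the Venn diagram
unique_im = sorted(set(im_cpl) - set(sol_rt))
r = pearson([im_cpl[g] for g in shared], [sol_rt[g] for g in shared])
print(shared, unique_im, round(r, 3))
```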
Figure 5. The coverage of genes identified from various protocols with different RNA input amounts. A) Gene coverage from the entire Im‐cpl 500 library (three sequencing run datasets combined) from a single 500 ng input library. B) Gene coverage from a Sol‐RT 500 ng library (Sol‐RT 500). C) Gene coverage from Im‐cpl protocol with 100 ng of RNA input (Im‐cpl 100). D) Gene coverage from Sol‐RT protocol with 100 ng of RNA input (Sol‐RT 100).
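One plausible way to obtain a per‐gene coverage percentage like the one plotted here is to measure the fraction of gene positions covered by at least one aligned read. The Python sketch below implements that definition. It is an assumption rather than the authors' documented method, and the function name, half‐open interval convention, and coordinates are hypothetical.

```python
def gene_coverage_percent(gene_start, gene_end, alignments):
    """Percentage of positions in [gene_start, gene_end) covered by >= 1 alignment.

    `alignments` is an iterable of (start, end) half-open intervals on the
    same coordinate system as the gene.
    """
    length = gene_end - gene_start
    if length <= 0:
        raise ValueError("gene_end must be greater than gene_start")
    # Clip every alignment to the gene body, then sweep left to right.
    clipped = sorted((max(s, gene_start), min(e, gene_end)) for s, e in alignments)
    covered = 0
    cur_end = gene_start  # rightmost position already counted
    for s, e in clipped:
        s = max(s, cur_end)  # skip the part that overlaps earlier alignments
        if e > s:
            covered += e - s
            cur_end = e
    return 100.0 * covered / length


# Hypothetical gene spanning positions 100-200 with three aligned reads.
print(gene_coverage_percent(100, 200, [(90, 130), (120, 150), (180, 250)]))  # 70.0
```

Summary statistics over all detected genes, such as a mean and a median coverage, could then be computed with `statistics.mean` and `statistics.median` from the Python standard library.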
Sequencing coverage of common genes and unique genes of the Im‐cpl using the entire library and the Sol‐RT with different RNA input amounts
| Protocols | Mean [%] | Median [%] | Number of genes |
|---|---|---|---|
| Im‐cpl 500 | 85.8 | 100 | 1319 |
| Sol‐RT 500 | 84.9 | 100 | 1528 |
| Im‐cpl 100 | 66.7 | 78.6 | 1081 |
| Sol‐RT 100 | 60.2 | 66.5 | 336 |
Figure 6. Distribution of gene counts at different gene lengths and sequencing coverage of gene body from 5′ to 3′ by Im‐cpl and Sol‐RT protocols with 500 ng RNA input. A) Distribution of gene counts at different gene lengths of the common genes and the unique genes of the Im‐cpl and Sol‐RT methods in comparison to the distribution of the L. monocytogenes reference genome. B) Sequencing coverage of gene body from 5′ to 3′ by Im‐cpl and Sol‐RT protocols with 500 ng RNA input.