Gabriel Sholder1, Thomas A Lanz1, Robert Moccia1, Jie Quan1, Estel Aparicio-Prat1, Robert Stanton1, Hualin S Xi2.
Abstract
BACKGROUND: The advent of Next Generation Sequencing has allowed transcriptomes to be profiled with unprecedented accuracy, but the high cost of full-length mRNA sequencing has limited the accessibility and scalability of the technology. To address this, we developed 3'Pool-seq: a simple, cost-effective, and scalable RNA-seq method that focuses sequencing on the 3'-end of mRNA. We drew from aspects of SMART-seq, Drop-seq, and TruSeq to implement an easy workflow, and optimized parameters such as input RNA concentrations, tagmentation conditions, and read depth specifically for bulk RNA.
Keywords: 3’Pool-seq; 3′-RNA sequencing; Differential gene expression; Next generation sequencing; RNA-seq; Transcriptomics
Year: 2020 PMID: 31959126 PMCID: PMC6971924 DOI: 10.1186/s12864-020-6478-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 A schematic representation of the 3’Pool-seq protocol. The use of anchored oligo-dT primers with standard indexed TruSeq i7 adapter overhangs for first strand synthesis allows immediate pooling of multiple samples after reverse transcription. Within a pool, each sample can be uniquely identified by the TruSeq i7 index. Once pooled, purification, PCR, and Nextera tagmentation reagents are used to generate cDNA fragments. A second PCR step using standard TruSeq i7 and indexed Nextera i5 adapters allows selective amplification of only 3′-end cDNA fragments and barcoding of each sample pool with a standard Nextera i5 index. The final product is a dual-indexed hybrid Nextera/TruSeq 3′-library where the i5 Nextera index serves as the pool index, and the i7 TruSeq index serves as the sample index within a pool. Multiple indexed library pools can be further quantified and combined in equal proportions into a superpool for sequencing
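The dual-index scheme described in this caption (i5 as pool index, i7 as sample index within a pool) can be illustrated with a minimal Python sketch. The index sequences, pool names, and sample names below are placeholders chosen for illustration and are not taken from the protocol; exact-match lookup with no mismatch tolerance is also an assumption.

```python
from collections import Counter

# Placeholder index tables for illustration only (not the protocol's actual indices).
POOL_BY_I5 = {"TAGATCGC": "pool_1", "CTCTCTAT": "pool_2"}      # Nextera i5 -> pool
SAMPLE_BY_I7 = {"ATCACG": "sample_A", "CGATGT": "sample_B"}    # TruSeq i7 -> sample


def assign_read(i5, i7):
    """Return (pool, sample) for a read's index pair, or None if either index is unknown."""
    pool = POOL_BY_I5.get(i5)
    sample = SAMPLE_BY_I7.get(i7)
    if pool is None or sample is None:
        return None
    return (pool, sample)


def count_reads(index_pairs):
    """Count reads per (pool, sample); unassignable reads are tallied under the key None."""
    return Counter(assign_read(i5, i7) for i5, i7 in index_pairs)


# Toy input: each tuple is the (i5, i7) index pair read for one sequenced fragment.
reads = [
    ("TAGATCGC", "ATCACG"),
    ("TAGATCGC", "CGATGT"),
    ("CTCTCTAT", "ATCACG"),
    ("GGGGGGGG", "ATCACG"),  # unknown i5, so this read cannot be assigned
]
counts = count_reads(reads)
```

Because the same i7 sample index is reused across pools, a sample is only uniquely identified by the (i5, i7) pair, which is why the lookup returns both.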
Comparison of sequencing and mapping quality metrics between 3’Pool-seq and TruSeq. Values are the mean and standard deviation of each quality metric
| Quality Metrics | 3’Pool-seq | mRNA TruSeq |
|---|---|---|
| # of samples | 6 | 6 |
| Reads per sample (Millions) | 6.4 ± 3.6 | 33 ± 10.4 |
| Number Uniquely Mapped Reads (Millions) | 4.7 ± 2.7 | 28.7 ± 8.7 |
| % mapped reads | 87.2 ± 2 | 94.4 ± 1.9 |
| % Uniquely mapped reads | 72 ± 4 | 87 ± 1 |
| % coding reads | 24 ± 0.8 | 36 ± 2 |
| % UTR reads | 42 ± 0.7 | 34 ± 0.2 |
| % rRNA reads (× 10^-5) | 2 ± 0.4 | 19.8 ± 8 |
| % non-mRNA reads | 31 ± 2 | 28 ± 3 |
| # of genes detected (TPM > 1) | 13,571 ± 179 | 14,135 ± 211 |
| ERCC correlation with theoretical concentrations (r²) | 0.93 ± 0.01 | 0.87 ± 0.03 |
| ERCC pairwise correlation between samples (r²) | 0.97 ± 0.01 | 0.95 ± 0.01 |
Fig. 2 3’Pool-seq provides robust and reproducible gene expression quantification. a Read distribution from full-length mRNA-seq (TruSeq) and 3’Pool-seq in the ApoE gene region. Reads generated using 3’Pool-seq are mapped preferentially towards the 3′-end of the gene. b Correlation of the abundance levels of ERCC spike-ins between 3’Pool-seq quantifications and actual pre-mixed concentrations. c Correlation of the abundance levels of ERCC spike-ins between 3’Pool-seq replicates. d Correlation of gene expression values (log2TPM) between 3’Pool-seq replicates. e Number of genes detected with different minimal abundance thresholds at increasing read depths (i.e. total number of reads uniquely aligned to gene features). f Distribution of 3’Pool-seq reads is skewed towards the 3′-end of the gene body as expected. Normalized positions 0 and 100 correspond to 5′-end and 3′-end of genes, respectively
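As a rough illustration of the spike-in correlations in panels b and c, the following Python sketch computes a squared Pearson correlation between expected and observed abundances. The log2 transform, the pseudocount, and the function names are assumptions made for this example; the source does not state how the ERCC correlations were computed.

```python
import math


def pearson_r2(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def log2_r2(expected, observed, pseudocount=1.0):
    """r^2 after a log2(value + pseudocount) transform (the transform is an assumption)."""
    log_expected = [math.log2(v + pseudocount) for v in expected]
    log_observed = [math.log2(v + pseudocount) for v in observed]
    return pearson_r2(log_expected, log_observed)


# Toy values only: observed abundances that are an exact multiple of the expected ones.
expected = [1.0, 2.0, 4.0, 8.0]
observed = [3.0, 6.0, 12.0, 24.0]
r2 = log2_r2(expected, observed, pseudocount=0.0)
```

With a zero pseudocount, a constant scaling factor becomes a constant offset on the log scale, so the toy data correlate perfectly; real spike-in measurements would not.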
Fig. 3 Performance of 3’Pool-seq in detecting differentially expressed genes. a Differentially expressed genes identified by TruSeq (FDR q-value < 0.05, absolute log2(Fold-Change) > 1) were used as the “true DE genes”. b Correlation of the log2(Fold-Change) quantified by 3’Pool-seq and TruSeq for DE genes identified by the TruSeq protocol
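The thresholds in this caption (FDR q-value < 0.05, absolute log2(Fold-Change) > 1) can be applied as in the sketch below, which treats one result set as the reference and scores another against it. The gene names, values, and the choice of recall and precision as summary metrics are hypothetical and serve only to show the mechanics.

```python
def call_de(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Return genes with q < fdr_cutoff and |log2FC| > lfc_cutoff.

    `results` maps gene name -> (log2 fold-change, FDR q-value).
    """
    return {
        gene
        for gene, (log2fc, q) in results.items()
        if q < fdr_cutoff and abs(log2fc) > lfc_cutoff
    }


def recall_precision(truth, test):
    """Recall and precision of `test` calls relative to the `truth` gene set."""
    true_positives = len(truth & test)
    recall = true_positives / len(truth) if truth else float("nan")
    precision = true_positives / len(test) if test else float("nan")
    return recall, precision


# Hypothetical results for four made-up genes (not data from the study).
reference_results = {"g1": (2.0, 0.001), "g2": (-1.5, 0.01), "g3": (0.5, 0.001), "g4": (3.0, 0.2)}
test_results = {"g1": (1.8, 0.002), "g2": (-0.9, 0.01), "g3": (1.2, 0.03), "g4": (2.5, 0.3)}

true_de = call_de(reference_results)   # plays the role of the "true DE genes"
test_de = call_de(test_results)
recall, precision = recall_precision(true_de, test_de)
```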
Fig. 4 Performance of 3’Pool-seq with low RNA input samples. a Number of genes detected (TPM > 1) when different RNA input amounts were used. b Correlations of ERCC spike-ins among replicates when different amounts of RNA input were used and ERCC spike-ins were diluted proportionately. c Comparisons of log2(Fold-Changes) for DE genes (defined as FDR q-value < 0.05, log2(Fold-Change) > 1 in the 3’Pool-seq run with 50 ng RNA input) between 10 ng input RNA 3’Pool-seq run and 50 ng input RNA 3’Pool-seq run
Fig. 5 Plate-based format of 3’Pool-seq applied to differentiate gene expression responses between troglitazone and pioglitazone treatments. a Layout of plate-based 3’Pool-seq using row pooling scheme. Principal component analysis using ERCC spike-ins is used to assess row effect b and column effect c. 95% confidence ellipses are shown for each row or column group. Row effect is observable as indicated by the strong correlation of row groups with PC1 (R² = 0.53), while column effect is not observed (correlation of column groups with PC1 R² = 0.11). d Differentially expressed genes identified at different doses and time points for the two PPARγ agonists. Row IDs were used in the differential expression analysis to correct for row pooling effect. e DE genes identified upon 16 h 25 μM troglitazone treatment showed little differential change upon 16 h 25 μM pioglitazone treatment
Comparison of cost, time, and qualitative metrics for 3’Pool-seq and TruSeq, as well as two additional 3′-end sequencing techniques: Plate-Seq and DRUG-seq. (N/A) indicates that values were not readily accessible in the corresponding article. (*) Represents sequencing costs on a HiSeq platform, while others represent costs on a NextSeq platform
| | 3’Pool-seq | TruSeq | Plate-Seq | DRUG-seq |
|---|---|---|---|---|
| Library prep cost per sample | $3 | $60 | $3 | $0.2–1 |
| Sequencing cost per sample | $12 | $100 | $12* | $2–4* |
| Overall time for library prep | 8–12 h | 2–3 days | > 2 days | N/A |
| Hands-on time | 2–3 h | 6–8 h | N/A | N/A |
| Samples per Run | 96 | 12–24 | 96 | 384–1536 |
| Major Advantage | No custom equipment or sequencing primers, stringently benchmarked against ERCC and TruSeq. | Best option for detecting low-abundance genes or splice variants. | Oligo-dT Plate-based RNA purification. | Most affordable option, highest throughput, manual alternative is described. |
| Major Disadvantage | Involves RNA purification step, lowest throughput of three 3′-end techniques described herein. | Most expensive option, low-throughput, technically tedious. | Requires custom liquid dispensing equipment, no detailed benchmarking with ERCC or TruSeq. | Requires custom liquid dispensing equipment, manual protocol not benchmarked. |
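
The per-sample cost rows of the table above can be combined into a total per method, as in this small Python sketch. The figures are copied from the table; treating single values as degenerate ranges and summing library prep and sequencing costs is a simplification made here, and the starred sequencing costs refer to a different platform, so the totals are only roughly comparable.

```python
# (library prep, sequencing) cost per sample in USD, each as a (low, high) range.
COSTS = {
    "3'Pool-seq": ((3, 3), (12, 12)),
    "TruSeq": ((60, 60), (100, 100)),
    "Plate-Seq": ((3, 3), (12, 12)),   # sequencing cost quoted for a different platform (*)
    "DRUG-seq": ((0.2, 1), (2, 4)),    # sequencing cost quoted for a different platform (*)
}


def total_range(prep, seq):
    """Add the low ends and the high ends of the two cost ranges."""
    return (prep[0] + seq[0], prep[1] + seq[1])


totals = {method: total_range(prep, seq) for method, (prep, seq) in COSTS.items()}

# Ratio of total per-sample cost, TruSeq relative to 3'Pool-seq.
fold_difference = totals["TruSeq"][0] / totals["3'Pool-seq"][0]
```

On these table values the totals are $15 per sample for 3’Pool-seq and $160 for TruSeq, roughly a tenfold difference.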