Li Chen, Ruirui Yang, Tony Kwan, Chao Tang, Stephen Watt, Yiming Zhang, Guillaume Bourque, Bing Ge, Kate Downes, Mattia Frontini, Willem H Ouwehand, Jing-Wen Lin, Nicole Soranzo, Tomi Pastinen, Lu Chen.
Abstract
Both poly(A) enrichment and ribosomal RNA depletion are commonly used for RNA sequencing. Each has advantages and disadvantages that may bias downstream analyses. To better assess these effects, we carried out both ribosomal RNA-depleted and poly(A)-selected RNA-seq on naive CD4+ T cells isolated from 40 healthy individuals from the Blueprint Project. Genomic and epigenetic data were also available for the same 40 individuals. This dataset offers a unique opportunity to understand how library construction influences differential gene expression, alternative splicing and molecular QTL (quantitative trait loci) analyses in human primary cells.
Year: 2020 PMID: 33168820 PMCID: PMC7652884 DOI: 10.1038/s41597-020-00719-4
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Study design of paired poly(A)-selected and ribosomal RNA-depleted RNA-sequencing.
Summary of the cell purity, RNA quality and sequencing of poly(A)-selected RNA-seq. * indicates the sequencing depth of the rRNA-depleted samples.
| Donor ID | Cell purity (FACS, %) | RIN | TIN (median) | rRNA-depleted reads (Million)* | Poly(A)-selected reads (Million) | Q30 (%) | GC (%) | Uniquely mapped reads (%) | Average mapped length |
|---|---|---|---|---|---|---|---|---|---|
| S000GZ | 94.8 | 9.8 | 79.3 | 46.5 | 78.0 | 86.5 | 48.0 | 93.4 | 198.4 |
| S000X1 | 97.2 | 9.1 | 79.3 | 61.4 | 64.8 | 86.1 | 48.0 | 93.6 | 198.5 |
| S0010Q | 95.5 | 9.3 | 79.4 | 54.3 | 51.9 | 86.5 | 48.0 | 93.3 | 198.4 |
| S0012M | 94.2 | 9.6 | 78.9 | 56.2 | 54.7 | 86.2 | 48.0 | 93.0 | 198.4 |
| S001C2 | 94.3 | 9.6 | 79.1 | 36.8 | 57.1 | 88.5 | 48.0 | 93.8 | 198.5 |
| S001GV | 96.1 | 9.5 | 79.3 | 68.4 | 60.4 | 88.3 | 48.0 | 93.6 | 198.5 |
| S001KN | 95.7 | 9.8 | 79.5 | 69.0 | 58.7 | 88.4 | 48.0 | 94.1 | 198.6 |
| S001NH | 97.4 | 9.7 | 79.5 | 45.8 | 69.2 | 87.9 | 48.0 | 94.0 | 198.5 |
| S001T5 | 95.7 | 9.6 | 79.3 | 31.9 | 60.6 | 86.8 | 48.0 | 93.6 | 198.4 |
| S0021K | 97.6 | 9.8 | 79.3 | 71.4 | 56.7 | 86.3 | 48.0 | 93.3 | 198.3 |
| S0026A | 97.7 | 9.7 | 79.6 | 53.6 | 61.7 | 86.4 | 48.0 | 93.6 | 198.3 |
| S00294 | 94.9 | 9.8 | 79.0 | 57.4 | 63.7 | 86.6 | 47.0 | 92.6 | 198.1 |
| S002EV | NA | 9.6 | 78.0 | 49.2 | 19.9 | 83.2 | 47.0 | 92.8 | 198.3 |
| S002FT | 94.3 | 9.6 | 78.6 | 50.1 | 27.9 | 83.1 | 48.0 | 93.2 | 198.3 |
| S002MF | 96.1 | 9.6 | 78.7 | 51.7 | 33.4 | 83.1 | 48.0 | 93.4 | 198.3 |
| S002WW | 95.8 | 9.5 | 79.1 | 54.0 | 35.4 | 82.4 | 48.0 | 92.7 | 198.3 |
| S002XU | 95.9 | 9.1 | 78.8 | 51.5 | 36.3 | 83.4 | 47.0 | 92.2 | 198.3 |
| S0031G | 95.8 | 9.5 | 78.4 | 41.1 | 29.4 | 82.9 | 48.0 | 93.2 | 198.3 |
| S0032E | 98.2 | 9.3 | 77.9 | 54.7 | 32.9 | 83.3 | 47.0 | 92.6 | 198.4 |
| S00382 | 98.6 | 9.6 | 78.5 | 62.1 | 32.1 | 84.2 | 47.0 | 92.9 | 198.4 |
| S003AZ | 99.1 | 9.5 | 78.5 | 68.6 | 34.1 | 84.5 | 47.0 | 92.8 | 198.4 |
| S003JH | 94.2 | 9.7 | 78.5 | 80.1 | 33.3 | 84.3 | 47.5 | 93.3 | 198.4 |
| S003P5 | 95.1 | 9.5 | 77.9 | 48.7 | 24.2 | 81.7 | 48.0 | 92.9 | 198.3 |
| S003Q3 | 95.0 | 9.9 | 77.9 | 53.2 | 27.6 | 81.8 | 48.0 | 92.8 | 198.3 |
| S003R1 | 98.4 | 9.9 | 78.7 | 60.5 | 27.7 | 84.3 | 48.0 | 93.1 | 198.4 |
| S0041C | 95.0 | 9.5 | 78.7 | 76.7 | 27.5 | 84.3 | 48.0 | 93.4 | 198.4 |
| S004M7 | 98.0 | 10.0 | 79.2 | 134.9 | 27.3 | 83.9 | 48.0 | 93.4 | 198.3 |
| S004N5 | 93.5 | 9.7 | 78.8 | 71.1 | 30.8 | 84.2 | 48.0 | 93.6 | 198.4 |
| S005N1 | 97.4 | 9.7 | 78.6 | 61.5 | 29.9 | 81.5 | 48.0 | 93.2 | 198.3 |
| S005VM | 93.5 | NA | 77.5 | 52.6 | 25.2 | 81.3 | 48.0 | 93.0 | 198.3 |
| S005WK | 96.8 | 9.0 | 77.6 | 63.0 | 24.5 | 81.6 | 48.0 | 93.2 | 198.4 |
| S00630 | 89.9 | 9.5 | 77.8 | 52.8 | 22.4 | 81.5 | 47.0 | 93.4 | 198.3 |
| S0064Z | 96.9 | 9.9 | 78.3 | 57.4 | 27.5 | 81.7 | 47.0 | 93.3 | 198.4 |
| S006XE | 96.5 | 9.6 | 78.2 | 58.9 | 29.5 | 81.9 | 47.0 | 93.3 | 198.4 |
| S007CF | 99.0 | 9.0 | 77.4 | 59.4 | 23.9 | 88.7 | 46.0 | 93.4 | 198.5 |
| S007DD | 98.0 | 9.0 | 77.7 | 61.1 | 20.7 | 88.5 | 47.0 | 93.4 | 198.4 |
| S007F9 | 98.2 | 9.0 | 78.4 | 70.6 | 35.2 | 87.8 | 48.0 | 93.2 | 198.3 |
| S007G7 | 97.7 | NA | 78.2 | 67.7 | 30.4 | 88.5 | 47.0 | 93.5 | 198.5 |
| S007PQ | 91.9 | 8.8 | 77.6 | 91.9 | 30.0 | 88.5 | 46.0 | 93.2 | 198.4 |
| S007VE | 94.5 | 8.6 | 77.9 | 55.4 | 37.3 | 88.5 | 47.5 | 93.6 | 198.5 |
Fig. 2 Summary of key quality control metrics. (a) Boxplot of average sequence quality per base per sample. Blue boxes indicate poly(A)-selected data and yellow boxes indicate rRNA-depleted data. (b) Read distribution along the gene body: relative coverage of uniquely mapped tags from the poly(A)-selected and rRNA-depleted RNA-seq. (c) Frequency of counts in various gene regions. The statistical test is Student’s t-test, and the error bars depict the standard deviation.
Fig. 3 Comparison of gene expression identification between poly(A)-selected and rRNA-depleted RNA-seq. (a) Pearson’s correlation coefficients of gene expression between the 40 paired samples, using HTSeq and ExonOnly quantification with STAR. (b) Scatter plot of rlog gene expression in poly(A)-selected RNA-seq and rRNA-depleted RNA-seq for one sample. (c) Scatterpie plot of paired genes identified in both datasets using ExonOnly. The x-axis and y-axis show the sequencing depth of the rRNA-depleted and poly(A)-selected library of each pair, respectively, and the pie chart illustrates the fractions of shared and protocol-specific genes. (d) Violin plot showing the percentage of genes that are library-specific or shared between the two sequencing libraries. (e) Percentage of each biotype among library-specific and shared genes; biotypes accounting for more than 2% of all genes are shown (pseudogenes and antisense genes from non-stranded poly(A)-selected samples were excluded).
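The per-donor comparison in panel (a) can be illustrated with a minimal sketch. It assumes per-gene counts are already available for each donor in both libraries, and it uses a plain log2(count + 1) transform as a simple stand-in for the rlog values mentioned in panel (b). The function names, donor labels, and counts below are hypothetical and are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def paired_correlations(polya, ribo):
    """Correlate log2(count + 1) profiles for donors present in both libraries.

    Both arguments map a donor label to a list of per-gene counts,
    with genes in the same order in both libraries.
    """
    result = {}
    for donor in sorted(set(polya) & set(ribo)):
        x = [math.log2(c + 1) for c in polya[donor]]
        y = [math.log2(c + 1) for c in ribo[donor]]
        result[donor] = pearson(x, y)
    return result


# Hypothetical toy counts for two donors and five genes (illustration only).
polya_counts = {"donor_A": [0, 10, 200, 35, 1200], "donor_B": [2, 8, 150, 40, 900]}
ribo_counts = {"donor_A": [1, 14, 180, 20, 1000], "donor_B": [0, 11, 170, 55, 1100]}

for donor, r in paired_correlations(polya_counts, ribo_counts).items():
    print(f"{donor}: r = {r:.3f}")
```

Only donors present in both dictionaries are compared, which mirrors the paired design; a real analysis would use the quantification output of the chosen pipeline instead of toy lists.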
Fig. 4 Unsupervised clustering analysis. (a) PCA of three cell types in the Blueprint project. (b) Hierarchical clustering of ExonOnly quantification before (left) and after (right) batch correction using ComBat based on STAR alignment.
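To give a feel for why samples can regroup after batch correction in panel (b), here is a deliberately simplified sketch. It is not ComBat: ComBat fits an empirical Bayes location and scale model, whereas this example only removes the per-gene mean shift of each batch (here, the library type) and restores the overall mean. All names and values are hypothetical.

```python
def center_by_batch(expr, batch):
    """Remove per-gene batch mean shifts (a simplified stand-in, NOT ComBat).

    expr  maps a sample label to a list of per-gene expression values.
    batch maps the same sample labels to a batch label.
    """
    samples = list(expr)
    n_genes = len(expr[samples[0]])

    # Per-gene mean over all samples, restored after centering.
    grand = [sum(expr[s][g] for s in samples) / len(samples) for g in range(n_genes)]

    # Group samples by batch label.
    groups = {}
    for s in samples:
        groups.setdefault(batch[s], []).append(s)

    corrected = {}
    for members in groups.values():
        # Per-gene mean within this batch.
        means = [sum(expr[s][g] for s in members) / len(members) for g in range(n_genes)]
        for s in members:
            corrected[s] = [expr[s][g] - means[g] + grand[g] for g in range(n_genes)]
    return corrected


# Hypothetical log-scale values: two donors, two library types, two genes.
toy_expr = {
    "a_polyA": [5.0, 2.0],
    "b_polyA": [7.0, 4.0],
    "a_ribo": [1.0, 6.0],
    "b_ribo": [3.0, 8.0],
}
toy_batch = {"a_polyA": "polyA", "b_polyA": "polyA", "a_ribo": "ribo", "b_ribo": "ribo"}

toy_corrected = center_by_batch(toy_expr, toy_batch)
for sample in sorted(toy_corrected):
    print(sample, toy_corrected[sample])
```

In this toy case the two libraries differ only by a constant per-gene offset, so after centering each donor's two profiles coincide and any distance-based clustering would group samples by donor rather than by library type. Real data also carry scale differences and noise, which is what a fuller model such as ComBat is designed to handle.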
| Measurement(s) | RNA • gene expression |
| Technology Type(s) | RNA-seq of coding RNA • RNA-seq of total RNA • bioinformatics analysis |
| Factor Type(s) | gene expression level quantification • ribosomal RNA-depleted and poly(A)-selected RNA-seq • type of human primary cells |
| Sample Characteristic - Organism | Homo sapiens |