Christina A Harrington, Suzanne S Fei, Jessica Minnier, Lucia Carbone, Robert Searles, Brett A Davis, Kimberly Ogle, Stephen R Planck, James T Rosenbaum, Dongseok Choi.
Abstract
Peripheral blood is a highly accessible biofluid providing a rich source of information about human physiology and health status. However, for studies of the blood transcriptome with RNA sequencing (RNA-Seq) techniques, high levels of hemoglobin mRNAs (hgbRNA) present in blood can occupy valuable sequencing space, impacting detection and quantification of non-hgbRNAs. In this study, we evaluated two methods for preparing ribosomal RNA (rRNA)-depleted sequencing libraries for RNA-Seq of whole blood, one of which is also designed to deplete hgbRNAs. Two experiments were performed: one evaluating library performance across 6 human blood samples and the other examining library reproducibility and performance in a two-subject subset. We find that addition of hgbRNA depletion to the rRNA-depletion protocol for library preparation from blood RNA effectively reduces highly abundant hgbRNA reads; however, it does not result in a statistically significant increase in differentially expressed genes in our patient-control study. Bioinformatic removal of globin gene counts in non-hgbRNA depleted libraries provides improvement in overall performance of these libraries. We conclude that use of a standard ribosomal RNA depletion method for library preparation coupled with bioinformatic removal of globin gene counts is sufficient for reproducible and sensitive measurement of both coding and noncoding RNAs in the blood transcriptome.
Year: 2020 PMID: 32286338 PMCID: PMC7156519 DOI: 10.1038/s41598-020-62801-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Experimental Design and Analysis. (a) Blood RNA preparation steps from 6 human subjects: 3 controls (C) and 3 patients (P). RNA was used for preparing libraries with Ribo-Zero Gold (RZG) or Globin-Zero (GZ) kits as shown. In Experiment 1, a total of 33 libraries were prepared with either 250 ng total RNA for low (L) input or 900 ng for high (H) input. In Experiment 2, technical replicate libraries were prepared using 250 ng RNA from C1 and C2 with either RZG or Globin-Zero for a total of 12 libraries. Individual RNA-seq data files generated from the libraries shown in B and C were named according to the following convention: sample ID_RNA input_library method_replicate number, e.g., C1_L_GZ_1. (b) Fastq files were assessed for quality metrics, aligned and analyzed as shown. Correspondence between data analysis steps and results shown in Figs. 2–4 is indicated.
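The file naming convention in the caption can be illustrated with a small parser. This is only a sketch: the caption gives a single example name (C1_L_GZ_1), so the use of "RZG" as the method code inside file names, the function name `parse_library_name`, and the lookup tables are assumptions made for illustration.

```python
from typing import NamedTuple


class LibraryName(NamedTuple):
    sample_id: str   # e.g. "C1" (control 1) or "P2" (patient 2)
    rna_input: str   # "L" (low, 250 ng) or "H" (high, 900 ng)
    method: str      # "GZ" (Globin-Zero) or "RZG" (Ribo-Zero Gold)
    replicate: int   # replicate number


# Lookup tables built from the caption; the "RZG" file-name code is assumed.
INPUT_NG = {"L": 250, "H": 900}
METHODS = {"GZ": "Globin-Zero", "RZG": "Ribo-Zero Gold"}


def parse_library_name(name: str) -> LibraryName:
    """Split 'sampleID_input_method_replicate' into its four fields."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got: {name!r}")
    sample_id, rna_input, method, replicate = parts
    if rna_input not in INPUT_NG:
        raise ValueError(f"unknown RNA input code: {rna_input!r}")
    if method not in METHODS:
        raise ValueError(f"unknown library method code: {method!r}")
    return LibraryName(sample_id, rna_input, method, int(replicate))


if __name__ == "__main__":
    lib = parse_library_name("C1_L_GZ_1")
    print(lib.sample_id, INPUT_NG[lib.rna_input], "ng", METHODS[lib.method], lib.replicate)
```

Keeping the codes in dictionaries makes it easy to reject unexpected names early instead of silently mislabeling a library.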
Figure 2. Read distributions across methods and samples. (a) Estimated fraction of reads that map to hemoglobin (Hgb) genes, rRNA, and Other in method-paired datasets from 6 human donors (Experiment 1) and technical replicate datasets from C1 and C2 donors (Experiment 2) with either Globin-Zero (GZ) or Ribo-Zero-Gold (RZG). Proportions were estimated by mapping a subset of total trimmed reads to only Hgb gene sequences or only rRNA gene sequences, intentionally including multi-mapped reads. (b) HgbRNA levels in C1 and C2 libraries. Counts per million (CPM) expression levels measured in Experiment 2 GZ and RZG replicate libraries for Hgb genes were used to calculate the average log2 CPM for each gene in the C1 and C2 samples with each library method. Hgb gene family members with fewer than 20 raw counts in all libraries are not shown. (c) Ensembl gene biotype (https://uswest.ensembl.org/info/genome/genebuild/biotypes.html) proportions from Experiment 2 utilizing the raw gene counts (number of uniquely-mapped reads that unambiguously map to only one gene) that were used for subsequent quantitative analysis. Technical replicates are averaged. RZG-Hg indicates gene counts from RZG libraries after bioinformatic removal of Hgb gene counts. (d) Gene biotype proportions of counts after TPM calculation. TPM estimates the number of RNA transcripts from each gene by normalizing the raw gene counts not only by library depth, as CPM does, but also by gene length; this permits count comparison of differently-sized transcripts within a sample. Technical replicates are averaged.
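The difference between CPM and TPM described in panel (d) can be made concrete with a short sketch. It uses the textbook definitions of both measures; the gene names, counts, and lengths are made-up toy values, not data from the study, and the software the authors used for these calculations is not stated here.

```python
def cpm(counts):
    """Counts per million: scale raw gene counts by library depth only."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: divide by gene length first, then rescale to 1e6."""
    # Reads per kilobase of gene length.
    rpk = {gene: c / (lengths_bp[gene] / 1000.0) for gene, c in counts.items()}
    scale = sum(rpk.values())
    return {gene: r * 1e6 / scale for gene, r in rpk.items()}


# Toy inputs (hypothetical): geneA and geneB have equal counts but different lengths.
raw_counts = {"geneA": 100, "geneB": 100, "geneC": 800}
gene_lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 4000}

cpm_values = cpm(raw_counts)
tpm_values = tpm(raw_counts, gene_lengths_bp)

if __name__ == "__main__":
    for gene in raw_counts:
        print(f"{gene}: CPM={cpm_values[gene]:.1f} TPM={tpm_values[gene]:.1f}")
```

In this toy case geneA and geneB receive the same CPM because their raw counts match, while TPM assigns the shorter geneA twice the value of geneB, which is the within-sample length correction the caption refers to.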
Figure 4. Impact of library method on differentially expressed gene measurements. Differentially expressed genes (DEG) were measured as described with RNA-Seq data from Experiment 1 libraries. (a) Biotypes of genes measured as DE between library methods in a paired analysis of 6 blood samples. (b) DEG from comprehensive, multi-factor analysis of the Experiment 1 libraries (patients and healthy controls). Green numbers indicate the number of genes up in Globin-Zero relative to RZG, and purple numbers the number of genes down in Globin-Zero. No significant genes were detected for the interaction between disease diagnosis and library method in Experiment 1.
Experiment 2 RNA-Seq performance metrics reveal differences between library methods in rRNA and hemoglobin mRNA levels.
Column groups: Alignment Metrics (Total reads through Intergenic rate); Gene Counts (Total counts and Globin).

| Blood Sample | Library Method | Technical Replicate | Total reads | rRNA rate (a) | Globin rate (a) | Total mapped | Duplicate rate (b) | Exonic rate (c) | Intergenic rate (c) | Total counts (d) | Globin (e) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C1 | Ribo-Zero Globin | 1 | 75,889,973 | | 0.46% | 74,970,455 | 56.86% | 46.49% | 8.53% | 26,940,402 | 0.03% |
| C1 | Ribo-Zero Globin | 2 | 68,187,212 | | 0.47% | 67,382,218 | 57.07% | 45.93% | 9.61% | 23,852,287 | 0.02% |
| C1 | Ribo-Zero Globin | 3 | 66,912,681 | | 0.79% | 65,920,965 | 59.11% | 43.00% | 14.99% | 21,312,218 | 0.02% |
| C1 | Ribo-Zero Gold | 1 | 62,258,328 | 2.38% | | 61,590,653 | 58.87% | 53.62% | 6.02% | 23,277,919 | |
| C1 | Ribo-Zero Gold | 2 | 63,705,502 | 2.52% | | 63,053,394 | 60.15% | 52.83% | 6.36% | 23,427,822 | |
| C1 | Ribo-Zero Gold | 3 | 64,213,439 | 2.42% | | 63,462,386 | 60.39% | 52.82% | 6.44% | 23,582,022 | |
| C2 | Ribo-Zero Globin | 1 | 61,117,132 | | 0.88% | 59,981,781 | 60.00% | 40.97% | 18.55% | 17,569,235 | 0.03% |
| C2 | Ribo-Zero Globin | 2 | 66,744,713 | | 0.91% | 65,499,404 | 62.22% | 41.02% | 17.84% | 19,226,041 | 0.02% |
| C2 | Ribo-Zero Globin | 3 | 62,514,672 | | 1.65% | 60,682,926 | 71.18% | 34.45% | 28.66% | 13,765,481 | 0.03% |
| C2 | Ribo-Zero Gold | 1 | 64,728,692 | 4.01% | | 63,842,383 | 60.71% | 52.01% | 7.69% | 22,556,548 | |
| C2 | Ribo-Zero Gold | 2 | 67,886,975 | 2.97% | | 67,119,332 | 60.32% | 52.75% | 6.86% | 24,114,916 | |
| C2 | Ribo-Zero Gold | 3 | 66,636,958 | 4.33% | | 65,293,327 | 62.61% | 53.46% | 8.14% | 22,931,860 | |
(a) Estimated % of total reads via independent alignment of reads to only rRNA and hemoglobin sequences; (b) % of mapped reads; (c) % of mapped non-duplicate reads [14]; (d) uniquely mapped reads unambiguously assigned to one gene; (e) % of gene counts.
Rates in bold text highlight elevated rRNA detection in the Ribo-Zero Globin (Globin-Zero) libraries and the high hemoglobin mRNA levels in the Ribo-Zero Gold (RZG) libraries.
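As a rough descriptive aid, the read totals in the table can be turned into per-library fractions and per-method averages. The sketch below uses the Experiment 2 values for sample C2 transcribed from the table; the helper names and the choice of summary ratios are illustrative assumptions and are not metrics defined by the authors.

```python
# Sample C2, Experiment 2. Each tuple holds:
# (library method, replicate, total reads, total mapped, total gene counts)
C2_LIBRARIES = [
    ("Ribo-Zero Globin", 1, 61_117_132, 59_981_781, 17_569_235),
    ("Ribo-Zero Globin", 2, 66_744_713, 65_499_404, 19_226_041),
    ("Ribo-Zero Globin", 3, 62_514_672, 60_682_926, 13_765_481),
    ("Ribo-Zero Gold", 1, 64_728_692, 63_842_383, 22_556_548),
    ("Ribo-Zero Gold", 2, 67_886_975, 67_119_332, 24_114_916),
    ("Ribo-Zero Gold", 3, 66_636_958, 65_293_327, 22_931_860),
]


def mapped_fraction(total_reads, total_mapped):
    """Share of all reads that were mapped."""
    return total_mapped / total_reads


def counted_fraction(total_mapped, total_counts):
    """Gene counts relative to mapped reads (illustrative ratio only)."""
    return total_counts / total_mapped


def mean_counts_by_method(rows):
    """Average total gene counts across replicates for each library method."""
    grouped = {}
    for method, _rep, _reads, _mapped, counts in rows:
        grouped.setdefault(method, []).append(counts)
    return {method: sum(vals) / len(vals) for method, vals in grouped.items()}


if __name__ == "__main__":
    for method, rep, reads, mapped, counts in C2_LIBRARIES:
        print(f"{method} rep {rep}: mapped={mapped_fraction(reads, mapped):.3f} "
              f"counted={counted_fraction(mapped, counts):.3f}")
    for method, mean in mean_counts_by_method(C2_LIBRARIES).items():
        print(f"{method}: mean total counts = {mean:,.0f}")
```

For this sample the Ribo-Zero Gold replicates yield a higher average total gene count than the Ribo-Zero Globin replicates, although these totals include hemoglobin gene counts.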
Figure 3. Library method affects within-sample concordance and gene detection. (a) MA plots averaged over 6 Experiment 1 library method pairs. Each library pair is matched for RNA input amount. Mean paired log FC is averaged over 6 pairs and mean log CPM of protein-coding genes is averaged over all 12 libraries. (b) Correlation plots of gene expression (averaged log2CPM) for Experiment 2 C1 and C2 technical replicates prepared with either GZ or RZG library method. (c) Genes detected at CPM > 1 in a single pair of RZG and GZ libraries prepared from each of the 6 samples in this study. (d) Genes detected at CPM > 1 in 3 out of 3 library replicates for either C1 or C2 using libraries prepared in Experiment 2. Venn diagrams show genes detected in common with GZ and RZG or uniquely with one library method. Detection analysis was done with all mapped gene counts included or after removal of all hemoglobin gene counts.
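The detection rule in panel (d), CPM > 1 in 3 out of 3 replicates with or without hemoglobin gene counts, can be sketched as follows. The counts are made-up toy values chosen to show the effect, the hemoglobin gene list is an assumed set of human globin gene symbols (the exact list removed in the study is not given here), and all function names are hypothetical.

```python
# Assumed set of human hemoglobin gene symbols; the study's exact list is not stated.
HGB_GENES = {"HBA1", "HBA2", "HBB", "HBD", "HBE1", "HBG1", "HBG2", "HBM", "HBQ1", "HBZ"}


def cpm(counts):
    """Counts per million for one library."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def remove_genes(counts, drop):
    """Bioinformatic removal: discard counts for the listed genes before normalization."""
    return {gene: c for gene, c in counts.items() if gene not in drop}


def detected_in_all(replicates, threshold=1.0):
    """Genes with CPM above the threshold in every replicate library."""
    per_rep = [{g for g, v in cpm(rep).items() if v > threshold} for rep in replicates]
    return set.intersection(*per_rep)


def venn(a, b):
    """Return (only in a, shared, only in b)."""
    return a - b, a & b, b - a


# Toy libraries: hemoglobin dominates the RZG-like libraries, not the GZ-like ones.
rzg_reps = [
    {"HBB": 1_500_000, "ACTB": 499_999, "LOWGENE": 1},
    {"HBB": 1_500_010, "ACTB": 499_989, "LOWGENE": 1},
    {"HBB": 1_500_000, "ACTB": 499_999, "LOWGENE": 1},
]
gz_reps = [{"HBB": 500, "ACTB": 499_499, "LOWGENE": 1} for _ in range(3)]

# Detection with all gene counts included.
gz_all, rzg_all = detected_in_all(gz_reps), detected_in_all(rzg_reps)

# Detection after removing hemoglobin gene counts from every library.
gz_nohgb = detected_in_all([remove_genes(r, HGB_GENES) for r in gz_reps])
rzg_nohgb = detected_in_all([remove_genes(r, HGB_GENES) for r in rzg_reps])

if __name__ == "__main__":
    print("all counts:", venn(gz_all, rzg_all))
    print("hemoglobin removed:", venn(gz_nohgb, rzg_nohgb))
```

In this toy setup the low-abundance gene falls below the CPM threshold in the hemoglobin-rich libraries only because hemoglobin counts inflate the library total; once those counts are removed, both methods detect the same genes.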