Calla L Telzrow, Paul J Zwack, Shannon Esher Righi, Fred S Dietrich, Cliburn Chan, Kouros Owzar, J Andrew Alspaugh, Joshua A Granek.
Abstract
RNA sequencing (RNA-Seq) experiments focused on gene expression involve removal of ribosomal RNA (rRNA) because it is the major RNA constituent of cells. This process, called RNA enrichment, is done primarily to reduce cost: without rRNA removal, deeper sequencing must be performed to compensate for the sequencing reads wasted on rRNA. The ideal RNA enrichment method removes all rRNA without affecting other RNA in the sample. We tested the performance of three RNA enrichment methods on RNA isolated from Cryptococcus neoformans, a fungal pathogen of humans. We find that the RNase H depletion method is more efficient in depleting rRNA and more specific in recapitulating non-rRNA levels present in unenriched controls than the commonly used Poly(A) isolation method. The RNase H depletion method is also more effective than the Ribo-Zero depletion method as measured by rRNA depletion efficiency and recapitulation of protein-coding RNA levels present in unenriched controls, while the Ribo-Zero depletion method more closely recapitulates annotated non-coding RNA (ncRNA) levels. Finally, we leverage these data to accurately map the C. neoformans mitochondrial rRNA genes, and also demonstrate that RNA-Seq data generated with the RNase H and Ribo-Zero depletion methods can be used to explore novel C. neoformans long non-coding RNA genes.
Keywords: RNA enrichment; RNA sequencing; non-coding RNA; ribosomal RNA
Year: 2021 PMID: 34518880 PMCID: PMC8527493 DOI: 10.1093/g3journal/jkab301
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. The RNase H depletion method is highly efficient in eliminating rRNA. The percentage of rRNA reads in each library is plotted. The RNase H depletion method has the most efficient depletion (lowest percentage of rRNA reads), with the Poly(A) isolation method a close second, and the Ribo-Zero depletion method a distant third. Unenriched libraries show that rRNA makes up most of the RNA in C. neoformans.
Figure 2. The RNase H depletion method is highly specific. Pearson correlations were calculated for normalized read counts of all annotated genes in the C. neoformans genome, excluding rRNA genes and genes containing coding-strand rRNA duplications. (A) Unenriched libraries have high internal consistency as determined by leave-one-out cross-correlation of each Unenriched library with the mean of other Unenriched libraries. (B) The RNase H depletion method has the best overall rRNA depletion specificity, as determined by Pearson correlation of read counts for all genes with the Unenriched libraries. Pearson correlation coefficient (R) was calculated between each enriched library and the gene-wise average of counts across all Unenriched libraries.
Figure 3. The RNase H depletion method is highly specific with respect to protein-coding genes. Pearson correlations were calculated in the same way as Figure 2, but only for protein-coding genes, excluding genes containing coding-strand rRNA duplications. (A) Unenriched libraries have high internal consistency for protein-coding genes. (B) The RNase H depletion method has the best rRNA depletion specificity for protein-coding genes.
Figure 4. The RNase H depletion method is highly specific with respect to annotated ncRNA genes, although less so than the Ribo-Zero depletion method. Pearson correlations were calculated in the same way as Figure 2, but only for annotated ncRNA genes, excluding rRNA genes. (A) Unenriched libraries have high internal consistency for annotated ncRNA genes. (B) The Ribo-Zero depletion method has the best rRNA depletion specificity for annotated ncRNA genes.
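The two correlation measures described in the Figure 2 caption (and reused for Figures 3 and 4) can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names (`pearson`, `genewise_mean`, `leave_one_out`, `specificity`) and the small count vectors are hypothetical, and the inputs are assumed to be already normalized, with rRNA genes and genes containing coding-strand rRNA duplications already removed.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (R) between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def genewise_mean(libraries):
    """Average the counts gene by gene across a list of libraries."""
    return [sum(values) / len(values) for values in zip(*libraries)]


def leave_one_out(libraries):
    """Panel A: correlate each library with the mean of the other libraries."""
    results = []
    for i, library in enumerate(libraries):
        others = libraries[:i] + libraries[i + 1:]
        results.append(pearson(library, genewise_mean(others)))
    return results


def specificity(enriched_library, unenriched_libraries):
    """Panel B: correlate one enriched library with the gene-wise
    average of all Unenriched libraries."""
    return pearson(enriched_library, genewise_mean(unenriched_libraries))


# Made-up normalized counts for five genes (illustration only, not study data).
unenriched = [
    [10.0, 200.0, 35.0, 0.0, 80.0],
    [12.0, 190.0, 30.0, 1.0, 85.0],
    [9.0, 210.0, 40.0, 0.0, 78.0],
]
enriched = [11.0, 195.0, 33.0, 0.0, 82.0]

internal_consistency = leave_one_out(unenriched)  # one R per Unenriched library
enriched_r = specificity(enriched, unenriched)    # a single R for the enriched library
```

In this sketch, an enrichment method that leaves non-rRNA levels undisturbed would give an R close to 1 in `specificity`, and the leave-one-out values indicate how much agreement to expect between replicate Unenriched libraries.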
LncPipe identification of C. neoformans lncRNA
| Name | Chromosome | Start | End | # Exons | Total exonic length | Mean TPM | Median TPM |
|---|---|---|---|---|---|---|---|
| LINC-CNAG_07358-1 | 1 | 996421 | 997387 | 2 | 863 | 6.689427833 | 6.2667 |
| LINC-CNAG_07633-1 | 6 | 499352 | 499840 | 3 | 350 | 2.844233167 | 0 |
| LINC-CNAG_07649-1 | 6 | 1351673 | 1352718 | 3 | 913 | 5.5161474 | 5.799715 |
| LINC-CNAG_07769-5 | 9 | 828268 | 829327 | 3 | 919 | 9.2005976 | 10.76355 |
| LINC-CNAG_07769-4 | 9 | 831951 | 833064 | 4 | 2019 | 5.333811583 | 5.026716 |
| LINC-CNAG_07769-1 | 9 | 838389 | 840007 | 10 | 2730 | 4.443440783 | 3.846055 |
| LINC-CNAG_04857-1 | 10 | 199988 | 203380 | 43 | 6849 | 12.5665295 | 12.729905 |
| LINC-CNAG_04857-2 | 10 | 203693 | 205767 | 2 | 1983 | 1.903616517 | 1.67792 |
| LINC-CNAG_01945-1 | 11 | 1333989 | 1334590 | 2 | 540 | 3.47618165 | 1.745465 |
| LINC-CNAG_06521-2 | 13 | 743402 | 744570 | 4 | 1003 | 5.924291983 | 5.627535 |
| LINC-CNAG_07042-1 | 13 | 750280 | 751009 | 3 | 609 | 5.950332833 | 4.02392 |
Predicted lncRNA were discovered by analysis of RNase H-treated, Ribo-Zero-treated, and Unenriched RNA libraries. The name (assigned by LncPipe), chromosomal location, exon number, exonic length, and transcripts per million (TPM) across samples are shown for all 11 lncRNA identified.
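For readers unfamiliar with the TPM columns, the following Python sketch shows the standard TPM definition and how a mean and median across samples would be derived. The record does not state how LncPipe computed these values, so this is only an illustration. The function names, the use of total exonic length as the transcript length, and the example numbers are all assumptions.

```python
from statistics import mean, median


def tpm(counts, lengths):
    """Standard TPM: length-normalize each count, then scale so that
    the values in one sample sum to one million."""
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]


def summarize(tpm_by_sample):
    """Return (mean, median) TPM per transcript across samples.
    tpm_by_sample holds one list of TPM values per sample."""
    return [(mean(values), median(values)) for values in zip(*tpm_by_sample)]


# Made-up TPM values for two transcripts in three samples (not study data).
example = [
    [0.0, 4.0],
    [0.0, 6.0],
    [9.0, 5.0],
]
summary = summarize(example)
```

A transcript detected in only a minority of samples can have a positive mean TPM and a median of 0, which is the pattern shown for LINC-CNAG_07633-1 in the table.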