Juan Pablo Lopez, Alpha Diallo, Cristiana Cruceanu, Laura M Fiori, Sylvie Laboissiere, Isabelle Guillet, Joelle Fontaine, Jiannis Ragoussis, Vladimir Benes, Gustavo Turecki, Carl Ernst.
Abstract
BACKGROUND: Small ncRNAs (sncRNAs) offer great hope as biomarkers of disease and response to treatment. This has been highlighted in the context of several medical conditions such as cancer, liver disease, cardiovascular disease, and central nervous system disorders, among many others. Here we assessed several steps involved in the development of an ncRNA biomarker discovery pipeline, ranging from sample preparation to bioinformatic processing of small RNA sequencing data.
Year: 2015 PMID: 26130076 PMCID: PMC4487992 DOI: 10.1186/s12920-015-0109-x
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 Illustration of study design and samples. Human biological samples (N = 45) were included in the present study. a Peripheral blood from a single individual was split into 11 aliquots (technical replicates) to test three different small RNA library purification methods: Novex TBE PAGE gel (N = 3), Pippin Prep automated gel system (PPS) (N = 4), and AMPure XP beads (N = 3). Sample C1 (control-human brain) (N = 1), sample AC (control-no purification method) (N = 1). b Peripheral blood from a single individual was split into 5 aliquots (technical replicates) to test optimal amounts of RNA input: (1 μg), (0.5 μg), (0.25 μg), (0.1 μg), and (0.05 μg). All libraries were purified using the PPS system. c Peripheral blood samples from 15 healthy volunteers (biological replicates) to test the effects of RNA integrity. Samples were split into 5 groups (N = 3) with average RIN values of 9, 7, 5.4, 2.2 and 0. All libraries were purified using AMPure XP beads. d Peripheral blood samples from 12 healthy volunteers (biological replicates) to test effects of sequencing coverage. Samples were sequenced on both a HiSeq2500 (N = 12) and a MiSeq (N = 12) Illumina sequencer. All libraries were purified using AMPure XP beads. e Human whole-blood (N = 4), brain (N = 4), heart (N = 4) and liver (N = 4) tissues to test expression and tissue specificity of small ncRNAs. All libraries were purified using AMPure XP beads
Bioinformatic output measures for small RNA sequencing quality control
| QC metric | Description |
|---|---|
| Raw Reads | According to Illumina guidelines for small RNA sequencing, 1–2 M reads is an accepted range for expression profiling experiments, while 2–5 M reads is the accepted range for discovery applications. |
| Size | To avoid background noise due to small fragments of degraded RNA, we removed all reads <15 nt. Size filtering can be easily modified to target a specific small RNA species. For example, 15–28 nt (miRNAs), 24–31 nt (piRNAs), or 15–40 nt if interested in all small ncRNAs. |
| Quality | Quality (Q) is based on a Phred score, which estimates sequencing error probabilities per base. A Q = 10 means a 1/10 probability of incorrect base calling or 90 % accuracy; Q = 20 (1/100; 99 %); Q = 30 (1/1000; 99.9 %); and Q = 40 (1/10000; 99.99 %). We removed reads with a quality score <30. |
| Adapter-Adapter | Adapter detection can be adjusted to allow for one or more mismatches in the first 10 nt to identify and trim the adapters. To favor high-quality reads, we set our adapter detection threshold to a perfect 10-nt match. Ligation of the 3′ and 5′ adapters to each other happens by chance at a very low rate. However, this can become an important issue for libraries prepared from very small amounts of RNA. We removed all adapter-adapter reads. |
| RNAs > 40 nt | This feature refers to RNA reads larger than 40 nt in length. In most cases these reads map to midsize and larger non-coding RNA populations. The percentage of reads >40 nt can vary (1 %–50 %) depending on library preparation method used. |
| Surviving Reads | This metric shows the number of reads that pass all the quality and trimming filters previously described. A good quality library should have surviving rates between 50 % and 100 %, depending on method used. |
| Unmapped | Due to sequencing errors, stringent QC filters, or RNA from other species (usually added as a control, e.g. PhiX), a very small percentage of reads do not map to any human genomic location. |
| Unique & Multi-Mapped | In contrast to other types of sequencing (DNA and larger RNA), the percentage of reads that map to multiple genomic locations in small RNA sequencing is expected to be high (>50 %). Several small RNAs are encoded at more than one genomic location. This is thought to be a compensatory mechanism or response to ncRNA knockouts by random mutations. |
| miRNA | We used miRBase to align our reads to known miRNA species. A high percentage of reads aligned to miRNAs is expected. However, this percentage can vary depending on the source and quality of RNA. |
| Other ncRNAs | Rfam and NCBI’s piRNA databases were used to map our reads to other small RNA species. The number of these reads is very small compared to miRNAs. However, just like with miRNAs, the number of reads mapping back to other sncRNAs is associated with the source and quality of RNA. |
| (Repeat, Coding gene, Unknown) | This refers to an additional portion of reads that map to repetitive sequences, coding genes, and unknown sequences in the human genome. The number of these reads is expected to be low. |
| miRNA Count | We set a detection threshold at one count per miRNA (present at least once in each of the libraries tested) in order to get a better picture of lowly expressed miRNAs. However, for quantification and discovery studies, we recommend higher detection thresholds, usually >10 or >20 counts per miRNA, to avoid background noise and false positives. |
Important quality control (QC) measures for bioinformatic analysis of our high-throughput biomarker discovery pipeline
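The size and quality filters described in the table can be illustrated with a minimal Python sketch. The Phred relation (error probability = 10^(-Q/10)) is standard, and the 15 nt, 40 nt and Q = 30 cut-offs come from the table. The Phred+33 quality encoding, the use of a per-read mean quality, and the function names are assumptions made for illustration; the source does not state how the per-read quality score was computed.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q to the probability of an incorrect base call."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 encoding (an assumption)."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_filters(sequence: str, quality_string: str,
                   min_len: int = 15, max_len: int = 40,
                   min_mean_q: float = 30.0) -> bool:
    """Return True if a trimmed read survives the size and quality filters.

    Reads shorter than min_len or longer than max_len are rejected, as are
    reads whose mean Phred score is below min_mean_q. Using the mean score
    is an illustrative choice, not necessarily the authors' criterion.
    """
    if not (min_len <= len(sequence) <= max_len):
        return False
    quals = decode_phred33(quality_string)
    if not quals:
        return False
    return sum(quals) / len(quals) >= min_mean_q


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q = {q}: error probability {phred_to_error_prob(q):.4f}")
    # A hypothetical 22-nt read with uniformly high quality ('I' encodes Q = 40).
    print(passes_filters("A" * 22, "I" * 22))
```

The length window can be narrowed through `min_len` and `max_len` to target a specific small RNA species, for example 15 to 28 nt for miRNAs as listed in the Size row.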
Purification method
| Method | Novex | PPS | AMPure | Control |
|---|---|---|---|---|
| Sample | A1-A3 | A4-A7 | A8-A10 | AC |
| Amount | 1 μg | 1 μg | 1 μg | 1 μg |
| RIN | 8.2 | 8.2 | 8.2 | 8.2 |
| Average Quality | 37 | 37 | 37 | 37 |
| Raw Reads (millions) | 8.840869 | 11.871091 | 9.152952 | 8.491022 |
| Size (<15 nt) | 0.40 % | 0.15 % | 2.12 % | 2.12 % |
| Low Quality (Q <30) | 1.56 % | 1.50 % | 1.20 % | 1.21 % |
| Adapter-Adapter | 0.05 % | 0.03 % | 0.15 % | 0.53 % |
| RNAs >40 nt | 1.12 % | 2.09 % | 21.73 % | 18.66 % |
| Surviving Reads | 96.87 % | 96.24 % | 74.78 % | 77.47 % |
| Unmapped | 1.31 % | 1.50 % | 2.00 % | 1.89 % |
| Unique-Mapped | 7.21 % | 6.44 % | 6.52 % | 6.42 % |
| Multi-Mapped | 91.47 % | 92.06 % | 91.47 % | 91.69 % |
| miRNA | 96.92 % | 96.45 % | 96.03 % | 96.24 % |
| Other ncRNAs | 0.42 % | 0.46 % | 0.49 % | 0.48 % |
| Repeat | 0.77 % | 1.04 % | 0.88 % | 0.82 % |
| Coding Gene | 0.05 % | 0.04 % | 0.04 % | 0.04 % |
| Unknown | 0.52 % | 0.48 % | 0.41 % | 0.42 % |
| miRNA Count (≥1) | 415 | 425 | 370 | 372 |
Small RNA data analysis shows the percentage, composition and quality of reads from eleven libraries produced by our bioinformatics pipeline in order to test and compare three different small RNA library preparation methods
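The percentages in this table appear to be internally consistent: the Surviving Reads row is close to 100 % minus the four filtered categories (size, low quality, adapter-adapter, and RNAs >40 nt). The relation is inferred from the reported numbers rather than stated in the source, so the short check below should be read as a sanity-check sketch, with all names chosen for illustration.

```python
# Percentages of raw reads removed by each filter, copied from the table above,
# in the order: size (<15 nt), low quality (Q <30), adapter-adapter, RNAs >40 nt.
FILTERED = {
    "Novex": (0.40, 1.56, 0.05, 1.12),
    "PPS": (0.15, 1.50, 0.03, 2.09),
    "AMPure": (2.12, 1.20, 0.15, 21.73),
    "Control": (2.12, 1.21, 0.53, 18.66),
}

# Surviving Reads row as reported in the table.
REPORTED_SURVIVING = {"Novex": 96.87, "PPS": 96.24, "AMPure": 74.78, "Control": 77.47}


def surviving_percent(filtered):
    """Surviving reads as 100 % minus the sum of the filtered categories."""
    return 100.0 - sum(filtered)


def max_discrepancy():
    """Largest absolute gap between computed and reported surviving rates."""
    return max(
        abs(surviving_percent(FILTERED[m]) - REPORTED_SURVIVING[m])
        for m in FILTERED
    )


if __name__ == "__main__":
    for method, filtered in FILTERED.items():
        computed = surviving_percent(filtered)
        print(f"{method}: computed {computed:.2f} %, "
              f"reported {REPORTED_SURVIVING[method]:.2f} %")
```

Small gaps between computed and reported values are expected because the table entries are rounded to two decimals.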
Fig. 2 Quality control (QC) data (A1-A10). a Mean quality value scores over 40 nt. b Distribution of reads based on length (19–25 nt, microRNAs) (30–35 nt, other sncRNAs). c-d Total number of reads, mapping percentages, and fraction of reads mapping RNA species
Library preparation: purification methods
| Method | Specificity | Throughput | Cost ($) | Study size |
|---|---|---|---|---|
| Novex TBE PAGE gel | High (manually cutting band; very specific) | Low (few libraries/day) | $$$$$ | Small (2–10 samples) |
| Pippin Prep automated gel system | Medium (automated band; less specific) | Low (4 libraries/run [2 h]) | $$$ | Medium (10–50 samples) |
| AMPure XP beads | Low (all products >100 nt) | High (24 libraries/2 h) | $ | Large (50 and up) |
Recommendations for small RNA sequencing library purification. Recommendations include: (1) Specificity: based on specificity to a particular small RNA population. (2) Throughput: based on the number of libraries that can be prepared per day and efficiency of processing. This number is relative to the number of people working and instruments available in the lab. (3) Cost: based on price of reagents, hands-on laboratory time, service fees by genome centers. (4) Study Size: based on number of biological or technical replicates
Total RNA input
| Sample | A11 | A12 | A13 | A14 | A15 |
|---|---|---|---|---|---|
| Amount | 1 μg | 0.5 μg | 0.25 μg | 0.1 μg | 0.05 μg |
| RIN | 8.2 | 8.2 | 8.2 | 8.2 | 8.2 |
| Average Quality | 38 | 38 | 38 | 38 | 38 |
| Raw Reads (millions) | 13.862726 | 7.995412 | 11.234898 | 11.921206 | 13.026487 |
| Size (<15 nt) | 0.12 % | 0.11 % | 0.54 % | 0.18 % | 0.29 % |
| Low Quality (Q <30) | 0.99 % | 1.02 % | 1.11 % | 1.04 % | 1.22 % |
| Adapter-Adapter | 0.02 % | 0.03 % | 0.13 % | 0.13 % | 0.17 % |
| RNAs >40 nt | 0.75 % | 3.40 % | 1.05 % | 1.33 % | 0.91 % |
| Surviving Reads | 98.12 % | 95.44 % | 97.17 % | 97.32 % | 97.41 % |
| Unmapped | 1.64 % | 2.17 % | 1.93 % | 1.99 % | 2.11 % |
| Unique-Mapped | 7.08 % | 7.70 % | 9.03 % | 8.75 % | 9.39 % |
| Multi-Mapped | 91.27 % | 90.14 % | 89.03 % | 89.27 % | 88.51 % |
| miRNA | 96.40 % | 93.99 % | 94.35 % | 93.95 % | 93.48 % |
| Other ncRNAs | 0.47 % | 0.78 % | 0.84 % | 0.88 % | 0.95 % |
| Repeat | 0.86 % | 2.20 % | 1.77 % | 2.04 % | 2.16 % |
| Coding Gene | 0.05 % | 0.07 % | 0.09 % | 0.09 % | 0.11 % |
| Unknown | 0.57 % | 0.79 % | 1.02 % | 1.05 % | 1.20 % |
| miRNA Count (≥1) | 499 | 424 | 536 | 558 | 560 |
Small RNA data analysis shows the percentage, composition and quality of reads from five libraries produced by our bioinformatics pipeline to test RNA input amounts for small RNA library preparation
RNA degradation: whole-blood
| Sample | C1-C3 | C4-C6 | C7-C9 | C10-C12 | C13-C15 |
|---|---|---|---|---|---|
| Tissue | Blood | Blood | Blood | Blood | Blood |
| RIN | 9 | 7 | 6 | 2 | 0 |
| Average Quality | 36 | 36 | 36 | 36 | 35 |
| Raw Reads (millions) | 14.221591 | 15.528347 | 12.679709 | 14.225867 | 11.689436 |
| Size (<15 nt) | 3.78 % | 4.92 % | 3.99 % | 3.54 % | 6.23 % |
| Low Quality (Q <30) | 2.82 % | 2.96 % | 3.00 % | 2.63 % | 3.42 % |
| Adapter-Adapter | 1.11 % | 0.47 % | 0.38 % | 0.85 % | 3.35 % |
| RNAs >40 nt | 25.56 % | 21.41 % | 28.98 % | 23.04 % | 15.47 % |
| Surviving Reads | 66.73 % | 70.24 % | 63.67 % | 69.95 % | 71.53 % |
| Unmapped | 3.26 % | 4.30 % | 3.53 % | 2.81 % | 3.40 % |
| Unique-Mapped | 7.78 % | 8.83 % | 6.82 % | 7.98 % | 7.74 % |
| Multi-Mapped | 88.96 % | 86.87 % | 89.65 % | 89.21 % | 88.86 % |
| miRNA | 91.57 % | 87.64 % | 92.01 % | 93.66 % | 89.99 % |
| Other ncRNAs | 1.20 % | 3.74 % | 0.93 % | 0.84 % | 1.40 % |
| Repeat | 2.52 % | 2.57 % | 2.29 % | 1.64 % | 3.25 % |
| Coding Gene | 0.11 % | 0.23 % | 0.09 % | 0.08 % | 0.17 % |
| Unknown | 1.35 % | 1.53 % | 1.15 % | 0.97 % | 1.79 % |
| miRNA Count (≥1) | 469 | 463 | 399 | 476 | 488 |
Small RNA data analysis shows the percentage, composition and quality of reads from 15 libraries produced by our bioinformatics pipeline to test the effects of RNA quality on small RNA library preparation
Sequencing coverage
| Sample | D1-D12 | D1-D12 | Pearson (r) |
|---|---|---|---|
| RIN | 7.4 | 7.4 | ----- |
| Sequencer | HiSeq2500 | MiSeq | ----- |
| Average Quality | 37 | 36 | ----- |
| Raw Reads (millions) | 11.556456 | 0.889645 | ----- |
| Size (<15 nt) | 6.39 % | 6.79 % | 0.99039 |
| Low Quality (Q <30) | 1.68 % | 1.33 % | 0.95246 |
| Adapter-Adapter | 0.27 % | 0.32 % | 0.99639 |
| RNAs >40 nt | 37.63 % | 33.13 % | 0.98538 |
| Surviving Reads | 54.03 % | 58.42 % | 0.98834 |
| Unmapped | 6.01 % | 5.61 % | 0.99573 |
| Unique-Mapped | 12.73 % | 13.14 % | 0.99653 |
| Multi-Mapped | 81.27 % | 81.24 % | 0.99672 |
| miRNA | 86.11 % | 85.77 % | 0.99374 |
| Other ncRNAs | 1.81 % | 1.83 % | 0.99512 |
| Repeat | 3.83 % | 4.37 % | 0.99679 |
| Coding Gene | 0.14 % | 0.15 % | 0.98360 |
| Unknown | 2.11 % | 2.26 % | 0.99109 |
| miRNA Count (≥1) | 563 | 231 | 0.99997 |
| miRNA Count (≥10) | 264 | 111 | 0.99997 |
| miRNA Count (≥20) | 217 | 92 | 0.99998 |
Small RNA data analysis shows the percentage, composition and quality of reads from 12 libraries produced by our bioinformatics pipeline in order to test sequencing coverage for small RNA sequencing. Libraries were sequenced on both HiSeq2500 and MiSeq Illumina sequencers
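The Pearson (r) column compares each QC metric between the two sequencers. The source does not spell out the exact computation; a plausible reading is that, for each metric, the 12 per-library values from the HiSeq2500 are correlated with the 12 values from the MiSeq. The sketch below implements the standard Pearson coefficient under that assumption. The example values are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-library percentages of reads mapping to miRNAs.
    hiseq = [86.0, 84.5, 88.1, 85.3]
    miseq = [85.6, 84.1, 87.9, 85.2]
    print(f"r = {pearson_r(hiseq, miseq):.5f}")
```

A coefficient near 1 indicates that libraries rank and scale similarly on both platforms for that metric, even when absolute read depth differs by an order of magnitude.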
Number of total miRNAs expected per million reads in whole-blood
| # of Reads (million) | 12 M | 6 M | 3 M | 1.5 M | 1 M |
|---|---|---|---|---|---|
| miRNA Count (≥1) | 563 | 446 | 353 | 289 | 263 |
| miRNA Count (≥10) | 264 | 216 | 177 | 138 | 124 |
| miRNA Count (≥20) | 217 | 177 | 138 | 111 | 101 |
Number of total miRNAs expected per million reads at three different thresholds of detection
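The detection thresholds used in this table can be expressed as a small counting routine. Following the miRNA Count description in the QC table, a miRNA is counted as detected only if it reaches the threshold in every library tested. The sketch below is illustrative: the function name and the read counts are hypothetical, and only the thresholds (1, 10 and 20 counts) come from the source.

```python
def count_detected(libraries, thresholds=(1, 10, 20)):
    """Count miRNAs whose read count reaches each threshold in every library.

    libraries: list of dicts mapping miRNA name -> read count.
    Returns a dict mapping threshold -> number of detected miRNAs.
    """
    names = set().union(*libraries)
    detected = {}
    for t in thresholds:
        detected[t] = sum(
            1 for name in names
            if all(lib.get(name, 0) >= t for lib in libraries)
        )
    return detected


if __name__ == "__main__":
    # Hypothetical read counts for two libraries (not data from the study).
    libs = [
        {"hsa-miR-16-5p": 1500, "hsa-miR-451a": 25,
         "hsa-let-7a-5p": 12, "hsa-miR-122-5p": 1},
        {"hsa-miR-16-5p": 900, "hsa-miR-451a": 18, "hsa-let-7a-5p": 30},
    ]
    print(count_detected(libs))
```

Raising the threshold trades sensitivity for robustness: lowly expressed miRNAs drop out first, which is why the counts in the table fall both with stricter thresholds and with fewer reads.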
Fig. 3 Expression of miRNAs in four human samples. Pie graph showing: a Whole-blood. b Brain. c Heart. d Liver
Fig. 4 Tissue-specific patterns of expression of miRNAs in human samples. Venn diagram showing: a Whole-blood vs. Brain vs. Heart vs. Liver. b Co-expression levels of miRNAs between whole-blood and other tissues
Fig. 5 MicroRNA expression. a Bar graph showing small RNA sequencing Log2 expression of eight miRNAs in human whole-blood. b qRT-PCR validation. c Correlation of small RNA sequencing and qRT-PCR expression levels
Fig. 6 Expression and distribution of other small non-coding RNAs in four human samples. Pie graph showing: a Whole-blood. b Brain. c Heart. d Liver
Fig. 7 Tissue-specific patterns of expression of other small non-coding RNAs in human samples. Venn diagram showing: a Whole-blood vs. Brain vs. Heart vs. Liver. b Co-expression levels of miRNAs between whole-blood and other tissues