Marie-Ange Palomares1,2, Cyril Dalmasso3,4,5,6, Eric Bonnet7,8, Céline Derbois1,2, Solène Brohard-Julien1,2, Christophe Ambroise3,4,5,6, Christophe Battail1,2, Jean-François Deleuze1,2, Robert Olaso1,2.
Abstract
High-throughput RNA-sequencing has become the gold standard method for whole-transcriptome gene expression analysis, and is widely used in numerous applications to study cell and tissue transcriptomes. It is also being increasingly used in a number of clinical applications, including expression profiling for diagnostics and alternative transcript detection. However, despite its many advantages, RNA sequencing can be challenging in some situations, for instance in cases of low input amounts or degraded RNA samples. Several protocols have been proposed to overcome these challenges, and many are available as commercial kits. In this study, we systematically test three recent commercial technologies for RNA-seq library preparation (TruSeq, SMARTer and SMARTer Ultra-Low) on human biological reference materials, using standard (1 μg), low (100 ng and 10 ng) and ultra-low (<1 ng) input amounts, and for mRNA and total RNA, stranded and unstranded. The results are analyzed using read quality and alignment metrics, gene detection and differential gene expression metrics. Overall, we show that the TruSeq kit performs well with an input amount of 100 ng, while the SMARTer kit shows decreased performance for inputs of 100 and 10 ng, and the SMARTer Ultra-Low kit performs relatively well for input amounts <1 ng. All the results are discussed in detail, and we provide guidelines for biologists for the selection of an RNA-seq library preparation kit.
Year: 2019 PMID: 31101892 PMCID: PMC6525156 DOI: 10.1038/s41598-019-43983-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Overview of the different RNA preparation kits and conditions analyzed in this study.
| Kit | Manufacturer | RNA type | Input (ng) | Stranded | Acronym | PC |
|---|---|---|---|---|---|---|
| mRNA TruSeq | Illumina | mRNA | 1000 | no | mRNATruseq_1 ug | 13 |
| Stranded mRNA TruSeq | Illumina | mRNA | 1000 | yes | ssmRNATruseq_1 ug | 13 |
| Stranded total RNA TruSeq | Illumina | total RNA | 1000 | yes | ssTotRNATruseq_1 ug | 14 |
| Stranded mRNA TruSeq | Illumina | mRNA | 100 | yes | ssmRNATruseq_100 ng | 15 |
| Stranded total RNA TruSeq | Illumina | total RNA | 100 | yes | ssTotRNATruseq_100 ng | 15 |
| RiboGone Stranded SMARTer | Takara Bio | total RNA | 100 | yes | RiboG_ssTotSmarter_100 ng | 18 |
| RiboZero Gold stranded SMARTer | Illumina + Takara Bio | total RNA | 100 | yes | RiboZ_ssTotSmarter_100 ng | 18 |
| RiboZero Gold stranded SMARTer | Illumina + Takara Bio | total RNA | 10 | yes | RiboZ_ssTotSmarter_10 ng | 18 |
| SMARTer UL + Nextera | Takara Bio + Illumina | mRNA | 1 (0.75) | no | mRNASmarterUL_XT1 ng_750 pg | 11/12 |
| SMARTer UL + Nextera | Takara Bio + Illumina | mRNA | 1 (0.13) | no | mRNASmarterUL_XT1 ng_130 pg | 11/12 |
PC: number of PCR cycles performed for each application. For the Ultra-Low SMARTer kits, the first number is the number of cycles for the LD-PCR (first step), and the second number is the number of cycles for the PCR (second step).
Figure 1. Read quality metrics for all samples and conditions. The metrics shown in the figure are the raw number of reads in the fastq files, the mapping rate after sequence trimming, the duplication rate and the average insert size.
Figure 2. Normalized alignment rates (vertical axis) to intergenic, exon and intron regions for all the samples and conditions (horizontal axis). The mRNA samples are located on the left side of the figure and the total RNA samples are on the right side.
Figure 3. Number of genes detected (vertical axis) for all sampling levels and all conditions (horizontal axis).
Figure 4. Percentage of pseudogenes (blue), protein-coding (green) and non-coding RNAs (red) for 1 μg TruSeq samples (A) and 100 ng TruSeq samples (B). The percentages are indicated within the bars. The vertical axis indicates the percentage and the horizontal axis indicates the sample type and sampling level. In each panel, the mRNA samples are on the left side and total RNA samples are on the right.
Figure 5. Heatmap of the coverage percentage for 1,000 genes having a medium expression level for the standard and low input amount categories. The coverage values are standardized to take into account the different gene lengths.
Figure 6. Number of differentially expressed genes (DEG, vertical axis) detected between samples A and B for all sampling levels and conditions (horizontal axis).
Overlap in the number of differentially expressed genes (DEG) between a reference set (mRNATruseq_1 ug) and all the other conditions for the sampling level of 2 × 25 M.
| Set 1 | Set 2 | NDEG Set 1 | NDEG Set 2 | Inter. S1 S2 | % |
|---|---|---|---|---|---|
| mRNATruseq_1 ug | ssmRNATruseq_1 ug | 16983 | 16942 | 15505 | 91% |
| mRNATruseq_1 ug | ssTotRNATruseq_1 ug | 16983 | 15181 | 13457 | 79% |
| mRNATruseq_1 ug | ssmRNATruseq_100 ng | 16983 | 16305 | 15070 | 89% |
| mRNATruseq_1 ug | ssTotRNATruseq_100 ng | 16983 | 16367 | 14248 | 84% |
| mRNATruseq_1 ug | RiboG_ssTotSmarter_100 ng | 16983 | 13702 | 12285 | 72% |
| mRNATruseq_1 ug | RiboZ_ssTotSmarter_100 ng | 16983 | 10677 | 10013 | 59% |
| mRNATruseq_1 ug | RiboZ_ssTotSmarter_10 ng | 16983 | 4562 | 4419 | 26% |
| mRNATruseq_1 ug | mRNASmarterUL_XT1 ng_750 pg | 16983 | 9671 | 9233 | 54% |
| mRNATruseq_1 ug | mRNASmarterUL_XT1 ng_130 pg | 16983 | 9775 | 9330 | 55% |
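NDEG: number of DEG detected in each set. Inter. S1 S2: number of DEG common to both sets. The percentage is the number of DEG common to both sets divided by the number of DEG in the reference set (NDEG Set 1).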
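The percentage column can be reproduced from the other columns of the table. The following is a minimal sketch, assuming the percentage is the intersection count divided by the reference-set count and rounded to the nearest whole percent; the function name `overlap_percent` is hypothetical and the three rows are copied from the table above.

```python
def overlap_percent(n_reference: int, n_intersection: int) -> int:
    """Share of reference-set DEG also found in the other condition.

    Assumption: the table rounds to the nearest whole percent.
    """
    return round(100 * n_intersection / n_reference)


# (Set 2 acronym, NDEG Set 1, Inter. S1 S2), copied from the table
rows = [
    ("ssmRNATruseq_1 ug", 16983, 15505),
    ("ssTotRNATruseq_1 ug", 16983, 13457),
    ("RiboZ_ssTotSmarter_10 ng", 16983, 4419),
]

for name, n_ref, n_inter in rows:
    print(f"{name}: {overlap_percent(n_ref, n_inter)}%")
```

Because the denominator is always the reference set, a condition that detects few DEG overall (such as RiboZ_ssTotSmarter_10 ng) scores low even when nearly all of its own DEG are shared with the reference.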
Figure 7. Ratios (vertical axis) between real and predicted gene expression values for samples C for all sampling levels and conditions (horizontal axis). The red dotted line indicates a ratio value of 1, i.e. a perfect match between real and predicted values. Some of the outlier values are not shown in this figure.
Figure 8. Ratios (vertical axis) between real and predicted gene expression values for samples D for all sampling levels and conditions (horizontal axis). The red dotted line indicates a ratio value of 1, i.e. a perfect match between real and predicted values. Some of the outlier values are not shown in this figure.
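The ratio plotted in Figures 7 and 8 can be illustrated with a short sketch. This record does not say how the predicted values were obtained. The code below assumes that samples C and D are known mixtures of samples A and B, so that the predicted expression of a gene is a weighted sum of its expression in A and B. The mixing fraction `frac_a`, the function names and the numeric values are all hypothetical.

```python
import math


def predicted_expression(expr_a: float, expr_b: float, frac_a: float) -> float:
    """Expected expression in a mixture containing a fraction frac_a of sample A.

    Assumption: the mixture is linear, with the remainder (1 - frac_a) from sample B.
    """
    return frac_a * expr_a + (1.0 - frac_a) * expr_b


def real_to_predicted_ratio(observed: float, expr_a: float, expr_b: float,
                            frac_a: float) -> float:
    """Ratio of observed to predicted expression; 1.0 is a perfect match."""
    predicted = predicted_expression(expr_a, expr_b, frac_a)
    if predicted == 0:
        return math.nan  # ratio undefined when nothing is predicted
    return observed / predicted


# Illustrative values only (not taken from the study)
ratio = real_to_predicted_ratio(observed=88.0, expr_a=100.0, expr_b=20.0, frac_a=0.75)
print(f"ratio = {ratio:.2f}")
```

A ratio above 1 means the gene was measured at a higher level than the mixture model predicts, and a ratio below 1 means it was measured at a lower level.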
Assessment summary of the different RNA preparation kits and conditions analyzed in this study.
| Kit/Input | NR | MR | DR | GD | DEG | AC |
|---|---|---|---|---|---|---|
| mRNATruseq_1 ug | ***** | ***** | ***** | ***** | ***** | Yes |
| ssmRNATruseq_1 ug | ***** | ***** | ***** | ***** | ***** | Yes |
| ssTotRNATruseq_1 ug | ***** | ***** | ***** | ***** | ***** | Yes |
| ssmRNATruseq_100 ng | ***** | ***** | *** | **** | ***** | Yes |
| ssTotRNATruseq_100 ng | ***** | ***** | ***** | ***** | ***** | Yes |
| RiboG_ssTotSmarter_100 ng | **** | * | *** | **** | *** | Yes |
| RiboZ_ssTotSmarter_100 ng | ** | ** | ** | *** | ** | No |
| RiboZ_ssTotSmarter_10 ng | * | ** | * | *** | * | No |
| mRNASmarterUL_XT1 ng_750 pg | *** | **** | NA | *** | *** | Yes |
| mRNASmarterUL_XT1 ng_130 pg | *** | **** | NA | *** | *** | Yes |
The table presented here is based on the 2 × 10 M sampling level. The number of stars indicates the global level of quality of the metrics used, with a maximum of five stars for best quality. NR, MR and DR: sequencing and alignment quality metrics (Number of reads, mapping rate and duplication rate). GD: gene detection metrics. DEG: differentially expressed gene metrics. AC: automation capability.