Carrie Wright, Anandita Rajpurohit, Emily E Burke, Courtney Williams, Leonardo Collado-Torres, Martha Kimos, Nicholas J Brandon, Alan J Cross, Andrew E Jaffe, Daniel R Weinberger, Joo Heon Shin.
Abstract
BACKGROUND: RNA sequencing offers advantages over other quantification methods for microRNA (miRNA), yet numerous biases make reliable quantification challenging. Previous evaluations of these biases have focused on adapter ligation bias with limited evaluation of reverse transcription bias or amplification bias. Furthermore, evaluations of the quantification of isomiRs (miRNA isoforms) or the influence of starting amount on performance have been very limited. No study had yet evaluated the quantification of isomiRs of altered length or compared the consistency of results derived from multiple moderate starting inputs. We therefore evaluated quantifications of miRNA and isomiRs using four library preparation kits, with various starting amounts, as well as quantifications following removal of duplicate reads using unique molecular identifiers (UMIs) to mitigate reverse transcription and amplification biases.
Keywords: Library preparation; Small RNA sequencing; Unique molecular identifiers; isomiRs; microRNA
Year: 2019 PMID: 31226924 PMCID: PMC6588940 DOI: 10.1186/s12864-019-5870-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Study Design. (a) We evaluated the influence of starting amount on the consistency of results, as well as the accuracy of results obtained when using a variety of methods, including those intended to reduce bias from adapter ligation, reverse transcription (RT), and PCR amplification. (b) We compared four commercially available kits and two preprocessing methods to address RT and PCR bias. (c) In the Deduped method we collapsed duplicate reads based on a unique molecular identifier (UMI) of degenerate bases in the adapter sequences (bases within the black boxes). We also compared the collapsed data with a 5% subset of the NEXTflex data to determine if performance differences were due to the UMI-based collapsing of reads or simply due to having fewer reads. (d) We evaluated two data types: miRNA quantifications from homogenate whole brain total RNA and miRNA quantifications from a pool of 962 equimolar synthetic RNAs with sequences corresponding to human, rat, mouse, and virus miRNA. We had two batches of human brain data. The first included triplicates of different starting amounts based on the kit manufacturers’ suggested ranges. The second included a single sample of the same human brain with 1000 ng of input. We used 300 ng of the synthetic miRNAs for each tested method. (e) Our processing pipelines for the two types of RNA studied. (f) We evaluated the 6 small-RNA sequencing methods using 4 major assessments. The brain icon indicates utilization of brain samples to assess a question, while the red tube indicates utilization of synthetic miRNA samples.
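The UMI-based collapsing described in panel c can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each read has already been parsed into a hypothetical `(umi, insert_sequence)` pair, and it treats reads with an identical UMI and an identical insert as copies of one original molecule. The UMIs and sequences are made up for the example.

```python
from collections import Counter

def collapse_by_umi(reads):
    """Count unique molecules per insert sequence.

    reads: iterable of (umi, insert_sequence) pairs (hypothetical input format).
    Reads sharing both UMI and insert are assumed to be RT/PCR duplicates.
    """
    unique_molecules = set(reads)  # drops exact (umi, insert) repeats
    return Counter(seq for _umi, seq in unique_molecules)

# Illustrative reads only; not data from the study.
reads = [
    ("ACGTTGCA", "TGAGGTAGTAGGTTGTATAGTT"),
    ("ACGTTGCA", "TGAGGTAGTAGGTTGTATAGTT"),  # duplicate of the read above
    ("GGCATTAC", "TGAGGTAGTAGGTTGTATAGTT"),  # same insert, different molecule
    ("ACGTTGCA", "TAGCTTATCAGACTGATGTTGA"),
]

raw_counts = Counter(seq for _umi, seq in reads)
deduped_counts = collapse_by_umi(reads)
```

In this toy input the first insert has three raw reads but only two distinct UMIs, so collapsing lowers its count while leaving the singleton insert unchanged.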
Fig. 2 Similarity Assessment. (a) Dendrogram depicting cluster analysis shows that samples largely cluster by method and starting amount. (b) MA plot demonstrating the difference between the miRNA quantifications by the various library preparation methods. Individual points represent the miRNAs quantified; the y-axis of each plot shows the log ratio, which is the difference between the log2 transformed and DESeq2 normalized quantification estimates between the two methods, while the x-axis shows the average expression of each miRNA for the pair of methods compared (also log2 transformed and DESeq2 normalized). See the Similarity analysis section of the methods for more information. Thus the plot in the upper left corner shows the difference in quantification estimates between Clontech and Illumina for each miRNA quantified by both methods. We can see that some individual miRNAs greatly differ between the two methods across the full range of expression levels, as there is a difference of up to roughly 6 between the two methods for some miRNAs (with quantifications being much lower for the Clontech method relative to the Illumina method for these miRNAs), while the range of expression is roughly 6 to 18. As another example, the lower right plot shows the difference between the Deduped method and the Fivepercent method. Quantification estimates are quite similar, except for abundant miRNAs which show lower expression in the Deduped method, as one might expect. (c) The percent of variance explained by method, starting amount, batch, the number of reads mapped to miRNA, and the variance unaccounted for by these factors. Each point represents the variance explained by each factor for an individual miRNA sequence that was quantified by all of the tested methods.
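The two axes of the MA plot in panel b can be computed as in the sketch below. It assumes the inputs are already normalized counts keyed by miRNA name, and it adds a pseudocount of 1 before the log2 transform; the pseudocount, the function name, and the example values are assumptions for illustration and are not taken from the paper.

```python
import math

def ma_values(norm_a, norm_b, pseudocount=1.0):
    """Return {mirna: (M, A)} for miRNAs quantified by both methods.

    M is the log2 difference (method A minus method B);
    A is the mean of the two log2 values.
    The pseudocount is an assumption to avoid log2(0).
    """
    shared = set(norm_a) & set(norm_b)
    values = {}
    for mirna in shared:
        log_a = math.log2(norm_a[mirna] + pseudocount)
        log_b = math.log2(norm_b[mirna] + pseudocount)
        values[mirna] = (log_a - log_b, (log_a + log_b) / 2)
    return values

# Illustrative normalized counts only; not data from the study.
method_a = {"miR-x": 1023, "miR-y": 7}
method_b = {"miR-x": 15, "miR-y": 7, "miR-z": 3}

ma = ma_values(method_a, method_b)
```

A miRNA present in only one method, such as `miR-z` here, is left out, matching the caption's restriction to miRNAs quantified by both methods.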
Fig. 3 Accuracy Assessment. (a) Individual points represent the absolute difference of each synthetic miRNA quantification from the mean of all quantifications of the equimolar synthetic sequences for each small RNA sequencing method. (b) The variance of all the quantification estimates for the synthetic sequences. (c) The percent variance of synthetic sequence quantifications explained by each of these sequence characteristics: GC content, length, free energy of the predicted secondary structure (FoldG), identity of the first (5′) and last (3′) two bases, the count of individual bases, and the presence of repeat sequences, such as duplets of the same base or quadruplets of the same base. The heatmap legend shows the percentage of variance from 0 to 10%. (d) The percent variance explained by each of the sequence characteristics but weighted by the overall variance of each method, as shown in b. The heatmap legend shows the percentage of variance from 0 to 10%.
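Because the synthetic pool is equimolar, an unbiased method would give every sequence the same quantification, so the spread around the mean measures bias. The sketch below computes the per-sequence absolute difference (panel a) and the overall variance (panel b) for one method. It assumes the values are already normalized and log2 transformed, and it uses the sample variance; both choices, along with the names and numbers, are illustrative assumptions.

```python
import statistics

def accuracy_summary(log2_quants):
    """Return (absolute deviations from the mean, variance) for one method.

    log2_quants: {sequence_name: normalized log2 quantification}
    (hypothetical input format).
    """
    values = list(log2_quants.values())
    mean_q = statistics.mean(values)
    abs_dev = {name: abs(q - mean_q) for name, q in log2_quants.items()}
    variance = statistics.variance(values)  # sample variance (assumption)
    return abs_dev, variance

# Illustrative values only; not data from the study.
quants = {"syn_a": 10.0, "syn_b": 12.0, "syn_c": 14.0}
abs_dev, variance = accuracy_summary(quants)
```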
Fig. 4 Detection Diversity Assessment. (a) Mapping rate of various small RNAs utilizing the 1000 ng input human brain data. Undetermined indicates that the read did not map to the annotations of the evaluated small RNA classes. Error bars show standard deviation. Significance is only shown for methods that had significantly different mapping rates compared to all other methods with the same direction of change. (b) Mapping rate of small RNAs for all starting input amounts for each method. The Y-axis shows the percentage of reads of each category and the X-axis shows each tested brain sample. Statistical tests for the differences in mapping rates are shown in Additional file 6: Table S4. (c) The bars show the number of unique miRNAs with greater than 10 normalized reads common to all triplicates for the 1000 ng data of the first batch. Points indicate the number of unique miRNAs for each triplicate and the standard deviation error bars shown are for these points. (d) Percentage of miRNAs that had quantifications above 10 in only 1 or 2 of the triplicates. (e) Bars show number of unique isomiRs with greater than 100 normalized reads common to all triplicates for the 1000 ng data of the first batch. Points indicate the number of unique isomiRs for each triplicate and the standard deviation error bars shown are for these points. (f) Percentage of isomiRs that had quantifications above 100 in only 1 or 2 of the triplicates. (g) Overlap of unique miRNAs with greater than 10 normalized reads in all triplicates for the 1000 ng data of the first batch. (h) Overlap of unique isomiRs with greater than 100 normalized reads in all triplicates for the 1000 ng data of the first batch. (i) Number of false isomiRs detected for each of the 962 synthetic sequences. (j) Number of normalized reads (expression) of the false isomiRs. (k) Percent variance of the number of isomiRs observed for each synthetic sequence explained by various sequence characteristics. The heatmap legend shows the percentage of variance from 0 to 9%.
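The detection counts in panels c and d can be illustrated with simple set operations. The sketch assumes each replicate is a mapping from miRNA name to normalized reads and uses the threshold of 10 from the caption. The denominator for the percentage (miRNAs above threshold in at least one replicate) is an assumption, since the caption does not state it; names and counts are made up.

```python
def detection_summary(triplicates, threshold=10):
    """Return (miRNAs detected in all replicates, % detected in only some).

    triplicates: list of {mirna: normalized_reads} dicts, one per replicate
    (hypothetical input format).
    """
    detected = [
        {mirna for mirna, reads in rep.items() if reads > threshold}
        for rep in triplicates
    ]
    in_all = set.intersection(*detected)
    in_any = set.union(*detected)
    inconsistent = in_any - in_all  # above threshold in only 1 or 2 replicates
    # Denominator choice is an assumption, not stated in the caption.
    pct_inconsistent = 100 * len(inconsistent) / len(in_any) if in_any else 0.0
    return in_all, pct_inconsistent

# Illustrative counts only; not data from the study.
rep1 = {"miR-A": 50, "miR-B": 12, "miR-C": 3}
rep2 = {"miR-A": 40, "miR-B": 9, "miR-C": 2}
rep3 = {"miR-A": 60, "miR-B": 15, "miR-C": 11}

in_all, pct_inconsistent = detection_summary([rep1, rep2, rep3])
```

The same function would apply to the isomiR panels (e and f) by passing `threshold=100`.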
Fig. 5 Consistency Assessment. (a) Absolute difference of the normalized and log2 transformed quantifications (norm_quantifications) of the second batch from the mean of the triplicates of the first batch for each quantified miRNA of the 1000 ng input data. (b) Absolute difference of norm_quantifications for each quantified miRNA from a given triplicate to that of the mean of all three triplicates of the 1000 ng input data. (c) Absolute difference of norm_quantifications for each quantified miRNA from a given triplicate to that of the mean of all three triplicates of the data for all the starting inputs. (d) Percent variance of batch inconsistency (data in a) explained by various sequence factors. The heatmap legend shows the percentage of variance from 0 to 75%. (e) Percent variance of batch inconsistency (data in a) explained by various sequence factors weighted by the overall batch variance of each method. The heatmap legend shows the percentage of variance from 0 to 75%. (f) Plots of the association of expression and batch error. (g) Percent variance explained by various sequence factors of the triplicate inconsistency plotted in c. The heatmap legend shows the percentage of variance from 0 to 100%. (h) Percent variance explained by various sequence factors of the triplicate inconsistency plotted in c and weighted by the overall variance of triplicate error for each method. (i) Plots of the association of expression and triplicate inconsistency using all starting input data in c. The heatmap legend shows the percentage of variance from 0 to 100%.
Summary of Results
| Representative Figure(s) | Assessment | Clontech | Illumina | NEB | NEXTflex | Deduped | Fivepercent |
|---|---|---|---|---|---|---|---|
| Method information | |||||||
| Figure | Designed for adapter ligation mitigation? | Adapter Ligation free | None | PEG | PEG and Randomized adapters | PEG and Randomized adapters | PEG and Randomized adapters |
| Figure | Designed for reverse transcription and PCR bias mitigation? | None | None | None | None | UMI | None |
| Performance metrics | |||||||
| Similarity | |||||||
| Figure | How similar are the quantifications of the same brain sample? | Lowest correlations with other methods | Intermediate | Intermediate | Intermediate | Intermediate | Intermediate |
| Accuracy | |||||||
| Figure | Accuracy - similar detection of different sequences | Intermediate | Worst | Worst | Intermediate | Best | Intermediate |
| Detection Diversity | |||||||
| Figure | Mapping rate to miRNA | Worst | Best | Best | Best | Best^a | Best^a |
| Figure | Detection of miRNA across triplicates at 1000 ng | Intermediate | Intermediate | Intermediate | Intermediate | Best | Worst |
| Figure | Consistency of miRNA detection at 1000 ng | Worst | Intermediate | Best | Intermediate | Intermediate | Intermediate |
| Figure | Diversity of detection of isomiRs across triplicates at 1000 ng | Best | Intermediate | Intermediate | Worst | Intermediate | Worst |
| Figure | Consistency of isomiR detection at 1000 ng | Intermediate | Worst | Best | Intermediate | Intermediate | Intermediate |
| Figure | False isomiR detection per synthetic sequence | Worst | Intermediate | Intermediate | Best | Best | Best |
| Figure | False isomiR expression | Intermediate | Intermediate | Worst | Best | Best | Best |
| Consistency | |||||||
| Figure | Consistency across batch | Intermediate | Worst | Best | Best | Best | Intermediate |
| Figure | Consistency across triplicates | Intermediate | Worst | Best | Intermediate | Best | Worst |
| Overall conclusions | |||||||
| | | Poor mapping rate, more accurate miRNA quantification, poor isomiR accuracy | Not as accurate, less detection, and less consistency | Good mapping, not as accurate, quite consistent, poor isomiR accuracy | Good mapping, more accurate miRNA quantification, more accurate isomiR quantification | Best performance overall, more accurate miRNA quantification | Deduping improves performance, not just an effect of reducing number of reads |
The table depicts the results of the various assessments performed on the 6 small RNA sequencing methods. ^a The mapping rate of the raw NEXTflex data should reflect the quality of the data used for the Deduped and Fivepercent methods, as these are derived from the raw NEXTflex data.
Number of PCR cycles used for each sample
| Sample | Clontech | Illumina | NEB | NEXTflex |
|---|---|---|---|---|
| 100 ng Brain total RNA | 13 cycles | NA | 15 cycles | 18 cycles |
| 250 ng Brain total RNA | 13 cycles | NA | 15 cycles | 18 cycles |
| 500 ng Brain total RNA | 11 cycles | NA | 15 cycles | 18 cycles |
| 1000 ng Brain total RNA | 10 cycles | 11 cycles | 15 cycles | 18 cycles |
| 1500 ng Brain total RNA | 9 cycles | 11 cycles | NA | 18 cycles |
| 2000 ng Brain total RNA | 7 cycles | 11 cycles | NA | 18 cycles |
| 300 ng of equimolar synthetic pool | 7 cycles | 11 cycles | 15 cycles | 18 cycles |
Both batches of 1000 ng brain total RNA samples had the same number of cycles. We used the same number of cycles as suggested by each protocol for a range of input amounts. The brain samples were all derived from the same RNA extraction, purchased from Ambion, from a 74-year-old Caucasian female. The cause of death of this individual was respiratory failure. The Miltenyi Biotec miRXplore Universal Reference equimolar pool was used for the accuracy assessments.