Yu Fu, Pei-Hsuan Wu, Timothy Beane, Phillip D Zamore, Zhiping Weng.
Abstract
BACKGROUND: RNA-seq and small RNA-seq are powerful, quantitative tools to study gene regulation and function. Common high-throughput sequencing methods rely on polymerase chain reaction (PCR) to amplify the starting material, but not every molecule amplifies equally, causing some to be overrepresented. Unique molecular identifiers (UMIs) can be used to distinguish undesirable PCR duplicates derived from a single molecule from identical but biologically meaningful reads derived from different molecules.
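The distinction drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the read tuples, the `dedup_reads` function, and its `use_umi` flag are hypothetical names introduced only for illustration, and UMIs are compared by exact match with no error correction. Reads that share mapping coordinates and a UMI are treated as PCR duplicates, while reads that share coordinates but carry different UMIs are kept as distinct molecules.

```python
def dedup_reads(reads, use_umi=True):
    """Return names of reads kept after duplicate removal.

    reads: iterable of (chrom, pos, strand, umi, name) tuples (hypothetical layout).
    use_umi: if False, fall back to coordinate-only deduplication.
    """
    seen = set()
    kept = []
    for chrom, pos, strand, umi, name in reads:
        # The key defines what counts as "the same molecule".
        key = (chrom, pos, strand, umi) if use_umi else (chrom, pos, strand)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


reads = [
    ("chr1", 100, "+", "ACGTA", "r1"),
    ("chr1", 100, "+", "ACGTA", "r2"),  # same coordinates, same UMI: PCR duplicate
    ("chr1", 100, "+", "TTGCA", "r3"),  # same coordinates, different UMI: distinct molecule
    ("chr2", 500, "-", "ACGTA", "r4"),
]

with_umi = dedup_reads(reads)                  # keeps r1, r3, r4
coords_only = dedup_reads(reads, use_umi=False)  # keeps r1, r4 (r3 is lost)
```

In this toy input, coordinate-only deduplication discards `r3` even though it comes from a different molecule, which is the kind of loss that UMIs are meant to prevent.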
Keywords: PCR cycle; PCR duplicates; RNA-seq; Ribognome; Sequencing depth; Small RNA-seq; Starting material; Transcriptome; UMI; Unique molecular identifier
Year: 2018 PMID: 30001700 PMCID: PMC6044086 DOI: 10.1186/s12864-018-4933-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 UMI incorporation into RNA-seq. a Overall workflow. Schematic of a read produced from RNA-seq with UMIs (b) and of UMI locators (c)
Fig. 2 UMI incorporation into small RNA-seq. a Overall workflow. The method uses a 3′ adapter composed of DNA, except for a single, 5′ ribonucleotide (rA); the 5′ adapter is entirely RNA. A standard index barcode allows multiplexing. b Schematic of a read produced from small RNA-seq with UMIs
Fig. 3 Identifying PCR duplicates. a Strategy for correcting errors in UMIs. b Illustration of how correcting errors in UMIs increases accuracy of PCR duplicate elimination
Fig. 4 Simulation of PCR duplicate removal with or without error correction for UMIs. One parameter (PCR cycle number, starting material, or sequencing depth) was varied with the other parameters kept constant. Upper plots show the fraction of duplicates, while lower plots show the accuracy of duplicate detection. Each dotted line indicates the value for this parameter used in other simulations
Fig. 5 a Transcript abundance (FPKM) calculated by removing PCR duplicates using only mapping coordinates compared to using mapping coordinates and UMIs. b Using only mapping coordinates significantly biases against abundant and short genes. Outliers omitted. Wilcoxon rank sum test; n, number of genes in each group. c Relationship between cumulative coefficient of variation and transcript abundance
Fig. 6 Fraction of PCR duplicates across genes for (a) a series of UMI RNA-seq and small RNA-seq libraries made with different amounts of starting material, and (b) a series of UMI small RNA-seq libraries all made with 5 μg of total mouse testis RNA and with an increasing number of PCR cycles