Zsolt Balázs, Dóra Tombácz, Zsolt Csabai, Norbert Moldován, Michael Snyder, Zsolt Boldogkői.
Abstract
BACKGROUND: Alternative polyadenylation is commonly examined using cDNA sequencing, which is known to be affected by template-switching artifacts. However, the effects of such template-switching artifacts on alternative polyadenylation are generally disregarded, while alternative polyadenylation artifacts are attributed to internal priming.
Keywords: Direct RNA sequencing; Internal priming; Long-read sequencing; Polyadenylation; RNA sequencing; Template switching; cDNA sequencing
Year: 2019 PMID: 31703623 PMCID: PMC6839120 DOI: 10.1186/s12864-019-6199-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 The mechanisms of internal priming and template switching. (a) Internal priming occurs due to the annealing of a primer to an A-rich region. A-rich regions are typically defined as genomic loci with six or more consecutive As or 12 As out of 20 nucleotides. (b) Template-switching artifacts are produced when the polymerase dislocates during elongation and reinitiates at a homologous sequence of another template.
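The A-rich definition given in the Fig. 1 caption (six or more consecutive As, or 12 As within 20 nucleotides) can be written as a short check. The following is a minimal sketch, not the authors' filtering code: the function name `is_a_rich`, its parameter names, and the choice to scan every 20-nt window of whatever sequence is passed in are assumptions made for illustration. Which strand and which region around a potential pA site should be inspected is not specified in the caption.

```python
def is_a_rich(seq: str, run_len: int = 6, window: int = 20, min_as: int = 12) -> bool:
    """Return True if seq contains an A-rich stretch.

    Hypothetical helper: A-rich means at least `run_len` consecutive As,
    or at least `min_as` As inside any window of `window` nucleotides.
    """
    seq = seq.upper()
    if not seq:
        return False

    # Criterion 1: a homopolymer run of As.
    if "A" * run_len in seq:
        return True

    # Criterion 2: sliding-window count of As.
    # For sequences shorter than the window, the whole sequence is one window.
    w = min(window, len(seq))
    count = seq[:w].count("A")
    if count >= min_as:
        return True
    for i in range(w, len(seq)):
        # Slide the window by one nucleotide: add the new base, drop the old one.
        count += (seq[i] == "A") - (seq[i - w] == "A")
        if count >= min_as:
            return True
    return False
```

A sequence such as `CGAAAAAAGT` is flagged by the first criterion, whereas a sequence with many short, interrupted A runs can only be flagged by the windowed count.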
Fig. 2 Comparison of cDNA and dRNA sequencing results of potential poly(A) sites supported by more than 10 reads. (a) The proportion of potential pA sites supported by dRNA sequencing for the HCMV (purple, n = 181) and human (orange, n = 30,139) datasets. (b) Performance of the different filtering methods. The left side shows the positive predictive value of the internal-priming (IP, red) and template-switching (TS, blue) filters based on the dRNA sequencing results (positive predictive value ~ kept sites which are also detected by dRNA sequencing). Potential human pA sites were filtered using SQANTI (yellow) and also based on whether or not they occurred in PolyA_DB (green). The right side of the panel shows the proportion of potential pA sites filtered out by the different filtering options that are not supported by dRNA sequencing (~ negative predictive value). (c) Barplot of the number of potential pA sites and regions with different adenine content in the HCMV (left) and human (right) datasets. The features that the filtering algorithm characterized as TES are marked in blue, whereas putative artifacts are marked in red. (d) The positive predictive value of the different filtering methods is shown as a function of adenine content. The HCMV results are not detailed because the low number of TESs contained in the dataset cannot provide for a meaningful analysis.
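The two performance measures described in panel (b) can be illustrated with a small calculation. This is a sketch under stated assumptions: the function name `filter_performance` and the input layout (one `(kept, in_drna)` pair per potential pA site) are hypothetical, and the usage values below are toy data, not results from the study.

```python
from typing import Iterable, Optional, Tuple


def filter_performance(
    sites: Iterable[Tuple[bool, bool]]
) -> Tuple[Optional[float], Optional[float]]:
    """Compute (PPV, NPV) for a filter, using dRNA support as the reference.

    Each item is (kept, in_drna): whether the filter kept the site, and
    whether dRNA sequencing also detected it.
    PPV = kept sites supported by dRNA / all kept sites.
    NPV = removed sites NOT supported by dRNA / all removed sites.
    A value is None when its denominator is zero.
    """
    kept_total = kept_supported = 0
    removed_total = removed_unsupported = 0
    for kept, in_drna in sites:
        if kept:
            kept_total += 1
            kept_supported += in_drna
        else:
            removed_total += 1
            removed_unsupported += not in_drna
    ppv = kept_supported / kept_total if kept_total else None
    npv = removed_unsupported / removed_total if removed_total else None
    return ppv, npv


# Toy example (made-up values): three kept sites, two removed sites.
toy_sites = [(True, True), (True, True), (True, False), (False, False), (False, True)]
toy_ppv, toy_npv = filter_performance(toy_sites)
```

Treating dRNA detection as the reference follows the caption's wording; a site missed by dRNA sequencing for coverage reasons would count against the filter in this scheme.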
Fig. 3 Putative template-switching artifacts differ from putative transcriptional end sites. (a) The nucleotide composition of the regions surrounding (±50 nt) putative TESs and putative template-switching artifacts in the HCMV dataset (above) and the human dataset (below). Common polyadenylation motifs are marked on the top of the panel. Zero denotes the location of potential pA sites. (b) Polyadenylation signals detected upstream of TESs (blue) and putative artifactual pA sites (red). Data for human PAS usage taken from reference [26] are shown in purple. (c) Density plot of the distance between the detected PASs and potential pA sites at positions characterized as TESs (blue) and at positions characterized as artifactual sites (red). (d) Heatmap showing the proportion of reads ending at a given nucleotide in the vicinity (±10 nt) of a potential pA site. The values of all high-confidence (supported by > 10 reads) potential pA sites are averaged. Darker colors mean that a higher proportion of alignments ended at a given position. The separate cDNA sequencing experiments from the HCMV dataset are shown separately. (e) Poly(A) tail length distributions measured by cDNA sequencing at TESs (above) and at artifactual sites (below). The medians are shown as vertical lines. Apart from the median values, which may be somewhat shifted due to A-rich regions, it is important to note that long poly(A) tails (> 40 nucleotides) are just as prevalent in the artifactual group as in the genuine group.