Allison J Taggart [1], William G Fairbrother [1,2]
1. Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI 02912, USA.
2. Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA.
Abstract
BACKGROUND: Most intronic lariats are rapidly turned over after splicing. However, new research suggests that some introns may have additional post-splicing functions. Current bioinformatics methods used to identify lariats require a sequencing read that traverses the lariat branchpoint. This method provides precise branchpoint sequence and position information, but is limited in its ability to quantify abundance of stabilized lariat species in a given RNAseq sample. Bioinformatic tools are needed to better address these emerging biological questions.

METHODS: We used an unsupervised machine learning approach on sequencing reads from publicly available ENCODE data to learn to identify and quantify lariats based on RNAseq read coverage shape.

RESULTS: We developed ShapeShifter, a novel approach for identifying and quantifying stable lariat species in RNAseq datasets. We learned a characteristic "lariat" curve from ENCODE RNAseq data and were able to estimate abundances for introns based on read coverage. Using this method we discovered new stable introns in these samples that were not represented using the older, branchpoint-traversing read method.

CONCLUSIONS: ShapeShifter provides a robust approach towards detecting and quantifying stable lariat species.
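The idea of scoring an intron by the shape of its read coverage can be illustrated with a small sketch. This is not the ShapeShifter algorithm, which the abstract does not specify; it is a deliberately simple stand-in that assumes per-base coverage is available as a list of numbers, rescales each intron to a fixed number of bins, averages the normalized profiles into a template, and scores a candidate by Pearson correlation with that template. All function names, the bin count, and the toy coverage values are hypothetical.

```python
from statistics import mean


def resample(coverage, bins=5):
    """Average per-base coverage into a fixed number of bins along the intron."""
    n = len(coverage)
    out = []
    for b in range(bins):
        start = b * n // bins
        end = max((b + 1) * n // bins, start + 1)  # keep every bin non-empty
        out.append(mean(coverage[start:end]))
    return out


def normalize(profile):
    """Scale a profile to sum to 1 so only its shape matters, not its depth."""
    total = sum(profile)
    if total <= 0:
        return [0.0] * len(profile)
    return [x / total for x in profile]


def learn_template(coverages, bins=5):
    """Assumed stand-in for the learning step: mean of normalized profiles."""
    profiles = [normalize(resample(c, bins)) for c in coverages]
    return [mean(column) for column in zip(*profiles)]


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def shape_score(coverage, template):
    """Score how closely one intron's coverage shape follows the template."""
    profile = normalize(resample(coverage, bins=len(template)))
    return pearson(profile, template)


# Toy, invented coverage vectors (not real ENCODE data).
training = [
    [9, 9, 7, 7, 5, 5, 3, 3, 1, 1],
    [18, 18, 14, 14, 10, 10, 6, 6, 2, 2],
]
template = learn_template(training)

sloped = shape_score([8, 8, 6, 6, 4, 4, 2, 2, 0, 0], template)
flat = shape_score([5] * 10, template)
print(sloped, flat)
```

Normalizing each profile before averaging separates shape from sequencing depth, which is why the two training introns above contribute the same profile despite a twofold difference in coverage. A real analysis would need a principled way to turn such a score into an abundance estimate, which this sketch does not attempt.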