Heng Li (1, 2)
1. Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
2. Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Abstract
SUMMARY: Human alpha satellite and satellite 2/3 make up several percent of the human genome. However, identifying these sequences with traditional algorithms is computationally intensive. Here we develop dna-brnn, a recurrent neural network that learns to recognize the sequences of these two classes of centromeric repeats. Its annotations closely agree with those of RepeatMasker, and it is times faster. Dna-brnn explores a novel application of deep learning and may accelerate the study of the evolution of the two repeat classes.

AVAILABILITY AND IMPLEMENTATION: https://github.com/lh3/dna-nn.
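The abstract does not say how DNA is presented to the network or how its output is reported. What follows is a minimal sketch that assumes a per-base classification setup. Each nucleotide gets one label: 0 for non-repeat, and 1 and 2 for the two repeat classes. That label assignment is chosen here only for illustration. Under this setup, a DNA string can be one-hot encoded as network input, and per-base predictions can be merged into intervals comparable with RepeatMasker annotations. The function names `one_hot` and `labels_to_intervals` are hypothetical and are not taken from dna-brnn.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed alphabet order for this sketch; any fixed order works as long as
# training and inference use the same one.
_BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as one 4-element 0/1 vector per base.

    Ambiguous bases such as N become all-zero vectors (an assumption of
    this sketch, not necessarily what dna-brnn does).
    """
    rows = []
    for ch in seq.upper():
        vec = [0, 0, 0, 0]
        idx = _BASES.find(ch)
        if idx >= 0:
            vec[idx] = 1
        rows.append(vec)
    return rows


def labels_to_intervals(labels: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Merge runs of identical nonzero per-base labels into half-open
    (start, end, label) intervals; label 0 means "not a repeat"."""
    intervals = []
    run_start: int = 0
    run_label: Optional[int] = None
    # A trailing None sentinel forces the final run to be flushed.
    for i, lab in enumerate(list(labels) + [None]):
        if lab != run_label:
            if run_label:  # skip runs labeled 0 (and the initial None)
                intervals.append((run_start, i, run_label))
            run_start, run_label = i, lab
    return intervals
```

For example, `labels_to_intervals([0, 1, 1, 0, 2, 2, 2])` returns `[(1, 3, 1), (4, 7, 2)]`. That is one interval of class 1 covering bases 1 and 2, and one interval of class 2 covering bases 4 to 6.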