TERL: classification of transposable elements by convolutional neural networks.

Murilo Horacio Pereira da Cruz, Douglas Silva Domingues, Priscila Tiemi Maeda Saito, Alexandre Rossi Paschoal, Pedro Henrique Bugatti.

Abstract

Transposable elements (TEs) are the most represented sequences in eukaryotic genomes. Few methods classify these sequences at deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features, such as k-mers, and homology-based search, which can be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses one-dimensional sequences, transforms them into two-dimensional space data (i.e., image-like data of the sequences) and applies them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data to classify it correctly. We conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracy and F1-score of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for orders, respectively, on sequences from RepBase. We also obtained macro mean accuracy and F1-score of 95.0% and 70.6% at the superfamily level and 89.3% and 73.9% at the order level, respectively, on sequences from seven databases. TERL surpassed the accuracy, recall and specificity obtained by other methods in the experiment on order-level classification of sequences from seven databases, and was by far faster than any other method in all experiments. Therefore, TERL can learn how to predict any hierarchical level of the TE classification system and is about 20 times faster than TEclass and three orders of magnitude faster than PASTEC. TERL is available at https://github.com/muriloHoracio/TERL. Contact: murilocruz@alunos.utfpr.edu.br.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
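
The abstract describes transforming one-dimensional sequences into two-dimensional, image-like data before passing them to a convolutional network. One common way to do this is one-hot encoding, sketched below in plain Python. This is an illustration under stated assumptions, not the TERL implementation: the alphabet `ACGTN`, the zero-padding scheme, and the helper names `one_hot_encode` and `encode_batch` are all hypothetical and are not taken from the paper or its repository.

```python
# Illustrative sketch only: alphabet order, padding scheme and function names
# are assumptions, not details confirmed by the TERL paper.
ALPHABET = "ACGTN"


def one_hot_encode(sequence, length):
    """Return a len(ALPHABET) x length matrix (list of rows) for a DNA sequence.

    Sequences longer than `length` are truncated; shorter ones are padded on
    the right with all-zero columns. Symbols outside ALPHABET are mapped to 'N'.
    """
    seq = sequence.upper()[:length]
    matrix = [[0] * length for _ in ALPHABET]
    for col, symbol in enumerate(seq):
        row = ALPHABET.find(symbol)
        if row == -1:  # unknown symbol: treat it as 'N'
            row = ALPHABET.index("N")
        matrix[row][col] = 1
    return matrix


def encode_batch(sequences, length=None):
    """Encode several sequences to a common width (default: the longest one)."""
    if length is None:
        length = max(len(s) for s in sequences)
    return [one_hot_encode(s, length) for s in sequences]


if __name__ == "__main__":
    # Print one row per alphabet symbol for a short example sequence.
    for row_label, row in zip(ALPHABET, one_hot_encode("ACGTNA", 8)):
        print(row_label, row)
```

Each encoded sequence is a fixed-size binary matrix with one row per symbol and one column per position, so a batch of them can be stacked and treated like single-channel images by a convolutional layer.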

Keywords:  convolutional neural networks; deep learning; representation learning; sequence classification; transposable elements

Year:  2021        PMID: 34020551     DOI: 10.1093/bib/bbaa185

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  4 in total

1.  TransposonUltimate: software for transposon classification, annotation and detection.

Authors:  Kevin Riehl; Cristian Riccio; Eric A Miska; Martin Hemberg
Journal:  Nucleic Acids Res       Date:  2022-06-24       Impact factor: 19.160

2.  InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning.

Authors:  Simon Orozco-Arias; Paula A Jaimes; Mariana S Candamil; Cristian Felipe Jiménez-Varón; Reinel Tabares-Soto; Gustavo Isaza; Romain Guyot
Journal:  Genes (Basel)       Date:  2021-01-28       Impact factor: 4.096

3.  A chromosome-level reference genome of a Convolvulaceae species Ipomoea cairica.

Authors:  Fan Jiang; Sen Wang; Hengchao Wang; Anqi Wang; Dong Xu; Hangwei Liu; Boyuan Yang; Lihua Yuan; Lihong Lei; Rong Chen; Weihua Li; Wei Fan
Journal:  G3 (Bethesda)       Date:  2022-08-25       Impact factor: 3.542

4.  K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes.

Authors:  Simon Orozco-Arias; Mariana S Candamil-Cortés; Paula A Jaimes; Johan S Piña; Reinel Tabares-Soto; Romain Guyot; Gustavo Isaza
Journal:  PeerJ       Date:  2021-05-19       Impact factor: 2.984
