
The Helitron family classification using SVM based on Fourier transform features applied on an unbalanced dataset.

Rabeb Touati (1,2), Afef Elloumi Oueslati (3,4), Imen Messaoudi (3,5), Zied Lachiri (3)

Abstract

Helitrons are mobile sequences that belong to class 2 of eukaryotic transposons. Their specificity lies in their transposition mechanism: the rolling-circle mechanism. They play an important role in remodeling proteomes through their ability to modify existing genes and introduce new ones. A major difficulty in identifying and classifying Helitron families comes from their complex structure, their unspecified length, and the unbalanced number of occurrences of each Helitron type. Helitron recognition remains an open problem in the literature. The purpose of this paper is to characterize and classify Helitron types using spectral features and the support vector machine (SVM) classification technique. The helitronic DNA is first transformed into numerical form using the FCGS2 coding technique. Then, a set of spectral features is extracted from the smoothed Fourier transform of the FCGS2 signals. Based on the spectral signatures and the classification confusion matrix, we demonstrate that classes which do not share similarities with others, such as HelitronY2 and NDNAX3, are easily discriminated, with accuracy rates exceeding 90%. Other Helitron types are highly similar to one another, namely Helitron1, HelitronY1, HelitronY1A, and HelitronY4; our system is still able to predict them with promising accuracy rates reaching 70%.

Graphical abstract: The Helitron recognizer based on features extracted from the smoothed Fourier transform.
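
The FCGS2 coding step can be illustrated with a minimal Python sketch. It assumes the usual definition of a frequency chaos game signal of order k, in which each overlapping k-mer of the sequence is replaced by its relative frequency in that same sequence (the value of the matching cell in an order-k frequency chaos game representation). The function name `fcgs` and the handling of non-ACGT symbols are illustrative choices, not taken from the paper, and the authors' exact normalization may differ.

```python
from collections import Counter


def fcgs(seq: str, k: int = 2) -> list[float]:
    """Frequency chaos game signal of order k (sketch; k=2 gives FCGS2).

    Assumption: every overlapping k-mer is replaced by its relative
    occurrence frequency among the valid k-mers of the same sequence.
    k-mers containing symbols other than A, C, G, T are mapped to 0.0.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    valid = [m for m in kmers if set(m) <= set("ACGT")]
    counts = Counter(valid)  # occurrences of each valid k-mer
    total = len(valid)
    # A k-mer present in `counts` implies total > 0, so no division by zero.
    return [counts[m] / total if m in counts else 0.0 for m in kmers]
```

For example, `fcgs("ACGACG")` yields `[0.4, 0.4, 0.2, 0.4, 0.4]`, since AC and CG each occur twice among the five dinucleotides and GA once. The signal length grows with the sequence length, so Helitrons of different lengths produce signals of different lengths.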
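
Spectral feature extraction from a numeric DNA signal can be sketched as follows, under assumptions that are ours rather than the paper's: the mean is subtracted, the magnitude of the discrete Fourier transform is smoothed with a moving-average window, and the smoothed half-spectrum is interpolated onto a fixed frequency grid. This last step maps sequences of unspecified length to feature vectors of equal size, as an SVM requires. The function name `smoothed_spectrum_features` and its parameters `window` and `n_bins` are hypothetical.

```python
import numpy as np


def smoothed_spectrum_features(signal, window=5, n_bins=64):
    """Fixed-length feature vector from the smoothed magnitude spectrum.

    Assumed pipeline (illustrative, not the paper's exact recipe):
    remove the mean, take |rfft|, smooth it with a moving average,
    resample it onto `n_bins` evenly spaced frequencies in [0, 0.5]
    cycles per base, and scale the result to unit Euclidean norm.
    """
    x = np.asarray(signal, dtype=float)
    if x.size < 2:
        raise ValueError("signal needs at least two samples")
    x = x - x.mean()  # drop the DC component so it does not dominate

    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size)  # cycles per base, from 0 up to <= 0.5

    w = max(1, min(window, magnitude.size))  # keeps the output length unchanged
    smoothed = np.convolve(magnitude, np.ones(w) / w, mode="same")

    grid = np.linspace(0.0, 0.5, n_bins)
    features = np.interp(grid, freqs, smoothed)  # same size for any input length
    norm = np.linalg.norm(features)
    return features / norm if norm > 0 else features
```

The input can be any numeric DNA signal, such as an FCGS2 signal. A moving average is only one possible smoother; the exact smoothing and feature set used by the authors are not reproduced here.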
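
Because the dataset is unbalanced, overall accuracy is dominated by the most frequent Helitron types, so results are better read per class from the confusion matrix, as the abstract does. The sketch below computes a confusion matrix, the per-class recognition rate (correct predictions divided by the number of test samples of that class), and their mean (balanced accuracy). The classifier itself is left out: one common choice would be scikit-learn's `sklearn.svm.SVC`, whose `class_weight="balanced"` option reweights classes inversely to their frequency, but the abstract does not say which SVM implementation or imbalance handling the authors used. The function name `confusion_and_class_rates` is hypothetical.

```python
import numpy as np


def confusion_and_class_rates(y_true, y_pred, labels):
    """Return (confusion matrix, per-class rates, balanced accuracy).

    Rows of the matrix are true classes and columns are predicted classes,
    both in the order given by `labels`. A class with no test samples gets
    a rate of NaN and is ignored by the balanced accuracy.
    """
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1

    support = cm.sum(axis=1)  # number of true samples per class
    with np.errstate(invalid="ignore", divide="ignore"):
        rates = np.diag(cm) / support  # recall of each class
    balanced = float(np.nanmean(rates))
    return cm, rates, balanced
```

Classes that are confused with one another, as the abstract reports for Helitron1, HelitronY1, HelitronY1A, and HelitronY4, show up directly as off-diagonal counts in `cm` and as lower entries in `rates`.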

Keywords:  C. elegans; FCGS2 coding; Helitrons; SVM classification; Smoothed Fourier transform; Spectral signature

Year: 2019; PMID: 31422557; DOI: 10.1007/s11517-019-02027-5

Source DB: PubMed; Journal: Med Biol Eng Comput; ISSN: 0140-0118; Impact factor: 2.602

