A new DNA sequence entropy-based Kullback-Leibler algorithm for gene clustering.

Houshang Dehghanzadeh, Mostafa Ghaderi-Zefrehei, Seyed Ziaeddin Mirhoseini, Saeid Esmaeilkhaniyan, Ishaku Lemu Haruna, Hamed Amirpour Najafabadi.

Abstract

Information theory is a branch of mathematics that overlaps with communications, biology, and medical engineering. Entropy measures the uncertainty in a set of information. In this study, entropy was calculated at orders one to four for each gene and its set of exons. Kullback-Leibler divergence was then calculated from the relative entropy of the genes and exons. The resulting Kullback-Leibler distances for the gene and exon sets were used as input to seven clustering algorithms: single, complete, average, weighted, centroid, median, and K-means. The AdaBoost algorithm was used to aggregate the clustering results. Finally, the AdaBoost results were examined with the GeneMANIA prediction server to interpret them from a gene annotation point of view. All calculations were performed using the MATLAB Engineering Software (2015). Examination of the genes' metabolic pathways based on gene annotations showed that the proposed clustering method yielded correct, logical, and fast results. The method avoids the disadvantages of alignment while allowing genes to be considered with their actual length and content, and it does not require large amounts of memory for long sequences. We believe the proposed method could be used alongside other competitive gene clustering methods to group biologically relevant sets of genes. The proposed method can also be seen as a predictive method for genes with weak genomic annotations.
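
The entropy and Kullback-Leibler steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation. It assumes that "order k" entropy means the Shannon entropy of overlapping k-mer frequencies, that a pseudocount is added so the divergence stays finite, and that the distance is the symmetrised sum of the two directed divergences. The function names (kmer_distribution, entropy, kl_divergence, symmetric_kl) and the pseudocount parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_distribution(seq, k, pseudocount=1.0):
    """Relative frequencies of all 4**k overlapping k-mers in seq.

    The pseudocount is an assumption of this sketch: it keeps every
    probability positive so the KL divergence below is always finite.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def entropy(dist):
    """Shannon entropy in bits of a k-mer distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q):
    """Directed divergence D(p || q) in bits; q must be positive where p is."""
    return sum(p[m] * math.log2(p[m] / q[m]) for m in p if p[m] > 0)


def symmetric_kl(seq_a, seq_b, k):
    """Symmetrised KL distance between two sequences at order k (assumed form)."""
    p = kmer_distribution(seq_a, k)
    q = kmer_distribution(seq_b, k)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    gene_a = "ACGTACGTACGTTTGACA"  # toy sequences, not real genes
    gene_b = "AAAATTTTAAAATTTTAA"
    for order in range(1, 5):  # orders one to four, as in the abstract
        h = entropy(kmer_distribution(gene_a, order))
        d = symmetric_kl(gene_a, gene_b, order)
        print(order, round(h, 3), round(d, 3))
```

A matrix of such pairwise distances could then be passed to a hierarchical or K-means clustering routine. No alignment is needed because each sequence is reduced to a fixed-size frequency table regardless of its length.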

Keywords:  Dairy cattle; Gene clustering; Information theory; Kullback-Leibler divergence

Year:  2020        PMID: 31981184     DOI: 10.1007/s13353-020-00543-x

Source DB:  PubMed          Journal:  J Appl Genet        ISSN: 1234-1983            Impact factor:   3.240


  7 in total

1.  Prediction of total genetic value using genome-wide dense marker maps.

Authors:  T H Meuwissen; B J Hayes; M E Goddard
Journal:  Genetics       Date:  2001-04       Impact factor: 4.562

2.  A study of performance on microarray data sets for a classifier based on information theoretic learning.

Authors:  Iago Porto-Díaz; Verónica Bolón-Canedo; Amparo Alonso-Betanzos; Oscar Fontenla-Romero
Journal:  Neural Netw       Date:  2011-06-12

3.  The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function.

Authors:  David Warde-Farley; Sylva L Donaldson; Ovi Comes; Khalid Zuberi; Rashad Badrawi; Pauline Chao; Max Franz; Chris Grouios; Farzana Kazi; Christian Tannus Lopes; Anson Maitland; Sara Mostafavi; Jason Montojo; Quentin Shao; George Wright; Gary D Bader; Quaid Morris
Journal:  Nucleic Acids Res       Date:  2010-07       Impact factor: 16.971

4.  An entropy test for single-locus genetic association analysis.

Authors:  Manuel Ruiz-Marín; Mariano Matilla-García; José Antonio García Cordoba; Juan Luis Susillo-González; Alejandro Romo-Astorga; Antonio González-Pérez; Agustín Ruiz; Javier Gayán
Journal:  BMC Genet       Date:  2010-03-23       Impact factor: 2.797

5.  Information theory applications for biological sequence analysis. (Review)

Authors:  Susana Vinga
Journal:  Brief Bioinform       Date:  2013-09-20       Impact factor: 11.622

6.  The bovine lactation genome: insights into the evolution of mammalian milk.

Authors:  Danielle G Lemay; David J Lynn; William F Martin; Margaret C Neville; Theresa M Casey; Gonzalo Rincon; Evgenia V Kriventseva; Wesley C Barris; Angie S Hinrichs; Adrian J Molenaar; Katherine S Pollard; Nauman J Maqbool; Kuljeet Singh; Regan Murney; Evgeny M Zdobnov; Ross L Tellam; Juan F Medrano; J Bruce German; Monique Rijnkels
Journal:  Genome Biol       Date:  2009-04-24       Impact factor: 13.583

7.  Structural complexity of DNA sequence.

Authors:  Cheng-Yuan Liou; Shen-Han Tseng; Wei-Chen Cheng; Huai-Ying Tsai
Journal:  Comput Math Methods Med       Date:  2013-04-04       Impact factor: 2.238
