Julien Jorda¹, Andrey V Kajava. ¹Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France. julien.jorda@crbm.cnrs.fr
Abstract
MOTIVATION: Over the last years, evidence has accumulated that tandem repeats occur with high incidence in proteins that carry out fundamental biological functions and are associated with a number of human diseases. At the same time, protein repeats frequently degenerate strongly during evolution and therefore cannot be easily identified. To address this problem, several computer programs based on different algorithms have been developed. Nevertheless, our tests showed that there is still room for improvement in methods for accurate and rapid detection of tandem repeats in proteins. RESULTS: We developed a new program, T-REKS, for ab initio identification of tandem repeats. It is based on clustering the lengths between identical short strings with a K-means algorithm. A benchmark of existing programs and T-REKS on several sequence datasets is presented. Linked to the Protein Repeat DataBase, our program opens the way for large-scale analysis of protein tandem repeats. T-REKS can also be applied to nucleotide sequences. AVAILABILITY: The algorithm has been implemented in Java; the program is available upon request at http://bioinfo.montp.cnrs.fr/?r=t-reks. The Protein Repeat DataBase generated with T-REKS is accessible at http://bioinfo.montp.cnrs.fr/?r=repeatDB.
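The core idea named in RESULTS, clustering the lengths between identical short strings with K-means, can be illustrated with a minimal Python sketch. This is not the T-REKS implementation: the function names, the word length `k`, the number of clusters, and the evenly spaced initial centers are all assumptions chosen for illustration, and the sketch only estimates a dominant repeat period rather than delineating or aligning repeat units.

```python
def kmer_gaps(seq, k=2):
    """Distances between consecutive occurrences of each identical k-length string."""
    last_seen = {}
    gaps = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps.append(i - last_seen[word])
        last_seen[word] = i
    return gaps


def kmeans_1d(values, n_clusters=2, n_iter=50):
    """Plain Lloyd's K-means on scalars (illustrative; centers start evenly spaced)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All distances are identical: a single cluster suffices.
        return [float(lo)], [list(values)]
    step = (hi - lo) / max(n_clusters - 1, 1)
    centers = [lo + step * j for j in range(n_clusters)]
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def dominant_period(seq, k=2, n_clusters=2):
    """Center of the most populated distance cluster, as a candidate repeat length."""
    gaps = kmer_gaps(seq, k)
    if not gaps:
        return None
    centers, clusters = kmeans_1d(gaps, n_clusters)
    best = max(range(len(centers)), key=lambda j: len(clusters[j]))
    return round(centers[best])


print(dominant_period("ACDEFG" * 4))
```

For a perfect repeat such as `"ACDEFG" * 4`, every recurring two-letter word reappears at the same distance, so all distances fall into one cluster whose center is the repeat length. Degenerate repeats spread the distances out, which is where the clustering step becomes useful.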