KID--an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes.

Stephanie Heinen, Bernhard Thielen, Dietmar Schomburg.

Abstract

BACKGROUND: The amount of available biological information is increasing rapidly, and the focus of biological research has moved from single components to networks and to even larger projects aiming at the analysis, modelling and simulation of biological networks, as well as the large-scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured form, so methods for the systematic extraction of knowledge directly from the primary literature have to be deployed.
DESCRIPTION: Here we present a text mining algorithm for the extraction of kinetic information such as K(M), K(i), k(cat) etc., as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (K(M), K(i), k(cat), k(cat)/K(M), V(max), IC(50), S(0.5), K(d), K(a), t(1/2), pI, n(H), specific activity, V(max)/K(M)) from about 17 million PubMed abstracts and to combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet.
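As an illustration of what a rule- and dictionary-based extraction of this kind can look like, the following Python sketch pairs a parameter name with a nearby number and unit, picks up an EC number, a pH and an organism from a toy dictionary, and computes precision and recall against a manually verified set. It is a minimal sketch under stated assumptions and not the KID algorithm itself: the regular expressions, the unit list, the organism dictionary and all function names (extract_parameters, extract_context, precision_recall) are hypothetical, and the example sentence is invented.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    parameter: str
    value: float
    unit: str


# Illustrative rules only (assumptions); the published rules and
# dictionaries are not reproduced in the abstract.
NUMBER = r"\d+(?:\.\d+)?"
UNIT = r"(?:mM|µM|nM|M|s-1|min-1)"
PARAMETER_NAMES = {
    "K(M)": r"K\(?[Mm]\)?",
    "K(i)": r"K\(?i\)?",
    "k(cat)": r"k\(?cat\)?",
}
ORGANISMS = ("Escherichia coli", "Saccharomyces cerevisiae")  # toy dictionary


def extract_parameters(text):
    """Return parameter/value/unit triples found by the toy rules."""
    hits = []
    for label, name in PARAMETER_NAMES.items():
        # Name, then up to 40 characters without a digit or full stop,
        # then a number followed by a known unit.
        rule = re.compile(
            rf"\b{name}(?![A-Za-z])[^.\d]{{0,40}}?"
            rf"(?P<value>{NUMBER})\s*(?P<unit>{UNIT})(?![A-Za-z0-9])"
        )
        for m in rule.finditer(text):
            hits.append(Hit(label, float(m.group("value")), m.group("unit")))
    return hits


def extract_context(text):
    """Collect associated information: EC number, pH, organisms."""
    ec = re.search(r"\bEC\s*(\d+\.\d+\.\d+\.\d+)", text)
    ph = re.search(r"\bpH\s*(\d+(?:\.\d+)?)", text)
    return {
        "ec": ec.group(1) if ec else None,
        "pH": float(ph.group(1)) if ph else None,
        "organisms": [name for name in ORGANISMS if name in text],
    }


def precision_recall(extracted, gold):
    """Precision and recall of extracted items against a verified set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = (
        "The enzyme from Escherichia coli (EC 1.1.1.1) showed a K(M) of "
        "0.25 mM for ethanol and a k(cat) of 12 s-1 at pH 7.5."
    )
    print(extract_parameters(sentence))
    print(extract_context(sentence))
```

Precision here is the fraction of extracted items that are correct, and recall is the fraction of the manually verified items that were found, which is the sense in which the per-category ranges above are reported.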
CONCLUSIONS: The presented algorithm delivers a considerable amount of information and may therefore help to accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is based entirely on text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and is available on request from the author.

Year:  2010        PMID: 20626859      PMCID: PMC2912889          DOI: 10.1186/1471-2105-11-375

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  6 in total

1.  Discrete derivative: a data slicing algorithm for exploration of sharing biological networks between rheumatoid arthritis and coronary heart disease.

Authors:  Guang Zheng; Xiaojuan He; Aiping Lu; Miao Jiang; Jing Zhao; Hongtao Guo; Gao Chen; Qinglin Zha
Journal:  BioData Min       Date:  2011-06-23       Impact factor: 2.522

2.  Semantic annotation of biological concepts interplaying microbial cellular responses.

Authors:  Rafael Carreira; Sónia Carneiro; Rui Pereira; Miguel Rocha; Isabel Rocha; Eugénio C Ferreira; Anália Lourenço
Journal:  BMC Bioinformatics       Date:  2011-11-28       Impact factor: 3.169

3.  Automated extraction and semantic analysis of mutation impacts from the biomedical literature.

Authors:  Nona Naderi; René Witte
Journal:  BMC Genomics       Date:  2012-06-18       Impact factor: 3.969

4.  Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies.

Authors:  Jia-Fu Chang; Mihail Popescu; Gerald L Arthur
Journal:  J Pathol Inform       Date:  2013-07-31

5.  BRENDA in 2013: integrated reactions, kinetic data, enzyme function data, improved disease classification: new options and contents in BRENDA.

Authors:  Ida Schomburg; Antje Chang; Sandra Placzek; Carola Söhngen; Michael Rother; Maren Lang; Cornelia Munaretto; Susanne Ulas; Michael Stelzer; Andreas Grote; Maurice Scheer; Dietmar Schomburg
Journal:  Nucleic Acids Res       Date:  2012-11-29       Impact factor: 16.971

6.  Ion Channel ElectroPhysiology Ontology (ICEPO) - a case study of text mining assisted ontology development.

Authors:  Ravikumar Komandur Elayavilli; Hongfang Liu
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2016-07-20