Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications.

Olga A Tarasova, Nadezhda Yu Biziukova, Dmitry A Filimonov, Vladimir V Poroikov, Marc C Nicklaus.

Abstract

Large amounts of high-quality data on the biological activity of chemical compounds are required throughout the drug discovery process, from the development of computational models of structure-activity relationships to the experimental testing of lead compounds and their clinical validation. A large amount of such data is currently available from databases, scientific publications, and patents. However, biological data are characterized by incompleteness, uncertainty, and low reproducibility. Although both free and commercial databases of the biological activities of compounds exist, they usually lack unambiguous information about the details of the biological assays. Scientific papers, on the other hand, are the primary source of new data, where results are disclosed to the scientific community for the first time. In this study, we developed and validated a data-mining approach for extracting text fragments that contain descriptions of bioassays. We used this approach to evaluate compounds and their biological activities reported in scientific publications. We found that papers can be categorized as relevant or irrelevant based on machine-learning analysis of their abstracts. Text fragments extracted from the full texts of publications allow their further partitioning into several classes according to the specifics of the bioassays. We demonstrate the applicability of our approach by comparing the endpoint values of biological activity and cytotoxicity of reference compounds.
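
The abstract does not name the machine-learning method used to separate relevant from irrelevant papers. As a minimal sketch of the general idea only, the example below assumes a bag-of-words multinomial naive Bayes classifier with add-one smoothing; the class name `NaiveBayesAbstractClassifier`, the `tokenize` helper, the labels, and the toy abstracts are all hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesAbstractClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative assumption,
    not the method confirmed by the abstract)."""

    def fit(self, abstracts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for priors
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(abstracts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of all tokens seen in training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy, invented training abstracts (NOT data from the study).
train_abstracts = [
    "Inhibition of HIV-1 reverse transcriptase was measured in a cell-based assay; IC50 values are reported.",
    "Cytotoxicity of the compounds was evaluated in MT-4 cells and CC50 values were determined.",
    "We review the history of antiretroviral therapy guidelines and policy.",
    "A survey of patient adherence to treatment in outpatient clinics.",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

model = NaiveBayesAbstractClassifier().fit(train_abstracts, train_labels)
prediction = model.predict(
    "IC50 values of the inhibitors were measured in an enzymatic assay."
)
print(prediction)
```

In practice such a classifier would be trained on a manually labeled set of abstracts far larger than this toy example, and evaluated on held-out papers before being used to filter publications for full-text processing.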

Year:  2019        PMID: 31453694      PMCID: PMC8194363          DOI: 10.1021/acs.jcim.9b00164

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956
