Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications.

Olga A Tarasova¹, Nadezhda Yu Biziukova¹, Dmitry A Filimonov¹, Vladimir V Poroikov¹, Marc C Nicklaus².

Abstract

Large amounts of high-quality data on the biological activity of chemical compounds are required throughout the drug discovery process: from the development of computational models of structure-activity relationships to the experimental testing of lead compounds and their validation in the clinic. Currently, a large amount of such data is available from databases, scientific publications, and patents. Biological data are characterized by incompleteness, uncertainty, and low reproducibility. Although free and commercial databases of the biological activities of compounds exist, they usually lack unambiguous information about the particulars of the biological assays. Scientific papers, on the other hand, are the primary source of new data disclosed to the scientific community for the first time. In this study, we developed and validated a data-mining approach for extracting text fragments that contain descriptions of bioassays. We used this approach to evaluate compounds and their biological activity reported in scientific publications. We found that papers can be categorized as relevant or irrelevant based on machine-learning analysis of their abstracts. Text fragments extracted from the full texts of publications can then be partitioned into several classes according to the particulars of the bioassays. We demonstrate the applicability of our approach by comparing the endpoint values of biological activity and cytotoxicity for reference compounds.

Year: 2019 PMID: 31453694 PMCID: PMC8194363 DOI: 10.1021/acs.jcim.9b00164

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
