Harvesting chemical information from the Internet using a distributed approach: ChemXtreme.

M Karthikeyan, S Krishnan, Anil Kumar Pandey, Andreas Bender.

Abstract

The Internet is a comprehensive but largely unstructured resource of chemical information. It provides a wealth of scientific information, such as experimental data, and meaningful exploration of it requires a suitable automated data mining and analysis tool. ChemXtreme, the Java-based software presented here, was developed for harvesting chemical information from the Internet; it employs the Google API in combination with a distributed client/server text analysis architecture based on Java RMI. It is the first, and until now the only, open-source toolkit for automated structured data retrieval from the Internet. ChemXtreme employs a "search the search engine" strategy, in which the URLs returned by the search engine are analyzed further via textual pattern analysis. This process resembles manual analysis of a hit list, where relevant data are captured and, through human intervention, converted into a format suitable for further analysis. ChemXtreme, in contrast, automatically transforms chemical information into a structured format suitable for storage in databases and further analysis, and it also provides links to the original information source. The query data retrieved from the search engine by the server are encoded, encrypted, and compressed and then sent to all participating active clients in the network for parsing. Relevant information identified by the clients on the retrieved Web sites is sent back to the server, verified, and added to the database for data mining and further analysis. Distributing the analysis of URLs across a client/server architecture scales very favorably, producing only minimal overhead.
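
The two technical steps named in the abstract, packaging a batch of URLs for the clients and running textual pattern analysis on page text, can be illustrated with a minimal Java sketch. It is not ChemXtreme's actual code: the class name HarvestSketch, the methods pack, unpack, and extractMeltingPoints, and the melting-point regular expression are all hypothetical. The sketch assumes GZIP plus Base64 for the "compressed and encoded" payload, omits the encryption step, and does no networking or RMI.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: names and the pattern are assumptions, not taken from ChemXtreme.
class HarvestSketch {

    // Hypothetical pattern: a melting point such as "mp 122.4 C" or "melting point: 80 C".
    private static final Pattern MELTING_POINT = Pattern.compile(
            "(?i)\\b(?:m\\.?p\\.?|melting point)\\s*[:=]?\\s*(-?\\d+(?:\\.\\d+)?)\\s*\\u00B0?\\s*C\\b");

    // Server side: gzip a newline-separated URL batch, then Base64-encode it.
    static String pack(List<String> urls) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(String.join("\n", urls).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Client side: reverse of pack(), returning the original list of URLs.
    static List<String> unpack(String payload) {
        byte[] compressed = Base64.getDecoder().decode(payload);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            String text = new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
            List<String> urls = new ArrayList<>();
            if (!text.isEmpty()) {
                urls.addAll(Arrays.asList(text.split("\n")));
            }
            return urls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client side: collect every melting-point value (degrees Celsius) found in the page text.
    static List<Double> extractMeltingPoints(String pageText) {
        List<Double> values = new ArrayList<>();
        Matcher matcher = MELTING_POINT.matcher(pageText);
        while (matcher.find()) {
            values.add(Double.parseDouble(matcher.group(1)));
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> batch = unpack(pack(List.of("http://example.org/a", "http://example.org/b")));
        System.out.println(batch);
        System.out.println(extractMeltingPoints(
                "Benzoic acid, mp 122.4 C; naphthalene melting point: 80 C."));
    }
}
```

In a distributed setting, the server would call something like pack once per batch and each client would call unpack, fetch the pages, and return the extracted values together with the source URL, so that the link to the original information source is preserved.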

Year:  2006        PMID: 16562972     DOI: 10.1021/ci050329+

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  3 in total

1.  Translational integrity and continuity: personalized biomedical data integration.

Authors:  Xiaoming Wang; Lili Liu; James Fackenthal; Shelly Cummings; Maggie Cook; Kisha Hope; Jonathan C Silverstein; Olufunmilayo I Olopade
Journal:  J Biomed Inform       Date:  2008-08-12       Impact factor: 6.317

2.  ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files.

Authors:  Muthukumarasamy Karthikeyan; Renu Vyas
Journal:  J Cheminform       Date:  2016-12-29       Impact factor: 5.514

3.  Userscripts for the life sciences.

Authors:  Egon L Willighagen; Noel M O'Boyle; Harini Gopalakrishnan; Dazhi Jiao; Rajarshi Guha; Christoph Steinbeck; David J Wild
Journal:  BMC Bioinformatics       Date:  2007-12-21       Impact factor: 3.169
