SPECTRa-T: machine-based data extraction and semantic searching of chemistry e-theses.

Jim Downing, Matt J Harvey, Peter B Morgan, Peter Murray-Rust, Henry S Rzepa, Diana C Stewart, Alan P Tonge, Joe A Townsend.

Abstract

The SPECTRa-T project has developed text-mining tools to extract named chemical entities (NCEs), such as chemical names and terms, and chemical objects (COs), e.g., experimental spectral assignments and physical chemistry properties, from electronic theses (e-theses). Although NCEs were readily identified within the two major document formats studied, only the use of structured documents enabled identification of chemical objects and their association with the relevant chemical entity (e.g., systematic chemical name). A corpus of theses was analyzed and it is shown that a high degree of semantic information can be extracted from structured documents. This integrated information has been deposited in a persistent Resource Description Framework (RDF) triple-store that allows users to conduct semantic searches. The strengths and weaknesses of several document formats are reviewed.

Year:  2010        PMID: 20088574     DOI: 10.1021/ci9003688

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  2 in total

1.  The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot.

Authors:  David M Andrews; Laura M Broad; Paul J Edwards; David N A Fox; Timothy Gallagher; Stephen L Garland; Richard Kidd; Joseph B Sweeney
Journal:  Chem Sci       Date:  2016-02-23       Impact factor: 9.825

2.  Cheminformatics and the Semantic Web: adding value with linked data and enhanced provenance.

Authors:  Jeremy G Frey; Colin L Bird
Journal:  Wiley Interdiscip Rev Comput Mol Sci       Date:  2013-01-08