Jing Peng1, Xinyi Shi2, Yiming Sun2, Dongye Li2, Baohui Liu2, Fanjiang Kong2, Xiaohui Yuan2. 1. College of Electronic and Information, Northeast Agricultural University, Harbin, China, School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China and The Key Lab of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, China. 2. College of Electronic and Information, Northeast Agricultural University, Harbin, China, School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China and The Key Lab of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, China.
Abstract
MOTIVATION: Figures and tables in biomedical literature record vast amounts of important experimental results. In scientific papers, for example, quantitative trait locus (QTL) information is usually presented in tables. However, most popular text-mining methods focus on extracting knowledge from unstructured free text. As far as we know, there are no published works on mining tables in biomedical literature. In this article, we propose a method to extract QTL information from tables and plain text found in the literature. Heterogeneous and complex tables were converted into a structured database and combined with information extracted from plain text. Our method could greatly reduce the labor involved in database curation. RESULTS: We applied our method to the curation of a soybean QTL database, extracting 2278 records from 228 papers with a precision of 96.9% and a recall of 83.3%; the F value of the method is 89.6%.
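The reported F value can be checked from the precision and recall given above. The sketch below assumes that "F value" means the balanced F-measure (the harmonic mean of precision and recall, i.e. beta = 1), which the abstract does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    Assumption: beta = 1 (balanced F-measure) is what the abstract
    calls the "F value"; the source does not say so explicitly.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both scores are zero, so the F-measure is defined as zero.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Precision and recall as reported in the RESULTS sentence.
f1 = f_measure(0.969, 0.833)
print(f"{f1:.1%}")  # prints 89.6%
```

Under this assumption the computed value agrees with the 89.6% stated in the abstract.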