Wei Yu, Melinda Clyne, Siobhan M Dolan, Ajay Yesupriya, Anja Wulf, Tiebin Liu, Muin J Khoury, Marta Gwinn.
Abstract
BACKGROUND: Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies.
Year: 2008 PMID: 18430222 PMCID: PMC2387176 DOI: 10.1186/1471-2105-9-205
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Performance test results comparing SVM results with known classification in test set (data selected from PubMed during five consecutive weeks from Feb 22, 2007 to March 28, 2007)
| Method | Measure | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | AUC (95% CI) | P value |
| One-way | Recall | 0.946 | 0.968 | 0.951 | 0.965 | 0.951 | 0.967 (0.958–0.975) | |
| | Precision | 0.345 | 0.297 | 0.265 | 0.298 | 0.265 | | |
| | Specificity | 0.981 | 0.981 | 0.980 | 0.981 | 0.980 | | |
| | | | | | | | | < 0.0001 |
| Two-way | Recall | 0.946 | 0.992 | 0.967 | 0.977 | 0.993 | 0.982 (0.976–0.987) | |
| | Precision | 0.345 | 0.311 | 0.291 | 0.323 | 0.336 | | |
| | Specificity | 0.981 | 0.982 | 0.982 | 0.983 | 0.984 | | |
One-way: key words with z scores greater than 1.96 were selected as featured key words.
Two-way: key words with z scores greater than 1.96 or less than -1.96 were selected as featured key words.
AUC: area under the curve.
CI: confidence interval.
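The key-word selection rules and the three screening measures reported in the table can be sketched as follows. This is a minimal illustration, not GAPscreener's code: the record does not say how the z scores were computed, so the function simply takes them as given, and the function names, example words, z scores, and confusion-matrix counts are all hypothetical.

```python
from typing import Dict, List

Z_CRIT = 1.96  # threshold quoted in the table footnotes


def select_keywords(z_scores: Dict[str, float], two_way: bool = False) -> List[str]:
    """Return the key words that pass the one-way or two-way z-score rule."""
    if two_way:
        # Two-way: keep words with z > 1.96 or z < -1.96.
        keep = [w for w, z in z_scores.items() if z > Z_CRIT or z < -Z_CRIT]
    else:
        # One-way: keep only words with z > 1.96.
        keep = [w for w, z in z_scores.items() if z > Z_CRIT]
    return sorted(keep)


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard definitions of recall, precision, and specificity."""
    return {
        "recall": tp / (tp + fn),        # share of true studies retrieved
        "precision": tp / (tp + fp),     # share of retrieved abstracts that are true studies
        "specificity": tn / (tn + fp),   # share of non-studies correctly rejected
    }


# Hypothetical z scores, not taken from the paper.
example_z = {"polymorphism": 8.2, "genotype": 5.1, "mouse": -4.7, "cohort": 0.9}
one_way = select_keywords(example_z)
two_way = select_keywords(example_z, two_way=True)

# Hypothetical confusion-matrix counts, not taken from the paper.
example_metrics = screening_metrics(tp=95, fp=200, tn=9700, fn=5)
```

With these made-up inputs, the two-way rule additionally keeps the strongly under-represented word, which is the only difference between the two selection schemes. The made-up counts also show how high recall and specificity can coexist with low precision when true studies are a small fraction of all abstracts, the same pattern seen in the table.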
Results of the SVM method and previous method in screening PubMed for the HuGE Pub Lit database.
| 22 | 17 | 5 | 3 | |
| 5 | 4 | 1 | 4 | |
| 179 | 159 | 131 | 114 | |
| 206 | 180 | 137 | 121 |
* True positives re-evaluated by the curator.
Numbers of PubMed abstracts requiring manual review after screening by SVM method and previous method*.
| 521 | 397 | 458 | 400 | 1776 | |
| 4010 | 3013 | 3789 | 3382 | 14194 |
Note: the number for the SVM tool was generated based on Entrez date; the number for the previous method was generated based on MeSH date.
* Previous method: the screening method described in reference 5.
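The arithmetic behind the manual-review table can be checked with a short sketch. The row and column labels were lost in this record, so the code makes two assumptions that should be verified against the original article: that the last value in each row is the row total, and that the first row belongs to the SVM method and the second to the previous method. The variable names are hypothetical.

```python
from typing import Dict, List

# Values copied from the table; the labels are assumptions (see comments).
rows: Dict[str, List[int]] = {
    "row_1": [521, 397, 458, 400, 1776],       # assumed: SVM method
    "row_2": [4010, 3013, 3789, 3382, 14194],  # assumed: previous method
}


def last_is_total(values: List[int]) -> bool:
    """Check whether the final entry equals the sum of the preceding entries."""
    return sum(values[:-1]) == values[-1]


totals_ok = {name: last_is_total(values) for name, values in rows.items()}

# Under the row-label assumption above: fractional reduction in the number
# of abstracts needing manual review when moving from row_2 to row_1.
reduction = 1 - rows["row_1"][-1] / rows["row_2"][-1]

print(totals_ok)
print(f"reduction in manual review: {reduction:.1%}")
```

Both rows pass the total check, which supports reading the fifth column as a sum over the four preceding periods. If the row assignment is right, the overall manual-review workload falls by roughly 87%.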
Figure 1. Results of traditional search method compared with use of GAPscreener (preterm birth example). Both methods searched all PubMed abstracts entered from January 1, 1990 through April 12, 2007. Numbers indicate the number of PubMed abstracts processed at each stage.
Figure 2. Data flow scheme in GAPscreener's screening process.
Figure 3. Graphical user interface (GUI) of GAPscreener.