Xue Ting Wee, Yvonne Koh, Chun Wei Yap.
Abstract
BACKGROUND: Product risk management involves critical assessment of the risks and benefits of health products circulating in the market. One of the important sources of safety information is the primary literature, especially for newer products with which regulatory authorities have relatively little experience. Although the primary literature provides vast and diverse information, only a small proportion of it is useful for product risk assessment work. Hence, the aim of this study is to explore the possibility of using text mining to automate the identification of useful articles, which would reduce the time taken for literature searches and hence improve work efficiency. In this study, term frequency-inverse document frequency (TF-IDF) values were computed for predictors extracted from the titles and abstracts of articles related to three tumour necrosis factor-alpha blockers. A general automated system was developed using only general predictors and was tested for its generalizability using articles related to four other drug classes. Several specific automated systems were developed using both general and specific predictors and training sets of different sizes in order to determine the minimum number of articles required for developing such systems.
Year: 2012 PMID: 22380483 PMCID: PMC3315431 DOI: 10.1186/1472-6947-12-13
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Comparison of performance of different algorithms using general predictors on the validation set
| Model | AUC |
|---|---|
| Logistic regression | 0.829 |
| K-nearest neighbor (k = 3) | 0.642 |
| Naive Bayes | 0.673 |
| SVM (gamma = 1.0, C = 0.0) | 0.870 |
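The AUC values in the table above summarise how well each model ranks useful articles ahead of non-useful ones. As a minimal sketch of that metric (not the authors' code), the function below computes AUC from the pairwise-ranking identity; the labels and scores are hypothetical and only illustrate the calculation.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for a useful article, 0 for a non-useful one.
    scores: model output, higher meaning "more likely useful".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one useful and one non-useful article")

    # Count (useful, non-useful) pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example data, not taken from the study.
example_labels = [1, 0, 1, 0, 0, 1]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
print(auc(example_labels, example_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the four models are compared.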
Comparison of performance of SVM models using different types of frequencies on the validation set
| Type of frequency | All predictors | General predictors |
|---|---|---|
| Word occurrence | 0.892 | 0.870 |
| Binary frequency | 0.849 | 0.828 |
| TF-IDF | 0.909 | 0.898 |
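The three frequency types compared above differ only in how each term is weighted within a document. The sketch below illustrates them on a hypothetical three-document corpus. The whitespace tokenizer and the TF-IDF variant (raw count times log(N / document frequency)) are assumptions, since the exact formula and preprocessing used in the study are not given here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def weight_documents(docs):
    """Return word-occurrence, binary, and TF-IDF weights for each document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))

    occurrence, binary, tfidf = [], [], []
    for tokens in token_lists:
        counts = Counter(tokens)
        occurrence.append(dict(counts))             # raw term counts
        binary.append({t: 1 for t in counts})       # presence only
        # Assumed variant: count * log(N / df).
        tfidf.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return occurrence, binary, tfidf


# Hypothetical mini-corpus standing in for titles and abstracts.
docs = [
    "infection risk infection reported",
    "risk of infection",
    "dose study",
]
occ, binr, tfidf = weight_documents(docs)
print(occ[0])
print(binr[0])
print(tfidf[0])
```

Under this weighting, a term that appears in every document receives a TF-IDF of zero, so rare terms contribute more to the feature vector than common ones.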
Figure 1. Lift chart of SVM model trained using TF-IDF of general predictors on validation set.
Figure 2. Lift chart of SVM model trained using TF-IDF of general predictors on generalizability set.
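A lift chart such as those in Figures 1 and 2 shows how much more concentrated the useful articles are among the top-ranked results than in the set as a whole. The following is a minimal sketch of how cumulative lift points could be computed; the binning scheme and the example data are assumptions, not the procedure used for the figures.

```python
def cumulative_lift(labels, scores, n_bins=5):
    """Return (fraction_screened, lift) pairs for the top-ranked articles.

    Lift is the share of all useful articles found in the top fraction,
    divided by that fraction; 1.0 is what random ordering would give.
    """
    # Rank labels by descending score.
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0], reverse=True)]
    total_useful = sum(ranked)
    if total_useful == 0:
        raise ValueError("no useful articles in the set")

    n = len(ranked)
    points = []
    for b in range(1, n_bins + 1):
        k = (b * n + n_bins - 1) // n_bins   # ceiling of b * n / n_bins
        captured = sum(ranked[:k]) / total_useful
        fraction = k / n
        points.append((fraction, captured / fraction))
    return points


# Hypothetical ranking of ten articles, two of them useful.
lift_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
lift_scores = [0.95, 0.90, 0.85, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
for fraction, lift in cumulative_lift(lift_labels, lift_scores):
    print(f"top {fraction:.0%}: lift {lift:.2f}")
```

In this toy example, reading only the top 40% of ranked articles already recovers every useful one, which is the kind of saving a lift chart is meant to make visible.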
Comparison of performance of models trained on various training set sizes using all predictors on the validation set
| Training set size | Ratio of useful to non-useful articles | AUC |
|---|---|---|
| 2 | 1:1 | 0.684 |
| 20 | 1:9 | 0.748 |
| 36 | 1:17 | 0.699 |
| 74 | 1:9.57 | 0.749 |
| 112 | 1:4.89 | 0.794 |
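The ratios in the table above can be turned back into approximate class counts with simple arithmetic, which makes clear how few useful articles the smaller training sets contain. The sketch below does that conversion; the resulting counts are inferred from the reported sizes and ratios and are not stated in the source.

```python
def class_counts(total, non_useful_per_useful):
    """Estimate (useful, non_useful) counts from a set size and a 1:r ratio."""
    useful = round(total / (1 + non_useful_per_useful))
    return useful, total - useful


# (training set size, r) for each row, where the ratio is 1:r.
rows = [(2, 1), (20, 9), (36, 17), (74, 9.57), (112, 4.89)]
for total, ratio in rows:
    useful, non_useful = class_counts(total, ratio)
    print(f"size {total}: about {useful} useful, {non_useful} non-useful")
```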
Figure 3. Use of automated system in routine risk assessment work.
Figure 4. Splitting of dataset.
Figure 5. Summary of text mining work flow.
Figure 6. Projection of abstracts to a high-dimensional space in an SVM model.
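The projection in Figure 6 is usually achieved implicitly through a kernel function rather than by building the high-dimensional vectors explicitly. As an illustration only, the sketch below assumes a Gaussian (RBF) kernel, which the gamma parameter reported for the SVM suggests but which this section does not confirm; the feature vectors are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2).

    Assumption: the SVM uses an RBF kernel; gamma = 1.0 mirrors the
    value listed in the model comparison table.
    """
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical TF-IDF vectors for two abstracts.
abstract_a = [0.0, 0.4, 1.1]
abstract_b = [0.0, 0.5, 0.9]
print(rbf_kernel(abstract_a, abstract_b))
```

The kernel value acts as a similarity score between two abstracts: identical vectors give 1.0, and the value decays towards 0 as the vectors move apart.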