Yonghui Wu, Mia A Levy, Christine M Micheel, Paul Yeh, Buzhou Tang, Michael J Cantrell, Stacy M Cooreman, Hua Xu.
Abstract
BACKGROUND: Many cancer clinical trials now specify the particular status of a genetic lesion in a patient's tumor in the inclusion or exclusion criteria for trial enrollment. To facilitate search and identification of gene-associated clinical trials by potential participants and clinicians, it is important to develop automated methods to identify genetic information from narrative trial documents.
Year: 2012 PMID: 23282337 PMCID: PMC3535695 DOI: 10.1186/1471-2164-13-S8-S21
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Six categories of gene mentions in clinical trial documents.
| # | Category | Subcategory | Definition | Examples |
|---|---|---|---|---|
| 1 | Gene-related | Genetic lesion detected | Genetic lesion status is detected. | Positive EGFR mutation test... |
| 2 | Gene-related | Genetic lesion not detected | Genetic lesion status is not detected. | ...negative staining for Kit. |
| 3 | Gene-related | Genetic lesion mentioned | Analysis of a genetic lesion is mentioned, but no particular result is given. | BRAF - gene analysis of archival tissue |
| 4 | Gene-related | Gene only | Refers to the gene entity only; no status is associated. | KIT is a gene that codes for ... |
| 5 | Drug | | Gene-related drugs, drug classes, or other therapy. | WT1 Peptide Vaccination in Carcinomas. |
| 6 | Others | | None of the above classes, e.g., English words | ...using the kit and testing procedures. |
Accuracy of two-stage and single classifiers for individual genes.
| Gene | Two-stage classifier | Single-stage classifier |
|---|---|---|
| ALK | 88.2% | 88.1% |
| BRAF | 85.4% | 85.4% |
| EGFR | 82.5% | 82.0% |
| KIT | 75.9% | 75.2% |
| KRAS | 87.5% | 87.5% |
| MET | 93.0% | 92.5% |
| PTEN | 73.9% | 71.3% |
| WT1 | 83.5% | 83.5% |
Accuracy of the gene-neutral two-stage and single classifiers.
| Testing Gene | Training Genes | Two-stage Classifier | Single-stage Classifier |
|---|---|---|---|
| ALK | BRAF, EGFR, KIT, KRAS, MET, PTEN, WT1 | 68.6% | 64.7% |
| BRAF | ALK, EGFR, KIT, KRAS, MET, PTEN, WT1 | 85.4% | 84.6% |
| EGFR | ALK, BRAF, KIT, KRAS, MET, PTEN, WT1 | 78.0% | 73.5% |
| KIT | ALK, BRAF, EGFR, KRAS, MET, PTEN, WT1 | 74.5% | 70.3% |
| KRAS | ALK, BRAF, EGFR, KIT, MET, PTEN, WT1 | 87.0% | 81.5% |
| MET | ALK, BRAF, EGFR, KIT, KRAS, PTEN, WT1 | 73.5% | 41.5% |
| PTEN | ALK, BRAF, EGFR, KIT, KRAS, MET, WT1 | 78.3% | 78.3% |
| WT1 | ALK, BRAF, EGFR, KIT, KRAS, MET, PTEN | 65.0% | 55.7% |
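The train/test layout in the table above, where each gene is held out for testing while the other seven supply the training data, can be sketched as follows. This is a minimal illustration of the splitting scheme only: the function name `leave_one_gene_out` is hypothetical, and the classifier itself is not shown.

```python
# The eight genes listed in the table above.
GENES = ["ALK", "BRAF", "EGFR", "KIT", "KRAS", "MET", "PTEN", "WT1"]


def leave_one_gene_out(genes):
    """Yield (test_gene, training_genes) pairs, holding out one gene at a time.

    Hypothetical helper: it only reproduces the split layout of the table,
    not the classifier training or evaluation.
    """
    for test_gene in genes:
        training_genes = [g for g in genes if g != test_gene]
        yield test_gene, training_genes


splits = list(leave_one_gene_out(GENES))

for test_gene, training_genes in splits:
    print(f"test on {test_gene}; train on {', '.join(training_genes)}")
```

Each split trains on samples from seven genes and evaluates on a gene the classifier has never seen, which is what makes the resulting accuracy a measure of gene-neutral behavior.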
Precision, Recall and F-score for individual categories across the top eight genes.
| Category | Two-stage Precision | Two-stage Recall | Two-stage F-score | Single-stage Precision | Single-stage Recall | Single-stage F-score |
|---|---|---|---|---|---|---|
| Genetic lesion detected | 75.4% | 92.0% | 82.9% | 78.4% | 89.4% | 83.5% |
| Genetic lesion not detected | 90.2% | 78.0% | 83.7% | 89.4% | 78.0% | 83.3% |
| Genetic lesion mentioned | 78.5% | 60.9% | 68.6% | 83.5% | 55.1% | 66.4% |
| Gene only | 66.7% | 16.0% | 25.8% | 60.0% | 12.0% | 20.0% |
| Drug | 91.0% | 90.3% | 90.6% | 82.0% | 95.2% | 88.1% |
| Others | 100% | 93.8% | 96.8% | 100% | 94.3% | 97.1% |
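The F-score column can be related to the precision and recall columns with a short calculation. The sketch below assumes the reported F-score is the balanced F1 measure (the harmonic mean of precision and recall); the table itself does not state the formula, so this is an assumption, though the two-stage values in the table are consistent with it after rounding. The function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall.

    Assumption: the table's F-score is F1. Inputs may be fractions or
    percentages, as long as both use the same scale.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two-stage classifier rows from the table: (precision %, recall %, reported F-score %).
two_stage = {
    "Genetic lesion detected": (75.4, 92.0, 82.9),
    "Genetic lesion not detected": (90.2, 78.0, 83.7),
    "Genetic lesion mentioned": (78.5, 60.9, 68.6),
    "Gene only": (66.7, 16.0, 25.8),
    "Drug": (91.0, 90.3, 90.6),
    "Others": (100.0, 93.8, 96.8),
}

for category, (p, r, reported) in two_stage.items():
    print(f"{category}: computed {f_score(p, r):.1f}%, reported {reported}%")
```

The "Gene only" row shows why the harmonic mean matters: precision of 66.7% combined with recall of 16.0% gives an F-score near 25.8%, far below the arithmetic mean of the two.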
Figure 1. Frequency distribution of gene symbols detected in cancer trial documents.
Frequency distribution among different categories for the top eight genes.
| Gene | # of Samples | Others | Drug | Genetic lesion detected | Genetic lesion not detected | Genetic lesion mentioned | Gene only |
|---|---|---|---|---|---|---|---|
| ALK | 102 | 32 | 10 | 41 | 15 | 4 | 0 |
| BRAF | 130 | 0 | 32 | 63 | 13 | 21 | 1 |
| EGFR | 200 | 4 | 117 | 37 | 5 | 26 | 11 |
| KIT | 145 | 5 | 26 | 82 | 12 | 16 | 4 |
| KRAS | 200 | 2 | 0 | 65 | 95 | 38 | 0 |
| MET | 200 | 147 | 28 | 16 | 0 | 5 | 4 |
| PTEN | 69 | 4 | 2 | 41 | 1 | 19 | 2 |
| WT1 | 97 | 0 | 53 | 32 | 0 | 9 | 3 |
| Total | 1,143 | 194 | 268 | 377 | 141 | 138 | 25 |
Figure 2. Performance of the gene-neutral classifier at each iteration. The triplet values at each point represent the accuracy, the number of training samples, and the number of test samples, respectively.