Florian Boudin, Jian-Yun Nie, Joan C Bartlett, Roland Grad, Pierre Pluye, Martin Dawes.
Abstract
BACKGROUND: Formulating a clinical information need in terms of four atomic parts, namely Population/Problem, Intervention, Comparison and Outcome (known as PICO elements), facilitates searching for a precise answer within a large medical citation database. However, using PICO-defined items in the information retrieval process requires the search engine to detect and index PICO elements in the collection so that the system can retrieve relevant documents.
Year: 2010 PMID: 20470429 PMCID: PMC2891622 DOI: 10.1186/1472-6947-10-29
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Statistics about the training data
| Dataset | Abstracts | Sentences |
|---|---|---|
| Population/Problem | 14,279 | 191,608 |
| Intervention/Comparison | 9,095 | 125,399 |
| Outcome | 2,394 | 32,908 |
Statistics about the semantic type lists.
| List | UMLS semantic type identifiers | Terms |
|---|---|---|
| List 1 (Living Beings) | Age Group (T100), Family Group (T099), Group (T096), Human (T016), Patient or Disabled Group (T101), Population Group (T098) | 716 |
| List 2 (Disorders) | Acquired Abnormality (T020), Anatomical Abnormality (T190), Cell or Molecular Dysfunction (T049), Congenital Abnormality (T019), Disease or Syndrome (T047), Experimental Model of Disease (T050), Finding (T033), Injury or Poisoning (T037), Mental or Behavioral Dysfunction (T048), Neoplastic Process (T191), Pathologic Function (T046), Sign or Symptom (T184) | 23,541 |
| List 3 (Chemicals & Drugs) | Amino Acid, Peptide, or Protein (T116), Antibiotic (T195), Biologically Active Substance (T123), Biomedical or Dental Material (T122), Carbohydrate (T118), Chemical (T103), Chemical Viewed Functionally (T120), Chemical Viewed Structurally (T104), Clinical Drug (T200), Eicosanoid (T111), Element, Ion, or Isotope (T196), Enzyme (T126), Hazardous or Poisonous Substance (T131), Hormone (T125), Immunologic Factor (T129), Indicator, Reagent, or Diagnostic Aid (T130), Inorganic Chemical (T197), Lipid (T119), Neuroreactive Substance or Biogenic Amine (T124), Nucleic Acid, Nucleoside, or Nucleotide (T114), Organic Chemical (T109), Organophosphorus Compound (T115), Pharmacologic Substance (T121), Receptor (T192), Steroid (T110), Vitamin (T127) | 57,793 |
Statistical features (marked with *) and knowledge-based (marked with †) features extracted for classifying sentences.
| Feature |
|---|
| Position in the document (absolute, relative) * |
| Sentence length * |
| Number of punctuation marks * |
| Number of numeric values (n > 10, n < 10) * |
| Word overlap with title * |
| Number of cue-words (P, I, O)† |
| Number of cue-verbs (P, I, O)† |
| MeSH semantic types |
| Number of (n = [0-9]+) † |
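The statistical features in the table above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sentence_features`, the tokenization, the exact counting rules, and the example sentence are all assumptions made for illustration, and the knowledge-based cue-word, cue-verb and semantic-type features are left out because their word lists are not given here.

```python
import re
import string


def sentence_features(sentence, index, total, title):
    """Compute illustrative statistical features for one abstract sentence.

    `index` is the 0-based position of the sentence in the abstract and
    `total` is the number of sentences (both hypothetical conventions).
    """
    words = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    title_words = set(re.findall(r"[A-Za-z0-9]+", title.lower()))
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", sentence)]
    return {
        # Position in the document (absolute, relative)
        "abs_position": index,
        "rel_position": index / max(total - 1, 1),
        # Sentence length, measured in word tokens (an assumption)
        "length": len(words),
        # Number of punctuation marks
        "punctuation": sum(ch in string.punctuation for ch in sentence),
        # Counts of numeric values above and below 10
        "numbers_gt_10": sum(n > 10 for n in numbers),
        "numbers_lt_10": sum(n < 10 for n in numbers),
        # Word overlap with the title, as a count of shared distinct words
        "title_overlap": len(set(words) & title_words),
        # Occurrences of the pattern "n = <digits>"
        "n_equals": len(re.findall(r"\bn\s*=\s*[0-9]+", sentence)),
    }


if __name__ == "__main__":
    feats = sentence_features(
        "A total of 120 patients (n = 120) were randomized to 2 groups.",
        index=3,
        total=10,
        title="Aspirin in patients with migraine",
    )
    for name, value in feats.items():
        print(name, value)
```

A dictionary of this kind, extended with the knowledge-based counts, could then be turned into a feature vector for any of the classifiers compared in the paper.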
Performance of each classifier in terms of precision (p), recall (r) and f-measure (f).
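| Classifier | P-element (p) | P-element (r) | P-element (f) | I-element (p) | I-element (r) | I-element (f) | O-element (p) | O-element (r) | O-element (f) |
|---|---|---|---|---|---|---|---|---|---|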
| BL | 52.1 | 52.1 | 52.1 | 21.9 | 21.9 | 21.9 | 20.0 | 20.0 | 20.0 |
| J48 | 79.7 | 75.8 | 77.7 | 57.3 | 54.6 | 55.9 | 49.7 | 42.0 | 45.5 |
| NB | 66.9 | 65.0 | 66.0 | 50.1 | 47.9 | 49.0 | 48.6 | 47.7 | 48.1 |
| RF | 86.7 | 81.3 | 83.9 | 67.2 | 60.2 | 63.5 | 55.7 | 46.2 | 50.6 |
| SVM | 94.6 | 61.2 | 74.3 | 79.6 | 26.1 | 39.3 | 75.4 | 10.9 | 19.0 |
| F1 | 89.9 | 78.2 | 83.6 | 71.2 | 55.2 | 62.2 | 62.6 | 42.7 | 50.8 |
| F2 | 86.2 | 85.0 | 85.6 | 66.5 | 64.8 | 65.6 | 57.2 | 54.8 | 56.0 |
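The f-measure values in the table above are consistent with the balanced F-measure, the harmonic mean of precision and recall. The sketch below assumes that definition (the record itself only says "f-measure") and recomputes a few P-element scores from their reported precision and recall; the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the P-element columns of the table
p_element = {"J48": (79.7, 75.8), "RF": (86.7, 81.3), "SVM": (94.6, 61.2)}

if __name__ == "__main__":
    for name, (p, r) in p_element.items():
        print(name, round(f_measure(p, r), 1))
```

The SVM row shows why the f-measure matters here: its precision is the highest of any classifier, but its low recall pulls the combined score below that of RF and J48.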
Performance of the Outcome classifiers in terms of f-measure (f) at 2 and 3 sentence cut-off.
| Classifier | 2-sentence cut-off | 3-sentence cut-off |
|---|---|---|
| J48 | 57.0 | 61.2 |
| NB | 65.2 | 74.5 |
| RF | 61.9 | 67.3 |
| SVM | 19.1 | 19.1 |
| F1 | 58.2 | 60.8 |
| F2 | 71.4 | 78.8 |