Leon French, Suzanne Lane, Tamryn Law, Lydia Xu, Paul Pavlidis.
Abstract
MOTIVATION: Many microarray datasets are available online with formalized standards describing the probe sequences and expression values. Unfortunately, the description, conditions and parameters of the experiments are less commonly formalized and often occur as natural language text. This hinders searching, high-throughput analysis, organization and integration of the datasets.
Year: 2009 PMID: 19376825 PMCID: PMC2687992 DOI: 10.1093/bioinformatics/btp259
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Outline of the methods. The procedure starts with free text associated with a genomics study. The text is converted to UMLS concepts, which are then mapped to an FMA ontology term. The ontology term can then be associated with the genomics study.
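The flow in Fig. 1 can be illustrated with a toy sketch. It is not the authors' implementation: plain substring matching stands in for whatever concept recognizer was actually used, and every identifier in the lookup tables (`PHRASE_TO_CUI`, `CUI_TO_FMA`, and the IDs inside them) is a hypothetical placeholder rather than a real UMLS CUI or FMA URI.

```python
# Toy lookup tables. All identifiers are hypothetical placeholders,
# NOT real UMLS concept IDs or FMA term URIs.
PHRASE_TO_CUI = {
    "hippocampus": "CUI:0001",
    "spinal cord": "CUI:0002",
}
CUI_TO_FMA = {
    "CUI:0001": "FMA:hippocampus",
    "CUI:0002": "FMA:spinal_cord",
}


def annotate(text):
    """Free text -> concept IDs -> ontology terms for one study."""
    lowered = text.lower()
    # Step 1: find concept mentions (substring match is a stand-in only).
    cuis = [cui for phrase, cui in PHRASE_TO_CUI.items() if phrase in lowered]
    # Step 2: map each concept to an ontology term, dropping unmapped ones.
    return sorted({CUI_TO_FMA[cui] for cui in cuis if cui in CUI_TO_FMA})


# Step 3: the returned terms would be attached to the study as annotations.
terms = annotate("Expression profiling of mouse hippocampus and spinal cord")
```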
Number and accuracy of mentions before and after filtering steps
| Stage | Mentions | Annotations | Recall | Precision (min) |
|---|---|---|---|---|
| Unfiltered | 58 030 | 5484 | 0.497 | 0.094 |
| Filtered for rejected SUI+CUI, and CUI → URI pairs | 39 155 | 3985 | 0.488 | 0.128 |
| Filtered for uninformative concepts | 26 525 | 2740 | 0.488 | 0.185 |
Comparison to manual annotations, divided by ontology
| Name | Existing | Predicted | Intersection | Recall | Precision (min) |
|---|---|---|---|---|---|
| FMA | 682 | 1351 | 304 | 0.446 | 0.225 |
| DO | 217 | 1041 | 127 | 0.585 | 0.122 |
| BIRNLex | 143 | 348 | 77 | 0.538 | 0.221 |
| All | 1042 | 2740 | 508 | 0.488 | 0.185 |
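The recall and precision columns above are consistent with the usual ratios: recall as intersection over existing annotations, and precision as intersection over predicted annotations. The "(min)" label presumably marks the precision as a lower bound, since a prediction missing from the manual set is not necessarily wrong. A minimal check, with counts copied from the table and helper names that are illustrative only:

```python
# (existing, predicted, intersection) counts copied from the table above.
COUNTS = {
    "FMA": (682, 1351, 304),
    "DO": (217, 1041, 127),
    "BIRNLex": (143, 348, 77),
    "All": (1042, 2740, 508),
}


def recall_precision(existing, predicted, intersection):
    """Return (recall, precision), each rounded to three decimals."""
    recall = intersection / existing
    precision = intersection / predicted
    return round(recall, 3), round(precision, 3)


SCORES = {name: recall_precision(*counts) for name, counts in COUNTS.items()}
```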
Top 40 concepts mapped to experiments
| Concept name | Count |
|---|---|
| Brain | 119 |
| Cerebral cortex | 61 |
| Spinal cord | 56 |
| Malignant neoplasms | 55 |
| cancer | 49 |
| Hippocampus | 46 |
| Spleen | 42 |
| Stem cell | 35 |
| Cerebellum | 34 |
| Heart | 32 |
| Liver | 31 |
| Muscle tissue | 30 |
| Kidney | 28 |
| Pair of lungs | 27 |
| Infection | 25 |
| Communicable diseases | 24 |
| Nervous system | 21 |
| Skeletal muscle tissue | 21 |
| Breast | 21 |
| Epithelial cell | 19 |
| Blood | 18 |
| Hypothalamus | 17 |
| Neurodegenerative disorders | 17 |
| Chromosome | 16 |
| Retina | 16 |
| Carcinoma | 16 |
| Prostate | 16 |
| Neoplasm metastasis | 15 |
| Frontal lobe | 15 |
| Bone marrow | 15 |
| Malignant neoplasm of breast | 15 |
| Breast carcinoma | 15 |
| Amygdala | 14 |
| Colon | 14 |
| Alzheimer's disease | 13 |
| Neuraxis | 13 |
| Mammary neoplasms | 12 |
| Primary tumor | 12 |
| Fibroblast | 12 |
| Epithelium | 11 |
Recall of annotations with cross-references
| Name | Existing annotations with mappings | Predicted annotations | Recall |
|---|---|---|---|
| FMA | 404 | 1351 | 0.752 |
| DO | 217 | 1041 | 0.585 |
| BIRNLex | 100 | 348 | 0.770 |
| All | 721 | 2740 | 0.705 |
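These recall values can be reproduced by dividing by only the existing annotations that have cross-reference mappings. The sketch below assumes the intersection counts from the by-ontology comparison (304, 127, 77, 508) carry over unchanged; that assumption is not stated in the table itself, but it matches the reported figures.

```python
# (existing annotations with mappings, assumed intersection) per ontology.
MAPPED = {
    "FMA": (404, 304),
    "DO": (217, 127),
    "BIRNLex": (100, 77),
    "All": (721, 508),
}

# Recall against the reduced denominator, rounded to three decimals.
MAPPED_RECALL = {
    name: round(intersection / existing, 3)
    for name, (existing, intersection) in MAPPED.items()
}
```

DO is unchanged because all 217 of its existing annotations have mappings, so its denominator does not shrink.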
Manual evaluation of annotation quality
| Name | Predicted | Accepted | Precision |
|---|---|---|---|
| FMA | 213 | 179 | 0.840 |
| DO | 195 | 176 | 0.903 |
| BIRNLex | 55 | 55 | 1.000 |
| All | 463 | 410 | 0.886 |