Nigam H Shah, Daniel L Rubin, Inigo Espinosa, Kelli Montgomery, Mark A Musen.
Abstract
BACKGROUND: The Stanford Tissue Microarray Database (TMAD) is a repository of data serving a consortium of pathologists and biomedical researchers. The tissue samples in TMAD are annotated with multiple free-text fields, specifying the pathological diagnoses for each sample. These text annotations are not structured according to any ontology, making future integration of this resource with other biological and clinical data difficult.
Year: 2007 PMID: 17686183 PMCID: PMC1988837 DOI: 10.1186/1471-2105-8-296
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 3. Overview of the process of annotating TMAD samples with NCI terms. The figure shows the workflow for setting up TMAD to use the NCI Thesaurus (NCI-T) for annotating tissue microarray samples. The scripts in the grey boxes have to be run only once; they create the tables needed to load the NCI-T in relational form, as well as the tables storing the mapping between a sample id, its user-specified annotation, and the matched NCI term. The scripts in the blue boxes are run every time new samples are added to TMAD. The browsing script (yellow boxes) is run when a user navigates TMAD using the NCI-T terms.
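The relational layout described in this caption can be sketched as follows. This is a minimal illustration only, assuming an SQLite database; the table names, column names, and term identifiers (`nci_term`, `nci_parent`, `sample_annotation`, `T1`) are hypothetical and are not taken from TMAD or the NCI-T.

```python
import sqlite3


def create_schema(conn):
    """One-time setup: tables for the thesaurus and for the sample mapping.

    All table and column names here are hypothetical.
    """
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS nci_term (
            term_id TEXT PRIMARY KEY,
            name    TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS nci_parent (
            child_id  TEXT NOT NULL REFERENCES nci_term(term_id),
            parent_id TEXT NOT NULL REFERENCES nci_term(term_id),
            PRIMARY KEY (child_id, parent_id)
        );
        CREATE TABLE IF NOT EXISTS sample_annotation (
            sample_id  INTEGER NOT NULL,
            annotation TEXT NOT NULL,                       -- user-specified free text
            term_id    TEXT REFERENCES nci_term(term_id)    -- NULL when nothing matched
        );
        """
    )


def samples_per_term(conn):
    """Count distinct samples mapped to each term, as a browsing view might need."""
    rows = conn.execute(
        """
        SELECT t.name, COUNT(DISTINCT s.sample_id)
        FROM nci_term AS t
        JOIN sample_annotation AS s ON s.term_id = t.term_id
        GROUP BY t.term_id, t.name
        ORDER BY t.name
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Placeholder identifier and free-text annotations, for illustration only.
    conn.execute("INSERT INTO nci_term VALUES ('T1', 'Adrenal gland neoplasm')")
    conn.executemany(
        "INSERT INTO sample_annotation VALUES (?, ?, ?)",
        [(1, "adrenal tumor", "T1"), (2, "unmatched text", None)],
    )
    print(samples_per_term(conn))
```

Keeping unmatched samples in the mapping table with a `NULL` term is one possible design choice; it would let matched and unmatched records be sampled separately for evaluation.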
Figure 1. NCI Thesaurus-based browsing interface. The figure shows our NCI-T based browsing interface. The user begins a query by typing a term in a text box. The same methods that map a sample's description to NCI terms are used to match the query words to NCI terms. Matched terms and the number of samples corresponding to each term are then presented in a graph view, which is explained further in the next figure.
Figure 2. Details of the NCI Thesaurus-based browsing interface. The figure shows a zoomed-in region of the DAG view that results from clicking on the term Adrenal gland neoplasm, as described in the example in the main text. The red node is the term clicked by the user; yellow nodes are child terms that have at least one sample in TMAD assigned to them; grey nodes are child terms with no corresponding samples in TMAD; and burlywood nodes are parent terms with fewer than 50 samples. Samples can be retrieved for the selected node.
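The colouring rule in this caption can be written as a small function. This is a sketch under stated assumptions: the function name, its arguments, and the example term names are hypothetical, and the caption does not say how parent terms with 50 or more samples are drawn, so the sketch returns `None` for them.

```python
def node_colour(term, clicked, children, parents, sample_counts, parent_limit=50):
    """Return the node colour for `term` following the caption's rule.

    `children` and `parents` are sets of term names relative to the clicked
    term; `sample_counts` maps a term name to its number of samples.
    """
    count = sample_counts.get(term, 0)
    if term == clicked:
        return "red"
    if term in children:
        # Child terms: yellow with at least one sample, grey with none.
        return "yellow" if count >= 1 else "grey"
    if term in parents and count < parent_limit:
        return "burlywood"
    # Not specified by the caption (e.g. parents with 50 or more samples).
    return None


if __name__ == "__main__":
    # Hypothetical neighbourhood around the clicked term.
    counts = {"Adrenal gland neoplasm": 12, "child A": 3, "child B": 0, "parent P": 20}
    for name in counts:
        print(name, node_colour(name, "Adrenal gland neoplasm",
                                {"child A", "child B"}, {"parent P"}, counts))
```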
Precision calculation for three samples from the matched records
| | True Positive | False Positive |
| Set-1 | 44 | 6 |
| Set-2 | 42 | 8 |
| Set-3 | 43 | 7 |
| Total | 129 | 21 |
| Average (%) | 43.0 (86%) | 7.0 (14%) |
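The precision figures in this table follow from the standard definition, precision = TP / (TP + FP), applied to each set of 50 sampled records. A minimal sketch of that arithmetic, using the counts from the table (the variable and function names are illustrative):

```python
# (true positives, false positives) per sampled set, copied from the table.
SETS = {"Set-1": (44, 6), "Set-2": (42, 8), "Set-3": (43, 7)}


def precision(true_pos, false_pos):
    """Fraction of matched records that were matched correctly."""
    return true_pos / (true_pos + false_pos)


def pooled_precision(sets):
    """Precision over all sampled records taken together."""
    total_tp = sum(tp for tp, _ in sets.values())
    total_fp = sum(fp for _, fp in sets.values())
    return precision(total_tp, total_fp)


if __name__ == "__main__":
    for name, (tp, fp) in SETS.items():
        print(f"{name}: {precision(tp, fp):.0%}")
    print(f"Pooled: {pooled_precision(SETS):.0%}")
```

Because every set has the same size, the pooled precision equals the mean of the per-set precisions.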
False negatives for three samples from the unmatched records
| | True Negative | False Negative |
| Set-1 | 36 | 14 |
| Set-2 | 31 | 19 |
| Set-3 | 31 | 19 |
| Total | 98 | 52 |
| Average (%) | 32.7 (65%) | 17.3 (35%) |
Figure 4. Schematic showing the need to integrate annotations of tissue samples. Integrative analyses require a specific mechanism for identifying relevant samples in both tissue and gene expression databases. The correspondence denoted by the red arrow is hard to establish with free-text sample descriptions.