Elizabeth S Chen, Indra Neil Sarkar.
Abstract
Within large sequence repositories such as GenBank, there is a wealth of metadata providing contextual information that may enhance search and retrieval of relevant sequences for a range of subsequent analyses. One challenge is the use of free text in these metadata fields, where approaches are needed to extract, structure, and encode essential information. The goal of the present study was to explore the feasibility of using a combination of existing resources for annotating unstructured GenBank metadata, initially focusing on the "host" and "isolation_source" fields. This paper summarizes early results for 10 host organisms, including a characterization of associated isolation sources with respect to biomedical ontologies and semantic types. The findings from this preliminary study provide insight into the rich information captured within these unstructured metadata, offer guidance for addressing the challenges and issues encountered, and highlight the potential value for enriching comparative biological studies towards improving human health.
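As a rough illustration of the extraction step described in the abstract, the sketch below pulls the "host" and "isolation_source" qualifiers (plus "organism") out of the text of a GenBank flat-file record. It is not the authors' pipeline: the function name `extract_metadata`, the regular expression, and the sample record are all assumptions made for illustration, and the sample record is invented rather than taken from GenBank. The sketch also ignores edge cases such as doubled quotes inside a value.

```python
import re

# Matches a qualifier such as /host="Homo sapiens". DOTALL lets a quoted
# value span the line wraps used in GenBank flat files.
QUALIFIER = re.compile(r'/(host|isolation_source|organism)="(.*?)"', re.DOTALL)


def extract_metadata(record_text):
    """Return the first value seen for each qualifier of interest (hypothetical helper)."""
    fields = {}
    for name, value in QUALIFIER.findall(record_text):
        # Collapse the newline and indentation left behind by wrapped values.
        fields.setdefault(name, re.sub(r"\s+", " ", value).strip())
    return fields


# Invented record fragment, for illustration only.
SAMPLE_RECORD = '''\
FEATURES             Location/Qualifiers
     source          1..1350
                     /organism="uncultured bacterium"
                     /mol_type="genomic DNA"
                     /host="Homo sapiens"
                     /isolation_source="subgingival plaque from a patient
                     with periodontitis"
'''

metadata = extract_metadata(SAMPLE_RECORD)
print(metadata)
```

The free-text value returned for "isolation_source" is the kind of string that would still need to be mapped to ontology concepts and semantic types; that mapping is outside the scope of this sketch.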
Year: 2011 PMID: 22211174 PMCID: PMC3248757
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1: Overview of Methods
Top 10 Host Organisms with Frequencies for Host (A), Isolation Source (B), and Organism (C).
| Taxonomy ID | Scientific name | Common name | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 9606 | Homo sapiens | human | 545470 | 609 | 337437 | 3628 | 123 | 83 | 545357 | 19645 |
| 10116 | Rattus norvegicus | Norway rat | 156894 | 19 | 77399 | 30 | 71 | 20 | 80888 | 184 |
| 10118 | Rattus sp. | | 76008 | 7 | 75933 | 4 | 20 | 5 | 75963 | 13 |
| 9805 | Diceros bicornis | black rhinoceros | 49500 | 3 | 49494 | 3 | 12 | 2 | 49499 | 7 |
| 9796 | Equus caballus | horse | 27582 | 38 | 4338 | 59 | 71 | 32 | 27575 | 323 |
| 9792 | Equus grevyi | Grevy’s zebra | 23280 | 3 | 23270 | 4 | 14 | 4 | 23276 | 4 |
| 10090 | Mus musculus | house mouse | 21088 | 33 | 14710 | 44 | 80 | 26 | 21071 | 172 |
| 9913 | Bos taurus | cattle | 19540 | 78 | 10454 | 191 | 96 | 47 | 19462 | 884 |
| 9891 | Antilocapra americana | pronghorn | 12951 | 1 | 12950 | 1 | 12 | 2 | 12951 | 2 |
| 9844 | Lama glama | llama | 11582 | 2 | 11579 | 3 | 31 | 5 | 11582 | 7 |
Top 5 Body Parts, Body Substances, Diseases or Syndromes, and Organisms for Selected Host Organisms.
| Category | Homo sapiens | Rattus norvegicus | Equus caballus | Mus musculus | Bos taurus |
|---|---|---|---|---|---|
| Body Parts | esophagus (0.212) | lung (0.432) | brain (0.775) | cecum (0.540) | rumen (0.729) |
| | external auditory canal (0.143) | rat colon (0.326) | vagina (0.113) | ileum (0.449) | teat (0.050) |
| | umbilicus (0.140) | ileum (0.068) | hoof (0.028) | spleen (0.002) | omasum (0.028) |
| | manubrium (0.128) | caecum (0.030) | gastric mucosa (0.028) | lung (0.002) | brain (0.031) |
| | glabella (0.123) | kidney (0.023) | uterus (0.014) | intestinal (0.002) | nasal (0.026) |
| Body Substances | saliva (0.317) | feces (>99.999) | feces (0.959) | feces (0.980) | feces (0.947) |
| | feces (0.259) | blood (<0.001) | semen (0.022) | blood (0.011) | blood (0.021) |
| | plasma (0.166) | isolate (<0.001) | blood (0.014) | lysate (0.009) | milk (0.014) |
| | serum (0.142) | serum (<0.001) | peripheral blood (0.003) | | serum (0.006) |
| | blood (0.027) | | | | exudate (0.004) |
| Diseases or Syndromes | subgingival plaque (0.161) | | sarcoid (0.714) | Salmonella (1.000) | interdigital necrobacillosis (0.892) |
| | chronic hepatitis b (0.140) | | encephalitis (0.143) | | mastitis (0.070) |
| | pneumococcal infection (0.121) | | valvular endocarditis (0.071) | | dermatitis (0.020) |
| | liver abscess (0.050) | | endometritis (0.071) | | septicemia (0.004) |
| | acute hepatitis b (0.049) | | | | warts (0.004) |
| Organisms | uncultured bacterium (0.589) | uncultured bacterium (0.986) | uncultured Neocallimastigales (0.897) | uncultured bacterium (0.957) | uncultured Neocallimastigales (0.280) |
| | Human immunodeficiency virus 1 (0.112) | uncultured Escherichia sp. (0.002) | Equine infectious anemia virus (0.022) | Lactobacillus reuteri (0.005) | uncultured bacterium (0.277) |
| | Hepatitis C virus (0.027) | Seoul virus (0.002) | Burkholderia mallei PRL-2 (0.010) | uncultured Clostridiales bacterium (0.005) | Rabies virus (0.055) |
| | uncultured organism (0.020) | Lactobacillus reuteri (0.001) | Burkholderia mallei GB8 horse 4 (0.007) | Lymphocytic choriomeningitis virus (0.005) | uncultured rumen archaeon (0.036) |
| | Hepatitis B virus (0.018) | uncultured Bacillus sp. (0.001) | Equine arteritis virus (0.005) | Hepatitis C virus (0.004) | uncultured rumen bacterium (0.035) |
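The values in this table read as each term's share of the annotated entries for a host. A minimal sketch of how such top-5 proportions could be computed from a list of free-text values is shown here. The helper name `top_proportions`, the lowercasing and whitespace trimming used as normalization, and the sample values are assumptions for illustration; the source does not state how the reported figures were normalized or calculated.

```python
from collections import Counter


def top_proportions(values, n=5):
    """Return the n most common values with their share of all non-empty values."""
    # Assumed normalization: trim whitespace and lowercase before counting.
    counts = Counter(v.strip().lower() for v in values if v and v.strip())
    total = sum(counts.values())
    return [(value, round(count / total, 3)) for value, count in counts.most_common(n)]


# Invented isolation_source strings for a single host, for illustration only.
sources = ["feces"] * 6 + ["blood"] * 3 + ["Feces ", "milk", ""]
ranked = top_proportions(sources)
print(ranked)
```

Empty strings are skipped before counting, so an input with no usable values yields an empty list rather than a division by zero.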