Lynn M Schriml, Cesar Arze, Suvarna Nadendla, Anu Ganapathy, Victor Felix, Anup Mahurkar, Katherine Phillippy, Aaron Gussman, Sam Angiuoli, Elodie Ghedin, Owen White, Neil Hall.
Abstract
The Gemina system (http://gemina.igs.umaryland.edu) identifies, standardizes and integrates the outbreak metadata for the breadth of NIAID category A-C viral and bacterial pathogens, thereby providing an investigative and surveillance tool describing the Who [Host], What [Disease, Symptom], When [Date], Where [Location] and How [Pathogen, Environmental Source, Reservoir, Transmission Method] for each pathogen. The Gemina database will provide a greater understanding of the interactions of viral and bacterial pathogens with their hosts and infectious diseases through in-depth literature text-mining, integrated outbreak metadata, outbreak surveillance tools, extensive ontology development, metadata curation and representative genomic sequence identification and standards development. The Gemina web interface supports selection and retrieval of a pathogen's metadata: its Infection Systems (Pathogen, Host, Disease, Transmission Method and Anatomy) and Incidents (Location and Date), along with the host's Age and Gender. The Gemina system provides an integrated investigative and geospatial surveillance system connecting pathogens, pathogen products and disease, anchored on the taxonomic ID of the pathogen and host, to identify the breadth of hosts and diseases known for these pathogens, to identify the extent of outbreak locations, and to identify unique genomic regions with the DNA Signature Insignia Detection Tool.
Year: 2009 PMID: 19850722 PMCID: PMC2808878 DOI: 10.1093/nar/gkp832
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
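The abstract describes Infection Systems (Pathogen, Host, Disease, Transmission Method, Anatomy) and Incidents (Location, Date, host Age and Gender), anchored on the taxonomic IDs of pathogen and host. The sketch below is one hypothetical way to model those records; the class names, field names and helper functions are illustrative assumptions, not Gemina's actual schema. It shows how keying on the pathogen's taxonomy ID supports the two questions the abstract names: the breadth of hosts and the extent of outbreak locations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record types modeled on the abstract's description; the field
# names are illustrative assumptions, not Gemina's actual schema.
@dataclass(frozen=True)
class InfectionSystem:
    pathogen_taxon_id: int   # NCBI taxonomy ID of the pathogen
    host_taxon_id: int       # NCBI taxonomy ID of the host
    disease: str
    transmission_method: str
    anatomy: str

@dataclass(frozen=True)
class Incident:
    system: InfectionSystem
    location: str
    when: date
    host_age: Optional[str] = None
    host_gender: Optional[str] = None

def hosts_for_pathogen(systems, pathogen_taxon_id):
    """Breadth of hosts: distinct host taxon IDs linked to one pathogen."""
    return {s.host_taxon_id for s in systems if s.pathogen_taxon_id == pathogen_taxon_id}

def locations_for_pathogen(incidents, pathogen_taxon_id):
    """Extent of outbreak locations: distinct incident locations for one pathogen."""
    return {i.location for i in incidents if i.system.pathogen_taxon_id == pathogen_taxon_id}

# Placeholder records invented for illustration. Taxonomy IDs: 1392 = Bacillus
# anthracis, 9606 = Homo sapiens, 9913 = Bos taurus.
human = InfectionSystem(1392, 9606, "cutaneous anthrax", "contact", "skin")
cattle = InfectionSystem(1392, 9913, "anthrax", "ingestion", "gastrointestinal tract")
incidents = [
    Incident(human, "Location A", date(2001, 10, 1), host_gender="female"),
    Incident(cattle, "Location B", date(2005, 6, 15)),
]
```

Because each incident points at an infection system rather than copying its fields, a host or disease added to one system is immediately visible to every incident that references it.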
Accession and keyword criteria
| Genome category | Determined by | Keywords or accession patterns |
|---|---|---|
| Whole Genome Project (WGP) | Accession prefix | GenBank: AE, CP, CY; EMBL: AL, BX, CR, CT, CU; DDBJ: AP; RefSeq: AC, NC, NT, NW; WGS: AAAA–AZZZ, BAAA–BZZZ, CAAA–CZZZ, NZ; HTGS: AK, AC, DP |
| Complete Genomes (cg) | NCBI's genome database | Accessions identified by a ‘complete’ [properties] search; accessions from NCBI's /genomes/IDs files; keyword ‘complete genome’ or ‘complete chromosome’ in the definition line |
| Complete Genomes (cg) | NCBI's closed genomes | GenBank: AE, CP, CY; EMBL: AL, BX, CR, CT, CU; DDBJ: AP; RefSeq: AC, NC |
Genome category, keywords and accession patterns for selection of WGS and Complete Genome sequences for Gemina.
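The selection criteria in the table amount to prefix matching on accession numbers plus a keyword test on the definition line. The following is a minimal sketch, assuming that each listed pattern is an exact leading prefix, that RefSeq accessions carry an underscore after their two-letter prefix (e.g. `NC_`), and that WGS accessions begin with four letters in the AAAA–CZZZ range followed by digits. The function names `classify_accession` and `is_complete_by_definition` are hypothetical, and the property search and /genomes/IDs file lookups are not modeled.

```python
import re

# Prefix sets transcribed from the table; exact two-letter matching is an
# assumption of this sketch.
INSDC_CLOSED = {"AE", "CP", "CY", "AL", "BX", "CR", "CT", "CU", "AP"}  # GenBank, EMBL, DDBJ
REFSEQ_CLOSED = {"AC", "NC"}
REFSEQ_WGP = REFSEQ_CLOSED | {"NT", "NW", "NZ"}
HTGS = {"AK", "AC", "DP"}

REFSEQ_RE = re.compile(r"^([A-Z]{2})_")   # e.g. NC_003997, NZ_AAAA01000001
WGS_RE = re.compile(r"^[A-C][A-Z]{3}\d")  # four-letter WGS prefixes AAAA-CZZZ
INSDC_RE = re.compile(r"^([A-Z]{2})\d")   # two letters then digits, e.g. AE016879

def classify_accession(accession: str) -> set:
    """Return the genome categories ('WGP', 'closed') an accession qualifies for."""
    acc = accession.strip().upper()
    m = REFSEQ_RE.match(acc)
    if m:  # RefSeq: the underscore distinguishes AC_ from the HTGS prefix AC
        prefix = m.group(1)
        labels = set()
        if prefix in REFSEQ_WGP:
            labels.add("WGP")
        if prefix in REFSEQ_CLOSED:
            labels.add("closed")
        return labels
    if WGS_RE.match(acc):
        return {"WGP"}
    m = INSDC_RE.match(acc)
    if m:
        prefix = m.group(1)
        if prefix in INSDC_CLOSED:
            return {"WGP", "closed"}
        if prefix in HTGS:
            return {"WGP"}
    return set()

def is_complete_by_definition(definition_line: str) -> bool:
    """Keyword criterion: 'complete genome' or 'complete chromosome' in the definition line."""
    text = definition_line.lower()
    return "complete genome" in text or "complete chromosome" in text
```

The RefSeq check runs first because `AC` appears in the table both as a RefSeq prefix and as an HTGS prefix; only the underscore tells them apart.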
Figure 1. Gemina Database Query Web Interface. The Gemina web interface lets users select one or more vocabulary terms to build a query against the Gemina database. This Bacillus anthracis example demonstrates terms selected from two of Gemina's vocabularies. Synonyms, common names, taxonomy IDs and multiple terms (such as Anthrax 2012) may be entered in the search box. A search of the Diseases vocabulary for the term Anthrax initiates a broader search that retrieves data associated with Anthrax and with its child terms, which are types of Anthrax.
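The broader Anthrax search described in the caption, where a term retrieves data for itself and for its child terms, can be sketched as a synonym lookup followed by a breadth-first walk over a parent-to-children map. The vocabulary fragment, the synonym table and the function names below are hypothetical toy data for illustration; Gemina's actual ontology and search implementation are not shown in this record.

```python
from collections import deque

# Hypothetical toy fragment of a disease vocabulary (parent -> children).
CHILDREN = {
    "anthrax": ["cutaneous anthrax", "inhalational anthrax", "gastrointestinal anthrax"],
}
# Hypothetical synonym table mapping an alternative name to its preferred term.
SYNONYMS = {"woolsorters disease": "inhalational anthrax"}

def expand_term(query: str) -> list:
    """Resolve a synonym, then return the term plus all of its descendants (breadth-first)."""
    root = SYNONYMS.get(query.lower(), query.lower())
    result, seen, queue = [root], {root}, deque([root])
    while queue:
        term = queue.popleft()
        for child in CHILDREN.get(term, []):
            if child not in seen:  # guard against terms reachable by two paths
                seen.add(child)
                result.append(child)
                queue.append(child)
    return result

def search(records, query):
    """Keep records whose disease is the query term or one of its descendants."""
    terms = set(expand_term(query))
    return [r for r in records if r["disease"] in terms]
```

Querying a leaf term such as a synonym for inhalational anthrax returns only that term, while querying the parent returns the whole subtree, which matches the "broader search" behavior the caption describes.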
Figure 2. Gemina database geospatial surveillance resource. The Gemina Incident Search Report page displays geographic locations in a GIS environment, as in this example showing all incident locations for Clostridium botulinum. Links to related data at GenBank and NCBI's Taxonomy database are provided under ‘Places’ and as pop-ups for each incident.
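To illustrate how incident records can feed a map view with per-incident pop-up links, the sketch below exports them as a GeoJSON FeatureCollection. GeoJSON is a swapped-in open format chosen for the example, not necessarily what Gemina uses; the record fields, coordinates and dates are invented placeholders, and the link follows the NCBI Taxonomy Browser URL pattern. The function name `incidents_to_geojson` is hypothetical.

```python
import json

# NCBI Taxonomy Browser link pattern, filled in with a taxonomy ID.
TAXONOMY_URL = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={}"

def incidents_to_geojson(incidents):
    """Convert incident dicts into a GeoJSON FeatureCollection of points."""
    features = []
    for inc in incidents:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude] (RFC 7946).
            "geometry": {"type": "Point", "coordinates": [inc["lon"], inc["lat"]]},
            "properties": {
                "pathogen": inc["pathogen"],
                "place": inc["place"],
                "date": inc["date"],
                "taxonomy_link": TAXONOMY_URL.format(inc["taxon_id"]),
            },
        })
    return {"type": "FeatureCollection", "features": features}

# Placeholder incidents with invented places, dates and coordinates;
# 1491 is the NCBI taxonomy ID for Clostridium botulinum.
example = [
    {"pathogen": "Clostridium botulinum", "taxon_id": 1491, "place": "Place A",
     "date": "2006-09-01", "lat": 40.0, "lon": -75.0},
    {"pathogen": "Clostridium botulinum", "taxon_id": 1491, "place": "Place B",
     "date": "2007-07-15", "lat": 35.5, "lon": -80.25},
]

if __name__ == "__main__":
    print(json.dumps(incidents_to_geojson(example), indent=2))
```

The most common mistake with this format is swapping latitude and longitude, so the conversion keeps them as named fields until the single point where they are placed in GeoJSON order.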
Figure 3. Incidents query results demonstration. Gemina's Incident Results report the extent of outbreak locations for Bacillus anthracis str. A2012 (Florida strain) in 2001, illustrating Gemina's incident metadata and its display of geographic location data in Google Maps.
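An incident query like the one in Figure 3, restricted to one strain and one year, can be sketched as a filter followed by a summary of the outbreak's extent: the distinct locations and a bounding box around them. The function `outbreak_extent`, the record layout and every site name and coordinate below are hypothetical placeholders, not Gemina data; only the strain name comes from the caption.

```python
from datetime import date

def outbreak_extent(incidents, pathogen, year):
    """Summarize incidents for one pathogen in one year: count, places, bounding box."""
    hits = [i for i in incidents if i["pathogen"] == pathogen and i["date"].year == year]
    if not hits:
        return {"count": 0, "locations": [], "bbox": None}
    lats = [i["lat"] for i in hits]
    lons = [i["lon"] for i in hits]
    return {
        "count": len(hits),
        "locations": sorted({i["location"] for i in hits}),
        # (min_lon, min_lat, max_lon, max_lat); ignores the antimeridian.
        "bbox": (min(lons), min(lats), max(lons), max(lats)),
    }

# Placeholder incidents with invented sites and coordinates.
STRAIN = "Bacillus anthracis str. A2012"
incidents = [
    {"pathogen": STRAIN, "location": "Site 1", "date": date(2001, 10, 4), "lat": 10.0, "lon": 20.0},
    {"pathogen": STRAIN, "location": "Site 2", "date": date(2001, 10, 12), "lat": 12.0, "lon": 25.0},
    {"pathogen": STRAIN, "location": "Site 1", "date": date(2001, 10, 20), "lat": 10.0, "lon": 20.0},
    {"pathogen": STRAIN, "location": "Site 3", "date": date(2002, 3, 1), "lat": 30.0, "lon": 40.0},
]
```

Repeated incidents at the same site raise the count but not the number of distinct locations, which is why the two are reported separately.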
Figure 4. Gemina Infection results for a multiple-vocabulary query. Infection systems for Bacillus anthracis str. A2012 (Florida strain) were identified from the Gemina database. The query combined the pathogen name ‘Bacillus anthracis str. A2012’ with the set of possible Anthrax diseases from the disease ontology. A less restrictive search for ‘Bacillus anthracis’ identified 19 unique pathogens, 68 infection systems, a wide range of hosts and 215 associated incidents.
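The contrast in Figure 4 between the strain-level query and the less restrictive species-level query can be sketched as the same disease filter combined with either an exact pathogen-name match or a match on the species name plus any strain suffix. Name-prefix matching here is a stand-in assumption for taxonomic descent. The `query` function and the rows (including the strain names `HYP-1` and `HYP-2`) are hypothetical and do not reproduce the counts reported in the caption.

```python
from typing import NamedTuple

class InfectionRow(NamedTuple):
    pathogen: str
    host: str
    disease: str

# Hypothetical rows invented for illustration.
SYSTEMS = [
    InfectionRow("Bacillus anthracis str. A2012", "Homo sapiens", "inhalational anthrax"),
    InfectionRow("Bacillus anthracis str. A2012", "Homo sapiens", "cutaneous anthrax"),
    InfectionRow("Bacillus anthracis str. HYP-1", "Homo sapiens", "inhalational anthrax"),
    InfectionRow("Bacillus anthracis str. HYP-2", "Bos taurus", "anthrax"),
    InfectionRow("Bacillus cereus", "Homo sapiens", "food poisoning"),
]
ANTHRAX_TERMS = {"anthrax", "cutaneous anthrax", "inhalational anthrax", "gastrointestinal anthrax"}

def query(systems, pathogen, diseases, exact=True):
    """Count pathogens, infection systems and hosts matching a pathogen and disease set."""
    def pathogen_matches(name):
        if exact:
            return name == pathogen
        # Less restrictive: also accept any strain listed under the queried name.
        return name == pathogen or name.startswith(pathogen + " ")
    hits = [s for s in systems if pathogen_matches(s.pathogen) and s.disease in diseases]
    return {
        "pathogens": len({s.pathogen for s in hits}),
        "infection_systems": len(hits),
        "hosts": sorted({s.host for s in hits}),
    }
```

Matching on the queried name followed by a space keeps "Bacillus anthracis" from also matching an unrelated name that merely starts with the same letters.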