Michael Feldgarden, Vyacheslav Brover, Boris Fedorov, Daniel H Haft, Arjun B Prasad, William Klimke.
Abstract
Antimicrobial resistance (AMR) is a significant public health threat. Low-cost whole-genome sequencing, which is often used in surveillance programmes, provides an opportunity to assess AMR gene content in these genomes using in silico approaches. A variety of bioinformatic tools have been developed to identify these genomic elements. Most of those tools rely on reference databases of nucleotide or protein sequences and collections of models and rules for analysis. While the tools are critical for the identification of AMR genes, the databases themselves also provide significant utility for researchers, for applications ranging from sequence analysis to information about AMR phenotypes. Additionally, these databases can be evaluated by domain experts and others to ensure their accuracy. Here we describe how we curate the genes, point mutations and BLAST rules, and hidden Markov models used in NCBI's AMRFinderPlus, along with the quality-control steps we take to ensure database quality. We also describe the web interfaces that display the full structure of the database and their newly developed cross-browser relationships. Then, using the Reference Gene Catalog as an example, we detail how the databases, rules and models are made publicly available, as well as how to access the software. In addition, as part of the Pathogen Detection system, we have analysed over 1 million publicly available genomes using AMRFinderPlus and its databases. We discuss how the computed analyses generated by those tools can be accessed through a web interface. Finally, we conclude with NCBI's plans to make these databases accessible over the long term.
Keywords: antimicrobial resistance; curation; genomics
Year: 2022 PMID: 35675101 PMCID: PMC9455714 DOI: 10.1099/mgen.0.000832
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. The Reference Gene Catalog. For acquired genes, each row contains the gene symbol, the allele symbol, GenBank and RefSeq nucleotide and protein accessions, phenotype information, and a PubMed citation. For point mutations, each row contains an allele symbol, which is a concatenation of the point mutation and gene symbol of the reference gene, the gene symbol of the reference gene, GenBank and RefSeq nucleotide or protein accessions of the reference sequence, phenotype information and a PubMed citation.
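The row layout described for acquired genes can be pictured as a simple record type. The following is a minimal sketch only: the class name, field names and the example values are hypothetical and chosen for illustration, and they are not the column names of the actual Reference Gene Catalog files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AcquiredGeneRow:
    """Hypothetical model of one acquired-gene row, following the caption."""
    gene_symbol: str
    allele_symbol: Optional[str]          # not every gene has a named allele
    genbank_nucleotide_accession: Optional[str]
    genbank_protein_accession: Optional[str]
    refseq_nucleotide_accession: Optional[str]
    refseq_protein_accession: Optional[str]
    phenotype: Optional[str]
    pubmed_id: Optional[str]


# Illustrative values only; accessions are deliberately left unset
# rather than guessed.
row = AcquiredGeneRow(
    gene_symbol="blaKPC",
    allele_symbol="blaKPC-2",
    genbank_nucleotide_accession=None,
    genbank_protein_accession=None,
    refseq_nucleotide_accession=None,
    refseq_protein_accession=None,
    phenotype=None,
    pubmed_id=None,
)
```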
Fig. 2. (a) Example of AMRFinderPlus’ hierarchical structure, starting with bla KPC-2 at the top and moving to less specific proteins. (b) A screenshot showing how bla KPC-2 is displayed in the Reference Hierarchy Viewers (https://www.ncbi.nlm.nih.gov/pathogens/genehierarchy/#blaKPC-2). Note that the uppermost ‘bla’ row is an organizational node and lacks an HMM, so it is not represented in (a).
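The viewer URL in the caption places the node identifier in the fragment after `#`. Assuming other nodes follow the same pattern (an assumption based only on this one example), a link could be built as in the sketch below; the function name `hierarchy_viewer_url` is hypothetical.

```python
from urllib.parse import quote

# Base address taken from the URL given in the Fig. 2 caption.
HIERARCHY_VIEWER_BASE = "https://www.ncbi.nlm.nih.gov/pathogens/genehierarchy/"


def hierarchy_viewer_url(node_id: str) -> str:
    """Build a viewer link for a hierarchy node (hypothetical helper).

    Assumes the node identifier goes in the URL fragment, as in the
    blaKPC-2 example; characters unsafe in a URL are percent-encoded.
    """
    return f"{HIERARCHY_VIEWER_BASE}#{quote(node_id, safe='')}"


print(hierarchy_viewer_url("blaKPC-2"))
```

For `blaKPC-2` this reproduces the address given in the caption.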
Fig. 3. How curators update and release AMR database products. Circles are data sources, squares are processes, and diamonds are tests. Orange is for curator-supervised steps, light blue is for internal-only steps, light green is for other NCBI resources and databases, dark blue is for public access to complete data files, and dark green is for public access through web interfaces. ‘SAUTE guided assembler’ refers to a set of non-redundant nucleotide sequences derived from the RefSeq nucleotide sequences of acquired AMR genes and some virulence genes deemed of critical importance. These sequences are used by the SAUTE guided assembler in the Pathogen Detection assembly process to ensure assembly of these critical genes.
Fig. 4. Interactions among database viewers and existing NCBI resources. Arrows represent links resulting from selection of individual field values or cross-browser selection.
Field-specific links within browsers. ‘Browser’ names the browser. ‘Field in Browser’ names the column that carries the hyperlink. ‘Function’ describes what is displayed upon selecting the hyperlink.
| Browser | Field in Browser | Function |
|---|---|---|
| | Assembly | Links to Assembly record for that page |
| | BioSample | Links to BioSample page for that BioSample |
| | BioProject | Links to BioProject record for that BioProject |
| | Assembly | Links to Assembly record for that assembly |
| | BioSample | Links to BioSample record for that BioSample |
| | BioProject | Links to BioProject record for that BioProject |
| | Closest Reference Accession | Links to Protein record of closest reference protein |
| | Contig | Links to Nucleotide record of contig containing the element |
| | HMM Accession | Links to HMM record in Protein Family Model database |
| | Protein | Links to Protein record |
| | PubMed ID | Links to related PubMed record(s) for that genetic element |
| | Start/Stop | Links to Nucleotide record of contig containing the element but displays only the element itself |
| | Allele | Links to all isolates in the Isolates Browser containing that allele |
| | Gene Family | Links to all isolates in the Isolates Browser containing that gene family |
| | GenBank Nucleotide Accession | Links to the GenBank Nucleotide Record for that element |
| | GenBank Protein Accession | Links to the GenBank Protein Record for that element |
| | Hierarchy Node ID | Displays that node in the Reference Gene Hierarchy |
| | PubMed ID | Links to related PubMed record(s) for that genetic element |
| | RefSeq Nucleotide Accession | Links to the RefSeq Nucleotide Record for that element |
| | RefSeq Protein Accession | Links to the RefSeq Protein Record for that element |
| | HMM Accession | Links to HMM record in Reference HMM Catalog |
| | Protein | Links to Protein record |
| | Accession | Links to HMM record in Protein Family Model database |
| | MicroBIGG-E | Displays all genetic elements identified by the HMM in MicroBIGG-E |