PATRIC as a unique resource for studying antimicrobial resistance.

Dionysios A Antonopoulos, Rida Assaf, Ramy Karam Aziz, Thomas Brettin, Christopher Bun, Neal Conrad, James J Davis, Emily M Dietrich, Terry Disz, Svetlana Gerdes, Ronald W Kenyon, Dustin Machi, Chunhong Mao, Daniel E Murphy-Olson, Eric K Nordberg, Gary J Olsen, Robert Olson, Ross Overbeek, Bruce Parrello, Gordon D Pusch, John Santerre, Maulik Shukla, Rick L Stevens, Margo VanOeffelen, Veronika Vonstein, Andrew S Warren, Alice R Wattam, Fangfang Xia, Hyunseung Yoo.

Abstract

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.

Keywords: RAST; antibiotic; antimicrobial resistance (AMR); genome annotation; minimum inhibitory concentration; the SEED

Year: 2019 PMID: 28968762 PMCID: PMC6781570 DOI: 10.1093/bib/bbx083

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Background

The Pathosystems Resource Integration Center (PATRIC) is one of four bioinformatics resource centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) [1]. The BRC program supports research by providing access to data associated with the NIAID Category A–C pathogenic genera [2], with PATRIC serving as the bacterial database. To provide a rich comparative analysis environment, PATRIC provides access to all publicly available genomes and associated metadata for bacterial and archaeal isolates, which includes >104 000 genomes as of June 2017. All of the genomes in PATRIC have been consistently annotated using the Rapid Annotation using Subsystems Technology toolkit (RASTtk) [3, 4]. This annotation consistency and the resulting protein families [5] serve as the backbone for many of the comparative analysis tools on the Web site [1]. The PATRIC database retains the annotations and identifiers from both GenBank [6, 7] and RefSeq [8] to facilitate side-by-side comparisons across the public data, allowing researchers to quickly find genomes and genes with information that they have gathered from different resources. PATRIC also provides researchers with a private workspace, where they can access bioinformatics services including genome assembly, annotation, RNA-Seq analysis, variant calling, Tn-Seq analysis, similar genome finding, proteome comparison and metabolic model reconstruction. When users annotate a private genome with the PATRIC annotation service, they can compare it with the public collection. This ‘virtual integration’ provides an analysis experience that is not available at a similar scale at any other data repository. Facilitating research on antimicrobial resistance (AMR) has become increasingly important with the recent escalation in resistance and the loss of effectiveness of first-line drugs [9-13]. 
This resistance has a human cost: ∼2 million people are sickened and 23 000 die annually in the United States alone [14]. Here, we describe a set of enhancements introduced to support research on AMR.

AMR strategy

The current strategy for integrating AMR data into PATRIC breaks down roughly into two parts: (1) data collection to support analyses of whole genomes and (2) data collection to support analyses of individual proteins (Figure 1). In both cases, the data are drawn from the literature as well as a number of public resources. Specifics on the data integration, curation and tools are described below.

Figure 1

PATRIC annotation process for integrating AMR data in both genomic regions and genes.

AMR—integrating data at the genome level

Data collection

To support an environment for comparative analysis, we integrate metadata associated with the public genomes at GenBank [7] into the PATRIC database. This makes it easy to build sets of genomes based on collection date, geographic location, host, isolation source, etc. These metadata fields are incorporated both from BioSample [15] and directly from the GenBank file when an assembled genome is added to PATRIC. In some cases, metadata are acquired firsthand from the NIAID-funded genome sequencing centers and from collaborators wishing to make their data public. Given the increasing emphasis on research to combat AMR and the decreasing costs of sequencing, we have been able to collect a large number of genomes with AMR panel data in the form of minimum inhibitory concentrations (MICs) or susceptible, intermediate and resistant (SIR) calls [16]. These panel data provide critical context for AMR research by allowing researchers to quickly build data sets for protein and gene comparisons, novel gene discovery, whole-genome variation analyses and machine learning (ML) experiments (described below). To increase the number of genomes with AMR metadata in PATRIC and expand our ability to support AMR-based comparative analyses, we began searching the literature for studies that included sequenced bacterial genomes and AMR panel data. Often, the panel data from these studies were not recorded in the public archives, making PATRIC the only place where the assembled genomes and their metadata are available together. If a genome was assembled and deposited in GenBank [7], we attach the AMR metadata directly to the corresponding genome in PATRIC. If the reads for a genome were deposited in the Sequence Read Archive (SRA) or the European Nucleotide Archive (ENA) [17, 18], we assemble and annotate the genome using PATRIC services [1, 4, 19]. We then incorporate the genome into the database along with the metadata (Supplementary Document S1). 
Because laboratory methods for determining MIC values vary, incorporating these data into PATRIC requires a significant manual curation effort. When the information is available from the study, we record how the MIC data were generated, including the laboratory method, the units of measurement and the platform used to make the measurements. When a phenotype is asserted in the form of a SIR call, we record the laboratory standard used, from either the European Committee on Antimicrobial Susceptibility Testing (EUCAST) [20] or the Clinical and Laboratory Standards Institute (CLSI) [21], and the year of the standard. To date, we have attached AMR metadata to ∼9165 existing PATRIC genomes and have assembled and annotated ∼6122 genomes from SRA and ENA (Supplementary Table S1). All AMR metadata in PATRIC are phenotypes derived from laboratory analyses. Although studies often assert the susceptibility or resistance of an organism based on the presence or absence of key AMR genes, we do not currently incorporate phenotypes that are based only on genotypic data. The complete collection of AMR data in PATRIC can be downloaded from the PATRIC FTP site: ftp.patricbrc.org/patric2/current_release/RELEASE_NOTES/PATRIC_genomes_AMR.txt.
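Recording the standard and its year matters because the same MIC can be read differently depending on how breakpoints are expressed. The sketch below is hypothetical and is not PATRIC code. It assumes the common convention that EUCAST tables state susceptibility as MIC ≤ S breakpoint and resistance as MIC > R breakpoint, while CLSI tables state resistance as MIC ≥ R breakpoint. The function names, the `sign` field for censored values such as '>32', and the breakpoints used in the example are illustrative assumptions, not values taken from either standard.

```python
from typing import Optional

STANDARDS = ("EUCAST", "CLSI")


def in_resistant_range(mic: float, r_bp: float, standard: str) -> bool:
    """EUCAST-style tables read 'R > breakpoint'; CLSI-style tables read 'R >= breakpoint'."""
    if standard == "EUCAST":
        return mic > r_bp
    if standard == "CLSI":
        return mic >= r_bp
    raise ValueError(f"unknown standard: {standard}")


def interpret_mic(sign: str, value: float, s_bp: float, r_bp: float,
                  standard: str) -> Optional[str]:
    """Return 'S', 'I' or 'R', or None when a censored MIC cannot be called.

    `sign` qualifies the reported value: '' or '=' for an exact MIC,
    '<' / '<=' for an upper bound and '>' / '>=' for a lower bound.
    Both conventions read susceptible as MIC <= s_bp.
    """
    if standard not in STANDARDS:
        raise ValueError(f"unknown standard: {standard}")
    if sign in ("", "="):
        if value <= s_bp:
            return "S"
        if in_resistant_range(value, r_bp, standard):
            return "R"
        return "I"
    if sign in ("<", "<="):
        # True MIC is at most `value`: only certain to be S if that bound is.
        return "S" if value <= s_bp else None
    if sign == ">":
        # True MIC is strictly above `value`; value >= r_bp satisfies both conventions.
        return "R" if value >= r_bp else None
    if sign == ">=":
        # True MIC could equal `value`, so the bound itself must already be R.
        return "R" if in_resistant_range(value, r_bp, standard) else None
    raise ValueError(f"unknown sign: {sign}")


# Made-up breakpoints (S <= 1 mg/L, R boundary 4 mg/L) used only to show the conventions.
print(interpret_mic("=", 4.0, 1.0, 4.0, "EUCAST"))  # I
print(interpret_mic("=", 4.0, 1.0, 4.0, "CLSI"))    # R
print(interpret_mic("<=", 8.0, 1.0, 4.0, "CLSI"))   # None
```

With these made-up breakpoints, an exact MIC of 4 reads as intermediate under the EUCAST-style convention and resistant under the CLSI-style one, and a censored value such as '<=8' cannot be called at all. This illustrates why an SIR call is only interpretable alongside its standard and year.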

ML classifiers

As the PATRIC database was rapidly accumulating AMR panel data associated with sequenced genomes, a small number of studies were being published that explored using ML algorithms to study AMR [22-24]. With a sufficient number of genomes and AMR panel data, ML algorithms can be used to predict AMR phenotypes and the genomic regions associated with AMR without a priori knowledge of the underlying mechanisms. This is an appealing area of exploration for PATRIC because it allows us to leverage our growing metadata collection to predict AMR phenotypes within the annotation service and to identify AMR-associated genomic regions with single-nucleotide polymorphism (SNP)-level resolution, a feature that can be used to inform our ongoing manual protein annotation efforts. In early 2016, we published a study describing the collection of AMR metadata for genomes and an ML approach that used the AdaBoost algorithm [25, 26] to build classifiers for predicting AMR [16]. At the time, we had sufficient data to make predictions in the species Acinetobacter baumannii, Mycobacterium tuberculosis, Staphylococcus aureus and Streptococcus pneumoniae for nine antibiotics [16] (Table 1). Shortly thereafter, we collaborated with scientists at the Houston Methodist Research Hospital to build classifiers for Klebsiella pneumoniae covering 13 antibiotics using 1777 genomes collected in their hospital system between 2011 and 2015 [27]. Using the same protocol as described by Davis et al. [16] and Long et al. [27], we added 18 additional classifiers to the annotation system that have not been previously reported, including classifiers for M. tuberculosis, Peptoclostridium difficile, Pseudomonas aeruginosa, S. aureus and S. pneumoniae (Table 1). Receiver operating characteristic (ROC) curves for the newly added classifiers are shown in Figure 2.
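As a rough illustration of how a k-mer-based AdaBoost classifier can both predict a phenotype and point to the sequence behind that prediction, the following toy sketch trains discrete AdaBoost with one-k-mer decision stumps on k-mer presence/absence. It is not the implementation used by Davis et al. [16] or in the annotation service: the stump weak learner, the brute-force search, the tiny k and the toy sequences are assumptions chosen for readability, and real genomes would require a much larger k and an optimized library.

```python
import math
from typing import List, Set, Tuple

Stump = Tuple[str, int, float]  # (k-mer, polarity, vote weight alpha)


def kmer_set(seq: str, k: int) -> Set[str]:
    """Distinct k-mers (length-k substrings) present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stump_vote(kmer: str, polarity: int, features: Set[str]) -> int:
    """A stump votes `polarity` when its k-mer is present and `-polarity` otherwise."""
    return polarity if kmer in features else -polarity


def train_adaboost(genomes: List[str], labels: List[int], k: int,
                   rounds: int) -> List[Stump]:
    """Discrete AdaBoost over k-mer presence; labels are +1 (resistant) or -1 (susceptible)."""
    feats = [kmer_set(g, k) for g in genomes]
    vocab = sorted(set().union(*feats))
    weights = [1.0 / len(genomes)] * len(genomes)
    model: List[Stump] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error (the first one found wins ties).
        best_err, best_kmer, best_pol = float("inf"), "", 1
        for kmer in vocab:
            for pol in (1, -1):
                err = sum(w for w, f, y in zip(weights, feats, labels)
                          if stump_vote(kmer, pol, f) != y)
                if err < best_err:
                    best_err, best_kmer, best_pol = err, kmer, pol
        err = min(max(best_err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((best_kmer, best_pol, alpha))
        # Up-weight misclassified genomes, down-weight the rest, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_vote(best_kmer, best_pol, f))
                   for w, f, y in zip(weights, feats, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model: List[Stump], genome: str, k: int) -> int:
    """Sign of the alpha-weighted stump votes: +1 resistant, -1 susceptible."""
    feats = kmer_set(genome, k)
    score = sum(alpha * stump_vote(kmer, pol, feats) for kmer, pol, alpha in model)
    return 1 if score >= 0 else -1


# Toy data: the two 'resistant' sequences share the motif GATTACA.
genomes = ["AAAAGATTACAAAA", "CCCGATTACACCC", "AAAACCCCAAAA", "CCCCTTTTCCCC"]
labels = [1, 1, -1, -1]
model = train_adaboost(genomes, labels, k=4, rounds=3)
print(model[0][:2], predict(model, "TTTTGATTACATTTT", 4))
```

Each selected stump names a single k-mer, which is why the top-weighted features of such a model can be mapped back to genomic regions and checked against known AMR genes.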

Table 1

AMR classifiers in the PATRIC annotation system

Species	Antibiotic^a	Resistant genomes^b	Susceptible genomes^b	F1 score	Initially described in
Acinetobacter baumannii	Carbapenem	122	110	0.95	[16]
Klebsiella pneumoniae	Amikacin	1190	364	0.92	[27]
Klebsiella pneumoniae	Aztreonam	1377	100	0.75	[27]
Klebsiella pneumoniae	Cefoxitin	555	976	0.80	[27]
Klebsiella pneumoniae	Ciprofloxacin	119	1435	0.91	[27]
Klebsiella pneumoniae	Ertapenem	265	178	0.96	[27]
Klebsiella pneumoniae	Gentamicin	786	768	0.86	[27]
Klebsiella pneumoniae	Imipenem	1100	453	0.94	[27]
Klebsiella pneumoniae	Levofloxacin	246	1307	0.93	[27]
Klebsiella pneumoniae	Meropenem	1123	430	0.92	[27]
Klebsiella pneumoniae	Piperacillin–tazobactam	322	1230	0.76	[27]
Klebsiella pneumoniae	Tetracycline	658	896	0.79	[27]
Klebsiella pneumoniae	Tobramycin	501	1053	0.94	[27]
Klebsiella pneumoniae	Co-trimoxazole	331	1223	0.87	[27]
Mycobacterium tuberculosis	Amikacin	210	350	0.91	This study
Mycobacterium tuberculosis	Capreomycin	204	350	0.83	This study
Mycobacterium tuberculosis	Isoniazid	250	250	0.88	[16]
Mycobacterium tuberculosis	Kanamycin	188	250	0.87	[16]
Mycobacterium tuberculosis	Ofloxacin	239	250	0.79	[16]
Mycobacterium tuberculosis	Rifampicin	250	250	0.86	[16]
Mycobacterium tuberculosis	Streptomycin	250	250	0.71	[16]
Peptoclostridium difficile	Azithromycin	213	246	0.97	This study
Peptoclostridium difficile	Ceftriaxone	228	86	0.86	This study
Peptoclostridium difficile	Clarithromycin	213	246	0.99	This study
Peptoclostridium difficile	Clindamycin	310	89	0.74	This study
Peptoclostridium difficile	Moxifloxacin	188	271	0.97	This study
Pseudomonas aeruginosa	Levofloxacin	192	290	0.85	This study
Staphylococcus aureus	Ciprofloxacin	467	762	0.98	This study
Staphylococcus aureus	Clindamycin	350	274	0.97	This study
Staphylococcus aureus	Erythromycin	484	821	0.96	This study
Staphylococcus aureus	Gentamicin	162	1144	0.98	This study
Staphylococcus aureus	Methicillin	707	886	0.99	[16]
Staphylococcus aureus	Penicillin	886	156	0.96	This study
Staphylococcus aureus	Tetracycline	203	1029	0.97	This study
Staphylococcus aureus	Co-trimoxazole	142	178	0.96	This study
Streptococcus pneumoniae	Beta-lactam	2124	584	0.90	[16]
Streptococcus pneumoniae	Chloramphenicol	165	289	0.94	This study
Streptococcus pneumoniae	Co-trimoxazole	2124	584	0.88	[16]
Streptococcus pneumoniae	Erythromycin	381	324	0.96	This study
Streptococcus pneumoniae	Tetracycline	368	290	0.96	This study

^a AMR data in PATRIC may be described as individual antibiotics or classes of antibiotics.

^b Used for building the classifiers.

Figure 2

ROC curves for AdaBoost-based AMR classifiers installed in the annotation service since the publication of the Davis et al. [16] and Long et al. papers [27]. Accuracy and F1 scores are displayed in each inset. ROC curves depict classifiers for (A) P. difficile, (B) S. aureus and (C) K. pneumoniae (Kpn), M. tuberculosis (Mtb), P. aeruginosa (Pae) and S. pneumoniae (Spn). Antibiotic abbreviations are: AZM, azithromycin; CC, clindamycin; CIP, ciprofloxacin; CLR, clarithromycin; CRO, ceftriaxone; E, erythromycin; GM, gentamicin; MFX, moxifloxacin; OX, ofloxacin; P, penicillin; SXT, trimethoprim sulfamethoxazole; TE, tetracycline.
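The accuracy and F1 scores reported in Table 1 and Figure 2, and the ROC curves themselves, follow standard definitions: F1 is the harmonic mean of precision and recall, 2PR/(P + R), and an ROC curve traces the true-positive rate against the false-positive rate as the decision threshold is lowered. A minimal sketch, assuming resistant (1) is the positive class and that a classifier emits a real-valued score per genome; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of genomes whose predicted phenotype matches the laboratory phenotype."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (resistant) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_points(y_true: Sequence[int], scores: Sequence[float],
               positive: int = 1) -> List[Tuple[float, float]]:
    """(false-positive rate, true-positive rate) pairs as the threshold sweeps high to low."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all genomes tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))                # 0.5
print(auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.7, 0.1])))  # 1.0
```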

To date, we have maintained a policy of adding classifiers to the annotation system only when their accuracies and F1 scores exceed 70% and their top feature k-mers relate to known AMR genes. The classifiers built in this project and described in Table 1 and Figure 2 are integrated into the annotation service and can be accessed through PATRIC and RAST. Phenotype predictions and the associated genomic regions are available for browsing on both Web sites and are described in tutorials at http://tutorial.theseed.org/. Our AMR metadata collection and classifier building efforts are ongoing at PATRIC. In many cases, published studies report AMR metadata for pan-resistant strains, which can be difficult to classify. To improve the accuracy of the classifiers, we are actively seeking strains with AMR metadata that increase the biological diversity of the collection, including strains that are susceptible to many antibiotics. We are also comparing the results from several ML methods and are in the process of adding classifiers based on these other methods when they outperform AdaBoost [25]. 
In this manner, an antibiotic and species would be paired with the best ML algorithm in the annotation system.
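The inclusion policy and the plan to pair each species and antibiotic with its best method can be expressed as a small filter-then-select step. This is a hypothetical sketch: the record fields, the method name 'MethodB', the placeholder species and drug names and all scores are invented; the strict '> 0.70' test mirrors the stated 70% threshold; ranking by F1 and then accuracy is an assumption, since the text does not say which metric decides; and the check that top k-mers relate to known AMR genes is represented only as a flag set by manual review.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ClassifierResult:
    species: str
    antibiotic: str
    method: str
    accuracy: float  # fraction, e.g. 0.91
    f1: float
    top_kmers_in_known_amr_genes: bool  # outcome of a manual review, not computed here


def passes_policy(r: ClassifierResult, threshold: float = 0.70) -> bool:
    """Accuracy and F1 must both exceed the threshold, and top k-mers must make biological sense."""
    return r.accuracy > threshold and r.f1 > threshold and r.top_kmers_in_known_amr_genes


def best_per_pair(results: Iterable[ClassifierResult]) -> Dict[Tuple[str, str], ClassifierResult]:
    """Keep, for each (species, antibiotic), the best classifier that passes the policy."""
    best: Dict[Tuple[str, str], ClassifierResult] = {}
    for r in results:
        if not passes_policy(r):
            continue
        key = (r.species, r.antibiotic)
        current = best.get(key)
        # Assumed ranking: higher F1 first, accuracy as the tie-breaker.
        if current is None or (r.f1, r.accuracy) > (current.f1, current.accuracy):
            best[key] = r
    return best


results = [
    ClassifierResult("Species A", "Drug 1", "AdaBoost", 0.90, 0.88, True),
    ClassifierResult("Species A", "Drug 1", "MethodB", 0.91, 0.92, True),
    ClassifierResult("Species A", "Drug 2", "AdaBoost", 0.80, 0.65, True),   # F1 too low
    ClassifierResult("Species B", "Drug 1", "AdaBoost", 0.95, 0.95, False),  # failed review
]
for key, r in best_per_pair(results).items():
    print(key, r.method)
```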

AMR—integrating data at the gene level

Starting in 2015, the PATRIC annotation team, which also maintains the SEED [28] and RAST projects [3], began a focused effort to incorporate and manually curate protein functions relating to AMR. Several well-known consortia strive to provide standardized nomenclature for specific groups of antibiotic resistance genes, including tetracycline resistance determinants [29, 30] and the different classes of β-lactamases maintained by the Lahey Clinic [31], the University of Stuttgart [32, 33] and the Institut Pasteur [34]. There are also several well-respected databases that provide collections of AMR genes covering broad categories of AMR mechanisms, including the Comprehensive Antibiotic Resistance Database (CARD) [35], the Bacterial Antimicrobial Resistance Reference Gene Database [36] hosted by the National Center for Biotechnology Information as part of the National Database of Antibiotic Resistant Organisms (NDARO), and ResFinder [37]. These resources maintain reference sequences for each AMR gene type, providing each with a well-curated, informative product name (in the case of NDARO) or a term from a specialized Antibiotic Resistance Ontology (ARO, provided by CARD). These collections enable accurate detection and annotation of specific AMR determinants in pathogen isolates by supporting BLAST-based [38, 39] or hidden Markov model (HMM)-based [40] screening of user-submitted sequences against representative sets of AMR sequences. However, in many cases, these AMR annotations project ambiguously because newly discovered proteins can match representative proteins with differing annotations at nearly equal BLAST similarities. For example, a novel CTX-M, SHV or TEM β-lactamase could present the researcher with over a hundred nearly equal BLAST hits against highly homologous but clinically different reference sequence variants, making it difficult to choose the most appropriate product name. 
In many cases, the best choice would be a novel allele designation rather than one of the existing curated product names. We believed that a manual curation effort was necessary to integrate AMR sequence variants into distinct functional roles (isofunctional protein families, which are integral to the SEED/PATRIC environment) so that they can be unambiguously projected to the genomes in PATRIC by the annotation service. Because many resources focus more heavily on horizontally transferred AMR genes, we began our curation effort by building functional roles for AMR-related porin and efflux pump proteins described in the literature, which are often chromosomally encoded, reasoning that this would rapidly add new value for the scientific community. This naturally led to an effort to incorporate annotations for proteins involved in tetracycline resistance: efflux pumps are known to play an important role in this type of resistance [41], and the community has curated well-described rules for naming these proteins for decades [30, 42]. More recently, we have been annotating class by class, using publicly available resources when possible.
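The ambiguity problem described above can be made concrete with a small check on a list of BLAST hits. In this hypothetical sketch, a query is called unambiguously only if every hit scoring within a chosen fraction of the best bit score carries the same annotation. The 1% tolerance, the `Hit` record, the toy scores and the optional mapping from variant names to a clade-level role are assumptions, not part of any PATRIC or BLAST interface.

```python
from typing import Dict, List, NamedTuple, Optional


class Hit(NamedTuple):
    reference_id: str
    annotation: str
    bit_score: float


def unambiguous_annotation(hits: List[Hit], tolerance: float = 0.01,
                           role_of: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the annotation shared by all near-top hits, or None if they disagree.

    Hits scoring at least (1 - tolerance) * best bit score count as near-top.
    `role_of` optionally maps variant names onto a broader, clade-level role.
    """
    if not hits:
        return None
    role_of = role_of or {}
    top = max(h.bit_score for h in hits)
    near_top = {role_of.get(h.annotation, h.annotation)
                for h in hits if h.bit_score >= top * (1 - tolerance)}
    return near_top.pop() if len(near_top) == 1 else None


hits = [Hit("ref1", "TEM-1", 500.0), Hit("ref2", "TEM-116", 499.0), Hit("ref3", "SHV-1", 300.0)]
print(unambiguous_annotation(hits))  # None: two variant names tie near the top
print(unambiguous_annotation(hits, role_of={"TEM-1": "TEM family", "TEM-116": "TEM family"}))
```

Mapping the near-identical variants onto one clade-level role resolves the tie, which mirrors the idea of combining ambiguous reference names into a single annotation that covers the whole clade.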

Curation process and k-mer projection

Significant manual curation and modification of the existing RAST/RASTtk automatic annotation pipeline were required to accommodate AMR-related functional roles, because their biology differs significantly from that of ‘classic’ functional roles encoding prokaryotic enzymatic and nonenzymatic housekeeping functions. The process of creating projectable AMR annotations starts with the incorporation of reference proteins from the literature and public resources. BLAST searches are used to compare the reference sequences against the SEED database and PATRIC [1]. The matching proteins are used to build alignments and trees, which are manually inspected to determine how specific or general an annotation is and whether it will project cleanly in the annotation system. When reference proteins from the literature create ambiguous BLAST matches or split high-similarity clades in the tree, their nomenclature is retained but combined into a single annotation that covers the entire clade. Training sets of representative AMR sequence variants from outside sources and the SEED database [28] are then built; these form the basis for each AMR-related functional role. An annotation string is assigned to each functional role, taking into account the SEED database internal nomenclature conventions as well as those developed by the AMR research community and accepted by CARD, ResFinder, NCBI and other resources. Signature k-mers (amino acid 8-mers) are built from these functional roles as described previously [4], and the annotations are then projected to all of the genomes in PATRIC. Trees for the newly annotated AMR proteins are then manually inspected to identify clades that contain multiple annotations, which indicate a lack of consistency. Inconsistencies are also identified by comparing the protein families generated before and after the addition of a new function. 
The inconsistent proteins are manually re-annotated, and this process is iterated until the annotations project stably and accurately across the entire database.

The PATRIC manual curation effort offers a variety of additional benefits to the field of AMR research. For example, it is helping to alleviate the well-documented problem of misannotation and overprediction of AMR functions [43, 44]. We are doing this by systematically removing erroneous annotations that implicate non-AMR-related proteins in antibiotic resistance, by annotating these closely related proteins and attaching literature references to them to prevent over-projection of AMR roles, and then by curating their projection over the PATRIC collection as described above. We occasionally discover clades of potential AMR proteins that are surrounded by solid AMR reference sequences yet have not been described in any reference database. In these cases, if the sequence identity is 50% or better over the entire length of the protein, which enables functional projection, we describe the protein as a ‘putative’ AMR protein of a given resistance type. However, if a newly discovered hypothetical clade has a sequence identity <50%, we assign a less specific annotation string to all of its members, such as ‘weak similarity to aminoglycoside N(6')-acetyltransferase’ or ‘weak similarity to aminoglycoside N(3)-acetyltransferase’. Both kinds of clades are obvious targets for characterization in the laboratory. Finally, having clean sets of AMR-related functional roles facilitates SNP and other comparative analyses at PATRIC and elsewhere by providing relevant sequence peer groups for variation research. 
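The signature k-mer projection step described in the curation process can be sketched in simplified form: keep only amino acid 8-mers that occur in the training proteins of exactly one functional role, then assign a query protein the role whose signatures it hits most often. This is an illustrative reduction, not the RASTtk algorithm [4]; the minimum-hit threshold, the rule of refusing to call ties, and the toy role names and sequences are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set

K = 8  # amino acid 8-mers, as stated in the text


def kmers(protein: str, k: int = K) -> Set[str]:
    """Distinct k-mers in a protein sequence."""
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}


def build_signatures(training: Dict[str, List[str]], k: int = K) -> Dict[str, str]:
    """Map each k-mer to its role, keeping only k-mers found in exactly one role."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for role, proteins in training.items():
        for protein in proteins:
            for kmer in kmers(protein, k):
                owners[kmer].add(role)
    return {kmer: next(iter(roles)) for kmer, roles in owners.items() if len(roles) == 1}


def call_role(protein: str, signatures: Dict[str, str], k: int = K,
              min_hits: int = 3) -> Optional[str]:
    """Role with the most signature hits, or None if hits are too few or tied."""
    counts = Counter(signatures[km] for km in kmers(protein, k) if km in signatures)
    ranked = counts.most_common(2)
    if not ranked or ranked[0][1] < min_hits:
        return None
    if len(ranked) == 2 and ranked[1][1] == ranked[0][1]:
        return None  # two roles equally supported: do not project either
    return ranked[0][0]


training = {
    "Role A (toy)": ["MKTAYIAKQRQISFVKSHFSRQ"],
    "Role B (toy)": ["MSTNPKPQRKTKRNTNRRPQDV"],
}
signatures = build_signatures(training)
print(call_role("MKTAYIAKQRQISFVKSHFSRE", signatures))  # one substitution from Role A
```

Refusing to call ties is one simple way to keep a projection from silently choosing between two roles, which is the kind of inconsistency the iterative curation loop is meant to remove.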
As of May 2017, the annotation of AMR determinants conferring resistance to tetracycline, β-lactam, aminoglycoside [45, 46], chloramphenicol [47] and MLSKO (macrolide, lincosamide, streptogramin, ketolide and oxazolidinone) [42, 48, 49] antibiotic classes has been completed. This includes 450 functional roles for these five major antibiotic classes, as well as 36 roles for closely related non-AMR proteins. The collection comprises a combined set of 7370 reference and SEED proteins with AMR roles and 36 424 proteins with related non-AMR roles. It projects consistently to 1 610 744 proteins with AMR roles and 2 518 252 proteins with related non-AMR roles in PATRIC. We have also associated literature references, 411 in total, with the majority of the newly curated AMR functional roles in PATRIC. The curation effort is ongoing and is focusing on proteins conferring resistance to quinolone, vancomycin, fosfomycin, rifampin/rifamycin, nitroimidazole, bleomycin and other antibiotic classes.

Visualization of AMR data at PATRIC

Several new interfaces have been developed on the PATRIC Web site to allow researchers to fully explore the AMR data available in the resource. These interfaces include information that is summarized across all genomes for the available antibiotics, at the taxon level, and for individual genomes and genes. Details on each of these interfaces are described below.

Antibiotic view

Data from PubChem [50] are now integrated for nearly 100 specific antibiotics that can be viewed on landing pages designed especially to display this information. Each individual antibiotic has a landing page with several tabs that provide a general overview, specific information on the AMR phenotype, the genes associated with that phenotype and the regions within the individual genes or genomes that are linked to resistance or susceptibility to that specific drug (Figure 3).
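The overview data on each antibiotic page (formula, name, synonyms) are the kind of fields PubChem exposes through its public PUG REST web service. The sketch below only builds PUG REST request URLs and shows how one might fetch them with the Python standard library; it is an assumption about one possible retrieval route, not a description of how PATRIC imports PubChem data, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str,
                 properties=("MolecularFormula", "MolecularWeight", "IUPACName")) -> str:
    """URL requesting selected computed properties for compounds matching `name`."""
    return (f"{PUG_REST}/compound/name/{quote(name, safe='')}"
            f"/property/{','.join(properties)}/JSON")


def synonyms_url(name: str) -> str:
    """URL requesting the synonym list for compounds matching `name`."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/synonyms/JSON"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Perform the HTTP GET and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(property_url("methicillin"))
print(synonyms_url("nalidixic acid"))
```

For example, `fetch_json(synonyms_url("methicillin"))` would request PubChem's synonyms for methicillin; any real use should respect PubChem's published request-rate limits.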

Figure 3

Summary information for the antibiotic methicillin at PATRIC. The antibiotic interface provides a summary of the antibiotic, its synonyms and actions, and also provides links via separate tabs for AMR phenotypes, genes and regions across all the data available in PATRIC. The overview tab includes a general description of the drug, the chemical structure, the mechanism of action, a description of the pharmacological activity and class and known synonyms. The AMR phenotype tab provides a list of all the genomes that have been identified as being susceptible or resistant to that antimicrobial. This tab also includes the laboratory typing method and platform, and the testing standard if that information is available. A third tab, called AMR genes, displays information on the genes associated with resistance. The final tab, AMR regions, includes the location of the specific k-mers that are associated with the genome’s phenotype.

Taxon-level view

PATRIC organizes relevant data for all the available sequenced bacterial and archaeal genomes according to NCBI taxonomy [51]. Data are summarized at each level, from the highest (the Superkingdoms: Bacteria and Archaea) to the strain (or isolate) from which the genome has been sequenced. For each taxonomic level with associated AMR data, PATRIC provides several summaries. A bar graph summarizing the antibiotics, the AMR phenotype (resistant, intermediate or susceptible) and the number of genomes that match that phenotype is available on the overview tab at the top of the main landing page for each taxon (Figure 4A). Clicking on any of the antibiotics displayed in the graph will open a new page that summarizes all the genomes from that taxon level that have the particular AMR phenotype. An alternate tabular view of the data is also available (Figure 4B). The taxon-level summary page also includes an AMR phenotype tab that lists all of the genomes within the selected taxon that have an AMR phenotype, and the data that are associated with it, including specific treatments, phenotypes or laboratory methods. All tables in PATRIC include a dynamic filter for rapid filtering of the genomes based on metadata selections.
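The per-taxon bar graph amounts to counting genomes by antibiotic and phenotype among all genomes whose lineage contains the selected taxon. A minimal sketch, assuming each AMR record carries a genome identifier, its lineage as a list of taxon names, an antibiotic and a phenotype; these field names and the toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def taxon_amr_summary(records: List[dict], taxon: str) -> Dict[str, Counter]:
    """Count genomes per antibiotic and phenotype among records whose lineage includes `taxon`."""
    summary: Dict[str, Counter] = {}
    seen = set()
    for rec in records:
        if taxon not in rec["lineage"]:
            continue
        key = (rec["genome_id"], rec["antibiotic"], rec["phenotype"])
        if key in seen:  # count each genome once per antibiotic/phenotype pair
            continue
        seen.add(key)
        summary.setdefault(rec["antibiotic"], Counter())[rec["phenotype"]] += 1
    return summary


staph = ["Bacteria", "Staphylococcus", "Staphylococcus aureus"]
records = [
    {"genome_id": "g1", "lineage": staph, "antibiotic": "methicillin", "phenotype": "Resistant"},
    {"genome_id": "g1", "lineage": staph, "antibiotic": "methicillin", "phenotype": "Resistant"},
    {"genome_id": "g2", "lineage": staph, "antibiotic": "methicillin", "phenotype": "Susceptible"},
    {"genome_id": "g3", "lineage": ["Bacteria", "Escherichia", "Escherichia coli"],
     "antibiotic": "ampicillin", "phenotype": "Resistant"},
]
print(taxon_amr_summary(records, "Staphylococcus"))
```

Because the filter tests lineage membership, the same function serves every level from superkingdom down to species, which matches how the summaries are offered at each taxonomic level.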

Figure 4

A taxon-level summary on the PATRIC Web site describing AMR phenotype data across all of the genomes that are part of the Staphylococcus genus. (A) A bar graph summarizes the antibiotics, the AMR phenotype (resistant, intermediate or susceptible) and the number of genomes that match that phenotype. (B) The AMR phenotype tabular view, which shows all the genomes that have associated AMR data, includes a dynamic filter for rapid selection of genomes based on the metadata.

Gene view and predicted regions associated with AMR phenotypes

PATRIC provides a summary of data at the gene level, where the physical characteristics of a gene, its functional role(s), available experimental data and associated publications are provided. This view also includes information on homology to genes known to be important in AMR. In addition, PATRIC provides a view for predicted regions within some genes that are associated with AMR phenotypes. The k-mer regions predicted by the ML classifiers are visually indicated and their genomic region can be seen on the genome browser (Figure 5).

Figure 5

AMR predicted regions, located in the genome of S. aureus strain 08S00974, as visualized in the PATRIC JBrowse viewer [57]. These predicted regions, numbered sequentially by their occurrence in the genome as ‘classifier_predicted_regions 12–15’, were predicted by the ML algorithm that is being used to predict AMR phenotypes. The predicted regions are located in and around a gene (fig|1280.11691.peg.56) that is annotated as ‘Tetracycline resistance, MFS efflux pump => Tet(K)’. The annotation for this gene came from the focused manual curation effort at PATRIC to incorporate and propagate information for specific genes that were known to play an important role in AMR.
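Grouping classifier k-mer hits into numbered regions, and asking which regions fall in or around a gene, can be sketched as an interval merge. This is a hypothetical illustration: hits are assumed to be 1-based inclusive (start, end) coordinates on one contig, overlapping or abutting hits are merged, regions are numbered in genome order, and the coordinates and flank size in the example are made up.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def merge_hits(hits: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting k-mer hits into regions, ordered by position."""
    regions: List[Interval] = []
    for start, end in sorted(hits):
        if regions and start <= regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def regions_near_gene(regions: List[Interval], gene: Interval, flank: int = 0) -> List[int]:
    """1-based region numbers overlapping the gene extended by `flank` bp on each side."""
    lo, hi = gene[0] - flank, gene[1] + flank
    return [i for i, (s, e) in enumerate(regions, start=1) if s <= hi and e >= lo]


regions = merge_hits([(100, 130), (120, 150), (400, 430), (160, 190)])
print(regions)                                          # [(100, 150), (160, 190), (400, 430)]
print(regions_near_gene(regions, (90, 200)))             # [1, 2]
print(regions_near_gene(regions, (200, 380), flank=25))  # [2, 3]
```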

Future improvements

We continue to survey resources and publications to identify new genomes and AMR genes to incorporate into PATRIC. These will be used to expand the AMR phenotype predictions and AMR gene analyses to new genera and new antibiotics. We plan to map AMR properties to the genus-specific protein families (PLfams) to support comparative analysis of AMR genes, incorporate new AMR gene trees and allow users to build nucleotide-based multiple sequence alignments to identify SNPs and their association with AMR phenotypes. We are acutely aware that several important types of AMR determinants are not amenable to being encoded and propagated via the automated annotation strategy described above. These include antibiotic targets, which are largely cellular proteins performing essential housekeeping functions; such proteins are grouped into ‘classic’ functional roles in SEED/PATRIC and carry functional annotations that are unrelated to AMR. The antibiotic susceptibility of these targets can be altered by a few, or even a single, nonsynonymous mutation in the corresponding gene [52-54]. Likewise, single mutations in noncoding DNA regions, including promoters, operators and attenuators, can dramatically increase the MIC, and thus the resistance level, for particular antimicrobials [55, 56]. These cases will be treated separately in PATRIC, and we are in the process of designing tools for SNP detection and analysis targeted at the gene level. While PATRIC does not currently enable examining AMR data from metagenomes or from population-based studies, we plan to provide this capability in future releases.

PATRIC includes AMR information at both the genome and gene level, and uses manual curation and ML to integrate these data into the annotation service. A large collection of AMR-specific functional roles has been manually curated, and this information is propagated by the annotation service. 
With summaries of the available data across all taxonomic levels and new interfaces, researchers can quickly locate and examine these data in their private genomes and compare them with the PATRIC collection.

43 in total

Review 1. Mutation frequencies and antibiotic resistance.

Authors: J L Martinez; F Baquero
Journal: Antimicrob Agents Chemother Date: 2000-07 Impact factor: 5.191

2. A Naturally Occurring Single Nucleotide Polymorphism in a Multicopy Plasmid Produces a Reversible Increase in Antibiotic Resistance.

Authors: Alfonso Santos-Lopez; Cristina Bernabe-Balas; Manuel Ares-Arroyo; Rafael Ortega-Huedo; Andreas Hoefer; Alvaro San Millan; Bruno Gonzalez-Zorn
Journal: Antimicrob Agents Chemother Date: 2017-01-24 Impact factor: 5.191

3. The comprehensive antibiotic resistance database.

Authors: Andrew G McArthur; Nicholas Waglechner; Fazmin Nizam; Austin Yan; Marisa A Azad; Alison J Baylay; Kirandeep Bhullar; Marc J Canova; Gianfranco De Pascale; Linda Ejim; Lindsay Kalan; Andrew M King; Kalinka Koteva; Mariya Morar; Michael R Mulvey; Jonathan S O'Brien; Andrew C Pawlowski; Laura J V Piddock; Peter Spanogiannopoulos; Arlene D Sutherland; Irene Tang; Patricia L Taylor; Maulik Thaker; Wenliang Wang; Marie Yan; Tennison Yu; Gerard D Wright
Journal: Antimicrob Agents Chemother Date: 2013-05-06 Impact factor: 5.191

4. The Sequence Read Archive: explosive growth of sequencing data.

Authors: Yuichi Kodama; Martin Shumway; Rasko Leinonen
Journal: Nucleic Acids Res Date: 2011-10-18 Impact factor: 16.971

5. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.

Authors: Thomas Brettin; James J Davis; Terry Disz; Robert A Edwards; Svetlana Gerdes; Gary J Olsen; Robert Olson; Ross Overbeek; Bruce Parrello; Gordon D Pusch; Maulik Shukla; James A Thomason; Rick Stevens; Veronika Vonstein; Alice R Wattam; Fangfang Xia
Journal: Sci Rep Date: 2015-02-10

6. PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database.

Authors: James J Davis; Svetlana Gerdes; Gary J Olsen; Robert Olson; Gordon D Pusch; Maulik Shukla; Veronika Vonstein; Alice R Wattam; Hyunseung Yoo
Journal: Front Microbiol Date: 2016-02-08

7. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center.

Authors: Alice R Wattam; James J Davis; Rida Assaf; Sébastien Boisvert; Thomas Brettin; Christopher Bun; Neal Conrad; Emily M Dietrich; Terry Disz; Joseph L Gabbard; Svetlana Gerdes; Christopher S Henry; Ronald W Kenyon; Dustin Machi; Chunhong Mao; Eric K Nordberg; Gary J Olsen; Daniel E Murphy-Olson; Robert Olson; Ross Overbeek; Bruce Parrello; Gordon D Pusch; Maulik Shukla; Veronika Vonstein; Andrew Warren; Fangfang Xia; Hyunseung Yoo; Rick L Stevens
Journal: Nucleic Acids Res Date: 2016-11-29

8. BLAST: a more efficient report with usability improvements.

Authors: Grzegorz M Boratyn; Christiam Camacho; Peter S Cooper; George Coulouris; Amelia Fong; Ning Ma; Thomas L Madden; Wayne T Matten; Scott D McGinnis; Yuri Merezhuk; Yan Raytselis; Eric W Sayers; Tao Tao; Jian Ye; Irena Zaretskaya
Journal: Nucleic Acids Res Date: 2013-04-22

9. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST).

Authors: Ross Overbeek; Robert Olson; Gordon D Pusch; Gary J Olsen; James J Davis; Terry Disz; Robert A Edwards; Svetlana Gerdes; Bruce Parrello; Maulik Shukla; Veronika Vonstein; Alice R Wattam; Fangfang Xia; Rick Stevens
Journal: Nucleic Acids Res Date: 2013-11-29

10. PubChem Substance and Compound databases.

Authors: Sunghwan Kim; Paul A Thiessen; Evan E Bolton; Jie Chen; Gang Fu; Asta Gindulyte; Lianyi Han; Jane He; Siqian He; Benjamin A Shoemaker; Jiyao Wang; Bo Yu; Jian Zhang; Stephen H Bryant
Journal: Nucleic Acids Res Date: 2015-09-22
