Jinhong Shi1, Yan Yan1, Matthew G Links1,2, Longhai Li3, Jo-Anne R Dillon4,5, Michael Horsch1, Anthony Kusalik6.
Abstract
BACKGROUND: Antimicrobial resistance (AMR) is a major threat to global public health because it makes standard treatments ineffective and contributes to the spread of infections. It is important to understand AMR's biological mechanisms for the development of new drugs and more rapid and accurate clinical diagnostics. The increasing availability of whole-genome SNP (single nucleotide polymorphism) information, obtained from whole-genome sequence data, along with AMR profiles provides an opportunity to use feature selection in machine learning to find AMR-associated mutations. This work describes the use of a supervised feature selection approach using deep neural networks to detect AMR-associated genetic factors from whole-genome SNP data.
Keywords: Antimicrobial resistance; Deep neural network; Feature selection; Neisseria gonorrhoeae; SNP
Year: 2019 PMID: 31874612 PMCID: PMC6929425 DOI: 10.1186/s12859-019-3054-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Workflow of the proposed machine learning approach to identify SNPs from WGS data. The prediction of AMR profiles based on these identified SNPs is also part of the workflow. Although prediction is not the main purpose of this study, it is a natural next step after feature selection. In the figure, rectangles represent methodological steps, while parallelograms represent data or information. From the SNPs, resistance genes and other genetic elements can then be identified.
SNPs identified for the resistance to ciprofloxacin (CIP) by DNP-AAP
| ID Range | ID | AAP | Genes | Annotations | Known |
|---|---|---|---|---|---|
| [18797,18817] | 18799 | 0.658 | | DNA gyrase subunit A | ✓ |
| [4309,4366] | 4363 | 0.536 | | DNA topoisomerase IV subunit A | ✓ |
| | 5087 | 0.506 | | intergenic between NGK_0295 and NGK_0296 ∗ | |
| | 5075 | 0.497 | NGK_0295 | glutathione synthetase | |
| | 34282 | 0.483 | | intergenic between NGK_2199 and NGK_2200 ∗ | |
| | 33843 | 0.482 | NGK_2182 | putative integral membrane protein | |
| | 20553 | 0.478 | NGK_1395 | OTB_PSEPK Probable sugar efflux transporter | |
| | 2285 | 0.477 | NGK_0116 | conserved hypothetical protein | |
| | 34301 | 0.475 | NGK_2201 | hypoxanthine-guanine phosphoribosyltransferase | |
| | 16353 | 0.447 | NGK_1090 | conjugal transfer pilus assembly protein TraD | |
Annotations are from EnsemblBacteria. The column “ID Range” lists the ranges of SNPs that fall in known AMR-associated genes (only) in our data. ID: ID of Identified SNP.
*NGK_0295: glutathione synthetase; NGK_0296: diacylglycerol kinase (DagK); NGK_2199: PtsH; NGK_2200: putative sugar transport PTS system IIA protein
SNPs identified for the resistance to cefixime (CFX) by DNP-AAP
| ID Range | ID | AAP | Genes | Annotations | Known |
|---|---|---|---|---|---|
| | 31799 | 0.423 | NGK_rrna16s3 | NGK_rrna16s3 | |
| [28398,28481] | 28431 | 0.419 | | penicillin-binding protein 2 | ✓ |
| [28398,28481] | 28418 | 0.406 | | penicillin-binding protein 2 | ✓ |
| | 29914 | 0.402 | NGK_rrna16s2 | NGK_rrna16s2 | |
| [28398,28481] | 28417 | 0.382 | | penicillin-binding protein 2 | ✓ |
| [28398,28481] | 28428 | 0.382 | | penicillin-binding protein 2 | ✓ |
| | 29915 | 0.376 | NGK_rrna16s2 | NGK_rrna16s2 | |
| | 29916 | 0.370 | NGK_rrna16s2 | NGK_rrna16s2 | |
| [28398,28481] | 28427 | 0.368 | | penicillin-binding protein 2 | ✓ |
| [28398,28481] | 28429 | 0.367 | | penicillin-binding protein 2 | ✓ |
Annotations are from EnsemblBacteria. The column “ID Range” lists the ranges of SNPs that fall in known AMR-associated genes (only) in our data. ID: ID of Identified SNP
SNPs identified for the resistance to penicillin (PEN) by DNP-AAP
| ID Range | ID | AAP | Genes | Annotations | Known |
|---|---|---|---|---|---|
| | 38424 | 0.344 | NGK_2469 | conserved hypothetical protein | |
| | 33601 | 0.342 | NGK_2170 | outer membrane preprotein PIIc | |
| | 18799 | 0.330 | | DNA gyrase subunit A | |
| | 29502 | 0.322 | NGK_1906 | monofunctional biosynthetic peptidoglycan transglycosylase | |
| | 29504 | 0.251 | NGK_1906 | monofunctional biosynthetic peptidoglycan transglycosylase | |
| [2749,2763] | 2755 | 0.236 | | penicillin-binding protein 1A | ✓ |
| | 35095 | 0.219 | NGK_2270 | adhesin MafA | |
| | 10120 | 0.213 | NGK_0679 | putative phage associated protein | |
| | 40335 | 0.204 | | intergenic between NGK_2581 and NGK_2582 ∗ | |
| | 6817 | 0.203 | NGK_0423 | 23S rRNA pseudo-uridine 1911/1915/1917 synthase | |
Annotations are from EnsemblBacteria. The column “ID Range” lists the ranges of SNPs that fall in known AMR-associated genes (only) in our data. ID: ID of Identified SNP
*NGK_2581: Putative hemoglobin receptor component precursor HpuA; NGK_2582: Conserved hypothetical protein
SNPs identified for the resistance to tetracycline (TET) by DNP-AAP
| ID Range | ID | AAP | Genes | Annotations | Known |
|---|---|---|---|---|---|
| | 27095 | 0.470 | | intergenic between NGK_1771 and NGK_1772 ∗ | |
| | 21468 | 0.205 | NGK_1458 | putative phage associated protein | |
| [37926,37927] | 37927 | 0.196 | | 30S ribosomal protein S10 | ✓ |
| | 29960 | 0.159 | NGK_1968 | IS1016 transposase | |
| | 37300 | 0.150 | NGK_2398 | methionyl-tRNA formyltransferase | |
| | 40041 | 0.131 | NGK_2557 | hemoglobin/transferrin/lactoferrin receptor protein | |
| | 21467 | 0.121 | NGK_1458 | putative phage associated protein | |
| | 9785 | 0.120 | NGK_0668 | putative phage associated protein | |
| | 9787 | 0.120 | NGK_0668 | putative phage associated protein | |
| | 18761 | 0.119 | NGK_1227 | putative HTH-type transcriptional regulator NMB1378 | |
Annotations are from EnsemblBacteria. The column “ID Range” lists the ranges of SNPs that fall in known AMR-associated genes (only) in our data. ID: ID of Identified SNP
*NGK_1771: transferrin-binding protein A; NGK_1772: TbpB
SNPs identified for the resistance to azithromycin (AZM) by DNP-AAP
| ID Range | ID | AAP | Genes | Annotations | Known |
|---|---|---|---|---|---|
| | 27421 | 0.424 | NGK_1776 | conserved hypothetical protein | |
| | 27690 | 0.420 | NGK_1793 | putative drug resistance protein | |
| | 30659 | 0.300 | NGK_2022 | Infection response protein Irg2 | |
| | 36328 | 0.294 | NGK_2342 | pilC protein | |
| | 36810 | 0.290 | | intergenic between NGK_2354 and NGK_2355 ∗ | |
| | 30434 | 0.278 | | intergenic between NGK_1994 and NGK_1995 ∗ | |
| | 21513 | 0.269 | NGK_1463 | putative phage associated protein | |
| | 39676 | 0.266 | NGK_2537 | homoserine kinase | |
| | 36809 | 0.258 | | intergenic between NGK_2354 and NGK_2355 ∗ | |
| | 29095 | 0.254 | NGK_1872 | phosphatidylglycerophosphatase A | |
Annotations are from EnsemblBacteria. The column “ID Range” lists the ranges of SNPs that fall in known AMR-associated genes (only) in our data. ID: ID of Identified SNP
*NGK_2354: Conserved hypothetical protein; NGK_2355: Hypothetical protein; NGK_1994: TspB2; NGK_1995: putative phage associated protein
Fig. 2 ROC curves and AUCs for the predicted resistance profiles for the five antibiotics under consideration
Counts of N. gonorrhoeae strains for each antibiotic
| AMR/Antibiotics | CIP | AZM | TET | CFX | PEN |
|---|---|---|---|---|---|
| Susceptible, MIC criterion | | ≤0.1 | ≤0.25 | ≤0.005 | ≤0.06 |
| Susceptible, count | 302 | 45 | 26 | 75 | 46 |
| Resistant, MIC criterion | | ≥16 | ≥50 | ≥0.25 | ≥6 |
| Resistant, count | 364 | 38 | 26 | 108 | 37 |
| Total number | 666 | 83 | 52 | 183 | 83 |
N. gonorrhoeae strains for each antibiotic are balanced by selecting strains with the lowest and the highest MIC values. Criteria for selection are given above each count
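The selection rule described in this note can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `select_extremes`, the strain IDs, and the MIC values are hypothetical, and only the two cut-offs (≤0.1 and ≥16, the AZM criteria in the table) come from the source.

```python
from typing import Dict, List, Tuple


def select_extremes(
    mics: Dict[str, float], low: float, high: float
) -> Tuple[List[str], List[str]]:
    """Keep strains with MIC <= low as susceptible and MIC >= high as resistant.

    Strains whose MIC falls strictly between the two cut-offs are dropped.
    """
    susceptible = sorted(s for s, m in mics.items() if m <= low)
    resistant = sorted(s for s, m in mics.items() if m >= high)
    return susceptible, resistant


# Hypothetical strain IDs and MIC values, for illustration only.
example_mics = {"s1": 0.03, "s2": 0.1, "s3": 0.5, "s4": 16.0, "s5": 32.0, "s6": 2.0}

# Cut-offs taken from the AZM column of the table (<=0.1 and >=16).
sus, res = select_extremes(example_mics, low=0.1, high=16.0)
print(sus, res)
```

Dropping the middle of the MIC distribution trades sample size for cleaner class labels, which is why the per-antibiotic totals in this table are smaller than those in the original data.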
Summary of original antibiotic resistance data for N. gonorrhoeae strains
| AMR/Antibiotic | CIP | AZM | TET | CFX | PEN |
|---|---|---|---|---|---|
| Susceptible | 302 | 443 | 26 | 557 | 258 |
| Intermediate | 5 | 0 | 124 | 0 | 363 |
| Resistant | 364 | 233 | 526 | 108 | 46 |
| Total number | 671 | 676 | 676 | 665 | 667 |
There are 676 strains in total. MIC values were available for most strains for all five antibiotics. The numbers under each antibiotic are the counts in each category, obtained based on its CLSI breakpoints. CIP: ciprofloxacin; CFX: cefixime; PEN: penicillin; TET: tetracycline; AZM: azithromycin
TPR (=TP/(TP+FN)) for each antibiotic resistance prediction given different FPR (=FP/(FP+TN))
| Drug/FPR | 0.05 | 0.10 | 0.15 | 0.20 |
|---|---|---|---|---|
| CIP | 1.00 | 1.00 | 1.00 | 1.00 |
| TET | 0.74 | 0.98 | 1.00 | 1.00 |
| PEN | 0.86 | 0.95 | 0.98 | 0.996 |
| CFX | 0.89 | 0.91 | 0.96 | 0.96 |
| AZM | 0.76 | 0.89 | 0.92 | 0.93 |
CIP: Ciprofloxacin; AZM: azithromycin; TET: tetracycline; CFX: cefixime; PEN: penicillin
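The two rate definitions in the table title, TPR = TP/(TP+FN) and FPR = FP/(FP+TN), can be made concrete with a small sketch. The helper names, labels, scores, and the 0.5 threshold below are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the study.

```python
from typing import Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN), calling a strain resistant when score >= threshold."""
    tp = fn = fp = tn = 0
    for y, s in zip(labels, scores):
        predicted_resistant = s >= threshold
        if y == 1 and predicted_resistant:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_resistant:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def tpr_fpr(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """TPR = TP/(TP+FN); FPR = FP/(FP+TN)."""
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels (1 = resistant) and classifier scores, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]

counts = confusion_counts(labels, scores, threshold=0.5)
tpr, fpr = tpr_fpr(*counts)
print(counts, tpr, fpr)
```

Sweeping the threshold and recording each (FPR, TPR) pair traces out an ROC curve; reading the TPR at a fixed FPR gives one cell of a table like the one above.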
Fig. 3 Classification performance of SNPs identified by DNP-AAP versus randomly selected SNPs. Shown are ROC curves for classifications made with SNPs identified by DNP-AAP and with randomly selected SNPs for ciprofloxacin data. The latter curve was obtained by randomly selecting 10 SNPs 100 times and averaging the resultant FPR (false positive rate) and TPR (true positive rate) values.
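The random baseline in this caption (10 SNPs drawn 100 times, with the resulting FPR and TPR values averaged) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names, the SNP count of 1000, and the toy curves are hypothetical, the classifier that would turn each subset into a curve is omitted, and the averaging assumes every curve is evaluated at the same sequence of thresholds.

```python
import random
from typing import List, Sequence, Tuple

Curve = Tuple[Sequence[float], Sequence[float]]  # (FPR values, TPR values)


def random_snp_subsets(
    n_snps: int, k: int = 10, repeats: int = 100, seed: int = 0
) -> List[List[int]]:
    """Draw `repeats` random subsets of `k` distinct SNP indices."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_snps), k)) for _ in range(repeats)]


def average_roc(curves: Sequence[Curve]) -> Tuple[List[float], List[float]]:
    """Average FPR and TPR pointwise across curves of equal length."""
    n = len(curves)
    n_points = len(curves[0][0])
    mean_fpr = [sum(c[0][i] for c in curves) / n for i in range(n_points)]
    mean_tpr = [sum(c[1][i] for c in curves) / n for i in range(n_points)]
    return mean_fpr, mean_tpr


# n_snps=1000 is a hypothetical SNP count, used only for illustration.
subsets = random_snp_subsets(n_snps=1000, k=10, repeats=100)

# Two toy curves standing in for the per-subset classifier results.
toy_curves = [
    ([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]),
    ([0.0, 0.5, 1.0], [0.0, 1.0, 1.0]),
]
mean_fpr, mean_tpr = average_roc(toy_curves)
print(len(subsets), mean_fpr, mean_tpr)
```

In a full pipeline, each subset would be used to train and evaluate a classifier, and the resulting 100 curves would be passed to `average_roc`.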
Fig. 4 Distribution of average activation potentials (AAP) for the five antibiotic datasets
Fig. 5 The main steps in defining average activation potential (AAP)