Suraj Gupta, Gustavo Arango-Argoty, Liqing Zhang, Amy Pruden, Peter Vikesland.
Abstract
BACKGROUND: The interconnectivities of built and natural environments can serve as conduits for the proliferation and dissemination of antibiotic resistance genes (ARGs). Several studies have compared the broad spectrum of ARGs (i.e., "resistomes") in various environmental compartments, but there is a need to identify unique ARG occurrence patterns (i.e., "discriminatory ARGs") characteristic of each environment. Such an approach will help to identify factors influencing ARG proliferation, facilitate relative comparisons of the ARGs distinguishing various environments, and help pave the way towards ranking environments based on their likelihood of contributing to the spread of clinically relevant antibiotic resistance. Here we formulate and demonstrate an approach using an extremely randomized tree (ERT) algorithm combined with a Bayesian optimization technique to capture ARG variability in environmental samples and identify the discriminatory ARGs. The potential of ERT for identifying discriminatory ARGs was first evaluated using in silico metagenomic datasets (simulated metagenomic Illumina sequencing data) with known variability. The application of ERT was then demonstrated through analyses using publicly available and in-house metagenomic datasets associated with (1) different aquatic habitats (e.g., river, wastewater influent, hospital effluent, and dairy farm effluent) to compare resistomes between distinct environments and (2) different river samples (i.e., Amazon, Kalamas, and Cam Rivers) to compare resistome characteristics of similar environments.
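The idea of ranking ARGs by how well they separate environments can be illustrated with a minimal, standard-library sketch of extremely randomized trees and Gini importance. This is not the authors' pipeline: the Bayesian optimization step is omitted, the function names (`gini`, `grow`, `ert_gini_importance`) are hypothetical, and the toy data (one informative gene, two noise genes, two environments) are invented for illustration only.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def grow(rows, labels, k, rng, importance, n_total):
    """Grow one extremely randomized tree and accumulate impurity decreases."""
    parent = gini(labels)
    if parent == 0.0:  # pure node: nothing left to split
        return
    best = None
    for f in rng.sample(range(len(rows[0])), k):  # k random candidate features
        values = [row[f] for row in rows]
        lo, hi = min(values), max(values)
        if lo == hi:  # feature is constant in this node
            continue
        cut = rng.uniform(lo, hi)  # threshold drawn at random, not optimized
        left = [i for i, v in enumerate(values) if v <= cut]
        right = [i for i, v in enumerate(values) if v > cut]
        if not left or not right:
            continue
        decrease = parent - sum(
            len(side) / len(rows) * gini([labels[i] for i in side])
            for side in (left, right)
        )
        if best is None or decrease > best[0]:
            best = (decrease, f, left, right)
    if best is None:
        return
    decrease, f, left, right = best
    # Weight the decrease by the share of samples that reach this node.
    importance[f] += len(rows) / n_total * decrease
    for side in (left, right):
        grow([rows[i] for i in side], [labels[i] for i in side],
             k, rng, importance, n_total)


def ert_gini_importance(rows, labels, n_trees=200, k=1, seed=0):
    """Average normalized Gini importance over a forest of random trees."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    totals = [0.0] * n_features
    for _ in range(n_trees):
        imp = [0.0] * n_features
        grow(rows, labels, k, rng, imp, len(rows))  # full sample, no bootstrap
        tree_sum = sum(imp)
        if tree_sum > 0:
            totals = [t + i / tree_sum for t, i in zip(totals, imp)]
    forest_sum = sum(totals)
    return [t / forest_sum for t in totals] if forest_sum else totals


# Toy data (invented): column 0 differs between environments, columns 1-2 are noise.
data_rng = random.Random(1)
rows, labels = [], []
for env, centre in (("river", 0.1), ("effluent", 0.9)):
    for _ in range(10):
        rows.append([centre + data_rng.uniform(-0.1, 0.1),
                     data_rng.random(), data_rng.random()])
        labels.append(env)

importances = ert_gini_importance(rows, labels, n_trees=200, k=2, seed=0)
for name, value in zip(["gene_A", "gene_B", "gene_C"], importances):
    print(f"{name}: {value:.3f}")
```

In this toy setting the informative column receives the largest importance, which mirrors how a gene with environment-specific abundance would be flagged as discriminatory.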
Keywords: Antibiotic resistance genes; Aquatic environments; Ensemble learning; Extremely randomized trees; Surveillance; Wastewater
Year: 2019 PMID: 31466530 PMCID: PMC6716844 DOI: 10.1186/s40168-019-0735-1
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Metadata of different environmental samples obtained from public databases
| Sample | Biome | Sampling location/region | Description | Database | Accession number | Total number of reads | DNA extraction kit/method | Sequencing platform | Reference |
|---|---|---|---|---|---|---|---|---|---|
| ARP 1; ARP 2; ARP 3 | Amazon River plume | Western tropical North Atlantic Ocean | Samples were collected at the Amazon River plume | NCBI SRA | SRR1185414; SRR1186214; SRR1199271 | 1,724,868; 1,304,682; 1,277,913 | Method proposed by [ | Illumina Genome Analyzer IIx | [ |
| KR 1; KR 2; KR 3 | Kalamas River | Epirus region of Greece | Samples were collected from the Kalamas River | NCBI SRA | SRR3098756; SRR3098759; SRR3098769 | 10,792,071; 7,698,589; 5,351,255 | Mo Bio Power Soil Kit (Mo Bio Inc., Carlsbad, CA, USA) | Illumina HiSeq 2500 | [ |
| DF 1; DF 2; DF 3 | Dairy farm effluent | Cambridge, UK | Samples were collected from the effluent lagoon of a dairy farm | EMBL-EBI | ERR1193297; ERR1193298; ERR1193301 | 23,154,788; 82,819,396; 26,421,627 | Meta-G-Nome DNA Isolation Kit (Epicentre) | Illumina HiSeq 2000 | [ |
| HE 1; HE 2; HE 3 | Hospital effluent | Cambridge, UK | Samples were collected from the combined wastewater effluents of the main wards of a university hospital | EMBL-EBI | ERR1191817; ERR1191818; ERR1191819 | 30,295,299; 22,689,323; 50,303,545 | | | |
| CR 1; CR 2; CR 3 | Cam River | Cambridge, UK | Samples were collected within the River Cam catchment | EMBL-EBI | ERR1193292; ERR1193293; ERR1193294 | 17,899,008; 68,902,092; 43,078,304 | | | |
| HE 4 | Hospital effluent | Singapore | Sample was collected from a general ward | NCBI SRA | SRR5997540 | 2,257,389 | PowerWater DNA Isolation Kit (Mo Bio Lab, Inc., CA) | Illumina HiSeq 2500 | [ |
| HE 5; HE 6 | Hospital effluent | Singapore | Samples were collected from a clinical isolation ward | NCBI SRA | SRR5997541; SRR5997548 | 1,927,227; 2,090,859 | | | |
Sampling information: WWTP influent samples
| Sample ID | Sample type | Sampling country | Coordinates | Location | Sampling date | Total reads | Annotated reads |
|---|---|---|---|---|---|---|---|
| IN 1 | Influent | India | 13.036238, 80.193738 | Chennai | 10 Mar. 2016 | 13,045,504 | 15,421 |
| IN 2 | Influent | USA | 37.201889, −76.447378 | Christiansburg | 19 Jan. 2017 | 13,460,770 | 11,776 |
| IN 3 | Influent | Philippines | 14.592113, 121.058931 | Mandaluyong City | 29 Nov. 2016 | 14,332,977 | 20,267 |
| IN 4 | Influent | Switzerland | 47.405586, 8.597585 | Zurich | 18 May 2016 | 15,314,202 | 18,311 |
| IN 5 | Influent | Sweden | 57.704713, 12.926666 | Boras | 8 Jun. 2016 | 11,801,763 | 10,149 |
| IN 6 | Influent | Hong Kong | 22.406709, 114.213706 | Hong Kong | 14 Jul. 2016 | 15,763,560 | 14,979 |
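As a small worked example using only the figures in the table above, the share of reads that received an annotation can be computed per influent sample. This sketch assumes that the last column counts annotated reads out of the total reads; it is plain arithmetic on the tabulated values and is not presented as the normalization used in the study.

```python
# (total reads, annotated reads) per WWTP influent sample, copied from the table.
influent_reads = {
    "IN 1": (13_045_504, 15_421),
    "IN 2": (13_460_770, 11_776),
    "IN 3": (14_332_977, 20_267),
    "IN 4": (15_314_202, 18_311),
    "IN 5": (11_801_763, 10_149),
    "IN 6": (15_763_560, 14_979),
}

# Fraction of reads annotated, assuming annotated reads are a subset of total reads.
annotated_fraction = {
    sample: annotated / total
    for sample, (total, annotated) in influent_reads.items()
}

for sample, fraction in annotated_fraction.items():
    print(f"{sample}: {fraction * 1e6:.0f} annotated reads per million reads")
```

Expressing counts per million reads puts samples with different sequencing depths on a common scale before they are compared.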
Fig. 1 Computational pipeline for the selection of discriminatory ARGs
Fig. 2 a (Left) Gini importance of the top 10 identified discriminatory ARGs. (Right) Gini importance of the ARGs (sul1, tet(W), ermB) added with known variations to the in silico datasets (simulated metagenomic Illumina sequencing data generated using InSilicoSeq). b Silhouette plot for in silico samples using all the annotated ARGs. c Silhouette plot for in silico samples using the discriminatory ARGs
Fig. 3 Comparison of silhouette scores estimated using discriminatory features (ARGs) obtained using different classifiers and feature selection methods
Fig. 4 a Heatmap and b hierarchical clustering of different aquatic environment samples based on the relative abundance of discriminatory ARGs. c Silhouette plot for environmental samples using all the annotated ARGs. d Silhouette plot for environmental samples using the discriminatory ARGs. (Legend: ARP: Amazon River Plume, DF: Dairy Farm Effluent, HE: Hospital Effluent, KR: Kalamas River, CR: Cam River, IN: Influent)
Fig. 5 a Heatmap and b hierarchical clustering of different riverine samples based on the relative abundance of discriminatory ARGs. c Silhouette plot for riverine samples using all the annotated ARGs. d Silhouette plot for riverine samples using the discriminatory ARGs. (Legend: ARP: Amazon River Plume, KR: Kalamas River, CR: Cam River)
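The silhouette plots referred to in the captions above summarize how well samples group by environment. A minimal standard-library sketch of the silhouette coefficient follows, assuming Euclidean distance and at least two groups; the function name `silhouette_values` and the six toy points are hypothetical and are not taken from the study's data.

```python
import math


def silhouette_values(points, labels):
    """Per-sample silhouette s = (b - a) / max(a, b) with Euclidean distance.

    a: mean distance to the other members of the sample's own group.
    b: lowest mean distance to the members of any other group.
    Requires at least two distinct labels.
    """
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    values = []
    for point, label in zip(points, labels):
        own = groups[label]
        if len(own) == 1:  # singleton groups are scored 0 by convention
            values.append(0.0)
            continue
        # The sum includes the distance from the point to itself, which is 0.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in groups.items()
            if other_label != label
        )
        values.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return values


# Toy abundance profiles (invented): two tight, well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
matching = ["river", "river", "river", "effluent", "effluent", "effluent"]
scrambled = ["river", "effluent", "river", "effluent", "river", "effluent"]

mean_matching = sum(silhouette_values(points, matching)) / len(points)
mean_scrambled = sum(silhouette_values(points, scrambled)) / len(points)
print(f"labels matching the groups: {mean_matching:.3f}")
print(f"scrambled labels:           {mean_scrambled:.3f}")
```

A mean score near 1 indicates compact, well-separated groups, so a higher score for a reduced feature set than for the full set would suggest that the retained features separate the environments more cleanly.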