Nicholas C Penney1,2, Derek K T Yeung1,2, Isabel Garcia-Perez1, Joram M Posma1,3, Aleksandra Kopytek1, Bethany Garratt2, Hutan Ashrafian1,2, Gary Frost1, Julian R Marchesi1, Sanjay Purkayastha2, Lesley Hoyles1,4, Ara Darzi2,5, Elaine Holmes1,6. 1. Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London, SW7 2AZ UK. 2. Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W2 1NY UK. 3. Health Data Research UK, London, NW1 2BE UK. 4. Department of Biosciences, Nottingham Trent University, Nottingham, NG11 8NS UK. 5. Institute of Global Health Innovation, Imperial College London, London, W2 1NY UK. 6. Centre for Computational & Systems Medicine, Health Futures Institute, Murdoch University, Perth, WA 6150 Australia.
Abstract
Background: Resolution of type 2 diabetes (T2D) is common following bariatric surgery, particularly Roux-en-Y gastric bypass. However, the underlying mechanisms have not been fully elucidated. Methods: To address this we compare the integrated serum, urine and faecal metabolic profiles of participants with obesity ± T2D (n = 80, T2D = 42) with participants who underwent Roux-en-Y gastric bypass or sleeve gastrectomy (pre and 3-months post-surgery; n = 27), taking diet into account. We co-model these data with shotgun metagenomic profiles of the gut microbiota to provide a comprehensive atlas of host-gut microbe responses to bariatric surgery, weight-loss and glycaemic control at the systems level. Results: Here we show that bariatric surgery reverses several disrupted pathways characteristic of T2D. The differential metabolite set representative of bariatric surgery overlaps with both diabetes (19.3% commonality) and body mass index (18.6% commonality). However, the percentage overlap between diabetes and body mass index is minimal (4.0% commonality), consistent with weight-independent mechanisms of T2D resolution. The gut microbiota is more strongly correlated to body mass index than T2D, although we identify some pathways such as amino acid metabolism that correlate with changes to the gut microbiota and which influence glycaemic control. Conclusion: We identify multi-omic signatures associated with responses to surgery, body mass index, and glycaemic control. Improved understanding of gut microbiota - host co-metabolism may lead to novel therapies for weight-loss or diabetes. However, further experiments are required to provide mechanistic insight into the role of the gut microbiota in host metabolism and establish proof of causality.
Background: Resolution of type 2 diabetes (T2D) is common following bariatric surgery, particularly Roux-en-Y gastric bypass. However, the underlying mechanisms have not been fully elucidated. Methods: To address this we compare the integrated serum, urine and faecal metabolic profiles of participants with obesity ± T2D (n = 80, T2D = 42) with participants who underwent Roux-en-Y gastric bypass or sleeve gastrectomy (pre and 3-months post-surgery; n = 27), taking diet into account. We co-model these data with shotgun metagenomic profiles of the gut microbiota to provide a comprehensive atlas of host-gut microbe responses to bariatric surgery, weight-loss and glycaemic control at the systems level. Results: Here we show that bariatric surgery reverses several disrupted pathways characteristic of T2D. The differential metabolite set representative of bariatric surgery overlaps with both diabetes (19.3% commonality) and body mass index (18.6% commonality). However, the percentage overlap between diabetes and body mass index is minimal (4.0% commonality), consistent with weight-independent mechanisms of T2D resolution. The gut microbiota is more strongly correlated to body mass index than T2D, although we identify some pathways such as amino acid metabolism that correlate with changes to the gut microbiota and which influence glycaemic control. Conclusion: We identify multi-omic signatures associated with responses to surgery, body mass index, and glycaemic control. Improved understanding of gut microbiota - host co-metabolism may lead to novel therapies for weight-loss or diabetes. However, further experiments are required to provide mechanistic insight into the role of the gut microbiota in host metabolism and establish proof of causality.
The global epidemic in obesity and associated disease states carries a significant health and economic burden. The gut microbiota (GM) has been implicated as a contributing factor in a number of these diseases, including obesity and type 2 diabetes (T2D)[1-3]. Faecal microbiota transplant experiments in obesity[3] and T2D[3,4] have shown that this relationship is causal, but these studies have failed to fully unravel the complex mechanisms behind this observation, further complicated by the fact that each individual’s GM is unique and subject to redundancy in its metabolic function[5]. Therefore, there is a need to move beyond simply profiling the composition of GM communities in order to understand the true nature of host-microbe relationships.Surgical procedures such as Roux-en-Y gastric bypass (RYGB) and vertical sleeve gastrectomy (VSG) achieve sustainable weight-loss in obesity[6]. Importantly, they are also highly successful in the resolution of obesity-related co-morbidities including T2D[7]. These metabolic outcomes are achieved through both weight-dependent and, interestingly, weight-independent mechanisms[8]. Weight-independent effects occur because bariatric surgery, particularly RYGB, induces a complex system-wide metabolic effect, including modification of the GM-host metabolic axis[9]. The overwhelming disruption to the GM caused by bariatric surgery is only just being defined, as is its functional importance. To date, few studies have explored longitudinal host-microbe interactions in human cohorts following bariatric surgery, with most studies focussing on either the microbiota or the metabolome[10,11]. Multiple mechanisms for the GMs contribution to achieving weight-loss and metabolic improvement post-surgery have been hypothesised, including: reduced energy harvest of non-digestible food types such as complex carbohydrates; reduced gut permeability leading to decreased systemic inflammation; and alterations in microbe-host co-metabolites such as bile acids (BAs), amino acids (AAs) and short-chain fatty acids (SCFAs)[10,12].Bariatric surgery provides a unique opportunity to unravel these complex host-microbe interactions through phenotyping before and after intervention to reduce the impact of inter-individual variability. Here we have performed multi-platform profiling, to establish changes in the host-microbe interactions in volunteers with obesity ± T2D and in individuals undergoing bariatric surgery with and without T2D to identify dysregulated pathways in T2D that are functionally restored after bariatric surgery. First, we have compared differences in GM-host co-metabolism in participants with T2D compared to individuals without diabetes at baseline to ascertain which metabolites were associated with glycaemic control. Next, we profiled subgroups of patients undergoing RYGB or VSG to evaluate changes in GM-host co-metabolism following these contrasting interventions and assessed their impact on glycaemic control, taking into account intervention-dependent changes in eating behaviour.Our results provide a comprehensive atlas of host-gut microbe responses to bariatric surgery, weight-loss and glycaemic control at the systems level. We find minimal overlap between metabolites associated with body mass index (BMI) and those associated with glycaemic control, consistent with weight-independent mechanisms of T2D resolution. Further, we establish multiple associations between the gut microbiota and host metabolites. However, further experiments are required to provide mechanistic insight into the role of the gut microbiota in host metabolism and establish proof of causality.
Methods
Experimental model and participant details
Recruitment
Patients referred for consideration of bariatric surgery who were obese (BMI > 30 kg/m2), aged ≥ 18, had failed efforts at lifestyle modification and dieting and were willing to comply with the trial protocol were recruited prospectively to this observational study, with volunteers providing written consent. Volunteers with type 2 diabetes (HbA1c ≥ 48 mmol/mol or treated), impaired glucose tolerance (HbA1c 42–47 mmol/mol) and without diabetes were eligible for recruitment.Patients who had previously undergone bariatric or major abdominal surgery, were or intended to become pregnant during the trial period, or took long-term antibiotics were excluded. Major abdominal surgery included patients who had undergone small or large bowel resection, liver, pancreatic, splenic or stomach surgery, as these could influence the gut microbiota and/or the patient’s metabolic state. Patients that had previously had an appendicectomy, cholecystectomy or hernia repair were not excluded.The study protocol and sample collection instructions were co-developed with patient representatives to help reduce the study burden for patients. To improve patient compliance sample collection occurred at the time of patients’ usual hospital appointments preoperatively and at 3-months post-procedure. A small exploratory cohort were also sampled at 1-year post-procedure and are reported in Supplementary Figs. 4–9, Supplementary Tables 6–8 and Supplementary Data 6–8.
Metabolic surgery
Participants underwent Roux-en-Y Gastric Bypass (RYGB) or Vertical Sleeve Gastrectomy (VSG) surgery at a National Health Service (NHS) University Teaching Hospital in London, UK. A single dose of 1.2 g intravenous co-amoxiclav was given during induction of anaesthesia (clindamycin if penicillin allergic). See Supplementary Note 1 for further details of the perioperative dietary advice given and surgical techniques used.
Regulatory approvals
The study received NHS Research Ethics Committee (15/ES/0026) approval and was registered with ClinicalTrials.gov (NCT02421055).
Clinical, demographic and dietary data collection
Participants were assessed at the above time points for: (1) anthropometric & physiological measurements, (2) demographic details, (3) biochemical parameters including glycated haemoglobin (HbA1c), (4) oral-hypoglycaemic, insulin and other medication use, length of diabetes diagnosis and other co-morbidities.
Diet assessment
An online self-reported 24-hour dietary recall questionnaire (www.myfood24.org) was utilised to capture detailed dietary intake information from patients at each of the study time points so that changes in diet following surgery could be accounted for in the analysis. Three 24-hour recall questionnaires were completed at each time point. Participants were able to pick from a selection of pictures of corresponding foods to accurately ascertain portion quantities. Collected dietary information was used to calculate Alternative Healthy Eating Index 2010 (AHEI-2010) scores as described previously[13]. In brief, scores of 0-10 were given for 11 components (maximum score 110). High scores were given for a high intake of vegetables, fruit, nuts and legumes, whole grains, long chain omega-3 fats and polyunsaturated fats, moderate intake of alcohol and low intake of sugar, sweetened drinks and fruit juice, red and processed meat, trans-fat and sodium.
Sample collection
Serum (n = 156), faecal (n = 80) and 24 h urine (n = 83) samples were collected preoperatively and at 3-months postoperatively (serum = 49, faecal = 27, urine = 30) in a non-fasted state (see Supplementary Fig. 1). Urine samples were collected in sterile containers over a 24-hour period finishing at 9am on the day of the study visit. Stool samples were collected using a Faecotainer® collection kit and stored on an ice pack provided, as close as possible to but not more than 6 h before the study visit. Serum was collected using red top BD Vacutainer® serum tubes (no additive) and processed according to manufacturer guidelines. After collection, samples were aliquoted and stored at −80 °C until analysis. Prior to freezing, a separate aliquot of homogenised stool was used to generate faecal water as follows: approximately 10 g of stool was added to 4 parts HPLC grade H2O (g/ml), vortexed at 2850 rpm for 15 min then centrifuged at 10,000 × g for 15 min at 4 °C. The resulting supernatant (faecal water) was frozen at −80 °C until analysis[14].
Analysis methods
1H-NMR spectroscopic analysis of metabolites
Sample preparation
Urine and faecal water samples were prepared for analysis by 1H-NMR spectroscopy as follows: frozen samples (−80 °C) were thawed, vortexed and then centrifuged at 1600 × g for 10 min to remove particulates and precipitated proteins. Faecal water supernatant was further filtered through Micro centrifuge filters (0.45 µm Nylon, Costar) at 16,000 × g for 15 min at 4 °C. 540 μL of each sample was mixed with 60 μL of 1.5 M KH2PO4 buffer (pH 7.4, 80% D2O) containing 1 mM of the internal reference standard, 3-(trimethylsilyl)-[2,2,3,3,-2H4]-propionic acid (TSP) and 2 mM sodium azide (NaN3), as described previously[15].After thawing, serum samples were centrifuged at 12,000 × g for 5 min at 4 °C. Subsequently, 300 µL of serum was mixed with 300 µL of 0.075 M NaH2PO4 buffer (pH 7.4) containing 0.8 mM of the internal reference standard, 3-(trimethylsilyl)-[2,2,3,3,-2H4]-propionic acid (TSP) and 3.1 mM sodium azide (NaN3), as described previously[15].
1H-NMR spectroscopy
1H-NMR spectroscopy was performed at 300 K on Bruker 600 MHz (urine and serum) and 800 MHz (faecal water) spectrometers (Bruker Biospin) using the following standard one-dimensional pulse sequence: RD – gz1 – 90° – t1 – 90° – tm – gz2 – 90° – ACQ[15]. The relaxation delay (RD) was set at 4 s, 90° represents the applied 90° radio frequency pulse, interpulse delay (t1) was set to an interval of 4 μs, mixing time (tm) was 10 ms, magnetic field gradients (gz1 and gz2) were applied for 1 ms and the acquisition period (AQA) was 2.7 s. Water suppression was achieved through irradiation of the water signal during RD and tm. For the urine samples, each spectrum was acquired using 4 dummy scans followed by 32 scans while faecal spectra were acquired using 256 scans and 4 dummy scans and collected into 64 K data points. A spectral width of 12,000 Hz was used for all the samples. Prior to Fourier transformation, the free induction decays (FIDs) were multiplied by an exponential function corresponding to a line broadening of 0.3 Hz. Serum samples were analysed by 1H-NMR spectroscopy using the standard one-dimensional pulse sequence described above and Carr-Purcell-Meiboom-Gill (CPMG) one dimensional pulse sequences. CPMG was used to attenuate broad, interfering peaks from lipids and proteins present in serum. The CPMG pulse sequence had the form RD – 90° – (t – 180° – t) n – ACQ. The acquisition parameters were set using the same settings as the standard 1D pulse sequence, with the spin-echo delay (t) set at 0.3 ms and 128 loops (n) performed. Continuous wave irradiation was applied at the water resonance frequency during the relaxation delay (RD).
Pre-processing
1H-NMR spectra were automatically corrected for phase and baseline distortions and referenced to the TSP singlet at δ 0.0 using TopSpin 3.1 software. Spectra were then digitized into 20 K data points at a resolution of 0.0005 ppm using an in-house MATLAB R2014a (Mathworks) script. Subsequently, spectral regions corresponding to the internal standard (δ −0.5 to 0.5) and water (δ 4.6–5) peaks were removed. In addition, the region containing urea (δ 5.4–6.3) was removed from the urinary and serum spectra due to its tendency to cross-saturate with the suppressed water resonance. All spectra were normalised using median fold change normalisation using the median spectrum as the reference[16].
Quantitative bile acid analysis
Quantitative analysis of 57 bile acids was performed using an established technique[17]. The method was adapted for analysis of bile acids in faecal samples.Bile acids were extracted from serum using the following method: 100 µL of serum was vortexed with 280 µL of MeOH. Samples were centrifuged at 14,000 × g for 15 min at 4 °C, followed by incubation at −20 °C for 20 min. Internal standards (16 deuterated bile acids) were added to the supernatant at a final concentration of 50 nM.Bile acids were extracted from faecal samples using the following method. Faecal samples were first freeze-dried. 100 mg of freeze-dried material was then placed in microtubes with 1 ml of 2:1:1 H2O: Acetonitrile (ACN): Isopropanol (IPA) and approximately 50 mg of 1 mm Zirconia beads. This underwent 3 × 30 seconds bead beating and a Biospec bead beater followed by centrifugation at 16,000 × g for 20 min at 4 °C. The supernatant was further filtered through Micro-centrifuge filters (0.45 µm Nylon, Costar) at 16,000 × g for 15 min at 4 °C. To ensure bile acid concentrations were within the dynamic range of the machine extracts were diluted 1:25 and 1:200 prior to analysis using the H2O:ACN:IPA mix. A mixture of internal standards (16 deuterated bile acids) was added to the filtered supernatant at a final concentration of 50 nM.
LC-MS machine conditions
BA analysis was performed using an ACQUITY ultra-performance liquid chromatography (UPLC) coupled to a Xevo triple quadrupole (TQ-S) mass spectrometer.For liquid chromatography, an ACQUITY BEH C8 column (1.7 μm, 100 mm × 2.1 mm) was used at an operating temperature of 60 °C. The mobile phase solvent A consisted of a 1:10 ACN:H2O, with 1 mM ammonium acetate and pH 4.15 adjusted with acetic acid. Mobile phase solvent B consisted of 1:1 ACN:IPA. The chromatographic gradient was as previously published[17].Mass spectrometry was performed in negative ionisation mode (ESI-) using the following parameters: capillary voltage 1.5 kV, cone voltage 60 V, source temperature 150 °C, desolvation temperature 600 °C, desolvation gas flow 1000 L/hr, and cone gas flow 150 L/hr. 57 bile acid species (36 non-conjugated, 12 taurine conjugated, 9 glycine conjugated) were assayed using multiple reaction monitoring (MRM). The transitions for each bile acid and deuterated internal standard were set as previously published[17].
Quantitative analysis of SCFAs and other carboxylic acids
A total of five short/medium chain fatty acids, three methyl-branched SCFAs and two hydroxyl carboxylic acids were analysed by GC-MS using a method adapted from Moreau et al.[18].After defrosting and mixing, 100 µL of urine / serum was aliquoted with 500 µL of methyl tert-butyl ether (MTBE) with 100ppm of internal standard (methyl stearate) and 2 µL of HCL. This was vortexed and then shaken for 20 min. Following this samples were centrifuged at 10,000 × g for 5 min at 4 °C. Next, 90 µL of the polar phase was placed into a silanised vial and vortexed with 150 µL of derivatiser N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide with 1% tert-butyldimethylchlorosilane (MTBSTFA + 1% TBDMSCI). This was then incubated for 45 min at 60 °C before aliquoting into silanised inserts for analysis.The method was modified to account for higher levels of SCFA in stool. After defrosting, 100 mg of stool was aliquoted with 1,000 µL of MTBE with 100ppm of internal standard (methyl stearate) and 4 µL of HCL. 30 µL of the polar phase was mixed with 150 µL of derivatiser.
GC-MS machine conditions
Derivatised samples were analysed by GC-MS with a Bruker triple quadrupole (TQ) GC-MS/MS. Helium was used as a carrier gas at a constant flow rate of 1.5 ml/min through the column. The injector temperature was 250 °C with a split ratio 1:10. The temperature of the oven was started at 40 °C and increased at the rate of 46 °C/min to 127 °C, 2 °C/min to reach 131 °C, 30 °C/min to reach 160 °C, then 50 °C/min to reach a final temperature of 300 °C. The transfer line to the mass spectrometer was set at 280 °C. Targeted analysis of the ten compounds and internal standard was performed in multiple reaction monitoring mode (MRN) using the settings outlined in Table 1.
Table 1
GC-MS conditions used to analyse SCFA and carboxylic acid compounds.
Compound
Quantifier (m/z)
Qualifier (m/z)
Collision energy (eV)
Acetate
117 → 75
117 → 47
10
Propionate
131 → 75
131 → 47
10
Butyrate / Isobutyrate
145 → 75
145 → 43
10
Valerate / Isovalerate
159 → 75
159 → 57
12
2 Methylbutyrate
159 → 75
159 → 57
12
2 Hydroxybutyrate
147 → 73
147 → 45
20
Caproate
173 → 75
173 → 81
15
Lactate
147 → 73
147 → 45
20
Methyl stearate (IS)
87 → 55
87 → 59
10
GC-MS conditions used to analyse SCFA and carboxylic acid compounds.
Quantitative serum metabolite analysis
Quantitative analysis of other metabolites in serum samples, including amino acids, biogenic amines, acylcarnitines, phosphatidylcholines, lysophosphatidylcholines and sphingolipids was performed using the Biocrates AbsoluteIDQ® p180 kit, according to the manufacturer guidelines[19]. Samples were analysed using flow injection analysis (FIA)-MS/MS and LC-MS/MS for different metabolite groups.In total, 10 µL of serum sample / PBS / calibration / QC and 10 µL of the ISTD mix (except in blanks) was added to each well. This was dried for 30 min under nitrogen flow. Following this, 50 µL of the derivatization solution was pipetted into each well. The plate was covered and incubated for 20 min, then dried for 60 min under nitrogen flow. Next, 300 µL of extraction solvent was added to each well, shaken for 30 min at 450 rpm, then centrifuged for 2 min at 500 × g. For the LC-MS/MS 150 µL was added to 150 µl H2O. For the FIA, 15 µL was added to 750 µL of FIA mobile phase. Both plates were shaken for 2 min at 600 rpm.
Machine conditions
Samples were analysed using a Waters I-Class UHPLC system and Waters Xevo TQ-S tandem mass spectrometer.For FIA-MS/MS (direct infusion): the FIA mobile phase consisted of Biocrates Solvent I + 290 mL MeOH. A 2 min isocratic method was used, starting at 0.15 mL/min for 0.1 min, gradually decreasing to 0.03 mL/min at 1 min, increasing to 0.2 mL/min at 1.5 min, to 0.8 mL/min at 1.60 min, and finally decreasing to 0.15 mL/min at 1.95 min. MS settings were: capillary voltage 3.2 kV, cone voltage 10 V, source offset 50 V, source temp 150 °C, desolvation temp 620 °C, cone gas 150 L/H, desolvation gas 1000 L/H, collision gas flow 0.15 mL/min, probe position 5 mm.For LC-MS/MS: A Waters Acquity UPLC BEHC18 1.7 µm 2.1 x 75 mm column was used. Mobile phase A: 1000 mL H2O + 2 mL formic acid (FA), Mobile phase B: 500 mL ACN + 1 mL FA. Gradient elution was used; starting at a flow rate of 0.8 mL/min with 100% A for 0.45 min, then changing in a linear gradient to 85% A at 3.3 min, to 30% A at 5.9 min, to 100% B at 6.05 min, flow then increased in a concave gradient to 0.9 mL/min 100% B by 6.20 min, remaining at 0.9 mL/min 100% B until 6.42 min, before decreasing back in a concave gradient to 0.8 mL/min 100% B at 6.52 min. The mobile phase was then changed in a concave gradient from 100% B to 100% A between 6.52 and 6.7 min and remained at 100% A 0.8 ml/min until 7.3 min. MS settings were: capillary voltage 3.9 kV, cone voltage 20 V, source offset 50 V, source temp 150 °C, desolvation temp 350 °C, cone gas 150 L/Hr, desolvation Gas 650 L/Hr, collision gas flow 0.15 mL/min, probe position 7 mm.Data were processed using targetlynx (Waters) and METIDQ (Biocrates; version Carbon) then exported as a CSV file for statistical analysis.
Faecal metagenomic analysis
DNA extraction
Faecal samples were stored at −80 °C prior to analysis. DNA was extracted using the MoBio PowerFaecal® DNA Isolation Kit, according to manufacturer’s instructions. In brief, DNA was extracted from two separate 0.25 g aliquots of mixed whole stool samples at each analysis point. Samples were homogenised in 2 ml bead beating tubes containing garnet beads. Cell lysis of host and microbial cells was facilitated through both mechanical collisions between beads and chemical disruption of cell membranes. The reagent to precipitate non-DNA organic and inorganic material was then applied. Lastly, DNA was captured on a silica spin column, washed and eluted for downstream analysis. Quality control of DNA quality and quantity was assessed using an Agilent 4200 TapeStation.
Shotgun sequencing
Shotgun sequencing was performed using an Illumina HiSeq 4000 with paired-end 150 bp reads. Library preparation was undertaken using the NEBNext Ultra II DNA Library Prep Kit. 15 dual index barcodes (unique at both ends) were custom-designed and ordered from Integrate DNA Technologies (IDT®). Quality control of prepped libraries was performed using the Promega GloMax® and QuantiFluor® dsDNA systems. Each of the “uniquely dual-indexed” libraries were pooled and run on a single lane of the HiSeq4000. A mean of 6.89 Gb sequence data was acquired for each of 120 samples (median 6.84, range 3.88 – 14 Gb).
Processing of sequence data
There are known lane-swapping issues with the HiSeq 4000, leading to duplication of some sequencing reads. For this reason, fastq files for each sample were subject to de-duplication using FastUniq[20]. Sequencing data were then processed using the Scalable Metagenomics Pipeline (ScaMP)[21], (https://github.com/jamesabbott/SCaMP). In brief, raw sequence data were assessed for the presence of adapter sequences and trimmed using Trim Galore! (Babraham Bioinformatics) to remove low-quality bases (Q < 20) from the 3ʹ end of reads and discard trimmed reads shorter than 100 nt. Quality control of trimmed reads was performed using FastQC (Babraham Bioinformatics). Reads that mapped with BWA-MEM[22] to human genome (hg19) were removed from read pairs, as ethical permission is not available for use of human data derived from metagenomes. Remaining reads were assumed to be microbial (bacteria, archaea, virus, fungi, protozoa) and processed further. Trimmed sequence data with human reads removed have been deposited with GenBank, EMBL and DDBJ databases under the BioProject accession number PRJNA473348.MetaPhlAn 2.6[23,24] was used to determine the bacterial and archaeal taxonomic composition/abundance for each sample. Metagenome assembly was carried out in two rounds using metaSPAdes 3.11.0[25], with an initial independent assembly carried out for each sample. Unassembled reads were then pooled and subjected to a second round of assembly to improve the representation of low-abundance sequences. Taxa were normalised to relative abundance for downstream analyses.Ab-initio gene prediction was carried out using MetaGeneMark[26,27]. The resulting predictions were translated, and the protein sequences clustered using the cluster-fast method of UCLUST[28], with a 95% identity cut-off. Centroid sequences from each cluster were used to form a non-redundant gene catalogue used for downstream analyses. Gene abundance in each sample was determined by alignment of the reads using BWA-MEM against the gene catalogue, determining the number of reads mapped to each gene sequence and normalising as described[29]. Functional annotation to KEGG pathways was carried out by mapping centroid sequences to the eggNOG-Mapper database[30] (version 4.5, downloaded on 1 March 2018) using Diamond software on our in-house server.Microbial gene richness (MGR) was determined as described previously[29,31]. Briefly, data were downsized to adjust for sequencing depth and technical variability by randomly selecting 7 million reads mapped to the gene catalogue (of 11,005,136 genes) for each sample and then computing the mean number of genes drawn over 30 random samplings.
Quantification and statistical analysis
Pre-processing
To correct for dilution differences between samples normalisation procedures were applied. Global metabolite (1H-NMR spectra) data sets were corrected for dilution effects using median fold change normalisation[16]. Scaling to unit variance was then applied to serum and urine data sets, while pareto scaling was used for faecal datasets, due to the presence of dominant and variable oligosaccharide resonances. Targeted metabolites measured within urine samples were corrected for dilutional differences using osmolality and creatinine measurements[32]. Metagenome data were expressed as relative abundance. Taxa with low abundance (present in <30% of both subgroups) were excluded from downstream statistical analyses.
Univariate analysis
Due to the non-parametric nature of the results, differences between paired samples pre/post intervention in clinical data, quantified metabolites and within the gut microbiota were assessed for significance using the Wilcoxon Rank test (two-sided). Differences in non-paired data were assessed using the Mann–Whitney U test (two-sided). P-values were adjusted for multiple testing using the Benjamini-Hochberg (BH) False Discovery Rate method (pFDR). Phylogenetic Trees were generated to illustrate significant gut microbiota changes using the GraPhlAn[33] script in Python.
Multivariate analysis
Multivariate statistical analysis of normalised 1H-NMR spectra was performed using SIMCA 15 (Umetrics)[16]. Principal component analysis (PCA) was used to provide an overview of the data. Orthogonal Partial Least Squares—Discriminant Analysis (OPLS-DA) models were established based on one predictive component and one orthogonal component to discriminate between samples from participants with and without T2D. Unit variance scaling was applied to the 1H-NMR spectral data. The fit and predictivity of the models obtained were determined by the R2X and Q2Y values respectively. Significant metabolites differentiating between groups were obtained from 1H-NMR OPLS-DA models after investigating 1H-NMR signals with correlation coefficient values higher than 0.35. Jack-knifed 95% confidence intervals of the coefficients were used to confirm statistical significance of the variables.Paired global metabolic data, pre- and post-intervention, were analysed using Repeated Measures, Monte-Carlo Cross-Validation, PLS-DA (RM-MCCV-PLSDA)[34,35] using covariate adjusted projection to latent structures in MATLAB. Data were centred and scaled to account for the repeated-measures design. 1000 MCCV models were generated and used to calculate the mean cross-validated predictive component score (Tpred) and variance for each sample[35]. The fit and predictivity of the models obtained was determined by the R2X and Q2Y values respectively. Gaussian kernel density estimates of the Tpred in each group were generated for visual interpretation[35]. A total of 25 bootstrap resamplings in each of the 1000 models was used to estimate the variance and mean coefficient for each variable and derive a p value for each variable accordingly[35]. Benjamini-Hochberg false discovery corrections were performed and a variable was considered significant with a false discovery rate value (q) ≤ 0.01. Manhattan plots showing -log10(q) x sign of the variable regression coefficient for each variable within each RM-MCCV-PLSDA model were generated, with dotted lines added to illustrate the q value significance cut off level on the log10 scale.Exploration of gut microbiota taxa was performed using Principal coordinates analysis (PCoA) of Bray-Curtis dissimilarity matrices (β-diversity) using the Vegan[36] function in R. Significance of group separation in β-diversity was assessed by permutational multivariate analysis of variance (PERMANOVA). Nested PERMANOVA was used for paired analyses pre- and post-intervention to account for the repeated measures design.
Metabolite identification
A combination of data-driven strategies such as such as SubseT Optimization by Reference Matching (STORM)[37] and Statistical TOtal Correlation SpectroscopY (STOCSY)[38] and analytical identification strategies were used to aid structural identification of significant discriminatory metabolites. Specifically, a catalogue of 1D 1H-NMR sequence with water pre-saturation and 2D NMR experiments such as J-Resolved spectroscopy, 1H-1H TOtal Correlation SpectroscopY (TOCSY), 1H-1H COrrelation SpectroscopY (COSY), 1H-13C Hetero-nuclear Single Quantum Coherence (HSQC) and 1H-13C Hetero-nuclear Multiple-Bond Correlation (HMBC) spectroscopy were performed. Finally, where possible, metabolites were confirmed by in situ spiking experiments using authentic chemical standards. See Supplementary Fig. 2 for an example 1H-NMR spectrum labelled with identified metabolites.Relative concentrations of identified metabolites from 1H-NMR datasets were calculated from intensity measurements of a representative spectral peak of the metabolite, ensuring no overlap with signals from other metabolites.
Euler diagram of metabolites
A Euler diagram of identified metabolites from Serum, Urine and Faecal biofluids associated with Bariatric Surgery, Weight / BMI, T2D and Diet (pFDR <0.05) was generated using Eulerr in R (version 6.1.1)[39].
DIABLO integration of omics datasets
To probe relationships between data sets we used Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO)[40], implemented through the mixOmics[41] package in R. DIABLO extends sparse generalized canonical correlation analysis (sGCCA)[42] to a classification framework. Resulting in a multi-omics integrative method that simultaneously identifies key variables correlated across different data types while discriminating between phenotypic groups.Normalised datasets (gut microbiota species taxa, gut microbiota KEGG pathways (Levels 2-3) and quantified metabolites from urine, serum and faecal biofluids) were used with a full weighted design matrix where correlation was 0.1 between data matrices and 1 for the Y outcome, to result in a correlated and discriminant molecular signature[41]. To account for the repeated measures (pre/post procedure) experimental design, multilevel DIABLO (mDIABLO) models were constructed using within-subject variation matrices for each omics dataset[40,43,44]. Classification performance was assessed using the balanced error rate (BER) from cross-validation of samples, with BER = 0.5 x (false positive rate + false negative rate). BER scores range from 0-1, with a perfect classification model scoring 0, a random predictor 0.5 and a model with systematically incorrect predictions 1.
Gut microbiota-metabolome associations
Spearman’s correlations between BMI, HbA1c and Microbiota-Metabolome datasets were generated in MATLAB. Partial Spearman’s correlations were also performed to adjust for covariates. Corrected p values (pFDR) were used to select significant correlations. Significant (pFDR<0.01) first order correlations to BMI and HbA1c and cross-correlations between these variables were displayed using Cytoscape 3.8.0.[45]. Correlations between gut microbiota, metabolite and dietary datasets were displayed using ComplexHeatmap in R[46]. Correlations with a pFDR value <0.05 are displayed, correlations with a pFDR <0.01 are highlighted. Hierarchical clustering of correlations was performed using Euclidean distances.
Authors: Vanessa K Ridaura; Jeremiah J Faith; Federico E Rey; Jiye Cheng; Alexis E Duncan; Andrew L Kau; Nicholas W Griffin; Vincent Lombard; Bernard Henrissat; James R Bain; Michael J Muehlbauer; Olga Ilkayeva; Clay F Semenkovich; Katsuhiko Funai; David K Hayashi; Barbara J Lyle; Margaret C Martini; Luke K Ursell; Jose C Clemente; William Van Treuren; William A Walters; Rob Knight; Christopher B Newgard; Andrew C Heath; Jeffrey I Gordon Journal: Science Date: 2013-09-06 Impact factor: 47.728
Authors: Robert A Koeth; Zeneng Wang; Bruce S Levison; Jennifer A Buffa; Elin Org; Brendan T Sheehy; Earl B Britt; Xiaoming Fu; Yuping Wu; Lin Li; Jonathan D Smith; Joseph A DiDonato; Jun Chen; Hongzhe Li; Gary D Wu; James D Lewis; Manya Warrier; J Mark Brown; Ronald M Krauss; W H Wilson Tang; Frederic D Bushman; Aldons J Lusis; Stanley L Hazen Journal: Nat Med Date: 2013-04-07 Impact factor: 53.440
Authors: Matthew D Barberio; G Lynis Dohm; Walter J Pories; Natalie A Gadaleta; Joseph A Houmard; Evan P Nadler; Monica J Hubal Journal: Front Endocrinol (Lausanne) Date: 2021-10-06 Impact factor: 5.555