Julia Rechenberger, Patroklos Samaras, Anna Jarzab, Juergen Behr, Martin Frejno, Ana Djukovic, Jaime Sanz, Eva M González-Barberá, Miguel Salavert, Jose Luis López-Hontangas, Karina B Xavier, Laurent Debrauwer, Jean-Marc Rolain, Miguel Sanz, Marc Garcia-Garcera, Mathias Wilhelm, Carles Ubeda, Bernhard Kuster.
Abstract
The microbiome has a strong impact on human health and disease and is therefore increasingly studied in a clinical context. Metaproteomics is also attracting considerable attention, and such data can now be generated efficiently owing to improvements in mass spectrometry-based proteomics. As we discuss in this study, major challenges remain, notably in data analysis. Here, we analyzed 212 fecal samples from 56 hospitalized acute leukemia patients with multidrug-resistant Enterobacteriaceae (MRE) gut colonization using metagenomics and metaproteomics. This is one of the largest clinical metaproteomic studies to date, and the first metaproteomic study addressing the gut microbiome in MRE-colonized acute leukemia patients. Based on this substantial data set, we discuss major current limitations in clinical metaproteomic data analysis to provide guidance to researchers in the field. Notably, the results show that public metagenome databases are incomplete and that sample-specific metagenomes improve results. Furthermore, biological variation is tremendous, which challenges clinical study designs and argues that longitudinal measurements of individual patients are a valuable future addition to the analysis of patient cohorts.
Keywords: clinical proteomics; data analysis; human gut microbiome; mass spectrometry; metaproteome; multi-omics; multidrug-resistant Enterobacteriaceae; proteomics
Year: 2019 PMID: 30626002 PMCID: PMC6473847 DOI: 10.3390/proteomes7010002
Source DB: PubMed Journal: Proteomes ISSN: 2227-7382
Figure 1. Study design, proteomic workflow and data processing pipeline. (A) Acute leukemia patients were sampled at weekly intervals during hospitalization. In total, 212 fecal samples from 56 patients with MRE gut colonization were analyzed, together with additional information about age, gender, and treatment conditions. (B) For protein extraction, fecal samples were divided into supernatant and pellet fractions. Bacterial cells in the pellet fraction were lysed by ultrasonication, and the proteins of both fractions were digested in gel. Thereafter, samples were measured by LC-MS/MS. (C) Raw files were searched against four different databases in separate and combined MaxQuant searches, post-processed with Percolator, and analyzed quantitatively with functional and taxonomic annotation.
Figure 2. In silico comparison of four different databases. Four different databases (Integrated Genome Reference Catalog (IGC), SWISS-PROT bacteria, SWISS-PROT human and sample-specific metagenome-based databases) were digested in silico, and the possible search space was compared. (A) Venn diagram of the peptides resulting from in silico digestion, comparing the three bacterial databases, and all bacterial databases combined versus the peptides from the in silico digested human database. (B) Number of shared peptides in the 212 sample-specific databases against the percentage of samples. The right axis indicates the percentage of the average sample-specific database to which the number of shared peptides corresponds.
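The in silico comparison described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the protease is not named here, so the sketch assumes a simplified trypsin rule (cleavage after K or R unless followed by P), and the function names `tryptic_digest` and `digest_database`, the length limits, and the toy sequences are all hypothetical.

```python
def tryptic_digest(sequence, missed_cleavages=0, min_len=7, max_len=30):
    """Return the set of peptides from a simplified tryptic digest.

    Assumed rule (not taken from the source): cleave C-terminal to K or R,
    except when the next residue is P.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    peptides = set()
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


def digest_database(proteins, **kwargs):
    """Union of peptides over a {protein_id: sequence} mapping."""
    peptides = set()
    for sequence in proteins.values():
        peptides |= tryptic_digest(sequence, **kwargs)
    return peptides


# Toy databases (hypothetical sequences, for illustration only).
db_a = {"protA": "MAAAAAAK" + "GGGGGGGR" + "PLLLLLLK"}
db_b = {"protB": "MAAAAAAK" + "TTTTTTTK"}

peptides_a = digest_database(db_a)
peptides_b = digest_database(db_b)
shared = peptides_a & peptides_b    # overlap region of a Venn diagram
only_a = peptides_a - peptides_b    # peptides unique to database A
```

Set intersections and differences of this kind give the counts shown in a Venn diagram of search spaces. Real databases contain millions of sequences, so a production comparison would likely need a more memory-efficient representation than in-memory Python sets.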
Figure 3. Comparing the influence of database selection on peptide identification. (A) Multi-scatter plot of identified peptides at 1% PSM and peptide FDR for the four different databases and all databases combined. Identifications for the pellet and supernatant fractions of each sample are shown separately. The Pearson correlation is shown in the top left of each box (highest p value is 8.2 × 10^−21). The Venn diagram shows the overlap of identified peptides over all samples for the three bacterial databases and the combination of all four databases. (B) Histogram of the number of identified peptides in the supernatant or pellet of each sample. Raw files are sorted according to the number of identified peptides. (C) Bar plot of the total number of identified peptides over all samples per database. ‘All DBs additive’ shows the theoretical number of identifications obtained by summing all unique peptides of the three bacterial database types. (D) Polynomial curve fit for the number of shared peptides across all samples for the different databases, shown separately for the supernatant and pellet fractions of the samples.
Figure 4. Description of the taxonomic and functional composition. (A) Box plot of the Pearson correlation between the taxonomic composition detected at the class level by 16S rRNA sequencing and by proteomic analysis for each sample. Both: supernatant and pellet of each sample combined; Supernatant: only the supernatant fraction of each sample; Pellet: only the pellet fraction of each sample. (B) Pie chart of the most abundant taxonomic classes identified over all samples. (C) Bar plot of average spectral counts for the 10 most abundant bacterial gene ontology (GO) terms over all samples. (D) Bar plot of average spectral counts for the 10 most abundant human GO terms over all samples.
Figure 5. Sample variability. (A) Heatmap of Jaccard similarities based on the presence/absence of bacterial peptides for the six patients with the most sampling time points. Dendrogram clustering is based on the Pearson correlation of Jaccard distances. The bottom triangle shows the supernatant fraction and the top triangle the pellet fraction of the samples. (B) Heatmap of Jaccard similarities based on the presence/absence of human peptides for the six patients with the most sampling time points. Dendrogram clustering is based on the Pearson correlation of Jaccard distances. The bottom triangle shows the supernatant fraction and the top triangle the pellet fraction of the samples. (C) Box plot of Jaccard similarities for bacterial peptides of paired samples with different time distances between sampling points. (D) Box plot of Jaccard similarities for human peptides of paired samples with different time distances between sampling points.
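The Jaccard similarity used in this figure is the size of the intersection of two peptide sets divided by the size of their union. The short Python sketch below shows this for presence/absence data. The function names `jaccard_similarity` and `pairwise_jaccard`, the sample labels, and the peptide strings are hypothetical, and returning 0.0 for two empty sets is a convention assumed here, not one stated in the source.

```python
from itertools import combinations


def jaccard_similarity(peptides_a, peptides_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two peptide collections."""
    a, b = set(peptides_a), set(peptides_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both samples are empty
    return len(a & b) / len(union)


def pairwise_jaccard(samples):
    """Similarity for every unordered pair in a {sample_id: peptides} mapping."""
    return {
        (s, t): jaccard_similarity(samples[s], samples[t])
        for s, t in combinations(sorted(samples), 2)
    }


# Toy presence/absence data (hypothetical sample labels and peptides).
samples = {
    "patient1_week1": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"},
    "patient1_week2": {"PEPTIDEB", "PEPTIDEC", "PEPTIDED"},
    "patient2_week1": {"PEPTIDEE"},
}
similarities = pairwise_jaccard(samples)
```

The corresponding Jaccard distance is one minus the similarity. A value of 1 means two samples share exactly the same identified peptides, and 0 means they share none.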
Figure 6. Comparing taxonomic and functional data for longitudinal samples. Taxonomic class abundances retrieved from proteomic and 16S rRNA data, as well as GO term abundances, were compared across samples from two patients over time. In addition, the antibiotic treatment at each sampling time point and the type of hospital admission (i.e., chemotherapy or transplantation) at the time of sampling are indicated.