Inferring early-life host and microbiome functions by mass spectrometry-based metaproteomics and metabolomics.

Veronika Kuchařová Pettersen, Luis Caetano Martha Antunes, Antoine Dufour, Marie-Claire Arrieta.

Abstract

Humans have a long-standing coexistence with microorganisms. In particular, the microbial community that populates the human gastrointestinal tract has emerged as a critical player in governing human health and disease. DNA and RNA sequencing techniques that map taxonomical composition and genomic potential of the gut community have become invaluable for microbiome research. However, deriving a biochemical understanding of how activities of the gut microbiome shape host development and physiology requires an expanded experimental design that goes beyond these approaches. In this review, we explore advances in high-throughput techniques based on liquid chromatography-mass spectrometry. These omics methods for the identification of proteins and metabolites have enabled direct characterisation of gut microbiome functions and the crosstalk with the host. We discuss current metaproteomics and metabolomics workflows for producing functional profiles, the existing methodological challenges and limitations, and recent studies utilising these techniques with a special focus on early life gut microbiome.
© 2021 The Authors.

Keywords:  Early life human microbiome; Metabolomics; Metagenomics; Metaproteomics; Microbial colonisation

Year: 2021; PMID: 35024099; PMCID: PMC8718658; DOI: 10.1016/j.csbj.2021.12.012

Source DB: PubMed; Journal: Comput Struct Biotechnol J; ISSN: 2001-0370; Impact factor: 7.271


Introduction

The mammalian gastrointestinal tract accommodates one of the densest microbial populations known, the gut microbiome. Each mammalian species, including humans, has a unique microbial community that has coevolved with its host and is finely adapted to the species' lifestyle [1]. The trillions of microbial cells, including bacteria, fungi, protozoa, archaea, as well as viruses, all take advantage of the nutrient-rich gut environment, but it is mainly for bacteria that there is evidence of benefits to host physiology. Commensal bacteria augment host functions by breaking down indigestible food components, synthesising essential vitamins, stimulating the immune system, and protecting against invading pathogens [2], [3], [4]. Still, the relationship that mammalian hosts share with their gut microbiomes is intricate, and research has so far provided only initial clues about the functions involved in microbiome-host crosstalk. The gut microbiome has been linked to the development and progression of both infectious [5] and chronic non-communicable diseases [6], [7], including cancer [8], autoimmune [9], and neurological disorders [10]. Practical knowledge about the gut microbiome is highly relevant for medicine because characteristics of the gut microbiome can complement clinical diagnosis and can themselves be targets for therapeutic interventions. As a diagnostic adjunct, microbially derived biomarkers could inform on treatment response [11], serve as a window into the side-effects of antibiotics [12] and other drugs [13], or provide a baseline measurement before therapy initiation [14]. Most importantly, because of its inherent connection to human physiology, the gut microbiome will serve as a signature of diseases, based on which targeted therapeutic interventions such as guided nutritional plans can be recommended [15].
In this minireview, we outline advances in gut microbiome characterisation using high-resolution liquid chromatography–mass spectrometry (LC–MS) for large-scale profiling of proteins and metabolites. Initially, we discuss steps in a typical workflow used in LC–MS-based metaproteomics and metabolomics (Fig. 1). Although microbial colonisation and the protein and metabolite profiles differ along the gastrointestinal tract [16], we concentrate on stool-based approaches because of their non-invasive nature and their application for biomarker discovery. At the same time, feces are a heterogeneous material rich in various macromolecules and small metabolites, which introduces challenges for instrumental analysis and subsequent computational workflows. We conclude the review with recent studies using LC–MS omics for gut microbiome characterisation in the pediatric population (Table 1), which have enabled deeper biological insights into microbe-microbe and host-microbe interactions during early life.
Fig. 1

Key steps during functional investigations of the human microbiome by techniques based on liquid chromatography-mass spectrometry. The workflow starts with a robust design of a clinical study and experimental controls. Sample transport chain, storage, and pretreatment methods need to be carefully evaluated as any of these steps might influence the composition of microbial cells and different biomolecules. Mass spectra acquisition is followed by searching the data against a sample-specific database and statistical filtering of false-positive matches. Metaproteome and metabolome datasets can be analysed by different bioinformatics and statistical approaches to extract biological information (see text for details). Finally, further experimental design is needed to validate identified proteins and metabolites significantly associated with a specific phenotype. Figure was created with Biorender.com.

Table 1

Metaproteomic and metabolomic studies describing early life gut microbiome functions.

Reference: Henderickx et al. 2021 [34]
Main objective(s): To characterise GIT functionality and maturation of preterm infants by GIT enzyme activity assays and metaproteomics.
Study population: Preterm infants n = 40 (GA 24–33), term infants n = 3 (GA 37–42)
Samples collected (age): Gastric aspirates (PW 1–2), feces (PW 1–6)
Sample storage and pre-processing: Samples were frozen at −20 °C after collection, and stored at −80 °C.
LC–MS/MS analysis (instrument, software and database used): nano-LC–LTQ-Orbitrap-MS; MaxQuant; in-house database based on 16S rRNA sequencing and the Human Microbiome Project reference genomes
Detected peptides, proteins, or metabolites: 89,294 unique proteins, 2317 protein groups (886 human or bovine, 1431 bacterial)
Key findings: The fecal proteome of preterm infants was deprived of GIT barrier-related proteins compared to term infants. In preterm infants, bacterial oxidative stress proteins were increased compared to term infants and higher birth weight correlated with higher relative abundance of bifidobacterial proteins.

Reference: Lay et al. 2021 [160]
Main objective(s): To elucidate characteristics (metabolome, 16S rRNA profile, metagenome, metatranscriptome) of a compromised microbiome and study the role of a synbiotic in microbiome restoration.
Study population: 127 infants born by elective C-section; 26 vaginally born infants
Samples collected (age): Feces (PW 1–22)
Sample storage and pre-processing: Individual stool samples were lyophilized and equal amount of dry weight was combined to prepare a pool sample for each treatment group.
LC–MS/MS analysis (instrument, software and database used): UPLC–MS (QExactive); KEGG and HMDB for metabolomics
Detected peptides, proteins, or metabolites: Not given
Key findings: Gut microbiome acquired during elective C-section birth was adapted to a more oxidative environment characterised by reactive oxygen species metabolism, biosynthesis of lipopolysaccharides and the absence of detection of genes, transcripts involved in the metabolism of milk carbohydrates.

Reference: Petersen et al. 2021 [119]
Main objective(s): Investigation of the meconium metabolome to identify components of the neonatal gut niche that contribute to allergic sensitization.
Study population: 100 infants of the CHILD study
Samples collected (age): Meconium (the first stool passed after birth)
Sample storage and pre-processing: Not reported besides storage at −80 °C before metabolomic analysis
LC–MS/MS analysis (instrument, software and database used): UPLC–MS/MS; proprietary analysis done at Metabolon, Inc.
Detected peptides, proteins, or metabolites: 714 metabolites
Key findings: Newborns who develop immunoglobulin E-mediated allergic sensitization by 1 year of age had a less-diverse gut metabolome at birth, and specific metabolic clusters were associated with both protection against atopy and the abundance of key taxa driving microbiota maturation.

Reference: Cortes et al. 2019 [172]
Main objective(s): To develop metaproteomics approach for assessment of biological phenotype and metabolic status, as a functional complement to DNA sequence analysis.
Study population: 8 infants
Samples collected (age): Feces, one timepoint (2–5 months of age)
Sample storage and pre-processing: 4 °C for 1 h, homogenised stool aliquots kept at −80 °C; differential centrifugation to enrich for bacterial cells
LC–MS/MS analysis (instrument, software and database used): Fractionation of the peptide mixes by strong cation exchange chromatography; nanoAcquity UPLC–MS (Q Exactive); Mascot software; custom database based on 16S rRNA sequencing
Detected peptides, proteins, or metabolites: 15,250 unique peptides, 2154 protein groups
Key findings: Metaproteomics data yielded more refined information on microbial composition than 16S rRNA gene sequencing of the same samples.

Reference: Levan et al. 2019 [36]
Main objective(s): To test whether elevated faecal concentrations of 12,13-diHOME identified in infants by targeted metabolomics promote allergic inflammation in experimental models.
Study population: 91 infants
Samples collected (age): Feces (first month of life)
Sample storage and pre-processing: Initial condition of storage not given, later stored at −80 °C
LC–MS/MS analysis (instrument, software and database used): LC–MS (LTQ-Orbitrap-XL)
Detected peptides, proteins, or metabolites: Faecal oxylipin (9,10-diHOME and 12,13-diHOME)
Key findings: An increase in the copy number of bacterial epoxide hydrolase genes linked to 12,13-diHOME production, or the concentration of 12,13-diHOME in the faeces of neonates was found to be associated with an increased probability of developing atopy, eczema or asthma during childhood.

Reference: Brown et al. 2018 [35]
Main objective(s): To study the premature infant gut colonization process by metagenomics and metaproteomics.
Study population: 35 preterm infants (GA 24–32)
Samples collected (age): Feces (first 3 months of life)
Sample storage and pre-processing: Direct freezing at −80 °C; microbial cells enrichment by filtration
LC–MS/MS analysis (instrument, software and database used): LC–MS/MS (LTQ-Orbitrap Elite MS); MyriMatch v2.1; matched metagenome-based database
Detected peptides, proteins, or metabolites: 8691 protein families
Key findings: Infants were found to be colonized by similar microbes, but each underwent a distinct colonization trajectory. Related microbes colonizing different infants were found to have distinct proteomes, indicating that microbiome function is not only driven by which organisms are present, but also largely depends on microbial responses to the unique set of physiological conditions in the infant gut.

Reference: Zwittink et al. 2017 [71]
Main objective(s): To study microbiota development during the first six weeks in preterm infants by 16S-rRNA gene sequencing and metaproteomics, and to identify the factors associated with this development.
Study population: 10 preterm infants (GA 25–30)
Samples collected (age): Feces (PW 1–6)
Sample storage and pre-processing: Direct freezing, temporal storage at −20 °C until transfer to −80 °C
LC–MS/MS analysis (instrument, software and database used): nano-LC–LTQ-Orbitrap-MS; MaxQuant; custom database based on the bacterial part of the Human Microbiome Project (Uniprot), 87 bacterial species, 438,537 sequences
Detected peptides, proteins, or metabolites: 953 bacterial proteins
Key findings: GA-dependent microbial signature differentiated between extremely preterm (25–27 GA) and very preterm (30 GA) infants. In very preterm infants, the intestinal microbiota developed toward a Bifidobacterium-dominated community and associated with high abundance of proteins involved in carbohydrate and energy metabolism. Extremely preterm infants remained predominantly colonized by facultative anaerobes and associated with proteins involved in membrane transport and translation.

Reference: Young et al. 2015 [170]
Main objective(s): To determine time-dependent functional signatures of microbial and human proteins during early colonization of the gut.
Study population: One preterm infant (GA 28)
Samples collected (age): Feces (PW 1–3)
Sample storage and pre-processing: Immediately stored at −80 °C until analysis
LC–MS/MS analysis (instrument, software and database used): nano-2D-LC–MS/MS (LTQ Orbitrap Velos); SEQUEST & DTASelect; database derived from metagenome data
Detected peptides, proteins, or metabolites: 16,605 peptides, and 4031 proteins (per run)
Key findings: Detected human proteins included those responsible for epithelial barrier function and antimicrobial activity. Neutrophil-derived proteins increased in abundance, suggesting activation of the innate immune system.

Abbreviations used: GA - gestational age; GIT - gastrointestinal tract; HMDB - Human Metabolome Database, PW - Postnatal week.

Need for functional description of the gut microbiome

The potential for exploiting the gut microbiome in biomedical applications is immense; however, host-microbiome molecular interactions are still largely uncharacterised. This is in part because of the microbiome's multi-layered complexity. The gut microbiome configuration depends on a metabolically active microbial community (microbiota), which dynamically responds to fluctuating physicochemical properties of the gut [17], the host control of its composition [18], and other potential factors influencing the host-microbiota interactions such as microbial pathogens and medications. Additionally, only a small portion of human gut microorganisms have been cultured in specialized laboratories, and most of the microbiota remains uncharacterised by cultivation techniques. For this reason, culture-independent approaches such as profiling taxonomical marker genes [16S ribosomal RNA gene for bacteria and the internal transcribed spacer (ITS) region for fungi] have gained a major foothold among methods for microbiota characterisation. Although amplicon sequencing based on 16S and ITS is limited to describing taxonomic composition, researchers can use bioinformatics methods such as PICRUSt (https://picrust.github.io/picrust) to predict the microbial community functional profiles based on the taxa found [19]. However, amplicon sequencing and predictive functional profiling tools are restricted by low taxonomic resolution. Therefore, more robust whole genome shotgun sequencing is used to answer specific biological questions about less abundant taxa [20], interindividual strain transfer [21], or prevalence of gene families such as those involved in antimicrobial resistance [22]. Besides delivering more refined information on the microbiome taxonomic composition, metagenomics gives insights into the functional capabilities of the microbiome by profiling the relative abundances of genes within the microbial community.
Still, similar to other omics strategies, many challenges remain, including efforts to answer questions about lower-abundance taxa. Aspects such as sequencing depth, human DNA content removal, and targeted enrichment methods for less abundant microbial taxa therefore need to be considered in the design of metagenomics experiments [23]. DNA sequencing techniques will continue to be indispensable in microbiome studies. Still, conclusions about microbiome function derived from metagenomics predictions must be treated as hypotheses requiring functional validation [24]. Despite an earlier belief that the gut microbiome functional profile is more stable and generally conserved, based on the bioinformatic annotation of putative protein-coding genes [25], studies measuring mRNA or proteins have demonstrated that the metatranscriptome and metaproteome display greater variability and sensitivity to perturbation when compared to the information content of the metagenome [26], [27], [28]. This is partly because of an imperfect coupling of the gut microbiome composition and function [29], which stems from complex regulatory networks along the gene-transcript-protein expression path. Although metatranscriptomics gives greater insights into the functional potential of the microbial community than metagenomics [26], not all transcripts are translated to proteins in the same manner. For example, timing of expression (transcriptional regulation) and various mechanisms of post-transcriptional regulation, such as differences in mRNA stability, will affect transcript levels [30]. Similarly, protein abundance is a combined result of protein synthesis and degradation, the latter being ignored in metatranscriptomics. Accordingly, a popular strategy to gain insights into microbiome function has been the integration of DNA- or RNA-based information with high-throughput measurements of microbial metabolic products and proteins, i.e., metabolomics and metaproteomics.

Proteins and metabolites as microbiome functional descriptors

Each microbial cell responds to the unique physicochemical conditions of the host by adjusting its protein synthesis, metabolism, and secretion of biomolecules that facilitate its adaptation to the environment. Proteins carry out most functions in the cell (e.g., catalysis of biochemical reactions, transport, maintenance of cell structure), and protein amounts reflect the cell's most recent activities. Metaproteomics, the characterisation of the entire set of proteins accumulated by all community members at a given point in time [31], has emerged as the most relevant approach to characterise gut microbiome function. In addition, metaproteomics can simultaneously detect host and microbial proteins and aid in the characterisation of host-microbiome interactions [32]. Besides proteins, the collection of small molecules found in feces, the fecal metabolome, can be seen as a recording of the recent chemical communication between the microbial community and its host. Metaproteomics and metabolomics thus provide insight into the metabolic and physiological state of both the host and microbiome, and give a direct description of their phenotypes (Fig. 2).
Fig. 2

Advantages and challenges of liquid chromatography-mass spectrometry (LC–MS) omics. Metaproteomics and metabolomics complement other meta-omic approaches such as metagenomics, which assess the diversity and functional potential of microorganisms but cannot observe their actual phenotypes. Further, metaproteomics and metabolomics can identify proteins and metabolites originating from either the host or microbiome and give indications of their interactions. However, a wide range of metabolites is common to the human host and gut microbes, and their origin thus cannot be discriminated by metabolomics. A significant advantage of LC–MS omics is their ability to characterise cellular metabolism at the molecular level for different microbial species and provide system-level information for the host. Besides these advantages, five challenges of LC–MS omics are listed. These include the chemical complexity of fecal samples, lack of standardisation, and especially bioinformatics and statistical challenges associated with large datasets. Moreover, even if different omics analyses are done on the same sample, complex gene expression regulation processes will hinder direct comparison between DNA abundance and the levels of transcripts and proteins, and consequently the interpretation of omics data. Also, complex sample preparation protocols and incomplete databases of protein cleavage sites and other post-translational modifications hinder the use of metaproteomics in the discovery of novel regulatory mechanisms. In metaproteomics, a formidable issue is the assignment of shared peptides to proteins that originate from different microbial species. Finally, both metabolomics and metaproteomics face the challenge of detecting low-abundance molecules in complex mixtures. For the sake of clarity, the last two points are not illustrated in the figure. Figure was created with Biorender.com.

Functional characterisation of stool is an attractive option to assess human health and disease owing to the non-invasive nature of sampling and the broad coverage of biomolecules reflecting different physiological processes. Both metaproteomics and metabolomics have been used in clinical research to discover biomarkers that might facilitate early detection and diagnostics of various diseases. For example, several studies demonstrated the potential of proteins and peptides present in stool as biomarkers for colorectal cancer and other bowel-related diseases in the adult population [33]. In the pediatric population, a few metaproteomics studies reported findings on promising protein biomarkers for gastrointestinal tract maturation [34], [35]. Further, detection of metabolites identified as key mediators of the interactions between the gut microbiome and the host during early life is critical for disease prevention. A potential biomarker for early prediction of disease risk is 12,13-diHOME, a linoleic acid metabolite produced by certain gut bacteria that was elevated in neonates who developed asthma during childhood [36]. On the other hand, indole-3-lactic acid has been associated with beneficial microbiota in infants, decreased inflammation in intestinal epithelial cells [37], and beneficial immunoregulation [38]. However, the above-mentioned metabolites were identified in small cohorts, and future studies must address their validation on a larger number of clinical samples. Overall, although there are still limited numbers of metaproteomics and metabolomics studies of human diseases, the methodologies and available analytical tools have recently been greatly refined, which encourages further in-depth characterisation of the gut microbiome.

Mass spectrometry-based metaproteomics and metabolomics

General principles. The combination of liquid chromatography (LC) and mass spectrometry (MS) is a powerful analytical method for large-scale identification and quantification of biomolecules. LC–MS can be used in a global discovery mode to identify thousands of compounds or in a targeted manner for detecting specific analytes at levels of a few parts per billion [39]. In a prototypical LC–MS experiment, a solution containing analytes of interest is first separated on an LC column to reduce sample complexity. Then, the LC effluent is directed to the mass spectrometer, where it is nebulised, desolvated, and ionised by an ionisation source, allowing small biomolecules to enter the gas phase as charged particles. Under applied electromagnetic fields, the charged particles migrate under a high vacuum through a series of mass analysers, where they are sorted according to their mass-to-charge ratio (m/z). The resulting peak patterns define a fingerprint of the original sample. Tandem mass spectrometry (MS/MS), an analytical setup where two or more MS acquisitions are arranged sequentially, is especially useful for analysing complex biological mixtures and when greater certainty of analyte identification is desired. In the first MS, precursor ions of selected m/z are isolated from the rest of the ions and fragmented by collision with an inert gas into product ions, which are mass analysed in the second MS. This transition from precursor to product ions is specific for each compound and distinguishes even minor changes in molecular structure [40]. Although this method provides a high degree of selectivity, some highly similar isomers still cannot be distinguished, and additional information (discussed in chapter 6 – Metabolite identification) or alternative methods (nuclear magnetic resonance) are required for structural elucidation.
Data acquisition. Data-dependent acquisition (DDA) and data-independent acquisition (DIA) are the two standard modes used in the untargeted identification of biomolecules based on high-resolution mass spectrometry [41], [42]. When using DDA, MS/MS data acquisition occurs sequentially, and the resulting data are used to search an existing database. The main DDA advantages include 1) a simpler setup, 2) a need for fewer computational resources, and 3) a more sensitive quantification than DIA. However, the main issues with DDA have been lower precision and reproducibility and undersampling of low-abundance analytes compared to DIA. In the DIA mode, MS/MS data acquisition occurs in parallel across analytes, and the resulting MS spectra are highly multiplexed. In contrast to DDA, all analytes are analyzed during the second stage of tandem MS, so no a priori knowledge of the sample composition is needed. Moreover, DIA can quantify analytes in complex mixtures over a large dynamic range, thereby overcoming the undersampling encountered with DDA. One of the current challenges of DIA is an unmet need for tools and software that can be used to deconvolute the complex spectra produced. In this review, we will primarily discuss studies using the DDA approach.
Data analyses. An LC–MS run produces raw data, which need to be denoised, peak-picked, feature-detected, deisotoped, and deconvoluted before analyte identification [43], [44], [45]. These preprocessing steps are crucial as any errors produced during the initial stage will propagate throughout the analysis. Data preprocessing methods are constantly being improved [46], [47] and are an integral part of proteomics and metabolomics software packages, which search the mass spectra against a database of candidate biomolecules and compare the experimental observations to theoretical patterns (discussed in detail in chapters 5 and 6).
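The contrast between the two acquisition modes can be illustrated with a toy survey scan. The sketch below is only an illustration under stated assumptions: it assumes that DDA selects the N most intense precursors per cycle (a common "top-N" rule that the text above does not specify) and that DIA uses fixed-width isolation windows. The function names, the peak list, and the window width are hypothetical.

```python
from typing import Dict, List, Tuple

Precursor = Tuple[float, float]  # (m/z, intensity)

def dda_select(precursors: List[Precursor], top_n: int) -> List[Precursor]:
    """DDA-style selection (assumed top-N rule): fragment only the
    top_n most intense precursors seen in the survey scan."""
    return sorted(precursors, key=lambda p: p[1], reverse=True)[:top_n]

def dia_windows(precursors: List[Precursor], mz_start: float,
                mz_end: float, width: float) -> Dict[Tuple[float, float], List[Precursor]]:
    """DIA-style acquisition (assumed fixed windows): every precursor that
    falls inside an isolation window is fragmented together with its neighbours."""
    windows: Dict[Tuple[float, float], List[Precursor]] = {}
    lo = mz_start
    while lo < mz_end:
        hi = lo + width
        windows[(lo, hi)] = [p for p in precursors if lo <= p[0] < hi]
        lo = hi
    return windows

# Hypothetical survey scan: three abundant and two low-abundance precursors.
survey_scan = [(400.2, 9e5), (455.7, 1e3), (512.3, 5e5), (530.1, 2e3), (640.8, 7e5)]

fragmented_dda = dda_select(survey_scan, top_n=3)            # low-abundance ions are skipped
fragmented_dia = dia_windows(survey_scan, 400.0, 700.0, 100.0)  # all ions covered, spectra multiplexed
```

In this toy case the two low-intensity precursors are never selected by the DDA rule, which mirrors the undersampling described above, whereas each DIA window containing more than one precursor yields a mixed fragment spectrum that must later be deconvoluted.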
However, due to noisy data containing high background signals and incomplete databases, analyte identification is prone to false positives and mismatches. Statistical analyses that assign quality metrics are therefore needed to ascertain the significance of analyte identifications. Several reviews have summarised recent metaproteomics [48], [49] and metabolomics [50], [51] software and addressed in detail issues associated with database size and completeness, demand for computational power, and identification of false-positive matches. Finally, all areas of mass spectrometry applications are currently being challenged when it comes to the standardisation of analytical workflows [52] as well as data and method transparency. Nevertheless, developments in the metagenomics community [53], [54] suggest that the potential of mass spectrometry applications can be fully realised and LC–MS techniques more widely adopted as long as the community guidelines [55] and the FAIR, i.e., findable, accessible, interoperable, and reusable, principles are followed for reporting of methods, data, and software [56].
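As a worked illustration of the mass-to-charge ratio (m/z) introduced under General principles, the following sketch computes the m/z of a protonated peptide at different charge states from standard monoisotopic residue masses. The function names and the example sequences are hypothetical and chosen only for illustration; the calculation assumes an unmodified peptide ionised by protonation.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added for the free N- and C-termini
PROTON = 1.007276  # mass of the charge-carrying proton

def neutral_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion: (M + z * proton) / z."""
    return (neutral_mass(peptide) + charge * PROTON) / charge

# Hypothetical example peptides; the second differs only by I -> L.
for z in (1, 2, 3):
    print(z, round(mz("PEPTIDE", z), 4), round(mz("PEPTLDE", z), 4))
```

Because isoleucine and leucine have the same elemental composition, the two example peptides give identical m/z values at every charge state, a simple case of the isomers that mass alone cannot distinguish.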

Current challenges of stool-based metaproteomics

Sample storage and processing. Metaproteomics and metabolomics approaches based on LC–MS technology share similar sample collection and processing workflows, yet unique methodological and computational challenges exist for both techniques (Fig. 2). Among the biggest challenges of stool-based metaproteomics is sample processing. In addition to carrying a complex microbial community, the stool matrix comprises undigested food particles and various host components (see chapter 6 for the macromolecular composition of feces). Therefore, an appropriate sample processing protocol needs to be evaluated in the context of each study's aims and should consider unbiased methods for storage of collected samples, microbial protein enrichment, and protein extraction efficiency [57], [58]. Sample storage is a crucial step in any omic study because different storage temperatures introduce alterations to microbial profiles [59], [60]. Frozen intact stool material is more stable than frozen extracted proteins [61] and thus recommended for long-term storage. Several studies tested preservatives that maintain sample integrity at room temperature when immediate freezing is not possible, and the results indicated RNAlater as suitable for metaproteome preservation [62], [63]. However, these studies only examined environmental samples, and the effects of preservatives have not yet been characterised for stool-derived metaproteomes. Further, different enrichment methods, such as strategies based on double filtering [64] and differential centrifugation [65], have been applied to concentrate microbial cells from stool samples. The differential centrifugation step was later shown to cause non-specific removal of microbial cells and proteins [66]. Stool without pretreatment thus likely provides the best representation of the microbial proteins. 
Finally, differences in cell membranes between microbes require combining chemical and physicomechanical methods to ensure proper disruption of different cell types and consequently optimal metaproteome coverage [67].
Protein databases. Similar to proteomics, metaproteomics aims to identify and quantify all proteins in a sample, but in addition, each protein has to be correctly assigned to a microbial species [55]. Proteins extracted from stool samples are first digested into peptides whose smaller size is better suited for LC–MS analysis. The most common approach for peptide identification is matching the experimental MS/MS spectra against theoretical fragmentation patterns of peptides derived from in silico digestion of a protein sequence database [48], [68], [69]. Currently, shared peptides originating from homologous proteins remain a challenge when searching for protein IDs from a specific species, and this complexity is greatly increased when profiling the microbiome. The success of peptide identifications depends on the provided database, making the protein database selection crucial in any proteomic workflow [69]. Estimations for fecal samples suggest the presence of 200,000 [55] to 1,000,000 [70] proteins, leading to enormous sequence databases that bring associated bioinformatics challenges. The larger the protein database, the lower the sensitivity of identifications and the higher the computational requirements and the chance of false-positive matches. Hence, more tailored databases give better results, and ideally, spectral searches should be performed using matched metagenome or metatranscriptome databases derived from the same sample. Although metagenome-based databases have several drawbacks, such as being prone to sequencing and assembly errors, often lacking useful sequence annotation, and introducing additional costs, the benefit of increased protein identification rates outweighs these potential pitfalls [69].
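The in silico digestion step and the shared-peptide problem can be sketched in a few lines. The example below is a minimal illustration, assuming trypsin as the protease (cleavage after K or R unless followed by P), which the text does not state; the two protein sequences and their identifiers are invented for illustration and do not come from any real database.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def tryptic_digest(sequence: str, min_len: int = 6) -> List[str]:
    """Cleave after K or R, but not before P (assumed trypsin rule),
    keeping only peptides of at least min_len residues."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_len]

def peptide_index(database: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every theoretical peptide to the set of proteins that can produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for protein_id, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            index[peptide].add(protein_id)
    return index

# Hypothetical homologous proteins from two species.
toy_database = {
    "speciesA|protein1": "MSLNDIKAVEELGYTPRGGHWWTFEDK",
    "speciesB|protein1": "MTLNEIKAVEELGYTPRQQHWYSFEDK",
}

index = peptide_index(toy_database)
shared = {pep for pep, proteins in index.items() if len(proteins) > 1}
unique = {pep for pep, proteins in index.items() if len(proteins) == 1}
print("shared:", sorted(shared))
print("unique:", sorted(unique))
```

Only the unique peptides can assign a spectrum unambiguously to one species; the shared peptide is compatible with both homologues. Every sequence added to the database increases the number of such candidates, which is one way to see why tailored databases tend to perform better.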
Alternatively, the use of 16S-guided metaproteome databases is a practical solution. For example, a custom-made library based on representative bacterial genera identified by 16S rRNA sequencing [71] was compiled from reference proteomes (http://www.uniprot.org/proteomes/) of species within these genera and merged into one database together with the human proteome. Coverage. One of the obstacles hindering a wider use of metaproteomics is low coverage of the expected metaproteome. Currently, up to 60,000 protein groups have been identified in individual metaproteomics studies [72], [73], which might correspond to only a fraction (∼15–25%) of the expected proteins in the adult gut microbiome [55], [70]. The gut microbiome of an adult might contain ∼1000 bacterial species and ∼10 million genes [74]. Using protein abundances from metaproteomics analysis of a patient cohort of pediatric inflammatory bowel disease [72], Zhang and Figeys estimated that over 90% of gut microbiome-derived biomass comes from the fewer than 100 most abundant species [55], while the remaining species are identified by only one or two peptides. They further emphasized the need for techniques that increase protein identification for low-abundance microbial taxa. Among these methods is a combination of stable isotope labeling with activity-based probe enrichment, which allows for quantification of low-abundance proteins with specific functionalities and was recently used in an animal study [75]. The gut microbiome of children, and infants in particular, displays a lower species richness and overall microbial diversity than that of adults [76], suggesting that more complete metaproteome coverage can be achieved even with present-day metaproteomics workflows. A case study recently demonstrated how low-abundance microbial taxa, fungal species in this example, affect microbiome dynamics in a preterm infant [77].
Using a strategy of two bioinformatics pipelines for deriving eukaryotic and prokaryotic metagenomes, and creating a custom-built database composed of the concatenated metagenome-derived predicted proteomes, the authors described unique interactions between the fungus Candida parapsilosis and the bacterium Enterococcus faecalis within the infant gut microbiome. Similarly, our recent findings from a gnotobiotic study of germ-free mice colonised with defined consortia of bacterial and fungal species showed that metaproteomics could describe interkingdom interactions in the gut microbiome with high resolution [78]. The results from this animal study further highlighted that genome-matched databases are critical for the correct assignment of proteins to individual species. For 12 bacterial species with sequenced genomes, which were used as templates for the database construction, MS searches yielded relatively high coverage of the bacterial proteomes. However, for the fungal strains that did not have sequenced genomes and only general, species-specific databases were available, decreased specificity of MS data searches and lower coverage of the fungal proteomes were achieved. Bioinformatics. Dedicated bioinformatics software tools have been developed and used to deal with the computing demands of large database searches, including MetaLab [79], MetaProteomeAnalyzer [80], PEAKS [81], Galaxy-P [82], and ComPIL [83]. Two approaches have proven particularly useful for improving the identification rate: combining multiple search engines that match the theoretical spectra to the measured ones [52] and iterative search strategies that significantly speed up the database search process [84]. Another search strategy based on multi-staged filtering of peptide-spectrum matches has been implemented in the ProteoStorm tool [85]. A percentage of the identified peptide-to-spectrum matches will eventually be false positives, which need to be distinguished from the correct matches. 
The proportion of false-positive identifications is usually controlled by searching a decoy database containing reversed or scrambled protein sequences and calculating the false discovery rate threshold. However, the target-decoy approach is less sensitive in metaproteomics because of the large search space and high sequence similarity between many proteins, especially proteins from different taxa with the same function [69]. Alternatives include the use of machine learning approaches for modelling incorrect peptide-to-spectrum matches [86], [87]. These approaches distinguish correct from incorrect peptide-to-spectrum matches using a classifier trained on real data. Still, despite recent advances in big data analyses and newly available software tools, bioinformatics assessments of metaproteomics data remain a formidable challenge. Protein and species inference. A nontrivial task in metaproteomics, which follows peptide identification and validation, is peptide-to-protein-to-microbial species inference. Due to many similar proteins resulting from closely related species and horizontal gene transfer events within the microbiome, a peptide identification can potentially be matched to several proteins from different taxa. This is a major issue if metaproteomics data are used for species quantification; for example, using peptides shared by two microbial taxa will result in an overestimation of each taxon’s abundance [88]. Therefore, the use of highly specific protein inference criteria is recommended if the aim is to accurately quantify microbial taxa abundance. Furthermore, longer peptides are more likely to be unique to a single protein, while short peptides often match multiple proteins. It is therefore advantageous to optimise the mass spectrometer acquisition settings for preferential analysis of longer peptides [89].
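The target-decoy estimate mentioned earlier in this section can be sketched in a few lines. The example below assumes a concatenated target and decoy search in which every peptide-to-spectrum match (PSM) carries a score, with higher values being better; the scores and the function name are hypothetical, and the simple decoy-to-target ratio used here is only one of several estimators in use.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest target score at which the estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) tuples; higher score means a better match.
    The FDR at a given cutoff is estimated as (#decoys) / (#targets) above the cutoff.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
            if decoys / targets <= max_fdr:
                cutoff = score
    return cutoff


# Hypothetical PSM scores; True marks a match to a reversed or scrambled decoy sequence.
example_psms = [
    (10.0, False), (9.0, False), (8.0, False), (7.0, False),
    (6.5, True), (6.0, False), (5.0, True), (4.0, False),
]

print(score_cutoff_at_fdr(example_psms, max_fdr=0.25))  # accepts targets down to score 6.0
print(score_cutoff_at_fdr(example_psms, max_fdr=0.10))  # stricter: stops at score 7.0
```

Because highly similar sequences from different taxa inflate the search space, the same score cutoff tends to admit fewer true matches in metaproteomics than in single-organism proteomics, which is the loss of sensitivity noted above.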
The protein inference problem can be further mitigated by grouping together proteins inferred from the same set of identified peptides. The proteins of the same group usually exhibit the same function but have different taxonomic origins; therefore, the taxonomic origin of the entire protein group can be described by the lowest common ancestor within a phylogenetic tree [48]. Other methods for taxonomical annotation of metaproteomic data include tools based on taxon-specific peptides, such as Unipept and ProteoClade [90], [91]. Post-translational modifications. Identification of post-translational modifications (PTMs) is an aspect of metaproteomics that can inform on regulatory mechanisms within different microbial taxa or host cells. The study of PTMs using proteomics typically requires an enrichment or depletion step, and a limited number of studies have profiled PTMs in the human gut environment from intestinal biopsies or stool samples. For example, a pioneering study used a peptide immuno-affinity enrichment strategy to profile lysine acetylation, an abundant PTM in prokaryotes, in the gut microbiome [92]. The study identified lysine-acetylated sites on both host and microbial proteins that were differentially abundant in patients with Crohn's disease and healthy controls. Another form of PTM is proteolytic processing of proteins by proteases, which act in concerted networks to amplify regulatory signals and are hypothesized to be molecular effectors involved in all aspects of biology, including microbiota homeostasis [93]. Dysregulated proteolysis is often implicated in the initiation of inflammation but also persists in chronic inflammatory diseases [94], [95]. TAILS (terminal amine isotopic labelling of substrates), an N-terminomics approach that enriches N-termini to determine which proteins have been cut by proteases, was used to profile human colonic mucosal biopsies, in which over 1642 human N-termini were identified [96].
Using the bioinformatics software TopFIND [97] and the MEROPS database (https://www.ebi.ac.uk/merops/index.shtml), the cleavage positions of the identified peptides were compared with the known proteolytic processing preferences of human, bacterial, fungal, and viral proteases. Interestingly, based on the reported cleavage site preferences, the predicted proteolytic activity was attributed mostly to human proteases (63%), followed by bacterial (27%), fungal (7%), and viral sources (3%) [96]. It is important to mention that protease cleavage sites are largely uncharacterized; therefore, such analysis is likely to change as more information is added to the MEROPS database. Furthermore, little is known about the key PTMs involved in microbiome homeostasis, their provenance (human vs bacterial, fungal, or viral), and their roles in promoting human pathologies. Protein annotation. Another important aspect of metaproteomics data analysis is protein functional annotation. With a well-annotated metaproteomics dataset, one can access multiple functional levels, from exploring broad classes of the metaproteome that give hints to overall functional changes to focused pathway-level analysis within specific taxa (Fig. 2). Nevertheless, proteins can often be assigned to multiple functional groups, which further augments the existing challenge of assigning the identified peptides to proteins sharing similar sequences but originating from different species. A variety of metaproteomics software tools for functional microbiome analysis are available and have recently been reviewed [58] and compared [98]. The performance of these computational tools differed to a large extent when tested on a single dataset, indicating potential difficulties for cross-study comparisons of data acquired by different labs with different sample processing protocols and MS settings. Finally, bioinformatics tools for taxonomic and functional analysis face a large number of unannotated sequences.
This is partly because the quality of annotations of sequence databases originating from metagenomic projects might be low, and biochemical evidence of function is missing for most proteins. Resources for protein functional annotation (e.g., Gene Ontology [99], eggNOG [100], UniProt [101]), biochemical pathways (MetaCyc [102], KEGG [103], neXtProt [104]), and interactions (STRING [105]) are essential for the use of metaproteomics to address biological questions. In summary, the complexity and heterogeneity of stool samples bring considerable wet lab challenges to the metaproteomics field, but tailored protein databases, combined search algorithms, and iterative workflows improve protein identification. This was demonstrated in a recent multi-lab comparison of metaproteomics workflows, in which the same samples were given to 7 different labs. Different wet lab processing protocols introduced variability at the peptide level, which, however, largely disappeared at the protein level in downstream bioinformatic analysis [52]. Nonetheless, there are still substantial bioinformatics limitations in metaproteomics related to the identification of false positives and functional annotation of the data. Metaproteomics will benefit from standardised bioinformatics pipelines that reliably process metaproteome data within a short time frame and link protein sequences to the taxonomic and biochemical information available from community resources. Without a doubt, new efficient bioinformatics tools adapted to the complexity of microbiomes are the key to a more routine application of metaproteomics.
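As an illustration of the protein grouping and lowest-common-ancestor annotation described under protein and species inference, the sketch below groups proteins that are supported by an identical set of peptides and then reports the deepest taxonomic rank shared by all members of a group. The protein identifiers, peptide labels, and the four-rank lineages (domain, phylum, genus, species) are simplified assumptions made for this example and do not reproduce the behaviour of any specific tool.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides):
    """Group proteins that were inferred from exactly the same set of peptides."""
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return [sorted(members) for members in groups.values()]


def lowest_common_ancestor(lineages):
    """Return the shared prefix of root-to-leaf lineages (the lowest common ancestor path)."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Hypothetical identifications: two homologues share all peptides, a third protein does not.
protein_to_peptides = {
    "prot_longum": {"pep1", "pep2"},
    "prot_breve": {"pep1", "pep2"},
    "prot_ecoli": {"pep3"},
}

# Simplified lineages used only for this illustration.
lineage = {
    "prot_longum": ["Bacteria", "Actinobacteria", "Bifidobacterium", "Bifidobacterium longum"],
    "prot_breve": ["Bacteria", "Actinobacteria", "Bifidobacterium", "Bifidobacterium breve"],
    "prot_ecoli": ["Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"],
}

for group in group_proteins(protein_to_peptides):
    lca = lowest_common_ancestor([lineage[p] for p in group])
    print(group, "->", lca[-1])
```

The group containing both Bifidobacterium homologues can only be assigned at the genus level, whereas the single-member group retains its species-level assignment.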

Current challenges of LC–MS-based metabolomics of stool samples

Chemical complexity of feces. In addition to proteins, fecal matter contains other biomolecules that reflect the process of nutrition to which the gut microbiome significantly contributes. Feces typically contain between 60 and 85% water, depending on the fiber intake, and the dry matter consists of microbial biomass (25–54%), shed epithelial cells and mucus, undigested food residues, macromolecules (fiber, protein, DNA, mucopolysaccharides), and small molecules or metabolites [106]. The fecal metabolome refers to the collection of these small molecules, i.e., sugars, organic acids, amino acids, nucleotides, phenols, indoles, lipids, and hormones, all of which might have roles as signaling molecules, metabolic intermediates, or secondary metabolites [107]. Thus, the metabolome can be interpreted as a molecular signature of the host under certain physiological conditions and a record of the interactions between the host and the gut microbiome. A recent estimate suggested that gut bacterial products account for up to 90% of the fecal metabolome [107], reflecting the gut microbiota composition and explaining on average ∼68% of its variance [108]. Thus, the fecal metabolome is considered a functional readout of the microbiome [108]; however, some of the metabolites will be common to the gut microbiota and the host, as feces contain a combined metabolic output of both. Volatile metabolites. Currently, there are over 115,000 characterised metabolites in the Human Metabolome Database [109], of which 5.9% (6810) originate from feces and are accessible in the Human Fecal Metabolome Database [107]. The annotation of metabolites in the Human Metabolome Database is based on CFM-ID, a web server for annotation, spectrum prediction, and metabolite identification from tandem mass spectra [110]. In addition, many of the assignments were performed using a combination of manual annotation and data mining software tools such as PolySearch2 [111].
The composition of the fecal metabolome depends on the diet [112], and although the majority of the metabolites are non-volatile, some of the most abundant metabolites in human feces are short-chain fatty acids (SCFA) such as acetic, propionic, and butyric acid [107]. SCFA are the best-known representatives of almost 400 volatile organic compounds that have been identified in human fecal samples [113]. SCFA are the end products of bacterial fermentation in the gut and function as energy sources for epithelial cells and as bioactive metabolites regulating the immune system, intestinal barrier function [114], and microbial behavior [115], [116]. Despite the importance of SCFA, volatile organic compounds are difficult to handle because of their gaseous form, which results in their loss during common sample preparation methods. Still, several targeted metabolomics methods for SCFA detection and quantification have been developed and successfully applied, using chemical derivatization and multiple reaction monitoring, a highly sensitive MS method [117], [118]. Methods standardisation. Recent functional investigations of the gut microbiome document growing interest in stool metabolomics [119], [120]. However, standardised methods for collecting, processing, and analysing fecal samples are still lacking, and their paucity greatly limits study-to-study comparisons. There is inherent variability in fecal samples even within one individual that depends on feeding status and bowel activity, reflected by dynamic changes in metabolite composition over time [121]. Consequently, multiple-day sampling and pooling have been proposed to minimise day-to-day variation in metabolite profiles [122]. In addition, the fact that feces contain metabolically active microbial cells makes their analysis sensitive to differences in collection methods, as exposure to aerobic conditions and different temperatures can change the metabolite composition of samples.
Ethanol preservation is an alternative when immediate freezing of samples is not possible, as shown for samples stored in 95% ethanol up to 4 days that exhibited a metabolic profile similar to fresh samples [83]. The topographical position from which the fecal sample is taken can also affect the metabolic profile, and therefore it is crucial to homogenise the fresh sample before aliquoting [123]. Alternatively, small molecule extraction can be performed using the entire fecal sample to avoid missing metabolites present in unsampled areas. One of the most critical steps in sample preparation is normalisation to account for feces water content. Fecal samples can have up to 30% variation in water content, which is significant enough to affect downstream statistical analysis and skew especially modest metabolite differences between samples [106]. Finally, the chemical diversity of the metabolome makes metabolite extraction a formidable problem. A particular solvent can extract metabolites of the same chemical class, and no single extraction method is optimal for all metabolites [124]. If metabolite coverage is of utmost importance, multiple extractions should be performed using solvents of various polarity indices. Although this will increase coverage, it can also significantly increase financial and logistical burdens. The points above, plus several other guidelines regarding sample collection and preparation [107], [125], must be considered in the experimental design of metabolomics studies. Quality control. In addition to inherent biological variation, analytical variation of LC–MS instruments can also cause issues if not appropriately addressed. In general, LC–MS data are collected over long periods of time and, although not a recommended practice, are sometimes analysed in multiple batches. Consequently, LC–MS and MS/MS data exhibit significant variability depending on the instrument condition and operating environment. 
Shifts in m/z values and retention times of molecular features between runs might result in different spectral patterns, negatively impacting metabolite identification or quantification. Therefore, quality control (QC) samples should be applied and used to model and correct systematic measurement bias and between-batch errors [126]. In targeted metabolomics, the QC sample often consists of a mixture of the authentic chemical standards representing the target analytes. The selection of an appropriate QC sample for untargeted assays is more complex. It is generally recommended that the QC sample reflect the aggregate metabolite composition of the biological samples in a given study. A homogeneous pooled QC sample prepared from all biological samples under study should be analyzed before injection of individual samples, after a fixed number of samples have been injected, and also after injection of the last sample [126]. Several software tools based on mathematical models for signal correction [127], [128], [129] and simulation of QC sample data [130] have the potential to correct batch-to-batch variations and instrumental drift. Of note, the use of pooled QC samples is also valid for metaproteomics studies. Metabolite identification. Of the popular instrumental platforms used for metabolomics, i.e., nuclear magnetic resonance spectroscopy, LC–MS, and gas chromatography coupled to MS, LC–MS approaches offer higher sensitivity and relatively broad metabolite coverage. However, this sensitivity often results in more laborious identification of analytes [107]. A standard approach for metabolite identification in untargeted, discovery-based analysis, analogous to the one used in (meta)proteomics, is querying metabolomic databases for the molecular mass values of the identified peaks using a tolerance window.
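The mass query with a tolerance window can be sketched as follows. The example assumes that neutral monoisotopic masses have already been derived from the measured peaks (adduct handling is omitted), uses a tolerance expressed in parts per million (ppm), and searches a four-entry toy database; the function names and the 5 ppm default are assumptions made for illustration, and the masses are approximate values calculated from the elemental formulas.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Relative mass deviation in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def match_by_mass(observed_mass, database, tolerance_ppm=5.0):
    """Return all database entries whose mass lies within the ppm tolerance window."""
    return sorted(
        name
        for name, mass in database.items()
        if abs(ppm_error(observed_mass, mass)) <= tolerance_ppm
    )


# Toy database of approximate neutral monoisotopic masses (Da).
toy_database = {
    "acetic acid": 60.02113,     # C2H4O2
    "propionic acid": 74.03678,  # C3H6O2
    "leucine": 131.09463,        # C6H13NO2
    "isoleucine": 131.09463,     # C6H13NO2, isomer of leucine
}

print(match_by_mass(131.0948, toy_database))  # both isomers fall inside the window
print(match_by_mass(131.1000, toy_database))  # about 41 ppm away: no candidates
```

A single accurate mass returns every compound that shares the elemental formula, so the output is a list of candidates rather than an identification.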
Because metabolomics databases lack genetic templates such as those used in metaproteomics, the databases will likely be incomplete, with missing candidate matches for more rarely occurring compounds. Moreover, compared to peptides, small metabolites often lack common building blocks and are built from both very frequently occurring elements (C, H, O, N, S, and P) and trace elements (e.g., Na, K, Mg, Zn, Fe, Ca, Mo, Cu, Co, and Mn). Even though MS analysis can accurately determine the mass of a compound, this information alone is not sufficient to differentiate isomers, and additional information, including the fragmentation spectrum and retention time, is critical for structural elucidation of a mass measurement [131]. The use of standards for comparing analyte retention times and mass fragmentation patterns assists in accurate biomolecule identification and especially quantification. Once the metabolite has been confidently identified, an additional challenge is the determination of its concentration in the sample studied. Given that potential biomarkers of health and disease states will most often be found in both conditions, though at different levels, accurate quantification of promising targets is a desirable feature. Although a regular MS run will provide semi-quantitative information on a metabolite, such as peak area and signal intensity, the variation between runs commonly seen in MS experiments means that more careful analyses are required to directly compare metabolite concentrations in different samples. This is usually achieved using metabolite standards containing deuterium, a heavy hydrogen isotope. By spiking samples with known concentrations of the deuterated standard and comparing the peaks of the standard and target compounds, one can accurately determine the absolute concentration of the metabolite studied.
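The calculation behind quantification with a spiked deuterated standard can be sketched as a simple ratio. The example assumes that the labelled standard and the target compound give the same detector response per unit concentration (a response factor of 1, which in practice is checked with a calibration curve); the peak areas, concentrations, and function name are hypothetical.

```python
def quantify_with_internal_standard(analyte_area, standard_area, standard_conc,
                                    response_factor=1.0):
    """Estimate analyte concentration from the analyte-to-standard peak area ratio.

    response_factor is the analyte response relative to the standard per unit
    concentration; 1.0 assumes identical behaviour (an assumption of this sketch).
    """
    if standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return (analyte_area / standard_area) * standard_conc / response_factor


# Same sample measured in two runs; the second run has 30% lower overall signal.
run_1 = quantify_with_internal_standard(2.0e6, 1.0e6, standard_conc=50.0)
run_2 = quantify_with_internal_standard(1.4e6, 0.7e6, standard_conc=50.0)
print(run_1, run_2)  # both estimates agree although the raw peak areas differ
```

Because the standard experiences the same signal loss as the target compound, the ratio is unaffected by the run-to-run variation that makes raw peak areas only semi-quantitative.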
Still, because this strategy is impractical for untargeted analyses, in which standards are not available for most compounds, general approaches based on prediction models are gaining importance [132]. In conclusion, the high mass accuracy of state-of-the-art MS instruments and complementary analysis of molecular patterns are increasingly able to assign putative structures to the detected features despite the inherent challenges that metabolomics faces. But instrument advances can do little if the databases are not constantly improved. As mentioned above, only 6810 metabolites in the Human Metabolome Database are annotated as being found in feces. This is possibly orders of magnitude below the real chemical diversity of the human gastrointestinal tract. Equally important is the development and improvement of metabolomics software tools, which still need to address many challenges associated with metabolite identification, diverse data types, and large volumes of data [133], [134], [135].
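To illustrate the pooled-QC signal correction discussed under quality control, the sketch below corrects the intensity of a single metabolite feature for instrumental drift. It assumes that pooled QC injections bracket the study samples, estimates the expected QC signal at each injection position by linear interpolation between neighbouring QC injections, and rescales every measurement to the mean QC intensity. This is a deliberately simple stand-in for the more elaborate models implemented in published tools, and all names and numbers are hypothetical.

```python
def correct_drift(injection_order, intensities, is_qc):
    """Rescale one feature's intensities using linearly interpolated pooled-QC signal."""
    qc_points = sorted(
        (order, value)
        for order, value, qc in zip(injection_order, intensities, is_qc)
        if qc
    )
    if len(qc_points) < 2:
        raise ValueError("at least two QC injections are needed")
    reference = sum(value for _, value in qc_points) / len(qc_points)

    def expected_qc_signal(order):
        # Outside the QC range, fall back to the nearest QC injection.
        if order <= qc_points[0][0]:
            return qc_points[0][1]
        if order >= qc_points[-1][0]:
            return qc_points[-1][1]
        for (o0, v0), (o1, v1) in zip(qc_points, qc_points[1:]):
            if o0 <= order <= o1:
                return v0 + (v1 - v0) * (order - o0) / (o1 - o0)

    return [
        value * reference / expected_qc_signal(order)
        for order, value in zip(injection_order, intensities)
    ]


# QC injections at positions 1, 3, and 5 show a steady loss of signal.
order = [1, 2, 3, 4, 5]
signal = [100.0, 95.0, 90.0, 85.0, 80.0]
qc_flag = [True, False, True, False, True]

print(correct_drift(order, signal, qc_flag))  # drift removed: all values equal 90.0
```

In this constructed case the two study samples had the same composition as the pooled QC, so the correction brings every injection back to a common level; real data would retain the biological differences between samples.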

Gut microbiome establishment: Insights from metaproteomics and metabolomics studies

Stool metaproteomics and metabolomics have been used to study various diseases; yet, here we focus on their use for functional characterisation of the early life gut microbiome (Table 1). Understanding the establishment of the human gut microbiome during infancy is paramount for modern medicine because of its implications for long-term health [136], [137]. Numerous reports have demonstrated that mammalian systems are adapted to receive specific microbial signals necessary for optimal physiological development [138], [139]. Specifically for humans, an infant gut microbiome characterised by early bacterial colonisers from the genera Bifidobacterium and Bacteroides, adapted to utilise human milk oligosaccharides, appears to be a cornerstone of healthy development. Perturbations of the microbiome at the earliest time in life during maximal immune, metabolic, and neuroendocrine development predispose infants to non-communicable diseases caused by underlying defects in physiology [9], [140] as well as more frequent infections [141], [142], [143]. The biochemical processes that govern the microbial dynamics during gut colonisation remain a poorly understood yet exciting research frontier. Initial colonisation. Integrative analysis of metagenomic data from 34 longitudinal studies worldwide showed that gut microbiome maturation happens in an orchestrated manner, suggesting that the timing of microbial succession is biologically determined [144]. The gut microbiome of infants born at term and vaginally is seeded with vertically transmitted microbes from the mother and is initially dominated by facultative anaerobic bacteria (i.e., Streptococcus spp., Enterobacterales, Staphylococcus spp.), which are soon replaced by a community dominated by Bacteroides and especially Bifidobacterium during the lactation period [145], [146]. A common belief has been that the initial facultative anaerobes consume oxygen and facilitate the subsequent engraftment of obligate anaerobes. 
This view was recently questioned by a multi-omic study that provided evidence of anaerobic fermentation of amino acids as a mechanism for the initial growth of E. coli, the most common early colonizer [120]. A gnotobiotic animal study reported a similar finding: establishment of the dominant intestinal anaerobe Bacteroides thetaiotaomicron was dependent on the Bacteroides inoculum size and on pre-establishment by bacteria, whether or not capable of consuming oxygen [147]. Another example of the versatile metabolic capacities of facultative anaerobes from the order Enterobacterales is their ability to degrade fatty acids and lipids [148], which constitute ∼50% of the infant's first stool [119]. Additional translational research using functional omics needs to clarify whether the presence of the very first microbial colonisers is driven by their better survival in the environment [147], an increased capacity to degrade host-derived components such as proteins [149] and lipids [150], or a combination of these factors. Foundation species. The strongest documented disruptors of gut microbiome development are birth by C-section, lack of breastfeeding, and antibiotic use in infancy [141], [151], [152]. Alterations of the microbiota composition by these adverse external factors have also been documented at the metabolome level. For example, early antibiotic exposure in preterm infants functionally altered the gut metabolic output, including pathways related to vitamin biosynthesis, bile acids, amino acid metabolism, and neurotransmitters [153]. Similarly, different early life feeding methods, i.e., breastfeeding, formula-feeding, or their combination, induced distinct fecal metabolite profiles in infants [154]. The above reports illustrate how the metabolic output of the gut microbiome directly depends on its composition [155], and adverse external factors may affect the levels of most metabolites currently detectable, as predicted in an animal study [156].
A signature of infants born by C-section is a lack of Bacteroides spp., delayed Bifidobacterium development, and an expansion of facultative anaerobes that are adapted to a more oxidative environment and lack the genomic capability to metabolize milk carbohydrates [152]. Although several metagenomic studies compared fecal microbiota composition of infants delivered vaginally and by C-section, characterisation of the microbiota functions is largely missing. Nevertheless, there is evidence for the functional benefits of Bifidobacterium species, strict anaerobes from the phylum Actinobacteria that are the founder species of the gut microbiome associated with a protective immune system modulation [157]. Bifidobacterium persists at high levels during lactation because of its unique genomic capacity to utilise human milk oligosaccharides. In breastfed infants, high bifidobacterial levels lead to a high SCFA concentration and a decrease in the gut pH [158], [159], [160], limiting the growth of other bacteria, such as Enterobacterales [161]. Supplementation with Bifidobacterium strains has also been associated with an altered fecal metabolome [162], lower levels of potential pathogens such as Enterococcus, Enterobacter, and Klebsiella, and reduced carriage of antimicrobial resistance genes [163], [164]. In addition to host management of the first gut microbiota through breastfeeding, IgA – the most abundant immunoglobulin isotype secreted into the gut, and received by the infant via breastmilk – appears to play a key role in gut microbiota maturation [165]. This was demonstrated in a metaproteomic investigation that assessed gut microbiota maturation in newborn mice [166], using an IgA-deficient (Rag2−/−) mouse genetic background.
The results confirmed the role of breastfeeding in modulating the mouse gut microbiota in the first days of life, but at the same time suggested other concurrent factors, related to the mother’s gut microbiota, immune response, or regulation by the mucosal immune system itself. Preterm infants. The above-described gut colonisation process is very different in infants born prematurely. Reports describing the gut microbiome composition showed that premature infants display reduced alpha diversity, delayed colonization with obligate anaerobic bacteria, and increased abundance of opportunistic pathogens compared to term infants [167], [168]. A recent metaproteomics study followed the fecal microbiome of preterm infants during the first six weeks of life and brought additional information on the physiology of the premature gut [34]. Compared to term infants, gastrointestinal barrier‑related proteins were less abundant in preterm infants' feces, while bacterial oxidative stress proteins of facultative anaerobes were increased. The authors hypothesised that these findings might suggest the introduction of oxygen into the gut lumen by respiratory support commonly used in neonatal intensive care units. Previously, respiratory support was associated with delayed colonisation by strict anaerobes [138], a hallmark of the preterm gut colonisation process. Subsequently, the aerobic environment might decrease the abundance of strict anaerobes such as Bifidobacterium spp., the primary producers of SCFA involved in the production of anti-inflammatory cytokines and stimulation of the intestinal barrier function [37], [38]. A study that combined metagenomics and metaproteomics has given an ecological perspective on the premature infant gut colonization process. This genome-resolved metaproteomics study demonstrated that the contributions of individual organisms to microbiome development depend on microbial community context [35]. 
Furthermore, the microbial metaproteome was more variable over time than the community composition, and genetically similar microbes colonizing different infants were found to have distinct proteomes. These results indicated that microbiome function is not only driven by the type of organisms present but largely depends on microbial responses to the unique set of physiological conditions in the infant's gut. Similarly, the stool metabolome of preterm infants appears to be distinct between individuals, without any apparent associations with health outcomes, such as necrotizing enterocolitis and sepsis [169]. Studies describing preterm infant gut colonisation have dominated the functional description of the early life microbiome because of more straightforward sample collection logistics and the availability of detailed clinical data. Although these reports are specific to the premature gut microbiome, there might be certain parallels with the general microbial colonisation process in humans, regardless of gestational age. For example, a case study of one preterm infant documented how bacterial activity transitions toward more complex metabolic functions over the first month of life [170]. Based on the functional classification of the identified proteins, the authors predicted that the gut microbial community first focused its resources on biomass growth, protein production, and lipid metabolism. After this initial microbiome establishment during the first two weeks, it switched to more complex metabolic functions, such as carbohydrate metabolism. Several reports on the preterm infant fecal metaproteome also documented low bacterial load in the first weeks after birth, showing a time-dependent increase in the relative abundance of microbial proteins while the abundance of host- and dietary-derived proteins gradually decreased [71], [170], [171]. A similar trend has also been observed for term infants, although the observation was based on metaproteomes of only three infants [34].
Overall, these studies highlight the strong interdependency between the human host and the gut microbiome for both to reach maturity. Proteins that directly regulate gut colonisation and maturation will serve as valuable markers for intestinal barrier development and immune system education. Time course metaproteomics has also been applied to study specific actions of microbial eukaryotes within the gut microbiome [77]. This case study of low-abundance microbial taxa characterised fecal samples from a premature infant with a documented Candida blood infection, with the aim of describing the behaviour of the fungus in the human gut. Metagenomic sequencing confirmed the presence of C. parapsilosis in the infant's fecal sample, with indications of robust establishment and active function within the gut microbiome. Further, protein-derived metabolic activities of bacteria, fungi, and their shared activity showed distinct partitioning of function and cooperation between eukaryotes and prokaryotes within the community during early life. This study highlights the importance of characterising interkingdom interactions within the human microbiome, as these are essential components of the relationship between the microbiome and its host.

Conclusions and outlook

LC–MS-based omics are gradually gaining momentum as tools for identifying the functions of the gut microbiome with high precision. LC–MS analyses of stool help to unravel interactions among the different microorganisms residing in the gut, as well as their interactions with the host, offering insights beyond taxonomic composition and genomic information. MS-based omics provide data that DNA sequences cannot, namely which proteins and metabolites are present and in what quantities. In addition, identification of post-translational modifications is only possible by metaproteomics. The combination of LC–MS techniques with DNA sequencing applied to longitudinal human studies has already led to the description of nuanced signatures of healthy and disease states. Still, several methodological and bioinformatics challenges persist, with the chemical complexity of stool samples, the lack of standardised methods, and incomplete databases being the main issues contributing to low metaproteome and metabolome coverage (Fig. 2). However, once the current challenges are overcome, it will be possible to fully define the intertwined metabolic networks of individual gut microbes and the human host.

Declaration of Competing Interest

The study funding sources are listed in the Acknowledgements. The authors have no financial/commercial conflicts of interest.