Robert H Mills, Yoshiki Vázquez-Baeza, Qiyun Zhu, Lingjing Jiang, James Gaffney, Greg Humphrey, Larry Smarr, Rob Knight, David J Gonzalez.
Abstract
Although genetic approaches are the standard in microbiome analysis, proteome-level information is largely absent. This discrepancy warrants a better understanding of the relationship between gene copy number and protein abundance, as this is crucial information for inferring protein-level changes from metagenomic data. As it remains unknown how metaproteomic systems evolve during dynamic disease states, we leveraged a 4.5-year fecal time series using samples from a single patient with colonic Crohn's disease. Utilizing multiplexed quantitative proteomics and shotgun metagenomic sequencing of eight time points in technical triplicate, we quantified over 29,000 protein groups and 110,000 genes and compared them to five protein biomarkers of disease activity. Broad-scale observations were consistent between data types, including overall clustering by principal-coordinate analysis and fluctuations in Gene Ontology terms related to Crohn's disease. Through linear regression, we determined genes and proteins fluctuating in conjunction with inflammatory metrics. We discovered conserved taxonomic differences relevant to Crohn's disease, including a negative association of Faecalibacterium and a positive association of Escherichia with calprotectin. Despite concordant associations of genera, the specific genes correlated with these metrics were drastically different between metagenomic and metaproteomic data sets. This resulted in the generation of unique functional interpretations dependent on the data type, with metaproteome evidence for previously investigated mechanisms of dysbiosis. An example of one such mechanism was a connection between urease enzymes, amino acid metabolism, and the local inflammation state within the patient. This proof-of-concept approach prompts further investigation of the metaproteome and its relationship with the metagenome in biologically complex systems such as the microbiome. 
IMPORTANCE A majority of current microbiome research relies heavily on DNA analysis. However, as the field moves toward understanding the microbial functions related to healthy and disease states, it is critical to evaluate how changes in DNA relate to changes in proteins, which are functional units of the genome. This study tracked the abundance of genes and proteins as they fluctuated during various inflammatory states in a 4.5-year study of a patient with colonic Crohn's disease. Our results indicate that despite a low level of correlation, taxonomic associations were consistent in the two data types. While there was overlap of the data types, several associations were uniquely discovered by analyzing the metaproteome component. This case study provides unique and important insights into the fundamental relationship between the genes and proteins of a single individual's fecal microbiome associated with clinical consequences.
Keywords: colonic Crohn's disease; gut inflammation; inflammatory bowel disease; metagenomics; metaproteomics; microbiome; multiomics; proteomics; tandem mass tags; time series
Year: 2019 PMID: 30801026 PMCID: PMC6372841 DOI: 10.1128/mSystems.00337-18
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
Roles of immunological proteins of interest
| Protein | Role |
|---|---|
| CRP | An acute-phase response protein produced by the liver upon stimulation by IL-6, TNF-α, and IL-1-β and a common clinical |
| Lysozyme | A glycoside hydrolase used in the innate immune system for hydrolysis of cell walls of Gram-positive bacteria |
| Secretory IgA | The most abundant antibody in the human colon; helps tightly control the relationship between commensal microbes and |
| Calprotectin | An antimicrobial protein that sequesters manganese and zinc to prevent the growth of pathogenic microbes that require these metals |
| Lactoferrin | An antimicrobial glycoprotein and a major component of the secondary granules of neutrophils |
IL-6, interleukin-6; TNF-α, tumor necrosis factor alpha.
FIG 1 Study design. (a) Immune markers associated with samples. Mass-spectrometry-based relative abundances of fecal calprotectin, CRP, lysozyme, lactoferrin, and secretory IgA are plotted as indicated on the left y axis for each of the eight time points in this study. (b) Workflow schematic describing omic methods. Shotgun sequencing and metaproteomic methods were performed in parallel for the analysis of eight selected samples. Both methods were performed in technical triplicate for evaluation of technical variability. Tandem mass tag (TMT) labeling of tryptic peptides was performed for three mass spectrometry experiments. Green and dark blue hexagons represent composite samples used as controls, while other colors represent the random labeling of samples using the remaining TMT reagents. Shotgun sequencing reads were combined and assembled into a shared reference database (Personal Database Assembly) for assigning gene counts (in counts per million [cpm]) and protein abundances. Data corresponding to MS1, which was used for precursor selection, are not depicted.
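The counts-per-million unit mentioned in the caption can be illustrated with a minimal sketch. It assumes the conventional definition (each gene's read count divided by the sample total, times one million); the record does not state the exact normalization used in the study. The function name and the example counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw per-gene read counts from one sample so they sum to 1e6.

    Assumption: conventional cpm definition, not confirmed by the source.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Hypothetical raw counts for a single sample.
sample = {"geneA": 50, "geneB": 150, "geneC": 300}
cpm = counts_per_million(sample)
```

Expressing counts on a common per-million scale makes samples with different sequencing depths comparable before any gene-to-protein comparison.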
FIG 2 Broad-scale data type comparisons. (a) Procrustes analysis comparing clustering of the metaproteome to that of the metagenome. Bray-Curtis distance metric was used on both the metagenome and the metaproteome (only proteins common to all samples; pDB database) to assess technical and biological variability within and between data sets. Samples are colored according to calprotectin relative abundances. (b) Distribution of Spearman correlations comparing metagenomic and metaproteomic fluctuations. The x axis displays Spearman correlation (ρ) data, and the y axis displays the number of gene-protein pairs within a range of Spearman correlation values. (c) Dynamic range comparison. Histograms fitted with a Gaussian kernel density estimate are displayed at the gene and protein levels. The log10 values representing the maximum value for each protein or gene divided by the minimum value are plotted on the x axis. The numbers of proteins corresponding to each maximum/minimum (Max/Min) range are plotted on the y axis. (d) Variability comparison. The analyses were performed as described for panel c but according to the standard deviation of each gene or protein. (e) GO categories with the largest fluctuations. Proteins and genes were summed according to their GO categories, and the maximum values were compared to the minimum values. The highest metagenomic fluctuations for each category are recorded at the top, and the highest metaproteomic fluctuations are displayed at the bottom.
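The per-pair quantities in panels b and c can be sketched as follows. This is an illustrative stand-alone version, not the authors' code: Spearman's ρ is computed as the Pearson correlation of average ranks, and the dynamic range as log10(max/min). All function names and the example series are hypothetical, and the sketch assumes strictly positive abundances and non-constant series.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def log10_dynamic_range(values):
    """log10 of max/min; assumes all values are strictly positive."""
    return math.log10(max(values) / min(values))


# Hypothetical gene (cpm) and protein (relative abundance) series, 8 time points.
gene_cpm = [10, 20, 30, 40, 50, 60, 70, 80]
protein_abundance = [1.0, 1.5, 1.2, 2.0, 2.5, 2.2, 3.0, 3.5]
rho = spearman(gene_cpm, protein_abundance)
gene_range = log10_dynamic_range(gene_cpm)
```

Repeating this over every gene-protein pair would give the distributions summarized in the histograms.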
FIG 3 Functional categories with strong or weak genomic prediction of proteome fluctuation. (a) Box plot demonstrating the distribution of Spearman correlations for each gene with an associated eggNOG functional category. The Spearman correlation (ρ) between the summed metagenomic counts per million per time point and the average relative abundance of associated metaproteomic protein is displayed. Summary statistics for these data can be found in Table S1. (b) Summed GO categories with strong genomic and proteomic correlation. (c) Summed GO categories with weak genomic and proteomic correlation.
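The category-level summation behind these panels can be sketched as below, assuming each gene or protein carries zero or more category labels and one value per time point. The data layout, function name, and example values are assumptions for illustration only.

```python
def sum_by_category(abundance, categories):
    """Sum per-time-point abundances of all features sharing a category.

    abundance:  feature id -> list of values, one per time point
    categories: feature id -> list of category labels (may be missing)
    Features without any category label are skipped.
    """
    totals = {}
    for feature, series in abundance.items():
        for cat in categories.get(feature, []):
            if cat not in totals:
                totals[cat] = [0.0] * len(series)
            totals[cat] = [t + v for t, v in zip(totals[cat], series)]
    return totals


# Hypothetical features measured at two time points.
abundance = {"g1": [1, 2], "g2": [3, 4], "g3": [5, 6]}
categories = {"g1": ["C"], "g2": ["C", "E"]}  # g3 has no annotation
totals = sum_by_category(abundance, categories)
```

The summed series for a category in each data type can then be compared with a rank correlation across time points.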
FIG 4 Genus-level associations with clinical markers. (a and b) Bar plot displaying the fractional composition of the most abundant genera (>0.03) in the metagenome (a) and the metaproteome (b) in each of the samples analyzed. (c) Comparison of genes and proteins significantly associated with each clinical marker. Venn diagrams show the number of genes and proteins with a large effect size (|r| > 0.7) with respect to clinical markers based on linear regression. (d) Genera associated with clinical markers. The associated proteins with genus-level taxonomy analyzed as described for panel c were compared by determining the log ratios of the compositions of proteins with positive and negative associations. The log ratio is plotted on the x axis for each clinical marker, and bars represent the association with each genus. Metaproteome values are plotted in red, and metagenome values are plotted in black. The numbers of genes and proteins included in this analysis are listed in Table S2.
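One way to read panels c and d is sketched below: keep features whose correlation with a marker exceeds |r| > 0.7, then compare, per genus, its share among positively versus negatively associated features. This is an interpretation, not the authors' implementation. The pseudocount, the natural-log base, all function names, and the example data are assumptions.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson r, the effect size of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strong_associations(features, marker, threshold=0.7):
    """Return {feature: r} for features with |r| above the threshold."""
    result = {}
    for name, series in features.items():
        r = pearson_r(series, marker)
        if abs(r) > threshold:
            result[name] = r
    return result


def genus_log_ratios(assoc, genus_of, pseudocount=1.0):
    """Log ratio of each genus's share among positive vs. negative features.

    Assumption: a pseudocount avoids division by zero for one-sided genera.
    """
    pos = Counter(genus_of[f] for f, r in assoc.items() if r > 0)
    neg = Counter(genus_of[f] for f, r in assoc.items() if r < 0)
    genera = set(pos) | set(neg)
    pos_total = sum(pos.values()) + pseudocount * len(genera)
    neg_total = sum(neg.values()) + pseudocount * len(genera)
    return {
        g: math.log(((pos[g] + pseudocount) / pos_total)
                    / ((neg[g] + pseudocount) / neg_total))
        for g in genera
    }


# Hypothetical marker and protein series over five time points.
marker = [1, 2, 3, 4, 5]
features = {"p1": [2, 4, 6, 8, 10], "p2": [10, 8, 6, 4, 2], "p3": [1, 3, 2, 3, 1]}
genus_of = {"p1": "Escherichia", "p2": "Faecalibacterium", "p3": "Bacteroides"}
assoc = strong_associations(features, marker)
ratios = genus_log_ratios(assoc, genus_of)
```

A positive log ratio marks a genus enriched among positively associated features, and a negative value marks the opposite; the genus labels here are placeholders attached to made-up numbers.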
FIG 5 Functional associations with clinical markers. (a) Functions associated with clinical markers. Linear regressions to clinical markers were performed and the number of proteins or genes derived from each functional group with a large effect size (|r| > 0.7) were compared. The log ratio of the composition of positive and negative proteins is plotted on the x axis for each clinical marker. Metaproteome values are plotted in red and metagenome values are plotted in black. PTM, posttranslational modification; T&M, transport and metabolism. (b) Time series plots of selected proteins of interest. Protein abundances of one finding from each clinical marker are shown. A legend describing the protein names and associated genera is shown below each graph. DAHP synthase, 3-deoxy-d-arabinoheptulosonate 7-phosphate synthase.