Sandip Chatterjee, Gregory S Stupp, Sung Kyu Robin Park, Jean-Christophe Ducom, John R Yates, Andrew I Su, Dennis W Wolan.
Abstract
BACKGROUND: Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations.
Keywords: Database; Metaproteomics; Microbiome; MongoDB; Proteomic search engine; Proteomics
Year: 2016 PMID: 27528457 PMCID: PMC4986259 DOI: 10.1186/s12864-016-2855-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Design, components, and generation of ComPIL databases. a ComPIL utilizes three databases that are generated from an input protein FASTA file. MassDB contains peptide sequences organized by distinct mass; ProtDB contains protein information; SeqDB contains distinct peptide sequences along with their parent proteins (mapped to ProtDB). b Public protein repositories and numbers of proteins incorporated into ComPIL. Numbers shown above columns are in millions. c 1) Protein data from various repositories (shown in b) were grouped together in FASTA format. Protein records were imported into ProtDB. 2) Proteins were in silico digested to peptides using trypsin specificity. 3) Peptides were sorted by sequence or by mass to group peptides with identical sequences or masses together, respectively. 4) Peptides with identical sequences or masses were grouped into JSON objects which were imported into MongoDB as SeqDB or MassDB, respectively. For implementation details see Additional file 1: Methods
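The database generation steps in Fig. 1c can be illustrated with a minimal Python sketch. It is not the ComPIL implementation: the document field names (`defline`, `parents`, `peptides`), the minimum peptide length of 6, the absence of missed cleavages, and rounding masses to 3 decimal places are all assumptions made for illustration, and the sketch emits JSON strings rather than writing to MongoDB. The FASTA records are made up.

```python
import json
import re
from collections import defaultdict

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tryptic_digest(seq, min_len=6):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    return [p for p in pieces if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass rounded to 3 decimals (assumed bin width)."""
    return round(sum(RESIDUE_MASS[aa] for aa in peptide) + WATER, 3)


def build_documents(fasta_text, min_len=6):
    """Return (prot_db, seq_db, mass_db) as lists of JSON-ready dicts."""
    prot_db, parents = [], defaultdict(set)
    for prot_id, (header, seq) in enumerate(parse_fasta(fasta_text)):
        prot_db.append({"_id": prot_id, "defline": header, "sequence": seq})
        for pep in tryptic_digest(seq, min_len):
            # Skip peptides with residues outside the 20 standard amino acids.
            if all(aa in RESIDUE_MASS for aa in pep):
                parents[pep].add(prot_id)
    # SeqDB-like: one document per distinct peptide sequence.
    seq_db = [{"_id": pep, "parents": sorted(ids)}
              for pep, ids in sorted(parents.items())]
    # MassDB-like: one document per distinct (rounded) mass.
    by_mass = defaultdict(list)
    for pep in sorted(parents):
        by_mass[peptide_mass(pep)].append(pep)
    mass_db = [{"_id": m, "peptides": peps} for m, peps in sorted(by_mass.items())]
    return prot_db, seq_db, mass_db


# Made-up input: two proteins sharing one peptide and carrying two isobaric ones.
DEMO_FASTA = """>P1 hypothetical protein one
MKAAAAAAKGAGAGAGK
>P2 hypothetical protein two
CCCCCCKAAAAAAKAGAGAGGK
"""

prot_db, seq_db, mass_db = build_documents(DEMO_FASTA)
for doc in seq_db + mass_db:
    print(json.dumps(doc))
```

In this toy input, the shared peptide `AAAAAAK` ends up in a single sequence document with both parent proteins, while the isobaric peptides `AGAGAGGK` and `GAGAGAGK` share one mass document.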
Fig. 2 Validation of ComPIL with Blazmass searches using human and B. fragilis samples. a Human and B. fragilis proteins were extracted and tandem MS data were collected using MudPIT. The datasets were searched against ComPIL and either the human proteome or the B. fragilis proteome. b ComPIL searches of HEK293 cells. Unfiltered PSMs (top) and filtered PSMs (after filtering at 1% FDR with DTASelect2, bottom) are shown categorized as forward human matches (left), forward non-human matches (middle) or reverse (i.e., decoy) protein matches (right)
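The idea of filtering PSMs at a fixed FDR using reverse (decoy) matches can be sketched generically. The code below is a plain score-threshold target-decoy filter, not the procedure DTASelect2 uses; the `score` and `is_decoy` field names and the synthetic PSMs are hypothetical.

```python
def filter_psms_at_fdr(psms, max_fdr=0.01):
    """Keep forward PSMs above the lowest score cutoff whose estimated FDR
    (decoy count / forward count) does not exceed max_fdr."""
    ranked = sorted(psms, key=lambda p: p["score"], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs to accept
    for i, psm in enumerate(ranked, start=1):
        if psm["is_decoy"]:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p["is_decoy"]]


# Synthetic example: 160 forward PSMs and 3 decoys with made-up scores.
demo_psms = (
    [{"score": 100.0 + i, "is_decoy": False} for i in range(150)]
    + [{"score": 99.5, "is_decoy": True}]
    + [{"score": 50.0 + i, "is_decoy": False} for i in range(10)]
    + [{"score": 49.0, "is_decoy": True}, {"score": 48.0, "is_decoy": True}]
)
accepted = filter_psms_at_fdr(demo_psms, max_fdr=0.01)
print(len(accepted))
```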
Fig. 3 Post-infection detection of Influenza A peptides in Calu-3 cells searched using the ComPIL database. Detected Influenza peptides are shown mapped to their location within the Influenza A/Anhui/1/2013 proteome as a function of time. The color represents the normalized spectral counts of peptides found at each residue. Influenza A peptides could be detected from 7 h post-infection, and showed a direct relationship between infection time and relative quantitation of the peptides. A Jupyter notebook with more details about this figure is available at https://bitbucket.org/sulab/metaproteomics
Fig. 4 Evaluation of ComPIL-Blazmass search of a complex microbiome sample. a Filtered PSMs from four human stool proteome samples after searching each dataset with either the “46 proteomes” database or ComPIL. b Protein loci identified from four human stool proteome samples after searching each dataset with either the “46 proteomes” database or ComPIL
Fig. 5 Functional annotation of five human microbiome proteome samples. Stacked bar chart showing the most abundant GO terms in each sample quantified by spectral counts. GO terms comprising the top 80% of spectral counts (on average across all samples) are shown, with the other GO terms grouped into the “Other” category. Represented are the five healthy human fecal samples subjected to metaproteomics and the three technical replicates of each sample
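One plausible reading of the "top 80% on average across all samples" grouping in Fig. 5 is sketched below. How the authors actually computed the cutoff is not stated here, so the averaging of per-sample fractions, the function name, and the counts are assumptions for illustration only.

```python
def collapse_go_terms(counts_by_sample, top_fraction=0.80):
    """Keep the highest-ranked GO terms until their mean share of spectral
    counts reaches top_fraction; sum the rest into 'Other' per sample."""
    samples = list(counts_by_sample)
    terms = sorted({t for counts in counts_by_sample.values() for t in counts})
    totals = {s: sum(counts_by_sample[s].values()) for s in samples}

    # Mean fraction of spectral counts per term, averaged over samples.
    mean_frac = {}
    for term in terms:
        fracs = [counts_by_sample[s].get(term, 0) / totals[s] if totals[s] else 0.0
                 for s in samples]
        mean_frac[term] = sum(fracs) / len(samples)

    kept, cumulative = [], 0.0
    for term in sorted(terms, key=lambda t: (-mean_frac[t], t)):
        if cumulative >= top_fraction:
            break
        kept.append(term)
        cumulative += mean_frac[term]

    collapsed = {}
    for s in samples:
        row = {t: counts_by_sample[s].get(t, 0) for t in kept}
        row["Other"] = totals[s] - sum(row.values())
        collapsed[s] = row
    return collapsed


# Made-up spectral counts for two samples and four placeholder GO terms.
demo_counts = {
    "sample1": {"GO:A": 50, "GO:B": 20, "GO:C": 20, "GO:D": 10},
    "sample2": {"GO:A": 40, "GO:B": 30, "GO:C": 20, "GO:D": 10},
}
collapsed = collapse_go_terms(demo_counts)
print(collapsed)
```

Each collapsed row could then be passed to a plotting library to draw one stacked bar per sample.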