MPA Portable: A Stand-Alone Software Package for Analyzing Metaproteome Samples on the Go.

Thilo Muth1, Fabian Kohrs2, Robert Heyer2, Dirk Benndorf2,3, Erdmann Rapp3, Udo Reichl2,3, Lennart Martens4,5, Bernhard Y Renard1.   

Abstract

Metaproteomics, the mass spectrometry-based analysis of proteins from multispecies samples, faces severe challenges concerning data analysis and results interpretation. To address these challenges, we here introduce the MetaProteomeAnalyzer (MPA) Portable software. In contrast to the original server-based MPA application, this newly developed tool no longer requires computational expertise for installation and is now independent of any relational database system. In addition, MPA Portable now supports state-of-the-art database search engines and a convenient command line interface for high-performance data processing tasks. While search engine results can easily be combined to increase the protein identification yield, an additional two-step workflow is implemented to provide sufficient analysis resolution for further postprocessing steps, such as protein grouping as well as taxonomic and functional annotation. Our new application has been developed with a focus on intuitive usability, adherence to data standards, and adaptation to Web-based workflow platforms. The open source software package can be found at https://github.com/compomics/meta-proteome-analyzer.

Year:  2017        PMID: 29215871      PMCID: PMC5757220          DOI: 10.1021/acs.analchem.7b03544

Source DB:  PubMed          Journal:  Anal Chem        ISSN: 0003-2700            Impact factor:   6.986


The key role of microbial consortia has recently gained increased attention due to promising findings on their functional repertoire in the human intestinal tract. Complex microbial communities fulfill essential host-related functions regarding nutrient uptake, digestion, and immune response.[1] Importantly, the human gut microbiome has also been correlated with pathological states such as type-2 diabetes,[2] cardiovascular disease,[3] Crohn’s disease,[4] inflammatory bowel disease,[5] and obesity.[6,7] In more general terms, the importance of microbial communities is related to the well-known fact that microbes are critical to the niche system (e.g., human host) in which they reside. One of the most common approaches for studying microbial communities is genome analysis, using either 16S rRNA gene sequencing or shotgun whole metagenome sequencing.[8] While these techniques are highly useful tools for gaining insights into the composition and functional potential of a microbial community, they lack the ability to capture the actual functional profile of such a community at a given time point and under specific conditions.
However, such profiling is essential to demonstrate that predicted biological processes are actually present and active in a given sample, and it can only be obtained from the functionally active snapshot of microbial communities.[9] Metaproteomics, the mass spectrometry-based analysis of multispecies proteins from microbial samples, aims to elucidate the functional expression and taxonomic origin of such microbial consortia.[10−12] This proteomic technique is also employed for rapidly detecting pathogens and studying their host-adaptation mechanisms.[13] The application of metaproteomics has led to promising findings in recent studies in which disease-associated protein markers could be identified, e.g., when analyzing samples from bovine blood serum[14] or human oral saliva.[15] While throughput and resolution of instrumentation have evolved dramatically within the past decade, the analysis and interpretation of the resulting data remains a challenge. This can mainly be attributed to the complexity and heterogeneity of microbiome samples, which can contain proteins from hundreds or thousands of different species.[16] Despite the increase in popularity of metaproteomics, existing proteome bioinformatics methods have not yet been sufficiently adapted to address these challenges,[17] and tailored solutions for metaproteomics remain rare.[18−21] In this article, we present the MetaProteomeAnalyzer (MPA) Portable software and demonstrate all novel features and improvements which have been developed since the original MPA publication.[21] MPA Portable is a lightweight and freely available application which serves as a one-stop solution for processing and analyzing metaproteomics data. In contrast to the original server-based MPA software,[21] the MPA Portable tool requires no further installation steps and is independent of any relational database system.
In addition to the graphical user interface (GUI), which can be used for in-depth data exploration, a command line interface (CLI) has also been added to MPA Portable. This allows the program to be executed as part of a larger, scripted workflow, for instance, on a high-performance cluster environment. While a standalone version (including a guided tutorial) is available for download on the GitHub Web site (https://github.com/compomics/meta-proteome-analyzer), the whole MPA workflow has also been included within the community-accepted multiomics informatics platform of Galaxy-P.[22] In addition to the previously supported database search engine X!Tandem,[23] the newly developed software now also integrates the SEQUEST-derivative Comet[24] and MS-GF+[25] as search algorithms. As in the original software, MPA Portable allows the results of multiple search engines to be combined to increase the overall peptide and protein identification yield, but it adds the ability to perform an optional two-step search workflow.[26] In the two-step searching approach, the spectra are first matched against a wide search space (e.g., the whole UniProt database) without applying any FDR filtering. On the basis of the results of this search, the proteins with at least one peptide-spectrum match (PSM) are retained and used to create a new sequence database. In a second round, a typical search against the reduced search space is applied with stringent FDR filtering. The objective of such an iterative search procedure is to increase the number of highly confident peptide-spectrum matches and, consequently, to improve overall protein identification yield.
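The database-reduction step between the two search rounds can be illustrated with a minimal Python sketch. This is not the MPA Portable implementation: the function names (`parse_fasta`, `reduce_database`, `to_fasta`), the representation of first-round PSMs as (spectrum identifier, protein accession) pairs, and the use of the first header token as accession are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {accession: sequence}.

    Assumption: the accession is the first whitespace-delimited token
    of each header line (hypothetical convention for this sketch).
    """
    entries: Dict[str, str] = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            accession = line[1:].split()[0]
            entries[accession] = ""
        elif accession is not None:
            entries[accession] += line
    return entries


def reduce_database(
    fasta: Dict[str, str],
    first_round_psms: Iterable[Tuple[str, str]],
) -> Dict[str, str]:
    """Keep only proteins with at least one PSM in the unfiltered first round."""
    hit_accessions = {accession for _spectrum_id, accession in first_round_psms}
    return {acc: seq for acc, seq in fasta.items() if acc in hit_accessions}


def to_fasta(entries: Dict[str, str]) -> str:
    """Serialize the reduced database back to FASTA text for the second round."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in entries.items())


# Toy example: three proteins in the wide search space, two of them hit.
full_db = parse_fasta(
    ">P1 protein one\nMKTAYIAK\n>P2 protein two\nGSHMLEDP\n>P3 protein three\nAAAKLLR\n"
)
psms = [("scan_1", "P1"), ("scan_2", "P3"), ("scan_3", "P1")]
reduced = reduce_database(full_db, psms)
print(sorted(reduced))  # ['P1', 'P3']
```

The second round would then search the same spectra against the text returned by `to_fasta(reduced)`, this time with stringent FDR filtering.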
This strategy is particularly useful for metaproteomics data analysis, which usually suffers from a decreased identification rate when searching against large protein sequence databases, which is in turn caused by the higher chance of retrieving high-scoring false positive identifications.[27] Moreover, to improve compatibility with existing proteomic software tools, the import of files stored in the mzIdentML standard data format (version 1.2)[28] has also been implemented. This latter feature is particularly useful for the reprocessing of identification results that have been generated using external tools or elsewhere, as is for instance the case for data obtained from the public domain PRIDE database.[29] Figure 1 provides an overview of an MPA Portable workflow, comprising all typical steps of data processing, ranging from the input of MS/MS spectra, through protein identification, to the MPA-specific postprocessing features such as protein grouping and automated sequence annotation at the taxonomic and functional level.[21]
Figure 1

Overview of the MPA Portable workflow. The software can be accessed using either the graphical user interface (A) or the command line interface (B). User-provided MS/MS spectra (C) are processed within the application for matching against a FASTA database by up to three different search algorithms (X!Tandem, MS-GF+, and Comet) (D). As an alternative to conventional searching, a two-step search (E) can be applied to iteratively reduce the search space. Further postprocessing steps (F) include grouping of homologous proteins to meta-proteins and the fully automated assignment of peptides and proteins to taxonomic levels and functional annotations, as described for the original MPA software package.[21]

We tested our proposed software workflow on two experimental data sets from samples with known composition. The first benchmarking data set (5BCT) was established by mixing the bacterial strains Bacillus subtilis, Escherichia coli, Pseudomonas fluorescens, Micrococcus luteus, and Desulfovibrio vulgaris at a protein ratio of 1:1:1:1:1. The corresponding sample was prepared in-house, and sample specifications can be found in the Supporting Information. The mass spectrometry data have been deposited to the ProteomeXchange Consortium via PRIDE,[29] with data set identifier PXD007681. The second data set used for evaluation was derived from a lab-assembled mixture of nine microbial organisms (9MM) published by Tanca et al.[30] The first evaluation concerned the performance of the newly integrated search algorithms in MPA Portable against UniProtKB/Swiss-Prot (2016/12/13) with regard to the accuracy of assigning peptides to given reference taxa at the species level.
In this analysis, the assignments were classified at the peptide level as follows: (i) “correct unique” when a peptide was matched unambiguously to a protein from the correct taxon, (ii) “correct” when a peptide was matched to the correct taxon but was also shared with proteins from incorrect taxa, and (iii) “incorrect” when a peptide was assigned to species which were not contained in the original sample. In addition, we also applied a taxonomy filter after combining the search results within the MPA application (as previously described by Tanca et al.[30]) corresponding to 5% of the total number of taxon-specific assignments. Thus, only taxa with a higher number of peptide assignments than the specified filter threshold were taken into consideration in that case. The results show that the combination of hits from multiple search engines within the MPA Portable workflow significantly increases the number of correct unique and correct taxon-specific peptides (Figure 2). In addition, the results show that the number of incorrect assignments could be decreased by filtering for the most abundant organisms using a relatively high threshold of 5%. However, for more complex samples, such stringent taxonomic filtering is not recommended, as it may considerably reduce the proportion of sparsely identified but essential organisms.
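The classification rules and the taxonomy filter described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not code from MPA Portable: the function names are hypothetical, each peptide is assumed to carry the set of species of all proteins it matches, and a peptide shared only among several expected species is treated as "correct" (the text does not spell out that case).

```python
from collections import Counter
from typing import Dict, Iterable, Set


def classify_peptide(peptide_taxa: Set[str], expected_taxa: Set[str]) -> str:
    """Classify one peptide by the species of the proteins it matches."""
    if not peptide_taxa & expected_taxa:
        # No matched species is part of the known sample composition.
        return "incorrect"
    if len(peptide_taxa) == 1:
        # Matched unambiguously to a single, expected species.
        return "correct unique"
    # Matches an expected species but is shared with other species.
    return "correct"


def apply_taxon_filter(
    assignments: Iterable[str], fraction: float = 0.05
) -> Dict[str, int]:
    """Keep taxa whose assignment count exceeds `fraction` of all assignments."""
    counts = Counter(assignments)
    threshold = fraction * sum(counts.values())
    return {taxon: n for taxon, n in counts.items() if n > threshold}


expected = {"Escherichia coli", "Bacillus subtilis"}
print(classify_peptide({"Escherichia coli"}, expected))                         # correct unique
print(classify_peptide({"Escherichia coli", "Salmonella enterica"}, expected))  # correct
print(classify_peptide({"Salmonella enterica"}, expected))                      # incorrect

# 100 toy assignments: the 5% filter (threshold = 5 assignments) drops the rare taxon.
toy = ["Escherichia coli"] * 60 + ["Bacillus subtilis"] * 37 + ["Salmonella enterica"] * 3
print(apply_taxon_filter(toy))  # {'Escherichia coli': 60, 'Bacillus subtilis': 37}
```

The toy example also shows the caveat raised in the text: any organism contributing fewer assignments than the threshold is discarded, whether or not it is genuinely present.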
Figure 2

Taxon-specific peptide assignment performance for 5BCT and 9MM reference data. The numbers of correct unique, correct (i.e., unique and shared) and incorrect taxon-specific peptide identifications are shown as bar charts for data sets 5BCT (A) and 9MM (B) when using X!Tandem (blue), Comet (orange), MS-GF+ (violet), and MPA Portable with an applied taxon filter (TF) of 5% (green). For the latter, the results of all three database search algorithms were combined by taking the union of all hits. The data sets were searched against UniProtKB/Swiss-Prot and filtered by an FDR threshold of 1%.

The second evaluation of our software concerned the effect of the newly implemented two-step search strategy. In general, the composition and size of the protein sequence database have a strong impact on the results in any proteomics data analysis workflow, and it has been recommended to focus on relevant sequences for better identification yields.[31] In metaproteomics, however, the actual microbial composition in the sample is commonly unknown and the identification of relevant sequences is therefore highly problematic. Indeed, it would be particularly damaging to introduce selection bias by the mistaken removal of relevant taxa and their reference proteomes. To automate the process of building a sample-optimized search database with sufficient taxonomic coverage and depth, we therefore implemented the previously described two-step search approach.[26] To evaluate the performance of two-step searching in comparison to conventional searching, we first searched the 9MM data set of known species composition against a tailored database which only contained the protein sequence sets from the nine expected organisms. Next, we matched the same data set against UniProtKB/Swiss-Prot using (i) conventional and (ii) two-step searching. The two-step method was used by searching against the whole database without applying any FDR threshold in the first round.
The protein identifications obtained from this first round then served as the reduced database in a second-round search on the same data. In this second round, a stringent FDR threshold of 1% was applied to reduce the number of false positive hits. For each search setting, the peptide identifications were classified into correct and incorrect taxon assignments as described above. We also applied a taxonomy filter with a threshold ranging from 0% to 10% to test the influence of this parameter. In this analysis, the database search results from all three search algorithms X!Tandem, Comet, and MS-GF+ were combined by taking the union of the respective hits. The results of the search method evaluation can be found in Figure 3. As expected, the most correct taxon assignments at the peptide level could be obtained when searching the data against the 9MM reference database (Figure 3A). When searching the same data against the whole UniProtKB/Swiss-Prot database, on average, only around 69% of the original hits could be correctly assigned. This can be explained by the increased search space and the peptide sequence ambiguities among homologous species. When applied to the UniProtKB/Swiss-Prot search, the two-step searching approach recovered around 80% of the peptides originally identified against only the 9MM reference database, thus showing better performance than the standard search. However, the proportion of incorrect species assignments is also higher for the two-step approach when compared to the standard search (Figure 3B). This effect can be minimized by increasing the taxon filter threshold, demonstrating that the two-step approach is most beneficial when used in combination with taxonomic filtering. It should be noted, however, that the results from the described two-step approach should be treated carefully, as the actual FDR of the overall process is likely to be higher than the 1% FDR set for the second round.[27]
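Combining search engine results by taking the union of hits, as used in this evaluation, can be sketched as follows. The sketch assumes that each engine's hits have already been FDR-filtered and that a hit is identified by a (spectrum identifier, peptide sequence) pair; both the data model and the function name `combine_by_union` are hypothetical and not taken from the MPA Portable code base.

```python
from typing import Dict, Set, Tuple

Hit = Tuple[str, str]  # (spectrum identifier, peptide sequence)


def combine_by_union(engine_hits: Dict[str, Set[Hit]]) -> Dict[Hit, Set[str]]:
    """Merge per-engine hit sets; map each hit to the engines that reported it."""
    combined: Dict[Hit, Set[str]] = {}
    for engine, hits in engine_hits.items():
        for hit in hits:
            combined.setdefault(hit, set()).add(engine)
    return combined


# Toy hits, assumed to be already filtered at the chosen FDR threshold.
hits = {
    "X!Tandem": {("scan_1", "MKTAYIAK"), ("scan_2", "AAAKLLR")},
    "Comet": {("scan_1", "MKTAYIAK"), ("scan_3", "GSHMLEDPK")},
    "MS-GF+": {("scan_3", "GSHMLEDPK")},
}
union = combine_by_union(hits)
print(len(union))                                # 3
print(sorted(union[("scan_1", "MKTAYIAK")]))     # ['Comet', 'X!Tandem']
```

Keeping track of which engines support each hit is a design choice of this sketch; it makes it easy to inspect how much of the union is contributed by a single engine only.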
Figure 3

Performance evaluation of standard versus two-step searching for 9MM reference data. The numbers of correct (A) and incorrect (B) taxon-specific peptides at 1% FDR are shown as line charts for the 9MM data set, which was searched against a tailored reference database (green) and against UniProtKB/Swiss-Prot using standard search (blue) and two-step search approach (violet). The taxonomy filter shown on the x-axis was applied for values between 0% and 10%. Note that incorrect taxon-specific peptides were not found for the tailored reference database, since only known microbial species were included in the search database.

The availability of the MPA Portable software marks another step toward fulfilling the needs of the metaproteomics community, which requires reliable and easily accessible solutions for analyzing its valuable high-throughput data. Moreover, the addition of a command-line interface to MPA Portable enables analyses using high-performance cluster hardware.[32] This is particularly important in the context of metaproteomics, because searches are often performed against very large protein sequence databases to cover a wide taxonomic range (e.g., UniProtKB/TrEMBL[33] with over 88 million entries as of July 2017). In our evaluation of the performance of the taxonomic assignment, a rather limited number of fewer than 10 species from microbial mixture samples was used. However, data from more complex samples are required to improve the performance of our pipeline. Overall, the metaproteomics research community would strongly benefit from such well-defined reference data for benchmarking and optimization of analytical workflows and software tools. Moreover, a thorough assessment of protein FDR estimation represents an important next step for the field, as tools still need to become statistically more reliable to make results more reproducible.
It should also be noted that the output of the command-line execution of MPA Portable can be fully imported into the GUI, thus allowing large data sets (e.g., hundreds of thousands to millions of MS/MS spectra) to be analyzed on a cluster environment, while being visualized in the application on a local desktop computer. Consequently, MPA Portable now offers users the combination of usability with computational power in a single package. Finally, to make the developed software even more hardware-independent and sustainable, it has also been integrated into the Galaxy-based workflow for metaproteomics analysis.[34] Similarly, we plan to distribute it within the system-agnostic BioContainers framework, which allows software to be installed and executed in an isolated and controllable environment.[35]
  35 in total

Review 1.  Systems biology: Functional analysis of natural microbial consortia using community proteomics.

Authors:  Nathan C VerBerkmoes; Vincent J Denef; Robert L Hettich; Jillian F Banfield
Journal:  Nat Rev Microbiol       Date:  2009-03       Impact factor: 60.633

2.  Metaproteomic analysis using the Galaxy framework.

Authors:  Pratik D Jagtap; Alan Blakely; Kevin Murray; Shaun Stewart; Joel Kooren; James E Johnson; Nelson L Rhodus; Joel Rudney; Timothy J Griffin
Journal:  Proteomics       Date:  2015-07-24       Impact factor: 3.984

3.  The MetaProteomeAnalyzer: a powerful open-source software suite for metaproteomics data analysis and interpretation.

Authors:  Thilo Muth; Alexander Behne; Robert Heyer; Fabian Kohrs; Dirk Benndorf; Marcus Hoffmann; Miro Lehtevä; Udo Reichl; Lennart Martens; Erdmann Rapp
Journal:  J Proteome Res       Date:  2015-02-23       Impact factor: 4.466

4.  A complex microworld in the gut: gut microbiota and cardiovascular disease connectivity.

Authors:  Michael R Howitt; Wendy S Garrett
Journal:  Nat Med       Date:  2012-08       Impact factor: 53.440

5.  Deep Metaproteomics Approach for the Study of Human Microbiomes.

Authors:  Xu Zhang; Wendong Chen; Zhibin Ning; Janice Mayne; David Mack; Alain Stintzi; Ruijun Tian; Daniel Figeys
Journal:  Anal Chem       Date:  2017-08-11       Impact factor: 6.986

6.  Gut microbiota in human adults with type 2 diabetes differs from non-diabetic adults.

Authors:  Nadja Larsen; Finn K Vogensen; Frans W J van den Berg; Dennis Sandris Nielsen; Anne Sofie Andreasen; Bente K Pedersen; Waleed Abu Al-Soud; Søren J Sørensen; Lars H Hansen; Mogens Jakobsen
Journal:  PLoS One       Date:  2010-02-05       Impact factor: 3.240

7.  Pediatric Crohn disease patients exhibit specific ileal transcriptome and microbiome signature.

Authors:  Yael Haberman; Timothy L Tickle; Phillip J Dexheimer; Mi-Ok Kim; Dora Tang; Rebekah Karns; Robert N Baldassano; Joshua D Noe; Joel Rosh; James Markowitz; Melvin B Heyman; Anne M Griffiths; Wallace V Crandall; David R Mack; Susan S Baker; Curtis Huttenhower; David J Keljo; Jeffrey S Hyams; Subra Kugathasan; Thomas D Walters; Bruce Aronow; Ramnik J Xavier; Dirk Gevers; Lee A Denson
Journal:  J Clin Invest       Date:  2014-07-08       Impact factor: 14.808

8.  Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment.

Authors:  Xochitl C Morgan; Timothy L Tickle; Harry Sokol; Dirk Gevers; Kathryn L Devaney; Doyle V Ward; Joshua A Reyes; Samir A Shah; Neal LeLeiko; Scott B Snapper; Athos Bousvaros; Joshua Korzenik; Bruce E Sands; Ramnik J Xavier; Curtis Huttenhower
Journal:  Genome Biol       Date:  2012-04-16       Impact factor: 13.583

9.  MS-GF+ makes progress towards a universal database search tool for proteomics.

Authors:  Sangtae Kim; Pavel A Pevzner
Journal:  Nat Commun       Date:  2014-10-31       Impact factor: 14.919

10.  MetaPro-IQ: a universal metaproteomic approach to studying human and mouse gut microbiota.

Authors:  Xu Zhang; Zhibin Ning; Janice Mayne; Jasmine I Moore; Jennifer Li; James Butcher; Shelley Ann Deeke; Rui Chen; Cheng-Kang Chiang; Ming Wen; David Mack; Alain Stintzi; Daniel Figeys
Journal:  Microbiome       Date:  2016-06-24       Impact factor: 14.650

