Chen Chen, Jie Hou, John J Tanner, Jianlin Cheng.
Abstract
Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
Keywords: bioinformatics analysis; computational proteomics; machine learning
Year: 2020 PMID: 32326049 PMCID: PMC7216093 DOI: 10.3390/ijms21082873
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. General workflow of bioinformatics analysis in mass spectrometry-based proteomics. (a) MA-plot from protein differential abundance analysis; the X-axis is the log2-transformed fold change and the Y-axis is the average protein abundance across replicates. (b) Distribution of protein abundance data before and after normalization. (c) Heatmap of protein abundance with clustering. (d) Protein set enrichment analysis; the Y-axis of the upper plot shows the ranked-list metric and that of the lower plot shows the running enrichment score, while the X-axis is the rank position in the protein list. (e) Machine learning-based sample clustering. (f) Illustration of a network inferred from proteomics data. (g) Dimensionality reduction of a proteomics expression profile.
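The per-protein quantities behind panels (a) and (b) are simple to compute. The following is a minimal sketch in plain Python, assuming raw intensities are stored as one dictionary per sample and using per-sample median centering of log2 intensities as the normalization. That normalization is an illustrative assumption (it presumes most proteins do not change between conditions), not necessarily the method used for the figure, and the function names `log2_median_normalize` and `ma_values` are hypothetical.

```python
import math
from statistics import mean, median


def log2_median_normalize(samples):
    """Log2-transform each sample and subtract that sample's median.

    samples: dict mapping sample name -> dict of protein -> raw intensity (> 0).
    Returns the same layout with median-centered log2 values.
    """
    normalized = {}
    for name, intensities in samples.items():
        logs = {prot: math.log2(value) for prot, value in intensities.items()}
        center = median(logs.values())
        normalized[name] = {prot: v - center for prot, v in logs.items()}
    return normalized


def ma_values(norm, group_a, group_b):
    """Return protein -> (M, A) for proteins quantified in every sample.

    M is the log2 fold change (mean of group_b minus mean of group_a);
    A is the average log2 abundance over all replicates of both groups.
    """
    samples = group_a + group_b
    shared = set.intersection(*(set(norm[s]) for s in samples))
    result = {}
    for prot in sorted(shared):
        a_vals = [norm[s][prot] for s in group_a]
        b_vals = [norm[s][prot] for s in group_b]
        result[prot] = (mean(b_vals) - mean(a_vals), mean(a_vals + b_vals))
    return result
```

With this sign convention, M = 1 means a two-fold higher abundance in `group_b` than in `group_a`.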
Commonly used software packages for peptide and protein identification.
| Category | Name | Description |
|---|---|---|
| Database search algorithms | Andromeda | Probabilistic scoring-based peptide search engine integrated in MaxQuant. |
| | Mascot | Probability-based database searching algorithm. |
| | MSPLIT-DIA | Sensitive peptide identification for data-independent acquisition (DIA). |
| | MudPIT | Multidimensional protein identification technology. |
| | PepArML | An unsupervised, model-free, machine-learning combiner for peptide identifications from tandem mass spectra. |
| | PepHMM | A hidden Markov model-based scoring function for mass spectrometry database search. |
| | Protein Prospector | An integrated framework of about twenty proteomic analysis tools. |
| | SEQUEST | An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. |
| | TopPIC | A software tool for identifying complex proteoforms from top-down mass spectrometry data. |
| | X!Tandem/X!!Tandem | Open-source software that matches tandem mass spectra against peptide sequences in a database. |
| De novo peptide sequencing | DeepNovo-DIA | De novo peptide sequencing by deep learning. |
| | EigenMS | De novo analysis of peptide tandem mass spectra. |
| | NovoHMM | A hidden Markov model for de novo peptide sequencing. |
| | PEAKS | A fast de novo sequencing tool. |
| | PECAN | Library-free peptide detection for data-independent acquisition tandem mass spectrometry data. |
| | PepNovo | De novo peptide sequencing via probabilistic network modeling. |
| | pNovo 3 | Software for precise de novo peptide sequencing using a learning-to-rank framework. |
| | SHERENGA | De novo peptide sequencing via tandem mass spectrometry. |
| | SWPepNovo | An efficient de novo peptide sequencing tool for large-scale MS/MS spectra analysis. |
| | UniNovo | A universal de novo peptide sequencing algorithm with a modified offset frequency function. |
| Hybrid identification approach | ByOnic | A hybrid of de novo sequencing and database search for protein identification by tandem mass spectrometry. |
| | DirecTag | Accurate sequence tags from peptide MS/MS spectra using a statistical scoring method. |
| | InsPecT | Software for identifying peptides and their post-translational modifications (PTMs) from tandem mass spectra. |
| | JUMP | A tag-based database search tool for peptide identification. |
| | PEAKS DB | A hybrid de novo sequencing tool run in parallel with database search. |
| | ProteomeGenerator | A hybrid framework based on de novo transcriptome assembly and database matching. |
| Other software related to protein/PTM identification | DBParser | Web-based software for shotgun proteomic data analyses. |
| | DIA-Umpire | Comprehensive computational framework for data-independent acquisition proteomics. |
| | MassSieve | Panning MS/MS peptide data for proteins. |
| | MAYU | A novel strategy that reliably estimates false discovery rates for protein identifications in large-scale datasets. |
| | ModifiComb | Mapping substoichiometric post-translational modifications. |
| | Nokoi | A decoy-free approach for improved peptide identification accuracy. |
| | Param-Medic | A strategy for inferring optimal search parameters for shotgun proteomics analysis. |
| | Perseus | Platform for comprehensive analysis of proteomics data. |
| | PROVALT | A heuristic method for computing the false discovery rate (FDR) of protein identifications. |
| | MetaMorpheus | Enhanced global post-translational modification discovery. |
| | PTMselect | Optimization of protein modification discovery by mass spectrometry. |
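Several entries above (e.g., MAYU and PROVALT) concern false discovery rate (FDR) estimation. As a point of reference, the sketch below illustrates the generic target-decoy idea, not the specific algorithm of any listed tool: peptide-spectrum matches (PSMs) from a search against target and decoy (e.g., reversed) sequences are ranked by score, the FDR at each threshold is approximated by the ratio of decoy to target hits at or above it (one common, simple estimate among several conventions), and each PSM's q-value is the smallest FDR at which it would still be accepted. The input format and the names `target_decoy_qvalues` and `accepted_targets` are hypothetical, and tied scores are not treated specially.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for peptide-spectrum matches (PSMs).

    psms: list of (score, is_decoy) tuples; a higher score is a better match.
    Returns a list of (score, is_decoy, q_value), best score first.
    """
    ranked = sorted(psms, key=lambda x: x[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the threshold were placed at this PSM.
        fdrs.append(decoys / max(targets, 1))
    # q-value: running minimum of the FDR taken from the bottom of the list
    # upward, so q-values never decrease as the score gets worse.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(psms, fdr_threshold=0.01):
    """Return the scores of target PSMs whose q-value is <= fdr_threshold."""
    return [s for s, d, q in target_decoy_qvalues(psms)
            if not d and q <= fdr_threshold]
```

Decoy-free approaches such as Nokoi avoid the decoy search altogether, which is one reason the table lists them separately.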
Commonly used software packages for quantitative proteomics.
| Category | Name | Description |
|---|---|---|
| Label-based | IsobariQ | Relative quantification software for both iTRAQ (isobaric tags for relative and absolute quantitation) and TMT (tandem mass tag) labeling. |
| | iTracker | Links quantitative information obtained with the iTRAQ protocol to peptide identifications from popular tandem MS identification tools. |
| | Libra | The iTRAQ quantification module of the TPP (Trans-Proteomic Pipeline). |
| | MaxQuant | One of the most frequently used platforms for mass spectrometry (MS)-based proteomics data analysis. |
| | ProteinPilot (AB Sciex) | Full solution for protein identification and label-based protein expression experiments. |
| | Proteome Discoverer (Thermo Scientific) | Proteomics workflows for a wide range of applications. |
| | PVIEW | LC-MS/MS data viewer and analyzer developed at Princeton. |
| | XPRESS | Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. |
| Label-free | emPAI | Exponentially modified protein abundance index. |
| | Mascot Distiller (Matrix Science) | A single, intuitive interface to a wide range of native (binary) mass spectrometry data files. |
| | MaxLFQ | Label-free quantification module integrated in MaxQuant. |
| | VIPER | Visual inspection of peak/elution relationships. |
| Integrated platform | OpenMS | Cross-platform software for the flexible analysis of MS data. |
| | PEAKS Studio X+ | A complete peptide/protein identification and quantification platform, including PTM and sequence-variant analysis. |
| | Skyline | An open-source Windows client application for accelerating targeted proteomics experimentation. |
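emPAI, listed above, can be computed directly from identification results. It is defined as emPAI = 10^(N_observed / N_observable) - 1, where N_observed is the number of peptides identified for a protein and N_observable is the number of peptides theoretically observable for it; dividing each protein's emPAI by the sum over all proteins gives an approximate molar percentage. The sketch below assumes both peptide counts are already known (obtaining N_observable normally requires an in silico digest of the protein sequence), and the names `empai` and `molar_percent` are hypothetical.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index: 10**(Nobs / Nobsbl) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def molar_percent(counts):
    """Estimate protein content in mol % from emPAI values.

    counts: dict protein -> (n_observed, n_observable).
    Returns dict protein -> emPAI / sum(emPAI) * 100.
    """
    scores = {prot: empai(obs, obsbl) for prot, (obs, obsbl) in counts.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no peptides observed for any protein")
    return {prot: 100 * score / total for prot, score in scores.items()}
```

For example, a protein with all of its observable peptides identified gets emPAI = 10 - 1 = 9.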
Figure 2. A typical workflow for protein abundance quantification using the Trans-Proteomic Pipeline (TPP). After the raw data files are converted to one of the supported open XML formats, the data are processed with a designated search engine, and the search results are validated at the spectrum, peptide, and protein levels. In parallel, the quantification tools quantify proteins from the normalized data. The final outputs are ready for various downstream analyses.
Figure 3. Illustration of enrichment analysis with proteomics data. (a) Gene Ontology (GO) enrichment shown as a Circos plot. The left part of the circle shows the differentially expressed proteins and their significance levels; the right part shows the enriched GO terms. (b) Pathway enrichment shown as a dot plot. Dot colors represent the adjusted p-value, and dot sizes represent the ratio of differentially expressed proteins to the total number of proteins in the pathway. (c) The running sum score of gene set enrichment analysis (GSEA). Walking down the ranked protein list, each protein in the set increases the enrichment score and each protein not in the set decreases it. (d) Protein abundance changes mapped onto a KEGG pathway with Pathview. Box colors represent the log2 fold change of protein abundance.
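The running sum in panel (c) can be sketched as follows, following the weighted statistic described for GSEA: walking down a list ranked by a metric such as log2 fold change, each set member raises the score by its |metric|^p divided by the sum of |metric|^p over all members, each non-member lowers it by 1/(N - N_hits), and the enrichment score (ES) is the running-sum value farthest from zero. This is an illustrative sketch with a hypothetical function name; permutation-based significance testing and ES normalization, which full GSEA implementations perform, are omitted.

```python
def running_enrichment_score(ranked, protein_set, p=1.0):
    """Compute a GSEA-style running sum and enrichment score (ES).

    ranked: list of (protein, metric) tuples sorted by metric, descending.
    protein_set: set of proteins (e.g., members of one pathway).
    p: weight exponent; p=0 gives an unweighted Kolmogorov-Smirnov-like walk.
    Returns (walk, es), where walk holds the running sum at each rank.
    """
    n = len(ranked)
    hits = [prot in protein_set for prot, _ in ranked]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("protein_set must contain some, but not all, ranked proteins")
    # Normalizer for the upward steps: total weight of the set members.
    hit_norm = sum(abs(metric) ** p for (_, metric), hit in zip(ranked, hits) if hit)
    if hit_norm == 0:
        raise ValueError("all set members have metric 0; use p=0")
    miss_step = 1.0 / (n - n_hits)
    running, walk = 0.0, []
    for (_, metric), hit in zip(ranked, hits):
        # Set members raise the score; non-members lower it by a fixed step.
        running += abs(metric) ** p / hit_norm if hit else -miss_step
        walk.append(running)
    # ES keeps its sign: positive if the set is enriched at the top of the list.
    es = max(walk, key=abs)
    return walk, es
```

Because the upward steps sum to 1 and the downward steps sum to -1, the walk always returns to zero at the end of the list.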
Downstream bioinformatics analysis software tools.
| Category | Name | Description |
|---|---|---|
| Downstream bioinformatics analysis tools | COVAIN | Software for statistics, time-series, and correlation network analysis. |
| | CRONOS | A cross-reference navigation server. |
| | GNET2 | An R package for module inference of biological networks. |
| | GSimp | A Gibbs sampler-based left-censored missing value imputation approach for metabolomics studies. |
| | HIPPIE v2.0 | A tool for enhancing the meaningfulness and reliability of protein–protein interaction networks. |
| | IKAP | A heuristic framework for inference of kinase activities from phosphoproteomics data. |
| | INGA | A tool for protein function prediction combining interaction networks, domain assignments, and sequence similarity. |
| | KSEA | A web-based tool for kinase activity inference from quantitative phosphoproteomics. |
| | Neglog | Reconstruction of human protein–protein interaction networks. |
| | Pathview | An R package for pathway-based data integration and visualization. |
| | PP2A | An integrated workflow for charting the human interaction proteome. |
| | ProLoc-GO | Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization. |
| | PSEA-Quant | Protein set enrichment analysis on label-free and label-based protein quantification data. |
| | viper | Virtual inference of protein activity by enriched regulon analysis. |
| Databases and web services for downstream analysis | STRING | A database of quality-controlled protein–protein association networks. |
| | SIGNOR | A database of causal relationships between biological entities. |
| | KEGG | A disease and pathway database. |
| | PANTHER Pathway | An ontology-based pathway database coupled with data analysis tools. |
| | Pathway Commons | A web resource for biological pathway data. |
| | PhosphoSitePlus | A comprehensive web resource for post-translational modifications. |
| | PICR | Reconciling protein identifiers across multiple source databases. |
| | Reactome | A database of reactions, pathways, and biological processes. |
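Pathway databases such as KEGG, Reactome, and PANTHER are often used for over-representation analysis of the kind summarized as a dot plot in Figure 3b, where p-values are adjusted for multiple testing. The sketch below assumes a one-sided hypergeometric test per pathway followed by Benjamini–Hochberg adjustment; these are common but not universal choices, and tools differ in particular in how they define the background protein set. The function names are hypothetical, and the pathways would come from one of the databases above.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pathway_enrichment(significant, background, pathways):
    """Over-representation test of significant proteins in each pathway.

    significant, background: sets of protein IDs (significant is a subset of background).
    pathways: dict pathway name -> set of protein IDs.
    Returns dict name -> (overlap, p_value, adjusted_p).
    """
    N, n = len(background), len(significant)
    names, overlaps, pvals = [], [], []
    for name, members in pathways.items():
        members = members & background  # count only proteins that could be detected
        k = len(members & significant)
        names.append(name)
        overlaps.append(k)
        pvals.append(hypergeom_sf(k, N, len(members), n))
    adjusted = benjamini_hochberg(pvals)
    return {nm: (k, p, q) for nm, k, p, q in zip(names, overlaps, pvals, adjusted)}
```

Restricting each pathway to the measured background matters in proteomics, where typically only a fraction of each pathway's proteins is quantified.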