Subina Mehta, Caleb W Easterly, Ray Sajulga, Robert J Millikin, Andrea Argentini, Ignacio Eguinoa, Lennart Martens, Michael R Shortreed, Lloyd M Smith, Thomas McGowan, Praveen Kumar, James E Johnson, Timothy J Griffin, Pratik D Jagtap.
Abstract
For mass spectrometry-based peptide and protein quantification, label-free quantification (LFQ) based on precursor mass peak (MS1) intensities is considered reliable due to its dynamic range, reproducibility, and accuracy. LFQ enables peptide-level quantitation, which is useful in proteomics (analyzing peptides carrying post-translational modifications) and multi-omics studies such as metaproteomics (analyzing taxon-specific microbial peptides) and proteogenomics (analyzing non-canonical sequences). Bioinformatics workflows accessible via the Galaxy platform have proven useful for analysis of such complex multi-omic studies. However, workflows within the Galaxy platform have lacked well-tested LFQ tools. In this study, we have evaluated moFF and FlashLFQ, two open-source LFQ tools, and implemented them within the Galaxy platform to offer access and use via established workflows. Through rigorous testing and communication with the tool developers, we have optimized the performance of each tool. Software features evaluated include: (a) match-between-runs (MBR); (b) using multiple file formats as input for improved quantification; (c) use of containers and/or conda packages; (d) parameters needed for analyzing large datasets; and (e) optimization and validation of software performance. This work establishes a process for software implementation, optimization, and validation, and offers access to two robust software tools for LFQ-based analysis within the Galaxy platform.
Keywords: galaxy framework; label-free quantification; proteomics; workflows
Year: 2020 PMID: 32650610 PMCID: PMC7563855 DOI: 10.3390/proteomes8030015
Source DB: PubMed Journal: Proteomes ISSN: 2227-7382
Figure 1. Galaxy interface of moFF and FlashLFQ: (A) Bioconductor package of moFF is wrapped within Galaxy and available via Galaxy toolshed [21] and Galaxy public instances (proteomics.usegalaxy.eu). (B) A docker/singularity container of FlashLFQ is wrapped within Galaxy and available via Galaxy toolshed [22] and Galaxy public instances (proteomics.usegalaxy.eu).
Figure 2. Experimental design of the evaluation study: spectra files are converted to MGF before mass spectra are matched with peptides using the respective search engines. Each of the quantification tools uses RAW files and the tabular peptide identification output as inputs. The figure also shows the features of each tool. The outputs from all of the tools were then compared against each other. The asterisk symbol (*) denotes that the files were run on the same computing device.
Figure 3. (A) Effect of MBR after software version updates: The log10 values of the intensities (blue bars) from each of the four ABRF spiked-in proteins (ABRF-1: beta-galactosidase from E. coli, ABRF-2: lysozyme from Gallus gallus, ABRF-3: amylase from Aspergillus, ABRF-4: protein G from Streptococcus) were plotted. The results from prior versions of moFF (v1.2.1) and FlashLFQ (v0.1.99) (before) show that MBR detects ABRF proteins (shown in red) in the negative control sample with both tools. The results from the current versions of moFF (v2.0.2) and FlashLFQ (v1.0.3.0) implemented in Galaxy (after) show that the MBR feature does not detect ABRF proteins in the negative control. (B) Accuracy of fold-change estimation: to evaluate the accuracy of the quantified results, we estimated the fold change of the spiked-in proteins in the 500 fmol sample compared to the 100 fmol sample. The root mean squared log error (RMSLE) was calculated for the fold-change estimation. For this dataset, moFF with MBR displayed a significantly higher RMSLE value, whereas FlashLFQ’s MBR performed similarly to MaxQuant’s MBR.
Figure 4. (A) Fold-change accuracy (MBR) of all proteins: after normalization, the estimated protein abundance ratios for all the identified UPS proteins were compared to the true abundance ratios using the root mean squared log error (RMSLE). The plot shows the RMSLE values obtained with different normalization methods. *LFQ denotes that the LFQ values are those produced by FlashLFQ’s and MaxQuant’s built-in normalization. The value at the top of each bar denotes the number of proteins that were quantified. (B) Fold-change accuracy (MBR) of proteins with similar estimated ratios: the 48 UPS proteins were classified into groups based on the UPS2/UPS1 ratio estimation; the true ratios run from 10 to 10⁻⁴. The value at the top of each bar denotes the number of proteins that were quantified using each normalization method. The RMSLE of the intensity ratio was used to measure the accuracy of the estimated fold changes.
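The RMSLE used in Figures 3 and 4 to score fold-change accuracy can be illustrated with a minimal sketch. This record does not give the exact formula, so the definition below (root mean square of the difference between log10 estimated and log10 true ratios) is an assumption, as are the function name `rmsle` and the example values; the published analysis may use a different log base or variant.

```python
import math


def rmsle(estimated_ratios, true_ratios):
    """Root mean squared log error between estimated and true fold changes.

    Assumed definition (not confirmed by the source):
        sqrt(mean((log10(estimated) - log10(true)) ** 2))
    All ratios must be strictly positive.
    """
    if len(estimated_ratios) != len(true_ratios):
        raise ValueError("estimated and true ratios must have the same length")
    if not estimated_ratios:
        raise ValueError("at least one ratio is required")

    squared_log_errors = [
        (math.log10(est) - math.log10(true)) ** 2
        for est, true in zip(estimated_ratios, true_ratios)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


if __name__ == "__main__":
    # Hypothetical estimates for four spiked-in proteins; the true
    # 500 fmol / 100 fmol fold change is 5 for each of them.
    illustrative_estimates = [4.2, 5.5, 6.1, 3.9]
    print(rmsle(illustrative_estimates, [5.0] * 4))
```

Working in log space treats a two-fold overestimate and a two-fold underestimate as equally large errors, which suits ratios that span several orders of magnitude, such as the UPS2/UPS1 range in Figure 4.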