Correcting systematic bias and instrument measurement drift with mzRefinery.

Bryson C Gibbons¹, Matthew C Chambers², Matthew E Monroe¹, David L Tabb², Samuel H Payne¹.

Abstract

MOTIVATION: Systematic bias in mass measurement adversely affects data quality and negates the advantages of high precision instruments.
RESULTS: We introduce the mzRefinery tool for calibration of mass spectrometry data files. Using confident peptide spectrum matches, three different calibration methods are explored and the optimal transform function is chosen. After calibration, systematic bias is removed and the mass measurement errors are centered at 0 ppm. Because it is part of the ProteoWizard package, mzRefinery can read and write a wide variety of file formats.
AVAILABILITY AND IMPLEMENTATION: The mzRefinery tool is part of msConvert, available with the ProteoWizard open source package at http://proteowizard.sourceforge.net/
CONTACT: samuel.payne@pnnl.gov
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2015 PMID: 26243018 PMCID: PMC4653383 DOI: 10.1093/bioinformatics/btv437

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

For data analysis algorithms to take advantage of the higher accuracy of newer mass spectrometers, it is essential to remove systematic bias in mass measurement. Mass measurement error may originate from a variety of sources, e.g. power supply voltage/temperature drift, space charge effects, temperature/humidity variation in the laboratory and vacuum system stability. Real-time calibration adjusts the mass measurement during data acquisition (Charles, 2003; Olsen et al.), typically using a known species as an internal reference. Lock mass methods may also be used to calibrate after the run has completed (Zhang et al.). A separate method for calibration utilizes spectrum identifications to estimate measurement error and guide mass correction (Cox et al.; Petyuk et al.). We present a new calibration tool, mzRefinery, written directly into the ProteoWizard package (Kessner et al.). Like existing tools, mzRefinery models mass measurement error based on peptide identifications and finds the optimal calibration function. In addition to adjusting the precursor ion m/z, mzRefinery corrects the m/z of every ion in any high-resolution spectrum. With the increasingly common use of high-resolution tandem mass spectra in PRM and DIA experiments, more data are being created with high-resolution fragments. Given the inherent complexity of such multiplexed fragmentation protocols, calibrating the mass accuracy will be a great benefit for these experiments.

2 Implementation

mzRefinery has three different methods for calibration. The goal of each method is to identify the m/z offset that should be applied in creating a calibrated spectrum file. The software architecture is specifically designed to allow new calibration methods to be written and seamlessly integrated. A detailed description of the mass spectrometry data files, software class architecture and operation is provided in Supplementary Data and Supplementary Figure S1.

2.1 Global shift

The sub-class AdjustSimpleGlobal computes a single global shift. For every confident identification in the mzIdentML file (default q < 0.01), the exact monoisotopic m/z is calculated and compared with the observed m/z (using xml field experimentalMassToCharge). Identifications with mass errors larger than ±0.2 m/z are discarded to avoid using data where the monoisotope was incorrectly reported by the spectrum file. After converting the error to ppm, the errors are collected into 0.5 ppm bins. After the entire file is processed, the median ppm error is calculated and used as the global shift. In the output mzML file, the SpectrumList_mzRefiner object applies the global ppm error to every peak in every high-resolution spectrum.
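The global shift described above can be illustrated with a minimal Python sketch. This is not the mzRefinery source code: the function names (ppm_error, global_shift_ppm, apply_shift) are hypothetical, the 0.5 ppm binning step is skipped in favor of a direct median, and the way the shift is removed from each peak is an assumption about how a ppm correction could be applied.

```python
from statistics import median

# Filter from the text: errors beyond 0.2 m/z suggest a wrong monoisotopic peak.
MAX_ABS_MZ_ERROR = 0.2


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def global_shift_ppm(psms):
    """Median ppm error over (observed_mz, theoretical_mz) pairs.

    Assumes the pairs already passed the confidence threshold (e.g. q < 0.01).
    """
    errors = [
        ppm_error(obs, theo)
        for obs, theo in psms
        if abs(obs - theo) <= MAX_ABS_MZ_ERROR
    ]
    if not errors:
        raise ValueError("no identifications passed the m/z error filter")
    return median(errors)


def apply_shift(mz_values, shift_ppm):
    """Remove a ppm shift from every peak m/z (assumed correction form)."""
    return [mz / (1.0 + shift_ppm * 1e-6) for mz in mz_values]
```

For example, three identifications that each read 5 ppm high give a shift of about 5 ppm, and an identification that is off by 0.5 m/z is ignored by the filter.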

2.2 LC-dependent shift

Calculating the LC-dependent shift uses sub-class AdjustByScanTime. In general, the process is very similar to the calculation of a global shift. For every confident identification, both the ppm error and LC time are calculated. LC time is derived from the ScanStartTime field in mzIdentML, or from ScanStartTime in mzML. Errors are ordered by time and sorted into bins containing all scans within a 75-s period. The median ppm error of the bin is calculated, and smoothed using the median of neighboring bins (Supplementary Fig. S2). Bins in addition to the i + 1 and i − 1 neighbors are included as necessary to achieve a minimum of 100 identifications in the weighted average. When writing out the calibrated mzML file, the applied mass correction is generated through a linear interpolation of the median error values based on the scan time. By binning the data and then performing a linear fit, the algorithm approximates a more complex smoothing.
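The binning and interpolation steps can be sketched as follows. This is a simplified illustration rather than the AdjustByScanTime implementation: the neighbor smoothing and the 100-identification minimum are omitted, anchoring each median at the bin center is an assumption, and the function names are hypothetical. The input is assumed to be non-empty.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median

BIN_SECONDS = 75.0  # bin width stated in the text


def binned_medians(times, ppm_errors, bin_width=BIN_SECONDS):
    """Return sorted (bin_center_time, median_ppm_error) pairs."""
    bins = defaultdict(list)
    for t, err in zip(times, ppm_errors):
        bins[int(t // bin_width)].append(err)
    return [
        ((index + 0.5) * bin_width, median(errs))
        for index, errs in sorted(bins.items())
    ]


def interpolate_shift(points, t):
    """Linearly interpolate the ppm shift at scan time t.

    Times outside the covered range are clamped to the nearest bin value.
    """
    xs = [x for x, _ in points]
    if t <= xs[0]:
        return points[0][1]
    if t >= xs[-1]:
        return points[-1][1]
    hi = bisect_right(xs, t)
    (x0, y0), (x1, y1) = points[hi - 1], points[hi]
    return y0 + (y1 - y0) * (t - x0) / (x1 - x0)
```

The shift returned for a scan would then be removed from every peak in that scan, in the same way as the global shift. Using m/z instead of scan time as the binning variable gives the general shape of the m/z-dependent method, although the bin width used for m/z is not stated here.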

2.3 m/z-dependent shift

Calculating the m/z-dependent shift uses sub-class AdjustByMass. This function is exactly like the LC-dependent shift except that measured m/z is tracked as the dependent variable.

3 Results and discussion

The mzRefinery program is designed to calibrate any mass spectrometry data file based on a preliminary set of identifications. The algorithm is implemented within the msconvert program, part of the ProteoWizard suite, and therefore natively understands multiple input and output formats (Chambers ). As described in the Supplementary Data, we use mzRefinery to calibrate MS and MS/MS data from Thermo Orbitrap and Bruker QqTOF instruments. All files were searched with the appropriate database and parameters by both MSGF+ and MyriMatch. This preliminary set of PSMs was used as input to the msconvert program. The resulting mzML file has updated (calibrated) m/z values. Figure 1 shows the mass measurement error present in the original mzML files. We note that for the file in Figure 1, the error changed during the LC run, and is effectively eliminated by mzRefinery. The calibrated file shows no such dependency.

Fig. 1.

Calibration. The top two graphs show a histogram of mass error, calculated using PSM identifications for dataset sample3-B_BB4_01_926. This particular file has a bimodal error in the original. After calibration (top right), the error has been removed. The bottom two graphs plot mass measurement error according to scan number. The original data (bottom left) show that the error varies dramatically with time. By using the LC-dependent calibration, the errors are removed (bottom right).

Across multiple files, the performance of the algorithm is remarkably consistent. For the 91 files tested, the original median error of any given file ranged between −2.8 and +8.4 ppm (average 1.4, SD 2.3). After calibration the median error ranged between −0.59 and +0.28 ppm (average 0.02, SD 0.08), with 70 of the 91 files having a median error <±0.05 ppm. Thus, the method accurately removes any systematic bias in mass measurement.

A primary goal of the project is to make the mzRefinery algorithm broadly accessible. As part of the ProteoWizard suite, it is available as both an executable program and a platform for further development. The software architecture is intentionally written to be extensible and new calibration methods are automatically considered. Several reasons might prompt design of a new calibration method. In the current implementation, only one dependent variable is considered (i.e. LC time or m/z). However, a previous study has shown that additional improvement is possible with more complex multivariate dependencies (Petyuk et al.). A second motivation would be to create a new calibration for a distinct instrument or mass analyzer. Although the current software has been shown to perform well on both Orbitrap and TOF instruments, we acknowledge that keeping up with new instrumentation is an ongoing process.

A suggested workflow for using mzRefinery is to first search each LC-MS/MS dataset for PSMs using fully tryptic search rules, no dynamic modifications and a relatively wide parent ion mass window, e.g. ±50 ppm. These parameters allow the search engine to quickly search for confident PSMs, yet allow for identifying PSMs even if the data were acquired when the instrument was not at its optimal calibration. Next, use mzRefinery to recalibrate each dataset using the identified PSMs from this initial search. Now re-search for PSMs in the data, but this time using the calibrated mzML files, a partially tryptic search, dynamic modifications and a narrower parent ion mass window, e.g. ±10 ppm or even ±ppm. Use of this narrow mass window will result in fewer false positives at a given false discovery rate.
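
To make the effect of the search window concrete, the short sketch below converts a ppm tolerance into an absolute m/z range. The function name ppm_window is hypothetical and the precursor m/z of 1000 is only an illustrative value.

```python
def ppm_window(mz, tolerance_ppm):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    half_width = mz * tolerance_ppm * 1e-6
    return mz - half_width, mz + half_width


# Wide window for the first-pass search, narrow window after calibration.
wide_low, wide_high = ppm_window(1000.0, 50.0)
narrow_low, narrow_high = ppm_window(1000.0, 10.0)
```

At m/z 1000, a ±50 ppm window spans about ±0.05 m/z, while a ±10 ppm window spans about ±0.01 m/z, so the second search considers a candidate range roughly five times narrower.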

7 in total

1. Flow injection of the lock mass standard for accurate mass measurement in electrospray ionization time-of-flight mass spectrometry coupled with liquid chromatography.

Authors: Laurence Charles
Journal: Rapid Commun Mass Spectrom Date: 2003 Impact factor: 2.419

2. Software lock mass by two-dimensional minimization of peptide mass errors.

Authors: Jürgen Cox; Annette Michalski; Matthias Mann
Journal: J Am Soc Mass Spectrom Date: 2011-04-22 Impact factor: 3.109

3. Parts per million mass accuracy on an Orbitrap mass spectrometer via lock mass injection into a C-trap.

Authors: Jesper V Olsen; Lyris M F de Godoy; Guoqing Li; Boris Macek; Peter Mortensen; Reinhold Pesch; Alexander Makarov; Oliver Lange; Stevan Horning; Matthias Mann
Journal: Mol Cell Proteomics Date: 2005-10-24 Impact factor: 5.911

4. DtaRefinery, a software tool for elimination of systematic errors from parent ion mass measurements in tandem mass spectra data sets.

Authors: Vladislav A Petyuk; Anoop M Mayampurath; Matthew E Monroe; Ashoka D Polpitiya; Samuel O Purvine; Gordon A Anderson; David G Camp; Richard D Smith
Journal: Mol Cell Proteomics Date: 2009-12-17 Impact factor: 5.911

5. Improving proteomics mass accuracy by dynamic offline lock mass.

Authors: Ying Zhang; Zhihui Wen; Michael P Washburn; Laurence Florens
Journal: Anal Chem Date: 2011-11-16 Impact factor: 6.986

6. ProteoWizard: open source software for rapid proteomics tools development.

Authors: Darren Kessner; Matt Chambers; Robert Burke; David Agus; Parag Mallick
Journal: Bioinformatics Date: 2008-07-07 Impact factor: 6.937

7. A cross-platform toolkit for mass spectrometry and proteomics.

Authors: Matthew C Chambers; Brendan Maclean; Robert Burke; Dario Amodei; Daniel L Ruderman; Steffen Neumann; Laurent Gatto; Bernd Fischer; Brian Pratt; Jarrett Egertson; Katherine Hoff; Darren Kessner; Natalie Tasman; Nicholas Shulman; Barbara Frewen; Tahmina A Baker; Mi-Youn Brusniak; Christopher Paulse; David Creasy; Lisa Flashner; Kian Kani; Chris Moulding; Sean L Seymour; Lydia M Nuwaysir; Brent Lefebvre; Frank Kuhlmann; Joe Roark; Paape Rainer; Suckau Detlev; Tina Hemenway; Andreas Huhmer; James Langridge; Brian Connolly; Trey Chadick; Krisztina Holly; Josh Eckels; Eric W Deutsch; Robert L Moritz; Jonathan E Katz; David B Agus; Michael MacCoss; David L Tabb; Parag Mallick
Journal: Nat Biotechnol Date: 2012-10 Impact factor: 54.908
