MOTIVATION: Systematic bias in mass measurement adversely affects data quality and negates the advantages of high precision instruments. RESULTS: We introduce the mzRefinery tool for calibration of mass spectrometry data files. Using confident peptide spectrum matches, three different calibration methods are explored and the optimal transform function is chosen. After calibration, systematic bias is removed and the mass measurement errors are centered at 0 ppm. Because it is part of the ProteoWizard package, mzRefinery can read and write a wide variety of file formats. AVAILABILITY AND IMPLEMENTATION: The mzRefinery tool is part of msConvert, available with the ProteoWizard open source package at http://proteowizard.sourceforge.net/ CONTACT: samuel.payne@pnnl.gov. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
1 Introduction
For data analysis algorithms to take advantage of the higher accuracy of newer mass spectrometers, it is essential to remove systematic bias in mass measurement. Mass measurement error may originate from a variety of sources, e.g. power supply voltage/temperature drift, space charge effects, temperature/humidity variation in the laboratory and vacuum system stability. Real-time calibration adjusts the mass measurement during data acquisition (Charles, 2003; Olsen ;), typically using a known species as an internal reference. Lock mass methods may also be used to calibrate after the run has completed (Zhang ). A separate approach to calibration uses spectrum identifications to estimate measurement error and guide mass correction (Cox ; Petyuk ).
We present a new calibration tool, mzRefinery, written directly into the ProteoWizard package (Kessner ). Like existing tools, mzRefinery models mass measurement error based on peptide identifications and finds the optimal calibration function. Beyond adjusting the precursor ion m/z, mzRefinery corrects the m/z of every ion in every high-resolution spectrum. With the increasingly common use of high-resolution tandem mass spectra in PRM and DIA experiments, more data are being created with high-resolution fragments. Given the inherent complexity of such multiplexed fragmentation protocols, calibrating the mass accuracy will be a great benefit for these experiments.
2 Implementation
mzRefinery has three different methods for calibration. The goal of each method is to identify the m/z offset that should be applied in creating a calibrated spectrum file. The software architecture is specifically designed to allow for new calibration methods to be written and seamlessly integrated. A detailed description of the mass spectrometry data files, software class architecture and operation are provided in Supplementary Data and Supplementary Figure S1.
2.1 Global shift
The sub-class AdjustSimpleGlobal computes a single global shift. For every confident identification in the mzIdentML file (default q < 0.01), the exact monoisotopic m/z is calculated and compared with the observed m/z (taken from the XML field experimentalMassToCharge). Identifications whose absolute mass error exceeds 0.2 m/z are excluded, to avoid using data where the monoisotopic peak was incorrectly reported in the spectrum file. After converting the error to ppm, the errors are collected into 0.5 ppm bins. After the entire file is processed, the median ppm error is calculated and used as the global shift. In the output mzML file, the SpectrumList_mzRefiner object applies the global ppm correction to every peak in every high-resolution spectrum.
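As an illustration of the global shift, the following minimal Python sketch computes a median ppm error from pairs of observed and theoretical m/z values and removes it from a list of peaks. It is not the ProteoWizard implementation: the function names, the tuple-based input and the division-based correction are assumptions made for this example, and the median is taken directly over the errors rather than over 0.5 ppm bins.

```python
from statistics import median


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def global_shift_ppm(psms, max_abs_error_mz=0.2):
    """Median ppm error over (observed_mz, theoretical_mz) pairs.

    Pairs whose absolute error exceeds max_abs_error_mz are skipped,
    mirroring the 0.2 m/z filter described in the text.
    """
    errors = [
        ppm_error(obs, theo)
        for obs, theo in psms
        if abs(obs - theo) <= max_abs_error_mz
    ]
    if not errors:
        raise ValueError("no identifications passed the filter")
    return median(errors)


def apply_shift(mz_values, shift_ppm):
    """Remove a constant ppm bias from every m/z value (assumed form)."""
    return [mz / (1.0 + shift_ppm * 1e-6) for mz in mz_values]


# Hypothetical identifications: three with a +5 ppm bias, one outlier.
example_psms = [
    (500.0025, 500.0),
    (600.0030, 600.0),
    (700.0035, 700.0),
    (800.5000, 800.0),  # 0.5 m/z off, excluded by the filter
]
example_shift = global_shift_ppm(example_psms)
example_peaks = apply_shift([1000.005], example_shift)
```

In this toy input the outlier is discarded, the estimated shift is close to 5 ppm, and the corrected peak lands close to m/z 1000.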
2.2 LC-dependent shift
The LC-dependent shift is calculated by the sub-class AdjustByScanTime. In general, the process is very similar to the calculation of a global shift. For every confident identification, both the ppm error and LC time are calculated. LC time is derived from the ScanStartTime field in mzIdentML, or from ScanStartTime in mzML. Errors are ordered by time and sorted into bins containing all scans within a 75-s period. The median ppm error of each bin is calculated and smoothed using the median of neighboring bins (Supplementary Fig. S2). Bins in addition to the i + 1 and i − 1 neighbors are included as necessary to achieve a minimum of 100 identifications in the weighted average. When writing out the calibrated mzML file, the applied mass correction is generated through a linear interpolation of the median error values based on the scan time. By binning the data and then performing a linear interpolation, the algorithm approximates a more complex smoothing.
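The binning, smoothing and interpolation steps can be sketched in Python as follows. This is a simplified reading of the description above, not the AdjustByScanTime code: the function names are hypothetical, the smoothing is assumed to be the median of the per-bin medians in a window that widens until it covers at least 100 identifications, and scan times are assumed to be non-negative seconds.

```python
from bisect import bisect_right
from statistics import median


def smoothed_bin_errors(scan_times_s, ppm_errors, bin_width_s=75.0, min_ids=100):
    """Bin ppm errors by scan time; return (bin_centers, smoothed_medians)."""
    bins = {}
    for t, err in zip(scan_times_s, ppm_errors):
        bins.setdefault(int(t // bin_width_s), []).append(err)
    if not bins:
        raise ValueError("no identifications supplied")
    n_bins = max(bins) + 1
    counts = [len(bins.get(b, [])) for b in range(n_bins)]
    medians = [median(bins[b]) if b in bins else None for b in range(n_bins)]

    centers, smoothed = [], []
    for b in range(n_bins):
        radius = 1  # always include the i - 1 and i + 1 neighbors
        while True:
            lo, hi = max(0, b - radius), min(n_bins, b + radius + 1)
            if sum(counts[lo:hi]) >= min_ids or (lo == 0 and hi == n_bins):
                break
            radius += 1  # widen until enough identifications are covered
        window = [m for m in medians[lo:hi] if m is not None]
        if window:
            centers.append((b + 0.5) * bin_width_s)
            smoothed.append(median(window))
    return centers, smoothed


def shift_at(scan_time_s, centers, smoothed):
    """Linearly interpolate the ppm shift; clamp outside the binned range."""
    if scan_time_s <= centers[0]:
        return smoothed[0]
    if scan_time_s >= centers[-1]:
        return smoothed[-1]
    i = bisect_right(centers, scan_time_s)
    t0, t1 = centers[i - 1], centers[i]
    frac = (scan_time_s - t0) / (t1 - t0)
    return smoothed[i - 1] + frac * (smoothed[i] - smoothed[i - 1])


# Hypothetical run: one identification per second, with the bias
# stepping from 2 ppm to 4 ppm halfway through a 300 s gradient.
times = list(range(300))
errors = [2.0 if t < 150 else 4.0 for t in times]
centers, smoothed = smoothed_bin_errors(times, errors)
mid_run_shift = shift_at(150.0, centers, smoothed)
```

Clamping outside the first and last bin centers is a design choice of this sketch; it avoids extrapolating from sparse identifications at the start and end of a run.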
2.3 m/z-dependent shift
The m/z-dependent shift is calculated by the sub-class AdjustByMass. This method works exactly like the LC-dependent shift, except that the measured m/z, rather than the LC time, is the variable on which the shift depends.
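To show how an m/z-dependent correction might be applied to the peaks of one spectrum, the sketch below interpolates a per-peak ppm shift from binned values. The bin centers and shifts are invented for illustration, the function names are hypothetical, and the division-based correction is an assumption; the source does not state the bin width used along the m/z axis.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends of xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])


def calibrate_spectrum(mz_values, bin_centers_mz, bin_shifts_ppm):
    """Correct each peak by the ppm shift interpolated at its own m/z."""
    calibrated = []
    for mz in mz_values:
        shift_ppm = interpolate(mz, bin_centers_mz, bin_shifts_ppm)
        calibrated.append(mz / (1.0 + shift_ppm * 1e-6))
    return calibrated


# Hypothetical smoothed shifts: the bias grows with m/z.
bin_centers_mz = [400.0, 800.0, 1200.0]
bin_shifts_ppm = [1.0, 3.0, 5.0]
corrected = calibrate_spectrum([300.0, 600.0, 1500.0], bin_centers_mz, bin_shifts_ppm)
```

Unlike the global shift, each peak here receives its own correction, so low and high m/z fragments in the same spectrum can be moved by different amounts.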
3 Results and discussion
The mzRefinery program is designed to calibrate any mass spectrometry data file based on a preliminary set of identifications. The algorithm is implemented within the msconvert program, part of the ProteoWizard suite, and therefore natively understands multiple input and output formats (Chambers ). As described in the Supplementary Data, we use mzRefinery to calibrate MS and MS/MS data from Thermo Orbitrap and Bruker QqTOF instruments. All files were searched with the appropriate database and parameters by both MSGF+ and MyriMatch. This preliminary set of PSMs was used as input to the msconvert program. The resulting mzML file has updated (calibrated) m/z values. Figure 1 shows the mass measurement error present in the original mzML files. We note that for the file in Figure 1, the error changed during the LC run, and is effectively eliminated by mzRefinery. The calibrated file shows no such dependency.
Fig. 1.
Calibration. The top two graphs show a histogram of mass error, calculated using PSM identifications for dataset sample3-B_BB4_01_926. This particular file has a bimodal error in the original. After calibration (top right), the error has been removed. The bottom two graphs plot mass measurement error according to scan number. The original data (bottom left) show that the error varies dramatically with time. By using the LC-dependent calibration, the errors are removed (bottom right)
Across multiple files, the performance of the algorithm is remarkably consistent. For the 91 files tested, the original median error of any given file ranged between −2.8 and +8.4 ppm (average 1.4, SD 2.3). After calibration, the median error ranged between −0.59 and +0.28 ppm (average 0.02, SD 0.08), with 70 of the 91 files having a median error within ±0.05 ppm. Thus, the method accurately removes systematic bias in mass measurement.
A primary goal of the project is to make the mzRefinery algorithm broadly accessible. As part of the ProteoWizard suite, it is available as both an executable program and a platform for further development. The software architecture is intentionally written to be extensible, and new calibration methods are automatically considered. Several reasons might prompt the design of a new calibration method. In the current implementation, the shift depends on only one variable (i.e. LC time or m/z). However, a previous study has shown that additional improvement is possible with more complex multivariate dependencies (Petyuk ). A second motivation would be to create a new calibration for a distinct instrument or mass analyzer. Although the current software has been shown to perform well on both Orbitrap and TOF instruments, we acknowledge that keeping up with new instrumentation is an ongoing process.
A suggested workflow for using mzRefinery is to first search each LC-MS/MS dataset for PSMs using fully tryptic search rules, no dynamic modifications and a relatively wide parent ion mass window, e.g. ±50 ppm. These parameters allow the search engine to find confident PSMs quickly, even if the data were acquired when the instrument was not at its optimal calibration. Next, use mzRefinery to recalibrate each dataset using the PSMs identified in this initial search. Then search the data again, this time using the calibrated mzML files, a partially tryptic search, dynamic modifications and a narrower parent ion mass window, e.g. ±10 ppm or even ±ppm. This narrow mass window will result in fewer false positives at a given false discovery rate.
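To make the difference between the two search windows concrete, this short Python sketch converts a ppm tolerance into an absolute m/z interval. The helper name and the example precursor at m/z 1000 are chosen purely for illustration.

```python
def ppm_window(mz, tolerance_ppm):
    """Return the (low, high) m/z bounds of a symmetric ppm tolerance."""
    delta = mz * tolerance_ppm * 1e-6
    return mz - delta, mz + delta


# Hypothetical precursor at m/z 1000 under the two suggested windows.
wide_low, wide_high = ppm_window(1000.0, 50.0)      # initial search
narrow_low, narrow_high = ppm_window(1000.0, 10.0)  # search after calibration
```

At m/z 1000 the ±50 ppm window spans roughly 999.95 to 1000.05, while the ±10 ppm window spans roughly 999.99 to 1000.01, so far fewer candidate peptides fall inside the second interval.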
Authors: Jesper V Olsen; Lyris M F de Godoy; Guoqing Li; Boris Macek; Peter Mortensen; Reinhold Pesch; Alexander Makarov; Oliver Lange; Stevan Horning; Matthias Mann Journal: Mol Cell Proteomics Date: 2005-10-24 Impact factor: 5.911
Authors: Vladislav A Petyuk; Anoop M Mayampurath; Matthew E Monroe; Ashoka D Polpitiya; Samuel O Purvine; Gordon A Anderson; David G Camp; Richard D Smith Journal: Mol Cell Proteomics Date: 2009-12-17 Impact factor: 5.911
Authors: Matthew C Chambers; Brendan Maclean; Robert Burke; Dario Amodei; Daniel L Ruderman; Steffen Neumann; Laurent Gatto; Bernd Fischer; Brian Pratt; Jarrett Egertson; Katherine Hoff; Darren Kessner; Natalie Tasman; Nicholas Shulman; Barbara Frewen; Tahmina A Baker; Mi-Youn Brusniak; Christopher Paulse; David Creasy; Lisa Flashner; Kian Kani; Chris Moulding; Sean L Seymour; Lydia M Nuwaysir; Brent Lefebvre; Frank Kuhlmann; Joe Roark; Paape Rainer; Suckau Detlev; Tina Hemenway; Andreas Huhmer; James Langridge; Brian Connolly; Trey Chadick; Krisztina Holly; Josh Eckels; Eric W Deutsch; Robert L Moritz; Jonathan E Katz; David B Agus; Michael MacCoss; David L Tabb; Parag Mallick Journal: Nat Biotechnol Date: 2012-10 Impact factor: 54.908