
ChromA: signal-based retention time alignment for chromatography-mass spectrometry data.

Abstract

SUMMARY: We describe ChromA, a web-based alignment tool for chromatography-mass spectrometry data from the metabolomics and proteomics domains. Users can supply their data in open and standardized file formats for retention time alignment using dynamic time warping with different configurable local distance and similarity functions. Additionally, user-defined anchors can be used to constrain and speed up the alignment. A neighborhood around each anchor can be added to increase the flexibility of the constrained alignment. ChromA offers different visualizations of the alignment for easier qualitative interpretation and comparison of the data. For the multiple alignment of more than two data files, the center-star approximation is applied to select, among the input files, a reference to which the others are aligned. AVAILABILITY: ChromA is available at http://bibiserv.techfak.uni-bielefeld.de/chroma. Executables and source code under the L-GPL v3 license are provided for download at the same location.

Year: 2009 PMID: 19505941 PMCID: PMC2722998 DOI: 10.1093/bioinformatics/btp343

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Modern analytical methods in biology and chemistry use separation techniques coupled to sensitive detectors, such as gas chromatography–mass spectrometry (GC-MS) and liquid chromatography–mass spectrometry (LC-MS). These hyphenated methods provide high-dimensional data. Comparing such data manually to find corresponding signals is a tedious task, as each experiment usually consists of thousands of individual scans, each containing hundreds or even thousands of distinct signals. In order to allow successful identification of metabolites or proteins within such data, especially in the context of metabolomics and proteomics, an accurate alignment and matching of corresponding features between two or more experiments is required. Such a matching algorithm should capture fluctuations in the chromatographic system which lead to non-linear distortions on the time axis (Strehmel et al., 2008). Many different algorithms for the alignment of GC-MS/LC-MS data have been proposed and published, but only some of them are easily accessible or contained in publicly available toolkits (De Vos et al., 2007; Jonsson et al., 2005, 2006; Kohlbacher et al., 2007; Smith et al., 2006; Sturm et al., 2008). The tool presented here, ChromA, is immediately accessible for pairwise alignment and easy to use via the web frontend (see Supplementary Fig. 1) and as a web service. It provides different visual representations of the alignment, focusing on differences and similarities between the chromatograms. We additionally offer ChromA as an immediately deployable Java™ Web Start application and for download as a platform-independent command-line tool. These allow alignment of more than two chromatograms, using the center-star approximation to select a reference chromatogram among all input files to align to.
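The reference selection by center-star approximation can be illustrated with a minimal Python sketch. It assumes the usual center-star criterion, namely that the reference is the input with the smallest summed pairwise alignment cost to all other inputs; the text does not spell out the exact score ChromA uses. The names `select_center` and `pairwise_cost` and the toy data are hypothetical.

```python
from itertools import combinations


def select_center(names, pairwise_cost):
    """Pick the input with the smallest summed cost to all other inputs.

    pairwise_cost(a, b) is assumed to be symmetric, so each unordered
    pair is evaluated only once.
    """
    totals = {name: 0.0 for name in names}
    n_pairs = 0
    for a, b in combinations(names, 2):
        cost = pairwise_cost(a, b)  # stands in for one pairwise alignment
        totals[a] += cost
        totals[b] += cost
        n_pairs += 1
    center = min(names, key=lambda name: totals[name])
    return center, n_pairs


# Toy data (hypothetical): each "chromatogram" is reduced to one number
# and the pairwise cost is the absolute difference.
positions = {"runA": 0.0, "runB": 1.0, "runC": 5.0}
center, n_pairs = select_center(
    list(positions), lambda a, b: abs(positions[a] - positions[b])
)
print(center, n_pairs)
```

For k inputs, this evaluates k(k-1)/2 pairwise alignments; all remaining inputs are then aligned to the selected reference.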
To compute the pairwise alignment, we use dynamic time warping (DTW) due to its applicability to data with non-linear time scale distortions (Itakura, 1975; Kruskal and Liberman, 1999; Sakoe and Chiba, 1978). It is well suited to the global alignment of chromatograms, which are sequences of mass spectra. Every mass spectrum is preprocessed to nominal mass bin accuracy. In contrast to other methods (Robinson et al., 2007), there is no need for a priori selection of peaks for alignment, but a priori knowledge can be used to improve and speed up the alignment.
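As an illustration of the principle, the following Python sketch implements textbook DTW on two sequences of binned spectra. It is not ChromA's implementation: it assumes a local cost that is minimized, unit weights for all three step directions, and plain tuples of intensities as spectra. The names `dtw` and `squared_euclidean` are hypothetical.

```python
import math


def dtw(a, b, local_cost):
    """Globally align sequences a and b; return (total_cost, path).

    path lists (i, j) index pairs from (0, 0) to (len(a) - 1, len(b) - 1).
    """
    m, n = len(a), len(b)
    acc = [[math.inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = local_cost(a[i - 1], b[j - 1]) + best_prev
    # Trace back from the last cell to recover the warping path.
    path = []
    i, j = m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # match
            (acc[i - 1][j], i - 1, j),          # a advances, b repeats
            (acc[i][j - 1], i, j - 1),          # b advances, a repeats
        ]
        _, i, j = min(steps)
    path.reverse()
    return acc[m][n], path


def squared_euclidean(x, y):
    return sum((p - q) ** 2 for p, q in zip(x, y))


# Toy chromatograms with two mass bins; run_b is delayed by one scan.
run_a = [(0, 0), (1, 0), (5, 1), (1, 0)]
run_b = [(0, 0), (0, 0), (1, 0), (5, 1), (1, 0)]
total, path = dtw(run_a, run_b, squared_euclidean)
print(total, path)
```

The warping path maps every scan of one chromatogram to at least one scan of the other, so no peaks have to be selected beforehand.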

2 DATA MANAGEMENT AND METHODS

Currently, netcdf files (Rew and Davis, 1990) following the ASTM/AIA/ANDI-MS standard (Matthews and Miller, 2000) and xml files following the mzXML format (Pedrioli et al., 2004) can be read. Aligned chromatograms are stored in netcdf files, whereas general processing results, statistics and status information are saved in tab-separated value text format for easier access. All files generated during a run of ChromA, their creator and their designation (preprocessing, alignment, visualization, etc.), are stored in an xml file to allow an easy integration with data curation and analysis platforms for metabolomic experiments, for example, MeltDB (Neuweger et al., 2008). In our software, we included different local distance and similarity functions between mass spectral intensity vectors, like the Euclidean distance, cosine similarity and linear correlation (Prince and Marcotte, 2006), to calculate a retention time alignment of chromatograms with DTW. Additionally, we included the Hamming distance on binarized vectors and a very fast function based on squared difference of total ion current (TIC) (Reiner et al., 1979), which is available for quick evaluation. Depending on the local function used, we apply different weights to provide a smooth warping. ChromA allows the user to define a number of optional configuration choices. As a preprocessing step, intensities contained in user-defined mass bins may be removed from consideration by the alignment. Additionally, manually or automatically matched peaks (Robinson et al., 2007; Styczynski et al., 2007) may be included as anchors to constrain the alignment to pass through their positions (see Supplementary Fig. 2). Even though the worst-case complexity of DTW is still of order 𝒪(m²) in space and time, where m is the number of scans in a chromatogram, we can achieve large speedups in practice.
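The local functions named above can be written down compactly. The Python sketch below gives one plausible reading of each, operating on two intensity vectors over the same nominal mass bins. The function names, the binarization threshold and the handling of all-zero vectors are assumptions made for illustration and are not taken from the ChromA source code.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0  # assumption: empty scans score 0


def linear_correlation(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y) if dev_x > 0 and dev_y > 0 else 0.0


def hamming_binarized(x, y, threshold=0.0):
    # A bin counts as "present" if its intensity exceeds the threshold
    # (the threshold value is an assumption).
    return sum((a > threshold) != (b > threshold) for a, b in zip(x, y))


def tic_squared_difference(x, y):
    # The TIC of a scan is the sum of its intensities over all mass bins.
    return (sum(x) - sum(y)) ** 2


scan_1 = [0.0, 3.0, 4.0]
scan_2 = [0.0, 6.0, 8.0]  # same pattern at twice the intensity
print(euclidean(scan_1, scan_2), cosine_similarity(scan_1, scan_2))
print(hamming_binarized(scan_1, scan_2), tic_squared_difference(scan_1, scan_2))
```

Cosine similarity and linear correlation ignore the overall intensity scale of a scan, whereas the Euclidean distance and the TIC-based function do not. The TIC-based function needs only one number per scan, which is consistent with its role as a fast option for quick evaluation.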
An alignment of two chromatograms with about 5400 scans each, 500 nominal mass bins, 38 defined anchors and a maximum scan deviation of 10% (about 540 scans to the left and right of the diagonal) using the cosine score as local similarity was calculated in 12 s on a MacBook with 2.4 GHz Core2 Duo processor, using around 500 MB of memory. Without any constraints, the same alignment was calculated in 7 min. The multiple alignment of 20 chromatograms using the center-star approximation required computation of 190 pairwise alignments. Using the aforementioned constraints, it was calculated within 40 min, without constraints in <24 h. With the introduction of anchors to DTW, we address one major issue of peak-alignment algorithms, namely the problem of prior peak detection, by allowing strong peak candidates, such as reference compounds with unique mass traces (LC-MS) or characteristic fragmentation patterns (GC-MS), to be included, but at the same time allowing an alignment of weaker peaks. To allow the alignment additional flexibility, a neighborhood of radius n can be defined for all anchors.
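How such constraints shrink the search space can be sketched in Python as follows. The sketch assumes that the anchors are given as scan index pairs increasing in both chromatograms, that the warping path must stay inside the rectangles spanned by consecutive anchors (each widened by the neighborhood radius), and that the maximum scan deviation defines a band around the diagonal. This is one plausible reading of the description, not ChromA's actual code; `make_constraint` and its parameters are hypothetical names.

```python
def make_constraint(m, n, anchors=(), radius=0, max_deviation=1.0):
    """Return a predicate allowed(i, j) for the cells of an m x n DTW matrix."""
    # The start and end cells act as implicit anchors.
    corners = [(0, 0)] + sorted(anchors) + [(m - 1, n - 1)]
    boxes = [
        (p[0] - radius, q[0] + radius, p[1] - radius, q[1] + radius)
        for p, q in zip(corners, corners[1:])
    ]
    half_width = max_deviation * max(m, n)
    slope = (n - 1) / (m - 1) if m > 1 else 0.0

    def allowed(i, j):
        in_band = abs(i * slope - j) <= half_width
        in_box = any(
            lo_i <= i <= hi_i and lo_j <= j <= hi_j
            for lo_i, hi_i, lo_j, hi_j in boxes
        )
        return in_band and in_box

    return allowed


# Toy setting: 100 x 100 scans, two anchors, radius 2, 10% scan deviation.
m = n = 100
allowed = make_constraint(
    m, n, anchors=[(25, 30), (60, 58)], radius=2, max_deviation=0.1
)
n_cells = sum(allowed(i, j) for i in range(m) for j in range(n))
print(n_cells, "of", m * n, "cells need to be evaluated")
```

A DTW implementation would skip every cell for which the predicate is false, leaving it at infinite cost, so that both running time and memory scale with the number of allowed cells rather than with the full matrix.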

3 VISUALIZATIONS

ChromA provides a number of visualizations for alignments, variable data and chromatograms, which are generated using the open source library JFreeChart (Gilbert and Morgner, 2009). In order to visualize alignments, we implemented different chart types. Figure 1 shows a plot of the TIC of the second chromatogram below the first chromatogram's TIC after alignment. Corresponding peaks are easily spotted with this visualization, as well as peaks that are only present in one of the chromatograms. We additionally provide visualizations of a multiple alignment of TICs before and after the alignment using the Web Start version of ChromA (Supplementary Figs 3 and 4), as well as an exemplary mass sensitive visualization of nominal mass 73 (silylation agent) before and after the alignment (Supplementary Figs 5 and 6).

Fig. 1.

Visualization of TICs after DTW alignment with ChromA. The TIC of file glucoseA is displayed above the TIC of file mannitolA. Files were obtained from experiments with Xanthomonas campestris pv. campestris B100 raised on different carbon sources (Neuweger et al., 2008). Chromatograms were aligned based on cosine similarity between nominal mass-spectral intensity vectors.

4 CONCLUSION

ChromA is an easily accessible tool for retention time alignment of GC-MS and LC-MS chromatograms. Integration of the positions of matched peaks or of already identified compounds as anchors speeds up alignment calculation, yet still provides enough flexibility for it. The visualizations provided allow easy qualitative comparison of both unaligned and aligned replicate and non-replicate chromatograms. The framework used to develop ChromA, Maltcms (modular application toolkit for chromatography–mass spectrometry), available at http://maltcms.sourceforge.net, published under the GNU L-GPL v3 license, will be extended in the future, so we would like to encourage other researchers to join the project and contribute to it.

REFERENCES (13 in total)

1. High-throughput data analysis for detecting and identifying differences between samples in GC/MS-based metabolomic analyses.

Authors: Pär Jonsson; Annika I Johansson; Jonas Gullberg; Johan Trygg; Jiye A; Bjørn Grung; Stefan Marklund; Michael Sjöström; Henrik Antti; Thomas Moritz
Journal: Anal Chem Date: 2005-09-01 Impact factor: 6.986

2. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification.

Authors: Colin A Smith; Elizabeth J Want; Grace O'Maille; Ruben Abagyan; Gary Siuzdak
Journal: Anal Chem Date: 2006-02-01 Impact factor: 6.986

3. Predictive metabolite profiling applying hierarchical multivariate curve resolution to GC-MS data--a potential tool for multi-parametric diagnosis.

Authors: Pär Jonsson; Elin Sjövik Johansson; Anna Wuolikainen; Johan Lindberg; Ina Schuppe-Koistinen; Miyako Kusano; Michael Sjöström; Johan Trygg; Thomas Moritz; Henrik Antti
Journal: J Proteome Res Date: 2006-06 Impact factor: 4.466

4. Systematic identification of conserved metabolites in GC/MS data for metabolomics and biomarker discovery.

Authors: Mark P Styczynski; Joel F Moxley; Lily V Tong; Jason L Walther; Kyle L Jensen; Gregory N Stephanopoulos
Journal: Anal Chem Date: 2007-02-01 Impact factor: 6.986

5. Chromatographic alignment of ESI-LC-MS proteomics data sets by ordered bijective interpolated warping.

Authors: John T Prince; Edward M Marcotte
Journal: Anal Chem Date: 2006-09-01 Impact factor: 6.986

6. Untargeted large-scale plant metabolomics using liquid chromatography coupled to mass spectrometry.

Authors: Ric C H De Vos; Sofia Moco; Arjen Lommen; Joost J B Keurentjes; Raoul J Bino; Robert D Hall
Journal: Nat Protoc Date: 2007 Impact factor: 13.491

7. MeltDB: a software platform for the analysis and integration of metabolomics experiment data.

Authors: Heiko Neuweger; Stefan P Albaum; Michael Dondrup; Marcus Persicke; Tony Watt; Karsten Niehaus; Jens Stoye; Alexander Goesmann
Journal: Bioinformatics Date: 2008-09-02 Impact factor: 6.937

8. Characterization of normal human cells by pyrolysis gas chromatography mass spectrometry.

Authors: E Reiner; L E Abbey; T F Moran; P Papamichalis; R W Schafer
Journal: Biomed Mass Spectrom Date: 1979-11

9. A common open representation of mass spectrometry data and its application to proteomics research.

Authors: Patrick G A Pedrioli; Jimmy K Eng; Robert Hubley; Mathijs Vogelzang; Eric W Deutsch; Brian Raught; Brian Pratt; Erik Nilsson; Ruth H Angeletti; Rolf Apweiler; Kei Cheung; Catherine E Costello; Henning Hermjakob; Sequin Huang; Randall K Julian; Eugene Kapp; Mark E McComb; Stephen G Oliver; Gilbert Omenn; Norman W Paton; Richard Simpson; Richard Smith; Chris F Taylor; Weimin Zhu; Ruedi Aebersold
Journal: Nat Biotechnol Date: 2004-11 Impact factor: 54.908

10. A dynamic programming approach for the alignment of signal peaks in multiple gas chromatography-mass spectrometry experiments.

Authors: Mark D Robinson; David P De Souza; Woon Wai Keen; Eleanor C Saunders; Malcolm J McConville; Terence P Speed; Vladimir A Likić
Journal: BMC Bioinformatics Date: 2007-10-29 Impact factor: 3.169
