Chiung-Ting Wu(1), Yizhi Wang(1), Yinxue Wang(1), Timothy Ebbels(2), Ibrahim Karaman(3,4), Gonçalo Graça(2), Rui Pinto(3,4), David M Herrington(5), Yue Wang(1), Guoqiang Yu(1).
(1) Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
(2) Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, UK.
(3) Department of Epidemiology and Biostatistics, Imperial College London, London W2 1PG, UK.
(4) UK Dementia Research Institute, Imperial College London, London, UK.
(5) Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA.
Abstract
MOTIVATION: Liquid chromatography-mass spectrometry (LC-MS) is a standard method for proteomics and metabolomics analysis of biological samples. However, the retention time (RT) of the same compound varies between samples, and these shifts must be corrected (aligned) during data processing. Classic alignment methods, such as those in the popular XCMS package, often assume a single time-warping function for each sample and therefore neglect the potentially varying RT drift of compounds with different masses within a sample. Moreover, alignment algorithms often do not consider the systematic change in RT drift across run order. As a result, these methods cannot effectively correct all misalignments. In a large-scale experiment involving many samples, misalignment becomes inevitable and concerning.
RESULTS: Here, we describe an integrated reference-free profile alignment method, neighbor-wise compound-specific Graphical Time Warping (ncGTW), that can detect misaligned features and align profiles by leveraging expected RT drift structures and compound-specific warping functions. Specifically, ncGTW uses individualized warping functions for different compounds and assigns constraint edges on the warping functions of neighboring samples. Validated with both realistic synthetic data and internal quality control samples, ncGTW applied to two large-scale metabolomics LC-MS datasets identifies many misaligned features and successfully realigns them. These features would otherwise be discarded or left uncorrected by existing methods. The ncGTW software tool is currently developed as a plug-in that detects and realigns misaligned features present in standard XCMS output.
AVAILABILITY AND IMPLEMENTATION: An R package of ncGTW is freely available at Bioconductor and https://github.com/ChiungTingWu/ncGTW. A detailed user's manual and a vignette are provided within the package.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
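To make the notion of a "warping function" concrete, the following is a minimal sketch of classic pairwise dynamic time warping (DTW) between two intensity profiles. It is not the ncGTW algorithm and does not use the ncGTW or XCMS APIs: ncGTW, as described above, additionally fits one warping function per compound and couples the warping functions of neighboring samples. The function name `dtw_path` and the toy profiles `ref` and `shifted` are hypothetical, chosen only for illustration.

```python
def dtw_path(a, b):
    """Classic DTW between two 1-D intensity profiles (illustrative only).

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    that map scan i of profile a to scan j of profile b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both profiles
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Hypothetical toy data: the same peak, eluting two scans later in `shifted`.
ref = [0, 0, 1, 5, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 5, 1, 0, 0]
total, path = dtw_path(ref, shifted)
print(total)
print(path)
```

The returned path plays the role of the warping function: it pairs the peak apex at scan 3 of `ref` with scan 5 of `shifted`. A single path of this kind per sample corresponds to the classic assumption the abstract criticizes; a compound-specific approach would estimate a separate path for each feature.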