Kui Deng1,2, Falin Zhao3, Zhiwei Rong4, Lei Cao4, Liuchao Zhang4, Kang Li5, Yan Hou6, Zheng-Jiang Zhu7. 1. Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China. 2. Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, China. 3. Department of Health Management, School of Medicine, Hangzhou Normal University, Hangzhou, China. 4. Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China. 5. Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China. likang@ems.hrbmu.edu.cn. 6. Department of Biostatistics, School of Public Health, Peking University, Beijing, China. houyan@bjmu.edu.cn. 7. Interdisciplinary Research Center on Biology and Chemistry, and Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, China. jiangzhu@sioc.ac.cn.
Abstract
INTRODUCTION: Untargeted metabolomics based on liquid chromatography-mass spectrometry is inevitably affected by batch effects caused by non-biological systematic bias. Previously, we developed a method called WaveICA to remove batch effects from untargeted metabolomics data. That method relies on a batch label to detect batch effect information, so it cannot be used when there is only one batch of data or when the batch label is unknown. OBJECTIVES: We aim to improve the WaveICA method so that it removes batch effects from untargeted metabolomics data without using batch information. METHODS: We improved the WaveICA method by developing WaveICA 2.0 to remove batch effects from metabolomics data, and provided an R package, WaveICA_2.0, to implement this method. RESULTS: The performance of the WaveICA 2.0 method was evaluated on real metabolomics data. For metabolomics data with three batches, the performance of the WaveICA 2.0 method was similar to that of the WaveICA method in terms of gathering quality control samples (QCSs) and subject samples together in principal component analysis score plots, increasing the similarity of QCSs, increasing the number of differential peaks, and improving classification accuracy. For metabolomics data with only one batch, the WaveICA 2.0 method had a strong ability to remove intensity drift and reveal more biological information, and it outperformed the QC-RLSC and QC-SVRC methods in our study using our metabolomics data. CONCLUSION: Our results demonstrated that the WaveICA 2.0 method can be used in practice to remove batch effects from untargeted metabolomics data without batch information.