Hunter A Miller (1), Ramy Emam (2), Chip M Lynch (3), Samuel Bockhorst (4), Hermann B Frieboes (5, 6, 7, 8)

1. Department of Pharmacology and Toxicology, University of Louisville, Louisville, KY, USA.
2. Department of Bioengineering, University of Louisville, Louisville, KY, USA.
3. Department of Computer Engineering and Computer Science, University of Louisville, Louisville, USA.
4. Department of Medicine, University of Louisville, Louisville, USA.
5. Department of Pharmacology and Toxicology, University of Louisville, Louisville, KY, USA. hbfrie01@louisville.edu.
6. James Graham Brown Cancer Center, University of Louisville, Louisville, KY, USA. hbfrie01@louisville.edu.
7. Center for Predictive Medicine, University of Louisville, Louisville, KY, USA. hbfrie01@louisville.edu.
8. Department of Bioengineering, University of Louisville, Lutz Hall 419, Louisville, KY, 40292, USA. hbfrie01@louisville.edu.
Abstract
INTRODUCTION: The identification of metabolomic biomarkers predictive of cancer patient response to therapy and of disease stage has been pursued as a "holy grail" of modern oncology, relying on the metabolic dysfunction that characterizes cancer progression. Although many candidate biomarkers have been evaluated, a consistent set with practical clinical utility has proven elusive.

OBJECTIVE: In this study, we systematically examine the combined effect of data pre-treatment and imputation methods on the performance of multivariate data analysis methods and on their identification of potential biomarkers.

METHODS: Uniquely, we are able to systematically evaluate both unsupervised and supervised methods with a metabolomic data set obtained from patient-derived lung cancer core biopsies with true missing values. Eight pre-treatment methods, ten imputation methods, and two data analysis methods were applied in combination.

RESULTS: The combined choice of pre-treatment and imputation methods is critical in the definition of candidate biomarkers: deficient or inappropriate selection of these methods leads to inconsistent results, with important biomarkers either being overlooked or reported as false positives. The log transformation appeared to normalize the original tumor data most effectively, but the performance of the imputation applied after the transformation was highly dependent on the characteristics of the data set.

CONCLUSION: The combined choice of pre-treatment and imputation methods may need careful evaluation prior to metabolomic data analysis of human tumors, in order to enable consistent identification of potential biomarkers predictive of response to therapy and of disease stage.
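As a rough illustration of what "applied in combination" means in practice, the following minimal sketch runs one pre-treatment (the log transformation named above) with two interchangeable imputation steps, then an unsupervised projection. The abstract does not list the specific imputation or analysis methods, so minimum-value imputation, mean imputation, and PCA are assumptions chosen only for illustration, as are all function names and the toy intensity matrix; this is not the authors' pipeline.

```python
import numpy as np

def log_transform(X):
    """Natural-log pre-treatment; missing entries (NaN) stay NaN.
    Assumes all observed intensities are strictly positive."""
    return np.log(X)

def impute_min(X):
    """Hypothetical imputer: fill each gap with that feature's observed minimum."""
    X = X.copy()
    col_min = np.nanmin(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_min[cols]
    return X

def impute_mean(X):
    """Hypothetical imputer: fill each gap with that feature's observed mean."""
    X = X.copy()
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    return X

def pca_scores(X, n_components=2):
    """PCA scores via SVD of the column-centered matrix (assumed analysis step)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Toy data (made up): 4 samples x 3 metabolites, NaN marks a missing value.
X = np.array([
    [120.0, 15.0, np.nan],
    [95.0, np.nan, 3.2],
    [210.0, 22.0, 4.1],
    [np.nan, 18.0, 2.7],
])

# Same pre-treatment, different imputers: compare the resulting scores.
for name, imputer in [("min", impute_min), ("mean", impute_mean)]:
    filled = imputer(log_transform(X))
    print(name, np.round(pca_scores(filled)[:, 0], 3))
```

Swapping the imputer while holding the pre-treatment fixed (or the reverse) is the kind of grid the study describes; comparing the scores or feature rankings across the grid shows how sensitive downstream results are to these choices.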