Perrine Soret (1,2,3), Marta Avalos (4,5), Linda Wittkop (1,2,6), Daniel Commenges (1,2), Rodolphe Thiébaut (1,2,3,6).
1. Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France.
2. Inria SISTM Team, Talence, F-33405, France.
3. Vaccine Research Institute (VRI), Créteil, F-94000, France.
4. Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, UMR 1219, Bordeaux, F-33000, France. marta.avalos-fernandez@u-bordeaux.fr.
5. Inria SISTM Team, Talence, F-33405, France. marta.avalos-fernandez@u-bordeaux.fr.
6. CHU Bordeaux, Department of Public Health, Bordeaux, F-33000, France.
Abstract
BACKGROUND: Biological assays for the quantification of markers may lack sensitivity and therefore have an analytical detection limit. This is the case for human immunodeficiency virus (HIV) viral load. Below this threshold the exact value is unknown, so the values are left-censored. Statistical methods have been proposed to deal with left-censoring, but few are suited to high-dimensional data.
METHODS: We propose reversing the Buckley-James least squares algorithm to handle left-censored data and enhancing it with Lasso regularization to accommodate high-dimensional predictors. We present a Lasso-regularized Buckley-James least squares method with both non-parametric imputation using Kaplan-Meier and parametric imputation based on the Gaussian distribution, which is typically assumed for HIV viral load data after logarithmic transformation. Cross-validation for parameter tuning is based on an appropriate loss function that takes into account the different contributions of censored and uncensored observations. We specify how these techniques can be easily implemented using available R packages. The Lasso-regularized Buckley-James least squares method was compared with simple imputation strategies for predicting the response to antiretroviral therapy, measured by HIV viral load, from HIV genotypic mutations. We used a dataset composed of several clinical trials and cohorts from the Forum for Collaborative HIV Research (HIV Med. 2008;7:27-40). The proposed methods were also assessed on simulated data mimicking the observed data.
RESULTS: Approaches accounting for left-censoring outperformed simple imputation methods in a high-dimensional setting. The Gaussian Buckley-James method with cross-validation based on the appropriate loss function showed the lowest prediction error on simulated data and, on real data, gave the results most consistent with the current literature on HIV mutations.
CONCLUSIONS: The proposed approach handles high-dimensional predictors and left-censored outcomes and proved useful for predicting HIV viral load from HIV mutations.
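To make the parametric imputation step concrete, the following is a minimal sketch of how a left-censored value might be replaced by its conditional expectation under a Gaussian model, E[Y | Y < d] = mu - sigma * phi(z) / Phi(z) with z = (d - mu) / sigma. It is an illustration only, not the authors' implementation (the abstract refers to R packages): the function names, the detection limit of log10(50) copies/mL, and the example values are assumptions, and the Lasso fit that would supply mu and sigma at each Buckley-James iteration is not shown.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function, via erfc for tail accuracy."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def censored_mean(mu, sigma, limit):
    """E[Y | Y < limit] for Y ~ N(mu, sigma^2).

    Assumes the limit is not so far below mu that Phi(z) underflows to 0.
    """
    z = (limit - mu) / sigma
    return mu - sigma * normal_pdf(z) / normal_cdf(z)

def impute_left_censored(y, censored, mu, sigma):
    """Replace censored responses by their conditional Gaussian mean.

    y        : observed values; for censored entries, the detection limit
    censored : booleans, True where the value is below the detection limit
    mu       : current linear predictions (hypothetical, e.g. from a Lasso fit)
    sigma    : current residual standard deviation (hypothetical)
    """
    return [
        censored_mean(m, sigma, yi) if c else yi
        for yi, c, m in zip(y, censored, mu)
    ]

# Hypothetical log10 viral loads with a detection limit of log10(50).
limit = math.log10(50)
y = [limit, 3.2, limit, 4.1]
censored = [True, False, True, False]
mu = [2.0, 3.0, 1.5, 4.0]   # assumed predictions from the current fit
imputed = impute_left_censored(y, censored, mu, sigma=1.0)
print(imputed)
```

In a Buckley-James-type procedure this step would alternate with refitting the regression on the imputed responses until the coefficients stabilize; uncensored observations are left unchanged, and each imputed value always lies below the detection limit.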
Keywords:
Buckley-James least squares procedure; Cross-sectional studies; Drug resistance; HIV genotypic mutations; HIV viral load; Limit of detection