Zhuang Jiang1, Botao Fa2, Xunmiao Zhang3, Jiping Wang1, Yanmei Feng1, Haibo Shi1, Yue Zhang2, Daoyuan Sun4, Hui Wang5, Shankai Yin1. 1. Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, 600 Yishan Road, Shanghai 200233, China; Otolaryngology Institute of Shanghai Jiao Tong University, Shanghai 200233, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai 200233, China. 2. Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China. 3. Department of Occupational Disease, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai 200433, China. 4. Department of Occupational Disease, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai 200433, China. Electronic address: dysun@163.com. 5. Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, 600 Yishan Road, Shanghai 200233, China; Otolaryngology Institute of Shanghai Jiao Tong University, Shanghai 200233, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai 200233, China. Electronic address: wangh2005@alumni.sjtu.edu.cn.
Abstract
BACKGROUND: The overall genetic profile of noise-induced hearing loss (NIHL) remains elusive. Here we propose a novel machine learning (ML)-based strategy to evaluate individual susceptibility to NIHL and to identify the underlying genetic risk variants in a subsample of participants with extreme phenotypes. METHODS: Five features (age, sex, cumulative noise exposure [CNE], smoking status, and alcohol drinking status) of 5,539 shipbuilding workers from large cross-sectional surveys were entered into four ML classification models to predict their hearing levels. The area under the curve (AUC) and prediction accuracy were used to evaluate model performance. Based on the prediction error of the ML models, an NIHL-susceptible group (n=150) and an NIHL-resistant group (n=150), each showing a paradoxical relationship between hearing level and features, were screened separately, and whole-exome sequencing (WES) was used to identify the underlying variants associated with NIHL risk. Candidate risk variants were then validated in an additional replication cohort (n=2,108), followed by a meta-analysis. RESULTS: With 10-fold cross-validation, the performance of the four ML models was robust and similar, with average AUCs ranging from 0.783 to 0.798 and average accuracies ranging from 73.7% to 73.8%. The phenotypes of the NIHL-susceptible and NIHL-resistant groups differed significantly (all p<0.001). After WES analysis and filtering, 12 risk variants contributing to NIHL susceptibility were identified and replicated. The meta-analyses showed that the A allele of CDH23 rs41281334 (odds ratio [OR]=1.506, 95% confidence interval [CI]=1.106-2.051) and the C allele of WHRN rs12339210 (OR=3.06, 95% CI=1.398-6.700) were significantly associated with increased risk of NIHL after adjustment for confounding factors.
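The extreme-phenotype screening described in METHODS can be illustrated with a minimal sketch. It assumes that "prediction error" means the signed difference between the observed hearing status and the model's predicted probability of hearing loss; the abstract does not specify the exact criterion, and the function name, record layout, and toy data below are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (worker_id, predicted_probability_of_hearing_loss, observed_hearing_loss)
Record = Tuple[str, float, bool]

def select_extremes(records: List[Record], n_per_group: int) -> Tuple[List[str], List[str]]:
    """Pick workers whose observed hearing status most contradicts the model.

    Assumption (not stated in the abstract): the signed residual
    observed - predicted is used as the prediction error.
    """
    scored = [(wid, (1.0 if observed else 0.0) - prob) for wid, prob, observed in records]
    # Susceptible: hearing loss observed although predicted risk was low (large positive residual).
    by_residual_desc = sorted(scored, key=lambda item: item[1], reverse=True)
    susceptible = [wid for wid, res in by_residual_desc[:n_per_group] if res > 0]
    # Resistant: normal hearing observed although predicted risk was high (large negative residual).
    by_residual_asc = sorted(scored, key=lambda item: item[1])
    resistant = [wid for wid, res in by_residual_asc[:n_per_group] if res < 0]
    return susceptible, resistant

# Toy data for illustration only; these are not values from the study.
toy = [("a", 0.1, True), ("b", 0.9, False), ("c", 0.8, True), ("d", 0.2, False)]
susceptible_ids, resistant_ids = select_extremes(toy, n_per_group=1)
print(susceptible_ids, resistant_ids)
```

In the study the analogous call would use 150 per group; the predicted probabilities would come from the cross-validated classifiers rather than from hand-written values.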
CONCLUSIONS: This study identified two genetic variants, CDH23 rs41281334 and WHRN rs12339210, that were associated with NIHL risk, using a promising ML-based approach for evaluating individual susceptibility.
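As a quick arithmetic check on the reported associations, the standard error of the log odds ratio can be recovered from each 95% CI, which gives an approximate Wald z statistic and two-sided p-value. This sketch assumes the intervals are Wald-type (symmetric on the log scale); the abstract does not state how they were computed, and the helper name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(odds_ratio: float, ci_low: float, ci_high: float, z_crit: float = 1.959964):
    """Back-calculate SE, z, and two-sided p from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Values as reported in the abstract.
cdh23 = wald_from_ci(1.506, 1.106, 2.051)  # CDH23 rs41281334, A allele
whrn = wald_from_ci(3.06, 1.398, 6.700)    # WHRN rs12339210, C allele

for name, (se, z, p) in (("CDH23 rs41281334", cdh23), ("WHRN rs12339210", whrn)):
    print(f"{name}: SE(log OR)={se:.3f}, z={z:.2f}, p={p:.4f}")
```

Both intervals exclude 1, so both back-calculated p-values fall below 0.05, consistent with the significance reported. The wider interval for WHRN reflects a larger standard error, as would be expected for a rarer allele.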