Yankang Jing1, Ziheng Hu1, Peihao Fan1, Ying Xue1, Lirong Wang1, Ralph E Tarter2, Levent Kirisci2, Junmei Wang3, Michael Vanyukov4, Xiang-Qun Xie5. 1. Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA; Department of Pharmaceutical Sciences, School of Pharmacy, NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA. 2. Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA. 3. Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA; Department of Pharmaceutical Sciences, School of Pharmacy, NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA. Electronic address: junmei.wang@pitt.edu. 4. Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA. Electronic address: mmv@pitt.edu. 5. Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA; Department of Pharmaceutical Sciences, School of Pharmacy, NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA. Electronic address: xix15@pitt.edu.
Abstract
BACKGROUND: Substance use disorder (SUD) exacts enormous societal costs in the United States, and detecting high-risk youths is important for prevention. Machine learning (ML) is a set of methods for finding patterns in data and making predictions from them. We hypothesized that ML can identify the health, psychological, psychiatric, and contextual features that predict SUD, and that the identified features can predict which high-risk individuals develop SUD. METHOD: Male (N = 494) and female (N = 206) participants and their informant parents were administered a battery of questionnaires across five waves of assessment conducted at 10-12, 12-14, 16, 19, and 22 years of age. Characteristics most strongly associated with SUD were identified using the random forest (RF) algorithm from approximately 1000 variables measured at each assessment. Next, the complement of features was validated, and the best models for predicting SUD were selected using seven ML algorithms. Lastly, the area under the receiver operating characteristic curve (AUROC) was used to evaluate the accuracy of detecting individuals who do or do not develop SUD (SUD+/-) up to thirty years of age. RESULTS: Approximately thirty variables strongly predict SUD. The predictors shift from psychological dysregulation and poor health behavior in late childhood to non-normative socialization in mid to late adolescence. In 10-12-year-old youths, the features predict SUD+/- with 74% accuracy, increasing to 86% at 22 years of age. Compared with the other ML algorithms, the RF algorithm best detects individuals between 10 and 22 years of age who develop SUD. CONCLUSION: These findings inform the items required for inclusion in instruments to accurately identify high-risk youths and young adults requiring SUD prevention.
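The abstract reports AUROC as the evaluation metric but does not describe how it was computed. As a minimal sketch, assuming the standard pairwise definition (the probability that a randomly chosen SUD+ individual receives a higher risk score than a randomly chosen SUD- individual, with ties counted as one half), the metric can be illustrated as follows. The function name `auroc`, the labels, and the scores are hypothetical toy values, not data or code from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for SUD+ (positive), 0 for SUD- (negative).
    scores: model risk scores, higher meaning more likely SUD+.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # Count positive/negative pairs that are ranked correctly;
    # a tied pair contributes half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: four individuals, two of whom develop SUD.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
print(f"AUROC = {toy_auc:.2f}")
```

In the toy data, three of the four SUD+/SUD- pairs are ranked correctly, so the value printed is 0.75. The quadratic pairwise loop is chosen for readability; a rank-based formulation gives the same value more efficiently on large samples.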