Benson Mwangi¹, Jair C Soares², Khader M Hasan³.
1. UT Center of Excellence on Mood Disorders, Department of Psychiatry and Behavioral Sciences, UT Houston Medical School, Houston, TX, USA. Electronic address: benson.irungu@uth.tmc.edu.
2. UT Center of Excellence on Mood Disorders, Department of Psychiatry and Behavioral Sciences, UT Houston Medical School, Houston, TX, USA.
3. The University of Texas Health Science Center at Houston, Department of Diagnostic & Interventional Imaging, Houston, TX, USA.
Abstract
BACKGROUND: Neuroimaging machine learning studies have largely used supervised algorithms, meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be 'trained' for a prediction task. Notably, this approach may not be optimal, or even possible, when the global structure of the data is not well known and the researcher has no a priori model to fit to the data. NEW METHOD: We set out to investigate the utility of an unsupervised machine learning technique, t-distributed stochastic neighbour embedding (t-SNE), for identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated, and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized in a 2D scatter plot and further analyzed using the K-means clustering algorithm. COMPARISON WITH EXISTING METHODS: t-SNE was evaluated against classical principal component analysis. CONCLUSION: Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters that corresponded to subjects' gender labels (cluster silhouette index = 0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model that correctly identified the gender of 93.5% of subjects. From a neuropsychiatric perspective, this method may allow the discovery of data-driven disease phenotypes or sub-types of treatment responders.
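The post-embedding steps summarized above (K-means on the 2D t-SNE map, a silhouette index, and a minimum distance assignment scored against known labels) can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' pipeline: it assumes a 2D embedding has already been computed (e.g. with scikit-learn's `sklearn.manifold.TSNE`, which is not used here), and the two synthetic Gaussian blobs are hypothetical stand-ins for embedding coordinates, not study data. The "minimum distance" model is assumed here to be a nearest-centroid rule; all function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def make_demo_embedding(n_per_group: int = 20, seed: int = 1) -> Tuple[List[Point], List[int]]:
    """Hypothetical stand-in for a 2D t-SNE embedding: two Gaussian blobs."""
    rng = random.Random(seed)
    points: List[Point] = []
    truth: List[int] = []
    for label, cx in ((0, -5.0), (1, 5.0)):
        for _ in range(n_per_group):
            points.append((rng.gauss(cx, 0.5), rng.gauss(0.0, 0.5)))
            truth.append(label)
    return points, truth


def nearest_centroid(p: Point, centroids: Sequence[Point]) -> int:
    """Assumed minimum distance rule: index of the closest centroid (Euclidean)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def kmeans(points: Sequence[Point], k: int = 2, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Plain Lloyd's algorithm with centroids initialized from random data points."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette index s(i) = (b - a) / max(a, b); singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / sum(1 for lab in labels if lab == c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def binary_agreement(cluster_labels: Sequence[int], true_labels: Sequence[int]) -> float:
    """Fraction of matches with 0/1 labels, allowing for arbitrary cluster numbering."""
    matches = sum(c == t for c, t in zip(cluster_labels, true_labels)) / len(true_labels)
    return max(matches, 1.0 - matches)


if __name__ == "__main__":
    emb, truth = make_demo_embedding()
    labels, cents = kmeans(emb, k=2)
    print("silhouette:", round(silhouette(emb, labels), 2))
    print("agreement with known labels:", binary_agreement(labels, truth))
```

Because cluster indices from an unsupervised method carry no meaning, agreement with known labels is scored under the better of the two possible label mappings; a new subject's embedding point would be assigned with `nearest_centroid` using the fitted centroids.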