Chirag Gupta, Pramod Chandrashekar, Ting Jin, Chenfeng He, Saniya Khullar, Qiang Chang, Daifeng Wang.
Abstract
Intellectual and Developmental Disabilities (IDDs), such as Down syndrome, Fragile X syndrome, Rett syndrome, and autism spectrum disorder, usually manifest at birth or in early childhood. IDDs are characterized by significant impairment in intellectual and adaptive functioning, and both genetic and environmental factors underpin IDD biology. Molecular and genetic stratification of IDDs remains challenging, mainly due to overlapping factors and comorbidity. Advances in high-throughput sequencing, imaging, and tools to record behavioral data at scale have greatly enhanced our understanding of the molecular, cellular, structural, and environmental basis of some IDDs. Fueled by the "big data" revolution, artificial intelligence (AI) and machine learning (ML) technologies have brought a paradigm shift to computational biology. ML-driven approaches to clinical diagnosis have the potential to augment classical methods that rely on symptoms and external observations, and thereby to advance personalized treatment plans. Integrative analyses and applications of ML technology therefore have a direct bearing on discoveries in IDDs. The application of ML to IDDs can potentially improve screening and early diagnosis, advance our understanding of the complexity of comorbidity, and accelerate the identification of biomarkers for clinical research and drug development. For more than five decades, the IDDRC network has supported a nexus of investigators at centers across the USA, all striving to understand the interplay between the various factors underlying IDDs. In this review, we introduced rapidly growing multi-modal data types and highlighted both example studies that employed ML technologies to illuminate factors and biological mechanisms underlying IDDs and recent advances in ML technologies and their applications to IDDs and other neurological diseases.
We discussed various molecular, clinical, and environmental data collection modes, including genetic, imaging, phenotypical, and behavioral data types, along with multiple repositories that store and share such data. Furthermore, we outlined some fundamental concepts of machine learning algorithms and presented our opinion on specific gaps that will need to be filled to accomplish, for example, reliable implementation of ML-based diagnosis technology in IDD clinics. We anticipate that this review will guide researchers to formulate AI- and ML-based approaches to investigate IDDs and related conditions.
Keywords: Artificial intelligence; Brain; Genomics; Intellectual and developmental disabilities; Machine learning; Multi-omics
Year: 2022 PMID: 35501679 PMCID: PMC9059371 DOI: 10.1186/s11689-022-09438-w
Source DB: PubMed Journal: J Neurodev Disord ISSN: 1866-1947 Impact factor: 4.074
Fig. 1. An overview of the ML process and potential applications to IDDs. Various types of data (e.g., clinical, behavioral, neuroimaging, and multi-omics) are usually recorded in IDD cohorts. These datasets are first individually processed and cleaned to remove noise and extract relevant biological signals (feature extraction). Then, an AI/ML algorithm is trained to find rules and patterns in the integrated dataset. The choice of algorithm usually depends on the formulation of the biological problem and other dataset-specific factors (discussed in the "Machine learning methods and applications to IDDs" section of the main text). Typically, the model can be tested objectively with independent datasets or prior knowledge. A correctly evaluated and validated model is often generalizable, and such models have a variety of clinical and laboratory applications in IDDs.
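The steps named in the Fig. 1 caption (clean and scale features, train a model, then test it on independent data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nearest-centroid classifier, the toy two-feature data, and every function name below are hypothetical choices made for the sketch, not methods or data from the review.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Learn per-feature mean and standard deviation from the training set only."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]

def transform(rows, scaler):
    """Standardize each feature using the statistics learned by fit_scaler."""
    return [[(x - m) / s for x, (m, s) in zip(row, scaler)] for row in rows]

def fit_centroids(rows, labels):
    """Train a nearest-centroid model: one mean feature vector per class."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def predict(rows, centroids):
    """Assign each sample to the class with the closest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(row, centroids[lab])) for row in rows]

# Hypothetical toy cohort: two features per sample, class 1 = case, class 0 = control.
train_X = [[1.0, 10.0], [1.2, 11.0], [3.0, 20.0], [3.2, 21.0]]
train_y = [0, 0, 1, 1]
# Independent samples that were never used for scaling or training.
test_X = [[1.1, 10.5], [3.1, 20.5]]
test_y = [0, 1]

scaler = fit_scaler(train_X)
model = fit_centroids(transform(train_X, scaler), train_y)
predictions = predict(transform(test_X, scaler), model)
test_accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, test_accuracy)
```

The point of the design is that the scaler and the model see only the training samples, so the score on the held-out samples is an estimate of how the model behaves on unseen data.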
Glossary of key terms in artificial intelligence and machine learning
a. Confusion matrix: A table of prediction outcomes. False positives (FP) are negative samples wrongly predicted as positive; false negatives (FN) are positive samples wrongly predicted as negative. Correct predictions are true positives (TP: positive samples correctly predicted as positive) and true negatives (TN: negative samples correctly predicted as negative).
b. Accuracy (ACC): Mostly used for classification tasks. It is the ratio of correctly predicted labels to all labels. It ranges between 0 and 1, where 1 means all samples are correctly predicted and 0 means none are; on a balanced two-class problem, random guessing gives about 0.5.
c. Area under the curve (AUC): Also used in classification tasks; the curve is typically the receiver operating characteristic (ROC) curve. It indicates how well the model can differentiate between classes across decision thresholds. Higher AUCs correspond to models that better distinguish between disease (usually class 1) and healthy (usually class 0) patients. Values range from 0 to 1 and are usually compared with random guessing (AUC of 0.5).
d. Mean squared error (MSE): Mostly used in regression tasks. It is the average of the squared differences between the predicted values and the respective ground truth values. When the residuals have zero mean, it equals the variance of the residuals.
e. Mean absolute error (MAE): Widely used in regression tasks, it is the average absolute difference between the predicted and the ground truth values.
f. Purity: Used to evaluate clustering (unsupervised learning) approaches. It measures the extent to which each cluster contains samples from a single class.
g. F1 score (F1): The harmonic mean of precision and recall. Values range from 0 to 1, and predictive models aim for F1 scores close to 1.
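The glossary metrics can be written out directly from their definitions. The following is a minimal sketch in plain Python, assuming binary labels with class 1 as the positive (disease) class; the function names and the toy labels, scores, and cluster assignments are hypothetical and serve only to illustrate the formulas. The AUC here is computed by its rank interpretation, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples whose label was predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 if both are undefined or zero)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores, positive=1):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mse(y_true, y_pred):
    """Mean of squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of absolute differences between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def purity(labels, clusters):
    """Fraction of samples that belong to the majority class of their cluster."""
    by_cluster = {}
    for label, cluster in zip(labels, clusters):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)

# Hypothetical toy data for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.8, 0.2, 0.3, 0.6, 0.1, 0.7]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), f1_score(tp, fp, fn), auc(y_true, scores))
print(mse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]), mae([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
print(purity(["ASD", "ASD", "DS", "DS", "DS", "ASD"], [0, 0, 0, 1, 1, 1]))
```

On this toy data the confusion matrix has TP = 3, TN = 3, FP = 1, and FN = 1, so accuracy and F1 are both 0.75, while the AUC is 15/16 = 0.9375 because only one positive sample is scored below a negative one. In practice an established library implementation would normally be preferred over hand-written metrics.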
Fig. 2. A typical machine learning workflow. While there are generally four broad steps in developing an ML strategy (shaded boxes here, discussed comprehensively in the main text), a user must often choose among a variety of tools and algorithms available for each step. The choice mainly depends on dataset-specific factors and the desired outcomes from the model. The workflow presented here can help guide decisions when choosing algorithms for developing and testing a data analysis strategy.
Published research articles demonstrating machine learning applications to intellectual and developmental disabilities
| Reference | Disorder | Data type | Application |
|---|---|---|---|
| Koivu et al. (2018) | Down syndrome (DS) | NeuroImage | Disease risk assessment |
| Tenev et al. (2014) | ADHD, ASD | NeuroImage | Patient classification |
| Wang & Avillach (2021) | ASD, DS, ADHD | Multi-Omics, Behavior | Patient classification |
| Hazlett et al. (2017) | ASD | NeuroImage | Diagnosis |
| Heinsfeld et al. (2018) | ASD, FXS | Clinical | Diagnosis |
| Stahl et al. (2012) | Cerebral palsy (CP) | Behavior | Diagnosis |
| Ramaswami et al. (2020) | ASD | Multi-Omics | Disease sub-typing |
| Voineagu (2011) | ASD, Neurodevelopmental disease | Multi-Omics | Biomarker discovery |
| Liu et al. (2021) | ADHD | Multi-Omics | Biomarker discovery |
| Cogill et al. (2016) | ASD | Multi-Omics | Gene prioritization |
| Kimura et al. (2019) | Williams syndrome | Multi-Omics | Cellular/molecular pathways |
| Colby et al. (2012) | ADHD, ASD | NeuroImage | Patient classification |
| Jacobs et al. (2021) | Multiple | NeuroImage, Behavior | Disease sub-typing |
Fig. 3. Applications of ML/AI-driven research. Some of the key studies discussed in the "Machine learning methods and applications to IDDs" section of this review are summarized. The application of AI/ML (colored edges with references) to multimodal data types (middle layer) has the potential to enhance clinical decision-making (top layer) and to help develop a mechanistic understanding of IDDs (bottom layer).