| Literature DB >> 33854086 |
Arielle Borovsky, Donna Thal, Laurence B Leonard.
Abstract
Due to wide variability of typical language development, it has been historically difficult to distinguish typical and delayed trajectories of early language growth. Improving our understanding of factors that signal language disorder and delay has the potential to improve the lives of the millions with developmental language disorder (DLD). We develop predictive models of low language (LL) outcomes by analyzing parental report measures of early language skill using machine learning and network science approaches. We harmonized two longitudinal datasets including demographic and standardized measures of early language skills (the MacArthur-Bates Communicative Developmental Inventories; MBCDI) as well as a later measure of LL. MBCDI data was used to calculate several graph-theoretic measures of lexico-semantic structure in toddlers' expressive vocabularies. We use machine-learning techniques to construct predictive models with these datasets to identify toddlers who will have later LL outcomes at preschool and school-age. This approach yielded robust and reliable predictions of later LL outcome with classification accuracies in single datasets exceeding 90%. Generalization performance between different datasets was modest due to differences in outcome ages and diagnostic measures. Grammatical and lexico-semantic measures ranked highly in predictive classification, highlighting promising avenues for early screening and delineating the roots of language disorders.Entities:
Year: 2021 PMID: 33854086 PMCID: PMC8047042 DOI: 10.1038/s41598-021-85982-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Feature importance rankings across younger and older datasets.
Model performance on each dataset when trained and tested using repeated cross-validation.
| Train/Test | BalAcc | Sens | Spec | PPV | NPV | LR+ | LR− |
|---|---|---|---|---|---|---|---|
| EIRLI-older | .91*** | .99 | .82 | .19 | 1 | 5.50 | 0.01 |
| LASER-older | .94*** | 1 | .88 | .61 | 1 | 8.33 | 0.00 |
| EIRLI-younger | .88*** | 1 | .76 | .13 | 1 | 4.17 | 0.00 |
| LASER-younger | .93*** | 1 | .87 | .58 | 1 | 7.69 | 0.00 |
| Aggregated-older | .92*** | .99 | .85 | .29 | 1 | 6.60 | 0.01 |
| Aggregated-younger | .92*** | 1 | .84 | .28 | 1 | 6.25 | 0.00 |
***Balanced accuracy significantly exceeds baseline balanced accuracy of .50 where all cases are classified as normal, all p values < .0001.
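The derived columns in these tables follow directly from sensitivity and specificity. As a minimal sketch (not code from the paper), the standard definitions of balanced accuracy and the positive/negative likelihood ratios can be computed as:

```python
def classifier_metrics(sens, spec):
    """Derive balanced accuracy and likelihood ratios from
    sensitivity and specificity, as reported in the tables."""
    bal_acc = (sens + spec) / 2                # BalAcc: mean of sens and spec
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")  # LR+
    lr_neg = (1 - sens) / spec                 # LR-
    return {"BalAcc": bal_acc, "LR+": lr_pos, "LR-": lr_neg}

# EIRLI-older row: sens = .99, spec = .82
m = classifier_metrics(0.99, 0.82)
# BalAcc ≈ .91, LR+ = 5.50, LR- ≈ 0.01, matching the table (to rounding)
```

Note that PPV and NPV additionally depend on the prevalence of LL in each sample, which is why PPV varies widely across datasets while sensitivity stays near 1.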
Model performance for external validation, training and testing on separate datasets.
| Train | Test | BalAcc | Sens | Spec | PPV | NPV | LR+ | LR− |
|---|---|---|---|---|---|---|---|---|
| EIRLI-older | LASER-older | .54*** | .37 | .72 | .19 | .87 | 1.32 | 0.88 |
| LASER-older | EIRLI-older | .55*** | .23 | .87 | .085 | .96 | 1.77 | 0.89 |
| EIRLI-younger | LASER-younger | .46 | .15 | .76 | .098 | .84 | 0.63 | 1.12 |
| LASER-younger | EIRLI-younger | .50 | .41 | .58 | .10 | .92 | 0.98 | 1.02 |
Figure 2. An example noun-feature network with four nodes (words) and five links (indicating multiple overlapping semantic features between words).
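A graph like the one in Figure 2 can be represented as an adjacency mapping, from which graph-theoretic measures of lexico-semantic structure (such as degree and clustering) are computed. The words and links below are hypothetical placeholders, not the paper's data; this is an illustrative sketch only:

```python
# Hypothetical four-word network with five links, as in Fig. 2;
# a link indicates overlapping semantic features between two words.
edges = [("dog", "cat"), ("dog", "horse"), ("cat", "horse"),
         ("cat", "bird"), ("horse", "bird")]

# Build an undirected adjacency mapping.
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def local_clustering(node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

degree = {w: len(adj[w]) for w in adj}          # links per word
avg_clustering = sum(local_clustering(w) for w in adj) / len(adj)
```

In practice such measures would be computed over each toddler's full expressive vocabulary (e.g. with a graph library), but the definitions are the same.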
Figure 3. Visual schematic of the strategy for model training and testing on internal and external cases from each dataset. Aggregated dataset models (depicted on gray background) are reported in Supplemental Material.