Jianbo Yuan, Chester Holtz, Tristram Smith, Jiebo Luo.
Abstract
Autism spectrum disorder (ASD) is a developmental disorder that significantly impairs patients' ability to perform normal social interaction and communication. Moreover, the diagnosis procedure for ASD is highly time-consuming and labor-intensive, and it requires extensive expertise. Although there is no known cure for ASD, clinicians agree on the importance of early intervention for the recovery of ASD patients. Therefore, to improve autism patients' access to treatments such as early intervention, we aim to develop a robust machine learning-based system for autism detection that applies Natural Language Processing techniques to information extracted from the medical forms of potential ASD patients. Our detection framework involves converting semi-structured and unstructured medical forms into digital format, preprocessing, learning document representations, and finally, classification. Test results are evaluated against the ground truth set by expert clinicians, and the proposed system achieves 83.4% accuracy and 91.1% recall. The proposed ASD detection framework could significantly simplify and shorten the procedure of ASD diagnosis.
Keywords: Autism spectrum disorder; Classification; Distributed representation; Medical forms
Year: 2017 PMID: 28203249 PMCID: PMC5288414 DOI: 10.1186/s13637-017-0057-1
Source DB: PubMed Journal: EURASIP J Bioinform Syst Biol ISSN: 1687-4145
Fig. 1 The framework of the proposed ASD detection
Fig. 2 The framework of learning document representations
Fig. 3 An example of a semi-structured medical form (left), after de-skewing (middle), and after de-identification (right)
Number of extracted lexical features
| | | | |
|---|---|---|---|
| Number of Features | 4839 | 4839 | 9284 |
Classification results without upsampling
| | Precision | Recall | F2 Score |
|---|---|---|---|
| | 33.2% | 34.3% | 34.1% |
| | 34.9% | 36.7% | 36.3% |
| | 36.7% | 38.4% | 38.1% |
| All Lexical Features | 37.5% | 46.2% | 44.2% |
| | 39.7% | 52.4% | 49.2% |
| | 47.2% | 64.4% | 60.0% |
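
The F2 Score column is consistent with the standard F-beta formula with beta = 2, which weights recall more heavily than precision. The sketch below recomputes it from the reported precision and recall; the function name `f_beta` is ours, not the authors'.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score from precision and recall, both given as fractions in [0, 1]."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision/recall pairs copied from the last row of each results table.
without_upsampling = f_beta(0.472, 0.644)
with_upsampling = f_beta(0.646, 0.911)
print(f"{without_upsampling:.1%}, {with_upsampling:.1%}")  # prints "60.0%, 84.2%"
```

With beta = 2, a gain in recall raises the score more than an equal gain in precision, which suits a screening setting where missed cases are costlier than false alarms.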
Classification results with upsampling
| | Precision | Recall | F2 Score |
|---|---|---|---|
| | 40.4% | 41.1% | 40.9% |
| | 41.4% | 42.9% | 42.5% |
| | 43.1% | 44.6% | 44.3% |
| All Lexical Features | 44.4% | 42.9% | 43.2% |
| | 58.0% | 83.9% | 77.0% |
| | 64.6% | 91.1% | 84.2% |
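
This record does not say which upsampling method was used. As a minimal sketch, assuming plain random oversampling of the minority class with replacement, class balancing could look like the following; `upsample_minority` and the toy data are hypothetical.

```python
import random
from collections import Counter


def upsample_minority(samples, labels, seed=0):
    """Randomly duplicate samples of smaller classes until every class
    is as frequent as the largest one (assumed method, not the authors')."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Hypothetical toy data: 1 = ASD, 0 = non-ASD.
docs = ["form_a", "form_b", "form_c", "form_d", "form_e"]
y = [1, 0, 0, 0, 0]
docs_up, y_up = upsample_minority(docs, y)
print(Counter(y_up))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated documents cannot leak into the test set.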
Fig. 4 Classification results for LDA and doc2vec features with different dimensions
Top 10 selected features with the largest weights
| Positive | Negative |
|---|---|
| Traits | Behavioral patterns |
| Seizures | Vocalizes vowel sounds |
| Attention span | Concerns |
| Physical | Actively involved |
| Disorder | Sure |
| Severely | Individual |
| Sensory | Help |
| Seems | Disability |
| Functionally plays | Affection family |
| Variety | Mood swings |
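
One common way to produce a list like the one above is to rank features by the weights of a linear classifier. This record does not state how the weights were obtained, so the following is only a sketch under that assumption; `top_weighted_features` and all numeric weights are hypothetical.

```python
def top_weighted_features(weights, k=10):
    """Return the k most positive and the k most negative features by weight."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    positive = [name for name, w in ranked[:k] if w > 0]
    negative = [name for name, w in ranked[::-1][:k] if w < 0]
    return positive, negative


# Made-up weights for illustration only; the source reports no numeric values.
weights = {"seizures": 0.8, "sensory": 0.5, "help": -0.4, "concerns": -0.7, "the": 0.0}
pos, neg = top_weighted_features(weights, k=2)
print(pos, neg)
```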