Arezoo Movaghar, Marsha Mailick, Audra Sterling, Jan Greenberg, Krishanu Saha.
Abstract
Millions of people worldwide are at high risk for neurodegenerative disorders, infertility, or having children with a disability as a result of the Fragile X (FX) premutation, an underdiagnosed genetic abnormality in FMR1. Despite the high prevalence of the FX premutation and its effect on public health and family planning, most FX premutation carriers are unaware of their condition. Because genetic testing for the premutation is resource-intensive, screening individuals for FX premutation status with genetic testing is not practical. In a novel approach to phenotyping, we used audio recordings and cognitive profiles, assessed via self-administered questionnaires, from 200 females. Machine-learning methods were developed to discriminate FX premutation carriers from mothers of children with autism spectrum disorders, who served as the comparison group. Using a random forest classifier, FX premutation carriers could be identified automatically with high precision and recall (F1 score of 0.81). The linguistic and cognitive phenotypes highly associated with FX premutation carrier status were high language dysfluency, poor ability to organize materials, and low self-monitoring. Our framework sets the foundation for computational phenotyping strategies that pre-screen large populations for this genetic variant at nominal cost.
Year: 2017 PMID: 28572606 PMCID: PMC5454004 DOI: 10.1038/s41598-017-02682-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Workflow overview and language sample instructions. (A) The workflow starts with data collection via phone or in-person interviews. Five-minute language samples were elicited with the interview instructions shown in (B), as previously described[11]. The audio recordings were transcribed, and a text-processing module extracted linguistic features from the resulting transcripts. In parallel, cognitive profiles were collected using a cognitive self-assessment based on the BRIEF-A standard[32, 33]. A comprehensive profile was then generated by combining the linguistic and cognitive features. For the machine learning portion of the study, a feature selection module chose the most informative features based on their information gain scores, and a classifier was developed to determine FX premutation status. Finally, the evaluation results were used to assess the feasibility of improving the data collection process, and a mobile app based on these results was developed to expand the study.
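The caption does not say how information gain was computed for continuous features. As a minimal sketch, assuming each numeric feature is scored by its best binary threshold split, with IG = H(Y) − Σ_s (|Y_s|/|Y|) H(Y_s), the feature-selection step could look like the following. The function names, class labels, and toy data are hypothetical; other discretization schemes would give different scores.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """IG of splitting a numeric feature into value <= threshold vs. value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(side) / n * entropy(side) for side in (left, right) if side)
    return entropy(labels) - conditional

def best_information_gain(values, labels):
    """Best IG over thresholds at each distinct value (the maximum would leave one side empty)."""
    candidates = sorted(set(values))[:-1]
    if not candidates:
        return 0.0
    return max(information_gain(values, labels, t) for t in candidates)

def rank_features(features, labels, k):
    """Names of the k features with the highest information gain (hypothetical helper)."""
    scores = {name: best_information_gain(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, with labels `["FX", "FX", "FX", "ASD", "ASD", "ASD"]` and made-up filled-pause counts `[12, 15, 11, 3, 4, 2]`, the split at 4 separates the two groups perfectly and yields the maximum gain of 1 bit.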
Participants’ information and results of independent two-sample t-test.
| | FX premutation carriers: Mean | FX premutation carriers: SD | Comparison group: Mean | Comparison group: SD | t | p |
|---|---|---|---|---|---|---|
| Maternal age (years) | 48.91 | 6.332 | 47.374 | 6.886 | 1.633 | 0.104 |
| Maternal education | 3.16 | 0.66 | 3.14 | 0.75 | 0.201 | 0.841 |
Demographic information and comparison for all participants in this study. Maternal education was coded as follows: 1, less than high school; 2, high school graduate; 3, some college/bachelor’s degree; 4, post-bachelor’s/graduate degree.
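The t statistic and p-value reported in the table can be recomputed from summary statistics. The sketch below assumes a pooled-variance (Student's) test, since the table does not say whether a pooled or Welch test was used, and it needs per-group sample sizes that this excerpt does not report; the usage values are therefore made up. The p-value uses a normal approximation that is adequate only for large degrees of freedom; `scipy.stats.ttest_ind_from_stats` gives the exact t-distribution p-value.

```python
import math

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Independent two-sample Student's t-test from summary statistics.

    Returns (t, df, p_approx). p_approx is two-sided and uses a standard
    normal approximation to the t distribution (reasonable for large df).
    """
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    # P(|Z| > |t|) for Z ~ N(0, 1) equals erfc(|t| / sqrt(2)).
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx
```

With hypothetical groups of 50 (means 10 and 9, both SDs 2), this gives t = 2.5 on 98 degrees of freedom and an approximate p of about 0.012.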
Valuable linguistic and cognitive features.
| Type | Feature | Description | Utterance example |
|---|---|---|---|
| Dysfluency | Filled pauses | Words or vocalizations that fill a pause, e.g., um, ah, oh. | She is uh a very uh lovely girl. |
| Dysfluency | Total number of repetitions | The exact duplication of a linguistic unit of any length, from a word to an entire clause, with no other utterances allowed in between except fillers (such as “um” or “uh”). | |
| Dysfluency | Repetition (word) | Number of repetitions of single words | (He) He is John. |
| Dysfluency | Repetition (phrase) | Number of repetitions at the phrase level | (He is) He is John. |
| Linguistic | Number of utterances | Number of all spoken statements, questions, exclamations, or vocal sounds. An utterance is any independent clause together with any dependent clauses or phrases associated with it. | What is his name? I don’t know! He is John. |
| Linguistic | Number of statements | Number of all statements (utterances ending with “.”) | He is John. |
| Linguistic | Number of questions | Number of all questions (utterances ending with “?”) | What is his name? |
| Linguistic | Mean length of utterances | Average number of morphemes per utterance | |
| Linguistic | One-word utterances | Number of utterances consisting of a single word | Yes. |
| Linguistic | Number of repeated words | Total number of words occurring in a dysfluency | |
| Linguistic | Repeated words percentage | Percentage of all words produced by the speaker that are repeated words | |
| Linguistic | Short utterances | Number of utterances with fewer than 5 morphemes | |
| Linguistic | Medium utterances | Number of utterances with 6 to 10 morphemes | |
| Linguistic | Average repetitions per utterance | Average number of repetitions per utterance | |
| Cognitive | Self-monitoring | The Self-Monitor scale from BRIEF-A assesses aspects of social or interpersonal awareness. It captures the degree to which individuals perceive themselves as aware of the effect that their behavior has on others. | |
| Cognitive | Organization of materials | The Organization of Materials scale from BRIEF-A measures orderliness of work, living, and storage spaces. | |
| Cognitive | Working memory | The capacity to hold information in mind for the purpose of completing a task, encoding information, or generating goals, plans, and sequential steps to achieving goals. | |
Dysfluency features are defined as previously described[11], and cognitive features as previously described[32]. See Fig. 2 for information gain values.
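Most linguistic features in the table are simple counts over a transcript. The sketch below computes them under assumptions that are not from the paper: repetitions are marked as parenthesized mazes, as in the table's examples; utterances end in ".", "?" or "!"; the filler list is hypothetical; and words stand in for morphemes, so its mean length of utterance is in words rather than true morphemes. Only mazes that exactly repeat the following words (fillers allowed in between) count as repetitions, following the table's definition.

```python
import re

# Hypothetical conventions: SALT-style mazes "(He is) He is John.", words as morpheme proxy.
FILLERS = {"um", "uh", "ah", "oh", "er"}
MAZE = re.compile(r"\(([^)]*)\)")
WORD = re.compile(r"[A-Za-z']+")

def words(text, drop_fillers=True):
    """Lower-cased words in `text`, optionally without filled pauses."""
    ws = [w.lower() for w in WORD.findall(text)]
    return [w for w in ws if w not in FILLERS] if drop_fillers else ws

def repetitions(utterance):
    """Mazes whose words exactly repeat the words that follow them."""
    reps = []
    for m in MAZE.finditer(utterance):
        maze = words(m.group(1))
        following = words(utterance[m.end():])  # fillers in between are skipped
        if maze and following[:len(maze)] == maze:
            reps.append(maze)
    return reps

def transcript_features(transcript):
    utts = [u.strip() for u in re.findall(r"[^.?!]+[.?!]", transcript)]
    lengths = [len(words(MAZE.sub("", u))) for u in utts]  # maze- and filler-free length
    reps = [r for u in utts for r in repetitions(u)]
    repeated = sum(len(r) for r in reps)
    total_words = sum(len(words(u)) for u in utts)
    n = len(utts)
    return {
        "filled_pauses": sum(w in FILLERS for w in words(transcript, drop_fillers=False)),
        "repetitions_total": len(reps),
        "repetitions_word": sum(len(r) == 1 for r in reps),
        "repetitions_phrase": sum(len(r) > 1 for r in reps),
        "utterances": n,
        "statements": sum(u.endswith(".") for u in utts),
        "questions": sum(u.endswith("?") for u in utts),
        "mean_length_of_utterance": sum(lengths) / n if n else 0.0,
        "one_word_utterances": sum(k == 1 for k in lengths),
        "repeated_words": repeated,
        "repeated_words_pct": 100 * repeated / total_words if total_words else 0.0,
        # Thresholds as stated in the table: short < 5, medium 6-10 (length 5 is in neither).
        "short_utterances": sum(k < 5 for k in lengths),
        "medium_utterances": sum(6 <= k <= 10 for k in lengths),
        "avg_repetitions_per_utterance": len(reps) / n if n else 0.0,
    }
```

Running it on the table's own example utterances ("(He) He is John. (He is) He is John. What is his name? She is uh a very uh lovely girl. Yes.") yields 5 utterances, 2 filled pauses, one word-level and one phrase-level repetition, and 3 repeated words out of 20 non-filler words (15%).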
Figure 2. Valuable features with highly ranked information gain. Values are reported as mean ± standard deviation and presented in descending order within each feature type; higher values indicate greater information gain. See Table 2 for a description of the features.
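The caption does not say what the mean ± standard deviation is taken over. Assuming per-fold information gain scores (for example, one score per cross-validation fold), the summary and the within-type ordering could be produced as below; the fold scores, feature names, and type labels in the usage are hypothetical.

```python
from statistics import mean, stdev

def summarize_information_gain(fold_scores, feature_types):
    """Mean and SD of per-fold information gain, sorted descending within each type.

    fold_scores:   {feature: [IG in fold 1, IG in fold 2, ...]} (at least two folds)
    feature_types: {feature: type label}
    Returns {type: [(feature, mean, sd), ...]} with the highest mean first.
    """
    grouped = {}
    for feature, scores in fold_scores.items():
        row = (feature, mean(scores), stdev(scores))
        grouped.setdefault(feature_types[feature], []).append(row)
    for rows in grouped.values():
        rows.sort(key=lambda r: r[1], reverse=True)
    return grouped
```

Grouping before sorting keeps each feature type's ranking separate, matching the caption's "descending order within each feature type".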
Figure 3. Performance of different classifiers. (A) Receiver operating characteristic (ROC) curves for five commonly used classifiers. The classifiers were given the full linguistic and cognitive profiles of FX premutation carriers and the comparison group as input. ROC curves summarize the accuracy of prediction methods by plotting a test's sensitivity against its false-positive rate (FPR), i.e., 1 − specificity. (B) The F1 score measures a test's accuracy by combining precision and recall; the closer it is to 1, the more accurate the test. Random forest has an F1 score of 0.81, the best performance among the tested classifiers.
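As a minimal, library-free sketch of the two metrics in this figure (not the authors' evaluation code), the functions below compute the F1 score as the harmonic mean of precision and recall, the ROC curve as (FPR, sensitivity) points obtained by sweeping a threshold over classifier scores, and the trapezoidal area under that curve. The labels and scores in the usage are made up.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the same as sensitivity
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def roc_curve(y_true, scores, positive=1):
    """(FPR, TPR) points for every distinct score used as a threshold, high to low.

    Requires at least one positive and one negative example.
    """
    pos = sum(t == positive for t in y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(s >= thr and t == positive for s, t in zip(scores, y_true))
        fp = sum(s >= thr and t != positive for s, t in zip(scores, y_true))
        points.append((fp / neg, tp / pos))  # (1 - specificity, sensitivity)
    return points

def auc(points):
    """Area under a curve given as (x, y) points sorted by x, via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For labels `[1, 1, 0, 1, 0, 0]` with scores `[0.9, 0.8, 0.7, 0.6, 0.4, 0.2]`, a 0.5 threshold gives precision 0.75 and recall 1.0, so F1 = 6/7 ≈ 0.857, and the area under the ROC curve is 8/9 ≈ 0.889.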
Figure 4. Random forest classifier performance for different sets of input features. (A) Receiver operating characteristic (ROC) curves for classifiers using three different profiles of FX premutation carriers and the comparison group. ROC curves summarize the accuracy of prediction methods by plotting a test's sensitivity against its false-positive rate (FPR), i.e., 1 − specificity. The cognitive profile has the worst diagnostic utility, and our proposed profile shows the best performance. (B) The F1 score measures a test's accuracy by combining precision and recall; the closer it is to 1, the more accurate the test. The proposed profile has an F1 score of 0.81, the best performance among the tested profiles.
Figure 5. Random forest classifier performance for different segments of each interview. (A) ROC curves for classifiers using five different data profiles of FX premutation carriers and the comparison group. Each profile was created from a different segment of each interview, with all segments of equal length. The first and last segments of the language samples provided the most information to the classifier, whereas segment 3 resulted in the worst performance. (B) The F1 score measures a test's accuracy, considering both precision and recall. The profile constructed from the last segment has an F1 score of 0.74, the best performance among the profiles built from a single interview segment.
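The caption does not specify how segments of "the same length" were formed. Assuming equal-duration windows and that each transcribed utterance carries a start time in seconds (both hypothetical details), the segmentation could be sketched as follows. With the five-minute (300 s) samples described for Figure 1, five segments would each span 60 seconds.

```python
def split_into_segments(utterances, total_seconds, k=5):
    """Assign time-stamped utterances to k equal-duration segments.

    utterances: list of (start_time_seconds, text) pairs (hypothetical format)
    Segment i covers [i * total_seconds / k, (i + 1) * total_seconds / k).
    Returns a list of k lists of utterance texts, in time order.
    """
    width = total_seconds / k
    segments = [[] for _ in range(k)]
    for start, text in utterances:
        # Clamp so an utterance starting exactly at the end falls in the last segment.
        i = min(int(start // width), k - 1)
        segments[i].append(text)
    return segments
```

Each segment's utterances would then be featurized and classified separately to compare segments, as in panel (A).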