Cíntia Matsuda Toledo, Andre Cunha, Carolina Scarton, Sandra Aluísio.
Abstract
Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario.
Keywords: adults; age groups; educational status; language tests; narratives; natural language processing
Year: 2014 PMID: 29213908 PMCID: PMC5619399 DOI: 10.1590/S1980-57642014DN83000006
Source DB: PubMed Journal: Dement Neuropsychol ISSN: 1980-5764
Examples of original and edited descriptions
| | Original examples | Edited examples |
|---|---|---|
| 30-60 | Eles estão vendo televisão/They are watching television | Eles estão vendo televisão./They are watching television. |
| | Ele | Ele |
| | Eu | Eu |
| | Eu | Eu |
| 30-60 | Eu estou vendo vários carros com motorista pessoas nas janelas | Eu estou vendo vários carros com motorista, pessoas nas janelas. |
| | Young man changing tire woman walking | Young man changing tire, woman walking. |
Statistics from the corpus of descriptions.
| Education | # Words | # Sentences | Clauses/Sentences | Writing time | # Descriptions |
|---|---|---|---|---|---|
| 3-4 years | 16.5 (8.32) | 1.91 (1.25) | 2.01 (1.63) | 6.42 (4.51) | 43 |
| 5-8 years | 28.4 (17.8) | 2.14 (1.36) | 3.00 (2.33) | 5.08 (2.46) | 64 |
| 9-15 years | 26.1 (12.4) | 2.08 (3.04) | 3.08 (1.95) | 4.27 (2.17) | 61 |
| 15+ years | 48.1 (29.4) | 3.58 (2.66) | 2.86 (2.52) | 3.64 (2.16) | 74 |
Values are means with standard deviations (SD) in parentheses; # Descriptions is a count.
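The per-group means above can be combined into a single corpus-level figure by weighting each group by its number of descriptions. The following is a minimal sketch of that arithmetic, not a computation reported in the source; the names `groups` and `pooled_mean` are illustrative, and only the numbers are copied from the table.

```python
# Group sizes and mean word counts copied from the corpus statistics table.
groups = {
    "3-4 years": {"mean_words": 16.5, "n": 43},
    "5-8 years": {"mean_words": 28.4, "n": 64},
    "9-15 years": {"mean_words": 26.1, "n": 61},
    "15+ years": {"mean_words": 48.1, "n": 74},
}


def pooled_mean(stats, key):
    """Weighted mean of per-group means, weighting each group by its size."""
    total_n = sum(g["n"] for g in stats.values())
    weighted_sum = sum(g[key] * g["n"] for g in stats.values())
    return weighted_sum / total_n


total_descriptions = sum(g["n"] for g in groups.values())
overall_mean_words = pooled_mean(groups, "mean_words")
print(total_descriptions, round(overall_mean_words, 2))
```

The same helper could be applied to the other mean columns. The standard deviations cannot be pooled this simply, because a pooled SD also depends on how far each group mean lies from the overall mean.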
Set of features for all experiments.
| Features 1-38 | Features 39-76 |
|---|---|
| 1. number of words (LE) | 39. incidence of adverb ambiguity (SE) |
| 2. number of sentences (LE) | 40. argument overlap in adjacent sentences (DI) |
| 3. words per sentence (LE) | 41. argument overlap in previous sentences of the text (DI) |
| 4. syllables per word (LE) | 42. word stem overlap in adjacent sentences (DI) |
| 5. verb incidence (MO) | 43. word stem overlap in previous sentences of the text (DI) |
| 6. noun incidence (MO) | 44. content word overlap in adjacent sentences (DI) |
| 7. incidence of adjectives (MO) | 45. anaphor reference in adjacent sentences (DI) |
| 8. incidence of adverbs (MO) | 46. anaphor reference in previous sentences of the text (DI) |
| 9. incidence of pronouns (MO) | 47. ratio of number of simple words to number of words (LE) |
| 10. incidence of content words (verbs, nouns, adjectives, and adverbs) (MO) | 48. ratio of number of sentences in passive voice to number of sentences (SI) |
| 11. Flesch Index for Portuguese (LE) | 49. ratio of number of sentences initiated by subordinate conjunctions to number of sentences (SI) |
| 12. syllables per word (LE) | 50. ratio of number of sentences initiated by coordinate conjunctions to number of sentences (SI) |
| 13. occurrence of noun phrases (SI) | 51. mode of the number of clauses (SI) |
| 14. occurrence of modifiers per noun phrase (SI) | 52. ratio of average number of clauses to number of sentences in the text (SI) |
| 15. occurrence of words before main verbs (MO) | 53. ratio of number of coordinate conjunctions to number of words (SI) |
| 16. frequency for content words (MO) | 54. ratio of number of subordinate conjunctions to number of words (SI) |
| 17. minimum frequency for content words (MO) | 55. ratio of number of gerunds to number of verbs (SI) |
| 18. incidence of all connectives (LE) | 56. ratio of number of participles to number of verbs (SI) |
| 19. incidence of positive additive connectives (LE) | 57. ratio of infinitives to number of verbs (SI) |
| 20. incidence of negative additive connectives (LE) | 58. ratio of total of gerunds, participles and infinitives to number of words (SI) |
| 21. incidence of positive temporal connectives (LE) | 59. ratio of average number of preposition phrases to number of sentences in the text (SI) |
| 22. incidence of negative temporal connectives (LE) | 60. ratio of average number of preposition phrases to number of clauses in the text (SI) |
| 23. incidence of positive causal connectives (LE) | 61. ratio of number of relative clauses to number of verbs (SI) |
| 24. incidence of negative causal connectives (LE) | 62. ratio of number of restrictive appositives to number of sentences (SI) |
| 25. incidence of positive logical connectives (LE) | 63. ratio of number of adverbial adjuncts to number of sentences (SI) |
| 26. incidence of negative logical connectives (LE) | 64. ratio of number of personal pronouns to number of words (LE) |
| 27. incidence of logical operators (LE) | 65. ratio of number of possessive pronouns to number of words (LE) |
| 28. incidence of number of | 66. ratio of total number of markers to number of words (LE) |
| 29. incidence of number of | 67. ratio of total number of ambiguous markers to number of markers (LE) |
| 30. incidence of number of | 68. description time (TA) |
| 31. incidence of number of negations (LE) | 69. simple or complex description (TA) |
| 32. incidence of personal pronouns (LE) | 70. amount of information (TA) |
| 33. incidence of pronouns per noun phrase (SI) | 71. understood the main information (yes/no) (TA) |
| 34. incidence of type/token ratio (LE) | 72. age (TA) |
| 35. incidence of verb hypernym (SE) | 73. picture presentation order (TA) |
| 36. incidence of verb ambiguity (SE) | 74. percentage of misspellings (LE) |
| 37. incidence of noun ambiguity (SE) | 75. percentage of positive words from the LIWC dictionary (SE) |
| 38. incidence of adjective ambiguity (SE) | 76. percentage of negative words from the LIWC dictionary (SE) |
LE: use of lexicons or sentence segmentation tools; MO: use of morphosyntactic taggers; SI: use of full or shallow parsers; SE: use of semantic dictionaries, thesauri, WordNets; DI: use of tools for discourse evaluation; TA: use of features dedicated to the task, whose processing was not manually calculated. Incidence corresponds to the number of units classified for a given measure divided by the total number of words in the text, scaled to 1,000 words.
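The incidence definition can be illustrated with a short sketch. It assumes a naive regex tokenizer and a small hand-written pronoun list (`PRONOUNS`, hypothetical) in place of the morphosyntactic tagger that the MO features rely on, so it shows only the scaling to 1,000 words, not the authors' pipeline. The example sentence is taken from the table of edited descriptions.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (naive regex tokenizer)."""
    return re.findall(r"\w+", text.lower())


def incidence(unit_count, total_words):
    """Number of units per 1,000 words of text."""
    if total_words == 0:
        return 0.0
    return unit_count / total_words * 1000


def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical mini-lexicon standing in for a real part-of-speech tagger.
PRONOUNS = {"eu", "ele", "ela", "eles", "elas"}

text = "Eles estão vendo televisão."
tokens = tokenize(text)
pronoun_count = sum(1 for t in tokens if t in PRONOUNS)
pronoun_incidence = incidence(pronoun_count, len(tokens))
print(len(tokens), pronoun_incidence, type_token_ratio(tokens))
```

Because the descriptions are short (tens of words on average), a single occurrence moves an incidence value by a large amount, which is worth keeping in mind when comparing these features across education groups.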
Performance of classification methods to evaluate classification difficulty involving the intermediate classes 5-8 and 9-15.
| Algorithm | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 | Exp. 5 | Exp. 6 |
|---|---|---|---|---|---|---|
| NaïveBayes | 70.9 | 74.2 | 75.7 | 81.3 | 74.2 | 80.4 |
| SVM | 43.9 | 68.0 | | | | |
| MLP | 69.0 | 81.7 | 93.5 | 93.2 | 91.6 | 93.5 |
| SimpleLogistic | 73.4 | 86.7 | 86.3 | 85.2 | 81.8 | 84.8 |
| JRip | 88.6 | 86.3 | 87.8 | 88.3 | | |
| J48 | 66.2 | 79.3 | 93.2 | 91.2 | 90.5 | 92.8 |
| Baseline | | | | | | |
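One common reference point for classifier scores is a majority-class baseline, which always predicts the most frequent class. The sketch below computes that figure under two assumptions that the source does not confirm: that the scores are accuracies, and that the classes are the four education groups with the description counts given in the corpus statistics. The experiments may group or drop classes differently, so the result is illustrative and is not the value of the empty Baseline row.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the most frequent label and the accuracy (%) of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, 100.0 * majority_count / len(labels)


# Assumed class distribution: description counts per education group,
# copied from the corpus statistics table.
labels = ["3-4"] * 43 + ["5-8"] * 64 + ["9-15"] * 61 + ["15+"] * 74

label, accuracy = majority_baseline_accuracy(labels)
print(label, round(accuracy, 2))
```

If an experiment merges or removes the intermediate classes 5-8 and 9-15, the `labels` list would need to be rebuilt for that grouping before the baseline is comparable to the corresponding column.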