Meng Ji, Wenxiu Xie, Mengdan Zhao, Xiaobo Qian, Chi-Yin Chow, Kam-Yiu Lam, Jun Yan, Tianyong Hao.
Abstract
Background: Medication nonadherence represents a major burden on national health systems. According to the World Health Organization, increasing medication adherence may have a greater impact on public health than any improvement in specific medical treatments. More research is needed to better predict populations at risk of medication nonadherence. Objective: To develop clinically informative, easy-to-interpret machine learning classifiers to predict people with psychiatric disorders at risk of medication nonadherence based on the syntactic and structural features of written posts on health forums.
Year: 2022 PMID: 35463247 PMCID: PMC9033323 DOI: 10.1155/2022/6722321
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Automatic feature selection: recursive feature elimination with SVM as the base estimator.
Optimised features through zero importance feature elimination (RZF) and recursive feature elimination with support vector machine (SVM_RFE).
| Category | No. of features | Notation | Features |
|---|---|---|---|
| Syntactic complexity analyzer | 11 | SCA11 | MLT, MLC, C_S, VP_T, C_T, T_S, CT_T, CP_T, CP_C, CN_T, CN_C |
| | 10 | SCA10 | MLC, C_S, VP_T, C_T, T_S, CT_T, CP_T, CP_C, CN_T, CN_C |
| Syntactic sophistication | 5 | SS5 | all_av_construction_freq_stdev, all_av_lemma_freq_stdev, all_av_lemma_freq_type, acad_av_approx_collexeme, all_av_approx_collexeme_stdev |
| | 4 | SS4 | all_av_construction_freq_stdev, all_av_lemma_freq_stdev, all_av_lemma_freq_type, acad_av_approx_collexeme |
| Noun phrase complexity | 3 | NPC3 | dobj_stdev, advmod_pobj_deps_NN_struct, nsubj_NN_stdev |
| Clause complexity | 12 | CC12 | aux_per_cl, ccomp_per_cl, nsubjpass_per_cl, prepc_per_cl, nsubj_per_cl, mark_per_cl, ncomp_per_cl, cl_av_deps, cc_per_cl, prep_per_cl, csubj_per_cl, dep_per_cl |
| ALL | 38 | ALL 38 | CN_C, CT_T, acad_av_construction_freq_stdev, acad_av_lemma_freq_stdev, advmod_pobj_deps_struct, all_av_construction_freq_log, amod_pobj_deps_struct, aux_per_cl, auxpass_per_cl, av_ncomp_deps, av_nominal_deps_NN, cc_per_cl, ccomp_per_cl, conj_and_all_nominal_deps_struct, conj_and_pobj_deps_NN_struct, conj_or_all_nominal_deps_struct, csubj_per_cl, dep_per_cl, det_pobj_deps_NN_struct, dobj_NN_stdev, dobj_stdev, fic_av_delta_p_const_cue_stdev, fic_av_lemma_construction_freq_log, mark_per_cl, nn_all_nominal_deps_struct, nn_dobj_deps_NN_struct, nsubj_NN_stdev, nsubj_per_cl, nsubj_stdev, nsubjpass_per_cl, poss_dobj_deps_struct, prep_per_cl, prep_pobj_deps_NN_struct, prepc_per_cl, prt_per_cl, rcmod_dobj_deps_NN_struct, tmod_per_cl, xcomp_per_cl |
Performance of RVM classifiers with syntactic complexity features (no zero importance feature).
| RVM | Training AUC mean | Training SD | Testing AUC | Testing accuracy | Testing sensitivity | Testing specificity | Testing macro-F1 |
|---|---|---|---|---|---|---|---|
| SCA full 14 | 0.468 | 0.077 | 0.500 | 0.566 | 1.000 | 0.000 | 0.361 |
| SCA full 14 with min-max | 0.532 | 0.080 | 0.514 | 0.526 | 0.756 | 0.227 | 0.469 |
| SCA full 14 with | 0.454 | 0.089 | 0.474 | 0.566 | 1.000 | 0.000 | 0.361 |
| SCA full 14 with | 0.467 | 0.087 | 0.513 | 0.513 | 0.674 | 0.303 | 0.481 |
| SCA RFE 11 | 0.443 | 0.047 | 0.568 | 0.520 | 0.244 | 0.879 | 0.490 |
| SCA RFE 11 with min-max | 0.522 | 0.107 | 0.589 | 0.572 | 0.674 | 0.439 | 0.556 |
| SCA RFE 11 with | 0.512 | 0.106 | 0.596 | 0.572 | 0.663 | 0.455 | 0.558 |
| SCA RFE 11 with | 0.464 | 0.060 | 0.569 | 0.559 | 0.628 | 0.470 | 0.549 |
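The testing-data columns in these tables can all be derived from a 2×2 confusion matrix. The sketch below is illustrative and not the authors' code. It assumes nonadherence is the positive class and that the test set holds 86 positive and 66 negative posts; these counts are inferred here from the reported values (for example 86/152 ≈ 0.566) and are not stated in the source. Under those assumptions it shows why a classifier that labels every post as positive produces the recurring row of sensitivity 1.000, specificity 0.000, accuracy 0.566, and macro-F1 0.361.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and macro-F1 for a binary task."""
    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # Macro-F1 is the unweighted mean of the two per-class F1 scores.
        # For the negative class, TN plays the role of "true positives".
        "macro_f1": (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2,
    }


# Assumed counts (inferred, not from the source): 86 positives, 66 negatives,
# and a degenerate classifier that labels every post as positive.
degenerate = confusion_metrics(tp=86, fn=0, tn=0, fp=66)
print({name: round(value, 3) for name, value in degenerate.items()})
```

Because such a classifier never predicts the negative class, its negative-class F1 is zero, which drags macro-F1 well below accuracy. This is why macro-F1 is the more informative column when comparing the rows above.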
Performance of RVM classifiers with syntactic sophistication features (eliminated 55 zero importance features).
| RVM | Training AUC mean | Training SD | Testing AUC | Testing accuracy | Testing sensitivity | Testing specificity | Testing macro-F1 |
|---|---|---|---|---|---|---|---|
| SS full 190 | 0.514 | 0.050 | 0.529 | 0.566 | 1.000 | 0.000 | 0.361 |
| SS full 190 with min-max | 0.462 | 0.076 | 0.500 | 0.566 | 1.000 | 0.000 | 0.361 |
| SS full 190 with | 0.487 | 0.046 | 0.534 | 0.566 | 1.000 | 0.000 | 0.361 |
| SS full 190 with Z-score | 0.522 | 0.044 | 0.650 | 0.632 | 0.651 | 0.606 | 0.628 |
| SS RZF 135 | 0.518 | 0.054 | 0.533 | 0.566 | 1.000 | 0.000 | 0.361 |
| SS RZF 135 with min-max | 0.445 | 0.103 | 0.523 | 0.566 | 1.000 | 0.000 | 0.361 |
| SS RZF 135 with | 0.483 | 0.042 | 0.541 | 0.566 | 1.000 | 0.000 | 0.361 |
| SS RZF 135 with Z-score | 0.486 | 0.050 | 0.628 | 0.605 | 0.651 | 0.546 | 0.598 |
| SS RFE 5 | 0.519 | 0.054 | 0.533 | 0.566 | 1.000 | 0.000 | 0.361 |
| SS RFE 5 with min-max | 0.469 | 0.032 | 0.484 | 0.566 | 1.000 | 0.000 | 0.361 |
| SS RFE 5 with | 0.476 | 0.049 | 0.634 | 0.540 | 0.767 | 0.242 | 0.484 |
| SS RFE 5 with Z-score | 0.500 | 0.043 | 0.550 | 0.566 | 0.919 | 0.106 | 0.440 |
Performance of RVM classifiers with noun phrase complexity features (eliminated 59 zero importance features).
| RVM | Training AUC mean | Training SD | Testing AUC | Testing accuracy | Testing sensitivity | Testing specificity | Testing macro-F1 |
|---|---|---|---|---|---|---|---|
| NPC full 132 | 0.589 | 0.049 | 0.608 | 0.579 | 0.581 | 0.576 | 0.576 |
| NPC full 132 with min-max | 0.553 | 0.062 | 0.602 | 0.566 | 0.570 | 0.561 | 0.563 |
| NPC full 132 with | 0.615 | 0.049 | 0.603 | 0.566 | 0.581 | 0.546 | 0.562 |
| NPC full 132 with | 0.555 | 0.038 | 0.598 | 0.559 | 0.581 | 0.530 | 0.555 |
| NPC RZF 73 | 0.612 | 0.045 | 0.614 | 0.533 | 0.512 | 0.561 | 0.532 |
| NPC RZF 73 with min-max | 0.603 | 0.091 | 0.614 | 0.553 | 0.593 | 0.500 | 0.546 |
| NPC RZF 73 with | 0.595 | 0.017 | 0.611 | 0.566 | 0.570 | 0.561 | 0.563 |
| NPC RZF 73 with | 0.597 | 0.026 | 0.574 | 0.540 | 0.547 | 0.530 | 0.537 |
| NPC RFE 3 | 0.634 | 0.028 | 0.616 | 0.559 | 0.512 | 0.621 | 0.559 |
| NPC RFE 3 with min-max | 0.643 | 0.033 | 0.621 | 0.586 | 0.512 | 0.682 | 0.586 |
| NPC RFE 3 with | 0.660 | 0.039 | 0.611 | 0.605 | 0.547 | 0.682 | 0.605 |
| NPC RFE 3 with | 0.636 | 0.029 | 0.618 | 0.586 | 0.547 | 0.636 | 0.585 |
Performance of RVM classifiers with clause complexity features (eliminated 6 zero importance features).
| RVM | Training AUC mean | Training SD | Testing AUC | Testing accuracy | Testing sensitivity | Testing specificity | Testing macro-F1 |
|---|---|---|---|---|---|---|---|
| CC full 32 | 0.594 | 0.060 | 0.547 | 0.540 | 0.558 | 0.515 | 0.536 |
| CC full 32 with min-max | 0.563 | 0.059 | 0.548 | 0.540 | 0.581 | 0.485 | 0.533 |
| CC full 32 with | 0.605 | 0.047 | 0.543 | 0.526 | 0.547 | 0.500 | 0.522 |
| CC full 32 with | 0.580 | 0.047 | 0.532 | 0.546 | 0.558 | 0.530 | 0.543 |
| CC RZF 26 | 0.604 | 0.069 | 0.552 | 0.559 | 0.581 | 0.530 | 0.555 |
| CC RZF 26 with min-max | 0.566 | 0.067 | 0.577 | 0.540 | 0.570 | 0.500 | 0.534 |
| CC RZF 26 with | 0.602 | 0.051 | 0.544 | 0.526 | 0.547 | 0.500 | 0.522 |
| CC RZF 26 with | 0.569 | 0.047 | 0.570 | 0.605 | 0.651 | 0.546 | 0.598 |
| CC RFE 12 | 0.623 | 0.048 | 0.559 | 0.513 | 0.512 | 0.515 | 0.511 |
| CC RFE 12 with min-max | 0.625 | 0.047 | 0.585 | 0.572 | 0.581 | 0.561 | 0.569 |
| CC RFE 12 with L2 | 0.624 | 0.041 | 0.560 | 0.540 | 0.523 | 0.561 | 0.538 |
| CC RFE 12 with | 0.590 | 0.022 | 0.597 | 0.586 | 0.605 | 0.561 | 0.582 |
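The row labels in these tables refer to three normalisation schemes: min-max, L2, and Z-score. The following is a minimal sketch of what each one computes, written in plain Python for illustration. It assumes min-max and Z-score are applied per feature (column) and L2 per sample (row), and that the Z-score uses the population standard deviation. These are common library defaults, but the source does not state which conventions were used.

```python
import math


def min_max(column):
    """Rescale one feature column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def z_score(column):
    """Centre one feature column on 0 and scale it to unit (population) SD."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / sd if sd else 0.0 for x in column]


def l2_normalise(row):
    """Scale one sample (row) so that its Euclidean norm equals 1."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm if norm else 0.0 for x in row]


# Toy values only, chosen to mimic features on very different scales.
frequency_feature = [600000.0, 650000.0, 700000.0]
ratio_feature = [0.2, 0.3, 0.4]

print(min_max(frequency_feature))  # large-scale feature mapped onto [0, 1]
print(z_score(ratio_feature))      # small-scale feature centred on 0
print(l2_normalise([3.0, 4.0]))    # sample rescaled to unit length
```

Under the per-sample assumption, L2 scaling does not equalise the scales of different features. When features with very large raw magnitudes are mixed with small ratios, only the per-feature schemes put them on a comparable footing.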
Performance of RVM classifiers with all (SCA + SS + NPC + CC) features (eliminated 125 zero importance features).
| RVM | Training AUC mean | Training SD | Testing AUC | Testing accuracy | Testing sensitivity | Testing specificity | Testing macro-F1 |
|---|---|---|---|---|---|---|---|
| ALL full 368 | 0.514 | 0.050 | 0.529 | 0.566 | 1.000 | 0.000 | 0.361 |
| ALL full 368 with min-max | 0.543 | 0.065 | 0.704 | 0.665 | 0.721 | 0.591 | 0.657 |
| ALL full 368 with | 0.455 | 0.034 | 0.481 | 0.566 | 1.000 | 0.000 | 0.361 |
| ALL full 368 with | 0.554 | 0.080 | 0.723 | 0.684 | 0.779 | 0.561 | 0.671 |
| ALL RZF 243 | 0.491 | 0.073 | 0.519 | 0.566 | 1.000 | 0.000 | 0.361 |
| ALL RZF 243 with min-max | 0.607 | 0.041 | 0.675 | 0.638 | 0.674 | 0.591 | 0.632 |
| ALL RZF 243 with | 0.508 | 0.029 | 0.504 | 0.566 | 1.000 | 0.000 | 0.361 |
| ALL RZF 243 with | 0.597 | 0.097 | 0.592 | 0.540 | 0.547 | 0.530 | 0.537 |
| ALL RFE 38 | 0.488 | 0.052 | 0.474 | 0.566 | 1.000 | 0.000 | 0.361 |
| ALL RFE 38 with min-max | 0.698 | 0.010 | 0.710 | 0.658 | 0.686 | 0.621 | 0.653 |
| ALL RFE 38 with | 0.517 | 0.084 | 0.516 | 0.566 | 1.000 | 0.000 | 0.361 |
| ALL RFE 38 with | 0.671 | 0.042 | 0.655 | 0.691 | 0.802 | 0.546 | 0.676 |
Performance of RVM classifiers with paired feature sets.
| Feature set | RVM | Training AUC mean | Training SD | Testing AUC | Testing accuracy | Testing sensitivity | Testing specificity | Testing macro-F1 |
|---|---|---|---|---|---|---|---|---|
| F1 | SCA11 + NPC3 | 0.597 | 0.045 | 0.631 | 0.579 | 0.628 | 0.515 | 0.572 |
| F2 | SCA11 + NPC3 with min-max | 0.627 | 0.02 | 0.657 | 0.618 | 0.64 | 0.591 | 0.614 |
| F3 | SCA11 + NPC3 with | 0.624 | 0.025 | 0.624 | 0.572 | 0.616 | 0.515 | 0.566 |
| F4 | SCA11 + NPC3 with | 0.625 | 0.034 | 0.64 | 0.599 | 0.616 | 0.576 | 0.595 |
| F5 | SCA11 + CC12 | 0.449 | 0.049 | 0.568 | 0.526 | 0.279 | 0.848 | 0.504 |
| F6 | SCA11 + CC12 with min-max | 0.563 | 0.057 | 0.665 | 0.638 | 0.616 | 0.667 | 0.637 |
| F7 | SCA11 + CC12 with | 0.521 | 0.116 | 0.631 | 0.625 | 0.686 | 0.545 | 0.616 |
| F8 | SCA11 + CC12 with Z-score | 0.604 | 0.062 | 0.631 | 0.612 | 0.616 | 0.606 | 0.609 |
| F9 | SCA11 + SS5 | 0.523 | 0.045 | 0.536 | 0.566 | 1 | 0 | 0.361 |
| F10 | SCA11 + SS5 with min-max | 0.544 | 0.094 | 0.631 | 0.586 | 0.605 | 0.561 | 0.581 |
| F11 | SCA11 + SS5 with | 0.5 | 0.037 | 0.503 | 0.566 | 1 | 0 | 0.361 |
| F12 | SCA11 + SS5 with | 0.523 | 0.097 | 0.608 | 0.572 | 0.57 | 0.576 | 0.57 |
| F13 | NPC3 + CC12 | 0.662 | 0.04 | 0.628 | 0.566 | 0.535 | 0.606 | 0.565 |
| F14 | NPC3 + CC12 with min-max | 0.658 | 0.029 | 0.635 | 0.671 | 0.674 | 0.667 | 0.668 |
| F15 | NPC3 + CC12 with | 0.676 | 0.034 | 0.647 | 0.605 | 0.581 | 0.636 | 0.604 |
| F16 | NPC3 + CC12 with | 0.637 | 0.036 | 0.621 | 0.645 | 0.651 | 0.636 | 0.642 |
| F17 | NPC3 + SS5 | 0.524 | 0.049 | 0.536 | 0.566 | 1 | 0 | 0.361 |
| F18 | NPC3 + SS5 with min-max | 0.665 | 0.039 | 0.687 | 0.678 | 0.628 | 0.742 | 0.677 |
| F19 | NPC3 + SS5 with | 0.489 | 0.058 | 0.572 | 0.566 | 1 | 0 | 0.361 |
| F20 | NPC3 + SS5 with | 0.637 | 0.031 | 0.645 | 0.625 | 0.605 | 0.652 | 0.624 |
| F21 | CC12 + SS5 | 0.523 | 0.045 | 0.536 | 0.566 | 1 | 0 | 0.361 |
| F22 | CC12 + SS5 with min-max | 0.594 | 0.028 | 0.654 | 0.625 | 0.628 | 0.621 | 0.622 |
| F23 | CC12 + SS5 with | 0.478 | 0.051 | 0.572 | 0.566 | 1 | 0 | 0.361 |
| F24 | CC12 + SS5 with | 0.561 | 0.028 | 0.594 | 0.579 | 0.616 | 0.53 | 0.573 |
Performance of RVM classifiers with multiple feature sets.
| No. | RVM | Training AUC mean | Training SD | Testing AUC | Testing accuracy | Testing sensitivity | Testing specificity | Testing macro-F1 |
|---|---|---|---|---|---|---|---|---|
| F25 | SCA11 + NPC3 + CC12 | 0.602 | 0.065 | 0.634 | 0.572 | 0.616 | 0.515 | 0.566 |
| F26 | SCA11 + NPC3 + CC12 with min-max | 0.653 | 0.030 | 0.651 | 0.638 | 0.663 | 0.606 | 0.634 |
| F27 | SCA11 + NPC3 + CC12 with | 0.614 | 0.032 | 0.609 | 0.546 | 0.605 | 0.470 | 0.537 |
| F28 | SCA11 + NPC3 + CC12 with | 0.646 | 0.015 | 0.660 | 0.625 | 0.686 | 0.545 | 0.616 |
| F29 | SCA11 + NPC3 + SS5 | 0.523 | 0.045 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361 |
| F30 | SCA11 + NPC3 + SS5 with min-max | 0.662 | 0.034 | 0.664 | 0.625 | 0.616 | 0.636 | 0.623 |
| F31 | SCA11 + NPC3 + SS5 with | 0.509 | 0.041 | 0.504 | 0.566 | 1.000 | 0.000 | 0.361 |
| F32 | SCA11 + NPC3 + SS5 with | 0.634 | 0.037 | 0.674 | 0.625 | 0.628 | 0.621 | 0.622 |
| F33 | SCA11 + CC12 + SS5 | 0.523 | 0.046 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361 |
| F34 | SCA11 + CC12 + SS5 with min-max | 0.551 | 0.050 | 0.572 | 0.599 | 0.616 | 0.576 | 0.595 |
| F35 | SCA11 + CC12 + SS5 with | 0.487 | 0.027 | 0.491 | 0.566 | 1.000 | 0.000 | 0.361 |
| F36 | SCA11 + CC12 + SS5 with | 0.587 | 0.059 | 0.659 | 0.651 | 0.663 | 0.636 | 0.648 |
| F37 | NPC3 + CC12 + SS5 | 0.523 | 0.046 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361 |
| F38 | NPC3 + CC12 + SS5 with min-max | 0.685 | 0.038 | 0.709 | 0.704 | 0.721 | 0.682 | 0.700 |
| F39 | NPC3 + CC12 + SS5 with | 0.477 | 0.049 | 0.572 | 0.566 | 1.000 | 0.000 | 0.361 |
| F40 | NPC3 + CC12 + SS5 with Z-score | 0.643 | 0.025 | 0.682 | 0.671 | 0.709 | 0.621 | 0.665 |
| F41 | SCA11 + NPC3 + CC12 + SS5 | 0.523 | 0.046 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361 |
| F42 | SCA11 + NPC3 + CC12 + SS5 with min-max | 0.672 | 0.039 | 0.740 | 0.724 | 0.756 | 0.682 | 0.719 |
| F43 | SCA11 + NPC3 + CC12 + SS5 with | 0.470 | 0.046 | 0.473 | 0.566 | 1.000 | 0.000 | 0.361 |
| F44 | SCA11 + NPC3 + CC12 + SS5 with Z-score | 0.651 | 0.054 | 0.725 | 0.711 | 0.756 | 0.652 | 0.704 |
| F45 | SCA10 + NPC3 + CC12 + SS4 | 0.525 | 0.043 | 0.530 | 0.566 | 1.000 | 0.000 | 0.361 |
| F46 | SCA10 + NPC3 + CC12 + SS4 with min-max | | | | | | | |
| F47 | SCA10 + NPC3 + CC12 + SS4 with | 0.517 | 0.035 | 0.510 | 0.566 | 1.000 | 0.000 | 0.361 |
| F48 | SCA10 + NPC3 + CC12 + SS4 with | 0.665 | 0.034 | 0.727 | 0.717 | 0.733 | 0.697 | 0.714 |
Features included in the best-performing F46 model.
| Feature set | Name | Description |
|---|---|---|
| Syntactic complexity analyzer (SCA10) | MLC | Mean length of clause |
| | C_S | Clauses per sentence |
| | VP_T | Verb phrases per T-unit |
| | C_T | Clauses per T-unit |
| | T_S | T-units per sentence |
| | CT_T | Complex T-unit ratio |
| | CP_T | Coordinate phrases per T-unit |
| | CP_C | Coordinate phrases per clause |
| | CN_T | Complex nominals per T-unit |
| | CN_C | Complex nominals per clause |
| Syntactic sophistication (SS4) | all_av_construction_freq_stdev | Average construction frequency-all (standard deviation) |
| | all_av_lemma_freq_stdev | Average lemma frequency-all (standard deviation) |
| | acad_av_approx_collexeme_stdev | Average approximate collostructional strength-academic (standard deviation) |
| | all_av_lemma_freq_type | Average lemma frequency (types only)-all |
| Noun phrase complexity (NPC3) | dobj_stdev | Dependents per direct object (standard deviation) |
| | advmod_pobj_deps_NN_struct | Adverbial modifiers per object of the preposition (no pronouns) |
| | nsubj_NN_stdev | Dependents per nominal subject (no pronouns, standard deviation) |
| Clause complexity (CC12) | aux_per_cl | Auxiliary verbs per clause |
| | ccomp_per_cl | Clausal complements per clause |
| | nsubjpass_per_cl | Passive nominal subjects per clause |
| | prep_per_cl | Prepositions per clause |
| | nsubj_per_cl | Nominal subjects per clause |
| | mark_per_cl | Subordinating conjunctions per clause |
| | ncomp_per_cl | Nominal complements per clause |
| | cl_av_deps | Dependents per clause |
| | cc_per_cl | Clausal coordinating conjunctions per clause |
| | csubj_per_cl | Clausal subjects per clause |
| | dep_per_cl | Undefined dependents per clause |
| | prepc_per_cl | Prepositional complements per clause |
Figure 2. AUCs of RVMs on testing data using different feature sets.
Paired-sample t-test of the difference in sensitivity between the best-performing model and other models.
| No. | Pairs of RVMs | Mean difference | SD | 95% CI of difference, lower | 95% CI of difference, upper | P value | Rank | (i/m) Q | Sig. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | F46 versus ALL 38 with min-max | 0.0930 | 0.0104 | 0.0726 | 0.1134 | 0.0041 | 1 | 0.0083 | |
| 2 | F46 versus F6 | 0.1628 | 0.0151 | 0.1332 | 0.1924 | 0.0029 | 2 | 0.0167 | |
| 3 | F46 versus F18 | 0.1512 | 0.0145 | 0.1228 | 0.1795 | 0.0030 | 3 | 0.0250 | |
| 4 | F46 versus F14 | 0.1047 | 0.0114 | 0.0824 | 0.1269 | 0.0039 | 4 | 0.0333 | |
| 5 | F46 versus F38 | 0.0581 | 0.0071 | 0.0442 | 0.0721 | 0.0050 | 5 | 0.0417 | |
| 6 | F46 versus F42 | 0.0233 | 0.0031 | 0.0172 | 0.0294 | 0.0059 | 6 | 0.0500 | |
Paired-sample t-test of the difference in specificity between the best-performing model and other models.
| No. | Pairs of RVMs | Mean difference | SD | 95% CI of difference, lower | 95% CI of difference, upper | P value | Rank | (i/m) Q | Sig. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | F46 versus ALL 38 with min-max | 0.1212 | 0.0115 | 0.0986 | 0.1438 | 0.003 | 1 | 0.0083 | |
| 2 | F46 versus F6 | 0.0758 | 0.0082 | 0.0596 | 0.0919 | 0.0039 | 2 | 0.0167 | |
| 3 | F46 versus F14 | 0.0758 | 0.0082 | 0.0596 | 0.0919 | 0.0039 | 3 | 0.0250 | |
| 4 | F46 versus F38 | 0.0606 | 0.0069 | 0.0471 | 0.0741 | 0.0043 | 4 | 0.0333 | |
| 5 | F46 versus F42 | 0.0606 | 0.0069 | 0.0471 | 0.0741 | 0.0043 | 5 | 0.0417 | |
| 6 | F46 versus F18 | 0 | 0 | 0 | 0 | 1 | 6 | 0.0500 | |
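The "(i/m) Q" column in the two t-test tables matches the critical values of the Benjamini-Hochberg procedure, where i is the rank of a comparison, m the number of comparisons, and Q the false discovery rate. The sketch below reproduces those critical values and applies the standard step-up rule. It assumes m = 6 and Q = 0.05, both inferred from the table values and not stated in the source, and it reads the column immediately before "Rank" as the p-value. It is an illustration of the general procedure, not the authors' analysis script.

```python
def bh_critical_values(m, q):
    """Benjamini-Hochberg critical value (i / m) * Q for each rank i = 1..m."""
    return [i / m * q for i in range(1, m + 1)]


def bh_rejections(p_values, q=0.05):
    """Return one boolean per input p-value: True if rejected under BH at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])  # ascending p
    # Step-up rule: find the largest rank k whose p-value is at or below
    # its critical value, then reject every hypothesis ranked 1..k.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# m = 6 and Q = 0.05 are inferred from the tables, not stated in the source.
print([round(c, 4) for c in bh_critical_values(6, 0.05)])

# Values copied from the specificity table (the column before "Rank").
specificity_p = [0.003, 0.0039, 0.0039, 0.0043, 0.0043, 1.0]
print(bh_rejections(specificity_p))
```

The standard procedure ranks comparisons by ascending p-value before comparing each one with its critical value, which is why the function sorts internally rather than relying on the row order of the table.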
Figure 3. Forest plot of logistic regression. Significance thresholds shown in the figure: P < 0.001, P < 0.01, P < 0.05. Multiple logistic regression is used to predict medication nonadherence.
Mann–Whitney U test.
| Features (29 in total) | Name | Nonadherence mean (std.) | Adherence mean (std.) | P value |
|---|---|---|---|---|
| Syntactic complexity analyzer (SCA10) | MLC | 9.369 (684) | 8.630 (2.530) | 0.003 |
| | C_S | 2.260 (1.223) | 2.368 (1.460) | 0.5 |
| | VP_T | 2.645 (1.362) | 2.528 (1.389) | 0.378 |
| | C_T | 1.900 (0.943) | 1.950 (1.009) | 0.599 |
| | T_S | 1.227 (0.494) | 1.209 (0.419) | 0.818 |
| | CT_T | 0.426 (0.276) | 0.471 (0.279) | 0.046 |
| | CP_T | 0.334 (0.331) | 0.346 (0.378) | 0.908 |
| | CP_C | 0.202 (0.305) | 0.186 (0.184) | 0.631 |
| | CN_T | 1.585 (1.229) | 1.461 (0.979) | 0.463 |
| | CN_C | 0.825 (0.433) | 0.752 (0.381) | 0.125 |
| Syntactic sophistication (SS4) | all_av_construction_freq_stdev | 624630.667 (259280.32) | 628026.977 (280808.55) | 0.865 |
| | all_av_lemma_freq_stdev | 2237002.019 (759008.785) | 2200521.353 (840754.36) | 0.773 |
| | acad_av_approx_collexeme_stdev | 30199.338 (69193.092) | 16682.656 (42391.101) | 0.027 |
| | all_av_lemma_freq_type | 1469180.468 (793748.056) | 1561373.037 (945196.99) | 0.417 |
| Noun phrase complexity (NPC3) | dobj_stdev | 0.888 (0.432) | 0.722 (0.476) | 0 |
| | advmod_pobj_deps_NN_struct | 0.030 (0.054) | 0.045 (0.086) | 0.291 |
| | nsubj_NN_stdev | 0.510 (0.498) | 0.584 (0.532) | 0.096 |
| Clause complexity (CC12) | aux_per_cl | 0.295 (0.154) | 0.271 (0.157) | 0.155 |
| | ccomp_per_cl | 0.124 (0.103) | 0.145 (0.128) | 0.121 |
| | nsubjpass_per_cl | 0.025 (0.045) | 0.043 (0.073) | 0.025 |
| | prep_per_cl | 0.288 (0.159) | 0.325 (0.233) | 0.288 |
| | nsubj_per_cl | 0.679 (0.167) | 0.711 (0.181) | 0.031 |
| | mark_per_cl | 0.098 (0.084) | 0.108 (0.109) | 0.55 |
| | ncomp_per_cl | 0.054 (0.085) | 0.054 (0.091) | 0.407 |
| | cl_av_deps | 2.724 (0.399) | 2.732 (0.449) | 0.492 |
| | cc_per_cl | 0.012 (0.035) | 0.009 (0.028) | 0.128 |
| | csubj_per_cl | 0.007 (0.028) | 0.011 (0.036) | 0.121 |
| | dep_per_cl | 0.100 (0.099) | 0.109 (0.125) | 0.968 |
| | prepc_per_cl | 0.023 (0.048) | 0.023 (0.048) | 0.643 |
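The last column of the table above reports, for each feature, a Mann-Whitney U comparison between the nonadherence and adherence groups. As a rough illustration of how such a value can be obtained, the sketch below computes the U statistic with average ranks for ties and a two-sided p-value from the normal approximation with tie correction. The source does not say which implementation or which variant (exact or approximate, with or without continuity correction) was used, so this is an assumption, and the input values are toy data, not the study's.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U for sample_a, approximate two-sided p-value). Tied values get
    average ranks and the variance includes the usual tie correction; no
    continuity correction is applied. Needs at least two observations overall.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    pooled = sorted(list(sample_a) + list(sample_b))

    rank_of = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # 0-based positions i..j-1 share the average of ranks i+1..j.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_a = sum(rank_of[value] for value in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_a, p_value


# Toy data only (hypothetical feature values for two small groups).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p = mann_whitney_u(group_a, group_b)
print(u_stat, round(p, 4))
```

A rank-based test of this kind is a natural choice for features such as the frequency measures in the table, whose standard deviations are large relative to their means and so are unlikely to be normally distributed.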
Figure 4. Interpreting the diagnostic utility of the best-performing classifier using Bayes' nomogram.
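A Bayes' nomogram is a graphical form of Bayes' rule in odds form: the pre-test odds are multiplied by a likelihood ratio to give the post-test odds. The sketch below shows the arithmetic behind such a chart. For illustration it uses the sensitivity and specificity reported for model F42 (0.756 and 0.682), and the pre-test probability of 0.5 is a purely hypothetical value chosen for the example, not a prevalence taken from the source.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a binary classifier."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Sensitivity and specificity from the F42 row; the prior is hypothetical.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.756, specificity=0.682)
prior = 0.5

print("LR+ =", round(lr_pos, 2), "LR- =", round(lr_neg, 2))
print("after a positive prediction:", round(post_test_probability(prior, lr_pos), 3))
print("after a negative prediction:", round(post_test_probability(prior, lr_neg), 3))
```

The further LR+ lies above 1 and LR- below 1, the more a prediction shifts the probability of nonadherence away from the pre-test value, which is the quantity a nomogram lets a reader look up without doing the calculation.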