Van-Hoang Nguyen1, Kazunari Sugiyama2, Min-Yen Kan2, Kishaloy Halder2.
Abstract
BACKGROUND: Health 2.0 allows patients and caregivers to conveniently seek medical information and advice via e-portals and online discussion forums, especially regarding potential drug side effects. Although online health communities are helpful platforms for obtaining non-professional opinions, the information they convey can be unreliable in quality and insufficient in quantity. Existing methods for extracting user-reported adverse drug reactions (ADRs) from online health forums are insufficiently accurate, because they disregard user credibility and drug experience, and expensive, because they rely on supervised ground-truth annotation of individual statements. We propose a NEural ArchiTecture for Drug side effect prediction (NEAT), which is optimized for drug side effect discovery over a complete discussion while attending to user credibility and experience, thus addressing these shortcomings. We train our neural model in a self-supervised fashion using ground-truth drug side effects from mayoclinic.org. NEAT learns to assign each user a score that describes their credibility and to highlight the critical textual segments of their posts.
Keywords: Credibility analysis; Deep learning; Drug side effect discovery; Natural language processing; Online health communities
Year: 2020 PMID: 32641159 PMCID: PMC7341623 DOI: 10.1186/s13326-020-00221-1
Source DB: PubMed Journal: J Biomed Semantics
Side effects of anti-depressants
| Drugs | Side effects |
|---|---|
| Lexapro | chills, constipation, cough, decreased appetite, |
| Xanax | abdominal or stomach pain, muscle weakness, |
| Zoloft |
The Drugs and Side effects columns respectively list the anti-depressants and their side effects extracted from a drug–side effect database. Side effects in common among those listed are bold
A sample discussion thread from an online health community
| User IDs | Posts | Mentioned drugs | Aggregated side effects |
|---|---|---|---|
| 3690 | While my experience of 10 years is with Paxil, I expect that Zoloft will be the same. You should definitely feel better within 2 weeks. One way I found to make it easier to sleep was to get lots of exercize. Walk or run or whatever to burn off that anxiety. | Zoloft, Paxil | changed behavior, decreased sexual desire, diarrhea, dry mouth, heart-burn, sleepiness or unusual drowsiness,... |
| 26521 | I’ve heard of people going “cold turkey” and having withdrawal at 6 months! Please, get in contact with a doctor ASAP! “common symptoms include dizziness, electric shock-like sensations, sweating, nausea, insomnia, tremor, confusion, nightmares and vertigo” |
The User IDs and Posts columns respectively list the IDs of users involved in the discussions and their messages. The Mentioned drugs and Aggregated side effects columns respectively list the explicitly discussed drugs and their combined side effects
Fig. 1 The neural architecture of our proposed NEAT. The w and v boxes denote the Credibility Weight (CW) and User Expertise (UE) components, respectively. The yellow boxes denote the Cluster Attention (CA) component, and the blue boxes denote neural text encoders with attention. Words highlighted in red denote the text segments attended to by the encoder. The × and σ symbols denote multiplication and the sigmoid function, respectively, and a further symbol denotes summation
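The multiplication, summation, and sigmoid steps named in the Fig. 1 caption can be illustrated with a minimal sketch. It assumes that each post has already been encoded as a fixed-length vector, that each author's credibility is a single non-negative scalar, and that a linear layer (the hypothetical `output_weights` and `bias`) maps the aggregated thread vector to one logit per side effect. These names and the exact wiring are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_side_effects(post_vectors, credibility, output_weights, bias):
    """Score every side effect for one discussion thread.

    post_vectors   -- one encoding per post (lists of equal length); assumed given
    credibility    -- one scalar weight per post author (hypothetical input)
    output_weights -- one weight row per side effect (hypothetical linear layer)
    bias           -- one bias term per side effect
    """
    dim = len(post_vectors[0])
    # Multiplication and summation: credibility-weighted sum of post encodings.
    thread = [0.0] * dim
    for weight, vector in zip(credibility, post_vectors):
        for j in range(dim):
            thread[j] += weight * vector[j]
    # Sigmoid: an independent probability for each side effect (multi-label output).
    return [
        sigmoid(sum(w * t for w, t in zip(row, thread)) + b)
        for row, b in zip(output_weights, bias)
    ]
```

Under these assumptions, a post whose author has credibility 0 contributes nothing to the thread vector, which is the intuition behind weighting posts by user credibility.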
Fig. 2 Principal component analysis on user experience vectors. The horizontal axis denotes the number of principal components retained for PCA, and the vertical axis denotes the percentage of variance they explain. The percentage of variance explained does not increase significantly beyond 100 principal components
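The curve described in Fig. 2 is a cumulative explained-variance plot. As a minimal sketch, assuming the eigenvalues of the covariance matrix of the user experience vectors are already available (the PCA itself is not shown, and the function names are illustrative), the percentages can be computed as follows:

```python
def cumulative_explained_variance(eigenvalues):
    """Return the cumulative percentage of variance explained by the top-k components."""
    ordered = sorted(eigenvalues, reverse=True)  # largest component first
    total = sum(ordered)
    running = 0.0
    percentages = []
    for value in ordered:
        running += value
        percentages.append(100.0 * running / total)
    return percentages


def components_needed(eigenvalues, threshold_pct):
    """Smallest number of components whose cumulative share reaches threshold_pct."""
    for k, pct in enumerate(cumulative_explained_variance(eigenvalues), start=1):
        if pct >= threshold_pct:
            return k
    return len(eigenvalues)
```

Choosing the number of components then amounts to finding where this curve flattens, which the caption places at about 100 components.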
Fig. 3 Silhouette scores for user clustering. The horizontal axis denotes the number of clusters chosen for K-means clustering, and the vertical axis denotes the corresponding silhouette score. The silhouette scores drop sharply after 7 clusters
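For readers unfamiliar with the metric in Fig. 3, the following sketch computes the mean silhouette score of a given clustering in plain Python. It assumes Euclidean distance and takes cluster labels as input. The K-means step that would produce those labels is not shown, and the function names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (higher means better-separated clusters)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Evaluating this score for each candidate number of clusters and keeping the largest value before the sharp drop is one common way to arrive at a choice such as 7.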
Most commonly experienced side effects for each user cluster c_i (i = 1 to 7)
| Cluster | Most common experienced side effects |
|---|---|
| | vision blurred, yellow skin, vision double, yellow eye, nose stuffy |
| | headache, itch, stomach pain, weak, nausea |
| | itch, irritate, headache, pain abdominal, stomach cramp |
| | bad taste, nausea, tiredness, irritate, mouth ulcer |
| | skin red, itch, rash skin, skin peeling, burning skin |
| | sneezing, nose runny, nose stuffy, decrease sexual desire, pain breast |
| | nausea, stomach pain, vomit, diarrhea, pain abdominal |
The left column lists the names of the 7 clusters, and the right column lists the side effects most commonly experienced by users in each cluster
Fig. 4 LSTM-based encoder with cluster attention. The × and + cells denote the attention-weighted summation described in Eq. (2). The C cell denotes the concatenation of the forward and backward hidden states
Fig. 5 CNN-based encoder with cluster attention. The × and + cells denote the attention-weighted summation described in Eq. (2). The C cell denotes the concatenation of the final hidden states of K convolution blocks
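Eq. (2) is not reproduced on this page, so the following is only a generic sketch of attention-weighted summation, the operation that the × and + cells in Figs. 4 and 5 stand for. It assumes dot-product scoring against a single query vector (for instance, a vector associated with the user's cluster) followed by a softmax. The scoring function and all names are assumptions and may differ from the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def attend(hidden_states, query):
    """Return (attention weights, context vector) for one sequence.

    hidden_states -- one vector per token or segment (encoder outputs, assumed given)
    query         -- vector used to score each hidden state (hypothetical cluster vector)
    """
    # Score each hidden state by its dot product with the query (assumed scoring rule).
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Multiply each state by its weight, then sum the results element-wise.
    context = [
        sum(w * state[j] for w, state in zip(weights, hidden_states))
        for j in range(dim)
    ]
    return weights, context
```

The returned weights are also what makes the encoder inspectable: the segments with the largest weights are the ones the model attends to.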
Some statistics on our dataset
| Statistic | Value |
|---|---|
| # Users | 14,966 |
| # Threads | 78,213 |
| Avg. words per post | 67.45 |
| Avg. posts per thread | 3.97 |
| Avg. participated threads per user | 54.7 |
| # Side effects (SE) | 315 |
| Avg. SEs per thread | 74.25 |
| # Drugs | 1,869 |
| Avg. experienced side effects per user | 128.12 |
The left column contains the statistics’ descriptions while the right column contains the statistics’ values
Performance of CNN-based models and LSTM-based models in Ablation Study
| Model | CW | UE | CA | Pre. | Rec. | F1 |
|---|---|---|---|---|---|---|
| LSTM-Vanilla | | | | 0.6173 | 0.407 | 0.4335 |
| LSTM-WPE | ✓ | | | | 0.4344 | 0.4503 |
| LSTM-WPEU | ✓ | ✓ | | 0.6064 | 0.5001 | 0.4896 |
| LSTM-NEAT | ✓ | ✓ | ✓ | 0.6197 | | |
| CNN-Vanilla | | | | 0.7214 | 0.5503 | 0.5637 |
| CNN-WPE | ✓ | | | | 0.5799 | 0.5804 |
| CNN-WPEU | ✓ | ✓ | | 0.6923 | 0.6350 | 0.5910 |
| CNN-NEAT | ✓ | ✓ | ✓ | 0.7066 | | |
In the Components column, CW, UE, CA denote Credibility Weights, User Expertise and Cluster Attention module components, respectively. In the Evaluation Metrics column, Pre., Rec. and F1 denote Precision, Recall, and F1 score
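As a reminder of how the three reported metrics relate, here is a minimal sketch that scores one predicted set of side effects against a gold set. How the scores are averaged over threads or side effects is not stated on this page, so this per-example, set-based version is an assumption, and the function name is illustrative.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for one example."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: two of three predictions are correct, two gold labels are missed.
    print(precision_recall_f1(
        {"nausea", "headache", "rash"},
        {"nausea", "headache", "dizziness", "insomnia"},
    ))
```

A model that predicts few but correct side effects scores high on precision and low on recall, which is the trade-off visible across the rows of the ablation tables.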
Performance for individual integration of UE and CA in Ablation Study
| Model | CW | UE | CA | Pre. | Rec. | F1 |
|---|---|---|---|---|---|---|
| LSTM-UE | | ✓ | | 0.6513 | 0.4204 | 0.4531 |
| LSTM-CA | | | ✓ | 0.6416 | 0.4293 | 0.4611 |
| CNN-UE | | ✓ | | 0.6738 | 0.6185 | 0.5743 |
| CNN-CA | | | ✓ | 0.7441 | 0.5616 | 0.5883 |
In the Components column, CW, UE, CA denote Credibility Weights, User Expertise and Cluster Attention module components, respectively. In the Evaluation Metrics column, Pre., Rec. and F1 denote Precision, Recall, and F1 score
Analysis of NEAT’s Credibility versus baselines in approximating credibility proxy
| Method | Thread nDCG@2 | Thread Spearman | Forum Spearman |
|---|---|---|---|
| Random | 0.7968 | -0.0271 | 0.0 |
| Post frequency | 0.8812 | 0.4223 | 0.1924 |
| Question frequency | 0.8341 | 0.1773 | 0.0279 |
| NEAT’s Credibility |
The Thread nDCG@2, Thread Spearman, and Forum Spearman columns respectively denote the values of Normalized Discounted Cumulative Gain at 2 at thread level, Spearman’s rank correlation coefficient at thread level and forum level of each method when using rankings by number of thanks as ground truths
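The two ranking metrics in this table can be sketched in plain Python. The sketch assumes a linear gain (the raw number of thanks) in the DCG formula and average ranks for ties in Spearman's coefficient. The page does not say which variants the authors used, so both choices, and the function names, are assumptions.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k items, using linear gain."""
    return sum(gain / math.log2(pos + 2) for pos, gain in enumerate(gains[:k]))


def ndcg_at_k(scores, gains, k):
    """nDCG@k of the ranking induced by scores, judged against gains (e.g. thanks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k([gains[i] for i in order], k) / ideal if ideal > 0 else 0.0


def average_ranks(values):
    """1-based ranks in ascending order, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant input; 0.0 is a convention
    return cov / math.sqrt(var_x * var_y)
```

In the setting of the table, `scores` would be a method's credibility estimates for the users in a thread (or across the forum) and `gains` the number of thanks each user received.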
Performance of NEAT versus baselines in Side Effect Discovery of Ibuprofen, Levothyroxine, and Metformin
| Methods | Ibuprofen Pre. | Ibuprofen Rec. | Ibuprofen F1 | Levothyroxine Pre. | Levothyroxine Rec. | Levothyroxine F1 | Metformin Pre. | Metformin Rec. | Metformin F1 |
|---|---|---|---|---|---|---|---|---|---|
| RF | 0.583 | 0.414 | 0.474 | 0.319 | 0.401 | 0.347 | 0.48 | 0.647 | 0.491 |
| uNEAT | 0.859 | 0.371 | 0.487 | 0.505 | 0.349 | 0.404 | 0.798 | 0.361 | 0.497 |
| NEAT | 0.845 | 0.427 | 0.549 | 0.385 | 0.814 | 0.365 | |||
In the Methods column, RF denotes the Random Forest baseline on bag-of-words features, and uNEAT denotes the user-permutation baseline derived from NEAT. Pre., Rec. and F1 denote Precision, Recall, and F1 score, respectively
Performance of NEAT versus baselines in Side Effect Discovery of Omeprazole and Alprazolam
| Methods | Omeprazole Pre. | Omeprazole Rec. | Omeprazole F1 | Alprazolam Pre. | Alprazolam Rec. | Alprazolam F1 |
|---|---|---|---|---|---|---|
| RF | 0.229 | 0.458 | 0.271 | 0.639 | 0.432 | 0.511 |
| uNEAT | 0.534 | 0.393 | 0.394 | 0.981 | 0.551 | 0.663 |
| NEAT | 0.522 | 0.421 | 0.977 | 0.596 | ||
In the Methods column, RF denotes the Random Forest baseline on bag-of-words features, and uNEAT denotes the user-permutation baseline derived from NEAT. Pre., Rec. and F1 denote Precision, Recall, and F1 score, respectively
Performance of CNN-NEAT’s Attention versus baselines in Side Effect Extraction in terms of Precision
| Methods | Ibuprofen | Levothyroxine | Metformin | Omeprazole | Alprazolam |
|---|---|---|---|---|---|
| UMLS Tagging | 0.6801 | 0.6145 | 0.8378 | 0.614 | |
| Neural Extractor | 0.6741 | 0.6259 | 0.8092 | 0.4665 | 0.6161 |
| CNN-NEAT’s Attention | 0.504 |
The Methods column lists the two baselines, UMLS Tagging and Neural Extractor, together with CNN-NEAT’s Attention, the attention extracted from our proposed NEAT. The remaining columns give the Precision of each method in extracting the side effects of the five evaluated drugs: Ibuprofen, Levothyroxine, Metformin, Omeprazole, and Alprazolam
A test example highlighting the extracted side effects obtained by CNN-NEAT’s Attention versus baselines
In the Post content column, the correct and incorrect side effects are highlighted in blue and red, respectively. The side effects extracted by UMLS Tagging, Neural Extractor and CNN-NEAT’s Attention are each followed by a distinct marker