Mourad Ellouze, Lamia Hadrich Belguith.
Abstract
Research in the medical field never stops evolving, which obliges doctors to stay up to date in order to manage every situation that may arise with their patients. However, the medical field is highly sensitive and demands great precision, which makes this difficult. Computer science is therefore increasingly called upon to address these issues. In this context, we propose an architecture that uses artificial intelligence (AI) and text mining techniques to: (i) identify individuals with personality disorder from their textual production on social networks by classifying their set of tweets into distinct classes representing, respectively, the presence, the category and the type of the disease and (ii) guarantee personalized monitoring by filtering inappropriate tweets according to the patient's circumstances. The first phase relies on a deep neural approach that uses: (i) CNN layers for feature extraction from the textual part, (ii) two LSTM layers to preserve long-term dependencies between lexical units, and (iii) an SVM classifier to detect the sick person using the dependency links found by the previous layer. The second phase applies a hybrid approach that combines linguistic and statistical techniques to filter inappropriate tweets according to the state of each patient. In our evaluation, we obtain an F-measure of 84% for the detection of personality disorder, 64% for the detection of the type of disease and 70% for the task of filtering inappropriate content. These results are encouraging and may motivate researchers to improve on them, given the interest and importance of this research axis.
Keywords: Deep learning; Natural language processing; Personality disorder; Semantic analysis; Social media; Text mining
Year: 2022 PMID: 35789887 PMCID: PMC9244050 DOI: 10.1007/s13278-022-00884-x
Source DB: PubMed Journal: Soc Netw Anal Min
Fig. 1 The proposed approach for the detection and monitoring of people having personality disorders on social networks
Fig. 2 Proposed deep CNN-LSTM model for detecting the existence, category and type of disease
Linguistic features extraction
| Type | Description |
|---|---|
| Numeric features | Count of each punctuation mark |
| Numeric features | Number of sentences |
| Numeric features | Number of words in a sentence |
| Numeric features | Number of named entities |
| Morphological features | Count of each POS |
| Morphological features | Tense of each sentence |
| Morphological features | Number of entities by gender (masculine/feminine) |
| Morphological features | Number of entities by form (singular/plural) |
| Semantic features | Sentiment analysis |
| Semantic features | Semantic relations |
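
As a rough illustration of the numeric features listed above, the following minimal sketch counts punctuation marks, sentences, and words per sentence using only the Python standard library. The function name `numeric_features`, the returned keys, and the naive sentence-splitting rule are assumptions made for this example; the source does not say how these features were computed, and named entities, POS tags, and semantic features would need a dedicated NLP toolkit.

```python
import re
import string
from collections import Counter


def numeric_features(text: str) -> dict:
    """Return simple numeric writing-style features for one text.

    Hypothetical helper: the feature names and the splitting heuristic
    are illustrative assumptions, not the authors' implementation.
    """
    # Count every ASCII punctuation character that occurs in the text.
    punctuation = Counter(ch for ch in text if ch in string.punctuation)
    # Naive sentence split: cut after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of word characters (Unicode-aware, so accents are kept).
    words_per_sentence = [len(re.findall(r"\w+", s)) for s in sentences]
    return {
        "punctuation": dict(punctuation),
        "n_sentences": len(sentences),
        "words_per_sentence": words_per_sentence,
    }


features = numeric_features("Sheer madness! An immeasurable hysteria.")
print(features)
```

Such per-text counts could then be aggregated over a user's tweet history before being passed to a classifier.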
The distribution of instances per class
| Classes | Number of instances (users) | Number of instances (tweets) |
|---|---|---|
| Person with PD | 884 users | 17680 tweets |
| Normal Person | 531 users | 10620 tweets |
| Person with Suspicious Category disease | 422 users | 8440 tweets |
| Person with Emotional Category disease | 580 users | 11600 tweets |
| Person with Anxious Category disease | 577 users | 11540 tweets |
| Person with Paranoid disease | 263 users | 5260 tweets |
| Person with Schizoid disease | 97 users | 1940 tweets |
| Person with Schizotypal disease | 174 users | 3480 tweets |
| Person with Antisocial disease | 154 users | 9000 tweets |
| Person with Borderline disease | 231 users | 4620 tweets |
| Person with Histrionic disease | 248 users | 4960 tweets |
| Person with Narcissistic disease | 83 users | 1660 tweets |
| Person with Avoiding disease | 79 users | 1580 tweets |
| Person with Dependent disease | 258 users | 5160 tweets |
| Person with Obsessive compulsive disease | 241 users | 4820 tweets |
Model parameter structure
| Layer type | Output shape | Param# |
|---|---|---|
| Input Layer | (700,1) | |
| conv1d (Conv1D) | (700, 320) | 1280 |
| Max_pooling1d | (233, 320) | 0 |
| Dropout (Dropout) | (233, 320) | 0 |
| conv1d_1 (Conv1D) | (233, 320) | 307520 |
| Max_pooling1d_1 | (77, 320) | 0 |
| Dropout_1 (Dropout) | (77, 320) | 0 |
| conv1d_2 (Conv1D) | (77, 320) | 307520 |
| Max_pooling1d_2 | (25, 320) | 0 |
| Dropout_2 (Dropout) | (25, 320) | 0 |
| Time_distributed | (1, 8000) | 0 |
| lstm (LSTM) | (250) | 8251000 |
| lstm_1 (LSTM) | (100) | 140400 |
| Dense (Dense) | (None, 80) | 8080 |
| Classification layer | 2 | 82 |
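
The parameter counts and output shapes in the table above can be cross-checked with the standard formulas for 1D convolution, LSTM, and dense layers. The sketch below is only a consistency check under stated assumptions: a kernel size of 3, a pooling size of 3, and length-preserving ("same") convolution padding are not given in the table and are inferred from the reported numbers; the helper names are hypothetical.

```python
def conv1d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    """Weights plus one bias per filter for a 1D convolution."""
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Fully connected layer: weight matrix plus bias vector."""
    return input_dim * units + units


def pooled_length(length: int, pool_size: int) -> int:
    """Output length of non-overlapping max pooling without padding."""
    return length // pool_size


KERNEL, POOL, FILTERS = 3, 3, 320  # assumed values, inferred from the table

# Sequence length after each of the three pooling stages: 700 -> 233 -> 77 -> 25.
lengths = [700]
for _ in range(3):
    lengths.append(pooled_length(lengths[-1], POOL))

flattened = lengths[-1] * FILTERS  # 25 * 320 features fed to the first LSTM

print(lengths, flattened)
print(conv1d_params(KERNEL, 1, FILTERS))        # first Conv1D layer
print(conv1d_params(KERNEL, FILTERS, FILTERS))  # second and third Conv1D layers
print(lstm_params(flattened, 250))              # first LSTM layer
print(lstm_params(250, 100))                    # second LSTM layer
print(dense_params(100, 80))                    # Dense layer
```

Under these assumptions the formulas reproduce the table's values for every layer up to and including the Dense layer. The count listed for the final classification layer is not reproduced by `dense_params(80, 2)`, so its exact configuration is left open here.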
Fig. 3 Architecture of the proposed model
Extract of results (translated to English) of disease detection
| An excerpt from a user’s history of tweets | Personality Disorder | Cluster A | Cluster B | Paranoid | Borderline |
|---|---|---|---|---|---|
| 1. Sheer madness! An immeasurable hysteria. | YES | YES | YES | YES | YES |
| 2. Professor Raoult confirms the end of the epidemic on Radio Classique and settles scores with the doom-forecasting sorcerers. | | | | | |
| 3. They are paid per ticket issued; a pity, the police forces are losing all credibility! | | | | | |
| 4. A second wave coupled with other new pandemics, so lockdown in perpetuity until populations no longer remember what the words freedom and protest mean ... | | | | | |
| 5. Ruth Elkrief was very vindictive toward Michel Onfray. These journalists who serve the government are unbearable. | | | | | |
| 6. We have many varieties of dog penis. | | | | | |
| 7. Warning: sensitive viewers, do not watch! Chinese restaurant, final part (2/2). Bonus question: what do we have in common with this culture? | | | | | |
| 8. Today, for the cops, it's a bit like the opening day of the sales: they have to fill their quota before midnight, so they are in intensive racketeering mode. They must be getting a bonus for being such “good servants of the state.” | | | | | |
Extract of results (translated to English) of filtering inappropriate content
| Tweet shared by a sick person | Tweet shared by a person among the list of following of the sick person | Filtering |
|---|---|---|
| #urgent \| 146 new #coronavirus cases and one new death linked to the disease have been recorded in #Haïti. | Already more than 520,000 coronavirus deaths worldwide. | YES |
| China, 3 July: study of 15,000 patients. “We found that the rate of symptomatic COVID-19 was higher. | NEW YORK NEW STUDY - Retrospective study of 6,493 outpatients and inpatients with COVID-19. | YES |
| Covid-19: the firefighters' shock report on the handling of the pandemic. | A Chinese study warns of the possibility of a new “#virus #pandémique” (pandemic virus) originating from #porcs (pigs). | YES |
Writing style analysis (general features) for the two corpora
| | Our Corpus | | Corpus Astuti ( | |
|---|---|---|---|---|
| Punctuation : semicolon | 4.17 | 0.38 | 3.91 | 0.27 |
| POS : conjunction, pronoun, determiner and preposition | 104.66 | 0.04 | 81.66 | 2.32e-05 |
| Punctuation : exclamation and question marks | 21.74 | 0.17 | 10.70 | 0.05 |
| POS : singular and named entity | 66.03 | 0.05 | 102.37 | 0.5e-09 |
| POS : adjective and adverb grammatical categories | 99.51 | 0.01 | 57.1 | 0.00045 |
| Sentiment analysis : negative feeling | 4.71 | 0.58 | 0.54 | 0.46 |
| Semantic relation : consequence and explanation | 10.02 | 0.05 | 11.07 | 0.006 |
| Semantic relation : linking and addition | 22.07 | 0.01 | 12.81 | 0.05 |
| Semantic relation : opposition | 4.86 | 0.08 | 0.07 | 0.78 |
Variation of F-measure according to the different sentence embedding and classifiers used for PD detection
| Classifier | BERT | LASER | SISTER | USE | Doc2Vect |
|---|---|---|---|---|---|
| BILSTM | 80 | 81 | 81 | 81 | |
| LSTM | 78 | 82 | 85 | 84 | |
| LSTM+SVM | 83 | 81 | 76 | 84 | |
| BILSTM+SVM | 84 | 81 | 84 |
Bold values represent the best result related to each classification case
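
For reference, the F-measure reported in these tables combines precision and recall into a single score. The sketch below assumes the balanced variant (F1, i.e. `beta = 1`), since this record does not state which variant was used; the function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true-positive, false-positive
    and false-negative counts (beta = 1 gives the balanced F1)."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts chosen only to illustrate the computation.
score = f_measure(tp=84, fp=16, fn=16)
print(f"F-measure: {100 * score:.0f}%")
```

Multiplying the returned value by 100 gives a percentage on the same scale as the table entries.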
Variation of F-measure according to the different sentence embedding and classifiers used for suspicious category (cluster A) of PD detection
| Classifier | BERT | LASER | SISTER | USE | Doc2Vect |
|---|---|---|---|---|---|
| BILSTM | 65 | 66 | 67 | 51 | |
| LSTM | 64 | 68 | 59 | 48 | |
| LSTM+SVM | 71 | 65 | 62 | 58 | |
| BILSTM+SVM | 76 | 63 | 76 | 51 |
Variation of F-measure according to the different sentence embedding and classifiers used for emotional category (cluster B) of PD detection
| Classifier | BERT | LASER | SISTER | USE | Doc2Vect |
|---|---|---|---|---|---|
| BILSTM | 50 | 46 | 56 | 48 | |
| LSTM | 77 | 64 | 68 | 59 | |
| LSTM+SVM | 55 | 57 | 53 | ||
| BILSTM+SVM | 57 | 61 | 52 | 52 |
Variation of F-measure according to the different sentence embedding and classifiers used for anxious category (cluster C) of PD detection
| Classifier | BERT | LASER | SISTER | USE | Doc2Vect |
|---|---|---|---|---|---|
| BILSTM | 57 | 46 | 59 | 51 | |
| LSTM | 53 | 52 | 51 | ||
| LSTM+SVM | 55 | 46 | 50 | ||
| BILSTM+SVM | 56 | 56 | 46 | 60 |
Variation of F-measure according to the different sentence embedding techniques for LSTM+SVM classifier combination used to detect PD type
| PD type | BERT | LASER | SISTER | USE | Doc2Vect |
|---|---|---|---|---|---|
| Antisocial | 49 | 54 | 52 | 52 | |
| Borderline | 74 | 64 | 69 | 60 | |
| Compulsive | 62 | 56 | 62 | 55 | |
| Dependent | 67 | 61 | 65 | 58 | |
| Avoiding | 75 | 74 | 71 | 73 | |
| Histrionic | 55 | 52 | 48 | 48 | |
| Narcissistic | 55 | 51 | 61 | 57 | |
| Paranoid | 56 | 51 | 61 | 56 | |
| Schizoid | 55 | 59 | 61 | 57 | |
| Schizotypal | 56 | 57 | 50 | 50 |
Variation of F-measure according to the different sentence embedding techniques for BILSTM+SVM classifier combination used to detect PD type
| PD type | BERT | LASER | SISTER | USE | Doc2Vect |
|---|---|---|---|---|---|
| Antisocial | 54 | 57 | 57 | 58 | |
| Borderline | 76 | 79 | 74 | 77 | |
| Compulsive | 66 | 72 | 67 | 61 | |
| Dependent | 65 | 66 | 72 | 67 | |
| Avoiding | 73 | 81 | 79 | 71 | |
| Histrionic | 60 | 52 | 54 | 60 | |
| Narcissistic | 60 | 74 | 66 | 53 | |
| Paranoid | 60 | 56 | 68 | 61 | |
| Schizoid | 70 | 66 | 68 | 65 | |
| Schizotypal | 53 | 54 | 58 | 52 |
Fig. 4 F-score comparison of different embedding models for personality disorder classification using the CNN-BILSTM-SVM model
Fig. 5 F-score comparison of different embedding models for classifying the different categories of personality disorder using the CNN-BILSTM-SVM model
Fig. 6 F-score comparison for the LASER embedding technique across the different types of personality disorder using the CNN-BILSTM-SVM model
Comparison of our analysis with ASHA results for antisocial writing style analysis.
| ASHA | Analysis | Our corpus results | Corpus Astuti ( |
|---|---|---|---|
| Tends to employ shorter T-unit of sentence fragments and has a less intricate sentence structure | The utilization rate of semicolon punctuation is notable | ✓ | ✓ |
| Narrative and explanatory discourse is poorly organized; finds it hard to bring ideas to fruition | The use of semantic relations to express explanations, connections, additions, and consequences has decreased significantly | ✓ | ✓ |
| Is not able to write from different viewpoints | Decrease in the rate of use of the opposition relationship | ✓ | ✗ |
| Suffers from inflexible morphological difficulties that adversely affect the structure of sentences | The determiner, conjunction, and preposition POS are used less frequently | ✓ | ✓ |
| Difficulty in understanding and interpreting texts written by others | Exclamation points and question marks are frequently used, as are sentences having a negative tone | ✓ | ✓ |
| Uses less abstract language | The use of singular rather than plural forms is very pronounced. High use of adjectives and named entities | ✗ | ✗ |