Muhammad Afzal, Fakhare Alam, Khalid Mahmood Malik, Ghaus M Malik.
Abstract
BACKGROUND: Automatic text summarization (ATS) enables users to retrieve meaningful evidence from big data of biomedical repositories to make complex clinical decisions. Deep neural and recurrent networks outperform traditional machine-learning techniques in areas of natural language processing and computer vision; however, they are yet to be explored in the ATS domain, particularly for medical text summarization.
Keywords: automatic text summarization; biomedical informatics; brain aneurysm; deep neural network; semantic similarity; word embedding
Year: 2020 PMID: 33095174 PMCID: PMC7647812 DOI: 10.2196/19810
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Proposed Biomed-Summarizer architecture with four major components: data preprocessing, quality recognition, context identification, and summary construction. PQR: prognosis quality recognition; CCA: clinical context-aware; PICO: Population/Problem, Intervention, Comparison, Outcome.
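The four-stage flow in the architecture can be sketched as a simple pipeline. The stage functions below are illustrative stand-ins, not the paper's implementation; only the ordering of the stages (preprocess, quality check, PICO labeling, similarity ranking) is taken from the source:

```python
def summarize(documents, query):
    """Sketch of the four-stage Biomed-Summarizer flow (Figure 1).

    All helper functions are hypothetical placeholders.
    """
    docs = [preprocess(d) for d in documents]           # 1. data preprocessing
    rigorous = [d for d in docs if is_quality(d)]       # 2. quality recognition (PQR)
    labeled = [(d, pico_label(d)) for d in rigorous]    # 3. context identification (CCA)
    ranked = sorted(labeled, key=lambda x: similarity(x[0], query), reverse=True)
    return " ".join(d for d, _ in ranked[:3])           # 4. summary construction

# Toy stand-ins so the sketch runs end to end
def preprocess(d): return d.strip().lower()
def is_quality(d): return len(d.split()) > 3            # placeholder for the PQR model
def pico_label(d): return "Outcome" if "risk" in d else "Aim"
def similarity(d, q): return len(set(d.split()) & set(q.split()))
```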
Example sentence sequences with their assigned categories: Aim, Population, Methods, Intervention, Results, Conclusion, and Outcome.
| Sequence | Category |
| The aim of the present study was to evaluate whether the anterior communicating artery (A com) aneurysms behave differently from aneurysms located elsewhere with respect to size as a rupture risk factor. To this end, we examined the clinical data of ruptured A com aneurysms and analyzed other morphological parameters, including size, providing adequate data for predicting rupture risk of the A com aneurysms. | Aim (A) |
| Between January 2010 and December 2015, a total of 130 consecutive patients at our institution with the A com aneurysms-86 ruptured and 44 unruptured-were included in this study. The ruptured group included 43 females (50%) and 43 males (50%) with the mean age of 56 years (range, 34-83 years). The unruptured group included 23 females (52%) and 21 males (48%) with the mean age of 62 years (range, 28-80 years). All patients underwent either digital subtraction angiography or 3-dimensional computed tomography angiography. The exclusion criteria for this study were the patients with fusiform, traumatic, or mycotic aneurysm. There were preexisting known risk factors, such as hypertension in 73 patients, who required antihypertensive medication; other risk factors included diabetes mellitus (9 patients), coronary heart disease (9 patients), previous cerebral stroke (18 patients), and end-stage renal disease (3 patients) in the ruptured group. In the unruptured group, 38 patients had hypertension, 4 had diabetes mellitus, 5 had coronary heart disease, 10 had a previous cerebral stroke, and 2 had end-stage renal disease. | Population (P) |
| Four intracranial aneurysms cases were selected for this study. Using CT angiography images, the rapid prototyping process was completed using a polyjet technology machine. The size and morphology of the prototypes were compared to brain digital subtraction arteriography of the same patients. | Methods (M) |
| After patients underwent dural puncture in the sitting position at L3-L4 or L4-L5, 0.5% hyperbaric bupivacaine was injected over two minutes: group S7.5 received 1.5 mL, group S5 received 1.0 mL, and group S4 received 0.8 mL. After sitting for 10 minutes, patients were positioned for surgery. | Intervention (I) |
| The ruptured group consisted of 9 very small (<2 mm), 38 small (2-4 mm), 32 medium (4-10 mm), and 7 large (>10 mm) aneurysms; the unruptured group consisted of 2 very small, 16 small, 25 medium, and 1 large aneurysm. There were 73 ruptured aneurysms with small necks and 13 with wide necks (neck size >4 mm), and 34 unruptured aneurysms with small necks and 10 with wide necks. | Results (R) |
| The method we develop here could become part of surgical planning for intracranial aneurysm treatment in the clinical workflow. | Conclusion (C) |
| The prevailing view is that larger aneurysms have a greater risk of rupture. Predicting the risk of aneurysmal rupture, especially for aneurysms with a relatively small diameter, continues to be a topic of discourse. In fact, the majority of previous large-scale studies have used the maximum size of aneurysms as a predictor of aneurysm rupture. | Outcome (O) |
Figure 2. Process steps of the proposed prognosis quality recognition (PQR) model training and testing.
Figure 3. Clinical context–aware (CCA) classifier trained on 250-dimension feature vectors, with 100 nodes at the embedding layer, 100 memory units in the long short-term memory (LSTM) layer serving as logical hidden layers, and 5 classification nodes.
Assigned weights for research study year of publication.
| Year of publication | Rank |
| Previous 1-5 years | 1 |
| Previous 6-10 years | 2 |
| Previous 11-15 years | 3 |
| Other | 4 |
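The recency weighting above amounts to a simple age-band lookup. A minimal sketch, assuming the bands are applied exactly as tabulated (the function name and the way the current year is supplied are illustrative):

```python
from datetime import date

def recency_rank(pub_year, current_year=None):
    """Map a publication year to a recency rank (1 = most recent band).

    Bands follow the table: previous 1-5 years -> 1, 6-10 -> 2,
    11-15 -> 3, anything older -> 4.
    """
    current_year = current_year or date.today().year
    age = current_year - pub_year
    if age <= 5:
        return 1
    if age <= 10:
        return 2
    if age <= 15:
        return 3
    return 4
```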
Figure 4. Step-by-step scenario of query execution, retrieval of documents, quality checking, clinical context-aware (CCA) classification, semantic similarity, ranking, and summary creation. A: Aim; P: Population/Patients/Problem; I: Intervention; R: Results; O: Outcome; PICO: Patient/Problem, Intervention, Comparison, Outcome.
Dataset preparation protocols.
| Preparation Protocol | PQRa dataset (D1) | CCAb dataset (D2) |
| Description | This dataset was created for the quality assessment of biomedical studies related to the prognosis of brain aneurysm. | This dataset was curated for PICOc sequence classification. The final dataset was specific to the prognosis of brain aneurysm. |
| Purpose | To select only published documents that are scientifically rigorous for final summarization. | To identify a sentence or a group of sentences for discovering the clinical context in terms of population, intervention, and outcomes. |
| Methods | Manual preparation of the dataset is cumbersome, so AId models were used; however, developing an AI model requires a massive set of annotated documents, and annotation is itself tedious. Therefore, PubMed Clinical Queries (narrow) were used as a surrogate to obtain scientifically rigorous studies. | N/Ae |
| Data sources | PubMed Database (for positive studies, the “Narrow[filter]” parameter was enabled). | First, we collected a publicly available dataset, BioNLP 2018 [ |
| Query | The term “(Prognosis/Narrow[filter]) AND (intracranial aneurysm)” was used as the query string. | The term “intracranial aneurysm” (along with its synonyms “cerebral aneurysm” and “brain aneurysm”) was used as the query string. |
| Size | 2686 documents, including 697 positive (ie, scientifically rigorous) records | A total of 173,000 PICO sequences (131,000 BioNLP+42,000 Brain Aneurysm) were included in the dataset. |
| Inclusion/exclusion | Only studies that were relevant and passed the criteria to be “Prognosis/Narrow[filter]” were included in the positive set. The other relevant studies not in the positive set were included in the negative set. All other studies were excluded from the final dataset. | Only structured abstracts identified with at least one of the PICO elements were considered to extract the text sequence. |
| Study types | RCTsg, systematic reviews, and meta-analysis of RCTs were given more importance. | RCTs, systematic reviews, and meta-analysis of RCTs were given more importance. |
aPQR: prognosis quality recognition.
bCCA: clinical context–aware.
cPICO: Patient/Problem, Intervention, Comparison, Outcome.
dAI: artificial intelligence.
eN/A: not applicable.
fNCBI: National Center of Biotechnology Information.
gRCT: randomized controlled trial.
Comparative results of the deep-learning model with shallow machine-learning models.
| Algorithm | F1-score | Accuracy | AUCa |
| Naïve Bayes | 90.83 | 87.47 | 0.987 |
| Decision tree | 85.10 | 74.07 | 0.50 |
| k-nearest neighbor | 46.53 | 48.39 | 0.829 |
| General linear model | 89.34 | 82.38 | 0.904 |
| Support vector machine | 86.96 | 77.79 | 0.983 |
| Deep learning (MLPb) | 93.17 | 90.20 | 0.967 |
aAUC: area under the receiver operating characteristic curve.
bMLP: multilayer perceptron.
Results of multilayer perceptron with varied hyperparameter settings.
| Hidden layers (n) | Hidden layer size | BoWa | Activation | Epochs (n) | Recall | Precision | F1-score | Accuracy | AUCb |
| 2 | 50, 50 | No | Rectifier | 10 | 90.28 | 96.25 | 93.17 | 90.2 | 0.967 |
| 3 | 100, 50, 25 | No | Rectifier | 10 | 93.47 | 96.71 | 95.06 | 92.8 | 0.969 |
| 3 | 100, 50, 25 | No | Maxout | 10 | 96.82 | 96.01 | 96.41 | 94.67 | 0.976 |
| 3 | 100, 50, 25 | No | Maxout with Dropout | 10 | 98.16 | 93.61 | 95.83 | 93.67 | 0.963 |
| 3 | 100, 50, 25 | No | Tanh | 10 | 90.62 | 97.65 | 94.00 | 91.44 | 0.978 |
| 3 | 100, 50, 25 | Yes | Rectifier | 10 | 93.47 | 98.24 | 95.80 | 93.92 | 0.999 |
| 3 | 100, 50, 25 | Yes | Maxout | 10 | 94.47 | 97.41 | 95.92 | 94.04 | 0.977 |
| 3 | 50, 50, 50 | No | Rectifier | 10 | 87.77 | 96.86 | 92.09 | 88.83 | 0.958 |
| 3 | 200, 100, 50 | No | Rectifier | 10 | 92.96 | 96.86 | 94.87 | 92.56 | 0.975 |
| 4 | 200, 100, 50, 25 | No | Rectifier | 10 | 93.63 | 96.05 | 94.82 | 92.43 | 0.973 |
aBoW: bag of words.
bAUC: area under the receiver operating characteristic curve.
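The F1-scores in the table follow directly from the reported precision and recall as their harmonic mean; for instance, the first row (recall 90.28, precision 96.25) reproduces the listed F1 of 93.17:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same percent scale as the table)."""
    return 2 * precision * recall / (precision + recall)

# First row of the hyperparameter table: precision 96.25, recall 90.28
print(round(f1_score(96.25, 90.28), 2))  # → 93.17
```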
Ensembling of deep-learning models.
| Boosting Model | Recall | Precision | F1-score | Accuracy | AUCa |
| Ensemble voting (MLPb, DTc, NBd) | 95.81 | 97.28 | 96.54 | 94.91 | 0.955 |
| Proposed model (AdaBooste-MLP) | 97.99 | 95.9 | 96.93 | 95.41 | 0.999 |
aAUC: area under the receiver operating characteristic curve.
bMLP: multilayer perceptron.
cDT: decision tree.
dNB: naïve Bayes.
eAdaBoost: adaptive boosting.
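The ensemble-voting row combines predictions from the MLP, decision tree, and naïve Bayes base learners by majority vote. A minimal sketch of that voting step in pure Python (the three toy predictor outputs are placeholders, not the study's models):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label predictions by majority vote.

    predictions: list of label lists, one per base model, aligned by sample.
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = [model_preds[i] for model_preds in predictions]
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Three hypothetical base models voting on four samples
mlp = ["pos", "pos", "neg", "pos"]
dt  = ["pos", "neg", "neg", "neg"]
nb  = ["neg", "pos", "neg", "pos"]
print(majority_vote([mlp, dt, nb]))  # → ['pos', 'pos', 'neg', 'pos']
```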
Comparative results of deep learning with traditional machine-learning models.
| Model | Recall | Precision | F1-score | Accuracy |
| Logistic Regression | 0.42 | 0.34 | 0.36 | 0.42 |
| AdaBoosta | 0.49 | 0.48 | 0.46 | 0.49 |
| Gradient Boost | 0.50 | 0.59 | 0.45 | 0.50 |
| ANNb | 0.29 | 0.08 | 0.13 | 0.29 |
| kNNc | 0.35 | 0.36 | 0.37 | 0.35 |
| Proposed Bi-LSTMd model | 0.93 | 0.94 | 0.94 | 0.93 |
aAdaBoost: adaptive boosting.
bANN: artificial neural network.
ckNN: k-nearest neighbor.
dBi-LSTM: bidirectional long short-term memory.
Precision, recall, F1-score, and support for individual classes of the proposed deep-learning model.
| Class | Precision | Recall | F1-score | Support |
| Aim | 0.94 | 0.95 | 0.95 | 3133 |
| Intervention | 0.84 | 0.94 | 0.89 | 1238 |
| Outcome | 0.96 | 0.94 | 0.95 | 5036 |
| Result | 0.96 | 0.94 | 0.95 | 4852 |
| Population | 0.94 | 0.95 | 0.94 | 3082 |
Comparison of similarity approaches.
| Methods | Pearson correlation coefficient (0.0-1.0) |
| Jaccard similarity | 0.56 |
| Cosine similarity (Count Vectorizer) | 0.54 |
| GloVe Embedding | 0.43 |
| Word2Vec (Google) | 0.38 |
| fastText (Facebook) | 0.52 |
| Jaccard Similarity after semantic enrichment (JS2E) | 0.61 |
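Jaccard similarity treats two sentences as token sets and scores their overlap; the JS2E variant enriches each token set with synonyms before intersecting, which is why it outperforms plain Jaccard in the table. A minimal sketch, where the synonym map is a hypothetical stand-in for the paper's semantic resources:

```python
def jaccard(a_tokens, b_tokens):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over token sets."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def enrich(tokens, synonyms):
    """Expand each token with its synonyms so equivalent terms can match."""
    expanded = set(tokens)
    for t in tokens:
        expanded.update(synonyms.get(t, ()))
    return expanded

# Hypothetical synonym map standing in for a real biomedical thesaurus
SYN = {"brain": {"cerebral", "intracranial"},
       "cerebral": {"brain", "intracranial"}}

q = "brain aneurysm rupture".split()
s = "cerebral aneurysm rupture risk".split()
print(jaccard(q, s))                                       # plain Jaccard: 0.4
print(round(jaccard(enrich(q, SYN), enrich(s, SYN)), 2))   # enriched (JS2E-style): 0.83
```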
Pearson correlation coefficients of agreement among evaluators for summaries scored according to the three metrics.
| Metric | Evaluator A | Evaluator B | Evaluator C |
| M1a | 0.728 | 0.767 | 0.837 |
| M2b | 0.826 | 0.924 | 0.841 |
| M3c | 0.772 | 0.843 | 0.804 |
aM1: summary relevance to the inbound query.
bM2: aim, population, intervention, results, and outcome classification representation in the summary.
cM3: model summary better than the baseline summary.
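The inter-evaluator agreement figures above are plain Pearson correlation coefficients over the evaluators' scores. A self-contained sketch of the computation, with hypothetical 1-5 ratings standing in for the actual evaluator data:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical 1-5 ratings from two evaluators over six summaries
a = [4, 3, 5, 2, 4, 3]
b = [4, 2, 5, 3, 4, 3]
print(round(pearson(a, b), 3))  # → 0.818
```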
Frequency distribution of scores with respect to each metric by all evaluators.
| Metric | Score | Frequency |
| M1a | 2 | 4 |
| M1 | 3 | 10 |
| M1 | 4 | 12 |
| M1 | 5 | 4 |
| M2b | 2 | 3 |
| M2 | 3 | 10 |
| M2 | 4 | 10 |
| M2 | 5 | 7 |
| M3c | 3 | 12 |
| M3 | 4 | 16 |
| M3 | 5 | 2 |
aM1: summary relevance to the inbound query.
bM2: aim, population, intervention, results, and outcome classification representation in the summary.
cM3: model summary better than the baseline summary.