Jafar A Alzubi1, Rachna Jain2, Anubhav Singh2, Pritee Parwekar3, Meenu Gupta4.
Abstract
As of November 2020, the COVID-19 pandemic had infected 62.5 million people and caused nearly 1.46 million deaths worldwide. This highly dynamic and rapidly evolving situation makes it hard to obtain accurate, on-demand, up-to-date information about the virus. In particular, frontline workers in this battle, such as medical-services professionals, policymakers, and clinical scientists, require expert-specific methods to keep up with the literature and extract scientific knowledge from the latest research findings. The risks are by no means trivial: decisions based on fallacious answers may endanger public trust, well-being, and security. Yet thousands of research papers are being published on the topic, making it ever more difficult to keep track of the latest research. Taking these challenges into account, we propose COBERT, a retriever-reader dual algorithmic system that answers complex queries by searching a corpus of 59K coronavirus-related articles made accessible through the COVID-19 Open Research Dataset Challenge (CORD-19). The retriever is a TF-IDF vectorizer that captures the top 500 documents with optimal scores. The reader, a Bidirectional Encoder Representations from Transformers (BERT) model pre-trained on the SQuAD 1.1 dev dataset and built on top of the HuggingFace BERT transformers, refines the sentences from the filtered documents, which are then passed to a ranker that compares the logits scores to produce a short answer along with the title and source article of the extraction. The proposed DistilBERT version outperforms previous pre-trained models, obtaining an Exact Match (EM)/F1 score of 80.6/87.3. © King Fahd University of Petroleum & Minerals 2021.
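The retriever stage described in the abstract (TF-IDF scoring of the corpus against the query, keeping the top 500 documents) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' code: the toy corpus, the `retrieve` function name, and the smoothed-IDF formula are assumptions for demonstration.

```python
import math
import re
from collections import Counter

def _tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def retrieve(query, documents, k=500):
    """Score documents against the query with TF-IDF + cosine similarity
    and return the indices of the k best documents, best first."""
    docs_tok = [_tokens(d) for d in documents]
    n = len(documents)
    df = Counter(t for toks in docs_tok for t in set(toks))
    # Smoothed IDF; query terms unseen in the corpus get weight 0 below.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}

    def vec(toks):
        tf = Counter(toks)
        return {t: (c / len(toks)) * idf.get(t, 0.0) for t, c in tf.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * \
               math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    qv = vec(_tokens(query))
    scored = sorted(((cosine(qv, vec(t)), i) for i, t in enumerate(docs_tok)),
                    reverse=True)
    return [i for _, i in scored[:k]]

docs = [
    "Coronavirus transmission occurs through respiratory droplets",
    "BERT is pretrained on large text corpora",
    "Hand washing reduces the spread of the coronavirus",
]
ranking = retrieve("how does the coronavirus spread", docs, k=3)
```

In the paper's pipeline the documents returned here are then handed to the BERT reader; a production retriever would typically use scikit-learn's `TfidfVectorizer` instead of this hand-rolled version.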
Keywords: BERT; CDQA; CORD-19; COVID-19; Cosine-similarity; DistilBERT; HuggingFace; Question answering; SQuAD; TF-IDF
Year: 2021 PMID: 34178569 PMCID: PMC8220121 DOI: 10.1007/s13369-021-05810-5
Source DB: PubMed Journal: Arab J Sci Eng ISSN: 2191-4281 Impact factor: 2.334
Fig. 1 COBERT pipeline architecture
Comparison Between Open-Domain QA and Closed-Domain QA

| Open-Domain QA | Closed-Domain QA |
|---|---|
| Can answer questions about anything | Answers questions within a specific domain |
| Relies mainly on general ontologies and world knowledge | Exploits mainly domain-specific knowledge |
| Example: DrQA (Facebook Research) | Example: cdQA |
Fig. 2 COBERT retriever pipeline
Fig. 3 COBERT reader pipeline
Fig. 4 BERT for question answering
Hyperparameter Comparison of Models

| Hyperparameter | BERT-Base-Uncased | DistilBERT |
|---|---|---|
| max_seq_length | 384 | 384 |
| doc_stride | 128 | 128 |
| max_query_length | 64 | 64 |
| train_batch_size | 128 | 256 |
| max_answer_length | 30 | 30 |
| learning_rate | 1e-8 | 1e-8 |
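The hyperparameters in the table above can be collected into plain Python dicts, as one would pass them to a HuggingFace-style SQuAD fine-tuning script. The dict layout is an illustration, not the authors' configuration; all values, including the learning rate of 1e-8, are copied from the table as published.

```python
# Shared settings for both readers, taken from the hyperparameter table.
COMMON = {
    "max_seq_length": 384,    # tokens per input window (question + passage)
    "doc_stride": 128,        # overlap between consecutive sliding windows
    "max_query_length": 64,   # tokens kept from the question
    "max_answer_length": 30,  # tokens allowed in a predicted answer span
    "learning_rate": 1e-8,    # as listed in the table
}

# Only the training batch size differs between the two models.
CONFIGS = {
    "bert-base-uncased": {**COMMON, "train_batch_size": 128},
    "distilbert": {**COMMON, "train_batch_size": 256},
}
```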
Fig. 5 Example output of the COBERT system for an input query
Comparative Analysis of Models

| Model | Dataset | EM | F1 |
|---|---|---|---|
| GPU version of BERT (with sklearn wrapper) | SQuAD 1.1 dev | 81.2 | 88.6 |
| BERT for QA | SQuAD 1.1 dev | 81.3 | 88.7 |
| DistilBERT for QA with Knowledge Distillation | SQuAD 1.1 dev | 80.1 | 87.5 |
| GPU version of BERT (with sklearn wrapper) | CORD-19 | 79.3 | 86.4 |
| BERT for QA | CORD-19 | 81.5 | 88.3 |
| DistilBERT for QA with Knowledge Distillation | CORD-19 | 80.6 | 87.3 |
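The EM and F1 columns above follow the standard SQuAD-style evaluation: Exact Match is 1 when the predicted answer string equals the gold answer, and F1 measures token overlap between prediction and gold. A minimal sketch of these two metrics follows; it uses simple lowercasing and whitespace tokenization, whereas the official SQuAD script additionally strips articles and punctuation.

```python
from collections import Counter

def exact_match(pred, gold):
    """1 if the normalized prediction equals the gold answer, else 0."""
    return int(pred.strip().lower() == gold.strip().lower())

def f1_score(pred, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    p, g = pred.lower().split(), gold.lower().split()
    common = Counter(p) & Counter(g)          # multiset of shared tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

Scores like the table's 80.6/87.3 are these per-question values averaged over the full evaluation set (and scaled to percentages).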
Fig. 6 Exact match (EM) comparison of models
Fig. 7 F1 score comparison of models