Kirk Roberts1, Tasmeer Alam2, Steven Bedrick3, Dina Demner-Fushman4, Kyle Lo5, Ian Soboroff6, Ellen Voorhees6, Lucy Lu Wang5, William R Hersh3.
Abstract
We present an overview of the TREC-COVID Challenge, an information retrieval (IR) shared task to evaluate search on scientific literature related to COVID-19. The goals of TREC-COVID include the construction of a pandemic search test collection and the evaluation of IR methods for COVID-19. The challenge was conducted over five rounds from April to July 2020, with participation from 92 unique teams and 556 individual submissions. A total of 50 topics (sets of related queries) were used in the evaluation, starting with 30 topics in Round 1 and adding 5 new topics per round to cover questions emerging at that stage of the still-evolving pandemic. This paper provides a comprehensive overview of the structure and results of TREC-COVID. Specifically, the paper provides details on the background, task structure, topic structure, corpus, participation, pooling, assessment, judgments, results, top-performing systems, lessons learned, and benchmark datasets.
Keywords: COVID-19; Information retrieval; Pandemics; TREC-COVID
Year: 2021 PMID: 34245913 PMCID: PMC8264272 DOI: 10.1016/j.jbi.2021.103865
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000
Fig. 1. High-level structure of TREC-COVID.
Overview of the TREC-COVID timeline over the five rounds.
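| | Round 1 | Round 2 | Round 3 | Round 4 | Round 5 |
|---|---|---|---|---|---|
| CORD-19 release | Apr 10 | May 1 | May 19 | Jun 19 | Jul 16 |
| Round start | Apr 15 | May 4 | May 26 | Jun 26 | Jul 22 |
| Submissions due | Apr 23 | May 13 | Jun 3 | Jul 6 | Aug 3 |
| Documents in corpus | 51,103 | 59,851 | 128,492 | 157,817 | 191,175 |
| Topics | 30 | 35 | 40 | 45 | 50 |
| Participating teams | 56 | 51 | 31 | 27 | 28 |
| Submissions | 143 | 136 | 79 | 72 | 126 |
| Cumulative judgments | 8,691 | 20,728 | 33,068 | 46,203 | 69,318 |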
Three example TREC-COVID topics.
All 50 topics (only the Query field) along with the research field and function categories assigned to each topic.
| Topic | Query | Research field | Function |
|---|---|---|---|
| 1 | coronavirus origin | Biological | Transmission |
| 2 | coronavirus response to weather changes | Public Health | Transmission |
| 3 | coronavirus immunity | Clinical | Prevention |
| 4 | how do people die from the coronavirus | Clinical | Effect |
| 5 | animal models of COVID-19 | Biological | Treatment |
| 6 | coronavirus test rapid testing | Public Health | Prevention |
| 7 | serological tests for coronavirus | Public Health | Prevention |
| 8 | coronavirus under reporting | Public Health | Prevention |
| 9 | coronavirus in Canada | Public Health | Transmission |
| 10 | coronavirus social distancing impact | Public Health | Prevention |
| 11 | coronavirus hospital rationing | Clinical | Treatment |
| 12 | coronavirus quarantine | Public Health | Prevention |
| 13 | how does coronavirus spread | Biological | Transmission |
| 14 | coronavirus super spreaders | Public Health | Transmission |
| 15 | coronavirus outside body | Biological | Transmission |
| 16 | how long does coronavirus survive on surfaces | Biological | Transmission |
| 17 | coronavirus clinical trials | Clinical | Prevention |
| 18 | masks prevent coronavirus | Public Health | Prevention |
| 19 | what alcohol sanitizer kills coronavirus | Biological | Prevention |
| 20 | coronavirus and ACE inhibitors | Biological | Effect |
| 21 | coronavirus mortality | Public Health | Effect |
| 22 | coronavirus heart impacts | Clinical | Effect |
| 23 | coronavirus hypertension | Clinical | Effect |
| 24 | coronavirus diabetes | Clinical | Effect |
| 25 | coronavirus biomarkers | Biological | Effect |
| 26 | coronavirus early symptoms | Clinical | Effect |
| 27 | coronavirus asymptomatic | Clinical | Transmission |
| 28 | coronavirus hydroxychloroquine | Clinical | Treatment |
| 29 | coronavirus drug repurposing | Biological | Treatment |
| 30 | coronavirus remdesivir | Clinical | Treatment |
| 31 | difference between coronavirus and flu | Biological | N/A |
| 32 | coronavirus subtypes | Biological | N/A |
| 33 | coronavirus vaccine candidates | Clinical | Treatment |
| 34 | coronavirus recovery | Clinical | Effect |
| 35 | coronavirus public datasets | Biological | Transmission |
| 36 | SARS-CoV-2 spike structure | Biological | Transmission |
| 37 | SARS-CoV-2 phylogenetic analysis | Biological | N/A |
| 38 | COVID inflammatory response | Clinical | Effect |
| 39 | COVID-19 cytokine storm | Biological | Effect |
| 40 | coronavirus mutations | Biological | Transmission |
| 41 | COVID-19 in African-Americans | Public Health | Effect |
| 42 | Vitamin D and COVID-19 | Clinical | Treatment |
| 43 | violence during pandemic | Public Health | Effect |
| 44 | impact of masks on coronavirus transmission | Public Health | Prevention |
| 45 | coronavirus mental health impact | Public Health | Effect |
| 46 | dexamethasone coronavirus | Clinical | Treatment |
| 47 | COVID-19 outcomes in children | Clinical | Effect |
| 48 | school reopening coronavirus | Public Health | Prevention |
| 49 | post-infection COVID-19 immunity | Public Health | Effect |
| 50 | mRNA vaccine coronavirus | Biological | Treatment |
Teams that participated in TREC-COVID across the five rounds, with run counts for each round in which they submitted. Rounds 1–4 limited each team to 3 runs; Round 5 limited each team to 8 runs.
| 0_214_wyb | 2 | ||||
| abccaba | 2 | ||||
| anserini | 2 | 3 | 3 | 8 | |
| ASU_biomedical | 3 | ||||
| AUEB_NLP_GROUP | 1 | ||||
| azimiv | 1 | ||||
| BBGhelani | 2 | 3 | |||
| BioinformaticsUA | 3 | 3 | 3 | 3 | 6 |
| BITEM | 3 | 2 | 2 | 2 | |
| BRPHJ | 3 | ||||
| BRPHJ_NLP | 3 | ||||
| CincyMedIR | 3 | 3 | 3 | 3 | 8 |
| CIR | 3 | 3 | 2 | ||
| CMT | 3 | ||||
| CogIR | 3 | ||||
| columbia_university_dbmi | 2 | 2 | |||
| cord19.vespa.ai | 1 | 2 | 3 | ||
| covidex | 3 | 3 | 3 | 3 | 8 |
| CovidSearch | 3 | ||||
| CSIROmed | 3 | 3 | 3 | 3 | 3 |
| cuni | 3 | ||||
| DA_IICT | 3 | ||||
| DY_XD | 3 | ||||
| Elhuyar_NLP_team | 3 | 3 | 5 | ||
| Emory_IRLab | 2 | 2 | 3 | ||
| Factum | 1 | 3 | 2 | ||
| fcavalier | 1 | ||||
| GUIR_S2 | 3 | 3 | |||
| HKPU | 1 | 3 | 8 | ||
| ielab | 3 | 3 | |||
| ILPS_UvA | 3 | ||||
| ims_unipd | 3 | ||||
| IR_COVID19_CLE | 3 | 3 | |||
| IRC | 3 | 2 | |||
| IRIT_LSIS_FR | 2 | 3 | |||
| IRIT_markers | 3 | 3 | |||
| IRLabKU | 3 | 3 | 2 | ||
| ixa | 3 | ||||
| julielab | 3 | 3 | 3 | 1 | |
| KAROTENE_SYNAPTIQ_UMBC | 3 | ||||
| KoreaUniversity_DMIS | 3 | ||||
| LTR_ESB_TEAM | 1 | ||||
| MacEwan_Business | 1 | ||||
| Marouane | 2 | ||||
| MedDUTH_AthenaRC | 3 | ||||
| mpiid5 | 3 | 3 | 1 | 2 | |
| NI_CCHMC | 3 | ||||
| NTU_NMLab | 3 | ||||
| OHSU | 3 | 3 | 3 | 3 | |
| PITT | 3 | ||||
| PITTSCI | 3 | ||||
| POZNAN | 3 | 3 | 3 | 3 | 3 |
| Random | 1 | ||||
| req_rec | 3 | ||||
| reSearch2vec | 7 | ||||
| risklick | 3 | 3 | 3 | 7 | |
| RMITB | 2 | 1 | |||
| RUIR | 3 | 3 | 1 | ||
| ruir | 3 | ||||
| sabir | 3 | 3 | 3 | 3 | 8 |
| SavantX | 3 | 3 | |||
| SFDC | 2 | 2 | 3 | 3 | 1 |
| shamra | 1 | ||||
| Sinequa | 2 | ||||
| Sinequa2 | 1 | ||||
| smith | 3 | ||||
| tcs_ilabs_gg | 1 | ||||
| Technion | 3 | 3 | |||
| test_uma | 1 | ||||
| THUMSR | 3 | ||||
| TM_IR_HITZ | 3 | ||||
| TMACC_SeTA | 1 | 3 | |||
| TU_Vienna | 2 | ||||
| UAlbertaSearch | 1 | 2 | |||
| UB_BW | 3 | 3 | 1 | ||
| UB_NLP | 1 | ||||
| UCD_CS | 3 | 3 | 3 | 3 | |
| udel_fang | 3 | 3 | 3 | 3 | 3 |
| UH_UAQ | 1 | 2 | 7 | ||
| UIowaS | 3 | 3 | 3 | 3 | |
| UIUC_DMG | 3 | ||||
| UMASS_CIIR | 2 | ||||
| unipd.it | 3 | ||||
| unique_ptr | 3 | 3 | 3 | 3 | 6 |
| uogTr | 3 | 2 | 3 | 8 | |
| UWMadison_iSchool | 3 | ||||
| VATech | 3 | ||||
| VirginiaTechHAT | 3 | 3 | |||
| whitej_relevance | 3 | ||||
| WiscIRLab | 6 | ||||
| wistud | 3 | ||||
| xj4wang | 1 | 3 | 3 | 3 | 3 |
Fig. 2. Number of documents judged in the top 50 ranks of a submission by round. The black line within a box is the median number of documents judged for that submission over the set of topics in that round. Judged submissions (submissions that contributed to the qrels) are plotted in light blue and unjudged submissions are in dark blue. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Assessment platform.
Fig. 4. The number of articles judged per topic, by round.
Fig. 5. Distributions of assignments per topic across rounds of judging.
Counts of total numbers of judged documents and number of relevant documents per topic. Percent relevant is the fraction of judged documents that are some form of relevant.
| Topic | Total Judged | Partially Rel | Fully Rel | % Rel | Topic | Total Judged | Partially Rel | Fully Rel | % Rel |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1647 | 362 | 337 | 42.4 | 26 | 1720 | 148 | 684 | 48.4 |
| 2 | 1287 | 71 | 264 | 26.0 | 27 | 1477 | 580 | 321 | 61.0 |
| 3 | 1688 | 443 | 209 | 38.6 | 28 | 1103 | 74 | 543 | 55.9 |
| 4 | 1849 | 331 | 236 | 30.7 | 29 | 1241 | 275 | 374 | 52.3 |
| 5 | 1697 | 339 | 307 | 38.1 | 30 | 1035 | 211 | 193 | 39.0 |
| 6 | 1607 | 328 | 666 | 61.9 | 31 | 1701 | 213 | 158 | 21.8 |
| 7 | 1382 | 50 | 474 | 37.9 | 32 | 1571 | 80 | 149 | 14.6 |
| 8 | 1869 | 391 | 257 | 34.7 | 33 | 1270 | 125 | 182 | 24.2 |
| 9 | 1664 | 104 | 105 | 12.6 | 34 | 1842 | 74 | 124 | 10.7 |
| 10 | 1141 | 203 | 294 | 43.6 | 35 | 1360 | 32 | 207 | 17.6 |
| 11 | 1821 | 226 | 216 | 24.3 | 36 | 1233 | 105 | 572 | 54.9 |
| 12 | 1626 | 295 | 353 | 39.9 | 37 | 1234 | 144 | 369 | 41.6 |
| 13 | 1893 | 656 | 264 | 48.6 | 38 | 1920 | 618 | 765 | 72.0 |
| 14 | 1296 | 172 | 101 | 21.1 | 39 | 1264 | 438 | 539 | 77.3 |
| 15 | 1981 | 266 | 180 | 22.5 | 40 | 1230 | 217 | 371 | 47.8 |
| 16 | 1640 | 236 | 174 | 25.0 | 41 | 1043 | 87 | 269 | 34.1 |
| 17 | 1353 | 372 | 345 | 53.0 | 42 | 769 | 23 | 255 | 36.2 |
| 18 | 1325 | 319 | 347 | 50.3 | 43 | 878 | 97 | 203 | 34.2 |
| 19 | 1489 | 68 | 49 | 7.9 | 44 | 1238 | 182 | 360 | 43.8 |
| 20 | 1234 | 288 | 469 | 61.3 | 45 | 1171 | 352 | 549 | 76.9 |
| 21 | 1600 | 80 | 577 | 41.1 | 46 | 680 | 109 | 91 | 29.4 |
| 22 | 1325 | 216 | 379 | 44.9 | 47 | 1064 | 113 | 353 | 43.8 |
| 23 | 1293 | 194 | 201 | 30.5 | 48 | 747 | 202 | 279 | 64.4 |
| 24 | 1248 | 150 | 300 | 36.1 | 49 | 1093 | 131 | 136 | 24.4 |
| 25 | 1590 | 167 | 408 | 36.2 | 50 | 889 | 98 | 51 | 16.8 |
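The "% Rel" column can be reproduced from the other three columns. The following is a minimal sketch, assuming "some form of relevant" means the sum of the partially and fully relevant counts; the function name `percent_relevant` is illustrative and not taken from the paper.

```python
def percent_relevant(total_judged: int, partially_rel: int, fully_rel: int) -> float:
    """Percentage of judged documents that are partially or fully relevant."""
    if total_judged <= 0:
        raise ValueError("total_judged must be positive")
    # Assumption: both relevance grades count as "some form of relevant".
    return round(100.0 * (partially_rel + fully_rel) / total_judged, 1)


# Rows copied from the table above: (topic, total judged, partially rel, fully rel)
rows = [(1, 1647, 362, 337), (19, 1489, 68, 49), (39, 1264, 438, 539)]
for topic, total, partial, full in rows:
    print(topic, percent_relevant(total, partial, full))
```

For these three rows the computed values agree with the table (42.4, 7.9, and 77.3).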
Top automatic/feedback runs (best run per team), as determined by NDCG, for each of the five rounds of TREC-COVID. P@N: Precision at rank N; NDCG@N: Normalized Discounted Cumulative Gain at rank N; MAP: Mean Average Precision; bpref: Binary Preference; judged?: whether the run contributed to the pooling.
| Round | Team | Run | Type | P@N | NDCG@N | MAP | bpref | Judged? |
|---|---|---|---|---|---|---|---|---|
| 1 | sabir | sab20.1.meta.docs | automatic | 0.7800 | 0.6080 | 0.3128 | 0.4832 | yes |
| 1 | GUIR_S2 | run2 | automatic | 0.6867 | 0.6032 | 0.2601 | 0.4177 | no |
| 1 | IRIT_markers | IRIT_marked_base | automatic | 0.7200 | 0.5880 | 0.2309 | 0.4198 | yes |
| 1 | CSIROmed | CSIROmedNIR | automatic | 0.6600 | 0.5875 | 0.2169 | 0.4066 | no |
| 1 | unipd.it | base.unipd.it | automatic | 0.7267 | 0.5720 | 0.2081 | 0.3782 | no |
| 2 | CMT | SparseDenseSciBert | feedback | 0.7600 | 0.6772 | 0.3115 | 0.5096 | yes |
| 2 | mpiid5 | mpiid5_run1 | feedback | 0.7771 | 0.6677 | 0.2946 | 0.4609 | no |
| 2 | UIowaS | UIowaS_Run3 | feedback | 0.7657 | 0.6382 | 0.2845 | 0.4867 | no |
| 2 | unique_ptr | UPrrf16lgbertd50-r2 | feedback | 0.7086 | 0.6320 | 0.3000 | 0.4414 | yes |
| 2 | GUIR_S2 | GUIR_S2_run2 | feedback | 0.7771 | 0.6286 | 0.2531 | 0.4067 | yes |
| 3 | covidex | covidex.r3.t5_lr | feedback | 0.8600 | 0.7740 | 0.3333 | 0.5543 | yes |
| 3 | BioinformaticsUA | BioInfo-run1 | feedback | 0.8650 | 0.7715 | 0.3188 | 0.5560 | yes |
| 3 | UIowaS | UIowaS_Rd3Borda | feedback | 0.8900 | 0.7658 | 0.3207 | 0.5778 | no |
| 3 | udel_fang | udel_fang_lambdarank | feedback | 0.8900 | 0.7567 | 0.3238 | 0.5764 | yes |
| 3 | CIR | sparse-dense-SBrr-2 | feedback | 0.8000 | 0.7272 | 0.3134 | 0.5419 | yes |
| 4 | unique_ptr | UPrrf38rrf3-r4 | feedback | 0.8211 | 0.7843 | 0.4681 | 0.6801 | yes |
| 4 | covidex | covidex.r4.duot5.lr | feedback | 0.7967 | 0.7745 | 0.3846 | 0.5825 | yes |
| 4 | udel_fang | udel_fang_lambdarank | feedback | 0.7844 | 0.7534 | 0.3907 | 0.6161 | yes |
| 4 | CIR | run2_Crf_A_SciB_MAP | feedback | 0.7700 | 0.7470 | 0.4079 | 0.6292 | yes |
| 4 | mpiid5 | mpiid5_run1 | feedback | 0.7589 | 0.7391 | 0.3993 | 0.6132 | yes |
| 5 | unique_ptr | UPrrf93-wt-r5 | feedback | 0.8760 | 0.8496 | 0.4718 | 0.6372 | yes |
| 5 | covidex | covidex.r5.2 s.lr | feedback | 0.8460 | 0.8311 | 0.3922 | 0.533 | yes |
| 5 | Elhuyar_NLP_team | elhuyar_prf_nof99p | feedback | 0.8340 | 0.8116 | 0.4029 | 0.6091 | yes |
| 5 | risklick | rk_ir_trf_logit_rr | feedback | 0.8260 | 0.7956 | 0.3789 | 0.5659 | yes |
| 5 | udel_fang | udel_fang_ltr_split | feedback | 0.8270 | 0.7929 | 0.3682 | 0.5451 | yes |
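To make the P@N and NDCG@N columns concrete, the sketch below computes both for a single topic on toy data. It is an illustration under stated assumptions, not the official evaluation code: the gain values (0 = not relevant, 1 = partially relevant, 2 = fully relevant), the linear-gain log2-discount form of DCG, and the treatment of unjudged documents as non-relevant are all assumptions, and the document identifiers are made up.

```python
import math


def precision_at_n(ranking, qrels, n):
    """Fraction of the top-n documents with a positive judgment.

    Unjudged documents are treated as non-relevant (an assumption).
    """
    return sum(1 for doc in ranking[:n] if qrels.get(doc, 0) > 0) / n


def dcg_at_n(gains, n):
    """Discounted cumulative gain with linear gains and a log2 rank discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:n]))


def ndcg_at_n(ranking, qrels, n):
    """DCG of the ranking divided by the DCG of an ideal ordering of the judged docs."""
    gains = [qrels.get(doc, 0) for doc in ranking]
    ideal_dcg = dcg_at_n(sorted(qrels.values(), reverse=True), n)
    return dcg_at_n(gains, n) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one topic (assumed scale: 0 / 1 / 2).
qrels = {"d1": 2, "d2": 1, "d3": 0, "d4": 2}
ranking = ["d1", "d3", "d2", "d5", "d4"]  # "d5" is unjudged

print(precision_at_n(ranking, qrels, 5))
print(ndcg_at_n(ranking, qrels, 5))
```

The per-topic values would then be averaged over all topics in a round to obtain table entries of the kind shown above.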
Fig. 6. Median average precision (AP) scores over all runs submitted to a given round. The topics on the x-axis are sorted by decreasing median AP.
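The quantity plotted in Fig. 6 can be sketched as follows: compute AP for every run on a topic, take the median over runs, and sort topics by that median. This is a minimal illustration assuming binary relevance (any relevant grade counts) and made-up run names, topics, and document identifiers; it is not the code used to produce the figure.

```python
from statistics import median


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def median_ap_by_topic(runs, relevant_by_topic):
    """Return (topic, median AP over runs) pairs, sorted by decreasing median AP.

    runs maps a run name to {topic: ranked list of document ids}.
    """
    medians = {}
    for topic, relevant in relevant_by_topic.items():
        scores = [average_precision(run.get(topic, []), relevant) for run in runs.values()]
        medians[topic] = median(scores)
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two topics, three runs.
relevant_by_topic = {1: {"a", "b"}, 2: {"x"}}
runs = {
    "runA": {1: ["a", "b", "c"], 2: ["y", "x"]},
    "runB": {1: ["c", "a", "b"], 2: ["x", "y"]},
    "runC": {1: ["a", "c", "b"], 2: ["y", "z"]},
}
result = median_ap_by_topic(runs, relevant_by_topic)
print(result)
```

Using the median rather than the mean keeps a few very strong or very weak runs from dominating the per-topic difficulty estimate.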