Jinghang Gu¹, Rong Xiang², Xing Wang³, Jing Li², Wenjie Li², Longhua Qian⁴, Guodong Zhou⁴, Chu-Ren Huang⁵.
Abstract
BACKGROUND: The COVID-19 pandemic has markedly accelerated the publication of scientific literature. Efficiently curating and indexing this large volume of biomedical literature during the current crisis is therefore of great importance. Literature indexing has traditionally been performed by human experts using Medical Subject Headings (MeSH), which is labor-intensive and time-consuming. To reduce this time and monetary cost, there is an urgent need for automatic semantic indexing technologies for the emerging COVID-19 domain.
Keywords: Biomedical semantic indexing; COVID-19; Deep learning; Topic identification
Year: 2022 PMID: 35768777 PMCID: PMC9241329 DOI: 10.1186/s12859-022-04803-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 An example of MeSH semantic indexing, taken from PubMed
Fig. 2 The construction framework of the CovSI corpus
The attribute statistics in the CovSI corpus
| Attribute name | Count |
|---|---|
| PMID | 87,207 |
| PMCID | 46,487 |
| Title | 87,192 |
| Abstract | 87,162 |
| Body Text | 45,968 |
| MeSH Terms | 1,161,962 |
| MeSH Identifiers | 1,161,962 |
| Journal Name | 87,207 |
| Year | 87,207 |
| Authors | 87,128 |
| Affiliations | 83,749 |
| Keywords | 35,928 |
| Chemicals | 43,711 |
| DOI | 77,776 |
| URL | 87,207 |
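As a quick reading aid for the attribute table, the sketch below expresses each optional attribute as a share of all CovSI records (every record has a PMID, journal name, year and URL, so 87,207 is the denominator). The counts are copied from the table; the selection of attributes and the percentage framing are illustrative choices, not part of the original analysis.

```python
from typing import Dict

# Every CovSI record has a PMID, journal name, year and URL (87,207 each).
TOTAL_RECORDS = 87_207

# Counts copied from the attribute table above.
ATTRIBUTE_COUNTS = {
    "PMCID": 46_487,
    "Title": 87_192,
    "Abstract": 87_162,
    "Body Text": 45_968,
    "Authors": 87_128,
    "Affiliations": 83_749,
    "Keywords": 35_928,
    "Chemicals": 43_711,
    "DOI": 77_776,
}


def coverage(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Percentage of records that carry each attribute."""
    return {name: 100.0 * n / total for name, n in counts.items()}


cov = coverage(ATTRIBUTE_COUNTS, TOTAL_RECORDS)
for name, pct in sorted(cov.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<13}{pct:6.2f}%")
```

Read this way, titles and abstracts are almost universal, while body text is available for only about 52.7% of the records.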
Statistics of the CovSI training, development, and test sets
| Type | Training set | Development set | Test set |
|---|---|---|---|
| #Articles | 71,207 | 8,000 | 8,000 |
| #MeSH term types | 17,758 | 9,035 | 8,991 |
| #Total terms | 945,462 | 106,088 | 110,412 |
| #Average terms per article | 13.28 | 13.26 | 13.80 |
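The split statistics can be cross-checked against the attribute table: the three splits add up to the 87,207 PMIDs, and their #Total terms add up to the 1,161,962 MeSH Terms listed there, which suggests that count refers to term assignments rather than distinct headings. A minimal check, with the numbers copied from the two tables:

```python
# Split sizes copied from the CovSI dataset statistics table.
SPLITS = {
    "train": {"articles": 71_207, "terms": 945_462},
    "dev": {"articles": 8_000, "terms": 106_088},
    "test": {"articles": 8_000, "terms": 110_412},
}


def split_summary(splits):
    """Return total articles, total term assignments and per-split averages."""
    total_articles = sum(s["articles"] for s in splits.values())
    total_terms = sum(s["terms"] for s in splits.values())
    averages = {
        name: round(s["terms"] / s["articles"], 2) for name, s in splits.items()
    }
    return total_articles, total_terms, averages


articles, terms, averages = split_summary(SPLITS)
print(articles, terms, averages)
```

The recomputed averages (13.28, 13.26, 13.80) agree with the last row of the table.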
Fig. 3 The framework of the multi-probe attention neural network (MPANN)
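Fig. 3 is not reproduced in this record. For readers unfamiliar with probe-based attention, the following is a generic sketch of the basic operation, assuming each "probe" is a vector that serves as an attention query over an article's token representations (scaled dot-product attention). It is not the MPANN architecture: learned projections, the journal and term inputs, and the dynamic topic probes are omitted, and the function names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def probe_attention(probes, tokens):
    """Let every probe vector attend over the token vectors.

    probes: P vectors of size d; tokens: T vectors of size d.
    Returns P context vectors, each a softmax-weighted sum of the tokens.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    outputs = []
    for probe in probes:
        # Scaled dot-product score between this probe and every token.
        scores = [sum(p * t for p, t in zip(probe, tok)) / scale for tok in tokens]
        weights = softmax(scores)
        # Weighted sum of token vectors, dimension by dimension.
        outputs.append([sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(d)])
    return outputs


# Toy sizes: 30 probes (the dynamic topic probe size in the hyper-parameter table),
# but d=8 and 16 tokens instead of 200 and 512 so the pure-Python loops stay fast.
random.seed(0)
d, n_tokens, n_probes = 8, 16, 30
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]
probes = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_probes)]
contexts = probe_attention(probes, tokens)
print(len(contexts), len(contexts[0]))  # 30 8
```

Each probe yields one summary vector of the document, so several probes can focus on different parts of the text; a zero probe, for instance, weights all tokens equally and returns their mean.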
The settings of the hyper-parameters
| Parameter | Value |
|---|---|
| Batch size | 10 |
| Word embedding size | 200 |
| Sequence length | 512 |
| Transformer hidden size | 200 |
| Candidate term size | 400 |
| Term embedding size | 200 |
| Journal embedding size | 200 |
| Dynamic topic probe size | 30 |
| Dynamic probe embedding size | 200 |
| Linear layer size | 200 |
| Dropout rate | 0.3 |
| Learning rate | 0.00001 |
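Collecting these settings in one immutable config object is a simple way to keep a training script in sync with the reported table. The layout below is a hypothetical sketch, not the authors' code; the field names are invented to mirror the table rows. It also derives the number of batches per epoch on the CovSI training set, assuming the final partial batch is kept.

```python
import math
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MPANNConfig:
    """Values from the hyper-parameter table; field names are hypothetical."""
    batch_size: int = 10
    word_embedding_size: int = 200
    sequence_length: int = 512
    transformer_hidden_size: int = 200
    candidate_term_size: int = 400
    term_embedding_size: int = 200
    journal_embedding_size: int = 200
    dynamic_topic_probe_size: int = 30
    dynamic_probe_embedding_size: int = 200
    linear_layer_size: int = 200
    dropout_rate: float = 0.3
    learning_rate: float = 1e-5


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


cfg = MPANNConfig()
# 71,207 CovSI training articles in batches of 10.
print(steps_per_epoch(71_207, cfg.batch_size))  # 7121
```

Making the dataclass frozen prevents a script from silently overriding a reported value at runtime.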
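Comparison of different systems on the CovSI test set (EBP/EBR/EBF: example-based precision, recall and F-score; MaP/MaR/MaF: macro-averaged precision, recall and F-score; MiP/MiR/MiF: micro-averaged precision, recall and F-score)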
| Model | EBP (%) | EBR (%) | EBF (%) | MaP (%) | MaR (%) | MaF (%) | MiP (%) | MiR (%) | MiF (%) |
|---|---|---|---|---|---|---|---|---|---|
| MPANN | 87.41 | 63.52 | | 97.03 | 50.44 | 55.02 | 88.62 | 62.78 | |
| BioTrans | 87.02 | 62.74 | 70.47 | 97.17 | 47.63 | 52.23 | 87.99 | 61.92 | 72.68 |
| AttentionMeSH | 81.18 | 54.52 | 63.08 | 88.51 | 51.15 | 54.36 | 81.57 | 53.48 | 64.60 |
| FullMeSH | 88.40 | 51.92 | 63.29 | 95.11 | 57.56 | 60.47 | 88.44 | 51.92 | 65.43 |
| MeSHProbeNet-P | 82.81 | 54.36 | 65.64 | 95.64 | 57.66 | | 83.33 | 57.14 | 67.79 |
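The nine columns are the flat measures used in BioASQ semantic indexing evaluation. As a minimal sketch of how they differ, assuming one gold and one predicted set of MeSH terms per article, the function below computes all nine. Two conventions are assumptions here and may differ from the official BioASQ evaluation tool: an empty denominator scores 0, and macro averaging runs over the union of gold and predicted labels.

```python
from typing import Dict, List, Set

NAMES = ["EBP", "EBR", "EBF", "MaP", "MaR", "MaF", "MiP", "MiR", "MiF"]


def _safe_div(num: float, den: float) -> float:
    """num / den, scoring an empty denominator as 0.0 (assumed convention)."""
    return num / den if den else 0.0


def flat_measures(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Example-based, macro- and micro-averaged P/R/F, in percent."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)

    # Example-based: score each article, then average over articles.
    ebp = sum(_safe_div(len(g & p), len(p)) for g, p in zip(gold, pred)) / n
    ebr = sum(_safe_div(len(g & p), len(g)) for g, p in zip(gold, pred)) / n
    ebf = sum(_safe_div(2 * len(g & p), len(g) + len(p)) for g, p in zip(gold, pred)) / n

    # Per-label true positive, false positive and false negative counts.
    labels = set().union(*gold, *pred)
    tp = {lab: 0 for lab in labels}
    fp = {lab: 0 for lab in labels}
    fn = {lab: 0 for lab in labels}
    for g, p in zip(gold, pred):
        for lab in g & p:
            tp[lab] += 1
        for lab in p - g:
            fp[lab] += 1
        for lab in g - p:
            fn[lab] += 1

    # Macro: score each label, then average over labels.
    k = len(labels)
    map_ = _safe_div(sum(_safe_div(tp[x], tp[x] + fp[x]) for x in labels), k)
    mar = _safe_div(sum(_safe_div(tp[x], tp[x] + fn[x]) for x in labels), k)
    maf = _safe_div(sum(_safe_div(2 * tp[x], 2 * tp[x] + fp[x] + fn[x]) for x in labels), k)

    # Micro: pool the counts over all labels, then score once.
    t, f_pos, f_neg = sum(tp.values()), sum(fp.values()), sum(fn.values())
    mip = _safe_div(t, t + f_pos)
    mir = _safe_div(t, t + f_neg)
    mif = _safe_div(2 * mip * mir, mip + mir)

    values = [ebp, ebr, ebf, map_, mar, maf, mip, mir, mif]
    return {name: 100.0 * v for name, v in zip(NAMES, values)}


gold = [{"COVID-19", "Humans", "Pandemics"}, {"Humans", "SARS-CoV-2"}]
pred = [{"COVID-19", "Humans"}, {"Humans", "Vaccines"}]
scores = flat_measures(gold, pred)
print({name: round(v, 2) for name, v in scores.items()})
```

In this toy example the two frequent labels are predicted perfectly, so micro-averaged scores stay high, while the three labels that are missed or spurious drag the macro averages down to 40%. This is the same pattern the table shows, where MaR and MaF sit well below their micro-averaged counterparts.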
The ablation experiments of MPANN
| Model name | EBP (%) | EBR (%) | EBF (%) | MaP (%) | MaR (%) | MaF (%) | MiP (%) | MiR (%) | MiF (%) | |
|---|---|---|---|---|---|---|---|---|---|---|
| MPANN | 87.41 | | | 97.03 | 50.44 | | 88.62 | | | |
| - Context-Term Attention | 94.86 | 52.90 | 64.25 | 98.70 | 41.51 | 46.28 | 93.50 | 52.97 | 67.64 | |
| - Journal-Term Attention | 90.66 | 60.78 | 70.28 | 98.36 | 47.63 | 52.58 | 91.55 | 59.71 | 72.28 | |
| - Journal-Context Attention | 86.59 | 59.85 | 68.29 | 96.51 | 47.78 | 52.29 | 88.39 | 59.32 | 70.99 | |
| - Journal-Topic Attention | 87.61 | 62.29 | 70.36 | 97.06 | 49.40 | 54.01 | 88.88 | 61.60 | 72.77 | |
| - Context-Topic Attention | 84.91 | 62.47 | 69.64 | 96.02 | | 54.87 | 86.57 | 62.05 | 72.29 | |
| MPANN- | 41.89 | 54.76 | 36.06 | 40.78 | 42.81 | 59.60 | ||||
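Because micro-averaged F is the harmonic mean of micro precision and recall, the MiF column can be cross-checked against the MiP and MiR columns. The sketch below recomputes MiF for the ablation rows whose micro-averaged values are complete; differences of about 0.01 are expected because the inputs are already rounded to two decimals.

```python
def harmonic_f(p: float, r: float) -> float:
    """F-score as the harmonic mean of precision and recall (same units as inputs)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# (MiP, MiR, reported MiF) in percent, copied from the ablation table.
ABLATION_ROWS = {
    "- Context-Term Attention": (93.50, 52.97, 67.64),
    "- Journal-Term Attention": (91.55, 59.71, 72.28),
    "- Journal-Context Attention": (88.39, 59.32, 70.99),
    "- Journal-Topic Attention": (88.88, 61.60, 72.77),
    "- Context-Topic Attention": (86.57, 62.05, 72.29),
}

for name, (mip, mir, reported) in ABLATION_ROWS.items():
    recomputed = harmonic_f(mip, mir)
    print(f"{name:<30} recomputed {recomputed:6.2f}  reported {reported:6.2f}")
```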
Fig. 4 MiF performance with different settings of hyper-parameter M
Fig. 5 MiF performance with different settings of hyper-parameter N
Comparison with state-of-the-art systems on the BioASQ test set
| System | EBP(%) | EBR(%) | EBF(%) | MaP(%) | MaR(%) | MaF(%) | MiP(%) | MiR(%) | MiF(%) |
|---|---|---|---|---|---|---|---|---|---|
| deepmesh_dmiip_fdu | 72.51 | 70.10 | 59.34 | 72.02 | |||||
| NLM System 3 | 71.28 | 67.87 | 67.74 | 69.22 | 54.67 | 54.53 | 71.01 | 65.94 | 68.37 |
| attention_dmiip_fdu | 68.40 | 65.65 | 65.40 | 65.53 | 55.84 | 55.06 | 67.95 | 63.87 | 65.84 |
| MTI First Line Index | 69.39 | 63.58 | 64.50 | 65.43 | 57.33 | 55.38 | 68.21 | 61.52 | 64.69 |
| Default MTI | 64.54 | 67.28 | 64.02 | 61.17 | 56.54 | 63.76 | 65.11 | 64.42 | |
| NLM CNN | 68.03 | 62.11 | 62.86 | 63.02 | 45.81 | 46.10 | 67.30 | 60.75 | 63.85 |
| pi_dna_3 | 65.73 | 62.45 | 62.14 | 55.50 | 50.37 | 48.45 | 65.01 | 60.75 | 62.80 |
| bert_dna | 61.31 | 55.15 | 56.02 | 48.86 | 38.52 | 37.05 | 60.57 | 53.90 | 57.03 |
| iria-1 | 41.70 | 55.25 | 46.36 | 38.92 | 39.16 | 35.14 | 42.11 | 53.89 | 47.28 |
| MPANN | 72.13 | 59.73 | 63.70 | 68.13 | 52.62 | 53.20 | 72.02 | 58.56 | 64.59 |
Detailed performance of MPANN on each weekly batch of the BioASQ test set
| Batch | #Articles | EBP(%) | EBR(%) | EBF(%) | MaP(%) | MaR(%) | MaF(%) | MiP(%) | MiR(%) | MiF(%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Test batch 1, week 1 | 7,967 | 73.36 | 60.56 | 64.78 | 68.88 | 53.52 | 54.15 | 73.33 | 59.41 | 65.64 |
| Test batch 1, week 2 | 10,053 | 71.32 | 58.97 | 62.76 | 67.72 | 53.04 | 53.65 | 71.06 | 57.58 | 63.61 |
| Test batch 1, week 3 | 4,870 | 72.04 | 60.31 | 64.05 | 67.49 | 52.43 | 52.86 | 71.92 | 59.06 | 64.86 |
| Test batch 1, week 4 | 5,758 | 71.81 | 58.79 | 62.94 | 68.15 | 52.29 | 53.08 | 71.87 | 57.52 | 63.90 |
| Test batch 1, week 5 | 5,770 | 70.81 | 60.34 | 63.48 | 66.77 | 52.13 | 52.25 | 70.43 | 59.41 | 64.45 |
| Test batch 2, week 1 | 6,376 | 72.11 | 58.15 | 62.60 | 68.40 | 51.11 | 52.02 | 72.04 | 57.00 | 63.64 |
| Test batch 2, week 2 | 9,101 | 70.93 | 58.29 | 62.31 | 67.74 | 53.08 | 53.63 | 70.99 | 57.48 | 63.52 |
| Test batch 2, week 3 | 7,013 | 71.88 | 58.30 | 62.81 | 68.05 | 52.74 | 53.41 | 71.81 | 57.25 | 63.71 |
| Test batch 2, week 4 | 6,070 | 73.35 | 59.64 | 64.17 | 69.31 | 53.35 | 53.93 | 73.04 | 58.51 | 64.98 |
| Test batch 2, week 5 | 6,151 | 73.14 | 61.17 | 64.94 | 68.70 | 53.02 | 53.76 | 73.16 | 59.84 | 65.83 |
| Test batch 3, week 1 | 5,890 | 73.31 | 60.09 | 64.59 | 68.69 | 51.59 | 52.46 | 73.31 | 59.20 | 65.51 |
| Test batch 3, week 2 | 10,818 | 72.61 | 59.10 | 63.55 | 68.27 | 53.05 | 53.97 | 72.38 | 57.84 | 64.30 |
| Test batch 3, week 3 | 4,022 | 71.43 | 60.79 | 64.07 | 68.46 | 50.86 | 51.38 | 71.61 | 59.50 | 65.00 |
| Test batch 3, week 4 | 5,373 | 72.89 | 60.93 | 64.80 | 68.25 | 52.82 | 53.35 | 72.62 | 59.45 | 65.38 |
| Test batch 3, week 5 | 5,325 | 70.93 | 60.53 | 63.63 | 67.08 | 54.26 | 54.17 | 70.70 | 59.36 | 64.54 |
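Because example-based precision is an average over articles, the per-batch EBP values can be pooled by weighting each batch by its article count. Under that assumption, the sketch below gives 72.13% over 100,557 articles, the same EBP reported for MPANN in the BioASQ comparison table. The same pooling does not apply to macro- or micro-averaged measures, which depend on per-label counts that this table does not report.

```python
# (number of articles, EBP %) per weekly batch, copied from the table above.
BATCHES = [
    (7_967, 73.36), (10_053, 71.32), (4_870, 72.04), (5_758, 71.81), (5_770, 70.81),
    (6_376, 72.11), (9_101, 70.93), (7_013, 71.88), (6_070, 73.35), (6_151, 73.14),
    (5_890, 73.31), (10_818, 72.61), (4_022, 71.43), (5_373, 72.89), (5_325, 70.93),
]


def weighted_mean(pairs):
    """Return the total weight and the weight-averaged score."""
    total = sum(n for n, _ in pairs)
    return total, sum(n * score for n, score in pairs) / total


n_articles, overall_ebp = weighted_mean(BATCHES)
print(n_articles, round(overall_ebp, 2))  # 100557 72.13
```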