Maryam Habibi, Leon Weber, Mariana Neves, David Luis Wiegandt, Ulf Leser.
Abstract
MOTIVATION: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult.
Year: 2017 PMID: 28881963 PMCID: PMC5870729 DOI: 10.1093/bioinformatics/btx228
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. CRF-LSTM architecture. For instance, for the word ‘SH3’ from the input sentence S, the character-based representation is computed by applying a bi-directional LSTM to the sequence of its characters ‘S’, ‘H’, ‘3’. The resulting embedding is concatenated with the corresponding word embedding, trained on a huge corpus. This word representation is then processed by another bi-directional LSTM and finally by a CRF layer. The output is the most probable tag sequence, as estimated by the CRF.
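The last step in the caption, choosing the most probable tag sequence under the CRF, is commonly done with Viterbi decoding. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes that the bi-directional LSTM has already produced one emission score per token and tag, and that the CRF holds one transition score per tag pair. The function name `viterbi_decode`, the three-tag set and all scores are hypothetical illustrations.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][tag] is the (assumed) LSTM score of `tag` at position t.
    transitions[(prev, cur)] is the score of moving from prev to cur;
    pairs that are not listed count as 0.0.
    """
    if not emissions:
        return [], 0.0

    # Best score of any path that ends in each tag at position 0.
    best = {tag: emissions[0][tag] for tag in tags}
    backpointers: List[Dict[str, str]] = []

    for t in range(1, len(emissions)):
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the predecessor tag that maximizes the path score.
            prev_tag, score = max(
                ((prev, best[prev] + transitions.get((prev, cur), 0.0)) for prev in tags),
                key=lambda pair: pair[1],
            )
            new_best[cur] = score + emissions[t][cur]
            pointers[cur] = prev_tag
        best = new_best
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag to the start.
    last_tag = max(best, key=best.get)
    path = [last_tag]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last_tag]


# Hypothetical scores for three tokens; "I" directly after "O" is penalized.
demo_tags = ["O", "B", "I"]
demo_emissions = [
    {"O": 2.0, "B": 0.5, "I": 0.0},
    {"O": 0.5, "B": 1.0, "I": 1.5},
    {"O": 0.3, "B": 0.1, "I": 1.0},
]
demo_transitions = {("O", "I"): -10.0}
print(viterbi_decode(demo_emissions, demo_transitions, demo_tags))
```

In this toy input the second token scores highest for `I` when looked at in isolation, but the penalty on the `O` to `I` transition makes the decoder prefer `O`, `B`, `I`. This is the kind of sequence-level consistency that a CRF layer adds on top of per-token scores.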
The details of the gold standard corpora
| Corpus | Text genre | Text type | Entity type | No. sentences | No. tokens | No. unique tokens | No. annotations | No. unique annotations |
|---|---|---|---|---|---|---|---|---|
| CHEMDNER patent | Patent | Abstract | Chemicals | 35 584 | 1 465 471 | 62 344 | 64 026 | 20 893 |
| | | | Genes/proteins | 35 584 | 1 465 471 | 62 344 | 12 597 | 5835 |
| CHEBI | Patent | Full-text | Chemicals | 10 638 | 314 105 | 24 424 | 17 724 | 5109 |
| BioSemantics | Patent | Full-text | Chemicals | 173 808 | 5 690 518 | 208 326 | 368 091 | 72 756 |
| CHEMDNER | Scientific article | Abstract | Chemicals | 89 679 | 2 235 435 | 114 837 | 79 842 | 24 321 |
| CDR | Scientific article | Abstract | Chemicals | 14 228 | 323 281 | 23 068 | 15 411 | 3629 |
| | | | Diseases | 14 228 | 323 281 | 23 068 | 12 694 | 3459 |
| BioCreative II GM | Scientific article | Abstract | Genes/proteins | 20 510 | 508 257 | 50 864 | 20 703 | 14 906 |
| JNLPBA | Scientific article | Abstract | Genes/proteins | 22 562 | 597 333 | 25 046 | 35 460 | 10 732 |
| | | | Cell Lines | 22 562 | 597 333 | 25 046 | 4332 | 2528 |
| CellFinder | Scientific article | Full-text | Genes/proteins | 2177 | 65 031 | 7977 | 1340 | 600 |
| | | | Species | 2177 | 65 031 | 7977 | 435 | 51 |
| | | | Cell Lines | 2177 | 65 031 | 7977 | 350 | 69 |
| OSIRIS | Scientific article | Abstract | Genes/proteins | 1043 | 28 697 | 4669 | 768 | 275 |
| DECA | Scientific article | Abstract | Genes/proteins | 5468 | 138 034 | 14 515 | 6048 | 2360 |
| Variome | Scientific article | Full-text | Genes/proteins | 6471 | 172 409 | 12 659 | 4309 | 596 |
| | | | Diseases | 6471 | 172 409 | 12 659 | 6025 | 629 |
| | | | Species | 6471 | 172 409 | 12 659 | 182 | 8 |
| PennBioIE | Scientific article | Abstract | Genes/proteins | 14 305 | 357 313 | 21 089 | 17 427 | 4348 |
| FSU-PRGE | Scientific article | Abstract | Genes/proteins | 35 465 | 899 426 | 57 498 | 56 087 | 19 470 |
| IEPA | Scientific article | Abstract | Genes/proteins | 243 | 15 174 | 2923 | 1109 | 209 |
| BioInfer | Scientific article | Abstract | Genes/proteins | 1141 | 33 832 | 5301 | 4162 | 1150 |
| miRNA | Scientific article | Abstract | Genes/proteins | 2704 | 65 998 | 7821 | 1006 | 409 |
| | | | Diseases | 2704 | 65 998 | 7821 | 2123 | 671 |
| | | | Species | 2704 | 65 998 | 7821 | 726 | 47 |
| NCBI Disease | Scientific article | Abstract | Diseases | 7639 | 174 487 | 12 128 | 6881 | 2192 |
| Arizona Disease | Scientific article | Abstract | Diseases | 2884 | 76 489 | 7358 | 3206 | 1188 |
| SCAI | Scientific article | Abstract | Diseases | 4332 | 104 015 | 12 558 | 2226 | 1048 |
| S800 | Scientific article | Abstract | Species | 7933 | 195 197 | 20 526 | 3646 | 1564 |
| LocText | Scientific article | Abstract | Species | 949 | 22 550 | 4371 | 276 | 39 |
| Linneaus | Scientific article | Full-text | Species | 17 788 | 473 148 | 34 396 | 4077 | 419 |
| CLL | Scientific article | Abstract, Full-text | Cell Lines | 215 | 6547 | 2468 | 341 | 311 |
| Gellus | Scientific article | Abstract, Full-text | Cell Lines | 11 221 | 278 910 | 25 628 | 640 | 237 |
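The five count columns of the corpus table can be reproduced for any annotated corpus with a few lines of code. The sketch below is an illustration only: it assumes each sentence is already tokenized and carries a list of annotated mention strings, and it counts unique items by exact, case-sensitive string match. The record layout and the name `corpus_statistics` are hypothetical, and the tokenizer and matching rules behind the published counts are not stated here.

```python
from typing import Dict, List


def corpus_statistics(sentences: List[Dict[str, List[str]]]) -> Dict[str, int]:
    """Count sentences, tokens and annotations, total and unique.

    Each sentence is a dict with a "tokens" list and a "mentions" list
    (an assumed layout for this example).
    """
    tokens = [tok for sent in sentences for tok in sent["tokens"]]
    mentions = [m for sent in sentences for m in sent["mentions"]]
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "unique_tokens": len(set(tokens)),
        "annotations": len(mentions),
        "unique_annotations": len(set(mentions)),
    }


# Two made-up sentences with gene/protein mentions.
toy_corpus = [
    {"tokens": ["SH3", "binds", "to", "Grb2", "."], "mentions": ["SH3", "Grb2"]},
    {"tokens": ["Grb2", "is", "an", "adaptor", "."], "mentions": ["Grb2"]},
]
print(corpus_statistics(toy_corpus))
```

The gap between the annotation count and the unique-annotation count indicates how often the same mention recurs in a corpus, for example 182 annotations but only 8 unique ones for species in Variome.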
Macro averaged performance values in terms of precision, recall and F1-score for CRF and LSTM-CRF methods with word embedding features: (i) Patent, (ii) PubMed-PMC and (iii) Wiki-PubMed-PMC
| Precision (%) | Recall (%) | F1-score (%) | |||||||
|---|---|---|---|---|---|---|---|---|---|
| (i) | (ii) | (iii) | (i) | (ii) | (iii) | (i) | (ii) | (iii) | |
| CRF | 82.71 | 84.49 | 71.98 | 73.07 | 76.36 | 77.98 | |||
| LSTM-CRF | 80.10 | 81.39 | 81.04 | 80.72 | 80.26 | 80.79 | |||
The highest values for each method are represented in bold.
Fig. 2. F1-scores of the baselines (B), the generic CRF (C) and the generic LSTM-CRF (L) for five entity types, each measured on 4–12 corpora. The score for each corpus per entity type is depicted by a specific colored circle.
Macro- and micro-averaged performance values in terms of precision, recall and F1-score for the baselines (B), the generic CRF method (C) and the generic LSTM-CRF method (L), computed over the corpora of each entity type
| a) Macro averaged performance | |||||||||
| Precision (%) | Recall (%) | F1-score (%) | |||||||
| B | C | L | B | C | L | B | C | L | |
| Chemicals | 79.8 | 82.82 | 80.73 | 79.26 | 80.22 | 82.32 | |||
| Diseases | 78.54 | 81.66 | 73.67 | 73.47 | 75.89 | 77.56 | |||
| Species | 72.75 | 80.84 | 79.63 | 76.15 | 73.03 | 82.09 | |||
| Gene/protein | 82.54 | 81.57 | 75.87 | 74.67 | 79.35 | 78.15 | |||
| Cell lines | 84.04 | 82.65 | 61.32 | 56.91 | 70.35 | 66.96 | |||
| Average | 80.38 | 81.77 | 75.13 | 73.26 | 76.61 | 78.04 | |||
| b) Micro averaged performance | |||||||||
| Precision (%) | Recall (%) | F1-score (%) | |||||||
| B | C | L | B | C | L | B | C | L | |
| Chemicals | 75.16 | 81.81 | 79.7 | 79.12 | 77.36 | 81.23 | |||
| Diseases | 79.29 | 82.56 | 75.98 | 77.17 | 77.60 | 80.26 | |||
| Species | 77.33 | 83.84 | 67.74 | 76.68 | 72.22 | 82.35 | |||
| Gene/protein | 81.34 | 81.50 | 77.45 | 78.14 | 79.35 | 79.92 | |||
| Cell lines | 70.20 | 68.59 | 57.20 | 57.60 | 63.68 | 63.28 | |||
| Average | 76.4 | 81.72 | 78.87 | 78.71 | 77.62 | 80.85 | |||
The highest values for each entity class are highlighted in bold.
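The two halves of this table differ only in how per-corpus results are combined. The sketch below shows one common reading, using made-up counts. Macro averaging takes the mean of the per-corpus scores, so every corpus weighs the same. Micro averaging pools true positives, false positives and false negatives first, so large corpora dominate. Whether the macro F1-score in the table is the mean of per-corpus F1-scores, as assumed here, is not stated in this excerpt, and all function names are hypothetical.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(counts: List[Counts]) -> Tuple[float, ...]:
    """Mean of the per-corpus scores: each corpus counts equally."""
    scores = [prf(*c) for c in counts]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


def micro_average(counts: List[Counts]) -> Tuple[float, float, float]:
    """Scores of the pooled counts: each annotation counts equally."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return prf(tp, fp, fn)


# One large, easy corpus and one small, hard corpus (invented numbers).
toy_counts = [(90, 10, 10), (5, 5, 15)]
print("macro:", macro_average(toy_counts))
print("micro:", micro_average(toy_counts))
```

With these toy counts the micro average is pulled toward the large corpus while the macro average sits midway between the two. The same effect would explain why the macro and micro rows of an entity type can differ noticeably when its corpora vary a lot in size.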
Fig. 3. Venn diagrams showing the overlap among the false-positive (FP) sets or the false-negative (FN) sets of the three methods: the baseline (B), the generic CRF method (C) and the generic LSTM-CRF method (L) per entity type.
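The region sizes of such a three-set Venn diagram follow directly from set operations. The sketch below assumes each error is identified by a hypothetical `(document id, start offset, end offset)` tuple, so that two methods share an error only when they make it on exactly the same span. The matching criterion behind the figure is not given in this excerpt, and the example sets are invented.

```python
from typing import Dict, Set, Tuple

Span = Tuple[str, int, int]  # (document id, start offset, end offset), assumed


def venn_regions(b: Set[Span], c: Set[Span], l: Set[Span]) -> Dict[str, int]:
    """Sizes of the seven disjoint regions of a three-set Venn diagram."""
    return {
        "B only": len(b - c - l),
        "C only": len(c - b - l),
        "L only": len(l - b - c),
        "B and C only": len((b & c) - l),
        "B and L only": len((b & l) - c),
        "C and L only": len((c & l) - b),
        "B and C and L": len(b & c & l),
    }


# Invented false-positive sets for the three methods.
fp_baseline = {("d1", 0, 3), ("d1", 10, 14), ("d2", 5, 9)}
fp_crf = {("d1", 10, 14), ("d2", 5, 9), ("d3", 1, 4)}
fp_lstm_crf = {("d2", 5, 9), ("d3", 1, 4), ("d3", 20, 25)}

regions = venn_regions(fp_baseline, fp_crf, fp_lstm_crf)
for name, size in regions.items():
    print(f"{name}: {size}")
```

Because the seven regions are disjoint, their sizes add up to the size of the union of the three sets, which is a quick sanity check on any such diagram.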
The average length of errors from the lists of FPs and FNs, and the F1-scores of single-token and multi-token entities measured for baselines (B), generic CRF methods (C) and generic LSTM-CRF methods (L) per entity type
| F1-score (%) | Mention length | ||||
|---|---|---|---|---|---|
| Single-Token | Multi-Token | FP | FN | ||
| Chemicals | L | 19.07 | 20.69 | ||
| C | 82.53 | 78.07 | 25.1 | 20.11 | |
| B | 76.77 | 76.84 | 19.13 | 16.49 | |
| Diseases | L | 14.51 | 15.58 | ||
| C | 84.20 | 74.92 | 15.09 | 14.66 | |
| B | 80.42 | 73.80 | 13.90 | 14.36 | |
| Species | L | 79.10 | 10.68 | 12.34 | |
| C | 85.57 | 76.75 | 11.55 | 11.52 | |
| B | 67.72 | 8.00 | 9.15 | ||
| Genes/Proteins | L | 12.85 | 14.13 | ||
| C | 84.77 | 69.32 | 13.50 | 13.52 | |
| B | 83.21 | 70.99 | 11.73 | 11.98 | |
| Cell Lines | L | 20.02 | 15.65 | ||
| C | 64.77 | 62.52 | 19.01 | 15.24 | |
| B | 62.55 | 64.24 | 18.41 | 13.91 | |
The highest F1-scores are emphasized in bold.
Fig. 4. Aggregated precision and recall of the generic LSTM-CRF method before (L) and after applying filters that remove 5% (L-5%), 10% (L-10%) and 15% (L-15%) of the predicted entities, per entity type. The highest averaged precision and recall per entity type obtained by the baselines or the generic CRF model are shown by the green and the orange dashed lines, respectively.
Precision, recall and F1-scores obtained for each corpus by the baselines and eight variants of the CRF and the LSTM-CRF methods
| Entity type | Corpus | Baseline | CRF | CRF (i) | CRF (ii) | CRF (iii) | LSTM-CRF | LSTM-CRF (i) | LSTM-CRF (ii) | LSTM-CRF (iii) | LSTM-CRF (iii) 5% | LSTM-CRF (iii) 10% | LSTM-CRF (iii) 15% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a) Precision scores | | | | | | | | | | | | | |
| Chemicals | CHEMDNER patent | 82.56 | 84.52 | 84.29 | 84.42 | 84.29 | 81.09 | 82.25 | 83.35 | 83.33 | 84.66 | 86.14 | 87.44 |
| | CHEBI | 71.46 | 80.59 | 78.56 | 78.68 | 79.67 | 71.58 | 71.1 | 72.95 | 69.5 | 71.3 | 72.85 | 74.57 |
| | BioSemantics | 72.56 | 82.17 | 82.1 | 81.93 | 81.96 | 80.64 | 82.07 | 79.72 | 80.88 | 82.4 | 83.81 | 85.11 |
| | CHEMDNER | 83.23 | 90.4 | 90.52 | 90.66 | 90.79 | 87.31 | 87.27 | 88.25 | 87.83 | 89.97 | 91.77 | 93.47 |
| | CDR | 89.19 | 91.04 | 91.47 | 91.81 | 92.37 | 88.19 | 91.12 | 92.18 | 92.57 | 94.61 | 95.98 | 97.16 |
| Genes/Proteins | CHEMDNER patent | 67.02 | 67.01 | 67.33 | 67.66 | 67.13 | 61.05 | 64.02 | 65.74 | 66.23 | 67.3 | 68.32 | 69.29 |
| | BioCreative II GM | 79.92 | 76.45 | 77.07 | 77.27 | 77.22 | 73.79 | 77.22 | 78.99 | 77.5 | 79.6 | 81.5 | 83.81 |
| | JNLPBA | 72.74 | 73.28 | 73.21 | 73.39 | 73.65 | 71.86 | 73.07 | 74.77 | 74.83 | 76.59 | 78.16 | 79.71 |
| | CellFinder | 81.98 | 74.39 | 77.27 | 84.05 | 83.77 | 78.49 | 82.01 | 83.56 | 88.73 | 91.19 | 93.12 | 95.28 |
| | OSIRIS | 83.33 | 77.92 | 76.88 | 81.38 | 78.61 | 74.07 | 76.52 | 76.73 | 78.02 | 80 | 81.73 | 82.74 |
| | DECA | 81.89 | 76.89 | 76.6 | 78.1 | 77.66 | 64.11 | 67.22 | 72.91 | 75.27 | 76.76 | 78.67 | 80.05 |
| | Variome | 90.32 | 90.08 | 89.56 | 89.92 | 90.12 | 87.63 | 87.64 | 86.59 | 87.47 | 89.41 | 90.72 | 91.33 |
| | PennBioIE | 88.12 | 88.14 | 88.32 | 89.5 | 89.58 | 83.31 | 86.51 | 88.02 | 86.97 | 89.34 | 91.21 | 93.11 |
| | FSU-PRGE | 87.79 | 87.04 | 86.71 | 87.33 | 87.2 | 81.69 | 85.09 | 86.03 | 87.26 | 89.7 | 91.51 | 92.88 |
| | IEPA | 90.68 | 88.28 | 87.08 | 90.11 | 88.89 | 91.01 | 84.67 | 86.6 | 88.01 | 89.21 | 89.35 | 90.76 |
| | BioInfer | 94.16 | 91.57 | 91.05 | 92.38 | 92.01 | 87.73 | 90.87 | 91.71 | 92.64 | 94.72 | 95.86 | 96.76 |
| | miRNA | 87.35 | 83.45 | 80.31 | 85.11 | 84.69 | 70.15 | 69.94 | 75 | 75.95 | 78.47 | 80.38 | 80.82 |
| Diseases | NCBI Disease | 80.11 | 84.81 | 84.25 | 84.87 | 85.18 | 81.7 | 86.16 | 86.43 | 85.31 | 87.49 | 89.1 | 90.4 |
| | CDR | 80.61 | 83.54 | 83.26 | 83.36 | 83.72 | 79.24 | 81.06 | 83.3 | 84.19 | 84.63 | 86.4 | 88.17 |
| | Variome | 78.42 | 85.42 | 84.68 | 85.17 | 85.28 | 83.13 | 83.42 | 85.12 | 84.51 | 86.21 | 87.62 | 88.89 |
| | Arizona Disease | 76.11 | 79.54 | 78.41 | 80.91 | 79.89 | 72.64 | 74.52 | 79.15 | 76.64 | 78.62 | 81.17 | 82.89 |
| | SCAI | 76.99 | 82.85 | 81.47 | 81.51 | 81.3 | 67.4 | 74.97 | 77.21 | 78.46 | 80.1 | 81.61 | 83.49 |
| | miRNA | 79.01 | 81.56 | 77.9 | 80.27 | 79.54 | 80.39 | 77.44 | 80.84 | 80.86 | 82.81 | 83.92 | 85.59 |
| Species | S800 | 77.66 | 77.33 | 76.55 | 76.8 | 76.8 | 69.67 | 73.39 | 72.67 | 74.55 | 75.95 | 77.27 | 78.99 |
| | CellFinder | 82 | 93.18 | 89.9 | 90.22 | 91.3 | 81.98 | 83.76 | 81.67 | 79.03 | 80.51 | 83.04 | 83.02 |
| | Variome | 32.17 | 82.98 | 81.25 | 83.33 | 83.33 | 50.46 | 66.27 | 62.92 | 59 | 61.05 | 62.22 | 64.71 |
| | LocText | 84.78 | 95.45 | 92.5 | 95.65 | 94.37 | 94.74 | 93.48 | 90.36 | 91.01 | 92.94 | 92.59 | 93.42 |
| | Linneaus | 87.4 | 98.26 | 96.21 | 97.73 | 97.63 | 93.23 | 91.54 | 92.49 | 93.57 | 95.27 | 96.38 | 96.99 |
| | miRNA | 72.54 | 92.67 | 85.06 | 94.19 | 94.16 | 91.67 | 89.29 | 86.13 | 87.88 | 92.95 | 94.59 | 94.29 |
| Cell lines | CLL | 83.33 | 71.88 | 73.97 | 77.78 | 81.97 | 76.47 | 82.43 | 77.63 | 86.84 | 87.67 | 91.3 | 90.77 |
| | Gellus | 93.18 | 93.6 | 86.25 | 93.55 | 93.42 | 70.89 | 85.64 | 90.23 | 89.53 | 91.46 | 91.61 | 94.56 |
| | CellFinder | 98.72 | 94.12 | 95.45 | 97.01 | 96.67 | 89.47 | 87.34 | 97.06 | 92.52 | 93.14 | 95.88 | 95.6 |
| | JNLPBA | 65.34 | 64.8 | 64.19 | 64.11 | 64.11 | 51.48 | 54.2 | 59.83 | 61.73 | 63.54 | 64.75 | 66.95 |
| b) Recall scores | | | | | | | | | | | | | |
| Chemicals | CHEMDNER patent | 84.53 | 83.09 | 83.99 | 84.14 | 84.14 | 86.23 | 90.41 | 87.72 | 87.53 | 85.28 | 82.19 | 78.8 |
| | CHEBI | 66.87 | 60.61 | 64.81 | 64.06 | 65.58 | 71.44 | 80.27 | 78.52 | 79 | 76.99 | 74.52 | 72.05 |
| | BioSemantics | 78.37 | 78.12 | 78.29 | 78.38 | 78.34 | 79.29 | 82.56 | 84.42 | 83.14 | 80.47 | 77.54 | 74.37 |
| | CHEMDNER | 85.08 | 78.27 | 80.51 | 81.58 | 81.8 | 79.8 | 84.96 | 83.17 | 85.45 | 83.36 | 80.54 | 77.48 |
| | CDR | 88.83 | 81.25 | 84.63 | 86.03 | 86.46 | 85.58 | 89.86 | 89.94 | 88.77 | 86.18 | 82.84 | 79.18 |
| Genes/Proteins | CHEMDNER patent | 69.85 | 64.19 | 67.69 | 68.88 | 68.17 | 69.9 | 77.76 | 76.74 | 74.34 | 72.84 | 70.04 | 67.1 |
| | BioCreative II GM | 75.21 | 67.97 | 69.69 | 70.83 | 70.89 | 72.46 | 77.72 | 78.16 | 78.13 | 76.19 | 73.9 | 71.78 |
| | JNLPBA | 72.77 | 73.31 | 73.93 | 74.96 | 75.14 | 75.32 | 79.11 | 79.22 | 79.82 | 77.62 | 75.04 | 72.27 |
| | CellFinder | 63.17 | 56.53 | 63.47 | 67.47 | 68.8 | 36 | 63.2 | 65.07 | 65.07 | 63.47 | 61.33 | 59.2 |
| | OSIRIS | 63.49 | 48.58 | 57.89 | 61.94 | 63.97 | 56.68 | 76.52 | 76.11 | 73.28 | 71.26 | 68.83 | 65.99 |
| | DECA | 58.24 | 55.12 | 57.7 | 61.25 | 61.09 | 67.68 | 73.45 | 66.82 | 66.16 | 64.08 | 62.21 | 59.78 |
| | Variome | 93.51 | 90.2 | 92.24 | 93.47 | 93.06 | 91.56 | 94.56 | 96.6 | 96.87 | 94.15 | 90.48 | 85.99 |
| | PennBioIE | 82.61 | 80.54 | 81.82 | 82.35 | 82.76 | 82.98 | 87.09 | 85.44 | 86.54 | 84.47 | 81.7 | 78.77 |
| | FSU-PRGE | 84.14 | 81.25 | 82.27 | 83.42 | 83.44 | 84.25 | 87.55 | 88.23 | 87.24 | 85.79 | 82.91 | 79.48 |
| | IEPA | 85.18 | 75.33 | 78.67 | 79 | 80 | 81 | 84.67 | 88.33 | 85.67 | 82.67 | 78.33 | 75.33 |
| | BioInfer | 89.44 | 88.42 | 89.98 | 90.06 | 90.14 | 90.06 | 92.72 | 91.78 | 89.59 | 87.01 | 83.41 | 79.5 |
| | miRNA | 72.93 | 42.76 | 54.77 | 56.54 | 58.66 | 49.82 | 80.57 | 79.51 | 78.09 | 75.97 | 73.85 | 69.96 |
| Diseases | NCBI Disease | 79.69 | 76.62 | 79.74 | 80.69 | 81.59 | 75.86 | 83.06 | 82.92 | 83.58 | 81.45 | 78.56 | 75.3 |
| | CDR | 78.36 | 73.21 | 76.47 | 79.15 | 79.17 | 76.98 | 83.15 | 83.38 | 82.79 | 81.54 | 78.87 | 76.01 |
| | Variome | 74.66 | 81.2 | 81.47 | 81.64 | 81.82 | 84.38 | 84.82 | 86.32 | 87.64 | 85 | 81.82 | 78.38 |
| | Arizona Disease | 59.3 | 57.64 | 62.63 | 64.75 | 64.3 | 65.05 | 70.8 | 70.65 | 72.47 | 70.65 | 69.14 | 66.72 |
| | SCAI | 70.6 | 48.45 | 59.6 | 61.02 | 62.01 | 56.07 | 76.55 | 70.34 | 70.48 | 68.22 | 65.82 | 63.56 |
| | miRNA | 79.41 | 65.05 | 73.18 | 71.11 | 71.97 | 70.93 | 78.37 | 76.64 | 75.26 | 73.36 | 70.42 | 67.82 |
| Species | S800 | 69.08 | 55.79 | 60.09 | 59.72 | 60.93 | 62.9 | 64.95 | 69.35 | 69.81 | 67.57 | 65.14 | 62.9 |
| | CellFinder | 87.85 | 77.36 | 83.96 | 78.3 | 79.25 | 85.85 | 92.45 | 92.45 | 92.45 | 89.62 | 87.74 | 83.02 |
| | Variome | 98.48 | 59.09 | 59.09 | 60.61 | 60.61 | 83.33 | 83.33 | 84.85 | 89.39 | 87.88 | 84.85 | 83.33 |
| | LocText | 86.66 | 68.48 | 80.43 | 71.74 | 72.83 | 78.26 | 93.48 | 81.52 | 88.04 | 85.87 | 81.52 | 77.17 |
| | Linneaus | 64.56 | 79.37 | 84.64 | 90.78 | 90.34 | 76.21 | 94.03 | 87.53 | 93.24 | 90.25 | 86.39 | 82.09 |
| | miRNA | 71.15 | 89.1 | 94.87 | 93.59 | 92.95 | 91.67 | 96.15 | 95.51 | 92.95 | 92.95 | 89.74 | 84.62 |
| Cell lines | CLL | 77.92 | 59.74 | 70.13 | 63.64 | 64.94 | 67.53 | 79.22 | 76.62 | 85.71 | 83.12 | 81.82 | 76.62 |
| | Gellus | 49.8 | 47.37 | 55.87 | 58.7 | 57.49 | 61.13 | 67.61 | 63.56 | 62.35 | 60.73 | 57.49 | 56.28 |
| | CellFinder | 60.63 | 25.81 | 33.87 | 52.42 | 46.77 | 41.13 | 55.65 | 79.84 | 79.84 | 76.61 | 75 | 70.16 |
| | JNLPBA | 56.95 | 55.31 | 57.13 | 59.14 | 58.47 | 63.44 | 67.94 | 66.7 | 64.98 | 63.54 | 61.34 | 59.9 |
| c) F1-scores | | | | | | | | | | | | | |
| Chemicals | CHEMDNER patent | 83.53 | 83.8 | 84.14 | 84.28 | 84.22 | 83.58 | 86.14 | 85.48 | 85.38 | 84.97 | 84.12 | 82.9 |
| | CHEBI | 69.09 | 69.18 | 71.03 | 70.63 | 71.94 | 71.51 | 75.41 | 75.63 | 73.95 | 74.03 | 73.67 | 73.29 |
| | BioSemantics | 75.35 | 80.1 | 80.15 | 80.12 | 80.11 | 79.96 | 82.32 | 82.01 | 81.99 | 81.42 | 80.55 | 79.27 |
| | CHEMDNER | 84.15 | 83.9 | 85.22 | 85.88 | 86.06 | 83.38 | 86.1 | 85.63 | 86.62 | 86.54 | 85.79 | 84.73 |
| | CDR | 89.01 | 85.87 | 87.92 | 88.83 | 89.31 | 86.87 | 90.48 | 91.05 | 90.63 | 90.2 | 88.93 | 87.25 |
| Genes/Proteins | CHEMDNER patent | 68.4 | 65.57 | 67.51 | 68.26 | 67.65 | 65.17 | 70.22 | 70.81 | 70.05 | 69.96 | 69.17 | 68.18 |
| | BioCreative II GM | 77.49 | 71.96 | 73.2 | 73.91 | 73.92 | 73.12 | 77.47 | 78.57 | 77.82 | 77.86 | 77.52 | 77.33 |
| | JNLPBA | 72.75 | 73.3 | 73.57 | 74.17 | 74.39 | 73.55 | 75.97 | 76.93 | 77.25 | 77.1 | 76.57 | 75.81 |
| | CellFinder | 71.35 | 64.24 | 69.69 | 74.85 | 75.55 | 49.36 | 71.39 | 73.16 | 75.08 | 74.84 | 73.95 | 73.03 |
| | OSIRIS | 72.06 | 59.85 | 66.05 | 70.34 | 70.54 | 64.22 | 76.52 | 76.42 | 75.57 | 75.37 | 74.73 | 73.42 |
| | DECA | 68.06 | 64.21 | 65.82 | 68.65 | 68.39 | 65.85 | 70.2 | 69.73 | 70.42 | 69.85 | 69.48 | 68.45 |
| | Variome | 91.88 | 90.14 | 90.88 | 91.66 | 91.57 | 89.55 | 90.97 | 91.32 | 91.93 | 91.72 | 90.6 | 88.58 |
| | PennBioIE | 85.27 | 84.17 | 84.94 | 85.78 | 86.04 | 83.14 | 86.8 | 86.71 | 86.75 | 86.84 | 86.19 | 85.34 |
| | FSU-PRGE | 85.92 | 84.05 | 84.43 | 85.33 | 85.27 | 82.95 | 86.3 | 87.12 | 87.25 | 87.7 | 87 | 85.66 |
| | IEPA | 87.84 | 81.29 | 82.66 | 84.19 | 84.21 | 85.71 | 84.67 | 87.46 | 86.82 | 85.81 | 83.48 | 82.33 |
| | BioInfer | 91.73 | 89.97 | 90.52 | 91.2 | 91.07 | 88.88 | 91.79 | 91.75 | 91.09 | 90.7 | 89.21 | 87.29 |
| | miRNA | 79.49 | 56.54 | 65.13 | 67.94 | 69.31 | 58.26 | 74.88 | 77.19 | 77 | 77.2 | 76.98 | 75 |
| Diseases | NCBI Disease | 79.89 | 80.51 | 81.94 | 82.73 | 83.35 | 78.67 | 84.58 | 84.64 | 84.44 | 84.36 | 83.5 | 82.16 |
| | CDR | 79.46 | 78.03 | 79.72 | 81.2 | 81.38 | 78.1 | 82.09 | 83.34 | 83.49 | 83.06 | 82.46 | 81.64 |
| | Variome | 76.49 | 83.26 | 83.04 | 83.37 | 83.51 | 83.75 | 84.11 | 85.71 | 86.05 | 85.6 | 84.62 | 83.3 |
| | Arizona Disease | 66.66 | 66.84 | 69.64 | 71.93 | 71.25 | 68.64 | 72.61 | 74.66 | 74.49 | 74.42 | 74.67 | 73.93 |
| | SCAI | 73.65 | 61.14 | 68.84 | 69.79 | 70.35 | 61.22 | 75.75 | 73.61 | 74.26 | 73.68 | 72.87 | 72.17 |
| | miRNA | 79.2 | 72.38 | 75.47 | 75.41 | 75.57 | 75.37 | 77.9 | 78.69 | 77.96 | 77.8 | 76.58 | 75.68 |
| Species | S800 | 73.11 | 64.82 | 67.33 | 67.19 | 67.95 | 66.11 | 68.91 | 70.97 | 72.1 | 71.51 | 70.69 | 70.03 |
| | CellFinder | 84.82 | 84.54 | 86.83 | 83.84 | 84.85 | 83.87 | 87.89 | 86.73 | 85.22 | 84.82 | 85.32 | 83.02 |
| | Variome | 48.49 | 69.03 | 68.42 | 70.18 | 70.18 | 62.86 | 73.83 | 72.26 | 71.08 | 72.05 | 71.79 | 72.85 |
| | LocText | 85.7 | 79.75 | 86.05 | 81.99 | 82.21 | 85.71 | 93.48 | 85.71 | 89.5 | 89.27 | 86.71 | 84.52 |
| | Linneaus | 74.26 | 87.81 | 90.05 | 94.13 | 93.84 | 83.86 | 92.77 | 89.94 | 93.4 | 92.7 | 91.11 | 88.92 |
| | miRNA | 71.83 | 90.85 | 89.7 | 93.89 | 93.55 | 91.67 | 92.59 | 90.58 | 90.34 | 92.95 | 92.11 | 89.19 |
| Cell lines | CLL | 80.54 | 65.25 | 72 | 70 | 72.46 | 71.72 | 80.79 | 77.12 | 86.27 | 85.33 | 86.3 | 83.1 |
| | Gellus | 64.91 | 62.9 | 67.81 | 72.14 | 71.18 | 65.65 | 75.57 | 74.58 | 73.51 | 72.99 | 70.65 | 70.56 |
| | CellFinder | 75.12 | 40.51 | 50 | 68.06 | 63.04 | 56.35 | 67.98 | 87.61 | 85.71 | 84.07 | 84.16 | 80.93 |
| | JNLPBA | 60.86 | 59.68 | 60.46 | 61.52 | 61.16 | 56.84 | 60.3 | 63.08 | 63.31 | 63.54 | 63 | 63.23 |
CRF and LSTM-CRF denote the CRF and the LSTM-CRF methods without word embedding features. The variants labeled with roman numerals use one of three word embeddings: (i) Patent, (ii) PubMed-PMC and (iii) Wiki-PubMed-PMC. The variants labeled with a percentage show the performance of the method after removing that percentage of the predicted entities with the lowest confidence values.
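The effect of the confidence filter on precision and recall can be illustrated with a small sketch. It assumes each predicted entity comes with a confidence value and a span identifier; how the confidence is derived from the model is not described in this excerpt, and the function names and toy data below are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def filter_by_confidence(predictions: List[Dict], drop_percent: int) -> List[Dict]:
    """Remove the drop_percent % of predictions with the lowest confidence."""
    n_drop = len(predictions) * drop_percent // 100
    ranked = sorted(predictions, key=lambda p: p["confidence"])
    return ranked[n_drop:]


def precision_recall(predictions: List[Dict], gold: Set[int]) -> Tuple[float, float]:
    """Precision and recall of the predicted spans against the gold spans."""
    predicted = {p["span"] for p in predictions}
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Ten toy predictions: span i has confidence (i + 1) / 10.
# Spans 0 and 2 are wrong; gold spans 100 and 101 are never predicted.
toy_predictions = [{"span": i, "confidence": (i + 1) / 10} for i in range(10)]
toy_gold = {1, 3, 4, 5, 6, 7, 8, 9, 100, 101}

for percent in (0, 10, 20):
    kept = filter_by_confidence(toy_predictions, percent)
    p, r = precision_recall(kept, toy_gold)
    print(f"drop {percent:2d}%: precision={p:.3f} recall={r:.3f}")
```

In this toy setting the first removed prediction is wrong, so precision rises at no cost, while the second is correct, so recall starts to fall. The table shows the same trade-off: precision increases and recall decreases from the 5% to the 15% column.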