Rodrique Kafando, Rémy Decoupes, Sarah Valentin, Lucile Sautot, Maguelonne Teisseire, Mathieu Roche.
Abstract
Here, we introduce ITEXT-BIO, an intelligent process for biomedical domain terminology extraction from textual documents and subsequent analysis. The proposed methodology consists of two complementary approaches: free and driven term extraction. The first is based on term extraction with statistical measures, while the second considers morphosyntactic variation rules to extract term variants from the corpus. The combination of two term extraction and analysis strategies is the keystone of ITEXT-BIO. These strategies enable term extraction and analysis either from a single corpus (intra-corpus) or from several corpora (inter-corpus). We assessed the two approaches, the corpus or corpora to be analysed and the type of statistical measures used. Our experimental findings revealed that the proposed methodology could be used: (1) to efficiently extract representative, discriminant and new terms from a given corpus or corpora, and (2) to provide quantitative and qualitative analyses on these terms regarding the study domain.
Keywords: Biomedical terminology; Intelligent analysis; Terminology extraction
Year: 2021 PMID: 34276970 PMCID: PMC8272612 DOI: 10.1007/s13755-021-00156-6
Source DB: PubMed Journal: Health Inf Sci Syst ISSN: 2047-2501
Statistics related to the COVID19-MOOD-data dataset
| Corpus | Number of documents | Average number of words per document | Standard deviation |
|---|---|---|---|
| Papers1-title | 9315 | 15 | ± 8 |
| Papers1-abstract | 9315 | 180 | ± 94 |
| Papers1-content | 9315 | 4639 | ± 359 |
| Papers2-title | 32322 | 13 | ± 10 |
| Papers2-abstract | 32322 | 168 | ± 88 |
| Papers2-content | 32322 | 4913 | ± 720 |
Table legend
| Abbreviations | Description |
|---|---|
| | Number of documents in the corpus |
| | Average number of words of a document in the corpus |
| | Corpus standard deviation |
| | Noun |
| | Matches singular and plural noun terms |
| | Adjective |
| | Proper noun |
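The three corpus statistics described above (number of documents, average number of words per document, standard deviation) can be computed along the following lines. This is a minimal sketch, not the authors' code: the `corpus_stats` helper and the toy documents are hypothetical, words are split on whitespace, and the population standard deviation is assumed because the source does not say which variant was used.

```python
import statistics

def corpus_stats(documents):
    """Return (number of documents, mean word count, standard deviation)."""
    # Assumption: a "word" is any whitespace-separated token.
    lengths = [len(doc.split()) for doc in documents]
    # Assumption: population standard deviation (pstdev); the source
    # does not specify population vs. sample.
    return len(lengths), statistics.mean(lengths), statistics.pstdev(lengths)

# Toy titles, invented for illustration only.
docs = [
    "Zika virus outbreak",
    "Public health response to influenza",
    "Case report",
]
n_docs, avg_words, std_words = corpus_stats(docs)
print(n_docs, round(avg_words, 2), round(std_words, 2))
```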
Fig. 1 The Free and Driven process for term extraction using BioTex and FASTR
Fig. 2 Proposed combination for term extraction
Example of BioTex output
| Terms | in_umls | Rank |
|---|---|---|
| Public health | 1 | 1602.3971 |
| Respiratory syndrome | 0 | 1481.9399 |
| Infectious disease | 1 | 1198.2317 |
| Virus infection | 1 | 1126.9083 |
| Influenza virus | 1 | 1023.8858 |
| Immune response | 1 | 1008.0362 |
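The `in_umls` flag in the sample output above makes it easy to separate terms already known to UMLS from candidate new terms. The following is a minimal sketch that assumes the output has been loaded into plain Python records; the `Term` type and the `split_by_umls` helper are hypothetical names and not part of BioTex.

```python
from typing import NamedTuple

class Term(NamedTuple):
    text: str
    in_umls: bool   # True when the flag in the in_umls column is 1
    score: float    # value reported in the "Rank" column

# Rows copied from the example BioTex output above.
ROWS = [
    Term("Public health", True, 1602.3971),
    Term("Respiratory syndrome", False, 1481.9399),
    Term("Infectious disease", True, 1198.2317),
    Term("Virus infection", True, 1126.9083),
    Term("Influenza virus", True, 1023.8858),
    Term("Immune response", True, 1008.0362),
]

def split_by_umls(terms):
    """Return (known, candidates), each sorted by descending score."""
    ordered = sorted(terms, key=lambda t: t.score, reverse=True)
    known = [t for t in ordered if t.in_umls]
    candidates = [t for t in ordered if not t.in_umls]
    return known, candidates

known, candidates = split_by_umls(ROWS)
print([t.text for t in candidates])
```

In this sample only "Respiratory syndrome" is flagged as absent from UMLS, so it is the single candidate returned.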
Fig. 3 Representative terms from Papers1
Best ranked terms extracted from Papers1 using F-TFIDF-C_M
| Title corpus: terms | Rank | Abstract corpus: terms | Rank | Content corpus: terms | Rank |
|---|---|---|---|---|---|
| Respiratory syncytial virus | 1.9880 | Public health | 1.9986 | Additional file | 1.9976 |
| Middle east respiratory syndrome coronavirus | 1.9846 | Infectious diseases | 1.9979 | Infectious disease | 1.997 |
| Systematic review | 1.9842 | Immune responses | 1.9976 | nk cells | 1.997 |
| Open access | 1.9819 | Influenza virus | 1.9976 | Health care | 1.996 |
| Zika virus | 1.9819 | t cells | 1.9975 | Endothelial cells | 1.9957 |
| Gene expression | 1.9795 | Virus infection | 1.9974 | Frequency domain | 1.9957 |
| Virology journal | 1.9788 | Respiratory tract | 1.9973 | Ebola virus | 1.9948 |
| Human coronavirus | 1.976 | Viral infections | 1.9969 | Influenza infection | 1.9943 |
| Case report | 1.9756 | RNA viruses | 1.9967 | Real-time rt-pcr | 1.9933 |
| Syncytial virus | 1.9752 | Acute respiratory syndrome | 1.9961 | Incubation period | 1.99325 |
| t cell | 1.9746 | 95percent ci | 1.996 | Health emergency | 1.9932 |
| Infectious bronchitis | 1.9726 | Ebola virus | 1.9945 | Index patient | 1.9932 |
| Sars coronavirus | 1.9723 | Influenza viruses | 1.9943 | Membrane rafts | 1.9931 |
| BMC public health | 1.9701 | Avian influenza | 1.9939 | pcr products | 1.9929 |
| t cells | 1.9689 | Respiratory tract infections | 1.9938 | 2c atpase | 1.9926 |
| Acute respiratory infection | 1.9672 | Health care | 1.9925 | b cell | 1.9924 |
| Mini review | 1.9636 | Hepatitis c | 1.9922 | Close contact | 1.9924 |
| Respiratory viral infections | 1.9636 | Type I | 1.9918 | Final dataset | 1.9922 |
| BMC public | 1.9625 | Cell line | 1.9914 | 3d8 scfv | 1.9921 |
| Ebola virus disease | 1.9592 | Spike protein | 1.9909 | Pol ii | 1.992 |
| Supplementary information | 1.9574 | Codon usage | 1.9908 | 3c pro | 1.992 |
| Community-acquired pneumonia | 1.9543 | Pandemic influenza | 1.9907 | Influenza pandemic | 1.9919 |
| Global health | 1.9543 | Endoplasmic reticulum | 1.9904 | Phylogenetic tree | 1.9918 |
| Peer review | 1.9543 | Saudi Arabia | 1.9904 | Protein vi | 1.9917 |
| Japanese encephalitis virus | 1.9512 | Innate immunity | 1.9903 | ag nps | 1.9916 |
| Innate immunity | 1.9488 | Porcine epidemic | 1.9903 | Influenza b | 1.99125 |
| Multiple sclerosis | 1.9488 | Global health | 1.9902 | ifn | 1.991 |
| Human rhinovirus | 1.9466 | Vaccine development | 1.9901 | ill patients | 1.9908 |
| Supplementary material | 1.9442 | Cell death | 1.9898 | Poly tail | 1.9908 |
| Cell entry | 1.9417 | Infectious disease | 1.9896 | Host range | 1.9906 |
| Coronavirus spike | 1.9417 | Peripheral blood | 1.9895 | Cyclin d3 | 1.9903 |
| Human adenovirus | 1.9417 | Hong Kong | 1.9894 | Sequence accession | 1.9903 |
| East respiratory syndrome coronavirus | 1.9414 | Immune cells | 1.9888 | Antiviral drugs | 1.9897 |
| Mers coronavirus | 1.9388 | Cell cycle | 1.9886 | Subunit vaccines | 1.9897 |
| West Africa | 1.9388 | Clinical trials | 1.9885 | Protein sequences | 1.9895 |
| Molecular epidemiology | 1.9323 | Infection control | 1.9884 | Oil spill | 1.9895 |
| National natural science | 1.931 | Mass spectrometry | 1.9883 | Swine flu | 1.9894 |
| Natural science foundation | 1.931 | Genome sequence | 1.9881 | Membrane proteins | 1.9893 |
| Rift valley fever | 1.931 | Clinical samples | 1.9877 | Contact tracing | 1.9891 |
| National natural science foundation | 1.9307 | Acute respiratory infections | 1.9874 | sars 3a | 1.9889 |
| Influenza infection | 1.9284 | Severe disease | 1.9868 | Critical care | 1.9888 |
| Protein response | 1.9284 | Hepatitis b | 1.9864 | hk-2 cells | 1.9888 |
| Science foundation | 1.9284 | Host response | 1.9864 | ap2 group | 1.9887 |
| Supplementary materials | 1.9284 | Type II | 1.9864 | prp sc | 1.9887 |
| Natural science | 1.9241 | Nucleic acids | 1.9862 | t-cell responses | 1.9887 |
| Respiratory syndrome coronavirus infection | 1.9241 | Surveillance systems | 1.9859 | DNA vaccines | 1.9886 |
| Influenza virus | 1.9212 | Influenza virus infection | 1.9852 | Reverse genetics | 1.9886 |
| Obstructive pulmonary disease | 1.92 | Antiviral drugs | 1.9851 | Health system | 1.9884 |
| Emerging microbes | 1.9193 | DNA vaccine | 1.9847 | | 1.9884 |
| Original research | 1.9193 | Influenza infection | 1.9845 | hcv infection | 1.9883 |
| Retrospective study | 1.9193 | Reference genes | 1.9842 | Lung cancer | 1.9879 |
| Phylogenetic analysis | 1.9153 | Cell types | 1.984 | Nucleocapsid protein | 1.9879 |
| Respiratory syndrome coronavirus | 1.9151 | b cell | 1.9835 | 3c protease | 1.9878 |
| Clinical characteristics | 1.9138 | Vaccine candidates | 1.9835 | tgev infection | 1.9878 |
| Mass spectrometry | 1.9138 | Host species | 1.9833 | cs dna | 1.9878 |
| National natural | 1.9138 | Respiratory viral infections | 1.9832 | Risk perception | 1.9875 |
| Rift valley | 1.9138 | Endothelial cells | 1.9829 | s1 protein | 1.9875 |
| Science china | 1.9138 | Sequence data | 1.9829 | Ring vaccination | 1.9875 |
| Valley fever | 1.9138 | DNA viruses | 1.9826 | Syrian hamster | 1.9873 |
| Respiratory virus infections | 1.913 | Host innate | 1.9826 | Wild mice | 1.9873 |
| Syndrome coronavirus | 1.9096 | Parainfluenza virus | 1.9824 | Yellow fever | 1.9873 |
| Classical swine fever virus | 1.9087 | Tract infections | 1.9822 | Climate change | 1.9873 |
| b cells | 1.9074 | South Korea | 1.9821 | Public health services | 1.9873 |
| Host response | 1.9074 | Acute respiratory infection | 1.9817 | Index patients | 1.9872 |
| Science foundation of china | 1.9074 | Reproduction number | 1.9816 | Small rna | 1.9872 |
| Viral proteins | 1.9074 | Surveillance system | 1.9816 | IC activity | 1.9871 |
| Virus disease | 1.9065 | Causative agent | 1.9813 | Ebola virus disease | 1.9868 |
| Clinical infectious diseases | 1.9048 | Multiple sclerosis | 1.9811 | RNA chaperone | 1.9867 |
| World health organization | 1.9048 | rsv infection | 1.9809 | Caco-2 cells | 1.9867 |
| Antiviral agents | 1.9001 | Cellular proteins | 1.9808 | m2 channel | 1.9865 |
| Cell culture | 1.9001 | West nile virus | 1.9806 | Overlapping genes | 1.9865 |
| Pulmonary disease | 1.9001 | Respiratory diseases | 1.9805 | Nasal mucosa | 1.9865 |
| Study protocol | 1.9001 | tgev infection | 1.9805 | Hepatitis e | 1.9865 |
| Dengue virus | 1.8946 | e protein | 1.9802 | Genetic drift | 1.9865 |
| Public health | 1.893 | Gene expression | 1.9801 | a7 gfp | 1.9865 |
| RNA replication | 1.8915 | Structural proteins | 1.9799 | Tumor cells | 1.9864 |
| Japanese encephalitis | 1.8902 | Acute respiratory tract | 1.9792 | Tanguticum nanoparticles | 1.9864 |
| Syndrome coronavirus infection | 1.8864 | Hand hygiene | 1.9792 | cfu ml | 1.9864 |
| Human respiratory syncytial virus | 1.8841 | Disease transmission | 1.9788 | Ward closure | 1.9861 |
| Synonymous codon usage | 1.8824 | Human rhinovirus | 1.9785 | Case definitions | 1.9861 |
| Clinical infectious | 1.8813 | Bacterial infections | 1.9781 | Richards model | 1.9861 |
| Health organization | 1.8813 | Cancer cells | 1.9781 | Epimedium koreanum | 1.9861 |
| Severe pneumonia | 1.8813 | DNA vaccines | 1.9777 | ms2 plp | 1.986 |
| Dengue virus infection | 1.8772 | Type III | 1.9777 | Gene therapy | 1.9859 |
| Clinical samples | 1.8768 | Viral pathogenesis | 1.9773 | Integrin b3 | 1.9859 |
| Classical swine fever | 1.8744 | Zoonotic diseases | 1.9773 | Cardiovascular diseases | 1.9859 |
| Human antibody | 1.869 | Early detection | 1.9765 | Fourth site | 1.9859 |
| Lassa virus | 1.869 | Lung cancer | 1.9756 | Serial interval | 1.9858 |
| Pilot study | 1.869 | Nile virus | 1.9756 | trm cells | 1.9858 |
| Avian influenza viruses | 1.8667 | Human disease | 1.9751 | Electronic supplementary material | 1.9857 |
| Human respiratory syncytial | 1.8667 | rnase l | 1.9751 | Emergency nurses | 1.9856 |
| International health regulations | 1.8667 | Health systems | 1.9746 | Pet substrate | 1.9856 |
| Hepatitis c virus infection | 1.8661 | Incubation period | 1.9746 | fcov type | 1.9856 |
| Infectious bronchitis virus strain | 1.8661 | Rabies virus | 1.9746 | s1 text | 1.9856 |
| Vaccine development | 1.8601 | Adaptive immunity | 1.9741 | Global health research | 1.9854 |
| Protects hepatocytes from type I | 1.8564 | Multiplex pcr | 1.9741 | ace2 activity | 1.9853 |
| Type I interferon signaling disrupts | 1.8564 | nk cells | 1.9741 | | 1.9853 |
| Adaptive immunity | 1.8538 | Feline coronavirus | 1.9735 | Global health | 1.9852 |
| Adenovirus type | 1.8538 | Human populations | 1.9735 | Ham tsp | 1.9851 |
| Nonhuman primates | 1.8538 | Common cold | 1.9723 | Blood culture | 1.9849 |
Best ranked terms extracted from Papers1 using C-Value
| Title corpus: terms | Rank | Abstract corpus: terms | Rank | Content corpus: terms | Rank |
|---|---|---|---|---|---|
| Respiratory syndrome | 386.7309 | Public health | 1393.182 | t cells | 2063.1457 |
| Virus infection | 366.1263 | Respiratory syndrome | 1095.2091 | Public health | 1644.7156 |
| Porcine epidemic diarrhea virus | 329.7138 | Infectious diseases | 952.5625 | Amino acid | 1409.82415 |
| Porcine epidemic diarrhea | 318.0 | Immune response | 908.1835 | Immune response | 1400.94835 |
| Epidemic diarrhea virus | 306.0 | Immune responses | 841.6151 | Influenza virus | 1185.8689 |
| East respiratory syndrome | 284.0 | Influenza virus | 841.6151 | Immune responses | 1056.536 |
| Middle east | 261.5188 | t cells | 803.576 | t cell | 1056.37753 |
| Epidemic diarrhea | 256.7639 | Virus infection | 760.7811 | Gene expression | 1050.6716 |
| Diarrhea virus | 245.6692 | Respiratory tract | 727.4978 | Viral replication | 1021.5083 |
| Infectious diseases | 245.6692 | Viral infection | 668.8542 | Infected cells | 939.72426 |
| Respiratory syndrome coronavirus | 240.0 | Viral replication | 665.6843 | Cell lines | 897.4057 |
| Influenza a | 225.0647 | Viral infections | 640.3249 | Viral infection | 888.6884 |
| Public health | 209.2151 | East respiratory syndrome | 638.0 | Virus infection | 872.68035 |
| Syndrome coronavirus | 191.7805 | Respiratory syndrome coronavirus | 636.0 | Amino acids | 866.816 |
| Porcine epidemic | 190.1955 | Middle east | 630.8151 | mg ml | 824.4975 |
| Influenza virus | 182.2707 | Gene expression | 627.6452 | Infectious diseases | 822.27855 |
| Respiratory tract | 180.6857 | Infectious disease | 613.3805 | Present study | 812.45177 |
| Middle east respiratory syndrome | 174.1446 | RNA viruses | 603.8707 | Respiratory tract | 812.13477 |
| Middle east respiratory | 170.0 | Present study | 575.3414 | Epithelial cells | 759.03855 |
| Respiratory syncytial virus | 166.0 | Respiratory viruses | 551.567 | Previous studies | 732.41119 |
| Infectious bronchitis | 160.0812 | Acute respiratory syndrome | 516.0 | Room temperature | 714.3426 |
| Infectious disease | 156.9113 | t cell | 513.5279 | Cell culture | 673.60907 |
| Infectious bronchitis virus | 156.0 | Syndrome coronavirus | 511.9429 | Additional file | 657.75946 |
| East respiratory | 136.3068 | Porcine epidemic diarrhea | 506.0 | Viral infections | 635.72848 |
| Syncytial virus | 134.7218 | 95percent ci | 502.4331 | Immune system | 617.97689 |
| Avian influenza | 131.5519 | Viral rna | 499.2632 | Respiratory syndrome | 617.3429 |
| Respiratory viruses | 131.5519 | Amino acid | 489.7534 | Cell line | 611.16155 |
| East respiratory syndrome coronavirus | 130.028 | Respiratory syncytial virus | 472.0 | Infectious disease | 607.04063 |
| Middle east respiratory syndrome coronavirus | 129.2481 | Cell lines | 443.7895 | | 576.13388 |
| Influenza a virus | 126.0 | Respiratory infections | 426.3549 | Western blot | 568.36754 |
| Bronchitis virus | 125.212 | Epithelial cells | 424.77 | rnase l | 565.0391 |
| Respiratory infections | 125.212 | Virus replication | 420.0151 | Virus replication | 560.6012 |
| Systematic review | 125.212 | Polymerase chain reaction | 408.0 | Cell surface | 543.9591 |
| Ebola virus | 120.4572 | Epidemic diarrhea virus | 406.0 | xx | 542.0572 |
| Acute respiratory | 117.2872 | Epidemic diarrhea | 402.5805 | Host cell | 539.83825 |
| Viral infections | 117.2872 | Host cell | 396.2406 | Codon usage | 523.03765 |
| Virus replication | 115.7023 | Syncytial virus | 378.806 | Viral proteins | 520.6601 |
| Open access | 109.3624 | Porcine epidemic diarrhea virus | 376.1524 | Respiratory viruses | 515.4298 |
| Zika virus | 109.3624 | Antiviral activity | 374.0512 | nk cells | 503.2256 |
| Respiratory tract infections | 102.0 | Risk factors | 374.0512 | Time points | 497.8367 |
| Viral infection | 101.4376 | Immune system | 369.2963 | Influenza viruses | 492.7648 |
| Immune response | 99.8526 | Ebola virus | 364.5414 | Important role | 491.0213 |
| Hepatitis c virus | 98.0 | Chain reaction | 355.0316 | Allergic rhinitis | 486.5835 |
| Gene expression | 96.6827 | Influenza viruses | 348.6918 | Antiviral activity | 481.3531 |
| Pandemic influenza | 96.6827 | Infected cells | 347.1068 | Global health | 473.9038 |
| Respiratory syndrome virus | 96.0 | Diarrhea virus | 340.7669 | mg kg | 470.0998 |
| Epithelial cells | 95.0978 | Host cells | 334.4271 | Frequency domain | 469.1489 |
| Complete genome | 93.5128 | Important role | 331.2572 | Control group | 466.13749 |
| Syndrome virus | 93.5128 | Phylogenetic analysis | 331.2572 | Viral load | 465.34499 |
| Virology journal | 93.5128 | Polymerase chain | 331.2572 | Binding site | 459.6391 |
| Hepatitis c | 91.9278 | Respiratory disease | 326.5023 | Expression levels | 453.6162 |
| Immune responses | 90.3429 | Avian influenza | 324.9173 | Hong Kong | 450.7237 |
| Genome sequence | 88.7579 | Respiratory tract infections | 320.0 | Clinical signs | 448.8613 |
| Dengue virus | 87.1729 | Infectious bronchitis | 285.2933 | Protein expression | 448.2274 |
| Molecular sciences | 84.0029 | Cell culture | 272.6136 | Wild type | 446.7833 |
| Type i | 84.0029 | Hepatitis c virus | 268.0 | Endothelial cells | 441.4120 |
| Acute respiratory syndrome | 84.0 | Health care | 264.6887 | Table s1 | 438.4006 |
| Complete genome sequence | 84.0 | Zika virus | 264.6887 | Flow cytometry | 437.4496 |
| Human coronavirus | 82.4181 | Infectious bronchitis virus | 260.0 | Saudi Arabia | 433.4872 |
| Respiratory infection | 82.4181 | Tract infections | 258.3489 | Viral genome | 433.3992 |
| Case report | 80.8331 | Hepatitis c | 255.179 | Negative control | 433.2230 |
| Tract infections | 80.8331 | Innate immune response | 252.0 | – | 431.7890 |
| Risk factors | 79.2481 | Monoclonal antibodies | 248.8391 | Cell types | 431.1098 |
| Spike protein | 77.6632 | Viral genome | 247.2542 | Viral entry | 427.9399 |
| t cell | 77.6632 | Type I | 242.4993 | Cell death | 425.24544 |
| Acute respiratory infections | 76.0 | Central nervous system | 242.0 | er stress | 423.185 |
| Coronavirus infection | 74.4932 | Amino acids | 239.3293 | Significant differences | 420.6490 |
| RNA viruses | 74.4932 | Animal models | 237.7444 | Health care | 420.4905 |
| Severe acute respiratory | 72.0 | Real-time pcr | 236.1594 | Tcid 50 | 417.3734 |
| Sars coronavirus | 71.3233 | Dengue virus | 232.9895 | Cathepsin l | 410.5053 |
| Isothermal amplification | 69.7384 | Viral load | 232.9895 | Risk factors | 408.9203 |
| Respiratory disease | 69.7384 | World Health Organization | 232.0 | Positive selection | 405.7504 |
| BMC public health | 66.0 | Cell line | 231.4045 | Cell cycle | 400.9955 |
| Disease virus | 64.9835 | Viral proteins | 229.8196 | Nucleotide sequences | 397.8256 |
| t cells | 63.3985 | Nervous system | 226.6496 | Plasma membrane | 393.5990 |
| Influenza viruses | 61.8135 | Wide range | 223.4797 | Intensive care | 392.2782 |
| Acute respiratory infection | 60.0 | Virus infections | 221.8948 | Host cells | 384.82889 |
| Type i interferon | 60.0 | Middle east respiratory syndrome | 220.5832 | Hand hygiene | 383.5609 |
| Journal frontiers | 58.6436 | Immunodeficiency virus | 218.7248 | Significant difference | 382.6099 |
| Fever virus | 57.0587 | Spike protein | 218.7248 | Immune cells | 381.02498 |
| Respiratory syncytial | 57.0587 | Life cycle | 217.1399 | Reference genes | 380.3909 |
| Severe acute | 57.0587 | Recent years | 217.1399 | HIV aids | 377.2211 |
| Respiratory tract infection | 56.0 | Codon usage | 215.5549 | Avian influenza | 376.8688 |
| Antiviral activity | 55.4737 | Viral pathogens | 215.5549 | Serum samples | 375.8625 |
| BMC infectious | 55.4737 | Pandemic influenza | 213.9699 | Body weight | 375.0021 |
| Hong Kong | 55.4737 | Clinical signs | 212.385 | Fig. | 374.0511 |
| Viral replication | 55.4737 | Dendritic cells | 209.2151 | Membrane fusion | 374.0511 |
| Virus infections | 55.4737 | Acute respiratory syndrome coronavirus | 208.9735 | Clinical trials | 373.8750 |
| BMC infectious diseases | 54.0 | Bronchitis virus | 207.6301 | Time point | 373.3719 |
| Respiratory viral infections | 54.0 | Endoplasmic reticulum | 207.6301 | Protein synthesis | 369.2962 |
| Case study | 53.8887 | RNA virus | 207.6301 | Dengue virus | 367.7113 |
| Dendritic cells | 53.8887 | Saudi Arabia | 207.6301 | e protein | 367.7113 |
| Mini review | 53.8887 | Innate immunity | 206.0451 | High levels | 365.3339 |
| RNA virus | 53.8887 | Recent studies | 206.0451 | Virus particles | 364.5414 |
| Transmissible gastroenteritis | 53.8887 | Economic losses | 204.4602 | Target cells | 362.5601 |
| BMC public | 52.3038 | Porcine epidemic | 204.4602 | Viral particles | 360.4204 |
| Monoclonal antibodies | 52.3038 | World health | 204.4602 | Dendritic cells | 357.5675 |
| Creative commons cc-by 4 | 51.0824 | Global health | 202.8752 | Total number | 356.4580 |
| Influenza pandemic | 50.7188 | Type 1 | 202.8752 | Cancer cells | 356.0883 |
| Type 1 | 50.7188 | Vaccine development | 201.2902 | Disease control | 355.2957 |
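For readers unfamiliar with the measure, the sketch below follows the standard C-value definition: a term's frequency is reduced by the average frequency of the longer candidate terms that contain it, and the result is weighted by the term's length. The length weight `log2(len + 1)` is an assumption. It appears consistent with the scores above, where three-word terms score at whole multiples of 2 and 386.7309 is approximately 244 × log2(3), but the source does not state the formula. The function names and the toy frequencies are hypothetical.

```python
import math
from collections import defaultdict

def is_nested(short, long):
    """True if the word sequence `short` occurs contiguously inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))

def c_value(freq):
    """Score candidate terms; `freq` maps a tuple of words to its frequency."""
    # For each term, collect the longer candidate terms that contain it.
    containers = defaultdict(list)
    for a in freq:
        for b in freq:
            if len(b) > len(a) and is_nested(a, b):
                containers[a].append(b)

    scores = {}
    for a, f_a in freq.items():
        weight = math.log2(len(a) + 1)  # assumption: +1 length weight
        longer = containers[a]
        if longer:
            # Subtract the mean frequency of the containing terms.
            f_a -= sum(freq[b] for b in longer) / len(longer)
        scores[a] = weight * f_a
    return scores

# Toy frequencies, invented for illustration only.
toy = {
    ("respiratory", "syndrome"): 10,
    ("acute", "respiratory", "syndrome"): 4,
    ("respiratory", "syndrome", "coronavirus"): 2,
}
scores = c_value(toy)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 4))
```

The nested term "respiratory syndrome" is penalised by the mean frequency of the two longer terms that contain it, which is why incomplete fragments of disease names can still rank highly when they are very frequent.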
Fig. 4 Common terms in Papers2
TOP@20 terms extracted from Papers1-content using C_Value and F-TFIDF-C_M: SWTs vs MWTs
| Measure | Term type | TOP 20 terms |
|---|---|---|
| C_Value | SWTs | Cells, Virus, Infection, Protein, Study, Data, al, Patients, Expression, rna, Analysis, Result, Disease, p, Mice, c, Influenza, Number |
| C_Value | MWTs | t cells, Public health, Amino acid, Immune response, Gene expression, Viral replication, Cell lines, Viral infection, Virus infection, mg ml, Infectious diseases, Respiratory tract, Epithelial cells, Room temperature, Cell culture, Viral infection |
| F-TFIDF-C_M | SWTs | Mice, Patients, Influenza, Proteins, Health, dna, Vaccine, Transmission, Research, Model, Children, Outbreak, Vaccination, e, China, Peptide, Fusion, Network, Percent, mers-cov |
| F-TFIDF-C_M | MWTs | Infectious disease, nk cells, Health care, Endothelial cells, Ebola virus, Incubation period, b cell, pol ii |
Terms in bold are not in the UMLS thesaurus
Fig. 5 C_Value vs F-TFIDF-C_M SWTs
Fig. 6 C_Value vs F-TFIDF-C_M MWTs
Fig. 10 Distribution of concepts according to the measures and their presence in the UMLS Metathesaurus: from Papers2-title corpus
Fig. 11 Distribution of representative concepts when taking multiple corpora into account using C_Value: Papers1 corpora
Fig. 7 Example of term variants
Term extraction variations using FASTR
| Infectious disease | Virus replication | Laboratory tests | Respiratory syndrome | Preventive measure | Syndrome coronavirus |
|---|---|---|---|---|---|
| Diseases including infectious | | Laboratory confirmation tests | | | Syndrome coronavirus-related coronavirus |
| Infectious pulmonary diseases | replication of N1347A virus | | Respiratory acute syndrome | Preventive hygienic measures | Syndrome human coronavirus |
| | Virus optimal replication | Testing presents isolation laboratories | Syndrome coronavirus and respiratory | Prevention community-engaged measures | Syndromic Surveillance Coronavirus |
| | Replicating influenza viruses | Laboratory diagnostic testing | Respiratory tract syndromic | Preventive health measures | Syndrome virus coronavirus |
| Infectious acute disease | Replication of human viruses | Laboratory genomic testing | Respiratory insufficiency syndrome | Preventive behavioral measures | Coronavirus Associated Syndromes |
Terms in the UMLS Metathesaurus in bold
Fig. 8 Combined intra-corpus term extraction strategies
Fig. 9 Combined inter-corpus term extraction strategies
Relevance of terms extracted from Papers2 depending on the metrics (C_Value or F-TFIDF-C_M)
| Corpus (Papers2) | Measure | | Domain relevant | COVID-19 surveillance | Syndromic surveillance | Part of disease MWE |
|---|---|---|---|---|---|---|
| Title | C_Value | 10 | 3 | 0 | 2 | 6 |
| Title | F-TFIDF-C_M | 9 | 4 | 1 | 1 | 1 |
| Abstract | C_Value | 10 | 1 | 0 | 0 | 6 |
| Abstract | F-TFIDF-C_M | 10 | 5 | 1 | 2 | 1 |
| Content | C_Value | 10 | 0 | 0 | 0 | 1 |
| Content | F-TFIDF-C_M | 10 | 2 | 0 | 2 | 0 |
| Title + abstract | C_Value | 10 | 3 | 0 | 0 | 7 |
| Title + abstract | F-TFIDF-C_M | 0 | - | - | - | - |
| Title + content | C_Value | 3 | 1 | 0 | 0 | 2 |
| Title + content | F-TFIDF-C_M | 0 | - | - | - | - |
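To compare the two measures on the single-corpus rows above, the counts can be aggregated into a share of domain-relevant terms per measure. This is a minimal sketch under one stated assumption: each row is read as the number of evaluated terms followed by the counts for the four criteria. The per-term table that follows supports this reading, since 3 of the 10 Title/C_Value terms are marked domain relevant there. The `relevance_share` helper is a hypothetical name, and the combined-corpus rows are left out because they carry no F-TFIDF-C_M counts.

```python
# (corpus, measure, evaluated terms, domain-relevant terms)
# Assumption: the first numeric column is the number of evaluated terms
# and the second is the number judged domain relevant.
ROWS = [
    ("Title", "C_Value", 10, 3),
    ("Title", "F-TFIDF-C_M", 9, 4),
    ("Abstract", "C_Value", 10, 1),
    ("Abstract", "F-TFIDF-C_M", 10, 5),
    ("Content", "C_Value", 10, 0),
    ("Content", "F-TFIDF-C_M", 10, 2),
]

def relevance_share(rows):
    """Return {measure: domain-relevant terms / evaluated terms}."""
    totals = {}
    for _corpus, measure, evaluated, relevant in rows:
        ev, rel = totals.get(measure, (0, 0))
        totals[measure] = (ev + evaluated, rel + relevant)
    # Skip measures with no evaluated terms to avoid dividing by zero.
    return {m: rel / ev for m, (ev, rel) in totals.items() if ev}

shares = relevance_share(ROWS)
for measure, share in shares.items():
    print(f"{measure}: {share:.2f}")
```

Under this reading, F-TFIDF-C_M yields 11 domain-relevant terms out of 29 evaluated, against 4 out of 30 for C_Value.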
Expanded terms from Table 6
| Sub-corpus | Measure | Term | Domain relevant | COVID-19 surveillance | Syndromic surveillance | Incomplete disease name |
|---|---|---|---|---|---|---|
| title | C-value | Respiratory syndrome coronavirus | n | n | n | y |
| | | Porcine epidemic diarrhea | y | n | n | n |
| | | Syndrome coronavirus | n | n | n | y |
| | | Epidemic diarrhea virus | n | n | n | y |
| | | Acute respiratory syndrome | n | n | n | y |
| | | Public access | n | n | n | n |
| | | Diarrhea virus | n | n | n | y |
| | | Infectious bronchitis | y | n | y | n |
| | | Acute respiratory | n | n | n | y |
| | | Bronchitis virus | y | n | y | n |
| | F-TFIDF-C | Journal pre-proof | n | n | n | n |
| | | Virology journal | n | n | n | n |
| | | Influenza pandemic | y | n | n | n |
| | | Coronavirus spike | y | y | n | n |
| | | BMC public health | n | n | n | n |
| | | Influenza virus infection | y | n | n | n |
| | | Emerging infectious | y | n | y | n |
| | | Prcine circovirus type | n | n | n | y |
| | | Codon usage | n | n | n | n |
| | | Respiratory syndrome | n | n | n | y |
| abstract | C-value | Acute respiratory syndrome | n | n | n | y |
| | | Respiratory syndrome coronavirus | n | n | n | y |
| | | East respiratory syndrome | n | n | n | y |
| | | Syndrome coronavirus | n | n | n | y |
| | | Present study | n | n | n | n |
| | | Chain reaction | n | n | n | n |
| | | Syncytial virus | n | n | n | y |
| | | Porcine epidemic diarrhea | y | n | n | n |
| | | Polymerase chain | n | n | n | n |
| | F-TFIDF-C | Virus infections | y | n | y | n |
| | | Porcine epidemic | n | n | n | y |
| | | Clinical samples | n | n | n | n |
| | | Codon usage | n | n | n | n |
| | | Mers-cov infection | y | y | n | n |
| | | Pandemic influenza | y | n | n | n |
| | | Viral entry | y | n | y | n |
| | | 95 percent confidence interval | n | n | n | n |
| | | Immune cells | n | n | n | n |
| | | Influenza pandemic | y | n | n | n |
| | | Sono stati | n | n | n | n |
| content | C-value | Infected cells | n | n | n | n |
| | | Respiratory syndrome | n | n | n | y |
| | | Present study | n | n | n | n |
| | | Individual components | n | n | n | n |
| | | Essential medicines | n | n | n | n |
| | | Previous studies | n | n | n | n |
| | | de los | n | n | n | n |
| | | Functional task | n | n | n | n |
| | | der Schwangerschaft | n | n | n | n |
| | F-TFIDF-C | Health emergency | y | n | y | n |
| | | Membrane rafts | n | n | n | n |
| | | pcr products | n | n | n | n |
| | | afa dr | n | n | n | n |
| | | COD trypsin | n | n | n | n |
| | | 2c atpase | n | n | n | n |
| | | Naked mole | n | n | n | n |
| | | Intracellular delivery | n | n | n | n |
| | | Close contact | y | n | y | n |
| | | Final dataset | n | n | n | n |
| | | Respiratory syndrome | n | n | n | y |
| title + abstract | C-value | Acute respiratory syndrome | n | n | n | y |
| | | Respiratory syndrome coronavirus | n | n | n | y |
| | | East respiratory syndrome | n | n | n | y |
| | | Syndrome coronavirus | n | n | n | y |
| | | Syncytial virus | n | n | n | y |
| | | Porcine epidemic diarrhea | y | n | n | n |
| | | Antiviral activity | y | n | n | n |
| | | Acute respiratory syndrome coronavirus | n | n | n | y |
| | | Infectious bronchitis | y | n | n | n |
Each term was evaluated by an expert according to four criteria: domain relevant, COVID-19 surveillance, syndromic surveillance, incomplete disease name (y: yes, n: no)
60 terms randomly selected from FASTR variants (see Section "The driven term extraction approach")
| Influenza virus | Evaluation | Respiratory infections | Evaluation | Infectious diseases | Evaluation |
|---|---|---|---|---|---|
| Influenza a/wsn/33 virus | Not relevant | Respiratory virus infections | Relevant | Diseases relates to infectious | Relevant |
| Viruses and conventional influenza | Relevant | Respiratory viral infection | Relevant | Disease called feline infectious | Relevant |
| Virus remains the influenza | Lack of context | Infection by respiratory | Relevant | Infectious animal diseases | Relevant |
| Influenza by virus | Not relevant | Infections of the respiratory | Relevant | Infectious enteric diseases | Relevant |
| Influenza vaccine virus | Relevant | Infect respiratory | Relevant | Disease without being infectious | Not relevant |
| Virus and canine influenza | Relevant | Infections are respiratory | Relevant | Disease has an infectious | Lack of context |
| Virus influenza | Relevant | Infected with respiratory | Relevant | Infectious disease | Relevant |
| Viruses such as influenza | Relevant | Infection with other respiratory | Relevant | Disease named it infectious | Lack of context |
| Influenza b viruses | Relevant | Respiratory virus infection | Relevant | Infectious swine diseases | Relevant |
| Viruses and emerging influenza | Relevant | Infection transmitted via respiratory | Relevant | Disease models for infectious | Lack of context |
| Viral infections | Evaluation | Sars coronavirus | Evaluation | Incubation period | Evaluation |
| viral bronchopulmonary infection | Relevant | Coronavirus is urbani sars | Not relevant | Incubating period | Relevant |
| Virally infected | Relevant | Coronavirus of 18 sars | Not relevant | Periods of incubation | Relevant |
| Viral respiratory infections | Relevant | Coronavirus that causes sars | Relevant | Incubation periods | Relevant |
| Infection and encounter virally | Lack of context | Coronavirus named sars | Relevant | Period of incubation | Relevant |
| Infection or viral | Lack of context | Coronavirus related to sars | Relevant | Period than incubation | Lack of context |
| Viral skin infection | Relevant | Coronavirus isolated from sars | Relevant | Period and incubation | Lack of context |
| Virals infection | Relevant | Coronavirus responsable du sars | Relevant | Incubation for period | Lack of context |
| Infection with one viral | Relevant | Sars -associated coronavirus | Relevant | Period and incubating | Lack of context |
| viral opportunistic infections | Relevant | Sars human coronavirus | Relevant | Period covering an incubation | Relevant |
| infection at high viral | Lack of context | Sars and coronavirus | Relevant | Period of extrinsic incubation | Relevant |