Xieling Chen, Haoran Xie, Fu Lee Wang, Ziqing Liu, Juan Xu, Tianyong Hao.
Abstract
BACKGROUND: Natural language processing (NLP) plays an increasingly significant role in advancing medicine, and a rich body of research on NLP methods and applications for medical information processing is available. An in-depth analysis is therefore needed to understand the recent development of the NLP-empowered medical research field. However, few studies have examined the research status of this field. This study therefore aims to quantitatively assess the academic output of NLP in the medical research field.
Keywords: Bibliometrics; Medical; Natural language processing; Scientific collaboration; Statistical characteristics; Thematic discovery and evolution
Year: 2018 PMID: 29589569 PMCID: PMC5872501 DOI: 10.1186/s12911-018-0594-x
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
The statistical characteristics of the dataset
| Characteristics | Statistics |
|---|---|
| Total #pub. | 1405 |
| #pub. with author address information | 1386 |
| #pub. with abstract | 1382 |
| #pub. with author keywords or PubMed MeSH | 1277 |
| #unique publication sources | 324 |
| #unique countries/first countries | 56/45 |
| #unique authors/first authors | 4391/1053 |
| #unique affiliations/first affiliations | 961/514 |
| Average #words/word characters in title | 12.53; 6.50 |
| Average number/standard deviation of character in title | 95.43; 29.72 |
| Average #words/word characters in abstract | 215.24; 5.62 |
| Average number/standard deviation of character in abstract | 1456.95; 536.2 |
| Top 10 frequency words/phrases in author keywords or PubMed MeSH | Electronic health record (363; 25.84%); Data mining (278; 19.79%); Information storage and retrieval (239; 17.01%); Artificial intelligence (179; 12.74%); Female (163; 11.60%); Semantics (156; 11.10%); Male (153; 10.89%); Controlled vocabulary (140; 9.96%); Automatic pattern recognition (127; 9.04%); Medical record system (112; 7.97%) |
| Top 10 frequency words/phrases extracted from title | Electronic health record (69; 4.91%); Medical record (55; 3.91%); Clinical text (45; 3.20%); Clinical note (41; 2.92%); Patient (37; 2.63%); Text mining (23; 1.64%); Classification (22; 1.57%); Clinical narrative (21; 1.49%); Radiology report (21; 1.49%); Natural language processing method (20; 1.42%) |
| Top 10 frequency words/phrases extracted from abstract | Patient (322; 22.92%); Precision (217; 15.44%); F-measure (205; 14.59%); Recall (178; 12.67%); Accuracy (164; 11.67%); Electronic health record (161; 11.46%); Natural language processing method (155; 11.03%); Medical record (143; 10.18%); Disease (141; 10.04%); Concept (128; 9.11%) |
Fig. 1 The number and growth rate of publications by year
AIC and R² of 3 fitting models
| Model | AIC | R² |
|---|---|---|
| | 117.1439 | 0.7829 |
| | 98.70681 | 0.855 |
| | 98.26147 | 0.8703 |
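For reference, the Akaike information criterion (AIC) reported in this table is conventionally defined as shown below; lower values indicate a better trade-off between goodness of fit and model complexity. The source does not state which variant was used for the three fitted models, so the least-squares form in the second line is an assumption. Here $k$ is the number of estimated parameters, $n$ the number of observations, $\hat{L}$ the maximized likelihood, $\mathrm{RSS}$ the residual sum of squares, and $C$ a constant that does not depend on the model.

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}
$$

$$
\mathrm{AIC} = n\,\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + C \quad \text{(least squares with Gaussian errors)}
$$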
Top 10 most productive publication sources
| Publication sources | # related pub. | Proportion of related pub. against 1405 pub. (%) | Total #pub. of the sources (Proportion of related pub. against total #pub.) |
|---|---|---|---|
| Journal of the American Medical Informatics Association | 154 | 10.96 | 1689 (9.12%) |
| AMIA Annual Symposium Proceedings | 153 | 10.89 | 2283 (6.70%) |
| Journal of Biomedical Informatics | 133 | 9.47 | 1378 (9.65%) |
| Studies in Health Technology and Informatics | 91 | 6.48 | 7434 (1.22%) |
| BMC Bioinformatics | 61 | 4.34 | 6332 (0.96%) |
| PloS ONE | 36 | 2.56 | 166,876 (0.02%) |
| AMIA Joint Summits on Translational Science Proceedings | 32 | 2.28 | 331 (9.67%) |
| Journal of Biomedical Semantics | 28 | 1.99 | 322 (8.70%) |
| BMC Medical Informatics and Decision Making | 27 | 1.92 | 1071 (2.52%) |
| Biomedical Informatics Insights | 22 | 1.57 | 59 (37.29%) |
| Total | 737 | 52.46 | N/A |
Publications and GDP per capita by country
| Country | #pub. | Proportion | Country | GDP per capita (1000 US dollars) |
|---|---|---|---|---|
| United States | 931 | 67.17% | Norway | 897.046 |
| United Kingdom | 72 | 5.19% | Switzerland | 780.731 |
| China (including Hong Kong and Macao) | 54 | 3.90% | Denmark | 589.324 |
| France | 50 | 3.61% | Ireland | 554.754 |
| Canada | 29 | 2.09% | | 551.685 |
| Germany | 28 | 2.02% | Sweden | 545.730 |
| Japan | 24 | 1.73% | | 514.139 |
| Australia | 23 | 1.66% | Netherlands | 506.744 |
Fig. 2 Geomap visualization of publications by country (the more publications a country had, the closer its color was to red)
Top productive authors and first authors
| Rank | Authors | #pub. | Rank | First authors | #pub. |
|---|---|---|---|---|---|
| 1 | | 54 | 1 | | 12 |
| 2 | | 50 | 2 | | 9 |
| 3 | | 41 | 3 | | 8 |
| 4 | | 27 | 4 | | 7 |
| 5 | | 25 | 4 | | 7 |
| 6 | | 24 | 6 | | 6 |
| 7 | | 21 | 6 | | 6 |
| 8 | | 20 | 6 | | 6 |
| 9 | | 19 | 6 | | 6 |
| 10 | | 18 | 6 | | 6 |
| 10 | | 18 | | | |
| 10 | | 18 | | | |
Top productive author affiliations and first author affiliations
| Rank | Author affiliations | #pub. | Rank | First author affiliations | #pub. |
|---|---|---|---|---|---|
| 1 | Mayo Clinic | 86 | 1 | Mayo Clinic | 56 |
| 2 | The University of Utah | 82 | 2 | The University of Utah | 54 |
| 3 | Vanderbilt University | 78 | 3 | Vanderbilt University | 51 |
| 4 | National Institutes of Health | 64 | 4 | Columbia University | 43 |
| 5 | Columbia University | 59 | 5 | National Institutes of Health | 41 |
| 6 | Brigham and Women’s Hospital | 52 | 6 | Brigham and Women’s Hospital | 30 |
| 7 | University of Washington | 36 | 7 | University of Minnesota | 23 |
| 8 | University of Pittsburgh | 32 | 7 | University of Pittsburgh | 23 |
| 9 | Massachusetts General Hospital | 31 | 9 | VA Salt Lake City Health Care System | 21 |
| 9 | Stanford University | 31 | 10 | Massachusetts General Hospital | 19 |
The statistics of author and affiliation cooperation
| Year | Total #pub. | #co-author pub. | Co-author rate (%) | #co-affiliation pub. | Co-affiliation rate (%) | #co-country pub. | Co-country rate (%) |
|---|---|---|---|---|---|---|---|
| 2007 | 58 | 54 | 93.10 | 26 | 44.83 | 7 | 12.07 |
| 2008 | 73 | 64 | 87.67 | 32 | 43.84 | 8 | 10.96 |
| 2009 | 75 | 70 | 93.33 | 36 | 48.00 | 9 | 12.00 |
| 2010 | 94 | 85 | 90.43 | 44 | 46.81 | 6 | 6.38 |
| 2011 | 100 | 96 | 96.00 | 46 | 46.00 | 10 | 10.00 |
| 2012 | 129 | 121 | 93.80 | 63 | 48.84 | 13 | 10.08 |
| 2013 | 180 | 175 | 97.22 | 111 | 61.67 | 24 | 13.33 |
| 2014 | 171 | 161 | 94.15 | 111 | 64.91 | 22 | 12.87 |
| 2015 | 278 | 273 | 98.20 | 170 | 61.15 | 46 | 16.55 |
| 2016 | 228 | 219 | 96.05 | 146 | 64.04 | 36 | 15.79 |
| Total | 1386 | 1318 | N/A | 785 | N/A | 181 | N/A |
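The rate columns in this table can be reproduced by dividing each collaboration count by the year's total number of publications. The following is a minimal sketch rather than the authors' code; the helper name `rate` and the `ROWS` layout are illustrative assumptions, while the counts are copied from the table.

```python
# Yearly counts copied from the table:
# (year, total pubs, co-author pubs, co-affiliation pubs, co-country pubs)
ROWS = [
    (2007, 58, 54, 26, 7),
    (2008, 73, 64, 32, 8),
    (2009, 75, 70, 36, 9),
    (2010, 94, 85, 44, 6),
    (2011, 100, 96, 46, 10),
    (2012, 129, 121, 63, 13),
    (2013, 180, 175, 111, 24),
    (2014, 171, 161, 111, 22),
    (2015, 278, 273, 170, 46),
    (2016, 228, 219, 146, 36),
]


def rate(part: int, total: int) -> float:
    """Return part/total as a percentage, rounded to two decimals."""
    return round(100 * part / total, 2)


# Print the three collaboration rates for every year.
for year, total, co_author, co_affil, co_country in ROWS:
    print(year, rate(co_author, total), rate(co_affil, total), rate(co_country, total))
```

Summing the second column of `ROWS` gives the 1386 publications in the Total row, which matches the number of publications with author address information.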
Fig. 3 Force-directed network of 87 authors with #pub. ≥ 8
Fig. 4 Force-directed network of 50 affiliations with #pub. ≥ 10
The top 10 key terms in the co-occurrence matrix
| | Artificial intelligence | Data mining | Electronic health record | Female | Information storage and retrieval | Machine learning | Medical record | Patient | Precision | Semantics |
|---|---|---|---|---|---|---|---|---|---|---|
| Artificial intelligence | 185 | 52 | 53 | 11 | 56 | 40 | 25 | 33 | 40 | 33 |
| Data mining | 52 | 288 | 122 | 31 | 20 | 53 | 38 | 55 | 46 | 52 |
| Electronic health record | 53 | 122 | 420 | 78 | 80 | 60 | 95 | 167 | 77 | 40 |
| Female | 11 | 31 | 78 | 169 | 15 | 10 | 46 | 82 | 18 | 10 |
| Information storage and retrieval | 56 | 20 | 80 | 15 | 239 | 18 | 30 | 42 | 47 | 47 |
| Machine learning | 40 | 53 | 60 | 10 | 18 | 162 | 25 | 39 | 30 | 22 |
| Medical record | 25 | 38 | 95 | 46 | 30 | 25 | 178 | 77 | 29 | 8 |
| Patient | 33 | 55 | 167 | 82 | 42 | 39 | 77 | 326 | 59 | 19 |
| Precision | 40 | 46 | 77 | 18 | 47 | 30 | 29 | 59 | 217 | 34 |
| Semantics | 33 | 52 | 40 | 10 | 47 | 22 | 8 | 19 | 34 | 165 |
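A symmetric term co-occurrence matrix like the one above can be built by counting, for each pair of terms, the number of publications in which both appear. The sketch below is an illustration under assumptions: the source does not describe its procedure, the function name `cooccurrence` and the toy records in `docs` are hypothetical, and treating the diagonal as the number of publications containing the term is an assumed reading of the table.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count, for every ordered pair of terms, the documents containing both.

    The diagonal entry (t, t) holds the number of documents containing t
    (an assumed interpretation of the diagonal in the table).
    """
    counts = Counter()
    for terms in docs:
        unique = sorted(set(terms))  # ignore repeated terms within one document
        for term in unique:
            counts[(term, term)] += 1
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return counts


# Hypothetical toy records, one list of key terms per publication.
docs = [
    ["Data mining", "Electronic health record", "Patient"],
    ["Electronic health record", "Patient"],
    ["Data mining", "Semantics"],
]
matrix = cooccurrence(docs)
print(matrix[("Electronic health record", "Patient")])
```

Storing both `(a, b)` and `(b, a)` trades a little memory for simple lookups in either order, mirroring the symmetric layout of the table.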
Fig. 5 Heatmap of AP clustering result for the 2007–2016 period
AP clustering result for publications during the 2007–2016 period
| Cluster | Theme | Key terms |
|---|---|---|
| 1 | Computational biology | |
| 2 | Terminology mining | |
| 3 | Information extraction | |
| 4 | Text classification | |
| 5 | Social medium as data source | |
| 6 | Clinical information | |
| 7 | Patient characteristics | |
| 8 | Performance measurements | |
| 9 | Outcome evaluation | |
| 10 | Information retrieval |
Comparison of AP clustering results for the 2007–2011 and 2012–2016 periods
| Cluster | 2007–2011 | Cluster | 2012–2016 |
|---|---|---|---|
| 1 | Text mining; Abstracting and indexing as topic; Annotation; Database management system; Sentence | 1 | Text mining; |
| 2 | Female; Male | 2 | Female; Male; |
| 3 | Recall; Precision; F-measure | 3 | Recall; Precision; F-measure; Accuracy |
| 4 | Artificial intelligence; Information storage and retrieval; Automatic pattern recognition | 4 | Artificial intelligence; Semantics; Information storage and retrieval; Clinical text; Concept; Language; Sentence; Unified medical language system |
| 5 | Computational biology; Factual database; Gene; Protein; Protein-protein interaction | 5 | Computational biology; Factual database; Software; |
| 6 | Classification; Feature; Semantics; Data mining; Natural language processing method; Unified medical language system | 6 | Classification; Feature; Support vector machine; |
| 7 | Patient; Disease; Medical record; Medical record system; Patient discharge; Sensitivity and specificity | 7 | Patient; Medical record; Electronic health record; Clinical note |
| 8 | Medical informatics; User-computer interface; Software | 8 | Medical informatics; Annotation; Corpus; Gene; |
| 9 | Clinical text; Accuracy; Clinical decision support system; Clinical note; Electronic health record; Natural language processing system; Support vector machine | 9 | Automatic pattern recognition; Controlled vocabulary; Data mining; |
| 10 | Word; Corpus; Language | 10 | |
| 11 | Biomedical literature; Knowledge; Medline; Ontology | 11 | |
| 12 | Terminology as topic; Concept; Controlled vocabulary | 12 | Disease; Natural language processing method; |
The first term in each cluster denotes the exemplar. Terms in bold type denote newly emergent terms in the 2012–2016 period compared with the 2007–2011 period.