
Evolution of primary care databases in UK: a scientometric analysis of research output.

Paraskevas Vezyridis, Stephen Timmons.

Abstract

OBJECTIVE: To identify publication and citation trends, the most productive institutions and countries, top journals, most cited articles and authorship networks in articles that used and analysed data from UK primary care databases (CPRD, THIN, QResearch) of pseudonymised electronic health records (EHRs).
METHODS: Descriptive statistics and scientometric tools were used to analyse a Scopus data set of 1891 articles. Freely available software was used to extract networks from the data set (Table2Net), to visualise and analyse coauthorship networks of scholars and countries (Gephi), and to produce density maps (VOSviewer) of research topic co-occurrence and journal cocitation.
RESULTS: Research output increased at a compound annual growth rate of 18.65%. While medicine is the main field of research, studies in more specialised areas include biochemistry and pharmacology. Researchers from UK, US and Spanish institutions have published the most papers. Most of the journals that publish this type of research, and most of the highly cited papers, come from the UK and the USA. Most papers had between three and six authors. Keyword analyses show that smoking, diabetes, cardiovascular diseases and mental illnesses, as well as medications that can treat such conditions, such as non-steroidal anti-inflammatory agents, insulin and antidepressants, are the main topics of research. Coauthorship network analyses show that lead scientists, directors or founders of these databases are, to varying degrees, at the centre of clusters in this scientific community.
CONCLUSIONS: There has been a considerable increase in publications of primary care research using EHRs. The UK has been well placed at the centre of an expanding global scientific community, facilitating international collaborations and bringing together international expertise in medical, biochemical and pharmaceutical research. Published by the BMJ Publishing Group Limited.


Keywords:  PRIMARY CARE; STATISTICS & RESEARCH METHODS; electronic patient records; scientometrics


Year:  2016        PMID: 27729352      PMCID: PMC5073525          DOI: 10.1136/bmjopen-2016-012785

Source DB:  PubMed          Journal:  BMJ Open        ISSN: 2044-6055            Impact factor:   2.692


This is the first study to perform a scientometric analysis of research output from primary care databases of electronic patient records. We analysed articles published from 1995 to 2015 in order to explore the historical breadth and growth of this type of research. The analysis is limited to articles and structured data retrieved from the Scopus database. Some of the most recent articles and related citations might not have appeared in Scopus when the data set was extracted.

Introduction

Big data (analytics) refer to the aggregation and interrogation of high-volume, high-velocity and high-variety data sets so as to reveal new, non-obvious information and patterns.1 This field is advancing because of technological and scientific developments in information infrastructure and digitisation.2 For governments, opening up the data sets that states hold about their citizens is believed to have, through computational and algorithmic analyses, a disruptive and transformative effect on knowledge.3 In the UK, big (open) data have been at the forefront of research activity and policymaking. Big data was named one of the eight great technologies,4 and the UK has embraced the big (open) data movement more than many other developed countries (eg, USA, Australia, France).3 One area of particular relevance to big data analytics is healthcare. In the UK, the National Health Service (NHS) is organised around primary care: unless there is an accident or emergency, citizens who want to use the NHS have to go through their primary care physician, known in the UK as a general practitioner (GP). From there, they can be referred to a specialist at a hospital if necessary, and secondary care clinicians can then feed back information to GPs. Since the vast majority of the population (98%) is registered with a general practice, GPs act not only as the main gatekeepers for the NHS but also as important custodians of a longitudinal electronic health record (EHR).5 There are now many ongoing primary care databases of anonymised patient records in the UK that can be used for healthcare research. These population-based databases contain data originating from routine general practice. Some newly established databases and research platforms of linked EHRs include ResearchOne6 and CALIBER.7 While more than 9600 general practices in the UK could potentially contribute data to these databases,8 usually only 6–10% of them do so.
Such databases are usually used for cross-sectional surveys, case–control or cohort studies, and for epidemiological, drug safety, clinical and healthcare usage research. They rely heavily on individual general practices voluntarily contributing data via the proprietary clinical systems they use to maintain these patient records. The records are usually anonymised or pseudonymised at source by allocating a unique number to each patient, which allows the records to be updated and linked to other data sets, such as national mortality, national cancer registration and hospital records, as well as socioeconomic, ethnicity and environmental data sets. Access to these data sets is usually granted after scientific and ethics review and can be tailored to customer requirements. In this study, we examined the research output of three such databases that are well established in the research community and have contributed to a substantial number of scientific studies and publications: the Clinical Practice Research Datalink (CPRD),9 The Health Improvement Network (THIN)10 and QResearch.11 The CPRD (formerly known as the General Practice Research Database) is a not-for-profit research service funded by the NHS National Institute for Health Research (NIHR) and the Medicines and Healthcare products Regulatory Agency (MHRA). It is owned by the UK Department of Health and contains the records of 11 million patients (4.4 million active) from 674 general practices.5 There is a service cost associated with the preparation of the requested data. Unlike the other services described below, CPRD does not extract data from a particular proprietary clinical system: any general practice can contribute data after a data-sharing agreement has been made with the software supplier.
Importantly, it is the only database accessible online.12 The THIN database contains the health records of 12 million patients from around 600 general practices that use the Vision clinical system by In Practice Systems (INPS). IMS Health can provide access to the data, for example, via a yearly sublicence to an academic institution. THIN is the only one of these databases that can provide data to for-profit companies. QResearch is a research service located at the University of Nottingham. Its database contains the health records of 18 million patients from 1000 general practices that use the EMIS clinical system. Only academics employed by a UK university can have access, on site, and only to a sample of the data set (maximum 100 000 patients) that is sufficient to answer a specific research question or hypothesis. As this research service is not-for-profit and entirely self-funded, there is usually a fee to cover the cost of the data extraction. The strengths of these databases lie in their size, breadth, representativeness of the UK population, long-term follow-up and data quality.5 They include good information on morbidity and lifestyle, prescribing, preventive care, current standards of care and interpractice variation.13 Since they are continually (and automatically) updated, they are ideal for researchers to discover and monitor healthcare trends, as well as the effectiveness of new interventions and treatments, at minimal cost. They are increasingly linked to secondary care and mortality data sets. Their weaknesses, in contrast, include the fact that data are extracted from proprietary clinical systems developed for patient management and not for healthcare research. There are issues of missing data (eg, from healthy patients), variable definitions for diagnoses and incomplete secondary care data (eg, hospital admissions).
Wider health data (eg, treatment adherence, over-the-counter medication) and data about subpopulations (eg, prisoners, homeless people, refugees, travellers) are not captured adequately.5 Information governance and informed consent procedures around the sharing of EHR data for research are still considered complex.14 These databases also require considerable clinical and scientific expertise, as well as technical capacity in data management, to support research. When selecting a particular data resource for an observational study, researchers have to consider several other factors, such as the population covered and its geographical distribution, data capture and latency, linkage with other resources, privacy and security, quality and validation.15 Nonetheless, these databases are highly regarded within the research community, since they have helped researchers reach definitive answers in various healthcare debates of considerable public interest, particularly where other types of research have produced contradictory evidence. For example, in 2004 researchers from the UK and Canada showed that measles–mumps–rubella (MMR) vaccination is not associated with autism in children.16 Traditional randomised trials are expensive, time-consuming and often unrepresentative of the population;17 by contrast, large-scale randomised and observational (comparative) evaluations of treatments and medications are minimally obtrusive for clinicians and patients and can support faster turnaround of pragmatic evidence useful in clinical practice.18 The aim of this study was to perform a scientometric analysis of articles, published from 1995 to 2015, that have used data from at least one of these primary care databases.
This empirical, semiautomated method of quantitatively analysing a large number of publications provides a reliable and objective examination of the current status and trends, as well as the structure and dynamics, of this scientific field.19 20 In this way, policymakers, research funding bodies and, most importantly, new researchers entering this field can gain a general overview of its knowledge base and an indication of the network features, research activities and topics of interest that are driving it.20 21 To the best of our knowledge, this is the first systematic mapping of research output from primary care databases.

Methods

In this study, Elsevier's Scopus database (http://www.scopus.com) was selected as the source of structured data on articles. This database covers more scientific articles than other databases (eg, Thomson Reuters Web of Science) and has the advantage of providing advanced export functionality for structured data, including full citation information, abstracts, keywords and references. On 30 October 2015, we used the document search functionality to search for all articles containing the terms ‘General Practice Research Database (GPRD)’ OR ‘Clinical Practice Research Datalink (CPRD)’ OR ‘The Health Improvement Network (THIN)’ OR ‘QResearch’ in the article title, abstract and keywords. The results were then limited to articles, articles in press, conference papers, reviews, book chapters and short surveys; notes, letters, editorials and errata were excluded from the analysis. We then compared the resulting records with the bibliographic lists maintained by these databases22–24 so as to include articles that could not be retrieved using the above search queries. Data cleansing included the removal of duplicate records and of records missing information essential for the analysis (eg, article title, journal). The fields of authors, year, source title, affiliations, author keywords and document type were used for the analysis. The final bibliography retrieved from Scopus was imported into Table2Net25 to extract networks of authors and contributing countries. It was then imported into Gephi,26 where the ForceAtlas2 algorithm27 was used to visualise the structural proximities within the communities of authors and contributing countries. VOSviewer (V.1.6.3)28 was used to visualise bibliometric networks and densities29 of frequent terms and journals. All other statistical analyses were performed using Microsoft Excel. We used the Journal Citation Reports (JCR) Science Edition 2014 to extract impact factor values for the identified journal titles.
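The document-type filter and de-duplication step described above can be sketched in Python. This is a minimal illustration only: the `Record` fields, the `clean` helper and the exact document-type labels in `INCLUDED_TYPES` are assumptions made for the example, not the Scopus export schema or the authors' actual workflow.

```python
from dataclasses import dataclass

# Hypothetical, simplified record: these fields stand in for columns of a
# bibliographic export and are not the actual Scopus CSV schema.
@dataclass(frozen=True)
class Record:
    title: str
    source_title: str  # journal or book title
    year: int
    doc_type: str

# Assumed document-type labels for the types retained in this study.
INCLUDED_TYPES = {"Article", "Article in Press", "Conference Paper",
                  "Review", "Book Chapter", "Short Survey"}

def clean(records):
    """Keep retained document types, drop records missing a title or source,
    and drop duplicates (same normalised title, source title and year)."""
    seen = set()
    kept = []
    for rec in records:
        if rec.doc_type not in INCLUDED_TYPES:
            continue  # eg, notes, letters, editorials, errata
        if not rec.title.strip() or not rec.source_title.strip():
            continue  # missing information essential for the analysis
        key = (rec.title.strip().lower(), rec.source_title.strip().lower(), rec.year)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        kept.append(rec)
    return kept
```

Matching on a normalised title, source and year is one simple de-duplication key; matching on DOIs, where available, would be stricter.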
Table 2

Distribution of scientific literature by document type

Type | No. of papers | Per cent
Article | 1825 | 96.5
Conference paper | 18 | 0.95
Book chapter | 4 | 0.21
Review | 41 | 2.16
Short survey | 3 | 0.15

Results

A total of 1891 papers from 1995 to 2015 were included in this bibliometric and scientometric analysis. The results are presented below.

Publication and citation trends

The literature related to the three UK primary care databases increased from 7 papers in 1995 to 171 in 2015 (up to 30 October; table 1). We estimated its compound annual growth rate (CAGR) for the years 1995–2014 to be 18.65%. The vast majority of papers were published in English across 425 different sources (16.76% CAGR for 1995–2014). In total, these papers have been cited 73 929 times. A small percentage of papers (1.16%, n=163), however, have not yet been cited. The average citation per year is ∼3.52.
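The CAGR can be recomputed from table 1 with the standard formula CAGR = (last / first)^(1 / periods) - 1. The sketch below is a minimal illustration; the helper name `cagr` is ours. Taking the 1995 and 2014 counts from table 1, the reported 18.65% for papers is obtained when the exponent uses 20 periods, whereas the reported 16.76% for sources is obtained with 19 periods (the number of annual intervals between 1995 and 2014), so the two published rates appear to rest on different period counts.

```python
def cagr(first: float, last: float, periods: int) -> float:
    """Compound annual growth rate: (last / first) ** (1 / periods) - 1."""
    if first <= 0 or periods <= 0:
        raise ValueError("first value and number of periods must be positive")
    return (last / first) ** (1.0 / periods) - 1.0

# 1995 and 2014 counts from table 1.
papers_1995, papers_2014 = 7, 214
sources_1995, sources_2014 = 6, 114
intervals = 2014 - 1995  # 19 annual intervals between the two years

print(f"papers, 19 periods:  {cagr(papers_1995, papers_2014, intervals):.2%}")
print(f"papers, 20 periods:  {cagr(papers_1995, papers_2014, intervals + 1):.2%}")
print(f"sources, 19 periods: {cagr(sources_1995, sources_2014, intervals):.2%}")
```

With 19 periods the paper growth rate comes out at about 19.72%, with 20 periods at about 18.65%; the source growth rate with 19 periods is about 16.76%.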
Table 1

Distribution of scientific literature by year

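Year | No. of papers | Per cent | No. of citations | No. of different sources
2015 | 171 | 9.04 | 255 | 107
2014 | 214 | 11.31 | 1188 | 114
2013 | 203 | 10.73 | 1934 | 123
2012 | 175 | 9.25 | 3050 | 99
2011 | 148 | 7.82 | 3900 | 100
2010 | 129 | 6.82 | 5232 | 81
2009 | 126 | 6.66 | 5341 | 79
2008 | 115 | 6.08 | 4757 | 73
2007 | 96 | 5.07 | 5232 | 65
2006 | 74 | 3.91 | 5943 | 56
2005 | 81 | 4.28 | 5839 | 56
2004 | 73 | 3.86 | 6411 | 49
2003 | 46 | 2.43 | 2998 | 37
2002 | 52 | 2.74 | 3832 | 37
2001 | 51 | 2.69 | 3770 | 38
2000 | 51 | 2.69 | 6334 | 35
1999 | 26 | 1.37 | 1594 | 21
1998 | 29 | 1.53 | 2582 | 20
1997 | 17 | 0.89 | 1560 | 13
1996 | 7 | 0.37 | 1137 | 7
1995 | 7 | 0.37 | 1040 | 6
Total | 1891 | 100 | 73 929 | 425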
We explored the distribution of publications by document type (table 2) to identify how scholars using these databases in their research prefer to share knowledge. The vast majority publish the findings of their research in journals, particularly as original articles (96.5%).

Next, we analysed the distribution of papers by the academic discipline assigned by Scopus (table 3); Scopus may attribute each paper to more than one subject area.30 Since we analysed bibliographic data on published research using primary care databases, it is no surprise that the vast majority of papers fall under the medicine category. There is, however, a considerable number of papers (∼25%) under the categories biochemistry, genetics and molecular biology, and pharmacology, toxicology and pharmaceutics, which indicates an emphasis on the use of these databases for the study of medications. It also indicates potential interest in these databases from the pharmaceutical sector.
Table 3

Distribution of scientific literature by discipline

Subject | No. of papers | Per cent
Medicine | 1838 | 97.2
Biochemistry, genetics and molecular biology | 266 | 14.1
Pharmacology, toxicology and pharmaceutics | 197 | 10.4
Neuroscience | 78 | 4.1
Immunology and microbiology | 70 | 3.7
Agricultural and biological sciences | 53 | 2.8
Nursing | 44 | 2.3
Psychology | 21 | 1.1
Arts and humanities | 13 | 0.7
Environmental science | 10 | 0.5

Most productive institutions and countries

For deeper insight into contribution patterns and scientific impact, we first identified the top 10 institutions (by number of papers) listed in authors' affiliations and then analysed their citation patterns (table 4). We also analysed authors' affiliations by the country of their institution. For this, each publication was assigned to the countries of its authors' affiliations so as to identify the network of multinational collaborations. The top 10 contributing countries are presented in table 5. Finally, we visualised the network of contributing countries using Gephi, which resulted in a network of 29 nodes and 175 edges (figure 1). Each node represents a country; its size denotes the country's degree and its colour the number of papers. The thickness of the interconnecting lines (edges) denotes the number of papers coauthored between the countries.
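A minimal sketch of how a weighted country collaboration network of this kind can be derived from per-paper country lists follows. The input format and the function names `country_network` and `degree` are hypothetical; Table2Net and Gephi were the tools actually used, and their internal counting rules may differ (for example, in how single-country papers are handled).

```python
from collections import Counter
from itertools import combinations

def country_network(papers):
    """Return (papers per country, edge weights) from per-paper country lists.

    Each paper counts once per country and once per pair of distinct
    countries, so an edge weight is the number of papers the two countries
    coauthored."""
    papers_per_country = Counter()
    edge_weights = Counter()
    for countries in papers:
        distinct = sorted(set(countries))  # one count per country per paper
        papers_per_country.update(distinct)
        for pair in combinations(distinct, 2):  # pairs kept in sorted order
            edge_weights[pair] += 1
    return papers_per_country, edge_weights

def degree(edge_weights):
    """Number of distinct partner countries per node (unweighted degree)."""
    deg = Counter()
    for a, b in edge_weights:
        deg[a] += 1
        deg[b] += 1
    return deg

# Illustrative input: affiliation countries of four hypothetical papers.
sample = [["UK"], ["UK", "USA"], ["UK", "USA", "Spain"], ["Spain", "UK", "UK"]]
counts, edges = country_network(sample)
```

Here node size could be taken from `degree(edges)`, node colour from `counts` and edge thickness from `edges`, mirroring the encoding described for figure 1.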
Table 4

Most productive institutions

Rank | Institution | No. of papers | Per cent | Total citations | Median citations (IQR) | Country
1 | University of Nottingham | 266 | 14.06 | 11 540 | 18 (6–44.75) | UK
2 | Boston University | 228 | 12.05 | 12 328 | 21.5 (6–57) | USA
3 | Centro Espanol de Investigacion Farmacoepidemiologica (CEIFE) | 163 | 8.62 | 8493 | 26 (7–59.5) | Spain
4 | University College London | 156 | 8.25 | 5226 | 14 (4.75–37) | UK
5 | London School of Hygiene & Tropical Medicine | 124 | 6.55 | 3696 | 14 (4–40.5) | UK
6 | University of Utrecht | 118 | 6.24 | 5067 | 17.5 (4–48.75) | The Netherlands
7 | University of Pennsylvania | 110 | 5.81 | 7508 | 24.5 (10–64.25) | USA
8 | Medicines and Healthcare Products Regulatory Agency | 94 | 4.97 | 2623 | 14 (4–33.25) | UK
9 | King's College London | 92 | 4.86 | 2221 | 13 (4–32.75) | UK
10 | University of Oxford | 86 | 4.54 | 1806 | 9.5 (3–23.25) | UK
Table 5

Top contributing countries

Rank | Country | No. of papers | Per cent
1 | UK | 1202 | 63.56
2 | USA | 563 | 29.77
3 | Spain | 192 | 10.15
4 | Netherlands | 164 | 8.67
5 | Switzerland | 115 | 6.08
6 | Canada | 112 | 5.92
7 | Sweden | 106 | 5.60
8 | Germany | 64 | 3.38
9 | France | 51 | 2.69
10 | Italy | 36 | 1.90
Figure 1

Network of contributing countries.

The majority of the most productive institutions are universities. Top universities include the University of Nottingham, Boston University, University College London (UCL), the London School of Hygiene & Tropical Medicine and the University of Utrecht. Apart from these academic institutions, a research unit in Spain (CEIFE) and the MHRA in the UK are involved in research based on primary care databases. In table 4, we also report the median citation count (IQR) of each institution's papers. From these, it seems that scholars from CEIFE, the University of Pennsylvania and Boston University coauthor publications that are more highly cited than those of the other institutions in this list. Switzerland, Canada, Sweden, Germany, France and Italy had no institution in the top 10 list, although they were ranked among the top 10 productive countries. Most papers are published by scholars from the UK (63.56%), followed by the USA and Spain. With the exception of the USA and Canada, most of the productive countries are in Europe. What is particularly interesting in these two tables is that scholars at institutions in the USA and Spain produce not only a large number of publications but also publications that are widely recognised by this scientific community in terms of citations. Taking into account weighted degree, PageRank, eigenvector, closeness and betweenness centrality (table 6), we observe that, once again, the UK, followed by the USA, is placed at the centre of this scientific community. With the highest values on all of these measures, institutions in the UK are the best connected and most authoritative, facilitating links between institutions in other countries.
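The median (IQR) values in table 4 summarise the distribution of citations per paper for each institution. The paper does not state which quartile convention was used; the sketch below assumes Excel's inclusive method (QUARTILE.INC), which Python's `statistics.quantiles(..., method="inclusive")` (Python 3.8+) appears to match. The function names and the sample citation counts are illustrative.

```python
from statistics import quantiles

def median_iqr(citations):
    """Return (median, Q1, Q3) of per-paper citation counts.

    Uses the 'inclusive' quartile method (linear interpolation between order
    statistics), assumed here to match Excel's QUARTILE.INC.
    Requires at least two values."""
    q1, q2, q3 = quantiles(citations, n=4, method="inclusive")
    return q2, q1, q3

def format_median_iqr(citations):
    """Format as in table 4, eg '18 (6–44.75)'."""
    med, q1, q3 = median_iqr(citations)
    return f"{med:g} ({q1:g}–{q3:g})"

# Hypothetical citation counts for one institution's papers.
print(format_median_iqr([0, 2, 5, 9, 14, 30, 57]))  # prints 9 (3.5–22)
```

A different quartile convention (for example, the exclusive method) would shift Q1 and Q3 slightly for small samples, so reproduced IQRs may not match table 4 exactly.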
Table 6

Top countries by centrality

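Rank | Country | Occurrences | Weighted degree | PageRank | Eigenvector centrality | Closeness centrality | Betweenness centrality
1 | UK | 5770 | 27.0 | 0.062 | 1.0 | 0.933 | 0.250
2 | USA | 2506 | 26.0 | 0.060 | 0.98 | 0.903 | 0.229
3 | Netherlands | 666 | 22.0 | 0.047 | 0.95 | 0.8 | 0.049
4 | Canada | 567 | 18.0 | 0.039 | 0.83 | 0.717 | 0.024
5 | Switzerland | 409 | 18.0 | 0.039 | 0.85 | 0.717 | 0.015
6 | Italy | 85 | 17.0 | 0.037 | 0.78 | 0.7 | 0.024
7 | Spain | 481 | 17.0 | 0.037 | 0.82 | 0.7 | 0.011
8 | Australia | 37 | 16.0 | 0.036 | 0.76 | 0.682 | 0.024
9 | Sweden | 315 | 16.0 | 0.036 | 0.77 | 0.682 | 0.023
10 | Germany | 121 | 16.0 | 0.036 | 0.71 | 0.682 | 0.020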

Top journals

In table 7, we identify the top 10 journals in which most of this research is published. Six of these journals are published by a UK publisher, and the rest are published in the USA. Pharmacoepidemiology and Drug Safety is at the top of the list, and the British Journal of Clinical Pharmacology and Pharmacotherapy also feature, which signifies the focus of research produced from these primary care databases on the safe use of medication. This focus can also be seen in table 3, where (apart from medicine) most papers are categorised under biochemistry, genetics and molecular biology, and pharmacology, toxicology and pharmaceutics (Scopus classification), and in table 8, where most of the top-cited papers concern the potential risk of particular complications or conditions from the use of specific medications. More specialised journals, such as Diabetes Care and Annals of the Rheumatic Diseases, also feature in this list, indicating a particular focus of research activity on specific groups of diseases.
Table 7

Top journals of published research

Rank | Journal name | No. of papers | Per cent | Publisher | Impact factor (JCR 2014) | Open access | Country
1 | Pharmacoepidemiology and Drug Safety | 115 | 6.08 | Wiley | 2.939 | Hybrid | UK
2 | British Medical Journal | 100 | 5.28 | BMJ Publishing Group | 17.445 | Full | UK
3 | British Journal of General Practice | 67 | 3.54 | Royal College of General Practitioners | 2.294 | Full | UK
4 | British Journal of Clinical Pharmacology | 57 | 3.01 | Wiley | 3.878 | Hybrid | UK
5 | PLoS One | 51 | 2.69 | Public Library of Science | 3.234 | Full | USA
6 | BMJ Open | 41 | 2.16 | BMJ Publishing Group | 2.271 | Full | UK
7 | Pharmacotherapy | 34 | 1.79 | Wiley | 2.662 | Hybrid | USA
8 | Diabetes Care | 30 | 1.58 | American Diabetes Association | 8.420 | Hybrid | USA
9 | Epidemiology | 24 | 1.26 | Wolters Kluwer | 6.196 | Hybrid | USA
10 | Annals of the Rheumatic Diseases | 23 | 1.21 | BMJ Publishing Group | 10.377 | Hybrid | UK
Four journals in this list are fully open access (BMJ, British Journal of General Practice, PLoS One, BMJ Open), which greatly facilitates the sharing of knowledge without limitations. The BMJ enjoys widespread recognition for the high quality of its published studies, as indicated by its high impact factor. The rest are behind a paywall but offer authors the option to publish their research open access (hybrid access). The table also includes the impact factors of these top 10 journals from the 2014 JCR. Of particular importance in terms of scientific impact is that, with the exception of one paper in the BMJ, the 10 most cited papers identified (see table 8) were not published in journals in this list. However, a (full counting) analysis of cocitation links in VOSviewer (figure 2) for journals cited in the Scopus data set (minimum number of citations=10) shows that most of the journals in this list are represented in the cocitation map (blue indicates the lowest density and red the highest).
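Journal cocitation under full counting can be sketched as follows: two journals are cocited once for every citing paper whose reference list contains both. The input format, the function name `journal_cocitation` and the threshold handling are assumptions for illustration and only loosely mirror VOSviewer's options; the software's own implementation may differ in details such as source-name normalisation.

```python
from collections import Counter
from itertools import combinations

def journal_cocitation(reference_lists, min_citations=10):
    """Full-counting journal cocitation.

    reference_lists: one list of cited journal names per citing paper
    (hypothetical input format). A journal is kept only if it receives at
    least min_citations citations in total; each citing paper adds 1 to the
    link between every pair of kept journals it cites."""
    citations = Counter()
    for refs in reference_lists:
        citations.update(refs)  # every reference counts towards the threshold
    kept = {journal for journal, n in citations.items() if n >= min_citations}
    links = Counter()
    for refs in reference_lists:
        cited = sorted(set(refs) & kept)  # each journal once per citing paper
        for pair in combinations(cited, 2):
            links[pair] += 1
    return kept, links

# Three hypothetical citing papers; a low threshold keeps the example small.
refs = [["BMJ", "Lancet"], ["BMJ", "Lancet", "JAMA"], ["BMJ", "JAMA"]]
kept, links = journal_cocitation(refs, min_citations=2)
```

The resulting link strengths are what a density map would then smooth over a two-dimensional layout.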
Table 8

Most cited papers

Rank | Authors/title | Year | Country | Journal | Impact factor | Citations | Open access
1 | Jick H, Zornberg GL, Jick SS, Seshadri S, Drachman DA | 2000 | USA | Lancet | 45.217 | 1322 | No
Statins and the risk of dementia
2 | Gelfand JM, Neimann AL, Shin DB, Wang X, Margolis DJ, Troxel AB | 2006 | USA | Journal of the American Medical Association | 35.289 | 854 | Yes
Risk of myocardial infarction in patients with psoriasis
3 | Van Staa TP, Leufkens HGM, Abenhaim L, Zhang B, Cooper C | 2000 | UK | Journal of Bone and Mineral Research | 6.832 | 796 | Yes
Use of oral corticosteroids and risk of fractures
4 | Henry D, Lim LLY, Rodriguez LAG, Perez Gutthann S, Carson JL, Griffin M, Savage R, Logan R, Moride Y, Hawkey C, Hill S, Fries JT | 1996 | Australia, Spain, USA, New Zealand, UK | British Medical Journal | 17.445 | 688 | Yes
Variability in risk of gastrointestinal complications with individual non-steroidal anti-inflammatory drugs: Results of a collaborative meta-analysis
5 | Yang YX, Lewis JD, Epstein S, Metz DC | 2006 | USA | Journal of the American Medical Association | 35.289 | 637 | Yes
Long-term proton pump inhibitor therapy and risk of hip fracture
6 | Currie CJ, Poole CD, Gale EAM | 2009 | UK | Diabetologia | 6.671 | 629 | Yes
The influence of glucose-lowering therapies on cancer risk in type 2 diabetes
7 | Jick H, Jick SS, Gurewich V, Myers MW, Vasilakis C | 1995 | USA, UK | Lancet | 45.217 | 612 | No
Risk of idiopathic cardiovascular death and nonfatal venous thromboembolism in women using oral contraceptives with differing progestagen components
8 | Dial S, Delaney JAC, Barkun AN, Suissa S | 2005 | Canada | Journal of the American Medical Association | 35.289 | 582 | Yes
Use of gastric acid-suppressive agents and the risk of community-acquired Clostridium difficile-associated disease
9 | Smeeth L, Thomas SL, Hall AJ, Hubbard R, Farrington P, Vallance P | 2004 | UK | New England Journal of Medicine | 55.873 | 546 | Yes
Risk of myocardial infarction and stroke after acute infection or vaccination
10 | Neimann AL, Shin DB, Wang X, Margolis DJ, Troxel AB, Gelfand JM | 2006 | USA | Journal of the American Academy of Dermatology | 4.449 | 528 | No
Prevalence of cardiovascular risk factors in patients with psoriasis
Figure 2

Journal cocitation analysis.


Most cited papers

Next, we focused on the top 10 papers31–40 and calculated the total number of citations for each paper (table 8) for the period 1995–2015. Their citations totalled 7194 (9.73% of all citations in this data set). It seems that these studies of dementia, psoriasis, fractures, cardiovascular diseases and gastrointestinal complications in relation to certain medications have been of great interest to this scientific community. The majority of the top 10 most cited papers (70%) are open access on the publisher's website and can be freely read by anyone. Of the 10 papers, 8 are single-country papers, while none was single-authored. Again, the USA has a considerable presence in this list, producing papers that are highly cited. In addition, many of the highly productive authors identified (table 10) also appear in this list.
Table 10

Most productive authors

Rank | Author | Affiliation | Country | No. of papers | Weighted degree | Clustering | Eigenvector centrality | Closeness centrality | Betweenness centrality
1 | Rodriguez LAG | Spanish Centre for Pharmacoepidemiologic Research (CEIFE) | Spain | 166 | 82.0 | 0.082 | 0.346 | 0.337 | 0.073
2 | Jick SS | Boston University | USA | 142 | 90.0 | 0.066 | 0.327 | 0.329 | 0.078
3 | Van Staa TP | University of Manchester | UK | 115 | 144.0 | 0.063 | 1.0 | 0.403 | 0.217
4 | Jick H | Boston University | USA | 98 | 55.0 | 0.098 | 0.217 | 0.322 | 0.032
5 | Meier CR | University of Basel | Switzerland | 94 | 54.0 | 0.116 | 0.232 | 0.294 | 0.014
6 | Hubbard R | University of Nottingham | UK | 86 | 82.0 | 0.092 | 0.441 | 0.359 | 0.067
7 | Smeeth L | London School of Hygiene & Tropical Medicine | UK | 79 | 89.0 | 0.101 | 0.578 | 0.366 | 0.067
8 | Hippisley-Cox J | University of Nottingham | UK | 72 | 57.0 | 0.144 | 0.244 | 0.325 | 0.065
9 | Johansson S | AstraZeneca | Sweden | 70 | 46.0 | 0.218 | 0.267 | 0.310 | 0.012
10 | Cooper C | University of Southampton | UK | 65 | 67.0 | 0.139 | 0.414 | 0.342 | 0.035
10 | West J | University of Nottingham | UK | 65 | 42.0 | 0.213 | 0.230 | 0.310 | 0.012

Authorship patterns and networks

The number of authors per paper ranged from one to a maximum of 155, the latter for a study about the feasibility of international collaboration to evaluate, based on a common protocol, the risk of Guillain–Barré syndrome following pH1N1 vaccination.41 In total, 9385 authors were involved in the 1891 papers published during 1995–2015. Table 9 shows that more than 90% of all papers had three or more authors. Only 1% of papers were written by a single author, while five papers did not have any authorship details. Almost a quarter of all papers (24.33%) had four authors, and these papers account for 22.23% of all citations. This indicates the high degree of expert collaboration in this field, which is necessary for analysing millions of primary care records.
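Table 9 cross-tabulates papers and citations by the number of authors. A minimal sketch of that tabulation follows; the `(n_authors, citations)` input pairs and the function name `authorship_distribution` are hypothetical, and grouping of large author teams into a single band (such as the >11 row) is left out for brevity.

```python
from collections import Counter

def authorship_distribution(papers):
    """Tabulate papers and citations by number of authors.

    papers: iterable of (n_authors, citations) pairs (illustrative format).
    Returns rows (n_authors, papers, % papers, citations, % citations),
    ordered by number of papers, largest first."""
    paper_counts, citation_counts = Counter(), Counter()
    for n_authors, cites in papers:
        paper_counts[n_authors] += 1
        citation_counts[n_authors] += cites
    total_papers = sum(paper_counts.values())
    total_cites = sum(citation_counts.values()) or 1  # avoid division by zero
    rows = []
    for n, count in paper_counts.most_common():
        rows.append((n, count, round(100 * count / total_papers, 2),
                     citation_counts[n],
                     round(100 * citation_counts[n] / total_cites, 2)))
    return rows

# Four hypothetical papers: (number of authors, citations received).
rows = authorship_distribution([(4, 10), (4, 5), (3, 0), (1, 5)])
```

Ranking rows by number of papers, rather than by number of authors, follows the ordering used in table 9.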
Table 9

Coauthorship distribution

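Rank | No. of authors | No. of papers | Per cent (papers) | Citations | Per cent (citations)
1 | 4 | 460 | 24.33 | 16 437 | 22.23
2 | 5 | 383 | 20.25 | 16 071 | 21.74
3 | 3 | 310 | 16.39 | 12 209 | 16.51
4 | 6 | 252 | 13.33 | 12 084 | 16.35
5 | 2 | 145 | 7.67 | 7159 | 9.68
6 | 7 | 123 | 6.50 | 3798 | 5.14
7 | 8 | 88 | 4.65 | 2914 | 3.94
8 | >11 | 48 | 2.54 | 1792 | 2.42
9 | 9 | 28 | 1.48 | 617 | 0.83
10 | 10 | 25 | 1.32 | 412 | 0.56
11 | 1 | 19 | 1.00 | 436 | 0.59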
Table 10 ranks the top 10 scholars by research productivity, that is, by their overall number of coauthored papers. While most of these scholars are from the UK, the Director of the Spanish Centre for Pharmacoepidemiologic Research (CEIFE)42 is the scholar with the most published research from these primary care databases. There are also researchers in this field who do not come from an academic environment (eg, AstraZeneca in table 10); the pharmaceutical sector is actively involved in knowledge production from electronic primary care records. Considering only those scholars who have coauthored at least two papers in this data set, the analysis produced a network (figure 3) with 1261 nodes and 6186 edges. Here, each node represents an author, while its size denotes the number of the author's papers. The interconnecting lines (edges) denote the papers coauthored between those authors. For better visualisation, we displayed only nodes with a degree of at least 5 (maximum degree=145). After computing modularity to identify community structure,43 we observe several established collaborative teams (clusters with different colours) around specific, highly productive scholars who also appear in table 10. We also observe a new (blue) cluster around the lead statistician for THIN,44 one of the three primary care databases studied.
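The modularity score used to identify community structure can be written as Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the total edge weight, L_c the edge weight inside community c and d_c the summed weighted degree of its nodes. The sketch below only evaluates Q for a given partition; tools such as Gephi search heuristically for a partition with high Q. The function name and inputs are illustrative.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of a weighted, undirected graph.

    edges: dict mapping (u, v) node pairs to positive weights (no self-loops)
    community: dict mapping every node to a community label
    Q = sum over communities c of [L_c / m - (d_c / (2 * m)) ** 2], where m is
    the total edge weight, L_c the weight of edges inside c and d_c the sum of
    the weighted degrees of the nodes in c."""
    m = sum(edges.values())
    internal = defaultdict(float)
    degree_sum = defaultdict(float)
    for (u, v), w in edges.items():
        degree_sum[community[u]] += w  # each edge adds w to both endpoints
        degree_sum[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single edge, split into the two obvious groups.
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, groups), 4))  # prints 0.3571 (= 5/14)
```

Putting every node in one community gives Q = 0, so positive values indicate more within-group collaboration than expected by chance.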
Figure 3

Clustered coauthor network.

Taking into account weighted degree, clustering, eigenvector centrality, closeness centrality and betweenness centrality (table 10), the results indicate a cluster placed at the centre of this scientific community. With the lowest clustering coefficient and the highest values on all the other measures, its most prominent scholar (Van Staa TP) is the best connected, linking other scientific clusters and scholars more than any other scholar does. Also particularly interesting is that some of the scholars (and their institutions) in this list are affiliated, to varying extents, with these databases, having served or currently acting as their founders, directors, lead scientists or members of their scientific committees.45 46 For example, lead scientists from the Boston Collaborative Drug Surveillance Program at Boston University were among the first to develop the technical and scientific capacity of these databases for pharmacoepidemiological research.47 48
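To make two of the network measures in table 10 concrete, the sketch below computes the local clustering coefficient (the share of a node's neighbour pairs that are themselves connected) and closeness centrality (reachable nodes divided by the sum of shortest-path distances, found by breadth-first search). It ignores edge weights and uses a hypothetical adjacency-set graph; Gephi's weighting and normalisation choices may differ, so these numbers are not expected to reproduce table 10.

```python
from collections import deque

def clustering(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    adj maps each node to the set of its neighbours."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; report 0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

def closeness(adj, node):
    """Closeness centrality via breadth-first search (unweighted edges).

    Only nodes reachable from `node` are counted."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical coauthor graph: A wrote with B, C and D; B also wrote with C.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
```

In this toy graph, A bridges D to the B–C pair: it has the highest closeness but a lower clustering coefficient than B or C, the same pattern (low clustering, high centrality) described for the most prominent scholar above.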

Research topics

We conducted a keyword analysis to identify important topics of published research. For this, we first extracted 5813 unique keywords, as indexed by Scopus,30 from the bibliographic data set, so as to base our analysis on more complete indexing information than authors' keywords provide. We retrieved the top 30 keywords in two categories: medical conditions and medications/substances (tables 11 and 12). Next, we created a (full counting) term co-occurrence density map in VOSviewer (figure 4) by building a text corpus from the title and abstract fields of the bibliographic data set (minimum number of occurrences of a term=10). In this way, we were able to identify topics that not only appear frequently in the literature but are also strongly related to each other, forming clusters of topics. Blue indicates a low density of terms and red the highest density. In many cases, the density map mirrors the frequency of the indexed keywords in tables 11 and 12. Clearly, smoking, diabetes, cardiovascular diseases, mental illnesses, psoriasis, obesity, pregnancy and cancer, as well as medications and substances that can treat these conditions, such as aspirin, insulin, antidepressants and non-steroidal anti-inflammatory agents (NSAIDs), have been of great interest to scholars using EHRs in primary care.
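Keyword ranking and full-counting co-occurrence can be sketched as below. This assumes each paper has already been reduced to a list of keywords (as with the Scopus index keywords behind tables 11 and 12); the VOSviewer map in figure 4 instead extracts terms from titles and abstracts, a text-mining step omitted here. The function name, the lower-casing and the thresholds are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def keyword_stats(docs, min_occurrences=10, top_n=30):
    """Rank keywords and count pairwise co-occurrence (full counting).

    docs: iterable of keyword lists, one per paper (hypothetical format).
    A keyword's occurrence count is the number of papers it appears in;
    only keywords reaching min_occurrences enter the co-occurrence counts."""
    docs = [{k.strip().lower() for k in kws} for kws in docs]
    occurrences = Counter(k for kws in docs for k in kws)
    top = occurrences.most_common(top_n)
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}
    cooccurrence = Counter()
    for kws in docs:
        for pair in combinations(sorted(kws & kept), 2):
            cooccurrence[pair] += 1  # each paper adds 1 per kept pair
    return top, cooccurrence

# Three hypothetical papers and their index keywords.
docs = [["Smoking", "Diabetes"], ["smoking", "Asthma"], ["Diabetes", "Smoking"]]
top, cooc = keyword_stats(docs, min_occurrences=2, top_n=2)
```

Counting each keyword once per paper keeps a term repeated within one record from inflating its weight.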
Table 11

Top keywords: medical conditions

Rank | Keyword | Occurrences
1 | Smoking | 328
2 | Diabetes mellitus | 223
3 | Hypertension | 223
4 | Non-insulin dependent diabetes mellitus | 179
5 | Depression | 167
6 | Stroke | 165
7 | Asthma | 158
8 | Diabetes mellitus, type 2 | 155
9 | Cancer risk | 150
10 | Cardiovascular risk | 147
11 | Cardiovascular disease | 133
12 | Obesity | 129
13 | Heart infarction | 126
14 | Pregnancy | 125
15 | Ischaemic heart disease | 104
16 | Cardiovascular diseases | 96
17 | Myocardial infarction | 94
18 | Chronic obstructive lung disease | 90
19 | Heart failure | 86
20 | Cerebrovascular accident | 85
21 | Rheumatoid arthritis | 83
22 | Epilepsy | 81
23 | Breast cancer | 78
24 | Fracture | 75
25 | Psoriasis | 75
26 | Gastrointestinal haemorrhage | 69
27 | Hip fracture | 68
28 | Osteoporosis | 68
29 | Colorectal cancer | 65
30 | Fractures, bone | 65
Table 12

Top keywords: medications/substances

Rank | Keyword | Occurrences
1 | Non-steroid anti-inflammatory agent | 182
2 | Acetylsalicylic acid | 154
3 | Metformin | 150
4 | Corticosteroid | 143
5 | Hydroxymethylglutaryl coenzyme a reductase inhibitor | 138
6 | Insulin | 133
7 | Antidepressant agent | 124
8 | β adrenergic receptor blocking agent | 108
9 | Hypoglycemic agents | 91
10 | Anti-inflammatory agents, non-steroidal | 90
11 | Dipeptidyl carboxypeptidase inhibitor | 88
12 | Antihypertensive agent | 82
13 | Neuroleptic agent | 80
14 | Antibiotic agent | 77
15 | Hemoglobin A1c | 75
16 | Proton pump inhibitor | 75
17 | Warfarin | 71
18 | Antidiabetic agent | 69
19 | Anticonvulsive agent | 68
20 | Serotonin uptake inhibitor | 65
21 | Calcium channel blocking agent | 64
22 | Antibacterial agents | 62
23 | Hydroxymethylglutaryl-coA reductase inhibitors | 62
24 | Oral antidiabetic agent | 61
25 | Paracetamol | 56
26 | Diuretic agent | 53
27 | Ibuprofen | 52
28 | Simvastatin | 52
29 | Tricyclic antidepressant agent | 52
30 | Diclofenac | 50
Figure 4

Term co-occurrence density map.


Discussion

This study identified the leading institutions, countries, authors, journals and topics, as well as their networks, in published research that has used UK primary care databases to extract and analyse data from EHRs. The growing number of such papers reflects the interest of a global and highly collaborative scientific community in this field, and the knowledge and insights it can offer for healthcare improvement. Publication output increased from 7 papers in 1995 to 171 by October 2015, an 18.65% compound annual growth rate (CAGR) over 1995–2014. It is worth noting that a similar Scopus search for the same period, limited to the UK and using the keyword ‘primary care’, yielded a CAGR of 10.83%, which shows that research from these databases has grown faster than the field more generally. The vast majority of publications (96.5%) were journal articles. While this research belongs mainly to medicine, biochemical and pharmaceutical research seem equally important, addressing widespread medical conditions such as diabetes, cardiovascular diseases, mental illnesses, smoking, obesity and cancer. The UK has been well placed in this scientific field, partly because more than 30 years of data are now available in GP information systems.49 Investment in developing primary care databases from EHRs for research purposes has placed the country at the centre of a global network of collaborations, bringing together international expertise for the analysis of ever-expanding and increasingly interlinked clinical data sets from primary, secondary and tertiary care. Access to these data sets has also allowed researchers and institutions from other countries to develop their own research programmes to answer important clinical questions.
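The compound annual growth rate is the constant yearly rate that would take output from its first-year value to its last-year value. A minimal sketch of the calculation follows. The function name and the counts in the example are hypothetical; they are not the study's yearly publication figures.

```python
def cagr(start_value, end_value, start_year, end_year):
    """Compound annual growth rate between two years.

    CAGR = (end_value / start_value) ** (1 / (end_year - start_year)) - 1
    """
    if start_value <= 0 or end_year <= start_year:
        raise ValueError("need a positive start value and end_year > start_year")
    periods = end_year - start_year  # number of yearly intervals
    return (end_value / start_value) ** (1 / periods) - 1


if __name__ == "__main__":
    # Hypothetical counts: 100 papers in 2000 growing to 121 in 2002.
    print(f"{cagr(100, 121, 2000, 2002):.2%}")  # 10.00%
```

Under this usual convention, a 1995–2014 window spans 19 yearly intervals, so the exponent would be 1/19.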
Six of the most productive institutions are located in the UK, and 63.56% of publications were authored by scholars affiliated with the UK, followed by the USA. Interestingly, the top institutions were not exclusively universities. They include a research unit in Spain (CEIFE) and the UK executive agency that funds and runs the CPRD database (MHRA), while one of the most productive scholars is affiliated with a pharmaceutical company. This reflects the considerable interest of academic, governmental and private-sector actors in research with primary care databases. The same geographical trend appears in the location of the journals that published the most papers: six of the top 10 journals are published in the UK, followed by the USA. The journals with the most papers included Pharmacoepidemiology and Drug Safety, BMJ, British Journal of General Practice and British Journal of Clinical Pharmacology, reflecting scholars' strong interest in using EHR data for pharmaceutical research. This is partly because drug histories are one of the oldest sets of routine information collected by GP practices in the UK and made available by these databases.47 Regarding access to research outputs, only four journals in this list are fully open access, which may limit access to knowledge for researchers and members of the public who cannot afford subscription costs. Interestingly, it is the more established medical journals, such as JAMA, Lancet and NEJM, that have published some of the most cited papers in this bibliometric data set and that show a high level of cocitation activity.
Keyword analyses show that smoking, diabetes, cardiovascular diseases, mental illnesses, psoriasis, obesity, pregnancy and cancer are the main topics of research using EHRs in primary care. Often, this research concentrates on developing algorithms to estimate the risk of a particular disease occurring. Researchers are also interested in medications that can treat these conditions, such as aspirin, other NSAIDs, insulin and antidepressants.
For the vast majority of publications, the number of authors varied between three and six, indicating widely collaborative, international efforts to promote research in this field. Coauthorship network analyses showed that the lead scientists, directors and founders of these databases were, to various degrees, at the centre of clusters in this scientific community, highlighting their invaluable contribution to knowledge production. As Azoulay et al50 have demonstrated in their study of eminent researchers and the vitality of a field, the development of coauthorship networks and clusters of collaborators in newly established scientific domains may help boost research productivity. Given each database's data access requirements, its established researchers appear to play a fundamental role in facilitating and promoting international collaborations for more researchers, institutions and countries. Importantly, they have a clear, in-depth understanding of the kinds of research these databases can support in terms of data quality, structure and EHR coding practices. These databases are expected to open up to more stakeholders from various health-related disciplines. Universities, meanwhile, are preparing to incorporate training in data science skills (eg, statistics, biomedical informatics, biology and medicine)51 into their clinical curricula to nurture the next generation of clinical investigators.52 In this context, these established researchers could promote high-quality, reliable and ethically appropriate scientific research53 from complicated and highly contextual data sets.
Our study has the typical limitations of a scientometric study. We analysed articles published over a 20-year period to explore the historical breadth and growth of research from electronic primary care records. However, the analysis is limited to structured data retrieved from a single bibliometric database of peer-reviewed literature, so only articles published in journals indexed there were analysed. Also, some of the latest articles and related citations might not have been retrieved at the time of the search, which might explain the decrease in the number of publications and citations, particularly from 2010 onwards. It was beyond the scope of this quantitative study to assess the scientific quality and socioeconomic impact of the large number of publications analysed here, which deployed a range of study designs across many subfields of primary care research and reported various findings. Our main objective was restricted to assessing one aspect of academic impact and research quality, namely patterns and trends in research outputs.54 Future research could focus on the wider academic and socioeconomic impact of these studies by examining how publications, citation patterns and collaborations relate to the development of new scientific methods in the field or of new medical products and healthcare services.
In conclusion, the output of primary care research from EHRs has increased consistently since these databases were developed. Their development in the UK has placed the country and affiliated academic institutions at the centre of an expanding global scientific community, facilitating international collaborations and bringing together international expertise in medicine, biochemical and pharmaceutical research.
References (first 10 of 32)

1.  The UK General Practice Research Database.

Authors:  T Walley; A Mantgani
Journal:  Lancet       Date:  1997-10-11

2.  Guidelines for good database selection and use in pharmacoepidemiology research.

Authors:  Gillian C Hall; Brian Sauer; Alison Bourke; Jeffrey S Brown; Matthew W Reynolds; Robert LoCasale
Journal:  Pharmacoepidemiol Drug Saf       Date:  2011-11-08

3.  Feasibility study and methodology to create a quality-evaluated database of primary care data.

Authors:  Alison Bourke; Hassy Dattani; Michael Robinson
Journal:  Inform Prim Care       Date:  2004

4.  International collaboration to assess the risk of Guillain Barré Syndrome following Influenza A (H1N1) 2009 monovalent vaccines.

Authors:  Caitlin N Dodd; Silvana A Romio; Steven Black; Claudia Vellozzi; Nick Andrews; Miriam Sturkenboom; Patrick Zuber; Wei Hua; Jan Bonhoeffer; Jim Buttery; Nigel Crawford; Genevieve Deceuninck; Corinne de Vries; Philippe De Wals; M Victoria Gutierrez-Gimeno; Harald Heijbel; Hayley Hughes; Kwan Hur; Anders Hviid; Jeffrey Kelman; Tehri Kilpi; S K Chuang; Kristine Macartney; Melisa Rett; Vesta Richardson Lopez-Callada; Daniel Salmon; Francisco Gimenez-Sanchez; Nuria Sanz; Barbara Silverman; Jann Storsaeter; Umapathi Thirugnanam; Nicoline van der Maas; Katherine Yih; Tao Zhang; Hector Izurieta
Journal:  Vaccine       Date:  2013-06-14

5.  Use of gastric acid-suppressive agents and the risk of community-acquired Clostridium difficile-associated disease.

Authors:  Sandra Dial; J A C Delaney; Alan N Barkun; Samy Suissa
Journal:  JAMA       Date:  2005-12-21

6.  Statins and the risk of dementia.

Authors:  H Jick; G L Zornberg; S S Jick; S Seshadri; D A Drachman
Journal:  Lancet       Date:  2000-11-11

7.  Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system.

Authors:  Harlan M Krumholz
Journal:  Health Aff (Millwood)       Date:  2014-07

8.  The National Institutes of Health's Big Data to Knowledge (BD2K) initiative: capitalizing on biomedical big data.

Authors:  Ronald Margolis; Leslie Derr; Michelle Dunn; Michael Huerta; Jennie Larkin; Jerry Sheehan; Mark Guyer; Eric D Green
Journal:  J Am Med Inform Assoc       Date:  2014-07-09

9.  Data resource profile: cardiovascular disease research using linked bespoke studies and electronic health records (CALIBER).

Authors:  Spiros C Denaxas; Julie George; Emily Herrett; Anoop D Shah; Dipak Kalra; Aroon D Hingorani; Mika Kivimaki; Adam D Timmis; Liam Smeeth; Harry Hemingway
Journal:  Int J Epidemiol       Date:  2012-12-05

10.  ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software.

Authors:  Mathieu Jacomy; Tommaso Venturini; Sebastien Heymann; Mathieu Bastian
Journal:  PLoS One       Date:  2014-06-10
