
A network-based method with privacy-preserving for identifying influential providers in large healthcare service systems.

Xiaoyu Qi¹, Gang Mei¹, Salvatore Cuomo², Lei Xiao¹.

Abstract

In data science, networks provide a useful abstraction of the structure of many complex systems, ranging from social systems and computer networks to biological networks and physical systems. Healthcare service systems are one of the main social systems that can also be understood using network-based approaches, for example, to identify and evaluate influential providers. In this paper, we propose a network-based method with privacy-preserving for identifying influential providers in large healthcare service systems. First, the provider-interacting network is constructed by employing publicly available information on locations and types of healthcare services of providers. Second, the ranking of nodes in the generated provider-interacting network is conducted in parallel on the basis of four nodal influence metrics. Third, the impact of the top-ranked influential nodes in the provider-interacting network is evaluated using three indicators. Compared with other research work based on patient-sharing networks, in this paper, the provider-interacting network of healthcare service providers can be roughly created according to the locations and the publicly available types of healthcare services, without the need for personally private electronic medical claims, thus protecting the privacy of patients. The proposed method is demonstrated by employing Physician and Other Supplier Data CY 2017, and can be applied to other similar datasets to help make decisions for the optimization of healthcare resources in the response to public health emergencies.


Keywords: Algorithm; Data science; Healthcare service system; Influential node; Network analysis; Network resilience

Year: 2020 PMID: 32296253 PMCID: PMC7157485 DOI: 10.1016/j.future.2020.04.004

Source DB: PubMed Journal: Future Gener Comput Syst ISSN: 0167-739X Impact factor: 7.187

Introduction

A network is a collection of nodes/vertices joined together in pairs by links/edges. Networks are used in many fields to represent the patterns of connections between the components of various complex systems [1]. They provide a useful abstraction of the structures of many complex systems, ranging from social systems and computer networks to biological networks and physical systems. Much research has been conducted to extract insights from network data, especially based on the topology of networks [2].

Healthcare service systems are one of the main social systems that can also be understood using network-based approaches [3]. For example, social network analysis can be used to estimate patient flows, evaluate cooperation between healthcare providers, or identify influential providers in networked healthcare service systems. The insights extracted from networked healthcare systems can be exploited to improve healthcare service utilization and management [3]. Identifying the most influential providers in a healthcare service system is an important step towards optimizing the use of available healthcare resources and ensuring more efficient delivery of appropriate healthcare services. This task can be abstracted as identifying the influential nodes in networks, which mainly consists of two stages: (1) creating reasonable networks for the healthcare service systems of interest and (2) identifying the influential nodes in the created networks.

Much research has been conducted to abstract healthcare systems as networks, and most of this research has focused on creating “patient-sharing networks” based on electronic medical claims [4]. For example, Pollack et al. [5] first created patient-sharing networks among physicians using medical claims data and then investigated care coordination and costs of care. Ong et al. [6] created a patient-sharing network using commercial healthcare claims spanning the years 2008 through 2011 and elucidated the effect of professional relationships among providers on multiple-provider prescriptions of benzodiazepines. Moreover, using Medicare Part B claims data, Moen et al. [7] first constructed a network of physicians who care for patients with cardiovascular disease based on patient-sharing relationships and then analyzed a case study of physicians’ adherence to clinical guidelines for the prevention of sudden cardiac death. Using a dataset consisting of 1306 physicians who practiced within the Chicago hospital referral region in 2016 and who collectively had 12,091 patient-sharing ties, Linde [8] created a patient-sharing network and examined the degree to which hospital affiliation drives physicians’ sharing of Medicare patients. In addition, Landon et al. [9] identified patient-sharing networks in which physicians share patients, information, and behaviors and presented a social network analysis of Medicare administrative data from 2006 to 2010 in 51 hospital referral regions. Similarly, DuGoff et al. [10] constructed patient-sharing networks using administrative data in which pairs of physicians are considered connected if they both deliver care to the same patient and examined the approaches to conceptualizing, measuring, and analyzing provider patient-sharing networks.

There are typically two procedures in identifying influential nodes in the created networks: (1) ranking nodes according to nodal influence metrics, such as degree centrality [11], clustering coefficients [12], H-Index [13], [14], and k-shell [15], [16]; and (2) evaluating the influence of top-ranked nodes by comparing the structures and functions of the network before and after removing a certain percentage of top-ranked nodes. Currently, many algorithms have been proposed for identifying influential nodes.
Most of the identification algorithms are designed for online social networks [17], [18], [19]. In these algorithms, some influence measures, such as degree centrality [11], betweenness centrality [20], closeness centrality [21], and Katz centrality [22], are highly dependent on the topological structure of the network. Moreover, Xiao et al. [12] used companion behaviors and the clustering coefficient to rank nodes. Lu et al. [13] adopted the H-Index [14] to identify vital nodes. Kitsak et al. [23] proposed k-shell decomposition to identify influential nodes. In addition, the development of search engines has produced well-known algorithms for ranking nodes, such as PageRank [24], HITS [25], and LeaderRank [26]. These ranking algorithms have been thoroughly investigated and summarized in several reviews [20], [27]. Each node ranking and influential node identification method has its own advantages and disadvantages, so users can select suitable metrics and develop algorithms for their specific applications.

In this paper, we propose a network-based method with privacy-preserving for identifying influential providers in large healthcare service systems. First, the provider-interacting network is constructed by employing publicly available information on the locations and types of healthcare services of providers. Second, the ranking of nodes in the generated provider-interacting network is conducted in parallel on the basis of four nodal influence metrics, including the degree centrality (DC) [11], the companion behaviors (CB) [12], the clustering coefficients (CC) [12], and H-Index [13], [14]. Third, the impact of top-ranking influential nodes in the provider-interacting network is evaluated by comparing three network indicators, including the maximum connectivity coefficient [28], network efficiency [29], [30] and susceptibility [27].
The proposed method is finally demonstrated by employing Physician and Other Supplier Data CY 2017 [31] and can be applied to other similar datasets to help make healthcare system management decisions, such as the optimization of healthcare resources.

The novelty of the proposed method can be explained as follows. As mentioned above, most of the related research is based on patient-sharing networks, which are created according to electronic medical claims and focus on the sharing of patient information between providers. The advantage of patient-sharing networks is that they are quite precise. However, electronic medical claims data are personal and private [32], [33] and are generally not directly available to the public. Healthcare organizations need to comply with privacy regulations and rules regarding patient information, and researchers need to handle personal private information carefully before creating the required patient-sharing network.

In contrast, our study is based on a provider-interacting network rather than a patient-sharing network. In the proposed method, the provider-interacting network joined by healthcare providers can be roughly created according to the locations and the available types of healthcare services, without the need for personal, private electronic medical claims data. We only utilize the relationships between providers to identify influential providers, thus protecting the privacy of patients. Most importantly, the locations and the provided types of healthcare services are publicly available, for example, from the U.S. Centers for Medicare & Medicaid Services (https://www.cms.gov/). The networks connected by healthcare providers in this way are referred to as provider-interacting networks.
The main contributions of this paper can be summarized as follows. (1) We construct a provider-interacting network by employing publicly available information on locations and types of healthcare services of providers. (2) We rank the influential nodes of the created provider-interacting networks using four local metrics. (3) We evaluate the impact of the top-ranked influential nodes in the provider-interacting network using three indicators. The rest of this paper is organized as follows. Section 2 describes the details of the proposed network-based method for identifying the influential provider of healthcare service in the provider-interacting network. Section 3 presents the application of the proposed method for the Physician and Other Supplier Data CY 2017 [31]. Section 4 discusses the experimental results and the proposed method. Finally, Section 5 draws several conclusions.

Materials and methods

In this section, we will first introduce the data source and then describe the details of the proposed network-based method.

Data source

In this paper, the data of providers and other suppliers who have effective National Provider Identifiers (NPIs) and submit Part B medical insurance services in the United States from 2012 to 2017 are obtained from the U.S. Centers for Medicare & Medicaid Services. We first extract the fields needed for network construction:
(1) npi: NPI for the performing provider on the claim. The provider’s NPI is the numeric identifier registered in the NPPES. Each provider has a unique NPI.
(2) nppes_provider_zip: The provider’s ZIP code.
(3) hcpcs_code: Healthcare Common Procedure Coding System (HCPCS) code used to identify the specific medical service provided.
The availability of the obtained dataset is as follows. Dataset: Medicare Physician and Other Supplier PUF, CY2017, Interactive Dataset [31]. URL: https://www.cms.gov/.
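The field extraction described above can be sketched in Python as follows. This is only an illustration, not the authors' script: the function name `load_provider_fields`, the comma delimiter, the truncation of ZIP codes to five digits, and the sample rows are all assumptions made for the example; only the three column names come from the text.

```python
import csv
import io


def load_provider_fields(csv_file, delimiter=","):
    """Collect, for each NPI, its ZIP code and the set of HCPCS codes it bills.

    `csv_file` is any file-like object with a header row containing the
    columns npi, nppes_provider_zip and hcpcs_code.
    """
    providers = {}
    for row in csv.DictReader(csv_file, delimiter=delimiter):
        entry = providers.setdefault(
            row["npi"],
            # Assumption: keep only the 5-digit ZIP prefix for centroid lookup.
            {"zip": row["nppes_provider_zip"][:5], "hcpcs": set()},
        )
        entry["hcpcs"].add(row["hcpcs_code"])
    return providers


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "npi,nppes_provider_zip,hcpcs_code\n"
    "1000000001,900120000,99213\n"
    "1000000001,900120000,G0008\n"
    "1000000002,891010000,99213\n"
)
providers = load_provider_fields(sample)
```

Each provider ends up with one ZIP code and one set of HCPCS codes, which is the shape the later neighbor rules operate on.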

Overview of the proposed method

In this paper, we propose a network-based method with privacy-preserving for identifying influential providers in large healthcare service systems. The proposed network-based method is composed of three main procedures. The first procedure is the construction of the network, which uses the ZIP code and HCPCS code of the providers to build the provider-interacting networks. The key step in this procedure is to set a threshold value for the location of the neighboring providers, specify that providers within the threshold range can generate the connection relationship, and add weights to the edges according to the similarity of the types of medical services provided. The second procedure is to rank influential nodes. The ranking of the nodes in the generated provider-interacting network is conducted in parallel on the basis of four nodal influence metrics, including the DC, CB, CC, and H-Index. The third procedure is to evaluate the impact of the identified influential nodes. By removing a certain proportion of the top-ranked nodes, the impact on the three indicators of the network is evaluated, including the maximum connectivity coefficient [28], network efficiency [29], [30], and susceptibility [27], to evaluate the effectiveness of the influential nodes ranking algorithm. The flowchart of the proposed network-based method for identifying influential providers in a healthcare service system is illustrated in Fig. 1.

Fig. 1

Flowchart of the proposed network-based method for identifying influential providers in a healthcare service system.

Procedures of the proposed method

Construction of the provider-interacting network

First, we cleaned and processed the obtained data. By writing a specific Python script, we extracted the NPI, ZIP code, and HCPCS code of 898,257 providers from Physician and Other Supplier Data CY 2017 [31]. Address matching was used to map geographic entities to addresses (spatial locations). There are three methods for address matching: (1) by street address, (2) by ZIP code, and (3) by boundary information [34], [35], [36]. In this paper, we employed the second approach by using the ZIP code of the providers. The centroid coordinates based on ZIP code were converted into longitude and latitude and then represented by nodes on the map (Fig. 2). The size of each node is the weight added according to the number of different medical services provided by the providers. The greater the number is, the greater the weight, and the larger the point (Fig. 3).

Fig. 2

Map of the nationwide providers.

Fig. 3

Weighted nodes distribution in a local area.

We constructed provider-interacting networks for the 50 U.S. states from the providers who have an effective NPI and submit Part B medical insurance services. Each network can be abstracted as an undirected weighted network. The model of the undirected weighted network is given as a triple G = (V, E, W), where V = {v_1, v_2, ..., v_n} represents the set of nodes, with v_i ∈ V; E = {e_1, e_2, ..., e_m} represents the set of edges; and W = [w_ij] is the weight matrix of connected edges, where w_ii = 0, w_ij ≥ 0, and w(ij) represents the weight of edge (v_i, v_j) [37]; see Eq. (1).

Each provider is regarded as a node. Node v_i and node v_j are connected by weighted lines. The weighted lines are created by considering (1) ZIP code proximity and (2) HCPCS code similarity.

(1) ZIP code proximity
We converted the centroid coordinates of each provider’s ZIP code to latitude and longitude. Node v_i has longitude X_i and latitude Y_i; node v_j has longitude X_j and latitude Y_j. If the absolute value of the difference between the longitudes of the two nodes, |X_i − X_j|, and the absolute value of the difference between their latitudes, |Y_i − Y_j|, are both within a certain range, then the two providers are considered potential neighbors; see Eq. (2). The rationale is that, according to the division of cities, districts, and streets, people who live in the same city are likely to know each other. After statistical calculation, the average value of the longitude threshold is 0.2°, and the average value of the latitude threshold is also 0.2°.

(2) HCPCS code similarity
The HCPCS is divided into two principal subsystems, referred to as level I and level II. Level I of the HCPCS consists of CPT (Current Procedural Terminology), a numeric coding system maintained by the American Medical Association (AMA). Level II codes are also referred to as alpha-numeric codes because they consist of a single letter of the alphabet followed by 4 numeric digits, while CPT codes are identified using 5 numeric digits. For each alpha-numeric HCPCS code, there is descriptive terminology that identifies a category of similar items. In this paper, the set of HCPCS codes for medical services provided by node v_i is represented by S_i, and the set of HCPCS codes provided by node v_j is represented by S_j. If two providers have at least one HCPCS code in common, they are considered potential neighbors, and the weight of the edge is the number of shared codes, |S_i ∩ S_j| (Eq. (3)).

Due to the large number of nodes in the nationwide provider-interacting network, we choose a representative provider-interacting network in California and Nevada for research. The distributions of the providers in California and Nevada are shown in Fig. 4. According to the above two rules for finding neighbors of providers, we generate local provider-interacting networks as shown in Fig. 5 and Fig. 6.

Fig. 4

The distribution of providers in California and Nevada.

Fig. 5

The local provider-interacting network in California and Nevada.

Fig. 6

Partially enlarged map of the provider-interacting network.


Ranking of the influential nodes

In network science, one of the most important research areas is how to rank influential nodes in a complex network [38]. When a network has a massive number of nodes, ranking them helps find the vital providers that have a strong impact on the function and efficiency of the whole network. For example, it becomes possible to replace missing vital nodes in a timely manner to protect the network from paralysis and maintain the stability of the medical system [39]. Moreover, Gao et al. [40] proposed a local structural centrality measure to rank the spreading ability of nodes in the network. Zhao et al. [41] proposed the IM-LPA algorithm to solve the influence maximization problem in social networks with a community structure. Mo et al. [42] proposed a new comprehensive centrality measure based on Dempster–Shafer evidence theory. Ranking algorithms can be roughly divided into two categories: one is based on local nodal influence metrics such as the degree centrality, and the other is based on global metrics such as the betweenness centrality, closeness centrality, and k-shell metrics [43]. In this paper, after considering the ranking metrics that are relatively suitable for undirected weighted provider-interacting networks, four nodal influence metrics are used to rank nodes, including (1) the degree centrality [11], (2) the companion behaviors [12], (3) the clustering coefficient [12], and (4) the H-Index [13], [14].

(1) Degree Centrality (DC)
DC is measured by the degree of a node, which directly reflects the possibility of the node having direct contact with other nodes in the network (Eq. (4)):

DC_i = \frac{K_i}{n - 1}

where DC_i indicates the DC of node v_i and K_i indicates the degree of node v_i. To reflect the function of weighted edges, we specify that the number of connecting edges between nodes v_i and v_j equals the number of HCPCS codes that are the same between them. When calculating the DC of node v_i, the added connected edges are treated as an equivalent number of additional neighbors. For a network with n nodes, the maximum degree of a node is n − 1. After the degree is divided by n − 1 for normalization, 0 ≤ DC_i ≤ 1. The larger the value is, the stronger the centrality of the node is [37].

(2) Companion Behaviors (CB)
The local metric CB was originally proposed by Mei et al. [12] and is based on the calculation of Jaccard Coefficients (JCs) of edges. The JC reflects the overlap between the neighbors of two nodes, that is, the relationship strength of the two nodes (Eq. (5)). If the JC of two nodes is small, the relationship between the two nodes may be weak. Conversely, if the JC of two nodes is large, the relationship between the two nodes may be strong. Suppose v_s is a common neighbor of nodes v_i and v_j, v_i and v_s have x HCPCS codes that are the same, and v_j and v_s have y HCPCS codes that are the same. Similarly, to reflect the function of weighted edges, we specify that the number of connecting edges between two nodes equals the number of HCPCS codes they have in common. The increased number of connected edges is equivalent to the increased number of neighbors (Fig. 7(a)), and the contribution of v_s to the intersection of the neighbors of v_i and v_j is min(x, y).

Fig. 7

Illustrations of the summations of Jaccard Coefficient.

For example, node v_i shares an edge with v_j, node v_i has 5 connected providers (including node v_j), node v_j has 4 connected providers (including node v_i), and the two nodes have one common neighbor, provider v_s. At the same time, there are five identical HCPCS codes between v_i and v_s and four identical HCPCS codes between v_j and v_s, which means that v_i, v_j and v_s are connected by min(5, 4) = 4 triangles. The edges of node v_i and node v_j with the other nodes are shown in Table 1. In Fig. 7(b), JC = 4/(3 + 2 + 1 + 5 + 1 + 4 + 3 + 4 − 4) = 4/19. JC represents the weight of the edge, and its value is always in the range of 0–1. The weight of a node is represented by CB, which is the sum of the JCs of the edges connected to the node (Eq. (6)).

Table 1

The number of identical HCPCS codes shared by nodes v_i and v_j with other neighbor nodes.

Connection relationship		Number of identical HCPCS codes
Node vi	Neighbor 1	3
Node vi	Neighbor 2	2
Node vi	Neighbor 3	1
Node vi	Neighbor vs	5
Node vi	Neighbor vj	1
Node vj	Neighbor 4	4
Node vj	Neighbor 5	3
Node vj	Neighbor vs	4
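The worked example in Table 1 can be checked with a few lines of Python. This is a minimal sketch under assumptions: the function name `weighted_jaccard` and the neighbor labels are hypothetical, and the union is computed as the total edge weight around both nodes (counting their shared edge once) minus the intersection, which is how the 4/19 figure in the text decomposes.

```python
from fractions import Fraction


def weighted_jaccard(w_i, w_j, i_label):
    """Weighted Jaccard coefficient of the edge between two nodes.

    w_i, w_j map each neighbor label to the number of identical HCPCS codes
    (the edge weight). i_label is the label of node i inside w_j, so that the
    edge between the two nodes is counted only once.
    """
    common = set(w_i) & set(w_j)
    # Each common neighbor contributes min(x, y) triangles.
    intersection = sum(min(w_i[s], w_j[s]) for s in common)
    total = sum(w_i.values()) + sum(
        w for nbr, w in w_j.items() if nbr != i_label
    )
    return Fraction(intersection, total - intersection)


# Edge weights from Table 1 (the vi-vj edge has weight 1).
w_i = {"n1": 3, "n2": 2, "n3": 1, "vs": 5, "vj": 1}
w_j = {"n4": 4, "n5": 3, "vs": 4, "vi": 1}
jc = weighted_jaccard(w_i, w_j, "vi")
print(jc)  # 4/19
```

Summing this coefficient over all edges incident to a node would give that node's CB value.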

(3) Clustering Coefficient (CC)
CC reflects the tightness of the connections between the neighbors of a node. For a node v_i with a degree of K_i, the number of edges among its K_i neighbors is at most K_i(K_i − 1)/2. If the actual number of edges among the neighbors of v_i is E_i, the CC of the node can be calculated via Eq. (7):

CC_i = \frac{2 E_i}{K_i (K_i - 1)}

In special cases, if the degree K_i of the node is 0 or 1, the CC of the node is considered to be 0. The nodal CC therefore always lies between 0 and 1 [20], [44].

(4) H-Index
The definition of the H-Index proposed by J.E. Hirsch [14] is as follows. All papers published by an author are sorted in descending order of citation frequency. If and only if the citation frequency of each of the first h papers is at least h and the citation frequency of the (h + 1)-th paper is less than h + 1, then the value h is defined as the H-Index of the author. If c_i is set as the citation frequency of article i, the expression of the H-Index can be written as in Eq. (8). Similarly, a node has an H-Index of h if h is the largest value such that at least h of its neighbors have a degree of at least h [12].
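To make the DC, CC, and H-Index definitions concrete, the following Python sketch computes them on a small unweighted toy graph. It is an illustration only: the paper's own implementation is the parallel code of reference [12] and additionally treats shared HCPCS codes as multiple edges, which this sketch omits. The adjacency-dictionary layout and the function names are assumptions.

```python
def degree_centrality(adj, v):
    """Degree of v divided by the largest possible degree, n - 1."""
    return len(adj[v]) / (len(adj) - 1)


def clustering_coefficient(adj, v):
    """Fraction of possible edges among the neighbors of v that exist."""
    nbrs = sorted(adj[v])
    k = len(nbrs)
    if k < 2:  # degree 0 or 1: defined as 0
        return 0.0
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nbrs[b] in adj[nbrs[a]]
    )
    return 2 * links / (k * (k - 1))


def h_index(adj, v):
    """Largest h such that at least h neighbors of v have degree >= h."""
    degrees = sorted((len(adj[u]) for u in adj[v]), reverse=True)
    h = 0
    for rank, d in enumerate(degrees, start=1):
        if d < rank:
            break
        h = rank
    return h


# Toy undirected graph: a-b, a-c, a-d, b-c, c-d, plus an isolated node e.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
    "e": set(),
}
dc_a = degree_centrality(adj, "a")
cc_a = clustering_coefficient(adj, "a")
h_a = h_index(adj, "a")
```

For node a, two of the three possible neighbor pairs are linked, and its neighbors have degrees 3, 2, and 2, so the H-Index is 2.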

Impact evaluation of the identified influential nodes

There are two typical categories of indicators to evaluate the effectiveness of influential node ranking algorithms: (1) the propagation dynamics of the network and (2) the resilience (also called the robustness) of the network [30]. Resilience is a system’s ability to adjust its activity to retain its basic functionality when errors, failures, and environmental changes occur [45]. Research on network resilience under external damage such as attacks has drawn much attention in network science [46]. In general, there are two kinds of attacks on complex networks: random attacks and selective attacks. In a random attack, nodes are randomly removed with a certain probability, while in a selective attack, nodes or edges are selectively deleted in a certain way [47]. The structural robustness of networks subjected to random failures or malicious attacks has been widely studied [28].

Based on the concept of network resilience, in this paper, we employ three indicators, including the maximum connectivity coefficient, network efficiency, and susceptibility, to quantify the impact of selective removal of top-ranked influential nodes on the network structure and function. We abstract this situation as node deletion due to the influential provider’s own reasons (such as retirement or resignation) or other external factors. The results can be examined in two aspects: (a) the connectivity of the network is damaged; and (b) the efficiency of the network declines, so that the network no longer meets the business requirements.

(1) Maximum connectivity coefficient
The maximum connectivity coefficient of a network can be calculated by first ranking the nodes according to the nodal influence metrics from the largest to the smallest and then observing the impact of removing a part of the nodes on the giant connected component (Eq. (9)):

G = \frac{R}{n}

where n represents the total number of nodes in the network and R represents the size of the giant connected component after node removal. The smaller the giant connected component after the removal of nodes, that is, the faster G declines, the more effective the ranking method is at attacking the network compared with other methods.

(2) Network efficiency
Network efficiency can be used to evaluate the connectivity of the network and thus the effect of node removal (Eq. (10)). When nodes and all their corresponding edges are removed from the network, some paths are interrupted, so the shortest paths between some nodes become longer and the average path length of the whole network increases, which affects network connectivity [30]:

\eta = \frac{1}{n(n-1)} \sum_{i \neq j} \eta_{ij}

where η_ij = 1/d_ij, d_ij is the length of the shortest path between nodes v_i and v_j, and n is the number of network nodes. In this paper, we remove a certain proportion of specific nodes in the network to simulate the effect of a network attack and then calculate the network efficiency decline ratio before and after the attack to quantify the accuracy of each node influence metric. The proportion of network efficiency decrease is expressed through Eq. (11):

\mu = 1 - \frac{\eta}{\eta_0}

where η represents the network efficiency after node removal, η_0 represents the original network efficiency, and 0 ≤ μ ≤ 1. The higher the value of μ is, the worse the network efficiency becomes after removing the nodes.

(3) Susceptibility
The giant connected component decreases with an increase in the number of removed nodes and vanishes when a critical proportion p_c of nodes is removed [27]. To find the accurate value of p_c, the susceptibility of the network is calculated before and after removing a certain number of nodes (Eq. (12)):

S = \sum_{s < R} \frac{n_s s^2}{n}

where n_s is the number of components of size s, n is the size of the whole network and R represents the size of the giant connected component. In general, there is a peak value of S at the critical proportion p_c at which the network collapses. If the network experiences multiple collapses during selective node deletion, there are multiple peaks, and the value of p_c is determined by the highest one. Obviously, according to the objective function on network connectivity, the smaller the value of p_c is, the better the sorting algorithm.
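The three indicators can be illustrated on a toy graph with the following Python sketch. It is not the SNAP-based implementation used in the paper; it uses the standard textbook definitions (giant-component fraction, mean inverse shortest-path length, and sum of squared sizes of non-giant components), and the function names, the adjacency-dictionary layout, and the choice to normalize efficiency by the current number of nodes are assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def component_sizes(adj):
    seen, sizes = set(), []
    for v in adj:
        if v not in seen:
            comp = bfs_distances(adj, v)
            seen.update(comp)
            sizes.append(len(comp))
    return sizes


def max_connectivity(adj, n_original):
    """G: size of the giant component over the original node count."""
    return max(component_sizes(adj), default=0) / n_original


def efficiency(adj):
    """Mean of 1/d over ordered node pairs (unreachable pairs count as 0)."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = sum(
        1.0 / d for v in adj for d in bfs_distances(adj, v).values() if d > 0
    )
    return total / (n * (n - 1))


def susceptibility(adj, n_original):
    """S: sum of squared sizes of components smaller than the giant one."""
    sizes = component_sizes(adj)
    giant = max(sizes, default=0)
    return sum(s * s for s in sizes if s < giant) / n_original


def remove_nodes(adj, removed):
    removed = set(removed)
    return {v: nbrs - removed for v, nbrs in adj.items() if v not in removed}


# Toy graph: hub a with leaves b, c and a chain a-d-e-f.
adj = {
    "a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"},
    "d": {"a", "e"}, "e": {"d", "f"}, "f": {"e"},
}
after = remove_nodes(adj, ["a"])  # selective attack on the top-degree node
g_after = max_connectivity(after, len(adj))
eta_before, eta_after = efficiency(adj), efficiency(after)
mu = 1 - eta_after / eta_before
s_after = susceptibility(after, len(adj))
```

Removing the hub leaves two isolated nodes and a three-node chain, so G falls from 1 to 0.5 and the efficiency drops from 0.6 to 0.25.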

Implementation details of the proposed method

In this subsection, we introduce more implementation details of the proposed network-based method for identifying influential providers in healthcare service systems. As described above, the method consists of three main procedures: (1) constructing the provider-interacting network, (2) ranking the influential nodes in the generated provider-interacting network, and (3) evaluating the impact of top-ranked influential nodes. The details of each procedure are given below.

Implementation details of constructing the provider-interacting network

There are two steps in this procedure: the extraction of NPI, ZIP code, and HCPCS data and the construction of the provider-interacting network according to (1) the ZIP code proximity and (2) the HCPCS code similarity. We wrote a Python script specifically to extract the NPI, ZIP code, and HCPCS of each provider from the raw dataset, and we obtained 898,257 providers from Physician and Other Supplier Data CY 2017 [31]. The entire Python script is presented in Listing 1. We also specifically wrote C/C++ code to construct the local provider-interacting network in California and Nevada according to (1) the ZIP code proximity and (2) the HCPCS code similarity. After reading in the data of the NPI, ZIP code, and HCPCS of each provider, we first exclude those providers located outside California and Nevada according to the latitude and longitude. Those providers with latitudes in the range of 32.5 and 42 and with longitudes in the range of −124.5 and −114 are kept, while the rest are excluded. All the remaining providers are stored in a node list. We then implement a double loop over all the providers stored in the node list to form the links between providers by (1) comparing the HCPCS codes and (2) evaluating the distance between any pair of providers. If two providers have the same HCPCS codes and the providers are also located within a given region (see Eq. (2)), then a link/edge is created and stored in the edge list. If there are duplicate edges, then those edges will be merged into a unique one using sorting and scanning operations, while the weight of the merged edge indicates the number of duplications.
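The construction steps above can be sketched compactly in Python. Note that the authors implemented this step in C/C++ and merged duplicate edges by sorting and scanning; the sketch below swaps in a set intersection, which yields the same edge weight (the number of shared HCPCS codes). The names `build_edges`, `LON_THRESHOLD`, and `LAT_THRESHOLD`, the input layout, the inclusive comparisons, and the sample coordinates are assumptions for illustration.

```python
from itertools import combinations

LON_THRESHOLD = 0.2   # degrees, from the text
LAT_THRESHOLD = 0.2   # degrees, from the text
LAT_RANGE = (32.5, 42.0)       # bounding box used for California and Nevada
LON_RANGE = (-124.5, -114.0)


def build_edges(providers):
    """providers: {npi: (longitude, latitude, set_of_hcpcs_codes)}.

    Returns {(npi_a, npi_b): weight}, where weight is the number of HCPCS
    codes the two providers share.
    """
    # Step 1: keep only providers inside the bounding box.
    nodes = {
        npi: p
        for npi, p in providers.items()
        if LON_RANGE[0] <= p[0] <= LON_RANGE[1]
        and LAT_RANGE[0] <= p[1] <= LAT_RANGE[1]
    }
    # Step 2: double loop over all pairs of remaining providers.
    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        lon_a, lat_a, codes_a = nodes[a]
        lon_b, lat_b, codes_b = nodes[b]
        close = (
            abs(lon_a - lon_b) <= LON_THRESHOLD
            and abs(lat_a - lat_b) <= LAT_THRESHOLD
        )
        shared = len(codes_a & codes_b)
        if close and shared > 0:
            edges[(a, b)] = shared
    return edges


# Made-up providers: A and B are close together, C is far from both,
# and D lies outside the bounding box.
providers = {
    "A": (-118.25, 34.05, {"99213", "99214", "G0008"}),
    "B": (-118.30, 34.10, {"99213", "99214"}),
    "C": (-115.14, 36.17, {"99213"}),
    "D": (-74.00, 40.70, {"99213"}),
}
edges = build_edges(providers)
```

The double loop is quadratic in the number of providers, which matches the description in the text but would need spatial bucketing to scale comfortably to the nationwide dataset.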

Implementation details of ranking the influential nodes

In the proposed method, the ranking of nodes in the generated provider-interacting network is conducted in parallel on the basis of four nodal influence metrics, including the DC, CB, CC, and H-Index. In this paper, we directly employ the source code presented in our previous work [12] to rank the influential nodes. Much more implementation detail is provided in reference [12].

Implementation details of evaluating the impact of top-ranked influential nodes

In the proposed method, the impact of those top-ranked influential nodes in the provider-interacting network is evaluated by comparing three network indicators: the maximum connectivity coefficient, network efficiency, and susceptibility. This procedure is implemented by invoking the Stanford Network Analysis Platform (SNAP) [48]. Much more implementation detail is provided in reference [49].

Results

Experimental environment

The performance of the proposed method is evaluated on a workstation computer. Detailed specifications of the employed workstation computer are listed in Table 2.

Table 2

Specifications of the workstation computer for testing the proposed method.

Specifications	Details
CPU	Intel Xeon Gold 5118 CPU
CPU Frequency (GHz)	2.30
CPU RAM (GB)	128
CPU core	48
GPU	Quadro P6000
GPU memory (GB)	24
CUDA cores	3840
OS	Windows 10 professional
Compiler	VS2015 community
CUDA version	v9.0
Anaconda version	Python 3.7


Experimental results

Results of ranking influential nodes

The frequency distribution of the four node-ranking metrics is shown in Fig. 8. Fig. 8(a) shows the frequency distribution of the unweighted node degrees, and Fig. 8(b) shows the frequency distribution of the weighted node degrees.

Fig. 8

Frequency distributions of node ranking using four nodal influence metrics.

Fig. 8 shows that the unweighted degree of nodes ranges from 1 to 4050, with 98.7% of nodes below 200; the weighted degree of nodes ranges from 1 to 14,247, with 99.1% below 1000; and the CB of nodes ranges from 0 to 40.2575, with 98.6% below 2. The distributions of the degree and CB metrics are extremely uneven from low to high, and most nodes have low values. The CC of nodes ranges from 0 to 1, with 20.3% below 0.05; the H-Index of nodes ranges from 1 to 99, with 34.1% below 5. The distributions of the CC and H-Index metrics are relatively regular from low to high, but low values still account for the largest proportion. The general trend is that the proportion of nodes decreases as the CC and H-Index values increase.

Results of impact evaluation of the top-ranked influential nodes

To evaluate the influence of static attacks (i.e., the node importance remains unchanged) on the maximum connectivity coefficient, network efficiency, and susceptibility of the network, we selected 11 groups of removal ratios (0%, 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1%) for testing; see Table 3, Table 4, Table 5, Table 6 and Fig. 9 for the detailed test results.
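The static attack can be sketched as follows: the ranking is computed once on the intact network, and the top fraction p of nodes is then removed together with their edges. This is a minimal Python illustration on a toy graph, using unweighted degree as the ranking; the function names are hypothetical, and the sketch does not attempt to reproduce the node and edge counts reported in the tables.

```python
def degree_scores(adj):
    """Unweighted degree of every node, used here as the ranking metric."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def remove_top_ranked(adj, scores, ratio):
    """Remove the top `ratio` fraction of nodes by score (static ranking)."""
    k = int(round(ratio * len(adj)))
    ranked = sorted(adj, key=lambda node: scores[node], reverse=True)
    removed = set(ranked[:k])
    return {
        u: [v for v in neighbors if v not in removed]
        for u, neighbors in adj.items()
        if u not in removed
    }


def edge_count(adj):
    """Number of undirected edges in a symmetric adjacency mapping."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


# Toy graph: a hub (node 0) connected to nine leaves.
star = {0: list(range(1, 10))}
for leaf in range(1, 10):
    star[leaf] = [0]

scores = degree_scores(star)  # computed once, never updated (static attack)
for ratio in (0.0, 0.1, 0.2):
    attacked = remove_top_ranked(star, scores, ratio)
    print(ratio, len(attacked), edge_count(attacked))
```

Because the scores are not recomputed after each removal, the same ranking is reused for every removal ratio, which is what distinguishes a static attack from a dynamic one.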

Table 3

Influences of removing the top nodes with different proportions on the network resilience when using the DC ranking.

Removal ratio (p)	Number of nodes	Number of edges	G (%)	η (%)	S
0.0%	45 552	668 954	0.508	0.081	3596.355
0.1%	45 429	539 246	0.505	0.078	3592.650
0.2%	45 126	444 925	0.500	0.074	3579.129
0.3%	44 738	367 210	0.495	0.069	3498.248
0.4%	44 305	301 859	0.489	0.064	3407.059
0.5%	43 477	245 117	0.481	0.062	3182.364
0.6%	42 730	194 460	0.471	0.056	3084.534
0.7%	41 274	151 044	0.451	0.049	2857.727
0.8%	38 821	119 703	0.407	0.041	2708.904
0.9%	35 281	92 997	0.379	0.032	2024.314
1.0%	32 407	72 804	0.343	0.025	1678.572

Table 4

Influences of removing the top nodes with different proportions on the network resilience when using the CB ranking.

Removal ratio (p)	Number of nodes	Number of edges	G (%)	η (%)	S
0.0%	45 552	668 954	0.508	0.081	3596.355
0.1%	45 460	569 467	0.506	0.080	3590.922
0.2%	45 304	468 619	0.504	0.078	3567.002
0.3%	45 163	374 261	0.501	0.076	3552.402
0.4%	44 746	292 790	0.493	0.069	3535.150
0.5%	43 733	226 624	0.478	0.062	3365.996
0.6%	41 577	181 441	0.451	0.053	2922.463
0.7%	40 600	148 456	0.442	0.049	2700.981
0.8%	38 625	120 309	0.413	0.042	2510.987
0.9%	36 157	96 810	0.391	0.035	1945.490
1.0%	33 561	76 743	0.363	0.028	1154.344

Table 5

Influences of removing the top nodes with different proportions on the network resilience when using the CC ranking.

Removal ratio (p)	Number of nodes	Number of edges	G (%)	η (%)	S
0.0%	45 552	668 954	0.508	0.081	3596.355
0.1%	45 379	668 638	0.508	0.081	3566.708
0.2%	45 294	668 480	0.507	0.081	3549.826
0.3%	45 191	668 141	0.507	0.081	3541.666
0.4%	45 064	667 858	0.506	0.081	3526.877
0.5%	44 920	667 610	0.506	0.081	3518.809
0.6%	44 743	667 252	0.505	0.080	3465.465
0.7%	44 425	666 895	0.505	0.080	3452.600
0.8%	44 157	666 511	0.505	0.008	3444.969
0.9%	44 041	666 224	0.503	0.008	3438.460
1.0%	43 869	665 941	0.503	0.008	3413.773

Table 6

Influences of removing the top nodes with different proportions on the network resilience when using the H-Index ranking.

Removal ratio (p)	Number of nodes	Number of edges	G (%)	η (%)	S
0.0%	45 552	668 954	0.508	0.081	3596.355
0.1%	45 499	623 144	0.507	0.080	3596.355
0.2%	45 377	538 173	0.504	0.078	3595.746
0.3%	45 109	475 009	0.498	0.071	3594.700
0.4%	44 680	437 861	0.490	0.068	3585.881
0.5%	44 284	421 755	0.483	0.066	3576.973
0.6%	44 018	410 930	0.476	0.063	3574.111
0.7%	43 656	403 373	0.468	0.061	3568.747
0.8%	42 880	388 296	0.453	0.058	3562.841
0.9%	42 133	379 275	0.439	0.054	3540.003
1.0%	41 857	375 023	0.434	0.053	3507.595

Fig. 9

The influence of removing the top nodes with different proportions on the (a) maximum connectivity coefficient, (b) network efficiency, and (c) susceptibility.

Fig. 9 shows the effect of removing the top-ranked nodes at different removal ratios on the maximum connectivity coefficient, network efficiency, and susceptibility. After the top-ranked nodes are removed, the giant connected component shrinks; the more it shrinks, the more the maximum connectivity coefficient declines and the worse the network efficiency becomes. Fig. 9(a) and (b) show that when the removal ratio reaches 0.5%–1%, removing the top-ranked nodes according to the metric DC leads to the largest decline in the maximum connectivity coefficient and network efficiency, followed by the metrics CB and H-Index. The metric CC leads to only a slight decrease in the maximum connectivity coefficient and network efficiency, which remain almost unchanged. In Fig. 9(c), the metrics CC and H-Index reduce the susceptibility by only a small margin, leaving it almost unchanged, while the metric CB reduces the susceptibility by the largest margin, followed by the metric DC. These results show that, in the provider-interacting network, when the top 0.1%–1% of nodes are selectively removed for each metric, the efficiency, connectivity and susceptibility of the network degrade the most when the metrics DC and CB are used to select the removed nodes.
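The comparison between metrics can be checked directly from the first and last rows of Tables 3 to 6. The short Python sketch below computes the relative decline in G and S between p = 0% and p = 1% for each ranking; the values are copied from the tables, while the helper name and the data layout are illustrative only.

```python
# Values at p = 0% (identical in Tables 3 to 6) and at p = 1.0% for each ranking.
BASELINE = {"G": 0.508, "S": 3596.355}
AT_ONE_PERCENT = {
    "DC": {"G": 0.343, "S": 1678.572},
    "CB": {"G": 0.363, "S": 1154.344},
    "CC": {"G": 0.503, "S": 3413.773},
    "H-Index": {"G": 0.434, "S": 3507.595},
}


def relative_decline(before, after):
    """Decline from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


declines = {
    metric: {name: relative_decline(BASELINE[name], value) for name, value in row.items()}
    for metric, row in AT_ONE_PERCENT.items()
}
for metric, row in declines.items():
    print(f"{metric:8s} G: {row['G']:5.1f}%  S: {row['S']:5.1f}%")
```

With the tabulated values, DC gives the largest relative decline in G, whereas CB gives the largest relative decline in S (about 68%, against about 53% for DC).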

Discussion

Advantage of the proposed method

The most obvious advantage of the proposed method is that it does not require the information of personally private medical claims. As mentioned above, patient-sharing networks are created according to electronic medical claims, and the advantage of creating patient-sharing networks is that they are quite precise. However, the data of electronic medical claims are personal and private [32], [33]. Personally private information needs to be carefully addressed before creating the required patient-sharing networks [50], [51]. In contrast, in the proposed method, the provider-interacting network joined by healthcare providers can be roughly created according to the location and the available types of healthcare services, without the need for personally private electronic medical claims. For example, the data of the location and the provided types of healthcare service are publicly available at the U.S. Centers for Medicare & Medicaid Services (https://www.cms.gov/). This is the main advantage of the proposed method.

Shortcomings of the proposed method

One of the essential ideas behind the proposed method is the construction of the provider-interacting network in place of the commonly used patient-sharing network. Patient-sharing networks are typically constructed from medical claims data. Because medical claims are usually quite accurate, a patient-sharing network reflects the relationships between providers quite accurately. In contrast, the provider-interacting network is generated according to the locations and available healthcare services of the providers, more specifically, according to their ZIP codes and HCPCS codes. In the proposed method, nearby providers located within a certain distance of each other who provide the same HCPCS codes are linked and considered neighbors in the provider-interacting network. This procedure may not be accurate because the method cannot determine the “best” distance threshold for selecting neighboring providers; the threshold is simply specified in advance. Thus, the provider-interacting network is not as accurate as the patient-sharing network.
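The sensitivity to the distance threshold can be illustrated with a small Python sketch of the linking rule: two providers are connected when they lie within the threshold and share at least one HCPCS code. This is a simplified sketch under stated assumptions, not the construction procedure used in this work: the coordinates stand in for ZIP code locations, the great-circle (haversine) distance is an assumed choice, and the provider records, code sets, thresholds, and function names are all hypothetical.

```python
import math
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def build_edges(providers, threshold_km):
    """Link providers within `threshold_km` that share at least one HCPCS code."""
    edges = set()
    for (id_a, a), (id_b, b) in combinations(sorted(providers.items()), 2):
        close = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= threshold_km
        if close and a["hcpcs"] & b["hcpcs"]:
            edges.add((id_a, id_b))
    return edges


# Hypothetical providers; locations and code sets are illustrative only.
providers = {
    "A": {"lat": 34.05, "lon": -118.24, "hcpcs": {"99213", "93000"}},
    "B": {"lat": 34.06, "lon": -118.25, "hcpcs": {"99213"}},
    "C": {"lat": 36.17, "lon": -115.14, "hcpcs": {"99213"}},
    "D": {"lat": 34.05, "lon": -118.25, "hcpcs": {"70450"}},
}
for threshold_km in (10, 500):
    print(threshold_km, sorted(build_edges(providers, threshold_km)))
```

On this toy input, the smaller threshold links only the two nearby providers that share a code, while the larger threshold also links the distant provider, which shows how strongly the resulting network depends on the specified threshold.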

Applicability of the proposed method

The proposed method can be applied to help make decisions for healthcare systems management, such as the optimization of healthcare resources in the response to public health emergencies [52]. For example, since mid-December 2019, coronavirus disease 2019 (COVID-19) has been spreading from Wuhan, China. The epidemic quickly disseminated from Wuhan, and as of 12 February 2020, 45,179 cases have been confirmed in 25 countries, including 1116 deaths [53]. As of March 1, 2020, it has spread to 58 other countries [54]. Recognizing the Wuhan-focused and nationwide outbreak responses in China, the WHO has encouraged countries with heavy air travel exchanges with Wuhan to take precautionary public health measures and, if there is an imported infection, to undertake activities that could lead to the elimination of the virus in human populations, as occurred during the 2003 SARS outbreak [55]. Rapid and effective collaboration between the clinicians (e.g., the general practitioners attending the cases, emergency hotline clinicians, and infectious diseases specialists), the National Reference Centre and the regional and national health authorities has played a crucial role in the systemic capacity to quickly detect, isolate and investigate cases to implement adequate control measures [56], [57]. In China, during the period of epidemic prevention, the demand for front-line medical staff, disinfection materials, protective equipment and emergency supplies in the respiratory tract infection department increased dramatically. In contrast, many patients with other diseases went to the hospital as little as possible; thus, the investment in diagnosis and treatment in other departments of the hospital decreased, and the required medical resources (e.g., the labor and material resources) decreased accordingly. 
It is therefore meaningful to employ the proposed method to build a network of hospital providers, identify the important providers that must remain to maintain the normal operation of local medical services, and transfer other redundant providers and available medical materials to support areas heavily affected by the epidemic. In this way, the proposed method can help ensure that local hospitals maintain their capability to provide healthcare services while medical resources are distributed reasonably, which plays an important role in epidemic prevention. Moreover, more attention needs to be paid to strengthening the distribution of personnel and facilities in district- and community-level medical institutions. In combination with urban planning, the method in this paper can help public medical resources flow between different levels corresponding to different geographical units and be transferred at the grassroots level, so as to establish a resilient and hierarchical healthcare service system.

Outlook and future work

Due to the large number of nodes in the provider-interacting network, in this paper we consider only the providers in California and Nevada. Moreover, as discussed above, nearby providers located within a certain distance who provide the same HCPCS codes are linked, and the distance threshold is specified in advance because the method cannot determine the “best” threshold for selecting neighboring providers. In the future, we will generate a nationwide provider-interacting network for further research and optimize the selection of a proper distance threshold. In addition, with significant advances in communication technologies, the era of the “Internet of Things” (IoT) has emerged [58], [59]. A large amount of IoT data can be collected in various ways [60], [61], which may be used to generate networks for healthcare service systems. Identifying influential providers in those networks can also help optimize and distribute healthcare service resources. In the future, we will also conduct research work in this field.

Conclusions

Identifying the most influential providers in a healthcare service system is an important step towards optimizing the use of available healthcare resources and ensuring the more efficient delivery of matched healthcare services. In this paper, we proposed a privacy-preserving, network-based method for identifying influential providers in large healthcare service systems. First, we constructed a provider-interacting network by employing the publicly available information on the locations and types of healthcare services of providers. Second, we ranked the nodes in the generated provider-interacting network on the basis of four nodal influence metrics. Third, we evaluated the impact of the top-ranked influential nodes in the provider-interacting network by comparing three network indicators. The proposed method is demonstrated using the Physician and Other Supplier Data CY 2017 dataset, and can be applied to other similar datasets to help make decisions for healthcare system management, such as the optimization of healthcare resources in the response to public health emergencies.

CRediT authorship contribution statement

Xiaoyu Qi: Conceptualization, Data curation, Methodology, Software, Writing - original draft. Gang Mei: Supervision, Conceptualization, Data curation, Methodology, Writing - original draft, Writing - review & editing. Salvatore Cuomo: Investigation, Writing - review & editing. Lei Xiao: Software, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

28 in total

1. Identifying influential and susceptible members of social networks.

Authors: Sinan Aral; Dylan Walker
Journal: Science Date: 2012-06-21 Impact factor: 47.728

2. Universal resilience patterns in complex networks.

Authors: Jianxi Gao; Baruch Barzel; Albert-László Barabási
Journal: Nature Date: 2016-02-18 Impact factor: 49.962

3. Patient-Sharing Networks of Physicians and Health Care Utilization and Spending Among Medicare Beneficiaries.

Authors: Bruce E Landon; Nancy L Keating; Jukka-Pekka Onnela; Alan M Zaslavsky; Nicholas A Christakis; A James O'Malley
Journal: JAMA Intern Med Date: 2018-01-01 Impact factor: 21.873

4. Leaders in social networks, the Delicious case.

Authors: Linyuan Lü; Yi-Cheng Zhang; Chi Ho Yeung; Tao Zhou
Journal: PLoS One Date: 2011-06-27 Impact factor: 3.240

5. Provider Patient-Sharing Networks and Multiple-Provider Prescribing of Benzodiazepines.

Authors: Mei-Sing Ong; Karen L Olson; Aurel Cami; Chunfu Liu; Fang Tian; Nandini Selvam; Kenneth D Mandl
Journal: J Gen Intern Med Date: 2015-07-18 Impact factor: 5.128

6. Disparities in obesity rates: analysis by ZIP code area.

Authors: Adam Drewnowski; Colin D Rehm; David Solet
Journal: Soc Sci Med Date: 2007-08-29 Impact factor: 4.634

7. The H-index of a network node and its relation to degree and coreness.

Authors: Linyuan Lü; Tao Zhou; Qian-Ming Zhang; H Eugene Stanley
Journal: Nat Commun Date: 2016-01-12 Impact factor: 14.919

8. Case of the Index Patient Who Caused Tertiary Transmission of COVID-19 Infection in Korea: the Application of Lopinavir/Ritonavir for the Treatment of COVID-19 Infected Pneumonia Monitored by Quantitative RT-PCR.

Authors: Jaegyun Lim; Seunghyun Jeon; Hyun Young Shin; Moon Jung Kim; Yu Min Seong; Wang Jun Lee; Kang Won Choe; Yu Min Kang; Baeckseung Lee; Sang Joon Park
Journal: J Korean Med Sci Date: 2020-02-17 Impact factor: 2.153

9. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2).

Authors: Ruiyun Li; Sen Pei; Bin Chen; Yimeng Song; Tao Zhang; Wan Yang; Jeffrey Shaman
Journal: Science Date: 2020-03-16 Impact factor: 47.728

10. The formation of physician patient sharing networks in medicare: Exploring the effect of hospital affiliation.

Authors: Sebastian Linde
Journal: Health Econ Date: 2019-10-28 Impact factor: 3.046
