Regional medical inter-institutional cooperation in medical provider network constructed using patient claims data from Japan.

Yu Ohki¹, Yuichi Ikeda¹, Susumu Kunisawa², Yuichi Imanaka².

Abstract

The aging world population requires a sustainable and high-quality healthcare system. To examine the efficiency of medical cooperation, medical provider and physician networks were constructed using patient claims data. Previous studies have shown that these networks contain information on medical cooperation. However, the usage patterns of multiple medical providers in a series of medical services have not been considered. In addition, these studies used only general network features to represent medical cooperation, but their expressive ability was low. To overcome these limitations, we analyzed the medical provider network to examine its overall contribution to the quality of healthcare provided by cooperation between medical providers in a series of medical services. This study focused on: i) the method of feature extraction from the network, ii) incorporation of the usage pattern of medical providers, and iii) expressive ability of the statistical model. Femoral neck fractures were selected as the target disease. To build the medical provider networks, we analyzed the patient claims data from a single prefecture in Japan between January 1, 2014 and December 31, 2019. We considered four types of models. Models 1 and 2 use node strength and linear regression, with Model 2 also incorporating patient age as an input. Models 3 and 4 use feature representation by node2vec with linear regression and regression tree ensemble, a machine learning method. The results showed that medical providers with higher levels of cooperation reduce the duration of hospital stay. The overall contribution of the medical cooperation to the duration of hospital stay extracted from the medical provider network using node2vec is approximately 20%, which is approximately 20 times higher than the model using strength.

Year: 2022 PMID: 36001543 PMCID: PMC9401144 DOI: 10.1371/journal.pone.0266211

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

As the global population ages, there is an increasing need for healthcare systems that can sustainably provide high-quality care to the elderly. According to the United Nations’ World Population Prospects 2019, the aging rate, defined as the percentage of the total population aged 65 and above, is predicted to increase from 9.3% in 2020 to 15.9% in 2050 [1]. The world as a whole is therefore expected to become a super-aged society, and efficient medical services are needed to address this situation. Care coordination among multiple participants can improve medical outcomes while limiting healthcare costs [2]. We used network science to evaluate how cooperation among medical providers and physicians contributes to efficient healthcare. Researchers have proposed various methods for constructing physician and medical provider networks from shared or transferred patients [3]. Previous studies on physician networks have examined the relationship between network features and patient-level medical outcome measures such as medical cost [4-9], quality of care [10], and mortality [11], as well as the relationship between various medical outcomes and network features at the regional level [12]. These studies indicate that physician networks contain information about medical cooperation and the quality of healthcare. Compared with physician networks, medical provider networks have been studied far less. The extent to which medical cooperation, as extracted from the medical provider network, affects the quality of healthcare provision therefore remains an open question. Ostovari and Yu and Gander et al. examined the relationship between the network features of medical providers and patient-level medical outcomes [13, 14]. These studies assigned to each patient the single medical provider that spent the most resources on that patient and did not incorporate the usage patterns of multiple medical providers in a patient’s series of medical care. 
These studies, including those of physician networks, have used general network features to represent cooperation among medical providers and physicians. However, the general network features extract only one aspect of cooperation, which may not fully represent the information. We expect the network to contain more information on medical cooperation to explain the quality of healthcare provision. The healthcare system in Japan requires more extended hospital stays than those in other countries. In Japan, the acute, recovery, and chronic care hospitals are separated to share healthcare provision and efficiently provide healthcare with the aim of shortening the duration of hospital stay. We focus on this regional medical inter-institutional cooperation in Japan. From this perspective, it is crucial to examine the contribution of cooperation among medical providers to a patient’s usage pattern of medical providers. We consider that smooth cooperation among medical providers leads to good results. However, it is unclear how much cooperation takes place, how cooperation is implemented, and what its implications are for individual patients. Therefore, it is necessary to establish the relationship between medical cooperation and the quality of healthcare, considering the usage patterns of medical providers. We analyzed the medical provider network to examine the overall contribution to healthcare quality by cooperation among the medical providers. This study focused on: i) the method of feature extraction from the network; ii) incorporating the usage pattern of medical providers; iii) expressive ability for statistical models. First, we extracted features representing each medical provider in a network structure through machine learning. Second, we integrated these features of medical providers used by a patient with each case and obtained features corresponding to the usage patterns of medical providers. 
Third, we used a machine learning model with a high expressive ability to explain the healthcare quality based on these features. We selected femoral neck fractures as the target disease. Femoral neck fractures frequently occur in the elderly, and the treatment shifts from hospitalization in the acute phase to rehabilitation in the recovery phase and then to long-term care after discharge [15-17]. This condition is therefore suitable for evaluating the effect of regional medical inter-institutional cooperation across the transitions between treatment phases. We analyzed the claims data of patients residing in a prefecture in Japan between January 1, 2014 and December 31, 2019 to construct a network representing the medical providers’ cooperation. We considered several models, ranging from a model using node strength and linear regression to a model using the feature representation by node2vec and a regression tree ensemble. We used these regression models to examine the relationship between the duration of hospital stay and the features of nodes in the medical provider network. We also analyzed the relationship between medical cooperation and network structure using community analysis. This study clarifies how the strength of each medical provider relates to the duration of hospital stay. The statistical relationship between strength and the duration of hospital stay is weak, even when age is considered. In contrast, the model using the feature representation by node2vec as input variables and a regression tree ensemble as the regression model was more explanatory than the models using strength. The community analysis revealed that network structures with high strength have short distances, high clustering, and weak disassortativity. This result indicates that a horizontally decentralized network structure is more effective for medical cooperation than a centralized one. 
This study is valuable because it reveals that regional medical inter-institutional cooperation, as represented by the medical provider network, is an important factor in shortening the hospital stay. Its overall contribution to the duration of hospital stay for femoral neck fracture is approximately 20%. In addition, we investigated which network structures are effective for medical cooperation. These findings suggest ways to make regional medical inter-institutional cooperation function well, and thereby contribute to an efficient healthcare system that can cope with a super-aged society. The remainder of this paper is organized as follows. The following section describes the role of medical cooperation among healthcare providers for a femoral neck fracture. We then describe the data, data analysis method, statistical model, and community analysis in the “Data and methods” section. The “Results” section presents the relationship between the features of medical providers and the duration of hospital stay based on the constructed network. Using a community analysis, we also examine the relationship between the network structure and medical cooperation. The “Discussion” section discusses the results, and the “Summary” section summarizes and concludes this paper.

Regional medical inter-institutional cooperation

Regional medical inter-institutional cooperation is one of the characteristics of the Japanese healthcare system. The conceptual diagram of the regional medical inter-institutional cooperation for femoral neck fracture is shown in Fig 1. We consider that the medical provider network represents the cooperation among medical providers.

Fig 1

Conceptual diagram of regional inter-institutional medical cooperation for treating femoral neck fracture.

The red arrows represent the flow of patients. The blue arrows represent the service provision relationship. Patients with femoral neck fractures are hospitalized in the acute and recovery phase hospitals. Community hospitals and nursing homes provide long-term care (LTC) after discharge. The administration provides social services to maintain and improve the health of patients before and after a fracture.

Patients with femoral neck fractures usually undergo surgery in an acute phase hospital after the fracture, then transition to rehabilitation in a recovery phase hospital, and after discharge receive care either at home through an outpatient care hospital or at a nursing home [15-17]. The quality of surgery and rehabilitation in acute and recovery phase hospitals affects recovery. The quality of long-term care in community hospitals and nursing homes also affects recovery. We expect that effective cooperation among these medical providers will improve the quality of care. In addition, we must consider the social factors that maintain people’s health status. The impact of various social services on health was investigated in [18]. Patients with femoral neck fractures require social service support before and after the fracture. We examined the factors associated with mortality [19-23], recovery [24], and quality of life [25] for hip fractures, including femoral neck fractures. As described below, each factor is related to one of the actors presented in Fig 1: the patients, medical providers (acute phase hospitals, recovery phase hospitals, and community hospitals), nursing homes, or administration. Firstly, there are patient-specific factors such as age and sex. The older the patient, the higher the risk of a femoral neck fracture. Men are more likely to die than women, and women are more likely to recover than men. Secondly, there are health factors related to the patients’ physical condition and health status. 
Low body weight, poor nutritional status, and diseases such as renal and cardiovascular diseases may increase the risk of fractures in patients. Thirdly, there are environmental factors such as residential conditions. Patients residing in nursing homes were more likely to die than those residing at home. Fourthly, there are social service factors. Social services improve patient health and the environment. Fifthly, there are medical care factors, such as the quality of surgery and rehabilitation. Sixthly, there are medical cooperation factors. We consider the time from fracture to surgery and the duration of hospital stay as medical cooperation factors because effective cooperation can reduce them. Based on the above, the quality of healthcare provided to patients Q is determined by the following factors: patient-specific factor P, health factor H, environmental factor E, social service factor S, medical care factor M, and medical cooperation factor C. Thus, we obtain the following functional form:

$$Q = f(P, H, E, S, M, C) \tag{1}$$

This study examines the contribution of the medical cooperation factor C through an analysis of medical provider networks.

Data and methods

We analyzed the patient claims data, constructed a medical provider network, and calculated the duration of hospital stay to model the relationship between medical inter-institutional cooperation and the quality of healthcare provision. We considered statistical models to examine the relationship. We also used community analysis to examine the relationship between medical cooperation and network structure.

Data

We analyzed anonymized individual-level patient claims data for residents of a prefecture in Japan. The data included inpatient and outpatient records related to femoral neck fractures from January 1, 2014, to December 31, 2019. The data cover 9,496 patients and 863 medical providers (Table 1). Approximately 94% of the medical records were from medical providers within the prefecture. The data items for inpatient claims include the patient ID, medical provider ID, and dates of admission, discharge, and surgery. The data items for outpatient claims include the patient ID, medical provider ID, and visit date.

Table 1

Summary of patient claims data.

	Number of records	Number of patients	Number of medical providers	Percentage of medical records in the prefecture
Inpatient claims data	13193	8775	411	94.2%
Outpatient claims data	153035	7187	692	93.8%
Total	166228	9496	863	93.8%

As there is an overlap in the number of patients and medical providers in the inpatient and outpatient claims data, the total values do not equal the sum of both.

Medical provider network

We constructed a medical provider network using the patient claims data. First, we constructed a patient–medical provider bipartite graph. The ij-component $B_{ij}$ of the bi-adjacency matrix ($N_m \times N_p$) of this bipartite graph indicates whether medical provider i provided inpatient or outpatient care to patient j, where $N_p$ and $N_m$ denote the number of patients and medical providers, respectively. $B_{ij} = 1$ even when patient j uses medical provider i multiple times. The ith row of the bi-adjacency matrix, $\mathbf{b}_i$, is therefore a binary vector representing the patients who use medical provider i. We then projected this bipartite graph onto the medical provider network. With self-loops removed, the ij-component $A_{ij}$ of the adjacency matrix ($N_m \times N_m$) is as follows:

$$A_{ij} = \begin{cases} w_{ij} & (i \neq j) \\ 0 & (i = j) \end{cases} \tag{2}$$

The weight $w_{ij}$ between medical providers i and j is the cosine similarity between the vectors $\mathbf{b}_i$ and $\mathbf{b}_j$ that represent the patients using medical providers i and j, respectively. The graph is undirected because this definition of the weights is symmetric:

$$w_{ij} = \frac{\mathbf{b}_i \cdot \mathbf{b}_j}{\|\mathbf{b}_i\| \, \|\mathbf{b}_j\|} = w_{ji} \tag{3}$$

In many previous studies, the weights were the number of patients shared among medical providers: $w_{ij} = \mathbf{b}_i \cdot \mathbf{b}_j$. However, we used cosine similarity for the weights to normalize the effect of the size of medical providers.
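The projection described above can be illustrated with a minimal sketch. It assumes the claims have already been reduced to (patient, provider) pairs; the function name `project_providers` and the toy IDs are hypothetical and not taken from the paper. For binary vectors, the cosine similarity reduces to the number of shared patients divided by the square root of the product of the two providers' patient counts.

```python
import math
from itertools import combinations

def project_providers(visits):
    """Project (patient, provider) pairs onto a weighted provider network.

    Returns a dict mapping an undirected edge (a, b), with a < b, to the
    cosine similarity of the two providers' binary patient vectors.
    """
    patients_of = {}
    for patient, provider in visits:
        # A set keeps B_ij = 1 even if the patient uses the provider repeatedly.
        patients_of.setdefault(provider, set()).add(patient)

    weights = {}
    for a, b in combinations(sorted(patients_of), 2):
        shared = len(patients_of[a] & patients_of[b])
        if shared:  # no shared patients means no edge
            norm = math.sqrt(len(patients_of[a]) * len(patients_of[b]))
            weights[(a, b)] = shared / norm
    return weights

# Toy data (hypothetical): patient p2 visits provider A twice.
visits = [("p1", "A"), ("p1", "B"), ("p2", "A"), ("p2", "B"),
          ("p2", "A"), ("p3", "A"), ("p3", "C"), ("p4", "C")]
weights = project_providers(visits)
for edge, w in sorted(weights.items()):
    print(edge, round(w, 3))
```

In this toy example, providers B and C share no patients, so no edge is created between them, while the A-B weight is larger than the A-C weight because A and B share two of their patients.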

Network features

We calculated network features representing the structural characteristics of the medical provider network: the average degree, strength, distance, clustering coefficient, and assortativity. The degree $k_i$ is the number of edges of node i. The average degree $\langle k \rangle$ is the mean of the degrees of all nodes: $\langle k \rangle = \sum_i k_i / N = 2L/N$, where N denotes the number of nodes and L denotes the number of edges. $\langle k \rangle$ represents the average number of medical providers with which a medical provider shares patients. The strength $s_i$ is the sum of the weights of the edges of node i: $s_i = \sum_j w_{ij}$. The average strength $\langle s \rangle$ is the mean of the strengths of all nodes: $\langle s \rangle = \sum_i s_i / N$. The weight $w_{ij}$ is normalized for the effect of the size of the medical provider, as defined in Eq (3). We regard the strength as the level of cooperation of each medical provider with its neighboring medical providers. The distance $d_{ij}$ is the number of edges on the shortest path connecting nodes i and j. The average distance $\langle d \rangle$ is the mean of the distances between all pairs of nodes: $\langle d \rangle = \sum_{i \neq j} d_{ij} / N(N - 1)$. The clustering coefficient $C_i$ is the fraction of possible edges among the neighbors of node i that actually exist: $C_i = 2 l_i / k_i (k_i - 1)$, where $l_i$ is the number of edges between neighboring nodes of i. The average clustering coefficient is $\langle C \rangle = \sum_i C_i / N$. Assortativity r is the Pearson correlation coefficient of the degrees of the nodes at both ends of an edge, $r = \sum (k_i k_j - \langle k \rangle^2) / \sum (k_i - \langle k \rangle)^2$, where i and j are the nodes at the two ends of an edge. Networks with r > 0 are degree-assortative, and networks with r < 0 are disassortative.
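As a rough illustration of these definitions, the following sketch computes the average degree, strength, clustering coefficient, and distance for a small weighted graph. It assumes a connected, undirected graph given as an edge-to-weight dictionary; the function name `network_features` and the toy graph are hypothetical. Assortativity is omitted to keep the sketch short.

```python
from collections import deque

def network_features(weights):
    """weights: dict mapping an undirected edge (i, j) to its weight w_ij.

    Assumes the graph is connected (as for a largest connected component).
    """
    adj = {}
    for (i, j), w in weights.items():
        adj.setdefault(i, {})[j] = w
        adj.setdefault(j, {})[i] = w
    n = len(adj)
    degree = {v: len(nb) for v, nb in adj.items()}
    strength = {v: sum(nb.values()) for v, nb in adj.items()}

    def clustering(v):
        nbrs = list(adj[v])
        k = len(nbrs)
        if k < 2:
            return 0.0  # convention: undefined C_i is treated as 0
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nbrs[b] in adj[nbrs[a]])
        return 2 * links / (k * (k - 1))

    def distances_from(src):
        # Breadth-first search gives shortest path lengths in edge counts.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        return dist

    total_dist = sum(d for v in adj for d in distances_from(v).values())
    return {
        "avg_degree": sum(degree.values()) / n,
        "avg_strength": sum(strength.values()) / n,
        "avg_clustering": sum(clustering(v) for v in adj) / n,
        "avg_distance": total_dist / (n * (n - 1)),
    }

# Toy graph (hypothetical): a triangle A-B-C with a pendant node D on C.
toy = {("A", "B"): 0.5, ("B", "C"): 0.5, ("A", "C"): 1.0, ("C", "D"): 0.2}
features = network_features(toy)
print(features)
```

Treating the clustering coefficient of nodes with fewer than two neighbors as zero is one common convention; other conventions exclude such nodes from the average.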

Statistical model

The medical provider network represents cooperation among medical providers. We consider a statistical model with the features extracted from this network as input variables and the quality of healthcare provision as the output variable. To extract information about medical cooperation from the medical provider network, we use two kinds of input variables: strength and the feature representation obtained by node2vec network embedding. Strength is an explicit feature of cooperation. In contrast, the feature representation is obtained by embedding the network structure in a feature space. The duration of hospital stay was used as the output variable.

Duration of hospital stay

We calculated the duration of hospital stay Din using the patient claims data. Firstly, we extracted a series of medical records from inpatient and outpatient claims data, as shown in Fig 2(a). We set two types of thresholds, τ1 and τ2, for the extraction. The threshold of hospital transfer τ1 determines the inpatient transfer. If a patient is admitted to another hospital within τ1 days of the date of discharge from one hospital, the patient transfer is regarded as related to the same fracture. The threshold of care continuity τ2 was used to integrate the pre- and post-hospitalization outpatient records into a series of medical records. We considered an outpatient record within τ2 days of the date of admission, discharge, or visit as outpatient care related to the same fracture. Using τ1 and τ2, we extracted each series of medical records related to a single fracture in a patient. We set τ1 = 10 days and τ2 = 35 days. We determined these thresholds from the distributions of inpatient and outpatient intervals shown in S1 Fig.

Fig 2

Methods for calculating duration of hospital stay Din.

(a) Extracting a series of medical records using two types of threshold, τ1 and τ2. The threshold of hospital transfer τ1 determines the inpatient transfer. The threshold of care continuity τ2 is used to integrate pre- and post-hospitalization outpatient records into the series of medical records. (b) Calculating the duration of hospital stay Din from each series of medical records. For medical records with hospital transfers, Din is the period from the date of admission at the first medical provider to the date of discharge at the last medical provider. H denotes the set of medical providers where a patient is hospitalized.

Secondly, we calculated the duration of hospital stay Din for each series of medical records. A series of medical records may or may not include hospital transfers. Without a hospital transfer, the patient spends both the acute-phase and recovery-phase hospitalization at a single medical provider, and the duration of hospital stay at that medical provider is Din. With a hospital transfer, the patient spends the acute-phase and recovery-phase hospitalization at different medical providers, and Din is the duration from the date of admission to the first medical provider to the date of discharge from the last medical provider. H denotes the set of medical providers where a patient is hospitalized in a series of medical records.
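The inpatient part of this procedure can be sketched as follows. This is a simplified illustration under stated assumptions: it links admissions of one patient using only the transfer threshold τ1, ignores the outpatient threshold τ2, and counts Din as the plain difference in days between first admission and last discharge (the paper does not specify whether the count is inclusive). The function name `hospital_stays` and the records are hypothetical.

```python
from datetime import date

TAU1_DAYS = 10  # hospital-transfer threshold tau_1 given in the text

def hospital_stays(admissions, tau1=TAU1_DAYS):
    """Group one patient's admissions into series and compute D_in per series.

    admissions: list of (provider_id, admit_date, discharge_date).
    Returns a list of (providers H, D_in in days), one entry per series.
    """
    series = []
    for provider, admit, discharge in sorted(admissions, key=lambda r: r[1]):
        # Link to the previous series if admitted within tau1 days of discharge.
        if series and (admit - series[-1]["end"]).days <= tau1:
            series[-1]["providers"].append(provider)
            series[-1]["end"] = max(series[-1]["end"], discharge)
        else:
            series.append({"providers": [provider], "start": admit, "end": discharge})
    return [(s["providers"], (s["end"] - s["start"]).days) for s in series]

# Hypothetical records: acute hospital A, transfer to recovery hospital B,
# and an unrelated admission to C months later.
records = [
    ("A", date(2015, 3, 1), date(2015, 3, 20)),
    ("B", date(2015, 3, 22), date(2015, 5, 10)),
    ("C", date(2016, 1, 5), date(2016, 1, 15)),
]
stays = hospital_stays(records)
print(stays)
```

Here the admission to B follows the discharge from A by two days, so both belong to one series with H = [A, B], whereas the admission to C starts a new series.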

Network embedding

The medical cooperation represented by the medical provider network contains various aspects of information that contribute to the quality of healthcare provision. General network features, such as strength, represent only one aspect of the network. Thus, we used node2vec for feature engineering to represent the relationships among nodes in the medical provider network as features of medical cooperation. node2vec embeds the network structure into a feature space and obtains a feature representation of each node using node sampling and the skip-gram model [26]. Let G = (V, E) denote the given network. node2vec samples nodes using a second-order random walk controlled by two parameters, p and q. The transition probability of a random walker from the (i − 1)th node $c_{i-1} = v$ to the ith node $c_i = x$ is

$$P(c_i = x \mid c_{i-1} = v) = \begin{cases} \pi_{vx} / Z & \text{if } (v, x) \in E \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Here, Z is the normalizing constant and $\pi_{vx}$ is the unnormalized transition probability. We introduce a search bias $\alpha_{pq}(u, x)$ to determine the probability $\pi_{vx}$ that a random walker that has moved from node u to node v then moves from node v to node x:

$$\alpha_{pq}(u, x) = \begin{cases} 1/p & \text{if } d_{ux} = 0 \\ 1 & \text{if } d_{ux} = 1 \\ 1/q & \text{if } d_{ux} = 2 \end{cases} \tag{5}$$

Here, $d_{ux}$ denotes the shortest path length between nodes u and x. The unnormalized transition probability is then $\pi_{vx} = \alpha_{pq}(u, x) \cdot w_{vx}$. The parameters p and q controlling this second-order random walk are called the return parameter and the in-out parameter, respectively. The return parameter p controls the likelihood that the random walker moves from u to v and then back to u. The in-out parameter q controls whether the random walker searches inward or outward. Depending on these parameters, node sampling by the random walker becomes either a depth-first sampling (DFS)-like strategy or a breadth-first sampling (BFS)-like strategy. Because BFS-like sampling reflects the structural equivalence of nodes and DFS-like sampling reflects homophily in the feature representation, we can obtain a desirable feature representation by adjusting p and q to appropriate values. 
The length of the random walk l and the number of walkers per node t also need to be set. These parameters determine the number of samples for the subsequent feature learning process. We used a skip-gram model for feature learning. This method obtains a feature representation of each node by learning a model that predicts the neighboring nodes N(u) of node u. Here, neighboring nodes are the nodes that appear before and after node u in the node sequences obtained from the node sampling. The w nodes before and after node u were used as N(u), where w is the window size. Let one of the neighboring nodes of node u be v ∈ N(u). Given the one-hot vector (1 × |V| vector) corresponding to the input node u, the probability of outputting the one-hot vector (|V| × 1 vector) corresponding to the output node v is calculated using a softmax function, given as

$$P(v \mid u) = \frac{\exp(\mathbf{f}_v \cdot \mathbf{f}_u)}{\sum_{x \in V} \exp(\mathbf{f}_x \cdot \mathbf{f}_u)} \tag{6}$$

Here, $\mathbf{f}_u$ and $\mathbf{f}_v$ (1 × d vectors) are the feature representations of nodes u and v, respectively. The dimension of the feature representation d is given as a parameter. We assume conditional independence among the nodes in N(u):

$$P(N(u) \mid u) = \prod_{v \in N(u)} P(v \mid u) \tag{7}$$

When the input vector is given for all nodes in V, we determine the feature representation of node u so as to maximize the likelihood of outputting the neighboring nodes N(u). Feature learning is the following log-likelihood maximization problem:

$$\max_{\mathbf{f}} \sum_{u \in V} \log P(N(u) \mid u) \tag{8}$$

Using Eqs (6) and (7), we represent Eq (8) as follows:

$$\max_{\mathbf{f}} \sum_{u \in V} \Big[ -|N(u)| \log Z_u + \sum_{v \in N(u)} \mathbf{f}_v \cdot \mathbf{f}_u \Big] \tag{9}$$

Here, $Z_u = \sum_{x \in V} \exp(\mathbf{f}_x \cdot \mathbf{f}_u)$. We obtained the feature representations by optimizing Eq (9) using the stochastic gradient ascent method. The hyperparameters in node2vec that need to be set are t, l, w, d, p, and q.
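The second-order random walk used for node sampling can be sketched as below. This is an illustrative sketch only, not the reference node2vec implementation: the graph is assumed to be an undirected adjacency dictionary in which every node has at least one neighbor, and the names `search_bias`, `transition_probs`, and `biased_walk` are hypothetical. The skip-gram training step is not shown.

```python
import random

def search_bias(adj, prev, nxt, p, q):
    """Search bias for moving to `nxt` when the walker arrived from `prev`."""
    if nxt == prev:        # shortest path length 0: return to the previous node
        return 1.0 / p
    if nxt in adj[prev]:   # shortest path length 1: stay near the previous node
        return 1.0
    return 1.0 / q         # shortest path length 2: move outward

def transition_probs(adj, prev, cur, p, q):
    """Normalized transition probabilities from `cur`, given the previous node."""
    unnorm = {x: search_bias(adj, prev, x, p, q) * w for x, w in adj[cur].items()}
    z = sum(unnorm.values())
    return {x: v / z for x, v in unnorm.items()}

def biased_walk(adj, start, length, p, q, rng):
    """Sample one walk of `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if len(walk) == 1:
            # First step has no previous node: use the edge weights directly.
            nbrs = list(adj[cur])
            wts = [adj[cur][x] for x in nbrs]
        else:
            probs = transition_probs(adj, walk[-2], cur, p, q)
            nbrs = list(probs)
            wts = list(probs.values())
        walk.append(rng.choices(nbrs, weights=wts, k=1)[0])
    return walk

# Toy graph (hypothetical), unit weights: triangle A-B-C with pendant D on B.
adj = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0, "D": 1.0},
    "C": {"A": 1.0, "B": 1.0},
    "D": {"B": 1.0},
}
probs = transition_probs(adj, prev="A", cur="B", p=2.0, q=0.5)
walk = biased_walk(adj, "A", length=5, p=2.0, q=0.5, rng=random.Random(0))
print(probs, walk)
```

With p = 2 and q = 0.5, a walker that arrived at B from A is most likely to continue outward to D, less likely to move to the common neighbor C, and least likely to return to A.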

Regression model

As shown in Fig 2(b), in case i, the patient is hospitalized at the medical providers included in $H_i$. Let $\mu_i$ denote the level of cooperation among the medical providers included in $H_i$. Examining the relationship between the level of medical cooperation and the duration of hospital stay at the case level is a regression problem, which is estimated using the following function f:

$$D^{\mathrm{in}}_i = f(\mu_i) + \varepsilon_i$$

where $\varepsilon_i$ denotes the residual error. This model focuses only on the contribution of the medical cooperation factor C, with Q = Din in Eq (1). The predicted value $\hat{D}^{\mathrm{in}}_i = f(\mu_i)$ is the value calculated by the function f for the input variable $\mu_i$. When the strength $s_j$ is used as the network feature representing the level of medical cooperation of medical provider j, $\mu_i$ is defined as the geometric mean of $s_j$ over the medical providers included in $H_i$:

$$\mu_i(s) = \Big( \prod_{j \in H_i} s_j \Big)^{1/n_i}$$

Here, $n_i$ is the number of nodes in $H_i$, that is, the number of medical providers to which the patient was admitted in case i. In this case, $\mu_i$ is a one-dimensional input variable. In contrast, when we use the feature representation $\mathbf{f}_j$, we define $\boldsymbol{\mu}_i$ as the mean of $\mathbf{f}_j$ over the medical providers included in $H_i$:

$$\boldsymbol{\mu}_i(\mathbf{f}) = \frac{1}{n_i} \sum_{j \in H_i} \mathbf{f}_j$$

In this case, $\boldsymbol{\mu}_i$ is a d-dimensional input variable. For the model with $\boldsymbol{\mu}_i$ defined by the node2vec feature representation as the input variables, we used a linear regression model and a regression tree ensemble model. The regression tree ensemble model fits the function f using an ensemble of regression trees. The algorithms for the regression tree ensemble model are based on random forest [27, 28] and least-squares boosting (LSBoost) [29]. We used Bayesian optimization to select the ensemble method (random forest or LSBoost) and its hyperparameters [30-32]. Given n pairs of the output variable y and input variables $\mathbf{x}$, the regression tree T fits y with the terminal nodes $R_i$ (i = 1, 2, …, I), which are the nodes at the end of tree splitting, as follows:

$$T(\mathbf{x}) = \sum_{i=1}^{I} \gamma_i \, \mathbb{1}(\mathbf{x} \in R_i)$$

Here, $\gamma_i$ is a constant determined for each terminal node $R_i$. 
We repeated the following process to obtain T until the minimum number of samples at a terminal node nmin or the maximum number of splits smax was reached: (1) randomly select i variables from the input variables; (2) choose the best variable and split point among these i variables; (3) split the node into two sub-nodes. Ensemble learning uses an ensemble of the trees T obtained by the above process as the learner. Here, let the output variable be y = Din and the input variables be $\mathbf{x} = \boldsymbol{\mu}$. We obtained M bootstrap replicas $Z^*_m$ (m = 1, 2, …, M), each by randomly drawing n samples with replacement from the n samples. A tree $T_m$ is trained on each $Z^*_m$. The random forest algorithm uses the average of the M trees $T_m$ as the ensemble learner:

$$f(\mathbf{x}) = \frac{1}{M} \sum_{m=1}^{M} T_m(\mathbf{x})$$

LSBoost takes the mean squared error as the loss function L and adds each new learner to all the previously trained learners so as to minimize L at each step:

$$L = \frac{1}{n} \sum_{k=1}^{n} \big( y_k - f(\mathbf{x}_k) \big)^2$$

The detailed procedure is as follows. First, we initialized $f_0(\mathbf{x})$ to

$$f_0(\mathbf{x}) = \arg\min_{\gamma} \sum_{k=1}^{n} (y_k - \gamma)^2$$

This represents a constant model with only one leaf. We then performed the following steps for m = 1 to M. We fitted the regression tree $T_m$ to the residuals $r_{km} = y_k - f_{m-1}(\mathbf{x}_k)$. We calculated the following equation using the terminal nodes $R_{jm}$ (j = 1, 2, …, J) of $T_m$:

$$\gamma_{jm} = \arg\min_{\gamma} \sum_{\mathbf{x}_k \in R_{jm}} \big( y_k - f_{m-1}(\mathbf{x}_k) - \gamma \big)^2$$

Using this $\gamma_{jm}$, we updated f as shown in the equation below:

$$f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + \eta \sum_{j=1}^{J} \gamma_{jm} \, \mathbb{1}(\mathbf{x} \in R_{jm})$$

Here, η is a parameter that controls the learning rate for each step, with 0 < η ≤ 1. We obtain the regression function $f(\mathbf{x}) = f_M(\mathbf{x})$ by repeating this step M times. Using this function, we output the predicted value $\hat{D}^{\mathrm{in}}$.
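The least-squares boosting loop described above can be sketched in a few lines. This is a deliberately minimal illustration, not the implementation used in the study: each tree is restricted to a single split (a regression stump with two terminal nodes), no bootstrap or Bayesian optimization is involved, and the data are hypothetical. The names `fit_stump` and `ls_boost` are illustrative, and the inputs are assumed to contain at least two distinct values in some feature.

```python
def fit_stump(xs, rs):
    """Best single-split regression tree (two terminal nodes) for residuals rs."""
    best = None
    for k in range(len(xs[0])):
        values = sorted(set(x[k] for x in xs))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for x, r in zip(xs, rs) if x[k] <= thr]
            right = [r for x, r in zip(xs, rs) if x[k] > thr]
            # For squared error, the optimal leaf constant is the residual mean.
            g_left, g_right = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - g_left) ** 2 for r in left)
                   + sum((r - g_right) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, k, thr, g_left, g_right)
    return best[1:]

def ls_boost(xs, ys, n_rounds=50, eta=0.1):
    """Return a predict(x) function fitted by least-squares boosting."""
    f0 = sum(ys) / len(ys)  # constant model minimizing the squared error
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - f for y, f in zip(ys, preds)]
        k, thr, g_left, g_right = fit_stump(xs, residuals)
        stumps.append((k, thr, g_left, g_right))
        preds = [f + eta * (g_left if x[k] <= thr else g_right)
                 for x, f in zip(xs, preds)]

    def predict(x):
        return f0 + eta * sum(gl if x[k] <= thr else gr
                              for k, thr, gl, gr in stumps)
    return predict

# Hypothetical data: one input feature, stay in days.
xs = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
ys = [80.0, 78.0, 82.0, 40.0, 42.0, 38.0]
predict = ls_boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_const = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boost = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_const, mse_boost)
```

Because each stump is fitted to the current residuals and added with a learning rate η ≤ 1, the training error never increases from one round to the next; a smaller η trades slower convergence for smoother fits.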

Considered models

We investigated four models using these methods:

Model 1. Output variable: duration of hospital stay Din; input variable: geometric mean of strength μ(s); regression model: linear regression.
Model 2. Output variable: duration of hospital stay Din; input variables: geometric mean of strength μ(s) and age at time of admission; regression model: linear regression.
Model 3. Output variable: duration of hospital stay Din; input variable: mean of the feature representation using node2vec μ(f); regression model: linear regression.
Model 4. Output variable: duration of hospital stay Din; input variable: mean of the feature representation using node2vec μ(f); regression model: regression tree ensemble.

Model 1 explains the duration of hospital stay, Din, using the geometric mean of strength μ(s), which represents the explicit level of medical cooperation. Model 2 checks whether medical cooperation still contributes when the age at admission is added as an input variable to Model 1. Models 3 and 4 use the mean of the feature representation using node2vec, μ(f). The feature representation f maps the relationships among medical providers in the network representing their cooperation into the feature space. Because the input variables are extended to d dimensions, we expected a higher explanatory ability for the contribution of medical cooperation to the quality of healthcare provision than with the one-dimensional s. Furthermore, Model 4 uses ensemble learning as the statistical model to examine the extent to which the medical cooperation factor can explain the variation in the duration of hospital stay Din.
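The two case-level input variables used by these models can be computed as in the following sketch. The provider IDs, strengths, and two-dimensional embeddings are hypothetical, and the function names are illustrative; strengths are assumed to be strictly positive, which holds for nodes in a connected component.

```python
import math

def geometric_mean_strength(strengths, providers):
    """Geometric mean of the strengths of the providers in one case."""
    logs = [math.log(strengths[j]) for j in providers]
    return math.exp(sum(logs) / len(logs))

def mean_embedding(embeddings, providers):
    """Element-wise mean of the d-dimensional vectors of the providers in one case."""
    dim = len(embeddings[providers[0]])
    return [sum(embeddings[j][k] for j in providers) / len(providers)
            for k in range(dim)]

# Hypothetical case: the patient was hospitalized at providers A and B.
strengths = {"A": 4.0, "B": 1.0}
embeddings = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
case_providers = ["A", "B"]

mu_s = geometric_mean_strength(strengths, case_providers)
mu_f = mean_embedding(embeddings, case_providers)
print(mu_s, mu_f)
```

For a case without a hospital transfer, both functions simply return the value of the single provider, so the same code covers cases with and without transfers.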

Community analysis

We used community analysis to divide the network into densely connected sub-networks and to further investigate regional medical inter-institutional cooperation in the prefecture. We used Infomap for community analysis. Infomap detects communities by optimizing the map equation as an evaluation function [33-35]. It seeks an efficient encoding of the trajectories of random walkers in a network, exploiting the fact that, under the ideal code, a random walker stays within a community for a long time. The ideal coding procedure is as follows: (1) assign one code to each community; (2) assign one code to each node within each community; (3) assign an exit code that is used when the walker leaves a community. The community structure is obtained from the code assigned to each community. The optimal code can be obtained by minimizing the map equation:

$$L(\mathsf{M}) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{n_c} p^{i}_{\circlearrowright} H(\mathcal{P}^{i}) \tag{20}$$

where $n_c$ denotes the number of communities, $q_{\curvearrowright}$ is the per-step probability that the walker moves between communities, $H(\mathcal{Q})$ is the entropy of the community codes, $p^{i}_{\circlearrowright}$ is the rate at which the codes of community i (node codes and exit code) are used, and $H(\mathcal{P}^{i})$ is the entropy of the codes within community i. The first term of Eq (20) corresponds to the average number of bits needed to describe the movement between communities. The second term of Eq (20) represents the average number of bits needed to describe the movement within communities. $L(\mathsf{M})$ takes a specific value depending on the partitioned community structure $\mathsf{M}$. To determine the optimal partition, we must minimize $L(\mathsf{M})$ over all partitions. Louvain’s algorithm [36] was used for efficient optimization. We first assigned each node to a separate community. If $L(\mathsf{M})$ decreases when neighboring nodes are joined, we join the nodes. By repeating this process, we can efficiently perform community detection with Infomap.
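To make the objective concrete, the sketch below evaluates the two-level map equation for a given partition of an undirected weighted network. It only computes the description length and does not perform the Louvain-style search; it assumes no teleportation, so node visit rates are proportional to node strength. The function names and the toy graph are hypothetical.

```python
import math

def entropy_bits(rates):
    """Shannon entropy (bits) of the distribution obtained by normalizing rates."""
    total = sum(rates)
    return -sum((x / total) * math.log2(x / total) for x in rates if x > 0)

def map_equation(weights, community_of):
    """Two-level map equation in bits for an undirected weighted network."""
    strength = {}
    for (a, b), w in weights.items():
        strength[a] = strength.get(a, 0.0) + w
        strength[b] = strength.get(b, 0.0) + w
    two_w = sum(strength.values())
    visit = {v: s / two_w for v, s in strength.items()}  # stationary visit rates

    # Exit rate of each community: flow along edges that cross its boundary.
    exit_rate = {}
    for (a, b), w in weights.items():
        ca, cb = community_of[a], community_of[b]
        if ca != cb:
            exit_rate[ca] = exit_rate.get(ca, 0.0) + w / two_w
            exit_rate[cb] = exit_rate.get(cb, 0.0) + w / two_w

    q_total = sum(exit_rate.values())
    # First term: movements between communities.
    length = q_total * entropy_bits(list(exit_rate.values())) if q_total > 0 else 0.0
    # Second term: movements within each community, including the exit code.
    for c in set(community_of.values()):
        members = [visit[v] for v in visit if community_of[v] == c]
        q_c = exit_rate.get(c, 0.0)
        length += (q_c + sum(members)) * entropy_bits([q_c] + members)
    return length

# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0, (2, 3): 1.0}
one_community = {v: 0 for v in range(6)}
two_communities = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
length_one = map_equation(edges, one_community)
length_two = map_equation(edges, two_communities)
print(length_one, length_two)
```

For this toy graph, splitting the two triangles into separate communities gives a shorter description length than keeping all nodes in one community, which is the kind of comparison the optimization performs when deciding whether to join nodes.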

Results

Firstly, we present the basic features of the constructed network. We then describe the results of the linear regression model with strength as the input variable, followed by those of the linear regression model and the regression tree ensemble model with the feature representation using node2vec as the input variable. Finally, we examine the relationship between medical cooperation and network structure using community analysis.

Constructed network

We analyzed the maximum connected component of the medical provider network. The prefecture is divided into five medical administration areas, each of which is designed to provide general inpatient care. The network diagram in Fig 3 shows that the medical providers in each medical administration area tend to cooperate with one another and are located close together in the network. Note that we classified out-of-prefecture medical providers used by patients in the prefecture as "others". In addition, the nodes in Fig 3 are sized according to the number of beds, which shows that the larger medical providers in the prefecture occupy the central positions in the network.

Fig 3

Network diagram of the medical provider network.

Number of nodes (medical providers) N = 644, number of edges L = 2756. The prefecture is divided into five medical administration areas, and the nodes are color-coded according to each area. The node size is based on the number of beds in the medical provider. The nodes located in the same area are close to each other on the network.

Table 2 lists the network features, and Fig 4(a) and 4(b) show the cumulative distributions of degree and strength. The network features reveal the structural characteristics of the medical provider network. The average clustering coefficient is 0.535, which indicates that the medical providers tend to form clusters with each other. The assortativity is negative, which implies that the network is disassortative. Together, these results indicate that the central hospital in an area serves as a hub for medical cooperation and shares patients with smaller medical providers in the periphery of the network.

Table 2

Network features of medical providers network.

Network feature	Calculated value
Number of nodes	644
Number of edges	2756
Average degree	8.3
Average strength	1.3
Average distance	3.09
Average clustering coefficient	0.535
Assortativity	-0.258
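
The features in Table 2 can be illustrated on a toy graph. The following is a minimal sketch, assuming an unweighted, undirected graph and the usual definitions (local clustering averaged over all nodes, and degree assortativity as the Pearson correlation of the degrees at the two ends of each edge). The function names and the toy edge list are ours and are not taken from the study.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def degree_assortativity(adj):
    """Pearson correlation of end-point degrees over all edges.

    Each undirected edge is counted in both directions. The value is
    undefined (division by zero) if every node has the same degree.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical hub-and-spoke example: one hub hospital "H" linked to four
# smaller providers, two of which ("A" and "B") also share patients.
toy = build_adjacency([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")])
print(average_degree(toy), average_clustering(toy), degree_assortativity(toy))
```

A hub connected mainly to low-degree providers yields a negative assortativity, which is the pattern the text interprets as a central hospital sharing patients with smaller peripheral providers.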

Fig 4

Empirical complementary cumulative distribution functions (CCDF) of degree and strength.

(a) Degree and (b) strength. Both are widely distributed. These figures show that the number of medical providers with which patients are shared and the level of medical cooperation with neighboring medical providers vary greatly among medical providers.
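
An empirical CCDF such as the one in Fig 4 can be computed directly from the list of node degrees or strengths. The sketch below assumes the convention P(X ≥ x) (some authors use P(X > x) instead); the function name and the sample values are hypothetical.

```python
from bisect import bisect_left


def empirical_ccdf(values):
    """Return (xs, ccdf) where ccdf[i] is the fraction of values >= xs[i]."""
    ordered = sorted(values)
    n = len(ordered)
    xs = sorted(set(ordered))
    # bisect_left gives the number of values strictly below x.
    ccdf = [(n - bisect_left(ordered, x)) / n for x in xs]
    return xs, ccdf


# Hypothetical degrees of five providers: most are small, one is a hub.
xs, ccdf = empirical_ccdf([1, 1, 2, 3, 10])
for x, p in zip(xs, ccdf):
    print(x, p)
```

Plotting `xs` against `ccdf` on logarithmic axes gives the kind of curve used to judge how widely degree and strength are distributed.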

Linear regression model using strength

We examined the relationship between medical cooperation and the duration of hospital stay Dⁱⁿ using a linear regression model. The strength of each medical provider was used as the value representing the level of medical cooperation (Models 1 and 2). From each series of medical records, we included cases of hospitalization with surgery at a medical provider. In addition, we restricted the target cases to those in which the patient was hospitalized at medical providers in the maximum connected component of the network. Some elderly patients do not undergo surgery because of their physical condition, even if their fracture requires hospitalization. Therefore, cases without surgery were excluded. Additionally, cases in which the duration of hospital stay Dⁱⁿ exceeded 400 days were excluded. Table 3 presents information on the target cases. The total number of cases was 2,332. The average age of patients at admission was 83.2 years, and approximately 80% of the patients were female. In approximately 80% of cases, patients were admitted to a single medical provider for the acute and recovery phases without transfer. In approximately 20% of cases, patients were transferred between two or more medical providers.

Table 3

Information of targeted cases.

Items		Value
Number of cases		2332
Age	Mean	83.2
Duration of hospital stay Dⁱⁿ	Mean	65.6
Sex	Male	495(21.2%)
Sex	Female	1837(78.8%)
Number of hospitalized medical providers	1	1898(81.4%)
	2	401(17.2%)
	≥3	33(1.4%)

The average duration of hospital stay Dⁱⁿ was 65.6 days. Fig 5(a) and 5(b) show the distribution of Dⁱⁿ. Both tails resemble a power-law distribution. For each case, we calculated μ(s) over the medical providers at which the patient was hospitalized from the strength s of the individual medical providers using Eq (11), and used μ(s) as an input variable. The distribution of the strength s is shown in Fig 4(b). In addition to μ(s), we also considered a model with age as an input variable, because age is related to the damage caused by fractures and the time required for recovery.
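
The case-level input μ(s) can be sketched as follows, assuming that Eq (11) is the ordinary geometric mean of the strengths of the providers used in one case (the tables label this quantity the geometric mean of strength). The function name and the example strengths are hypothetical.

```python
import math


def geometric_mean_strength(strengths):
    """Geometric mean of positive provider strengths for one case."""
    if not strengths:
        raise ValueError("a case must involve at least one medical provider")
    if any(s <= 0 for s in strengths):
        raise ValueError("strengths must be positive")
    return math.exp(sum(math.log(s) for s in strengths) / len(strengths))


# Hypothetical case: transfer between a weakly connected provider (s = 0.2)
# and a strongly connected one (s = 1.8).
case = [0.2, 1.8]
print(geometric_mean_strength(case))  # geometric mean
print(sum(case) / len(case))          # arithmetic mean, for comparison
```

Unlike the arithmetic mean, the geometric mean is pulled toward the weakest provider in the chain, so a single poorly connected provider lowers the case-level value noticeably.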

Fig 5

Distribution of duration of hospital stay Dⁱⁿ.

(a) Empirical cumulative distribution function (CDF). (b) Empirical complementary cumulative distribution function (CCDF). In the logarithmic plot, both tails are straight, resembling a power-law distribution.

Tables 4 and 5 show the results of the regression analysis for Models 1 and 2. In both analyses, the coefficient of μ(s) was negative and significant at the 1% level (t-test). That is, patients admitted to medical providers with a higher average level of cooperation had a shorter duration of hospital stay Dⁱⁿ. The same relationships can be observed in Fig 6(a) and 6(b). The explanatory ability of the statistical model improved when age was added to the input variables to account for the lengthening of Dⁱⁿ with age (R² = 0.0085 and 0.011 for Models 1 and 2, respectively). However, the coefficients of determination R² are low in both models, and the statistical models did not have a high explanatory ability for Dⁱⁿ. Fig 7 suggests that Models 1 and 2 are underfitting, indicating that the statistical models lack expressive ability. We also compared the results with those obtained using the arithmetic mean of the strength and confirmed that the geometric mean μ(s) has the better explanatory ability (S2 Fig).

Table 4

Regression analysis of duration of hospital stay Dⁱⁿ by geometric mean of strength μ(s).

	Coefficient	Std. error	t	p-value (t)
(Intercept)	78.02	2.96	26.36	<0.01
Geometric mean of strength μ_G(s)	-5.72	1.28	-4.46	<0.01
R²	0.0085	F	19.89
R² (5-fold CV)*	0.0075	p-value (F)	<0.01
Adjusted R²	0.0080

*R² calculated from the test data with five-fold cross-validation. We calculated it for comparison with models using ensemble learning.

Table 5

Regression analysis of duration of hospital stay Dⁱⁿ by geometric mean of strength μ(s) and age at admission.

	Coefficient	Std. error	t	p-value (t)
(Intercept)	56.48	8.87	6.37	<0.01
Geometric mean of strength μ_G(s)	-5.67	1.30	-4.43	<0.01
Age	0.26	0.10	2.58	0.01
R²	0.011	F	13.28
R² (5-fold CV)*	0.0097	p-value (F)	<0.01
Adjusted R²	0.010

*R² calculated from the test data with five-fold cross-validation. We calculated it for comparison with models using ensemble learning.
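
As a consistency check on Tables 4 and 5, the adjusted R² can be recomputed from the reported R², the number of cases (2,332, from Table 3), and the number of input variables. The sketch below assumes the standard adjustment formula 1 - (1 - R²)(n - 1)/(n - k - 1); the function and variable names are ours.

```python
def adjusted_r2(r2, n_cases, n_inputs):
    """Standard adjusted coefficient of determination."""
    return 1.0 - (1.0 - r2) * (n_cases - 1) / (n_cases - n_inputs - 1)


N_CASES = 2332  # number of target cases in Table 3

model1 = adjusted_r2(0.0085, N_CASES, 1)  # input: mu(s)
model2 = adjusted_r2(0.011, N_CASES, 2)   # inputs: mu(s) and age
print(f"Model 1: {model1:.4f}  Model 2: {model2:.4f}")
```

Because the R² values in the tables are already rounded, the recomputed values are expected to match the reported adjusted R² only up to the last digit.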

Fig 6

Result of linear regression model with duration of hospital stay Dⁱⁿ as output variable and geometric mean of strength μ(s) and age as input variables.

(a) Dⁱⁿ vs. μ(s) (R² = 0.0085). (b) Dⁱⁿ vs. μ(s) and age (R² = 0.011). These show the negative relationship between Dⁱⁿ and μ(s). In addition, the explanatory ability of the statistical model is increased by considering the effect of age on the duration of hospital stay.

Fig 7

Comparison of measured duration of hospital stay Dⁱⁿ and duration of hospital stay predicted by the linear regression model with geometric mean of strength μ(s) and age as input variables.

(a) μ(s). (b) μ(s) and age. The predicted value obtained by five-fold cross-validation is shown. Both are underfitting and do not fully explain the variation in Dⁱⁿ.

Linear regression model using feature representation

We performed regression using a statistical model with the node2vec feature representation of each medical provider as the input variable. The output variable was Dⁱⁿ, and the input variable was the mean of the feature representations of the medical providers included in H, as shown in Eq (12).

We determined the hyperparameters of node2vec as follows. As a pre-analysis, we set the default parameter values to (t, l, w, d, p, q) = (5, 25, 10, 70, 1, 1) and extracted feature representations while varying t, l, w, and d one at a time. We performed a regression analysis using the extracted feature representations and calculated the root mean squared error (RMSE) of the predicted values obtained from the regression analysis. The RMSE represents the fitting performance of the regression model. In the pre-analysis, we performed 20 iterations of five-fold cross-validation and used the average RMSE as the evaluation value of the regression performance. As a result, we adopted (t, l, w, d) = (7, 60, 13, 75), the values at which the mean RMSE was minimal for each parameter. The results of this pre-analysis are shown in S3 Fig.

After the pre-analysis, we performed a grid search for parameters p and q over p, q ∈ {1/8, 1/4, 1/2, 1, 2, 4, 8}. The evaluation value was the RMSE of the regression analysis with five-fold cross-validation, averaged over 100 iterations (Fig 8(a)). The optimal combination of parameters was (p, q) = (8, 1/4). The grid search showed that the regression performance tends to improve when p is large and q is small.

Fig 8

Result of grid search of hyperparameters p and q of node2vec.

(a) p, q ∈ {1/8, 1/4, 1/2, 1, 2, 4, 8}, number of iterations = 100. (b) p ∈ {4, 8, 16, 32, 64, 128} and q ∈ {1/128, 1/64, 1/32, 1/16, 1/8, 1/4}, number of iterations = 300. Here, we set (t, l, w, d) = (7, 60, 13, 75), as determined by the pre-analysis. We evaluated the regression performance with the changes in parameters p and q using the root mean squared error (RMSE). Although the variation in RMSE with the change in parameters is not large, the regression performance tends to improve when p is large and q is small. However, there is little change in RMSE in the range where p is sufficiently large and q is sufficiently small.

We expanded the search range to p ∈ {4, 8, 16, 32, 64, 128} and q ∈ {1/128, 1/64, 1/32, 1/16, 1/8, 1/4}, which includes the optimal parameters of the first grid search, and performed a second grid search. In addition, we increased the number of iterations to 300 to smooth the evaluation values. Fig 8(b) shows the results. We obtained (p, q) = (16, 1/32) as the optimal parameters, although the RMSE changes little where p is sufficiently large and q is sufficiently small. As shown in the surface plot in S4 Fig, the RMSE surface becomes flat as p increases and q decreases; it has no unique optimum and many local minima. The parameters p and q control the probability of selecting the next node x when the random walker has just moved from node u to node v, as shown in Eq (5). When p is large and q is small, the random walker tends not to return to node u and instead moves to a node x at distance 2 from u. This setting is a DFS-like (depth-first search) sampling strategy. These results indicate that the regression performance improves as the parameter settings approach DFS. However, once the parameters are sufficiently close to DFS, the performance no longer improves and becomes independent of p and q.
The regression analysis of Dⁱⁿ with the optimal parameters (t, l, w, d, p, q) = (7, 60, 13, 75, 16, 1/32) obtained from the grid search showed that the mean coefficient of determination R² was 0.10, and the highest value was 0.17. Fig 9 plots the predicted values against the measured values of Dⁱⁿ for the run with the highest R². Compared with the model with μ(s) as the input variable, the performance improved approximately ten-fold. The regression performance was degraded by the large variation in the predicted values around Dⁱⁿ = 0 and by underestimation in the range where Dⁱⁿ is large.
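
The effect of p and q on the random walk can be illustrated with a small sketch of the node2vec second-order transition rule, in which a candidate next node is weighted by 1/p if it is the previous node, by 1 if it is adjacent to the previous node, and by 1/q otherwise. The sketch assumes an unweighted toy graph (on a weighted network each bias would additionally be multiplied by the edge weight); the function name, node labels, and adjacency map are hypothetical.

```python
def node2vec_transition_probs(adj, prev, curr, p, q):
    """Probabilities of the next node x after a step prev -> curr."""
    weights = {}
    for x in adj[curr]:
        if x == prev:            # distance 0 from prev: return step
            weights[x] = 1.0 / p
        elif x in adj[prev]:     # distance 1 from prev: stay nearby
            weights[x] = 1.0
        else:                    # distance 2 from prev: move outward
            weights[x] = 1.0 / q
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Toy graph: the walker has just moved from "u" to "v".
# "a" neighbors both u and v; "b" neighbors only v.
toy_adj = {
    "u": {"v", "a"},
    "v": {"u", "a", "b"},
    "a": {"u", "v"},
    "b": {"v"},
}

uniform = node2vec_transition_probs(toy_adj, "u", "v", p=1, q=1)
dfs_like = node2vec_transition_probs(toy_adj, "u", "v", p=16, q=1 / 32)
print(uniform)
print(dfs_like)
```

With p = q = 1 the three candidates are equally likely, whereas with (p, q) = (16, 1/32) almost all probability goes to the node at distance 2, which is why this setting behaves like a depth-first exploration.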

Fig 9

Comparison of measured duration of hospital stay Dⁱⁿ and predicted duration of hospital stay calculated by linear regression model.

Predicted versus measured values for the linear regression model with the mean of the node2vec feature representations as the input variable. The best result, selected by comparing the coefficient of determination R² among all sets extracted through 300 iterations, is shown (R² = 0.17).

Regression tree ensemble model using feature representation

We performed an analysis using the regression tree ensemble model with the same feature representations used in the linear regression. The algorithms and hyperparameters of the regression tree ensemble model were optimized using Bayesian optimization. The mean coefficient of determination R² was 0.21, and the highest value was 0.24. Fig 11 plots the predicted values against the measured values of Dⁱⁿ for the run with the highest R². The performance improved approximately two-fold compared with the linear regression model (Model 3) and was approximately 20 times better than that of the model using μ(s) (Model 1). A comparison of Figs 9 and 11 shows that the tree ensemble model improves the predictions around Dⁱⁿ = 0. However, there is still a tendency to underestimate in the range where Dⁱⁿ is large.

Medical cooperation and network structure

Based on the results of the regression analysis, we found a relationship between the geometric mean of strength μ(s) among medical providers and the duration of hospital stay Dⁱⁿ. From this relationship, we adopted strength s as a value representing the level of cooperation among medical providers. To examine the average level of medical cooperation in each community, we identified communities and calculated μ(s) for each community. We used a two-level Infomap and detected 40 communities. Since 95% of the medical providers in the prefecture were in the seven largest communities, we compared μ(s) and other network features among these seven communities. Table 6 shows the results. We calculated μ(s) both for all medical providers belonging to a community and for only those in the prefecture. We found a disparity in medical cooperation among communities, which was more pronounced when we focused only on medical providers in the prefecture.

Table 6

Network features of communities.

		Community 1	Community 2	Community 3	Community 4
Number of nodes	Total	196	152	64	45
	In-prefecture	135(68.9%)	108(71.1%)	40(62.5%)	35(77.8%)
	Area 1	99(50.5%)	0(0.0%)	0(0.0%)	5(11.1%)
	Area 2	1(0.5%)	5(3.3%)	0(0.0%)	27(60.0%)
	Area 3	35(17.9%)	3(2.0%)	32(50.0%)	1(2.2%)
	Area 4	0(0.0%)	77(50.7%)	8(12.5%)	2(4.4%)
	Area 5	0(0.0%)	23(15.1%)	0(0.0%)	0(0.0%)
	Out-of-prefecture	61(31.1%)	44(28.9%)	24(37.5%)	10(22.2%)
Number of edges		848	606	143	89
Average degree		8.65	7.97	4.47	3.96
Average strength		0.99	1.01	0.79	0.54
Average distance		2.53	2.43	2.28	2.07
Average clustering coefficient		0.51	0.63	0.62	0.57
Assortativity		-0.40	-0.40	-0.57	-0.61
Geometric mean of strength		0.30	0.31	0.30	0.18
	In-prefecture	0.39	0.38	0.34	0.23
		Community 5	Community 6	Community 7
Number of nodes	Total	31	18	12
	In-prefecture	20(64.5%)	12(66.7%)	8(66.7%)
	Area 1	1(3.2%)	0(0.0%)	0(0.0%)
	Area 2	0(0.0%)	7(38.9%)	8(66.7%)
	Area 3	18(58.1%)	0(0.0%)	0(0.0%)
	Area 4	1(3.2%)	5(27.8%)	0(0.0%)
	Area 5	0(0.0%)	0(0.0%)	0(0.0%)
	Out-of-prefecture	11(35.5%)	6(33.3%)	4(33.3%)
Number of edges		42	21	14
Average degree		2.71	2.33	2.33
Average strength		0.50	0.95	0.50
Average distance		2.12	1.95	1.89
Average clustering coefficient		0.39	0.24	0.37
Assortativity		-0.74	-0.71	-0.69
Geometric mean of strength		0.18	0.22	0.21
	In-prefecture	0.20	0.17	0.28

We explain the cause of this disparity in terms of the relationship between the network features and μ(s). Fig 10 shows that there is a linear relationship between the network features and μ(s). Fig 10(a) shows the relationship between μ(s) and the average distance normalized by the number of nodes, 〈d〉/N. The closer the medical providers are to one another in the network, the higher the level of medical cooperation. Fig 10(b) shows the relationship between μ(s) and the average clustering coefficient 〈C〉: communities with a higher clustering tendency have a higher level of medical cooperation. Fig 10(c) shows the relationship between μ(s) and assortativity r: communities that are less disassortative have a higher level of medical cooperation. This indicates that a horizontally decentralized structure, in which medical providers are connected to one another, is more suitable for medical cooperation than a centralized structure in which the edges are concentrated around a particular medical provider.

Fig 10

Relationship between geometric mean of strength μ(s) and network features of each community.

(a) μ(s) vs. the average distance normalized by the number of nodes 〈d〉/N. (b) μ(s) vs. the average clustering coefficient 〈C〉. (c) μ(s) vs. assortativity r. There is a linear relationship between each network feature and μ(s), indicating that there is a relationship between network structure and medical cooperation.

Each community comprises medical providers located in a specific area, indicating mutual collaboration among medical providers that are geographically close. The breakdown of the medical providers in each community by medical administration area is listed in Table 6. This result indicates that the community structure reflects geographical distance. In terms of community analysis, it confirms the observation from the network diagram shown in Fig 3.

Discussion

This study aimed to examine the overall contribution of medical inter-institutional cooperation to the quality of healthcare provision. For this purpose, we considered four types of statistical models. Table 7 compares the results of the regression models. In the table, we compare the R² and adjusted R² values of the test data from five-fold cross-validation for each model, considering the differences in regression methods and dimensions of input variables.

Table 7

Comparison of regression performance among four models.

		R²(5-fold CV)	Adjusted R²(5-fold CV)
Model 1		0.0075	0.0071
Model 2		0.010	0.0088
Model 3	Mean	0.10	0.066
Model 3	Best	0.17	0.15
Model 4	Mean	0.21	0.18
Model 4	Best	0.24	0.21

Based on Model 1, we found a negative relationship between the geometric mean of the strength and the duration of hospital stay at the medical providers to which patients with femoral neck fractures were admitted. In addition, Model 2 shows that medical cooperation remains effective even when the patient’s age at admission is considered as a factor that prolongs hospital stay. These results indicate that general network features, which previous studies on medical provider networks investigated at the patient level [13, 14], also contribute to healthcare quality at the case level. However, the explanatory ability of Models 1 and 2 was weak (R² = 0.0085 and 0.011, respectively), and Fig 7 confirms that underfitting is the reason for this low explanatory ability. Strength represents the sum of the cosine similarities of the patients shared with neighboring medical providers in the network. Cooperation with neighboring medical providers to adjust capacity so that patients are cared for appropriately may function effectively and shorten the duration of hospital stay. However, this occurs only if a patient is hospitalized at medical providers that frequently share patients with neighboring medical providers. The time from fracture to surgery has been identified as a risk factor for femoral neck fractures [20, 22], and capacity adjustment in acute-phase hospitals is essential. This implies that the findings of a study that examined the impact of network structure on A&E performance using patient transfer networks between wards [37] may also apply to regional medical cooperation. We also examined Model 3, which enhances the ability to represent the input variables by using the node2vec feature representation. We determined the parameters p and q of node2vec using a grid search. Fig 8 shows that the performance of the regression model improves when p is large and q is small.
This means that, when a random walker samples nodes, the probability of returning is low and the probability of exploring outward is high; that is, the sampling strategy is DFS-like. DFS reflects a structural equivalence of the nodes in the embedding into the feature space, whereas BFS reflects homophily [26]. We found that information on structural similarity, rather than on the relationship with closely interacting neighboring nodes, represents medical cooperation in medical provider networks. This result suggests that structurally similar nodes in a medical provider network play an important role in maintaining healthcare quality. The results of Model 3 show that using node2vec feature representations as input improved performance approximately ten-fold compared with Model 2 (mean R² = 0.10). Thus, the strength does not fully represent the information about medical cooperation contained in the medical provider network, whereas node2vec extracts this information more completely as a feature. However, as Fig 9 shows, the predicted value varied greatly when Dⁱⁿ was approximately 0, which degraded performance. We used the regression tree ensemble model to improve the regression performance in Model 4. The performance was approximately twice that of Model 3 and approximately 20 times that of the models using strength (Models 1 and 2). Ensemble learning is effective in reducing model bias and variance [38]. Thus, Model 4 improved on Model 3 by preventing the overfitting near zero. We revealed that the overall contribution of medical cooperation factors to the duration of hospitalization of each patient with femoral neck fracture was approximately 20%. This result indicates that the remaining variation is explained by factors other than medical cooperation. Eq (1) lists six possible factors: patient-specific factors P, health factors H, environment factors E, social service factors S, medical care factors M, and medical cooperation factors C.
We consider that the variation not explained by medical cooperation remains because the other factors differ from case to case. In Model 4, the fitting accuracy was low in the range where Dⁱⁿ was large. This suggests that patient factors and the quality of medical care provided at each medical provider contribute more to prolonged hospitalization than factors related to medical cooperation. The community analysis showed that each community consisted predominantly of nodes in the same medical administration area. This result is consistent with research findings on the structure of medical provider networks in China [39]. The geometric mean of strength μ(s) varied among communities, indicating differences in the level of medical cooperation among communities. Fig 10 shows the relationship between each network feature and μ(s), which explains this difference. The closer the nodes are to one another and the more horizontally distributed the structure, the higher the level of medical cooperation. In contrast, a centralized structure may lead to bottlenecks and prolonged hospital stays when patients are concentrated at a hub medical provider.

Fig 11

Comparison of measured duration of hospital stay Dⁱⁿ and predicted duration of hospital stay Dⁱⁿ calculated by regression tree ensemble model.

Predicted versus measured values for the regression tree ensemble model with the mean of the node2vec feature representations as the input variable. The best result, selected by comparing the coefficient of determination R² among all sets extracted through 300 iterations, is shown (R² = 0.24).

Patients admitted to medical providers with a high level of medical cooperation had a shorter duration of hospital stay. The regional medical inter-institutional cooperation inferred from the medical provider network explains approximately 20% of the variation in the duration of hospitalization, which serves as a measure of healthcare quality. We expect to develop a medical cooperation measure composed of interpretable network features that has the same level of explanatory ability as the feature representation using node2vec. This measure could be used to assess regional medical inter-institutional cooperation as a guide for introducing medical policies. We will also consider optimizing the healthcare system using this measure. In addition, although this study focused on femoral neck fractures, the relationship between medical cooperation and the quality of healthcare should also be investigated for other diseases. The method described in this study makes it possible to clarify the contribution of medical cooperation for each disease, which could identify diseases in which medical cooperation contributes to the quality of healthcare provision.

Summary

An efficient healthcare system that provides good-quality healthcare is needed for the aging population worldwide. Networks among medical providers and physicians constructed from patient claims data have been considered to represent medical cooperation, and previous studies have revealed a relationship between the quality of healthcare provision and network features. However, these studies did not take into account the patterns in which patients use multiple medical providers, and they used only general network features to explain the quality of healthcare provision. Consequently, the overall contribution of the information extracted from medical networks to the quality of healthcare provision was unknown. In addition, it is important to incorporate the usage patterns of medical providers in the Japanese healthcare system. This study aimed to examine the overall contribution of medical inter-institutional cooperation to the quality of healthcare provision, using the information on medical cooperation in the medical provider network in Japan. We conclude that the regional medical inter-institutional cooperation represented by the medical provider network is an important factor in shortening the duration of hospital stay. We examined a model that uses the node2vec feature representation as input variables and ensemble learning as the regression model to improve the explanatory ability, and found that the overall contribution of medical inter-institutional cooperation to the quality of healthcare provision at the case level was approximately 20%. Other factors, such as the patient's condition and care by the individual hospital, explain the remaining variation in the quality of healthcare. Nevertheless, medical inter-institutional cooperation plays a significant role in providing high-quality healthcare. In addition, the results of the community analysis suggest that a horizontally distributed network structure is effective for medical inter-institutional cooperation.

Distribution of inpatient and outpatient intervals.

(a) Cumulative distribution function (CDF) of inpatient intervals. (b) Probability distribution function (PDF) of inpatient intervals. (c) Cumulative distribution function (CDF) of outpatient intervals. (d) Probability distribution function (PDF) of outpatient intervals. The distribution of inpatient intervals followed an exponential distribution (μ = 126), except for the zero-day interval. The outpatient interval has a cyclic variation with a peak every 7 days. Approximately 60% of the inpatient intervals fall within τ1 = 10 days, and approximately 80% of the outpatient intervals fall within τ2 = 35 days.

Duration of hospital stay Din vs. mean and geometric mean of strength.

(a) Arithmetic mean of strength. (b) Geometric mean of strength μ(s). Comparing the two R² values shows that the geometric mean has the better explanatory ability.

Pre-analysis for determining hyperparameters of node2vec.

(a) Number of walkers per node t; (b) length of random walk l; (c) window size w; and (d) dimension d. We set the default parameter values to (t, l, w, d, p, q) = (5, 25, 10, 70, 1, 1) and extracted feature representations while varying t, l, w, and d one at a time. We performed a regression analysis using the extracted feature representations and calculated the root mean squared error (RMSE) of the predicted values obtained from the regression analysis. We performed 20 iterations of five-fold cross-validation and used the average RMSE as the evaluation value of the regression performance.

Surface plot of the grid search of hyperparameters p and q of node2vec p ∈ {4, 8, 16, 32, 64, 128} and q ∈ {1/128, 1/64, 1/32, 1/16, 1/8, 1/4}.

We evaluated the regression performance with changes in parameters p and q using the RMSE. The figure shows that the RMSE surface becomes flat as p increases and q decreases. This means that there are a large number of local minima, which makes it difficult to find a unique optimum value.

27 Jun 2022

PONE-D-22-07692

Regional medical inter-institutional cooperation in medical provider network constructed using patient claims data from Japan

PLOS ONE Dear Dr. Ohki, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Aug 11 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Hocine Cherifi Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. 
In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability. Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized. Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access. We will update your Data Availability statement to reflect the information you provide in your cover letter. 3. Thank you for stating the following financial disclosure: “This work was supported by JSPS KAKENHI (Grant Number: JP19H01075) from the Japan Society for the Promotion of Science and Health and Labour Sciences Research Grant from the Ministry of Health, Labour and Welfare, Japan (Grant Number: 21IA1005 and 21FA1012). YO thanks the Kyoto University Science and Technology Innovation Creation Fellowship. 
Y. Ikeda and YO would also like to acknowledge Ripple, which is providing financial support through its University Blockchain Research Initiative.” Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments: This paper studies medical cooperation efficiency through medical provider and physician networks based on patient claim data. They are starting with a bipartite network of a patient using medical providers. They build the projected networks. They extract network features to predict the quality of healthcare. They computed the duration of hospital stay of patients in each case of surgery related to the femoral neck fracture. They also use community structure analysis to investigate the relationship between the level of medical cooperation and network structure. They show that the regional medical inter-institutional collaboration represented by the medical provider network is an essential factor in shortening the duration of hospital stay. The paper is well written. Experiments and results are sound.
The methodology and the findings are of great interest to the scientific community. Therefore, I recommend publication after taking care of the minor revisions suggested by the reviewer.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes

5.
Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall, this paper is excellent. This paper uses standard network science notations and so is generally easy to read and figure out. The work is strong, and the results are quite interesting. I have only small comments, listed below.

Abstract: In the abstract and intro you mention 4 types of models. It would be useful to simply list the 4 types, maybe even as 1, 2, 3, 4, like what you did with the study focus. Or something like: “Models 1 and 2 use node strength and linear regression, with Model 2 also incorporating patient age as an input. Models 3 and 4 use feature representation by node2vec with linear regression and regression tree ensemble, a machine learning method.”

Also, in both abstract and intro you use the word “strength” (or a “stronger medical provider”) to describe medical providers, and it is not totally clear what it means in that context (it hasn’t been defined yet – I assume you mean it as defined on line 170). It might be better to say “The results showed that medical providers with higher levels of cooperation reduce the duration of hospital stay.” (as you do on lines 414-415)

Introduction: Very small language errors. For example in the first paragraph, “establish healthcare systems” and “services to address this situation.” The organization section is confusing, referring to “section 2,” although the sections aren’t numbered.

Line 108: “recovery phase hospitals”

Line 125: “the following function”

Data and Methods: I like the fact that you have used cosine similarity to normalize size of providers.

Line 181: Might be useful to define assortativity.

Line 233: This is my favorite paragraph in the paper.
You have done an excellent job of explaining things. Overall I find this section to be well done. Good job.

Results: Line 399: Din is typeset wrong. Line 492 talks about 7 communities, but table 6 only shows 6.

Discussion: Line 522: I don’t think the models need to be repeated. This section is very well done.

Summary: This section is highly repetitive, and you have already said the important things in the discussion. I don’t think you need the second paragraph at all, and the 3rd paragraph is mostly repetition.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

19 Jul 2022

Response to Reviewers

2022/07/12

We would like to thank the reviewers for their helpful comments. After serious consideration of all the comments, the paper has been revised as follows.

Revision

Abstract: In the abstract and intro you mention 4 types of models. It would be useful to simply list the 4 types, maybe even as 1, 2, 3, 4, like what you did with the study focus. Or something like: “Models 1 and 2 use node strength and linear regression, with Model 2 also incorporating patient age as an input. Models 3 and 4 use feature representation by node2vec with linear regression and regression tree ensemble, a machine learning method.”

Response to the comment: We described the four models we focused on in this research following your suggestion: "We considered four types of models. Models 1 and 2 use node strength and linear regression, with Model 2 also incorporating patient age as an input. Models 3 and 4 use feature representation by node2vec with linear regression and regression tree ensemble, a machine learning method."

Also, in both abstract and intro you use the word “strength” (or a “stronger medical provider”) to describe medical providers, and it is not totally clear what it means in that context (it hasn’t been defined yet – I assume you mean it as defined on line 170). It might be better to say “The results showed that medical providers with higher levels of cooperation reduce the duration of hospital stay.” (as you do on lines 414-415)

Response to the comment: We changed the sentence following your suggestion: "The results showed that medical providers with higher levels of cooperation reduce the duration of hospital stay."

Introduction: Very small language errors. For example in the first paragraph, “establish healthcare systems” and “services to address this situation.”

Response to the comment: We changed the phrases to "establish healthcare systems" and "services to address this situation."
The organization section is confusing, referring to “section 2,” although the sections aren’t numbered.

Response to the comment: We changed the paragraph to "The remainder of this paper is organized as follows. The following section describes the role of medical cooperation among healthcare providers for a femoral neck fracture. We then describe the data, data analysis method, statistical model, and community analysis in the "Data and Method" section. The "Results" section presents the relationship between the features of medical providers and the duration of hospital stay based on the constructed network. Using a community analysis, we also examine the relationship between the network structure and medical cooperation. The "Discussion" section discusses the results, and the "Summary" section summarizes and concludes this paper."

Line 108: “recovery phase hospitals”

Response to the comment: We changed the sentence to "recovery phase hospitals."

Line 125: “the following function”

Response to the comment: We used the term "functional" because each component of Eq. (1) is a function and f represents a higher-order function. Thus, "the following functional" is correct.

Data and Methods: I like the fact that you have used cosine similarity to normalize size of providers.

Line 181: Might be useful to define assortativity.

Response to the comment: We changed the sentence to "Assortativity r is the Pearson correlation coefficient of the degree of the nodes at both ends of the edge, $r = \sum_{ij} (k_i k_j - \langle k \rangle^2) / \sum_i (k_i - \langle k \rangle)^2$."

Line 233: This is my favorite paragraph in the paper. You have done an excellent job of explaining things. Overall I find this section to be well done. Good job.

Results: Line 399: Din is typeset wrong.

Response to the comment: We revised it to "$D^{\mathrm{in}}$."

Line 492 talks about 7 communities, but table 6 only shows 6.

Response to the comment: We revised table 6 to add a "Community 7" column.

Discussion: Line 522: I don’t think the models need to be repeated.
Response to the comment: We removed lines 522 to 531.

This section is very well done.

Summary: This section is highly repetitive, and you have already said the important things in the discussion. I don’t think you need the second paragraph at all, and the 3rd paragraph is mostly repetition.

Response to the comment: We removed the 2nd paragraph. We merged the 3rd and 4th paragraphs: "We conclude that the regional medical inter-institutional cooperation represented by the medical provider network is an important factor in shortening the duration of hospital stay. We examined a model that uses the node2vec feature representation as input variables and ensemble learning as a regression model to improve the explanatory ability and found that the overall contribution of medical inter-institutional cooperation to the quality of healthcare provision at case level was approximately 20%. Other factors, such as the patient condition and care by the individual hospital, explained the remaining variation in the quality of healthcare. Medical inter-institutional cooperation still plays a significant role in providing high-quality healthcare. In addition, the results for community analysis suggested that the horizontally distributed network structure is effective for medical inter-institutional cooperation."

3 Aug 2022

Regional medical inter-institutional cooperation in medical provider network constructed using patient claims data from Japan

PONE-D-22-07692R1

Dear Dr. Ohki,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.
To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Hocine Cherifi
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

12 Aug 2022

PONE-D-22-07692R1

Regional medical inter-institutional cooperation in medical provider network constructed using patient claims data from Japan

Dear Dr. Ohki:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Professor Hocine Cherifi
Academic Editor
PLOS ONE
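
Note on the assortativity definition discussed in the review exchange above: the response to reviewers describes assortativity as the Pearson correlation coefficient of the degrees of the nodes at the two ends of an edge. The sketch below illustrates that verbal definition only. It is not the authors' code; the function name `degree_assortativity` and the toy edge lists are assumptions made for illustration, and it uses the standard edge-wise Pearson form for an undirected, unweighted graph rather than transcribing the formula quoted from the manuscript.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    `edges` is a list of (u, v) pairs describing an undirected,
    unweighted graph without duplicate edges (illustrative assumption).
    """
    # Node degree = number of edges touching the node.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # List each undirected edge in both directions so that the two
    # "ends" are treated symmetrically.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        # Every end has the same degree (regular graph): correlation undefined.
        return float("nan")
    return cov / var


# Toy example: a hub linked to three leaves. High-degree nodes connect
# only to low-degree nodes, so the coefficient is negative.
star = [(0, 1), (0, 2), (0, 3)]
print(degree_assortativity(star))
```

A negative value indicates that high-degree providers tend to be linked to low-degree ones (a hub-and-spoke pattern), while a positive value indicates that providers with similar degree tend to be linked to each other.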
