Encoding edge type information in graphlets.

Mingshan Jia¹, Maité Van Alboom², Liesbet Goubert², Piet Bracke², Bogdan Gabrys¹, Katarzyna Musial¹.

Abstract

Graph embedding approaches have been attracting increasing attention in recent years mainly due to their universal applicability. They convert network data into a vector space in which the graph structural information and properties are maximally preserved. Most existing approaches, however, ignore the rich information about interactions between nodes, i.e., edge attribute or edge type. Moreover, the learned embeddings suffer from a lack of explainability, and cannot be used to study the effects of typed structures in edge-attributed networks. In this paper, we introduce a framework to embed edge type information in graphlets and generate a Typed-Edge Graphlets Degree Vector (TyE-GDV). Additionally, we extend two combinatorial approaches, i.e., the colored graphlets and heterogeneous graphlets approaches, to edge-attributed networks. By applying the proposed method to a case study of chronic pain patients, we find not only that the network structure of a patient could indicate their perceived pain grade, but also that certain social ties, such as those with friends, colleagues, and healthcare professionals, are more crucial in understanding the impact of chronic pain. Further, we demonstrate that in a node classification task, the edge-type encoded graphlets approaches outperform the traditional graphlet degree vector approach by a significant margin, and that TyE-GDV achieves performance competitive with the combinatorial approaches while being far more efficient in space requirements.

Year: 2022 PMID: 36026434 PMCID: PMC9416998 DOI: 10.1371/journal.pone.0273609

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

1 Introduction

Abstracting entities and their interactions as nodes and links, networks are a general model for studying complex systems [1]. Real-world complex networks contain not only topological information but also rich information about nodes and links [2]. Many previous works propose to exploit node attributes by jointly embedding them with topological structures, and the enhanced representation has been shown to be powerful for numerous applications, such as node classification [3-5], link prediction [6, 7], anomaly detection [8, 9], and network visualisation [10]. These approaches, however, overlook rich information about interactions between nodes. Edge attribute or edge type information is indispensable when studying many networks. For instance, the label of each edge in a routing network reflects the cost of traffic via that edge and is used to determine the best possible routing scheme; in a user-object bipartite network, an edge is labelled with the user’s rating for the product, based on which effective recommender systems can be built [11]; and in egocentric social networks, labels of edges illustrate different types of social relationships and are essential in analysing individuals’ behaviours and characteristics [12].

To address this issue, we propose to incorporate edge type information into graphlets and form a Typed-Edge Graphlets Degree Vector (TyE-GDV) [13]. This is mainly inspired by the classic graphlets approach that generates a graphlet degree vector (GDV) [14]. Each coordinate in GDV has a clear meaning, i.e., representing a particular topological structure. Due to this excellent explainability, graphlets have gained considerable ground in a variety of domains. In molecular networks, proteins performing similar biological functions have been shown to possess similar local structures as depicted by GDV [15]. Graphlets are also used in computer vision and neuroscience, in order to capture the spatial structure of superpixels [16] or to detect structural and functional abnormalities in the brain [17]. Notably, in social science, egocentric graphlets are used to depict the social interaction patterns of individuals [18].

In the proposed TyE-GDV approach, we choose to add an extra dimension of edge type on top of GDV, that is to say, counting each type of edge touched by each graphlet. Therefore, each coordinate in the two-dimensional vector also has a clear meaning: the number of edges of a certain type in a certain graphlet. We also propose an egocentric version of TyE-GDV that is more succinct and space efficient when dealing with egocentric networks.

We then employ the proposed TyE-GDV and the classic graphlets degree vector [15] to evaluate and analyse a collection of egocentric social networks of chronic pain patients. The real-life data is gathered from two chronic pain leagues in Belgium [19]. Each patient creates an egocentric social network with edges denoted by the type of social relationships. The patients are divided into four groups based on their self-perceived pain grades. First, we find that graphlet patterns are indeed helpful in assessing the pain grade: patients with higher pain grades form more star-like structures (3-star graphlets), whereas patients with lower pain grades have more tightly connected structures (3-cliques, 4-chordal-cycles and 4-cliques). Second, the edge-type embedded graphlets depicted by TyE-GDV provide us with more insights into how particular social ties could affect the perceived pain. Specifically, we find that in patients of higher pain grades, friends and healthcare workers are the dominant social types in the poorly connected 3-stars; and that in patients of lower pain grades, friends and colleagues appear more often in the tightly connected graphlets such as 3-cliques and 4-cliques.

To compare with the proposed method, we further extend two recent graphlets-based approaches, i.e., the colored graphlets approach [20] and the heterogeneous graphlets approach [21], to edge-attributed networks and egocentric networks. We then apply TyE-GDV and the extended colored and heterogeneous graphlets approaches to a node classification task. Besides the egocentric social networks of chronic pain patients, the dataset also contains rich information about the patients’ demographic attributes, pain scores and other physical/psychological well-being descriptors, which are used as baseline features in the experiment. We then add the features captured by the proposed method and by the related approaches, and aim to classify patients into different pain grade groups. The result shows that the edge-type augmented graphlet features are more distinctive than the traditional non-typed graphlet features provided by GDV in separating patients with different pain grades.

To summarise, the main contributions of this work are as follows:

- In order to effectively encode edge type information, we propose a novel framework to generate a Typed-Edge Graphlet Degree Vector.
- We further modify the TyE-GDV framework so that it is better suited for egocentric networks.
- We extend the colored graphlets and heterogeneous graphlets approaches to edge-typed networks and egocentric networks.
- According to a case study on individuals with chronic pain, certain social ties are more crucial in understanding the effects of chronic pain and may result in more successful therapeutic interventions.
- We demonstrate that rich structural information enhanced by edge-type information leads to significant improvement in a typical machine learning task.

The remainder of this paper is organised as follows. Related works are discussed in Section 2. Preliminary knowledge is provided in Section 3. The proposed typed-edge graphlets, and the extended colored graphlets and heterogeneous graphlets are introduced in Section 4 and Section 5, respectively. Experiments, results and analysis are presented in Section 6. Finally, we conclude and discuss future directions in Section 7.

2 Related work

Compared to the abundant approaches that take advantage of node attributes, fewer works have focused on leveraging edge attribute information in graph analysis. A straightforward approach is to construct an adjacency matrix containing edge attributes and then to factorise it [22]. This approach, however, involves expensive matrix operations such as singular value decomposition and therefore lacks scalability. EdgeCentric focuses on the problem of anomaly detection and proposes to aggregate attribute values of edges incident to each node and defines an abnormality scoring function [23]. One limitation of EdgeCentric is that its topological scope is restricted to directly connected edges. The framework GERI proposes to first construct a heterogeneous graph by adding extra bridge nodes that represent node/edge attributes, then take a random walk to sample a node’s neighbourhood, and learn its embedding [24]. However, converting attribute information into structural information also makes the attribute information lose its original meaning. Based on the approach of Poincaré embeddings [25], Chen and Quirk recently proposed an embedding method that simultaneously preserves the hierarchical property and edge attributes [26]. This approach is limited by its exclusive focus on hierarchical relationships. Although these approaches are shown to be effective in some downstream tasks, a common issue with them is that their learned embeddings lack explainability: we do not know what each element of the embedding vector means. They are, therefore, unable to reveal the deeper and, ideally, more easily explainable relationship between a local network structure and an edge attribute.

3 Preliminaries

In this section, we introduce the notions of graphlets and orbits, and discuss how they can be adapted in egocentric networks.

3.1 Graphlets and orbits

Node degree, being the most basic structural feature, counts the number of edges incident to a node. The graphlet degree generalises the idea of node degree by counting the number of graphlets the node participates in. Specifically, graphlets are a set of “small connected nonisomorphic induced subgraphs” [14]. Small means that the size of the subgraphs is small, usually no more than 4 or 5 nodes. Nonisomorphic means that those subgraphs are structurally distinct, and induced means that all the edges among the nodes in a subgraph need to be considered. The original work covers graphlets of sizes ranging from 2 to 5 nodes, resulting in a total of 30 different graphlets. Besides, as a node-level structural measure, the non-symmetry of node position is also taken into account, leading to a total of 73 different subgraph structures, termed automorphism orbits [15]. Briefly, orbits are graphlets that distinguish the position of a focal node (we use orbits and node-orbit graphlets interchangeably in this work). The Graphlet Degree Vector (GDV) of a particular node is thus defined as a vector of the frequencies of the 73 orbits. GDV, or sometimes normalised GDV, has been widely applied in various domains and has become a standard structural feature when measuring the similarities and differences between nodes [15-17]. We summarise node-orbit graphlets of 2 to 4 nodes in Fig 1(a). Taking one of the black nodes in G7 as an example, it touches orbit-0 three times (the degree of the node), orbit-2 once (the open triad), orbit-3 twice (the triangle), and orbit-13 once. Therefore, its graphlet degree vector has 3 at the 0th coordinate, 1s at the 2nd and 13th coordinates, 2 at the 3rd coordinate, and 0s at the remaining coordinates.
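To make the G7 example concrete, the following is a minimal Python sketch that counts three coordinates of the graphlet degree vector (orbit-0, orbit-2 and orbit-3) for a single node. The adjacency-dictionary representation and the name `partial_gdv` are illustrative assumptions rather than part of the original work, and the remaining orbits are deliberately left out.

```python
from itertools import combinations


def partial_gdv(adj, u):
    """Count orbit-0, orbit-2 and orbit-3 for node u.

    adj maps every node to the set of its neighbours (undirected graph).
    Hypothetical helper for illustration; only three orbits are covered.
    """
    counts = {0: len(adj[u]), 2: 0, 3: 0}  # orbit-0 is the node degree
    for v, w in combinations(sorted(adj[u]), 2):
        if w in adj[v]:
            counts[3] += 1  # u, v and w close a triangle
        else:
            counts[2] += 1  # u is the centre of an open triad
    return counts


# 4-chordal-cycle (G7): "a" and "c" have degree 3, "b" and "d" have degree 2.
diamond = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}
print(partial_gdv(diamond, "a"))  # {0: 3, 2: 1, 3: 2}
```

The output for the degree-3 node agrees with the counts given in the text: three for orbit-0, one for orbit-2 and two for orbit-3.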

Fig 1

Graphlets of 2–4 nodes with the enumeration of orbits.

(a) Node orbits: there are in total 15 node orbits, with different node colors indicating nonisomorphic node positions within a given graphlet. (b) Edge orbits: there are in total 13 edge orbits, with different line types denoting nonisomorphic edge positions within a given graphlet.

The notion of orbits was originally established at a node level, distinguishing a node position when counting graphlets. Hočevar and Demšar later proposed to count graphlets at a link level and introduced the notion of edge orbits [27]. Fig 1(b) gives all edge orbits containing 2 to 4 nodes. Edge orbits are clearly different from node orbits. For example, there is only one edge orbit in graphlet G1, but two node orbits in it. We also refer to edge orbits as edge-orbit graphlets in this work. The concept of heterogeneous graphlets is built upon edge orbits, and we will discuss it further in Section 5.

3.2 Egocentric graphlets

Graphlets were initially proposed for general networks or sociocentric networks. Although sociocentric networks appear to be more comprehensive models of complex systems, collecting sociocentric data via survey is difficult because participants need to be identifiable to the researcher, and this lack of anonymity can result in unwillingness to participate or bias in responses [12]. Moreover, there are situations where we care more about individuals and their immediate environment. For example, we may want to understand why some people form densely connected ego networks while others don’t. Being a node-level measure, graphlets are naturally suitable for egocentric networks, with two more restrictions. First, some graphlets that do not fit the definition of an egocentric network need to be eliminated. For example, among graphlets of 2 to 4 nodes (Fig 1(a)), G3 (3-path) and G5 (4-cycle) are excluded because no node in them, acting as an ego, can reach all other nodes in a single hop. Second, since only one node can serve as the ego in an egocentric graphlet, it is unnecessary to discriminate between different orbits. Therefore, there are in total seven egocentric graphlets of 2 to 4 nodes, which are 2-clique, 2-path, 3-clique, 3-star, tailed-triangle, 4-chordal-cycle and 4-clique (Fig 2).
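Because the ego is adjacent to every alter, the induced subgraph on the ego plus two or three alters is fully determined by how many alter-to-alter edges it contains. The sketch below uses this observation to count the seven egocentric graphlets; the function name `ego_graphlet_counts` and the input format are assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations

EGO_GRAPHLETS = ["2-clique", "2-path", "3-clique", "3-star",
                 "tailed-triangle", "4-chordal-cycle", "4-clique"]


def ego_graphlet_counts(alters, alter_edges):
    """Count the seven egocentric graphlets around one ego.

    alters: ids of the ego's direct contacts.
    alter_edges: pairs (a, b) of alters that are connected to each other.
    """
    linked = {frozenset(e) for e in alter_edges}
    alters = sorted(alters)
    counts = dict.fromkeys(EGO_GRAPHLETS, 0)

    counts["2-clique"] = len(alters)  # one per ego-alter edge
    for pair in combinations(alters, 2):
        # Two alters: connected -> 3-clique, otherwise -> 2-path.
        counts["3-clique" if frozenset(pair) in linked else "2-path"] += 1

    # Three alters: 0, 1, 2 or 3 alter-alter edges select the 4-node graphlet.
    four_node = ["3-star", "tailed-triangle", "4-chordal-cycle", "4-clique"]
    for trio in combinations(alters, 3):
        n_links = sum(frozenset(p) in linked for p in combinations(trio, 2))
        counts[four_node[n_links]] += 1
    return counts


example = ego_graphlet_counts([1, 2, 3, 4], [(1, 2), (2, 3)])
print(example)
```

For this toy ego network with four alters and two alter-to-alter edges, the counts are four 2-cliques, four 2-paths, two 3-cliques, one 3-star, two tailed-triangles, one 4-chordal-cycle and no 4-clique.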

Fig 2

Egocentric graphlets of 2 to 4 nodes.

There are in total seven egocentric graphlets. The black node in a given graphlet is the ego node, other nodes are alter nodes.


4 Typed-edge graphlet degree vector

This section describes the framework for generating the typed-edge graphlet degree vector. The classic graphlet degree vector manages to capture the structural patterns in homogeneous networks. However, many real-world networks also contain rich information on nodes and edges, making them node-attributed, edge-attributed or heterogeneous networks. Information about edge type is particularly important in social networks since it provides a detailed description of relationships among individuals. In the target dataset of this study, for instance, each patient with chronic pain specifies their egocentric social network, including up to ten actors, and each ego-to-alter edge is labelled with one of 13 different types of social ties. In order to analyse edge-attributed networks at a finer granularity and capture the rich edge-typed connectivity patterns, we propose to embed edge type information in graphlets. The original graphlet degree vector generates a one-dimensional vector by counting the instances of each type of graphlet. Here, we propose to build a two-dimensional vector by adding an extra dimension of edge type on top of GDV, that is to say, counting each type of edge contained in each type of graphlet.

We start by formally defining an edge-attributed network.

Definition 1 An edge-attributed network G is a triple $G = (V, E, T_e)$, where $V = \{v_1, v_2, \dots, v_n\}$ is the set of nodes, $E = \{e_{ij}\} \subset V \times V$ is the set of edges where $e_{ij}$ indicates an edge between nodes $v_i$ and $v_j$, and $T_e = \{\tau_{ij}\}$ is the set of edge types, where $\tau_{ij}$ denotes the type of edge $e_{ij}$.

The initial step of the framework is graph preprocessing, where the set of edge types $T_e$ is mapped to integers ranging from 0 to $|T_e| - 1$. For example, the 13 different types of social ties in the target dataset are represented by 0 to 12 ($\tau \in [0, 12]$). Additionally, the set of orbits $O$ is converted to integers from 0 to $|O| - 1$. In this study, we take into account all the node-orbit graphlets within the size of 2 to 4 nodes (Fig 1(a)).
Thus, there are 15 orbits coded from 0 to 14 ($o \in [0, 14]$).

Algorithm 1: Build Typed-Edge Graphlet Degree Vector.
input: preprocessed graph G, set of node-orbits O, node set V′.
output: dictionary dic of vectors for all nodes ∈ V′.
initialise: dic = {}
foreach i ∈ V′ do
    initialise a 2d-vector vec of size |O| × |T_e| with zeros
    foreach o ∈ O do
        L = GetEdgeList(o);
        Update(vec, o, L)
    dic[i] = vec;

Algorithm 2: Update Vector.
Function Update
input: 2d-vector vec, type of node orbit o, list of edges L.
foreach e ∈ L do
    τ = GetType(e); /* o and τ are used as indices in vec. */
    vec[o][τ] increase by 1;

Algorithm 3: Code Snippet for Orbit-6, 9 and 10.

Next, for any node of interest, the typed-edge graphlet degree vector (TyE-GDV), i.e., a two-dimensional vector of size $|O| \times |T_e|$, is generated using Algorithm 1. Concretely, after initialisation, for each node in a given node set V′ and for each orbit in the set of node-orbit graphlets $O$, the vector is updated through the Update function (Algorithm 2). The calculation of each orbit is omitted from Algorithm 1 for conciseness. To demonstrate the detailed process, we give a program snippet for calculating orbit-6, orbit-9 and orbit-10 in Algorithm 3. C(N, 2) denotes all possible 2-combinations of the neighbours of node u. Combinations are used to avoid counting the same structure repeatedly. In Algorithm 2, o and τ can be used directly as indices when updating the vector, thanks to the preprocessing stage. Finally, at the end of Algorithm 1, a dictionary with nodes as keys and their corresponding TyE-GDV as values is returned. For example, if an orbit-9 is detected and its four edges are of type ‘0’, ‘1’, ‘2’ and ‘2’, the vector elements at coordinates (9, 0) and (9, 1) each increase by 1, and the element at (9, 2) increases by 2. The time complexity of generating TyE-GDV is the same as that of counting graphlets, since the only extra work is a constant number of updates per detected graphlet.
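The orbit-9 example can be replayed with a small Python rendering of the Update function. This is a sketch under a simplifying assumption: the function receives the integer type codes of the edges directly, instead of a list of edges from which GetType would read the type.

```python
def update(vec, orbit, edge_types):
    """Add one count per edge of a detected orbit instance (cf. Algorithm 2).

    edge_types holds the integer type code of each edge in the instance;
    passing codes directly is a simplification made for this illustration.
    """
    for tau in edge_types:
        vec[orbit][tau] += 1


NUM_ORBITS, NUM_TYPES = 15, 13  # orbits 0..14 and edge types 0..12
vec = [[0] * NUM_TYPES for _ in range(NUM_ORBITS)]

# One orbit-9 instance whose four edges have types 0, 1, 2 and 2.
update(vec, 9, [0, 1, 2, 2])
print(vec[9][:3])  # [1, 1, 2]
```

Because the type code is used directly as a column index, an edge type that occurs twice in the same instance simply increments the same cell twice.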
Although the typed-edge graphlets approach is introduced and implemented for edge-attributed networks, it can be easily extended to node-attributed networks by replacing the edge type with a node type, or to networks containing both node and edge types by adding an extra dimension for the node type.

As discussed in Section 3.2, egocentric networks are sometimes of special interest, especially when edge type information is included (as in our case study dataset of chronic pain patients). With the restriction of being egocentric, there are fewer orbits in graphlets that need to be considered. Therefore, we propose a tailor-made version of the framework for egocentric networks, called TyE-EGDV (see Algorithm 4). C(N, 2) and C(N, 3) stand for all possible 2-combinations and 3-combinations of the neighbours of node i. Note that in TyE-EGDV, there are in total 7 orbits in $O$, instead of 15 (see Fig 2). Therefore, the algorithm is more efficient in both time and space.

Algorithm 4: Build Typed-Edge Ego-Graphlet Degree Vector.
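A minimal Python sketch of the egocentric idea follows; it is an illustration under stated assumptions rather than the authors' Algorithm 4. It assumes that only ego-to-alter edges carry a type (as in the chronic pain dataset), that alter-to-alter edges are untyped and therefore not counted, and that rows follow the order 2-clique, 2-path, 3-clique, 3-star, tailed-triangle, 4-chordal-cycle, 4-clique. The name `tye_egdv` and the input format are hypothetical.

```python
from itertools import combinations

ROWS = ["2-clique", "2-path", "3-clique", "3-star",
        "tailed-triangle", "4-chordal-cycle", "4-clique"]


def tye_egdv(alter_type, alter_edges, num_types):
    """Build a 7 x num_types typed-edge ego-graphlet degree vector.

    alter_type: {alter: integer type code of the ego-alter edge}.
    alter_edges: pairs of alters connected to each other (assumed untyped).
    """
    linked = {frozenset(e) for e in alter_edges}
    vec = [[0] * num_types for _ in ROWS]
    alters = sorted(alter_type)

    def add(row, members):
        # Count the typed ego-alter edge of every alter in this instance.
        for a in members:
            vec[row][alter_type[a]] += 1

    for a in alters:
        add(0, (a,))
    for pair in combinations(alters, 2):
        add(2 if frozenset(pair) in linked else 1, pair)
    for trio in combinations(alters, 3):
        n_links = sum(frozenset(p) in linked for p in combinations(trio, 2))
        add(3 + n_links, trio)  # 0..3 alter-alter edges -> rows 3..6
    return vec


# Toy ego network: codes 0, 4 and 9 stand for three different tie types.
types = {"p": 0, "f1": 4, "f2": 4, "h": 9}
result = tye_egdv(types, [("p", "f1")], num_types=13)
print(result[3])  # typed-edge counts of the 3-star row
```

Each row of the result corresponds to one egocentric graphlet and each column to one edge type, so a single row can be read off directly when comparing how often a given tie type appears in a given structure.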

5 Typed-edge degree, colored graphlets and heterogeneous graphlets

Since node degree is the simplest network structural metric, a naive way of encoding edge type information in a network structure is to first define a typed-edge degree. Formally, the typed-edge degree of a node i with respect to an edge type t is defined as the number of edges of type t that are connected to i. Then, a typed-edge degree vector (TyE-DV) can be defined as a vector containing the typed-edge degrees of all types.

Some other approaches that also aim to take node and/or edge type into consideration include colored motifs [28], colored graphlets [20] and heterogeneous graphlets [21]. Colored motifs, as the name suggests, extend the G-Tries algorithm for counting motifs [29] by including node or edge type information. This approach, however, works at the network level and is therefore not suitable for node-level analysis.

The colored graphlets approach [20] works at the node level and proposes to distinguish different graphlets according to all combinations of node types. The approach is said to be able to deal with typed edges, but without theoretical explanation or experimental demonstration. The article states that the total number of combinations equals $2^T - 1$, where T is the total number of possible node types. This is incorrect because it fails to take the size of the graphlet into account. When the graphlet size is smaller than the number of node types, the total number of combinations is smaller than $2^T - 1$. For example, when we consider the graphlet G0, i.e., the 2-clique, with three possible node types, there are in total six combinations instead of seven: the combination containing all three types cannot exist since there are only two nodes in this graphlet. Below, we give the amended equation for calculating the number of combinations in a given graphlet g:

$$\sum_{k=1}^{\min(K(g),\,T)} \binom{T}{k},$$

where K(g) is the number of nodes of the graphlet when T refers to a node type, or the number of edges of the graphlet when it refers to an edge type. Note that when K(g) ≥ T, the equation becomes $\sum_{k=1}^{T} \binom{T}{k}$, which equals $2^T - 1$. We then develop a colored graphlets approach for edge-typed networks, named ColoredE-GDV, which is also applied to the case study in the next section.

The recently proposed heterogeneous graphlets approach [21] also considers node type in graphlets. It differs from the colored graphlets approach in two ways. First, heterogeneous graphlets are computed at a link level: the approach distinguishes the position of a given edge, instead of a given node (please refer to the notion of edge-orbit graphlets in Section 3.1). The benefit of a link-based computation is that it is more time-efficient in sparse networks than node-based approaches. The downside is that it is not suitable for node-level analysis. Second, heterogeneous graphlets use combinations with repetition of node types, rather than plain combinations, when distinguishing different graphlets. The total number of possible heterogeneous graphlets is calculated as:

$$\binom{T + K(g) - 1}{K(g)}.$$

Similarly, K(g) is the number of nodes of the graphlet when T refers to a node type, and the number of edges when it refers to an edge type. Since type repetition is allowed in heterogeneous graphlets, the number of possible heterogeneous graphlets is larger than that of colored graphlets.

In order to extend the idea of heterogeneous graphlets to node-level analysis and to deal with typed edges, we propose a node-based typed-edge heterogeneous graphlets approach, named HeteroE-GDV^N (the original link-based typed-node approach is denoted HeteroN-GDV^L). The approach of HeteroE-GDV^N is demonstrated in Algorithm 5. Its time complexity stays the same as that of counting untyped graphlets, but its space complexity grows fast with the number of edge types.

Algorithm 5: Node-based Heterogeneous Graphlets Degree Vector (Hetero-GDV^N)
input: preprocessed graph G, set of node-orbits O, node set V′.
output: dictionary dic of vectors for all nodes ∈ V′.
initialise: dic = {};
[…]; /* range of edge number of graphlets of size 2–4 nodes */
for k ← 1 to 6 do
    L = [GetCombWithRep […]];
foreach i ∈ V′ do
    for o ← 0 to |O| − 1 do
        initialise vec;
        foreach […] do
            k = GetNumOfEdge(o);
            L = GetEdgeList(o);
            tup = (Sort(L));
            vec[GetIndex(L, tup)] increase by 1;
        […];
    dic[i] = vec;

Although the above approaches seem powerful in capturing all possible combinations (or combinations with repetition) of different types of nodes or edges, their numbers of possible graphlets, which are also their space complexities, grow near-exponentially with the number of node or edge types. For example, with 9 node types, in the colored graphlets approach, there are 255 possible colored graphlets for a graphlet of 4 nodes; and in the heterogeneous graphlets approach, there are 495 possible graphlets. In comparison, the space complexity grows linearly with the number of edge types in the proposed TyE-GDV approach. Moreover, out of this large number of possible graphlets, only a tiny percentage actually exists in real networks. For example, in the Cora citation network [30], only 19 heterogeneous graphlets exist out of 210 possible ones in a 4-clique graphlet.

In order to utilise the colored graphlets and the heterogeneous graphlets approaches in egocentric networks, we further develop their egocentric versions, and apply them in the chronic pain case study. With fewer node orbits to consider, egocentric colored graphlets and egocentric heterogeneous graphlets are faster and more space-saving than the original ones. The implementation of these algorithms is available at https://github.com/MingshanJia/explore-local-structure.

To conclude this section, we summarise the time and space complexities of the four main approaches in Table 1. Colored-GDV, HeteroE-GDV^N and TyE-GDV share the same time complexity because they are all node-based algorithms. Hetero-GDV^L, as the only link-based algorithm, could be faster in sparse networks. When it comes to space complexity, the proposed TyE-GDV grows linearly with the number of edge types, while the other three methods grow near-exponentially with it.
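The two counting formulas, and the way a sorted tuple of types can be mapped to a vector index, can be checked with a short Python sketch. The function names are illustrative assumptions, and `hetero_index` is only one plausible reading of the GetCombWithRep, Sort and GetIndex steps of Algorithm 5, not a transcription of it.

```python
from itertools import combinations_with_replacement
from math import comb


def num_colored(k, t):
    """Type combinations for a graphlet with k typed elements and t types."""
    return sum(comb(t, j) for j in range(1, min(k, t) + 1))


def num_heterogeneous(k, t):
    """Type combinations with repetition: choose k items from t types."""
    return comb(t + k - 1, k)


def hetero_index(element_types, t):
    """Position of a multiset of types in the enumerated combinations."""
    k = len(element_types)
    combos = list(combinations_with_replacement(range(t), k))
    return combos.index(tuple(sorted(element_types)))


print(num_colored(2, 3))        # 6, the 2-clique with three types
print(num_colored(4, 9))        # 255
print(num_heterogeneous(4, 9))  # 495
print(hetero_index([2, 0, 2], 3))  # 5
```

The first three values reproduce the figures quoted in the text. The last line shows that edge types are sorted before lookup, so the order in which the edges of a graphlet are visited does not change the index.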

Table 1

Time and space complexities of four approaches that deal with edge type information.

S is the maximum number of nodes in graphlets, K is the maximum number of edges in graphlets, $|O_e|$ is the number of edge-orbit graphlets.

Approach	Time complexity	Space complexity
Colored-GDV [20]	$O(|V| \cdot k_{\max}^{S-1})$	$O(|V| \cdot |O| \cdot 2^{|T_e|})$
Hetero-GDV^L [21]	$O(|E| \cdot k_{\max}^{S-2})$	$O(|E| \cdot |O_e| \cdot \binom{|T_e|+K-1}{K})$
HeteroE-GDV^N	$O(|V| \cdot k_{\max}^{S-1})$	$O(|V| \cdot |O| \cdot \binom{|T_e|+K-1}{K})$
TyE-GDV	$O(|V| \cdot k_{\max}^{S-1})$	$O(|V| \cdot |O| \cdot |T_e|)$

6 Experiments and analysis

In this section, we apply the proposed methods to analyse the egocentric social networks of chronic pain patients.

6.1 Dataset

The real-world dataset is collected from chronic pain patients of the League for Rheumatoid Arthritis, the League for Fibromyalgia and the Flemish Pain League [19]. Each patient creates their own egocentric social network containing up to 10 alters using the graphical tool GENSI [31]. The types of social ties between the patient (the ego node) and their contacts (the alters) are explicitly given. There are in total 13 types of social relationships, including families, friends, colleagues, neighbours, etc. The full list of social ties and their total occurrences is given in Table 2. The patients were also asked to fill out a questionnaire on pain-related and sociodemographic information. In addition, a daily diary consisting of items measuring pain intensity and physical, psychological and social well-being was provided to participants for 14 consecutive days. After eliminating inconsistent and incomplete entries, the final dataset consists of the egocentric social networks, sociodemographic and pain characteristics of 303 patients. The average age of all patients is 53.5±12 years, including 248 females and 55 males.

Table 2

13 types of social relationships and their total number of occurrences in 303 egocentric networks.

Social Relationship	Type Code	Number of Occurrences
Partner	T-1	222
Father/Mother	T-2	209
Brother/Sister	T-3	293
Children/Grandchildren	T-4	493
Friend	T-5	506
Family-in-law	T-6	207
Other family	T-7	142
Neighbour	T-8	69
Colleague	T-9	57
Healthcare worker	T-10	233
Member of organisations	T-11	74
Acquaintance	T-12	15
Other	T-13	17

Some basic characteristics of the egocentric networks, such as the ego nodes’ degree distribution and their edge-type distribution, are shown in Fig 3. The edge-type distribution is computed by summing the number of edges of each type over all ego nodes, which is also displayed in the third column of Table 2. The degree distribution reveals that the majority of patients (62%) have ten social connections in their social networks (Fig 3a). However, we do not anticipate node degree to be a discriminative feature in the following analysis since ten contacts is the upper limit in the dataset. According to the edge-type distribution (Fig 3b), the most frequent types in these networks are T-5 “friend” and T-4 “children/grandchildren”. In contrast, edge types T-8 “neighbour”, T-9 “colleague” and T-11 “member of organisations” are underrepresented. T-12 “acquaintance” and T-13 “other” are almost negligible because people would first list their strongest contacts under the limit of ten connections, leaving little room for weaker ties.
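As a small illustration of the edge-type distribution, the following Python sketch turns the occurrence counts of Table 2 into relative frequencies. The counts are copied from the table; the variable names are illustrative only.

```python
# Number of occurrences per social relationship, copied from Table 2.
occurrences = {
    "Partner": 222, "Father/Mother": 209, "Brother/Sister": 293,
    "Children/Grandchildren": 493, "Friend": 506, "Family-in-law": 207,
    "Other family": 142, "Neighbour": 69, "Colleague": 57,
    "Healthcare worker": 233, "Member of organisations": 74,
    "Acquaintance": 15, "Other": 17,
}

total = sum(occurrences.values())
shares = {name: count / total for name, count in occurrences.items()}

# List edge types from most to least frequent with their share of all edges.
for name in sorted(shares, key=shares.get, reverse=True):
    print(f"{name:<24}{occurrences[name]:>5}{shares[name]:>8.1%}")
```

Sorting by share puts “friend” and “children/grandchildren” at the top of the list and “acquaintance” and “other” at the bottom, in line with the description above.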

Fig 3

Degree distribution and edge-type distribution of 303 egocentric social networks.

Moreover, the grades of chronic pain are calculated by means of the Graded Chronic Pain Scale (GCPS), which evaluates both pain disability and pain intensity [32]. The scale divides patients into five grades based on their average intensity and disability scores: grade-0 for no pain; grade-1 for low intensity and low disability; grade-2 for high intensity and low disability; grade-3 for moderate disability irrespective of pain intensity; and grade-4 for high disability irrespective of pain intensity. Since all participants have a certain degree of chronic pain, their GCPS grades vary from grade-1 to grade-4. Specifically, there are 21 patients in grade-1, 33 patients in grade-2, 67 patients in grade-3 and 182 patients in grade-4. In this study, we aim to investigate whether structural features, especially the edge-type augmented structural features captured by TyE-GDV, are helpful in understanding the patients’ pain grades.

6.2 Analysing pain grades

Evidence within the fields of pain and rehabilitation science has shown that social interactions play an important role in the perception of pain [33]. Perceived social support and pain interference are found to be associated in individuals with chronic musculoskeletal pain [34]. Lower levels of social support and higher levels of pain intensity are observed in rheumatoid arthritis patients at the 3- and 5-year follow-ups [35]. It has also been demonstrated recently that reduced social isolation accounts for significant improvements in self-reported emotional and physical functioning [36]. Typically in these studies, the social milieu of a patient is assessed by the Social Support Satisfaction Scale (ESSS) [37] or the Patient Reported Outcome Measurement Information System (PROMIS®) [38]. However, as these measurements are not based on the real social networks of the patients, they are unable to shed light on the impact of network topologies, especially certain types of interactions, on the perception of pain. To address this issue, we choose to apply both the traditional graphlets approach and the proposed typed-edge graphlets approach to analyse the egocentric networks of chronic pain patients.

First, in order to investigate the impact of network structure on pain grade, we calculate the average egocentric graphlet degree vector for each GCPS grade. A radar chart shows the average values of the seven egocentric graphlets at each grade (Fig 4). We observe that patients with higher pain grades (grade-3 and grade-4) possess more star-like structures (3-star graphlet) in their social networks, whereas patients with lower pain grades (grade-1 and grade-2) form more clique-like or quasi-clique-like structures (3-clique, 4-clique and 4-chordal-cycle graphlets). A poorly connected star-like structure denotes a more isolating social setting, whereas a better-connected structure, such as the 3-clique or 4-clique, may suggest stronger social support. These findings are in agreement with the aforementioned studies [33-36] and provide further evidence that a patient’s social network may influence how much pain they perceive. Additionally, we discover that the number of immediate connections (2-cliques) is ineffective in differentiating pain grades, which may be partially caused by the limited number of contacts in the dataset. Consistent with this, Evers et al. [35] also found that changes in pain are not substantially correlated with the size of a patient’s egocentric social network. Jia et al. revealed that the clustering coefficient and the quadrangle coefficient are useful topological features in assessing the perception of pain [39]. These findings further underline the need to consider more complex network topologies when examining patients’ social networks.
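The per-grade averaging behind the radar chart can be sketched in a few lines of Python. The function name and the toy vectors below are hypothetical and serve only to show the grouping and averaging step; they are not values from the dataset.

```python
from collections import defaultdict


def mean_vector_by_grade(vectors, grades):
    """Average equally long vectors within each pain grade."""
    grouped = defaultdict(list)
    for vec, grade in zip(vectors, grades):
        grouped[grade].append(vec)
    return {
        grade: [sum(column) / len(vecs) for column in zip(*vecs)]
        for grade, vecs in grouped.items()
    }


# Hypothetical two-coordinate vectors for three patients (not real data).
toy_vectors = [[1, 2], [3, 4], [5, 6]]
toy_grades = [1, 1, 4]
averages = mean_vector_by_grade(toy_vectors, toy_grades)
print(averages)  # {1: [2.0, 3.0], 4: [5.0, 6.0]}
```

In the actual analysis each vector would hold the seven egocentric graphlet counts of one patient, and each averaged coordinate would correspond to one spoke of the radar chart.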

Fig 4

Radar chart of average GDV of different GCPS grades.

Each spoke represents the average number of graphlets belonging to that type.

Furthermore, in order to analyse the association between the types of social ties and the perception of pain, we employ the typed-edge graphlet degree vector and focus on two specific graphlets, namely the weakly connected 3-star graphlet and the highly connected 4-clique graphlet. These two graphlets are selected not only because they represent two extremes of 4-node structures but also because they show distinct differences between patients with lower pain grades and patients with higher pain grades. We first calculate the average counts of the 13 edge types at each pain grade for the 3-star graphlet, i.e., the 3rd row of the Typed-Edge Ego-Graphlet Degree Vector (see Algorithm 4), and draw a parallel coordinates plot (Fig 5(a)). We discover that in the poorly connected star-like structure, edges of type T-5 “friend” and T-10 “healthcare worker” are significantly more frequent in patients with higher pain grades than in patients with lower pain grades. That is to say, in the social networks of higher pain grade patients, friends and healthcare workers are in a rather isolated position, not well connected with other contacts of the patient. This suggests that treatments that boost the social involvement of a patient’s friends and healthcare professionals have the potential to improve chronic pain management.
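The per-grade averaging of one graphlet row can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each patient's typed-edge vector is stored as a 7 × 13 matrix (graphlets by edge types), that rows and edge types are indexed from zero (so the 3rd row is index 2 and T-5 is index 4), and the function names and toy patients are hypothetical.

```python
from collections import defaultdict

N_EDGE_TYPES = 13   # edge types T-1 ... T-13 in the case study
STAR3_ROW = 2       # "3rd row" (3-star graphlet), zero-based (assumed layout)
CLIQUE4_ROW = 5     # "6th row" (4-clique graphlet), zero-based (assumed layout)


def average_row_by_grade(patients, row):
    """Average one graphlet row of the typed-edge matrix within each grade.

    `patients` is an iterable of (grade, matrix) pairs, where `matrix` is a
    list of rows (one per egocentric graphlet) and each row holds one count
    per edge type.
    """
    sums = defaultdict(lambda: [0.0] * N_EDGE_TYPES)
    counts = defaultdict(int)
    for grade, matrix in patients:
        for t, value in enumerate(matrix[row]):
            sums[grade][t] += value
        counts[grade] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def toy_matrix(star_counts):
    """Build a 7 x 13 matrix that is zero except for the 3-star row."""
    matrix = [[0] * N_EDGE_TYPES for _ in range(7)]
    matrix[STAR3_ROW] = list(star_counts)
    return matrix


# Two made-up grade-4 patients and one made-up grade-1 patient (not study data).
toy_patients = [
    (4, toy_matrix([0, 0, 0, 0, 2] + [0] * 8)),
    (4, toy_matrix([0, 0, 0, 0, 4] + [0] * 8)),
    (1, toy_matrix([0, 0, 0, 0, 1] + [0] * 8)),
]

averages = average_row_by_grade(toy_patients, STAR3_ROW)
print(averages[4][4])  # average T-5 count in the 3-star row, grade 4
print(averages[1][4])  # same quantity for grade 1
```

Passing `CLIQUE4_ROW` instead of `STAR3_ROW` would give the corresponding averages for the 4-clique graphlet; each returned list is one polyline of a parallel coordinates plot.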

Fig 5

Two parallel coordinates plots revealing the association of edge type and pain grade.

(a). Average TyE-GDV of four GCPS grades for 3-star graphlet. (b). Average TyE-GDV of four GCPS grades for 4-clique graphlet.

We then calculate the average counts of the 13 edge types at each pain grade for the 4-clique graphlet, i.e., the 6th row of the Typed-Edge Ego-Graphlet Degree Vector, and the corresponding parallel coordinates plot is given in Fig 5(b). We observe that, in this tightly connected structure, patients with lower pain grades have more edges of type T-5 “friend” than patients with higher pain grades. In other words, friends are better involved in the social networks of patients with lower pain grades than in those of patients with higher pain grades. The importance of friendship is revealed in both the 3-star and 4-clique graphlets. As pointed out by other studies [40, 41], people with severe chronic pain may be more liable to a deterioration of their friend relationships and are in more need of supportive behaviours from friends. Another noticeable difference between patients with lower pain grades and patients with higher pain grades is found in edge type T-9 “colleague”. In contrast to the lower pain grade group, where more than one colleague appears in the clique structures (1.1 on average), colleagues hardly appear in these structures in the higher pain grade group (0.24 on average). This could be a result of the negative consequences that severe chronic pain has on patients’ capacity for work [42]. To provide an intuitive grasp of the edge-type-encoded structural differences between the social networks of patients with different pain grades, we extract two real examples from the dataset as the network prototypes of patients of pain grade-1 and patients of pain grade-4, respectively (Fig 6).

Fig 6

Social network prototypes of patients with GCPS grade-1 and patients with GCPS grade-4.

(a). In the prototype network of patients with pain grade-1, contacts are tightly connected to each other with the appearance of T-5 friend and T-9 colleague. (b). In the prototype network of patients with pain grade-4, contacts are loosely connected with limited links incident to T-5 friend and T-10 healthcare workers.

This experiment demonstrates that the extra edge type information encoded in TyE-GDV provides us with more insights into the association between patients’ perception of pain grade and the type of social ties in their egocentric networks. It thus has implications for improving therapeutic interventions by boosting particular types of social interactions.

6.3 Node classification

We now apply the proposed TyE-GDV and the extended egocentric versions of the colored graphlets (ColoredE-GDV) and heterogeneous graphlets (HeteroE-GDV^N) approaches to a typical machine learning task. Node classification, one of the most popular and extensively explored tasks in network science [43], aims to predict the labels of nodes based on a subset of nodes that have ground-truth labels. Here, our goal is to predict the GCPS grade of patients with chronic pain. In order to evaluate the effectiveness of the proposed approaches, we feed six sets of features into a random forest classifier. The first set comprises the patients’ demographic attributes, pain-related descriptors and their physical and psychological well-being indicators. Since it contains no network-related information, we refer to it as the raw feature set. The second set and the third set add the typed-edge degree vector (TyE-DV) and the traditional graphlet degree vector (GDV), respectively, on top of the raw features. The fourth set combines the raw features with the proposed typed-edge graphlet degree vector (TyE-GDV), and finally, the fifth set and the sixth set add the colored graphlets degree vector (ColoredE-GDV) and the heterogeneous graphlets degree vector (HeteroE-GDV^N), respectively, to the raw feature set. Since the dataset is small and the distribution of the four pain grades is imbalanced (see Section 6.1), we adopt stratified 5-fold cross-validation [44] to evaluate the classification performance with different feature sets. In addition, given the stochastic nature of decision tree-based models, we repeat the above step 500 times and report the mean metric score. Table 3 lists the prediction results for the six models. As this is a multi-class classification task, and the distribution of the four classes is imbalanced, the macro-F1 score is selected as the evaluation metric.
A naive classifier named Stratified is also added to the table (the first row), which simply generates predictions by following the class distribution of the training set. We see clearly that the bottom three approaches that encode type information in graphlets (raw features plus ColoredE-GDV, raw features plus HeteroE-GDV^N, and raw features plus TyE-GDV) perform better than the set of raw features plus TyE-DV and the set of raw features plus GDV. Recall that TyE-DV captures edge type information but only very limited structural information, whereas GDV captures rich structural information but no edge type information. This shows that combining edge type information with rich structural information could lead to more distinctive features in network learning tasks.
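To make the evaluation metric and the naive baseline concrete, the following sketch computes the macro-F1 score (the unweighted mean of per-class F1 scores) and draws predictions in proportion to the training class frequencies. It is an illustration only: the text does not state which implementation was used, the function names are hypothetical, and the labels are made up rather than taken from the study.

```python
import random
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted samples scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_baseline(y_train, n_predictions, rng):
    """Draw labels at random in proportion to their training frequency."""
    freq = Counter(y_train)
    labels = sorted(freq)
    weights = [freq[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n_predictions)


# Made-up, imbalanced grades (1-4); not the study data.
y_train = [1] * 10 + [2] * 30 + [3] * 40 + [4] * 20
y_test = [1, 2, 2, 3, 3, 3, 4, 4]

rng = random.Random(0)
y_pred = stratified_baseline(y_train, len(y_test), rng)
print(macro_f1(y_test, y_pred))

# A small hand-checkable case: F1 is 2/3 for class 1 and 4/5 for class 2.
print(macro_f1([1, 1, 2, 2], [1, 2, 2, 2]))
```

Because every class contributes equally to the mean, a classifier that ignores a rare pain grade is penalised more under macro-F1 than under plain accuracy, which is why the metric suits an imbalanced four-class task.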

Table 3

Result table of node classification, reported in the average macro-F1 score (± standard deviation), the average percentage gain over the raw feature set, and the total running time of 500 repetitions.

	Macro F1 (Mean ± Std)	Gain over raw feat. (Mean)	Time in sec. (Sum)
Stratified	0.248 ± 0.024	—	3
Raw feat.	0.578 ± 0.005	—	116
Raw feat. + TyE-DV	0.600 ± 0.005	3.8%	130
Raw feat. + GDV	0.597 ± 0.008	3.3%	138
Raw feat. + ColoredE-GDV	0.608 ± 0.006	5.2%	2091
Raw feat. + HeteroE-GDV^N	0.638 ± 0.006	10.4%	8230
Raw feat. + TyE-GDV	0.619 ± 0.004	7.1%	252
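
As a consistency check on the gain column, the relative improvement of each mean macro-F1 over the raw feature set can be recomputed from the values in Table 3. This is a sketch under one assumption: the table describes the column as an average gain, whereas the code below takes the gain of the mean scores, so it reproduces the reported figures rather than the authors' exact computation.

```python
# Mean macro-F1 values copied from Table 3.
mean_f1 = {
    "Raw feat. + TyE-DV": 0.600,
    "Raw feat. + GDV": 0.597,
    "Raw feat. + ColoredE-GDV": 0.608,
    "Raw feat. + HeteroE-GDV^N": 0.638,
    "Raw feat. + TyE-GDV": 0.619,
}
RAW_F1 = 0.578


def gain_over_raw(score, raw=RAW_F1):
    """Relative improvement over the raw feature set, in percent."""
    return (score - raw) / raw * 100


gains = {name: round(gain_over_raw(f1), 1) for name, f1 in mean_f1.items()}
for name, gain in gains.items():
    print(f"{name}: {gain}%")
```

Rounded to one decimal place, the recomputed gains agree with the reported column (3.8%, 3.3%, 5.2%, 10.4% and 7.1%).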

We also observe large differences in the running time of those methods. The running times of the set of raw features plus ColoredE-GDV, and especially of the set of raw features plus HeteroE-GDV^N, are many times higher than those of the other methods. This is because our dataset has 13 types of edges and the lengths of the vectors generated by these two methods grow nearly exponentially with the number of edge types. Correspondingly, the machine learning algorithm slows down as the feature vector becomes larger. Table 4 gives the vector lengths of all five approaches. Note that there is no edge type information between alter nodes in many egocentric networks, including this case study dataset. Thus, our implementations of ColoredE-GDV and HeteroE-GDV^N have excluded all the impossible combinations. Overall, the proposed TyE-GDV is able to achieve a competitive performance while maintaining a small vector length.

Table 4

Comparison of vector length of different approaches.

Approach	GDV	TyE-DV	TyE-GDV	ColoredE-GDV	HeteroE-GDV^N
Len. of vector	7	13	91	12367	38870
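
The TyE-GDV length in Table 4 follows from simple arithmetic: one count per pair of egocentric graphlet and edge type. The sketch below checks this and expresses the other lengths relative to TyE-GDV. The lengths of the two combinatorial approaches are taken from the table as reported; their closed-form expressions are not reproduced here, and the variable names are illustrative.

```python
N_EGO_GRAPHLETS = 7   # egocentric graphlets shown in the radar chart (Fig 4)
N_EDGE_TYPES = 13     # edge types in the chronic pain dataset

# Vector lengths as reported in Table 4.
reported_length = {
    "GDV": 7,
    "TyE-DV": 13,
    "TyE-GDV": 91,
    "ColoredE-GDV": 12367,
    "HeteroE-GDV^N": 38870,
}

# One count per (graphlet, edge type) pair.
tye_gdv_length = N_EGO_GRAPHLETS * N_EDGE_TYPES

# How many times longer each vector is than TyE-GDV.
ratio_to_tye_gdv = {
    name: length / tye_gdv_length for name, length in reported_length.items()
}

print(tye_gdv_length)
for name, ratio in ratio_to_tye_gdv.items():
    print(f"{name}: {ratio:.1f}x")
```

On these numbers the ColoredE-GDV and HeteroE-GDV^N vectors are roughly 136 and 427 times longer than the TyE-GDV vector, which is consistent with the much longer running times in Table 3.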

6.4 Limitations and future directions

Here, we describe some limitations of this work and outline how these might be overcome in future studies.

Edge direction. Our current work is limited to undirected networks. To encode edge type information in directed networks, a natural extension of our approach is to apply the notion of directed graphlets [45-47]. Such an approach would be more complex due to the larger number of directed graphlets and node orbits. For example, even without considering bidirectional edges, there are in total 40 directed graphlets and 128 directed node orbits for graphlets of 2 to 4 nodes [45].

Temporal information. The proposed approach is static, i.e., time-independent. To make it suitable for real-world networks in which nodes and edges appear and disappear over time, a potential direction for future work is to study how to encode edge type or node type information in temporal graphlets [48]. With the extra dimension of time, such an extension could be beneficial in predicting the types of future links or nodes [49, 50].

Potential applications. Apart from social networks, the typed-edge graphlets approach could be useful in studying biological networks, especially molecular graphs, where link attributes or bond types are essential information. The proposed approach also holds promise for biological network alignment, which aims to find a node mapping between molecular networks that reveals similar network regions [20, 51]. Moreover, inspired by recent works that include subgraph counting in Graph Neural Networks [52, 53], an interesting avenue is to incorporate the edge-type-enhanced structural information into the message passing scheme of GNNs.

7 Conclusion

In this paper, we propose to encode edge type information in graphlets and introduce a framework for generating the Typed-Edge Graphlets Degree Vector for both sociocentric and egocentric networks. Moreover, we extend the colored graphlets approach and the heterogeneous graphlets approach to edge-typed networks and egocentric networks. By applying the traditional graphlet degree vector and the proposed TyE-GDV to the chronic pain patient dataset, we discover that 1) a patient’s social network structure could inform their perceived pain; and 2) the extra edge type information encoded in TyE-GDV provides more insights into the association between specific social relationships and patients’ perception of pain. We also show that combining rich structural information with edge type information yields a significant improvement in a typical machine learning task that predicts patients’ pain grades. Owing to its simplicity and excellent explainability, we anticipate that the typed-edge graphlets approach will become a standard approach in studying edge-attributed networks and be applied in various tasks.

7 Jul 2022

PONE-D-22-15936

Encoding edge type information in graphlets

PLOS ONE Dear Dr. Jia, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. ============================== ACADEMIC EDITOR: Please address all the concerns raised by the reviewers. ============================== Please submit your revised manuscript by Aug 21 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Lun Hu Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. 
We noticed you have some minor occurrence of overlapping text with the following previous publication(s), which needs to be addressed: - https://link.springer.com/chapter/10.1007/978-3-030-93409-5_43 In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? 
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: ID: PONE-D-22-15936 Title: Encoding edge type information in graphlets Summary: In this work, a framework to embed edge type information in graphlets and generate a Typed-Edge Graphlets Degree Vector (TyE-GDV) is introduced. The manuscript is interesting; however, the following comment should be addressed:

Abstract:
1 – The brief methodology should be given in the abstract.
2 – The obtained results should be included in the abstract. The results should be included in terms of improvement ratio between the proposed and existing works.

Introduction Section:
3 – The structure of the manuscript need to be included at the end of this section.
4 – A related work section should be included in the manuscript.

Background and Preliminaries Section:
5 – This section is fine. No comments.

Typed-Edge Graphlet Degree Vector Section:
6 – More justification is required in this section.

Typed-Edge Degree, Colored Graphlets and Heterogeneous Graphlets Section:
7 – This section is fine. No comments.

Experiments and Analysis Section:
8 – For more readability, more visual examples are required.

Conclusion Section:
8 – The limitation of this work should be clearly included in the conclusion section. Also, future work need to be included in this section.

References:
9 – References need to be updated from literature (2022).

Reviewer #2: The paper introduces Typed-Edge Graphlets Degree Vectors (TyE-GDV) as a tool to analyze egocentric networks with different edge types. The method is applied to pain patient data, both for analysis and for prediction of pain sensation from the patient's social network. The paper is well written and presents interesting ideas and results. I have a few minor comments:

1. While the code is made available on GitHub, the data is not (at least not obviously to be found in the GitHub repository). For means of reproducibility it should be provided.
2. Algorithm 3: The variables t_o_6, t_o_9 etc. used in the algorithm are not defined, neither in the algorithm nor in the text.
3. Lines 135 and 137: It should read "from 0 to |T_e| - 1" and "from 0 to |O| - 1", respectively, to match the examples given within the same sentences.
4. Line 160f: It would be helpful to add a reference to Fig. 2, which shows the 7 orbits of TyE-EGDV.
5. Equations (1) and (2): The derivation of the formulae should be briefly explained, in particular since (1) is claimed to be a correction of a result previously published in [23].
6. Fig. 4: The figure could be drawn as a radar chart, which might be more familiar to the usual reader than a parallel coordinates plot. The difference between the pain grades with respect to orbit types might also be more obvious in a radar chart. For Fig. 5 parallel coordinate plots are probably more appropriate due to the large number of attributes (edge types).
7. References: Starting from ref. [17], page numbers are missing for most references. Also the formatting of reference entries varies, e.g., regarding the capitalization of journal titles. The whole reference section should be thoroughly proofread and amended where necessary.

********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: Yes: Volker Ahlers ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org.
Please note that Supporting Information files do not need this step.

22 Jul 2022

Our detailed response is in the attached rebuttal letter. Submitted filename: Rebuttal_Letter_PlosOne22.pdf

2 Aug 2022

PONE-D-22-15936R1

Encoding edge type information in graphlets

PLOS ONE Dear Dr. Jia, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Sep 16 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Lun Hu Academic Editor PLOS ONE Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. 
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. Additional Editor Comments: Although the quality of this work has been improved a lot after revision, one reviewer still has some minor concerns about the paper, please address them for publication. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: (No Response) Reviewer #2: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: (No Response) ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: (No Response) ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. 
For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified. Reviewer #1: Yes Reviewer #2: (No Response) ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: (No Response) ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: ID: PONE-D-22-15936 R1 Title: Encoding edge type information in graphlets Summary: In this work, a framework to embed edge type information in graphlets and generate a Typed-Edge Graphlets Degree Vector (TyE-GDV) is introduced. The authors have addressed the raised comments; however, one comment need to be addressed. Please check the comments below:

Abstract:
1 – The abstract is fine. No comments.

Introduction Section:
2 – This section is fine. No comments.

Preliminaries Section:
3 – This section is fine. No comments.

Typed-Edge Graphlet Degree Vector Section:
4 – This section is fine. No comments.

Typed-Edge Degree, Colored Graphlets and Heterogeneous Graphlets Section:
5 – This section is fine. No comments.

Experiments and Analysis Section:
6 – This section is fine. No comments.

Related Work Section:
7 – This section need to be moved after the Introduction Section.

Conclusion Section:
8 – This section is fine. No comments.

References:
9 – The references are fine.

Reviewer #2: (No Response)

********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No Reviewer #2: Yes: Volker Ahlers ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

2 Aug 2022

Please find our detailed response in the attached rebuttal letter.

Submitted filename: Rebuttal_letter_R2.pdf

15 Aug 2022

Encoding edge type information in graphlets

PONE-D-22-15936R2

Dear Dr. Jia,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Lun Hu
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

All reviewers agreed that this work can now be accepted.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: (No Response)

**********

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1:

ID: PONE-D-22-15936 R2

Title: Encoding edge type information in graphlets

Summary: In this work, a framework to embed edge type information in graphlets and generate a Typed-Edge Graphlets Degree Vector (TyE-GDV) is introduced. The authors have addressed all the raised comments.

Abstract: 1 – The abstract is fine. No comments.
Introduction Section: 2 – This section is fine. No comments.
Related Work Section: 3 – This section is fine. No comments.
Preliminaries Section: 4 – This section is fine. No comments.
Typed-Edge Graphlet Degree Vector Section: 5 – This section is fine. No comments.
Typed-Edge Degree, Colored Graphlets and Heterogeneous Graphlets Section: 6 – This section is fine. No comments.
Experiments and Analysis Section: 7 – This section is fine. No comments.
Conclusion Section: 8 – This section is fine. No comments.
References: 9 – The references are fine.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: Yes: Volker Ahlers

**********

17 Aug 2022

PONE-D-22-15936R2

Encoding edge type information in graphlets

Dear Dr. Jia:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Lun Hu
Academic Editor
PLOS ONE

19 in total

1. Network science.

Authors: Albert-László Barabási
Journal: Philos Trans A Math Phys Eng Sci Date: 2013-02-18 Impact factor: 4.226

2. Well-being and Perceived Stigma in Individuals With Rheumatoid Arthritis and Fibromyalgia: A Daily Diary Study.

Authors: Maité Van Alboom; Lies De Ruddere; Sara Kindt; Tom Loeys; Dimitri Van Ryckeghem; Piet Bracke; Manasi M Mittinty; Liesbet Goubert
Journal: Clin J Pain Date: 2021-05-01 Impact factor: 3.442

3. Measuring social health in the patient-reported outcomes measurement information system (PROMIS): item bank development and testing.

Authors: Elizabeth A Hahn; Robert F Devellis; Rita K Bode; Sofia F Garcia; Liana D Castel; Susan V Eisen; Hayden B Bosworth; Allen W Heinemann; Nan Rothrock; David Cella
Journal: Qual Life Res Date: 2010-04-25 Impact factor: 4.147

Approach	Time complexity	Space complexity
Colored-GDV [20]	$O(|V| \cdot k_{max}^{S-1})$	$O(|V| \cdot |O| \cdot 2^{|T_e|})$
Hetero-GDV^L [21]	$O(|E| \cdot k_{max}^{S-2})$	$O(|E| \cdot |O_e| \cdot C_{|T_e|+K-1}^{K})$
HeteroE-GDV^N	$O(|V| \cdot k_{max}^{S-1})$	$O(|V| \cdot |O| \cdot C_{|T_e|+K-1}^{K})$
TyE-GDV	$O(|V| \cdot k_{max}^{S-1})$	$O(|V| \cdot |O| \cdot |T_e|)$
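
To make the space-complexity column of the table above concrete, the following minimal Python sketch evaluates the dominant term of each node-based approach for one illustrative setting. It rests on assumptions not stated in the table: the combinatorial factor is read as the number of multisets of K edge types, i.e. the binomial coefficient C(|Te|+K-1, K); K is taken to be the number of edges in a graphlet; and the function name and input values are hypothetical. The results are leading terms of the bounds, not measured memory usage.

```python
from math import comb


def space_terms(num_nodes, num_orbits, num_edge_types, k):
    """Leading space-complexity terms from the table (constants ignored).

    k is assumed to be the number of edges in a graphlet; the
    heterogeneous factor is assumed to be C(|Te| + k - 1, k).
    """
    return {
        # |V| * |O| * 2^|Te|
        "Colored-GDV": num_nodes * num_orbits * 2 ** num_edge_types,
        # |V| * |O| * C(|Te| + k - 1, k)
        "HeteroE-GDV^N": num_nodes * num_orbits
        * comb(num_edge_types + k - 1, k),
        # |V| * |O| * |Te|
        "TyE-GDV": num_nodes * num_orbits * num_edge_types,
    }


# Illustrative values only: 1000 nodes, 15 orbits, 6 edge types, k = 3.
terms = space_terms(num_nodes=1000, num_orbits=15, num_edge_types=6, k=3)
for approach, value in terms.items():
    print(f"{approach}: {value}")
```

Under these illustrative inputs the TyE-GDV term grows linearly in the number of edge types, whereas the colored term grows exponentially and the heterogeneous term grows combinatorially, which is the contrast the table expresses.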