
Hashing-based semantic relevance attributed knowledge graph embedding enhancement for deep probabilistic recommendation.

Nasrullah Khan¹,², Zongmin Ma¹,³, Li Yan¹, Aman Ullah⁴.

Abstract

Knowledge graph embedding (KGE) is effectively exploited to provide precise and accurate recommendations from many perspectives in different application scenarios. However, methods that utilize the entire embedded Knowledge Graph (KG) without applying information-relevance constraints fail to stop noise from penetrating the underlying information. Moreover, high computational time complexity is a CPU overhead in KG-enhanced systems and applications. These limitations significantly degrade recommendation performance. To cope with these challenges, we propose a novel Knowledge Graph Embedding Enhancement (KGEE) approach, Hashing-based Semantic-relevance Attributed Graph-embedding Enhancement (H-SAGE), which models semantically relevant higher-order entities and relations into unique meta-paths. For this purpose, we introduce the Node Relevance-based Guided-walk (NRG) modeling technique. Further, to reduce the computational time complexity, we convert the relevant information to hash-codes and propose the Deep-Probabilistic (dProb) technique to place hash-codes in the relevant hash-buckets. We also use dProb to generate guided function calls that maximize the likelihood of Hash-Hits in the hash-buckets. In case of a Hash-Miss, we apply Locality Sensitive (LS) hashing to retrieve the required information. We performed experiments on three benchmark datasets and compared the empirical and computational performance of H-SAGE with the baseline approaches. The results and comparisons demonstrate that the proposed approach outperforms the state-of-the-art methods in both facets of evaluation.


Keywords: DNN; Hashing; Information relevance; KGEE; Knowledge graph; Recommendation

Year: 2022 PMID: 35572050 PMCID: PMC9075930 DOI: 10.1007/s10489-022-03235-7

Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.086

Introduction

With the current burst of data overload on the web, choosing feasible options from the bulk of analogous choices is challenging. Personalized recommendation is the process of streamlining this online data overload and suggesting relevant options to end users to satisfy their needs and interests. Recommender Systems (RS) exploit information about individuals and the people connected to them to suggest items of potential interest [1, 2]. Previously, interaction information was acquired from users’ earlier interaction logs on the web, and preferences reflecting their hidden interests were generated via collaborative filtering (CF)-based techniques [3], the most common and widely utilized recommendation algorithms [4]. Although these methods attracted great attention, their performance was significantly affected by the issues of data sparsity, gray sheep and cold start [5, 6]. To cope with these challenges, researchers began to feed side information acquired from different knowledge resources into recommender systems. Although recommendation methods exploit many types of side information, KG-based side information has gained particular attention and popularity. It describes information via entities and the relations among them, and is easy to manipulate, process and understand. The information structure of a Heterogeneous Information Graph (HIG) can easily be modelled as feature vectors [7, 8], an idea strongly supported by combining language-modeling techniques with graph theory. Previously, inter-connections among words in documents (analogous to relations among nodes in a KG) were introduced via pairwise interaction in the vector space. Similarly, based on the distance among vectors in Euclidean space, NLP research defined distance-based language modelling to represent the structured embedding of triplets.
Moreover, translation-based models represent relations among entities as translations in the vector space and assign preferential values to entities based on the distance among them via a distance function. NLP-based language models that support random-walk strategies over the highlighted “words in the documents” up to the 1st-order relation suggested the possibility of a sequential flow among nodes via relations. Later, random walks “in the documents” up to the 2nd-order relation were developed, which introduced the concept of a walk over nodes in a KG. To efficiently exploit KG-based side information in systems and application domains, entire KGs are embedded into a low-dimensional Euclidean space, a process termed KGE. It is essentially a vector representation of nodes and paths that preserves the semantics and topology of the KG data in the embedded state. KGE contributes significantly to RS by providing data in a readily interlinked form, i.e., KG-based side information [9, 10]. RS utilize KG-based side information to provide personalized recommendations based on item relations, user-item interactions and user feedback. Although “paths” are information channels among nodes in the KG that support reasoning for recommendation, they suffer from constraints and limitations in implementation. To demonstrate the scenario, we depict a small example from KG-based news recommendation, as shown in Fig. 1. Let us consider the path “P1: Babar → Kamal → eco-News → News Publisher → Covid News → Pandemic → World’s Economy”. A length constraint LC has to be imposed on “P1”, which results in the truncation of nodes exceeding the limit LC. For instance, “→ Covid News → Pandemic → World’s Economy” will be discarded from “P1” if LC = 3.
Moreover, linear paths fail to reflect the graph structure in information aggregation, and processing all of the outgoing paths from “Babar” without regard to information relevance incurs CPU overhead. Similarly, current methods that aggregate information from the neighbors of entities without applying information-relevance constraints aggregate noise along with information. Let us see how noise penetrates the information being aggregated: if everything in the part of the KG under consideration is irrelevant except the meta-path “Kamal → eco-News → Pandemic → US’s Economy”, then all other aggregated information coming in and going out via the node “Pandemic” is noise.
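The effect of the length constraint on “P1” can be illustrated with a minimal Python sketch. The function name `truncate_path` and the convention that LC counts hops from the start node are assumptions made for illustration; under that convention, LC = 3 drops the same three trailing nodes as in the example above.

```python
def truncate_path(path, lc):
    """Keep the start node plus at most `lc` hops; return (kept, discarded)."""
    if lc < 0:
        raise ValueError("length constraint must be non-negative")
    cut = lc + 1  # lc hops connect lc + 1 nodes
    return path[:cut], path[cut:]


P1 = ["Babar", "Kamal", "eco-News", "News Publisher",
      "Covid News", "Pandemic", "World's Economy"]

kept, discarded = truncate_path(P1, lc=3)
print("kept:     ", " -> ".join(kept))
print("discarded:", " -> ".join(discarded))
```

Everything in `discarded` is lost to the recommender regardless of how relevant it is, which is the limitation the length constraint introduces.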

Fig. 1

News recommendation, an exemplary scenario: “Babar” interacts with “Aljazeera News” and “Afghanistan News”, whereas “Kamal” visited “eco-news” published by “Bing-News”. In response, “Covid-News” and “Economy News” are recommended to both of them.

To deal effectively with the research challenges discussed above, we propose a novel hashing-based semantic relevance attributed Knowledge Graph Embedding Enhancement (KGEE) approach for effective recommendations. The framework contains the Relevance-based Influential-graph Construction (RIG) and Hashing-based Recommendation (HR) modules. The data (i.e., the raw KG and user interaction logs) is given as input to the RIG module. RIG captures higher-order semantically relevant entity information via the connecting relations, and HR transforms the corresponding information to the Hamming space to generate recommendations. To the best of our knowledge, KGEE is the first contribution to hashing-based recommendation in the scope of heterogeneous data. The experimental work and theoretical comparison confirm that H-SAGE outperforms the state-of-the-art methods with significant improvements. The key contributions of this study are given below:

(i) We propose Node Relevance-based Guided-walk (NRG), a path-modeling technique, to highlight Unique Meta Paths (UMPs) based on semantically relevant entities in the KG. NRG works like the spreading effect of an influential graph.

(ii) We convert UMPs to hash-codes and place them in semantically relevant hash-buckets based on their mutual likelihood, to maximize the possibility of Hash-Hits via function calls. In case of a Hash-Miss, we use LS-hashing to acquire the required hash-codes around the location of the miss, up to a maximum of 3 × 3 hash-indices. Upon success, we return the required hash-codes to the function; otherwise we return 0.

(iii) We exploit a predictive presentation interface that collects the retrieved hash-codes, processes them, calculates the potential preferences and generates the formal recommendation responses.

(iv) We performed extensive experiments on three real-world benchmark datasets to assess and analyze the performance of H-SAGE in comparison with the baseline methods. The experimental results and theoretical comparison demonstrate that the proposed approach outperforms the baseline methodologies.

The rest of this paper is organized as follows: Section 2 covers the Related Works, Section 3 describes the Preliminaries, Section 4 presents the Proposed Methodology, Section 5 gives the experimental, empirical and theoretical comparative analysis, and Section 6 concludes the paper with the Conclusion and Future Work.

Related works

Although KG-based recommendation currently encompasses a wide variety of methods, we broadly categorize the relevant work into three classes according to implementation technique, i.e., path-based, embedding and propagation-based, and hashing-based recommendation methods.

Path-based recommendation

Path-based methods exploit the similarity of earlier interaction sequences to provide new recommendations. For example, PER [8], an HIG-based entity recommendation approach, learned user-to-item and item-to-item relations through feature extraction with a meta-path-based walk. It introduced hidden features via meta-paths to capture the relations among entities (i.e., users and items) wrt different connectivity types in the data, and modelled the HIG-based data differently for different users to provide quality recommendations based on implicit user-to-item interactions. SemRec [11] introduced the concept of a weighted information structure (i.e., HIG and meta-paths) to express the semantics of relations by differentiating the values of different link attributes. Further, it predicted user-to-item potential preferences (i.e., rating scores) via its semantic meta-path-based recommender system. FMG [12] calculated similarities of path sequences between users and items in the KG wrt the flow of previous interactions and generated new recommendations. Similarly, MCRec [13] utilized meta-path-based random connectivity processing to obtain representations of the user-to-item context. It combined a Deep Neural Network (DNN) with an Attention Mechanism (AM) to exploit the rich information context of the HIG through meta-paths for top-N recommendation, and applied a priority-guided path sampling technique to choose quality path samples from the context to construct meta-paths. KGR [14] analyzed a reader’s exploration history to model the reader’s research interest and extracted the required material from different research proposals. KGR considered the distance between the reader’s research interest and the required information as the knowledge gap or path. Correspondingly, HERec [15] learned entity embedding representations via a meta-path-based random-walk sampling technique.
Although these methods attained significant performance and popularity in the context of path matching, their major drawback is full dependence on manual or random selection of the meta-paths. As a solution, RKGE [16] learned semantic representations of users, items and the relations among them to forecast users’ preferences towards items of potential interest. It applied a Recurrent Neural Network (RNN) to sample the semantic context of the paths connecting the same or identical entity pairs. Similarly, KPRN [17] exploited neural networks to automatically mine the required meta-paths of a defined length. However, capturing the user-to-item graph structure via independent, length-limited and linear meta-paths wastes a notable amount of information, and computing static path-based similarities to retrieve the potential preferences incurs unwanted CPU overhead.

Embedding and propagation-based recommendation

KG-based information is embedded into a low-dimensional vector space via translation techniques, enriched by mapping it to the relevant entities in external Knowledge Bases (KBs), and processed by recommendation techniques. For instance, CKE [18] combined the CF technique with item-based side information in a Bayesian network to acquire the semantic embeddings of items via TransR. KSR [19] accomplished sequential recommendation via TransE, and DKN [20] acquired entity and relation embeddings based on KG feature learning via TransD. Similarly, RCF [21] used DistMult and an attention mechanism to access item-to-item relations of various types for preference computation. It proposed that both the relation type (i.e., Alan Turing) and the relation value (i.e., Turing Machine) are significant for recommendation and greatly affect preference calculation. KTUP [22] used TransH to jointly learn the recommendation and KG completion modules without preserving the semantic connections among the data instances. RippleNet [23] propagated users’ historical interactions with items over the graph and aggregated the potential preferences of users for unseen items of interest from the propagated information. KGCN [24] incorporated the information of items’ neighbors into the neural network to learn item embeddings with a GCN via propagation and calculate user preferences. Correspondingly, KGAT [25] enriched the embeddings of users and items and recursively performed information propagation over the graph to enhance performance; and AKGE [26] applied a Euclidean-distance-based similarity technique to filter out the irrelevant information, constructed a local subgraph with relevant entities, and performed relation-aware propagation over the network to enhance performance.
Moreover, NACF [27] also identified the potential interests of users via preference propagation based on the information collected from the neighbors of the entities. DKEN [28] highlighted the impact of information exchange between the implicit interactions in user-to-item data and the explicit semantics in KG features to better capture the semantic and hierarchical structure of information flow in the graph. Regularization-based approaches utilized the graph structure of the underlying data to acquire entity representations via regularization terms, and unification-based approaches combined the regularization terms with path-based methods to enhance recommendation performance. Although the majority of these methods attracted great attention due to their strong performance, handling the semantic and hierarchical structure of relevant KG-based information effectively and without noise remains a challenge for recommendation algorithms.

Hashing-based recommendation

Hashing techniques are exploited to reduce the computational time complexity of algorithmic processes in different application scenarios [29]. Currently, to make data management more efficient, KG-based information processing is being incorporated in many domains of computational intelligence. Although a KG offers a powerful scheme for information presentation and management, it incurs quadratic or higher computational time complexity in normal cases. KG-based side information is used for recommendation generation in particular, and hashing-enhanced techniques are used to reduce the computational time complexity of recommendation as well. In this section, therefore, we list the relevant approaches that exploited hashing techniques in recommendation and information retrieval to lighten the computation. For instance, [30] proposed a semantic hashing method for query- and document-based textual data to retrieve the required information, and [31] proposed a hash-graph-kernel-based approach to extract protein-to-protein interactions from the context of node neighborhoods in the graph. Similarly, HashNet [32] proposed a deep-learning-based approach that uses hashing by continuation with convergence to learn exact binary codes from imbalanced experimental data. Moreover, [33] introduced KG2Rec to exploit Locality-sensitive Hashing [34] through CF algorithms for KG-enhanced recommendation. NeuHash-CF [35] proposed content-based neural hashing via CF algorithms to address the cold-start limitation in recommendation. RSLH [36] tried to reduce the length of hash-codes to the minimum applicable extent by exploiting reinforcement learning. Also, HashGNN [37] proposed a GNN-based deep-hashing approach to accomplish recommendation with KG.
Last but not least, HALF [29] applied a search-oriented KGE technique via a hash-learning framework to improve the time complexity of computation. The hashing-based approaches discussed above improved computational time complexity in the circumstances where they were applied, but they overlooked the issue of noise in the experimental data. They also did not apply any information-relevance check or noise-filtration constraint to validate the suitability of the underlying data. In contrast, our main objectives are to keep irrelevant data out of the underlying information (i.e., experimental datasets) and to perform KG-enhanced recommendation at a lower computational cost. For this purpose, we propose a novel hashing-based semantic relevance attributed KGEE approach for computationally lighter recommendation over optimally noise-free KG data.

Preliminaries

In this section, we define primary notations, recommendation task and some of the basic concepts.

Notations

The KG contains a set of entities, a set of users, a set of items, and a set of relations. Entities belong to the entity matrix E as e_j ∈ E ∣ j = 1, 2, …, k, and relations belong to the relation matrix R as r_j ∈ R ∣ j = 1, 2, …, k, given that e_j = [n_ij] ∈ {±1} and r_j = [p_ij] ∈ {±1}, where the correspondence of the ith instance with the jth entity or jth relation represents the node n_ij or path p_ij, respectively, in the μth dimension of the feature vector. The interaction among entities is recorded as 1 or 0 in the interaction matrices: if there exists an interaction, or the ith instance is similar to the jth entity or jth relation, the entry n_ij = 1 or p_ij = 1, respectively, and 0 otherwise. Each node n also has an associated set of neighbors. Moreover, anchor points γ_i belong to the training set X and are placed in the anchor matrix A as γ_i ∈ A ∣ i = 1, 2, …, υ, where the items of A are randomly selected from X. Similarly, h_i belongs to the hash matrix H ∣ H ∈ {±1} as h_i ∈ H ∣ i = 1, 2, …, l; ‖H‖₂ is the normalization of H, and H⊤ is the transpose of H. Thus, H_E and H_R are the hash matrices of the corresponding E and R matrices respectively.

Problem description

This approach is intended to produce a list of top-K recommendations out of a large pool of candidate items. It takes the raw data and the interaction log as inputs, exploits the relevance factor to filter out the irrelevant data, and utilizes the relevant information to construct the influential subgraph. The influential subgraph is transformed to hash-codes to forecast the probability that user u selects item v via a prediction function that predicts the true likelihood among the entities of the unseen triplets (e, r, e), where ⨂ is the set of environment parameters, e is an entity and CH represents the hash-codes. The outcome of the prediction function, sorted in descending order, gives the list of top-K recommendations.
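The final ranking step can be sketched in a few lines of Python. This is only an illustration of sorting predicted scores in descending order and keeping the first K; the helper name `top_k`, the item names and the scores are hypothetical, and the prediction function that produces the scores is not modelled here.

```python
def top_k(scores, k):
    """Return the k candidate items with the highest predicted scores, best first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical predicted probabilities that a user selects each candidate item.
predicted = {
    "Covid-News": 0.91,
    "Sports News": 0.12,
    "Economy News": 0.84,
    "Afghanistan News": 0.40,
}

print(top_k(predicted, 2))
```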

Definitions

In this section, we define some technical terminologies that are to be utilized in implementation or comparison.

Hash function

A function f is hash function f if where Υ = {−1, 1} satisfies the polynomial bounded predetermined constraint α : Γ → Γ ∣ α(μ) < μ, ∀ μ ∈ Γ, and belongs to the infinite family of functions. A f is collision-proof if it map information instances from a non-fixed bulk of triplets into a fixed set of independent data entities with an expected ideal hit-ratio of A ← f(B) ∣ f(A) = A optimally for all cases, but it is NP-hard task.

Information hashing

Information hashing is the process of transforming data instances to the Hamming space {−1, 1}^μ by learning a non-linear mapping f : y ⟶ h ∈ {−1, 1}^μ that transforms each entity into a μ-bit binary hash-code h = f(y), such that the relevance among the transformed entities is preserved in the hash-codes.
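A minimal NumPy sketch of the idea follows. It replaces the learned non-linear mapping f with a plain sign function, which is an assumption made purely for illustration, and the helper names are hypothetical. It shows that a slightly perturbed embedding stays close in Hamming distance while a negated embedding ends up at the maximum distance.

```python
import numpy as np


def to_hash_codes(y):
    """Map a real-valued embedding to a {-1, +1} code (values >= 0 become +1)."""
    return np.where(np.asarray(y) >= 0, 1, -1)


def hamming_distance(a, b):
    """Number of bit positions in which two codes differ."""
    return int(np.sum(a != b))


rng = np.random.default_rng(0)
base = rng.normal(size=16)                # a 16-dimensional embedding
near = base + 0.05 * rng.normal(size=16)  # a slightly perturbed neighbour
far = -base                               # the opposite direction

h_base, h_near, h_far = map(to_hash_codes, (base, near, far))
print("near:", hamming_distance(h_base, h_near))
print("far: ", hamming_distance(h_base, h_far))
```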

Hamming-space constraints

Hamming space imposes un-correlation and balance as validation constraints on the bits of the transformed information. Un-correlation and bit-balance refer to the sparseness of the vector’s dimensions and the equal probability of a hit or miss, respectively. Specifically, un-correlation means that no row may be correlated with any other row of the corresponding matrix, and bit-balance means that each bit takes either of its two values with equal probability.
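Both constraints can be checked numerically. The sketch below assumes a code matrix H with one sample per row and one bit per column, which is a layout chosen for illustration (transpose H if bits are stored as rows); the function names are hypothetical. Balance is measured by the column means and un-correlation by the off-diagonal entries of HᵀH / n.

```python
import numpy as np


def balance_violation(H):
    """Largest absolute column mean; 0 means each bit is +1 for exactly half the samples."""
    return float(np.max(np.abs(H.mean(axis=0))))


def correlation_violation(H):
    """Largest absolute off-diagonal entry of H^T H / n; 0 means the bits are uncorrelated."""
    n = H.shape[0]
    gram = H.T @ H / n
    off_diagonal = gram - np.diag(np.diag(gram))
    return float(np.max(np.abs(off_diagonal)))


# Four samples, two bits each.
H_good = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
H_bad = np.array([[1, 1], [1, 1], [-1, -1], [1, 1]])  # skewed and duplicated bits

for name, H in (("good", H_good), ("bad", H_bad)):
    print(name, balance_violation(H), correlation_violation(H))
```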

Locality sensitive (LS) hashing

The LS function depends on the distance between two consecutive data-points, and the probability of collision of the data-points is inversely proportional to their mutual distance. LS hashing belongs to the family of hash functions f that map a location α from the testing information instances to retrieve a location β from the z-th hash-bucket (HB) in the bucket-space Ω, written LS = {z : α → β}. Formally, retrieving β from HB via a function call f is governed by a probabilistic condition, in which P[·] is a probability function, the search region is a long rectangular box having β as the target location of the projected location α, and the number of possible attempts bounds the search for the required location in the case of a miss. Moreover, α ≃ β is the assumption of a hit, α1 and α2 are constants, h = 0, and 0 < α1 < α2 < 1. The probability of collision of α and β decreases with each increase in the distance between them in Ω.
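The distance-dependent collision behaviour can be demonstrated with random-hyperplane hashing, a standard locality-sensitive family for angular distance. Using this particular family is an assumption for illustration, since the text does not fix one, and the helper names are hypothetical. For a single random hyperplane, two vectors at angle θ fall on the same side with probability 1 − θ/π, so the empirical collision rate should fall as the angle grows.

```python
import numpy as np


def collision_rate(a, b, n_planes=4000, seed=0):
    """Fraction of random hyperplanes that place a and b on the same side."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_planes, a.shape[0]))
    return float(np.mean(np.sign(planes @ a) == np.sign(planes @ b)))


def unit(angle_deg):
    """Unit vector in the plane at the given angle from the x-axis."""
    t = np.deg2rad(angle_deg)
    return np.array([np.cos(t), np.sin(t)])


alpha = unit(0.0)
for angle in (10.0, 60.0, 120.0):
    beta = unit(angle)
    observed = collision_rate(alpha, beta)
    expected = 1.0 - angle / 180.0
    print(f"angle={angle:5.1f}  observed={observed:.3f}  expected={expected:.3f}")
```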

Knowledge graph and Meta-path

A knowledge graph is a graph G = (E, P, Φ, Ψ), where E represents the set of entities, P the set of paths between entities, Φ the entity-type mapping function, and Ψ the path-type mapping function. Each entity e ∈ E is thus mapped to a specific entity type, and each path p ∈ P is mapped to its path type. If there is more than one entity type or more than one path type, then G is heterogeneous. A meta-path ℘ is a finite sequence of paths p between consecutive nodes in G. For instance, a meta-path between two nodes that is written as ℘ = p1 ∘ p2 ∘ p3 passes through the intermediate nodes by following p1, p2 and p3 in turn, where ∘ is the composition operator.
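These definitions map naturally onto a small data structure. The Python sketch below is illustrative only: the `Path` class, the relation names and the entity types are hypothetical, loosely following the news example of Fig. 1. It checks the heterogeneity condition and composes three single-hop paths into one meta-path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    head: str
    relation: str  # the path type assigned by the path-type mapping
    tail: str


# Entity-type mapping (hypothetical types).
entity_type = {
    "Kamal": "user",
    "eco-News": "news",
    "Bing-News": "publisher",
    "Covid-News": "news",
}

p1 = Path("Kamal", "visited", "eco-News")
p2 = Path("eco-News", "published_by", "Bing-News")
p3 = Path("Bing-News", "publishes", "Covid-News")


def is_heterogeneous(entity_type, paths):
    """True if there is more than one entity type or more than one path type."""
    return len(set(entity_type.values())) > 1 or len({p.relation for p in paths}) > 1


def compose(*hops):
    """Compose consecutive single-hop paths into a meta-path, given as its node sequence."""
    for left, right in zip(hops, hops[1:]):
        if left.tail != right.head:
            raise ValueError(f"{left} and {right} are not consecutive")
    return [hops[0].head] + [hop.tail for hop in hops]


print(is_heterogeneous(entity_type, [p1, p2, p3]))
print(" -> ".join(compose(p1, p2, p3)))
```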

Interaction log and session

An interaction log is a timespan-based series of user interactions preserved through user profiling [38] and stored in tracks. The log contains a set of items, a set of interactions (viewed, rated, purchased, etc.), a set of sessions, and a set of timespans T = {t1, t2, …}. For an arbitrary session, T is either implicitly retrieved from the sessions or explicitly collected from the interactions.

The proposed methodology

This section describes the Pre-Processing, Hashing and Presentation modules. We purify the datasets, construct the influential graph (a local subgraph), perform hashing, and compile and display the recommendation outcomes.

Pre-processing module

Data Embedding, Dimension Rationalization, Data Purification and subgraph Construction are described in this module.

Data embedding

We used TransD [39] to embed entities and relations to the independent vector spaces and preserved their isolation through the following precise matrices Where h, p, t denote a triplet, and and show the embedded information; entities belong to ℝ and relations to ℝ, represents the identity matrix having 1 s on diagonal and 0 s elsewhere, and and are used to project the vectors of head and tail entities to concerned relation-spaces respectively. Subsequently, the plausibility factor f of a given triplet (n, p, n) is defined as
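For reference, the standard TransD formulation builds a mapping matrix for each entity-relation pair from two projection vectors and scores a triplet by the distance between the projected head plus the relation and the projected tail. The NumPy sketch below follows that standard formulation; the function and variable names are illustrative assumptions and it is not the authors' implementation. Entities live in ℝⁿ, relations in ℝᵐ, and the mapping matrix is r_p e_pᵀ + I of size m × n.

```python
import numpy as np


def project(entity, entity_proj, relation_proj):
    """TransD projection (r_p e_p^T + I^{m x n}) e, mapping an entity from R^n to R^m."""
    m, n = relation_proj.shape[0], entity.shape[0]
    mapping = np.outer(relation_proj, entity_proj) + np.eye(m, n)
    return mapping @ entity


def plausibility(h, h_p, r, r_p, t, t_p):
    """Negative squared distance between projected head + relation and projected tail."""
    h_perp = project(h, h_p, r_p)
    t_perp = project(t, t_p, r_p)
    return -float(np.sum((h_perp + r - t_perp) ** 2))


rng = np.random.default_rng(1)
n, m = 4, 3                                   # entity and relation dimensions
h, t, h_p, t_p = (rng.normal(size=n) for _ in range(4))
r_p = rng.normal(size=m)

r_true = project(t, t_p, r_p) - project(h, h_p, r_p)  # relation that fits exactly
r_other = r_true + 1.0                                # a relation that does not fit

print(plausibility(h, h_p, r_true, r_p, t, t_p))
print(plausibility(h, h_p, r_other, r_p, t, t_p))
```

A triplet that holds exactly scores 0, and the score becomes more negative as the translation fits worse.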

Dimension rationalization

Although placing identical information instances in identical vector spaces is important for efficient computation, the initial embeddings span very many vector dimensions, and filtering out the less relevant instances is a possible solution [40]. Therefore, to filter out the less relevant data instances, we introduce a relevance factor Rel constrained by the threshold ϑ. Rel evaluates the extent of relevance between the current node and a candidate node to decide whether the candidate should be discarded. If the candidate’s Rel satisfies ϑ, it is assigned 1, and 0 otherwise. All entries with 1 are orthogonally stored via Eq. (2) or (3), and the rest of the entries are discarded and updated with 0 according to Lemma 1, based on their relative relevance to the current node. The transition matrix is therefore computed linearly, because it no longer depends on the previous entries.

Lemma 1

Proof: Eq. (2) and (3) represent ordinary matrices that can be operated wrt different matrices for storage or transformation. Therefore, we transform the information instances n to the orthogonal basis of orthogonal basis matrix to maintain a sequential flow among them. Let’s recall from the basic matrix theory, for each i of , the first i instances n form orthogonal base from [k, l] ∣ k = l = 0 to [K, L] wrt the winding instance of {n}, is expressed as by increasing [k, l] by 1 with each iteration; where k and l increase with an increase in row and column respectively. The highest index value of k and l in the matrix depends upon the row-head index of k. Mutually relevant data instances are orthogonally co-related in nature and present a unit matrix k × l in the Euclidian embedding space. Henceforth, i + 1 instances (i.e., n) creates cosine angles with i instances n from [k + 2, l + 2] to [K, L] wrt {n} as sim = cos(n, n). Since, the underlying instances have unit length; thus, their cosine is equal to their dot product as cos(n, n) = n · n, by putting , ∀l > i, we formulate the expression of similarity between two instances as , where is the first instance of i, and thus . As, we already mentioned that the instances have unit lengths, thus, the length of n is equal to 1. Thus, . Since, for index [1, 1] in Eq. (5), is the first instance of n wrt i, and hence . Let us consider for all l > k + 1 and α = k = l, then from ; for the previous instance of k, we have ω = 1 as Hence it is proved that for all instances, ω effectively describes the coordinates of i against l wrt {n}. On substitution of 1 against k = l in Eq. (5), we have , , and so on. Further, we also described the raw iteration set wrt all of the related entries in corresponding matrix as shown in Eq. (5).

Data purification and influential-graph construction

We introduce the Node Relevance-based Guided-walk (NRG) modeling technique to refine the data by determining, highlighting and maintaining the prominent paths in the embedded space. Although a large body of work exists in the literature [14, 15] on meta-path-based random-crawl sampling techniques, these techniques face several limitations that significantly degrade recommendation performance. Notable examples are path-length constraints, the inability to capture the graph structure effectively due to linear paths, and difficulties in optimized path selection. The major drawback, however, is that no constraint is applied on information relevance: all of the data is treated as part of the underlying information, which results in performance degradation and unwanted computational overhead. Therefore, we apply the NRG modeling technique to determine unique meta-paths that portray semantically relevant information and use them to construct the influential graph. The modeling mechanism of NRG is similar to the spreading effect of a pandemic. The main objectives of the NRG modeling technique are: (i) determining semantically relevant nodes, (ii) maintaining single-hop prominent paths, (iii) unifying single-hop prominent paths to create meta-paths, (iv) identifying the meta-paths via unique identifiers (IDs), (v) interlinking the meta-paths wrt their IDs to create the influential graph, and (vi) discarding the irrelevant data. Using Eq. (7) and (8), we randomly choose a node as the central node, term it the current node, calculate the relevance between the current node and the candidate nodes n_j ∣ j = 1, 2, 3, …, and select the candidate with the highest similarity to the current node as the target node for the next step of the walker. For the second hop, the target node becomes the new current node and the process is repeated to get the next relevant node, and so on.
After completing a 5-hops long meta-path and assigning it with the identifying ID, e.g., ℘1, we come back to the initial point, i.e., 1st-hop, and consider the node having the second highest relevance with n, and repeated the process for the next meta-path. After evaluating all of the neighbors of , we considered the highest-relevant neighbor of the as the next and repeated the process. Iteratively, the model performed the same technique for all of the nodes, and subsequently, the meta-paths were stored in an indexed track wrt their assigned IDs, as described in Algorithm 1. Finally, the meta-paths are combined based on their IDs to construct the local subgraph. We calculated the relevance between the corresponding nodes in two steps of comparison and sum-up the outcomes; (1) tendency between n and n, (2) Local similarity between n and n, and the similarity between n and the information on the path going from n to n. For (1), we calculated the factor of similarity-based node-to-node tendency to predict the next node, as: Where f(·) shows the function that calculates the tendency between the concerned nodes, defined as: Where x ∣ 0 < x < 1 and ten represent the co-efficient and parameter of tendency; and show the recent and previous user-interactions, respectively. In the case of tendency, the ten factor greatly depends upon the input of . In case of (2), for (n, n) and (n, n, p), we apply Adamic and Jaccard (AJ) similarities, respectively: Where represents the ratio of common to all neighbors of the candidate node. To get the required relevance between n and n, we summed up Eq. (7) and (8) into (10) as, Where Rel describes the extent of the relevance between the current and the candidate nodes. The greater is the value of Rel(n, n), the higher is the probability of n to become the next step of the crawler. The nodes are interlinked in based on their relevance factor with each other in the graph. 
For instance, a current node n is linked to its neighbors , if fulfills the applied relevance constraints on inclusion to as and f(x) is defined as: Where 1 shows the existence of influential connection between neighbors, 0 represents no connection, ϑ is a threshold and its value is 30% in this case, and defines generalization expression for Rel as Where is the first-order influential graph, ∧ is the sequential concatenation operator, n is the next node that is linked to n, and n is the node that is not-connected to n.
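The walk described above can be sketched in Python under explicit simplifications. The relevance score here is only the sum of Adamic-Adar and Jaccard neighbourhood similarities; the tendency term derived from interaction logs and the exact equations of the paper are not reproduced. Applying the 30% threshold relative to the best first-hop candidate, as well as all function names and the toy graph, are assumptions made for illustration.

```python
import math


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    g = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def relevance(g, a, b):
    """Stand-in relevance: Adamic-Adar plus Jaccard similarity of the two neighbourhoods."""
    common = g[a] & g[b]
    adamic = sum(1.0 / math.log(len(g[z])) for z in common if len(g[z]) > 1)
    union = g[a] | g[b]
    jaccard = len(common) / len(union) if union else 0.0
    return adamic + jaccard


def guided_walk(g, start, first_hop, max_hops=5):
    """After the given first hop, always step to the most relevant unvisited neighbour."""
    path = [start, first_hop]
    while len(path) - 1 < max_hops:
        current = path[-1]
        candidates = [(relevance(g, current, n), n) for n in g[current] if n not in path]
        if not candidates:
            break
        path.append(max(candidates)[1])
    return path


def nrg_meta_paths(g, start, max_hops=5, theta=0.3):
    """One meta-path per first-hop neighbour whose relevance reaches theta times the best."""
    scored = sorted(((relevance(g, start, n), n) for n in g[start]), reverse=True)
    best = scored[0][0] if scored else 0.0
    kept = [n for score, n in scored if best == 0.0 or score >= theta * best]
    return {f"P{i}": guided_walk(g, start, n, max_hops) for i, n in enumerate(kept, 1)}


g = build_graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "E"),
                 ("C", "E"), ("D", "F"), ("E", "F"), ("E", "G"), ("F", "G")])
for path_id, nodes in nrg_meta_paths(g, "A").items():
    print(path_id, " -> ".join(nodes))
```

In this toy graph, neighbour D shares no neighbours with A, so it falls below the threshold and no meta-path starts through it; the remaining meta-paths are indexed by their IDs and could then be merged into a local subgraph.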

Hashing module

In this section, we transform the influential graph to hash-codes, place identical hash-codes in identical hash-buckets, retrieve information and generate the final outcomes.

Transformation

We transform k-dimensional actual-valued continuous-embeddings n to b-dimensional binarized embeddings n, where |k| = |b|. We applied the tanh activation function for the transformation as Where W ∈ ℝ and show the weight parameters and bias factor, respectively. We independently denote u-v hash-codes as h = h = {±1}, where μ represents the length of bits. For formal hash-codes h, we applied the sign function to formalize f wrt all instances: if f ≥ 0, the function enters +1 against each i, otherwise −1. To overcome the limitations^5 of the sign function, we incorporated SRS^6 [41] as an activation function, with a continuously increasing margin weight γ, until the instances converge. We defined the SRS through the margin weight γ as follows. Where γ ∣ γ > 0 shows the cumulative margin weight and x is the actual input. Formally, with a continuous increase in the value of γ, SRS approaches the original sign function. Thus, the hash-codes of the corresponding entities are preserved, but to enhance the performance by maximizing the Hash-Hits, it is essential to embed similar hash-codes into similar hash-buckets.
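The binarization step can be illustrated with a short sketch. The exact SRS formula is not reproduced in this text, so `smooth_sign` below uses tanh(γx) purely as an assumed stand-in that shares the stated property of approaching the sign function as γ grows. The random weights, the dimension `k = 8` and all function names are hypothetical.

```python
import numpy as np

def to_hash_code(embedding, W, bias):
    """tanh projection followed by sign: values >= 0 map to +1, others to -1."""
    f = np.tanh(W @ embedding + bias)
    return np.where(f >= 0, 1, -1)

def smooth_sign(x, gamma):
    """Assumed surrogate (not the paper's SRS): tanh(gamma * x) -> sign(x) as gamma grows."""
    return np.tanh(gamma * x)

rng = np.random.default_rng(0)
k = 8                                  # embedding size; here b = k, as in the text
W = rng.normal(size=(k, k))            # hypothetical weight parameters
bias = rng.normal(size=k)              # hypothetical bias factor
embedding = rng.normal(size=k)         # hypothetical continuous embedding
code = to_hash_code(embedding, W, bias)

x = np.array([-0.5, 0.2, 1.5])
for gamma in (1.0, 10.0, 1000.0):
    print(gamma, smooth_sign(x, gamma))
```

A smooth surrogate of this kind keeps gradients non-zero during training, whereas the hard sign function has zero gradient almost everywhere.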

Identical hash-codes to identical hash-buckets

Although the orthogonal arrangement of instances is feasible, we verified their apparent relevance with their corresponding neighbors through the Deep-Probabilistic (dProb) technique, as shown in Algorithm 2. We provided the hash-codes h to the DNN as h = [i = 1, 2, …, K] to calculate the mutual likelihood of the corresponding entities through the following process: Where k is the depth of the layer and σ is the non-linear activation function; W^(k) represents the model’s weight matrix wrt k, shows the outcome and b^(k) is the bias factor of the kth layer; is the loss function, λ is the regularization of the kth layer, Θ describes the set of the model’s hyper-parameters, and is the Euclidean norm. Through dProb, we formulated the interlinking probability N of n and n wrt their concerned hash-codes h and h, respectively. N is defined as: Additionally, we formulated the selection probability P of path p, taken by n to reach n and consider it a relevant node, wrt their concerned hash-codes h and h, respectively. P is defined as: Thus, the total probability is the sum of Eq. (17) and (18), and we have: where Φ = x · σ(x), Ψ = 1 − x · σ(x); x = Δ(h, h), σ is the sigmoid function, i.e., σ(x) = (1 + exp(−x))^−1, Δ represents the distance; and to activate the hash-codes, we used the swish activation function, i.e., f(x) = x · σ(x) [42]. The is subjected to the probabilistic density function f(Π) to preserve the selection head in the data-range determined by the density function f(Π), defined as: The information instances are placed in the relevant hash-buckets based on their mutual likelihood, as shown in Fig. 2, to enhance the hit-ratio of the hash functions to the maximum possible extent.
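The quantities Φ and Ψ defined above can be computed directly once a distance Δ is fixed. The sketch below assumes a normalized Hamming distance for Δ, which the text does not specify, and uses two hypothetical 8-bit codes `h_c` and `h_j`. It shows only the swish-based terms, not the full dProb network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    """Swish activation f(x) = x * sigmoid(x)."""
    return x * sigmoid(x)

def normalized_hamming(h_a, h_b):
    """Fraction of differing bits between two +/-1 codes (assumed choice for the distance)."""
    return float(np.mean(h_a != h_b))

h_c = np.array([1, -1, 1, 1, -1, 1, -1, -1])    # hypothetical hash-code
h_j = np.array([1, -1, -1, 1, -1, 1, 1, -1])    # hypothetical hash-code
x = normalized_hamming(h_c, h_j)                # 2 of 8 bits differ
phi = swish(x)                                  # Phi = x * sigmoid(x)
psi = 1.0 - swish(x)                            # Psi = 1 - x * sigmoid(x)
print(x, phi, psi)
```

By construction Φ and Ψ are complementary, so they sum to 1 for any distance value.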

Fig. 2

Conversion of prominent Meta-Paths to Hashed Tracks of Binarized Information. Abbreviations used: MP – Meta Path, ep – Entity Path (one relation distance between two nodes), r – relation, eh & et – Head & Tail entities respectively

Information retrieval

Data retrieval via the hashing method recovers the loss of information with a radial-based data spreading technique. It captures the local structure of the information and reduces the dimensionality of its features via nonlinear projection. The retrieval process is presented in Algorithm 3. Moreover, graph anchors are incorporated to highlight the selective features (hash-codes) in the information instances. We configured the graph anchors to alleviate the burden of blind matching in the buckets. We transformed the items of A to the representations of graph anchors as: Where f(a) is a function used to represent the graph anchors and i = 1, 2, …, υ, and shows the Euclidean distance between the selected feature i and the graph anchor a. The defined linear models E, R, H, H are binary representations of information in Hamming space; therefore, according to [36], linear auto-encoder regression (LAR) is feasible for projection. To preserve the semantic relevance among the information instances, we applied LAR for matrix projection to minimize the regression loss and used the transpose for matrix multiplication to reduce the computational overhead. We performed a two-way linear projection from H into E (1st way) and conversely, with the graph-anchored factor a, from E into H (2nd way). We aggregated the projections via additive association while keeping the hit-rate maximized among the instances of the H and E matrices. The objective function is formulated as: Similarly, we projected H and R into each other through the following expression: Where ξ shows a hyper-parameter used to counterbalance the projection impact on both sides. After the required projections, we again applied dProb via Eq. (19) to hit the hash-buckets via function calls H(α), directly retrieve the required hash-codes, and return them to the presentation module. 
Where represents the sequential retrieval, α shows the index of f to hit the target, H represents the hash-bucket and β is the specific hash-code being retrieved during the Hash-Hit in Eq. (19). In case of Hash-Miss, we call LS function exactly from the location in the bucket that returned 0 to the 1st call, and try to retrieve the required code in a margin of 3 indices above and below the location of Hash-Miss. The LS expression for retrieval is described as: Where is Relevance between α and β, R is mean of all user references, H is the corresponding hash-buckets of the very specific hash-codes, and . Any set of bits represents a matrix of binary instances. In {−1, 1}, where μ = n × k, each tuple is a hash-code of instance n of n dimension and k is the total number of instances. In order to preserve the semantic relevance among the representations of identical instances and to optimize their entropy, we exploit un-correlation of each bit as ∆∆⊤ = Ι and balance of each bit as ∆b = 0, where ∆∆⊤ is a dot product to acquire the unity diagonal index Ι of vector of b with a length k having all ones as b = 1. Typically, to make the objective function to retrieve the identical hash-codes in the maximum number of its iterations, un-correlation demands to preserve the condition that each Δ-row must not be co-related with any other row-index of ∆ by ignoring the concern of correlation of the column-index; whereas, balance demands to preserve the condition that each bit must demonstrate 1 in half of the frequency of its occurrences. We applied these constraints through the extended normalization technique, i.e., Elastic Net Regularization (ENR) [43].
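The hit-or-miss control flow described above can be sketched as follows. This is a simplification under stated assumptions: the bucket is a plain list, a match is exact equality rather than the relevance-based LS expression (which is not reproduced in this text), and `retrieve` and the sample bucket are hypothetical. Only the margin of 3 indices above and below the miss location comes from the text.

```python
def retrieve(bucket, alpha, beta, margin=3):
    """Look up code beta at index alpha; on a miss, probe up to `margin` slots either side."""
    if 0 <= alpha < len(bucket) and bucket[alpha] == beta:
        return alpha, "hit"                      # direct Hash-Hit, constant time
    for offset in range(1, margin + 1):          # Hash-Miss: widen the search window
        for idx in (alpha - offset, alpha + offset):
            if 0 <= idx < len(bucket) and bucket[idx] == beta:
                return idx, "recovered"
    return None, "miss"

# Hypothetical bucket of binary codes written as strings.
bucket = ["0110", "1010", "1111", "0001", "1100", "0011"]
print(retrieve(bucket, 2, "1111"))   # (2, 'hit')
print(retrieve(bucket, 2, "0011"))   # (5, 'recovered')
print(retrieve(bucket, 0, "0011"))   # (None, 'miss')
```

Because the window size is fixed, the fallback inspects at most six extra slots per miss in this sketch.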

Presentation module

This module triggered the hash calls; and in return collected, IDfied and stored the responses wrt the function calls in the result pool. Thus, the received hash-codes about the inter-entity interactions are processed and potential preferences are generated as; Where is the required preference score from user u to the potential item v, φ is multilayer perceptron, ⨂ is focus projection operator and, h and h are the hash-codes of the concerned information instances of users and items (i.e., the source and destination h-codes), and and are hash hit and miss expressions, respectively. We adopted an equally divide and conquer strategy on each higher layer by overwhelming the hidden units to model the abstracted information. We applied Swish to activate the hidden layers and to regulate the outcomes of between 0 and 1 wrt the Elastic Net Regularization. Finally, based on the predicted preferences about the potential interests of the users, this module generated the recommendation results.

Optimization

ENR aggregates the lasso and ridge penalties, i.e., ℓ1 and ℓ2 norm regularizations respectively, to generate an optimized output. ENR forms the λ-weighted elastic net by tuning the mixing parameter α, where α = 0 yields ridge and α = 1 yields lasso. In ENR, we selected α between 0 and 1 to optimize the elastic-net^7 so that it effectively shrinks the coefficients and sets some of them to 0 to deal with sparse selections [44]. For the overall regularization, we applied ENR to optimize the experimental setup via the following loss function. Where β represents the retrieved instances, x and y are the classes of present and required instances, α is the coefficient of ENR-regularization set for f, and λ is a hyper-parameter to counter-balance the impact of loss.
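To make the role of α concrete, the sketch below evaluates an elastic-net penalty in the commonly used glmnet-style form λ[α‖β‖₁ + (1 − α)/2 · ‖β‖₂²]. The paper's own loss function is not reproduced in this text, so this form and the name `elastic_net_penalty` are assumptions for illustration.

```python
import numpy as np

def elastic_net_penalty(beta, alpha, lam):
    """lam * (alpha * L1 + (1 - alpha) / 2 * squared L2); alpha=1 is lasso, alpha=0 is ridge."""
    l1 = np.sum(np.abs(beta))
    l2_sq = np.sum(beta ** 2)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)

beta = np.array([1.0, -2.0, 0.0])               # hypothetical coefficients
print(elastic_net_penalty(beta, 1.0, 1.0))      # pure lasso
print(elastic_net_penalty(beta, 0.0, 1.0))      # pure ridge
print(elastic_net_penalty(beta, 0.5, 1.0))      # equal mix
```

Intermediate values of α keep the sparsity-inducing effect of the ℓ1 term while the ℓ2 term stabilizes correlated coefficients.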

Testimony loss

Transformation outputs fall into two categories wrt completion: correctly transformed instances, reported as 1, and instances that faced some issue, reported as 0. The correctly transformed instances belong to pool , and the rest are considered the transformation forfeiture (loss), which is defined as: Where is the testimony of transformation forfeiture, t is the index of transformation, , λ is the ENR coefficient, Ω is the set of the model’s parameters and is the Euclidean norm.

Interaction loss

In information retrieval, the observed interactions between user and item are considered as 1 and not-observed (or missing) interactions as 0. The 1 s possess upper edge in loss declaration as compared to 0 s because they contribute to preference generation [45]. The testimony of optimal interaction loss is expressed as: Where and are the observed and un-observed interactions among users and items, respectively.

Training

According to [46], we optimized the overall loss via SGD8 that streamlines learning-rate according to the absolute-value of the concerned gradients. We trained hyper-parameters via back propagation by maximizing the log-likelihood of as: Where Π operates the swish activator that is triggered via softmax-function as:

The overall loss (forfeiture)

Collectively, the overall loss belongs to four different aspects of the proposed approach; i.e., (i) Relevance calculation , (ii) Translation , (iii) Transformation , and (iv) Decoding . For (i), we declared relevance error-ratio between the relevant and the irrelevant nodes as Where Y = (a, b) are the nodes that are mutually irrelevant – suggested by the function, and are relevant ones. For (ii), we declared where B ∣ B = (a, p, b) describes a triplet that is invalid because the facts in a or b or both are either unreadable or missing. Although the required information is automatically inserted to the concerned representations by the regularization-layer of the translator based on the structure of triplets, yet the triplets are not-validated by the triplet-validator [47], whereas is a validated triplet. For (iii), is declared in Eq. (28), and similarly for (iv), is given in Eq. (29). Therefore, is the total loss of the proposed approach; where η is a hyper parameter used to counter-balance the impact of overall loss.

The time complexity

Collectively, the Time Complexity (TC) belongs to five different aspects of H-SAGE: (i) Rel, (ii) Meta-Path aggregation, (iii) Translation and Transformation, (iv) Hashing on Hit, and (v) Hashing on Miss. For (i), the TC of entity-relevance is O(|Rel|d²) and that of semantic-relevance is O(|SRel|d), where SRel is the semantic relevance. So, the total TC of (i) is O(|Rel|d² + |SRel|d) ≃ O(|Rel|d²). For (ii), we collected the facts wrt the aggregation of meta-paths from (i), which carries the TC of modeling entities into meta-paths and meta-paths into the influential graph. Therefore, O(|pN|d + |p log N|d) and O(|℘M|d + |℘ log M|d) are the TCs of the aggregation of paths and meta-paths, respectively. So, the total TC of (ii) is O(|pN + ℘M|d + |p log N + ℘ log M|d) ≃ O(|C|d + |log C|d), where p is a path, N is the number of paths in a meta-path, ℘ is a meta-path, M is the number of meta-paths in the influential graph and C is a constant. Similarly, for (iii), the TC for translation is because it belongs to the construction module in offline processing, we already discussed it in (i); and in the case of transformation, it possesses an offline linear computing time of , where is the influential graph. For (iv), the TC is O(1), and for (v), the TC is , where H is the hash-code and b is the bucket. By adding the TCs of all parts, we have as the highest optimal TC of the non-hashing part of H-SAGE, which is not always executed. On the other hand, the TC of Hash-Hit and Hash-Miss is O(1) and respectively. Thus, the overall TC of H-SAGE is feasible for experiments on preprocessed datasets. Concerning the individual TC of the proposed algorithms, we provide the statement-wise computational time t(n) and the total computational time T(n) of each algorithm in the following section.

Algorithm 1

In this algorithm, we fetch prominent meta-paths in the raw data and inter-connect them to construct the subgraph. We mention the t(n) as: foreach (nc, nj) pair do; t(n) = n. The TC of all statements in the outer-loop is t(n) = n, but these statements are not involved in the sequential computation, therefore, their t(n) = 1. In the inner block; while (h != H) do; t(n) = n, ; t(n) = n + 1, if (cmp(nc, nj) > ϑ); t(n) = 2n − 1, ; t(n) = log n, identify & store ℘ as ℘[ ] ← ℘i; t(n) = n, and for the rest of the k statements; t(n) = kn. The TC of the inner block is T(n) = n + (n + 1) + (2n − 1) + log n + n + kn = 5n + kn + log n, and the total TC via the outer loop is T(n) = n(5n + kn + log n) = 5n² + kn² + n log n ≅ O(n²).

Algorithm 2

This algorithm is based on a YES/NO strategy, in the sense that the required hash-code is either FOUND or NOT-FOUND before it is moved to the hash-buckets. In other words, the inner calculations rely on discrete values and have no concern with sequential processing and management. The t(n) is as: do; t(n) = n; is initiator, ; t(n) = n, func N2N-P; t(n) = n, for i = 1 to K wrt H(hc, hj) do; t(n) = n, in Eq. (17); t(n) = log n, from Eq. (18); t(n) = log n, in Eq. (19); t(n) = constant, return ; t(n) = n, store in bucket i; t(n) = n, and for the rest of the k statements; t(n) = kn. Hence, expressed multiplicatively, the total TC is T(n) = n(n(k log n) + kn) = n²k log n + kn² ≅ O(n²); but this algorithm has no sequential processing. When an inner loop merely iterates without sequential processing, we treat the computation in such nested loops additively, as T(n) = O(n + n), instead of multiplicatively, as O(n ∗ n). Thus, in our case the overall T(n) is n + n + k log n + kn ≅ O(n).

Algorithm 3

In this algorithm, we retrieve the target hash-code via Hash-Hit or Hash-Miss. The t(n) is: do; t(n) = n; it is basically the initiator, foreach index α in P do; t(n) = n, func H-Retrieval; t(n) = n, for i = 1 to K wrt H do; t(n) = n, β in Eq. (19); t(n) = constant; if it gives 1 (Hash-Hit), β; t(n) = log n; if it gives 0 (Hash-Miss), we need to do the following process: in Eq. (25); t(n) = log n, and for the rest of the k statements; t(n) = kn. Hence, based on the discussion in Algorithm 2, in the case of Hash-Hit, T(n) = n(constant) ≅ O(1): β gives 1, an indication is sent to the main function along with the target hash-code saying that this hash-code has matched, and control is also transferred to the main function. Therefore, the TC of Hash-Hit is O(1). On the other hand, during a Hash-Miss, β gives 0, and the statement after the else statement is executed for locality-sensitive hashing. Therefore, the total TC in Hash-Miss is T(n) = n log n ≅ O(n).

Experiments

In this section, we deal with the following research questions – the main objectives of this work. RQ1: Can H-SAGE outperform the-state-of-the-art methods wrt performance? RQ2: Can H-SAGE outperform the-state-of-the-art methods wrt computational complexity? RQ3: How is the performance of H-SAGE in dealing with the issues of data-sparsity? RQ4: What is the impact of different modules of H-SAGE on performance? RQ5: Is H-SAGE sensitive to the different adjustments of hyper-parameters? RQ6: Can H-SAGE provide explainable personalized recommendations?

Experimental setup

In this section, we discuss the exploited datasets and the applied evaluation techniques (metrics). Also, we define the baseline methodologies selected for comparison with H-SAGE and the environment settings of hyper-parameters.

Data and data pre-processing

We conducted extensive experiments on three real-world benchmark datasets, i.e., Amazon-Book, Last-FM and Bing-News, to evaluate the performance of H-SAGE. Amazon-Book^9 contains users, items, interactions and over 20 M user ratings (in a range of 1 to 5) about books. It is retrieved from the Amazon product data, a widely used knowledge base for the recommendation of various products [48]. Last-FM^10 contains information about the performance of musicians and the previous music-listening records (i.e., music-tracks) of the interacting users. The tracks are considered as items. The information is retrieved from the online music system Last.fm [49]. Bing-News,^11 also known as MIND,^12 is an assemblage of implicitly collected user feedback from the server-side logs of MS News;^13 it contains users with their feedback and short news statements with titles from December 22, 2020, to May 30, 2021, according to [50]. There exist many other benchmark and ordinary datasets with various versions or variations, like Movie-Lens (e.g., Movie-Lens-100 K, Movie-Lens-1 M, Movie-Lens-10 M, etc.), YELP (e.g., Yelp-2013, 2014, 2018, etc.), Douban-Book, Book Crossing, KKBox, CEM, Dianping-Food, etc. However, considering the space limitations, we selected only these three benchmark datasets for the experiments because they are well populated and commonly used. In particular, they contain explicit/implicit ratings from the users towards items in different ranges (i.e., 1–5/1–10) and provide positive argumentation to augment the decision-making process. Although the datasets contain user-to-item interactions, we are required to create an enriched item knowledge-base for each dataset, with or without the help of an external knowledge-base, to enrich the entity representations of the datasets. Therefore, for Amazon-Book, we preprocessed the actual dataset to highlight the interactions between users and items. 
As initial input, we used ID-embeddings of entities and considered 1 if we obtained any interaction between users and books. Further according to [50], we identified triplets with word “book” anywhere in the context of the dataset, validated via the imposed constraint of RAV14 >0.5 among the nodes, and retrieved their entities and relations to enrich the entities of the local subgraph. To maintain the consecutive streams of relevant information in the graph, we enriched Meta paths by incorporating such triplets in the concerned sequence that had “book” in minimum at 3 places out of 5 in ℘. We guaranteed feasible relations among entities in the ℘ via unique triplet identifiers, i.e., TIDs, to preserve the synchronized relations among old and new information instances in the Meta-paths. To further enrich the underlying information, we kept the RAV fixed and used a formatted query (i.e., = < ∗ . book. ∗ , ∗ . book. ∗ , ∗ . book. ∗ >) to retrieve feasible and relevant information from MS-Satori. Finally, we used TransE [47] to validate triplet’s granularity and redundancy and discarded the inappropriate instances. According to KB4Rec,15 we retrieved map-able information instances and their mappings from DBpedia-ontology16 and exploited to enrich the entity-information of the local subgraph of Last-FM. We aligned the data-items of Last-FM according to the highlighted information instances of the external KG, and retrieved only those instances that have a relation-frequency r ≥ 3 wrt their first-order neighbors. We continued the process under the applied constraints to enrich each data-item in the local subgraph up to the Meta-path extent of 5 hops from every current item. Moreover, for Bing-News we mapped the embeddings of corresponding items of the subgraph, via concatenation of item-IDs, to the embeddings of string-tokens of titles and statements of news retrieved from MS-Satori. 
The information in the string-tokens of news titles and statements is greater than the information in the titles of books and music-tracks; thus, Bing-News is comparatively more suitable for effective decision making. We removed those items that had no relevant mappings in the external KGs as well as users with ratings <5, and calculated the data sparsity by subtracting the ratio of interactions to the product of users and items from 1, i.e., sp = 1 − 𝓘 / (u × v). Moreover, we divided the ratings of each dataset into 70%, 20% and 10% as Training, Testing and Validation, respectively. The statistics are displayed in Table 1, and we released and published the datasets on Mendeley-Data under the title “H-SAGE-Dataset”.^17
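The sparsity and split figures reported in Table 1 can be checked with a few lines. The sketch below follows the verbal definition above. The helper names `sparsity` and `split_sizes` are hypothetical, and rounding the split sizes down is an assumption that happens to match the table.

```python
def sparsity(n_users, n_items, n_interactions):
    """1 minus the ratio of observed interactions to all possible user-item pairs."""
    return 1.0 - n_interactions / (n_users * n_items)

def split_sizes(n_interactions, ratios=(0.7, 0.2, 0.1)):
    """Training/testing/validation sizes, rounded down (assumed rounding rule)."""
    return tuple(int(n_interactions * r) for r in ratios)

# (users, items, interactions) as listed in Table 1.
datasets = {
    "Amazon-Book": (55255, 20235, 232562),
    "Last-FM": (1865, 6526, 68456),
    "Bing-News": (40237, 32562, 192356),
}
for name, (u, v, i) in datasets.items():
    print(name, round(sparsity(u, v, i), 6), split_sizes(i))
```

Running it on the three rows gives sparsity values that agree with the table to six decimal places.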

Table 1

The Statistics of Datasets Utilized

Literals	Datasets
Literals	Amazon-Book	Last-FM	Bing-News
Domain	Books	Music	News
Users u	55,255	1865	40,237
Items v	20,235	6526	32,562
Interactions 𝓘	232,562	68,456	192,356
Entities	77,253	12,039	76,586
r-Types	29	58	45
r-Counts	122,562	29,650	115,620
Sparsity sp	0.999792	0.994375	0.999853
Training Y_train	162,793	47,919	134,649
Testing Y_test	46,512	13,691	38,471
Validation Y_val	23,256	6845	19,235

Evaluation

We used the well-known CTRP^18 and top-K recommendation for performance evaluation. In CTRP, we calculated the predictive probability of the next click wrt the testing data, given the learned interactions of the training data. We applied (AUC & Acc)^19 for the performance evaluation of H-SAGE and the baseline methods wrt CTRP. Similarly, we obtained the top-K items having the highest probability of the next possible click by utilizing the training data-interactions for each user instance in the testing data. We applied (Prec, Rec, NDCG)@K^20 for the performance evaluation of top-K recommendation.
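For reference, the three top-K metrics can be computed per user as sketched below, using the standard binary-relevance definitions. The text does not state which variants the authors used, so treat this as an assumed formulation; the item IDs are hypothetical.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance DCG divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

ranked = ["v3", "v7", "v1", "v9", "v4"]      # hypothetical ranked list for one user
relevant = {"v7", "v4", "v8"}                # hypothetical held-out test items
print(precision_at_k(ranked, relevant, 5))
print(recall_at_k(ranked, relevant, 5))
print(ndcg_at_k(ranked, relevant, 5))
```

Dataset-level scores would then be the averages of these per-user values over all test users.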

Comparison

To evaluate H-SAGE, we selected eight state-of-the-art methods from various relevant implementation and application perspectives. PER [8] designates user-to-item and item-to-item relations and represents the heterogeneity of the KG based on feature extraction among nodes via meta-path interconnections. CKE [18] formulates an aggregated Bayesian framework to empower matrix factorization with the help of TransR-based embeddings and combines this knowledge base with a CF mechanism for recommendation. MCRec [13] co-attentively supports HIG and attains the context and actual representations of entities via a meta-path-based random walk technique. RippleNet [23] aggregates path-based and embedding-based recommendation methods and enriches the representations of users by adding the representations of items to the paths interconnecting users via propagation. KGAT [25] utilizes TransR to obtain the initial representations of users and items and performs representation propagation. It enriches entity representations with the information of the corresponding neighbors of items. AKGE [26] models higher-order relations through information propagation on a self-constructed subgraph based on shortest-distance similarity. NACF [27] collects neighborhood information of the entities to mine the potential relations between users and items and to generate the potential preferences for their unobserved items. DKEN [28] exploits the impact of data exchange between the implicit and explicit semantics of information, in user-to-item interactions and KG-based features respectively, through a CIS^21 layer to preserve a better grip on the semantic and hierarchical structure of the information in the KG.

Parameterization

We tried the hyper-parameters under different tentative settings and finalized an optimized parameterization environment via the Grid-Search technique [46], as summarized in Table 2; the literals are defined in the caption of Table 2. We maintained μ between −0.2 and 0.8 and b = 1024 for all datasets. Moreover, we selected the optimized parameter values for all datasets from the collections of candidate values, as η = 7 × 10⁻⁴ out of {5 × 10⁻², 5 × 10⁻³, 7 × 10⁻³, 7 × 10⁻⁴}, λ = 10⁻⁷ out of {10⁻¹⁰, 10⁻⁹, …, 10², 10³}, and ε = 10⁻³ out of {10⁻⁴, 10⁻³, 10⁻², 10⁻¹}. Similarly, we kept s as 3, 4, 4, d as 32, 64, 64, and K as 16, 32, 32, for Amazon-Book, Last-FM and Bing-News, respectively. We kept d unchanged for the implementation of the baseline methods and utilized the grid-search technique to adjust the rest of their parameters. We repeated the experiments thrice and reported the averages of the achieved results.
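The grid search over the candidate sets listed above can be sketched as an exhaustive loop. The candidate values come from the text. The scoring function `toy_score` is a purely hypothetical stand-in for a validation metric, constructed so that it peaks at the reported setting. A real run would train and validate the model for each configuration.

```python
import math
from itertools import product

ETAS = [5e-2, 5e-3, 7e-3, 7e-4]                    # learning-rate candidates
LAMBDAS = [10.0 ** p for p in range(-10, 4)]       # 1e-10 ... 1e3
EPSILONS = [10.0 ** p for p in range(-4, 0)]       # 1e-4 ... 1e-1

def grid_search(score_fn):
    """Score every (eta, lambda, epsilon) combination and return the best one."""
    grid = list(product(ETAS, LAMBDAS, EPSILONS))
    best = max(grid, key=lambda cfg: score_fn(*cfg))
    return best, len(grid)

def toy_score(eta, lam, eps):
    """Hypothetical validation score; NOT the paper's metric."""
    return -(abs(math.log10(eta / 7e-4)) + abs(math.log10(lam / 1e-7))
             + abs(math.log10(eps / 1e-3)))

best_cfg, n_configs = grid_search(toy_score)
print(n_configs, best_cfg)
```

With 4, 14 and 4 candidates respectively, the grid holds 224 configurations per dataset before the remaining parameters (s, d, K) are considered.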

Table 2

The settings of the hyper-parameters wrt the experimental environment. Literals: μ – Dropout Ratio, b – Batch Size, η – Learning Rate, λ – L2 Regularization Weight, ε – KGE Weight, s – hop-length wrt path-steps, d – Dimensions of Embedding, K – The sampling size of influential neighbors

Datasets	Environment Setting
Amazon-Book	μ = [−0.2, 0.8], b = 1024, η = 7 × 10⁻⁴, λ = 10⁻⁷, ε = 10⁻³, s = 3, d = 32, K = 16
Last-FM	μ = [−0.2, 0.8], b = 1024, η = 7 × 10⁻⁴, λ = 10⁻⁷, ε = 10⁻³, s = 4, d = 64, K = 32
Bing-News	μ = [−0.2, 0.8], b = 1024, η = 7 × 10⁻⁴, λ = 10⁻⁷, ε = 10⁻³, s = 4, d = 64, K = 32

Comparative study (RQ1)

Formally, Table 3 represents the complete results of CTRP wrt AUC and Acc, whereas Table 4 demonstrates the results of top-K recommendations wrt (Prec, Rec and NDCG)@K = 5 and 10 only. Regarding top-K recommendation, we clarify that we performed experiments on eight different variations of K, i.e., 1, 2, 5, 10, 25, 50, 75, 100, but due to the space limitations, we could only present the results via K = 5 and 10 in Table 4, and the complete outcome of this process is shown in Fig. 3.

Table 3

CTRP Results: Evaluated wrt AUC and Acc. Terms: Upper-Bound (α), Lower-Bound (β), Mean (x̄)

Approaches		Amazon-Book		Last-FM		Bing-News
Approaches		AUC	Acc	AUC	Acc	AUC	Acc
PER		0.6210	0.5812	0.6129	0.5866	0.5153	0.4932
CKE		0.6419	0.6056	0.7389	0.6632	0.5432	0.5011
MCRec		0.6421	0.6287	0.7412	0.6701	0.5814	0.5633
RippleNet		0.6646	0.6419	0.7611	0.6787	0.6418	0.6002
KGAT		0.6789	0.6498	0.7691	0.6823	0.6796	0.6437
AKGE		0.6641	0.6399	0.7785	0.6891	0.6632	0.6411
NACF		0.7063	0.6734	0.7913	0.7232	0.6952	0.6741
DKEN		0.7309^*	0.6911^*	0.8019^*	0.7415^*	0.7215^*	0.6959^*
H-SAGE		0.7597	0.7208	0.8313	0.7727	0.7436	0.7189
Improved: (%)-age	α	03.79	04.12	03.54	04.04	02.97	03.20
	β	03.94	04.30	03.67	04.21	03.06	03.31
	x̄	03.87	04.21	03.60	04.12	03.02	03.25

*The numbers in bold represent the most significant values among the identical comparing outcomes, and the numbers in italic with '*' describe the second important values accordingly
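The improvement rows of Table 3 are not accompanied by a formula in this text. The reported values appear consistent with taking the gain of H-SAGE over the second-best method relative to the H-SAGE score (α) and relative to the second-best score (β), then averaging the two. The sketch below reproduces the Amazon-Book and Last-FM AUC entries under that assumption; the function name `improvement` is hypothetical.

```python
def improvement(best, runner_up):
    """Percentage gain relative to the best score (alpha), the runner-up (beta), and their mean."""
    gain = best - runner_up
    alpha = 100.0 * gain / best
    beta = 100.0 * gain / runner_up
    return alpha, beta, (alpha + beta) / 2.0

# AUC of H-SAGE vs. DKEN, taken from Table 3.
for name, h_sage, dken in [("Amazon-Book", 0.7597, 0.7309), ("Last-FM", 0.8313, 0.8019)]:
    a, b, m = improvement(h_sage, dken)
    print(name, round(a, 2), round(b, 2), round(m, 2))
```

Under this reading, α is necessarily smaller than β whenever H-SAGE has the higher score.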

Table 4

Top-k Recommendations via Prec, Rec & NDCG. Terms: Upper-Bound (α), Lower-Bound (β), Mean (x̄)

Datasets & Evaluation @K = 5, 10		Comparison and Improvements of H-SAGE wrt the Baseline Approaches
		PER	CKE	MCRec	RNet^a	KGAT	AKGE	NACF	DKEN	H-SAGE	Improved: (%)-age
		PER	CKE	MCRec	RNet^a	KGAT	AKGE	NACF	DKEN	H-SAGE	α	β	x̄
Amazon-Book	Prec@05	0.057	0.060	0.067	0.074	0.077	0.078	0.090	0.095^*	0.110	13.64	15.79	14.71
	Prec@10	0.052	0.050	0.060	0.076	0.080	0.080	0.088	0.094^*	0.105	10.48	11.70	11.09
	Rec@05	0.027	0.031	0.030	0.032	0.035	0.035	0.039	0.040^*	0.042	04.76	05.00	04.88
	Rec@10	0.029	0.032	0.034	0.035	0.037	0.040	0.043	0.045^*	0.048	06.25	06.67	06.46
	NDCG@05	0.031	0.031	0.031	0.032	0.033	0.034^*	0.034^*	0.034^*	0.036	05.56	05.88	05.72
	NDCG@10	0.032	0.032	0.034	0.035	0.037^*	0.035	0.037^*	0.037^*	0.039	05.13	05.41	05.27
Last-FM	Prec@05	0.057	0.060	0.067	0.064	0.073	0.075	0.073	0.077^*	0.081	04.94	05.19	05.07
	Prec@10	0.050	0.053	0.060	0.061	0.064	0.064	0.066	0.068^*	0.071	04.23	04.41	04.32
	Rec@05	0.020	0.030	0.040	0.050	0.050	0.040	0.050	0.053^*	0.058	08.62	09.43	09.03
	Rec@10	0.030	0.040	0.050	0.060	0.070	0.060	0.080^*	0.077	0.091	12.09	13.75	12.92
	NDCG@05	0.030	0.033	0.045	0.045	0.050	0.045	0.055	0.057^*	0.063	09.52	10.53	10.03
	NDCG@10	0.060	0.080	0.090	0.100	0.110	0.114	0.119^*	0.113	0.130	08.46	09.24	08.85
Bing-News	Prec@05	0.004	0.004	0.006	0.007	0.007	0.007	0.009	0.010^*	0.011	09.09	10.00	09.55
	Prec@10	0.004	0.004	0.006	0.007	0.008	0.008	0.010^*	0.010^*	0.011	09.09	10.00	09.55
	Rec@05	0.027	0.027	0.028	0.032	0.035	0.035	0.039	0.040^*	0.045	11.11	12.50	11.81
	Rec@10	0.028	0.029	0.030	0.035	0.037	0.036	0.043	0.045^*	0.055	18.18	22.22	20.20
	NDCG@05	0.010	0.011	0.015	0.017	0.023	0.025	0.029	0.033^*	0.037	10.81	12.12	11.47
	NDCG@10	0.050	0.070	0.100	0.110	0.110	0.115	0.120	0.125^*	0.132	05.30	05.60	05.45

aRippleNet. *The numbers in bold represent the most significant values among the identical comparing outcomes, and the numbers in italic with '*' describe the second important values accordingly

Fig. 3

Result-Analysis of top-K recommendations wrt (Prec, Rec, NDCG)@K on the three Mentioned Datasets

In this section, we analyze the results of the conducted experiments wrt the comparative study. The proposed approach outperformed the baselines on all datasets, by roughly 3 to 4% in AUC and Acc over DKEN, the strongest baseline (Table 3), which shows that H-SAGE is capable of providing effective recommendations due to its strong mechanisms of data filtration and information-hashing. H-SAGE preserves meaningful information via dedicated meta-paths based on the inter-entity semantic relevance. It relies on semantic relevance instead of depending on short distance or fixed weights among entities. In DKEN [28], first, there is no mechanism to filter out the irrelevant information. Second, redundant data is provided to two parallel layers simultaneously. Third, handling more instances of data in different places increases the computational overhead. Although this can increase space complexity, it is currently not a big issue on smaller datasets. In NACF [27], an attention mechanism is used to assign weights in the subgraph, and similarities among them are calculated based on the assigned weights, regardless of the syntactic or semantic relevance among entities or their mutual relations. NACF is outperformed by DKEN, indicating that the incorporation of the CIS and KEN layers contributed more to the performance of DKEN during the distribution and aggregation of information. 
Likewise, the Generalization-Layer^22 enhanced the generalization capability in learning high-level information features from the data. During the construction of the subgraph in AKGE [26], the basic rule for finding the similarity among entities in the KG is that the smaller the distance between two entities in the Euclidean space, the greater the similarity between them; this rule introduced noise into the information. In KGAT [25], extensive propagation is performed and information is gathered without applying any noise-filtration constraint, which introduced noise into the local knowledge base. Moreover, KGAT fails to efficiently attain the hierarchical and sequential structure of nodes in the KG due to its attentive embedding propagation, which is a reason that AKGE outperformed it. However, both AKGE and KGAT are outperformed by NACF in nearly all cases, indicating that the incorporation of a similarity mechanism favors performance. Moreover, the propagation of attentive embedding and aggregation generalizes the corresponding frameworks more smoothly than the straight propagation and aggregation in RippleNet [23] and the meta-path-based random walk traversal in MCRec [13]. This is why AKGE and KGAT outperformed RippleNet and MCRec in different aspects. Conversely, MCRec, CKE [18] and PER [8] are outperformed by RippleNet in the majority of their comparisons, underlining the advantage of information propagation over the limited meta-path-based or simple embedding and regularization-based methods. Further, MCRec performed better than CKE and PER, highlighting the significance of the information selection technique of meta-path-based methods over simple regularization or embedding-based methods. Finally, although the empirical values show that CKE has a small edge over PER in some cases, their average performance is almost indistinguishable according to the experiments.

Complexity study (RQ2)

We tabulated the comparison of all approaches wrt execution time complexity, as shown in Table 5. PER and MCRec defined their complexities but did not define modules or layers, so we only ticked the M / L Mentioned column and listed their complexities under Defined Time Complexity. We entered the information for CKE, NACF and DKEN based on their pseudocode: CKE and NACF neither defined complexities nor mentioned modules or layers, whereas DKEN did not define its complexity but mentioned its layers. RippleNet, KGAT and AKGE defined complexities and also mentioned their modules. The analysis shows that the overall complexities of PER, CKE, MCRec and NACF are greater than O(n²).

Table 5

Study of Time Complexity Comparison of the proposed Approach with the baseline approaches. Terms used: M – Modules, L – Layers, T – Type of the mentioned item, X – No. of Modules or Layers

Approaches	M / L Mentioned			Defined Time Complexity	Time Complexity of M / L
	No	Yes: T	Yes: X		≥O(n²)	O(n²)	O(n)	O(1)
PER	✓			O(mn²)	✓
CKE	✓			Not Defined	✓
MCRec	✓			O(lLN + lN log N)	✓
RippleNet		M	2	O(YHK^Hd² + Gd²)	2
				O(YHK^H + 1d + YHK^Hd²)
KGAT		M	3	$O(\|G_2\| d^2 + \sum_{i=1}^{L} \|G\| d_i d_{i-1} + \|G\| d_i)$	1	1	1
AKGE		M	3	O(PQ + Q log Q)	1	1	1
				$O(\|R\| \sum_{k=1}^{K} \|E_s\| d^2)$
NACF	✓			Not Defined	✓
DKEN		L	4	Not Defined	1	2	1
H-SAGE_Miss		L	2	$\|M\| \sum_{s=1}^{S} \|\bar{G}_s\| d_s^2 \simeq O(n^2)$		1	1
				$O(\sum_{i=1}^{3} \|\mathrm{H}\| b_i) \simeq O(n)$
H-SAGE_Hit		L	2	$\|M\| \sum_{s=1}^{S} \|\bar{G}_s\| d_s^2 \simeq O(n^2)$		1		1
				O(1)

Similarly, both modules of RippleNet, and one M/L each of KGAT, AKGE and DKEN, incur a time complexity (TC) greater than O(n²). Moreover, one M/L each of KGAT, AKGE and H-SAGE, and two of DKEN, have a TC of O(n²); and one M/L each of KGAT, AKGE, DKEN and H-SAGE has a TC of O(n). Formally, the worst TC of H-SAGE, i.e., O(n²), is incurred by its first layer due to the similarity calculation; its second layer, however, incurs O(1) in case of a Hash-Hit and O(n) in case of a Hash-Miss. Since H-SAGE contributes the least to the complexity counts in Table 5, we claim that it outperforms the state-of-the-art methods in terms of computational time complexity.
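The O(1) versus O(n) behavior of the second layer can be illustrated with a toy bucket. This is only a sketch under stated assumptions: the class `HashBucket`, the bit-string codes and the payloads are hypothetical, and the linear nearest-code scan by Hamming distance is a simple stand-in for the locality-sensitive retrieval, not the authors' dProb or LS-hashing implementation.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

class HashBucket:
    """Toy bucket: exact-match index plus a linear-scan fallback (assumed design)."""

    def __init__(self):
        self.items = {}  # hash-code (bit string) -> payload

    def put(self, code, payload):
        self.items[code] = payload

    def get(self, code):
        # Hash-Hit: average-case O(1) dictionary lookup.
        if code in self.items:
            return "hit", self.items[code]
        # Hash-Miss: O(n) scan for the nearest stored code,
        # standing in for locality-sensitive retrieval around the miss.
        if not self.items:
            return "miss", None
        nearest = min(self.items, key=lambda c: hamming(c, code))
        return "miss", self.items[nearest]

# Hypothetical codes and payloads for illustration only.
bucket = HashBucket()
bucket.put("1010", "troop withdrawal")
bucket.put("0111", "vaccination rollout")
hit = bucket.get("1010")    # exact code is stored
miss = bucket.get("1011")   # not stored; nearest code differs in one bit
```

The design point is that the expensive path is taken only on a miss, so the more lookups the placement strategy turns into hits, the closer the average cost stays to O(1).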

Sparsity study (RQ3)

The experimental results (i.e., Tables 3 and 4, and Fig. 4) demonstrate that H-SAGE also performs reasonably well on sparse datasets. For AUC, the performance increased continuously up to 70% data utilization on all datasets, where H-SAGE attained its highest performance. Beyond this point, the performance decreased gradually on all datasets except Amazon-Book, where it fell abruptly after 90% data utilization, as shown in Fig. 4(a). Fig. 4(b) shows that the error rate wrt AUC behaves as the additive inverse of the AUC performance. Similarly, for Acc, the performance on Amazon-Book and Bing-News increased gradually up to 70% of the data, where it is highest. On Last-FM, however, the Acc kept fluctuating from 20 to 55% of the data and achieved its highest value at 60%. After the highest values, the performance decreased gradually on the other datasets, except Bing-News, where it degraded abruptly after 80% of the data, as in Fig. 4(c). This implies that, when dealing with larger volumes of news titles and snippets, the model lost its grip on the information structure, which caused overfitting. Likewise, Fig. 4(d) shows that the error rate on Acc is the additive inverse of its performance. Finally, we claim that H-SAGE is capable of effectively dealing with the limitations of data sparsity.
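For readers who want to reproduce curves of this kind, the following is a minimal sketch of AUC and Acc for click-through predictions. It assumes that the reported error rate is the complement of each metric (error = 1 − metric), which is one reading of "additive inverse" and is not confirmed by the text; the labels, scores and the 0.5 threshold are hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical click labels and predicted click probabilities.
labels = [1, 0, 1, 0]
scores = [0.9, 0.6, 0.3, 0.2]

auc_value = auc(labels, scores)
acc_value = accuracy(labels, scores)
# Assumed definition of the error rate: complement of the metric.
auc_error = 1.0 - auc_value
acc_error = 1.0 - acc_value
```

Repeating this evaluation on models trained with increasing fractions of the training data would yield one point per fraction, which is the shape of analysis the sparsity study describes.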

Fig. 4

AUC and Acc results analysis wrt the Data-Sparsity on the proposed datasets

Ablation study (RQ4)

In this section, we present the ablation study of H-SAGE wrt two variations: first, the importance of information relevance; and second, the significance of H-SAGE’s architecture.

Impact of prominent Meta-paths

We compare the proposed Node Relevance Guided-walk (NRG) with four state-of-the-art path modeling methods. First, we applied Meta-path guided Similarity (MS) to create a subgraph from similar meta-paths based on their mutual similarity [11]. Second, we used the Meta-path guided Retrieval (MR) technique to construct a local subgraph by retrieving selected meta-paths [12]. Third, we utilized the Meta-path guided random Walk (MW) approach to acquire prominent paths from the KG to create the subgraph [15]. Fourth, we used the Shortest-distance steered Node-selection (SN) method to access salient paths to construct the subgraph [26]. We evaluated the constructed subgraphs with H-SAGE under the names H-SAGE_MS, H-SAGE_MR, H-SAGE_MW, H-SAGE_SN and H-SAGE_NRG (H-SAGE), and summarized the results in Table 6. The outcomes show small variations, with a minimal increase from H-SAGE_MS through H-SAGE_MR to H-SAGE_MW. H-SAGE_SN is somewhat better than these techniques owing to its shortest-distance-based similarity. H-SAGE outperformed all four variants, indicating that relevance-based information collection contributes more to the performance.

Table 6

The comparison of performance among variations of H-SAGE based on path selection techniques

H-SAGE Variants	Amazon-Book		Last-FM		Bing-News
H-SAGE Variants	AUC	Acc	AUC	Acc	AUC	Acc
H-SAGE_MS	0.6832	0.6601	0.7532	0.6786	0.6656	0.6515
H-SAGE_MR	0.7011	0.6689	0.7609	0.6910	0.6708	0.6602
H-SAGE_MW	0.7219	0.6799	0.7723	0.6987	0.6898	0.6692
H-SAGE_SN	0.7342	0.6957	0.8011	0.7235	0.7029	0.6720
H-SAGE	0.7597	0.7208	0.8313	0.7727	0.7436	0.7189

The numbers in bold represent the most significant values among the comparing outcomes
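To make the margins in Table 6 concrete, the relative gain of H-SAGE (NRG) over the strongest alternative, H-SAGE_SN, can be computed directly from the tabulated values. The sketch below only re-uses numbers from Table 6; the column abbreviations (AB, LF, BN) and the helper `relative_gain` are introduced here for illustration.

```python
# Column order: Amazon-Book (AB), Last-FM (LF), Bing-News (BN); AUC then Acc.
columns = ["AB-AUC", "AB-Acc", "LF-AUC", "LF-Acc", "BN-AUC", "BN-Acc"]

# Values copied from Table 6.
table6 = {
    "H-SAGE_MS": [0.6832, 0.6601, 0.7532, 0.6786, 0.6656, 0.6515],
    "H-SAGE_MR": [0.7011, 0.6689, 0.7609, 0.6910, 0.6708, 0.6602],
    "H-SAGE_MW": [0.7219, 0.6799, 0.7723, 0.6987, 0.6898, 0.6692],
    "H-SAGE_SN": [0.7342, 0.6957, 0.8011, 0.7235, 0.7029, 0.6720],
    "H-SAGE":    [0.7597, 0.7208, 0.8313, 0.7727, 0.7436, 0.7189],
}

def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base

# Gain of the NRG-based model over the shortest-distance variant, per column.
gains_over_sn = {
    col: round(relative_gain(new, base), 2)
    for col, new, base in zip(columns, table6["H-SAGE"], table6["H-SAGE_SN"])
}
```

On these numbers the gain over H-SAGE_SN is positive in every column, ranging from roughly 3.5% (Amazon-Book AUC) to roughly 7% (Bing-News Acc).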


Impact of different modules

For convenience, we denote the H-SAGE variants with different modules as H-SAGE_LS, H-SAGE_LH, H-SAGE_LS + LH, H-SAGE_NR + LS, H-SAGE_NR + LH and H-SAGE_NR + LS + LH, i.e., H-SAGE. With H-SAGE_LS, the performance is lowest because locality-based hashing alone can neither access the higher-order relations nor effectively maintain the required graph structure. H-SAGE_LH, the standalone learning to hash, performs better than LS, but it cannot effectively handle the graph structure on a hash-miss. H-SAGE_LS + LH handles the hash-miss effectively and performs better than LS or LH alone, but the result is still not satisfactory; the main reason for the performance degradation is the noise and irrelevant data in the underlying information. Next, we applied H-SAGE_NR + LS and H-SAGE_NR + LH, one by one, to judge the difference in their results. Notably, H-SAGE_NR + LS performs better than H-SAGE_LS + LH, and H-SAGE_NR + LH performs better still. Finally, we combined all modules, i.e., H-SAGE_NR + LS + LH, and achieved the best performance among all the modules and combinations discussed, as summarized in Table 7. Thus, we confirmed H-SAGE_NR + LS + LH as the proposed model, i.e., H-SAGE with NR + LS + LH as the necessary modules.

Table 7

The comparison of performance wrt different variations of H-SAGE. Abbreviations: LS (Locality Sensitive hashing), LH (Learning to Hash), NR (Node Relevance); H-SAGE_LS is H-SAGE with LS only, H-SAGE_LS + LH is H-SAGE with LS and LH, and so on; H-SAGE means H-SAGE_ALL, i.e., NR + LS + LH

H-SAGE Variants	Amazon-Book		Last-FM		Bing-News
H-SAGE Variants	AUC	Acc	AUC	Acc	AUC	Acc
H-SAGE_LS	0.6875	0.6578	0.7698	0.7192	0.6912	0.6434
H-SAGE_LH	0.7099	0.6626	0.7768	0.7278	0.7007	0.6508
H-SAGE_LS + LH	0.7120	0.6791	0.7811	0.7332	0.7065	0.6621
H-SAGE_NR + LS	0.7331	0.6902	0.8023	0.7519	0.7203	0.6842
H-SAGE_NR + LH	0.7482	0.7099	0.8134	0.7589	0.7289	0.6923
H-SAGE	0.7597	0.7208	0.8313	0.7727	0.7436	0.7189

The numbers in bold represent the most significant values among the comparing outcomes

Sensitivity study (RQ5)

In this section, we demonstrate the sensitivity of H-SAGE’s performance to its hyper-parameters.

Length of Meta-path in higher-order relations

We evaluated the performance of H-SAGE as the hop-length s of the meta-paths increases. We performed experiments for s = 1 to 5, increasing s by 1 at each iteration. The performance increased continuously up to s = 3. With a further increase to s = 4, the performance of H-SAGE remained good but declined slightly compared to s = 3 in some instances. For s > 4, however, the performance dropped sharply, and at s = 5 H-SAGE showed its worst performance, as shown in Table 8. Hence, we conclude that for s > 4 the framework overfits, which hinders it from effectively capturing the graph structure.

Table 8

The result analysis of H-SAGE’s performance wrt the Meta-path length of higher-order relations

s	Amazon-Book		Last-FM		Bing-News
s	AUC	Acc	AUC	Acc	AUC	Acc
1	0.7326	0.7023	0.8091	0.7485	0.7101	0.6819
2	0.7506	0.7179	0.8259	0.7622	0.7298	0.7011
3	0.7597	0.7208	0.8309	0.7727	0.7436	0.7181
4	0.7547	0.7122	0.8313	0.7681	0.7388	0.7189
5	0.7101	0.6599	0.7546	0.6697	0.6755	0.6257

The numbers in bold represent the most significant values among the comparing outcomes
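The choice of s can be read off Table 8 mechanically by taking, for each metric column, the hop-length with the highest value. The following sketch re-uses only the values in Table 8; the column abbreviations (AB, LF, BN) are introduced here for brevity.

```python
# Column order: Amazon-Book (AB), Last-FM (LF), Bing-News (BN); AUC then Acc.
columns = ["AB-AUC", "AB-Acc", "LF-AUC", "LF-Acc", "BN-AUC", "BN-Acc"]

# Values copied from Table 8, keyed by hop-length s.
table8 = {
    1: [0.7326, 0.7023, 0.8091, 0.7485, 0.7101, 0.6819],
    2: [0.7506, 0.7179, 0.8259, 0.7622, 0.7298, 0.7011],
    3: [0.7597, 0.7208, 0.8309, 0.7727, 0.7436, 0.7181],
    4: [0.7547, 0.7122, 0.8313, 0.7681, 0.7388, 0.7189],
    5: [0.7101, 0.6599, 0.7546, 0.6697, 0.6755, 0.6257],
}

# Best and worst hop-length per metric column.
best_s = {
    col: max(table8, key=lambda s: table8[s][i])
    for i, col in enumerate(columns)
}
worst_s = {
    col: min(table8, key=lambda s: table8[s][i])
    for i, col in enumerate(columns)
}
```

On these values, s = 3 is best in four of the six columns and s = 4 in the remaining two (Last-FM AUC and Bing-News Acc), while s = 5 is worst in every column.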


Length of embedding dimensions

The embedding lengths are given by ℓ = 2^d, where d = 1, 2, …, 7. We performed the experiments for d = 1 to 7, increasing d by 1 at each iteration. The performance increased continuously with d up to d = 5, where it was highest in most instances. At d = 6, the performance of H-SAGE remained good, although it decreased slightly in a considerable number of instances. For d > 6, however, a significant degradation is observed, and at d = 7 H-SAGE showed its worst performance. This means that H-SAGE can generalize the representations effectively only up to d = 6. Therefore, we conclude that beyond d = 6 the framework lost its grip on the information structure, treated the triplets’ false granularity as valid, included that irrelevant data in the information and degraded the performance. We summarize the performance analysis of H-SAGE in Table 9.

Table 9

The result analysis of H-SAGE’s performance wrt the embedding length of entity representations

ℓ	Amazon-Book		Last-FM		Bing-News
ℓ	AUC	Acc	AUC	Acc	AUC	Acc
2¹	0.7124	0.6675	0.7839	0.7314	0.6892	0.6723
2²	0.7299	0.6835	0.8036	0.7497	0.7036	0.6860
2³	0.7432	0.7018	0.8199	0.7536	0.7212	0.7011
2⁴	0.7521	0.7156	0.8313	0.7622	0.7376	0.7127
2⁵	0.7597	0.7208	0.8311	0.7724	0.7436	0.7189
2⁶	0.7501	0.7125	0.8232	0.7727	0.7388	0.7117
2⁷	0.6855	0.6432	0.7623	0.7109	0.6645	0.6433

The numbers in bold represent the most significant values among the comparing outcomes


Sampling length of influential neighboring

We represent the neighborhood sampling sizes by K = 2^i, where i = 1, 2, …, 7. We performed the experiments for i = 1 to 7, increasing i by 1 at each iteration. The performance of H-SAGE increased continuously with i up to i = 5, and remained near-optimal for i = 4, 5 and 6. For i > 6, however, a significant degradation is observed, and at i = 7 the model showed its worst performance. The conclusion is similar to that of Section 5.6.2, and we summarize the results in Table 10.

Table 10

The result analysis of H-SAGE’s performance wrt the sampling length of influential neighboring

K	Amazon-Book		Last-FM		Bing-News
K	AUC	Acc	AUC	Acc	AUC	Acc
2¹	0.7367	0.6731	0.7823	0.7278	0.7065	0.6802
2²	0.7432	0.6891	0.8067	0.7412	0.7208	0.6898
2³	0.7509	0.7030	0.8235	0.7546	0.7316	0.7032
2⁴	0.7597	0.7156	0.8313	0.7622	0.7376	0.7127
2⁵	0.7552	0.7208	0.8311	0.7724	0.7436	0.7189
2⁶	0.7501	0.7125	0.8232	0.7727	0.7388	0.7117
2⁷	0.6835	0.6445	0.7456	0.6875	0.6643	0.6457

The numbers in bold represent the most significant values among the comparing outcomes


Case study (RQ6)

H-SAGE is capable of providing satisfactory explanations about its generated recommendations. To further explain its working mechanism, we present a daily-life recommendation scenario from Bing-News, which provides news suggestions to users based on their previous interactions with the online news catalogues, recorded as clicks. We retrieved the following seven random click samples from a user’s interaction logs:

C1: None is safe: Osaka, Japan also crumples under COVID-19 onslaught.
C2: Biden’s COVID warning: “Unvaccinated will end up paying the price”
C3: Biden’s remarks announcing Afghanistan troop withdrawal.
C4: US announces plans to cut troop levels in Afghanistan.
C5: Trump backs Afghanistan withdrawal, putting him at odds with some Republicans.
C6: COVID-19 pandemic affected the US economy badly.
C7: Economy grew at 6.4% in 1st quarter of 2021 – the massive vaccination rollout and the army withdrawal.

Following [50], we treated the heading entities in the text of the retrieved click samples as instances, indexed as i1 = “COVID-19 onslaught”, i2 = “COVID warning”, i3 = “price”, i4 = “troop withdrawal”, i5 = “cut troop level”, i6 = “Afghanistan withdrawal”, i7 = “COVID-19”, i8 = “affected”, i9 = “economy”, i10 = “Economy grew”, i11 = “vaccination rollout” and i12 = “army withdrawal”. After the required preprocessing, we categorized these instances into semantically relevant hash-buckets based on their likelihood, as discussed in Section 4.2.2. We accessed the information in these buckets through function calls. In case of a hash-hit, the target hash-code is returned; otherwise, the locality sensitive hashing technique is applied to acquire the required hash-codes around the location of the hash-miss in the buckets. The working mechanism of the case study is portrayed in Fig. 5. In response to the above click history, the top three candidate news items, taken from a random system outcome, are reproduced below.
Although the H-SAGE framework generated the news recommendations smoothly, the system’s response makes it evident that the performance still needs considerable improvement. For instance, N1 is reasonable, N2 is ambiguous, and N3 rests entirely on an unrealistic assumption, as shown below.

Fig. 5

Case Study of a daily-based news recommendation scenario

N1: US announces troop withdrawal from Afghanistan; considering an economy overload.
N2: COVID: Bad to economy or Good; currently undecidable.
N3: US announces troop withdrawal, due to COVID-19.

Therefore, based on the case study’s demonstration of the working mechanism, we conclude that H-SAGE is capable of providing satisfactory explanations about its generated recommendations.

Conclusion and future work

In this work, we proposed a novel KGE enhancement approach for recommendation, guided by semantic relevance and hashing. We introduced the Node Relevance Guided-walk (NRG) modeling technique to construct an entity-relevance-based influential graph by capturing higher-order semantically relevant nodes in the KG. We converted the graph to hash-codes for implementation. We proposed dProb to place hash-codes in the same hash-buckets based on their mutual likelihood, so as to maximize the Hash-Hits. We also used dProb to generate feasible hash function-calls. For a Hash-Miss, we applied LS-hashing to extract the required hash-codes from the target hash-bucket around the location of the Hash-Miss and return the information. Finally, we used a predictive interface to evaluate the retrieved hash-codes, compute the preferences and generate the formal recommendation responses. We evaluated H-SAGE on three real-world datasets and compared its performance and time complexity with eight baseline methods. The experimental and theoretical analysis indicates that H-SAGE outperformed the state-of-the-art methods. In future work, we plan to exploit KGEE for recommending drugs and precautionary alerts against the spread of past or ongoing diseases and pandemics. Further, we plan to enhance KGE-based retrieval of semantically interlinked entities and relations to better preserve the actual information structure of the triplets.
