G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.

Xiaohong Wang, Aaron Smalter, Jun Huan, Gerald H Lushington.

Abstract

Structured data, including sets, sequences, trees, and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains, including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents.

Most current graph indexing methods focus on subgraph query processing, i.e., determining the set of database graphs that contain the query graph, and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions (i) have high computational complexity and (ii) are difficult to index in a graph database.

Our objective is to bridge graph kernel functions and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our similarity measurement builds upon local features extracted from each node and its neighboring nodes in graphs. A hash table is used to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and to support fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Most importantly, the new similarity measurement and index structure are scalable to large databases, with smaller index size, faster index construction time, and faster query processing time than state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.
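
The idea of hashing per-node local features and evaluating a kernel through the hash table can be illustrated with a minimal Python sketch. This is not the G-hash implementation: the feature definition (a node's label plus a histogram of its neighbors' labels), the exact-match counting kernel, and the names `node_features` and `LocalFeatureIndex` are all assumptions chosen for illustration, and the toy molecules ignore hydrogens and bond types.

```python
from collections import Counter, defaultdict


def node_features(labels, edges):
    """Return one hashable local feature per node.

    Assumed feature: (node label, sorted histogram of neighbor labels).
    """
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    feats = []
    for node, label in labels.items():
        hist = Counter(labels[n] for n in neighbors[node])
        feats.append((label, tuple(sorted(hist.items()))))
    return feats


class LocalFeatureIndex:
    """Hash table mapping each local feature to per-graph occurrence counts."""

    def __init__(self):
        self.table = defaultdict(Counter)  # feature -> {graph_id: count}

    def add(self, graph_id, labels, edges):
        for feat in node_features(labels, edges):
            self.table[feat][graph_id] += 1

    def kernel_scores(self, labels, edges):
        """Counting kernel: sum over shared features of query count * graph count."""
        scores = Counter()
        query_counts = Counter(node_features(labels, edges))
        for feat, q_count in query_counts.items():
            # Only graphs sharing this feature are touched, so no full scan is needed.
            for graph_id, g_count in self.table.get(feat, {}).items():
                scores[graph_id] += q_count * g_count
        return scores

    def knn(self, labels, edges, k):
        """Return the k database graphs with the highest kernel value."""
        return self.kernel_scores(labels, edges).most_common(k)


# Toy database: heavy-atom skeletons as (node labels, edge list).
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
methanol = ({0: "C", 1: "O"}, [(0, 1)])
propane = ({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2)])

index = LocalFeatureIndex()
index.add("ethanol", *ethanol)
index.add("methanol", *methanol)
index.add("propane", *propane)

top2 = index.knn(*ethanol, k=2)
print(top2)
```

Because a query only visits hash buckets for features it actually contains, the cost depends on the number of matching entries rather than on the database size. The unnormalized kernel used here favors large graphs; a normalized variant would be a natural refinement, but that choice is outside what the abstract specifies.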


Year: 2009 PMID: 20428322 PMCID: PMC2860326 DOI: 10.1145/1516360.1516416

Source DB: PubMed Journal: Adv Database Technol

1. Application of kernel functions for accurate similarity search in large chemical databases.

Authors: Xiaohong Wang; Jun Huan; Aaron Smalter; Gerald H Lushington
Journal: BMC Bioinformatics Date: 2010-04-29 Impact factor: 3.169
