
The discrimination power of structural SuperIndices.

Matthias Dehmer, Abbe Mowshowitz.

Abstract

In this paper, we evaluate the discrimination power of structural superindices. Superindices for graphs represent measures composed of other structural indices. In particular, we compare the discrimination power of the superindices with those of individual graph descriptors. In addition, we perform a statistical analysis to generalize our findings to large graphs.


Year:  2013        PMID: 23936227      PMCID: PMC3723667          DOI: 10.1371/journal.pone.0070551

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The absence of a polynomial time algorithm for determining if two arbitrary graphs are isomorphic has stimulated efforts to develop efficient heuristics that work in almost all cases. In particular, research on structural network measures has been undertaken in recent decades, see, e.g., [1]–[6]. Several different types of network measures have been developed. Some of them have been used to characterize the structure of graphs locally or globally [2]–[7]. Others have been used to characterize graphs quantitatively, and these have been applied to problems in areas such as structural chemistry, structural drug design, ecology, and computational physics [2], [8]–[10]. Bonchev [11] and Balaban et al. [12] developed structural indices to detect branching in molecular graphs. In addition to research directed at measuring structural features of a given network, work has been carried out on comparative network measures [13]–[16]. Examples include such work as graph similarity and graph distance measures which have been applied to graph clustering and other problems, see [17]–[19]. Properties of structural measures have also been examined in some detail. Research in this area encompasses investigations of the mathematical interrelations between network measures [20], [21], correlations between measures [22], [23], and their respective discrimination powers (also called uniqueness) [24]–[29]. Discrimination power (or the uniqueness property) is the central concern of this paper. In addition to earlier work on the uniqueness of structural graph measures [24]–[27], [29], [30], Dehmer et al. [28], [31] recently performed large scale analyses of the uniqueness of information-theoretic, degree-based and eigenvalue-based network measures. Here we focus on single indices defined relative to graph decompositions such as those induced by symmetry structure, distances, vertices, chromatic features, etc. 
Such an index is a mapping from a class of graphs to the real numbers and can be interpreted as a graph complexity measure [2], [5], [9]. Single indices interpreted as graph invariants [6] have been studied in areas such as structural chemistry [3], [32] and computer science [33]. We also emphasize that approaches employing single indices to find complete graph invariants have failed so far [32], [34], [35]. A complete graph invariant is an index that distinguishes between all non-isomorphic graphs in a given collection. These approaches fail because every known single index has a certain degree of degeneracy [25], [35]; that is, the measure assigns the same value to some non-isomorphic graphs and therefore cannot distinguish them. Hence, single structural indices are not suitable for determining graph isomorphism, see [35].

In this paper, we explore the uniqueness of so-called superindices [25], [36], [37] for graphs (see section ‘SuperIndices’). Such superindices have been studied in structural chemistry and other disciplines [25], [36], [37]. A superindex is a composition of several structural index components, and is designed to obtain a measure which captures structural information more meaningfully than the individual components by themselves. To the best of our knowledge, the uniqueness of superindices [25] has not yet been explored to any great extent. To this end we use exhaustively generated general graphs [28] rather than any special graph classes such as chemical graphs [26], [29], [30]. The reason for using exhaustively generated general graphs (i.e., graphs without any structural constraints [28]) is to study the uniqueness of the superindices applied to arbitrary graphs. In short, the problem we address is the use of structural superindices that appear useful in determining graph isomorphism. Superindices are not restricted to any particular class of graphs; they can be applied to arbitrary graphs. Furthermore, a graph index is a measure that maps a single graph to the reals.
In contrast, a graph metric [14], [38], [39] is a comparative measure designed to determine the structural similarity between graphs. Those metrics will not be used in this paper. Other graph measures such as the clustering coefficient or degree-based measures do not quantify structural features of graphs meaningfully as they exhibit a high degree of degeneracy [40].

Methods

SuperIndices

Superindices [25], [36], [37] are combinations of existing indices, where “combination” means algebraic or transcendental operations on the component indices. The term superindex was coined by Bonchev et al. [25], who devised superindices to achieve better discrimination between isomers than was possible using individual graph measures. Dehmer et al. [36] applied information-theoretic superindices to the Ames benchmark dataset of Hansen et al. [41] using supervised machine learning. In addition, Pogliani [37] derived certain superindices and demonstrated their power to predict melting points.

Consider a graph class and topological indices (or descriptors) defined on it. Given two such indices, we define nine superindices (Equations 1–9). They were chosen because they are the simplest and most obvious linear combinations of two indices and because they turn out to have high discrimination power, which is, after all, the acid test of the utility of the indices. It is of course possible that other combination methods, based for example on rank reduction techniques such as Singular Value Decomposition, would produce indices with even greater discrimination power. However, that is something to be explored in future papers.

Balaban et al. [12] proposed similar superindices in QSAR/QSPR [12], [42]. That selection proved quite useful and has influenced our choice of superindices for the current study of uniqueness. In the following sections, we analyze the discrimination power of these superindices numerically and statistically. In particular, we demonstrate that some superindices far outperform the underlying single descriptors.
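The idea of composing two component indices into one measure can be illustrated with a small sketch. It is not the definition used in this paper: the plain sum `superindex_sum` is a hypothetical stand-in for one simple linear combination, and the Wiener and first Zagreb indices are implemented here by hand rather than taken from QuACN. The two example trees share a degree sequence, so the first Zagreb index cannot separate them, whereas the combined value can.

```python
from collections import deque


def adjacency(n, edges):
    """Build an adjacency map {vertex: set(neighbours)} from an edge list."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def wiener(adj):
    """Wiener index: sum of shortest-path distances over all vertex pairs."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2  # every unordered pair was counted twice


def zagreb1(adj):
    """First Zagreb index: sum of squared vertex degrees."""
    return sum(len(neighbours) ** 2 for neighbours in adj.values())


def superindex_sum(index_1, index_2):
    """Hypothetical superindex (assumption): the sum of two component indices."""
    return lambda adj: index_1(adj) + index_2(adj)


# Two non-isomorphic trees on 6 vertices with the same degree sequence.
tree_a = adjacency(6, [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)])
tree_b = adjacency(6, [(0, 1), (0, 2), (2, 3), (0, 4), (4, 5)])

combined = superindex_sum(wiener, zagreb1)
for name, tree in (("tree_a", tree_a), ("tree_b", tree_b)):
    print(name, zagreb1(tree), wiener(tree), combined(tree))
```

In this toy case the sum separates the trees only because the Wiener index already does. A sum can also introduce new collisions, which is why the discrimination power of each combination has to be measured rather than assumed.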

Data and Computation

The uniqueness of the superindices listed above has been analyzed on a collection of exhaustively generated graphs [28]. This collection [28] consists of all 261080 non-isomorphic connected graphs on 9 vertices. As in [28], the graphs in this collection were generated by the program geng from the Nauty package [43]. The individual indices as well as the superindices were calculated with the aid of the R package QuACN [44], [45]. The random graph construction model was selected because it yields the most general class of graphs, and seems appropriate for an initial study of the discrimination power of superindices. Other construction methods, e.g., [46], are also of interest, especially because they model many real-world graphs known to exhibit a power law distribution. However, application of the superindices to graphs produced by other construction methods is beyond the scope of the current paper.

Results

Numerical Results

Table 1 presents the QuACN-descriptors [44] with their input options (parameters) and their abbreviations. Superindices with components drawn from the descriptors in Table 1 have been calculated. The results of these computations (discussed below) are shown in Tables 5, 6, 7, 8, 9, 10, 11, 12. Table 4 shows the uniqueness of the individual QuACN-descriptors in terms of their ndv-values, i.e., the number of non-distinguishable values (graphs) for a particular index, and their sensitivity; see [26], [28]. The tables show that only a few of the QuACN-descriptors possess high uniqueness. Examples of such highly discriminating indices are infotheolin2, infotheoquad2, infotheoexp2, infotheoexp3, laplacianEstrada, minBalabanID, eigenvalaugement, eigenvalextadj, eigenvalvertconnect, eigenvalrandomwalk, eigenvalweightedlin, eigenvalweightedexp. High discrimination power has already been observed (see [28], [31]) for some of the indices, namely, information-theoretic measures (e.g., infotheolin2, infotheoquad2, infotheoexp2 etc.) and the entropic eigenvalue-based measures (eigenvalaugement, eigenvalextadj, eigenvalvertconnect etc.) due to Dehmer [47], [48]. Note that the uniqueness of minBalabanID [49] is less than the uniqueness of some of the above-mentioned measures due to Dehmer [28], [31]. Most of the so-called molecular ID numbers (such as minBalabanID) appear to be highly discriminating but have never been evaluated on general graph classes such as exhaustively generated general graphs. It has also been observed that the uniqueness of structural graph indices depends on the graph class under consideration, see [28], [31], [50].
Table 1

Descriptors from QuACN [44] where g denotes an input graph.

| QuACN-descriptors with input options | Abbreviation |
|---|---|
| augmentedZagreb(g) | augmentedZagreb |
| balabanJ(g) | balabanJ |
| balabanlike1(g) | balabanlike1 |
| balabanlike2(g) | balabanlike2 |
| bertz(g) | bertz |
| bonchev1(g) | bonchev1 |
| bonchev2(g) | bonchev2 |
| bonchev3(g) | bonchev3 |
| compactness(g) | compactness |
| complexityIndexB(g) | complexityIndexB |
| randic(g) | randic |
| wiener(g) | wiener |
| zagreb1(g) | zagreb1 |
| zagreb2(g) | zagreb2 |
| harary(g) | harary |
| normalizedEdgeComplexity(g) | normalizedEdgeComplexity |
| radialCentric(g) | radialCentric |
| infoTheoreticGCM(g,coeff = "lin",infofunct = "sphere",lambda = 1000) | infotheolin1 |
| infoTheoreticGCM(g,coeff = "lin",infofunct = "vertcent",lambda = 1000) | infotheolin2 |
| infoTheoreticGCM(g,coeff = "quad",infofunct = "sphere",lambda = 1000) | infotheoquad1 |
| infoTheoreticGCM(g,coeff = "quad",infofunct = "vertcent",lambda = 1000) | infotheoquad2 |
| infoTheoreticGCM(g,coeff = "exp",infofunct = "sphere",lambda = 1000) | infotheoexp1 |
| infoTheoreticGCM(g,coeff = "exp",infofunct = "vertcent",lambda = 1000) | infotheoexp2 |
| infoTheoreticGCM(g,coeff = "exp",infofunct = "degree",lambda = 1000) | infotheoexp3 |
| meanDistanceDeviation(g) | meanDistanceDeviation |
| productOfRowSums(g,log = T) | productofrowsums |
| hyperDistancePathIndex(g) | hyperDistancePathIndex |
| topologicalInfoContent(g) | topologicalinfocontent |
| vertexDegree(g) | vertexDegree |
| graphVertexComplexity(g) | graphVertexComplexity |
| graphIndexComplexity(g) | graphIndexComplexity |
| graphDistanceComplexity(g) | graphDistanceComplexity |
| informationLayerIndex(g) | informationLayerIndex |
| modifiedZagreb(g) | modifiedZagreb |
| minConnectivityID(g) | minConnectivityID |
| laplacianEnergy(g) | laplacianEnergy |
| laplacianEstrada(g) | laplacianEstrada |
| mediumArticulation(g) | mediumArticulation |
| minBalabanID(g) | minBalabanID |
| minConnectivityID(g) | minConnectivityID |
| modifiedZagreb(g) | modifiedZagreb |
| narumiKatayama(g) | narumiKatayama |
| offdiagonal(g) | offdiagonal |
| spanningTreeSensitivity(g) | spanningTreeSensitivity |
| spectralRadius(g) | spectralRadius |
| symmetryIndex(g) | symmetryIndex |
| variableZagreb(g) | variableZagreb |
| eigenvalueBased(g,adjacencyMatrix,1) | eigenvaladj |
| eigenvalueBased(g,laplaceMatrix,1) | eigenvallaplace |
| eigenvalueBased(g,distanceMatrix,1) | eigenvaldistance |
| eigenvalueBased(g,distancePathMatrix,1) | eigenvaldistancepath |
| eigenvalueBased(g,augmentedMatrix,1) | eigenvalaugement |
| eigenvalueBased(g,extendedAdjacencyMatrix,1) | eigenvalextadj |
| eigenvalueBased(g,vertConnectMatrix,1) | eigenvalvertconnect |
| eigenvalueBased(g,randomWalkMatrix,1) | eigenvalrandomwalk |
| eigenvalueBased(g,weightStrucFuncMatrix_lin,1) | eigenvalweightedlin |
| eigenvalueBased(g,weightStrucFuncMatrix_exp,1) | eigenvalweightedexp |
| infoTheoreticGCM(g,coeff = "lin",infofunct = "pathlength",lambda = 1000) | infotheolin3 |
| infoTheoreticGCM(g,coeff = "quad",infofunct = "pathlength",lambda = 1000) | infotheoquad3 |
| infoTheoreticGCM(g,coeff = "exp",infofunct = "pathlength",lambda = 1000) | infotheoexp4 |

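Table 5

ndv-values of the exhaustively generated graphs for different combinations of QuACN-descriptors from the first subset of superindices (Equations 1–5).

| Descriptors | ndv (Eq. 1) | ndv (Eq. 2) | ndv (Eq. 3) | ndv (Eq. 4) | ndv (Eq. 5) |
|---|---|---|---|---|---|
| augmentedZagreb_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| balabanJ_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| balabanlike1_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| balabanlike2_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| bertz_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| bonchev1_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| bonchev2_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| bonchev3_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| compactness_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| complexityIndexB_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| randic_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| wiener_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| zagreb1_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| zagreb2_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| harary_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| normalizedEdgeComplexity_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| radialCentric_eigenvalaugement | 0 | 79673 | 79673 | 79673 | 0 |
| infotheolin1_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| infotheolin2_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| infotheolin2_eigenvalextadj | 0 | 0 | 0 | 0 | 0 |
| infotheoquad1_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| infotheoquad2_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| infotheoquad2_eigenvalextadj | 0 | 0 | 0 | 0 | 0 |
| infotheoexp1_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| infotheoexp2_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| infotheoexp2_eigenvalextadj | 0 | 0 | 0 | 0 | 0 |
| infotheoexp2_eigenvalvertconnect | 0 | 0 | 0 | 2 | 2 |
| infotheoexp2_eigenvalrandomwalk | 0 | 0 | 2 | 2 | 2 |
| infotheoexp3_laplacianEstrada | 0 | 0 | 0 | 0 | 0 |
| infotheoexp3_eigenvallaplace | 0 | 0 | 0 | 0 | 0 |
| infotheoexp3_eigenvaldistance | 0 | 0 | 0 | 0 | 0 |
| infotheoexp3_eigenvaldistancepath | 0 | 0 | 0 | 0 | 0 |
| infotheoexp3_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| meanDistanceDeviation_eigenvalaugement | 0 | 23 | 23 | 23 | 0 |
| hyperDistancePathIndex_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
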
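Table 6

ndv-values of the exhaustively generated graphs for different combinations of QuACN-descriptors from the second subset of superindices (Equations 6–9).

| Descriptors | ndv (Eq. 6) | ndv (Eq. 7) | ndv (Eq. 8) | ndv (Eq. 9) |
|---|---|---|---|---|
| augmentedZagreb_eigenvalaugement | 0 | 0 | 0 | 0 |
| balabanJ_eigenvalaugement | 0 | 0 | 0 | 0 |
| balabanlike1_eigenvalaugement | 0 | 0 | 0 | 0 |
| balabanlike2_eigenvalaugement | 0 | 0 | 0 | 0 |
| bertz_eigenvalaugement | 0 | 0 | 0 | 0 |
| bonchev1_eigenvalaugement | 0 | 0 | 0 | 0 |
| bonchev2_eigenvalaugement | 0 | 0 | 0 | 0 |
| bonchev3_eigenvalaugement | 0 | 0 | 0 | 0 |
| compactness_eigenvalaugement | 0 | 0 | 0 | 0 |
| complexityIndexB_eigenvalaugement | 0 | 0 | 0 | 0 |
| randic_eigenvalaugement | 0 | 0 | 0 | 0 |
| wiener_eigenvalaugement | 0 | 0 | 0 | 0 |
| zagreb1_eigenvalaugement | 0 | 0 | 0 | 0 |
| zagreb2_eigenvalaugement | 0 | 0 | 0 | 0 |
| harary_eigenvalaugement | 0 | 0 | 0 | 0 |
| normalizedEdgeComplexity_eigenvalaugement | 0 | 0 | 0 | 0 |
| radialCentric_eigenvalaugement | 0 | 0 | 0 | 79673 |
| infotheolin1_eigenvalaugement | 0 | 0 | 0 | 0 |
| infotheolin2_eigenvalaugement | 0 | 0 | 0 | 0 |
| infotheolin2_eigenvalextadj | 0 | 0 | 0 | 0 |
| infotheoquad1_eigenvalaugement | 0 | 0 | 0 | 0 |
| infotheoquad2_eigenvalaugement | 0 | 0 | 0 | 0 |
| infotheoquad2_eigenvalextadj | 0 | 0 | 0 | 0 |
| infotheoexp1_eigenvalaugement | 0 | 0 | 0 | 0 |
| infotheoexp2_eigenvalaugement | 0 | 0 | 0 | 0 |
| infotheoexp2_eigenvalextadj | 0 | 0 | 0 | 0 |
| infotheoexp2_eigenvalvertconnect | 2 | 2 | 2 | 2 |
| infotheoexp2_eigenvalrandomwalk | 2 | 2 | 2 | 2 |
| infotheoexp3_laplacianEstrada | 0 | 0 | 0 | 0 |
| infotheoexp3_eigenvallaplace | 0 | 0 | 0 | 0 |
| infotheoexp3_eigenvaldistance | 0 | 0 | 0 | 0 |
| infotheoexp3_eigenvaldistancepath | 0 | 0 | 0 | 0 |
| infotheoexp3_eigenvalaugement | 0 | 0 | 0 | 0 |
| meanDistanceDeviation_eigenvalaugement | 0 | 0 | 0 | 23 |
| hyperDistancePathIndex_eigenvalaugement | 0 | 0 | 0 | 0 |
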
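Table 7

ndv-values of the exhaustively generated graphs for different combinations of QuACN-descriptors from the first subset of superindices (continued).

| Descriptors | ndv (Eq. 1) | ndv (Eq. 2) | ndv (Eq. 3) | ndv (Eq. 4) | ndv (Eq. 5) |
|---|---|---|---|---|---|
| topologicalinfocontent_eigenvalaugement | 0 | 22 | 22 | 22 | 0 |
| vertexDegree_eigenvalaugement | 0 | 22 | 22 | 22 | 0 |
| graphVertexComplexity_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| graphIndexComplexity_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| graphDistanceComplexity_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| minConnectivityID_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| laplacianEnergy_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| laplacianEstrada_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| laplacianEstrada_eigenvalextadj | 0 | 0 | 0 | 0 | 0 |
| mediumArticulation_eigenvalaugement | 0 | 2 | 2 | 2 | 0 |
| minBalabanID_spanningTreeSensitivity | 0 | 57 | 57 | 57 | 0 |
| minBalabanID_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| minBalabanID_eigenvalextadj | 0 | 0 | 0 | 0 | 0 |
| minBalabanID_eigenvalvertconnect | 0 | 0 | 0 | 0 | 0 |
| minBalabanID_eigenvalrandomwalk | 0 | 0 | 0 | 0 | 0 |
| modifiedZagreb_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| offdiagonal_eigenvalaugement | 0 | 30 | 30 | 30 | 0 |
| spanningTreeSensitivity_eigenvalaugement | 0 | 57 | 57 | 57 | 0 |
| spectralRadius_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| symmetryIndex_eigenvalaugement | 0 | 70829 | 70829 | 70829 | 0 |
| variableZagreb_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| eigenvaladj_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| eigenvallaplace_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| eigenvallaplace_eigenvalextadj | 0 | 0 | 0 | 0 | 0 |
| eigenvaldistance_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| eigenvaldistancepath_eigenvalaugement | 0 | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalextadj | 0 | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalvertconnect | 0 | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalrandomwalk | 0 | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalweightedlin | 0 | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalweightedexp | 0 | 0 | 0 | 0 | 0 |
| eigenvalaugement_infotheolin3 | 0 | 0 | 0 | 0 | 0 |
| eigenvalaugement_infotheoquad3 | 0 | 0 | 0 | 0 | 0 |
| eigenvalaugement_infotheoexp4 | 0 | 0 | 0 | 0 | 0 |
| meanDistanceDeviation_eigenvalextadj | 205 | 226 | 220 | 213 | 213 |
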
Table 8

ndv-values of the exhaustively generated graphs for different combinations of QuACN-descriptors from the second subset of superindices (continued).

| Descriptors | ndv (Eq. 6) | ndv (Eq. 7) | ndv (Eq. 8) | ndv (Eq. 9) |
|---|---|---|---|---|
| topologicalinfocontent_eigenvalaugement | 0 | 0 | 0 | 22 |
| vertexDegree_eigenvalaugement | 0 | 0 | 0 | 22 |
| graphVertexComplexity_eigenvalaugement | 0 | 0 | 0 | 0 |
| graphIndexComplexity_eigenvalaugement | 0 | 0 | 0 | 0 |
| graphDistanceComplexity_eigenvalaugement | 0 | 0 | 0 | 0 |
| minConnectivityID_eigenvalaugement | 0 | 0 | 0 | 0 |
| laplacianEnergy_eigenvalaugement | 0 | 0 | 0 | 0 |
| laplacianEstrada_eigenvalaugement | 0 | 0 | 0 | 0 |
| laplacianEstrada_eigenvalextadj | 0 | 0 | 0 | 0 |
| mediumArticulation_eigenvalaugement | 0 | 0 | 0 | 2 |
| minBalabanID_spanningTreeSensitivity | 0 | 0 | 0 | 57 |
| minBalabanID_eigenvalaugement | 0 | 0 | 0 | 0 |
| minBalabanID_eigenvalextadj | 0 | 0 | 0 | 0 |
| minBalabanID_eigenvalvertconnect | 0 | 0 | 0 | 0 |
| minBalabanID_eigenvalrandomwalk | 0 | 0 | 0 | 0 |
| modifiedZagreb_eigenvalaugement | 0 | 0 | 0 | 0 |
| offdiagonal_eigenvalaugement | 0 | 0 | 0 | 30 |
| spanningTreeSensitivity_eigenvalaugement | 0 | 0 | 0 | 57 |
| spectralRadius_eigenvalaugement | 0 | 0 | 0 | 0 |
| symmetryIndex_eigenvalaugement | 0 | 0 | 0 | 70829 |
| variableZagreb_eigenvalaugement | 0 | 0 | 0 | 0 |
| eigenvaladj_eigenvalaugement | 0 | 0 | 0 | 0 |
| eigenvallaplace_eigenvalaugement | 0 | 0 | 0 | 0 |
| eigenvallaplace_eigenvalextadj | 0 | 0 | 0 | 0 |
| eigenvaldistance_eigenvalaugement | 0 | 0 | 0 | 0 |
| eigenvaldistancepath_eigenvalaugement | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalextadj | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalvertconnect | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalrandomwalk | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalweightedlin | 0 | 0 | 0 | 0 |
| eigenvalaugement_eigenvalweightedexp | 0 | 0 | 0 | 0 |
| eigenvalaugement_infotheolin3 | 0 | 0 | 0 | 0 |
| eigenvalaugement_infotheoquad3 | 0 | 0 | 0 | 0 |
| eigenvalaugement_infotheoexp4 | 0 | 0 | 0 | 0 |
| meanDistanceDeviation_eigenvalextadj | 219 | 199 | 215 | 216 |

Table 9

ndv-values of the exhaustively generated graphs for different combinations of QuACN-descriptors from the first subset of superindices (continued).

| Descriptors | ndv (Eq. 1) | ndv (Eq. 2) | ndv (Eq. 3) | ndv (Eq. 4) | ndv (Eq. 5) |
|---|---|---|---|---|---|
| balabanlike1_spanningTreeSensitivity | 208 | 161 | 215 | 221 | 200 |
| infotheolin2_spanningTreeSensitivity | 212 | 203 | 181 | 167 | 220 |
| balabanJ_spanningTreeSensitivity | 214 | 241 | 201 | 177 | 214 |
| hyperDistancePathIndex_eigenvalextadj | 215 | 201 | 207 | 202 | 215 |
| radialCentric_eigenvalextadj | 216 | 79793 | 79789 | 79779 | 232 |
| spanningTreeSensitivity_infotheoquad3 | 216 | 219 | 213 | 171 | 214 |
| balabanlike1_eigenvalextadj | 217 | 201 | 197 | 201 | 211 |
| wiener_eigenvalextadj | 217 | 213 | 205 | 185 | 213 |
| harary_eigenvalextadj | 217 | 187 | 199 | 215 | 217 |
| bonchev2_eigenvalextadj | 219 | 191 | 170 | 209 | 217 |
| symmetryIndex_eigenvalextadj | 221 | 71031 | 71050 | 71040 | 225 |
| eigenvaldistance_infotheoexp4 | 230 | 210 | 202 | 248 | 250 |
| balabanlike2_eigenvalvertconnect | 235 | 245 | 239 | 240 | 257 |
| infotheoexp2_eigenvaldistance | 238 | 190 | 248 | 246 | 248 |
| infotheolin2_eigenvaldistancepath | 240 | 220 | 246 | 248 | 250 |
| infotheoquad2_eigenvaldistance | 242 | 198 | 240 | 238 | 246 |
| eigenvaldistancepath_infotheoexp4 | 242 | 208 | 218 | 250 | 252 |
| infotheolin2_eigenvaldistance | 244 | 216 | 246 | 238 | 244 |
| infotheoexp2_eigenvaldistancepath | 244 | 232 | 246 | 252 | 252 |
| infotheoquad2_eigenvaldistancepath | 246 | 212 | 248 | 246 | 250 |
| balabanJ_eigenvalvertconnect | 247 | 221 | 231 | 253 | 263 |
| balabanlike2_eigenvalrandomwalk | 247 | 249 | 245 | 245 | 259 |
| balabanJ_eigenvalrandomwalk | 255 | 229 | 239 | 261 | 261 |
| balabanlike1_eigenvalvertconnect | 261 | 245 | 237 | 245 | 247 |
| eigenvaladj_infotheoexp4 | 262 | 226 | 222 | 286 | 288 |
| balabanlike1_eigenvalrandomwalk | 263 | 253 | 249 | 247 | 257 |
| infotheolin2_eigenvaladj | 264 | 212 | 280 | 282 | 286 |
| infotheoexp2_eigenvaladj | 264 | 216 | 282 | 280 | 288 |
| infotheoquad2_eigenvaladj | 268 | 216 | 284 | 288 | 288 |
| infotheolin1_eigenvalvertconnect | 273 | 241 | 305 | 297 | 309 |
| infotheoquad1_eigenvalvertconnect | 275 | 245 | 305 | 303 | 303 |
| complexityIndexB_eigenvalvertconnect | 287 | 259 | 295 | 289 | 305 |
| graphDistanceComplexity_eigenvalvertconnect | 287 | 245 | 299 | 303 | 311 |
| narumiKatayama_eigenvalextadj | 519 | 469 | 457 | 463 | 519 |
| graphIndexComplexity_infotheoquad3 | 535 | 515 | 505 | 501 | 537 |

Table 10

ndv-values of the exhaustively generated graphs for different combinations of QuACN-descriptors from the second subset of superindices (continued).

| Descriptors | ndv (Eq. 6) | ndv (Eq. 7) | ndv (Eq. 8) | ndv (Eq. 9) |
|---|---|---|---|---|
| balabanlike1_spanningTreeSensitivity | 200 | 192 | 200 | 217 |
| infotheolin2_spanningTreeSensitivity | 220 | 218 | 232 | 187 |
| balabanJ_spanningTreeSensitivity | 228 | 214 | 214 | 239 |
| hyperDistancePathIndex_eigenvalextadj | 219 | 219 | 219 | 146 |
| radialCentric_eigenvalextadj | 234 | 215 | 220 | 79793 |
| spanningTreeSensitivity_infotheoquad3 | 230 | 228 | 226 | 205 |
| balabanlike1_eigenvalextadj | 215 | 211 | 217 | 187 |
| wiener_eigenvalextadj | 219 | 219 | 219 | 159 |
| harary_eigenvalextadj | 219 | 219 | 219 | 152 |
| bonchev2_eigenvalextadj | 217 | 219 | 217 | 132 |
| symmetryIndex_eigenvalextadj | 231 | 225 | 229 | 71036 |
| eigenvaldistance_infotheoexp4 | 246 | 244 | 246 | 234 |
| balabanlike2_eigenvalvertconnect | 261 | 247 | 257 | 235 |
| infotheoexp2_eigenvaldistance | 250 | 248 | 250 | 246 |
| infotheolin2_eigenvaldistancepath | 248 | 250 | 248 | 246 |
| infotheoquad2_eigenvaldistance | 250 | 250 | 248 | 242 |
| eigenvaldistancepath_infotheoexp4 | 252 | 246 | 252 | 248 |
| infotheolin2_eigenvaldistance | 248 | 246 | 252 | 240 |
| infotheoexp2_eigenvaldistancepath | 250 | 250 | 250 | 244 |
| infotheoquad2_eigenvaldistancepath | 250 | 248 | 252 | 244 |
| balabanJ_eigenvalvertconnect | 261 | 253 | 259 | 249 |
| balabanlike2_eigenvalrandomwalk | 259 | 255 | 263 | 235 |
| balabanJ_eigenvalrandomwalk | 259 | 253 | 263 | 257 |
| balabanlike1_eigenvalvertconnect | 263 | 253 | 263 | 247 |
| eigenvaladj_infotheoexp4 | 294 | 286 | 288 | 270 |
| balabanlike1_eigenvalrandomwalk | 261 | 255 | 263 | 239 |
| infotheolin2_eigenvaladj | 292 | 284 | 292 | 272 |
| infotheoexp2_eigenvaladj | 294 | 284 | 286 | 274 |
| infotheoquad2_eigenvaladj | 290 | 282 | 292 | 272 |
| infotheolin1_eigenvalvertconnect | 313 | 307 | 307 | 293 |
| infotheoquad1_eigenvalvertconnect | 311 | 305 | 305 | 293 |
| complexityIndexB_eigenvalvertconnect | 309 | 307 | 311 | 293 |
| graphDistanceComplexity_eigenvalvertconnect | 307 | 309 | 305 | 295 |
| narumiKatayama_eigenvalextadj | 515 | 519 | 517 | 124837 |
| graphIndexComplexity_infotheoquad3 | 545 | 537 | 539 | 481 |

Table 11

ndv-values of the exhaustively generated graphs for different combinations of QuACN-descriptors from the first subset of superindices (continued).

| Descriptors | ndv (Eq. 1) | ndv (Eq. 2) | ndv (Eq. 3) | ndv (Eq. 4) | ndv (Eq. 5) |
|---|---|---|---|---|---|
| radialCentric_eigenvalvertconnect | 551 | 80113 | 80081 | 80088 | 596 |
| infotheoexp3_graphIndexComplexity | 568 | 549 | 548 | 515 | 577 |
| radialCentric_eigenvalrandomwalk | 597 | 80160 | 80154 | 80130 | 615 |
| topologicalinfocontent_eigenvalvertconnect | 728 | 677 | 713 | 804 | 807 |
| offdiagonal_eigenvalvertconnect | 788 | 815 | 832 | 750 | 845 |
| randic_eigenvalvertconnect | 791 | 825 | 821 | 657 | 839 |
| balabanlike2_laplacianEstrada | 800 | 885 | 1102 | 1067 | 1015 |
| randic_eigenvalrandomwalk | 801 | 839 | 833 | 735 | 851 |
| topologicalinfocontent_eigenvalrandomwalk | 807 | 753 | 777 | 822 | 833 |
| infotheoexp3_symmetryIndex | 812 | 71572 | 71549 | 71565 | 811 |
| balabanlike1_laplacianEstrada | 820 | 861 | 1102 | 1055 | 1026 |
| offdiagonal_eigenvalrandomwalk | 831 | 849 | 854 | 812 | 857 |
| bertz_eigenvalvertconnect | 835 | 783 | 780 | 642 | 819 |
| vertexDegree_eigenvalvertconnect | 835 | 810 | 761 | 899 | 948 |
| mediumArticulation_eigenvalvertconnect | 841 | 841 | 820 | 791 | 887 |
| bertz_eigenvalrandomwalk | 845 | 811 | 818 | 748 | 839 |
| balabanJ_laplacianEstrada | 849 | 1051 | 714 | 841 | 1030 |
| mediumArticulation_eigenvalrandomwalk | 865 | 889 | 860 | 839 | 907 |
| zagreb2_eigenvalvertconnect | 869 | 747 | 743 | 785 | 867 |
| zagreb2_eigenvalrandomwalk | 869 | 795 | 823 | 819 | 867 |
| augmentedZagreb_eigenvalvertconnect | 870 | 752 | 742 | 769 | 866 |
| augmentedZagreb_eigenvalrandomwalk | 870 | 792 | 819 | 821 | 870 |
| laplacianEstrada_spanningTreeSensitivity | 883 | 836 | 812 | 805 | 1027 |
| infotheoexp2_infotheoexp3 | 887 | 858 | 889 | 881 | 916 |
| infotheoquad2_infotheoexp3 | 892 | 857 | 877 | 877 | 919 |
| infotheolin2_infotheoexp3 | 901 | 844 | 875 | 875 | 913 |
| infotheoexp3_infotheoquad3 | 901 | 852 | 845 | 883 | 917 |
| infotheoexp3_infotheoexp4 | 902 | 827 | 866 | 870 | 907 |
| infotheoexp3_infotheolin3 | 907 | 836 | 839 | 879 | 908 |
| vertexDegree_eigenvalrandomwalk | 907 | 896 | 891 | 953 | 956 |
| narumiKatayama_eigenvalvertconnect | 917 | 790 | 795 | 777 | 917 |
| narumiKatayama_eigenvalrandomwalk | 917 | 852 | 857 | 846 | 917 |
| infotheoexp3_minBalabanID | 949 | 889 | 910 | 846 | 939 |

Table 12

ndv-values of the exhaustively generated graphs for different combinations of QuACN-descriptors from the second subset of superindices (continued).

| Descriptors | ndv (Eq. 6) | ndv (Eq. 7) | ndv (Eq. 8) | ndv (Eq. 9) |
|---|---|---|---|---|
| radialCentric_eigenvalvertconnect | 604 | 562 | 575 | 80087 |
| infotheoexp3_graphIndexComplexity | 573 | 561 | 566 | 556 |
| radialCentric_eigenvalrandomwalk | 628 | 603 | 607 | 80150 |
| topologicalinfocontent_eigenvalvertconnect | 831 | 815 | 814 | 732 |
| offdiagonal_eigenvalvertconnect | 851 | 782 | 819 | 814 |
| randic_eigenvalvertconnect | 839 | 839 | 839 | 729 |
| balabanlike2_laplacianEstrada | 1596 | 1033 | 1495 | 112 |
| randic_eigenvalrandomwalk | 857 | 849 | 857 | 781 |
| topologicalinfocontent_eigenvalrandomwalk | 835 | 825 | 827 | 776 |
| infotheoexp3_symmetryIndex | 820 | 777 | 796 | 71592 |
| balabanlike1_laplacianEstrada | 1613 | 1077 | 1479 | 146 |
| offdiagonal_eigenvalrandomwalk | 859 | 817 | 835 | 846 |
| bertz_eigenvalvertconnect | 837 | 843 | 841 | 513 |
| vertexDegree_eigenvalvertconnect | 946 | 918 | 940 | 762 |
| mediumArticulation_eigenvalvertconnect | 899 | 843 | 875 | 810 |
| bertz_eigenvalrandomwalk | 839 | 845 | 845 | 612 |
| balabanJ_laplacianEstrada | 1639 | 1076 | 1465 | 102 |
| mediumArticulation_eigenvalrandomwalk | 905 | 893 | 883 | 864 |
| zagreb2_eigenvalvertconnect | 859 | 869 | 863 | 500 |
| zagreb2_eigenvalrandomwalk | 863 | 869 | 867 | 505 |
| augmentedZagreb_eigenvalvertconnect | 842 | 868 | 868 | 471 |
| augmentedZagreb_eigenvalrandomwalk | 862 | 870 | 868 | 522 |
| laplacianEstrada_spanningTreeSensitivity | 1244 | 767 | 1146 | 205 |
| infotheoexp2_infotheoexp3 | 925 | 907 | 915 | 827 |
| infotheoquad2_infotheoexp3 | 915 | 909 | 913 | 821 |
| infotheolin2_infotheoexp3 | 915 | 920 | 912 | 834 |
| infotheoexp3_infotheoquad3 | 924 | 922 | 915 | 821 |
| infotheoexp3_infotheoexp4 | 917 | 916 | 915 | 811 |
| infotheoexp3_infotheolin3 | 917 | 905 | 914 | 830 |
| vertexDegree_eigenvalrandomwalk | 956 | 932 | 938 | 863 |
| narumiKatayama_eigenvalvertconnect | 917 | 919 | 917 | 125640 |
| narumiKatayama_eigenvalrandomwalk | 917 | 919 | 917 | 125677 |
| infotheoexp3_minBalabanID | 942 | 930 | 945 | 824 |

Table 4

ndv-values and sensitivity values for the individual QuACN-descriptors on the exhaustively generated graphs.

| Descriptors (abbreviation) | ndv | Sensitivity |
|---|---|---|
| augmentedZagreb | 241777 | 0.07394 |
| balabanJ | 156674 | 0.39990 |
| balabanlike1 | 148132 | 0.43262 |
| balabanlike2 | 148132 | 0.43262 |
| bertz | 261080 | 0.00000 |
| bonchev1 | 260971 | 0.00042 |
| bonchev2 | 260803 | 0.00106 |
| bonchev3 | 260971 | 0.00042 |
| compactness | 261072 | 0.00003 |
| complexityIndexB | 237199 | 0.09147 |
| randic | 243413 | 0.06767 |
| wiener | 261072 | 0.00003 |
| zagreb1 | 261078 | 0.00001 |
| zagreb2 | 260931 | 0.00057 |
| harary | 261018 | 0.00024 |
| normalizedEdgeComplexity | 261078 | 0.00001 |
| radialCentric | 261079 | 0.00000 |
| infotheolin1 | 249439 | 0.04459 |
| infotheolin2 | 36310 | 0.86092 |
| infotheoquad1 | 235044 | 0.09972 |
| infotheoquad2 | 27032 | 0.89646 |
| infotheoexp1 | 235055 | 0.09968 |
| infotheoexp2 | 27017 | 0.89652 |
| infotheoexp3 | 1877 | 0.99281 |
| meanDistanceDeviation | 261067 | 0.00005 |
| productofrowsums | 252262 | 0.03378 |
| hyperDistancePathIndex | 261054 | 0.00010 |
| topologicalinfocontent | 261080 | 0.00000 |
| vertexDegree | 261079 | 0.00000 |
| graphVertexComplexity | 260648 | 0.00165 |
| graphIndexComplexity | 44652 | 0.82897 |
| graphDistanceComplexity | 235233 | 0.09900 |
| minConnectivityID | 19842 | 0.92400 |
| laplacianEnergy | 59542 | 0.77194 |
| laplacianEstrada | 23393 | 0.91040 |
| mediumArticulation | 260576 | 0.00193 |
| minBalabanID | 18341 | 0.92975 |
| modifiedZagreb | 258293 | 0.01067 |
| narumiKatayama | 260925 | 0.00059 |
| offdiagonal | 259967 | 0.00426 |
| spanningTreeSensitivity | 44389 | 0.82998 |
| spectralRadius | 48120 | 0.81569 |
| symmetryIndex | 261070 | 0.00004 |
| variableZagreb | 258286 | 0.01070 |
| eigenvaladj | 42347 | 0.83780 |
| eigenvallaplace | 35206 | 0.86515 |
| eigenvaldistance | 23202 | 0.91113 |
| eigenvaldistancepath | 19982 | 0.92346 |
| eigenvalaugement | 0 | 1.00000 |
| eigenvalextadj | 479 | 0.99817 |
| eigenvalvertconnect | 1089 | 0.99583 |
| eigenvalrandomwalk | 1176 | 0.99550 |
| eigenvalweightedlin | 3693 | 0.98585 |
| eigenvalweightedexp | 4402 | 0.98314 |
| infotheolin3 | 158391 | 0.39332 |
| infotheoquad3 | 58196 | 0.77710 |
| infotheoexp4 | 27017 | 0.89652 |

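The two quantities reported in Table 4 can be computed with a short sketch. It assumes that ndv counts every graph whose index value is shared with at least one other graph, and that the sensitivity is (N - ndv) / N for N graphs. The second assumption reproduces the tabulated values: for augmentedZagreb, (261080 - 241777) / 261080 rounds to 0.07394. The function names and the rounding tolerance `decimals` are illustrative choices, not part of QuACN.

```python
from collections import Counter


def ndv(values, decimals=9):
    """Number of non-distinguishable values.

    Assumption: a graph is non-distinguishable when its (rounded) index
    value is shared with at least one other graph in the collection.
    """
    counts = Counter(round(v, decimals) for v in values)
    return sum(c for c in counts.values() if c > 1)


def sensitivity(values, decimals=9):
    """Fraction of graphs that the index identifies uniquely."""
    n = len(values)
    return (n - ndv(values, decimals)) / n


# Toy index values for six graphs: only the first value is unique.
toy_values = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0]
print(ndv(toy_values), sensitivity(toy_values))

# Consistency check against the augmentedZagreb row of Table 4.
print(round((261080 - 241777) / 261080, 5))
```

Rounding before counting matters in practice, since floating-point noise can otherwise make equal index values look distinct.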
Tables 5, 6, 7, 8 present the uniqueness results for certain combinations of descriptors involving the superindices. Each pair of tables shows the results for two subsets of such indices: the first subset consists of Equations 1–5 (e.g., Table 5) and the second subset of Equations 6–9 (e.g., Table 6). For instance, Table 5 shows that most of the superindices discriminate the graphs perfectly (ndv = 0) even when indices with very low uniqueness (such as augmentedZagreb, bertz, wiener etc.) are involved. When the descriptors radialCentric and eigenvalaugement are inserted into the equations representing the superindices, some of the resulting measures are much less discriminating (ndv = 79673 in Tables 5 and 6, which corresponds to a sensitivity of about 0.695). This is due to the fact that radialCentric has little discrimination power (it discriminates only two graphs out of 261080).

A similar effect can be seen in Tables 9, 10, 11, 12. For instance, Table 9 shows that the composition (based on the superindices) of a descriptor with little discrimination power (e.g., narumiKatayama; ndv = 260925, sensitivity 0.00059, see Table 4) with another descriptor having high discrimination power (e.g., eigenvalvertconnect; ndv = 1089, sensitivity 0.99583, see Table 4) leads again to a highly unique measure. In this particular case, using one of the superindices, we find its discrimination power to be ndv = 535, i.e., a sensitivity of about 0.998. The uniqueness of the new measure is better than the uniqueness of the component measures, see Table 9. A more extreme case can be found in Table 12 for the composition of the two descriptors topologicalinfocontent and eigenvalvertconnect. In short, Tables 5, 6, 7, 8, 9, 10, 11, 12 demonstrate that most of the superindices possess high uniqueness even when one of the constituent graph measures has little discrimination power. To better understand the behavior of these indices it would be desirable to explore the structural interpretation of these measures.

Many of the constituent measures have a structural interpretation, either as a branching index [11], [22] (e.g., the Wiener index (wiener)) or as a cyclicity index [12] (e.g., the Balaban index (balabanJ)). A correlation analysis might be used to determine classes of superindices having a distinctive interpretation, e.g., branching, cyclicity, irregularity etc. Such an analysis would involve finding the correlations between pairs of measures. However, this is beyond the scope of the present paper.

Statistical Analysis

To determine the scalability of our findings on the discrimination power of superindices applied to the exhaustively generated graphs, we have performed a statistical analysis. The aim of this analysis is to determine whether or not the results for determining uniqueness are statistically stable for graphs with larger numbers of vertices. Central to this analysis is a method for generating random graphs. We used bootstrapping [51], [52] to estimate the underlying sampling distribution. Consider a graph with n vertices and m edges. The size of the edge set of a connected graph with n vertices satisfies n − 1 ≤ m ≤ n(n − 1)/2. For the statistical analysis see Figures 1 and 2. Samples of random (Erdős-Rényi) graphs have been generated using the R library igraph [53] for several vertex sizes. More precisely, we have generated 50 random graphs for each of the edge sizes considered. A bound on the size of the random sample is dictated by the computational algorithm. The procedure we used is detailed in the following algorithm.
Figure 1

The means of the sensitivity values (see Equation 10) vs. the vertex sizes using the superindices (Left) and (Right).

Figure 2

The means of the sensitivity values vs. the vertex sizes of the generated random graphs using the individual indices from Table 2 only.

Algorithm 1

1. Generate a connected random graph.
2. Add edges randomly between non-adjacent vertices to obtain the required edge sizes.
3. Check each generated random graph for isomorphism with the previously generated graphs. If the newly generated graph is not isomorphic to any of the previously generated graphs, add it to the list, and return to step 1.
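The steps of Algorithm 1 can be sketched in plain Python. This is a toy under stated assumptions, not the procedure used for the paper, which relied on the R library igraph. The starting graph is a random recursive tree (one way to obtain a connected graph with the minimum number of edges), and isomorphism is tested by a brute-force canonical form that is only practical for very small vertex counts. All function names and parameters are hypothetical.

```python
import itertools
import random


def random_tree_edges(n, rng):
    """Step 1 (assumed form): a random connected graph with n - 1 edges,
    built by attaching each new vertex to a random earlier vertex."""
    return {frozenset((v, rng.randrange(v))) for v in range(1, n)}


def add_random_edges(n, edges, m, rng):
    """Step 2: add edges between non-adjacent vertices until m edges exist."""
    missing = [frozenset(pair) for pair in itertools.combinations(range(n), 2)
               if frozenset(pair) not in edges]
    rng.shuffle(missing)
    return set(edges) | set(missing[: m - len(edges)])


def canonical_form(n, edges):
    """Brute-force canonical form: the smallest sorted edge list over all
    vertex relabelings. Exponential in n, so for tiny graphs only."""
    pairs = [tuple(edge) for edge in edges]
    return min(
        tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in pairs))
        for perm in itertools.permutations(range(n))
    )


def sample_nonisomorphic(n, m, k, seed=0, max_tries=10_000):
    """Step 3: keep a new graph only if it is not isomorphic to a kept one."""
    rng = random.Random(seed)
    seen, sample = set(), []
    for _ in range(max_tries):
        if len(sample) == k:
            break
        graph = add_random_edges(n, random_tree_edges(n, rng), m, rng)
        key = canonical_form(n, graph)
        if key not in seen:
            seen.add(key)
            sample.append(graph)
    return sample


graphs = sample_nonisomorphic(n=5, m=6, k=3)
print(len(graphs), [len(g) for g in graphs])
```

The `max_tries` cap plays the role of a bound on the sampling effort: when few non-isomorphic graphs exist for a given vertex and edge count, the loop would otherwise never terminate. For realistic sizes the canonical form would have to be replaced by a dedicated isomorphism routine.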

Performing the computation in Algorithm 1, we obtain complete random samples. For the sake of completeness, we also give the sizes of the random samples generated: in the four cases considered, we obtain 58500, 134650, 242150 and 550650 random graphs in total, respectively. In order to calculate the superindices, we computed all possible (pairwise) combinations of the descriptors given in Table 2. To calculate the mean sensitivity for each descriptor combination, we bootstrapped the samples repeatedly without replacement. Finally, the mean values of all sensitivity values for the superindices, together with their variances, are shown in Figures 1 and 2. The mean values are quite stable. Thus, there is little dependency between the mean sensitivity and the number of vertices of the generated random graphs. In particular, we see that the mean value deteriorates only slightly. In short, Figure 1 strongly supports the hypothesis that the computed superindices have high discrimination power for graphs of increasing size and that the values are quite stable. Indeed, stability could be defined here by the degree of the dependency between the mean sensitivity values and the number of vertices. Note that the analysis whose results are shown in Figure 1 was computationally demanding due to the combinatorial explosion of cases. Hence, repeating the analysis for much larger graphs may not be feasible.
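The resampling step can be sketched as follows. The sketch assumes that "bootstrapping without replacement" means drawing repeated subsamples of index values and averaging the sensitivity over them, with sensitivity defined as the fraction of values that are unique within the subsample. The subsample size, the number of repetitions and the function names are hypothetical placeholders, since the values used in the study are not recoverable from this text.

```python
import random
import statistics
from collections import Counter


def sensitivity(values):
    """Fraction of values that occur exactly once (assumed definition)."""
    counts = Counter(values)
    shared = sum(c for c in counts.values() if c > 1)
    return (len(values) - shared) / len(values)


def mean_sensitivity(values, sample_size, repeats, seed=0):
    """Mean and variance of the sensitivity over repeated subsamples
    drawn without replacement from the index values."""
    rng = random.Random(seed)
    scores = [sensitivity(rng.sample(values, sample_size))
              for _ in range(repeats)]
    return statistics.mean(scores), statistics.pvariance(scores)


# Toy data: index values of 200 graphs, a quarter of which collide pairwise.
toy_values = list(range(150)) + [v for v in range(1000, 1025) for _ in (0, 1)]
print(mean_sensitivity(toy_values, sample_size=100, repeats=50))
```

A small variance across repetitions is what the text calls stable mean values. Comparing the means across vertex sizes then shows how strongly sensitivity depends on graph size.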
Table 2

Individual QuACN-descriptors.

eigenvaladj, balabanlike2
eigenvaldistance, bertz
eigenvaldistancepath, bonchev2
eigenvalextadj, bonchev3
eigenvalrandomwalk, compactness
eigenvalvertconnect, complexityIndexB
eigenvalaugement, eigenvallaplace
eigenvalweightedexp, graphDistanceComplexity
eigenvalweightedlin, mediumArticulation
laplacianEnergy, modifiedZagreb
laplacianEstrada, offdiagonal
spectralRadius, symmetryIndex
infotheoexp1, graphVertexComplexity
infotheoexp2, harary
infotheoexp3, hyperDistancePathIndex
infotheoexp4, meanDistanceDeviation
infotheolin1, variableZagreb
infotheolin2, vertexDegree
infotheolin3, normalizedEdgeComplexity
infotheoquad1, radialCentric
infotheoquad2, randic
infotheoquad3, topologicalinfocontent
augmentedZagreb, wiener
graphIndexComplexity, zagreb1
minBalabanID, zagreb2
balabanJ, narumiKatayama
balabanlike1
In contrast to the superindices, the results in Figure 2 show that the discrimination power of the individual descriptors listed in Table 2 is worse for larger graphs. This is indicated by the mean sensitivity values, which are much lower than those shown in Figure 1. This demonstrates that superindices and have much better discrimination power on the generated random graphs. A likely reason is that the superindices capture structural information more completely than the individual descriptors: for the graph class used here, multiple descriptors capture several different aspects of structural information, which may complement each other and thus provide a (super)index with improved discrimination power.
The results in Figures 1 and 2 summarize the uniqueness of some superindices as a function of the size of randomly generated graphs. We next consider the relationship between uniqueness (measured by ) and the number of graphs in the sample. The results are shown in Figures 3, 4, and 5. Earlier work by Dehmer et al. [28] on superindices restricted the component individual indices to information-theoretic measures. In the present study, we examine how the uniqueness of the superindex, computed from certain descriptor categories, depends on the number of generated random graphs of fixed size ( ). The categories included eigenvalue-based, information-theoretic, distance-based and degree-based descriptors; the descriptors in each category are listed in Table 3. In order to calculate the mean sensitivity using the descriptors of these categories, we bootstrapped the descriptor values times without replacement for each combination to determine of randomly generated graphs ( ). The sample sizes are 100, 1000, 10000, 100000, and 900000.
Figure 3

The means of the sensitivity values vs. the total number of randomly generated graphs using the superindex .

To calculate the superindex, we used all combinations of eigenvalue-based descriptors (Left) and eigenvalue-based and information-theoretic descriptors (Right), see Table 3.

Figure 4

The means of the sensitivity values vs. the total number of randomly generated graphs using the superindex .

To calculate the superindex, we used all combinations of eigenvalue-based and distance-based descriptors (Left) and eigenvalue-based and degree-based descriptors (Right), see Table 3.

Figure 5

The means of the sensitivity values vs. the total number of randomly generated graphs using the superindex .

To calculate the superindex, we used all combinations of distance-based descriptors (Left) and distance-based and degree-based descriptors (Right), see Table 3.

Table 3

Categories of QuACN-descriptors.

Eigenvalue-based descriptors
eigenvaladj, eigenvallaplace, eigenvaldistance, eigenvaldistancepath, eigenvalaugement, eigenvalextadj, eigenvalvertconnect, eigenvalrandomwalk, eigenvalweightedlin, eigenvalweightedexp, laplacianEnergy, laplacianEstrada, spectralRadius
Information-theoretic descriptors
infotheolin1, infotheolin2, infotheoquad1, infotheoquad2, infotheoexp1, infotheoexp2, infotheoexp3, infotheolin3, infotheoquad3, infotheoexp4
Distance-based descriptors
balabanJ, balabanlike1, balabanlike2, bertz, bonchev2, bonchev3, compactness, complexityIndexB, wiener, harary, radialCentric, meanDistanceDeviation, hyperDistancePathIndex, topologicalinfocontent, graphVertexComplexity, graphDistanceComplexity, symmetryIndex, productofrowsums
Degree-based descriptors
augmentedZagreb, randic, zagreb1, zagreb2, vertexDegree, modifiedZagreb, narumiKatayama, offdiagonal, variableZagreb

Figures 3, 4, and 5 show the impact of the underlying category on the above-mentioned dependency. From Figure 3 we see that there is nearly no dependency between and the sample size. A plausible reason for this is the high uniqueness of the underlying individual descriptors of the categories employed, namely, (left) eigenvalue-based descriptors and (right) eigenvalue-based and information-theoretic descriptors (see Table 4). Figure 4 shows a similar result, but there is a slight deterioration of uniqueness when degree-based descriptors are used to calculate the superindex. This seems plausible, as many degree-based measures possess little discrimination power; see, e.g., [31]. The left-hand side of Figure 5 shows the dependency plot for the (pure) category of distance-based measures (see Table 3). Here, the variances are very high and the mean sensitivity values deteriorate substantially as the sample size increases. Again, this can be understood by the low uniqueness of various distance-based graph measures (see Table 4). The right-hand side of Figure 5 shows that this effect is eased for a (mixed) category of descriptors, distance-based and degree-based descriptors in the present case. In summary, we see that the uniqueness of the superindex does not depend much on the sample size when the component descriptors are relatively unique. In our study, this applies to the eigenvalue-based and information-theoretic descriptors.
It is not surprising that we obtained very similar results by using the superindex .

Summary and Conclusion

In the foregoing we examined the discrimination power of structural superindices composed of two or more individual measures (or descriptors) defined on graphs. Our results show that superindices generally have greater discrimination power than individual descriptors. The initial analysis of the superindices was performed on the collection of graphs on nine vertices. In addition, we examined the relative performance of superindices on randomly generated connected graphs on 50, 75, 100, and 150 vertices, respectively. The findings show that the superindices perform consistently across these graph sizes, whereas individual descriptors exhibit declining performance. We conjecture that this superior performance of superindices is attributable to their taking account of multiple structural features of a graph, rather than the single feature captured by individual descriptors. Further research is needed to account for the differences in performance between different superindices, and between superindices and individual descriptors.
  14 in total

1.  Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures.

Authors:  John W Raymond; C John Blankley; Peter Willett
Journal:  J Mol Graph Model       Date:  2003-03       Impact factor: 2.518

2.  Quantitative methods for ecological network analysis.

Authors:  Robert E Ulanowicz
Journal:  Comput Biol Chem       Date:  2004-12       Impact factor: 2.877

3.  Biological network comparison using graphlet degree distribution.

Authors:  Natasa Przulj
Journal:  Bioinformatics       Date:  2007-01-15       Impact factor: 6.937

4.  Complexity of chemical graphs in terms of size, branching, and cyclicity.

Authors:  A T Balaban; D Mills; V Kodali; S C Basak
Journal:  SAR QSAR Environ Res       Date:  2006-08       Impact factor: 3.000

5.  On entropy-based molecular descriptors: statistical analysis of real and synthetic chemical structures.

Authors:  Matthias Dehmer; Kurt Varmuza; Stephan Borgert; Frank Emmert-Streib
Journal:  J Chem Inf Model       Date:  2009-07       Impact factor: 4.956

6.  Benchmark data set for in silico prediction of Ames mutagenicity.

Authors:  Katja Hansen; Sebastian Mika; Timon Schroeter; Andreas Sutter; Antonius ter Laak; Thomas Steger-Hartmann; Nikolaus Heinrich; Klaus-Robert Müller
Journal:  J Chem Inf Model       Date:  2009-09       Impact factor: 4.956

7.  Entropy and the complexity of graphs. I. An index of the relative complexity of a graph.

Authors:  A Mowshowitz
Journal:  Bull Math Biophys       Date:  1968-03

8.  Novel topological descriptors for analyzing biological networks.

Authors:  Matthias M Dehmer; Nicola N Barbarini; Kurt K Varmuza; Armin A Graber
Journal:  BMC Struct Biol       Date:  2010-06-17

9.  Structural discrimination of networks by using distance, degree and eigenvalue-based measures.

Authors:  Matthias Dehmer; Martin Grabner; Boris Furtula
Journal:  PLoS One       Date:  2012-07-06       Impact factor: 3.240

10.  A large scale analysis of information-theoretic network complexity measures using chemical structures.

Authors:  Matthias Dehmer; Nicola Barbarini; Kurt Varmuza; Armin Graber
Journal:  PLoS One       Date:  2009-12-15       Impact factor: 3.240
