Y Hulovatyy, H Chen, T Milenković.
Abstract
MOTIVATION: With the increasing availability of temporal real-world networks, how can these data be studied efficiently? One can model a temporal network as a single aggregate static network, or as a series of time-specific snapshots, each being an aggregate static network over the corresponding time window. Then, one can use established methods for static analysis on the resulting aggregate network(s), but this loses valuable temporal information, either completely or at the interface between different snapshots, respectively. Here, we develop a novel approach for studying a temporal network more explicitly, by capturing inter-snapshot relationships.
Year: 2015 PMID: 26072480 PMCID: PMC4765862 DOI: 10.1093/bioinformatics/btv227
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Illustration of the difference between static and dynamic graphlets. (a) All nine static graphlets with up to four nodes, along with their 15 ‘node symmetry groups’ (or formally, automorphism orbits) (Milenković and Pržulj, 2008; Pržulj ). Within a given graphlet, different orbits are denoted by different node colors. For example, there is a single orbit in graphlet G2, as all three nodes are topologically identical to each other. But there are two orbits in graphlet G1, as the two end nodes are topologically identical to each other but not to the middle node (and vice versa). (b) All dynamic graphlets with up to three events, along with their automorphism orbits. Multiple events along the same edge are separated with commas. Node colors correspond to different orbits. (c) All four dynamic graphlets D whose static backbone is G1.
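The orbit counts quoted in this caption can be checked with a small brute-force sketch. It is not the authors' code: the function name `automorphism_orbits`, the node labels and the descriptive graphlet names are assumptions made for illustration, and only the labels G1 (three-node path) and G2 (triangle) are taken from the caption. Two nodes fall into the same orbit when some edge-preserving relabeling of the graphlet maps one onto the other.

```python
from itertools import permutations

def automorphism_orbits(nodes, edges):
    """Group nodes into orbits by brute force over all node permutations."""
    edge_set = {frozenset(e) for e in edges}
    orbit_of = {v: {v} for v in nodes}
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        image = {frozenset((mapping[u], mapping[v])) for u, v in edges}
        if image != edge_set:
            continue  # this relabeling does not preserve the edges
        for v in nodes:
            # v and its image are topologically identical: merge their orbits
            merged = orbit_of[v] | orbit_of[mapping[v]]
            for w in merged:
                orbit_of[w] = merged
    return {frozenset(group) for group in orbit_of.values()}

# The nine connected graphs on two to four nodes; each edge is a two-letter string.
# Names other than G1 and G2 are descriptive labels chosen for this sketch.
GRAPHLETS = {
    "edge": ("ab", ["ab"]),
    "G1 (3-node path)": ("abc", ["ab", "bc"]),
    "G2 (triangle)": ("abc", ["ab", "bc", "ac"]),
    "4-node path": ("abcd", ["ab", "bc", "cd"]),
    "3-star": ("abcd", ["ab", "ac", "ad"]),
    "4-cycle": ("abcd", ["ab", "bc", "cd", "da"]),
    "tailed triangle": ("abcd", ["ab", "bc", "ca", "ad"]),
    "diamond": ("abcd", ["ab", "bc", "cd", "da", "ac"]),
    "4-clique": ("abcd", ["ab", "ac", "ad", "bc", "bd", "cd"]),
}

orbit_counts = {
    name: len(automorphism_orbits(nodes, edges))
    for name, (nodes, edges) in GRAPHLETS.items()
}
```

Under these assumptions, `orbit_counts` gives two orbits for the path G1, one for the triangle G2, and 15 in total over the nine graphlets. Enumerating every permutation is only practical for graphlets this small.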
Fig. 2. Illustration of how we extend a dynamic graphlet with an additional more recent event, on the example of D9. There are seven possible extensions of D9 (which contains four nodes and three events) with the most recent event 4 (shown in bold) into a dynamic graphlet with four events. Five of the extensions keep the same number of nodes but increment the number of events, while the remaining two extensions increment both the number of nodes and events. Note that in order to extend D9 with event 4, at least one of the nodes involved in event 3 has to participate in event 4 as well.
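The extension rule in this caption can be sketched as follows. This is an illustrative reconstruction rather than the authors' implementation: the function name `extend_with_event`, the placeholder label `"new"`, and the choice of representing D9 as the event sequence a-b, b-c, c-d are all assumptions. The sketch also does not merge isomorphic results, which a full enumeration of dynamic graphlets would need to do.

```python
from itertools import combinations

def extend_with_event(events, new_node="new"):
    """Return every event list obtained by appending one more recent event.

    `events` is a time-ordered list of node pairs. The appended event must
    share at least one node with the last event, and may introduce at most
    one node not seen before (labelled `new_node`, a hypothetical name).
    """
    nodes = sorted({v for event in events for v in event})
    last = set(events[-1])
    extended = []
    for u, v in combinations(nodes + [new_node], 2):
        if last & {u, v}:  # the new event touches a node of the latest event
            extended.append(events + [(u, v)])
    return extended

# Assumed layout for D9: three events along a four-node path.
d9 = [("a", "b"), ("b", "c"), ("c", "d")]
extensions = extend_with_event(d9)
same_nodes = [e for e in extensions if "new" not in e[-1]]
more_nodes = [e for e in extensions if "new" in e[-1]]
```

With this layout the sketch produces seven candidate extensions, five on the existing four nodes and two that add a fifth node, matching the counts stated in the caption.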
Fig. 3. Illustration of our dynamic graphlet counting procedure. The temporal network is a sequence of three snapshots. Dashed lines denote instances of the same node in different snapshots. Colored lines denote the path of how the temporal network is explored in order to count the given dynamic graphlet. Regular dynamic graphlet counting will detect all three of the dynamic graphlets D1 (involving nodes c and f), D2 (involving nodes c, d and f), and D9 (involving nodes a, b, c and d). Constrained dynamic graphlet counting (Supplementary Section S2) will detect only the first two dynamic graphlets, but not D9. This is because nodes c and d are interacting in both the second and third snapshots. That is, according to constrained counting, the event between c and d at time t3, which is necessary for identifying a graphlet D9 in the network, is considered to be redundant to the event between c and d at time t2. As such, the event between c and d at time t3 is ignored by constrained counting and thus no D9 can be detected.
Fig. 4. Comparison of the graphlet approaches in the context of network and node classification, in terms of (a) AUPR and AUROC values and (b) precision-recall curves. For each method, the highest-scoring graphlet size is chosen. For other parameter choices, see Supplementary Tables S3 and S5.
Precision of the different methods in the context of aging at the two k values when considering all node pairs (left) and ignoring all node pairs in which both genes are non-aging-related (right)
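| Method | Considering all node pairs (first k) | Considering all node pairs (second k) | Ignoring non-aging-related node pairs (first k) | Ignoring non-aging-related node pairs (second k) |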
|---|---|---|---|---|
| Static | 0.981 | 0.981 | 0.492 | 0.489 |
| Static-temporal | 0.992 | 0.992 | 0.915 | |
| Dynamic | 0.784 | |||
| Constrained dynamic | 0.993 | 0.993 | 0.684 | 0.681 |
| Random | 0.850 | 0.851 | 0.041 | 0.035 |
For each method, the highest-scoring graphlet size is chosen. In a column, the value in bold is the best result over all methods.
Overlaps (given as percentages of the smaller of the compared data sets), along with P-values of the overlaps (shown in parentheses), of (1) genes, (2) enriched functions and (3) enriched diseases, between each graphlet approach’s novel predictions and the three independent ‘ground truth’ aging-related datasets (BrainExpression2004Age, BrainExpression2008Age and SequenceAge), for the two k values
The first two ‘ground truth’ datasets have been derived via gene expression analyses, whereas the latter has been derived via genomic sequence analyses; see Faisal and Milenković (2014) for details. ‘N/A’ is shown when there are fewer than two objects (genes, functions or diseases) in the overlap. Statistically significant P-values (at 0.05 threshold) are shown in bold. Note that low P-values are extremely encouraging, since we are aiming to validate novel aging-related knowledge. For the same reason, it is not necessarily discouraging when a result is not statistically significant (Faisal and Milenković, 2014). Also, note that a larger relative (percent) overlap between two sets does not necessarily mean a lower P-value, as the P-value also depends on the size of the two sets of interest; see the description of the hypergeometric test in e.g. Faisal and Milenković (2014). There are 3 × 3 × 2 = 18 combinations of overlap type, ‘ground truth’ data, and value of k. For each combination, the value in gray is the best result over the four graphlet methods. By ‘best result’, we mean the lowest P-value, if at least one of the four P-values is significant; otherwise, we mean the largest overlap, unless the overlap is 0. In , and of the combinations, (constrained) dynamic graphlets are superior, comparable or inferior, respectively, to static and static-temporal graphlets.
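The remark that a larger percent overlap need not give a lower P-value can be illustrated with a minimal hypergeometric sketch. The function name `overlap_pvalue`, the background size of 1000 and the set sizes below are hypothetical values chosen for the example, not numbers from the study, and the exact test setup in Faisal and Milenković (2014) may differ.

```python
from math import comb

def overlap_pvalue(population, size_a, size_b, overlap):
    """P(X >= overlap) when size_b items are drawn at random from a
    population of which size_a items belong to set A (hypergeometric tail)."""
    total = comb(population, size_b)
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, i) * comb(population - size_a, size_b - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical background of 1000 genes.
# Tiny set: 1 of 2 genes shared (50% of the smaller set).
p_small_sets = overlap_pvalue(1000, 500, 2, 1)
# Larger sets: 30 of 100 genes shared (30% of the smaller set).
p_large_sets = overlap_pvalue(1000, 100, 100, 30)
```

In this toy setting the 50% overlap is not significant at the 0.05 threshold, while the 30% overlap is, because the second comparison involves much larger sets relative to what chance alone would produce.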