Sámuel G Balogh, Dániel Zagyva, Péter Pollner, Gergely Palla.
Abstract
Hierarchical organisation is a prevalent feature of many complex networks appearing in nature and society. A related, interesting, yet less studied question is how a hierarchical network evolves over time. Here we take a data-driven approach and examine the time evolution of the network between the Medical Subject Headings (MeSH) provided by the National Center for Biotechnology Information (NCBI, part of the U.S. National Library of Medicine). The network between the MeSH terms is organised into 16 different, yearly updated hierarchies such as "Anatomy", "Diseases", "Chemicals and Drugs", etc. The natural representation of these hierarchies is given by directed acyclic graphs, composed of links pointing from nodes higher in the hierarchy towards nodes in lower levels. Due to the yearly updates, the structure of these networks is subject to constant evolution: new MeSH terms can appear, terms becoming obsolete can be deleted or merged with other terms, and already existing parts of the network may be rewired. We examine various statistical properties of the time evolution, with a special focus on the attachment and detachment mechanisms of the links, and find a few general features that are characteristic of all MeSH hierarchies. According to the results, the hierarchies investigated display an interesting interplay between non-uniform preferences with respect to multiple different topological and hierarchical properties.
Year: 2019 PMID: 31404084 PMCID: PMC6690519 DOI: 10.1371/journal.pone.0220648
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Basic hierarchy data.
| ID | Root name | size range | max. depth | average change |
|---|---|---|---|---|
| A | Anatomy | 1350–1826 | 10 | 4.66% |
| B | Organisms | 2252–3815 | 13 | 6.49% |
| C | Diseases | 3975–4799 | 8 | 4.23% |
| D | Chemicals and Drugs | 6902–9934 | 11 | 6.22% |
| E | Analytical, Diagnostic and Therapeutic Techniques and Equipment | 2040–2924 | 9 | 4.89% |
| F | Psychiatry and Psychology | 807–1083 | 7 | 3.60% |
| G | Phenomena and Processes | 1733–2259 | 10 | 15.18% |
| H | Disciplines and Occupations | 334–537 | 8 | 12.07% |
| I | Anthropology, Education, Sociology and Social Phenomena | 449–641 | 9 | 5.23% |
| J | Technology, Industry, Agriculture | 254–582 | 10 | 8.92% |
| K | Humanities | 152–200 | 7 | 3.93% |
| L | Information Science | 322–476 | 9 | 5.82% |
| M | Named Groups | 174–290 | 7 | 5.71% |
| N | Health Care | 1072–1795 | 10 | 4.94% |
| V | Publication Characteristics | 137–163 | 6 | 3.44% |
| Z | Geographicals | 369–402 | 6 | 1.94% |
The 1st column lists the hierarchy ID, the 2nd gives the name of the root, the 3rd column provides the minimum and maximum sizes during the time evolution, the 4th contains the maximum level depth, and finally the 5th column lists the average fraction of links changed within one year.
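To make the last column more concrete, the following is a minimal sketch of how a fraction of changed links between two yearly snapshots could be computed. The normalisation used here (links present in exactly one snapshot, divided by links present in either) is an assumption for illustration; the record above does not spell out the exact definition. The function name and the toy links are hypothetical.

```python
def changed_link_fraction(links_prev, links_next):
    """Fraction of links that differ between two snapshots.

    Assumption: a link counts as 'changed' if it is present in exactly
    one of the two snapshots, and the count is normalised by the number
    of links present in either snapshot.
    """
    prev, nxt = set(links_prev), set(links_next)
    union = prev | nxt
    if not union:
        return 0.0
    return len(prev ^ nxt) / len(union)


# Toy snapshots with hypothetical node names (not real MeSH data).
# Each link is a (source, target) pair pointing down the hierarchy.
links_2002 = {("Body", "Head"), ("Body", "Torso"), ("Head", "Ear")}
links_2003 = {("Body", "Head"), ("Body", "Torso"), ("Head", "Face"), ("Face", "Ear")}

fraction = changed_link_fraction(links_2002, links_2003)
print(f"changed fraction: {fraction:.2f}")
```

Averaging this quantity over all consecutive pairs of years would give one number per hierarchy, comparable in spirit to the last column of the table.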
Fig 1. Changes between subsequent time steps in a MeSH hierarchy.
a) A small part of the hierarchy ‘A’ (Anatomy) in 2002. Red links are deleted in the next time step. b) The corresponding part of the same hierarchy in 2003. Nodes and links colored red are newly appearing elements.
Fig 2. Testing W(x) by simulated attachments.
The property x here corresponds to the number of children. Full symbols connected by continuous lines show the measured W(x) for random attachment (independent of x) in orange (circles), and for preferential attachment with an additive constant (i.e. when a newly added node connects to node i with a probability proportional to x_i + a, where a is an arbitrary constant) in blue (squares). Dashed lines correspond to the analytic mean for W(x), whereas the shaded areas indicate the standard deviation around the mean.
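The two simulated attachment rules in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, growth starts from a single root, each new node attaches to exactly one existing node, and the preferential rule is taken to be a probability proportional to the number of children plus a constant `a` (with `a > 0` so the weights are never all zero). The sketch only generates the attachments; it does not compute W(x), whose definition is not part of this record.

```python
import random


def pick_parent(children_count, a=None, rng=random):
    """Choose the existing node that a newly added node attaches to.

    a is None  -> uniformly random attachment, independent of x
    a is float -> probability proportional to x_i + a, where x_i is the
                  current number of children of node i (assumes a > 0)
    """
    nodes = list(children_count)
    if a is None:
        return rng.choice(nodes)
    weights = [children_count[n] + a for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def grow(n_new, a=None, seed=0):
    """Grow a tree from a single root by adding n_new nodes one at a time."""
    rng = random.Random(seed)
    children_count = {0: 0}  # node 0 is the root
    for new in range(1, n_new + 1):
        parent = pick_parent(children_count, a, rng)
        children_count[parent] += 1
        children_count[new] = 0
    return children_count


random_tree = grow(500)              # random attachment
preferential_tree = grow(500, a=1.0)  # preferential attachment with constant a

print("max children (random):      ", max(random_tree.values()))
print("max children (preferential):", max(preferential_tree.values()))
```

Recording the value of x for the chosen parent at every step, together with the distribution of x over all nodes at that step, would provide the raw material for a preference measure of the kind plotted in the figure.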
Fig 3. Measuring preference in attachment and detachment events.
In each panel we compare Wemp(x) defined in (8) to the mean and standard deviation of W(x) for random events, given in (9) and (10) and indicated by dashed lines and shaded areas. The pictograms beside the panels show the type of the studied attachment/detachment events and highlight in red whether the given property x was measured on the source or on the target of the links involved in the events. a) Results for the total number of descendants of source nodes in attachments of new links pointing from old nodes to new nodes in hierarchies D (orange) and C (blue). b) Wemp(x) for the number of ancestors of source nodes on new links appearing between old nodes, measured in hierarchies D (orange) and C (blue). c) The same plots when x is equal to the number of ancestors of the target nodes in link deletion events for hierarchies D (orange) and C (blue). d) Wemp(x) when x corresponds to the number of ancestors of the target node in attachments of new links between old nodes.
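The node properties used as x in these panels (number of children, parents, descendants and ancestors) can be obtained from the link list of a directed acyclic graph. The sketch below is one straightforward way to do so with plain Python; the function name and the toy graph are hypothetical, and links are assumed to be (source, target) pairs pointing from higher to lower levels.

```python
from collections import defaultdict


def node_properties(links):
    """Count children, parents, descendants and ancestors of every node in a DAG."""
    children = defaultdict(set)
    parents = defaultdict(set)
    nodes = set()
    for src, dst in links:
        children[src].add(dst)
        parents[dst].add(src)
        nodes.update((src, dst))

    def reachable(start, neighbours):
        # Iterative depth-first search; each reachable node is counted once,
        # even if several paths lead to it.
        seen, stack = set(), [start]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {
        n: {
            "children": len(children[n]),
            "parents": len(parents[n]),
            "descendants": len(reachable(n, children)),
            "ancestors": len(reachable(n, parents)),
        }
        for n in nodes
    }


# Toy DAG: node "c" has two parents, so it is not a tree.
toy_links = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c")]
props = node_properties(toy_links)
print(props["root"])
print(props["c"])
```

For large hierarchies a single pass in topological order would be more efficient than one search per node, but the per-node search keeps the sketch short.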
Summary of the results for hierarchy D.
The columns of the table correspond to the different link types studied, and the rows indicate the node property studied on either the source (top 4 rows) or the target (bottom 4 rows). The 3rd, 4th and 5th columns correspond to impossible link types and are therefore left empty. The entries in the cells correspond to the following abbreviations: ‘s+’, ‘s0’ and ‘s−’ for strong indication of preference, no preference and anti-preference, ‘p+’ and ‘p−’ for indication of preference or anti-preference with a peak, and ‘i.s’ for insufficient statistics.
| D | link: add | link: del | |||||||
|---|---|---|---|---|---|---|---|---|---|
| source: new | source: old | source: new | source: old | ||||||
| target: new | target: old | target: new | target: old | target: new | target: old | target: new | target: old | ||
| source | child. | s+ | s+ | s+ | s+ | s+ | |||
| par. | s− | ||||||||
| desc. | p− | s+ | s+ | s+ | p+ | ||||
| anc. | s0 | s− | s− | s− | s0 | ||||
| target | child. | s− | s− | s0 | s0 | ||||
| par. | s+ | s0 | s+ | ||||||
| desc. | s− | s− | s0 | s0 | |||||
| anc. | s+ | s0 | s+ | s0 | p+ | ||||
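
The link types in the column headers of the table above (link added or deleted, source new or old, target new or old) can be sketched as a classification of the differences between two consecutive snapshots. This is an illustrative sketch with hypothetical names: a node is treated as "old" if it appears in any link of the earlier snapshot, which is an assumption. Under that assumption a deleted link necessarily has an old source and an old target, since it existed in the earlier snapshot.

```python
def classify_link_events(links_prev, links_next):
    """Group added and deleted links by (event, source age, target age)."""
    prev, nxt = set(links_prev), set(links_next)
    # Assumption: a node is 'old' if it takes part in any link of the earlier snapshot.
    old_nodes = {node for link in prev for node in link}

    def age(node):
        return "old" if node in old_nodes else "new"

    events = {}
    for src, dst in nxt - prev:
        events.setdefault(("add", age(src), age(dst)), []).append((src, dst))
    for link in prev - nxt:
        # Both endpoints of a deleted link existed in the earlier snapshot.
        events.setdefault(("del", "old", "old"), []).append(link)
    return events


# Toy snapshots with hypothetical node names (not real MeSH data).
snapshot_prev = {("Body", "Head"), ("Head", "Ear")}
snapshot_next = {("Body", "Head"), ("Head", "Face"), ("Face", "Ear")}

events = classify_link_events(snapshot_prev, snapshot_next)
for key, links in sorted(events.items()):
    print(key, links)
```

In the toy example the new node "Face" is inserted between "Head" and "Ear", which produces one added link from an old source to a new target, one added link from a new source to an old target, and one deleted link between old nodes.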
Aggregated summary results.
Based on Table 2 and Tables H-N in S1 Text, the contribution to a given cell is counted according to ‘s+’ = 1, ‘w+’ = ‘p+’ = 0.5, ‘s0’ = 0, ‘w−’ = ‘p−’ = -0.5, ‘s−’ = -1, and the obtained sum is divided by the number of tables contributing to the given cell. An aggregated cell becomes ‘i.s’ if more than 3 out of the 7 tables have ‘i.s’ in that cell as well.
| |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Σ | link: add | link: del | |||||||
| source: new | source: old | source: new | source: old | ||||||
| target: new | target: old | target: new | target: old | target: new | target: old | target: new | target: old | ||
| source | child. | 1.0, | 0.67, | ||||||
| par. | |||||||||
| desc. | 0.89, | 0.83, | 0.61, | ||||||
| asc. | -0.44, | -0.75, | -0.90, | -0.78, | -0.28, | ||||
| target | child. | ||||||||
| par. | |||||||||
| desc. | |||||||||
| asc. | 0.24, | -0.11, | 0.29, | 0.0, | 0.29, | ||||
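
The aggregation rule described in the caption of this table can be sketched as a small scoring function. The function and variable names are hypothetical, the labels are written with an ASCII minus sign, and "tables contributing to the given cell" is interpreted here as the tables whose entry carries a score (neither empty nor ‘i.s’), which is an assumption. The example labels are made up and are not taken from the tables.

```python
# Scores per label, as listed in the caption (ASCII '-' used for the minus sign).
SCORES = {
    "s+": 1.0, "w+": 0.5, "p+": 0.5,
    "s0": 0.0,
    "w-": -0.5, "p-": -0.5, "s-": -1.0,
}


def aggregate_cell(labels, max_insufficient=3):
    """Aggregate the labels that one cell takes across the per-hierarchy tables.

    Returns 'i.s' if more than max_insufficient labels are 'i.s', None if no
    label carries a score, and the mean score otherwise.
    """
    insufficient = sum(1 for label in labels if label == "i.s")
    if insufficient > max_insufficient:
        return "i.s"
    # Assumption: only labels with a score count as contributing tables.
    scored = [SCORES[label] for label in labels if label in SCORES]
    if not scored:
        return None
    return sum(scored) / len(scored)


# Made-up labels for one cell across seven tables.
example = ["s+", "s+", "p+", "s0", "i.s", "s+", "s+"]
value = aggregate_cell(example)
mostly_insufficient = aggregate_cell(["i.s"] * 4 + ["s+"] * 3)
print(value, mostly_insufficient)
```

In the example, six tables contribute scores summing to 4.5, giving an aggregated value of 0.75, while the second cell is marked ‘i.s’ because four of its seven entries lack sufficient statistics.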