William Qian1, Christopher W Lynn2,3,4, Andrei A Klishin5, Jennifer Stiso5, Nicolas H Christianson6, Dani S Bassett1,5,7,8,9,10. 1. Department of Physics and Astronomy, College of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104. 2. Initiative for the Theoretical Sciences, Graduate Center, City University of New York, New York, NY 10016. 3. Joseph Henry Laboratories of Physics, Princeton University, Princeton, NJ 08544. 4. Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544. 5. Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104. 6. Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125. 7. Department of Electrical and Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104. 8. Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104. 9. Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104. 10. Santa Fe Institute, Santa Fe, NM 87501.
Abstract
Precisely how humans process relational patterns of information in knowledge, language, music, and society is not well understood. Prior work in the field of statistical learning has demonstrated that humans process such information by building internal models of the underlying network structure. However, these mental maps are often inaccurate due to limitations in human information processing. The existence of such limitations raises clear questions: Given a target network that one wishes for a human to learn, what network should one present to the human? Should one simply present the target network as-is, or should one emphasize certain parts of the network to proactively mitigate expected errors in learning? To investigate these questions, we study the optimization of network learnability in a computational model of human learning. Evaluating an array of synthetic and real-world networks, we find that learnability is enhanced by reinforcing connections within modules or clusters. In contrast, when networks contain significant core-periphery structure, we find that learnability is best optimized by reinforcing peripheral edges between low-degree nodes. Overall, our findings suggest that the accuracy of human network learning can be systematically enhanced by targeted emphasis and de-emphasis of prescribed sectors of information.
Keywords:
complex networks; graph learning; maximum entropy
From a young age, humans demonstrate the capacity to learn the relationships between concepts (1–3). During the learning process, humans are exposed to discrete chunks of information that combine and interconnect to form cognitive maps that can be represented as complex networks (4–9). These chunks of information often appear in a natural sequential order, such as words in language, notes in music, and abstract concepts in stories and classroom lectures (10–14). Further, these sequences are encoded in the brain as networks, with links between items reflecting observed transitions (see refs. 15–18 for empirical studies and 19 for a recent review). Broadly, the fact that many different types of information exhibit temporal order (and therefore network structure) motivates investigations into the processes that underlie the human learning of transition networks (8, 19, 20).

To understand the network-learning process, recent studies have investigated how humans internally construct abstract representations of associations (21–23). Using a variety of approaches, from computational models to artificial neural networks, such studies have consistently found that the mind builds network representations by integrating information over time. Such integration enables humans to compress exact sequences of experienced events into broader, but less precise, representations of context (24). These mental representations allow learners to make better generalizations about new information, at the cost of accuracy (22). Here, we focus on one particular modeling approach that accounts for the temporal integration and inaccuracies inherent in human learning. In particular, we build upon a maximum-entropy model, which posits that the mind learns a network representation of the world in a manner guided by a tradeoff between accuracy and complexity (21, 25). Specifically, in order to conserve mental resources, humans will tend to reduce the complexity of their representations at the cost of accuracy by allowing for errors during the learning process.

While inaccuracies in human learning can aid flexibility across contexts, they present fundamental obstacles for the human comprehension of transition networks. Thus, a clear question emerges: What strategies should be employed to most effectively communicate the structure of a network to an inaccurate human learner? Prior studies of animal communication and behavior have demonstrated the utility of exaggerating the presentation of certain signals to receivers in offsetting erroneous information processing (26, 27). Similarly, one could imagine that, by emphasizing some features of a network over others, one may be able to correct for errors in human learning. Such an approach of targeted modulation of emphasis may be helpful not only in learning a whole network, but also in optimally learning particularly challenging parts of a network. In fact, humans show consistent difficulties in learning certain motifs in networks, such as the connections between modules (21, 28–30). Taken together, these observations suggest that disproportionately weighting specific network features that are difficult to learn may facilitate human network learning.
Mathematical Methods
To study the optimization of network learnability, we first require a model describing how humans learn networks. Following ref. 21, we write down an analytic model that captures a wide range of human behaviors observed in network-learning experiments. Specifically, we consider a transition probability matrix A that describes random walks on the observed network, where A_ij represents the probability of a transition to node j when starting from node i. Then, after observing a sufficient number of random walks, the human’s internal representation of the network converges to an analytic form f(A) (ref. 21), in which the single parameter β reflects the accuracy of the human learner. In the limit β → 0, the learned network structure f(A) is a fully connected network with uniform edge weights and, hence, bears no resemblance to the actual network A. In contrast, in the limit β → ∞, the learning process is free of errors, and the learned network structure f(A) is an exact replica of the actual network A.

The following question then arises: Given a transition network A and a human learner with accuracy β, what is the optimal input transition network A_in that, when presented to a human learner, results in a perceived network structure f(A_in) that most closely matches the true structure A? In general, there is no reason to presume that it is optimal to present the learner with the true network (such that A_in = A). Indeed, teaching and other forms of communication often involve the purposeful emphasis or exaggeration of some pieces of information over others (31–33). Thus, it is possible that modulating emphasis on certain network features, in a precise and targeted manner that serves to counteract natural biases or expected errors, might enhance learnability. Moreover, optimal emphasis strategies are likely to vary from person to person, depending critically upon the accuracy β of the human’s learning process.

One natural approach to answering this generic question is to find the input matrix A_in such that the learned representation f(A_in) is equal to the target network A. From the analytic form of f, one can derive for a given β an analytical form for the input A_in such that f(A_in) = A holds exactly. While this approach is mathematically elegant, it has limited application to real-world scenarios because the resulting A_in often contains negative entries or requires inverting a singular matrix and is, hence, an ill-defined transition matrix. To overcome this hurdle, one can instead characterize how a perceived network structure f(A_in) diverges from the true structure A using the Kullback–Leibler divergence D_KL(A || f(A_in)). We choose the Kullback–Leibler divergence for our analyses because of its connections with information theory and its usage in previous papers that study this model of human network learning (20, 21, 25). In particular, the Kullback–Leibler divergence, as used in this context, can be interpreted as a measure of the inefficiency of the learned representation. To determine the optimal input A_in such that A_in is a well-defined transition matrix, we determine a weighted network with adjacency matrix G_in such that the corresponding transition matrix A_in minimizes the Kullback–Leibler divergence between the learned structure f(A_in) and the true structure A. Practically, we implement this strategy using dual annealing, which is a powerful and common method for bounded optimization (34–36). For simplicity, we restrict our analysis to undirected networks.

For large input networks with many edges, this optimization process can become computationally unwieldy. To address this issue, we only consider input networks that respect the symmetries of A: All structurally unique edges in A must have the same edge weight in G_in. Further, we only consider the inclusion of edges in G_in when they have a counterpart in A with nonzero weight. In this manner, the network-optimization process can be parameterized by a significantly smaller number of trainable values for networks with a high degree of symmetry.

To investigate possible strategies for enhancing transition-network learnability, we apply the optimization method to two transition networks: a modular network and a lattice network. Both of these networks have 15 vertices and share the property that every node has degree 4. Importantly, previous human experiments were able to directly estimate the accuracy parameter β of human learning in these networks (21). Next, to explore the optimization of learnability in asymmetric networks with nonuniform transition probabilities, we consider the optimization of learnability for networks constructed from generative network models. Lastly, to probe how real-world information networks ought to be designed, we investigate how learnability can be maximized for semantic networks extracted from college mathematics textbooks.

In performing these numerical experiments, we are guided by several hypotheses. Specifically, in considering prior work demonstrating the efficacy of exaggeration in animal communication (26, 27), we predict that strategic modulation of emphasis will significantly improve learnability in both synthetic and real-world information networks. Furthermore, in view of prior work demonstrating that highly clustered networks are more learnable than lattice or random networks (25), we hypothesize that optimal emphasis-modulation strategies will reinforce connections within clusters and de-emphasize connections across clusters. And, finally, given the human tendency to attend to salient information, we hypothesize that core–periphery networks will be best learned by emphasizing edges in the network periphery, which would otherwise be less easily learned. Taken together, our numerical experiments aim to assess whether and how the accuracy of human network learning can be systematically enhanced by targeted emphasis and de-emphasis of network features.
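As a minimal sketch of the two quantities used throughout this section, the snippet below computes a learned representation and its Kullback–Leibler divergence from the true network. Two things here are assumptions rather than statements from the text above: the closed form f(A) = (1 − e^(−β)) A (I − e^(−β) A)^(−1), which is our reading of the model in ref. 21, and the weighting of each row of the divergence by the stationary distribution of the random walk. The 5-node ring is a hypothetical toy network.

```python
import numpy as np

def transition_matrix(G):
    """Row-normalize a weighted adjacency matrix into transition probabilities."""
    return G / G.sum(axis=1, keepdims=True)

def learned_representation(A, beta):
    """Assumed form (ref. 21): f(A) = (1 - e^-beta) A (I - e^-beta A)^-1."""
    eta = np.exp(-beta)
    return (1.0 - eta) * A @ np.linalg.inv(np.eye(len(A)) - eta * A)

def kl_divergence(G, A_hat):
    """D_KL(A || A_hat), with rows weighted by the stationary distribution.

    Assumption: for an undirected network the stationary distribution is
    proportional to the (weighted) node degree.
    """
    A = transition_matrix(G)
    pi = G.sum(axis=1) / G.sum()
    mask = A > 0  # terms with A_ij = 0 contribute nothing
    return float(np.sum((pi[:, None] * A)[mask] * np.log(A[mask] / A_hat[mask])))

# Hypothetical toy network: an undirected ring of 5 nodes.
G = np.zeros((5, 5))
for i in range(5):
    G[i, (i + 1) % 5] = G[(i + 1) % 5, i] = 1.0
A = transition_matrix(G)

kl_inaccurate = kl_divergence(G, learned_representation(A, beta=0.1))
kl_moderate = kl_divergence(G, learned_representation(A, beta=1.0))
kl_accurate = kl_divergence(G, learned_representation(A, beta=20.0))
print(kl_inaccurate, kl_moderate, kl_accurate)
```

Under these assumptions, every row of f(A) still sums to one, and the divergence shrinks toward zero as β grows, which matches the two limits described above.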
Results
Optimizing the Learnability of Graph Exemplars.
The modular graph exemplar.
We begin by studying the learning optimization of the modular graph shown in Fig. 1, which has been used in human learning studies (15, 28–30). In this graph, there are only three structurally unique edges: cross-cluster edges (orange), boundary edges that are adjacent to cross-cluster edges (green), and edges deep within modules (gray). Thus, the structure of the input transition matrix A_in can be determined by two free parameters, λ_cc and λ_b, representing the weights of cross-cluster and boundary edges in the input network, respectively, relative to the weight of deep edges. Note that one free parameter has been removed due to the constraint that A_in is normalized, such that all rows sum to one.
Fig. 1.
Optimizing the learnability of a modular graph. (A) A modular graph with 15 nodes, each with degree k = 4, resulting in 30 edges. (B–D) The Kullback–Leibler divergence ratio (less than one indicates enhanced learnability) across a section of the (λ_cc, λ_b) parameter space, for different values of β. For increased contrast, the ratios have been truncated to a fixed range. (B) Results for a value of β corresponding roughly to the median accuracy of human learners in prior studies (21). (C) Results for a value of β corresponding to the mean accuracy of human learners in prior studies (21). (D) Results for β = 5, corresponding to an exceptionally accurate network learner. (E) The optimal edge weights λ_cc and λ_b. (F) The Kullback–Leibler divergence between the learned network and the true network for different values of β, both with and without input network optimization.
To illustrate the parameter regimes where certain combinations of the cross-cluster weight λ_cc and the boundary weight λ_b are effective in enhancing learnability, we first computed the Kullback–Leibler divergence ratio, D_KL(A || f(A_in)) / D_KL(A || f(A)), over a section of the (λ_cc, λ_b) parameter space for three different values of β (Fig. 1 B–D). This ratio characterizes the learnability that can be achieved by targeted modulation of emphasis in the network. Specifically, a ratio of less than one would indicate that the emphasized network improves learnability over the true network.

Interestingly, at low values of β (Fig. 1B), when the learning process is highly inaccurate, there are two regimes in which an emphasized network structure improves learnability: one that heavily de-emphasizes boundary edges and one that moderately de-emphasizes cross-cluster edges. For intermediate values of β (such as in Fig. 1C), the two optimal regimes combine into one. As the learning accuracy increases further (Fig. 1D, β = 5), the one optimal regime decreases in size and converges to the true network structure. Thus, for extremely precise learners, the only reasonable network structure to present would be the true network structure, corresponding to λ_cc = λ_b = 1 (Fig. 1D, β = 5).

To assess the precise values of edge weights that lead to optimal learning of the modular graph, we minimize D_KL(A || f(A_in)) with respect to λ_cc and λ_b at different values of β (Fig. 1E). We find that cross-cluster edges are always de-emphasized, whereas boundary edges are overemphasized for inaccurate learners, but de-emphasized for the average human learner. We present a graphical depiction of the optimal input network and the resulting learned structure for the modular graph in Fig. 2.
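The two-parameter optimization described above can be sketched with SciPy's `dual_annealing`. This is an illustrative reconstruction, not the authors' code: the closed form of f(A), the stationary-distribution weighting of the Kullback–Leibler divergence, the names `lam_cc` and `lam_b`, the search bounds, and the value of `BETA` are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import dual_annealing

BETA = 0.3  # hypothetical learner accuracy, chosen only for illustration

def modular_graph():
    """15-node modular graph: three 5-node clusters, every node of degree 4."""
    edge_class = {}  # (i, j) -> "cross", "boundary", or "deep"
    for c in range(3):
        nodes = list(range(5 * c, 5 * c + 5))
        left, right = nodes[0], nodes[-1]  # boundary nodes of this cluster
        for a in range(5):
            for b in range(a + 1, 5):
                i, j = nodes[a], nodes[b]
                if (i, j) == (left, right):
                    continue  # the two boundary nodes are not linked
                on_boundary = i in (left, right) or j in (left, right)
                edge_class[(i, j)] = "boundary" if on_boundary else "deep"
        edge_class[(right, 5 * ((c + 1) % 3))] = "cross"
    return edge_class

def weighted_graph(edge_class, lam_cc=1.0, lam_b=1.0):
    """Adjacency matrix with deep edges at weight 1 and the other classes scaled."""
    weight = {"cross": lam_cc, "boundary": lam_b, "deep": 1.0}
    G = np.zeros((15, 15))
    for (i, j), kind in edge_class.items():
        G[i, j] = G[j, i] = weight[kind]
    return G

def transition_matrix(G):
    return G / G.sum(axis=1, keepdims=True)

def learned_representation(A, beta):
    # Assumed form of f(A), following our reading of ref. 21.
    eta = np.exp(-beta)
    return (1.0 - eta) * A @ np.linalg.inv(np.eye(len(A)) - eta * A)

def kl_divergence(G, A_in, beta):
    # Assumption: rows are weighted by the stationary distribution of A.
    A = transition_matrix(G)
    pi = G.sum(axis=1) / G.sum()
    A_hat = learned_representation(A_in, beta)
    mask = A > 0
    return float(np.sum((pi[:, None] * A)[mask] * np.log(A[mask] / A_hat[mask])))

edge_class = modular_graph()
G = weighted_graph(edge_class)
baseline = kl_divergence(G, transition_matrix(G), BETA)

def objective(params):
    G_in = weighted_graph(edge_class, lam_cc=params[0], lam_b=params[1])
    return kl_divergence(G, transition_matrix(G_in), BETA)

# Start from the unmodified network so the result is never worse than baseline.
result = dual_annealing(objective, bounds=[(0.01, 3.0), (0.01, 3.0)],
                        x0=np.array([1.0, 1.0]), maxiter=200)
lam_cc_opt, lam_b_opt = result.x
kl_ratio = result.fun / baseline
print(f"lam_cc={lam_cc_opt:.3f}, lam_b={lam_b_opt:.3f}, KL ratio={kl_ratio:.3f}")
```

Because dual annealing is stochastic, the reported weights may differ slightly between runs; a KL ratio below one indicates that the reweighted input is learned more accurately than the true network under the assumed model.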
Fig. 2.
Optimal emphasis modulation of the modular and lattice networks. Here, we show the learned networks resulting from human learning of the modular and lattice networks, respectively (A and B, Upper), as well as from the modular and lattice networks optimized for learnability (A and B, Lower). Optimized and learned networks were both computed at the same value of β. Edge thickness indicates transition probabilities.
The lattice-graph exemplar.
To understand how optimizing network learnability varies with the topology of the target network, we also study the optimization of learnability of a lattice graph that was examined in human learning studies (21, 29). While we find qualitative differences in the efficacy of the optimization process for modular and lattice graphs, we find, similarly, that small, cluster-like substructures in the lattice graph are emphasized to maximize learnability, whereas edges between these substructures are de-emphasized. We present a graphical depiction of the optimal input network and the resulting learned structure for the lattice graph in Fig. 2.
A Sierpiński graph exemplar.
Next, to assess whether the strategy of overemphasizing edges within clusters and de-emphasizing those between clusters extends to larger networks with more complex community organization, we also consider a Sierpiński network with hierarchical community structure. Consistent with previous findings, we find that de-emphasizing cross-cluster edges at all hierarchical levels of organization is an effective strategy for optimizing learnability. In addition, we observe that cross-cluster edges at the highest level of organization ought to be de-emphasized the most.
Optimizing the Learnability of Generated Networks.
Stochastic block networks.
Thus far, we have found that the learnability of networks with modular structure is optimized by overemphasizing the edges within clusters of nodes and de-emphasizing the edges between clusters. However, our analyses have focused on networks with high degrees of structural symmetry. Here, we extend our analysis to randomly generated networks, studying the optimization of learnability in stochastic block networks.

We consider two classes of stochastic block networks: 1) stochastic block networks in the absence of specific structure–degree correlations, where all cross-cluster edges are equally likely to be included, and all within-cluster edges are equally likely to be included; and 2) degree-corrected stochastic block networks with heterogeneous degree distributions. These classes were chosen to assess whether degree heterogeneity, a common feature of real-world networks (37, 38), influences the efficacy of strategies for enhancing network learnability in modular networks. In particular, we consider networks with N = 200 nodes, 5 communities, and a fixed average degree (Fig. 3 A and B). For a given stochastic block network G with a normalized transition matrix A, we parameterize the network presented to learners by a single parameter λ, representing the weight of edges between clusters relative to the weight of edges within clusters. We then compute the cross-cluster weight λ that optimizes the learnability of the transition network A.
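The single-parameter scan over the cross-cluster weight λ can be sketched as follows. Several choices are simplifications or assumptions for illustration: the network is smaller than the N = 200 networks studied here, the block probabilities `p_in` and `p_out` and the value of `BETA` are hypothetical, a grid scan stands in for dual annealing, and the form of f(A) and the weighting of the Kullback–Leibler divergence follow our reading of ref. 21.

```python
import numpy as np

BETA = 0.3  # hypothetical learner accuracy

def sample_sbm(n, n_comm, p_in, p_out, rng):
    """Sample an undirected stochastic block network with no isolated nodes."""
    labels = np.repeat(np.arange(n_comm), n // n_comm)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    while True:
        upper = np.triu(rng.random((n, n)) < probs, k=1)
        G = (upper | upper.T).astype(float)
        if G.sum(axis=1).min() > 0:  # resample if any node has degree zero
            return G, same

def transition_matrix(G):
    return G / G.sum(axis=1, keepdims=True)

def learned_representation(A, beta):
    # Assumed form of f(A), following our reading of ref. 21.
    eta = np.exp(-beta)
    return (1.0 - eta) * A @ np.linalg.inv(np.eye(len(A)) - eta * A)

def kl_divergence(G, G_in, beta):
    # Assumption: rows are weighted by the stationary distribution of A.
    A = transition_matrix(G)
    pi = G.sum(axis=1) / G.sum()
    A_hat = learned_representation(transition_matrix(G_in), beta)
    mask = A > 0
    return float(np.sum((pi[:, None] * A)[mask] * np.log(A[mask] / A_hat[mask])))

rng = np.random.default_rng(0)
G, same = sample_sbm(n=60, n_comm=5, p_in=0.5, p_out=0.02, rng=rng)
baseline = kl_divergence(G, G, BETA)

# Scale every cross-cluster edge by lam; within-cluster edges keep weight 1.
lambdas = np.arange(1, 41) * 0.05  # 0.05, 0.10, ..., 2.00 (includes lam = 1)
kls = np.array([kl_divergence(G, np.where(same, G, lam * G), BETA)
                for lam in lambdas])
best_lambda = lambdas[np.argmin(kls)]
best_ratio = kls.min() / baseline
print(f"optimal cross-cluster weight ~ {best_lambda:.2f}, KL ratio ~ {best_ratio:.3f}")
```

Because the grid contains λ = 1 (the unmodified network), the reported ratio cannot exceed one; how far it falls below one depends on the sampled network and on β.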
Fig. 3.
Optimizing the learnability of synthetic modular networks. (A and B) Examples of a standard stochastic block network and a degree-corrected stochastic block network. Node sizes are proportional to node degrees, with cross-cluster edges shown in purple and orange, respectively. (C and D) The optimal cross-cluster edge weight λ for enhancing learnability versus the fraction f of edges within communities at different values of β. Results are shown for stochastic block networks and degree-corrected stochastic block networks, respectively. (E and F) The Kullback–Leibler (KL) divergence ratio achieved with optimal cross-cluster edge weights at different values of β. Results are shown for stochastic block networks and degree-corrected stochastic block networks, respectively. The findings reported in C–F represent results obtained for networks with N = 200 nodes, 5 communities, and a fixed average degree. Each curve is an average over the results from 25 generated networks.
We begin by analyzing the enhancement of learnability for stochastic block networks without structure–degree correlations. In Fig. 3C, we show the optimal cross-cluster weight λ as a function of the fraction f of edges chosen to be within communities. Importantly, we find that the optimal cross-cluster weight decreases considerably as the modularity of the target stochastic block networks increases. Moreover, for higher β, we find that optimally emphasized networks maintain more weight on cross-cluster edges, consistent with our earlier analysis of the modular network (Fig. 1). In addition, increases in the learnability of stochastic block networks (reductions of Kullback–Leibler divergence ratios) are most prominent for values of f above 0.8 and increase considerably as β decreases (Fig. 3E).

To determine whether degree heterogeneity impacts the optimization of network learnability, we study degree-corrected stochastic block networks. For such networks, we find that the optimal cross-cluster edge weight λ decreases faster with increasing f than for standard stochastic block networks (Fig. 3D). Interestingly, for degree-corrected stochastic block networks, the improvements in learnability are significantly larger than for regular stochastic block networks (Fig. 3 E and F). For both types of networks, increases in network learnability are most pronounced for low values of β and peak for highly clustered networks with a fraction of within-community edges f = 0.92. This finding indicates that highly modular networks are most optimizable through cross-cluster weight tuning.
Watts–Strogatz networks.
Just as we generalized the analysis of modular networks to stochastic block networks, we can also extend our analysis of lattice networks to a wide range of randomly generated Watts–Strogatz networks. Consistent with previous analysis, we find that edges in Watts–Strogatz networks that contribute to local, lattice-like clustering are emphasized when maximizing learnability.
Optimizing the Learnability of Semantic Networks Extracted from Mathematics Textbooks.
Our results thus far have demonstrated that, for many classes of synthetic networks, edges that contribute to local clustering or intramodular connections are reinforced to maximize learnability. Still, it remains to be demonstrated that these results extend more generally to real-world information networks. To probe the optimal emphasis modulation strategies of real-world networks, we study semantic networks extracted from college-level linear algebra textbooks (39, 40). Specifically, nodes represent recurring concepts (e.g., “vector space” and “invertible”), and edges between concepts are weighted by the number of sentences in which the two concepts co-occur.

In previous analyses, we were able to reduce the number of free parameters in the network-learnability optimization process by considering either network symmetry or a partitioning of the edges into classes that are made distinguishable by the network-generation process (e.g., cross-cluster edges or nonring edges). However, given that these semantic networks are empirical, when optimizing a network representation to maximize learnability, we cannot reduce the number of optimization parameters a priori, and, instead, we must vary all edges with nonzero weight as free parameters. Specifically, for a semantic network G with edge weights w_e and normalized transition matrix A, we determine a weighted graph G_in such that its corresponding normalized transition matrix A_in minimizes D_KL(A || f(A_in)). In particular, for every edge e, the factor by which the edge e is scaled in G_in is used as an optimization parameter. As before, these parameters are then simultaneously optimized via dual annealing to minimize the Kullback–Leibler divergence cost function. Then, we reduce the number of free parameters by one by enforcing the requirement that the total sum of edge weights in G_in equals that of G. Doing so allows for more interpretable comparisons between the optimized network and the original semantic network.

Interestingly, we find that for these semantic networks, very little improvement in learnability can be achieved at extremely low values of β (reflecting poor learning accuracy), but significant enhancement of learnability is possible for all other values of β (Fig. 4C). In particular, the network-optimization process yields the most benefit for moderately accurate learners. This observation contrasts greatly with our prior findings in studying modular networks: that the greatest improvements in learnability occur at low values of β, with significantly diminishing improvements at higher β values. One natural explanation for this difference is that the semantic networks do not possess large-scale community structure, but, rather, can be characterized as possessing core–periphery structure with community structure within the periphery nodes (39). Therefore, the learnability-optimization strategies for modular networks, which mainly involved overemphasizing edges within clusters and de-emphasizing cross-cluster edges, are likely not applicable to increasing learnability of these semantic networks.
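The per-edge parameterization used for the semantic networks, with one scaling factor per edge and the total edge weight held fixed, can be sketched on a small example. The 6-node weighted graph below is hypothetical and stands in for a textbook co-occurrence network; `BETA`, the bounds on the scaling factors, the assumed form of f(A), and the stationary-distribution weighting of the Kullback–Leibler divergence are likewise assumptions.

```python
import numpy as np
from scipy.optimize import dual_annealing

BETA = 1.0  # hypothetical learner accuracy

# Hypothetical weighted co-occurrence network: (node i, node j, weight).
EDGES = [(0, 1, 5), (0, 2, 4), (1, 2, 6), (0, 3, 2),
         (1, 4, 1), (2, 5, 1), (3, 4, 1), (4, 5, 1)]
N = 6
w = np.array([e[2] for e in EDGES], dtype=float)

def build_graph(weights):
    """Symmetric adjacency matrix with the given weight on each edge."""
    G = np.zeros((N, N))
    for (i, j, _), weight in zip(EDGES, weights):
        G[i, j] = G[j, i] = weight
    return G

def scaled_weights(scales):
    """Scale each edge, then rescale so the total weight matches the original."""
    w_in = scales * w
    return w_in * (w.sum() / w_in.sum())

def transition_matrix(G):
    return G / G.sum(axis=1, keepdims=True)

def learned_representation(A, beta):
    # Assumed form of f(A), following our reading of ref. 21.
    eta = np.exp(-beta)
    return (1.0 - eta) * A @ np.linalg.inv(np.eye(len(A)) - eta * A)

def kl_divergence(G, G_in, beta):
    # Assumption: rows are weighted by the stationary distribution of A.
    A = transition_matrix(G)
    pi = G.sum(axis=1) / G.sum()
    A_hat = learned_representation(transition_matrix(G_in), beta)
    mask = A > 0
    return float(np.sum((pi[:, None] * A)[mask] * np.log(A[mask] / A_hat[mask])))

G = build_graph(w)
baseline = kl_divergence(G, G, BETA)

def objective(scales):
    return kl_divergence(G, build_graph(scaled_weights(scales)), BETA)

# One scaling factor per edge; starting at all ones reproduces the original network.
result = dual_annealing(objective, bounds=[(0.05, 5.0)] * len(EDGES),
                        x0=np.ones(len(EDGES)), maxiter=100)
optimal_scaling = scaled_weights(result.x) / w
kl_ratio = result.fun / baseline
print(np.round(optimal_scaling, 2), round(kl_ratio, 3))
```

Rescaling after the fact, rather than eliminating one parameter explicitly, is a design choice made here for brevity. Row normalization makes the transition matrix insensitive to a global rescaling, so fixing the total weight removes only a redundant degree of freedom while keeping the scaling factors comparable to the original weights.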
Fig. 4.
Optimizing the learnability of semantic networks extracted from college mathematics textbooks. (A) A schematic of how edges in the semantic networks were classified based on core–periphery node classification and periphery community structure. (B) The optimal weight scaling for each of the four classes of edges shown at different values of β, averaged over all semantic networks. (C) The Kullback–Leibler (KL) divergence ratio achieved with optimized weight scaling at different values of β. Results are shown for each of the 10 semantic networks corresponding to the 10 college-level linear algebra textbooks (61–70). (D) The distribution of optimized edge-weight scalings for the classes of edges at a fixed value of β, aggregated over all semantic networks. Prob., probability. (E and F) The optimal edge-weight scaling versus edge-betweenness centrality and edge-degree centrality, respectively, aggregated over all semantic networks at a fixed value of β. Each datapoint represents an average over 500 edges binned by centrality score.
Optimizing the learnability of semantic networks extracted from college mathematics textbooks. (A) A schematic of how edges in the semantic networks were classified based on core–periphery node classification and periphery community structure. (B) The optimal weight scaling for each of the four classes of edges shown at different values of β, averaged over all semantic networks. (C) The Kullback–Leibler (KL) divergence ratio achieved with optimized weight scaling at different values of β. Results are shown for each of the 10 semantic networks corresponding to the 10 college-level linear algebra textbooks (61–70). (D) The distribution of optimized edge-weight scalings for the classes of edges at , aggregated over all semantic networks. Prob., probability. (E and F) The optimal edge-weight scaling versus edge-betweenness centrality and edge-degree centrality, respectively, aggregated over all semantic networks for . Each datapoint represents an average over 500 edges binned by centrality score.
To understand which edges are reinforced or de-emphasized to increase learnability, for each semantic network, we follow the procedure outlined in ref. 39 and classify nodes into core and periphery categories. Then, for each network, we determine the community structure of the periphery nodes and categorize the edges of each network into four categories (Fig. 4A): edges between core nodes; edges between a core node and a periphery node; cross-cluster edges between periphery nodes; and within-cluster edges between periphery nodes. For each class of edges, we compute the mean optimal weight scaling over all 10 semantic networks for (Fig. 4B). Notably, we find that for , edges between core nodes are de-emphasized the most, whereas edges between periphery nodes are reinforced. This finding is sensible, as the core of a core–periphery network is densely connected. Therefore, any particular edge within the core could be de-emphasized to suppress potential spurious connections to nearby periphery nodes resulting from inaccurate learning. In addition, among the two classes of periphery–periphery edges, those that connect two nodes within the same periphery community tend to be exaggerated, whereas cross-cluster periphery–periphery edges are de-emphasized. These observations regarding cross-cluster and within-cluster periphery–periphery edges are consistent with our prior analyses of optimal cross-cluster edge weights in modular networks, which also suggest that cross-cluster weights should be de-emphasized.
We can further characterize the types of edges that are either overemphasized or underemphasized by comparing changes in edge weight with structural measures of centrality. First, we consider the relationship between edge-weight scaling and edge-betweenness centrality, a metric that quantifies the frequency with which shortest paths pass through a given edge. Given that cross-cluster edges have high edge-betweenness, it is natural to expect that edges with high betweenness will be de-emphasized when optimizing learnability. Indeed, we observe that, aside from edges with an edge-betweenness centrality of zero, there is a clear inverse relationship between edge-betweenness centrality and optimal edge-weight scaling in semantic networks (Fig. 4E).
We also consider edge-weight scaling and edge-degree centrality, a metric that quantifies the average weighted degree of the two connected nodes. Edges within the core of a core–periphery network are likely to be incident on nodes with greater connectivity and, thus, would generally have higher edge-degree centrality. We therefore expect that edge-degree centrality will be inversely related to optimal edge-weight scaling. Consistent with these expectations, we observe that edges with lower edge-degree centrality tend to be reinforced more, and edges with higher edge-degree centrality tend to be de-emphasized more (Fig. 4F).
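The four-way edge classification described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy concept labels, and the scaling values are hypothetical and are not taken from the paper's data. The core set and periphery community labels are assumed to have been computed beforehand.

```python
def classify_edge(u, v, core, community):
    """Assign an edge to one of the four classes used in the text.

    core      -- set of nodes classified as core (all others are periphery)
    community -- dict mapping each periphery node to its community label
    """
    u_core, v_core = u in core, v in core
    if u_core and v_core:
        return "core-core"
    if u_core or v_core:
        return "core-periphery"
    if community[u] == community[v]:
        return "periphery within-cluster"
    return "periphery cross-cluster"


def mean_scaling_by_class(scalings, core, community):
    """Average the optimized weight scaling over the edges in each class."""
    totals, counts = {}, {}
    for (u, v), scale in scalings.items():
        label = classify_edge(u, v, core, community)
        totals[label] = totals.get(label, 0.0) + scale
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


# Hypothetical toy inputs (illustrative values only, not results from the paper).
core = {"vector", "matrix"}
community = {"kernel": 0, "nullity": 0, "trace": 1}
scalings = {
    ("vector", "matrix"): 0.6,
    ("vector", "kernel"): 0.9,
    ("kernel", "nullity"): 1.4,
    ("kernel", "trace"): 0.8,
}
means = mean_scaling_by_class(scalings, core, community)
print(means)
```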
Performance of Network-Optimization Strategies in the Transient Network-Learning Regime.
Thus far, our analysis has implicitly assumed that human network learners are allowed to observe infinite sequences of network transitions (21, 25). While this assumption greatly simplifies optimization strategies, it is possible that such strategies break down when the number of transitions that human learners are allowed to observe is limited. To address this possibility, we ran simulations of the network-learning process in the transient regime with three different network-learning strategies: 1) maximum-likelihood estimation (optimal in the infinite observation limit), 2) standard human network learning (as reported in ref. 21), and 3) optimized human learning (as described in this paper). Both the standard and optimized human learning strategies were evaluated at , which is close to the mean learning accuracy reported in ref. 21. These simulations were run for the modular graph with 15 nodes (Fig. 1) and the semantic networks extracted from the linear algebra textbooks authored by Axler and Edwards (Fig. 1 , respectively).
For the modular network, optimized human learning maintained an edge in accuracy over standard human network learning as the number of observed transitions increased (Fig. 5A). Both learning strategies were outperformed by maximum-likelihood estimation throughout the duration of the simulated learning processes. Remarkably, for the semantic networks, both human learning and optimized human learning initially outperformed maximum-likelihood estimation in accuracy, with optimized human learning maintaining its superiority for a longer duration (Fig. 5 B and C). One plausible explanation is that the inductive biases introduced in the human learning process enable humans to initially learn clustered areas of networks more efficiently than unbiased maximum-likelihood estimation strategies.
Fig. 5.
Performance of network-learning strategies in the transient regime. Each panel shows the Kullback–Leibler (KL) divergence between some true network and the learned network as a function of the number of observed transitions. Three network-learning strategies are shown: maximum-likelihood estimation (optimal in the limit of infinite observations), standard human network learning (supported by ref. 21), and optimized human network learning (introduced in this paper). All plots report 10 simulations of each network-learning strategy, with human learning and optimized human learning simulations run at , close to the median learning accuracy reported in ref. 21. The three networks analyzed are the modular network with 15 nodes (A) (Fig. 1), the semantic network extracted from the linear algebra textbook authored by Axler (B) (61), and the semantic network extracted from the linear algebra textbook authored by Edwards (C) (63).
Discussion
In this article, we study how networks presented to human learners can be tuned to increase learnability. In particular, using a computational model of human learning, we compute the optimal network to present to a human learner so as to minimize the discrepancy between the learned representation and the target network. First, these methods were used to analyze two simple networks: a modular graph and a lattice graph. We find that for both graphs, improvements in learnability can be made by de-emphasizing edges that connect different modules or clusters of nodes and by exaggerating edges within modules or small clusters. This finding is consistent with studies of in silico models and in vivo animal behavior of sampling spaces. Animals and computational models exhibit nonrandom patterns of exploration in order to better sample an environment with nonregular network structure (41), effectively emphasizing and overrepresenting specific harder-to-learn portions of the environment (42). Further, these improvements increase considerably in magnitude for highly inaccurate human learners, but are less advantageous for accurate learners. Importantly, for inaccurate learners, the optimal input networks for both modular and lattice graphs result in internal network representations that capture clusters in the original network in a near-perfect manner, but poorly capture edges between small clusters or modules. Notably, edges between communities or clusters are already naturally difficult to learn in the absence of disproportionate edge-weighting in the input network (21, 25), but are found to be worth de-emphasizing further when optimizing overall learnability. 
Our findings are consistent with prior work showing that the difficulty of learning cross-cluster edges in modular networks is robust to the size and number of modules in the network (43).
Then, to probe whether our findings with the modular and lattice networks extended more generally to larger, more complex networks lacking a high degree of structural symmetry or uniform transition probabilities, we analyzed the optimization of learnability of networks generated from generative network models. We began by studying the optimization of stochastic block networks. Importantly, we observed that for stochastic block networks with a high fraction of edges within communities, significant gains in network learnability can be achieved by tuning only a single parameter representing the weight of all cross-cluster edges. Specifically, we observed that stochastic block networks optimized for learnability de-emphasized cross-cluster edges. Next, motivated by the prevalence of heterogeneous degree distributions among real-world networks (37, 38, 44–46), we investigated the optimization of learning for degree-corrected stochastic block networks. By applying a similar single-parameter optimization approach, we found that degree-corrected stochastic block networks share similar learning-optimization properties with standard stochastic block networks when tuning cross-cluster edge weights. However, the efficacy of these optimization strategies in improving network learnability was found to be slightly higher for degree-corrected stochastic block networks. This finding suggests that the learning of networks with hierarchically modular organization can be improved significantly more (using this cross-cluster edge-weight tuning) than can random modular networks. 
Taken together with prior work showing that hierarchically modular networks share similar information-theoretic properties with a large class of real-world networks (25) that random modular networks lack, these findings have implications for how features of real-world information networks ought to be weighted or designed.
Then, to understand optimal strategies for enhancing the learnability of real-world information networks, we analyzed how semantic networks extracted from college-level mathematics textbooks can be reweighted to maximize learnability. These networks exhibit core–periphery structure, indicating that they are composed of nodes that can roughly be divided into a densely connected core and a periphery that is loosely connected to nodes within the core (47–49). In addition, prior work has established that the periphery of these semantic networks possesses community structure (39). Importantly, we find that, unlike modular networks, the semantic networks are not very optimizable near β = 0, but are significantly optimizable, even for moderately large values of β. One explanation for the difference in optimization near β = 0 between the modular networks and semantic networks is that, for modular networks, optimal input networks approach disconnected graphs as β → 0. Typically, as β → 0, the learned representation from any input network approaches a uniform network with no particular structure (21, 25). However, a competing limit may occur when studying the optimization of learnability of modular networks, in that cross-cluster weights may also approach zero. In this limit, a network presentation approaches disconnected components, with each component representing a module of the original network. 
Thus, when β is taken close to zero, learning inaccuracies primarily strengthen edges within modules, causing only minor decreases in overall accuracy of the optimized network, as nodes within modules or clusters are already densely connected with each other in the original target network. In contrast, for networks without overall community structure, such as the semantic networks extracted from the mathematical texts, the optimal input graph always remains connected as β → 0, and, thus, any learned representation will approach an all-to-all network with uniform transition probabilities in this limit.
To characterize the features of these semantic networks that become reinforced or de-emphasized when optimizing for learnability, we categorize the edges in each of the semantic networks by the core–periphery status of their endpoints, as well as by cross-cluster or within-cluster participation in periphery community structure. We find that edges contained solely among the periphery of these networks are emphasized the most, whereas core edges are de-emphasized, even at higher β. These findings suggest that the learnability of information networks can generally be enhanced by placing additional emphasis on relationships between less commonly occurring concepts and de-emphasizing highly central concepts. To assess this idea further, we investigated how the centrality measures of edge betweenness and edge degree were associated with the scaling of edge weights for these semantic networks. In particular, we found that both centrality measures are negatively correlated with optimal edge-weight scaling, confirming that, in these semantic networks, edges that are highly central generally should be de-emphasized to maximize learnability.
The optimization of networks for human learnability presents new angles for understanding real-world information networks. 
Recent work has demonstrated that science texts covering the same topic may vary greatly in their lexicon profiles and didactic approach (50). Understanding how edges in information networks associated with educational materials should be modulated may also be an insightful way of profiling the semantic content of educational materials. Furthermore, while we have explored the optimization of semantic networks, other real-world information networks, such as phonological networks, may exhibit significantly different local and global topology (51). The erroneous clustering introduced in the human network-learning process would be especially detrimental for learning triangle-free substructures. Therefore, for some phonological networks, where star-like structures and leaf nodes are more common, one might expect that the network-optimization strategies employed in this work would de-emphasize edges to leaf nodes. Further investigation of how other kinds of information networks should be optimized may reveal new insights into network substructures that are optimal for learning.
Finally, we would like to draw attention to a possible connection between the ideas explored in this paper and the distinctions between human learning and machine learning. In particular, the simulations of network learning in the transient regime demonstrate that, although perfect, machine-like learning strategies outperform human learning after many observed transitions, human-learning strategies work surprisingly well in the low-data regime. This finding mirrors the observation that humans are capable of learning many tasks from few observations, in comparison to common machine-learning methods that often require large amounts of training examples to achieve decent performance. In view of this connection, an insightful way of explaining our results on transient-regime learning is that human learning strategies enforce helpful inductive biases in the learning process. 
Much like how regularization strategies in machine learning act as inductive biases and often improve performance in the low-data regime, the inductive biases of human learning greatly increase the data efficiency of the learning process. However, in both kinds of learning, when data are plentiful, inductive biases can hinder performance. While the present work primarily focuses on learning in the limit of infinite observations, future work should expand upon learning dynamics in the transient regime to uncover additional insights related to this connection.
Methodological Considerations.
We note that our results on optimizing the learnability of generated networks are strictly a lower bound on improvements in human learnability of networks that can be afforded through targeted emphasis and de-emphasis of particular input network features. First, it is possible that relaxing the symmetry constraint enforced in the optimization process may lead to further improvements in network learnability. In addition, since we only consider optimization of learnability of generated networks via single-parameter tuning (cross-cluster edge weights or nonring edge weights), it is likely that more nuanced emphasis-modulation strategies may enhance learnability even further, beyond what has been demonstrated for both classes of stochastic block networks analyzed, as well as for Watts–Strogatz networks. Similarly, our results on semantic networks extracted from mathematics textbooks only represent lower bounds on improvements in learnability that can be achieved through targeted emphasis modulation. This is a consequence of the fact that each of the semantic networks analyzed had thousands of edge weights to be varied as parameters, and, thus, achieving globally optimal network representations via dual annealing was not always possible. In addition, the analyses presented in this work have yet to consider whether adding edges that are entirely nonexistent in a target network to the input network presented to a human learner may enhance learnability of the target network. Preliminary findings suggest that increases in learnability can indeed occur when nonexistent edges are added to an input network representation. Future work could fruitfully consider how relaxing edge-existence constraints may affect the efficacy of the estimated optimal network representations.
Conclusion
Recent advances in the study of human information processing have shed light on the ways that humans learn information networks. Rather than mapping the structure of networks exactly, inaccuracies in human learning often result in erroneous or biased internal representations. To overcome inaccuracies in human learning, we investigate how networks presented to human learners can be designed to counteract and minimize inaccuracies in learning. Across a range of synthetic networks, we find that reinforcing edges within clusters and de-emphasizing edges between clusters improves network learnability. In addition, we analyze how real-world semantic networks can be optimized for learnability, thereby uncovering the fact that relationships between periphery concepts ought to be reinforced. Together, our findings demonstrate that the learnability of network representations can be significantly enhanced through intentionally modulating the emphasis of specific network features.
Materials and Methods
Converting between Weighted Graphs and Transition Matrices.
To convert between graphs and transition matrices, we use the following two relations. Given a weighted graph G, the normalized transition matrix corresponding to random walks is given by
$$A_{ij} = \frac{G_{ij}}{\sum_k G_{ik}}.$$
Conversely, the adjacency matrix for a weighted, undirected graph corresponding to a reversible transition matrix A is determined up to a multiplicative constant by
$$G_{ij} = \pi_i A_{ij},$$
where π is the stationary distribution of A.
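As a minimal sketch of these two conversions, the NumPy snippet below row-normalizes a weighted adjacency matrix and then recovers the graph (up to a constant) from the stationary distribution. The function names and the small example graph are hypothetical, and the eigenvector-based stationary distribution assumes a connected graph.

```python
import numpy as np


def graph_to_transition(G):
    """Row-normalize a weighted adjacency matrix into random-walk probabilities."""
    G = np.asarray(G, dtype=float)
    return G / G.sum(axis=1, keepdims=True)


def stationary_distribution(A):
    """Left eigenvector of A with eigenvalue 1, normalized to sum to one."""
    eigenvalues, eigenvectors = np.linalg.eig(A.T)
    pi = np.real(eigenvectors[:, np.argmax(np.real(eigenvalues))])
    return pi / pi.sum()


def transition_to_graph(A):
    """Recover a weighted undirected graph, up to a multiplicative constant."""
    pi = stationary_distribution(A)
    return pi[:, None] * A


# Hypothetical 4-node weighted graph: a triangle with one pendant node.
G = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 3.0, 0.0],
])
A = graph_to_transition(G)
G_recovered = transition_to_graph(A)
print(np.round(G_recovered / G_recovered.sum(), 4))
```

Because the recovered matrix is only defined up to a constant, the comparison with the original graph is meaningful after both are rescaled to the same total weight.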
Optimization Methods.
To assess how much a learned network differs from the target network, we use the Kullback–Leibler divergence. The Kullback–Leibler divergence between normalized transition networks A and B is defined as
$$D_{KL}(A \,\|\, B) = -\sum_i \pi_i \sum_j A_{ij} \log \frac{B_{ij}}{A_{ij}},$$
where π is the stationary distribution of A, and only terms with nonzero transition probabilities are summed over.
For some target transition structure A, the optimal input structure was determined by using the dual-annealing optimization method in scipy, with the Kullback–Leibler divergence between A and the representation learned from the input structure as the cost function. During the optimization process, the edge weights of the input structure corresponding to edges that exist in A were varied as free parameters bounded between zero and one. The input matrix was then normalized prior to every cost-function evaluation. For the modular and lattice graphs, the number of free parameters varied during optimization was reduced based on the networks’ respective symmetries. Network symmetries were computed by using the iGraph package in Python (52).
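The optimization loop described above might be sketched as follows. Several pieces here are assumptions rather than the authors' code: the closed form used for the learned representation, (1 − e^{−β}) A (I − e^{−β}A)^{−1}, is the one we take from ref. 21; the toy target (two triangles joined by a bridge), the value of β, the small positive lower bound on weights (used so that every row stays normalizable), and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import dual_annealing


def normalize(G):
    """Row-normalize a weighted adjacency matrix."""
    return G / G.sum(axis=1, keepdims=True)


def learned_representation(A, beta):
    """Assumed learner model (ref. 21): (1 - e^-beta) A (I - e^-beta A)^-1."""
    eta = np.exp(-beta)
    return (1.0 - eta) * A @ np.linalg.inv(np.eye(len(A)) - eta * A)


def kl_divergence(A, B, pi):
    """KL divergence between transition matrices, summed where A is nonzero."""
    mask = A > 0
    terms = np.zeros_like(A)
    terms[mask] = A[mask] * np.log(B[mask] / A[mask])
    return -float(np.sum(pi[:, None] * terms))


# Hypothetical target: two triangles joined by a single cross-cluster edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
G_target = np.zeros((n, n))
for i, j in edges:
    G_target[i, j] = G_target[j, i] = 1.0
A_target = normalize(G_target)
pi = G_target.sum(axis=1) / G_target.sum()  # stationary distribution of A_target
beta = 0.3  # illustrative value only


def cost(weights):
    """KL divergence between the target and what is learned from the input."""
    G_in = np.zeros((n, n))
    for w, (i, j) in zip(weights, edges):
        G_in[i, j] = G_in[j, i] = w
    return kl_divergence(A_target, learned_representation(normalize(G_in), beta), pi)


bounds = [(1e-3, 1.0)] * len(edges)  # one free weight per existing edge
x0 = np.full(len(edges), 0.5)        # uniform weights reproduce the target input
baseline = cost(x0)
result = dual_annealing(cost, bounds, x0=x0, maxiter=100)
print("unoptimized KL:", baseline)
print("optimized KL:  ", result.fun)
print("optimized weights:", np.round(result.x, 3))
```

Since the input matrix is normalized before each evaluation, a uniform weight vector is equivalent to presenting the target network as-is, which makes it a natural baseline. Dual annealing is stochastic, so the optimized weights can differ between runs.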
Network Generation.
We generated stochastic block networks as follows: Starting with N nodes, we first assigned each node to a community labeled 1 through 5 uniformly at random. Then, edges were added between nodes in the same module, where is the total number of edges in the network. Specifically, each addition of a within-cluster edge was performed by selecting a community uniformly at random, selecting two nonadjacent nodes within the community uniformly at random, and then adding an edge between the selected nodes. Finally, edges were added between nodes in different communities. This process was performed by selecting two different communities uniformly at random, then selecting one node in each community uniformly at random, and connecting the nodes if they were nonadjacent. If they were adjacent, the step was repeated.
For degree-corrected stochastic block networks, a similar network-generation method was used, with modifications adapted from the procedure described in ref. 25. Specifically, each node was assigned an index i from 1 to 200 and was then assigned a weight . Then, the probability of a given node being chosen for the addition of an edge at any step was proportional to the weight of the node.
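The generation procedure for the standard stochastic block networks can be sketched with the Python standard library alone. The function name, its parameters (`n_within` and `n_cross` for the two edge counts), and the example sizes are hypothetical; the sketch also adds a capacity check, not described in the text, so that the sampling loops cannot run forever.

```python
import random
from itertools import combinations


def generate_block_network(n_nodes, n_within, n_cross, n_communities=5, seed=None):
    """Return (community labels, set of undirected edges stored as (u, v), u < v)."""
    rng = random.Random(seed)
    community = [rng.randrange(n_communities) for _ in range(n_nodes)]
    members = [[v for v in range(n_nodes) if community[v] == c]
               for c in range(n_communities)]

    # Guard (an addition of this sketch): refuse impossible edge counts.
    within_capacity = sum(len(g) * (len(g) - 1) // 2 for g in members)
    cross_capacity = n_nodes * (n_nodes - 1) // 2 - within_capacity
    if n_within > within_capacity or n_cross > cross_capacity:
        raise ValueError("requested more edges than the partition can hold")

    edges = set()
    # Within-cluster edges: pick a community, then a nonadjacent pair inside it.
    while len(edges) < n_within:
        group = rng.choice(members)
        free_pairs = [pair for pair in combinations(group, 2) if pair not in edges]
        if free_pairs:
            edges.add(rng.choice(free_pairs))

    # Cross-cluster edges: pick two different communities and one node in each;
    # repeat the step whenever the chosen nodes are already adjacent.
    while len(edges) < n_within + n_cross:
        group_a, group_b = rng.sample(members, 2)
        if not group_a or not group_b:
            continue
        u, v = rng.choice(group_a), rng.choice(group_b)
        edges.add((min(u, v), max(u, v)))

    return community, edges


# Illustrative sizes only.
labels, edge_set = generate_block_network(n_nodes=50, n_within=60, n_cross=20, seed=1)
n_within_found = sum(labels[u] == labels[v] for u, v in edge_set)
print(len(edge_set), n_within_found)
```

A degree-corrected variant could replace the uniform node choices with weighted ones (for example via `random.Random.choices` with a `weights` argument), using the node weights defined in the text.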
Analysis of Semantic Networks.
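Given a weighted network with edge weights w, we identified core–periphery structure by finding a partition of the vertex set V into disjoint sets C and P so that the following core-ness quality function is maximized:
$$Q_C = \frac{1}{v}\left[\sum_{i,j \in C}\left(w_{ij} - \gamma \bar{w}\right) - \sum_{i,j \in P}\left(w_{ij} - \gamma \bar{w}\right)\right],$$
where v is a normalization constant, $\bar{w}$ is the average over all edge weights, and γ is a resolution parameter that we set to one.
Given a partition into core and periphery nodes, we evaluate the community structure of the periphery of G by maximizing the following modularity quality function on the subgraph of G induced on P:
$$Q = \frac{1}{v}\sum_{i,j \in P}\left(w_{ij} - \gamma \frac{k_i k_j}{v}\right)\delta(c_i, c_j),$$
where v is a normalization constant, $k_i$ is the sum over the weights of all edges incident on vertex i, $\delta(c_i, c_j)$ equals one when vertices i and j are assigned to the same community and zero otherwise, and γ is a resolution parameter that we set to one.
The edge-betweenness centrality for some edge e is defined as
$$c_B(e) = \sum_{s,t \in V} \frac{\sigma(s,t \mid e)}{\sigma(s,t)},$$
where $\sigma(s,t)$ is the number of weighted shortest paths between nodes s and t, and $\sigma(s,t \mid e)$ is the number of such shortest paths that go through the edge e.
For an edge $e = (i, j)$, we define the weighted edge degree centrality as
$$c_D(e) = \frac{k_i + k_j}{2},$$
where $k_i$ is the weighted degree of node i in the network.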
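The weighted edge degree centrality, taken here as the average weighted degree of an edge's two endpoints, together with the kind of fixed-size binning used for the centrality plots, can be sketched as below. The function names, the example graph, and the scaling values are hypothetical and serve only to illustrate the computation.

```python
import numpy as np


def edge_degree_centrality(W):
    """Average weighted degree of the two endpoints of every edge in W."""
    W = np.asarray(W, dtype=float)
    weighted_degree = W.sum(axis=1)
    rows, cols = np.nonzero(np.triu(W, k=1))  # each undirected edge once
    centrality = 0.5 * (weighted_degree[rows] + weighted_degree[cols])
    return list(zip(rows.tolist(), cols.tolist())), centrality


def binned_means(x, y, bin_size):
    """Sort edges by x and average x and y over consecutive bins of bin_size."""
    order = np.argsort(x, kind="stable")
    xs, ys = np.asarray(x)[order], np.asarray(y)[order]
    starts = range(0, len(xs), bin_size)
    x_means = np.array([xs[s:s + bin_size].mean() for s in starts])
    y_means = np.array([ys[s:s + bin_size].mean() for s in starts])
    return x_means, y_means


# Hypothetical weighted graph and made-up scaling values, one per edge.
W = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 3.0, 0.0],
])
edge_list, centrality = edge_degree_centrality(W)
scaling = np.array([1.2, 1.0, 0.9, 0.7])
bin_x, bin_y = binned_means(centrality, scaling, bin_size=2)
print(edge_list, centrality, bin_x, bin_y)
```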
Simulating Transient Network Learning.
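While observing random walks drawn from some transition network, the maximum-likelihood estimate for the transition network can be described by
$$\hat{A}^{\mathrm{MLE}}_{ij}(t) = \frac{n_{ij}(t)}{\sum_k n_{ik}(t)},$$
where $n_{ij}(t)$ is the number of observed transitions from node i to node j by time step t. As described in ref. 21, the human learned representation takes a similar form:
$$\hat{A}_{ij}(t) = \frac{\tilde{n}_{ij}(t)}{\sum_k \tilde{n}_{ik}(t)},$$
where $\tilde{n}_{ij}(t)$ is the revised (and erroneous) count of transitions from node i to node j. In particular, it is described by
$$\tilde{n}_{ij}(t+1) = \tilde{n}_{ij}(t) + B_t(i)\,[j = x_{t+1}],$$
where $x_{t+1}$ is the node observed at time step t + 1, $[\cdot]$ is the Iverson bracket, and $B_t(i)$ encodes an internal belief about which node was observed at time t, described by
$$B_t(i) = \frac{1}{Z}\sum_{\Delta t = 0}^{t-1} e^{-\beta \Delta t}\,[i = x_{t - \Delta t}],$$
where Z represents a normalizing constant.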
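A minimal simulation of the two count-based estimators might look like the following. It is a sketch under stated assumptions: the belief is modeled as an exponentially decaying memory of past nodes with decay rate β, renormalized at every step; the walk starts from a uniformly random node; and the function names, the toy transition matrix, and the value of β are all hypothetical.

```python
import numpy as np


def simulate_learning(A, beta, n_steps, seed=0):
    """Accumulate exact and memory-blurred transition counts along a random walk."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    counts_mle = np.zeros((n, n))    # exact transition counts
    counts_human = np.zeros((n, n))  # counts spread over remembered nodes
    memory = np.zeros(n)             # unnormalized, exponentially decaying trace
    x = rng.integers(n)              # assumed uniform random starting node
    for _ in range(n_steps):
        # Decay older observations by e^-beta, then record the current node.
        memory = np.exp(-beta) * memory
        memory[x] += 1.0
        belief = memory / memory.sum()  # dividing by the sum plays the role of Z
        x_next = rng.choice(n, p=A[x])
        counts_mle[x, x_next] += 1.0
        counts_human[:, x_next] += belief
        x = x_next
    return counts_mle, counts_human


def row_normalize(counts):
    """Turn counts into transition estimates; unvisited rows stay zero."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


# Hypothetical target: random walk on a triangle with one pendant node.
G = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
A = G / G.sum(axis=1, keepdims=True)
counts_mle, counts_human = simulate_learning(A, beta=0.3, n_steps=200)
A_mle, A_human = row_normalize(counts_mle), row_normalize(counts_human)
print(np.round(A_mle, 2))
print(np.round(A_human, 2))
```

Because the belief sums to one at every step, both count matrices accumulate exactly one unit of mass per observed transition; the human-like counts simply distribute that mass over recently visited nodes.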
Citation Diversity Statement.
Recent work in several fields of science has identified a bias in citation practices, such that papers from women and other minorities are undercited relative to the number of such papers in the field (53–58). Here, we sought to proactively consider choosing references that reflect the diversity of the field in thought, form of contribution, gender, and other factors. We obtained predicted gender of the first and last author of each reference by using databases that store the probability of a name being carried by a woman (53, 59). By this measure (and excluding self-citations to the first and last authors of our current paper), our references contain 21.05% woman(first)/woman(last), 7.89% man/woman, 14.17% woman/man, and 56.88% man/man. This method is limited in that 1) names, pronouns, and social media profiles used to construct the databases may not, in every case, be indicative of gender identity; and 2) it cannot account for intersex, nonbinary, or transgender people. We look forward to future work that could help us to better understand how to support equitable practices in science.