Will big data yield new mathematics? An evolving synergy with neuroscience.

Abstract

New mathematics has often been inspired by new insights into the natural world. Here we describe some ongoing and possible future interactions among the massive data sets being collected in neuroscience, methods for their analysis and mathematical models of the underlying, still largely uncharted neural substrates that generate these data. We start by recalling events that occurred in turbulence modelling when substantial space-time velocity field measurements and numerical simulations allowed a new perspective on the governing equations of fluid mechanics. While no analogous global mathematical model of neural processes exists, we argue that big data may enable validation or at least rejection of models at cellular to brain area scales and may illuminate connections among models. We give examples of such models and survey some relatively new experimental technologies, including optogenetics and functional imaging, that can report neural activity in live animals performing complex tasks. The search for analytical techniques for these data is already yielding new mathematics, and we believe their multi-scale nature may help relate well-established models, such as the Hodgkin-Huxley equations for single neurons, to more abstract models of neural circuits, brain areas and larger networks within the brain. In brief, we envisage a closer liaison, if not a marriage, between neuroscience and mathematics.

Keywords: big data; connectome; macroscopic theories; mathematical models; neuroimaging

Year: 2016; PMID: 27516705; PMCID: PMC4975073; DOI: 10.1093/imamat/hxw026

Source DB: PubMed; Journal: IMA J Appl Math; ISSN: 0272-4960; Impact factor: 0.845

1. Introduction

Discussions of ‘big data’, largely fuelled by industry’s growing ability to gather, quantify and profit from massive data sets, permeate modern society. In 2008 a short opinion piece announced ‘The End of Theory,’ arguing that ‘the data deluge makes the scientific method obsolete’ and ‘Petabytes allow us to say: ‘Correlation is enough’,’ (Anderson, 2008). We strongly disagree: correlation may be enough for e-marketing, but it surely does not suffice for understanding or scientific progress. Nonetheless, big data are transforming the mathematical sciences. For example, Napoletani identify new methodological ‘motifs’ emerging in the use of statistics and mathematics in biology. A panel at the Society for Industrial and Applied Mathematics’ 2015 Conference on Computational Science and Engineering addressed the topic (Sterk & Johnson, 2015), as did a symposium on data and computer modelling also held in Spring 2015 (Koutsourelakis ). Also, see Donoho (2015) for a historically based view of how the big data movement relates to statistics and machine learning. In this article, we discuss implications for the underlying mathematical models and the science they represent: where might the waves of data carry applied mathematicians? New experimental technologies and methods have produced similar excitement in neuroscience. Optogenetics, multi-electrode and multi-tetrode arrays and advanced imaging techniques yield massive amounts of in vivo data on neuronal function over wide ranges of spatial and temporal scales (Deisseroth ; Mancuso ; Spira & Hai, 2013; Lopes da Silva, 2013), thus revealing brain dynamics never before observed. The connectome (wiring diagram) of every neuron and synapse in a local circuit, or an entire small animal, can be extracted by electron microscopy (Seung, 2012; Kandel ; Burns ). 
Analyses of the resulting graphs, which contain nonlinear dynamical nodes and evolving edges, will demand all that statistics and the growing array of geometrical and analytical data-mining tools can offer. More critically, creating consistent, explanatory and predictive models from such data seems far beyond today’s mathematical tools. In response, funding bodies and scientific organizations have identified brain science as a major mathematical and scientific problem. In 2007, the first of 23 mathematical challenges posed by DARPA was the Mathematics of the Brain; in 2013, the European Commission’s Human Brain Project dedicated over 1 billion Euros over a 10-year period to interdisciplinary neuroscience (Markram, 2012; European Commission, 2014), and the United States’ BRAIN Initiative launched with approximately 100 million USD of support in Obama’s 2014 budget (Insel ; The White House, 2013). In addition to governmental support, the past 15 years have seen numerous universities establish neuroscience programs and institutes, as well as the creation of extra-academic efforts like the Allen Institute for Brain Science, which has raised over 500 million USD in funding and employs almost 100 PhDs (Allen Institute for Brain Science, 2015). We believe that the accelerating collection of pertinent data in neuroscience will demand deeper connections between mathematics and experiment than ever before, that new fields and problems within mathematics will be born out of this, and that new experiments and data streams will be driven, in return, by the new mathematics. We (optimistically) envisage a synergy as productive as that between physics and mathematics which began with Kepler, Brahe and Newton. 
As experiment and theory develop in tandem, brain science could drive analysis and mathematical modelling much as celestial mechanics and the mechanics of solids and fluids have driven the development of differential equations, analysis and geometry over the past three centuries. Applied mathematicians familiar with the current big data enthusiasm may rightly feel uneasy. More data cannot trivially overcome theoretical obstacles; indeed, the emergence of spurious correlations in large data sets and the multiple testing problem have caused serious errors (Ioannidis, 2005; Button ; Colquhoun, 2014). However, reproducible massive data, upon which theories may be established and/or conclusively falsified, will surely bring changes. Scientists have traditionally wielded Occam’s razor: ‘the best theory is the simplest explanation of the data’. More data should not tempt us to abandon it; indeed, statistical and computational learning theories enable the search for simple explanations of complicated phenomena. Vapnik–Chervonenkis (VC) theory, Rademacher complexity, Bayesian inference and probably approximately correct (PAC) learning are just some frameworks that precisely formulate the intuition that the simplest models are the most likely to generate predictive insights (Vapnik & Vapnik, 1998; Bartlett & Mendelson, 2003; Valiant, 1984). Briefly, given data, an associated probability structure and a hypothesized class of models, these frameworks provide complexity measures and probabilistic bounds on the inferred model error ‘outside’ of the data used for fitting (i.e. generalization). Higher model complexity typically corresponds to weaker bounds on the generalization error. The PAC framework requires that this inference from data be efficiently computable. The proper application of such ideas is crucial for avoiding modelling pitfalls, and we believe that properly generalized models may well become more complex. Our claims are organized as follows. 
In Section 2 we take a historical perspective by recalling an example from physics: low-dimensional models of turbulence. We return to neuroscience in Section 3, providing some background before introducing a well-established model of single cells, the Hodgkin–Huxley equations, in Section 3.1. In Sections 3.2–3.4 we describe how optogenetics, voltage sensitive dyes, direct electrical recordings and non-invasive imaging methods are beginning to interface with models of larger neural networks and broader cognitive processes. In Section 3.5 we discuss models based on optimal theories, against which performance in simple tasks can be assessed. Section 4 contains a discussion and concluding remarks. Our discussions are brief, but we provide over 250 references for those wishing to explore primary sources.

2. Turbulence models: a historical perspective

We review an analytical approach to understanding turbulent fluid flows that was proposed in the 1960s but only developed after relatively fast data collection and computational methods of analysis became available in the 1980s. In combining big data (for the 1980s) with constraints from physics and mechanics, it may serve as a model for progress in theoretical neuroscience. Fluid mechanics has a considerable advantage over neuroscience, in that the Navier–Stokes equations (NSEs) provide a widely accepted model of the dynamics of a fluid subject to body forces and constrained by boundaries. For an incompressible fluid, they take the form

$$\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla p + \frac{1}{Re}\,\Delta \mathbf{u} + \mathbf{f}, \qquad \nabla \cdot \mathbf{u} = 0, \tag{1}$$

where $\mathbf{u}$ and $p$ denote the velocity and pressure fields, $\mathbf{f}$ represents body forces and the non-dimensional ‘Reynolds number’ $Re$ (the ratio of a length multiplied by a velocity divided by the kinematic viscosity of the fluid) quantifies the range of spatial scales active in the flow (Holmes et al., 2012). Appropriate boundary conditions, possibly involving moving surfaces, must also be provided. These equations were derived in the early 19th century, and a vast analytical and numerical machinery has been developed to study them. Much beautiful and deep mathematics has emerged in the process, and although global existence of smooth solutions of these nonlinear partial differential equations (PDEs) in three space dimensions remains unproven, it is generally believed that they provide an excellent model of fluid flow over a wide range of temperatures and pressures. Numerical simulations of them are routinely used to design aircraft, ships, road vehicles and medical devices. In modelling a homogeneous medium the NSEs provide an ostensibly simpler challenge than that of the heterogeneous and multi-scale nervous system. Indeed, no such equations exist for the governing dynamics of the brain; at best there are good models for neural dynamics at specific spatial scales or in experimentally controlled paradigms (see Sections 3.1–3.5 below). 
Nonetheless, understanding turbulent flows has proved difficult, and it remains an important problem among engineers, physicists and mathematicians. Very few solutions of the NSEs are explicitly available in terms of functions known to mathematics, and most of them describe low-speed laminar flows. Existence of finite-dimensional attractors for NSEs in two space dimensions was proved in the 1980s (Constantin & Foias, 1980) and the discovery of deterministic chaos in ordinary differential equations (ODEs) such as the Lorenz convection model (Lorenz, 1963; Tucker, 2002) suggests possible mechanisms for the appearance of turbulence. In certain cases, such as Taylor–Couette flow (Chossat & Iooss, 1994) and some water wave problems, amplitude equations can be semi-rigorously derived and evidence of bifurcations consistent with deterministic chaos found in the resulting reduced systems (e.g. Chomaz, 2005), but to our knowledge the general existence of attractors, let alone chaos or strange attractors, is still an open problem in three space dimensions. The difficulty of analysing the NSEs has led to numerous ‘submodels’, including statistical theories and perturbative reductions to simpler, albeit still nonlinear PDEs that describe wave motions and the like. Meanwhile, experimental observations revealed characteristic fluid instabilities and coherent structures: persistent and spatiotemporally rich flow patterns such as eddies and wakes behind obstacles. (Drawings of such structures appear in the works of Leonardo da Vinci (Holmes et al., 2012, Section 2.2).) In a short paper published in 1967 (Lumley, 1967), J.L. Lumley proposed that coherent structures might be extracted from flow fields using the proper orthogonal decomposition (POD), an infinite-dimensional version of principal components analysis (PCA) that identifies the energetically dominant components of the flow field. 
This requires the computation of correlation tensors from a database of 4D space- and time-dependent velocity fields, followed by solution of a high-dimensional discretized eigenvalue problem. It was first achieved for the near-wall region of a turbulent boundary layer in a pipe, almost 20 years after Lumley’s proposal, by his student Herzog (1986). The discretized version of POD, like PCA, essentially takes a collection of $N$ $n$-dimensional data samples $\mathbf{u}_i$ (a cloud of points in $\mathbb{R}^n$, representing observations of velocity fields) and computes, from the correlation matrix $R = \frac{1}{N}\sum_{i=1}^{N} \mathbf{u}_i \mathbf{u}_i^{T}$, a set of orthogonal eigenvectors $\boldsymbol{\phi}_j$ that define the directions in which the cloud extends, with successively decreasing eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge 0$ that quantify the average squared norm projected onto each (the energy of that mode). The representation $\mathbf{u}(\mathbf{x}, t) = \sum_j a_j(t)\,\boldsymbol{\phi}_j(\mathbf{x})$ is the POD of $\mathbf{u}$, and for fluid flows, expression in terms of the basis spanned by the empirical eigenfunctions maximizes the kinetic energy captured by any finite ($K$-dimensional) truncation:

$$\mathbf{u}_K(\mathbf{x}, t) = \sum_{j=1}^{K} a_j(t)\,\boldsymbol{\phi}_j(\mathbf{x}).$$

Equipped with Herzog’s data, (1) was projected onto a low-dimensional subspace spanned by a subset of the empirical eigenfunctions, the full set of which provide a basis for the space of solutions, and it was shown that the resulting finite set of ODEs could capture key features of coherent structure dynamics (Aubry ). Moreover, the ODEs revealed an interesting class of solutions: structurally stable heteroclinic cycles (Armbruster ) that reproduced the repeated formation of streamwise streaks and their disruption in turbulent bursts, thus identifying a dynamical mechanism. (Such cycles have since been extensively analysed in abstract settings.) Much subsequent work on low-dimensional models has used POD bases derived from numerical simulations of NSE that can provide finer spatial and temporal resolution than experimental measurements (e.g. Smith , 2005b and see Berkooz and Holmes for further references). 
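The discretized POD described above can be illustrated with a minimal sketch. It assumes synthetic snapshots built from two known orthonormal spatial patterns (`phi1`, `phi2`) rather than real velocity data, and it uses plain power iteration with deflation in place of a production eigenvalue solver. All function names and parameter values are illustrative and are not taken from the works cited.

```python
import math

def correlation_matrix(snapshots):
    """Average outer product R = (1/N) * sum_i u_i u_i^T over the snapshots."""
    n_snap, dim = len(snapshots), len(snapshots[0])
    return [[sum(u[a] * u[b] for u in snapshots) / n_snap for b in range(dim)]
            for a in range(dim)]

def dominant_mode(matrix, n_iter=200):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    dim = len(matrix)
    vec = [float(j + 1) for j in range(dim)]  # generic start vector (arbitrary choice)
    for _ in range(n_iter):
        image = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in image))
        if norm < 1e-14:  # nothing left to extract
            return 0.0, vec
        vec = [x / norm for x in image]
    image = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    return sum(v * w for v, w in zip(vec, image)), vec

def pod(snapshots, n_modes):
    """Return [(eigenvalue, mode), ...] in order of decreasing eigenvalue."""
    matrix = correlation_matrix(snapshots)
    dim = len(matrix)
    modes = []
    for _ in range(n_modes):
        lam, phi = dominant_mode(matrix)
        modes.append((lam, phi))
        # Deflation: subtract the mode just found so the next one dominates.
        matrix = [[matrix[a][b] - lam * phi[a] * phi[b] for b in range(dim)]
                  for a in range(dim)]
    return modes

# Synthetic "flow": two orthonormal spatial patterns with time-varying amplitudes.
DIM, N_SNAP = 7, 40
phi1 = [math.sin(math.pi * (j + 1) / (DIM + 1)) / 2.0 for j in range(DIM)]
phi2 = [math.sin(2 * math.pi * (j + 1) / (DIM + 1)) / 2.0 for j in range(DIM)]
snapshots = []
for i in range(N_SNAP):
    theta = 2 * math.pi * i / N_SNAP
    a, b = 3.0 * math.cos(theta), math.sin(theta)
    snapshots.append([a * p + b * q for p, q in zip(phi1, phi2)])

modes = pod(snapshots, 2)
total_energy = sum(sum(x * x for x in u) for u in snapshots) / N_SNAP
energy_fraction = modes[0][0] / total_energy  # share of energy in the first mode
```

In this toy case the first empirical mode aligns with `phi1` (up to sign) and carries most of the mean squared norm, which is the sense in which a truncated POD basis is energetically optimal. For realistic data sizes one would use a library eigenvalue or singular value routine instead of power iteration.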
This approach to turbulence exemplifies the combination of a mathematical model (the NSE, based on Newtonian mechanics) with big data of many flow field observations over time. The latter, via POD, identify the most interesting region of the infinite-dimensional space of velocity fields to examine; the former describes the appropriate physics to simulate. However, even given an accepted model and the apparently unbiased POD method to extract the dominant eigenfunctions, naive truncations of the projected NSE that neglect all modes above the $K$th (for a chosen truncation order $K$) can exclude important modes. Short-lived, unstable velocity fields that carry little energy on average can play fundamental roles in the overall flow dynamics and thus be more important than their relatively small empirical eigenvalues indicate. Here the model provides guidance. Characteristic resonances among Fourier modes with different wavenumbers resulting from the quadratic nonlinearity in the NSEs’ convected derivative (the term $(\mathbf{u} \cdot \nabla)\mathbf{u}$ in (1), where $\mathbf{u}$ is the velocity field), along with careful analyses of symmetry groups derived from the geometry of the flow domain, assist in choosing sets of modes to include (Smith ,a). Kinetic energy is dissipated at very small spatial wavelengths, so the energy cascade and viscous dissipation that occur at modes above the $K$th must be modelled by ‘eddy viscosity’ or similar means (Holmes et al., 2012). One can also focus on confined flows with few dominant spatial scales or, in the case of translation symmetry, minimal flow units that retain one unit of an approximately periodic pattern (Jiménez & Moin, 1991). With such refinements (or, less politely, subterfuges), truncations to $K$ modes can capture and help explain the underlying dynamics of turbulence production. This brief history illustrates that applying new mathematics (POD, dynamical systems theory and symmetry groups) to the study of a substantial data set can advance understanding in a field in which a good ‘global’ model exists. 
In concert with NSE, new data enabled a class of models that preserve the key physics and are simpler to analyse than NSE. More generally, since the 1970s the use of dynamical systems and bifurcation theory has enlivened fluid mechanics (e.g. see Swinney & Gollub, 1985; Chossat & Iooss, 1994 and references in Holmes et al., 2012) and motivated numerous experiments seeking to observe predicted bifurcations and dynamical behaviours, including deterministic chaos (e.g. Andereck ). In the work described above firm evidence of chaos was not found, but the fact that the turbulent flow outside the wall region was neglected due to the restricted data set led to the introduction of a model for pressure due to this flow as additive noise in the projected ODEs (Stone & Holmes, 1989, 1990). This revealed a mechanism responsible for the bursting statistics in boundary layers (Stone & Holmes, 1991). Incomplete data can also generate interesting hypotheses and require more mathematics. Space precludes discussion of other methods and submodels that have been developed to address different aspects of turbulence. These include Reynolds averaging (Wilcox, 2006) and models of eddies (Spalart, 2009; Meneveau & Katz, 2010), some of which are used in conjunction with POD reduction (e.g. eddy viscosity, mentioned above); others use stochastic PDE models (Hairer & Mattingly, 2006; Debussche, 2013) to show that certain types of noise can produce ergodicity on an invariant subspace.

3. Big neuroscience and new mathematics

Before delving into the current neuroscience landscape, an introductory note may be helpful. The human brain contains roughly $10^{11}$ neurons: electrically active cells that communicate by emitting ‘action potentials’ (APs). These brief (roughly 1 ms) voltage spikes travel along axons to release neurotransmitter molecules at ‘synapses’ with other neurons, either promoting or delaying their APs. On average, each neuron connects to thousands of others in a dynamic network. In the central nervous system (CNS) as many other neurons control our physiological rhythms. Several hundred different types of neurons have been identified, mostly on the basis of morphology (see Jabr, 2012; Kandel et al., 2000 and The Neuroscience Lexicon at http://www.neurolex.org/). This is a very big nonlinear dynamical system that will require many different types of mathematical modelling and analysis. To make matters worse, the summary above is idealized. Glial cells, for example, are typically ignored in mathematical models. Yet they most certainly affect the macro-landscape, being neuronal partners crucial to brain development and repair after trauma. They also affect neurotransmitters (Araque ) and intracellular calcium (Newman & Zahs, 1998), and likely play a role in Alzheimer’s disease (Nagele et al., 2004). Compared to fluid mechanics, neuroscience is an infant and theoretical neuroscience may still be in embryo. There is no near-term prospect of a macroscale or continuum model of a CNS or brain analogous to NSE, and given the inhomogeneity of gray and white matter, such a model would be very complex. We shall argue below for the continued development of different models at different spatial and temporal scales but also stress that relating them to each other and, ideally, deriving macroscale models from those at smaller scales are suitable goals. 
Excepting Section 3.1, which treats a more established synergy, the following subsections describe how bigger data are driving developments that we believe will supply important mathematical challenges in the decades to come.

3.1. The HH equations, single cells and small circuits

A fundamental and famous mathematical model in neuroscience derives from painstaking experiments on the squid giant axon performed by Hodgkin and Huxley in the 1940s and early 1950s (Hodgkin et al., 1952; Hodgkin & Huxley, 1952a,b,c,d). They proposed an ODE model of a single neuron as a homogeneous mix of ionic species governed by Kirchhoff’s current law for the potential difference $V$ across the cell membrane (Hodgkin & Huxley, 1952d):

$$C \frac{dV}{dt} = -\sum_{j} g_j\,(V - V_j) + I_{\mathrm{syn}}. \tag{5}$$

Here $C$ is the membrane capacitance, the sum is taken over channels with conductances $g_j$ carrying different ions, $V_j$ is the reversal potential for ionic species $j$ (so called because the current reverses direction at $V = V_j$) and $I_{\mathrm{syn}}$ is the current due to synaptic inputs from other neurons. Some current research uses continuum descriptions of neural tissue at a macroscopic scale by PDEs or integro-differential equations (e.g. Ermentrout ), an approach pioneered by Wilson and Cowan in the 1970s (Wilson & Cowan, 1972, 1973; Wilson, 1999, Section 7.4), but the majority of cellular-scale biophysically based modelling follows Hodgkin and Huxley’s lead. Equation (5) appears simple: for constant $g_j$, $V$ should settle on a stable ‘resting potential’ determined by the $g_j$’s and $V_j$’s. Alas for analysis, but happily for brain function, the conductances depend on $V$, which is modelled by adding ODEs for gating variables $w_k$ that describe gates in the ion channels:

$$\tau_k(V)\,\frac{dw_k}{dt} = w_{k,\infty}(V) - w_k. \tag{6}$$

Here $w_{k,\infty}(V)$ and $\tau_k(V)$ characterize the fraction of open channels and their timescales governing approach to $w_{k,\infty}$. The conductances $g_j$ are typically polynomials in the $w_k$, and $w_{k,\infty}$, $\tau_k$ are sigmoidal functions. Equations (5)–(6), describing how $V$ varies as channels open and close, are therefore nonlinear and insoluble in closed form, and, like small neural circuits in vitro and brains in vivo, they can produce sustained and even chaotic trains of APs (Aihara, 2008). Hodgkin & Huxley (1952d) extended their model to a diffusive PDE to describe AP propagation along axons, and further nonlinear ODEs can account for neurotransmitter release and uptake at receptors on dendrites of the post-synaptic neurons. 
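As an illustration of how equations of the form (5)–(6) behave, the following sketch integrates a single-compartment HH-type neuron with forward Euler. It assumes the widely used textbook parameter set for the squid axon (shifted so that rest lies near -65 mV), replaces the synaptic current with a constant injected current `i_ext`, and counts a spike at each upward crossing of 0 mV. These choices, and all names in the code, are assumptions made for the example rather than details taken from the text.

```python
import math

# Textbook squid-axon parameters (assumed here; rest near -65 mV).
C_M = 1.0                              # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3      # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387  # reversal potentials, mV

def _ratio(x, scale):
    """x / (1 - exp(-x/scale)), with the removable singularity at x = 0 filled in."""
    return scale if abs(x) < 1e-9 else x / (1.0 - math.exp(-x / scale))

def rates(v):
    """Opening/closing rates (alpha, beta) in 1/ms for the m, h and n gates."""
    a_m = 0.1 * _ratio(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _ratio(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return (a_m, b_m), (a_h, b_h), (a_n, b_n)

def simulate_hh(i_ext, t_max=100.0, dt=0.01, v0=-65.0):
    """Forward-Euler integration; returns spike times (ms) for constant input i_ext."""
    v = v0
    gates = [a / (a + b) for a, b in rates(v)]  # m, h, n start at their steady states
    spike_times, above = [], False
    for step in range(int(round(t_max / dt))):
        m, h, n = gates
        # Ionic currents: conductances are polynomials in the gating variables.
        i_ion = (G_NA * m ** 3 * h * (v - E_NA)
                 + G_K * n ** 4 * (v - E_K)
                 + G_L * (v - E_L))
        # dx/dt = alpha (1 - x) - beta x is equivalent to (x_inf - x) / tau.
        new_gates = [x + dt * (a * (1.0 - x) - b * x)
                     for x, (a, b) in zip(gates, rates(v))]
        v += dt * (-i_ion + i_ext) / C_M
        gates = new_gates
        if v > 0.0 and not above:  # upward crossing of 0 mV marks an AP
            spike_times.append((step + 1) * dt)
        above = v > 0.0
    return spike_times

resting = simulate_hh(0.0)   # no input: the cell should stay at rest
driven = simulate_hh(10.0)   # constant drive (uA/cm^2): repetitive firing
```

With no input the model sits at its resting potential, while a sufficiently large constant current produces a sustained train of APs. Forward Euler with a small step is used only for brevity; stiff or higher-order integrators are preferable for serious work.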
As more has been learned about ionic currents in different neurons, many extensions to the HH equations have been developed, some with over a dozen gating variables and multiple compartments allowing representation of complex cell morphologies. They are generally accepted as cellular scale models (Dayan & Abbott, 2001; Ermentrout & Terman, 2010), as exemplified in studies of stomatogastric ganglia in which model simulations and experiments have been tightly linked (e.g. Marder & Bucher, 2007; Marder et al., 2014). Neural circuits built from the HH equations are now routinely used in modelling small brain regions (see, e.g. Kopell et al., 2011; Lee ), and the NEURON software allows for simulations using multiple ion channels and heterogeneous cell morphologies (see Carnevale & Hines, 2006, 2009; De Sousa et al., 2015; Hines & Carnevale, 2015). However, their resistance to mathematical analyses has led to submodels of HH such as integrate-and-fire ODEs, which employ only (5) with a constant leak conductance and replace the spike dynamics with a delta function and reset mechanism, inserted when $V$ crosses a threshold (Izhikevich, 2007). This ODE is now linear, but the discontinuous resets make it a hybrid dynamical system that is still difficult to analyse, because solutions must be restarted after each AP and pieced together (der Schaft & Schumacher, 2000; di Bernardo et al., 2008). The HH equations are perhaps the first example of a quantitative, mechanistic model in neuroscience. In the intervening years, experimental and mathematical techniques have grown more specialized and theoretical neuroscience has broadened to ask bigger questions of human (and other animal) brains. Since around 2000, new experimental technologies have started drawing mathematics closer to neuroscience than ever before. In the rest of Section 3, we sample some of the major developments.
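The integrate-and-fire reduction mentioned above can be sketched in a few lines. The example assumes a leaky integrate-and-fire cell written in time-constant form, $\tau\,dV/dt = -(V - V_{\mathrm{rest}}) + R\,I$, with a hard threshold and reset. The parameter values and the function name `simulate_lif` are illustrative assumptions.

```python
import math

def simulate_lif(drive, t_max=200.0, dt=0.1, tau=20.0,
                 v_rest=-65.0, v_th=-50.0, v_reset=-65.0):
    """Leaky integrate-and-fire neuron; `drive` is R*I in mV. Returns spike times in ms."""
    v = v_rest
    spike_times = []
    for step in range(int(round(t_max / dt))):
        # Linear subthreshold dynamics (forward Euler).
        v += dt / tau * (-(v - v_rest) + drive)
        if v >= v_th:
            # The spike itself is not modelled: record its time and reset.
            spike_times.append((step + 1) * dt)
            v = v_reset
    return spike_times

quiet = simulate_lif(10.0)   # drive below (v_th - v_rest): never reaches threshold
firing = simulate_lif(20.0)  # drive above (v_th - v_rest): periodic firing

# Between resets the ODE is linear, so the firing period has a closed form.
analytic_period = 20.0 * math.log(20.0 / (20.0 - 15.0))
```

Between resets the solution is a simple exponential relaxation, which is why the interspike interval can be written down exactly. The reset is what makes the system hybrid: each spike restarts the linear flow from `v_reset`, and the full trajectory must be pieced together from these segments.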

3.2. Optical neural imaging and large networks of spiking neurons

Despite their successes in single cells and small networks, these neuronal models and submodels must be assembled into larger networks to represent brain areas capable of complex processing. Even supposing the models and their reductions of Section 3.1 are good, new mathematical challenges arise. Millions of parameters must be chosen and the resulting huge systems of ODEs are again unanalysable without some further model reduction. We first discuss this and then consider the impact of big data on model validation. In rough analogy to fluid mechanics, the roles of atoms or molecules in macroscale behaviour are played by neurons and those of inter-atomic forces by APs and synapses. Despite identifying neurons as the brain’s fundamental building blocks, analogues of the averaging principles that lead, via kinetic theory, to continuum models in physics such as NSE are generally lacking. The ‘force laws’ that yield excitatory or inhibitory postsynaptic potentials are complex: synaptic strengths change in response to global neurotransmitter release and, over longer timescales, due to pre- and post-synaptic neuronal firing (Kandel et al., 2000) (this ‘rewiring’ is a key component in learning). Current research typically employs either probabilistic methods that approximate population firing rates and higher statistical moments for large spiking networks with noisy inputs (Nykamp & Tranchina, 2000; Haskell ) or mean field methods derived from statistical physics that reduce spiking neural networks to sets of stochastic differential equations (SDEs) describing firing rates and neurotransmitter release in different sub-populations of cells (as in, e.g. Abbott & van Vreeswijk, 1993; Brunel & Wang, 2001; Fourcaud & Brunel, 2002; Renart et al., 2003; Eckhoff et al., 2011; Deco et al., 2013). 
More sophisticated theories are needed to supplement these methods: there are few rigorous derivations of equations for cortical circuits or brain areas from large spiking neural networks, although the work of Touboul (2012, 2014a,b) makes steps in this direction. He proves that solutions of sparsely connected networks of SDEs, whose deterministic (drift) terms include those of HH type, converge toward solutions of an integro-differential mean field equation that has the form of the Wilson–Cowan equations (Wilson & Cowan, 1973) in the noise-free limit. In some formal mean field reductions, if timescales of firing rate changes are short in comparison with the slowest neurotransmitter release timescales, the former can be eliminated and the dynamics approximated by SDEs whose state variables represent neurotransmitter levels of each subpopulation (Wong & Wang, 2006). Intriguingly, just such models, now called leaky competing accumulators, were proposed considerably earlier by psychologists for evidence accumulation in perceptual decision making (e.g. Cohen ; Usher & McClelland, 2001), a separate line of work to which we return in Section 3.5. These models, descended from yet earlier cascade models of cognitive processes (McClelland, 1979; Rumelhart & McClelland, 1986), have been successful in capturing and explaining both behaviour and electrophysiological data in specific brain areas (see Rorie ; Purcell , 2012 for recent examples). Multi-area networks have also been fitted to such data, but with greater difficulty (e.g. Schwemmer et al., 2015). In spite of these difficulties, simulations of networks containing integrate-and-fire cells have successfully reproduced qualitative dynamics in cortical columns and identified mechanisms that produce oscillations in local field potentials and other phenomena (e.g. Wang, 1999, 2002, 2008, 2010). 
Indeed, the Blue Brain project (Markram, 2006; Kandel ) proposes to simulate all the cells and most of the synapses in an entire brain, thereby hoping to ‘challenge the foundations of our understanding of intelligence and generate new theories of consciousness.’ Here we envisage a less ambitious engagement of big data with biophysically based models that may strengthen the modeller’s hand. Until recently, researchers have lacked sufficient tools to move beyond qualitative validation of large network models, especially when massive simulations of single cell dynamics are used only to reproduce macroscopic behaviours. A major obstacle to confirming or rejecting such models and the averaging schemes that produce them has been lack of experimentally controlled and simultaneous electrophysiological data from multiple brain areas or even within a single area. However, recent methods based on imaging the responses of chemical and genetically encoded fluorescent reporters specifically address this issue and may finally bring an understanding of these large networks within reach (Deisseroth ; Fenno ; Portugues ). Perhaps the most striking advance is the ability of these new tools to probe neuronal function at fine spatial and temporal resolutions. For example, methods using voltage-sensitive dyes (Ebner & Chen, 1995; Bullen & Saggau, 1999) excite and record neural activity using calcium cages to induce ionic changes. Combined with sophisticated photon-microscopy, these methods allow researchers to excite and record activity at specific 3D locations within a single neuron at kHz rates (Reddy ). These methods often lack capabilities in live animals, and the dynamic relationship between the observed optical response and underlying neuronal activity is difficult to understand or quantify, given the introduction of foreign molecules into the cells. 
Nevertheless, their advent prompted advances in inverse problems that led to new algorithms for recovering distributions of ion channels, calcium concentrations and other intra-cellular molecular concentrations from these data (Cox, 2006; Burger, 2011; Raol & Cox, 2013). More recent optogenetic methods use light-sensitive proteins to both record and probe neural activity in live animals such as fruit flies and rats (Boyden et al., 2005; Fenno ; Witten ). Insertion of microbial opsin genes in selected cell types enables optical observation and external control of electrical activity via laser illumination and fluorescence. These methods can be applied to wild-type animals and those with genetically engineered sensory, motor or cognitive deficits while they perform natural behaviors in an experimentally controlled environment (Yizhar et al., 2011). In particular, the individual activities of large groups of neurons can now be monitored in a tethered animal performing simple tasks, such as tracking visual stimuli in a virtual reality environment (Portugues et al., 2014; Weir & Dickinson, 2015). Although optical data blurs APs from individual neurons, it could be used to fit ion-channel or integrate-and-fire models of small circuits, much as for data from intra- or extra-cellular recordings, as noted in Section 3.1. Where cell types, ionic currents and synaptic neurotransmitters are known and key cellular and network parameters can be estimated a priori, this approach might be scaled up to or more cells. But optogenetics may also provide potential synergies with the reduction theory sketched above. Fluorescence signals from behaving animals could be spatially averaged over multi-cellular regions and fitted to reduced models. 
Clustering or nonlinear manifold learning methods might be used to extract correlated subgroups of cells and compute graphs or ‘bases’ analogous to the empirical eigenfunctions of POD (see Bullmore & Sporns, 2009; Klimm et al., 2014; Hermundstad ; Lohse ; Davison for examples of graph theory applied to imaging data). These could in turn provide mathematical structures—linear subspaces and manifolds—on which to create low-dimensional sets of ODEs at circuit or brain-area scales (e.g. Yu ), and perhaps allow their derivation from cellular scale spiking networks in a manner analogous to the models of coherent structure dynamics in fluids described in Section 2.
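The leaky competing accumulator models mentioned earlier in this section give a sense of what a reduced, low-dimensional description looks like in practice. The sketch below assumes a commonly used form, $dx_i = (I_i - k\,x_i - \beta \sum_{j \ne i} x_j)\,dt + \sigma\,dW_i$ with each $x_i$ truncated at zero, and integrates it with the Euler–Maruyama scheme. The parameter names (`leak`, `inhibition`, `noise`) and their values are illustrative assumptions, not fitted quantities.

```python
import math
import random

def simulate_lca(inputs, leak=1.0, inhibition=1.0, noise=0.1,
                 dt=0.01, t_max=2.0, seed=0):
    """Euler-Maruyama integration of a leaky competing accumulator network."""
    rng = random.Random(seed)  # seeded for reproducibility
    x = [0.0] * len(inputs)
    for _ in range(int(round(t_max / dt))):
        total = sum(x)
        x = [
            # Activities are truncated at zero.
            max(0.0,
                xi
                + dt * (inp - leak * xi - inhibition * (total - xi))
                + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            for xi, inp in zip(x, inputs)
        ]
    return x

# Two accumulators; the first receives slightly stronger evidence.
noise_free = simulate_lca([1.2, 1.0], noise=0.0)
noisy = simulate_lca([1.2, 1.0], noise=0.3, seed=1)
```

When `leak` equals `inhibition` and no truncation occurs, the difference between the two activities in the noise-free case grows linearly at a rate set by the difference in inputs, which is one reason such models are related to simpler drift-diffusion descriptions of decision making.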

3.3. Multi-electrode arrays and sorting spikes

Multi-electrode (or multi-tetrode) arrays are technologies that interface brain tissue directly with electronic circuitry in order to achieve higher fidelity and signal-to-noise ratios (SNRs). These devices may contain multiplexed and amplified electrodes that can both monitor and deliver voltage in specific regions of cortical tissue ( mm) (Taketani & Baudry, 2006). Each electrode records neuronal activity near its terminus and relays it to a receiver which is usually attached to an external controller, and arrays of electrodes can be simultaneously recorded from multiple brain regions, allowing collection of substantial data regarding connectivity among areas that are inaccessible from the cortical surface (Lansink ). Laboratories have developed protocols for using implanted electrodes both in vivo and in vitro (Taketani & Baudry, 2006; Lansink ; Viventi ; David-Pur et al., 2014). They enable weeks of recordings from specific brain regions (Nguyen ) and have been used to study performance on various tasks in rats (Neunuebel ; Sauerbrei ; Stratton ), cats (Viventi ) and monkeys (Ecker ). These technologies allow collection of high SNR data from specific brain regions without the need for sophisticated image processing and therefore seem promising for human-machine neural interfaces, in which they are typically used to train artificial neural nets that provide mappings between neural activity and motor outputs (Nicolelis, 2001; Ulbert et al., 2001; Viventi et al., 2011). However, substantial challenges remain if these signals are to be properly understood and used in technological applications. Put simply: what do we make of them? Given high SNR extracellular signals with good spatio-temporal resolution, what exactly do they represent at the neuronal level? If spikes are truly the currency of neuronal function, then the answer depends on solving the difficult problem of spike sorting: the identification of APs and other characteristic neuronal behaviours (e.g. 
bursts of APs) and their assignment to specific cells. An electrode senses APs of neurons within ,μm and voltage modulations within ,μm (Goldstein, 2009), resulting in each electrode sensing neurons. Moreover, the APs and synaptic effects may differ in both waveform and amplitude. To capitalize on such differences, earlier spike sorting algorithms were based on a combination of linear filters, window discrimination and thresholding (for reviews, see Lewicki, 1998; Rey ). More recently PCA, independent component analysis (ICA) and artificial neural networks have been used (Hermle ), but these methods become computationally intractable as the data scale beyond a few hundred electrodes. In attempts to develop an unsupervised, efficient and accurate algorithm for spike sorting larger multi-electrode array data, current algorithms are incorporating newer mathematical tools such as wavelets, Bayesian inference, expectation–maximization and non-parametric clustering techniques (Mallat, 1999; Pouzat ; Quiroga ; Ott ; Takekawa ; Prentice ; Rey et al., 2015; Franke ; Kadir ). Despite substantial theoretical progress, there is still no widely accepted solution to this problem (Rey ), primarily due to the lack of conclusive methods to validate spike sorting algorithms. Currently, algorithms are shown to exhibit desirable properties (e.g. robustness to noise) and are usually tested against simulated data sets representing ‘ground truth’, thus further highlighting the need for efficient and biologically accurate computational models of spiking neural networks (Prentice ; Franke ; Einevoll ). Here too, mathematical advances are needed.
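To make the pipeline described above concrete, the following is a minimal sketch of threshold detection followed by PCA and clustering, tested against simulated 'ground truth' in the spirit of the validation approach just mentioned. Everything in it is an illustrative assumption rather than a method from the cited literature: the two Gaussian-shaped waveforms, the noise level, the 5-sigma threshold, the 32-sample window and the function names (`make_recording`, `detect_spikes`, `pca_scores`, `two_means`, `sort_synthetic`) are hypothetical, and a plain two-cluster k-means stands in for the more sophisticated clustering methods cited.

```python
import numpy as np

WINDOW, PRE = 32, 10  # snippet length and samples kept before the trough (illustrative)


def make_recording(rng, n_spikes=150, spacing=400):
    """Synthetic one-electrode trace: two hypothetical units in unit-variance noise."""
    t = np.arange(WINDOW)
    templates = np.stack([
        -12.0 * np.exp(-0.5 * ((t - PRE) / 2.0) ** 2),  # unit 0: large, narrow AP
        -7.0 * np.exp(-0.5 * ((t - PRE) / 4.0) ** 2),   # unit 1: smaller, wider AP
    ])
    trace = rng.normal(0.0, 1.0, n_spikes * spacing)
    # One spike per grid cell, jittered, so that waveforms never overlap.
    starts = np.arange(n_spikes) * spacing + rng.integers(50, 300, n_spikes)
    labels = rng.integers(0, 2, n_spikes)
    for s, k in zip(starts, labels):
        trace[s:s + WINDOW] += templates[k]
    return trace, starts + PRE, labels  # trace, true trough times, true unit labels


def detect_spikes(trace, k=5.0):
    """Threshold at k robust standard deviations and align snippets on the trough."""
    sigma = np.median(np.abs(trace)) / 0.6745  # median-based noise estimate
    peaks, last = [], -WINDOW
    for i in np.flatnonzero(trace < -k * sigma):
        if i < last + WINDOW:
            continue  # still inside the previous event
        peak = i + int(np.argmin(trace[i:i + WINDOW]))
        if PRE <= peak <= len(trace) - WINDOW:
            peaks.append(peak)
        last = peak
    peaks = np.array(peaks)
    snippets = np.stack([trace[p - PRE:p - PRE + WINDOW] for p in peaks])
    return peaks, snippets


def pca_scores(snippets, n_components=2):
    """Project mean-centred snippets onto their leading principal components."""
    centred = snippets - snippets.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def two_means(scores, n_iter=25):
    """Lloyd's k-means with k=2, seeded at the extremes of the first component."""
    centres = scores[[np.argmin(scores[:, 0]), np.argmax(scores[:, 0])]]
    for _ in range(n_iter):
        dist = np.linalg.norm(scores[:, None, :] - centres[None, :, :], axis=2)
        assign = np.argmin(dist, axis=1)
        centres = np.stack([
            scores[assign == j].mean(axis=0) if np.any(assign == j) else centres[j]
            for j in range(2)
        ])
    return assign


def sort_synthetic(seed=0):
    """Run the whole sketch and score it against the simulated ground truth."""
    rng = np.random.default_rng(seed)
    trace, true_peaks, true_labels = make_recording(rng)
    peaks, snippets = detect_spikes(trace)
    assign = two_means(pca_scores(snippets))
    # Match each detection to the nearest true spike before comparing labels.
    nearest = np.argmin(np.abs(peaks[:, None] - true_peaks[None, :]), axis=1)
    agree = np.mean(assign == true_labels[nearest])
    return len(peaks), max(agree, 1.0 - agree)  # cluster numbering is arbitrary


if __name__ == "__main__":
    n_detected, accuracy = sort_synthetic()
    print(n_detected, accuracy)
```

The toy problem is deliberately easy: one electrode, two well-separated waveforms and no overlapping spikes. Real recordings violate all three simplifications, which is where the validation difficulties discussed above arise.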

3.4. Non-invasive neuroimaging and macroscopic theories of the brain

The past two decades have also seen the development of remarkable neuroimaging technologies that have impacts far beyond neuroscience research. Within neuroscience, magnetic resonance imaging (MRI) technologies have developed beyond their medical applications to the point where human brains can be scanned during the performance of behavioural tasks, allowing observation of brain activity without invasive surgery. In functional MRI (fMRI), the method most commonly used by experimentalists, blood-oxygen-level dependent (BOLD) contrast signals are obtained from 1 to 3 mm cubic voxels (3D pixels) of brain tissue, each containing neurons (Huettel ). While 10 Hz sampling rates are possible, the BOLD signal develops over 2–3s, preventing immediate resolution of millisecond scale neuronal dynamics (Huettel ). Despite this, and their relatively coarse spatial resolution, fMRI studies can still comprise nearly half a terabyte of raw data using double precision, representing hours of recordings over voxels (Smith ). These volumes can only be expected to increase as technologies improve spatial and temporal resolution and as meta-analyses confront data sets from multiple independent experiments (Glasser et al., 2013). Massive fMRI data sets must be deconvolved with the slow BOLD response function to determine the faster spatio-temporal neuronal dynamics that give rise to the observed signal. The growth of fMRI has helped motivate the development of mathematical techniques for such inverse problems and, more generally, for image processing. 
These include fast iterative shrinkage (Beck & Teboulle, 2009; Zhang ), compressed sensing (Lustig ; Candes ; Donoho, 2006; Zhang ), convex optimization (Daducci ; Yin ), variational methods (Aubert & Kornprobst, 2006), nonlocal means denoising (Iftikhar ), regularization (Purdon ; Calamante ), low-rank matrix approximation/constraints (Recht ; Zhao ; Lingala ; Chiew ) and Bayesian modelling/inference (Friston ; Penny ; Woolrich ; Lindquist, 2008). Activity in these areas will surely continue. Two further non-invasive technologies currently in use are magnetoencephalography (MEG) and electroencephalography (EEG). MEG enables much higher Hz sampling rates but at lower spatial resolution () by monitoring changes in magnetic fields caused by neuronal activity. EEG measures electrical activity over the scalp with an array of external electrodes (Lopes da Silva, 2013), also at high Hz sampling rates but does not receive strong signals from regions deeper in the brain, rendering the full inverse problem intractable. These technologies, which predated MRI methods, have experienced recent refinements. For example, EEG and fMRI signals can now be recorded simultaneously, presenting a new class of challenges on how best to infer neuronal activity by combining the spatial discrimination of MRI with the millisecond timescale of EEG (Huster ; Bénar ; Mulert ; Horovitz ; Daunizeau ). While precise relationships between spiking neurons and macroscopic images will remain unknown for some time, the functions of many brain areas and networks of areas are already being inferred. Specific brain regions of size have been associated with arousal (Lang ; Brooks ), attention (Coull & Nobre, 1998), working memory (D’Ardenne ), decision making (Sanfey ; Nieuwenhuis ), taste preferences (McClure ), moral judgments (Greene ; Shenhav & Greene, 2010), language (Fernández ; Hasegawa ), social interaction (Redcay ) and learning (Gershman ; Delazer ). Further mathematical (meta-?) 
challenges will arise as neuroscientists begin to integrate this ocean of data and scientific findings into a quantitative understanding of brains (let alone entire central and autonomic nervous systems). Researchers are already refining supervised learning algorithms for classifying fMRI/MEG/EEG data (Pereira ; Ryali ), as well as Bayesian frameworks for modelling fMRI data in order to determine underlying neural parameters (Gershman ; Gershman & Blei, 2012; Turner ; Rigoux ). The use of graph-theoretic metrics in identifying functional interactions among brain areas was noted in Section 3.2 (Bullmore & Sporns, 2009; Klimm ; Hermundstad et al., 2014; Lohse ; Davison ). Others have noted that better statistical tools are needed for meta-analyses, there being no generally accepted methods for combining imaging data across multiple studies (Kriegeskorte et al., 2009; Cortese ).
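The deconvolution problem described in this section, recovering fast neuronal events from a signal blurred by the slow BOLD response, can be illustrated on a toy example with the fast iterative shrinkage-thresholding scheme of Beck & Teboulle (2009). The sketch below is written under strong assumptions that are ours and not taken from any cited study: a single voxel, a single-gamma response function peaking near 5 s (not a canonical haemodynamic response), 1 s sampling, five sparse events, and an l1 penalty weight `lam` chosen by hand. All function names are hypothetical.

```python
import numpy as np


def gamma_hrf(dt=1.0, duration=24.0):
    """Illustrative single-gamma response peaking near 5 s, normalised to unit sum."""
    t = np.arange(0.0, duration, dt)
    h = t ** 5 * np.exp(-t)
    return h / h.sum()


def convolution_matrix(h, n):
    """Lower-triangular Toeplitz matrix H such that H @ s is causal convolution with h."""
    lag = np.arange(n)[:, None] - np.arange(n)[None, :]
    valid = (lag >= 0) & (lag < len(h))
    return np.where(valid, h[np.clip(lag, 0, len(h) - 1)], 0.0)


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (the 'shrinkage' step)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def objective(H, y, s, lam):
    """Least-squares data misfit plus l1 sparsity penalty."""
    return 0.5 * np.sum((H @ s - y) ** 2) + lam * np.sum(np.abs(s))


def fista(H, y, lam, n_iter=500):
    """Fast iterative shrinkage-thresholding for min_s 0.5*||H s - y||^2 + lam*||s||_1."""
    lipschitz = np.linalg.norm(H, 2) ** 2  # largest eigenvalue of H^T H
    s = np.zeros(H.shape[1])
    z, t = s.copy(), 1.0
    for _ in range(n_iter):
        grad = H.T @ (H @ z - y)
        s_new = soft_threshold(z - grad / lipschitz, lam / lipschitz)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = s_new + ((t - 1.0) / t_new) * (s_new - s)  # momentum (extrapolation) step
        s, t = s_new, t_new
    return s


def demo(seed=0, n=200, lam=0.01, noise_sd=0.01):
    """Simulate a noisy BOLD-like trace from sparse events, then deconvolve it."""
    rng = np.random.default_rng(seed)
    s_true = np.zeros(n)
    s_true[[20, 60, 95, 140, 170]] = 1.0  # hypothetical neural events
    H = convolution_matrix(gamma_hrf(), n)
    clean = H @ s_true
    y = clean + rng.normal(0.0, noise_sd, n)
    return H, y, clean, fista(H, y, lam)


if __name__ == "__main__":
    H, y, clean, s_hat = demo()
    print(objective(H, y, s_hat, 0.01), np.corrcoef(H @ s_hat, clean)[0, 1])
```

Because the response kernel is smooth, the convolution matrix is poorly conditioned, and the sparsity penalty is what makes the inverse problem tractable here. Whole-brain data add spatial structure, voxel-dependent responses and far larger dimensions, which is part of the motivation for the methods listed above.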

3.5. Optimal performance theories

We end Section 3 by highlighting a general theoretical approach that may interact well with the evolving experimental technologies illustrated in Sections 3.2–3.4, and also motivate mathematical developments. Theories based on the concept of optimality have been applied throughout neuroscience, initially in analysing sensory data (e.g. Barlow, 1961a,b; Atick & Redlich, 1990; Bialek cf. Rieke ) and now more generally in normative probabilistic models based on Bayes’ theorem (e.g. Dayan & Abbott, 2001; Dayan, 2012; Solway & Botvinick, 2012). Here we describe a model based on the sequential probability ratio test (SPRT) from statistical decision theory (Wald, 1947) that is used widely by cognitive psychologists and neuroscientists studying perceptual decision making. In the special case of 2-alternative forced-choice (2AFC) perceptual decisions requiring identification of signals obscured by statistically stationary noise, the SPRT is known to be optimal in the sense that it renders a decision of given average accuracy in the shortest possible time (Wald & Wolfowitz, 1948). Over each trial, successive observations of a stimulus are taken and a running product of likelihood ratios computed, or, equivalently, the log likelihoods are summed (Gold & Shadlen, 2002). In the continuum limit, this discrete process becomes a scalar drift-diffusion SDE:
$$dx = \mu\,dt + \sigma\,dW, \qquad x(0) = x_0, \qquad (7)$$
in which $x(t)$ represents the difference between the accumulated evidences for each alternative, $W$ is a standard Wiener process and $x = \pm\theta$ are decision thresholds. The drift and diffusion rates $\mu$ and $\sigma$ depend upon the distributions from which stimulus samples are drawn, and the initial data $x_0$ can encode bias or prior evidence. Decision times (DTs) are determined by first passages of sample paths through $x = \pm\theta$, between which $x_0$ lies, and unless the signal strength is zero we may assume that $\mu > 0$, so that correct decisions and errors are modelled by crossing $+\theta$ and $-\theta$, respectively.
As shown in Bogacz , this SDE also emerges from a pair of competing accumulators in the limiting case in which leak and inhibition are balanced and sufficiently large. However, random walk models and extensions of (7) were proposed by cognitive psychologists to model decision making and memory recall tasks long before the ideas connecting spiking neural networks with accumulators (Section 3.2 above) were developed (Luce, 1986; Ratcliff, 1978). Also see Ratcliff and Ratcliff & Smith (2004) for discussions of extended drift-diffusion models that allow trial-to-trial variability in drift rate and initial data to better fit subject data, albeit with the loss of optimality. Given an optimal theory and associated mathematical model(s), one can design tasks and test the performance of subjects to investigate how closely they can approach optimality. This was done with human participants making 2AFC decisions in blocks of trials of fixed duration with difficulty (SNR ) varying from block to block, in which maximization of reward rate is optimal (Simen ; Bogacz ; Balci ). While a subset of participants approached optimality, a substantial fraction did not, evidently preferring to be accurate and spending too long on their decisions. This fraction decreased after multiple training sessions. As described in Holmes & Cohen (2014), these results raised numerous questions on the reasons for deviating from optimality that have led to further experiments and modelling. For example, reward rate (the ratio of proportion of correct responses to averaged reaction time) was chosen as the underlying utility function. Optimal performance requires resetting of thresholds for each new block of trials when the SNR changes, but reward rate excludes the costs of cognitive control exerted in gauging stimulus difficulty and threshold resetting.
More generally, optimal theories represent an ideal towards which evolution can be expected to drive animals, and in this context they can provide guiding principles for modelling more complex cognitive behaviours and neural processing than those involved in 2AFC tasks. The studies of neural coding in sensory processing, noted above and discussed briefly in Section 4, have followed this route. Such mathematically motivated theories already guide experimentalists’ choices for future data collection. However, even in simple tasks like 2AFC, selection of appropriate utility functions in experimental designs is not straightforward; in more complex behaviours, poorly chosen functions can produce seriously misleading results.
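As a rough illustration of the drift-diffusion account of 2AFC decisions and of reward rate as a utility function, the sketch below simulates first passages of a pure drift-diffusion process by the Euler-Maruyama method and compares them with closed-form expressions. The notation is an assumption introduced here: `mu` for drift rate, `sigma` for diffusion rate, `theta` for symmetric thresholds at plus and minus theta, and an unbiased start at zero. For that unbiased case the standard first-passage results give an error rate of 1/(1 + exp(2 mu theta / sigma^2)) and a mean decision time of (theta/mu) tanh(mu theta / sigma^2). The additive `delay` standing for non-decision and inter-trial time, and all parameter values, are hypothetical.

```python
import numpy as np


def simulate_ddm(mu, sigma, theta, x0=0.0, dt=1e-3, n_trials=10000, seed=0):
    """Euler-Maruyama first passages of dx = mu dt + sigma dW through +/- theta."""
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, x0, dtype=float)
    decision_time = np.zeros(n_trials)
    correct = np.zeros(n_trials, dtype=bool)
    active = np.arange(n_trials)  # indices of trials that have not yet decided
    step = 0
    while active.size > 0:
        step += 1
        noise = rng.standard_normal(active.size)
        x[active] += mu * dt + sigma * np.sqrt(dt) * noise
        hit = np.abs(x[active]) >= theta
        done = active[hit]
        decision_time[done] = step * dt
        correct[done] = x[done] >= theta  # upper threshold is correct when mu > 0
        active = active[~hit]
    return 1.0 - correct.mean(), decision_time.mean()


def closed_form(mu, sigma, theta):
    """Error rate and mean decision time for an unbiased start (x0 = 0)."""
    k = mu * theta / sigma ** 2
    return 1.0 / (1.0 + np.exp(2.0 * k)), (theta / mu) * np.tanh(k)


def reward_rate(error_rate, mean_dt, delay):
    """Proportion correct divided by mean time per trial (decision time plus delay)."""
    return (1.0 - error_rate) / (mean_dt + delay)


def best_threshold(mu, sigma, delay, thetas):
    """Threshold on a grid that maximises the closed-form reward rate."""
    rates = [reward_rate(*closed_form(mu, sigma, th), delay) for th in thetas]
    return thetas[int(np.argmax(rates))]


if __name__ == "__main__":
    print(simulate_ddm(1.0, 1.0, 1.0), closed_form(1.0, 1.0, 1.0))
    print(best_threshold(1.0, 1.0, 2.0, np.linspace(0.05, 3.0, 60)))
```

Scanning the threshold shows the speed-accuracy trade-off directly: very low thresholds give fast but near-chance responses, very high ones give accurate but slow responses, and reward rate peaks in between. A participant who favours accuracy would sit above that peak, which is the pattern of suboptimal behaviour described above.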

4. How might theoretical neuroscience move mathematics?

Hopefully, the examples sketched above convey the excitement and energy now pervading neuroscience and the synergies that are emerging with applied mathematics. Many, if not all, of these advances come from data acquired using new technologies. In contrast, mathematicians study the world through models arising from a variety of motivations and intuitions (Gowers, 2000). The models throughout this article range from mechanistic, including the physics of ion channels and APs in the case of HH, to empirical or descriptive in the drift diffusion equation, where behavioural data derives from abstract parameters () (Holmes, 2014). The NSEs are widely accepted as a mechanistic model of fluid flow, but it seems unlikely that diverse data types from a multi-scale, heterogeneous, interconnected brain (see Section 3) will be fruitfully restricted to differential equations. Already in neuroscience, we have competing arrays of models, mechanistic and empirical, that permeate the spatial and temporal scales. In some cases empirical models can be (at least formally) derived from mechanistic ones (see Section 3.2), and larger structures are beginning to emerge. These relationships among models, their subjects and the mathematics enabling their comprehension are complicated (Gowers, 2000), and below we highlight some examples of surprising connections among both mechanistic and empirical models, and their applications. Mathematics can reveal deeper insights and help verify the logical consistency of these models, which in turn will strengthen the connections between the best models and the most complete data. Importantly, the expanding data reflect undiscovered underlying processes, as opposed to redundant content. Florian Engert (2014) has recently observed a distinction between big data, exemplified by raw electron micrographs or optical image pixels, and the ‘complex data’ of a connectome or an ‘activitome’ (voltage traces of many single cells) that can be derived from them. 
He notes that while the former may be PB, the latter is more likely GB: a substantial reduction but one whose complexity must be recognized and respected in modelling. It is this increase, not so much in the number of bytes, but in their complexity, that excites us and which, we believe, will lead applied mathematics in new directions that may reveal some of the brain’s mechanisms. Such openly accessible data are already available. The Human Connectome Project has neuroimages of over 900 subjects, including MRI and MEG recordings (Marcus et al., 2011; Essen et al., 2013). The Allen Brain Atlas contains neural connectivity, genomic and high resolution anatomical data for rodents, primates and humans (Allen Institute for Brain Science, 2015a). The Brain Genomics Superstruct Project contains neuroimaging, behaviour, and personality data for over 1500 human participants (Holmes ). Other international collaborations such as ENIGMA and IMAGEN involve dozens of research groups sharing genomic and neuroimaging data in studies of brain structure, function and health (Schumann ; Thompson ). We now ask: what is a good mid-century goal for mathematical neuroscience? Should we attempt to model single brain areas, networks of such areas or the entire brain, as proposed in the 10-year Human Brain Project (Markram, 2012; European Commission, 2014)? Building a mammalian brain from the cellular scale seems premature, given that good models of individual areas are still lacking (Sample, 2014; Trader, 2014). Nonetheless, research groups continue to propose models that seem to fit data and reproduce qualitative dynamics observed over spatiotemporal scales from intracellular recordings to whole brain images (Daw ; Schwemmer ; Wang, 1999, 2008, 2010; Turk-Browne, 2013; Park & Friston, 2013). Regardless of how these goals are approached, as more data accumulate, we expect many models to be invalidated or heavily modified.
Most brain areas require improved models and analytical methods for tests against data, and new experimental technologies need more powerful statistical and mathematical tools for data processing and analyses (Poldrack & Poline, 2015; Keller ; Stelzer, 2015). As better models evolve, mathematical analyses will play central roles in understanding and leveraging them. While most models should engage seriously with data, we believe that abstract theories will remain valuable, for they can investigate potential roles of neural circuits and architectures in cognitive processes and suggest experimental designs, as in the case of 2AFC tasks of Section 3.5. The sequence of proposals, counterproposals, comparisons with data and model simulations of neural (spike-train) coding of visual scenes, beginning with and returning to Barlow’s work (Barlow, 1961a,b, 2001), is another instructive example (e.g. Atick & Redlich, 1990; Bialek ; Baddeley ; Simoncelli, 2003). It also illustrates that neuroscience can motivate new mathematical developments, here in information theory, which was originally created to solve problems in telecommunications (Shannon & Weaver, 1949). Similarly, abstract formalizations of the hierarchy of computations performed in the visual cortex have been proposed (Smale ). Critical phenomena in percolation theory provide a further example, having connections with the Ising model of statistical physics, which can in turn be used to analyse pairwise neural interactions (Grimmett, 1999; Bollobas & Riordan, 2006; Schneidman ; Roudi ). As mathematicians and neuroscientists collaborate more deeply, problems arising from brain dynamics and insights into them will likely reveal similar links and motivate models suitable for mathematical study per se, much as the gravitational field equations of general relativity continue to present challenging problems and yield new results in PDEs.
Studies of spatial pattern formation in biology and ecology (Okubo & Levin, 2001; Murray, 2001, Vol. II), beginning with those of Fisher (1937) in genetics and Turing (1952) on morphogenesis, have already motivated such work on reaction-diffusion equations. The liaison of mathematics and biology has substantially strengthened over the last century (cf. Thompson, 1917), and given the volumes of data and size of the research community, progress in neuroscience may accelerate. Regardless of its potential benefits to mathematics, neuroscience undeniably needs help from mathematicians, and despite its embryonic state, theoretical neuroscientists have already imported an array of tools and methods representative of applied mathematics as a whole. In addition to the examples highlighted in Section 3 involving ODEs, PDEs, statistical mechanics, stochastic processes, graph theory and statistics, other imports include results from algebra (Golubitsky ) and scientific computation (Brette ). The modular structure of the brain and its extreme heterogeneity and interconnectivity will continue to demand insights from diverse branches of mathematics and may thus create deeper links than those of continuum mechanics and turbulence studies with which our story began. Speculations have already begun on how neuroscience as a whole will respond to growing amounts of data (Poldrack, 2011). Our examples suggest that fundamental progress will demand creative partnerships between experimentalists and theoreticians, working in teams with distinct expertise. Data collection and analyses continue to dominate the field, but many laboratories now have applied mathematicians in house or are collaborating with theory groups to build such teams. Experimentalists increasingly agree that theory is essential not only in providing methods for data compression and analysis but also in interpreting and understanding that data. 
Collaborations take time to develop (as much as a year or two of weekly laboratory meetings, in our experience), but as they do, we expect that waves of complex data, at finer and faster scales, will create many new models and theories. Those that survive further injections of data could help guide future experiments, especially as projects increase in cost. More generally, we expect to find mathematics PhDs and mathematically trained researchers working within data-driven fields throughout the biological sciences, no longer regarded as doing mathematical or computational biology or biomathematics, but as fellow biologists. A substantial fraction will have their own experimental laboratories. As more data falsify plausible models, we hope that experimentalists, in a complementary manner, will embrace the surviving ones and appreciate the mathematics that made them possible. In this optimistic future, mathematics will pervade neuroscience as much as biology and will help elucidate brain function at deeper levels than before.

Funding

Research reported in this publication was supported by the National Institute of Neurological Disorders and Stroke of the National Institutes of Health under Award U01NS090514. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

149 in total

1. Symmetry in locomotor central pattern generators and animal gaits.

Authors: M Golubitsky; I Stewart; P L Buono; J J Collins
Journal: Nature Date: 1999-10-14 Impact factor: 49.962

2. Learning complex arithmetic--an fMRI study.

Authors: M Delazer; F Domahs; L Bartha; C Brenneis; A Lochy; T Trieb; T Benke
Journal: Brain Res Cogn Brain Res Date: 2003-12

3. A comparison of sequential sampling models for two-choice reaction time.

Authors: Roger Ratcliff; Philip L Smith
Journal: Psychol Rev Date: 2004-04 Impact factor: 8.934

4. Weak pairwise correlations imply strongly correlated network states in a neural population.

Authors: Elad Schneidman; Michael J Berry; Ronen Segev; William Bialek
Journal: Nature Date: 2006-04-09 Impact factor: 49.962

Review 5. The blue brain project.

Authors: Henry Markram
Journal: Nat Rev Neurosci Date: 2006-02 Impact factor: 34.870

6. Bayesian model selection for group studies - revisited.

Authors: L Rigoux; K E Stephan; K J Friston; J Daunizeau
Journal: Neuroimage Date: 2013-09-07 Impact factor: 6.556

7. Light-sheet fluorescence microscopy for quantitative biology.

Authors: Ernst H K Stelzer
Journal: Nat Methods Date: 2015-01 Impact factor: 28.547

8. Informatics and data mining tools and strategies for the human connectome project.

Authors: Daniel S Marcus; John Harwell; Timothy Olsen; Michael Hodge; Matthew F Glasser; Fred Prior; Mark Jenkinson; Timothy Laumann; Sandra W Curtiss; David C Van Essen
Journal: Front Neuroinform Date: 2011-06-27 Impact factor: 4.081

9. Flexible, foldable, actively multiplexed, high-density electrode array for mapping brain activity in vivo.

Authors: Jonathan Viventi; Dae-Hyeong Kim; Leif Vigeland; Eric S Frechette; Justin A Blanco; Yun-Soung Kim; Andrew E Avrin; Vineet R Tiruvadi; Suk-Won Hwang; Ann C Vanleer; Drausin F Wulsin; Kathryn Davis; Casey E Gelber; Larry Palmer; Jan Van der Spiegel; Jian Wu; Jianliang Xiao; Yonggang Huang; Diego Contreras; John A Rogers; Brian Litt
Journal: Nat Neurosci Date: 2011-11-13 Impact factor: 24.884

10. Action potential waveform variability limits multi-unit separation in freely behaving rats.

Authors: Peter Stratton; Allen Cheung; Janet Wiles; Eugene Kiyatkin; Pankaj Sah; François Windels
Journal: PLoS One Date: 2012-06-12 Impact factor: 3.240
