Big data, or “large-volume, complex, growing data sets with multiple, autonomous sources” [1], in combination with new data analytics, are challenging epistemologies across the natural sciences, social sciences and humanities. In so doing, they are “engendering paradigm shifts across multiple disciplines” [2]. Big data and new data analytics therefore represent disruptive innovations which are “in many instances [reconfiguring] how research is conducted,” and which therefore require critical reflection “within the academy of the epistemological implications of [an] unfolding data revolution” [2]. According to probabilistic innovation theory, incorporating the potential of big data analytics in theory development can improve the probability of achieving research breakthroughs [3]. Such breakthroughs may become increasingly likely with the emergence of research communities applying big data opportunities to scientific research [4]. An example of this is the emergent biocuration community. Biology, “like most scientific disciplines, is in an era of accelerated information accrual” [4]. Large-scale sequencing centres, high-throughput analytical facilities and individual laboratories generate exponentially increasing volumes of data related to nucleotide and protein sequences, protein crystal structures, gene-expression measurements, protein and genetic interactions, and phenotype studies. By 2008, over 18 million articles in this area were already indexed in PubMed, with nucleotide sequences for over 260 000 organisms submitted to GenBank. These developments, as well as projects sequencing thousands of human genomes to uncover DNA polymorphisms, however, represent just the “tip of the data iceberg” [4]. Other core fields ‘underlying,’ or providing inputs into, biomedical research, such as genomics and DNA sequencing, are also experiencing decreasing data costs, as “the breakneck pace of genome-technology development has revolutionised bioscience research” [5].
In fact, the rate at which genome technology is advancing has outstripped the predictions of Moore’s Law as it relates to technology [5]. However, despite these developments, scientific research, and in particular contemporary pharmaceutical innovation, faces increasing criticism for its failure to take full advantage of its R & D processes [6], facing what some have described as a threshold constraint to innovativeness due to outdated R & D models [7].
In light of the disjuncture between exponentially increasing data availability enabled by big data analytics and the lack of a commensurate increase in medical breakthroughs, particularly in aging research, this paper seeks to make a theoretical contribution to the literature relating big data to aging research. Whereas previous research has typically adopted a ‘silo-based’ approach, it is argued here that new potentialities offered by big data analytics allow for a novel approach, in that the linking of data across fields can provide opportunities for novel theory development in the form of holistic theoretical assemblage [1], or theory development which applies a synthesis of induction, deduction and abduction to better leverage big data opportunities to link research across sub-fields. Whereas some have argued that the big data era heralds the ‘death of theory’ as comprehensive data coverage unearths causal relationships between phenomena, others have argued that theory cannot be supplanted, and that theory development will remain essential to scientific advances.
With a special focus on Kitchin’s notion of holistic theoretical assemblage [1], this paper seeks to relate these debates to their implications for biomedical and social scientific research in general, and more specifically for aging research.
The objective of this critical review paper is to present an argument that although a new paradigm of data collection and analysis may have emerged in the wake of big data analytics, key to understanding how these developments can be applied to aging research is knowledge of how near real time large-scale knowledge creation can result. Such near real time knowledge creation can arise from the application of big data analytics to the scientific research process. In other words, exponential increases in availability of data enabled by big data can also dramatically reduce the time taken to solve research problems, as more puzzle pieces link research areas and enable holistic theoretical assemblage [1].
Grounded in knowledge management theory, large-scale knowledge creation is taken here to refer to the solving of ‘front-line’ research problems through effective interactions and collaborations between large numbers of problem solvers, a process made possible by recent developments in big data analytics. An example of large-scale knowledge creation can be found in the achievements of the open source software development movement (and its communities), which have, for example, developed the Linux [8] and Android [9] systems. These dynamics can also be found in the emergence of communities which are increasingly enabling open innovation systems in general, and in the patterns or theoretical regularities that are common to open source software development and to open source collaborations in biomedical research.
Building on these developments, this paper seeks to make a contribution to the aging research literature in the following ways.
First, big data is introduced in terms of its characteristics, such as its exponential growth, and its new paradigm of data availability which offers important new opportunities for aging research. To illustrate these opportunities, examples are drawn from recent advances in biology and similar areas of enquiry, which are then juxtaposed against other examples of how open source research problem solving has given rise to the likes of Android and Linux. These open source software communities are likened to emerging communities of researchers involved in biocuration [4] and genomics research, and what seems to be a rapidly unfolding global system of increasing collaboration is linked to the emergence of big data analytics. An argument is offered that transformation of big data into big knowledge requires a stream of literature explicitly focused to this end, and the probabilistic innovation literature is identified as a potential candidate for this, given trends in the literature.
Second, in order to ground observations relating to the emergence of interlinked open research communities, and the opportunities offered by exponentially increasing data availability, big data is related to theory, and to theoretical frameworks which seek to make sense of these relationships. Theory which offers implications specifically for aging research is considered. The tension between arguments which frame big data developments as potentially offering the ‘death of theory’ is related to a novel perspective of how theory can usefully apply certain principles of inductive data-based inquiry [1], in the form of holistic theoretical assemblage.
An example drawing from the application of big data to ecosystems theory is used to suggest how big data can be usefully applied to biomedical aging research to capture opportunities associated with holistic theoretical assemblage.
Third, with the understanding that biomedical research is currently but a portion of total research into aging, discussions of potential contributions of big data are extended to social science applications. The emerging stream of big data related to the ‘quantified self’ is considered as a logical extension to biomedical examples, and its potential to link social and biomedical aging research areas is discussed.
The arguments made in this paper are considered important, as the exponential increases in data availability enabled by big data stand in stark contrast to a lack of commensurate breakthroughs in scientific research. According to probabilistic innovation theory [3], a probabilistic relationship should exist between the rise in big data and its transmission to increases in the rate of scientific breakthroughs. Probabilistic innovation theory therefore provides an explicit focus for literature seeking to understand the boundary conditions to these predictions.
Theory which helps to understand how big data can be transformed and can transmit to ‘big knowledge’ is therefore important. Big data analytics, increasingly enabled by emergent technologies and methodologies such as crowdsourcing, crowdsourced research and development (R & D) [3], as well as other capabilities resulting from big data advances themselves [1], provide hitherto perhaps unimagined opportunities for scientific advancement.
Crowdsourcing, for example, when applied to solve research problems in medical sciences has long demonstrated proof of concept, with well-documented successes in medical data collection [10], data analysis [11] and medical disaster management [12].
Social media advances have been found to usefully complement expert crowdsourcing as a tool to support maximised collaboration between researchers. Mobile applications such as ‘DocCHIRP’ have been designed to crowdsource medical expertise and support near real time problem solving [11], and other social media applications have demonstrated their usefulness in disaster management contexts [13]. These methodologies, together with exponential increases in data availability, offer useful opportunities for increased collaborations between scientific researchers in support of research problem solving.
What also makes a focus on harnessing big data in support of aging research important is the possibility that further paradigmatic change might also be underway in biomedicine. The ‘low hanging fruit,’ or more easily accessible breakthroughs of chemistry-based biomedical research are now increasingly scarce, as the ‘chemistry-based’ era has perhaps given way to the large molecule era of protein-based research [14]. Not only are protein-based biomedical research products radically more complex, but these compounds can be unstable and sensitive to external conditions [15]. Given the almost infinite potential combinations and permutations of proteomic proteins, the future of biomedicine, and aging research, might be increasingly linked to protein-based biomedical research. Such complexity, however, is uniquely suited to big data analytics. This paper therefore seeks to offer those engaged in aging research a perspective of relevant aspects of big data analytics and certain opportunities it may offer.
The paper proceeds as follows. First, definitional understandings of big data are discussed. Next, certain theoretical implications of big data for aging research are considered, and holistic theoretical assemblage is discussed in relation to certain specific research examples.
Issues around the development of an ethical framework for big data applications to aging research are explored. Examples of the application of big data to aging research are then offered. Finally, the paper concludes with consideration of Popperian theory in relation to the tension between deductive and inductive approaches. The importance of transdisciplinary insights for aging research is now considered.
AGING RESEARCH AND BIG DATA OPPORTUNITIES
Aging, comprehensively defined here as the “universal, intrinsic, genetically-controlled, evolutionary-conserved and time-dependent intricate biological process characterised by the cumulative decline in the physiological functions and their coordination in an organism after the attainment of adulthood resulting in the imbalance of neurological, immunological and metabolic functions of the body” [16], is increasingly the subject of cross-disciplinary research, given the complexity of the aging process. Indeed, the aging research field “is large and dispersed,” making it important to identify and highlight topics of major importance in this area [17]. It has long been recognised that complexities of aging and other biomedical processes in general require transdisciplinary theory development [18], and the advent of the big data era offers important opportunities for radical improvements in the effectiveness of transdisciplinary research [19]. Developments in big data analytics considered here are therefore taken to suggest important implications for biomedical research in general and research into aging in particular.
A consensual definition of big data, according to Hilbert, is that “Big Data represents the Information assets characterised by such a High Volume, Velocity and Variety to require specific technology and Analytical Methods for its transformation into Value” [19]. However, according to Hilbert, the primary obstacle to the effective application of big data is lack of human understanding of how to apply analytics to achieve outcomes. Implicit in this definition is an acknowledgement of the role theory development plays in understanding how human behaviour contributes to the research process itself, through collaborations and other behaviours. These collaborations and researcher behaviours are therefore key to transformation of big data opportunities to achieve biomedical research outputs.
Ultimately, increased biomedical research breakthroughs may match the increases in data availability resulting from dramatic advances in big data analytics.
Big data is therefore essentially (i) high in volume; (ii) high in velocity, as it is created in real time or ‘near’ real time; (iii) diverse in its variety (structured and unstructured in its nature); (iv) exhaustive in scope, as it seeks to take into account entire populations or systems; (v) high in resolution and indexical; (vi) highly relational, as common fields enable conjoining of data sets; and (vii) highly flexible in that new fields can be added (extensionality) and size of data collection can be increased rapidly (scalability) [1]. Exponentially increasing data inputs from different sources, if effectively utilised, offer a new paradigm in exploratory science, as the harnessing of data-intensive statistical exploration and data mining opportunities has the potential to fill gaps in information and knowledge between areas of enquiry. Given the complexity of aging research, and its interlinkages with other fields of medical research, big data can offer useful insights into how knowledge of interlinkages can be improved. An important scientific field providing data and theory for aging research to use and build on is biology.
Biology, “like most scientific disciplines, is in an era of accelerated information accrual,” as large-scale “sequencing centres, high-throughput analytical facilities and individual laboratories produce vast amounts of data such as nucleotide and protein sequences, protein crystal structures, gene-expression measurements, protein and genetic interactions and phenotype studies;” by 2008, over 18 million articles were already indexed in PubMed, with nucleotide sequences for over 260 000 organisms submitted to GenBank [4].
Theory development that seeks to enable the transmission of this increased data availability to knowledge that can increase the likelihood of commensurate biomedical breakthroughs is therefore increasingly important.
Further, projects sequencing thousands of human genomes to uncover DNA polymorphisms may be just the “tip of the data iceberg” [4]. By 2008 a point was reached where about 750 000 individuals with separate IP addresses visited over 20 million pages in one month, which highlights the scale of activity in this area [4]. Web-based research communities are now a common feature of this research landscape. One cannot help but draw comparisons between these and the communities of open source software development which emerged at the dawn of the software era, and which were able to solve research problems associated with building systems like Android and Linux. Theory development focused on the spread of open-source community research problem solving to human genome sequencing, for example, offers useful opportunities for researchers in the field of aging research, especially in light of emerging data capabilities offered by big data analytics.
Insights into how big data can be transformed into big knowledge may be found in studying the work of researchers who are forming open communities to tackle issues such as challenges associated with biocuration. The following quote highlights these challenges.
The exponential growth in biological data means that revolutionary measures are needed for data management, analysis and accessibility. Online databases have become important avenues for publishing biological data. Biocuration, the activity of organizing, representing and making biological information accessible to both humans and computers, has become an essential part of biological discovery and biomedical research. But curation increasingly lags behind data generation in funding, development and recognition.
When all the data produced are curated to a high standard and made accessible as soon as they become available, biological research will be conducted in a manner that is quite unlike the way it is done now. Researchers will be able to process massive amounts of complex data much more quickly. They will garner insight about the areas of their interest rapidly with the help of inference programs. Digesting information and generating hypotheses at the computer screen will be so much faster…this increased specificity will cause an exponential growth in knowledge, much as we are experiencing exponential growth in data today [4].
This ‘exponential growth in knowledge,’ if it ultimately comes to reflect the ‘exponential growth in data today,’ undoubtedly promises a new era of breakthroughs in biomedical research in general, and aging research in particular. However, what is necessary is a focus on theory development, which can draw insights from different fields and apply these insights to take advantage of these new opportunities. Part of the solution might lie in how incentives are created.
Examples of how communities of contributors (some of over 300) have come together to annotate genomes include the Daphnia Genomics Consortium collaboration wiki, and the International Glossina Genome Initiative; and incentives for researchers “to curate data should include new information or insight for their research interests, improvement in academic reputation or impact, career advancement and better funding chances” [4].
There are perhaps a host of measures that institutions can take to help incentivise the development of such communities, as large volumes of data require large scale engagement in order to take advantage of opportunities.
Ultimately, research problem solving utilising big data analytics and large-scale collaboration may result in biomedical research problem solving in real time, or in ‘near real time.’ Problems such as disease outbreaks (for instance, Ebola) which can spread rapidly, or rapidly increasing microbial resistance, could benefit from the development of real time research capacity. Probabilistic innovation theory [3] is concerned with theory development specifically focused on real time or near real time problem solving. This stream of literature seeks explicitly to understand how to maximise the probability of solving complex research problems through maximally increasing data and problem solving inputs, and enabling interactions. Probabilistic innovation literature is therefore an example of a stream of literature which seeks to draw together theory specifically for the purpose of ultimately attaining real time research capacity, or the capacity to solve complex scientific research problems in real time, or as close to real time as ultimately possible. According to this perspective, the potential of big data for big knowledge creation, and therefore biomedical research, is yet to be fully realised.
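As a rough illustration of this probabilistic logic, and not a formula taken from the probabilistic innovation literature itself, suppose that each of n problem solvers has, independently of the others, a small probability p of solving a given research problem. Under that simplifying assumption, the probability that at least one solver succeeds is 1 - (1 - p)^n, which rises as more solvers are mobilised. The function name, the independence assumption and the example values below are all hypothetical and are used only to make the reasoning concrete.

```python
def prob_at_least_one_success(p_single: float, n_solvers: int) -> float:
    """Probability that at least one of n independent solvers succeeds.

    Assumption (not from the source): solvers succeed independently,
    each with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie between 0 and 1")
    if n_solvers < 0:
        raise ValueError("n_solvers must be non-negative")
    # Complement rule: 1 minus the probability that every solver fails.
    return 1.0 - (1.0 - p_single) ** n_solvers


if __name__ == "__main__":
    # Hypothetical values chosen only to show how the probability grows.
    for n in (1, 10, 100, 1000):
        print(n, round(prob_at_least_one_success(0.01, n), 4))
```

In practice, solvers in a research community are unlikely to be independent or equally capable, so this sketch should be read as an upper-level intuition for why increasing problem solving inputs can raise the probability of a solution, rather than as a predictive model.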
THEORETICAL IMPLICATIONS OF BIG DATA FOR AGING RESEARCH
Aging and its life effects are pervasive, and a review of this wide-ranging literature brings home the point that advances in aging research problem solving may have implications beyond the physical, or biomedical. These implications can be important in many spheres of human life. Indeed, few research fields may match this impact. The promise of big data, however, requires considered interrogation, and its benefits to fields such as aging research need to be made explicit. Key to these changes are methodological implications related to theory development. One dimension of changes heralded by the advent of big data is reflected in debates relating to its methodological implications for theory. According to Kitchin:
There is a powerful and attractive set of ideas at work in the empiricist epistemology that runs counter to the deductive approach that is hegemonic within modern science: Big Data can capture a whole domain and provide full resolution; there is no need for a priori theory, models or hypotheses; through the application of agnostic data analytics the data can speak for themselves free of human bias or framing, and any patterns and relationships within Big Data are inherently meaningful and truthful; meaning transcends context or domain-specific knowledge, thus can be interpreted by anyone who can decode a statistic or data visualization.
These work together to suggest that a new mode of science is being created, one in which the modus operandi is purely inductive in nature [1].
Kitchin also argues, however, that the purely inductive rationale is based on “fallacious thinking” as big data (i) is nevertheless still shaped by technology, data ontologies, regulatory influences and platforms used; (ii) still offers oligoptic views of the world, which are not free from vantage point perspectives; (iii) is not entirely neutrally abstracted from the world, as algorithm design and inductive pattern identification are “discursively framed by previous findings, theories, and training”; and (iv) is based on correlations and patterns which can also be random and not causal, posing serious risks of ecological fallacy for data dredging [1]. Theory is therefore still central to applications of big data to scientific research, including to aging research topic areas.
According to Kitchin, “the idea that data can speak for themselves suggests that anyone with a reasonable understanding of statistics should be able to interpret them without context or domain-specific knowledge” which is a “conceit voiced by some data and computer scientists and other scientists…all of whom have become active in practicing social science and humanities research.” This produces reductionist and functionalist analysis which “ignores the effects of culture, politics, policy, governance and capital” [1]. Kitchin stresses that this is also a problem in biomedical research, as bioinformaticians gain sway in the biological sciences.
Knowledge of these debates is relevant for aging research, as availability of hitherto unimagined volumes of biological and scientific data forces reconceptualisation of assumptions underlying research processes, and also of the human factors involved in this process.
Kitchin attempts to reconcile these two perspectives, of inductive versus deductive approaches, and offers the idea of ‘data-driven science’ which “seeks to hold to the tenets of the scientific method, but is more open to using a hybrid combination of abductive, inductive and deductive approaches to advance the understanding of a phenomenon” where inductive processes are situated and contextualised within a “highly evolved theoretical domain” and the epistemological strategy applies “guided knowledge discovery techniques to identify potential questions (hypotheses) worthy of further examination and testing” [1]. Understanding the impact of big data on theorising is therefore particularly important for those seeking to apply big data to scientific research. The notion of ‘abduction’ is particularly salient in this context.
Abduction as a mode of logical inference and reasoning “seeks a conclusion that makes reasonable and logical sense, but is not definitive in its claim,” such as the identification of an approach to data collection that makes logical sense given what is already known. By incorporating these techniques, ‘data-driven science,’ according to Kitchin, can incorporate big data into the traditional scientific method, providing a “new way in which to build theory,” which in turn offers researchers a new epistemological paradigm [1].
Given the failure (to date) of big data to deliver high volumes of ‘game changing’ breakthroughs in biomedical research, the constant search for new developments to strengthen the theory development process is increasingly important, as exponentially increasing data resources may be going untapped.
This ‘new epistemological paradigm’ is differentiated from traditional deductive ‘knowledge driven science’ which is best suited to conditions of scarce data and weak computation but has weaknesses in its ability to foster interconnectivity and interdisciplinary research as well as an inability to capitalise on exponentially increased data availability offered by big data [1]. Data driven science can therefore offer societally important research problem solving potentialities that are hitherto unimagined. The potential of data driven logics for aging research is now considered in relation to how these methods are being applied to solving complex (and seemingly intractable) problems in climate research, or research into environmental systems. By paying attention to these big data applications in other areas of science, it is possible to draw useful insights into how these can be applied to aging research.
There is a growing literature on biogerontological and aging research related to antiaging therapies, such as intracellular and extracellular junk buildup, telomeres, immunology, therapeutic delivery, stem cells, and tissue engineering, as specific topics of research [20]. In this literature, certain themes stand out. First, according to Swan, the strongest overall theme in the aging literature seems to be the “complexity and systemic nature” of scientific understandings of aging, as mechanisms reversing aging in lower organisms, such as knockout genes or caloric restriction, are more difficult to apply in higher mammals such as humans.
Second, debates extend to the role of inflammation and plaques, either as protection against aging processes or as cause, and key to increasing knowledge of these overarching themes is the centrality of multidisciplinarity and integration of life sciences and technology research. Big data analytics offers opportunities to link theory development to comprehensive data accumulation, and to ultimately answer such questions. Thirdly, a dual focus has emerged on the physical state of individuals as well as cognitive function, and fourthly, increasing attention is being paid to the use of genomic data and its use in the evaluation of aging pathologies [20]. A review of these trends reflects a need for transdisciplinary theory development, and big data methodologies may offer useful insights for those researching in these areas in particular. Having outlined certain broad trends in aging research, more specific examples of aging research areas are now considered, with specific reference to holistic theoretical assemblage, and an example from environmental research is used to illustrate potential opportunities for more practical big data applications.
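Before turning to those examples, Kitchin’s caution noted earlier in this section, that patterns in big data can be random rather than causal [1], can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions rather than an analysis from the literature reviewed here: it generates variables that are independent by construction and counts how many pairs nonetheless exceed a correlation threshold. The function names, the number of variables and observations, and the threshold of 0.361 (roughly the conventional 5% two-tailed cutoff for 30 observations) are all illustrative choices.

```python
import random
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_chance_correlations(n_variables=40, n_observations=30,
                              threshold=0.361, seed=0):
    """Count variable pairs whose |r| exceeds the threshold by chance alone.

    Every variable is drawn independently, so any flagged pair is spurious.
    """
    rng = random.Random(seed)
    variables = [[rng.gauss(0.0, 1.0) for _ in range(n_observations)]
                 for _ in range(n_variables)]
    pairs = list(combinations(range(n_variables), 2))
    flagged = sum(
        1 for i, j in pairs
        if abs(pearson_r(variables[i], variables[j])) > threshold
    )
    return {"pairs": len(pairs), "flagged": flagged}


result = count_chance_correlations()
print(result)
```

With 40 unrelated variables there are 780 possible pairs, so even a modest false-positive rate yields a number of apparently ‘significant’ relationships. This is one way of seeing why inductive pattern discovery benefits from being situated within theory, as the data-driven science perspective suggests.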
METHODS AND MATERIALS
This article follows the structure of a theoretical conceptual review paper. As such, the following sections provide a critical review and a synthesis of literature, with a view to providing useful insights for practice and theory.
RESULTS
Having introduced the purpose of the research, and having outlined its theoretical significance, the results of the critical analysis are discussed as follows. First, a synthesis is provided, which relates probabilistic innovation theory to practice in aging research. Next, analysis results are reported for different areas related to the need for further theory development in contemporary aging research.
Synthesis: Relating Probabilistic Innovation Theory to Practice in Aging Research
Big data has the potential to identify and provide insights about connection points between different spheres of research. An example of this is environmental research, where big data can provide informational linkages between different spheres such as atmosphere, biosphere, hydrosphere, lithosphere and pedosphere, providing opportunities for holistic theoretical assemblage [1]. These relationships are shown in Fig. (1). In a companion figure, drawing on these logics, the same analogy is applied to certain ‘spheres’ of aging research. Although these areas are briefly discussed in sections which follow, it is not suggested here that they necessarily relate to each other in the same way as the environmental ‘spheres,’ but these areas are nonetheless used as examples of the principle of holistic theoretical assemblage, enabled as they are by big data opportunities. Given that systems effects can be complex, big data offers the opportunity for mapping relationships and providing descriptive analyses of system effects.
Fig. (1)
Environmental spheres and connection points linked by big data: Holistic theoretical assemblage.
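The idea of connection points, together with the earlier observation that big data is highly relational because common fields enable conjoining of data sets [1], can be sketched in a few lines of code. The example below is a minimal illustration only: the two record sets, the field names (`sample_id`, `gene_expression`, `grip_strength_kg`) and all values are hypothetical and invented for the sketch, and they do not come from any study cited here.

```python
from typing import Dict, List


def join_on_common_field(left: List[Dict], right: List[Dict],
                         key: str) -> List[Dict]:
    """Inner-join two lists of records on a shared field."""
    # Index the right-hand records by the shared key for quick lookup.
    index: Dict[object, List[Dict]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            # Merge the two records into one linked record.
            joined.append({**left_row, **right_row})
    return joined


# Hypothetical records from two separate 'spheres' of research.
expression = [
    {"sample_id": "S1", "gene_expression": 0.82},
    {"sample_id": "S2", "gene_expression": 0.41},
    {"sample_id": "S3", "gene_expression": 0.67},
]
phenotype = [
    {"sample_id": "S1", "grip_strength_kg": 31.0},
    {"sample_id": "S3", "grip_strength_kg": 24.5},
]

linked = join_on_common_field(expression, phenotype, "sample_id")
print(linked)
```

Only records present in both sets are linked, which mirrors the point that connection points between research areas depend on shared, well-curated fields.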
Dietary restriction, inflammation, stress resistance, homeostasis, proteasome activity, sarcopenia and neurological degeneration have featured as long-standing primary areas of aging research, but increasingly, new areas of discovery are opening up [20]. Arguably, the potential of rapid developments in the technologies that in turn enable big data analysis remains untapped, and important potential implications for the aging research literature may have yet to be identified. A host of more practical examples of aging research opportunities are now briefly considered, to outline certain broad topic areas which might benefit from applications based on holistic theoretical assemblage.
In elimination of accumulated intracellular and extracellular wastes, for instance in macular degeneration, atherosclerosis and Alzheimer’s disease, researchers face challenges related to knowledge of how to bind small molecules [20, 21]. In seeking knowledge of extracellular material rejuvenation, ongoing work seeks to apply computational physics and chemistry to discover theozymes, or theoretical enzymes with specific binding sites and other properties, as well as to generate portfolios of new proteins and enzymes [20, 22]. Another important area of current research concerns whether genes exist which accelerate aging, with the implication that aging may be a function of evolutionary intent; although mainstream biogerontology has considered aging to be related to evolutionary neglect rather than intent, further research in this area is of growing importance [23]. These developments present examples of the need to build connection points between different areas of research. Big data can arguably support this process of holistic theoretical assemblage [1], and in this way may ultimately allow convergence of different bodies of theory in terms of their focus on the phenomena of aging.
Although development of a ‘general theory’ which solves most research problems may forever be beyond the reach of aging researchers due to the complexity of the field, convergence and holistic theoretical assemblage need only set the stage for more effective theory development to be able to make a contribution.
Arguably, a synthesis of knowledge based on big data can relate many disparate areas of research, and bridge certain divides, or challenges to integration posed by the different ‘languages’ of chemistry, biology, medicine, bioinformatics and others which big data analytics can now potentially link theoretically. Advances are also being made in reprogramming aging processes, or reversing, managing or reprogramming these mechanisms [20]. Synthesis of programmable integrases is showing promise, building on other work to date on other techniques such as homologous recombination genome modification based on zinc finger proteins and serine recombinases reprogramming [20, 24]. Further work on oxidation reduction, and anti-senescence drug development is producing products ultimately due for clinical trials [20, 25]. MicroRNA research is offering increased insights into aging-related gene expression, the dynamic systems of aging, targets of selective deletion, as well as knowledge of their complex posttranscriptional regulation mechanisms [26].
These examples highlight the wide range of fronts on which aging research is engaged. Expert crowdsourced R & D [3] can also be used to radically ‘populate’ the problem spaces described here. According to the tenets of probabilistic innovation, if sufficient resources and problem solving inputs of researchers are mobilised in pursuit of the solving of a knowledge problem, the probability of solving the problem will increase in some proportion to these investments.
Arguably, the opportunity also exists to radically increase problem solving capabilities offered by big data analytics in step with the developing technologies which are increasingly enabling larger numbers of research problem solving collaborators to come together in virtual space.
Stem cell research is also increasingly important in the quest for rejuvenation therapies [20], and there are a host of further examples such as these of areas in which holistic theoretical assemblage can be used to link research streams. Through applications of innovation theory derived from other contexts, such as those related to disaster management, and their focus on real time research problem solving, opportunities for big data application to research can be more clearly identified, and researchers might become more sensitised to lost big data opportunities. Biocuration movements might make an important contribution to this end, as emergent research communities may allow for a more effective coverage of problem spaces. Linkages between research areas can be further enabled through deeper engagement with crossover fields such as comparative biology.
Comparative biology is an important area of research, as studies using naked mole rats and other animal testing provide a platform for theory development in human aging research [27]. The building of reference databases related to animal experimentation offers useful insights into normal parameters, for example regarding lung volumes and capacities and infection incidence [28]. Work on endocrine-related effects of aging, for example, relies heavily on comparative biology, as does work on environmental influences [29]. Comparative biology also offers useful insights into telomere research [30], an increasingly promising area of discovery. Holistic theoretical assemblage enabled by big data analytics can link aging research areas to topic areas across the sciences. 
Big data offers the potential to disrupt constraints to more effective and efficient research.
Big data is therefore a disruptive innovation which presents the possibility of a new approach to science, with the potential to spur debate between different perspectives. Such debates can have different epistemological implications, such as bridging the divide between empiricism, associated with the notion that data can convey meaning without theory, and data-driven science, which radically “modifies the existing scientific method by blending aspects of abduction, induction and deduction” [1], but which is inherently premised on theory, and theory-development. Such an approach might usefully extend advantages offered by holistic theoretical assemblage, and provide an even stronger methodological basis by providing multiple perspectives of phenomena, thus increasing discriminant and convergent validity [31].
The examples offered above suggest the potential of holistic theoretical assemblage, and serve to highlight the importance of theory development in support of linking knowledge across fields, using the exponential volumes of big data now at our disposal. However, given the scale of such a project, it might be difficult for researchers to focus both on specialist areas and on developing theory related to the synthesis of these areas. Arguably, a service field is needed, one which might draw together theory development and literature related to the enablement of big data in support of holistic theoretical assemblage.
The Need for a Service Field
Arguably, the probability of scientific breakthroughs can be radically increased through better understanding of the human factor in research collaborations and in collective research problem solving in real time [32]. Big Data opportunities can be captured in aging research, but it is perhaps the human factor relating to tacit knowledge and the need for more effective collaborative high level research problem solving that is the ‘bottleneck’ constraint to achieving constant streams of breakthroughs. Transforming exponential data increases to achieve higher probabilities of biomedical research breakthroughs is possible, according to the tenets of probabilistic innovation theory [3]. Social science theory is therefore perhaps also key to addressing these bottleneck conditions, as are principles of transdisciplinary research to guide theory development. The complexity of aging research and systems associated with aging, both social and biological, can perhaps be taken to reflect complexities inherent in environmental systems, which require a transdisciplinary research approach. Systems theory and other theory developed from study of complex adaptive systems can perhaps make an important contribution to the development of a service field seeking to incorporate literature on how to enable big data/big knowledge transformation as its raison d’être. Arguably, probabilistic innovation theory can be used as a core rationale for drawing together transdisciplinary theory to support this transformation.
Transdisciplinary inquiry transcends certain limitations of multidisciplinary or interdisciplinary research, providing a “systematic, comprehensive theoretical framework for the definition and analysis of the social, economic, political, environmental, and institutional factors influencing human health and well-being” [18]. Such an approach might be necessary to support complexity such as that found in aging research. 
It is important to support those on the front-lines of the biomedical research process by making theoretical developments in other fields accessible.
Such support is particularly salient in a context of rapidly developing technologies offering new opportunities for acceleration of research productivity. As a ‘service field’ to those in different areas of scientific research, the establishment of a probabilistic innovation literature stream might be helpful in the development of theory around how to assist those working in scientific fields to attain near-real time research capacity through engagement with big data. However, unlike the computational literature, this body of theory might offer complementary knowledge of human aspects of problem solving (including interactions with computational processes). Such a support field might better identify and articulate the challenges associated with transformation of knowledge inputs into scientific discovery as well as better integrate theory premised on maximising human scientific collaborations. However, to more clearly identify the potential of this service field, it is first necessary to identify the seminal problem it seeks to solve.
At the heart of this stream of literature seeking to understand how to maximise collaboration is the seminal problem of knowledge aggregation, originally identified and developed by economists such as Hayek [33] and von Hippel [34]. The knowledge aggregation problem relates to difficulties in linking tacit knowledge across geographical and other barriers, given this knowledge is inextricably tied to the individual [35] and that transmission from tacit to explicit knowledge modes can be problematic [36]. Linking experts in support of real time scientific research problem solving has found proof of concept in sites like which have a long history of harnessing crowd-sourced inputs into problem solving. 
However, interactions on this site are typically between solvers and private problem providers, and dyadic in nature in that problem solving inputs comprise proprietary information and are not released back into the crowd [32], thereby lacking exponential knowledge creation capability, notwithstanding its ‘probabilistic’ approach to knowledge creation.
With its focus on enabling human research collaborations but also harnessing emergent technologies to this end, probabilistic innovation literature therefore predicts that radical exponential increases in data and knowledge can ultimately transmit to radically increased rates of biomedical discovery [37]. This body of literature also reflects the rise of new scientific movements such as Citizen Science (CS) [38], public participation in scientific research [39], and Participant-Led Biomedical Research (PLR) [40], drawing ontological and epistemological principles from these fields. Further, probabilistic innovation theory is also premised on novel scientific ethics theory development derived from post-normal science [41].
Big Data Ethics Theory Development
These citizen science movements, which effectively also extend stakeholder theory [42] into the realm of biomedicine [43], also have implications for bioethics, as some have highlighted lack of timely biomedical responses to crises such as Ebola as essentially a failure of bioethics [43]. Arguably, viewed through the probabilistic innovation theoretical lens, recent developments in technology have enabled radically increased data, information and knowledge collection and analysis capabilities, and the field of bioethics needs to develop alongside these emergent capabilities [44]. The value of the probabilistic innovation literature lies in its ability to provide those on the front lines of scientific research with useful information focused on these changes, as well as relevant theory drawn from other fields [45].
It is important to offer stakeholders in areas of aging research a contemporary perspective of these changes. At this nexus is also the interface of bioethics, with its moral philosophy and multidisciplinary roots [46], which is experiencing a trend toward increasing stakeholder engagement and responsiveness to public conversations [47]. It seems technology has enabled a convergence of social and natural science driven by the rise of stakeholder and population voices, potentially offering radically increased democratisation of science as a working mechanism to address ‘wicked problems’ of biomedical ethics and research under conditions of uncertainty at the intersections of social values and social complexity [48]. 
Dramatically increased opportunities for scientific research offered by big data analytics therefore require a commensurate focus by theorists on how ethics and ethical engagement research can develop alongside work seeking to harness these opportunities.
Central to the goal of radically increasing breakthroughs in aging research, according to probabilistic innovation logics, are the opportunities offered by big data, but these are emerging within a research context of transparent responsiveness to principles of ethics, and also an awareness that unnecessarily slow innovation can also be unethical [44].
Callaghan [43] considers the implications of two aspects of biotechnological advances, namely the sale of human tissues and gene transfer, for scientific ethics, suggesting that ethical frameworks might usefully draw from the tenets of postnormal science [41]. According to postnormal science perspectives, problems like conflicting arguments by climate researchers require a radical deepening of ethical scrutiny, which might be best accomplished by increased scientific transparency [41]. The citizen science movement [38] might reflect a broader trend toward the democratisation of science, which may in time enable more societal engagement with scientific ethical issues. The context of bioethical engagement with biomedical research has experienced rapid global expansion [46], and theory development should keep pace with technological developments which enable more effective big data applications in aging research.
A practical example of how big data can provide synthesis of biological and social data relevant to aging research is work in the area of the ‘quantified self.’ Whereas the example offered in Fig. ( draws from examples of biomedical aging research, the following example shows how social science-related aging research can also benefit from big data applications. 
This example also offers a perspective of how social scientific and biomedical research can benefit from holistic theoretical assemblage, as human behaviour can be quantified and big data can allow linkages hitherto unavailable to aging researchers.
The Quantified Self
The Quantified Self (QS) trend in big data science relates to how individuals self-track biological, physical, behavioural, or environmental information, either as individuals or groups. Such large scale data creates opportunities for big data scientists to develop theory and new models based on radically increased volumes of data collection, integration, and analysis [49]. Parallel to these developments is other work on developing open-access database resources, all the while also developing privacy standards for personal data use. Next-generation applications include potential for behaviour change, pattern recognition, and the aggregation of self-tracking data streams from “wearable electronics, biosensors, mobile phones, genomic data and cloud-based services” [49]. These innovations also offer novel opportunities for capturing and analysing the use of words to predict actions or activity, and to track social media and internet activity. This information has been shown to have substantial real time predictive and explanatory power [19]. Holistic theoretical assemblage applied to QS research therefore offers the opportunity to ‘close the circle’ at the intersection of behavioural and medical aging research.
QS theory may therefore offer researchers studying aging system effects useful opportunities to obtain high volume multiple data perspectives. According to Swan:
The long-term vision of QS activity is that of a systemic monitoring approach where an individual’s continuous personal information climate provides real-time performance optimization suggestions. There are some potential limitations related to QS activity-barriers to wide-spread adoption and a critique regarding scientific soundness- but these may be overcome. 
One interesting aspect of QS activity is that it is fundamentally a quantitative and qualitative phenomenon since it includes both the collection of objective metrics data and the subjective experience of the impact of these data…In the long-term future, the quantified self may become additionally transformed into the extended exoself as data quantification and self-tracking enable the development of new sense capabilities that are not possible with ordinary senses. The individual body becomes a more knowable, calculable, and administrative object through QS activity, and individuals have an increasingly intimate relationship with data as it mediates the experience of reality [49].
The use of QS by big data science is but one example of the radically increased potential for data collection and analysis provided by big data. These analytics offer new opportunities for tracking human mobility. The GPS capabilities of mobile telephony have demonstrated usefulness in information management under disaster conditions, allowing algorithms and visualisation tools to be applied in real time [19]. Tracking of natural events, of digital transactions, and of behaviour has been shown to offer new potential for understanding patterns and relationships between human activity and context, with important medical applications [19]. More general applications of big data to biomedical aging research are now considered, and the broader contributions of big data analytics to aging research are summarised and related to the probabilistic innovation stream of literature which seeks to consolidate theory related to the development of real time research capability, a process dependent on big data analytics.
Biomedicine and Big Data: Aging Research Potential
The impact of big data has revolutionary potential for scientific advancement, according to Hilbert [19]:
If we improve the structure of prior information on which to base our estimates, our uncertainty will on average be reduced. The better the prior, the better the estimate, the better the decision. This is not merely an intuitive analogy, but one of the core theorems of information theory and provides the foundation for all kinds of analytics…The Big Data paradigm…provides a vast variety of new kinds of priors and estimation techniques to inform all sorts of decisions. The impact on the economy has been referred to as ‘the new oil’…Its impact on the social sciences can be compared to the impact of the invention of the telescope for astronomy and the invention of the microscope for biology (providing an unprecedented level of fine-grained detail).
Probabilistic innovation literature suggests exponential increases in volumes of problem solving inputs and data can ultimately transmit to scientific outputs, and that theory can provide useful insights as to the extent this transmission can occur, and under which conditions this may best occur. What has been absent in the literature to date is a field which explicitly focuses problem solving on attaining real time research productivity.
The logics which unite the probabilistic innovation literature derive from a core theorem of information theory, which predicts that in the same way a revolution in data access and accumulation has radically increased data stocks, big data analytics can drive a commensurate revolution in ‘big knowledge’ [50], or an exponentially increased probability of attainment of research breakthroughs.
As previously stressed, key to this, however, is the human factor, in that while biomedical research draws heavily on natural sciences, social science research is also key to understanding the role of human problem solving in the transformation of big data accumulation to knowledge creation. 
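The information-theoretic result alluded to in the passage from Hilbert above is, on one standard reading, the statement that conditioning cannot increase uncertainty on average. As a sketch, let X denote the quantity to be estimated and Y the prior information available; both symbols are introduced here for illustration and do not appear in the source. With H denoting Shannon entropy and I mutual information:

```latex
H(X \mid Y) \;\le\; H(X),
\qquad\text{equivalently}\qquad
I(X;Y) \;=\; H(X) - H(X \mid Y) \;\ge\; 0,
```

with equality exactly when X and Y are independent. The qualifier ‘on average’ matters: a particular observation Y = y can leave an estimator more uncertain than before, but averaged over all values of Y the remaining uncertainty about X never exceeds the original uncertainty.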
It is, however, emergent technological advances which have made this possible, and which are expected to contribute further to this transformation.
Underpinning these processes, therefore, are necessarily the technological advancements which have made big data possible. Ultimately, if the achievement of breakthrough innovations does indeed come to match the explosion in data accumulation, radical advances in aging research can be expected. Exponential increases in telecommunications bandwidth, data storage systems and digital computational capacities have occurred, and, according to Hilbert:
Over two decades of digitisation, the world’s effective capacity to exchange information through two-way telecommunication networks has grown from the information equivalent of two newspaper pages per person per day in 1986 (0.3 optimally compressed exabytes worldwide, 20% of which were digitised) to six entire newspapers two decades later in 2007 (65 exabytes worldwide, 99.9% digitised)…in an average minute of 2012, Google received around 2,000,000 search queries, Facebook users shared almost 700,000 pieces of content and Twitter users roughly 100,000 microblogs…[19].
Already, “in 2010, it cost merely $600 to buy a hard disk capable of storing all the music in the world” and this increased memory capacity can now store an ever larger part of the “growing information flow”, but it remains possible to analyse only a small percentage of this captured and stored data, and much of it is not actionable [19]. 
In summary, according to tenets of probabilistic innovation theory, an exponential increase in knowledge (and real time knowledge problem solving) commensurate with increases in information is possible, but this requires a body of theory and literature specifically focused on this process.
According to probabilistic innovation theory, exponential growth in data, information and knowledge can transmit to exponential growth in biomedical research outcomes, and the way these relate to each other can be represented as a probabilistic function. This transmission, however, seems a long way off an ultimate 1:1 relationship, and probabilistic innovation as a field of research is focused on explaining constraints to innovation that account for this gap. This gap is shown in Fig. (2), as the difference between exponential growth in data, information and knowledge and the non-exponential growth in biomedical research outcomes. From a knowledge management theory base this perspective therefore seeks to provide a synthesis of theory and evidence relating to novel methodologies which might be helpful to biomedical researchers seeking to close this gap.
Fig. (2)
Probabilistic innovation transmission of big data to research outcomes.
With its explicit focus on the knowledge aggregation problem [34], probabilistic innovation perspectives take the tacit knowledge of human problem solvers to be key to enabling the transformation of data capacity into radically increased knowledge creation. Whereas advances in artificial intelligence (AI) are a helpful counterpoint to attempts to radically increase volumes of human problem solving inputs into biomedical research problems, until AI advances sufficiently to be able to solve knowledge aggregation problems, breakthroughs are considered to be primarily a function of expert human collaborations, and the key to closing the transmission gap might lie in efforts to radically increase the collaborative potential of expert research problem solvers [3]. No matter how insoluble the data-knowledge problem might seem currently, theory development focused on this ‘space’ which applies the principle of exponentially increasing collaboration and engagement of expert human problem solvers is expected to make some headway over time in addressing this seminal research problem. It is therefore holistic theoretical assemblage, and new opportunities offered by big data analytics, that can perhaps offer a useful contribution to closing this gap. The paper is now concluded with a summary of certain theoretical implications arising from the above discussions.
SYNTHESIS
On the basis of the above discussions, radical changes to the research landscape caused by the emergence of big data analytics can be represented as changes to inductive and theoretical methodological approaches. Following Kitchin’s logics, Fig. (3) differentiates between inductive, theoretical and synthesis modes of research and these modes are in turn differentiated between pre-big data and post-big data contexts [1].
Fig. (3)
Paths to real time research capability.
Whereas before the advent of big data, inductive data research could be considered data mining, within the big data context new opportunities exist for ‘closing the circle,’ or being able to have extremely large data sources which offer insights into populations and which allow for patterns and regularities to be identified and related to other patterns and regularities on a scale that allows causal interpretation that is grounded in the data. Under these conditions, the sample is effectively the population itself, and samples can interrelate, offering exponentially increased opportunities to close gaps in knowledge. Given the complexity of interrelationships at the biomedical and social level, big data is considered to be able to offer useful opportunities for aging research to draw from its multiple sub-fields, even from a purely inductive perspective. However, following Kitchin’s work [1], the synthesis of inductive and theoretical approaches can herald a new era in aging research different from what inductive perspectives can offer on their own.
Prior to what is termed here the big data era, theoretical research has largely been guided by theory testing framed as rejection of the null hypothesis, or by the principle of falsifiability [51]. Popper’s analysis is based on the propensity interpretation of probability. Probabilistic innovation [3] derives from Popper’s use of probability of falsification [52], but incorporates the notion of ‘inclusivity’ through open source engagement with research problem solving. Popperian exclusivity is taken to be associated with the testing of one hypothesis at a time, whereby knowledge creation is limited by delimitation constraints. In contrast, Popperian inclusivity is taken to be the testing of theory on a multidimensional scale. 
An example of such a process would be populating a very large problem space with high numbers of researchers, thus collapsing time constraints in biomedical problem solving, whereby theory is tested in an open-source context, with theory testing providing the input for further theory development in almost-real time. Thus the exclusivity of knowledge is transcended, and theory development feeds on knowledge creation itself, as all inputs and collaborators are part of an inclusive process, which is made possible by big data. This process is akin to the process whereby open source software development occurs as developers release their work back into the crowd (a process not typically followed in proprietary crowdsourced R&D on platforms such as InnoCentive).
The synthesis of these modes and contexts, however, allows transcendence of a state of paradigm incommensurability [53], where different ontological and epistemological perspectives could be used to argue for the independence of these modes. A state of holistic theoretical assemblage may therefore offer hitherto unimagined theoretical and inductive opportunities for biomedical and social research. Kitchin’s notion of ‘data-driven science’ [1] premised on a hybrid combination of abductive, inductive and deductive approaches, or holistic theoretical assemblage, might therefore offer hope for a probabilistic relationship to develop and strengthen between exponentially increasing big data availability and increases in biomedical and social-scientific breakthroughs. Arguably, further work seeking to develop theory which better takes advantage of the novel opportunities of big data may increase the probability of one day achieving real time research capability. The importance of this goal may warrant greatly increased research interest in this area.
CONCLUSION
The objective of this paper was to relate recent developments in big data analytics to aging research, in order to highlight opportunities for researchers to apply theory and practice to their research, whether in natural or social-scientific aging research. The importance of aging research as a core field which underlies human health in general was stressed, as were its linkages with novel developments in fields such as biology and recent technological developments which have enabled communities of researchers to intensify collaborative efforts toward the ultimate attainment of real-time research capabilities.
Novel thought relating to theory development associated with the emergence of big data analytics was identified, and its potential usefulness for certain aspects of aging research was also considered. Fundamental changes in biomedicine were also discussed, such as the increasing importance of biologics, or protein-based medicine, and the potential of big data analytics to make important contributions to these areas of research. Examples of large scale research collaboration communities were also used to illustrate the need for theory and literature focused on identifying and building on trends toward increased collaboration in support of real time research capabilities. Certain ‘quantified self’ literature was introduced as another example of how biomedical and social research data convergence could ultimately be enabled. Finally, probabilistic innovation theory was introduced, and its predictions discussed, particularly that big data analytics may usher in a new era of knowledge creation in which a probabilistic relationship between big data and big knowledge creation may ultimately be quantified. 
Implications arising from these predictions were made explicit, namely that if the radical advances in big data accumulation and analysis can result in a commensurate increase in knowledge creation, then the rates of breakthroughs in biomedical science may also increase exponentially. Certain theoretical paths to real time research capability were finally considered, taking recourse to Popper’s and Kuhn’s theoretical ideas. Given the importance of aging research as a field of science, and its expansiveness as a theoretical domain, it is concluded that the emergent opportunities offered by recent developments in big data analytics may continue to be particularly important for those in this area of scientific inquiry.
Ethics Approval and Consent to Participate
Not applicable.
Human and Animal Rights
No animals or humans were used in the studies that form the basis of this research.