
A call for biological data mining approaches in epidemiology.

Shannon M Lynch, Jason H Moore


Year:  2016        PMID: 26734074      PMCID: PMC4700596          DOI: 10.1186/s13040-015-0079-8

Source DB:  PubMed          Journal:  BioData Min        ISSN: 1756-0381            Impact factor:   2.522


Forging a partnership between the traditionally distinct disciplines of informatics and epidemiology is becoming increasingly necessary. Epidemiology is the study of the distribution and determinants of disease. Traditionally, epidemiology has focused on univariate analysis and studied a single or a small number of risk determinants and their relationship to health outcomes. However, given the multifactorial and complex nature of chronic diseases, such as cancer, epidemiology has shifted its focus from single risk factors to multilevel conceptual frameworks of health that serve to integrate and study multiple risk factors and how they interact across 3 main levels: 1) the macro-environment, defined by factors outside an individual, such as where a person lives, their family/social circumstances, and environmental exposures; 2) the individual, which includes behaviors, such as smoking, and psychosocial factors; 3) biology, which includes the study of genes and other biomarkers [1]. Similar to the biologic concept of epistasis, understanding which risk factors are most relevant to disease and how they interact is exceedingly convoluted within one level, let alone across multiple levels. Further, few existing population and clinic-based study samples include risk factor information at each of these levels. Thus, it is difficult to test these conceptual frameworks from both a data availability and analytic standpoint.

Epidemiology could benefit from entering the “big data” arena and has begun to do so with studies at the biologic level. Advances in -omic technologies have led to the generation of large datasets in genomics and proteomics; however, publicly available datasets that contain risk factor information at both the individual and macro-environmental level remain untapped and underutilized. For instance, U.S. Census and U.S. Consumer Spending data could be combined with existing clinical biorepositories and linked through a geocode to test hypotheses related to the interaction of the macro-environment and biology in disease etiology and prognosis. A recent report of emerging macrotrends in epidemiology suggests that data integration and generation of large social, environmental, and clinical datasets should be a core competency in epidemiologic training [2]. However, the creation of these enormous datasets is futile without the ability to analyze and manipulate big data. Analyzing big data requires knowledge and execution of data mining techniques.

Like most biomedical sciences, epidemiology relies heavily on reductionist approaches that use standard regression models (i.e. linear, logistic, multilevel) based on statistical assumptions that may not reflect the true nature of how a risk factor or group of risk factors influences disease etiology and prognosis. For example, genome-wide association studies (GWAS) have yielded new insights into disease processes, but have proven to have little prognostic value, perhaps due to a stringent emphasis on identifying true positives, as well as a focus on the analysis of univariate as opposed to joint effects [3]. Complex systems approaches and agent-based modeling (ABM) have become increasingly popular in epidemiologic investigations, given their focus on interactions or joint effects. ABM is a type of systems algorithmic approach that accounts for the recognition of feedback, interference, change over time, and nonlinearities among risk factors a priori [4], based on existing knowledge and observation, but it is not a true data mining technique that can identify novel risk factors or groups of risk factors empirically. Epidemiology is in need of more powerful modeling approaches that relax model assumptions and allow for more empiric investigations of large scale, joint biologic, social, and genetic datasets.
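The contrast between univariate and joint effects drawn above can be illustrated with a small sketch. The toy cohort below is entirely hypothetical: two binary risk factors and an outcome that occurs only when exactly one factor is present, an extreme epistasis-like interaction chosen for illustration and not taken from any study cited here. Each factor examined alone shows no association, while the joint table shows the full pattern.

```python
from itertools import product

# Hypothetical toy cohort: two binary risk factors (a, b) and a binary outcome.
# The outcome occurs only when exactly one factor is present (an XOR-style
# interaction); this is an illustrative assumption, not real data.
cohort = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2) for _ in range(25)]


def risk(rows):
    """Proportion of rows in which the outcome occurred."""
    return sum(outcome for *_, outcome in rows) / len(rows)


def marginal_risk_difference(rows, index):
    """Risk among the exposed minus risk among the unexposed, one factor at a time."""
    exposed = [r for r in rows if r[index] == 1]
    unexposed = [r for r in rows if r[index] == 0]
    return risk(exposed) - risk(unexposed)


def joint_risks(rows):
    """Risk within each combination of the two factors."""
    return {
        (a, b): risk([r for r in rows if r[0] == a and r[1] == b])
        for a, b in product([0, 1], repeat=2)
    }


print(marginal_risk_difference(cohort, 0))  # 0.0: factor a alone looks irrelevant
print(marginal_risk_difference(cohort, 1))  # 0.0: factor b alone looks irrelevant
print(joint_risks(cohort))  # risk is 1.0 when exactly one factor is present
```

Real data would rarely be this clean, but the sketch shows why a strictly one-factor-at-a-time screen can miss a risk pattern that only appears jointly.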
Biological data mining approaches, particularly those related to artificial intelligence and machine learning, could address current epidemiologic limitations and are starting to be explored in population-based studies that include patient and biologic level data [5, 6]. These approaches are model-free, nonparametric, and allow for high performance computing that can incorporate artificial intelligence approaches with human knowledge [6]. Some machine learning approaches, such as neural networks [6] and learning classifier systems [7], have demonstrated an added statistical benefit and have revealed effects missed by traditional regression frameworks [3]. While one of the limitations of machine learning algorithms has been validation and interpretation of findings, epidemiology often plays an important role in evaluating inferential statistical methods [8]. Thus, the computational capacity offered by machine learning algorithms, which can allow for the identification of complex interactions across multiple data levels and multiple risk factors, warrants further study in epidemiologic investigations.

Epidemiology and informatics can be linked through common data mining methods applied across macro-environmental, individual, and biologic data sources. A partnership with epidemiology would expand the application and reach of data mining methods beyond just genomic or proteomic investigations. Applying big data approaches, namely the creation of large scale datasets from existing resources, as well as data mining methods (i.e. those related to machine learning), to test hypotheses related to epidemiologic, multilevel conceptual models will likely have implications for improving understanding of disease etiology and prognosis. Informatics can aid in methods development and epidemiology can assess the precision, accuracy, and effectiveness of inferences made using big data approaches [8].
Thus, an Epidemiology-Big Data collaboration would benefit both groups, and it is the goal of BioData Mining to foster these types of collaborations.
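As a minimal sketch of the geocode linkage proposed earlier (area-level data joined to biorepository records), the example below attaches macro-environmental variables to patient records through a shared area identifier. All field names, identifiers, and values are hypothetical placeholders invented for illustration; they do not describe the actual layout of U.S. Census files or of any biorepository.

```python
# Hypothetical area-level (macro-environmental) records keyed by a geocode.
area_data = {
    "TRACT-A": {"median_income": 41000, "pct_below_poverty": 0.27},
    "TRACT-B": {"median_income": 88000, "pct_below_poverty": 0.06},
}

# Hypothetical biorepository records with individual- and biologic-level fields.
patients = [
    {"patient_id": "P001", "geocode": "TRACT-A", "smoker": True, "biomarker": 3.2},
    {"patient_id": "P002", "geocode": "TRACT-B", "smoker": False, "biomarker": 1.1},
    {"patient_id": "P003", "geocode": "TRACT-Z", "smoker": False, "biomarker": 2.4},
]


def link_by_geocode(patient_rows, area_lookup, key="geocode"):
    """Attach area-level variables to each patient record sharing a geocode.

    Returns the linked multilevel records plus the IDs of patients whose
    geocode has no area-level match, so missing linkage stays visible.
    """
    linked, unmatched = [], []
    for row in patient_rows:
        area = area_lookup.get(row[key])
        if area is None:
            unmatched.append(row["patient_id"])
            continue
        linked.append({**row, **area})
    return linked, unmatched


linked, unmatched = link_by_geocode(patients, area_data)
print(len(linked), unmatched)  # 2 ['P003']
```

Reporting unmatched records separately, instead of silently dropping them, is a design choice: incomplete linkage can itself bias a multilevel analysis if it is related to where people live.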
References

1.  The learning classifier system: an evolutionary computation approach to knowledge discovery in epidemiologic surveillance.

Authors:  J H Holmes; D R Durbin; F K Winston
Journal:  Artif Intell Med       Date:  2000-05       Impact factor: 5.326

2.  From Smallpox to Big Data: The Next 100 Years of Epidemiologic Methods.

Authors:  Stephen J Gange; Elizabeth T Golub
Journal:  Am J Epidemiol       Date:  2015-10-06       Impact factor: 4.897

Review 3.  Machine learning approaches for the discovery of gene-gene interactions in disease data.

Authors:  Rosanna Upstill-Goddard; Diana Eccles; Joerg Fliege; Andrew Collins
Journal:  Brief Bioinform       Date:  2012-05-18       Impact factor: 11.622

4.  Formalizing the role of agent-based modeling in causal inference and epidemiology.

Authors:  Brandon D L Marshall; Sandro Galea
Journal:  Am J Epidemiol       Date:  2014-12-05       Impact factor: 4.897

5.  Charting a future for epidemiologic training.

Authors:  Ross C Brownson; Jonathan M Samet; Gilbert F Chavez; Megan M Davies; Sandro Galea; Robert A Hiatt; Carlton A Hornung; Muin J Khoury; Denise Koo; Vickie M Mays; Patrick Remington; Laura Yarber
Journal:  Ann Epidemiol       Date:  2015-03-14       Impact factor: 3.797

6.  Bridging the gap between biologic, individual, and macroenvironmental factors in cancer: a multilevel approach.

Authors:  Shannon M Lynch; Timothy R Rebbeck
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2013-03-05       Impact factor: 4.254

7.  Networks in Coronary Heart Disease Genetics As a Step towards Systems Epidemiology.

Authors:  Fotios Drenos; Enzo Grossi; Massimo Buscema; Steve E Humphries
Journal:  PLoS One       Date:  2015-05-07       Impact factor: 3.240

Review 8.  Machine learning applications in cancer prognosis and prediction.

Authors:  Konstantina Kourou; Themis P Exarchos; Konstantinos P Exarchos; Michalis V Karamouzis; Dimitrios I Fotiadis
Journal:  Comput Struct Biotechnol J       Date:  2014-11-15       Impact factor: 7.271

