Ivo D Dinov.
Abstract
This perspective provides a rationale for redesigning and a framework for expanding the graduate health science analytics and biomedical doctoral program curricula. It responds to digital revolution pressures, ubiquitous proliferation of big biomedical data, substantial recent advances in scientific technologies, and rapid progress in health analytics. Specifically, the paper presents a set of common prerequisites, a proposal for a core computational and data analytic curriculum, and a list of expected outcome competencies for graduates of doctoral health science and biomedical programs. The manuscript emphasizes the necessity for coordinated efforts of all stakeholders, including trainees, educators, academic institutions, funding agencies, and policy makers. Concrete recommendations are presented on how to ensure that graduates with terminal health science analytics and biomedical degrees are trained and able to continuously self-learn, effectively communicate across disciplines, and promote adaptation and change to counteract the relentless pace of automation and the law of diminishing returns.
Keywords: analytics; data science; doctoral training; graduate curricula; health science; methods; quantitative education
Year: 2020 PMID: 32117857 PMCID: PMC7031195 DOI: 10.3389/fpubh.2020.00022
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Examples of prerequisites for strong biomedical and health sciences quantitative doctoral programs.
| Bachelor's degree or equivalent | Prior quantitative methods/analytics training and coding skills | Graduate programs require a basic minimum level of quantitative skills |
| Quantitative literacy | Undergraduate calculus, linear algebra, numerical methods, introduction to probability, statistics, or data science | These represent fundamentals that are required for most methods and analytics graduate health science courses |
| Some coding experience | Some academic, training or professional experience in programming or software development | Most practicing bioinformaticians and health analysts need substantial coding experience, e.g., Java, C/C++, HTML5, R, Python, Perl, PHP, SQL/DB |
| Strong motivation | Substantial current interest in immersion and motivation to pursue long-term quantitative data analytic applications | Dedication to prolonged and sustained immersion in hands-on practice, collaboration, and methodological health research is very important |
Exemplary courses at the University of Michigan.
| HS853: Advanced scientific methods for health sciences | Covers a number of modern analytical methods for advanced healthcare research. Specific focus is on reviewing and using innovative modeling, computational, analytic and visualization techniques to address specific driving biomedical and healthcare applications. The course covers the 6 dimensions of Big Data, statistical cross-validation, and model-based and model-free forecasting | Analytics/applications |
| HS650: Data science and predictive analytics | Concepts, techniques, tools, and services for managing, harmonizing, aggregating, preprocessing, modeling, analyzing, and interpreting large, multi-source, incomplete, incongruent, and heterogeneous data (Big Data). The focus will be to expose students to common challenges related to handling Big Data and present the enormous opportunities and power associated with our ability to interrogate such complex datasets, extract useful information, derive knowledge, and provide actionable forecasting | Analytics |
| PIBS503: Research responsibilities and ethics | Covers case-studies on fraud, fabrication, and plagiarism, data storage, ownership, and peer review, animal use and care, human subjects research and IRBs, conflict of interest, research in the global workplace, dual use issues, discussion about ethical practices particular to project/laboratory | Research ethics |
| BIOINF585: Machine learning for systems biology & clinical informatics | Focuses on machine learning methods and their applications in biomedical sciences. Topics include: (1) data management solutions for Big Data applications, (2) feature extraction and reduction methods, (3) clustering and classification methods, (4) testing and validation of models, (5) applications in systems biology and clinical informatics | Methods and apps |
| BIOINF527: Introduction to bioinformatics and computational biology | Introduces students to the fundamental theories and practices of Bioinformatics and Computational Biology via a series of integrated lectures and labs. These lectures and labs will focus on the basic knowledge required in this field, methods of high-throughput data generation, accessing public genome-related information and data, and tools for data mining and analysis | Methods and apps |
| BIOSTAT602: Biostatistical inference | Provides a deep understanding of the key concepts and analytics of statistical inference. Statistical inference methods are of critical importance for statisticians to properly process data and organize information to quantify uncertainty, so as to deliver adequate solutions to substantive questions | Methods and analytics |
| Math 571: Numerical linear algebra | Introduces numerical linear algebra as a core subject in scientific computing. Three types of problems are considered: (1) linear systems ( | Methods and analytics |
| Stats 415: Data mining and statistical learning | Covers the principles of data mining, exploratory analysis and visualization of complex data sets, and predictive modeling. The presentation balances statistical concepts (such as over-fitting data, and interpreting results) and computational issues. | Methods and analytics |
| Stats 503: Applied multivariate analysis | Presents modern methods of multivariate data analysis and statistical learning, including theoretical foundations, and practical applications. Topics include principal component analysis and other dimension reduction techniques, classification (discriminant analysis, decision trees, nearest neighbor classifiers, logistic regression, support vector machines, ensemble methods), and clustering | Methods and analytics |
| NERS 590: Methods and practice of scientific computing | Develops the necessary skills to be effective computational scientists and how to produce work that adheres to the scientific method. A broad range of topics are covered including: software engineering best practices, computer architectures, computational performance, common algorithms in engineering, solvers, software libraries for scientific computing, uncertainty quantification, and validation | Methods |
| EECS 584: Advanced database management systems | Advanced topics and research issues in database management systems. Distributed databases, advanced query optimization, query processing, transaction processing, data models, and architectures. Data management for emerging application areas, including bioinformatics, the internet, OLAP, and data mining. A substantial course project allows in-depth exploration of topics of interest | Methods and analytics |
| EECS 545: Machine learning | Introduces computer algorithms that can learn from data or past experience to predict well on new, unseen data. In the past few decades, machine learning has become a powerful tool in artificial intelligence and data mining, and it has made major impacts in many real-world applications. This course gives a graduate-level introduction to machine learning and provides foundations of machine learning, mathematical derivation and implementation of the algorithms, and their applications | Methods and analytics |
| EECS 453: Applied data analysis | Theory and application of matrix algorithms to signal processing, data analysis and machine learning. Theoretical topics include subspaces, eigenvalue and singular value decomposition, projection theorem, constrained, regularized, and unconstrained least squares techniques and iterative algorithms. Applications include image deblurring, ranking of webpages, image segmentation and compression, social networks, circuit analysis, recommender systems, handwritten digit recognition | Methods and analytics |
Examples of hypothetical broad-scope 5-year program plans by specialization.
| Bioinformatics | Discipline-specific courses. | Data science | Advanced ML/AI | Inter-professional education, | Domain-specific AI/ML applications | Data-driven dissertation-topic specific research |
| Professional schools | Computing, | Data science | ||||
| Public health, biostatistics | AI/ML techniques | Clinical decision support systems | ||||
| Biomathematics | Computational biology, | High-throughput precision health | ||||
| Neuroscience | Computational neuroscience, | Data science and predictive | ||||
Expected program graduate's competencies.
| Algorithms and applications | Tools | Working knowledge of basic software tools (command-line, GUI based, or web-services) | Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL |
| | Algorithms | Knowledge of core principles of scientific computing, applications programming, numerical methods, APIs, algorithm complexity, and data structures | Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching |
| | Application domain | Data analysis experience from at least one application area, either through coursework, internship, research project, etc. | Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering, and physical sciences |
| Data management | Data validation and visualization | Curation, Exploratory Data Analysis (EDA) and visualization | Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js) |
| | Data wrangling | Skills for data normalization, data cleaning, data aggregation, and data harmonization and registration | Data imperfections include missing values, inconsistent string formatting (“2016-01-01” vs. “01/01/2016”), PC/Mac/Linux time vs. timestamps, structured vs. unstructured data, ASCII vs. binary format, compression, etc. |
| | Data infrastructure | Handling databases, web-services, Hadoop, multi-source data | Data structures, SOAP protocols, ontologies, XML, JSON, streaming |
| Analysis methods | Statistical inference | Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling | Biological variability vs. technological noise, parametric (likelihood) vs. non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression |
| | Study design and diagnostics | Design of experiments, power calculations and sample sizing, strength of evidence, | Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction |
| | Machine learning | Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN | Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning |
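
As an illustration of the data wrangling competency listed above, the following minimal Python sketch harmonizes the two date formats mentioned in the table (“2016-01-01” and “01/01/2016”) and treats blank or unrecognized entries as missing values. The function name `normalize_date`, the `KNOWN_FORMATS` tuple, and the sample records are hypothetical and are not taken from the paper. The sketch also assumes that slash-separated dates are month-first, which would need to be confirmed against the actual data source.

```python
from datetime import date, datetime

# Assumed set of formats present in the raw data (hypothetical; extend as needed).
# "%m/%d/%Y" assumes month-first ordering for slash-separated dates.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(raw):
    """Parse a raw date string; return None for missing or unrecognized values."""
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None  # no known format matched; treat as missing


# Hypothetical raw column mixing formats, padding, and missing values
records = ["2016-01-01", "01/01/2016", " 2016-02-29 ", "", None, "n/a"]
cleaned = [normalize_date(r) for r in records]

# Report every value in one consistent ISO 8601 representation
print([d.isoformat() if d else None for d in cleaned])
```

Returning `None` rather than raising an error keeps missing values explicit, so a later imputation or exclusion step can handle them deliberately.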