Clinical Pathology and the Data Science revolution.

Dustin R Bunch, Daniel T Holmes.

Keywords:  AI, Artificial Intelligence; Clinical Pathology; Data Science; Laboratory Medicine; ML, Machine Learning; MS, Mass Spectrometry

Year:  2022        PMID: 35340694      PMCID: PMC8942826          DOI: 10.1016/j.jmsacl.2022.03.001

Source DB:  PubMed          Journal:  J Mass Spectrom Adv Clin Lab        ISSN: 2667-145X


Clinical laboratory medicine has always been replete with data, yet relatively little effort has been applied from within our discipline to leverage it for clinical, operational, and financial insights. While the concept of “Data Science” as a scientific discipline is not new [1], in the past decade the development, maturation, and democratization of open-source Data Science tools have made it possible for any determined laboratorian to incorporate them into clinical practice. Data Science is a multidisciplinary field incorporating aspects of computer science, mathematics, statistics, and predictive analytics for the purpose of extracting actionable insights from large datasets produced by any business or scientific sector. As it pertains to healthcare, Data Science can be leveraged for everything from diagnostic decision support to workforce analysis to the automation of repetitive data entry tasks. The nomenclature used in Data Science can be confusing because a number of overlapping concepts arise frequently: data analytics, big data, artificial intelligence (AI), and machine learning (ML). The overlap makes precise definitions challenging. Data Science itself has been defined by Kelleher and Tierney as “a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from large data sets” [2], while Donoho defines it as “The science of learning from data; it studies the methods involved in the analysis and processing of data and proposes technology to improve methods in an evidence-based manner” [3]. Data Analytics can be considered a necessary subset of Data Science directed at performing data cleansing, data merging, descriptive statistical analysis, and data visualization tasks for the purpose of drawing meaningful insights. Data Analytics is more often focused on business intelligence than on scientific inference per se.
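As a rough illustration of the Data Analytics tasks named above (cleansing, merging, and descriptive statistics), the following minimal Python sketch uses only the standard library. The record layout, field names, and values are invented for illustration and do not come from any real laboratory system; dropping censored results such as "<0.5" is a simplifying assumption, not a recommended statistical practice.

```python
import statistics

# Hypothetical raw exports; field names and values are invented for illustration.
lis_results = [
    {"patient_id": "P001", "test": "testosterone", "value": " 12.4 "},
    {"patient_id": "P002", "test": "testosterone", "value": "<0.5"},
    {"patient_id": "P003", "test": "testosterone", "value": "18.9"},
    {"patient_id": "P004", "test": "testosterone", "value": "hemolyzed"},
]
ehr_demographics = {
    "P001": {"sex": "M", "age": 34},
    "P002": {"sex": "F", "age": 29},
    "P003": {"sex": "M", "age": 51},
}


def clean_value(raw):
    """Cleansing: return a float, or None for censored or non-numeric results."""
    text = raw.strip()
    if text.startswith(("<", ">")):
        return None  # assumption: censored results are simply excluded here
    try:
        return float(text)
    except ValueError:
        return None


def merge_results(results, demographics):
    """Merging: join cleaned results to demographics on patient_id."""
    merged = []
    for row in results:
        value = clean_value(row["value"])
        demo = demographics.get(row["patient_id"])
        if value is None or demo is None:
            continue  # skip rows that cannot be cleaned or matched
        merged.append({**row, "value": value, **demo})
    return merged


def summarize(rows):
    """Descriptive statistics over the cleaned, merged values."""
    values = [r["value"] for r in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


merged = merge_results(lis_results, ehr_demographics)
summary = summarize(merged)
print(summary)
```

In practice such work is more often done with dedicated data-frame libraries in R or Python; the standard-library version here is only meant to make the three steps explicit.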
AI is a term coined in the 1950s whose meaning has become very broad. Fundamentally, AI is any computational process that creates the appearance of human intelligence. One can therefore leverage Data Science tools to build components of any AI system. ML is a subset of AI wherein the computational algorithm is able to learn from experience (that is, through the provision of new example cases) without being explicitly programmed to do so [4]. Lee and Durant give a simplified beginner’s tutorial of ML applied to mass spectrometry data, with code provided in both Python and R to suit the reader [5]. Like AI, the development of any machine learner will involve a large component of Data Science. Big Data refers, more or less, to data sets that are large and unwieldy and that can be characterized by the 3 Vs of volume, velocity, and variety [6], [7]. Volume indicates that the amount of data generated overwhelms traditional software tools, so that specific strategies (e.g. distributed computing) are required to perform the computational tasks [8]. Velocity indicates that the data are generated and accumulate rapidly. Variety, in the healthcare context, is easy to understand: numerical results, narrative notes and reports in the clinical chart, images from scanned external reports, radiology, anatomical pathology, and sequencing data from genomic studies. Clinical laboratory data fulfill this definition of Big Data, with sources including laboratory information systems, the electronic health record, analytical instruments, and other ancillary systems. These data are frequently non-standardized and unstructured in their formats, requiring data cleansing and merging (“wrangling”) prior to analysis.
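The idea of ML described above, an algorithm whose behavior comes from example cases rather than hand-written rules, can be sketched with a nearest-centroid classifier in plain Python. This is a deliberately simple stand-in and not the method of the tutorial cited above; the two numeric features and the labels "acceptable" and "review" are hypothetical.

```python
import math


def fit_centroids(examples):
    """Learn one mean feature vector (centroid) per label from labelled examples."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical training cases: (feature vector, label). Values are invented.
training = [
    ((1.0, 0.2), "acceptable"),
    ((0.9, 0.3), "acceptable"),
    ((0.3, 1.1), "review"),
    ((0.2, 0.9), "review"),
]

model = fit_centroids(training)
print(predict(model, (0.8, 0.25)))
```

No classification rule is written by hand here: the decision boundary is determined entirely by the training cases, so supplying additional labelled examples and refitting changes the predictions without any change to the code.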
The Data Science tasks intrinsic to laboratory medicine have created a need for the clinical laboratorian to adopt new tools, including programming languages (R, Python, Julia), literate programming tools (Markdown), web-app development tools (Shiny, Dash), and deployment strategies designed for reliability and long-term stability (e.g. use of cloud infrastructure, containers). These tools allow software development to fill gaps where no commercial solutions exist. Method validation, QC/QA, data automation, instrument interfacing, automated reports, and dashboard creation are typical targets for application development in clinical laboratory medicine. Examples of these can be found in Haymond’s article on creating dashboards for business intelligence [9] and Geistanger’s automated workflow for stability data [10]. Every area of the healthcare system is increasingly affected by Data Science, and as more data are generated and stored, the need to leverage them for clinical and operational insights grows as well. The scientific leadership of Mass Spectrometry and Advances in the Clinical Laboratory (MSACL.org) recognized the need for Data Science education nearly a decade ago and began promoting it as a discipline in clinical laboratory medicine through short courses, seminars, and now a special issue in JMSACL. Many of the clinical laboratory Data Science thought-leaders have either taken or contributed to MSACL short courses and have supported this special issue through articles or peer review. Because this is an emerging area of research and operational interest, gathering the insights and experiences of the laboratory community seems essential to the development of the next generation of clinical laboratory scientists. The goal of this Data Science Special Issue is to draw on the laboratory community’s knowledge to showcase the wide variety of Data Science applications deployed in laboratory medicine.
We have gathered articles that span a large swath of Data Science applications in the clinical laboratory: the aforementioned article by Haymond on creating business intelligence dashboards [9], automated specimen stability statistics by Geistanger et al. [10], reproducible manuscripts in R Markdown [11], and workflows for continuous [12] and indirect [13] reference intervals. Applications to mass spectrometry include mass spectrometry imaging by Shedlock and Stumpo [14] and Balluff et al. [15], MS quality metrics by Wilkes et al. [16] and Pablo et al. [17], and MS ML by Lee and Durant [5]. We hope that this issue serves to inspire others to engage with Data Science in their workplaces.
Respectfully, Dustin Bunch and Dan Holmes
  12 in total

1.  Data parsing in mass spectrometry imaging using R Studio and Cardinal: A tutorial.

Authors:  Cameron J Shedlock; Katherine A Stumpo
Journal:  J Mass Spectrom Adv Clin Lab       Date:  2021-12-20

2.  Continuous reference intervals for pediatric testosterone, sex hormone binding globulin and free testosterone using quantile regression.

Authors:  Daniel T Holmes; J Grace van der Gugten; Benjamin Jung; Christopher R McCudden
Journal:  J Mass Spectrom Adv Clin Lab       Date:  2021-11-01

3.  Reproducible manuscript preparation with RMarkdown application to JMSACL and other Elsevier Journals.

Authors:  Daniel T Holmes; Mahdi Mobini; Christopher R McCudden
Journal:  J Mass Spectrom Adv Clin Lab       Date:  2021-09-27

4.  Listening to your mass spectrometer: An open-source toolkit to visualize mass spectrometer data.

Authors:  Abed Pablo; Andrew N Hoofnagle; Patrick C Mathias
Journal:  J Mass Spectrom Adv Clin Lab       Date:  2021-12-13

5.  Create laboratory business intelligence dashboards for free using R: A tutorial using the flexdashboard package.

Authors:  Shannon Haymond
Journal:  J Mass Spectrom Adv Clin Lab       Date:  2021-12-13

6.  Supervised machine learning in the mass spectrometry laboratory: A tutorial.

Authors:  Edward S Lee; Thomas J S Durant
Journal:  J Mass Spectrom Adv Clin Lab       Date:  2021-12-13

7.  Indirect reference intervals using an R pipeline.

Authors:  Dustin R Bunch
Journal:  J Mass Spectrom Adv Clin Lab       Date:  2022-02-23

8.  An overview of image registration for aligning mass spectrometry imaging with clinically relevant imaging modalities.

Authors:  Benjamin Balluff; Ron M A Heeren; Alan M Race
Journal:  J Mass Spectrom Adv Clin Lab       Date:  2021-12-18

9.  Automated data analytics workflow for stability experiments based on regression analysis.

Authors:  Andrea Geistanger; Kathrin Braese; Ruediger Laubender
Journal:  J Mass Spectrom Adv Clin Lab       Date:  2022-02-08