
Leveraging well-annotated databases for deep learning in biomedical research.

Nam Nhut Phan, Amrita Chattopadhyay, Tzu-Pin Lu, Mong-Hsun Tsai.


Year:  2020        PMID: 35117369      PMCID: PMC8798879          DOI: 10.21037/tcr-20-3163

Source DB:  PubMed          Journal:  Transl Cancer Res        ISSN: 2218-676X            Impact factor:   1.241

Buzzwords indicate popular trends in research fields. Some last for decades; others fade within a few years (1). Over the last ten years, we have witnessed the rise of one major buzzword: deep learning (DL) (2-7). In brief, DL is a subdomain of artificial intelligence (AI) and a type of representation learning: it automatically discovers features in data and transforms them, through successive matrix operations, into increasingly abstract representations (3). There are many types of DL algorithms, including convolutional neural networks (8), recurrent neural networks (9), long short-term memory networks (10), convolutional deep belief networks (11), generative adversarial networks (12), and deep residual networks (13). Depending on the specific task, these networks can be used individually or combined into a pipeline. The biggest advantage of DL algorithms is that they can be trained without pre-defined features or variables. This is especially convenient for complex data types, such as biomedical images or sequencing data, for which manual feature selection is time-consuming, computationally expensive, and demands a high level of human expertise (3). Processing and training on such data within a reasonable amount of time also requires high-end hardware, such as graphics processing units (GPUs), powerful central processing units (CPUs), and large amounts of random-access memory (RAM). A growing body of research applies neural networks to diverse problems in the biomedical field, commonly leveraging big data. These data include biomedical images and multi-omics datasets, drawn either from the public domain or from in-house collections, from different populations (6,14-16). Biomedical images can be 2-dimensional (2D), such as pathological images and conventional mammograms, or 3-dimensional (3D), such as computed tomography scans and magnetic resonance images (17-22). 
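To make the idea of transforming data through matrix operations concrete, the sketch below passes one input vector through two fully connected layers, each of which multiplies by a weight matrix, adds a bias, and applies a non-linearity. It is a minimal, untrained illustration in plain Python under stated assumptions: the layer sizes (8 to 4 to 2 features), the random initialization, and the input values are all hypothetical. In practice, frameworks such as TensorFlow or PyTorch would also learn the weights from data by backpropagation, which is not shown here.

```python
import math
import random

def matvec(weights, vector):
    """Matrix-vector product: weights is a list of rows, vector a list of floats."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def relu(vector):
    """Element-wise non-linearity; without it, stacked layers collapse into one affine map."""
    return [max(0.0, v) for v in vector]

def init_layer(n_out, n_in, rng):
    """Random weights (scaled by 1/sqrt(n_in)) and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(layers, x):
    """Apply each layer as ReLU(W x + b) and keep every intermediate representation."""
    representations = [x]
    for weights, bias in layers:
        x = relu([z + b for z, b in zip(matvec(weights, x), bias)])
        representations.append(x)
    return representations

rng = random.Random(0)
# Hypothetical network: 8 raw input features -> 4 hidden features -> 2 abstract features.
layers = [init_layer(4, 8, rng), init_layer(2, 4, rng)]
sample = [0.5, 1.2, -0.3, 0.0, 2.1, -1.0, 0.7, 0.4]  # made-up input values
reps = forward(layers, sample)
print([len(r) for r in reps])  # [8, 4, 2]
```

Each successive representation is shorter and more abstract than the raw input; training would adjust the weights so that these compressed features become useful for a given task.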
A single scanned image can be split into hundreds to several thousand smaller images, which readily meets the data demands of neural network training. Multi-omics data formats are even more complex and depend heavily on the platform that generated them. Omics data, such as genomics (sequencing data) (23), transcriptomics (sequencing and expression data) (24,25), proteomics (mass spectrometry data) (26), and metabolomics (metabolite compound data) (27), can be used to build DL models as long as the numbers of samples and features are suitable for training a model that reaches acceptable accuracy. In a single run, these high-throughput platforms can generate thousands to millions of data points per sample. Integrating these data could provide unprecedentedly comprehensive information for studying complex diseases such as cancer (28) or human brain diseases (29). This is therefore a golden era for data-driven research, not only because of the huge number of publicly available datasets, but also because of the rapid development of modern algorithms and of the platforms and cloud computing services offered by large technology companies such as Google (TensorFlow and Colab cloud computing) (30-32), Amazon (Amazon Web Services) (33), and Facebook (PyTorch) (34). With such favorable conditions and the open-source environments of the DL community, it is inevitable that biomedical researchers will join the DL race. 
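Splitting a scanned image into smaller images, as described above, can be sketched as follows. This assumes square tiles of 256 × 256 pixels (non-overlapping by default, optionally overlapping via a smaller stride) and an image held in memory as a list of pixel rows; the tile size and the example scan dimensions are hypothetical. Reading an actual whole-slide image file would normally require a dedicated library such as OpenSlide, which is not used here.

```python
from typing import List, Optional, Tuple

def tile_origins(width: int, height: int, tile: int = 256,
                 stride: Optional[int] = None) -> List[Tuple[int, int]]:
    """Top-left (x, y) corners of every full tile that fits inside the image.

    stride == tile gives non-overlapping tiles; a smaller stride gives overlap.
    Partial tiles at the right and bottom edges are dropped.
    """
    if stride is None:
        stride = tile
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    return [(x, y)
            for y in range(0, height - tile + 1, stride)
            for x in range(0, width - tile + 1, stride)]

def crop(image, x, y, tile):
    """Cut one tile out of an image stored as a list of pixel rows."""
    return [row[x:x + tile] for row in image[y:y + tile]]

# Hypothetical 10,000 x 8,000 pixel scan cut into 256 x 256 tiles.
print(len(tile_origins(10_000, 8_000)))              # 1209 (39 columns x 31 rows)
print(len(tile_origins(10_000, 8_000, stride=128)))  # 4697 (50% overlap)
```

Dropping partial edge tiles keeps every training input the same size; overlapping tiles roughly quadruple the number of training images at the cost of correlated samples.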
In addition, several databases house large numbers of biomedical images, such as the National Cancer Institute’s GDC Data Portal (https://portal.gdc.cancer.gov), the National Institutes of Health chest X-ray database (https://nihcc.app.box.com/v/ChestXray-NIHCC), the Cancer Imaging Archive (https://www.cancerimagingarchive.net), NLM’s MedPix database (https://medpix.nlm.nih.gov/home), the Open Access Series of Imaging Studies (OASIS) (http://www.oasis-brains.org), the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (http://adni.loni.usc.edu), and Stanford’s AI in Medicine database (AIMI) (https://aimi.stanford.edu/research/public-datasets); all of these could be of immense value to the DL community. These databases are maintained and continuously updated with additional samples and data types, and they play a central role in DL studies because they are well structured and cover diverse diseases. For instance, the GDC Data Portal provides whole exome sequencing data, targeted sequencing data, RNA-sequencing data, genotype data, tissue and diagnostic slides, whole genome data, and ATAC-seq data. Not all of these data are open access, but researchers can apply for access to the controlled portions. However, model training on such large datasets requires data labeling and annotation, which are time-consuming and sometimes expensive, so barriers to using all of the available data remain. Many DL publications describe well-annotated datasets, yet gaining access to these resources is usually difficult. In-house datasets pre-annotated by experts therefore remain in demand across the healthcare research community. Because public-domain data are often specific to certain ethnic groups or local populations, in-house datasets from other ethnicities could serve as external validation resources that help prevent models from inheriting dataset-specific bias. This would ultimately make pre-trained models more useful across populations. 
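As an illustration of programmatic access to the open portion of the GDC, the sketch below only builds a search URL for the files endpoint of the GDC REST API, restricted to open-access files of one data type in one project. The endpoint, the filter syntax, and the field names (cases.project.project_id, data_type, access) reflect the publicly documented GDC API as we understand it and should be checked against the current GDC API documentation; the project (TCGA-BRCA) and data type (Slide Image) are arbitrary examples. No request is sent, so the code runs offline.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed GDC API files endpoint; verify against the current GDC API documentation.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

def build_gdc_files_query(project_id: str, data_type: str,
                          open_only: bool = True, size: int = 10) -> str:
    """Build a GDC files search URL; the request itself is not sent here."""
    conditions = [
        {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
        {"op": "in", "content": {"field": "data_type", "value": [data_type]}},
    ]
    if open_only:
        # Keep only open-access files; controlled files require an approved access request.
        conditions.append({"op": "in", "content": {"field": "access", "value": ["open"]}})
    params = {
        "filters": json.dumps({"op": "and", "content": conditions}),
        "fields": "file_id,file_name,data_type,access",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"

url = build_gdc_files_query("TCGA-BRCA", "Slide Image")
# Decode the query string again to inspect the filter that would be sent.
filters = json.loads(parse_qs(urlsplit(url).query)["filters"][0])
print(json.dumps(filters, indent=2))
```

The resulting URL could be fetched with any HTTP client. Setting open_only=False removes the access filter from the search, but downloading controlled files would still require approved access.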
Clinical application is the ultimate goal of biomedical research. Therefore, the questions or hypotheses that researchers aim to address with DL, leveraging all ready-to-use data and resources, are of the utmost clinical importance. Well-chosen questions lead to properly designed models that represent complex real-life data and can potentially provide data-driven information for clinical research. All of this requires close collaboration between laboratory researchers and medical doctors to understand the current needs in each specific disease and to successfully translate findings from the laboratory bench to the clinic.
References (18 in total; first 10 listed)

1. Quackenbush J. Computational analysis of microarray data. Nat Rev Genet. 2001 Jun.
2. Schuster SC. Next-generation sequencing transforms today's biology. Nat Methods. 2007 Dec 19.
3. Belthangady C, Royer LA. Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction. Nat Methods. 2019 Jul 8.
4. Rampasek L, Goldenberg A. TensorFlow: Biology's Gateway to Deep Learning? Cell Syst. 2016 Jan 27.
5. Chaudhary K, Poirion OB, Lu L, Garmire LX. Deep Learning-Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer. Clin Cancer Res. 2017 Oct 5.
6. Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, Peng L, Webster DR. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018 Feb 19.
7. Zou J, Huss M, Abid A, Mohammadi P, Torkamani A, Telenti A. A primer on deep learning in genomics. Nat Genet. 2018 Nov 26.
8. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J. A guide to deep learning in healthcare. Nat Med. 2019 Jan 7.
9. Horst F, Lapuschkin S, Samek W, Müller KR, Schöllhorn WI. Explaining the unique nature of individual gait patterns with deep learning. Sci Rep. 2019 Feb 20.
10. Litjens G, Sánchez CI, Timofeeva N, Hermsen M, Nagtegaal I, Kovacs I, Hulsbergen-van de Kaa C, Bult P, van Ginneken B, van der Laak J. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci Rep. 2016 May 23.
