Antonio de Marvao1, Timothy J W Dawes1, Declan P O'Regan1.
Abstract
Cardiovascular conditions remain the leading cause of mortality and morbidity worldwide, with genotype being a significant influence on disease risk. Cardiac imaging-genetics aims to identify and characterize the genetic variants that influence functional, physiological, and anatomical phenotypes derived from cardiovascular imaging. High-throughput DNA sequencing and genotyping have greatly accelerated genetic discovery, making variant interpretation one of the key challenges in contemporary clinical genetics. Heterogeneous, low-fidelity phenotyping and difficulties integrating and then analyzing large-scale genetic, imaging and clinical datasets using traditional statistical approaches have impeded progress. Artificial intelligence (AI) methods, such as deep learning, are particularly suited to tackle the challenges of scalability and high dimensionality of data and show promise in the field of cardiac imaging-genetics. Here we review the current state of AI as applied to imaging-genetics research and discuss outstanding methodological challenges, as the field moves from pilot studies to mainstream applications, from one-dimensional global descriptors to high-resolution models of whole-organ shape and function, from univariate to multivariate analysis, and from candidate gene to genome-wide approaches. Finally, we consider the future directions and prospects of AI imaging-genetics for ultimately helping understand the genetic and environmental underpinnings of cardiovascular health and disease.
Keywords: artificial intelligence; cardiology; cardiovascular imaging; deep learning; genetics; genomics; imaging-genetics; machine learning
Year: 2020 PMID: 32039240 PMCID: PMC6985036 DOI: 10.3389/fcvm.2019.00195
Source DB: PubMed Journal: Front Cardiovasc Med ISSN: 2297-055X
Considerations in the use of machine learning in imaging-genetics research.
| Selection of AI approach based on clinical question and data characteristics | Supervised methods suited to classification and prediction tasks involving “labeled” data: e.g., image segmentation or survival prediction. |
| Algorithm selection | Are there “off-the-shelf” algorithms tailored to the same problem or validated in similar data? Transparency, interpretability, and performance are all important features. In high-stakes decision-making, avoid “black box” approaches where it is not possible to scrutinize the features that inform the classification or to explain the outputs. |
| Data pre-processing | Several steps are likely to be required in the preparation of data, including anonymization, quality control, normalization and standardization, and the handling of missing data points and outliers (e.g., imputation of missing values). Is the training data an accurate representation of the wider data/population (e.g., all expected variation present, same technical characteristics)? |
| Feature selection | A subset of relevant features (variables or predictors) is selected from high dimensional data allowing for a more succinct representation of the dataset. |
| Data allocation | Evaluate the available data and plan the proportions allocated to the training, validation, and test datasets. Alternative approaches include cross-validation, stratified cross-validation, leave-one-out, and bootstrapping. |
| Hardware considerations | Based on the volume of data and the methodological approach, are CPU clusters, GPUs, or cloud computing better suited? |
| Evaluation of model performance | For classification and prediction: receiver operating characteristic (ROC) curves with accuracy measured by the area under the ROC curve (AUC), C-statistic, sensitivity, specificity, positive and negative predictive values, precision, recall, F-measure, and the Hosmer–Lemeshow test for goodness of fit. Image segmentation accuracy (agreement between human-expert labels and automated labels) is reported as the Dice metric, mean contour distance, and Hausdorff distance. |
| Publication and transparency | Make code and anonymized sample of data publicly available (e.g., GitHub, Docker containers, R packages, or Code Ocean repositories). Encourage independent scrutiny of the algorithm. |
| Generalization and replication of results | Algorithms should be validated by independent researchers on external cohorts and satisfy the requirements of medical devices and software regulatory frameworks. |
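To make the data-allocation row in the table above concrete, below is a minimal pure-Python sketch of k-fold cross-validation: the samples are partitioned into k folds, and each fold serves once as the held-out validation set while the remaining folds form the training set. The function name `k_fold_indices` is illustrative and not taken from the article; in practice a validated library implementation would typically be used.

```python
from typing import List, Tuple

def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, validation_indices) splits.

    Each sample appears in exactly one validation fold, so every data
    point is used for both training and validation across the k splits.
    """
    indices = list(range(n_samples))
    # Distribute samples as evenly as possible across the k folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits

# Example: 10 samples split into 5 folds of 2 validation samples each.
splits = k_fold_indices(10, 5)
```

Stratified variants additionally preserve the class distribution within each fold, which matters when disease cases are rare relative to controls.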
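Several of the performance measures listed in the evaluation row above reduce to simple arithmetic on binary outcomes. As an illustration only (these helper functions are our own, not from the article), the following sketch computes sensitivity, specificity, predictive values, and F-measure from confusion-matrix counts, and the Dice similarity coefficient between two binary segmentation masks:

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts
    (true/false positives and negatives). Assumes no denominator is zero."""
    sensitivity = tp / (tp + fn)          # recall: fraction of cases detected
    specificity = tn / (tn + fp)          # fraction of controls correctly excluded
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f_measure": f_measure}

def dice(mask_a: list, mask_b: list) -> float:
    """Dice similarity coefficient between two flat binary masks (0/1):
    2|A ∩ B| / (|A| + |B|). 1.0 indicates perfect overlap."""
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 1.0 if total == 0 else 2 * intersection / total
```

Distance-based segmentation metrics such as mean contour distance and the Hausdorff distance complement Dice by penalizing boundary errors that volumetric overlap can hide.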
Figure 1. Artificial intelligence in big data imaging-genetics research.