Alberto Malovini, Riccardo Bellazzi, Carlo Napolitano, Guia Guffanti.
Abstract
Over the last decade, high-throughput genotyping and sequencing technologies have contributed to major advancements in genetics research, as these technologies now facilitate affordable mapping of the entire genome for large sets of individuals. Given this, genome-wide association studies are proving to be powerful tools in identifying genetic variants that have the capacity to modify the probability of developing a disease or trait of interest. However, when the study's goal is to evaluate the effect of genetic variants mapping to specific chromosomal regions on a specific phenotype, the candidate loci approach is still preferred. Regardless of which approach is taken, such large data sets call for the establishment and development of appropriate analytical methods in order to translate this knowledge into biological or clinical findings. Standard univariate tests often fail to identify informative genetic variants, especially when dealing with complex traits, which are more likely to result from a combination of rare and common variants and non-genetic determinants. These limitations can partially be overcome by multivariate methods, which allow for the identification of informative combinations of genetic variants and non-genetic features. Furthermore, such methods can help to generate additive genetic scores and risk stratification algorithms that, once extensively validated in independent cohorts, could serve as useful tools to assist clinicians in decision-making. This review aims to provide readers with an overview of the main multivariate methods for genetic data analysis that could be applied to the analysis of cardiovascular traits.
Keywords: SNPs; cardiovascular diseases; multivariate methods; risk scores; risk stratification
Year: 2016 PMID: 27376073 PMCID: PMC4896915 DOI: 10.3389/fcvm.2016.00017
Source DB: PubMed Journal: Front Cardiovasc Med ISSN: 2297-055X
Summary of the main multivariate methods for common variants analysis.
| Phenotype | Method | Main software packages | Analysis of entire GWAS datasets | Advantages | Disadvantages |
|---|---|---|---|---|---|
| Binary | Stepwise logistic regression | Orange | Limited to candidate variants | Results can be easily interpreted | Results could be negatively influenced by collinearity; computationally intensive; R implementations |
| Binary | LASSO | Orange | Yes (HyperLASSO), otherwise the analysis is limited to candidate variants | Fast computation; internal CV to learn the optimal λ parameter | Does not necessarily yield good results in presence of high collinearity and when the number of variants exceeds the number of examples; R |
| Binary | Elastic net | elasticnet | Limited to candidate variants | Combines strengths of LASSO and Ridge regression | Requires advanced computer skills |
| Binary | BOSS | BOSS | Limited to candidate variants | Also works properly when the number of features exceeds the number of samples | Computationally intensive; requires advanced computer skills |
| Binary | BoNB | BoNB | Yes | Fast computation; robust to LD between variants | Requires advanced computer skills |
| Binary | Classification trees | Orange | Limited to candidate variants | Fast computation; easy to interpret | May not perform well in the presence of complex interactions; overfitting may lead to instability; R |
| Binary | Random forest | Orange | Yes (RFF), otherwise the analysis is limited to candidate variants | Robust to noise; fast computation | Results are difficult to interpret; R |
| Binary | ABACUS | ABACUS | Candidate regions mapping to specific pathways | Able to simultaneously consider common and rare variants and different directions of genotype effect | Requires advanced computer skills |
| Survival | Stepwise Cox proportional hazard model | survival | Limited to candidate variants | Results can be easily interpreted | Results could be negatively influenced by collinearity; computationally intensive; requires advanced computer skills |
| Survival | LASSO | glmnet | Limited to candidate variants | Fast computation; internal CV to learn the optimal λ parameter | Does not necessarily yield good results in presence of high collinearity and when the number of variants exceeds the number of examples; requires advanced computer skills |
| Survival | Elastic net | coxnet | Limited to candidate variants | Combines strengths of LASSO and Ridge regression | Requires advanced computer skills |
| Survival | Classification (survival) trees | rpart | Limited to candidate variants | Fast computation; easy to interpret | May not perform well in the presence of complex interactions; overfitting may lead to instability; requires advanced computer skills |
| Survival | Random forest | randomForestSRC | Limited to candidate variants | Robust to noise; fast computation | Results are difficult to interpret; requires advanced computer skills |
| Quantitative | Stepwise linear regression | stats | Limited to candidate variants | Results can be easily interpreted | Results could be negatively influenced by collinearity; computationally intensive; requires advanced computer skills |
| Quantitative | LASSO | Orange | Yes (HyperLASSO), otherwise the analysis is limited to candidate variants | Fast computation; internal CV to learn the optimal λ parameter | Does not necessarily yield good results in presence of high collinearity and when the number of variants exceeds the number of examples; R |
| Quantitative | Elastic net | elasticnet | Limited to candidate variants | Combines strengths of LASSO and Ridge regression | Requires advanced computer skills |
| Quantitative | GUESS | GUESS/R2GUESS | Yes | Fast parallel computation | Requires advanced computer skills |
| Quantitative | Regression trees | Orange | Limited to candidate variants | Fast computation; easy to interpret | May not perform well in the presence of complex interactions; overfitting may lead to instability; R |
| Quantitative | Random forest | Orange | Yes (RFF), otherwise the analysis is limited to candidate variants | Robust to noise; fast computation | Results are difficult to interpret; R |
Phenotype, distribution of the dependent variable; Method, algorithm or method; Main software packages, main software, packages, or functions implementing the described method; Analysis of entire GWAS datasets, whether the method can be applied to whole GWAS data; Advantages, advantages of the method; Disadvantages, disadvantages of the method.
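To make the "internal CV to learn the optimal λ parameter" entry for LASSO more concrete, the following is a minimal sketch for a quantitative phenotype, not the implementation used by any of the packages listed in the table. It fits the LASSO by cyclic coordinate descent and picks λ from a small grid by k-fold cross-validation. The genotype matrix, effect sizes, noise level, and λ grid are all synthetic assumptions chosen for illustration, and the function names are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=50):
    """Minimise (1/2n)*sum(residual^2) + lam*sum(|beta_j|) by coordinate descent.

    Columns are centred internally, so the intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - x_means[j] for row in X] for j in range(p)]
    norms = [sum(x * x for x in col) / n for col in cols]
    resid = [yi - y_mean for yi in y]  # residuals of the current fit
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            if norms[j] == 0.0:
                continue  # monomorphic variant: no information
            # Correlation of variant j with the partial residual (variant j removed)
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / norms[j]
            delta = new_bj - beta[j]
            resid = [r - x * delta for x, r in zip(col, resid)]
            beta[j] = new_bj
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta

def mse(X, y, intercept, beta):
    """Mean squared prediction error."""
    preds = (intercept + sum(x * b for x, b in zip(row, beta)) for row in X)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def cv_select_lambda(X, y, lambdas, k=5):
    """Return the lambda from the grid with the lowest mean k-fold CV error."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for held in folds:
            held_set = set(held)
            train = [i for i in range(n) if i not in held_set]
            b0, b = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            err += mse([X[i] for i in held], [y[i] for i in held], b0, b) / k
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Synthetic data (assumption): 80 individuals, 5 variants coded as 0/1/2 minor-allele
# counts; only the first two variants affect the simulated quantitative trait.
random.seed(0)
n_samples, n_variants = 80, 5
X = [[float(random.choice([0, 1, 2])) for _ in range(n_variants)] for _ in range(n_samples)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0.0, 0.5) for row in X]

lambdas = [0.01, 0.1, 0.5, 1.0]  # illustrative grid, not a recommended default
best_lam = cv_select_lambda(X, y, lambdas)
intercept, beta = lasso_fit(X, y, best_lam)
print("selected lambda:", best_lam)
print("coefficients:", [round(b, 3) for b in beta])
```

In practice, packages such as glmnet handle the λ path, standardisation, and binary or survival outcomes; the sketch only shows why a larger λ shrinks more coefficients to exactly zero and how cross-validation trades that sparsity against prediction error.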
Figure 1. rs10494366 common variant on NOS1AP. The schema reports the combined hazard ratios (HRs) from Cox regression by risk category. The risk stratification schema includes the common variant rs10494366 on the NOS1AP gene and known risk predictors in LQTS: QTc ≥ 500 ms, gender, and LQTS subgroup. Each box shows the combined HR for patients sharing clinical and genetic characteristics. The reference category (HR = 1) is represented by individuals who are LQT1, male, with QTc < 500 ms, and homozygous for the common allele of NOS1AP rs10494366. Reprinted from the manuscript by Tomás and colleagues (48) with permission from Elsevier.
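As a rough illustration of how a combined HR per risk category can be derived from a Cox model, the sketch below multiplies per-factor hazard ratios, which is equivalent to exponentiating the sum of the corresponding log-hazard coefficients. It assumes a model without interaction terms. The factor names, their coding, and all numeric values are hypothetical placeholders and are not taken from the cited study.

```python
import math

# Hypothetical per-factor hazard ratios versus the reference category
# (placeholders for illustration only, NOT estimates from the cited study).
HAZARD_RATIOS = {
    "qtc_ge_500ms": 2.0,
    "female": 1.5,
    "non_lqt1_subgroup": 1.8,
    "nos1ap_minor_allele_carrier": 1.4,
}

def combined_hazard_ratio(profile, hazard_ratios=HAZARD_RATIOS):
    """Combined HR = exp(sum(beta_i * x_i)), where beta_i = log(HR_i) and x_i is 0 or 1.

    Assumes a Cox model with no interaction terms between the factors.
    """
    log_hr = sum(
        math.log(hazard_ratios[name]) * int(present)
        for name, present in profile.items()
    )
    return math.exp(log_hr)

# Reference category: every risk factor absent, so the combined HR is 1.
reference = {name: False for name in HAZARD_RATIOS}

# Example category: QTc >= 500 ms and carrier of the minor allele.
example = dict(reference, qtc_ge_500ms=True, nos1ap_minor_allele_carrier=True)

print("reference HR:", combined_hazard_ratio(reference))
print("example HR:", round(combined_hazard_ratio(example), 2))
```

If the fitted model includes interaction terms, the product of single-factor HRs no longer gives the category HR, and the interaction coefficients must be added to the linear predictor before exponentiating.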