Michele Pinelli, Giovanni Scala, Roberto Amato, Sergio Cocozza, Gennaro Miele.
Abstract
BACKGROUND: The analysis of complex diseases is an important problem in human genetics. Because multifactoriality is expected to play a pivotal role, many studies are currently focused on collecting information on the genetic and environmental factors that potentially influence these diseases. However, there is still a lack of efficient and thoroughly tested statistical models that can be used to identify implicated features and their interactions. Simulations using large, biologically realistic data sets with known gene-gene and gene-environment interactions that influence the risk of a complex disease are a convenient and useful way to assess the performance of statistical methods.
Year: 2012 PMID: 22698142 PMCID: PMC3538511 DOI: 10.1186/1471-2105-13-132
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. GENS2 workflow. Chart of the steps used to simulate a complex disease in a population using the simuPOP and GENS2 systems.
The epidemiologic parameters that were used for the sample simulation
| 1) Starting data (HapMap) | Chromosomes, chromosome regions, or markers and marker distance | The genomic regions containing the loci that will be simulated |
| | Population (ethnicity) | The starting frequency and linkage data to be used in the simulation |
| 2) Simulation of sample’s genetic data | DPLs (Disease Predisposing Loci) | Loci that will influence the disease risk. |
| | Target allelic frequency | Final allelic frequencies at the end of simuPOP simulation |
| | Final sample size | Number of individuals in the population generated by simuPOP |
| Starting sample | simuPOP generated sample | Sample data generated with simuPOP |
| | Disease prevalence | The expected disease prevalence in the whole sample |
| Environment | Environmental factor distribution | Distribution of the environmental exposure in the whole sample |
| | Environmental factor OR | Odds ratio associated with a one-unit increase of the environmental exposure |
| | Noisy environmental variables | Any number of confounding environmental exposures not associated with the disease risk (Gaussian, binomial, or uniformly distributed) |
| Genetics | DPLs | These are the same DPLs as selected in the simuPOP simulation |
| | High risk alleles | The allele, for each DPL, associated with the highest disease risk |
| | DPLs genotypic RR | The relative risk of the high risk homozygote versus low risk homozygote, for each DPL |
| | Dominance | How the risk associated with the heterozygote relates to the risks associated with the two homozygotes (recessive, dominant, codominant) |
| | Epistasis model (G×G) | Percent increase of the risk associated with each combined genotype |
| Gene Environment interaction | G×E model | One of the four predefined interaction models |
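The "DPLs genotypic RR" and "Dominance" parameters together determine a relative risk for each of the three genotypes at a DPL. The following is a minimal sketch of that mapping, not the GENS2 implementation. The function name is hypothetical, "recessive" and "dominant" are taken to describe the high-risk allele, and the codominant heterozygote is assumed to sit at the geometric midpoint of the two homozygotes; the source does not specify the intermediate value.

```python
import math


def genotype_relative_risks(rr_high_homozygote, dominance):
    """Return relative risks for (low/low, heterozygote, high/high).

    All values are relative to the low-risk homozygote, which has RR = 1.
    Hypothetical helper; dominance is read as a property of the high-risk allele.
    """
    if dominance == "recessive":
        # One copy of the high-risk allele carries no extra risk.
        het = 1.0
    elif dominance == "dominant":
        # One copy is enough to reach the full risk.
        het = rr_high_homozygote
    elif dominance == "codominant":
        # Assumption: geometric midpoint between the two homozygotes.
        het = math.sqrt(rr_high_homozygote)
    else:
        raise ValueError(f"unknown dominance model: {dominance!r}")
    return (1.0, het, rr_high_homozygote)


risks = genotype_relative_risks(4.0, "codominant")
```

An arithmetic midpoint, `(1 + RR) / 2`, would be an equally plausible reading of "codominant"; the choice changes only the heterozygote value.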
Predefined gene-environment interaction models in GENS2
| Genetic Model (GEN) | Disease risk depends only on the genetics of an individual |
| Environmental Model (ENV) | Disease risk depends only on environmental exposure of an individual |
| Gene Environment interaction Model (GEM) | Genetics modifies the effect of the environment in modulating the disease risk |
| Additive Model (ADD) | The effects of environment and genetics are independent and sum in modulating the disease risk |
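One way to read the four models is as different choices of linear predictor in a logistic risk function. The sketch below is an illustration under that assumption only: the logistic form, the function and parameter names, and the genotype-specific scaling factor used for GEM are all hypothetical and are not taken from GENS2.

```python
import math


def logistic(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def disease_risk(model, baseline_logit, genetic_logit, env_exposure,
                 env_log_or, gene_env_scale=1.0):
    """Disease probability under one of the four predefined G×E models.

    Hypothetical parameterization:
      genetic_logit  - log-odds shift contributed by the genotype
      env_log_or     - log odds ratio per one-unit increase of the exposure
      gene_env_scale - genotype-specific multiplier of the exposure effect (GEM)
    """
    env_term = env_log_or * env_exposure
    if model == "GEN":      # genetics only
        z = baseline_logit + genetic_logit
    elif model == "ENV":    # environment only
        z = baseline_logit + env_term
    elif model == "ADD":    # independent effects that sum (here: on the logit scale)
        z = baseline_logit + genetic_logit + env_term
    elif model == "GEM":    # genotype rescales the environmental effect
        z = baseline_logit + gene_env_scale * env_term
    else:
        raise ValueError(f"unknown model: {model!r}")
    return logistic(z)


# Example: an exposure with OR = 1.2 per unit, as in the Figure 3 scenario.
log_or = math.log(1.2)
p_unexposed = disease_risk("ENV", -2.0, 0.0, 0.0, log_or)
p_exposed = disease_risk("ENV", -2.0, 0.0, 1.0, log_or)
```

Under this parameterization the odds of disease for `p_exposed` are 1.2 times those for `p_unexposed`, which matches the definition of the environmental factor OR in the parameter table.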
Figure 2. Example of application of epistasis. Disease penetrance for combined genotypes before (left panel) and after (right panel) the application of an epistasis model that increases the risk associated with the (CC-TT) combined genotype by 20%. The x- and y-axes show the genotypes of the two DPLs; the z-axis shows the risk associated with each combined genotype.
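The operation shown in Figure 2 can be sketched as a lookup table of penetrances, one per combined genotype, with a percent increase applied to selected cells. This is a minimal illustration, not GENS2 code: the flat baseline penetrance of 0.10, the alternative alleles (A and G), and the function name are hypothetical, and the increase is assumed to be relative (penetrance × 1.20) rather than an absolute addition.

```python
# Genotypes at the two DPLs; only CC and TT are named in the source,
# the other alleles (A, G) are placeholders.
GENOTYPES_DPL1 = ["AA", "AC", "CC"]
GENOTYPES_DPL2 = ["GG", "GT", "TT"]

# Hypothetical starting point: the same penetrance for every combined genotype.
penetrance_before = {
    (g1, g2): 0.10 for g1 in GENOTYPES_DPL1 for g2 in GENOTYPES_DPL2
}


def apply_epistasis(penetrance, percent_increase):
    """Return a new table with the given percent increases applied.

    percent_increase maps a combined genotype to its percent increase in risk.
    Assumption: the increase is relative, and penetrance is capped at 1.
    """
    updated = dict(penetrance)
    for combo, pct in percent_increase.items():
        updated[combo] = min(1.0, penetrance[combo] * (1.0 + pct / 100.0))
    return updated


penetrance_after = apply_epistasis(penetrance_before, {("CC", "TT"): 20.0})
```

Only the (CC, TT) cell changes; every other combined genotype keeps its original penetrance, which is the pattern the two panels of the figure contrast.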
Figure 3. Association test for the case of epistatic interaction. The population comprised 5,000 cases and 5,000 controls. Two DPLs with no marginal risk (RR=1), an epistatic interaction (+20% penetrance for the (3,3) combined genotype) and an additive G×E model (odds ratio (OR)=1.2) were used. The two DPLs are in two distinct genomic regions (Chr 8: 117,948,182 - 119,256,695 in yellow; Chr 10: 114,408,939 - 115,256,799 in cyan). In the upper panel, the Manhattan plot shows the significance of the association (−log10(p-value)) of each marker when tested individually (each dot represents a different marker). The red dashed line represents the significance threshold (0.05 after Bonferroni correction) and the green dashed lines mark the positions of the DPLs. In the middle panel, the r² of each marker with the DPL in the same region is shown. In the bottom panel, the significance of the association for each two-locus interaction (grey scale, nonsignificant; red scale, significant at the 0.05 level after Bonferroni correction) is shown.
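The Bonferroni thresholds mentioned in the caption follow from dividing the significance level by the number of tests. The sketch below shows the arithmetic for single-marker and pairwise tests; the marker count of 1,000 is a hypothetical placeholder, since the number of markers in the simulated regions is not stated here.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_markers = 1000  # hypothetical; not given in the source

# Single-marker tests: one test per marker.
single_threshold = bonferroni_threshold(0.05, n_markers)
# Height of the threshold line on a -log10(p) Manhattan plot.
manhattan_line = -math.log10(single_threshold)

# Two-locus interaction tests: one test per unordered pair of markers.
n_pairs = n_markers * (n_markers - 1) // 2
pair_threshold = bonferroni_threshold(0.05, n_pairs)
```

Because the number of pairs grows quadratically with the number of markers, the pairwise threshold is far stricter than the single-marker one.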