Ilya Y Zhbannikov, Konstantin G Arbeev, Anatoliy I Yashin.
Abstract
Simulation is important for evaluating novel methods when input data are not easily obtainable or specific assumptions are needed. We present cophesim, a software tool that adds phenotypes to genotype data generated by a genetic simulator. The output of cophesim can be used directly as input for various genome-wide association study (GWAS) tools. cophesim is available from https://bitbucket.org/izhbannikov/cophesim.
Keywords: GWAS; Phenotype simulation
Year: 2017 PMID: 28979766 PMCID: PMC5605948 DOI: 10.12688/f1000research.11968.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. The cophesim workflow has three stages: (1) the input stage, where the input data (provided in one of three formats: Plink, MS or GENOME; see the user manual, Supplementary File 1) and the other input parameters (such as causal variants with their effect sizes, the output format, etc.) are prepared for phenotype simulation; (2) the phenotype simulation stage, where different types of phenotypic traits are simulated: dichotomous, continuous and time-to-event ('survival'); (3) the output stage, where the simulated phenotype data are written in various formats so that they are directly usable by six GWAS tools: EMMAX, BLOSSOC, Plink, QTDT, TASSEL and GenABEL. Summary statistics are also generated at the output stage.
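The phenotype simulation stage can be illustrated with a minimal sketch. It is not cophesim's actual code or API: the function name `simulate_phenotypes`, the input layout, and the specific models (a logistic model for the dichotomous trait, a linear model with standard normal noise for the continuous trait, and an exponential proportional-hazards model for the survival trait) are assumptions chosen for illustration only.

```python
import math
import random


def simulate_phenotypes(genotypes, effects, seed=0, baseline_hazard=0.01):
    """Simulate three trait types per individual (illustrative sketch only).

    genotypes: list of per-individual lists of allele counts (0, 1 or 2).
    effects:   dict mapping causal SNP index -> effect size (hypothetical format).
    Returns a list of (dichotomous, continuous, survival_time) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for g in genotypes:
        # Linear predictor: effect size times allele count, summed over causal SNPs.
        eta = sum(beta * g[j] for j, beta in effects.items())
        # Continuous trait: linear predictor plus standard normal noise (assumed).
        continuous = eta + rng.gauss(0.0, 1.0)
        # Dichotomous trait: Bernoulli draw with a logistic probability (assumed).
        p_case = 1.0 / (1.0 + math.exp(-eta))
        dichotomous = 1 if rng.random() < p_case else 0
        # Survival trait: exponential time with hazard h0 * exp(eta) (assumed).
        survival_time = rng.expovariate(baseline_hazard * math.exp(eta))
        rows.append((dichotomous, continuous, survival_time))
    return rows


# Toy input: 100 individuals, 5 SNPs, two of them causal.
_rng = random.Random(1)
genotypes = [[_rng.choice((0, 1, 2)) for _ in range(5)] for _ in range(100)]
effects = {0: 0.5, 3: -0.3}
phenotypes = simulate_phenotypes(genotypes, effects)
```

All three traits share one linear predictor here, so the same causal variants drive each trait type; a real simulator may use different models and parameters for each.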
Output file formats supported by the phenotype simulator cophesim.
The options shown below control the output format. Each output format has its own file suffixes, which identify the file type. These output formats are consistent with those used in phenosim.
| Application | Option | Commentary |
|---|---|---|
| EMMAX | -emmax | Suffixes .emma_geno, .emma_pheno |
| BLOSSOC | -blossoc | Suffixes .blossoc_pos, .blossoc_geno |
| PLINK | -plink | Used by default across all phenotypes, |
| QTDT | -qtdt | Suffixes .ped, .map, .dat |
| TASSEL | -tassel | Suffixes .poly, .trait |
| GenABEL | - | This format is used in simulation of |
Figure 2. ROC curves constructed from the results of association tests performed on a simulated dataset of N = 10,000 individuals with 1,000 SNP sites in total, 100 of them causal.
TPR: true positive rate; FPR: false positive rate. These results were calculated for dichotomous, continuous and survival traits. The dashed 45-degree line represents random guessing.
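As a rough illustration of how such ROC curves can be built, the sketch below sweeps a significance threshold over per-SNP p-values and records the FPR and TPR at each step. The function names `roc_points` and `auc`, and the toy p-values, are hypothetical and are not taken from the paper; ties between p-values are not treated specially.

```python
def roc_points(p_values, is_causal):
    """Return (FPR, TPR) points obtained by sweeping a p-value threshold."""
    n_pos = sum(is_causal)              # number of truly causal SNPs
    n_neg = len(is_causal) - n_pos      # number of non-causal SNPs
    # Rank SNPs from most to least significant.
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i in order:
        if is_causal[i]:
            tp += 1                     # a causal SNP is declared significant
        else:
            fp += 1                     # a non-causal SNP is declared significant
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: six SNPs, two of them causal (made-up p-values).
p_values = [0.001, 0.20, 0.03, 0.60, 0.04, 0.90]
is_causal = [True, False, True, False, False, False]
points = roc_points(p_values, is_causal)
area = auc(points)  # both causal SNPs rank first here, so the area is 1.0
```

A curve that follows the 45-degree diagonal has an area of 0.5, which corresponds to random guessing; curves further above the diagonal indicate that causal SNPs tend to receive smaller p-values than non-causal ones.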