Literature DB >> 22373155

Genetic Analysis Workshop 17 mini-exome simulation.

Laura Almasy1, Thomas D Dyer, Juan Manuel Peralta, Jack W Kent, Jac C Charlesworth, Joanne E Curran, John Blangero.   

Abstract

The data set simulated for Genetic Analysis Workshop 17 was designed to mimic a subset of data that might be produced in a full exome screen for a complex disorder and related risk factors in order to permit workshop participants to investigate issues of study design and statistical genetic analysis. Real sequence data from the 1000 Genomes Project formed the basis for simulating a common disease trait with a prevalence of 30% and three related quantitative risk factors in a sample of 697 unrelated individuals and a second sample of 697 individuals in large, extended pedigrees. Called genotypes for 24,487 autosomal markers assigned to 3,205 genes and simulated affection status, quantitative traits, age, sex, pedigree relationships, and cigarette smoking were provided to workshop participants. The simulating model included both common and rare variants with minor allele frequencies ranging from 0.07% to 25.8% and a wide range of effect sizes for these variants. Genotype-smoking interaction effects were included for variants in one gene. Functional variants were concentrated in genes selected from specific biological pathways and were selected on the basis of the predicted deleteriousness of the coding change. For each sample, unrelated individuals and family, 200 replicates of the phenotypes were simulated.

Entities:  

Year:  2011        PMID: 22373155      PMCID: PMC3287854          DOI: 10.1186/1753-6561-5-S9-S2

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

The state of the science for localization and identification of genes that influence common complex diseases has changed rapidly over the past 20 years. As laboratory costs continue to fall with the development of more efficient high-throughput techniques, the field is quickly proceeding toward studies that make use of genome-wide sequence data. There is as yet no consensus on optimal, or even appropriate, statistical genetic approaches for analyzing exome sequence data, and few investigators have had experience analyzing such data sets. This was the motivation for the Genetic Analysis Workshop 17 (GAW17) “mini-exome” data set. The GAW17 data set is a hybrid of simulated and real data. Real exome sequence data from the 1000 Genomes Project were used as the basis for simulating a common complex disease and related quantitative risk factors. Two different study designs were simulated, unrelated individuals and large families, each with the same sample size.

1000 Genomes Project

The 1000 Genomes Project (http://www.1000genomes.org) is designed to survey genetic variation at the sequence level across multiple human population groups. It includes individuals of European, East Asian, South Asian, West African, and American Indian ancestry. Three pilot projects for the 1000 Genomes Project were completed in 2010: low-fold genome-wide sequencing of 179 individuals, higher fold sequencing of two parent-child trios, and exonic sequencing in 697 individuals [1]. Publicly available exon sequence data from the 1000 Genomes Project were used to provide a realistic pattern of number and frequency of single-nucleotide polymorphisms (SNPs), including cross-population variation and linkage disequilibrium between sites, for the GAW17 simulations.

Methods

Genotype calling

SNP genotypes were obtained from the sequence alignment files provided by the 1000 Genomes Project for their pilot3 study. When the GAW17 data set was generated, the 1000 Genomes Project had not yet posted processed calls of these genotypes for each individual. Thus the UnifiedGenotyper method from the Genome Analysis Toolkit (GATK) package [2] was used for the detection of SNPs and for the calling of SNP genotypes. A male human genome based on National Center for Biotechnology Information reference sequence 36 (RefSeq36) human genome release (human_b36_male.fasta.gz) was used as the reference genome sequence for both male and female alignments. The UnifiedGenotyper method was run twice on the alignment files. The first time it was allowed to scan freely through the alignments to search for variation against the reference sequence to be considered as SNP candidates. Genotypes that were not homozygous for the reference base allele were called for the candidate SNPs detected. Because of time and technical constraints, GAW17 SNPs were chosen to be the subset of candidate SNP genotypes that were called from an alignment of 10 or more sequencing reads. During the second run, genotypes, including those homozygous for the reference base, were called only for the subset of SNPs selected in the first run. This procedure had the advantages of being fast, correctly calling most of the true common SNP variants, generating a large volume of rare SNP variants, and producing a genotype matrix with few missing genotypes to simplify downstream preparation of the simulated data set. However, it was not meant to detect the true natural variation present in the 1000 Genomes Project. Thus there were more rare SNPs in the GAW17 data set than those described in the 1000 Genomes Project Consortium analyses of their own pilot data sets [1]. The enrichment of rare variants in the GAW17 data set was caused in part by artifacts introduced by, for example, lack of filtering. The 1000 Genomes Project genotypes were not phased, and some genotypes were missing as a result of incomplete sequence coverage in some individuals. We used the program fastPHASE [http://depts.washington.edu/uwc4c/express-licenses/assets/fastphase/] to infer missing genotypes and haplotypic phase. In the family data set (described later), we used the program CHRSIM [3] to drop the phased founder haplotypes throughout the rest of the pedigree. Recombination was taken into account, with a single obligate crossover event occurring on each chromosome. As noted, the GAW17 genotypes differ from the official 1000 Genomes Project called genotypes for the same individuals because of differing approaches to genotype calling, inclusion or exclusion of regions of low-fold coverage, and the inclusion of the imputed genotypes in the GAW17 data set. Imputed genotypes were not identified as such in the distributed data and were treated as equally “real” as called genotypes in the phenotype simulations. These choices were motivated by a focus on designing a data set that would be useful for developing methods related to gene localization, identification, and characterization, with the 1000 Genomes Project data primarily serving as a source of sequence data with realistic patterns of SNP distribution, allele frequency, population variation, and linkage disequilibrium. However, these decisions limited the utility of the GAW17 data for population genetic analyses or for examination of effects of genotype calling or data cleaning on gene finding.

Distributed genotype data

The called genotype data distributed for GAW17 included the inferred genotypes, such that all individuals had genotypes for all base-pair positions, and phenotypes were simulated on the basis of these data. Markers were numbered sequentially on each chromosome and were labeled C#S# (e.g., C1S254 is the 254th SNP on chromosome 1); marker locations were recorded as RefSeq36 base-pair coordinates. The 24,487 autosomal SNPs detected in genotype calling were, for purposes of the simulation, assigned to 3,205 genes based on the first intersection found of the marker location and the base-pair coordinates of all genes obtained from RefSeq36 annotations. SNPs that overlapped multiple genes were assigned to only one of those genes. There were 1 to 231 SNPs per gene (mean = 7.64, SD = 14.00). Of the SNPs, 9,433 (38.4%) were private variants, occurring once in the set of 697 unrelated individuals. Multiple private variants carried by the same individual resulted in SNPs with identical genotypes, including a SNP in the KDR gene that was designated as functional in the phenotype simulations which had identical genotypes to multiple nonfunctional SNPs. Relatively few of the variants were common; 74% had minor allele frequency (MAF) ≤ 0.01 and only 12.8% had MAF ≥ 0.05 (Figure 1). The median MAF was 0.002, that is, three copies of the minor allele in the sample of 697 unrelated individuals.
Figure 1

Minor allele frequency in the unrelated individuals sample for the 15,054 SNPs present in multiple copies. Note that the scale of the MAF categories is uneven, going by 0.5% intervals for MAF < 0.01, by 1% intervals for MAF = 0.01–0.05, and by 5% intervals thereafter.

Minor allele frequency in the unrelated individuals sample for the 15,054 SNPs present in multiple copies. Note that the scale of the MAF categories is uneven, going by 0.5% intervals for MAF < 0.01, by 1% intervals for MAF = 0.01–0.05, and by 5% intervals thereafter.

Unrelated individuals and pedigree samples

Two disparate sampling designs were used in the construction of the simulated data. One sample consisted of 697 unrelated individuals, each of whom corresponded to an individual from the 1000 Genomes Project data. The 1000 Genomes Project subjects whose data were used came from the CEPH (European-descent residents of Utah; n = 90), Denver Chinese (n = 107), Han Chinese (n = 109), Japanese (n = 72), Luhya (n = 108), Tuscan (n = 66), and Yoruban (n = 112) population groups. The second sample configuration used in GAW17 simulations consisted of 697 individuals in 8 extended families with the genotypes for the 202 pedigree founders being taken from the 1000 Genomes Project data. The founders of the pedigrees were chosen at random from the unrelated individuals sample and included 12 CEPH, 18 Denver Chinese, 19 Han Chinese, 28 Japanese, 50 Luhya, 66 Tuscan, and 9 Yoruban samples. Because of a computational error, the genotypes of pedigree founders were merged incorrectly across files, resulting in an incomplete match between the genotypes of the pedigree founders and the corresponding individual from the unrelated individuals sample. This affected a small proportion of genotypes (7%) but impacted all pedigree founders. Approximately one-third of the SNPs were unaffected, and for the two-thirds that had substitutions, most had only one or two founders with altered genotypes. Pedigree configurations were adapted from the pedigrees used for simulated data in Genetic Analysis Workshops 10 and 12 [4,5] and included four generations and relatives as distant as second cousins. The data set was designed such that all family members had genotype and phenotype data available with no missing or unexamined relatives. Because the pedigree founders were a subset of the unrelated individuals, genetic diversity was restricted in the families compared to the unrelated individuals sample. Of the 24,487 variant sites identified in the unrelated individuals sample, 10,703 were monomorphic in the family sample with only one allele appearing. On the other hand, some variants that were present in single copies or at low frequency in the unrelated individuals sample appeared many times in the family sample, because they were transmitted by a founder with numerous descendants. For example, C6S2981, which was designated as functional in the phenotype simulations, was present in 3 copies in the unrelated individuals sample and in 46 copies in the family sample. C4S4935, also designated as functional in the simulations, was present in a single copy in the unrelated individuals sample but in 31 copies in the pedigree sample. There are 327 males and 370 females in the unrelated individuals data set, which preserved the listed sex for each of the 1000 Genomes Project samples. The family set included 346 males and 351 females. Pedigree founders were allowed to have a different sex from the unrelated individuals whose genotypes they shared. However, only autosomal markers were used in the GAW17 simulations (i.e., X and Y data were not included). Assigned ages were matched across the family and unrelated individuals data sets and ranged from 16 to 91 years, with a mean of 41.8 years. For the family data set, fully informative markers were generated at each gene (recombination was not allowed within genes) and used to compute identical-by-descent (IBD) allele sharing at each gene location under the rationale that family-based data sets were likely to have previous short tandem repeat (STR) or high-density SNP genotyping that could be used to estimate the IBD allele sharing. These IBD matrices were provided as part of the GAW17 data set.

Phenotype model

A common disease, with a prevalence of 30%, was simulated along with three related quantitative risk factors, Q1, Q2, and Q4. Smoking status (prevalence 25%) was also simulated. Phenotype simulations were performed multiple times to generate 200 replicates of the unrelated individuals and pedigree data sets. Note that the genotype data remained constant across replicates, as did age, sex, and pedigree configuration. Knowledge about biological pathways and statistical predictions regarding the potential deleteriousness of coding variants was used in designing the simulation model. Pathways for gene enrichment were selected from the publicly available Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/) and the proprietary software Ingenuity Pathways Analysis (IPA), version 8.7 (http://www.ingenuity.com). The vascular endothelial growth factor (VEGF) pathway was observed to have numerous genes with available typed SNPs and was therefore selected as the source of a subset of the functional loci for phenotype simulations. The IPA version of the VEGF signaling pathway was used as the core source because it included most of the genes in the KEGG VEGF signaling pathway as well as some additional upstream information, primarily relating to VEGF transcriptional control. The overlap between the two data sources was considered significant enough not to impede the investigations of any researchers limited to the freely available KEGG data set. Genes influencing Q1 come primarily from the VEGF pathway; those influencing Q2 were chosen without reference to pathways and were primarily related to cardiovascular disease risk and inflammation, and those influencing latent disease liability also came primarily from the VEGF pathway (a different section from the one in which genes were selected for Q1). Effect sizes for coding variants within genes were assigned using PolyPhen and SIFT predictions of the likelihood that the variant would be deleterious. The functional variants included both rare and common alleles and a range of effect sizes, with most having small effects but a few having large effects that should be reliably detectable in most replicates of the data set. Some genes contained a single functional variant and others contained many. Population origin of the 1000 Genomes Project participants was not used in the phenotype simulations. In general, there was little disequilibrium between the functional variants (Figures 2, 3, 4), with a few exceptions that were primarily private variants carried in a single copy by the same individual (e.g., C3S4836 and C10S3092 for Q2 and C4S1877 and C4S1889 for Q1).
Figure 2

Gametic disequilibrium (. Markers are shown in chromosomal order from bottom to top and from left to right and are symmetric across the diagonal.

Figure 3

Gametic disequilibrium (. Markers are shown in chromosomal order from bottom to top and from left to right and are symmetric across the diagonal.

Figure 4

Gametic disequilibrium (. Markers are shown in chromosomal order from bottom to top and from left to right and are symmetric across the diagonal.

Gametic disequilibrium (. Markers are shown in chromosomal order from bottom to top and from left to right and are symmetric across the diagonal. Gametic disequilibrium (. Markers are shown in chromosomal order from bottom to top and from left to right and are symmetric across the diagonal. Gametic disequilibrium (. Markers are shown in chromosomal order from bottom to top and from left to right and are symmetric across the diagonal. Quantitative risk factors Q1, Q2, and Q4 were simulated as normally distributed phenotypes. Disease was simulated using a liability threshold model, and the top 30% of the distribution was declared affected. All SNP effects were additive on the quantitative trait or liability scale, with each copy of the minor allele increasing the mean trait value by an equal amount. Genotype by environment effects were simulated for Q1. Because genotype, age, and sex were held constant across replicates, the variation in phenotype across replicates came primarily from the residual polygenic and residual environmental components. The residual polygenic components were correlated between relatives, by definition, and also correlated between Q1, Q2, and latent liability. The residual environmental components were unique to each individual and were simulated to be weakly correlated between Q1, Q2, and latent liability.

Q1

Quantitative risk factor Q1 was influenced by 39 SNPs in 9 genes (see Table 1). There were 1–11 functional variants per gene. Their MAFs in the 1000 Genomes Project data ranged from 0.07% (i.e., a single copy of the minor allele) to 16.5%. In all cases, the minor allele was associated with higher mean Q1; the β column in the table provides the displacement in mean levels of Q1 for each copy of the minor allele. Q1 also had a residual heritability of 0.44, resulting from variants at loci not included in the current sequence data set. The residual genetic component of Q1 was correlated with the residual genetic components of Q2 and latent liability. There were also weaker environmental correlations between Q1 and Q2 and latent liability. Values of Q1 were higher in smokers, and there was genotype by smoking interaction for the effects of variants in the KDR gene on Q1. Effects of the KDR variants were 50% higher in smokers than in nonsmokers. (Note that for KDR the effect sizes given in Table 1 are those for nonsmokers.) Q1 also increased with age.
Table 1

Effects on Q1

GeneSNPMAFβ
ARNTC1S65330.0114780.589734
ARNTC1S65370.0007170.642689
ARNTC1S65400.0014350.323662
ARNTC1S65420.0021520.488219
ARNTC1S65610.0007170.625721
ELAVL4C1S31810.0007170.795093
ELAVL4C1S31820.0007170.328748
FLT1C13S3200.0014350.18047
FLT1C13S3990.0007170.457361
FLT1C13S4310.0172170.732566
FLT1C13S4790.0007170.839669
FLT1C13S5050.0007170.38582
FLT1C13S5140.0007170.549816
FLT1C13S5220.0279770.623466
FLT1C13S5230.0667140.653351
FLT1C13S5240.0043040.596704
FLT1C13S5470.0007170.549214
FLT1C13S5670.0007170.0905862
FLT4C5S51330.0014350.120761
FLT4C5S51560.0007170.385374
HIF1AC14S17180.0007170.251622
HIF1AC14S17290.0021520.329088
HIF1AC14S17340.0121950.220448
HIF1AC14S17360.0007170.228202
HIF3AC19S47990.0007170.174668
HIF3AC19S48150.0007170.51468
HIF3AC19S48310.0007170.265181
KDRC4S18610.0021520.598271
KDRC4S18730.0007170.715613
KDRC4S18740.0007170.503025
KDRC4S18770.0007171.17194
KDRC4S18780.1649930.149975
KDRC4S18790.0007170.610938
KDRC4S18840.0208030.318125
KDRC4S18870.0007170.312058
KDRC4S18890.0007171.17194
KDRC4S18900.0021520.417977
VEGFAC6S29810.0021521.13045
VEGFCC4S49350.0007171.40529
Effects on Q1

Q2

Q2 was influenced by 72 SNPs in 13 genes (see Table 2). There were 1–13 functional variants per gene. MAFs ranged from 0.07% to 17.07%. In all cases, the minor allele was associated with higher mean Q2. Q2 had a residual heritability of 0.29. The residual genetic component of Q2 was correlated with the residual genetic components of Q1 and latent liability. There were also weaker environmental correlations between Q2 and Q1 and latent liability. Q2 was not influenced by age, sex, or smoking status.
Table 2

Effects on Q2

GeneSNPMAFβ
BCHEC3S48340.0007170.232562
BCHEC3S48360.0007170.352589
BCHEC3S48560.0007170.311344
BCHEC3S48590.0021520.557489
BCHEC3S48600.0007170.339017
BCHEC3S48620.0007170.93321
BCHEC3S48670.0007170.67704
BCHEC3S48690.0007171.15994
BCHEC3S48730.0028690.588113
BCHEC3S48740.0007171.06857
BCHEC3S48750.0007171.15207
BCHEC3S48760.0007170.798247
BCHEC3S48800.0014350.164995
GCKRC2S3540.0121950.396642
INSIG1C7S51320.0007170.0983783
INSIG1C7S51330.0007170.106056
INSIG1C7S51440.0007170.237783
LPLC8S4420.0157820.490165
LPLC8S4760.0007170.725673
LPLC8S5300.0014350.800024
PDGFDC11S52920.0086080.60155
PDGFDC11S52990.0007170.823159
PDGFDC11S53010.0007170.982146
PDGFDC11S53020.0014350.814925
PLATC8S17410.0035870.71858
PLATC8S17420.0007170.891241
PLATC8S17580.0014350.86814
PLATC8S17700.0007170.58405
PLATC8S17720.0014350.219187
PLATC8S17730.0014350.515733
PLATC8S17990.0057390.190653
PLATC8S18110.0014350.0753783
RARBC3S6350.0007170.653224
RARBC3S6790.0050220.632142
SIRT1C10S30480.0021520.825893
SIRT1C10S30500.0021520.956865
SIRT1C10S30580.0007170.393157
SIRT1C10S30920.0007170.352589
SIRT1C10S30930.0007170.47264
SIRT1C10S31070.0007170.99946
SIRT1C10S31080.0007170.52925
SIRT1C10S31090.0007170.57047
SIRT1C10S31100.0021520.117719
SREBF1C17S10070.0021520.548739
SREBF1C17S10090.0007170.716057
SREBF1C17S10240.0043040.447239
SREBF1C17S10300.0007170.734055
SREBF1C17S10430.0043040.459494
SREBF1C17S10450.0035870.30998
SREBF1C17S10460.0028690.604567
SREBF1C17S10480.0014350.297328
SREBF1C17S10550.0014350.957889
SREBF1C17S10560.0007170.46384
VLDLRC9S3670.0007170.510889
VLDLRC9S3760.0028690.543897
VLDLRC9S3770.0014351.20543
VLDLRC9S3910.0007170.483147
VLDLRC9S4300.0007170.677573
VLDLRC9S4430.0014350.61953
VLDLRC9S4440.0014350.901646
VLDLRC9S4970.0007170.731422
VNN1C6S53780.0057390.466305
VNN1C6S53800.1707320.248606
VNN3C6S54120.0007170.551757
VNN3C6S54260.0329990.110779
VNN3C6S54390.0007170.127341
VNN3C6S54410.0982780.268411
VNN3C6S54460.0007170.528353
VNN3C6S54480.0007170.581462
VNN3C6S54490.0100430.680317
VWFC12S1810.0007170.76848
VWFC12S2110.0057390.337463
Effects on Q2

Q4

Q4 had a heritability of 0.70, but none of this genetic component was due to genes in this sequencing set (i.e., it was not influenced by any of the genotyped exonic SNPs). Q4 was lower in smokers, decreased with age, and was lower in females. Q4 was protective; that is, individuals with higher levels of Q4 had lower risk of disease.

Affection status

A normally distributed latent liability trait (not included in the distributed phenotype data) was simulated; it was influenced by 51 SNPs in 15 genes with 1–24 functional variants per gene (see Table 3). MAFs of these variants ranged from 0.07% to 25.8%. In all cases, the minor allele was associated with higher mean liability. This latent liability trait was also higher in smokers and increased with age. Disease risk was a function of this latent liability, Q1, Q2, and Q4:
Table 3

Effects on disease liability

GeneSNPMAFβ
AKT3C1S113960.0007170.340456
BCL2L11C2S22860.0007170.274592
BCL2L11C2S22880.0028690.563598
BCL2L11C2S23070.0007170.606816
ELAVL4C1S31810.0007170.64359
ELAVL4C1S31820.0007170.214219
HSP90AA1C14S36300.0007170.0579258
HSP90AA1C14S36950.0007170.152617
HSP90AA1C14S37040.0035870.0789197
HSP90AA1C14S37060.2582500.0874168
NRASC1S57480.0007170.409806
PIK3C2BC1S91640.0014350.205094
PIK3C2BC1S91650.0007170.141183
PIK3C2BC1S91720.0043040.508901
PIK3C2BC1S91730.0014350.12026
PIK3C2BC1S91740.0007170.634406
PIK3C2BC1S91890.0064560.454308
PIK3C2BC1S92000.0007170.679158
PIK3C2BC1S92220.0007170.38177
PIK3C2BC1S92500.0014350.358232
PIK3C2BC1S92660.0028690.184476
PIK3C2BC1S92670.0021520.504508
PIK3C2BC1S93060.0007170.239692
PIK3C2BC1S93200.0007170.653693
PIK3C2BC1S93330.0007170.703217
PIK3C2BC1S93460.0007170.29823
PIK3C2BC1S93730.0007170.399922
PIK3C2BC1S93910.0007170.582382
PIK3C2BC1S94230.0007170.590111
PIK3C2BC1S94320.0107600.461306
PIK3C2BC1S94450.0007170.582247
PIK3C2BC1S94460.0007170.477664
PIK3C2BC1S94490.0007170.647146
PIK3C2BC1S94550.0028690.518095
PIK3C2BC1S94570.0007170.497112
PIK3C3C18S24750.0007170.695313
PIK3C3C18S24920.0172170.576351
PIK3R3C1S29190.0007170.414798
PRKCAC17S45780.1664280.39334
PRKCAC17S45810.0007170.129034
PRKCB1C16S18940.0007170.45754
PTK2C8S48250.0007170.0164796
PTK2C8S48390.0007170.142502
PTK2BC8S8860.0007170.466067
PTK2BC8S9000.0014350.111154
PTK2BC8S9090.0014350.431062
RRASC19S49290.0014350.34384
RRASC19S49370.0014350.462103
SHC1C1S70610.0064560.206036
SOS2C14S13810.0007170.613801
SOS2C14S13820.0035870.633247
Effects on disease liability Liability to disease = latent liability + Q1 + Q2 – Q4. (1) Using this formula, a quantitative liability score was calculated for each individual, and the top 30% of the distribution in each simulation replicate was declared affected. A consequence of this assignment was that each replicate had the same number of affected individuals, although the identity of these individuals varied across replicates. The effect sizes in Table 3 are for liability to disease.

Competing interests

The authors declare that there are no competing interests.

Authors’ contributions

LA, TDD, and JB designed the phenotypic trait model and the pedigree and unrelated individuals data sets. JMP aligned and processed sequence data. JCC and JEC provided input on incorporation of pathway and functional information and selection of functional variants. TDD conducted the gene dropping and phenotype simulations. JWK, TDD, LA, and JB performed error checking on the simulated data. All authors read and approved the final manuscript.
  5 in total

1.  GAW12: simulated genome scan, sequence, and family data for a common disease.

Authors:  L Almasy; J D Terwilliger; D Nielsen; T D Dyer; D Zaykin; J Blangero
Journal:  Genet Epidemiol       Date:  2001       Impact factor: 2.135

2.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

3.  GAW10: simulated family data for a common oligogenic disease with quantitative risk factors.

Authors:  J W MacCluer; J Blangero; T D Dyer; M C Speer
Journal:  Genet Epidemiol       Date:  1997       Impact factor: 2.135

4.  Chromosome-based method for rapid computer simulation in human genetic linkage analysis.

Authors:  J D Terwilliger; M Speer; J Ott
Journal:  Genet Epidemiol       Date:  1993       Impact factor: 2.135

5.  A map of human genome variation from population-scale sequencing.

Authors:  Gonçalo R Abecasis; David Altshuler; Adam Auton; Lisa D Brooks; Richard M Durbin; Richard A Gibbs; Matt E Hurles; Gil A McVean
Journal:  Nature       Date:  2010-10-28       Impact factor: 49.962

  5 in total
  123 in total

1.  Multiple testing in high-throughput sequence data: experiences from Group 8 of Genetic Analysis Workshop 17.

Authors:  Inke R König; Jeremie Nsengimana; Charalampos Papachristou; Matthew A Simonson; Kai Wang; Jason A Weisburd
Journal:  Genet Epidemiol       Date:  2011       Impact factor: 2.135

2.  Two-stage extreme phenotype sequencing design for discovering and testing common and rare genetic variants: efficiency and power.

Authors:  Guolian Kang; Dongyu Lin; Hakon Hakonarson; Jinbo Chen
Journal:  Hum Hered       Date:  2012-06-07       Impact factor: 0.444

3.  A general approach for combining diverse rare variant association tests provides improved robustness across a wider range of genetic architectures.

Authors:  Brian Greco; Allison Hainline; Jaron Arbet; Kelsey Grinde; Alejandra Benitez; Nathan Tintle
Journal:  Eur J Hum Genet       Date:  2015-10-28       Impact factor: 4.246

4.  A novel method, the Variant Impact On Linkage Effect Test (VIOLET), leads to improved identification of causal variants in linkage regions.

Authors:  Lisa J Martin; Lili Ding; Xue Zhang; Ahmed H Kissebah; Michael Olivier; D Woodrow Benson
Journal:  Eur J Hum Genet       Date:  2013-06-05       Impact factor: 4.246

5.  A fast and powerful aggregated Cauchy association test for joint analysis of multiple phenotypes.

Authors:  Lili Chen; Yajing Zhou
Journal:  Genes Genomics       Date:  2021-01-11       Impact factor: 1.839

6.  Incorporating group correlations in genome-wide association studies using smoothed group Lasso.

Authors:  Jin Liu; Jian Huang; Shuangge Ma; Kai Wang
Journal:  Biostatistics       Date:  2012-09-17       Impact factor: 5.899

7.  Evaluation of pooled association tests for rare variant identification.

Authors:  Wan-Yu Lin; Boshao Zhang; Nengjun Yi; Guimin Gao; Nianjun Liu
Journal:  BMC Proc       Date:  2011-11-29

8.  A powerful and adaptive association test for rare variants.

Authors:  Wei Pan; Junghi Kim; Yiwei Zhang; Xiaotong Shen; Peng Wei
Journal:  Genetics       Date:  2014-05-15       Impact factor: 4.562

9.  Boosting the Power of the Sequence Kernel Association Test by Properly Estimating Its Null Distribution.

Authors:  Kai Wang
Journal:  Am J Hum Genet       Date:  2016-06-09       Impact factor: 11.025

10.  Pathway-based approach using hierarchical components of collapsed rare variants.

Authors:  Sungyoung Lee; Sungkyoung Choi; Young Jin Kim; Bong-Jo Kim; Heungsun Hwang; Taesung Park
Journal:  Bioinformatics       Date:  2016-09-01       Impact factor: 6.937

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.