Abstract
Missing data arise in genetic association studies when genotypes are unknown or when haplotypes are of direct interest. We provide a general likelihood-based framework for making inference on genetic effects and gene-environment interactions with such missing data. We allow genetic and environmental variables to be correlated while leaving the distribution of environmental variables completely unspecified. We consider three major study designs (cross-sectional, case-control, and cohort) and construct appropriate likelihood functions for all common phenotypes (e.g., case-control status, quantitative traits, and potentially censored ages at onset of disease). The likelihood functions involve both finite- and infinite-dimensional parameters. The maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Expectation-Maximization (EM) algorithms are developed to implement the corresponding inference procedures. Extensive simulation studies demonstrate that the proposed inferential and numerical methods perform well in practical settings. An illustration with a genome-wide association study of lung cancer is provided.
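To make the role of EM with unphased genotypes concrete, the following is a minimal sketch, not the authors' algorithm. It covers only the simplest textbook case: estimating haplotype frequencies at two biallelic SNPs from unphased genotype counts, assuming Hardy-Weinberg equilibrium and ignoring phenotypes, environmental covariates, and study design. The function names (`compatible_pairs`, `em_haplotype_freqs`) and the genotype coding (minor-allele counts 0, 1, 2 per SNP) are assumptions made for illustration.

```python
from collections import Counter

# The four possible haplotypes at two biallelic SNPs (alleles coded 0/1).
HAPLOTYPES = ("00", "01", "10", "11")


def compatible_pairs(g1, g2):
    """Return unordered haplotype pairs consistent with an unphased genotype.

    g1 and g2 are the counts (0, 1, or 2) of allele 1 at SNP 1 and SNP 2.
    """
    pairs = []
    for i, h in enumerate(HAPLOTYPES):
        for k in HAPLOTYPES[i:]:
            if int(h[0]) + int(k[0]) == g1 and int(h[1]) + int(k[1]) == g2:
                pairs.append((h, k))
    return pairs


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM (illustrative; assumes HWE)."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values
    counts = Counter(genotypes)
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        # E-step: expected haplotype counts given the current frequencies.
        expected = {h: 0.0 for h in HAPLOTYPES}
        for (g1, g2), n in counts.items():
            pairs = compatible_pairs(g1, g2)
            # Under HWE a heterozygous pair has probability 2 * p_h * p_k.
            weights = [(1 if h == k else 2) * freqs[h] * freqs[k]
                       for h, k in pairs]
            total = sum(weights)
            if total == 0:
                continue  # genotype has zero probability; skip it
            for (h, k), w in zip(pairs, weights):
                expected[h] += n * w / total
                expected[k] += n * w / total
        # M-step: new frequencies are expected counts per chromosome.
        new = {h: expected[h] / n_chrom for h in HAPLOTYPES}
        delta = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if delta < tol:
            break
    return freqs
```

Only the double heterozygote (1, 1) is ambiguous in this setting, since it is compatible with both the 00/11 and the 01/10 haplotype pairs; the E-step splits such individuals between the two phasings in proportion to their current probabilities.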
Year: 2010 PMID: 20348396 PMCID: PMC3294269 DOI: 10.1093/biostatistics/kxq015
Source DB: PubMed Journal: Biostatistics ISSN: 1465-4644 Impact factor: 5.899