
Efficient distribution estimation for data with unobserved sub-population identifiers.

Yanyuan Ma, Yuanjia Wang.

Abstract

We study efficient nonparametric estimation of the distribution functions of several scientifically meaningful sub-populations from data consisting of mixed samples in which the sub-population identifiers are missing. Only the probabilities that each observation belongs to each sub-population are available. The problem arises in several biomedical studies, such as quantitative trait locus (QTL) analysis and genetic studies with ungenotyped relatives, where the scientific interest lies in estimating the cumulative distribution function of a trait given a specific genotype. However, in these studies subjects' genotypes may not be directly observed, so the distribution of the trait outcome is a mixture of several genotype-specific distributions. We characterize the complete class of consistent estimators, which includes members such as one type of nonparametric maximum likelihood estimator (NPMLE) and least squares or weighted least squares estimators. We identify the efficient estimator in the class, which reaches the semiparametric efficiency bound, and we implement it using a simple procedure that remains consistent even if several components of the estimator are misspecified. In addition, our close inspection of two commonly used NPMLEs for these problems reveals the surprising result that the NPMLE in one form is highly inefficient, while in the other form it is inconsistent. We provide simulation studies to illustrate the theoretical results and demonstrate the proposed methods through two real data examples.
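To make the mixture structure concrete: if observation i has known membership probabilities p_i1, ..., p_iK, then P(Y_i <= t) = p_i1 F_1(t) + ... + p_iK F_K(t), so at each fixed t the indicator I(Y_i <= t) can be regressed on the probability vector p_i, and the fitted coefficients estimate F_1(t), ..., F_K(t). The Python sketch below shows only this (weighted) least-squares member of the class, assuming the probabilities are known and vary enough across subjects for the design to have full rank. The function name `ls_mixture_cdf` and its arguments are hypothetical, the weights are user-supplied, and this is not the efficient estimator derived in the paper, whose optimal weights are not given here.

```python
import numpy as np


def ls_mixture_cdf(y, p, grid, weights=None):
    """(Weighted) least-squares estimates of component CDFs F_1..F_K.

    Assumed model: P(Y_i <= t) = sum_k p[i, k] * F_k(t), where the
    membership probabilities p[i, k] are known for every observation.
    This is an illustrative sketch, not the efficient estimator.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n, _ = p.shape
    # Hypothetical user-supplied weights; plain least squares if omitted.
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)
    # ind[i, j] = 1.0 if y_i <= grid[j], else 0.0
    ind = (y[:, None] <= grid[None, :]).astype(float)
    # One regression per grid point, solved jointly: for each column j,
    # minimise sum_i w_i * (ind[i, j] - p[i] @ F[:, j]) ** 2 over F[:, j].
    F, *_ = np.linalg.lstsq(sw[:, None] * p, sw[:, None] * ind, rcond=None)
    return F  # shape (K, len(grid)); row k estimates F_k on the grid
```

The raw least-squares estimates are not constrained to be non-decreasing in t or to lie in [0, 1]; in practice one might post-process them, for example by clipping or by isotonic regression.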


Keywords:  Finite mixed samples; nonparametric maximum likelihood estimator (NPMLE); robustness; semiparametric efficiency

Year:  2012        PMID: 23795232      PMCID: PMC3685883          DOI: 10.1214/12-EJS690

Source DB:  PubMed          Journal:  Electron J Stat        ISSN: 1935-7524            Impact factor:   1.125


