MOTIVATION: Most genotyping technologies for single nucleotide polymorphism (SNP) markers use standard clustering methods to 'call' the SNP genotypes. These methods are not always optimal in distinguishing the genotype clusters of a SNP because they do not take advantage of specific features of the genotype calling problem. In particular, when family data are available, pedigree information is ignored. Furthermore, prior information about the distribution of the measurements for each cluster can be used to choose an appropriate model-based clustering method and can significantly improve the genotype calls. One special genotyping problem that has never been discussed in the literature is the genotyping of trisomic individuals, such as individuals with Down syndrome. Calling trisomic genotypes is a more complicated problem, and the addition of external information becomes very important.
RESULTS: In this article, we discuss the impact of incorporating external information into clustering algorithms to call the genotypes for both disomic and trisomic data. We also propose two new methods to call genotypes using family data. One is a modification of the K-means method that uses the pedigree information by updating all members of a family together. The other is a likelihood-based method that combines a Gaussian or beta-mixture model with pedigree information. We compare the performance of these two methods and some existing methods using simulation studies, and also on a real dataset generated by the Illumina platform (www.illumina.com).
AVAILABILITY: The R code for the family-based genotype calling methods (SNPCaller) is available for download from http://watson.hgen.pitt.edu/register.
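The abstract does not spell out how the K-means modification updates a family together, so the following is only an illustrative sketch of one way such a joint update could work, not the SNPCaller implementation. It assumes disomic father-mother-child trios, a single one-dimensional signal per individual (for example a B-allele intensity ratio between 0 and 1), genotypes coded by B-allele count, and hypothetical starting centers; the function names are invented for this example.

```python
from itertools import product

# Genotypes coded by B-allele count: 0 = AA, 1 = AB, 2 = BB (assumed coding).
GENOTYPES = (0, 1, 2)
# Number of B alleles a parent with each genotype can transmit.
TRANSMIT = {0: (0,), 1: (0, 1), 2: (1,)}


def mendel_consistent(father, mother, child):
    """True if a disomic child genotype can arise from the two parents."""
    return any(a + b == child
               for a in TRANSMIT[father] for b in TRANSMIT[mother])


# All (father, mother, child) triples allowed by Mendelian inheritance.
VALID_TRIOS = [t for t in product(GENOTYPES, repeat=3)
               if mendel_consistent(*t)]


def call_family(signals, centers):
    """Pick the Mendelian-consistent trio closest to the three signals."""
    def cost(trio):
        return sum((s - centers[g]) ** 2 for s, g in zip(signals, trio))
    return min(VALID_TRIOS, key=cost)


def family_kmeans(families, centers=(0.1, 0.5, 0.9), n_iter=20):
    """families: list of (father, mother, child) signals.

    Assignment step: each trio is called jointly, not member by member.
    Update step: each cluster center becomes the mean of its members.
    """
    centers = list(centers)
    calls = []
    for _ in range(n_iter):
        calls = [call_family(f, centers) for f in families]
        new_centers = list(centers)
        for g in GENOTYPES:
            members = [s for f, c in zip(families, calls)
                       for s, called in zip(f, c) if called == g]
            if members:  # keep the old center if a cluster is empty
                new_centers[g] = sum(members) / len(members)
        if all(abs(a - b) < 1e-9 for a, b in zip(new_centers, centers)):
            break
        centers = new_centers
    return calls, centers


families = [
    (0.08, 0.10, 0.12),
    (0.10, 0.52, 0.48),
    (0.90, 0.88, 0.92),
    (0.50, 0.49, 0.91),
    (0.12, 0.90, 0.51),
    (0.09, 0.11, 0.40),  # child signal is ambiguous; parents both look AA
]
calls, centers = family_kmeans(families)
```

In the last trio, the child's signal on its own would fall nearest the heterozygous cluster, but two AA parents cannot have an AB child, so the joint assignment calls the child AA. A trisomic version would need a different set of valid genotype triples, which this sketch does not attempt.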