J Kowalski. Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.
Abstract
MOTIVATION: The analysis of genetic data poses statistical problems in the form of high dimensionality with small sample sizes. Constructing a composite gene region (sequence pair) heterogeneity measure is one technique for reducing the dimensionality of the problem. This approach, however, is not without cost: the contribution of individual locations to observed gene region differences between groups becomes entangled in the summary measure. This is problematic because it is of scientific interest to identify the locations that together characterize phenotype.

RESULTS: A method is proposed for relating observed gene region heterogeneity back to the location level. In the spirit of a factor analysis-type setting, the approach focuses on identifying a latent variable structure among locations to explain within- and between-group genetic differences associated with phenotype. Depending on the weighting scheme used to construct the gene region heterogeneity measure, the method can identify either the additive contribution of individual locations or the additive contribution of a group of locations to observed gene region heterogeneity. The approach is illustrated with clinical trial data, where the problem of altered HIV drug susceptibility is examined by characterizing location contributions to HIV protease gene region differences associated with a phenotypic treatment response.

AVAILABILITY: The menu-driven functions used to obtain the results, GENE_ S (J.Kowalski, Harvard School of Public Health, Boston, MA 2001), were developed in Splus (MathSoft, Inc. S-Plus 2000, Seattle, WA, 1999) and are available from the author upon request.
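To make the idea of an additive, location-level decomposition concrete, the sketch below computes a weighted mismatch score for one sequence pair and keeps the per-location terms that sum to it. This is only an illustration under stated assumptions, not the method of the paper or the GENE_ S functions: the weighted mismatch measure, the function names `location_contributions` and `pair_heterogeneity`, and the example sequences and weights are all hypothetical.

```python
from typing import List, Optional, Sequence


def location_contributions(
    seq_a: Sequence[str],
    seq_b: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Per-location terms of a weighted mismatch score for one sequence pair.

    Assumption (not from the paper): heterogeneity is a weighted count of
    positions at which the two aligned sequences differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if weights is None:
        weights = [1.0] * len(seq_a)  # equal weighting of all locations
    if len(weights) != len(seq_a):
        raise ValueError("one weight is required per location")
    # Each term is the location's weight if the residues differ, else 0.
    return [float(w) * (a != b) for a, b, w in zip(seq_a, seq_b, weights)]


def pair_heterogeneity(
    seq_a: Sequence[str],
    seq_b: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Composite score: the sum of the per-location contributions."""
    return sum(location_contributions(seq_a, seq_b, weights))


# Hypothetical aligned sequences that differ at the second and fourth positions.
terms = location_contributions("ACGT", "AGGA", weights=[1.0, 2.0, 1.0, 0.5])
total = pair_heterogeneity("ACGT", "AGGA", weights=[1.0, 2.0, 1.0, 0.5])
```

Because the composite score is a plain sum, each location's share can be read directly from `terms`. Changing the weights changes which locations, or groups of locations, dominate the summary, which is the sense in which the choice of weighting scheme matters in this toy setting.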