A Gregory DiRienzo, Victor DeGruttola, Brendan Larder, Kurt Hertogs.
Abstract
Medical management of HIV infection requires an understanding of the relationship between viral genetic sequences and viral susceptibility to antiretroviral drugs. Because of the high dimensionality of the data on viral genotype, traditional statistical methods are not well suited for investigating this relationship. We develop non-parametric methods specifically for the setting where high-dimensional data provides a basis for predicting a low-dimensional response variable. Our non-recursive methods proceed in three stages: (i) build models, in a forward-stepwise manner, that predict phenotype response from genotype sequence; (ii) identify specific patterns of amino acid sequence that are most influential in predicting phenotype; and (iii) identify combinations of codons that have either a concordant or a discordant association in the occurrence of a mutation. The methods are applied to a data set provided by the Virco Group that contains protease genome sequences and IC50 measurements on a drug from the protease inhibitor class, amprenavir, for 2747 patient samples. From these methods, we were able to identify eight codons from the protease region of the HIV genome that predict resistance to amprenavir, and to determine pairs of codons that tend either to occur together or to preclude the occurrence of the other member of the pair. Copyright 2003 John Wiley & Sons, Ltd.
Year: 2003 PMID: 12939786 DOI: 10.1002/sim.1516
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
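Stage (i) of the abstract, forward-stepwise model building from genotype to phenotype, can be illustrated with a minimal sketch. This is not the authors' procedure, whose details the abstract does not give. It assumes a simple stand-in: each sample is a row of 0/1 mutation indicators, the prediction is the mean response among samples that share the same mutation pattern at the selected codons, and at each step the codon that most reduces the residual sum of squares is added. The function names, the toy data, and the `min_gain` stopping rule are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def cell_means_rss(X, y, codons):
    """Residual sum of squares when each response is predicted by the mean
    of its cell, a cell being the mutation pattern at the chosen codons."""
    cells = defaultdict(list)
    for row, resp in zip(X, y):
        cells[tuple(row[c] for c in codons)].append(resp)
    rss = 0.0
    for values in cells.values():
        mean = sum(values) / len(values)
        rss += sum((v - mean) ** 2 for v in values)
    return rss


def forward_stepwise(X, y, max_codons, min_gain=1e-9):
    """Greedily add the codon that most reduces the RSS; stop when no
    candidate improves the fit by more than min_gain (an assumed rule)."""
    selected = []
    current = cell_means_rss(X, y, selected)
    n_codons = len(X[0])
    while len(selected) < max_codons:
        candidates = [c for c in range(n_codons) if c not in selected]
        if not candidates:
            break
        best = min(candidates,
                   key=lambda c: cell_means_rss(X, y, selected + [c]))
        best_rss = cell_means_rss(X, y, selected + [best])
        if current - best_rss <= min_gain:
            break
        selected.append(best)
        current = best_rss
    return selected, current


# Toy data (fabricated for illustration only): four hypothetical codon
# positions, every 0/1 mutation pattern once, and a response that depends
# only on positions 0 and 2.
X = [list(bits) for bits in product([0, 1], repeat=4)]
y = [2.0 * row[0] + 1.0 * row[2] for row in X]

selected, rss = forward_stepwise(X, y, max_codons=3)
print("selected codon indices:", selected)
print("residual sum of squares:", rss)
```

With real data the cells become sparse as codons are added, so a practical version would need a more careful stopping rule, for example one based on cross-validated prediction error.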