Relating HIV-1 sequence variation to replication capacity via trees and forests.

Mark R Segal, Jason D Barbour, Robert M Grant.

Abstract

The problem of relating genotype (as represented by amino acid sequence) to phenotypes is distinguished from standard regression problems by the nature of sequence data. Here we investigate an instance of such a problem where the phenotype of interest is HIV-1 replication capacity and contiguous segments of protease and reverse transcriptase sequence constitute the genotype. A variety of data analytic methods have been proposed in this context. Shortcomings of select techniques are contrasted with the advantages afforded by tree-structured methods. However, tree-structured methods, in turn, have been criticized on the grounds that they achieve only modest predictive performance. A number of ensemble approaches (bagging, boosting, random forests) have recently emerged, devised to overcome this deficiency. We evaluate random forests as applied in this setting, and detail why prediction gains obtained in other situations are not realized. Other approaches including logic regression, support vector machines and neural networks are also applied. We interpret results in terms of HIV-1 reverse transcriptase structure and function.

Year:  2004        PMID: 16646798     DOI: 10.2202/1544-6115.1031

Source DB:  PubMed          Journal:  Stat Appl Genet Mol Biol        ISSN: 1544-6115


