Marc Pybus (1), Pierre Luisi (2), Giovanni Marco Dall'Olio (3), Manu Uzkudun (1), Hafid Laayouni (4), Jaume Bertranpetit (1), Johannes Engelken (1).
1. Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain.
2. Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain; Department of Biology, Stanford University, Stanford, CA 94305, USA.
3. Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain; Division of Cancer Studies, King's College of London, London SE1 1UL, UK.
4. Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, Barcelona 08003, Spain; Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain.
Abstract
MOTIVATION: Detecting positive selection in genomic regions is a recurrent topic in genetic studies of natural populations. However, the regions detected by genome-wide scans that use different tests and/or populations show little consistency. Furthermore, few methods address the challenge of classifying selective events according to specific features such as age, intensity or state (completeness).

RESULTS: We have developed a machine-learning classification framework that exploits the combined ability of several selection tests to uncover the different polymorphism features expected under the hard sweep model, while controlling for population-specific demography. As a result, we achieve high sensitivity toward hard selective sweeps while adding insights about their completeness (whether the selected variant is fixed or not) and age of onset. Our method also determines how relevant each of the individual methods implemented so far is for detecting positive selection under specific selective scenarios. We calibrated the method and applied it to three reference human populations from The 1000 Genomes Project to generate a genome-wide classification map of hard selective sweeps. This study improves the detection of selective sweeps by moving beyond the classical selection versus no-selection classification strategy, and offers an explanation for the lack of consistency observed among selection tests when applied to real data. Very few signals were observed in the African population studied, even though our method has higher sensitivity under this population's demography.

AVAILABILITY AND IMPLEMENTATION: The genome-wide results for three human populations from The 1000 Genomes Project and an R package implementing the 'Hierarchical Boosting' framework are available at http://hsb.upf.edu/.
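The hierarchical idea in the abstract (first selection versus neutrality, then completeness, then age) can be illustrated with a minimal Python sketch. It is not the authors' R package and it performs no boosting: each decision node is simply a weighted sum of standardized selection-test scores compared with a threshold, standing in for what a trained classifier would supply. The statistic names (`tajimas_d`, `ihs`, `xpehh`), every weight and threshold, and the exact shape of the hierarchy are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """One binary decision: weighted sum of test scores vs. a threshold."""
    weights: Dict[str, float]          # hypothetical weights, not fitted values
    threshold: float                   # hypothetical decision threshold
    label_above: str                   # label if score > threshold
    label_below: str                   # label otherwise
    child_above: Optional["Node"] = None
    child_below: Optional["Node"] = None

    def score(self, stats: Dict[str, float]) -> float:
        return sum(w * stats[name] for name, w in self.weights.items())


def classify(node: Node, stats: Dict[str, float]) -> str:
    """Walk down the hierarchy until a node has no child on the chosen side."""
    above = node.score(stats) > node.threshold
    child = node.child_above if above else node.child_below
    if child is not None:
        return classify(child, stats)
    return node.label_above if above else node.label_below


def build_hierarchy() -> Node:
    """Assumed three-level layout: sweep? -> complete? -> recent?"""
    age_complete = Node({"xpehh": 1.0}, 2.0,
                        "complete recent sweep", "complete ancient sweep")
    age_incomplete = Node({"ihs": 1.0}, 2.0,
                          "incomplete recent sweep", "incomplete ancient sweep")
    completeness = Node({"tajimas_d": -0.6, "ihs": -0.8}, 0.5,
                        "complete", "incomplete",
                        child_above=age_complete, child_below=age_incomplete)
    return Node({"tajimas_d": -0.5, "ihs": 0.3, "xpehh": 0.4}, 1.0,
                "sweep", "neutral", child_above=completeness)


# Standardized scores for two made-up genomic windows.
sweep_window = {"tajimas_d": -2.1, "ihs": 0.3, "xpehh": 2.5}
neutral_window = {"tajimas_d": 0.1, "ihs": 0.2, "xpehh": -0.1}

root = build_hierarchy()
print(classify(root, sweep_window))
print(classify(root, neutral_window))
```

Because each window is routed through successive binary decisions, a region is reported with a scenario label rather than a single selection versus no-selection call, and the weights at each node indicate which tests matter for that particular decision.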