Murray Cadzow, James Boocock, Hoang T Nguyen, Phillip Wilcox, Tony R Merriman, Michael A Black.
Abstract
The detection of "signatures of selection" is now possible on a genome-wide scale in many plant and animal species, and can be performed in a population-specific manner thanks to the wealth of per-population genome-wide genotype data now available. Because genomic regions that show evidence of past selection have also been shown to be enriched for genes associated with biologically important traits, detecting evidence of selective pressure is emerging as an additional approach for identifying novel gene-trait associations. While high-density genotype data are now relatively easy to obtain, for many researchers it is not immediately obvious how to identify signatures of selection in these data sets. Here we describe a basic workflow, constructed from open source tools, for detecting and examining evidence of selection in genomic data. Code to install and implement the pipeline components, and instructions to run a basic analysis using the workflow described here, can be downloaded from our public GitHub repository: http://www.github.com/smilefreak/selectionTools/
Keywords: analysis pipeline; genome-wide; genomics; signatures of selection
Year: 2014 PMID: 25206364 PMCID: PMC4144660 DOI: 10.3389/fgene.2014.00293
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Software tools used in the selection analysis workflow.
| Software | Version | Purpose |
|---|---|---|
| R | ≥ 3.0 | Running rehh |
| Perl | ≥ 5.0 | Running the vcftools modules vcf-subset and vcf-merge |
| Python | ≥ 2.6 | Running the pipeline, filtering haps files and annotating ancestral alleles |
| rehh | 1.11 | Calculating iHS (and other EHH-based measures) |
| vcftools | 1.11 | Converting VCF genotype data to PLINK format; calculating FST and Tajima's D |
| SHAPEIT | v2.r790 | Phasing PLINK-formatted data to produce a phased haplotype file |
| Beagle | 4 (r1274) | Phasing unphased VCF data to produce a phased haplotype file |
| PLINK | 1.07 | Removing SNPs with too many missing genotypes; filtering on HWE and MAF |
| tabix/bgzip | 0.2.5 | Compressing and indexing VCF files for use with vcftools |
| multicore | 0.1-7 | R package used to parallelise rehh runs |
| impute2 | 2.3.1 | Imputing genotypes from phased haplotype data |
| pyfasta | 0.5.2 | Processing ancestral FASTA files |
| PyVCF | 0.6.0 | Processing VCF files in the Python scripts |
| Variscan | 2.0.3 | Calculating Fay and Wu's H |
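Two of the statistics in the table, Tajima's D (computed by vcftools) and Fay and Wu's H (computed by Variscan), are both summaries of the site frequency spectrum, so a small sketch helps show what they measure. The Python below is a minimal illustration using the standard textbook estimators; it is not the code that vcftools or Variscan run. It assumes a single window, no missing data, and input given as the number of haplotypes (out of n phased haplotypes) carrying the derived allele at each site, with derived status taken from an ancestral-allele annotation like the one the pipeline's Python scripts perform. The function name `sfs_statistics` is hypothetical.

```python
import math


def _harmonic(n, power=1):
    """Sum of 1 / i**power for i = 1 .. n-1."""
    return sum(1.0 / i**power for i in range(1, n))


def sfs_statistics(derived_counts, n):
    """Hypothetical helper: (theta_pi, theta_W, theta_H, Tajima's D, Fay and Wu's H).

    derived_counts: derived-allele count per site among n haplotypes (assumed input).
    """
    seg = [k for k in derived_counts if 0 < k < n]  # keep segregating sites only
    s = len(seg)
    pairs = n * (n - 1) / 2

    # Pairwise diversity and the derived-frequency-weighted theta_H.
    theta_pi = sum(k * (n - k) for k in seg) / pairs
    theta_h = sum(k * k for k in seg) / pairs

    # Watterson's estimator and Tajima's variance constants.
    a1, a2 = _harmonic(n, 1), _harmonic(n, 2)
    theta_w = s / a1
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    var = e1 * s + e2 * s * (s - 1)

    # D is undefined when there are no segregating sites.
    tajima_d = (theta_pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    fay_wu_h = theta_pi - theta_h  # unnormalised H
    return theta_pi, theta_w, theta_h, tajima_d, fay_wu_h
```

Because theta_pi is symmetric in k and n - k, Tajima's D does not need the ancestral allele, whereas H does: an excess of high-frequency derived variants, as expected near a recent sweep, drives H strongly negative. An excess of rare variants drives D negative.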
Figure 1. Plots of Rsb (top row) and iHS (middle and bottom rows) values across chromosome 2 (whole chromosome in the left column; the region around the LCT gene in the right column), based on 1000 Genomes Project data for the CEU and YRI populations. Blue vertical lines/boxes mark the location of the LCT gene, and the red horizontal lines mark the significance threshold: Rsb values above the line have a p-value below 0.05. The marked deviation of iHS from zero in the CEU population provides evidence that the region around the LCT gene was under selective pressure in the past. There is no such evidence in the YRI population. The Rsb statistic, which compares the evidence for selection between the two populations, shows the same contrast, indicating stronger evidence that this region was under selective pressure in the CEU cohort.
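The iHS and Rsb values in Figure 1 are produced by rehh. To make the quantities concrete, here is a minimal Python sketch of the underlying ideas, not rehh's implementation. Assumptions: haplotypes are lists of 0/1 alleles (0 = ancestral, 1 = derived) at markers with known physical positions; each core allele is carried by at least two haplotypes; the EHH cutoff of 0.05 is an illustrative choice; iES values for Rsb are assumed to be computed upstream; and the two-sided normal p-value is one plausible way to obtain a 5% threshold like the one drawn in the figure. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, median, stdev


def ehh(haplotypes, core, end):
    """EHH: probability that two haplotypes are identical over markers core..end."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    groups = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    return sum(c * (c - 1) for c in groups.values()) / (n * (n - 1))


def ihh(haplotypes, positions, core, cutoff=0.05):
    """Integrate EHH (trapezoid rule) outward on both sides until it drops below cutoff."""
    total = 0.0
    for step in (-1, 1):
        prev, i = ehh(haplotypes, core, core), core
        while 0 <= i + step < len(positions):
            cur = ehh(haplotypes, core, i + step)
            if cur < cutoff:
                break
            total += 0.5 * (prev + cur) * abs(positions[i + step] - positions[i])
            prev, i = cur, i + step
    return total


def unstandardised_ihs(haplotypes, positions, core):
    """ln(iHH_ancestral / iHH_derived); negative when derived-allele haplotypes are long."""
    anc = [h for h in haplotypes if h[core] == 0]
    der = [h for h in haplotypes if h[core] == 1]
    return math.log(ihh(anc, positions, core) / ihh(der, positions, core))


def standardise(values, centre=mean):
    """Z-scores around `centre`; for iHS, apply within derived-allele-frequency bins."""
    c, sd = centre(values), stdev(values)
    return [(v - c) / sd for v in values]


def rsb(ies_pop1, ies_pop2):
    """Median-centred, standardised ln(iES_pop1 / iES_pop2) across SNPs."""
    return standardise([math.log(a / b) for a, b in zip(ies_pop1, ies_pop2)], median)


def two_sided_p(z):
    """Two-sided p-value for a standardised score under a standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

Standardising iHS within frequency bins matters because, even without selection, high-frequency derived alleles tend to sit on longer haplotypes than low-frequency ones; comparing each SNP only with SNPs of similar frequency removes that trend.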