Andrea Benazzo, Alex Panziera, Giorgio Bertorelle.
Abstract
Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable to analyze multiple data sets produced in simulation studies. Unix, Windows, and MacOs versions are provided, as well as the source code for easier pipeline implementations.
Keywords: Allelic spectrum; Fst; NGS; genetic indicators; genetic variation; software
Year: 2014 PMID: 25628874 PMCID: PMC4298444 DOI: 10.1002/ece3.1261
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Figure 1. 4P execution times. (A) The time required by 4P to compute five different pairwise measures of genetic differentiation (see the main text for details) is reported as a function of the number of cores; different lines correspond to datasets with different numbers of SNPs. (B) The time required by 4P and PLINK to compute expected and observed heterozygosities is reported as a function of the data set size; PLINK is not implemented for multiple cores.
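
For readers unfamiliar with the quantities timed in panel B, the following is a minimal Python sketch of per-SNP observed and expected heterozygosity, with the SNPs spread over a small worker pool. It is not 4P's implementation, which is not reproduced in this record. The function names, the 0/1/2 genotype encoding, and the toy panel are assumptions made for illustration, and expected heterozygosity is taken here as 2p(1-p) with no sample-size correction.

```python
from concurrent.futures import ThreadPoolExecutor


def snp_heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one biallelic SNP.

    genotypes: alternate-allele counts per diploid individual (0, 1 or 2);
    None marks a missing call. This encoding is an illustrative assumption.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return (float("nan"), float("nan"))
    # Observed heterozygosity: fraction of called individuals that are heterozygous.
    observed = sum(1 for g in called if g == 1) / n
    # Alternate-allele frequency over the 2n sampled chromosomes.
    p = sum(called) / (2 * n)
    # Expected heterozygosity under Hardy-Weinberg proportions (uncorrected).
    expected = 2 * p * (1 - p)
    return (observed, expected)


def panel_heterozygosity(panel, workers=2):
    """Apply snp_heterozygosity to every SNP, keeping the input order.

    A thread pool keeps the sketch short; in CPython a process pool would be
    needed for a real speed-up on CPU-bound work like this.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(snp_heterozygosity, panel))


# Toy panel: three SNPs (rows) typed in four individuals (columns).
panel = [
    [0, 1, 2, 1],
    [0, 0, 0, 0],
    [1, 1, None, 1],
]
results = panel_heterozygosity(panel)
for snp_index, (h_obs, h_exp) in enumerate(results):
    print(snp_index, h_obs, h_exp)
```

Because each SNP is processed independently, the work divides cleanly across workers, which is the property that makes this kind of statistic a natural fit for multicore execution.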