
PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq).

Robert Kofler, Ram Vinay Pandey, Christian Schlötterer.

Abstract

SUMMARY: Sequencing pooled DNA samples (Pool-Seq) is the most cost-effective approach for the genome-wide comparison of population samples. Here, we introduce PoPoolation2, the first software tool specifically designed for the comparison of populations with Pool-Seq data. PoPoolation2 implements a range of commonly used measures of differentiation (FST, Fisher's exact test and Cochran-Mantel-Haenszel test) that can be applied on different scales (windows, genes, exons, SNPs). The results may be visualized with the widely used Integrative Genomics Viewer.
AVAILABILITY AND IMPLEMENTATION: PoPoolation2 is implemented in Perl and R. It is freely available at http://code.google.com/p/popoolation2/
CONTACT: christian.schloetterer@vetmeduni.ac.at
SUPPLEMENTARY INFORMATION: Manual: http://code.google.com/p/popoolation2/wiki/Manual; Test data and tutorial: http://code.google.com/p/popoolation2/wiki/Tutorial; Validation: http://code.google.com/p/popoolation2/wiki/Validation

Year: 2011 PMID: 22025480 PMCID: PMC3232374 DOI: 10.1093/bioinformatics/btr589

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Next-generation sequencing of pooled DNA samples (Pool-Seq) allows the comparison of population samples on a genomic scale, thus facilitating the transition from single marker studies to population genomics. Due to its cost-effectiveness (Futschik and Schlötterer, 2010), Pool-Seq can be used for a range of applications. The most intuitive application is the comparison of natural populations to perform standard population genetic analyses on a genomic scale (e.g. Begun et al., 2007). For example, the comparison of natural Arabidopsis lyrata populations from different habitats allowed the characterization of genes involved in heavy metal tolerance (Turner et al., 2010). In experimental evolution studies, Pool-Seq has also been used to identify genomic regions that show high differentiation between different selective treatments (Burke et al.; Parts et al., 2011; Turner et al., 2011). Finally, Pool-Seq offers an enormous potential for selective genotyping (Darvasi and Soller, 1994; Hillel et al.; Lander and Botstein, 1989). While several tools for analyzing Pool-Seq data of single populations are already available (Bansal, 2010; Kofler et al., 2011; Pandey et al.), to our knowledge no standalone software tool is available for the comparison of Pool-Seq data from multiple populations. PoPoolation2 is a software tool dedicated to the comparison of allele frequencies between populations.

2 IMPLEMENTATION

As input, PoPoolation2 requires a ‘pileup’ file for every population (sample) of interest; alternatively, a single multi ‘pileup’ file (mpileup) may be used. These files can be obtained by mapping the reads of a Pool-Seq experiment to a reference genome and subsequently converting the mapping results into the ‘pileup/mpileup’ format with samtools (Li et al., 2009). (For the manual, see http://code.google.com/p/popoolation2/wiki/Manual; for test data and a tutorial, see http://code.google.com/p/popoolation2/wiki/Tutorial.) PoPoolation2 requires Pool-Seq data from at least two populations, but may be used with an unlimited number of populations.
To assess allele frequency differences between population samples, PoPoolation2 implements a wide variety of statistics. When data from more than two populations are available, PoPoolation2 automatically computes all pairwise comparisons for these tests (except for the CMH test). As the most intuitive measure of population differentiation, the allele frequency differences are reported. The fixation index (FST) can be calculated to measure differentiation between populations. FST values may be calculated either with the classical approach (Hartl and Clark, 2007) or with an approach adapted to digital data (Karlsson et al.). The statistical significance of allele frequency differences is determined with Fisher's exact test (Fisher, 1922). Since biological replicates are often available in experimental evolution and selective genotyping studies, we implemented the Cochran–Mantel–Haenszel (CMH) test (Landis et al.) to test for statistically significant allele frequency differences between groups across replicates.
All these analyses can be performed on different levels. We have implemented a sliding window analysis, which permits a genome-wide scan for differentiation using a specified window size. For the analysis of single SNPs, a window size of 1 may be used. Finally, with a user-provided GTF file, the analysis of genes, coding sequences, introns, etc. is possible.
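
The statistics named above can be illustrated with a minimal Python sketch for a single biallelic SNP. This is not PoPoolation2's code (which is written in Perl and R): the function names, the representation of each pool as a pair of allele counts, the heterozygosity-based FST without any pool-size or coverage correction, and the continuity correction in the CMH statistic are all assumptions made for illustration.

```python
from math import comb, erfc, sqrt

# Assumption: each pool is a pair (count_allele_1, count_allele_2) of read
# counts at one biallelic SNP. All names here are hypothetical.

def allele_freq(pool):
    """Frequency of allele 1 in a pool."""
    a, b = pool
    return a / (a + b)

def freq_difference(pool1, pool2):
    """Absolute allele frequency difference between two pools."""
    return abs(allele_freq(pool1) - allele_freq(pool2))

def fst(pool1, pool2):
    """Simplified FST = (H_T - H_S) / H_T with equally weighted pools."""
    p1, p2 = allele_freq(pool1), allele_freq(pool2)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pool heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # total heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

def fisher_exact(pool1, pool2):
    """Two-sided Fisher's exact test P-value for the 2x2 table of allele counts."""
    a, b = pool1
    c, d = pool2
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table with the observed margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    cutoff = probs[a] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))

def cmh_test(replicates):
    """CMH test P-value; replicates is a list of (pool1, pool2) pairs."""
    delta = 0.0  # sum over replicates of observed minus expected count
    var = 0.0    # sum over replicates of the hypergeometric variance
    for (a, b), (c, d) in replicates:
        n = a + b + c + d
        delta += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0:
        return 1.0  # no variation in any replicate
    yates = 0.5 if abs(delta) >= 0.5 else 0.0  # continuity correction (assumption)
    chi2 = (abs(delta) - yates) ** 2 / var
    return erfc(sqrt(chi2 / 2))  # chi-square survival function, 1 degree of freedom
```

For example, `fst((100, 0), (0, 100))` describes two pools fixed for different alleles, and `cmh_test([((90, 10), (10, 90))] * 3)` describes three replicates that all show the same strong frequency shift.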
To visualize the population differentiation across the genome, PoPoolation2 converts the results into file formats that are compatible with the Integrative Genomics Viewer (Robinson et al., 2011). PoPoolation2 also implements the functionality to randomly subsample the data to achieve a uniform coverage. The subsampling is based on a user-defined quality threshold. Finally, for analyzing the data with standard software, such as Mega5 (Tamura et al., 2011) and Arlequin (Excoffier and Lischer, 2010), PoPoolation2 can export the data as artificial chromosomes in ‘multi-fasta’ files and as ‘GenePop’ files (Raymond and Rousset, 1995).
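
The subsampling step can be sketched as follows. This is a minimal illustration, not the PoPoolation2 implementation: it assumes that the quality threshold is a minimum base quality, that bases are drawn without replacement, and that positions with too few usable bases are skipped. The function and parameter names are hypothetical.

```python
import random

def subsample_position(bases, quals, target_coverage, min_qual=20, rng=None):
    """Randomly reduce one position to a uniform coverage.

    bases: list of base calls at the position (e.g. ['A', 'T', ...])
    quals: list of base qualities, same length as bases
    Returns a list of target_coverage bases, or None if too few bases
    pass the quality threshold.
    """
    rng = rng or random.Random()
    # Discard bases below the quality threshold before sampling.
    kept = [b for b, q in zip(bases, quals) if q >= min_qual]
    if len(kept) < target_coverage:
        return None
    # Sample without replacement so that no read base is counted twice.
    return rng.sample(kept, target_coverage)
```

Passing a seeded generator, for example `rng=random.Random(1)`, makes the subsampling reproducible.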

3 VALIDATION

To test PoPoolation2, we placed 10 000 SNPs for two populations on chromosome 2R of Drosophila melanogaster (v5.38). For these SNPs, we simulated 75 bp reads such that the coverage was 100× and the allele frequency differences between the two populations ranged from 0.1 to 0.9. Subsequently, the simulated reads were mapped to the reference genome (D. melanogaster, chromosome 2R, v5.38) with BWA (0.5.8) (Li and Durbin, 2009) and an ‘mpileup’ file was created using samtools (0.1.13) (Li et al., 2009). Finally, we compared the expected values with the observed ones and found an almost perfect correlation between the simulated data and the estimates based on PoPoolation2 for all implemented tests (allele frequency differences: R² = 0.9979, P < 2.2e-16; FST: R² = 0.9967, P < 2.2e-16; Fisher's exact test: R² = 0.9974, P < 2.2e-16; CMH test: R² = 0.9978, P < 2.2e-16; Fig. 1). These high correlations confirm that PoPoolation2 yields highly reliable results (for details, see http://code.google.com/p/popoolation2/wiki/Validation).
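
The comparison of expected and observed values can be summarized with a coefficient of determination. The sketch below assumes that R² is the squared Pearson correlation between the two series; the text does not state how the reported values were computed, and the function name is hypothetical.

```python
def r_squared(expected, observed):
    """Squared Pearson correlation between expected and observed values."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    return sxy * sxy / (sxx * syy)
```

Here `expected` would hold the simulated values (for example the allele frequency differences) and `observed` the corresponding estimates for the same SNPs.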

Fig. 1.

Expected versus observed values for the tests implemented in PoPoolation2 using 10 000 simulated SNPs. (A) allele frequency difference; (B) FST; (C) Fisher's exact test [−log10(P-value)]; (D) CMH test [−log10(P-value)].

To ensure that all scripts continue to work properly, we implemented unit tests for the main scripts (which may be run by providing the parameter ‘--test’).

REFERENCES (17 in total)

1. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods.

Authors: Koichiro Tamura; Daniel Peterson; Nicholas Peterson; Glen Stecher; Masatoshi Nei; Sudhir Kumar
Journal: Mol Biol Evol Date: 2011-05-04 Impact factor: 16.240

2. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.

Authors: Laurent Excoffier; Heidi E L Lischer
Journal: Mol Ecol Resour Date: 2010-03-01 Impact factor: 7.090

3. Selective DNA pooling for determination of linkage between a molecular marker and a quantitative trait locus.

Authors: A Darvasi; M Soller
Journal: Genetics Date: 1994-12 Impact factor: 4.562

4. Revealing the genetic structure of a trait by sequencing a population under selection.

Authors: Leopold Parts; Francisco A Cubillos; Jonas Warringer; Kanika Jain; Francisco Salinas; Suzannah J Bumpstead; Mikael Molin; Amin Zia; Jared T Simpson; Michael A Quail; Alan Moses; Edward J Louis; Richard Durbin; Gianni Liti
Journal: Genome Res Date: 2011-03-21 Impact factor: 9.043

5. Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans.

Authors: David J Begun; Alisha K Holloway; Kristian Stevens; Ladeana W Hillier; Yu-Ping Poh; Matthew W Hahn; Phillip M Nista; Corbin D Jones; Andrew D Kern; Colin N Dewey; Lior Pachter; Eugene Myers; Charles H Langley
Journal: PLoS Biol Date: 2007-11-06 Impact factor: 8.029

6. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

7. Population resequencing reveals local adaptation of Arabidopsis lyrata to serpentine soils.

Authors: Thomas L Turner; Elizabeth C Bourne; Eric J Von Wettberg; Tina T Hu; Sergey V Nuzhdin
Journal: Nat Genet Date: 2010-01-24 Impact factor: 38.330

8. Integrative genomics viewer.

Authors: James T Robinson; Helga Thorvaldsdóttir; Wendy Winckler; Mitchell Guttman; Eric S Lander; Gad Getz; Jill P Mesirov
Journal: Nat Biotechnol Date: 2011-01 Impact factor: 54.908

9. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster.

Authors: Thomas L Turner; Andrew D Stewart; Andrew T Fields; William R Rice; Aaron M Tarone
Journal: PLoS Genet Date: 2011-03-17 Impact factor: 5.917

10. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals.

Authors: Robert Kofler; Pablo Orozco-terWengel; Nicola De Maio; Ram Vinay Pandey; Viola Nolte; Andreas Futschik; Carolin Kosiol; Christian Schlötterer
Journal: PLoS One Date: 2011-01-06 Impact factor: 3.752
