Sònia Casillas, Antonio Barbadilla.
Abstract
Molecular population genetics aims to explain genetic variation and molecular evolution from population genetics principles. The field was born 50 years ago with the first measures of genetic variation in allozyme loci, continued with the nucleotide sequencing era, and is currently in the era of population genomics. During this period, molecular population genetics has been revolutionized by progress in data acquisition and theoretical developments. The conceptual elegance of the neutral theory of molecular evolution or the footprint carved by natural selection on the patterns of genetic variation are two examples of the vast number of inspiring findings of population genetics research. Since the inception of the field, Drosophila has been the prominent model species: molecular variation in populations was first described in Drosophila and most of the population genetics hypotheses were tested in Drosophila species. In this review, we describe the main concepts, methods, and landmarks of molecular population genetics, using the Drosophila model as a reference. We describe the different genetic data sets made available by advances in molecular technologies, and the theoretical developments fostered by these data. Finally, we review the results and new insights provided by the population genomics approach, and conclude by enumerating challenges and new lines of inquiry posed by increasingly large population-scale sequence data.
Keywords: Drosophila; FlyBook; Hill–Robertson interference; distribution of fitness effects; genetic draft; linked selection; molecular population genetics; neutral theory; population genomics; population multi-omics
Year: 2017 PMID: 28270526 PMCID: PMC5340319 DOI: 10.1534/genetics.116.196493
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1. Population genomics resources available for four Drosophila species. ● represents sequenced populations, and the size of the ● is proportional to the number of individuals sequenced. See an interactive and updateable version of this figure with additional information about each population at http://flybook-mpg.uab.cat. D. melanogaster populations: USTB, Tampa Bay, FL, n = 2; UST, Thomasville, GA, n = 2; USS, Selva, AL, n = 2; USB, Birmingham, AL, n = 2; USM, Meridian, MS, n = 2; USFL, Sebastian, FL, n = 2; BF, Freeport, Bahamas, n = 2; BGT, George Town, Bahamas, n = 2; BBH, Bullocks Harbor, Bahamas, n = 2; SS, Cockburn Town, San Salvador, n = 2; BM, Mayaguana, Bahamas, n = 2; B, Beijing, China, n = 15; CK, Kisangani, Congo, n = 2; CO, Oku, Cameroon, n = 13; EA, Gambella, Ethiopia, n = 24; EB, Bonga, Ethiopia, n = 5; ED, Dodola, Ethiopia, n = 8; EF, Fiche, Ethiopia, n = 69; EG, Cairo, Egypt, n = 32; EM, Masha, Ethiopia, n = 3; ER, Debre Birhan, Ethiopia, n = 5; EZ, Ziway, Ethiopia, n = 5; FRL, Lyon, France, n = 96; FRM, Montpellier, France, n = 20; GA, Franceville, Gabon, n = 10; AGA, Athens, GA, n = 15; GH, Accra, Ghana, n = 15; GU, Dondé, Guinea, n = 7; H, Port Au Prince, Haiti, n = 2; I, Ithaca, NY, n = 19; KM, Malindi, Kenya, n = 4; KN, Nyahururu, Kenya, n = 6; KO, Molo, Kenya, n = 4; KR, Marigat, Kenya, n = 6; KT, Thika, Kenya, n = 2; N, Houten, Netherlands, n = 19; NG, Maiduguri, Nigeria, n = 6; RAL, n = 205; RC, Cyangugu, Rwanda, n = 2; RG, Gikongoro, Rwanda, n = 27; SB, Barkly East, South Africa, n = 5; SD, Dullstroom, South Africa, n = 81; SE, Port Edward, South Africa, n = 3; SF, Fouriesburg, South Africa, n = 5; SP, Phalaborwa, South Africa, n = 37; T, Sorell, Tasmania, Australia, n = 18; TZ, Uyole, Tanzania, n = 3; UG, Namulonge, Uganda, n = 6; UK, Kisoro, Uganda, n = 5; UM, Masindi, Uganda, n = 3; W, Winters, CA, n = 35; ZH, Harare, Zimbabwe, n = 4; ZI, Siavonga, Zambia, n = 197; ZK, Lake Kariba, Zimbabwe, n = 3; ZL, Livingstone, Zambia, n = 1; ZO, Solwezi, Zambia, n = 2; ZS, Sengwa, Zimbabwe, n = 5; ZW, Victoria Falls, Zimbabwe, n = 9; MAD, Tampa Bay, FL, n = 2; NAIS, Thomasville, GA, n = 2; WIN, Selva, AL, n = 2; NOU, Birmingham, AL, n = 2; NAN, Meridian, MS, n = 2. D. simulans populations: MAD, Madagascar, n = 12; NAIS, Nairobi, Kenya, n = 10; WIN, Winters, CA, n = 2; NOU, Noumea, New Caledonia, n = 1; NAN, Nanyuki, Kenya, n = 1. D. yakuba populations: NAIY, Nairobi, Kenya, n = 10; NGU, Nguti, Cameroon, n = 10; TAI, Taï Rainforest, Liberia, n = 1. D. mauritiana populations: MAU, Mauritius, n = 117.
Figure 2. Distribution of fitness effects (DFE) according to the (nearly) neutral theory of molecular evolution. (A) In the 1960s, according to Kimura’s neutral theory. (B) In the 1970s, after the extension of the neutral theory by Ohta. Different selection coefficients of mutations are colored in a gradient from maroon (strongly deleterious), red (slightly deleterious), gray (neutral), light green (slightly advantageous), and dark green (advantageous).
Figure 3. Molecular evolutionary rate (K) as a function of (A) the DFE, (B) the probability of fixation of new mutations entering the population, and (C) the rate at which new mutations enter the population per site per generation (see text for details). Different selection coefficients of mutations are colored in a gradient from maroon (strongly deleterious), red (slightly deleterious), gray (neutral), light green (slightly advantageous), and dark green (advantageous).
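The three ingredients named in the Figure 3 caption can be combined in a short numerical sketch. It is not taken from the review: it assumes Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population whose effective size equals its census size N, and it represents the DFE as a discrete list of (selection coefficient, fraction) pairs. The function names and the example DFE values are hypothetical.

```python
import math

def fixation_probability(s, N):
    """Fixation probability of a single new mutation (initial frequency 1/(2N))
    with selection coefficient s, using Kimura's diffusion approximation.
    Assumption: diploid population with effective size equal to N."""
    x = 4.0 * N * s
    if abs(x) < 1e-9:
        # Effectively neutral: the diffusion formula tends to 1/(2N).
        return 1.0 / (2.0 * N)
    if x < -700.0:
        # Strongly deleterious: the probability underflows to zero
        # (this guard also avoids overflow in expm1).
        return 0.0
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for accuracy.
    return math.expm1(-2.0 * s) / math.expm1(-x)

def evolutionary_rate(dfe, mu, N):
    """Substitution rate K per site per generation.
    dfe: iterable of (s, fraction) pairs whose fractions sum to 1.
    mu:  mutation rate per site per generation.
    K = (2N * mu new mutations per generation) * (mean fixation probability)."""
    mean_fixation = sum(frac * fixation_probability(s, N) for s, frac in dfe)
    return 2.0 * N * mu * mean_fixation

# Hypothetical DFE: mostly deleterious, some neutral, few advantageous mutations.
example_dfe = [(-0.05, 0.70), (-0.0005, 0.20), (0.0, 0.09), (0.001, 0.01)]
K = evolutionary_rate(example_dfe, mu=1e-9, N=1000)
```

Under strict neutrality the population size cancels out and K reduces to the mutation rate, which is a convenient sanity check for any implementation of this kind.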
The arsenal of parameters for population genetics/genomics analyses: measures of nucleotide diversity, LD, and tests of selection
| Measure/test | Description | References |
|---|---|---|
| Nucleotide diversity measures (uni-dimensional measures) | | |
| S, s | Number of segregating sites (per DNA sequence or per site, respectively) | |
| H, η | Minimum number of mutations (per DNA sequence or per site, respectively) | |
| k | Average number of nucleotide differences (per DNA sequence) between any two sequences | |
| π | Nucleotide diversity: average number of nucleotide differences per site between any two sequences | |
| θ, θ_W | Nucleotide polymorphism: proportion of nucleotide sites that are expected to be polymorphic in any suitable sample | |
| SFS | Site/allele frequency spectrum: distribution of allele frequencies at a given set of loci in a population or sample | |
| LD (multi-dimensional association among variable sites) and recombination | | |
| D | Coefficient of LD whose range depends on the allele frequencies | |
| D′ | Normalized D | |
| r, r² | Statistical correlation between pairs of sites | |
| Z_nS | Average of r² over all pairwise comparisons | |
| Four-gamete test | Measure of historical recombination under the infinite-sites model | |
| ρ | Population-scaled recombination rate | |
| Selection tests based on the allele frequency spectrum and/or levels of variability | | |
| Tajima’s D | Compares the number of nucleotide polymorphisms with the mean pairwise difference between sequences | |
| Fu and Li’s D | Compares the number of derived nucleotide variants observed only once in a sample with the total number of derived nucleotide variants | |
| Fu and Li’s F | Compares the number of derived nucleotide variants observed only once in a sample with the mean pairwise difference between sequences | |
| Fay and Wu’s H | Compares the number of derived nucleotide variants at low and high frequencies with the number of variants at intermediate frequencies | |
| Zeng’s E | Difference between θ_L and θ_W | |
| Achaz’s Y | Unified framework for θ estimators on the basis of the allele frequency spectrum | |
| Fu’s F_s | Test based on the allele frequency spectrum | |
| Ramos-Onsins’ and Rozas’ R_2 | Tests based on the difference between the number of singleton mutations and the average number of nucleotide differences | |
| CL, CLR | Genome scan for candidate regions of selective sweeps based on aberrant allele frequency spectrum | |
| Selection tests based on comparisons of polymorphism and/or divergence between different classes of mutation | | |
| d_N/d_S, K_a/K_s | Ratio of nonsynonymous to synonymous nucleotide divergence/polymorphism (ω) | |
| HKA | Degree of polymorphism within and between species at two or more loci | |
| MK | Ratios of synonymous and nonsynonymous nucleotide divergence and polymorphism | |
| Estimators derived from extensions of the MK test or the DFE | | |
| NI | Neutrality index that summarizes the four values in an MK test table as a ratio of ratios | |
| DoS | Direction of selection: difference between the proportion of nonsynonymous divergence and nonsynonymous polymorphism | |
| α | Proportion of substitutions that are adaptive | |
| DFE-α | Fraction of adaptive nonsynonymous substitutions, robust to low recombination | |
| ω_a | Rate of adaptive evolution relative to the mutation rate | |
| | Rate of adaptive amino acid substitution | |
| | Fractions of five different selection regimes derived from an extension of the MK test | |
| | Proportion of adaptive substitutions lost due to HRi | |
| r_opt | Optimal baseline recombination, above which the genome is free of the HRi | |
| Selection tests based on LD | | |
| Hudson’s haplotype test | Detection of derived and ancestral alleles on unusually long haplotypes | |
| B/Q | Based on LD between adjacent pairs of segregating sites, under the coalescent model with recombination | |
| iHS | Integrated haplotype score, based on the frequency of alleles in regions of high LD | |
| LRH | Long-range haplotype test, based on the frequency of alleles in regions of long-range LD | |
| HS | Haplosimilarity score: long-range haplotype similarity | |
| EHH | Extended haplotype homozygosity: measurement of the decay of LD between loci with distance | |
| LDD | LD decay: expected decay of adjacent SNP LD at recently selected alleles | |
| SGS | Shared genomic segment analysis: detection of shared regions across individuals within populations | |
| GIBDLD | Detection of genomic loci with excess of identity-by-descent sharing in unrelated individuals as signature of recent selection | |
| XP-EHH | Long-range haplotype method to detect recent selective sweeps | |
| H12, H2/H1 | Haplotype homozygosity | |
| Population differentiation and associated selection tests | | |
| F_ST, G_ST | Analysis of gene diversity (heterozygosity) within and between subpopulations | |
| Nm | Average levels of gene flow based on allele frequencies, under the infinite-sites model | |
| Bayesian F_ST | Probability that a locus is subject to selection based on locus-specific population differentiation, using a Bayesian method | |
| | Different test statistics based on haplotype frequencies and/or the number of nucleotide differences between sequences | |
| | Genetic differentiation of subpopulations based on haplotypic data | |
| Φ_ST | Correlation of haplotypic diversity at different levels of hierarchical subdivision | |
| Strobeck’s S | Measure of population structure based on the comparison of the observed number of alleles in a sample to that expected when θ is estimated from the average number of nucleotide differences | |
| XP-CLR | Cross-population composite likelihood ratio test, based on allele frequency differentiation across populations | |
| TLK, TF-LK | Original Lewontin–Krakauer test (TLK) and an extension (TF-LK), aimed at detecting selection based on the variance of F_ST | |
| LSBL | Locus-specific branch length, based on pairwise F_ST | |
| hapFLK | Detection of selection based on differences in haplotype frequencies among populations with a hierarchical structure | |
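Several entries in the table above reduce to a few lines of arithmetic. The sketch below is illustrative rather than the implementation used by any of the packages in the review: it computes the number of segregating sites, the average number of pairwise differences, Watterson's estimator of θ, and Tajima's test statistic from a small alignment, and then the neutrality index, the proportion of adaptive substitutions (α), and the direction of selection from the four counts of an MK table. It assumes equal-length aligned sequences with no gaps or missing data, and it counts each variable column once. All function names and the example data are hypothetical.

```python
import math
from itertools import combinations

def segregating_sites(seqs):
    """Number of alignment columns with more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences between any two sequences."""
    n = len(seqs)
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2))
    return total / (n * (n - 1) / 2)

def watterson_theta(S, n):
    """Watterson's estimator per sequence: S divided by the harmonic number a1."""
    return S / sum(1.0 / i for i in range(1, n))

def tajimas_d(seqs):
    """Tajima's statistic: (pairwise differences - Watterson's theta) / its SD."""
    n, S = len(seqs), segregating_sites(seqs)
    if S == 0:
        return float("nan")  # undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (mean_pairwise_differences(seqs) - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

def mk_estimators(Pn, Ps, Dn, Ds):
    """Summaries of an MK table (nonsynonymous/synonymous polymorphism and divergence)."""
    ni = (Pn / Ps) / (Dn / Ds)               # neutrality index (ratio of ratios)
    alpha = 1.0 - ni                         # proportion of adaptive substitutions
    dos = Dn / (Dn + Ds) - Pn / (Pn + Ps)    # direction of selection
    return {"NI": ni, "alpha": alpha, "DoS": dos}

# Hypothetical toy alignment of four sequences.
sample = ["AAAA", "AAAT", "AATT", "ATTT"]
S = segregating_sites(sample)
k = mean_pairwise_differences(sample)
theta_w = watterson_theta(S, len(sample))
D = tajimas_d(sample)

# Illustrative MK counts.
mk = mk_estimators(Pn=2, Ps=42, Dn=7, Ds=17)
```

Real data sets need handling for gaps, missing calls, multiallelic sites, and zero counts in the MK table, which this sketch deliberately leaves out.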
Selection of software available for population genetics/genomics analyses
| Software | Released | Last version | Language | OS | Supported alignment formats | Supported SNP data formats |
|---|---|---|---|---|---|---|
| DnaSP | 1995 | 5.10.1 (2010/03) | Visual Basic | MS Windows | FASTA, MEGA, NBRF/PIR, NEXUS, PHYLIP | HapMap |
| PAML | 1997 | 4.8a (2014/08) | ANSI C | UNIX/Linux, MAC OSX, MS Windows | PHYLIP, NEXUS (limited support) | — |
| LAMARC | 2001 | 2.1.10 (2016/01) | C++ | UNIX/Linux, MAC OSX, MS Windows | PHYLIP, (own) | (own) |
| Arlequin | 2005 | 3.5.2.2 (2015/08) | C++, R | UNIX/Linux, MAC OSX, MS Windows | (own) | (own) |
| VariScan | 2005 | 2.0.3 (2012/07) | C++ | UNIX/Linux, MAC OSX, MS Windows | MAF, MGA, XMFA, PHYLIP | HapMap |
| PLINK | 2007 | 1.9 beta 3.38 (2016/06), 1.07 stable (2009/10) | C/C++ | UNIX/Linux, MAC OSX, MS Windows | — | PED/MAP (own) |
| adegenet and pegas | 2008; 2010 | Adegenet, 2.0.1 (2016/02); Pegas, 0.9 (2016/04) | R | UNIX/Linux, MAC OSX, MS Windows | FASTA, NEXUS, PHYLIP, (own) | VCF, FSTAT, GENETIX, GENEPOP, STRUCTURE, (own) |
| PopGenome | 2014 | 2.1.6 (2015/05) | R | UNIX/Linux, MAC OSX, MS Windows | FASTA, NEXUS, MEGA, MAF, PHYLIP, RData, (own) | VCF, SNP, HapMap, MS, MSMS |
| ANGSD | 2014 | 0.911 (2016/03) | C/C++ | UNIX/Linux | BAM, CRAM, MPILEUP | VCF, GLF, BEAGLE |
DnaSP, http://www.ub.edu/dnasp/ (Rozas and Rozas 1995, 1997, 1999; Rozas et al.; Librado and Rozas 2009; Rozas 2009); PAML, http://abacus.gene.ucl.ac.uk/software/paml.html (Yang 1997, 2007); LAMARC, http://evolution.genetics.washington.edu/lamarc/index.html (Kuhner 2006; Kuhner and Smith 2007); Arlequin, http://cmpg.unibe.ch/software/arlequin35 (Excoffier et al.; Excoffier and Lischer 2010); VariScan, http://www.ub.edu/softevol/variscan (Vilella et al.; Hutter et al.); PLINK, http://pngu.mgh.harvard.edu/~purcell/plink/ (Purcell et al.); adegenet, http://adegenet.r-forge.r-project.org/ (Jombart 2008; Jombart and Ahmed 2011); pegas, http://ape-package.ird.fr/pegas.html (Paradis 2010); PopGenome, http://popgenome.weebly.com/ (Pfeifer et al.); and ANGSD, http://www.popgen.dk/angsd (Korneliussen et al.).
Figure 4. The footprint of deleterious selection on indel variation. Indel size distribution of (A) deletions and (B) insertions in coding regions (bars) and short introns (for comparison, gray line). The size distribution of indels in coding regions has discrete peaks at indel sizes that are multiples of 3 bp. This remarkable pattern is a classroom example of the footprint left by natural selection against frameshifting indels, compared to the more relaxed selection on insertions and deletions that span complete codons or lie in short introns. Data from Massouras et al. and Huang et al.
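The 3-bp periodicity described in the Figure 4 caption can be checked with a very small calculation: the fraction of indels whose length is a multiple of 3 should be much higher in coding regions than in short introns. The sketch below assumes indel lengths are already available as integers in base pairs. The function name and the two example lists are hypothetical and are not the data shown in the figure.

```python
def in_frame_fraction(indel_sizes):
    """Fraction of indels whose length (in bp) is a multiple of 3,
    i.e. indels that preserve the reading frame."""
    sizes = list(indel_sizes)
    if not sizes:
        raise ValueError("no indels supplied")
    return sum(1 for size in sizes if size % 3 == 0) / len(sizes)

# Hypothetical indel lengths (bp), for illustration only.
coding_indels = [3, 3, 6, 1, 3, 9, 12, 2, 6, 3]
intron_indels = [1, 2, 1, 3, 4, 1, 5, 2, 7, 6]

coding_fraction = in_frame_fraction(coding_indels)
intron_fraction = in_frame_fraction(intron_indels)

# A ratio well above 1 is the pattern expected if frameshifting indels
# are removed by selection in coding regions but not in short introns.
enrichment = coding_fraction / intron_fraction
```

Short introns serve here as the reference class, mirroring the comparison drawn in the figure.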
Figure 5. Representation of the cost of linkage on selected sites, or HRi. Arrows indicate adaptive (green) and deleterious (red) mutations, while their length indicates the intensity of selection. (A) When two or more adaptive mutations occur in separate haplotypes without recombination (left), only one of them can be fixed in the population and thus mutations compete for their fixation. However, when recombination is sufficiently high (right), the two haplotypes can exchange alleles and generate a new haplotype that carries both adaptive mutations and can be fixed. (B) In the presence of both adaptive and deleterious mutations without recombination (left), all alleles compete; as a result, deleterious alleles may be dragged to fixation if the intensity of selection favoring a nearby adaptive mutation is high, or adaptive alleles may be lost if the joint strength of negative selection is higher. With recombination (right), deleterious alleles can be removed and adaptive alleles can be fixed without interfering with each other. Adapted from Barrón (2015).
Figure 6. Relationship between recombination and adaptation in the D. melanogaster genome. The adaptation rate of a genomic region increases with the recombination rate up to a threshold value of recombination (∼2 cM/Mb), at which the adaptation rate reaches an asymptote. The shaded area represents the reduction in the adaptation rate due to the cost of genome linkage, whose value has been estimated for the first time at ∼27% in a North American population of D. melanogaster. r_opt is the optimal baseline value of recombination above which any detectable HRi vanishes (see text for details). Adaptation index: K, rate of adaptive nonsynonymous substitution. Negative values mean fixation of deleterious mutations. Data from Castellano et al.