Sònia Casillas¹, Roger Mulet¹, Pablo Villegas-Mirón², Sergi Hervas¹, Esteve Sanz³, Daniel Velasco¹, Jaume Bertranpetit², Hafid Laayouni²,⁴, Antonio Barbadilla¹,³.
Abstract
The 1000 Genomes Project (1000GP) represents the most comprehensive worldwide nucleotide variation data set so far in humans, providing the sequencing and analysis of 2504 genomes from 26 populations and reporting >84 million variants. The availability of these sequence data provides an invaluable resource for human population genomics studies, allowing the testing of molecular population genetics hypotheses and, eventually, an understanding of the evolutionary dynamics of genetic variation in human populations. Here we present PopHuman, a new population genomics-oriented genome browser based on JBrowse that allows the interactive visualization and retrieval of an extensive inventory of population genetics metrics. Efficient and reliable parameter estimates have been computed using a novel pipeline that addresses the unique features and limitations of the 1000GP data. They include a battery of nucleotide variation measures, divergence and linkage disequilibrium parameters, and different tests of neutrality, estimated in non-overlapping windows along the chromosomes and in annotated genes for all 26 populations of the 1000GP. PopHuman is open and freely available at http://pophuman.uab.cat.
Year: 2018 PMID: 29059408 PMCID: PMC5753332 DOI: 10.1093/nar/gkx943
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. PopHuman pipeline. Cited references in the figure: 1000GP Phase III (15); Inbred individuals in the 1000GP (20); VISTA Genome Browser (23); Human genetic maps (24); PopGenome software (25); UCSC Genome Browser (35); JBrowse software (34).
Summary of the amount of data analyzed in PopHuman
| Chromosome | Chromosome size (millions of bases)ᵃ | Windows-based analysis: number of windowsᵇ | Windows-based analysis: number of bases (millions) | Windows-based analysis: percentage of analyzed bases | Genes-based analysis: number of RefSeqᶜ genes analyzed |
|---|---|---|---|---|---|
| 1 | 249.25 | 14 741 | 147.41 | 59.14 | 2328 |
| 2 | 243.20 | 16 270 | 162.70 | 66.90 | 1464 |
| 3 | 198.02 | 13 575 | 135.75 | 68.55 | 1274 |
| 4 | 191.15 | 12 512 | 125.12 | 65.45 | 879 |
| 5 | 180.92 | 12 073 | 120.73 | 66.73 | 1022 |
| 6 | 171.12 | 11 433 | 114.33 | 66.81 | 1206 |
| 7 | 159.14 | 9919 | 99.19 | 62.33 | 1108 |
| 8 | 146.36 | 9783 | 97.83 | 66.84 | 818 |
| 9 | 141.21 | 7358 | 73.58 | 52.11 | 944 |
| 10 | 135.53 | 8760 | 87.60 | 64.63 | 903 |
| 11 | 135.01 | 8877 | 88.77 | 65.75 | 1439 |
| 12 | 133.85 | 8773 | 87.73 | 65.54 | 1175 |
| 13 | 115.17 | 6481 | 64.81 | 56.27 | 449 |
| 14 | 107.35 | 5948 | 59.48 | 55.41 | 779 |
| 15 | 102.53 | 5334 | 53.34 | 52.02 | 791 |
| 16 | 90.35 | 4688 | 46.88 | 51.88 | 938 |
| 17 | 81.20 | 4556 | 45.56 | 56.11 | 1358 |
| 18 | 78.08 | 5164 | 51.64 | 66.14 | 341 |
| 19 | 59.13 | 2681 | 26.81 | 45.34 | 1609 |
| 20 | 63.03 | 4091 | 40.91 | 64.91 | 647 |
| 21 | 48.13 | 2211 | 22.11 | 45.94 | 296 |
| 22 | 51.30 | 2009 | 20.09 | 39.16 | 535 |
| X | 155.27 | 9312 | 93.12 | 59.97 | 918 |
| Y | 59.37 | 622 | 6.22 | 10.48 | 53 |
| Total | 3095.68 | 187 171 | 1871.71 | 60.46 | 23 274 |

ᵃ Chromosome sizes are according to version GRCh37/hg19 of the human genome.
ᵇ Non-overlapping windows of 10 kb have been defined such that they do not include non-accessible bases according to the Pilot-style Accessibility Mask of the 1000GP (15).
ᶜ RefSeq genes provided by the NCBI Entrez Gene database (33).
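The windows-based columns of the table are related by simple arithmetic: each window covers 10 kb, so the number of analyzed bases is the window count times the window size, and the percentage is that figure divided by the chromosome size. The following is a minimal sketch of that check, not part of the PopHuman pipeline; the function names are hypothetical and the values are copied from the table.

```python
# Sketch: reproduce the "number of bases" and "percentage of analyzed bases"
# columns from the window counts. Function names are illustrative only.

WINDOW_SIZE_BP = 10_000  # 10 kb non-overlapping windows (table footnote b)


def analyzed_megabases(n_windows: int, window_size_bp: int = WINDOW_SIZE_BP) -> float:
    """Millions of bases covered by n_windows windows."""
    return n_windows * window_size_bp / 1_000_000


def percent_analyzed(n_windows: int, chrom_size_mb: float,
                     window_size_bp: int = WINDOW_SIZE_BP) -> float:
    """Percentage of the chromosome covered by the analyzed windows."""
    return 100 * analyzed_megabases(n_windows, window_size_bp) / chrom_size_mb


# (chromosome size in Mb, number of windows), taken from the table
rows = {"1": (249.25, 14_741), "Y": (59.37, 622), "Total": (3095.68, 187_171)}

for chrom, (size_mb, n_windows) in rows.items():
    mb = analyzed_megabases(n_windows)
    pct = percent_analyzed(n_windows, size_mb)
    print(f"chr {chrom}: {mb:.2f} Mb analyzed, {pct:.2f}% of {size_mb} Mb")
```

Run on chromosome 1, chromosome Y and the genome total, the rounded results agree with the corresponding table entries.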
List of major windows-based variation statistics and tests of neutrality in PopHuman, computed for each population separately
| Category | Track name | Track description | Reference |
|---|---|---|---|
| | S | Number of segregating sites per site | |
| | Pi | Nucleotide diversity: average number of nucleotide differences per site between any two sequences | |
| | theta | Nucleotide polymorphism: proportion of nucleotide sites that are expected to be polymorphic in any suitable sample | |
| | hap_diversity_within | Haplotype diversity within the population | |
| | | | |
| | Divsites | Number of divergent sites | |
| | K | Nucleotide divergence per base pair, corrected by Jukes-Cantor | |
| | | | |
| | Kelly_ZnS | Average pairwise r² over all segregating sites | |
| | Rozas_ZA | Average of r² between adjacent segregating sites only | |
| | Rozas_ZZ | Rozas_ZA minus Kelly_ZnS | |
| | Wall_B; Wall_Q | Proportion of pairs of adjacent segregating sites that are congruent, with values approaching 1 indicating extensive congruence among adjacent segregating sites | |
| | iHS | Integrated haplotype score, based on the frequency of alleles in regions of high LD (computed for the autosomes) | |
| | XP_EHH | Long-range haplotype method to detect recent selective sweeps (computed for the autosomes, between the major continental populations CEU, CHB and YRI, taken in pairs) | |
| | | | |
| | recomb_Bherer2017_females/males/sexavg | Recombination estimates (cM/Mb) from the refined genetic map by Bhérer et al. | |
| | recomb_deCODE_females/males/sexavg | deCODE genetic map based on 5136 microsatellite markers for 146 families with a total of 1257 meiotic events | |
| | recomb_Marshfield_females/males/sexavg | Marshfield genetic map based on 8325 short tandem repeat polymorphisms (STRPs) for 8 CEPH families consisting of 134 individuals with 186 meioses | |
| | recomb_Genethon_females/males/sexavg | Genethon genetic map based on 5264 microsatellites for 8 CEPH families consisting of 134 individuals with 186 meioses | |
| | | | |
| | FayWu_H | Compares the number of derived nucleotide variants at low and high frequencies with the number of variants at intermediate frequencies | |
| | FuLi_D | Compares the number of derived nucleotide variants observed only once in a sample with the total number of derived nucleotide variants | |
| | FuLi_F | Compares the number of derived nucleotide variants observed only once in a sample with the mean pairwise difference between sequences | |
| | Tajima_D | Difference between the number of segregating sites and the average number of nucleotide differences | |
| | Zeng_E | Difference between θL and θW, sensitive to changes in high-frequency variants | |
| | | | |
| | DoS | Direction of Selection: difference between the proportion of nonsynonymous divergence and nonsynonymous polymorphism | |
| | NI | Neutrality Index: summarizes the four values in a McDonald and Kreitman test table as a ratio of ratios | |
| | alpha; alpha_cor | Proportion of substitutions that are adaptive. The second is calculated after removing slightly deleterious mutations | |
A complete list is available under the section Help → Tracks Description of PopHuman.
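To make several of the track definitions above concrete, the following minimal sketch computes S, Pi, theta (Watterson's estimator), Tajima_D and the Jukes-Cantor corrected divergence K for a single window, using the standard textbook formulas. It is an illustration under stated assumptions, not the PopHuman pipeline: the function names are hypothetical, sites are assumed biallelic and coded "0" (ancestral) or "1" (derived), and every site in the window is assumed accessible.

```python
import math
from typing import Optional, Sequence


def harmonic_sums(n: int) -> tuple:
    """a1 and a2 from Tajima (1989) for a sample of n sequences."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    return a1, a2


def window_statistics(haplotypes: Sequence[str]) -> dict:
    """Per-site S, Pi and theta, plus Tajima's D, for one window.

    Assumption: equal-length strings of '0'/'1', one string per sequence.
    """
    n = len(haplotypes)
    length = len(haplotypes[0])
    derived = [sum(h[j] == "1" for h in haplotypes) for j in range(length)]
    segregating = [c for c in derived if 0 < c < n]
    s = len(segregating)

    # Mean number of pairwise differences, summed over segregating sites
    k = sum(2 * c * (n - c) / (n * (n - 1)) for c in segregating)

    a1, a2 = harmonic_sums(n)
    theta_w = s / a1  # Watterson's estimator for the whole window

    tajima_d: Optional[float] = None
    if s > 0:
        b1 = (n + 1) / (3 * (n - 1))
        b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
        c1 = b1 - 1 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
        e1 = c1 / a1
        e2 = c2 / (a1 ** 2 + a2)
        variance = e1 * s + e2 * s * (s - 1)
        if variance > 0:  # undefined for very small samples
            tajima_d = (k - theta_w) / math.sqrt(variance)

    return {
        "S": s / length,
        "Pi": k / length,
        "theta": theta_w / length,
        "Tajima_D": tajima_d,
    }


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected divergence from the observed proportion p of
    divergent sites (requires p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical toy window: 4 sequences, 5 sites, 3 of them segregating
toy = ["00000", "10010", "11000", "01010"]
stats = window_statistics(toy)
print(stats)
print(jukes_cantor(0.01))
```

In the toy window all three segregating sites are at intermediate frequency, so Pi exceeds theta and Tajima_D is positive, which is the direction the track description implies for an excess of intermediate-frequency variants.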
List of major gene-based variation statistics in PopHuman, computed for each population separately and for different types of sites
| Category | Estimate | Description | Reference | Types of sites analyzed |
|---|---|---|---|---|
| | π | Nucleotide diversity: average number of nucleotide differences per site between any two sequences | | Whole gene region ±500 bp |
| | K | Nucleotide divergence per base pair, corrected by Jukes-Cantor | | Whole gene region ±500 bp |
| | πa/πs | Ratio of nonsynonymous to synonymous nucleotide polymorphism (ω) | | Ratio: 0-fold divided by 4-fold |
| | Ka/Ks | Ratio of nonsynonymous to synonymous nucleotide divergence (ω) | | Ratio: 0-fold divided by 4-fold |
| | DAF | Derived Allele Frequency: distribution of allele frequencies of segregating sites | | Whole gene region ±500 bp |
| | | | | |
| | cM/Mb | Recombination estimates (cM/Mb) from the refined genetic map by Bhérer et al. | | Whole gene region ±500 bp |
| | | | | |
| | P | Number of segregating sites | | Separately: 4-fold; 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | D | Number of divergent sites | | Separately: 4-fold; 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | π | Nucleotide diversity: average number of nucleotide differences per site between any two sequences | | Separately: 4-fold; 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | K | Nucleotide divergence per base pair, corrected by Jukes-Cantor | | Separately: 4-fold; 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | α | Proportion of substitutions that are adaptive. It is calculated both from P and D, and from π and K | | Separately: 4-fold; 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | | | | |
| | | Fraction of new mutations that are strongly deleterious and do not segregate in the population | | Separately: 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | | Fraction of new mutations that are slightly deleterious and segregate at minor allele frequency (MAF) <5% | | Separately: 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | ƒ-γ | Fraction of new mutations that have been neutral since before the split of humans and chimpanzees, calculated after removing the excess of sites at MAF <5% due to slightly deleterious mutations | | Separately: 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | γ | Fraction of new mutations that have become neutral recently, after the split of humans and chimpanzees, calculated after removing the excess of sites at MAF <5% due to slightly deleterious mutations | | Separately: 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | α | Proportion of substitutions that are adaptive, calculated after removing slightly deleterious mutations | | Separately: 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
| | DoS | Direction of Selection: difference between the proportion of nonsynonymous divergence and nonsynonymous polymorphism | | Separately: 0-fold; 5′UTR; 3′UTR; intron; intergenic (±500 bp) |
A comprehensive explanation is available under the section Help → Integrative MKT of PopHuman.
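As an illustration of how the McDonald and Kreitman test (MKT) summaries listed in the tables relate to one another, the sketch below derives NI, the uncorrected α and DoS from the four counts of a standard MKT table, using their usual textbook definitions. It is a minimal example under stated assumptions, not the integrative MKT implemented in PopHuman: the function name and the counts are hypothetical, the selected class stands for, e.g., 0-fold sites and the neutral class for 4-fold sites, and the correction for slightly deleterious mutations (alpha_cor) is not attempted.

```python
def mkt_summary(p_sel: int, p_neu: int, d_sel: int, d_neu: int) -> dict:
    """NI, alpha and DoS from a 2x2 MKT table.

    p_sel, p_neu: polymorphic sites in the selected and neutral classes
    d_sel, d_neu: divergent sites in the selected and neutral classes
    All names are illustrative; the ratios need non-zero denominators.
    """
    if p_neu == 0 or d_sel == 0 or d_neu == 0:
        raise ValueError("NI is undefined when p_neu, d_sel or d_neu is zero")

    # Neutrality Index: ratio of ratios of the four table entries
    ni = (p_sel / p_neu) / (d_sel / d_neu)

    # Proportion of adaptive substitutions (no correction for slightly
    # deleterious polymorphism)
    alpha = 1 - ni

    # Direction of Selection: proportion of selected-class divergence minus
    # proportion of selected-class polymorphism
    dos = d_sel / (d_sel + d_neu) - p_sel / (p_sel + p_neu)

    return {"NI": ni, "alpha": alpha, "DoS": dos}


# Hypothetical counts: relative excess of selected-class divergence
print(mkt_summary(p_sel=2, p_neu=42, d_sel=7, d_neu=17))

# Hypothetical neutral expectation: equal ratios in polymorphism and divergence
print(mkt_summary(p_sel=10, p_neu=20, d_sel=5, d_neu=10))
```

With equal ratios in polymorphism and divergence, NI is 1 and both α and DoS are 0; a relative excess of selected-class divergence gives NI below 1 and positive α and DoS.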