Justin B Lack, Charis M Cardeno, Marc W Crepeau, William Taylor, Russell B Corbett-Detig, Kristian A Stevens, Charles H Langley, John E Pool.
Abstract
Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.
Keywords: Drosophila Genome Nexus; Drosophila melanogaster; genome assembly; population genomics
Year: 2015 PMID: 25631317 PMCID: PMC4391556 DOI: 10.1534/genetics.115.174664
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
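The abstract describes a second round of mapping against an updated reference that incorporates both substitutions and short indels. The reference-update step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes variants arrive as non-overlapping `(position, ref_allele, alt_allele)` tuples with 0-based positions on the original reference and VCF-style anchored indels, and the name `update_reference` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical variant layout: (0-based position on the original reference,
# reference allele, alternate allele). Indels are assumed to be VCF-style,
# i.e. anchored on a shared leading base.
Variant = Tuple[int, str, str]

def update_reference(ref: str, variants: List[Variant]) -> str:
    """Return a new reference with substitutions and short indels applied.

    All positions refer to the ORIGINAL reference, so the variants are applied
    in a single left-to-right pass. Overlapping variants are rejected.
    """
    pieces = []
    cursor = 0  # first original-reference position not yet copied
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        pieces.append(ref[cursor:pos])   # unchanged stretch before the variant
        pieces.append(alt_allele)        # substituted, inserted, or shortened bases
        cursor = pos + len(ref_allele)   # skip the replaced reference bases
    pieces.append(ref[cursor:])          # unchanged tail
    return "".join(pieces)
```

For example, `update_reference("ACGTACGTAC", [(1, "C", "T"), (4, "AC", "A"), (7, "T", "TGG")])` applies one substitution, a one-base deletion and a two-base insertion, giving `"ATGTAGTGGAC"`. Because indels shift coordinates, calls made against an updated reference would have to be translated back to original reference coordinates to be comparable across genomes; that bookkeeping is omitted here.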
Figure 1. Comparison of genomic coverage and error rate for variations of the genome assembly pipeline, based on resequencing of the D. melanogaster reference strain. (A) Heat maps of coverage (left) and error rate (right) at the Q75 threshold, which was chosen to optimize the trade-off between coverage and error rate. (B) The trade-off between genomic coverage and error rate for the haploid caller of the Unified Genotyper, with quality thresholds ranging from 10 to 100. Resequenced genomes from the reference strain (y) were modified to simulate realistic levels of variation. We chose a cutoff of Q75 (red) as the best balance between maximizing coverage and minimizing error.
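The trade-off in panel B can be outlined as follows: for each quality cutoff, coverage is taken here as the fraction of sites that receive a call at or above the cutoff, and error rate as the fraction of those calls that disagree with the known sequence. The Python sketch below assumes per-site calls are already available as hypothetical `(quality, is_correct)` pairs; it does not reproduce the Unified Genotyper itself, and the function names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: one call per site as (quality score, whether the call
# matches the known, simulated sequence).
Call = Tuple[float, bool]

def coverage_and_error(calls: List[Call], total_sites: int, cutoff: float) -> Tuple[float, float]:
    """Return (coverage, error rate) when only calls with quality >= cutoff are kept."""
    kept = [correct for quality, correct in calls if quality >= cutoff]
    coverage = len(kept) / total_sites       # fraction of all sites that are called
    n_errors = len(kept) - sum(kept)         # True counts as 1, False as 0
    error_rate = n_errors / len(kept) if kept else 0.0  # no calls, so no errors
    return coverage, error_rate

def sweep(calls: List[Call], total_sites: int,
          cutoffs: Iterable[int] = range(10, 101, 5)) -> Dict[int, Tuple[float, float]]:
    """Evaluate the trade-off over a range of cutoffs (10 to 100, as in panel B)."""
    return {q: coverage_and_error(calls, total_sites, q) for q in cutoffs}
```

With toy data such as `calls = [(20, False), (30, False), (40, True), (60, True), (80, True), (90, True)]` over 10 sites, a cutoff of 10 keeps all six calls (coverage 0.6, error rate 1/3), while a cutoff of 50 keeps four correct calls... more precisely the three calls at 60, 80 and 90 (coverage 0.3, error rate 0). Raising the cutoff trades coverage for accuracy, which is the curve panel B traces.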
Chromosome-arm nucleotide diversity (π) in the RG population, estimated from sites called in both rounds of our pipeline and from sites called only after adding the second round of mapping
| Chromosome arm | π, both rounds | π, second round only |
|---|---|---|
| X | 0.0083 | 0.0277 |
| 2L | 0.0086 | 0.0259 |
| 2R | 0.0077 | 0.0252 |
| 3L | 0.0079 | 0.0248 |
| 3R | 0.0065 | 0.0260 |
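π is the average number of pairwise differences per site. One common way to compute it from consensus sequences with missing data is site by site: average over all pairs of called alleles at a site, then average over sites with at least two calls. The Python sketch below follows that approach, assuming genomes are equal-length aligned strings with `N` for missing calls; it is an illustration and not necessarily the exact estimator used for the table.

```python
from itertools import combinations
from typing import List

def nucleotide_diversity(seqs: List[str], missing: str = "N") -> float:
    """Average pairwise differences per site across aligned, equal-length sequences.

    Each site is handled separately, so a missing call removes only the pairs
    involving that genome at that site. Sites with fewer than two called
    alleles are skipped.
    """
    total, n_sites = 0.0, 0
    for i in range(len(seqs[0])):
        alleles = [s[i] for s in seqs if s[i] != missing]
        if len(alleles) < 2:
            continue  # no pair to compare at this site
        pairs = list(combinations(alleles, 2))
        total += sum(a != b for a, b in pairs) / len(pairs)  # per-site diversity
        n_sites += 1
    return total / n_sites if n_sites else float("nan")
```

For instance, `nucleotide_diversity(["ACGT", "ACGA", "ATGT", "NCGT"])` returns 0.25: two of the four sites carry one differing allele among four called genomes (3 of 6 pairs differ, 0.5 each), and the other two sites are invariant.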
Figure 2. Relationship between the number of indels and the number of sites called by our two-round pipeline but not by a single-round pipeline, for two RG genomes (RG5 and RG33). Site counts (y-axis) and indel counts (x-axis) were determined in 100-kb windows across each genome.
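Figure 2 summarizes counts in 100-kb windows. A minimal Python sketch of that binning, plus a Pearson correlation between the two sets of window counts, is below. The positions are hypothetical 0-based coordinates, and the correlation is just one simple way to summarize the relationship; the caption does not say which statistic, if any, was computed.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence

WINDOW_SIZE = 100_000  # 100-kb windows, as in the Figure 2 caption

def window_counts(positions: Iterable[int], arm_length: int) -> List[int]:
    """Count events (e.g. indels or added sites) in consecutive, non-overlapping windows.

    Positions are 0-based; the final window may be shorter than WINDOW_SIZE.
    """
    n_windows = -(-arm_length // WINDOW_SIZE)  # ceiling division
    counts = Counter(pos // WINDOW_SIZE for pos in positions)
    return [counts.get(w, 0) for w in range(n_windows)]

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of window counts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

On a hypothetical 300-kb arm, indels at 5,000, 50,000, 150,000, 160,000, 170,000 and 250,000 give window counts `[2, 3, 1]`; pairing these with the corresponding counts of added sites yields one point per window, as plotted in Figure 2.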
Figure 3. Enrichment of each of nine annotation classes among the sites added by our two-round pipeline (but not called by a single-round pipeline), relative to genome-wide frequencies. We examined two RG genomes (RG5 and RG33) with ∼30× mean depth and comparable coverage.
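Enrichment in Figure 3 is expressed relative to genome-wide frequencies. A natural reading is the ratio of a class's share of the added sites to its share of the genome, so values above 1 mean the class is over-represented among added sites. The Python sketch below computes that ratio under this assumption; the class labels in the example are hypothetical and are not the nine classes used in the figure.

```python
from collections import Counter
from typing import Dict, Iterable

def enrichment(added_classes: Iterable[str], genome_classes: Iterable[str]) -> Dict[str, float]:
    """Share of each annotation class among added sites divided by its genome-wide share.

    Values > 1 indicate enrichment, < 1 depletion. Classes absent from the
    genome-wide annotation are ignored (their ratio would be undefined).
    """
    added = Counter(added_classes)
    genome = Counter(genome_classes)
    n_added, n_genome = sum(added.values()), sum(genome.values())
    return {
        cls: (added[cls] / n_added) / (genome[cls] / n_genome)  # Counter gives 0 if unseen
        for cls in genome
    }
```

With a toy genome of 50 intron, 20 CDS and 30 intergenic sites, and 10 added sites split 6/1/3, the ratios are 1.2 (intron), 0.5 (CDS) and 1.0 (intergenic).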
Figure 4. Mean sequencing depth vs. genetic distance from the Zambia (ZI) population (A) and depth vs. coverage (B) for the AGES data set genomes with high coverage on all chromosome arms (listed in Table S1). Circles indicate comparisons using all windows with called sites, while triangles indicate comparisons using only sites called in all of the AGES and ZI genomes. The panels illustrate the effect of depth on genetic distance (A) and coverage (B) for genomes assembled with a single-round pipeline (red) vs. our two-round pipeline (blue). The two-round pipeline appears to alleviate a potential downward bias in genetic distance present in the single-round pipeline at depths below ∼20×, and the stronger effect of depth on coverage for the single-round pipeline (B) suggests that the sites added by the two-round pipeline drive the differences in distance to ZI.
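Panel A plots each genome's genetic distance from the ZI population, but the caption does not define the distance. One simple version, sketched in Python below purely as an assumption, averages over sites the proportion of called ZI alleles that differ from the focal genome. Restricting the calculation to sites called in every genome, as for the triangles, would only add a filter on which sites are used.

```python
from typing import List

def distance_to_population(genome: str, population: List[str], missing: str = "N") -> float:
    """Mean per-site proportion of population alleles that differ from one genome.

    Only sites called in the focal genome and in at least one population member
    contribute. All sequences are assumed to be aligned and of equal length.
    """
    total, n_sites = 0.0, 0
    for i, base in enumerate(genome):
        if base == missing:
            continue  # focal genome has no call here
        alleles = [s[i] for s in population if s[i] != missing]
        if not alleles:
            continue  # no population call to compare against
        total += sum(a != base for a in alleles) / len(alleles)
        n_sites += 1
    return total / n_sites if n_sites else float("nan")
```

For example, comparing `"ACGT"` with the toy population `["ACGA", "TCGA", "NNGT"]` gives (1/2 + 0 + 0 + 2/3) / 4 ≈ 0.29. If low depth left a genome with fewer called sites at its most divergent positions, an estimate like this would be biased downward, which is the kind of effect panel A examines.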
Figure 5. Histogram of the proportion of each autosomal chromosome arm called heterozygous in the 205 DGRP genomes. Based on the cytological analysis of Huang, red arms were reported to be free of inversion polymorphism, while blue arms contained polymorphic inversions. The much higher heterozygosity of the latter category illustrates how inversion polymorphism limits the efficacy of inbreeding.
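The quantity histogrammed in Figure 5 is the fraction of called sites on an arm that are heterozygous. The Python sketch below assumes, hypothetically, that each arm is stored as a consensus string in which heterozygous sites carry IUPAC two-base ambiguity codes (R, Y, S, W, K, M) and missing sites are `N`; the released files may encode heterozygosity differently, and the function names are illustrative.

```python
from typing import Dict

# IUPAC two-base ambiguity codes; using them to mark heterozygous calls is an assumption.
HET_CODES = frozenset("RYSWKM")

def heterozygous_fraction(arm_seq: str, missing: str = "N") -> float:
    """Fraction of called (non-missing) sites on one arm that are heterozygous."""
    called = [base for base in arm_seq.upper() if base != missing]
    if not called:
        return float("nan")  # nothing called on this arm
    return sum(base in HET_CODES for base in called) / len(called)

def per_arm_fractions(genome: Dict[str, str]) -> Dict[str, float]:
    """Apply heterozygous_fraction to each arm of one genome, e.g. {"2L": ..., "3R": ...}."""
    return {arm: heterozygous_fraction(seq) for arm, seq in genome.items()}
```

Collecting `per_arm_fractions` over all genomes and grouping arms by inversion status would give the two distributions compared in the histogram.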
Figure 6. Heterogeneity in estimated cosmopolitan admixture proportions among individuals in each sub-Saharan African population.
Chromosomal arm nucleotide diversity (π) for populations with inversion polymorphism
| Population | X Standard | X Total | 2L Standard | 2L Total | 2R Standard | 2R Total | 3L Standard | 3L Total | 3R Standard | 3R Total |
|---|---|---|---|---|---|---|---|---|---|---|
| CO | 0.0075 | 0.0075 | 0.0082 | 0.0083 | 0.0073 | No inversion | 0.0076 | No inversion | Non | 0.0058 |
| EA | 0.0071 | 0.0071 | 0.0087 | No inversion | 0.0074 | No inversion | 0.0075 | No inversion | 0.0060 | No inversion |
| EB | 0.0064 | No inversion | 0.0074 | No inversion | 0.0066 | No inversion | 0.0067 | No inversion | ||
| EG | 0.0034 | 0.0034 | Non | 0.0066 | 0.0053 | No inversion | Non | Non | Non | 0.0062 |
| FR | 0.0036 | 0.0036 | 0.0051 | No inversion | ||||||
| GA | 0.0077 | 0.0076 | 0.0077 | No inversion | 0.0080 | No inversion | 0.0066 | 0.0068 | ||
| GU | 0.0076 | 0.0076 | 0.0083 | 0.0084 | 0.0075 | No inversion | 0.0077 | No inversion | Non | 0.0066 |
| KN | 0.0087 | 0.0086 | 0.0077 | No inversion | 0.0082 | No inversion | 0.0070 | 0.0073 | ||
| KR | 0.0082 | 0.0085 | Non | 0.0068 | 0.0080 | 0.0081 | 0.0085 | No inversion | 0.0067 | No inversion |
| NG | 0.0076 | 0.0076 | 0.0075 | 0.0074 | ||||||
| RAL | 0.0041 | 0.0041 | 0.0068 | 0.0070 | 0.0062 | 0.0064 | 0.0064 | 0.0065 | 0.0052 | 0.0052 |
| RG | 0.0080 | No inversion | 0.0085 | 0.0094 | 0.0078 | No inversion | 0.0063 | 0.0065 | ||
| SB | 0.0088 | No inversion | 0.0082 | No inversion | 0.0083 | 0.0089 | 0.0072 | No inversion | ||
| SD | 0.0086 | No inversion | 0.0096 | 0.0097 | 0.0078 | 0.0078 | 0.0080 | No inversion | 0.0065 | No inversion |
| SE | 0.0083 | 0.0086 | Non | 0.0075 | 0.0079 | 0.0076 | 0.0081 | No inversion | 0.0072 | No inversion |
| SF | 0.0086 | No inversion | 0.0081 | No inversion | ||||||
| SP | 0.0090 | No inversion | 0.0099 | 0.0100 | 0.0083 | 0.0082 | 0.0083 | No inversion | 0.0065 | No inversion |
| TZ | Non | 0.0058 | Non | 0.0062 | 0.0077 | No inversion | 0.0075 | No inversion | ||
| UK | 0.0081 | 0.0081 | 0.0085 | 0.0086 | 0.0077 | 0.0078 | 0.0077 | No inversion | ||
| UM | 0.0080 | 0.0081 | Non | 0.0084 | 0.0078 | No inversion | 0.0068 | No inversion | ||
| ZI | 0.0089 | 0.0089 | 0.0099 | 0.0097 | 0.0083 | 0.0082 | 0.0084 | 0.0087 | 0.0076 | 0.0076 |
| ZS | 0.0099 | 0.0098 | 0.0083 | 0.0080 | 0.0082 | No inversion | 0.0074 | 0.0074 | ||
Nucleotide diversity estimates are given both for the total data set for each population (“Total”) and for the data set excluding arms carrying inversions (“Standard”). “Non” denotes populations with fewer than two standard chromosome arms; “No inversion” denotes populations without inversion polymorphism on that arm. Values in boldface indicate a difference of ≥5% between the Standard and Total estimates.
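The ≥5% criterion can be applied mechanically to the table cells. The Python sketch below treats the difference as relative to the Standard estimate (an assumption; the footnote does not name the denominator) and returns `None` when either cell is a label such as “Non” or “No inversion”. Because the published values are rounded to four decimals, flags computed from them may not exactly match those derived from unrounded estimates.

```python
from typing import Optional

def differs_by_threshold(standard: str, total: str, threshold: float = 0.05) -> Optional[bool]:
    """Compare a 'Standard' and a 'Total' table cell.

    Returns None if either cell is non-numeric ("Non", "No inversion"),
    otherwise whether |Total - Standard| / Standard >= threshold.
    The choice of Standard as the denominator is an assumption.
    """
    try:
        s, t = float(standard), float(total)
    except ValueError:
        return None  # a label rather than an estimate: nothing to compare
    return abs(t - s) / s >= threshold
```

For example, the RG cells 0.0085 and 0.0094 differ by about 10.6% and are flagged, whereas the KR cells 0.0082 and 0.0085 differ by about 3.7% and are not.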
Pairwise FST between selected populations, averaged across chromosome arms
|  | FR | GA | NG | RAL | RG | SP | ZI |
|---|---|---|---|---|---|---|---|
| FR | 0.0000 | 0.1898 | 0.2263 | 0.0376 | 0.2173 | 0.2213 | 0.2152 |
| GA | 0.2270 | 0.0000 | 0.0263 | 0.1626 | 0.0515 | 0.0955 | 0.0874 |
| NG | 0.2630 | 0.0133 | 0.0000 | 0.1954 | 0.0719 | 0.1128 | 0.1067 |
| RAL | 0.0444 | 0.1672 | 0.2000 | 0.0000 | 0.1879 | 0.1967 | 0.1911 |
| RG | 0.2545 | 0.0515 | 0.0631 | 0.1965 | 0.0000 | 0.0694 | 0.0595 |
| SP | 0.2565 | 0.0962 | 0.1034 | 0.2079 | 0.0768 | 0.0000 | 0.0130 |
| ZI | 0.2508 | 0.0939 | 0.1015 | 0.2025 | 0.0662 | 0.0127 | 0.0000 |
Comparisons utilizing the total data set for each population are above the diagonal, and comparisons using only arms without inversions are shown below the diagonal.