Wen Huang1, Andreas Massouras2, Yutaka Inoue3, Jason Peiffer1, Miquel Ràmia4, Aaron M Tarone5, Lavanya Turlapati1, Thomas Zichner6, Dianhui Zhu7, Richard F Lyman1, Michael M Magwire1, Kerstin Blankenburg7, Mary Anna Carbone1, Kyle Chang7, Lisa L Ellis5, Sonia Fernandez7, Yi Han7, Gareth Highnam8, Carl E Hjelmen5, John R Jack1, Mehwish Javaid7, Joy Jayaseelan7, Divya Kalra7, Sandy Lee7, Lora Lewis7, Mala Munidasa7, Fiona Ongeri7, Shohba Patel7, Lora Perales7, Agapito Perez7, LingLing Pu7, Stephanie M Rollmann1, Robert Ruth7, Nehad Saada7, Crystal Warner7, Aneisa Williams7, Yuan-Qing Wu7, Akihiko Yamamoto1, Yiqing Zhang7, Yiming Zhu7, Robert R H Anholt1, Jan O Korbel6, David Mittelman8, Donna M Muzny7, Richard A Gibbs7, Antonio Barbadilla4, J Spencer Johnston5, Eric A Stone1, Stephen Richards7, Bart Deplancke2, Trudy F C Mackay1.
Abstract
The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource of 205 sequenced inbred lines, derived to improve our understanding of the effects of naturally occurring genetic variation on molecular and organismal phenotypes. We used an integrated genotyping strategy to identify 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants. Our molecular population genomic analyses show higher deletion than insertion mutation rates and stronger purifying selection on deletions. Weaker selection on insertions than deletions is consistent with our observed distribution of genome size determined by flow cytometry, which is skewed toward larger genomes. Insertion/deletion and single nucleotide polymorphisms are positively correlated with each other and with local recombination, suggesting that their nonrandom distributions are due to hitchhiking and background selection. Our cytogenetic analysis identified 16 polymorphic inversions in the DGRP. Common inverted and standard karyotypes are genetically divergent and account for most of the variation in relatedness among the DGRP lines. Intriguingly, variation in genome size and many quantitative traits are significantly associated with inversions. Approximately 50% of the DGRP lines are infected with Wolbachia, and four lines have germline insertions of Wolbachia sequences, but effects of Wolbachia infection on quantitative traits are rarely significant. The DGRP complements ongoing efforts to functionally annotate the Drosophila genome. Indeed, 15% of all D. melanogaster genes segregate for potentially damaged proteins in the DGRP, and genome-wide analyses of quantitative traits identify novel candidate genes. The DGRP lines, sequence data, genotypes, quality scores, phenotypes, and analysis and visualization tools are publicly available.
Year: 2014 PMID: 24714809 PMCID: PMC4079974 DOI: 10.1101/gr.171546.113
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Flowchart of the integrated genotyping procedure used to call SNP and non-SNP variants. Seven different variant calling methods were used to derive a consensus list of variant calls. The variant calls were grouped into haplotype bins (indicated by dashed vertical lines) such that there is a region on both sides of each region containing two or more regions of at least 110 bp with no non-SNP variants in any line. The variable regions and their 110-bp flanking regions were used to derive the sequences of alternative haplotypes against which reads are aligned. Finally, reads were aligned and genotypes called, followed by quality filtering that accounted for the experimental design.
Comparison of genotyping methods for (A) SNPs, (B) short (<100 bp) non-SNP variants, (C) long (≥100 bp) non-SNP variants
Concordance between Illumina and 454 genotyping calls (%)
Figure 2. Distributions of the percent segregating variants in 205 DGRP lines, by chromosome. The distributions for homozygous standard or inverted karyotypes are given in blue, and the distributions for inversion/standard heterozygotes are given in red.
Inversions in DGRP lines
Figure 3. Nonrandom distribution of variants. The average number of SNPs (y-axis) for each distance in bp (x-axis) from either side of a variant of high frequency (MAF 40%–50%). Solid lines represent the number of SNPs of a given range of allele counts in lines that have the variant in question, whereas dashed lines show the number of SNPs in lines that do not have the variant. (A) Indels. (B) Noncoding SNPs.
Figure 4. Nucleotide diversity (π) within standard karyotypes (blue bars), within inverted karyotypes (red bars), and between standard and inverted karyotypes (purple bars) within genomic regions encompassed by common polymorphic inversions. The calculation was based on nonmissing genotypes only, with indels (>1 bp) or multiple nucleotide polymorphisms receiving the same weight as SNPs regardless of their length.
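The within-group and between-group comparison described for Figure 4 can be illustrated with a minimal sketch. It is not the authors' code: it assumes biallelic variants in homozygous inbred lines, coded 0 (reference), 1 (alternate), or None (missing), and the function names and the region_length parameter are hypothetical. Each variant contributes the fraction of line pairs that differ, so an indel counts the same as a SNP, and missing genotypes are dropped before counting.

```python
from typing import Optional, Sequence

Call = Optional[int]  # assumed coding: 0 = reference, 1 = alternate, None = missing


def site_diff_within(calls: Sequence[Call]) -> Optional[float]:
    """Fraction of line pairs within one group that differ at one variant."""
    observed = [c for c in calls if c is not None]  # nonmissing genotypes only
    n = len(observed)
    if n < 2:
        return None  # no pair can be formed
    alt = sum(observed)
    ref = n - alt
    return ref * alt / (n * (n - 1) / 2)


def site_diff_between(group_a: Sequence[Call], group_b: Sequence[Call]) -> Optional[float]:
    """Fraction of cross-group line pairs that differ at one variant."""
    a = [c for c in group_a if c is not None]
    b = [c for c in group_b if c is not None]
    if not a or not b:
        return None
    a_alt, b_alt = sum(a), sum(b)
    a_ref, b_ref = len(a) - a_alt, len(b) - b_alt
    return (a_ref * b_alt + a_alt * b_ref) / (len(a) * len(b))


def region_pi(per_site: Sequence[Optional[float]], region_length: int) -> float:
    """Sum per-variant differences and divide by the region length in bp."""
    return sum(v for v in per_site if v is not None) / region_length
```

For example, site_diff_within([0, 0, 1, None]) uses the three nonmissing calls and returns 2/3, because two of the three possible pairs differ.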
Figure 5. Histograms of the numbers of DGRP lines containing each damaged gene (left) and the number of damaged genes per DGRP line (right).
Figure 6. Histogram of genomic relationships among DGRP lines (20,910 possible pairs). The distribution of the relationship between all DGRP lines and the reference sequence is displayed as a box plot.
Figure 7. Principal component analysis of DNA sequence variation in the DGRP. Principal components (PCs) are computed using EIGENSTRAT. (A) PC plot of PC1 versus PC2. (B) PC plot of PC1 versus PC3. (C) PC plot of PC1 versus PC2 after PCs were recomputed excluding all variants in regions encompassing major inversions (In[2L]t, In[2R]NS, In[3R]P, In[3R]K, In[3R]Mo). With the exception of four highly related pairs of lines, there is no apparent clustering of karyotype groups.
Figure 8. Patterns of LD. (A) Decay in LD with physical distance, by chromosome arm. (B) Genome-wide spatial variation in LD. Mean r between variants within 50–150 bp of each other in sliding windows (in 100-kb steps) of 1 Mb is plotted.
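The sliding-window summary in panel B of Figure 8 could be computed along the following lines. This is a sketch under stated assumptions rather than the published procedure: genotypes are coded 0/1/None per line, LD is measured as the squared Pearson correlation between two variants (the caption writes only "r"), and each variant pair is assigned to a window by the position of its left variant. All function and parameter names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Call = Optional[int]  # assumed coding: 0 = reference, 1 = alternate, None = missing


def ld_r2(x: Sequence[Call], y: Sequence[Call]) -> Optional[float]:
    """Squared correlation between two variants over lines called at both."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return None  # a monomorphic variant has no defined correlation
    return cov * cov / (vx * vy)


def windowed_mean_ld(
    positions: Sequence[int],          # variant positions, sorted ascending
    genotypes: Sequence[Sequence[Call]],  # one genotype vector per variant
    chrom_length: int,
    window: int = 1_000_000,
    step: int = 100_000,
    min_dist: int = 50,
    max_dist: int = 150,
) -> List[Tuple[int, Optional[float]]]:
    """Mean LD of variant pairs min_dist..max_dist bp apart, per sliding window."""
    pair_values = []  # (position of left variant, r2)
    for i, left in enumerate(positions):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - left
            if dist > max_dist:
                break  # positions are sorted, so later variants are farther away
            if dist >= min_dist:
                r2 = ld_r2(genotypes[i], genotypes[j])
                if r2 is not None:
                    pair_values.append((left, r2))
    results = []
    for start in range(0, chrom_length, step):
        vals = [v for p, v in pair_values if start <= p < start + window]
        results.append((start, mean(vals) if vals else None))
    return results
```

Windows without any qualifying pair are reported as None rather than 0, so that empty regions are not mistaken for regions of low LD.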
Figure 9. Relationship between LD and minor allele count. For each of the minor allele counts, 1000 random variants are sampled, and the mean number of variants genome-wide or locally (<1 kb) in strong LD (r > 0.95) with the focal variant is calculated. (A) Relationship between the mean number of variants in strong LD with the focal variant and minor allele count. (B) Relationship between the mean number of variants in strong LD with the focal variant and minor allele count, stratified according to the location of the focal variant (within or outside of inversions).