Chris Bizon, Michael Spiegel, Scott A Chasse, Ian R Gizer, Yun Li, Ewa P Malc, Piotr A Mieczkowski, Josh K Sailsbery, Xiaoshu Wang, Cindy L Ehlers, Kirk C Wilhelmsen.
Abstract
BACKGROUND: The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that make a low-coverage sequencing strategy viable.
Year: 2014 PMID: 24479562 PMCID: PMC3914019 DOI: 10.1186/1471-2164-15-85
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Sample depth of coverage. Histogram of the mean sequencing read depth per sample for 641 samples. 88% of the samples have mean depth less than 13, and 26% have depth less than 5.
Figure 2. Concordance with exome chip. Concordance between exome chip genotypes and genotypes from three variant callers (a) and false positive rate (b). One point (at depth = 30.4, concordances between 96.7% and 98%) has been removed to expand the data region. The concordance is calculated only at the sites that are measured as non-monomorphic in the exome chip genotypes.
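The restriction to non-monomorphic sites can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the genotype encoding (alternate-allele counts 0/1/2, with None for a missing call), and the definition of monomorphic (all typed chip genotypes homozygous for the same allele) are assumptions made for illustration.

```python
def per_sample_concordance(chip, calls):
    """Per-sample genotype concordance at chip-polymorphic sites.

    chip, calls: lists of sites; each site is a list with one genotype per
    sample, encoded as an alternate-allele count (0, 1, 2) or None if missing.
    Both inputs are assumed to cover the same sites and samples in the same order.
    """
    n_samples = len(chip[0])
    matched = [0] * n_samples
    compared = [0] * n_samples
    for chip_site, call_site in zip(chip, calls):
        typed = [g for g in chip_site if g is not None]
        alt_alleles = sum(typed)
        # Skip sites that are monomorphic on the chip (assumed definition):
        # no typed genotypes, no alternate alleles, or only alternate alleles.
        if not typed or alt_alleles == 0 or alt_alleles == 2 * len(typed):
            continue
        for i, (g_chip, g_call) in enumerate(zip(chip_site, call_site)):
            if g_chip is None or g_call is None:
                continue  # a genotype missing on either side is not compared
            compared[i] += 1
            matched[i] += int(g_chip == g_call)
    # None marks a sample with no comparable genotypes.
    return [m / c if c else None for m, c in zip(matched, compared)]


# Toy data: 3 sites x 3 samples; the first site is monomorphic on the chip.
chip = [[0, 0, 0], [0, 1, 2], [1, None, 0]]
calls = [[0, 1, 0], [0, 1, 1], [1, 1, None]]
concordance = per_sample_concordance(chip, calls)
```

In the toy data the discordant call at the first site does not lower any sample's concordance, because that site is monomorphic on the chip and is therefore skipped.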
Figure 3. Frequency dependence of site finding. The fraction of variant sites found is dependent on both the frequency range of the variant, and the method used to call variants. The GATK Unified Genotyper in multisample mode finds more variants at all frequency ranges, but the disparity is most pronounced at the lowest frequencies, where the Unified Genotyper finds approximately 50% more variant sites than THUNDER. Single-sample Unified Genotyper calls follow a model that assumes a constant probability of finding any site in a single sample.
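One possible reading of the constant-probability model for single-sample calls is sketched below. The caption does not give the formula, so everything here is an assumption: each sample carrying the variant reveals the site independently with the same probability, and a site counts as found if at least one carrier reveals it. The names `fraction_sites_found`, `p_single`, and `n_carriers` are hypothetical.

```python
def fraction_sites_found(p_single, n_carriers):
    """Probability that a variant site is found in at least one carrier.

    p_single: assumed constant probability of finding the site in one carrier.
    n_carriers: number of samples carrying the variant.
    Assumes detection is independent across carriers.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_carriers < 0:
        raise ValueError("n_carriers must be non-negative")
    # Complement of the probability that every carrier misses the site.
    return 1.0 - (1.0 - p_single) ** n_carriers


# Rarer variants have fewer carriers, so fewer of their sites are found.
found_by_carriers = {k: fraction_sites_found(0.5, k) for k in (1, 2, 4)}
```

Under these assumptions the fraction found rises with the number of carriers, which is consistent with the caption's statement that site finding depends on variant frequency.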
Figure 4. Empirical kinship coefficients. Histograms of empirical kinship coefficients calculated from THUNDER genotypes. Each row contains all pairwise values that have the noted value for the pedigree-defined kinship coefficient. Thus, the lowest histogram (φ_ped = 0.25) contains all full sibling and parent–child relations, the next row up contains grandparent–grandchild, avuncular, and half-sibling relations, and so on.
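The row grouping described in this caption can be sketched in Python. For non-inbred pedigrees the kinship coefficient of a relationship of degree d is (1/2)^(d+1), which gives 0.25 for first-degree and 0.125 for second-degree relatives, matching the rows named above. The function names and the input layout (pairs of pedigree and empirical values) are assumptions for illustration, not the authors' code.

```python
from collections import defaultdict


def pedigree_kinship(degree):
    """Expected kinship coefficient for a relationship of the given degree.

    degree 1: parent-child, full siblings               -> 0.25
    degree 2: grandparent, avuncular, half-siblings     -> 0.125
    Assumes no inbreeding and a single path of relatedness.
    """
    return 0.5 ** (degree + 1)


def group_by_pedigree_kinship(pairs):
    """Group empirical kinship values by their pedigree-defined value.

    pairs: iterable of (phi_ped, phi_empirical) tuples, one per sample pair.
    Returns a dict mapping phi_ped to the list of empirical values, i.e. the
    data behind one histogram row per key.
    """
    rows = defaultdict(list)
    for phi_ped, phi_emp in pairs:
        rows[phi_ped].append(phi_emp)
    return dict(rows)


# Hypothetical empirical values for three pairs of samples.
rows = group_by_pedigree_kinship(
    [(pedigree_kinship(1), 0.24), (pedigree_kinship(2), 0.13), (pedigree_kinship(1), 0.26)]
)
```

Each key of `rows` corresponds to one histogram row, and the list under it holds the empirical values that would be binned for that row.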
Figure 5. Allele frequencies in the NA Cohort. (a) A two-dimensional histogram comparing allele frequencies in the Native American cohort with those in European ancestry samples from 1000 Genomes. The variants shown are the union of the two sets. Color scales logarithmically with the number of variants as in the colorbar above the image. (b) One-dimensional histogram of the difference in allele frequency for the same variants as shown in (a).