Paola Sebastiani, Zhenming Zhao, Maria M Abad-Grau, Alberto Riva, Stephen W Hartley, Amanda E Sedgewick, Alessandro Doria, Monty Montano, Efthymia Melista, Dellara Terry, Thomas T Perls, Martin H Steinberg, Clinton T Baldwin.
Abstract
BACKGROUND: One of the challenges in the analysis of pooling-based genome-wide association studies is to identify authentic associations among potentially thousands of false positive associations.
Year: 2008 PMID: 18194558 PMCID: PMC2248205 DOI: 10.1186/1471-2156-9-6
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Summary of the results of Experiment 1.
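| Number of pools | Sample size | Average difference (replicates) | SD of differences (replicates) | Correlation (replicates) | Average error (pooled vs. individual) | SD of errors (pooled vs. individual) | Correlation (pooled vs. individual) |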
| 2 | 30 | 0.0043 | 0.0303 | 0.9940 | 0.0304 | 0.0565 | 0.9860 |
| 2 | 45 | 0.0011 | 0.0295 | 0.9956 | 0.0331 | 0.0573 | 0.9890 |
| 2 | 60 | 0.0164 | 0.0498 | 0.9873 | 0.0451 | 0.0668 | 0.9890 |
Column 1: pool description; Column 2: number of samples per pool; Column 3: average difference between estimates of allele frequencies in repeated pools; Column 4: standard deviation of the differences; Column 5: correlation between repeated allele frequency estimates; Column 6: average difference between estimates of allele frequencies from pooled DNA samples and individually genotyped samples; Column 7: standard deviation of the differences; Column 8: correlation between estimates of allele frequencies from pooled DNA samples and individually genotyped samples.
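The three reproducibility summaries described in the caption (average difference, standard deviation of the differences, and correlation between replicates) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `reproducibility_summary` and the five allele frequency values are hypothetical, and the sketch assumes signed differences and a Pearson correlation, neither of which the caption specifies.

```python
from math import sqrt
from statistics import mean, stdev


def reproducibility_summary(run1, run2):
    """Return (mean difference, SD of differences, Pearson correlation).

    Assumption: differences are signed (run1 - run2); the source does not
    say whether signed or absolute differences were used.
    """
    if len(run1) != len(run2) or len(run1) < 2:
        raise ValueError("need two equally long sequences with >= 2 values")
    diffs = [a - b for a, b in zip(run1, run2)]
    m1, m2 = mean(run1), mean(run2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(run1, run2))
    var1 = sum((a - m1) ** 2 for a in run1)
    var2 = sum((b - m2) ** 2 for b in run2)
    corr = cov / sqrt(var1 * var2)
    return mean(diffs), stdev(diffs), corr


# Hypothetical allele frequency estimates for five SNPs in two replicate pools.
run1 = [0.10, 0.35, 0.50, 0.72, 0.90]
run2 = [0.12, 0.33, 0.52, 0.70, 0.91]

mean_diff, sd_diff, corr = reproducibility_summary(run1, run2)
print(f"mean difference = {mean_diff:.4f}, SD = {sd_diff:.4f}, r = {corr:.4f}")
```

The same function applies to the accuracy columns if `run2` is replaced by frequencies computed from individually genotyped samples.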
Figure 1. The reproducibility of the allele frequency estimates is shown by the scatter plot of repeated estimates of allele frequency inferred from pooled DNA samples (left). The labels "run 1" and "run 2" on the x- and y-axes specify each replication. The accuracy of the allele frequency estimates is shown by the scatter plot of the estimates of allele frequency inferred from pooled DNA samples (y-axis in the right plots) and those computed from individually genotyped samples (x-axis). The analysis of the other chromosomes shows similar results.
Summary of the pools of DNA samples that were used for the validation of the analytical method. Each pool was prepared in duplicate.
| Phenotype | Sample size |
| Exceptional longevity | 130 male centenarians |
| | 130 male controls |
| | 130 female centenarians |
| | 100 female controls |
| Fetal hemoglobin expression | 55 sickle cell anemia subjects with fetal hemoglobin below 3% of the total hemoglobin |
| | 54 sickle cell anemia subjects with fetal hemoglobin above 6.5% of the total hemoglobin |
Figure 2. Example of the artificial pool sets that we created to assess the specificity of the procedure. As an example, the top four pools were generated to compare the genome of centenarians (pools 1 and 2) with that of younger controls (pools 3 and 4). The two artificial pool sets are obtained by mixing pools of centenarian DNA with those of controls.
Figure 3. Distribution of the log10 Bayes factor in 1,114 false positive associations generated in approximately 10⁶ association tests with an estimated false positive rate of 5 × 10⁻⁶. The analysis shows that the chance to observe a very large Bayes factor has an exponential decay, and the probability of observing a Bayes factor greater than 10 by chance is 6 × 10⁻⁴, the probability of observing a Bayes factor greater than 100 is 3 × 10⁻⁴, and greater than 1000 is 2 × 10⁻⁴.
List of SNPs that are known to be associated with different levels of HbF and results of the analysis based on pooled DNA samples.
| Number | SNP | Band | Gene | Validated | Distance | D' | BF | P_H | P_L | OR |
| 6 | rs2295199 | 6q23.2 | | rs2295199 | 0 | | 0.51 | 0.83 | 0.89 | 0.58 |
| 14 | rs717088 | 6q23.3 | | rs717088 | 0 | | 0.47 | 0.37 | 0.47 | 0.65 |
| 16 | rs1342645 | 6q23.3 | | rs1342645 | 0 | | 0.67 | 0.77 | 0.86 | 0.55 |
| 20 | rs1349115 | 8q12.1 | | rs1349115 | 0 | | 0.69 | 0.29 | 0.41 | 0.59 |
| 21 | rs10504269 | 8q12.1 | | rs10504269 | 0 | | 0.61 | 0.72 | 0.82 | 0.58 |
| 22 | rs6997859 | 8q12.1 | | rs6997859 | 0 | | 0.37 | 0.23 | 0.17 | 1.45 |
| 23 | rs12155519 | 8q12.1 | | rs12155519 | 0 | | 0.94 | 0.58 | 0.44 | 1.77 |
| 25 | rs746867 | 8q12.1 | | rs746867 | 0 | | 0.29 | 0.29 | 0.35 | 0.76 |
| 26 | rs389349 | 8q12.1 | | rs389349 | 0 | | 0.42 | 0.95 | 0.93 | 1.43 |
| 32 | rs1867380 | 15q22.31 | | rs1867380 | 0 | | 0.28 | 0.82 | 0.84 | 0.85 |
Column 1: row number; Column 2: SNP ID; Column 3: Cytogenetic band; Column 4: Genes tagged by the SNP; Column 5: SNP in the Illumina array that was used to compare the association. If the SNP to be validated was not in the array, we searched, within 100 kb of the SNP to be validated, for the closest SNP with a positive Bayes factor; Column 6: distance between the two SNPs; Column 7: Bayes D' between the two SNPs; an NA means that the SNP originally reported as associated with HbF is not in the HapMap data; Column 8: Bayes factor; Columns 9–10: estimates of allele frequencies in the pools of DNA from patients with high HbF and low HbF; Column 11: Odds ratio. The SNPs 6, 13, 14, 16, 20–26, 32 and 33 are in the array, and SNPs 13, 24 and 33 were found associated with different levels of HbF. Highlighted in bold are the associations confirmed by our analysis.
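The odds ratios in the last column appear consistent with the usual allelic odds ratio computed from the two pooled frequency estimates, OR = [P_H / (1 − P_H)] / [P_L / (1 − P_L)]. The sketch below checks this against rows 22 and 23, reading the last three values of each row as P_H, P_L and OR. The function name `allele_odds_ratio` is hypothetical, and the formula is an assumption inferred from the tabulated values, not a statement of the authors' exact computation; small discrepancies are expected because the tabulated frequencies are rounded to two decimals.

```python
def allele_odds_ratio(p_first, p_second):
    """Allelic odds ratio of two allele frequencies, each strictly in (0, 1)."""
    if not (0.0 < p_first < 1.0 and 0.0 < p_second < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    odds_first = p_first / (1.0 - p_first)
    odds_second = p_second / (1.0 - p_second)
    return odds_first / odds_second


# Row 22 (rs6997859): P_H = 0.23, P_L = 0.17, reported OR = 1.45
or_row_22 = allele_odds_ratio(0.23, 0.17)

# Row 23 (rs12155519): P_H = 0.58, P_L = 0.44, reported OR = 1.77
or_row_23 = allele_odds_ratio(0.58, 0.44)

print(f"row 22: {or_row_22:.2f}, row 23: {or_row_23:.2f}")
```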
List of SNPs that were found associated with exceptional longevity in different studies, and results based on the analysis of pooled DNA samples.
| | | | | Males | | | | | | | Females | | | | | | |
| Index | SNP | Band | Gene | Validated | Distance | D' | BF | P_L | P_C | OR | Validated | Distance | D' | BF | P_L | P_C | OR |
| rs9999238 | 29363 | 0.98 | 0.2 | 0.81 | 0.83 | 0.85 | |||||||||||
| rs1131896 | 780 | 1.00 | 0.8 | 0.83 | 0.84 | 0.96 | |||||||||||
| rs2227956 | 0 | 0.3 | 0.18 | 0.14 | 1.31 | ||||||||||||
| 7p15.3 | rs2056576 | 5019 | 0.64 | 0.2 | 0.70 | 0.72 | 0.91 | ||||||||||
| 7q21.3 | rs662 | 0 | 0.2 | 0.33 | 0.29 | 1.21 | |||||||||||
| rs2373929 | 18701 | 0.07 | 0.6 | 0.63 | 0.54 | 1.45 | |||||||||||
| rs1346044 | 0 | 0.4 | 0.28 | 0.21 | 1.46 | ||||||||||||
| rs243908 | 16909 | 0.96 | 0.2 | 0.28 | 0.28 | 1.01 | |||||||||||
| rs1467558 | 0 | 0.2 | 0.84 | 0.86 | 0.87 | ||||||||||||
| rs861539 | 0 | 0.15 | 0.75 | 0.74 | 1.05 | ||||||||||||
| 9798 | NA | 0.14 | 0.59 | 0.57 | 1.09 | ||||||||||||
| rs3764261 | 1910 | NA | 0.15 | 0.66 | 0.67 | 0.97 | |||||||||||
| 22 | rs5882 | 16q13 | | rs5882 | 0 | | 0.19 | 0.39 | 0.35 | 1.19 | rs5882 | 0 | | 0.17 | 0.37 | 0.40 | 0.88 |
| rs2059807 | 15691 | 0.31 | 0.2 | 0.75 | 0.72 | 1.16 | |||||||||||
Column 1: row number; Column 2: SNP ID; Column 3: Cytogenetic band; Column 4: Genes tagged by the SNP; Column 5: SNP in the Illumina array that was used to compare the association as described in the caption of Table 3; Column 6: distance between the two SNPs; Column 7: Bayes D' between the two SNPs; Column 8: Bayes factor; Columns 9–10: estimates of allele frequencies in the pools of DNA from male centenarians and younger controls; Column 11: Odds ratio. Columns 12–18: as columns 5–11 but for the pools comparing female centenarians to younger controls. The SNPs 8, 10, 14, 16–19, 22 are in the array, and SNPs 8, 14, 17–19 were found associated with longevity in females and/or males. Highlighted in bold are the associations confirmed by our analysis.
Figure 4. Schematic summary of the modular approach to the analysis of GWA data.
Figure 5. Relation between the pattern of LD (x-axis) and the global measure of association (y-axis) in the regional filter. The pattern of LD is measured by the average of the Bayes D' between consecutive SNPs in the region, and the global measure of association is the joint probability of association in the region. The two figures in the top half show the relation using data from the study of fetal hemoglobin in the sickle cell anemia subjects. The two figures in the bottom half show the relation using data from the longevity study. The different extent of LD reflects the fact that sickle cell anemia subjects are all African American while centenarians in the longevity study are all Caucasians. The correlations in the four sets are 0.03, 0.18, 0.018, -0.10.
parameters and allele frequencies from pooled DNA samples
| | Allele A | Allele B | |
| Cases (one pool) | n1A = p(A|cases)*(2n1) | n1B = p(B|cases)*(2n1) | 2n1 |
| Controls (one pool) | n2A = p(A|controls)*(2n2) | n2B = p(B|controls)*(2n2) | 2n2 |
| | nA | nB | |
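The table above maps pooled allele frequency estimates to the cell counts of a 2 × 2 contingency table, with each of the n subjects in a pool contributing two alleles. A minimal sketch of that mapping follows. The function name `pooled_allele_counts`, the pool sizes and the frequencies are all hypothetical, and the sketch assumes a biallelic SNP so that p(B|·) = 1 − p(A|·).

```python
def pooled_allele_counts(p_a_cases, n_cases, p_a_controls, n_controls):
    """Build expected allele counts from pooled allele frequency estimates.

    Each subject carries two alleles, so a pool of n subjects holds 2n alleles.
    Assumption: the SNP is biallelic, so p(B) = 1 - p(A) within each pool.
    """
    n1a = p_a_cases * 2 * n_cases
    n1b = (1.0 - p_a_cases) * 2 * n_cases
    n2a = p_a_controls * 2 * n_controls
    n2b = (1.0 - p_a_controls) * 2 * n_controls
    return {
        "cases": (n1a, n1b),          # row total: 2 * n_cases
        "controls": (n2a, n2b),       # row total: 2 * n_controls
        "totals": (n1a + n2a, n1b + n2b),  # column totals nA, nB
    }


# Hypothetical example: 100 subjects per pool, p(A|cases) = 0.30, p(A|controls) = 0.20.
table = pooled_allele_counts(0.30, 100, 0.20, 100)
for row, counts in table.items():
    print(row, [round(c, 2) for c in counts])
```

Because the counts are derived from estimated frequencies rather than observed genotypes, they are generally not integers.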