Dana C Crawford, Qian Yi, Joshua D Smith, Cynthia Shephard, Michelle Wong, Laura Witrak, Robert J Livingston, Mark J Rieder, Deborah A Nickerson.
Abstract
With the recent completion of the International HapMap Project, many tools are in hand for genetic association studies seeking to test the common variant/common disease hypothesis. In contrast, very few tools and resources are in place for genotype-phenotype studies hypothesizing that rare variation has a large impact on the phenotype of interest. To create these tools for rare variant/common disease studies, there is growing interest in re-sequencing either large samples of random chromosomes or smaller samples of patients with extreme phenotypes. As a case study for rare variant discovery in random chromosomes, we have re-sequenced approximately 1,000 chromosomes representing diverse populations for the gene C-reactive protein (CRP). CRP is an important gene in the fields of cardiovascular and inflammation genetics, and its small size (approximately 2 kb) makes it particularly amenable to medical or deep re-sequencing. With these data, we explore several issues related to the present-day candidate gene association study, including the benefits of complete SNP discovery, the effects of tagSNP selection across diverse populations, and the completeness of dbSNP for CRP. We also show that while deep re-sequencing uncovers potentially medically relevant coding SNPs, these SNPs are fleetingly rare when genotyped in a population-based survey of 7,000 Americans (NHANES III). Collectively, these data suggest that several different types of re-sequencing and genotyping approaches may be required to fully understand the complete spectrum of alleles that impact human phenotypes.
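Why re-sequence on the order of 1,000 chromosomes to find rare variants? A simple back-of-the-envelope sketch: if an allele has population frequency p and n chromosomes are sampled independently, the chance of seeing it at least once is 1 - (1 - p)^n. The function names below are hypothetical, and the model assumes random sampling and error-free genotype calls; it is an illustration, not the authors' own design calculation.

```python
import math


def detection_probability(freq, n_chromosomes):
    """Probability that an allele at population frequency `freq` appears at
    least once among `n_chromosomes` independently sampled chromosomes
    (assumes random sampling and perfect genotype calls)."""
    return 1.0 - (1.0 - freq) ** n_chromosomes


def chromosomes_needed(freq, power=0.95):
    """Smallest number of chromosomes n with detection_probability(freq, n) >= power."""
    return math.ceil(math.log(1.0 - power) / math.log(1.0 - freq))


if __name__ == "__main__":
    for freq in (0.05, 0.01, 0.001):
        print(freq, round(detection_probability(freq, 1000), 3), chromosomes_needed(freq))
```

Under these assumptions, an allele at 0.1% frequency is seen at least once in 1,000 chromosomes with probability 1 - 0.999^1000, roughly 0.63, which is why alleles at or below that frequency are easily missed even in large re-sequencing efforts.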
Year: 2006 PMID: 16550411 PMCID: PMC1449912 DOI: 10.1007/s00439-006-0160-y
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
Fig. 1 CRP variants identified by re-sequencing the SeattleSNPs (n=47 individuals) and extended DNA panels (n=40 individuals). SNPs are numbered across the top of the figure in order of discovery in the reference sequence. Individual samples are labeled to the left of the figure. Each square represents the genotype of a SNP for each individual: homozygous common (blue), heterozygous (red), and homozygous rare (yellow). Gray represents missing data. Abbreviations: African-American (D), European-American (E), extended panel DNA (V31-36: Indo-Pakistani; V80-89: Mexican; V71-77, V130-139, V321-327: Asian).
Sample size (chromosomes), number of segregating sites (S), nucleotide diversity (θ), and number of inferred haplotypes for CRP re-sequenced in several samples
| Sample | Sample size (chromosomes) | Segregating sites, S | Nucleotide diversity, θ | Haplotypes (heterozygosity) |
|---|---|---|---|---|
| African-Americans | 48 | 30 (18) | 11.96 | 18 (0.86) |
| European-Americans | 46 | 13 (10) | 5.25 | 7 (0.76) |
| Asians | 48 | 17 (9) | 6.84 | 11 (0.67) |
| Mexicans | 20 | 13 (11) | 6.32 | 6 (0.74) |
| Indo-Pakistanis | 12 | 11 (11) | 7.17 | 6 (0.75) |
| Total | 174 | 40 (17) | 11.83 | 29 (0.86) |
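The diversity summaries in the table above can be illustrated with two standard population-genetic estimators. The sketch below assumes θ is Watterson's estimator, θ_W = S / a_n with a_n = Σ_{i=1}^{n-1} 1/i, and that haplotype heterozygosity is Nei's haplotype diversity, H = n/(n-1) · (1 - Σ p_i²). The source does not state which estimators or per-site scaling were used, so this should not be read as a reproduction of the table's θ column; the function names are hypothetical.

```python
from collections import Counter


def harmonic(m):
    """Sum of 1/i for i = 1..m (the a_n constant uses m = n - 1)."""
    return sum(1.0 / i for i in range(1, m + 1))


def watterson_theta(segregating_sites, n_chromosomes, length_bp=None):
    """Watterson's estimator theta_W = S / a_n.
    Returns a per-locus value, or a per-site value if length_bp is given."""
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    theta = segregating_sites / harmonic(n_chromosomes - 1)
    return theta / length_bp if length_bp else theta


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity H = n/(n-1) * (1 - sum(p_i^2)),
    where p_i are the sample frequencies of the distinct haplotypes."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Per-locus theta_W for S = 30 segregating sites in 48 chromosomes
    print(watterson_theta(30, 48))
    # Four phased chromosomes carrying two distinct haplotypes
    print(haplotype_diversity(["ACGT", "ACGT", "TCGA", "TCGA"]))
```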
CRP variants identified in the coding region
| Site | Nucleotide change | Amino acid change (position) | Number of chromosomes (panel) | SIFT/PolyPhen prediction |
|---|---|---|---|---|
| (a) Nonsynonymous | ||||
| 2314 | T > C | Tyr to His (67) | 1/794 (PDR) | Tolerated/benign |
| 2513 | C > T | Pro to Lys (133) | 1/80 (extended) | Tolerated/possibly damaging |
| 2612 | G > A | Gly to Glu (166) | 1/828 (PDR) | Tolerated/possibly damaging |
| (b) Synonymous | ||||
| 2220 | T > G | Thr to Thr (35) | 7/700 (PDR) | N/A |
| 2244 | G > A | Pro to Pro (43) | 3/734 (PDR) | N/A |
| 2475 | C > T | Ser to Ser (120) | 2/80 (extended) | N/A |
| 2667 (rs1800947) | G > C | Leu to Leu (184) | 36/826 (PDR) | N/A |
PDR: Polymorphism Discovery Resource panel
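Each row of the coding-variant table pairs a nucleotide change with an amino acid change, and the two can be cross-checked against the standard genetic code. The sketch below enumerates every codon for the reference amino acid, applies the listed single-base substitution at each possible position, and collects the resulting amino acids. It assumes the listed bases are on the coding (sense) strand; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order,
# first position varying slowest (the usual textbook layout).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def possible_changes(ref_aa, ref_base, alt_base):
    """Three-letter amino acids reachable from any codon of ref_aa by one
    ref_base -> alt_base substitution (coding-strand bases assumed)."""
    ref = THREE_TO_ONE[ref_aa]
    reachable = set()
    for codon, aa in CODON_TABLE.items():
        if aa != ref:
            continue
        for i, base in enumerate(codon):
            if base == ref_base:
                mutant = codon[:i] + alt_base + codon[i + 1:]
                reachable.add(ONE_TO_THREE[CODON_TABLE[mutant]])
    return reachable


def is_consistent(ref_aa, alt_aa, ref_base, alt_base):
    """True if the listed nucleotide change can produce the listed amino acid change."""
    return alt_aa in possible_changes(ref_aa, ref_base, alt_base)


if __name__ == "__main__":
    # (site, ref base, alt base, ref amino acid, alt amino acid) from the table
    rows = [
        (2314, "T", "C", "Tyr", "His"), (2513, "C", "T", "Pro", "Lys"),
        (2612, "G", "A", "Gly", "Glu"), (2220, "T", "G", "Thr", "Thr"),
        (2244, "G", "A", "Pro", "Pro"), (2475, "C", "T", "Ser", "Ser"),
        (2667, "G", "C", "Leu", "Leu"),
    ]
    for site, ref_b, alt_b, ref_aa, alt_aa in rows:
        print(site, is_consistent(ref_aa, alt_aa, ref_b, alt_b))
```

For example, a single C > T change in any proline codon (CCN) can only yield serine, leucine, or proline.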
Fig. 2 TagSNPs chosen for four population samples. TagSNPs were chosen from SNPs with MAF > 5% using LDSelect at r² > 0.64.
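The tagSNP criteria in Fig. 2 (MAF > 5%, pairwise r² > 0.64) can be illustrated with a simplified greedy binning in the spirit of LDSelect. This is not LDSelect's actual implementation: it assumes phased haplotypes coded as 0/1 alleles, seeds each bin with the SNP that has the most above-threshold partners, and uses that seed as the tag. The function names and the toy data are hypothetical.

```python
def allele_freq(alleles):
    """Frequency of allele 1 among phased chromosomes coded 0/1."""
    return sum(alleles) / len(alleles)


def maf(alleles):
    """Minor allele frequency of a biallelic SNP."""
    p = allele_freq(alleles)
    return min(p, 1.0 - p)


def r_squared(a, b):
    """Pairwise LD r^2 = D^2 / (pA(1-pA) pB(1-pB)) from phased 0/1 alleles."""
    n = len(a)
    pa, pb = allele_freq(a), allele_freq(b)
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0


def ld_partners(snp, pool, haplotypes, r2_min):
    """SNPs in `pool` whose r^2 with `snp` exceeds r2_min."""
    return {other for other in pool
            if other != snp and r_squared(haplotypes[snp], haplotypes[other]) > r2_min}


def greedy_tag_bins(haplotypes, r2_min=0.64, maf_min=0.05):
    """Greedily bin SNPs with MAF > maf_min; returns [(tag, sorted bin members)]."""
    remaining = {snp for snp, alleles in haplotypes.items() if maf(alleles) > maf_min}
    bins = []
    while remaining:
        # Seed with the SNP having the most above-threshold partners
        # (ties broken by SNP name for reproducibility).
        seed = max(sorted(remaining),
                   key=lambda s: len(ld_partners(s, remaining, haplotypes, r2_min)))
        members = {seed} | ld_partners(seed, remaining, haplotypes, r2_min)
        bins.append((seed, sorted(members)))
        remaining -= members
    return bins


if __name__ == "__main__":
    toy = {"s1": [0, 0, 1, 1], "s2": [0, 0, 1, 1], "s3": [0, 1, 0, 1], "s4": [0, 0, 0, 0]}
    print(greedy_tag_bins(toy))
```

Because r² and allele frequencies differ between populations, running the same procedure separately on each population sample can yield different tagSNP sets, which is the kind of cross-population difference Fig. 2 summarizes.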