Tomas Drgon, Ping-Wu Zhang, Catherine Johnson, Donna Walther, Judith Hess, Michelle Nino, George R Uhl.
Abstract
BACKGROUND: Vulnerabilities to dependence on addictive substances are substantially heritable complex disorders whose underlying genetic architecture is likely to be polygenic, with modest contributions from variants in many individual genes. "Nontemplate" genome-wide association (GWA) approaches can identify groups of chromosomal regions and genes that, taken together, are much more likely to contain allelic variants that alter vulnerability to substance dependence than expected by chance. METHODOLOGY/PRINCIPAL
Year: 2010 PMID: 20098672 PMCID: PMC2809089 DOI: 10.1371/journal.pone.0008832
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Validation graph of the relationships between observed (y axis) and expected (x axis) allele frequency data for Affymetrix 6.0 arrays.
“Expected” frequencies come from genotyping each individual separately. These individuals were assigned to three sets of pools, each containing 2, 5 and 15 CEPH individuals (total of 81 individuals). Arctan A/B values represent the “observed” measures of allele frequency and are arctangents of the A/B hybridization ratios for this set of pools of individuals. In this figure we have used only SNPs that show at least 10% difference in the expected values across the set of pools (total of 146,000 SNPs). We have obtained similar data from studies validating 500k, 100k, 10k and HuSNP arrays [14]–[16]. Note that the amount of DNA used for hybridization is less than that recommended for individual genotyping (135 vs 225 ng) in order to avoid saturation of hybridization signals for some array features. Error bars indicate SEM.
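The "observed" measure and the SNP filter described in this legend can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, `math.atan2` is used so that a zero B signal does not raise an error, and "at least 10% difference" is assumed here to mean an absolute difference of 0.10 in expected allele frequency across pools.

```python
import math
from typing import Sequence


def arctan_ab(a_signal: float, b_signal: float) -> float:
    """Arctangent of the A/B hybridization ratio for one SNP in one pool.

    atan2(a, b) equals atan(a / b) for positive b and stays defined at b == 0.
    """
    return math.atan2(a_signal, b_signal)


def is_informative(expected_freqs: Sequence[float], min_spread: float = 0.10) -> bool:
    """Keep a SNP only if its expected frequency varies enough across pools.

    Assumption: the 10% threshold is an absolute allele-frequency difference.
    """
    return max(expected_freqs) - min(expected_freqs) >= min_spread


# Hypothetical intensities and expected frequencies, for illustration only.
observed = arctan_ab(1200.0, 1200.0)          # equal A and B signals
keep_snp = is_informative([0.20, 0.25, 0.35])  # spread of 0.15 across pools
drop_snp = is_informative([0.20, 0.25])        # spread of 0.05 across pools
```

Equal A and B signals give an arctangent of π/4, and the measure rises toward π/2 as the A allele dominates the pool.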
Figure 2. Chromosomal distributions of abuser/control t values, clustered positive SNPs, and candidate positive genes (Table 1).
Blue boxes: t values of the abuser vs control differences from the 870,000 SNPs studied here. Values from European-Americans: right side; from African-Americans: left side. Red circles: positions of the SNPs whose data yield clustered positive values. Yellow triangles: positions of clustered positive results that support genes listed in Table 1. Scale bar (grey): 25 Mb.
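For readers who want a concrete picture of a per-SNP abuser vs control t value, the sketch below computes one from replicate pool measurements. The exact statistic used in the study is not given in this excerpt, so Welch's unequal-variance t is an assumption, and the function name and numbers are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def welch_t(abuser_values: Sequence[float], control_values: Sequence[float]) -> float:
    """Welch's t for the difference between abuser and control pool means.

    Assumption: each input holds one allele-frequency measure (for example an
    arctan A/B value) per pool for a single SNP; each group needs >= 2 pools
    and at least one group must have non-zero variance.
    """
    n_a, n_c = len(abuser_values), len(control_values)
    std_err = math.sqrt(variance(abuser_values) / n_a + variance(control_values) / n_c)
    return (mean(abuser_values) - mean(control_values)) / std_err


# Hypothetical pool measurements for one SNP.
abuser_pools = [0.80, 0.82, 0.81]
control_pools = [0.70, 0.72, 0.71]
t_value = welch_t(abuser_pools, control_pools)
```

A positive value means the measure is higher in abuser pools than in control pools; swapping the arguments flips the sign.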
Results of “converge first then cluster” (approach 1) and “cluster first then converge” (approach 2) analytic strategies applied to substance dependence vulnerability datasets described herein.
| 3 | 10,000 | 299 | 37 | 37 | ||
| 4 | 25,000 | 328 | 31 | 31 | ||
| 4 | 10,000 | 86 | 10 | 10 | ||

| 3 | 25,000 | 55,507 | 47,614 | 18,552 | 1,546 | 104 |
| 3 | 10,000 | 29,009 | 25,523 | 5,353 | 802 | 79 |
| 4 | 25,000 | 44,881 | 37,927 | 12,562 | 1,015 | 92 |
Columns list: the numbers of SNPs that display abuser vs control differences with nominal p<0.05 (nominally positive) and lie in clusters; the maximal distance between nominally positive SNPs that is considered to indicate clustering; the numbers of clustered, nominally positive SNPs in African-American samples; the numbers of clustered, nominally positive SNPs in European-American samples; the numbers of “convergent” SNPs that display nominally positive results in both samples; the fraction of “convergent” SNPs that are likely to be true positives, on average, based on comparison with randomly chosen SNPs selected for similar convergence analyses (data not shown); the numbers of genes identified by the clusters of nominally positive SNPs; and the overlap between the genes in each set and the 104 genes identified by the preplanned criteria used for primary analysis, using approach 1 with 3-SNP and 25 kb intervals (boldfaced). The primary comparison set from approach 2 is also listed in boldface, based on the similar fraction of true positives anticipated using these criteria. We summarize these data in Fig. 2.
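The two analytic strategies differ only in the order of the clustering and convergence steps, which the following Python sketch tries to make concrete. It is an illustration under stated assumptions rather than the authors' code: positions are on a single chromosome, a cluster is taken to be a run of nominally positive SNPs whose adjacent gaps do not exceed the maximal distance, approach 2 converges at the gene level, a cluster counts toward a gene if any of its SNPs falls inside the gene interval, and all function names, gene names and coordinates are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_clusters(positions: Iterable[int], min_snps: int, max_gap: int) -> List[List[int]]:
    """Group positions into runs with adjacent gaps <= max_gap; keep runs of >= min_snps."""
    clusters: List[List[int]] = []
    run: List[int] = []
    for pos in sorted(positions):
        if run and pos - run[-1] > max_gap:
            if len(run) >= min_snps:
                clusters.append(run)
            run = []
        run.append(pos)
    if len(run) >= min_snps:
        clusters.append(run)
    return clusters


def converge_then_cluster(pos_a: Iterable[int], pos_b: Iterable[int],
                          min_snps: int = 3, max_gap: int = 25_000) -> List[List[int]]:
    """Approach 1: keep SNPs positive in both samples, then look for clusters."""
    shared = set(pos_a) & set(pos_b)
    return find_clusters(shared, min_snps, max_gap)


def cluster_then_converge(pos_a: Iterable[int], pos_b: Iterable[int],
                          genes: Dict[str, Tuple[int, int]],
                          min_snps: int = 4, max_gap: int = 10_000) -> Set[str]:
    """Approach 2: cluster each sample separately, then keep genes hit in both."""
    def genes_hit(positions: Iterable[int]) -> Set[str]:
        hit: Set[str] = set()
        for cluster in find_clusters(positions, min_snps, max_gap):
            for name, (start, end) in genes.items():
                if any(start <= p <= end for p in cluster):
                    hit.add(name)
        return hit
    return genes_hit(pos_a) & genes_hit(pos_b)


# Hypothetical nominally positive SNP positions (bp) in two samples.
ea_positive = [1000, 6000, 12000, 40000, 100000, 104000, 109000, 113000]
aa_positive = [1000, 6000, 12000, 55000, 101000, 105000, 108000, 112000]
example_genes = {"GENE_X": (95000, 120000), "GENE_Y": (0, 20000)}

approach1 = converge_then_cluster(ea_positive, aa_positive)
approach2 = cluster_then_converge(ea_positive, aa_positive, example_genes)
```

In this toy data, approach 1 finds the three SNPs shared by both samples near the start of the region, while approach 2 finds `GENE_X`, where each sample has its own cluster of four SNPs even though no individual SNP is shared.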
Genes and classes of genes that contain clustered positive SNPs using the principal, preplanned analyses with criteria noted in Table 1.
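| Gene | Chr | Cluster start position | Description | 1: converge then cluster: SNPs | 1: converge then cluster: P | 2: cluster then converge: SNPs | 2: cluster then converge: P | dbGAP support |
|---|---|---|---|---|---|---|---|---|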
| CDCP1 | 3 | 45098 | CUB dom cont prot 1 | 3 | 0.021 | |||
| FHIT* | 3 | 59710 | fragile histid triad | 5 | 0.022 | 29/62 | 0.003 | 38/24 |
| ODZ2 | 5 | 166644 | odd Oz/ten-m hom 2 | 3 | 0.091 | 21/23 | 0.011 | 4/0 |
| CSMD1* | 8 | 2782 | CUB Sushi mult dom 1 | 10 | 0.004 | 117/137 | 0.001 | 84/55 |
| CSMD3 | 8 | 113304 | CUB Sushi mult dom 3 | 4 | 0.033 | 4/5 | ||
| CD274 | 9 | 5440 | CD274 molecule | 5 | 0.002 | 7/6 | 0.013 | |
| PCDH15 | 10 | 55250 | protocadherin 15 | 3 | 0.094 | |||
| CTNNA3* | 10 | 67349 | α 3 catenin | 3 | 0.158 | 29/6 | 0.059 | 0/20 |
| NRXN3 | 14 | 77939 | neurexin 3 | 3 | 0.134 | 18/5 | 0.110 | 4/0 |
| SEMA6D | 15 | 45797 | semaphorin 6D | 7 | 0.001 | 14/7 | 0.005 | |
| THSD4 | 15 | 69220 | thrombospondin I dom 4 | 3 | 0.067 | |||
| CDH13* | 16 | 81218 | cadherin 13 | 3 | 0.104 | 76/65 | 0.001 | 0/5 |
| DSCAM* | 21 | 40306 | Down synd cell adh mol | 4 | 0.026 | 23/36 | 0.003 | 8/0 |
|
| ||||||||
| CHD1L | 1 | 145180 | chrdom h'case DNA bind1L | 3 | 0.018 | 5/0 | ||
| DDX1 | 2 | 15649 | DEAD box polypept 1 | 3 | 0.019 | |||
| PRPF4 | 9 | 115077 | pre-mRNA proc fact 4 hom | 3 | 0.012 | |||
| PIWIL1 | 12 | 129388 | piwi-like 1 | 3 | 0.019 | |||
| POLR1D | 13 | 27094 | RNA pol I polypep D | 3 | 0.019 | |||
| SAMD4A | 14 | 54104 | ster a motif dom 4A | 3 | 0.035 | |||
| RAD51L1 | 14 | 67356 | RAD51-like 1 | 3 | 0.073 | 4/10 | 0.112 | 0/8 |
|
| ||||||||
| MKNK1 | 1 | 46795 | MAP kin interact S/T kin 1 | 3 | 0.019 | |||
| AGBL4 | 1 | 48822 | ATP/GTP binding protL 4 | 6 | 0.008 | 10/8 | 0.040 | 7/0 |
| NME7 | 1 | 167368 | nucleoside-diP kin | 14 | 0.001 | 22/25 | 0.001 | 0/8 |
| QSOX1 | 1 | 178390 | quiescin Q6 SH ox'ase 1 | 3 | 0.019 | 7/0 | ||
| PRKCE | 2 | 45732 | protein kinase C epsilon | 3 | 0.059 | 13/22 | 0.002 | 12/0 |
| LASS6 | 2 | 169021 | ceramide synthase 6 | 3 | 0.042 | |||
| TMPRSS7 | 3 | 113236 | serine TM protease 7 | 3 | 0.018 | |||
| EHHADH | 3 | 186391 | 3-OHAc coA dehydrog'ase | 3 | 0.022 | 4/4 | ||
| GBA3 | 4 | 22303 | acidic ß glucosidase 3 | 6 | 0.003 | 5/5 | 0.042 | |
| PDE1C* | 7 | 31795 | calmod-dep P-diest'ase 1C | 3 | 0.039 | |||
| MSRA | 8 | 9949 | methionine SO red'ase A | 3 | 0.047 | |||
| ADARB2 | 10 | 1218 | RNA-spec A deam'ase B2 | 3 | 0.061 | 5/9 | ||
| SLK | 10 | 105717 | STE20-like kinase | 3 | 0.019 | |||
| PRKCH | 14 | 60858 | protein kinase C eta | 5 | 0.008 | |||
| XYLT1 | 16 | 17108 | xylosyltransferase I | 3 | 0.048 | 4/11 | 0.043 | 0/4 |
|
| ||||||||
| CXCL14 | 5 | 134934 | chemokine ligand 14 | 4 | 0.005 | 5/5 | 0.020 | |
|
| ||||||||
| TSSC1 | 2 | 3171 | tumor sup subtrans cand 1 | 4 | 0.009 | 6/6 | 0.034 | |
| FKBP15 | 9 | 114967 | FK506 binding protein 15 | 4 | 0.006 | |||
| HSPA12A | 10 | 118419 | HSP 12A | 3 | 0.020 | 10/15 | 0.004 | |
| BRWD2 | 10 | 122600 | bromodom WD dom 2 | 3 | 0.022 | |||
| DOCK1 | 10 | 128658 | ded cytokinesis 1 | 3 | 0.051 | 0/7 | ||
| PACS1 | 11 | 65594 | Pfurin sort prot 1 | 4 | 0.010 | 7/9 | 0.019 | |
| CCDC91 | 12 | 28301 | coiled-coil dom 91 | 10 | 0.001 | 5/17 | 0.012 | |
| XPO6 | 16 | 28016 | exportin 6 | 4 | 0.006 | 12/5 | 0.011 | |
| PMAIP1 | 18 | 55718 | PMA-induced prot 1 | 3 | 0.011 | 4/4 | 0.041 | |
|
| ||||||||
| OPRD1 | 1 | 29011 | δ opioid rec 1 | 6 | 0.001 | 10/6 | 0.011 | |
| PLA2R1 | 2 | 160506 | Pipase A2 rec 1 | 4 | 0.006 | |||
| GRM7* | 3 | 6877 | metabo glut rec 7 | 3 | 0.083 | 4/5 | 0.226 | 9/34 |
| GRIK2 | 6 | 101953 | ino glut rec kainate 2 | 3 | 0.071 | 5/4 | 0.191 | 10/0 |
| OR51E1 | 11 | 4630 | olfactory rec 51 E 1 | 3 | 0.011 | |||
| LDLRAD3 | 11 | 35922 | low dens lipoprot rec A 3 | 3 | 0.042 | 13/6 | 0.020 | 6/0 |
| GRM5 | 11 | 87880 | metabo glut rec 5 | 5 | 0.011 | 20/7 | 0.015 | 5/0 |
| GRIA4 | 11 | 104986 | ino glut rec AMPA 4 | 3 | 0.048 | |||
| COLEC12 | 18 | 309 | collectin sub-fam 12 | 3 | 0.031 | |||
| INSR | 19 | 7067 | insulin rec | 3 | 0.032 | |||
|
| ||||||||
| BCAR3 | 1 | 93799 | br ca anti-est res 3 | 3 | 0.025 | |||
| TTC21B | 2 | 165905 | tetratricopept rep dom 21B | 4 | 0.020 | 8/4 | 0.118 | 10/12 |
| TIAM2 | 6 | 155453 | T-cell lymph inv met 2 | 3 | 0.032 | |||
| FAM126A | 7 | 22949 | fam seq similar 126 A | 3 | 0.022 | 6/4 | 0.032 | |
| ANO4 | 12 | 99712 | anoctamin 4 | 3 | 0.041 | 0/5 | ||
| APPL2 | 12 | 104091 | pY inter PH dom leu zip 2 | 3 | 0.021 | 9/0 | ||
|
| ||||||||
| INADL | 1 | 61980 | InaD-like | 3 | 0.048 | 0/9 | ||
| LIMCH1 | 4 | 41057 | LIM calpon homol dom 1 | 3 | 0.045 | |||
| DNAH8 | 6 | 38798 | dynein h polypept 8 | 4 | 0.012 | 5/4 | 0.090 | 4/0 |
| MYO6 | 6 | 76515 | myosin VI | 3 | 0.029 | 4/4 | 0.091 | |
| AKAP7 | 6 | 131508 | A kinase anchor prot 7 | 4 | 0.008 | 4/7 | 0.040 | |
| SYNE1 | 6 | 152484 | spectrin rep nuc env 1 | 3 | 0.058 | 26/5 | 0.007 | 6/7 |
| CADPS2 | 7 | 121746 | Ca-dep act prot secret 2 | 3 | 0.063 | |||
| CHCHD3 | 7 | 132120 | coil-coil-helix dom 3 | 3 | 0.039 | 7/4 | 0.064 | |
| MPP7 | 10 | 28382 | palmitoyl memb prot 7 | 3 | 0.033 | |||
| ABLIM1 | 10 | 116180 | actin bind LIM protein 1 | 3 | 0.040 | |||
| PARVA | 11 | 12355 | a parvin | 4 | 0.010 | 4/12 | 0.018 | |
| CSRP3 | 11 | 19160 | C G-rich prot 3 | 3 | 0.016 | |||
| FARP1 | 13 | 97593 | FERM RhoGEF pleckst 1 | 3 | 0.043 | |||
| MYO5C | 15 | 50271 | myosin VC | 3 | 0.023 | 4/9 | 0.027 | |
| FHOD3 | 18 | 32131 | formin homol 2 cont 3 | 3 | 0.056 | 0/12 | ||
|
| ||||||||
| PBX1 | 1 | 162795 | pre-B-cell leukemia TF 1 | 5 | 0.005 | |||
| AFF3 | 2 | 99530 | AF4/FMR2 3 | 3 | 0.063 | |||
| CSRNP3 | 2 | 166137 | cys-ser-rich nuclear prot 3 | 3 | 0.029 | 3/3 | 0.081 | 6/0 |
| ZNF804A | 2 | 185171 | zinc finger protein 804A | 4 | 0.011 | 4/4 | 0.141 | |
| ZNF385D | 3 | 21437 | zinc finger protein 385D | 3 | 0.042 | 12/4 | 0.033 | 0/5 |
| ZNF366 | 5 | 71774 | zinc finger protein 366 | 8 | 0.001 | 7/11 | 0.008 | |
| ETV6 | 12 | 11694 | ets variant gene 6 | 3 | 0.038 | 4/25 | 0.004 | |
| KLF12 | 13 | 73158 | Kruppel-like factor 12 | 4 | 0.015 | 4/7 | 0.094 | |
| ZNF606 | 19 | 63180 | zinc finger protein 606 | 3 | 0.017 | |||
| LDOC1L | 22 | 43267 | L zip down-reg ca 1-L | 3 | 0.012 | 5/4 | 0.028 | |
|
| ||||||||
| ATP1B1 | 1 | 167342 | Na/K transpor ATPase ß 1 | 4 | 0.004 | 4/4 | 0.053 | 0/3 |
| SLC45A2 | 5 | 33980 | solute carrier 45 2 | 4 | 0.005 | 11/0 | ||
| CFTR | 7 | 116907 | ATP-binding cassette C 7 | 3 | 0.033 | 5/0 | ||
| XKR4 | 8 | 56177 | Kell blood gp comp 4 | 3 | 0.051 | 8/8 | 0.043 | |
| SLC2A13 | 12 | 38435 | solute ligand carrier 2 13 | 6 | 0.004 | 17/4 | 0.019 | 0/8 |
| ABCC4* | 13 | 94470 | ATP-binding cassette C 4 | 4 | 0.012 | 13/4 | 0.023 | 23/30 |
| SLC10A2 | 13 | 102494 | solute ligand carrier10 2 | 3 | 0.014 | 12/7 | 0.005 | 0/8 |
|
| ||||||||
| KIAA1276 | 4 | 17242 | KIAA1276 protein | 3 | 0.029 | |||
| FLJ44606 | 5 | 126411 | FLJ44606 | 3 | 0.016 | 5/4 | 0.034 | |
| FAM184A | 6 | 119322 | fam seq sim 184 A | 3 | 0.028 | |||
| BRP44L | 6 | 166698 | brain protein 44-L | 3 | 0.015 | |||
| FRMD4A | 10 | 13725 | FERM dom 4A | 3 | 0.071 | 4/23 | 0.021 | 5/13 |
| C10orf11 | 10 | 77212 | Ch 10 ORF 11 | 3 | 0.077 | |||
| C10orf82 | 10 | 118413 | Ch 10 ORF 82 | 3 | 0.013 | 5/6 | 0.017 | |
| C19orf18 | 19 | 63161 | Ch 19 ORF 18 | 3 | 0.013 | |||
| MACROD2 | 20 | 13924 | MACRO dom 2 | 3 | 0.175 | 57/22 | 0.004 | 6/8 |
| C20orf70 | 20 | 31219 | Ch 20 ORF 70 | 3 | 0.014 | |||
| RHBDD3 | 22 | 27985 | rhomboid dom 3 | 3 | 0.014 | |||
These “converge then cluster” genes thus each contain three or more SNPs that display nominally significant allele frequency differences in both European-American (EA) and African-American (AA) polysubstance abuser vs control comparisons, that cluster less than 25 kb from each other, and that lie within the gene's exons or within +/−10 kb 3′ or 5′ flanking sequences. Genes are grouped by the class of the function to which they contribute. The numbers of reproducibly positive SNPs that lie in clusters within the gene's exons and in 10 kb genomic flanking regions are noted. Chromosome number and initial chromosomal position for the cluster (bp, NCBI Mapviewer Build 36.1) are listed. “Approach 2/Cluster then converge” genes, identified by clusters of at least 4 nominally positive SNPs that lie within 10 kb of each other and within the gene for each sample, are listed in the column labeled “2: cluster then converge”. Asterisks identify genes also identified in [16]. P values are based on the number of times, in 10,000 Monte Carlo simulation trials, that randomly selected segments of the genome that lie within genes display the same features as the actual gene identified. Relevant rs numbers for SNPs are listed in Table S2. dbGAP support lists the numbers of SNPs in the same genes that display nominally significant differences between cocaine-dependent and nondependent control AA and EA samples, based on 1M SNP Illumina individual genotyping of COGA, FSCD and COGEND samples as described in dbGAP (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gap).
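The Monte Carlo P values described above can be illustrated with a simplified Python sketch. It is only an approximation of the idea, under several assumptions: genic SNPs are represented as one ordered list of booleans marking reproducibly positive SNPs, a "randomly selected segment" is a window of consecutive SNPs of the same size as the gene of interest, and the "same features" are reduced to having at least as many positive SNPs as observed. The function name, parameters and data are hypothetical.

```python
import random
from typing import Sequence


def monte_carlo_p(flags: Sequence[bool], window: int, observed: int,
                  trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of random windows with at least `observed` positive SNPs.

    flags    : True where a genic SNP is reproducibly positive (assumed input)
    window   : number of consecutive SNPs in each random segment
    observed : number of positive SNPs seen in the gene of interest
    """
    if window > len(flags):
        raise ValueError("window is longer than the list of SNP flags")
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    last_start = len(flags) - window
    hits = 0
    for _ in range(trials):
        start = rng.randint(0, last_start)   # inclusive on both ends
        if sum(flags[start:start + window]) >= observed:
            hits += 1
    return hits / trials


# Hypothetical genome-wide flags: roughly 2% of genic SNPs are positive.
flag_rng = random.Random(1)
snp_flags = [flag_rng.random() < 0.02 for _ in range(5_000)]
p_value = monte_carlo_p(snp_flags, window=40, observed=3)
```

Smaller values indicate that random genic segments rarely show as many positive SNPs as the gene in question.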