
Haplotype structure enables prioritization of common markers and candidate genes in autism spectrum disorder.

B N Vardarajan, A Eran, J-Y Jung, L M Kunkel, D P Wall.

Abstract

Autism spectrum disorder (ASD) is a neurodevelopmental condition that results in behavioral, social and communication impairments. ASD has a substantial genetic component, with 88-95% trait concordance among monozygotic twins. Efforts to elucidate the causes of ASD have uncovered hundreds of susceptibility loci and candidate genes. However, owing to its polygenic nature and clinical heterogeneity, only a few of these markers represent clear targets for further analyses. In the present study, we used the linkage structure associated with published genetic markers of ASD to simultaneously improve candidate gene detection while providing a means of prioritizing markers of common genetic variation in ASD. We first mined the literature for linkage and association studies of single-nucleotide polymorphisms, copy-number variations and multi-allelic markers in Autism Genetic Resource Exchange (AGRE) families. From markers that reached genome-wide significance, we calculated male-specific genetic distances, in light of the observed strong male bias in ASD. Four of 67 autism-implicated regions, 3p26.1, 3p26.3, 3q25-27 and 5p15, were enriched with differentially expressed genes in blood and brain from individuals with ASD. Of 30 genes differentially expressed across multiple expression data sets, 21 were within 10 cM of an autism-implicated locus. Among them, CNTN4, CADPS2, SUMF1, SLC9A9, NTRK3 have been previously implicated in autism, whereas others have been implicated in neurological disorders comorbid with ASD. This work leverages the rich multimodal genomic information collected on AGRE families to present an efficient integrative strategy for prioritizing autism candidates and improving our understanding of the relationships among the vast collection of past genetic studies.


Year: 2013 PMID: 23715297 PMCID: PMC3669925 DOI: 10.1038/tp.2013.38

Source DB: PubMed Journal: Transl Psychiatry ISSN: 2158-3188 Impact factor: 6.222

Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental condition that results in behavioral, social and communication impairments. It is currently estimated that 1 in every 88 children in the United States is affected with ASD, with boys five times more likely to be affected than girls.[1] ASD has a substantial genetic component,[2, 3, 4] with 88–95% monozygotic twin concordance and an estimated heritability of 60–90%.[5] A recent study showed that a large proportion of the variance in liability among monozygotic twins can be explained by shared environmental factors (55% for autism and 58% for ASD) in addition to moderate genetic heritability (37% for autism and 38% for ASD).[6] Studies conclude that there are multiple genetic factors that have a role in the etiology of autism. Recent findings have provided evidence in support of roles for de novo mutations,[7, 8, 9, 10] common genetic variants,[11] rare variants[12] and copy-number variation.[13, 14, 15] Nevertheless, the genetic basis of the majority of ASD remains largely unclear. Contributing to the complexity, ASD linkage studies have uncovered over 70 susceptibility loci across the genome and a large number of gene candidates,[16, 17] but most of these findings have not been successfully replicated. The only exceptions to this trend have been linkage peaks on 17q11–17q21[18, 19, 20, 21] and 7q.[22, 23, 24, 25, 26] Yet, linkage and association studies have dominated the approaches to disentangle the genetic etiology of autism for more than two decades, leaving behind a rich legacy of research findings in the biomedical literature. Reports of significant linkage peaks represent an important clue to the genetic cause of autism that should not be ignored, even in the absence of sufficient replication. 
Aside from the possibility of false positives, absence of replication could be due to several factors, such as insufficient sample size, differential recombination rates in the replication population, lower coverage of the genetic markers within the linkage peaks in the replication samples, or batch effects. However, the mechanistic relevance of the marker should still be determined. For example, a marker may designate collections of genes involved in biological processes or individual genes with mutations of high importance to the susceptibility to autism. Furthermore, these markers and their importance to the etiology of autism, once they have achieved the minimum significance threshold of logarithm-of-the-odds of 3.0 or an association P-value of <0.05 (corrected for multiple testing), are usually treated as equal. Therefore, despite the fact that markers provide maps, the granularity of those maps is insufficient to direct prioritized experimental follow-up, as every marker, and every gene proximal to that marker, is treated as equally likely to be important. Given that markers have been identified on nearly every chromosome, the utility of linkage studies for providing specific gene leads and directing further experimental research is limited. In the present study, we have focused on maximizing the value of previously published linkage and association findings using families from the Autism Genetic Resource Exchange (AGRE) project for directing further genetic analysis of autism. Specifically, our aim was to provide finer resolution to published linkage and association studies through a novel analytical strategy focused on marker-to-gene male-specific genetic distance. Our study was loosely predicated on the assumption that genes in tight linkage with a susceptibility locus are more likely to be linked with the phenotype of interest, that is, autism, and was leveraged by the collective understanding that the disorder has a substantial male bias.
As such, our work focused on reconstructing the male-specific structure of linkage disequilibrium (LD) surrounding significant autism markers and on grouping genes into sets in tight, medium and distant LD with those markers. We examined the biological signal inherent to each set and measured its expression in peripheral blood and postmortem brain tissue from individuals with autism as compared with controls. This strategy improves the resolution of marker-based findings by pointing to the specific genes that contribute to the linkage and/or association signals and are therefore more likely to have a role in ASD. A large percentage of these genes had not been previously linked to autism but had been implicated in numerous other neurological diseases, including those with overlapping symptoms. Given the ability of this strategy to identify important and novel signal among the rich collection of research findings from various linkage and association studies in autism, we anticipate that it will have broader applications in the study of other complex genetic disorders in which a large collection of samples has been previously typed but is not immediately available for modern sequencing.

Materials and methods

Autism marker selection

We first mined the autism literature to identify genetic studies focusing on AGRE families. Owing to the focus on AGRE families, all probands included here were assessed and diagnosed using the same instruments and procedures. We identified 67 reports of significant autism linkage and association signals spanning 18 chromosomes (Table 1). Significance thresholds were a logarithm-of-the-odds score >3, which is suggestive evidence of linkage, or a corrected association P-value <0.01 (depending on the number of markers tested in the study). The search was restricted to studies performed on AGRE families because the same subjects were used to calculate the genetic map around autism markers. This strategy allowed us to capture the true rates of recombination in the studied population and avoid any potential recombination bias. As the linkage and association studies were based on various experimental designs, we developed the strategy described below to enable their meta-analysis.
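The selection rule stated above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Report` class, the function name and the three example markers are hypothetical and do not come from the study, and only the two thresholds (LOD >3, corrected P <0.01) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

LOD_THRESHOLD = 3.0   # linkage criterion from the text: LOD score > 3
P_THRESHOLD = 0.01    # association criterion from the text: corrected P < 0.01


@dataclass
class Report:
    """One published signal (hypothetical record layout)."""
    marker: str
    lod: Optional[float] = None          # linkage score, if reported
    corrected_p: Optional[float] = None  # corrected association P, if reported


def is_significant(report: Report) -> bool:
    """Return True if either the linkage or the association criterion is met."""
    linkage_ok = report.lod is not None and report.lod > LOD_THRESHOLD
    assoc_ok = report.corrected_p is not None and report.corrected_p < P_THRESHOLD
    return linkage_ok or assoc_ok


# Made-up example reports, for illustration only.
reports = [
    Report("marker_A", lod=3.4),
    Report("marker_B", corrected_p=5.0e-4),
    Report("marker_C", lod=2.1, corrected_p=0.2),
]
selected = [r.marker for r in reports if is_significant(r)]
print(selected)
```

A report that provides neither statistic is rejected, which keeps the filter conservative when the literature record is incomplete.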

Table 1

Autism markers identified in AGRE families between 2001 and 2012

Chromosome	Marker	Median marker position (bp)	Male-specific genetic map units (cMs)	P-value/LOD (association/linkage)	References
1	dup RFWD2–PAPPA2	174522115	23.3	P=1.0e−02	[37]
1	rs12740310–rs3737296–rs12410279	218873645	26.7	P=5.0e−04	[38]
1	D1S1656	228971975	40.9	NPL=3.21	[39]
1	rs6683048	235855409	37.6	P=2.3e−09	[40]
2	dup AK123120	13142782	41.5	P=3.57e−06	[37]
2	del NRXN1	50557085	20.9	P=3.30e−04	[41]
2	del NRXN1	51134122	21.7	P=4.7e−04	[37]
2	rs17420138	158585159	19.6	P=5.63e−08	[40]
2	rs1807984	168787136	22.9	P=7.0e−03	[42]
2	D2S335	172274852	22.8	HLOD=2.99 NPL=3.32	[43]
2	rs4519482	172671605	24.2	P=7.0e−05	[44]
2	204,444,539–204,446,116 LD block	204445327	26.6	P=1.8e−06	[45]
3	del CNTN4	1915556	27.0	P=4.7e−04	[37]
3	del UNQ3037	4218017	30.4	P=2.0e−03	[37]
3	D3S3045–D3S1763	138597603	23.2	Z=3.10 P<1.0e−03	[18]
3	dup NLGN1	174763176	21.5	P=1.0e−02	[37]
4	rs17599165	46634972	18.1	P=1.5e−03	[46]
4	rs1912960	46648638	18.8	P=7.3e−03	[46]
4	rs17599416	46668195	19.1	P=4.0e−03	[46]
4	rs6826933–rs17088473	61133187	17.8	HLOD=3.79 LOD=2.96	[47]
4	dup GUSBP5	144850990	22.8	P=1e−02	[37]
5	rs10513025	9676622	38.4	P=1.7e−06	[48]
5	rs1896731–rs10038113	25936438	29.3	P=3.4e−06	[38]
5	rs4307059	26003460	31.6	P=3.4e−08	[11]
5	rs11959298–rs6596189	134395753	23.2	P=4.0e−04	[49]
6	rs13193457	15453984	35.0	P=3.0e−05	[50]
6	del PARK2	162585788	31.6	P=4.7e−03	[37]
7	rs736707	102917639	21.8	P=1.40e−05	[51]
7	rs1858830	116099675	17.0	P=5.0e−06	[52]
7	rs38841	116107162	16.0	P=6.0e−04	[53]
7	rs7794745	146120539	36.7	LOD=3.4 P<2.14e−05	[54]
7	rs2710102	147205323	35.0	P=2.0e−03	[55]
7	D7S483	151829212	33.4	NPL=3.7 P=7.9e−05	[56]
7	rs1861972–rs1861973	154946830	28.1	P=3.5e−06	[57]
9	rs1340513	6967633	35.7	Zlr=3.21 P=7.0e−04	[58]
9	rs722628	7136888	35.7	Zlr=3.59 P=6.0e−03	[58]
9	rs536861	127353265	31.6	Zlr=3.30 P=5.0e−04	[58]
10	del GRID1	87945347	30.8	P=3.1e−04	[37]
11	rs2421826	35187181	24.5	Zlr=3.57	[58]
11	rs1358054	36163248	25.6	Zlr=3.77 P=8.0e−03	[58]
11	rs6590109	124264258	37.7	P=9.0e−03	[59]
12	rs1445442	63577561	25.0	HLOD=4.51	[60]
14	del MDGA2	46796374	22.1	P=1.3e−04	[41]
15	del OR4M2,OR4N	19844860	26.1	P=9.48e−12	[41]
15	del LOC650137	19915407	24.8	P=9.48e−12	[41]
15	dup UBE2A	23184355	38.4	P=9.27e−06	[41]
15	dup 15q11–13	23704547	37.5	P=1.0e−05	[37]
15	maternal dup 15q11–13	23750000	36.1	p approaching 0	[61]
15	GABRB3 155CA-2	24559869	38.2	MTDT P=2.0e−03	[62]
15	rs25409	24569934	39.2	P=8.0e−03	[63]
15	dup 15q13 BP4–BP5	29508500	42.5	p approaching 0	[64]
15	rs11855650–rs10520676	77364734	23.0	HLOD=3.09 LOD=3.62	[47]
16	FE0DBACA18ZG03v	19408579	32.6	P=1.6e−04	[65]
16	FE0DBACA7ZD06v	24133057	26.8	P=1.4e−05	[65]
16	del/dup 16p11.2	30300000	45.5	P=1.1e−04	[61, 66]
17	D17S1294–D17S1800	26183756	22.2	HLODREC=5.8 P=1.59e−07	[67]
17	D17S1294	26860299	20.9	MLS=3.2 Male-only MLS=4.3	[68]
17	D17S1299	36247989	25.7	MLS=3.6	[19]
17	D17S2180	44028199	24.7	MLS=4.1	[19]
17	rs757415 and rs12603112	46020488	22.8	P=1.9e−05	[69]
17	del BZRAP1	53747037	29.0	P=8.0e−04	[41]
19	del MADCAM1	451915	17.8	P=6.0e−04	[41]
19	rs344781	48866628	23.9	P=6.0e−03	[70]
20	rs723477	237362	22.8	NPL LOD=3.81	[48]
20	rs16999397–rs200888	958294	24.5	HLOD=3.36 LOD=3.38	[47]
20	rs4141463	14695471	34.8	P=3.7e−08	[71]
21	D21S1437	20568713	28.4	NPL=3.4 P=3.5e−04	[56]

Abbreviations: AGRE, Autism Genetic Resource Exchange; SNP, single-nucleotide polymorphism.

Linkage and association studies performed in AGRE families were compiled and genome-wide significant markers identified. The logarithm-of-the-odds (LOD) scores and/or association P-values are listed for each marker. Human genome build 36.3 was used to calibrate marker position. Male-specific genetic distances were calculated using dense SNP genotypes from the same individuals.

Each marker was first mapped to the NCBI human genome build 36.3. Then, a 20-Mb slice flanking that genomic coordinate was retrieved and the single-nucleotide polymorphisms (SNPs) within that region were used for calculating a genetic map using the same subjects' genotypes.[11] The nearest SNP to the autism marker was used as the reference for calculating recombination rates with other SNPs. The recombination rates were determined with respect to the reference. We assumed that the recombination rate between the marker and the nearest SNP was negligible, enabling us to designate that SNP as a proxy for the marker. Owing to the heterogeneity in the discovery methods of the various regions (linkage vs association, copy-number variations vs SNPs and so on), we treated each region as equally significant. This enabled us to use an unbiased approach in finding genes and regions that were enriched for autism cases.
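The windowing and proxy-SNP step described above might look like the following minimal sketch. The function names and the example coordinates are hypothetical, and the code assumes SNP positions on one chromosome are available as a sorted list of base-pair coordinates.

```python
from bisect import bisect_left, bisect_right

FLANK_BP = 10_000_000  # 10 Mb on each side gives the 20-Mb slice


def snps_in_window(sorted_positions, marker_bp, flank_bp=FLANK_BP):
    """Return the SNP positions lying within +/- flank_bp of the marker."""
    start = max(0, marker_bp - flank_bp)
    end = marker_bp + flank_bp
    lo = bisect_left(sorted_positions, start)
    hi = bisect_right(sorted_positions, end)
    return sorted_positions[lo:hi]


def nearest_snp(sorted_positions, marker_bp):
    """Return the SNP position closest to the marker (used as its proxy)."""
    if not sorted_positions:
        raise ValueError("no SNPs available")
    i = bisect_left(sorted_positions, marker_bp)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_positions[max(0, i - 1):i + 1]
    return min(candidates, key=lambda pos: abs(pos - marker_bp))


# Made-up coordinates, for illustration only.
positions = [1_000_000, 12_500_000, 19_900_000, 20_050_000, 29_000_000, 45_000_000]
marker = 20_000_000
window = snps_in_window(positions, marker)
proxy = nearest_snp(window, marker)
print(window, proxy)
```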

Calculation of LD structure of autism markers

In order to establish the male-specific LD structure between genes and autism markers, we created genetic maps from a 20-Mb slice of the chromosome flanking each linkage locus. Specifically, we collected and assembled SNPs 10 Mb upstream and 10 Mb downstream of each autism marker using the SNP data for AGRE probands.[11] As autism is almost five times more prevalent in males, we filtered out the females from the data set before calculating the genetic map. These filtration procedures followed the logic that an AGRE data specific and male-only genetic map would be the most likely to provide an accurate reflection of the samples contributing to the linkage and association signals reported in the pooled studies. To create the genetic maps for each autism marker, we estimated fine-scale recombination rates using the LDhat software package.[27] This program estimates recombination rates between adjacent SNPs by fitting a Bayesian model based on coalescent theory to analyze patterns of LD in the data. We conducted this analysis for all 67 markers, identifying the male-specific genetic distances between the marker and genes surrounding that marker, measured in cM. For further filtering, we pruned the genetic map to 15 cM around the marker. A process flow for the creation of these LD structure (LDS) sets is depicted in Figure 1.
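To make the marker-to-gene distance concrete, here is a minimal sketch of how per-interval recombination rates can be accumulated into a genetic map and queried. It assumes the rates have already been converted to cM/Mb; LDhat itself reports population-scaled rates, and that conversion is not shown. All names and numbers are hypothetical.

```python
from bisect import bisect_right


def cumulative_map(positions_bp, rates_cm_per_mb):
    """Cumulative genetic position (cM) at each SNP.

    rates_cm_per_mb[i] applies to the interval positions_bp[i]..positions_bp[i+1].
    """
    if len(rates_cm_per_mb) != len(positions_bp) - 1:
        raise ValueError("need exactly one rate per SNP interval")
    cm = [0.0]
    for i, rate in enumerate(rates_cm_per_mb):
        span_mb = (positions_bp[i + 1] - positions_bp[i]) / 1e6
        cm.append(cm[-1] + rate * span_mb)
    return cm


def genetic_position(positions_bp, cm, query_bp):
    """Linearly interpolate the genetic position of an arbitrary coordinate."""
    if query_bp <= positions_bp[0]:
        return cm[0]
    if query_bp >= positions_bp[-1]:
        return cm[-1]
    right = bisect_right(positions_bp, query_bp)
    left = right - 1
    frac = (query_bp - positions_bp[left]) / (positions_bp[right] - positions_bp[left])
    return cm[left] + frac * (cm[right] - cm[left])


def distance_cm(positions_bp, cm, reference_bp, gene_bp):
    """Genetic distance between the reference SNP and a gene coordinate."""
    return abs(genetic_position(positions_bp, cm, gene_bp)
               - genetic_position(positions_bp, cm, reference_bp))


# Made-up map, for illustration only.
positions = [0, 1_000_000, 2_000_000, 4_000_000]
rates = [1.0, 2.0, 0.5]
cm_map = cumulative_map(positions, rates)
d = distance_cm(positions, cm_map, reference_bp=1_000_000, gene_bp=3_000_000)
print(cm_map, d)
```

Pruning the map to a fixed radius around the marker then amounts to keeping only genes whose `distance_cm` value falls below the chosen cutoff.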

Figure 1

Integrative genomics workflow for prioritizing candidate genes for further experimentation. (I) The rich collection of genetic studies performed on Autism Genetic Resource Exchange (AGRE) families between 2001 and 2012 was mined to identify genome-wide significant linkage and association signals. (II) Markers were remapped to the current genome build (NCBI human genome build 36.3) and flanking regions extracted. (III) Single-nucleotide polymorphism (SNP) genotypes of AGRE male probands were compiled to enable male-specific genetic distance calculations in the same subjects. (IV) Regional recombination rates between markers and SNPs were calculated and (V) protein-coding genes within 20 male-specific cM from the markers identified. (VI) The expression profiles of these genes were examined in brain and blood of individuals with autism spectrum disorder (ASD) relative to neurotypical individuals. Genes found to be differentially expressed in both tissues and located within the male-specific vicinity of a significant autism marker are considered prime candidates for further studies. Of 30 genes that satisfy these criteria, 19 were previously implicated in disorders that share symptoms and morbidity patterns with ASD.

Messenger RNA expression data processing

Gene Expression Omnibus data sets GSE6575[28] and GSE28521[29] were used to examine the expression of genes surrounding significant autism markers in individuals with ASD. The GSE6575 data set consists of 17 samples of individuals with ASD without regression, 18 individuals with ASD with regression, 9 patients with mental retardation or developmental delay, and 12 typically developing children from the general population. In this previous study, total RNA was extracted from whole blood samples using the PaxGene (Qiagen, Germantown, MD, USA) Blood RNA System and run on Affymetrix U133plus2.0 (Santa Clara, CA, USA). For the purposes of our study, we elected to use the 35 individuals with autism and 12 control samples from the general population. Preprocessing and expression analyses were done with the Bioinformatics Toolbox Version 2.6 (for Matlab R2007a+, Mathworks, Natick, MA, USA). GeneChip Robust Multi-array Average was used for background adjustment, and control probe intensities were used to estimate nonspecific binding.[30] Housekeeping genes, gene expression data with empty gene symbols, genes with very low absolute expression values and genes with low variance were removed from the preprocessed data set. The GSE28521 data set consisted of postmortem brain tissue samples from 19 autism cases and 17 controls from the Autism Tissue Project, using the Illumina (San Diego, CA, USA) HumanRef-8 v3.0 expression beadchip panel. Three regions of the brain previously implicated in autism were profiled in each individual: superior temporal gyrus (also known as Brodmann's area 41/42), prefrontal cortex (BA9) and cerebellar vermis. Raw data were formatted with log2 transformation and normalized by quantile normalization. 
We considered probes with detection P-value<0.05 for at least half of the samples for further analysis, as described here.[29] Raw P-values were generated using limma/bioconductor package in R software (http://www.bioconductor.org/packages/2.12/bioc/html/limma.html), and Benjamini and Hochberg multiple testing correction was applied to obtain adjusted P-values.
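The brain-data preprocessing steps named above (log2 transformation, quantile normalization and the detection filter) can be sketched in plain Python as follows. This is an illustrative reimplementation with hypothetical function names and toy values, not the pipeline actually used; ties are resolved by sort order rather than averaged.

```python
import math


def log2_transform(matrix):
    """matrix[p][s] is the raw intensity of probe p in sample s."""
    return [[math.log2(v) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Force every sample (column) to share the same value distribution."""
    n_probes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[s] for row in matrix] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the r-th smallest value across all samples.
    rank_means = [sum(col[r] for col in sorted_cols) / n_samples
                  for r in range(n_probes)]
    normalized = [[0.0] * n_samples for _ in range(n_probes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_probes), key=lambda p: col[p])
        for rank, p in enumerate(order):
            normalized[p][s] = rank_means[rank]
    return normalized


def passes_detection(detection_p, alpha=0.05):
    """Keep a probe detected (P < alpha) in at least half of the samples."""
    n_detected = sum(p < alpha for p in detection_p)
    return n_detected >= len(detection_p) / 2


# Toy intensities: 3 probes x 2 samples.
raw = [[2, 4], [8, 16], [4, 32]]
norm = quantile_normalize(log2_transform(raw))
print(norm)
```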

Gene expression profiles around common autism markers

To examine the importance of genes at varying cM distances, and to examine the level of signal relevant to autism surrounding each autism marker individually, we treated each marker region as an independent hypothesis. We then examined the differential regulation of genes within LDS sets using the messenger RNA expression profiles described above. Our hypothesis is that genes at close genetic distances from autism markers will be more differentially regulated than genes not in LD with the autism markers. Our tests for significant differential expression deviated from standard analyses of microarray data for the primary reason that each LDS set reflected independent, prior biological knowledge. As such, we treated each LDS set as a separate collection of hypotheses, with the number of hypotheses being tested simultaneously equivalent to the number of genes in the set. To appropriately account for this multiple testing, we adjusted the nominal P-values using the q-value calculation,[31] a measurement framed in terms of the false discovery rate.[32] All 67 LDS sets were investigated in this way to determine the frequencies of significant, adjusted P-values (q<0.05) surrounding each autism marker.
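The per-set correction described above can be illustrated with a short sketch in which each LDS set is adjusted separately, so the number of tests equals the number of genes in that set. Note the substitution: the Benjamini-Hochberg adjustment is used here as a simple stand-in for the q-value calculation cited in the text, and all marker names, gene names and P-values are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fraction_significant(lds_sets, alpha=0.05):
    """For each marker, the share of its genes significant after adjustment."""
    result = {}
    for marker, gene_p in lds_sets.items():
        genes = list(gene_p)
        adjusted = bh_adjust([gene_p[g] for g in genes])
        n_sig = sum(q < alpha for q in adjusted)
        result[marker] = n_sig / len(genes) if genes else 0.0
    return result


# Hypothetical nominal P-values per LDS set.
lds_sets = {
    "marker_A": {"g1": 0.001, "g2": 0.01, "g3": 0.02, "g4": 0.8},
    "marker_B": {"g5": 0.04, "g6": 0.5},
}
fractions = fraction_significant(lds_sets)
print(fractions)
```

Because each set is corrected on its own, a gene's adjusted value depends on how many genes share its marker region, which is the design choice the paragraph above motivates.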

Disease cross-referencing

We mined eight existing gene-disease annotation resources for genes associated with neurological disorders considered to be closely related to autism.[33] Diseases included tuberous sclerosis, epilepsy, seizure disorder and many others with established behavioral similarities to ASD. The databases examined included the Genetic Association Database,[34] Database of Genomic Variants (http://projects.tcag.ca/variation/), dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), HuGE Navigator,[35] Human Gene Mutation Database (http://www.hgmd.cf.ac.uk/ac/index.php), Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/omim/), GeneCards (http://www.genecards.org/) and SNPedia (http://snpedia.com/index.php/SNPedia). Results from these resources were integrated to create a list of genes and associated gene characteristics, which was used for comparisons with the autism LDS genes.
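One way to picture the integration step is as a merge of (gene, disease) pairs from several resources into a single lookup, followed by annotation of the candidate genes. The sketch below is an assumption about the general shape of such a merge; the resource names, gene names and records are placeholders, and no real database API is used.

```python
from collections import defaultdict


def merge_annotations(resources):
    """Combine {resource: [(gene, disease), ...]} into gene -> disease -> resources."""
    merged = defaultdict(lambda: defaultdict(set))
    for name, pairs in resources.items():
        for gene, disease in pairs:
            # Normalize case so the same gene or disease matches across resources.
            merged[gene.upper()][disease.lower()].add(name)
    return merged


def annotate(candidates, merged):
    """List the related disorders recorded for each candidate gene."""
    return {g: sorted(merged[g]) if g in merged else [] for g in candidates}


# Placeholder records, for illustration only.
resources = {
    "resource_1": [("GENE_A", "epilepsy")],
    "resource_2": [("gene_a", "Epilepsy"), ("GENE_B", "tuberous sclerosis")],
}
merged = merge_annotations(resources)
annotated = annotate(["GENE_A", "GENE_C"], merged)
print(annotated)
```

Keeping the set of supporting resources per gene-disease pair makes it easy to see which annotations are corroborated by more than one source.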

Results

More than 200 genetic studies were conducted on AGRE families between 2001 and 2012. These were mined to identify 67 genome-wide significant linkage and association signals for ASD (Table 1). Common markers for autism span 18 chromosomes, all with a logarithm-of-the-odds score >3 or a corrected association P-value<0.01. These studies were based on various experimental designs, mostly using multiplex families with affected sib-pairs. We calibrated the positions of significant markers using NCBI human genome build 36.3, and then aggregated all SNPs within a 10-Mb window on either side of the marker to calculate the male-specific structure of LD around each marker. Examining the recombination rates in the same subjects allows us to build a population-specific genetic map, eliminating any genetic bias that might arise from considering ethnicity-matched controls. Our calculations of recombination rates and LD between SNPs and common autism markers identified a total of 1426 genes within 25 cM of the markers. Of those, 697 protein-coding genes were within 5 cM, 450 between 5 and 10 cM and 212 between 10 and 15 cM from the nearest autism locus (Figure 2). Both recombination rates and gene densities varied extensively among autism markers (28.1±7.3 cM in the 20-Mb region around markers, spanning 35.4±10.4 genes). There was a strong correlation (rho=0.7) between the size of the genetic map and the proportion of genes at distances >10 cM. The highest density of genes was around RFWD2 and PAPPA2 on chromosome 1, in a copy-number variation-associated region encoding 60 genes within 24 cM. In this region, 48% and 90% of the genes fell within 5 and 10 cM, respectively, indicating that LD was well preserved with increasing distance from the autism locus. In contrast, the region around a common copy-number variation near UNQ3037 on chromosome 3 contained 73% of its genes at a distance greater than 10 cM.

Figure 2

Number of genes within 20 cM of significant autism markers. Genetic distances were calculated using male-only Autism Genetic Resource Exchange (AGRE) proband single-nucleotide polymorphisms (SNPs).[11] Genes were grouped into three distance bins indicating the extent of recombination with the autism marker. The figure displays the number of genes in tight linkage with the marker, and therefore the extent of recombination around each locus.

The results above indicate that the information content varies by marker and genetic distance, but do not directly demonstrate whether this information is of relevance to our understanding of the genetic etiology of autism. To test directly whether specific markers and/or regions surrounding those markers are more likely to contain promising new gene leads, we examined the regulatory patterns of each LDS set independently in two expression data sets obtained from the Gene Expression Omnibus: a blood-based messenger RNA expression data set from individuals with autism and controls (GSE6575)[28] and a transcriptomic analysis of postmortem brain RNA (GSE28521). In the blood-based expression data set, although the large majority of genes showed no change in expression, 27 marker regions (40%) contained at least one gene with significant, multiple test-corrected differential expression (Table 2). More than 50% of the genes around markers on 3p26 (del CNTN4, del UNQ3037), 3q (D3S3045–D3S1763), 2q (rs17420138) and 5p (rs10513025) were differentially expressed in whole blood from individuals with ASD. In all, 79 genes were significant at q<0.05 across all the marker sets, of which 31 (39%) and 60 (76%) lie within 5 and 10 cM of the nearest autism marker, respectively, further supporting the notion that the genes proximal to the markers represent more viable autism gene leads than genes further away.

Table 2

Differential expression of genes around common autism markers

	Blood (GSE6575)		Brain (GSE28521)
Marker	Number of surrounding genes with expression data	% of genes significantly differentially expressed in individuals with ASD	Number of surrounding genes with expression data	% of genes significantly differentially expressed in individuals with ASD
del CNTN4	16	100.0	12	100.0
rs10513025	12	75.0	12	100.0
D3S3045–D3S1763	22	50.0	26	100.0
(CNV: UNQ3037)	19	89.5	21	66.7
rs11855650–rs10520676	21	28.6	18	100.0
del NRXN1 (Glessner et al.[37])	35	5.7	27	100.0
del NRXN1 (Bucan et al.[41])	32	6.3	24	100.0
rs6683048	43	4.7	30	100.0
dup AK123120	13	0.0	17	100.0
rs4307059	16	0.0	7	100.0
rs1896731–rs10038113	15	0.0	7	100.0
rs7794745	37	0.0	23	100.0
rs2710102	38	0.0	25	100.0
rs736707	30	0.0	23	100.0
rs344781	28	0.0	16	100.0
D7S483	29	0.0	21	95.2
rs1861972–rs1861973	27	0.0	19	94.7
del GRID1	28	14.3	17	94.1
del BZRAP1	19	0.0	14	92.9
rs17599165	24	0.0	13	84.6
rs1912960	24	0.0	13	84.6
rs17599416	24	0.0	13	84.6
rs757415–rs12603112	25	4.0	22	81.8
dup NLGN1	22	0.0	16	81.3
D17S2180	27	0.0	21	76.2
Chr2:204444539–204446116 LD block	25	36.0	10	70.0
rs1807984	25	0.0	20	70.0
D1S1656	42	0.0	36	66.7
del MDGA2	17	0.0	11	63.6
FE0DBACA18ZG03v	30	0.0	22	63.6
rs12740310–rs3737296–rs12410279	28	0.0	21	61.9
D2S335	28	0.0	18	61.1
rs4519482	28	0.0	18	61.1
FE0DBACA7ZD06v	24	0.0	18	61.1
rs38841	22	9.1	23	60.9
rs1858830	22	9.1	23	60.9
dup GUSBP5	17	0.0	12	58.3
rs723477	9	11.1	16	56.3
del PARK2	35	11.4	20	50.0
dup RFWD2–PAPPA2	32	0.0	24	50.0
rs6826933–rs17088473	12	0.0	10	50.0
rs16999397–rs200888	13	0.0	18	50.0
dup UBE2A	14	7.1	13	46.2
maternal dup 15q11–13	15	6.7	13	46.2
GABRB3 155CA-2	15	6.7	13	46.2
rs25409	15	6.7	13	46.2
dup 15q11–13	15	6.7	13	46.2
rs4141463	10	20.0	18	44.4
D21S1437	10	0.0	7	42.9
del/dup 16p11.2	15	0.0	12	41.7
dup 15q13 BP4–BP5	23	8.7	19	31.6
rs11959298–rs6596189	23	0.0	16	31.3
rs1358054	29	31.0	26	30.8
rs1340513	24	0.0	13	30.8
rs722628	24	0.0	13	30.8
D17S1299	20	0.0	20	30.0
rs536861	33	0.0	25	28.0
del OR4M2–OR4N	10	10.0	11	27.3
del LOC650137	10	10.0	11	27.3
rs2421826	26	46.2	23	26.1
D17S1294–D17S1800	21	0.0	17	17.6
D17S1294	21	0.0	17	17.6
rs6590109	24	0.0	18	11.1
rs13193457	27	0.0	21	9.5
rs17420138	17	100.0	17	0.0
rs1445442	24	0.0	0	0.0
del MADCAM1	8	0.0	5	0.0

Abbreviations: ASD, autism spectrum disorder; CNV, copy-number variation.

For each marker region, the table lists the percentage of genes found to be differentially expressed in blood and brain of individuals with ASD at a significance level of q<0.05.

In postmortem brain tissue data there was an abundance of signal in 64 of the 67 LDS sets, which contained at least one gene at q-value<0.05. Regions around 41 markers contained gene sets with significant differential expression, defined as >50% of genes differentially expressed in at least one brain region between individuals with ASD and matched controls at a q-value threshold of 0.05. Of 383 genes showing evidence of differential expression at q<0.05, 205 (53%) and 323 (84%) lie within 5 and 10 cM of the nearest autism marker, respectively. Four markers were found to reside within a neighborhood of differentially expressed genes in both brain and blood of individuals with ASD. At least 50% of protein-coding genes around rs10513025, D3S3045–D3S1763, del CNTN4 and del UNQ3037 are differentially expressed in both tissues (Table 2). Three of these regions, the 20-Mb windows around del CNTN4, del UNQ3037 and rs10513025, show heavy recombination and contain 73%, 68% and 47% of genes, respectively, at >10 cM. Despite significant recombination within the region, genes significantly enriched for differential expression in both data sets were those closer to the autism marker. Of 30 genes found to be significantly differentially expressed in both blood and brain of individuals with ASD, 11 and 20 were within 5 and 10 cM of the nearest autism marker, respectively. Integrating a decade of genome-wide linkage and association studies, the male bias of ASD and differential expression in both brain and blood of individuals with ASD has identified a set of 30 prime candidates for future experimentation, such as efficient targeted resequencing in very large cohorts.[36] Of these, CADPS2, CNTN4, NTRK3, SLC9A9 and SUMF1 have been previously implicated in ASD. Other differentially expressed genes within 20 male-specific cM of common autism markers have been implicated in disorders with shared symptoms and morbidity patterns, but have not yet been implicated in ASD (Table 3).

Table 3

Top candidate genes based on integrating a decade of genome-wide linkage and association studies, the autism male bias and differential expression in brain and blood of individuals with autism spectrum disorder (ASD)

	Differential expression in blood		Differential expression in brain
Gene	t-test P	FDR	t-test P	FDR	Male-specific genetic distance from marker (cM)	Association with disorders comorbid with ASD (numbers refer to the list of disorders below)
TRIM44	9.5e−02	4.8e−02	5.1e−03	1.4e−02	0.84
ITPR1	7.7e−01	9.3e−04	1.9e−03	1.5e−03	1.09	4, 7, 15, 16
IREB2	2.3e−02	4.0e−02	1.2e−02	2.8e−03	1.39	4, 10
CNTN4	1.4e−01	2.4e−02	3.2e−01	5.1e−03	1.88	4, 6, 8, 16, 18, 19
NMNAT3	2.0e−01	4.5e−02	4.6e−01	8.3e−03	2.39	10
RAB6B	2.0e−01	4.5e−02	1.0e−03	1.9e−04	3.02
CADPS2	4.1e−04	3.7e−03	1.0e−05	4.9e−05	3.34	5, 6
SPTBN1	1.5e−03	9.1e−03	1.6e−01	3.1e−06	3.56	1, 16
TMEM108	2.1e−01	4.5e−02	1.0e−01	2.8e−03	4.09
ACPL2	8.3e−02	2.8e−02	8.2e−02	2.4e−03	4.43
ADCY2	1.4e−01	3.2e−02	3.9e−02	3.9e−03	4.77	16
NSUN2	1.1e−02	1.1e−02	7.7e−01	3.8e−02	6.62	16
PANK1	1.4e−02	3.8e−02	7.7e−03	2.8e−03	7.14
SUMF1	1.4e−01	2.4e−02	4.9e−01	6.7e−03	7.31	9, 10, 13, 21, 22
TANC1	1.2e−01	1.7e−03	6.0e−02	3.8e−02	7.31	4, 6, 17, 19, 20
SLC23A2	1.6e−02	2.7e−02	2.4e−02	1.8e−02	8.35
EPB41L5	2.5e−03	1.3e−02	3.5e−02	1.8e−02	8.65
ALKBH3	1.6e−01	3.7e−02	3.0e−05	2.4e−04	9.00
SLC9A9	7.5e−02	2.8e−02	4.3e−04	1.8e−04	9.04	5, 6, 8, 15, 18, 20
NTRK3	3.0e−02	4.0e−02	7.9e−02	9.1e−03	9.67	2, 3, 5, 6, 7, 11, 15, 16, 17, 23
PLSCR4	5.4e−02	2.8e−02	8.4e−03	5.0e−04	12.30	7, 16
MYO10	1.4e−01	3.2e−02	1.4e−01	9.2e−03	12.64	10
KCNMA1	1.3e−02	3.8e−02	2.4e−01	2.2e−02	13.86	2, 8, 10, 15, 16, 18
SMYD3	4.8e−04	9.5e−03	2.6e−02	1.8e−03	14.32
ATP2B2	5.3e−02	2.4e−02	1.1e−03	9.4e−04	14.77	14, 16, 20
ALDH18A1	9.1e−03	3.8e−02	6.0e−01	4.1e−02	15.93	8, 10, 12, 18, 19, 20
LMCD1	1.3e−01	2.4e−02	5.4e−01	6.7e−03	16.72
ATG7	2.7e−01	4.5e−04	3.4e−04	7.1e−04	16.78	10
SYN2	2.6e−01	2.6e−02	4.6e−03	2.9e−03	18.41	7, 8, 15, 16
MKRN2	2.3e−01	2.6e−02	2.0e−02	9.5e−03	18.70

Abbreviation: FDR, false discovery rate.

Listed are genes located within 20 male-specific cM of genome-wide significant autism markers, which are also differentially expressed in both brain and blood of individuals with ASD. Of these, 19 genes (63%) were previously implicated in neurological disorders with high degrees of overlap in symptomatology and morbidity to ASD.

List of disorders: (1) neurofibromatosis, (2) tuberous sclerosis, (3) anxiety disorders, (4) ataxia, (5) attention deficit disorder, (6) autistic disorder, (7) bipolar disorder, (8) seizures, (9) cerebral palsy, (10) dementia, (11) depressive disorder, (12) Down syndrome, (13) dystonia, (14) encephalomyelitis, (15) epilepsy, (16) schizophrenia, (17) hydrocephalus, (18) mental retardation, (19) microcephaly, (20) multiple sclerosis, (21) neuroacanthocytosis, (22) neuroaxonal dystrophies, (23) obsessive-compulsive disorder.
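The summary counts quoted for Table 3 can be recomputed from the table itself. In the sketch below, the gene names and distances are transcribed from Table 3, and the boolean flag records whether the table lists at least one comorbid disorder for the gene; the variable names are only illustrative.

```python
# (gene, male-specific distance in cM, has a comorbid-disorder annotation)
TABLE3 = [
    ("TRIM44", 0.84, False), ("ITPR1", 1.09, True), ("IREB2", 1.39, True),
    ("CNTN4", 1.88, True), ("NMNAT3", 2.39, True), ("RAB6B", 3.02, False),
    ("CADPS2", 3.34, True), ("SPTBN1", 3.56, True), ("TMEM108", 4.09, False),
    ("ACPL2", 4.43, False), ("ADCY2", 4.77, True), ("NSUN2", 6.62, True),
    ("PANK1", 7.14, False), ("SUMF1", 7.31, True), ("TANC1", 7.31, True),
    ("SLC23A2", 8.35, False), ("EPB41L5", 8.65, False), ("ALKBH3", 9.00, False),
    ("SLC9A9", 9.04, True), ("NTRK3", 9.67, True), ("PLSCR4", 12.30, True),
    ("MYO10", 12.64, True), ("KCNMA1", 13.86, True), ("SMYD3", 14.32, False),
    ("ATP2B2", 14.77, True), ("ALDH18A1", 15.93, True), ("LMCD1", 16.72, False),
    ("ATG7", 16.78, True), ("SYN2", 18.41, True), ("MKRN2", 18.70, False),
]

n_genes = len(TABLE3)
within_5 = sum(dist <= 5.0 for _, dist, _ in TABLE3)
within_10 = sum(dist <= 10.0 for _, dist, _ in TABLE3)
n_comorbid = sum(flag for _, _, flag in TABLE3)
pct_comorbid = round(100 * n_comorbid / n_genes)

print(n_genes, within_5, within_10, n_comorbid, pct_comorbid)
```

The tallies agree with the Results text: 11 and 20 of the 30 genes lie within 5 and 10 cM, and 19 genes (63%) carry a comorbid-disorder annotation.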

Discussion

Despite the high heritability of autism, efforts to identify its genetic causes have enjoyed only limited success. Numerous susceptibility loci have been identified, yet few have been replicated, supporting the notion that the genetic complexity of this disorder outmatches the proportion of the population with autism that has been sampled to date. Until the sampling adequately covers the diversity of genetic systems underlying ASD, we must develop analytical approaches to make optimal use of existing results. To this end, we focused here on the development of a simple strategy aimed at targeting previously published autism markers, as well as genes genetically proximal to those markers and most likely to be causally related to ASD. By coupling the structure of LD with knowledge of biological process and patterns of gene expression data from individuals with ASD, we were able to identify a set of markers and genes proximal to those markers likely to be most informative to the genetic basis of autism. Specific loci on a few chromosomes including three signals on chromosome 3 and one on chromosome 5 yielded the greatest signal, with a sizable percentage of adjacent genes showing highly significant differential expression in blood and brain data from individuals with autism. In support of their relevance to the genetics of autism, many of the differentially expressed genes closely linked to the markers have already been identified as promising autism gene candidates, such as CNTN4, CADPS2, SUMF1, NTRK3 and SLC9A9. In addition, an even greater percentage of these genes have been linked to neurological diseases with high comorbidity and behavioral similarities to ASD. Overall, our strategy provides a means for meta-analysis of previous linkage and association studies to prioritize both markers and adjacent genes for further experimental analysis. 
Although our results corroborate the general rule of thumb that genes close to loci identified via linkage and association studies are likely to be informative about the disease under study, they also show that this rule applies only to specific markers. Given its successful application to autism, we expect that our analytical strategy could be of general use in the study of other similarly complex genetic diseases, such as Alzheimer's disease and type 1 diabetes.
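
The prioritization logic summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy gene and marker positions, and the choice of a one-sided hypergeometric test for enrichment are all assumptions made for illustration. Only the 10 cM window and the idea of testing implicated regions for an excess of differentially expressed genes come from the text.

```python
from math import comb


def genes_near_markers(gene_pos_cm, marker_pos_cm, window_cm=10.0):
    """Return the set of genes within window_cm of any marker on the same chromosome.

    gene_pos_cm:   dict mapping gene name -> (chromosome, genetic position in cM)
    marker_pos_cm: list of (chromosome, genetic position in cM)
    Positions are assumed to come from a single genetic map (hypothetical input).
    """
    near = set()
    for gene, (chrom, pos) in gene_pos_cm.items():
        for m_chrom, m_pos in marker_pos_cm:
            if chrom == m_chrom and abs(pos - m_pos) <= window_cm:
                near.add(gene)
                break
    return near


def hypergeom_enrichment_p(total_genes, total_de, region_genes, region_de):
    """One-sided hypergeometric tail: P(X >= region_de).

    X is the number of differentially expressed (DE) genes seen when
    region_genes are drawn without replacement from total_genes, of which
    total_de are DE. The choice of test is an assumption of this sketch.
    """
    upper = min(total_de, region_genes)
    denom = comb(total_genes, region_genes)
    tail = sum(
        comb(total_de, k) * comb(total_genes - total_de, region_genes - k)
        for k in range(region_de, upper + 1)
    )
    return tail / denom


# Toy data with hypothetical gene names and positions.
genes = {"GENE_A": ("3", 5.0), "GENE_B": ("3", 40.0), "GENE_C": ("5", 12.0)}
markers = [("3", 10.0), ("5", 30.0)]
near = genes_near_markers(genes, markers)

# Toy enrichment: 10 genes genome-wide, 3 DE; a region holds 4 genes, 2 DE.
p_value = hypergeom_enrichment_p(10, 3, 4, 2)
```

In this toy example only GENE_A falls within 10 cM of a marker on its own chromosome. A real analysis would take positions from the genetic map used in the study and would correct the resulting p-values for the number of regions tested.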

