
Population-based genetic effects for developmental stuttering.

Hannah G Polikowsky¹, Douglas M Shaw¹, Lauren E Petty¹, Hung-Hsin Chen¹, Dillon G Pruett², Jonathon P Linklater³, Kathryn Z Viljoen⁴, Janet M Beilby⁴, Heather M Highland⁵, Brandt Levitt⁶, Christy L Avery⁵,⁶, Kathleen Mullan Harris⁶,⁷, Robin M Jones², Jennifer E Below¹, Shelly Jo Kraft⁸.

Abstract

Despite a lifetime prevalence of at least 5%, developmental stuttering, characterized by prolongations, blocks, and repetitions of speech sounds, remains a largely idiopathic speech disorder. Family, twin, and segregation studies overwhelmingly support a strong genetic influence on stuttering risk; however, its complex mode of inheritance combined with thus-far underpowered genetic studies contribute to the challenge of identifying and reproducing genes implicated in developmental stuttering susceptibility. We conducted a trans-ancestry genome-wide association study (GWAS) and meta-analysis of developmental stuttering in two primary datasets: The International Stuttering Project comprising 1,345 clinically ascertained cases from multiple global sites and 6,759 matched population controls from the biobank at Vanderbilt University Medical Center (VUMC), and 785 self-reported stuttering cases and 7,572 controls ascertained from The National Longitudinal Study of Adolescent to Adult Health (Add Health). Meta-analysis of these genome-wide association studies identified a genome-wide significant (GWS) signal for clinically reported developmental stuttering in the general population: a protective variant in the intronic or genic upstream region of SSUH2 (rs113284510, protective allele frequency = 7.49%, Z = −5.576, p = 2.46 × 10⁻⁸) that acts as an expression quantitative trait locus (eQTL) in esophagus-muscularis tissue by reducing its gene expression. In addition, we identified 15 loci reaching suggestive significance (p < 5 × 10⁻⁶). This foundational population-based genetic study of a common speech disorder reports the findings of a clinically ascertained study of developmental stuttering and highlights the need for further research.


Keywords: complex trait; genome-wide association study; population-study; stuttering

Year: 2021 PMID: 35047858 PMCID: PMC8756529 DOI: 10.1016/j.xhgg.2021.100073

Source DB: PubMed Journal: HGG Adv ISSN: 2666-2477

Introduction

Speech and language represent an integral component of the human experience; we unite language (what we say) and speech (how we say it) to express our thoughts, feelings, and experiences with one another. Successful verbal communication requires coordination among neurological, cognitive, motor, and linguistic systems; dysregulation across or among any of these systems may result in disordered speech and language.[2, 3, 4] Developmental stuttering is a common speech disorder with an onset between 2 and 5 years of age characterized by prolongations, blocks, and repetitions of speech sounds. Although most studies report a 5% lifetime prevalence of stuttering,[6, 7, 8, 9] the true prevalence may be higher due to restrictive reporting criteria and poor subject selection, particularly since onset and recovery can be transient during early childhood. Individuals who are affected by the condition either stutter into adulthood (persistent stuttering) or stutter during early childhood but recover with the assistance of therapy, or spontaneously, typically before age 8 years (recovered stuttering). Persistent developmental stuttering afflicts approximately 1% of the adult population, which equates to more than 2.5 million adults afflicted with developmental stuttering in the United States. This common speech condition impacts the quality of life for many. Persistent stuttering has no known cure, and therapy for affected individuals often results in only a modest reduction in severity. Moreover, those who stutter frequently require a lifetime of therapy to manage the speech challenges as well as the psycho-social impact.[15, 16, 17] Job performance and employability in adults who stutter can be affected, leading to substantial economic impacts.[18, 19, 20] Despite extensive research on the psychological and economic consequences of this speech disorder, the etiology of developmental stuttering remains elusive.
Current evidence suggests neurological, biological, and genetic underpinnings for stuttering,[23, 24, 25, 26, 27, 28, 29, 30] though few causal associations have been identified to date. Even though multiple studies in the past few decades [30, 31, 32, 33] evince a genetic predisposition for developmental stuttering, its genetic etiology and architecture remain largely elusive. Family, twin, and segregation studies overwhelmingly support a strong genetic influence on stuttering risk; many individuals who stutter have a family member who also stutters. However, heritability estimates of developmental stuttering have varied widely across studies,[34, 35, 36, 37, 38] with estimates ranging from 0.42 to 0.84 in the two largest twin studies, each comprising a sample size greater than 20,000 individuals. Although heritability estimates from twin studies of developmental stuttering point to genetic causes, such estimates also indicate that environmental factors contribute to developmental stuttering. Monozygotic twin concordance rate estimates range from 38% to 62% in these two studies. Nevertheless, many studies of other complex disorders (e.g., type 2 diabetes [MIM: 125853], serum lipid levels, Parkinson disease, and Alzheimer disease [MIM: 104300]) with similar or smaller heritability estimates have discovered genetic risk factors essential to understanding the molecular basis of the trait, suggesting that similar genetic study designs may offer key insights into the etiology of developmental stuttering. To date, published literature investigating genetic contributions to developmental stuttering has primarily drawn on family-based analyses and studies of population isolates.[23, 24, 25, 26, 27, 29, 30, 31] Linkage and other family-based approaches have been successful at identifying rare and private causal variants with large genetic effects in the absence of genetic heterogeneity.
For developmental stuttering, identifying the causal gene(s) within and across families has proven challenging. For example, in 2005 Riaz et al. performed linkage analyses in 46 consanguineous Pakistani families where stuttering occurred in at least two generations and diagnosis was confirmed independently by two different clinicians; they discovered a region on 12q23.3 linked with developmental stuttering in a single family without pinpointing an exact causal gene. Five years later, in 2010, Kang et al. reported the results from a follow-up study of 77 unrelated Pakistani individuals who stutter plus unrelated cases from the same 46 Pakistani families interrogated by Riaz et al. in 2005; their investigation pinpointed three causal genes critical for the mannose-6-phosphate lysosomal targeting pathway: GNPTAB (MIM: 607840), GNPTG (MIM: 607838), and NAGPA (MIM: 607985). In 2018, Kazemi et al. performed Sanger sequencing and homozygosity mapping for 25 Iranian families afflicted by developmental stuttering and identified an additional 3 variants in GNPTAB and GNPTG that co-segregated with stuttering. Additional studies have revealed several regions across the genome linked with the trait but only identified three candidate risk genes: DRD2 (MIM: 126450), AP4E1 (MIM: 607244), and CYP17A1 (MIM: 609300). Lan et al. performed an association study focusing specifically on dopaminergic gene haplotypes and allele frequencies among SNPs in the Han Chinese population and identified risk and protective alleles in DRD2. These results were not replicated in 2011 by Kang et al. in a case-control cohort from Brazil and western Europe. In 2015, Raza et al. used whole-exome sequencing to identify two heterozygous AP4E1 coding variants that co-segregated with persistent developmental stuttering in a large Cameroonian family (the same polygamous family as published in their earlier work from 2013); they also observed these same two variants in unrelated Cameroonians with persistent stuttering. Although Raza et al. also reported 23 additional rare variants (including loss-of-function variants) within AP4E1 among unrelated stuttering individuals from Cameroon, Pakistan, and North America, their findings have yet to be replicated by another group. In 2017, Mohammadi et al. performed a case-control study of the Kurdish population aged 3 to 9 years from Western Iran, specifically focusing on the dimorphic nature of stuttering, and identified an allelic polymorphism associated with stuttering susceptibility in CYP17A1, a gene integral for the synthesis of steroid hormones. As reported by Frigerio Domingues et al. in 2019, these results were not replicated in an independent case- and population-matched control association study from the United States, Brazil, Pakistan, and Cameroon.
Despite these efforts, the molecular pathophysiology of developmental stuttering in general populations remains obscure, in part due to the dearth of studies exploring common genetic risk factors in unrelated individuals and the lack of consensus across studies. The International Stuttering Project (ISP) was formed to represent global outbred populations of individuals who stutter, specifically to illuminate the genetic etiology of stuttering and broaden investigations of its diverse and variable phenotype (see Web resources). Given the success of such investigations for other heritable complex diseases, genome-wide association analyses of developmental stuttering are poised to provide insights into its molecular basis.
Moreover, prior investigations into the genetics underlying developmental stuttering have comprised samples and study designs ill-equipped to detect common variant effects or reconcile genetic heterogeneity. Our study accommodates both. Here, we accrued a global and multiethnic clinically ascertained developmental stuttering case set through the ISP and report genome-wide significant (GWS) findings in a meta-analysis study of developmental stuttering.

Material and methods

Studies

The multiethnic genome-wide association study (GWAS) meta-analysis included studies with genotype data from clinically ascertained individuals with developmental stuttering from the ISP and their sex- and ancestry-matched control subjects (n = 8,104; n cases = 1,345) and summary statistics (n = 8,357; n cases = 785) from The National Longitudinal Study of Adolescent to Adult Health (Add Health). The ISP comprises 1,345 clinically ascertained developmental stuttering cases collected from the Curtin Stuttering Treatment Clinic in Perth, Australia; the SpeechMatters Clinic and the Irish Stammering Association, in Dublin, Ireland; the National Stuttering Association, USA; online recruitment on reddit.com; and Dr. Shelly Jo Kraft’s research group at Wayne State University (Table 1). Stuttering status was confirmed in all affected individuals by a speech pathologist with expertise in fluency disorders. Up to five ancestry- and sex-matched population-based control subjects per affected individual were drawn from BioVU (n control subjects = 6,759; Table 1), Vanderbilt University Medical Center’s (VUMC’s) electronic health record (EHR)-linked biobank; 49 of the 6,759 control subjects were genotyped unaffected family members of affected individuals. Vanderbilt University Medical Center has recruited and consented individuals to join BioVU since February 2007. The electronic health record at Vanderbilt University Medical Center offers de-identified demographic data, clinical notes, electronic orders, laboratory measurements, ICD-9 CM/ICD-10 disease diagnosis codes, and CPT codes. Using electronic health records, individuals with diagnoses of developmental, speech, or language disorders as identified via ICD-9 and ICD-10 codes (Table S1) or a phenome risk classifier, and individuals under age 18 years, were excluded as potential control subjects.
To select ancestry-matched control subjects, we calculated eigenvectors and eigenvalues through principal component analysis (PCA) run in PLINK v.1.90. PCA was performed on the maximally unrelated set of affected individuals and potential control subjects (estimated pairwise relatedness < 0.09375, as identified by PRIMUS) using a panel of SNPs in low linkage disequilibrium (LD); additional related affected individuals and potential control subjects were projected along each of the calculated eigenvectors. Pairwise Euclidean distance between each affected individual and potential control subject was calculated using principal components to identify the control subject with the smallest Euclidean pairwise distance for each affected individual. This control subject selection method included outlier pruning, which removed any potential control subjects with a pairwise distance more than 2 standard deviations away from the average pairwise distance. Control subject selection also matched according to sex. Age information was not available for 171 samples, and age was therefore excluded as a covariate during control subject selection.
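
The matching procedure described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `match_controls`, the input layout, the greedy case-by-case order, and the reading of the outlier rule as "drop matched pairs whose distance exceeds the mean plus 2 standard deviations of all matched distances" are assumptions made for illustration only.

```python
import math
from statistics import mean, stdev


def match_controls(cases, controls, k=5, sd_cutoff=2.0):
    """Greedy nearest-neighbour matching in principal-component space.

    cases / controls: dict mapping sample id -> (sex, tuple of PC values).
    Returns a dict mapping each case id to a list of up to k control ids.
    All names and the exact pruning rule are hypothetical.
    """
    available = dict(controls)
    pairs = []  # (case id, control id, Euclidean distance)
    for case_id, (sex, pcs) in cases.items():
        # Candidate controls must share the case's sex and be unused so far.
        candidates = sorted(
            (math.dist(pcs, c_pcs), c_id)
            for c_id, (c_sex, c_pcs) in available.items()
            if c_sex == sex
        )
        for distance, c_id in candidates[:k]:
            pairs.append((case_id, c_id, distance))
            del available[c_id]

    # Outlier pruning: discard pairs far from the average matched distance.
    distances = [d for _, _, d in pairs]
    cutoff = math.inf
    if len(distances) > 1:
        cutoff = mean(distances) + sd_cutoff * stdev(distances)

    matched = {case_id: [] for case_id in cases}
    for case_id, c_id, distance in pairs:
        if distance <= cutoff:
            matched[case_id].append(c_id)
    return matched
```

A greedy pass is the simplest option; an optimal assignment method could yield different matches, and the text does not state which approach was used.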

Table 1

Demographic distribution for subjects used in genome-wide association analyses

ISP GWAS			Add Health GWAS
	Cases	Controls		Cases	Controls

Total	1,345	6,759	Total	785	7,572
	n (%)			n (%)
Male	965 (71.7)	4,780 (70.7)	Male	446 (56.8)	3,419 (45.2)
Female	380 (28.3)	1,979 (29.3)	Female	339 (43.2)	4,153 (54.8)

Ancestry			Ancestry

African	68 (5.1)	388 (5.7)	Non-Hispanic Black	182 (23.2)	1,522 (20.1)
Hispanic	38 (2.8)	131 (1.9)	Hispanic	122 (15.5)	1,055 (13.9)
East Asian	42 (3.1)	113 (1.7)	Asian	44 (5.6)	404 (5.3)
European	1,132 (84.2)	5,875 (86.9)	Non-Hispanic white	433 (55.2)	4,559 (60.2)
South Asian	44 (3.3)	143 (2.1)	Native American	4 (0.5)	32 (0.4)
Other/mixed	21 (1.6)	109 (1.6)

			Age, years (std)	28.44 (1.77)	28.52 (1.81)

For the ISP analysis, ancestry was determined through principal component analysis, and approximately 5 controls were selected for each case, matching on ancestry and sex. For the Add Health GWAS, ancestry was determined through principal component analysis, and affection status was self-reported by each subject.

ISP studies (protocol 0225119MP2E) were approved by Wayne State University’s Committee for the Protection of Human Subjects. The studies were explained to all participants and written informed consent was obtained. Genotyping services were provided by Vanderbilt Technologies for Advanced Genomics (VANTAGE). Analysis of deidentified ISP and BioVU data in this study was approved under an institutional review board (IRB) exemption by Vanderbilt University’s Committee for the Protection of Human Subjects (IRB #180583). Use of BioVU data was approved by the Vanderbilt Institute for Clinical and Translational Research (BV247, BV247_A1).

Add Health represents an ongoing, nationally representative, longitudinal study of the social, behavioral, and biological factors influencing health and developmental trajectories from early adolescence into adulthood. Add Health collected demographic and health survey data as well as in-home physical and biological data from participants. For our study, 785 self-reported stuttering cases were defined as participants who at one point answered “yes” to the following survey question: “Do you have a problem with stuttering or stammering?” For control subjects, 7,572 participants were included in the analysis. All control individuals answered “no” to the above question and did not mark “delayed speech or other problems with speaking or understanding” or “I don’t know” to the same query across all study visits.

Genotyping, quality control, and imputation

The ISP cases and controls were genotyped using the Illumina Expanded Multi-Ethnic Genotyping Array (MEGAex) at Vanderbilt University Medical Center’s core facility, VANTAGE. Sample and variant filtering for quality control was performed using PLINK v.1.90; initial filtering thresholds for the control cohort excluded variants with a call rate less than 98% and samples with a call rate less than 97%. Initial filtering for stuttering cases excluded variants and samples with a call rate less than 90%. Duplicate variants and indels (insertions and deletions) were removed, as were any duplicate samples (the duplicate sample with the lower call rate was removed), in both cases and controls. We applied the methods described by Pluzhnikov et al. to identify possible plate or batch effects prior to merging genotype batches. Cases were separately assessed for quality control by ancestry group (European, Admixed American, African, East Asian), assigned using principal components calculated after merging the case samples with HapMap3 reference data for ancestry classification. Each ancestry group was subsequently analyzed, applying a group-specific minor allele frequency filter of 1%, variant missingness filter of 3%, and sample missingness filter of 5%, as well as checks for heterozygosity, sex, and variants that deviated strongly from Hardy-Weinberg equilibrium (HWE) (variants with a HWE p < 1 × 10⁻¹⁵ were removed). Cases were then merged with their selected matched controls for imputation according to standard protocols and specifications outlined for the TOPMed Imputation Server, including using William Rayner’s pre-imputation data preparation toolkit (see Web resources). Relatedness checks for cases and controls were performed using PRIMUS.
All autosomal chromosomes were imputed on the TOPMed Imputation Server using EAGLE_v.2.4 phasing, Minimac4 imputation, and the TOPMed reference. Post-imputation quality control filtering included removal of variants with a minor allele frequency (MAF) less than 1%, imputation r² less than 0.4, or an effective n (n_eff = 2 × MAF × (1 − MAF) × n × r²) less than 30. In Add Health, samples were genotyped on the Illumina Omni Quad 1 and 2.5 and imputed on the Michigan Imputation Server using Minimac2 and the 1000 Genomes (Phase3v.5) reference. Post-imputation quality control filtering included selecting variants with a MAF above 1%, as well as removing all variants with an imputation r² less than 0.4 or with an effective n less than 30. The experimental workflow is depicted in Figure S1.
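
As an illustration of the post-imputation filter, the sketch below evaluates the effective n formula given above and applies the three stated thresholds. The function names are hypothetical, and treating values exactly at a threshold as passing is an assumption, since the text only specifies removal of values "less than" each cutoff.

```python
def effective_n(maf, n, r2):
    """Effective sample size: 2 x MAF x (1 - MAF) x n x r2."""
    return 2.0 * maf * (1.0 - maf) * n * r2


def passes_post_imputation_qc(maf, n, r2, min_maf=0.01, min_r2=0.4, min_neff=30.0):
    """Return True if a variant survives the MAF, imputation r2, and effective n filters."""
    return (
        maf >= min_maf
        and r2 >= min_r2
        and effective_n(maf, n, r2) >= min_neff
    )


# Example: a variant at the MAF and r2 limits in a sample of 8,104 individuals.
print(passes_post_imputation_qc(maf=0.01, n=8104, r2=0.4))
```

With n = 8,104, a variant sitting exactly at both the MAF and r² limits still has an effective n above 30, so in a sample of this size the effective n criterion mainly matters for smaller subsets.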

Statistical analysis

In the clinically ascertained developmental stuttering set, ∼9 million imputed variants (Figure S2; Table S2) were analyzed for association with stuttering risk using a frequency-based additive logistic mixed model via SAIGE, a method applied and developed for biobank data in order to accommodate imbalanced case-control ratios and sample relatedness. Association analysis corrected for population substructure by using six trait-associated principal components capturing genetic ancestry as covariates. In Add Health, ∼9 million imputed variants (Figure S3; Table S3) were analyzed for association with stuttering status using a frequency-based additive logistic model via SUGEN. Model covariates included ten ancestry-associated principal components and age.

Meta-analysis

Meta-analysis was performed combining results of the ISP and Add Health studies across 7,275,796 variants imputed in both datasets. Summary statistics (direction of effect and observed p value) from each contributing GWAS were combined across studies to calculate a signed Z-score using METAL. The sample size weighting scheme was used in this meta-analysis, since effect size estimates and standard errors were not comparable between the two GWASs. Annotated associations from study-specific and meta-analyses were variants that reached genome-wide significance (p < 5 × 10⁻⁸) or were suggestive (p < 5 × 10⁻⁶). Variants were aligned to human genome reference build 38. The genome-wide significance threshold of p < 5 × 10⁻⁸ was set according to field standards. This threshold uses a Bonferroni correction where α = 0.05 and assumes there are approximately 1 million independent (i.e., not in linkage disequilibrium) common signals across the human genome. The suggestive threshold of p < 5 × 10⁻⁶ assumes an expectation of one false-positive association per GWAS (i.e., 1/total number of independent SNPs).
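
The sample size scheme can be sketched as follows: each study's two-sided p value and direction of effect are converted to a signed Z-score, and the Z-scores are combined with weights equal to the square root of each study's sample size. This is a minimal re-implementation for illustration, not METAL itself; the function names are hypothetical, and the example inputs are the rounded per-study values for rs113284510 reported in Results.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Bonferroni threshold for ~1 million independent common signals.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def signed_z(p_value, effect_direction):
    """Convert a two-sided p value and an effect sign (+1 or -1) to a signed Z-score."""
    magnitude = -_STD_NORMAL.inv_cdf(p_value / 2.0)
    return math.copysign(magnitude, effect_direction)


def sample_size_meta(studies):
    """Combine (p_value, effect_direction, n) tuples; return (Z, two-sided p)."""
    numerator = 0.0
    weight_sq_sum = 0.0
    for p_value, direction, n in studies:
        weight = math.sqrt(n)  # weight_i = sqrt(N_i)
        numerator += weight * signed_z(p_value, direction)
        weight_sq_sum += weight ** 2
    z = numerator / math.sqrt(weight_sq_sum)
    p = 2.0 * _STD_NORMAL.cdf(-abs(z))
    return z, p


# Add Health (p = 2.23e-7, protective, n = 8,357) and ISP (p = 0.0059, protective, n = 8,104).
z, p = sample_size_meta([(2.23e-7, -1, 8357), (0.0059, -1, 8104)])
print(round(z, 2), p < GENOME_WIDE_ALPHA)
```

With these rounded inputs the combined Z comes out at roughly −5.6, close to but not identical to the reported −5.576; the gap plausibly reflects rounding of the published per-study values or weighting details not captured in this sketch.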

Annotation

We annotated top associated variants from study-specific and meta-analyses using ANNOVAR. The data accessed from the Genotype Tissue Expression (GTEx) portal in July 2021 derive from the following version: dbGaP: phs000424.v8.p2.

Genetic heritability calculation

Genome-wide SNP-based liability-scaled heritability within our ISP set was calculated through a genomic-relatedness-based restricted maximum-likelihood (GREML) approach implemented in GCTA software. Variance estimates on the observed scale were transformed to the underlying liability scale, with the expected population prevalence set to 0.01 based conservatively on estimated prevalence among the general adult population. Heritability estimates included 485,698 genotyped variants that passed all quality control metrics prior to imputation (see Genotyping, quality control, and imputation) among the PRIMUS-identified maximum unrelated set (up to third degree), which included 7,768 individuals (1,095 cases). We corrected for individual sex and the first six principal components (see Genotyping, quality control, and imputation).
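
The observed-to-liability transformation can be illustrated with the standard formula for ascertained case-control samples, h²_liability = h²_observed × K²(1 − K)² / (z² × P(1 − P)), where K is the population prevalence, P is the proportion of cases in the sample, and z is the height of the standard normal density at the liability threshold. The sketch below assumes this is the transformation applied; the function name is hypothetical, and the inputs are the observed-scale estimate and sample counts reported in this paper.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Transform observed-scale heritability to the liability scale."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold for prevalence K
    z = std_normal.pdf(threshold)                     # normal density at that threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Observed-scale estimate 0.791, prevalence 0.01, 1,095 cases among 7,768 individuals.
print(round(liability_scale_h2(0.791, 0.01, 1095 / 7768), 2))
```

Under these assumptions the result is approximately 0.90, in line with the liability-scaled estimate of 0.902 reported in Results.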

Functional analyses

We performed a Bayesian colocalization analysis between our Add Health-ISP meta-analysis top hits and tissue-specific eQTL signals from GTEx v.8 data using fast enrichment aided colocalization analysis (fastENLOC). We looked for colocalization solely in meta-analysis regions containing a variant identified as a top hit (Table 2). Evaluated regions in the meta-analysis included all sentinel variants, any other variants found in the same LD block, and any variants near an identified protein-coding gene (a gene <250 kb upstream or downstream of the sentinel variant). LD blocks in the meta-analysis data were defined according to European-based LD calculated from 1000 Genomes phase 1 data by Berisa and Pickrell. Colocalization analysis was tissue-specific and included all stuttering-relevant tissues available in GTEx v.8 (skeletal muscle, pituitary, minor salivary gland, all esophageal tissues, and all brain tissues). We reported the results of any colocalization signal with a regional colocalization probability (RCP) (i.e., the probability that one of two SNPs in an LD block is responsible for a genuine association) ≥ 0.05 (Table S6).

Table 2

Top hits from Add Health and ISP meta-analysis

rsID	CHR	Position	Effect allele	Other allele	EAF	Z score	p value	Nearest gene	Location
rs113284510ᵃ	3	8683501	T	C	0.0749	−5.576	2.46E−08	SSUH2	intronic or genic upstream transcript
rs34919320	1	217480104	G	A	0.2602	5.128	2.93E−07	GPATCH2	intronic
rs58528263	4	17105854	G	T	0.249	5.102	3.37E−07	LDB2	207 kb downstream
rs2938894	9	78480092	A	T	0.030	−5.004	5.63E−07	PSAT1	150 kb downstream
rs1011275	2	53611701	G	T	0.2393	4.804	1.55E−06	ASB3	58 kb upstream
rs4282275	5	54451477	A	G	0.1347	4.783	1.72E−06	HSPB3	4 kb upstream
rs6547085	2	76410210	G	A	0.0532	4.752	2.01E−06	NA	NA
rs16855942	4	43363575	A	G	0.3616	−4.685	2.79E−06	NA	NA
rs10994385	10	46038901	C	G	0.3678	−4.661	3.14E−06	MSMB	intronic
rs35612603	2	98486856	A	G	0.1321	4.659	3.18E−06	INPP4A	intronic
rs16954038	15	70000732	G	C	0.0371	4.64	3.48E−06	TLE3	47 kb upstream
rs11158418	14	62490392	T	A	0.4699	4.633	3.60E−06	KCNH5	209 kb upstream
rs1446110	2	80290628	C	A	0.3264	−4.622	3.79E−06	CTNNA2	intronic
rs10779884	2	112130529	A	G	0.4826	4.62	3.83E−06	FBLN7	8 kb upstream
rs115327327	2	61440832	T	A	0.2722	4.574	4.78E−06	USP34	intronic
rs111962436	2	223438584	A	T	0.0261	−4.571	4.85E−06	SCG2	158 kb upstream

Genome-wide association summary statistics from Add Health and ISP stuttering studies meta-analyzed using METAL. Sentinel variants from loci with p < 5 × 10⁻⁶ are reported along with nearest gene annotation. NA (not available) is reported for variants where the nearest protein-coding gene was more than 250 kb away (either upstream or downstream according to UCSC reference genome browser). Base-pair positions are listed according to human genome reference build 38.

ᵃ Variant represents a locus that reached genome-wide significance (p < 5 × 10⁻⁸).

We performed gene ontology analysis for the top 100 genes associated with variant signals in our meta-analysis. Our top signals were annotated with the Open Targets Genetics “Variant-to-Gene” (V2G) pipeline, which integrates evidence from four main data types (molecular phenotype quantitative trait loci, chromatin interactions, in silico functional predictions from Ensembl, and distance between the variant and each gene’s canonical transcription start site) to assign the most likely functional gene for each variant. Next, we used clusterProfiler to perform a false discovery rate (FDR)-corrected enrichment test for gene ontology terms among our identified top 100 genes. We also performed an enrichment test for gene modules using our identified top 100 genes to determine if any sets of highly correlated genes (gene modules) were associated with stuttering risk. Gene co-expression networks comprised groups of functionally related genes, or “modules,” that Gerring et al. identified from GTEx v.7 tissue gene expression data. Module enrichment was reported for any tissue-specific analysis with a raw p value < 0.05 among stuttering-relevant tissues (skeletal muscle, pituitary, minor salivary gland, all esophageal tissues, and all brain tissues).
We performed a competitive gene pathway analysis for reported module enrichments using g:Profiler and subsequently annotated the outputted biological pathways (Table S7).

Power calculation

We calculated our power to detect significant stuttering risk associations across a range of disease allele frequencies for our meta-analysis comprising 2,130 stuttering cases and 14,331 controls. We estimated power assuming a two-sided hypothesis test at p < 5 × 10⁻⁸, an additive model, and a developmental stuttering prevalence of 1%. Calculations were performed using the University of Michigan’s Genetic Association Study (GAS) Power Calculator.

Replication for published implicated stuttering genes

We manually reviewed over 200 records on PubMed via the National Center for Biotechnology Information website for publications in the past 21 years (2000–2021) that mentioned “stuttering” in the title field. Much of the published stuttering literature [23, 24, 25, 26] implicated large genome regions from linkage studies in families, without determining a specific causal gene. We sought replication for the six genes that have been previously implicated in the stuttering literature (Table S5) by evaluating all variants that passed our QC metrics within each gene in our meta-analyzed GWAS. To determine the effective number of tests for each gene, we calculated r² between each SNP pair within a gene using PLINK v.1.90. SNPs that had an r² > 0.4 were considered to be in linkage disequilibrium. The effective number of tests used for our Bonferroni correction represented the number of independent tag SNPs in each gene with pairwise r² < 0.4. Results were Bonferroni corrected for the effective number of tests in each gene, and the variant with the minimum p value within each gene is reported (Table S5).
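
The per-gene correction can be sketched as a greedy count of independent tag SNPs followed by a Bonferroni adjustment of the minimum p value. This is an illustrative assumption about how the count could be obtained from a pairwise r² matrix; the text states only that r² was computed in PLINK, and the function names and the greedy selection order are hypothetical.

```python
def count_independent_tags(r2_matrix, threshold=0.4):
    """Count SNPs that can be kept so that every kept pair has r2 below the threshold.

    r2_matrix: symmetric list of lists of pairwise r2 values for the SNPs in one gene.
    SNPs are visited in order; a SNP becomes a tag only if it is not in LD
    (r2 >= threshold) with any tag already kept.
    """
    tags = []
    for i in range(len(r2_matrix)):
        if all(r2_matrix[i][j] < threshold for j in tags):
            tags.append(i)
    return len(tags)


def gene_level_bonferroni(p_values, r2_matrix, alpha=0.05, threshold=0.4):
    """Return (minimum p, Bonferroni-adjusted minimum p, significant?) for one gene."""
    n_effective = count_independent_tags(r2_matrix, threshold)
    min_p = min(p_values)
    adjusted = min(1.0, min_p * n_effective)
    return min_p, adjusted, min_p < alpha / n_effective


# Toy gene with three SNPs: the first two are in LD, the third is independent.
r2 = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(gene_level_bonferroni([0.03, 0.2, 0.5], r2))
```

Because the greedy count depends on SNP order, a clumping or pruning routine could return a slightly different number of tags for the same gene.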

Results

Meta-analyzed GWAS

Genome-wide association analyses of stuttering were carried out in 8,104 individuals (1,345 cases) from the ISP study and 8,357 individuals (785 cases) (Table 1) from a self-reported stuttering study, with ∼7.3 million overlapping variants tested. No evidence for residual population stratification or systematic technical artifact was observed in either individual dataset or the meta-analysis. The genomic inflation factor, λ, was 1.0173 in the ISP GWAS (Figure S2) and 1.0161 in the Add Health GWAS (Figure S3). The genomic inflation factor for the meta-analysis was λ = 0.9977 (Figure 1). In the meta-analysis, one genome-wide significant association was observed at rs113284510 (Z = −5.576, p = 2.46 × 10⁻⁸). The variant, rs113284510, occurred in either an intronic region or genic upstream region of SSUH2 (MIM: 617479) (Figure 2), depending on the transcript. This variant exhibited consistent direction of effect (p < 5 × 10⁻⁶) in the Add Health GWAS (p = 2.23 × 10⁻⁷, odds ratio [OR] = 0.455 [0.320–0.591]) and in the ISP GWAS (p = 0.0059, OR = 0.754 [0.617–0.922]) (Table S2). The frequency of the protective effect allele (T) for rs113284510 was 7.49% overall (7.08% in the ISP GWAS and 7.88% in the Add Health GWAS) (Table S2).
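
For readers unfamiliar with the genomic inflation factor, λ is conventionally computed as the median observed 1-df χ² statistic divided by its expected median under the null (about 0.455). The sketch below shows that calculation from a list of p values; it is a generic illustration with a hypothetical function name, not the pipeline used to produce the λ values reported here.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor: median observed chi-square over its null median."""
    std_normal = NormalDist()
    # A two-sided p value corresponds to a 1-df chi-square statistic of z squared.
    chi_square = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi_square) / expected_median


# Uniformly spaced p values mimic a well-calibrated null distribution.
null_p_values = [(i + 0.5) / 1001 for i in range(1001)]
print(round(genomic_inflation(null_p_values), 3))
```

Values near 1, such as those reported for the ISP, Add Health, and meta-analysis results, indicate little inflation of test statistics.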

Figure 1

Manhattan and Q-Q plot for meta-analysis of Add Health and ISP stuttering studies

Meta-analysis included 16,461 samples and 7,275,796 variants present in both datasets; variants not present in both datasets were excluded. One locus reached genome-wide significance (red line, p < 5 × 10⁻⁸); fifteen loci reached suggestive genome-wide significance (blue line, p < 5 × 10⁻⁶). The Q-Q plot x axis represents expected −log₁₀(p) and the y axis represents observed −log₁₀(p).

Figure 2

Locus zoom plot of rs113284510

Locus zoom plot of meta-analysis stuttering associations with surrounding variants (color coded by r² bin) and the sentinel variant (denoted by purple diamond) using EUR linkage disequilibrium (LD) generated from 1000 Genomes EUR reference. The x axis represents chromosome position (hg38) with annotated genes found within the region; the y axis represents −log₁₀(p value) of the association between the genetic variant and stuttering. The sentinel variant is located in either an intronic or genic upstream region of SSUH2.

In the meta-analysis, the index variants for an additional 15 associations reaching a suggestive genome-wide significance threshold of p < 5 × 10⁻⁶ are presented in Table 2. No genome-wide significant associations were observed in either the ISP or Add Health GWAS; however, 19 variants reached our suggestive (p < 5 × 10⁻⁶) significance threshold for the ISP GWAS (Table S3), and 24 variants reached this same suggestive threshold in the Add Health GWAS (Table S4).

Genetic heritability

We calculated SNP-based liability-scaled heritability within our unrelated ISP sample using GCTA. The proportion of phenotypic variance explained by genetic factors on the observed scale was 0.791 (SE = 0.043). Using GCTA, we also transformed the explained variance estimate from the observed scale to the underlying liability scale, assuming an expected case prevalence of 0.01. Liability-scaled heritability was 0.902 (SE = 0.049).

Our colocalization analysis identified three regions in our stuttering meta-analysis showing weak association (regional colocalization probability, 0.1 > RCP ≥ 0.05) with cis-eQTLs in GTEx v.8: chr2: 111630529–112630529, chr2: 60940832–6194083, and chr2: 97986856–98986856 (Table S6). In the chr2: 111630529–112630529 region, the lead SNP, rs10779884, was identified as a top hit in our meta-analysis (Table 2) and serves as an eQTL for FBLN7 (MIM: 611551) in skeletal muscle, esophagus mucosa, and brain hypothalamus tissues. The chr2: 60940832–6194083 region did not colocalize with any eQTLs for protein-coding genes, and the chr2: 97986856–98986856 region identified rs140321250 as the lead SNP, predicted to act as an eQTL for INPP4A (MIM: 600916) in esophagus mucosa tissue (Table S6).

We did not observe any significant (p < 0.05 after FDR correction) enrichment for gene ontology terms among the top 100 genes identified in our meta-analysis. We observed one significant GTEx tissue-specific enrichment for a gene module in the minor salivary gland (FDR-corrected p = 6.63 × 10⁻³) with biological pathways implicated in processes such as extracellular matrix and structure organization, cell adhesion, anatomical structure development, nervous system development, ossification, neurogenesis, cell migration, and bone morphogenesis (Table S7). SSUH2, the nearest gene to the genome-wide significant hit (rs113284510), was found in this gene module, as was FBLN7, the gene near another top variant (rs10779884) (Table 2).
We did not observe any additional significant GTEx tissue-specific gene module enrichments.
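For readers unfamiliar with the observed-to-liability-scale transformation mentioned at the start of this section, the sketch below implements the standard formula for ascertained case-control data (Lee et al., 2011): h²(liability) = h²(observed) × [K(1 − K)]² / [z² × P(1 − P)], where K is the assumed population prevalence, P is the proportion of cases in the analyzed sample, and z is the standard-normal density at the liability threshold. This is an illustration only, not GCTA's own code; the function name is hypothetical, and the case proportion of the unrelated ISP subset is not given in this section, so the call shown uses placeholder inputs rather than study values.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed: float, prevalence: float, case_fraction: float) -> float:
    """Convert observed-scale SNP heritability to the liability scale.

    h2_observed   : heritability estimated on the observed (0/1) scale
    prevalence    : assumed population prevalence K (0 < K < 1)
    case_fraction : proportion of cases P in the analyzed sample (0 < P < 1)
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < case_fraction < 1.0):
        raise ValueError("prevalence and case_fraction must lie strictly between 0 and 1")

    std_normal = NormalDist()
    # Liability threshold that cuts off the top K of a standard-normal distribution.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard-normal density at that threshold.
    z = std_normal.pdf(threshold)

    k_term = prevalence * (1.0 - prevalence)
    p_term = case_fraction * (1.0 - case_fraction)
    return h2_observed * k_term ** 2 / (z ** 2 * p_term)


if __name__ == "__main__":
    # Placeholder inputs for illustration; these are NOT values from the study.
    print(liability_scale_h2(h2_observed=0.30, prevalence=0.05, case_fraction=0.25))
```

Because the result depends on both K and P, the same observed-scale estimate can map to quite different liability-scale values under different prevalence assumptions, which is worth keeping in mind when comparing estimates across studies.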

Replication analysis of implicated stuttering genes from the literature

To determine whether genetic contributions observed in families and population isolates might replicate in a population-based analysis, we assessed our data for replication of six genes that have previously been implicated in the stuttering literature: DRD2, GNPTAB, GNPTG, NAGPA, AP4E1, and CYP17A1 (Table S5). For each gene, we reported the lowest p value observed in our study among imputed variants within its exonic and intronic regions, as well as the Bonferroni-corrected p value for that top signal, based on the effective number of tests in that gene. None of the variants measured in our GWAS meta-analysis for these six genes reached statistical significance (p < 0.05) after Bonferroni correction; however, two variants neared statistical significance after Bonferroni correction: rs761057 (intron of GNPTG; p = 0.105; risk allele [T] frequency 9.9%) and rs4919687 (intron of CYP17A1; p = 0.100; protective allele [A] frequency 27%) (Table S5).
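The gene-level correction described above can be sketched as follows. This is a minimal illustration, assuming the effective number of independent tests per gene has already been estimated by some LD-aware method (the method is not described in this section). The class and function names, gene labels, p values, and test counts are all hypothetical placeholders, not values from Table S5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneSignal:
    gene: str                 # gene label (placeholder names in the example)
    min_p: float              # lowest uncorrected p value among variants in the gene
    n_effective_tests: float  # estimated number of independent tests in the gene


def bonferroni_adjust(p: float, n_effective_tests: float) -> float:
    """Bonferroni-corrected p value, capped at 1."""
    if not 0.0 <= p <= 1.0 or n_effective_tests < 1.0:
        raise ValueError("p must be in [0, 1] and n_effective_tests must be >= 1")
    return min(1.0, p * n_effective_tests)


def summarize(signals, alpha=0.05):
    """Return (gene, corrected p, significant?) for each gene's top signal."""
    results = []
    for s in signals:
        corrected = bonferroni_adjust(s.min_p, s.n_effective_tests)
        results.append((s.gene, corrected, corrected < alpha))
    return results


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    example = [GeneSignal("GENE_A", 0.004, 25.0), GeneSignal("GENE_B", 0.0005, 40.0)]
    for gene, corrected, significant in summarize(example):
        print(gene, round(corrected, 4), significant)
```

Correcting by the effective number of tests rather than the raw variant count avoids over-penalizing genes whose variants are highly correlated through LD.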

Discussion

Our multiethnic GWAS meta-analysis of stuttering in men and women of European, Hispanic, Asian, and African American ancestry led to the identification of one genome-wide significant protective locus. The protective T allele for the index variant, rs113284510, occurred within either an intronic or genic upstream region of SSUH2, a gene previously reported to play a major role in odontogenesis. A missense mutation in SSUH2 was shown to disrupt protein structure and production, causing autosomal-dominant dentin dysplasia type I (MIM: 125400) in a large Chinese family. Interestingly, a different top meta-analysis locus, rs10779884 (Table 2), is found approximately 8 kb upstream of the FBLN7 transcription start site. FBLN7 encodes a protein that interacts with extracellular matrix molecules in developing teeth and may play important roles in differentiation and maintenance of odontoblasts and dentin formation. Moreover, in the GTEx database, rs10779884 acts as an eQTL modulating FBLN7 expression across several tissues, including arteries, adipose, tibial nerve, skin, breast, skeletal muscle, heart, esophagus, pancreas, colon, and brain. Both of these genes (SSUH2 and FBLN7) appeared in a GTEx tissue-specific gene module enriched in the minor salivary gland (Table S7). This enriched gene module also included RELN (MIM: 600514), a gene that encodes a glycoprotein produced within the developing brain. RELN has been implicated in neural traits such as autism spectrum disorder (ASD) (MIM: 209850) as well as volumetric brain measures. Notably, a recent paper by Peter et al. identified likely deleterious variants in RELN inherited by two siblings affected with both ASD and childhood apraxia of speech, suggesting pleiotropic effects for RELN.
A pathway analysis showed that genes in this module are involved in a myriad of biological processes such as extracellular matrix organization, nervous system development, neurogenesis, cell migration, and bone morphogenesis. This analysis provides preliminary support for the idea that genes with roles in structural organization and various neural processes might play a role in developmental stuttering risk. Further investigation of the genome-wide significant sentinel variant, rs113284510, in GTEx showed that it acts as an eQTL specifically in esophagus-muscularis tissue by reducing SSUH2 expression in the presence of the protective T allele. This function might in part be explained by its genic upstream position relative to SSUH2. A review of the GWAS literature also shows suggestive significance (p < 5 × 10⁻⁶) for association of variants located in SSUH2 with ASD. ASD is a neurodevelopmental disorder that presents with a gradual or sudden early childhood onset, similar to developmental stuttering. Individuals with ASD exhibit impaired social interaction skills and, in moderate to severe cases, have little to no speech production beyond basic vocalizations. A possible shared genetic liability between ASD and developmental stuttering has not been published; however, disordered speech in ASD was found to be associated with mutations in the FOXP2 (MIM: 605317) and CNTNAP2 (MIM: 604569) genes. Although not yet specifically implicated in persistent developmental stuttering, these genes have known associations with a broad umbrella of speech and language disorders, including developmental verbal dyspraxia and developmental language disorder. GTEx also shows that the genome-wide significant sentinel variant, rs113284510, acts as an eQTL in tibial artery tissue by increasing CAV3 (MIM: 601253) expression in the presence of the protective T allele.
CAV3 encodes the caveolin-3 protein, which is found in the membrane surrounding muscle cells; caveolin-3 may also help regulate calcium levels in muscle cells. As such, genetic changes in CAV3 have been implicated in various health conditions with impaired muscle function, such as rippling muscle disease, limb-girdle muscular dystrophy, and hypertrophic cardiomyopathy (refs. 96–101). The rs113284510 variant seems to regulate the expression of two distinct genes, CAV3 and SSUH2; investigating pleiotropic effects may help unravel the potential role of these genes in modulating developmental stuttering risk. Among our meta-analysis top hits (Table 2), we observed nine variants suggestively associated with stuttering risk (p < 5 × 10⁻⁶) of obscure functional consequence (rs58528263, rs2938894, rs1011275, rs4282275, rs6547085, rs16855942, rs16954038, rs11158418, and rs111962436). Our gene ontology and GTEx tissue-specific gene module enrichment analyses did not shed any additional light on these associations. For the other six suggestively associated loci, our GTEx, Open Targets Genetics Portal, and GWAS Catalog searches uncovered initial clues as to their possible function. For example, our GWAS meta-analysis identified a risk locus that neared genome-wide significance (rs34919320, p = 2.93 × 10⁻⁷) in an intronic region of GPATCH2 (MIM: 616836), a gene that encodes a nuclear factor that plays a role in spermatogenesis and in tumor growth in breast cancer. Investigation of rs34919320 in GTEx also supports a possible role for GPATCH2 in spermatogenesis, showing that the risk allele G acts as an eQTL in testis tissue by reducing GPATCH2 expression. Another top hit (rs115327327, p = 4.78 × 10⁻⁶) is found in an intronic region of USP34 (MIM: 615295), which encodes a ubiquitin protein.
In a large study of testosterone and related sex hormones in 425,097 UK Biobank participants, other intronic variants in USP34 were associated with sex hormone-binding globulin (SHBG). SHBG controls how much testosterone, dihydrotestosterone, and estradiol are delivered throughout the body; however, the SHBG blood test is primarily used to determine testosterone levels (see Web resources). These findings motivate further exploration into population-based genetic effects that might contribute to the sexually dimorphic nature of stuttering. At stuttering onset, the male-to-female ratio is more even (between 1:1 and 2:1); however, females are more likely to recover from stuttering, changing the male-to-female ratio to 4:1, as observed in adults. The mechanisms for this observed sex discrepancy are not well understood, and initial clues into possible causal genetics are unclear. A paper by Mohammadi et al. provides preliminary insight: they measured testosterone levels in children who stutter and their control counterparts and found higher levels of testosterone and its metabolites in children who stutter. Observed stuttering susceptibility was also reported in association with CYP17A1, a gene that encodes the instructions to make an enzyme involved in steroid hormone synthesis. Interestingly, work by Anthoni et al. has also shown that variants in CYP19A1 (MIM: 609300), a gene in the same cytochrome P450 family as CYP17A1, are associated with quantitative measures of language and speech, such as phonological processing and oral motor skills. These results combined with our associations suggest that variants involved in the regulation of sex hormones may contribute to stuttering risk. Follow-up analysis to determine if these associations reach genome-wide significance and replicate in an independent dataset is warranted, particularly in a larger dataset powered to perform sex-stratified analyses. 
Another intriguing association with stuttering susceptibility was a protective variant in an intronic region of CTNNA2 (MIM: 114025) (rs1446110, p = 3.79 × 10⁻⁶), which encodes the catenin alpha-2 protein. Catenin alpha-2 plays a critical role in cortical neuronal migration and neurite growth. Another identified protective variant (rs10994385, p = 3.14 × 10⁻⁶) occurred in an intronic region of MSMB (MIM: 157145), a gene that encodes a member of the immunoglobulin binding factor family that is synthesized by prostate epithelial cells. Other variants in MSMB have shown association with prostate cancer. Finally, an identified risk variant, rs35612603, occurred in an intronic region of INPP4A, a gene that encodes a Mg²⁺ enzyme that hydrolyzes the 4-position of the inositol ring. In the GWAS Catalog summary statistics repository, the top five traits associated with variants within INPP4A were use of prednisolone medication, time employed in current main job, unspecified personality disorders, hypopituitarism, and brain cancer/tumor. Although these initial associations lay a foundation for possible common developmental stuttering susceptibility variants, future replication analyses to determine if these associations reach genome-wide significance and replicate in independent datasets will be integral to the design of functional validation analyses. As an initial investigation into possible shared genetic contributions between familial developmental stuttering and the risk of developmental stuttering in the general population, we performed a replication analysis for six published stuttering risk genes (Table S5). Since the observed causal variants in these family-based studies were not directly measured on our arrays, were too rare to impute with accuracy, and were too rare to estimate robust effects given our sample size, we instead looked for any potential effects from common variants in and around these genes.
After performing locus-based Bonferroni correction for the number of independent tests in each locus, we did not identify any significant common variant effects in these six genes, suggesting that the genetic architecture detected to date in families highly enriched with individuals who stutter is largely distinct from the common genetic drivers of stuttering in general populations. However, we did observe common variant signals that neared nominal statistical significance after regional test correction at GNPTG (rs761057; p = 0.105) and CYP17A1 (rs4919687; p = 0.100), suggesting that studies with greater sample size and improved power may identify shared familial and population genetic contributions for these stuttering risk genes. Interestingly, the estimate of observed trait heritability in our clinically ascertained subset (ISP dataset; h² = 0.791, SE = 0.043) was similar to heritability calculations in several twin studies of developmental stuttering, providing evidence that genetic factors for developmental stuttering at a population level exist. Another postulation is that common and rare variation act additively to create risk in developmental stuttering, as was observed in a recent study of the genetics underlying ASD. The authors developed a polygenic transmission disequilibrium test (pTDT) and demonstrated that common and rare variation act additively in ASD. As this study is underpowered to detect effects of rare variants (see Figure S4 for power curve), future investigations performing a pTDT, using the data presented here to create a polygenic risk score capturing common genetic stuttering liability, are warranted. This study has a few potential limitations, the most significant of which is the sample size. Our sample of a little over 2,000 cases is not sufficient to identify definitive developmental stuttering susceptibility variants, especially if a myriad of common variants of very small effect prove to impact its liability (Figure S4).
Relatedly, our study has insufficient power for stratified analyses examining additional clinical variables of interest such as sex or recovery status (persistence). This study also lacked a sufficient sample size to divide the data into a training and testing set for polygenic risk score development; for example, PRS-CS recommends tens of thousands of cases for its approach. Akin to other neurologic polygenic traits such as Tourette syndrome (MIM: 137580), ASD, and schizophrenia (MIM: 181500), establishing and independently validating common variant trait liability to stuttering will most likely require studies of tens of thousands of subjects. Furthermore, replicating the results herein is critical. Although our study identified rs113284510 as significantly associated with stuttering in our meta-analysis, both the strength of association and quality of imputation (ISP: beta = −0.282 and INFO = 0.860; Add Health: beta = −0.787 and INFO = 0.478) for this SNP varied between the contributing studies (Table S2). Furthermore, the exact biologic context of this association remains obscure. Our functional analyses implicated the variant, rs113284510, in SSUH2, a gene that our GTEx tissue-specific gene module enrichment test placed in a module enriched in the minor salivary gland (Table S7). Although our pathway analysis showed that genes in this module are involved in processes such as nervous system development and neurogenesis, the module was not significantly enriched within any brain tissues. This lack of observed enrichment in brain tissues might be a model limitation. Furthermore, gene modules were built using adult GTEx data, and it has been hypothesized that genes influencing other speech traits, such as childhood apraxia of speech, are likely expressed in prenatal and early postnatal brains. Since our modules relied on adult brain data, we could be missing relevant mechanistic correlations with our genetic findings.
Nonetheless, the fact that our stuttering-associated variant and genes appear to have neural functions suggests a role for these genes that warrants future study. Finally, despite the success of this study in identifying both genome-wide significant and suggestively significant signals for stuttering in the general population, we anticipate that the power limitations of our presented study can be resolved by substantial increases in the number of stuttering cases collected for GWAS and inclusion of diverse ancestries. Most importantly, this study lays necessary groundwork for the identification of additional common developmental stuttering susceptibility variants in larger population-wide cohorts and helps to provide a more complete understanding of the full genetic architecture for this common speech condition, with the potential of uncovering etiology, pathophysiology, and eventual therapeutic targets.
