Veronica B Searles Quick, Belinda Wang, Matthew W State.
Abstract
"Big data" approaches in the form of large-scale human genomic studies have led to striking advances in autism spectrum disorder (ASD) genetics. As in many other psychiatric syndromes, advances in genotyping technology, allowing for inexpensive genome-wide assays, have confirmed the contribution of polygenic inheritance involving common alleles of small effect, a handful of which have now been definitively identified. However, the past decade of gene discovery in ASD has been most notable for the application, in large family-based cohorts, of high-density microarray studies of submicroscopic chromosomal structure as well as high-throughput DNA sequencing, leading to the identification of an increasingly long list of risk regions and genes disrupted by rare, de novo germline mutations of large effect. This genomic architecture offers particular advantages for the illumination of biological mechanisms but also presents distinctive challenges. While the tremendous locus heterogeneity and functional pleiotropy associated with the more than 100 identified ASD-risk genes and regions are daunting, a growing armamentarium of comprehensive, large, foundational -omics databases, across species and capturing developmental trajectories, is increasingly contributing to a deeper understanding of ASD pathology.
Year: 2020 PMID: 32668441 PMCID: PMC7688655 DOI: 10.1038/s41386-020-0768-y
Source DB: PubMed Journal: Neuropsychopharmacology ISSN: 0893-133X Impact factor: 8.294
Fig. 1 Types of genetic variants.
a The majority of genetic variation in the human genome is common (population frequency ≥ 1%, blue). These variants are transmitted from parents to offspring via Mendelian inheritance patterns. A smaller proportion is rare (<1%, purple) and also transmitted from parents. ∼70 variants are de novo (red), observed only in the child and not in either parent. b The impact of single-nucleotide variants (SNVs) and small (<50 bp) insertions/deletions (indels) depends on their location in the genome. In the 1.5% of the genome that encodes proteins (the exome), these variants can be synonymous (no change to the resulting protein), missense (a single amino acid is changed in the protein, with variable functional impact), or protein-truncating (leading to nonsense-mediated decay and no protein). Variants and their consequences (red stars) are shown on the father's allele, but can also arise on the maternal allele. c Copy number variants (CNVs) are large (≥50 bp to millions of nucleotides) deletions (resulting in no protein) or duplications (potentially resulting in excess protein). Figure adapted from Sanders [81] with author permission.
Fig. 2 A model of rare large-effect de novo mutations acting in combination with common risk alleles.
a An idealized distribution of common polygenic risk, which is normally distributed in the general population. The red vertical dotted line represents an arbitrary cutoff for the diagnosis of ASD. For a highly heritable disorder such as ASD, those at the low end of the distribution of risk (left) will be less likely to meet diagnostic criteria than those at the far right end of the distribution. The superimposition of the upper panel on the lower panel (b), which represents the distribution of ASD symptoms in the population, models the observation that the vast majority of common-allele population risk is present in individuals without a clinical diagnosis. The lower panel (b) shows the same red dotted vertical line, reflecting an arbitrary cutoff for the categorical diagnosis of ASD. The abbreviations in parentheses (epi epilepsy, ADHD attention deficit hyperactivity disorder, SCZ schizophrenia, SLI specific language impairment) reflect the observation that highly penetrant ASD risks may also carry risks for diagnoses other than ASD. The arrows at the bottom of the diagram represent large-effect rare de novo mutations. The purple arrow shows how a large-risk de novo mutation can move an individual with intermediate risk, who would otherwise likely have no symptoms, across the diagnostic threshold. The gray arrow reflects the observation that these risks, while large, are not Mendelian and that rare large-effect mutations sometimes produce no phenotype at all, which may reflect their acting in the context of very low polygenic risk. The purple box on the right side of (b) reflects the finding that while de novo mutations carry a very small proportion of population risk, they account for a substantial fraction of individuals who exceed clinical thresholds.
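The threshold model in this caption can be illustrated numerically. The following is a minimal sketch, not the authors' model: it assumes polygenic liability follows a standard normal distribution, that a de novo mutation simply adds a fixed shift to liability, and it uses purely illustrative values for the threshold, the shift, and the carrier frequency (none of these numbers come from the source).

```python
from statistics import NormalDist


def threshold_model(threshold=2.0, mutation_effect=2.0, carrier_freq=0.005):
    """Liability-threshold sketch with hypothetical parameters.

    threshold       diagnostic cutoff, in SD units of polygenic liability
    mutation_effect upward shift in liability for mutation carriers (SD units)
    carrier_freq    fraction of the population carrying such a mutation
    """
    std_normal = NormalDist()  # polygenic liability ~ N(0, 1) by assumption

    # Probability of exceeding the cutoff on polygenic risk alone.
    risk_noncarrier = 1.0 - std_normal.cdf(threshold)
    # Carriers start `mutation_effect` SDs closer to the cutoff.
    risk_carrier = 1.0 - std_normal.cdf(threshold - mutation_effect)

    cases_carrier = carrier_freq * risk_carrier
    cases_noncarrier = (1.0 - carrier_freq) * risk_noncarrier
    return {
        "risk_noncarrier": risk_noncarrier,
        "risk_carrier": risk_carrier,
        "carrier_share_of_population": carrier_freq,
        "carrier_share_of_cases": cases_carrier / (cases_carrier + cases_noncarrier),
    }


if __name__ == "__main__":
    for name, value in threshold_model().items():
        print(f"{name}: {value:.4f}")
```

Under these toy settings, carriers are rare in the population but much more common among individuals above the cutoff (the purple box), while a sizeable share of carriers still fall below it (the gray arrow).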
Recurrent de novo CNVs found in Simons Simplex Collection and Autism Genome Project cohorts.
| Cytoband | Location (hg19) | De novo CNVs | Del/Dup | FDR | Associated with |
|---|---|---|---|---|---|
| 1q21.1 | chr1:146,467,203−147,801,691 | 9 | 1/8 | 2 × 10⁻⁹ | |
| 2p16.3 | chr2:50,145,643−51,259,674 | 8 | 7/1 | 4 × 10⁻⁸ | |
| 3q29 | chr3:195,747,398−196,191,434 | 4 | 4/0 | 0.02 | |
| 7q11.23 | chr7:72,773,570−74,144,177 | 5 | 1/4 | 0.0008 | |
| 15q11.2-13.1 | chr15:23,683,783−28,446,765 | 10 | 0/10 | <1 × 10⁻¹⁰ | Angelman/Prader-Willi |
| 15q13.2-13.3 | chr15:30,943,512−32,515,843 | 5 | 3/2 | 0.0008 | |
| 16p11.2 | chr16:29,655,864−30,195,048 | 19 | 12/7 | <1 × 10⁻¹⁰ | |
| 22q11.21 | chr22:18,889,490−21,463,730 | 8 | 4/4 | 1 × 10⁻⁷ | |
| 22q13.33 | chr22:51,123,505−51,174,548 | 4 | 4/0 | 0.02 | |
Adapted from Sanders et al. [5].
Statistical evidence for association of ASD genes based on rare de novo and transmitted sequence variation and de novo CNVs.
| FDR ≤ 0.01 | 0.01 < FDR ≤ 0.05 | 0.05 < FDR ≤ 0.1 |
|---|---|---|
| ADNPᵃ, ANK2ᵃ, ANKRD11, AP2S1, ARID1Bᵃ, ASH1Lᵃ, BCL11Aᵇ, CHD2ᵃ, CHD8ᵃ, CTNNB1, DEAF1, DNMT3Aᵇ, DPYSL2, DSCAMᵃ, DYNC1H1, DYRK1Aᵃ, FOXP1ᵇ, GABRB3ᵇ, GIGYF1ᵇ, GRIN2Bᵃ, KCNQ3, KDM5Bᵃ, KDM6Bᵇ, KMT2Cᵃ, MAP1A, MBD5ᶜ, MED13L, MKX, MYT1Lᵇ, NRXN1ᵃ, PAX5, POGZᵃ, PTENᵃ, RAI1, RORB, SCN2Aᵃ, SETD5ᵃ, SHANK2ᵃ, SHANK3ᵃ, SIN3A, SLC6A1ᵇ, SRPR, SUV420H1ᵃ, SYNGAP1ᵃ, TBL1XR1, TLK2, WACᵃ | ASXL3, CACNA1E, CELF4, CREBBP, EIF3G, FOXP2, GFAP, GNAI1, IRF2BPLᶜ, KIAA0232, LDB1, NSD1, PHF12ᵇ, PHF2, PHF21A, PPP2R5D, PRR12, RFX3, SATB1, SKI, SMARCC2, SPASTᵇ, STXBP1, TBR1ᵃ, TCF20, TCF4, TCF7L2ᵃ, TM9SF4, TRIP12ᵃ, VEZF1, ZMYND8 | CACNA2D3, CORO1A, DIP2Aᶜ, ELAVL3, GABRB2, GRIA2, HDLBP, HECTD4, KCNMA1, KMT2Eᶜ, LRRC4C, NACC1, NCOA1, NR3C2, NUP155, PPP1R9B, PPP5C, PTK7ᶜ, SCN1A, TAOK1, TEK, TRAF7, TRIM23, UBR1 |

Genes found to be significantly associated with ASD in Satterstrom et al. [4]. Comparison with genes identified by Sanders et al. [5] is indicated: ᵃ FDR ≤ 0.01, ᵇ 0.01 < FDR ≤ 0.05, ᶜ 0.05 < FDR ≤ 0.1 in Sanders et al. [5].
Genome-wide significant loci from ASD scans.
| Index variant | Chr | BP | P | β | s.e. | A1/A2 | Freq | Nearest genes |
|---|---|---|---|---|---|---|---|---|
| rs910805 | 20 | 21248116 | 2.04 × 10⁻⁹ | −0.096 | 0.016 | A/G | 0.76 | |
| rs10099100 | 8 | 10576775 | 1.07 × 10⁻⁸ | 0.084 | 0.015 | C/G | 0.331 | |
| rs201910565 | 1 | 96561801 | 2.48 × 10⁻⁸ | −0.077 | 0.014 | A/AT | 0.689 | |
| rs71190156 | 20 | 14836243 | 2.75 × 10⁻⁸ | −0.078 | 0.014 | GTTTT | 0.481 | |
| rs111931861 | 7 | 104744219 | 3.53 × 10⁻⁸ | −0.216 | 0.039 | A/G | 0.966 | |
Chr chromosome, BP chromosomal position, P association P-value of the index variant, β estimate of effect with respect to A1, s.e. standard error of β, A1/A2 alleles, Freq allele frequency of A1.
Adapted from Grove et al. [60].
*Rare variation in KMT2E has been found to be associated with ASD risk with FDR < 0.1 (Sanders et al. [5], Satterstrom et al. [4]). “Nearest genes” lists nearest genes from within 50 kb of the region spanned by all SNPs with r2 ≥ 0.6 to the index variant.
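As a rough consistency check on the table above, the reported β and s.e. values can be turned into test statistics and P-values. This is a minimal sketch that assumes a Wald test with a standard normal null and the conventional genome-wide threshold of 5 × 10⁻⁸; the source does not state either, so both are assumptions. Because β and s.e. are rounded in the table, the recomputed P-values only approximate the published ones.

```python
from math import erfc, sqrt

# Conventional genome-wide significance threshold (an assumption here;
# the table itself does not state which threshold was used).
GENOME_WIDE_ALPHA = 5e-8

# (index variant, beta, standard error), copied from the table.
LOCI = [
    ("rs910805", -0.096, 0.016),
    ("rs10099100", 0.084, 0.015),
    ("rs201910565", -0.077, 0.014),
    ("rs71190156", -0.078, 0.014),
    ("rs111931861", -0.216, 0.039),
]


def wald_z(beta, se):
    """Wald statistic: effect estimate divided by its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided P-value for z under a standard normal null."""
    return erfc(abs(z) / sqrt(2.0))


if __name__ == "__main__":
    for rsid, beta, se in LOCI:
        z = wald_z(beta, se)
        p = two_sided_p(z)
        print(f"{rsid}: z = {z:.2f}, P = {p:.2e}, "
              f"below threshold: {p < GENOME_WIDE_ALPHA}")
```

The sign of z follows the sign of β, so a negative value indicates that allele A1 is associated with lower risk under this parameterization.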
Fig. 3 Levels of pathogenesis and convergent analysis.
a ASD can manifest or be investigated at multiple levels, from a genetic variant (marked by a red star) all the way to behavioral phenotypes. b A conceptual illustration of convergent analysis from risk genes to behavior in ASD, in which multiple independent risk genes are studied in parallel to triangulate on specific protein complexes, functional networks, cell types, and/or circuits that show overlap among functionally diverse risk genes. Figures adapted from Willsey et al. [13] and Sestan and State 2018 [156] with author permission.
Fig. 4 A strategy for combining human brain expression data and high-confidence risk genes to identify spatiotemporal convergence.
Willsey et al. [112] established co-expression networks for the nine highest-confidence ASD-risk genes at the time of publication. These networks were established by setting a high threshold for gene expression correlation irrespective of sign, based on the hypothesis that coordinated gene activity, whether in the same or opposite directions, is a useful proxy for shared biological function. Networks were created for spatiotemporal periods defined in the BrainSpan database [113], using its time windows. The co-expression networks built from the highest-confidence genes were then tested, within each predefined network, for enrichment of an independent list of probable ASD-risk genes relative to the null expectation. Statistically significant enrichment was found in prefrontal cortex (PFC) during mid-fetal development, at approximately 18–24 weeks, and additional signal was identified in medial dorsal thalamus and cerebellum later in development (in early infancy).
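The two steps described in this caption (building a sign-agnostic co-expression network around seed genes, then testing it for enrichment of an independent gene list) can be sketched on toy data. This is an illustration under stated assumptions, not the published pipeline: the gene names, expression values, and the 0.7 correlation cutoff are hypothetical, and a hypergeometric upper-tail test is swapped in as one simple choice of null expectation, since the caption does not specify how the null was computed.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_network(expr, seeds, min_abs_r=0.7):
    """Non-seed genes whose |r| with any seed gene meets the cutoff.

    Taking the absolute value keeps both positively and negatively
    correlated genes, i.e. correlation "irrespective of sign".
    """
    return {
        gene
        for gene, profile in expr.items()
        if gene not in seeds
        and any(abs(pearson(profile, expr[s])) >= min_abs_r for s in seeds)
    }


def enrichment_p(network, candidates, background):
    """Hypergeometric P(overlap >= observed) within the background set."""
    N = len(background)
    K = len(candidates & background)
    n = len(network & background)
    k = len(network & candidates & background)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical expression profiles across six samples of one
# spatiotemporal window; none of these values come from BrainSpan.
toy_expr = {
    "SEED1": [1, 2, 3, 4, 5, 6],
    "GENE_A": [2, 4, 6, 8, 10, 12.5],  # rises with the seed
    "GENE_B": [6, 5, 4, 3, 2, 0.5],    # falls as the seed rises
    "GENE_C": [3, 1, 4, 1, 5, 2],      # weakly related
    "GENE_D": [2, 2, 3, 1, 2, 2],      # weakly related
}

if __name__ == "__main__":
    seeds = {"SEED1"}
    network = coexpression_network(toy_expr, seeds)
    background = set(toy_expr) - seeds
    p = enrichment_p(network, {"GENE_A", "GENE_B"}, background)
    print(sorted(network), p)
```

In a real analysis the same two functions would be applied once per spatiotemporal window, with the resulting P-values corrected for the number of windows tested.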