| Literature DB >> 35850704 |
Jamie M Ellingford, Joo Wook Ahn, Diana Baralle, Sian Ellard, David R FitzPatrick, William G Newman, Jenny C Taylor, Steven M Harrison, Nicola Whiffin, Richard D Bagnall, Stephanie Barton, Chris Campbell, Kate Downes, Celia Duff-Farrier, John M Greally, Jodie Ingles, Neesha Krishnan, Jenny Lord, Hilary C Martin, Anne O'Donnell-Luria, Simon C Ramsden, Heidi L Rehm, Ebony Richardson, Moriel Singer-Berk, Maggie Williams, Jordan C Wood, Caroline F Wright.
Abstract
BACKGROUND: Most clinical genetic testing focuses almost exclusively on regions of the genome that directly encode proteins. However, the important role of variants in non-coding regions in penetrant disease is increasingly being demonstrated, and the use of whole genome sequencing in clinical diagnostic settings is rising across a large range of genetic disorders. Despite this, there is no existing guidance on how current guidelines, designed primarily for variants in protein-coding regions, should be adapted for variants identified in other genomic contexts.
Keywords: Gene regulation; Non-coding variation; Variant interpretation
Year: 2022 PMID: 35850704 PMCID: PMC9295495 DOI: 10.1186/s13073-022-01073-3
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 15.266
Fig. 1 Schematic of regulatory elements within and around a gene, and examples of disruptions that can lead to disease
Categories of small variants in non-coding regions previously implicated in penetrant Mendelian disease
| Region | Mechanism | Example (variant/gene) | Example (disease) | ClinVar Var ID | Reference | VEP categories | In silico tools to predict effect |
|---|---|---|---|---|---|---|---|
| Promoter | Altering transcription factor binding | GATA1:−113A>G | Hereditary persistence of foetal haemoglobin | | 1 | Upstream gene variant / regulatory region variant / TF binding site variant | TF binding site disruption prediction tools e.g. motifbreakR / SEMpl / QBiC-Pred |
| Promoter | Altering transcription | CHM:c.-98C>A, c.-98C>T | Choroideremia | | 2 | | |
| Promoter/5′UTR | Altering methylation patterns | BRCA1:c.-107A>T | Breast and ovarian cancer | | 3 | Upstream gene variant / 5 prime UTR variant | |
| 5′UTR | Creating upstream start site (uAUG) | NF1:c.-272G>A | Neurofibromatosis type 1 | 1013130 | 4 | 5 prime UTR variant | e.g. UTRannotator |
| 5′UTR | Perturbing upstream open reading frames | NF2:−66-65insT | Neurofibromatosis type 2 | | 5 | | e.g. UTRannotator |
| 5′UTR | Disrupting internal ribosome entry sites (IRES) | GJB1:c.-103C>T | Charcot-Marie-Tooth disease | 217166 | 6 | | e.g. IRESpy / IRESfinder / IRESite |
| 5′UTR | Disrupting splicing | SHOX:c.-19G>A | SHOX haploinsufficiency | 933226 | 7 | | Splicing prediction tools e.g. SpliceAI |
| 5′UTR | Altering Kozak consensus of start site | GATA4:c.-6G>C | Atrial septal defect | | 8 | | e.g. utR.annotation |
| 5′UTR | N-terminal transcript elongation | MEF2C:c.-8C>T | Developmental disorder | | 9 | | e.g. UTRannotator |
| Intron | Disrupting canonical splice sites | MYBPC3:c.3490+1G>A | Hypertrophic cardiomyopathy | 42715 | 10 | Splice donor variant / splice acceptor variant / splice region variant / splice_donor_5th_base_variant / splice_polypyrimidine_tract_variant | Splicing prediction tools e.g. SpliceAI |
| Intron | Disrupting splicing branch point | HNF4A:c.264-21A>G | Maturity-onset diabetes of the young | | 11 | Intron variant | Splicing prediction tools e.g. SpliceAI |
| Intron | Pseudo-exon activation | DMD:c.7310-19A>G | Muscular dystrophy | | 12 | | Splicing prediction tools e.g. SpliceAI |
| Intron | Poison-exon inclusion | SCN1A:c.4002+2165C>T | Dravet syndrome | | 13 | | Splicing prediction tools e.g. SpliceAI |
| Intron | Branchpoint mutation | BBS1:c.592-21A>T | Retinitis pigmentosa | | 14 | | Splicing prediction tools e.g. SpliceAI |
| Intron | Indels & spacing of splicing motifs | DOK7:c.54+8_54+17del | | | 15 | | Splicing prediction tools e.g. SpliceAI |
| Intron | Cryptic exon | VHL:c.340+770T>C | Erythrocytosis | | 16 | | |
| 3′UTR | Disrupting polyA signal motif | NAA10:c.*43A>G | Microphthalmia | 617463 | 17 | 3 prime UTR variant | PolyA signal motif prediction tools e.g. Omni-PolyA |
| 3′UTR | Disrupting miRNA interactions | REEP1 | Hereditary spastic paraplegia | | 18 | | miRNA–target interaction resources e.g. miRTarBase |
| 3′UTR | Disrupting splicing | LHFPL5:c.*16+1G>A | Hearing impairment | | 19 | | Splicing prediction tools e.g. SpliceAI |
| CRE | Altering transcription factor binding | chr7:155754267:C>T; NCBI build 36.3 | Holoprosencephaly | | 20 | Upstream gene variant / downstream gene variant / regulatory region variant / TF binding site variant / intergenic variant / TFBS ablation / TFBS amplification / regulatory region ablation / regulatory region amplification | TF binding site disruption prediction tools e.g. motifbreakR / SEMpl / QBiC-Pred |
| CRE | Abolishing enhancer activity | PTF1A (6 variants) | Isolated pancreatic agenesis | | 21 | | |
| CRE | Disrupting enhancer activity | SOX9 deletion (chr17:67,628,756–67,634,155) | Pierre Robin sequence | | 22 | | |
| Intergenic | Creating new regulatory element | chr16:209,709 T>C | α-thalassaemia | | 23 | Intergenic variant / upstream gene variant / downstream gene variant | |
| miRNA | Disrupting seed region | miR-204:n.37C>T | Retinal dystrophy | | 24 | Non-coding transcript exon variant / non-coding transcript variant / mature_miRNA_variant | |
| snRNA | Altering structure | RNU12 | Cerebellar ataxia | | 25 | Non-coding transcript exon variant / non-coding transcript variant | |
| snRNA | Abnormal splicing, accumulation of minor intron retained transcripts | RNU4ATAC | Roifman syndrome | | 26 | | Splicing prediction tools e.g. SpliceAI |
| snRNA | Affecting expression, processing and protein binding | SNORD118 | Cerebral microangiopathy leukoencephalopathy | | 27 | | |
| TAD boundary | Disrupting chromatin looping leading to enhancer loss or adoption | WNT6/IHH/EPHA4/PAX3 locus | Limb phenotypes | | 28 | Intergenic variant | |
This is not intended as an exhaustive list. Reference DOIs:
1. 10.1182/blood-2018-07-863951
2. 10.1002/humu.23212
3. 10.1016/j.ajhg.2018.07.002
4. 10.1016/j.ebiom.2016.04.005
5. 10.1038/s41467-019-10717-9
6. 10.1074/jbc.M005199200
7. 10.1038/s41431-020-0676-y
8. 10.1002/ajmg.a.36703
9. 10.1016/j.ajhg.2021.04.025
10. 10.1172/JCI119555
11. 10.2337/db07-1657
12. 10.3390/genes11101180
13. 10.1016/j.ajhg.2018.10.023
14. 10.1136/jmedgenet-2020-107626
15. 10.1016/j.ajhg.2019.07.013
16. 10.1182/blood-2018-03-838235
17. 10.1136/jmedgenet-2018-105836
18. 10.1086/505361
19. 10.1038/s10038-018-0502-3
20. 10.1038/ng.230
21. 10.1038/ng.2826
22. 10.1038/ng.329
23. 10.1038/s41467-021-23980-6
24. 10.1073/pnas.1401464112
25. 10.1002/ana.24826
26. 10.1038/ncomms9718
27. 10.1038/ng.3661
28. 10.1016/j.cell.2015.04.004
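The Region column above depends partly on where a variant sits relative to its transcript, and HGVS c. notation already encodes this. c.-N lies 5′ of the start codon, c.*N lies 3′ of the stop codon, and c.N+M or c.N-M lies M bases into an intron from the nearest exon boundary. The following is a minimal sketch, assuming simple single-position descriptions like those in the table. The function name `classify_c_position` and its `utr5_len`/`utr3_len` parameters are hypothetical and do not belong to VEP or any HGVS library. A c.-N position can fall in the 5′UTR or further upstream in the promoter (compare the CHM and BRCA1 rows), so the sketch can only separate the two if it is given the transcript's UTR length.

```python
import re
from typing import Optional

# Hypothetical helper, not part of VEP or any HGVS library. Only the first
# position of a description is inspected, so c.54+8_54+17del is classified by
# c.54+8; non-coding n. descriptions (e.g. miR-204:n.37C>T) are rejected.
_C_POS = re.compile(r"c\.(?P<utr>[-*]?)(?P<base>\d+)(?P<offset>[+-]\d+)?")


def classify_c_position(hgvs: str,
                        utr5_len: Optional[int] = None,
                        utr3_len: Optional[int] = None) -> str:
    """Coarse region label for an HGVS c. description such as 'NF1:c.-272G>A'.

    utr5_len / utr3_len are the spliced UTR lengths of the transcript; when they
    are not given, c.-N and c.*N positions are assumed to lie inside the UTR.
    """
    m = _C_POS.search(hgvs)
    if m is None:
        raise ValueError(f"unsupported HGVS c. description: {hgvs}")
    utr, base, offset = m.group("utr"), int(m.group("base")), m.group("offset")

    if offset is not None:
        # c.N+M / c.N-M: M bases into the intron; +/-1 and +/-2 are the canonical
        # donor/acceptor dinucleotides.
        region = "canonical splice site" if abs(int(offset)) <= 2 else "intron"
        if utr == "-":
            return region + " (5'UTR intron)"
        if utr == "*":
            return region + " (3'UTR intron)"
        return region

    if utr == "-":
        # c.-N counts back from the A of the ATG; past the 5'UTR it is upstream.
        if utr5_len is not None and base > utr5_len:
            return "upstream of transcript"
        return "5'UTR"
    if utr == "*":
        # c.*N counts on from the stop codon; past the 3'UTR it is downstream.
        if utr3_len is not None and base > utr3_len:
            return "downstream of transcript"
        return "3'UTR"
    return "coding sequence"
```

For example, `classify_c_position("MYBPC3:c.3490+1G>A")` labels the MYBPC3 row as a canonical splice site, and `classify_c_position("LHFPL5:c.*16+1G>A")` labels the LHFPL5 row as a canonical splice site within a 3′UTR intron. A production pipeline would rely on a full annotator such as VEP instead of string parsing.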
Fig. 2 Non-coding region variants are under-ascertained in ClinVar and are more likely than protein-coding variants to be classified as variants of uncertain significance (VUS). a The proportion of the genomic footprint of MANE transcripts that falls into each of five region categories, and the proportion of variants in ClinVar (all, likely pathogenic or pathogenic, likely benign or benign, and VUS) within those regions. b The number of high-confidence pathogenic variants in ClinVar (see ‘2’) that fall into each of the five region categories, plotted as bars, with the proportion of variants in each region classified as VUS shown as blue points.
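Panel a compares two shares for each region category: the share of the transcript footprint in base pairs, and the share of ClinVar variants. Panel b adds the fraction of each category's variants that are classified as VUS. The sketch below shows that arithmetic on toy data. It assumes the footprint has already been split into non-overlapping, half-open intervals labelled by category. The coordinates, category names and variant labels are invented for illustration. They are not ClinVar data and are not the five categories used in the figure.

```python
from collections import Counter


def footprint_share(intervals):
    """intervals: non-overlapping half-open (start, end, category) tuples.
    Returns each category's share of the total footprint in base pairs."""
    bp = Counter()
    for start, end, category in intervals:
        bp[category] += end - start
    total = sum(bp.values())
    return {category: n / total for category, n in bp.items()}


def variant_share_and_vus(intervals, variants):
    """variants: (position, classification) tuples. Returns, per category,
    (share of all assigned variants, fraction of that category's variants that are VUS)."""
    n_var, n_vus = Counter(), Counter()
    for pos, classification in variants:
        for start, end, category in intervals:
            if start <= pos < end:
                n_var[category] += 1
                if classification == "VUS":
                    n_vus[category] += 1
                break  # intervals do not overlap, so stop at the first hit
    total = sum(n_var.values())
    return {c: (n_var[c] / total, n_vus[c] / n_var[c]) for c in n_var}


# Toy 1,000 bp transcript footprint and four toy variants (hypothetical values).
intervals = [(0, 100, "5'UTR"), (100, 400, "coding"),
             (400, 900, "intron"), (900, 1000, "3'UTR")]
variants = [(150, "Pathogenic"), (200, "VUS"), (450, "VUS"), (950, "Benign")]
print(footprint_share(intervals))
print(variant_share_and_vus(intervals, variants))
```

Under-ascertainment in the sense of the figure shows up as a category whose variant share is well below its footprint share, combined with a high VUS fraction.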
Data types used for identification of candidate non-coding regions
| Evidence | Description | Region(s) | Possible source | Extra considerations [b] |
|---|---|---|---|---|
| ATAC-seq or DNase | Flags regions of open chromatin | Promoter / CRE | ENCODE; ROADMAP epigenomics | |
| Promoter capture Hi-C | Links enhancer regions to promoters of target genes | CRE | Published datasets; 3D genome browser [a] | Transient interactions that may not be needed for enhancer function |
| Hi-C | Defines topologically associated domains (TADs) | TAD boundaries | Published datasets; 3D genome browser [a] | |
| Cap analysis gene expression (CAGE) | Marks the 5' cap of mRNA | Promoter / 5'UTR | FANTOM5 | Alternative transcripts |
| CTCF ChIP-seq | Identifies regions bound by the insulator protein CTCF | TAD boundaries | ENCODE; ROADMAP epigenomics | |
| Transcription factor ChIP-seq | Identifies regions bound by specific transcription factors | Promoter / CRE | ENCODE; ROADMAP epigenomics | |
| H3K4me1 | Histone modification found near enhancers | CRE | ENCODE; ROADMAP epigenomics | |
| H3K4me3 | Histone modification found near promoters | Promoter | ENCODE; ROADMAP epigenomics | |
| H3K27ac | Histone modification found at active regulatory elements | Promoter / CRE | ENCODE; ROADMAP epigenomics | |
| Expression quantitative trait loci (eQTLs) | Identifies variants that are associated with changes in gene expression | Promoter / CRE | GTEx; eqtlgen.org | |
| Experimental perturbation | Demonstrates an impact on gene expression of altering or deleting all or part of an element | Promoter / CRE | Published data | Assays may not be representative of the endogenous situation |
| Northern blot, RT-qPCR, RNA sequencing and microarrays | Detection of miRNAs | miRNAs | miRBase; published datasets | |
| Multiple approaches | Experimentally validated miRNA–target interactions | miRNA targets | miRTarBase; published datasets | |
| Bisulphite sequencing | Detects methylated DNA and allows identification of differentially methylated regions | Promoter / upstream gene regions | ENCODE; ROADMAP epigenomics | |
| RNA Pol II ChIP-seq | Detection of poised RNA polymerase II | Promoter | ENCODE; ROADMAP epigenomics | |
[a] 3D genome browser: 10.1186/s13059-018-1519-9
[b] Tissue specificity and temporal specificity (e.g. specificity to a developmental time point) should be considered for all evidence types.
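Rows in this table are often combined. One example is to keep only open-chromatin peaks (ATAC-seq or DNase) that also carry an active mark such as H3K27ac, and then to ask whether a candidate variant falls in the result. The sketch below shows that intersection. It assumes each track is a sorted list of non-overlapping, half-open (start, end) intervals on one chromosome, drawn from a tissue and time point relevant to the phenotype (see footnote b). The two-track rule, the function names and the coordinates are hypothetical. In practice a dedicated tool such as `bedtools intersect` would usually perform this step on BED files.

```python
from bisect import bisect_right


def intersect(a, b):
    """Intersection of two sorted, internally non-overlapping lists of half-open
    (start, end) intervals, computed with a two-pointer sweep."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first: it cannot overlap any later
        # interval in the other list.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def contains(intervals, pos):
    """True if pos lies inside one of the sorted, non-overlapping intervals."""
    k = bisect_right(intervals, (pos, float("inf"))) - 1  # last start <= pos
    return k >= 0 and intervals[k][0] <= pos < intervals[k][1]


# Hypothetical tracks on one chromosome.
open_chromatin = [(100, 200), (300, 400)]   # e.g. ATAC-seq peaks
h3k27ac = [(150, 350)]                      # e.g. H3K27ac peaks
candidates = intersect(open_chromatin, h3k27ac)
print(candidates, contains(candidates, 160), contains(candidates, 250))
```

Requiring agreement between two independent assays is one simple way to prioritise regions. Any such threshold is a design choice that should be justified for the disease and tissue in question.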
Fig. 3 ACMG evidence framework for non-coding region variants. An adapted version of the figure from Richards et al. [30] (permission granted). Rules that require no extra guidance for non-coding region variants are written in black, and those requiring extra consideration or adaptation are shown in colour. † Should not be applied if the assay assessed only one of multiple possible mechanisms. ^ Reduced to supporting following guidance from ClinGen SVI [50]. $ Variant must have at least as great an impact predicted by in silico tools.
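The framework in Fig. 3 keeps the evidence criteria and the rules for combining them from Richards et al. [30]. What changes for non-coding variants is which criteria apply and at what strength. The sketch below covers only the combining step. It assumes each criterion has already been given its final strength, for example a criterion reduced to supporting as in the ^ footnote. The function name `classify` and the strength labels are hypothetical. The rules follow the original 2015 combining rules as we understand them, and later ClinGen SVI refinements are not included. Check them against the published table before any real use.

```python
from collections import Counter


def classify(criteria):
    """criteria: (code, strength) pairs, e.g. ("PS3", "strong").
    Pathogenic strengths: very_strong, strong, moderate, supporting.
    Benign strengths: stand_alone, strong, supporting."""
    path, ben = Counter(), Counter()
    for code, strength in criteria:
        # Pathogenic codes start with "P" (PVS1, PS3, ...); benign with "B".
        (path if code.startswith("P") else ben)[strength] += 1
    pvs, ps = path["very_strong"], path["strong"]
    pm, pp = path["moderate"], path["supporting"]
    ba, bs, bp = ben["stand_alone"], ben["strong"], ben["supporting"]

    # ">=" is used throughout: any upper bound in the published rules
    # (e.g. "1 strong AND 1-2 moderate") is already covered by a stricter
    # pathogenic rule, which is checked first.
    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    p_call = "Pathogenic" if pathogenic else ("Likely pathogenic" if likely_pathogenic else None)
    b_call = "Benign" if benign else ("Likely benign" if likely_benign else None)
    if p_call and b_call:
        return "Uncertain significance"  # contradictory evidence
    return p_call or b_call or "Uncertain significance"
```

Under these rules, a very strong criterion plus one moderate criterion gives likely pathogenic, whereas a very strong criterion plus a strong one reaches pathogenic. A single supporting criterion on its own leaves the variant as a VUS.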
| Gene and protein expression are tightly controlled processes mediated by a multitude of regulatory elements (Fig. |