Annotating high-impact 5'untranslated region variants with the UTRannotator.

Xiaolei Zhang, Matthew Wakeling, James Ware, Nicola Whiffin

Abstract

SUMMARY: Current tools to annotate the predicted effect of genetic variants are heavily biased towards protein-coding sequence. Variants outside of these regions may have a large impact on protein expression and/or structure and can lead to disease, but this effect can be challenging to predict. Consequently, these variants are poorly annotated using standard tools. We have developed a plugin to the Ensembl Variant Effect Predictor, the UTRannotator, that annotates variants in 5'untranslated regions (5'UTR) that create or disrupt upstream open reading frames. We investigate the utility of this tool using the ClinVar database, providing an annotation for 31.9% of all 5'UTR (likely) pathogenic variants, and highlighting 31 variants of uncertain significance as candidates for further follow-up. We will continue to update the UTRannotator as we gain new knowledge on the impact of variants in UTRs.
AVAILABILITY AND IMPLEMENTATION: UTRannotator is freely available on GitHub: https://github.com/ImperialCardioGenetics/UTRannotator.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.

Year:  2021        PMID: 32926138      PMCID: PMC8150139          DOI: 10.1093/bioinformatics/btaa783

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Upstream open reading frames (uORFs) are short sequences within 5′UTRs that regulate the rate at which the downstream coding sequence is translated into protein. Variants that create or disrupt uORFs (uORF-perturbing variants) have been shown to cause rare disease (Calvo et al.; Whiffin et al.). We recently used data from the Genome Aggregation Database (gnomAD) to systematically characterize the deleteriousness of different categories of uORF-perturbing variants and prioritize those that are more likely to be disease causing (Whiffin et al.). Current variant annotation approaches focus on the impact of protein-coding variants, with only limited annotation of predicted consequences for non-coding variants. For example, the Ensembl Variant Effect Predictor (VEP) (McLaren et al.) only annotates variants within UTRs as 3′ or 5′ to the coding sequence, without any further information about their predicted effect. To aid the assessment of high-impact uORF-perturbing variants, we have developed a plugin for VEP to identify 5′UTR variants that create upstream start sites (uAUGs), disrupt the start or stop codon of existing uORFs, create a new stop codon within existing uORFs, or shift the frame of an existing uORF. In each case, the tool outputs detailed annotations that allow the user to predict the likely impact of the variant on protein translation. The recently described MORFEE tool (Aïssi et al.) is limited to annotating single nucleotide variants (SNVs) that create uAUGs. The UTRannotator is, to our knowledge, the first comprehensive annotation tool for 5′UTR uORF-creating and uORF-disrupting variants. Our tool was initially created to characterize the impact of uORF-perturbing variants; however, it will be updated to annotate additional UTR variants as we learn how to interpret their role in human disease.

2 Approach

For any SNV, 1–5 bp small insertion/deletion (indel) or multi-nucleotide variant (MNV) in a 5′UTR, we first summarize the number of uORFs in the 5′UTR in the reference sequence. Then, for each variant within the 5′UTR we evaluate whether it would have any of the following consequences, on any annotated transcript: (i) creating a new start codon AUG to introduce a new uORF; (ii) removing an existing start codon AUG; (iii) removing the stop codon of an existing uORF; (iv) creating a new stop codon to shorten an existing uORF; (v) disrupting an existing uORF with a frameshift deletion or insertion, in which the number of nucleotides inserted or deleted is not a multiple of three. Where a variant has multiple annotation consequences, it is evaluated for each separately. To enable evaluation of the effect of each variant, the UTRannotator outputs detailed annotations for each type of uORF-perturbing variant (Table 1). These include the subtype of uORF created and/or disrupted (i.e. whether this is a distinct uORF with a stop codon in the 5′UTR, or an ORF that overlaps the coding sequence either in- or out-of-frame), and the strength of the match between the created and/or disrupted uORF start site and the Kozak consensus sequence (Kozak, 1989). For a variant disrupting a uORF, we also evaluate whether the uORF has any experimental evidence of translation, by checking it against a curated list of uORFs previously identified with ribosome profiling from the online repository of small ORFs (www.sorfs.org) (Olexiouk et al.). Users can also supply their own customized list of translated uORFs. Given that ribosome profiling datasets are currently limited in the cell types/tissues and conditions analysed, we output results for all possible uORF-disrupting variants and include experimental evidence as an annotation.
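The SNV cases among consequences (i) to (iv) can be illustrated with a minimal Python sketch. This is not the plugin's implementation (the UTRannotator is a VEP plugin and also handles indels, MNVs, frameshifts and transcript coordinates); it assumes a plus-strand 5′UTR given as a DNA string, so ATG stands in for AUG, and the names `find_uorfs` and `classify_snv` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set

STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA alphabet; ATG stands in for AUG


class UORF(NamedTuple):
    start: int            # 0-based index of the A of the ATG
    stop: Optional[int]   # index of the first in-frame stop codon, or None


def find_uorfs(utr: str) -> List[UORF]:
    """Pair every ATG in the 5'UTR with its first in-frame stop, if any."""
    utr = utr.upper()
    uorfs = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] != "ATG":
            continue
        stop = next((j for j in range(i + 3, len(utr) - 2, 3)
                     if utr[j:j + 3] in STOP_CODONS), None)
        uorfs.append(UORF(i, stop))
    return uorfs


def classify_snv(utr: str, pos: int, alt: str) -> Set[str]:
    """Return the uORF-perturbing consequences of replacing utr[pos] with alt."""
    ref_seq = utr.upper()
    alt_seq = ref_seq[:pos] + alt.upper() + ref_seq[pos + 1:]
    ref_uorfs = find_uorfs(ref_seq)
    ref_starts = {u.start for u in ref_uorfs}
    alt_starts = {u.start for u in find_uorfs(alt_seq)}

    consequences = set()
    if alt_starts - ref_starts:
        consequences.add("uAUG_gained")
    if ref_starts - alt_starts:
        consequences.add("uAUG_lost")

    for u in ref_uorfs:
        if u.start not in alt_starts:
            continue  # start codon destroyed; already reported as uAUG_lost
        if u.stop is not None and u.stop <= pos < u.stop + 3:
            # Variant hits the reference stop codon of this uORF.
            if alt_seq[u.stop:u.stop + 3] not in STOP_CODONS:
                consequences.add("uSTOP_lost")
            continue
        if pos < u.start + 3:
            continue  # upstream of, or inside, the start codon
        # Codon of this uORF's reading frame that contains the variant.
        codon_start = pos - (pos - u.start) % 3
        before_ref_stop = u.stop is None or codon_start < u.stop
        if before_ref_stop and alt_seq[codon_start:codon_start + 3] in STOP_CODONS:
            consequences.add("uSTOP_gained")
    return consequences


utr = "CCATGCAATAGCC"  # toy 5'UTR: ATG at index 2, in-frame TAG at index 8
print(classify_snv(utr, 5, "T"))  # CAA -> TAA inside the uORF
```

On the toy sequence, a single substitution can carry more than one consequence: changing index 9 from A to G both destroys the TAG stop codon and creates a new ATG at index 7, which is why each consequence is evaluated separately.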
Table 1.

Details of the annotations provided for different categories of uORF-perturbing variants.

Consequence | uAUG-gained | uAUG-lost | uSTOP-lost | uSTOP-gained | uFrameshift
Number of existing uORFs
KozakContext: sequence and strength
Start distance to CDS
Start distance to STOP
With translated evidence
uORF subtype: √ (ref and alt)
Other annotations: Start distance from cap; Whether there is an alternative STOP, alternative stop distance to CDS, frame of disrupted uORF with CDS; New stop distance to CDS
Since a 5′UTR can have multiple existing uORFs, for each 5′UTR variant we output the annotations for all disrupted uORFs. Detailed information on installing and running UTRannotator can be found in Supplementary Information. The time complexity of our implementation is linear in the number of input variants. The ratio of running time without the plugin to that with the plugin, tested on 1000 random variants (60% annotated as 5′UTR variants), is 1.02–1.07 (5 replications).
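The KozakContext annotation in Table 1 reports a sequence and a strength. The sketch below shows one way such a value could be derived. The scoring rule is an assumption based on a commonly used convention, not something stated in this text: the start site is Strong when position −3 is a purine (A or G) and position +4 is G, Moderate when exactly one of the two holds, and Weak otherwise. The function name `kozak_context` is hypothetical.

```python
from typing import Optional, Tuple


def kozak_context(seq: str, atg_index: int) -> Optional[Tuple[str, str]]:
    """Return (7-base context from -3 to +4, strength) for the ATG at atg_index.

    Assumed rule (not taken from the paper): Strong if -3 is A/G and +4 is G,
    Moderate if exactly one condition holds, Weak if neither does.
    Returns None when the flanking bases fall outside the sequence.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return None  # -3 or +4 position is not available
    context = seq[atg_index - 3:atg_index + 4]
    minus3, plus4 = context[0], context[6]
    hits = int(minus3 in "AG") + int(plus4 == "G")
    return context, ("Weak", "Moderate", "Strong")[hits]


print(kozak_context("GCCATGG", 3))
```

Returning `None` for a start codon too close to either end of the supplied sequence keeps the caller from scoring a truncated context as Weak.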

3 Results

To show the utility of the UTRannotator, we annotated all 5′UTR variants interpreted as pathogenic/likely pathogenic or of uncertain significance in ClinVar (version 202005) (Landrum et al.). These variants do not have a coding annotation on any transcript. However, we note that 5′UTR variants are under-represented in ClinVar as they are rarely sequenced and/or reported. There are 97 Pathogenic/Likely pathogenic 5′UTR variants in ClinVar (97/113 969 = 0.085% of all ClinVar Pathogenic/Likely pathogenic variants). Of these, 91 are 1–5 bp small variations, 29 of which (31.9%) are annotated as creating or disrupting uORFs by our plugin (Fig. 1; Supplementary Table S1). We examined the evidence behind the reported clinical significance for each of these 29 variants, and found that 15 (51.7%) have previously been attributed to a uORF-perturbing mechanism.
Fig. 1.

5′UTR variants in ClinVar annotated by the UTRannotator. (a) A schematic showing the five distinct consequences of 5′UTR variants annotated by the tool: those that create an upstream AUG (uAUG_gained), those that disrupt the start site of an existing upstream open reading frame (uORF; uAUG_lost), those that cause a frameshift in the sequence of the uORF (uFrameShift), those that introduce a new stop codon into an existing uORF (uSTOP_gained) and those that disrupt the stop site of an existing uORF (uSTOP_lost). (b) The counts of each variant category that are classified as Pathogenic/Likely Pathogenic (teal) or Uncertain Significance (VUS; grey) in ClinVar

There are 5128 5′UTR variants of uncertain significance (VUS) reported in ClinVar (5128/255 691 = 2.0% of all VUS), 4966 of which are 1–5 bp small variations. Our plugin annotated 377 of these (7.6%) as creating or disrupting uORFs on at least one annotated transcript (Supplementary Table S2). We used the detailed annotations from the UTRannotator to illustrate how to prioritize the 5′UTR VUS that are most promising for further follow-up. We first restricted to variants that form new overlapping ORFs (oORFs) with start sites that are Strong or Moderate matches to the Kozak consensus sequence, or that are uORFs with documented evidence of translation, as we previously showed that variants with these consequences are under the strongest negative selection (Whiffin et al.). Finally, we retained only variants in the 3191 genes previously identified as having a ‘High’ likelihood that uORF-perturbation could be an important disease mechanism (Whiffin et al.). Through this approach, we identified 31 potential ‘high-impact’ ClinVar 5′UTR VUS (Supplementary Table S3).
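The two-step prioritization can be sketched as a simple filter over annotated records. This is a minimal illustration under stated assumptions: the record fields (`forms_oorf`, `kozak_strength`, `translated_evidence`), the function name and the toy gene names are all hypothetical and do not correspond to the plugin's actual output columns or to any real ClinVar entry.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class AnnotatedVariant:
    # Hypothetical fields, standing in for annotations parsed from the tool output.
    variant_id: str
    gene: str
    forms_oorf: bool           # variant creates an ORF overlapping the CDS
    kozak_strength: str        # "Strong", "Moderate" or "Weak"
    translated_evidence: bool  # affected uORF has evidence of translation


def prioritise(variants: Iterable[AnnotatedVariant],
               high_likelihood_genes: Set[str]) -> List[AnnotatedVariant]:
    """Keep variants passing the consequence filter and the gene filter."""
    kept = []
    for v in variants:
        strong_oorf = v.forms_oorf and v.kozak_strength in {"Strong", "Moderate"}
        if not (strong_oorf or v.translated_evidence):
            continue  # step 1: consequence-based restriction
        if v.gene not in high_likelihood_genes:
            continue  # step 2: gene-level restriction
        kept.append(v)
    return kept


# Toy records for illustration only.
toy = [
    AnnotatedVariant("v1", "GENE_A", True, "Strong", False),
    AnnotatedVariant("v2", "GENE_A", True, "Weak", False),
    AnnotatedVariant("v3", "GENE_B", False, "Weak", True),
    AnnotatedVariant("v4", "GENE_C", True, "Moderate", False),
]
print([v.variant_id for v in prioritise(toy, {"GENE_A", "GENE_B"})])
```

Applying the consequence filter before the gene filter mirrors the order described in the text; because both are simple predicates, the order does not change which variants are retained.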

4 Discussion

We have created a freely available tool, as a plugin to the Ensembl VEP, that annotates variants that create or disrupt uORFs. The output from the tool can be used to predict the possible impact of variants identified in patients and their potential role in disease. It can also be applied directly to annotate 5′UTR variants from other eukaryotes (see Supplementary Information). We initially referenced our development of this tool in prior work (Whiffin et al.); since then, we have greatly expanded the variant types evaluated (including small indels and MNVs) and the consequences annotated (including uAUG-lost, uSTOP-gained and uORF frameshift variants). We note several limitations to our tool. Firstly, the UTRannotator has been configured to annotate only variants up to 5 bp in length. We included this length restriction for two reasons: (i) the annotation of longer indels is difficult, as the chance of a variant having multiple possible annotations is increased, and (ii) the impact of larger indels that add or remove large stretches of UTR is currently unknown. Secondly, we currently only consider uORFs with canonical AUG start sites, although it is known that many translated uORFs use non-canonical start sites (McGillivray et al.). More research is needed into the impact of variants that create or disrupt these non-canonical uORFs in human disease. For the initial tool release, we have included five variant types that create or disrupt uORFs; however, we will continue to develop the UTRannotator to include additional types of UTR variants.

Funding

N.W. was supported by a Rosetrees and Stoneygate Imperial College Research Fellowship. This work was supported by the Wellcome Trust [107469/Z/15/Z; 200990/A/16/Z], Medical Research Council (UK), British Heart Foundation [RE/18/4/34215], National Institute for Health Research (NIHR) Royal Brompton Cardiovascular Biomedical Research Unit, and the NIHR Imperial College Biomedical Research Centre.

Conflict of Interest: none declared.
References

1.  A comprehensive catalog of predicted functional upstream open reading frames in humans.

Authors:  Patrick McGillivray; Russell Ault; Mayur Pawashe; Robert Kitchen; Suganthi Balasubramanian; Mark Gerstein
Journal:  Nucleic Acids Res       Date:  2018-04-20       Impact factor: 16.971

2.  Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans.

Authors:  Sarah E Calvo; David J Pagliarini; Vamsi K Mootha
Journal:  Proc Natl Acad Sci U S A       Date:  2009-04-16       Impact factor: 11.205

Review 3.  The scanning model for translation: an update.

Authors:  M Kozak
Journal:  J Cell Biol       Date:  1989-02       Impact factor: 10.539

4.  ClinVar: improving access to variant interpretations and supporting evidence.

Authors:  Melissa J Landrum; Jennifer M Lee; Mark Benson; Garth R Brown; Chen Chao; Shanmuga Chitipiralla; Baoshan Gu; Jennifer Hart; Douglas Hoffman; Wonhee Jang; Karen Karapetyan; Kenneth Katz; Chunlei Liu; Zenith Maddipatla; Adriana Malheiro; Kurt McDaniel; Michael Ovetsky; George Riley; George Zhou; J Bradley Holmes; Brandi L Kattman; Donna R Maglott
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

5.  An update on sORFs.org: a repository of small ORFs identified by ribosome profiling.

Authors:  Volodimir Olexiouk; Wim Van Criekinge; Gerben Menschaert
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

6.  The Ensembl Variant Effect Predictor.

Authors:  William McLaren; Laurent Gil; Sarah E Hunt; Harpreet Singh Riat; Graham R S Ritchie; Anja Thormann; Paul Flicek; Fiona Cunningham
Journal:  Genome Biol       Date:  2016-06-06       Impact factor: 13.583

7.  Characterising the loss-of-function impact of 5' untranslated region variants in 15,708 individuals.

Authors:  Daniel G MacArthur; James S Ware; Nicola Whiffin; Konrad J Karczewski; Xiaolei Zhang; Sonia Chothani; Miriam J Smith; D Gareth Evans; Angharad M Roberts; Nicholas M Quaife; Sebastian Schafer; Owen Rackham; Jessica Alföldi; Anne H O'Donnell-Luria; Laurent C Francioli; Stuart A Cook; Paul J R Barton
Journal:  Nat Commun       Date:  2020-05-27       Impact factor: 14.919
