A short guide to long non-coding RNA gene nomenclature.

Abstract

The HUGO Gene Nomenclature Committee (HGNC) is the only organisation authorised to assign standardised nomenclature to human genes. Of the 38,000 approved gene symbols in our database (http://www.genenames.org), the majority represent protein-coding (pc) genes; however, we also name pseudogenes, phenotypic loci and some genomic features, and to date have named more than 8,500 human non-protein coding RNA (ncRNA) genes and ncRNA pseudogenes. We have already established unique names for most of the small ncRNA genes by working with experts for each class. Small ncRNAs can be grouped into their respective classes by their shared homology and common function. In contrast, long non-coding RNA (lncRNA) genes are a disparate set of loci related only by their size (more than 200 bases in length); they share no conserved sequence homology and have variable functions. As with pc genes, wherever possible, lncRNAs are named based on the known function of their product; a short guide is presented herein to help authors when developing novel gene symbols for lncRNAs with characterised function. Researchers must contact the HGNC with their suggestions prior to publication to check whether the proposed gene symbol can be approved. Although thousands of lncRNAs have been predicted in the human genome, the function of the vast majority remains unresolved. LncRNA genes with no known function are named based on their genomic context. Working with lncRNA researchers, the HGNC aims to provide unique and, wherever possible, meaningful gene symbols for all lncRNA genes.

Year: 2014 PMID: 24716852 PMCID: PMC4021045 DOI: 10.1186/1479-7364-8-7

Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639

Introduction

Since its inception in the 1970s, the HUGO Gene Nomenclature Committee (HGNC) [1] has kept pace with the discovery and characterisation of new human genes, providing each gene with a unique symbol and name and thus aiding effective scientific communication. By the time the initial sequence of the human genome was published in 2001 [2], the HGNC database (http://www.genenames.org) [3] contained more than 13,000 approved gene names, mostly for protein-coding genes, with only around 200 non-coding RNA (ncRNA) gene names. With the burgeoning research and interest in ncRNAs over the last decade, the number of ncRNA loci with gene names has expanded to more than 8,500; about 2,000 of these represent long non-coding RNA (lncRNA) genes. Whereas classes of small ncRNAs can be defined by their shared homology and common function [4], lncRNA genes are a disparate set of loci related only by their size (more than 200 bases in length); they are non-homologous and have variable functions [5]. Their discovery has been further complicated because they are expressed at very low levels, sometimes only at specific developmental stages and in specific tissues [6]. Large-scale transcriptomic analyses, such as RNA-Seq, have now revealed thousands of putative long non-coding RNAs [7]; these present unique nomenclature challenges, especially because, for the vast majority, the function of the resultant transcript(s) remains unknown. Below, we present a brief guide to the nomenclature of lncRNA genes and provide examples of some of the genes named to date.

lncRNA gene naming guidelines

The HGNC endeavours to approve symbols and names that have been used in publications, but this is not always possible. To ensure that their symbol can be approved, authors must contact the HGNC prior to publication to agree the nomenclature for each novel lncRNA gene. When creating a new lncRNA gene name, a number of factors should be taken into account:

Each approved gene symbol must be unique

This is the paramount nomenclature rule and cannot be broken. Uniqueness enables unambiguous communication: it ensures that everyone knows they are speaking about the same gene. If an author publishes a lncRNA name that is already in use for another locus, then the HGNC will have to assign an alternative symbol. For instance, a novel lncRNA required to keep epidermal cells in an undifferentiated state was published as ANCR [8], but this symbol could not be approved because it was already in use for the ‘Angelman syndrome chromosome region’; so, in agreement with the authors, the gene was approved as DANCR for ‘differentiation antagonizing non-protein coding RNA’.
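The uniqueness rule amounts to a membership test against the set of symbols already in use. The following is a minimal sketch of that idea only: the small APPROVED set and the function name is_available are hypothetical, and the authoritative check is the one made by the HGNC against its own database.

```python
# Hypothetical, tiny stand-in for the set of symbols already in use.
# The authoritative list is held by the HGNC, not by this sketch.
APPROVED = frozenset({"ANCR", "BANCR", "NEAT1", "HOTAIR", "XIST"})


def is_available(symbol: str, approved: frozenset = APPROVED) -> bool:
    """Return True if the proposed symbol is not already in use."""
    # Human symbols are uppercase, so compare in uppercase.
    return symbol.upper() not in approved


print(is_available("ANCR"))   # False: already used for another locus
print(is_available("DANCR"))  # True: not present in this toy set
```

A free symbol in such a check is only a candidate; approval still requires contacting the HGNC.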

Symbols are short-form representations of the descriptive gene name

Each lncRNA is assigned a gene symbol that is an abbreviation or acronym of a descriptive name. For example, the symbol BANCR is an abbreviation of the full name ‘BRAF-activated non-protein coding RNA’. Gene symbols are the primary descriptors used in communications about genes and their brevity makes them user friendly.

Symbols should only contain Latin letters and Arabic numerals

Gene symbols should only contain Latin letters and Arabic numerals, e.g. NEAT1 (nuclear paraspeckle assembly transcript 1). Punctuation is not used and will generally be removed or replaced by a letter or number. The use of hyphens is limited to specific exceptions, such as genes named as antisense to protein-coding genes (discussed later), e.g. BACE1-AS (BACE1 antisense RNA).

Human gene symbols are all uppercase

By long-established convention, all human gene symbols are written in uppercase letters. This distinguishes them from rodent gene symbols, in which only the first letter is uppercase and the rest are lowercase. For instance, the mouse gene Hotair is the ortholog of the human HOTAIR (HOX transcript antisense RNA) gene.
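The character and case rules can be approximated with a simple pattern check. The sketch below is an assumption-laden simplification rather than the HGNC's own validation: the pattern SYMBOL_RE and the function looks_like_human_symbol are hypothetical, and the only hyphenated forms accepted are the AS, IT and OT suffixes mentioned in this guide, although other exceptions may exist.

```python
import re

# Assumed simplification of the rules in the text:
#   - uppercase Latin letters and Arabic numerals only
#   - optionally one hyphenated suffix (AS, IT or OT) with an optional number
SYMBOL_RE = re.compile(r"[A-Z0-9]+(?:-(?:AS|IT|OT)[0-9]*)?")


def looks_like_human_symbol(symbol: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return SYMBOL_RE.fullmatch(symbol) is not None


for candidate in ["NEAT1", "BACE1-AS", "SPRY4-IT1", "Hotair", "NEAT-1"]:
    print(candidate, looks_like_human_symbol(candidate))
```

In this sketch the mouse-style Hotair and the punctuated NEAT-1 are rejected, whereas NEAT1, BACE1-AS and SPRY4-IT1 pass. Passing the check says nothing about uniqueness or approval.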

Symbols should not contain any reference to species

Symbols should not contain any reference to species, for example ‘H/h’ for human. The use of ‘human’ in gene names should also be avoided because approved human gene names are transferred across to homologous genes in other species, where ‘human’ would be potentially confusing and misleading.

Symbols should not spell out commonly used words

Whilst authors might be tempted to use commonly used words for gene symbols because they are easily recognized and pronounced, they should be avoided because they generate unnecessary confusion and make searching for information about a gene much more difficult. A good example of this is AIRN, which was first published as AIR [9]. A search with ‘AIR’ in PubMed returns more than 220,000 unrelated hits, whereas a search with the approved symbol ‘AIRN’ returns only the 10 publications specific to this gene. Other examples include EGO [10], since approved as EGOT (eosinophil granule ontogeny transcript), and PANDA [11] now PANDAR (promoter of CDKN1A antisense DNA damage activated RNA).

If possible, names should be based on function

Genes are preferentially named based on the function of the gene product. Examples include the well-known ‘XIST’, which is short for ‘X (inactive)-specific transcript’ because the transcript is involved in transcriptionally silencing one of the pair of X chromosomes, and more recently ‘TINCR’ [12], which stands for ‘tissue differentiation-inducing non-protein coding RNA’ because the product is required for epidermal tissue differentiation. If possible, the name of a gene should be based on the normal function of the gene product and not a mutant phenotype. Gene names should be concise and not attempt to represent all known information about a gene. The following are a few other things to consider in gene symbols and names:
• Must not be offensive or pejorative
• Must not be used to acknowledge individuals or places
• Should not reference names of mythical, fictional, or historical figures
• Should not be whimsical or impart no meaningful information about the gene

Functional transcribed pseudogenes should retain their pseudogene name

A small number of transcribed pseudogenes have now been shown to be functional, e.g. PTENP1 regulates levels of PTEN by binding to PTEN-targeting miRNA [13]. Transcribed pseudogenes with published function will retain their pseudogene nomenclature and not be renamed based on function; however, ‘(functional)’ is added to the end of the gene name so that these genes can be found in a search, e.g. the full name of PTENP1 is ‘phosphatase and tensin homolog pseudogene 1 (functional)’.
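The ‘(functional)’ convention is a plain suffix on the existing pseudogene name. A minimal sketch follows; the function name functional_pseudogene_name is hypothetical, and the idempotence check (not adding the tag twice) is an assumption made for convenience.

```python
def functional_pseudogene_name(pseudogene_name: str) -> str:
    """Append the '(functional)' tag unless the name already ends with it."""
    tag = "(functional)"
    name = pseudogene_name.strip()
    return name if name.endswith(tag) else f"{name} {tag}"


print(functional_pseudogene_name("phosphatase and tensin homolog pseudogene 1"))
# phosphatase and tensin homolog pseudogene 1 (functional)
```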

Naming genes with no known function

LncRNA genes with no known function are named pragmatically based on their genomic context. A schematic of the naming protocol is presented in Figure 1. This figure demonstrates how gene nomenclature can be applied in these instances, but it should not be used independently by researchers to generate lncRNA gene names, which could end up with different numbering from the approved HGNC names. If there is a proximal pc gene, then the lncRNA gene is given a symbol that begins with the pc gene symbol and ends with a suffix indicating whether it is: antisense (AS), e.g. BACE1-AS; intronic (IT), e.g. SPRY4-IT1; or overlapping (OT), e.g. SOX2-OT. Long intergenic lncRNAs (lincRNAs) that lie between pc gene loci are named with a common root symbol (LINC, ‘long intergenic non-coding RNA’) and an iterated, numerical suffix. The HGNC naming schema is consistent with the lncRNA categories annotated by GENCODE: antisense RNAs, sense intronic, sense overlapping, and lincRNA [14]. A new locus category is under consideration for lncRNAs that lie in a head-to-head orientation with a pc gene and hence putatively share a bidirectional promoter; the HGNC proposes naming these as antisense upstream (AU), e.g. GENE2-AU1. It should be noted that the HGNC does not approve names for splice variants, so the two variant transcripts opposite GENE2 in Figure 1 are named as one lncRNA gene (GENE2-AS1). Also, if an lncRNA gene encodes transcripts that span more than one protein-coding gene, then the first protein-coding gene from the 5′ end of the lncRNA is used to name it, e.g. GENE2-AS2 in Figure 1. This naming schema is applicable to most lncRNA genes, but some lncRNA genes within gene-dense regions may not fit into these discrete categories and require individual assessment by the HGNC (Additional file 1: Figure S1 shows the HGNC decision tree for naming lncRNAs with no known function).
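The shape of these context-based symbols can be illustrated in code. The sketch below only shows how such symbols are composed and is not a way to generate names: numbering is assigned by the HGNC. All function and class names are hypothetical, the five-digit zero padding of LINC symbols is an assumption modelled on symbols such as LINC00663, and the coordinate-based choice of the gene nearest the 5′ end is one possible reading of the rule for lncRNAs that span several pc genes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Suffixes taken from the text; the relation labels used as keys are assumptions.
SUFFIX = {"antisense": "AS", "intronic": "IT", "overlapping": "OT"}


def context_symbol(host: str, relation: str, index: Optional[int] = None) -> str:
    """Compose e.g. SPRY4-IT1 from a host pc gene symbol, a relation and a number."""
    number = "" if index is None else str(index)
    return f"{host}-{SUFFIX[relation]}{number}"


def linc_symbol(index: int, width: int = 5) -> str:
    """Compose a LINC symbol; the zero-padded width is an assumption."""
    return f"LINC{index:0{width}d}"


@dataclass
class PcGene:
    symbol: str
    start: int  # lowest genomic coordinate of the gene
    end: int    # highest genomic coordinate of the gene


def host_from_five_prime(lnc_strand: str, spanned: List[PcGene]) -> str:
    """Pick the spanned pc gene nearest the 5' end of the lncRNA."""
    if lnc_strand == "+":
        # A plus-strand transcript has its 5' end at the low-coordinate side.
        return min(spanned, key=lambda g: g.start).symbol
    # A minus-strand transcript has its 5' end at the high-coordinate side.
    return max(spanned, key=lambda g: g.end).symbol


print(context_symbol("BACE1", "antisense"))    # BACE1-AS
print(context_symbol("SPRY4", "intronic", 1))  # SPRY4-IT1
print(context_symbol("SOX2", "overlapping"))   # SOX2-OT
print(linc_symbol(663))                        # LINC00663
```

For a minus-strand lncRNA spanning two hypothetical genes, host_from_five_prime("-", [PcGene("GENE2", 100, 400), PcGene("GENE3", 500, 900)]) returns "GENE3" under this coordinate model. Loci in gene-dense regions may not fit these categories and need individual assessment by the HGNC.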

Figure 1

A schematic summary of the nomenclature scheme for human long ncRNA genes of no known function.

Conclusions

Working together with the lncRNA community, the HGNC aims to provide informative names for all lncRNA genes in the human genome. The simple guidelines stated in this paper are intended to guide researchers, but the only way to have a new lncRNA gene symbol approved is to contact the HGNC. For further information on lncRNA nomenclature, please see the HGNC lncRNA webpage or email us at hgnc@genenames.org.

Competing interests

The authors declare that they have no competing interests.

Additional file 1: Figure S1

HGNC decision tree for naming lncRNAs with unknown function.

References (14 in total)

1. Initial sequencing and analysis of the human genome.

Authors: E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H 
Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal: Nature Date: 2001-02-15 Impact factor: 49.962

2. The HUGO Gene Nomenclature Committee (HGNC).

Authors: S Povey; R Lovering; E Bruford; M Wright; M Lush; H Wain
Journal: Hum Genet Date: 2001-10-24 Impact factor: 4.132

3. Suppression of progenitor differentiation requires the long noncoding RNA ANCR.

Authors: Markus Kretz; Dan E Webster; Ross J Flockhart; Carolyn S Lee; Ashley Zehnder; Vanessa Lopez-Pajares; Kun Qu; Grace X Y Zheng; Jennifer Chow; Grace E Kim; John L Rinn; Howard Y Chang; Zurab Siprashvili; Paul A Khavari
Journal: Genes Dev Date: 2012-02-02 Impact factor: 11.361

Review 4. Long noncoding RNAs in cell biology.

Authors: Michael B Clark; John S Mattick
Journal: Semin Cell Dev Biol Date: 2011-01-20 Impact factor: 7.727

Review 5. Long noncoding RNAs: past, present, and future.

Authors: Johnny T Y Kung; David Colognori; Jeannie T Lee
Journal: Genetics Date: 2013-03 Impact factor: 4.562

6. The imprinted antisense RNA at the Igf2r locus overlaps but does not imprint Mas1.

Authors: R Lyle; D Watanabe; D te Vruchte; W Lerchner; O W Smrzka; A Wutz; J Schageman; L Hahner; C Davies; D P Barlow
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

7. EGO, a novel, noncoding RNA gene, regulates eosinophil granule protein transcript expression.

Authors: Lori A Wagner; Clarissa J Christensen; Diane M Dunn; Gerald J Spangrude; Ann Georgelas; Linda Kelley; M Sean Esplin; Robert B Weiss; Gerald J Gleich
Journal: Blood Date: 2007-03-09 Impact factor: 22.113

8. GENCODE: the reference human genome annotation for The ENCODE Project.

Authors: Jennifer Harrow; Adam Frankish; Jose M Gonzalez; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen L Aken; Daniel Barrell; Amonida Zadissa; Stephen Searle; If Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles Steward; Rachel Harte; Michael Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael Tress; Jose Manuel Rodriguez; Iakes Ezkurdia; Jeltje van Baren; Michael Brent; David Haussler; Manolis Kellis; Alfonso Valencia; Alexandre Reymond; Mark Gerstein; Roderic Guigó; Tim J Hubbard
Journal: Genome Res Date: 2012-09 Impact factor: 9.043

Review 9. Naming 'junk': human non-protein coding RNA (ncRNA) gene nomenclature.

Authors: Mathew W Wright; Elspeth A Bruford
Journal: Hum Genomics Date: 2011-01 Impact factor: 4.639

10. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology.

Authors: Laura Poliseno; Leonardo Salmena; Jiangwen Zhang; Brett Carver; William J Haveman; Pier Paolo Pandolfi
Journal: Nature Date: 2010-06-24 Impact factor: 49.962
