
The HGNC Database in 2008: a resource for the human genome.

Elspeth A Bruford, Michael J Lush, Mathew W Wright, Tam P Sneddon, Sue Povey, Ewan Birney.

Abstract

The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique and ideally meaningful name and symbol to every human gene. The HGNC database currently comprises over 24 000 public records containing approved human gene nomenclature and associated gene information. Following our recent relocation to the European Bioinformatics Institute our homepage can now be found at http://www.genenames.org, with direct links to the searchable HGNC database and other related database resources, such as the HCOP orthology search tool and manually curated gene family webpages.

Year: 2007 PMID: 17984084 PMCID: PMC2238870 DOI: 10.1093/nar/gkm881

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The HUGO Gene Nomenclature Committee (HGNC) was founded in 1977 by the Human Gene Mapping community to provide a single worldwide authority to assign unique, standardized and user-friendly gene symbols to human genes. Since 1996 the HGNC has been based at University College London, UK, but in 2007 the Committee relocated to the European Bioinformatics Institute on the Wellcome Trust Genome Campus near Cambridge, UK. The website for the HGNC can now be found at http://www.genenames.org; we ask all users to update their bookmarks to this new URL as our old website at www.gene.ucl.ac.uk/nomenclature is now offline. This site provides direct links to enable users to search and download information from the HGNC database (1), which currently contains over 24 000 public gene records, or ‘symbol reports’. The majority of these records represent protein-coding genes, though there are also records for pseudogenes, non-protein-coding RNA genes, phenotypes and a limited number of genomic features such as fragile sites. The primary identifier for each record is the current approved gene symbol, which is an acronym or abbreviation of the associated gene name. Each entry is also assigned a unique ‘HGNC ID’, which enables easy data tracking regardless of updates in the nomenclature of any given entry. Further data contained in each record include the chromosomal location of the gene, defining nucleotide sequences and publications, other symbols and names for the gene (aliases) and links to a variety of external resources.

ACCESSING THE HGNC DATABASE

The HGNC dataset can be accessed in a number of ways. Many users search and retrieve gene records using the online search facility; a simple search can be found on the new homepage at www.genenames.org. In addition, an advanced search feature is located at http://www.genenames.org/cgi-bin/hgnc_search.pl, and allows the user to define up to four search terms from a variety of data fields, including approved symbol, approved name, alias symbol, alias name, previously approved name, chromosome and HGNC ID. Results can be displayed in HTML or text format, and sorted by approved symbol or chromosome. The HGNC dataset can also be accessed using our data downloads facility (http://www.genenames.org/data/gdlw_index.html). Alongside the standard ‘Core data’, ‘Core Data by Chromosome’ and ‘All Data’ datasets, the facility offers a custom downloads feature, a web-based interface that allows users to: select columns of data for output as text or HTML; execute limited SQL queries; generate PHP and Perl code; and save searches for future reference. Two new fields have recently been added to the public dataset: ‘Gene Family Name’, which indicates the name of the family or families a gene has been assigned to; and ‘Ensembl ID (mapped)’, which is derived from the current build of the Ensembl database (2) and provided by the Ensembl team.
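As an illustration of working with a text-format download, the following minimal Python sketch indexes tab-delimited records by HGNC ID. The column headers, the sample rows and the function name index_by_hgnc_id are assumptions made for the example; a real export should be checked for its actual column names.

```python
import csv
import io

# Hypothetical excerpt of a text-format download. The headers and rows are
# placeholders invented for this sketch, not real HGNC records.
SAMPLE = (
    "HGNC ID\tApproved Symbol\tApproved Name\tChromosome\n"
    "1\tEXAMPLE1\texample gene 1\t1p1\n"
    "2\tEXAMPLE2\texample gene 2\t2q2\n"
)


def index_by_hgnc_id(text):
    """Return {HGNC ID: row dict} from tab-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["HGNC ID"]: row for row in reader}


records = index_by_hgnc_id(SAMPLE)
print(records["2"]["Approved Symbol"])
```

Keying the index on the HGNC ID rather than the symbol keeps lookups stable when a symbol is later updated.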

LINKING TO THE HGNC DATABASE

It is still very easy to link directly to a specific HGNC symbol report. In line with our new domain name, URLs of the form http://www.genenames.org/data/hgnc_data.php?match=ABCA1 link directly via the approved gene symbol; however, we recommend users link directly to records using the HGNC ID, in the format http://www.genenames.org/data/hgnc_data.php?hgnc_id=29, as this will allow links to be maintained if the approved gene symbol changes. Standard symbol reports include nine fields: approved symbol, approved name, HGNC ID, status of the record (‘approved’, ‘symbol withdrawn’ for previously approved entries, or ‘entry withdrawn’ for entries that are no longer thought to exist), chromosomal location, previous symbols, previous names, aliases and name aliases.
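The two link formats above can be generated programmatically. The sketch below assumes only the URL patterns quoted in the text; the function name symbol_report_url is hypothetical, and the HGNC ID form is preferred whenever an ID is available.

```python
from urllib.parse import urlencode

# Base URL as quoted in the text above.
BASE = "http://www.genenames.org/data/hgnc_data.php"


def symbol_report_url(hgnc_id=None, symbol=None):
    """Build a symbol report link, preferring the stable HGNC ID."""
    if hgnc_id is not None:
        # ID-based links survive changes to the approved symbol.
        return f"{BASE}?{urlencode({'hgnc_id': hgnc_id})}"
    if symbol:
        # Symbol-based links break if the approved symbol is updated.
        return f"{BASE}?{urlencode({'match': symbol})}"
    raise ValueError("an HGNC ID or an approved symbol is required")


print(symbol_report_url(hgnc_id=29))
print(symbol_report_url(symbol="ABCA1"))
```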

LINKS TO EXTERNAL DATABASES FROM THE HGNC DATABASE

Symbol reports also contain links to established genome resources via both HGNC-curated data and mapped data provided by the external database; each field is labelled to distinguish curated from mapped data. RefSeq (3) IDs and International Nucleotide Sequence Database accessions are used to link out to GenBank (4) and the UCSC Browser and Gene Index (5). Entrez Gene (6) IDs take the user to the relevant entry in the NCBI's Gene database or Map Viewer (7). Curated PubMed (7) IDs link to specific publications in PubMed, and OMIM (8) IDs to OMIM records. Mapped UniProt (9) IDs link out to SwissProt and UniProt, and recently included Ensembl IDs take the user directly to the Ensembl GeneView (2) for the gene in question. Basic links that query external databases using the approved gene symbol are also provided at the bottom of each symbol report, and these link to GENATLAS (10), GeneCards (11), HCOP (12), GeneClinics/GeneTests (13), Vega (14) and Treefam (15). Over the last two years, the HGNC has been actively developing reciprocal links with databases specializing in specific gene (or RNA) families or groupings. This both broadens the range of resources available to the community via our symbol report pages, and additionally provides publicity for useful resources that may otherwise be overlooked in a casual search. The majority of our specialist database links, listed in Table 1, are manually curated by the HGNC team, though some (e.g. the KZNF Gene Catalog and IUPHAR) are automatically mapped from download files provided by the specialist database.

Table 1. List of specialist database links in the HGNC database

Database	URL	Number of links
microRNA sequence database (miRBase) (16)	http://microrna.sanger.ac.uk/	472
Human Olfactory Receptor Data Exploratorium (HORDE) (11)	http://bioportal.weizmann.ac.il/HORDE/	857
Human Cell Differentiation Molecules (17)	http://www.hcdm.org/	363
RNA families database (Rfam) (18)	http://www.sanger.ac.uk/Software/Rfam/	62
snoRNABase (Database of human H/ACA and C/D box snoRNAs) (19)	http://www-snorna.biotoul.fr/	372
Lawrence Livermore National Laboratory Human KZNF Gene Catalog (LLNL) (20)	http://znf.llnl.gov/catalog/	462
Intermediate Filament Database	http://www.interfil.org/	69
IUPHAR Database	http://www.iuphar-db.org/	188
ImMunoGeneTics information system (IMGT) (21)	http://imgt.cines.fr/	660
MEROPS (the peptidase database) (22)	http://merops.sanger.ac.uk/	648


GENE FAMILY RESOURCES

Since 2006 we have significantly expanded our resources for specific subsets of genes, related either by function, location or phenotype (gene groupings) or by sequence similarity (gene families) (see http://www.genenames.org/genefamily.html). Assignment of genes into gene families or groupings is based on sequence analyses, publications, information from specialist advisors for specific families and data from other databases. For gene family members, we strongly encourage the use of a stem (or root) symbol as the basis for a hierarchical series that allows the easy identification of other related members in both database searches and the literature. The HGNC currently provides over 170 manually curated webpages dedicated to individual gene families or groupings, and lists over 60 links to externally managed family/grouping resources. If you would like us to create webpages for other specific gene families, or include links to external gene family pages or resources, please contact us.
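To show why a shared stem makes related genes easy to find, the following sketch groups symbols by stem. It rests on a deliberately simple assumption, that the stem is the symbol with any trailing digits removed; this does not hold for every series (for example, symbols ending in a letter), and the function names and input list are illustrative only.

```python
import re
from collections import defaultdict


def stem_of(symbol):
    """Return the symbol without trailing digits (a simplifying assumption)."""
    match = re.match(r"^(.*?)(\d+)$", symbol)
    return match.group(1) if match else symbol


def group_by_stem(symbols):
    """Map each stem to the sorted list of symbols that share it."""
    groups = defaultdict(list)
    for symbol in symbols:
        groups[stem_of(symbol)].append(symbol)
    return {stem: sorted(members) for stem, members in groups.items()}


# Illustrative input list, chosen for the example.
print(group_by_stem(["ACOT2", "ACOT1", "ABCA1"]))
```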

ORTHOLOGY RESOURCES

Orthologs are genes in different species that derive from a common ancestor and generally share the same function. The utility of standardized orthologous gene names is perhaps one of the strongest arguments for approved nomenclature and cooperation between nomenclature committees, and without this resource the analysis of genomes would be made far more difficult. We closely coordinate our efforts with the Mouse Genome Informatics (MGI) (23) Nomenclature Group and endeavour to approve the equivalent gene symbol for each human/mouse ortholog pair (e.g. human ACOT1 and mouse Acot1). As part of the nomenclature assignment we research the orthology for each human gene and then add the corresponding MGI ID for the orthologous mouse gene to our database, thereby associating each human gene with its mouse ortholog. These hand-curated MGI IDs are displayed in the gene symbol report as a hyperlink direct to the relevant MGI database (23) entry. The HGNC Comparison of Orthology Predictions search tool, HCOP (http://www.genenames.org/hcop), enables users to compare orthologs predicted for a specified human gene, or set of human genes (12,24). HCOP shows orthology predictions between human and seven other genomes (mouse, rat, chimp, dog, chicken, zebrafish and fruitfly), and currently includes data from Ensembl (2), Evola from the H-Invitational database (25), HGNC, HomoloGene (6), Inparanoid (26), MGI (23), PhIGS (27), PhyOP (28), Treefam (15) and ZFIN (29). Users can assess the reliability of the prediction from the number of these different sources that identify a particular orthologous pair. For ease of use, search terms can be either an approved symbol (e.g. ACOT1), a term from an approved gene name (e.g. ‘thioesterase’), an Entrez Gene ID (e.g. 641371), HGNC ID or MGI ID (e.g. HGNC: 33128 or MGI: 1349396), or a RefSeq accession (e.g. NM_001037161). 
We recently updated HCOP to include a reciprocal orthology search link, using the Entrez Gene ID from the orthologous gene to identify human orthologs. In addition to the orthology predictions, the data returned includes the official nomenclatures, DNA sequences, database identifiers, aliases and chromosomal locations for each putative ortholog pair. We plan to expand this resource to include other species and orthology prediction databases.
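The idea of judging a prediction by how many sources support it can be sketched as a simple count per ortholog pair. The input tuples below are invented for illustration and do not reflect actual HCOP output; support_counts is a hypothetical helper, not part of HCOP.

```python
from collections import defaultdict


def support_counts(predictions):
    """Count distinct sources per (human symbol, ortholog symbol) pair.

    predictions: iterable of (source, human_symbol, ortholog_symbol).
    """
    sources = defaultdict(set)
    for source, human, ortholog in predictions:
        # A set ensures each source is counted once per pair.
        sources[(human, ortholog)].add(source)
    return {pair: len(names) for pair, names in sources.items()}


# Invented example predictions; real results should come from HCOP itself.
example = [
    ("Ensembl", "ACOT1", "Acot1"),
    ("HomoloGene", "ACOT1", "Acot1"),
    ("Treefam", "ACOT1", "Acot1"),
    ("Inparanoid", "ACOT1", "Acot2"),
    ("Ensembl", "ACOT1", "Acot1"),  # duplicate row, counted once
]
counts = support_counts(example)
for pair, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(pair, n)
```

A pair supported by several independent sources would rank above one reported by a single source, which mirrors the reliability assessment described above.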

VARIATION RESOURCES

In recent years, it has been shown that an increasing number of genes that were originally assumed to be single copy in the human genome actually vary in copy number between individuals (copy number variants, CNVs). Following consultation with the research community, and to complement the introduction of these data into the major genome databases, the HGNC decided it was vital to establish a copy number variant gene nomenclature system that would be flexible, dynamic and, most importantly, accepted and used by the research community. Hence we are in the process of populating our database with CNV genes and associated nomenclature, using published data taken from the Database of Genomic Variants (30). To display this information in a useful and easily accessed format, we will be implementing a hierarchical structure within the HGNC database that will be public by 2008. This will allow users to link from a standard symbol report to sub-entries containing nomenclature and sequence data for each CNV copy. In addition to copy number variant genes, this new hierarchical database structure will also allow us to capture and represent information concerning other types of genomic variation, including complex allelic gene loci such as the immunoglobulins, T-cell receptors and protocadherins, and read-through/chimaeric transcripts.

FUTURE DIRECTIONS

We are planning to develop an HGNC data mining interface based on the BioMart (31) infrastructure. This will allow standalone data mining of the HGNC dataset and will be easily linked to other BioMart instances, including HapMap (32), Reactome (33) and Ensembl (2). We are also aiming to increase the proportion of curated links to external resources and welcome suggestions for further resources we could be linking to. To be notified of future developments in the HGNC database and website, please subscribe to our newsletter by emailing hgnc@genenames.org with the subject line ‘subscribe’.

FEEDBACK

We welcome your feedback on any aspect of our work, including specific gene symbols and names. Please click on the ‘feedback’ link on our homepage to send us your comments and/or suggestions. Users can now also submit data directly to the HGNC using our online Gene Symbol Request Form (http://www.genenames.org/cgi-bin/hgnc_request.pl). This facility can be used to enquire whether approved nomenclature has been assigned to a gene, to request an update to the nomenclature of a named gene, or to request nomenclature for a gene or copy number variant that does not yet have approved nomenclature.

CITATION

Authors are requested to cite this article and the database in the following format: ‘The HGNC Database, HUGO Gene Nomenclature Committee (HGNC), European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK (URL: http://www.genenames.org/)’. [Include month and year in which you retrieved the data cited.]

33 in total

1. HCOP: the HGNC comparison of orthology predictions search tool.

Authors: Mathew W Wright; Tina A Eyre; Michael J Lush; Sue Povey; Elspeth A Bruford
Journal: Mamm Genome Date: 2005-11-11 Impact factor: 2.957

2. The International HapMap Project Web site.

Authors: Gudmundur A Thorisson; Albert V Smith; Lalitha Krishnan; Lincoln D Stein
Journal: Genome Res Date: 2005-11 Impact factor: 9.043

Review 3. Genatlas database, genes and development defects.

Authors: J Frézal
Journal: C R Acad Sci III Date: 1998-10

4. Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE.

Authors: Marilyn Safran; Vered Chalifa-Caspi; Orit Shmueli; Tsviya Olender; Michal Lapidot; Naomi Rosen; Michael Shmoish; Yakov Peter; Gustavo Glusman; Ester Feldmesser; Avital Adato; Inga Peter; Miriam Khen; Tal Atarot; Yoram Groner; Doron Lancet
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

5. CD molecules 2005: human cell differentiation molecules.

Authors: Heddy Zola; Bernadette Swart; Ian Nicholson; Bent Aasted; Armand Bensussan; Laurence Boumsell; Chris Buckley; Georgina Clark; Karel Drbal; Pablo Engel; Derek Hart; Václav Horejsí; Clare Isacke; Peter Macardle; Fabio Malavasi; David Mason; Daniel Olive; Armin Saalmueller; Stuart F Schlossman; Reinhard Schwartz-Albiez; Paul Simmons; Thomas F Tedder; Mariagrazia Uguccioni; Hilary Warren
Journal: Blood Date: 2005-07-14 Impact factor: 22.113

6. Mendelian Inheritance in Man and its online version, OMIM.

Authors: Victor A McKusick
Journal: Am J Hum Genet Date: 2007-03-08 Impact factor: 11.025

7. Rfam: annotating non-coding RNAs in complete genomes.

Authors: Sam Griffiths-Jones; Simon Moxon; Mhairi Marshall; Ajay Khanna; Sean R Eddy; Alex Bateman
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

8. The Vertebrate Genome Annotation (Vega) database.

Authors: J L Ashurst; C-K Chen; J G R Gilbert; K Jekosch; S Keenan; P Meidl; S M Searle; J Stalker; R Storey; S Trevanion; L Wilming; T Hubbard
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

9. Inparanoid: a comprehensive database of eukaryotic orthologs.

Authors: Kevin P O'Brien; Maido Remm; Erik L L Sonnhammer
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

10. Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

Authors: Tadashi Imanishi; Takeshi Itoh; Yutaka Suzuki; Claire O'Donovan; Satoshi Fukuchi; Kanako O Koyanagi; Roberto A Barrero; Takuro Tamura; Yumi Yamaguchi-Kabata; Motohiko Tanino; Kei Yura; Satoru Miyazaki; Kazuho Ikeo; Keiichi Homma; Arek Kasprzyk; Tetsuo Nishikawa; Mika Hirakawa; Jean Thierry-Mieg; Danielle Thierry-Mieg; Jennifer Ashurst; Libin Jia; Mitsuteru Nakao; Michael A Thomas; Nicola Mulder; Youla Karavidopoulou; Lihua Jin; Sangsoo Kim; Tomohiro Yasuda; Boris Lenhard; Eric Eveno; Yoshiyuki Suzuki; Chisato Yamasaki; Jun-ichi Takeda; Craig Gough; Phillip Hilton; Yasuyuki Fujii; Hiroaki Sakai; Susumu Tanaka; Clara Amid; Matthew Bellgard; Maria de Fatima Bonaldo; Hidemasa Bono; Susan K Bromberg; Anthony J Brookes; Elspeth Bruford; Piero Carninci; Claude Chelala; Christine Couillault; Sandro J de Souza; Marie-Anne Debily; Marie-Dominique Devignes; Inna Dubchak; Toshinori Endo; Anne Estreicher; Eduardo Eyras; Kaoru Fukami-Kobayashi; Gopal R Gopinath; Esther Graudens; Yoonsoo Hahn; Michael Han; Ze-Guang Han; Kousuke Hanada; Hideki Hanaoka; Erimi Harada; Katsuyuki Hashimoto; Ursula Hinz; Momoki Hirai; Teruyoshi Hishiki; Ian Hopkinson; Sandrine Imbeaud; Hidetoshi Inoko; Alexander Kanapin; Yayoi Kaneko; Takeya Kasukawa; Janet Kelso; Paul Kersey; Reiko Kikuno; Kouichi Kimura; Bernhard Korn; Vladimir Kuryshev; Izabela Makalowska; Takashi Makino; Shuhei Mano; Regine Mariage-Samson; Jun Mashima; Hideo Matsuda; Hans-Werner Mewes; Shinsei Minoshima; Keiichi Nagai; Hideki Nagasaki; Naoki Nagata; Rajni Nigam; Osamu Ogasawara; Osamu Ohara; Masafumi Ohtsubo; Norihiro Okada; Toshihisa Okido; Satoshi Oota; Motonori Ota; Toshio Ota; Tetsuji Otsuki; Dominique Piatier-Tonneau; Annemarie Poustka; Shuang-Xi Ren; Naruya Saitou; Katsunaga Sakai; Shigetaka Sakamoto; Ryuichi Sakate; Ingo Schupp; Florence Servant; Stephen Sherry; Rie Shiba; Nobuyoshi Shimizu; Mary Shimoyama; Andrew J Simpson; Bento Soares; Charles Steward; Makiko Suwa; Mami Suzuki; Aiko Takahashi; Gen Tamiya; Hiroshi 
Tanaka; Todd Taylor; Joseph D Terwilliger; Per Unneberg; Vamsi Veeramachaneni; Shinya Watanabe; Laurens Wilming; Norikazu Yasuda; Hyang-Sook Yoo; Marvin Stodolsky; Wojciech Makalowski; Mitiko Go; Kenta Nakai; Toshihisa Takagi; Minoru Kanehisa; Yoshiyuki Sakaki; John Quackenbush; Yasushi Okazaki; Yoshihide Hayashizaki; Winston Hide; Ranajit Chakraborty; Ken Nishikawa; Hideaki Sugawara; Yoshio Tateno; Zhu Chen; Michio Oishi; Peter Tonellato; Rolf Apweiler; Kousaku Okubo; Lukas Wagner; Stefan Wiemann; Robert L Strausberg; Takao Isogai; Charles Auffray; Nobuo Nomura; Takashi Gojobori; Sumio Sugano
Journal: PLoS Biol Date: 2004-04-20 Impact factor: 8.029
