
HGNChelper: identification and correction of invalid gene symbols for human and mouse.

Sehyun Oh, Jasmine Abdelnabi, Ragheed Al-Dulaimi, Ayush Aggarwal, Marcel Ramos, Sean Davis, Markus Riester, Levi Waldron.

Abstract

Gene symbols are recognizable identifiers for genes, but they are unstable and error-prone because of aliasing, manual entry, and unintentional conversion to date format by spreadsheets. Official gene symbol resources, such as the HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes, provide authoritative sources of valid, aliased, and outdated symbols, but they lack a programmatic interface and do not correct symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, as well as common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-2003. HGNChelper is installable from CRAN.

Copyright: © 2022 Oh S et al.
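
The kind of check the abstract describes can be illustrated with a minimal Python sketch. It is not HGNChelper's implementation or API (the package itself is written in R). The lookup tables `APPROVED`, `ALIASES`, and `MONTH_PREFIX` and the function names are hypothetical and hold only a handful of entries chosen for illustration, whereas the real HGNC and MGI tables are far larger. The sketch assumes a spreadsheet has turned a symbol such as SEPT2 into the date-like string "2-Sep".

```python
import re

# Hypothetical toy tables for illustration only; not the HGNC or MGI data.
APPROVED = {"TP53", "BRCA1", "MARCHF1", "SEPTIN2"}
ALIASES = {"P53": "TP53", "MARCH1": "MARCHF1", "SEPT2": "SEPTIN2"}

# Month abbreviations a spreadsheet may produce, mapped back to symbol prefixes.
MONTH_PREFIX = {"MAR": "MARCH", "SEP": "SEPT", "DEC": "DEC"}
DATE_PATTERN = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")


def undo_date_conversion(symbol):
    """Turn a date-like string such as '2-Sep' back into 'SEPT2' if possible."""
    match = DATE_PATTERN.match(symbol)
    if not match:
        return symbol
    day, month = match.groups()
    prefix = MONTH_PREFIX.get(month.upper())
    return f"{prefix}{int(day)}" if prefix else symbol


def check_symbol(symbol):
    """Return (approved_as_given, suggested_symbol_or_None) for one input."""
    candidate = undo_date_conversion(symbol.strip()).upper()
    if candidate in APPROVED:
        # Approved as given only if no repair (case or date fix) was needed.
        return candidate == symbol, candidate
    # Otherwise fall back to the alias table; None means no suggestion.
    return False, ALIASES.get(candidate)


def check_symbols(symbols):
    """Check a list of symbols and return one (input, approved, suggestion) row each."""
    return [(s, *check_symbol(s)) for s in symbols]
```

Calling `check_symbols(["TP53", "2-Sep", "p53", "NOTAGENE"])` yields one row per input, with a suggested replacement where the toy tables contain one and `None` where they do not. Keeping the date repair separate from the alias lookup means a date-mangled symbol is first restored to its original spelling and only then mapped to a current symbol.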

Keywords:  HGNC; MGI; gene symbols; molecular biology

Year: 2020; PMID: 33564398; PMCID: PMC7856679.2; DOI: 10.12688/f1000research.28033.2

Source DB: PubMed; Journal: F1000Res; ISSN: 2046-1402


