Saakshi Jalali1,2, Shrey Gandhi1, Vinod Scaria3,4. 1. GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi, 110 025, India. 2. Academy of Scientific and Innovative Research (AcSIR), CSIR-IGIB South Campus, Mathura Road, Delhi, 110025, India. 3. GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi, 110 025, India. vinods@igib.res.in. 4. Academy of Scientific and Innovative Research (AcSIR), CSIR-IGIB South Campus, Mathura Road, Delhi, 110025, India. vinods@igib.res.in.
Abstract
BACKGROUND: Our understanding of the transcriptional potential of the genome and its functional consequences has changed significantly in the last decade. This change has been driven largely by improvements in technology, which have made it possible to annotate, and in many cases functionally characterize, a number of novel gene loci in the human genome. Keeping pace with advances in this dynamic environment while systematically annotating a compendium of genes and transcripts is a formidable task. Of the many databases that have attempted to systematically annotate the genome, GENCODE has emerged as one of the largest and most popular compendia of human genome annotations. RESULTS: Analysis of the various versions of GENCODE revealed that both protein-coding and long noncoding RNA (lncRNA) transcripts were constantly updated, leading to conflicting annotations. GENCODE version 24 annotates 4.18 % of the human genome as transcribed, an increase of 1.58 % from its first version. Of the 251,614 transcripts annotated across GENCODE versions, only 21.7 % remained consistent. We also found that the GENCODE consortium categorized transcripts into 70 biotypes, of which only 17 remained stable throughout. CONCLUSIONS: In this report, we review the impact of this dynamicity on gene annotations, specifically lncRNA annotations, in GENCODE over the years. Our analysis suggests significant dynamism in gene annotations, reflecting the evolution of, and consensus in, gene nomenclature. While progressive changes in annotations and the timely release of updates make the resource reliable for the community, the changes introduced with each release pose unique challenges to its users. Taking cues from other experiments in bio-curation, we propose potential avenues and methods to bridge the gap.
Keywords:
Annotations; GENCODE; Long noncoding RNAs; Transcripts
Authors: Jennifer Harrow; France Denoeud; Adam Frankish; Alexandre Reymond; Chao-Kung Chen; Jacqueline Chrast; Julien Lagarde; James G R Gilbert; Roy Storey; David Swarbreck; Colette Rossier; Catherine Ucla; Tim Hubbard; Stylianos E Antonarakis; Roderic Guigo Journal: Genome Biol Date: 2006-08-07 Impact factor: 13.583
Authors: Kristian A Gray; Bethan Yates; Ruth L Seal; Mathew W Wright; Elspeth A Bruford Journal: Nucleic Acids Res Date: 2014-10-31 Impact factor: 19.160
Authors: Tao Xu; Chang-Ming Lin; Shu-Qi Cheng; Jie Min; Li Li; Xiao-Ming Meng; Cheng Huang; Lei Zhang; Zi-Yu Deng; Jun Li Journal: Mol Cancer Date: 2018-07-23 Impact factor: 27.401