Adam Frankish, Mark Diekhans, Anne-Maud Ferreira, Rory Johnson, Irwin Jungreis, Jane Loveland, Jonathan M Mudge, Cristina Sisu, James Wright, Joel Armstrong, If Barnes, Andrew Berry, Alexandra Bignell, Silvia Carbonell Sala, Jacqueline Chrast, Fiona Cunningham, Tomás Di Domenico, Sarah Donaldson, Ian T Fiddes, Carlos García Girón, Jose Manuel Gonzalez, Tiago Grego, Matthew Hardy, Thibaut Hourlier, Toby Hunt, Osagie G Izuogu, Julien Lagarde, Fergal J Martin, Laura Martínez, Shamika Mohanan, Paul Muir, Fabio C P Navarro, Anne Parker, Baikang Pei, Fernando Pozo, Magali Ruffier, Bianca M Schmitt, Eloise Stapleton, Marie-Marthe Suner, Irina Sycheva, Barbara Uszczynska-Ratajczak, Jinuri Xu, Andrew Yates, Daniel Zerbino, Yan Zhang, Bronwen Aken, Jyoti S Choudhary, Mark Gerstein, Roderic Guigó, Tim J P Hubbard, Manolis Kellis, Benedict Paten, Alexandre Reymond, Michael L Tress, Paul Flicek.
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high-quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference-quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use all publicly available data and analysis, along with the research literature, to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, the Ensembl Perl and REST APIs, and https://www.gencodegenes.org.
Year: 2019 PMID: 30357393 PMCID: PMC6323946 DOI: 10.1093/nar/gky955
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. New and updated manually annotated genes and transcripts from July 2016 to June 2018. For both human (left) and mouse (right), the panels show the numbers of completely new genes and transcripts, the numbers of updated genes and transcripts, and the total number of manually added or edited genes and transcripts for each of four broad categories of annotation. A new gene annotation can represent either a completely de novo locus with no overlap with pre-existing annotation, or the reclassification of an existing complex locus into multiple loci to better represent the biology of the locus inferred from transcriptomic and/or proteomic data. A new transcript represents the annotation of a unique exon-intron structure, including novel alternative splicing at an annotated locus. Updated genes and transcripts represent pre-existing loci or transcript models that have been edited to improve the representation of biotype (e.g. changed from lncRNA to protein-coding) or structure (e.g. by extension or addition of novel exons).
Figure 2. Annotation statistics for human and mouse GENCODE releases from July 2016 to June 2018, encompassing human releases GENCODE 25–28 and mouse releases M10 to M18. The panels on the left show the total number of genes by broad biotype (protein-coding, lncRNA, pseudogene and sncRNA) for each release, for human and mouse respectively; the panels on the right show the total numbers of genes and transcripts of all biotypes.
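Per-release gene counts by broad biotype, of the kind summarised in Figure 2, could be tallied from a GENCODE GTF file along the following lines. This is a minimal sketch and not the consortium's own method: it assumes the standard `gene_type` attribute on `gene` feature lines, and the grouping of `gene_type` values into the four broad biotypes (`LNCRNA_TYPES`, `SNCRNA_TYPES`, and the "ends with pseudogene" rule) is a hypothetical mapping chosen for illustration. The function names are likewise illustrative.

```python
import re
from collections import Counter

# ASSUMPTION: illustrative grouping of gene_type values into broad biotypes;
# the grouping actually used for the figure is not given in the text.
LNCRNA_TYPES = {"lncRNA", "lincRNA", "antisense", "sense_intronic",
                "sense_overlapping", "processed_transcript"}
SNCRNA_TYPES = {"miRNA", "snRNA", "snoRNA", "scaRNA", "rRNA",
                "misc_RNA", "Mt_tRNA", "Mt_rRNA"}

GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')


def broad_biotype(gene_type):
    """Map a GTF gene_type value to one broad biotype label."""
    if gene_type == "protein_coding":
        return "protein-coding"
    if gene_type.endswith("pseudogene"):
        return "pseudogene"
    if gene_type in LNCRNA_TYPES:
        return "lncRNA"
    if gene_type in SNCRNA_TYPES:
        return "sncRNA"
    return "other"


def count_genes_by_biotype(gtf_lines):
    """Count 'gene' features per broad biotype in an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):          # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue                      # only gene-level records are counted
        match = GENE_TYPE_RE.search(fields[8])
        if match:
            counts[broad_biotype(match.group(1))] += 1
    return counts


def toy_record(feature, gene_type):
    """Build a made-up GTF line for demonstration; not real annotation."""
    attrs = f'gene_id "toy"; gene_type "{gene_type}";'
    return "\t".join(["chr1", "HAVANA", feature, "1", "100", ".", "+", ".", attrs])
```

For a real release, the same function can be given an open file handle, for example `count_genes_by_biotype(gzip.open(path, "rt"))`, where `path` points to a GTF downloaded from one of the access routes listed in the abstract.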