Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches.

Geng Chen, Charles Wang, Leming Shi, Weida Tong, Xiongfei Qu, Jiwei Chen, Jianmin Yang, Caiping Shi, Long Chen, Peiying Zhou, Bingxin Lu, Tieliu Shi.

Abstract

The human reference genome is still incomplete, and a number of gene sequences are missing from it. The approaches for uncovering them, the reasons for their absence, and their functions remain largely unexplored. Here, we comprehensively identified and characterized the missing genes of the human reference genome using RNA-Seq data from 16 different human tissues. Using genome-guided transcriptome reconstruction coupled with genome-wide comparison, we uncovered 3.78 Mb and 2.37 Mb of transcribed regions in the Celera and HuRef human genome assemblies, respectively, that are either missing from their homologous chromosomes in NCBI human reference genome build 37.2 or partially or entirely absent from the reference. We further identified a significant number of novel transcript contigs in each tissue from de novo transcriptome assembly that cannot be aligned to NCBI build 37.2 but can be aligned to at least one of the Celera, HuRef, chimpanzee, macaque, or mouse genomes. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation, and other structural variations. Moreover, our results suggest that a large portion of these missing genes are conserved between human and other mammals, implying that they have important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons why genes are missing from the genome and highlights the importance of uncovering the missing genes of incomplete genomes.
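The contig-selection criterion described in the abstract (not alignable to NCBI build 37.2, but alignable to at least one other genome) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the genome labels, and the input format (a mapping from contig ID to the set of genomes it aligned to) are all assumptions made for illustration, and the alignment step itself is taken as already done.

```python
from typing import Dict, Set

# Hypothetical labels; the abstract names the genomes but not any identifiers.
REFERENCE = "NCBI_build_37.2"
OTHER_GENOMES = {"Celera", "HuRef", "chimpanzee", "macaque", "mouse"}


def candidate_missing_contigs(alignable: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Select contigs that fail to align to the reference but align elsewhere.

    `alignable` maps a contig ID to the set of genome labels it aligned to
    (assumed to be computed beforehand by an aligner of your choice).
    Returns a mapping from each selected contig to its supporting genomes.
    """
    selected = {}
    for contig_id, genomes in alignable.items():
        if REFERENCE in genomes:
            continue  # aligns to the reference, so it is not a missing-gene candidate
        supporting = genomes & OTHER_GENOMES
        if supporting:  # at least one non-reference genome supports the contig
            selected[contig_id] = supporting
    return selected


# Toy input: made-up contig IDs, not data from the study.
hits = {
    "contig_1": {"NCBI_build_37.2", "Celera"},
    "contig_2": {"HuRef", "chimpanzee"},
    "contig_3": set(),
    "contig_4": {"mouse"},
}
selected = candidate_missing_contigs(hits)
print(sorted(selected))  # ['contig_2', 'contig_4']
```

Contigs that align nowhere (such as `contig_3` in the toy input) are excluded, since the abstract requires support from at least one other genome.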

Year: 2013    PMID: 23572138    DOI: 10.1007/s00439-013-1300-9

Source DB: PubMed    Journal: Hum Genet    ISSN: 0340-6717    Impact factor: 4.132

