A McAfee, L J Foster.
Abstract
Massively parallel sequencing is revealing species genomes faster than ever before, but the value of the raw sequence is limited unless the genes can be accurately annotated. This is typically achieved using gene prediction algorithms which, despite continual improvement, still require substantial verification and refinement. For example, in silico methods struggle to annotate splice isoforms accurately, and empirical methods are needed to refine and verify the initial bioinformatic gene predictions. RNA-seq is an excellent way to confirm exon-exon boundaries and transcript termini, while mass spectrometry (MS) offers definitive proof that a gene is translated and a secondary means of confirming exon expression, protein termini, and posttranslational modifications. Furthermore, both methods can potentially identify entirely novel genes that were missed by conventional gene predictors. This chapter describes a proteogenomics procedure using information from the proteome, transcriptome, and genome (thus utilizing each component of the central dogma) to annotate genetic elements in eukaryotes. We also discuss gene modeling, integration of RNA-seq and MS data, minimizing false discoveries, proteogenomics software, functional annotation, and sequence validation. We hope that the procedure described here will assist efforts to annotate the genomes of newly sequenced species, as well as sharpen those that have been annotated in the past.
Keywords: Annotation; Gene models; Mass spectrometry; Proteogenomics; Proteomics; RNA-seq
Year: 2016 PMID: 28109431 DOI: 10.1016/bs.mie.2016.09.020
Source DB: PubMed Journal: Methods Enzymol ISSN: 0076-6879 Impact factor: 1.600