Steffen Klasberg, Tristan Bitard-Feildel, Ludovic Mallet.
Abstract
While it has long been thought that all genomic novelties are derived from existing material, many genes lacking homology to known genes have been found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, ie, out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence has accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or by searches against databases. Computational approaches are further prime methods that can be based on existing models or leverage biological evidence from experiments. Identification of novel genes, however, remains a challenging task. With constant updates to software and technologies, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with prospective approaches for further studies.
Keywords: de novo genes; evolutionary genomics; gene detection; novel domains; novel genes
Year: 2016 PMID: 27493475 PMCID: PMC4970615 DOI: 10.4137/BBI.S39950
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Definition of novel genes based on phylogeny. Circles in the tree show gain of orphan (gray), novel (red), and de novo (green) genes. (A) Ancestral genome: ancestral genes are widely spread on the phylogenetic tree. (B) Ancestral acquisition of a gene: the orphan character is defined by its uniqueness among all currently known sequences. (C) Acquisition of a gene: the novel character is defined by hitherto absence on a considered phylogeny and is attributed by experts of the local taxonomy. (D) A genomic fragment gains transcriptional activity, constituting a de novo gene.
List of tools and methods used for novel gene prediction.
| METHOD | COMP. EFFORT | INPUT | COMMENTS |
|---|---|---|---|
| Primary methods | | | |
| RNA-Seq mapping | | r G | RNA-Seq experiment; transcribed sequences mapped to a reference genome |
| RNA-Seq | | r | RNA-Seq experiment; transcribed sequences |
| Ribosome profiling | | r | RNA-Seq on ribosomes, likely translated |
| Proteogenomics | | R P | Mass spectrometry of peptides |
| Projector/Genewise | | D | Gene structure prediction based on homology |
| Simple ORF finding | | D | eg, EMBOSS getorf, OrfPredictor; based on codon usage |
| SORFIND/Genview | | D | Exon prediction partly based on hexamer/k-mer frequency |
| GeneMark | | D | Hidden Markov Model based coding region prediction |
| EuGene | | D R | Gene prediction pipeline using Hidden Markov Models |
| SPLICEVIEW | | D | Splice site prediction based on homology |
| NetGene2 | | D | Splice site prediction using neural network |
| KissSplice/FineSplice | | r D | Splice site prediction using primary RNA-seq data |
| RSVP | | r D | Splice variants predictor using RNA-seq data |
| GlimmerHMM/Genescan | | D | Gene structure prediction using |
| FragGeneScan | | r D G | Gene structure prediction using |
| TWINSCAN/N-SCAN | | D R | Gene structure prediction using |
| CONTRAST | | D | Gene prediction using machine learning and homology |
| CPC | | D R A | Coding potential prediction using |
| CPAT/PhyloCSF | | D R A | Coding potential prediction using statistical scores |
| ReEvolver/ | | D | Coding potential prediction based on evolutionary simulation |
| Maker/Augustus | | R D A | Gene prediction pipelines |
| Seg-HCA | | A | Domain prediction based on hydrophobic clusters |
| Classification methods | | | |
| Blastp/Exonerate | | R D A | Classification based on sequence homology searches |
| Domain trees | | D R A | Phylogeny and parsimony approaches |
| Comparative genomics | | D R A | Gene classification based on clustering by homology |
| Phylostratigraphy | | D R A | Gene classification based on phylogeny and homology |
Note: The computational effort (Comp. effort) and required input types are shown.
Abbreviations: R, RNA sequence; r, RNA-reads; D, DNA sequence; A, amino acid sequence; P, peptides; G, reference genome; Comp. effort, computational effort.
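
To make the "Simple ORF finding" entry of the table concrete, the following is a minimal Python sketch of a six-frame open reading frame scan on a DNA sequence (input type D). It is an illustration under stated assumptions, not the algorithm of EMBOSS getorf or OrfPredictor: the function names `find_orfs` and `reverse_complement`, the `min_codons` threshold, the standard stop codons, and the ATG-only start rule are all choices made here for the example.

```python
# Illustrative sketch only: names and thresholds are assumptions,
# not taken from any tool listed in the table.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Scan all six reading frames for ATG ... stop open reading frames.

    Returns a list of (strand, start, end, orf) tuples. Coordinates are
    0-based, half-open, and relative to the scanned strand (for "-" they
    refer to the reverse complement). min_codons counts the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG in the current ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```

A scan of this kind reports every ORF above the length threshold, so it tends to over-predict. This is one reason the table also lists methods that add codon usage, k-mer statistics, or experimental evidence.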
| Orphan genes |
| Orphan (or taxonomically restricted) genes are classified based on a given phylogeny. A gene that is only found inside a single species or a branch, but not outside, is orphan in that specific branch. |
| Novel genes |
| Novel genes are classified by their age. Genes that have emerged inside a defined time frame are novel genes. The time frame is not fixed and needs to be defined for each study. All novel genes are orphan in a specific clade, but, depending on the time frame, not all orphan genes are classified as novel. |
| De novo genes |
| De novo genes are defined based on their mechanism of emergence, ie, out of previously noncoding DNA. This might, eg, occur via acquisition of transcriptional regulation, consecutive point mutations, or genomic rearrangements. |
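
The relationship between the orphan and novel definitions above can be sketched in a few lines of Python. This is a simplified illustration, not a published method: the lineage, the species labels (`sp1` to `sp4`), the clade names, and the `novel_from` cutoff are hypothetical, and homolog detection is assumed to have been done beforehand. De novo status is deliberately not assigned, since it depends on the mechanism of emergence and cannot be read from presence or absence alone.

```python
# Hypothetical lineage of a focal species, ordered from the broadest
# clade to the narrowest; each clade is the set of species it contains.
LINEAGE = [
    ("clade_A", {"sp1", "sp2", "sp3", "sp4"}),
    ("clade_B", {"sp1", "sp2", "sp3"}),
    ("clade_C", {"sp1", "sp2"}),
    ("focal_species", {"sp1"}),
]


def narrowest_clade_index(species_with_homologs, lineage=LINEAGE):
    """Index of the narrowest clade containing every species with a homolog."""
    if not species_with_homologs:
        raise ValueError("at least one species must carry the gene")
    best = None
    for idx, (_, members) in enumerate(lineage):
        if species_with_homologs <= members:  # subset test
            best = idx
    if best is None:
        raise ValueError("homologs found outside the considered phylogeny")
    return best


def classify(species_with_homologs, novel_from, lineage=LINEAGE):
    """Report the clade a gene is restricted to and whether it is novel.

    novel_from is the index in `lineage` that marks the start of the
    study-specific time frame; genes restricted to that clade or to a
    narrower one are called novel.
    """
    idx = narrowest_clade_index(species_with_homologs, lineage)
    return {"restricted_to": lineage[idx][0], "novel": idx >= novel_from}


if __name__ == "__main__":
    print(classify({"sp1"}, novel_from=2))
    print(classify({"sp1", "sp3"}, novel_from=2))
```

In this sketch, a gene found only in `sp1` and `sp3` is orphan with respect to `clade_B` but is not novel when the time frame starts at `clade_C`, which mirrors the statement that not all orphan genes are classified as novel.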