Neel Prabh¹, Christian Rödelsperger².
Abstract
BACKGROUND: Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so-called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes are real protein-coding genes, represent genomic artifacts, or are non-coding RNAs.
Keywords: Gene expression; Negative selection; Nematodes; Orphan genes; Ortholog; Paralog; dN/dS
Year: 2016 PMID: 27245157 PMCID: PMC4888513 DOI: 10.1186/s12859-016-1102-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Transcription and differential expression of orphan genes. a RNA sequencing data from 14 experiments are used to determine the number of expressed orphan and conserved genes as a function of the number of RNA-seq samples (Y-axis) and of different expression thresholds (FPKM ≥ 1 and FPKM ≥ 10). The boxplot shows the variation in the number of expressed genes across ten random permutations of the order of RNA-seq samples. b Saturation analysis of the differential expression patterns of both orphan and conserved genes from six transcriptome profiling studies [25, 28]
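The saturation analysis in panel a can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a gene counts as "expressed" once its FPKM reaches the threshold in any of the samples added so far, and the function name `saturation_curves` and the toy FPKM values are hypothetical.

```python
import random

def saturation_curves(fpkm, threshold, n_permutations=10, seed=0):
    """Count genes expressed in at least one of the first k samples, for k = 1..n.

    fpkm maps a gene id to a list of FPKM values, one per RNA-seq sample
    (all lists must have the same length). Returns one curve per random
    permutation of the sample order; curve[k - 1] is the count after k samples.
    """
    rng = random.Random(seed)  # fixed seed so the permutations are reproducible
    n_samples = len(next(iter(fpkm.values())))
    curves = []
    for _ in range(n_permutations):
        order = list(range(n_samples))
        rng.shuffle(order)
        expressed = set()
        curve = []
        for sample in order:
            # Assumption: a gene counts as expressed once it reaches the
            # threshold in any sample added so far.
            expressed.update(
                gene for gene, values in fpkm.items() if values[sample] >= threshold
            )
            curve.append(len(expressed))
        curves.append(curve)
    return curves


# Toy FPKM table with three samples (hypothetical values).
fpkm = {
    "orphan1": [0.5, 2.0, 0.0],
    "orphan2": [0.0, 0.0, 0.0],
    "conserved1": [15.0, 12.0, 30.0],
}
for threshold in (1, 10):
    curves = saturation_curves(fpkm, threshold)
    # The counts at each k across permutations are what a boxplot would summarize.
    print(threshold, [sorted(c[k] for c in curves) for k in range(3)])
```

Because the set of expressed genes only grows as samples are added, each curve is non-decreasing, and a flattening curve suggests that additional RNA-seq samples would reveal few new expressed genes.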
Fig. 2 Orphan genes are under strong negative selection. a Distribution of gene clusters based on the number of orthologs between P. pacificus and P. exspectatus. b Cumulative proportion of orphan and conserved gene clusters with evidence for negative selection at different ω thresholds in the P. pacificus – P. exspectatus orthologous cluster dataset. Using an arbitrary cutoff of ω < 0.6, 78 % of all orphan clusters (N = 3571) show evidence of negative selection. c Comparison of the proportion of negatively selected orphan and conserved gene clusters (Y-axis) based on the analysis of paralogous gene clusters. Given that only 11 % of conserved clusters and 15 % of all orphan clusters contain P. pacificus paralogs, this analysis shows that even in the absence of other genomic data, negative selection can be investigated in a subset of orphan genes. d Cumulative proportion of orphan and conserved gene clusters below a given ω value in the Clade A1 – Clade A2 ortholog dataset
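The cumulative proportions in panels b and d amount to counting, for each ω threshold, the fraction of clusters whose ω lies below it. A minimal sketch, assuming per-cluster ω (dN/dS) estimates have already been computed; how they were estimated is not shown here, and `cumulative_proportion` and the toy values are hypothetical.

```python
from bisect import bisect_left

def cumulative_proportion(omegas, thresholds):
    """Fraction of gene clusters with omega (dN/dS) strictly below each threshold."""
    ranked = sorted(omegas)
    n = len(ranked)
    # On a sorted list, bisect_left(ranked, t) equals the number of values < t.
    return {t: bisect_left(ranked, t) / n for t in thresholds}


# Hypothetical per-cluster omega estimates.
orphan_omegas = [0.05, 0.2, 0.45, 0.7, 1.3]
print(cumulative_proportion(orphan_omegas, [0.6, 1.0]))  # {0.6: 0.6, 1.0: 0.8}
```

Sorting once and using binary search keeps the cost at O(n log n) even when the curve is evaluated at many thresholds.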
Fig. 3 Ortholog, paralog and population genomic data are highly complementary. a Venn diagram of the three data sets using a definition of ω < 0.6. Most genes with evidence of negative selection are unique to one specific data set, indicating that the three data sets are highly complementary. b Venn diagram of the three data sets using a definition of ω < 1 and P < 0.05. Again, most genes with evidence of negative selection are unique to one specific data set. c Overlap between different thresholds to define negative selection and expressed genes. d Identification of a candidate set of orphan genes that lack any expression data, are under significant negative selection, and have an ω value below 0.6 in at least two of the three data sets
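The Venn regions and the candidate set in panel d are plain set operations. The sketch below is one possible reading of the selection rule, assuming that "significant negative selection" is supplied as a precomputed gene set (for example ω < 1 and P < 0.05 in some data set); the data set keys, gene names and helper functions are hypothetical.

```python
from collections import Counter

def low_omega_genes(omegas, cutoff=0.6):
    """Genes whose omega is below the cutoff in one data set."""
    return {gene for gene, w in omegas.items() if w < cutoff}

def unique_to(name, sets):
    """Genes found only in the named data set (one outer region of a Venn diagram)."""
    others = set().union(*(s for key, s in sets.items() if key != name))
    return sets[name] - others

def candidate_orphans(omega_by_dataset, significant, expressed, cutoff=0.6, min_datasets=2):
    """Genes with omega < cutoff in >= min_datasets data sets, significant selection, no expression."""
    support = Counter()
    for omegas in omega_by_dataset.values():
        support.update(low_omega_genes(omegas, cutoff))
    low_omega = {gene for gene, n in support.items() if n >= min_datasets}
    return (low_omega & significant) - expressed


omega_by_dataset = {
    "ortholog": {"g1": 0.1, "g2": 0.5, "g3": 0.9},
    "paralog": {"g1": 0.3, "g3": 0.2},
    "population": {"g2": 0.4, "g4": 0.1},
}
low_sets = {name: low_omega_genes(o) for name, o in omega_by_dataset.items()}
print(unique_to("paralog", low_sets))  # {'g3'}
print(candidate_orphans(omega_by_dataset, {"g1", "g2", "g4"}, {"g2"}))  # {'g1'}
```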
Fig. 4 Validation of orphan genes. a PCR validation experiments for eleven candidate orphan genes. Genomic DNA (odd numbers) and cDNA (even numbers) were amplified using the same primer pairs. In three cases, we obtained bands in the expected size range. b Sequencing of the amplification products yielded ESTs that confirmed the predicted gene structures exactly
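The "expected size range" for each product follows from where the primers land on the gene model: when the primers flank an intron, the genomic product is longer than the cDNA product. A minimal in-silico PCR sketch, assuming exact primer matches on the + strand; the helper names and the toy exon/intron template are hypothetical, and only the two primer sequences are taken from the table.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon_length(template, forward, reverse):
    """Expected product size for exact primer matches on the + strand, or None."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the + strand as its reverse complement.
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site == -1:
        return None
    return site + len(reverse) - start


F = "TCTGTCCAGAGGAACGAATGGGATC"  # C1.32F1
R = "TGCACACTAACAAGTCTTCCCTCAG"  # C1.32R1
# Hypothetical two-exon gene with a 100 nt intron (not the real contig sequence).
exon1 = F + "A" * 30
intron = "GT" + "T" * 96 + "AG"
exon2 = "G" * 20 + reverse_complement(R)
genomic = exon1 + intron + exon2
cdna = exon1 + exon2
print(amplicon_length(genomic, F, R), amplicon_length(cdna, F, R))  # 200 100
```

The size difference between the genomic and cDNA lanes equals the total length of the introns between the primer sites, which is why the same primer pair can distinguish spliced transcripts from genomic template.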
Fig. 5 Differences between various gene classes. a Comparison of transcript length for potential prediction artifacts/pseudogenes, non-coding RNA candidates, negatively selected orphan genes, and conserved genes. The y-axis denotes the fraction of genes at a given transcript length. b Comparison of the number of exons for the different gene classes. c GC content distribution for all four gene classes. d Distribution of contig size percentiles among all four classes. The largest 1 % of contigs harbors roughly 90 % of genes in all four gene classes
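Panels c and d rest on two simple per-gene statistics: GC content and the size rank of the contig a gene sits on. A minimal sketch, assuming contig lengths and a gene-to-contig mapping are available; the function names and toy assembly are hypothetical, and ties in contig length keep their input order.

```python
import math

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def fraction_on_largest_contigs(contig_lengths, gene_contigs, top_percent=1):
    """Fraction of genes located on the largest top_percent % of contigs."""
    # sorted() is stable, so contigs of equal length keep their input order.
    ranked = sorted(contig_lengths, key=contig_lengths.get, reverse=True)
    n_top = max(1, math.ceil(len(ranked) * top_percent / 100))
    top = set(ranked[:n_top])
    return sum(1 for contig in gene_contigs.values() if contig in top) / len(gene_contigs)


# Hypothetical assembly of 100 contigs; contig100 is the largest.
contig_lengths = {f"contig{i}": 1000 * i for i in range(1, 101)}
gene_contigs = {"g1": "contig100", "g2": "contig100", "g3": "contig100", "g4": "contig5"}
print(gc_content("ATGC"))  # 0.5
print(fraction_on_largest_contigs(contig_lengths, gene_contigs))  # 0.75
```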
Primer pairs for the candidate genes
| Gene identifier | Primer identifier | Primer sequence |
|---|---|---|
| Contig1-snapTAU.32 | C1.32F1 | TCTGTCCAGAGGAACGAATGGGATC |
| | C1.32R1 | TGCACACTAACAAGTCTTCCCTCAG |
| | C1.32F2 | CAGGAAAGATCGTCAAACAGGACCA |
| | C1.32R2 | TGATTTCTCTTCAGGAGACACTCAG |
| Contig115-snapTAU.38 | C115.38F1 | GTCAGAGTGGAAATCAGTGCAACTG |
| | C115.38R1 | TCACTTCCGTGTGTACGATTGACTT |
| | C115.38F2 | ATGCCGAGCACAGAACAAATGCTGC |
| | C115.38R2 | ACCGAGATTGCGGAAAACAGCGCAA |
| Contig159-snapTAU.23 | C159.23F1 | TTCATCGCTGACGATCACAGGCACA |
| | C159.23R1 | AGATCATCATGCAGCCCTCCTTTGC |
| | C159.23F2 | ATGCTCAAACTCCTCGTCTTCACCA |
| | C159.23R2 | ACGATTTGACTGCGGGCTCTGCCTT |
| Contig162-snapTAU.8 | C162.8F1 | ATCAATGGCAATAAATCCGCTTACG |
| | C162.8R1 | ATAAAGCCGTGAAGGTAATTCTCAT |
| | C162.8F2 | AATAAATCCGCTTACGAACCAATCG |
| | C162.8R2 | GGTAATTCTCATATTTGATGATTCC |
| Contig163-snapTAU.25 | C163.25F1 | GCAATCCCTCTACTGGCAGAATCTC |
| | C163.25R1 | ATTGCATGGAGAGTACGTATCCGAC |
| | C163.25F2 | AACTATGAAGGCGGTGATTCATTGG |
| | C163.25R2 | GTTCGTTGAAAATCCACACTTTTCG |
| Contig27-snapTAU.5 | C27.5F1 | ACAAGAAGGCATACATGATGTACCC |
| | C27.5R1 | AGTAGTCGAGGTGATGCTGTCAGGA |
| | C27.5F2 | AACTGCATCTCAGACGCATCGGACA |
| | C27.5R2 | TTTGACCTTGAACGCTTTCCTCCCG |
| Contig51-snapTAU.126 | C51.126F1 | ATGCTTGCGTGCATTGGGATCATCG |
| | C51.126R1 | TAGCTCATTGAGATCAATGTCTTCG |
| | C51.126F2 | TGACCTTCCTCGGCGGATGTTCCA |
| | C51.126R2 | AGTTCACTTAGGCTCTCAAATGAGG |
| Contig57-snapTAU.76 | C57.76F1 | AGGAGATGATCGATAAACACAAAGCC |
| | C57.76R1 | TCTTCTTCTGCAGCTGATTTGCCAC |
| | C57.76F2 | TCGACAAGTGCTTCAAAGCCGAGCT |
| | C57.76R2 | AAGATCCTCAAACTTCTCGCTGTG |
| Contig62-snapTAU.17 | C62.76F1 | TGCAAGTTGCACATCTCAACCACCT |
| | C62.76R1 | ACACTTGGTTTCTTGAATGAGCTAAC |
| | C62.76F2 | TGGGGATATCAAGTGCAAAGGCACTG |
| | C62.76R2 | TTGGCTGGTTGGCTCTCGAATACTG |
| Contig67-snapTAU.30 | C67.30F1 | ATTCGACGTCTACTCTCACGCAACA |
| | C67.30R1 | ATACGAAGTACAACATCACCTTGAG |
| | C67.30F2 | TTCCGGCACACTTCTCATCATTCTC |
| | C67.30R2 | AAATGAACGAGTACAACAGTAAACC |
| Contig68-snapTAU.138 | C68.138F1 | ACTGATTGCTGCTCATACAGATCGA |
| | C68.138R1 | ACTGAGGAGCATCGTAAGCTGACTC |
| | C68.138F2 | TCTTATTGGCTATACTGATTGCTGC |
| | C68.138R2 | ATCCACTTTCCTGTCGAATTGACGC |
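Primers like those in the table are commonly sanity-checked by length, GC content and an approximate melting temperature. The sketch below uses the basic formula Tm = 64.9 + 41 × (nG + nC − 16.4) / N as a rough estimate; the source does not say how these primers were designed or evaluated, so the formula choice and the `primer_stats` helper are assumptions for illustration only.

```python
def primer_stats(seq):
    """Length, GC percentage and a rough melting temperature for a DNA oligo."""
    seq = seq.upper()
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    # Basic formula Tm = 64.9 + 41 * (nG + nC - 16.4) / N (degrees C); a rough
    # estimate that ignores salt, primer concentration and nearest-neighbour effects.
    tm = 64.9 + 41 * (gc - 16.4) / n
    return {"length": n, "gc_percent": 100 * gc / n, "tm_basic": round(tm, 1)}


primers = {
    "C1.32F1": "TCTGTCCAGAGGAACGAATGGGATC",
    "C1.32R1": "TGCACACTAACAAGTCTTCCCTCAG",
}
for name, seq in primers.items():
    print(name, primer_stats(seq))
```

For C1.32F1 this gives 25 nt, 52 % GC and a basic Tm of about 59.3 °C; nearest-neighbour models would give more realistic values under specific buffer conditions.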