Ann M McCartney1,2, Edel M Hyland1,3, Paul Cormican4, Raymond J Moran1,2, Andrew E Webb1, Kate D Lee1,5,6, Jessica Hernandez-Rodriguez7, Javier Prado-Martinez7,8, Christopher J Creevey3,9, Julie L Aspden10, James O McInerney11,12, Tomas Marques-Bonet7,13,14,15, Mary J O'Connell1,2,12.
Abstract
Gene fusion occurs when two or more individual genes with independent open reading frames become juxtaposed under the same open reading frame, creating a new fused gene. A small number of gene fusions described in detail have been associated with novel functions, for example, the hominid-specific PIPSL gene, TNFSF12, and the TWE-PRIL gene family. We use Sequence Similarity Networks and species-level comparisons of great ape genomes to identify 45 new genes that have emerged by transcriptional readthrough, that is, transcription-derived gene fusion. For 35 of these putative gene fusions, we were able to assess available RNAseq data to determine whether there are reads that map to each breakpoint. A total of 29 of the putative gene fusions had annotated transcripts (9/29 of which are human-specific). We carried out RT-qPCR in a range of human tissues (placenta, lung, liver, brain, and testes) and found that 23 of the putative gene fusion events were expressed in at least one tissue. Examining the available ribosome foot-printing data, we find evidence for translation of three of the fused genes in human. Finally, we find enrichment for transcription-derived gene fusions in regions of known segmental duplication in human. Together, our results implicate chromosomal structural variation brought about by segmental duplication in the emergence of novel transcripts and translated protein products.
Keywords: Great Ape Comparative genomics; mechanisms of protein-coding evolution; novel genes; segmental duplication; sequence similarity networks; transcriptional readthrough
Year: 2019 PMID: 31400206 PMCID: PMC6764479 DOI: 10.1093/gbe/evz163
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Phylogenetic distribution of transcription-derived gene fusions (TDGFs). (a) The species sampled are represented in the phylogeny on the left with their estimated divergence times in Mya. Numbers on branches represent the number of gene fusions at those nodes. (b) Deep and pale pink cells in the matrix on the right correspond to the presence (deep pink) or absence (pale pink) of the gene fusion in that species. The “Seg Dup” row in the matrix shows the fused genes present at known segmental duplication breakpoints in human (dark gray); gene fusions for which information is missing are shown in pale gray, and gene fusions that are not found in human are shown in white.
Fig. 2. Expression profiles for transcription-derived gene fusions (TDGFs) and their parent genes. (a) Comparison of the expression profiles between the orthologs of the human-specific fusion genes and their respective orthologous parent gene counterparts in each vertebrate shown. RNAseq data (Brawand et al. 2011) of each organism from the cerebellum, brain, heart, kidney, liver, and testis* (*not available for Pan troglodytes and Macaca mulatta data sets) were analyzed for the presence of >1 read that maps to the breakpoint for each gene fusion. Sample sizes were as follows: Homo sapiens (20); P. troglodytes (34); Gorilla gorilla (34); Pongo pygmaeus (34); M. mulatta (34); and Mus musculus (34). ND, no expression detected; SB, same expression as both parent genes; SO, same expression profile as one parent gene; RP, reduced breadth of expression compared with parent genes; IP, increased breadth of expression compared with parent genes. (b) RT-qPCR to determine the expression of each fused gene across a panel of five human tissues. Darker cells represent amplified product and presence of the gene fusion in that human tissue; pale cells represent no evidence for the gene fusion transcript in that tissue.
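The category codes in the Fig. 2 legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input format (per-tissue counts of breakpoint-spanning reads), and the rule that "breadth" means the number of tissues compared against the union of both parents' tissues are all assumptions made for illustration. Only the ">1 read" threshold and the five codes come from the legend.

```python
# Illustrative sketch only; names and decision rules are assumptions,
# not the method used in the study.

def tissues_with_support(read_counts, min_reads=2):
    """Return tissues where more than one read maps to the breakpoint.

    read_counts: dict mapping tissue name -> number of breakpoint reads.
    ">1 read" in the legend is taken to mean at least 2 reads.
    """
    return {tissue for tissue, n in read_counts.items() if n >= min_reads}


def classify_profile(fusion, parent_a, parent_b):
    """Assign a legend code (ND, SB, SO, RP, IP) to a fusion gene.

    Each argument is a collection of tissues in which expression was detected.
    Assumption: breadth is the tissue count, compared with the union of
    the tissues of both parent genes.
    """
    fusion, parent_a, parent_b = set(fusion), set(parent_a), set(parent_b)
    if not fusion:
        return "ND"  # no expression detected
    if fusion == parent_a and fusion == parent_b:
        return "SB"  # same expression as both parent genes
    if fusion == parent_a or fusion == parent_b:
        return "SO"  # same expression profile as one parent gene
    parent_breadth = len(parent_a | parent_b)
    if len(fusion) < parent_breadth:
        return "RP"  # reduced breadth compared with parent genes
    if len(fusion) > parent_breadth:
        return "IP"  # increased breadth compared with parent genes
    return "unclassified"  # equal breadth but different tissues


# Hypothetical read counts, not data from the paper.
fusion_tissues = tissues_with_support({"brain": 3, "liver": 1, "heart": 0})
print(classify_profile(fusion_tissues, {"brain", "liver"}, {"liver", "heart"}))
```

The "unclassified" fallback covers a case the legend does not name (same number of tissues as the parents combined, but a different set), which is one reason the breadth rule here should be read as a guess.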
Results of RT-qPCR on 26 TDGFs in 5 human tissues
| Tissue | Number of Fusions Expressed |
|---|---|
| Brain | 13 |
| Testis | 19 |
| Liver | 19 |
| Placenta | 17 |
| Lung | 16 |
Out of the 26 testable TDGFs, we display the number that are detected as expressed following RT-qPCR in each of the five human tissues assessed.
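As a small worked example, the counts in the table above can be converted to the percentage of the 26 testable TDGFs detected in each tissue. The counts and the total are taken from the table; the variable and function names are illustrative.

```python
# Counts of TDGFs detected by RT-qPCR per tissue, copied from the table above.
TESTED = 26
expressed = {"Brain": 13, "Testis": 19, "Liver": 19, "Placenta": 17, "Lung": 16}


def percent_expressed(counts, total):
    """Return the percentage of tested fusions detected in each tissue."""
    return {tissue: round(100 * n / total, 1) for tissue, n in counts.items()}


percentages = percent_expressed(expressed, TESTED)
for tissue, pct in percentages.items():
    print(f"{tissue}: {expressed[tissue]}/{TESTED} = {pct}%")
```

On these counts, brain shows the lowest detection rate (50.0%), and testis and liver the highest (73.1% each).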
Splice factor and transcription factor binding sites predicted for 3 of the TDGFs
| Transcript_ID | RT-qPCR | Predicted Parents | SFBS | TFBS |
|---|---|---|---|---|
| ENSG00000446072 | Ubiquitous | N/A | NOVA1 | N/A |
| ENSG00000567078 | Ubiquitous | ARL6IP1 and RPS15A | NOVA1 | HMGI/Y |
| ENSG00000529564 | No expression | PRSS53-201 and VKORC1-206 | SF2ASF, SRp20, mbnl, NOVA1 | Sp1, Zfx, YGR067C |
Only those transcription-derived gene fusions for which we had evidence of translation from ribosome profiling data sets were used in this analysis.
Fig. 3. Splice factor binding site (SFBS) profiles for fusion transcript ENST00000529564 and the corresponding parent genes. (a) Transcription-derived gene fusion transcript ENST00000529564 is displayed along with parent genes PRSS53 and VKORC1. SFBSs are shown for the splice factors “SF2ASF” (in pink), “MBNL1-3” (in gray), “SRp20” (in red), and “NOVA1” (in blue). Each square represents a single SFBS. (b) Expression level of each splice factor binding site across ENST00000529564 in a panel of tissues on the x axis (left to right): adipose tissue, adrenal gland, brain, heart, kidney, liver, lung, ovary, pancreas, sigmoid colon, small intestine, spleen, and testis. Expression data are given in RPKMs. Expression data were obtained from the expression atlas ENCODE data set (Kapushesky et al. 2010). (c) Expression profile of splice factor binding sites of each of the parent genes PRSS53 (gray bars) and VKORC1 (black bars). The tissue panel on the x axis, the units (RPKMs), and the data source are the same as in (b).