| Literature DB >> 26681921 |
Yuping Jia1, Lichan Chen2, Yukui Ma1, Jian Zhang3, Ningzhi Xu4, Dezhong Joshua Liao2.
Abstract
Recent genomic and ribonomic research reveals that our genome produces a stupendous amount of non-coding RNAs (ncRNAs), including antisense RNAs, and that many genes contain other gene(s) in their introns. Since ncRNAs either regulate the transcription, translation or stability of mRNAs or directly exert cellular functions, they should be regarded as the fourth category of RNAs, after ribosomal, messenger and transfer RNAs. These and other research advances challenge the current concept of gene and raise a question as to how we should redefine gene. We can either consider each tiny part of the classically-defined gene, such as each mRNA variant, as a "gene", or, alternatively and oppositely, regard a whole genomic locus as a "gene" that may contain intron-embedded genes and produce different types of RNAs and proteins. Each of the two ways to redefine gene not only has its strengths and weaknesses but also has its particular concern on the methodology for the determination of the gene's function: Ectopic expression of complementary DNA (cDNA) in cells has in the past decades provided us with great deal of detail about the functions of individual mRNA variants, and will make the data less conflicting with each other if just a small part of a classically-defined gene is considered as a "gene". On the other hand, genomic DNA (gDNA) will better help us in understanding the collective function of a genomic locus. In our opinion, we need to be more cautious in the use of cDNA and in the explanation of data resulting from cDNA, and, instead, should make delivery of gDNA into cells routine in determination of genes' functions, although this demands some technology renovation.Entities:
Keywords: Complementary DNA; Gene definition; Gene function; Genomic locus; non-coding RNA
Mesh:
Substances:
Year: 2015 PMID: 26681921 PMCID: PMC4671999 DOI: 10.7150/ijbs.13436
Source DB: PubMed Journal: Int J Biol Sci ISSN: 1449-2288 Impact factor: 6.580
Fig 1Examples copied from the NCBI database in which a gene (indicated by a red arrow) contains other gene(s) (indicated by grey arrow). In the cases that two arrows point to the opposite directions, the genes are encoded by the opposite strands of the DNA double helix. A: The CTNNA1 (β-catenin) gene has the LRRTM2 gene and an unannotated gene (LOC105379193) embedded in its introns. B: The RB1 gene has the LPAR6 gene and several pseudogenes embedded in its introns. C: The REEP5 gene has the SRP19 and ERSP1 genes and the XBP1P1 pseudogene embedded in its introns. D: The genomic region for the oncogenic PVT1 ncRNA harbors the TMEM75 gene and several microRNAs as well.
Fig 2Illustration of the relationship between ncRNAs and mRNAs with an academic biomedical department as an analogy. Our genome assigns only 1.5% of its sequence to mRNAs that encode proteins, which conduct cellular functions and thus resemble the cellular counterpart of the blue-collar working class. The remaining 98.5% of the genome is non-coding but is also transcribed to RNAs as regulators of cellular functions, mainly via control of mRNAs, thus resembling the white-collar class. In a biomedical department, most scientific achievements, with their various rewards, are credited to professors, with the tiny leftover credited with acknowledgements (such as diplomas) to those graduate students, postdocs or technicians who are the academic counterpart of the blue-collar working class employed and told by the professors to produce the actual data in the labs or animal rooms. Therefore, those who provide the direction are considered more important than those who provide the labor. Today the main focus of the biomedical fraternity is still on proteins as before, but it is probably time to shift more attention to the governing 98.5%.
Fig 3Illustration of multiple ORFs in a given mRNA. Top panel: In the wt human CDK4 mRNA (copied from the NCBI database as a DNA sequence), as an example, all ATGs and CTGs as the most possible start codons are highlighted in red color while the three canonical stop codons (TAA, TAG and TGA) are shaded with yellow color. The ATG and TGA of the annotated CDK4 ORF are italicized and boldfaced with green color, while all in-frame downstream ATGs that may initiate N-terminally truncated CDK4 protein isoforms are highlighted in green color. Some (but not all, to avoid overwhelming the picture) ORFs that are initiated from out-of-frame ATGs and thus encode non-CDK4 peptides or proteins are underlined, with red underlining indicating the ORFs outside the CDK4 coding region, green underlining indicating an ORF overlapping with the CDK4 C-terminus, and black underlining indicating the ORFs within the CDK4 coding region. Some of these non-CDK4 AltORFs also contain some shorter out-of-frame AltORFs, which are displayed in yellow letters. Bottom panel: Although the current translation algorithm assigns only one ORF (long red bar, referred herein to as “annotated” ORF) to one mRNA (long black arrow), the mRNA also has two uORFs (short green bar) at the 5'UTR and an out-of-frame AltORF at the 3'UTR (long green bar). Moreover, there are many other short AltORFs (blue bar) that are not in frame with one another or with the annotated one. Some of these AltORFs may overlap with the nearby ones and contain some even shorter AltORFs (short yellow bar).
Fig 4Illustration of how a gene functions by producing different RNAs. Left panel: Flowchart of the routine in studying a gene's function, with emphasis on the ectopic expression approach. Sequencing RT products leads to identification of a gene's mRNA in a cDNA form. Aligning its sequence with gDNA will localize it to a chromosome, which allows us to knock-in or knockout the gene. Continuing to sequence more cDNAs will identify other mRNA variants, which allows us not only to knock down the expression of one, some or all of the variants using such as siRNA but also to ectopically express the mRNAs using cDNAs. For ectopic expression, each cDNA will be cloned into a vector and introduced to cells in culture or in an animal, and the resulting data are used to evaluate the function of this cDNA. Right panel: A gene, which may be expressed in two different cell types (A and B), has two alternative initiation sites and two alternative termination sites for transcription, permitting it to produce four different transcripts. One, some or all four transcripts may have a long 5'-UTR that may harbor multiple uORFs and/or an even-longer 3'-UTR that may contain AltORFs. In one cell type, e.g. normal cells, splicing of one transcript retains all five exons, thus annotated as the wt mRNA, or alternative splicing produces three mRNA variants. In another cell type, e.g. in cancer or another organ or at another developmental stage, the transcripts are spliced to a partly different spectrum of mRNA variants. Some of the mRNAs encode AltORFs as well, resulting in a total of six AltORFs in the two cell types. Moreover, the intron 2 encodes another gene, and its transcripts may be spliced to a wt mRNA with 3 exons (I1, I2 and I3) or, alternatively, to two other mRNA variants in the two cell types. The intron sequences may be processed to different ncRNAs, although only miRNAs and siRNAs are shown for simplicity. More complexly, part of the Crick strand of the DNA may be transcribed to some antisense RNAs as well. Therefore, the global picture about the function of this gene or genomic locus is a collective (but not simply additive) effect of the six mRNA variants and six AltORFs of the parental gene, the three mRNA variants of the nested gene, and all the ncRNAs (miRNAs, siRNAs, piRNAs, snRNAs, exRNAs, circRNAs, and antisense RNAs) in these two cell types. If the parental or the nested gene encodes a transcription factor or a membrane receptor, different heterodimers may be formed among the protein isoforms of the same gene to exert functions as well.