
antiCODE: a natural sense-antisense transcripts database.

Yifei Yin, Yi Zhao, Jie Wang, Changning Liu, Shuguang Chen, Runsheng Chen, Haitao Zhao.

Abstract

BACKGROUND: Natural antisense transcripts (NATs) are endogenous RNA molecules that exhibit partial or complete complementarity to other RNAs, and that may contribute to the regulation of molecular functions at various levels. In recent years, large-scale NAT screens in several model organisms have produced much data, but no database has yet assembled these data. antiCODE is intended to serve as an integrated NAT database for this purpose.
RESULTS: This release of antiCODE contains more than 30,000 non-redundant natural sense-antisense transcript pairs from 12 eukaryotic model organisms. To provide an integrated NAT research platform, efficient browse, search and Blast functions have been included so that users can easily access information through parameters such as species, accession number, overlapping pattern and coding potential. In addition to the collected information, antiCODE also introduces a simple classification system to facilitate the study of natural antisense transcripts.
CONCLUSION: Though a few similar databases also dealing with NATs have appeared lately, antiCODE is the most comprehensive among these, comprising almost all currently detected NAT pairs.

Year:  2007        PMID: 17760969      PMCID: PMC1997216          DOI: 10.1186/1471-2105-8-319

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Natural antisense transcripts (NATs) are endogenous RNA molecules that exhibit partial or complete complementarity to other transcripts, through which they may contribute to the regulation of expression at various levels. Although many NATs were discovered through their regulatory effect on the expression of mRNAs [1,2], global predictions of NATs in several species have also been published [3-10]. The first of these used mRNA data to predict natural antisense transcripts [4]. With the appearance of more draft genomes and full-length cDNA data, the scale of NAT predictions has been extended. Several datasets, mainly based on full-length cDNAs, have been published for mouse [8,11], rice [12] and Arabidopsis thaliana [7]. Since 2006, the trend in NAT prediction has turned to multi-species comparisons [6,13]. A number of published NATs have been validated by various experimental approaches, such as RT-PCR [10] and microarray [5], further confirming that antisense transcription is a common occurrence in eukaryotic transcriptomes. The emergence of so much NAT data in recent years reflects, on the one hand, the availability of more genomic and full-length cDNA data and, on the other hand, a growing realization of the important functions of natural antisense transcripts. Antisense RNAs may exert regulatory activity at various levels, such as post-transcription [14,15], splicing [16,17], transport [18], and genomic imprinting [19,20], and have been shown to be involved in the control of developmental processes [21], adaptation to various stresses [22], and viral infection [23,24] through annealing to complementary sequences. To facilitate research, previous publications have suggested a few classification systems for NATs.
The most basic of these is the cis/trans system [4], in which an antisense transcript from the same genomic locus as the sense transcript is labelled a cis-NAT, whereas a trans-NAT is an antisense transcript expressed from a genomic locus different from that of the sense transcript. A second classification system is based on the overlapping position of the complementary pair, which is assigned to one of 5–6 categories according to gene structure, e.g. depending on whether the pair overlaps at the 5' ends, at the 3' ends, completely, or in the introns [6,7,10,11]. A third classification system considers the respective coding potential of the complementary pair, and includes the categories coding-coding, coding-noncoding and noncoding-noncoding [8,13]. To date, a number of large-scale NAT datasets have been published and several functional studies of NATs have been carried out; however, no database has so far been set up to collect and organize all these transcripts. To serve the needs of NAT research, we have built the antiCODE database over the past two years. The purpose of the database is to collect the existing NAT data and to provide a useful browsing and search platform for these data. This release of antiCODE contains more than 30,000 natural sense-antisense transcript pairs from 12 model organisms: Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Xenopus tropicalis (western clawed frog), Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode), Ciona intestinalis (sea squirt), Gallus gallus (chicken), Danio rerio (zebrafish), Bos taurus (cow), Oryza sativa (rice) and Arabidopsis thaliana (thale cress).

Construction and content

All NATs in the database have been collected from recent articles [4-8,10-13]. The original datasets used for construction of the database are listed in Table 1, which include 11,287 human NAT pairs [4,5,10], 14,199 mouse NAT pairs [8,11], 1,339 A. thaliana NAT pairs [7], 687 rice NAT pairs [12] and more than 5,000 NAT pairs from other species [6,13].
Table 1

The genome-wide NAT datasets in eukaryotic species

Reference  Species involved in the predictions                                  Number of transcripts
[4]        Human                                                                372
[5]        Human                                                                2,667
[11]       Mouse                                                                4,279
[12]       Rice                                                                 1,374
[10]       Human                                                                5,880
[7]        Arabidopsis thaliana                                                 1,340
[8]        Mouse                                                                37,562
[13]       Human, mouse, rat, chicken, fruit fly, and nematode                  11,200
[6]        Human, mouse, frog, cow, fruit fly, worm, zebra fish and sea squirt  21,266

Classification

After collecting the NAT pairs, we needed uniform criteria to organize the data. Based on the previous classifications, we developed a classification system that includes three complementary aspects, for which we use the terms "5/3/c/o", "cis/trans" and "coding/noncoding". The "5/3/c/o" system is a simplification of the existing classification based on gene structure [6,11], and indicates which parts of the two sequences overlap, i.e. the 5' ends (5' overlapping), the 3' ends (3' overlapping), or one transcript completely covered by the other (complete; see Figure 1). If none of these applies, the NAT pair is marked "o" (other), for instance if there is only partial overlap between the two transcripts. The "cis/trans" scheme tells whether or not the two sequences of a NAT pair are located at the same chromosomal locus: if both are located at the same genomic position they are named a cis-NAT pair, otherwise a trans-NAT pair. The "coding/noncoding" scheme indicates whether the two overlapping RNAs are (protein) coding RNAs or noncoding RNAs. We have not adopted the system [6,11] that divides NAT pairs according to their exon-intron structures, because we wish to provide more compact and practical information and thus enable quick retrieval of the most useful items from the abundance of available information. For more detailed information on particular NAT pairs, users may visit other relevant databases through the provided links.
Figure 1

The "5/3/c/o" classification system. The arrows indicate the transcriptional orientation of the NAT pair. A solid line indicates an exon and a broken line an intron.

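The "5/3/c/o" labels can be illustrated with a minimal sketch. It assumes each transcript is reduced to a single genomic span with a strand, which ignores exon-intron structure and is not necessarily how antiCODE assigns the classes; the `Transcript` type and the `classify_overlap` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    chrom: str
    start: int   # leftmost genomic coordinate, inclusive
    end: int     # rightmost genomic coordinate, inclusive
    strand: str  # "+" or "-"


def classify_overlap(a: Transcript, b: Transcript) -> str:
    """Return "5", "3", "c" or "o" for a candidate sense-antisense pair.

    Assumption: only transcript spans are compared, so a pair that does
    not share a locus on opposite strands falls into "o".
    """
    opposite = {a.strand, b.strand} == {"+", "-"}
    if a.chrom != b.chrom or not opposite:
        return "o"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # No shared bases at all.
    if plus.end < minus.start or minus.end < plus.start:
        return "o"
    # One span lies entirely inside the other.
    plus_covers = plus.start <= minus.start and minus.end <= plus.end
    minus_covers = minus.start <= plus.start and plus.end <= minus.end
    if plus_covers or minus_covers:
        return "c"
    # Partial overlap. On the plus strand the 3' end is the right edge;
    # on the minus strand the 3' end is the left edge.
    if plus.start < minus.start:
        return "3"  # tail-to-tail: both 3' ends lie in the overlap
    return "5"      # head-to-head: both 5' ends lie in the overlap
```

For example, a plus-strand transcript spanning 100-500 and a minus-strand transcript spanning 400-900 on the same chromosome share their 3' ends, so the function returns "3".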

Database Construction

We obtained accession numbers and clone IDs for the NAT pairs from the supplementary material of published articles and downloaded the annotation information and sequences from the NCBI and FANTOM websites. In the first step, we divided the NAT pairs into cis/trans classes according to information in the referenced papers. In the second step, we classified the NAT pairs according to the coding/noncoding system, so that all NAT pairs were sorted as coding-coding, coding-noncoding or noncoding-noncoding. In the third step, Blat [25] was used to classify the NAT pairs according to the 5/3/c/o system. Finally, we removed redundant NAT pairs derived from different datasets.
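Two of these steps, the coding/noncoding labelling and the removal of redundant pairs, can be sketched as follows. The text does not state the criterion used to decide that two pairs are redundant, so the sketch assumes that a pair is redundant when the same two accession numbers have already been seen, in either order. The function names and the placeholder accessions are hypothetical.

```python
def coding_class(sense_is_coding: bool, antisense_is_coding: bool) -> str:
    """Map the coding status of both transcripts to a pair-level label."""
    n_coding = int(sense_is_coding) + int(antisense_is_coding)
    labels = {
        2: "coding-coding",
        1: "coding-noncoding",
        0: "noncoding-noncoding",
    }
    return labels[n_coding]


def remove_redundant(pairs):
    """Keep the first occurrence of each unordered accession pair.

    Assumption: redundancy means "same two accessions", regardless of
    which one a source dataset listed as sense or antisense.
    """
    seen = set()
    unique = []
    for first, second in pairs:
        key = frozenset((first, second))
        if key not in seen:
            seen.add(key)
            unique.append((first, second))
    return unique


# Placeholder accessions, merged from two hypothetical source datasets.
merged = [("ACC_0001", "ACC_0002"), ("ACC_0002", "ACC_0001"), ("ACC_0001", "ACC_0003")]
non_redundant = remove_redundant(merged)
```

Using an unordered key means that a pair reported by two datasets with the sense and antisense roles swapped is stored only once.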

Website Features

The three core functions of the antiCODE database are browse, search and sequence alignment with Blast. Under the browse option, there are five sub-options (Pair ID, cis/trans, overlap, coding/noncoding, and species), by which users can browse all NAT pairs by pair ID or by NAT pair class. More specific lookups can be executed with the search function. Users can enter the exact gene accession number or clone ID to see whether a sequence of interest has a possible complementary transcript. If one is interested in NAT pairs relating to some particular condition, e.g. cancer, a relevant keyword can be entered in the Text search frame under the search option. If a sequence of interest cannot be found in the database, or a user wants to investigate whether a novel sequence overlaps with known NAT pairs, the Blast option is useful. Users just need to paste their sequences into the sequence window, or upload them on the Blast web page, and select the appropriate options, such as the expected number of hits (Figure 2); the Blast result is then returned.
Figure 2

The Blast options. In the database frame, 12 genomes can be selected as Blast databases. More detailed options can be found below, which allow users to personalize the Blast results according to complexity, expect value and graphical overview options.

Once a NAT pair of interest has been found, all information pertaining to it is displayed, including annotation and map view links to other databases, affiliated classes, a simple description and references. More detailed annotations and comments can be obtained through the links to other relevant databases.
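The exact-match search by accession number or clone ID described above amounts to a lookup in an index of pair members. The following is a minimal in-memory sketch of that idea, not the implementation behind the antiCODE website; `build_index`, `partners` and the placeholder identifiers are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(pairs):
    """Map every identifier to the set of its complementary partners."""
    index = defaultdict(set)
    for sense, antisense in pairs:
        index[sense].add(antisense)
        index[antisense].add(sense)
    return index


def partners(index, identifier):
    """Return the sorted partners of an identifier, or [] if unknown."""
    return sorted(index.get(identifier, ()))


# Placeholder identifiers standing in for accession numbers or clone IDs.
index = build_index([("ACC_0001", "ACC_0002"), ("ACC_0001", "ACC_0003")])
hits = partners(index, "ACC_0001")
misses = partners(index, "ACC_9999")
```

An empty result corresponds to the case in which the sequence of interest is not in the database, where the text recommends falling back to the Blast option.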

Utility and discussion

Recently, new technologies such as microarray, SAGE, and MPSS have played prominent roles in the identification of NAT pairs. Before 2005, only EST (UniGene) and mRNA data had been used for NAT prediction. Later, large-scale full-length cDNA data emerged, based on which more than 1,000 rice NATs [12] were first reported, closely followed by mouse [8,11] and Arabidopsis [7] NATs. MPSS data have also been used for NAT prediction in Arabidopsis [7], and in 2005 a new NAT dataset based on SAGE was reported in mouse [26]. In 2007, data from whole-genome arrays [27] were employed for NAT prediction in Arabidopsis. It is expected that, along with improvements in array technology, more transcripts from tiling microarrays will be used for future NAT predictions, hopefully resulting in an accurate and exhaustive set of NAT data.

Conclusion

The most recently released NAT datasets [9,26-28] have not yet been included in antiCODE, but will be included in the next release of the database. Nevertheless, compared with other existing databases [29], antiCODE is presently the most comprehensive and integrated database for NAT pairs. The most distinctive features of antiCODE are as follows: (i) antiCODE includes almost all known natural antisense transcript (NAT) pairs from 12 eukaryotic model organisms; (ii) antiCODE provides substantial and compact information relating to NATs (e.g. accession number, clone ID, species and classification); (iii) we have introduced a classification system, based on previous schemes, which should give users an immediate impression of the basic features of each NAT pair; (iv) a Blast service is provided; and (v) antiCODE provides a user-friendly interface and a convenient search option, allowing efficient investigation and verification of natural antisense pairs from different species.

Availability and requirements

The antiCODE database and related resources can be freely accessed at its websites or

Authors' contributions

Yifei Yin and Yi Zhao carried out the design and the collection of data. Jie Wang was responsible for building the database. Changning Liu participated in the design of the study. Shuguang Chen helped to draft the manuscript. Runsheng Chen and Haitao Zhao participated in the design and coordination. All authors read and approved the final manuscript.
  29 in total

Review 1.  The uniqueness of the imprinting mechanism.

Authors:  F Sleutels; D P Barlow; R Lyle
Journal:  Curr Opin Genet Dev       Date:  2000-04       Impact factor: 5.578

2.  Specific interference with gene expression induced by long, double-stranded RNA in mouse embryonal teratocarcinoma cell lines.

Authors:  E Billy; V Brondani; H Zhang; U Müller; W Filipowicz
Journal:  Proc Natl Acad Sci U S A       Date:  2001-11-27       Impact factor: 11.205

3.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

4.  Antisense transcripts with FANTOM2 clone set and their implications for gene regulation.

Authors:  Hidenori Kiyosawa; Itaru Yamanaka; Naoki Osato; Shinji Kondo; Yoshihide Hayashizaki
Journal:  Genome Res       Date:  2003-06       Impact factor: 9.043

Review 5.  Antisense RNA in imprinting: spreading silence through Air.

Authors:  Claire Rougeulle; Edith Heard
Journal:  Trends Genet       Date:  2002-09       Impact factor: 11.639

6.  Widespread occurrence of antisense transcription in the human genome.

Authors:  Rodrigo Yelin; Dvir Dahary; Rotem Sorek; Erez Y Levanon; Orly Goldstein; Avi Shoshan; Alex Diber; Sharon Biton; Yael Tamir; Rami Khosravi; Sergey Nemzer; Elhanan Pinner; Shira Walach; Jeanne Bernstein; Kinneret Savitsky; Galit Rotman
Journal:  Nat Biotechnol       Date:  2003-03-17       Impact factor: 54.908

7.  Natural antisense transcripts are detected in different cell lines and tissues of cats infected with feline immunodeficiency virus.

Authors:  S Briquet; J Richardson; C Vanhée-Brossollet; C Vaquero
Journal:  Gene       Date:  2001-04-18       Impact factor: 3.688

8.  Antisense transcripts in the human genome.

Authors:  Ben Lehner; Gary Williams; R Duncan Campbell; Christopher M Sanderson
Journal:  Trends Genet       Date:  2002-02       Impact factor: 11.639

Review 9.  Regulation of the NPT gene by a naturally occurring antisense transcript.

Authors:  Andreas Werner; Keziah Preston-Fayers; Leif Dehmelt; Perihan Nalbant
Journal:  Cell Biochem Biophys       Date:  2002       Impact factor: 2.194

10.  Prediction of trans-antisense transcripts in Arabidopsis thaliana.

Authors:  Huan Wang; Nam-Hai Chua; Xiu-Jie Wang
Journal:  Genome Biol       Date:  2006-10-13       Impact factor: 13.583
