
NATsDB: Natural Antisense Transcripts DataBase.

Yong Zhang, Jiongtang Li, Lei Kong, Ge Gao, Qing-Rong Liu, Liping Wei.

Abstract

Natural antisense transcripts (NATs) are reverse complementary at least in part to the sequences of other endogenous sense transcripts. Most NATs are transcribed from opposite strands of their sense partners. They regulate sense genes at multiple levels and are implicated in various diseases. Using an improved whole-genome computational pipeline, we identified abundant cis-encoded exon-overlapping sense-antisense (SA) gene pairs in human (7356), mouse (6806), fly (1554), and eight other eukaryotic species (total 6534). We developed NATsDB (Natural Antisense Transcripts DataBase, http://natsdb.cbi.pku.edu.cn/) to enable efficient browsing, searching and downloading of this currently most comprehensive collection of SA genes, grouped into six classes based on their overlapping patterns. NATsDB also includes non-exon-overlapping bidirectional (NOB) genes and non-bidirectional (NBD) genes. To facilitate the study of functions, regulations and possible pathological implications, NATsDB includes extensive information about gene structures, poly(A) signals and tails, phastCons conservation, homologues in other species, repeat elements, expressed sequence tag (EST) expression profiles and OMIM disease association. NATsDB supports interactive graphical display of the alignment of all supporting EST and mRNA transcripts of the SA and NOB genes to the genomic loci. It supports advanced search by species, gene name, sequence accession number, chromosome location, coding potential, OMIM association and sequence similarity.

Year:  2006        PMID: 17082204      PMCID: PMC1635336          DOI: 10.1093/nar/gkl782

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Recent studies have shown that not only prokaryotic but also eukaryotic genomes contain abundant genes that at least partially overlap with another gene encoded by the opposite strand at the same genomic locus (1–9). If the overlap involves exonic regions of both genes, they are defined as cis-encoded natural antisense transcripts (cis-NATs) and the pairs are named sense–antisense (SA) gene pairs; otherwise, the pairs are named non-exon-overlapping bidirectional (NOB, or exon–intron overlapping) gene pairs; if the transcripts at a genomic locus are derived from the same strand, they are called non-bidirectional (NBD) transcripts (6,8). NATs have long been known to be involved in the regulation of gene expression in prokaryotic cells (1,2). In the past 10 years they have also been found to play multiple roles in eukaryotic gene regulation, such as X-inactivation, genomic imprinting, alternative splicing, RNA stability, transport and translational regulation (3–5). Abnormal changes in antisense transcription have been associated with serious diseases such as cancer and schizophrenia (7,10,11). NOB transcripts have been suggested to play roles in the regulation of pre-mRNA processing and to have possible pathological associations (12,13). Whole-genome searches have identified thousands of SA gene pairs in mammals (6,14,15), and hundreds in fly (6,16), worm (6,17) and plants (18,19). We recently developed a computational pipeline to identify SA and NOB gene pairs in 10 species, the most comprehensive collection at the time (6). Two key steps in the pipeline were the reliable mapping of expressed sequence tag (EST) and mRNA transcripts to genomic sequences and the correct determination of the transcription orientation of ESTs. Here, we report an improved pipeline that imposes a more stringent quality-control filter on EST-to-genome mapping and uses more evidence to infer the transcription orientation of ESTs.
We used the pipeline to identify over 50% more SA and NOB gene pairs in 11 species, including human, mouse, fly, worm, sea squirt, chicken, rat, frog, zebrafish, cow and dog, resulting in the largest collection of SA gene pairs to date (for details see the next section). The importance and abundance of SA and NOB gene pairs call for a database system for efficient storage, retrieval and display. However, the current databases, SADB, the Sense/Antisense Database and LEADS-Antisensor, are inadequate for several reasons. SADB includes only SA and NOB genes in mouse and was last updated in February 2005. The Sense/Antisense Database includes only human and mouse SA genes and LEADS-Antisensor includes only human SA genes; neither has been updated since 2003 and neither includes NOB genes. None of the existing databases includes other important species, and their collections of SA and NOB genes are limited. Furthermore, their annotation and graphical display of the antisense transcripts are limited. Based on the significantly enlarged set of SA and NOB genes we identified in 11 genomes, we developed NATsDB (Natural Antisense Transcripts DataBase, http://natsdb.cbi.pku.edu.cn/), updated quarterly. NATsDB includes extensive annotations and hyperlinks to external databases. It allows users to study whether their gene of interest has antisense transcripts, whether there is sufficient supporting evidence for the transcript orientation (such as splice sites, poly(A) signals and tails), what the exact overlapping pattern is, whether the pair is conserved across species and what the expression profiles of the sense and antisense genes are. This multiple-species, highly annotated database can facilitate the study of the function, conservation and evolution of SA and NOB genes.
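The SA/NOB/NBD definitions given in this section can be illustrated with a minimal Python sketch. It is not the NATsDB implementation: the `Transcript` class, the half-open coordinate convention and the treatment of NBD as "overlapping transcripts on the same strand" are simplifying assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates (assumed convention)


@dataclass
class Transcript:
    """Hypothetical minimal transcript model: a strand and a sorted exon list."""
    name: str
    strand: str            # "+" or "-"
    exons: List[Interval]  # sorted, non-overlapping exon intervals

    @property
    def span(self) -> Interval:
        # Genomic span from the first exon start to the last exon end.
        return (self.exons[0][0], self.exons[-1][1])


def overlap(a: Interval, b: Interval) -> int:
    """Length of the overlap between two half-open intervals (0 if none)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def exon_overlap(t1: Transcript, t2: Transcript) -> int:
    """Total number of bases shared between exons of t1 and exons of t2."""
    return sum(overlap(e1, e2) for e1 in t1.exons for e2 in t2.exons)


def classify_pair(t1: Transcript, t2: Transcript) -> Optional[str]:
    """Return "SA", "NOB" or "NBD", or None if the genomic spans do not overlap."""
    if overlap(t1.span, t2.span) == 0:
        return None
    if t1.strand == t2.strand:
        return "NBD"  # same strand: non-bidirectional (simplified to a pairwise test)
    # Opposite strands: exon-exon overlap makes an SA pair, otherwise NOB.
    return "SA" if exon_overlap(t1, t2) > 0 else "NOB"
```

For example, a minus-strand transcript lying entirely inside an intron of a plus-strand gene is reported as "NOB", while one that reaches into an exon is reported as "SA".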

IMPROVED PIPELINE TO IDENTIFY SA AND NOB GENE PAIRS

We recently reported a rapid pipeline to identify SA pairs based on UniGene sequences (20) and GoldenPath (21) chromosome mapping data (6). In short, we filtered the GoldenPath genome mapping data to determine the exact chromosomal coordinates of mRNAs or ESTs. Because many ESTs are known to be mis-oriented, we combined multiple lines of evidence to infer the correct orientation for mRNAs and ESTs, including sequence type (mRNA or EST), CDS annotation, poly(A) signal/tail and consensus splicing junctions. Based on the genomic coordinates, we then grouped the orientation-reliable sequences into SA, NOB and NBD clusters and selected representative sequences within each cluster to remove redundancy. Finally, we classified the SA gene pairs into six subtypes: ‘Convergent’ (3′–3′ or tail–tail overlap), ‘Divergent’ (5′–5′ or head–head overlap), ‘Complete’ (full overlap), ‘Contained’, ‘Intronic’ and ‘Others’. Here we improved the above pipeline to further increase its accuracy and coverage. First, more stringent filtering of the GoldenPath mapping data was performed to retain higher-quality mapping of mRNAs/ESTs to the genomic sequences. We required mapping length ≥150 bp, identity ≥96%, coverage within mapping ≥97% and coverage within whole transcript ≥75%. If a transcript was mapped to multiple genomic loci, only the best mapping was retained; if more than one nearly identical best mapping existed (difference in BLAT scores <5%), the transcript was discarded to avoid ambiguity. We also discarded transcripts that were mapped to somatic DNA recombination hotspots of the immunoglobulin or T-cell receptor genes in the international ImMunoGeneTics information system (IMGT) (22) because of the difficulty of inferring the exact genomic location of these genes. Second, we kept our previous pipeline to infer the transcription orientation for mRNAs and spliced ESTs (6), while adopting the strategy of Engstrom et al. (15) for unspliced ESTs.
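The mapping filter described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the `Hit` fields are hypothetical names for the four quantities in the text, the quality filter is assumed to run before the best-hit selection, and "difference in BLAT scores <5%" is interpreted here as a gap smaller than 5% of the best score.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One transcript-to-genome mapping (field names are hypothetical)."""
    locus: str                  # label for the mapped genomic locus
    score: float                # BLAT score of the alignment
    mapped_length: int          # mapping length in bp
    identity: float             # fraction of identical bases, 0..1
    mapping_coverage: float     # coverage within the mapped region, 0..1
    transcript_coverage: float  # coverage of the whole transcript, 0..1


def passes_quality(hit: Hit) -> bool:
    """Thresholds quoted in the text: >=150 bp, >=96%, >=97%, >=75%."""
    return (hit.mapped_length >= 150
            and hit.identity >= 0.96
            and hit.mapping_coverage >= 0.97
            and hit.transcript_coverage >= 0.75)


def best_unambiguous_hit(hits: List[Hit], min_gap: float = 0.05) -> Optional[Hit]:
    """Return the best-scoring acceptable hit, or None if absent or ambiguous."""
    good = sorted((h for h in hits if passes_quality(h)),
                  key=lambda h: h.score, reverse=True)
    if not good:
        return None
    if len(good) > 1:
        best, runner_up = good[0], good[1]
        # Assumed reading of the 5% rule: discard the transcript when the
        # runner-up score is within 5% of the best score.
        if best.score - runner_up.score < min_gap * best.score:
            return None
    return good[0]
```

A transcript whose two best mappings score 1000 and 980 would be discarded under this reading, whereas scores of 1000 and 900 would keep the first mapping.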
Three kinds of evidence were used for unspliced ESTs. First, if an unspliced EST had a poly(A) [or poly(T)] tail, then its orientation was determined to be the original (or the opposite) orientation. Second, if its standard poly(A) signal agreed with its direction annotation, it was considered to have the correct orientation. Third, if it came from an ‘orientation reliable’ EST library as defined below, it was considered to have the correct orientation. For each EST library, we determined the orientation of spliced ESTs and compared it with their direction annotation, i.e. 3′ sequencing or 5′ sequencing. If the proportion of spliced ESTs with correct direction annotation in a library was >99% at the 99% confidence level, the library was considered ‘orientation reliable’ and the direction annotation of the unspliced ESTs in the library was adopted. Engstrom et al. (15) showed that this combination of evidence is reliable and sensitive for inferring the orientation of unspliced ESTs. For our human dataset, 1 139 001 (50%) of unspliced ESTs could be assigned an orientation using this strategy, whereas only 317 846 (14%) could have been assigned an orientation using our previous pipeline (6). Using this improved pipeline we identified 7356 SA pairs in human, 6806 SA pairs in mouse, 1607 in rat, 1554 in fly, and hundreds each in worm, sea squirt, chicken, frog, zebrafish, cow and dog. We also identified thousands of NOB pairs. The statistics are shown in Table 1. We compared the mouse SA dataset in NATsDB with that in SADB, using the cross-reference information available on FANTOM3's FTP site to map clone IDs to accession numbers. We found that 89.8% of the SA loci in SADB could be mapped to <50% of mouse SA clusters in NATsDB. Thus, despite using different transcript datasets and genome assemblies, NATsDB was able to cover the majority of SADB. At the same time, NATsDB covers 50% more new data for mouse as well as data for 10 other species not included in SADB.
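The three rules for unspliced ESTs and the library reliability test can be sketched as below. Several points are assumptions made here and not taken from the text: the rules are tried in the order listed, a 5′ annotation is taken to mean the read is in the original (transcript) orientation and a 3′ annotation the opposite, and the ">99% at the 99% confidence level" criterion is implemented as a one-sided Wilson lower bound, which is one reasonable choice among several. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist
from typing import Optional, Set


def wilson_lower_bound(correct: int, total: int, confidence: float = 0.99) -> float:
    """One-sided Wilson lower bound for a binomial proportion (assumed test)."""
    if total == 0:
        return 0.0
    z = NormalDist().inv_cdf(confidence)
    p = correct / total
    centre = p + z * z / (2 * total)
    margin = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / (1 + z * z / total)


def library_is_reliable(correct_spliced: int, total_spliced: int) -> bool:
    """A library is 'orientation reliable' if the bound exceeds 99%."""
    return wilson_lower_bound(correct_spliced, total_spliced) > 0.99


# Assumed convention: 5' reads are in transcript orientation, 3' reads are reversed.
ANNOTATION_TO_ORIENTATION = {"5'": "original", "3'": "opposite"}


@dataclass
class UnsplicedEST:
    library: str
    direction: Optional[str] = None     # "5'", "3'" or None if not annotated
    has_polya_tail: bool = False
    has_polyt_tail: bool = False
    polya_signal: Optional[str] = None  # orientation implied by a poly(A) signal


def infer_orientation(est: UnsplicedEST, reliable_libraries: Set[str]) -> Optional[str]:
    """Return "original", "opposite" or None when no rule applies."""
    # Rule 1: poly(A) tail -> original orientation, poly(T) tail -> opposite.
    if est.has_polya_tail and not est.has_polyt_tail:
        return "original"
    if est.has_polyt_tail and not est.has_polya_tail:
        return "opposite"
    annotated = ANNOTATION_TO_ORIENTATION.get(est.direction)
    if annotated is None:
        return None
    # Rule 2: the poly(A) signal agrees with the direction annotation.
    if est.polya_signal == annotated:
        return annotated
    # Rule 3: the EST comes from an orientation-reliable library.
    if est.library in reliable_libraries:
        return annotated
    return None
```

Under this particular bound, a library needs several hundred spliced ESTs with consistent annotation before it can qualify, so small libraries are never treated as reliable.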
Table 1

Input data source and content statistics of NATsDB

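| Species | UniGene build version | GoldenPath genome version | Number of orientation reliable sequences mapped on to exact genomic location | Percentage of mRNAs + spliced ESTs (%) | Number of SA clusters | Number of NOB clusters | Number of NBD clusters | Percentage of SA genes (a) (%) | Average overlap length of SA pairs (bp) |
|---|---|---|---|---|---|---|---|---|---|
| Human | 193 | hg18 | 4 494 665 | 74.7 | 7356 | 1296 | 18 863 | 40.7 | 345 |
| Mouse | 155 | mm8 | 2 100 305 | 81.3 | 6806 | 821 | 18 019 | 40.9 | 355 |
| Rat | 154 | rn4 | 463 787 | 61.6 | 1607 | 726 | 28 463 | 9.7 | 229 |
| Fly | 44 | dm2 | 310 319 | 86.5 | 1554 | 352 | 8311 | 25.6 | 290 |
| Sea squirt | 18 | ci2 | 414 454 | 89.4 | 993 | 176 | 10 862 | 15.0 | 254 |
| Cow | 77 | bosTau2 | 536 939 | 80.1 | 866 | 291 | 22 640 | 6.9 | 221 |
| Frog | 29 | xenTro2 | 630 019 | 75.9 | 830 | 259 | 22 305 | 6.8 | 312 |
| Chicken | 30 | galGal2 | 299 931 | 74.0 | 873 | 202 | 17 067 | 9.1 | 266 |
| Zebrafish | 91 | danRer4 | 522 259 | 87.5 | 593 | 303 | 20 483 | 5.3 | 306 |
| Worm | 28 | ce2 | 291 395 | 87.5 | 470 | 315 | 17 910 | 4.8 | 116 |
| Dog | 15 | canFam2 | 203 772 | 75.8 | 302 | 213 | 15 112 | 3.7 | 152 |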

(a) Percentage of SA genes = 2 × ‘Number of SA clusters’ / (2 × ‘Number of SA clusters’ + 2 × ‘Number of NOB clusters’ + ‘Number of NBD clusters’).
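As a worked check of the footnote formula, the short Python function below reproduces the ‘Percentage of SA genes’ column from the three cluster counts in Table 1. The function name is chosen here for illustration. Each SA or NOB cluster contributes two genes and each NBD cluster one, which is where the factors of 2 come from.

```python
def percent_sa_genes(sa_clusters: int, nob_clusters: int, nbd_clusters: int) -> float:
    """Percentage of genes that belong to an SA pair, per the Table 1 footnote."""
    sa_genes = 2 * sa_clusters
    all_genes = 2 * sa_clusters + 2 * nob_clusters + nbd_clusters
    return 100.0 * sa_genes / all_genes


# Human row of Table 1: 7356 SA, 1296 NOB and 18 863 NBD clusters.
print(round(percent_sa_genes(7356, 1296, 18863), 1))  # 40.7
```

The same call with the mouse counts (6806, 821, 18 019) gives 40.9, matching the table.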


INTERACTIVE WEB INTERFACE FOR BROWSING AND SEARCH

Users can browse NATsDB by cluster type (SA, NOB or NBD), species and genomic location. They can also limit the selection by the six classes of SA overlapping patterns, minimum overlapping length, coding potential of the genes involved and UniGene description of the transcripts (Figure 1). NATsDB intersects all the criteria and shows the corresponding genomic loci.
Figure 1

The browser interface of NATsDB: Limited by the criteria specified by the user in the top part of the page, the browser marks on the human chromosomes all SA pairs that involve coding genes on both strands, at least one of which has ‘kinase’ in its description, with overlapping length ≥100 bp. The x-axis of the figure at the bottom of the page shows the chromosomes. ‘+’ signs marked on the chromosome in different colors denote different classes of SA pairs.

To display each SA, NOB or NBD cluster, we implemented an interactive web interface using PHP and the GD graphical library. The graphical browser displays the alignment of all transcripts to the genomic sequence to show the overlapping patterns (for SA and NOB). Figure 2 shows one known SA pair in human, MKRN2/RAF1 (23). By default only the representative sequences are shown, as some genomic loci may have hundreds or even thousands of known mRNA and EST transcripts, but users can choose to view all transcripts. Users can also interactively select subsets of transcripts for display using combinations of several criteria, including RefSeq (24) mRNAs, spliced ESTs, polyadenylated ESTs, and/or transcripts from the plus strand, the minus strand or both.
Figure 2

Loci browser showing human SA gene pair, MKRN2/RAF1: The control panel on top allows users to interactively select all sequences or subsets of them. Below the control panel, the browser displays, from top to bottom, the chromosomal coordinates (‘Genome’), phastCons conservation score (‘Conservation Score’), selected supporting mRNA/EST sequences with representative sense and antisense transcripts marked in red, and links to expression profiles of the ESTs. Gene name, tissue information, Homologene link, OMIM link and sequence link appear on the right-hand side of each transcript. For more details, please refer to .

The loci browser also displays several types of important information about the transcripts and hyperlinks to external databases such as GoldenPath (21), Homologene (20), BodyMap-Xs (25) and OMIM (26). Exon/intron structure, poly(A) signals and tails, CpG islands and First Exon predictions (27) are shown to support the transcript's orientation. The single-nucleotide phastCons conservation scores (28) were imported from GoldenPath so that users can visually check the difference in conservation between overlapping and non-overlapping regions, which might indicate biological significance of the pairing between sense and antisense transcripts (15). If an ‘H’ appears at the right end of a transcript line, it can be clicked to open a list of homologous genes, if any, in the other 11 species, cross-referenced through Homologene (20). Expression profiling of the SA and NOB gene pairs may provide important information about the pairs' interaction. We used data in BodyMap-Xs (25) to profile the expression of transcripts in NATsDB across 13 organs, 40 tissues and normal versus pathological conditions (Figure 3). Finally, a hyperlink to OMIM, denoted by ‘O’, appears at the right end of a transcript line if the gene has been previously linked to disease.
Figure 3

Expression profile of MKRN2/RAF1, shown as a bar plot based on all spliced ESTs derived from the plus strand (MKRN2) and minus strand (RAF1) of this genomic locus. Users can change the criteria in the control panel on the loci page to select any other subset of ESTs to profile the sense and antisense genes, such as only polyadenylated ESTs [with poly(A) tail or signal].

We implemented several search options in NATsDB (Figure 4). Boolean operators are supported for all text searches. Users can search for genes by Entrez Gene name, synonym or description, restricted by conditions such as overlapping pattern, coding potential and minimum overlapping length of representative SA pairs, or search for transcripts by mRNA/EST accession number or description. They can search for genes in NATsDB that are listed in OMIM as being involved in disease(s). Users can also specify a genomic location and retrieve all SA/NOB/NBD clusters in that region. Finally, users can search NATsDB using BLAST (Blastn, Tblastn or Tblastx) to find SA/NOB/NBD sequences similar to their query sequence.
Figure 4

The search interface of NATsDB: NATsDB supports multiple search methods including free text search, OMIM disease search, chromosomal location search and BLAST sequence search.

Data in NATsDB are stored in a MySQL 5.0 relational database, which comprises 80 tables and requires ∼20 GB of storage. Indexes were created extensively to speed up online queries. All the representative SA and NOB pairs are free to download. We will continue to maintain NATsDB with a major update every quarter. Similar to Ensembl (29), we archive older releases and make them accessible to users.
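To illustrate how an indexed relational table can serve the chromosomal location search, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for MySQL. The table name, column names, index and all coordinates are hypothetical and invented for the example; the actual NATsDB schema is not described in the text.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of clusters with made-up coordinates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cluster ("
        " id INTEGER PRIMARY KEY,"
        " species TEXT NOT NULL,"
        " chrom TEXT NOT NULL,"
        " chrom_start INTEGER NOT NULL,"
        " chrom_end INTEGER NOT NULL,"
        " kind TEXT NOT NULL)"  # 'SA', 'NOB' or 'NBD'
    )
    # A composite index lets region queries avoid a full table scan.
    conn.execute(
        "CREATE INDEX idx_cluster_region"
        " ON cluster (species, chrom, chrom_start, chrom_end)"
    )
    rows = [  # illustrative values only, not real NATsDB records
        ("human", "chr3", 12_600_000, 12_680_000, "SA"),
        ("human", "chr3", 12_900_000, 12_950_000, "NOB"),
        ("human", "chr5", 12_600_000, 12_680_000, "NBD"),
        ("mouse", "chr3", 12_600_000, 12_680_000, "SA"),
    ]
    conn.executemany(
        "INSERT INTO cluster (species, chrom, chrom_start, chrom_end, kind)"
        " VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def clusters_in_region(conn: sqlite3.Connection, species: str, chrom: str,
                       start: int, end: int) -> List[Tuple[str, int, int]]:
    """Return (kind, start, end) for clusters overlapping [start, end)."""
    cur = conn.execute(
        "SELECT kind, chrom_start, chrom_end FROM cluster"
        " WHERE species = ? AND chrom = ?"
        " AND chrom_start < ? AND chrom_end > ?"
        " ORDER BY chrom_start",
        (species, chrom, end, start),
    )
    return cur.fetchall()
```

A call such as `clusters_in_region(build_demo_db(), "human", "chr3", 12_650_000, 12_700_000)` returns only the first demo row, since two intervals overlap exactly when each starts before the other ends.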

DISCUSSION

Although genome browsers such as GoldenPath (21) and Ensembl (29) can display a specific genomic locus with cDNAs and ESTs aligned to it, users interested in antisense transcription would need to know a priori which loci to open, or manually check loci one by one, to find SA and NOB pairs. Thus, despite the tremendous general utility of GoldenPath and Ensembl, databases such as NATsDB are necessary for studying antisense transcription beyond the single-gene scale. NATsDB also displays features not available in the general browsers, such as poly(A)/poly(T) signals and tails. As more EST and genomic sequence data become available, we will continue to enrich NATsDB with more SA/NOB pairs in more species.
  29 in total

1.  Antisense starts making more sense.

Authors:  Gordon G Carmichael
Journal:  Nat Biotechnol       Date:  2003-04       Impact factor: 54.908

2.  Ensembl 2004.

Authors:  E Birney; D Andrews; P Bevan; M Caccamo; G Cameron; Y Chen; L Clarke; G Coates; T Cox; J Cuff; V Curwen; T Cutts; T Down; R Durbin; E Eyras; X M Fernandez-Suarez; P Gane; B Gibbins; J Gilbert; M Hammond; H Hotz; V Iyer; A Kahari; K Jekosch; A Kasprzyk; D Keefe; S Keenan; H Lehvaslaiho; G McVicker; C Melsopp; P Meidl; E Mongin; R Pettett; S Potter; G Proctor; M Rae; S Searle; G Slater; D Smedley; J Smith; W Spooner; A Stabenau; J Stalker; R Storey; A Ureta-Vidal; C Woodwark; M Clamp; T Hubbard
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  Antisense transcripts with FANTOM2 clone set and their implications for gene regulation.

Authors:  Hidenori Kiyosawa; Itaru Yamanaka; Naoki Osato; Shinji Kondo; Yoshihide Hayashizaki
Journal:  Genome Res       Date:  2003-06       Impact factor: 9.043

4.  Over 20% of human transcripts might form sense-antisense pairs.

Authors:  Jianjun Chen; Miao Sun; W James Kent; Xiaoqiu Huang; Hanqing Xie; Wenquan Wang; Guolin Zhou; Run Zhang Shi; Janet D Rowley
Journal:  Nucleic Acids Res       Date:  2004-09-08       Impact factor: 16.971

5.  Widespread occurrence of antisense transcription in the human genome.

Authors:  Rodrigo Yelin; Dvir Dahary; Rotem Sorek; Erez Y Levanon; Orly Goldstein; Avi Shoshan; Alex Diber; Sharon Biton; Yael Tamir; Rami Khosravi; Sergey Nemzer; Elhanan Pinner; Shira Walach; Jeanne Bernstein; Kinneret Savitsky; Galit Rotman
Journal:  Nat Biotechnol       Date:  2003-03-17       Impact factor: 54.908

Review 6.  Antisense RNA control in bacteria, phages, and plasmids.

Authors:  E G Wagner; R W Simons
Journal:  Annu Rev Microbiol       Date:  1994       Impact factor: 15.500

Review 7.  In search of antisense.

Authors:  Giovanni Lavorgna; Dvir Dahary; Ben Lehner; Rotem Sorek; Christopher M Sanderson; Giorgio Casari
Journal:  Trends Biochem Sci       Date:  2004-02       Impact factor: 13.807

8.  Is the G72/G30 locus associated with schizophrenia? single nucleotide polymorphisms, haplotypes, and gene expression analysis.

Authors:  Michael Korostishevsky; Miryam Kaganovich; Alina Cholostoy; Maya Ashkenazi; Yael Ratner; Dvir Dahary; Jeanne Bernstein; Ullrike Bening-Abu-Shach; Edna Ben-Asher; Doron Lancet; Michael Ritsner; Ruth Navon
Journal:  Biol Psychiatry       Date:  2004-08-01       Impact factor: 13.382

9.  Antisense intronic non-coding RNA levels correlate to the degree of tumor differentiation in prostate cancer.

Authors:  Eduardo M Reis; Helder I Nakaya; Rodrigo Louro; Flavio C Canavez; Aurea V F Flatschart; Giulliana T Almeida; Camila M Egidio; Apuã C Paquola; Abimael A Machado; Fernanda Festa; Denise Yamamoto; Renato Alvarenga; Camille C da Silva; Glauber C Brito; Sérgio D Simon; Carlos A Moreira-Filho; Katia R Leite; Luiz H Camara-Lopes; Franz S Campos; Etel Gimba; Giselle M Vignal; Hamza El-Dorry; Mari C Sogayar; Marcello A Barcinski; Aline M da Silva; Sergio Verjovski-Almeida
Journal:  Oncogene       Date:  2004-08-26       Impact factor: 9.867

10.  Antisense transcripts with rice full-length cDNAs.

Authors:  Naoki Osato; Hitomi Yamada; Kouji Satoh; Hisako Ooka; Makoto Yamamoto; Kohji Suzuki; Jun Kawai; Piero Carninci; Yasuhiro Ohtomo; Kazuo Murakami; Kenichi Matsubara; Shoshi Kikuchi; Yoshihide Hayashizaki
Journal:  Genome Biol       Date:  2003-12-11       Impact factor: 13.583
