Yong Zhang, X Shirley Liu, Qing-Rong Liu, Liping Wei.
Abstract
We developed a fast, integrative pipeline to identify cis natural antisense transcripts (cis-NATs) at genome scale. The pipeline mapped mRNAs and ESTs in UniGene to genome sequences in GoldenPath to find overlapping transcripts, and combined information from coding sequences, poly(A) signals, poly(A) tails and splice sites to deduce transcription orientation. We identified cis-NATs in 10 eukaryotic species, including 7830 candidate sense-antisense (SA) genes in 3915 SA pairs in human. The abundance of SA genes is remarkably low in worm and does not seem to be caused by the prevalence of operons. Hundreds of SA pairs are conserved across different species, even maintaining the same overlapping patterns. The convergent SA class is prevalent in fly, worm and sea squirt, but not in human or mouse as reported previously. The percentage of SA genes among imprinted genes in human and mouse is 24-47%, a range between the two previous reports. There is a significant shortage of SA genes on Chromosome X in human and mouse but not in fly or worm, supporting X-inactivation in mammals as a possible cause. SA genes are over-represented in catalytic activities and basic metabolism functions. All candidate cis-NATs can be downloaded from http://nats.cbi.pku.edu.cn/download/.
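The overlap step described in the abstract can be pictured as an interval comparison between oriented transcripts. The following is a minimal sketch, not the authors' pipeline: the `Transcript` record, the `find_sa_pairs` function and all coordinates are hypothetical, and the strand of each transcript is assumed to have been deduced already (for example from poly(A) or splice-site evidence).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transcript:
    """A mapped transcript cluster (hypothetical record, not the pipeline's format)."""
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # '+' or '-', assumed to be already deduced


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if the two genomic spans share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def find_sa_pairs(transcripts):
    """Return name pairs whose spans overlap on opposite strands (candidate SA pairs)."""
    pairs = []
    for a, b in combinations(transcripts, 2):
        if overlaps(a, b) and a.strand != b.strand:
            pairs.append((a.name, b.name))
    return pairs


# Made-up coordinates, for illustration only.
example = [
    Transcript("geneA", "chr1", 100, 500, "+"),
    Transcript("geneB", "chr1", 450, 900, "-"),   # overlaps geneA, opposite strand
    Transcript("geneC", "chr1", 850, 1200, "-"),  # overlaps geneB, but same strand
    Transcript("geneD", "chr2", 100, 500, "-"),   # different chromosome
]
print(find_sa_pairs(example))
```

The all-pairs comparison is quadratic and only suits small inputs; a genome-scale run would more plausibly sort transcripts by start coordinate and sweep along each chromosome.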
Year: 2006 PMID: 16849434 PMCID: PMC1524920 DOI: 10.1093/nar/gkl473
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Six classes of SA gene pairs. SA gene pairs are classified into six classes based on their overlapping patterns. Arrows indicate transcription orientation. Blocks indicate exons. Dashed lines between blocks indicate introns.
Statistics of SA genes in 10 species
a. ‘Number of SA genes’ is equal to 2 × ‘Number of SA clusters’.
b. ‘Abundance of SA genes’ is equal to 2 × ‘Number of SA clusters’ / (2 × ‘Number of SA clusters’ + 2 × ‘Number of NOB clusters’ + ‘Number of NBD clusters’). The numbers and other details of NOB and NBD clusters are listed in Supplementary Table S1.
c. ‘Number of usable transcripts’ is the total number of orientation-reliable mRNAs and ESTs that can be mapped to genome sequences. For the last five species this number is too low relative to the genome size, indicating that the ‘Abundance of SA genes’ for these species is likely to change with additional data. For the other five species this number may be sufficient to give a relatively good estimate of ‘Abundance of SA genes’.
d. ‘Properties of usable transcripts’ uses cumulative bar charts to show the percentage of SA clusters in which both (blue), only one (dark red) or neither (light yellow) of the forward and backward orientations includes at least one mRNA sequence (‘mRNA’), coding sequence (‘CDS’) or intron-spanning sequence (‘Splice site’), respectively. CDS information is retrieved directly from UniGene annotation.
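The formulas in footnotes a and b can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and because the NOB and NBD cluster counts live in Supplementary Table S1 rather than in this text, the second call uses placeholder counts. The formula weights SA and NOB clusters by two and NBD clusters by one, which suggests the first two kinds contribute two genes each and the last contributes one.

```python
def sa_gene_count(sa_clusters: int) -> int:
    """Footnote a: each SA cluster contributes two SA genes."""
    return 2 * sa_clusters


def sa_abundance(sa_clusters: int, nob_clusters: int, nbd_clusters: int) -> float:
    """Footnote b: SA genes as a fraction of all genes counted across cluster types."""
    sa_genes = sa_gene_count(sa_clusters)
    total_genes = sa_genes + 2 * nob_clusters + nbd_clusters
    if total_genes == 0:
        raise ValueError("no clusters supplied")
    return sa_genes / total_genes


# The abstract reports 3915 SA pairs in human, i.e. 7830 SA genes.
print(sa_gene_count(3915))

# Placeholder NOB/NBD counts (NOT from the paper), just to exercise the formula.
print(sa_abundance(sa_clusters=1, nob_clusters=1, nbd_clusters=2))
```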
Figure 2. Example of a quadruplet cluster. This quadruplet cluster consists of four distinct genes (AUP1, PRSS25, LOXL3 and DOK1) overlapping in a head–head, tail–tail and head–head manner, respectively. The overlap between LOXL3 and DOK1 is NOB-type. The other two overlaps are SA-type. The genomic sequence is shown as a red line at the top. The transcripts are aligned to the genomic sequence and shown underneath. Alternating black and gray blocks indicate neighboring exons, and dashed lines between blocks indicate introns. Blocks in green indicate UTRs. The poly(A) signals and poly(A) tails are also shown.
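The head–head and tail–tail terms used in this caption can be made concrete with genomic coordinates. The sketch below is a simplified, span-only classification and is not the six-class scheme of Figure 1, which also depends on exon and intron structure. The function name, the `"embedded"` label and the coordinates are assumptions introduced here. It relies on the convention that the 5′ end ("head") of a plus-strand gene is its lower coordinate, while the head of a minus-strand gene is its higher coordinate.

```python
def classify_sa_overlap(plus_span, minus_span):
    """Classify how a plus-strand and a minus-strand gene span overlap.

    Each span is a half-open (start, end) tuple on the same chromosome.
    Returns 'no overlap', 'embedded', 'tail-tail' or 'head-head'.
    """
    ps, pe = plus_span
    ms, me = minus_span

    if not (ps < me and ms < pe):
        return "no overlap"

    # One span lies entirely inside the other.
    if (ps <= ms and me <= pe) or (ms <= ps and pe <= me):
        return "embedded"

    # Plus gene lies to the left: its 3' end (right edge) meets the
    # minus gene's 3' end (left edge), so the genes converge.
    if ps < ms:
        return "tail-tail"

    # Otherwise the minus gene lies to the left: its 5' end (right edge)
    # meets the plus gene's 5' end (left edge), so the genes diverge.
    return "head-head"


# Made-up coordinates, for illustration only.
print(classify_sa_overlap((100, 500), (400, 900)))  # plus gene upstream of minus gene
print(classify_sa_overlap((400, 900), (100, 500)))  # minus gene upstream of plus gene
```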
Human SA pairs conserved in both mouse and rat identified by cross-reference of HomoloGene
| Plus-strand transcript accession number | Plus-strand transcript gene name | Minus-strand transcript accession number | Minus-strand transcript gene name |
|---|---|---|---|
| NM_001001787 | ATP1B1 | NM_013330 | NME7 |
| NM_024854 | FLJ22028 | NM_002907 | RECQL |
| NM_001005407 | CACNA1H | NM_012467 | TPSG1 |
| NM_173618 | FLJ90652 | NM_003586 | DOC2A |
| NM_003250 | THRA | NM_021724 | NR1D1 |
| NM_032332 | MGC4238 | NM_001930 | DHPS |
| NM_014160 | MKRN2 | NM_002880 | RAF1 |
| NM_018295 | FLJ11000 | NM_024033 | MGC5242 |
| NM_032509 | RBM13 | NM_025115 | FLJ23263 |
SA pairs conserved in different species. The ‘accession number’ and ‘gene_name’ are from the RefSeq database and Entrez Gene database, respectively.
SA pairs in human, mouse and rat that are conserved in chicken identified by HomoloGene cross-reference of one representative gene and pair-wise sequence comparison of the other representative gene
| Plus-strand transcript accession number | Plus-strand transcript gene name | Minus-strand transcript accession number | Minus-strand transcript gene name |
|---|---|---|---|
| Human | |||
| NM_181706 | ZCSL3 | NM_144981 | FLJ25059 |
| BC073964 | PHC1 | NM_002355 | M6PR |
| AL832339 | MGC50559 | NM_207337 | LOC196394 |
| NM_024549 | FLJ21127 | NM_032369 | MGC15619 |
| NM_205850 | SLC24A5 | NM_016132 | MYEF2 |
| NM_024419 | PGS1 | NM_003727 | DNAH17 |
| NM_012249 | RHOQ | NM_002643 | PIGF |
| NM_000179 | MSH6 | NM_025133 | FBXO11 |
| NM_030582 | COL18A1 | AB209069 | SLC19A1 |
| BC042384 | N/A | NM_005965 | MYLK |
| NM_000938 | POLR2B | NM_001553 | IGFBP7 |
| NM_016067 | MRPS18C | NM_139076 | FLJ13614 |
| NM_002006 | FGF2 | NM_007083 | NUDT6 |
| NM_152683 | FLJ33167 | NM_024629 | MLF1IP |
| NM_032509 | RBM13 | NM_025115 | FLJ23263 |
| BE349850 | N/A | NM_000349 | STAR |
| BC063847 | INVS | NM_017746 | TEX10 |
| NM_153710 | C9orf96 | NM_020385 | XPMC2H |
| Mouse | |||
| NM_011955 | Nubp1 | BC068110 | LOC383103 |
| NM_145491 | Rhoq | NM_008838 | Pigf |
| NM_010830 | Msh6 | BC049946 | Fbxo11 |
| NM_028260 | 1500034J20Rik | NM_026992 | 1700030A21Rik |
| NM_175034 | Slc24a5 | AK034990 | Myef2 |
| NM_010569 | Invs | BC006867 | Tex10 |
| NM_001003893 | Masp2 | NM_001003898 | Tardbp |
| NM_153798 | Polr2b | NM_008048 | Igfbp7 |
| NM_144927 | BC019943 | NM_026453 | Rbm13 |
| NM_027973 | Mlf1ip | NM_001001184 | MGC86034 |
| Rat | |||
| NM_001013048 | Igfbp7_predicted | BG381131 | N/A |
| NM_001014002 | RGD1311297_predicted | NM_001013883 | RGD1310414_predicted |
| NM_019305 | Fgf2 | NM_181363 | Nudt6 |
| AA819391 | N/A | NM_181631 | RGD:727935 |
Three SA pairs, MSH6/FBXO11, POLR2B/IGFBP7 and RBM13/C8orf41, occur in all four vertebrate species.
SA pairs in human, mouse, and rat that are conserved in frog identified by pair-wise sequence comparison of both representative genes
| Plus-strand transcript accession number | Plus-strand transcript gene name | Minus-strand transcript accession number | Minus-strand transcript gene name |
|---|---|---|---|
| Human | |||
| AK096901 | FLJ39582 | NM_145042 | MGC16703 |
| BG722184 | N/A | AK056169 | TLP19 |
| NM_032704 | TUBA6 | BE743720 | N/A |
| NM_000179 | MSH6 | NM_025133 | FBXO11 |
| NM_002940 | ABCE1 | BU567906 | N/A |
| NM_015360 | SKIV2L2 | NM_003711 | PPAP2A |
| BI089035 | N/A | NM_001025 | RPS23 |
| NM_004279 | PMPCB | X98260 | ZRF1 |
| Mouse | |||
| AY223866 | Cog4 | NM_133953 | Sf3b3 |
| BG066336 | N/A | NM_031878 | Smarcd2 |
| NM_177325 | AW550801 | NM_013761 | Srr |
| NM_010830 | Msh6 | BC049946 | Fbxo11 |
| Rat | |||
| AA819391 | N/A | NM_181631 | RGD:727935 |
Figure 3. Abundance of the six classes of SA gene pairs. The abundance of the six classes of SA gene pairs is shown in the cumulative bar chart for human, mouse, fly, worm, sea squirt, chicken, rat, frog, zebrafish and cow. The figure shows that convergent SA pairs are prominent in fly, worm and sea squirt, but not in human and mouse as reported previously. The abundance in the last five species is likely to change with more available data.
Figure 4. Percentage of mouse imprinted genes that belong to SA, NOB and NBD clusters, respectively. Three datasets of mouse imprinted genes were used: the imprinted gene catalogue curated from the literature (IGC), putative imprinted genes in the Ensembl database (Ensembl), and imprinted genes discovered from differential expression profiling of parthenogenote and androgenote mouse embryos (FANTOM2). SA genes are statistically significantly enriched (χ² test P-value < 0.01) in imprinted genes compared to the NBD dataset.
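A χ² test of this kind can be illustrated on a 2 × 2 contingency table (imprinted versus non-imprinted genes, SA versus NBD genes). The caption does not say how the authors laid out their table or whether a continuity correction was applied, so the sketch below is an assumption on both counts: it uses the uncorrected Pearson statistic with one degree of freedom, a hypothetical function name, and invented counts that are not the paper's data.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")

    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts (NOT from the paper):
#                  SA genes   NBD genes
# imprinted            20         30
# non-imprinted      2000      10000
stat, p = chi_square_2x2(20, 30, 2000, 10000)
print(f"chi2 = {stat:.2f}, significant at 0.01: {p < 0.01}")
```

With small expected cell counts, which is plausible for short lists of imprinted genes, an exact test may be more appropriate than the χ² approximation.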
Figure 5. Chromosomal distribution of SA gene pairs in the mouse genome. The x-axis shows the chromosomes in the mouse genome; the y-axis shows the chromosomal coordinate. ‘+’ marks an SA gene pair. SA genes are significantly less abundant on the X chromosome than on the autosomes. The chromosomal distribution of SA gene pairs in the human genome is very similar.