Matthias B Wahl, Randolph B Caldwell, Andrzej M Kierzek, Hiroshi Arakawa, Eduardo Eyras, Nina Hubner, Christian Jung, Manuel Soeldenwagner, Manuela Cervelli, Yan-Dong Wang, Volkmar Liebscher, Jean-Marie Buerstedde.
Abstract
BACKGROUND: Understanding whole-genome sequences in higher eukaryotes depends to a large degree on the reliable definition of transcription units, including exon/intron structures, translated open reading frames (ORFs) and flanking untranslated regions. The best currently available chicken transcript catalog is the Ensembl build, which is based on mappings of a relatively small number of full-length cDNAs and ESTs to the genome, together with in silico gene predictions derived from the genome sequence.
Year: 2004 PMID: 15610564 PMCID: PMC543457 DOI: 10.1186/1471-2164-5-98
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
SAGE and unitag collections
| Library | Tags | Unitags | Tags per unitag | |
|---|---|---|---|---|
| Busage | 65,798 | 24,064 | 2.73 | 4.63 |
| Dt40sage | 63,770 | 21,308 | 2.99 | 4.97 |
| Total | 129,568 | 38,212 | 3.39 | |
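The fourth column (2.73, 2.99, 3.39) is consistent with total tags divided by unitags, i.e. distinct tag sequences. A minimal Python sketch recomputing it from the counts above; `LIBRARIES` and `tags_per_unitag` are illustrative names, not from the source.

```python
# Tag and unitag counts copied from the "SAGE and unitag collections" table.
LIBRARIES = {
    "Busage": (65_798, 24_064),
    "Dt40sage": (63_770, 21_308),
    "Total": (129_568, 38_212),
}

def tags_per_unitag(tags: int, unitags: int) -> float:
    """Mean number of sequenced tags per distinct tag sequence (unitag)."""
    return tags / unitags

for name, (tags, unitags) in LIBRARIES.items():
    print(f"{name}: {tags_per_unitag(tags, unitags):.2f}")
```

The total tag count is the sum of the two libraries, whereas the total unitag count (38,212) is smaller than 24,064 + 21,308, presumably because many unitags occur in both libraries.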
Unitag mapping to reference datasets
| Reference dataset | Matches per unitag | Unitags | Virtual tags in dataset | Tags per unitag |
|---|---|---|---|---|
| Bursal cDNA | Any | 3,030 | 26,044 | 6.89 |
| | 1 | 2,997 | | 6.90 |
| | > 1 | 33 | | 5.15 |
| Ensembl transcript build | Any | 2,934 | 208,048 | 9.93 |
| | 1 | 2,275 | | 10.60 |
| | > 1 | 659 | | 7.63 |
| Genome | Any | 14,505 | 9,091,924 | 2.84 |
| | 1 | 13,427 | | 2.83 |
| | > 1 | 1,078 | | 2.91 |
| Total matching | | 20,469 | | 4.45 |
| Non-matching | | 17,743 | | 2.17 |
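The mapping table is internally consistent: the matching categories sum to 20,469 unitags, matching plus non-matching unitags give all 38,212, and the unitag-weighted average of the per-dataset means lands close to the "Total matching" value of 4.45. A short Python sketch of that check; `pooled_mean` is a hypothetical helper, and because the per-dataset means are themselves rounded, the recomputed values agree only approximately.

```python
# (unitags, mean tags per unitag) per reference dataset, from the mapping table.
MATCHED = {
    "Bursal cDNA": (3_030, 6.89),
    "Ensembl transcript build": (2_934, 9.93),
    "Genome": (14_505, 2.84),
}
NON_MATCHING = (17_743, 2.17)

def pooled_mean(groups):
    """Return (total unitags, unitag-weighted mean of the group means)."""
    groups = list(groups)
    total_unitags = sum(n for n, _ in groups)
    total_tags = sum(n * mean for n, mean in groups)
    return total_unitags, total_tags / total_unitags

n_match, mean_match = pooled_mean(MATCHED.values())
n_all, mean_all = pooled_mean(list(MATCHED.values()) + [NON_MATCHING])
print(n_match, round(mean_match, 3))  # matching unitags and their pooled mean
print(n_all, round(mean_all, 3))      # all unitags; compare with 3.39 tags per unitag overall
```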
Locations of unitags having a single match in genome but no transcript match
| Location | Unitags | Within distance of Ensembl transcript (bp) | Unitags within distance | | | |
|---|---|---|---|---|---|---|
| Total | 13,427 | | | | | |
| Within Ensembl transcript boundaries | 1,637 | | | | | |
| Outside Ensembl transcript boundaries | 11,177 | 100 | 409 | 362 | 46 | 1 |
| | | 200 | 732 | 668 | 64 | 2 |
| | | 500 | 1,651 | 1,496 | 143 | 12 |
| | | 1,000 | 2,896 | 2,553 | 262 | 81 |
| | | 5,000 | 7,169 | 5,439 | 669 | 1,061 |
| | | 10,000 | 8,639 | 5,627 | 911 | 2,101 |
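The distance counts (409, 732, 1,651, ...) grow with each threshold, so they read as cumulative "within d bp" totals. Assuming that interpretation, differencing them gives the number of unitags per distance interval and a per-bp density that drops with distance from the transcript. A minimal Python sketch; `CUMULATIVE` and `per_interval` are illustrative names.

```python
# Cumulative unitag counts for the "Outside Ensembl transcript boundaries" rows,
# assuming each count means "within d bp of the nearest Ensembl transcript".
CUMULATIVE = [(100, 409), (200, 732), (500, 1_651), (1_000, 2_896),
              (5_000, 7_169), (10_000, 8_639)]

def per_interval(cumulative):
    """Convert cumulative (distance, count) pairs into (start, end, count) intervals."""
    intervals = []
    prev_d, prev_n = 0, 0
    for d, n in cumulative:
        intervals.append((prev_d, d, n - prev_n))
        prev_d, prev_n = d, n
    return intervals

for start, end, n in per_interval(CUMULATIVE):
    # Density (unitags per bp) makes intervals of different width comparable.
    print(f"{start:>6}-{end:<6} bp: {n:>5} unitags ({n / (end - start):.2f} per bp)")
```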
Analysis of unitags mapping 5' of or within Ensembl transcript boundaries. #
| Unitag | Ensembl transcript | BLAST hit ## | EST | Location ### |
|---|---|---|---|---|
| CATGCTGCTCGCACGAGCCCT | ENSGALT00000002525.1 | Q9W7P7 | riken1_17l12r1 | Upstream 5' exon |
| CATGGCGGGGTTCCCGGGGCA | ENSGALT00000005092.1 | PEF protein with a long N-terminal hydrophobic domain | riken1_18i20r1 | Upstream 5' exon (EST supports two additional 5' exons) |
| CATGCTCCTGCTGCTGGCTGG | ENSGALT00000009521.1 | LAC_CHICK | dkfz426_24a5r1 | Upstream 5' exon |
| CATGAGGCACCTCCTGTTGGC | ENSGALT00000001476.1 | GR78_CHICK | riken1_25c14r1 | 5' upstream/Exon1 (EST supports one additional 5' exon) |
| CATGGCCGCCCAAGGAGAGCC | ENSGALT00000004055.1 | RAN_CHICK | riken1_25b20r1 | 5' upstream/Exon1 (EST supports one additional 5' exon) |
| CATGTACTGGTTGTCTGTTTT | ENSGALT00000025884 | HG14_CHICK | dkfz426_13h16r1 | Intron 4–5 |
| CATGCATAGAGGCTTTATTGC | ENSGALT00000021336 | Aldo-keto reductase family 1 member | dkfz426_3h12r1 | Intron 8–9 |
| CATGTTGGGACTCACCACTCT | ENSGALT00000000504 | No description | dkfz426_13d22r1 | Intron 5–6/Exon6 |
| CATGGTCACCCTAGTAAATAG | ENSGALT00000009677 | Protein kinase C, beta type | dkfz426_38f16r1 | Intron 14–15 |
| CATGTAAAGTGTTAGCTGTAC | ENSGALT00000006857 | ITF2_CHICK | dkfz426_14i24r1 | Intron 8–9 |
| CATGTTACCTGCAACCTGCTG | ENSGALT00000021577 | Centromeric protein E | dkfz426_17a21r1 | Intron 28–29 |
| CATGGGATATACTGAAAATCT | ENSGALT00000009956 | T-cell activation leucine repeat-rich protein | dkfz426_41d20r1 | Intron 1–2 |
| CATGGGCTGGTTGGTTTTTGT | ENSGALT00000028428 | No description | dkfz426_43g3r1 | Intron 2–3 |
| CATGGTCAAGTACAACTCTTA | ENSGALT00000022583 | Bcl-2-associated transcription factor | dkfz426_12n7r1 | Intron 8–9 |
# Only a few representative examples are shown
## BLAST results are abbreviated
### Unitag aligns within an intron or exon or lies across an intron/exon or upstream sequence/exon boundary
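Matching unitags against cDNA, transcript or genome sequence requires extracting "virtual" tags from those references. The unitags shown here are 21 nt long and begin with CATG, the NlaIII site. The sketch below assumes the common SAGE convention of taking the 3'-most CATG site on the sense strand plus the following 17 nt; the source does not spell out the exact extraction rule, and `virtual_tag` is a hypothetical helper.

```python
from typing import Optional

ANCHOR = "CATG"  # NlaIII recognition site; every unitag listed starts with it
TAG_LEN = 21     # length of the unitags shown: CATG + 17 nt

def virtual_tag(transcript: str, tag_len: int = TAG_LEN) -> Optional[str]:
    """Return the tag at the 3'-most CATG site of a sense-strand sequence.

    Returns None if there is no CATG site or if fewer than `tag_len` bases
    remain from that site to the end of the sequence (assumed policy).
    """
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1 or pos + tag_len > len(seq):
        return None
    return seq[pos:pos + tag_len]
```

Usage: for a toy sequence `"GG" + "CATG" + "A" * 17 + "CATG" + "C" * 17 + "TT"`, `virtual_tag` returns `"CATG"` followed by 17 C, because the second CATG is the 3'-most site.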
Figure 2. Mappings of SAGE unitags downstream of Ensembl transcripts compared with simulated genomic tags. The y-axis shows the number of tags falling within each 10-bp window; the x-axis shows the distance from the 3' end of the nearest predicted Ensembl transcript. SAGE unitag coordinates are indicated by crosses and randomly selected tag coordinates by diamonds.
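Figure 2 counts tags in 10-bp windows by distance from the nearest transcript 3' end. A Python sketch of that binning, assuming distances have already been computed as non-negative bp downstream of the 3' end. `window_counts` and `random_distances` are hypothetical helpers, and the uniform random model is a simplification, not necessarily how the randomly selected tags in the figure were generated.

```python
import random
from collections import Counter

WINDOW = 10  # bp; window width used on the Figure 2 y-axis

def window_counts(distances, window=WINDOW):
    """Count tags per window, keyed by the window's start distance (bp).

    Negative distances (tags upstream of the 3' end) are skipped.
    """
    return Counter(d // window * window for d in distances if d >= 0)

def random_distances(n, max_distance, seed=0):
    """Hypothetical null model: n tags placed uniformly in [0, max_distance)."""
    rng = random.Random(seed)
    return [rng.randrange(max_distance) for _ in range(n)]

observed = window_counts([3, 7, 12, 15, 18, 250])
simulated = window_counts(random_distances(1_000, 2_000))
```

A clustering of observed counts in the first windows, relative to the flatter simulated profile, is the kind of contrast the figure draws.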
Unitag mapping to transcripts
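| Category | Unitags | Transcript match | Single genome match within Ensembl transcript boundaries | Single genome match within 5,000 bp of an Ensembl transcript | Other single genome match |
|---|---|---|---|---|---|
| Total | 38,212 | | | | |
| Without match | 17,743 | | | | |
| With only multiple genome matches | 1,078 | | | | |
| With match to annotated transcripts or single genome match | 19,391 | 5,964 | 1,637 | 7,169 | 4,621 |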
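The breakdown of the 19,391 mapped unitags lines up with the earlier tables: 5,964 equals the bursal cDNA plus Ensembl transcript matches (3,030 + 2,934), and 1,637 + 7,169 + 4,621 equals the 13,427 single genome matches. Reading 7,169 as the "within 5,000 bp" count from the location table is an inference from matching numbers. A Python sketch of the partition check; the category labels and `is_partition` are illustrative.

```python
# Category counts from the "Unitag mapping to transcripts" table.
TOTAL = 38_212
WITHOUT_MATCH = 17_743
MULTIPLE_GENOME_ONLY = 1_078
MAPPED = {
    "transcript match (bursal cDNA or Ensembl)": 5_964,
    "single genome match, inside Ensembl transcript": 1_637,
    "single genome match, within 5,000 bp (inferred)": 7_169,
    "single genome match, other": 4_621,
}

def is_partition(total, parts):
    """True if the category counts add up exactly to the stated total."""
    return sum(parts) == total

print(is_partition(19_391, MAPPED.values()))
print(is_partition(TOTAL, [sum(MAPPED.values()), MULTIPLE_GENOME_ONLY, WITHOUT_MATCH]))
```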
List of genes differentially expressed in bursal cells and DT40
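| Unitag | busage tags | dt40sage tags | P value | Bursal cDNA # | BLAST hit ## |
|---|---|---|---|---|---|
| CATGGCAGGGGGCGGAAACCT | 4 | 45 | 2.83E-10 | riken1_2o24 | (AAH61765) Hypothetical protein |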
| CATGGTGAGCCAAGGTGTTGT | 24 | 82 | 2.06E-9 | riken1_4m1 | (AAH69219) Cold inducible RNA-binding protein |
| CATGCAGAAATAAGCTTCTCC | 45 | 109 | 4.09E-8 | riken1_7b15 | (Q7ZUR6) Similar to muscle-specific beta 1 integrin binding protein |
| CATGAGCGGGGGCAGCACTTG | 118 | 203 | 5.75E-7 | riken1_25p23 | (Q90YW7) Ribosomal protein L4 |
| CATGCTGGAAGAAAGAATAAC | 46 | 114 | 1.92E-8 | riken1_32c11 | (Q9YGQ1) Peptide elongation factor 1-beta |
| CATGCGCTCTCCTTTTAAAAG | 9 | 41 | 2.67E-6 | riken1_15l3 | (CAA31409) Chinese hamster asparagine synthetase |
| CATGGATGGCCAGCAAGTGTT | 29 | 4 | 1.17E-5 | riken1_4k19 | (P13796) L-plastin (Lymphocyte cytosolic protein 1) |
| CATGTCCGTGGCATCCTTTGA | 0 | 16 | 1.18E-5 | riken1_24e23 | (Q8BGQ8) Heterogeneous nuclear ribonucleoprotein K |
| CATGGCTTTGGAATATTTGAC | 25 | 3 | 2.90E-5 | riken1_2f9 | (AAH46152) Selenoprotein P precursor |
| CATGGAGTCCATAACACGGCG | 21 | 2 | 6.88E-5 | riken1_34m12 | (Q96CJ1) Testosterone regulated apoptosis inducer and tumor suppressor |
| CATGCAAAGTGCCCTTGGCTT | 17 | 1 | 1.46E-4 | riken1_10g19 | (P30281) G1/S-specific cyclin D3 |
| CATGTAAGCCAATTCTGAACC | 19 | 1 | 4.09E-5 | riken1_33a18 | (Q8JHJ4) TNF family B cell activation factor |
| CATGTTGTACACACGGGCACT | 11 | 0 | 5.79E-4 | riken1_5g12 | (Q90YB0) FEN-1 nuclease |
| CATGTGCCCGTGACCCCCATC | 2 | 16 | 6.12E-4 | riken1_4n15 | (Q13200) 26S proteasome non-ATPase regulatory subunit 2 |
| CATGTCGTGCTCTGTGCCTCC | 5 | 26 | 9.28E-5 | riken1_2i9 | (Q90W60) XNop56 protein |
| CATGCTTTCTGCTTTGACTTT | 21 | 4 | 9.42E-4 | riken1_12p16 | (P22794) Ecotropic viral integration site 2A protein |
| CATGTTTGTGCATAGCTGTCC | 5 | 28 | 1.17E-5 | riken1_30e3 | (Q91XC8) Similar to death-associated protein |
| CATGGCCGGGCGCCCCACCAG | 0 | 15 | 2.41E-5 | riken1_15i13 | (Q99P44) Leucine aminopeptidase |
| CATGGGACCAACAAATAAAGC | 19 | 4 | 0.0027 | riken1_4o10 | (P97440) Histone RNA hairpin-binding protein |
| CATGAAAATGTACTGTGCTAA | 2 | 13 | 0.0036 | riken1_20p3 | (P34022) Ran-specific GTPase-activating protein |
| CATGTATACAGAACTGCTGGA | 8 | 0 | 0.0044 | riken1_2i24 | (Q9UMR2) ATP-dependent RNA helicase DDX19 |
| CATGGCCAAATTAGAGGAGTG | 1 | 10 | 0.0051 | riken1_32c11 | (Q9YGQ1) Peptide elongation factor 1-beta |
| CATGCTACGCTGTGTCTGCCA | 11 | 1 | 0.0062 | riken1_2m14 | (AAQ20009) Heterogeneous nuclear ribonucleoprotein H1-like protein |
| CATGCTCTCCGGTGGTACAAT | 0 | 7 | 0.0070 | riken1_32c11 | (Q9YGQ1) Peptide elongation factor 1-beta |
| CATGTTGATTCCTATGCTAAA | 7 | 0 | 0.0087 | riken1_3a6 | (Q9H165) B-cell lymphoma/leukemia 11A |
# Only unitags matching bursal cDNAs are listed
## BLAST results are abbreviated
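Comparing one tag between two libraries of different size (65,798 and 63,770 tags) means comparing proportions, not raw counts. This excerpt does not say which test produced the P values above, so the sketch below swaps in a plain pooled two-proportion z-test (normal approximation) for illustration; it will not reproduce the listed values and is unreliable for very small counts. `two_proportion_p` and `per_100k` are hypothetical helpers.

```python
import math

BUSAGE_TAGS = 65_798    # library sizes from the "SAGE and unitag collections" table
DT40SAGE_TAGS = 63_770

def two_proportion_p(count1, count2, n1=BUSAGE_TAGS, n2=DT40SAGE_TAGS):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    p1, p2 = count1 / n1, count2 / n2
    pooled = (count1 + count2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # tag absent from both libraries: no evidence of a difference
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def per_100k(count, library_size):
    """Tag count normalized to 100,000 tags so libraries of different size compare."""
    return count * 100_000 / library_size

# First row of the table: 4 busage tags vs 45 dt40sage tags.
print(two_proportion_p(4, 45), per_100k(4, BUSAGE_TAGS), per_100k(45, DT40SAGE_TAGS))
```

Tests designed for count data, such as Fisher's exact test, are usually preferred when counts are this small.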
Figure 3. Confirmation of differential gene expression by semi-quantitative PCR. Primers derived from the reference genes of SAGE tags were used to amplify cDNA from bursal cells and DT40, using the cycle numbers indicated above the lanes. Based on SAGE tag counts, the reference genes were classified as likely to be equally expressed (left), more highly expressed in bursal cells (middle) or more highly expressed in DT40 (right). The size of the expected PCR product is indicated by a bar next to the gel image. The tag counts for the busage and dt40sage libraries and the calculated significance of differential expression are given in brackets under the gene names.