Bianca Habermann, Anne-Gaelle Bebin, Stephan Herklotz, Michael Volkmer, Kay Eckelt, Kerstin Pehlke, Hans Henning Epperlein, Hans Konrad Schackert, Glenis Wiebe, Elly M Tanaka.
Abstract
BACKGROUND: The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research, but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum.
Year: 2004 PMID: 15345051 PMCID: PMC522874 DOI: 10.1186/gb-2004-5-9-r67
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Some characteristics of the A. mexicanum EST contigs
| Library | Number of sequences | Number of contigs (+ singlets) | Number of clones in contigs | Number of clones in singlets |
| St18-22 neural tube | 7,469 | | | |
| 6D tail blastema | 9,883 | | | |
| Combined total | 17,352 | 6,377 | 12,791 | 4,561 |
The table shows the number of expressed sequence tags (ESTs) sequenced from each of the two libraries (blastema and neural tube), together with the number of contigs, the number of clones in contigs and the number of clones in singlets.
Figure 1. Distribution of sequence length. (a) Distribution of read lengths of the sequenced ESTs after quality control. The average read length was 569 bp, corresponding to a peak between 500 and 600 bp. (b) Distribution of sequence length of assembled contigs. The average contig length was 597 bp. (c) Distribution of the number of ESTs per assembled contig. Most contigs contained a single EST. The two largest contigs contained over 400 ESTs each (cytochrome c oxidase subunit I and 12S rRNA, respectively).
Gene definition of the most abundant contigs in the A. mexicanum EST libraries
| Gene definition | Number of clones in contig |
| Cytochrome c oxidase subunit I | 469 |
| 12S rRNA | 445 |
| Nuclear factor 7 | 332 |
| Keratin type II | 274 |
| Keratin | 211 |
| Cytoplasmic β-actin | 206 |
The gene with the highest number of clones identified was cytochrome c oxidase subunit I (469 clones in contig), followed by 12S rRNA (445) and nuclear factor 7 (332 clones in contig).
Figure 2. Homology of A. mexicanum contigs to protein and nucleotide sequences from other species. (a) Distribution of E-values of the first hit in the non-redundant protein database, which was used to assign a putative identity to each contig. For the majority of contigs, the first hit had an E-value between 1e-20 and 1e-99. In 11% of cases the E-value of the first hit was below 1e-100, and the hit can therefore be considered a true ortholog. (b) Distribution of hits among the different sequence databases, which were searched sequentially.
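The E-value ranges described for panel (a) amount to binning the first hit of each contig. The following is a minimal Python sketch of such a binning, not the authors' pipeline: the function name `evalue_bin`, the exact bin boundaries and the example E-values are assumptions made for illustration only.

```python
from collections import Counter


def evalue_bin(evalue: float) -> str:
    """Place a BLAST E-value into one of three illustrative bins.

    The boundaries follow the ranges named in the caption; how values
    that fall exactly on a boundary are handled is an assumption.
    """
    if evalue < 1e-100:
        return "below 1e-100"
    if evalue <= 1e-20:
        return "1e-99 to 1e-20"
    return "above 1e-20"


def bin_fractions(evalues):
    """Return the fraction of first hits that falls into each bin."""
    counts = Counter(evalue_bin(e) for e in evalues)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical first-hit E-values, not taken from the paper.
example_hits = [0.0, 3e-120, 2e-57, 8e-33, 1e-21, 4e-12, 0.003]
fractions = bin_fractions(example_hits)
```

Note that BLAST reports very strong hits with an E-value of 0.0, which this sketch places in the lowest bin.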
Contig identities and GenBank identifiers of ESTs unique to A. mexicanum
| Contig ID | GenBank identifier | UTR |
| Am_1065 | BI817418.1 | |
| Am_13 | BI817561.1 | |
| Am_1868 | BI817299.1 | |
| Am_1879 | BI817273.1 | UTR |
| Am_1986 | BI817397.1 | |
| Am_2156 | BI817699.1 | UTR |
| Am_2280 | BI817354.1 | |
| Am_242 | BI817917.1 | |
| Am_2631 | BI817344.1 | |
| | BI818040.1 | |
| | BI817371.1 | |
| Am_2695 | BI818066.1 | UTR |
| Am_2767 | BI817941.1 | UTR |
| Am_2952 | BI817736.1 | |
| Am_3070 | BI817303.1 | |
| Am_3486 | BI817478.1 | |
| Am_3807 | BI817992.1 | UTR |
| Am_3828 | BI817981.1 | |
| | BI817250.1 | |
| Am_4598 | BI817704.1 | |
| Am_4661 | BI817548.1 | UTR |
| Am_4720 | BI817653.1 | UTR |
| Am_5031 | BI817804.1 | UTR |
| Am_5579 | BI818004.1 | |
| Am_5650 | BI817315.1 | |
| Am_5742 | BI817525.1 | UTR |
| Am_5881 | BI818060.1 | |
| Am_6107 | BI817553.1 | UTR |
| Am_6128 | BI817667.1 | UTR |
| Am_6198 | BI817866.1 | |
| Am_646 | BI817520.1 | |
| | BI817607.1 | |
| | BI817743.1 | |
| Am_6565 | BI817313.1 | UTR |
| Am_901 | BI817984.1 | |
The table shows contig identities and GenBank identifiers of existing A. mexicanum ESTs that do not share any homology to a known protein or nucleotide sequence and can therefore be considered unique.
Figure 3. Annotated GO terms and protein domains in the A. mexicanum EST libraries. (a) Gene Ontology electronic annotation in the category 'biological process' of contigs from A. mexicanum. The largest proportion of annotated contigs was assigned a 'cellular process' (87%). Of those, five large groups of cellular processes emerged, with 'cell cycle/proliferation' (13%), 'intracellular signaling' and 'intracellular transport' (8% and 15%), 'metabolism' (17%), 'protein metabolism/modification' (18%) and 'RNA metabolism' (13%). (b) Domains associated with cellular processes identified in the A. mexicanum contig sequence dataset. The largest fraction of contigs was associated with a domain function in 'intracellular transport', followed by 'RNA-binding and metabolism' and 'DNA-binding and transcriptional control'.
The most abundant biological processes assigned to the A. mexicanum contigs
| Biological process | Total number of contigs | % contigs | BL/NT | Fisher's exact (BL/NT) |
| Protein metabolism | 324 | 15 | 116/132 | 3/1 |
| Metabolism | 296 | 13.7 | 78/170 | 0/3 |
| Intracellular transport | 268 | 12.4 | 59/53 | 4/5 |
| RNA metabolism | 227 | 10.5 | 127/45 | 22/2 |
| Cell cycle | 194 | 9 | 95/52 | 5/2 |
| Intracellular signaling | 148 | 6.8 | 95/65 | 1/6 |
| DNA metabolism/repair | 90 | 4.1 | 50/12 | 3/0 |
| Development | 69 | 3.2 | 32/27 | 0/2 |
| Cell-cell communication | 81 | 3.7 | 24/42 | 0/6 |
| Differentiation | 27 | 1.5 | 13/7 | 2/3 |
The highest-ranking biological process is 'protein metabolism/modification', with 15% of contigs assigned. 'Cellular metabolism', 'intracellular transport' and 'RNA metabolism' each have more than 10% of contigs assigned and represent the most abundant gene families in the two libraries. The percentage of contigs refers to the number of contigs assigned a biological process. BL: Blastema; NT: Neural tube.
Common protein domains identified in the A. mexicanum contigs and comparison to domain occurrences in other vertebrate species
| Domain | ||||||
| EF-hand | 10 | 319 | 308 | 36 | 48 | 38 |
| Cyclin | 12 | 60 | 58 | 20 | 9 | 15 |
| Chromo | 5 | 26 | 26 | 8 | 5 | 5 |
| Prox1 | 5 | 4 | 2 | 2 | 1 | 2 |
| HLH | 8 (1) | 167 | 179 | 83 | 70 | 75 |
| HOX | 13 (19) | 280 | 352 | 196 | 142 | 250 |
| PAX | 1 (4) | 12 | 31 | 25 | 9 | 13 |
| EGF | 10 | 310 | 281 | 26 | 50 | 32 |
| SET | 2 | 82 | 64 | 4 | 3 | 1 |
| RAS | 37 | 220 | 194 | 34 | 11 | 27 |
| RhoGEF | 4 | 124 | 98 | 4 | 2 | 3 |
| PH | 2 | 453 | 374 | 14 | 10 | 18 |
| PX | 4 | 70 | 74 | 2 | 0 | 3 |
| WD40 | 39 | 547 | 490 | 63 | 12 | 50 |
| Cullin | 3 | 8 | 20 | 0 | 0 | 2 |
| F-box | 2 | 119 | 130 | 11 | 0 | 8 |
| HectC | 3 | 64 | 66 | 4 | 2 | 1 |
| RING | 17 | 374 | 325 | 18 | 16 | 29 |
| KH | 23 (1) | 71 | 52 | 20 | 7 | 10 |
| RRM | 101 (2) | 443 | 438 | 94 | 23 | 69 |
| PDZ | 8 | 260 | 252 | 17 | 11 | 23 |
| Kinase | 69 (2) | 949 | 954 | 210 | 122 | 156 |
| LIM | 5 | 128 | 125 | 22 | 19 | 22 |
| PHD | 4 | 164 | 122 | 13 | 4 | 3 |
Numbers in parentheses indicate the number of domains that had been annotated to a protein sequence from A. mexicanum prior to this project.
Gene families identified that are either involved in cell-cycle control or developmental processes
| Cellular process | Putative ID of contig | Contig | Expression |
| Cell cycle | Cyclin A2 | Am_20 | BL unique |
| | Cyclin B1 | Am_1031 | BL 3x |
| | Cyclin B2 | Am_4185 | NT unique |
| | Cyclin B3 | Am_3173 | BL unique |
| | Cyclin E1 | Am_38 | BL unique |
| | Cyclin E2 | Am_91 | BL unique |
| | Cdk4 | Am_3891 | BL unique |
| | Polo kinase | Am_1717 | BL unique |
| | Cdc25A | Am_3678 | BL unique |
| | p27/Kip1 | Am_4671 | NT unique |
| | Cdc20 | Am_2213 | BL unique |
| | Cdh1 | Am_1148 | BL unique |
| Development | Wnt8 | Am_384 | BL unique |
| | Wnt5 | Am_642 | BL unique |
| | FGFR4a | Am_1393 | BL unique |
| | Sonic hedgehog | Am_3741 | BL unique |
| | Activin receptor type II | Am_3590 | BL unique |
| | TGF-β | Am_4990 | NT unique |
| | BMP-1 | Am_4639 | NT unique |
| | Cdx1 | Am_875 | BL unique |
| | Cdx2 | Am_387 | BL unique |
| | HoxA2 | Am_2387 | BL unique |
| | HoxB13 | Am_4865 | NT unique |
| | HoxC4 | Am_3998 | BL unique |
| | HoxC8 | Am_2910 | BL unique |
| | Pax6 | Am_2945 | BL unique |
| | Smad5 | Am_1420 | BL unique |
| | Smad8 | Am_4665 | NT unique |
| | Retinoblastoma binding protein 2 | Am_2723 | BL unique |
| | Beta-catenin | Am_699 | BL 3x |
| | Zic5 | Am_2068 | BL unique |
| | Frizzled 2 | Am_3243 | BL unique |
| | Frizzled 5 | Am_3451 | BL unique |
| | Frizzled 7 | Am_2334 | BL unique |
| | TRIP12 | Am_6416 | NT unique |
The identifier of the A. mexicanum contig is given in the third column. The expression pattern as determined by in silico differential display is shown in column 4.
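An in silico differential display compares how many ESTs of a contig come from each library, and the 'Fisher's exact (BL/NT)' column in the biological-process table suggests that Fisher's exact test was used to judge such differences. The sketch below is one way this comparison could be computed in Python; it is not the authors' implementation. The library sizes are taken from the library table above, whereas the function names, the two-sided formulation and the example EST counts are assumptions.

```python
from math import comb

BL_SIZE = 9883  # ESTs from the 6D tail blastema library (library table)
NT_SIZE = 7469  # ESTs from the St18-22 neural tube library (library table)


def hypergeom_pmf(bl_count, contig_size, bl=BL_SIZE, nt=NT_SIZE):
    """Probability that bl_count of contig_size ESTs come from the
    blastema library if ESTs were drawn at random from both libraries."""
    return (comb(bl, bl_count) * comb(nt, contig_size - bl_count)
            / comb(bl + nt, contig_size))


def fisher_two_sided(bl_count, nt_count, bl=BL_SIZE, nt=NT_SIZE):
    """Two-sided Fisher's exact p-value for one contig.

    Sums the probabilities of all splits of the contig's ESTs that are
    no more likely than the observed split.
    """
    n = bl_count + nt_count
    observed = hypergeom_pmf(bl_count, n, bl, nt)
    total = 0.0
    for x in range(n + 1):
        p = hypergeom_pmf(x, n, bl, nt)
        # Small relative tolerance guards against floating-point ties.
        if p <= observed * (1 + 1e-9):
            total += p
    return min(1.0, total)


# Hypothetical contigs (EST counts are illustrative, not from the paper).
p_skewed = fisher_two_sided(20, 0)    # all ESTs from the blastema library
p_balanced = fisher_two_sided(4, 3)   # split close to the library ratio
```

A contig whose ESTs all come from one library only reaches a small p-value if it contains enough ESTs; a contig with a single EST cannot be distinguished from chance under this test.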
Figure 4. Phylogenetic analysis of the vertebrate cyclin-dependent kinase (CDK) inhibitors (CKIs) p21(Cip1), p27(Kip1) and p57(Kip2). (a) Reference phylogenetic tree of mitochondrial 12S rRNA. The Caudata and Salientia both branch out to build the amphibian group. (b) Unrooted phylogenetic tree of the cyclin B1 gene family. The amphibian cyclin B1 family members form a distinct group. (c) Unrooted phylogenetic tree of the amino-terminal CDK-inhibitory domain of vertebrate p21, p27, p28 and p57, which is conserved between the protein families. p27 of A. mexicanum clearly groups with the p27 proteins from other vertebrates. The amphibian-specific p28 family does not group with any single family. Note, however, that unlike in the 12S rRNA tree, the A. mexicanum and A. t. tigrinum p27 branch out with that of D. rerio. (d) Unrooted phylogenetic tree of the full-length kinase inhibitor sequences. Using the full-length protein sequences from the CKI families, the p28 family branches off between the p21 and p27 families. (e) Multiple sequence alignment of the amino-terminal, CDK-inhibitory region of the CKI families. A. mexicanum p27 is clearly an ortholog of the p27 family, yet displays higher than expected divergence at the protein level. The same divergence is observed for the ambystomatid p57 proteins. The p28 family has extremely high sequence divergence compared to any other CDKN1 family member. Residues conserved between the three CDKN1 families are highlighted in green and those of the p28 family in light blue. Residues that differ between ambystomatid sequences and the other vertebrate species are highlighted in red in the ambystomatid sequences. Accession numbers are: NM_131513 (D. rerio ccnb1), NM_031966 (H. sapiens ccnb1), BC041302 (X. laevis ccnb1), NM_172301 (M. musculus ccnb1), NM_171991 (R. norvegicus ccnb1), P13351 (X. laevis ccnb2), XP_343420 (R. norvegicus ccnb2), P29332 (G. gallus ccnb2), NP_004692 (H. sapiens ccnb2), NP_031656 (M. musculus ccnb2), CAC24491 (X. laevis ccnb3), P39963 (G. gallus ccnb3), CAC94915 (H. sapiens ccnb3), NP_898836 (M. musculus ccnb3), AAH56746.1 (D. rerio p27A, Drp27A); AAK84219.1 (D. rerio p27, Drp27); CN056871.1 (A. t. tigrinum p27, Attp27); AAM22491.1 (G. gallus p27, Ggp27); NP_004055.1 (H. sapiens p27, Hsp27); P46414 (M. musculus p27, Mmp27); NP_113950.1 (R. norvegicus p27, Rnp27); NP_000067.1 (H. sapiens p57, Hsp57); P49919 (M. musculus p57, Mmp57); XP_341967.1 (R. norvegicus p57, Rnp57); CN039016.1 (A. mexicanum p57, Amp57); BM489375.1 (G. gallus p57, Ggp57); CK697132.1 (D. rerio p57, Drp57); AAH01935.1 (H. sapiens p21, Hsp21); NP_031695.1 (M. musculus p21, Mmp21); NP_542960.1 (R. norvegicus p21, Rnp21); AL639561.2 (X. tropicalis p21, Xtp21); BJ065460.1 (X. laevis p21, Xlp21); AAN63876.1 (G. gallus p21, Ggp21); I51683 (X. laevis Xic1, XlXic1); BX712320.1 (X. tropicalis p28, Xtp28); TNeu143i03.p1cSP6 (X. tropicalis p28A, Xtp28A); CN033557.1 (A. mexicanum p28, Amp28); CN035131.1 (A. mexicanum p28A, Amp28A); CN033708.1 (A. mexicanum p28B, Amp28B). The scale bar indicates substitutions per site.
Occurrence of CKI-family members in different vertebrate species
| Human | Zebrafish | Fugu | |||
| CDKN1A (p21) | + | -* | + | + | -* |
| CDKN1B (p27Kip1) | + | + | + | -† | + |
| CDKN1C (p57) | + | + | + | -† | + |
| p28Kix1 | - | - | - | +‡ | +‡ |
* Genes most likely present, yet not identified due to limited sequence information; † genes not present in genomic sequence information; ‡ genes so far only present in amphibian species. Databases searched were the human, mouse, rat, fugu, zebrafish and X. tropicalis genome databases, and the EST databases for X. laevis, X. tropicalis, zebrafish, A. mexicanum and A. tigrinum.
Figure 5. The Ambystoma mexicanum EST database. A relational database was created as a sequence storage and annotation resource for the sequenced ESTs from A. mexicanum. (a) The main entry site of the EST resource is the contig page, where a subset of the information is available, including the identity of included ESTs, putative identity of the contig, GO annotation including cellular role, biochemical function and cellular component, a list of homologs from different model organisms, and identified conserved domains. Source data are available for all BLAST-based alignments, for external sequence or domain data, and for the complete contig sequence. (b,c) EST information and protein information pages, containing a more detailed description of storage information, library source and read length (b). A complete list of homologs and identified conserved domains can be accessed on the protein information page (c). For a more detailed description of the database, see text.
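The relationship between contigs, their ESTs and their annotation described for the contig page can be pictured with a small relational schema. The following Python sketch uses the standard-library `sqlite3` module and is purely illustrative: the table names, column names and the placeholder EST identifier and read length are assumptions and do not describe the actual database schema. Only the contig identifier Am_4671 and its putative identity p27/Kip1 come from the gene-family table above.

```python
import sqlite3

# Hypothetical schema: one contig has many ESTs and many GO annotations.
SCHEMA = """
CREATE TABLE contig (
    contig_id   TEXT PRIMARY KEY,
    putative_id TEXT
);
CREATE TABLE est (
    est_id      TEXT PRIMARY KEY,
    contig_id   TEXT REFERENCES contig(contig_id),
    library     TEXT,
    read_length INTEGER
);
CREATE TABLE go_annotation (
    contig_id TEXT REFERENCES contig(contig_id),
    category  TEXT,  -- e.g. cellular role, biochemical function
    term      TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding one example contig."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO contig VALUES (?, ?)", ("Am_4671", "p27/Kip1"))
    # The EST identifier and read length are placeholders, not real data.
    conn.execute(
        "INSERT INTO est VALUES (?, ?, ?, ?)",
        ("placeholder_est_1", "Am_4671", "neural tube", 569),
    )
    conn.commit()
    return conn


def contig_page(conn, contig_id):
    """Collect the fields a contig page might display."""
    row = conn.execute(
        "SELECT putative_id FROM contig WHERE contig_id = ?", (contig_id,)
    ).fetchone()
    ests = [
        r[0]
        for r in conn.execute(
            "SELECT est_id FROM est WHERE contig_id = ? ORDER BY est_id",
            (contig_id,),
        )
    ]
    return {
        "contig_id": contig_id,
        "putative_id": row[0] if row else None,
        "ests": ests,
    }


demo_conn = build_demo_db()
demo_page = contig_page(demo_conn, "Am_4671")
```

Keeping ESTs and annotations in separate tables keyed by the contig identifier means a contig page can be assembled with simple lookups, and further tables (for example for homologs or conserved domains) could be added in the same way.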