Roxane M Barthélémy, Anne Chenuil, Samuel Blanquart, Jean-Paul Casanova, Eric Faure.
Abstract
BACKGROUND: Chaetognaths, or arrow worms, are small marine, bilaterally symmetrical metazoans. The objective of this study was to analyse ribosomal protein (RP) coding sequences from a published collection of expressed sequence tags (ESTs) from a chaetognath (Spadella cephaloptera) and to use them in phylogenetic studies.
Year: 2007 PMID: 17725830 PMCID: PMC2020476 DOI: 10.1186/1471-2148-7-146
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Frequency of clones sequenced for each cDNA type. A: SSU RP cDNAs; B: LSU RP cDNAs. Isoforms (is.) are indicated where they have been found, as are cDNA types in which all the sequences contain frameshift(s) (FS). The characteristics of the 5'-end are indicated for each cDNA type: TTT potential binding site(s) in black, TAC potential binding site(s) in white, and partial sequences which do not contain the 5'-end in grey.
Structural characteristics of the complete ribosomal proteins from S. cephaloptera
| SSU RP name | N° of aa | Mr (Da) | pI | N° of EST | EMBL acc. n° | LSU RP name | N° of aa | Mr (Da) | pI | N° of EST | EMBL acc. n° |
| SA | # | | | 1 (FS) | | P1 | 117 | 11,809 | 4.24 | 4 | |
| S2 | 190* | | | 1 | | L2/L8 | 124# | | | 1 | |
| S3 | 250# | | | 2 | | L3 | 252# | | | 3 | |
| S3a | 210# | | | 1 | | L5 | 214# | | | 2 | |
| S4 | 260 | 29,417 | 10.16 | 5 | | L6 | 245 | 28,213 | 10.97 | 19 | |
| S6 | 242# | | | 3 | | L7 | | | | 35 | |
| S7 | | | | 31 | | | is.1: 245 | 28,485 | 10.73 | 13 | |
| | is.1: 194 | 22,046 | 10.16 | 30 | | | is.2: 245 | 28,322 | 10.65 | 22 | |
| | is.2: 194 | 22,049 | 10.22 | 1 | | L9 | 188 | 21,489 | 9.59 | 3 | |
| S8 | | | | 23 | | L10 | 217 | 25,215 | 10.32 | 31 | |
| | is.1: 208 | 23,735 | 10.65 | 10 | | L10e/P0 | 233# | | | 4 | |
| | is.2: 208 | 23,877 | 10.65 | 8 | | L10a | 216 | 24,268 | 10.04 | 2 | |
| | is.3: 208 | 23,801 | 10.65 | 2 | | L11 | 207 | 23,095 | 10.09 | 6 | |
| | | | | 3 (FS) | | L13 | 213 | 24,848 | 10.75 | 4 | |
| S9 | 189 | 22,157 | 10.62 | 45 | | L14 | 137 | 15,753 | 10.65 | 8 | |
| S11 | | | | 22 | | L15 | | | | 2 | |
| | is.1: 156 | 17,969 | 10.74 | 21 | | | is.1: 205 | 24,092 | 11.45 | 1 | |
| | is.2: 156 | 17,880 | 10.64 | 1 | | | is.2: 205 | 24,146 | 11.50 | 1 | |
| S12 | 143 | 15,587 | 6.34 | 4 | | L17 | 190 | 21,642 | 10.45 | 2 | |
| S13 | 151 | 17,160 | 10.75 | 14 | | L18 | 188 | 21,527 | 11.83 | 3 | |
| S14 | 151 | 16,369 | 10.67 | 7 | | L18a | 178 | 20,905 | 10.73 | 15 | |
| S15 | | | | 15 | | L21 | | | | 7 | |
| | is.1: 144 | 16,559 | 10.38 | 7 | | | is.1: 161 | 18,655 | 11.03 | 5 | |
| | is.2: 144 | 16,529 | 10.38 | 8 | | | is.2: 161 | 18,637 | 11.33 | 2 | |
| S15a | 130 | 14,780 | 10.12 | 21 | | L22 | | | | 11 | |
| S16 | | | | 25 | | | is.1: 131 | 15,025 | 9.39 | 10 | |
| | is.1: 145 | 16,345 | 10.38 | 14 | | | is.2: 128 | 14,713 | 9.59 | 1 | |
| | is.2: 146 | 16,357 | 10.38 | 9 | | L23 | 140 | 14,849 | 10.61 | 6 | |
| | | | | 2 (FS) | | L24 | 158 | 18,033 | 11.53 | 2 | |
| S17 | | | | 30 | | L26 | | | | 5 (FS) | |
| | is.1: 134 | 15,568 | 10.12 | 19 | | L27 | | | | 41 | |
| | is.2: 135 | 15,742 | 10.08 | 10 | | | is.1: 136 | 15,830 | 10.53 | 15 | |
| | | | | 1 (FS) | | | is.2: 136 | 15,723 | 10.42 | 26 | |
| S18 | 154 | 17,829 | 10.58 | 3 | | L27a | | | | 13 | |
| S19 | 139 | 15,514 | 10.49 | 29 | | | is.1: 145 | 16,073 | 10.82 | 3 | |
| S20 | 125 | 13,634 | 10.04 | 3 | | | is.2: 145 | 16,065 | 10.57 | 10 | |
| S21 | | | | 12 | | L28 | 132 | 14,440 | 11.85 | 8 | |
| | is.1: 81 | 8,773 | 7.58 | 2 | | L29 | 83 | 9,604 | 11.87 | 12 | |
| | is.2: 83 | 9,077 | 7.58 | 10 | | L30 | | | | 36 | |
| S23 | 143 | 15,779 | 10.76 | 24 | | | is.1: 114 | 12,381 | 9.74 | 30 | |
| S24 | 135 | 15,383 | 10.88 | 11 | | | is.2: 114 | 12,411 | 9.79 | 6 | |
| S25 | | | | 17 | | L31 | 123 | 14,044 | 10.87 | 11 | |
| | is.1: 115 | 12,719 | 10.12 | 14 | | L32 | 133 | 15,693 | 11.46 | 11 | |
| | is.2: 114 | 12,663 | 10.15 | 2 | | L34 | | | | 24 | |
| | is.3: 127 | 13,911 | 10.50 | 1 | | | is.1: 130 | 14,470 | 11.42 | 16 | |
| S26 | 106 | 12,011 | 10.73 | 2 | | | is.2: 131 | 14,547 | 11.42 | 8 | |
| S27 | 84 | 9,273 | 9.20 | 36 | | L35 | | | | 24 | |
| S28 | | | | 21 | | | is.1: 123 | 14,326 | 11.72 | 13 | |
| | is.1: 64 | 7,279 | 10.54 | 7 | | | is.2: 123 | 14,468 | 11.49 | 11 | |
| | is.2: 64 | 7,279 | 10.54 | 14 | | L35a | 135 | 15,397 | 10.96 | 2 | |
| S29 | 56 | 6,393 | 9.93 | 35 | | L36 | | | | 49 | |
| | | | | | | | is.1: 106 | 12,318 | 11.49 | 16 | |
| | | | | | | | is.2: 105 | 12,223 | 11.25 | 32 | |
| | | | | | | | | | | 1 (FS) | |
| | | | | | | L37 | | | | 10 | |
| | | | | | | | is.1: 100 | 11,621 | 11.77 | 9 | |
| | | | | | | | is.2: 100 | 11,561 | 11.77 | 1 | |
| | | | | | | L37a | 93 | 10,411 | 11.04 | 8 | |
| | | | | | | L38 | | | | 35 | |
| | | | | | | | is.1: 70 | 8,220 | 10.43 | 29 | |
| | | | | | | | is.2: 70 | 8,198 | 10.64 | 6 | |
| | | | | | | L39 | | | | 15 | |
| | | | | | | | is.1: 51 | 6,321 | 12.55 | 13 | |
| | | | | | | | is.2: 51 | 6,315 | 12.55 | 1 | |
| | | | | | | | is.3: 51 | 6,305 | 12.55 | 1 | |
| | | | | | | L40-ubiq | | | | 3 (FS) | |
| | | | | | | L44/L36a | 106 | 12,400 | 10.67 | 42 | |
Accession numbers correspond to protein sequences, except when fewer than three EST sequences have been found, in which case EST accession numbers are given. When two ESTs belonging to the same mRNA type have been found, only the accession number of the longest sequence is given. Abbreviations: Da, Daltons; FS, all the sequences from a cDNA type contain frameshift(s); L40-ubiq, L40-ubiquitin; Mr, molecular weight; pI, isoelectric point. # Incomplete COOH-end; * Incomplete NH2-end.
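The Mr values in the table above are the kind of figure that can be approximated directly from an amino acid sequence. The sketch below is illustrative only: it assumes standard average residue masses (approximate values), sums them and adds one water molecule, and it is not necessarily the tool or parameter set the authors used. The function names and the toy input are hypothetical, since the protein sequences themselves are not reproduced here.

```python
# Approximate average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(protein_seq):
    """Estimate the average molecular weight (Da) of an unmodified protein."""
    seq = protein_seq.upper()
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def mean_residue_mass(mr_da, n_residues):
    """Mean mass per residue, a quick plausibility check on a tabulated Mr."""
    return mr_da / n_residues


# Plausibility check on one table row: S4 has 260 aa and Mr 29,417 Da.
print(round(mean_residue_mass(29417, 260), 1))
```

A mean of roughly 110 to 115 Da per residue is typical for proteins, so a tabulated Mr far outside that range for its length would be worth rechecking.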
Figure 2. Alignment of the two consensus 5'-ends. The 28-nucleotide region named n°2 has been found in 249 ESTs putatively encoding 28 different SSU RPs and in 319 ESTs putatively encoding 37 different LSU RPs. The 28-nucleotide region of sequence n°1 has been found in 75 ESTs putatively encoding 25 different SSU RPs and in 96 ESTs putatively encoding 29 different LSU RPs. The stars (*) indicate nucleotides which are conserved between these two sequences. The nucleotide regions which differ between these two sequences have been underlined and have been named, respectively, the TAC consensus site and the TTT consensus site. The nucleotides which are conserved between these two consensus sites and the Tinman/Nkx2.5 binding consensus sites are indicated in bold letters; K represents T or G.
Frequency of the TTT and TAC regions in the chaetognath ribosomal protein cDNAs with complete 5'-ends
| Ribosomal subunit | TTT putative binding site (5'-TAATTGAGTAGTTT-3'): % of clones bearing this sequence | TTT putative binding site: % of clones bearing a highly homologous sequence | TAC putative binding site (5'-TATTAAGTACTAC-3'): % of clones bearing this sequence | TAC putative binding site: % of clones bearing a highly homologous sequence |
| SSU RP genes | 63.7 | 10.8 | 21.8 | 3.7 |
| LSU RP genes | 70.3 | 4.5 | 19.8 | 5.4 |
The percentage of identity between consensus sequences and highly homologous sequences is always ≥ 69%.
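One way to picture how a 5'-end could be assigned to the "exact" or "highly homologous" columns is an ungapped sliding-window comparison against the two site sequences given in the table header. The sketch below is an assumption-laden illustration, not the authors' procedure: the ungapped window, the function names and the use of 69% as a cutoff (the footnote reports it as an observed lower bound) are all choices made here for the example.

```python
TTT_SITE = "TAATTGAGTAGTTT"  # site sequence from the table header
TAC_SITE = "TATTAAGTACTAC"   # site sequence from the table header


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_window_identity(seq, site):
    """Best ungapped identity of `site` against any window of `seq`."""
    best = 0.0
    for i in range(len(seq) - len(site) + 1):
        best = max(best, percent_identity(seq[i:i + len(site)], site))
    return best


def classify_5prime_end(seq, threshold=69.0):
    """Label a 5'-end as TTT or TAC (exact or highly homologous), else unclassified."""
    seq = seq.upper()
    scores = {
        "TTT": best_window_identity(seq, TTT_SITE),
        "TAC": best_window_identity(seq, TAC_SITE),
    }
    name, score = max(scores.items(), key=lambda item: item[1])
    if score == 100.0:
        return name + " (exact)"
    if score >= threshold:
        return name + " (highly homologous)"
    return "unclassified"


# Hypothetical 5'-end: the TTT site with its last base changed (13/14 identical).
print(classify_5prime_end("TAATTGAGTAGTTA"))
```

A real analysis would probably allow gaps and anchor the search to the expected position near the 5'-end; the ungapped scan is kept here only for brevity.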
Search for differences in biologically significant sites between chaetognath ribosomal protein isoforms using Prosite
| Protein name | Isoform number | Aa numbers | EST numbers | % identity/overall similarity | Motifs which are different |
| S7 | 1 | 194 | 30 | 82.0/97.9 | cAMP:1 – PKC:3 |
| | 2 | 194 | 1 | | cAMP:2 – PKC:6 |
| S8 | 1 | 208 | 9 | is.1/is.2: 96.6/100 | amidation:2 – cAMP:2 – myristil:2 – nuclear:3 – PKC:6 |
| | 2 | 208 | 8 | is.2/is.3: 97.1/100 | amidation:1 – cAMP:4 – myristil:2 – nuclear:4 – PKC:8 |
| | 3 | 208 | 2 | is.1/is.3: 97.1/100 | amidation:2 – cAMP:4 – myristil:1 – nuclear:4 – PKC:7 |
| S11 | 1 | 156 | 24 | 98.1/99.4 | nuclear:1 |
| | 2 | 156 | 1 | | nuclear:0 |
| S15 | 1 | 144 | 7 | 99.3/100 | N.D. |
| | 2 | 144 | 8 | | N.D. |
| S16 | 1 | 146 | 9 | 93.8/96.6 | CK2:2 |
| | 2 | 145 | 13 | | CK2:3 |
| S17 | 1 | 134 | 19 | 87.4/96.3 | CK2:0 – myristil:0 – nuclear:1 – sulfation:1 |
| | 2 | 135 | 10 | | CK2:1 – myristil:1 – nuclear:0 – sulfation:0 |
| S21 | 1 | 81 | 2 | 83.1/92.8 | cAMP:1 – myristil:2 – PKC:3 – tyr:0 |
| | 2 | 83 | 10 | | cAMP:2 – myristil:1 – PKC:2 – tyr:1 |
| S25 | 1 | 115 | 14 | is.1/is.2: 82.8/93.9 | cAMP:1 – CK2:0 – myristil:2 – nuclear:1 |
| | 2 | 114 | 2 | is.2/is.3: 71.7/83.5 | cAMP:1 – CK2:1 – myristil:1 – nuclear:1 |
| | 3 | 127 | 1 | is.1/is.3: 73.2/83.5 | cAMP:0 – CK2:0 – myristil:2 – nuclear:0 |
| S28 | 1 | 64 | 8 | 98.4/100 | N.D. |
| | 2 | 64 | 14 | | N.D. |
| L7 | 1 | 245 | 13 | 85.7/95.5 | cAMP:1 – PKC:2 |
| | 2 | 243 | 23 | | cAMP:0 – PKC:3 |
| L15 | 1 | 206 | 1 | 98.1/99.0 | N.D. |
| | 2 | 207 | 1 | | N.D. |
| L21 | 1 | 161 | 5 | 97.5/88.1 | myristil:2 – PKC:4 |
| | 2 | 161 | 2 | | myristil:1 – PKC:5 |
| L22 | 1 | 121 | 10 | 81.2/92.1 | amidation:1 |
| | 2 | 128 | 1 | | amidation:0 |
| L27 | 1 | 136 | 15 | 88.2/99.2 | myristil:0 – PKC:2 |
| | 2 | 136 | 26 | | myristil:1 – PKC:3 |
| L27a | 1 | 145 | 3 | 89.6/97.2 | CK2:1 |
| | 2 | 145 | 10 | | CK2:0 |
| L30 | 1 | 114 | 20 | 96.5/98.2 | PKC:4 |
| | 2 | 114 | 6 | | PKC:5 |
| L34 | 1 | 130 | 16 | 95.4/97.7 | cAMP:3 – PKC:1 |
| | 2 | 131 | 8 | | cAMP:4 – PKC:2 |
| L35 | 1 | 123 | 13 | 89.4/98.4 | tyr:0 |
| | 2 | 123 | 11 | | tyr:1 |
| L36 | 1 | 105 | 16 | 90.6/98.1 | PKC:4 |
| | 2 | 106 | 32 | | PKC:2 |
| L37 | 1 | 100 | 9 | 99.0/99.0 | N.D. |
| | 2 | 100 | 8 | | N.D. |
| L38 | 1 | 70 | 29 | 92.9/100 | asn:1 – CK2:1 – PKC:3 |
| | 2 | 70 | 5 | | asn:0 – CK2:0 – PKC:2 |
| L39 | 1 | 51 | 13 | is.1/is.2: 96.1/98.0 | myristil:0 |
| | 2 | 51 | 1 | is.1/is.3: 98.0/98.0 | myristil:1 |
| | 3 | 51 | 1 | is.2/is.3: 94.1/96.1 | myristil:0 |
Within a protein family, each isoform is putatively encoded by a different type of cDNA, except L39 isoform-1 and isoform-3, which are encoded by cDNAs belonging to two different subtypes (TTT and TAC, respectively) of the same type. Abbreviations: amidation, amidation site; cAMP, cAMP- and cGMP-dependent protein kinase phosphorylation site; CK2, casein kinase II phosphorylation site; myristil, N-myristoylation site; N.D., no difference; nuclear, bipartite nuclear targeting sequence; PKC, protein kinase C phosphorylation site; tyr, tyrosine sulfation site.
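The motif counts compared in this table can be illustrated with a small pattern scan. The sketch below is a simplified stand-in for a Prosite search, not a reproduction of it: the four patterns are regular-expression renderings of the short Prosite consensus patterns for the PKC, CK2, cAMP/cGMP-dependent kinase and amidation sites, overlapping matches are counted, and the toy sequences and function names are hypothetical. Counts from the actual Prosite tooling may differ.

```python
import re

# Regular-expression renderings of four short Prosite consensus patterns.
PROSITE_LIKE_PATTERNS = {
    "PKC": r"[ST].[RK]",         # [ST]-x-[RK]
    "CK2": r"[ST]..[DE]",        # [ST]-x(2)-[DE]
    "cAMP": r"[RK]{2}.[ST]",     # [RK](2)-x-[ST]
    "amidation": r".G[RK][RK]",  # x-G-[RK]-[RK]
}


def count_motifs(protein_seq):
    """Count occurrences of each pattern, including overlapping matches."""
    seq = protein_seq.upper()
    counts = {}
    for name, pattern in PROSITE_LIKE_PATTERNS.items():
        # A lookahead lets matches that overlap each other all be counted.
        counts[name] = len(re.findall("(?=(" + pattern + "))", seq))
    return counts


def differing_motifs(isoform_1, isoform_2):
    """Return {motif: (count in isoform 1, count in isoform 2)} where counts differ."""
    c1, c2 = count_motifs(isoform_1), count_motifs(isoform_2)
    return {name: (c1[name], c2[name]) for name in c1 if c1[name] != c2[name]}


# Hypothetical isoforms differing by one residue (S -> A), which removes a PKC site.
print(differing_motifs("AASAKAA", "AAAAKAA"))
```

Because such short patterns occur frequently by chance, a difference in counts between two isoforms indicates a candidate site rather than a demonstrated functional difference.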
Presence of members of 4 ribosomal gene families in three S. cephaloptera individuals using PCR
| ESTs with TTT sites | ESTs with TAC sites | ||||||||||
| RP | Is. | Name of the reverse primers (5'-3') | EMBL acc. n° of ESTs which bear the primer sequences | EMBL acc. n° of ESTs with 1 or 2 internal mutations in the primer sequences | PCR results by individual | EMBL acc. n° of ESTs which bear the primer sequences | PCR results by individual | ||||
| 1 | 2 | 3 | 1 | 2 | 3 | ||||||
| S8 | 1 | S8-1R | + | + | + | + | + | + | |||
| 2 | S8-2R | none | + | + | + | + | + | + | |||
| 3 | S8-3R | none | - | - | - | none | - | - | - | ||
| S25 | 1 | S25-1R | + | + | + | + | + | + | |||
| 2 | S25-2R | none | + | + | - | none | - | - | - | ||
| 3 | S25-3R | none | none | - | - | - | - | - | + | ||
| L15 | 1 | L15-1R | none | + | + | - | none | + | - | + | |
| 2 | L15-2R | none | + | - | + | none | - | - | + | ||
| L27a | 1 | L27a-1R | + | + | + | + | + | + | |||
| 2 | L27a-2R | none | + | + | + | + | + | + | |||
For all the members of these four gene families, no internal mutations in the regions bearing the putative TAC sites have been found. Abbreviations and symbols: none, no EST clone has been found in the S. cephaloptera library; RP, ribosomal protein; is., isoform; Acc. n°, EMBL accession numbers; +, PCR with positive results; -, PCR with negative results.
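The distinction the table draws between ESTs that bear a primer sequence and ESTs with one or two internal mutations in it can be expressed as a mismatch-tolerant search. The sketch below assumes that a reverse primer anneals where its reverse complement occurs on the EST sense strand; the sequences are invented for illustration (the primer sequences are not reproduced in this section) and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def primer_hits(est, reverse_primer, max_mismatches=0):
    """Find sense-strand positions where a reverse primer could anneal.

    Returns a list of (start_index, mismatch_count) tuples for every window
    of the EST that differs from the primer's reverse complement at no more
    than `max_mismatches` positions.
    """
    target = reverse_complement(reverse_primer)
    est = est.upper()
    hits = []
    for i in range(len(est) - len(target) + 1):
        window = est[i:i + len(target)]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Invented example: an EST bearing the primer site, and a variant with one mutation.
primer = "GTAAACCC"
print(primer_hits("AAACCCGGGTTTACGT", primer))
print(primer_hits("AAACCCGGGATTACGT", primer, max_mismatches=2))
```

In practice the position of a mismatch matters as well as the count, since mismatches near the primer's 3' end are far more likely to block amplification; this sketch ignores that.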
Figure 3. Phylogenetic trees of the selected ribosomal protein sequences (see Table 2 and Methods). Trees A, B and C were obtained on an amino acid dataset using, respectively, the Neighbor Joining (NJ) method, the Fitch and Maximum Parsimony (MP) methods, and the Maximum Likelihood (ML) method. The trees constructed using the Fitch and MP methods have a similar topology. Tree D is the ML tree obtained using the first two codon positions and the model selected by MrAIC, GTRIG, with ML estimated base frequency, a gamma (2) distribution for site substitution rates, and an estimated proportion of invariant sites. Similar topologies were obtained with ML using codon models and with a non-homogeneous, non-stationary ML method allowing G+C equilibrium frequency to vary (see text). Trees E and F were obtained using, respectively, the GTR model with an MCMC Bayesian method and the CAT mixture model on an amino acid dataset. Numbers indicate bootstrap values or branch support; in tree B, MP and Fitch values are at the left and at the right, respectively; in tree D, the value after the slash is the aLRT (actually the minimum of the CHI2-based parametric and non-parametric aLRT estimated values). Abbreviations: D.m., D. melanogaster; Echino., Echinoderm; R.n., R. norvegicus; S.c., S. cephaloptera; S.d., S. domuncula; Yeast, S. cerevisiae.
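Tree D is built from the first two codon positions only, a common way to reduce the influence of fast-evolving, often saturated third positions. A minimal sketch of that filtering step follows; it assumes in-frame coding sequences already aligned codon by codon, and the function names and toy alignment are hypothetical rather than taken from the authors' pipeline.

```python
def first_two_codon_positions(cds):
    """Keep codon positions 1 and 2 of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


def strip_third_positions(alignment):
    """Apply the filter to every sequence in a {name: sequence} codon alignment."""
    return {name: first_two_codon_positions(seq) for name, seq in alignment.items()}


# Toy codon alignment; gaps are kept in whole-codon blocks.
toy_alignment = {
    "taxon_A": "ATGGCCAAA",
    "taxon_B": "ATGGCT---",
}
print(strip_third_positions(toy_alignment))
```

The filtered alignment keeps two thirds of the original columns, and all sequences remain the same length, so it can be passed to a nucleotide ML program unchanged.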