Carmela Gissi, Graziano Pesole, Elena Cattaneo, Marzia Tartari.
Abstract
BACKGROUND: To gain insight into the evolutionary features of the huntingtin (htt) gene in Chordata, we have sequenced and characterized the full-length htt mRNA in the ascidian Ciona intestinalis, a basal chordate emerging as a new invertebrate model organism. Moreover, taking advantage of the availability of genomic and EST sequences, we reconstructed the htt gene structure of several chordate species, including the congeneric ascidian Ciona savignyi and the vertebrates Xenopus and Gallus.
Year: 2006 PMID: 17092333 PMCID: PMC1636649 DOI: 10.1186/1471-2164-7-288
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Amino acid alignment of the huntingtin region corresponding to exon 50–51 of . The red box indicates the region absent in the alternatively spliced isoform. Identical, similar and conserved positions are reported with different backgrounds.
Features of the huntingtin 3'UTRs of Ciona intestinalis mRNA and source supporting the data.
| 1 | 37 | AAUAcA | -15 | 1/23 | ||
| 2 | 118 | - | | 1/23 | ||
| 3 | 283 | AAUcAA | -44 | 1/23 | ||
| 4 | 323 | AcUAAA | -15 | 2/23 | ||
| 5 | 370 | AAUAcA | -17 | 15/23 | 4/6 | h, g, b, cl |
| 6 | 496 | - | | 0/23 | 2/6 | eg, tb |
| 7 | 744 | AAgAAA | -18 | 2/23 | ||
| 8 | 1018 | AAUAAA | -10 | 1/23 | ||
Lower case letters in the polyadenylation signal indicate differences from the canonical AAUAAA sequence. The position is the distance of the last base of the hexamer from the polyA start site, indicated as position 0. Tissue abbreviations: b: blood cells; cl: cleavage stage embryo; eg: egg; g: gonad; h: heart; tb: tailbud embryo. EST clones are detailed in Additional file 3.
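The two conventions in this footnote (lower-casing bases that differ from AAUAAA, and measuring the hexamer position from the polyA start site) can be illustrated with a minimal Python sketch. The function names and the 0-based indexing are assumptions made for this illustration, not part of the original analysis.

```python
CANONICAL = "AAUAAA"  # canonical polyadenylation signal named in the footnote


def mark_variant(hexamer: str) -> str:
    """Lower-case every base that differs from the canonical AAUAAA signal."""
    hexamer = hexamer.upper().replace("T", "U")  # accept DNA or RNA spelling
    if len(hexamer) != len(CANONICAL):
        raise ValueError("expected a 6-base hexamer")
    return "".join(
        base if base == ref else base.lower()
        for base, ref in zip(hexamer, CANONICAL)
    )


def hexamer_position(hexamer_start: int, polya_start: int) -> int:
    """Distance of the hexamer's last base from the polyA start site.

    Both arguments are 0-based indices into the same sequence (an assumption
    of this sketch). The polyA start site is position 0, so a hexamer lying
    upstream of it gets a negative position.
    """
    last_base = hexamer_start + len(CANONICAL) - 1
    return last_base - polya_start


print(mark_variant("AAUACA"))     # one mismatch, at the fifth base
print(hexamer_position(80, 100))  # hexamer ending 15 bases upstream
```

Under these assumptions, a hexamer written AAUACA is rendered as AAUAcA, as in the table, and a hexamer whose last base lies 15 bases upstream of the polyA start site is reported at position -15.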
Uncorrected amino acid distances, calculated as average number of differences per 100 amino acids, for all pairwise comparisons of aligned proteins.
| 9.10 | 8.74 | 11.47 | 15.56 | 20.61 | 26.21 | 26.80 | 26.86 | 63.17 | 63.98 | ||
| - | 2.86 | 13.05 | 17.21 | 21.78 | 27.26 | 27.96 | 28.19 | 63.31 | 63.93 | ||
| - | 12.75 | 17.07 | 21.74 | 27.16 | 27.64 | 27.87 | 63.40 | 63.95 | |||
| - | 18.46 | 23.75 | 27.90 | 28.19 | 28.23 | 63.15 | 63.56 | ||||
| - | 19.21 | 24.59 | 25.39 | 25.11 | 63.00 | 63.83 | |||||
| - | 26.54 | 27.13 | 27.31 | 63.18 | 63.71 | ||||||
| - | 18.13 | 17.81 | 62.55 | 63.60 | |||||||
| - | 4.97 | 63.64 | 64.23 | ||||||||
| - | 63.48 | 64.17 | |||||||||
| - | 27.95 | ||||||||||
| - |
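As an illustration of how an uncorrected distance of this kind can be computed, the following Python sketch counts differing positions between two aligned sequences and scales the result to differences per 100 amino acids. The function name and the handling of gaps (columns with a gap in either sequence are skipped) are assumptions of the sketch; the table caption does not state how gapped positions were treated.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected distance: differences per 100 compared amino acids.

    Assumption of this sketch: alignment columns with a gap in either
    sequence are excluded from the comparison (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differences / compared


# Toy example: one difference over five compared positions.
print(p_distance("MKTAY", "MKSAY"))
```

No correction for multiple substitutions at the same site is applied, which is what "uncorrected" refers to in the caption.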
Figure 2. Bayesian phylogenetic tree of huntingtin, reconstructed from protein sequences. Branch lengths are proportional to the number of substitutions per site. Numbers close to the nodes represent Bayesian posterior probabilities.
Ascidian and human HEAT repeats mapped on the protein sequence of the corresponding species.
| A1 | 0.0005 | N-term | 58–96 | PGLLAVSVETLLQSCADDNADVRLNANECLNRLIKGLYE | |
| A2 | 5.96E-06 | N-term | 139–177 | RPYILNLLPCLCRISQREEDGVQETLGLSLVKIFKILGP | |
| A3 | 1.35E-06 | N-term | 181–219 | ESEIQGLLASFLKNLSHKSATMRRTACVCLHSVILNCRK | |
| B4 | 6.19E-06 | N-term | 682–720 | QSLSHQALSIALKCLCDDDLRLRKTAAATIVTMPTSFPT | |
| c | 2.30E-06 | Central | 867–905 | SQQQFGILPFVMSLLHSAWLPLDVTAHSDALVLAGNLVA | |
| E1 | 1.26E-06 | Central | 1341–1378 | QGSASHVIPAMQPIIHDI.YVVRASSKNEPPEVTTQREV | |
| g1 | 9.05E-06 | C-term | 2771–2809 | ARVMSKVLPSMLDDFFPAQDIMNKIIAEFISTLQPFPAS | |
| g2 | 1.46E-06 | C-term | 2864–2904 | NRWISSMVPLIISRVHDPTLDVDWTCFCKAAVDFYTCQLSE | |
| A1 | 2.92E-07 | N-term | 58–96 | PGLLAVSVETLLQSCADENADVRLNSNECLNRVIKGLYD | |
| A2 | 0.0001 | N-term | 139–177 | RPYILNLLPCLCRISQREEDAVQEVLSSSLAKIFIVLGA | |
| A3 | 2.52E-06 | N-term | 181–219 | ESEIQGLLASFLKNLSHKSPTVRRTACICLHSILTNSRK | |
| B4 | 1.53E-06 | N-term | 692–730 | KSIAQKALSIALECLCDEDTRLRKTSSAAIVSMATSYPT | |
| c | 1.46E-06 | Central | 876–914 | AQQQFGILPIVMSLLRSAWLPLDVTAHSDALVLAGNLIA | |
| E1 | - | Central | 1352–1389 | QGSASHVIPAMQPITHDI.FVVRGSLKNEPPEVTTQREV | |
| g1 | 1.27E-06 | C-term | 2770–2808 | ARVMSKILPSMLDDFFPAQEIMNKIIAEFISTLQPFPGS | |
| g2 | - | C-term | 2864–2903 | RWISSMVPLIISRSHDPSLDRNWTCFCKSAVDFYTCQLSE | |
| A1 | 4.75E-07 | N-term | 124–162 | QKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMD | |
| A2 | 0.0001 | N-term | 205–243 | RPYLVNLLPCLTRTSKRPEESVQETLAAAVPKIMASFGN | |
| A3 | 5.48E-07 | N-term | 247–285 | DNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRR | |
| a4 | * | N-term | 291–329 | SWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYLVPLLQQ | |
| a5 | 7.77E-06 | N-term | 318–362 | LTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYEL | |
| b1 | * | N-term | 745–783 | EYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSILS | |
| b2 | 1.04E-06 | N-term | 803–841 | TFSLADCIPLLRKTLKDESSVTSKLACTAVRNCVMSLCS | |
| b3 | * | N-term | 842–880 | SSYSELGLQLIIDVLTLRNSSYWLVRTELLETLAEIDFR | |
| B4 | 6.69E-08 | N-term | 904–942 | KLQERVLNNVVIHLLGDEDPRVRHVAAASLIRLVPKLFY | |
| b5 | 9.05E-06 | N-term | 984–1025 | RIYRGYNLLPSITDVTMENNLSRVIAAVSHELITSTTRALTF | |
| d | 5.62E-06 | Central | 1425–1463 | RLFEPLVIKALKQYTTTTCVQLQKQVLDLLAQLVQLRVN | |
| E1 | * | Central | 1534–1575 | RKAVTHAIPALQPIVHDLFVLRGTNKADAGKELETQKEVVVS | |
| e2 | * | Central | 1610–1648 | RQIADIILPMLAKQQMHIDSHEALGVLNTLFEILAPSSL | |
| e3 | * | Central | 1670–1710 | TVQLWISGILAILRVLISQSTEDIVLSRIQELSFSPYLISC | |
| f | 3.51E-06 | C-term | 2798–2836 | DDTAKQLIPVISDYLLSNLKGIAHCVNIHSQQHVLVMCA |
HEAT repeats are named according to their relative position along the chordate aligned sequences, using the same letter for repeats closer than 45 amino acids. Orthologous HEAT repeats conserved in ascidians and human share the same name and are reported in upper case. The expectation value (E-value) was calculated by the REP program [62]. Htt regions are defined as in Methods. The absolute position of each HEAT repeat in the corresponding protein sequence is reported in the "Location" column. Dash: REP E-value not statistically significant. Asterisk: HEAT repeats originally described in Andrade and Bork [18] but not identified by the REP program as statistically significant [62].
Chordata huntingtin gene structure, coding region (CDS), and length percentage of repetitive elements (Rpt) in intronic sequences.
| Mammalia | 67 | 9432 | 165202 | 155770 | 38.5 | 1.3 | 37.2 | 33.9 | 2.2 | |
| 67 | 9357 | 146932 | 137575 | 38.2 | 1.9 | 36.2 | 34.7 | 0.5 | ||
| 67 | 9351 | 145591 p | 136240 p | 35.4 | 2.6 | 32.9 | 31.3 | 0.9 | ||
| Aves | 67 | 9351 | 73424 p | 64073 p | 5.5 | 0.6 | 4.9 | 4.9 | 0 | |
| Amphibia | 67 | 9066 p | 79087 p | 70021 p | 11.0 | 0.9 | 10.0 | 0 | 9.3 | |
| Teleostei | 67 | 9363 | 79017 p | 69654 p | 29.9 | 5.3 | 24.6 | 4.3 | 20.3 | |
| 67 | 9444 | 21324 | 11880 | 1.1 | 1.1 | 0 | 0 | 0 | ||
| 67 | 9435 | 22257 | 12822 | 2.4 | 1.8 | 0.5 | 0 | 0.5 | ||
| Ascidiacea | 61 | 8835 | 45085 | 36249 | 9.8 | 2.7 | 7.1 | 4.4 | 2.1 | |
| 61 | 8838 | 32283 | 23445 | 12.5 | 5.8 | 6.7 | 1.6 | 5.1 | ||
Simple: satellites, simple repeats, low complexity repeats and small RNAs. Interspers: interspersed repeats. Retroel: retroelements. DNA el: DNA elements. p: partial sequence. Partial introns in Rattus: 1, 8, 28. Partial introns in Gallus: 46, 47. Partial or unknown exons in Xenopus: 27, 37, 66. Partial or unknown introns in Xenopus: 9, 26, 27, 36, 37, 65, 66. Partial introns in Danio: 1, 6, 8, 11, 13, 18, 25, 44, 49, 59. Intron 61 of Danio was excluded from gene size calculation due to its length (about 254 kb).
Figure 3. Comparison of huntingtin gene structure between . Only protein-coding regions are indicated. Exons are represented by boxes, with upper numbers indicating exon numbering and inner numbers indicating exon length (in bp). Box size is unrelated to exon length. Square bracket: exon-block (see text). Yellow box: equivalent exon (see text). Gray box: exon belonging to an exon-block (see text). Introns positionally conserved in the two species are represented by dashed lines, in black for identical intron position and in red for slipped position (changes ≤ 18 bp). Intron phase is reported between dashed lines as a single number if common to the two species. A boxed phase number indicates that intron phase is not conserved in one vertebrate species (see text). Blue boxes below or above the gene structure indicate exons with length differences > 12 bp in vertebrates (below) or in the Ciona genus (above). Letters inside blue boxes indicate the species where the size difference is observed: M, difference between mammals and other vertebrates; NM, difference within non-mammalian vertebrates; F, difference between fishes and other vertebrates; G, difference only in Gallus. Arrows indicate the 5% longest introns in at least one vertebrate species (below) and in the Ciona genus (above). AS: alternative splicing experimentally identified in C. intestinalis. PuAS: putative alternative splicing identified "in silico" in Gallus, Xenopus and pufferfishes. Sp: presence of lineage-specific sequences only in non-mammalian species (SpNM) or only in Ciona (SpC).
Figure 4. Percentage amino acid identity calculated for each Equivalent exon (E, in yellow) and exon-Block (B, in gray). The percentage amino acid identity was calculated from the chordate protein alignment for each of the equivalent exons and exon-blocks described in Figure 1. Numbers refer to the Ciona exon numbering. The bold dashed line represents the mean % identity (21.2%) calculated over the entire alignment length. The normal dashed lines represent the mean value ± standard deviation (7.1).
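The comparison drawn in this figure, a per-segment percent identity set against the mean ± standard deviation band, can be sketched in Python as follows. The mean (21.2) and standard deviation (7.1) are taken from the caption; the function names, the definition of an identical column (all sequences share the same residue and none has a gap), and the use of the full segment length as denominator are assumptions of this sketch, since the caption does not specify them.

```python
MEAN_IDENTITY = 21.2  # mean % identity over the whole alignment (from the caption)
SD_IDENTITY = 7.1     # standard deviation (from the caption)


def percent_identity(alignment: list[str], gap: str = "-") -> float:
    """Percentage of alignment columns in which all sequences are identical.

    Assumptions of this sketch: a column containing a gap never counts as
    identical, and the denominator is the full segment length.
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain non-empty sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    identical = sum(
        1 for column in zip(*alignment)
        if len(set(column)) == 1 and column[0] != gap
    )
    return 100.0 * identical / length


def classify(identity: float, mean: float = MEAN_IDENTITY,
             sd: float = SD_IDENTITY) -> str:
    """Place a segment relative to the mean +/- SD band."""
    if identity > mean + sd:
        return "above"
    if identity < mean - sd:
        return "below"
    return "within"


# Toy three-sequence segment: three of four columns are identical.
segment = ["MKTA", "MKSA", "MKTA"]
identity = percent_identity(segment)
print(identity, classify(identity))
```

With the caption's values, the band runs from 14.1% to 28.3%, so a segment is flagged as unusually conserved or unusually divergent only when its identity falls outside that range.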
Length variability of huntingtin introns.
| 1549 | 2360 | 93 | 12251 | 11850 | 9949 | |
| 1164 | 2084 | 92 | 20632 | 20632 | 6828 | |
| 1146 | 2064 | 92 | 14532 | 14532 | 5970 | |
| 743 | 971 | 85 | 5285 | 5285 | 2569 | |
| 831 | 1096 | 77 | 7186 | 7186 | 2849 | |
| 538 | 1091 | 73 | 4946 | 4891 | 3962 | |
| 105 | 180 | 66 | 1274 | 537 | 528 | |
| 111 | 194 | 72 | 1264 | 781 | 736 | |
| 569 | 604 | 122 | 1933 | 331 | 1119 | |
| 319 | 391 | 55 | 3038 | 257 | 616 |
Figure 5. Amino acid alignment of CSTs found in intron 12. The Conserved Sequence Tag (CST) corresponds to an internal cassette exon (12Bis) in non-mammalian tetrapods and to a longer splicing isoform of exon 12 (12L) in pufferfishes. Identical, similar and conserved positions are indicated with different backgrounds.