Hilary C Miller, Patrick J Biggs, Claudia Voelckel, Nicola J Nelson.
Abstract
BACKGROUND: The tuatara (Sphenodon punctatus) is a species of extraordinary zoological interest, being the only surviving member of an entire order of reptiles which diverged early in amniote evolution. In addition to their unique phylogenetic placement, many aspects of tuatara biology, including temperature-dependent sex determination, cold adaptation and extreme longevity, have the potential to inform studies of genome evolution and development. Despite increasing interest in the tuatara genome, genomic resources for the species are still very limited. We aimed to address this by assembling a transcriptome for tuatara from an early-stage embryo, which will provide a resource for genome annotation, molecular marker development and studies of development and adaptation in tuatara.
Year: 2012 PMID: 22938396 PMCID: PMC3478169 DOI: 10.1186/1471-2164-13-439
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary statistics for individual and merged assemblies
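| Kmer | Assembly | Transcripts | N50 (bp) | Mean length (bp) | Max length (bp) | Total length (bp) |
|---|---|---|---|---|---|---|
| 21 | Initial | 33024 | 844 | 525 | 5689 | 17,354,832 |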
| | Representative | 29082 | 786 | 501 | 5659 | 14,561,997 |
| 25 | Initial | 28723 | 746 | 491 | 5689 | 14,105,603 |
| | Representative | 26715 | 706 | 474 | 5659 | 12,660,658 |
| 29 | Initial | 26236 | 615 | 431 | 5689 | 11,307,053 |
| | Representative | 25016 | 590 | 419 | 5659 | 10,488,297 |
| 33 | Initial | 23648 | 488 | 363 | 5584 | 8,591,562 |
| | Representative | 22972 | 469 | 355 | 5584 | 8,148,996 |
| 37 | Initial | 19180 | 369 | 311 | 5111 | 5,898,486 |
| | Representative | 18821 | 357 | 301 | 5111 | 5,664,511 |
| 41 | Initial | 12230 | 281 | 263 | 5750 | 3,218,609 |
| | Representative | 12090 | 273 | 258 | 5750 | 3,122,927 |
| Merged | | 35680 | 747 | 479 | 5750 | 17,086,468 |
| Final | | | | | | |
| Annotated | | 15965 | 927 | 586 | 5659 | 9,357,209 |
For each kmer, data are shown for both the initial Velvet/Oases assembly (Initial) and the assembly containing only one representative transcript from each locus (Representative). The “Merged” assembly is the result of merging the representative assemblies from the different kmers using CD-HIT-EST; the “Final” assembly is the result after potentially misassembled transcripts were removed; and the “Annotated” set contains only transcripts with a significant BLAST match. Kmer = required length of the overlap match between two reads in Velvet; N50 = length-weighted median contig length.
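The caption defines N50 only as a length-weighted median. The sketch below shows that calculation, together with the other per-assembly statistics the table appears to report (transcript count, mean, maximum and total length). It assumes the common convention that N50 is the length at which the running total of lengths, sorted longest first, first reaches at least half of all assembled bases; the function names and the toy lengths are hypothetical and are not taken from the study.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that transcripts of length >= L hold at least half
    of all assembled bases (the 'length-weighted median')."""
    if not lengths:
        raise ValueError("no transcripts given")
    half = sum(lengths) / 2
    running = 0
    # Walk from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total always reaches half")


def summarize(lengths: Sequence[int]) -> dict:
    """Per-assembly statistics: count, N50, mean, maximum and total length."""
    total = sum(lengths)
    return {
        "transcripts": len(lengths),
        "n50": n50(lengths),
        "mean": total / len(lengths),
        "max": max(lengths),
        "total": total,
    }


# Hypothetical transcript lengths (bp), not data from the study.
print(summarize([500, 400, 300, 200, 100]))
# → {'transcripts': 5, 'n50': 400, 'mean': 300.0, 'max': 500, 'total': 1500}
```

The same relationship gives a quick consistency check on the table: for the kmer 21 initial assembly, 33,024 transcripts × 525 bp ≈ 17,337,600 bp, close to the reported 17,354,832 bp, with the gap reflecting rounding of the mean.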
Figure 1. Length distribution (A) and mean coverage per base (B) of assembled transcripts. White bars show data from all transcripts (32,911 sequences); grey bars show data from annotated sequences only (15,965 sequences). Lines mark the mean length (A) or mean coverage (B) for all transcripts (dotted line) and for annotated sequences only (solid line).
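The caption does not say how mean coverage per base was computed. One common definition is the read depth averaged over every position of a transcript, which equals the total number of aligned read bases divided by the transcript length. The sketch below assumes reads have already been mapped back to each transcript and are given as half-open, 0-based `(start, end)` intervals; the function names and inputs are hypothetical.

```python
def per_base_depth(length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Read depth at every position of a transcript.

    `alignments` are hypothetical half-open [start, end) read placements
    in 0-based transcript coordinates.
    """
    # Difference array: +1 where a read starts, -1 one past where it ends.
    diff = [0] * (length + 1)
    for start, end in alignments:
        if not 0 <= start < end <= length:
            raise ValueError(f"alignment {(start, end)} outside transcript")
        diff[start] += 1
        diff[end] -= 1
    # A running sum of the difference array yields the depth at each base.
    depth, running = [], 0
    for pos in range(length):
        running += diff[pos]
        depth.append(running)
    return depth


def mean_coverage(length: int, alignments: list[tuple[int, int]]) -> float:
    """Mean depth per base = aligned read bases / transcript length."""
    return sum(per_base_depth(length, alignments)) / length


print(mean_coverage(10, [(0, 5), (2, 7), (5, 10)]))  # → 1.5
```

The difference array keeps the cost linear in transcript length plus read count, rather than incrementing every covered base for every read.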
Figure 2. Gene Ontology (GOslim) assignments for tuatara transcripts. Level 2 annotations are shown for the biological process and molecular function graphs, and level 7 annotations for the cellular component graph.
Candidate genes for immune function, sex differentiation and temperature-response found in our dataset
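| Gene | Transcript | Length (bp) | Match length (bp) | Match (%) | Top BLAST hit |
|---|---|---|---|---|---|
| MHC class I | 29_Locus_17662 | 136 | 136 | 96% | DQ145788 |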
| | 29_Locus_8295 | 526 | 444 | 91% | ABA42599 |
| | 21_Locus_8663 | 1455 | 717 | 90% | ABA42599 |
| | 25_Locus_8701 | 642 | 567 | 88% | ABA42600 |
| | 33_Locus_20320 | 141 | 138 | 86% | ABA42600 |
| | 25_Locus_21578 | 159 | 105 | 62% | ABA42599 |
| | 21_Locus_10540 | 222 | 216 | 50% | ABB92561 |
| MHC class II β chain | 21_Locus_1538 | 921 | 420 | 99% | DQ124231 |
| | 29_Locus_1474 | 231 | 231 | 94% | DQ124232 |
| MHC class II α chain | 21_Locus_7902 | 232 | 222 | 74% | AF256650 |
| | 29_Locus_2120 | 133 | 111 | 92% | AF256650 |
| MHC class II DM α chain | 21_Locus_1203 | 963 | 648 | 43% | AEC52935 |
| | 21_Locus_6166 | 155 | 144 | 60% | ACY01474 |
| Toll-like receptor 2 | 21_Locus_29589 | 125 | 123 | 98% | ABU95017 |
| Sox8 | 25_Locus_11477 | 766 | 222 | 97% | AAO39011 |
| | 25_Locus_1895 | 162 | 162 | 93% | ABB02374 |
| | 21_Locus_14808 | 309 | 252 | 79% | AAO49746 |
| Sox9 | 21_Locus_21814 | 173 | 173 | 100% | AY168558 |
| | 21_Locus_18139 | 167 | 165 | 98% | ACU12331 |
| | 21_Locus_24105 | 142 | 67 | 100% | AY168558 |
| Dax1 | 21_Locus_20366 | 134 | 117 | 92% | ABQ88373 |
| CIRBP | 37_Locus_775 | 500 | 252 | 82% | XP_003224509 |
| HSP27 | 21_Locus_6322 | 1028 | 543 | 82% | XP_002190077 |
| | 33_Locus_379 | 827 | 606 | 75% | XP_002194703 |
| HSP40 (DnaJ) | 21_Locus_336 | 1887 | 1233 | 97% | NP_001005841 |
| | 21_Locus_2214 | 2001 | 1014 | 83% | XP_003217107 |
| HSP47 | 25_Locus_2437 | 1689 | 1212 | 90% | BAF94140 |
| HSP70 | 21_Locus_6649 | 1124 | 1113 | 96.5% | AEO13403 |
| | 25_Locus_13016 | 548 | 429 | 92% | EAY98319 |
| | 21_Locus_13133 | 369 | 369 | 90% | ADD69959 |
| | 21_Locus_4641 | 3563 | 2406 | 90% | XP_003210546 |
| | 25_Locus_21367 | 271 | 270 | 89% | XP_002193237 |
| HSP70BP | 25_Locus_4659 | 1484 | 693 | 81% | NP_001025928 |
| | 25_Locus_5331 | 1431 | 696 | 84% | XP_003226240 |
| HSP75 | 21_Locus_7217 | 2174 | 1566 | 90% | BAF94145 |
| HSP90 | 33_Locus_89 | 2694 | 1230 + 372 | 98% | AF275719 |
| | 21_Locus_16233 | 979 | 975 | 90% | BAD95027 |
| | 21_Locus_31051 | 139 | 138 | 96% | AAD11550 |
The accession number for the top BLAST hit is given. Not all matches for heat-shock proteins are shown. CIRBP = Cold-inducible RNA binding protein.
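The transcript identifiers in this table appear to combine the kmer of the assembly a transcript came from with an Oases locus number (for example `21_Locus_8663`); the prefixes 21 to 37 match the kmer values in the assembly summary table. Assuming that naming convention, a small parser can show which kmer assemblies contributed a given gene family. The regular expression and function names are hypothetical.

```python
import re
from collections import Counter

# Assumed ID format: "<kmer>_Locus_<locus number>", e.g. "21_Locus_8663".
ID_PATTERN = re.compile(r"(\d+)_Locus_(\d+)")


def parse_transcript_id(transcript_id: str) -> tuple[int, int]:
    """Split an ID into (kmer of source assembly, locus number)."""
    match = ID_PATTERN.fullmatch(transcript_id)
    if match is None:
        raise ValueError(f"unexpected transcript ID: {transcript_id!r}")
    return int(match.group(1)), int(match.group(2))


def count_by_kmer(transcript_ids: list[str]) -> Counter:
    """How many transcripts each kmer assembly contributed."""
    return Counter(parse_transcript_id(t)[0] for t in transcript_ids)


# The five HSP70 transcripts listed in the candidate-gene table.
hsp70 = ["21_Locus_6649", "25_Locus_13016", "21_Locus_13133",
         "21_Locus_4641", "25_Locus_21367"]
print(count_by_kmer(hsp70))  # → Counter({21: 3, 25: 2})
```

Because locus numbers are presumably assigned separately within each kmer assembly, the full `(kmer, locus)` pair, not the locus number alone, should be treated as the key.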
Summary of repeats identified in tuatara transcripts
| Repeat class | Number of elements | Length occupied (bp) | Sequence (%) |
|---|---|---|---|
| Retroelements | 792 | 112,220 | 0.75 |
| SINEs: | 177 | 19,814 | 0.13 |
| LINEs: | 488 | 76,335 | 0.51 |
| L2/CR1/Rex | 459 | 72,305 | 0.48 |
| RTE/Bov-B | 3 | 160 | 0.001 |
| L1/CIN4 | 25 | 3,655 | 0.025 |
| LTR elements: | 127 | 16,071 | 0.11 |
| Bel/Pao | 1 | 54 | 0.0004 |
| Ty1/Copia | 6 | 1,490 | 0.01 |
| Gypsy/DIRS1 | 65 | 10,075 | 0.07 |
| Retroviral | 52 | 4,252 | 0.03 |
| DNA transposons | 105 | 11,133 | 0.07 |
| Hobo-Activator | 30 | 1,698 | 0.01 |
| Tc1-IS630-Pogo | 6 | 732 | 0.005 |
| PiggyBac | 15 | 2,039 | 0.01 |
| Tourist/Harbinger | 1 | 59 | 0.0004 |
| Unclassified interspersed repeat | 15 | 1,187 | 0.008 |
| Small RNA* | 61 | 7,028 | 0.05 |
| Satellites | 44 | 5,917 | 0.04 |
| Simple repeats | 786 | 30,947 | 0.21 |
| Low complexity | 1072 | 48,337 | 0.32 |
* tRNA and snRNA sequences (rRNAs removed).
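The indentation that marked subclasses in the repeat table did not survive extraction. The layout resembles a RepeatMasker-style summary, in which rows such as SINEs, LINEs and LTR elements are subclasses of the class above them. The check below, using counts copied from the table, supports that reading: the three retroelement subclasses sum exactly to the Retroelements row, whereas the four listed DNA transposon families account for only part of the DNA transposons total. The names `Row` and `unassigned` are hypothetical.

```python
# (number of elements, bp occupied) copied from the repeat summary table.
Row = tuple[int, int]

RETRO_TOTAL: Row = (792, 112_220)
RETRO_PARTS: dict[str, Row] = {
    "SINEs": (177, 19_814),
    "LINEs": (488, 76_335),
    "LTR elements": (127, 16_071),
}

DNA_TOTAL: Row = (105, 11_133)
DNA_PARTS: dict[str, Row] = {
    "Hobo-Activator": (30, 1_698),
    "Tc1-IS630-Pogo": (6, 732),
    "PiggyBac": (15, 2_039),
    "Tourist/Harbinger": (1, 59),
}


def unassigned(total: Row, parts: dict[str, Row]) -> Row:
    """Elements and bp in a class total not covered by the listed subclasses."""
    elements = sum(n for n, _ in parts.values())
    bp = sum(b for _, b in parts.values())
    return total[0] - elements, total[1] - bp


print(unassigned(RETRO_TOTAL, RETRO_PARTS))  # → (0, 0)
print(unassigned(DNA_TOTAL, DNA_PARTS))      # → (53, 6605)
```

In other words, 53 DNA transposon elements (6,605 bp) are not attributed to any of the families shown, so the subclass rows should not be read as an exhaustive breakdown.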