Laura Baldo, M Emília Santos, Walter Salzburger.
Abstract
The hundreds of endemic species of cichlid fishes in the East African Great Lakes Tanganyika, Malawi, and Victoria are a prime model system in evolutionary biology. With five genomes currently being sequenced, eastern African cichlids also represent a forthcoming genomic model for evolutionary studies of genotype-to-phenotype processes in adaptive radiations. Here we report the functional annotation and comparative analyses of transcriptome data sets for two eastern African cichlid species, Astatotilapia burtoni and Ophthalmotilapia ventralis, representatives of the modern haplochromines and ectodines, respectively. Nearly 647,000 expressed sequence tags were assembled into more than 46,000 contigs for each species using the 454 sequencing technology, largely expanding the current sequence data set publicly available for these cichlids. Total predicted coverage of their proteome diversity is approximately 50% for both species. Comparative qualitative and quantitative analyses show very similar transcriptome data for the two species in terms of both functional annotation and relative abundance of gene ontology terms expressed. Average genetic distance between species is 1.75% when all transcript types are considered, including nonannotated sequences; 1.33% for annotated sequences only, including untranslated regions; and decreases to nearly half, 0.95%, for coding sequences only, suggesting a large contribution of noncoding regions to their genetic diversity. Comparative analyses across the two species, tilapia, and the outgroup medaka, based on an overlapping data set of 1,216 genes (∼526 kb), demonstrate a cichlid-specific signature of disruptive selection and provide a set of candidate genes that are putatively under positive selection. Overall, these data sets offer the genetic platform for future comparative analyses in light of the upcoming genomes for this taxonomic group.
Year: 2011 PMID: 21617250 PMCID: PMC3296448 DOI: 10.1093/gbe/evr047
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Summary of the ESTs Generated by 454 Sequencing in This Study
| | AB (A. burtoni) | OV (O. ventralis) |
| Summary run | ||
| Total number of reads | 647,219 | 647,816 |
| Average read length | 349.27 | 344.36 |
| Total number of bases | 226,048,424 | 223,072,738 |
| Summary assembly | ||
| Total number of contigs | 49,311 | 46,298 |
| Total number of large contigs (≥500 bases) | 19,408 | 17,207 |
| Average contig size | 585.84 | 566.33 |
| N50 contig size | 1,016 | 1,003 |
| Largest contig size | 8,335 | 7,430 |
N50 contig size: half of all bases reside in contigs of this size or longer.
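The N50 definition in the footnote above can be made concrete with a short sketch. This is only an illustration of the standard definition, not the assembler's own code; the function name `n50` and the toy contig lengths are hypothetical and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig downwards until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not data from the table above).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because a few long contigs can hold half of the bases, the N50 (about 1,000 bases in both assemblies) can be well above the average contig size (under 600 bases).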
Summary of the ESTs Annotation Using Blast2GO
| | AB | OV |
| Number of ESTs returning BlastX hits | 19,121 (12,491 AccNos) | 16,582 (11,269 AccNos) |
| Number of ESTs with GO annotation | 11,956 (5,152 terms) | 10,250 (4,852 terms) |
| Biological process | 8,438 (2,974 terms) | 7,293 (2,732 terms) |
| Cellular component | 7,330 (616 terms) | 6,307 (623 terms) |
| Molecular function | 10,110 (1,562 terms) | 8,683 (1,497 terms) |
| Annotated protein-coding genes | 8,684 | 7,671 |
Ten most represented GO terms per biological category and absolute number of ESTs assigned to each term. Overall representation of GO terms is nearly equal between AB and OV.
Most Common Hits in the nt Database (Cutoff E Value 1 × 10⁻¹⁵) for Contigs That Had No Hits in the nr Database
| Hit Description | Species | AccNo | Number of Contigs (AB) | Number of Contigs (OV) |
| MHC class IA antigen UBA1, UBA2, UAA1 genes, UAA3 and UAA2 pseudogenes, UAA4, UAA5, and UAA6 pseudogene fragments | | AB270897.1 | 260 | 226 |
| Platelet-derived growth factor receptor beta b (pdgfrbb) and colony-stimulating factor 1 receptor b (csf1rb) genes | | DQ386647.1 | 181 | 153 |
| Hoxba gene cluster | | EF594310.1 | 149 | 136 |
| KLR1 gene; KLR2 pseudogene, KLR3 and KLR4 genes; KLR5 gene, KLR6 and KLR7 pseudogenes | | AY495714.1 | 115 | 115 |
| Hoxdb gene cluster | | EF594316.1 | 84 | 59 |
| Platelet-derived growth factor receptor beta a (pdgfrba) and colony-stimulating factor 1 receptor a (csf1ra) genes | | DQ386648.1 | 60 | 43 |
| Gsh2 (gsh2), Pdgfra (pdgfra), and Kita (kita) genes; Kdrb (kdrb) gene; and Clock (clock) gene | | EF526075.2 | 57 | 64 |
| Hoxbb gene cluster | | EF594314.1 | 56 | 74 |
| Hoxab gene cluster, complete sequence | | EF594311.1 | 55 | 52 |
| KLR8 pseudogene; KLR9 gene, C-type lectin (CLECT2)-like protein pseudogene, and C-type lectin (CLECT2)-like protein gene; KLR10 pseudogene; C-type lectin natural killer cell receptor-like protein gene; and transposon TX1-like ORF2 pseudogene | | AY495715.1 | 45 | 47 |
| Hoxda gene cluster | | EF594315.1 | 31 | 32 |
| Hoxca gene cluster | | EF594312.1 | 22 | 30 |
| Hoxaa gene cluster | | EF594313.1 | 20 | 13 |
| Total number of contigs | | | 1,135 | 1,044 |
Average Pairwise Genetic Distance (Pi, Uncorrected) with Standard Deviation and Median Values Estimated from 4,516 BBHs between AB and OV (Data set #1) and from 2,660 Three-Species Alignments (AB, OV, and Tilapia; Data set #2)
| | | Pi | Median | Mean Length (Range), bp |
| Data set #1 | ||||
| AB | OV | 0.0175 ± 0.0101 | 0.0158 | 1,463 (516–6,837) |
| Data set #2 | ||||
| AB | OV | 0.0138 ± 0.0096 | 0.0117 | 541 (150–2,588) |
| Tilapia | AB | 0.0302 ± 0.0203 | 0.0261 | |
| Tilapia | OV | 0.0314 ± 0.0212 | 0.0268 |
Data set #1 includes both annotated and nonannotated ESTs, whereas data set #2 includes only annotated ESTs with UTRs.
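As an illustration of what an uncorrected pairwise distance measures, the sketch below computes the proportion of differing sites between two aligned sequences. It assumes that alignment columns containing a gap or ambiguous base in either sequence are skipped; the source does not state how such sites were handled, and the function name `p_distance` and the example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, skip_chars="-N?"):
    """Uncorrected distance: differing sites / compared sites.

    Assumption (not from the source): columns with a gap or ambiguous
    base in either sequence are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue  # skip gapped or ambiguous columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return differences / compared


# Toy alignment (hypothetical): 1 difference over 8 compared sites.
print(p_distance("ACGTACGT", "ACGTACGA"))
```

On this scale, the AB versus OV value of 0.0175 for data set #1 corresponds to roughly 17 to 18 differing sites per 1,000 aligned bases.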
ML phylogeny based on a four-species concatenated alignment of 1,216 genes (526,113 bp). The tree is rooted using medaka as the outgroup. All nodes had 100% bootstrap support. For each branch, individual dN and dS values (in brackets, respectively) and the corresponding dN/dS ratios (in red) were calculated under the free-ratio model (codeml). Indel events per branch (specified by a number followed by “i”) were mapped by maximum parsimony.
Average Pairwise Genetic Distances (Pi, Uncorrected), Rates of Synonymous and Nonsynonymous Substitutions Per Site and Relative Ratio Estimated for Both Individual and Concatenated 1,216 Four-Species Alignments (526,113 bp, Data set #3)
| | | Individual Alignments | | | | Concatenated Alignments | | | | | |
| | | Pi | Ks | Ka | Ka/Ks | Ks | Ka | Ka/Ks | dS | dN | dN/dS |
| AB | OV | 0.0095 ± 0.0072 | 0.0289 ± 0.0001 | 0.0048 ± 4.7 × 10⁻⁶ | 0.1856 ± 0.2688 | 0.0288 | 0.0057 | 0.1979 | 0.0288 | 0.0039 | 0.1358 |
| Tilapia | AB | 0.0222 ± 0.0207 | 0.0732 ± 0.0006 | 0.0096 ± 1 × 10⁻⁵ | 0.1753 ± 0.2124 | 0.0685 | 0.0103 | 0.1504 | 0.0686 | 0.0091 | 0.1323 |
| Tilapia | OV | 0.0230 ± 0.0210 | 0.0746 ± 0.0005 | 0.0102 ± 0.0000 | 0.1827 ± 0.2349 | 0.0699 | 0.0117 | 0.1674 | 0.0700 | 0.0097 | 0.1387 |
| Medaka | Tilapia | 0.1609 ± 0.0496 | 0.8657 ± 0.0197 | 0.065 ± 0.0002 | 0.0810 ± 0.0977 | 0.8128 | 0.0672 | 0.0827 | 0.8160 | 0.0607 | 0.0744 |
| Medaka | AB | 0.1605 ± 0.0497 | 0.8695 ± 0.0125 | 0.0644 ± 0.0002 | 0.0806 ± 0.1171 | 0.8167 | 0.0665 | 0.0814 | 0.8201 | 0.06 | 0.0731 |
| Medaka | OV | 0.1605 ± 0.0497 | 0.8681 ± 0.016 | 0.0647 ± 0.0002 | 0.0810 ± 0.1062 | 0.8143 | 0.0676 | 0.0830 | 0.8182 | 0.0603 | 0.0737 |
Average Pairwise Genetic Distances (Pi, Uncorrected) Estimated for 1,216 Individual Four-Species Alignments (Gene Data set #3) before and after Trimming UTRs
| | | Pi | | |
| | | ORFs only | ORFs + UTRs (a) | ORFs + UTRs (b) |
| AB | OV | 0.0095 ± 0.0072 | 0.0112 ± 0.0077 | 0.0133 ± 0.0080 |
| Tilapia | AB | 0.0222 ± 0.0207 | 0.0250 ± 0.0171 | na |
| Tilapia | OV | 0.0230 ± 0.0210 | 0.0250 ± 0.0171 | na |
(a) Total length: 652,849 bp.
(b) Total length: 1,122,962 bp.
Genes Under Putative Positive Selection Based on Pairwise Ka/Ks Values > 1
| Pairwise | Gene | Length, bp | Pi | Ks | Ka | Ka/Ks |
| Single | ||||||
| AB versus OV | Aquaporin fa-chip | 396 | 0.0202 | 0.0103 | 0.0273 | 2.650 |
| | Succinate dehydrogenase | 450 | 0.0178 | 0.0085 | 0.0214 | 2.518 |
| | 20-beta-hydroxysteroid dehydrogenase | 501 | 0.0140 | 0.0084 | 0.0159 | 1.893 |
| | 26s proteasome nonatpase regulatory subunit 9 | 636 | 0.0173 | 0.0140 | 0.0227 | 1.621 |
| | Muscle-type creatine kinase ckm1 | 438 | 0.0092 | 0.0098 | 0.0151 | 1.541 |
| | Darmin protein | 363 | 0.0083 | 0.0061 | 0.0090 | 1.475 |
| | Serine hydrolase-like protein | 489 | 0.0226 | 0.0180 | 0.0247 | 1.372 |
| | Tetratricopeptide repeat protein 35 | 600 | 0.0034 | 0.0078 | 0.0107 | 1.372 |
| | Transmembrane protein 16f | 357 | 0.0114 | 0.0120 | 0.0148 | 1.233 |
| | Dead (asp-glu-ala-asp) box polypeptide 56 | 537 | 0.0075 | 0.0080 | 0.0098 | 1.225 |
| | Novel protein (zgc:100919) | 384 | 0.0131 | 0.0116 | 0.0139 | 1.198 |
| | loc733309 protein | 363 | 0.0138 | 0.0128 | 0.0142 | 1.109 |
| | Alpha-sialyltransferase st3gal v | 345 | 0.0116 | 0.0111 | 0.0119 | 1.072 |
| | Trypsinogen 2 | 540 | 0.0315 | 0.0311 | 0.0325 | 1.045 |
| Tilapia versus OV | Beta-galactoside-binding lectin | 378 | 0.0212 | 0.0119 | 0.0243 | 2.042 |
| | Decaprenyl-diphosphate synthase subunit 2 | 348 | 0.0201 | 0.0120 | 0.0231 | 1.925 |
| | Elastase 2-like protein | 540 | 0.0225 | 0.0152 | 0.0253 | 1.664 |
| | cdc42-interacting protein 4 homolog | 306 | 0.0132 | 0.0157 | 0.0167 | 1.064 |
| | Cytochrome c oxidase subunit 4 isoform mitochondrial precursor | 516 | 0.0177 | 0.0178 | 0.0182 | 1.022 |
| | Regulator of g-protein signaling 18 | 417 | 0.0240 | 0.0218 | 0.0283 | 1.298 |
| | Serum paraoxonase arylesterase 2 | 435 | 0.0300 | 0.0305 | 0.0336 | 1.102 |
| | Hemoglobin subunit alpha-A (HBAA_SERQU; hemoglobin alpha-A chain; alpha-A-globin) | 426 | 0.0423 | 0.0403 | 0.0445 | 1.104 |
| | Suppression of tumorigenicity 14 (colon epithin) | 477 | 0.0359 | 0.0266 | 0.0400 | 1.504 |
| Tilapia versus AB | Signal sequence alpha | 528 | 0.0076 | 0.0078 | 0.0126 | 1.615 |
| | Nadh dehydrogenase 1 alpha subcomplex subunit mitochondrial precursor | 330 | 0.0182 | 0.0135 | 0.0199 | 1.474 |
| | mgc85594 protein | 402 | 0.0150 | 0.0123 | 0.0159 | 1.293 |
| | ca++ cardiac fast twitch 1 like | 447 | 0.0201 | 0.0180 | 0.0212 | 1.178 |
| Two | ||||||
| Tilapia versus OV | Annexin a4 | 534 | 0.0356 | 0.0361 | 0.0366 | 1.014 |
| Tilapia versus AB | | | 0.0300 | 0.0279 | 0.0314 | 1.125 |
| Tilapia versus OV | Lipid phosphate phosphohydrolase 2 | 258 | 0.0233 | 0.0149 | 0.0268 | 1.799 |
| Tilapia versus AB | | | 0.0233 | 0.0149 | 0.0268 | 1.799 |
| AB versus OV | 39s ribosomal protein mitochondrial precursor | 318 | 0.0126 | 0.0138 | 0.0165 | 1.196 |
| Tilapia versus AB | | | 0.0189 | 0.0138 | 0.0207 | 1.500 |
| AB versus OV | Ubiquinol-cytochrome c rieske iron-sulfur polypeptide 1 | 441 | 0.0136 | 0.0096 | 0.0150 | 1.563 |
| Tilapia versus OV | | | 0.0159 | 0.0096 | 0.0181 | 1.885 |
| AB versus OV | Epithelial cadherin precursor | 651 | 0.0691 | 0.0671 | 0.0742 | 1.106 |
| Tilapia versus OV | | | 0.0799 | 0.0807 | 0.0857 | 1.062 |
| Three | ||||||
| AB versus OV | Cell cycle control protein 50a | 372 | 0.0162 | 0.0109 | 0.0218 | 2.000 |
| Tilapia versus OV | | | 0.0431 | 0.0218 | 0.0558 | 2.560 |
| Tilapia versus AB | | | 0.0457 | 0.0439 | 0.0483 | 1.100 |
Note: Of the 33 genes, 27 had Ka/Ks > 1 in only a single cichlid pairwise comparison, five in two pairwise comparisons, and one in all three pairwise comparisons.
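The grouping described in the note (genes with Ka/Ks > 1 in one, two, or three pairwise comparisons) can be sketched as a simple tally. This is an illustration only, not the authors' pipeline: the function name `count_pairs_above_one` is hypothetical, and the input is a small subset of rows copied from the table above. Ratios recomputed from the rounded Ks and Ka values may differ slightly in the last digit from the published Ka/Ks column.

```python
from collections import defaultdict

# (pairwise comparison, gene, Ks, Ka): a subset of rows from the table above.
rows = [
    ("AB versus OV", "Aquaporin fa-chip", 0.0103, 0.0273),
    ("Tilapia versus OV", "Annexin a4", 0.0361, 0.0366),
    ("Tilapia versus AB", "Annexin a4", 0.0279, 0.0314),
    ("AB versus OV", "Cell cycle control protein 50a", 0.0109, 0.0218),
    ("Tilapia versus OV", "Cell cycle control protein 50a", 0.0218, 0.0558),
    ("Tilapia versus AB", "Cell cycle control protein 50a", 0.0439, 0.0483),
]


def count_pairs_above_one(rows):
    """Count, per gene, the pairwise comparisons with Ka/Ks > 1."""
    counts = defaultdict(int)
    for _pair, gene, ks, ka in rows:
        # Guard against Ks == 0, where the ratio is undefined.
        if ks > 0 and ka / ks > 1:
            counts[gene] += 1
    return dict(counts)


for gene, n_pairs in count_pairs_above_one(rows).items():
    print(f"{gene}: Ka/Ks > 1 in {n_pairs} pairwise comparison(s)")
```

A pairwise Ka/Ks above 1 is only a coarse screen, which is consistent with the table title describing these genes as under putative positive selection.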