Patrícia Beldade, Stephen Rudd, Jonathan D Gruber, Anthony D Long.
Abstract
BACKGROUND: Butterfly wing color patterns are a key model for integrating evolutionary developmental biology and the study of adaptive morphological evolution. Yet, despite the biological, economic and educational value of butterflies, they are still relatively under-represented in terms of available genomic resources. Here, we describe an Expressed Sequence Tag (EST) project for Bicyclus anynana that has identified the largest collection of expressed genes available to date for any butterfly.
Year: 2006 PMID: 16737530 PMCID: PMC1534037 DOI: 10.1186/1471-2164-7-130
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
cDNA libraries made from wing discs at different developmental stages
| ID^a | Nr of outbred individuals | Nr of ESTs^b | GenBank Accession |
| 5TH | 35 caterpillars | 2555 (83) | |
| CPP | 21 crawlers and 15 pre-pupae | 1882 (61) | |
| 24H | 33 females and 33 males | 2014 (54) | |
| 48H | 20 females and 20 males | 1669 (26) | |
| 72H | 10 females and 10 males | 2039 (32) | |
a See Methods for a full description of each library; b Total number of ESTs submitted to NCBI ESTdb and, in parentheses, the number of these which were excluded from the analysis (either shorter than 54 bp or part of the 166 UniGenes with low complexity scores; see Methods).
General description of EST project sequences and assemblies
| | # total | min length | max length | mean length |
| ESTs (PHRED >= 20) | 9903 | 55 (10) | 1264 (1201) | 502 (455) |
| contigs | 4251 | 55 | 2763 | 583 |
| peptides | 2202 | 20 | 539 | 154 |
Figure 1. Contig alignment depth and SNP identification. Of the 4,251 UniGenes (contigs) identified, 2,994 were singletons (not shown) and all others had two or more ESTs (i.e., alignment depth of two or greater). The total number of contigs for each alignment depth class is represented by the height of the columns, and the different colors represent different SNP number classes. It should be noted that alignment depth here refers to the number of ESTs in each contig and does not necessarily imply a constant alignment depth at all sites along the contig sequence.
Successful annotation based on BLAST against different gene collections
| Field | Min E-value^a | Total BLAST hits^b | Best BLAST hits^c | Unique BLAST hits^d |
| Dmel.pro^1 | 1e-171 | 1346 | 110.5 | 4 |
| Dmel.nuc^1 | 1e-147 | 264 | 7 | 6 |
| Bmori.pro^2 | 0e+00 | 1730 | 141.5 | 194 |
| Bmori.nuc^2 | 0e+00 | 1265 | 1222 | 20 |
| Bmori.wgs.nuc^2 | 0e+00 | 1370 | 283 | 143 |
| lep.nuc^3 | 0e+00 | 1012 | 459.5 | 147 |
| invert.pro^3 | 1e-171 | 1488 | 170.5 | 4 |
| NonRed.pro^4 | 1e-175 | 1487 | 29 | 3 |
| SwissP.pro^4 | 1e-176 | 1136 | 25 | 1 |
| organel.nuc^5 | 3e-70 | 26 | 3 | 2 |
| Rfam.nuc^5 | 0e+00 | 27 | 2 | 0 |
| plant.pro | 1e-125 | 793 | 7.5 | 0 |
| Ecoli.nuc | 4e-12 | 7 | 1 | 1 |
a The threshold maximum E-value was set to 1e-05. b Total number of contigs with significant E-value. c Number of contigs having the lowest E-value for each specified BLAST field. When the lowest and second lowest E-values were the same for a particular contig it counted as 0.5 for each of the hit fields. d Number of contigs having significant BLAST hits exclusively for one of the fields. Some BLAST fields were combined to give the counts in Figure 2: 1 Dmel, 2 Bmori, 3 InvLep, 4 protein databases, 5 non-nuclear genes (as explained in the Methods). All collections used in our BLAST analysis were downloaded from public databases (see Methods) in June-August of 2005 (except for the organellar nucleotidic and the E. coli whole-genome sequences, both obtained in April 2004).
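The counting rules in footnotes a to d can be sketched in a few lines of Python. This is only an illustration of the rules as stated, not the authors' pipeline: the function name `summarize_hits`, the input layout (one lowest E-value per contig and BLAST field) and the use of `<=` at the threshold are assumptions, and the footnote does not say how ties among three or more fields were handled.

```python
from collections import defaultdict

E_VALUE_THRESHOLD = 1e-05  # maximum E-value treated as significant (footnote a)


def summarize_hits(best_evalues):
    """Count Total, Best and Unique hits per BLAST field.

    best_evalues is a hypothetical input layout:
    {contig_id: {field_name: lowest E-value obtained for that field}}.
    """
    total = defaultdict(int)
    best = defaultdict(float)
    unique = defaultdict(int)

    for fields in best_evalues.values():
        # Keep only the fields with a significant hit for this contig.
        significant = {f: e for f, e in fields.items() if e <= E_VALUE_THRESHOLD}
        if not significant:
            continue
        for field in significant:
            total[field] += 1  # footnote b
        if len(significant) == 1:
            unique[next(iter(significant))] += 1  # footnote d
        ranked = sorted(significant.items(), key=lambda item: item[1])
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            # Footnote c: a tie between the two lowest E-values counts 0.5 each.
            # Assumption: only the first two tied fields share the credit.
            best[ranked[0][0]] += 0.5
            best[ranked[1][0]] += 0.5
        else:
            best[ranked[0][0]] += 1.0
    return dict(total), dict(best), dict(unique)
```

Half-counts such as 110.5 or 141.5 in the "Best" column arise from the tie rule in footnote c.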
Figure 2. UniGene identification through BLAST analyses to selected genomic collections. a) Venn diagram summarizing gene identification based on BLAST against the genomic collections phylogenetically most relevant for B. anynana: "Dmel" for D. melanogaster, "Bmori" for the silkworm B. mori, and "InvLep" for lepidopteran nucleotide sequences and invertebrate proteins (see Methods and Table 3). About 42% of our 4,251 UniGenes did not have a significant BLAST hit for any of these three categories. b) Of the 1,804 genes not included in the Venn diagram, 14 showed a significant BLAST hit to at least one of the additional collections analyzed (details in Table 3 and in the text). The numbers on the left panel represent BLAST hits to groups of these collections (with some overlap across the collections).
GO functional terms for B. anynana genes annotated to Drosophila CG numbers
| GO TERMS | # | % | P-value |
| Signal transducer activity (GO: 0004871) | 72 | 7 | 0.3025 |
| Structural molecule activity (GO: 0005198) | 195 | 25 | |
| Motor activity (GO: 0003774) | 15 | 16 | |
| Catalytic activity (GO: 0003824) | 568 | 14 | |
| Transporter activity (GO: 0005215) | 151 | 12 | 0.3294 |
| Binding (GO: 0005488) | 651 | 15 | |
| Antioxidant activity (GO: 0016209) | 13 | 33 | |
| Enzyme regulator activity (GO: 0030234) | 52 | 14 | |
| Transcription regulator activity (GO: 0030528) | 103 | 12 | 0.0011 |
| Transcription factor activity (GO: 0003700) | 28 | 7 | |
| Translation regulator activity (GO: 0045182) | 26 | 28 | |
| Development (GO: 0007275) | 190 | 8 | |
| Larval or pupal development (GO: 0002165) | 57 | 10 | |
| Pattern specification (GO: 0007389) | 37 | 10 | |
| Metamorphosis (GO: 0007552) | 43 | 9 | |
| Aging (GO: 0007568) | 5 | 10 | |
| Pigmentation (GO: 0048066) | 4 | 7 | |
| Regulation of development (GO: 0050793) | 5 | 7 | |
| Morphogenesis (GO: 0009653) | 89 | 9 | 0.1183 |
| Embryonic development (GO: 0009790) | 46 | 9 | |
| Physiological process (GO: 0007582) | 1013 | 13 | |
| Behavior (GO: 0007610) | 31 | 8 | |
| Cellular process (GO: 0009987) | 1000 | 13 | |
| Regulation of biological process (GO: 0050789) | 217 | 13 | |
| Structural constituent of ribosome (GO: 0003735) | 110 | 57 | |
| Tubulin (GO: 0045298) | 4 | 25 | |
| Wing disc development (GO: 0035220) | 21 | 11 | |
| Tracheal system development (GO: 0007424) | 13 | 10 | |
| 11 | 15 | ||
| 2 | 22 | ||
| 3 | 7 | ||
| 6 | 29 | ||
| Eye pigment metabolism (GO: 0042441) | 4 | 13 |
# is the number of B. anynana gene annotations obtained via BLAST analysis against D. melanogaster and B. mori (total CG numbers in B. anynana with GO annotation = 1,164) that belong to the listed GO categories. % refers to the proportion of D. melanogaster genes (total CG numbers in D. melanogaster with GO annotation = 10,391) found in B. anynana for each GO category. The categories specified in this list are a limited subset of all GO categories found to be represented [see complete list in Additional file 7]. A Chi-square test was performed for the null hypothesis that the fraction of B. anynana UniGenes in each GO category relative to the total number of B. anynana UniGenes with an assigned category is equivalent to the fraction of D. melanogaster genes in that category relative to all D. melanogaster genes with a GO assignment. For the cases where Chi-square p > 0.0001, p-values are provided. All other cases are p < 0.0001.
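One possible reading of the Chi-square test described above is a one-degree-of-freedom goodness-of-fit test per GO category, with the expected B. anynana count derived from the D. melanogaster fraction. The sketch below follows that reading only; the exact test design is not given here, the function name `chi_square_category` is hypothetical, and the per-category D. melanogaster counts it needs are not listed in the table.

```python
import math


def chi_square_category(obs_in_cat, obs_total, ref_in_cat, ref_total):
    """Goodness-of-fit test (1 df) for a single GO category.

    obs_in_cat / obs_total: B. anynana UniGenes in the category / with any GO term.
    ref_in_cat / ref_total: D. melanogaster genes in the category / with any GO term.
    """
    # Expected counts under the null hypothesis of equal fractions.
    expected_in = obs_total * ref_in_cat / ref_total
    expected_out = obs_total - expected_in
    observed_out = obs_total - obs_in_cat

    stat = ((obs_in_cat - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For this table, `obs_total` would be 1,164 and `ref_total` would be 10,391; a category is reported with its p-value only when p > 0.0001.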
Microsatellites found in wing UniGenes
| Repeat size | Repeat nr | Total | Non AT^a | Aligned^b | Polymorphic^c |
| 2 | >5 | 26 | 12 | 5 | 2 |
| 3 | >3 | 243 | 129 | 56 | 4 |
| 4 | >3 | 44 | 24 | 12 | 4 |
| 5 | >3 | 7 | 5 | 0 | 0 |
a Microsatellites whose repeat unit is not composed exclusively of the nucleotides A and T, b Total number of microsatellites occurring in a region where the alignment depth is greater than one, c Total number of microsatellites for which we observed more than one allele in our EST collection. Details in Table S6 [see Additional file 6].
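The size and repeat-number thresholds in this table (dinucleotide units repeated more than 5 times; tri-, tetra- and pentanucleotide units repeated more than 3 times) can be illustrated with a simple regular-expression scan. This is a minimal sketch, not the tool used in the study: the names `find_microsatellites` and `is_primitive` are hypothetical, only perfect repeats are detected, and skipping units that are themselves repeats of a shorter motif is an added assumption.

```python
import re

# Minimum number of repeats per unit size: ">5" for size 2, ">3" for sizes 3 to 5.
MIN_REPEATS = {2: 6, 3: 4, 4: 4, 5: 4}


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter motif."""
    return not any(
        len(unit) % k == 0 and unit == unit[:k] * (len(unit) // k)
        for k in range(1, len(unit))
    )


def find_microsatellites(seq):
    """Return perfect microsatellites in seq as a list of dictionaries."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One unit followed by at least (min_rep - 1) further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if not is_primitive(unit):
                continue  # e.g. "AAA" or "ATAT" (assumed exclusion)
            hits.append({
                "start": match.start(),
                "unit": unit,
                "repeats": len(match.group(0)) // size,
                # Footnote a: unit not composed exclusively of A and T.
                "non_AT": not set(unit) <= {"A", "T"},
            })
    return hits
```

Applied to each UniGene consensus sequence, the `non_AT` flag corresponds to the "Non AT" column; the "Aligned" and "Polymorphic" columns additionally require the underlying EST alignments.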