John Wang, Stephanie Jemielity, Paolo Uva, Yannick Wurm, Johannes Gräff, Laurent Keller.
Abstract
Ants display a range of fascinating behaviors, a remarkable level of intra-species phenotypic plasticity and many other interesting characteristics. Here we present a new tool to study the molecular mechanisms underlying these traits: a tentatively annotated expressed sequence tag (EST) resource for the fire ant Solenopsis invicta. From a normalized cDNA library we obtained 21,715 ESTs, which represent 11,864 putatively different transcripts with very diverse molecular functions. All ESTs were used to construct a cDNA microarray.
Year: 2007 PMID: 17224046 PMCID: PMC1839134 DOI: 10.1186/gb-2007-8-1-r9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fire ant EST and assembly statistics
| Total number of sequence reads | 28,133 |
| cDNA clones sequenced from 5' end | 22,560 |
| Extra reads due to re-sequencing | 5,573 |
| High-quality sequences after filtering* | 21,715 |
| Average EST size after trimming (bp) | 522.4 |
| Total number of assembled sequences | 11,864 |
| Number of contigs | 4,319 |
| True contigs (from ≥2 different clones) | 3,057 |
| Re-sequencing contigs† | 1,262 |
| Number of singletons | 7,545 |
| Number of putatively different fire ant sequences | <11,864 |
| Average size of assembled sequences (bp) | 600.5 |
*High-quality sequences are those with greater than 200 bp after trimming of vector and primer sequences and with a phred value higher than 15. In addition, this set excludes artifactual sequences that were manually removed.
†Contigs composed of replicate sequences of only one clone.
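The counts in the table above are linked by simple sums, so they can be cross-checked quickly. The following is a minimal sketch that copies the values from the table; the dictionary keys and the function name are illustrative and do not come from the original work.

```python
# Values copied from the assembly statistics table; key names are illustrative.
STATS = {
    "total_reads": 28_133,
    "clones_5prime": 22_560,
    "resequencing_reads": 5_573,
    "high_quality_ests": 21_715,
    "assembled_sequences": 11_864,
    "contigs": 4_319,
    "true_contigs": 3_057,
    "resequencing_contigs": 1_262,
    "singletons": 7_545,
}


def check_assembly_stats(s):
    """Return named boolean checks on the additive relations in the table."""
    return {
        "reads = clones + re-sequencing reads":
            s["total_reads"] == s["clones_5prime"] + s["resequencing_reads"],
        "contigs = true contigs + re-sequencing contigs":
            s["contigs"] == s["true_contigs"] + s["resequencing_contigs"],
        "assembled = contigs + singletons":
            s["assembled_sequences"] == s["contigs"] + s["singletons"],
    }


if __name__ == "__main__":
    for name, ok in check_assembly_stats(STATS).items():
        print(f"{name}: {'ok' if ok else 'MISMATCH'}")
    # Share of raw reads that survive quality filtering.
    fraction = STATS["high_quality_ests"] / STATS["total_reads"]
    print(f"high-quality fraction: {fraction:.3f}")
```

All three relations hold for the published numbers, which is consistent with re-sequencing contigs being the contigs built from a single clone.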
Figure 1. Sequence analysis by blastx searches. (a) Percentage of fire ant assembled sequences with and without blastx matches at various E-value cutoffs. (b) Quantitative overview of organisms providing the best-matching homologous protein sequences to fire ant assembled sequences (E ≤ 1e-5).
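Panel (a) of this figure reports the share of assembled sequences with a blastx match at several E-value cutoffs. A minimal sketch of that tabulation is given below; the function name and the example E-values are made up for illustration and are not data from the study.

```python
def percent_with_match(best_evalues, cutoffs):
    """Percent of sequences whose best blastx E-value is <= each cutoff.

    best_evalues holds one entry per assembled sequence; None means the
    sequence had no blastx hit at all.
    """
    total = len(best_evalues)
    if total == 0:
        raise ValueError("no sequences given")
    result = {}
    for cutoff in cutoffs:
        matched = sum(1 for e in best_evalues if e is not None and e <= cutoff)
        result[cutoff] = 100.0 * matched / total
    return result


# Made-up best-hit E-values, for illustration only.
example = [1e-50, 3e-12, 2e-6, 0.5, None]
print(percent_with_match(example, [1e-5, 1e-10, 1e-20]))
```

Stricter cutoffs can only lower the matched percentage, so the curve in such a plot is monotonic.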
Gene Ontology annotation
| Molecular function | 4,301* | (100.0%) | 486* | (100.0%) | 14,778* (100.0%) | |||
| Antioxidant activity | 20 | (0.5%) | 2 | (0.4%) | 39 (0.3%) | ||
| Binding | ↑ | 174 | (35.8%) | 4,319 (29.2%) | |||
| Catalytic activity | ↑ | ↑ | 4,072 (27.6%) | ||||
| Chaperone regulator activity | ↑ | 0 | (0.0%) | 1 (0.0%) | |||
| Enzyme regulator activity | 91 | (2.1%) | 7 | (1.4%) | 382 (2.6%) | ||
| Molecular function unknown | ↓ | ↓ | 1,852 (12.5%) | ||||
| Motor activity | 29 | (0.7%) | 1 | (0.2%) | 88 (0.6%) | ||
| Nutrient reservoir activity | ↑ | 0 | (0.0%) | 8 (0.1%) | |||
| Obsolete molecular function | 0 | (0.0%) | ↑ | 0 (0.0%) | |||
| Signal transducer activity | ↓ | ↓ | 1,091 (7.4%) | ||||
| Structural molecule activity | 210 | (4.9%) | 59 | (12.1%) | 759 (5.1%) | ||
| Transcription regulator activity | ↓ | 4 | (0.8%) | 841 (5.7%) | |||
| Translation regulator activity | ↑ | 7 | (1.4%) | 92 (0.6%) | |||
| Transporter activity | 235 | (5.5%) | 12 | (2.5%) | 1,014 (6.9%) | ||
| Triplet codon-amino acid adaptor activity | ↓ | 0 | (0.0%) | 220 (1.5%) | |||
| Cellular component | 4,838* | (100.0%) | 362* | (100.0%) | 14,986* (100.0%) | |||
| Cell† | ↑ | 147 | (40.6%) | 5,225 (34.9%) | |||
| Cellular component unknown | ↓ | ↓ | 1,920 (12.8%) | ||||
| Envelope | 107 | (2.2%) | 1 | (0.3%) | 290 (1.9%) | ||
| Extracellular matrix | 14 | (0.3%) | 0 | (0.0%) | 46 (0.3%) | ||
| Extracellular matrix part | 4 | (0.1%) | 0 | (0.0%) | 23 (0.2%) | ||
| Extracellular region | ↓ | 2 | (0.6%) | 416 (2.8%) | |||
| Extracellular region part | 23 | (0.5%) | 0 | (0.0%) | 88 (0.6%) | ||
| Membrane-enclosed lumen | 160 | (3.3%) | 3 | (0.8%) | 515 (3.4%) | ||
| Organelle | ↑ | 100 | (27.6%) | 3,007 (20.1%) | |||
| Organelle part | 548 | (11.3%) | 22 | (6.1%) | 1,632 (10.9%) | ||
| Protein complex | 575 | (11.9%) | ↑ | 1,756 (11.7%) | |||
| Synapse | 7 | (0.1%) | 0 | (0.0%) | 40 (0.3%) | ||
| Synapse part | 3 | (0.1%) | 0 | (0.0%) | 27 (0.2%) | ||
| Virion† | ↑ | 0 | (0.0%) | 1 (0.0%) | |||
| Biological process | 5,453* | (100.0%) | 630* | (100.0%) | 22,798* (100.0%) | |||
| Biological process unknown | ↓ | ↓ | 888 (3.9%) | ||||
| Cellular process | ↑ | ↑ | 7,772 (34.1%) | ||||
| Development | ↓ | ↓ | 2,148 (9.4%) | ||||
| Growth | 17 | (0.3%) | 0 | (0.0%) | 102 (0.4%) | ||
| Interaction between organisms | 6 | (0.1%) | 0 | (0.0%) | 92 (0.4%) | ||
| Physiological process | ↑ | ↑ | 7,858 (34.5%) | ||||
| Pigmentation | 1 | (0.0%) | 0 | (0.0%) | 51 (0.2%) | ||
| Regulation of biological process | 436 | (8.0%) | 11 | (1.7%) | 1,658 (7.3%) | ||
| Reproduction | ↓ | ↓ | 826 (3.6%) | ||||
| Response to stimulus | ↓ | 7 | (1.1%) | 1,402 (6.1%) | |||
| Viral life cycle | ↑ | 0 | (0.0%) | 1 (0.0%) | |||
Listed are the numbers and percentages of assembled fire ant sequences and of D. melanogaster genes that match at least one of the second-level GO terms for molecular function, cellular component, or biological process. GO annotations for fire ant sequences were inferred electronically using two methods: blastx homology to GO-annotated proteins and Prosite protein domain scans. Statistically significant overrepresentation (↑) or underrepresentation (↓) of GO terms in fire ant relative to the Drosophila genome is indicated in bold (p < 10⁻⁸, Bonferroni-corrected hypergeometric test). *This number represents the sum of the numbers of occurrences of GO terms below this level. †The 'cell part' and 'virion part' GO categories were excluded from analyses because they were redundant with the 'cell' and 'virion' categories, respectively.
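The caption refers to a Bonferroni-corrected hypergeometric test. As a rough illustration of that kind of test, and not necessarily the authors' exact setup, the sketch below treats the Drosophila annotation counts as the reference population and the fire ant annotation counts as a sample drawn from it. The function names, the use of log-gamma arithmetic, and the number of tests `m=15` are all assumptions made for this example.

```python
from math import exp, lgamma


def log_comb(n, r):
    """Natural log of the binomial coefficient C(n, r), for 0 <= r <= n."""
    return lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)


def hypergeom_upper_tail(k, N, K, n):
    """Approximate P(X >= k) when n items are drawn without replacement
    from N items, K of which carry the GO term (floating-point result)."""
    lo = max(k, 0, n - (N - K))  # smallest feasible count within the tail
    hi = min(K, n)               # largest feasible count
    log_total = log_comb(N, n)
    tail = sum(
        exp(log_comb(K, i) + log_comb(N - K, n - i) - log_total)
        for i in range(lo, hi + 1)
    )
    return min(1.0, tail)


def bonferroni(p, m):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p * m)


# 'Structural molecule activity' row of the table:
# 210 of 4,301 fire ant annotations vs 759 of 14,778 Drosophila annotations.
p_over = hypergeom_upper_tail(k=210, N=14_778, K=759, n=4_301)
p_adj = bonferroni(p_over, m=15)  # m is an assumed number of tests
print(f"raw p = {p_over:.3g}, Bonferroni-adjusted p = {p_adj:.3g}")
```

Underrepresentation would use the lower tail, P(X ≤ k), computed in the same way over the counts from the smallest feasible value up to k.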
Putative Hymenoptera-specific genes
| Blast statistics | Confidence⁷ | |||||||||||||
| Identifier (length) | Span | Frame | ORF² (bp) | I³ | Exp⁴ | Bit-score | E-value | Linkage Group | Span | Strand | ORF² (bp) | Est⁵ | Annotated gene⁶ | |
| SI.CL.8.cl.881.Contig1 (724 bp) | 509-640 | 2 | 300 | • | 272 | 1.24E-18 | 6 | 2701427-2701558 | + | 429 | *** | |||
| SI.CL.8.cl.843.SiJWH04BDO2.scf (730 bp) | 582-761 | 3 | 147 | • | 210 | 1.99E-12 | NW_001254419.8 | 44307-44486 | - | 147 | • | Near NH homology. GB18184-PA on reverse strand | ** | |
| SI.CL.19.cl.1938.Contig1 (835 bp) | 21-323 | 3 | 372 | T | • | 212 | 1.43E-12 | 6 | 1145090-1145392 | - | 429 | *** | ||
| SI.CL.19.cl.1953.SiJWC11BBX.scf (613 bp) | 81-215 | 3 | 555 | • | 166 | 5.08E-08 | 8 | 5253595-5253729 | - | 372 | • | GB14543-PA. Near NH homology on reverse strand | * | |
| 306-416 | 87 | 4.5E-15 | 5252894-5253094 | 306 | ||||||||||
| 435-635 | 200 | 5253189-5253299 | 318 | |||||||||||
| SI.CL.23.cl.2326.Contig1 (632 bp) | 413-577 | 2 | 219 | • | 291 | 1.33E-20 | 11 | 8022183-8022347 | + | 480 | *** | |||
| SI.CL.26.cl.2688.Contig1 (859 bp) | 60-131 | 3⁹ | 87 | • | 98 | 9.74E-15 | 9 | 10421877-10421948 | - | 549 | • | ** | ||
| 119-256 | 2⁹ | 558 | 186 | 10421751-10421888 | ||||||||||
| SI.CL.33.cl.3311.Contig1 (710 bp) | 228-359 | 3 | 189 | • | 258 | 3.07E-17 | 14 | 8634060-8634191 | - | 132 | • | Near | * | |
| SI.CL.33.cl.3384.Contig1 (469 bp) | 229-327 | 1⁹ | 264 | T,S | • | 160 | 3.11E-13 | 14 | 3770768-3770866 | - | 231 | *** | ||
| 362-454 | 2⁹ | 180 | S | 104 | 3770649-3770741 | 186 | ||||||||
| SI.CL.35.cl.3595.Contig1 (415 bp) | 123-398 | 3 | 342 | • | 301 | 5.97E-22 | NW_001261806.8 | 12471-12746 | + | 327 | *** | |||
| SiJWA02BAZ2.scf (600 bp) | 374-469 | 2 | 261 | • | 193 | 2.13E-15 | 5 | 9909503-9909598 | + | 627 | • | Near GB15931-PA and NH homology on reverse strand | * | |
| 533-604 | 98 | 9909356-9909427 | ||||||||||||
| SiJWA03CAW.scf (666 bp) | 49-144 | 1 | 96 | 120 | 2.1E-16 | NW_001259848.8 | 47860-47955 | + | 99 | • | GB10007-PA on reverse strand | *** | ||
| 136-297 | 117 | 182 | 47704-47865 | 726 | ||||||||||
| SiJWA12ACK.scf (212 bp) | 137-268 | 2⁹ | 69 | • | 264 | 1.42E-19 | 3 | 5151467-5151598 | + | 162 | • | Near | ** | |
| 63-143 | 3⁹ | 72 | 69 | 5151391-5151471 | 189 | |||||||||
| SiJWB12BCQ.tag5_B12_04.scf (754 bp) | 121-369 | 1 | 354 | • | 254 | 1.1E-16 | 7 | 5620128-5620376 | + | 336 | *** | |||
| SiJWC11BAT.scf (342 bp) | 189-278 | 3 | 228 | • | 160 | 3.98E-17 | 14 | 8645843-8645932 | + | 162 | • | Near | ** | |
| 282-368 | 123 | 6.41E-14 | 8645754-8645840 | 117 | ||||||||||
| SiJWE02BBO2.scf (865 bp) | 714-863 | 3 | 129 | • | 243 | 1.26E-15 | 6 | 4850974-4851123 | - | 354 | Near | ** | ||
| SiJWF07BCC.tag5_F07_11.scf (799 bp) | 329-529 | 2 | 96 | • | 196 | 6.59E-11 | 3 | 6205208-6205408 | - | 108 | Near NH homology. | ** | ||
| SiJWG01BDU2.scf (759 bp) | 21-227 | 3 | 102 | • | 354 | 1.23E-26 | 2 | 9618145-9618351 | + | 171 | • | GB12576-PA and NH homology on reverse strand | * | |
| SiJWG03ACB.scf (623 bp) | 172-609 | 1 | 471 | • | 558 | 4.63E-47 | 10 | 2344965-2345402 | + | 1440 | • | GB19005-PA | *** | |
| SiJWH02AAN.scf (469 bp) | 100-294 | 1 | 102 | • | 341 | 1.32E-30 | 12 | 281374-281568 | - | 294 | - | *** | ||
| 28-105 | 69 | 104 | 281564-281641 | 207 | ||||||||||
| SiJWH05BDPR5A08.scf (658 bp) | 580-657 | 1 | 78 | • | 161 | 1.1E-15 | 10 | 2890267-2890344 | + | 159 | • | Near | ** | |
| SiJWH05BDV2.scf (517 bp) | 204-353 | 3 | 198 | • | 237 | 4.87E-15 | 5 | 6704423-6704572 | + | 174 | ** | |||
| SiJWH08AAT.scf (653 bp) | 76-162 | 1 | 60 | • | 141 | 4.53E-20 | 5 | 1169177-1169263 | + | 84 | • | Near | * | |
| 151-195 | 102 | 75 | 4.52E-13 | 1169261-1169305 | 69 | |||||||||
| SiJWH08ADY.scf (563 bp) | 236-496 | 2 | 327 | • | 312 | 1.32E-22 | 12 | 4477772-4478032 | - | 432 | GB16574-PA | *** | ||
¹Solenopsis invicta assembled sequences that show no significant similarity to any known non-hymenopteran sequence (E > 1), but high similarity to a region of the honey bee genome (E < 1e-10).
²Length in base pairs of the largest overlapping in-frame open reading frame.
³In-frame Interproscan annotation of the fire ant assembled sequence. T means 'transmembrane region', S means 'signal peptide'.
⁴Gene is known (•) to be expressed in fire ant (unpublished microarray data).
⁵In honey bee, EST evidence exists (•) within 5,000 bp of the aligned region.
⁶This column shows the annotation of overlapping or nearby (within 5,000 bp) honey bee genes, as well as the nearby presence of genes from non-hymenopteran organisms. Numbers starting with GB are honey bee Official Gene Set numbers. 'Ab initio prediction' indicates that Gnomon, Genscan, or another algorithm was used to predict a gene that was not retained for the bee genome Official Gene Set. 'NH homology' indicates the nearby presence of a gene from non-hymenopteran organisms.
⁷Based on visual inspection, we assigned a confidence level (the more asterisks the better) to each ant-bee putative gene pair (see Materials and methods).
⁸Apis mellifera unanchored scaffolds such as NW_001254419.1 are regions that have not been mapped to a chromosome.
⁹Multiple alignment frames for an S. invicta transcript indicate possible frameshifts during sequencing.
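The selection rule in the first footnote (no non-hymenopteran match with E ≤ 1, plus a honey bee genome match with E < 1e-10) can be written as a simple two-threshold filter. The sketch below is a hypothetical illustration: the `Hit` record, its field names, and the example entries are invented here and are not the authors' pipeline or data.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Best-hit summary for one assembled sequence (hypothetical record)."""
    seq_id: str
    best_nonhymenopteran_evalue: float  # float("inf") if there is no hit at all
    honeybee_genome_evalue: float       # float("inf") if there is no hit at all


def is_candidate(hit, other_min_e=1.0, bee_max_e=1e-10):
    """Apply the two thresholds quoted in the footnote.

    A sequence qualifies when its best non-hymenopteran hit is not
    significant (E > 1) and its honey bee genome hit is strong (E < 1e-10).
    """
    return (hit.best_nonhymenopteran_evalue > other_min_e
            and hit.honeybee_genome_evalue < bee_max_e)


# Invented examples, for illustration only.
hits = [
    Hit("seq_A", float("inf"), 1e-18),  # bee-only match: candidate
    Hit("seq_B", 1e-30, 1e-40),         # conserved outside Hymenoptera
    Hit("seq_C", 5.0, 1e-4),            # bee match too weak
]
print([h.seq_id for h in hits if is_candidate(h)])
```

The table additionally records manual confidence levels, which a threshold filter like this does not capture.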
Figure 2. Examples of two candidate Hymenoptera-specific genes. (a) Fire ant sequence SI.CL.23.cl.2326.Contig1 matches an ab initio predicted honey bee gene that has no homology to any sequences in the public databases. The predicted gene was not included in the Honey Bee Official Gene Set. (b) Fire ant assembled sequence SiJWG03ACB.scf is the first EST evidence for the ab initio predicted honey bee gene GB19005-PA. Fire ant sequences are depicted as yellow boxes. Orientation (5' to 3') is indicated by an arrow. Predicted honey bee genes are depicted in purple; Official Gene Set genes are shown in red. Images are based on output from Beebase (see Materials and methods).
Fire ant assembled sequences putatively involved in behavior
| Fire ant assembled sequence | Gene name and behavior in Drosophila | E-value | |
| SI.CL.10.cl.1087.Contig1 | CG5670-PB | 1.0e-134 | |
| SI.CL.13.cl.1344.SiJWC08BDJ.scf | CG4443-PA | 1.0e-73 | |
| SI.CL.13.cl.1344.Contig1 | CG4443-PA | 5.0e-73 | |
| SiJWE02ABO.scf | CG3263-PG | 4.0e-66 | |
| SiJWA12BCM.scf | CG2212-PA | 1.0e-65 | |
| SiJWC02AAC2.scf | CG3966-PA | 3.0e-55 | |
| SiJWB06ABV.scf | CG4379-PB | 2.0e-42 | |
| SI.CL.3.cl.316.Contig1 | CG8472-PB | 2.0e-42 | |
| SI.CL.20.cl.2069.Contig1 | CG2212-PB | 5.0e-42 | |
| SiJWH05AEA.scf | CG2048-PC | 4.0e-40 | |
| SiJWH06BAG.scf | CG8472-PB | 4.0e-39 | |
| SI.CL.9.cl.956.Contig1 | CG14724-PB | 6.0e-38 | |
| SiJWA04BDS2.scf | CG3331-PA | 7.0e-38 | |
| SiJWG01ADR.scf | CG7826-PC | 1.0e-24 | |
| SiJWD02ACW.scf | CG7758-PA | 1.0e-24 | |
| SI.CL.31.cl.3101.Contig1 | CG1232-PB | 3.0e-16 | |
| SiJWG06BCF2.scf | CG5670-PA | 8.0e-15 | |
| SiJWF02BDZ.scf | CG32688-PA | 1.0e-13 | |
| SiJWB11ABH.scf | CG10033-PG | 1.0e-11 | |
| SiJWB03ACL.scf | CG7100-PH | 2.0e-11 | |
| SiJWD03ACB.scf | CG10697-PA | 1.0e-07 |
*Although the best hit for SiJWB11ABH.scf is foraging, a type I cGMP-dependent protein kinase (PKG), when using blastx analysis with only the Drosophila predicted proteins, closer inspection using all the nr sequences suggests that it is actually a type II PKG.
Fire ant assembled sequences most similar to viral genes
| Fire ant assembled sequence | Best virus hit ID | Hit description | E-value | Identity (%) |
| SI.CL.23.cl.2338.Contig1 | Q5Y974 | Structural polyprotein. [ | 0 | 98 |
| SI.CL.23.cl.2338.Contig2 | Q5Y974 | Structural polyprotein. [ | 0 | 92 |
| SI.CL.8.cl.873.Contig1 | Q65353 | ORF B. [ | 2.0e-76 | 52 |
| SiJWG09BAM.scf | Q5Y975 | Nonstructural polyprotein. [ | 2.0e-63 | 96 |
| SiJWF01ADQ.scf | Q6AW71 | (orf1)RNA-dependent RNA polymerase. [ | 3.0e-51 | 93 |
| SiJWB11ACS.scf | Q6AW71 | (orf1)RNA-dependent RNA polymerase. [ | 1.0e-44 | 90 |
| SI.CL.29.cl.2930.Contig1 | Q65353 | ORF B. [ | 1.0e-43 | 55 |
| SI.CL.28.cl.2823.Contig1 | Q38QJ4 | Polyprotein. [Kelp fly virus] | 7.0e-34 | 28 |
| SiJWC03CAP.scf | Q5ZNV0 | Hypothetical protein. [ | 2.0e-22 | 51 |
| SiJWA06BBH.scf | Q85431 | RNA polymerase. [Rice stripe virus] | 1.0e-21 | 35 |
| SI.CL.37.cl.3723.Contig1 | Q5S8C7 | Non-structural polyprotein (Fragment). [Honey bee virus - Israel] | 1.0e-18 | 40 |
| SI.CL.41.cl.4135.Contig1 | Q38QJ4 | Polyprotein. [Kelp fly virus] | 2.0e-15 | 34 |
| SI.CL.19.cl.1909.Contig1 | Q6AW70 | (orf2)Coat protein. [ | 2.0e-14 | 84 |
| SI.CL.6.cl.610.Contig1 | Q8QY61 | Polyprotein. [Sacbrood virus] | 2.0e-11 | 26 |
| SI.CL.25.cl.2511.Contig1 | O11437 | (pv4)Non-capsid protein. [ | 6.0e-11 | 26 |
| SI.CL.6.cl.610.Contig3 | Q9QRA8 | Polyprotein (Fragment). [Tomato ringspot virus] | 2.0e-10 | 23 |
| SI.CL.6.cl.610.Contig2 | Q3YC01 | Polyprotein (Fragment). [Stocky prune virus] | 2.0e-06 | 29 |
| SiJWA06CAM.scf | Q6QLR4 | (RdRp)RNA-dependent RNA polymerase (Fragment). [ | 3.0e-05 | 37 |
| SiJWC05ADI.scf | Q5ZP67 | Soluble protein. [ | 7.0e-05 | 38 |
| SI.CL.40.cl.4005.Contig1 | P03515 | (N)Nucleocapsid protein (Nucleoprotein). [ | 4.0e-04 | 32 |
| SiJWG01BBJ2.scf | Q9JGN8 | (p1vc)P1. 339K. [Rice grassy stunt virus] | 0.001 | 23 |
| SiJWD07ACK.scf | Q8BDE0 | Replicase polyprotein. [Acute bee paralysis virus] | 0.002 | 25 |
| SI.CL.10.cl.1089.Contig1 | Q9YMJ7 | Envelope protein. [ | 0.003 | 23 |
| SI.CL.16.cl.1675.Contig1 | Q9YW13 | (MSV079)Hypothetical protein MSV079. [ | 0.004 | 42 |
| SiJWH05ADG.scf | Q76LW4 | Polyprotein. [Kakugo virus] | 0.008 | 27 |
| SiJWE11AAZ.scf | Q5ZNU9 | Soluble protein. [ | 0.01 | 34 |