
Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda.

Youping Deng, Yinghua Dong, Venkata Thodima, Rollie J Clem, A Lorena Passarelli.

Abstract

BACKGROUND: Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm.
RESULTS: We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences.
CONCLUSION: S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses.

Year:  2006        PMID: 17052344      PMCID: PMC1634997          DOI: 10.1186/1471-2164-7-264

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

The nucleotide sequences of numerous animal genomes have been determined and include examples from both vertebrate and invertebrate organisms. In addition, determination of the genomic sequences of many more organisms is in progress, yielding a broad picture of the diversity and common pathways among differing organisms. Genome sequences for the insects Apis mellifera (honeybee), Anopheles gambiae (mosquito), Drosophila melanogaster (fruit fly), and Bombyx mori (silkworm) have been reported [1-5], and additional insect genome sequences including Acyrthosiphon pisum (pea aphid), Aedes aegypti and Culex pipiens (mosquitoes), several Drosophila species, Nasonia vitripennis (parasitoid wasp), Rhodnius prolixus (insect vector for Trypanosoma cruzi), and Tribolium castaneum (red flour beetle) are anticipated. In addition, a number of EST databases derived from Lepidoptera are available (NCBI dbEST). The Lepidoptera (moths and butterflies) are the second largest order of the class Insecta. As such, they are a diverse group of insects distributed worldwide and across climates ranging from Siberia to the tropics. Thus, it is important to compare the genomes of a number of species within the order with one another and with those of other insects. Lepidoptera are viewed as being among the most beautiful insects, yet their larvae are major pests of economically important crops and forests. Among the Lepidoptera, the silkworm, B. mori, has been studied intensively, since it is a commercially important insect. In addition, the Lepidoptera are valued as models to examine insect-plant and insect-pathogen interactions. Their study and comparative genomic analyses will yield valuable tools for insect pest management and for the improvement of widely used lepidopteran pathogens, the baculoviruses, as foreign gene expression vectors.
In this study, we report the establishment of an expressed sequence tag (EST) database of 1,195 unique sequences from the cell line IPLB-SF-21 (SF-21) [6], a cell line derived from immature ovaries of pupae of the fall armyworm, Spodoptera frugiperda. This and other EST databases can serve as a starting point from which to survey other S. frugiperda genomic clones or to explore gene expression profiles using microarray assays. More importantly, these and additional ESTs can be the basis for comparative genomic analyses among Lepidoptera or other genomes within the Insecta.

Results

General sequence survey

To start characterizing SF-21 sequences, we used two independent, directionally cloned cDNA libraries that had previously been constructed for yeast two-hybrid screens in the plasmid vectors pB42AD and pYES2 (Invitrogen). Initially, about 200 clones were partially sequenced from each library to assess library quality. Both libraries yielded acceptable results. The library cloned in pYES2 was selected for further sequencing. In all, a total of 3,365 cloned inserts were subjected to single-pass sequencing from their 5' ends, 192 clones in the vector pB42AD and 3,173 clones in the vector pYES2. The 3,365 sequences were trimmed of vector sequences, poly A/T tails, low-quality regions, adaptor sequences, and contaminating bacterial sequences, and screened for a minimum length of 200 bp. This resulted in a total of 2,367 high quality ESTs with an average length of 610 bases (156 sequences from pB42AD and 2,211 sequences from pYES2). No attempt was made to carry out sequencing to saturation. These ESTs were assembled using the CAP3 [7] program and verified using the Phrap [8] program. Both programs assemble overlapping ESTs into contigs. A total of 1,417 ESTs were assembled into 244 contigs, leaving 951 sequences as singlets. Contigs and singlets together resulted in 1,195 unique sequences that putatively represent different transcripts. The number of ESTs in the 244 contigs varied from 2 to 63; 56% of contig sequences had two ESTs, 10% had three ESTs, and 10% had more than 10 ESTs (Fig. 1). The average length of the assembled contigs was 854 bases, longer than the average length of singlets (617 bases). The longest contig sequence, contig 138, was 2,361 bases.
Figure 1

Percentage distribution of contig sequences by number of ESTs. The color-coded legend indicates the number of ESTs in the contig sequences.

Highly redundant genes

A total of 14 contigs containing 307 ESTs were highly redundant. This accounted for about 13% of the total high quality ESTs. The minimum number of ESTs that made up these most highly redundant contigs was 13 (Table 1). The distribution of ESTs in each contig can be accessed via the ESTMD database [9] using the contig viewer search function. The best matched genes of 6 contigs, totaling 138 ESTs, are from S. frugiperda, verifying the source of the cDNAs. Nearly half (6) of the highly redundant contigs, totaling 90 ESTs, had significant homology to various ribosomal proteins, indicating high transcript abundance of ribosomal protein genes, as expected. Four contigs totaling 89 ESTs had matches to sequences derived from mitochondrial cytochrome b or cytochrome oxidase subunits (Table 1). The most redundant contig was composed of 63 ESTs and had significant homology to NADH dehydrogenase subunit 1 (ND-1) from S. frugiperda [10].
Table 1

Most abundantly represented transcripts in the Spodoptera frugiperda cDNA library.

Contig      ESTs  GI#       Bit score  E-value  Identities  Gene description                 Organism
Contig 190  13    40363707  238        1e-61    129/215     cytochrome oxidase II            Glyphodes bicolor
Contig 98   13    18314310  241        5e-96    122/159     Cytochrome c oxidase subunit 3   Ostrinia furnacalis
Contig 116  13    16566722  458        e-128    229/244     ribosomal protein S3A            S. frugiperda
Contig 146  13    18253045  138        5e-32    76/112      60S acidic ribosomal protein P2  S. frugiperda
Contig 61   14    7302066   249        6e-65    129/243     CG11522-PB, isoform B            D. melanogaster
Contig 225  14    18253043  147        1e-34    78/111      60S acidic ribosomal protein P1  S. frugiperda
Contig 70   15    54609281  452        e-126    233/307     ribosomal protein SA             B. mori
Contig 139  17    27260896  411        e-113    204/218     ribosomal protein S2             S. frugiperda
Contig 160  17    22094837  389        e-106    199/283     Cytochrome b                     Samia cynthia ricini
Contig 23   18    18253041  550        e-155    283/315     60S acidic ribosomal protein P0  S. frugiperda
Contig 134  21    12585261  1119       0        568/608     Heat shock 70 kDa cognate 4      Manduca sexta
Contig 239  30    39752635  480        0        234/241     elongation factor-1 alpha F2     D. melanogaster
Contig 141  46    1438928   685        0        364/504     Cytochrome oxidase subunit 1     Feltia jaculifera
Contig 196  63    552886    226        e-111    102/104     ND-1 protein gene                S. frugiperda
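
The redundancy figures quoted for Table 1 can be re-derived from the EST counts alone. The following is a minimal sketch, not part of the original analysis: the per-contig counts are copied from Table 1, the grouping into ribosomal and mitochondrial contigs follows the gene descriptions there, and the function and variable names are illustrative only.

```python
# EST counts per contig, copied from Table 1 (contig number -> number of ESTs).
TABLE1_ESTS = {
    190: 13, 98: 13, 116: 13, 146: 13, 61: 14, 225: 14, 70: 15,
    139: 17, 160: 17, 23: 18, 134: 21, 239: 30, 141: 46, 196: 63,
}
# Groupings read off the gene descriptions in Table 1.
RIBOSOMAL = {116, 146, 225, 70, 139, 23}
MITO_CYTOCHROME = {190, 98, 160, 141}
TOTAL_HIGH_QUALITY_ESTS = 2367  # from the general sequence survey


def est_total(contigs):
    """Sum the EST counts of the given contig numbers."""
    return sum(TABLE1_ESTS[c] for c in contigs)


def percent_of_all_ests(n_ests):
    """Express an EST count as a percentage of all high quality ESTs."""
    return 100.0 * n_ests / TOTAL_HIGH_QUALITY_ESTS


redundant_ests = est_total(TABLE1_ESTS)       # all 14 contigs
ribosomal_ests = est_total(RIBOSOMAL)         # 6 ribosomal protein contigs
cytochrome_ests = est_total(MITO_CYTOCHROME)  # 4 cytochrome contigs
redundant_share = percent_of_all_ests(redundant_ests)
```

Keeping the counts in one dictionary makes it easy to recheck the totals if the table is revised.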

Comparative sequence analysis of S. frugiperda cDNA data

We used the 1,195 unique ESTs to search non-redundant protein databases using BLASTX (Table 2). A total of 724 sequences (60.6%) matched known proteins at a cut-off expectation (E)-value of 10^-5 or below. Eleven sequences (0.9%) had hits with E-values below 10^-150, 53 sequences (4.4%) had hits with E-values between 10^-150 and 10^-100, 283 sequences (23.6%) had hits with E-values between 10^-100 and 10^-50, 237 sequences (19.8%) had hits with E-values between 10^-50 and 10^-20, and 140 sequences (12%) had hits with E-values between 10^-20 and 10^-5. Most matches fell between 10^-100 and 10^-20: these two ranges together included 520 unique sequences (283 + 237), more than 70% of the 724 matched sequences. The remainder of the unique sequences (39.4%) had no meaningful matches (E > 10^-5).
Table 2

Comparative analysis of Spodoptera ESTs to Drosophila^1 and other sequences.

                 All matches                            Drosophila
                 Contig       Singlets     Total        Contig       Singlets     Total
Homology         N      %     N      %     N      %     N      %     N      %     N      %
E ≤ 10^-150      11     5     0      0     11     2     10     5     0      0     10     1
E ≤ 10^-100      30     15    23     4     53     7     30     16    23     5     53     8
E ≤ 10^-50       92     45    191    37    283    39    88     46    186    38    274    40
E ≤ 10^-20       48     24    189    36    237    33    45     24    179    37    224    33
E ≤ 10^-5        22     11    118    23    140    20    17     9     99     20    116    17
Total matched    203    91    521    55    724    61    190    85    487    50    677    57
No match         41     9     430    45    471    39    54     22    484    50    518    43
Total            244    100   951    100   1195   100   244    100   951    100   1195   100

^1 [11]
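
The E-value classes of Table 2 amount to a simple binning of each sequence's best hit. The sketch below shows one way such a binning could be done; it is an illustration under stated assumptions rather than the procedure actually used. The bin boundaries come from Table 2, while the function names, the use of `None` for a sequence without any hit, and the demo entries are assumptions made for the example.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Optional

# Upper bounds of the E-value classes in Table 2, most to least significant.
BIN_EDGES = [1e-150, 1e-100, 1e-50, 1e-20, 1e-5]
BIN_LABELS = [
    "E <= 1e-150", "E <= 1e-100", "E <= 1e-50",
    "E <= 1e-20", "E <= 1e-5", "no match",
]


def evalue_bin(evalue: Optional[float]) -> str:
    """Return the first class whose upper bound is >= evalue.

    None (no hit at all) and any E-value above 1e-5 count as "no match".
    """
    if evalue is None:
        return "no match"
    return BIN_LABELS[bisect_left(BIN_EDGES, evalue)]


def bin_counts(best_evalues: Dict[str, Optional[float]]) -> Counter:
    """Count unique sequences per class from their best-hit E-values."""
    return Counter(evalue_bin(e) for e in best_evalues.values())


# Illustrative input only: two E-values quoted in the text and one
# hypothetical sequence without a hit.
demo = {"Contig 134": 1e-154, "Contig 14": 2e-33, "unmatched_example": None}
demo_counts = bin_counts(demo)
```

Because each class is defined by its upper bound only, a sequence is counted once, in the most significant class it reaches.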

Given that Drosophila is the most thoroughly annotated insect genome [11], we compared S. frugiperda unique sequences with Drosophila genes using BLASTX. A total of 677 sequences, or 56.7% of the 1,195 unique sequences, had hits with Drosophila genes at E < 10^-5 (Table 2). A subset of 53 unique sequences (4.4%) matched Drosophila genes at a cut-off of E < 10^-100. A total of 274 sequences (22.9%) had matches with Drosophila genes between 10^-100 and 10^-50, 224 sequences (18.7%) had matches between E-values of 10^-50 and 10^-20, and 116 sequences (9.7%) had matches between E-values of 10^-20 and 10^-5 (Table 2).
We compared our unique sequences from the SF-21 cell line with ESTs obtained from another S. frugiperda-derived cell line, Sf9 [12], using BLASTN [13]. A total of 419 sequences (35%) matched the ESTs from Sf9 cells with an E-value equal to 0 (Table 5). A total of 241 sequences (20.2%) were similar but not exact matches with ESTs from Sf9, with E-values > 0 and < 10^-5. In addition, almost half of our ESTs (535 sequences or 44.8%) had no significant match with the Sf9 ESTs. Therefore, 776 of our sequences (65%) were not previously reported in the Sf9 EST project.
Table 5

Comparative analysis of Spodoptera frugiperda SF-21-derived ESTs with Sf9- and midgut-derived ESTs^1.

                        S. frugiperda Sf9 ESTs                    S. frugiperda midgut ESTs
                        Contig        Singlets      Total         Contig        Singlets      Total
Homology                N     %       N     %       N     %       N     %       N     %       N     %
E ≤ 0                   243   99.59   176   18.51   419   35.06   7     2.87    81    8.52    88    7.36
0 < E ≤ 10^-150         0     0.00    37    3.89    37    3.10    2     0.82    17    1.79    19    1.59
10^-150 < E ≤ 10^-100   0     0.00    47    4.94    47    3.93    4     1.64    16    1.68    20    1.67
10^-100 < E ≤ 10^-50    0     0.00    58    6.10    58    4.85    0     0.00    24    2.52    24    2.01
10^-50 < E ≤ 10^-20     0     0.00    49    5.15    49    4.10    1     0.41    30    3.15    31    2.59
10^-20 < E ≤ 10^-5      0     0.00    50    5.26    50    4.18    4     1.64    72    7.57    76    6.36
Total matched           243   99.59   417   43.85   660   55.23   18    7.38    240   25.24   258   21.59
No match                1     0.41    534   56.15   535   44.77   226   92.62   711   74.76   937   78.41
Total                   244   100     951   100     1195  100     244   100     951   100     1195  100

^1 SF-21 ESTs, this report; Sf9 ESTs [12]; midgut ESTs (NCBI dbEST)

We also compared our sequences with about 4,000 S. frugiperda midgut-specific ESTs available in the NCBI dbEST database using BLASTN. Only 88 sequences (7.36%) matched with an E-value equal to 0 (Table 5). The remaining sequences matched midgut ESTs to different extents: 19 sequences (1.59%) had E-values between 0 and 10^-150, 20 sequences (1.67%) had E-values between 10^-150 and 10^-100, 24 sequences (2.01%) had E-values between 10^-100 and 10^-50, 31 sequences (2.59%) had E-values between 10^-50 and 10^-20, and 76 sequences (6.36%) had E-values between 10^-20 and 10^-5. A total of 937 sequences (78.4%) had no hits with the available midgut ESTs.
In addition, we compared our unique sequences with those of the silkworm B. mori. Because the genome sequence is not fully annotated, we used BLASTN to search all available B. mori EST sequences through a BLAST search site [14]. A total of 492 of the 1,195 unique sequences (41.17%) had hits with silkworm sequences at E < 10^-5 (Table 6). Of these, 133 unique ESTs (27% of the 492 sequences) had E-values between 10^-100 and 10^-50. A total of 703 sequences (58.8%) had no matches with silkworm sequences.
Table 6

Comparative analysis of Spodoptera frugiperda SF-21 ESTs with silkworm ESTs.

                        Bombyx mori ESTs
                        Contig        Singlets      Total
Homology                N     %       N     %       N     %
E ≤ 0                   53    21.72   7     0.74    60    5.02
0 < E ≤ 10^-150         17    6.97    12    1.26    29    2.43
10^-150 < E ≤ 10^-100   51    20.90   60    6.31    111   9.29
10^-100 < E ≤ 10^-50    44    18.03   89    9.36    133   11.13
10^-50 < E ≤ 10^-20     13    5.33    79    8.31    92    7.70
10^-20 < E ≤ 10^-5      13    5.33    54    5.68    67    5.61
Total matched           191   78.28   301   31.65   492   41.17
No match                53    21.72   650   68.35   703   58.83
Total                   244   100     951   100     1195  100


Conserved S. frugiperda and Drosophila gene sequences

We found 11 highly conserved sequences between S. frugiperda and Drosophila genes based on BLASTX analyses. All of the 11 sequences were from contigs, with one, contig 134, having an E-value of 10^-154. Six sequences had matches with their homologous Drosophila genes at an E-value of 0. We chose contig 134 for phylogenetic analysis given it was the most conserved sequence between Spodoptera and Drosophila. The sequences from heat shock 70 cognate 4 proteins were aligned with CLUSTALW, and only similar sequences with complete coding sequences (CDS) were included in the alignment, as described in Methods. Alignments of the heat shock protein 70 cognate 4 (contig 134) with similar ones in the class Insecta (Fig. 2) showed that the heat shock protein 70 cognate 4 of S. frugiperda formed a single clade with those of Trichoplusia ni, Manduca sexta, Bombyx mori, and Lonomia obliqua, as expected since all these organisms belong to the order Lepidoptera. This clade shares a common ancestor with members of other orders, Diptera, Orthoptera, and Hymenoptera, and with insects in other clades (e.g., Ceratitis capitata, Chironomus tentans, Drosophila melanogaster, Anopheles gambiae, Locusta migratoria, and Cotesia rubecula).
Figure 2

Phylogenetic analysis with neighbor-joining tree. A. The heat shock proteins of 10 insects and 8 other organisms (Cotesia rubecula, Ceratitis capitata, Chironomus tentans, Manduca sexta, Locusta migratoria, Drosophila melanogaster, Anopheles gambiae, Lonomia obliqua, Bombyx mori, Trichoplusia ni, Bos taurus, Gallus gallus, Rattus norvegicus, Danio rerio, Xenopus laevis, Caenorhabditis elegans, Mus musculus, Homo sapiens) along with Contig 134 (heat shock 70 cognate 4 protein) are presented in unrooted phylogenetic analyses. B. Phylogenetic tree showing heat shock proteins with Saccharomyces cerevisiae as the outgroup. The bootstrap values (percentages) are indicated at the corresponding node.

Functional classification of S. frugiperda ESTs

Gene Ontology (GO) has been widely used for gene function annotation and classification [15]. GO describes gene function using a controlled, hierarchical vocabulary organized into three ontologies: molecular function, biological processes, and cellular components. In this report, we used the well-annotated GO information of Drosophila melanogaster to interpret the gene function of our ESTs. Each unique sequence from S. frugiperda was assigned the gene function of its best BLASTX hit (E ≤ 10^-5) among Drosophila sequences, based on the annotated GO of Drosophila [15]. This method has been successfully used to annotate bee brain EST function [16]. The major GO categories for the unique sequences included those outlining gene molecular function (Additional file 1- Table 7), biological processes (Additional file 2- Table 8), and cellular components (Additional file 3- Table 9). The highest final child GO term for molecular function was the hydrogen transporting two-sector ATPase in the nucleotide binding category. The highest final GO term for biological processes was protein biosynthesis, under the protein metabolism and biosynthesis categories, which had 84 unique sequences accounting for 7% of the total unique sequences matched in this category. The largest number for a final GO term in cellular components was cytosolic large ribosomal subunit under both the ribosome and cytosol categories. Seventy-eight unique sequences belonged to this GO term, which accounted for 6.5% of the total unique sequences annotated for cellular components. We found 13 unique sequences (1.1%) showing significant similarity with Drosophila signal transduction factors (Table 3). Among these, 6 sequences belonged to the receptor binding category and the remaining 7 sequences belonged to receptor and receptor signaling proteins.
Table 3

Signal transduction sequences of Spodoptera frugiperda compared with Drosophila genes.

S. frugiperda sequence    FlyBase number  Hit length  Bit score  E-value   Identities  Drosophila gene  Gene description
pyes2-ct_019_b03.p1ca     FBgn0039541     836         275        1.00e-74  138/266     CG12876          Signal transduction activity
pyes2-ct_006_f12.p1ca     FBgn0035771     753         360        e-104     164/231     CG8583           Signal recognition particle binding
pyes2-ct_021_a12.p1ca     FBgn0027363     689         84         1.00e-17  35/59       Stam             Signal transducing adaptor molecule
Contig 14                 FBgn0003963     1191        139        2.00e-33  86/223      ush              Involved in torso signaling pathway
pyes2-ct_005_g01.p1ca     FBgn0035771     753         222        1.00e-58  104/149     CG8583           Involved in signal recognition particle complex
pyes2-ct_006_f09.p1ca     FBgn0037277     2228        314        3.00e-86  152/242     CG17735          Ligand-dependent nuclear receptor interactor activity
pyes2-ct_028_g06.p1ca     FBgn0020618     318         199        1.00e-51  96/107      Rack1            Receptor of activated protein kinase C 1
Contig 140                FBgn0020618     318         578        e-165     274/319     Rack1            Receptor of activated protein kinase C 1
pyes2-ct_030_g06.p1ca     FBgn0004569     444         67         4.00e-12  29/43       argos            Receptor antagonist activity
p42ad_2_001_b07.p1cb.exp  FBgn0037113     1258        133        3.00e-32  71/131      CG33291          Putative protein binding
pyes2-ct_003_e12.p1ca     FBgn0013984     2144        73         1.00e-13  74/313      InR              Insulin like receptor
Contig 220                FBgn0031547     406         125        3.00e-43  65/191      CG3212           Scavenger receptor activity involved in defense response
Contig 226                FBgn0037357     773         102        7.00e-25  52/62       sec23            Putative GTPase activator activity

Based on GO, we also found one sequence for an apoptosis-related gene, pyes2-ct_017_g10.p1ca, which showed similarity to the Drosophila Aac11 gene. Two additional sequences, pb42ad-1_001_f09.pb42 primer and pyes2-ct_010_g11.p1ca, showed significant similarity to Drosophila Gnbp3, a gene involved in defense and immunity.

Pathway analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) classification

KEGG has been widely used for pathway mapping [17]. Enzyme Commission (EC) numbers were used to judge which sequences pertained to a specific pathway. Twenty-nine unique sequences (8 contigs and 21 singlets), accounting for 2.4% of unique sequences, matched enzymes with an EC number. Of these 29 sequences, 11 unique sequences (1% of the total; 4 contigs and 7 singlets) were mapped to KEGG biochemical pathways (Table 4). Genes involved in amino acid metabolism had the highest number of mapped sequences (5), with 6 mappings. Genes involved in carbohydrate metabolism included 5 sequences and 8 mappings. Other genes included those involved in nucleotide metabolism (2 sequences), translation (2 sequences), energy metabolism (1 sequence), lipid metabolism (1 sequence), and metabolism of other amino acids (1 sequence). Contigs 35 and 97 each had 4 mappings to pathways in different metabolism categories, the highest number for a single sequence. Contig 120, contig 152, pyes2-ct_008_e11.p1ca, and pyes2-ct_012_c04.p1ca each mapped to two pathways. The remaining pathway-assigned sequences mapped to only one pathway.
Table 4

KEGG pathway mapping for Spodoptera frugiperda unique sequences.

KEGG pathway                                           Number of sequences  Percentage of total  Sequence ID
Carbohydrate metabolism                                4                    36
  Glycolysis/gluconeogenesis                           3                    27                   Contig 35, contig 97, pyes2-ct_008_e11.p1ca
  Fructose and mannose metabolism                      1                    9                    pyes2-ct_027_b04.p1ca
  Pyruvate metabolism                                  2                    18                   Contig 35, contig 97
  Propanoate metabolism                                1                    9                    Contig 35
  Starch and sucrose metabolism                        1                    9                    pyes2-ct_010_g11.p1ca
Energy metabolism                                      1                    9
  Carbon fixation                                      1                    9                    Contig 97
Lipid metabolism                                       1                    9
  Sphingoglycolipid metabolism                         1                    9                    pyes2-ct_010_a06.p1ca
Nucleotide metabolism                                  2                    18
  Purine metabolism                                    2                    18                   Contig 97, contig 120
Amino acid metabolism                                  5                    45
  Alanine and aspartate metabolism                     1                    9                    Contig 120
  Arginine and proline metabolism                      1                    9                    pyes2-ct_026_h11.p1ca
  Glycine, serine and threonine metabolism             1                    9                    pyes2-ct_012_c04.p1ca
  Cysteine metabolism                                  1                    9                    Contig 35
  Phenylalanine, tyrosine and tryptophan biosynthesis  2                    18                   Contig 152, pyes2-ct_008_e11.p1ca
Metabolism of other amino acids                        1                    9
  Selenoamino acid metabolism                          1                    9                    pyes2-ct_021_f10.p1ca
Translation                                            2                    18
  Aminoacyl-tRNA biosynthesis                          2                    18                   Contig 152, pyes2-ct_012_c04.p1ca
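
The per-sequence mapping counts discussed for Table 4 follow from inverting the pathway-to-sequence listing. The sketch below does this for the sequence IDs copied from Table 4; it is an illustrative check, not the ESTMD implementation, and the variable and function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Pathway -> sequence IDs, copied from Table 4.
PATHWAY_TO_SEQS: Dict[str, List[str]] = {
    "Glycolysis/gluconeogenesis": ["Contig 35", "Contig 97", "pyes2-ct_008_e11.p1ca"],
    "Fructose and mannose metabolism": ["pyes2-ct_027_b04.p1ca"],
    "Pyruvate metabolism": ["Contig 35", "Contig 97"],
    "Propanoate metabolism": ["Contig 35"],
    "Starch and sucrose metabolism": ["pyes2-ct_010_g11.p1ca"],
    "Carbon fixation": ["Contig 97"],
    "Sphingoglycolipid metabolism": ["pyes2-ct_010_a06.p1ca"],
    "Purine metabolism": ["Contig 97", "Contig 120"],
    "Alanine and aspartate metabolism": ["Contig 120"],
    "Arginine and proline metabolism": ["pyes2-ct_026_h11.p1ca"],
    "Glycine, serine and threonine metabolism": ["pyes2-ct_012_c04.p1ca"],
    "Cysteine metabolism": ["Contig 35"],
    "Phenylalanine, tyrosine and tryptophan biosynthesis": ["Contig 152", "pyes2-ct_008_e11.p1ca"],
    "Selenoamino acid metabolism": ["pyes2-ct_021_f10.p1ca"],
    "Aminoacyl-tRNA biosynthesis": ["Contig 152", "pyes2-ct_012_c04.p1ca"],
}


def pathways_per_sequence(pathway_to_seqs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the listing: sequence ID -> pathways it maps to."""
    inverted = defaultdict(list)
    for pathway, seq_ids in pathway_to_seqs.items():
        for seq_id in seq_ids:
            inverted[seq_id].append(pathway)
    return dict(inverted)


seq_to_pathways = pathways_per_sequence(PATHWAY_TO_SEQS)
mapping_counts = {seq: len(paths) for seq, paths in seq_to_pathways.items()}
two_pathway_seqs = sorted(s for s, n in mapping_counts.items() if n == 2)
```

Here `mapping_counts` gives the number of pathways per sequence, which is the quantity the text reports for contigs 35 and 97.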

EST database

To efficiently manage and retrieve information in the ESTs analyzed in this project, we developed an EST model database (ESTMD version 1) [9]. The EST model database is a highly efficient, web-accessible, user-friendly relational database. It provides several tools to search raw, cleaned, and assembled EST sequences, genes and GO, as well as pathway information. The user may input and submit keywords or IDs to the server using the web interface. ESTs and annotated function data are in the relational database and results are sent back to the user in proper formats in response to a query. It also provides a clear contig view and BLAST searches, data submission, and download pages. In both the GO and KEGG pathway search pages, ESTMD allows users to search GO and pathways not only by single gene name, symbol, or ID, but also by using a file that contains a batch of sequence IDs or FlyBase IDs. All the sequence function classification based on GO and KEGG pathway in this study was done using ESTMD. The file search provision in ESTMD allows users to obtain information regarding the possible function of many ESTs or genes at one time instead of searching them individually (Fig. 3).
Figure 3

Gene Ontology search results. The search results used a sequence file by choosing all three ontologies.

Discussion

Single-pass sequencing was performed on 3,365 cDNA clones derived from two SF-21 cDNA libraries. Using these sequence data, we have established an EST database comprising 1,195 unique sequences from the SF-21 cell line, derived from the lepidopteran insect S. frugiperda. A total of 677 unique sequences (57%) had homology to sequences found in Drosophila. These sequences will be useful for comparative genomics within and outside the Lepidoptera, for establishing microarrays, and as probes to either clone or down-regulate genes of interest by RNA interference in order to perform studies related to Spodoptera, other closely related Lepidoptera, or their pathogens. This is one of two published annotated EST studies available for S. frugiperda. A prior report described a project of similar scale, with 5,937 ESTs yielding 1,855 unique sequences from Sf9, a clonal derivative of the SF-21 cell line [12]. The majority of unique sequences in the previous study consisted of the highly abundant ribosomal protein genes, and these were found to have low codon usage bias [12]. Our data provide 776 novel S. frugiperda sequences. Of our sequences, 20.2% were similar but not identical to Sf9 ESTs; whether these reflect cell line-specific differences is not clear at this time. Together these two studies and other available S. frugiperda ESTs constitute seminal work on the genome sequence of S. frugiperda. The sequences reported in this study have been made available for incorporation into Spodobase [18]. Many insects within the Lepidoptera, including the fall armyworm S. frugiperda (family Noctuidae), are pests that cause significant annual damage to a number of field crops and tree foliage worldwide. Deciphering their genomic sequences will aid in developing improved pest control agents, such as baculoviruses and polydnaviruses/parasitic wasps.
Although these pathogens are being used or sought as biological control agents, there remains ample room for improvement of their entomopathogenic properties. Finally, molecular tools have been used in the study of Lepidoptera or have been derived from Lepidoptera. The transposable element piggyBac was discovered in the lepidopteran T. ni (cabbage looper) and has been used to create somatic and germline transformations in a number of organisms including crickets, butterflies, Plasmodium falciparum, and more recently, mice [19-22]. Lepidoptera are also amenable to down-regulation of genes by RNA interference and transgenic techniques [23]. Thus, knowledge of the genomics of Lepidoptera will aid in their manipulation or use as molecular tools.

Conclusion

We have established an EST database from the S. frugiperda-derived cell line SF-21, containing 1,195 unique sequences. Lepidoptera are among the most diverse insects and as such, sequences and EST databases from various genomes will be instrumental in assessing species-specific genes, phylogeny, and parallels within species of the same order. In addition, comparative analyses with available genomes of other insects including A. mellifera, D. melanogaster, A. gambiae, Ae. aegypti, and T. castaneum will yield additional insights since these include members of distinct orders (Hymenoptera, Diptera, and Coleoptera), providing a more accurate picture of the conserved pathways and the order-specific gene elements in the Insecta.

Methods

cDNA library construction

Two independently constructed cDNA libraries were used for sequencing. For both libraries, mRNA isolated from log phase SF-21 cells was used for cDNA synthesis, and the libraries were directionally cloned into plasmid vectors. One library was custom made by Clontech using the plasmid vector pB42AD. The pB42AD library had a titer of 3.6 × 10^13 colony forming units per ml. The second library was constructed using the SuperScript™ Plasmid System (Invitrogen) and the plasmid vector pYES2/CT (Invitrogen) that had been modified by addition of a Sal I linker at the Bam HI site. The pYES2/CT library had a titer of 1.2 × 10^12 colony forming units per ml. The average insert size for both libraries was 1.5 kbp.

EST sequencing

Initially, approximately 200 randomly selected clones from each library were subjected to single-pass sequencing using 5' vector primers. DNA sequencing was performed by MWG Biotech (High Point, NC). Although both libraries yielded acceptable sequence quality, the pYES2/CT library appeared to yield slightly longer sequences; consequently, the remainder of the sequencing was performed using clones from the pYES2/CT library.

Sequence processing

Sequence information was stored in chromatogram trace files, and Phred [24] was used for base-calling. Flanking vector and adaptor sequences were trimmed using Cross-match [25] and Lucy [26], while low quality bases (quality score < 20) were trimmed from both sequence ends using a custom program. RepeatMasker [27] was used to mask repeated sequences, and the masked sequences were further screened to remove contaminating sequences from bacteria and viruses using BLASTN [13]. High quality ESTs were assembled using CAP3 [7] and verified with Phrap [8], which performs a similar task. After assembly, Consed [24] was used to assess contig quality, and assembled ESTs were chosen for further analysis. Contigs flagged for possible misassembly were manually edited in Consed, and potential chimeric ESTs or other suspect ESTs were removed from the pool of traces.
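
The end-trimming step can be illustrated with a short sketch. The custom program itself is not described, so the code below is only a hypothetical stand-in: it assumes per-base Phred quality scores are available as a list of integers, strips bases scoring below 20 from both ends, and applies the 200 bp minimum length reported in Results. All function names are invented for the example.

```python
from typing import Sequence


def trim_low_quality_ends(seq: str, quals: Sequence[int], min_q: int = 20) -> str:
    """Remove bases with quality < min_q from both ends of a read.

    Internal low-quality bases are left untouched; only the ends are trimmed.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have the same length")
    start = 0
    while start < len(seq) and quals[start] < min_q:
        start += 1
    end = len(seq)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


def is_high_quality(trimmed: str, min_len: int = 200) -> bool:
    """Apply the minimum-length screen used for high quality ESTs."""
    return len(trimmed) >= min_len


# Toy read: low-quality bases at both ends, good bases in the middle.
toy_trimmed = trim_low_quality_ends("ACGTACGT", [5, 10, 30, 30, 30, 30, 10, 5])
```

A real pipeline would typically use a sliding window or error-probability criterion instead of a per-base cutoff; the per-base rule is used here only because it is the simplest reading of the stated threshold.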

Sequence annotation

High quality assembled ESTs were annotated using BLASTX through NCBI and our local BLAST server. We searched several databases including the NCBI non-redundant and Drosophila protein databases. The BLAST results were automatically extracted and transferred into a relational database. The sequences reported in this study (2,367 ESTs) have been deposited in GenBank under accession numbers [GenBank: DY792773 to DY795139].

Functional classification

Functional classification of unique sequences from S. frugiperda was based on GO [15]. Unique sequences, including contigs and non-overlapping singlets, were used to search Drosophila predicted protein databases using BLASTX. The GO terms of the Drosophila genes corresponding to the best hits (E-value ≤ 10^-5) with known GO terms were assigned to the query Spodoptera sequences. All the matched GO information was stored in our local MySQL database.
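
The annotation transfer described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the BLASTX hits are available as tab-separated lines in the common 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), and every identifier and GO label in the example data is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(tab_lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Keep, for each query, the hit with the lowest E-value at or below the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in tab_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue  # not significant enough to transfer annotation
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def transfer_go(best: Dict[str, Tuple[str, float]],
                go_by_gene: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Assign each query the GO terms of its best-hit gene, if that gene has any."""
    return {q: go_by_gene[s] for q, (s, _) in best.items() if s in go_by_gene}


# Hypothetical example data (identifiers and GO labels are placeholders).
example_hits = [
    "query_A\tgene_X\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-30\t120",
    "query_A\tgene_Y\t55.0\t100\t45\t0\t1\t300\t1\t100\t1e-10\t60",
    "query_B\tgene_Z\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t20",
]
example_go = {"gene_X": ["GO:placeholder_1"], "gene_Z": ["GO:placeholder_2"]}
annotations = transfer_go(best_hits(example_hits), example_go)
```

In this example `query_B` receives no annotation because its only hit exceeds the E-value cutoff, which mirrors the "no match" class in the tables.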

Pathway assignments

Pathway assignments were carried out according to KEGG mapping [17]. EC numbers [28] were assigned to unique sequences that had BLASTX hits with E-values of 10^-5 or less upon searching the SWIR protein databases. The sequences were mapped to KEGG biochemical pathways according to the EC distribution in the pathway database.

Phylogenetic analysis

Proteins were aligned with CLUSTALW using only CDS in the alignment. The alignment was then used to generate phylogenetic trees by the Neighbour-Joining method using the MEGA version 2.1 program. The bootstrap values for the nodes were determined by analyzing 500 bootstrap replicate data sets to estimate the strength of the groupings.
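
The bootstrap step can be illustrated independently of the tree-building software. The sketch below shows only the resampling of alignment columns with replacement, repeated 500 times as in the analysis above; it does not build neighbor-joining trees and is not how MEGA is implemented. The toy alignment, taxon names, seed, and function names are assumptions made for the example.

```python
import random
from typing import Dict, Iterator


def resample_columns(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Draw alignment columns with replacement to form one bootstrap replicate."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picked) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int = 500,
                         seed: int = 0) -> Iterator[Dict[str, str]]:
    """Yield n_replicates resampled alignments (500 matches the analysis above)."""
    rng = random.Random(seed)
    for _ in range(n_replicates):
        yield resample_columns(alignment, rng)


# Hypothetical toy alignment; real input would be the CLUSTALW output.
toy_alignment = {"taxon_a": "MKVLA", "taxon_b": "MKILA", "taxon_c": "MRV-A"}
replicates = list(bootstrap_replicates(toy_alignment))
```

A tree would then be inferred from each replicate, and the bootstrap value of a node is the percentage of replicate trees in which that grouping appears.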

Database implementation

A web-based interface to the database was created using HTML and JavaScript, with input validated on the client side to reduce the burden on the server. Apache 2.0 was used as the HTTP web server, while Tomcat 4.1 was the servlet container. Both of these programs are developed and maintained for UNIX, Linux, and Windows NT, making ESTMD portable and platform-independent. ESTMD is currently hosted on Red Hat 9, and it can be implemented in MySQL 4.0 or higher. The main tables are clones, ESTs, uniSequence, uniHit, FlyBase, and FlyBaseDetails. The server-side programs were implemented in Java. Servlets and JavaServer Pages were used to communicate between users and databases and to implement queries. XML and XSLT technologies were used to describe, generate, and express GO trees.
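
To make the relational layout and the batch (file-based) search more concrete, the sketch below builds a tiny stand-in database and looks up several sequence IDs in one query. It is an assumption-laden illustration, not the ESTMD schema: only the table names uniSequence and uniHit come from the text, all column names are hypothetical, SQLite (via Python's standard library) is used in place of MySQL, and the two example rows reuse values listed in Table 3.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE uniSequence (seq_id TEXT PRIMARY KEY, kind TEXT);
        CREATE TABLE uniHit (
            seq_id TEXT REFERENCES uniSequence(seq_id),
            flybase_id TEXT,
            evalue REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO uniSequence VALUES (?, ?)",
        [("Contig 140", "contig"), ("Contig 14", "contig")],
    )
    conn.executemany(
        "INSERT INTO uniHit VALUES (?, ?, ?)",
        [("Contig 140", "FBgn0020618", 1e-165), ("Contig 14", "FBgn0003963", 2e-33)],
    )
    return conn


def batch_lookup(conn: sqlite3.Connection, seq_ids: Sequence[str]) -> List[Tuple]:
    """Fetch hits for a batch of sequence IDs with one parameterized query."""
    if not seq_ids:
        return []
    placeholders = ",".join("?" for _ in seq_ids)
    sql = (
        "SELECT seq_id, flybase_id, evalue FROM uniHit "
        f"WHERE seq_id IN ({placeholders}) ORDER BY seq_id"
    )
    return conn.execute(sql, list(seq_ids)).fetchall()


demo_rows = batch_lookup(build_demo_db(), ["Contig 140", "Contig 14", "not_in_db"])
```

Using bound parameters rather than string concatenation for the IDs keeps a batch search safe when the IDs come from a user-supplied file.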

Authors' contributions

ALP and RJC contributed to the conception and design of the study, coordinated the study, and were involved in drafting the manuscript. YDeng participated in data analysis and drafting of the manuscript. YDong and VT were involved in data analysis. All authors read and approved the final manuscript.

Additional file 1- Table 7

Table 7. Distribution of molecular functional categories based on gene ontology for Spodoptera frugiperda unique sequences.

Additional file 2- Table 8

Table 8. Distribution of biological processes based on gene ontology for Spodoptera frugiperda unique sequences.

Additional file 3- Table 9

Table 9. Distribution of cellular component categories based on gene ontology for Spodoptera frugiperda unique sequences.

1.  Sequence evolution in mitochondrial ribosomal and ND-1 genes in lepidoptera: implications for phylogenetic analyses.

Authors:  D P Pashley; L D Ke
Journal:  Mol Biol Evol       Date:  1992-11       Impact factor: 16.240

2.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

3.  Germline transformation of the butterfly Bicyclus anynana.

Authors:  Jeffrey M Marcus; Diane M Ramos; Antónia Monteiro
Journal:  Proc Biol Sci       Date:  2004-08-07       Impact factor: 5.349

4.  KEGG: Kyoto Encyclopedia of Genes and Genomes.

Authors:  H Ogata; S Goto; K Sato; W Fujibuchi; H Bono; M Kanehisa
Journal:  Nucleic Acids Res       Date:  1999-01-01       Impact factor: 16.971

5.  Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors:  B Ewing; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

6.  PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing.

Authors:  D A Nickerson; V O Tobe; S L Taylor
Journal:  Nucleic Acids Res       Date:  1997-07-15       Impact factor: 16.971

7.  Annotation pattern of ESTs from Spodoptera frugiperda Sf9 cells and analysis of the ribosomal protein genes reveal insect-specific features and unexpectedly low codon usage bias.

Authors:  I Landais; M Ogliastro; K Mita; J Nohata; M López-Ferber; M Duonor-Cérutti; T Shimada; P Fournier; G Devauchelle
Journal:  Bioinformatics       Date:  2003-12-12       Impact factor: 6.937

8.  piggyBac-mediated somatic transformation of the two-spotted cricket, Gryllus bimaculatus.

Authors:  Yohei Shinmyo; Taro Mito; Takashi Matsushita; Isao Sarashina; Katsuyuki Miyawaki; Hideyo Ohuchi; Sumihare Noji
Journal:  Dev Growth Differ       Date:  2004-08       Impact factor: 2.053

9.  The establishment of two cell lines from the insect Spodoptera frugiperda (Lepidoptera; Noctuidae).

Authors:  J L Vaughn; R H Goodwin; G J Tompkins; P McCawley
Journal:  In Vitro       Date:  1977-04

Review 10.  Annotation of the Drosophila melanogaster euchromatic genome: a systematic review.

Authors:  Sima Misra; Madeline A Crosby; Christopher J Mungall; Beverley B Matthews; Kathryn S Campbell; Pavel Hradecky; Yanmei Huang; Joshua S Kaminker; Gillian H Millburn; Simon E Prochnik; Christopher D Smith; Jonathan L Tupy; Eleanor J Whitfied; Leyla Bayraktaroglu; Benjamin P Berman; Brian R Bettencourt; Susan E Celniker; Aubrey D N J de Grey; Rachel A Drysdale; Nomi L Harris; John Richter; Susan Russo; Andrew J Schroeder; Sheng Qiang Shu; Mark Stapleton; Chihiro Yamada; Michael Ashburner; William M Gelbart; Gerald M Rubin; Suzanna E Lewis
Journal:  Genome Biol       Date:  2002-12-31       Impact factor: 13.583

View more
