Akiya Jouraku, Kimiko Yamamoto, Seigo Kuwazaki, Masahiro Urio, Yoshitaka Suetsugu, Junko Narukawa, Kazuhisa Miyamoto, Kanako Kurita, Hiroyuki Kanamori, Yuichi Katayose, Takashi Matsumoto, Hiroaki Noda.
Abstract
BACKGROUND: The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests of crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides, including pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. It is therefore important to develop genomic and transcriptomic DBM resources for analyzing genes related to insecticide resistance, both to clarify the mechanism of resistance in DBM and to facilitate the development of insecticides with novel modes of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM).

DESCRIPTION: KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs, which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons, from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered, and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions. KONAGAbase offers graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading of sequences and annotation data. Five search interfaces are provided: BLAST search, keyword search, BLAST result-based search, GO tree-based search, and a genome browser. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers.
Year: 2013 PMID: 23837716 PMCID: PMC3711893 DOI: 10.1186/1471-2164-14-464
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Statistics of the EST/mRNA sequences
| midgut | 12,406 | 49.2% | 5,960 | 480.4 | 508 | 879 |
| egg | 6,904 | 42.6% | 3,082 | 446.4 | 453 | 855 |
| testis | 16,308 | 44.2% | 9,691 | 594.2 | 633 | 880 |
| NCBI | 1,722 | 50.1% | 1,112 | 645.6 | 454 | 16,113 |
| Total | 37,340 | 45.8% | 19,844 | 531.4 | 563 | 16,113 |
Figure 1. Workflow for constructing a putative gene set from genomic and transcriptomic sequences of the diamondback moth.
Statistics of the WGS assembled sequences
| contig | 88,530 | 38.3% | 186,028 (51.0%) | 2,101.3 | 1,619 | 24,960 |
| degenerate contig | 246,244 | 38.8% | 147,783 (40.5%) | 600.1 | 558 | 12,183 |
| singleton | 106,455 | 42.0% | 31,160 (8.5%) | 292.7 | 287 | 727 |
Percentage values in the “Total” columns are the ratio of the values to the sum of the length of all the assembled sequences. N50 of the contig sequences is 2,273 bp.
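The two quantities in this note can be reproduced with a few lines of Python. The sketch below recomputes each category's share of the summed assembled length from the table values (taken here to be in kb, an assumption based on the sequence counts and average lengths), and applies the standard N50 definition to a toy list of lengths. The function names are illustrative and are not part of KONAGAbase.

```python
def length_shares(total_lengths):
    """Return each category's share (%) of the summed length, to one decimal."""
    grand_total = sum(total_lengths.values())
    return {name: round(100.0 * value / grand_total, 1)
            for name, value in total_lengths.items()}


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


# Total lengths from the table above (assumed to be in kb).
wgs_totals = {
    "contig": 186_028,
    "degenerate contig": 147_783,
    "singleton": 31_160,
}
shares = length_shares(wgs_totals)
print(shares)

# Toy lengths only; the individual contig lengths are not listed in this record.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
print(toy_n50)
```

Run on the table values, the shares come out as 51.0%, 40.5%, and 8.5%, matching the percentages given in parentheses.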
Statistics of the putative gene set
| unigene | 20,870 | 47.3% | 18.6 | 890.3 | 654 | 16,249 |
| predicted gene | 11,930 | 54.3% | 7.45 | 624.5 | 453 | 11,460 |
| Total | 32,800 | 49.3% | 26.0 | 793.6 | 570 | 16,249 |
Figure 2. Feature block diagram of KONAGAbase. Screenshots of the corresponding web pages are presented for the top page, sequence download page, BLAST search form, GBrowse, BLAST search result, keyword search form, BLAST result search form, GO tree search form, sequence search result, and detailed sequence information page. Sequences found by BLAST or by the three search forms can be downloaded as tab-delimited text, CSV, or FASTA files. Links to external public databases include the NCBI database, UniRef90, the Gene Ontology, SPODOBASE, KAIKObase, SilkDB, Manduca Base, MonarchBase, FlyBase, BeetleBase, VectorBase, BeeBase, and AphidBase.
Figure 3. Overview of the database architecture.
Results of BLAST search for the putative gene set
| Database | Putative genes with hits (% of 32,800) | Average query coverage | Average subject coverage |
|---|---|---|---|
| NCBI nr | 25,290 (77.1%) | 72.4% | 40.5% |
| | 27,659 (84.3%) | 69.0% | 39.8% |
| | 29,556 (90.1%) | 71.0% | 36.2% |
| | 29,518 (90.0%) | 72.0% | 36.4% |
| | 21,281 (64.9%) | 69.2% | 34.9% |
| | 23,045 (70.3%) | 69.6% | 35.7% |
| | 21,599 (65.9%) | 68.3% | 35.7% |
| | 21,414 (65.3%) | 68.5% | 35.8% |
| | 21,587 (65.8%) | 68.6% | 35.5% |
| | 13,399 (40.9%) | 46.4% | 40.3% |
| | 16,278 (49.6%) | 44.9% | 38.9% |
The average query coverage was calculated by dividing the aligned length by the length of a query sequence (a putative gene of DBM). Similarly, the average subject coverage was calculated by dividing the aligned length by the length of a subject sequence (a sequence of the target database).
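The coverage definitions above translate directly into code. The following is a minimal sketch, assuming one aligned length per hit (as in the text) and assuming that the averages are plain means of the per-hit ratios; the `Hit` class and its field names are hypothetical and do not come from KONAGAbase or from any BLAST library.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST alignment; all field names are hypothetical."""
    aligned_length: int
    query_length: int    # length of the putative DBM gene
    subject_length: int  # length of the target database sequence


def coverage(hit):
    """Return (query coverage, subject coverage) as fractions."""
    return (hit.aligned_length / hit.query_length,
            hit.aligned_length / hit.subject_length)


def average_coverage(hits):
    """Average the per-hit coverages over a non-empty list of hits."""
    pairs = [coverage(h) for h in hits]
    n = len(pairs)
    return (sum(q for q, _ in pairs) / n,
            sum(s for _, s in pairs) / n)


# Made-up hits for illustration only.
hits = [Hit(300, 400, 1000), Hit(450, 500, 600)]
avg_query, avg_subject = average_coverage(hits)
print(f"query {avg_query:.1%}, subject {avg_subject:.1%}")
```

A query coverage well above the subject coverage, as in the table, is what one would expect when many putative genes are partial sequences aligned to full-length database entries.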
Classification of identified putative repeat sequences
| Transposon | LINE/SINE | 1,776 | 135,174 | 24,072.6 | 6.60% |
| LTR | 482 | 12,451 | 2,642.4 | 0.72% | |
| DNA | 61 | 15,994 | 3,825.6 | 1.05% | |
| Others | 80 | 28,354 | 3,187.7 | 0.87% | |
| Unclassified | 3,911 | 601,647 | 84,920.6 | 23.27% | |
| Total | 6,310 | 893,620 | 118,649.0 | 32.51% | |
“Percentage of masked bases” is the percentage of masked bases against the total bases of the WGS contigs, degenerate contigs, and singletons.
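As a consistency check, the percentages can be recomputed from the tables. This sketch assumes that the second-to-last column of the repeat table is the masked length in kb and that the denominator is the summed length of the WGS contigs, degenerate contigs, and singletons from the WGS statistics table; both readings are assumptions, since the column headers are not reproduced here.

```python
def masked_percentage(masked_kb, total_kb):
    """Percentage of masked bases relative to all assembled bases."""
    return round(100.0 * masked_kb / total_kb, 2)


# Summed length of contigs, degenerate contigs, and singletons (assumed kb).
total_assembled_kb = 186_028 + 147_783 + 31_160

# Masked lengths per repeat class (assumed kb), copied from the table.
masked_kb = {
    "LINE/SINE": 24_072.6,
    "LTR": 2_642.4,
    "DNA": 3_825.6,
    "Others": 3_187.7,
    "Unclassified": 84_920.6,
    "Total": 118_649.0,
}

percentages = {name: masked_percentage(value, total_assembled_kb)
               for name, value in masked_kb.items()}
print(percentages)
```

Under these assumptions the recomputed values agree with the table (for example 6.60% for LINE/SINE and 32.51% in total).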
Top 10 most represented genes among the sequenced transcripts in the midgut
| 1 | serine protease | 3,563 | 265 | 28.72% |
| 2 | lipase | 669 | 99 | 5.39% |
| 3 | cytochrome c oxidase | 438 | 16 | 3.53% |
| 4 | ribosomal protein | 380 | 95 | 3.06% |
| 5 | mucin | 314 | 46 | 2.53% |
| 6 | ATP synthase | 205 | 27 | 1.65% |
| 7 | glucosinolate sulfatase | 179 | 12 | 1.44% |
| 8 | carboxypeptidase | 147 | 26 | 1.18% |
| 9 | fibroin | 134 | 11 | 1.08% |
| 10 | ferritin | 108 | 7 | 0.87% |
Top 10 most represented genes among the sequenced transcripts in the egg
| 1 | ribosomal protein | 1,009 | 107 | 14.61% |
| 2 | cytochrome c oxidase | 450 | 10 | 6.52% |
| 3 | ATP synthase | 112 | 37 | 1.62% |
| 4 | heat shock protein | 99 | 36 | 1.43% |
| 5 | actin | 94 | 14 | 1.36% |
| 6 | elongation factor | 83 | 19 | 1.20% |
| 7 | cuticle protein | 83 | 27 | 1.20% |
| 8 | nucleoplasmin-like protein | 73 | 2 | 1.06% |
| 9 | myosin | 56 | 6 | 0.81% |
| 10 | zinc finger protein | 51 | 38 | 0.74% |
Top 10 most represented genes among the sequenced transcripts in the testis
| 1 | arylphorin-like hexamerin | 638 | 27 | 3.91% |
| 2 | elongation factor | 582 | 26 | 3.57% |
| 3 | initiation factor | 329 | 45 | 2.02% |
| 4 | heat shock protein | 246 | 49 | 1.51% |
| 5 | zinc finger protein | 210 | 111 | 1.29% |
| 6 | ribosomal protein | 169 | 34 | 1.04% |
| 7 | tubulin | 165 | 39 | 1.01% |
| 8 | protein kinase | 139 | 50 | 0.85% |
| 9 | ATP synthase | 138 | 30 | 0.85% |
| 10 | protein disulfide isomerase | 117 | 13 | 0.72% |
Number of putative genes containing highly conserved domains of P450, GST, and ABC transporter genes in DBM
| Data set | P450 | GST | ABC transporter |
|---|---|---|---|
| DS1 | 137 (137) | 22 (22) | 70 (74) |
| DS2 | 61 (61) | 20 (20) | 54 (57) |
Two nucleotide-binding domains (NBDs) can be extracted from a single ABC gene. DS1 contains 4 such ABC genes and DS2 contains 3.
Number of putative P450 genes in each P450 clan/family
| Clan | Family | DS1-UBH | DS2-UBH | DS2-PTB | Bombyx mori |
|---|---|---|---|---|---|
| CYP2 clan | CYP15 | 1 | 1 | 1 | 1 |
| | CYP18 | 2 | 2 | 2 | 2 |
| | CYP303 | 2 | 1 | 1 | 1 |
| | CYP304 | 1 | 1 | 1 | 0 |
| | CYP305 | 1 | 1 | 1 | 1 |
| | CYP306 | 1 | 1 | 1 | 1 |
| | CYP307 | 1 | 1 | 1 | 1 |
| | Total | 9 | 8 | 8 | 7 |
| CYP3 clan | CYP6 | 31 | 21 | 21 | 16 |
| | CYP9 | 14 | 3 | 3 | 7 |
| | CYP321 | 2 | 2 | 2 | 0 |
| | CYP324 | 0 | 0 | 0 | 1 |
| | CYP332 | 0 | 0 | 0 | 1 |
| | CYP337 | 2 | 1 | 1 | 2 |
| | CYP338 | 2 | 0 | 0 | 1 |
| | CYP347 | 1 | 1 | 1 | 0 |
| | CYP354 | 3 | 1 | 1 | 1 |
| | CYP365 | 1 | 1 | 1 | 1 |
| | Total | 56 | 30 | 30 | 30 |
| CYP4 clan | CYP4 | 20 | 9 | 7 | 12 |
| | CYP340 | 20 | 5 | 5 | 13 |
| | CYP341 | 6 | 0 | 0 | 8 |
| | CYP366 | 4 | 1 | 1 | 1 |
| | CYP367 | 4 | 0 | 2 | 2 |
| | Total | 54 | 15 | 15 | 36 |
| Mito. clan | CYP49 | 3 | 1 | 1 | 2 |
| | CYP301 | 1 | 0 | 0 | 1 |
| | CYP302 | 2 | 2 | 2 | 1 |
| | CYP314 | 2 | 1 | 1 | 1 |
| | CYP315 | 1 | 1 | 1 | 1 |
| | CYP333 | 7 | 3 | 3 | 4 |
| | CYP339 | 2 | 0 | 0 | 1 |
| | Total | 18 | 8 | 8 | 11 |
| Total (All) | | 137 | 61 | 61 | 84 |
The number of putative P450 genes in each P450 clan/family was estimated using two sets of P450 domain (Pfam ID: PF00067) sequences (DS1 and DS2). UBH-based classification was performed for both DS1 (DS1-UBH) and DS2 (DS2-UBH). Phylogenetic tree-based classification was performed for DS2 (DS2-PTB). The number of P450 genes in Bombyx mori was obtained from [63].
Number of putative GST genes in each GST class
| Class | DS1-UBH | DS2-UBH | DS2-PTB | Bombyx mori |
|---|---|---|---|---|
| Delta | 4 | 4 | 4 | 4 |
| Epsilon | 6 | 5 | 6 | 8 |
| Omega | 4 | 3 | 3 | 4 |
| Sigma | 2 | 2 | 2 | 2 |
| Theta | 3 | 3 | 1 | 1 |
| Zeta | 2 | 2 | 2 | 2 |
| Unclassified | 1 | 1 | 2 | 2 |
| Total | 22 | 20 | 20 | 23 |
The number of putative GST genes in each GST class in DBM was estimated using two sets of GST N-terminal domain (Pfam ID: PF02798) sequences (DS1 and DS2). UBH-based classification was performed for both DS1 (DS1-UBH) and DS2 (DS2-UBH). Phylogenetic tree-based classification was performed for DS2 (DS2-PTB). The number of GST genes in Bombyx mori was obtained from [62].
Number of putative ABC transporter genes in each ABC transporter family
| Family | DS1-UBH | DS2-UBH | DS2-PTB | Bombyx mori |
|---|---|---|---|---|
| ABCA | 8 | 7 | 7 | 9 |
| ABCB | 18 | 11 | 11 | 9 |
| ABCC | 20 | 15 | 15 | 15 |
| ABCD | 2 | 2 | 2 | 2 |
| ABCE | 1 | 1 | 1 | 1 |
| ABCF | 3 | 3 | 3 | 3 |
| ABCG | 15 | 12 | 12 | 12 |
| ABCH | 3 | 3 | 3 | 2 |
| Total | 70 | 54 | 54 | 53 |
The number of putative ABC transporter genes in each ABC transporter family in DBM was estimated using two sets of nucleotide-binding domain (Pfam ID: PF00005) sequences (DS1 and DS2). UBH-based classification was performed for both DS1 (DS1-UBH) and DS2 (DS2-UBH). Phylogenetic tree-based classification was performed for DS2 (DS2-PTB). The number of ABC transporter genes in Bombyx mori was obtained from [65].