Na Gao, Ling-Ling Chen, Hong-Fang Ji, Wei Wang, Ji-Wei Chang, Bei Gao, Lin Zhang, Shi-Cui Zhang, Hong-Yu Zhang.
Abstract
BACKGROUND: Bacterial plant pathogens are highly damaging to their host plants and can cause devastating agricultural losses worldwide. With advances in microbial genome sequencing, many phytopathogen strains have been sequenced. However, these genomes contain some misannotations. Our objective is to improve these annotations and store them in a central database, DIGAP.
DESCRIPTION: DIGAP provides the following improved information on phytopathogen genomes. (i) All 'hypothetical proteins' were checked, and non-coding ORFs recognized by the Z curve method were removed. (ii) The translation initiation sites (TISs) of approximately 20% to 25% of all protein-coding genes were corrected based on NCBI RefSeq, the ProTISA database and an ab initio program, GS-Finder. (iii) Potential functions of about 10% of the 'hypothetical proteins' were predicted using sequence alignment tools. (iv) Two theoretical gene expression indices, the codon adaptation index (CAI) and the E(g) index, were calculated to predict gene expression levels. (v) Potential agricultural bactericide targets and their homology-modeled 3D structures are provided in the database, which is significant for agricultural antibiotic discovery.
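As an illustration of item (iv), the codon adaptation index of a gene is commonly defined as the geometric mean of the relative adaptiveness values of its codons, where each value is a codon's count in a reference set of highly expressed genes divided by the count of the most frequent synonymous codon. The sketch below follows that textbook definition; the reference set, the 0.5 pseudo-count for unobserved codons, and the function names `relative_adaptiveness` and `cai` are assumptions made for illustration and are not taken from DIGAP. The E(g) index is not covered.

```python
import math
from collections import Counter

# Standard genetic code, built in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into complete in-frame codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """Compute w(codon) = count / max count among synonymous codons.

    reference_genes: coding sequences assumed to be highly expressed
    (the choice of reference set is an assumption of this sketch).
    """
    counts = Counter(
        c for gene in reference_genes for c in codons(gene) if c in CODON_TABLE
    )
    best = {}
    for codon, aa in CODON_TABLE.items():
        best[aa] = max(best.get(aa, 0), counts[codon])
    w = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*" or best[aa] == 0:
            continue  # skip stops and amino acids absent from the reference
        # Assumed convention: unobserved codons get a pseudo-count of 0.5
        # so that a single rare codon does not force CAI to zero.
        w[codon] = max(counts[codon], 0.5) / best[aa]
    return w


def cai(gene, w):
    """Geometric mean of w over the gene's codons (Met and Trp excluded)."""
    logs = [
        math.log(w[c])
        for c in codons(gene)
        if c in w and CODON_TABLE[c] not in "MW"
    ]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

Values close to 1 indicate that a gene mostly uses the codons preferred in the reference set, which is why CAI serves as a theoretical proxy for expression level.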
Year: 2010 PMID: 20089203 PMCID: PMC2825234 DOI: 10.1186/1471-2164-11-54
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. General annotation information of the 28 plant pathogens
| Species a | Abbreviation | RefSeq | Genomic Length (bp) | G+C content (%) | Annotated ORFs in RefSeq |
|---|---|---|---|---|---|
| | | NC_008752 | 5,352,772 | 68.02 | 4709 |
| | | NC_003062 | 2,841,580 | 59.38 | 2765 |
| | | NC_011989 | 4,009,526 | 57.60 | 4288 |
| | | NC_007716 | 706,595 | 26.89 | 671 |
| | | NC_009480 | 3,297,891 | 72.66 | 2984 |
| | | NC_010407 | 3,258,645 | 72.60 | 2941 |
| | | NC_010544 | 879,959 | 27.40 | 684 |
| | | NC_011047 | 601,943 | 21.40 | 479 |
| | | NC_004547 | 5,064,019 | 50.97 | 4472 |
| | | NC_006087 | 2,584,158 | 67.68 | 2030 |
| | | NC_006055 | 793,224 | 27.02 | 682 |
| | | NC_005303 | 860,631 | 27.74 | 754 |
| | | NC_005773 | 5,928,787 | 58.02 | 4985 |
| | | NC_007005 | 6,093,698 | 59.23 | 5089 |
| | | NC_004578 | 6,397,123 | 58.40 | 5476 |
| | | NC_003295 | 3,716,416 | 67.04 | 3438 |
| | | NC_003919 | 5,175,554 | 64.77 | 4312 |
| | | NC_007086 | 5,148,708 | 64.96 | 4273 |
| | | NC_003902 | 5,076,188 | 65.07 | 4181 |
| | | NC_010688 | 5,079,002 | 65.00 | 4467 |
| | | NC_007508 | 5,178,466 | 64.75 | 4487 |
| | | NC_007705 | 4,940,217 | 63.70 | 4372 |
| | | NC_006834 | 4,941,439 | 63.69 | 4144 |
| | | NC_010717 | 5,240,075 | 63.60 | 4988 |
| | | NC_010513 | 2,475,130 | 51.90 | 2104 |
| | | NC_010577 | 2,535,690 | 51.80 | 2161 |
| | | NC_002488 | 2,679,306 | 52.67 | 2766 |
| | | NC_004556 | 2,519,802 | 51.78 | 2034 |
a For Agrobacterium tumefaciens str. C58 and Ralstonia solanacearum GMI1000, only the largest chromosome is considered.
Figure 1. Flowchart depicting the refined annotation strategy for the 28 plant pathogens.
Table 2. Refined information of the 28 plant pathogens
| Species a | Number of non-coding ORFs | Number (percentage) of refined TISs | Number (percentage) of HPs assigned with functions b | Number (percentage) of PHX genes c | Number of potential drug targets |
|---|---|---|---|---|---|
| | 15 | 699 (14.9%) | 105 (9.1%) | 327 (7.0%) | 35 |
| | 20 | 640 (23.3%) | 233 (23.0%) | 210 (7.7%) | 39 |
| | 7 | 1171 (27.4%) | 437 (33.9%) | 76 (1.8%) | 45 |
| | 26 | 91 (14.1%) | 114 (35.3%) | 29 (4.4%) | 6 |
| | 0 | 381 (12.8%) | 197 (19.0%) | 836 (28.0%) | 40 |
| | 63 | 826 (28.7%) | 181 (21.9%) | 455 (15.8%) | 35 |
| | 8 | 110 (16.3%) | 2 (7.5%) | 93 (13.8%) | 7 |
| | 2 | 43 (9.0%) | 7 (4.6%) | 79 (16.6%) | 8 |
| | 48 | 436 (9.9%) | 169 (13.5%) | 259 (5.9%) | 46 |
| | 4 | 612 (30.2%) | 92 (13.6%) | 211 (10.4%) | 47 |
| | 0 | 2 (0.3%) | 1 (1.4%) | 49 (7.2%) | 13 |
| | 9 | 118 (15.8%) | 99 (28.5%) | 25 (3.4%) | 7 |
| | 20 | 728 (14.7%) | 103 (9.3%) | 166 (3.3%) | 44 |
| | 19 | 333 (6.6%) | 133 (11.7%) | 410 (8.1%) | 43 |
| | 34 | 766 (14.1%) | 174 (10.6%) | 209 (3.8%) | 44 |
| | 12 | 503 (14.7%) | 200 (20.4%) | 150 (4.4%) | 40 |
| | 39 | 1146 (26.8%) | 167 (10.4%) | 372 (8.7%) | 27 |
| | 5 | 1341 (31.4%) | 134 (8.4%) | 415 (9.7%) | 45 |
| | 7 | 1022 (24.5%) | 131 (8.9%) | 349 (8.4%) | 45 |
| | 0 | 790 (17.7%) | 91 (5.5%) | 432 (9.7%) | 29 |
| | 10 | 859 (19.2%) | 124 (10.2%) | 408 (9.1%) | 45 |
| | 37 | 1282 (29.6%) | 131 (8.3%) | 404 (9.3%) | 42 |
| | 6 | 1586 (38.3%) | 152 (11.9%) | 470 (11.4%) | 40 |
| | 51 | 2434 (49.3%) | 54 (4.2%) | 673 (13.6%) | 41 |
| | 0 | 354 (16.8%) | 111 (14.4%) | 224 (10.5%) | 29 |
| | 0 | 324 (15.0%) | 83 (12.3%) | 734 (34.0%) | 29 |
| | 70 | 916 (34.0%) | 194 (12.9%) | 205 (7.6%) | 41 |
| | 27 | 459 (22.9%) | 114 (15.4%) | 370 (18.4%) | 41 |
a Full names of all species are listed in Table 1.
b HPs: hypothetical proteins.
c PHX genes: predicted highly expressed genes.
Figure 2. The distribution of points on the principal plane spanned by the first and second principal axes. The red circles represent the function-known genes, the blue triangles represent the corresponding negative samples, and the black stars denote the recognized non-coding ORFs. The first and second principal axes account for 33.96% and 14.98% of the total inertia of the 21-dimensional space, respectively. Most of the identified non-coding ORFs lie far from the core of the circles and close to the core of the triangles, which implies that the recognized non-coding ORFs are very unlikely to encode proteins.
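The points in Figure 2 are ORFs represented as vectors of sequence-derived variables, presumably Z curve variables given the method named in the abstract. The exact 21-variable construction is not specified here, so the following is only a simplified sketch: it computes the nine phase-specific Z curve components (three per codon position) from base frequencies, using the standard Z curve transform. The function name `zcurve_9` is hypothetical.

```python
def zcurve_9(orf):
    """Return nine phase-specific Z curve components for an ORF.

    For each codon position (phase 0, 1, 2) the base frequencies
    a, c, g, t are mapped to:
      x = (a + g) - (c + t)   purine vs. pyrimidine
      y = (a + c) - (g + t)   amino vs. keto
      z = (a + t) - (g + c)   weak vs. strong hydrogen bonding
    This is a reduced illustration, not the 21-variable feature set.
    """
    orf = orf.upper()
    features = []
    for phase in range(3):
        bases = orf[phase::3]  # every third base starting at this phase
        n = len(bases)
        if n == 0:
            raise ValueError("sequence too short for three codon positions")
        a, c, g, t = (bases.count(b) / n for b in "ACGT")
        features += [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]
    return features
```

Vectors of this kind, computed for known genes and for negative samples, can then be projected onto principal axes or passed to a linear discriminant to separate coding from non-coding ORFs.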
Figure 3. Statistics of relocated TISs. (a) Pie chart for At58. Blue regions denote the percentage of TISs identical to the RefSeq annotation. Pink and light blue regions denote the percentages of 5'-shifts and 3'-shifts from the RefSeq annotation, respectively. (b) Histogram of relocated TISs for At58. Negative and positive values on the x-axis indicate the lengths of 5'-shifts and 3'-shifts from the RefSeq annotation, respectively, and the y-axis indicates the number of shifted TISs.
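The sign convention in Figure 3 (negative for a 5'-shift, positive for a 3'-shift) can be made concrete with a small sketch. It assumes each TIS is given as the genomic coordinate of the first base of the start codon together with the strand; the function names `tis_shift` and `summarize_shifts` and the input layout are hypothetical and are not part of DIGAP.

```python
def tis_shift(refseq_start, refined_start, strand):
    """Signed shift in bp of the refined TIS relative to RefSeq.

    Negative means the refined start lies upstream (5'-shift),
    positive means downstream (3'-shift), zero means unchanged.
    On the minus strand, upstream corresponds to a larger coordinate.
    """
    if strand == "+":
        return refined_start - refseq_start
    if strand == "-":
        return refseq_start - refined_start
    raise ValueError("strand must be '+' or '-'")


def summarize_shifts(shifts):
    """Count unchanged, 5'-shifted and 3'-shifted TISs (pie chart categories)."""
    summary = {"same": 0, "5'-shift": 0, "3'-shift": 0}
    for s in shifts:
        if s == 0:
            summary["same"] += 1
        elif s < 0:
            summary["5'-shift"] += 1
        else:
            summary["3'-shift"] += 1
    return summary
```

The nonzero values returned by `tis_shift` are what a histogram like panel (b) would bin, while `summarize_shifts` gives the three proportions shown in panel (a).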
Figure 4. Web interface of DIGAP. (a) Query interface. (b) An example of improved phytopathogen annotation; the 'hypothetical proteins' assigned with functions are marked in red. Users can click a DIGAP_ID to obtain detailed information. (c) The BLAST search webpage. (d) The potential bactericide targets interface.