Wook Kim, Mark W Silby, Sam O Purvine, Julie S Nicoll, Kim K Hixson, Matt Monroe, Carrie D Nicora, Mary S Lipton, Stuart B Levy.
Abstract
Genome sequences are annotated by computational prediction of coding sequences, followed by similarity searches such as BLAST, which provide a layer of possible functional information. While the existence of processes such as alternative splicing complicates matters for eukaryote genomes, the view of bacterial genomes as a linear series of closely spaced genes leads to the assumption that computational annotations that predict such arrangements completely describe the coding capacity of bacterial genomes. We undertook a proteomic study to identify proteins expressed by Pseudomonas fluorescens Pf0-1 from genes that were not predicted during the genome annotation. Mapping peptides to the Pf0-1 genome sequence identified sixteen non-annotated protein-coding regions, of which nine were antisense to predicted genes, six were intergenic, and one read in the same direction as an annotated gene but in a different frame. The expression of all but one of the newly discovered genes was verified by RT-PCR. Few clues as to the function of the new genes were gleaned from informatic analyses, but potential orthologs in other Pseudomonas genomes were identified for eight of the new genes. The 16 newly identified genes improve the quality of the Pf0-1 genome annotation, and the detection of antisense protein-coding genes indicates the under-appreciated complexity of bacterial genome organization.
Year: 2009 PMID: 20041161 PMCID: PMC2794547 DOI: 10.1371/journal.pone.0008455
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
General features of nov genes in P. fluorescens Pf0-1.
| nov | Organization | Length (aa) | Genome Coordinates | Culture Conditions | Potential SD Sequence |
| 01 | overlap (opp. & par.) | 195 | c747558-748142 | Min | |
| 02 | intergenic (small par. overlap) | 87 | c1046719-1046979 | Min/Rich-stn | |
| 03 | overlap (opp.) | 90 | c1165207-1165476 | Rich-exp | |
| 04 | overlap (opp.) | 71 | c2784682-2784894 | Min/Rich-stn | |
| 05 | intergenic (par.) | 98 | c2950927-2951220 | Min/Rich | |
| 06 | overlap (opp. two ORFs) | 178 | c3059127-3059660 | Min/Rich-stn | |
| 07 | overlap (opp.) | 42 | c3285610-3285735 | Rich-exp | |
| 08 | overlap (opp.) | 152 | c3526135-3526590 | Rich-exp | |
| 09 | intergenic (opp./par.) | 56 | c3594859-3595026 | Rich-stn | |
| 10 | intergenic (opp.) | 62 | c3774366-3774551 | Min | |
| 11 | overlap (opp. two ORFs) | 155 | c4134875-4135339 | Min | |
| 12 | intergenic (small par. overlap) | 50 | c4285307-4285456 | Min | |
| 13 | overlap (par.) | 211 | 4475971-4476603 | Min | |
| 14 | overlap (opp.) | 101 | c4756241-4756543 | Min | |
| 15 | overlap (opp.) | 530 | c4938030-4939619 | Min/Rich-exp | |
| 16 | intergenic (par.) | 73 | 5264889-5265107 | Min/Rich-exp | |
opp. indicates overlap with gene coded on opposite DNA strand; par. indicates overlap with gene coded parallel, in a different frame.
Predicted length of novel gene product (amino acids).
In coordinates, the ‘c’ indicates the ORF is complementary to the coordinates shown. These coordinates represent the smallest potential coding sequence, defined by the first possible initiation codon upstream of the peptides.
These are the conditions for growth of cultures from which the Nov proteins were identified. Min indicates Pseudomonas minimal medium; Rich indicates King's B or LB; exp indicates exponential growth phase; stn indicates stationary growth phase.
Possible Shine-Dalgarno sequences are underlined, and in bold. Potential translation initiation codons are in bold.
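The coordinate convention described in the notes above can be illustrated with a minimal Python sketch. It parses a coordinate string such as `c747558-748142`, records whether the ORF lies on the complementary strand, and derives the nucleotide span and the number of codons. The names `OrfSpan` and `parse_coordinates` are hypothetical and introduced only for this illustration. For the rows listed in the table, the inclusive span divided by three matches the reported length; whether that span includes the stop codon is not stated here, so the sketch makes no claim about it.

```python
import re
from typing import NamedTuple


class OrfSpan(NamedTuple):
    """Hypothetical container for one coordinate entry from the table."""
    start: int
    end: int
    complement: bool  # True when the entry carries the 'c' prefix

    @property
    def nucleotides(self) -> int:
        # Coordinates are treated as 1-based and inclusive (an assumption
        # that is consistent with the lengths reported in the table).
        return self.end - self.start + 1

    @property
    def codons(self) -> int:
        return self.nucleotides // 3


_COORD = re.compile(r"^(c?)(\d+)-(\d+)$")


def parse_coordinates(text: str) -> OrfSpan:
    """Parse strings such as 'c747558-748142' or '4475971-4476603'."""
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    prefix, start, end = match.groups()
    return OrfSpan(int(start), int(end), prefix == "c")


if __name__ == "__main__":
    for entry in ("c747558-748142", "4475971-4476603"):
        span = parse_coordinates(entry)
        strand = "complement" if span.complement else "forward"
        print(entry, strand, span.nucleotides, span.codons)
```

Applied to nov01 (`c747558-748142`), the sketch yields a 585-nucleotide span on the complementary strand, or 195 codons, which agrees with the length listed for that gene.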
Analysis of nov genes that overlap predicted genes in P. fluorescens Pf0-1.
| nov | Predictions | TBLASTN – relevant hits |
| 01 | None | Other |
| 03 | None | |
| 04 | GenemarkS | No significant hits |
| 06 | None | No significant hits |
| 07 | None | Most hits antisense to probable nucleotide sugar dehydrogenase in |
| 08 | None | Hits to ORFs opposite acetoacetyl-CoA synthase in organisms including |
| 11 | None | Hits to sequences opposite hydroxymethylglutaryl-CoA lyase in numerous organisms including |
| 13 | None | Similar |
| 14 | None | Numerous other |
| 15 | None | |
Gene prediction using GenemarkS and Glimmer.
Analysis of nov genes located in ‘intergenic’ regions in P. fluorescens Pf0-1.
| nov | Predictions | TBLASTN – relevant hits | Orthologs |
| 02 | GenemarkS | Best hit: | colicin-pyocin immunity protein |
| | | Also, | |
| 05 | None | No significant hits | None |
| 09 | GenemarkS & Glimmer | Pf0-1 genome | None |
| 10 | GenemarkS & Glimmer | | None |
| 12 | None | Pf0-1 genome | None |
| 16 | GenemarkS & Glimmer | Six hits, all in | signal peptide at N-terminus |
Gene prediction using GenemarkS and Glimmer.
Indicates matches in GenBank for which functional predictions have been made.
Figure 1. Organization of ten novel ORFs overlapping predicted genes in P. fluorescens Pf0-1.
The novel ORFs are colored red and indicated by “nov” (or n7 for nov7), while predicted Pf0-1 genes are colored blue and labeled with the Pfl01 number of the locus tag corresponding to each in the Pf0-1 GenBank entry. Three forward (numbered 1–3) and three reverse (numbered 4–6) reading frames are shown. Parallel diagonal lines indicate that the complete ORF is not shown to scale. For the length of predicted nov-encoded proteins, see Table 1.
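The idea of six reading frames, and of locating a peptide within them, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: it uses the standard codon assignments, a plain substring search rather than a proteomic search engine, and a frame numbering (1–3 forward by offset, 4–6 on the reverse complement by offset) chosen for the example. The function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict:
    """Map frame number to translation (assumed numbering: 1-3 forward,
    4-6 reverse complement, each by increasing offset)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[offset + 4] = translate(rc[offset:])
    return frames


def frames_containing(peptide: str, seq: str) -> list:
    """Return the sorted frame numbers whose translation contains the peptide."""
    return sorted(
        frame for frame, protein in six_frame(seq).items()
        if peptide in protein
    )


if __name__ == "__main__":
    toy = "ATGGCCAAATAA"  # made-up sequence, not from the Pf0-1 genome
    print(six_frame(toy))
    print(frames_containing("MAK", toy))
```

A peptide found only in a reverse frame of a region that carries an annotated gene on the forward strand would correspond to the "opp." arrangement described for the overlapping nov genes.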