Joseph A Christie-Oleza, Guylaine Miotello, Jean Armengaud.
Abstract
BACKGROUND: The structural and functional annotation of genomes is now heavily based on data obtained using automated pipeline systems. The key to an accurate structural annotation is blending similarities between closely related genomes with biochemical evidence of the genome interpretation. In this work we applied high-throughput proteogenomics to Ruegeria pomeroyi, a member of the Roseobacter clade, an abundant group of marine bacteria, as a seed for the annotation of the whole clade.
Year: 2012 PMID: 22336032 PMCID: PMC3305630 DOI: 10.1186/1471-2164-13-73
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
List of novel genes found in the genome of R. pomeroyi detected by proteogenomics.
| Target a | Plausible CDS start | Stop | Length (aa) | Peptides assigned | CDS proteomic coverage | Function/Presence in other Roseobacter strains b | |
|---|---|---|---|---|---|---|---|
| SPOA_PG001 | 300016 | 300507 | 164 | 10 | 73% | Unknown/observed (9e-51) | |
| SPO_PG002 | 3171305 | 3170874 | 144 | 9 | 69% | Unknown/unique | |
| SPO_PG003 | 1412876 | 1413418 | 181 | 7 | 73% | Unknown/observed (5e-23) | |
| SPOA_PG004 | 87032 | 87709 | 226 | 6 | 27% | Unknown/unique | |
| SPO_PG005 | 358784 | 358125 | 220 | 6 | 45% | Esterase-lipase/observed (5e-45) | |
| SPO_PG006 | 360911 | 360405 | 169 | 5 | 54% | Unknown/unique | |
| SPO_PG007 | 1483195 | 1482533 | 221 | 5 | 48% | Unknown/unique | |
| SPO_PG008 | 1431167 | 1431595 | 143 | 5 | 45% | Unknown/observed (3e-56) | |
| SPO_PG009 | 501740 | 502171 | 144 | 5 | 42% | Unknown/unique | |
| SPO_PG010 | 2353576 | 2353965 | 130 | 4 | 42% | Unknown/observed (1e-38) | |
| SPO_PG011 | 1374589 | 1374299 | 97 | 3 | 61% | Unknown/conserved (1e-43) | |
| SPO_PG012 | 3703461 | 3702955 | 169 | 3 | 22% | Unknown/unique | |
| SPO_PG013 | 649156 | 649749 | 198 | 3 | 23% | Unknown/unique | |
| SPO_PG014 | 2482691 | 2482317 | 125 | 3 | 20% | Unknown/unique | |
| SPO_PG015 | 3657397 | 3656924 | 158 | 3 | 19% | Unknown/observed (6e-50) | |
| SPO_PG016 | 373055 | 373333 | 93 | 2 | 41% | Unknown/unique | |
| SPO_PG017 | 1092236 | 1092592 | 119 | 2 | 34% | Unknown/unique | |
| SPO_PG018 | 495167 | 495529 | 121 | 2 | 22% | Unknown/observed (4e-44) | |
| SPO_PG019 | 1418666 | 1419187 | 174 | 2 | 10% | Signal transduction/conserved (1e-69) | |
| SPO_PG020 | 2807747 | 2807223 | 175 | 2 | 19% | Polyketide cyclase/unique | |
| SPO_PG021 | 1289473 | 1289829 | 119 | 2 | 28% | Unknown/unique | |
| SPO_PG022 | 1151078 | 1151632 | 185 | 2 | 18% | Unknown/unique | |
| SPO_PG023 | 1400166 | 1399696 | 157 | 2 | 24% | Unknown/unique | |
| SPO_PG024 | 2628409 | 2629668 | 420 | 2 | 9% | RNA helicase/conserved (1e-175) | |
| | 1322016 | 1322357 | 114 | 1 | 7% | Transcriptional regulator/unique | |
| | 3883013 | 3882531 | 161 | 1 | 7% | Unknown/unique | |
| SPO_PG027 | 501090 | 501710 | 207 | 21 | 77% | Unknown/unique | |
| SPO_PG028 | 2429044 | 2427941 | 368 | 20 | 63% | Unknown/conserved (5e-92) | |
| SPO_PG029 | 3124885 | 3123728 | 386 | 11 | 36% | Sporulation related/conserved (6e-92) | |
| SPO_PG030 | 1738173 | 1736680 | 498 | 7 | 24% | Unknown/conserved (1e-175) | |
| SPO_PG031 | 2905673 | 2906335 | 221 | 6 | 37% | Unknown/unique | |
| SPO_PG032 | 3751605 | 3751147 | 153 | 6 | 42% | Unknown/conserved (5e-42) | |
| SPO_PG033 | 2357076 | 2357507 | 144 | 2 | 18% | Excinuclease/observed (4e-35) | |
| | 934724 | 935068 | 115 | 1 | 17% | Unknown/observed (1e-27) | |
| | 2751483 | 2750281 | 401 | 1 | 4% | Unknown/conserved (1e-162) | |
| SPO_PG036 | 562052 | 560282 | 590 | 3 | 9% | ABC transporter/conserved (0.0) | |
| SPO_PG037 | 3188876 | 3188459 | 139 | 3 | 27% | Heat shock protein/observed (3e-55) | |
| SPO_PG038 | 2152217 | 2151179 | 346 | 2 | 10% | Aminotransferase/conserved (1e-168) | |
| | 3515528 | 3515111 | 139 | 1 | 17% | Stress protein/unique (conserved in | |
a Targets in bold are the "one-hit wonders" validated by RT-PCR.
b Observed indicates presence of a similar gene in fewer than 5 other Roseobacter strains, whereas Conserved means presence in over 20 of the 36 strains searched. The BLAST E-value against the nearest homologue is given in brackets.
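The coordinates in the table above can be sanity-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original work: it assumes 1-based inclusive genome coordinates and that a start greater than the stop denotes a gene on the reverse strand (neither convention is stated in the source). The names `Cds`, `strand`, `span_nt` and `check` are hypothetical. For most rows the nucleotide span equals three times the listed length in amino acids; rows where it does not are simply flagged for inspection.

```python
from typing import NamedTuple, Tuple


class Cds(NamedTuple):
    """One table row: target name, plausible CDS start, stop, length in aa."""
    target: str
    start: int
    stop: int
    length_aa: int


def strand(cds: Cds) -> str:
    # Assumption: start > stop means the CDS lies on the reverse strand.
    return "+" if cds.stop > cds.start else "-"


def span_nt(cds: Cds) -> int:
    # Assumption: coordinates are 1-based and inclusive at both ends.
    return abs(cds.stop - cds.start) + 1


def check(cds: Cds) -> Tuple[str, int, bool]:
    """Return (strand, span in nt, whether span equals 3 * length in aa)."""
    n = span_nt(cds)
    return strand(cds), n, n == 3 * cds.length_aa


# Values copied from the table above.
rows = [
    Cds("SPOA_PG001", 300016, 300507, 164),
    Cds("SPO_PG002", 3171305, 3170874, 144),
    Cds("SPO_PG024", 2628409, 2629668, 420),
    Cds("SPO_PG036", 562052, 560282, 590),
]

for row in rows:
    s, n, consistent = check(row)
    flag = "ok" if consistent else "span is not 3 x length, inspect"
    print(f"{row.target}\tstrand {s}\t{n} nt\t{row.length_aa} aa\t{flag}")
```

A row flagged by this check is not necessarily wrong: a span that is not a multiple of three could reflect a transcription error in the table or a frameshift in the underlying sequence, and the check cannot tell these apart.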
Figure 1. RT-PCR amplification for "one-hit wonder" validation. A schematic view of the genomic region of target SPO_PG034 is shown as an example. The pink square represents the putative protein sequence, highlighting in red the unique peptide detected by MS/MS. SPO0877 is shown with its conserved BLAST region (broken green line) and plausible non-coding area (grey crossed). In yellow is the mRNA produced from SPO_PG034, which was amplified by RT-PCR using specific primers. The 3% agarose gel stained with ethidium bromide shows the five "one-hit wonder" targets from which RT-PCR amplification was obtained (lane "+"). Lanes marked with "-" are negative controls in which PCR amplification was tested on the RNA extractions to confirm total DNA elimination.
Figure 2. Chromosome region view of three novel genes detected by proteogenomics. Loci of target SPO_PG032 (Panel A) and targets SPO_PG027 and SPO_PG009 (Panel B) are represented. The six reading frames are shown with all coded stop codons represented by black dashes. Coloured in red are the nucleic sequences specifying the peptides detected by MS/MS. Panel A: The green line represents the area of the stop-to-stop sequence that shows homology to other proteins by means of a PSI-BLAST. The amino acid sequence in bold black represents the plausible sequence of SPO_PG032. Highlighted in red squares are the peptides detected by MS/MS. Panel B: Blue lines represent RT-PCR amplification attempts. Green rectangles show identified transcriptional terminators.
Figure 3. Locus of a sequencing error detected in the SPO_PG039 sequence. The reading frame of this novel detected gene shifts from frame -2 (highlighted in blue) to frame -1 (in red) due to an erroneously annotated extra nucleotide (highlighted in green).
Figure 4. Genome conservation of operons. The comparison was carried out by a BLAST analysis. In green and blue are those conserved genes that make up the operons. In orange are the novel genes reported in this work. Brown genes represent those genes that share identity with other genes in Roseobacter members.
List of novel genes detected after extending the data obtained in R. pomeroyi to 36 other Roseobacter members.
| Target | Roseobacter strain | GenBank locus | 5' start | 3' stop | E-value |
|---|---|---|---|---|---|
| SPO_PG009 | | | 1860966 | 1860490 | 4e-21 |
| SPO_PG019 | | | 40225 | 39680 | 2e-27 |
| | | | 845882 | 846427 | 5e-26 |
| SPO_PG020 | | | 2703488 | 2704012 | 3e-26 |
| SPO_PG024 | | | 1272857 | 1271454 | 1e-158 |
| SPO_PG029 | | | 691858 | 690500 | 3e-91 |
| | | | 672254 | 670707 | 8e-33 |
| SPO_PG032 | | | 1621388 | 1621861 | 4e-28 |