G X Yu, E E Snyder, S M Boyle, O R Crasta, M Czar, S P Mane, A Purkayastha, B Sobral, J C Setubal.
Abstract
We present a bacterial genome computational analysis pipeline, called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome to be analyzed, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species specific and hence provide valuable clues to the understanding of the genome basis of Brucella pathogenicity and host specificity.
Year: 2007 PMID: 17553834 PMCID: PMC1919506 DOI: 10.1093/nar/gkm377
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Data flow in GenVar showing its three constitutive conceptual steps. sspDB: species-specific database; gwpDB: gene-specific database.
Classification of split genes detected in B. abortus 2308
| Group | Alignment length (AA) | Biological function of orthologs | Species association |
|---|---|---|---|
| G1 | <=100 | Any | Any |
| | >100 | Unknown | |
| G2 | >100 | Assigned | Detected in two |
| G3 | >100 | Assigned | Detected in both |
| G4 | >100 | Assigned | Detected in |
| G5 | >100 | Assigned | Detected in all four |
| G6 | >100 | Assigned | Detected in any other combination of |
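The G1 rows of this classification can be read as a simple rule: a split gene falls into G1 when its alignment is at most 100 AA, or when the alignment is longer but the orthologs have no known function. The following is a minimal Python sketch of that rule only. The function name and parameters are illustrative assumptions, not part of GenVar, and groups G2 to G6 are deliberately not resolved because they depend on the species association column.

```python
from typing import Optional


def classify_g1(alignment_length_aa: int, function_assigned: bool) -> Optional[str]:
    """Return "G1" when the table's G1 criteria hold, otherwise None.

    Hypothetical helper for illustration. None means the gene belongs to
    one of G2-G6, which this sketch does not distinguish.
    """
    if alignment_length_aa <= 100:
        return "G1"  # short alignments are G1 regardless of function
    if not function_assigned:
        return "G1"  # long alignment, but ortholog function is unknown
    return None  # >100 AA with assigned function: G2-G6 by species association


print(classify_g1(80, True))    # G1
print(classify_g1(250, False))  # G1
print(classify_g1(250, True))   # None
```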
Figure 2. Missed gene calls revealed in the intergenic DNA regions from the four Brucella genomes. The bars show the total number of missed gene calls (blue) and the number of missed gene calls that are larger than 100 AA and have orthologs with assigned biological functions (yellow). BME stands for B. melitensis 16M; BSU for B. suis 1330; BA9941 for B. abortus 9-941; and BA2308 for B. abortus 2308. The letters I and II stand for chromosomes I and II, respectively.
The number of split genes detected in the four Brucella genomes
| Organism | Premature stop codon | Frameshift | Both | Total |
|---|---|---|---|---|
| | 13 | 44 | 2 | 59 |
| | 28 | 86 | 6 | 120 |
| | 44 | 124 | 7 | 175 |
| | 33 | 122 | 10 | 165 |
| | 81 | 247 | 11 | 339 |
| | 19 | 91 | 1 | 111 |
| | 11 | 92 | 0 | 103 |
| | 10 | 41 | 1 | 52 |
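As a quick consistency check, the Total in each row should equal the sum of the three disruption categories (premature stop codon, frameshift, both). The sketch below copies the four counts from each table row in order; the variable and function names are illustrative.

```python
# Each tuple: (premature stop codon, frameshift, both, total),
# copied from the table rows in their original order.
rows = [
    (13, 44, 2, 59),
    (28, 86, 6, 120),
    (44, 124, 7, 175),
    (33, 122, 10, 165),
    (81, 247, 11, 339),
    (19, 91, 1, 111),
    (11, 92, 0, 103),
    (10, 41, 1, 52),
]


def inconsistent_rows(table):
    """Return the rows whose total differs from the sum of the three categories."""
    return [r for r in table if r[0] + r[1] + r[2] != r[3]]


print(inconsistent_rows(rows))  # [] (every total matches its row sum)
```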
Figure 3. Pairwise alignments of the urease accessory gene ureE-2 in B. abortus 9-941 (panel I, gi|62290255) and in B. melitensis 16M (panel II, gi|17986929) against intergenic region 1244 from B. abortus 2308. Panel I shows that intergenic region 1244 contains a missed gene call for ureE-2 in B. abortus 2308. Furthermore, both B. abortus strains present two frameshifts (FS1 and FS2) when compared to ureE-2 in B. melitensis 16M (panel II). These frameshifts cause the absence of the multi-histidine Ni2+ chelating center at the C-terminus.
Example of split gene groups in chromosome I of B. abortus 2308. The intergenic DNA regions are numbered according to their order on the chromosome; thus, Intergenic_129 is the 129th intergenic DNA region on chromosome I
| Interval name | Split gene group | Frameshift position | Premature stop position | Ortholog | Gene source | RefSeq annotation |
|---|---|---|---|---|---|---|
| Intergenic_129 | G1 | 163662 | – | 23500355 | BS1330 (II) | Hypothetical protein |
| | | 163662 | – | 83269513 | BA2308 (II) | Conserved hypothetical protein |
| | | – | 163653 | 23502245 | BS1330 (I) | Hypothetical protein |
| | | – | 163653 | 82700191 | BA2308 (I) | Hypothetical protein |
| Intergenic_560 | G2 | 608947 | 607381 | 17987624 | BM16M (I) | Phage host specificity protein |
| | | – | – | 17987623 | BM16M (I) | Phage host specificity protein |
| Intergenic_516 | G2 | 564045 | – | 23501439 | BS1330 (I) | Transcriptional regulator, AraC family |
| | | – | – | 17987666 | BM16M (I) | Transcriptional regulator, AraC family |
| | | – | – | 17987667 | BM16M (I) | Transcriptional regulator, AraC family |
| Intergenic_973 | G3 | – | – | Missed gene | BA9-941 (I) | Unannotated |
| | | – | 1044923 | 23501937 | BS1330 (I) | Drug resistance transporter, EmrB/QacA family |
| | | – | 1044923 | 17987210 | BM16M (I) | Drug resistance transporter, EmrB/QacA family |
| Intergenic_1236 | G3 | – | – | Missed gene | BA9-941 (I) | Unannotated |
| | | 1328845 | – | 23502222 | BS1330 (I) | ABC transporter, ATP binding/permease protein |
| | | 1327018 | – | 17986937 | BM16M (I) | ABC transporter ATP-binding protein |
| | | 1328815 | – | YHIH_ECOLI | Swiss-Prot | Hypothetical ABC transporter ATP-binding protein yhiH |
| Intergenic_1244 | G3 | – | – | 62290255 | BA9-941 (I) | Urease accessory protein UreE |
| | | 1335216 and 1335291 | – | 17986929 | BM16M (I) | Urease accessory protein UreE |
| | | 1335216 and 1335291 | – | 23502231 | BS1330 (I) | Urease accessory protein UreE |
| Intergenic_336 | G4 | 381948 | – | 17987856 | BM16M (I) | Transcriptional regulator, LysR family |
| | | 381948 | – | 62289344 | BA9-941 (I) | Transcriptional regulator, LysR family |
| | | 381948 | – | 23501257 | BS1330 (I) | Transcriptional regulator, LysR family |
‘Orthologs’ are defined as the best BLAST hits in the custom database relied on by GenVar; the identifiers given are GenBank accession numbers or Swiss-Prot identifiers. ‘Gene source’ gives the organism or database where the ortholog was found; in this column the acronyms used are as follows: BM16M stands for B. melitensis 16M; BS1330 for B. suis 1330; BA9-941 for B. abortus 9-941 and BA2308 for B. abortus 2308. (I) stands for chromosome I and (II) for chromosome II. The split gene groups are as described in Table 1.
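The 'Gene source' labels combine an organism acronym with a chromosome in parentheses. The sketch below shows one way such a label could be decoded using the acronyms defined in this note. The function name, the regular expression, and the choice to return None for database sources such as Swiss-Prot are assumptions made for illustration, not part of GenVar.

```python
import re

# Acronyms as defined in the table note above.
ORGANISMS = {
    "BM16M": "B. melitensis 16M",
    "BS1330": "B. suis 1330",
    "BA9-941": "B. abortus 9-941",
    "BA2308": "B. abortus 2308",
}

# Acronym, optional whitespace, then chromosome "I" or "II" in parentheses.
_LABEL = re.compile(r"^(?P<acronym>\S+)\s*\((?P<chrom>I{1,2})\)$")


def parse_gene_source(label):
    """Return (organism, chromosome), or None if the label is not a genome source.

    Hypothetical helper: labels such as "Swiss-Prot" name a database rather
    than a genome, so they yield None.
    """
    match = _LABEL.match(label.strip())
    if match is None or match.group("acronym") not in ORGANISMS:
        return None
    return ORGANISMS[match.group("acronym")], match.group("chrom")


print(parse_gene_source("BS1330 (II)"))  # ('B. suis 1330', 'II')
print(parse_gene_source("Swiss-Prot"))   # None
```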
Results of experimental verification of GenVar-detected gene disruptions (frameshifts and premature stop codons) in split genes in B. abortus S19
| Genomes in which the same | Chromosome | Number of Genes | Sequence verification | |
|---|---|---|---|---|
| True | False | |||
| I | 9 | √ | ||
| II | 3 | √ | ||
| I | 7 | √ | ||
| II | 10 | √ | ||
| No other genomes | I | 57 | √ | |
| II | 52 | √ | ||
Figure 4. Pairwise alignments of the Type IV secretion system protein VirB10 gene sequence. Panel I shows the alignment of the B. abortus 9-941 gene (gi|62317019) against that of B. melitensis 16M (gi|17988378), and panel II shows the alignment of the B. suis 1330 gene (gi|23499827), also against that of B. melitensis 16M. The two alignments show that B. melitensis 16M has an 8-residue deletion with respect to its orthologs in B. abortus 9-941 and B. suis 1330 (blue sections in panels I and II). B. suis 1330 has a 3-residue insertion (yellow part of panel II).