John Gladitz, Kai Shen, Patricia Antalis, Fen Ze Hu, J Christopher Post, Garth D Ehrlich.
Abstract
A similarity statistic for codon usage was developed and used to compare novel gene sequences found in clinical isolates of Haemophilus influenzae with a reference set of 80 prokaryotic, eukaryotic and viral genomes. These analyses were performed to obtain an indication as to whether individual genes were …
Year: 2005 PMID: 15983137 PMCID: PMC1160521 DOI: 10.1093/nar/gki670
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Characteristics of the reference organisms
| Reference organism (abbreviated) | ɛ (…) | GC% |
|---|---|---|
| | 0 | 38.76 |
| | 7.9 | 37.45 |
| | 12.65 | 37.15 |
| | 12.89 | 40.74 |
| | 15.48 | 37.15 |
| | 16.48 | 38.99 |
| HP1 phage (HP1) | 16.51 | 40.44 |
| | 17.12 | 37.79 |
| HP2 phage (HP2) | 17.32 | 40.4 |
| | 22.26 | 32.88 |
| | 22.49 | 35.5 |
| Enterobacteriophage T4 (T4) | 22.5 | 35.37 |
| | 24.53 | 40.61 |
| Bacteriophage A118 (A118) | 25.26 | 36.33 |
| | 28.8 | 39.14 |
| | 28.83 | 31.74 |
| | 28.96 | 24.98 |
| | 30.9 | 30.76 |
| | 33.26 | 39.8 |
| | 33.29 | 29.24 |
| | 33.62 | 26.75 |
| | 34.24 | 40.62 |
| | 34.36 | 36.88 |
| Gifsy-1 (GF1) | 34.79 | 42.27 |
| | 34.84 | 44.32 |
| | 36.4 | 47.35 |
| | 36.67 | 39.71 |
| | 36.69 | 25.87 |
| | 36.81 | 39.56 |
| | 36.91 | 38.93 |
| | 36.98 | 45.67 |
| | 37.53 | 45.49 |
| | 38.84 | 26.25 |
| Gifsy-2 (GF2) | 39.81 | 43.88 |
| | 40.01 | 44.32 |
| Influenza C virus (INFC) | 40.23 | 38.38 |
| Human papillomavirus 16 (HP16) | 40.8 | 38.12 |
| | 41.03 | 48.97 |
| | 42.37 | 42.76 |
| Human rotavirus (strain rv5) (HR5) | 43.51 | 32.77 |
| | 43.65 | 47.67 |
| | 43.74 | 31.85 |
| | 43.92 | 26.76 |
| FIV (FIV) | 44.07 | 36.44 |
| | 45.36 | 44.41 |
| | 45.95 | 36.32 |
| | 46.01 | 30.55 |
| Bacteriophage 933W (933W) | 47.86 | 49.81 |
| Hepatitis A virus (HepA) | 47.97 | 37.15 |
| | 48.15 | 34.16 |
| Enterobacteriophage Mu (Mu) | 48.22 | 52.14 |
| | 51.25 | 24.64 |
| | 51.95 | 51.83 |
| HIV type 1 (HIV1) | 52.21 | 43.33 |
| | 53.58 | 47.34 |
| Bacteriophage N15 (N15) | 53.69 | 51.39 |
| | 53.84 | 52.52 |
| Influenza A virus (Hong Kong) (infA) | 55.15 | 44.78 |
| | 55.73 | 53.78 |
| HIV type 2 (HIV2) | 57.74 | 46.14 |
| | 62.97 | 52.59 |
| | 63.68 | 53.06 |
| Plasmid R124 (R124) | 64.75 | 51.45 |
| | 65.62 | 46.45 |
| | 67.09 | 52.66 |
| | 68.15 | 52.41 |
| Agrobact. sp. (Agr) | 69.52 | 56.86 |
| | 70.1 | 55.79 |
| | 70.47 | 55.98 |
| | 75.35 | 57.6 |
| | 75.59 | 54.03 |
| | 76.59 | 49.37 |
| | 79.22 | 59.9 |
| | 86.47 | 56.93 |
| | 92.4 | 57.73 |
| | 99.09 | 63.24 |
| | 101.35 | 64.48 |
| | 108.1 | 67.24 |
| | 114.7 | 66.45 |
| | 116.67 | 67.67 |
Figure 1. Chart showing the correlation between protein length in amino acids (x-axis) and average ɛ value (y-axis).
Percentage of genes best-fitting Haemophilus with optimized amino acid bias factor
| Amino acid range | Number of genes | Average ɛ | SD | % Genes selecting Rd | % Genes selecting (HF + HFAY + HFDY) | % Genes selecting (HP1 + HP2) | % Genes selecting any Haemophilus genome (total) |
|---|---|---|---|---|---|---|---|
| 100–139 | 150 | 40.04 | 6.54 | 25.33 | 20.67 | 10 | 56 |
| 140–179 | 176 | 34.52 | 5.31 | 36.93 | 24.43 | 4.55 | 65.9 |
| 180–239 | 269 | 31.65 | 4.91 | 44.61 | 20.82 | 7.43 | 72.9 |
| 240–319 | 310 | 28.02 | 4.83 | 58.39 | 13.23 | 8.06 | 79.7 |
| 320+ | 633 | 24.36 | 5.94 | 63.19 | 15.32 | 7.42 | 85.9 |
HP1 = Haemophilus phage 1; HP2 = Haemophilus phage 2; HFAY = Haemophilus influenzae biogroup aegyptius; HFDY = Haemophilus ducreyi.
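The last column of the table above appears to be the sum of the three preceding percentage columns (for example, 25.33 + 20.67 + 10 = 56 for the 100–139 bin), and the gene counts add up to the 1538 Rd genes longer than 100 amino acids. The short Python check below, with values copied from the table, makes that arithmetic explicit; reading the last column as a simple sum is an inference from the numbers, not a definition stated in the source.

```python
# Values copied from the length-binned table:
# (amino acid range, number of genes, % Rd, % HF+HFAY+HFDY, % HP1+HP2, % in last column)
ROWS = [
    ("100-139", 150, 25.33, 20.67, 10.00, 56.0),
    ("140-179", 176, 36.93, 24.43, 4.55, 65.9),
    ("180-239", 269, 44.61, 20.82, 7.43, 72.9),
    ("240-319", 310, 58.39, 13.23, 8.06, 79.7),
    ("320+", 633, 63.19, 15.32, 7.42, 85.9),
]


def total_haemophilus(rd: float, strains: float, phages: float) -> float:
    """Sum of the three Haemophilus columns, rounded to 1 decimal like the last column."""
    return round(rd + strains + phages, 1)


def check_rows(rows):
    """Return (label, computed total, reported total) for every length bin."""
    return [(label, total_haemophilus(rd, s, p), reported)
            for label, _n, rd, s, p, reported in rows]


if __name__ == "__main__":
    for label, computed, reported in check_rows(ROWS):
        print(f"{label:>8}: computed {computed:5.1f}  reported {reported:5.1f}")
    print("genes binned:", sum(row[1] for row in ROWS))  # 1538
```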
Organisms providing the closest similarity to the 1538 Haemophilus influenzae Rd genes of length >100 amino acids
| GC content (%) | Number of genes | H. strains | H. phage | PM | Remainderᵃ |
|---|---|---|---|---|---|
| 25–28 | 15 | 2 | 0 | 0 | BB(2),CB(2),FN,GT(2),MM,PF,SA,UU(3) |
| 29–32 | 58 | 19 | 0 | 0 | CB(3),LL(3),MG(3),SA(3),CJ(4),T4(4),UU(7),BB(2),CM(2),LI(2),PF(2),HR5,LP |
| 33–42 | 1335 | 994 | 82 | 81 | AA(11),A118(3),BB,BH(5),BS,CN(2),CB,CJ(2),CM(2),T4(20),GF2(2),GT(6),HP(2),INFC,LH(6),LI(22),LL(17),LP(23),MG(4),MP(2),SA(19),SCHZ(2),SE,SP(3),UU(20) |
| 43–46 | 108 | 57 | 20 | 16 | AA(2),BH,BS,LH,MP,NG,SA(2),SE,SP,T4,UU,VC(2) |
| 47–50 | 22 | 0 | 13 | 2 | AA,HP,NG,VC(5) |
H. strains = best fit to one of the four Haemophilus strains; H. phage = best fit to Haemophilus phage HP1 or HP2; PM = best fit to P. multocida.
ᵃSee Table 1 for explanation of abbreviations.
Characteristics of Haemophilus influenzae Rd genes whose ɛ value (from Equation 1) exceeded the mean ɛ value plus one SD
| Gene name | Gene | %GC | Length (amino acids) | ɛ | Ref. org. | % Better fit > Haemophilus | ɛ (w/r ribosomal group) |
|---|---|---|---|---|---|---|---|
| Cell envelope | |||||||
| Lipoprotein (nlpC) | HI1314 | 40 | 161 | 42.43 | HFAY | 0 | 62.3 |
| Lic-1 operon protein (licA) | HI1537 | 33 | 267 | 33.45 | HFDY | 0 | 63.49 |
| Lic-1 operon protein (licD) | HI1540 | 34 | 265 | 33.86 | A118 | 8.9 | 49.85 |
| Undecaprenyl-phosphate alpha- | HI1716 | 37 | 356 | 32.52 | HFRD | 0 | 69.71 |
| Cellular processes | |||||||
| Lactoylglutathione lyase (gloA) | HI0323 | 41 | 135 | 47.77 | HFRD | 0 | 53.51 |
| Competence protein F (comF) | HI0434 | 38 | 230 | 36.22 | PM | 0.1 | 64.33 |
| Carbonic anhydrase-putative | HI1301 | 38 | 230 | 37.88 | HFRD | 0 | 55.98 |
| Conserved or predicted hypothetical | |||||||
| Conserved hypothetical protein (homol. to haloacid dehalogenase-like protein) | HI0003 | 37 | 263 | 35.27 | HFRD | 0 | 72.47 |
| | HI0152 | 35 | 236 | 35.08 | ET4 | 2.3 | 71.52 |
| | HI0221.1 | 44 | 162 | 43.64 | HFRD | 0 | 35.55 |
| Conserved hypothetical protein (predicted hydrolase or acyltransferase) | HI0282 | 36 | 248 | 35.42 | LP | 7.1 | 72.05 |
| Conserved hypothetical protein (homol. to transcriptional regulator) | HI0304 | 39 | 186 | 41.25 | PM | 2.5 | 68.96 |
| Conserved hypothetical protein (homol. to lysine 2,3-aminomutase) | HI0329 | 38 | 338 | 32.18 | HFRD | 0 | 66.55 |
| Conserved hypothetical protein | HI0510 | 42 | 239 | 36.98 | HP1 | 0 | 59.91 |
| Conserved hypothetical protein (homol. to pyruvate formate lyase) | HI0520 | 39 | 263 | 35.38 | PM | 1.6 | 75.17 |
| | HI0554 | 30 | 181 | 39.08 | UU | 4.5 | 72.21 |
| Conserved hypothetical protein | HI0638 | 38 | 205 | 37.07 | HP2 | 0 | 65.62 |
| Conserved hypothetical protein (homol. to DNA topoisomerase) | HI0656.1 | 39 | 179 | 39.7 | AA | 0.4 | 74.86 |
| Conserved hypothetical protein (4-diphosphocytidyl-2- | HI0672 | 40 | 226 | 36.26 | HFDY | 0 | 63 |
| Conserved hypothetical protein (probable pseudouridylate synthase) | HI0694 | 37 | 240 | 34.66 | HP1 | 0 | 67.36 |
| Conserved hypothetical protein (homol. to integral membrane protein) | HI0862 | 39 | 236 | 37.91 | LI | 0.5 | 43.75 |
| Conserved hypothetical protein (homol. to cytosine/adenosine deaminase) | HI0906 | 41 | 173 | 39.17 | PM | 0.3 | 72.95 |
| Conserved hypothetical protein (homol. to methylases) | HI0925 | 36 | 122 | 47.52 | HFDY | 0 | 76.27 |
| | HI0983 | 35 | 194 | 40.33 | HFRD | 0 | 71.52 |
| | HI1055 | 39 | 515 | 29.75 | HP2 | 0 | 66.79 |
| | HI1058 | 40 | 195 | 51.88 | GF2 | 9.9 | 88.85 |
| Conserved hypothetical protein (homology to membrane protein) | HI1073 | 38 | 125 | 49.31 | HF | 0 | 52.46 |
| Conserved hypothetical GTP-binding protein (predicted GTPase) | HI1118 | 40 | 206 | 37.32 | HP2 | 0 | 65.19 |
| Conserved hypothetical protein | HI1150 | 34 | 210 | 38.9 | HFDY | 0 | 68.94 |
| Conserved hypothetical protein (probable translation factor) | HI1198 | 39 | 207 | 36.96 | HFRD | 0 | 67.81 |
| | HI1343 | 39 | 239 | 36.09 | LL | 7.5 | 75.17 |
| | HI1375 | 28 | 302 | 33.53 | HFRD | 0 | 61.53 |
| | HI1498 | 47 | 139 | 44.72 | HP1 | 0 | 81.43 |
| | HI1499 | 46 | 189 | 40.88 | PM | 0.1 | 71.55 |
| | HI1500 | 48 | 508 | 30.69 | HP2 | 0 | 64.05 |
| | HI1505 | 47 | 308 | 33.23 | HP1 | 0 | 53.79 |
| Conserved hypothetical protein (homol. to Mu-like phage protein gp36) | HI1508 | 47 | 141 | 45.89 | HP1 | 0 | 66.55 |
| Conserved hypothetical protein (homol. to Mu-like phage protein gp37) | HI1509 | 49 | 194 | 41.08 | VC | 6.2 | 69.66 |
| | HI1518 | 50 | 182 | 41.53 | VC | 2.9 | 74.48 |
| | HI1519 | 50 | 135 | 50.11 | HP | 3.9 | 77.58 |
| | HI1523 | 38 | 296 | 33.71 | HP2 | 0 | 57.79 |
| | HI1570 | 42 | 170 | 47.82 | BH | 16.2 | 81.55 |
| Conserved hypothetical protein (probable 3-Deoxy-D-manno-octulosonate 8-phosphate phosphatase) | HI1679 | 43 | 180 | 38.57 | HFRD | 0 | 68.72 |
| Conserved hypothetical protein [homol. to Mn(+2) and Fe(+2) transporters] | HI1728 | 39 | 398 | 31.06 | HFRD | 0 | 61.31 |
| Conserved hypothetical protein (homol. to lactam utilization protein) | HI1729 | 39 | 258 | 34.28 | HFRD | 0 | 61.49 |
| Metabolism | |||||||
| Esterase | HI0184 | 44 | 276 | 33.45 | PM | 5 | 72.89 |
| Ferredoxin-type protein (napH) | HI0346 | 42 | 287 | 32.89 | HFRD | 0 | 62.02 |
| Urease accessory protein (ureH) | HI0535 | 44 | 262 | 36.19 | SE | 16.4 | 78.55 |
| Urease accessory protein(ureG) | HI0536 | 44 | 226 | 40.57 | VC | 5.7 | 47.87 |
| 2-Hydroxy acid dehydrogenase | HI1556 | 39 | 316 | 31.93 | HFRD | 0 | 59.98 |
| Enoyl-(acyl-carrier-protein) reductase (fabI) | HI1734 | 43 | 296 | 36.5 | HFRD | 0 | 41.78 |
| Nucleosides, nucleotides, purines, pyrimidines | |||||||
| Hydroxyethylthiazole kinase | HI0415 | 48 | 265 | 34.79 | HP1 | 0 | 75.16 |
| Thymidylate synthetase (thyA) | HI0905 | 40 | 283 | 33.19 | HFRD | 0 | 66.13 |
| Uracil phosphoribosyl transferase (upp) | HI1228 | 41 | 209 | 36.28 | HFRD | 0 | 42.11 |
| Phosphoribosyl aminoimidazole synthetase (purM) | HI1429 | 44 | 345 | 31.19 | HP2 | 0 | 42.27 |
| Phage-like | |||||||
| Transposase (muA) | HI1478 | 48 | 686 | 30.55 | HP2 | 0 | 56.04 |
| DNA transposition protein (muB) | HI1481 | 48 | 287 | 37.29 | HP2 | 0 | 60.03 |
| E16 protein-putative | HI1488 | 42 | 184 | 41.89 | GF2 | 8.4 | 74.87 |
| I protein (muI) | HI1504 | 48 | 355 | 33.09 | HP2 | 0 | 56.76 |
| Sheath protein gpL (muL) | HI1511 | 48 | 487 | 30.4 | HP2 | 0 | 66.86 |
| 64 kDa virion protein (muN) | HI1515 | 46 | 455 | 31.34 | VC | 8.7 | 64.41 |
| G protein (muG-2) | HI1568 | 44 | 139 | 43.51 | PM | 0.1 | 60.71 |
| Regulators | |||||||
| Transcriptional regulator-putative | HI0186 | 38 | 135 | 46.22 | HP1 | 0 | 67.41 |
| Transcriptional regulatory protein | HI1476 | 43 | 240 | 37.42 | BS | 1.5 | 56.47 |
| Replication | |||||||
| Integrase/recombinase (xerD) | HI0309 | 42 | 297 | 32.44 | HFRD | 0 | 67.01 |
| Holliday junction DNA helicase (ruvB) | HI0312 | 43 | 336 | 33.63 | HFRD | 0 | 64.1 |
| RNA, tRNA modifying | |||||||
| tRNA-guanine transglycosylase (tgt) | HI0244 | 41 | 383 | 32 | HFRD | 0 | 42.85 |
| rRNA methylase-putative | HI0766 | 39 | 161 | 40.92 | HFRD | 0 | 51.13 |
| Pseudouridylate synthase I (truA) | HI1644 | 41 | 270 | 35.53 | HFRD | 0 | 70.83 |
| Translation | |||||||
| Polypeptide deformylase (def) | HI0622 | 37 | 169 | 43.04 | HFRD | 0 | 63.76 |
| Prolyl-tRNA synthetase | HI0729 | 43 | 572 | 28.32 | HFRD | 0 | 38.02 |
| Transport | |||||||
| tonB protein | HI0251 | 40 | 271 | 35.21 | A118 | 8.3 | 57.67 |
| ABC transporter | HI0354 | 47 | 240 | 37.97 | HP2 | 0 | 80.01 |
| ABC transporter | HI0355 | 46 | 245 | 43.03 | BH | 0.9 | 82.72 |
| Glycerol-3-phosphate transporter (glpT) | HI0686 | 42 | 480 | 46.66 | HFRD | 0 | 16.89 |
| Amino acid ABC transporter-permease protein | HI1079 | 35 | 211 | 36.96 | HFRD | 0 | 57.36 |
| Heme exporter ATP-binding protein A (ccmA) | HI1089 | 42 | 212 | 37.11 | HFRD | 0 | 61.56 |
| Arginine ABC transporter-periplasmic-binding protein (artI) | HI1179 | 36 | 240 | 35.57 | HFRD | 0 | 54.75 |
| Iron chelatin ABC transporter | HI1472 | 44 | 352 | 30.99 | HP2 | 0 | 59.2 |
| ABC transporter-ATP-binding protein | HI1474 | 31 | 200 | 37.09 | ET4 | 9.8 | 61.34 |
| Glutamate permease (gltS) | HI1530 | 38 | 404 | 31.2 | HFDY | 0 | 37.44 |
Gene = the gene number in the annotated H. influenzae Rd genome; %GC = the percentage of GC base pairs in the gene; ɛ = the statistic derived from Equation 1; Ref. org. = the reference genome most similar in codon usage (see Table 1 for the list of abbreviations); % better fit > Haemophilus = the percentage by which ɛ for the most similar reference organism is lower than ɛ for Haemophilus; ɛ (w/r ribosomal group) = the ɛ value with respect to the 21 ribosomal and elongation genes of Haemophilus longer than 140 amino acids.
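The selection rule in this table's title (ɛ above the mean plus one SD) can be sketched in Python. This is an approximation under a stated assumption: it uses the length-bin averages and SDs from the "Percentage of genes best-fitting Haemophilus" table as the baseline, whereas the study compares each gene with "comparable sized genes", so borderline entries are not always reproduced (HI0186, 135 amino acids, ɛ = 46.22, falls just under this bin's cutoff of 46.58). The function names are hypothetical.

```python
from statistics import mean, stdev

# (min length, max length or None, average ɛ, SD) per amino acid bin, copied from the table
BINS = [
    (100, 139, 40.04, 6.54),
    (140, 179, 34.52, 5.31),
    (180, 239, 31.65, 4.91),
    (240, 319, 28.02, 4.83),
    (320, None, 24.36, 5.94),
]


def bin_threshold(length_aa: int) -> float:
    """Mean + 1 SD of ɛ for the length bin that contains length_aa."""
    for lo, hi, avg, sd in BINS:
        if length_aa >= lo and (hi is None or length_aa <= hi):
            return avg + sd
    raise ValueError("only genes of at least 100 amino acids were binned")


def is_high_epsilon(length_aa: int, eps: float) -> bool:
    """True if ɛ exceeds the bin's mean + 1 SD (i.e. codon usage is atypical)."""
    return eps > bin_threshold(length_aa)


def threshold_from_values(eps_values) -> float:
    """Mean + 1 SD computed from raw ɛ values of comparably sized genes.
    Uses the sample SD; the source does not say which SD was used."""
    return mean(eps_values) + stdev(eps_values)


print(round(bin_threshold(135), 2), is_high_epsilon(135, 47.77))  # HI0323 (gloA)
print(round(bin_threshold(686), 2), is_high_epsilon(686, 30.55))  # HI1478 (muA)
```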
Rd genes whose best fit under Equation 1 is to a non-Haemophilus organism, by a margin of more than 10%
| Gene | % GC | L (amino acids) | ɛ (…) | Gene name | Ref. org. | % better fit > Haemophilus |
|---|---|---|---|---|---|---|
| HI0916 | 35 | 198 | 32.57 | Outer membrane protein | UU | 14.5 |
| HI1407 | 38 | 448 | 20.46 | traN-related protein | LP | 14.9 |
| HI1599 | 34 | 239 | 25.15 | | SA | 15.1 |
| HI1470 | 38 | 254 | 30.6 | Iron chelatin ABC transporter | AA | 15.2 |
| HI0855 | 42 | 116 | 43.56 | Conserved hypothetical protein (protein homol. to inner membrane protein) | INFC | 15.5 |
| HI1411 | 39 | 172 | 28.81 | Terminase-small subunit | LP | 15.8 |
| HI1412 | 36 | 174 | 27.49 | Conserved hypothetical protein (protein homol. to phage-encoded prot, possible anti-repressor) | LP | 15.8 |
| HI1070 | 43 | 1305 | 20.06 | ATP-dependent helicase (hrpA) | PM | 16 |
| HI1570 | 42 | 170 | 47.82 | | BH | 16.2 |
| HI1385 | 30 | 165 | 29.38 | Ferritin (rsgA) | UU | 16.3 |
| HI0535 | 44 | 262 | 36.19 | Urease accessory protein (ureH) | SE | 16.4 |
| HI0087 | 40 | 424 | 29.21 | Threonine synthase (thrC) | BS | 16.6 |
| HI1110 | 34 | 504 | 17.68 | D-xylose ABC transporter-ATP-binding protein (xylG) | LL | 16.9 |
| HI0601 | 31 | 217 | 33.77 | DNA transformation protein (tfoX) | MG | 17.1 |
| HI1384 | 31 | 182 | 26.07 | Ferritin (rsgA) | SA | 17.1 |
| HI0724 | 32 | 186 | 28.5 | Conserved hypothetical protein | T4 | 17.6 |
| HI0011 | 41 | 135 | 34.03 | DNA polymerase III psi subunit (holD) | LP | 17.9 |
| HI0228 | 30 | 125 | 41.22 | | T4 | 18 |
| HI0977 | 30 | 191 | 29.91 | Cell filamentation protein (fic) | UU | 18.5 |
| HI1410 | 41 | 395 | 24.46 | | SE | 19.9 |
| HI1422 | 45 | 191 | 45 | | NG | 19.9 |
| HI0802 | 39 | 327 | 31.13 | DNA-directed RNA polymerase-alpha chain (rpoA) | T4 | 20.9 |
| HI1099 | 32 | 102 | 32.67 | | MP | 21 |
| HI1040 | 31 | 334 | 24.77 | Type II restriction enzyme | BB | 21.1 |
| HI0787 | 28 | 201 | 28.46 | | BB | 22.7 |
| HI0358 | 42 | 215 | 34.39 | Transcriptional activator-putative | SCHZ | 24.7 |
| HI0588 | 34 | 411 | 28.55 | | UU | 24.7 |
| HI0872 | 30 | 471 | 27.1 | Undecaprenyl-phosphate galactose phosphotransferase (rfbP) | CJ | 26.5 |
| HI1514 | 49 | 631 | 24.19 | | VC | 26.7 |
| HI1718 | 35 | 262 | 25.08 | | PF | 26.7 |
| HI0352 | 26 | 232 | 24.22 | Conserved hypothetical protein (protein homology to sialyl transferase) | UU | 27.6 |
| HI1459 | 32 | 195 | 25.09 | Putative sigma factor | T4 | 28.4 |
| HI1225 | 36 | 106 | 32.69 | Conserved hypothetical protein (homol. to translation initiation factor) | UU | 28.5 |
| HI0054 | 31 | 266 | 22.52 | Uxu operon regulator (uxuR) | PF | 32 |
| HI0053 | 35 | 343 | 25.14 | Zinc-type alcohol dehydrogenase | CB | 37.8 |
| HI0051 | 31 | 166 | 28.76 | Conserved hypothetical transmembrane protein (homol. to transport protein) | BB | 40.5 |
| HI1647 | 44 | 291 | 25.32 | Conserved hypothetical protein (homol. to pyridoxine biosynthesis protein) | SP | 45.5 |
| HI0258 | 27 | 331 | 25.96 | Glycosyl transferase-putative | GT | 46.3 |
| HI1041 | 32 | 304 | 26.95 | Modification methylase | PF | 50.5 |
| HI0688 | 25 | 103 | 32.43 | | MM | 53.7 |
| HI0871 | 28 | 306 | 26.06 | | BB | 54 |
| HI1578 | 27 | 323 | 23.91 | Glycosyl transferase | PF | 57.4 |
| HI0512 | 30 | 259 | 21.82 | Type II restriction endonuclease (HindIIR) | CB | 58.8 |
| HI1392 | 31 | 309 | 23.16 | Modification methylase (hindIIIM) | FN | 63.6 |
| HI1287 | 49 | 444 | 27.07 | Type I modification enzyme (hsdM) | NG | 66.6 |
| HI1393 | 26 | 300 | 17.94 | Type II restriction endonuclease (hindIIIR) | FN | 86.3 |
| HI0513 | 26 | 519 | 17.01 | Modification methylase (hindIIM) | CB | 89.3 |
| HI0687 | 26 | 304 | 18.13 | Conserved hypothetical protein (homol. to drug/metabolite transport protein) | CB | 99.6 |
Novel ORFs from H. influenzae clinical isolates with ɛ values closest to phage HP1 or HP2
| Clone no. and ORF | ɛ | Number of amino acids | % GC | Protein homology (%ID, %Sim) | Organism with closest protein homology |
|---|---|---|---|---|---|
| 100_E23 | 22.04 (34.04) | 487 | 44 | ||
| 103_L4 | 32.43 (35.87) | 255 | 45 | Hypoth. protein Hinf801001315 (98, 99) (underlying phage homologies) | |
| 120_O6(ORF1) | 38.88 (44.85) | 183 | 41 | Hypoth protein Hinf801001531 (94, 98) (underlying phage protein homologies) | |
| 122_N17(ORF1) | 30.32 (35.5) | 286 | 42 | ATPases of the AAA+ class (99, 100) | |
| 126_N4(ORF2) | 31.79 (35.75) | 279 | 39 | ATPase (AAA+ superfamily) (86, 88) | |
| 13_I7/135_C22(ORF2) | 31.92 (34.84) | 271 | 42 | Chromosome segregation ATPases (90, 94) (underlying phage homologies to capsid protein precursor) | HI R2866 |
| 152_N2 | 39.59 (42.83) | 213 | 46 | HifD (85, 88) | |
| 153_I16(ORFs1,2) | 26.45 (32.12) | 295 | 43 | Phage associated baseplate assembly protein | |
| 168_P21(ORF1) | 29.64 (32.04) | 165 | 42 | Hypoth. Protein MS0093 (47, 65) (underlying phage protein homologies) | |
| 32_F13 | 25.63 (31.7) | 412 | 45 | Superfamily II helicase and inactivated derivatives (49, 64) (underlying phage homologies) | |
| 38_O23(ORF1) | 27.23 (33.93) | 187 | 43 | Hypoth. protein MS0080 (53, 69) (underlying tail fiber protein homologies) | |
| 38_O23(ORF2) | 35.37 (46.4) | 235 | 49 | Hypoth protein MS0081 (51, 70) (underlying baseplate assembly homologies) | |
| 4_E21(ORF1) | 28.14 (34.48) | 240 | 45 | HifC (98, 99) | |
| 4_E21(ORF2) | 54.19 (58.91) | 189 | 48 | HifD (88, 90) | |
| 59_C2(ORFs1,2,3) | 30.96 (35.09) | 276 | | Orf1: transcriptional regulator (82, 88); Orf2: prophage CP4-57 regulator protein AlpA; Orf3: hypoth. protein lpp2120 (51, 69) | Orf1: |
| 67_D11(ORF1) | 28.74 (36.28) | 214 | 45 | Hypoth. protein Hflu203000157 (97, 97) (underlying phage homologies to endonuclease subunit) | |
| 67_D11(ORF2) | 32.46 (36.9) | 349 | 43 | Hypoth. protein Hinf801001765 (99, 100) (underlying phage homologies to capsid protein precursor) | |
| 67_D11(ORF3) | 50.71 (51.07) | 106 | 40 | Methyl accepting chemotaxis protein (97, 99) | |
| 97_H3 | 34.73 (35.05) | 327 | 41 | 2-methyl thioadenine synthetase (98, 99) | |
| 17_D20 | 22.99 (23.52) | 392 | 44 | Hypothetical protein Bucepa03004689 (47, 65) (underlying phage homologies) | |
ɛ = codon usage bias similarity statistic; %GC = the percentage of guanine and cytosine nucleotide bases in an ORF; % ID = the percentage of amino acids from the novel ORF that are identical to the protein encoded by the paralogous reference gene; %Sim = the percentage of amino acids from the novel ORF that are similar to the protein encoded by the orthologous/paralogous reference gene.
ᵃThe number of the ORF within the clone, if the clone contained multiple ORFs.
ᵇThe number in parentheses is the lowest ɛ value among the four H. influenzae strains.
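The tables assign each gene or ORF to the reference genome with the lowest ɛ and, separately, report the lowest ɛ among the Haemophilus strains (the value in parentheses above). A minimal Python sketch of that bookkeeping follows. The genome codes are the abbreviations used in these tables (HFRD, HF, HFAY and HFDY for the four Haemophilus genomes, HP1 and HP2 for the phages); the function name and the example ɛ values are hypothetical.

```python
# Abbreviations as used in the tables; grouping mirrors the "H. strains" / "H. phage" columns.
HAEMOPHILUS_STRAINS = {"HFRD", "HF", "HFAY", "HFDY"}
HAEMOPHILUS_PHAGES = {"HP1", "HP2"}


def classify_best_fit(eps_by_genome: dict[str, float]) -> tuple[str, str, float]:
    """Return (best genome, group, lowest ɛ among the Haemophilus strains).

    A lower ɛ means codon usage more similar to that reference genome.
    """
    best = min(eps_by_genome, key=eps_by_genome.get)
    if best in HAEMOPHILUS_STRAINS:
        group = "H. strain"
    elif best in HAEMOPHILUS_PHAGES:
        group = "H. phage"
    else:
        group = "other"
    strain_min = min(v for k, v in eps_by_genome.items() if k in HAEMOPHILUS_STRAINS)
    return best, group, strain_min


# Hypothetical ɛ values for one ORF (illustration only, not data from the study)
example = {"HFRD": 36.1, "HF": 35.4, "HFAY": 37.0, "HFDY": 39.2,
           "HP1": 33.8, "HP2": 31.5, "VC": 44.0}
print(classify_best_fit(example))  # ('HP2', 'H. phage', 35.4)
```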
Novel ORFs from H. influenzae clinical isolates that are probably of foreign origin
| Clone no. and ORF | Number of amino acids | % GC | Lowest ɛ (species) | Δ%ɛ | Protein homology (%ID, %Sim) | Organism with closest protein homology |
|---|---|---|---|---|---|---|
| 179_D14 | 240 | 69 | 31.98 (PA) | 179.2 | Flp pilus assembly protein, ATPase CpaF (91, 93) | Azotobacter vinelandii |
| 132_O3 (ORF1) | 191 | 49 | 33.81 (NM) | 55.3 | Conserved hypoth. protein ( | |
| 151_O4 | 385 | 29 | 26.4 (BB) | 38.7 | YhbX/YhjW/YijP/YjdB family (48, 69) | |
| 125_L2(ORF3) | 143 | 37 | 32.6 (GT) | 37.9 | Anaerobic decarboxylate transporter (71, 83) | |
| 121_L20 | 149 | 32 | 35.66 (MM) | 32.7 | SAM-dependent methyltransferase | |
| 55_M14 | 275 | 28 | 27.21 (CJ) | 30.1 | Hydrolase (metallo-beta-lactamase) (33, 52) | |
| 173_G10 | 225 | 34 | 29.85 (CM) | 29.7 | Hypoth. protein Hflu20300043 (98, 98) | |
| 104_E15(orf1) | 180 | 41 | 35.39 (BH) | 26.4 | Hypoth. protein Hsom02001338 (61, 76) | |
| 124_K2 | 567 | 43 | 24.64 (VC) | 23.9 | Type I restriction enzyme HsdR (70, 81) | |
| 125_L2(ORF1) | 144 | 32 | 34.91 (GT) | 23.2 | Transcriptional regulator (LysR family) (68, 77) | |
| 120_O6 (ORF 2) | 198 | 41 | 32.06 (HP) | 22.7 | Hypoth protein (73, 82) | |
| 125_L2(ORF2) | 230 | 34 | 31.14 (UU) | 21.3 | Unknown (58, 77) (putative aspartate racemase) | |
| 13_D9(ORF2) | 381 | 36 | 24.64 (A118) | 19.2 | TnaB (96, 97) | |
| 121_J7 | 293 | 33 | 33.35 (CB) | 19.1 | DNA methylase (58, 74) | E. coli |
| 32_B2 | 198 | 33 | 22.12 (SA) | 14.6 | Hemoglobin–haptoglobin binding protein HhuA (57, 73) | |
| 47_C3 | 306 | 41 | 29.04 (VC) | 9.8 | Recombinational DNA repair protein (99, 99) | |
| 183_E8 | 177 | 44 | 40.5 (BS) | 8.8 | Transcriptional regulator (100, 100) | |
| 159_B20(ORFs1,2) | 157 | 38 | 40.62 (A118) | 8.8 | HD0114 and HD0115 (40, 69) (weaker protein homologies to HI1496 and HI1495) | |
| 96_C16 | 371 | 33 | 19.82 (LL) | 7 | Restriction/modification protein HI0216 (68, 77) | |
| 13_D9(ORF1) | 185 | 43 | 32.29 (GT) | 5.3 | TnaA (98, 98) | |
| 134_O6 | 270 | 39 | 30.24 (PM) | 4.6 | Fapy DNA glycosylase (98, 99) | |
| 112_A12(ORF1) | 250 | 34 | 23.99 (PM) | 3.9 | ADP-heptose:LPS heptosyltransferase (100, 100) | |
| 43_I10 | 358 | 35 | 24.28 (SA) | 2.1 | Putative glucosidase (73, 84) | |
| 110_E11(ORF1) | 151 | 49 | 46.65 (HP) | 1.4 | Hypoth. protein HD1532 (68, 81) | |
| 128_C1 | 141 | 45 | 33.0 (PM) | 0.8 | Hypoth. protein Lin1719 | Listeria innocua |
| 93_G12/117_A22 (ORF1) | 254 | 43 | 30.79 (LP) | 0.7 | Hypoth. protein | |
| 112_A12(ORF2) | 116 | 40 | 47.52 (PM) | 0.6 | Fapy DNA glycosylase (100, 100) |
ᵃThe number of the ORF within the clone, if the clone contained multiple ORFs; ɛ = codon usage bias similarity statistic.
ᵇThe letters in parentheses indicate the species that gave the lowest ɛ value; PA = Pseudomonas aeruginosa; NM = Neisseria meningitidis; CB = Clostridium butyricum; BH = Bacillus halodurans; MP = Mycoplasma penetrans; VC = Vibrio cholerae; CJ = Campylobacter jejuni; LL = Lactococcus lactis; T4 = enterobacteriophage T4; PM = Pasteurella multocida; NG = Neisseria gonorrhoeae; HI = Haemophilus influenzae.
%GC = the percentage of guanine and cytosine nucleotide bases in an ORF; Δ%ɛ = the percentage difference in ɛ values between the best-fitting genome and the best-fitting Haemophilus group genome; % ID = the percentage of amino acids from the novel ORF that are identical to the protein encoded by the paralogous reference gene; % Sim = the percentage of amino acids from the novel ORF that are similar to the protein encoded by the orthologous/paralogous reference gene.
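The Δ%ɛ and "% better fit" columns express the ɛ gap between the best-fitting genome and the best-fitting Haemophilus genome as a percentage. The denominator is not stated. Some values exceed 100% (179.2 for 179_D14), which a gap taken relative to the larger Haemophilus ɛ could never do, so the Python sketch below assumes the gap is divided by the best-fitting (lower) ɛ. Both function names are hypothetical.

```python
def delta_percent(eps_haemophilus: float, eps_best: float) -> float:
    """Percentage by which the best-fitting genome's ɛ is lower than the best
    Haemophilus ɛ, ASSUMING the difference is expressed relative to eps_best."""
    if eps_best <= 0:
        raise ValueError("eps_best must be positive")
    return (eps_haemophilus - eps_best) / eps_best * 100.0


def implied_haemophilus_eps(eps_best: float, delta_pct: float) -> float:
    """Invert delta_percent: the Haemophilus ɛ implied by a reported Δ%ɛ."""
    return eps_best * (1.0 + delta_pct / 100.0)


# 179_D14: lowest ɛ 31.98 (PA) and Δ%ɛ 179.2 imply a best Haemophilus ɛ of about 89.3
print(round(implied_haemophilus_eps(31.98, 179.2), 2))
```

Under this assumption, a clone such as 179_D14 (69% GC) fits its Haemophilus reference far worse than P. aeruginosa, which is consistent with its placement among ORFs of probable foreign origin.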
Novel ORFs from H. influenzae clinical isolates that probably represent genes that are part of the original H. influenzae gene pool
| Clone no. and ORFᵃ | ɛ, High ɛ thresholdᵇ | Number of amino acids | %GC | Protein homology (%ID, %Sim)ᶜ |
|---|---|---|---|---|
| Hb | 14.72, 26.3 | 1011 | 31 | Hemoglobin binding protein A (49, 76) |
| 166_G6(ORFs1,2,3) | 17.7, 28.5 | 567 | 39, 39, 45 | soluble lytic murein transglycosylase (94, 97): hypothetical protein Hflu2121901 (98, 99): Type IV secretory pathway, VirD4 components (99, 100) |
| 101_H6(ORFs2,3,4) | 18.85, 28.1 | 607 | 37, 36, 40 | Malate/L-lactate dehydrogenase (95, 97): permease of the major facilitator superfamily (97, 98): hypothetical protein (98, 98) |
| UI | 20.42, 29.6 | 465 | 36 | Uronate isomerase (78, 79) |
| HifE | 25.83, 34.0 | | 38 | HifE (94, 95) |
| 120_C11 | 24.39, 30.9 | 284 | 37 | HMW1B (88, 93) |
| 135_I10 | 28.42, 32.3 | 291 | 38 | Las (autotransporter protein) (53, 63) |
| 36_E20 | 27.38, 30.8 | 384 | 42 | HMWA (86, 88) |
| 167_A16(ORFs2,3) | 25.35, 27.9 | 598 | 37, 36 | TPR repeat (99, 99): TPR repeat (22, 46) |
| 170_J8 | 31.18, 32.8 | 286 | 42 | HMWA (97, 99) |
ᵃThe number of the ORF within the clone, if the clone contained multiple ORFs.
ᵇɛ = codon usage bias similarity statistic; High ɛ threshold = the average ɛ plus one SD for comparably sized genes.
ᶜAll protein homologies are from various Haemophilus strains; % ID = the percentage of amino acids from the novel ORF that are identical to the protein encoded by the paralogous reference gene; % Sim = the percentage of amino acids from the novel ORF that are similar to the protein encoded by the orthologous/paralogous reference gene.
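The %GC and %ID columns used throughout the ORF tables are simple ratios. The Python sketch below computes %GC over an ORF's nucleotides and %ID over one pairwise alignment given as two gapped strings of equal length. Dividing by the number of residues in the novel ORF follows the footnote wording ("amino acids from the novel ORF"), but how the study's alignment tool counted gaps is an assumption. %Sim is omitted because it depends on a substitution matrix the source does not name. The sequences are toy examples.

```python
def percent_gc(seq: str) -> float:
    """Percentage of G and C among the A/C/G/T bases of a nucleotide sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def percent_identity(orf_aln: str, ref_aln: str) -> float:
    """Identical aligned positions as a percentage of the novel ORF's residues.

    Both strings are rows of one pairwise alignment ('-' marks a gap).
    """
    if len(orf_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    orf_residues = sum(a != "-" for a in orf_aln)
    if orf_residues == 0:
        raise ValueError("ORF row contains only gaps")
    identical = sum(a == b and a != "-" for a, b in zip(orf_aln, ref_aln))
    return 100.0 * identical / orf_residues


# Toy sequences for illustration only
print(percent_gc("ATGCGCAT"))                # 50.0
print(percent_identity("MKT-LV", "MKSALV"))  # 80.0
```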