Jessica Marklevitz1, Laura K Harris1,2.
Abstract
Antibiotic-resistant Staphylococcus aureus is a major public health concern affecting millions of people annually. Medical science has documented completely untreatable S. aureus infections, and these strains are appearing in the community with increasing frequency. New diagnostic and therapeutic options are needed to combat this deadly infection. Interestingly, around 50% of the proteins in S. aureus are annotated as hypothetical. Methods to select hypothetical proteins related to antibiotic resistance have been inadequate. This study uses differential gene expression to identify hypothetical proteins related to strain variations in the antibiotic-resistant phenotype. We apply computational tools to predict physiochemical properties, cellular location, sequence-based homologs, domains, 3D models, active site features, and binding partners. Nine of 23 hypothetical proteins were <100 residues and, based on size, unlikely to be functional proteins. For the 14 remaining differentially expressed hypothetical proteins, confident predictions of function could not be made. Most identified domains had unknown functions. Six hypothetical protein models had >50% confidence over >20% of residues. These findings indicate that the method of hypothetical protein identification is sufficient; however, current scientific knowledge is inadequate to properly annotate these proteins. This process should be repeated regularly until entire genomes are clearly and accurately annotated.
Keywords: Annotations; Hypothetical proteins; Methicillin; S. aureus
Year: 2017 PMID: 28539731 PMCID: PMC5429968 DOI: 10.6026/97320630013104
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
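
The size criterion in the abstract (proteins shorter than 100 residues are set aside) can be sketched against the lengths reported in the physiochemical properties table. This is a minimal illustration rather than the authors' pipeline: the tuple layout, `MIN_LENGTH`, and `split_by_length` are names introduced here, while the T-scores and lengths are copied from that table.

```python
# (locus tag, T-score, length in residues), copied from the properties table.
PROTEINS = [
    ("SACOL0919", 18.77, 45), ("SACOL1859", 12.23, 1016),
    ("SACOL1346", 10.83, 64), ("SACOL0356", 9.29, 78),
    ("SACOL0326", 7.48, 74), ("SACOL0323", 7.32, 102),
    ("SACOL0109", 6.83, 135), ("SACOL0087", 6.62, 35),
    ("SACOL0075", 6.04, 200), ("SACOL0644", 5.35, 208),
    ("SACOL0350", 3.8, 118), ("SACOL0362", 3.77, 66),
    ("SACOL2481", 3.23, 121), ("SACOL0835", -2.56, 209),
    ("SACOL2241", -6.45, 129), ("SACOL2123", -6.59, 223),
    ("SACOL2491", -8.97, 63), ("SACOL2571", -9.8, 63),
    ("SACOL2076", -10.78, 45), ("SACOL1956", -14.64, 176),
    ("SACOL0267", -15.31, 507), ("SACOL0488", -24.11, 107),
    ("SACOL0710", -25.02, 165),
]

MIN_LENGTH = 100  # cutoff stated in the abstract


def split_by_length(records, min_length=MIN_LENGTH):
    """Return (kept, dropped) lists of records, split on residue count."""
    kept = [r for r in records if r[2] >= min_length]
    dropped = [r for r in records if r[2] < min_length]
    return kept, dropped


kept, dropped = split_by_length(PROTEINS)
print(f"{len(dropped)} below cutoff, {len(kept)} retained")
```

Run as written, the split matches the counts in the abstract: nine proteins fall below the cutoff and 14 are retained for further analysis.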
Differential expression T-scores and physiochemical properties of 23 hypothetical proteins
| Protein | T-score | # AA | MW | pI | # neg | # pos | EC | II | AI | GRAVY |
| SACOL0919 | 18.77 | 45 | 5270 | 9.03 | 4 | 6 | 2980 | 17.29 | 136.22 | 0.56 |
| SACOL1859 | 12.23 | 1016 | 120681 | 5.7 | 147 | 130 | 161360 | 37.5 | 95.75 | -0.415 |
| SACOL1346 | 10.83 | 64 | 7573 | 4.1 | 16 | 6 | 5960 | 21.26 | 92.81 | -0.42 |
| SACOL0356 | 9.29 | 78 | 8726 | 4.32 | 17 | 7 | 5960 | 41.83 | 86.28 | -0.529 |
| SACOL0326 | 7.48 | 74 | 8841 | 4.54 | 18 | 7 | 11460 | 52.27 | 77.7 | -0.938 |
| SACOL0323 | 7.32 | 102 | 11944 | 7.91 | 17 | 18 | 9970 | 33.41 | 89.8 | -0.762 |
| SACOL0109 | 6.83 | 135 | 15123 | 4.45 | 14 | 8 | 26930 | 39.15 | 132.89 | 0.757 |
| SACOL0087 | 6.62 | 35 | 4172 | 6 | 5 | 5 | 4470 | 25.62 | 94.57 | -0.149 |
| SACOL0075 | 6.04 | 200 | 22662 | 9.55 | 9 | 20 | 31860 | 42.51 | 121.35 | 0.665 |
| SACOL0644 | 5.35 | 208 | 24690 | 9.55 | 14 | 25 | 43890 | 29.04 | 125.48 | 0.448 |
| SACOL0350 | 3.8 | 118 | 13923 | 10.08 | 14 | 28 | 12950 | 26.44 | 76.02 | -0.804 |
| SACOL0362 | 3.77 | 66 | 7806 | 8.03 | 8 | 9 | 9970 | 37.99 | 125.45 | 0.185 |
| SACOL2481 | 3.23 | 121 | 14067 | 4.54 | 21 | 11 | 5960 | 35.52 | 118.43 | -0.098 |
| SACOL0835 | -2.56 | 209 | 24070 | 9.07 | 31 | 36 | 8940 | 64.82 | 34.16 | -1.974 |
| SACOL2241 | -6.45 | 129 | 14638 | 9.73 | 3 | 8 | 18450 | 26.53 | 155.74 | 1.209 |
| SACOL2123 | -6.59 | 223 | 25856 | 4.74 | 43 | 28 | 28550 | 41.8 | 93 | -0.289 |
| SACOL2491 | -8.97 | 63 | 7221 | 4.6 | 10 | 6 | 7450 | 25.52 | 97.46 | -0.146 |
| SACOL2571 | -9.8 | 63 | 7266 | 5.44 | 9 | 7 | 1490 | 17.06 | 103.65 | -0.233 |
| SACOL2076 | -10.78 | 45 | 5070 | 10.46 | 3 | 10 | ¹ | 45.35 | 114.67 | -0.424 |
| SACOL1956 | -14.64 | 176 | 20513 | 9.25 | 10 | 15 | 21555 | 39.83 | 132.95 | 0.747 |
| SACOL0267 | -15.31 | 507 | 57978 | 8.02 | 93 | 95 | 36790 | 23.76 | 74.48 | -0.906 |
| SACOL0488 | -24.11 | 107 | 13458 | 5.23 | 26 | 21 | 15930 | 65.54 | 59.25 | -1.693 |
| SACOL0710 | -25.02 | 165 | 19009 | 5.08 | 27 | 17 | 10430 | 33.94 | 100.48 | -0.181 |
# AA, number of amino acids; MW, molecular weight; pI, theoretical isoelectric point; # neg, total number of negatively charged residues (Asp + Glu); # pos, total number of positively charged residues (Arg + Lys); EC, extinction coefficient assuming all pairs of Cys residues form cystines; II, instability index; AI, aliphatic index; GRAVY, grand average of hydropathy.
¹ As there are no Trp, Tyr, or Cys in the region considered, the protein should not be visible by UV spectrophotometry.
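
Several of the columns defined above follow simple published formulas, sketched below for a raw amino acid sequence. The sketch assumes the standard Kyte-Doolittle hydropathy scale for GRAVY, Ikai's aliphatic index, and the usual 280 nm molar extinction values (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹). The function names are introduced here for illustration, and the instability index is omitted because it needs a dipeptide weight table. The example input is the SACOL0835 transmembrane segment from the SOSUI table, a fragment rather than the full protein.

```python
# Kyte-Doolittle hydropathy scale, used for GRAVY.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai's index from mole percentages of Ala, Val, Ile and Leu."""
    def pct(aa):
        return 100.0 * seq.count(aa) / len(seq)
    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def extinction_coefficient(seq):
    """EC at 280 nm, assuming all Cys residues pair up as cystines."""
    return (5500 * seq.count("W")
            + 1490 * seq.count("Y")
            + 125 * (seq.count("C") // 2))


def charged_counts(seq):
    """Return (# neg, # pos) = (Asp + Glu, Arg + Lys)."""
    return (seq.count("D") + seq.count("E"),
            seq.count("R") + seq.count("K"))


SEGMENT = "AKVVSIATVLLLLGGLVFAIFAY"  # SACOL0835 segment, SOSUI table
print(round(gravy(SEGMENT), 3), round(aliphatic_index(SEGMENT), 2))
print(extinction_coefficient(SEGMENT), charged_counts(SEGMENT))
```

A sequence with no Trp, Tyr, or Cys gives an extinction coefficient of zero under this formula, which is the situation the footnote describes for SACOL2076.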
Top PSI-BLAST result for 14 hypothetical proteins
| Protein | PSI-BLAST Match | Query Cover | E-value | Identity |
| SACOL1859 | NTPase | 100% | 0 | 100% |
| SACOL0323 | Metallophosphoesterase | 59% | 1.6 | 31% |
| SACOL0109 | Membrane protein | 100% | 3.00E-44 | 59% |
| SACOL0075 | Membrane spanning protein | 90% | 7.00E-124 | 98% |
| SACOL0644 | tandem five-TM protein | 100% | 1.00E-143 | 99% |
| SACOL0350 | Phage protein | 100% | 5.00E-80 | 98% |
| SACOL2481 | Outer membrane protein | 59% | 4.3 | 27% |
| SACOL0835 | Exported protein | 91% | 8.00E-128 | 100% |
| SACOL2241 | Membrane protein | 79% | 3.00E-64 | 100% |
| SACOL2123 | PF11042 family protein | 100% | 1.00E-91 | 65% |
| SACOL1956 | Permease | 100% | 3.00E-123 | 100% |
| SACOL0267 | Exported protein | 51% | 8.00E-170 | 98% |
| SACOL0488 | Cytosolic protein | 89% | 2.00E-59 | 100% |
| SACOL0710 | RHS repeat-associated core domain-containing protein | 87% | 1.00E-09 | 29% |
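
Not every top hit in the table above is equally informative: two of the E-values are greater than 1. A short sketch of screening hits by E-value follows. The cutoff of 1e-3 is an assumption chosen for illustration, not a threshold stated in the source, and `weak_hits` is a hypothetical helper name.

```python
# (locus tag, top PSI-BLAST match, E-value), copied from the table above.
HITS = [
    ("SACOL1859", "NTPase", 0.0),
    ("SACOL0323", "Metallophosphoesterase", 1.6),
    ("SACOL0109", "Membrane protein", 3.00e-44),
    ("SACOL0075", "Membrane spanning protein", 7.00e-124),
    ("SACOL0644", "tandem five-TM protein", 1.00e-143),
    ("SACOL0350", "Phage protein", 5.00e-80),
    ("SACOL2481", "Outer membrane protein", 4.3),
    ("SACOL0835", "Exported protein", 8.00e-128),
    ("SACOL2241", "Membrane protein", 3.00e-64),
    ("SACOL2123", "PF11042 family protein", 1.00e-91),
    ("SACOL1956", "Permease", 3.00e-123),
    ("SACOL0267", "Exported protein", 8.00e-170),
    ("SACOL0488", "Cytosolic protein", 2.00e-59),
    ("SACOL0710",
     "RHS repeat-associated core domain-containing protein", 1.00e-09),
]

E_VALUE_CUTOFF = 1e-3  # assumption: illustrative significance threshold


def weak_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return locus tags whose top hit has an E-value above the cutoff."""
    return [tag for tag, _match, e_value in hits if e_value > cutoff]


print(weak_hits(HITS))
```

With this cutoff the flagged entries are SACOL0323 and SACOL2481, the same two rows that show the lowest query cover and identity.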
PSortB cellular location of 14 hypothetical proteins
| Protein | Location | Localization Score |
| SACOL1859 | Unknown | 2.50¹ |
| SACOL0323 | Cytoplasm | 7.5 |
| SACOL0109 | Cytoplasmic membrane | 10 |
| SACOL0075 | Cytoplasmic membrane | 10 |
| SACOL0644 | Cytoplasmic membrane | 10 |
| SACOL0350 | Unknown | 2.50¹ |
| SACOL2481 | Cytoplasm | 7.5 |
| SACOL0835 | Cytoplasmic membrane | 9.55 |
| SACOL2241 | Cytoplasmic membrane | 10 |
| SACOL2123 | Cytoplasm | 7.5 |
| SACOL1956 | Cytoplasmic membrane | 10 |
| SACOL0267 | Unknown | 3.33² |
| SACOL0488 | Cytoplasm | 7.5 |
| SACOL0710 | Cytoplasm | 7.5 |
¹ Equal probability of the protein being located in any cellular structure: cytoplasm, cytoplasmic membrane, cell wall, or extracellular.
² Equal probability of the protein being located in cytoplasmic membrane, cell wall, or extracellular.
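
The scores for the "Unknown" rows follow from simple arithmetic if one assumes PSORTb's convention that localization scores sum to 10 across sites and that a site is reported only when its score reaches 7.5. Both constants are stated here as assumptions about the tool, not taken from the source, and the helper names are illustrative.

```python
TOTAL_SCORE = 10.0      # assumption: scores across all sites sum to 10
CONFIDENT_CUTOFF = 7.5  # assumption: minimum score for a reported site


def equal_split(n_sites):
    """Score per site when n_sites are equally probable."""
    return round(TOTAL_SCORE / n_sites, 2)


def final_call(site, score, cutoff=CONFIDENT_CUTOFF):
    """Report the site only if its score reaches the cutoff."""
    return site if score >= cutoff else "Unknown"


print(equal_split(4))  # four equally likely sites
print(equal_split(3))  # three equally likely sites
print(final_call("Cytoplasm", 7.5))
print(final_call("Cytoplasmic membrane", equal_split(3)))
```

Under these assumptions, four equally likely sites give 2.5 each and three give 3.33 each, matching the two footnoted cases.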
SOSUI results for 7 transmembrane hypothetical proteins
| Protein | N-terminal | Transmembrane Region | C-terminal | Type | Length |
| SACOL0109 | 53 | IGKIAIWIGIVAQIYFSVVFVRM | 75 | PRIMARY | 23 |
|  | 89 | IFLLGLILALFTVLPTIFTAIYM | 111 | PRIMARY | 23 |
|  | 123 | IVYAIIALCLYNFLSSILWLIGG | 145 | PRIMARY | 23 |
| SACOL0075 | 7 | KIAIWIGIVAQIYFSVVFVRMIS | 29 | PRIMARY | 23 |
|  | 41 | IFLLGLILALFTVLPTIFTAIYM | 63 | PRIMARY | 23 |
|  | 75 | IVYAIIALCLYNFLSSILWLIGG | 97 | PRIMARY | 23 |
| SACOL0644 | 23 | YLLIDLVSTWLVYFFPFINWFIP | 45 | SECONDARY | 23 |
|  | 94 | QLDNKILISLCFIGFIGIAAFYI | 116 | PRIMARY | 23 |
|  | 147 | SFIVFTYLLLGGCSILFLIWLMT | 169 | PRIMARY | 23 |
|  | 174 | NLLVFIMWIIITIFFFLISMGSI | 196 | PRIMARY | 23 |
| SACOL0835 | 23 | AKVVSIATVLLLLGGLVFAIFAY | 45 | PRIMARY | 23 |
| SACOL2241 | 10 | ALIGIFLILCEFFYGIPFLGATF | 32 | PRIMARY | 23 |
|  | 40 | PLLFNALLYLILTIILLVNRQNA | 62 | PRIMARY | 23 |
|  | 65 | PIAIIPIFGIVGSFLAIIPFLGI | 87 | PRIMARY | 23 |
|  | 90 | HWILFFLMILFVLVVLSAPTYIP | 112 | PRIMARY | 23 |
| SACOL1956 | 16 | FIILQLVIALFVILFTYKWALGV | 38 | PRIMARY | 23 |
|  | 50 | LVYGFAGFIILLILHELIHRALF | 72 | PRIMARY | 23 |
|  | 103 | QFSIIMLSPLILLSTGLLILIKV | 125 | PRIMARY | 23 |
|  | 134 | MFSMHTAYCFIDILLVALTISSS | 156 | PRIMARY | 23 |
| SACOL0267 | 6 | KIIIPIIIVLLLIGGIAWGVYAF | 28 | PRIMARY | 23 |
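
The table lists one row per predicted helix, with the protein name given only on the first row of each group. A small sketch shows how the rows can be tallied per protein. It assumes the N-terminal and C-terminal columns are the first and last residue positions of each helix (inclusive), which is consistent with the Length column; `SEGMENTS` and `segment_length` are names introduced here.

```python
from collections import Counter

# (locus tag, first residue, last residue); protein names are filled
# forward onto continuation rows of the SOSUI table.
SEGMENTS = [
    ("SACOL0109", 53, 75), ("SACOL0109", 89, 111), ("SACOL0109", 123, 145),
    ("SACOL0075", 7, 29), ("SACOL0075", 41, 63), ("SACOL0075", 75, 97),
    ("SACOL0644", 23, 45), ("SACOL0644", 94, 116),
    ("SACOL0644", 147, 169), ("SACOL0644", 174, 196),
    ("SACOL0835", 23, 45),
    ("SACOL2241", 10, 32), ("SACOL2241", 40, 62),
    ("SACOL2241", 65, 87), ("SACOL2241", 90, 112),
    ("SACOL1956", 16, 38), ("SACOL1956", 50, 72),
    ("SACOL1956", 103, 125), ("SACOL1956", 134, 156),
    ("SACOL0267", 6, 28),
]


def segment_length(first, last):
    """Inclusive residue count between two positions."""
    return last - first + 1


helices = Counter(tag for tag, _first, _last in SEGMENTS)
lengths = {segment_length(first, last) for _tag, first, last in SEGMENTS}

for tag, count in helices.items():
    print(tag, count)
print("distinct segment lengths:", lengths)
```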
Phyre2 model data for 14 hypothetical proteins
| Protein | Template | Template Description | Confidence | Coverage |
| SACOL1859 | c4kxfF | nlr family card domain-containing protein 4 | 99.70% | 30% |
| SACOL0323 | d1nu9c1 | immunoglobulin/albumin-binding domain-like | 37.80% | 25% |
| SACOL0109 | c3x29A | crystal structure of mouse claudin-19 | 73.20% | 45% |
| SACOL0075 | c4zxsD | virion egress protein ul31 | 55.60% | 20% |
| SACOL0644 | c4yjxB | ATP-dependent clp protease adapter protein | 30.70% | 7% |
| SACOL0350 | c2qdqA | talin-1 | 40.70% | 21% |
| SACOL2481 | c3daoB | putative phosphatase | 23.30% | 17% |
| SACOL0835 | c2ifmA | pf1 filamentous bacteriophage | 80.30% | 14% |
| SACOL2241 | c2ap8A | bombinin h4 | 43.20% | 10% |
| SACOL2123 | c1zctB | glycogenin-1 | 46.30% | 12% |
| SACOL1956 | c3b4rB | putative zinc metalloprotease mj0392 | 89.30% | 41% |
| SACOL0267 | c3jcuj | photosystem ii reaction center protein j | 50.40% | 5% |
| SACOL0488 | c4c46B | general control protein gcn4 | 80.60% | 29% |
| SACOL0710 | c1kt0A | large fkbp-like protein, fkbp51, involved in steroid receptor complexes | 93.60% | 47% |
Figure 2. Phyre2’s intensive mode models for hypothetical proteins SACOL1859 (A), SACOL0323 (B), SACOL0109 (C), SACOL1956 (D), SACOL0488 (E), and SACOL0710 (F). Image colored by rainbow N- to C-terminus.
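
The abstract's model-quality criterion (>50% confidence over >20% of residues) can be applied to the Phyre2 table as a simple filter. The sketch below is illustrative only: `confident_models` and its parameters are hypothetical names, and whether the bounds are strict or inclusive is an assumption exposed as a flag.

```python
# (locus tag, confidence %, coverage %), copied from the Phyre2 table.
MODELS = [
    ("SACOL1859", 99.7, 30), ("SACOL0323", 37.8, 25),
    ("SACOL0109", 73.2, 45), ("SACOL0075", 55.6, 20),
    ("SACOL0644", 30.7, 7), ("SACOL0350", 40.7, 21),
    ("SACOL2481", 23.3, 17), ("SACOL0835", 80.3, 14),
    ("SACOL2241", 43.2, 10), ("SACOL2123", 46.3, 12),
    ("SACOL1956", 89.3, 41), ("SACOL0267", 50.4, 5),
    ("SACOL0488", 80.6, 29), ("SACOL0710", 93.6, 47),
]


def confident_models(models, min_conf=50.0, min_cov=20.0, inclusive=False):
    """Return locus tags passing both the confidence and coverage bounds."""
    if inclusive:
        return [t for t, conf, cov in models
                if conf >= min_conf and cov >= min_cov]
    return [t for t, conf, cov in models
            if conf > min_conf and cov > min_cov]


print(confident_models(MODELS))
print(confident_models(MODELS, inclusive=True))
```

With strict inequalities the tabulated values pass five proteins (SACOL1859, SACOL0109, SACOL1956, SACOL0488, SACOL0710); treating the bounds as inclusive also admits SACOL0075, whose coverage is exactly 20%.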
Top STITCH predicted binding partners for 11 hypothetical proteins
| Protein | Substrate | Score |
| SACOL1859 | SACOL1860 | 0.651 |
| SACOL0323 | SACOL0322 | 0.819 |
| SACOL0109 | SACOL0110 | 0.692 |
| SACOL0075 | SACOL0076 | 0.462 |
| SACOL0644 | SACOL0643 | 0.859 |
| SACOL0350 | SACOL0351 | 0.859 |
| SACOL2123 | SACOL2125 | 0.422 |
|  | SACOL2124 | 0.422 |
| SACOL1956 | SACOL2519 | 0.685 |
| SACOL0267 | SACOL0266 | 0.694 |
| SACOL0488 | SACOL0487 | 0.859 |
|  | SACOL0486 | 0.859 |
| SACOL0710 | SACOL0709 | 0.57 |
|  | SACOL0708 | 0.57 |
CDD-BLAST domain data for 7 hypothetical proteins
| Protein | Domains | Description | E-value |
| SACOL1859 | pfam13401 | AAA | 1.61E-04 |
|  | smart00382 | ATPase | 7.59E-03 |
| SACOL0644 | pfam04276 | Protein of unknown function (DUF443) | 1.03E-37 |
| SACOL0350 | pfam07768 | PVL ORF-50-like family | 8.79E-47 |
| SACOL0835 | pfam16228 | Domain of unknown function (DUF4887) | 1.18E-12 |
| SACOL2123 | pfam11042 | Protein of unknown function (DUF2750) | 2.38E-20 |
| SACOL1956 | pfam11667 | Putative zincin peptidase | 4.49E-08 |
| SACOL0488 | pfam13654 | AAA | 5.17E-03 |
Pfam domain data for 5 hypothetical proteins
| Protein | Domain | Description | E-value |
| SACOL0644 | DUF443 | Unknown function | 9.80E-56 |
| SACOL0350 | PVL_ORF50 | Panton-Valentine leucocidin ORF-50-like family | 2.80E-45 |
| SACOL0835 | DUF4887 | Unknown function | 1.70E-50 |
| SACOL2123 | DUF2750 | Unknown function | 2.10E-21 |
| SACOL1956 | DUF3267 | Putative zincin peptidase | 1.60E-18 |