Chilamakuri Cs Reddy, Sane Sudha Rani, Bernard Offmann, R Sowdhamini.
Abstract
BACKGROUND: Protein domains are the fundamental units of protein structure, function and evolution. Delineating the domains in a protein is therefore important for classification and for understanding structure, function and evolution. Domains within a polypeptide chain can be delineated in several ways, notably at the genome scale, but the task remains problematic in many instances. The domain content of a sequence is difficult to identify when the query has no homologues of experimentally determined structure and searches against sequence domain databases return only insignificant matches. Identifying domains under low sequence identity and in the absence of structural homologues is therefore of crucial importance, especially at the genomic scale.
Year: 2010 PMID: 20384986 PMCID: PMC2865477 DOI: 10.1186/1756-0500-3-98
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Methodology of the approach and statistics. Initially, 726 protein sequences from the Mycoplasma gallisepticum genome were considered; these contain 620 unassigned regions of different lengths, of which 434 are at least 70 residues long. Of these 434, 364 passed transmembrane and coiled-coil filtering, and 359 remained after secondary structure filtering. The remaining 359 unassigned regions were subjected to PSI-BLAST searches, but only 230 picked up at least two hits. Full-length sequences were extracted for each PSI-BLAST hit and used for HMMpfam searches. Here again, only 62 unassigned regions were associated indirectly with pre-existing domains, which correspond to 48 different domain families.
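The first two steps in Figure 1 (locating unassigned regions and keeping those of at least 70 residues) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the 1-based inclusive coordinate convention and the example coordinates are assumptions. Only the 70-residue cutoff comes from the caption.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # assumed convention: 1-based, inclusive residue range

MIN_LENGTH = 70  # length cutoff stated in the Figure 1 caption


def unassigned_regions(seq_len: int, assigned: List[Range]) -> List[Range]:
    """Return the stretches of 1..seq_len not covered by any assigned domain."""
    gaps = []
    cursor = 1  # first residue not yet accounted for
    for start, end in sorted(assigned):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)  # handles overlapping assignments
    if cursor <= seq_len:
        gaps.append((cursor, seq_len))
    return gaps


def long_enough(regions: List[Range], min_len: int = MIN_LENGTH) -> List[Range]:
    """Keep only regions spanning at least min_len residues."""
    return [(s, e) for s, e in regions if e - s + 1 >= min_len]


# Hypothetical 400-residue protein with two overlapping assigned domains.
gaps = unassigned_regions(400, [(81, 200), (190, 340)])
kept = long_enough(gaps)
```

In this hypothetical example the gaps are residues 1-80 and 341-400, and only the first (80 residues) passes the length filter. The later filters in Figure 1 (transmembrane, coiled-coil, secondary structure) depend on external predictors and are not sketched here.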
Newly predicted domains in the Mycoplasma gallisepticum genome.
| S.No | Domain Name | Description | Full | Partial |
|---|---|---|---|---|
| 1. | AAA | ATPase family associated with various cellular activities | NP_853502.1 = 9-182 | - |
| 2. | Anticodon_1 | Anticodon-binding domain. This domain is found in valyl and leucyl tRNA synthetases. It binds to the anticodon of the tRNA. | NP_852939.1 = 397-590 | - |
| 3. | ATP-synt_ab_C | ATP synthase alpha/beta chain, C terminal domain. | NP_853478.1 = 140-221 | - |
| 4. | ATP-synt_ab_N | ATP synthase alpha/beta family, beta-barrel domain | NP_853438.1 = 4-126 | - |
| 5. | BPD_transp_1 | Binding-protein-dependent transport system inner membrane component. | NP_853029.1 = 53-260 | - |
| 6. | CHASE-3 # | CHASE (Cyclases/Histidine kinases Associated Sensory Extracellular) domain, present in diverse receptor-like proteins with histidine kinase and nucleotide cyclase domains | NP_853387.1 = 55-582 | - |
| 7. | DUF-30 | Domain of Unknown Function 30 | NP_853479.1 = 370-770 | - |
| 8. | DUF-31 | Domain of Unknown Function 31 | NP_853440.1 = 220-317 | - |
| 9. | LMP $ | LMP repeated region. Found in the LMP group of surface-located membrane proteins of Mycoplasma hominis. | NP_853333.1 = 1260-1320, 1420-1580, 1600-1760 | - |
| 10. | Ferritin $ | Ferritin-like domain is one of the major non-haem iron storage proteins in animals, plants, and microorganisms | NP_852976.1 = 5-143 | - |
| 11. | GMP_synt_C $ | GMP synthase C terminal domain. | NP_852801.1 = 220-275 | - |
| 12. | Helicase_C | Helicase conserved C-terminal domain. Found in a wide variety of helicases and helicase related proteins. | NP_852813.1 = 440-530 | - |
| 13. | HGTP_anticodon | Anticodon binding domain. tRNA synthetases, or tRNA ligases are involved in protein synthesis. This domain is found in histidyl, glycyl, threonyl and prolyl tRNA synthetases. | NP_852966.1 = 342-423 | - |
| 14. | HHH # | The helix-hairpin-helix DNA-binding motif is found to be duplicated in the central domain of RuvA. | NP_853482.1 = 589-619 | - |
| 15. | HTH_11 # | Helix-turn-helix domain present in a wide variety of proteins. | NP_853136.1 = 28-73 | - |
| 16. | HTH_12 # | Ribonuclease R winged-helix domain. Found at the amino terminus of Ribonuclease R and a number of presumed transcriptional regulatory proteins from archaea. | NP_853240.1 = 38-89 | - |
| 17. | S1 | The S1 domain occurs in a wide range of RNA associated proteins. It is structurally similar to cold shock protein which binds nucleic acids. The S1 domain has an OB-fold structure. | NP_852895.1 = 140-210 | - |
| 18. | KH_1 | The K homology (KH) domain was first identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. It is a domain of around 70 amino acids that is present in a wide variety of quite diverse nucleic acid-binding proteins. | NP_852895.1 = 333-393 | - |
| 19. | Lactamase_B | Metallo-beta-lactamase superfamily. | NP_852865.1 = 40-248 | - |
| 20. | RMMBL | RNA-metabolising metallo-beta-lactamase. | NP_852865.1 = 320-360 | - |
| 21. | Lipoprotein_17 | Lipoprotein associated domain. | NP_852799.1 = 49-160 | - |
| 22. | MatE | Multi Antimicrobial Extrusion (MATE) family function as drug/sodium antiporters. | NP_853011.1 = 364-530 | - |
| 23. | Methyltransf_3 $ | O-methyltransferase | NP_852906.1 = 7-185 | - |
| 24. | MFS_1 $ | Major Facilitator Superfamily | NP_852970.1 = 480-922 | - |
| 25. | NusB $ | The NusB protein is involved in the regulation of rRNA biosynthesis by transcriptional antitermination. | NP_853291.1 = 13-130 | - |
| 26. | Peptidase_M23 $ | Peptidase family M23 | NP_853190.1 = 484-657 | - |
| 27. | PGM_PMM_IV $ | Phosphoglucomutase/phosphomannomutase, C-terminal domain | NP_853364.1 = 481-550 | - |
| 28. | PTS_EIIB | phosphotransferase system, EIIB | NP_853326.1 = 47-85 | - |
| 29. | SBP_bac_1 $ | Bacterial extracellular solute-binding protein | NP_852821.1 = 1-385 | NP_852814.1 = 6-181 |
| 30. | Sigma70_r1_1 # | Sigma-70 factor, region 1.1. | NP_853171.1 = 288-342 | - |
| 31. | Sigma70_r1_2 $ | Sigma-70 factor, region 1.2 | NP_853171.1 = 357-398 | - |
| 32. | Sigma70_r4_2 $ | Sigma-70, region 4 | NP_852863.1 = 120-170 | - |
| 33. | TGS $ | ThrRS, GTPase, and SpoT domain. | NP_852968.1 = 417-487 | - |
| 34. | THUMP $ | thiouridine synthases, methylases and PSUSs domain. | NP_853282.1 = 78-170 | - |
| 35. | Transketolase_C | The C-terminal domain of transketolase has been proposed as a regulatory molecule binding site | NP_852812.1 = 530-641 | - |
| 36. | tRNA_anti | OB-fold nucleic acid binding domain | NP_852876.1 = 230-310 | - |
| 37. | VapD_N # | Virulence-associated protein D | NP_853458.1 = 7-49 | - |
| 38. | Lipoprotein_X | Mycoplasma MG185/MG260 protein. | - | NP_852988.1 = 247-404 |
| 39. | Lipoprotein_10 | Putative mycoplasma lipoprotein, C-terminal region | NP_852988.1 = 444-563 | - |
| 40. | DEAD | Members of this family include the DEAD and DEAH box helicases. Helicases are involved in unwinding nucleic acids. | - | NP_852877.1 = 596-722 |
| 41. | ABC_membrane | ABC transporter transmembrane region. | - | NP_852786.1 = 2-126 |
| 42. | ABC_tran | ABC transporter | - | NP_853051.1 = 317-467 |
| 43. | DUF258 | Domain of Unknown Function 258 | - | NP_853404.1 = 7-104 |
| 44. | GTP_EFTU | | - | NP_853200.1 = 68-151 |
| 45. | RecO # | Recombination protein O | - | NP_853174.1 = 1-74 |
| 46. | SBP_bac_5 $ | | - | NP_853298.1 = 461-889 |
| 47. | Transposase_mut | Transposase, Mutator family | - | NP_852891.1 = 6-108 |
| 48. | HNH $ | HNH endonuclease | NP_853456.1 = 650-708 | - |
Among the probable new domains, some are identified for the first time in the Mycoplasma gallisepticum genome (indicated by $ or #). Of these unique domains, some are not present in any other Mycoplasma (indicated by #), while others are present in other Mycoplasmas (indicated by $). The second column gives the name of the domain, the third column a brief description of the domain, and the fourth and fifth columns the kind of association. An association is 'full' when at least 75% of the unassigned region is indirectly aligned with the domain region, and 'partial' when less than 75% of the unassigned region is indirectly aligned with the domain region. Under the Full and Partial columns, we give the protein id and the start and end residue numbers of the probable new domain.
# Unique domain; this domain has not so far been identified anywhere in the Mycoplasmataceae family
$ Unique domain but present in other Mycoplasmataceae family members
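The full/partial distinction in the table legend amounts to a coverage threshold on the unassigned region. The sketch below illustrates that rule under stated assumptions: the function name, the 1-based inclusive ranges and the treatment of exactly 75% as 'full' are choices made here, not details confirmed by the source.

```python
from typing import Tuple

Range = Tuple[int, int]  # assumed convention: 1-based, inclusive residue range


def association_kind(region: Range, aligned: Range, threshold: float = 0.75) -> str:
    """Classify an association as 'full' or 'partial'.

    region  -- the unassigned region
    aligned -- the part indirectly aligned with the domain
    The fraction is measured relative to the length of the unassigned region.
    """
    r_start, r_end = region
    a_start, a_end = aligned
    # Number of residues of the region that fall inside the aligned span.
    overlap = max(0, min(r_end, a_end) - max(r_start, a_start) + 1)
    fraction = overlap / (r_end - r_start + 1)
    return "full" if fraction >= threshold else "partial"


# Hypothetical coordinates, for illustration only.
print(association_kind((1, 100), (10, 90)))  # 81 of 100 residues aligned
print(association_kind((1, 100), (1, 50)))   # 50 of 100 residues aligned
```

The first call prints `full` and the second prints `partial`.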
Figure 2. Multiple sequence alignment of the unassigned region (NP_853398.1.242-329) and its homologues obtained in the PSI-BLAST search. The unassigned region is indicated by '*' and the consensus sequence is shown at the top of the alignment. Species names are given along with sequence ids. The highly conserved GGGIG motif is highlighted.
Figure 3. Phylogenetic tree of homologues obtained in the PSI-BLAST search. The domain architecture is shown at the top right. All the homologues have an identical domain architecture with an amino-terminal AsnA domain. The method used to derive the phylogenetic tree is described in Methods.
Figure 4. Multiple sequence alignment of the unassigned region NP_853462.1.1-265 (indicated by '*' in the alignment) and its homologues obtained in the PSI-BLAST search. The consensus sequence is shown at the top of the alignment and species names are given along with sequence ids.
Figure 5. Multiple sequence alignment of the unassigned region NP_852844.1.39-109 (indicated by '*' in the alignment) and its homologues obtained in the PSI-BLAST search. The consensus sequence is shown at the top of the alignment.
Comparison of PURE predicted domains with CDD predicted domains
| S.No | Seq id | PURE | CDD |
|---|---|---|---|
| 1 | NP_852935.1 | Anticodon_1 - 55-218 | MetG - 10-199 |
| 2 | NP_853333.1 | LMP - 1260-1320 | SbcC - 1253-1856 |
| 3 | NP_853467.1 | Helicase_C - 660-730 | Type I site-specific restriction-modification system- 2-1018 |
| 4 | NP_852865.1 | Lactamase_B - 40-248 | mRNA degradation ribonucleases-23-594 |
| 5 | NP_853011.1 | MatE - 364-530 | NorM - 127-558 |
| 6 | NP_852970.1 | MFS_1 - 480-992 | SecD - 359-574 |
| 7 | NP_853282.1 | THUMP - 78-170 | PseudoU_synth - 112-167 |
| 8 | NP_853404.1 | DUF258 - 7-104 | YlqF - 12-174 |
| 9 | NP_853458.1 | VapD_N | |
The second column gives the sequence id. The third column gives the PURE-predicted domain and the last column the CDD-predicted domain, each shown as the domain name followed by the start and end residues of the domain.
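To compare the PURE and CDD ranges in the table above, each cell can be parsed into a name and a residue range and the shared residues counted. The following is a minimal sketch: `parse_entry` and `overlap` are hypothetical helper names, and the "name - start-end" pattern is inferred from the table cells rather than from any documented format.

```python
import re
from typing import Tuple

# Matches cells such as "THUMP - 78-170" or "mRNA degradation ribonucleases-23-594".
ENTRY = re.compile(r"^(.*?)\s*-\s*(\d+)-(\d+)$")


def parse_entry(cell: str) -> Tuple[str, int, int]:
    """Split a table cell into (domain name, start residue, end residue)."""
    match = ENTRY.match(cell.strip())
    if match is None:
        raise ValueError(f"no residue range found in {cell!r}")
    name, start, end = match.groups()
    return name, int(start), int(end)


def overlap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Row 7 of the table: THUMP (PURE) against PseudoU_synth (CDD).
_, p_start, p_end = parse_entry("THUMP - 78-170")
_, c_start, c_end = parse_entry("PseudoU_synth - 112-167")
shared = overlap((p_start, p_end), (c_start, c_end))
```

For row 7, `shared` is 56 residues (112-167), so the CDD range lies entirely within the PURE range. Cells without a range, such as the VapD_N entry in row 9, raise `ValueError` in this sketch.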