Pier Luigi Buttigieg, Wolfgang Hankeln, Ivaylo Kostadinov, Renzo Kottmann, Pelin Yilmaz, Melissa Beth Duhaime, Frank Oliver Glöckner.
Abstract
BACKGROUND: The proportion of conserved DNA sequences with no clear function is steadily growing in bioinformatics databases. Studies of sequence and structural homology have indicated that many uncharacterized protein domain sequences are variants of functionally described domains. If these variants promote an organism's ecological fitness, they are likely to be conserved in the genome of its progeny and the population at large. The genetic composition of microbial communities in their native ecosystems is accessible through metagenomics. We hypothesize that the co-variation of protein domain sequences across metagenomes from similar ecosystems will provide insights into their potential roles and aid further investigation. METHODOLOGY/PRINCIPAL
Year: 2013 PMID: 23516388 PMCID: PMC3597751 DOI: 10.1371/journal.pone.0050869
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
DUFs* exhibiting bias in correlative associations to metabolic categories (unstandardized data).
| DUF | Primary category | Secondary category (fraction of primary category) |
| DUF59 | TransR | AA (0.44) |
| DUF3429 | | AA (0.44) |
| DUF805 | | AA (0.48) |
| DUF140 | | CoE (0.45) |
| DUF354 | | CoE (0.5) |
| DUF151 | | RRR (0.43) |
| DUF192 | | RRR (0.44) |
| DUF87 | | RRR (0.46) |
| DUF37 | | Transcr (0.50) |
| DUF1730 | | None |
| DUF2805 | | None |
| DUF3159 | AA | Carb, E, Lip, Nuc (0.50) |
| DUF521 | | CoE, E, TransR (0.50) |
| DUF1329 | | None |
| DUF208 | | None |
| DUF2141 | | None |
| DUF2899 | | None |
| DUF403 | | None |
| DUF407 | | None |
| DUF490 | CoE | CWME, E, Photo, Sec (0.50) |
| DUF1499 | | None |
| DUF1820 | | None |
| DUF92 | | None |
| DUF2062 | E | AA, Carb, CoE, RRR (0.50) |
| DUF137 | | None |
| DUF74 | | None |
| DUF2130 | Carb | CoE, Photo (0.50) |
| DUF897 | | None |
| DUF1445 | Lip | None |
| DUF501 | | None |
| DUF1732 | PostModChaps | Carb, Nuc (0.33) |
| DUF1385 | | None |
| DUF1008 | Ion | None |
| DUF88 | Nuc | None |
| DUF2334 | RRR | None |
| DUF429 | Sec | None |
| DUF1907 | Sig | None |
| DUF2779 | Transcr | CWME, PostModChaps (0.50) |
DUFs with primary connectivity to photobiology domains (n = 56) present in .
AA: Amino acid transport and metabolism.
Carb: Carbohydrate transport and metabolism.
CellDiv: Cell cycle control, cell division, chromosome partitioning.
CoE: Coenzyme transport and metabolism.
CWME: Cell wall, membrane, and envelope biogenesis.
Def: Defence mechanisms.
E: Energy production and conversion.
Ion: Inorganic ion transport and metabolism.
Lip: Lipid transport and metabolism.
Nuc: Nucleotide transport and metabolism.
Photo: Photobiology.
PostModChaps: Posttranslational modification, protein turnover, chaperones.
RRR: Replication, recombination, and repair.
Sec: Secondary metabolite biosynthesis, transport, and catabolism.
Sig: Signal transduction mechanisms.
Transcr: Transcription.
TransR: Translation, ribosomal structure and biogenesis.
Figure 1. Force-directed, spring-embedded network visualization of pairwise correlations between Pfam domain abundances across selected GOS metagenomes.
Nodes represent Pfam domains and edges correlations greater than a Spearman's rho of 0.80. Shorter edges indicate stronger correlations. A large network with two enmeshed regions (Box 1 and 2) bridged by a small number of nodes (Box 4) dominates the graph. Several small networks of functionally related nodes are also present (Box 5, inset). Node colors represent functional categories; refer to for description. See text for detailed descriptions.
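The edge rule in this caption (keep a pair of domains only when their Spearman's rho exceeds 0.80) can be illustrated with a minimal sketch. The domain names and abundance vectors below are hypothetical, and the pure-Python rank correlation is an assumption made for self-containment, not necessarily the software used in the study.

```python
from itertools import combinations

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def correlation_edges(abundances, threshold=0.80):
    """List (domain_a, domain_b, rho) for every pair with rho > threshold."""
    edges = []
    for a, b in combinations(sorted(abundances), 2):
        rho = spearman(abundances[a], abundances[b])
        if rho > threshold:
            edges.append((a, b, rho))
    return edges

# Hypothetical abundances of three domains across five metagenomes.
abundances = {
    "DUF_A": [1, 2, 3, 4, 5],
    "Pfam_B": [2, 4, 5, 9, 20],
    "Pfam_C": [5, 3, 4, 1, 2],
}
edges = correlation_edges(abundances)
```

In this toy input only the DUF_A and Pfam_B pair passes the threshold; the resulting edge list is what a force-directed layout would then draw.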
Pfam domains contained in a prominent spoke of the UM-derived association network (Figure 1, Box 3).
| Category | Pfam ID | Pfam comment (abridged) |
| CWME | LpxD | UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase catalyses an early step in lipid A biosynthesis. Members of this family also contain a hexapeptide repeat (Pfam:PF00132). This family constitutes the non-repeating region of LPXD proteins. |
| CoE | Porphobil_deamC | – |
| NA | DUF2805 | This is a bacterial family of proteins with unknown function. |
| | DUF37 | This domain is found in short (70 amino acid) hypothetical proteins from various bacteria. The domain contains three conserved cysteine residues. Swiss:Q44066 from Aeromonas hydrophila has been found to have hemolytic activity (unpublished). |
| PostModChaps | NifU | This is an alignment of the carboxy-terminal domain. This is the only common region between the NifU protein from nitrogen-fixing bacteria and rhodobacterial species. The biochemical function of NifU is unknown. |
| | SmpB | – |
| | Bac_DnaA_C | – |
| | DNA_gyraseB_C | The amino terminus of eukaryotic and prokaryotic DNA topoisomerase II are similar, but they have a different carboxyl terminus. The amino-terminal portion of the DNA gyrase B protein is thought to catalyse the ATP-dependent super-coiling of DNA. See Pfam:PF00204. The carboxyl-terminal end supports the complexation with the DNA gyrase A protein and the ATP-independent relaxation. This family also contains Topoisomerase IV. This is a bacterial enzyme that is closely related to DNA gyrase. |
| Transcr | RNA_pol_Rpb2_3 | RNA polymerases catalyse the DNA dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 3 is also known as the fork domain and is proximal to the catalytic site. |
| | Sigma70_r2 | Region 2 of sigma-70 is the most conserved region of the entire protein. All members of this class of sigma-factor contain region 2. The high conservation is due to region 2 containing both the −10 promoter recognition helix and the primary core RNA polymerase binding determinant. The core binding helix interacts with the clamp domain of the largest polymerase subunit, beta prime. The aromatic residues of the recognition helix, found at the C-terminus of this domain, are thought to mediate strand separation, thereby allowing transcription initiation. |
| TransR | B5 | This domain is found in phenylalanine-tRNA synthetase beta subunits. |
| | Glu-tRNAGln | This is a family of Glu-tRNAGln amidotransferase C subunits. The Glu-tRNA Gln amidotransferase enzyme itself is an important translational fidelity mechanism replacing incorrectly charged Glu-tRNAGln with the correct Gln-tRNAGln via transamidation of the misacylated Glu-tRNAGln. This activity supplements the lack of glutaminyl-tRNA synthetase activity in gram-positive eubacteria, cyanobacteria, Archaea, and organelles. |
| | Phe_tRNA-synt_N | – |
| | Ribosomal_L11 | – |
| | Ribosomal_L12 | – |
| | Ribosomal_S17 | – |
| | Ribosomal_S5 | – |
| | RNase_PH_C | This family includes 3′-5′ exoribonucleases. Ribonuclease PH contains a single copy of this domain, and removes nucleotide residues following the -CCA terminus of tRNA. Polyribonucleotide nucleotidyltransferase (PNPase) contains two tandem copies of the domain. PNPase is involved in mRNA degradation in a 3′-5′ direction. The exosome is a 3′-5′ exoribonuclease complex that is required for 3′ processing of the 5.8S rRNA. Three of its five protein components, Swiss:P46948 Swiss:Q12277 and Swiss:P25359 contain a copy of this domain. Swiss:Q10205, a hypothetical protein from S. pombe appears to belong to an uncharacterised subfamily. This subfamily is found in both eukaryotes and archaebacteria. |
Refer to Table 1, footnote for list of abbreviations.
Figure 2. Force-directed, spring-embedded network visualization of pairwise correlations between standardized Pfam domain abundances across selected GOS metagenomes.
Abundances were standardized by site maxima. Nodes represent Pfam domains and edges correlations greater than a Spearman's rho of 0.80. Shorter edges indicate stronger correlations. The largest network (i) is dominated by DUFs and domains linked to photobiological processes. Domains linked to urea metabolism were also present in this network (hollow arrowheads). Smaller networks featured domains linked to translation (ii), phosphonate metabolism (iii), and cyanophage activity (iv). Numerous pairs of functionally related domains were also present. Node colors represent functional categories; refer to for description. See text for detailed descriptions.
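Standardization by site maxima can be sketched as follows, assuming it means dividing every domain abundance at a site by the largest abundance observed at that site. The site names, domain names, and counts are hypothetical, and the function name is an illustrative choice rather than anything taken from the study.

```python
def standardize_by_site_maxima(table):
    """Scale each site's abundances by that site's maximum abundance.

    table maps site -> {domain: abundance}. Sites whose maximum is 0
    are returned as all zeros to avoid dividing by zero.
    """
    standardized = {}
    for site, counts in table.items():
        site_max = max(counts.values(), default=0)
        standardized[site] = {
            domain: (count / site_max if site_max else 0.0)
            for domain, count in counts.items()
        }
    return standardized

# Hypothetical raw abundances at two sites.
table = {
    "site_1": {"DUF_A": 10, "Pfam_B": 40},
    "site_2": {"DUF_A": 3, "Pfam_B": 6},
}
std = standardize_by_site_maxima(table)
```

After this scaling each site's most abundant domain has the value 1.0, so sites with very different sequencing depth contribute on a comparable scale before correlations are computed.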
Figure 3. Transitivity clusters derived from correlations of Pfam abundances across selected GOS metagenomes (unstandardized data).
Edge-weights (correlations) determine the cost of adding or removing edges during clustering. We observed clusters with domains linked to photobiology; oligotrophic adaptations; DNA maintenance and repair; and iron supply. Node colors represent functional categories; refer to for description. See text and , , and for detailed descriptions.
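The role of edge weights as editing costs can be made concrete with a small sketch. It assumes the usual weighted cluster-editing formulation: a pair inside a cluster whose similarity falls below a threshold costs the shortfall (an edge must be added), and a pair split across clusters whose similarity exceeds the threshold costs the excess (an edge must be removed). The threshold, similarities, and candidate clusterings are hypothetical, and this scores given partitions only; it does not search for the optimal one.

```python
from itertools import combinations

def editing_cost(similarity, clusters, threshold):
    """Cost of turning the thresholded graph into the given disjoint cliques."""
    label = {node: i for i, members in enumerate(clusters) for node in members}
    cost = 0.0
    for u, v in combinations(sorted(label), 2):
        # Missing pairs are treated as having similarity 0.0.
        s = similarity.get((u, v), similarity.get((v, u), 0.0))
        same_cluster = label[u] == label[v]
        if same_cluster and s < threshold:
            cost += threshold - s  # edge would have to be added
        elif not same_cluster and s > threshold:
            cost += s - threshold  # edge would have to be removed
    return cost

# Hypothetical correlations between four domains.
similarity = {
    ("A", "B"): 0.95,
    ("B", "C"): 0.90,
    ("A", "C"): 0.70,
    ("C", "D"): 0.85,
}
threshold = 0.80
cost_one = editing_cost(similarity, [{"A", "B", "C"}, {"D"}], threshold)
cost_two = editing_cost(similarity, [{"A", "B"}, {"C", "D"}], threshold)
```

Here the second partition is cheaper (about 0.10 versus 0.15), so under these assumptions it would be the preferred clustering of the two candidates.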
Figure 4. Transitivity clusters derived from correlative associations of Pfam domains across GOS metagenomes (standardized data).
Edge-weights (correlations) determine the cost of adding or removing edges during clustering. The largest cluster contained DUFs and domains linked to photobiology (). Node colors represent functional categories; refer to for description. See text for detailed descriptions.
Pfam domains contained in transitivity clusters putatively linked to nutrient-limitation (unstandardized data).
| Cluster | Category | Pfam ID | Pfam Comment (abridged) |
| TC4 | AA | Alliinase_C | Allicin is a thiosulphinate that gives rise to dithiines, allyl sulphides and ajoenes, the three groups of active compounds in Allium species. Allicin is synthesised from sulfoxide cysteine derivatives by alliinase, whose C-S lyase activity cleaves C(beta)-S(gamma) bonds. It is thought that this enzyme forms part of a primitive plant defence system. |
| | | CM_2 | Chorismate mutase catalyses the conversion of chorismate to prephenate in the pathway of tyrosine and phenylalanine biosynthesis. This enzyme is negatively regulated by tyrosine, tryptophan and phenylalanine. |
| | CoE | MoaC | Members of this family are involved in molybdenum cofactor biosynthesis. However, their molecular function is not known. |
| | | Mob_synth_C | This region contains two iron-sulphur (3Fe-4S) binding sites. |
| | | ThiS | ThiS (thiaminS) is a 66 aa protein involved in sulphur transfer. Thiocarboxylate is formed at the last G in the activation process. Sulphur is transferred from ThiI to ThiS in a reaction catalysed by IscS. MoaD, Swiss:P30748, a protein involved in sulphur transfer in molybdopterin synthesis, is about the same length and shows limited sequence similarity to ThiS. |
| | E | FdhD-NarQ | Nitrate assimilation protein, NarQ, and FdhD are required for formate dehydrogenase activity. |
| | Ion | FMO-like | This family includes FMO proteins, cyclohexanone monooxygenase Swiss:P12015, and Swiss:Q10532. |
| | NA | DUF3108 | This bacterial family of proteins has no known function. |
| | | DUF328 | Members of this family are functionally uncharacterised. They are about 250 amino acids in length. |
| | | DUF3501 | This family of proteins is functionally uncharacterised. This protein is found in bacteria and archaea. |
| | Nuc | Ureidogly_hydro | Ureidoglycolate hydrolase carries out the third step in the degradation of allantoin. |
| | Transcr | NIF | This family contains a number of NLI interacting factor isoforms and also N-terminal regions of RNA polymerase II CTD phosphatase and FCP1 serine phosphatase. This region has been identified as the minimal phosphatase domain. |
| | | Sigma70_ner | The domain is found in the primary vegetative sigma factor. The function of this domain is unclear, and it can be removed without loss of function. |
| TC12 | NA | DUF2155 | This domain, found in various hypothetical prokaryotic proteins, has no known function. |
| | | DUF3047 | This bacterial family of proteins has no known function. |
| | Nuc | Allantoicase | These proteins allow the use of purines as secondary nitrogen sources in nitrogen-limiting conditions. |
| | PostModChaps | DS | Eukaryotic initiation factor 5A (eIF-5A) contains an unusual amino acid, hypusine. The first step in the post-translational formation of hypusine is catalysed by the enzyme deoxyhypusine synthase (DS). The modified version of eIF-5A, and DS, are required for eukaryotic cell proliferation. |
| | Sig | HPP | These proteins are integral membrane proteins with four transmembrane spanning helices. The most conserved region of the alignment is a motif HPP. The function of these proteins is uncertain but they may be transporters. |
| TC19 | CWME | OstA_C | Family involved in organic solvent tolerance in bacteria. |
| | PostModChaps | GlnD_UR_UTase | This is a family of bifunctional uridylyl-removing enzymes/uridylyltransferases (UR/UTases, GlnD) that are responsible for the modification of the regulatory protein P-II, or GlnB. In response to nitrogen limitation, these transferases catalyse the uridylylation of the PII protein, which in turn stimulates deadenylylation of glutamine synthetase (GlnA). Moreover, uridylylated PII can act together with NtrB and NtrC to increase transcription of genes in the sigma54 regulon, which include glnA and other nitrogen-level controlled genes. It has also been suggested that the product of the glnD gene is involved in other physiological functions such as control of iron metabolism in certain species. |
| | Transcr | IclR | This family of bacterial transcriptional regulators includes the glycerol operon regulatory protein and acetate operon repressor, both of which are members of the iclR family. However, this family covers the C-terminal region that may bind to the regulatory substrate (unpublished observation, Bateman A.). |
| | | FCD | This domain is the C-terminal ligand binding domain of many members of the GntR family. This domain probably binds to a range of effector molecules that regulate the transcription of genes through the action of the N-terminal DNA-binding domain. This domain is found in Swiss:P45427 and Swiss:P31460 that are regulators of sugar biosynthesis operons. |
Refer to Table 1, footnote for list of abbreviations.
Pfam domains contained in transitivity clusters putatively linked to DNA maintenance and repair (unstandardized data).
| Cluster | Category | Pfam ID | Pfam Comment |
| TC5 | AA | ELFV_dehydrog_N | – |
| | | Pro_dh | – |
| | CoE | DHFR_1 | – |
| | | ApbE | This prokaryotic family of lipoproteins is related to ApbE from Salmonella typhimurium. ApbE is involved in thiamine synthesis. More specifically, it may be involved in the conversion of aminoimidazole ribotide (AIR) to 4-amino-5-hydroxymethyl-2-methyl pyrimidine (HMP). |
| | NA | DUF1800 | This is a family of large bacterial proteins of unknown function. |
| | | DUF177 | – |
| | Nuc | ASL_C | This domain is found at the C-terminus of adenylosuccinate lyase (ASL; PurB in E. coli). It has been identified in bacteria, eukaryotes and archaea and is found together with the lyase domain Pfam:PF00206. ASL catalyses the cleavage of succinylaminoimidazole carboxamide ribotide to aminoimidazole carboxamide ribotide and fumarate and the cleavage of adenylosuccinate to adenylate and fumarate. |
| | | Thymidylat_synt | Swiss:P28176 is not included as a member of this family. Although annotated as such, there is no significant sequence similarity to other members. |
| | TransR | tRNA-synt_1c_C | Other tRNA synthetase sub-families are too dissimilar to be included. This family includes only glutamyl and glutaminyl tRNA synthetases. In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln). |
| TC39 | E | Pyrophosphatase | – |
| | NA | DUF836 | – |
| | RRR | Exonuc_V_gamma | The Exodeoxyribonuclease V enzyme is a multi-subunit enzyme comprised of the proteins RecB, RecC (this family) and RecD. This enzyme plays an important role in homologous genetic recombination, repair of double strand DNA breaks, resistance to UV irradiation and chemical DNA-damage. The enzyme (EC:3.1.11.5) catalyses ssDNA or dsDNA-dependent ATP hydrolysis, hydrolysis of ssDNA or dsDNA and unwinding of dsDNA. This family consists of two AAA domains. |
Refer to Table 1, footnote for list of abbreviations.
Pfam domains contained in transitivity clusters putatively linked to iron supply and utilization (unstandardized data).
| Cluster | Category | Pfam ID | Pfam Comment |
| TC23 | CWME | TonB | – |
| | Ion | Sod_Fe_N | Superoxide dismutases (SODs) catalyse the conversion of superoxide radicals to hydrogen peroxide and molecular oxygen. Three evolutionarily distinct families of SODs are known, of which the Mn/Fe-binding family is one. |
| | | Sod_Fe_C | Superoxide dismutases (SODs) catalyse the conversion of superoxide radicals to hydrogen peroxide and molecular oxygen. Three evolutionarily distinct families of SODs are known, of which the Mn/Fe-binding family is one. |
| | NA | DUF255 | – |
| TC24 | E | POR | This family includes a region of the large protein pyruvate-flavodoxin oxidoreductase and the whole pyruvate ferredoxin oxidoreductase gamma subunit protein. It is not known whether the gamma subunit has a catalytic or regulatory role. Pyruvate oxidoreductase (POR) catalyses the final step in the fermentation of carbohydrates in anaerobic microorganisms. This involves the oxidative decarboxylation of pyruvate with the participation of thiamine followed by the transfer of an acetyl moiety to coenzyme A for the synthesis of acetyl-CoA. The family also includes pyruvate flavodoxin oxidoreductase as encoded by the nifJ gene in cyanobacterium which is required for growth on molecular nitrogen when iron is limited. |
| | | POR_N | This family includes the N terminal structural domain of the pyruvate ferredoxin oxidoreductase. This domain binds thiamine diphosphate, and along with domains II and IV, is involved in inter subunit contacts. The family also includes pyruvate flavodoxin oxidoreductase as encoded by the nifJ gene in cyanobacterium which is required for growth on molecular nitrogen when iron is limited. |
| | Ion | Peripla_BP_2 | This family includes bacterial periplasmic binding proteins, several of which are involved in iron transport. |
| | NA | DUF58 | This family of prokaryotic proteins has no known function. Swiss:P71138, a protein of unknown function in the family, has been misannotated as alpha-dextrin 6-glucanohydrolase. |
Refer to Table 1, footnote for list of abbreviations.
Phylum-level* taxonomic distribution of DUFs in selected transitivity clusters (unstandardized data).
| Sample | n(DUF) | Phylum | Instances | % of total instances |
| | 225 | | 73177 | 47.83 |
| | | | 26904 | 17.59 |
| | | | 15128 | 9.89 |
| | | | 7798 | 5.10 |
| | 30 | | 21877 | 46.11 |
| | | | 10783 | 22.73 |
| | | | 4646 | 9.79 |
| | 28 | | 2206 | 54.48 |
| | | | 820 | 20.25 |
| | | | 588 | 14.52 |
| | | | 240 | 5.93 |
| | 8 | | 2199 | 77.32 |
| | | | 202 | 7.10 |
| | | | 145 | 5.10 |
| | 3 | | 1590 | 62.62 |
| | | | 363 | 14.30 |
| | | | 289 | 11.38 |
| | | | 160 | 6.30 |
| | 5 | | 362 | 64.07 |
| | | | 155 | 27.43 |
| | 3 | | 2027 | 35.42 |
| | | | 1140 | 19.92 |
| | | | 1019 | 17.81 |
| | | | 971 | 16.97 |
| | 2 | | 1021 | 98.74 |
| | 2 | | 317 | 94.63 |
| | 4 | | 266 | 92.68 |
| | | | 15 | 5.23 |
| | 2 | | 355 | 64.20 |
| | | | 163 | 29.48 |
| | | | 30 | 5.42 |
| | 2 | | 1614 | 79.16 |
| | | | 120 | 5.89 |
| | 3 | | 1363 | 67.24 |
| | | | 303 | 14.95 |
| | | | 126 | 6.22 |
| | 4 | | 1708 | 71.73 |
| | | | 218 | 9.16 |
| | | | 159 | 6.68 |
| | | | 143 | 6.01 |
| | 3 | | 1480 | 76.21 |
| | | | 202 | 10.40 |
| | | | 193 | 9.94 |
*Only phyla with >5% of DUF instances are shown.
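The last column and the 5% cut-off in this table follow from simple arithmetic, sketched below under the assumption that the percentage is each phylum's instance count divided by the total count for that cluster's DUFs. The phylum labels and counts are hypothetical placeholders, not values from the study.

```python
def phylum_shares(instances, min_percent=5.0):
    """Return {phylum: percent of total}, keeping only shares above min_percent.

    Percentages are rounded to two decimals and sorted in descending order.
    """
    total = sum(instances.values())
    shares = {phylum: 100.0 * n / total for phylum, n in instances.items()}
    kept = {p: round(s, 2) for p, s in shares.items() if s > min_percent}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))

# Hypothetical instance counts for the DUFs of one cluster.
counts = {"phylum_a": 600, "phylum_b": 300, "phylum_c": 60, "phylum_d": 40}
shown = phylum_shares(counts)
```

With these counts, phylum_d (4% of instances) falls below the cut-off and is omitted, which is why the listed percentages for a cluster need not sum to 100.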
Phylum-level* taxonomic distribution of DUFs in selected transitivity clusters (standardized data).
| Sample | n(DUF) | Phylum | Instances | % of total instances |
| | 75 | | 5575 | 32.31 |
| | | | 4845 | 28.08 |
| | | | 1946 | 11.28 |
| | | | 1505 | 8.72 |
| | | | 1026 | 5.95 |
| | 28 | | 2290 | 54.48 |
| | | | 760 | 18.08 |
| | | | 600 | 14.28 |
| | | | 244 | 5.81 |
| | 6 | | 338 | 66.14 |
| | | | 73 | 14.29 |
| | | | 52 | 10.18 |
| | | | 29 | 5.68 |
| | 2 | | 125 | 65.79 |
| | | | 30 | 15.79 |
| | | | 24 | 12.63 |
| | 2 | | 109 | 37.59 |
| | | | 91 | 31.38 |
| | | | 77 | 26.55 |
| | 2 | | 109 | 37.59 |
| | | | 91 | 31.38 |
| | | | 77 | 26.55 |
| | 3 | | 199 | 70.32 |
| | | | 41 | 14.49 |
| | | | 19 | 6.71 |
| | 2 | | 89 | 28.34 |
| | | | 69 | 21.97 |
| | | | 63 | 20.06 |
| | | | 34 | 10.83 |
| | | | 24 | 7.64 |
| | 2 | | 121 | 51.05 |
| | | | 71 | 29.96 |
| | | | 24 | 10.13 |
| | 2 | | 113 | 39.37 |
| | | | 53 | 18.47 |
| | | | 42 | 14.63 |
| | | | 30 | 10.45 |
| | | | 16 | 5.57 |
*Only phyla with >5% of DUF instances are shown.