
The Great Oxidation Event expanded the genetic repertoire of arsenic metabolism and cycling.

Song-Can Chen1,2,3, Guo-Xin Sun1, Yu Yan4, Konstantinos T Konstantinidis5,6, Si-Yu Zhang5, Ye Deng1, Xiao-Min Li1,3, Hui-Ling Cui1,3, Florin Musat2, Denny Popp7, Barry P Rosen8, Yong-Guan Zhu9,10.   

Abstract

The rise of oxygen on the early Earth about 2.4 billion years ago reorganized the redox cycle of harmful metal(loids), including that of arsenic, which doubtlessly imposed substantial barriers to the physiology and diversification of life. Evaluating the adaptive biological responses to these environmental challenges is inherently difficult because of the paucity of fossil records. Here we applied molecular clock analyses to 13 gene families participating in principal pathways of arsenic resistance and cycling, to explore the nature of early arsenic biogeocycles and decipher feedbacks associated with planetary oxygenation. Our results reveal the advent of nascent arsenic resistance systems under the anoxic environment predating the Great Oxidation Event (GOE), with the primary function of detoxifying reduced arsenic compounds that were abundant in Archean environments. To cope with the increased toxicity of oxidized arsenic species that occurred as oxygen built up in Earth's atmosphere, we found that parts of preexisting detoxification systems for trivalent arsenicals were merged with newly emerged pathways that originated via convergent evolution. Further expansion of arsenic resistance systems was made feasible by incorporation of oxygen-dependent enzymatic pathways into the detoxification network. These genetic innovations, together with adaptive responses to other redox-sensitive metals, provided organisms with novel mechanisms for adaption to changes in global biogeocycles that emerged as a consequence of the GOE.
Copyright © 2020 the Author(s). Published by PNAS.

Keywords:  arsenic; biogeochemistry; detoxification; evolution; oxygen

Year:  2020        PMID: 32350143      PMCID: PMC7229686          DOI: 10.1073/pnas.2001063117

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


One of life’s earliest challenges was coping with the toxicity of harmful metal(loids) (1). Understanding the nature and timing of the onset of protective mechanisms is essential for the study of the early evolution of Earth and life, yet limited information is available. Arsenic is the most ubiquitous toxic metalloid in nature, with two biologically relevant oxidation states: trivalent arsenite and pentavalent arsenate. Arsenite is generally more toxic than arsenate, and perturbs the physiology of prokaryotes at micromolar levels (2, 3). Relatively high amounts (>20 μM) of dissolved arsenic are nowadays frequently found in oceanic hydrothermal vents or hot springs, environments that may have conditions analogous to similar niches of primordial Earth. For this reason, resistance pathways for transport and biotransformation of arsenic are believed to have emerged early in the evolution of life on Earth (4–6). Environmentally, the rise of atmospheric oxygen during the Great Oxidation Event (GOE) ∼2.4 billion years ago (Bya) is thought to have fundamentally changed arsenic chemistry on the Earth’s surface and in the oceans (2, 7). Prior to the GOE, reduced arsenic species (i.e., arsenite) would have predominated over oxidized arsenic species (i.e., arsenate) because the atmosphere and oceans were anoxic and reducing (4, 6, 8). Continental weathering of arsenic at this time would have been negligible under an atmosphere with very low oxygen levels (<<0.001% compared with present atmospheric level) (9). The rise of atmospheric oxygen (∼1% of present atmospheric levels) during the GOE between 2.4 and 2.3 Bya most likely led to intense oxidative weathering of arsenic-bearing minerals that liberated continental arsenic, predominantly as arsenate, for delivery to oceans from rivers (3, 10). These processes would have resulted in the widespread appearance of oxidized arsenic species in the environment.
We hypothesized that these dramatic shifts in the redox state of arsenicals and their bioavailability imposed a strong selective pressure on ancient microorganisms toward acquisition of novel enzymatic systems conferring arsenic resistance. Current microbial fossil records lack the power to resolve the timing and causes of the origin of these tolerance and detoxification mechanisms. Molecular and genetic studies have identified many arsenic resistance (ars) genes in extant organisms. These include efflux permeases, redox enzymes, methyltransferases, and transcriptional repressors. Arsenite efflux is catalyzed by two evolutionarily unrelated groups of arsenite efflux permeases: ArsB and Acr3 (11). Arsenate detoxification is catalyzed by reductases with homology to the glutaredoxin family (ArsC1) or to low-molecular-weight phosphatases (ArsC2), or by members of the CDC25 family of dual-specific phosphatases (Acr2) (12). These enzymes reduce intracellular arsenate to arsenite, the substrate of the two arsenite efflux permeases. Additionally, arsenite can be methylated by ArsM, an arsenite S-adenosylmethionine (SAM) methyltransferase, to the more toxic species methylarsenite and dimethylarsenite. In air, these are oxidized nonenzymatically to the largely nontoxic pentavalent species. However, methylarsenite can also be detoxified by active extrusion from cells catalyzed by the methylarsenite-specific efflux permease ArsP (13), oxidation to methylarsenate by the methylarsenite-specific oxidase ArsH (14, 15), or demethylation to less toxic arsenite by the ArsI C-As lyase that cleaves the carbon–arsenic bond in methylarsenite (16). Arsenic resistance genes are usually organized in ars operons, which are nearly always under control of an ArsR transcriptional repressor.
Four different ArsRs, each with its arsenical binding site located at a different place in the protein structure, have been described, with three (ArsR1, ArsR2, and ArsR3) regulated selectively by arsenite (17) and one (ArsR4) by methylarsenite (18). Here, we estimate the geological birth date of 13 arsenic resistance genes in relation to the GOE, using molecular clock analyses. The detailed evolutionary histories for each gene family were reconstructed by comparing their gene phylogenies with the phylogeny of organisms (the tree of life) under an explicit model of macroevolutionary events including gene birth, transfer, duplication, and loss. The occurrence of each arsenic detoxification gene was examined with respect to the taxonomy and physiology of the host microorganisms to provide independent evidence for our molecular dating analysis.

Results

Phylogenetic Distribution of Arsenic Detoxification Genes.

Protein sequences of the 13 arsenic resistance genes were acquired from genomes of 645 bacteria, 88 archaea, and 53 eukaryotes, representative of phylogenetic diversity across the three domains of life (19). The presence/absence of arsenic resistance genes in each of the sampled taxa was collapsed at the phylum level and plotted against a reference tree reconstructed from a concatenated alignment of 16 ribosomal proteins (Fig. 1). The distinct phyletic patterns divide the 13 genes into three sets (A-C). Genes in set A, including arsM, acr3, arsC2, arsP, and arsR1, are widely distributed among major lineages of bacteria, archaea, and/or eukaryotes, whereas set B comprises seven genes (arsI, arsB, arsR3, arsH, arsC1, arsR2, and arsR4) that are found mostly in aerobes and are more sparsely distributed than those in set A. Set C comprises a single gene, acr2, with homologs detected only in eukaryotes. The descent patterns suggest that the genes in set A may have emerged as the earliest arsenic detoxification systems, followed by those in sets B and C. However, promiscuous horizontal gene transfer (HGT) of arsenic resistance genes across species (20, 21), as exemplified by apparent incongruence between individual gene phylogenies and the organism backbone, obscures our ability to order these genes along the geological timeline from phyletic patterns alone (22).
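The phylum-level collapse of presence/absence calls described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `phylum_prevalence`, the input layout, and the toy taxa are hypothetical and are not taken from the study's data or code.

```python
from collections import defaultdict

def phylum_prevalence(taxa):
    """Collapse per-genome gene presence to per-phylum fractions.

    taxa: iterable of (phylum, set_of_genes_present) pairs, one per genome.
    Returns {phylum: {gene: fraction of that phylum's genomes carrying it}}.
    """
    totals = defaultdict(int)                       # genomes per phylum
    counts = defaultdict(lambda: defaultdict(int))  # carriers per phylum and gene
    for phylum, genes in taxa:
        totals[phylum] += 1
        for gene in genes:
            counts[phylum][gene] += 1
    return {
        phylum: {gene: n / totals[phylum] for gene, n in counts[phylum].items()}
        for phylum in totals
    }

# Hypothetical toy input (not real annotations).
toy_taxa = [
    ("Proteobacteria", {"arsM", "arsB", "arsH"}),
    ("Proteobacteria", {"arsB"}),
    ("Euryarchaeota", {"arsM", "acr3"}),
    ("Euryarchaeota", {"acr3"}),
]
prevalence = phylum_prevalence(toy_taxa)
```

Genes absent from every genome of a phylum are simply missing from that phylum's dictionary, which corresponds to an empty cell in a presence/absence plot.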
Fig. 1.

Phylogenetic distribution of 13 arsenic detoxification genes. (Left) Reference phylogenetic trees of major lineages of Bacteria, Archaea, and Eukaryotes. (Right) Relative abundance of 13 arsenic detoxification genes present within each major lineage. The 13 genes were divided into three sets (A-C) according to their phyletic distribution patterns. The reference phylogeny was reconstructed from a concatenated alignment of 16 ribosomal proteins, as previously reported (19). Divergence times and corresponding confidence intervals (95%) were estimated using PhyloBayes (analysis 7; Table 2). Timescale: Hd, Hadean; Ph, Phanerozoic; Ga, billions of years.
Table 2.

Birth Age of 13 arsenic resistance genes estimated under analytical scenarios 7 to 12

| Analysis | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|
| Model assumptions and calibrations | | | | | | |
| Rate model* | Uncorrelated | Uncorrelated | Uncorrelated | Uncorrelated | Uncorrelated | Uncorrelated |
| Calibration | Full set | Full set | −Cyanobacteria | −Rhodophyta | Full set | Full set |
| Root prior | U(3.35,4.38) | Γ(3.95;0.23) | U(3.35,4.38) | U(3.35,4.38) | U(3.35,4.38) | U(3.35,4.38) |
| Topology§ | ML | ML | ML | ML | MT | Three-domain tree |
| Gene age (Gyr) | | | | | | |
| arsM | 3.40 (3.23–3.61) | 3.37 (3.23–3.72) | 3.44 (3.03–3.68) | 3.40 (3.24–3.73) | 3.40 (3.24–3.72) | 3.45 (3.24–3.76) |
| acr3 | 2.79 (2.55–2.96) | 2.77 (2.59–3.05) | 2.86 (2.51–3.06) | 2.79 (2.56–3.04) | 2.78 (2.60–3.04) | 2.81 (2.59–3.04) |
| arsC2 | 2.39 (2.03–2.68) | 2.39 (2.03–2.76) | 2.45 (2.04–2.72) | 2.38 (2.01–2.76) | 2.40 (2.06–2.76) | 2.39 (2.07–2.74) |
| arsR1 | 2.47 (2.12–2.75) | 2.46 (2.09–2.86) | 2.53 (2.12–2.79) | 2.46 (2.06–2.84) | 2.47 (2.12–2.83) | 2.46 (2.17–2.82) |
| arsP | 2.47 (2.12–2.75) | 2.46 (2.09–2.86) | 2.53 (2.12–2.79) | 2.46 (2.06–2.84) | 2.47 (2.12–2.83) | 2.46 (2.17–2.82) |
| arsB | 1.99 (1.70–2.21) | 1.99 (1.73–2.34) | 2.04 (1.71–2.30) | 1.98 (1.68–2.30) | 2.00 (1.73–2.30) | 2.01 (1.75–2.28) |
| arsI | 1.36 (0.84–2.02) | 1.40 (0.84–2.04) | 1.42 (0.79–2.01) | 1.37 (0.78–2.00) | 1.39 (0.82–2.15) | 1.38 (0.79–2.05) |
| arsH | 1.70 (1.61–1.81) | 1.61 (1.43–1.82) | 1.70 (1.61–1.82) | 1.70 (1.61–1.84) | 1.58 (1.41–1.78) | 1.64 (1.43–1.83) |
| arsR2 | 1.63 (1.57–1.73) | 1.63 (1.56–1.76) | 1.63 (1.56–1.74) | 1.63 (1.57–1.77) | 1.63 (1.56–1.73) | 1.63 (1.57–1.75) |
| arsR3 | 1.53 (1.33–1.70) | 1.53 (1.34–1.74) | 1.57 (1.31–1.73) | 1.53 (1.37–1.73) | 1.69 (1.53–1.88) | 1.56 (1.36–1.77) |
| arsC1 | 1.31 (0.93–1.58) | 1.29 (1.02–1.59) | 1.35 (0.95–1.58) | 1.31 (1.00–1.59) | 1.30 (1.01–1.58) | 1.34 (1.02–1.62) |
| acr2 | 1.11 (0.92–1.29) | 1.11 (0.96–1.31) | 1.17 (0.95–1.32) | 1.09 (0.93–1.28) | 1.10 (0.94–1.28) | 1.18 (1.01–1.38) |
| arsR4 | 1.02 (0.89–1.14) | 1.02 (0.92–1.19) | 1.05 (0.89–1.17) | 1.02 (0.90–1.19) | 0.79 (0.65–0.94) | 1.05 (0.93–1.19) |

Autocorrelated, autocorrelated rate model; Uncorrelated, uncorrelated rate model.

−Cyanobacteria, subsampled calibration points without Cyanobacteria; −Rhodophyta, subsampled calibration points without Rhodophyta.

U, uniform distribution (lower, upper); Γ, gamma distribution (mean; SD).

ML, maximum likelihood tree of ribosomal proteins; MT: alternative topology reflecting minority bipartitions; Three-domain tree: tree topology where archaea and eukaryotes are sister group.

Median age estimates of gene birth nodes, with 95% confidence intervals in parentheses; Gyr, billion years.

Gene Birth Date of Arsenic Detoxification Genes.

To estimate the timing of the origin of the 13 arsenic resistance genes, we conducted a series of Bayesian molecular clock analyses, using a tree reconciliation algorithm that explicitly models HGT and generates gene birth dates by mapping each gene phylogeny onto a chronogram of species. We tested gene ages against chronograms modeled with an autocorrelated rate clock (analyses 1 to 6) and an independent (uncorrelated) rate clock (analyses 7 to 12). For each clock model, a set of six independent analyses was performed to evaluate the robustness of the results to prior assumptions of root age (analyses 2 and 8), subsampling of fossil calibrations (analyses 3, 4, 9, and 10), and alternative topologies (analyses 5, 6, 11, and 12). Median gene ages under the 12 analytical scenarios are shown in Tables 1 and 2, and the uncertainties associated with the results from all these analyses were integrated to provide a composite credibility interval for each gene family (Fig. 2). Although the timing of arsM and acr3 varied under different prior assumptions, all analyses consistently recovered 95% credibility intervals entirely within the Archean eon, suggesting that they originated before the GOE. For arsC2, arsR1, and arsP, we estimate that the median gene ages are before or at the beginning of the Paleoproterozoic period, with composite 95% confidence intervals overlapping with the GOE. In contrast, arsB, arsI, arsH, arsR2, arsR3, arsC1, acr2, and arsR4 are estimated to have evolved near the end of or significantly after the GOE. To assess the sensitivity of our results to alternative species topologies, we also reconciled gene families against 100 reference trees reconstructed from ribosomal proteins or small subunit ribosomal RNA (SSU rRNA). The results show only slight differences in estimates of gene ages, which further supports our initial interpretation of the data.
Overall, our analyses are consistent with an expansion of microbial arsenic resistance systems in response to the rise of atmospheric oxygen.
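As a rough illustration of how an age interval can be read against the GOE window, the sketch below labels a gene by whether its 95% interval lies entirely before, entirely after, or overlaps the 2.4 to 2.3 Bya window given in the Introduction. The rule and the function name `classify` are assumptions made for illustration; the study's composite intervals integrate over all 12 analyses, whereas the three intervals used here are copied from a single analysis (analysis 1, Table 1).

```python
# GOE window in billions of years ago (Bya), as stated in the Introduction.
GOE_START, GOE_END = 2.4, 2.3

def classify(lower, upper, goe_start=GOE_START, goe_end=GOE_END):
    """Label an age interval [lower, upper] (in Gyr) relative to the GOE window.

    Ages count backward in time, so a larger number means an older gene.
    """
    if lower > goe_start:   # even the youngest bound is older than the GOE
        return "before GOE"
    if upper < goe_end:     # even the oldest bound is younger than the GOE
        return "after GOE"
    return "overlaps GOE"

# 95% intervals from analysis 1 in Table 1 (Gyr).
intervals = {
    "arsM": (3.27, 3.72),
    "arsC2": (2.34, 2.89),
    "arsR4": (0.97, 1.26),
}
labels = {gene: classify(lo, hi) for gene, (lo, hi) in intervals.items()}
```

A strict overlap rule like this one is sensitive to the tails of wide intervals, which is one reason to look at medians and at agreement across analyses rather than a single interval.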
Table 1.

Birth Age of 13 arsenic resistance genes estimated under analytical scenarios 1 to 6

| Analysis | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Model assumptions and calibrations | | | | | | |
| Rate model* | Autocorrelated | Autocorrelated | Autocorrelated | Autocorrelated | Autocorrelated | Autocorrelated |
| Calibration | Full set | Full set | −Cyanobacteria | −Rhodophyta | Full set | Full set |
| Root prior | U(3.35,4.38) | Γ(3.95;0.23) | U(3.35,4.38) | U(3.35,4.38) | U(3.35,4.38) | U(3.35,4.38) |
| Topology§ | ML | ML | ML | ML | MT | Three-domain tree |
| Gene age (Gyr) | | | | | | |
| arsM | 3.55 (3.27–3.72) | 3.59 (3.31–3.79) | 3.69 (3.45–3.89) | 3.50 (3.28–3.76) | 3.57 (3.29–3.83) | 3.62 (3.40–3.86) |
| acr3 | 2.97 (2.71–3.09) | 3.10 (2.77–3.31) | 3.18 (2.87–3.38) | 2.99 (2.77–3.30) | 3.08 (2.78–3.34) | 3.00 (2.81–3.23) |
| arsC2 | 2.70 (2.34–2.89) | 2.74 (2.39–2.95) | 2.82 (2.44–2.98) | 2.64 (2.41–2.91) | 2.74 (2.42–3.01) | 2.73 (2.50–3.01) |
| arsR1 | 2.79 (2.45–2.97) | 2.83 (2.45–3.03) | 2.91 (2.55–3.10) | 2.73 (2.49–2.98) | 2.83 (2.52–3.09) | 2.82 (2.61–3.08) |
| arsP | 2.79 (2.45–2.97) | 2.83 (2.45–3.03) | 2.91 (2.55–3.10) | 2.73 (2.49–2.98) | 2.83 (2.52–3.09) | 2.82 (2.61–3.08) |
| arsB | 2.07 (1.57–2.36) | 2.10 (1.57–2.41) | 2.16 (1.58–2.46) | 2.03 (1.61–2.39) | 2.10 (1.60–2.47) | 2.10 (1.73–2.47) |
| arsI | 2.26 (1.78–2.47) | 2.27 (1.94–2.54) | 2.36 (1.99–2.61) | 2.19 (1.85–2.51) | 2.26 (1.90–2.62) | 2.28 (1.90–2.58) |
| arsH | 1.91 (1.79–2.04) | 1.92 (1.79–2.03) | 1.93 (1.80–2.05) | 1.90 (1.79–2.04) | 1.82 (1.59–2.00) | 1.91 (1.80–2.06) |
| arsR2 | 1.80 (1.70–1.91) | 1.81 (1.70–1.91) | 1.81 (1.70–1.92) | 1.79 (1.70–1.91) | 1.86 (1.50–2.14) | 1.80 (1.70–1.93) |
| arsR3 | 1.71 (1.48–1.84) | 1.71 (1.51–1.90) | 1.77 (1.55–1.93) | 1.63 (1.49–1.86) | 1.91 (1.69–2.09) | 1.71 (1.54–1.94) |
| arsC1 | 1.72 (1.48–1.85) | 1.73 (1.53–1.94) | 1.79 (1.57–1.92) | 1.64 (1.51–1.89) | 1.74 (1.54–1.97) | 1.72 (1.58–1.95) |
| acr2 | 0.97 (0.80–1.11) | 0.99 (0.80–1.16) | 1.02 (0.78–1.11) | 0.95 (0.79–1.13) | 1.00 (0.85–1.16) | 1.17 (0.99–1.40) |
| arsR4 | 1.14 (0.97–1.26) | 1.15 (0.98–1.31) | 1.18 (1.02–1.28) | 1.08 (0.97–1.26) | 1.00 (0.81–1.16) | 1.14 (1.02–1.32) |

Autocorrelated, autocorrelated rate model; Uncorrelated, uncorrelated rate model.

−Cyanobacteria, subsampled calibration points without Cyanobacteria; −Rhodophyta, subsampled calibration points without Rhodophyta.

U, uniform distribution (lower, upper); Γ, gamma distribution (mean; SD).

ML, maximum likelihood tree of ribosomal proteins; MT: alternative topology reflecting minority bipartitions; Three-domain tree: tree topology where archaea and eukaryotes are sister group.

Median age estimates of gene birth nodes, with 95% confidence intervals in parentheses; Gyr, billion years.

Fig. 2.

Gene birth date for each of 13 arsenic detoxification genes. Gene ages were derived from reconciliation results (circles), using fully dated species trees (n = 1200) sampled from 12 PhyloBayes analyses. The median age estimates under each analytical scenario (Tables 1 and 2) were shown as diamonds. The uncertainties associated with the results from all PhyloBayes analyses were integrated as 95% composite confidence intervals (whiskers of the boxplot). Age estimates of genes that evolved before, around, and after the GOE were shown in blue, yellow, and green, respectively. Atmospheric oxygen content throughout Earth’s history was overlaid on the gene ages (red line) (9). Right y axis, pO2, relative to the present atmospheric level (PAL); left y axis, gene names. Genes found in both anaerobes and aerobes, or only in aerobes, were denoted in blue and green, respectively (Fig. 3). Oxygen-dependent genes (arsI and arsH) were indicated by a star. AsIII, AsV, and MAsIII were used to delineate genes acting on inorganic arsenite, arsenate, or methylarsenite, respectively. Ga, billions of years.

Fig. 3.

Distribution of 13 arsenic detoxification genes among strict anaerobes and aerobes. Species were classified either as aerobes (including facultative anaerobes) or anaerobes based on their capability to use oxygen as a terminal electron acceptor. Each black tick indicated the presence of the corresponding gene in a taxon. Genes that evolved before or at the beginning of the GOE were denoted in blue, and those that evolved after it in green. Oxygen-dependent genes (arsI and arsH) were indicated with the star symbol.

Physiology Bears Out the Age of Arsenic Detoxification Genes.

We attempted to further validate these conclusions by analyzing the physiology of the host microorganisms. Organisms were classified either as aerobes (including facultative anaerobes) or anaerobes, based on their capability to utilize oxygen as a terminal electron acceptor. We found that all the genes predicted to originate in an oxic environment after the GOE are overrepresented in aerobes, but are nearly absent in strict anaerobes (Fig. 3). Furthermore, the genes predicted to have a more ancient origin were found among both anaerobes and aerobes, including the ancient lineages of methanogens and acetogens (Fig. 3). This implies an early origin of these genes in an anoxic or microaerobic environment before or at the beginning of the GOE. They dispersed into the oxic environment after the rise of oxygen, as predicted by our evolutionary model. To further probe the robustness of our predictions, we tested the correlation of arsenic resistance systems with the physiology of the host microorganisms on a more densely sampled set of taxa encompassing more than 2,000 species. We found similar patterns of gene distribution across anaerobes/aerobes, suggesting that our results are broadly conserved independent of taxonomic sampling.
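One simple way to put a number on "overrepresented in aerobes" is a one-sided hypergeometric (Fisher-type) enrichment test. The text does not say which statistic, if any, was used, so the sketch below is an assumption offered for illustration; the function name and all counts are hypothetical.

```python
from math import comb

def enrichment_p_value(k, carriers, aerobes, genomes):
    """One-sided hypergeometric tail probability P(X >= k).

    genomes  : total number of genomes examined
    aerobes  : how many of those genomes are aerobes
    carriers : how many genomes carry the gene
    k        : how many carriers are aerobes
    """
    upper = min(carriers, aerobes)
    # Count the ways to place i carriers among aerobes and the rest elsewhere.
    favourable = sum(
        comb(carriers, i) * comb(genomes - carriers, aerobes - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(genomes, aerobes)

# Hypothetical counts: 20 genomes, 10 aerobes, 8 carriers, all 8 in aerobes.
p = enrichment_p_value(k=8, carriers=8, aerobes=10, genomes=20)
```

A small tail probability indicates that the gene is found in aerobes more often than a random assignment of carriers would produce. It does not correct for shared ancestry among genomes, which a real analysis would need to consider.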

Discussion

Arsenic Detoxification Systems before the GOE.

Our molecular clock analyses indicate that enzymatic pathways acting on trivalent arsenite, including arsenite efflux and arsenite methylation, constituted the core of microbial arsenic resistance systems before the rise of atmospheric oxygen (Fig. 4). Our results are consistent with geochemical models that predict the predominance of reduced arsenic compounds in the anoxic Archean biosphere (2, 3, 6, 10). Formation of traces of arsenate in the Archean, creating a selective pressure before the GOE (6), could have occurred via microbially mediated arsenite oxidation processes such as anoxygenic photosynthesis (5) or nitrate-dependent respiration (23). Alternatively, arsenate could have been formed during transient atmospheric oxygenation events documented back to ∼3.0 Bya (9, 24–28). However, our molecular clock analyses placed the earliest origin of the arsenate resistance system coincident with the onset of the GOE (Fig. 2). This is consistent with recent analysis of marine shales, suggesting that arsenate began to accumulate in the ocean only after the Archean eon (10), and compatible with the causal role of the GOE in altering the arsenic chemistry on Earth’s surface and driving the genetic expansion of arsenic resistance systems.
Fig. 4.

Arsenic resistance systems before (A) and after (B) the GOE. As(III), arsenite; As(V), arsenate; MAs(III), trivalent methylarsenite; MAs(V), pentavalent methylarsenate; SAM, S-adenosylmethionine; GSH, reduced glutathione; GSSG, oxidized glutathione; Grxred, reduced glutaredoxin; Grxox, oxidized glutaredoxin; Trxred, reduced thioredoxin; Trxox, oxidized thioredoxin.

The early origin of the arsenite efflux permease encoded by acr3, together with its wide distribution among living organisms (Fig. 1), underpins the fundamental role of efflux mechanisms in heavy metal resistance (29, 30). In contrast, the physiological function of arsenite methylation in anoxic Archean environments remains unclear. The higher toxicity of the trivalent methylated product methylarsenite calls into question the commonly held assumption that methylation is a detoxification process. An attractive hypothesis is that the transient oxygenation of the Archean atmosphere (25, 26) and the existence of oxygen oases in local, shallow marine settings (24, 31) could have provided niches where microbial arsenite methylation could have operated as a detoxification pathway. Alternatively, methylation has been proposed as an antibiotic-producing process in Archean environments, with methylarsenite being a primitive antibiotic (32, 33). Further studies will clarify the function of ArsM in anoxic environments and its contribution to arsenic cycling and overall toxicity in ancient ecosystems.

Expansion of the Arsenic Resistance Network as a Consequence of the GOE.

The rise of oxygen in Earth’s atmosphere since the GOE both triggered global-scale oxidation of reduced arsenic species and led to widespread bioavailability of arsenate (3, 10). Our analyses indicate that the ancient arsenic resistance networks, optimized for detoxification of reduced arsenic in the anoxic Archean Earth, expanded to accommodate these environmental shifts (Fig. 4). In the face of these challenges, components of arsenate reduction systems (including a new efflux permease, ArsB, and arsenate reductases) evolved independently through convergent evolution after the GOE. The recurrent innovation of counterparts of ancient arsenate resistance devices is in agreement with enhanced arsenate stress because of gradually increasing oxygen levels after the Archean (3, 8). With the appearance of molecular oxygen, the ancient arsenic detoxification pathways were remodeled for detoxification of inorganic arsenic. For example, the arsenite methylation process catalyzed by ArsM could be recruited as a detoxification pathway under oxic settings. Its products, the toxic trivalent methylarsenite and dimethylarsenite, would be oxidized nonenzymatically by dioxygen into relatively innocuous methylarsenate and dimethylarsenate. However, the influence of dioxygen did not stop there. Our results further suggest that two new obligately oxygen-dependent methylarsenite resistance enzymes, ArsH and ArsI, arose during or after the GOE. Concurrent with the evolution of these new oxygen-dependent methylarsenite detoxification enzymes, recurrent expansion of ArsR families after the GOE resulted in the formation of the diverse ars operons present in extant prokaryotes and enabled regulatory fine-tuning of ars genes throughout different ages of Earth’s evolution (17).

Conclusion and Implications.

The timing we propose for the birth of arsenic resistance gene families supports a shifted marine arsenic cycle across the Archean–Proterozoic boundary. We observed an early origin of metabolic functions including methylation and excretion of arsenic during the Archean eon, which is in accord with the fossil evidence indicating the occurrence of microbial arsenic metabolism and cycling 2.72 Bya (34). Our prediction of continuous innovation of gene families toward detoxification of oxidized arsenic species is in agreement with recent analysis of marine shales that inferred a sharp increase of dissolved arsenate from ∼2.48 Bya onward (10). The persistence of ars genes among distinct microbial lineages over billions of years implies a temporal continuity of arsenic stress (2). The genetic expansion of arsenic resistance systems across the GOE would have entailed fitness advantages leading to success and diversification of life in the new redox landscape, which in turn remodeled the transition of metal chemistry on the Earth’s surface. Our molecular analysis, together with the innovations of protective mechanisms against other elements (e.g., Cu and Zn) (35, 36), provides a crucial constraint on the response of the global biosphere to the major transitions in cycles of toxic, redox-sensitive metals.

Methods

Genomic Sampling and Reconstruction of Species Tree.

A previously reported tree of life was used as a template for reconstruction of the species tree (19). A total of 786 representative species with a completely sequenced genome were sampled from the original dataset (see Dataset S1 for accession numbers). The ribosomal protein tree was inferred with RAxML v8.4.1 (37), using the PROTGAMMALG evolution model. To reconstruct the SSU rRNA tree, an alignment was generated from SSU rRNA genes of the sampled organisms, using the SINA alignment algorithm (38). One representative SSU rRNA gene was selected for species with multiple copies. Phylogenetic trees were calculated under the GTRCAT model, using RAxML. A total of 204 and 300 bootstrap replicates were conducted for ribosomal protein and SSU rRNA gene phylogenies, respectively, according to extended majority-rule consensus (MRE)-based bootstopping criteria. The oxygen requirement for each selected species was retrieved from the Genomes OnLine Database (GOLD) (39) and literature reviews.

Molecular Dating of the Tree of Life.

The divergence times of the species tree were estimated with PhyloBayes, using a fixed RAxML phylogeny of ribosomal proteins, a CAT20 substitution model, a birth–death process, and four gamma categories (40). The CAT20 model was chosen because preliminary tests showed that analyses using a full CAT model failed to converge within a reasonable time (>2 mo). Both the autocorrelated lognormal (-ln) and uncorrelated gamma multiplier (-ugam) relaxed clocks were applied to model the rate variation across lineages (41). Bayesian cross-validation implemented in PhyloBayes was used to test whether one of the two clock models fits the data better. The clocks were calibrated with eight sets of temporal constraints that are directly linked to fossil and geochemical evidence, as described previously (22, 42). The age of the last universal common ancestor (root) was constrained between 4.38 Bya (approximating earliest habitability evidence) (43, 44) and 3.35 Bya (fossil records from the Strelley Pool Formation) (42, 45, 46), using a uniform distribution. A gamma-distributed root prior (3.95 ± 0.23 Bya), assuming the maximum probability of the root age falling midway between the calibrations, was applied to test the effects of the root prior distribution (analyses 2 and 8). Geochemical evidence from the Manzimnyama Banded Iron Formation, Fig Tree Group, South Africa, indicates the presence of free oxygen being produced by Cyanobacteria before 3.2 Bya (42, 47), and this was used as a minimum age for the total group of Cyanobacteria. However, as the Banded Iron Formation at 3.2 Bya may also have been formed via anaerobic processes [i.e., UV oxidation (48) and anoxygenic photosynthesis (49, 50)], PhyloBayes analyses without the constraint on Cyanobacteria (analyses 3 and 9) were performed to test how inclusion of this constraint impacts the results.
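For readers who want to relate the gamma root prior quoted as a mean and SD (3.95 ± 0.23 Bya) to the usual shape and scale parameters, the standard moment identities are mean = shape × scale and SD = √shape × scale. The sketch below applies them; the function name is hypothetical, and how PhyloBayes itself takes these values on the command line is not shown here.

```python
def gamma_from_mean_sd(mean, sd):
    """Convert a gamma distribution's mean and SD to (shape, scale).

    Uses mean = shape * scale and variance = shape * scale**2.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

# Root prior quoted in the text: mean 3.95 Bya, SD 0.23 Bya.
shape, scale = gamma_from_mean_sd(3.95, 0.23)

# Midpoint of the uniform root prior U(3.35, 4.38), for comparison.
uniform_midpoint = (3.35 + 4.38) / 2
```

With these inputs the shape parameter is close to 295, so the prior is narrow and nearly symmetric around its mean.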
The time constraint on Rhodophyta was derived from the oldest fossil records of Bangiale red algae, which occurred in the 1.20 Bya Hunting Formation (51). To evaluate whether this assumption is so stringent as to overdetermine the estimated divergence times, analyses were performed with reduced sets of calibrations by excluding constraints on Rhodophyta (analyses 4 and 10). Comparisons of estimated confidence intervals suggested that varying root priors or subsampling of calibrations resulted in minimal changes of estimated divergence times. For all molecular clock analyses, two independent PhyloBayes Markov chain Monte Carlo (MCMC) chains were run in parallel for up to 1 mo (∼60,000 model cycles). The convergence of MCMC chains was checked by comparing the posterior distributions of independent runs, using the tracecomp program implemented in PhyloBayes (effective sizes >100, and maximum discrepancy between chains <0.3). A state of the MCMC chain was sampled every 20 cycles after the initial 20% of generations were discarded as burn-in. All PhyloBayes analyses were also run under the prior alone, by removing the sequence data, to verify that the estimated divergence times are not solely driven by fossil records. In addition, the ribosomal protein phylogeny and SSU rRNA gene phylogeny were converted to ultrametric trees, using TreePL under a penalized likelihood model (52). The rate smoothing parameters were set to 10-based values between 1 and 10,000 with the cross-validation procedure and the χ2 test enabled in TreePL. The full set of temporal constraints was used. To evaluate the effect of phylogenetic uncertainty on the results, alternative tree topologies reflecting alternative arrangements/bipartitions for taxa of uncertain relationships were generated. Conflicting bipartitions (n = 32) of the RAxML ribosomal protein tree that are substantially represented (>40%) in bootstrap replicates were retrieved using RAxML (37) (option -f t, internode certainty analysis).
The alternative minority-bipartition topology was obtained by editing the RAxML tree to reflect all conflicting bipartitions via subtree prune and regraft (analyses 5 and 11). A three-domain tree placing Archaea as a sister group of Eukaryotes was built similarly (analyses 6 and 12). Both alternative topologies were dated with the full alignment of ribosomal proteins, using PhyloBayes. Furthermore, we built 100 alternative chronograms using TreePL, based on alternative topologies each containing a randomly selected 50% of the minority bipartitions (Bipartition-Jackknife analysis). Branch lengths of these alternative topologies were re-estimated by RAxML (option -f e), using the full alignment of ribosomal proteins.
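The subsampling step of the Bipartition-Jackknife analysis can be illustrated with a minimal Python sketch. It only shows how 100 random subsets, each holding 50% of the 32 conflicting bipartitions, might be drawn; the bipartition labels, the function name, and the fixed random seed are assumptions made for illustration, and the tree editing and TreePL dating steps are not reproduced here.

```python
import random


def jackknife_bipartitions(bipartition_ids, n_replicates=100, fraction=0.5, seed=0):
    """Draw one random subset of conflicting bipartitions per replicate."""
    rng = random.Random(seed)  # fixed seed is an assumption, used only for reproducibility
    k = round(len(bipartition_ids) * fraction)  # 50% of 32 gives 16 per replicate
    return [sorted(rng.sample(bipartition_ids, k)) for _ in range(n_replicates)]


# Hypothetical labels standing in for the 32 conflicting bipartitions
conflicting = [f"bip{i:02d}" for i in range(1, 33)]
replicates = jackknife_bipartitions(conflicting)
print(len(replicates), len(replicates[0]))  # prints "100 16"
```

Each subset would then define one alternative topology to be dated separately.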

Identification of Arsenic Resistance Genes.

A hidden Markov model (HMM)-based search was performed to identify arsenic resistance genes in selected genomes. To develop HMM profiles, reference protein sequences were downloaded from UniProt or the National Center for Biotechnology Information (NCBI) and aligned using MAFFT v7.310 (53) with the linsi option. Sequence alignments were visualized with ClustalX (54), and ambiguously aligned regions were removed using TrimAl v1.2 (55). HMM profiles were built from the curated alignments using hmmbuild in the HMMER v3.1b2 package (56). To collect homologs of arsenic resistance genes, each HMM profile was searched against 786 genomes, using hmmsearch with an E-value cutoff of 0.1. Hit scores were retrieved, and the corresponding sequences were examined for conserved domains, using the protein family (PFAM) database (57). In profile searches for Acr3, ArsB, ArsH, ArsI, ArsM, and ArsP, the retrieved hits fell into two distinct groups: one with significantly higher scores that contained the reference proteins, and another with much lower scores that contained distant homologs. This separation of scores allowed us to distinguish these arsenic resistance genes from their remote relatives, and we annotated the higher-scoring sequences as the target proteins. To determine whether these sequences are truly arsenic resistance proteins, hits from hmmsearch were aligned with MAFFT (multiple sequence alignment based on fast Fourier transform), and phylogenetic trees were constructed using RAxML under the PROTGAMMAAUTO model with 100 nonparametric bootstraps. The results of these tree-building trials indicated that sequences with significantly higher scores formed a moderately to strongly supported monophyletic clade with the functionally characterized proteins, which provided evidence that the arsenic resistance proteins were correctly annotated.
In contrast, HMM profiles were less able to distinguish ArsCs from their distant relatives, probably because of their short protein lengths and the absence of highly conserved domains. Therefore, we identified prokaryotic arsenate reductase genes (ArsC1 and ArsC2) by taking genomic context into account. The hmmsearch scoring threshold for each arsenate reductase (ArsC1 and ArsC2) was optimized to include sequences from the phylogenetic clade containing both reference proteins and homologs located within ars operons. Eukaryotic arsenate reductases (Acr2) were identified via a phylogenetic method: branches within a well-supported clade containing known Acr2 were selected as putative Acr2. ArsR homologs were classified into four families on the basis of a reported phylogenetic tree (18). The reference alignment and phylogenetic tree of ArsRs were built as described previously (18). For each ArsR family, homologs extracted by HMM profiles were added to the reference alignment using MAFFT (--add and --keeplength) and assigned to the reference tree with the evolutionary placement algorithm in RAxML. Sequences that were placed within the corresponding clade of the reference tree were identified as ArsR. The sequences retrieved here were further screened for the presence of key catalytic residues. Homologs that passed these criteria were regarded as functional orthologs involved in arsenic resistance and were used for subsequent analysis. The same identification pipeline was further applied to retrieve protein sequences of arsenic resistance genes in the 2,031 organisms included in the EggNOG Database (v4.5.1).
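The separation of hmmsearch scores into a high-scoring and a low-scoring group, as described for Acr3, ArsB, ArsH, ArsI, ArsM, and ArsP, can be illustrated with a short Python sketch. The text does not state the exact rule used to place the boundary, so splitting at the widest gap between consecutive sorted scores is an assumption, and the scores below are invented for illustration.

```python
def split_at_largest_gap(scores):
    """Split hit scores into (high, low) groups at the widest gap between sorted values."""
    ordered = sorted(scores, reverse=True)
    if len(ordered) < 2:
        return ordered, []
    # Difference between each score and the next lower one
    gaps = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


# Invented bit scores: four strong hits and three weak, distant hits
example_scores = [412.5, 398.1, 405.7, 61.3, 48.9, 35.2, 380.4]
high, low = split_at_largest_gap(example_scores)
print(high)  # candidate target proteins
print(low)   # candidate distant homologs
```

A rule of this kind only works when the two groups are clearly separated, which is why a context-based approach is described for ArsC1 and ArsC2.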

Phylogenetic Analysis of Arsenic Resistance Genes.

The protein sequences of each arsenic detoxification gene family were aligned with five different methods [MUSCLE (58), ClustalW (54), T-Coffee (59), MAFFT (53), and ProbCons (60)]. A consensus alignment for each gene family was computed from the consistency of the outputs of the individual alignment programs, using M-Coffee, provided in the T-Coffee package (61). Poorly aligned regions were excised using TrimAl v1.2 (55) with the -automated1 option. The best-fit evolutionary model for each gene family (Acr3: LG+I+G; ArsB: LG+I+G; ArsC1: WAG+I+G; ArsC2: LG+I+G; Acr2: LG+I+G; ArsH: LG+I+G; ArsI: WAG+I+G; ArsM: LG+I+G; ArsP: LG+I+G; ArsR1: LG+I+G; ArsR2: Dayhoff+G+F; ArsR3: LG+I+G; ArsR4: LG+G) was determined by ProtTest3 (62), according to the Akaike information criterion and the Bayesian information criterion. Maximum-likelihood trees were inferred under the best-fit evolutionary model, using RAxML. Nonparametric bootstrap analysis for each gene tree was conducted under the corresponding evolutionary model with 100 replicates. Pairwise phylogenetic distances were calculated by summing the lengths of all branches linking two taxa in the maximum-likelihood phylogeny. The congruence between each gene tree and the species tree (ribosomal protein phylogeny) was assessed by scatterplots of the pairwise phylogenetic distances calculated from the corresponding trees.
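The pairwise distance calculation described above (summing the branches that link two taxa) can be sketched in Python as follows. This is a minimal illustration rather than the pipeline actually used: the tree representation (a dictionary mapping each node to its parent and branch length), the function names, and the toy trees are all assumptions.

```python
from itertools import combinations


def path_to_root(tree, node):
    """Map each ancestor of node (and node itself) to its summed branch length from node."""
    dist, out = 0.0, {node: 0.0}
    while node in tree:  # the root has no entry, so the walk stops there
        parent, length = tree[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic_distance(tree, a, b):
    """Sum of branch lengths on the path linking taxa a and b."""
    up_a, up_b = path_to_root(tree, a), path_to_root(tree, b)
    # With nonnegative branch lengths, the most recent common ancestor gives the minimum
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def paired_distances(species_tree, gene_tree, taxa):
    """Return (species distance, gene distance) points, one per taxon pair."""
    return [
        (patristic_distance(species_tree, a, b), patristic_distance(gene_tree, a, b))
        for a, b in combinations(sorted(taxa), 2)
    ]


# Toy trees (invented): child -> (parent, branch length)
species_tree = {"n1": ("root", 0.1), "A": ("n1", 0.2), "B": ("n1", 0.3), "C": ("root", 0.5)}
gene_tree = {"n1": ("root", 0.2), "A": ("n1", 0.4), "B": ("n1", 0.6), "C": ("root", 1.0)}
points = paired_distances(species_tree, gene_tree, ["A", "B", "C"])
```

Plotting the resulting points gives the kind of scatterplot used to judge congruence. In this toy case the gene tree is the species tree with every branch doubled, so the points fall on a straight line.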

Gene Birth Date Inference.

Gene birth dates were inferred using a reconciliation algorithm implemented in ecceTERA (63, 64). An ensemble (n = 10) of nonparametric bootstrapped trees was used as the gene tree set to resolve the uncertainty in deep-branching phylogenies, using the amalgamation algorithm (option amalgamate = 1). A fully dated species tree (option dated = 2), reconstructed by either PhyloBayes or TreePL, was provided to restrict HGT events to chronologically overlapping lineages. Gene birth was parsed as the earliest split event that led to the gene clade. Posterior estimates of gene age (i.e., median and 95% highest posterior density interval) were calculated over 1,200 reconciliation analyses, using fully dated species trees (n = 100) sampled from each PhyloBayes MCMC analysis (Tables 1 and 2). To assess the sensitivity of our results to the reconciliation algorithm, gene ages were also estimated using the Analyzer of Gene and Species Trees (AnGST) program (22). AnGST was run with default parameters (event cost: HGT = 3.0, DUP = 2.0, and LOS = 1.0; ultrametric = True) with 10 bootstrapped gene trees. Because of computational limitations, AnGST was run only on the consensus species trees of the 12 Bayesian molecular clock analyses (Tables 1 and 2).
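The posterior summary mentioned above (median and 95% highest posterior density interval) can be computed from a set of sampled gene ages as in the following Python sketch. The total of 1,200 reconciliations is consistent with 100 sampled species trees from each of the 12 PhyloBayes analyses. The function name and the sample ages are assumptions for illustration, and the interval is taken as the shortest window containing 95% of the samples, which is one common way to estimate it.

```python
import math
import statistics


def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains the requested fraction of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    # Slide a window of k consecutive samples and keep the narrowest one
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


# Invented gene ages (Bya) standing in for reconciliation results
ages = [2.9, 3.0, 3.0, 3.1, 3.1, 3.1, 3.2, 3.2, 3.2, 3.2,
        3.3, 3.3, 3.3, 3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 4.4]
median_age = statistics.median(ages)
lower, upper = hpd_interval(ages)
print(median_age, lower, upper)
```

With these toy values the single outlying age of 4.4 falls outside the interval, showing how the summary downweights rare, extreme reconciliations.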

Data Availability.

Accession numbers of all genomes used in this study are listed in Dataset S1. Protein sequence alignments and maximum-likelihood trees of 13 arsenic resistance genes are available in Dataset S2. Species trees based on alignment of concatenated ribosomal proteins or SSU rRNA are included in Dataset S3.