Marco Malatesta1, Giulia Mori1, Domenico Acquotti2, Barbara Campanini3, Alessio Peracchi1, Parker B Antin4, Riccardo Percudani5. 1. Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy. 2. Centro Interdipartimentale Misure 'Giuseppe Casnati', University of Parma, Parma, Italy. 3. Department of Food and Drug, University of Parma, Parma, Italy. 4. Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA. 5. Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy. riccardo.percudani@unipr.it.
Abstract
Among amniotes, reptiles and mammals are differently adapted to terrestrial life. It is generally appreciated that terrestrialization required adaptive changes of vertebrate metabolism, particularly in the mode of nitrogen excretion. However, the current paradigm is that metabolic adaptation to life on land did not involve synthesis of enzymatic pathways de novo, but rather repurposing of existing ones. Here, by comparing the inventory of pyridoxal 5'-phosphate-dependent enzymes in different amniotes, we identify in silico a pathway for sulfur metabolism present in chick embryos but not in mammals. Cysteine lyase contains haem and pyridoxal 5'-phosphate co-factors and converts cysteine and sulfite into cysteic acid and hydrogen sulfide, respectively. A specific cysteic acid decarboxylase produces taurine, while hydrogen sulfide is recycled into cysteine by cystathionine beta-synthase. This reaction sequence enables the formation of sulfonated amino acids during embryo development in the egg at no cost of reduced sulfur. The pathway originated around 300 million years ago in a proto-reptile by cystathionine beta-synthase duplication, cysteine lyase neofunctionalization and cysteic acid decarboxylase co-option. Our findings indicate that adaptation to terrestrial life involved innovations in metabolic pathways, and reveal the molecular mechanisms by which such innovations arose in amniote evolution.
Amniotes split ~320 million years ago soon after their origin into two lineages that dominate animal life on land: Sauropsida (Reptilia) including turtles, lizards, snakes, crocodiles, and birds, and Synapsida whose extant representatives are mammals[1-3]. Before separation, amniotes had developed few derived characters, most notably a protective and nourishing structure for embryo development[4]. After separation, reptiles and mammals evolved distinct adaptations to terrestrial life[5]. These adaptations are less well understood at the molecular level. It has long been known that the two amniote clades evolved patterns of nitrogen elimination (uricotelism and ureotelism) suitable for their survival and development in terrestrial ecosystems[6,7]. Such modifications in nitrogen metabolism involved the use of old pathways for a new purpose[8,9], supporting the notion that terrestrialization did not require novel enzymes and metabolic pathways. Consistent with this view, no evidence has been found for enzymatic pathways that originated in early amniote evolution[10-12]. Whole genome comparisons, however, have revealed a number of genes that are not shared between the two classes of amniotes, more than two thousand in the case of Gallus gallus versus Homo sapiens[13]. Whether these genetic differences correspond to innovations in molecular processes and pathways is largely unknown. Conversely, there is evidence for metabolic pathways present in sauropsids but absent in mammals for which the genes have not been identified.
According to 35S radiotracer experiments[14,15], chicken embryos synthesize sulfonated amino acids through incorporation of sulfite into cysteine with release of H2S. During development, this activity (cysteine lyase; EC 4.4.1.10) is first observed in germ layer cells and increases with the differentiation of the yolk-sac endoderm[16]. The presence of cysteine lyase was confirmed in other sauropsids, but not in mammals[17]. 
The partially purified enzyme[18] was found to be dependent on pyridoxal 5’-phosphate (PLP). This activity, however, has never been reported, even as a side reaction, for any known PLP-dependent protein. In embryonated eggs, sulfite or sulfate 35S is finally incorporated into taurine[14,15], a sulfonated amino acid (2-aminoethane sulfonic acid) with a vital role in vertebrates during development and adult life[19,20]. In mammals, taurine is synthesized in a pathway starting with cysteine/cysteamine oxidation[19,21,22]. However, during fetal development in humans[23], cats[20], and mice[24], taurine is provided by maternal transfer. In view of previous evidence, a pathway for taurine biosynthesis involving cysteine lyase and oxidized sulfur should exist in embryonated chicken eggs. The understanding of this pathway, however, has been limited by the lack of knowledge of its molecular components. Such missing pieces of information in metabolic reactions, or “pathway holes”[25], represent a substantial portion (10–30% depending on the organism) of pathway databases (e.g. Kegg: https://www.genome.jp/kegg; Metacyc: https://metacyc.org). With the availability of complete genomes, the genes responsible for such unassigned functions can be searched in a finite list of sequences by bioinformatics. There are several bioinformatics methods for the prediction of gene-trait associations[25,26]. Here we devised a procedure to limit the search space based on information of the reaction mechanism and dependency on a particular cofactor.
Results
To identify this pathway, we assumed that Gallus gallus has a gene encoding a PLP-dependent enzyme that is absent in mammals. Enzymes that depend on PLP (the active form of vitamin B6) are remarkable for their evolvability: they catalyze a wide variety of reactions, but have a limited number of evolutionary origins, making it possible to identify an organism’s PLPome in silico. We initially compared the Gallus gallus and Homo sapiens PLPomes (Fig. 1a) with the genome analysis tool of B6db[27]. This side-by-side comparison (Extended Data Fig. 1) showed that these species encode only ~50 of the >300 known families contained in the database. Most of the proteins in the two amniotes have a 1:1 orthologous relationship. However, 8 human and 4 chicken proteins do not have a correspondence in the other species (Fig. 1a). One of the 4 proteins uniquely present in Gallus (XP_015151050; predicted threonine aldolase) could be excluded as comparisons that included other species revealed that this gene is generally present in mammals. Detailed examination of the remaining proteins identified a single strong candidate for the sought function. In the B6db classification, the XP_015156382 protein (gene: LOC418544) was assigned to the same family as cystathionine beta-synthase (CBS), an enzyme catalyzing β-replacement reactions similar to cysteine lyase. In particular, the main cysteine lyase activity (reaction 1, Fig. 1b) resembles the serine hydro-lyase activity of CBS (reaction 4, Fig. 1c), while a secondary activity of cysteine lyase, i.e. formation of lanthionine (reaction 2, Fig. 1b), resembles the main CBS activity (reaction 3, Fig. 1c). We used 3D structures of experimentally validated CBS proteins and structural models of homologous Gallus sequences to analyze the conservation of residues lining the active site. 
The protein annotated as CBS (XP_015156364) had perfect conservation in this region, while the protein annotated as CBS-like (XP_015156382) showed conservation of the catalytic lysine (K119), but several non-conservative substitutions of active site residues (Fig. 1d). Chicken embryo gene expression data in the GEISHA database[28] suggested LOC418544 expression at an extra-embryonic location consistent with the reported cysteine lyase activity[16]. In view of the evidence from the bioinformatics analysis, we undertook the characterization of the protein encoded by LOC418544. Based on the observations described below, we named this gene cysteine lyase (CL).
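The subtraction logic behind the PLPome comparison can be illustrated with a minimal sketch. It assumes that a best hit in the other species has already been computed for every protein (for example from pairwise alignment scores) and then applies a best-reciprocal-hit rule to pair orthologs; whatever is left unpaired is "unique" to one species. The identifiers and hit tables below are hypothetical toy data, not the B6db output, and the function names are illustrative only.

```python
from typing import Dict, Set, Tuple


def reciprocal_best_hits(
    a_to_b: Dict[str, str], b_to_a: Dict[str, str]
) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b AND b's best hit is a."""
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


# Hypothetical toy proteomes (placeholders, not real accession numbers).
chicken = {"gg_CBS", "gg_CBS_like", "gg_CSAD"}
human = {"hs_CBS", "hs_CSAD", "hs_other"}

# Hypothetical best-hit tables in each direction.
best_chicken_to_human = {
    "gg_CBS": "hs_CBS",
    "gg_CBS_like": "hs_CBS",  # the paralog also hits CBS, but not reciprocally
    "gg_CSAD": "hs_CSAD",
}
best_human_to_chicken = {
    "hs_CBS": "gg_CBS",
    "hs_CSAD": "gg_CSAD",
    "hs_other": "gg_CSAD",
}

pairs = reciprocal_best_hits(best_chicken_to_human, best_human_to_chicken)

# Proteins left without a 1:1 partner are the candidates for lineage-specific functions.
chicken_only = chicken - {a for a, _ in pairs}
human_only = human - {b for _, b in pairs}

print(sorted(chicken_only), sorted(human_only))
```

In this toy case the duplicated CBS-like protein is the only chicken entry left unpaired, which mirrors how a lineage-specific paralog can surface from a side-by-side comparison even though it belongs to a family shared by both species.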
Fig. 1.
Identification of the genes involved in sulfonated amino acids biosynthesis in Gallus gallus.
a, Venn diagram of the in silico comparison of Gallus gallus and Homo sapiens PLPomes summarizing the numbers of shared and unique genes. Accession numbers of the proteins identified as cysteine lyase (CL), cystathionine beta-synthase (CBS), cysteic acid decarboxylase (CAD) are indicated. b, Reactions catalyzed by CL: synthesis of cysteate (1) and synthesis of lanthionine (2). c, Reactions catalyzed by CBS: synthesis of cystathionine (3) and conversion of serine into cysteine via addition of hydrogen sulfide (4). d, Conservation of the catalytic lysine (K119) and non-conservative substitutions of active site residues of Gallus gallus CBS-like (XP_015156382). Numeration is according to the human CBS sequence. e, GgCL and GgCBS domain composition; the dashed line indicates gene truncation for recombinant protein expression in GgCL (K396) and GgCBS (S427). f, Time-resolved 1H NMR spectra of cysteine (5 mM) in the presence of Na2SO3 (7 mM) and GgCL (1 μM), showing complete conversion into cysteic acid. g, Time-resolved 1H NMR spectra of serine (5 mM) in the presence of Na2S (30 mM) and GgCBS (4 μM), showing complete conversion into cysteine. h, GgCAD domain composition. i, Time-resolved 1H NMR spectra of cysteic acid (5 mM) in the presence of GgCAD (1 μM), showing complete conversion into taurine. j, Example of GgCAD kinetics in the presence of cysteic acid (CA) or cysteine sulfinic acid (CSA); the black curve is the fitting of the experimental points with the integrated Michaelis-Menten equation[50] with KM = 6.95 ± 3.23 mM, kcat = 10.54 ± 3.46 s−1. k, Specific activities of Gallus, Homo, and Danio CSAD orthologs with CA or CSA substrates. Data are means ± SDV of three independent experiments.
Extended Data Fig. 1
In silico subtraction of chicken and human PLPomes.
Comparison of the complete set of PLP-dependent enzymes (one isoform per gene) in Gallus gallus and Homo sapiens as classified by B6db. Orthologous proteins (BRH test) are colored blue. Gallus proteins without human orthologs are in bold. E-values indicate significance of the protein alignments with family-level Hidden Markov Models.
The GgCL protein sequence has the same level of similarity with either GgCBS or HsCBS (65.1% and 64.6% identity), less than the similarity between the two CBS sequences (75.3% identity) (Extended Data Fig. 2a). Domain analysis predicts for GgCL an architecture similar to CBS[29], with an N-terminal heme-binding domain, a central PLP-binding domain, and two C-terminal CBS repeats (Fig. 1e). As previously reported for CBS[30], soluble expression could be achieved with a truncated protein lacking the regulatory CBS repeats (aa 1–396, Fig. 1e). Recombinant GgCL was produced in E. coli and purified to homogeneity as an orange protein (Extended Data Fig. 2b,c). Conservation of residues for heme and PLP coordination (Extended Data Fig. 2a,d) suggests that CL has maintained the ability to bind these cofactors. The absorbance spectra of GgCL showed the typical Soret peak of heme-binding proteins (Extended Data Fig. 2e). The presence of PLP was not apparent in the absorbance spectrum due to the dominant signal of heme. However, the fluorescence emission spectrum upon excitation at 412 nm showed a peak centered at 510 nm (Extended Data Fig. 2f) attributable to the ketoenamine tautomer of bound PLP[31].
Extended Data Fig. 2
GgCL is a heme and PLP protein with cysteine lyase activity.
a, Multiple alignment of H. sapiens CBS (HsCBS) with G. gallus CBS and CL proteins (GgCBS, GgCL). Filled circles indicate residues that recognize heme (red), PLP (yellow) and serine (white) in the holo CBS structure (PDB code 3PC4). Conserved residues based on the alignment of 8 CL and 22 CBS sequences from vertebrates are shaded in black. Green shading indicates conserved differences between CBS and CL groups. b, Photograph of the FPLC collector after cation exchange, showing the vivid orange color of GgCL protein fractions (upper panel); selected fractions were subjected to SDS-PAGE electrophoresis and stained with Coomassie Brilliant Blue (lower panel). c, Gel filtration chromatogram (Superdex 200) with dual wavelength detection (λ = 280, 428 nm), showing a molecular weight corresponding to GgCL monomer. d, GgCL predicted interactions with heme (left) and PLP (right) are shown with residues conserved in the alignment of CBS/CL proteins highlighted in colors. e, Absorbance spectrum of recombinant GgCL (16.5 μM) in NaH2PO4 (20 mM), pH 7.0. f, Fluorescence emission spectrum (excitation: 412 nm) of recombinant GgCL (22 μM) in NaH2PO4, pH 7.0. g, Kinetics of H2S release by the CL reaction monitored spectrophotometrically at 390 nm in 50 mM NaH2PO4, pH 7.0 with GgCL (1 μM), lead acetate (0.4 mM), cysteine (5 mM) in the absence (dashed line) or in the presence of Na2SO3 (5 mM, solid line). h-i, Non linear fitting to the Michaelis Menten equation of the dependency on substrate concentrations of the initial reaction velocity of GgCL (1 μM) with fixed (h) Na2SO3 (5 mM) and (i) cysteine (40 mM). Data are means ± SDV of three independent experiments. j, Time-resolved 1H NMR spectra of cysteine (10 mM) in the presence of GgCL (1 μM), showing partial conversion into lanthionine.
We monitored spectrophotometrically the GgCL activity by trapping in situ generated H2S with lead acetate to form lead sulfide (PbS), a dark compound. H2S release was observed with cysteine alone, and was much faster in the presence of sulfite (Extended Data Fig. 2g), suggesting that GgCL catalyzes both cysteine lyase reactions (see Fig. 1b). In the absence of lead acetate, these reactions, albeit conducted with just micromoles of reagents, could be perceived by their rotten egg odor. H2S is a volatile molecule with a role as a gaseous messenger in vertebrates[32]. The dependence of reaction velocity on substrate concentrations followed Michaelis-Menten kinetics (Extended Data Fig. 2h–i). Fitting to the Michaelis-Menten equation gave a kcat value of 17.32 ± 1.05 s−1 and KM values of 12.75 ± 2.16 mM for cysteine and 0.096 ± 0.018 mM for sulfite. Formation of the cysteine reaction product was directly followed by time-resolved 1H NMR spectrometry. In the presence of excess sulfite, cysteine was completely converted into cysteic acid (CA) (Fig. 1f), whereas a partial conversion into lanthionine was observed in the absence of sulfite (Extended Data Fig. 2j). GgCL showed no activity with serine and homocysteine, or serine and H2S (Extended Data Fig. 3a,b), indicating that it is unable to catalyze the CBS reactions (see Fig. 1c and Extended Data Fig. 3c). While incompetent in the β-replacement of serine, GgCL is able to abstract the serine alpha proton (Extended Data Fig. 3d–e), i.e. to complete the first step of the reaction mechanism. By contrast, recombinant GgCBS was found to be able to catalyze the CBS reactions, including the β-replacement of serine with H2S to form cysteine (Fig. 1g).
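To make the reported kinetic constants concrete, the following sketch evaluates the single-substrate Michaelis-Menten rate law, v = kcat·[E]·[S] / (KM + [S]), using the fitted values quoted above. It is an illustration only: it treats each substrate separately with the apparent constants from the text, assumes the co-substrate is held fixed as in the assays, and does not reproduce the fitting procedure used by the authors. The function and variable names are hypothetical.

```python
def michaelis_menten_rate(
    substrate_mM: float, kcat_per_s: float, km_mM: float, enzyme_uM: float
) -> float:
    """Initial velocity (µM of product per second) from the Michaelis-Menten rate law."""
    return kcat_per_s * enzyme_uM * substrate_mM / (km_mM + substrate_mM)


# Fitted values reported in the text for GgCL (uncertainties omitted here).
KCAT = 17.32        # s^-1
KM_CYSTEINE = 12.75  # mM
KM_SULFITE = 0.096   # mM
ENZYME = 1.0         # µM, the GgCL concentration used in the assays

# Velocity at 5 mM cysteine, and at [S] = KM where v is half of kcat*[E] by definition.
v_cys_5mM = michaelis_menten_rate(5.0, KCAT, KM_CYSTEINE, ENZYME)
v_at_km = michaelis_menten_rate(KM_CYSTEINE, KCAT, KM_CYSTEINE, ENZYME)

# Apparent catalytic efficiencies (s^-1 mM^-1) for the two substrates.
efficiency_cysteine = KCAT / KM_CYSTEINE
efficiency_sulfite = KCAT / KM_SULFITE

print(f"v at 5 mM cysteine: {v_cys_5mM:.2f} uM/s")
print(f"v at [S] = KM:      {v_at_km:.2f} uM/s")
print(f"kcat/KM cysteine:   {efficiency_cysteine:.2f} s^-1 mM^-1")
print(f"kcat/KM sulfite:    {efficiency_sulfite:.1f} s^-1 mM^-1")
```

The two KM values differ by about two orders of magnitude, so under these apparent constants the enzyme approaches saturation with sulfite at concentrations far below those needed to saturate it with cysteine.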
Extended Data Fig. 3
Absence of CBS activity in GgCL.
a, Time-resolved 1H NMR spectra of 5 mM of serine (atoms labeled in blue) and 5 mM of DL-homocysteine (atoms labeled in red) in the presence of GgCL (1 μM). b, Time-resolved 1H NMR spectra of serine (5 mM) and Na2S (5 mM) in the presence of GgCL (1 μM). c, Time-resolved 1H NMR spectra of 5 mM of serine (atoms labeled in blue) and 10 mM of DL-homocysteine (atoms labeled in red) in the presence of GgCBS (4 μM), showing complete consumption of serine and partial conversion of DL-homocysteine into cystathionine (atoms labeled in green) due to the stereospecific enzymatic reaction. d, Hydrogen-Deuterium exchange of serine alpha proton catalysed by GgCL (1 μM) in 95% D2O. Spectra were superimposed at time 0’ (red), 60’ (green), 260’ (black). e, 1H peak integration of serine Cα proton is plotted in the interval 0’−260’.
The CA product of the CL reaction could be converted into taurine in a single step. This reaction, which involves α-decarboxylation of an amino acid, is presumably catalyzed by a PLP-dependent enzyme. Our PLPome comparison did not reveal a putative decarboxylase uniquely present in Gallus. However, Gallus has a bona fide ortholog (XP_025001259; see Fig. 1a and Extended Data Fig. 1) of mammalian cysteine sulfinic acid decarboxylase (CSAD), a protein reportedly able to catalyze CA decarboxylation, albeit with lower efficiency with respect to its CSA substrate[33]. The XP_025001259 protein, containing the typical domain of PLP-dependent decarboxylases (Fig. 1h), was produced in E. coli as a PLP-bound protein, with a prevalent enolimine tautomer of the cofactor (Extended Data Fig. 4a–b). Surprisingly, not only was the Gallus protein able to efficiently catalyze CA decarboxylation to taurine (Fig. 1i), but it was specific for CA, with CSA serving as a poor substrate (Fig. 1j and Extended Data Fig. 4c). The presence of CSA and its reaction product hypotaurine was inhibitory for CA decarboxylation (Extended Data Fig. 4d–f). To acknowledge its substrate specificity, we propose to name CA decarboxylase (CAD) the enzyme encoded by the Gallus gene annotated as CSAD based on its orthology[34]. We confirmed the opposite preference for the CSA substrate in human CSAD in our experimental conditions, and observed the same preference for CSA in a CSAD ortholog of a basal vertebrate (Danio rerio) (Fig. 1k). No conserved differences in residues of the active site cavity were found in GgCAD and sauropsidian orthologs with respect to CSAD proteins from other vertebrates. However, two conserved substitutions (hydrophobic → hydrophilic) in sauropsidian sequences were observed in residues located within 5 Å from the active site cavity (Extended Data Fig. 5a). 
Analysis of the activity with the two substrates in single (Q467V and T470A) and double site-directed GgCAD mutants showed that these two substitutions contribute to the preference of the Gallus protein for CA (Extended Data Fig. 5b,c).
Extended Data Fig. 4
Gallus CSAD encodes a PLP-dependent cysteic acid decarboxylase (CAD).
a, Absorbance spectrum of GgCAD in 20 mM NaH2PO4, pH 8.0 and 100 mM NaCl; The absorbance region of PLP tautomers (enolimine 340 nm, ketoenamine 415 nm) is shown in the inset. b, Fluorescence emission spectrum of PLP enolimine tautomer upon excitation at 340 nm. c, Time-resolved 1H NMR spectra of cysteine sulfinic acid (5 mM) in the presence of GgCAD (1 μM), showing partial formation of hypotaurine (inset). d, Time-resolved 1H NMR spectra of 5 mM of cysteic acid (atoms labeled in blue) and 5 mM of hypotaurine (atoms labeled in red) in the presence of GgCAD (1 μM), showing slight inhibition of CAD activity. e, Time-resolved 1H NMR spectra of 5 mM of cysteic acid (atoms labeled in blue) and 5 mM of cysteine sulfinic acid (atoms labeled in red) in the presence of GgCAD (1 μM), showing strong inhibition of CAD activity. f, 1H peak integration of CA signals in the presence of GgCAD and hypotaurine (CA + Hyp) or cysteine sulfinic acid (CA + CSA).
Extended Data Fig. 5
Analysis of Gallus CSAD site-directed mutants.
a, Multiple alignment of CSAD orthologs from (1) non-sauropsids and (2) sauropsids. Conserved differences between groups are shaded in green. Residues that recognize PLP (yellow) or line the active site cavity (white) or are within 5 Å from the active site cavity (blue) in the human holo CSAD structure (PDB code 2JIS) are indicated by filled circles; positions of site-directed mutants are indicated by red arrows. b, Specific activities of wild-type (WT), single (Q467V, T470A), and double (Q467V/T470A) GgCAD mutants with CA and CSA substrates. c, 1H NMR spectra showing decarboxylation activity of wild-type (WT), single (Q467V, T470A) and double (Q467V/T470A) mutants in the presence of cysteic acid (right) and cysteine sulfinic acid (left) after 5’ of reaction stopped with 1 M HCl.
To determine the expression of CL, CBS and CSAD genes during early stages of embryogenesis, whole mount in situ hybridization analyses were performed in chicken embryos between 0.5 and 4 days of development (Hamburger-Hamilton [HH] stages 4–24)[35]. At HH stage 4, CL expression was first detected in the extraembryonic endoderm at the boundary of the area pellucida and area opaca (Fig. 2a). At HH stages 10 and 18, CL mRNAs were broadly detected throughout the extraembryonic endoderm (Fig. 2b,c). At HH stages 18 and 24, widespread expression was also evident in the embryo proper (Fig. 2c,d). CBS expression was first detected at HH stage 4 weakly in the epiblast (Fig. 2e). At HH stage 10, CBS mRNAs were localized to the head region and in the intermediate mesoderm, with strong expression in the primitive blood cells of the extraembryonic blood islands (Fig. 2f). Broad CBS expression was evident throughout the embryo at HH stages 18 and 24 (Fig. 2g,h). CSAD expression was first detected at HH stage 4 in the extraembryonic endoderm (Fig. 2i). Expression in extraembryonic endoderm persisted at HH stages 10 and 18 (Fig. 2j,k). At HH stage 24, CSAD mRNAs were detected throughout the embryo, with higher levels of expression observed in liver and mesonephros (arrowhead and arrow, Fig. 2l). Inspection of the genomic regions containing CL, CBS, and CSAD genes revealed that CL is adjacent to CBS in a head-to-tail orientation on chromosome 1 (Fig. 2m). Analysis of available RNA-seq profiles shows a prevalence of CBS over CL transcripts in the aggregated dataset. Tissue-specific RNA-seq profiles show abundant CBS transcripts in adult kidney and liver where CL transcripts are barely detected. Conversely, CL transcripts are more abundant than CBS transcripts in adult duodenum (Fig. 2m). CSAD is located on chromosome 33 adjacent to ZNF740 in a head-to-head orientation (Fig. 2n). 
The same organization is also observed for the human gene (www.ncbi.nlm.nih.gov/gene/51380), supporting orthology. CSAD transcripts are present in several adult tissues and especially abundant in kidney, liver, and duodenum (Fig. 2n).
Fig. 2.
CL, CBS, and CSAD genes are expressed from early stages of embryogenesis.
a-d, In situ hybridization analysis of CL expression in chick embryos at Hamburger-Hamilton developmental stages 4, 10, 18, and 24, sorted from left to right. e-h, In situ hybridization analysis of CBS expression at HH stages 4, 10, 18, and 24, sorted from left to right. i-l, In situ hybridization analysis of CSAD expression at HH stages 4, 10, 18, and 24, sorted from left to right. Scale bars: 1 mm (a-c, e-g, i-k), 5 mm (d, h, l). m, NCBI Sequence-viewer representation of the genomic region on Gallus gallus chromosome 1 (annotation release 104) encompassing the CBS and CL genes. Gene exon structure is represented by green segments. Blue bars represent RNA-seq exon coverage (log2 scaled) for aggregate, kidney (SAMEA2201372), liver (SAMEA2201470), and duodenum (SAMN03376186) datasets. n, NCBI Sequence-viewer representation of the genomic region on Gallus gallus chromosome 33 encompassing the CSAD gene; tracks are as in panel m.
By combining the evidence obtained with CL, CBS, and CAD proteins, one can define the pathway that produces sulfonated amino acids in embryonated chicken eggs (Fig. 3). The replacement of the thiol group of cysteine with sulfite by CL produces CA, which is decarboxylated by CAD to produce taurine (Fig. 3, upper branch). When these proteins are used together in the presence of sulfite, cysteine is quantitatively converted into taurine with little transient accumulation of CA (Extended Data Fig. 6a). Analysis of CL orthologs (see below) suggests that this pathway is universal in sauropsids. By contrast, the pathway is absent in synapsids or other vertebrates, in which taurine is formed through cysteine oxidation by cysteine dioxygenase (CDO) followed by CSA decarboxylation to hypotaurine and hypotaurine oxidation[19,22] (Fig. 3, lower branch). The sauropsidian pathway is shorter and does not involve formation of hypotaurine and sulfur oxidation. The H2S produced by the CL reaction can be recycled for the formation of cysteine from other amino acids (Fig. 3, dashed line). In particular, the formation of cysteine from serine and H2S is catalyzed by CBS (see Fig. 1g), whose gene is expressed at early stages in the chicken embryo (see Fig. 2e–h). CBS should thus be responsible for the serine hydrolase activity described in the chicken embryo liver[36]. By adding in vitro GgCBS and serine to the other components of the pathway, a similar consumption of cysteine (Extended Data Fig. 6b) produces twice as much taurine (Extended Data Fig. 6c) with complete consumption of serine (Extended Data Fig. 6d).
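The sulfur bookkeeping of the pathway can be summarized by writing the three reactions side by side and summing them. The scheme below is a simplified sketch: protonation states are omitted, and the net equation holds only in the idealized limit in which all H2S released by CL is recaptured by CBS.

```latex
\begin{align*}
\text{cysteine} + \text{sulfite} &\xrightarrow{\ \text{CL}\ } \text{cysteic acid} + \mathrm{H_2S} \\
\text{cysteic acid} &\xrightarrow{\ \text{CAD}\ } \text{taurine} + \mathrm{CO_2} \\
\text{serine} + \mathrm{H_2S} &\xrightarrow{\ \text{CBS}\ } \text{cysteine} + \mathrm{H_2O} \\[4pt]
\text{net:}\quad \text{serine} + \text{sulfite} &\longrightarrow \text{taurine} + \mathrm{CO_2} + \mathrm{H_2O}
\end{align*}
```

In this limit cysteine cancels from both sides, so the reduced sulfur acts as a recycled carrier, while the sulfur atom that ends up in taurine comes from sulfite.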
Fig. 3.
The CL pathway for taurine biosynthesis.
The identified pathway for sulfonated amino acids biosynthesis is shown in comparison with the known pathway for taurine biosynthesis. The dashed line shows recycling of hydrogen sulfide into cysteine catalyzed by CBS. Cysteine and sulfite sulfur atoms are denoted in different colors to highlight the different sources of sulfur. Enzymes are indicated by EC numbers (if any) and protein abbreviations as follows: cystathionine beta-synthase (CBS), cysteine lyase (CL), cysteic acid decarboxylase (CAD), cysteine dioxygenase (CDO), cysteine sulfinic acid decarboxylase (CSAD), flavin-containing monooxygenase 1 (FMO1).
Extended Data Fig. 6
One-pot enzymatic synthesis of taurine from cysteine.
a, Time-resolved 1H NMR spectra of cysteine (5 mM) and sulfite (7 mM) in the presence of recombinant GgCL (1 μM) and GgCAD (1 μM) proteins. b-d, 1H peak integration of (b) cysteine, (c) taurine, and (d) serine NMR signals in the same reaction conditions as (a) in the absence (black dots) or in the presence (blue dots) of serine (5 mM) and GgCBS (4 μM).
Evolution of an enzyme able to catalyze the CL reaction has been key to the origin of the metabolic pathway. We found bona fide CL orthologs only in sauropsids, suggesting an origin of the protein family in this lineage. CL sequences form a separate group within the vertebrate CBS tree (Extended Data Fig. 7). In the maximum likelihood (ML) protein tree, the CL clade branches basal to teleostei (Extended Data Fig. 7a), while in the ML nucleotide tree, CL branches basal to amniotes (Extended Data Fig. 7b). These phylogenetic reconstructions are complicated by differences in evolutionary rates and possible long-branch attraction artifacts[37] causing attraction of the fast evolving clade (CL) towards the basal clades. Rate differences between CL and CBS are mainly due to amino acid substitutions (Extended Data Fig. 7a,b). The ML tree obtained with the third (~synonymous) codon position showed reduced differences in branch lengths and the expected sister relationship of sauropsidian CBS and CL clades (Extended Data Fig. 7c). The CBS-CL locus (see Fig. 2m) is present in conserved synteny in all sauropsidian genomes and absent in non-sauropsids (Fig. 4). This suggests that CL evolved by tandem duplication of CBS after the split of sauropsids and synapsids in the late Paleozoic, ca. 300 MYA. After duplication, one copy retained the original function while the other developed a novel catalytic ability through molecular changes involving one deletion and five substitutions of conserved active site residues; this neofunctionalization process was completed before separation of extant sauropsids (Fig. 4 and Extended Data Fig. 8). A further step involved adaptation of the CSAD ortholog to the new function (co-option, Fig. 4) by promoting a secondary activity (CAD) to the main one. Interestingly, this tuning of substrate specificity occurred with substitution of residues located externally to the active site (see Extended Data Fig. 5). 
Based on evidence from extant genes and ancestral reconstruction, all sauropsids are expected to have inherited and maintained the CL pathway for taurine biosynthesis.
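The third-codon-position tree relies on a simple preprocessing step: keeping only every third column of an in-frame codon alignment, where substitutions are mostly synonymous. A minimal sketch of that step is given below. It assumes an alignment held as a name-to-sequence dictionary, already in frame and of equal length; the sequences and the function name are hypothetical, and the software actually used for the published trees is not specified in this section.

```python
from typing import Dict


def third_codon_positions(codon_alignment: Dict[str, str]) -> Dict[str, str]:
    """Keep only the third position of each codon from an in-frame alignment."""
    if not codon_alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in codon_alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    (length,) = lengths
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of three")
    # Slice starting at index 2 with a step of 3: positions 3, 6, 9, ... (1-based).
    return {name: seq[2::3] for name, seq in codon_alignment.items()}


# Hypothetical toy alignment: three codons per sequence.
toy_alignment = {
    "seqA": "ATGGCTAAA",
    "seqB": "ATGGCCAAG",
}
third_sites = third_codon_positions(toy_alignment)
print(third_sites)
```

The two toy sequences encode the same amino acids (Met-Ala-Lys) and differ only at third positions, which is the kind of signal such a reduced alignment retains while discarding most of the amino acid-changing substitutions that inflate the CL branch.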
Extended Data Fig. 7
Phylogeny of CBS and CL proteins in vertebrates.
Unrooted maximum-likelihood (ML) trees obtained from protein and nucleotide alignments of 35 CBS and CL sequences from 26 vertebrate species. Protein and nucleotide accession numbers corresponding to tree tip names are indicated; sauropsidian sequences are shaded in blue. Scale bars represent the number of calculated substitutions per site. a, Protein ML tree (436 alignment patterns) showing branching of the CL clade basal to teleostei. b, Nucleotide ML tree (1277 alignment patterns) showing branching of the CL clade basal to amniotes. c, Third codon position ML tree (613 alignment patterns) showing branching of the CL clade within sauropsida.
Fig. 4.
Origin and conservation of the CL pathway in birds and reptiles.
Key evolutionary events in the origin of the metabolic pathway are mapped on a vertebrate chronogram. Phylogenetic relationships and divergence times are from TimeTree [1]; empty nodes correspond to unresolved relationships in NCBI taxonomy. The dashed line indicates uncertainty in dating the events along the branch and within the temporal boundaries delimited by blue segments. The configuration of the CBS locus in the different species is represented at the terminal nodes, showing conservation of CBS-CL synteny in sauropsids.
Extended Data Fig. 8
Ancestral substitutions in CL neofunctionalization.
a, Evolutionary dendrogram used in ancestral state reconstructions assuming split of amniote last common ancestor (Amniote; N2) into two lineages before the gene duplication (GD; N12) leading to sauropsidian CL (sCL; N13) and CBS (sCBS; N21). Sequence identifiers are as in Extended Data Fig. 7. b, Multiple alignment of reconstructed ancestral sequences corresponding to nodes N2, N12, N13, and N21. Active site residues are indicated by blue triangles. Positions with identical residues in the four nodes and human CBS are shaded gray. Numeration is in accordance with the human CBS sequence. c, Character state probabilities for active site residues substituted in GgCL showing high probability of fixation before the split of extant sauropsids.
Discussion
Identification in this study of a route to taurine biosynthesis that originated in the common ancestor of birds and reptiles reveals that early amniote adaptation to terrestrial habitats, particularly embryonic development in the reptilian egg, also entailed the evolution of novel enzymes and metabolic pathways. Knowledge of the requirement of a particular cofactor (PLP) for the enzymatic activity has been key to the discovery of the CL gene since it allowed restriction in the search to a limited number of proteins expected to use that cofactor. The experimental characterization of the CL candidate provided clear-cut evidence for its functional assignment. The identification of the cysteic acid decarboxylase (CAD) as the enzyme responsible for the conversion of the CL reaction product into taurine has been a surprising outcome of the experimental validation process. No such enzyme has been previously described in the literature. The Gallus ortholog of mammalian cysteine sulfinate decarboxylase (CSAD) was tested in view of the reported ability of CSAD to decarboxylate CA as a secondary reaction. Unexpectedly, this protein revealed a strong preference for CA. The evolutionary shift of CSAD towards a preference for the CL reaction product (see Fig. 1k) gives independent support to the physiological relevance of the CL reaction and pathway for taurine biosynthesis. It further suggests that the CDO pathway (see Fig. 3) is less relevant for taurine production in sauropsids despite maintenance of a CDO gene in their genomes.
While physiological roles of taurine in specific organs such as retina, brain, muscles, and kidneys have been investigated only in mammals[19], there is evidence of the importance of taurine in the formation of bile in sauropsids[38-40]. Only taurine conjugates are found in chicken bile, owing to the inability of the Gallus bile acid-CoA:amino acid N-acyltransferase (BAAT) enzyme to use glycine in substitution of taurine[41]. 
During embryo development, bile aids utilization of lipids, the major source of energy of the egg yolk and a source of metabolic water[42]. Fats are absorbed through the yolk sac membrane by specialized endodermal cells containing bile and lipases[43]. Expression of the CL gene in the extraembryonic endoderm during early development (see Fig. 2a–d) suggests that formation of bile for fat digestion is a role of the taurine produced by this pathway. However, the more pleiotropic expression of the CSAD gene, especially during later development (see Fig. 2i–l), is consistent with a broader physiological role of taurine in different organs, such as regulation of cellular osmolarity[44] and cytoprotection[45], as observed in mammals.
Given that taurine biosynthesis is required in the reptilian egg, a question is why sauropsids evolved a different pathway. A possible advantage of the novel biosynthetic route is that it does not require molecular oxygen. Although gas exchange is ensured by microscopic pores in the eggshell, oxygen is limiting for the growth of chick embryos and critical for embryo survival in reptile species that nest underground[46,47]. The saving of reduced sulfur can be an additional advantage of the CL pathway, as oxidized sulfur (SO₃²⁻) is used instead of cysteine oxidation to obtain sulfonated amino acids. In the chicken egg, sulfate (SO₄²⁻) is also eventually incorporated into taurine by enzymatic conversion to sulfite[15]. Sulfite and sulfate are products of spontaneous oxidation of the sulfur present in the cell. Conversely, animals are unable to reduce oxidized sulfur to sulfide for its incorporation into amino acids. Therefore, in oviparous amniotes the reduced sulfur needed for embryo development until hatching must be stored in the egg.
This sulfur content is evident in features observed in everyday use of unfertilized eggs: release of H2S during heating contributes to the distinct egg flavour and is responsible for the formation of FeS precipitates that turn the yolk surface of hard-boiled eggs green[48]. The CL pathway can provide a more efficient way to synthesize taurine by finding a use for oxidized sulfur that otherwise would be a waste product, likely toxic[49], of cellular metabolism. In addition, functional and phylogenetic links between CBS and CL support the existence of a reduced sulfur cycle in the reptilian egg allowing the reuse of H2S in amino acids (see Fig. 3). Origin and maintenance of this pathway in the sauropsidian lineage provide evidence that the need to complete development out of water in a self-contained life-supporting structure imposes a selective pressure on embryo metabolism for efficient use of resources.
Methods
In silico subtraction of PLPomes
Gallus gallus and Homo sapiens PLPomes were compared side-by-side using the “whole genome analysis” tool of B6db (http://bioinformatics.unipr.it/B6db) with the option “exclude isoforms”. To facilitate identification of common and unique genes, the results were visually inspected with the help of the “highlight BRH” option on the results page, which colors entries that are Best Reciprocal Hits (BRH) in the two species. The comparison was extended to other sauropsids (e.g. Anolis carolinensis) and mammals (e.g. Mus musculus) to infer the conservation of unique genes in their respective taxonomic classes.
To determine active site conservation of the identified Gallus genes, substrate-binding cavities of human CBS (PDB: 4L3V) and CSAD (PDB: 2JIS) were determined through cavity computation by CAVER Analyst 2.0 BETA with Large Probe and Probe radii of 3.00 and 2.50 Å, respectively, in CBS, and 4.00 and 2.80 Å in CSAD. Residues corresponding to the cavity were highlighted in multiple alignments (see Extended Data Fig. 2A, 5A, and 8B) using ESPript ver. 3.0 (http://espript.ibcp.fr).
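The logic of the PLPome subtraction described above (keep the genes of one species that have no Best Reciprocal Hit in the other) can be sketched in a few lines of Python. This is only an illustration of the BRH criterion, not the B6db implementation: the function names, gene labels, and alignment scores below are all invented for the example.

```python
# Minimal sketch of a best-reciprocal-hit (BRH) subtraction.
# All identifiers and scores are hypothetical; B6db's own code is not shown in the paper.

def best_hits(scores):
    """Map each query to its highest-scoring subject."""
    return {query: max(subjects, key=subjects.get)
            for query, subjects in scores.items() if subjects}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}

def unique_genes(genes, brh_pairs, side=0):
    """Genes of one species (side 0 or 1 of each pair) lacking a BRH partner."""
    paired = {pair[side] for pair in brh_pairs}
    return sorted(set(genes) - paired)

# Toy similarity scores (higher is better), chicken queries against human and back.
chicken_vs_human = {
    "Gg_CBS": {"Hs_CBS": 900.0},
    "Gg_CL": {"Hs_CBS": 600.0},
    "Gg_CSAD": {"Hs_CSAD": 800.0, "Hs_GAD1": 400.0},
}
human_vs_chicken = {
    "Hs_CBS": {"Gg_CBS": 900.0, "Gg_CL": 600.0},
    "Hs_CSAD": {"Gg_CSAD": 800.0},
    "Hs_GAD1": {"Gg_CSAD": 400.0},
}

brh = reciprocal_best_hits(chicken_vs_human, human_vs_chicken)
chicken_only = unique_genes(chicken_vs_human, brh, side=0)
```

In this toy input the CL-like entry hits human CBS, but human CBS hits chicken CBS back, so the CL-like entry is left without a reciprocal partner and is reported as unique to chicken.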
Molecular phylogeny
Protein sequences were downloaded from NCBI and aligned with ClustalW 2.1. To obtain coding sequence (CDS) alignments, CDS were extracted from the corresponding mRNA sequences using ORFfinder 0.4.3 with the options “-s 0 -ml 1000 -strand plus -outfmt 1”, and aligned based on the amino acid alignment with MACSE v2.03. Phylogenetic trees were constructed with RAxML v. 7.7.8 using the general time reversible (GTR) amino acid substitution matrix with optimization of substitution rates and the GAMMA model of rate heterogeneity. The partitioning of codon positions was specified in a partition file to generate separate alignment sets for the first and second codon positions and for the third codon position using the option ‘-f s’. Maximum-likelihood reconstruction of ancestral character states including insertions/deletions[51] was obtained with the FastML web server (http://fastml.tau.ac.il/) based on extant CBS and CL sequences and a phylogenetic tree assuming CBS duplication in the sauropsidian ancestor.
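The codon-position partitioning mentioned above was done by the authors with a RAxML partition file. As a rough illustration of what such a split produces, the following Python sketch separates an in-frame CDS alignment into a first-plus-second position set and a third position set. The function name and the toy sequences are assumptions made for the example, and the code is not a substitute for the RAxML workflow.

```python
# Minimal sketch: split an in-frame codon alignment by codon position.
# Toy data only; the paper's alignments were partitioned with RAxML, not with this code.

def split_codon_positions(alignment):
    """Return (positions 1+2, position 3) sub-alignments as dicts name -> sequence."""
    first_second, third = {}, {}
    for name, seq in alignment.items():
        if len(seq) % 3 != 0:
            raise ValueError(f"{name}: length {len(seq)} is not a multiple of 3")
        # Keep columns whose 0-based index is 0 or 1 within each codon.
        first_second[name] = "".join(c for i, c in enumerate(seq) if i % 3 != 2)
        # Keep every third column, starting from 0-based index 2.
        third[name] = seq[2::3]
    return first_second, third

toy_alignment = {
    "seqA": "ATGGCTAAA",  # Met-Ala-Lys
    "seqB": "ATGGCCAAG",  # Met-Ala-Lys, synonymous changes at third positions
}
pos12, pos3 = split_codon_positions(toy_alignment)
```

In the toy alignment the two sequences are identical at first and second positions and differ only at third positions, which is the kind of signal that a separate third-position tree (as in the phylogeny figure) is meant to isolate.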
Embryo Collection and In Situ Hybridization
Fertile chicken eggs (Hy-Line, Iowa) were incubated in a humidified incubator at 37.5 °C for 0.5 to 5 days. Embryos were collected into chilled chick saline (123 mM NaCl), removed from the vitelline membrane and cleaned of yolk. Extra-embryonic membranes and large body cavities (brain vesicles, atria, allantois, eye) were opened to minimize trapping of the in situ reagents. Embryos were fixed overnight at 4 °C in freshly prepared 4% paraformaldehyde in PBS, washed twice briefly in PBS plus 0.1% Triton X-100, then dehydrated through a graded MeOH series and stored at −20 °C overnight in 100% MeOH. cDNA templates for generating all antisense RNA probes were obtained by reverse transcriptase-polymerase chain reaction using pooled RNA from embryos between HH stages 4 and 30. Primer sequences were designed using the mRNA sequence in the NCBI database. Embryo processing, antisense RNA probe preparation and whole-mount ISHs were performed as described[52]. Experiments with fertilized eggs were conducted in accordance with federal agency guidelines. A detailed protocol is available for download at http://geisha.arizona.edu.
Plasmid Construction
For construction of GgCL expression plasmids, the LOC418544 (NCBI GeneID: 418544) CDS sequence (XM_015300896) inserted into pcDNA3.1+/C-(K)DYK vector was purchased from GenScript (USA Inc.). The sequence was then amplified using CBSL_Fw, CBSL_Rev primers (native GgCL) or CBSL_Fw, CBSL_short_Rev (truncated GgCL) by PCR, using Phusion DNA polymerase, and inserted into pET-28b expression vector at NdeI/XhoI sites, generating respectively pET-28b-nativeGgCL and pET-28b-truncatedGgCL. Details of the designed primers are reported in supplementary Table 1. A first transformation of the constructs into E. coli XL1Blue strain by electroporation was performed for plasmid amplification. Plasmids were extracted by alkaline lysis and transformed into E. coli BL21 Codon Plus strain by electroporation. For construction of GgCAD expression plasmids, GgCAD wild-type sequence (NCBI GeneID: 426184) and mutated sequences (Q467V, T470A, Q467V-T470A) inserted into pET-28b expression vector were purchased from GenScript (USA Inc.), generating respectively pET-28b-GgCAD, pET-28b-GgCAD_Q467V, pET-28b-GgCAD_T470A, pET-28b-GgCAD_Q467V-T470A. The constructs were transformed directly into E. coli BL21 Codon Plus by electroporation. The authenticity of all constructs was verified by sequence analysis.
Protein expression and purification
Protein expression was performed by inoculating a single colony of each clone into 1 L of autoinducing LB broth, obtained by adding 0.5 g/L glucose and 2 g/L lactose to standard LB medium. Cells were grown at 30 °C for 16 h (GgCL, GgCBS), or at 20 °C for 16 h after a pre-induction phase at 30 °C for 8 h (GgCAD). Cell pellets were resuspended in 50 mL of Lysis Buffer (NaH2PO4 20 mM pH 7.0, NaCl 100 mM, 20 μM PLP, 50 mM imidazole), sonicated (alternating 1 s on/off at 30 W for 30 min) and centrifuged (14000 rpm for 40 min). The supernatant was loaded onto a 50 mL Superloop of an ÄKTA pure FPLC system and purified by Affinity Chromatography (AC) using a HisTrap 5 mL FF column. Proteins were eluted with AC Elution Buffer (NaH2PO4 20 mM pH 7.0, NaCl 100 mM, 20 μM PLP, 500 mM imidazole). GgCL fractions were collected and diluted in 50 mL of Loading Buffer (MES 20 mM pH 6.5, 20 μM PLP) for Cation Exchange Chromatography (CIEX) using a HiTrap SP FF column, and eluted in a 35 mL gradient of CIEX Elution Buffer (MES 20 mM pH 6.5, 1 M NaCl, 20 μM PLP). Protein fractions (see Extended Data Fig. 2b) were collected and concentrated by Vivaspin™ centrifugation for further purification steps. Size Exclusion Chromatography (SEC) was performed with a Superdex 200 column using SEC Buffer (NaH2PO4 20 mM pH 7.0, NaCl 100 mM), which was also used as Storage Buffer. GgCAD fractions after AC were collected and diluted in 50 mL of Loading Buffer (NaH2PO4 20 mM pH 8.0, 20 μM PLP) for Anion Exchange Chromatography (AIEX) using a HiTrap Q FF column, and eluted in a 35 mL gradient of AIEX Elution Buffer (NaH2PO4 20 mM pH 8.0, 1 M NaCl, 20 μM PLP).
UV-Visible and fluorescence spectroscopy
A JASCO spectrophotometer was used to measure absorbance spectra of purified enzymes and to determine the kinetic parameters of GgCL. Eluted enzyme fractions were measured for protein quantification. For the quantification of GgCL, absorbance at 428 nm (Soret peak) was measured, using an extinction coefficient of 84900 M−1 cm−1 previously determined for HsCBS[53]. For the quantification of the different CSAD/CAD proteins, absorbance at 280 nm was measured, using molar extinction coefficients computed with ProtParam (57410 M−1 cm−1 for GgCAD, 61880 M−1 cm−1 for HsCSAD, and 72880 M−1 cm−1 for DrCSAD).
H2S release due to the CL reaction was followed spectrophotometrically at 390 nm as formation of PbS, using the previously calculated[54] extinction coefficient of 5500 M−1 cm−1. Reaction mixtures were prepared in a 1 mL plastic cuvette with 50 mM NaH2PO4 pH 7.0, different concentrations of cysteine and sulfite, and 0.4 mM lead acetate; the reaction was started by addition of 1 μM GgCL. Velocities at the different concentrations of cysteine and sulfite were taken at the maximum rate reached by the enzyme, which did not correspond to the initial velocity of the time course because of an appreciable lag at the start (see Extended Data Fig. 2g). Data were fitted to the Michaelis-Menten equation using SigmaPlot 14.0.
The presence of PLP bound to the protein was assessed by fluorescence spectroscopy[55]. Fluorescence measurements were performed on a FluoroMax-3 spectrofluorometer (HORIBA Jobin Yvon, Kyoto, Japan) equipped with a thermostatic bath set at 20 °C. GgCL concentration was 40 μM in 20 mM NaH2PO4 pH 7.0, 100 mM NaCl; GgCAD concentration was 20 μM in 20 mM NaH2PO4 pH 8.0, 100 mM NaCl. Excitation and emission slit widths were set to 7 nm and the integration time to 0.6 seconds. The GgCL emission spectrum was recorded between 425 nm and 600 nm using an excitation wavelength of 412 nm. The GgCAD emission spectrum was recorded between 355 nm and 500 nm using an excitation wavelength of 340 nm. The spectra were corrected for buffer contribution.
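The arithmetic behind the spectrophotometric quantification and the PbS assay described in this section can be illustrated with a short Python sketch. It applies the Beer-Lambert law with the extinction coefficients quoted in the text, takes the steepest segment of a time course as the velocity (mirroring the use of the maximum rate rather than the initial rate), and fits the Michaelis-Menten equation by least squares. The authors used SigmaPlot for the fit; the golden-section search here is a swapped-in stand-in, a 1 cm path length is assumed, and the function names and demonstration data are hypothetical.

```python
import math

def beer_lambert_conc(absorbance, epsilon, path_cm=1.0):
    """Concentration (M) from absorbance; epsilon in M^-1 cm^-1, path length assumed 1 cm."""
    return absorbance / (epsilon * path_cm)

def max_slope(times, absorbances):
    """Steepest absorbance change per unit time between consecutive readings."""
    return max((a2 - a1) / (t2 - t1)
               for t1, t2, a1, a2 in zip(times, times[1:], absorbances, absorbances[1:]))

def mm_rate(s, vmax, km):
    """Michaelis-Menten velocity at substrate concentration s."""
    return vmax * s / (km + s)

def fit_michaelis_menten(s_values, v_values, km_lo=1e-3, km_hi=1e3, iters=100):
    """Least-squares fit: Vmax solved in closed form for each Km, Km by golden-section search."""
    def best_vmax(km):
        x = [s / (km + s) for s in s_values]
        return sum(v * xi for v, xi in zip(v_values, x)) / sum(xi * xi for xi in x)

    def sse(log_km):
        km = math.exp(log_km)
        vmax = best_vmax(km)
        return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(s_values, v_values))

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = math.log(km_lo), math.log(km_hi)
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = math.exp(0.5 * (a + b))
    return best_vmax(km), km

# GgCL quantified at the Soret peak (epsilon 84900 M^-1 cm^-1, from the text).
ggcl_molar = beer_lambert_conc(0.849, 84900)

# PbS formation rate: absorbance slope at 390 nm divided by 5500 M^-1 cm^-1.
slope = max_slope([0, 1, 2, 3], [0.0, 0.01, 0.065, 0.10])  # hypothetical readings, per min
h2s_rate_molar_per_min = beer_lambert_conc(slope, 5500)

# Noise-free synthetic velocities (hypothetical Vmax = 2.0, Km = 5.0, arbitrary units).
s_demo = [1.0, 2.0, 5.0, 10.0, 20.0, 40.0]
v_demo = [mm_rate(s, 2.0, 5.0) for s in s_demo]
vmax_fit, km_fit = fit_michaelis_menten(s_demo, v_demo)  # should land close to 2.0 and 5.0
```

Solving Vmax in closed form for each trial Km reduces the fit to a one-dimensional search, which keeps the sketch dependency-free; a general nonlinear least-squares routine would be the more usual choice for real data with noise.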
NMR Spectroscopy
1H NMR spectra were acquired with a JEOL ECZ600R spectrometer in no spinning mode. Samples were dissolved in 600 μL of H2O:D2O (9:1) and loaded in Wilmad ECONOMY NMR tubes. For single spectra measurements (i.e. substrate spectra at t0) we used a simple DANTE presat sequence for H2O suppression. To monitor the reaction kinetics, we used a kinetic array of the DANTE presat sequence with 600 min periods for 1h, 2h, 3h depending on reaction speed. NMR experiments were performed with 50 mM NaH2PO4 pH 7.0 to avoid signals of organic buffers in 1H NMR spectra. The time-courses of the CAD reaction at a given initial substrate concentration were fitted with the integrated Michaelis-Menten equation[50] using the R software. The R script containing the equation and commands (SM.tar.gz) used to produce the fitting shown in Fig. 1j is provided in the dataset deposited at the Harvard Dataverse repository (https://doi.org/10.7910/DVN/UYAUBO). Specific activities of CSAD/CAD enzymes and mutants were determined by quantifying the reaction products obtained with CSA and CA substrates after 5 minutes of reaction, stopped with 1 M HCl. All the spectra and kinetics were collected at 25 °C.
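For readers unfamiliar with the integrated Michaelis-Menten equation used for the NMR time-courses, its standard implicit form is Vmax·t = ([S]0 − [S]) + Km·ln([S]0/[S]). The Python sketch below computes the substrate remaining at time t by solving this relation with bisection. It is an illustration only: the authors' fit was done in R with their deposited script, which may use a different formulation, and the kinetic parameters in the example are invented.

```python
import math

def substrate_remaining(t, s0, vmax, km, iterations=200):
    """Solve vmax*t = (s0 - s) + km*ln(s0/s) for s by bisection on (0, s0]."""
    if t < 0 or s0 <= 0 or vmax <= 0 or km <= 0:
        raise ValueError("t must be >= 0 and s0, vmax, km must be > 0")

    def residual(s):
        # Positive when the trial s is below the true remaining substrate.
        return (s0 - s) + km * math.log(s0 / s) - vmax * t

    lo, hi = 0.0, s0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid == 0.0:  # guard against log of a zero argument at extreme times
            break
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters: 5 mM substrate, Vmax 0.1 mM/min, Km 1 mM; times in minutes.
time_points = (0.0, 15.0, 30.0, 60.0)
time_course = [substrate_remaining(t, 5.0, 0.1, 1.0) for t in time_points]
```

Fitting a measured time-course then amounts to adjusting Vmax and Km until the predicted substrate values match the integrated NMR peak areas, which is why a single progress curve at one initial concentration can constrain both parameters.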
In silico subtraction of chicken and human PLPomes.
Comparison of the complete set of PLP-dependent enzymes (one isoform per gene) in Gallus gallus and Homo sapiens as classified by B6db. Orthologous proteins (BRH test) are colored blue. Gallus proteins without human orthologs are in bold. E-values indicate significance of the protein alignments with family-level Hidden Markov Models.
GgCL is a heme and PLP protein with cysteine lyase activity.
a, Multiple alignment of H. sapiens CBS (HsCBS) with G. gallus CBS and CL proteins (GgCBS, GgCL). Filled circles indicate residues that recognize heme (red), PLP (yellow) and serine (white) in the holo CBS structure (PDB code 3PC4). Conserved residues based on the alignment of 8 CL and 22 CBS sequences from vertebrates are shaded in black. Green shading indicates conserved differences between CBS and CL groups. b, Photograph of the FPLC collector after cation exchange, showing the vivid orange color of GgCL protein fractions (upper panel); selected fractions were subjected to SDS-PAGE and stained with Coomassie Brilliant Blue (lower panel). c, Gel filtration chromatogram (Superdex 200) with dual wavelength detection (λ = 280, 428 nm), showing a molecular weight corresponding to GgCL monomer. d, GgCL predicted interactions with heme (left) and PLP (right) are shown with residues conserved in the alignment of CBS/CL proteins highlighted in colors. e, Absorbance spectrum of recombinant GgCL (16.5 μM) in NaH2PO4 (20 mM), pH 7.0. f, Fluorescence emission spectrum (excitation: 412 nm) of recombinant GgCL (22 μM) in NaH2PO4, pH 7.0. g, Kinetics of H2S release by the CL reaction monitored spectrophotometrically at 390 nm in 50 mM NaH2PO4, pH 7.0 with GgCL (1 μM), lead acetate (0.4 mM), cysteine (5 mM) in the absence (dashed line) or in the presence of Na2SO3 (5 mM, solid line). h-i, Nonlinear fitting to the Michaelis-Menten equation of the dependency on substrate concentrations of the initial reaction velocity of GgCL (1 μM) with fixed (h) Na2SO3 (5 mM) and (i) cysteine (40 mM). Data are means ± SD of three independent experiments. j, Time-resolved 1H NMR spectra of cysteine (10 mM) in the presence of GgCL (1 μM), showing partial conversion into lanthionine.
Absence of CBS activity in GgCL.
a, Time-resolved 1H NMR spectra of 5 mM of serine (atoms labeled in blue) and 5 mM of DL-homocysteine (atoms labeled in red) in the presence of GgCL (1 μM). b, Time-resolved 1H NMR spectra of serine (5 mM) and Na2S (5 mM) in the presence of GgCL (1 μM). c, Time-resolved 1H NMR spectra of 5 mM of serine (atoms labeled in blue) and 10 mM of DL-homocysteine (atoms labeled in red) in the presence of GgCBS (4 μM), showing complete consumption of serine and partial conversion of DL-homocysteine into cystathionine (atoms labeled in green) due to the stereospecific enzymatic reaction. d, Hydrogen-deuterium exchange of the serine alpha proton catalysed by GgCL (1 μM) in 95% D2O. Spectra were superimposed at time 0’ (red), 60’ (green), 260’ (black). e, 1H peak integration of the serine Cα proton plotted in the interval 0’−260’.
Gallus CSAD encodes a PLP-dependent cysteic acid decarboxylase (CAD).
a, Absorbance spectrum of GgCAD in 20 mM NaH2PO4, pH 8.0 and 100 mM NaCl; the absorbance region of PLP tautomers (enolimine 340 nm, ketoenamine 415 nm) is shown in the inset. b, Fluorescence emission spectrum of PLP enolimine tautomer upon excitation at 340 nm. c, Time-resolved 1H NMR spectra of cysteine sulfinic acid (5 mM) in the presence of GgCAD (1 μM), showing partial formation of hypotaurine (inset). d, Time-resolved 1H NMR spectra of 5 mM of cysteic acid (atoms labeled in blue) and 5 mM of hypotaurine (atoms labeled in red) in the presence of GgCAD (1 μM), showing slight inhibition of CAD activity. e, Time-resolved 1H NMR spectra of 5 mM of cysteic acid (atoms labeled in blue) and 5 mM of cysteine sulfinic acid (atoms labeled in red) in the presence of GgCAD (1 μM), showing strong inhibition of CAD activity. f, 1H peak integration of CA signals in the presence of GgCAD and hypotaurine (CA + Hyp) or cysteine sulfinic acid (CA + CSA).
Analysis of Gallus CSAD site-directed mutants.
a, Multiple alignment of CSAD orthologs from (1) non-sauropsids and (2) sauropsids. Conserved differences between groups are shaded in green. Residues that recognize PLP (yellow) or line the active site cavity (white) or are within 5 Å from the active site cavity (blue) in the human holo CSAD structure (PDB code 2JIS) are indicated by filled circles; positions of site-directed mutants are indicated by red arrows. b, Specific activities of wild-type (WT), single (Q467V, T470A), and double (Q467V/T470A) GgCAD mutants with CA and CSA substrates. c, 1H NMR spectra showing decarboxylation activity of wild-type (WT), single (Q467V, T470A) and double (Q467V-T470A) mutants in the presence of cysteic acid (right) and cysteine sulfinic acid (left) after 5’ of reaction stopped with 1M HCl.
One-pot enzymatic synthesis of taurine from cysteine.
a, Time-resolved 1H NMR spectra of cysteine (5 mM) and sulfite (7 mM) in the presence of recombinant GgCL (1 μM) and GgCAD (1 μM) proteins. b-d, 1H peak integration of (b) cysteine, (c) taurine, and (d) serine NMR signals in the same reaction conditions as (a) in the absence (black dots) or in the presence (blue dots) of serine (5 mM) and GgCBS (4 μM).
Phylogeny of CBS and CL proteins in vertebrates.
Unrooted maximum-likelihood (ML) trees obtained from protein and nucleotide alignments of 35 CBS and CL sequences from 26 vertebrate species. Protein and nucleotide accession numbers corresponding to tree tip names are indicated; sauropsidian sequences are shaded in blue. Scale bars represent the number of calculated substitutions per site. a, Protein ML tree (436 alignment patterns) showing branching of the CL clade basal to Teleostei. b, Nucleotide ML tree (1277 alignment patterns) showing branching of the CL clade basal to amniotes. c, Third codon position ML tree (613 alignment patterns) showing branching of the CL clade within Sauropsida.
Ancestral substitutions in CL neofunctionalization.
a, Evolutionary dendrogram used in ancestral state reconstructions assuming split of amniote last common ancestor (Amniote; N2) into two lineages before the gene duplication (GD; N12) leading to sauropsidian CL (sCL; N13) and CBS (sCBS; N21). Sequence identifiers are as in Figure S7. b, Multiple alignment of reconstructed ancestral sequences corresponding to nodes N2, N12, N13, and N21. Active site residues are indicated by blue triangles. Positions with identical residues in the four nodes and human CBS are shaded gray. Numeration is in accordance with the human CBS sequence. c, Character state probabilities for active site residues substituted in GgCL showing high probability of fixation before the split of extant sauropsids.