
Molecular evolutionary and structural analysis of the cytosolic DNA sensor cGAS and STING.

Xiaomei Wu1, Fei-Hua Wu2, Xiaoqiang Wang3, Lilin Wang4, James N Siedow5, Weiguo Zhang6, Zhen-Ming Pei7.   

Abstract

Cyclic GMP-AMP (cGAMP) synthase (cGAS) was recently identified as a cytosolic DNA sensor that generates a non-canonical cGAMP containing G(2',5')pA and A(3',5')pG phosphodiester linkages. cGAMP activates STING, which triggers innate immune responses in mammals. However, the evolutionary functions and origins of cGAS and STING remain largely elusive. Here, we carried out comprehensive evolutionary analyses of the cGAS-STING pathway. Phylogenetic analysis of the cGAS and STING families showed that their origins could be traced back to the choanoflagellate Monosiga brevicollis. Modern cGAS and STING may have acquired structural features, including the zinc-ribbon domain and critical amino acid residues for DNA binding in cGAS as well as the carboxy-terminal tail domain for transducing signals in STING, only recently in vertebrates. In invertebrates, cGAS homologs may not act as DNA sensors. Both proteins cooperate extensively, have similar evolutionary characteristics, and thus may have co-evolved during metazoan evolution. cGAS homologs and a prokaryotic dinucleotide cyclase for canonical cGAMP share conserved secondary structures and catalytic residues. Therefore, non-mammalian cGAS may function as a nucleotidyltransferase and could produce cGAMP and other cyclic dinucleotides. Taken together, assembling signaling components of the cGAS-STING pathway onto the eukaryotic evolutionary map illuminates the functions and origins of this innate immune pathway.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.


Year:  2014        PMID: 24981511      PMCID: PMC4117786          DOI: 10.1093/nar/gku569

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Innate immune sensing of microbial infections represents a crucial element for host defense. Utilizing germ-line encoded pattern recognition receptors (PRRs), innate immunity examines extracellular, endosomal and cytosolic compartments for signs of infection and triggers type I interferon (IFN) induction and other proinflammatory cytokines when the pathway is activated. DNA has been known to stimulate immune responses for more than a century, long before it was shown to be the genetic material. Cytosolic DNA of pathogenic bacterial or viral origin, or leaking from the nucleus or mitochondria following cell stress can be sensed by eukaryotic cells as a danger signal or a sign of foreign invasion (1). The accumulation of self-DNA can also produce severe autoimmune diseases, such as systemic lupus erythematosus. Over the past several years, many PRRs for cytosolic DNA have been identified, including DNA-dependent activator of IFN-regulatory factors (DAI) (2), RNA polymerase III (3,4), DEAD box polypeptide 41 (DDX41) (5), absent in melanoma 2 (AIM2) (6–8) and IFN-inducible protein 16 (IFI16) (9). The presence of these multiple PRRs may reflect their functioning in a distinct cell-type- or DNA-sequence-specific manner (10), and no consensus has emerged until recently. Detection of cytosolic DNA activates a stimulator of IFN genes (STING, also known as MITA, MPYS, ERIS or TMEM173), an endoplasmic reticulum (ER) translocon-associated transmembrane protein (11–14). STING in turn initiates a cascade of known events by first recruiting and activating the cytosolic kinases, IκB kinase (IKK) and TANK-binding kinase 1 (TBK1), which phosphorylate and activate the transcription factors nuclear factor κB (NF-κB) and IFN regulatory factor 3 (IRF3), respectively. NF-κB and IRF3 then enter the nucleus and function together to induce IFNs and other cytokines, and thereby trigger the host immune response (1). 
STING is a central player in the innate immune response to cytosolic nucleic acids (15). STING could also act as a direct PRR for cyclic dinucleotides, such as cyclic diguanylate monophosphate (c-di-GMP) and cyclic diadenylate monophosphate (c-di-AMP), which are conserved signaling molecules produced by bacteria that regulate bacterial motility and biofilm formation (16). Recently, cyclic GMP-AMP (cGAMP) synthase (cGAS, also known as C6orf150 and MAB21D1) was reported in Homo sapiens (human) and Mus musculus (mouse) as a general (with broad specificity) cytosolic DNA sensor for activating type I IFN signaling pathway (17,18). cGAS binds DNA and catalyzes the synthesis of cGAMP from adenosine triphosphate (ATP) and guanosine triphosphate (GTP) in the presence of DNA. cGAMP, an endogenous second messenger structurally similar to c-di-GMP and c-di-AMP, binds and activates STING in the cytoplasm, illuminating how STING can stimulate type I IFN pathway in response to both cytosolic DNA and cyclic dinucleotides (18). Furthermore, cGAMP binds STING with an affinity of ∼10 nM, which is significantly stronger than that of c-di-GMP, and induces a ‘closed’ conformation of STING that is important to activate the downstream of type I IFN (19–21). The details of this cGAS-STING pathway were uncovered by a series of structural, biophysical and biochemical studies (19,21–25). Crystal structures of the nucleotidyltransferase domain of cGAS (23,26–27) established how cGAS functions as a DNA-sensing enzyme in a sequence-independent manner. cGAS interacts with the sugar-phosphate backbone along the minor groove of DNA by employing a positively charged surface as well as a zinc-ribbon domain insertion. A cGAS-DNA complex, harboring one cGAS molecule and one DNA molecule, was focused on in these studies. In contrast, a 2:2 complex that contains dimeric cGAS bound to two DNA molecules, was found most recently (24,25). 
Both of the two DNA binding surfaces and the dimer interface are critical to DNA binding. More interestingly, the endogenous cGAMP generated by cGAS contains a phosphodiester linkage between the 2′-OH of GMP and the 5′-phosphate of AMP and another between the 3′-OH of AMP and the 5′-phosphate of GMP. This specific isomer of cGAMP with 2′-5′, 3′-5′ linkages is termed 2′3′-cGAMP, and is distinguished from conventional cGAMP (with 3′-5′, 3′-5′ linkages and termed 3′3′-cGAMP) and other cyclic dinucleotides (such as c-di-AMP and c-di-GMP) of microbial origin. Both mouse (R231, with an Arg residue at site 231) and human (R232) STING proteins can be stimulated by 2′3′-cGAMP, 3′3′-cGAMP and bacterial 3′3′-c-di-GMP (20,21), but human STING with the H232 allele is only responsive to 2′3′-cGAMP (20–22). Moreover, the Chen lab has provided genetic evidence that cGAS as well as STING are essential for the induction of type I IFN stimulated by foreign DNA (28). The cGAS-knockout mouse is strikingly similar to goldenticket mouse (loss of function of STING), both of which are susceptible to infection by DNA viruses (11,28). cGAS homologs are present in several vertebrate species and have structures similar to human or murine cGAS (17,26). In fish and pig, STING proteins also act as a mediator for activating different IFN genes (29–31). These results indicate that cGAS-STING signaling may be the major and non-redundant method of DNA recognition in the innate immune system in mammals and even vertebrates. Given the importance of cGAS-STING signaling in mammals, it is necessary to explore the evolutionary origins of this pathway. In this study, we aimed to present a comprehensive molecular evolutionary analysis of cGAS and STING proteins by applying a systematic homolog search on all eukaryotic genomes that have been fully sequenced. We identified the origins of mouse cGAS and STING in the choanoflagellate Monosiga brevicollis, the closest relative of metazoans. 
During metazoan evolution, both cGAS and STING were lost in nematodes and flatworms. We also utilized several methods to gain novel evolutionary insights into cGAS and STING structure–function relationships: (i) examining the evolutionary pattern of domain organization of cGAS and STING; (ii) mapping reported functionally critical residues on cGAS and STING through multiple sequence alignments across representative species and (iii) structural modeling of STING proteins from species other than human and obtaining a structural basis for understanding their binding with ligands. Furthermore, we presented the evolutionary distribution of the cGAS-STING signaling pathway in representative eukaryotic organisms, allowing us to explore the evolutionary history of this pathway.

MATERIALS AND METHODS

Eukaryotic species

A list of 190 fully sequenced eukaryotic species was derived from the Database of KEGG Organisms (updated on May 20, 2013) (http://www.genome.jp/kegg/catalog/org_list.html) (32). Chromosome, protein and mRNA sequences of these eukaryotic species were downloaded from the National Center for Biotechnology Information (NCBI) (release 59) (ftp://ftp.ncbi.nih.gov/refseq/release/). The genome of the ctenophore Mnemiopsis leidyi, a lineage that arose early in metazoan evolution, was fully sequenced recently (33). The genome data of M. leidyi were downloaded from the Mnemiopsis Genome Project Portal (http://research.nhgri.nih.gov/mnemiopsis/). The genome data of the African clawed frog Xenopus laevis (JGI v6.0) were derived from Xenbase (ftp://ftp.xenbase.org/pub/Genomics/JGI/Xenla6.0/).

Identification of cGAS and STING homologs

Two rounds of searches on the protein and genomic sequences were carried out to detect putative cGAS and STING homologs in fully sequenced eukaryotic species. The illustration of procedures for the cGAS homolog searches is displayed in Supplementary Figure S1A. (i) In the first round of search based on the protein sequences, cGAS homologs were identified using a PSI-BLAST (34) search followed by reverse BLASTP. Mouse cGAS proteins were initially searched against the eukaryotic proteome data set via PSI-BLAST v2.2.26. We set the E-value threshold (-e) as 0.001, the E-value threshold for inclusion in the multipass model (-h) as 0.002 (default value), and the maximum running iterations (-j) as 5. Then, the hits satisfying the thresholds were reversely aligned against the mouse proteome via BLASTP. The putative homologs were identified if the reverse BLASTP best hit was mouse cGAS. (ii) Considering that some cGAS homologs could not be found due to errors in genome annotation or the presence of some genes that have not been annotated, we carried out the second round of search that is based on the genomic sequences. For the species without cGAS homologs detected in the first round, all the candidate cGAS homologs were aligned against the assembled genomic sequences using TBLASTN (E-value ≤ 0.001 and coverage on query sequence ≥45%). The identified gene region in a target genome was extended by 1 kb on both sides, and then input to Fgenesh+ (HMM plus similar protein-based gene prediction) (35) to predict the gene structure and protein sequence. (iii) Finally, 52 cGAS homologs in 45 eukaryotic species were combined from the two rounds of searches that are based on protein and genomic sequences. A total of 48 putative STING homologs from 45 species were obtained in the same way (Supplementary Figure S1B). Detailed information on cGAS and STING homologs are listed in Supplementary Table S1 and their sequences in FASTA format are available in Supplementary Data S1.
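The reverse-best-hit criterion and the TBLASTN thresholds described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes that BLAST results have already been parsed into simple records, and the sequence identifiers, scores and function names are hypothetical.

```python
# Hypothetical, pre-parsed BLAST hits: each record holds a 'subject' ID,
# an 'evalue' and a 'bitscore'. Illustration only, not the authors' code.

def best_hit(hits):
    """Return the subject ID with the lowest E-value (ties: highest bit score)."""
    return min(hits, key=lambda h: (h["evalue"], -h["bitscore"]))["subject"]


def reciprocal_candidates(forward_hits, reverse_hits, query_id, max_evalue=1e-3):
    """Keep forward hits whose best reverse BLASTP hit is the original query."""
    kept = []
    for hit in forward_hits:
        if hit["evalue"] > max_evalue:
            continue  # fails the E-value threshold of the forward search
        back = reverse_hits.get(hit["subject"], [])
        if back and best_hit(back) == query_id:
            kept.append(hit["subject"])
    return kept


def passes_tblastn(evalue, aligned_query_len, query_len,
                   max_evalue=1e-3, min_coverage=0.45):
    """Second-round filter: E-value <= 0.001 and query coverage >= 45%."""
    return evalue <= max_evalue and aligned_query_len / query_len >= min_coverage


# Toy data: protX maps back to mouse cGAS, protY maps back to another protein,
# protZ fails the forward E-value threshold.
forward = [
    {"subject": "protX", "evalue": 1e-20, "bitscore": 95.0},
    {"subject": "protY", "evalue": 1e-5, "bitscore": 48.0},
    {"subject": "protZ", "evalue": 0.5, "bitscore": 20.0},
]
reverse = {
    "protX": [{"subject": "mouse_cGAS", "evalue": 1e-18, "bitscore": 90.0},
              {"subject": "mouse_other", "evalue": 1e-4, "bitscore": 40.0}],
    "protY": [{"subject": "mouse_other", "evalue": 1e-9, "bitscore": 60.0},
              {"subject": "mouse_cGAS", "evalue": 1e-3, "bitscore": 35.0}],
}
candidates = reciprocal_candidates(forward, reverse, "mouse_cGAS")
```

In this toy example only the hit whose best reverse match is the query survives, which is the property the reverse BLASTP step is intended to enforce.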

Sequence analysis

The cGAS and STING homologs in platypus Ornithorhynchus anatinus are partial sequences. The cGAS homolog in western lowland gorilla lacks ∼44% of the C-terminal fragment compared with the human sequence. To check whether errors in sequence annotation account for the incompleteness of the three proteins, we obtained their genomic regions, extended them by 3 kb on both sides and input them into Fgenesh+. The predicted protein sequences are very similar to the original ones and are still incomplete. We therefore did not include the three partial proteins in the sequence alignments or in the construction of phylogenetic trees of the cGAS and STING gene families. STING is known as an ER membrane protein in mammalian cells, containing four N-terminal transmembrane domains (TMs). We used HMMTOP v2.0 (36) to identify the TMs of the putative homologs. HMMTOP results were then checked manually and compared with human TMs. Four STING homologs, in Ixodes scapularis, Tribolium castaneum, Apis mellifera and Branchiostoma floridae, lacked TMs at the N terminus. Considering that there may be errors in the sequence annotation of the four proteins, we obtained their genomic regions, extended them by 3 kb on both sides and then predicted gene structures and protein sequences using Fgenesh+. The predicted B. floridae protein sequence has two N-terminal TMs detected using HMMTOP and another program, Phobius (37). Thus, the information on the B. floridae STING homolog was updated in Supplementary Table S1 and Supplementary Data S1. The other three predicted sequences still lack TMs. BLASTP results show that at least 76% coverage of the three TM-lacking sequences in I. scapularis, T. castaneum and A. mellifera could be aligned with the mouse STING protein with E-value ≤ 7e-10 (Supplementary Table S2), which indicates that the three proteins are indeed similar in sequence to mouse STING. CTT domains of STING homologs were determined in two steps. 
Each STING homolog was first aligned with the human CTT region using BLASTP (E-value ≤ 0.01). Then, if the aligned hit on the STING homolog sequence had the two conserved residues, Ser 366 and Leu 374 in human STING which are important for IRF3 activation (38), the STING homolog was considered as containing a CTT domain. The multiple sequence alignment of the human CTT region on eukaryotic STING homologs was shown in Supplementary Figure S2. Amino acid sequences were aligned using PROMALS3D (39) and colored according to BLOSUM62 score in Jalview 2.8 (40). The secondary structure of human cGAS (residues from 164 to 513) was derived from the crystal structure (PDB ID: 4KM5). The secondary structure elements of all other cGAS homologs, Vibrio cholerae DncV and OAS1 proteins were identified using Jpred 3 (41).
Figure 2.

ML phylogenetic trees showing the relationships of cGAS (A) or STING (B) homologs. Three proteins with partial sequences, two cGAS homologs in platypus Ornithorhynchus anatinus and Gorilla gorilla and one STING homolog in O. anatinus, were not included in the construction of gene trees (see Materials and Methods). Bootstrap values equal to or larger than 75 are marked beside each node. Branch lengths indicate the number of amino acid substitutions per site. The leaf nodes are colored according to the color scheme in Figure 1. Each leaf node is depicted as a three-letter abbreviation for the species name, followed by cGAS or STING. The correspondence between the abbreviations and the standard species names can be found in Figure 1 and Supplementary Data S1. The sequence references, alignments and trees in this figure are in Supplementary Data S1, and accession numbers of proteins are noted in Supplementary Table S1.

Phylogenetic analysis

A species tree (Figure 1) was constructed to visualize the distribution of cGAS and STING homologs across 61 metazoan species and the choanoflagellate M. brevicollis. The plant Arabidopsis thaliana and the fungus S. cerevisiae were used as outgroups to root the tree. First, 27 highly conserved proteins (Supplementary Table S3) were selected from the published list (42,43). Each set of homologs, identified via best bi-directional BLASTP search, is present in A. thaliana, S. cerevisiae and at least 58 (95% in 62) metazoan and M. brevicollis proteomes. Second, separate multiple sequence alignments for each set of homologs were built using MUSCLE v3.8.31 (44), with the maximum number of iterations set to 100. PhyML v3.1 (45) was used to derive the maximum likelihood (ML) trees with 100 bootstrap replicates by applying the JTT matrix (parameters set as -d aa -b 100 -m JTT -f e -v e -a e --quiet). In this way, 100 bootstrapped gene trees were constructed for each set of homologs. Each gene tree was rerooted with A. thaliana using the nw_reroot tool from the Newick Utilities package v1.6 (46). Third, bootstrapped trees for all genes were combined in one file and input to the phybase R package v1.3 (47). From each set of bootstrap replicates, one STAR tree (48) was estimated (47). Based on the multispecies coalescent model, STAR uses average ranks of coalescences to estimate species trees from a set of rooted gene trees and constructs an NJ tree that is a consistent estimate of the species tree topology (48). STAR cannot estimate branch lengths. Finally, a consensus tree was constructed from the 100 bootstrapped STAR trees using the consense program in the phylip package v3.69 (49). Tree visualization was carried out in iTOL v2.1 (50), and the tree was then manually labeled and colored for clarity. This species tree agrees with previously published trees (43,51–52) and the NCBI taxonomy, with one exception in the mammalian branch: M. musculus and Rattus norvegicus should be evolutionarily closer to Macaca mulatta than to Sus scrofa and Bos taurus.
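To make the rank-averaging idea behind STAR concrete, the following is a minimal Python sketch rather than the phybase implementation. It assumes the rank convention that we understand STAR (48) to use: the root of each rooted gene tree receives a rank equal to the number of taxa, and the rank decreases by one per level toward the tips. The nested-tuple tree encoding, the toy trees and all function names are hypothetical, every tree is assumed to contain the same taxa, and the final NJ step on the averaged ranks is omitted.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def pair_ranks(tree, rank, out):
    """Record, for each pair of leaves, the rank of the node where they join."""
    if isinstance(tree, str):
        return
    child_leaves = [leaves(child) for child in tree]
    for i, j in combinations(range(len(tree)), 2):
        for x in child_leaves[i]:
            for y in child_leaves[j]:
                out[frozenset((x, y))] = rank
    for child in tree:
        pair_ranks(child, rank - 1, out)  # assumed: rank drops by one per level


def average_ranks(gene_trees):
    """Average the pairwise coalescence ranks over a set of rooted gene trees."""
    taxa = sorted(leaves(gene_trees[0]))
    totals = {frozenset(pair): 0.0 for pair in combinations(taxa, 2)}
    for tree in gene_trees:
        ranks = {}
        pair_ranks(tree, len(taxa), ranks)  # assumed: root rank = number of taxa
        for pair, value in ranks.items():
            totals[pair] += value
    return {pair: total / len(gene_trees) for pair, total in totals.items()}


# Two toy rooted gene trees on taxa A-D that disagree about A, B and C.
toy_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]
avg = average_ranks(toy_trees)
```

Pairs that join near the tips in most gene trees obtain low average ranks; a distance-based method such as NJ can then be applied to the resulting pairwise matrix.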
Figure 1.

Distribution of cGAS and STING homologs across a choanoflagellate and 61 metazoan species. A species tree is shown in this figure. Branch lengths are not intended to be to scale. Plant Arabidopsis thaliana and fungus Saccharomyces cerevisiae were used as outgroups to root the species tree. The species phylogenies were inferred based on the summary statistics of coalescence times for 27 multilocus data sets. Support values were derived from 100 bootstrap replicates. Taxonomic branches are labeled in different colors. Pink, Cnidaria; brown, Nematoda; purple, Arthropoda; cyan, Vertebrata; green, Mammalia. Each leaf node is denoted as a standard species name followed by its three-letter abbreviation in brackets. cGAS and STING homologs are labeled with blue and red rectangles, respectively. The rectangles with oblique lines indicate that the genes only have partial sequences and so are excluded from the construction of gene trees in Figure 2. The species harboring multiple copies of cGAS and STING homologs are marked with the number of copies in blue and red, respectively. Sequences of three vertebrate cGAS homologs (marked with blue asterisks) have no insertion of zinc-ribbon DNA-binding domain (see Supplementary Text S1 for inspection of the three proteins). The sequence references, alignments and trees in this figure can be obtained from Supplementary Data S1.

The gene tree for depicting the phylogenetic relationship of cGAS gene family (Figure 2A) was constructed using PhyML based on the multiple sequence alignment of cGAS sequences via MUSCLE. JTT matrix and 100 bootstrap replicates were applied. The ML tree of STING gene family (Figure 2B) was obtained in the same way. The phylogenetic tree showing the relationship of cGAS homologs, OAS1 homologs and V. cholerae DncV (Supplementary Figure S3) was built using PhyML based on the multiple sequence alignment of these protein sequences via Promals3D (39). All protein sequences, alignments and trees are available in Supplementary Data S1.

Calculation of Ka and Ks

Each homolog pair was aligned at the protein sequence level using MUSCLE (44), and the corresponding codon alignment of the mRNA sequences was created from the protein alignment using PAL2NAL v14.0 (53). Ka and Ks values were then calculated with yn00 according to Yang and Nielsen (54), implemented in PAML v4.7 (55).
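The conversion from a protein alignment to a codon alignment can be sketched as follows. This is a simplified illustration of the idea, not PAL2NAL itself: it assumes that each coding sequence is in frame, has had its stop codon removed and corresponds exactly to its protein, and the function name and toy sequences are hypothetical.

```python
def thread_codons(aligned_protein, cds):
    """Map an ungapped coding sequence onto one gapped row of a protein alignment.

    Each residue consumes the next three nucleotides; each alignment gap
    becomes a three-nucleotide gap, so codon boundaries are preserved.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be three times the number of residues")
    codons = []
    position = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)


# Toy pair: the second protein lacks the middle residue of the first.
row_1 = thread_codons("MAK", "ATGGCTAAA")
row_2 = thread_codons("M-K", "ATGAAG")
```

Both rows have the same length and remain in frame, which is what a codon-based estimator of Ka and Ks requires as input.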

Molecular modeling of STING proteins

The comparative modeling program MODELLER (56) was used to generate models for STING homologs from species other than human. The structures of human STING (PDB ID: 4LOH) were used as templates in the structural modeling experiments. The 3D models of STING from other species were obtained by optimally satisfying spatial restraints derived from the CLUSTALX sequence alignment and the template 3D structures. The dimer structure models were assembled by superposing the two monomer models onto the human STING dimer structure bound with cGAMP. All structural models were analyzed, and minor manual adjustments of the modeling solution were made using the graphics program COOT (57). Figures were prepared with PyMOL (The PyMOL Molecular Graphics System, Version 1.3, Schrödinger, LLC).

RESULTS

Presence of cGAS and STING in choanoflagellates and metazoans but not in nematodes

To elucidate the comprehensive evolutionary history of cGAS and STING proteins, we determined the distribution of these two proteins across 191 fully sequenced eukaryotic species. We used two rounds of searches, which are respectively based on the levels of protein and genomic sequences, to detect homologs of mouse cGAS and STING (see Materials and Methods, Supplementary Figure S1). There are 52 cGAS and 48 STING homologs present in 44 metazoan species and a unicellular eukaryotic organism, choanoflagellate M. brevicollis (Supplementary Table S1). M. brevicollis is considered as the closest living relative of animals. Except for the choanoflagellate homologs of cGAS and STING, we did not find any homolog of cGAS and STING in the other branches (fungi, plants or protists) of eukaryotes. Figure 1 shows the visualization of the distribution of cGAS (in blue rectangles) and STING (in red rectangles) homologs on the ‘life tree’ of 61 metazoans and their close protozoan relatives. The species tree showing the phylogenetic relationship of metazoan species was inferred from 100 bootstrapped STAR trees (48) based on ML gene trees of 27 universal genes (see Materials and Methods). Within metazoans, homologs of both cGAS and STING are present as early as in cnidarians (sea anemone Nematostella vectensis and Hydra magnipapillata). Both proteins are distributed in all Drosophila species, several non-Drosophila arthropods and nearly all chordates except for torafugu Takifugu rubripes; however, they were lost together in the flatworm Schistosoma mansoni and nematodes. Interestingly, the fully sequenced species either contain homologs of both cGAS and STING or have lost both. To explore phylogenetic relationships of cGAS or STING homologs, we built a ML gene tree for each family (Figure 2). All cGAS-containing species have one cGAS homolog except for seven species (labeled with copy numbers in blue in Figure 1). H. magnipapillata, T. 
castaneum, Drosophila virilis, Drosophila persimilis, Drosophila pseudoobscura, the cephalochordate B. floridae (Florida lancelet) and Danio rerio (zebrafish) each have two cGAS homologs. The phylogenetic tree of the cGAS gene family (Figure 2A) shows that one round of cGAS homolog duplication may have occurred before the divergence of D. persimilis and D. pseudoobscura, and that the cGAS homolog might have duplicated once in each of the other five genomes. The same holds for the STING family: H. magnipapillata harbors three STING candidates and B. floridae has two. Figure 2B shows that STING homologs have also experienced species-specific gene duplication. The Ka/Ks ratio is an important index of functional constraints. Ka refers to the number of non-synonymous substitutions per non-synonymous site, while Ks represents the number of synonymous substitutions per synonymous site. The smaller the Ka/Ks ratio is, the stronger the functional constraints are (58). We listed Ka/Ks ratios of three proteins, cGAS, STING and TBK1 (a downstream protein activated by STING in IFN-pathway induction), within four metazoan lineages, namely, mammals (human and mouse), fish (D. rerio and Oryzias latipes), insects (two pairs: Drosophila grimshawi and Drosophila willistoni; Drosophila melanogaster and Drosophila simulans) and cnidarians (H. magnipapillata and N. vectensis) (see Table 1, Materials and Methods). The Ka/Ks ratios of all protein pairs are much smaller than 1.0, suggesting that these proteins are subject to purifying selection (functional constraints). cGAS and STING homologs show higher Ka/Ks ratios than TBK1, which suggests that TBK1 has undergone strong purifying selection, while the functional constraints on cGAS and STING, both of which are upstream proteins in the IFN induction pathway, may have been relaxed to some extent.
Table 1.

Ka, Ks and sequence divergence of protein homologs between two species

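| Metazoan branch | Species | Gene | Ks (a) | Ka (b) | Ka/Ks | Divergence (c) |
|---|---|---|---|---|---|---|
| Mammalians | H. sapiens and M. musculus | cGAS | 0.63 | 0.28 | 0.45 | 0.44 |
| | | STING | 0.57 | 0.21 | 0.36 | 0.31 |
| | | TBK1 | 0.60 | 0.03 | 0.06 | 0.06 |
| Fishes | D. rerio (e) and O. latipes | cGAS-1 | 4.36 | 0.54 | 0.12 | 0.48 |
| | | cGAS-2 | 2.47 | 0.73 | 0.30 | 0.65 |
| | | STING | 2.09 | 0.49 | 0.23 | 0.58 |
| | | TBK1 | 3.63 | 0.02 | 0.01 | 0.04 |
| Insects | D. grimshawi and D. willistoni | cGAS | 2.71 | 0.50 | 0.18 | 0.47 |
| | | STING | 3.09 | 0.34 | 0.11 | 0.42 |
| | | TBK1 | 2.14 | 0.06 | 0.03 | 0.09 |
| Insects | D. melanogaster and D. simulans | cGAS | 0.14 | 0.06 | 0.46 | 0.13 |
| | | STING | 0.11 | 0.04 | 0.36 | 0.09 |
| | | TBK1 | 0.15 | 0.02 | 0.11 | 0.04 |
| Cnidarians (d) | H. magnipapillata (e) and N. vectensis | cGAS-1 | 3.96 | 0.88 | 0.22 | 0.73 |
| | | cGAS-2 | 3.99 | 0.87 | 0.22 | 0.76 |
| | | STING-1 | 3.66 | 0.64 | 0.17 | 0.61 |
| | | STING-2 | 4.01 | 0.74 | 0.18 | 0.69 |
| | | STING-3 | 4.00 | 1.11 | 0.28 | 0.72 |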

(a) Ks, the number of synonymous substitutions per synonymous site.

(b) Ka, the number of non-synonymous substitutions per non-synonymous site.

(c) Sequence divergence, one minus identity of BLASTP alignment between two amino acid sequences.

(d) TBK1 is absent from H. magnipapillata and so is not in the last group of this table.

(e) D. rerio has two cGAS homologs (cGAS-1 and -2); H. magnipapillata has two cGAS homologs (cGAS-1 and -2) and three STING homologs (STING-1, -2 and -3).
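As a worked example of how the ratios in Table 1 are read, the short Python sketch below recomputes Ka/Ks for the human and mouse pair from the tabulated Ks and Ka values. The variable and function names are hypothetical. Because the table entries are rounded to two decimals, recomputed ratios can differ slightly in the second decimal from the tabulated ones, which were presumably derived from unrounded estimates (an assumption on our part).

```python
# (Ks, Ka) for the human-mouse comparison, copied from Table 1.
MAMMALS = {
    "cGAS": (0.63, 0.28),
    "STING": (0.57, 0.21),
    "TBK1": (0.60, 0.03),
}


def ka_ks(ks, ka):
    """Ratio of non-synonymous to synonymous substitutions per site."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    return ka / ks


ratios = {gene: ka_ks(ks, ka) for gene, (ks, ka) in MAMMALS.items()}

# Ratios below 1.0 are interpreted as purifying selection.
all_under_purifying_selection = all(r < 1.0 for r in ratios.values())

# The gene with the smallest ratio is the most strongly constrained.
most_constrained = min(ratios, key=ratios.get)
```

For these values every ratio is below 1.0 and TBK1 has the smallest ratio, in line with the interpretation given in the text.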


Evolutionary pattern of domain organization and critical residues of cGAS

Human cGAS is composed of an unstructured and poorly conserved N terminus (amino acid residues 1–160) and a highly conserved C terminus (160–513) (Figure 3A). The C-terminal fragment contains two highly conserved domains, nucleotidyltransferase (NTase) core domain (160–330) and Mab21 domain (213–513). In the NTase core domain, there are several conserved residues associated with active sites within the NTase superfamily: hG[G/S]X9–13[D/E]h[D/E]h…h[D/E]h (h indicates a hydrophobic amino acid). The hG[G/S] pattern has a crucial role in docking substrates within active sites and the three conserved aspartate/glutamate residues are involved in coordination of divalent ions and activation of acceptor hydroxyl groups on the substrate (59). Inserted into the Mab21 region is a zinc-ribbon structural domain (390–405) typically defined as H(X5)CC(X6)C. The Mab21 domain was first identified in the nematode Caenorhabditis elegans Mab-21, a cell fate-determining gene (60). Human and mouse Mab-21-like proteins are homologous with the nematode Mab-21 (61,62). Although the mouse cGAS is also a Mab21-containing protein, the eukaryotic homologs of mouse cGAS identified in this study are more similar to mouse cGAS than mouse Mab-21-like proteins (Supplementary Table S4).
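The two sequence patterns quoted above can be written as regular expressions. The Python sketch below is an illustration only: the set of residues treated as hydrophobic (h) is our assumption, the "…" in the NTase pattern is modeled as a gap of arbitrary length, and the test sequences are synthetic rather than real cGAS sequences.

```python
import re

# Assumed definition of a hydrophobic residue (h); the text does not list one.
HYDROPHOBIC = "AVILMFWYC"

# hG[G/S]X9-13[D/E]h[D/E]h...h[D/E]h, with "..." as a gap of any length.
NTASE_MOTIF = re.compile(
    rf"[{HYDROPHOBIC}]G[GS].{{9,13}}[DE][{HYDROPHOBIC}][DE][{HYDROPHOBIC}]"
    rf".*?[{HYDROPHOBIC}][DE][{HYDROPHOBIC}]"
)

# Zinc-ribbon pattern H(X5)CC(X6)C.
ZINC_RIBBON = re.compile(r"H.{5}CC.{6}C")


def has_ntase_motif(sequence):
    """True if the sequence contains the NTase active-site pattern."""
    return NTASE_MOTIF.search(sequence) is not None


def find_zinc_ribbon(sequence):
    """Return the 1-based (start, end) of the first zinc-ribbon match, or None."""
    match = ZINC_RIBBON.search(sequence)
    if match is None:
        return None
    return match.start() + 1, match.end()


# Synthetic sequences built to contain (or lack) each pattern.
ntase_like = "MK" + "IGS" + "K" * 10 + "ELDV" + "KKKK" + "LDL" + "KK"
zinc_like = "AA" + "H" + "K" * 5 + "CC" + "K" * 6 + "C" + "AA"
no_motif = "MKKKKKKKKKKKKKKK"
```

In human cGAS the spacing between S213 and E225 (11 residues) falls within the X9-13 range, and the span from H390 to C404 has the length expected for H(X5)CC(X6)C.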
Figure 3.

Evolution of functional domains in cGAS proteins. (A) A diagram of the domain organization of human cGAS. Five red asterisks indicate key catalytic residues (G212, S213, E225, D227 and D319) within the NTase fold of human cGAS. (B) Diagrams of the domain organization of cGAS homologs based on phylogeny of metazoans and M. brevicollis. The major metazoan groups followed by the representative species in brackets include cnidarians, such as sea anemone N. vectensis and H. magnipapillata, insects, such as D. melanogaster (fruit fly), cephalochordate B. floridae and vertebrates. The node of nematoda is in gray indicating that cGAS was lost within nematodes.

Human cGAS shows significant structural similarity to the human oligoadenylate synthetase1 (OAS1) which polymerizes ATP into linear 2′-5′-linkage oligoadenylate upon stimulation by dsRNA. The structural similarity is especially striking in the catalytic domain fold for complex formation of cGAS with dsDNA and bound ligands (23,26–27,63-64). Furthermore, the first enzyme reported to synthesize cGAMP is a bacterial dinucleotide cyclase (DncV) in V. cholerae, which has no obvious primary sequence homology to human cGAS (17,65). Although these three kinds of enzymes, cGAS homologs, OAS1 homologs and DncV can be classified clearly according to the phylogenetic analysis (Supplementary Figure S3), they all belong to the nucleotidyltransferase superfamily and share similar patterns of secondary structural elements (Supplementary Figure S4). To gain novel insights into the evolutionary perspective of the relationships between structure and function in cGAS, we used two complementary means: (i) placing the domain organization of cGAS homologs on the evolutionary map of metazoans and M. brevicollis (Figure 3B) and (ii) mapping functionally critical residues of cGAS reported in recent publications across these species (Figure 4, Supplementary Table S5). NTase core domain and Mab21 domain are conserved in these species, but M. 
brevicollis and invertebrate cGAS homologs do not have a zinc-ribbon domain at the C terminus (Figures 3B and 4). This zinc-ribbon domain is conserved among all vertebrate cGAS members except for the three homologs in Pan paniscus (bonobo), Canis familiaris (dog) and Taeniopygia guttata (zebra finch). Lack of a zinc-ribbon insertion in the three vertebrates may result from the different versions of genome assembly or genome annotation (Supplementary Text S1, Supplementary Figure S5 and Supplementary Table S6). The vertebrate-specific zinc-ribbon domain of cGAS is not in OAS1, nematode Mab-21 and mammalian Mab-21-like proteins (Figure 4). Furthermore, cGAS homologs across vertebrates as well as cephalochordate B. floridae have N-terminal fragments with an average length of 167 amino acids except for chicken Gallus gallus and turkey Meleagris gallopavo. In contrast, cnidarian and insect cGAS homologs contain a very short N-terminal fragment, ∼70 amino acids in N. vectensis and fewer than 7 amino acids in the other species (Figures 3B and 4). Similarly, OAS1, nematode Mab-21 and mammalian Mab-21-like proteins also contain very short N-terminal tails (Figure 4). This could indicate that the ∼167-amino-acid-long N-terminal tail has evolved in chordate/vertebrate lineage and seems to be a cGAS-specific adaptation.
Figure 4.

Multiple sequence alignment of DncV and the representative sequences of cGAS homologs, OAS1 and Mab-21-like proteins. DncV is a prokaryotic enzyme named dinucleotide cyclase in Vibrio cholerae. The listed species include mammals H. sapiens and M. musculus, zebrafish D. rerio, cephalochordate B. floridae, insect D. melanogaster, nematode Caenorhabditis elegans and cnidarians. If an organism harbors multiple cGAS homologs, only one homolog is listed. The Mab21 domain was first identified in the nematode cell fate-determining gene Mab-21 (60). Two human proteins (Mab-21-like 1 and Mab-21-like 2) and two mouse proteins (Mab-21-like 1 and Mab-21-like 2) are homologous with the nematode Mab-21 (61,62).

We next looked at the conservation of key residues within the NTase fold of human cGAS (G212, S213, E225, D227 and D319) across OAS1, Mab-21-like proteins, nematode Mab-21 and the prokaryotic protein DncV of V. cholerae (Figure 4). V. cholerae DncV synthesizes conventional 3′3′-cGAMP, which is involved in bacterial chemotaxis and colonization, whereas human cGAS synthesizes 2′3′-cGAMP, which is involved in sensing dsDNA. In addition, early metazoan cGAS and DncV show very similar patterns of secondary structural elements (Supplementary Figure S4). We therefore asked whether DncV and human cGAS share conserved NTase catalytic sites. As expected, the five aforementioned NTase active-site residues are completely conserved not only in cGAS homologs but also in DncV. OAS1 proteins retain all five catalytic positions, although they contain Asp in place of E225. Conversely, only one of the five residues, E225, is conserved in mammalian Mab-21-like proteins and nematode Mab-21, although another residue, D227, is conservatively replaced with a glutamate. It has recently been reported that human and mouse cGAS interact with DNA through two binding sites, forming a complex composed of dimeric cGAS bound to two DNA molecules. Both DNA binding surfaces and the dimer interface play a critical role in DNA binding (24,25). We checked the conservation of five functionally important amino acid residues involved in 2′3′-cGAMP binding, three positively charged residues on the primary DNA binding surface and seven critical residues on the second DNA binding surface and dimer interface across eukaryotic cGAS homologs (Supplementary Table S5). Of the five residues involved in 2′3′-cGAMP binding, S434 of human cGAS is not conserved in mammals and Y436 is conserved in all species; the other three (K362, R376 and S378) are completely conserved in a cephalochordate and in vertebrates but not in arthropods. Two lysine residues on the primary DNA binding surface, K384 and K411, are completely conserved in vertebrates but not in early-branching species, and the third, K407, is highly conserved across all species except cnidarians, which have Arg at the corresponding position. Three critical residues on the dimer interface, K347, K394 and K398, are conserved only in vertebrates. Two residues on the second DNA binding surface, R236 and K254, are conserved in vertebrates except for amphibians (Xenopus tropicalis and X. laevis) but not in early-branching species, while the other two (K327 and R353) are not conserved. It seems reasonable that these four residues (R236, K254, K327 and R353) are not strictly conserved in vertebrates, because only double (R236E/K254E and K254E/K327E) and triple (R236E/K254E/K327E) mutations abrogated the ability of cGAS to stimulate IFN production (24), and the positively charged R353 of human cGAS is replaced by another positively charged amino acid, lysine, in vertebrates. Strictly speaking, five functionally critical residues are not conserved in vertebrates and two residues are highly conserved in all species; the other eight residues are completely conserved in vertebrates but not in early-branching species.
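The residue-by-residue comparison described above amounts to mapping a residue number in the reference protein onto a column of the multiple sequence alignment and reading off what every other species has in that column. The following Python sketch illustrates that bookkeeping only; it is not the authors' pipeline. The function names `column_of` and `conservation` and the three toy sequences are hypothetical and carry no biological meaning.

```python
from typing import Dict, List


def column_of(ref_aligned: str, residue_number: int) -> int:
    """Return the 0-based alignment column holding the given 1-based
    residue number of the (gapped) reference sequence."""
    seen = 0
    for col, ch in enumerate(ref_aligned):
        if ch != "-":          # gaps do not consume a residue number
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError(f"reference has fewer than {residue_number} residues")


def conservation(alignment: Dict[str, str], ref_name: str,
                 residue_numbers: List[int]) -> Dict[str, Dict[str, str]]:
    """For each reference residue, report the aligned character in every
    other sequence (a '-' means the position is gapped there)."""
    ref = alignment[ref_name]
    report: Dict[str, Dict[str, str]] = {}
    for n in residue_numbers:
        col = column_of(ref, n)
        label = f"{ref[col]}{n}"   # e.g. "E5" for Glu at reference position 5
        report[label] = {name: seq[col]
                         for name, seq in alignment.items()
                         if name != ref_name}
    return report


# Synthetic toy alignment (NOT real cGAS sequences), all rows equal length.
TOY = {
    "ref": "MK-GSE",
    "sp1": "MKAGSD",
    "sp2": "MR-GAE",
}

if __name__ == "__main__":
    for residue, calls in conservation(TOY, "ref", [3, 4, 5]).items():
        print(residue, calls)
```

With a real alignment, the reference row would be human cGAS and the residue list would hold positions such as 212, 213, 225, 227 and 319; a conservative substitution (for example Asp for Glu) still has to be judged by the reader, since the sketch only reports the aligned characters.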

Evolutionary pattern of structural features of STING

Human STING consists of four N-terminal TMs (amino acid residues 21–136), a central c-di-GMP-binding domain (CBD, 153–340) and a C-terminal tail (CTT, 340–379) (Figure 5A). The CBD, which contains a dimerization domain (DD, 155–180), protrudes into the cytoplasm (15). The ∼240-amino-acid-long globular carboxy-terminal domain (CBD+CTT or CTD, 138–379), for which crystal structures have been reported, mediates binding to the bacterial second messenger c-di-GMP (66–70) and, as shown more recently, to 2′3′-cGAMP produced in mammalian cells (20,21). The CTT domain is important for STING to transduce signals (38,67).
Figure 5.

Evolution of functional domains in STING proteins. (A) A diagram of the domain organization of human STING. (B) Evolutionary pattern of the domain organization of STING homologs based on the phylogeny of metazoans and M. brevicollis. The node of nematoda is in gray indicating the absence of STING proteins. The TM and CTT domains were identified as described in the Materials and Methods.

Conserved structural domains in STING homologs are important for their proper biological roles. Detection of novel STING homologs enabled a detailed analysis of the evolutionary history of these conserved domains (Figure 5B). Human and mouse STING proteins reside exclusively on the ER membrane (14). Generally, four putative TMs (red rectangles in Figure 5B) exist in the protozoan M. brevicollis and in metazoan STING homologs. Fewer than four TMs (pink rectangles with dashed borders) are found in arthropods (except for the jewel wasp Nasonia vitripennis), birds and the cephalochordate B. floridae (Figure 5B). Another three arthropod STING homologs, in I. scapularis, T. castaneum and A. mellifera, lack TMs at the N terminus based on the current genome data (see Materials and Methods). All STING homologs have the conserved CBD and DD domains. However, the CTT is only observed in vertebrates. Overall, modern STING proteins might have gained their structural domains during the early evolution of vertebrates, although the CTT domain is missing from STING homologs in the amphibians X. tropicalis and X. laevis. As mentioned previously, replacing R232 with histidine in human STING was reported to affect its sensitivity, particularly toward 2′-5′-linkage-containing cGAMP isomers. Recent structural studies found that residue 232 in human STING is part of the β sheet lid over the binding pocket of the STING-cGAMP complex, and that the arginine residue interacts with the α-phosphate groups of cGAMP (20,21). R232 is highly conserved in STING homologs except in O. latipes (Japanese medaka fish, Met 228), H. magnipapillata (Thr 230) and M. brevicollis (Ile 234) (Table 2). Furthermore, substitutions of several amino acid residues within the binding pocket of STING with Ala indicate that Y167, R238, Y240, N242, E260 and T263 are involved in the recognition of cGAMP isomers (20,21). Y167 and R238 are completely conserved in all STING homologs. E260 and T263 are highly conserved: the residue corresponding to E260 in the mammal S. scrofa (pig) is a glycine, and serine replaces T263 in chicken G. gallus, turkey M. gallopavo and M. brevicollis. The other two sites, Y240 and N242, are fully conserved only in mammals.
Table 2.

Conservation analysis of key amino acid residues within the c-di-GMP binding domain (CBD) of human or mouse STING in eukaryotic STING family

Columns 1–2 give the amino acid residues in mouse and human STING; columns 3–7 list the species groups in which the residue is not conserved.

| Mouse STING | Human STING | Mammals | Non-mammal vertebrates | Arthropods | Cnidarians | Protist (M. brevicollis) |
|---|---|---|---|---|---|---|
| R231 | R232, H232 (a) | − (c) | Met (M) in O. latipes (Japanese medaka) | − | Thr (T) in H. magnipapillata | Ile (I) |
| Y166 | Y167 (b) | − | − | − | − | − |
| R237 | R238 (b) | − | − | − | − | − |
| Y239 | Y240 (b) | − | Phe (F) in A. carolinensis (green anole), X. tropicalis and X. laevis | Phe (F) in most arthropods (d) | − | Met (M) |
| N241 | N242 (b) | − | His (H) in O. latipes, T. guttata (zebra finch) and G. gallus (chicken); Gln (Q) in M. gallopavo | Asn (N) in Ixodes scapularis, T. castaneum, A. mellifera and N. vitripennis; Ile (I) in D. willistoni; His (H) in the others | His (H) in N. vectensis (sea anemone) | His (H) |
| E259 | E260 (b) | Gly (G) in S. scrofa (pig) | − | − | − | − |
| T262 | T263 (b) | − | Ser (S) in G. gallus and M. gallopavo | − | − | Ser (S) |

(a) R232/H232 in human and R231 in mouse are important for optimal response to 2′-5′-linkage-containing natural or unnatural cGAMP isomers (20,21).

(b) Amino acid substitutions of Ala at these positions reduced or abolished cGAMP-isomer-dependent IFN-pathway activation (20).

(c) ‘−’ means the residue is conserved in all species.

(d) STING homologs in D. willistoni, D. persimilis, D. pseudoobscura, Tribolium castaneum, Apis mellifera and Nasonia vitripennis have the same amino acid Tyr (Y) as human STING.

To obtain an initial structural basis for understanding how STING homologs from species other than human bind 2′3′-cGAMP, we generated homology models of STING homologs from three species, Japanese medaka O. latipes, chicken G. gallus and choanoflagellate M. brevicollis, based on the structure of human STING (PDB ID: 4LOH) (see Materials and Methods). Similar to the human STING structure reported recently (20), the other STING homologs can also form a dimer (Figure 6A, Supplementary Figure S6A and C) and probably adopt a closed conformation with 2′3′-cGAMP bound. In O. latipes STING (Figure 6B), Met 228 substitutes for Arg/His 232 of human STING. M228 is on the surface of the molecule, at the entrance of the ligand binding pocket. M228 makes the pocket slightly more hydrophobic and would slightly affect ligand binding affinity (possibly making it a bit weaker) in O. latipes STING. M228 could also play a role during the open-to-closed conformational transition. The substitution by Ile 234 in M. brevicollis STING (Supplementary Figure S6D) would have an effect similar to that of M228 in O. latipes STING, whereas Thr in H. magnipapillata makes the pocket and entrance slightly more polar. His 238 in O. latipes STING (Figure 6B) and His 247 in G. gallus STING (Supplementary Figure S6B) occupy the position corresponding to N242 in human STING. Like N242 in human STING, these histidines can form a hydrogen bond with Y161 or Y172 (Y167 in human STING), which forms a stacking interaction with the guanine or adenine ring of cGAMP. The corresponding residue in D. willistoni, Ile, would probably interact only weakly with the corresponding Tyr. In G. gallus and M. brevicollis STING homologs, Ser 268 and Ser 267, respectively, substitute for T263 of human STING. These serines can also form a hydrogen bond with the amino group on the guanine ring of cGAMP, but are less restricted in ligand binding than T263 in human STING. In M. brevicollis STING, Met 242 replaces Y240 of human STING and provides a similar hydrophobic environment for cGAMP binding. Our results show that, even though several key residues of human STING are substituted in STING homologs from other species, these homologs have structures similar to that of human STING. They are still expected to bind 2′3′-cGAMP at functional levels, although binding of 2′3′-cGAMP to non-mammalian STING is expected to be slightly weaker than to human STING. We also tried STING structural modeling using 3′3′-cGAMP instead of 2′3′-cGAMP and found that the difference in binding between the two ligands is minor (data not shown).
Figure 6.

Structural modeling of STING from Japanese medaka Oryzias latipes binding with 2′3′-cGAMP. (A) Modeled dimer structure of STING homolog from O. latipes. One monomer is shown in green and the other in cyan. 2′3′-cGAMP is shown as thick bond model. (B) Comparison of 2′3′-cGAMP binding pockets of human STING with O. latipes. cGAMP is shown as thick bond model and amino acids are shown as thin bond models in orange (human STING) or cyan (O. latipes homolog). Structural modeling of another two non-mammalian STING homologs in chicken G. gallus and choanoflagellate M. brevicollis, and their binding with 2′3′-cGAMP are displayed in Supplementary Figure S6.


Evolution of proteins in cGAS-STING signaling pathway

In infected cells, cGAS senses cytosolic dsDNA from diverse microbes and self-DNA in a sequence-independent manner and generates 2′3′-cGAMP as an endogenous second messenger, which binds STING to trigger the signaling pathway that leads to the production of cytokines, such as type I IFN (Figure 7A). The activation of STING has been reported to be inhibited by ULK1 associated with AMPKα (71). To study how this cGAS-STING-dependent type I IFN induction evolved, we placed cGAS, STING and their downstream components onto the eukaryotic evolutionary map (Figure 7B). The fungus S. cerevisiae (budding yeast) was added to the representative species group as a contrast. AMPKα and ULK1 are present in nearly all representative species. cGAS and STING are present in the choanoflagellate M. brevicollis, but the other four proteins (IKK, TBK1, NF-κB and IRF3) involved in the activation of this pathway are absent from M. brevicollis and S. cerevisiae, suggesting that they are probably metazoan-specific proteins. IKKϵ and TBK1 have been reported to group together in early metazoans (72). Consistent with this, Figure 7B indicates that only a single homolog of IKKϵ or TBK1 is present in the poriferan Amphimedon queenslandica, cnidarians, the nematode C. elegans and the insect D. melanogaster. Beginning with the echinoderm Strongylocentrotus purpuratus (purple sea urchin), IKKϵ and TBK1 have diverged and are both present in later-branching metazoans. NF-κB is widely distributed in all metazoans except nematodes. IRF3 homologs first appear in fishes but are absent from all three birds included in this study, G. gallus, M. gallopavo and T. guttata. The bird genomes contain another member of the IRF family, IRF7, which is also a transcription factor that induces type I IFN and groups with IRF3 according to their evolutionary history (73). IRF7 is present in all vertebrates, which is consistent with previous studies (74,75).
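The presence/absence comparison described in this paragraph is a phylogenetic profile. As a rough illustration only, the Python sketch below encodes the few calls that are stated explicitly in the text (it is not a transcription of Figure 7B) and compares the profiles of two components. The names `PROFILE`, `species_lacking` and `same_pattern` are hypothetical; a component missing from a species entry means "not stated in the text", not "absent".

```python
from typing import Dict, List

# True = homolog present, False = absent; only calls stated in the prose.
PROFILE: Dict[str, Dict[str, bool]] = {
    "M. brevicollis": {"cGAS": True, "STING": True, "TBK1": False,
                       "NF-kB": False, "IRF3": False},
    "S. cerevisiae": {"TBK1": False, "NF-kB": False, "IRF3": False},
    "C. elegans": {"cGAS": False, "STING": False,
                   "NF-kB": False, "IRF3": False},
    "G. gallus": {"cGAS": True, "STING": True, "NF-kB": True,
                  "IRF3": False, "IRF7": True},
}


def species_lacking(component: str) -> List[str]:
    """Species for which the component is explicitly recorded as absent."""
    return sorted(sp for sp, calls in PROFILE.items()
                  if calls.get(component) is False)


def same_pattern(a: str, b: str) -> bool:
    """True if components a and b have identical presence/absence calls in
    every species where both are recorded (and at least one such species
    exists)."""
    shared = [sp for sp, calls in PROFILE.items()
              if a in calls and b in calls]
    return bool(shared) and all(PROFILE[sp][a] == PROFILE[sp][b]
                                for sp in shared)


if __name__ == "__main__":
    print("lacking IRF3:", species_lacking("IRF3"))
    print("cGAS/STING share a profile:", same_pattern("cGAS", "STING"))
    print("cGAS/IRF3 share a profile:", same_pattern("cGAS", "IRF3"))
```

Matching profiles for two components are compatible with co-evolution but do not demonstrate it, so the sketch supports only the kind of qualitative comparison made in the text.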
Figure 7.

cGAS-STING signaling to trigger type I IFN and phylogenetic profiles of its molecular components. (A) Overview of cGAS-STING signaling pathway leading to the production of type I IFN adapted from recent studies (1,71,76). Cytosolic dsDNA from diverse microbes and self-DNA in infected cells are danger signals that are sensed by cGAS in a sequence-independent manner. cGAS generates 2′3′-cGAMP as an endogenous second messenger, which binds STING and induces a conformational change in STING. The activated STING recruits TBK1 and IKKϵ kinases, which in turn phosphorylate and activate IRF3 and NF-κB, respectively. IRF3 and NF-κB then translocate to the nucleus to induce type I IFN and other cytokines. Certain bacteria produce c-di-GMP, c-di-AMP and 3′3′-cGAMP, which could activate some STING alleles (such as R232 in human STING). The activation of STING is inhibited by ULK1 (71). (B) Distribution of each molecular component (on the right) of the signaling pathway in a representative group of species (on the top) with fully sequenced genomes. The transcription factor IRF7 was added in this part because just like IRF3, IRF7 is also considered a master regulator of type I IFN induction. Eukaryotic homologs of each molecular component were identified in a similar way to those of cGAS and STING (see Materials and Methods). The presence of a homolog of the molecular component in a particular species is in green and the absence in light gray. The species are placed according to the phylogenetic relationship in Figure 1. Most species have been indicated in Figure 4 except for yeast S. cerevisiae, sponge Amphimedon queenslandica, echinoderms Strongylocentrotus purpuratus (purple sea urchin), non-mammal vertebrates X. tropicalis (western clawed frog) and G. gallus (chicken).


DISCUSSION

Does cGAS co-evolve with STING in metazoans?

Using elegant biochemical and genetic experiments, Chen et al. showed that mammalian cGAS proteins act as the major and nonredundant cytosolic DNA sensor that generates the second-messenger product 2′3′-cGAMP. 2′3′-cGAMP then activates STING, which stimulates the downstream signaling pathway that leads to type I IFN production (17–18,28). The evolution of cGAS and STING was examined in two recent reviews (77,78). Schaap screened representative genomes of the major metazoan phyla and the choanoflagellate M. brevicollis using a modified best bi-directional BLASTP search, and found that STING homologs are distributed in M. brevicollis and in all major animal phyla except Porifera, whereas cGAS was detected only as far back as the cephalochordate B. floridae. In the other review, Wu and Chen obtained similar results on the origins of cGAS and STING. In our study, we applied a systematic and rigorous method to search for homologs in all fully sequenced eukaryotic genomes, which expands the known evolutionary distribution of both cGAS and STING genes across eukaryotic species. Like that of STING, the emergence of cGAS could be traced back to the choanoflagellate M. brevicollis, which is the closest known relative of metazoans (79). The cGAS family has several features in common with the STING family during metazoan evolution. First, both proteins are present early, in several simple organisms including the cnidarians N. vectensis and H. magnipapillata. However, both cGAS and STING were subsequently lost in nematodes and flatworms, from which other key components (NF-κB and IRF3) of the cGAS-STING signaling pathway are also absent (Figure 7B). It is possible that nematodes and flatworms rely on different mechanisms to trigger innate immunity signaling. For example, even though C. elegans has homologs of Toll-like receptors, which have well-established roles in innate immunity in mammals, these homologs do not function in response to infections in nematodes (80). Second, both cGAS and STING are subject to functional constraints that are relaxed to some extent compared with TBK1, a signaling component at the downstream end of this pathway, suggesting that cGAS and STING may be less conserved than TBK1 from insects to mammals (Table 1). Third, the modern cGAS and STING proteins appear to have acquired their domain features early in the evolution of vertebrates. For cGAS proteins, the zinc-ribbon domain is not present in invertebrate cGAS (Figures 3B and 4). It has been reported that the zinc-ribbon domain is functionally important for metal coordination and interaction with the major groove of DNA, suggesting that it serves as a molecular ‘ruler’ that sets the specificity of cGAS toward dsDNA (26–27,76). Additionally, cGAS proteins in cnidarians and insects contain very short N-terminal domains (Figures 3B and 4). The highly positively charged N-terminal fragment of cGAS may play a role in DNA binding because this fragment can bind an immune stimulatory DNA (ISD, 45 bp) (17). Kranzusch et al. suspected that this N-terminal region also plays an important role in stabilization or autoinhibition, as in other nucleic acid sensors (RIG-I and AIM2) (27). Moreover, by investigating the conservation across eukaryotic cGAS homologs of 15 amino acid residues of human cGAS that are critical for DNA binding or are involved in 2′3′-cGAMP binding, we found that, excluding five residues that are not conserved in vertebrates, most amino acids are completely conserved in vertebrates but not in arthropods, cnidarians or the choanoflagellate M. brevicollis (Supplementary Table S5). Taken together, evolutionary analysis of key structural domains and critical residues suggests that invertebrate cGAS homologs may not act as DNA sensors by binding dsDNA. Similarly, modern STING possibly began to acquire its CTT domain within the vertebrate lineage (Figure 5). The CTT domain plays an essential role in the biological function of STING. In the absence of ligand, the CTT binds the STING CBD and maintains STING in an inactive, autoinhibited state. Binding of ligand relieves the autoinhibited state, exposes the CTT and stabilizes the STING dimers in complex with the ligand (67). The exposure of the CTT also facilitates its interaction with TBK1 to promote activation of IRF3. Two residues, S366 and L374, in the CTT domain of human STING are important for IRF3 activation (38). Furthermore, Barber et al. studied the details of STING phosphorylation and found that phosphorylation of S366, induced by ULK1, may facilitate STING degradation to prevent sustained function (71). The ULK1-induced phosphorylation site S366 is within the CTT domain of human STING and is conserved in all vertebrates (including fishes) except for the amphibians X. laevis and X. tropicalis (Supplementary Figure S2). Amphibian STING homologs lack the CTT domain and probably depend on different mechanisms for activation and degradation. Thus, the function of cGAS and STING in sensing cytosolic DNA to trigger the innate immune response is possibly restricted to vertebrates, which is consistent with the finding that the IFN system functions only in jawed vertebrates (81). In summary, cGAS and STING share the three evolutionary characteristics discussed above. In mammals, cGAS and STING also cooperate to bring about innate immune signaling: cGAS represents a new class of cytosolic DNA sensor, while STING functions as a central adaptor molecule. The critical link is cGAMP, which can activate STING to launch type I IFN induction and can also trigger negative-feedback control of STING activity (17,28,71). Therefore, contrary to the conclusion of Schaap that cGAS and STING did not evolve together (77), we speculate that cGAS has co-evolved with STING during the evolution of metazoans. Given that the functionally critical zinc-ribbon domain and most of the amino acid residues important for DNA binding in human cGAS are not conserved in early cGAS homologs, and that the CTT domain, which is essential for STING activation and for degradation through the ULK1-induced phosphorylation site S366, is also not conserved in early STING homologs, cGAS-STING may play roles in early metazoans other than activating an innate immune response to cytosolic DNA.

Could cGAS produce cGAMP in early metazoans and M. brevicollis?

Even though we have suggested that cGAS may not function as a DNA sensor in invertebrates, it remains unknown whether cGAS in M. brevicollis and early metazoans can function as a cGAMP synthase. A series of mutations in mammalian cGAS at each position in the zinc-coordination site near the DNA-binding cleft abolished or severely impaired 2′3′-cGAMP synthesis in vitro and the production of type I IFN in vivo (26,27). A reason for this may be that the ability of cGAS to synthesize 2′3′-cGAMP requires the pronounced conformational changes that take place after cGAS binds dsDNA, which trigger a repositioning of catalytic residues in the binding pockets for ATP and GTP (23–24,26,27,63–64). Invertebrate cGAS homologs do not contain the zinc-ribbon domain (Figures 3B and 4), and four of the five amino acid residues involved in 2′3′-cGAMP binding in human cGAS are not conserved in arthropods, cnidarians or a choanoflagellate (Supplementary Table S5). In the absence of any experimental evidence regarding invertebrate cGAS reactivity, however, we cannot conclude unambiguously that, unlike mammalian cGAS, cGAS homologs in invertebrates are not 2′3′-cGAMP synthases. Moreover, the secondary structure pattern and the NTase-specific motif (hG[G/S]X9–13[D/E]h[D/E]h…h[D/E]h) are highly conserved in cGAS homologs throughout metazoan evolution, in the choanoflagellate M. brevicollis and in V. cholerae DncV (Figure 4 and Supplementary Figure S4). Interestingly, DncV is capable of synthesizing 3′3′-cGAMP, c-di-AMP and c-di-GMP in vitro when incubated with the corresponding nucleoside triphosphates in a simple assay buffer. Therefore, we hypothesize that ‘early’ cGAS proteins may have the ability to produce cGAMP or other kinds of cyclic dinucleotides, acting as nucleotidyltransferases. In a recent review, Schaap suggested, based on conservation analysis across homologous sequences of 10 residues in human STING that are required for binding cyclic dinucleotides, that STING in metazoans and their protist ancestors could detect cyclic dinucleotides long before cGAS could synthesize 2′3′-cGAMP (77). Considering that cGAS and STING evolved together during animal evolution, the conclusion by Schaap provides complementary information for our hypothesis that ‘early’ cGAS homologs could synthesize cyclic dinucleotides. To provide further complementary information, we generated homology models of STING from three non-mammal species, the fish O. latipes (Figure 6), chicken G. gallus and the choanoflagellate M. brevicollis (Supplementary Figure S6), based on the structure of human STING (PDB ID: 4LOH). Similar to the reported human STING structure (20), the three non-mammalian STING homologs can also form a dimer and probably bind cGAMP at functional levels, although the binding would probably be slightly weaker than to human STING. If the cGAMP synthesis activity of cGAS evolved in M. brevicollis and early metazoans, another question arises naturally: along the trajectory of human evolution, did two kinds of cGAMP, canonical 3′3′-cGAMP and uncommon 2′3′-cGAMP, once exist together in the innate immune system? cGAMP and all other cyclic dinucleotides in bacterial cells are linked by 3′-5′-phosphodiester linkages. In mammals, namely human, mouse and pig, the endogenous cGAMP has a unique 2′-5′-phosphodiester linkage between GMP and AMP (19,21–23,27). Another nucleotidyltransferase, OAS1, can produce 2′-5′-phosphodiester linkages and polymerizes ATP into 2′-5′-linked iso-RNA (2′-5′-oligoadenylate) instead of 3′-5′-linked RNA upon dsRNA binding (63). Biochemical and evolutionary analyses concluded that the first OAS1 protein with 2′-5′-oligoadenylate synthesis activity is found in Geodia cydonium (marine sponge), a lineage that branched earlier than Cnidaria within the Metazoa. G. cydonium OAS1 can produce both 3′-5′ and 2′-5′ linkages but predominantly synthesizes the 2′-5′ linkage (82,83). Nothing is yet known about the products of cGAS in invertebrates; however, we could get some clues from the STING family, considering that cGAS and STING may have co-evolved in metazoans. For example, the H232 allele of human STING specifically responds to 2′3′-cGAMP but has lost the ability to respond to 3′3′-cGAMP or bacterial c-di-GMP, while the R232 allele can respond to all these cyclic dinucleotides. This suggests that the responsiveness to bacterial cyclic dinucleotides was lost under strong selective pressure during human evolution (22,76). Interestingly, the H232 STING allele appears only in humans, while R232 is highly conserved in most metazoans except for O. latipes, H. magnipapillata and M. brevicollis (Table 2). Therefore, although mammalian cGAS proteins synthesize the non-canonical 2′3′-cGAMP on sensing cytosolic DNA, we suspect that, like OAS1, invertebrate cGAS might have had the ability to produce both 2′3′-cGAMP and 3′3′-cGAMP.
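The NTase-specific motif quoted in this section, hG[G/S]X9–13[D/E]h[D/E]h…h[D/E]h, can be written as a regular expression for scanning candidate sequences. The Python sketch below is one possible reading, not the authors' search procedure: it assumes that "h" stands for one of the hydrophobic residues A, I, L, M, F, V, W or Y, and it treats the "…" as a gap of unspecified length. In human cGAS the motif presumably corresponds to G212-S213, E225/D227 and D319; the 11 residues between S213 and E225 fall within the X9–13 window. The test sequence is synthetic.

```python
import re
from typing import Optional, Tuple

# Assumption: residues counted as "h" (hydrophobic) in the motif.
HYDROPHOBIC = "AILMFVWY"
H = f"[{HYDROPHOBIC}]"

NTASE_MOTIF = re.compile(
    f"{H}G[GS]"               # hG[G/S]
    ".{9,13}"                 # X9-13: any 9 to 13 residues
    f"([DE]){H}([DE]){H}"     # [D/E]h[D/E]h: first two acidic residues
    ".*?"                     # "...": gap of unspecified length (assumed lazy)
    f"{H}([DE]){H}"           # h[D/E]h: third acidic residue
)


def find_acidic_triad(seq: str) -> Optional[Tuple[int, ...]]:
    """Return 1-based positions of the three acidic residues of the first
    motif match, or None if the sequence does not contain the motif."""
    match = NTASE_MOTIF.search(seq.upper())
    if match is None:
        return None
    return tuple(match.start(group) + 1 for group in (1, 2, 3))


# Synthetic toy sequence (NOT a real protein): LGS, ten Q, EIDV, gap, LDA.
TOY = "MK" + "LGS" + "Q" * 10 + "EIDV" + "TTTTT" + "LDA" + "KK"

if __name__ == "__main__":
    print(find_acidic_triad(TOY))             # positions of E, D, D in TOY
    print(find_acidic_triad("MKLGSQQQEIDV"))  # spacer too short
```

A primary-sequence pattern of this kind is a coarse filter. The text itself notes that DncV has no obvious primary sequence homology to human cGAS, so any hit would still need to be checked against the alignment and secondary structure.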

CONCLUSION

To conclude, cGAS and STING are already present in the unicellular eukaryotic organism M. brevicollis. During the metazoan evolution that followed, however, both were lost in nematodes and flatworms. Because both proteins cooperate extensively in the stimulation of the innate immune pathway and display similar evolutionary characteristics, we hypothesize that cGAS and STING have co-evolved in M. brevicollis and metazoans. Given the critical functions of cGAS and STING in mammals, it is important to study the primitive biological functions controlled by cGAS and STING in M. brevicollis and early metazoans. Based on the evolutionary analysis of their structural organization, namely the zinc-ribbon domain and long N-terminal fragment in cGAS and the CTT domain in STING, modern cGAS and STING proteins may have gained their functional domains early in the evolution of vertebrates. In addition, vertebrate cGAS homologs retain most of the amino acid residues that are important for DNA binding in human cGAS, whereas invertebrate cGAS homologs show variations at these residues (Supplementary Table S5). Therefore, we hypothesize that cGAS and STING do not take part in the innate immune response to cytosolic DNA in invertebrates. However, the high conservation of secondary structures and key active residues between cGAS homologs and V. cholerae DncV indicates that cGAS may already have acquired the ability to synthesize cGAMP in M. brevicollis and early-branching metazoans. A remaining question is whether 3′3′- and 2′3′-cGAMP once existed together during the evolution of the innate immune system. We propose that cGAS might have been able to synthesize both 3′3′- and 2′3′-cGAMP during some stage of metazoan evolution.
Although the specific physiological and biochemical roles of cGAS and STING homologs in invertebrates remain uncertain, the conservation analysis of critical domains, secondary structural elements and amino acid residues provides novel insights into the relationships between structure and function in both proteins. cGAS and STING do not function in isolation; they activate cellular innate immune responses to cytosolic DNA through interactions with downstream molecules. Studying cGAS and STING together with the other signaling components from an evolutionary perspective, which goes beyond the review by Schaap (77), may provide valuable molecular insights into the functions and origins of this type of pathway that initiates type I IFN.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.