Daniel A Bonsor (1), Dorothy Beckett (2), Eric J Sundberg (1). (1) Institute of Human Virology, University of Maryland School of Medicine, Baltimore, MD 21201, USA. (2) Department of Chemistry and Biochemistry, University of Maryland College Park, Baltimore, MD 20742, USA.
Abstract
CEACAM7 is a human cellular adhesion protein that is expressed on the surface of colon and rectum epithelial cells and is downregulated in colorectal cancers. It achieves cell adhesion through dimerization of the N-terminal IgV domain. The crystal structure of the N-terminal dimerization domain of CEACAM7 has been determined at 1.47 Å resolution. The overall fold of CEACAM7 is similar to those of CEACAM1 and CEACAM5; however, there are differences, the most notable of which is an insertion that causes the C'' strand to buckle, leading to the creation of a hydrogen bond in the dimerization interface. The Kdimerization for CEACAM7 determined by sedimentation equilibrium is tenfold tighter than that measured for CEACAM5. These findings suggest that the dimerization affinities of CEACAMs are modulated via sequence variation in the dimerization surface.
Carcinoembryonic antigen-related cell adhesion molecules (CEACAMs) belong to the immunoglobulin (Ig) family and are expressed differentially on the surfaces of cells (Gray-Owen & Blumberg, 2006; Tchoupa et al., 2014). There are 12 CEACAMs found in humans: CEACAM1, CEACAM3–CEACAM8, CEACAM16 and CEACAM18–CEACAM21 (Beauchemin & Arabzadeh, 2013; Tchoupa et al., 2014). Their functions and roles in cellular processes are diverse and include roles in phagocytosis, hearing, proliferation, signaling, tumor suppression and cell adhesion (Oikawa et al., 1989, 1991; Benchimol et al., 1989; Streichert et al., 2001; Pils et al., 2008; Singer et al., 2010; Zheng et al., 2011). CEACAMs are typically dysregulated in cancer and are found to be parasitized by bacteria (e.g. Neisseria meningitidis, Escherichia coli and Haemophilus influenzae) and viruses in mice (e.g. coronavirus) during infection (Dveksler et al., 1991; Leusch et al., 1991; Bos et al., 1997; Schölzel et al., 2000; Virji et al., 2000; Duxbury et al., 2004; Litkouhi et al., 2008; Obrink, 2008; Singer et al., 2010). CEACAMs contain an N-terminal immunoglobulin variable domain (IgV), a variable number of immunoglobulin constant domains (IgC2) and either a C-terminal transmembrane and cytoplasmic domain or a glycophosphatidyl-inositol (GPI) moiety by which they are anchored to the plasma membrane (Tchoupa et al., 2014). Cell adhesion is achieved through the N-terminal domain of CEACAMs, which can undergo heterodimerization and homodimerization in a cis (on the same cell) or trans (across different cells) fashion (Taheri et al., 2000; Kuroki et al., 2001; Watt et al., 2001).
CEACAM7 is expressed on highly differentiated epithelial cells of the colon and rectum and on the epithelial cells within the ducts of the pancreas (Schölzel et al., 2000). The expression pattern of CEACAM7 suggests a specialized function. In fetal tissues of the colon, CEACAM7 is located at the base of epithelial cells and has been found to migrate to the apical surface a few days after birth (Schölzel et al., 2000). CEACAM7 contains three domains: an N-terminal IgV domain, a single IgC2 domain and a cell-surface GPI anchor domain (Tchoupa et al., 2014). CEACAM7 expression is downregulated during the early development of colorectal tumors, unlike CEACAM5 or CEACAM6, which are typically upregulated, suggesting a tumor-suppression function (Schölzel et al., 2000).
Currently, it is unknown whether CEACAM7 is involved in cell adhesion through homodimerization, or whether it differs structurally from the two known structures, CEACAM1 and CEACAM5 (Fedarovich et al., 2006; Korotkova et al., 2008), in ways that could allow it to function as a tumor-suppression molecule. Here, we report the 1.47 Å resolution X-ray crystal structure of the N-terminal domain of CEACAM7 and characterize its oligomeric state in solution.
Materials and methods
CEACAM7 production
The N-terminal domain of CEACAM7 was synthesized as a codon-optimized GeneArt string (Life Technologies), which was digested and ligated into an NcoI/XhoI-cut pET-21d vector without a purification tag. CEACAM7 was expressed in inclusion bodies in E. coli BL21 (DE3) pLysS cells. Briefly, 1 l of cells were grown in LB Miller at 310 K until an OD600 nm of ∼0.6 was attained, prior to induction with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). Cells were grown for a further 4 h before harvesting (5000g for 15 min at 277 K). The cells were resuspended in lysis buffer [50 mM Tris–HCl, 500 mM NaCl, 1%(v/v) Triton X-100 pH 7.5] and lysed by sonication. Inclusion bodies were isolated (20 000g for 20 min at 277 K), resuspended in lysis buffer, sonicated and isolated by centrifugation (20 000g for 20 min at 277 K). Inclusion bodies were washed with a high-salt buffer (50 mM Tris–HCl, 1.0 M NaCl pH 8.0) to remove DNA, followed by lysis buffer without Triton X-100. Inclusion bodies were dissolved in 30 mM Tris–HCl, 150 mM NaCl, 8.0 M urea pH 8.3 (∼5 ml per litre of grown cells), refolded by rapid dilution (1:12 ratio) at 277 K into 50 mM CHES–HCl, 500 mM L-arginine pH 9.2 and left for 24 h. Refolded CEACAM7 was dialyzed against 10 mM Tris–HCl pH 8.0 and concentrated by anion-exchange chromatography (Mono Q, GE Healthcare). A linear salt gradient from 0 to 1000 mM was run at 1 ml min−1 over 15 min, with CEACAM7 eluting at between 50 and 100 mM NaCl. CEACAM7 was further purified by size-exclusion chromatography (Superdex 200, GE Healthcare) in 50 mM Tris–HCl, 150 mM NaCl, 1 mM EDTA pH 7.5 and fractions were stored at 277 K. Typically, 10 mg of refolded protein per litre was obtained with a refolding efficiency of 10%. Macromolecule-production information is summarized in Table 1.
Table 1
CEACAM7 production details

Source organism        Homo sapiens
DNA source             Synthetic
Expression vector      pET-21d
Expression host        E. coli BL21 (DE3) pLysS
Complete amino-acid sequence of the construct produced
Crystallization
CEACAM7 was concentrated using a Centricon centrifugal filter unit (10 kDa MWCO, Millipore) and subsequently dialyzed against 20 mM Tris–HCl, 100 mM NaCl pH 7.5. CEACAM7 at 5.9 mg ml−1 was screened against The JCSG+ Suite screen (Qiagen) using a Crystal Gryphon Protein Crystallography System (Art Robbins Instruments) with sitting drops consisting of 150 nl protein solution and 150 nl reservoir solution equilibrated against 50 µl reservoir solution. A shower of small crystals grew over 5 d in condition D6 [20%(w/v) polyethylene glycol 8000, 200 mM magnesium chloride, 100 mM Tris–HCl pH 8.5]. Crystals were optimized by hanging-drop vapor diffusion, with the final crystals growing in 18%(w/v) polyethylene glycol 8000, 200 mM magnesium chloride, 100 mM Tris–HCl pH 8.5. Crystallization information is summarized in Table 2.
Data collection, processing, structure solution and refinement
Crystals of CEACAM7 were washed and cryoprotected in mother liquor containing 20%(v/v) glycerol. Data were collected on beamline 23-ID-B at the Advanced Photon Source (APS), Argonne National Laboratory, USA. Data were processed using HKL-2000 (Otwinowski & Minor, 1997). Data-collection and processing statistics are shown in Table 3. Fobs were obtained using SCALEPACK2MTZ (Winn et al., 2011). Molecular replacement was performed using MOLREP (Vagin & Teplyakov, 2010) and a CHAINSAW (Stein, 2008) model of CEACAM5 (PDB entry 2qsq; Korotkova et al., 2008), a protein with 65% sequence identity to CEACAM7. CEACAM7 was refined with REFMAC (Murshudov et al., 2011) and rebuilt in Coot (Emsley et al., 2010). MolProbity (Chen et al., 2010) was used for Ramachandran analysis. Refinement statistics are shown in Table 4.
Table 3
Data collection and processing
Values in parentheses are for the outer shell.

Diffraction source                        Beamline 23-ID-B, APS
Wavelength (Å)                            1.0332
Temperature (K)                           100
Detector                                  MAR Mosaic 300 mm CCD
Crystal-to-detector distance (mm)         253.8
Rotation range per image (°)              0.2
Total rotation range (°)                  180
Exposure time per image (s)               0.3
Space group                               P21
a, b, c (Å)                               32.62, 64.89, 103.16
α, β, γ (°)                               90, 89.99, 90
Mosaicity (°)                             0.52
Resolution range (Å)                      103.16–1.47 (1.50–1.47)
Total No. of reflections                  205245
No. of unique reflections                 71599
Completeness (%)                          98.2 (82.7)
Multiplicity                              2.900 (1.80)
I/σ(I)                                    28.0 (1.57)†
Rr.i.m.‡                                  0.130 (0.666)
Overall B factor from Wilson plot (Å2)    18.3
CC1/2                                     0.993 (0.664)

† The data were extended owing to a reasonable CC1/2 value. An I/σ(I) of 2.0 is equal to a resolution of 1.57 Å.
‡ Estimated $R_{\mathrm{r.i.m.}} = R_{\mathrm{merge}}[N/(N - 1)]^{1/2}$, where N is the data multiplicity.
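The multiplicity correction in the Table 3 footnote can be illustrated with a short calculation. The sketch below is only an illustration of that formula: the function names are hypothetical, and the Rmerge values it back-calculates are not reported in the article; they simply follow from the tabulated Rr.i.m. and multiplicity values if the footnote relation is applied as written.

```python
import math


def rim_factor(multiplicity: float) -> float:
    """Return [N/(N - 1)]^(1/2) for a data multiplicity N greater than 1."""
    if multiplicity <= 1.0:
        raise ValueError("multiplicity must be greater than 1")
    return math.sqrt(multiplicity / (multiplicity - 1.0))


def rmerge_from_rrim(r_rim: float, multiplicity: float) -> float:
    """Invert R_rim = R_merge * [N/(N - 1)]^(1/2) to recover R_merge."""
    return r_rim / rim_factor(multiplicity)


# Values taken from Table 3 (overall and outer shell).
overall_rmerge = rmerge_from_rrim(0.130, 2.9)
outer_rmerge = rmerge_from_rrim(0.666, 1.8)

print(f"correction factor, overall: {rim_factor(2.9):.3f}")
print(f"correction factor, outer shell: {rim_factor(1.8):.3f}")
print(f"implied Rmerge, overall: {overall_rmerge:.3f}")
print(f"implied Rmerge, outer shell: {outer_rmerge:.3f}")
```

Because the outer-shell multiplicity is low (1.8), the correction factor there is large (1.5), which is one reason why a multiplicity-weighted statistic is quoted rather than Rmerge alone.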
Table 4
CEACAM7 structure refinement

Resolution range (Å)                 103.16–1.47 (1.50–1.47)
Completeness (%)                     97.9
Cutoff                               F > 0.000σ(F)
No. of reflections, working set      67967 (4156)
No. of reflections, test set         3613 (226)
Final Rcryst                         0.143 (0.280)
Final Rfree                          0.193 (0.309)
Cruickshank DPI                      0.0722
No. of non-H atoms
  Protein                            3529
  Ion                                3
  Ligand                             0
  Water                              269
  Total                              3801
R.m.s. deviations
  Bonds (Å)                          0.020
  Angles (°)                         1.901
Average B factors (Å2)
  Protein                            27.6
  Ion                                41.9
  Ligand                             0.0
  Water                              39.5
Ramachandran plot
  Favored regions (%)                98.8
  Additionally allowed (%)           0.9
PDB code                             4y89
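A few of the Table 4 entries can be cross-checked with simple arithmetic. The following sketch is a reader-side consistency check only, not part of the reported refinement protocol; the variable names are hypothetical and all numbers are copied from Table 4.

```python
# Reflection counts from Table 4.
working_set = 67967
test_set = 3613

# Fraction of reflections set aside for Rfree.
test_fraction = test_set / (working_set + test_set)

# Non-H atom counts from Table 4.
atoms = {"protein": 3529, "ion": 3, "ligand": 0, "water": 269}
total_atoms = sum(atoms.values())

# Gap between the free and working R factors.
r_cryst, r_free = 0.143, 0.193
r_gap = r_free - r_cryst

print(f"test-set fraction: {100 * test_fraction:.2f}%")
print(f"total non-H atoms: {total_atoms}")
print(f"Rfree - Rcryst: {r_gap:.3f}")
```

The test set amounts to roughly 5% of the reflections used, and the atom counts sum to the tabulated total of 3801.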
Analytical ultracentrifugation
Sedimentation-equilibrium measurements of CEACAM7 were performed using a Beckman–Coulter XL-I analytical ultracentrifuge equipped with a four-hole An-60 Ti rotor at 20°C. Prior to centrifugation, CEACAM7 was dialyzed extensively against 50 mM Tris–HCl pH 7.5, 50 mM NaCl. SEDNTERP (http://sednterp.unh.edu) was used to calculate values for the protein partial specific volume and solvent density from the protein amino-acid sequence and buffer composition, respectively. CEACAM7 at three different concentrations (24.1, 14.5 and 9.6 µM) was loaded into cells equipped with six-hole charcoal-filled Epon centerpieces (1.2 cm path length) with sapphire windows. Centrifugation was carried out at 29 000, 32 000 and 35 000 rev min−1 and scans were acquired at 280 nm with a step size of 0.001 and five averages per step. The data were globally analyzed using the WinNonLin program (Johnson et al., 1981).
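To give a feel for what a monomer–dimer equilibrium implies at these loading concentrations, the sketch below evaluates a simple mass-action model. It is not the WinNonLin global fit used here. It assumes that the dimerization constants reported in this article (95 nM for CEACAM7 and 0.8 µM for CEACAM5) are dissociation constants of the form Kd = [M]²/[D], and that loading concentrations are expressed in monomer units; the function names are hypothetical.

```python
import math


def monomer_conc(total_um: float, kd_um: float) -> float:
    """Free monomer concentration (µM) for 2M <-> D with Kd = [M]^2/[D].

    Mass balance in monomer units: total = [M] + 2[M]^2/Kd, which is a
    quadratic in [M]; the positive root is returned.
    """
    return (-kd_um + math.sqrt(kd_um**2 + 8.0 * kd_um * total_um)) / 4.0


def dimer_weight_fraction(total_um: float, kd_um: float) -> float:
    """Fraction of the protein mass that is present as dimer."""
    return 1.0 - monomer_conc(total_um, kd_um) / total_um


def weight_average_mass(total_um: float, kd_um: float, monomer_da: float) -> float:
    """Weight-average molecular mass (Da) of the monomer-dimer mixture."""
    return monomer_da * (1.0 + dimer_weight_fraction(total_um, kd_um))


LOADING_UM = (24.1, 14.5, 9.6)  # loading concentrations from the text
KD_CEACAM7_UM = 0.095           # reported for CEACAM7 (assumed to be a Kd)
KD_CEACAM5_UM = 0.8             # reported for CEACAM5 (Korotkova et al., 2008)
MONOMER_DA = 12574.0            # theoretical monomer mass quoted in the text

for conc in LOADING_UM:
    frac = dimer_weight_fraction(conc, KD_CEACAM7_UM)
    mass = weight_average_mass(conc, KD_CEACAM7_UM, MONOMER_DA)
    print(f"{conc:5.1f} µM: dimer fraction {frac:.3f}, weight-average mass {mass / 1000:.1f} kDa")

print(f"Kd ratio CEACAM5/CEACAM7: {KD_CEACAM5_UM / KD_CEACAM7_UM:.1f}")
```

Under these assumptions, more than 90% of the protein mass is dimeric at all three loading concentrations, and the weight-average mass at the highest concentration is about 24.6 kDa, close to twice the monomer mass. The ratio of the two constants is about 8.4, in line with the roughly tenfold difference in affinity described in this article.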
Results and discussion
A single X-ray diffraction data set was collected to a resolution of 1.47 Å. Initial indexing of the data suggested that the crystal contained a primitive orthorhombic lattice with two molecules in the asymmetric unit. However, attempts to find a molecular-replacement solution using the CEACAM5 monomer as a search model yielded no solutions that would refine in any of the orthorhombic space groups. The data were reprocessed in a primitive monoclinic lattice with a β angle of 89.99°, which led to the correct solution in space group P21 with four copies of the search model found in the asymmetric unit. 〈|L|〉 tests of the data (0.503) show that the data are untwinned and no pseudo-merohedral twinning was detected. All residues were built into the final model except for the initial alanine residues of two of the four molecules in the asymmetric unit. The final model contained three chloride ions and 269 waters. The final model was refined to an Rcryst and Rfree of 0.143 and 0.193, respectively. The structure factors and model have been deposited in the Protein Data Bank (PDB entry 4y89).
The closest homologs of CEACAM7 in the PDB are CEACAM1 and CEACAM5, which both share 65% sequence identity with CEACAM7, with 38 residues differing between the proteins (Fig. 1a). The overall fold of CEACAM7 is similar to those of the other CEACAMs that have been determined previously. The overall topology is that of the V-set fold of the immunoglobulin superfamily, composed of two β-sheets labeled ABED and A′GFCC′C′′. The sheets are connected by the BC, EF, C′′D and AA′ loops (Fig. 1b). The r.m.s.d.s of the CEACAM7 molecules to each other are low (∼0.30 Å), showing no structural differences within the asymmetric unit. The four molecules of CEACAM7 form two pairs of dimers (Fig. 1c). The dimer interface is formed from the second β-sheet, A′GFCC′C′′, specifically the GFCC′C′′ strands and the CC′, C′C′′ and FG loops (Fig. 1c). Dimerization buries 1610 Å2 of solvent-accessible surface area as calculated by PISA (Krissinel & Henrick, 2007). This is similar to the CEACAM1 and CEACAM5 homodimers, which bury 1600 and 1460 Å2 of solvent-accessible surface area, respectively. The shape-complementarity value (Sc; Lawrence & Colman, 1993) is 0.68, which is smaller than those for the other CEACAMs, with values of 0.81 and 0.72 for CEACAM1 and CEACAM5, respectively. CEACAM7 forms nine hydrogen bonds in the dimerization interface. This is more than CEACAM5 (six hydrogen bonds) but fewer than CEACAM1 (16 hydrogen bonds). Of the 38 residues in CEACAM7 that differ from CEACAM1 and CEACAM5, eight are found in the dimerization interface.
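The unit-cell contents described above can be checked for plausibility with a Matthews-coefficient estimate. This analysis is not reported in the article; the sketch below is a hedged back-of-the-envelope calculation that assumes two asymmetric units per cell for P21, four for a primitive orthorhombic space group of point group 222, the theoretical monomer mass of 12 574 Da quoted in this article, and the conventional protein-density constant of 1.23 Å3 Da−1. The function names are hypothetical.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit-cell volume (Å^3) of a monoclinic cell, V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(volume: float, asu_per_cell: int, mol_per_asu: int, mass_da: float) -> float:
    """Matthews coefficient V_M (Å^3 per Da) = V / (Z * n * M)."""
    return volume / (asu_per_cell * mol_per_asu * mass_da)


def solvent_fraction(v_m: float) -> float:
    """Approximate solvent fraction, 1 - 1.23/V_M (conventional estimate)."""
    return 1.0 - 1.23 / v_m


MONOMER_DA = 12574.0  # theoretical monomer mass quoted in the article
volume = monoclinic_cell_volume(32.62, 64.89, 103.16, 89.99)  # Table 3

# P21: two asymmetric units per cell, four molecules per asymmetric unit.
v_m = matthews_coefficient(volume, 2, 4, MONOMER_DA)

# Primitive orthorhombic (assumed point group 222): four asymmetric units,
# two molecules each. The number of molecules per cell is identical.
molecules_p21 = 2 * 4
molecules_ortho = 4 * 2

print(f"cell volume: {volume:.0f} Å^3")
print(f"V_M: {v_m:.2f} Å^3/Da, solvent fraction: {solvent_fraction(v_m):.2f}")
print(f"molecules per cell: P21 {molecules_p21}, orthorhombic {molecules_ortho}")
```

Both descriptions of the lattice place eight molecules in the unit cell, so this estimate cannot distinguish between them; it only indicates that the packing density is in a typical range for protein crystals under the stated assumptions.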
Figure 1
(a) Sequence alignment of CEACAM1, CEACAM5 and CEACAM7. Residues buried in the dimerization interface are shown in bold. (b) Overall topology of the CEACAM7 fold. (c) Side and top view of the CEACAM7 dimer. (d) Top, sedimentation-equilibrium experiment of CEACAM7 (28 µM) in 50 mM Tris–HCl, 50 mM sodium chloride pH 7.5 with rotor speeds of 29 000, 32 000 and 35 000 rev min−1 (blue, red and green curves, respectively). Bottom, residuals of fitted data for each curve.
The dimerization constant of CEACAM5 was determined previously to be 0.8 µM by analytical ultracentrifugation (Korotkova et al., 2008). CEACAM1 also dimerizes but forms high-molecular-weight oligomers (Korotkova et al., 2008). The dimerization constant of CEACAM7 was estimated by sedimentation-equilibrium analysis using analytical ultracentrifugation (Fig. 1d). An estimated average molecular weight of 24.7 ± 0.7 kDa (the theoretical monomer molecular weight is 12 574 Da) and a Kdimerization of 95 nM (+20/−60 nM) were measured, an approximately tenfold increase in affinity when compared with CEACAM5. We observed no higher molecular weight oligomers for CEACAM7 other than the dimer.
Superposition of the CEACAM7 dimer (A + B) onto the CEACAM5 dimer (A + B) was achieved using one half of each dimer (molecule A) and two r.m.s.d.s were calculated: one for the first half of the dimer (molecule A), which results in an r.m.s.d. of 0.67 Å, and the second for the second half, which shows a larger r.m.s.d. of 2.70 Å (molecule B; Fig. 2a). Superposition of CEACAM7 onto the CEACAM1 dimer using the same method reveals similar r.m.s.d.s of 0.83 Å for molecule A and 2.26 Å for molecule B (Fig. 2b). Closer comparison of CEACAM7 with CEACAM5 and CEACAM1 shows two major regions of deviation. The first is the BC loop (residues 23–29), which is not involved in the dimerization interface (Fig. 2c). This region is highly conserved among members of the CEACAM family; however, positions 25–26 of CEACAM7 differ from those of the other CEACAMs. In all other CEACAMs these residues are Leu25 and Pro26, whereas in CEACAM7 they are Glu25 and Ser26. The loss of Pro26 is likely to reduce the rigidity of the loop. The r.m.s.d. of this loop in CEACAM7 is 3.10 Å relative to CEACAM5 and the displacement of this loop causes a slight movement of the N-terminal β-strand. Notably, the preceding residue is Asn24, a unique N-linked glycosylation site found only in CEACAM7 and CEACAM4. Three other glycosylation sites are present in CEACAM7 (Asn52, Asn72 and Asn79). These are highlighted in Fig. 2(d), showing that none are found in the dimerization interface. Glycosylation of CEACAM5 has been shown to be important for interaction with CD8α (Roda et al., 2014) and therefore may also be important for CEACAM7 function.
Figure 2
(a) Superposition of the CEACAM7 dimer (CEA7A and CEA7B; red and pink, respectively) onto the CEACAM5 dimer (CEA5A and CEA5B; light and dark cyan, respectively) through molecule A. (b) Superposition of the CEACAM7 dimer (CEA7A and CEA7B; red and pink, respectively) onto the CEACAM1 dimer (CEA1A and CEA1B; blue and lilac, respectively) through molecule A. (c) Alignment of CEACAM7 and CEACAM5 monomers, colored by r.m.s.d. Dark blue is low r.m.s.d. and red is high r.m.s.d. CEACAM1 is omitted for clarity. (d) Potential N-linked glycosylation sites of the CEACAM7 dimer. All residues are solvent-exposed and are not found in the dimerization interface.
The second region of deviation between CEACAM7, CEACAM5 and CEACAM1 is the C′C′′ loop and the C′′ strand. CEACAM7 is unique compared with other CEACAMs, as the C′C′′ loop contains a single amino-acid insertion (isoleucine) between residues 52 and 53. To accommodate the insertion in the C′C′′ loop without altering the length of the C′C′′ or the C′′D loops, the C′′ strand is found to be distorted relative to CEACAM5 (Fig. 3a) and CEACAM1 (Fig. 3b). The C′′ strand bulges in the center (residues 56–60), causing breakages of hydrogen bonds in the antiparallel β-sheet between the C′ and C′′ strands. The C′′ strand of CEACAM7 is held in place through a hydrogen bond from the Oε2 atom of Asn57 to the N atom of Gly48 (Fig. 3c). In CEACAM5 the carbonyl group of Thr57 forms the hydrogen bond to Gly48 (Fig. 3d). Although the C′′ strand of CEACAM1, like that of CEACAM5, does not buckle, Thr57 of CEACAM1 does form a hydrogen bond across the dimerization interface to Asp95 (Fig. 3e). In CEACAM7, the buckling of the C′′ strand creates an extra interaction in the dimerization interface, potentially explaining why CEACAM7 forms such tight dimers and has not been found to form heterodimeric CEACAM complexes such as CEACAM6–CEACAM8, CEACAM1–CEACAM8, CEACAM1–CEACAM6, CEACAM3–CEACAM6 and CEACAM5–CEACAM6 (Oikawa et al., 1991; Kuroki et al., 2001; Skubitz & Skubitz, 2008; Singer et al., 2014). Although CEACAM1 does not buckle, it too creates an extra interaction across the dimerization interface through reorientation of Asp95. This also suggests that this hydrogen bond is important for a higher affinity interaction.
Figure 3
(a) The polypeptide backbone of the C′ and C′′ strands of CEACAM7 (red) and CEACAM5 (cyan). (b) The polypeptide backbone of the C′ and C′′ strands of CEACAM7 (red) and CEACAM1 (blue). (c) The insertion of Ser54 in CEACAM7 (red) results in breakage of the main-chain antiparallel hydrogen bonds and replacement with a hydrogen bond from the side chain of Asn57. This residue also forms a hydrogen bond across the dimer interface to Asn96 (brown). (d) In CEACAM5 (green), only the main-chain antiparallel hydrogen bonds exist. Thr57 is too short to form a hydrogen bond across the dimer interface to Asp95 (purple). (e) However, this is not the case for CEACAM1 (blue). Thr57 forms a hydrogen bond across the dimer interface to Asp95 (gray).
The structure of CEACAM7 reveals that the dimerization interface is formed by the same face (GFCC′C′′) as in other CEACAMs yet can accommodate 16 different residues (eight from each monomer), suggesting that these sequence differences can modulate the homodimerization to achieve a tenfold increase in affinity.
PDB reference: CEACAM7, 4y89
Authors: T Streichert; A Ebrahimnejad; S Ganzer; R Flayeh; C Wagener; J Brümmer Journal: Biochem Biophys Res Commun Date: 2001-11-23 Impact factor: 3.575
Authors: Mark S Duxbury; Hiromichi Ito; Eric Benoit; Michael J Zinner; Stanley W Ashley; Edward E Whang Journal: Oncogene Date: 2004-07-29 Impact factor: 9.867
Authors: Vincent B Chen; W Bryan Arendall; Jeffrey J Headd; Daniel A Keedy; Robert M Immormino; Gary J Kapral; Laura W Murray; Jane S Richardson; David C Richardson Journal: Acta Crystallogr D Biol Crystallogr Date: 2009-12-21
Authors: Bernhard B Singer; Lena Opp; Annina Heinrich; Frauke Schreiber; Ramona Binding-Liermann; Luis Carlos Berrocal-Almanza; Kerstin A Heyl; Mario M Müller; Andreas Weimann; Janine Zweigner; Hortense Slevogt Journal: PLoS One Date: 2014-04-17 Impact factor: 3.240
Authors: G Roda; X Jianyu; M S Park; L DeMarte; Z Hovhannisyan; R Couri; C P Stanners; G Yeretssian; L Mayer Journal: Mucosal Immunol Date: 2013-10-09 Impact factor: 7.313
Authors: Yong Huang; Sushila Dalal; Dionysios Antonopoulos; Nathaniel Hubert; Laura H Raffals; Kyle Dolan; Christopher Weber; Jeannette S Messer; Bana Jabri; Albert Bendelac; A Murat Eren; David T Rubin; Mitch Sogin; Eugene B Chang Journal: Inflamm Bowel Dis Date: 2017-03 Impact factor: 5.325
Authors: Meagan Belcher Dufrisne; Nicole Swope; Marissa Kieber; Jeong-Yeh Yang; Ji Han; Jason Li; Kelley W Moremen; James H Prestegard; Linda Columbus Journal: Structure Date: 2022-02-25 Impact factor: 5.871
Authors: Nina M van Sorge; Daniel A Bonsor; Liwen Deng; Erik Lindahl; Verena Schmitt; Mykola Lyndin; Alexej Schmidt; Olof R Nilsson; Jaime Brizuela; Elena Boero; Eric J Sundberg; Jos A G van Strijp; Kelly S Doran; Bernhard B Singer; Gunnar Lindahl; Alex J McCarthy Journal: EMBO J Date: 2021-02-01 Impact factor: 11.598