Proteoglycan form and function: A comprehensive nomenclature of proteoglycans.

Renato V Iozzo, Liliana Schaefer.

Abstract

We provide a comprehensive classification of the proteoglycan gene families and respective protein cores. This updated nomenclature is based on three criteria: Cellular and subcellular location, overall gene/protein homology, and the utilization of specific protein modules within their respective protein cores. These three signatures were utilized to design four major classes of proteoglycans with distinct forms and functions: the intracellular, cell-surface, pericellular and extracellular proteoglycans. The proposed nomenclature encompasses forty-three distinct proteoglycan-encoding genes and many alternatively-spliced variants. The biological functions of these four proteoglycan families are critically assessed in development, cancer and angiogenesis, and in various acquired and genetic diseases where their expression is aberrant.
Copyright © 2015.

Keywords:  Angiogenesis; Cancer growth; Glycosaminoglycan; Growth factor modulation; Proteoglycan

Year:  2015        PMID: 25701227      PMCID: PMC4859157          DOI: 10.1016/j.matbio.2015.02.003

Source DB:  PubMed          Journal:  Matrix Biol        ISSN: 0945-053X            Impact factor:   11.583


Introduction

It has been nearly 20 years since the original publication of a comprehensive classification of proteoglycan gene families [1]. For the most part, these classes have been widely accepted. However, a broad and current taxonomy of the various proteoglycan gene families and their products is not available. In contrast to the classification of glycosaminoglycans (GAGs), which is primarily based on the chemical structure of their repeating disaccharide units, classifying proteoglycans is a much more complex task [2]. We propose a comprehensive and simplified nomenclature of proteoglycans based on three criteria: cellular and subcellular location, overall gene/protein homology, and the presence of specific protein modules within their respective protein cores. Whereas the first two attributes have been utilized in the past for various nomenclatures, the third attribute is of more recent development and represents a sort of “intrinsic signature” for various protein cores. Indeed, modular design is based on the simple concept that protein cores are made up of finite units, like pieces of Lego. The units represent a minimum level of organization, and a module can be thought of as a functional domain that affects cell–matrix dynamics. Another key feature is that each module/functional unit can be stable and can fold on its own, without being part of the large precursor protein. Thus, a module is a self-contained component. An example of this is the LG3 domain of endorepellin, the C-terminal globular-like domain of perlecan, which has recently been crystallized [3]. Below, we will critically assess the field of proteoglycans, which now encompasses forty-three distinct genes and a much higher number of proteoglycans due to alternative splicing, thereby providing a very rich and biologically-active group of molecules.
As hyaluronan and the enzymes involved in the synthesis and degradation of various GAGs are not covered in this review, readers are referred to recent reviews covering these closely-related subjects [4-18].

General features

Four major proteoglycan classes encompass nearly all the known proteoglycans of the mammalian genome (Fig. 1). When proteoglycans are grouped by cellular and subcellular localization, there is only one intracellular proteoglycan, serglycin. This unique proteoglycan forms a class of its own as it is the only proteoglycan that carries heparin side chains. Serglycin is packaged in the granules of mast cells and serves as a biological glue for most of the intracellular proteases stored within the granules [19]. Another general observation is that heparan sulfate proteoglycans (HSPGs) are predominantly associated with the cell surface or the pericellular matrix. The HSPGs are intimately associated with the plasma membranes of cells, either directly via an intercalated protein core or via a glycosyl-phosphatidyl-inositol (GPI) anchor, and function as major biological modifiers of growth factors such as FGF, VEGF and PDGF, among others. Similar functions are also performed by the HSPGs located in the basement membrane zone, in addition to their ability to interact with each other and with key constituents of the basement membrane, including various laminins, collagen type IV, and nidogen. Presentation of growth factors to their cognate receptors in a biologically-favorable form is a major function of cell surface and pericellular HSPGs. Another key role is participating in the generation and long-range maintenance of morphogen gradients during embryogenesis and regenerative processes.
Fig. 1

A comprehensive classification of proteoglycans. The four families are based on their cellular and subcellular location, homology at the protein and genomic levels and the presence of unique protein modules which are often shared by members of a given class. The key for the various modules is provided in the bottom panel. For additional details about structure and function, please consult the text.

As we move away from the cells in a centrifugal manner, chondroitin- and dermatan sulfate-containing proteoglycans (CSPGs and DSPGs, respectively) predominate. These proteoglycans function as structural constituents of complex matrices such as cartilage, brain, intervertebral discs, tendons and corneas. Thus, among other functions, they provide viscoelastic properties, retain water and maintain osmotic pressure, dictate proper collagen organization and are the main molecules responsible for corneal transparency. The extracellular matrix also contains the largest class of proteoglycans, the so-called small leucine-rich proteoglycans (SLRPs), which are the most abundant in terms of gene number. These SLRPs can function both as structural constituents and as signaling molecules, especially when tissues are remodeled during cancer, diabetes, inflammation and atherosclerosis. SLRPs interact with several receptor tyrosine kinases (RTKs) and Toll-like receptors, thereby regulating fundamental processes including migration, proliferation, innate immunity, apoptosis, autophagy and angiogenesis. Below we will discuss the rationale for grouping certain proteoglycans in the same class and their overall biological function.

Intracellular proteoglycans

It is quite amazing that since the original cloning of serglycin, the first proteoglycan-encoding gene to be sequenced, no other true intracellular proteoglycan has been discovered. Serglycin occupies a class of its own insofar as it is the only proteoglycan that is covalently substituted with heparin due to its consecutive (and quite unique) Ser-Gly repeats, essentially a silk-like sequence. Serglycin is utilized primarily by mast cells for the proper assembly and packaging of the numerous proteases that are released upon inflammation [19]. The defects in the formation of mast cell granules observed in Srgn−/− mice are remarkably similar to those observed in mast cells derived from mice lacking N-deacetylase/N-sulfotransferase 2, a key enzyme involved in the sulfation of heparin [19]. Thus, serglycin promotes granular storage via electrostatic interaction between its highly-anionic heparin chains and basic residues within the various proteases of the secretory granules. It is becoming evident, however, that all inflammatory cells express serglycin and store it within intracytoplasmic granules where, in addition to proteases, serglycin binds and modulates the bioactivity of several inflammatory mediators, chemokines, cytokines and growth factors [20]. More recently, serglycin has been found in a wide variety of non-immune cells such as endothelial cells, chondrocytes and smooth muscle cells [21]. Cell-surface serglycin promotes adhesion of myeloma cells to collagen I and affects the expression of MMPs [22]. These findings have been corroborated by in vivo studies in which serglycin knockdown attenuates multiple myeloma growth in immunocompromised mice [23]. It has been proposed that some of these effects are mediated by a specific interaction between serglycin and cell-surface CD44 [23], a known receptor for hyaluronan [24,25].
It has been recently shown that serglycin is a key component of the cell inflammatory response in activated primary human endothelial cells as both LPS and IL-1β increase its synthesis and secretion [26]. Notably, serglycin can be substituted with chondroitin sulfate (CS), and in several circulating cells serglycin contains lower sulfated CS-4 chains [21]. In contrast, several hematopoietic cells (mucosal mast cells, macrophages etc.) express serglycin with highly sulfated CS-E. Although the significance of this phenomenon is not fully appreciated, it is likely that these isoforms of serglycin might have different functions in a cell-context specific manner. Serglycin is a marker of immature myeloid cells and interacts with many bioactive components including histamine, TNF-α and proteases [27]. In general, serglycin expression correlates with a more aggressive malignant phenotype and it has been recently proposed that serglycin protects breast cancer cells from complement attack, thereby supporting cancer cell survival and progression [28].

Cell surface proteoglycans

In this class, there are thirteen genes, seven encoding transmembrane proteoglycans and six encoding GPI-anchored proteoglycans. With the exception of two gene products, NG2 and phosphacan, all contain heparan sulfate side chains.

Syndecans

The name syndecan was coined by the late Merton Bernfield [29] to define a class of transmembrane proteoglycans that would connect (from the Greek syndein, “bind together”) the surface of the cells to the underlying extracellular matrix. The syndecan family now comprises four distinct genes encoding single-pass transmembrane protein cores which include an ectodomain, a transmembrane region and an intracellular domain [4,30] (Fig. 2). The ectodomains exhibit the lowest amount of amino acid sequence conservation, no more than 10–20%, in contrast to the transmembrane and cytoplasmic domains, which are 60–70% identical. A recent study has shown that the ectodomain of syndecans is natively disordered, and this characteristic allows syndecans to interact with a variety of proteins and ligands, thereby enriching their biological function [31]. The ectodomain contains the GAG attachment sites, which are often covalently linked to HS and sometimes to CS, making syndecans hybrid proteoglycans. Several cell types shed syndecans into the pericellular environment through the action of MMPs. For example, it has recently been shown that shed syndecan-2 retards angiogenesis by inhibiting endothelial cell migration [32], a key step in neovascularization [33]. The transmembrane domain contains a dimerization motif (GxxxG) that mediates both homo-dimerization and hetero-dimerization [30]. The intracellular domain is composed of two regions of conserved amino acid sequence (C1 and C2), separated by a central variable sequence of amino acids (V) that is distinct for each family member [34]. Notably, the C-terminus of all four syndecans harbors a unique signature (EFYA) that binds PDZ-containing proteins. Generally, PDZ-containing proteins contribute to the proper anchoring of transmembrane proteins to the cytoskeleton, thereby holding together large signaling complexes.
Fig. 2

Schematic representation of the cell surface proteoglycans, which comprise transmembrane type I (the N-terminus is outside of the plasma membrane) proteoglycans (four syndecans, CSPG4/NG2, betaglycan and phosphacan) and six GPI-anchored proteoglycans, glypicans 1–6. The type of GAG chain and the major protease sensitive sites are indicated. The key for the various modules is provided in the bottom panel.

Syndecans are involved in a wide variety of biological functions, too vast to be covered here, but reviewed recently [5,30,34]. Briefly, syndecans bind numerous growth factors, especially through their HS chains, and dictate morphogen gradients during development. In concert with other cell-surface HSPGs, syndecans can act as endocytosis receptors and are also involved in the uptake of exosomes [35]. Syndecans play key roles as co-receptors for many RTKs and can also function as receptors for atherogenic lipoproteins [36]. Indeed, there is strong genetic evidence that syndecan-1 is the main HSPG mediating clearance of triglyceride-rich lipoproteins derived from either the liver or from intestinal absorption [37]. Many, if not all, of the syndecans can also act as soluble HSPGs via partial proteolysis of their juxtamembrane region, which releases their whole ectodomains. This shedding is considered a powerful post-translational modification that can regulate both the amount of HSPG linked to the cell surface and the amount present in the pericellular microenvironment [30]. Several inflammatory cytokines can induce syndecan shedding by triggering outside-in signaling and by activating several metalloproteinases. In the case of hepatocytes, shedding of syndecan-1 occurs via PKC-dependent activation of ADAM17, and this impairs VLDL catabolism and promotes hypertriglyceridemia [38]. Importantly, soluble syndecan-1 promotes the growth of myeloma tumors in vivo [39], and this process, i.e. the shedding of syndecan-1, is enhanced by heparanase [40], thereby offering a novel mechanism for promoting cancer growth and metastasis [41,42]. Notably, chemotherapy stimulates syndecan-1 shedding, a potential drawback of the treatment that could favor tumor progression [43]. The biological interplay between heparanase-evoked shedding of syndecan-1 and myeloma cells leads to enhanced angiogenesis [44], further supporting cancer growth.
As mentioned above, however, shed syndecan-2 inhibits angiogenesis via a paracrine interaction with the protein tyrosine phosphatase receptor CD148, which in turn deactivates β1-containing integrins [32], presumably α1β1 and α2β1, two main angiogenesis receptors. In contrast, the zebrafish ortholog of syndecan-2 is required for angiogenic sprouting during development [45]. An emerging new role for syndecan-1 is linked to its ability to reach the nuclei in a variety of cells. Initial observations showed that myeloma and mesothelioma cells contain syndecan-1 in their nuclei [46,47] and that this nuclear translocation is also regulated by heparanase [46], indicating that there must be a cellular receptor for shed syndecan-1 that could mediate its nuclear targeting and transport. In support of these studies are previous observations that exogenous HS can translocate to the nuclei and modulate the activity of DNA Topoisomerase I [48] and histone acetyl transferase (HAT) [49]. Acetylation of the N-terminal tails of histones by HAT is linked to transcriptional activation, and this process is finely tuned by its counteracting enzyme, histone deacetylase (HDAC). Heparanase-evoked loss of nuclear syndecan-1 causes an increase in HAT enzymatic activity and enhances transcription of pro-tumorigenic genes [50]. Syndecan-1 that is shed from myeloma tumor cells is taken up by bone marrow stromal cells and is transported to the nuclei by a mechanism that requires its HS chains, as this process is inhibited by heparin and chlorate [51]. Once nuclear, soluble syndecan-1 binds to HAT p300 and inhibits its activity, thereby providing a new mechanism for tumor–host cell interaction and cross-talk [52].

CSPG4/NG2

The melanoma-associated chondroitin sulfate proteoglycan (MCSP) was discovered over 30 years ago as a transmembrane proteoglycan and a highly immunogenic tumor antigen of melanoma tumor cells. This proteoglycan has been subsequently detected in various species, with many names designating the same gene product. The rat ortholog of MCSP is called nerve/glial antigen 2 (NG2) [53], while the term CSPG4 designates the human gene. We will use CSPG4/NG2 terminology with the idea that some of the functional properties have not been fully described in the human and rat species [54]. CSPG4/NG2 is a single-pass, type I transmembrane proteoglycan carrying one chondroitin sulfate chain, and harboring a large ectodomain composed of three subdomains (Fig. 2). The N-terminal domain (D1 subdomain) contains two laminin-like globular (LG) repeats. It is likely that the LG domains, as in other proteoglycans (i.e. perlecan and agrin, see below), mediate ligand binding, cell–matrix and cell–cell interactions, as well as interaction with integrins and RTKs. The central subdomain D2 contains 15 tandem repeats of a new module called CSPG [54]. The CSPG repeat is a cadherin-like and tumor-relevant module which is predicted to be involved in cell–matrix interaction, further modulated by the CS chain covalently attached to this module. Indeed, CSPG modules bind to collagens V and VI, FGF and PDGF. The juxtamembrane subdomain D3 contains a carbohydrate modification able to bind integrins and galectin, as well as numerous protease cleavage sites. Accordingly, the intact ectodomain and fragments thereof can be detected in sera from normal and melanoma-carrying patients [54]. The transmembrane domain of CSPG4/NG2 is quite interesting insofar as it has a unique Cys residue, generally not found in transmembrane regions.
The intracellular domain harbors a proximal region with numerous Thr phospho-acceptor sites for PKCα and ERK1/2, and a distal region encompassing a PDZ-binding module similar to that of the syndecan family. The latter can bind to the PDZ domain of several scaffold proteins involved in intracellular signaling, including syntenin, MUPP1 and GRIP1. Functionally, CSPG4/NG2 proteoglycan promotes tumor vascularization [55] and, because of its predominant perivascular localization, CSPG4/NG2 may modulate the availability of FGF at the cell surface as well as the bioactivity and signal transduction of FGF receptors [56]. This CSPG binds to collagen VI in the tumor microenvironment and promotes cell survival and adhesion via the PI3K pathway [57]. Indeed, targeting CSPG4/NG2 in two animal models of highly-malignant brain tumors reduces tumor growth and angiogenesis [58]. Moreover, a combinatorial treatment using activated natural killer cells and a monoclonal antibody toward CSPG4/NG2 is capable of eradicating glioblastoma xenografts more efficiently than single therapies [59]. It has recently been discovered that NG2 controls the directional migration of oligodendrocyte precursor cells by constitutively stimulating RhoA GTPases [60]. Based on the ability of NG2 to regulate adhesion, RhoA GTPase and growth factor activities, it is likely that this transmembrane proteoglycan might play a key role in regulating cell polarity in response to extracellular cues [61]. Perdido/Kon-tiki, the Drosophila ortholog of mammalian CSPG4, genetically interacts with integrins during Drosophila embryogenesis, and its loss is embryonic lethal [62]. RNAi-mediated suppression of Perdido/Kon-tiki in the muscles, just before adult myogenesis starts, induces misorientation and detachment of Drosophila adult abdominal muscle, generating a phenotype similar to the embryonic lethal one [63].
Thus, it is possible that, based on its high conservation across species, mammalian CSPG4 could also play a role in myogenesis and muscle function. A recent study has added another function to CSPG4 by involving this cell surface proteoglycan in the pathogenesis of severe pseudomembranous colitis. CSPG4 acts as a receptor for the Clostridium difficile toxin B, one of the key toxins secreted by this gram-positive and spore-forming anaerobic bacillus [64]. The interaction occurs between the N-terminus of CSPG4 and the C-terminus of toxin B. This discovery, if confirmed in future studies, could open new therapeutic avenues for the treatment of this severe and often lethal form of enterocolitis.

Betaglycan/TGFβ type III receptor

In 1991, two back-to-back papers reported on the isolation and cloning of a membrane-anchored proteoglycan with high affinity for TGFβ, and thus named betaglycan [65,66]. Betaglycan, also known as TGFβ type III receptor (TGFBR3), is a single-pass transmembrane proteoglycan that belongs to the TGFβ superfamily of co-receptors (Fig. 2). The extracellular domain contains several potential GAG attachment sites and protease-sensitive sequences near the plasma membrane. The short intracellular domain is highly enriched in Ser/Thr (>40%) and some of these residues are candidate sites for PKC-mediated phosphorylation [65]. The amino acid sequence of betaglycan is highly similar to that of endoglin, a close member of the same superfamily. The membrane-proximal ectodomain of betaglycan contains a unique module called zona pellucida (ZP)-C [67]. The ZP module is a structural element typically found in the ectodomain of eukaryotic proteins and is composed of a Cys-rich bipartite structure joined by a linker. Generally, proteins harboring ZP modules tend to polymerize and assemble into long fibrils of specialized extracellular matrices [67]. In the case of betaglycan and endoglin, these ZP modules are not utilized for polymerization; rather, they function as membrane co-receptors for the TGFβ superfamily members [68]. The intracellular domain contains a PDZ-binding element similar to that observed in the syndecan family of proteoglycans (Fig. 1). Betaglycan is a ubiquitously-expressed cell surface proteoglycan that acts as a co-receptor for members of the TGFβ superfamily of Cys knot growth factors, which also includes activins, inhibins, GDFs and BMPs [69,70]. For example, betaglycan enhances the binding of all the TGFβ isoforms to the signaling TGFβ complex [71] and is needed for TGFβ2 high-affinity interaction with the receptor complex. Betaglycan also blocks the aggressiveness of ovarian granulosa cell tumors by suppressing NF-κB-evoked MMP2 expression [72].
Betaglycan, together with the TGFβ-binding SLRPs decorin and biglycan (see below), can be cleaved by granzyme B, thereby releasing an active form of TGFβ [73]. Ectodomain shedding of betaglycan is indeed necessary for betaglycan-mediated suppression of TGFβ signaling and breast cancer migration and invasion [74]. The ability of betaglycan to affect epithelial-mesenchymal transformation [70], together with genetic evidence of embryonic lethality in Tgfbr3−/− mice, suggests that betaglycan may play a unique and non-redundant function during development. Another important feature of betaglycan is its ability to modulate the subcellular topology of the signaling receptor complex via its PDZ-binding domain, which interacts with PDZ-containing proteins such as β-arrestin [75]. This interaction, as well as that between the betaglycan intracellular domain and GIPC, would stabilize betaglycan at the cell surface and potentiate its bioactivity. Finally, betaglycan is involved in regulating many functions including reproduction and fetal growth [75], and is a putative tumor suppressor in many forms of cancer [76]. Several additional betaglycan-evoked activities have been recently reviewed elsewhere [75].

Phosphacan/receptor-type protein tyrosine phosphatase β

Phosphacan, originally isolated from rat brain, is a CSPG that interacts with neurons and neural cell-adhesion molecules (N-CAM) and corresponds to the soluble ectodomain of a receptor-type protein tyrosine phosphatase β (RPTPβ) [77]. The phosphacan gene (PTPRZ1) encodes a single-pass type I membrane protein with a relatively large ectodomain harboring an N-terminal module homologous to the alpha-carbonic anhydrase (Fig. 2). Distal to this, there is a fibronectin type III domain. The ectodomain contains six Ser-Gly repeats, at least four of which are flanked by acidic residues, suggesting potential glycanation sites. Sporadically, phosphacan can also be substituted with keratan sulfate chains. Notably, alternative splice variants encoding different protein isoforms have been described but their full-length nature has not yet been established. Functionally, the ectodomain of phosphacan mediates cell–cell adhesion by homophilic binding. In addition, phosphacan's ability to bind N-CAM and tenascin in a calcium-dependent manner suggests that RPTPs may also modulate cellular interactions via heterophilic mechanisms [77]. Indeed, phosphacan blocks the growth-promoting ability of N-CAM, axonin-1/TAG-1 and tenascin, and is crucial in the oriented movement of post-mitotic cells during cortical development of the brain [78]. Moreover, phosphacan binds contactin, another member of the Ig superfamily like N-CAM, and the extracellular portion of the voltage-gated sodium channel [79]. The latter interaction appears to be mediated by the carbonic anhydrase-like module of phosphacan's ectodomain. It has been proposed that phosphacan, as an integral extracellular matrix constituent of the neural stem cell compartment, would contribute to the privileged microenvironment that supports self-renewal and maintenance of the neural stem cell niche [80].

Glypicans/GPI-anchored proteoglycans

Glypicans (GPC) are HSPGs that are bound to the plasma membrane via a C-terminal lipid moiety known as a GPI (glycosylphosphatidylinositol) linkage or anchor (Fig. 2). There are six independent genes in the mammalian genome which can be subdivided into two broad classes, GPC1/2/4/6 and GPC3/5, with orthologs present across Metazoa, including Dally and Dlp in Drosophila melanogaster [81]. Although most of the protein core is unique to this family, there is a stretch of amino acids in the ectodomain of the protein core with similarity to the Cys-rich domain of Frizzled proteins. There are two unique features in the structural organization of all glypicans, with potentially important functional implications. First, and in contrast to syndecans, the attachment of the GAG chains – mostly HS chains – is located near the juxtamembrane region. This allows the three linear HS chains to span a great deal of plasma membrane surface, thereby presenting various morphogens and growth factors in an active configuration to their cognate receptors. Indeed, glypicans bind to and modulate the activity of Hedgehog (Hh), Wnt, and FGFs [82-84]. More recently, it has been shown that glypican-3 binds to Frizzled, thereby acting directly in the modulation of canonical Wnt signaling [85]. Second, glypicans are dually processed by proteases and lipases. In the former case, the ectodomain of glypicans is processed via endoproteolytic cleavage by a furin-like convertase. This processing generates two subunits that are then bound via disulfide bonds, in a way similar to the Met receptor. In the latter case, the entire glypican proteoglycan is released from the plasma membrane via an extracellular lipase (Notum) that cleaves the GPI anchor. Drosophila studies have shown that the Notum-mediated release of glypican can regulate morphogen gradients including Wnt, BMP and Hh gradients [84].
Notably, the anchorless GPC-1, devoid of the GPI anchor, is a stable α-helical protein that resists high concentrations of urea and guanidine HCl [86]. Unfolding data are consistent with a two-state model, suggesting that the GPC-1 protein core is a densely-packed globular protein. In agreement with these data, the crystal structure of the Drosophila glypican Dally-like protein has revealed an extended α-helical fold [87]. The crystal structure of human GPC-1 is very similar to Drosophila Dally-like, and consists of a stable α-helical domain with 14 conserved Cys residues, followed by a GAG attachment site that is exclusively substituted with HS chains [88]. Of interest, removal of the α-helical domain leads to substitution with CS chains instead of HS chains, indicating that there is a “message” embedded in the α-helical domain that drives a different posttranslational modification [88]. Functionally, glypicans have been implicated in the control of tumor growth and angiogenesis. For example, glypican-3 has been implicated in cancer and growth control. Human mutations of GPC3 cause the rare X-linked Simpson–Golabi–Behmel (SGB) syndrome, characterized by both pre- and post-natal overgrowth, abnormal craniofacial features, cardiovascular anomalies, renal dysplasia and urinary tract malformations [84]. Originally, it was hypothesized that GPC3 was an inhibitor of IGF-II, given the prominent function of IGF-II in developmental growth. However, it was later found that the levels of IGF-II do not change in Gpc3−/− mice, nor does GPC3 interact with IGF-II. It appears that GPC3 is an inhibitor of Hh signaling, insofar as the Hh-dependent signaling activity is elevated in Gpc3−/− mice. Moreover, purified glypican-3 binds with high affinity to Indian and Sonic Hh and competes with Patched for Hh binding [83,89].
A recent study has shown that processing by convertases is required for GPC3-evoked suppression of Hh signaling, and this process is dependent on the HS chains and their degree of sulfation [90]. Thus, the glypican family is not only complex in nature, but is also under the control of various modifying enzymes (proteases and lipases) that modulate its biological activity. We are positive that many “surprises” will happen in the future regarding unsuspected biological functions of various glypicans.

Pericellular and basement membrane zone proteoglycans

This group of four proteoglycans is closely associated with the surface of many cell types anchored via integrins or other receptors, but they can also be a part of most basement membranes. Pericellular proteoglycans are mostly HSPGs and include perlecan and agrin, which share homology especially at their C-termini, and collagens XVIII and XV, which share homology at their N- and C-terminal noncollagenous domains (Fig. 1).

Perlecan

Perlecan is a modular HSPG encoded by a large gene [91,92] with a complex promoter [93-95]. The ~500-kDa protein core is composed of 5 domains with homology to SEA, N-CAM, IgG, LDL receptor and laminin [96,97] (Fig. 3). The terminal LG3 domain has been crystallized and reveals a jellyroll fold characteristic of other LG modules [3]. Perlecan is expressed by both vascular and avascular tissues [97-101], and is ubiquitously located at the apical cell surface [102,103] and basement membranes [98,104-106]. Perlecan regulates various biological processes primarily because of its widespread distribution [101,105] and its ability to interact with various ligands and RTKs [107]; more recently, the potential utilization of perlecan splice variants in mast cells has also been described [108]. Perlecan is an early responsive gene and is induced by TGFβ [109] and repressed by interferon γ [95]. The heparan sulfate chains of perlecan and the protein core can be cleaved by heparanase and various proteases [110-112], respectively, releasing various pro-angiogenic factors [113].
Fig. 3

Schematic representation of the pericellular proteoglycans, which comprise perlecan, agrin, and collagens XVIII and XV. The collagenous (COL) and non-collagenous (NC) domains of collagen XVIII are numbered on the top and bottom of the lower schematics. For brevity, only the structure of collagen XVIII is shown. The key for the various modules is provided in the bottom panel.

Perlecan is involved in modulating cell adhesion [114,115], lipid metabolism [116], thrombosis and cell death [117,118], biomechanics of blood vessels and cartilage [119-121], skin and endochondral bone formation [122,123], and osteophyte formation [124]. Perlecan binds and modulates the activity of several growth factors and morphogens [106,125-129] and its expression is often deregulated in several types of cancer [130-134]. In Drosophila, perlecan, known as Trol (for terribly reduced optic lobes), regulates Fgf and Hh signaling to activate neural stem signaling [135,136]. In addition, Trol is essential for the architecture and maintenance of the lymph gland and for the proliferation of blood progenitor cells [137]. Loss of Trol is associated with premature differentiation of hemocytes and this phenotype can be rescued by ectopic expression of Hh [137]. In mice, Hspg2 controls neurogenesis in the developing telencephalon [138]. Moreover, perlecan can act as a lipoprotein receptor and mediate lipoprotein endocytosis and catabolism [116]. Specifically, domain II of perlecan has been shown to bind low density lipoproteins and this interaction is mediated by the O-linked oligosaccharides [139], suggesting an important role for perlecan in atherogenesis and lipid retention. Perlecan is a complex regulator of vascular biology and tumor angiogenesis [33,140,141] by performing a dual function: via the N-terminal HS chains, perlecan is pro-angiogenic [96] by binding and presenting VEGFA and various FGFs to their cognate receptors [33,141-152]. Moreover, heparanase-mediated cleavage of basement membrane perlecan releases FGF10 and enhances salivary gland branching morphogenesis [153]. Indeed, ablating Hspg2 or preventing Hspg2 expression in early embryogenesis causes severe cardiovascular defects [154-157].
The critical role for the N-terminal HS chains of perlecan has been elegantly demonstrated by the generation of mice, designated Hspg2Δ3/Δ3, harboring a genomic deletion of exon 3, which encodes the SGDs responsible for the covalent attachment of HS chains [158]. These mutant mice have impaired angiogenesis, delayed healing after experimental wounding and suppression of tumor growth [159]. When challenged with flow cessation of the carotid artery, the Hspg2Δ3/Δ3 mice show an enhanced intimal hyperplasia and smooth muscle cell proliferation [160,161]. Moreover, during mouse hind-limb ischemia, the HS chains of perlecan are key regulators of the angiogenic response [162]. Collectively, these studies reaffirm the role of HS perlecan in modulating pro-angiogenic factors such as FGF2, VEGFA and PDGF. More recently, other functions of perlecan have been discovered. Using a lethality-rescued Hspg2−/− mouse in which perlecan was reintroduced into the cartilage, it was found that perlecan deficiency leads to significant depression of endothelial nitric oxide synthase [163]. This leads to endothelial cell dysfunction, as shown by attenuated endothelial relaxation, likely as a consequence of reduced endothelial nitric oxide synthase expression. This is another example of how a secreted HSPG affects the biology of vascular endothelial cells, likely through a receptor-mediated signaling pathway. Another recently unveiled function of perlecan is its ability to bind the clustering molecule gliomedin [164]. In this case, perlecan binds dystroglycan at nodes of Ranvier, which are required for fast conduction and accumulation of Na+ channels. Perlecan seems to enhance clustering of nodes of Ranvier components via a specific interaction with gliomedin. Thus, perlecan may have specific roles in the biology and pathophysiology of peripheral nodes [164]. 
In contrast to the pro-angiogenic N-terminal domain I, the C-terminal processed form of perlecan domain V, named endorepellin [165], has a nearly opposite function: it inhibits endothelial cell migration, capillary morphogenesis, and in vivo angiogenesis [166-169]. A global proteomic analysis of human serum has identified endorepellin as a major circulating protein [170]. Moreover, endorepellin has been detected in extracts of fetal cartilage, exclusively in the hypertrophic zone, and it was speculated that processing of perlecan protein core in the growth plate could play a role in inhibiting blood vessel invasion or formation in cartilage [171]. Elevated endorepellin/LG3 peptides were found in the plasma proteome of patients with refractory cytopenia with multilineage dysplasia [172], and in the urine of end-stage renal failure patients [173]. These LG3 fragments had N-terminal residues (i.e., cleaved by BMP-1) identical to those reported by us [174]. Similar LG3 fragments are elevated in the urine of patients with chronic allograft nephropathy [175,176], in the amniotic fluid of pregnant women [177] with a marked increase in women with premature rupture of fetal membranes [178,179] and those carrying trisomy 21 fetuses [180]. Recently, LG3 peptides have been proposed to represent a potential marker of physical activity [181]. Endorepellin fragments have also been detected in the urine of children with sleep apnea [182], in the media conditioned by apoptotic endothelial cells [118,183,184], and in the secretome of pancreatic and colon carcinoma cells [174,185-188]. Endorepellin can be pro-angiogenic in brain infarcts due to the lack of anti-angiogenic α2β1 integrin and the presence of the pro-angiogenic α5β1 integrin receptor for endorepellin in brain microvascular endothelial cells [189]. In this context, LG3 can be released by oxygen-glucose deprivation and can be neuroprotective [190,191]. 
Finally, circulating LG3 levels are reduced in patients with breast cancer, suggesting that reduced LG3 titers might be a useful biomarker for cancer progression and invasion [192]. Mast cells produce shorter forms of perlecan including functional endorepellin, suggesting a potential role of endorepellin in inflammation and tissue repair [193]. Moreover, MMP-7 processing of perlecan in the prostate cancer stroma acts as a molecular switch to favor cancer invasion [112]. Thus, processed forms of perlecan protein core harboring domains III and IV can function as protumorigenic factors. Endorepellin binds to the α2β1 integrin receptor [140,166,194], and tumor xenografts generated in α2β1−/− mice are insensitive to systemic delivery of endorepellin [168]. Endorepellin triggers the activation of the tyrosine phosphatase SHP-1 which, in turn, dephosphorylates and inactivates various RTKs including VEGFR2 [195]. Soluble endorepellin alters the proteomic profile of human endothelial cells [196], and exerts a dual receptor antagonism by concurrently targeting VEGFR2 and the α2β1 integrin [197]. Notably, the proximal LG1/2 domains bind the Ig3–5 domain of VEGFR2 while the terminal LG3 domain, released by BMP-1/Tolloid-like metalloproteinases [174], binds the α2β1 integrin [198]. This dual signaling causes: (a) disassembly of actin filaments and focal adhesions, via the α2β1 integrin, leading to suppression of endothelial cell migration [198,199], and (b) activation of SHP-1, which dephosphorylates Tyr1175, a key residue in the cytoplasmic tail of VEGFR2, with consequent transcriptional inhibition of VEGFA [200]. More recently, we have discovered that endorepellin induces autophagy in endothelial cells via VEGFR2 signaling [201], similar to decorin (see below). This novel function could contribute to the angiostatic properties of this interesting fragment of perlecan protein core.

Agrin

The second pericellular/basement membrane HSPG is agrin. A C-terminal portion of agrin lacking HS chains was first isolated from the Torpedo electric organ as an agent responsible for acetylcholine receptor (AChR) clustering, thereby the eponym agrin, from the Greek ageirein, meaning “to assemble” [202]. The majority of the research on agrin in mammals has focused on agrin's contribution to the control of the postsynaptic apparatus in the neuromuscular junction. However, after many years of research, it was serendipitously discovered that agrin was indeed an HSPG interacting with N-CAM in the avian brain [203]. Subsequently, orthologs of agrin have been cloned from multiple species and are all highly homologous. Agrin has a multimodular structural organization that is homologous to that of perlecan with potential generation of several splice isoforms. The N-terminal region can be spliced to generate either a Type II transmembrane form (TM) of agrin, highly expressed in nervous tissue, or an isoform associated with most basement membranes that contains the N-terminal-agrin (NtA) domain (Fig. 3). In the central nervous system, TM agrin is highly expressed by axons and dendrites; thus, neurite-associated TM agrin could potentially function as a receptor or co-receptor for neurite function. The NtA domain has high affinity for the laminin γ1 chain's coiled-coil domain, thereby functioning as a link between the cell surface and the basement membrane. Following the N-terminal domain is a stretch of nine follistatin-like (FS) repeats, also known as Kazal-type protein inhibitor domains [204]. The last two repeats are separated by an insertion of two laminin EGF-like (LE) domains. Notably, overexpression of TM agrin in non-neuronal cells induces filopodia-like processes similar to those induced in CNS neurites, and this bioactivity was localized to FS repeat seven [205]. 
Thus, FS modules can modulate an important biological activity of neurons by affecting the reorganization of the actin cytoskeleton during active neurite growth. Following the FS repeats, there are two Ser/Thr (S/T)-rich domains which can be alternatively spliced (especially the second S/T module) to generate an X+/− form [204]. The two S/T modules are separated by a SEA module, similar to that of perlecan (see above), known to be involved in regulating O-glycosylation of mucins and glycoproteins. The N-terminal and central regions of agrin protein core contain the attachment sites for HS chains, and rotary shadowing electron microscopy has revealed three attachment sites for HS chains [206]. However, agrin can be a hybrid HS/CSPG with two clusters of Ser-Gly sequences, one primarily carrying HS chains located between FS repeats 7 and 8, and one carrying mostly CS chains, located in the first S/T module [207]. An agrin fragment harboring all protein modules described so far inhibits neuronal outgrowth independently of HS or CS [208]. The HS chains of agrin, however, bind FGF2, thrombospondin, β-amyloid peptide, N-CAM, and the protein tyrosine phosphatase δ [209]. The C-terminus of agrin is structurally organized as perlecan domain V/endorepellin, with three LG domains separated by EGF-like modules (Fig. 3). The only difference is the position of the EGF repeats vis-à-vis the LG domains. The LG domains of agrin bind α-dystroglycan in skeletal muscle and low-density lipoprotein receptor-related protein 4 (LRP4) [210]. The latter interaction activates the RTK MuSK which initiates a signaling cascade that leads to the formation of pre- and post-synaptic specializations. The terminal LG3 domain of agrin can be alternatively spliced with inserts of 8, 11 and 19 residues and their bioactivity is influenced by Ca2+ binding [211]. Moreover, the overall function of agrin is regulated by site-specific processing via MMPs [212]. 
Agrin is a good example, together with perlecan, of the evolved mechanisms in molecular recognition and function achieved through utilization of common protein folds, such as LG modules [211]. Thus, both agrin and perlecan bind, via their LG-rich C-termini, multiple cell surface receptors including RTKs, and can potently modulate cardiovascular and musculoskeletal systems. Importantly, conjugation of LG modules of agrin and perlecan to polymerizing laminin-2 evokes clustering of acetylcholine receptors [213]. These data provide strong support for a cooperative function of basement membrane HSPGs in AChR assembly and function. Of interest, recessive missense mutations in the AGRN gene cause congenital myasthenic syndromes characterized by defective neuromuscular transmission [214]. More recently, AGRN recessive missense mutations have been identified as a causative factor for a congenital myasthenic syndrome with distal muscle weakness and atrophy, resembling distal myopathy [215]. Given the large number and heterogeneity of neuromuscular disorders, it is likely that new syndromes linked to genetic abnormalities of the AGRN gene will be identified in the future.

Collagens XVIII and XV

Collagens XVIII and XV, two members of the “multiplexin” gene family [216-220], harbor structural features of collagens and proteoglycans, being substituted with HS and CS, respectively [221]. Like agrin, collagen XVIII was serendipitously discovered to be an HSPG when monoclonal antibodies were used against an unidentified avian HSPG [222]. Subsequent cloning and sequencing of the cDNA showed that this avian HSPG protein core shows high homology to the mammalian collagen XVIII. Collagen XVIII is a homotrimer comprised of three identical α1 chains and consists of ten interrupted collagenous domains, flanked by eleven noncollagenous domains at their respective N- and C-termini. Collagen XVIII also harbors three Ser-Gly consensus binding sites for the attachment of HS chains [223] (Fig. 3). The human COL18A1 gene can generate three protein variants derived from alternative promoter usage and splicing events [221]. Specifically, COL18A1 can produce a short variant, a middle variant containing a TSP-1 module, and a long variant containing an additional Frizzled repeat. The latter is missing in collagen XV. Both collagens XVIII and XV contain a C-terminal noncollagenous domain harboring the antiangiogenic endostatin and endostatin-like modules. Specifically, the NC1 domain consists of an N-terminal trimerization region, a central hinge region sensitive to proteolytic activity and the C-terminal endostatin domain (Fig. 3). Endostatin interacts with numerous receptors including integrins α5β1, αvβ3 and αvβ5 [224,225] and VEGFR2 [226]. Interestingly, endostatin, in analogy to endorepellin, is capable of inducing autophagy in endothelial cells by modulating Beclin 1 and β-catenin levels [227]. These findings suggest that C-terminal anti-angiogenic fragments of pericellular HSPGs may evoke endothelial cell autophagy which could contribute to their angiostatic properties. 
The signaling network evoked by soluble endostatin leads to a downregulation of several key components of the VEGF signaling cascade and, concurrently, to a stimulation of the synthesis of thrombospondin [228], a powerful angiostatic protein [229,230]. Both collagens XVIII and XV are ubiquitously expressed in all vascular and epithelial basement membranes of human and mouse tissues, with an overall topography reminiscent of that of perlecan and agrin. Notably, Col18a1−/− mice show multiple ocular abnormalities, especially affecting the anterior portion of the eyes [231,232]. In humans, mutations in the COL18A1 gene cause Knobloch syndrome, a rare autosomal recessive disease characterized by high myopia, vitreoretinal degeneration and retinal detachment [233,234]. Col18a1−/− mice show enhanced neovascularization and vascular permeability during atherosclerotic disease progression [235], and loss of this gene in both mice and humans leads to hypertriglyceridemia [236]. Moreover, Col18a1−/− mice display enhanced angiogenesis during wound healing [237]. In contrast to Col18a1−/− mice, Col15a1−/− mice show normal vascular formation but primarily develop a skeletal myopathy [238]. However, microscopic changes in the small arterioles with collapsed capillaries and endothelial cell degeneration in heart and skeletal muscles are also noted [238]. Collectively, these findings implicate collagen XVIII as a negative regulator of angiogenesis and as an anti-atherosclerotic factor. Collagen XV may function as a key structural constituent required for the stabilization of skeletal muscle cells and microvessels [238], and recently both collagens XV and XVIII have been implicated in mediating the influx of leukocytes in renal ischemia/reperfusion [239]. Of interest, mice lacking the long form of collagen XVIII (i.e., the N-terminal frizzled-like sequence) but producing the short form exhibit a decreased number of pre-adipocytes, hepatic steatosis and elevated VLDL and triglyceride levels [240]. 
Thus collagen XVIII is directly implicated in the generation of adipose tissue and in hyperlipidemia associated with visceral obesity and fatty liver.

Extracellular proteoglycans

This is the largest class, encompassing 25 distinct genes in three subclasses. Four genes encode the hyalectans, key structural components of cartilage, blood vessels and the central nervous system. They all bind hyaluronan and form supramolecular complexes of high viscosity. The second subclass encompasses 18 SLRPs, which have a multitude of functions and often signal through various receptors, as many members are now found in the circulation and in various body fluids. The third subclass, the SPOCK family, encompasses 3 testicans, which are calcium-binding HSPGs.

Hyaluronan- and lectin-binding proteoglycans (hyalectans)

Hyalectans comprise a distinct family of proteoglycans with structural similarities at both the genomic and protein levels. This family contains four distinct genes, namely aggrecan, versican, neurocan, and brevican (Figs. 1 and 4). A shared feature of these proteoglycans is their tridomain structure: an N-terminal domain that binds hyaluronan, a central domain harboring the GAG side chains, and a C-terminal region that binds lectins [2]. Based on this dual activity at the N- and C-termini, the term hyalectans, an acronym for hyaluronan- and lectin-binding proteoglycans, has been proposed [1]. Alternate exon usage and variability in the degree of glycanation and glycosylation provide diverse functional attributes for these proteoglycans which often act as molecular bridges between cell surfaces and extracellular matrices.
Fig. 4

Schematic representation of the hyaluronan- and lectin-binding proteoglycans (hyalectans), which comprise aggrecan, versican, neurocan and brevican. The full-length versican (V0) and the three splice variants lacking GAGα (V1), GAGβ (V2) or both GAGα and GAGβ (V3) are shown. A new variant, V4, containing a portion of GAGβ is not shown. A GPI-anchored form of brevican is also not shown in the graphic. The dotted circles specify the globular domains (G1–G3) shared by the other hyalectans. These modules are composed of ~100 amino acids and have a characteristic consensus sequence with four disulfide-bonded Cys residues. The key for the various modules is provided in the top right panel.

Aggrecan

Aggrecan, as its eponym indicates, has the propensity to aggregate into large supramolecular complexes > 200 MDa together with hyaluronan and link protein, and is the principal load-bearing proteoglycan of cartilage [241]. These large aggregates generate a densely-packed, hydrated gel enmeshed in a network of reinforcing collagen fibrils and other proteoglycans and glycoproteins [242]. The N-terminal domain contains four link protein-like modules or proteoglycan tandem repeats in addition to the Ig-like repeat (Fig. 4). The entire link module is ~100 amino acids in length and has a characteristic consensus sequence with four disulfide-bonded Cys residues. These modules form two globular domains known as G1 and G2 [243]. The G1 domain is related to link protein and to the other G1 domains of the hyalectans, both in terms of structural domains and subdomains [243]. The G1/hyaluronan/link protein ternary complex is very stable, thereby immobilizing aggrecan into enormous complexes that maintain a stable network and provide mechanical properties to cartilage. An interglobular region, between G1 and G2, has a rod-like structure and harbors several protease-sensitive sites involved in the partial degradation of aggrecan in arthritis and other inflammatory diseases. Following the G2 domain is a relatively small region containing numerous KS chains. This domain is not well conserved and its size varies significantly among species. Next is the largest domain of aggrecan, which contains the GAG-binding region. This protein domain is encoded by a single, very large (~4 kb) exon with ~120 Ser-Gly dipeptide repeats, which can generate >100 covalently-linked CS chains. The high density of negative charges within aggrecan accounts for its ability to hold large amounts of water, not only in cartilage, but also in the intervertebral disc and brain. 
Moreover, electrostatic repulsion forces generated by the numerous negatively-charged CS and KS chains of aggrecan provide the equilibrium compressive modulus (a measure of stiffness) of cartilage. In humans, a variable number of tandem repeats generates different alleles in the general population, ranging between 13 and 33 repeats, causing great variability in the degree of aggrecan glycanation and negative charge (due to sulfation) within cartilage. The G3 module of aggrecan contains 2 EGF-like repeats, a C-type lectin domain and a complement regulatory protein (CRP) domain. Notably, the EGF repeats can be alternatively spliced, in part because in rodents exon 13 is a pseudoexon. Moreover, in rodent brain, the most common aggrecan species lacks both EGF repeats [244]. As in the case of other hyalectans, the C-type lectin domain of aggrecan binds simple sugars, such as fucose and galactose, in a Ca2+-dependent manner. Thus, aggrecan G3 may serve as a binding domain for the galactose present on collagen type II or other extracellular matrix or cell surface constituents. Moreover, the G3 domain of aggrecan interacts with tenascins, fibulins and sulfated glycolipids [245]. Thus, aggrecan could bridge and interconnect various constituents of the cell surface and extracellular matrix via its C-terminal G3 domain, thereby providing a mechanosensitive feedback to the chondrocytes. Indeed, epiphyseal chondrocytes grown on hydrogel substrata can maintain their phenotype for up to six months with proper secretion of cartilage-specific constituents, such as aggrecan, and collagens type II and IX, but without expressing collagen type I [246]. The essential role of aggrecan in cartilage is underscored by several genetic defects including two autosomal recessive chondrodystrophies, nanomelia in chickens and cartilage matrix deficiency (cmd) in mice [247]. 
In nanomelia, the defect leads to the formation of a C-terminal truncated aggrecan, while in cmd mice there is an even larger C-terminal truncation. In both mutant animals, there is little or no aggrecan in cartilage leading to shortened long bones and lethality, most likely due to respiratory failure arising from tracheal collapse [247]. Aggrecan is also involved in the morphogenesis of limb synovial joints and articular cartilage [248], and fragments of aggrecan represent biomarkers for osteoarthritis [249]. Aggrecan is also expressed in the brain, and unlike other hyalectans, is expressed primarily in the perineuronal nets [79]. A relatively small number of cortical neurons express aggrecan, especially the cortical interneurons [244]. One of the hypothesized functions of brain aggrecan is its potential regulation of neural maturation, in addition to its physical ability to adduct cations and regulate osmotic imbalances. Thus, aggrecan could affect high-rate synaptic transmission, mechanical stabilization of synaptic contacts and neuroprotection by counteracting oxidative stress via scavenging redox-active cations [244].

Versican

Versican, an eponym that signifies its highly versatile function [250], is the largest member of the hyalectan family when expressed as a whole molecule, designated V0 (Fig. 4). Versican is the mammalian counterpart of the so-called PG-M, a large chondroitin sulfate proteoglycan expressed during chondrogenesis in chick limb buds [251,252]. The VCAN gene, originally called CSPG2 [253-255], encompasses 15 exons encoding a full-length (V0 variant) protein core of ~400 kDa, with 3396 amino acid residues. The overall structural organization of versican is similar to that of aggrecan, with a few exceptions. At the N-terminus there is only one globular domain instead of two. Specifically, the N-terminal domain of versican contains one IgG fold followed by two consecutive link protein modules similar to G1, which are involved in mediating the binding of proteins to hyaluronan. Recombinant versican and a truncated form of versican containing the N-terminal domain bind to hyaluronan with high affinity, KD ~ 4 nM, in the same range as the other major aggregating CSPG, aggrecan [256]. The central domain of versican comprises two relatively large subdomains, designated GAGα (encoded by exon 7) and GAGβ (encoded by exon 8), which can be alternatively spliced to generate the three main variants V1, V2 and V3 [255], with significant CS polymorphism in the different versican isoforms. These large regions lack Cys residues and contain ~30 potential consensus sequences for GAG attachment as well as several binding sites for N- and O-linked oligosaccharides. There is also variability in tissue expression of the isoforms, with V0 and V1 representing the most ubiquitous isoforms, expressed in the developing heart and limbs, vascular smooth muscle cells and several nonneuronal tissues, whereas the V2 isoform is mainly present in the brain [79]. 
Expression of the V3 isoform in arterial smooth muscle cells regulates multiple signaling pathways, including TGFβ, EGF and NF-κB pathways, thereby creating a microenvironment resistant to monocyte adhesion [257]. Recently, a new splice variant of versican, V4, has been identified in human breast cancer, which contains up to five CS chains [258]. This isoform comprises only the first 1194 bp of exon 8 (encoding the GAGβ) sandwiched between exons 6 and 9, and is highly expressed in breast cancer in contrast to normal breast tissue where it is undetectable [258]. Notably, the avian versican ortholog harbors an additional exon, known as PLUS, in the N-terminal region that is developmentally regulated [259]. This exon can be alternatively spliced giving rise to two additional isoforms. Although no similar region is present in the mammalian genome, sequence homology suggests that the PLUS domain of avian versican may correspond to the KS attachment region in aggrecan. The C-terminal domain of versican is also very similar to that of aggrecan and other hyalectans in that it harbors similar structural motifs, including two EGF-like repeats, a C-type lectin domain, and a complement regulatory protein-like module (Fig. 4). These motifs are generally found in the selectin family of glycoproteins, which includes several adhesion receptors regulating leukocyte homing and extravasation during inflammation. Given that the various C-type lectin modules may have different saccharide-binding specificities, the presence of these domains at the C-terminal ends of hyalectans could provide specialized and refined functions for these CSPGs. Moreover, these findings suggest that versican may form a molecular link between lectin-containing glycoproteins at the cell surface and extracellular hyaluronan. Because hyaluronan is bound to the cell surface via its CD44 receptor [241,260], versican may also stabilize a large supramolecular complex at the plasma membrane zone [2]. 
The functional roles of versican are multiple and complex. Versican is involved in the regulation of cell adhesion, migration and inflammation [260-262]. During an inflammatory response, leukocytes need to emigrate from the blood vessels into the damaged surrounding tissues. During this process, leukocytes encounter a provisional matrix highly enriched in versican, which in turn is capable of interacting with many receptors on the surface of immune cells including CD44, P-selectin glycoprotein ligand-1, and Toll-like receptors [261]. Another important role of versican derives from the multiple processing of its protein core. Versican is degraded and partially processed by several MMPs, plasmin and members of the ADAMTS family [263,264]. Versican is also involved in the biology of leiomyosarcomas insofar as its levels are markedly increased vis-à-vis benign leiomyomas, and suppression of versican expression attenuates malignant growth and tumor progression [265]. Two autosomal dominant eye disorders, Wagner syndrome and erosive vitreo-retinopathy, which both show optically empty vitreous cavities, are caused by mutations in the VCAN gene [266]. Interestingly, the mutant alleles contain mutations around the splice sites flanking exon 8, which encodes the GAGβ domain, likely producing exon skipping. The ultimate consequence of exon skipping is that most tissues, and especially the eye, would lack the GAGβ domain and thus have far fewer CS chains and a less charged environment.

Neurocan and brevican

The third member of the hyalectans is neurocan, a developmentally regulated CSPG originally cloned from rat brain, and thus its eponym to signify neuronal origin [267]. Rotary shadowing electron microscopy of neurocan has revealed two globular domains interconnected by a 60–90 nm rod [268], similar to the predicted organization of other hyalectans derived from biochemical and genomic analyses. Like other hyalectans, neurocan has an N-terminal domain with structural homology to the typical arrangements found in link protein, harboring a G1 domain and an Ig repeat (Fig. 4). Functionally, the recombinant N-terminal module of neurocan interacts with hyaluronan in solution, as shown by gel permeation assays, and the isolated complexes comprise hyaluronan and globular profiles [268]. Therefore, it is highly likely that all the N-terminal domains of the hyalectans bind and interact with hyaluronan and link protein in vivo, forming gigantic supramolecular aggregates. The next interglobular region of neurocan, with little homology to other proteins, contains about seven potential CS binding sites. The C-terminal module of neurocan shares significant homology to the G3 domain of aggrecan and versican, with ~60% identity between the rat neurocan and human versican/aggrecan. By analogy to the other hyalectan members, this domain could bind several brain glycoproteins including Ng-CAM, N-CAM, and tenascin. Neurocan is known to inhibit neurite outgrowth in vitro and, in keeping with this function, the expression of neurocan is increased at the site of mechanical and ischemic injury in the adult central nervous system [78,269]. Neurocan has been implicated in path finding during development. However, Ncan−/− mice develop normally with only mild deficiency in long-term potentiation, suggesting that neurocan might only have a redundant role during development. Brevican is one of the most important hyalectans of the central nervous system. 
It takes its eponym from the Latin word brevis (for short) as it harbors a typical hyalectan configuration with N- and C-terminal homologous domains, but with a shorter GAG-binding domain (Fig. 4) [270,271]. Brevican was simultaneously discovered by three laboratories searching for hyaluronan-binding proteoglycans in the brain [271,272] and for synapse-associated proteins [273]. The eponym BEHAB, which is sometimes used for brevican as they are the same gene product, refers to brain-enriched hyaluronan binding protein [272]. Although sequence homology with the other hyalectan members is quite uniform (~60% overall), the GAG-binding domain is poorly conserved and contains a high content of acidic amino acid residues (mainly glutamic acid). This structural feature, shared with the link protein-like module of versican, could mediate binding to cationic proteins and minerals. Similar to neurocan, brevican exists in vivo either as a full-length proteoglycan or as a proteolytically-processed form lacking the GAG-binding region and the N-terminal domain. The C-terminal G3-like domain is structurally organized like that of the other hyalectans, although it harbors only one EGF-like repeat instead of two as in all the other members (Fig. 4). In addition to secreted full-length brevican, an isoform of brevican encoded by a shorter 3.3 kb mRNA and highly expressed during post-natal development is linked to the plasma membrane via a GPI anchor [273]. Notably, the GPI-anchored brevican lacks EGF, C-type lectin and CRP modules but contains a stretch of hydrophobic amino acids resembling the GPI-anchor. Brevican is located at the outer surface of neurons and is enriched at perisynaptic sites. Brevican interacts with tenascin-R and fibulin-2 via its G3-like domain [274]. 
Functionally, brevican has been implicated in glioma tumorigenesis, nervous tissue injury and repair, and in Alzheimer's disease [274]. However, many more studies are needed before a clear picture of brevican's biology can be drawn.

Small leucine-rich proteoglycans/SLRPs

General considerations

This is the largest family of proteoglycans, encompassing 18 distinct gene products and numerous splice variants and processed forms. The eponym SLRP, for small leucine-rich proteoglycans [1], is now a widely-used abbreviation. SLRPs designate a class of proteoglycans characterized by a relatively small protein core (as compared to the larger aggregating proteoglycans) of 36–42 kDa and encompassing a central region constituted by leucine-rich repeats (LRRs) (Fig. 5) [275]. The SLRPs are ubiquitously expressed in most extracellular matrices and are highly expressed during development in the thin membranes enveloping all the major organs such as meninges, pericardium, pleura, periosteum, perichondrium, perimysium and endomysium [276-278]. This strategic topology suggests that SLRPs would be directly involved in regulating organ size and shape during embryonic development and homeostasis [279,280].
Fig. 5

Phylogenetic tree of the small leucine-rich proteoglycans (SLRPs) and crystal structure of bovine decorin. (A) Dendrogram of the five human SLRP classes, numbered and color-coded. Protein sequences were first aligned with CLUSTALW before an unrooted dendrogram was generated by a neighbor joining method using GenomeNet. (B) Cartoon ribbon diagram of the crystal structure of monomeric bovine decorin rendered with Pymol v1.7 (PDB accession number 1XKU). Vertical arrows indicate β-strands, while coiled ribbons indicate α-helices. The leucine-rich repeats (LRRs) are numbered above the diagram. The sequence (SYIRIADTNIT) involved in binding to collagen type I [306,307] is highlighted in yellow. The terminal LRR Cys capping motif, known as the ear repeat, is also indicated [299].

The 18 SLRP members are grouped into five classes: Classes I–III are canonical genes, whereas Classes IV and V are non-canonical (Fig. 1). Although eight non-canonical members do not carry glycosaminoglycan side chains, they have been included because they share close structural homology and several functional properties with the full-time proteoglycans. This classification is based on several considerations, including evolutionary conservation, homology at both the protein and genomic level, and chromosomal organization (Fig. 5A) [281]. It is important to note that SLRPs share many biological functions in terms of binding to various collagens [282-286], RTKs [287-290], innate immune receptors [291,292] and in terms of modulating the bioactivity of various signaling pathways when in soluble form [293-295]. Moreover, several SLRPs bind TGFβ and bone morphogenetic protein (BMP), and several members of this family inhibit cell growth [296,297]. The crystal structure of bovine decorin [298] shows a solenoid fold structure typical of LRRs (Fig. 5B). Each LRR unit is composed of ~24 amino acids, characterized by a conserved pattern of hydrophobic residues, with short parallel β-sheet on the concave face interwoven with loops containing short β-strands, 3₁₀ helices and polyproline II helices on the convex (outer) side of the protein core (Fig. 5B). The LRRs form a curved, solenoid structure where protein/protein interactions occur primarily via the side chains of variable residues protruding from the short parallel β-strands that form the inner (concave) face of the solenoid. The LRRs are flanked at the N- and C-termini by disulfide-bonded caps which define the various classes [277]. At the N-terminus, there are four Cys residues with a variable number of intervening amino acids, whereas the C-terminal capping motif encompasses two LRRs and includes the so-called ear repeat (Fig. 5B).
This Cys-capping motif, designated LRRCE, is present in the canonical SLRPs (Classes I–III) but absent in the other two non-canonical classes [299]. Likely, both capping motifs at either end of SLRPs Class I–III would function to stabilize the LRR central domain as in the case of other LRR proteins and receptors. Another characteristic feature of Class I–III SLRPs is the presence of a long penultimate LRR (LRR XI in decorin), that has been called the “ear” repeat [300]. Typically, the ear repeats contain 30 or more amino acid residues including an atypical sequence harboring a Cys located at about 10 residues after the asparagine residue in the consensus LRR [300]. Genetic mutations in the decorin gene leading to a terminal truncation of the decorin protein core, lacking the ear repeat, cause congenital stromal corneal dystrophy [301]. This syndrome has been faithfully reproduced in mice where this truncated decorin was specifically expressed in the cornea [302,303]. Although bovine decorin has been crystallized as an anti-parallel dimer [298] and reported to be a dimer in solution [304], there is strong evidence that decorin acts as a monomer in solution [293], especially when interacting with the small binding site on the EGFR ectodomain in vivo where a dimer could not fit the cavity [305]. Also supportive of a concave face binding is the identification of the sequence (SYIRIADTNIT) in LRR VII (highlighted in yellow in Fig. 5B) of the decorin protein core that is directly involved in binding to collagen type I [306,307]. A recent study utilizing mutant forms of mouse decorin, where engineered glycosylated sites in the concave face prevent dimerization, has shown that the monomeric mutants are as stable as the wild-type in solution [308]. The concave face mutants fail to bind collagen, regardless of the dimerization state, thus providing robust biological evidence for a concave face-mediated binding (i.e., monomeric decorin) to collagen [308].
A hallmark shared by nearly all SLRPs, and by most LRR-containing proteins, is their propensity to interact with other proteins and to regulate collagen fibrillogenesis [282,283,309,310]. For example, several SLRPs interact with fibrils of collagen types I, II, III, V, VI and XI. Indeed, the eponym “decorin” derives from its ability to decorate fibrillar (banded) collagen in a periodic fashion, that is, decorin protein core non-covalently binds to an intraperiod site on the surface of collagen fibrils about every 67 nm, once every D period [311,312]. In highly purified α1(I) procollagen molecules, decorin protein core binds close to an intermolecular cross-linking site near the C terminus [313]. SLRP coating of various types of collagen serves a dual function: it regulates the lateral association of collagen molecules into proper fibrils, and protects collagen fibrils from proteolysis by sterically limiting the access of collagenases to their cleavage sites. It is notable that, during evolution, these dual functional properties of SLRPs have come to be shared by both their sulfated GAGs and protein cores. Notably, a few SLRP members contain stretches of amino acids that are sulfated or acidic, such as the poly-Tyr sulfate in fibromodulin or the poly-Asp region in asporin. Often, the GAGs are located in the N-terminus, in a location that is similar to that of these charged amino acid stretches, and can be directly involved in collagen interaction [314,315]. An additional degree of complexity is provided by the heterogeneous structure of the GAG chains. For instance, Class I SLRPs contain CS or DS chains, with the exception of asporin, ECM2, and ECMX. In contrast, Class II members contain poly-lactosamine or KS chains in their LRRs and sulfated Tyr residues at their N-termini. Class III members contain CS/DS (epiphycan), KS (osteoglycin), or no GAG (opticin).
Finally, the non-canonical Class IV and V members lack GAG chains with the exception of chondroadherin, which is substituted with KS. The biological functions of SLRPs are vast and there are over 3000 published papers on decorin alone, the archetypal and most studied SLRP. Thus, we refer the readers to recent comprehensive and specialized reviews on SLRPs [275,281–283,294,307,316–325]. Moreover, it has been proposed that SLRPs can be transcriptionally co-regulated through utilization of HOX-Runx modules in their promoters and genomic regions, including proximal exons and intergenic regions [326]. Below is a brief overview of each family with emphasis on recent discoveries of their multiple functional roles in physiological and perturbed states.

Class I SLRP

Decorin, also known as PG40 and DSPG2, was originally cloned from a fibroblast cDNA library [327], and subsequently named decorin because of its ability to decorate collagen fibrils [328]. Specifically, decorin protein core is a Zn2+ metalloprotein [329,330] that is biologically active in solution as a monomer [293]. As mentioned above, decorin protein core binds non-covalently to an intraperiod site on the surface of collagen fibrils about every 67 nm, at the D period [312]. Using purified collagen and procollagen molecules, that can be visualized by their C-terminal globular regions, it has been shown that decorin protein core binds near the C terminus of collagen α1(I), near an intermolecular cross-linking site [313]. Not only the protein core but also the N-terminal GAG chain of decorin plays a role in collagen fibrillogenesis and structure [285,314,315,331-334]. The strategic location of the GAG binding domain in the N-terminus of decorin allows a higher degree of mobility for the DS chain, which presumably could align orthogonally or parallel to the axis of the collagen fibrils. This dual function of decorin could help in maintaining corneal transparency and biomechanical properties of various connective tissues [282,284,335]. The decorin gene exhibits a complex genomic organization and transcriptional control [276,336-338] and its transcription can be induced by quiescence and suppressed by TNFα [339,340]. It was known for many years that the small DSPG of tendon, mostly decorin, is capable of inhibiting lateral growth of collagen fibrils [309]. Thus, when the decorin-null mice were generated, the first targeted deletion of a proteoglycan-encoding gene, the abnormal collagen structure in the dermis and the skin fragility phenotype [310] provided the first genetic evidence for a regulatory role for the prototype member of the SLRP gene family in collagen fibrillogenesis.
The phenotype of the decorin-deficient mice includes abnormal collagen fibril morphology in the skin and tail tendon, presumably because the fibrils are less stable during development owing to abnormal cross-linking or enhanced susceptibility to collagenase. The prevalent phenotype of the decorin-null mice is skin fragility caused by a thinning of the dermis with concurrent reduced tensile strength, a biomechanical impairment directly linked to the abnormal collagen network. Overall, the phenotype of the Dcn−/− mice resembles the cutaneous defects observed in the Ehlers–Danlos syndrome, characterized by skin hyperextensibility and tissue fragility [341], in a way opposite to fibrosis [342]. Due to their mild phenotype, the Dcn−/− mice have been utilized by a large number of investigators using many experimental challenges and have provided strong genetic evidence for decorin roles in Lyme disease [343,344], lung mechanics and asthma [345,346], diabetic nephropathy and tubulointerstitial fibrosis [347-350], myocardial infarction [351], corneal transparency and tendon biomechanical properties [352-356], dentin mineralization and periodontal homeostasis [357-359], hepatic fibrosis and hepatocellular carcinoma [318,360-362], collagen fibrillogenesis [314,363,364], fetal membrane biology [365-367], wound healing and angiogenesis [368-373], innate immunity and inflammation [291,374,375], adhesion and migration [376], and mesenchymal stem cell biology [377]. Decorin plays an important role during zebrafish development insofar as zDcn knockdown causes a severe phenotype characterized by abnormal convergent extension, craniofacial abnormalities, and cyclopia [278]. As these genetic defects are reminiscent of several zebrafish mutants affecting the non-canonical Wnt signaling pathway, it is possible that decorin might also play a role in this pathway in mammals.
Indeed, a recent study has shown that decorin is directly involved in modulating the signaling pathway of Wnt3a shaping niches supportive of hematopoiesis [378]. Mutations in the decorin gene have been linked to congenital stromal corneal dystrophy (CSCD) syndrome [301,379] where a truncated form of decorin lacking the ear repeat, the C-terminal 33 amino acids, acts in dominant negative fashion. A corneal knock-in transgenic mouse lacking the C-terminal 33 amino acid residues (952delTDcn) faithfully recapitulates the human phenotype of corneal opacities [302]. Mechanistically, the C-terminal truncated form of decorin is retained in the cytoplasm of keratocytes, triggering ER stress and an unfolded protein response [380]. These data provide a cell-based, rather than ECM-based, interpretation of the CSCD phenotype whereby a truncated SLRP protein core, by inducing ER stress, causes an abnormal processing and secretion of decorin and other SLRPs, eventually generating an abnormal matrix assembly and corneal opacities. Decorin was the first proteoglycan to be directly involved in the control of cell growth. Two seminal papers identified decorin as a growth suppressor, via a mechanism involving decorin's binding to and inhibiting TGFβ in Chinese hamster ovary cells [381,382]. Concurrently, decorin was identified as a proteoglycan highly expressed in the tumor stroma of colon carcinomas [383], primarily via hypomethylation of its promoter regions [384]. It was soon recognized, however, that the growth of most malignant cells does not depend on the availability of TGFβ. Thus, there had to be other signaling receptors for the growth suppressive function of decorin. The existence of such receptor(s) was supported by an emerging body of literature describing that ectopic expression of decorin or its protein core suppresses the malignant phenotype in malignant cells of various histogenetic backgrounds [385,386].
Utilizing A431 cells, a squamous carcinoma cell line which overexpresses EGFR, it was discovered that exogenous decorin proteoglycan or protein core transiently activated the EGFR to induce growth inhibition via expression of the cyclin-dependent kinase inhibitor p21 [287,387,388]. Indeed, decorin binds to a narrow region of the EGFR, partially overlapping with but distinct from the EGF-binding epitope [305]. Mechanistically, decorin transiently activates the EGFR and elevates cytosolic Ca2+ in A431 cells [389], but it causes a sustained down-regulation of this RTK, thereby providing a plausible mechanism for controlling tumor growth in vivo in various forms of cancer [390-392]. Specifically, soluble decorin evokes protracted internalization and degradation of the EGFR via caveolar endocytosis [393]. An anti-oncogenic role for decorin has been also demonstrated in its ability to inhibit another member of the ErbB family, namely the ErbB2/Neu, in this case by inhibiting heterodimerization of ErbB4 with ErbB2, thereby leading to growth suppression and cytodifferentiation of mammary carcinoma cells [394]. It was subsequently found that decorin binds specifically and with higher affinity (KD ~ 2 nM) to the hepatocyte growth factor receptor, known as Met [288], and causes proteasomal degradation of Myc and β-catenin, two critical downstream effectors of Met [395]. An important downstream effect of the decorin/Met interaction is induction of two anti-angiogenic proteins, Thrombospondin 1 and TIMP3, with concurrent inhibition of two powerful pro-angiogenic factors, HIF-1α and VEGFA [371,372]. Moreover, decorin binds and suppresses both the IGF-IR [289,396,397] and VEGFR2 [371,398]. Loss of decorin in the tumor stroma correlates with poor survival in patients with invasive breast carcinomas [275,399,400] and in mice with spontaneous breast cancer [401].
Moreover, decorin is markedly reduced in the stroma of many solid tumors [402-404], as well as low- and high-grade bladder carcinomas, but is highly expressed in the normal bladder stroma [397]. Decorin levels are also decreased in multiple myeloma [405,406], soft tissue sarcomas [407], prostatic [408], urothelial [409-411] and hepatic [362,412] carcinomas, together with a complete loss of decorin expression by several tumor cells [413,414]. Additional proof for an oncostatic role of decorin as a soluble tumor repressor stems from genetic models wherein ablation of decorin under conditions of a high-fat, western-type diet, is linked to the spontaneous appearance of intestinal tumors [415,416]. Moreover, compound Dcn−/−;Tp53−/− mice die of aggressive T-cell lymphomas much sooner than mice lacking only the tumor suppressor Tp53 [417]. Notably, systemic delivery of decorin, either as a soluble factor or via adenoviral gene delivery, significantly retards tumorigenic and angiogenic growth in a wide variety of malignant solid tumors [413,418-424]. Collectively, these findings provide strong support to the concept that decorin could act as a “guardian from the matrix” in analogy to p53, a guardian of the genome [414]. Thus, decorin could become a potent therapeutic factor, either alone or in combination with traditional chemotherapy, in preventing tumor progression and metastasis [297]. Recently, it was discovered that soluble decorin evokes excessive autophagy in endothelial cells, independently of nutrient deprivation, through partial agonistic activity on VEGFR2 [425]. This signaling cascade emanating from the decorin/VEGFR2 interaction leads to two effects. First, it activates AMPKα and Vps34, which in turn stimulate the synthesis of Peg3 [426], a recently-identified master regulator of autophagy [422]. Peg3 recruits LC3 and Beclin 1, which evoke autophagy, and concurrently induces transcription of both genes, while inhibiting VEGFA production [425]. 
These multiple biological roles of decorin would converge on oncostasis by suppressing RTK signaling in the growing cancer cells and inhibiting the supply of oxygen and nutrients via hindering angiogenesis and inducing a protracted, and in this case deleterious, stromal cell autophagy [427]. In view of the fact that decorin has been found in the circulation in nanomolar amounts [428-430], at concentrations similar to those used in the experimental studies mentioned above, and as plasma decorin is significantly increased in cancer patients [291], it is plausible that this endogenous tumor repressor might have a physiological role in vivo. Biglycan, decorin's closest homolog, was originally isolated from bovine bone and then, following its cloning and sequencing, was found to contain two Ser-Gly attachment sites in the N-terminal region, thus its eponym meaning two GAG chains [431]. Both the human and mouse genes have an overall similar exonic arrangement [432,433]. It is highly homologous to decorin, with > 65% overall homology. Similar to decorin, biglycan binds TGFβ [434] and modulates its bioactivity [435]. Ablation of the biglycan gene (Bgn−/0; this genetic symbol reflects the location of the Bgn gene on the X chromosome), which has a ubiquitous tissue distribution and a pronounced expression in bone [433,436], reveals a key function for this SLRP in regulating postnatal skeletal growth [437]. In general, the long bones in Bgn−/0 mice grow more slowly than those of wild-type littermates and eventually are shorter and exhibit reduced bone mass. The latter is secondary to the marked decline in number of osteoblasts with concurrent progressive depletion of the bone marrow stromal cells [437]. These mutant mice also display delayed osteogenesis after marrow ablation [438], broader metadentin, and altered dentin mineralization, causing significant enamel structural defects.
Thus, biglycan-deficient mice could be a promising animal model to study skeletal diseases and osteoporosis [439]. Although Dcn−/− mice also show abnormalities in bone collagen fibril size and organization, they show neither overt bone mass defects nor abnormal osteoblast growth as in the case of biglycan deficiency. These findings underline non-overlapping functions that have evolved for these two homologous Class I SLRPs. Biglycan modulates BMP-4-induced osteoblast differentiation [440], and it also binds Chordin and BMP-4 in Xenopus embryos, thereby blocking BMP-4 activity [441]. Moreover, biglycan affects the Wnt signaling pathway [442], in analogy to decorin (see above). However, a recent study has shown that biglycan acts as a pro-angiogenic stimulus in contrast to decorin, suggesting that these two functions are SLRP-specific. This pro-angiogenic activity of biglycan is mediated by its binding to VEGFA and its potentiation of the VEGFR2 signaling pathway [443]. This bioactivity favors fracture healing via a pro-angiogenic stimulus, a process that is markedly attenuated in the absence of biglycan [443]. Biglycan can also affect cell growth by inducing the cyclin-dependent kinase inhibitor p27 in pancreatic carcinoma cells [444], as well as myofibroblast differentiation and proliferation by modulating the TGFβ/Smad2 signaling pathway [445]. A significant paradigmatic shift for biglycan biology was the discovery that this SLRP is proinflammatory and binds to Toll-like receptors (TLR)-2 and -4 [446]. The key observation again came from genetic studies where biglycan-deficient mice show a greater survival rate than wild-type when subjected to lethal LPS-induced sepsis. Mechanistically, biglycan is highly produced and secreted by circulating macrophages, thereby acting as a danger signaling molecule for the innate immunity receptors TLR-2/4 [446] and by activating the inflammasome via TLR-2/4 and the purinergic P2X receptors [447]. 
Indeed, biglycan-evoked TLR-2/4 activation exacerbates the outcome of ischemic acute renal injury [448] and induces the synthesis and secretion of several chemo-attractants in the kidney, thereby enhancing the inflammatory damage [449]. Some of the pro-inflammatory roles of biglycan affect the pulmonary parenchyma as well via a receptor cross-talk [323,450]. Recently, it has been proposed that biglycan could act as a biomarker of inflammatory renal diseases [451]. Thus, an emerging picture is appearing where biglycan, often in contrast to its “first cousin” decorin, links the innate to the adaptive immunity, thereby operating in a broad biological environment including microbial and non-microbial pathogenesis, and cancer growth and inflammation [452]. Another Class I SLRP is asporin, also known as PLAP-1 or periodontal ligament-associated protein 1. Asporin was originally isolated from cartilage extracts of human patients with early osteoarthritis and was soon recognized to share homology with other SLRP members [453]. Its eponym derives from the N-terminal region enriched in aspartic acid and its homology to decorin. Asporin, although being similar to decorin and biglycan in its overall structure, does not contain Ser-Gly dipeptides capable of GAG substitution; thus it might not have GAG chains [454]. The overall tissue distribution of asporin is similar to that of decorin [276], with high expression detected in the skeleton and other mineralized connective tissues, but with minimal expression detected in all parenchymal organs [454]. Asporin is located on human chromosome 9 and is a member of the chromosomal SLRP gene cluster that includes osteoadherin, osteoglycin and ECM2. The N-terminal polyaspartate domain binds calcium and regulates hydroxyapatite formation [455]. 
Moreover, asporin and decorin compete for binding to collagen via LRR10–12, and asporin's role in biomineralization is further corroborated by its expression in osteoblast progenitor cells [456], key players in intramembranous bone formation. Asporin antagonizes chondrogenesis in articular cartilage by interfering with the TGFβ1/receptor interaction on the cell surface and by inhibiting the canonical TGFβ/Smad signaling pathway [457]. Specifically, suppression of ASPN gene expression via siRNA leads to increased expression of TGFβ1 [457], which in turn stimulates the expression of asporin indirectly via upregulation of Smad3 [456]. In agreement with these concepts is the discovery that a polymorphism in the polyaspartate region of asporin (D14 allele) is strongly associated with osteoarthritis. Moreover, the frequency of the D14 allele increases with disease severity [458]. Asporin is expressed at high levels in the more degenerate human intervertebral discs [459]. Moreover, asporin suppresses the TGFβ-evoked expression of aggrecan and collagen type II and reduces proteoglycan accumulation in an in vitro model of chondrogenesis, again both prominently linked to the D14 allele [458]. Thus, asporin and TGFβ1 form a regulatory feedback loop to fine tune chondrogenesis. Recently it has been reported that asporin is highly expressed in the cancer-associated fibroblasts of scirrhous gastric carcinomas [460]. In this case, asporin promotes invasion by neighboring cells in a paracrine fashion by activating the CD44-Rac1 pathway [460]. Finally, ECM2 and ECMX, two poorly-studied Class I SLRPs, are related to decorin, being ~35% homologous to the LRRs of decorin. However, both SLRPs have a larger size and contain an RGD sequence known to bind integrin receptors and a von Willebrand Factor-like domain [461]. ECM2 is predominantly expressed in adipose tissue and in female organs such as mammary gland, ovary and uterus [461].
Interestingly, the ECM2 gene is physically linked to asporin on chromosome 9, and its promoter shares cis-acting elements in common with other members of the SLRP gene family [326]. These SLRPs are included in Class I based on genomic and protein homology, although most likely they do not contain any GAG chains. Future studies are needed to decipher their biological function.

Class II SLRP

This class includes five SLRPs that can be further subdivided into three subgroups based on protein homology. Subgroup A includes fibromodulin and lumican, subgroup B harbors PRELP and keratocan, and subgroup C includes osteoadherin. All these Class II SLRPs have homologous genomic organization (three exons), with the largest exon encoding for most of the LRRs. All contain a charged N-terminus with multiple tyrosine sulfate residues that contribute to the anionic properties of these proteoglycans. Characteristically, Class II SLRPs are substituted with keratan sulfate and polylactosamine, an unsulfated variant of KS. Notably, corneal KS binds with high affinity to FGF2 and sonic hedgehog [462], indicating that KSPGs can participate in the modulation of growth factor activity and morphogen gradient formation. Many of these SLRPs are highly expressed in connective tissues and cartilage where they bind many ECM constituents, especially fibrillar collagens, thereby stabilizing the fibrillar network that constitutes the framework of the tissue [242]. KSPGs are also directly involved in regulating corneal transparency, especially the interfibrillar spacing of orthogonal fibers, and their sulfation pattern is highly conserved throughout the cornea [463]. Fibromodulin was originally isolated from cartilage [464] and was soon realized to be homologous to decorin. Its eponym derives from the fact that fibromodulin binds to collagens I and II and causes delayed fibril formation [465,466]. The N-terminus of fibromodulin contains a stretch of tyrosine sulfate residues which can be cleaved by MMP-13 [467].
As the fibromodulin N-terminal domain appears to be exposed following its binding to fibrillar collagens, it is possible that this charged domain would have a dual function: it could be involved in collagen cross-linking and it could bind and sequester growth factors such as members of the FGF and VEGF family, as well as several inflammatory cytokines released during tissue remodeling. Indeed, this domain, as that of osteoadherin (see below), physically interacts with basic clusters of several heparin-binding growth factors and cytokines [468]. Fibromodulin is a major KSPG [469] and some molecules contain KS chains exclusively capped with α(2–3)-linked sialic acid [470]. It regulates collagen fibrillogenesis during corneal development [471]. Fibromodulin binds to the same region of collagen I where lumican binds [472], but in a region different from the decorin binding site [473]. Specifically, fibromodulin binds collagen I via residues located in LRR11, between Glu-353 and Lys-355, located in the convex surface of the protein core [474]. In addition, both lumican and fibromodulin bind to collagen I via a more proximal region located between LRR5 and LRR7 [475]. In spite of this overlapping binding, it has been reported that differential expression of lumican and fibromodulin regulates collagen fibrillogenesis during mammalian tendon development [476]. Thus, there is redundancy and specificity for SLRP binding and modulation of collagen fibrillogenesis in vivo. Like other SLRPs, fibromodulin binds TGFβ [434], and, in common with decorin [477], binds the collagenous part of complement C1q [478]. However, and in contrast to decorin, fibromodulin activates the classical complement pathway [478]. Fibromodulin is widely distributed in connective tissues, and, thus, the phenotype of Fmod−/− mice is quite complex [479,480].
These mutant mice exhibit abnormal collagen fibril organization, but they also show abnormal deposition of lumican in tendon [481], and abnormal dentin mineralization [482] and alveolar bone formation [483,484]. The phenotype of Fmod−/− mice becomes even more complex when these mutant mice are crossed with mice deficient in either biglycan or lumican. Double mutant Lum−/−;Fmod−/− mice develop a syndrome of joint laxity and tendinopathy [485] reminiscent of patients with Ehlers–Danlos syndrome. Moreover, Lum−/−;Fmod−/− mice exhibit ocular features of high myopia, including thin sclera and increased axial length [486]. When Fmod−/− mice are mated to homozygosity with Bgn−/0 mice, the double mutants develop ectopic ossification and osteoarthritis [487], and also an accelerated temporomandibular osteoarthritis [488]. In the latter case, osteoarthritis arises from accelerated chondrogenesis secondary to decreased levels of sequestered TGFβ1 in the double mutant Bgn−/0;Fmod−/− mice, thereby causing an over-activation of the TGFβ signaling pathway [488]. This mechanism is similar to that recently reported for excessive TGFβ signaling due to low decorin expression/levels in osteogenesis imperfecta [489] and recessive dystrophic epidermolysis bullosa [490]. In both diseases, the clinical severity of the respective phenotypes is markedly enhanced by TGFβ freed from sequestration by low SLRP levels. Thus, there is a genetic interaction among various SLRPs and their temporal and spatial expression needs to be maintained and finely balanced to prevent significant pathology. In solid tumors, fibromodulin appears to modulate the tumor stroma by increasing extracellular fluid volume and lowering interstitial fluid pressure [491]. This bioactivity has been proposed to influence cancer fluid balance, which in turn affects the response to chemotherapy [491].
Finally, recent reports have shown that fibromodulin promotes in vitro and in vivo angiogenesis [492], and this is particularly prominent in melanocyte-secreted fibromodulin [493]. Mechanistically, fibromodulin appears to be secreted at high levels in low-pigmented melanocytes and it stimulates the secretion of monocyte chemotactic protein-1, which is a powerful angiogenic factor [494]. The second member of Class II SLRP is lumican, which was originally characterized from avian cornea as a KSPG and derives its eponym in recognition of lumican's role in regulating corneal transparency [495,496]. However, it is now clear that lumican is ubiquitously expressed and is localized primarily to mesenchymal tissues and tumor stroma [480,497]. This KSPG plays a critical role in corneal clarity by maintaining the interfibrillar space of the corneal collagen architecture vital for transparency. Indeed, Lum−/− mice develop bilateral corneal opacities together with skin laxity and fragility reminiscent of Ehlers–Danlos syndrome [498]. The posterior corneal stroma is most vulnerable to lumican deficiency as this region shows early developmental defects in fibril structure and architecture in the Lum−/− mice [499]. The causative role of lumican in corneal opacity is demonstrated by genetic studies where a mouse overexpressing lumican in the cornea, driven by the keratocan promoter, can fully rescue the Lum−/− eye phenotype [500]. Notably, these ocular abnormalities are more exaggerated and include scleral alterations when both lumican and fibromodulin are ablated [486]. In zebrafish, knock-down of lumican leads to scleral thinning and increased size of scleral coats [501]. Moreover, mice deficient in lumican and fibromodulin have joint laxity and severe tendinopathy [485]. Indeed, differential expression of lumican and fibromodulin (see above) regulates the proper alignment and overall structure of collagen fibrils during murine tendon development [476,502].
Lumican has been involved in cancer and inflammation [503], two areas of research where other SLRPs, predominantly Class I decorin and biglycan, have been extensively investigated. One of the first observations was that lumican could inhibit colony formation in soft agar induced by v-K-ras and v-src [504]. Indeed, lumican is markedly increased in the stroma of breast carcinomas [505,506], and is highly expressed in melanomas [507,508]. Lumican also inhibits melanoma progression [509,510], and blocks melanoma cell adhesion via interaction with β1-containing integrins [511] and by modulating focal adhesion complexes [512]. These effects are all mediated by the protein core, as a peptide fragment named lumcorin from LRR9 can by itself inhibit melanoma cell migration [513]. Lumican has also been involved in other forms of malignancy including prostate cancer [514], pancreatic cancer [515], and osteosarcomas [516]. In the latter case, lumican regulates osteosarcoma cell adhesion by modulating TGFβ2 activity [517]. Lumican can be the target of MMPs and can also inhibit MMP activity, as recently shown for MMP-14 [518]. Using expression cloning, it was found that lumican specifically interacts with membrane-type MMP-1, which can cleave lumican, thereby preventing induction of p21 [519], with a mechanism similar to the p21 induction described for decorin [387]. Thus, it seems that in certain neoplastic conditions the biological effects of Class I and II SLRPs can converge on an antioncostatic function, as decorin was also found to be susceptible to membrane-type MMP-1 cleavage in the same cell system [519]. Lumican's involvement in inflammation is exemplified by the findings that lumican regulates corneal inflammation by binding to the Fas ligand and thus interfering with Fas–Fas ligand interaction [520]. This mechanism is again shared by Class I SLRPs where several components bind various forms of TGFβ. 
Notably, lumican deposited on the surface of neutrophils during their transmigration across endothelia promotes neutrophil migration via β2 integrin [521] and keratinocyte-derived CXCL1 chemokine [522]. These findings are consistent with a role for lumican in evoking neutrophil recruitment and invasion following corneal injury and wound healing [523]. Lumican can also interact with the innate immune receptor Toll-like receptor 4 [524], as previously shown for the Class I SLRP biglycan [446]. Soluble lumican evokes bacterial phagocytosis, thus providing molecular protection against Gram-negative bacteria [525]. This biological function of lumican has been corroborated by genetic studies whereby Lum−/− mice show an enhanced pulmonary infection by Pseudomonas aeruginosa [525] and a low innate immunity and inflammatory response in a murine model of colitis [526]. Recently, it has been shown that lumican binds via its C-terminal 50 amino acid region to TGFβ receptor 1, also known as ALK5 [527]. Thus, in common with other SLRP members, lumican can affect both the innate immune system and the TGFβ signaling pathway. PRELP (proline/arginine-rich end leucine-rich repeat protein), also known as prolargin, derives its eponym from its unique N-terminal domain enriched in basic amino acid residues [528]. The N-terminus of PRELP binds heparin, heparan sulfate and also the tyrosine sulfate-rich domains of the Class II SLRPs fibromodulin and osteoadherin [242]. PRELP was originally isolated from cartilage extracts and found to be expressed predominantly in the territorial matrix [464]. However, it is now recognized that PRELP has a much wider distribution, with expression in kidneys, aorta, liver and skeletal muscle. It is expressed in pericellular regions near basement membranes [242]. 
Indeed, PRELP binds to the N-terminus of perlecan via its HS chains and thus it might constitute a bridge between basement membranes and the surrounding collagenous matrices linked by the LRRs of PRELP [529]. Notably, the N-terminal positively-charged domain of PRELP inhibits NF-κB signaling and thus acts as a potent anti-resorptive molecule attenuating osteoclast formation [530]. A peptide derived from the N-terminus of PRELP has been recently shown to concurrently inhibit the progression of osteoporosis and the formation of osteolytic bone metastases from aggressive mammary carcinoma xenografts [531]. We should point out, however, that in pancreatic ductal carcinomas, increased levels of PRELP correlate with either a good [532] or bad prognosis [533], suggesting that there might be, as in other cases, an organ and tissue-specificity of activity. Another interesting biological function of PRELP is its ability to inhibit the formation of complement membrane attack complex [534]. Thus, PRELP could suppress complement attack near basement membranes of vascularized tissues and diminish pathological complement activation in chronic inflammatory disease such as rheumatoid arthritis [534]. Recently, this bioactivity of PRELP has been exploited in murine models of macular degeneration, where AAV-mediated delivery of human PRELP inhibits complement activation, choroid angiogenesis and deposition of the membrane attack complex [535]. Another Class II SLRP member is keratocan, a KSPG involved in maintaining corneal transparency [536,537]. It contains three chains of KS, a highly-sulfated linear polymer of N-acetyl-lactosamine covalently linked to asparagine residues via a mannose-containing oligosaccharide. Mice deficient in keratocan, Kera−/−, have normal corneal transparency, but they exhibit a thinner corneal stroma and a narrow corneal/iris angle vis-à-vis wild type littermates [538]. 
Moreover, Kera−/− corneas have larger collagen fibrils and abnormal packing of the stromal collagen, indicating a role for keratocan in maintaining proper corneal structure [538]. Consistent with a role for keratocan in corneal physiology, KERA mutations in a Finnish population have been causatively linked to a severe form of cornea plana, the autosomal recessive cornea plana (CNA2) [539]. The key observation, which now extends to other non-Finnish populations [540], is that these affected patients have a recessively inherited N247S mutation that replaces a single asparagine residue in the LRR consensus sequence. This leads to loss-of-function of keratocan. Notably, another mutation, Q174X, leads to a truncated form of keratocan and is also linked to CNA2 [539]. Keratocan is expressed in organs other than the ocular system, most prominently in skin, tendon, cartilage and striated muscle [536,537,541]. It is also expressed in osteoblasts and it might be involved in osteogenesis since Kera−/− mice show a decreased rate of bone formation and mineral apposition [542]. Keratocan fragments, together with fragments of other SLRPs (decorin, biglycan and lumican), are increased in degenerate human menisci, knees and articular cartilages [543]. As in the case of lumican, keratocan contains short, non-sulfated polylactosamine chains in tissues other than cornea [536], suggesting that they might serve other functions in non-ocular systems. In addition to its structural role, this KSPG is involved in regulating corneal inflammation by actively binding the major neutrophil chemokine CXCL1/KC and forming a chemokine gradient that evokes neutrophil recruitment [522]. 
Moreover, keratocan and lumican play a role in resolution of the inflammatory response, as neutrophils are required for cleavage of these KSPGs and release of cleavage products and chemokines are detected in the anterior chamber, resulting in loss of the chemokine gradient and cessation of neutrophil infiltration [544]. Osteoadherin, also known as osteomodulin, was originally isolated from guanidinium extracts of bovine bone as a cell-binding KSPG, hence its eponym [545]. It is highly expressed in mineralized tissues reaching concentrations of up to 400 µg/g wet weight, where it localizes to the primary spongiosa of fetal growth plate [545]. Osteoadherin contains six closely-spaced tyrosine sulfate residues in its N-terminal extension and two in its C-terminal region [546]. The N-terminal domain also contains a large number of acidic amino acid residues that, together with the tyrosine sulfate ones, would generate a strong polyanionic scaffold [547]. This region could simulate “heparin” in several interactions with growth factors and cell surfaces [242]. Indeed the tyrosine sulfate-rich domains of both fibromodulin and osteoadherin bind basic cluster motifs shared by a wide variety of heparin-binding proteins and growth factors [468]. The ability of osteoadherin to provide a cell-binding substrate is shown by the fact that this KSPG is as efficient as fibronectin in promoting osteoblast attachment in vitro [545]. The binding is mediated by the αvβ3 integrin, as shown by osteoadherin-linked affinity chromatography [545]. During endochondral bone formation, the glycosylation pattern of osteoadherin is quite unique. It is primarily a KSPG in the mineralized zone of developing bones, but it is unglycanated in the non-mineralized zones [548], further reinforcing a role for this KSPG in endochondral bone mineralization [548]. Osteoadherin, together with biglycan, decorin and fibromodulin, is dynamically expressed during odontogenesis [358,549]. 
Osteoadherin is primarily localized in the predentin, generating a gradient toward the mineralization front suggesting a direct role in regulating tooth development [549].

Class III SLRP

This class encompasses three structurally and genomically related members, epiphycan, opticin and osteoglycin. Members of this class contain only seven LRRs in contrast to the more usual 10–12 LRRs of the other classes. In common with Class II members, Class III members harbor N-terminal consensus sequences for tyrosine sulfation, which may provide a signal for keratan sulfate addition to the protein core during assembly and post-translational modification [536]. Epiphycan was originally isolated as a glycoprotein from the epiphyseal cartilage, hence its eponym [550]. It was soon realized that epiphycan is the mammalian ortholog of the avian dermatan sulfate proteoglycan PG-Lb, isolated from the developing chick cartilage, with Lb standing for its low buoyant density during its purification using density gradient ultracentrifugation [551]. Epiphycan has a precise spatiotemporal distribution during cartilage development and is localized to the entire growth plate, suggesting that epiphycan is a player in chondrogenesis [552]. Although the Epyc−/− mice have a mild bone phenotype, the epiphycan/biglycan double-knockout mice have shorter long bones and develop osteoarthritis with age, suggesting a potential synergism between these two SLRPs [553]. Notably, epiphycan has recently been shown to be part of the collagen IX interactome, further suggesting that it might be involved in growth plate organization [554]. Opticin was concurrently isolated and characterized by several groups and has also been named oculoglycan [555-557]. The eponym obviously derives from its original ocular source of cloning/purification, although its expression is not limited to the eye. Several opticin ESTs are present in many data banks from nonocular tissues including brain, kidney, urinary bladder and uterus. Indeed, opticin is also expressed in human articular cartilage and is degraded in osteoarthritis by MMP-13 [558]. 
In contrast, in the mouse opticin appears to be localized to the eye, especially in the ciliary body [325]. More recently, opticin has been identified as one of the tyrosine sulfated constituents of the retinal pigment epithelium [559]. In common with Class I member decorin [369,371], opticin inhibits angiogenesis [560] by binding to collagen and competitively disrupting the interaction of collagen with α1β1 and α2β1 integrins, two key receptors regulating experimental and developmental angiogenesis [194,561-564]. Osteoglycin, also known as mimecan, was originally isolated as a truncated protein from bone and later identified as a keratan sulfate SLRP in the cornea [565,566]. Numerous mRNAs are generated from a single Ogn gene, and these mRNAs are all detectable in the cornea, although a single protein core is generated [326,567,568]. Functionally, Ogn−/− mice have increased collagen fibril diameter in both cornea and dermis [569], analogous to other SLRP phenotypes described above. These studies have been corroborated by the observation that both osteoglycin and epiphycan appear to be proteolytically processed in vivo. Specifically, osteoglycin is processed by BMP-1/Tolloid-like metalloproteinases and this processing enhances osteoglycin's ability to regulate collagen fibrillogenesis [570]. One of the emerging biological roles of osteoglycin is its ability to modulate myocardial integrity and injury, and to affect cardiac remodeling in concert with several ECM glycoproteins of the myocardium [571]. An integrated genomic approach has found that elevated osteoglycin is a positive regulator of rat left ventricular cardiac mass, and OGN transcript abundance has the highest correlation with left ventricular mass among 22,000 transcripts tested [572]. 
In support of these observations is the finding that abnormal collagen assembly in Ogn−/− mice leads to increased infarct rupture and wall thinning after myocardial infarction, and this phenotype can be improved by adenoviral-mediated Ogn gene delivery [573]. Of interest, and in analogy to decorin and biglycan which are also increased in the circulation following inflammation and cancer, circulating osteoglycin levels are markedly increased in patients with ischemic heart failure and correlate with markers of cardiac remodeling [573]. Recently, it has been shown that osteoglycin can act as an anabolic bone factor secreted by muscle cells [574], and that increased levels of circulating osteoglycin correlate with vascular remodeling in apolipoprotein E-deficient mice [575]. Moreover, osteoglycin or fragments of it could act as predictors of adverse cardiovascular events after coronary angiography [576]. Thus, as in the case of decorin and biglycan, several other SLRPs are found in the circulation, and, hopefully in the near future, we will dissect their function as key components of the human plasma.

Class IV SLRP

This non-canonical class of SLRPs includes chondroadherin [577], nyctalopin [578,579] and tsukushi [580]. Chondroadherin is primarily located in cartilage and provides a link between chondrocytes and the surrounding ECM via specific interactions with the α2β1 integrin [581] and HS chains [582]. As perlecan also binds to the same integrin [168] and contains HS chains at its N-terminus, it is possible that chondroadherin and perlecan could compete for the same binding site, especially since perlecan has been shown to be arranged in a peri-chondrocytic basement membrane-like zone [583]. Chondroadherin also binds to collagens II and VI [242] and Chad−/− mice show both cartilage and bone abnormalities [584]. A recent study using atomic force microscopy has shown that Chad−/− cartilage shows abnormal collagen network assembly and mechanical properties, especially in the superficial cartilage zone [585]. Nyctalopin is a quite interesting and unique SLRP for two reasons: (a) It is the only member of this family that is GPI-anchored to the plasma membrane, and (b) It is the only SLRP gene member, with the exception of biglycan, to be located on the X chromosome. Several mutations in the NYX gene have been causatively linked to X-linked congenital stationary night blindness, a group of retinal diseases characterized by reduced nocturnal vision, often associated with myopia and reduced visual acuity [578,579]. Notably, mutations in the TRPM1 gene (transient receptor potential cation channel subfamily M member 1) are also associated with congenital stationary night blindness [586]. It is interesting that in the mouse eye, nyctalopin is a transmembrane SLRP rather than anchored via GPI [587]. Thus, it seems that the mode of anchorage to the plasma membrane is not important; what matters is the orientation of this SLRP and of its exposed LRRs, which interact with other receptors and surface proteins. Indeed, nyctalopin interacts directly with both TRPM1 [588,589] and the glutamate receptor mGluR6 [589]. 
Thus, it is likely that nyctalopin is a key component of a supramolecular complex, where this SLRP acts as a scaffold to target and maintain the correct signaling ensemble at the visual synapse. The final Class IV SLRP member is tsukushi, an eponym derived from its expression pattern in avian embryos reminiscent of the Japanese horsetail plant tsukushi (Equisetum arvense) [580]. Tsukushi has important regulatory functions insofar as it is involved in modulating BMP and Wnt signaling pathways [580,590-593]. For example, overexpression of tsukushi in embryonic retinal cells, both in vivo and in vitro, effectively antagonizes Wnt2b and represses Wnt-dependent specification of peripheral eye fates [593]. Moreover, tsukushi binds TGFβ1 [594] and controls macrophage function by inhibiting TGFβ1 [595]. Notably, targeted inactivation of the Tsk gene in mice causes malformation of the corpus callosum, similar to the Spock3 mutants [596] (see below), and agenesis of the anterior commissure [597]. This forebrain commissure formation is co-regulated by draxin, dorsal inhibitory axon guidance protein [598]. Finally, tsukushi has been shown to control the hair cycles by regulating the TGFβ1/Smad pathway [594]. Tsukushi shares several functional properties with other SLRPs such as decorin and biglycan, which have been shown to bind TGFβ1 and modulate BMP and Wnt pathways [278,367,378,442,599-602], as well as controlling the hair follicle cycle [603].

Class V SLRP

This is the least studied family of non-canonical SLRPs with only two members, podocan [604,605] and podocan-like [606]. Podocan derives its eponym from its high expression in podocytes isolated from sclerotic glomeruli of experimental HIV-associated nephropathy [604]. In normal kidneys, podocan shows a distribution along the basement membrane of the glomeruli and proximal tubules [604], and more recent studies have shown that podocan is a constituent of human aortic tissue [607]. In agreement with its tissue distribution, podocan has been identified as a negative regulator of migration and proliferation of smooth muscle cells [608]. On this basis, podocan could affect atherosclerosis development like other SLRPs, such as biglycan. Of note, Podn−/− smooth muscle cells exhibit a constitutively-activated Wnt pathway, whereas wild-type smooth muscle cells overexpressing podocan have a significantly depressed Wnt signaling pathway [608], biological properties also shared by other SLRPs (see above). As in the case of other non-canonical SLRPs, podocan shares functional properties with decorin and biglycan, especially in its ability to bind collagen I and to induce p21WAF1 and growth suppression [605].

Testican/SPOCK family

The next subclass of extracellular proteoglycans includes the testican/SPOCK family of genes. Testican was originally isolated from seminal fluid over two decades ago as a hybrid CS/HSPG [609] and its sequence showed homology to SPARC, secreted protein acidic and rich in cysteine, also known as BM-40 [610]. The testican family of HSPGs has now been shown to comprise three members and has been renamed SPOCK, referring to SPARC/Osteonectin CWCV and Kazal-like domain proteoglycans [611-614]. SPOCKs have a modular structure, similar to perlecan and agrin, characterized by five domains (Fig. 6). Domain I, a SPOCK-specific N-terminal domain, does not have any significant homology except to other members of the testican/SPOCK proteoglycan gene family. Domain II is a cysteine-rich module homologous to follistatin, also shared by agrin (cfr. Fig. 1). Domain III shares homology with the extracellular calcium-binding domain of SPARC, characterized by two Ca2+-binding EF-hand motifs [615]. Domain IV harbors a thyroglobulin-like domain, a relatively short sequence stabilized by three disulfide bonds and harboring a CWCV tetrapeptide sequence [612]. The C-terminal Domain V, in analogy to Domain I, is unique to the testican/SPOCK family and harbors two potential GAG attachment sites [616]. Notably, SPOCK3 contains two consecutive SGD triplets, known attachment sites for HS also shared by perlecan and agrin. Although isolated from testis, it has become apparent that SPOCKs are almost exclusively expressed in the central nervous system and they are primarily HSPGs. For example, SPOCK1 is associated with the postsynaptic area of the hippocampal pyramidal cells [617], while SPOCK2 has been localized to various neuronal cells of several brain regions including the corpus callosum, cerebral peduncles and fimbria fornix [612]. 
SPOCK3 is an HSPG in brain and appears to be ubiquitously expressed in the central nervous system, including the forebrain, the striatum, the thalamus and to a lesser extent the cortex [616]. Notably, SPOCK3 and MMP-16 can be co-induced by TGFβ-evoked upregulation of a specific MKL1 isoform, a cofactor for the transcriptional program regulated by serum responsive elements [618].
Fig. 6

Schematic representation of the modular organization of testican/SPOCK family of brain-specific proteoglycans. The five domains in roman numerals from N- to C-terminus are indicated at the top, and their structural homology is indicated at the bottom. Domains I and V appear to be specific for this family, whereas the other domains are shared with other proteoglycan gene families (see Fig. 1). The C-terminal Domain V contains two attachment sites for heparan sulfate chains labeled by asterisks. SP, signal peptide.

Functional studies utilizing recombinant SPOCK2 proteoglycan and protein core have shown that both forms inhibit neurite extension from cerebellar neurons, thus providing strong support to the notion that SPOCKs are involved in neuronal regulation [619]. In support of these studies, Spock3−/− mice show many structural anomalies of the corpus callosum and cortical axonal tracts linked to abnormal behavior, supporting a role for the Spock3 gene in neuronal tropism [596]. A novel de novo missense mutation in SPOCK1 on chromosome 5q31 (c.239A>T; p.D80V) has recently been shown to cause a syndrome including intellectual disability with dyspraxia, dysarthria, partial agenesis of corpus callosum, prenatal-onset microcephaly and atrial septal defect [620]. As this mutation, i.e. replacement of a negatively charged aspartic acid with a hydrophobic nonpolar valine, affects a highly-conserved area of the gene, it is plausible that an abnormal SPOCK1 could contribute to this human phenotype of developmental delay and microcephaly.

Other proteoglycans

There are a number of part-time proteoglycans that are not included in this comprehensive nomenclature, including Prg4/lubricin, endocan, leprecan, collagens IX and XII, bikunin and CD44. These molecules have been investigated to a lesser extent and reports are scarce regarding their biological functions as proteoglycans. We do apologize to the authors working on these interesting molecules and we hope to cover them in future updates of this nomenclature.

Final considerations

Of the 43 genes encoding full-time proteoglycans, only 33 appear to be glycanated. Thus, roughly 1 in 500 of the approximately 20,000 protein-coding genes in the human genome codes for a proteoglycan protein core. This is quite remarkable and indicates that proteoglycans play fundamental and often vital functions necessary for life to operate and evolve. We are confident that new proteoglycans will be discovered in the future. One of the major difficulties in finding new proteoglycans is their large size and negative charge. Both hinder proper separation in conventional acrylamide or 2D gels used for routine proteomic studies of various biological fluids and tissues. However, as in the case of agrin and collagen XVIII, which were studied for several years without knowing their proteoglycan nature, it is likely that additional known proteins will be found to be members of the “restricted” proteoglycan gene family. We hope that this nomenclature will help researchers who want to familiarize themselves with our exciting and growing field of proteoglycan biology.