Repeated evolution of identical domain architecture in metazoan netrin domain-containing proteins.

Abstract

The majority of proteins in eukaryotes are composed of multiple domains, and the number and order of these domains is an important determinant of protein function. Although multidomain proteins with a particular domain architecture were initially considered to have a common evolutionary origin, recent comparative studies of protein families or whole genomes have reported that a minority of multidomain proteins could have appeared multiple times independently. Here, we test this scenario in detail for the signaling molecules netrin and secreted frizzled-related proteins (sFRPs), two groups of netrin domain-containing proteins with essential roles in animal development. Our primary phylogenetic analyses suggest that the particular domain architectures of each of these proteins were present in the eumetazoan ancestor and evolved a second time independently within the metazoan lineage from laminin and frizzled proteins, respectively. Using an array of phylogenetic methods, statistical tests, and character sorting analyses, we show that the polyphyly of netrin and sFRP is well supported and cannot be explained by classical phylogenetic reconstruction artifacts. Despite their independent origins, the two groups of netrins and of sFRPs have the same protein interaction partners (Deleted in Colorectal Cancer/neogenin and Unc5 for netrins and Wnts for sFRPs) and similar developmental functions. Thus, these cases of convergent evolution emphasize the importance of domain architecture for protein function by uncoupling shared domain architecture from shared evolutionary history. Therefore, we propose the term merology to describe the repeated evolution of proteins with similar domain architecture and discuss the potential of merologous proteins to help understand protein evolution.

Year: 2012 PMID: 22813778 PMCID: PMC3516229 DOI: 10.1093/gbe/evs061

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Introduction

Protein domains are distinct units that can fold autonomously into a particular, stable, three-dimensional structure and are often conserved during evolution. In eukaryotes, the majority of proteins are multidomain proteins, and the particular number and order of these domains defines the domain architecture of the protein (Koonin et al. 2002). During evolution, domains can be recombined into different arrangements to create proteins with new functions, a process that contributes significantly to the expansion of protein repertoires despite a limited number of domains. The generation of multidomain proteins occurs through gene fusion and fission events that can lead to gain, rearrangement, or loss of domains, whereas gene duplication can generate protein families with shared domain architecture (Weiner et al. 2006; Moore et al. 2008). Thus, the unique domain architecture of each protein family is classically considered to have originated only once during evolution (Vogel et al. 2005). However, recent work proposed that a subset of domain architectures could have appeared several times independently. In an analysis of 62 genomes, Gough (2005) estimated that between 0.4% and 4% of domain architectures could be the result of convergent evolution. In a study based on gene trees instead of species trees, Forslund et al. (2008) argued that even between 5.6% and 12.4% of domain architectures could have originated several times independently, although approximately two-thirds of the cases involve only the loss of domains. These findings suggest that in contrast to traditional concepts, convergent evolution of domain architecture might significantly contribute to the expansion of proteomes. Although the strength of the above-mentioned studies lies in their broad sampling of complete genomes, the huge size of these datasets necessarily limits the depth of analysis and the types of approaches that can be used.
In the case of the analyses based on species trees (Gough 2005; Kummerfeld and Teichmann 2005), the different domain architectures are plotted on species trees to reconstruct the most parsimonious scenario for the origin of these multidomain proteins. This type of approach cannot take into account cases of horizontal gene transfer or independent evolution of the same domain architecture in the same lineage. Approaches based on domain phylogeny (Forslund et al. 2008) take into account these two possibilities (Yanai et al. 2002; Jordan et al. 2003; Zhang et al. 2010; Wu et al. 2011); however, they rely on the phylogenies of short protein domains that often lead to only moderate support for the obtained tree topologies and that can be subject to various tree reconstruction artifacts. Therefore, detailed studies of individual cases are necessary to validate or reject the possibility of independent evolution of the same domain architecture. Here, we focus on two families of secreted developmental regulators, netrin and secreted frizzled-related proteins (sFRPs), because complex phylogenetic patterns for these two protein families have been noticed before but have led to contradictory interpretations (see below). Netrins and sFRPs are essential regulators of embryonic development (Serafini et al. 1996; Satoh et al. 2006). Netrins regulate axon guidance and other developmental processes by binding to the transmembrane receptors Deleted in Colorectal Cancer (DCC)/Neogenin and Unc5 (Moore et al. 2007; Rajasekharan and Kennedy 2009; Lai Wing Sun et al. 2011), whereas sFRPs act as modulators of the Wnt signaling pathway, which plays a prominent role in axial patterning (Bovolenta et al. 2008; Petersen and Reddien 2009; Mii and Taira 2011).
Both protein groups contain a so-called netrin domain (Banyai and Patthy 1999; Chong et al. 2002), which is characterized by an enrichment of basic residues and a particular spacing pattern of cysteines that cause it to fold into one β-barrel and two α-helices, which are located at the N- and C-termini of the domain (Banyai and Patthy 1999; Liepinsh et al. 2003; Bramham et al. 2005). Netrin domains are present in various multidomain proteins that have diverse overall structure and function, for example, complement components C3–C5 or WFIKKN (WAP, Follistatin/kazal, immunoglobulin [IG], Kunitz, and netrin domain-containing protein). They are also found in the single domain protein tissue inhibitor of metalloproteases (TIMPs), present in metazoans and Eubacteria (Brew and Nagase 2010). Netrin and sFRP multidomain proteins are widespread in metazoans and constitute multigene families. Netrin proteins are composed of two parts: the N-terminal part is a supra-domain (Vogel et al. 2004) that consists of one LamininNT domain plus three epidermal growth factor (EGF) domains, homologous to domains VI and V of laminins, and the C-terminal part contains one netrin domain (Banyai and Patthy 1999; Koch et al. 2000; Qin et al. 2007; Rajasekharan and Kennedy 2009). In contrast, netrin-G1 and netrin-G2 proteins are composed of the LamininNT-3EGF supra-domain plus a particular C-terminal domain that is not related to the netrin domain (Nakashiba et al. 2000; Yin et al. 2002; Rajasekharan and Kennedy 2009). sFRPs are composed of two domains: the N-terminal part is a frizzled cysteine-rich domain (CRD) and the C-terminal part is a netrin domain (Banyai and Patthy 1999; Chong et al. 2002; Bovolenta et al. 2008). sFRP polyphyly among frizzled-type proteins has been noticed previously (Adamska et al. 2010), but the shared domain architecture was considered as evidence for an artifact of the phylogenetic reconstruction.
For netrins, grouping of the LamininNT-3EGF supra-domain of netrin-1/2/3/5, netrin-4, and netrin-G with different Laminin groups has been observed (Koch et al. 2000; Nakashiba et al. 2000; Yin et al. 2002; Moore et al. 2007; Rajasekharan and Kennedy 2009) but based on their common domain architecture, the former two groups of netrins have been considered to come from a single ancestor. Recently, Fahey and Degnan (2012) reinterpreted this phylogenetic pattern as an indication of independent evolution of the same domain architecture. However, the phylogenetic support for an independent origin of different netrin groups was weak, and none of the above-mentioned studies have tested this possibility thoroughly. Therefore, we have analyzed the evolution of netrin domain-containing proteins in detail and have recovered a phylogenetic pattern supporting a convergence of domain architecture for both netrin and sFRP proteins. We assessed the strength of these hypotheses of convergence using a broad range of reconstruction methods and tests. We show that an independent origin of netrin-1/2/3/5 and netrin-4 is strongly supported and cannot be explained by known tree reconstruction artifacts (functional convergence, gene conversion, mutational saturation, heterogeneity of base composition, long-branch attraction, and heterotachy). Polyphyly of sFRPs was clearly favored by phylogenetic analyses and was not caused by the tested reconstruction artifacts. However, statistical tests did not reject monophyletic tree topologies for sFRPs, probably because of the short size of the domains contained in these proteins. These findings strongly suggest that the protein architecture shared by the two groups of netrins and the two groups of sFRPs does not reflect common evolutionary ancestry but instead is the result of independent events of domain rearrangement. 
The similar molecular interactions and functions of the two groups of netrins and sFRPs provide a striking example for the importance of domain architecture for protein function, independently of shared evolutionary history.

Materials and Methods

Whole Genomes Analyses

We searched for frizzled-CRD, netrin, and LamininNT domains and frizzled, netrin-1/2/3/5, netrin-4, netrin-G, Laminin, sFRP-1/2/5, and sFRP-3/4 proteins, and the netrin receptors Neogenin/DCC and Unc5 in 388 complete genomes belonging to all major eukaryote clades. This included all draft and finished eukaryote genomes available from the Joint Genome Institute (JGI) and NCBI (see supplementary table S1, Supplementary Material online for databases, genome, and sequence information) on 1 September 2011. Gene searches were performed using BLAST (blastp and tblastn) with Nematostella, Strongylocentrotus, Drosophila, and Mus proteins as query sequences, against protein and genome databases with the default BLAST parameters and an e-value threshold of 0.1. We then used various validated sequences as queries for a second round of BLAST search on complete genomes. In addition, NCBI Conserved Domain Database (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?), PFAM (http://pfam.sanger.ac.uk/; domain references: netrin, PF01759; CRD-frizzled, PF01392; frizzled-7TM, PF01534; LamininNT, PF00055; EGF, PF00053), Interpro (http://www.ebi.ac.uk/interpro/), UniProt (http://www.uniprot.org/uniprot/), and Superfamily (http://supfam.cs.bris.ac.uk/SUPERFAMILY/) were searched for the different proteins and domains in complete Eukaryota, Eubacteria, and Archaea genomes.
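The e-value filtering step can be illustrated with a short sketch. It assumes that BLAST hits were exported in the standard 12-column tabular format (`-outfmt 6`), in which the e-value is the 11th column; the function name `filter_hits` and the inclusive comparison (`<=`) are assumptions made for illustration, not details reported in the study.

```python
import csv
import io

EVALUE_THRESHOLD = 0.1  # threshold stated in the text


def filter_hits(tabular_text, max_evalue=EVALUE_THRESHOLD):
    """Return (query, subject, evalue) tuples for hits at or below max_evalue.

    Assumes BLAST tabular output (-outfmt 6): the e-value is column 11.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            hits.append((row[0], row[1], evalue))
    return hits
```

Hits retained this way would still need manual validation (for example by domain searches) before being used as queries in a second round.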

Data Sampling and Assembly for Phylogenetic Analyses

To reconstruct the phylogeny of netrin domains, frizzled-CRD domains, and LamininNT-3EGF supra-domains, we recovered all genes containing at least one of these domains or supra-domains by BLAST search (tblastn and blastp) against a selection of metazoan and choanoflagellate complete genomes: Mus musculus (NCBI), Danio rerio (NCBI), Branchiostoma floridae (JGI), Strongylocentrotus purpuratus (NCBI + SpBase), Caenorhabditis elegans (NCBI + WormBase), Drosophila melanogaster (NCBI + FlyBase), Capitella teleta (JGI), Lottia gigantea (JGI), Nematostella vectensis (JGI), Trichoplax adhaerens (JGI), Amphimedon queenslandica (spongezome), and Monosiga brevicollis (JGI). As outgroups we included Usherin, TIMP, and Smoothened sequences for the LamininNT-3EGF, netrin, and frizzled-CRD domain, respectively. The 11 metazoan genomes include 304 frizzled-CRD domain-containing sequences. As a result of massive tandem domain duplication, more than 80 of these domains are encoded in the Branchiostoma floridae genome. Initial maximum-likelihood (ML) analysis (data not shown) with all 304 sequences showed that one-third of them fall in a clade containing almost only frizzled receptors and sFRPs. We excluded sequences that did not fall into this clade from further analysis. Each domain dataset was aligned independently using the software Muscle (Edgar 2004) under default parameters and adjusted manually in BioEdit (Hall 1999). Partial sequences and positions that were ambiguously aligned or contained more than 70% gaps and/or missing data were deleted (see supplementary table S2, Supplementary Material online for details about sequences). Alignments of nucleotide sequences were generated based on the amino acid alignments using BioEdit.
To exclude fast-evolving sequences and test the impact of the outgroup on the topology, we removed the D. melanogaster, C. elegans, S. purpuratus, T. adhaerens, A. queenslandica, and M. brevicollis sequences, the outgroup sequences (Usherin, TIMP, and Smoothened), and some sequences that were particularly unstable between nonparametric bootstrap replicates of the complete amino acid dataset (see supplementary figs. S7–S9, Supplementary Material online for more details). We refer to these matrices as “reduced datasets” in comparison with the “complete datasets” containing all sequences. Amino acid alignments for neogenin/DCC and Unc5 were generated as described above with protogenin/nope/punc and ankyrin as outgroups for neogenin/DCC and Unc5, respectively (Salbaum and Kappen 2000; Toyoda et al. 2005; Wang et al. 2009). Neogenin and DCC are both composed of four IG and five fibronectin 3 (FN3) domains, whereas the outgroup sequences are composed of four IG and three to five FN3 domains. To produce an accurate domain alignment, we performed phylogenetic analyses of the individual FN3 and IG domains present in the neogenin–protogenin proteins of the mouse genome (see Supplementary Material online for more details).
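The removal of gap-rich alignment positions described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used (the alignments were edited in BioEdit): the alignment is represented as a hypothetical dictionary of equal-length strings, and `-` and `?` are assumed to be the only symbols counted as gaps or missing data.

```python
def drop_gappy_columns(seqs, max_gap_fraction=0.70, missing="-?"):
    """Remove alignment columns whose gap/missing fraction exceeds the cutoff.

    seqs: dict mapping sequence name -> aligned sequence (equal lengths).
    Columns with a missing fraction strictly above max_gap_fraction are dropped.
    """
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(seqs[name]) != length for name in names):
        raise ValueError("sequences must be aligned to the same length")
    keep = [
        i
        for i in range(length)
        if sum(seqs[name][i] in missing for name in names) / len(names)
        <= max_gap_fraction
    ]
    return {name: "".join(seqs[name][i] for i in keep) for name in names}
```

Ambiguously aligned positions cannot be detected this way and would still require manual inspection.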

Phylogenetic Analyses

Bayesian phylogenetic inferences were performed using MrBayes v3.1.2 (Ronquist and Huelsenbeck 2003) under the amino acid substitution model WAG + Γ(4) + I. For the nucleotide dataset, partitioned Bayesian analyses were carried out using a GTR + Γ(4) + I model for each codon position with parameter settings optimized independently for each of the three codon partitions. We ran two independent searches of four chains each for 2 million generations, sampled every 100 generations, for all datasets except for the complete frizzled-CRD data, for which the same number of chains was run for 5 million generations. All other settings were kept as default. Convergence was estimated for each search by using the standard deviation of split frequencies and potential scale reduction factors reported by the software and by checking stasis of the likelihood values using the command “sump” in MrBayes. Posterior probabilities (PP) were estimated by constructing a majority rule consensus of trees, sampled every 100 generations from 1,500,000 to 2,000,000 generations. For the complete frizzled-CRD domain phylogeny, we sampled from 4,500,000 to 5,000,000. Finally, the consensus trees of the two independent searches were compared to confirm convergence on the same topology. ML analyses were performed using PhyML 3.0 (Guindon and Gascuel 2003) with BioNJ starting tree and NNI branch swapping. The best-fitting model of amino acid substitution for each dataset was estimated using ProtTest v2.4 (Abascal et al. 2005) under the Akaike information criterion (starting tree: tree from a preliminary ML analysis using PhyML and a LG + Γ + I model with 8 rate categories for the γ distribution). The selected models were WAG + Γ + I for netrin and LamininNT-EGF domain datasets, and LG + Γ + I for frizzled-CRD, neogenin/DCC, and Unc5 datasets. We used the GTR + Γ + I model for all ML analyses of the nucleotide datasets. We considered eight rate categories for the γ distribution in all ML analyses.
Maximum parsimony (MP) analyses were performed using PAUP* 4.0b10 (Swofford 2002). All characters were treated as equally weighted and unordered, and gaps were treated as missing data. Heuristic analyses were performed with 500 random additions of sequences and the TBR algorithm for branch swapping. Branch robustness in the MP and ML analyses was estimated using nonparametric bootstrap (BP) (Felsenstein 1985) with 100 or 500 replicates depending on the analyses (10 random addition sequences for each MP bootstrap replicate). Trees were visualised using Mega 5.0 (Tamura et al. 2011).
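For readers unfamiliar with the nonparametric bootstrap (Felsenstein 1985), the resampling step can be sketched as below. In the study this step was handled internally by PhyML and PAUP*; the sketch only illustrates the principle, and the dictionary-based alignment and the function names are assumptions made for illustration. Each replicate draws alignment columns with replacement, and the same columns are used for every sequence so that homology between rows is preserved.

```python
import random


def bootstrap_replicate(seqs, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    names = list(seqs)
    length = len(seqs[names[0]])
    # The same resampled column indices are applied to every sequence.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seqs[name][i] for i in columns) for name in names}


def bootstrap_replicates(seqs, n_replicates=100, seed=0):
    """Generate n_replicates pseudo-replicate alignments of the original size."""
    rng = random.Random(seed)
    return [bootstrap_replicate(seqs, rng) for _ in range(n_replicates)]
```

A tree is then inferred from each replicate, and the bootstrap value of a branch is the percentage of replicate trees that contain it.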

Approximately Unbiased and Parametric Bootstrap Tests

For the approximately unbiased (AU) test (Shimodaira 2002), different topologies were generated by rearranging in TreeView (Page 1996) the branching order of the ML trees having the best likelihood after PhyML and RaxML (7.2.6) (Stamatakis 2006) analyses using the complete amino acid domain datasets and the WAG + Γ(8) + I model. In addition, for each hypothesis of monophyly, we also included the best topology obtained with RaxML (model WAG + Γ(8) + I) under the appropriate constraint and various rearrangements of this topology. Branch lengths were estimated using Tree Puzzle 5.1 under the WAG + Γ(8) model. Likelihoods of these different test topologies were compared with each other and with the likelihood of the best PhyML tree by the AU test using CONSEL (Shimodaira and Hasegawa 2001). For a given tested hypothesis, the selected P value corresponded to the highest P value obtained among the topologies displaying the tested clade. Monophyly was then rejected if the clade was not found in any of the nonrejected tested topologies. For the parametric bootstrap test (SOWH test of Goldman et al. 2000), the null hypotheses were that sFRP and/or netrin had a single ancestor and that the polyphylies obtained in domain phylogenies were the result of reconstruction errors. We used the ML tree generated with RaxML under the constraint that sFRP and/or netrin were monophyletic (model WAG + Γ(8) + I) as the null hypothesis topology T0 (list of tested hypotheses in table 2). We generated 100 simulated datasets using Seq-Gen version 1.3.2 (Rambaut and Grassly 1997) of the same size as the original one, taking into account the T0 topology (including branch lengths), the empirically determined amino acid proportions, and the model parameters from the constrained ML analysis, including the shape of the γ distribution and the proportion of invariable sites. This procedure was repeated for each tested null hypothesis.
Unconstrained and constrained ML searches (constrained under the topology T0 obtained from the constrained ML analysis of the real dataset) under the WAG + Γ(8) + I model were conducted on each simulated dataset using PhyML with re-estimation of all free parameters in each case. We computed the difference in ML scores between these two optimizations (δ = LML − L0) for each simulated dataset to generate a frequency distribution. This provided an empirical estimate of the null distribution and allowed us to generate a critical value δ* corresponding to the 5% tail of the null distribution where the null hypothesis was statistically rejected. This δ distribution was then compared with the δ(RD) = LML − L0 obtained from the real dataset corresponding to the difference in likelihood values between the unconstrained and the constrained (netrin and/or sFRP monophyly) analyses.
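The final comparison step of the parametric bootstrap test can be sketched as follows. This is a minimal illustration, assuming the δ values for the simulated datasets have already been computed; the function name, the integer-based choice of the critical value, and the P value defined as the fraction of simulated δ values at least as large as the observed one are assumptions, not details taken from the study.

```python
def parametric_bootstrap_test(delta_real, delta_sim, tail_percent=5):
    """Compare the observed delta with the simulated null distribution.

    delta_real: difference in log likelihood (unconstrained minus constrained)
                for the real dataset.
    delta_sim:  the same difference for each dataset simulated under the null.
    Returns (critical_value, p_value, rejected).
    """
    ordered = sorted(delta_sim)
    n = len(ordered)
    n_tail = (n * tail_percent) // 100  # number of values allowed in the tail
    critical = ordered[n - n_tail - 1]  # exactly n_tail values lie above this
    p_value = sum(d >= delta_real for d in ordered) / n
    return critical, p_value, delta_real > critical
```

With 100 simulated datasets, an observed δ larger than every simulated value gives a P value of 0 under this definition, which corresponds to the “<0.01” entries in table 2.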

Table 2

Results of the Approximately Unbiased and Parametric Bootstrap Tests for Comparison of Alternative Phylogenetic Hypotheses

Hypotheses	ln L	δ ln L	AU Test (P)	PB Test (P)	Rejected
ML netrin domain	−21377.15	-	-	-	-
sFRP-1/2/5 as sister to sFRP-3/4	−21378.98	1.83	0.696	0.97	No
Netrin-1/2/3/5 as sister to netrin-4	−21381.31	4.16	0.574	0.81	No
Both netrin and sFRP monophyletic	−21383.31	6.16	0.443	0.62	No
Netrin-4 as sister to Cnidaria/Bilateria netrin-1/2/3/5	−21390.72	13.57	0.219	0.10	No
Netrin-4 as sister to Deuterostomia netrin-1/2/3/5	−21411.41	34.26	0.041	<0.01	Yes
ML LamininNT-EGF supra-domain	−50081.22	-	-	-	-
Netrin-1/2/3/5 as sister to netrin-G	−50157.58	76.36	0.002	<0.01	Yes
Netrin-4 as sister to netrin-G	−50186.93	105.71	0.001	<0.01	Yes
Netrin-1/2/3/5 as sister to netrin-4	−50202.58	121.36	0.0004	<0.01	Yes
Netrin-1/2/3/5, netrin-G, netrin-4 grouping together	−50271.00	189.78	0.0002	<0.01	Yes
ML frizzled-CRD domain	−10872.47	-	-	-	-
sFRP-1/2/5 as sister to sFRP-3/4	−10876.87	4.40	0.601	0.59	No

Note.—ML analyses on complete amino acid datasets under WAG + Γ(8) + I model by PhyML. Constrained analyses performed by RaxML. Log-likelihood values recalculated by PhyML using model, topology, and free parameters from RaxML analyses. See supplementary fig. S11, Supplementary Material online for details about the parametric bootstrap analyses. Bold values indicate significant results at the 5% level.

Saturation Analysis

The nucleotide and amino acid substitution saturation of the different domains was evaluated by plotting, for each pair of sequences, the total number of differences against the number of substitutions inferred from the ML trees (as the sum of the length of all branches linking these two sequences). Observed and inferred distances were obtained in PAUP* 4.0b10.
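The two quantities plotted in a saturation analysis can be sketched as follows. In the study both were obtained in PAUP*; the code below is only an illustration, and the tree encoding (a hypothetical dictionary mapping each node to its parent and branch length) and the function names are assumptions. Saturation shows up as a plateau: observed differences stop increasing while inferred distances keep growing.

```python
from itertools import combinations


def observed_differences(seq_a, seq_b, missing="-?"):
    """Count aligned positions where two sequences differ (gaps ignored)."""
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x != y and x not in missing and y not in missing
    )


def distances_to_ancestors(tree, node):
    """Map node and each of its ancestors to the cumulative branch length."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:  # the root has no entry in `tree`
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic_distance(tree, a, b):
    """Sum of branch lengths on the path linking leaves a and b."""
    up_a = distances_to_ancestors(tree, a)
    up_b = distances_to_ancestors(tree, b)
    # The most recent common ancestor minimizes the summed distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def saturation_points(seqs, tree):
    """Return (inferred distance, observed differences) for every pair."""
    return [
        (patristic_distance(tree, a, b), observed_differences(seqs[a], seqs[b]))
        for a, b in combinations(sorted(seqs), 2)
    ]
```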

Analysis of the Effect of Individual Sites on Netrin and sFRP Polyphyly

We assessed the influence of particular sites on the polyphyly of netrin and sFRP in the laminin-EGF, netrin, and frizzled-CRD domain amino acid matrices by comparing the log-likelihood values (ln L) for each site under unconstrained (polyphyly) and constrained (monophyly) analyses. Constrained analyses included netrin-1/netrin-4, netrin-1/netrin-G, and netrin-4/netrin-G for the LamininNT-EGF supra-domain; sFRP-1/2/5/sFRP-3/4 for the frizzled-CRD domain; and sFRP-1/2/5/sFRP-3/4, netrin-1/2/3/5/netrin-4, and netrin-4/Cnidaria-Bilateria-netrin-1 for the netrin domain. Per-site log likelihood (psln L) were recovered after constrained and unconstrained analyses in RaxML, and the difference in per-site log likelihood (Δpsln L) between competing hypotheses was calculated. To assess the effects of the highest and lowest Δpsln L on polyphyly, we extracted the corresponding sites and reanalyzed the culled matrix under the same conditions.
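The per-site comparison can be illustrated with a short sketch. It assumes the per-site log likelihoods from the unconstrained and constrained analyses are already available as two lists of equal length; the sign convention (positive values favor the unconstrained, polyphyletic topology) and the function names are assumptions made for illustration.

```python
def delta_psln(unconstrained, constrained):
    """Per-site log-likelihood difference: unconstrained minus constrained."""
    if len(unconstrained) != len(constrained):
        raise ValueError("both analyses must cover the same sites")
    return [u - c for u, c in zip(unconstrained, constrained)]


def cull_extreme_sites(delta, n_high=0, n_low=0):
    """Return indices of sites kept after removing the most extreme ones.

    n_high: number of sites with the highest delta to remove.
    n_low:  number of sites with the lowest delta to remove.
    """
    order = sorted(range(len(delta)), key=lambda i: delta[i])
    removed = set(order[:n_low])
    if n_high:
        removed |= set(order[len(order) - n_high:])
    return [i for i in range(len(delta)) if i not in removed]
```

The retained indices define the culled matrix, which is then reanalyzed under the same conditions as the full matrix.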

Identification of Slow-Evolving and Heterotachous Positions in the LamininNT-EGF and Frizzled-CRD Datasets

A simple method to sort sites according to their rate variation was derived from the slow–fast method (Brinkmann and Philippe 1999). Aligned sequences were divided into seven groups from the complete amino acid alignments: laminin-α, β, γ and netrin-1, 4, G and Usherin for LamininNT-EGF (laminin-β/γ-like, Monosiga and Amphimedon sequences were not considered); and frizzled-1/2/7-3/6, frizzled-4, frizzled-5/8, frizzled-9/10, sFRP-1/2/5, sFRP-3/4, and Smoothened for frizzled-CRD. We calculated the number of substitutions per amino acid position within each group using PAUP*. The evolutionary rate of a given position was estimated as the sum of the numbers of steps for this position within the seven groups. Positions were then sorted according to their total number of steps (those having the same number of steps were sorted randomly), to produce a list of amino acid positions from the slowest to fastest evolving. To identify the level of heterotachy per amino acid position, we computed the absolute difference between the total number of steps per site per clade. For the LamininNT-EGF supra-domain, this was done between netrin-1-Laminin-γ and netrin-4-Laminin-β clades, and for the frizzled-CRD domain between the groups frizzled-5/8-frizzled-1/2/7-3/6-sFRP-3/4 and frizzled-4-frizzled-9/10-sFRP-1/2/5. We then sorted positions according to their absolute difference in steps between clades, with sites displaying the same value sorted randomly. Using a chi-square test, we tested for each “Δ steps per site” category whether the heterogeneity inferred between the subgroups was significant. From both “fast-evolving” and “heterotachy” site lists, we generated nine matrices containing from 10% to 90% of the positions. This allowed us to study the evolution of nodal support for netrin and sFRP polyphylies (100 ML bootstrap replicates in PhyML) as increasingly fast-evolving or heterotachous positions were removed. We also plotted these two values per site with the Δpsln L values from the comparative netrin-4-netrin-1 and sFRP-1/2/5-sFRP-3/4 monophyly–polyphyly analyses described above.
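The site-sorting procedure can be sketched as follows. This is an illustration under stated assumptions: the per-group step counts (computed with PAUP* in the study) are supplied as a hypothetical dictionary of lists, ties are broken with a seeded random number generator, and each of the nine matrices is assumed to retain the slowest-ranked fraction of positions.

```python
import random


def rank_sites(steps_by_group, seed=0):
    """Rank site indices from slowest to fastest evolving.

    steps_by_group: dict mapping group name -> list of steps per site.
    The rate of a site is the sum of its steps over all groups;
    ties are broken randomly.
    """
    n_sites = len(next(iter(steps_by_group.values())))
    totals = [sum(g[i] for g in steps_by_group.values()) for i in range(n_sites)]
    rng = random.Random(seed)
    tiebreak = [rng.random() for _ in range(n_sites)]
    return sorted(range(n_sites), key=lambda i: (totals[i], tiebreak[i]))


def heterotachy_scores(steps_clade_a, steps_clade_b):
    """Absolute difference in the number of steps per site between two clades."""
    return [abs(a - b) for a, b in zip(steps_clade_a, steps_clade_b)]


def nested_subsets(ranked_sites, percents=range(10, 100, 10)):
    """Site indices retained in each matrix (10% to 90% of ranked positions)."""
    n = len(ranked_sites)
    return {p: sorted(ranked_sites[: (n * p) // 100]) for p in percents}
```

The same `nested_subsets` helper can be applied to a ranking based on heterotachy scores instead of total steps.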

Results

Phylogenetic Analyses Suggest Polyphyly of Netrins and sFRPs

The netrin and sFRP protein families share a C-terminal domain enriched in basic residues and a particular spacing pattern of cysteines, the netrin domain. This domain is also present in several other proteins, such as complement components C3–C5, WFIKKN, and TIMP (Banyai and Patthy 1999), and it can be found in all metazoan genomes. Outside Metazoa, according to our genome survey (388 different eukaryote genomes analyzed; see table 1 and supplementary table S1, Supplementary Material online), this domain is only found as a single domain protein, TIMP, in a few eukaryotes (Sphaeroforma arctica and Ectocarpus siliculosus) and various Eubacteria (Brew and Nagase 2010). We found netrin-1/2/3/5 and sFRP-1/2/5 and sFRP-3/4 proteins in bilaterian and nonbilaterian genomes (but no sFRP-1/2/5 in insects and no sFRP-3/4 in protostomes) and netrin-4 proteins only in deuterostomes (but not in the tunicate Oikopleura dioica and the echinoderm S. purpuratus). In reconstructing the netrin domain phylogeny using Maximum Likelihood and Bayesian analysis, with amino acid sequences from 11 metazoan genomes (see Materials and Methods), we recovered that netrin and sFRP protein families were each divided into two distantly related clades: netrin-1/2/3/5 and netrin-4, and sFRP-1/2/5 and sFRP-3/4, respectively (fig. 1A). ML analyses of the netrin domain showed monophyly for all the other gene families containing a netrin domain (TIMP, complement component C3–C5, WFIKKN, ADAMTSL5, and PColCE; fig. 1A and supplementary fig. S1, Supplementary Material online). This phylogenetic pattern suggests the possibility of an independent origin of different groups of sFRP and netrin proteins.

Table 1

Distribution of Frizzled-CRD, LamininNT, and TIMP/Netrin Domains and sFRP and Netrin Proteins in Sequenced Genomes

	Domains			Proteins
	Fzd-CRD	LamininNT	TIMP/Netrin	sFRP-1/2/5	sFRP-3/4	Netrin-1	Netrin-4	Netrin-G
Vertebrata	+	+	+	+	+	+	+	+
Urochordata	+	+	+	+	+	+	+/−^a	+/−^a
Cephalochordata	+	+	+	+	+	+	+	+
Hemichordata	+	+	+	+	+	+	+	−
Echinodermata	+	+	+	+	+	+	−	−
Arthropoda	+	+	+	+/−^b	−	+	−	−
Nematoda	+	+	+	+	−	+	−	−
Mollusca	+	+	+	+	−	+	−	−
Annelida	+	+	+	+	−	+	−	−
Platyhelminthes	+	+	+	+	−	+	−	−
Cnidaria	+	+	+	+	+	+/−^c	−	−
Placozoa	+	+	+	−	+	+	−	−
Porifera	+	+	+	?^d	?^d	−	−	−
Choanoflagellata	−	+	−	−	−	−	−	−
Filasterea	−	−	+/−	−	−	−	−	−
Fungi	+/−	−	−	−	−	−	−	−
Amoebozoa	+/−	−	−	−	−	−	−	−
Apusozoa	+	−	−	−	−	−	−	−
Chromalveolata	+/−	−	+/−	−	−	−	−	−
Haptophyta	+	−	−	−	−	−	−	−
Cryptophyta	−	−	−	−	−	−	−	−
Rhizaria	+/−	−	−	−	−	−	−	−
Archaeplastida	+/−	−	−	−	−	−	−	−
Excavata	+/−	−	−	−	−	−	−	−
Eubacteria	−	−	+/−	−	−	−	−	−
Archaea	−	−	−	−	−	−	−	−

Note.— +, domain or protein present in all genomes checked; +/−, domain or protein present in some genomes checked; −, domain or protein not present in genomes checked.

^a Not present in the genome of Oikopleura dioica.

^b Not present in insect genomes.

^c Not present in the genome of Hydra magnipapillata.

^d No true sFRP (frizzled-CRD + netrin domains) in the complete genome of Amphimedon queenslandica but presence of a very divergent sequence in the sponge Lubomirskia baicalensis (Adell et al. 2007) of unclear homology.

Fig. 1. Phylogenetic analyses of the complete amino acid domain datasets support polyphyly of netrins and sFRPs. (A) Netrin domain maximum likelihood (ML) analysis under a WAG + Γ(8) + I model (111 aa, 101 sequences, −ln L 21377.15); (B) LamininNT-3EGF supra-domain ML analysis under a model WAG + Γ(8) + I (363 aa, 99 sequences, −ln L 50081.22); (C) Frizzled-CRD domain ML analysis under a model LG + Γ(8) + I (112 aa, 87 sequences, −ln L 10857.68). For deep branches, nonparametric bootstrap values BP (ML; 500 replicates) are indicated on the left (A) or above the branches (B and C), and Bayesian posterior probabilities (PP) are indicated on the right or below the branches. Asterisks indicate branches with maximum support for both BP (ML) and PP. A dash indicates branches with BP (ML) < 50% and PP < 70%. In B, values in parentheses correspond to BP (ML) and PP values from analyses without Amphimedon and Monosiga sequences. For other branches, black dot indicates PP ≥ 90%, yellow dot indicates PP ≥ 95% and BP (ML) ≥ 90%. The scale bar indicates the estimated number of substitutions per site. Consistent groupings of netrin and sFRP subfamilies in individual domain phylogenies are highlighted in red and green, respectively. (A–C) Domain compositions of proteins are sketched next to each subgroup and are oriented N- to C-terminal from top to bottom in A and from left to right in B and C. Size of netrin and sFRP protein sketches is double that for the other proteins. The two first letters of gene names in B and C correspond to the first letters of genus and species names (see Materials and Methods).

To investigate this hypothesis, we next analyzed the N-terminal supra-domain of the netrin proteins, LamininNT-EGF (one LamininNT + 3 EGF domains), which is also present in Laminin proteins. LamininNT domains alone, or coupled with three EGF domains, are present only in metazoan and choanoflagellate genomes (table 1). Phylogenetic analyses of the LamininNT-EGF supra-domain lead to a strongly supported topology (fig. 1B). As in the netrin domain phylogeny, analyses of the LamininNT-EGF supra-domain suggest polyphyly of netrin proteins. ML and Bayesian analyses are highly congruent and place netrin-1/2/3/5 as the sister group of eumetazoan Laminin-γ, netrin-4 as the sister group of Laminin-β, and netrin-G (a chordate specific group of netrin proteins that actually lack a netrin domain, see Nakashiba et al. 2000) inside the clade composed of Laminin-β/γ-like (fig. 1B and supplementary fig. S3, Supplementary Material online), a recently described group of Laminins (Fahey and Degnan 2012). According to this phylogeny in which netrin clades are nested within paraphyletic groups of laminin, and laminin proteins of Porifera are sister groups to eumetazoan netrin–laminin protein subgroups, the N-terminal supra-domains of netrins are unambiguously derived from laminin proteins. In sFRP proteins, the Netrin domain is combined with another CRD, the frizzled domain. This domain is also present in the frizzled family of G protein-coupled receptors (Frizzled and Smoothened), where it is coupled with the frizzled-7 transmembrane (frizzled-7TM) domain, and in several other proteins (e.g., the receptor tyrosine kinases Musk and ROR).
The domain is present in many eukaryote genomes, but the combination with a Netrin domain, as in sFRPs, is restricted to metazoans, whereas the combination with a frizzled-7TM domain occurs in metazoans, “non-Dikarya” fungi and amoebozoans (Dictyosteliida) (table 1). sFRPs are unambiguously derived from frizzled-type proteins as the two sFRP subgroups cluster inside clades of frizzled sequences (fig. 1C and supplementary fig. S5, Supplementary Material online), and the phylogeny of the second domain present in Frizzled receptors (frizzled-7TM domain) clearly groups all frizzled-like sequences together within the superfamily of G protein-coupled receptors (data not shown). Consistent with the analysis of the Netrin domain, ML analyses of the frizzled domain support polyphyly of the two distinct groups of sFRPs, sFRP-1/2/5 and sFRP-3/4 (fig. 1C). Thus, the combined phylogenetic analyses of the domains present in netrins and in sFRPs suggest that the domain architecture of both protein groups evolved two times independently.

Assessing the Strength of Netrin and sFRP Polyphyly Hypotheses

Independent evolution of the domain architecture in different Netrin and sFRP protein groups is not a parsimonious scenario and, therefore, needs to be thoroughly tested. We rigorously assessed the support for the polyphyly hypothesis using three approaches. First, we analyzed each domain at the amino acid and nucleotide levels using different reconstruction methods. This allowed us to use different types of models and data for the same alignment and to detect reconstruction artifacts such as those caused by convergence at the amino acid level due to functional constraint (Li et al. 2010). Second, we reduced the number of sequences by removing outgroups, long branches, and particularly unstable sequences in bootstrap replicates. Removal of distant and divergent sequences can lead to more stable and accurate phylogenies (Gatesy et al. 2007). Third, we performed nonparametric (AU test) and parametric (parametric bootstrap test—SOWH test) likelihood-based statistical tests to assess the strength of the signal supporting sFRP and Netrin polyphyly. When analyzing the netrin domain, polyphyly for sFRP and Netrin proteins was obtained in ML and Bayesian analyses of the amino acid and nucleotide datasets (fig. 1A and supplementary figs. S1 and S2, Supplementary Material online). In most of these analyses netrin-4 was the sister group of sFRP-3/4. However, the ML bootstrap or Bayesian posterior probability values for this topology were low. Removal of fast-evolving and outgroup sequences led to similar polyphyletic topologies for sFRPs and netrins, but it did not increase the bootstrap and Bayesian support values for the deep nodes (fig. 2A and B and supplementary fig. S7, Supplementary Material online). 
AU and parametric bootstrap tests both failed to reject a topology where netrins and/or sFRPs are monophyletic (table 2), but they excluded grouping of netrin-4 and deuterostome netrin-1/2/3/5, suggesting that the netrin domain of netrin-4 is not derived from the netrin domain of netrin-1/2/3/5.

Polyphyly of netrins and sFRPs is confirmed in reduced amino acid (A, C, and E) and nucleotide (B, D, and F) datasets. Unstable, fast-evolving and outgroup sequences were excluded from the datasets before re-analyses. (A and B) Netrin domain ML analysis under WAG + Γ(8) + I (111 aa, 57 sequences, −ln L 11711.10) and GTR + Γ(8) + I (333 nt, 57 sequences, −ln L 19511.50) models; (C and D) LamininNT-3EGF supra-domain ML analysis under WAG + Γ(8) + I (363 aa, 61 sequences, −ln L 30275.53) and GTR + Γ(8) + I (1089 nt, 61 sequences, −ln L 58079.49) models; (E and F) frizzled-CRD domain ML analysis under LG + Γ(8) + I (112 aa, 56 sequences, −ln L 5969.88) and GTR + Γ(8) + I (1089 nt, 56 sequences, −ln L 13272.39) models. For deep branches, nonparametric bootstrap values BP (ML)—500 replicates—and Bayesian PP are indicated above and below the branches, respectively. Asterisks indicate branches with maximum support for both BP (ML) and PP. A dash indicates branches with BP (ML) < 50% and PP < 70%. For other branches, PP ≥ 90% are indicated by a black dot, and PP ≥ 95% + BP (ML) ≥ 90% are indicated by a yellow dot. The scale bar indicates the estimated number of substitutions per site.

Results of the Approximately Unbiased and Parametric Bootstrap Tests for Comparison of Alternative Phylogenetic Hypotheses

Note.—ML analyses on complete amino acid datasets under WAG + Γ(8) + I model by PhyML. Constrained analyses performed by RAxML. Log-likelihood values recalculated by PhyML using model, topology, and free parameters from RAxML analyses. See supplementary fig. S11, Supplementary Material online for details about the parametric bootstrap analyses. Bold values indicate significant results at the 5% level.

Analyses of the LamininNT-EGF supra-domain led to a strongly supported topology (fig. 1B), with congruence between phylogenetic analyses of amino acid and nucleotide datasets (supplementary figs. S3 and S4, Supplementary Material online). 
Reduction of the sequence and species sampling did not modify the general relationships between laminin and netrin subgroups and led to maximal support in MP and ML bootstraps and Bayesian analyses for grouping the laminin and netrin subgroups (fig. 2C and D and supplementary fig. S8, Supplementary Material online). We also analyzed the LamininNT domain and the three EGF domains in two separate datasets and obtained in both cases the same netrin–laminin subgrouping (data not shown). Furthermore, when only synonymous substitutions (third codon positions—reduced dataset) of the LamininNT–EGF supra-domain were analyzed in ML, netrin-1/2/3/5, netrin-4, and netrin-G were still sister groups of laminin-γ, laminin-β, and laminin-β/γ-like, respectively (data not shown). In this analysis, some netrin-1/2/3/5 sequences of Danio and Lottia and Netrin-4 of Danio were not clustering together or with any group of Netrin, highlighting the poor conservation of the third codon positions. Nevertheless, this result ruled out the possibility of a reconstruction artifact due to convergent selection pressure at the amino acid level between netrin and laminin subgroups. Both the AU and parametric bootstrap tests strongly rejected grouping of the three netrin subfamilies or the grouping of any of the three possible pairs (see table 2). These results show that polyphyly of netrin proteins, in LamininNT-EGF and Netrin domain phylogenies, is consistently obtained in complete and reduced amino acid and nucleotide datasets and with different reconstruction methods. However, statistical tests rejected monophyly of netrins only for the LamininNT-EGF supradomain and not for the netrin domain. Polyphyly of sFRPs in the analyses of the frizzled-CRD domain was recovered in ML analyses of the complete datasets but with low support values (BP and PP <50%, fig. 1C). 
In ML analyses of both nucleotide and amino acid datasets, we obtained sFRP-1/2/5 as the sister group to a clade containing sFRP-3/4 sequences and four subgroups of frizzled: frizzled-1/2/7-3/6, frizzled-4, frizzled-5/8, and frizzled-9/10 (supplementary figs. S5 and S6, Supplementary Material online). In the reduced and unrooted datasets, sFRPs were still polyphyletic with a bipartition into sFRP3/4-frizzled5/8-frizzled1/2/7-3/6 and sFRP1/2/5-frizzled4-frizzled9/10. This topology was supported in both amino acid and nucleotide reduced datasets (fig. 2E and F, supplementary fig. S9, Supplementary Material online) with high Bayesian probability (amino acid [aa] dataset: 96%; nucleotide [nt]: 98%) and moderate ML bootstrap values (aa: 73%, nt: 64%). Parsimony analyses provided a resolved topology only with the reduced dataset, showing the same bipartition but with very low bootstrap values (<50% for both aa and nt datasets). ML analysis of the third codon position also led to a topology where sFRP was polyphyletic, but this analysis failed to recover monophyletic frizzled subgroups. The difference in log likelihood between the two competing hypotheses was small (table 2), and AU and parametric bootstrap tests did not reject the hypothesis of monophyly of sFRPs in the complete (table 2) or in the reduced dataset (AU test P = 0.376). These analyses show that the polyphyly of sFRP in netrin and frizzled-CRD domains is consistent across methods and sampling but that monophyly cannot be ruled out statistically using the current phylogenetic methods. To assess the significance of the observed polyphyly, we therefore tested whether different types of known tree reconstruction artifacts might affect the topology of the obtained trees.

Analysis of Substitution Saturation of the Domains

Accumulation of multiple substitutions at the same position over time erases the true phylogenetic signal and can cause tree reconstruction artifacts. When multiple substitutions affect most of the positions, the dataset can become mutationally saturated (Jeffroy et al. 2006). We performed a saturation analysis of the different domains at the amino acid (fig. 3) and nucleotide level on the ML trees (supplementary fig. S10, Supplementary Material online). The slope of the linear regression between the numbers of observed differences (y axis) and inferred substitutions (x axis) is proportional to the quantity of homoplasy present in the data. Saturation can be detected when the number of inferred substitutions increases, whereas the number of observed differences remains constant (plateau shape and slope close to zero).
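The regression described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit with an intercept, which the text does not specify, and the function name `saturation_slope` and the toy values are hypothetical and are not data from this study. A slope near 1 corresponds to points on the X = Y line, and a slope near zero corresponds to a plateau.

```python
def saturation_slope(inferred, observed):
    """Least-squares slope of observed differences (y) on inferred substitutions (x).

    Assumption: ordinary least squares with an intercept; the regression
    settings used in the study are not stated in the text.
    """
    n = len(inferred)
    if n != len(observed) or n < 2:
        raise ValueError("need two equally long series with at least two pairs")
    mean_x = sum(inferred) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in inferred)
    if sxx == 0:
        raise ValueError("inferred substitutions must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inferred, observed))
    return sxy / sxx


# Hypothetical pairwise comparisons (illustration only).
unsaturated = saturation_slope([10, 20, 30, 40], [10, 20, 30, 40])
plateau = saturation_slope([50, 100, 150, 200], [30, 31, 30, 31])
print(f"unsaturated slope: {unsaturated:.3f}")
print(f"plateau slope: {plateau:.3f}")
```

In practice each (x, y) pair would come from one pair of sequences, with x inferred on the best ML tree and y counted directly from the alignment.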

Netrin, LamininNT-EGF, and frizzled-CRD domains display a significant level of substitution saturation. Estimation of the substitution saturation of the domains netrin (A), LamininNT-EGF (B), and frizzled-CRD (C) at the amino acid level (complete datasets) as a ratio between inferred (x axis) and observed (y axis) differences for each pair of sequences. Inferred numbers of substitutions between pairs of sequences were determined using parsimony on the best ML trees. White squares and grey diamonds represent netrin-1/2/3/5-netrin-4 and sFRP-1/2/5-sFRP-3/4 pairwise comparisons, respectively. Data points on the straight line X = Y correspond to completely unsaturated comparisons.

Saturation in the complete dataset of the netrin domain appeared high at both the amino acid (slope = 0.2505; fig. 3A and supplementary fig. S10, Supplementary Material online) and nucleotide (slope = 0.2687; supplementary fig. S10, Supplementary Material online) levels with most of the pairwise comparisons located on a plateau (slope = 0.0391). This pattern indicates that saturation is reached even for comparisons between relatively closely related proteins, and this is probably causing the difficulties in reconstructing the phylogeny of this domain with accuracy. The LamininNT-EGF supra-domain (slope = 0.2992, fig. 3B and supplementary fig. S10, Supplementary Material online) and frizzled-CRD (slope = 0.3783, fig. 3C and supplementary fig. S10, Supplementary Material online) domain complete datasets were also saturated, but less so than the netrin domain. Saturation appeared to be slightly lower in the amino acid datasets than in the nucleotide datasets and lower in the reduced than in the complete datasets (slope for the reduced amino-acid domain datasets: netrin: 0.2798, LamininNT-EGF: 0.3779, and frizzled-CRD: 0.4739; supplementary fig. S10, Supplementary Material online). 
This indicates that although the saturation observed in these domains is partially due to fast-evolving sequences and distant outgroups, it is mainly due to the great evolutionary distance separating the proteins containing each of these domains.

Identification of Sites Most Influencing Netrin and sFRP Polyphylies

To investigate from which sites the signal for polyphyly originates for each domain, and test for conflicting signal (phylogenetic vs. nonphylogenetic), we computed the difference in log likelihood per-site (Δpsln L) between ML analyses with or without monophyletic constraint for netrin and sFRP on the complete amino acid datasets. In figure 4, sites with positive y-axis values have a higher likelihood for the unconstrained topology in which netrin or sFRP is polyphyletic, whereas sites with negative y-axis values have a higher likelihood for the constrained topology in which netrin or sFRP is monophyletic.
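The per-site comparison can be sketched as follows. This is only an illustration: the per-site log likelihoods are hypothetical toy numbers, the helper names are invented here, and the proportion is computed over all sites, which is an assumption about how such percentages might be derived.

```python
def delta_psln(unconstrained, constrained):
    """Per-site log-likelihood difference: unconstrained minus constrained.

    Positive values favor the unconstrained (polyphyletic) topology,
    negative values favor the constrained (monophyletic) topology.
    """
    if len(unconstrained) != len(constrained):
        raise ValueError("both analyses must cover the same alignment columns")
    return [u - c for u, c in zip(unconstrained, constrained)]


def fraction_favoring_polyphyly(deltas):
    """Share of all sites with a strictly positive difference (an assumed definition)."""
    if not deltas:
        raise ValueError("no sites given")
    return sum(1 for d in deltas if d > 0) / len(deltas)


# Hypothetical per-site log likelihoods for five alignment columns.
lnl_unconstrained = [-2.0, -3.5, -1.0, -4.0, -2.5]
lnl_constrained = [-2.5, -3.0, -1.5, -4.5, -2.4]

deltas = delta_psln(lnl_unconstrained, lnl_constrained)
share = fraction_favoring_polyphyly(deltas)
print(f"{share:.0%} of sites favor the unconstrained topology")
```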

Distribution of the polyphyly versus monophyly signal for netrins and sFRPs. Differences in log likelihood per-site (Δpsln L) between unconstrained and constrained maximum likelihood analyses of (A) LamininNT-EGF supra-domain, with netrin-1/2/3/5 + netrin-4 + netrin-G constrained as monophyletic; (B) LamininNT-EGF and netrin domains, with netrin-1/2/3/5 + netrin-4 constrained as monophyletic; (C) frizzled-CRD and netrin domain, with sFRP-1/2/5 + sFRP-3/4 constrained as monophyletic. The x axes correspond to the alignment columns along the complete amino acid matrices and the y axes correspond to the Δpsln L between unconstrained and constrained ML analyses. The sites with positive y axis values have a higher likelihood for the unconstrained topology in which netrin or sFRP is polyphyletic, whereas the sites with negative y axis values have a higher likelihood for the constrained topology in which netrin or sFRP is monophyletic.

When analyzing the LamininNT-EGF supra-domain, we obtained a clear majority of sites supporting the polyphyly of the three netrin protein groups (netrin-1/2/3/5 + netrin-4 + netrin-G, fig. 4A) and polyphyly of the netrins sensu stricto (netrin-1/2/3/5 + netrin-4, fig. 4B), with the signal for polyphyly being stronger in the LamininNT domain than in the three EGF domains. Most of the sites were in favor of polyphyly (64%) but some conflicting sites with strong signal against polyphyly were found distributed throughout the protein sequence. For the netrin domain of netrins, the majority of sites (55%) were in favor of netrin polyphyly, and these sites with positive values were not clustered on the gene sequence, arguing against conflict due to gene conversion (fig. 4B). In sFRP proteins, a narrow majority of sites in both the frizzled-CRD and the netrin domain supported polyphyly (54% for the frizzled-CRD and 55% for the netrin domain, fig. 4C). 
As for netrin proteins, the sites in favor of sFRP polyphyly in frizzled-CRD and netrin domain were not spatially restricted. To exclude that the polyphyly of netrins and sFRPs is due to few sites with very high Δpsln L values, we progressively removed sites with the highest and lowest Δpsln L values from the top 5% to 25% (10%–50% removed sites in total) and performed ML analyses on these reduced datasets. In the domain datasets, the 25% highest plus 25% lowest Δpsln L sites represented most of the total sum of the absolute Δpsln L between the competing hypotheses (82% for LamininNT-EGF, 91.5% for frizzled-CRD, and 84% for netrin domain). For the LamininNT-EGF domain, removing these sites had no influence on the relationships between laminin and netrin subgroups. Only a few sequences with long branches had variable positions in the different replicates, in particular, the laminin sequences of the poriferan A. queenslandica. For the netrin domain, removing the most influential sites did not affect sFRP or netrin polyphyly but had some impact on the relationships between the different protein groups. However, in all analyses, we obtained netrin-4 and sFRP-3/4 as closely related. For the frizzled-CRD domain, removing the most influential sites did not lead to monophyly of sFRP. However, after removing 50% of the most influential sites, the rooting changed from being the sister of sFRP-1/2/5 to the sister of frizzled-3/6. This was probably the result of a long-branch attraction (LBA) artifact since the frizzled-3/6 clade contained only vertebrate sequences and has the longest branch of the ingroup. In all the resulting topologies, ingroup sequences were subdivided into the same two groups as in the complete analysis: sFRP-3/4, frizzled-5/8, frizzled-1/2/7-3/6 and sFRP-1/2/5, frizzled-4, and frizzled-9/10. 
All together, these analyses show that the observed polyphylies of Netrin and sFRP proteins are not caused by a dominating influence of few sites with exceptionally high Δpsln L values and do not originate from restricted clusters of sites, and therefore, the observed polyphyletic groupings were not caused by gene conversion.
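The site-removal procedure can be sketched as below, under stated assumptions: the number of sites removed from each tail is rounded down, ties are broken by alignment position, and the function names and Δpsln L values are hypothetical. The second helper reports how much of the total absolute Δpsln L is carried by the removed sites.

```python
def trim_extreme_sites(deltas, percent_each_tail):
    """Remove the given percentage of sites from each tail of the delta distribution.

    Returns (kept_indices, removed_indices). Rounding down and positional
    tie-breaking are assumptions made for this illustration.
    """
    if not 0 <= percent_each_tail <= 50:
        raise ValueError("percent_each_tail must be between 0 and 50")
    n = len(deltas)
    k = n * percent_each_tail // 100
    order = sorted(range(n), key=lambda i: (deltas[i], i))  # lowest delta first
    removed = set(order[:k]) | set(order[n - k:])
    kept = [i for i in range(n) if i not in removed]
    return kept, sorted(removed)


def removed_signal_share(deltas, removed):
    """Fraction of the summed absolute delta carried by the removed sites."""
    total = sum(abs(d) for d in deltas)
    if total == 0:
        raise ValueError("all per-site differences are zero")
    return sum(abs(deltas[i]) for i in removed) / total


# Hypothetical per-site differences for a ten-column alignment.
toy_deltas = [5.0, -4.0, 0.1, -0.2, 0.3, 0.1, -0.1, 0.2, 3.0, -3.0]
kept, removed = trim_extreme_sites(toy_deltas, 20)
print("kept columns:", kept)
print("removed columns:", removed)
print(f"signal carried by removed sites: {removed_signal_share(toy_deltas, removed):.1%}")
```

The kept column indices would then be used to build a reduced alignment for a new ML analysis, repeated for each removal level.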

Netrin and sFRP Polyphylies Are Not Caused by Classical Tree Reconstruction Artifacts

For both LamininNT-EGF and frizzled-CRD domain datasets, we could detect a certain amount of substitution saturation and conflict between sites (figs. 3 and 4), possible indications of systematic bias affecting the topology. Three major biases that strongly affect phylogenetic reconstruction have been described: 1) heterogeneity in base composition (Foster and Hickey 1999; Delsuc et al. 2005; Sheffield et al. 2009); 2) LBA, artificially grouping sequences that share high evolutionary rates (Bergsten 2005; Delsuc et al. 2005); and 3) heterotachy that refers to shifts in site-specific evolutionary rates over time and can lead to the grouping of sequences that share covariant sites (Lopez et al. 2002; Philippe et al. 2005). To clarify the nature of the signal supporting netrin and sFRP polyphylies, we assessed to which extent the phylogenetic reconstruction of LamininNT-EGF and frizzled-CRD datasets were influenced by these tree reconstruction artifacts. First, we analyzed heterogeneity in base composition of the three domain datasets: netrin, frizzled-CRD and LamininNT-EGF. We could not detect significant variation for base composition among protein groups in the different nucleotide and amino acid datasets using chi-square test, rejecting this possible source of tree reconstruction artifacts. To address the possibility that LBA artifacts cause the sFRP and netrin protein polyphylies in the LamininNT-EGF and frizzled-CRD datasets, we selectively analyzed slow evolving sites. These sites are known to retain better phylogenetic signal and are less subject to LBA (Brinkmann et al. 2005). We used a method derived from the slow–fast method (Brinkmann and Philippe 1999) to sort the characters according to the sum of their evolutionary rate in monophyletic groups (see Materials and Methods). This method does not consider the deep nodes under study and thus avoids problems of circularity. 
In both LamininNT-EGF and frizzled-CRD datasets, the slowest evolving 20% of sites had almost no signal for or against netrin and sFRP polyphyly. Most of the signal in favor of polyphyly came from moderately slow evolving sites (fig. 5A–D). Interestingly, in both cases, most of the signal in favor of monophyly was found to come from the fast-evolving sites (fig. 5B and D). When removing 10%–70% of sites in the LamininNT-EGF dataset, starting from the fastest evolving, branch support for Netrin-1-Laminin-γ and Netrin-4-Laminin-β stayed ≥90% (fig. 5E). In none of the bootstrap replicates was monophyly of Netrin obtained. Furthermore, the difference between observed and inferred differences in the slowest evolving 30% of sites was reasonably low (fig. 5F, slope = 0.4652, to compare with 0.2992 for the dataset without site deletion), showing that sites with a low level of saturation also supported netrin polyphyly. Inversely, when removing the character in the opposite order (starting from the slowest to the fastest), the ML bootstrap support was below 90% for laminin-γ-netrin-1 after removing only 50% of sites (data not shown). In the frizzled-CRD datasets, removing up to 50% of the fastest evolving sites of the “reduced” dataset did not significantly affect the topology or the support values, with moderate support for sFRP polyphyly (fig. 5G, BP–ML: 62%; PP: 92%) and no support for sFRP monophyly still retained (BP–ML: 0%–1%). However, the level of saturation strongly decreased (fig. 5H, slope = 0.6402, to compare with 0.4739 for the dataset without site deletion). Conversely, removing the slow-evolving sites led to a strong increase in support for sFRP monophyly (fig. 5G, BP–ML: 24%–22% after removing 50%–60% of the slowest evolving sites) and of the saturation level (slope of the plateau = 0.1719, data not shown). 
These analyses show that most of the signal in favor of sFRP and netrin polyphyly comes from slowly evolving sites and that most of the signal in favor of monophyly comes from fast-evolving and saturated sites, clearly arguing against an LBA artifact as the cause for the polyphyletic tree topologies.
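The character sorting step can be illustrated with a simplified sketch. It is not the procedure of the study: instead of counting parsimony steps on a tree for each monophyletic group, it uses the number of distinct residues minus one per group as a crude lower-bound proxy, and the sequences, group definitions, and function names are hypothetical.

```python
def site_rates(groups):
    """Sum, over predefined monophyletic groups, a within-group change count per site.

    Assumption: (distinct residues - 1) stands in for the parsimony step
    count; it ignores tree topology and is only a lower bound.
    """
    length = len(groups[0][0])
    rates = [0] * length
    for sequences in groups:
        for site in range(length):
            states = {s[site] for s in sequences if s[site] not in "-?X"}
            rates[site] += max(len(states) - 1, 0)
    return rates


def slowest_sites(rates, percent_removed):
    """Indices of sites kept after removing the fastest-evolving percentage."""
    n = len(rates)
    k = n * percent_removed // 100
    order = sorted(range(n), key=lambda i: (rates[i], i))  # slowest first
    return sorted(order[: n - k])


# Two hypothetical monophyletic groups of aligned sequences (four columns).
toy_groups = [
    ["ACDE", "ACDF", "ACEG"],
    ["ACDE", "ATDH", "ACDK"],
]
toy_rates = site_rates(toy_groups)
print("per-site rates:", toy_rates)
print("sites kept after removing the fastest 25%:", slowest_sites(toy_rates, 25))
```

Because rates are measured only inside groups whose monophyly is not in question, the ranking does not depend on the deep nodes being tested.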

Polyphylies of netrins and sFRPs are supported by slow-evolving sites and are not caused by heterotachy in the ML analyses of the LamininNT-EGF (A, B, E, F, I, and J) and frizzled-CRD (C, D, G, H, K, and L) amino acid datasets. (A and C) Proportion of sites for each rate category, corresponding to the calculated number of steps in seven monophyletic groups using parsimony. For displaying purpose, each category contains two merged sequential values. (B and D) Cumulated difference in log likelihood per-site between unconstrained and constrained (B: netrin-1-4 monophyletic; D: sFRP-1/2/5-3/4 monophyletic) ML analysis for all sites within each rate category. (E and G) “Evolution” of the ML bootstrap support values (100 replicates) as fast-evolving sites are progressively removed from the original dataset; (E) 90% of bootstrap support is figured by a dotted line; (G) the “evolution” of BP-ML support value for sFRP monophyly is also indicated as slow-evolving sites are progressively removed from the original dataset. (F and H) Estimation of the mutational saturation as a ratio between inferred (x axis) and observed differences (y axis) for each pair of sequences in the LamininNT-EGF (F) and frizzled-CRD (H) datasets containing, respectively, the 30% and 50% slowest evolving sites. Data points on the straight line X = Y correspond to completely unsaturated comparisons. Data coming from the analyses of the 30% slowest evolving sites of the LamininNT-EGF dataset (in A, B, E, and F) and of the 50% slowest evolving sites of the frizzled-CRD dataset (in C, D, G, and H) are shaded. (I and K) Histogram of the absolute difference of steps per site calculated between the netrin-1-laminin-γ and netrin-4-laminin-β clades for the LamininNT-EGF dataset (I) and between the frizzled-5/8-frizzled-1/2/7-3/6-sFRP-3/4 and frizzled-4-frizzled-9/10-sFRP-1/2/5 clades for the frizzled-CRD dataset (K). 
(J and L) Cumulated difference in log likelihood per-site between unconstrained and constrained (netrin-1-4 monophyletic in J; sFRP-1/2/5-3/4 monophyletic in L) ML analysis for all sites within each “Δsteps per site” category. Data coming from the analyses of the 70% nonheterotachous sites of the LamininNT-EGF dataset (I and J) and of the 84% nonheterotachous sites of the frizzled-CRD dataset (K and L) are shaded.

Finally, we assessed the level of heterotachy in the LamininNT-EGF and frizzled-CRD amino acid sites and could also exclude its influence on the topologies. For the LamininNT-EGF supra-domain, we compared the difference in the number of substitution steps per site between the laminin-β-netrin-4 and laminin-γ-netrin-1 clades and sorted sites according to their level of heterotachy between the laminin and netrin subgroups (fig. 5I). Using a chi-square test, we identified sites with a difference in number of steps below five between the two groups as nonheterotachous. They account for approximately 70% of sites. The nonheterotachous sites showed a clear signal in favor of netrin polyphyly, contrary to heterotachous positions (fig. 5J). Furthermore, we found that the laminin–netrin groupings were still strongly supported (BP–ML: 98%–100%; PP: 100%) after removing all the heterotachous positions. Similarly, sorting sites according to their level of heterotachy between the frizzled-5/8-frizzled-1/2/7-3/6-sFRP-3/4 and frizzled-4-frizzled-9/10-sFRP-1/2/5 clades, we determined that 84% of sites in the frizzled-CRD domain were homotachous (sites with difference in number of steps ≤6—fig. 5K). Both homotachous and heterotachous positions provided signal for and against sFRP polyphyly (fig. 5L). However, the same frizzled-sFRP grouping was recovered after removing all heterotachous positions (BP–ML: 57%; PP: 97%). These analyses exclude the possibility that sFRP and netrin polyphylies are due to a reconstruction artifact caused by heterotachy.
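The site partition used here can be sketched in a few lines. The step counts are hypothetical, the function name is invented for illustration, and the cutoff is passed in as a parameter (a difference below five corresponds to `max_diff=4`, and a difference of at most six to `max_diff=6`); how the cutoffs were derived from the chi-square test is not reproduced.

```python
def split_by_heterotachy(steps_a, steps_b, max_diff):
    """Partition site indices by the absolute difference in steps between two clades.

    Sites with |steps_a - steps_b| <= max_diff are treated as homotachous,
    all others as heterotachous.
    """
    if len(steps_a) != len(steps_b):
        raise ValueError("both clades must be scored on the same sites")
    homotachous, heterotachous = [], []
    for index, (a, b) in enumerate(zip(steps_a, steps_b)):
        if abs(a - b) <= max_diff:
            homotachous.append(index)
        else:
            heterotachous.append(index)
    return homotachous, heterotachous


# Hypothetical per-site step counts in two clades.
clade_a_steps = [1, 8, 3, 0, 10]
clade_b_steps = [2, 1, 3, 6, 4]

homo, hetero = split_by_heterotachy(clade_a_steps, clade_b_steps, max_diff=4)
print("homotachous sites:", homo)
print("heterotachous sites:", hetero)
```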

Netrin Receptors Phylogeny

Phylogenetic analyses of the neogenin and Unc5 receptors reveal a more classical evolutionary history, with a unique origin in the Cnidaria–Bilateria ancestor and diversification through gene duplication (supplementary figs. S12 and S13, Supplementary Material online). We did not find these receptors in genomes of nonmetazoans, poriferans, or placozoans. For both proteins, diversification occurred in the vertebrates, probably caused by the two genome duplications at the base of vertebrates (reviewed in Kasahara 2007). These events led to the formation of neogenin and DCC and Unc5A, B, C, and D.

Discussion

Repeated Evolution of Domain Architecture of Netrin and sFRP Proteins

The results of the different phylogenetic analyses and statistical tests on the LamininNT-3EGF supra-domain strongly support a scenario where the N-terminal supra-domain (one LamininNT + 3 EGF domains) of netrins evolved independently three times from the C-terminal part of different laminins: netrin-1/2/3/5 from laminin-γ, netrin-4 from laminin-β (fig. 6A) and netrin-G from laminin-β/γ-like. Laminin-β/γ-like is a newly described group of laminins that shares structural similarities with both laminin-β and laminin-γ and is present in eumetazoans with the exception of ecdysozoans, urochordates, and vertebrates (Fahey and Degnan 2012; supplementary fig. S3, Supplementary Material online). For netrin-1/2/3/5 and netrin-4, the N-terminal part of laminin fused C-terminally to a netrin domain, whereas netrin-G acquired a short and unique CRD (C domain) (Yin et al. 2002). The relatively poor resolution of the netrin domain phylogeny does not allow us to unambiguously determine the origin of the netrin domains found in netrin-1/2/3/5 and netrin-4. However, we could exclude that the netrin domain in netrin-4 is derived from the netrin domain of the older netrin-1/2/3/5. Thus, the domains of these two groups of netrins have completely independent origins.

Evolutionary scenario for the origin and evolution of netrins and sFRPs. Schematic representation of expansion of (A) netrins and (B) sFRP within one evolutionary lineage by both convergent domain shuffling and gene duplication. Note that diversification of laminin and frizzled proteins in vertebrates and origin and diversification of laminin-α, β/γ-like and netrin-G have been omitted.

The phylogenetic distribution of the different netrin groups suggests that netrin-1/2/3/5 was present in the ancestor of eumetazoans (Bilateria, Cnidaria, and Placozoa), and the phylogeny of the LamininNT-EGF supra-domain further shows that netrins did not originate before the common ancestor of Eumetazoa. The netrin-1/2/3/5 group expanded at the base of the vertebrates by gene duplication. Netrin-4 was most likely present in the ancestor of deuterostomes (fig. 6A), although an earlier occurrence followed by multiple losses cannot be ruled out. Netrin-G1 was present in the ancestor of chordates. For sFRPs, our phylogenetic reconstruction of both Netrin and Frizzled-CRD domains suggests an independent origin for sFRP-1/2/5 and sFRP-3/4 before the last common ancestor of eumetazoans (fig. 6B). For both domains, we could not find obvious reconstruction bias, but statistical tests were not able to reject monophyletic tree topologies. The weak phylogenetic signal is probably due to the short size of these two domains and the ancestry of the domain recombination events. Furthermore, we could show that most of the signal in favor of an independent origin of sFRP-1/2/5 and sFRP-3/4 in the frizzled-CRD domain came from slow-evolving, nonsaturated sites that are more likely to retain genuine phylogenetic signal (Jeffroy et al. 2006), whereas signal in favor of a single origin of all sFRPs was mostly provided by fast-evolving and mutationally saturated sites. 
These analyses provide evidence in favor of repeated evolution for sFRPs and highlight the relevance of detailed phylogenetic analyses, in addition to statistical tests, for the identification of independent domain architecture evolution. Our phylogenetic analyses support a scenario with four frizzled (frizzled-4, frizzled-9/10, frizzled-5/8, and frizzled-1/2/3/6/7) and two sFRP (sFRP-3/4 and sFRP-1/2/5) genes in the ancestor of cnidarians and bilaterians. sFRP-3/4, but not sFRP-1/2/5, is also present in the placozoan T. adhaerens (with a truncated frizzled-CRD domain not included in the phylogenetic analyses), while both groups of sFRPs are absent from the sequenced ctenophore (Mnemiopsis leidyi; Pang et al. 2010) and sponge genomes (Adamska et al. 2010; Srivastava et al. 2010). The A. queenslandica proteins annotated as sFRP (ADO16571-16574) do not contain a netrin domain but only a single CRD domain and are thus not genuine sFRPs. One poorly conserved sFRP sequence from the freshwater sponge Lubomirskia baicalensis has been reported, composed of a highly divergent frizzled-CRD domain and a putative netrin domain (Adell et al. 2007). We were unable to assign this sequence to any group in the netrin domain or frizzled-CRD domain phylogenies (not shown), and thus could not determine its origin. Therefore, it remains possible that sFRPs originated earlier, in the ancestor of metazoans, but sequencing of additional poriferan genomes or transcriptomes is required to answer this question. It is important to note that the hypothesis of repeated evolution of netrin and sFRP domain architectures does not rely on the phylogeny of the netrin domain. We have shown that this domain is saturated and does not provide a reliable phylogenetic signal contrary to the other domains analyzed, which are clearly in favor of the polyphyly hypothesis (see results). 
However, even if the netrin domains of netrins and sFRPs were monophyletic, the LamininNT-EGF domain of netrins and the frizzled-CRD domain of sFRP would still have been combined twice independently with the same netrin domain. Consequently, even in this scenario, the identical domain architecture of the different netrin subgroups arose by convergent evolution and not by gene duplication.

Possible Mechanism for the Evolution of the Netrin and sFRP Domain Architectures

Recently, it has been shown that the inclusion of coding exons of neighboring genes is the prevalent mode for the gain of domains in metazoan proteins (Buljan et al. 2010). These events of gene fusion are typically preceded by the duplication of the “donor” domain and its recombination to a position adjacent to the “host” protein (Buljan et al. 2010). Our data suggest that the domain architectures of netrins and sFRP may have evolved by this mechanism. The N-terminal addition of a laminin-derived LamininNT-EGF supra-domain or a frizzled-CRD domain to a single netrin domain is the most parsimonious explanation, because the C-terminal addition of a netrin domain to a LamininNT-EGF supra-domain or a frizzled-CRD domain would require an additional loss of the C-terminal domains of the “hosts” laminins and frizzled. However, as we were unable to establish the exact relationships between the netrin domains of netrins/sFRPs and the netrin-domain-only proteins TIMP (the potential “host” protein), unambiguous support for this scenario is not available. Finally, the presence of conserved introns in both the Netrin and the LamininNT-EGF domains clearly argues against the involvement of retrotransposition as a possible mechanism for the origin of netrins.

Functional Convergences of Netrin and sFRP Proteins Result From the Convergences of Domain Architecture

Domain architecture is thought to be a determining factor for the functional properties of a protein, and thus, multidomain proteins with the same domain architecture are expected to have similar functions (Bashton and Chothia 2007). This is what is indeed observed for many paralogous proteins; however, in paralogs, the shared domain architecture is a consequence of a shared evolutionary history. Thus, the described independent origin of netrins provides an intriguing confirmation of the importance of domain architecture for protein function. In fact, members of both the netrin-1/2/3/5 and netrin-4 subgroups are secreted molecules that bind to DCC/neogenin and Unc5 transmembrane receptors (Koch et al. 2000; Qin et al. 2007; Lejmi et al. 2008; Staquicini et al. 2009) and function in netrin signaling-mediated axon guidance and angiogenesis (Koch et al. 2000; Qin et al. 2007; Rajasekharan and Kennedy 2009). Strikingly, neither Laminin, Netrin-G (lacking the netrin domain and binding to specific netrin-G ligands, see Seiradake et al. 2011) nor TIMP (proteins composed of the netrin domain only) have been shown to bind DCC/neogenin and Unc5 proteins or to function in this signaling pathway (Rajasekharan and Kennedy 2009; Brew and Nagase 2010). Furthermore, we could show that contrary to netrin ligands, the primary netrin receptors, neogenin and Unc5, have both a unique origin in the ancestor of Eumetazoa with diversification through gene duplication in the ancestor of vertebrates (supplementary fig. S13, Supplementary Material online). Because netrin-1/2/3/5 and netrin-4 subgroups originated independently, this shared binding property cannot be explained by a conserved function present in the ancestor of these proteins, but most probably is the consequence of the convergent domain architecture. In addition to DCC/neogenin and Unc5, netrins from both subgroups have been shown to bind to Integrin alpha3beta1 (Yebra et al. 2003; Stanco et al. 2009; Yebra et al. 2011), where, at least in the case of netrin-1, this interaction is mediated by the netrin domain. However, because the netrin domain-only protein TIMP2 has also been shown to bind to Integrin alpha3beta1 (Seo et al. 2003), this shared function of netrins might not reflect a consequence of their shared domain architecture but rather an ancestral property of netrin domains.

Accepting the polyphyletic origin for sFRPs, they constitute a similar example for functional convergence based on convergence of domain architecture. sFRP proteins have been extensively described as inhibitors of the Wnt signaling pathway, and they bind to secreted Wnt proteins and thereby prevent the interaction of Wnts with frizzled transmembrane receptors (Bovolenta et al. 2008; Mii and Taira 2011). This mechanism appears to have a conserved function in axial patterning in Metazoa (Petersen and Reddien 2009). Both sFRP-1/2/5 and sFRP-3/4 proteins bind to and antagonize signaling molecules of the Wnt family (reviewed in Bovolenta et al. 2008), and recent studies show that both the frizzled-CRD and netrin domains of sFRP-1/2/5 and sFRP-3/4 proteins are necessary for optimal Wnt inhibition (Lin et al. 1997; Bhat et al. 2007; Lopez-Rios et al. 2008).

General Considerations on the Convergence of Domain Architecture

Identical domain architecture of multidomain proteins is frequently taken as evidence for paralogy when it occurs within one genome and for orthology when it occurs in the genomes of different taxa (e.g., for sFRPs, Adamska et al. 2010). These simplistic assignments may confound the inference of the evolutionary origin of multidomain proteins and of their associated cellular functions. To the best of our knowledge, current terminology does not cover the independent evolution of identical domain architecture (Koonin 2005). As the different parts (domains) of these proteins have different evolutionary histories, we propose the concept of “merology” (derived from the Greek words “méros,” meaning part, and “logos,” meaning relation) to describe the repeated evolution of similar domain architecture and the term “merologous proteins” to refer to nonhomologous proteins that display the same domain organization.

A study using phylogenetic trees of domains from 96 genomes of Bacteria, Archaea, and Eukaryota has suggested that convergent evolution of domain architecture may occur more frequently than previously suspected (Forslund et al. 2008). Depending on the criteria used for the generation of protein datasets, between 5.6% and 12.4% of domain architectures were identified as candidates for convergent evolution. The cases of netrins and sFRPs described in detail here belong to a particular subset of these events for two reasons. First, only one-third of the documented cases included the independent gain of domains, as is the case for netrins and sFRPs. Second, the repeated evolution of netrins and sFRPs occurred within the same genomic background, that is, the netrin-4 group evolved in a genome in which the netrin-1/2/3/5 group was already present. This contrasts with most cases described by Forslund et al. (2008), in which the same domain architecture evolved in different taxa.

The phylogenetic distribution of merologous proteins identified by Forslund et al. (2008) suggests that many of them originated relatively recently. Cases in which merologs evolved recently could help to elucidate the genomic mechanisms that promote this type of convergence, for example, whether particular features of “host” and “donor” genes predispose them to recombine with each other. In addition, studies of merologous proteins, in particular those displaying functional convergence, could add a new perspective to the understanding of the relationship between domain architecture and protein function. Currently, research on proteins with shared domain architecture focuses on duplicated paralogs undergoing structural and functional divergence. In the case of merologs, the situation is reversed: proteins originate from ancestral sequences with different domain architectures and probably different functions and converge on similar structures and potentially similar functions. Thus, merologs are particularly interesting cases that may help to explain why only a fraction of all possible domain combinations exists and why some domains are found in multidomain proteins more frequently than others (Basu et al. 2008).
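The definition of merologous proteins given in this section can be made concrete with a small sketch. The Python example below is illustrative only and is not the method used in this study: it assumes that each protein has already been assigned an ordered domain architecture and an origin label by a separate phylogenetic analysis. The record list, the simplified architectures (EGF repeats collapsed into a single entry), the origin labels, and the function name are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical toy input: (protein, simplified domain architecture, origin label).
# Origin labels stand in for the outcome of a phylogenetic analysis; they are
# placeholders for illustration, not results taken from the study.
RECORDS = [
    ("netrin-1", ("LamininNT", "EGF", "netrin"), "netrin_origin_1"),
    ("netrin-3", ("LamininNT", "EGF", "netrin"), "netrin_origin_1"),
    ("netrin-4", ("LamininNT", "EGF", "netrin"), "netrin_origin_2"),
    ("sFRP-1", ("frizzled-CRD", "netrin"), "sFRP_origin_1"),
    ("sFRP-3", ("frizzled-CRD", "netrin"), "sFRP_origin_2"),
    ("netrin-G", ("LamininNT", "EGF"), "netrin-G_origin"),
    ("TIMP2", ("netrin",), "TIMP_origin"),
]


def merolog_candidates(records):
    """Return architectures shared by proteins from more than one origin.

    The result maps each flagged architecture to a dict of
    origin label -> sorted list of protein names.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for protein, architecture, origin in records:
        grouped[architecture][origin].append(protein)
    return {
        architecture: {origin: sorted(names) for origin, names in origins.items()}
        for architecture, origins in grouped.items()
        if len(origins) > 1  # same architecture, at least two independent origins
    }


if __name__ == "__main__":
    for architecture, origins in merolog_candidates(RECORDS).items():
        print("-".join(architecture), origins)
```

In this toy input, records that share an origin label (paralogs) are grouped under one origin, so an architecture is flagged only when it appears under at least two distinct origin labels. Reliable origin labels are the hard part and require the kind of phylogenetic testing described in this article.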

Supplementary Material

Supplementary tables S1 and S2, supplementary figs. S1–S13, and supplementary material including detailed phylogenies and analyzed data sets of the netrin, frizzled-CRD, and LamininNT-EGF domains are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

73 in total

1. An approximately unbiased test of phylogenetic tree selection.

Authors: Hidetoshi Shimodaira
Journal: Syst Biol Date: 2002-06 Impact factor: 15.683

2. An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics.

Authors: Henner Brinkmann; Mark van der Giezen; Yan Zhou; Gaëtan Poncelin de Raucourt; Hervé Philippe
Journal: Syst Biol Date: 2005-10 Impact factor: 15.683

3. Domain deletions and substitutions in the modular protein evolution.

Authors: January Weiner; Francois Beaussart; Erich Bornberg-Bauer
Journal: FEBS J Date: 2006-05 Impact factor: 5.542

4. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

Authors: Alexandros Stamatakis
Journal: Bioinformatics Date: 2006-08-23 Impact factor: 6.937

5. Isolation and characterization of Wnt pathway-related genes from Porifera.

Authors: Teresa Adell; Archana N Thakur; Werner E G Müller
Journal: Cell Biol Int Date: 2007-03-18 Impact factor: 3.612

6. Discovery of a functional protein complex of netrin-4, laminin gamma1 chain, and integrin alpha6beta1 in mouse neural stem cells.

Authors: Fernanda I Staquicini; Emmanuel Dias-Neto; Jianxue Li; Evan Y Snyder; Richard L Sidman; Renata Pasqualini; Wadih Arap
Journal: Proc Natl Acad Sci U S A Date: 2009-02-04 Impact factor: 11.205

7. Netrins: versatile extracellular cues with diverse functions. (Review)

Authors: Karen Lai Wing Sun; James P Correia; Timothy E Kennedy
Journal: Development Date: 2011-06 Impact factor: 6.868

8. NMR structure of the netrin-like domain (NTR) of human type I procollagen C-proteinase enhancer defines structural consensus of NTR domains and assesses potential proteinase inhibitory activity and ligand binding.

Authors: Edvards Liepinsh; Laszlo Banyai; Guido Pintacuda; Maria Trexler; Laszlo Patthy; Gottfried Otting
Journal: J Biol Chem Date: 2003-04-01 Impact factor: 5.157

9. Disulfide bond assignments of secreted Frizzled-related protein-1 provide insights about Frizzled homology and netrin modules.

Authors: Jae Min Chong; Aykut Uren; Jeffrey S Rubin; David W Speicher
Journal: J Biol Chem Date: 2001-12-10 Impact factor: 5.157

10. Endothelium-derived Netrin-4 supports pancreatic epithelial cell adhesion and differentiation through integrins α2β1 and α3β1.

Authors: Mayra Yebra; Giuseppe R Diaferia; Anthony M P Montgomery; Thomas Kaido; William J Brunken; Manuel Koch; Gary Hardiman; Laura Crisa; Vincenzo Cirulli
Journal: PLoS One Date: 2011-07-29 Impact factor: 3.240
