Jackie Dolan, Karen Walshe, Samantha Alsbury, Karsten Hokamp, Sean O'Keeffe, Tatsuya Okafuji, Suzanne F C Miller, Guy Tear, Kevin J Mitchell.
Abstract
BACKGROUND: Leucine-rich repeats (LRRs) are highly versatile and evolvable protein-ligand interaction motifs found in a large number of proteins with diverse functions, including innate immunity and nervous system development. Here we catalogue all of the extracellular LRR (eLRR) proteins in worms, flies, mice and humans. We use convergent evidence from several transmembrane-prediction and motif-detection programs, including a customised algorithm, LRRscan, to identify eLRR proteins, and a hierarchical clustering method based on TribeMCL to establish their evolutionary relationships.
Year: 2007 PMID: 17868438 PMCID: PMC2235866 DOI: 10.1186/1471-2164-8-320
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Bioinformatics pipeline. The figure shows the starting datasets (blue), annotation programs (green) and clustering pipeline (orange) used to generate the final eLRR dataset.
Figure 2. Sample from list of all eLRR genes, hierarchically clustered at e. Proteins have been sorted in this table based on the clustering output from TribeMCL. This has been done hierarchically across inflation parameters, starting at 1.2, then 2, 3, 4 and 5. For most proteins this yields a tree-like structure with cluster stringency increasing (and membership decreasing) from low inflation parameters to high. Numbers used to identify clusters are generated by TribeMCL with larger clusters having lower numbers. Proteins are colour-coded by species: black, mammalian; blue, fly; red, worm. For the mammalian proteins, only the mouse orthologue is listed. The table shows examples of clusters in the LRR_Ig/FN3 group with mouse, fly and worm orthologues (the Lrig subfamily) and with mouse paralogues only (the Lrrn6, Lrrn1–3 and Lrrc4 subfamilies, which cluster together at level 1.2). It also shows many of the proteins in the LRR_Tollkin group, with the hierarchical clustering apparent across inflation parameters and indicated by shading. One subfamily containing a known and novel member is shown at the bottom. Proteins encoded by genes located in tandem in the genome are boxed in the right-hand column. A complete list of all eLRR proteins is provided [see Additional File 3]. Lists clustered at the e-25 and e-10 cutoff levels are given [see Additional Files 4 and 5].
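The hierarchical ordering described in this legend can be sketched as a lexicographic sort on the cluster numbers at successive inflation parameters. This is only an illustration: the inflation values (1.2, 2, 3, 4, 5) and the gene names come from the legend, but the cluster numbers and the `cluster_ids` mapping are invented and are not real TribeMCL output.

```python
# Inflation parameters from the Figure 2 legend, lowest stringency first.
INFLATIONS = (1.2, 2, 3, 4, 5)

# Hypothetical cluster numbers per protein, one per inflation value.
# These values are made up for illustration only.
cluster_ids = {
    "Lrig1": (4, 7, 9, 9, 12),
    "Lrrn1": (3, 5, 6, 8, 10),
    "Lrrc4": (3, 5, 8, 11, 15),
    "Lrrn2": (3, 5, 6, 8, 10),
}


def hierarchical_order(ids):
    """Sort by cluster number at 1.2, breaking ties at 2, 3, 4 and 5, then by name."""
    return sorted(ids, key=lambda name: (ids[name], name))


ordered = hierarchical_order(cluster_ids)
print(ordered)
```

Proteins that share a cluster at every inflation value end up adjacent, and proteins that separate only at high inflation values stay close together, which gives the tree-like layout the legend describes.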
Figure 3. eLRR protein predicted architectures (part 1). Consensus architectures are shown for all proteins in the LRR_Ig/FN3 group and for all proteins in subfamilies in the LRR_Only group. An additional set of LRR_Only singletons is listed separately in Table 1. Protein names are shown below the corresponding structures (black, mammalian; blue, fly; red, worm). All figures are drawn to scale (see Key). Consensus architectures were derived for single proteins and across subfamilies from convergent evidence from motif and topology prediction programmes. Where there is a range in number of predicted LRRs or other domains across members of a subfamily, this is indicated next to the domain. A range in length of the cytoplasmic domain is similarly indicated, where it exceeds 20 amino acids. Tightly clustered subfamilies (e.g., Slits, Amigos) are listed under a single consensus architecture. Clusters with more structurally diverse proteins are indicated by the brackets; the numbers refer to e-value and inflation parameter at which the proteins cluster in the MCL programme. See Key for more information.
Figure 4. eLRR protein predicted architectures (part 2). Consensus architectures are shown for all proteins in the LRR_Tollkin and LRR_Other groups. See Figure 3 legend for details.
Table 1. List of LRR_Only singletons
| Gene | Name | Length (aa) | Predicted architecture |
| BC031901 | novel | 872 | SS, 7LRR, TM |
| Cd14 | | 366 | SS, LRRNT, 11LRR, GPI |
| Gp1ba | Glycoprotein 1b, alpha polypeptide | 734 | SS, LRRNT, 8LRR, LRRCT1, TM |
| Gp1bb | Glycoprotein 1b, beta polypeptide | 214 | SS, LRRNT, 2LRR, LRRCT1, TM |
| Gp9 | Glycoprotein 9 | 177 | SS, LRRNT, 2LRR, LRRCT1, TM |
| Lrg1 | Leucine-rich alpha-2-glycoprotein 1 | 342 | SS, LRRNT, 9LRR, LRRCT2 |
| Lrrc17 | | 443 | SS, LRRNT, 4LRR, LRRCT1, LRRNT, 3LRR, LRRCT1 |
| Lrrc19 | | 364 | SS, LRRNT, 6LRR, LRRCT1, TM |
| Lrrc25 | | 297 | SS, 2LRR, LRRCT1, TM |
| Nepn | Nephrocan/5730521E12Rik | 512 | SS, LRRNT, 17LRR, LRRCT1 |
| Nyx | Nyctalopin (mouse) | 476 | SS, LRRNT, 11LRR, LRRCT1, TM |
| NYX | Nyctalopin (human) | 481 | SS, LRRNT, 12LRR, LRRCT1, GPI |
| Omg | Oligodendrocyte myelin protein | 440 | SS, LRRNT, 7LRR, LRRCT2, GPI |
| Q7Z2Q7 | Synleurin (human) | 621 | SS, LRRNT, 13LRR, LRRCT1, TM |
| Tsku | Tsukushi/Lrrc54 | 354 | SS, LRRNT, 10LRR, LRRCT2 |
| Con | Connectin | 691 | SS, LRRNT, 11LRR, LRRCT1, GPI |
| Gp150 | Gp150 | 1051 | SS, LRRNT, 15LRR, LRRCT2, TM |
| hfw | Halfway | 611 | SS, LRRNT, 4LRR, LRRNT, 2LRR, LRRCT1 |
| wdp | windpipe | 677 | SS, LRRNT, 4LRR, LRRCT1, TM |
| CG1504 | | 392 | 11LRR, LRRCT1, TM |
| CG4781 | | 469 | SS, LRRNT, 11LRR, LRRCT1, TM |
| CG5096 | | 491 | SS, LRRNT, 12LRR, LRRCT, TM |
| CG5541 | | 463 | SS, LRRNT, 6LRR, TM |
| CG5819 | | 915 | SS, LRRNT, 17LRR, LRRCT1, TM |
| CG5888 | | 455 | SS, LRRNT, 8LRR |
| CG7702 | | 537 | SS, LRRNT, 11LRR, LRRCT1, TM |
| CG8852 | | 663 | SS, 10LRR, LRRCT, TM |
| CG10148 | | 329 | SS, 9LRR |
| CG11136 | | 799 | SS, LRRNT, 13LRR, LRRCT1, TM |
| CG14351 | | 1316 | SS, LRRNT, 12LRR, LRRCT1, TM |
| CG14662 | | 550 | SS, 6LRR, TM |
| CG14762 | | 470 | SS, LRRNT, 14LRR, LRRCT1 |
| CG15658 | | 343 | SS, LRRNT, 7LRR, LRRCT1, TM |
| CG17667 | | 458 | SS, LRRNT, 7LRR, TM |
| CG18095 | | 548 | SS, 18LRR, TM |
| CG18480 | | 550 | SS, LRRNT, 7LRR, LRRCT, TM |
| CG32372 | | 817 | SS, 23LRR |
| C02C6.3 | | 369 | SS, LRRNT, 8LRR, LRRCT1, GPI |
| C41C4.3 | | 630 | SS, 8LRR |
| F10F2.4 | | 656 | SS, LRRNT, 18LRR, LRRCT1, TM |
| F37E3.2 | | 568 | SS, LRRNT, 11LRR, TM |
| K03A1.2 | | 586 | SS, LRRNT, 9LRR, LRRCT1, TM |
| T22E7.1a | | 341 | SS, 8LRR, LRRCT1, TM |
| T23G11.6 | | 653 | SS, LRRNT, 15LRR, LRRCT, TM |
| Y39A1A.7 | | 187 | SS, LRRNT, 4LRR |
| Y71F9B.8 | | 542 | SS, LRRNT, 14LRR, LRRCT1, TM |
| Y75B8A.5 | | 448 | SS, LRRNT, 6LRR, LRRCT1 |
| Y76A2B.2 | | 782 | SS, LRRNT, 6LRR, GPI |
List of singleton proteins in LRR_Only group not shown in Figure 3. For the mammalian proteins, only the mouse orthologue is listed, with the following exceptions: both human and mouse Nyctalopin (Nyx) are listed as they have different topologies (GPI-linked and TM, respectively) and synleurin is a human gene that has been pseudogenised in mouse.
Table 2. Complement of eLRR proteins by group, localisation and species
| | Type I TM | GPI | Secreted | Multi-TM | Total |
| Worm | 3 | 0 | 1 | 0 | 4 |
| Fly | 8 | 0 | 0 | 0 | 8 |
| Mouse | 35 | 1 | 1 | 0 | 37 |
| Human | 35 | 1 | 2 | 0 | 38 |
| Total | 81 | 2 | 4 | 0 | 87 |
| Worm | 3 | 1 | 0 | 0 | 4 |
| Fly | 12 | 1 | 3 | 0 | 16 |
| Mouse | 17 | 0 | 2 | 0 | 19 |
| Human | 17 | 0 | 2 | 0 | 19 |
| Total | 49 | 2 | 7 | 0 | 58 |
| Worm | 0 | 0 | 3 | 1 | 4 |
| Fly | 0 | 0 | 2 | 5 | 7 |
| Mouse | 1 | 0 | 9 | 16 | 26 |
| Human | 1 | 0 | 9 | 16 | 26 |
| Total | 2 | 0 | 23 | 38 | 63 |
| Worm | 11 | 2 | 4 | 0 | 17 |
| Fly | 23 | 1 | 10 | 0 | 35* |
| Mouse | 28 | 5 | 19 | 0 | 52 |
| Human | 32 | 6 | 19 | 0 | 57 |
| Total | 94 | 14 | 52 | 0 | 161* |
The number of eLRR proteins in each of the four major groups is listed for each species, broken down by predicted protein localisation or topology: type I transmembrane, GPI-linked, secreted and multiple-membrane-spanning. *Includes CG1504, unclassified localisation.
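The row and column totals in the table can be checked with simple sums. The sketch below does this for the first block of four species rows, copied from the table; the variable names are chosen here for illustration.

```python
# Localisation categories, in the table's column order.
COLUMNS = ("Type I TM", "GPI", "Secreted", "Multi-TM")

# Counts for the first group in the table (values copied from its rows).
rows = {
    "Worm": (3, 0, 1, 0),
    "Fly": (8, 0, 0, 0),
    "Mouse": (35, 1, 1, 0),
    "Human": (35, 1, 2, 0),
}

# Per-species totals (the table's last column).
row_totals = {species: sum(counts) for species, counts in rows.items()}

# Per-localisation totals (the table's "Total" row).
column_totals = tuple(sum(column) for column in zip(*rows.values()))

grand_total = sum(row_totals.values())
print(row_totals, dict(zip(COLUMNS, column_totals)), grand_total)
```

The same check applies to the other three blocks. In the last block, the starred fly and overall totals are one higher than the sum of the four localisation columns because CG1504 is counted without an assigned localisation.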
Figure 5. Group-specific patterns of expansion and diversification. The graphs depict three-dimensional histograms showing the number of clusters (on the z axis) having x members in the fly and y members in the mouse. The clusters used for this analysis are listed [see Additional File 6]. Different patterns of expansion (new members in one species of a conserved subfamily) and diversification (novel subfamilies in one species) are observed across the four major groups of eLRR proteins. Graphs were generated with the SPSS program.
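The counts behind such a histogram can be sketched as a tally over clusters, keyed by the pair (fly members, mouse members). The published graphs were produced with SPSS; the Python below only illustrates the counting step, and the `clusters` data are hypothetical rather than taken from Additional File 6.

```python
from collections import Counter

# Hypothetical clusters: each label maps to the species of its member proteins.
clusters = {
    "A": ["fly", "mouse", "mouse", "mouse"],
    "B": ["fly", "mouse", "mouse", "mouse"],
    "C": ["mouse", "mouse"],
    "D": ["fly", "worm"],
}


def fly_mouse_histogram(clusters):
    """Count clusters by (number of fly members, number of mouse members)."""
    histogram = Counter()
    for members in clusters.values():
        species_counts = Counter(members)
        histogram[(species_counts["fly"], species_counts["mouse"])] += 1
    return histogram


histogram = fly_mouse_histogram(clusters)
print(histogram)
```

In this toy data, the two clusters at (1, 3) would represent expansion in the mouse of a conserved subfamily, while the cluster at (0, 2) would represent a mouse-specific subfamily.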
Figure 6. Alignment of Elfn proteins. Predicted amino acid sequences from Elfn1 (A930017N06Rik) and Elfn2 (Lrrc62) from the mouse were aligned with CLUSTALW. Amino acids are colour-coded by chemical properties: blue: acidic; green: hydroxyl/amine/basic/Q; magenta: basic; red: small, hydrophobic (including aliphatic Y). Brackets indicate the extent of predicted motifs, including signal sequence (SS), six LRRs (the notch under the bracket indicates the end of the conserved N-terminal portion of each LRR), LRR-CT domain, fibronectin type-3 (FN3) domain and a transmembrane domain (TM). No recognizable LRR-NT domain was predicted. Note that the final LRR comprises the highly conserved N-terminal half-repeat only (consensus: LxxLxxLxLxxN). Identical residues are indicated by an asterisk, highly conservative substitutions by two dots and conservative substitutions by a single dot.
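As an illustration of the half-repeat consensus quoted in this legend, the sketch below scans a sequence for exact matches to LxxLxxLxLxxN, where x is any residue. It is not the LRRscan algorithm; the strict pattern is a simplifying assumption (real LRRs often carry other hydrophobic residues at the L positions), and the toy sequence is invented.

```python
import re

# Strict reading of the consensus LxxLxxLxLxxN ('x' = any residue).
# The lookahead lets overlapping matches be reported as well.
CONSENSUS = re.compile(r"(?=(L..L..L.L..N))")


def find_half_repeats(sequence):
    """Return 0-based start positions of exact consensus matches."""
    return [match.start() for match in CONSENSUS.finditer(sequence)]


# Invented toy sequence with one consensus match embedded in it.
toy_sequence = "MKT" + "LSPLAELDLSGN" + "QIR"
positions = find_half_repeats(toy_sequence)
print(positions)
```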
Figure 7. Alignment of proteins in Elron cluster. Predicted amino acid sequences from Lrtm1, Lrtm2, Lrrc38, Lrrc55, Lrrc52 and BC004853 from the mouse were aligned with CLUSTALW. Brackets indicate the extent of predicted motifs (consensus limits are shown); the notch under the bracket indicates the end of the conserved N-terminal portion of each LRR. Arrowheads denote exon-intron boundaries. The short cytoplasmic domain is poorly conserved, but does contain similarly positioned acidic residues (E/D) in all members. Lrtm1 and Lrtm2 end in consensus PDZ-binding domains (SSSA/SSVA), underlined. Abbreviations, amino acid colour-code and conservation symbols as in Figure 6.
Figure 8. Expression of Elfn1 and Elfn2. Expression as defined by RNA in situ hybridisation is shown for Elfn1 (A-C) and Elfn2 (D-F) in coronal sections of mouse brain at three ages (embryonic day 15 (E15), A, D; postnatal day zero (P0), B, E; and postnatal day 9 (P9), C, F). Elfn1 is strongly expressed in globus pallidus and interneurons in cortex and hippocampus, while Elfn2 is expressed in striatum and in projection neurons in cortex and hippocampus. Arrowheads in A and B indicate presumed interneurons migrating towards cortex. Abbreviations: cp, cortical plate; Cx, cortex; DG(vz), ventricular zone of dentate gyrus; gcl, granule cell layer (of dentate gyrus); GP, globus pallidus; hab, habenula; hc, hippocampus; hi, hilus (of dentate gyrus); hy, hypothalamus; pcl, pyramidal cell layer (of hippocampus); SB, subiculum; sp, subplate; so, stratum oriens (of hippocampus); str, striatum. Scale bar: E15, 200 microns; P0 and P9, 500 microns.
Figure 9. Expression of Elron cluster genes in developing mouse brain. Expression as defined by RNA in situ hybridisation is shown for Lrtm1 (A, B), Lrtm2 (C, D) and Lrrc55 (E, F) in coronal sections of mouse brain at two ages (E15, A, C, E and P0, B, D, F). Differential staining in subsets of thalamic nuclei and across cortex is observed. Abbreviations: Am, amygdala; dLGN, dorsal lateral geniculate nucleus; dTh, dorsal thalamus; hab, habenula; hc, hippocampus; RS, retrosplenial cortex; sp, subplate; str, striatum; vLGN, ventral lateral geniculate nucleus; ZI, zona incerta. Scale bar: E15, 200 microns; P0, 500 microns.
Figure 10. Expression of novel eLRR genes in the Drosophila embryo. (A) A lateral view of a stage 12 embryo showing expression of CG7702 in the midgut and the peripheral nervous system (PNS); PNS expression is indicated by a black arrow. (B) CG40500 expression in a stage 16 embryo; expression can be seen at the midline (indicated by a black arrow). (C and D) Lateral and ventral views, respectively, of a stage 15 embryo showing CG11910 expression in the central nervous system (CNS). (E) A stage 16 embryo with CG5888 expression in the CNS and midgut chamber; the midgut chamber is indicated by a black arrow. (F) A dissected ventral nerve cord fillet with CG5888 expression (shown at 400× magnification). (G) A stage 11 embryo showing CG11136 expression at the midline, indicated by a white arrow, and (H) a stage 15 embryo showing expression of CG11136 in the somatic musculature. All whole embryos are shown at 200× magnification. In all views anterior is to the left; in all lateral views dorsal is at the top; B, D and E show ventral views and G shows a dorsal view.