Wojciech M Karlowski, Andrzej Zielezinski, Julie Carrère, Dominique Pontier, Thierry Lagrange, Richard Cooke.
Abstract
Domains in Arabidopsis proteins NRPE1 and SPT5-like, composed almost exclusively of repeated motifs in which only WG or GW sequences and an overall amino-acid preference are conserved, have been experimentally shown to bind multiple molecules of Argonaute (AGO) protein(s). Domain swapping between the WG/GW domains of NRPE1 and the human protein GW182 showed a conserved function. As classical sequence alignment methods are poorly adapted to detect such weakly conserved motifs, we have developed a tool to carry out a systematic analysis to identify genes potentially encoding AGO-binding GW/WG proteins. Here, we describe an exhaustive analysis of the Arabidopsis genome for all regions potentially encoding proteins bearing WG/GW motifs and consider the possible role of some of them in AGO-dependent mechanisms. We identified 20 different candidate WG/GW genes, encoding proteins in which the predicted domains range from 92 aa to 654 aa. These mostly correspond to a limited number of families: RNA-binding proteins, transcription factors, glycine-rich proteins, translation initiation factors and known silencing-associated proteins such as SDE3. Recent studies have argued that the interaction between WG/GW-rich domains and AGO proteins is evolutionarily conserved. Here, we demonstrate by an in silico domain-swapping simulation between plant and mammalian WG/GW proteins that the biased amino-acid composition of the AGO-binding sites is conserved.
Year: 2010 PMID: 20338883 PMCID: PMC2910046 DOI: 10.1093/nar/gkq162
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Scoring matrix used to define domain boundaries and calculate the dos score, representing the likelihood of a given amino acid being found in a GW domain
| Amino acid | Score [half-bits] | Score [bits] | Ratio | Frequency | Count |
|---|---|---|---|---|---|
| W | 2.666 | 1.333 | 2.520 | 0.063:0.025 | 743:1062 |
| G | 2.068 | 1.034 | 2.048 | 0.213:0.104 | 2490:4447 |
| N | 1.510 | 0.755 | 1.688 | 0.081:0.048 | 949:2051 |
| S | 1.236 | 0.618 | 1.535 | 0.152:0.099 | 1774:4213 |
| A | 0.280 | 0.140 | 1.102 | 0.065:0.059 | 762:2537 |
| D | 0.184 | 0.092 | 1.066 | 0.081:0.076 | 950:3254 |
| T | 0.000 | 0.000 | 1.000 | 0.040:0.040 | 467:1718 |
| Q | −0.076 | −0.038 | 0.974 | 0.038:0.039 | 440:1686 |
| K | −0.120 | −0.060 | 0.959 | 0.070:0.073 | 821:3136 |
| R | −0.590 | −0.295 | 0.815 | 0.044:0.054 | 518:2319 |
| P | −0.644 | −0.322 | 0.800 | 0.032:0.040 | 373:1726 |
| E | −1.288 | −0.644 | 0.640 | 0.048:0.075 | 560:3219 |
| V | −2.408 | −1.204 | 0.434 | 0.023:0.053 | 274:2260 |
| F | −2.558 | −1.279 | 0.412 | 0.014:0.034 | 169:1443 |
| H | −3.324 | −1.662 | 0.316 | 0.006:0.019 | 76:796 |
| C | −3.398 | −1.699 | 0.308 | 0.004:0.013 | 43:568 |
| Y | −4.792 | −2.396 | 0.190 | 0.004:0.021 | 50:890 |
| M | −5.012 | −2.506 | 0.176 | 0.003:0.017 | 39:743 |
| I | −5.030 | −2.515 | 0.175 | 0.007:0.040 | 79:1705 |
| L | −5.252 | −2.626 | 0.162 | 0.011:0.068 | 132:2925 |
The amino acids are sorted by score, from highest to lowest. The second and third columns were used in domain identification calculations. The last two columns contain the frequencies and counts of a given amino acid in the domain versus the whole protein sequence (format: domain:entire protein).
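The relationship between the columns of the scoring matrix can be checked with a short sketch. It assumes, based only on the tabulated numbers, that the score in bits is the base-2 logarithm of the domain:entire-protein frequency ratio and that the half-bit score is twice that value. The function `window_score` is a hypothetical helper that simply sums per-residue scores over a sequence; it is an illustration, not the authors' dos calculation, which is not specified in this record.

```python
import math

# (domain frequency, entire-protein frequency), copied from the table above.
FREQ = {
    "W": (0.063, 0.025), "G": (0.213, 0.104), "N": (0.081, 0.048),
    "S": (0.152, 0.099), "A": (0.065, 0.059), "D": (0.081, 0.076),
    "T": (0.040, 0.040), "Q": (0.038, 0.039), "K": (0.070, 0.073),
    "R": (0.044, 0.054), "P": (0.032, 0.040), "E": (0.048, 0.075),
    "V": (0.023, 0.053), "F": (0.014, 0.034), "H": (0.006, 0.019),
    "C": (0.004, 0.013), "Y": (0.004, 0.021), "M": (0.003, 0.017),
    "I": (0.007, 0.040), "L": (0.011, 0.068),
}


def log_odds_bits(f_domain: float, f_protein: float) -> float:
    """Log2 of the ratio of domain frequency to whole-protein frequency."""
    return math.log2(f_domain / f_protein)


def log_odds_half_bits(f_domain: float, f_protein: float) -> float:
    """Same log-odds value expressed in half-bit units (2 * bits)."""
    return 2.0 * log_odds_bits(f_domain, f_protein)


# Per-residue half-bit scores recomputed from the rounded frequencies.
HALF_BIT_SCORES = {aa: log_odds_half_bits(fd, fp) for aa, (fd, fp) in FREQ.items()}


def window_score(seq: str) -> float:
    """Hypothetical helper: sum of half-bit scores over a sequence window.

    Raises KeyError for characters that are not one of the 20 amino acids.
    """
    return sum(HALF_BIT_SCORES[aa] for aa in seq.upper())


if __name__ == "__main__":
    for aa in "WGTL":
        fd, fp = FREQ[aa]
        print(aa, round(fd / fp, 3), round(log_odds_bits(fd, fp), 3))
    print(round(window_score("GWGSGW"), 2))
```

Because the table reports frequencies rounded to three decimals, recomputed scores agree with the tabulated ones only to within a few thousandths. Residues enriched in the domain (W, G, N, S) raise the summed score and hydrophobic residues (L, I, M) lower it sharply.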
Figure 1. Schematic representation of the WG/GW protein identification pipeline. Grey-filled boxes represent the four major steps in the identification procedure.
Figure 2. Distribution of ics and dos scores of all identified proteins in Arabidopsis. Each point represents a WG/GW-containing protein. Grey dashed lines indicate dos and ics score threshold values revealing WG/GW protein candidates marked in red.
GW motif proteins identified in the Arabidopsis genome after applying threshold filters on dos and ics scores
| AGI locus code | | | | TAIR annotation (partial) |
|---|---|---|---|---|
| AT1G04800.1 | 78.26 | 3.55 | 1.96 | Glycine-rich protein; FUNCTIONS IN: molecular_function unknown; INVOLVED IN: N-terminal protein myristoylation; LOCATED IN: endomembrane system; EXPRESSED IN: 17 plant structures; |
| AT1G05460.1 | 74.53 | 4.47 | 1.14 | SDE3 (SILENCING DEFECTIVE): a protein with similarity to RNA helicases; mutants are defective in post-transcriptional gene silencing. |
| AT1G10270.1 | 108.26 | 7.30 | 1.25 | GRP23 (GLUTAMINE-RICH PROTEIN 23): InterPro IPR011990, tetratricopeptide-like helical domain; InterPro IPR002885, pentatricopeptide repeat; InterPro IPR013026, tetratricopeptide region. |
| AT1G13020.1 | 63.96 | 9.07 | 1.27 | EIF4B2—eukaryotic initiation factor 4B2; Plant specific eukaryotic initiation factor 4B:IPR010433 |
| AT1G15840.1 | 88.09 | 2.01 | 2.06 | Unknown protein; FUNCTIONS IN: molecular_function unknown; INVOLVED IN: biological_process unknown; LOCATED IN: cellular_component unknown; EXPRESSED IN: 11 plant structures |
| AT1G65440.1 | 215.95 | 2.03 | 0.96 | GTB1 (GLOBAL TRANSCRIPTION FACTOR GROUP B1): related to yeast Spt6 protein, which functions as part of a protein complex in transcription initiation and also plays a role in chromatin structure/assembly. |
| AT1G65440.2 | 69.07 | 6.37 | 2.01 | Same as above |
| AT2G16470.1 | 59.91 | 1.22 | 0.89 | DNA binding/nucleic-acid binding/protein binding/zinc ion binding; Zinc finger (CCCH-type) family protein/GYF domain-containing protein: InterPro:IPR000571—CCCH-type zinc-finger domain; InterPro IPR003169—GYF domain. |
| AT2G33410.1 | 27.71 | 2.79 | 1.9 | Heterogeneous nuclear ribonucleoprotein/hnRNP: contains InterPro domain RNA recognition motif, RNP-1; (InterPro:IPR000504); contains InterPro domain Nucleotide-binding, alpha-beta plait; (InterPro:IPR012677) |
| AT2G15780.1 | 107.99 | 7.39 | 2.07 | Glycine-rich protein; FUNCTIONS IN: electron carrier activity, copper ion binding; LOCATED IN: endomembrane system; CONTAINS InterPro DOMAIN/s: Plastocyanin-like (InterPro:IPR003245), Cupredoxin (InterPro:IPR008972). |
| AT2G40030.1 | 170.3 | 7.15 | 0.54 | NRPE1—the largest subunit of nuclear DNA-dependent RNA polymerase V; Required for normal RNA-directed DNA methylation at non-CG methylation sites and transgene silencing. |
| AT3G26400.1 | 49.64 | 2.79 | 1.53 | EIF4B—eukaryotic initiation factor 4B; Plant specific eukaryotic initiation factor 4B:InterPro:IPR010433 |
| AT3G51940.1 | 10.83 | 4.28 | 1.95 | Oxidoreductase/transition metal ion binding: InterPro domain Ferritin/ribonucleotide reductase-like; (InterPro:IPR009078) |
| AT4G16830.1 | 38.91 | 7.69 | 1.87 | Nuclear RNA-binding protein (RGGA): InterPro domain Hyaluronan/mRNA binding protein (InterPro:IPR006861) |
| AT4G16830.3 | 38.95 | 7.66 | 1.9 | Same as above |
| AT4G33930.1 | 130.58 | 2.83 | 1.78 | Glycine-rich protein; LOCATED IN: endomembrane system; CONTAINS InterPro DOMAIN/s: Cupredoxin (InterPro:IPR008972) |
| AT4G36230.1 | 171.65 | 6.86 | 1.62 | Unknown protein; hypothetical protein |
| AT4G38710.1 | 11.05 | 4.09 | 1.97 | Glycine-rich protein: InterPro domain Plant specific eukaryotic initiation factor 4B (InterPro:IPR010433) |
| AT5G03990.1 | 35.08 | 1.16 | 1.09 | Similar to oxidoreductase/transition metal ion binding |
| AT5G04290.1 | 585.79 | 8.37 | 0.4 | KTF1—KOW DOMAIN-CONTAINING TRANSCRIPTION FACTOR 1; SPT5-Like, a member of the nuclear SPT5 (Suppressor of Ty insertion 5) RNA polymerase (RNAP) elongation factor family that is characterized by the presence of a carboxy-terminal extension with more than 40 WG/GW motifs. Interacts with AGO4. Required for RNA-directed DNA methylation. |
| AT5G07540.1 | 122.96 | 3.85 | 1.82 | GLYCINE-RICH PROTEIN 16 (GRP16); Oleosin (InterPro:IPR000136); FUNCTIONS IN: lipid binding, nutrient reservoir activity; INVOLVED IN: sexual reproduction, lipid storage; |
| AT5G61660.1 | 64.68 | 8.62 | 1.84 | Glycine-rich protein; FUNCTIONS IN: molecular_function unknown; INVOLVED IN: biological_process unknown; LOCATED IN: endomembrane system; |
Genes are sorted by AGI identifiers (localization on the genome).
Figure 3. Domain architectures of selected WG/GW proteins. Gene structures of two small gene families are shown. Exons are represented as boxes and introns by lines. The WG/GW motif-containing region is colored in brown. (A) Three members of the SPT5-like transcription elongation factor family, showing the extensive platform in the At5g04290 gene product. (B) Variable motif domain length illustrated by alternative splicing in the SPT6 global transcription elongation factor family.
Figure 4. WGRP1 protein has Argonaute-binding capacity. (A) Primary sequence of the Arabidopsis WGRP1 protein. The evolutionarily conserved N-terminal sequence is in bold and the location of the intron relative to the open reading frame is indicated by a vertical arrowhead. The WG/GW motifs in the WGRP1 CTD are in red and the WGRP1 sequence fused to GST is underlined. (B) Coomassie staining of the purified GST and GST-WGRP1 recombinant proteins used in the Argonaute-binding assay. (C) Preferential binding of AGO4 to the WG/GW-rich domain of WGRP1 protein. Myc-AGO4 or Flag-AGO1 extracts were applied to equimolar amounts of GST and GST-based fusion protein beads, and the bound protein (Pellet) and supernatant (Super) fractions were detected by immunoblotting with anti-Myc or anti-M2 antibodies. The GST protein was used as a control.
Figure 5. Domain-swapping experiment simulation. (A) Pairwise alignment of WG/GW-rich domains from NRPE1, the largest subunit of Arabidopsis Pol V, and human GW182. (B) Outline of the virtual domain-swapping experiment between plant and mammalian WG/GW proteins. Dos/ics score tables were calculated based on experimentally verified plant/mammalian WG/GW proteins and subsequently used to search for WG/GW domains in mammalian/plant proteomes. Detected putative WG/GW motif proteins were compared with experimentally verified AGO-binding proteins. Reciprocal best protein hits of such a bidirectional procedure share a conserved amino-acid composition of WG/GW-rich AGO-binding sites.
List of mammalian proteins identified with plant-specific scoring matrices and selected using thresholds calculated for plant proteins
| Description (partial) | Ago-binding activity | | | ics | Organism | NCBI GI |
|---|---|---|---|---|---|---|
| hypothetical protein | nt | 237.7 | 1.21 | 0.41 | Human | 239758013 |
| TNRC6A: trinucleotide repeat containing 6A | + | 92.04 | 1.63 | 0.54 | Human, Cattle | 119916998 |
| TNRC6C: trinucleotide repeat containing 6C | + | 89.98 | 1.82 | 0.67 | Human, Cattle | 194676322 |
| HRNR: hornerin—intermediate filament-associated protein | nt | 161.44 | 9.46 | 0.69 | Human | 57864582 |
| Hypothetical protein | nt | 265.69 | 6.64 | 0.70 | Human | 169173184 |
| TNRC6B: trinucleotide repeat containing 6B | + | 106.81 | 7.81 | 0.77 | Human, Cattle, Horse, Rhesus, Dog | 73969036 |
| DMKN—dermokine | nt | 182.7 | 4.94 | 0.78 | Rhesus | 109124494 |
| Microsomal dipeptidase | nt | 187.96 | 4.25 | 0.80 | Cattle | 194687044 |
| Similar to Flag | nt | 47.25 | 3.45 | 1.08 | Platypus | 149631903 |
| ADP-ribosylation factor GTPase activating protein 1 | nt | 31.75 | 1.69 | 1.09 | Cattle | 115497314 |
| Similar to Repetin | nt | 30.22 | 2.04 | 1.10 | Rat | 27692337 |
| Similar to splicing coactivator subunit SRm300 | nt | 60.94 | 1.13 | 1.13 | Human, Rat | 109497194 |
| Hypothetical protein | nt | 37.81 | 8.62 | 1.14 | Mouse | 149258285 |
| FLG-2: filaggrin-2; similar to ifapsoriasin | nt | 110.16 | 6.69 | 1.18 | Platypus | 149515391 |
| Fibrinogen alpha-chain | nt | 74.37 | 4.51 | 1.19 | Horse | 194208383 |
| Hypothetical protein | nt | 21.52 | 6.59 | 1.31 | Dog | 74001559 |
| Serine/arginine repetitive matrix 3 | nt | 62.74 | 9.90 | 1.32 | Human | 158854042 |
| Collagen, type VI, alpha 6 precursor | nt | 13.74 | 2.42 | 1.33 | Human | 156616290 |
| Similar to splicing factor, arginine/serine-rich 2 | nt | 7.33 | 9.24 | 1.37 | Rat | 109481239 |
| SCY1-like 1 isoform A; N terminal kinase like protein | nt | 5.53 | 1.43 | 1.38 | Human | 115430241 |
| Zinc finger protein 106 homolog; FOG: WD40 repeat | nt | 50.03 | 2.70 | 1.39 | Cattle | 194670681 |
| Procollagen, type VII, alpha 1 | nt | 28.36 | 2.57 | 1.46 | Rat | 157819015 |
| CDSN—Corneodesmosin | nt | 152 | 1.30 | 1.47 | Platypus | 156602049 |
| Similar to Nucleoporin like 2 | nt | 8.95 | 6.39 | 1.53 | Horse | 149705615 |
| Paired mesoderm homeobox protein 2B | nt | 32.54 | 1.54 | 1.62 | Rat | 109499673 |
| Hypothetical protein | nt | 26.28 | 3.37 | 1.63 | Human | 239758008 |
| Similar to ribosomal protein S2 | nt | 15.06 | 1.90 | 1.69 | Rhesus | 109073249 |
| Hypothetical protein | nt | 90.01 | 1.81 | 1.72 | Rat | 109510645 |
| Hypothetical protein | nt | 71.91 | 5.28 | 1.80 | Rat | 109511723 |
| Insulin receptor substrate 4 | nt | 127.03 | 3.26 | 1.83 | Dog | 74008591 |
| Similar to Mucin-19 | nt | 29.98 | 2.10 | 1.93 | Human | 239755776 |
| Hypothetical protein | nt | 31.99 | 1.65 | 1.93 | Platypus | 149610906 |
| Leukocyte receptor tyrosine kinase isoform 1 precursor | nt | 41.14 | 6.13 | 1.99 | Human | 42544153 |
| Keratin 24 | nt | 66.13 | 7.79 | 2.01 | Mouse | 122425580 |
| Myeloblastin precursor (Proteinase 3) (PR-3) | nt | 22.37 | 5.81 | 2.01 | Horse | 194238637 |
| Epsin 1 isoform b | nt | 33.32 | 1.41 | 2.11 | Human | 194248095 |
The items are sorted according to ics score. Only the values for the highest-scoring orthologous sequence are presented (marked by b).
a nt: AGO-binding activity not tested.
b Values presented in the table correspond to the orthologous gene from the marked organism.