Sheng Liu, Joanna Melonek, Laura M Boykin, Ian Small, Katharine A Howell.
Abstract
A small subset of the large pentatricopeptide repeat (PPR) protein family in higher plants contains a C-terminal small MutS-related (SMR) domain. Although few in number, these proteins figure prominently in the chloroplast biogenesis and retrograde signaling literature due to their striking mutant phenotypes. In this review, we summarize current knowledge of PPR-SMR proteins, focusing on Arabidopsis and maize proteomic and mutant studies. We also examine their occurrence in other organisms and have determined by phylogenetic analysis that, while they are limited to species that contain chloroplasts, their presence in algae and early branching land plant lineages indicates that the coupling of PPR motifs and an SMR domain into a single protein occurred early in the evolution of the Viridiplantae clade. In addition, we discuss their possible function and have examined conservation between SMR domains from Arabidopsis PPR proteins and those from other species that have been shown to possess endonucleolytic activity.
Keywords: Arabidopsis thaliana; Zea mays; chloroplast; endonuclease; genomes uncoupled; mitochondria; pentatricopeptide repeat protein; plastid; small MutS-related domain
Year: 2013 PMID: 24004908 PMCID: PMC3858433 DOI: 10.4161/rna.26172
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652

Figure 1. Proteins containing an SMR domain in the model plant Arabidopsis thaliana. A non-redundant set of 12 proteins was identified by searching the Universal Protein knowledgebase (UniProt; www.uniprot.org) for Arabidopsis proteins that contain the InterPro domain IPR002625 (Smr protein/MutS2 C-terminal domain). Proteins are denoted by their corresponding Arabidopsis Genome Identifier (AGI; ATXGXXXXX) and, if applicable, followed by their common name (e.g., GUN1). Protein domain structure is shown alongside each AGI to demonstrate presence and location of the pentatricopeptide repeat (PPR), small MutS-related (SMR), domain of unknown function (DUF) 1771, and MutS domains. The schematics of protein domain structure were created by combining TPRpred to predict PPR domains (those with P > 0.01 were excluded), InterProScan to identify other domains and DOG 1.0 for visualization of their respective positions.
Table 1. Summary of current knowledge of PPR-SMR proteins in the dicot and monocot plant models, Arabidopsis and maize
| Arabidopsis protein | Arabidopsis mutant phenotype | Maize protein | Maize mutant phenotype |
|---|---|---|---|
| AT2G31400 (GUN1) | Normal gross phenotype in dark or light growth conditions but shows defective de-etiolation response | GRMZM2G432850 | ? |
| AT1G74850 | Seedling lethal, requires exogenous carbon source for further growth, cannot produce seeds, PEP promoter usage affected | GRMZM2G122116 | Very pale yellow green |
| AT4G16390 | Slower growth with reduced chlorophyll concentration, | GRMZM2G128665 | Pale green, reduced translation of ATP synthase subunits |
| AT5G46580 | ? | GRMZM2G438524 | Very pale yellow green-virescent (PML: |
| AT2G17033 | ? | GRMZM2G164202 | WT-like (PML: |
| AT1G79490 | Embryo defective, developmental arrest occurs at globular stage | GRMZM2G345667 | ? |
| AT1G74750 | ? | GRMZM2G475897 | ? |
| AT1G18900 | ? | | |
Localization data was derived from the SUBA3 database for Arabidopsis proteins and by manual curation of proteomic data sets for maize proteins. Subcellular localizations in italics are based on predictions while those in bold are based on experimental evidence (GFP/YFP, GFP/YFP fusion studies; MS, identified by mass spectrometry of protein samples). In some cases proteins were identified from a sample corresponding to a specific suborganellar location as specified (e.g., stroma, envelope, nucleoids, TAC). Mutant phenotype descriptions are based on manual curation of the literature and, where available, seedling phenotype descriptions from the maize photosynthetic mutant library (PML; http://pml.uoregon.edu/photosyntheticml.html). For PML descriptions, note that these mutants have not yet been analyzed in detail and the effect of the mutation on the expression of the gene still needs to be determined before definitive phenotypes are assigned.
Table 2. The relative abundance of PPR-SMR proteins in different Arabidopsis and maize protein samples based on normalized adjusted spectral counts as an estimate of protein abundance
| Reference | Protein fraction description | No. proteins identified | No. PPR proteins identified | No. PPR-SMR | % of total | % of total PPR protein mass attributed |
|---|---|---|---|---|---|---|
| 24 | Total leaf protein | 3424 | 17 | 0.05 | ||
| 17 | Total leaf protein | 815 | 9 | 0.02 | ||
| 16 | Stromal fraction: low molecular weight ( | 398 | 0 | 0 | ||
| 16 | Stromal fraction: high molecular weight A ( | 293 | 9 | 0.46 | ||
| 16 | Stromal fraction: high molecular weight B ( | 230 | 6 | 0.47 | ||
| 17 | Nucleoids | 1026 | 26 | 1.04 | ||
| 17 | Proplastids | 2242 | 32 | 0.67 | ||
| 18 | Proplastids | 1717 | 17 | 0.41 | ||
| 23 | Chloroplasts | 1428 | 5 | 0.002 | ||
| 18 | Nucleoids - average from | 1092 | 63 | 4.65 | ||
| 18 | Nucleoids, leaf base | 678 | 46 | 4.89 | ||
| 18 | Nucleoids, leaf tip | 710 | 35 | 2.68 | ||
| 18 | Nucleoids, young leaves | 827 | 55 | 6.38 | ||
For quantitation of protein mass, each protein accession is scored for total MS/MS spectral counts (SPC), unique SPC (uniquely matching to an accession), and adjusted SPC (adjSPC). AdjSPC is the sum of unique SPCs and SPCs from shared peptides across accessions with SPC distributed in proportion to their unique SPC. The normalized adjSPC (NadjSPC) for each protein is calculated through division of adjSPC by the sum of all adjSPC values for the proteins from the sample (e.g., per gel lane or protein extract). Thus, NadjSPC provides a relative protein abundance measure by mass. For example, a protein with NadjSPC = 0.01 contributes approximately 1% of the protein mass of the analyzed sample. NadjSPC values were obtained from the publications indicated and used to calculate the relative abundance of PPR and PPR-SMR proteins.
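The adjSPC and NadjSPC calculation described above can be illustrated with a minimal Python sketch. The function names, the input layout (a dictionary of unique SPC per accession plus a list of shared-peptide groups), and the toy counts are all assumptions made for illustration; they are not taken from the cited publications or from any particular proteomics package.

```python
def adjusted_spc(unique_spc, shared_groups):
    """Return adjSPC per accession.

    unique_spc: dict mapping accession -> unique spectral count.
    shared_groups: list of (spc, accessions) for peptides shared by
        several accessions (hypothetical input layout).
    """
    adj = {acc: float(n) for acc, n in unique_spc.items()}
    for spc, members in shared_groups:
        total_unique = sum(unique_spc.get(m, 0) for m in members)
        if total_unique == 0:
            # Assumption: with no unique evidence the shared counts
            # cannot be apportioned, so they are left unassigned.
            continue
        for m in members:
            # Distribute shared counts in proportion to unique SPC.
            share = spc * unique_spc.get(m, 0) / total_unique
            adj[m] = adj.get(m, 0.0) + share
    return adj


def normalized_adj_spc(adj):
    """Divide each adjSPC by the sum of adjSPC over the whole sample."""
    total = sum(adj.values())
    if total == 0:
        raise ValueError("sample has no spectral counts")
    return {acc: value / total for acc, value in adj.items()}


# Toy sample (made-up numbers): P1 and P2 share peptides worth 8 SPC.
unique = {"P1": 30, "P2": 10, "P3": 60}
shared = [(8, ["P1", "P2"])]
adj = adjusted_spc(unique, shared)   # P1: 36.0, P2: 12.0, P3: 60.0
nadj = normalized_adj_spc(adj)       # values sum to 1 across the sample
```

In this toy sample the 8 shared counts are split 6:2 between P1 and P2 because their unique counts are 30 and 10, and P1 ends up with NadjSPC = 36/108, or about one third of the estimated protein mass.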

Figure 2. Bayesian phylogenetic tree of PPR-SMR protein sequences from a range of different species. Sequences of PPR-SMR proteins were obtained from BLAST searches and InterPro domain searches (IPR002625 and IPR002885) and aligned using MUSCLE. A phylogenetic tree was constructed using MrBayes version 3.2.1 which employs Markov Chain Monte Carlo (MCMC) sampling to approximate the posterior probabilities of phylogenies (shown above the branches). MrBayes 3.2.1 was run in parallel on the Fornax supercomputer (located at iVEC@UWA) utilizing the BEAGLE library with a mixed model of molecular evolution (determined using jModelTest), utilizing 12 chains for 50 million generations and trees sampled every 1000 generations. All runs reached a plateau in likelihood score, which was indicated by the standard deviation of split frequencies (0.0015), and the potential scale reduction factor was close to one, indicating the MCMC chains converged. Sequences are color shaded based on their lineage as indicated.
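As a rough illustration of the convergence diagnostic mentioned in the caption, the sketch below computes a potential scale reduction factor for one scalar parameter sampled by several chains, using the standard Gelman-Rubin form. It is not MrBayes code and makes no claim about how MrBayes computes its reported value; the function name and the toy chains are hypothetical.

```python
from statistics import mean, variance


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of equal-length lists of sampled values, one per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    # W: mean of the within-chain sample variances.
    w = mean([variance(c) for c in chains])
    # B/n: sample variance of the chain means.
    b_over_n = variance([mean(c) for c in chains])
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b_over_n
    return (var_hat / w) ** 0.5


# Toy chains (made-up values): the first pair overlaps, the second does not.
overlapping = psrf([[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]])
separated = psrf([[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]])
```

Chains that sample the same distribution give a value near one, whereas chains stuck in different regions give a value well above one, which is why a factor close to one is read as evidence of convergence.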

Figure 3. SMR domain alignment to assess amino acid sequence conservation. The SMR domains of the eight Arabidopsis PPR-SMR proteins were aligned with SMR domains from proteins that have been experimentally demonstrated to have endonucleolytic activity. The sequences are denoted by the SMR subfamily type (1_, 2_, or 3_) followed by the AGI (for Arabidopsis proteins) or alternative identifier (Tt_MutS2 – Thermus thermophilus MutS2 protein; Hs_B3BP – Homo sapiens BCL3 binding protein; Ld_CSBP – Leishmania donovani cycling sequence binding protein; Ec_YdaL – Escherichia coli YdaL protein), and the length of the SMR domain (e.g., /1–93). Alignment was performed using MUSCLE and visualized using Jalview (www.jalview.org) with ClustalX coloring by conservation. The positions of previously described conserved regions are indicated on the alignment: the LDXH motif present in subfamily 2 SMR domains and the centrally located HGXG/TGXG (subfamilies 1 and 3/subfamily 2) are bounded by the red and blue boxes, respectively.
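The two motifs named in the caption can be located in a sequence with simple pattern matching, sketched below in Python. The regular expressions treat X as any single residue, which is an assumption about how the motif notation should be read; the function name is hypothetical and the example sequence is an invented toy string, not one of the sequences in the alignment.

```python
import re

# X is read here as "any one amino acid" (assumption).
MOTIFS = {
    "LDXH": re.compile(r"LD[A-Z]H"),
    "HGXG/TGXG": re.compile(r"[HT]G[A-Z]G"),
}


def find_motifs(sequence):
    """Return {motif name: [(1-based position, matched residues), ...]}.

    Alignment gap characters ('-') are removed before searching, so
    positions refer to the ungapped sequence.
    """
    ungapped = sequence.replace("-", "").upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(ungapped)]
        for name, pattern in MOTIFS.items()
    }


# Invented toy sequence, for illustration only.
hits = find_motifs("MKLDLHGAAVTGKGSR")
```

For the toy string, `hits` reports LDLH starting at residue 3 and TGKG starting at residue 11. Note that `finditer` returns non-overlapping matches, which is adequate for short, well-separated motifs like these.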