Kaushal Kumar Bhati, Valdeko Kruusvee, Daniel Straub, Anil Kumar Nalini Chandran, Ki-Hong Jung, Stephan Wenkel.
Abstract
MicroProteins are a class of small single-domain proteins that post-translationally regulate larger multidomain proteins from which they evolved or to which they are related. They disrupt the normal function of their targets by forming microProtein-target heterodimers through compatible protein-protein interaction (PPI) domains. Recent studies confirm the significance of microProteins in the fine-tuning of plant developmental processes such as shoot apical meristem maintenance and flowering time regulation. While there are a number of well-characterized microProteins in Arabidopsis thaliana, studies from more complex plant genomes are still missing. We have previously developed miPFinder, a software tool for identifying microProteins from annotated genomes. Here we present an improved version in which we have updated the algorithm to increase its accuracy and speed, and used it to analyze five cereal crop genomes - wheat, rice, barley, maize and sorghum. We found 20,064 potential microProteins from a total of 258,029 proteins in these five organisms, of which approximately 2000 are high-confidence, i.e., likely to function as actual microProteins. Gene ontology analysis of these 2000 microProtein candidates revealed their roles in stress, light and growth responses, hormone signaling and transcriptional regulation. Using a recently developed rice gene co-expression database, we analyzed 347 potential rice microProteins that are also conserved in other cereal crops and found over 50 of these rice microProteins to be co-regulated with their identified interaction partners. Overall, our study reveals a rich source of biotechnologically interesting small proteins that regulate fundamental plant processes, such as growth and stress response, and that could be utilized in crop bioengineering.
Keywords: biotechnology; crops; miPFinder; microProteins; protein-protein interaction
Year: 2020 PMID: 32763954 PMCID: PMC7534434 DOI: 10.1534/g3.120.400794
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. A) Trans-miPs are generated through the duplication of the ancestral gene (top) and subsequent evolutionary trimming (middle), leaving only the domain responsible for dimerization (bottom, blue). Black lines are introns, while colored and gray boxes represent exons. B) Cis-miPs are derived from the same mRNA transcript (top) as their target proteins (middle) through alternative splicing, an alternative translation start site or alternative polyadenylation. Alternative splicing can generate microProteins (bottom) by including only the exon that contains the dimerization domain (blue). An alternative translation start site (circled) within the mRNA of the parent protein (top) can generate a truncated construct that encodes the dimerization domain (bottom, green). In some instances, an alternative polyadenylation signal (circled) can generate a shorter mRNA construct from which the protein containing the dimerization domain (bottom, yellow) is made. TSS – Translation start site. Poly (A) – polyadenylation site. C) The balance between the target homodimers and the target-microProtein (red) heterodimers can affect many molecular functions, such as DNA-binding, recruitment of a co-repressor (purple) and other accessory proteins (teal), or even nuclear localization of the target protein.
Characteristics of potential microProteins and their targets used in the miPFinder v2.0 algorithm
| Potential microProtein characteristics | Potential target characteristics |
|---|---|
| Homologous to their interaction partner | Homologous to the potential microProtein |
| Contains a single domain | Contains two or more domains |
| High predicted instability index | |
List of filters used by the miPFinder v2.0 algorithm
| MicroProtein filter | Target filter |
|---|---|
| Length <=150 amino acids | Length > 150 amino acids |
| Has no more than 10 homologs in the proteome | No identified target has a bit score lower than 10 and higher than 120 |
| Existence level <= 4 (as defined by UniProt) | Existence level <= 4 (as defined by UniProt) |
| Has an identified homologous target with a bit score of more than 30 | Longer than the identified microProtein by at least 40 amino acids |
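The filters in the table above can be read as simple threshold checks. The following is a minimal sketch under stated assumptions, not the miPFinder v2.0 implementation: the `Protein` class, the function names and the example proteins are hypothetical, and the bit-score window listed for targets (10 to 120) is left out because its exact logic is not clear from the table.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    """Hypothetical record; field names are illustrative, not miPFinder's."""
    uniprot_id: str
    length: int           # length in amino acids
    existence_level: int  # UniProt protein existence level (1 to 5)


# Thresholds taken from the filter table.
MAX_MIP_LENGTH = 150
MAX_HOMOLOGS = 10
MAX_EXISTENCE_LEVEL = 4
MIN_TARGET_BIT_SCORE = 30
MIN_LENGTH_DIFFERENCE = 40


def passes_microprotein_filter(candidate, n_homologs, best_target_bit_score):
    """Apply the four microProtein-side filters to one candidate."""
    return (
        candidate.length <= MAX_MIP_LENGTH
        and n_homologs <= MAX_HOMOLOGS
        and candidate.existence_level <= MAX_EXISTENCE_LEVEL
        and best_target_bit_score > MIN_TARGET_BIT_SCORE
    )


def passes_target_filter(target, candidate):
    """Apply the target-side length and existence-level filters."""
    return (
        target.length > MAX_MIP_LENGTH
        and target.existence_level <= MAX_EXISTENCE_LEVEL
        and target.length - candidate.length >= MIN_LENGTH_DIFFERENCE
    )


# Hypothetical example proteins (made-up identifiers and values).
mip = Protein("EXAMPLE_MIP", length=95, existence_level=2)
target = Protein("EXAMPLE_TARGET", length=320, existence_level=1)

print(passes_microprotein_filter(mip, n_homologs=4, best_target_bit_score=62.5))
print(passes_target_filter(target, mip))
```

In this sketch, the homolog count and the best bit score are passed in as plain numbers, so the checks stay independent of whichever homology search produced them.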
A list of known A. thaliana microProteins scored using the miPFinder v2.0 algorithm
| MicroProtein UniProt ID | TAIR ID | Ranking (percentile) |
|---|---|---|
| Q9M157 | AT4G01060 | 1 (1) |
| Q9LJW5 | AT3G28917 | 2 (1) |
| O22059 | AT2G46410 | 3 (1) |
| D3GKW6 | AT2G30432 | 4 (1) |
| Q9LNI5 | AT1G01380 | 6 (2) |
| Q8GV05 | AT5G53200 | 7 (2) |
| B3H4X8 | AT2G30424 | 10 (3) |
| Q84RD1 | AT2G30420 | 14 (4) |
| Q1G3I2 | AT4G15248 | 16 (4) |
| Q9CA51 | AT1G74660 | 18 (5) |
| Q2Q493 | AT1G18835 | 19 (5) |
| Q9LRM4 | AT3G21890 | 20 (5) |
| Q9SJH0 | AT2G42870 | 49 (12) |
| Q9FLE9 | AT5G39860 | 52 (13) |
| Q8GW32 | AT1G26945 | 55 (14) |
| Q9LXG5 | AT5G15160 | 59 (15) |
| F4HXU3 | AT1G14760 | 61 (15) |
| Q9CA64 | AT1G74500 | 63 (15) |
| Q9LXR7 | AT3G58850 | 68 (17) |
| Q9LJX1 | AT3G28857 | 69 (17) |
| F4JCN9 | AT3G47710 | 88 (21) |
| Q9LXI8 | AT3G52770 | 189 (45) |
| Q56WL5 | AT2G36307 | 191 (46) |
Number of microProteins found in the five analyzed monocot species. Small proteins include all proteins smaller than 150 amino acids, including all microProteins. MicroProteins refers to the number of small proteins identified as potential microProteins. % Small proteins refers to the number of small proteins relative to the whole genome. % MicroProteins refers to the number of microProteins relative to the number of small proteins.
| | Barley | Rice | Wheat | Maize | Sorghum |
|---|---|---|---|---|---|
| Total proteins | 35965 | 43603 | 105061 | 39399 | 34001 |
| Large proteins | 26372 | 29242 | 90307 | 32098 | 26541 |
| Small proteins | 9593 | 14361 | 14754 | 7301 | 7460 |
| MicroProteins | 3810 | 5451 | 4482 | 3790 | 2534 |
| | 1515 | 3686 | 697 | 1537 | 1474 |
| | 2295 | 1765 | 3785 | 2253 | 1060 |
| % Small proteins | 27% | 33% | 14% | 19% | 22% |
| % MicroProteins | 40% | 38% | 30% | 52% | 34% |
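The two percentage rows can be recomputed from the counts in the table. The sketch below assumes that the first, third and fourth data rows are total proteins, small proteins and microProteins, respectively. That reading is inferred from the caption's definitions and from the first row summing to the 258,029 proteins quoted in the abstract. The variable and function names are illustrative.

```python
species = ["Barley", "Rice", "Wheat", "Maize", "Sorghum"]

# Counts copied from the table; the row interpretation is an assumption.
total_proteins = [35965, 43603, 105061, 39399, 34001]
small_proteins = [9593, 14361, 14754, 7301, 7460]
micro_proteins = [3810, 5451, 4482, 3790, 2534]


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# % Small proteins: small proteins relative to all proteins.
pct_small = [percent(s, t) for s, t in zip(small_proteins, total_proteins)]
# % MicroProteins: microProteins relative to small proteins.
pct_micro = [percent(m, s) for m, s in zip(micro_proteins, small_proteins)]

for name, ps, pm in zip(species, pct_small, pct_micro):
    print(f"{name}: {ps}% small proteins, {pm}% microProteins")
```

Under this reading, the recomputed values match the last two rows of the table (for example, 27% and 40% for barley).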
Figure 2. Score distribution of identified candidate microProteins as assigned by miPFinder.
Frequency of biological role keywords occurring in the identified microProtein targets. Each target may have had more than one keyword associated with it. Number in parentheses after each term indicates the frequency of occurrence
| Target keywords (n = 124) | Target phrases (n = 64) |
|---|---|
| Root (7) | Transcription factor (5) |
| Growth (6) | Salt stress (4) |
| Auxin (5) | Abscisic acid (3) |
| Stress (5) | Flowering time (2) |
| Development (5) | Heading date (2) |
| Shoot (4) | Grain yield (2) |
| Grain (4) | Auxin response (2) |
| Drought (4) | Drought stress (2) |
| Salt (4) | Stress tolerance (2) |
| Seedling (3) | Cold stress (2) |