Fengyu Wang, Jingfa Xiao, Linlin Pan, Ming Yang, Guoqiang Zhang, Shouguang Jin, Jun Yu.
Abstract
BACKGROUND: Mini-proteins, defined as polypeptides of no more than 100 amino acids, are ubiquitous in prokaryotes and eukaryotes. They play significant roles in various biological processes, and their regulatory functions are attracting increasing attention from scientists. However, the functions of most mini-proteins remain largely unknown because of the constraints of experimental methods and bioinformatic analysis. METHODOLOGY/PRINCIPAL
Year: 2008 PMID: 19107199 PMCID: PMC2602986 DOI: 10.1371/journal.pone.0004027
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A: Mini-protein length distribution. B: Distribution of all mini-proteins.
Overview of mini-proteins by phylum.
| Domain | Phylum | Sum | Average (%) | Range (%) | Average Length (aa) | Minimum Length (aa) | Organism Number | Average Sum |
| Archaea | Crenarchaeota | 2953 | 11.28 | 8.36–18.23 | 77 | 18 | 12 | 246 |
| Archaea | Euryarchaeota | 7642 | 11.30 | 7.83–15.00 | 76 | 16 | 28 | 273 |
| Archaea | Nanoarchaeota | 50 | 9.33 | 9.33 | 81 | 54 | 1 | 50 |
| Bacteria | Acidobacteria | 696 | 5.58 | 5.35–5.80 | 84 | 37 | 2 | 348 |
| Bacteria | Actinobacteria | 14195 | 8.01 | 4.53–15.88 | 76 | 10 | 42 | 338 |
| Bacteria | Aquificae | 48 | 3.08 | 3.08 | 80 | 47 | 1 | 48 |
| Bacteria | Bacteroidetes | 3582 | 8.89 | 5.51–13.11 | 74 | 27 | 11 | 326 |
| Bacteria | Chlamydiae | 1296 | 9.98 | 6.22–17.73 | 73 | 30 | 11 | 118 |
| Bacteria | Chlorobi | 1295 | 11.79 | 6.79–21.80 | 72 | 30 | 5 | 259 |
| Bacteria | Chloroflexi | 979 | 13.29 | 6.31–18.16 | 74 | 27 | 4 | 245 |
| Bacteria | Cyanobacteria | 12057 | 17.31 | 7.80–30.83 | 71 | 15 | 26 | 464 |
| Bacteria | Firmicutes | 36465 | 12.60 | 0.16–25.11 | 70 | 6 | 113 | 323 |
| Bacteria | Fusobacteria | 232 | 11.22 | 11.22 | 72 | 20 | 1 | 232 |
| Bacteria | Planctomycetes | 1944 | 26.54 | 26.54 | 66 | 35 | 1 | 1944 |
| Bacteria | Alphaproteobacteria | 21246 | 11.03 | 5.09–33.39 | 74 | 20 | 65 | 327 |
| Bacteria | Betaproteobacteria | 21347 | 10.02 | 5.02–24.49 | 73 | 13 | 44 | 485 |
| Bacteria | Deltaproteobacteria | 5664 | 10.12 | 1.80–18.99 | 72 | 18 | 15 | 378 |
| Bacteria | Epsilonproteobacteria | 2099 | 11.03 | 7.35–16.75 | 69 | 12 | 11 | 191 |
| Bacteria | Gammaproteobacteria | 42626 | 9.75 | 4.58–26.91 | 73 | 9 | 123 | 347 |
| Bacteria | Spirochaetes | 3225 | 13.63 | 5.49–28.71 | 61 | 14 | 9 | 358 |
| Bacteria | Thermi | 761 | 7.46 | 5.75–9.29 | 78 | 11 | 4 | 190 |
| Bacteria | Thermotogae | 477 | 8.62 | 7.28–9.28 | 73 | 30 | 3 | 159 |
| Total | – | 180879 | 10.99 | – | 74 | – | 532 | 346 |
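The Average Sum column appears to equal Sum divided by Organism Number (for example, 2953 / 12 ≈ 246 for Crenarchaeota). The Python sketch below is an illustrative consistency check, not the authors' pipeline: it assumes that reading of the column and uses values transcribed from the table. Every per-phylum value is reproduced, and the phylum Sums and Organism Numbers add up to the 180879 and 532 in the total row. The pooled ratio 180879 / 532 is about 340, however, so the 346 in the total row was presumably computed some other way.

```python
# Assumption: "Average Sum" = Sum / Organism Number, rounded to an integer.
# Tuples are (Sum, Organism Number, Average Sum) transcribed from the table above.
PHYLA = {
    "Crenarchaeota": (2953, 12, 246),
    "Euryarchaeota": (7642, 28, 273),
    "Nanoarchaeota": (50, 1, 50),
    "Acidobacteria": (696, 2, 348),
    "Actinobacteria": (14195, 42, 338),
    "Aquificae": (48, 1, 48),
    "Bacteroidetes": (3582, 11, 326),
    "Chlamydiae": (1296, 11, 118),
    "Chlorobi": (1295, 5, 259),
    "Chloroflexi": (979, 4, 245),
    "Cyanobacteria": (12057, 26, 464),
    "Firmicutes": (36465, 113, 323),
    "Fusobacteria": (232, 1, 232),
    "Planctomycetes": (1944, 1, 1944),
    "Alphaproteobacteria": (21246, 65, 327),
    "Betaproteobacteria": (21347, 44, 485),
    "Deltaproteobacteria": (5664, 15, 378),
    "Epsilonproteobacteria": (2099, 11, 191),
    "Gammaproteobacteria": (42626, 123, 347),
    "Spirochaetes": (3225, 9, 358),
    "Thermi": (761, 4, 190),
    "Thermotogae": (477, 3, 159),
}


def per_organism(total: int, organisms: int) -> int:
    """Mean number of mini-proteins per genome, rounded to the nearest integer."""
    return round(total / organisms)


# Phyla whose recomputed ratio disagrees with the printed Average Sum
mismatches = [
    name for name, (total, n, listed) in PHYLA.items()
    if per_organism(total, n) != listed
]
grand_total = sum(total for total, _, _ in PHYLA.values())
organisms = sum(n for _, n, _ in PHYLA.values())

print(mismatches)                            # []
print(grand_total, organisms)                # 180879 532
print(per_organism(grand_total, organisms))  # 340
```

Rounding uses Python's built-in `round()`, which rounds halves to even; no ratio in this table falls exactly on .5, so that choice does not change any result.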
Specific or shared mini-proteins in phyla.
| Mini-Protein Category | Euryarchaeota | % | Actinobacteria | % | Cyanobacteria | % | Firmicutes | % | Gammaproteobacteria | % | Spirochaetes | % |
| Sum | 7643 | – | 14195 | – | 12057 | – | 36465 | – | 42626 | – | 3225 | – |
| Blast-result | 7638 | – | 14184 | – | 12055 | – | 36383 | – | 42534 | – | 3212 | – |
| Species-specific | 4854 | 63.55 | 7551 | 53.24 | 7245 | 60.10 | 20304 | 55.81 | 18711 | 43.99 | 2443 | 76.06 |
| Species-shared | 238 | 3.12 | 2619 | 18.46 | 907 | 7.52 | 5930 | 16.30 | 4684 | 11.01 | 69 | 2.15 |
| Genus-specific | 431 | 5.64 | 661 | 4.66 | – | – | 2354 | 6.47 | 1965 | 4.62 | 524 | 16.31 |
| Genus-shared | 413 | 5.41 | – | – | 23 | 0.19 | 945 | 2.60 | 5371 | 12.63 | – | – |
| Family-specific | 409 | 5.35 | 66 | 0.47 | 512 | 4.25 | 604 | 1.66 | 367 | 0.86 | 8 | 0.25 |
| Family-shared | 39 | 0.51 | 1972 | 13.90 | – | – | 844 | 2.32 | 243 | 0.57 | 0 | – |
| Order-specific | 237 | 3.10 | – | – | – | – | 127 | 0.35 | 28 | 0.07 | 2 | 0.06 |
| Order-shared | – | – | – | – | 1941 | 16.10 | 51 | 0.14 | 3938 | 9.26 | – | – |
| Class-specific | 47 | 0.62 | 206 | 1.45 | 194 | 1.61 | 878 | 2.41 | – | – | – | – |
| Class-shared | 487 | 6.38 | – | – | – | – | 972 | 2.67 | 3951 | 9.29 | – | – |
| Phylum-specific | 5 | 0.07 | – | – | 542 | 4.50 | 15 | 0.04 | – | – | – | – |
| Phylum-shared | 219 | 2.87 | 1087 | 7.66 | 663 | 5.50 | 3143 | 8.64 | 3180 | 7.48 | 163 | 5.07 |
| Domain-specific | 26 | 0.34 | – | – | – | – | – | – | – | – | – | – |
| Domain-shared | 233 | 3.05 | 22 | 0.16 | 28 | 0.23 | 216 | 0.59 | 96 | 0.23 | 3 | 0.09 |
Note: The averages of the species-specific, phylum-shared and domain-shared percentages are 58.79%, 6.20% and 0.73%, respectively. Because Actinobacteria contains only one class, its class-specific mini-proteins are equivalent to phylum-specific ones; similarly, because Spirochaetes contains only one class and one order, its order-specific mini-proteins are equivalent to phylum-specific ones. Hypothetical proteins account for 82.80%, 86.51%, 85.77%, 79.91%, 84.49% and 95.37% of the species-specific mini-proteins in Euryarchaeota, Actinobacteria, Cyanobacteria, Firmicutes, Gammaproteobacteria and Spirochaetes, respectively.
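The percentage columns appear to be computed relative to the Blast-result row rather than the Sum row: for species-specific Euryarchaeota mini-proteins, 4854 / 7638 ≈ 63.55%, matching the table, whereas 4854 / 7643 would give 63.51%. The Python sketch below assumes that denominator (an inference from the numbers, not something the table states) and recomputes the per-phylum percentages and the cross-phylum averages quoted in the note. The averages are taken over unrounded percentages, which avoids a rounding tie in the domain-shared mean.

```python
from statistics import mean

# Counts transcribed from the table above, one tuple per phylum:
# (Blast-result, species-specific, phylum-shared, domain-shared)
COUNTS = {
    "Euryarchaeota": (7638, 4854, 219, 233),
    "Actinobacteria": (14184, 7551, 1087, 22),
    "Cyanobacteria": (12055, 7245, 663, 28),
    "Firmicutes": (36383, 20304, 3143, 216),
    "Gammaproteobacteria": (42534, 18711, 3180, 96),
    "Spirochaetes": (3212, 2443, 163, 3),
}


def percent(count: int, blast_hits: int) -> float:
    """Share of BLAST-matched mini-proteins, in percent (assumed denominator)."""
    return 100 * count / blast_hits


# Per-phylum percentages, rounded to two decimals as printed in the table
table = {
    name: tuple(round(percent(c, blast), 2) for c in categories)
    for name, (blast, *categories) in COUNTS.items()
}

# Cross-phylum means of the unrounded percentages for each category
means = [
    round(mean(percent(row[i], row[0]) for row in COUNTS.values()), 2)
    for i in (1, 2, 3)
]

print(table["Euryarchaeota"])  # (63.55, 2.87, 3.05)
print(means)                   # [58.79, 6.2, 0.73]
```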
Phylum-shared and domain-shared mini-proteins in phyla.
| Category | Euryarchaeota | Actinobacteria | Cyanobacteria | Firmicutes | Gammaproteobacteria | Spirochaetes |
| Phylum-shared (sum) | 219 | 1087 | 663 | 3143 | 3180 | 163 |
| | hypothetical protein (39) | hypothetical protein (190) | hypothetical protein (216) | hypothetical protein (541) | hypothetical protein (602) | hypothetical protein (33) |
| | ribosomal protein (135) | ribosomal protein (482) | ribosomal protein (214) | ribosomal protein (1470) | ribosomal protein (1132) | ribosomal protein (83) |
| | enzyme or subunit (26) | enzyme or subunit (91) | enzyme or subunit (52) | phage protein (231) | cold-shock protein (387) | translation initiation factor IF-1 (10) |
| | other (19) | cold-shock protein (82) | redoxin (29) | cold-shock protein (179) | enzyme or subunit (219) | acyl carrier protein (7) |
| | | redoxin (58) | acyl carrier protein (24) | DNA-binding protein (110) | redoxin (175) | carbon storage regulator (5) |
| | | translation initiation factor IF-1 (37) | translation initiation factor IF-1 (22) | translation initiation factor IF-1 (100) | DNA-binding protein (144) | GroES chaperone (4) |
| | | 10 kDa chaperonin (22) | S4-like RNA-binding protein (15) | enzyme or subunit (90) | translation initiation factor IF-1 (112) | other (21) |
| | | other (125) | other (91) | redoxin (63) | acyl carrier protein (105) | |
| | | | | acyl carrier protein (45) | other (304) | |
| | | | | sporulation protein S (38) | | |
| | | | | other (276) | | |
| Domain-shared (sum) | 233 | 22 | 28 | 216 | 96 | 3 |
| | hypothetical protein (99) | hypothetical protein (7) | hypothetical protein (12) | hypothetical protein (78) | hypothetical protein (21) | hypothetical protein (1) |
| | enzyme or subunit (34) | redoxin (5) | gas vesicle protein (8) | transcriptional regulator (49) | rubredoxin (32) | rubredoxin (2) |
| | redoxin (28) | transcriptional regulator (4) | rubredoxin (4) | DNA-binding protein (32) | transcriptional regulator (15) | |
| | transcriptional regulator (14) | YHS domain protein (3) | enzyme or subunit (4) | redoxin (14) | cation transport regulator (7) | |
| | gas vesicle protein (11) | other (3) | | enzyme or subunit (13) | other (21) | |
| | other (47) | | | other (30) | | |
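Because the category counts in this table are spread over ragged columns, re-tallying each phylum is a quick way to check a transcription. The Python sketch below is illustrative only: the counts are transcribed from the domain-shared block above, and the dictionary names are chosen for this example. It sums each phylum's categories and compares the result with the domain-shared counts in the preceding table; the Euryarchaeota categories add up to 233.

```python
# Category counts transcribed from the domain-shared block of the table above.
DOMAIN_SHARED = {
    "Euryarchaeota": {
        "hypothetical protein": 99, "enzyme or subunit": 34, "redoxin": 28,
        "transcriptional regulator": 14, "gas vesicle protein": 11, "other": 47,
    },
    "Actinobacteria": {
        "hypothetical protein": 7, "redoxin": 5, "transcriptional regulator": 4,
        "YHS domain protein": 3, "other": 3,
    },
    "Cyanobacteria": {
        "hypothetical protein": 12, "gas vesicle protein": 8, "rubredoxin": 4,
        "enzyme or subunit": 4,
    },
    "Firmicutes": {
        "hypothetical protein": 78, "transcriptional regulator": 49,
        "DNA-binding protein": 32, "redoxin": 14, "enzyme or subunit": 13, "other": 30,
    },
    "Gammaproteobacteria": {
        "hypothetical protein": 21, "rubredoxin": 32, "transcriptional regulator": 15,
        "cation transport regulator": 7, "other": 21,
    },
    "Spirochaetes": {"hypothetical protein": 1, "rubredoxin": 2},
}

# Domain-shared counts from the "Specific or shared mini-proteins in phyla" table
EXPECTED = {
    "Euryarchaeota": 233, "Actinobacteria": 22, "Cyanobacteria": 28,
    "Firmicutes": 216, "Gammaproteobacteria": 96, "Spirochaetes": 3,
}

# Re-tally each phylum by summing its per-category counts
totals = {name: sum(counts.values()) for name, counts in DOMAIN_SHARED.items()}

print(totals == EXPECTED)  # True
```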
Figure 2. Mini-proteins in COG.
Conservation analysis of hypothetical mini-proteins.
| Serial Number | Domain Name | BLAST Sum | Phylum | Positives | Identity | Annotation | Domain Description |
| group01 | BMC | 103 | Acidobacteria, Actinobacteria, Cyanobacteria, Firmicutes, Fusobacteria, Planctomycetes, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria | 66.40% | 7.40% | microcompartment protein; carboxysome shell protein; propanediol/ethanolamine utilization protein | Bacterial microcompartments are primitive organelles composed entirely of protein subunits. One such microcompartment is the carboxysome, a protein shell for sequestering carbon fixation reactions. |
| group02 | PAAR-motif | 16 | Acidobacteria, Bacteroidetes, Chloroflexi, Cyanobacteria, Planctomycetes, Alphaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria | 86.40% | 24.30% | PAAR repeat-containing protein; hypothetical protein | This motif is usually found in pairs in a family of bacterial membrane proteins. It is also found as a triplet of tandem repeats spanning the entire length of another family of hypothetical proteins. |
| group03 | Plasmid-killer | 20 | Actinobacteria, Cyanobacteria, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Gammaproteobacteria | 74.20% | 22.70% | plasmid maintenance system killer protein; hypothetical protein | Several plasmids with proteic killer gene systems have been reported. All of them encode a stable toxin and an unstable antidote. Activation of these systems results in cell filamentation and cessation of viable cell production. |
| group04 | Plasmid-Txe | 40 | Acidobacteria, Actinobacteria, Chloroflexi, Cyanobacteria, Firmicutes, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Gammaproteobacteria, Spirochaetes | 70.80% | 5.70% | addiction module toxin (Txe/YoeB family); hypothetical protein | The plasmid-encoded Axe-Txe proteins act as an antitoxin-toxin pair. |
| group05 | RHH-2 | 9 | Actinobacteria, Cyanobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria | 74.20% | 28.10% | putative transcriptional regulator (CopG/Arc/MetJ family); hypothetical protein | This family of proteins is about 80 amino acids in length, and their function is unknown. The proteins contain a conserved GRY motif. This family appears to be related to ribbon-helix-helix DNA-binding proteins. |
| group06 | YcfA-like | 11 | Firmicutes, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Spirochaetes | 60.70% | 17.90% | YcfA-like protein; hypothetical protein | This family is similar to the YcfA protein expressed by E. coli. Most of these proteins are hypothetical proteins of unknown function. |
| group07 | zf-UBP | 12 | Acidobacteria, Actinobacteria, Cyanobacteria, Euryarchaeota, Deltaproteobacteria, Gammaproteobacteria | 71.20% | 26.00% | putative Zn-finger domain; hypothetical protein | Zn-finger in ubiquitin hydrolases and other proteins. |
| group08 | DUF37 | 144 | Acidobacteria, Actinobacteria, Bacteroidetes, Chlorobi, Chlamydiae, Cyanobacteria, Firmicutes, Fusobacteria, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Gammaproteobacteria, Spirochaetes, Thermi, Thermotogae | 46.50% | 7.10% | alpha-hemolysin; protein of unknown function DUF37; hypothetical protein | This domain is found in short (75-amino-acid) hypothetical proteins from various bacteria. The domain contains three conserved cysteine residues. |
| group09 | DUF196 | 13 | Chlorobi, Firmicutes, Alphaproteobacteria, Deltaproteobacteria, Gammaproteobacteria | 86.60% | 13.40% | CRISPR-associated protein; protein of unknown function DUF196; hypothetical protein | This domain describes proteins of unknown function. Trm112p-like protein: the bacterial members are about 60–70 amino acids in length and the eukaryotic examples are about 120 amino acids in length. The C terminus contains the strongest conservation. |
| group10 | DUF343 | 132 | Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria | 49.50% | 5.30% | tetraacyldisaccharide-1-P 4'-kinase; protein of unknown function DUF343; hypothetical protein | |
| group11 | DUF370 | 46 | Firmicutes, Cyanobacteria, Thermotogae, Chloroflexi, Deltaproteobacteria | 81.40% | 20.60% | protein of unknown function DUF370; hypothetical protein | Domain of unknown function |
| group12 | DUF427 | 17 | Actinobacteria, Bacteroidetes, Chloroflexi, Cyanobacteria, Thermi, Betaproteobacteria, Gammaproteobacteria | 85.40% | 27.10% | protein of unknown function DUF427; hypothetical protein | Domain of unknown function |
| group13 | DUF433 | 6 | Acidobacteria, Chlorobi, Chloroflexi, Cyanobacteria, Alphaproteobacteria | 70.20% | 19.00% | protein of unknown function DUF433; hypothetical protein | Domain of unknown function |
| group14 | DUF528 | 43 | Acidobacteria, Cyanobacteria, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Gammaproteobacteria | 75.30% | 10.40% | accessory protein involved in assembly of Fe-S clusters; protein of unknown function DUF528; hypothetical protein | Domain of unknown function |
| group15 | DUF891 | 11 | Cyanobacteria, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Gammaproteobacteria | 65.50% | 13.60% | protein of unknown function DUF891; hypothetical protein | This family consists of hypothetical bacterial proteins of unknown function as well as phage Gp49 proteins. |
| group16 | DUF1328 | 78 | Acidobacteria, Bacteroidetes, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Gammaproteobacteria | 57.00% | 7.00% | putative inner membrane protein; hypothetical protein | This family consists of several hypothetical bacterial proteins of around 50 residues in length. The function of this family is unknown. |
| group17 | DUF1458 | 37 | Actinobacteria, Chlorobi, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria | 56.10% | 5.60% | protein of unknown function DUF1458; hypothetical protein | Members of this family are typically around 70 residues in length. The function of this family is unknown. |
| group18 | UPF0150 | 14 | Acidobacteria, Chloroflexi, Cyanobacteria, Euryarchaeota, Firmicutes, Deltaproteobacteria | 70.70% | 6.10% | protein of unknown function UPF0150; hypothetical protein | This domain is found next to a DNA-binding helix-turn-helix domain, which suggests that it is some kind of ligand-binding domain. |
| group19 | Pfam-B_8409 | 31 | Chlorobi, Alphaproteobacteria, Betaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria, Thermi | 73.80% | 17.80% | predicted membrane protein; hypothetical protein | |
| group20 | Pfam-B_11213 | 27 | Bacteroidetes, Euryarchaeota, Firmicutes, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria, Thermi | 63.50% | 22.40% | hypothetical protein | |
| group21 | Pfam-B_20813 | 27 | Bacteroidetes, Cyanobacteria, Planctomycetes, Betaproteobacteria, Gammaproteobacteria | 50.00% | 13.40% | hypothetical protein | |
| group22 | Pfam-B_20885 | 15 | Actinobacteria, Firmicutes, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria | 44.60% | 11.90% | uncharacterized conserved small protein-like protein; hypothetical protein | |
| group23 | Pfam-B_49955 | 22 | Bacteroidetes, Alphaproteobacteria, Betaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria | 64.30% | 27.10% | oxygen-sensitive ribonucleoside-triphosphate reductase; hypothetical protein | |
| group24 | Pfam-B_108629 | 5 | Bacteroidetes, Acidobacteria, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria | 47.40% | 4.20% | hypothetical protein | |
| group25 | Pfam-B_139336 | 7 | Chlorobi, Cyanobacteria, Alphaproteobacteria, Deltaproteobacteria, Gammaproteobacteria | 75.30% | 15.70% | hypothetical protein | |
| group26 | Pfam-B_6607; Pfam-B_9422 | 35 | Bacteroidetes, Chlorobi, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria | 51.80% | 15.50% | hypothetical protein | |
| group27 | signal peptide; transmembrane regions | 43 | Acidobacteria, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Gammaproteobacteria | 55.10% | 10.10% | conserved hypothetical membrane protein; hypothetical protein | |
| group28 | TRASH; zf-HIT | 31 | Cyanobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Thermi | 42.00% | 12.50% | zinc finger protein; hypothetical protein | TRASH: metallochaperone-like domain. zf-HIT: this presumed zinc finger contains up to 6 cysteine residues that could coordinate zinc. |
Note: [1] Pfam-B_6607 and Pfam-B_9422 are contiguous; [2] different domains were searched using different sequences.
Figure 3. Patterns of domains.
Dashed lines indicate that the domain has evolved into part of another domain or of a protein family's conserved region.
Figure 4. A: Patterns of the BMC domain. Dashed lines indicate that the domain has evolved into part of another domain or of a protein family's conserved region. Green represents IPR011238 (polyhedral organelle shell protein PduT) and IPR009193 (polyhedral organelle shell protein, EutL/PduB type) in pattern 4; blue represents IPR009193 (polyhedral organelle shell protein, EutL/PduB type) and IPR009307 (ethanolamine utilization EutS) in pattern 5. All of these are polyhedral organelle shell proteins. B: Phylogeny of the BMC domain in Cyanobacteria. The letters L and R denote the left and right domains in pattern 3 or 4, respectively; the letter r denotes the domains in pattern 2 or 5.