Peng Zhou, Kevin A T Silverstein, Liangliang Gao, Jonathan D Walton, Sumitha Nallu, Joseph Guhlin, Nevin D Young.
Abstract
BACKGROUND: Small peptides encoded as one- or two-exon genes in plants have recently been shown to affect multiple aspects of plant development, reproduction and defense responses. However, popular similarity search tools and gene prediction techniques generally fail to identify most members of this class of genes. This is largely due to the high sequence divergence among family members and the limited availability of experimentally verified small peptides to use as training sets for homology search and ab initio prediction. Consequently, there is an urgent need for both experimental and computational studies to advance the accurate prediction of small peptides.
Year: 2013 PMID: 24256031 PMCID: PMC3924332 DOI: 10.1186/1471-2105-14-335
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The SPADA workflow.
Figure 2. Performance comparison of different gene prediction components. The search E-value threshold is set to 0.001 by default.
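Cysteine-Rich Peptides (CRPs) predicted in Arabidopsis thaliana and Medicago truncatula
| Family | CRP subgroups | A. thaliana | M. truncatula |
| --- | --- | --- | --- |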
| Defensin related | CRP0000-CRP0260,etc. | 56 | 43 |
| LCR/BET1 related | CRP0280-CRP0810,etc. | 162 | 110 |
| SCR related | CRP0830-CRP0880 | 32 | 6 |
| Metallocarboxypeptidase inhibitor | CRP1004-CRP1030 | 0 | 1 |
| CCP related | CRP1040-CRP1120 | 19 | 4 |
| Nodule Cysteine-Rich peptide | CRP1130-CRP1530 | 3 | 583 |
| Ripening related protein | CRP1600-CRP1605 | 0 | 21 |
| Novel family | CRP1620,CRP2800,etc. | 14 | 15 |
| Miscellaneous | CRP1640-CRP1660,etc. | 16 | 48 |
| Rapid Alkalinization Factor | CRP1700-CRP2120 | 38 | 36 |
| Thionin related | CRP2200-CRP2610 | 66 | 23 |
| Root cap/late embryogenesis | CRP2820-CRP2850 | 5 | 7 |
| Antimicrobial peptide MBP-1 | CRP2900-CRP3000 | 1 | 2 |
| Bowman Birk inhibitor | CRP3100-CRP3190 | 0 | 16 |
| Pollen Ole e I | CRP3300-CRP3510 | 34 | 44 |
| ECA1 gametogenesis related | CRP3600-CRP3740 | 124 | 17 |
| Lipid transfer protein | CRP3800-CRP4962 | 127 | 127 |
| 2S Albumin | CRP4970-CRP5080 | 5 | 3 |
| Glutenin/Gliadin/Prolamin | CRP5090-CRP5270 | 0 | 0 |
| Maternally-expressed gene/Ae1 | CRP5300-CRP5520 | 20 | 2 |
| Proteinase inhibitor II | CRP5545-CRP5600 | 6 | 2 |
| Chitinase/Hevein | CRP5610-CRP5820 | 10 | 15 |
| Kunitz type inhibitor | CRP6010-CRP6180 | 7 | 45 |
| Total | | 745 | 1170 |
Novel CRP models identified by SPADA, as determined by manual inspection
| | A. thaliana | M. truncatula |
| --- | --- | --- |
| Total predictions | 745 | 1170 |
| Number of unannotated predictions | 5 | 125 |
| Number of novel models | 3 (60%) | 77 (62%) |
An unannotated prediction is a gene model predicted by SPADA but missed by current genome annotation.
Novel models are unannotated predictions confirmed by manual inspection to be true members of the family, based on the family-specific alignment and/or RNA-Seq evidence.
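The percentages in the table above are the share of unannotated predictions that were confirmed as novel models, not a share of all predictions. A minimal Python sketch of that arithmetic follows. The counts are copied from the table; the function name `percent_novel` and the dictionary keys are illustrative assumptions, not part of SPADA.

```python
# Counts copied from the table above (first and second data columns).
# Key names are illustrative only.
counts = {
    "first_column": {"unannotated": 5, "novel": 3},
    "second_column": {"unannotated": 125, "novel": 77},
}


def percent_novel(unannotated: int, novel: int) -> int:
    """Percentage of unannotated predictions confirmed as novel, rounded to a whole number."""
    if unannotated <= 0:
        raise ValueError("need at least one unannotated prediction")
    return round(100 * novel / unannotated)


for column, c in counts.items():
    print(column, percent_novel(c["unannotated"], c["novel"]))
```

The remaining unannotated predictions (2 and 48, respectively) are those that manual inspection did not confirm as family members.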
Figure 3. A novel gene model predicted by SPADA is missed by the current annotation. A Medicago NCR (h1001.01) shown in IGV (top) and the subgroup alignment of CRP1180 (bottom, h1001.01 shaded).
Figure 4. SPADA detects mis-annotated and novel SPH peptides in TAIR10. (A) SPADA detects an SPH peptide (h0018.02) that is mis-annotated in TAIR10; (B) SPADA detects a novel SPH peptide (h0013.02) not present in TAIR10. A multiple sequence alignment of selected SPH peptides is shown below, with h0018.02 and h0013.02 shaded.
Figure 5. Multiple sequence alignment of Amanita toxin proproteins. Sequences identified by SPADA are labeled as "hm****". All remaining sequences were obtained from Hallen et al.[39] and were included in the initial alignment used as input for SPADA.