Hélène San Clemente, Rafael Pont-Lezica, Elisabeth Jamet.
Abstract
Bioinformatics is used at three different steps of proteomic studies of sub-cellular compartments. The first is protein identification from mass spectrometry data. The second is the prediction of sub-cellular localization, and the third is the search for functional domains to predict the function of the identified proteins in order to answer biological questions. The aim of this work was to develop a new tool for improving the quality of proteomics of sub-cellular compartments. Starting from an analysis of problems found in databases, we designed a new Arabidopsis database named ProtAnnDB (http://www.polebio.scsv.ups-tlse.fr/ProtAnnDB/). It collects on a single page the predictions of sub-cellular localization and of functional domains made by available software. Using this database improves not only the interpretation of proteomic data (top-down analysis), but also the procedures used to isolate sub-cellular compartments (bottom-up quality control).
Keywords: bioinformatics; cell wall; plant; proteomics
Year: 2009 PMID: 20140071 PMCID: PMC2808182 DOI: 10.4137/bbi.s2065
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Bioinformatic tools in proteomic strategies. Different steps are required for protein identification in complex samples, from sub-cellular fractionation to mass spectrometry analysis. Bioinformatics can be used for three different purposes, indicated by stars: protein identification, prediction of protein sub-cellular localization, and prediction of protein function.
Table 1. Evaluation of different methods for the recovery of plant secreted proteins. Efficiency of the classical methods used to assess the purity of subcellular fractions (Methods for purity control) is compared to the results of the bioinformatic analysis of the subcellular localization of the proteins identified by mass spectrometry and bioinformatics. Estimated fraction purity refers to results of classical methods of analysis whereas cell wall protein fraction purity (ratio between number of predicted secreted proteins and total number of proteins) refers to results of bioinformatic analysis. Negative purity control for the cell wall fraction (–); positive purity control for the cell wall fraction (+); 1D-E: mono-dimensional gel electrophoresis; ADH: alcohol dehydrogenase; G-6-PDH: glucose-6-P dehydrogenase; MDH: malate dehydrogenase.
| Procedure | Methods for purity control | Estimated fraction purity | Predicted secreted proteins | Total proteins | | Cell wall protein fraction purity | Reference |
|---|---|---|---|---|---|---|---|
| washings with salt solutions | | >60% | 51 | 96 | 0 | 53.1% | Borderies et al. |
| culture medium | | >99% | 9 | 13 | 1 | 69.2% | Oh et al. |
| intercellular fluids | | >99% | 6 | 13 | 3 | 46.1% | Haslam et al. |
| intercellular fluids | | >90% | 87 | 93 | 0 | 93.5% | Boudart et al. |
| water extraction; 10% glycerol sedimentation | | >99% | 24 | 75 | 2 | 32.0% | Chivasa et al. |
| salt solution containing 10% glycerol; extensive washings; CaCl2 final washing | | >90% | 89 | 792 | 0 | 12.6% | Bayer et al. |
| filtration and extensive washing | different protein patterns after 1D-E analysis of different fractions during the purification procedure | qualitative | 25 | 74 | 9 | 33.8% | Watson et al. |
| low salt buffer; increasing sucrose density sedimentation; extensive washing | none | not determined | 73 | 99 | 4 | 73.7% | Feiz et al. |
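The cell wall protein fraction purity defined in the caption above is a plain ratio, and a minimal Python sketch can recompute it. The sketch assumes that the first two integer columns of each row are the number of predicted secreted proteins and the total number of proteins, in that order; the function name `fraction_purity` is illustrative and not part of the original work.

```python
def fraction_purity(secreted: int, total: int) -> float:
    """Return predicted secreted proteins as a percentage of all identified proteins."""
    if total <= 0:
        raise ValueError("total number of proteins must be positive")
    return 100.0 * secreted / total


# (reference, predicted secreted proteins, total proteins), copied from three table rows.
# The column interpretation is an assumption, as stated in the lead-in.
rows = [
    ("Borderies et al.", 51, 96),
    ("Boudart et al.", 87, 93),
    ("Feiz et al.", 73, 99),
]

for reference, secreted, total in rows:
    print(f"{reference}: {fraction_purity(secreted, total):.1f}%")
```

For these three rows the recomputed values agree with the percentages listed in the table (53.1%, 93.5%, 73.7%).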
Table 2. Number of Arabidopsis proteins annotated as LRXs in various databases and by Baumberger et al. (ref. 13)
| NCBI | 14 | |||||
| TAIR | 14 | |||||
| TIGR | 14 | |||||
| MIPS | 1 | 6 | 3 | 1 | 2 | 1 |
| Baumberger et al. | 11 |
Figure 2. BLAST 2 sequences alignment between the amino acid sequences of At2g19780 and At3g24480. BLAST was done using BLAST 2 sequences (http://blast.ncbi.nlm.nih.gov/bl2seq/wblast2.cgi). Query stands for the amino acid sequence of At2g19780 (402 amino acids). Subject stands for the amino acid sequence of At3g24480 (494 amino acids). Note that there is 45% identity and 63% similarity between the LRR regions. The proline-rich domain of At3g24480 lies outside this alignment, at the C-terminus of the protein.
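As a rough illustration of what a percent identity figure such as the one quoted above measures, the following Python sketch counts identical residues over the columns of a gapped pairwise alignment. The two aligned strings are invented toy sequences, not the At2g19780 or At3g24480 sequences, and `percent_identity` is a hypothetical helper. Similarity scoring, which BLAST derives from a substitution matrix, is not covered here.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Both strings must come from the same pairwise alignment, with '-' marking gaps.
    The denominator is the alignment length, gaps included.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Toy alignment (hypothetical sequences, for illustration only).
query = "MKT-LLA"
subject = "MKSALLA"
print(f"{percent_identity(query, subject):.1f}% identity")
```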
Table 3. Functional domains found using InterProScan and PROSITE in Arabidopsis proteins annotated as LRXs in databases. IPR001611: leucine-rich repeat; PF00560: LRR_1; IPR013210: leucine-rich repeat, N-terminal; PF08263: LRR_NT; PS50099: PRO_RICH proline-rich region profile; IPR003882: pistil-specific extensin-like protein; PR01218: PSTLEXTENSIN; IPR003883: extensin-like protein; PF02095: Extensin_1; PR01217: PRICHEXTENSN.
| At1g12040 | AtLRX1 | 1 | 1 | 1 | 1 | ||
| At1g49490 | AtPEX2 | 1 | 1 | 1 | 1 | ||
| At1g62440 | AtLRX2 | 1 | 1 | 1 | |||
| At2g15880 | AtPEX3 | 1 | 1 | 1 | 1 | ||
| At2g19780 | 1 | 1 | |||||
| At3g19020 | AtPEX1 | 1 | 1 | 1 | 1 | 1 | |
| At3g22800 | AtLRX6 | 1 | 1 | 1 | 1 | 1 | |
| At3g24480 | AtLRX4 | 1 | 1 | 1 | 1 | ||
| At4g06744 | 1 | ||||||
| At4g13340 | AtLRX3 | 1 | 1 | 1 | 1 | 1 | |
| At4g18670 | AtLRX5 | 1 | 1 | 1 | 1 | ||
| At4g29240 | 1 | 1 | |||||
| At4g33970 | AtPEX4 | 1 | 1 | 1 | 1 | ||
| At5g25550 | AtLRX7 | 1 | 1 | 1 | |||
Figure 3. Functional annotation of At1g12090. A) Description of the AAK30571 sequence at the NCBI protein database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=Protein). B) BLAST was done using BLAST 2 sequences (http://blast.ncbi.nlm.nih.gov/bl2seq/wblast2.cgi). Query stands for the amino acid sequence of At1g12090 (137 amino acids). Subject stands for the amino acid sequence of AAK30571 (137 amino acids).
Figure 4. Bioinformatics tools for prediction of protein sub-cellular localization.
Protein sequences are from TAIR8 (http://www.arabidopsis.org/index.jsp).
Aramemnon: http://aramemnon.botanik.uni-koeln.de/
Predotar: http://urgi.versailles.inra.fr/predotar/predotar.html
SignalP: http://www.cbs.dtu.dk/services/SignalP/
TargetP: http://www.cbs.dtu.dk/services/TargetP/
TMHMM: http://www.cbs.dtu.dk/services/TMHMM-2.0/
Figure 5. Bioinformatics tools for prediction of cell wall protein functional domains.
Protein sequences are from TAIR8 (http://www.arabidopsis.org/index.jsp). Links to NCBI RefSeq are provided for each protein (http://www.ncbi.nlm.nih.gov/RefSeq/). Examples of cell wall-related gene families annotated by experts are listed in Table 4.
InterProScan: http://www.ebi.ac.uk/Tools/InterProScan/
PFAM: http://pfam.sanger.ac.uk/search?tab=searchSequenceBlock
PROSITE: http://www.expasy.org/prosite/
Table 4. Cell wall-related protein families of Arabidopsis annotated by experts.
| Schultz et al. | |
| Schultz et al. | |
| Roudier et al. | |
| Baumberger et al. | |
| Fowler et al. | |
| Johnson et al. | |
| Raes et al. | |
| Pourcel et al. | |
| Jacobs and Roe | |
| Nersissian and Shipp | |
| Shiu and Bleecker |
Figure 6. Output of ProtAnnDB after a query with two cell wall arabinogalactan protein AGI codes. The table comprises several columns: on the left, the three mauve columns contain the AGI code of the gene and the RefSeq accession number of the protein, the TAIR annotation, and the curated annotation, respectively; in the center, the four yellow columns contain the sub-cellular localization predictions by TargetP, Predotar, and TMHMM, as well as the prediction of a GPI anchor taken from Aramemnon; on the right, the three blue columns contain the functional domain predictions by PFAM, InterProScan and PROSITE. Clicking a column header displays an explanation of the result. The protein sequences can also be downloaded in FASTA format, and the content of the whole table in a Microsoft Office Excel compatible format.
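The query and download steps described in this caption can be sketched in Python. This is an illustration only, not ProtAnnDB code. It assumes that an AGI code has the form "At", a chromosome identifier (1 to 5, C or M), "g", and five digits, and that a tab-delimited file is an acceptable Excel-compatible format. The helper names, the column list, and the placeholder cell values are all hypothetical.

```python
import csv
import io
import re

# Assumed AGI code pattern, e.g. At1g12040 (case-insensitive).
AGI_PATTERN = re.compile(r"^At[1-5CM]g\d{5}$", re.IGNORECASE)


def is_agi_code(code: str) -> bool:
    """Return True if the string looks like an Arabidopsis AGI code."""
    return bool(AGI_PATTERN.match(code))


def to_fasta(sequences: dict[str, str], width: int = 60) -> str:
    """Render {identifier: sequence} as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for identifier, sequence in sequences.items():
        lines.append(f">{identifier}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def to_tsv(rows: list[dict[str, str]], columns: list[str]) -> str:
    """Render table rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Column names follow the caption; every cell value below is a placeholder, not real output.
columns = [
    "AGI code", "TAIR annotation", "Curated annotation",
    "TargetP", "Predotar", "TMHMM", "GPI anchor",
    "PFAM", "InterProScan", "PROSITE",
]
row = {name: "placeholder" for name in columns}
row["AGI code"] = "At1g12040"

if is_agi_code(row["AGI code"]):
    print(to_tsv([row], columns), end="")
    print(to_fasta({row["AGI code"]: "MPLACEHOLDERSEQ"}), end="")
```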
Figure 7. Output of ProtAnnDB showing detailed results of the predictions of sub-cellular localization and of functional domains for a cell wall arabinogalactan protein. Only the upper part of the web page is shown. A menu gives quick access to the prediction results of each software tool. The first heading collects the protein sequence in FASTA format, the RefSeq accession number, and the curated annotation done by experts. References or web sites are also mentioned.