Marco Mariotti, Alexei V Lobanov, Roderic Guigo, Vadim N Gladyshev.
Abstract
Selenoproteins are proteins containing the uncommon amino acid selenocysteine (Sec). Sec is inserted by a specific translational machinery that recognizes a stem-loop structure, the SECIS element, in the 3' UTR of selenoprotein genes and recodes a UGA codon within the coding sequence. As UGA is normally a translational stop signal, selenoproteins are generally misannotated, and dedicated tools have to be developed for this class of proteins. Here, we present two new computational methods for selenoprotein identification and analysis, which we provide publicly through the web servers at http://gladyshevlab.org/SelenoproteinPredictionServer or http://seblastian.crg.es. SECISearch3 replaces its predecessor, SECISearch, as a tool for prediction of eukaryotic SECIS elements. Seblastian is a new method for selenoprotein gene detection that uses SECISearch3 and then predicts selenoprotein sequences encoded upstream of SECIS elements. Seblastian can both identify known selenoproteins and predict new ones. By applying these tools to diverse eukaryotic genomes, we provide a ranked list of newly predicted selenoproteins together with their annotated cysteine-containing homologues. An analysis of a representative candidate belonging to the AhpC family shows how the use of Sec in this protein evolved in bacterial and eukaryotic lineages.
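The following minimal Python sketch shows why UGA recoding misleads standard annotation. It is not part of SECISearch3 or Seblastian. It translates a toy coding sequence with the standard genetic code twice: once treating UGA as a stop codon, and once reading it as Sec (U). The names `translate` and `CODON_TABLE` and the example sequence are hypothetical. The sketch also simplifies the biology: it recodes every in-frame UGA, whereas real Sec insertion depends on a SECIS element in the 3' UTR.

```python
# Standard genetic code (NCBI translation table 1); codons are enumerated in
# TCAG order for each of the three positions, matching the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, recode_uga: bool = False) -> str:
    """Translate from the first base until the first stop codon.

    With recode_uga=True, every in-frame UGA is read as Sec ('U') instead of a
    stop. This is a simplification: in vivo, recoding requires a SECIS element.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA as well as DNA input
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if recode_uga and codon == "TGA":
            protein.append("U")  # selenocysteine
            continue
        aa = CODON_TABLE.get(codon, "X")  # 'X' for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy_cds = "ATGTGAGCAGCATGTTAA"  # hypothetical toy sequence, not a real gene
print(translate(toy_cds))                   # M
print(translate(toy_cds, recode_uga=True))  # MUAAC
```

Read with the standard code, the toy gene appears to end after its first codon. This kind of truncation is what leads conventional gene predictors to misannotate selenoproteins.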
Year: 2013 PMID: 23783574 PMCID: PMC3753652 DOI: 10.1093/nar/gkt550
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Testing SECIS prediction methods
| Method | TP | FP | Sn (%) | Pr (%) | FP/Mb | F-score(20) | Speed (min/Mb) | TP after filtering | FP after filtering |
|---|---|---|---|---|---|---|---|---|---|
| Covels.5 | 114 | 1 747 455 | 98.3 | 0.007 | 224.54 | 0.026 | 33.51 | 107 | *201 482 |
| Covels.10 | 108 | 188 466 | 93.1 | 0.057 | 24.22 | 0.184 | | 101 | 35 945 |
| Covels.15 | 104 | 16 691 | 89.7 | 0.619 | 2.15 | 0.660 | | 97 | 4152 |
| Infernal.10 | 106 | 166 085 | 91.4 | 0.064 | 21.34 | 0.200 | 6.92 | 105 | 50 814 |
| Infernal.15 | 98 | 9383 | 84.5 | 1.034 | 1.21 | 0.703 | | 97 | 5697 |
| Infernal.20 | 86 | 485 | 74.1 | 15.061 | 0.06 | 0.734 | | 85 | 393 |
| Secisearch.strict | 65 | 20 694 | 56.0 | 0.313 | 2.66 | 0.388 | 0.14 | 60 | 10 557 |
| Secisearch.def | 86 | 110 532 | 74.1 | 0.078 | 14.20 | 0.220 | 0.18 | 76 | 42 719 |
| Secisearch.loose | 79 | 262 710 | 68.1 | 0.030 | 33.76 | 0.102 | 3.18 | 64 | *54 775 |
| Secisearch.looser | 84 | 2 689 478 | 72.4 | 0.003 | 345.59 | 0.012 | 2.62 | 66 | *542 199 |
| Erpin.25 | 70 | 225 801 | 60.3 | 0.031 | 29.01 | 0.103 | 75.37 | | |
| Erpin.35 | 58 | 3754 | 50.0 | 1.522 | 0.48 | 0.463 | | | |
| Erpin.45 | 43 | 48 | 37.1 | 47.253 | 0.01 | 0.371 | | | |
The test set consisted of 116 SECIS elements from nine species (see Supplementary Material S3). For Covels, Infernal and Erpin, various score thresholds were considered (given by the number after the method name); for SECISearch, different patterns were considered. The last two columns show the effect of the SECISearch3 filter (see text). These columns are not given for Erpin, as it is not included in SECISearch3.
Values marked with an asterisk (*) in the FP after filtering column were estimated by running the filter on only a subset of the predictions, to save computation time. TP, number of true positives; FP, number of false positives; Sn, sensitivity (recall); Pr, precision; FP/Mb, average number of false positives per Mb of input sequence; F-score(20), F-score computed with β = 20; Speed, total run time divided by the total input sequence length (∼8 Gb); TP after filtering, true positives passing the SECIS filter; FP after filtering, false positives passing the SECIS filter.
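The following sketch makes the caption's metric definitions concrete. It assumes that Sn is computed against the 116 test SECIS elements. It also approximates the input length as exactly 8,000 Mb, because the caption gives only ∼8 Gb, so FP/Mb values are approximate. F-score(20) is taken to be the standard F_β with β = 20, reported on a 0 to 1 scale. The helper name `evaluate` is hypothetical.

```python
from dataclasses import dataclass

N_TRUE_SECIS = 116  # SECIS elements in the test set (from the caption)
INPUT_MB = 8000.0   # assumption: "~8 Gb" of input taken as exactly 8,000 Mb


@dataclass
class Metrics:
    sn: float         # sensitivity (recall), in %
    pr: float         # precision, in %
    fp_per_mb: float  # false positives per Mb of input sequence
    f_beta: float     # F-score on a 0..1 scale


def evaluate(tp: int, fp: int, beta: float = 20.0,
             n_true: int = N_TRUE_SECIS, input_mb: float = INPUT_MB) -> Metrics:
    """Compute the table's accuracy metrics from TP and FP counts."""
    if tp == 0:
        return Metrics(0.0, 0.0, fp / input_mb, 0.0)
    recall = tp / n_true
    precision = tp / (tp + fp)
    b2 = beta ** 2
    # F_beta = (1 + b^2) P R / (b^2 P + R); beta = 20 weights recall far above precision
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return Metrics(100 * recall, 100 * precision, fp / input_mb, f_beta)


m = evaluate(tp=86, fp=485)  # counts from the Infernal.20 row
print(round(m.sn, 1), round(m.pr, 3), round(m.fp_per_mb, 2), round(m.f_beta, 3))
# 74.1 15.061 0.06 0.734
```

With these inputs the sketch reproduces the Infernal.20 row. Because β is so large, the F-score here is dominated by sensitivity, which suits a first-pass SECIS scan whose false positives are removed later by filtering.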
Figure 1. Workflow of the SECISearch3 program.
Figure 2. Example of an image generated by SECISearch3: the type I SECIS element of human SelN. The core and the unpaired conserved nucleotides of the SECIS element are highlighted in green, and mismatches in red. SECISearch3 uses RNAplot internally.
Figure 3. Workflow of the Seblastian program.
Testing Seblastian
| Species | Selenoproteinsᵃ | Known selenoproteins: Sn (%) | Known selenoproteins: Pr (%) | New selenoproteins: Sn (%) | New selenoproteins: Pr (%) |
|---|---|---|---|---|---|
| | 1 | 100.00 | 100.00 | 0.00 | 0.00 |
| | 3 | 33.33 | 100.00 | 0.00 | 0.00 |
| | 32 | 65.63 | 100.00 | 9.38 | 27.27 |
| | 3 | 33.33 | 100.00 | 66.67 | 66.67 |
| | 25 | 96.00 | 100.00 | 40.00 | 21.28 |
| | 24 | 91.67 | 81.48 | 33.33 | 7.84 |
| | 3 | 66.67 | 100.00 | 33.33 | 100.00 |
| | 1 | 100.00 | 100.00 | 0.00 | 0.00 |
| | 2 | 100.00 | 100.00 | 0.00 | 0.00 |
| Global | 94 | 79.79 | 93.75 | 25.53 | 14.63 |
Testing was performed separately for known and new selenoproteins, as described in the text.
ᵃ To test Seblastian independently of SECISearch3, only selenoproteins whose SECIS elements were correctly predicted by Infernal with a score threshold of 15 were considered here. Thus, the numbers of selenoproteins reported here do not necessarily represent the complete selenoproteome of each species (see Supplementary Material S3 for full sets).
Sn, sensitivity (recall); Pr, precision.
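The Global row for sensitivity is consistent with pooling counts over all species (a micro-average). It is not consistent with averaging the per-species percentages. The sketch below recovers per-species hit counts from the table (n × Sn / 100) and pools them. The helper `pooled_sensitivity` is hypothetical. Precision cannot be checked this way, because the table does not list the number of predictions per species.

```python
# Per-species rows of the table: (selenoproteins, known Sn %, new Sn %).
ROWS = [
    (1, 100.00, 0.00),
    (3, 33.33, 0.00),
    (32, 65.63, 9.38),
    (3, 33.33, 66.67),
    (25, 96.00, 40.00),
    (24, 91.67, 33.33),
    (3, 66.67, 33.33),
    (1, 100.00, 0.00),
    (2, 100.00, 0.00),
]


def pooled_sensitivity(rows: list[tuple[int, float, float]], column: int) -> float:
    """Pool hits over all species (micro-average); column 1 = known, 2 = new."""
    total = sum(row[0] for row in rows)
    # Sn = hits / n, so hits = n * Sn / 100, rounded back to a whole count
    hits = sum(round(row[0] * row[column] / 100) for row in rows)
    return 100 * hits / total


print(round(pooled_sensitivity(ROWS, 1), 2))  # 79.79
print(round(pooled_sensitivity(ROWS, 2), 2))  # 25.53
```

By comparison, an unweighted mean of the per-species known-selenoprotein Sn values gives about 76.3%, not the 79.79% reported in the Global row.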
Figure 4. AhpC selenoproteins. Two selenoprotein candidates from our Seblastian predictions, found in M. brevicollis and E. huxleyi, are framed in orange. They are aligned with other AhpC selenoproteins predicted using Selenoprofiles in eukaryotes (top) and prokaryotes (bottom). Some metazoan cysteine homologues are also shown at the top. The Sec is located in the highlighted redox box UXXC, which is also present in vertebrates, as CXXC. For the full alignment and further details regarding the search for AhpC proteins, see Supplementary Material S6.
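A naive way to locate this redox box in a protein sequence is a regular-expression scan. The sketch below reports every UXXC or CXXC match and labels it as the Sec or Cys form. The helper names and the toy sequence are hypothetical. It is only an illustration: the candidates in the figure were found with Seblastian and Selenoprofiles and compared by multiple alignment, not by motif matching.

```python
import re

# Redox box: Sec (U) or Cys (C), any two residues, then Cys.
# The lookahead makes finditer report overlapping occurrences as well.
REDOX_BOX = re.compile(r"(?=([UC]..C))")


def redox_boxes(protein: str) -> list[tuple[int, str]]:
    """Return (0-based position, motif) for every UXXC or CXXC match."""
    return [(m.start(), m.group(1)) for m in REDOX_BOX.finditer(protein)]


def classify(motif: str) -> str:
    return "Sec form (UXXC)" if motif.startswith("U") else "Cys form (CXXC)"


toy = "MKAGUPFCTLDEFGCPVCAK"  # hypothetical sequence
for pos, motif in redox_boxes(toy):
    print(pos, motif, classify(motif))
# 4 UPFC Sec form (UXXC)
# 14 CPVC Cys form (CXXC)
```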
Figure 5. Two snapshots of the SECISearch3/Seblastian web server. On the left, the input form. On the right, the output page displayed when submitting the human GPx2 sequence.