E Ashrafi, A Alemzadeh, M Ebrahimi, E Ebrahimie, N Dadkhodaei, M Ebrahimi.
Abstract
Phytoremediation refers to the use of plants for extraction and detoxification of pollutants, providing a new and powerful weapon against a polluted environment. In some plants, such as Thlaspi spp., heavy metal ATPases are involved in overall metal ion homeostasis and hyperaccumulation. P1B-ATPases pump a wide range of cations, especially heavy metals, across membranes against their electrochemical gradients. Determining the protein characteristics of P1B-ATPases in hyperaccumulator plants provides a new opportunity for engineering phytoremediating plants. In this study, using diverse weighting and modeling approaches, 2644 protein characteristics of the primary, secondary, and tertiary structures of P1B-ATPases in hyperaccumulator and nonhyperaccumulator plants were extracted and compared to identify differences between hyperaccumulator and nonhyperaccumulator pumps. Although the importance assigned to individual protein characteristics varied across the weighting, tree induction, and rule induction models, glycine count, frequency of glutamine-valine, and valine-phenylalanine count were the most important attributes, highlighted by 10, five, and four models, respectively. In addition, a precise model was built to discriminate P1B-ATPases in different organisms based on their structural protein features. Moreover, reliable models for prediction of the hyperaccumulating activity of unknown P1B-ATPase pumps were developed. Uncovering important structural features of hyperaccumulator pumps in this study has provided the knowledge required for future modification and engineering of these pumps by techniques such as site-directed mutagenesis.
Keywords: ATPase pumps; bioinformatics; environment; heavy metals; modeling; transporter
Year: 2011 PMID: 21573033 PMCID: PMC3091408 DOI: 10.4137/BBI.S6206
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
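The attributes named in the abstract (glycine count, frequency of glutamine-valine, valine-phenylalanine count) are simple functions of the primary sequence. The following is a minimal sketch of how such features might be computed, not the authors' pipeline. The sequence `toy_seq` is hypothetical, and defining "frequency" as the dipeptide count divided by the number of adjacent residue pairs is an assumption, since the record does not give the definition used.

```python
from collections import Counter

# Three-letter to one-letter codes for the residues used in this example.
THREE_TO_ONE = {"Gly": "G", "Gln": "Q", "Val": "V", "Phe": "F"}


def residue_count(seq: str, residue: str) -> int:
    """Number of occurrences of one residue (three-letter name) in seq."""
    return seq.count(THREE_TO_ONE[residue])


def dipeptide_counts(seq: str) -> Counter:
    """Count every overlapping pair of adjacent residues in seq."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def dipeptide_feature(seq: str, first: str, second: str) -> tuple:
    """Return (count, frequency) for the dipeptide first-second.

    Assumption: frequency = count / number of adjacent pairs in seq.
    """
    pair = THREE_TO_ONE[first] + THREE_TO_ONE[second]
    total_pairs = max(len(seq) - 1, 1)  # avoid division by zero
    count = dipeptide_counts(seq)[pair]
    return count, count / total_pairs


# Hypothetical toy sequence, not a real P1B-ATPase.
toy_seq = "MGQVGVFGQVA"
gly_count = residue_count(toy_seq, "Gly")
gln_val = dipeptide_feature(toy_seq, "Gln", "Val")
val_phe = dipeptide_feature(toy_seq, "Val", "Phe")
print(gly_count, gln_val, val_phe)
```

Counting overlapping pairs keeps the count and frequency of a dipeptide tied to the same denominator, which makes sequences of different lengths comparable.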
The most important protein features for discriminating hyperaccumulator pumps from nonhyperaccumulators, as identified by different weighting algorithms (values nearer to 1 indicate an attribute that is more effective in distinguishing hyperaccumulator pumps).
| Gly-Glu count | 1.00 | |
| Ser-Tyr count | 0.91 | |
| Lys-Ser count | 0.88 | |
| Lys count | 0.84 | |
| Frequency of Cys-Glu, Cys count | 0.83 | |
| Frequency of Lys-Ser | 0.80 | |
| Ile-Cys count, frequency of Asp-Cys | 0.79 | |
| Frequency of Asn-Lys | 0.7 | |
| Asp-Cys count, hydrophilic residues, Gly-Asn | 0.75 | |
| Frequency of Phe-Cys, Asn-Lys count | 0.74 | |
| Ser count | 0.73 | |
| Asp-Ser count | 0.71 | |
| Frequency of Cys | 0.70 | |
| Glu-Asn count | 1.00 | |
| Frequency of Glu-Asn | 0.92 | |
| Ser-Ala count | 0.88 | |
| Gly-Asp count | 0.86 | |
| Gly-Pro count | 0.85 | |
| Frequency of Ser-Ala | 0.83 | |
| Gln-Val count | 0.79 | |
| Leu-Gln count | 0.78 | |
| Frequency of Gly-Pro | 0.74 | |
| Frequency of Ser-Ser; Ser-Asn Gln-Ile and Ser-Asn counts | 0.73 | |
| Glu-Lys count; frequency of Gln-Val | 0.72 | |
| Frequency of Ser-Cys; Val-Phe, Arg-Leu, Asp-Pro counts | 0.71 | |
| Frequency of Val-Ser, Asp-Pro | 0.7 | |
| Frequency of Phe-His | 1.00 | |
| Phe-His count | 0.83 | |
| Cys-Met count, frequency of Cys-Met | 0.73 | |
| Gly count | 1.00 | |
| Gly count | 1.00 | |
| Val-Phe count | 0.7 | |
| Gly count | 1.00 | |
| Trp-Asn count | 0.88 | |
| Trp-Tyr count | 0.78 | |
| Reduced extinction coefficient at 280 nm | 1.00 | |
| Gly count | 1.00 | |
| Val-Phe count | 0.92 | |
| Frequency of Val-Phe | 0.88 | |
| Frequency of Gln-Val | 0.57 | |
| Val-Phe count | 1.00 | |
| Frequency of Val-Phe | 0.98 | |
| Frequency of Lys-Ser | 0.96 | |
| Lys-Ser count | 0.96 | |
| Met-Lys count | 0.92 | |
| Frequency of Leu-Gln; Gly-Asn count | 0.88 | |
| Thr-Ser, Gly, Val-Glu, Asp-Pro, Gln-Ile, Gly-Pro, Phe-His, Asp-Phe, Arg-Gly, Arg-Leu, Pro-Thr counts; frequency of Gly-Arg, Asp-Phe, Arg-Leu, Val-Glu | 0.83 | |
| Ser-Cys, Ala-Leu, Gly-Trp, Lys-Pro, Phe-Ala, Tyr-Pro, Ala-His, Pro-Arg counts; frequency of Glu-Asp, Tyr-Pro | 0.77 | |
| Pro-Ile count | 0.73 | |
| Gly-Leu, Sulfur, Cys-Cys, Glu, Phe-Glu, Met-Thr, Tyr-His, Cys-Gly, Asp-Thr, Pro-Ser, Arg-Pro, Gln-Cys counts; negatively charged residues, Leu-His, Cys-Pro, Ser-Thr; frequency of His-Glu, Asp, Thr-Ala; negatively charged residues, Pro, Cys-Cys, Trp-Lys, Asp-Thr, Gln-Cys, Trp, Leu-Trp, Pro-Thr | 0.71 |
Figure 1. Tree induced by decision tree algorithm on discretized data with gain ratio criterion.
Abbreviations: H, hyperaccumulator; T, tolerant.
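The gain ratio criterion named in the Figure 1 caption scores a candidate split by its information gain divided by the entropy of the split itself. The sketch below illustrates that standard definition on hypothetical discretized values and H/T labels; it is not taken from the tool used in the study.

```python
import math
from collections import Counter


def entropy(items) -> float:
    """Shannon entropy (bits) of the value distribution in items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels) -> float:
    """Gain ratio of splitting labels by a discretized attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the labels that remains after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical data: a discretized attribute and pump type (H or T).
labels = ["H", "H", "T", "T"]
perfect = gain_ratio(["high", "high", "low", "low"], labels)
useless = gain_ratio(["high", "low", "high", "low"], labels)
print(perfect, useless)
```

An attribute that separates the two classes cleanly scores 1, while one that leaves each branch evenly mixed scores 0.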
Mean ± standard error of the mean for performances of rule induction and tree induction models.
| Discretized | 66.62 | 10.24 | 69.24 | 9.35 | 82.78 | 13.65 | 84.0 | 12.4 | 62.9 | 12.1 | 49.2 | 21.9 | |||
| Numerical | 70.67 | 8.27 | 77.18 | 9.24 | 74.94 | 7.12 | 83.1 | 0.08 | 68.1 | 11.7 | 59.3 | 11.6 | |||
| Decision tree | Gain ratio | 72.57 | 6.25 | 86.34 | 12.23 | 69.11 | 16.48 | 0.847 | 0.044 | 0.745 | 0.058 | 0.685 | 0.069 | ||
| Information gain | 75.24 | 10.66 | 78.55 | 11.21 | 81.47 | 13.61 | 0.870 | 0.105 | 0.679 | 0.165 | 0.649 | 0.180 | |||
| Gini index | 69.57 | 10.32 | 72.93 | 13.36 | 87.22 | 14.24 | 0.787 | 0.216 | 0.628 | 0.168 | 0.479 | 0.166 | |||
| Accuracy | 56.86 | 12.11 | 70.95 | 17.63 | 51.00 | 16.76 | 0.774 | 0.110 | 0.677 | 0.129 | 0.617 | 0.137 | |||
| ID3 | Gain ratio | 74.00 | 9.73 | 84.62 | 10.67 | 71.44 | 10.59 | 0.889 | 0.109 | 0.654 | 0.181 | 0.569 | 0.212 | ||
| Information gain | 80.86 | 8.94 | 89.74 | 10.23 | 78.28 | 12.83 | 0.936 | 0.092 | 0.716 | 0.179 | 0.718 | 0.139 | |||
| Gini index | 74.00 | 17.05 | 84.94 | 15.07 | 70.14 | 19.04 | 0.769 | 0.119 | 0.633 | 0.168 | 0.519 | 0.222 | |||
| Accuracy | 65.10 | 8.90 | 84.57 | 11.24 | 53.50 | 14.22 | 0.808 | 0.082 | 0.612 | 0.153 | 0.526 | 0.146 | |||
| Decision tree | Gain ratio | 80.10 | 10.34 | 91.89 | 10.81 | 75.53 | 17.07 | 0.960 | 0.067 | 0.524 | 0.065 | 0.658 | 0.166 | ||
| Information gain | 81.38 | 8.93 | 86.84 | 9.56 | 82.47 | 12.39 | 0.917 | 0.073 | 0.750 | 0.144 | 0.693 | 0.152 | |||
| Gini index | 73.05 | 8.60 | 78.58 | 11.65 | 78.28 | 6.79 | 0.861 | 0.079 | 0.705 | 0.140 | 0.641 | 0.124 | |||
| Accuracy | 74.62 | 9.21 | 81.31 | 5.66 | 74.97 | 15.92 | 0.863 | 0.042 | 0.771 | 0.075 | 0.711 | 0.088 | |||
| ID3 | Gain ratio | 80.14 | 10.08 | 86.17 | 10.08 | 80.67 | 14.96 | 0.950 | 0.066 | 0.507 | 0.022 | 0.638 | 0.161 | ||
| Information gain | 80.10 | 8.80 | 88.38 | 8.80 | 77.17 | 13.30 | 0.930 | 0.103 | 0.706 | 0.183 | 0.705 | 0.151 | |||
| Gini index | 82.24 | 7.52 | 90.32 | 7.49 | 79.56 | 12.93 | 0.917 | 0.116 | 0.684 | 0.159 | 0.703 | 0.122 | |||
| Accuracy | 80.29 | 9.94 | 90.97 | 8.02 | 75.22 | 16.14 | 0.931 | 0.075 | 0.843 | 0.085 | 0.761 | 0.113 | |||
Abbreviations: SE, standard error of the mean; AUC, area under curve.
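Each performance value in the table above is reported as a mean with its standard error (SE). As an illustration of how such a pair might be derived from repeated validation runs, here is a minimal sketch; the fold accuracies are hypothetical, and the number of folds is an assumption because the record does not state it.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of scores.

    SEM = sample standard deviation / sqrt(n).
    """
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical per-fold accuracies (percent), not data from the study.
fold_accuracy = [70.0, 80.0, 90.0, 80.0]
acc_mean, acc_sem = mean_and_sem(fold_accuracy)
print(f"{acc_mean:.2f} ± {acc_sem:.2f}")
```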
Rule sets (with support >50%) induced by FP-growth itemset mining on discretized data. Pt, cation transport ATPase (P-type) family; "group was animal", Cu transporter; value bins: low, 0–0.35; mid, 0.35–0.5; high, >0.5.
| 0.856 | Pro-Cys count was mid | ||
| 0.842 | Protein family was Pt | ||
| 0.801 | Frequency of Pro-Cys was mid | ||
| 0.801 | Pro-Cys count was mid | Frequency of Pro-Cys was mid | |
| 0.705 | Pro-Cys count was mid | Protein family was Pt | |
| 0.685 | Frequency of Gly-Ile was mid | ||
| 0.664 | Group was animal | ||
| 0.664 | Protein family was Pt | Frequency of Pro-Cys was mid | |
| 0.664 | Pro-Cys count was mid | Protein family was Pt | Frequency of Pro-Cys was mid |
| 0.630 | Leu-Val count was mid | ||
| 0.623 | Val-Leu count was high | ||
| 0.616 | Frequency of Gly-Thr was mid | ||
| 0.589 | Pro-Cys count was mid | Frequency of Gly-Ile was mid | |
| 0.589 | Protein family was Pt | Group was animal | |
| 0.582 | Frequency of Thr-Gly was high | ||
| 0.582 | Frequency of Leu-Ile was mid | ||
| 0.582 | Gly-Ile count was high | ||
| 0.575 | Frequency of Ile-Val was high | ||
| 0.575 | Frequency of Lys-Arg was mid | ||
| 0.575 | Frequency of His-Pro was mid | ||
| 0.575 | Frequency of Pro-Cys was mid | Frequency of Gly-Ile was mid | |
| 0.575 | Pro-Cys count was mid | Frequency of Pro-Cys was mid | Frequency of Gly-Ile was mid |
| 0.568 | Frequency of Leu-Val was mid | ||
| 0.568 | Leu-Ile count was mid | ||
| 0.568 | Pro-Cys count was mid | Group was animal | |
| 0.568 | Protein family was Pt | Frequency of Gly-Ile was mid | |
| 0.562 | Thr-Leu count was mid | ||
| 0.562 | Pro-Cys count was mid | Leu-Val was mid group was A | |
| 0.562 | Frequency of Pro-Cys was mid | ||
| 0.562 | Pro-Cys count was mid | Frequency of Pro-Cys was mid | Group was animal |
| 0.555 | Thr-Val count was high | ||
| 0.555 | Thr-Gly count was high | ||
| 0.555 | Lys-Arg count was mid | ||
| 0.548 | Frequency of Val-Leu was high | ||
| 0.548 | Frequency of Thr-Arg was mid | ||
| 0.548 | Pro-Cys count was mid | Frequency of Gly-Thr was mid | |
| 0.541 | Frequency of Ile-Gly was mid | ||
| 0.541 | Ile-Pro count was mid | ||
| 0.541 | Leu-Met count was mid | ||
| 0.541 | Protein family was Pt | Val-Leu count was high | |
| 0.534 | Frequency of Val-Val was mid | ||
| 0.534 | Frequency of Gly-Leu was mid | ||
| 0.534 | Ile-Val count was high | ||
| 0.527 | Frequency of Thr-Val was mid | ||
| 0.527 | Frequency of Leu-Ala was mid | ||
| 0.527 | His-Pro count was mid | ||
| 0.527 | Pro-Cys count was mid | Val-Leu count was high | |
| 0.527 | Pro-Cys count was mid | Frequency of Ile-Val was high | |
| 0.521 | Ile-Gly count was mid | ||
| 0.521 | Phe-Gly count was mid | ||
| 0.521 | Pro-Cys count was mid | Frequency of His-Pro was mid | |
| 0.521 | Protein family was Pt | Frequency of His-Pro was mid | |
| 0.521 | Frequency of Pro-Cys was mid | Leu-Val count was mid | |
| 0.521 | Pro-Cys count was mid | Frequency of Pro-Cys was mid | Leu-Val count was mid |
| 0.514 | Frequency of Pro-Val was mid | ||
| 0.514 | Ala-Gln count was mid | ||
| 0.514 | Pro-Cys count was mid | Frequency of Thr-Gly was high | |
| 0.514 | Pro-Cys count was mid | Gly-Ile count was high | |
| 0.514 | Frequency of Pro-Cys was mid | Val-Leu count was high | |
| 0.514 | Pro-Cys count was mid | Frequency of Pro-Cys was mid | Val-Leu count was high |
| 0.507 | Frequency of Val-Ser was mid | ||
| 0.507 | Frequency of Thr-Cys was mid | ||
| 0.507 | Frequency of Phe-Gly was mid | ||
| 0.507 | Val-Gly count was mid | ||
| 0.507 | Val-Glu count was mid | ||
| 0.507 | Thr-Ala count was high | ||
| 0.507 | Leu-Gly count was mid | ||
| 0.507 | Leu-Ala count was high | ||
| 0.507 | Ala-Thr count was mid | ||
| 0.507 | Pro-Cys count was mid | Frequency of Leu-Ile was mid | |
| 0.507 | Protein family was Pt | Leu-Val count was mid | |
| 0.507 | Protein family was Pt | Frequency of Ile-Val was high | |
| 0.507 | Frequency of Pro-Cys was mid | Frequency of Gly-Thr was mid | |
| 0.507 | Frequency of Pro-Cys was mid | Frequency of Ile-Val was high | |
| 0.507 | Frequency of Ile-Val was high | Ile-Val count was high | |
| 0.507 | Frequency of Lys-Arg was mid | Lys-Arg count was mid |
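The first column of the rule table is the support of each itemset, that is, the fraction of sequences in which all listed conditions hold. The sketch below shows that calculation on hypothetical data. It uses brute-force enumeration rather than FP-growth, which finds the same frequent itemsets more efficiently. The bin edges come from the table caption, but assigning the boundary values 0.35 and 0.5 to the lower bin is an assumption.

```python
from itertools import combinations


def discretize(x: float) -> str:
    """Map a value to the low/mid/high bins given in the caption."""
    if x <= 0.35:  # assumption: boundary values fall in the lower bin
        return "low"
    if x <= 0.5:
        return "mid"
    return "high"


def support(itemset, transactions) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support=0.5, max_size=3):
    """Brute-force search for itemsets with support above min_support."""
    all_items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(all_items, size):
            s = support(combo, transactions)
            if s > min_support:
                found[combo] = s
    return found


# Hypothetical feature values, not data from the study.
rows = [
    {"Pro-Cys count": 0.40, "Val-Leu count": 0.70},
    {"Pro-Cys count": 0.45, "Val-Leu count": 0.60},
    {"Pro-Cys count": 0.20, "Val-Leu count": 0.80},
    {"Pro-Cys count": 0.42, "Val-Leu count": 0.30},
]
transactions = [
    {f"{name} was {discretize(value)}" for name, value in row.items()}
    for row in rows
]
frequent = frequent_itemsets(transactions)
for itemset, s in frequent.items():
    print(f"{s:.3f}", " | ".join(itemset))
```

In this toy data the pair of conditions holds in exactly half of the rows, so it is excluded by the strict >50% threshold even though each condition alone passes it.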
Accession, metals, type of pump, and organism of each amino acid sequence of P1B-ATPase.
| Accession | Metals | Type of pump | Organism |
| Q70Q04 | Zn/Cd | H | |
| Q9UVL6 | Cu | H | |
| Q9P983 | Cd | H | |
| Q9P458 | Cu | H | |
| Q96WX2 | Cu | H | |
| Q941L1 | Cu | H | |
| Q92T56 | Zn/Cd/Pb | H | |
| Q8ZS90 | Cu/Ag | H | |
| Q8H028 | Cu | H | |
| Q88CP1 | Cd | H | |
| Q7XU05 | Cu | H | |
| Q70LF4 | Zn/Cd | H | |
| Q6ZDR8 | Cu | H | |
| Q6JAG2 | Cu | H | |
| Q6H7M3 | Cu | H | |
| Q6H6Z1 | Cu | H | |
| Q69AX6 | Zn/Cd/Co | H | |
| Q655X4 | Cu | H | |
| Q5AQ24 | Cu | H | |
| Q5API0 | Cu | H | |
| Q59465 | Zn/Cd/Co | H | |
| Q59385 | Cu | H | |
| Q4WQF3 | Cu | H | |
| Q3ZDL9 | Zn/Cd | H | |
| Q2I7E8 | Cd | H | |
| Q10QZ3 | Cu | H | |
| Q10QZ2 | Cu | H | |
| Q0JB51 | Cu | H | |
| Q0E3J1 | Cu | H | |
| Q0DAA4 | Cu | H | |
| P38360 | Cd | H | |
| B8BBV4 | Cu/Ag | H | |
| B8B185 | Cu/Ag | H | |
| B8APM8 | Cu/Ag | H | |
| B8AIJ3 | Cu/Ag | H | |
| B8ADR7 | Cu/Ag | H | |
| B6HT11 | Cu | H | |
| B6HC49 | Cu | H | |
| B6H689 | Cu | H | |
| B6H165 | Cu | H | |
| B6GWG5 | Cu | H | |
| B5VEN9 | Cd | H | |
| B3LML9 | Cd | H | |
| B2Y4P1 | Zn/Cd | H | |
| B2Y4N2 | Zn/Cd | H | |
| B2Y4N1 | Zn/Cd | H | |
| B2APT4 | Cu | H | |
| B2AAH3 | Cu | H | |
| B0Y4L9 | Cu | H | |
| B0XWU3 | Cu | H | |
| A6ZLN2 | Cd | H | |
| A5DRE2 | Cu | H | |
| A5DHC6 | Cu | H | |
| A3BU99 | Cu | H | |
| A3BEE3 | Cu | H | |
| A3AWA4 | Cu | H | |
| A1CL19 | Cu | H | |
| A1CII4 | Cu | H | |
| Q60048 | Cd | S | |
| Q31HQ5 | Cu/Ag | S | |
| Q31H35 | Cu2+/Cu/mg | S | |
| Q31E73 | Cu/Ag | S | |
| Q31DS4 | Cu/Ag | S | |
| B5AXL4 | Cu | S | |
| Q9ZHC7 | Cu | T | |
| Q9SZW4 | Zn/Cd | T | |
| Q9SH30 | Cu | T | |
| Q9S7J8 | Cu | T | |
| Q9JZI0 | Cu | T | |
| Q9I147 | Zn/Cd/Pb | T | |
| Q9C594 | Cu | T | |
| Q94KD6 | Cu | T | |
| Q8ZRG7 | Cu/Ag | T | |
| Q8VPE6 | Cu2+/Cu/Ag | T | |
| Q8RVG7 | Cd | T | |
| Q8LPW1 | Zn/Cd | T | |
| Q8L158 | Zn/Cd | T | |
| Q8H384 | Zn/Cd | T | |
| Q88RT8 | Co | T | |
| Q830Z1 | Co | T | |
| Q7Y051 | Cu | T | |
| Q7SGS2 | Cu | T | |
| Q7S316 | Zn/Cd/Pb | T | |
| Q7RZE4 | Cu | T | |
| Q7A3E6 | Cu | T | |
| Q75C31 | Cu | T | |
| Q750J2 | Cu | T | |
| Q72N56 | Cu/Ag | T | |
| Q6MK07 | Cu/Ag | T | |
| Q6JAH7 | Cu | T | |
| Q6JAG3 | Cu | T | |
| Q6CS43 | Cu | T | |
| Q6CKX1 | Cu | T | |
| Q6BVG6 | Cu | T | |
| Q6BIS6 | Cu | T | |
| Q654Y9 | Co | T | |
| Q5K722 | Cu | T | |
| Q58AE3 | Cu/Ag/Zn/Cd/Pb | T | |
| Q4WYE4 | Cu | T | |
| Q4PI36 | Cu | T | |
| Q4PFU4 | Cu | T | |
| Q3MNJ6 | Cu/Ag | T | |
| Q3E9R8 | Cu | T | |
| Q12685 | Cu | T | |
| Q0WUP4 | Cd | T | |
| Q0WPL5 | Cu | T | |
| Q0D7L9 | Zn/Cd/Pb | T | |
| P37617 | Zn/Cd/Pb/Au | T | |
| P32113 | Cu | T | |
| P20021 | Cd | T | |
| P0A503 | Zn/Cd/Pb | T | |
| P05425 | Cu2+/Cu/Ag | T | |
| O67432 | Cu/Ag | T | |
| O67203 | Cu2+ | T | |
| O64474 | Zn/Cd | T | |
| O32220 | Cu | T | |
| O32219 | Zn/Cd/Co | T | |
| O31688 | Co | T | |
| B9WHL7 | Cu | T | |
| B9W8U7 | Cu | T | |
| B8PIS7 | Cu | T | |
| B8PD13 | Cu | T | |
| B8B248 | Zn/Cd/Pb | T | |
| B8B1T9 | Co | T | |
| B6TVS8 | Cu | T | |
| B6K2D1 | Cu | T | |
| B5AXM3 | Cu | T | |
| B5AXJ3 | Cu | T | |
| B5AXJ0 | Cu | T | |
| B5AXI8 | Cu | T | |
| B5AXI7 | Cu | T | |
| B5AXI6 | Cu | T | |
| B4FW89 | Co | T | |
| B3LG21 | Cu | T | |
| A9NIX0 | Zn/Cd/Pb | T | |
| A8FHF8 | Cu/Ag | T | |
| A8FHE7 | Zn/Cd/Co | T | |
| A8FCJ1 | Co | T | |
| A7ISW5 | Cu | T | |
| A6ZYM2 | Cu | T | |
| A5E2U1 | Cu | T | |
| A5E1L1 | Cu | T | |
| A3LVL5 | Cu | T | |
| A3LRS8 | Cu | T | |
| A3GG72 | Cu | T | |
| A3BI12 | Zn/Cd/Pb | T | |
| A3BF39 | Zn/Cd/Pb | T | |
| A2YJN9 | Zn/Cd/Pb | T | |
| A2YED2 | Zn/Cd/Pb | T | |
| A1D6E8 | Cu | T | |
| A1CW79 | Cu | T | |
| Q8J286 | Cu | ||
| Q0WXV8 | Cu | ||
| Q0SAU6 | Cu/Ag | ||
| B8PCW0 | Zn/Cd/Pb | ||
| B2WP89 | Cu | ||
| B2WCY5 | Cu | ||
| B2W577 | Cu | ||
| B0STR2 | Cu | ||
| A7TLU7 | Cu | ||
| A7JVC8 | Cu | ||
| A6SEF3 | Cu | ||
| A6SAI2 | Cu | ||
| A6RXG0 | Cu | ||
| A6RAT8 | Cu | ||
| A6R8J5 | Cu | ||
| A4RDM4 | Cu | ||
| A4QR04 | Cu |
Abbreviations: H, hyperaccumulator; T, tolerant; S, sensitive.
Standard amino acid abbreviations.
| Amino acid | Three-letter code | One-letter code |
| Alanine | Ala | A |
| Arginine | Arg | R |
| Asparagine | Asn | N |
| Aspartic acid | Asp | D |
| Cysteine | Cys | C |
| Glutamic acid | Glu | E |
| Glutamine | Gln | Q |
| Glycine | Gly | G |
| Histidine | His | H |
| Isoleucine | Ile | I |
| Leucine | Leu | L |
| Lysine | Lys | K |
| Methionine | Met | M |
| Phenylalanine | Phe | F |
| Proline | Pro | P |
| Serine | Ser | S |
| Threonine | Thr | T |
| Tryptophan | Trp | W |
| Tyrosine | Tyr | Y |
| Valine | Val | V |