
Association of putative members to family of mosquito odorant binding proteins: scoring scheme using fuzzy functional templates and cys residue positions.

Malini Manoharan, Kannan Sankar, Bernard Offmann, Sowdhamini Ramanathan.

Abstract

Proteins may be related to each other very specifically as homologous subfamilies. Proteins can also be related to diverse proteins at the super family level. It has become highly important to characterize the existing sequence databases by their signatures to facilitate the function annotation of newly added sequences. The algorithm described here uses a scheme for the classification of odorant binding proteins on the basis of functional residues and Cys-pairing. The cysteine-based scoring scheme not only helps in unambiguously identifying families like odorant binding proteins (OBPs), but also aids in their classification at the subfamily level with reliable accuracy. The algorithm was also applied to yet another cysteine-rich family, where similar accuracy was observed that ensures the application of the protocol to other families.

Keywords:  Classification of proteins; Functionally important residues; Ligand binding residues; cysteine-based scoring scheme

Year:  2013        PMID: 23908587      PMCID: PMC3728099          DOI: 10.4137/BBI.S11096

Source DB:  PubMed          Journal:  Bioinform Biol Insights        ISSN: 1177-9322


Introduction

Olfaction is the major means of host identification among mosquitoes. The molecular basis of this chemical signal recognition is systematically encoded by a series of proteins. Odorant binding proteins are thought to be the primary proteins involved in the transport of odorants and pheromones to the olfactory receptors.1,2 Members of this protein family have been identified in a number of insect species, including four dipteran species: Drosophila melanogaster,3,4 Anopheles gambiae,4,5 Aedes aegypti6 and Culex quinquefasciatus.7 Since their identification, this family of proteins has received considerable attention in biology, as its members could act as important target proteins. However, the sequence divergence of this family is very high relative to the conservation of their function, which is to bind a wide range of odorant molecules. For this reason, it has been difficult to classify these proteins into different subfamilies. Three major subfamilies, Classic, PlusC and Atypical, have previously been defined in this family on the basis of their cysteine conservation patterns. In general, biological sequence data are accumulating rapidly as a result of advanced sequencing technology and concerted genome projects, at a greater rate than growth in computing efficiency.8 The probability that a new protein can be classified as part of a sequence family is already near 30%.9 Encouragingly, evolutionary constraints on protein sequences are imposed by the requirements of 3-dimensional structure and biological function, which are the main aspects employed for the classification of proteins. Generally, functional requirements are more pronounced in terms of residue conservation, where the occurrence of completely conserved residues indicates a specific biological function.
Many examples of such occurrences have been reported in protein sequences; 2 examples are the Ser-His-Asp triad of serine proteases10 and the zinc finger motif of deoxyribonucleic acid (DNA)-binding proteins.11 Mutation of such residues generally renders the protein inactive. Such residues can be either spread across the entire stretch of the protein or can be observed as conserved contiguous patterns termed “functional motifs”. Such conservation status has been employed in annotating protein sequences by different methods reviewed by Ouzounis et al.12 Though many of these methods fare well at assigning an unknown protein at a family level, the accuracy fails when a classification is required at a subfamily level. Several such function prediction algorithms require the availability of structural information, namely spatial interactions of residues of query sequences, in order to recognize preservation of geometry of functional residues. These include methods like Conserved functional group (CFG).13,14 Whereas such methods could be quite applicable for proteins of unknown function, determined by structural genomics initiatives, structural information is either not available for most query sequences or the quality of models, derived by homology, could be limited. Residues near the active site might play an auxiliary role and are less easy to identify as part of “functional motifs”. Sequence conservation of functional residues is therefore less obvious for residues that modulate the specificity of biological function. These residues change as a protein evolves to satisfy modified functional constraints, while the basic biochemical mechanism and the overall three-dimensional fold remain unaltered. In such cases, representative residues, associated with structural aspects of a protein, serve as better classifiers. Cysteine, as a sulphur containing non-essential biogenic amino acid, plays critical roles in a number of metabolic processes. 
It is found in a number of biologically important proteins, where it plays roles ranging from folding to maintaining the integrity of structure and function. One of the most important roles of cysteines is the formation of disulphide bridges involved in the folding of proteins into 3-dimensional structures. Disulphide bonds, which are formed by cysteines that may be sequentially apart but spatially proximate,15 define the rigidity of large globular proteins. These disulphide bonds are generally conserved among related proteins16–18 and the connectivity patterns can be used to identify proteins of similar 3-D structure.19 The conservation of the disulphide bond connectivity pattern enables the identification of remote homologues even when most popular sequence search methods fail to do so. Such approaches, however, are complicated by observations of topologically equivalent disulphide bonds in non-homologues and also by non-equivalent numbers of disulphide bonds in close homologues.20 Because the formation of a disulphide connectivity pattern in a protein is a directed (ie, non-random) process,21 this property can be used to obtain a structural classification of proteins. A large variety of connectivity patterns is found in disulphide-containing proteins.21,22 In proteins with low sequence similarity, identical connectivity patterns can indicate high structural homology. Proteins that share a disulphide bonding pattern usually belong to the same structural family.
Therefore, disulphide connectivity patterns provide a rapid and simple method for the structural characterization of protein sequences and for examining structural properties, such as protein topologies,21 entropic effects of cross-linkage,22 structural superimposition of proteins by means of their disulphide bridge topology20 and taxonomy of small disulphide-rich protein folds.22 In addition, methods that classify proteins based on their connectivity patterns have also been established.23 A systematic method for the classification of disulphide-rich proteins based on cysteine conservation is thus worth undertaking. Previous attempts at cysteine-based classification of proteins included approaches based on cysteine pairing,23 identification of odorant binding proteins based on cysteine motifs,4 conotoxin superfamily classification using pseudo amino acid composition and multi-class support vector machines,24 and classification of peroxiredoxins using regular expressions.25 An algorithm has been devised that can efficiently classify a new protein as an odorant binding protein belonging to a particular class by capturing specific information in terms of (1) functional residue conservation and (2) cysteine conservation and disulphide connectivity. The functional residue-based scoring scheme relies on the conservation of residues at functionally important sites (which requires only sequence information) and on a flexible distance-based scheme (which also requires structural data). The functionally important sites were determined by mapping ligand binding residues onto the structural alignment of the available structural members. The test sequences were aligned to the structural alignment and scores were assigned based on the residue conservation at these functional sites. The scoring of the distance-based scheme was based on a distance criterion between the residues at these positions.
The distance criteria were established by observing the distances between the residues in the functional sites, including the ‘fuzziness’; ie, the variation in distances observed among the crystal structures. The scores were calculated by a fit criterion after examining the distances within the models of unknown sequences. In our approach, for queries whose structure is not yet available and for which homology modeling is unreliable because the relationship to known structures is too distant, a simple amino acid conservation-based scoring scheme is adopted that objectively measures the extent of conservation of functionally important residues (please see ‘Scoring of query sequences’ within the Methods section for details). Distances between such residues are not required or employed in this novel option. For the cysteine-based scheme, a “disulphide profile” of aligned sequences19 has been employed for each of the various classes. The query sequences are aligned with these disulphide profiles, scored based on the conservation of the cysteines in the query and then classified using a composite classification scheme. These classification methods were primarily developed for the classification of odorant binding proteins in the mosquito genome. However, the functional residue-based classification was further extended to the serine protease family, where the classification of query sequences into 3 subfamilies using the method has been described. The cysteine-based classification was also implemented on the conotoxin family of proteins to extend the use of this method to the classification of disulphide-rich protein families at the subfamily level.

Methodology

Datasets

Seven structural entries of odorant-binding proteins (OBPs; PDB ID: 1dqe, 2wcj, 2gte, 2erb, 3k1e, 3bfh, 1ow4), those available at the time, were used for the construction of the structural alignment. The dataset used in this analysis comprises 116 conotoxin sequences24 and 284 odorant binding proteins from mosquito genomes.26 The conotoxins are classified into 7 classes. The odorant binding proteins are classified into 3 major classes, namely Classic, PlusC and Atypical; the Atypical OBPs are further divided into 4 subtypes (MAtype1-4). Representative sequences were chosen from the different classes for the construction of the training profile and the other sequences were used in the test set (Table 1).
Table 1

Datasets used as training and test sets to build and assess scoring schemes for the identification of OBPs. (A) The OBP family dataset representing number of representative sequences used in constructing the profile (training dataset) and test set in the different classes respectively. (B) The conotoxin family dataset representing number of representative sequences used in constructing the profile (training dataset) and test set in the different classes respectively.

Protein subfamily | Training dataset | Test dataset
(A)
Classic | 18 | 104
Plus C | 9 | 49
Minus C | 18 (Classic OBPs) | 17
Atypical 1 | 6 | 0
Atypical 2 | 6 | 26
Atypical 3 | 6 | 4
Atypical 4 | 6 | 33
(B)
Class A | 6 | 19
Class M | 6 | 7
Class O | 6 | 55
Class T | 6 | 11

Construction of profiles

A structural alignment constructed using COMPARER27 was used as a profile for the functional residue-based scoring scheme (Fig. 1). For the cysteine-based scoring scheme, representative sequences from each class, which have conserved cysteines at all the positions under consideration, were aligned separately using ClustalW.28 This alignment of representative sequences was used as a training profile for the classification of query sequences. The number of sequences in the training profile and the number of cysteine positions under consideration vary for the different classes of the protein. Thus, a number of training profiles equal to the number of classes was generated.
Figure 1

Alignment of available structures of odorant binding proteins using COMPARER.

Notes: The conserved cysteines are colored in blue and functional residues are colored in red. The 12 positions used as functional sites for the scoring scheme are labeled 1–12 above the alignment. The functional residues are shown on one example structure: 2erb.

Construction of fuzzy functional template

For the functional residue-based scoring scheme, a fuzzy functional template was constructed. Ligand binding residues, for each of the ligand-bound forms of each of the structural entries of OBPs mentioned above, were identified using LIGPLOT. These residues were mapped onto the structural alignment (Fig. 1). Twelve residue positions were considered functionally important, as marked in Figure 1. Cβ–Cβ distances between residues at these positions were calculated for each of the structural entries and averaged. The upper and lower limits for the distances were set to ±2 standard deviations (SD) from the average distance and represented in the form of a matrix (Fig. 2). This logic of inscribing distance variation amongst functionally important residues is the same as that adopted by Skolnick’s group in an earlier study.29
Figure 2

Fuzzy functional template investigated to score the dissimilarity between OBPs.

Notes: The matrix represents the distance criteria threshold between the 12 functional sites averaged over data from the available structural members. The distances between pairs which have an SD < 2 are colored yellow.
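
The construction of the template can be illustrated with a minimal sketch. It assumes that the Cβ coordinates of the functional sites have already been extracted from each aligned structure; the function name build_fft, the input layout and the use of the sample standard deviation are assumptions made for illustration and are not taken from the original study.

```python
from itertools import combinations
from math import dist
from statistics import mean, stdev


def build_fft(structures, n_sd=2.0, sd_cutoff=2.0):
    """Build a fuzzy functional template from C-beta coordinates.

    structures: list of dicts, one per aligned structure, each mapping a
    functional site label to an (x, y, z) C-beta coordinate.
    At least two structures are needed to estimate a standard deviation.
    """
    # Only sites present in every structure can be compared.
    sites = sorted(set.intersection(*(set(s) for s in structures)))
    template = {}
    for a, b in combinations(sites, 2):
        distances = [dist(s[a], s[b]) for s in structures]
        mu, sd = mean(distances), stdev(distances)  # sample SD (assumption)
        template[(a, b)] = {
            "mean": mu,
            "sd": sd,
            "lower": mu - n_sd * sd,  # lower distance limit
            "upper": mu + n_sd * sd,  # upper distance limit
            "use": sd < sd_cutoff,    # pairs with SD < 2 are kept for scoring
        }
    return template
```

Each entry of the returned dictionary corresponds to one cell of the matrix in Figure 2, and the "use" flag marks the low-variance pairs highlighted there.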

For the serine protease family, 11 structural entries from the thrombin subfamily (1ai8, 1avg, 1hao, 1mkx, 1ucy, 2hpp, 3hk3, 3k65, 3nxp, 3pma, 3qlp), 15 structural entries from the trypsin subfamily (1aoj, 1aks, 1an1, 1fxy, 1hj8, 1jrs, 1pq7, 2a31, 2eek, 2f91, 2ra3, 3beu, 3fp7, 3mi4, 3p95) and 4 structural entries from the plasminogen activator subfamily (1a5h, 1a5i, 1bqy, 1rtf) were used for the construction of the structural alignment. The functional positions were adopted in a similar manner to the functional sites described by Skolnick et al.29 A total of 125 annotated query sequences from all 3 subfamilies (derived from SWISSPROT) were aligned to each of the subfamily profiles and the scores were checked for every query sequence against each profile.

Scoring of query sequences

Functional residue based scoring scheme

Different scoring functions were defined for scoring the conservation of residues at the functional positions, based on their occurrence, their probability of occurrence and the Dayhoff matrix.

Majority-based scheme: A score of 1 is given to a position in the query sequence if it carries the amino acid that occurs most often at that position in the structural alignment (from known observations); these scores are then averaged over all 12 positions.

Probability-based scheme: Each amino acid at a position in the query sequence is given a score equal in magnitude to its probability of occurring at that position. In one scheme (PROB_1), the scores are averaged over all 12 positions; in the second scheme (PROB_2), the sum of scores is divided by the sum of the maximum probabilities of occurrence at each position.

Dayhoff matrix-based scheme: For each position in the query sequence, the score is calculated from the product of the probability of each amino acid occurring at that position in the template and the Dayhoff matrix score for the substitution of that amino acid by the residue present in the query. The scores are then averaged over all 12 positions. Note, however, that the amino acid exchanges in this matrix are recorded and normalized over large numbers of unrelated protein families and are not position-specific.

Consider a query sequence Q with amino acid Q_i at functional position i, where 1 ≤ i ≤ p, and a training profile T, which is an alignment of n sequences with p functional positions. Following the descriptions above, the scores under the different schemes can be written as:

Majority-based score:
\[ S_{\mathrm{maj}} = \frac{1}{p} \sum_{i=1}^{p} \delta\bigl(Q_i, \arg\max_{A} P_i(A)\bigr) \]

Probability_1-based score:
\[ S_{\mathrm{prob1}} = \frac{1}{p} \sum_{i=1}^{p} P_i(Q_i) \]

Probability_2-based score:
\[ S_{\mathrm{prob2}} = \frac{\sum_{i=1}^{p} P_i(Q_i)}{\sum_{i=1}^{p} m_i} \]

Dayhoff matrix-based score:
\[ S_{\mathrm{day}} = \frac{1}{p} \sum_{i=1}^{p} \sum_{A} P_i(A)\, M(A, Q_i) \]

where:
p = number of functional positions under consideration
n = number of sequences in the training profile (structure alignment)
T_{ij} = amino acid at position i in sequence j of the training profile
Q_i = amino acid at position i of the query sequence
P_i(A) = probability of amino acid A occurring at position i in the training profile, ie, \( P_i(A) = \frac{1}{n} \sum_{j=1}^{n} \delta(T_{ij}, A) \)
m_i = maximum probability of occurrence of any amino acid at position i, ie, \( m_i = \max_{A} P_i(A) \)
M(A,B) = entry in the substitution matrix for amino acid A being substituted by B
\delta(A,B) = 1 if A and B are the same amino acid and 0 otherwise
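
The four schemes described in this section can be sketched in a few lines of Python. This is an illustrative reading of the prose definitions rather than the original implementation: the function names, the representation of the profile as one string of residues per functional position, the tie-breaking rule for the majority residue and the toy substitution matrix used in the usage note are all assumptions.

```python
from collections import Counter


def column_probs(column):
    """Probability of each amino acid in one functional column of the profile."""
    n = len(column)
    return {aa: count / n for aa, count in Counter(column).items()}


def majority_score(query, columns):
    """Fraction of positions where the query carries the most frequent residue.

    Ties are broken by first occurrence in the column (assumption).
    """
    hits = sum(q == Counter(col).most_common(1)[0][0]
               for q, col in zip(query, columns))
    return hits / len(columns)


def prob1_score(query, columns):
    """Mean probability of the query residue over all functional positions."""
    return sum(column_probs(col).get(q, 0.0)
               for q, col in zip(query, columns)) / len(columns)


def prob2_score(query, columns):
    """Sum of query-residue probabilities divided by the sum of column maxima."""
    observed = sum(column_probs(col).get(q, 0.0) for q, col in zip(query, columns))
    best = sum(max(column_probs(col).values()) for col in columns)
    return observed / best


def dayhoff_score(query, columns, matrix):
    """Mean over positions of sum_A P_i(A) * M(A, Q_i).

    matrix: dict keyed by (A, B) tuples; a real run would load Dayhoff values.
    """
    total = 0.0
    for q, col in zip(query, columns):
        total += sum(p * matrix[(aa, q)] for aa, p in column_probs(col).items())
    return total / len(columns)
```

Here query is the string of residues found at the functional positions of the aligned query, and columns holds the residues observed at each of those positions in the training profile. For a toy profile with columns "AAAG" and "CCCC", the query "GC" obtains a majority score of 0.5 and a PROB_1 score of 0.625.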

Functional residue distance-based scoring scheme

Cβ–Cβ distances of the residues at the functional positions were calculated from the models of 131 classic OBP sequences (data not shown). Only the residue pairs in the fuzzy functional template (FFT) with SD < 2 were considered for the final scoring scheme. The query sequences were aligned to the structure alignment profile and the distances between residues corresponding to the functional positions were calculated in their respective models. If the distance of a residue pair fell within the upper and lower limits assigned to that pair in the FFT, a score of 1 was awarded (otherwise the score was 0), and the scores were averaged for the 12 functional positions.
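
A minimal sketch of this fit criterion follows. It assumes that the retained residue pairs and their distance limits are supplied as a dictionary and that Cβ coordinates have been read from the query model; the function name and data layout are hypothetical. The sketch averages over the retained residue pairs, which is one possible reading of the averaging step described above.

```python
from math import dist


def fft_distance_score(model_cb, limits):
    """Fraction of template residue pairs whose model distance is within limits.

    model_cb: dict mapping functional site label -> (x, y, z) C-beta coordinate
              taken from the homology model of the query.
    limits:   dict mapping (site_a, site_b) -> (lower, upper) distance limits
              for the low-variance pairs of the fuzzy functional template.
    """
    hits = 0
    for (a, b), (lower, upper) in limits.items():
        d = dist(model_cb[a], model_cb[b])
        hits += lower <= d <= upper  # 1 if the pair fits the template, else 0
    return hits / len(limits)
```

With limits of 1.0–3.0 for pair (1, 2) and 4.0–6.0 for pair (1, 3), a model in which sites 1 and 2 are 2 units apart and sites 1 and 3 are 10 units apart satisfies one of the two criteria.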

Cysteine-based scoring scheme

Each query sequence was aligned separately with each of the training profiles using the sequence-to-profile alignment method in ClustalW28 and checked for the conservation of cysteines. If a cysteine was found at a position, a score of ‘1’ was given; otherwise a score of ‘0’ was given. In this study, a cysteine in the query is assumed to be ‘strictly conserved’ if it aligns exactly with the cysteine position in the training profile. Under the ‘relaxed criterion’, however, an arbitrary shift of 2 residues on either side of the cysteine positions in the training profile is allowed to account for uncertainties in the sequence alignment. In addition to the scores for cysteine conservation, an extra score of ‘1’ is added for the conservation of each cysteine pair involved in disulphide bond formation. These position scores are normalized over all the positions within that class, and an average score is obtained for each class for each query sequence (Supplementary Fig. 1). Thus, the score of a query against the training profile of each class is a measure of its likelihood of belonging to that class.
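
The scoring rule can be sketched as follows, given a query that has already been aligned to a class profile. The function name, the use of alignment column indices, and the normalization by the number of cysteine positions plus the number of disulphide pairs are assumptions for illustration; the exact normalization used in the study is given in Supplementary Figure 1 and is not reproduced here.

```python
def cysteine_score(aligned_query, cys_columns, ss_pairs, shift=0):
    """Score cysteine and disulphide-pair conservation for one class profile.

    aligned_query: query sequence with gaps, in profile alignment coordinates.
    cys_columns:   0-based alignment columns of the conserved cysteines.
    ss_pairs:      (column_a, column_b) tuples of disulphide-bonded cysteines.
    shift:         0 for the strict criterion, 2 for the relaxed criterion.
    """
    seq = aligned_query.upper()

    def conserved(col):
        # Look for a cysteine within `shift` columns of the expected position.
        # Under the relaxed criterion one cysteine may satisfy two close columns.
        lo = max(0, col - shift)
        hi = min(len(seq), col + shift + 1)
        return "C" in seq[lo:hi]

    flags = {col: conserved(col) for col in cys_columns}
    points = sum(flags.values())
    # One extra point for every disulphide pair with both partners conserved.
    points += sum(1 for a, b in ss_pairs if flags[a] and flags[b])
    return points / (len(cys_columns) + len(ss_pairs))
```

For the toy aligned query "ACK-AKCC" with expected cysteines at columns 1, 4 and 7 and one disulphide pair (1, 7), the strict criterion gives 0.75, whereas the relaxed criterion accepts the cysteine at column 6 for the expected column 4 and gives 1.0.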

Composite classification scheme

A composite classification scheme was devised for the classification of OBPs and conotoxins based on the scores for each class, the length of the query and the distance between the cysteines involved in disulphide formation (loop spacing; Supplementary Figs. 2 and 3). Thus, if it is an ‘N’-class problem, then for each query, there will be ‘N’ score parameters (one for each class), a length parameter and a variable number of loop spacing (depending upon the classes). The loop spacing (number of amino acids along the sequence between the 2 cysteines involved in disulphide bonding) parameter would be extremely useful to distinguish between classes with the same cysteine motif but different disulphide connectivity patterns. This flexibility was introduced since it is expected that the loop spacing is more or less conserved throughout the members of a family, even if other inter-cysteine distances are not.
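
The attributes that enter the composite scheme can be assembled as in the sketch below. The decision rules themselves are given in Supplementary Figures 2 and 3 and are not reproduced; the function names and the dictionary layout are assumptions for illustration.

```python
def loop_spacing(sequence, cys_a, cys_b):
    """Number of residues between two disulphide-bonded cysteines.

    cys_a, cys_b: 0-based positions of the two cysteines in the ungapped query.
    """
    if sequence[cys_a] != "C" or sequence[cys_b] != "C":
        raise ValueError("both positions must be cysteines")
    return abs(cys_b - cys_a) - 1


def feature_vector(class_scores, sequence, ss_pairs):
    """Collect the N class scores, the query length and the loop spacings."""
    features = dict(class_scores)        # one score per class
    features["length"] = len(sequence)   # length parameter
    for k, (a, b) in enumerate(ss_pairs, start=1):
        features[f"loop_{k}"] = loop_spacing(sequence, a, b)
    return features
```

For the toy sequence "ACGGGCA" with a disulphide between positions 1 and 5, the loop spacing is 3, and the resulting feature dictionary can be passed to whatever rule set or classifier implements the composite scheme.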

Re-substitution test of the cysteine based classification scheme

The re-substitution test is one of the important methods of evaluating predictive accuracy. In this test, the training set used to generate the classifier is itself used to test the classification model. In other words, the test set is the same as the training set. The re-substitution test is extremely important because it reflects the self-consistency of an identification scheme, and most importantly, the algorithm.
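
A re-substitution test reduces to classifying the training sequences themselves and reporting the per-class accuracy, as in this minimal sketch. The classify argument stands for any classifier built from the training set; its name and the toy rule in the usage note are hypothetical.

```python
from collections import defaultdict


def resubstitution_accuracy(training_set, classify):
    """Per-class accuracy when the training set is used as its own test set.

    training_set: iterable of (sequence, true_class) pairs.
    classify:     callable mapping a sequence to a predicted class label.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sequence, label in training_set:
        total[label] += 1
        correct[label] += classify(sequence) == label
    return {label: correct[label] / total[label] for label in total}
```

As a toy example, a rule that labels any sequence with exactly 6 cysteines as "Classic" and everything else as "PlusC" can be passed as classify; a class whose members do not follow the rule consistently will show an accuracy below 1.0.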

Results and Discussion

Functional sites and fuzzy functional template

Functional residues of proteins involved in ligand binding are generally conserved through the evolution of proteins and are generally considered good classifiers of protein families and useful for function annotation.13 The ligand-binding residues from the bound complexes of the available PDB entries were mapped onto the structural alignment generated by COMPARER.27 For the family of insect odorant binding proteins, the positions of the alignment that had ligand-binding residues in at least 4 of the 7 PDB entries were considered to be significant functional residue positions. Twelve such positions were considered to be components of the functional template (Fig. 1). The Cβ–Cβ distances between these 12 residues were calculated and averaged in the form of a matrix called the ‘fuzzy functional template’ (FFT). The distance limits were set to the average ±2 SDs, since the distances between the residue pairs were quite variable. Only the distances in the matrix with an SD below 2 were considered for the calculation of the scores. Twelve such distances, involving 12 residue pairs, were identified in the matrix (Fig. 2). These distances were used for the scoring function.
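
The selection of functional positions by the "at least 4 of 7" rule can be sketched as below. The input is assumed to be, for each structure, the set of alignment columns whose residues contact the ligand (for example, as parsed from LIGPLOT output and mapped onto the alignment); the function name and input layout are hypothetical.

```python
from collections import Counter


def select_functional_positions(contacts_by_structure, min_structures=4):
    """Alignment columns that bind ligand in at least `min_structures` entries.

    contacts_by_structure: dict mapping a structure identifier to the set of
    alignment columns with ligand-binding residues in that structure.
    """
    counts = Counter(col
                     for cols in contacts_by_structure.values()
                     for col in set(cols))
    return sorted(col for col, k in counts.items() if k >= min_structures)
```

Applied to the mapped ligand contacts of the 7 OBP structures, a call with min_structures=4 would return the columns used as functional sites.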

Structure-based scoring scheme

The structure-based scoring scheme shows a good range of scores (0.3–1.0). However, there were low scoring sequences observed in the test cases. The scores were independent of the sequence identity to its template (Fig. 3). However, a limitation of this method is the fact that the test set consisted of models derived from members of the training set used as templates. This method could be applied only to proteins that have a structural entity or for query sequences for which a homology model could be derived, and thus the method was applied only on the classic odorant-binding proteins.
Figure 3

Scatter plot representing the effect of sequence identity on the sequence-based scores with sequence identity on the X-axis and scores on the Y-axis. (A) Effect of sequence identity on sequence-based scoring scheme. (B) Effect of sequence identity on structure based scoring scheme.

Functional residue-based scoring scheme

The ‘PROB_2’ scoring scheme, with the addition of homologues, achieves the best range and correlation. The scores were based on the occurrence, probability of occurrence and Dayhoff matrix as described in the Methods section. For the family of insect odorant-binding proteins, different training datasets were analyzed that include (1) a 7-member training set, which is the initial structure alignment, (2) a 25-member dataset where the 7-member dataset was populated (to include evolutionary data) with one additional close homologue from each of the mosquito genomes to every member in the 7-member dataset, (3) a 5-member dataset where the 2 mosquito crystal structures 2erb and 3k1e were removed to avoid potential bias in scoring the models (since these 2 structures served as templates for modeling) and (4) an 18-member dataset from which the 2 mosquito crystal structures and their homologues were excluded. The range of scores for each of the methods on every training set were analyzed and it was observed that the probability score PROB_2 achieved the best range, followed by the majority-based scores (Table 2A), and that they also achieved the best correlation when compared to other 2 methods (Table 2B). It was also observed that addition of homologues to the initial dataset significantly improved the range and correlation.
Table 2

Correlation and distribution of scores by the different schemes. (A) Distribution of the scores obtained from each of the different schemes based on each training set showing that the Prob_2 scheme achieves the highest range among the 4. (B) Correlation between the scores of the different schemes tested on various training sets showing that the Probability based scores have higher correlation with the other 2 types of scores.

Scheme | 7-member training set | 25-member training set | 5-member training set | 18-member training set
(A) Scoring scheme
Majority | 0.08–0.75 | 0.0–0.92 | 0.0–0.33 | 0.0–0.67
Prob_1 | 0.01–0.35 | 0.03–0.42 | 0.02–0.27 | 0.05–0.25
Prob_2 | 0.03–0.88 | 0.08–0.98 | 0.02–0.59 | 0.2–0.95
Dayhoff | 0.3–0.75 | 0.3–0.54 | 0.33–0.44 | 0.28–0.41
(B) Score
Prob. vs. Maj. | 0.87 | 0.96 | 0.76 | 0.81
Day vs. Maj. | 0.84 | 0.86 | 0.63 | 0.66
Day vs. Prob. | 0.72 | 0.81 | 0.46 | 0.6

All 12 positions in the scoring scheme are equivalent in importance

It was important to analyze whether certain functional site positions contributed more to the scores in order to provide different weights on the positions. This was done by jack-knifing each of the 12 individual positions and recalculating the scores for the initial 7-member dataset. The Pearson correlation coefficient between the scores were calculated after removing each of the 12 residue positions (Table 3) and it was observed that the removal of any one position from the scoring scheme did not significantly alter the scores.
Table 3

Pearson correlation co-efficient between the scores using all 12 functional positions and on jack-knifing each position from the 7-member dataset to analyze the contribution of individual function positions on the score.

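Score | W/O1 | W/O2 | W/O3 | W/O4 | W/O5 | W/O6 | W/O7 | W/O8 | W/O9 | W/O10 | W/O11 | W/O12
Maj. | 0.95 | 0.96 | 0.98 | 0.99 | 0.97 | 0.95 | 0.95 | 0.96 | 0.98 | 0.99 | 0.97 | 0.95
Prob. | 0.98 | 0.98 | 0.98 | 0.98 | 0.97 | 1.00 | 0.98 | 0.98 | 0.98 | 0.98 | 0.97 | 1.00
Day. | 0.97 | 0.95 | 0.98 | 0.99 | 0.98 | 0.99 | 0.97 | 0.95 | 0.98 | 0.99 | 0.98 | 0.99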

Note: All the scores are very similar after jack-knifing any of the positions, which leads to the conclusion that all the 12 positions in the profile are equivalent.
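
The jack-knife analysis can be sketched as follows, assuming that a per-position score is available for every query (for instance, the probability of the query residue at each functional position). The function names and input layout are assumptions for illustration; the sketch requires that the scores vary across queries, otherwise the correlation is undefined.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def jackknife_correlations(per_position_scores):
    """Correlate full scores with scores recomputed after dropping one position.

    per_position_scores: one list per query holding its score at each of the
    p functional positions. Returns {dropped position (1-based): correlation}.
    """
    p = len(per_position_scores[0])
    full = [sum(row) / p for row in per_position_scores]
    result = {}
    for k in range(p):
        reduced = [(sum(row) - row[k]) / (p - 1) for row in per_position_scores]
        result[k + 1] = pearson(full, reduced)
    return result
```

Correlations close to 1 for every dropped position, as in Table 3, indicate that no single position dominates the score.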

The scores are independent of the sequence identity of the query sequence with the template

Since the scoring scheme is based on the probability of occurrence of an amino acid, the effect of sequence identity on the scores had to be considered carefully. The scores were plotted against the sequence identity of each protein with the closest structural template in the dataset (Fig. 3). The distribution of points in the plot indicated that the scores are indeed independent of the sequence identity. A histogram of the number of sequences versus the percentage sequence identity of the query sequence with the template was also plotted, and the consistently high-scoring and low-scoring sequences were marked on it (Fig. 4). It was observed that the distribution of the low-scoring and high-scoring queries was independent of sequence identity.
Figure 4

Histogram of the number of sequences versus the % identity of the query sequence with the template.

Note: The sequences labeled in red are high scoring while those labeled in black are low scoring.

Comparison of the sequence-based scoring scheme with sequence searches and phylogenetic analyses

We find that our simple sequence-based objective scoring scheme works better than domain-based subfamily association or phylogeny-based associations. An example is the case of odorant binding proteins, which fall into three major subfamilies (Classic, PlusC and Atypical), as described earlier in the manuscript. When each of these members is searched against the conserved domain database, cross-talk with respect to subfamily is observed in many cases (Supplementary Table 1). For example, most of the PlusC OBPs are never identified as carrying the PBP_GOBP domain, and Atypical OBPs, which should be predicted to have two PBP_GOBP domains, are predicted to have only 1 PBP_GOBP domain. In contrast, the current method is able to classify these proteins into their respective subfamilies. It is also difficult to infer sequence associations from phylogenetic trees to provide a meaningful classification of the different subfamilies in the case of the odorant binding proteins. The phylogenetic trees were inferred separately for odorant binding proteins from each of the mosquito genomes using the neighbor-joining method in MEGA 4.026 (Supplementary Fig. 4A–C). In the phylogenetic trees of OBPs from Anopheles gambiae, Aedes aegypti and Culex quinquefasciatus, the different subfamilies were not clustered together with significant bootstrap support, owing to the high sequence divergence that is observed.

Application of sequence-based scoring scheme on serine protease subfamilies

Serine proteases are one of the largest groups of proteolytic enzymes with a nucleophilic serine residue at the active site and are believed to constitute nearly 1/3 of all the known proteolytic enzymes. They include exopeptidases and endopeptidases belonging to different protein families grouped into clans. They function as part of diverse biological processes such as digestion, blood clotting, fertilization, development, complement activation, pathogenesis, apoptosis, immune response, secondary metabolism, with imbalances causing diseases like arthritis and tumors. The current method was applied to 3 families to see if the method can classify the sequences into these 3 subfamilies: Trypsin, Thrombin and Plasminogen Activator. The method was tested on 125 serine protease sequences from the three subfamilies (Supplementary Table 2). It was observed that the method could classify the proteins into their respective subfamilies effectively.

Cysteine-based scoring scheme

Cysteine positions in protein sequences are the other evolutionarily conserved sites in disulphide-rich protein families. They can be used as effective regular expressions in protein sequences, even among distantly related proteins, whose classification based on other methods would be quite challenging. However, a sequence-to-sequence alignment algorithm, using one representative sequence for a family, would not provide sufficient accuracy in terms of accounting for the insertions and deletions observed in diverse sequences. A disulphide profile, derived from representative sequences, is more suitable for compensating for the occurrence of insertions and deletions.18 The cysteine-based scoring scheme was found to be a more direct way to identify OBPs in insects and has been used previously for this purpose.4 In this work, however, the scheme has been further extended to classify the OBPs in the mosquito genome. Hence, in practice, the algorithm not only predicts the chance of a query sequence being a putative OBP, but also facilitates its classification into 1 of the different classes of OBPs that are described below. The OBPs are classified into 4 major classes: (i) Classic OBPs, which carry 6 conserved cysteines, (ii) PlusC OBPs, which carry an additional 3 conserved cysteines, (iii) Dimer OBPs or Atypical OBPs, which carry 2 Classic OBP domains and hence 12 conserved cysteines and (iv) Minus-C OBPs, which lack 2 Cys residues in comparison with Classic OBPs. The dimer OBPs can be further classified as MAtype1-4; all of them hold 12 conserved cysteines except MAtype2. From the alignments used in the construction of phylogenetic trees, it was observed that the cysteine conservation patterns and spacing could play an important role in the classification of OBPs.
This was analyzed by observing the cysteine conservation patterns of sequences in the test datasets when aligned to profiles that were constructed using a training set of each of the classes described above. A training set for the 7 different classes of OBPs (disulphide profiles) was prepared (as summarized in Table 1A). A representative sequence was identified from a phylogeny of odorant binding proteins of each class. For the Minus-C class, the same profile as for Classic OBPs was used, but only the 1st, 3rd, 4th and 6th cysteine positions were considered. A composite classification scheme was devised for the family of OBPs incorporating the 7 different scores and the length of the sequence as attributes. The protocol was applied to a dataset of 284 mosquito OBP sequences (from Anopheles gambiae, Aedes aegypti and Culex quinquefasciatus) and the class predictions were compared with the class associations made independently from phylogenetic analysis. The ‘confusion matrix’ of the classes predicted by the cysteine-based classification scheme versus the phylogeny-based classification is given in Figure 5A. The scheme provides an accuracy of 90.14% when compared with the phylogeny-based classification for the test set sequences. The contribution of the individual classes to this accuracy was examined using a re-substitution test.
Figure 5

Confusion Matrix of Classification. (A) Confusion matrix between the phylogeny based classification of odorant binding proteins and the cysteine scoring based classification scheme. (B) Confusion matrix between the classification of conotoxins and the cysteine scoring based classification scheme.

The re-substitution test on the training set gave accuracies of 100%, 100%, 0%, 100%, 66.66% and 100% for the Classic, PlusC, Atypical1, Atypical2, Atypical3 and Atypical4 classes, respectively. The sequences in Atypical1, however, form a small group of 6 sequences and do not follow as strict a conservation of cysteines as the other classes of OBPs. Hence it was difficult to classify these members by our scheme, which explains the poor performance of the re-substitution test for the Atypical1 class.
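Overall and per-class accuracies of this kind follow directly from a confusion matrix. The sketch below shows the arithmetic on a small matrix whose counts are invented for illustration; they are not the counts behind Figure 5A or the re-substitution test, and the function names are assumptions.

```python
from typing import Dict

# reference class -> predicted class -> number of sequences
ConfusionMatrix = Dict[str, Dict[str, int]]


def overall_accuracy(cm: ConfusionMatrix) -> float:
    """Correct predictions (diagonal cells) divided by all predictions."""
    total = sum(sum(row.values()) for row in cm.values())
    correct = sum(row.get(label, 0) for label, row in cm.items())
    return correct / total


def per_class_accuracy(cm: ConfusionMatrix) -> Dict[str, float]:
    """For each reference class, the fraction of its members predicted correctly."""
    return {
        label: row.get(label, 0) / sum(row.values())
        for label, row in cm.items()
        if sum(row.values()) > 0  # skip classes with no members
    }


# Toy matrix with invented counts.
toy = {
    "Classic": {"Classic": 5},
    "PlusC": {"PlusC": 4},
    "Atypical3": {"Atypical3": 2, "Classic": 1},
}

overall = overall_accuracy(toy)     # 11 correct out of 12
by_class = per_class_accuracy(toy)  # Atypical3: 2 correct out of 3
```

As a plausibility check on the reported figure, 256 agreeing predictions out of 284 sequences would give 256/284 ≈ 90.14%; the actual cell counts are those of Figure 5A.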

Application of the cysteine-based scoring scheme to the well-known superfamily of conotoxins

To test the classification scheme further, the algorithm was applied to the well-known cysteine-rich superfamily of conotoxins. Conotoxins are small neurotoxic peptides found in the venom of predatory cone snails of the genus Conus; they act primarily by modulating the activity of specific ion channels. The mature conotoxins are characterized by the presence of multiple disulphide bonds and have been classified into 7 families (A, M, O, I, P, T and S), again on the basis of a highly conserved N-terminal precursor sequence, disulphide connectivity and mode of action.24 Each family is characterized by the presence of 1 or 2 characteristic patterns of disulphide cross-links.30 The prominent disulphide connectivity patterns in the 4 major families of conotoxins are shown in Supplementary Figure 5, and these alone were used for scoring purposes. A classification scheme was developed for conotoxins, as shown in Supplementary Figure 3, incorporating the 4 scores corresponding to each of the 4 major families. The classifier (constructed using the training set shown in Table 2) was tested on a dataset of 116 conotoxin sequences obtained from Mondal et al,24 and the predictions made by the scheme were compared with the known classes of the sequences reported in that study. The scheme gave an accuracy of 93.1% for the test set, and the confusion matrix is presented in Figure 5B. The re-substitution test on the training set provided an accuracy of 100% for all 4 families.
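Because cysteine-rich peptides such as conotoxins are distinguished largely by how their cysteines are arranged, a quick first look at a sequence is to reduce it to its cysteine pattern. The sketch below does only that. It is a sequence-level descriptor, not the disulphide-connectivity scoring used in this work, and the function name and the toy peptide are invented for illustration.

```python
import re


def cysteine_pattern(sequence: str) -> str:
    """Collapse a peptide to its runs of cysteines, joined by '-'.

    Any stretch of non-Cys residues between two cysteine runs becomes a single '-'.
    """
    runs = re.findall(r"C+", sequence.upper())
    return "-".join(runs)


# Toy peptide, invented for illustration only.
toy_peptide = "GCCSAPRCAWKC"
pattern = cysteine_pattern(toy_peptide)  # "CC-C-C"
```

Two peptides with the same pattern string share the number and adjacency of their cysteines, although they may still differ in loop lengths and in how the cysteines are paired.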

Conclusion

Simple domain-finding techniques, such as association to Pfam families, can relate mosquito OBPs only to the broad family of ‘odorant binding proteins’ (PF01395); they cannot distinguish Classic, PlusC and Atypical odorant binding proteins. These subfamilies differ in their sequence features, even though they carry the basic PBP/GOBP domain. For families where the sequence divergence is very high, it is important to derive family-specific classification methods in order to obtain a more meaningful functional classification of the family. Evolutionarily constrained functional and structural signatures, combined with family-specific profile-based scoring, improve the quality of function annotation and can be extended to subfamily-level classification. Fuzzy functional template-based objective methods, encoded in our structure-based scoring scheme, provide a clear representation of the extent of spatial preservation of known functionally important residues. Such scoring schemes provide an early indication of family members that deviate from the parent family in biological function or that lack function. Such structure-based scoring schemes could be convenient for rapidly validating a large number of gene products whose high-quality homology models can be obtained automatically. Most popular function prediction methods reported in the literature require structural information or models of query sequences for scoring and recognizing functionally important residues, and are therefore applicable only to SGI targets or to sequences for which homology models can be obtained reliably. Our approach offers a novel option of using sequence information alone to score the preservation of functionally important residues.
Our purely sequence-based approach differs from other methods that use sequence alignments, such as the functional-residue-clustering (FRC) method,31 which abstract the data by defining amino acid alphabets and require a joint alignment that includes subfamily members. The algorithms described above are shown to work efficiently for protein families such as odorant-binding proteins, serine proteases and conotoxins. We demonstrate that this approach can be used for large-scale annotation and classification by applying it to new odorant-binding proteins, a diverse family that poses many challenges for standard identification and classification algorithms.32 The approach could be extended to other diverse families of proteins. However, this requires an in-depth analysis of every superfamily for family-specific signatures and the construction of a composite classification scheme at the subfamily level.

Schematic representation of the cysteine-based scoring scheme.

Flowchart of the logistics used in the composite classification scheme of OBPs.

Flowchart of the logistics used in the composite classification scheme of the conotoxin family.

(A) Rooted phylogenetic tree of the odorant binding proteins in the Anopheles gambiae genome. (B) Rooted phylogenetic tree of the odorant binding proteins in the Aedes aegypti genome. (C) Rooted phylogenetic tree of the odorant binding proteins in the Culex quinquefasciatus genome. In each tree, Classic OBPs are colored blue, Atypical OBPs are colored green and PlusC OBPs are colored red.

Cysteine connectivity patterns in the four major superfamilies of conotoxins, namely superfamily A (A), superfamily M (B), superfamily O (C) and superfamily T (D).
Table S1

Sequences mispredicted by domain based methods and correctly predicted by the current method.

ID | CD-search | Current method
AAEL000139 | No family | PlusC subfamily
AAEL006109 | No family | PlusC subfamily
AAEL006108 | No family | PlusC subfamily
AAEL006103 | No family | PlusC subfamily
AAEL010666 | No family | PlusC subfamily
AAEL010662 | No family | PlusC subfamily
AAEL011494 | No family | PlusC subfamily
AAEL011499 | No family | PlusC subfamily
AAEL011484 | No family | PlusC subfamily
AAEL011490 | No family | PlusC subfamily
AAEL011487 | No family | PlusC subfamily
AAEL011491 | No family | PlusC subfamily
AAEL011482 | No family | PlusC subfamily
AAEL011481 | No family | PlusC subfamily
AAEL015566 | No family | PlusC subfamily
AAEL015567 | No family | PlusC subfamily
AAEL011497 | No family | PlusC subfamily
AAEL011489 | No family | PlusC subfamily
AAEL006105 | No family | PlusC subfamily
AAEL006904 | No family | PlusC subfamily
AAEL004729 | No family | PlusC subfamily
AAEL004730 | No family | PlusC subfamily
AAEL011486 | No family | PlusC subfamily
AAEL011483 | No family | PlusC subfamily
AAEL014593 | No family | PlusC subfamily
AAEL000139 | Classic | Atypical
AGAP007287 | No family | PlusC subfamily
AGAP006065 | No family | PlusC subfamily
AGAP006076 | No family | PlusC subfamily
AGAP006077 | No family | PlusC subfamily
AGAP006078 | No family | PlusC subfamily
AGAP006079 | No family | PlusC subfamily
AGAP006080 | No family | PlusC subfamily
AGAP006081 | No family | PlusC subfamily
AGAP011367 | No family | PlusC subfamily
AGAP011368 | No family | PlusC subfamily
AGAP006074 | No family | PlusC subfamily
AGAP006760 | No family | PlusC subfamily
AGAP007281 | No family | PlusC subfamily
AGAP007282 | No family | PlusC subfamily
AGAP006759 | No family | PlusC subfamily
AGAP007283 | No family | PlusC subfamily
AGAP012659 | No family | PlusC subfamily
AGAP008793 | No family | Classic
AGAP008979 | No family | PlusC subfamily
CPIJ004634 | No family | PlusC subfamily
CPIJ004635 | No family | PlusC subfamily
CPIJ004630 | No family | PlusC subfamily
CPIJ002105 | No family | PlusC subfamily
CPIJ002109 | No family | PlusC subfamily
CPIJ002108 | No family | PlusC subfamily
CPIJ006608 | No family | PlusC subfamily
CPIJ002111 | No family | PlusC subfamily
CPIJ008867 | No family | PlusC subfamily
CPIJ008868 | No family | PlusC subfamily
CPIJ017524 | No family | PlusC subfamily
CPIJ007337 | No family | PlusC subfamily
CPIJ017168 | Classic | Atypical
CPIJ017169 | Classic | Atypical
CPIJ017163 | Classic | Atypical
CPIJ017165 | Classic | Atypical
CPIJ017164 | Classic | Atypical
CPIJ017166 | Classic | Atypical
CPIJ017170 | Classic | Atypical
CPIJ001690 | Classic | Atypical
CPIJ003867 | Classic | Atypical
CPIJ003863 | Classic | Atypical
CPIJ003865 | Classic | Atypical
CPIJ000653 | Classic | Atypical
CPIJ008154 | Classic | Atypical
CPIJ008160 | Classic | Atypical
AGAP000641 | Classic | Atypical
AGAP000642 | Classic | Atypical
AGAP000643 | Classic | Atypical
AGAP000644 | Classic | Atypical
AGAP011647 | Classic | Atypical
AAEL001487 | No family | PlusC subfamily
AAEL000837 | Classic | Atypical
AAEL001153 | Classic | Atypical
AAEL001189 | Classic | Atypical
AAEL004516 | Classic | Atypical
AAEL010875 | Classic | Atypical
Table S2

Scores for the different subfamilies of serine proteases obtained from the sequence based scoring schemes.

Column groups: trained using trypsin dataset (Trypsin), trained using thrombin dataset (Thrombin), trained using plasminogen dataset (Plasminogen).

Query | Trypsin: Majority | Trypsin: Probability | Trypsin: Dayhoff | Thrombin: Majority | Thrombin: Probability | Thrombin: Dayhoff | Plasminogen: Majority | Plasminogen: Probability | Plasminogen: Dayhoff
PLAS_P00750 | 0.571429 | 0.618868 | 0.60017 | 0.666667 | 0.668161 | 0.620649 | 0.952381 | 0.946667 | 0.688571
PLAS_P11214 | 0.571429 | 0.618868 | 0.60017 | 0.714286 | 0.717489 | 0.640649 | 0.857143 | 0.88 | 0.662976
PLAS_P15638 | 0.619048 | 0.671698 | 0.617857 | 0.666667 | 0.668161 | 0.614502 | 0.857143 | 0.906667 | 0.68131
PLAS_P19637 | 0.571429 | 0.618868 | 0.60017 | 0.666667 | 0.668161 | 0.629697 | 0.857143 | 0.88 | 0.666786
PLAS_P49150 | 0.619048 | 0.671698 | 0.617857 | 0.666667 | 0.668161 | 0.614502 | 0.857143 | 0.906667 | 0.68131
PLAS_P98119 | 0.619048 | 0.671698 | 0.617857 | 0.666667 | 0.668161 | 0.614502 | 0.857143 | 0.906667 | 0.68131
PLAS_P98121 | 0.619048 | 0.671698 | 0.617857 | 0.666667 | 0.668161 | 0.614502 | 0.857143 | 0.906667 | 0.68131
PLAS_Q28198 | 0.571429 | 0.618868 | 0.597755 | 0.666667 | 0.668161 | 0.618745 | 0.952381 | 0.946667 | 0.688571
PLAS_Q5R8J0 | 0.571429 | 0.618868 | 0.60017 | 0.666667 | 0.668161 | 0.620649 | 0.952381 | 0.946667 | 0.688571
PLAS_Q8SQ23 | 0.619048 | 0.671698 | 0.614932 | 0.666667 | 0.668161 | 0.607792 | 0.904762 | 0.92 | 0.683095
PLAS_B4DN26 | 0.571429 | 0.618868 | 0.60017 | 0.666667 | 0.668161 | 0.620649 | 0.952381 | 0.946667 | 0.688571
PLAS_B4DNJ1 | 0.571429 | 0.618868 | 0.60017 | 0.666667 | 0.668161 | 0.620649 | 0.952381 | 0.946667 | 0.688571
PLAS_B4DRD3 | 0.571429 | 0.618868 | 0.60017 | 0.666667 | 0.668161 | 0.620649 | 0.952381 | 0.946667 | 0.688571
PLAS_B4DV92 | 0.571429 | 0.618868 | 0.60017 | 0.666667 | 0.668161 | 0.620649 | 0.952381 | 0.946667 | 0.688571
THR_119760 | 0.666667 | 0.732075 | 0.614456 | 0.714286 | 0.70852 | 0.619784 | 0.666667 | 0.706667 | 0.591548
THR_122144690 | 0.619048 | 0.679245 | 0.599388 | 1 | 1 | 0.715628 | 0.619048 | 0.68 | 0.574524
THR_135806 | 0.619048 | 0.679245 | 0.58966 | 1 | 1 | 0.715628 | 0.619048 | 0.68 | 0.574524
THR_135807 | 0.619048 | 0.679245 | 0.600102 | 1 | 1 | 0.715628 | 0.619048 | 0.68 | 0.574524
THR_135808 | 0.619048 | 0.679245 | 0.597993 | 1 | 1 | 0.715628 | 0.619048 | 0.68 | 0.574524
THR_135809 | 0.619048 | 0.679245 | 0.597993 | 1 | 1 | 0.715628 | 0.619048 | 0.68 | 0.574524
THR_338817876 | 0.619048 | 0.671698 | 0.602007 | 0.761905 | 0.753363 | 0.651818 | 0.619048 | 0.666667 | 0.584881
THR_48427854 | 0.666667 | 0.728302 | 0.631701 | 0.714286 | 0.70852 | 0.61645 | 0.666667 | 0.706667 | 0.579762
THR_51701719 | 0.809524 | 0.849057 | 0.661939 | 0.619048 | 0.623318 | 0.59342 | 0.619048 | 0.64 | 0.589405
THR_51704215 | 0.761905 | 0.807547 | 0.650544 | 0.571429 | 0.573991 | 0.560563 | 0.619048 | 0.64 | 0.576071
THR_62511155 | 0.619048 | 0.679245 | 0.600102 | 1 | 1 | 0.715628 | 0.619048 | 0.68 | 0.574524
TRY_A1L3H8 | 0.761905 | 0.807547 | 0.650986 | 0.714286 | 0.717489 | 0.611948 | 0.619048 | 0.666667 | 0.594643
TRY_A1Z7M7 | 0.714286 | 0.735849 | 0.591871 | 0.571429 | 0.58296 | 0.525931 | 0.52381 | 0.573333 | 0.530238
TRY_A1Z8J7 | 0.571429 | 0.607547 | 0.558707 | 0.52381 | 0.524664 | 0.544978 | 0.47619 | 0.52 | 0.527143
TRY_A5CG75 | 0.619048 | 0.675472 | 0.602755 | 0.666667 | 0.668161 | 0.589913 | 0.619048 | 0.68 | 0.584286
TRY_A7UNZ4 | 0.619048 | 0.671698 | 0.56 | 0.571429 | 0.578475 | 0.56355 | 0.619048 | 0.68 | 0.60631
TRY_A9WQU1 | 0.428571 | 0.471698 | 0.428231 | 0.428571 | 0.426009 | 0.413636 | 0.428571 | 0.48 | 0.415357
TRY_A9WQW1 | 0.428571 | 0.45283 | 0.437687 | 0.333333 | 0.336323 | 0.3829 | 0.428571 | 0.453333 | 0.436667
TRY_B5DZ08 | 0.619048 | 0.656604 | 0.617857 | 0.619048 | 0.618834 | 0.583506 | 0.571429 | 0.613333 | 0.597381
TRY_B7P4W6 | 0.333333 | 0.343396 | 0.382619 | 0.095238 | 0.09417 | 0.342641 | 0.095238 | 0.093333 | 0.344286
TRY_B7P5I0 | 0.238095 | 0.25283 | 0.320986 | 0.095238 | 0.121076 | 0.247273 | 0.142857 | 0.146667 | 0.219167
TRY_B7P8G5 | 0.666667 | 0.720755 | 0.617857 | 0.666667 | 0.668161 | 0.609437 | 0.571429 | 0.64 | 0.575952
TRY_B7P9I9 | 0.571429 | 0.603774 | 0.573197 | 0.52381 | 0.533632 | 0.564892 | 0.428571 | 0.466667 | 0.52131
TRY_B7PAU0 | 0.619048 | 0.649057 | 0.593197 | 0.571429 | 0.58296 | 0.573117 | 0.428571 | 0.48 | 0.538571
TRY_B7PC21 | 0.666667 | 0.716981 | 0.629286 | 0.666667 | 0.668161 | 0.589177 | 0.714286 | 0.746667 | 0.629881
TRY_B7PDD5 | 0.380952 | 0.411321 | 0.521803 | 0.380952 | 0.38565 | 0.496407 | 0.333333 | 0.36 | 0.509286
TRY_B7PF42 | 0.571429 | 0.618868 | 0.557993 | 0.52381 | 0.529148 | 0.55671 | 0.428571 | 0.52 | 0.535595
TRY_B7PF43 | 0.380952 | 0.392453 | 0.384524 | 0.380952 | 0.381166 | 0.399221 | 0.285714 | 0.32 | 0.35131
TRY_B7PFF7 | 0.571429 | 0.615094 | 0.578605 | 0.571429 | 0.573991 | 0.561255 | 0.571429 | 0.64 | 0.586786
TRY_B7PKT9 | 0.714286 | 0.743396 | 0.660102 | 0.666667 | 0.659193 | 0.61013 | 0.619048 | 0.666667 | 0.597381
TRY_B7PS16 | 0.380952 | 0.411321 | 0.425068 | 0.380952 | 0.381166 | 0.470043 | 0.333333 | 0.36 | 0.383571
TRY_B7PTD2 | 0.52381 | 0.573585 | 0.461054 | 0.571429 | 0.573991 | 0.488528 | 0.428571 | 0.48 | 0.454762
TRY_B7Q2U2 | 0.571429 | 0.618868 | 0.458503 | 0.571429 | 0.565022 | 0.479697 | 0.47619 | 0.52 | 0.480357
TRY_B7Q613 | 0.619048 | 0.660377 | 0.585 | 0.619048 | 0.623318 | 0.578398 | 0.571429 | 0.626667 | 0.56619
TRY_B7QBB9 | 0.52381 | 0.569811 | 0.542619 | 0.47619 | 0.475336 | 0.499351 | 0.52381 | 0.573333 | 0.536905
TRY_B7QCX1 | 0.47619 | 0.520755 | 0.43602 | 0.571429 | 0.573991 | 0.540173 | 0.47619 | 0.506667 | 0.432619
TRY_B7QGT2 | 0.52381 | 0.566038 | 0.479116 | 0.52381 | 0.524664 | 0.437359 | 0.428571 | 0.453333 | 0.38
TRY_B7QH62 | 0.285714 | 0.301887 | 0.351395 | 0.238095 | 0.264574 | 0.288182 | 0.095238 | 0.106667 | 0.189286
TRY_B7QKP8 | 0.571429 | 0.618868 | 0.549796 | 0.666667 | 0.668161 | 0.606234 | 0.619048 | 0.68 | 0.584048
TRY_B7QLM5 | 0.666667 | 0.716981 | 0.593503 | 0.571429 | 0.578475 | 0.551732 | 0.47619 | 0.533333 | 0.506905
TRY_B7QNG9 | 0.428571 | 0.441509 | 0.49085 | 0.380952 | 0.394619 | 0.465022 | 0.333333 | 0.373333 | 0.465357
TRY_C6WB29 | 0.619048 | 0.671698 | 0.61102 | 0.619048 | 0.61435 | 0.567792 | 0.619048 | 0.653333 | 0.577857
TRY_D0D5G3 | 0.666667 | 0.720755 | 0.617687 | 0.666667 | 0.668161 | 0.612857 | 0.619048 | 0.666667 | 0.594524
TRY_D2Y5C3 | 0.666667 | 0.69434 | 0.611837 | 0.47619 | 0.475336 | 0.51052 | 0.619048 | 0.613333 | 0.560357
TRY_D2YGB8 | 0.714286 | 0.762264 | 0.634558 | 0.666667 | 0.672646 | 0.608095 | 0.571429 | 0.626667 | 0.590238
TRY_D3PK18 | 0.857143 | 0.890566 | 0.677585 | 0.619048 | 0.623318 | 0.593939 | 0.619048 | 0.68 | 0.591071
TRY_E0V917 | 0.52381 | 0.569811 | 0.55398 | 0.47619 | 0.484305 | 0.554372 | 0.571429 | 0.626667 | 0.574167
TRY_E0V971 | 0.428571 | 0.467925 | 0.549898 | 0.380952 | 0.38565 | 0.506494 | 0.428571 | 0.466667 | 0.5125
TRY_E0VCQ1 | 0.619048 | 0.667925 | 0.535884 | 0.571429 | 0.578475 | 0.555758 | 0.52381 | 0.573333 | 0.53119
TRY_E0VDU6 | 0.571429 | 0.6 | 0.589388 | 0.571429 | 0.573991 | 0.565195 | 0.47619 | 0.546667 | 0.586429
TRY_E0VFA8 | 0.666667 | 0.713208 | 0.604558 | 0.571429 | 0.578475 | 0.580216 | 0.619048 | 0.64 | 0.588095
TRY_E0VFA9 | 0.619048 | 0.660377 | 0.587483 | 0.571429 | 0.573991 | 0.595195 | 0.571429 | 0.613333 | 0.5875
TRY_E0VJE5 | 0.761905 | 0.796226 | 0.661701 | 0.619048 | 0.623318 | 0.585065 | 0.571429 | 0.626667 | 0.58619
TRY_E0VN67 | 0.666667 | 0.709434 | 0.610238 | 0.666667 | 0.668161 | 0.583723 | 0.619048 | 0.666667 | 0.589762
TRY_E0VPJ3 | 0.714286 | 0.754717 | 0.635918 | 0.571429 | 0.573991 | 0.563853 | 0.571429 | 0.613333 | 0.595
TRY_E0VQ98 | 0.619048 | 0.656604 | 0.611803 | 0.619048 | 0.618834 | 0.58303 | 0.571429 | 0.613333 | 0.600833
TRY_E0VQA9 | 0.571429 | 0.615094 | 0.550374 | 0.52381 | 0.529148 | 0.563463 | 0.428571 | 0.48 | 0.542738
TRY_E0VQK2 | 0.666667 | 0.713208 | 0.611599 | 0.666667 | 0.672646 | 0.584502 | 0.52381 | 0.586667 | 0.576548
TRY_E0VW09 | 0.714286 | 0.769811 | 0.645068 | 0.714286 | 0.70852 | 0.61645 | 0.714286 | 0.76 | 0.631905
TRY_E0VW10 | 0.666667 | 0.724528 | 0.625238 | 0.666667 | 0.668161 | 0.622251 | 0.666667 | 0.72 | 0.6125
TRY_E0VW14 | 0.761905 | 0.811321 | 0.652109 | 0.714286 | 0.713004 | 0.622078 | 0.619048 | 0.68 | 0.603929
TRY_E0W074 | 0.571429 | 0.603774 | 0.543469 | 0.52381 | 0.529148 | 0.532078 | 0.47619 | 0.52 | 0.529881
TRY_E0W1D0 | 0.285714 | 0.309434 | 0.450374 | 0.190476 | 0.192825 | 0.359567 | 0.142857 | 0.173333 | 0.431429
TRY_E3X3A6 | 0.428571 | 0.45283 | 0.549252 | 0.47619 | 0.484305 | 0.512208 | 0.428571 | 0.466667 | 0.532381
TRY_E8UA10 | 0.571429 | 0.618868 | 0.567721 | 0.571429 | 0.578475 | 0.550433 | 0.47619 | 0.533333 | 0.54881
TRY_E9GB32 | 0.714286 | 0.762264 | 0.630034 | 0.666667 | 0.668161 | 0.602035 | 0.619048 | 0.666667 | 0.594286
TRY_E9GI06 | 0.714286 | 0.766038 | 0.624388 | 0.619048 | 0.623318 | 0.580476 | 0.571429 | 0.626667 | 0.583929
TRY_E9GYN1 | 0.761905 | 0.807547 | 0.662585 | 0.666667 | 0.672646 | 0.603593 | 0.571429 | 0.626667 | 0.592619
TRY_E9HT87 | 0.714286 | 0.769811 | 0.651905 | 0.619048 | 0.61435 | 0.565411 | 0.619048 | 0.68 | 0.606667
TRY_F0RPS3 | 0.571429 | 0.618868 | 0.565986 | 0.571429 | 0.578475 | 0.546623 | 0.47619 | 0.546667 | 0.575833
TRY_F6Y1Q1 | 0.666667 | 0.720755 | 0.590306 | 0.571429 | 0.573991 | 0.564459 | 0.571429 | 0.626667 | 0.581309
TRY_G0K2W4 | 0.761905 | 0.815094 | 0.636395 | 0.714286 | 0.713004 | 0.616364 | 0.619048 | 0.666667 | 0.604167
TRY_G8S323 | 0.619048 | 0.675472 | 0.564286 | 0.619048 | 0.623318 | 0.551688 | 0.619048 | 0.68 | 0.568452
TRY_H2B653 | 0.285714 | 0.320755 | 0.427279 | 0.190476 | 0.188341 | 0.398745 | 0.047619 | 0.053333 | 0.328214
TRY_H2K281 | 0.714286 | 0.766038 | 0.634728 | 0.666667 | 0.668161 | 0.595758 | 0.619048 | 0.666667 | 0.588095
TRY_H2XPX5 | 0.52381 | 0.539623 | 0.467041 | 0.47619 | 0.475336 | 0.407922 | 0.380952 | 0.426667 | 0.373095
TRY_O97399 | 0.761905 | 0.784906 | 0.643809 | 0.619048 | 0.623318 | 0.59697 | 0.52381 | 0.586667 | 0.572024
TRY_P00765 | 0.809524 | 0.837736 | 0.670782 | 0.666667 | 0.672646 | 0.608701 | 0.571429 | 0.626667 | 0.584881
TRY_P04814 | 0.761905 | 0.781132 | 0.642891 | 0.666667 | 0.668161 | 0.587835 | 0.666667 | 0.706667 | 0.597381
TRY_P07146 | 0.857143 | 0.875472 | 0.656905 | 0.666667 | 0.672646 | 0.604892 | 0.571429 | 0.626667 | 0.577738
TRY_P07477 | 0.809524 | 0.856604 | 0.657313 | 0.666667 | 0.672646 | 0.612641 | 0.571429 | 0.626667 | 0.570833
TRY_P07478 | 0.857143 | 0.875472 | 0.668197 | 0.666667 | 0.672646 | 0.604892 | 0.571429 | 0.626667 | 0.577738
TRY_P08426 | 0.857143 | 0.875472 | 0.660238 | 0.619048 | 0.623318 | 0.595844 | 0.571429 | 0.626667 | 0.586429
TRY_P24664 | 0.619048 | 0.671698 | 0.614354 | 0.619048 | 0.61435 | 0.598918 | 0.571429 | 0.626667 | 0.566548
TRY_P35004 | 0.809524 | 0.830189 | 0.652143 | 0.619048 | 0.623318 | 0.581082 | 0.619048 | 0.666667 | 0.591071
TRY_P35005 | 0.809524 | 0.826415 | 0.666973 | 0.666667 | 0.668161 | 0.602597 | 0.666667 | 0.72 | 0.622976
TRY_P35030 | 0.761905 | 0.807547 | 0.642619 | 0.619048 | 0.623318 | 0.58697 | 0.52381 | 0.6 | 0.568095
TRY_P35033 | 0.857143 | 0.875472 | 0.658605 | 0.666667 | 0.672646 | 0.604892 | 0.571429 | 0.626667 | 0.577738
TRY_P35036 | 0.761905 | 0.792453 | 0.632449 | 0.666667 | 0.668161 | 0.595455 | 0.666667 | 0.706667 | 0.596905
TRY_P35038 | 0.666667 | 0.716981 | 0.635884 | 0.666667 | 0.672646 | 0.601602 | 0.619048 | 0.666667 | 0.603929
TRY_P35048 | 0.714286 | 0.762264 | 0.613741 | 0.571429 | 0.573991 | 0.555844 | 0.619048 | 0.68 | 0.591905
TRY_P35049 | 0.666667 | 0.735849 | 0.626905 | 0.619048 | 0.623318 | 0.56 | 0.571429 | 0.64 | 0.56119
TRY_P35051 | 0.190476 | 0.184906 | 0.165952 | 0.047619 | 0.058296 | 0.122121 | 0 | 0 | 0.025714
TRY_P42278 | 0.761905 | 0.784906 | 0.652381 | 0.666667 | 0.668161 | 0.619264 | 0.619048 | 0.666667 | 0.599643
TRY_P51588 | 0.761905 | 0.807547 | 0.65932 | 0.666667 | 0.668161 | 0.604805 | 0.666667 | 0.706667 | 0.6075
TRY_P52905 | 0.666667 | 0.686792 | 0.585136 | 0.714286 | 0.717489 | 0.59342 | 0.571429 | 0.626667 | 0.565
TRY_Q1D1D2 | 0.666667 | 0.701887 | 0.622143 | 0.52381 | 0.529148 | 0.556883 | 0.52381 | 0.573333 | 0.557619
TRY_Q28EV7 | 0.714286 | 0.769811 | 0.644388 | 0.619048 | 0.623318 | 0.603593 | 0.571429 | 0.626667 | 0.583929
TRY_Q4VSI2 | 0.619048 | 0.664151 | 0.59949 | 0.609048 | 0.608834 | 0.568788 | 0.52381 | 0.573333 | 0.55631
TRY_Q54179 | 0.619048 | 0.683019 | 0.56102 | 0.47619 | 0.475336 | 0.529481 | 0.619048 | 0.693333 | 0.590238
TRY_Q6MQB3 | 0.333333 | 0.362264 | 0.435238 | 0.285714 | 0.286996 | 0.460303 | 0.190476 | 0.226667 | 0.424286
TRY_Q6QX59 | 0.809524 | 0.837736 | 0.677517 | 0.714286 | 0.717489 | 0.635931 | 0.619048 | 0.666667 | 0.595833
TRY_Q6QX60 | 0.857143 | 0.901887 | 0.670816 | 0.619048 | 0.623318 | 0.5929 | 0.571429 | 0.626667 | 0.584643
TRY_Q7JPN9 | 0.714286 | 0.769811 | 0.615646 | 0.714286 | 0.717489 | 0.607359 | 0.619048 | 0.666667 | 0.572738
TRY_Q8IYP2 | 0.333333 | 0.366038 | 0.468095 | 0.428571 | 0.434978 | 0.539827 | 0.333333 | 0.373333 | 0.484524
TRY_Q8MS52 | 0.714286 | 0.762264 | 0.623639 | 0.666667 | 0.672646 | 0.601948 | 0.571429 | 0.626667 | 0.602857
TRY_Q8ZSE3 | 0.238095 | 0.267925 | 0.417415 | 0.285714 | 0.282511 | 0.4071 | 0.142857 | 0.173333 | 0.402024
TRY_Q9VBY4 | 0.714286 | 0.762264 | 0.618741 | 0.619048 | 0.623318 | 0.594892 | 0.666667 | 0.733333 | 0.625833
TRY_Q9VUG2 | 0.666667 | 0.724528 | 0.621633 | 0.619048 | 0.623318 | 0.61 | 0.619048 | 0.693333 | 0.625357

Notes: The scores for every query sequence with respect to each subfamily-specific training set are shown. The highest score for each sequence has been highlighted, and in the majority of cases the query sequence was classified correctly.
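Reading a classification off this table amounts to picking, for each query, the training set that gives the highest score. A minimal sketch of that step is shown below, using the Majority scores of three rows of Table S2. The function name is hypothetical, and treating the query prefixes PLAS, THR and TRY as the expected plasminogen, thrombin and trypsin subfamilies is an assumption based on the naming of the entries.

```python
from typing import Dict


def best_subfamily(scores: Dict[str, float]) -> str:
    """Return the subfamily whose training set gives the highest score.

    On a tie, the first subfamily in insertion order is returned.
    """
    return max(scores, key=lambda name: scores[name])


# Majority scores copied from three rows of Table S2.
majority_scores = {
    "PLAS_P00750": {"trypsin": 0.571429, "thrombin": 0.666667, "plasminogen": 0.952381},
    "THR_135806": {"trypsin": 0.619048, "thrombin": 1.0, "plasminogen": 0.619048},
    "TRY_P00765": {"trypsin": 0.809524, "thrombin": 0.666667, "plasminogen": 0.571429},
}

predicted = {query: best_subfamily(s) for query, s in majority_scores.items()}
```

For these three rows the highest Majority score falls in the subfamily suggested by the query prefix. Rows in which two training sets tie, or in which all scores are low, would need a tie-breaking rule or a minimum-score threshold, neither of which is specified here.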
