Abstract
BACKGROUND: Machine learning (ML) has a number of demonstrated applications in protein prediction tasks such as protein structure prediction. To speed the development of ML-based tools and their release to the community, we have developed a package that characterizes several aspects of a protein commonly used in ML-based protein prediction tasks.
Year: 2014 PMID: 25406415 PMCID: PMC4246511 DOI: 10.1186/1756-0500-7-810
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. PCP-ML can be used to create a quick, stand-alone feature generation program or in conjunction with other libraries. The left side of the figure illustrates how PCP-ML can create a feature file that is then fed into an existing ML tool (e.g., SVMlight), which makes the final prediction. The right side of the figure shows PCP-ML packaged with other machine learning libraries into a complete, custom solution for protein prediction tasks; in this case, a single program reads the input and generates the predictions. The round boxes represent libraries or packages and the labelled boxes represent programs or scripts.
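
The left-hand workflow depends on a feature file that an external tool can read. As a minimal sketch (written in Python for illustration, not PCP-ML's own API), the snippet below formats feature vectors in the sparse text layout that SVMlight accepts: a target value followed by `index:value` pairs with 1-based, increasing indices. The names `format_svmlight_line` and `write_svmlight_file` are hypothetical.

```python
def format_svmlight_line(target, features):
    """Format one example as an SVMlight input line.

    Hypothetical helper for illustration; not part of PCP-ML.
    Zero-valued features are omitted, as the sparse format allows.
    """
    pairs = [
        f"{index}:{value:.6g}"
        for index, value in enumerate(features, start=1)  # indices start at 1
        if value != 0
    ]
    return " ".join([str(target)] + pairs)


def write_svmlight_file(path, examples):
    """Write (target, feature_vector) pairs, one example per line."""
    with open(path, "w") as handle:
        for target, features in examples:
            handle.write(format_svmlight_line(target, features) + "\n")
```

For example, `format_svmlight_line(1, [0.5, 0.0, 0.25])` yields the line `1 1:0.5 3:0.25`, skipping the zero-valued second feature.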
Major methods provided by each component of PCP-ML
| Parsers and encoders | Characterizers | Feature writers and generators |
|---|---|---|
| ParseFastaSequences | AtchleyFactors | PrintFeatures |
| ParseSSProOutput | InterfaceContactPotentials | WriteFeatures |
| ParsePSIPredOutput | BetaContactPotentials | |
| ParseAsciiPSSM | SSComposition | |
| ParseAnchoredMSA | SAComposition | |
| ParseDSSPOutput | AAComposition | |
| HotEncodeAA | Hydrophobicity | |
| HotEncodeSS | CalculateR | |
| HotEncodeSA | CalculateCosine | |
| | ScaledOrderedMean | |
| | CalculateEntropy | |
Figure 2. Dataflow diagram for common uses of PCP-ML. The feature generation process accesses sequence data through the Parsers. Based on the prediction task, numerical data that characterizes a protein’s sequence is provided through the Characterizers and Encoders. The generated features are then saved to a file or passed to a machine learning prediction process to make an end prediction.
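
The parse, encode and characterize stages of this dataflow can be sketched in a few lines. The Python below is only an illustration under stated assumptions: the function names `parse_fasta`, `one_hot_encode` and `aa_composition` are hypothetical stand-ins and do not reproduce the signatures of PCP-ML's `ParseFastaSequences`, `HotEncodeAA` or `AAComposition`. It also assumes the 20 standard one-letter residue codes and reports composition as fractions rather than percentages.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may wrap; concatenate them under the current header.
            records[header] += line.upper()
    return records


def one_hot_encode(residue):
    """20-element indicator vector; all zeros for a non-standard residue."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def aa_composition(sequence):
    """Fraction of each standard amino acid in the sequence."""
    if not sequence:
        return [0.0] * len(AMINO_ACIDS)
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]
```

A feature vector for a whole protein could then be obtained with `aa_composition(parse_fasta(text)["some_header"])`, while per-residue features could be built by concatenating `one_hot_encode` vectors over a sequence window.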
Description of each Characterizer contained in PCP-ML
| Name of characterizer | Brief description of functionality provided |
|---|---|
| AtchleyFactors | Characterizes five major aspects of an amino acid with real number values. The values were obtained via a statistical analysis of amino acids with respect to polarity, secondary structure, molecular size, amino acid composition and charge. These values were reported in [ |
| InterfaceContactPotentials | Characterizes contact potential between two residues. These contact potentials come from a statistical analysis performed on contacts in protein interfaces. They were reported in [ |
| BetaContactPotentials | Characterizes the contact potential for two residues in two beta sheets. These values come from a study of contact potentials of residues in cross strand pairings in beta sheets. They were reported in [ |
| SSComposition | Determines the percentage of each secondary structure (SS) type in a string representing the secondary structure of the entire protein. |
| SAComposition | Determines the percentage of solvent accessibility from a string representing the solvent accessibility of the entire protein. |
| AAComposition | Determines the percentage of each amino acid in a protein sequence. |
| Hydrophobicity | Characterizes the hydrophobicity of a residue. These values come from a study on hydrophobicity and helical propensity in [ |
| CalculateR | Calculates the Pearson correlation coefficient for the elements of two feature vectors. |
| CalculateCosine | Calculates the cosine between two feature vectors. |
| ScaledOrderedMean | Calculates the nth ordered mean for the Amino Acid, Secondary Structure or Solvent Accessibility string. |
| CalculateEntropy | Calculates the Shannon entropy for a vector of probabilities. |
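
The three vector statistics in the table have standard textbook definitions, sketched below in Python. This is an illustration only: the function names are hypothetical and are not PCP-ML's `CalculateR`, `CalculateCosine` or `CalculateEntropy`, and the base-2 logarithm in the entropy is an assumption, since the table does not state the base. The sketch does not guard against zero-variance or zero-length vectors, which would cause a division by zero.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def shannon_entropy(probabilities):
    """Shannon entropy in bits (base 2 assumed); zero terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

Such statistics are typically applied to per-position profiles, for instance comparing two columns of a position-specific scoring matrix with `pearson_r` or scoring the conservation of a column with `shannon_entropy`.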