Jie Dong, Zhi-Jiang Yao, Lin Zhang, Feijun Luo, Qinlu Lin, Ai-Ping Lu, Alex F Chen, Dong-Sheng Cao.
Abstract
BACKGROUND: With the rapid development of biotechnology and information technology, publicly available data in chemistry and biology are growing explosively. The wealth of information in these data needs to be extracted and transformed into useful knowledge by data mining methods. Given the rate at which data accumulate in chemistry and biology, new tools that process and interpret large and complex interaction data are increasingly important. So far, no suitable toolkit effectively links chemical and biological space in terms of molecular representation. To explore these complex data further, an integrated toolkit for various molecular representations is urgently needed, one that can be easily combined with data mining algorithms to start a full data analysis pipeline.
Keywords: Bioinformatics; Chemoinformatics; Data integration; Molecular descriptors; Molecular representation; Python library
Year: 2018 PMID: 29556758 PMCID: PMC5861255 DOI: 10.1186/s13321-018-0270-2
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Molecular descriptors of chemicals calculated by PyBioMed
| Feature group | Features | Number of descriptors |
|---|---|---|
| Constitution | Molecular constitutional descriptors | 30 |
| Topology | Topological descriptors | 35 |
| Connectivity | Molecular connectivity indices | 44 |
| E-state | E-state descriptors | 237 |
| Basak | Basak descriptors | 21 |
| Burden | Burden descriptors | 64 |
| Kappa | Kappa shape descriptors | 7 |
| Autocorrelation | Moreau–Broto autocorrelation | 32 |
| | Moran autocorrelation | 32 |
| | Geary autocorrelation | 32 |
| Charge | Charge descriptors | 25 |
| Property | Molecular property | 6 |
| MOE-type | MOE-type descriptors | 60 |
| Pharmacophore | Chemically advanced template search (CATS) | 150 |
Molecular fingerprints of chemicals calculated by PyBioMed
| Feature group | Features | Number of descriptors |
|---|---|---|
| Substructure-based fingerprints | MACCS fingerprints | 166 |
| | E-state fingerprints | 79 |
| | Ghose–Crippen fingerprints | 110 |
| | FP3 fingerprints | 210 |
| | FP4 fingerprints | 307 |
| | PubChem fingerprints | 881 |
| Fingerprints | Daylight-type fingerprints | 2048 |
| | Atom pairs fingerprints | 2048 |
| | Topological torsion fingerprints | 2048 |
| | FP2 fingerprints | 1024 |
| | ECFP2 fingerprints | 1024 |
| | ECFP4 fingerprints | 1024 |
| | ECFP6 fingerprints | 1024 |
| | FCFP2 fingerprints | 1024 |
| | FCFP4 fingerprints | 1024 |
| | FCFP6 fingerprints | 1024 |
| | Morgan fingerprints | 1024 |
| | Pharm2D2point fingerprints | 135 |
| | Pharm2D3point fingerprints | 2135 |
Protein descriptors of proteins or peptides calculated by PyBioMed
| Feature group | Features | Number of descriptors |
|---|---|---|
| Amino acid composition | Amino acid composition | 20 |
| | Dipeptide composition | 400 |
| | Tripeptide composition | 8000 |
| Autocorrelation | Normalized Moreau–Broto autocorrelation | 240ᵃ |
| | Moran autocorrelation | 240ᵃ |
| | Geary autocorrelation | 240ᵃ |
| CTD | Composition | 21 |
| | Transition | 21 |
| | Distribution | 105 |
| Conjoint triad | Conjoint triad features | 343 |
| Quasi-sequence order | Sequence order coupling number | 60 |
| | Quasi-sequence order descriptors | 100 |
| Pseudo amino acid composition | Pseudo amino acid composition | 50ᵇ |
| | Amphiphilic pseudo amino acid composition | 50ᶜ |
ᵃ The number depends on the number of amino acid properties and the maximum lag chosen. The default uses eight types of properties and lag = 30
ᵇ The number depends on the set of amino acid properties and the λ value chosen. The default uses the three types of properties proposed by Chou et al. and λ = 30
ᶜ The number depends on the λ value chosen. The default is λ = 15
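The counts in the protein table follow from simple arithmetic on the default parameters. The sketch below reproduces several of them. The formulas are the commonly used definitions of these descriptors (properties × lag for autocorrelation, 20 + λ for pseudo amino acid composition, 20 + 2λ for the amphiphilic variant, 7 properties × 3 classes for CTD, 7 residue classes for the conjoint triad) and are assumptions here, not statements taken from the table. All function names are hypothetical and are not part of the PyBioMed API.

```python
"""Sketch: reproduce protein descriptor counts from default parameters.

All names here are hypothetical helpers, not PyBioMed functions.
"""

N_AMINO_ACIDS = 20  # standard amino acid alphabet


def kpeptide_size(k):
    # Number of possible peptides of length k (composition, dipeptide, tripeptide).
    return N_AMINO_ACIDS ** k


def autocorrelation_size(n_properties=8, max_lag=30):
    # One value per (property, lag) pair; defaults taken from footnote a.
    return n_properties * max_lag


def ctd_sizes(n_properties=7, n_classes=3):
    # Assumed CTD layout: 7 properties, each splitting residues into 3 classes.
    composition = n_properties * n_classes
    transition = n_properties * n_classes      # 3 class pairs per property
    distribution = n_properties * n_classes * 5  # 5 positions per class
    return composition, transition, distribution


def conjoint_triad_size(n_classes=7):
    # Assumed: triads over 7 residue classes.
    return n_classes ** 3


def pseaac_size(lam=30):
    # Assumed: 20 composition terms plus lambda sequence-order terms.
    return N_AMINO_ACIDS + lam


def apaac_size(lam=15):
    # Assumed: 20 composition terms plus 2 * lambda correlation terms.
    return N_AMINO_ACIDS + 2 * lam


if __name__ == "__main__":
    print("di/tripeptide:", kpeptide_size(2), kpeptide_size(3))
    print("autocorrelation:", autocorrelation_size())
    print("CTD:", ctd_sizes())
    print("conjoint triad:", conjoint_triad_size())
    print("PseAAC / APAAC:", pseaac_size(), apaac_size())
```

Changing `max_lag` or `lam` shows how the footnoted counts move with the parameters, for example `autocorrelation_size(max_lag=10)` gives 80 values per autocorrelation type.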
DNA descriptors of DNAs calculated by PyBioMed
| Feature group | Features | Number of descriptorsᵃ |
|---|---|---|
| Nucleic acid composition | Basic kmer | 16 |
| | Reverse complement kmer | 12 |
| Autocorrelation | Dinucleotide-based auto covariance | 76 |
| | Dinucleotide-based cross covariance | 2812 |
| | Dinucleotide-based auto-cross covariance | 2888 |
| | Trinucleotide-based auto covariance | 24 |
| | Trinucleotide-based cross covariance | 264 |
| | Trinucleotide-based auto-cross covariance | 288 |
| Pseudo nucleic acid composition | Pseudo dinucleotide composition | 18 |
| | Pseudo k-tuple nucleotide composition | 18 |
| | Parallel correlation pseudo dinucleotide composition | 18 |
| | Parallel correlation pseudo trinucleotide composition | 66 |
| | Series correlation pseudo dinucleotide composition | 90 |
| | Series correlation pseudo trinucleotide composition | 88 |
ᵃ The number depends on the parameter values chosen in the formula. The counts shown here are based on the default parameter values. For details, refer to the documentation section of the PyBioMed manual
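As an illustration of how some of the DNA counts arise, the following sketch counts basic k-mers in plain Python and computes the sizes of the covariance-based descriptors. It does not use the PyBioMed API, and all function names are hypothetical. The defaults of 38 dinucleotide indices, 12 trinucleotide indices and lag = 2 are assumptions, chosen because they reproduce the values in the table. The size formulas (N × lag, N × (N − 1) × lag and N² × lag) are the usual ones for auto, cross and auto-cross covariance.

```python
"""Sketch: basic k-mer counting and covariance descriptor sizes for DNA.

Hypothetical helpers for illustration; not PyBioMed functions.
"""
from itertools import product
from typing import Dict


def kmer_counts(sequence: str, k: int = 2) -> Dict[str, int]:
    # Start from every possible k-mer so the vector always has 4**k entries.
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
    return counts


def auto_covariance_size(n_indices: int, lag: int = 2) -> int:
    # One value per (index, lag) pair.
    return n_indices * lag


def cross_covariance_size(n_indices: int, lag: int = 2) -> int:
    # One value per ordered pair of distinct indices and lag.
    return n_indices * (n_indices - 1) * lag


def auto_cross_covariance_size(n_indices: int, lag: int = 2) -> int:
    # Auto covariance and cross covariance concatenated.
    return auto_covariance_size(n_indices, lag) + cross_covariance_size(n_indices, lag)


if __name__ == "__main__":
    counts = kmer_counts("ACGTAC", k=2)
    print(len(counts), counts["AC"])
    for label, n in (("dinucleotide", 38), ("trinucleotide", 12)):
        print(label,
              auto_covariance_size(n),
              cross_covariance_size(n),
              auto_cross_covariance_size(n))
```

With `k = 2` the count vector has 4² = 16 entries, which matches the basic kmer row of the table.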
Fig. 1 The main modules of the PyBioMed library and their applications in chemoinformatics, bioinformatics and the drug discovery process
Comparison of software packages for descriptor calculation
| Tool names | Retrieving molecules | Pretreating molecules | Chemical descriptors/fingerprints | Protein descriptors | DNA/RNA descriptors | Interaction descriptors |
|---|---|---|---|---|---|---|
| PyBioMed | √ | √ | √ | √ | √ | √ |
| PyDPI | √ | √ | √ | |||
| ChemoPy | √ | |||||
| Cinfony | √ | |||||
| RDKit | √ | √ | ||||
| CDK | √ | |||||
| rcdk | √ | |||||
| PaDEL | √ | |||||
| BioJava | √ | √ | √ | |||
| Rcpi | √ | √ | √ | |||
| ChemmineR | √ | √ | ||||
| Propy | √ | |||||
| RepDNA | √ |