Jie Dong1, Dong-Sheng Cao1, Hong-Yu Miao2, Shao Liu3, Bai-Chuan Deng4, Yong-Huan Yun4, Ning-Ning Wang1, Ai-Ping Lu5, Wen-Bin Zeng1, Alex F Chen1.
Abstract
BACKGROUND: Molecular descriptors and fingerprints are routinely used in QSAR/SAR analysis, virtual drug screening, compound search/ranking, drug ADME/T prediction and other drug discovery processes. Because calculating such quantitative representations of molecules can require substantial computational skill and effort, several tools have been developed to ease the process. However, users still face several hurdles before they can fully harness these tools. First, most of the tools are distributed as standalone software or packages that require configuration or programming effort from users. Second, many of the tools can calculate only a subset of molecular descriptors, so the results from multiple tools must be merged manually to produce a comprehensive set of descriptors. Third, some packages provide only application programming interfaces and are implemented in different programming languages, which poses additional challenges for integrating these tools.
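The manual merging step described in the second hurdle can be sketched as follows. This is a minimal illustration and not ChemDes code: the per-toolkit result dictionaries, the toolkit labels, the descriptor names and the values are all hypothetical placeholders, and prefixing each descriptor name with its toolkit label is only one assumed way to keep same-named descriptors from different tools apart.

```python
from typing import Dict


def merge_descriptor_tables(
    per_toolkit: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Merge descriptor values from several toolkits into one flat record.

    Each key is prefixed with the toolkit label so that descriptors that
    share a name across toolkits do not overwrite each other.
    """
    merged: Dict[str, float] = {}
    for toolkit, descriptors in per_toolkit.items():
        for name, value in descriptors.items():
            merged[f"{toolkit}:{name}"] = value
    return merged


# Hypothetical outputs for one molecule (placeholder names and values).
results = {
    "toolkit_a": {"desc_x": 1.0, "desc_y": 2.0},
    "toolkit_b": {"desc_x": 1.5, "desc_z": 3.0},
}
merged = merge_descriptor_tables(results)
print(sorted(merged))
```

Both toolkits report a `desc_x` here, and the prefixed keys keep both values; whether two such descriptors are actually equivalent is a separate question that this sketch does not address.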
Keywords: Chemoinformatics; Molecular descriptors; Molecular fingerprints; Molecular representation; Online descriptor calculation; QSAR/QSPR
Year: 2015 PMID: 26664458 PMCID: PMC4674923 DOI: 10.1186/s13321-015-0109-z
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 The APIs integrated in ChemDes. ChemDes integrated all the APIs related to molecular descriptors and fingerprints from six toolkits. The APIs from each toolkit are divided into two main parts: the APIs for molecular descriptors and the APIs for fingerprints
The list of molecular descriptors covered by ChemDes
| Type of descriptors | Dimension | Number of descriptors | The origin of features |
|---|---|---|---|
| Constitutional descriptors | 1 | 309 | A, B, C, D, E, F |
| Molecular format descriptors | 1 | 6 | A |
| Autocorrelation descriptors | 2 | 467 | B, C, E, F |
| Basak descriptors | 2 | 63 | B, E |
| BCUT descriptors | 2 | 12 | C, E |
| Burden descriptors | 2 | 160 | B, E |
| Connectivity descriptors | 2 | 194 | B, C, D, E, F |
| E-state descriptors | 2 | 734 | B, E |
| Kappa descriptors | 2 | 92 | B, C, E |
| Molecular property descriptors | 2 | 55 | A, B, C, D, E, F |
| Quantum chemical descriptors | 2 | 7 | C, E |
| Topological descriptors | 2 | 376 | B, C, D, E, F |
| MOE-type descriptors | 2 | 118 | B, D |
| Charge descriptors | 2 | 25 | B |
| 3D Autocorrelation descriptors | 3 | 80 | E |
| CPSA descriptors | 3 | 116 | B, C, E, F |
| RDF descriptors | 3 | 390 | B, E |
| Geometrical descriptors | 3 | 62 | B, C, E, F |
| MoRSE descriptors | 3 | 210 | B |
| WHIM descriptors | 3 | 195 | B, C, E, F |
A, B, C, D, E and F stand for Pybel, Chemopy, CDK, RDKit, PaDEL and BlueDesc, respectively
The list of molecular fingerprints covered by ChemDes
| Type of molecular fingerprints | The origin of algorithm |
|---|---|
| FP2 fingerprints | A |
| FP3 fingerprints | A |
| FP4 fingerprints | A, B |
| MACCS fingerprints | A, B, C, D, E, F |
| Daylight-type fingerprints | B |
| E-state fingerprints | B, C, D, E |
| Atom Pairs fingerprints | B, D, E |
| Torsions fingerprints | B, D |
| Morgan fingerprints | B, D |
| CDK fingerprints | C, E |
| Pubchem fingerprints | C, E |
| CDK extended fingerprints | C, E |
| Klekota-Roth fingerprints | C, E |
| GraphOnly fingerprints | C, E |
| Hybridization fingerprints | C |
| Substructure fingerprints | C, E |
| RDK fingerprints | D |
| Layered fingerprints | D |
| Pattern fingerprints | D |
| Klekota-Roth fingerprint count | E |
| Substructure fingerprint count | E |
| 2D atom pairs count | E |
| Others (a) | F |
A, B, C, D, E and F stand for Pybel, Chemopy, CDK, RDKit, PaDEL and jCompoundMapper, respectively
(a) Fingerprints from jCompoundMapper: DFS, ASP, AP2D, AT2D, AP3D, AT3D, CATS2D, CATS3D, PHAP2POINT2D, PHAP3POINT2D, PHAP2POINT3D, PHAP3POINT3D, ECFP, ECFPVariant, LSTAR, SHED, RAD2D, RAD3D, MACCS
Fig. 2 The relationship between the descriptors calculated by different toolkits. Circles of different colors represent the descriptors from different toolkits. The area of each circle is proportional to the number of descriptors in that toolkit, and the area of each intersection is proportional to the number of descriptors the overlapping toolkits share
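The quantities behind such a diagram (set sizes and pairwise intersection sizes) can be sketched as below. This is an illustration under stated assumptions, not the procedure used for the figure: the toolkit labels and descriptor names are hypothetical placeholders, and treating two descriptors as shared when their names match exactly is an assumption made only for this example.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_overlap(
    sets_by_toolkit: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], int]:
    """Count the descriptors shared by each pair of toolkits.

    Descriptors are matched by exact name (an assumption of this sketch).
    Pairs are reported with the toolkit labels in alphabetical order.
    """
    overlaps: Dict[Tuple[str, str], int] = {}
    for first, second in combinations(sorted(sets_by_toolkit), 2):
        shared = sets_by_toolkit[first] & sets_by_toolkit[second]
        overlaps[(first, second)] = len(shared)
    return overlaps


# Hypothetical descriptor name sets (placeholders, not real toolkit output).
descriptor_sets = {
    "toolkit_a": {"d1", "d2", "d3"},
    "toolkit_b": {"d2", "d3", "d4", "d5"},
    "toolkit_c": {"d5"},
}
sizes = {name: len(names) for name, names in descriptor_sets.items()}
overlaps = pairwise_overlap(descriptor_sets)
print(sizes)
print(overlaps)
```

In a diagram of the kind described, `sizes` would drive the circle areas and `overlaps` the intersection areas.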