Cheng Zhang, Lin Tao, Chu Qin, Peng Zhang, Shangying Chen, Xian Zeng, Feng Xu, Zhe Chen, Sheng Yong Yang, Yu Zong Chen.
Abstract
Similarity-based clustering and classification of compounds enable the search for drug leads and support structural and chemogenomic studies that facilitate chemical, biomedical, agricultural, material and other industrial applications. A database that organizes compounds into similarity-based as well as scaffold-based and property-based families is useful for these tasks. The CFam Chemical Family database (http://bidd2.cse.nus.edu.sg/cfam) was developed to hierarchically cluster drugs, bioactive molecules, human metabolites, natural products, patented agents and other molecules into functional families, superfamilies and classes of structurally similar compounds based on literature-reported high, intermediate and remote similarity measures. The compounds were represented by molecular fingerprints, and molecular similarity was measured by the Tanimoto coefficient. The functional seeds of CFam families were drawn from hierarchically clustered drugs, bioactive molecules, human metabolites, natural products and patented agents, respectively, and were used to characterize families and to cluster compounds into families, superfamilies and classes. CFam currently contains 11,643 classes, 34,880 superfamilies and 87,136 families of 490,279 compounds (1691 approved drugs, 1228 clinical trial drugs, 12,386 investigative drugs, 262,881 highly active molecules, 15,055 human metabolites, 80,255 ZINC-processed natural products and 116,783 patented agents). Efforts will be made to further expand the CFam database and to add more functional categories and families based on other types of molecular representations.
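The similarity measure named in the abstract can be illustrated with a minimal Python sketch. It treats a fingerprint as a set of on-bit indices and computes the Tanimoto coefficient as the number of shared bits divided by the number of bits set in either fingerprint. The three cutoffs, the function names `similarity_tier` and `best_seed`, and the toy fingerprints are assumptions made for illustration; the abstract does not state the numerical thresholds behind "high", "intermediate" and "remote" similarity, nor the fingerprint type used.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical cutoffs for illustration only; the abstract does not state
# the values behind "high", "intermediate" and "remote" similarity.
FAMILY_CUTOFF = 0.85
SUPERFAMILY_CUTOFF = 0.70
CLASS_CUTOFF = 0.55


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_tier(score: float) -> Optional[str]:
    """Map a score to the tightest tier whose (assumed) cutoff it meets."""
    if score >= FAMILY_CUTOFF:
        return "family"
    if score >= SUPERFAMILY_CUTOFF:
        return "superfamily"
    if score >= CLASS_CUTOFF:
        return "class"
    return None


def best_seed(query: Set[int], seeds: Dict[str, Set[int]]) -> Tuple[str, float]:
    """Return the seed name with the highest Tanimoto score against the query."""
    name = max(seeds, key=lambda s: tanimoto(query, seeds[s]))
    return name, tanimoto(query, seeds[name])


# Toy fingerprints (made-up bit positions, not real molecules).
seeds = {"seed_A": {1, 2, 3, 4, 5, 6}, "seed_B": {10, 11, 12}}
query = {1, 2, 3, 4, 5, 7}
name, score = best_seed(query, seeds)
print(name, round(score, 3), similarity_tier(score))  # seed_A 0.714 superfamily
```

In this toy case the query shares five of seven distinct bits with `seed_A`, giving a score of 5/7, which falls in the assumed "superfamily" band. A production workflow would typically compute fingerprints with a cheminformatics toolkit instead of hand-written bit sets.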
Year: 2014 PMID: 25414339 PMCID: PMC4383987 DOI: 10.1093/nar/gku1212
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The statistics of CFam seeds, compounds, families, superfamilies and classes with respect to the seven functional categories of compounds: approved drugs, clinical trial drugs, investigative drugs, bioactives (currently highly active molecules), human metabolites, ZINC-processed natural products and patented agents.
| Functional category | Number of seeds | Number of seeds and members | Number of families | Number of superfamilies | Number of classes |
|---|---|---|---|---|---|
| Approved Drugs | 1691 | 95 367 (4121 HM, 19 408 NP) | 1114 | 937 | 813 |
| Clinical Trial Drugs | 1168 | 38 981 (551 HM, 3258 NP) | 863 | 756 | 537 |
| Investigative Drugs | 11 093 | 93 191 (4321 HM, 11 881 NP) | 4226 | 2870 | 1700 |
| Bioactives | 98 523 | 171 162 (833 HM, 24 439 NP) | 29 983 | 15 088 | 4035 |
| Human Metabolites | 5229 | 10 408 (5229 HM, 1820 NP) | 2058 | 1377 | 709 |
| Natural Products | 19 449 | 20 821 | 4017 | 1517 | 394 |
| Patented Agents | 60 349 | 60 349 | 44 875 | 12 335 | 3455 |
| Total | 197 502 | 490 279 | 87 136 | 34 880 | 11 643 |
The numbers of members of these families from the two categories of special interest, human metabolites (HM) and natural products (NP), are also provided.
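As a quick consistency check, the Total row of the table can be recomputed from the seven category rows. The sketch below transcribes the table values by hand (the variable names are illustrative) and sums each column; the column sums agree with the reported totals.

```python
# Columns: seeds, seeds and members, families, superfamilies, classes.
# Values transcribed from the table above.
rows = {
    "Approved Drugs":       (1691,  95367,  1114,   937,  813),
    "Clinical Trial Drugs": (1168,  38981,   863,   756,  537),
    "Investigative Drugs":  (11093, 93191,  4226,  2870, 1700),
    "Bioactives":           (98523, 171162, 29983, 15088, 4035),
    "Human Metabolites":    (5229,  10408,  2058,  1377,  709),
    "Natural Products":     (19449, 20821,  4017,  1517,  394),
    "Patented Agents":      (60349, 60349, 44875, 12335, 3455),
}
reported_total = (197502, 490279, 87136, 34880, 11643)

# Transpose the rows into columns and sum each one.
column_sums = tuple(sum(col) for col in zip(*rows.values()))
print(column_sums == reported_total)  # True
```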
Figure 1. CFam web interface. CFam is searchable in three modes: searching by compound or family name or ID; browsing CFam families, superfamilies and classes; and aligning a compound against CFam families.
Figure 2. A CFam molecule page resulting from a name search with the input ‘aspirin’ and the option ‘molecule’ selected.
Figure 3. The CFam approved drug families browsing page, reached by clicking ‘Family’ in the section titled ‘Browse CFam Family/Superfamily/Class by Functional Category’ and then ‘Approved Drug Families’ in that section.
Figure 4. The CFam result page for the alignment of aspirin with CFam seeds.