Christopher Southan, Kiran Boppana, Sarma Arp Jagarlapudi, Sorel Muresan.
Abstract
BACKGROUND: Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert curation, of bioactivity data extracted from patents and journal papers over the last 30 years.
Year: 2011 PMID: 21569515 PMCID: PMC3118229 DOI: 10.1186/1758-2946-3-14
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1. Depiction of the key entities and the relationships between them (D-A-R-C-P) used to populate the MCD and TGD databases.
Content statistics and stringency triages for the combination of MCD and TGD.
| Entity Type | Count |
|---|---|
| Total records | 4442492 |
| Unique compound structures | 2856336 |
| Unique compound structures from patents | 2118101 |
| Unique compound structures from journals | 846026 |
| Total quantitative assay results | 10294189 |
| Quantitative assay results from papers | 5149097 |
| Quantitative assay results from patents | 5145092 |
| Total documents | 127330 |
| Journal articles | 79487 |
| Patents | 47843 |
| Type-B assay results | 4841851 |
| Target names (all species) with type-B assay results | 5334 |
| Protein identifiers (all species) with type-B assay results | 4043 |
| Human proteins with type-B assay results | 1736 |
| Human gene identifiers with type-B assay results | 1654 |
| Unique compounds linked to human protein identifiers with type-B assay results | 823179 |
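The counts in this table can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that every unique structure comes from a patent, a journal, or both, so that the overlap between the two sources follows from inclusion-exclusion. The variable names are hypothetical, and the derived overlap is a back-of-the-envelope figure, not one reported in the source.

```python
# Counts copied from the content statistics table above.
unique_total = 2_856_336      # unique compound structures
unique_patents = 2_118_101    # unique structures from patents
unique_journals = 846_026     # unique structures from journals

# Inclusion-exclusion, ASSUMING total = |patents UNION journals|:
# |P AND J| = |P| + |J| - |P UNION J|
overlap = unique_patents + unique_journals - unique_total
patent_only = unique_patents - overlap
journal_only = unique_journals - overlap

# The assay result and document rows should each sum to their totals.
results_consistent = 5_149_097 + 5_145_092 == 10_294_189
documents_consistent = 79_487 + 47_843 == 127_330

print(f"structures in both sources (derived): {overlap}")
print(f"patent-only: {patent_only}, journal-only: {journal_only}")
print(f"assay results sum to total: {results_consistent}")
print(f"documents sum to total: {documents_consistent}")
```

Under that assumption, the structures shared by patents and journals are a small fraction of the total, which is what the gap between the per-source counts and the overall unique count implies.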
Protein Identifier Content for Additional file 1
| Entity type | Count |
|---|---|
| Distinct protein names | 1736 |
| Entrez Gene ID (EGIDs) | 1654 |
| Symbols | 1654 |
| Symbol matching HGNC | 1638 |
| Splice form names | 135 |
| EGIDs with Splice forms | 48 |
Ranking of top-50 targets by numbers of compounds and documents.
| Rank | Approved symbol | Entrez Gene ID | No. of compounds | No. of documents |
|---|---|---|---|---|
| 1 | F10 | 2159 | 42869 | 690 |
| 2 | CNR1 | 1268 | 29658 | 578 |
| 3 | KDR | 3791 | 27661 | 350 |
| 4 | MAPK14 | 1432 | 24568 | 309 |
| 5 | DRD3 | 1814 | 23405 | 508 |
| 6 | F2 | 2147 | 22853 | 768 |
| 7 | HRH3 | 11255 | 22748 | 407 |
| 8 | TACR1 | 6869 | 21908 | 626 |
| 9 | MMP13 | 4322 | 20590 | 315 |
| 10 | CNR2 | 1269 | 19712 | 464 |
| 11 | MMP1 | 4312 | 17525 | 394 |
| 12 | ADORA2A | 135 | 17181 | 532 |
| 13 | EGFR | 1956 | 16581 | 445 |
| 14 | SLC6A4 | 6532 | 16571 | 403 |
| 15 | MMP9 | 4318 | 16537 | 344 |
| 16 | HTR6 | 3362 | 16457 | 504 |
| 17 | MMP2 | 4313 | 16405 | 310 |
| 18 | HTR2C | 3358 | 15945 | 475 |
| 19 | CRHR1 | 1394 | 15550 | 222 |
| 20 | MC4R | 4160 | 15084 | 299 |
| 21 | HTR2A | 3356 | 14622 | 509 |
| 22 | NPY5R | 4889 | 14547 | 216 |
| 23 | CCR3 | 1232 | 14136 | 114 |
| 24 | OPRM1 | 4988 | 13394 | 466 |
| 25 | DPP4 | 1803 | 13057 | 308 |
| 26 | REN | 5972 | 12894 | 438 |
| 27 | CALCRL | 10203 | 12615 | 137 |
| 28 | CTSS | 1520 | 12426 | 177 |
| 29 | CHRM3 | 1131 | 12398 | 412 |
| 30 | CCR2 | 729230 | 12208 | 160 |
| 31 | DRD2 | 1813 | 12050 | 564 |
| 32 | MET | 4233 | 11745 | 118 |
| 33 | ADORA1 | 134 | 11644 | 480 |
| 34 | GSK3B | 2932 | 11283 | 198 |
| 35 | CCR5 | 1234 | 11179 | 197 |
| 36 | CXCR2 | 3579 | 10851 | 183 |
| 37 | SRC | 6714 | 10838 | 282 |
| 38 | MCHR1 | 2847 | 10821 | 209 |
| 39 | EDNRA | 1909 | 10769 | 260 |
| 40 | NR3C1 | 2908 | 10687 | 199 |
| 41 | EDNRB | 1910 | 10601 | 239 |
| 42 | HTR1A | 3350 | 10015 | 627 |
| 43 | OPRK1 | 4986 | 9690 | 375 |
| 44 | TACR2 | 6865 | 9676 | 301 |
| 45 | SLC6A2 | 6530 | 9671 | 272 |
| 46 | ADORA3 | 140 | 9533 | 457 |
| 47 | OPRD1 | 4985 | 9500 | 394 |
| 48 | HSD11B1 | 3290 | 9334 | 151 |
| 49 | ELANE | 1991 | 9173 | 308 |
| 50 | TRPV1 | 7442 | 8988 | 150 |
Binned distribution of compounds-per-target.
| Compound bin | Targets above bin |
|---|---|
| 10000 | 42 |
| 5000 | 95 |
| 2000 | 194 |
| 1051 (90% total) | 278 |
| 1000 | 287 |
| 500 | 380 |
| 200 | 526 |
| 100 | 667 |
| 50 | 816 |
| 10 | 1194 |
| 2 | 1591 |
| 1 | 1736 |
This includes all entries in Additional file 1.
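The binned distribution is a cumulative count: each row gives the number of targets whose compound count reaches the bin value. A minimal sketch of that calculation follows, using the 50 compound counts from the top-50 ranking table as input. It assumes "above bin" means "at or above" (the bin of 1 covers all 1736 proteins), and the function name `targets_at_or_above` is hypothetical. Because the input is truncated at 50 targets, only thresholds higher than the 50th count (8988) can be reproduced this way.

```python
# Compound counts for the top-50 targets, in rank order (F10 first, TRPV1 last).
top50_compound_counts = [
    42869, 29658, 27661, 24568, 23405, 22853, 22748, 21908, 20590, 19712,
    17525, 17181, 16581, 16571, 16537, 16457, 16405, 15945, 15550, 15084,
    14622, 14547, 14136, 13394, 13057, 12894, 12615, 12426, 12398, 12208,
    12050, 11745, 11644, 11283, 11179, 10851, 10838, 10821, 10769, 10687,
    10601, 10015, 9690, 9676, 9671, 9533, 9500, 9334, 9173, 8988,
]


def targets_at_or_above(counts, threshold):
    """Count targets whose compound count is >= threshold (assumed bin rule)."""
    return sum(1 for count in counts if count >= threshold)


# Compare with the 10000 row of the binned distribution table.
n_at_10000 = targets_at_or_above(top50_compound_counts, 10000)
print(f"targets with at least 10000 compounds: {n_at_10000}")
```

The result for the 10000 threshold agrees with the binned table, since HTR1A (10015 compounds) is the last target in the ranking above that value.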
Figure 2. The molecular topology hierarchy exemplified for Atorvastatin (Lipitor).
Top-20 target rankings by compound count and molecular frameworks.
| Cmpd ranking | Target symbol | MF2 ranking | Target symbol | GS ranking | Target symbol |
|---|---|---|---|---|---|
| 1 | F10 | 1 | F10 | 1 | F10 |
| 2 | CNR1 | 7 | HRH3 | 7 | HRH3 |
| 3 | KDR | 2 | CNR1 | 3 | KDR |
| 4 | MAPK14 | 6 | F2 | 6 | F2 |
| 5 | DRD3 | 3 | KDR | 2 | CNR1 |
| 6 | F2 | 5 | DRD3 | 24 | OPRM1 |
| 7 | HRH3 | 28 | CTSS | 20 | MC4R |
| 8 | TACR1 | 8 | TACR1 | 5 | DRD3 |
| 9 | MMP13 | 10 | CNR2 | 8 | TACR1 |
| 10 | CNR2 | 12 | ADORA2A | 29 | CHRM3 |
| 11 | MMP1 | 4 | MAPK14 | 28 | CTSS |
| 12 | ADORA2A | 29 | CHRM3 | 26 | REN |
| 13 | EGFR | 9 | MMP13 | 10 | CNR2 |
| 14 | SLC6A4 | 26 | REN | 4 | MAPK14 |
| 15 | MMP9 | 24 | OPRM1 | 12 | ADORA2A |
| 16 | HTR6 | 23 | CCR3 | 31 | DRD2 |
| 17 | MMP2 | 20 | MC4R | 38 | MCHR1 |
| 18 | HTR2C | 17 | MMP2 | 37 | SRC |
| 19 | CRHR1 | 11 | MMP1 | 16 | HTR6 |
| 20 | MC4R | 25 | DPP4 | 49 | ELANE |
Columns 1 and 2 are the compound count ranking, columns 3 and 4 show the molecular framework 2 (MF2) ranking, and columns 5 and 6 show the graph scaffold (GS) ranking.
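One way to read the table is to measure how far each target moves between the two orderings. The sketch below rests on an interpretation that the source does not state explicitly: that the rows of columns 3 and 4 are sorted by MF2 count, and that the number shown is the target's compound-count rank. Under that assumption, a positive shift means the target ranks higher by MF2 than by compound count. The names `mf2_order` and `shifts` are hypothetical.

```python
# (compound-count rank, symbol) pairs from columns 3 and 4, in row order.
# ASSUMPTION: row position (1..20) is the target's rank by MF2 count.
mf2_order = [
    (1, "F10"), (7, "HRH3"), (2, "CNR1"), (6, "F2"), (3, "KDR"),
    (5, "DRD3"), (28, "CTSS"), (8, "TACR1"), (10, "CNR2"), (12, "ADORA2A"),
    (4, "MAPK14"), (29, "CHRM3"), (9, "MMP13"), (26, "REN"), (24, "OPRM1"),
    (23, "CCR3"), (20, "MC4R"), (17, "MMP2"), (11, "MMP1"), (25, "DPP4"),
]

# Shift = compound rank minus MF2 position; positive means a gain in the MF2 ordering.
shifts = {
    symbol: compound_rank - mf2_position
    for mf2_position, (compound_rank, symbol) in enumerate(mf2_order, start=1)
}

largest_gain = max(shifts, key=shifts.get)
largest_drop = min(shifts, key=shifts.get)
print(f"largest gain: {largest_gain} ({shifts[largest_gain]:+d})")
print(f"largest drop: {largest_drop} ({shifts[largest_drop]:+d})")
```

The same calculation applies to columns 5 and 6 for the graph scaffold ranking.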
Figure 3. Sorted MF2 to number of compounds ratio (a) and Graph Scaffold to number of compounds ratio (b). This is plotted for all targets with more than 4869 compounds from Additional file 1.
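The quantity plotted in the figure is a per-target ratio of scaffold count to compound count, restricted to targets above a compound threshold. A minimal sketch of that calculation is given below. The per-target MF2 counts are not listed in this record, so the input rows are made-up placeholders (`TARGET_A` and so on), and both the function name `scaffold_ratios` and the ascending sort order are assumptions.

```python
def scaffold_ratios(records, min_compounds=4869):
    """Return (symbol, scaffolds / compounds) sorted by ratio, ascending.

    records: iterable of (symbol, n_compounds, n_scaffolds) tuples.
    Only targets with MORE than min_compounds compounds are kept,
    matching the threshold in the figure caption.
    """
    kept = [
        (symbol, n_scaffolds / n_compounds)
        for symbol, n_compounds, n_scaffolds in records
        if n_compounds > min_compounds
    ]
    return sorted(kept, key=lambda item: item[1])


# PLACEHOLDER values for illustration only; not data from the source.
example_records = [
    ("TARGET_A", 10000, 2500),
    ("TARGET_B", 6000, 3000),
    ("TARGET_C", 4000, 3900),  # dropped: not above the 4869 threshold
]

ratios = scaffold_ratios(example_records)
for symbol, ratio in ratios:
    print(f"{symbol}: {ratio:.2f}")
```

A low ratio indicates many compounds per scaffold, while a ratio near 1 indicates that almost every compound has its own scaffold.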