Magdalyn E Elkin, Xingquan Zhu.
Abstract
Clinical trials are crucial for the advancement of treatment and knowledge within the medical community. Although the ClinicalTrials.gov initiative has produced a rich source of information for clinical trial research, only a handful of analytic studies have been carried out to understand this valuable data source. Analysis of this database provides insight into emerging trends in clinical research. In this study, we propose to use network analysis to understand infectious disease clinical trial research. Our goal is to address two important issues related to clinical trials: (1) the concentrations and characteristics of infectious disease clinical trial research, and (2) the recommendation of clinical trials to a sponsor (or an investigator). The first issue helps summarize clinical trial research related to one or more particular diseases, and the second helps match clinical trial sponsors and investigators for information recommendation. Using 4228 clinical trials as the test bed, our study investigates 4864 sponsors and 1879 research areas characterized by Medical Subject Heading (MeSH) keywords. We use a network to characterize infectious disease clinical trials, and design a new community-topic-based link prediction approach to predict sponsors' interests. Our design relies on network modeling of both clinical trial sponsors and keywords. For sponsors, we extract communities, each consisting of sponsors with coherent interests. For keywords, we extract topics, each containing semantically consistent keywords. The communities and topics are combined for accurate clinical trial recommendation. This transformative study concludes that network analysis can greatly help the understanding of clinical trial research for effective summarization, characterization, and prediction.
Keywords: Clinical trials; Link prediction; Network community; Recommendation
Year: 2021 PMID: 34254037 PMCID: PMC8262767 DOI: 10.1007/s13721-021-00321-7
Source DB: PubMed Journal: Netw Model Anal Health Inform Bioinform ISSN: 2192-6670
Fig. 1 A conceptual view of a bipartite graph for modeling clinical trial sponsor-area relationships. (a) A bipartite network in which the upper pink squares denote sponsors and the lower blue circles denote research areas. A blue solid line denotes an edge, indicating that a sponsor has conducted a clinical trial on the connected area. The brown dot-dash line separates the network into communities, suggesting that sponsors and their research areas fall into two groups. The red dashed line (with a question mark) is a predicted link, predicting that the sponsor is interested in the area even though the connection does not currently exist. (b) The two-mode network of the bipartite network in (a). (c) The one-mode network, which omits the sponsor nodes of the bipartite graph; two area nodes are connected if they both connect to the same sponsor node in (a). (d) A closed 4-path (lower) and an open 4-path. A closed 4-path in (d) forms a cycle in the one-mode network in (c).
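To make the one-mode projection in Fig. 1c concrete, the following Python sketch builds the area-area network from a list of (sponsor, area) pairs. It is an illustration under stated assumptions, not the authors' code: the function name `project_onto_areas`, the toy sponsors, and the shared-sponsor edge weight (the caption only requires at least one shared sponsor) are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project_onto_areas(edges):
    """Project a sponsor-area bipartite edge list onto research areas.

    Two areas are linked when at least one sponsor connects to both.
    The weight (an illustrative addition) counts how many sponsors they share.
    """
    areas_by_sponsor = defaultdict(set)
    for sponsor, area in edges:
        areas_by_sponsor[sponsor].add(area)

    weights = defaultdict(int)
    for areas in areas_by_sponsor.values():
        # Every pair of areas studied by the same sponsor becomes an edge.
        for a, b in combinations(sorted(areas), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical toy data: sponsor IDs are made up.
edges = [
    ("s1", "HIV Infections"), ("s1", "Tuberculosis"),
    ("s2", "HIV Infections"), ("s2", "Tuberculosis"), ("s2", "Malaria"),
    ("s3", "Malaria"),
]
print(project_onto_areas(edges))
```

In this toy input, "HIV Infections" and "Tuberculosis" receive weight 2 because sponsors s1 and s2 both study them; ignoring the weights recovers the unweighted one-mode network of Fig. 1c.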
Fig. 6 The structure of a valid community and an invalid community: (a) a valid community consisting of 18 nodes; (b) an invalid community consisting of 5 nodes. The pink squares indicate sponsors and the blue circles indicate research areas.
Fig. 2 Comparison between the community-based approach (Elkin et al. 2019) (a) and the proposed community-topic-based link prediction (b) for recommendation. The community-based approach relies on community structure for recommendation; therefore, it cannot recommend links for sponsors in an invalid community (e.g., the purple dashed line with a question mark from a sponsor to a keyword). In comparison, the community-topic approach finds communities among sponsors and topics among keywords. Although the sponsor is in an invalid community, its existing link to a topic helps recommend a connection to a keyword within the same topic.
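The intuition behind Fig. 2b can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical scoring rule, not the paper's community-topic link prediction algorithm: each candidate keyword is scored by how many of the sponsor's existing keywords fall into the same topic, so a sponsor can receive recommendations even without a valid community. The names `recommend_by_topic` and `topic_of` are illustrative, and the toy topic assignment borrows three construct groups from the topic table in this entry.

```python
from collections import Counter

def recommend_by_topic(sponsor_keywords, topic_of, k=3):
    """Rank keywords a sponsor has not used yet by topic overlap (hypothetical rule).

    sponsor_keywords: set of keywords already linked to the sponsor.
    topic_of: dict mapping each keyword to its topic label.
    A candidate's score is the number of the sponsor's keywords in its topic.
    """
    topic_counts = Counter(topic_of[kw] for kw in sponsor_keywords if kw in topic_of)
    scored = [
        (topic_counts[topic], kw)
        for kw, topic in topic_of.items()
        if kw not in sponsor_keywords and topic_counts[topic] > 0
    ]
    # Highest score first; ties broken alphabetically so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kw for _, kw in scored[:k]]

# Toy topic assignment borrowing three construct groups from the topic table.
topic_of = {
    "Measles": "MMR", "Mumps": "MMR", "Rubella": "MMR",
    "Vitamin A": "Vitamin A", "Night Blindness": "Vitamin A",
    "Retinol palmitate": "Vitamin A",
    "Bell Palsy": "Facial Paralysis", "Facial Nerve Diseases": "Facial Paralysis",
}
# A hypothetical sponsor, possibly in an invalid community, with three keywords.
print(recommend_by_topic({"Measles", "Mumps", "Vitamin A"}, topic_of))
```

Here the hypothetical sponsor already works on two MMR keywords and one Vitamin A keyword, so "Rubella" ranks first; the alphabetical tie-break exists only to keep the example deterministic.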
Fig. 3 Degree distributions on a log-log scale. The x axis denotes node degree, and the y axis denotes the number of nodes with the specified degree: (a) research area nodes and (b) sponsor nodes
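Degree distributions such as those in Fig. 3 can be tabulated directly from the sponsor-area edge list. The Python sketch below is an illustration with hypothetical function names and toy edges: it counts how many nodes have each degree, separately for sponsors and research areas, and converts the counts to log10 coordinates for a log-log plot.

```python
import math
from collections import Counter

def degree_distributions(edges):
    """Map degree -> number of nodes, for sponsors and for research areas."""
    unique = set(edges)  # a repeated (sponsor, area) pair counts as one edge
    sponsor_degree = Counter(sponsor for sponsor, _ in unique)
    area_degree = Counter(area for _, area in unique)
    return Counter(sponsor_degree.values()), Counter(area_degree.values())

def loglog_points(distribution):
    """(log10 degree, log10 count) pairs, sorted by degree, for plotting."""
    return sorted(
        (math.log10(degree), math.log10(count))
        for degree, count in distribution.items()
    )

# Hypothetical toy edges (sponsor, research area).
edges = [
    ("s1", "HIV Infections"), ("s1", "Tuberculosis"),
    ("s2", "HIV Infections"), ("s2", "Tuberculosis"), ("s2", "Malaria"),
    ("s3", "Malaria"),
]
sponsor_dist, area_dist = degree_distributions(edges)
print(sponsor_dist)
print(loglog_points(area_dist))
```

Both axes use log10 because a heavy-tailed distribution spans several orders of magnitude in degree; every degree and count here is at least 1, so the logarithms are defined.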
Top 20 research area nodes by degree
| Research area node | Degree | Research area node | Degree |
|---|---|---|---|
| Infection | 864 | Toxemia | 212 |
| HIV Infections | 656 | Hepatitis C | 203 |
| Communicable Diseases | 637 | Human | 197 |
| Tuberculosis | 412 | Influenza | 193 |
| Pneumonia | 399 | Respiratory Tract Infections | 190 |
| Hepatitis | 309 | Acquired Immunodeficiency Syndrome | 172 |
| Sepsis | 295 | Chronic | 170 |
| Malaria | 259 | Vaccines | 168 |
| Anti-Bacterial Agents | 256 | Antibiotics | 166 |
| Hepatitis A | 235 | Antitubercular | 166 |
Neglected tropical disease research areas
| Research area node | Degree | Research area node | Degree |
|---|---|---|---|
| Leishmaniasis | 39 | Cysticercosis | 7 |
| Schistosomiasis | 25 | Hookworm Infections | 6 |
| Dengue | 23 | Onchocerciasis | 5 |
| Chagas Disease | 22 | Rabies | 4 |
| Leprosy | 13 | Severe Dengue | 4 |
| Filariasis | 12 | Trypanosomiasis | 4 |
| Helminthiasis | 11 | Echinococcosis | 3 |
| Taeniasis | 7 | Trachoma | 3 |
| Buruli Ulcer | 7 | Treponemal Infections | 1 |
Fig. 4 A sub-dendrogram at height 2000, showing four topic clusters. Red dots indicate where clusters were merged. The dashed line marks where the final partition separates the four clusters
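Fig. 4 cuts a dendrogram at a fixed height to obtain flat topic clusters. The paper's linkage criterion and keyword representation are not stated in this entry, so the Python sketch below assumes single linkage on Euclidean points. Under that assumption, cutting at height h is equivalent to joining every pair of points at distance at most h and taking connected components, which a small union-find implements. The function name `cut_single_linkage` and the toy points are hypothetical.

```python
import math

def cut_single_linkage(points, height):
    """Flat cluster labels from cutting a single-linkage dendrogram at `height`.

    Under single linkage, two points share a cluster below the cut exactly
    when a chain of pairwise distances <= height connects them, so merging
    all such pairs with union-find reproduces the cut.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= height:
                parent[find(i)] = find(j)

    # Relabel each root as a consecutive cluster id 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]

# Hypothetical 2-D keyword embeddings; the cut heights are arbitrary.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(cut_single_linkage(points, 2))
print(cut_single_linkage(points, 20))
```

Other linkage criteria (for example average or Ward linkage) do not reduce to this connected-components shortcut and need a full agglomerative pass.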
Fig. 5 Word clouds for keywords in two separate topic groups: (a) keywords within an oncology construct; (b) keywords within an HIV treatment construct
A subset of topic groups, a possible construct descriptor for each, and their keywords. The number beside each keyword is the frequency of that keyword, k, across all clinical trials
| Construct | Keywords (frequency) |
|---|---|
| Neuropathic | Postherpetic (5); Neuralgia (7); Trigeminal Nerve Injuries (1); Facial Pain (1); Pregabalin (1); Herpes Zoster Oticus (1) |
| Urinary System | Urinary Bladder (5); Overactive (3); Dyspareunia (2); Enuresis (1); Stress (1); Solifenacin Succinate (1); Urinary Incontinence (1) |
| Hand, Foot and Mouth Disease | Mouth Diseases (3); Hand (3); Foot-and-Mouth Disease (3); Magnesium Sulfate (2); Foot and Mouth Disease (3) |
| Infectious Disease | Whooping Cough (12); Diphtheria (5); Tetanus (4); Tetany (1); Haemophilus Infections (1) |
| MMR | Measles (3); Mumps (1); Rubella (1) |
| Sinus Infections | Triamcinolone (2); Triamcinolone diacetate (2); Frontal Sinusitis (1); Triamcinolone Acetonide (2); Triamcinolone hexacetonide (2) |
| Gastrointestinal System | Stomach Ulcer (4); Anorexia (2); Weight Loss (2); Duodenal Ulcer (1) |
| Vitamin A | Vitamin A (3); Night Blindness (1); Retinol palmitate (3); Vitamin A Deficiency (1) |
| Blood Clots | Mastoiditis (2); Intracranial (1); Thrombophilia (1); Lateral Sinus Thrombosis (1); Sinus Thrombosis (1) |
| Facial Paralysis | Paralysis (3); Bell Palsy (3); Facial Paralysis (3); Facial Nerve Diseases (1) |
Summary of community detection results
| Community type | Communities | Sponsors | Research areas | Average global coefficient | Reinforcement coefficient |
|---|---|---|---|---|---|
| Valid | 139 | 3662 | 1303 | .981 | .054 |
| Invalid | 339 | 1202 | 575 | NA | NA |
The columns give (1) the community type (valid or invalid), (2) the number of communities, (3) the number of sponsors (|s|), (4) the number of research areas (|k|), (5) the average global coefficient, and (6) the reinforcement coefficient, respectively
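The average global coefficient reported in the community summary is not defined in this entry. One published two-mode measure that matches the closed and open 4-paths of Fig. 1d is the clustering coefficient of Opsahl (2013): the fraction of 4-paths i-a-j-b-k (distinct areas i, j, k linked through distinct sponsors a and b) that are closed by a further sponsor shared by i and k. The Python sketch below computes that ratio by brute force, treating research areas as the primary mode. Whether the paper uses exactly this definition, and how it averages over communities, are assumptions; the function name and toy networks are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def two_mode_clustering(edges):
    """Share of 4-paths i-a-j-b-k that are closed, with areas as primary nodes.

    i, j, k are distinct research areas; a != b are sponsors with a linked
    to i and j, and b linked to j and k. The 4-path is closed when i and k
    also share a sponsor outside {a, b}. Brute force: toy sizes only.
    """
    sponsors_of = defaultdict(set)  # research area -> its sponsors
    for sponsor, area in set(edges):
        sponsors_of[area].add(sponsor)

    total = closed = 0
    for i, j, k in permutations(sponsors_of, 3):
        for a in sponsors_of[i] & sponsors_of[j]:
            for b in sponsors_of[j] & sponsors_of[k]:
                if a == b:
                    continue  # three areas joined by one sponsor is not a 4-path
                total += 1
                if (sponsors_of[i] & sponsors_of[k]) - {a, b}:
                    closed += 1
    return closed / total if total else 0.0

# Hypothetical toy networks of (sponsor, area) pairs: a 6-cycle,
# and the same cycle with its last edge removed.
closed_cycle = [("a", "X"), ("a", "Y"), ("b", "Y"), ("b", "Z"), ("c", "Z"), ("c", "X")]
open_chain = closed_cycle[:-1]
print(two_mode_clustering(closed_cycle), two_mode_clustering(open_chain))
```

Each 4-path is counted once in each direction, which leaves the ratio unchanged; the triple loop over areas is only suitable for toy-sized networks.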
Keywords within two communities
| Community | Keywords |
|---|---|
| 1 | Antibodies; Monoclonal; Immunological; Yellow Fever; Blocking; Antineoplastic Agents |
| 2 | Stomach Ulcer; Anorexia; Weight Loss |
Fig. 7 Link prediction accuracy comparison on valid community networks, with panels (a) and (b) using two different benchmark node sets. The x axis denotes the top-k prediction, and the y axis denotes the link prediction accuracy
Fig. 8 Link prediction accuracy on the valid community network using a benchmark node set. The x axis denotes the top-k prediction, and the y axis denotes the accuracy
Fig. 9 Link prediction accuracy comparison on valid community networks, with panels (a) and (b) using two different benchmark node sets. The x axis denotes the top-k prediction, and the y axis denotes the link prediction accuracy
Fig. 10 Link prediction accuracy on the whole network using a benchmark node set. The x axis denotes the top-k prediction, and the y axis denotes the accuracy
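Figs. 7 to 10 report accuracy as a function of top-k. The exact accuracy definition is not given in this entry, so the Python sketch below assumes a common hit-rate formulation: a held-out (sponsor, area) link counts as recovered when the area appears among the sponsor's k highest-ranked candidates. The function name `top_k_accuracy` and the toy rankings are hypothetical.

```python
def top_k_accuracy(ranked, held_out, k):
    """Fraction of held-out (sponsor, area) links found in the top-k list.

    ranked: dict mapping sponsor -> candidate areas, best first.
    held_out: (sponsor, area) links hidden before making predictions.
    """
    held_out = list(held_out)
    if not held_out:
        return 0.0
    hits = sum(1 for sponsor, area in held_out if area in ranked.get(sponsor, [])[:k])
    return hits / len(held_out)

# Hypothetical rankings and held-out links; s3 received no predictions.
ranked = {"s1": ["Malaria", "Dengue", "Rabies"], "s2": ["Leprosy", "Trachoma"]}
held_out = [("s1", "Dengue"), ("s2", "Trachoma"), ("s3", "Rabies")]
curve = [top_k_accuracy(ranked, held_out, k) for k in range(1, 4)]
print(curve)
```

Sweeping k from 1 upward yields a non-decreasing curve, because every link recovered in a top-k list is also in the top-(k+1) list.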