Abstract
Researchers and clinicians face a significant challenge in keeping up to date with the rapid rate at which new associations between genetic mutations and diseases are reported. To address this problem, this research mined the ClinicalTrials.gov corpus to extract relevant biological insights, produce unique reports that summarize the findings, and make the metadata available via APIs. An automated text-analysis pipeline performed the following tasks: parsing the ClinicalTrials.gov files, extracting and analyzing mutations from the corpus, mapping clinical trials to the Human Phenotype Ontology (HPO), and finding associations between clinical trials and HPO nodes. Unique reports were created for each mutation (SNPs and protein mutations) mentioned in the corpus, as well as for each clinical trial that references a mutation. These reports, which have been run over multiple time points, along with APIs to access the metadata, are freely available at http://snpminertrials.com. Additionally, HPO was used to normalize disease terms and associate clinical trials with relevant genes. The creation of the pipeline and reports, the association of clinical trials with HPO terms, and the insights, public repository, and APIs produced are all novel in this work. The freely available resources present relevant biological information and novel insights into the relationships between biomedical entities in a robust and accessible manner, mitigating the challenge of staying informed about new associations between mutations, genes, and diseases.
Year: 2020 PMID: 32459809 PMCID: PMC7252633 DOI: 10.1371/journal.pone.0233438
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Software libraries used in this study.
| | Software | Details |
|---|---|---|
| 1 | SAX Parser | Parsing XML of Clinical Trials |
| 2 | Apache OpenNLP | NLP parser for SNP mutations |
| 3 | MutationFinder | Protein mutation detection |
| 4 | Bootstrap | CSS files for HTML |
| 5 | Amazon Web Services (AWS) | To host HTML reports |
| 6 | Jupyter Notebook (Google Colab) | Python example to access API |
| 7 | Java Client API | To access results programmatically |
The software tools used and their descriptions. Software libraries 1, 2, and 3 aided in locating mutations in the text files while libraries 4 and 5 facilitated the creation of the reports and website. Software tools 6 and 7 were employed to enhance the accessibility of the results.
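The first two roles in the table (streaming XML parsing and SNP detection) can be illustrated with a minimal Python sketch. It is not the study's implementation, which used a SAX parser together with Apache OpenNLP; here Python's standard `xml.sax` module stands in for the parser and a simple regular expression stands in for the NLP step. The sample record, its element names, and the helper names `TrialTextHandler` and `extract_rsids` are assumptions made for illustration.

```python
import re
import xml.sax

# Simplified rsID pattern (an assumption): "rs" followed by digits, as a whole word.
RSID_PATTERN = re.compile(r"\brs\d+\b")


class TrialTextHandler(xml.sax.ContentHandler):
    """Collects all character data from one trial record."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX may deliver one text node in several pieces, so only accumulate here.
        self.chunks.append(content)

    def text(self):
        # Join the pieces and collapse runs of whitespace.
        return " ".join("".join(self.chunks).split())


def extract_rsids(xml_text):
    """Return the sorted, de-duplicated rsIDs mentioned anywhere in the record."""
    handler = TrialTextHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return sorted(set(RSID_PATTERN.findall(handler.text())))


# Hypothetical, hand-written record; real ClinicalTrials.gov files are much larger.
SAMPLE = """<clinical_study>
  <brief_title>Genotype and hepatitis treatment response</brief_title>
  <detailed_description>
    <textblock>Participants are genotyped for rs12979860 and rs8099917.
    Carriers of rs12979860 are analyzed separately.</textblock>
  </detailed_description>
</clinical_study>"""

if __name__ == "__main__":
    print(extract_rsids(SAMPLE))
```

A streaming parser keeps memory use flat across a large corpus because no document tree is built; the regular expression is deliberately naive and would miss identifiers written with spaces or other variations.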
Fig 1. Seven steps of the pipeline.
Methodology to mine ClinicalTrials.gov to extract unique insights for understanding SNPs and mutations. Each of the steps is described in detail in the “Analysis Steps” section.
Most frequent RSids across ClinicalTrials.gov.
| | RSid | RSid Count | HPO Node | HPO Node Name | HPO Node Count |
|---|---|---|---|---|---|
| 1 | rs12979860 | 38 | HP:0012115 | Hepatitis | 33 |
| | | | HP:0200123 | Chronic hepatitis | 2 |
| | | | HP:0001402 | Hepatocellular carcinoma | 2 |
| | | | HP:0030731 | Carcinoma | 1 |
| | | | HP:0001392 | Abnormality of the liver | 1 |
| 2 | rs6971 | 26 | HP:0002511 | Alzheimer disease | 4 |
| | | | HP:0006802 | Abnormal anterior horn cell morphology | 2 |
| | | | HP:0007354 | Amyotrophic lateral sclerosis | 2 |
| | | | HP:0100753 | Schizophrenia | 1 |
| | | | HP:0000729 | Psychosis | 1 |
| | | | HP:0000709 | Encephalitis | 1 |
| | | | HP:0002383 | Psychosis | 1 |
| | | | HP:0000717 | Autism | 1 |
| | | | HP:0000716 | Depressivity | 1 |
| | | | HP:0002180 | Neurodegeneration | 1 |
| | | | HP:0001658 | Myocardial infarction | 1 |
| | | | HP:0001268 | Mental deterioration | 1 |
| 3 | rs9939609 | 11 | HP:0001513 | Obesity | 3 |
| | | | HP:0000819 | Diabetes mellitus | 1 |
| | | | HP:0001824 | Weight loss | 1 |
| | | | HP:0000855 | Insulin resistance | 1 |
| | | | HP:0100651 | Type I diabetes mellitus | 1 |
Most frequent RSids extracted across ClinicalTrials.gov.
HPO terms with the largest number of associated RSids.
| | HPO Id | Name | # | RSid |
|---|---|---|---|---|
| 1 | HP:0003002 | Breast carcinoma | 37 | rs1011970,rs10407022,rs1045485,rs10941679,rs10995190,rs11045585,rs11133360,rs11249433,rs12762549,rs13281615,rs13387042,rs16942,rs1800566,rs2002555,rs2046210,rs2237060,rs2241193,rs2297480,rs236114,rs2380205,rs271924,rs2981582,rs3803662,rs3817198,rs4073,rs4646,rs4973768,rs614367,rs6504950,rs704010,rs7333181,rs7349683,rs889312,rs909253,rs9344,rs9457827,rs999737 |
| 2 | HP:0100710 | Impulsivity | 23 | rs1042713,rs1079598,rs1150226,rs1549339,rs16111115,rs1672717,rs1800497,rs1800955,rs1801253,rs2242447,rs2278392,rs2550946,rs4532,rs4680,rs4994,rs518147,rs553668,rs5569,rs6269,rs6280,rs6295,rs6296,rs6311 |
| 3 | HP:0000718 | Aggressive behavior | 23 | rs1042713,rs1079598,rs1150226,rs1549339,rs16111115,rs1672717,rs1800497,rs1800955,rs1801253,rs2242447,rs2278392,rs2550946,rs4532,rs4680,rs4994,rs518147,rs553668,rs5569,rs6269,rs6280,rs6295,rs6296,rs6311 |
| 4 | HP:0000819 | Diabetes mellitus | 22 | rs10830963,rs12469968,rs13266634,rs2266782,rs2281135,rs2284872,rs2294918,rs35652124,rs35874116,rs35874116rs,rs3765467,rs3788979,rs5215,rs5219,rs738409,rs7565794,rs780094,rs780094s,rs78408340,rs7903146,rs9701796,rs9939609 |
| 5 | HP:0012115 | Hepatitis Chronic hepatitis | 20 | rs10813831,rs1127354,rs11795404,rs12356193,rs12979860,rs12992677,rs17037122,rs179008,rs2066842,rs2067085,rs2464266,rs3853839,rs41308230,rs4588,rs5743844,rs6592052,rs7041,rs7270101,rs7549785,rs8099917 |
| 6 | HP:0002099 | Asthma | 18 | rs1042711,rs1042713,rs1042714,rs1042718,rs11958940,rs11959427,rs12654778,rs12936231,rs1504982,rs17778257,rs1800888,rs1801275,rs1805010,rs2053044,rs2895795,rs324011,rs324015,rs4950928 |
| 7 | HP:0001257 | Spasticity | 18 | rs1049522,rs1049524,rs137852620,rs2032892,rs2269272,rs2269273,rs2562582,rs2731886,rs377637047,rs4869675,rs4869676,rs529802001,rs544684689,rs547987105,rs549927573,rs550842646,rs562696473,rs573562920 |
| 8 | HP:0001638 | Cardiomyopathy | 18 | rs1042522,rs1042522s,rs1056892,rs10836235,rs10865801,rs1128503,rs1149222,rs13058338,rs1465952,rs1786378374,rs1883112,rs2229774,rs2279744,rs35599367,rs3761624,rs45511401,rs4673,rs7853758 |
| 9 | HP:0001677 | Coronary artery atherosclerosis | 16 | rs10153820,rs1143623,rs1143633,rs1143634,rs12041331,rs16944,rs16969968,rs17561,rs1761667,rs2305619,rs4848306,rs6434222,rs7586970,rs7903146,rs8069645,rs8176528 |
| 10 | HP:0001909 | Leukemia | 15 | rs10509681,rs11572080,rs12459419,rs172378,rs2032582,rs230561,rs25531,rs3816527,rs396991,rs4880,rs4958351,rs6190,rs628031,rs776746,rs904627 |
The 368 clinical trials with RSids mapped to 136 unique HPO terms.
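A ranking like the one above can be derived by inverting a trial-level mapping into an HPO-term-to-RSid index and counting distinct RSids per term. The sketch below illustrates that bookkeeping only; the record layout, the trial identifiers, and the function names are hypothetical and are not taken from the study's code or API.

```python
from collections import defaultdict


def rsids_per_hpo(trials):
    """Map each HPO id to the set of distinct RSids seen in trials tagged with it."""
    index = defaultdict(set)
    for trial in trials:
        for hpo_id in trial["hpo_ids"]:
            index[hpo_id].update(trial["rsids"])
    return index


def rank_hpo_terms(trials):
    """Return (hpo_id, distinct RSid count) pairs, largest count first."""
    index = rsids_per_hpo(trials)
    counts = [(hpo_id, len(rsids)) for hpo_id, rsids in index.items()]
    # Ties are broken by HPO id so the order is deterministic.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical trial records, invented for illustration only.
TRIALS = [
    {"id": "trial-A", "hpo_ids": ["HP:0000819"], "rsids": ["rs7903146", "rs9939609"]},
    {"id": "trial-B", "hpo_ids": ["HP:0000819", "HP:0001677"], "rsids": ["rs7903146"]},
    {"id": "trial-C", "hpo_ids": ["HP:0001513"], "rsids": ["rs9939609"]},
]

if __name__ == "__main__":
    for hpo_id, count in rank_hpo_terms(TRIALS):
        print(hpo_id, count)
```

Using a set per term means an RSid mentioned in several trials for the same term is counted once, which matches a "number of associated RSids" reading of the `#` column; whether the study counted this way is an assumption.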
Most frequent mutations across ClinicalTrials.gov.
| | Mutation | Synonyms | Mutation Count | HPO Node | HPO Node Name | HPO Node Count |
|---|---|---|---|---|---|---|
| 1 | L858R | leucine to arginine at codon 858; leucine-to-arginine mutation at codon 858 | 293 | HP:0030358 | Non-small cell lung carcinoma | 233 |
| | | | | HP:0100526 | Neoplasm of the lung | 165 |
| | | | | HP:0030731 | Carcinoma | 16 |
| | | | | HP:0002664 | Neoplasm | 6 |
| | | | | HP:0030692 | Brain neoplasm | 4 |
| | | | | HP:0002088 | Abnormal lung morphology | 2 |
| | | | | HP:0012056 | Cutaneous melanoma | 2 |
| | | | | HP:0002202 | Pleural effusion | 2 |
| | | | | … | 14 more… | … |
| 2 | T790M | Thr790Met | 289 | HP:0030358 | Non-small cell lung carcinoma | 222 |
| | | | | HP:0100526 | Neoplasm of the lung | 154 |
| | | | | HP:0030731 | Carcinoma | 20 |
| | | | | HP:0002664 | Neoplasm | 10 |
| | | | | HP:0002088 | Abnormal lung morphology | 4 |
| | | | | HP:0030357 | Small cell lung carcinoma | 3 |
| | | | | HP:0005584 | Renal cell carcinoma | 2 |
| | | | | … | 17 more… | … |
| 3 | V600E | | 228 | HP:0012056 | Cutaneous melanoma | 98 |
| | | | | HP:0100834 | Neoplasm of the large intestine | 31 |
| | | | | HP:0030358 | Non-small cell lung carcinoma | 28 |
| | | | | HP:0100526 | Neoplasm of the lung | 25 |
| | | | | HP:0002664 | Neoplasm | 21 |
| | | | | HP:0030731 | Carcinoma | 15 |
| | | | | HP:0000854 | Thyroid adenoma | 13 |
| | | | | … | 53 more… | 13 |
| 4 | T315I | Thr315Ile; threonine 315 to isoleucine | 98 | HP:0001909 | Leukemia | 83 |
| | | | | HP:0005506 | Chronic myelogenous leukemia | 73 |
| | | | | HP:0012324 | Myeloid leukemia | 67 |
| | | | | HP:0005526 | Lymphoid leukemia | 23 |
| | | | | HP:0004808 | Acute myeloid leukemia | 5 |
| | | | | HP:0002863 | Myelodysplasia | 4 |
| | | | | … | 14 more… | … |
The top four commonly cited protein mutations across the clinical trials and their related HPO nodes.
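The synonym column pairs a one-letter form such as T790M with a three-letter form such as Thr790Met. The sketch below shows one way such mentions could be detected and normalized to a single key. It is a simplified stand-in and does not reproduce MutationFinder's actual patterns; the regular expressions and the function name `find_mutations` are assumptions, and spelled-out forms such as "leucine to arginine at codon 858" are not handled.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_LETTER = "|".join(THREE_TO_ONE)

# Simplified patterns (assumptions): wild-type residue, position, mutant residue.
ONE_LETTER_RE = re.compile(rf"\b([{ONE_LETTER}])(\d+)([{ONE_LETTER}])\b")
THREE_LETTER_RE = re.compile(rf"\b({THREE_LETTER})(\d+)({THREE_LETTER})\b")


def find_mutations(text):
    """Return sorted point mutations in one-letter form, e.g. 'T790M'."""
    found = set()
    for wild, position, mutant in ONE_LETTER_RE.findall(text):
        found.add(f"{wild}{position}{mutant}")
    for wild, position, mutant in THREE_LETTER_RE.findall(text):
        # Normalize 'Thr790Met' to 'T790M' so synonyms share one key.
        found.add(f"{THREE_TO_ONE[wild]}{position}{THREE_TO_ONE[mutant]}")
    return sorted(found)


if __name__ == "__main__":
    sentence = "EGFR L858R or Thr790Met positive NSCLC; BRAF V600E excluded."
    print(find_mutations(sentence))
```

A pattern this permissive also matches nucleotide substitutions written in the same shape, such as C677T, because C and T are valid amino acid letters; a real pipeline would need additional context checks to separate the two.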
Fig 2. Bubble graph showing the key MeSH nodes used to tag clinical trials with protein mutations.
Fig 3. Common MeSH terms for clinical trials with RSid and protein mutation frequencies.
HPO terms with the most cited protein mutations found by MutationFinder in ClinicalTrials.gov.
| | HPO Id | Number of Clinical Trials | HPO Node Name |
|---|---|---|---|
| 1 | HP:0030358 | 382 | Non-small cell lung carcinoma |
| 2 | HP:0100526 | 284 | Neoplasm of the lung |
| 3 | HP:0001909 | 106 | Leukemia |
| 4 | HP:0012056 | 103 | Cutaneous melanoma |
| 5 | HP:0002664 | 78 | Neoplasm |
| 6 | HP:0005506 | 75 | Chronic myelogenous leukemia |
| 7 | HP:0012324 | 75 | Myeloid leukemia |
| 8 | HP:0030731 | 73 | Carcinoma |
| 9 | HP:0100834 | 44 | Neoplasm of the large intestine |
| 10 | HP:0002665 | 36 | Lymphoma |
The 1,939 clinical trials with mutations mapped to 332 unique HPO terms, which were referenced 2,447 times.
HPO terms with the largest number of associated mutations.
| | HPO Id | Name | # | Mutations |
|---|---|---|---|---|
| 1 | HP:0002664 | Neoplasm | 75 | C10D,C377T,C677T,C797S,D816V,D835V,D842V,E10A,E17K,E542K,E545K,F1174L,F31I,G12C,G12D,G12V,G13D,G156A,G20210A,G719A,G719C,H1047R,H1112L,H1112Y,H1124D,K652E,L1213V,L265P,L858R,L861Q,M1149T,M1268T,P1009S,P13K,P1446A,P286R,P4503A,Q12H,Q21D,R132C,R132G,R132H,R132L,R132S,R132V,R140L,R140Q,R140W,R172G,R172K,R172M,R172S,R172W,R988C,T1010I,T1191I,T315I,T790M,V1110L,V1206L,V1238I,V411L,V57I,V600D,V600E,V600K,V600M,V600R,V617F,V941L,Y1248C,Y1248D,Y1248H,Y1253D,Y842C |
| 2 | HP:0003002 | Breast carcinoma | 73 | A289T,A864V,C3435T,D538G,D769H,D769N,D769Y,D988Y,E380Q,E542K,E545K,E709K,E757A,G309A,G309E,G598V,G776C,G776V,H1047R,I655V,I767M,L536H,L536P,L536Q,L536R,L755P,L755S,L786V,L841V,L858R,L861Q,L869R,P125A,P12A,P13K,P187S,P535H,P596L,R108K,R222C,R572Y,R678Q,R831C,R831H,R849W,R896C,S310F,S310Y,S463P,S653C,S768I,S8814A,S9313A,T47D,T733I,T790M,T798I,T798M,T862I,V244M,V534E,V600E,V659E,V697L,V742I,V769M,V773M,V774M,V777L,V842I,Y537C,Y537N,Y537S |
| 3 | HP:0030731 | Carcinoma | 57 | C3435T,C420R,C938A,E10A,E542K,E545A,E545D,E545G,E545K,G1049R,G12C,G20210A,G719A,H1047L,H1047R,H1047Y,I105V,I10A,K751Q,L8585R,L858R,L861Q,M1043I,N345K,N375S,P13K,P286R,Q12W,Q546E,Q546K,Q546L,Q546R,R399Q,R776G,R831C,R88Q,S100P,S1400A,S1400C,S1400D,S1400E,S1400F,S1400G,S1400I,S1400K,S1900A,S1900C,S1900D,S768I,T790M,V411L,V600E,V600K,V600R,V617F,V762A,V843I |
| 4 | HP:0002665 | Lymphoma | 52 | A677G,A677V,A687V,C282Y,C481S,E571K,F1174L,G156A,G71R,H1112L,H1112Y,H1124D,H63D,I10A,I1171N,L1213V,L265P,M1149T,M1268T,P1009S,P11A,P13K,P140K,P4503A,Q12H,Q21D,Q28D,R131H,R988C,T1010I,T1191I,T315I,T351I,T790M,V1110L,V1206L,V1238I,V158F,V158M,V600E,V617F,V66M,V941L,Y1248C,Y1248D,Y1248H,Y1253D,Y641C,Y641F,Y641H,Y641N,Y641S |
| 5 | HP:0100526 | Neoplasm of the lung | 52 | C1156Y,C797S,D594G,F1174C,F1174V,G1202R,G1269A,G12C,G12D,G469A,G719A,G719C,G719S,G776C,G776V,I10A,L1196M,L1198F,L523S,L755S,L833F,L8585R,L858R,L859R,L861G,L861Q,L861R,N375S,P13K,P4503A,R776G,R831C,S1400A,S1400C,S1400D,S1400E,S1400F,S1400G,S1400I,S1400K,S1800A,S1900A,S1900C,S1900D,S768I,T790M,T81C,T890M,V600E,V769L,V777L,V843I |
| 6 | HP:0001909 | Leukemia | 51 | C282Y,C481S,D816V,D835Y,E255K,E255V,F317C,F317L,F317S,F317V,F31I,F359C,F359V,G250E,G71R,H369P,H63D,L248R,L248V,N682S,P140K,P1446A,P4503A,Q12H,Q252H,R132C,R132G,R132H,R132L,R132S,R132V,R140L,R140Q,R140W,R172G,R172K,R172M,R172S,R172W,S1612C,S9333A,T315A,T315I,T351I,V158M,V299L,V57I,V600E,V617F,V66M,Y253H |
| 7 | HP:0030358 | Non-small cell lung carcinoma | 43 | C797S,C8092A,D594G,F1174C,F1174V,G1202R,G1269A,G12C,G12D,G12V,G13D,G2032R,G469A,G719A,G719C,G719S,G776C,G776V,I10A,L1196M,L523S,L755S,L833F,L8585R,L858R,L861G,L861Q,L861R,P13K,P4503A,R776G,R831C,S1800A,S1900A,S1900C,S768I,T790M,T81C,V600E,V600K,V769L,V777L,V843I |
| 8 | HP:0012539 | Non-Hodgkin lymphoma | 42 | A1298C,A222V,A677G,A677V,A687V,C677T,F1174L,G71R,H1112L,H1112Y,H1124D,L1213V,M1149T,M1268T,P1009S,P13K,P140K,P4503A,Q12H,Q30R,R988C,T1010I,T1191I,T315I,T790M,V1110L,V1206L,V1238I,V158M,V617F,V66M,V941L,Y1248C,Y1248D,Y1248H,Y1253D,Y641C,Y641F,Y641H,Y641N,Y641S,Y93C |
The 1,939 clinical trials with mutations mapped to 332 unique HPO terms, which were referenced 2,447 times.
Fig 4. Frequency of different HPO terms across clinical trials, across trials with RSids, and across trials with protein mutations.
Intervention types for clinical trials with mutations.
| | Intervention Type | Number of Clinical Trials | Percent mapped to CT with Genes | Percent with RSid | Percent with mutations |
|---|---|---|---|---|---|
| 1 | Behavioral | 35,450 | 51.5% | 0.055% | 0.12% |
| 2 | Biological | 16,370 | 54.6% | 0.084% | 0.93% |
| 3 | Combination Product | 1,152 | 61.5% | 0.11% | 0.52% |
| 4 | Device | 43,079 | 60.1% | 0.025% | 0.1% |
| 5 | Diagnostic Test | 6,299 | 67.6% | 0.255% | 0.4% |
| 6 | Dietary Supplement | 10,882 | 55.7% | 0.24% | 0.36% |
| 7 | Drug | 98,048 | 65.9% | 0.14% | |
| 8 | Genetic | 1,189 | 72.8% | ||
| 9 | Other | 52,885 | 54.8% | 0.12% | 0.43% |
| 10 | Procedure | 33,045 | 62.8% | 0.035% | 0.27% |
| 11 | Radiation | 3,650 | 0.12% |
Eleven different categories of Interventions along with the number of unique tags in each category. Additionally, the percent of clinical trials that mapped to HPO nodes with associated genes, clinical trials with RSids, and clinical trials with protein mutations are illustrated.
Fig 5. Percentage of clinical trials in each of the eleven categories with RSids and protein mutations.
(a) The first graph shows the relative frequency of clinical trials in each of the eleven Intervention types. (b) The second shows the percent of clinical trials in each of the categories that link to an HPO term and have an associated gene. (c) The third shows the relative frequency of clinical trials in each of the categories that had an associated RSid. (d) The fourth shows the percent of clinical trials in each of the categories that had an associated protein mutation.
Related HPO terms using co-occurrences of RSids and HPO terms.
| | HPO Id | HPO Term | Related HPO Term | Score |
|---|---|---|---|---|
| 1 | HP:0001909 | Leukemia | HP:0012324 Myeloid leukemia | 0.69 |
| | | | HP:0005526 Lymphoid leukemia | 0.58 |
| | | | HP:0005506 Chronic myelogenous leukemia | 0.58 |
| | | | HP:0002665 Lymphoma | 0.45 |
| | | | HP:0004808 Acute myeloid leukemia | 0.39 |
| | | | HP:0005550 Chronic lymphatic leukemia | 0.37 |
| | | | HP:0012539 Non-Hodgkin lymphoma | 0.26 |
| | | | HP:0004757 Paroxysmal atrial fibrillation | 0.13 |
| | | | HP:0100607 Dysmenorrhea | 0.12 |
| | | | HP:0000716 Depressivity | 0.1 |
| 2 | HP:0000819 | Diabetes mellitus | HP:0005978 Type II diabetes mellitus | 0.57 |
| | | | HP:0100651 Type I diabetes mellitus | 0.5 |
| | | | HP:0000077 Abnormality of the kidney | 0.45 |
| | | | HP:0011998 Postprandial hyperglycemia | 0.45 |
| | | | HP:0012622 Chronic kidney disease | 0.38 |
| | | | HP:0001824 Weight loss | 0.29 |
| | | | HP:0001392 Abnormality of the liver | 0.27 |
| | | | HP:0000855 Insulin resistance | 0.27 |
| | | | HP:0011024 Abnormality of the gastrointestinal tract | 0.25 |
| | | | HP:0001871 Abnormality of blood and blood-forming tissues | 0.25 |
| | | | HP:0001397 Hepatic steatosis | 0.24 |
| | | | HP:0001513 Obesity | 0.12 |
| | | | HP:0001626 Abnormality of the cardiovascular system | 0.067 |
| | | | HP:0001677 Coronary artery atherosclerosis | 0.057 |
| 3 | HP:0001824 | Weight loss | HP:0011024 Abnormality of the gastrointestinal tract | |
| | | | HP:0001871 Abnormality of blood and blood-forming tissues | 0.58 |
| | | | HP:0000819 Diabetes mellitus | 0.58 |
| | | | HP:0012622 Chronic kidney disease | 0.29 |
| | | | HP:0100651 Type I diabetes mellitus | 0.29 |
| | | | HP:0001513 Obesity | 0.29 |
| | | | HP:0000077 Abnormality of the kidney | 0.26 |
| | | | HP:0001392 Abnormality of the liver | 0.2 |
| | | | HP:0000855 Insulin resistance | 0.2 |
| | | | HP:0001397 Hepatic steatosis | 0.18 |
| | | | HP:0001626 Abnormality of the cardiovascular system | 0.15 |
Results from finding similar HPO terms using occurrence of RSids as dimensions. The above results are representative, and the complete analysis, with the Java API, can be downloaded from the SNP Miner homepage.
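The caption describes HPO terms compared "using occurrence of RSids as dimensions" but does not name the scoring function. One common choice for that setup is cosine similarity between per-term RSid occurrence vectors, and the sketch below assumes it; the study's actual measure may differ. The occurrence counts are invented for illustration, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {RSid: count} dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_terms(target, vectors):
    """Rank every other HPO term by similarity to `target`, highest first."""
    scores = [
        (other, cosine(vectors[target], vector))
        for other, vector in vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical occurrence counts: HPO id -> {RSid: number of trials mentioning both}.
VECTORS = {
    "HP:0000819": {"rs7903146": 2, "rs9939609": 1},
    "HP:0001513": {"rs9939609": 3},
    "HP:0001677": {"rs7903146": 1},
    "HP:0002099": {"rs1042713": 1},
}

if __name__ == "__main__":
    for hpo_id, score in related_terms("HP:0000819", VECTORS):
        print(hpo_id, round(score, 2))
```

Under this measure two terms with identical RSid profiles score 1.0 and terms with no shared RSid score 0.0, so scores fall in the same 0 to 1 range as the table.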