Katherine Roper1, A Abdel-Rehim2, Sonya Hubbard1, Martin Carpenter1, Andrey Rzhetsky3, Larisa Soldatova4, Ross D King2,5,6,7.
Abstract
Scientific results should not just be 'repeatable' (replicable in the same laboratory under identical conditions), but also 'reproducible' (replicable in other laboratories under similar conditions). Results should also, if possible, be 'robust' (replicable under a wide range of conditions). The reproducibility and robustness of only a small fraction of published biomedical results have been tested; furthermore, when reproducibility is tested, it is often not found. This situation is termed 'the reproducibility crisis', and it is one of the most important issues facing biomedicine. This crisis would be solved if it were possible to automate reproducibility testing. Here, we describe the semi-automated testing for reproducibility and robustness of simple statements (propositions) about cancer cell biology automatically extracted from the literature. From 12 260 papers, we automatically extracted statements predicted to describe experimental results regarding a change of gene expression in response to drug treatment in breast cancer; from these, we selected 74 statements of high biomedical interest. To test the reproducibility of these statements, two different teams used the laboratory automation system Eve and two breast cancer cell lines (MCF7 and MDA-MB-231). Statistically significant evidence for repeatability was found for 43 statements, and significant evidence for reproducibility/robustness in 22 statements. In two cases, the automation made serendipitous discoveries. The reproduced/robust knowledge provides significant insight into cancer. We conclude that semi-automated reproducibility testing is currently achievable, that it could be scaled up to generate a substantive source of reliable knowledge and that automation has the potential to mitigate the reproducibility crisis.
Keywords: biology; cancer; literature; reproducibility; robustness; testing
Year: 2022 PMID: 35382578 PMCID: PMC8984295 DOI: 10.1098/rsif.2021.0821
Source DB: PubMed Journal: J R Soc Interface ISSN: 1742-5662 Impact factor: 4.118
Figure 1. The overall process of testing the reproducibility and robustness of the cancer biology literature by robot. First, text mining is used to extract statements about the effect of drugs on gene expression in breast cancer. Then two different teams semi-automatically test these statements using two different protocols and two different cell lines (MCF7 and MDA-MB-231) on the laboratory automation system Eve.
The list of statements about the effect of a drug on gene expression levels (textual propositions) tested for reproducibility and robustness.
| | gene | drug | id |
|---|---|---|---|
| 1 | AKT1 | 4OHT | PMC3711340_E360 |
| 2 | AKT1 | curcumin | PMC4708990_E2037 |
| 3 | AKT1 | EGCG | PMC2927993_E10333 |
| 4 | ATF4 | NAC | PMC4546701_E754 |
| 5 | BIRC5 | curcumin | PMC2756684_E6964 |
| 6 | BIRC5 | daidzein | PMC2944964_E8929 |
| 7 | BIRC5 | doxorubicin | PMC2649216_E5319 |
| 8 | BIRC5 | paclitaxel | PMC2826345_E10033 |
| 9 | BRCA2 | daidzein | PMC2361140_E3414 |
| 10 | BRCA1 | indol-3-carbinol | PMC4346871_E712 |
| 11 | CASP3 | quercetin | PMC2712839_E6241 |
| 12 | CCND1 | 4OHT | PMC2882356_E7162 |
| 13 | CCND1 | curcumin | PMC3206621_E15380 |
| 14 | CCND1 | resveratrol | PMC4000631_E146 |
| 15 | CCND1 | SAHA | PMC3355273_E18930 |
| 16 | CCND1 | salinomycin | PMC4631341_E1017 |
| 17 | CTNNB1 | cordycepin | PMC3784440_E402 |
| 18 | CTNNB1 | curcumin | PMC3706856_E361 |
| 19 | CTNNB1 | EGCG | PMC2933702_E10181 |
| 20 | EGFR | curcumin | PMC3206621_E15401 |
| 21 | EGFR | doxorubicin | PMC3181057_E14848 |
| 22 | ERBB2 | curcumin | PMC4003153_E459 |
| 23 | ERBB3 | fulvestrant | PMC2875575_E10985 |
| 24 | ESR1 | 4OHT | PMC2882356_E7158 |
| 25 | ESR1 | curcumin | PMC2705850_E4569 |
| 26 | ESR1 | EGCG | PMC2967543_E11055 |
| 27 | ESR1 | fulvestrant | PMC3139592_E14864 |
| 28 | ESR1 | pterostilbene | PMC4134202_E1283 |
| 29 | ESR1 | quercetin | PMC4228827_E129 |
| 30 | ESR1 | resveratrol | PMC3521661_E722 |
| 31 | HDAC1 | curcumin | PMC3625766_E1801 |
| 32 | HDAC1 | resveratrol | PMC3625766_E1802 |
| 33 | HDAC1 | SAHA | PMC3498753_E565 |
| 34 | HIF1A | doxorubicin | PMC4024011_E700 |
| 35 | HIF1A | melatonin | PMC4123875_E984 |
| 36 | HIF1A | zoledronic_acid | PMC4496173_E126 |
| 37 | HSP90 | quercetin | PMC3652296_E1279 |
| 38 | IL8 | NAC | PMC4463759_E1355 |
| 39 | MAPT | 4OHT | PMC2917038_E8406 |
| 40 | MAPT | fulvestrant | PMC2917038_E8306 |
| 41 | MELK | paclitaxel | PMC3857210_E1352 |
| 42 | MMP-2 | silibinin | PMC4006687_E357 |
| 43 | MMP-9 | curcumin | PMC4176907_E1376 |
| 44 | MMP-9 | silibinin | PMC4196436_E1516 |
| 45 | MTOR | SAHA | PMC3840459_E1427 |
| 46 | NFK1B | quercetin | PMC3747514_E565 |
| 47 | p21 | doxorubicin | PMC3765348_E744 |
| 48 | p21 | paclitaxel | PMC2394338_E3767 |
| 49 | p21 | resveratrol | PMC2364738_E2929 |
| 50 | p21 | vinorelbine | PMC2394338_E3826 |
| 51 | p27 | curcumin | PMC3706856_E382 |
| 52 | p300 | curcumin | PMC3255482_E16909 |
| 53 | p53 | caffeic_acid | PMC2928446_E12078 |
| 54 | p53 | doxorubicin | PMC4228062_E94 |
| 55 | p53 | etoposide | PMC4400643_E1283 |
| 56 | p53 | hesperidin | PMC4177652_E1404 |
| 57 | p53 | resveratrol | PMC2928446_E12079 |
| 58 | PDK1 | curcumin | PMC4192446_E1344 |
| 59 | PGR | letrozole | PMC1064088_E125 |
| 60 | PTEN | resveratrol | PMC2957324_E13190 |
| 61 | PTEN | silibinin | PMC3148510_E16237 |
| 62 | RASSF1 | 4OHT | PMC3977804_E166 |
| 63 | STAT3 | curcumin | PMC3584822_E1221 |
| 64 | STAT3 | doxorubicin | PMC4589559_E1201 |
| 65 | STAT3 | paclitaxel | PMC4467444_E173 |
| 66 | STK11 | honokiol | PMC3496153_E906 |
| 67 | TNF | paclitaxel | PMC2830051_E9591 |
| 68 | TXNIP | resveratrol | PMC3733924_E363 |
| 69 | uPA | EGCG | PMC4006687_E360 |
| 70 | uPA | silibinin | PMC4006687_E360 |
| 71 | VEGFA | EGCG | PMC3708553_E323 |
| 72 | VEGFA | melatonin | PMC3708553_E323 |
| 73 | VEGFA | NAC | PMC3929894_E1687 |
| 74 | VEGFA | paclitaxel | PMC3682088_E5 |
The list of repeatable results. These drugs were found to produce statistically significant changes in the expression of the genes. Human reading: what was found by human annotators; text is the direction of change of gene expression (↑ increase, stimulation; ↓ decrease, inhibition); MCF7 is whether the change was found using the MCF7 cell line; MDA is whether the change was found using the MDA-MB-231 cell line. Text mining: the direction of change of gene expression identified automatically by the computer. MCF7: the results of the robotic experiments using the MCF7 cell line. MDA: the results of the robotic experiments using the MDA-MB-231 cell line. Team 1 and team 2: the statistical significance found by each team; sign: the direction of change of gene expression.
| | gene | drug | human reading: text | human reading: MCF7 | human reading: MDA | text mining: sign | MCF7: team 1 | MCF7: team 1 sign | MCF7: team 2 | MCF7: team 2 sign | MDA: team 1 | MDA: team 1 sign | MDA: team 2 | MDA: team 2 sign |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AKT1 | 4OHT | ↓a | Y | N | — | — | — | 0.0009766 | ↓ | — | — | — | — |
| 2 | BIRC5 | doxorubicin | ↑ | N | N | ↑ | — | — | 0.0004883 | ↓ | — | — | — | — |
| 3 | BRCA2 | daidzein | —b | Y | N | ↑ | — | — | — | — | — | — | 0.0175781 | ↑ |
| 4 | BRCA1 | indol-3-carbinol | ↑ | Y | N | ↑ | — | — | — | — | — | — | 0.0009766 | ↓ |
| 5 | CASP3 | quercetin | ↑ | N | N | ↑ | — | — | — | — | — | — | 0.0004883 | ↑ |
| 6 | CCND1 | curcumin | ↓ | Y | Y | ↓ | — | — | — | — | — | — | 0.0039063 | ↑ |
| 7 | CCND1 | SAHA | ↓ | Y | N | — | 0.0039063 | ↓ | 0.0002441 | ↓ | 0.0019531 | ↓ | 0.0268555 | ↓ |
| 8 | CTNNB1 | cordycepin | ↓c | Y | N | —/↓ | — | — | — | — | 0.03125 | ↓ | — | — |
| 9 | CTNNB1 | curcumin | ↓ | Y | Y | ↓ | — | — | 0.0097656 | ↓ | — | — | — | — |
| 10 | CTNNB1 | EGCG | ↓ | N | N | ↓ | — | — | — | — | 0.03125 | ↓ | — | — |
| 11 | EGFR | curcumin | ↓ | N | N | ↓ | — | — | — | — | — | — | 0.0175781 | ↑ |
| 12 | EGFR | doxorubicin | ↓ | N | N | ↓ | — | — | — | — | — | — | 0.0002441 | ↓ |
| 13 | ERBB3 | fulvestrant | ↑ | Y | N | ↓ | — | — | 0.0004883 | ↑ | — | — | — | — |
| 14 | ESR1 | 4OHT | ↓ | Y | Nd | ↓ | — | — | 0.03125 | ↓ | — | — | — | — |
| 15 | ESR1 | pterostilbene | ↓ | Ye | N | ↓ | — | — | 0.0009766 | ↓ | — | — | — | — |
| 16 | ESR1 | quercetin | ↓ | N | N | ↓ | — | — | 0.015625 | ↓ | — | — | — | — |
| 17 | HIF1A | doxorubicin | ↓ | N | N | ↓ | — | — | 0.0002441 | ↓ | — | — | — | — |
| 18 | MAPT | 4OHT | ↑ | Y | N | ↑ | — | — | — | — | 0.03125 | ↑ | — | — |
| 19 | MAPT | fulvestrant | ↓ | Y | N | ↓ | — | — | 0.0029297 | ↓ | — | — | — | — |
| 20 | MMP-2 | silibinin | ↓ | N | N | ↓ | — | — | — | — | — | — | 0.0175781 | ↓ |
| 21 | MMP-9 | curcumin | ↓f | N | N | ↓ | — | — | 0.0078125 | ↓ | — | — | 0.0703125 | ↓ |
| 22 | MTOR | SAHA | ↓ | unclear | unclear | ↓ | — | — | — | — | — | — | 0.0009766 | ↑ |
| 23 | NFK1B | quercetin | ↓ | N | N | ↓ | — | — | 0.0019531 | ↓ | — | — | 0.0439453 | ↑ |
| 24 | p21 | doxorubicin | ↑ | N | N | ↑ | — | — | 0.015625 | ↓ | — | — | — | — |
| 25 | p21 | paclitaxel | ↑ | N | N | ↑ | 0.015625 | ↑ | — | — | — | — | — | — |
| 26 | p21 | resveratrol | ↑ | Y | N | ↑ | — | — | — | — | 0.015625 | ↓ | — | — |
| 27 | p300 | curcumin | ↓ | N | N | ↓ | — | — | — | — | — | — | 0.03125 | ↑ |
| 28 | p53 | caffeic acid | ↑ | N | N | ↑ | — | — | — | — | — | — | 0.0439453 | ↑ |
| 29 | p53 | etoposide | ↑g | Y | Y | ↑ | 0.03125 | ↓ | 0.03125 | ↓ | 0.03125 | ↓ | 0.0053711 | ↓ |
| 30 | p53 | hesperidin | ↑h | N | N | ↑ | — | — | 0.0039063 | ↓ | 0.0703125 | ↓ | — | — |
| 31 | p53 | resveratrol | — | Y | N | ↑ | — | — | — | — | 0.0175781 | ↓ | — | — |
| 32 | PDK1 | curcumin | ↓a | N | N | ↓ | — | — | 0.0039063 | ↓ | — | — | — | — |
| 33 | PGR | letrozole | ↓ | N | N | ↓ | — | — | 0.0010376 | ↓ | — | — | — | — |
| 34 | PTEN | resveratrol | ↑ | Y | N | ↑ | — | — | 6.87 × 10⁻⁵ | ↓ | — | — | 0.0019531 | ↓ |
| 35 | PTEN | silibinin | — | N | N | ↓ | — | — | 3.05 × 10⁻⁵ | ↓ | — | — | — | — |
| 36 | STAT3 | curcumin | — | Y | Y | ↓ | — | — | — | — | — | — | 0.03125 | ↑ |
| 37 | STAT3 | doxorubicin | ↑ | Y | N | ↑ | — | — | — | — | — | — | 0.0039063 | ↑ |
| 38 | STAT3 | paclitaxel | ↓i | N | N | — | 0.015625 | ↑ | — | — | — | — | — | — |
| 39 | TXNIP | resveratrol | ↑↓j | Y | N | ↑ | — | — | 0.0004883 | ↓ | — | — | — | — |
| 40 | uPA | EGCG | ↓ | N | N | ↓ | — | — | — | — | — | — | 0.0009766 | ↑ |
| 41 | uPA | silibinin | — | N | N | ↓ | — | — | 0.0001221 | ↓ | — | — | — | — |
| 42 | VEGFA | melatonin | ↓ | N | N | ↓ | — | — | — | — | — | — | 0.0004883 | ↓ |
| 43 | VEGFA | NAC | — | N | N | ↓ | — | — | — | — | — | — | 0.0078125 | ↓ |
a. Inhibition in paper ‘not significant’.
b. Refers to a different paper.
c. No effect claimed in text, but appears in a figure.
d. Gene missing.
e. MCF7 with constructs.
f. The paper is a review.
g. The paper does not describe TP53, but rather a splice variant of TP53.
h. Refers to another paper with NALM-6 cells.
i. Phospho-Stat3.
j. Biphasic depending on concentration.
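Many of the significance values in the table above are dyadic fractions: 0.03125 is 2/2^6, and 0.0009766 is 2/2^11 to seven decimal places. That is the smallest two-sided p-value an exact sign test or exact Wilcoxon signed-rank test can return for n paired measurements that all change in the same direction. This excerpt does not name the test that was used, so the Python sketch below is only an arithmetic check under that assumption; the function names are hypothetical.

```python
from fractions import Fraction

def min_two_sided_p(n_pairs: int) -> Fraction:
    """Smallest two-sided p-value of an exact sign test (or exact Wilcoxon
    signed-rank test) on n_pairs untied, non-zero paired differences:
    every difference has the same sign, so p = 2 * (1/2)**n_pairs."""
    return Fraction(2, 2 ** n_pairs)

def matching_n(p: float, max_n: int = 20, tol: float = 1e-7):
    """Return n for which min_two_sided_p(n) agrees with p to within tol
    (the table rounds to seven decimals), or None if no n matches."""
    for n in range(1, max_n + 1):
        if abs(float(min_two_sided_p(n)) - p) < tol:
            return n
    return None

# Values copied from the table of repeatable results.
reported = [0.03125, 0.015625, 0.0078125, 0.0039063, 0.0019531,
            0.0009766, 0.0004883, 0.0002441, 0.0001221]

# ASSUMPTION: an exact paired test was used; this only checks the arithmetic.
matches = {p: matching_n(p) for p in reported}
```

Values such as 0.0175781 (9/512) are not of this form; under the same assumption they would correspond to less extreme outcomes of the same kind of test.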
The list of reproducible results. These effects of drugs on gene expression levels were successfully read from the literature by text mining and were experimentally confirmed using semi-automatic robotic experiments.
| cell | ↑↓ | drug | gene/protein | significance |
|---|---|---|---|---|
| | | | | It is of clinical interest that 4OHT both inhibits the receptor and inhibits the expression of ESR1. It is unclear if this effect is beneficial in cancer treatment or not. ESR1 is missing from MDA-MB-231. |
| | | | | In cancer treatment it is generally considered desirable to inhibit AKT. |
| | | | | This statement has perhaps the strongest evidence for reproducibility ( |
| | | | | In cancer treatment it is generally considered desirable to inhibit CTNNB1, so the inhibition of CTNNB1 is a desirable effect of curcumin. |
| | | | | We did not observe changes in expression in MDA-MB-231, which is consistent with action through the oestrogen receptor. The reproduced observation of increased ERBB3 expression with fulvestrant may be of concern in cancer treatment. |
| | | | | The inhibitory effect of fulvestrant on MAPT may cause unwanted neural side-effects. |
The list of minor robust results. These statements about the effect of drugs on gene expression were about MCF7 cells but were confirmed in MDA-MB-231 cells [33,34].
| ↑↓ | drug | gene | notes |
|---|---|---|---|
| | | | Interestingly, this is the only case where the result was also confirmed in MCF7, i.e. it was reproduced and robustly confirmed. It is unclear why in the other cases, where the original paper reported an effect in MCF7, we only saw an effect in MDA-MB-231. |
| | | | Cordycepin is a derivative of the nucleoside adenosine. Our interpretation of the evidence in [ |
| | | | This statement is interesting as the increased expression of the gene product of MAPT by 4OHT may cause unwanted side-effects in cancer treatment. |
| | | | Doxorubicin (DXR) is an anti-cancer drug, a 14-hydroxylated version of daunorubicin. Doxorubicin interacts with DNA by intercalation and inhibition of macromolecular biosynthesis. STAT3 is a transcription factor which plays a key role in many cellular processes such as cell growth and apoptosis. STAT3 may promote oncogenesis by being constitutively active. |
The list of major robust results. In major robustness the original textual statement was about neither MCF7 nor MDA-MB-231 cells. Notes: In one case, ↑PTEN/resveratrol, we see a consistently opposite effect in both MCF7 and MDA-MB-231 cells to that observed in the paper in MCF7 cells. This observation does not invalidate the replicability of the original result, but it does raise questions about its reproducibility. PTEN (phosphatase and tensin homologue) acts as a tumour suppressor gene. Up to 70% of primary prostate tumours lose one PTEN allele and retain the other copy [36]. Resveratrol (3,5,4′-trihydroxy-trans-stilbene) is a stilbenoid, a natural plant product. Resveratrol has been associated with possibly increased longevity. The inhibition of PTEN by resveratrol is potentially of clinical concern.
| ↑↓ | drug | gene | notes |
|---|---|---|---|
| | | | The gene product of CASP3 is a cysteine–aspartic acid protease (caspase). Activation of caspases plays a central role in the execution phase of cell apoptosis. Quercetin is a plant flavonol; quercetin supplements have been promoted for the treatment of cancer. |
| | | | EGCG (epigallocatechin gallate) is the most abundant catechin in tea. |
| | | | The gene product of EGFR (epidermal growth factor receptor) is a receptor for members of the epidermal growth factor family (EGF family). Mutations that lead to EGFR overexpression are associated with a number of cancers. |
| | | | — |
| | | | HIF1A is a subunit of the heterodimeric hypoxia-inducible factor 1, a transcription factor that responds to decreases in available oxygen in the cellular environment, or hypoxia. (The 2019 Nobel Prize in Physiology or Medicine was partly awarded for discovery of this function.) The dysregulation and overexpression of |
| | | | The gene product of MMP-2 is a zinc metalloproteinase (matrix metalloproteinase-2). It cleaves collagen type IV. Degradation of collagen IV in basement membrane and extracellular matrix facilitates tumour progression, including invasion, metastasis, growth and angiogenesis. |
| | | | The gene product of MMP-9 is a zinc metalloproteinase that cleaves gelatin types I and V and collagen types IV and V. |
| | | | Paclitaxel is a natural plant product used to treat many cancers. Its mode of action is through targeting tubulin. Paclitaxel stabilizes the microtubule polymer and protects it from disassembly; chromosomes therefore fail to achieve a metaphase spindle configuration. |
| | | | Caffeic acid is a natural plant product that is being investigated for anti-cancer treatment. The observation of significantly increased promotion of P53 in MDA-MB-231 may be linked to the fact that this gene is mutated and expressed at high levels relative to MCF7 cells. |
| | | | The gene product of PDK1 is protein 3-phosphoinositide-dependent protein kinase-1, a central kinase in cell signalling. |
| | | | The gene product of PGR is a progesterone receptor. Mutations in PGR are associated with breast cancer. Letrozole is an aromatase inhibitor that is used in the treatment of hormonally responsive breast cancer. Our observation of inhibition in MCF7, but not MDA-MB-231 ( |
| | | | VEGFA (vascular endothelial growth factor A) is in the platelet-derived growth factor family of cystine-knot growth factors. The VEGF family stimulate cellular responses by binding to tyrosine kinase receptors. Melatonin ( |
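The table captions distinguish reproduced results (confirmed in the cell line the paper used), minor robust results (paper about MCF7, confirmed in MDA-MB-231) and major robust results (paper about neither cell line). A minimal Python sketch of that decision rule follows. The `Statement` class, the `classify` function and the "opposite effect" label are hypothetical names introduced for illustration, and generalising the rule symmetrically to both cell lines is an assumption rather than the authors' stated procedure.

```python
from dataclasses import dataclass

# The two cell lines used in the robotic experiments.
OUR_CELL_LINES = frozenset({"MCF7", "MDA-MB-231"})

@dataclass
class Statement:
    gene: str
    drug: str
    claimed: str                 # "up" or "down", as read from the paper
    paper_cell_lines: frozenset  # cell lines used in the original paper
    observed: dict               # cell line -> "up"/"down" where significant

def classify(s: Statement) -> dict:
    """Label each significant observation relative to the original claim."""
    labels = {}
    for cell_line, sign in s.observed.items():
        if sign != s.claimed:
            labels[cell_line] = "opposite effect"
        elif cell_line in s.paper_cell_lines:
            labels[cell_line] = "reproduced"
        elif s.paper_cell_lines & OUR_CELL_LINES:
            labels[cell_line] = "minor robust"   # paper used our other cell line
        else:
            labels[cell_line] = "major robust"   # paper used neither cell line
    return labels

# Rows taken from the table of repeatable results.
esr1 = Statement("ESR1", "4OHT", "down", frozenset({"MCF7"}), {"MCF7": "down"})
mapt = Statement("MAPT", "4OHT", "up", frozenset({"MCF7"}), {"MDA-MB-231": "up"})
hif1a = Statement("HIF1A", "doxorubicin", "down", frozenset(), {"MCF7": "down"})
pten = Statement("PTEN", "resveratrol", "up", frozenset({"MCF7"}),
                 {"MCF7": "down", "MDA-MB-231": "down"})
```

With these rows, `classify(pten)` labels both cell lines as an opposite effect, which matches the ↑PTEN/resveratrol case discussed in the caption of the major robust table.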
Stage 1. Every event is of the desired form, a simple chemical affecting a gene/protein, which allows for convenient experimentation. There are no ‘duplicated’ events in the results. Groundings into UniProt and ChEBI are attempted, and provided where there is reasonable confidence in their accuracy.
| heuristic | matching statements |
|---|---|
| chemical as object | 8084 |
| protein as subject | 33 202 |
| grounded proteins as subject | 13 219 |
| grounded chemicals as object | 6209 |
| chemical object, protein subject | 7174 |
| grounded chemical, protein subject | 5501 |
| grounded protein and chemical | 1999 |
| cell line data, anything allowed—sentence + methods section | 5129 |
| cell line data, only ‘known’ names allowed | 2363 |
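The Stage 1 conditions (events of the form chemical affecting gene/protein, no duplicates, optional groundings) can be sketched as a small filter. This is an illustrative Python sketch, not the authors' pipeline: the `Event` type, the `stage1` function, the case-insensitive definition of a duplicate and the placeholder identifiers are all assumptions.

```python
from typing import NamedTuple, Optional

class Event(NamedTuple):
    chemical: str
    protein: str
    chebi_id: Optional[str] = None    # None when the grounding is not confident
    uniprot_id: Optional[str] = None  # None when the grounding is not confident

def stage1(events):
    """Keep one event per (chemical, protein) pair, preferring grounded ones."""
    best = {}
    for ev in events:
        if not ev.chemical or not ev.protein:
            continue  # not of the form "chemical affects gene/protein"
        # ASSUMPTION: two events are duplicates if names match ignoring case.
        key = (ev.chemical.lower(), ev.protein.lower())
        score = (ev.chebi_id is not None) + (ev.uniprot_id is not None)
        if key not in best or score > best[key][0]:
            best[key] = (score, ev)
    return [ev for _, ev in best.values()]

# Toy input; the identifiers are placeholders, not real database accessions.
events = [
    Event("curcumin", "AKT1"),
    Event("Curcumin", "AKT1", chebi_id="CHEBI:placeholder",
          uniprot_id="UNIPROT:placeholder"),
    Event("", "ESR1"),                # no chemical: dropped
    Event("resveratrol", "PTEN"),
]
kept = stage1(events)
```

Here `kept` holds two events: the grounded curcumin/AKT1 event replaces its ungrounded duplicate, and the event without a chemical is dropped.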
Stage 2. Four checks are applied. Subject protein present as a node in the Petri net model: checks whether the subject protein is present as a node in the Petri net model. Subject protein present as a node in the Chicago model: checks whether the subject protein is present as a node in the Boolean model from Chicago. Event of type gene expression. Object chemical is known to be commercially available: checks that the chemical in question can be purchased at a reasonably plausible price and time scale; conceptually, this could be done automatically.
| heuristic | matching statements |
|---|---|
| protein names matching to Petri net model | 5340 |
| grounded proteins matching to Petri net model | 395 |
| protein names matching to Chicago model | 6531 |
| grounded proteins matching to Chicago model | 3404 |
| names matching Chicago or Petri net model | 9413 |
| grounding match Chicago or Petri net model | 3474 |
| gene expression event | 9393 |
| known commercially available chemical | 2172 |
| known commercially available chemical and ChEBI grounded | 1911 |
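The Stage 2 checks can be read as independent predicates over each statement, with one count per heuristic as in the table above. The Python sketch below illustrates that reading. How the checks are combined into a final selection is not stated in this excerpt, so the rule used here (in either model, and a gene expression event, and a purchasable chemical) is an assumption; the function name, dictionary keys and toy data are hypothetical.

```python
def stage2(statements, petri_nodes, chicago_nodes, purchasable):
    """Count matches per heuristic and return the statements passing all checks."""
    checks = {
        "in Petri net model": lambda s: s["protein"] in petri_nodes,
        "in Chicago model": lambda s: s["protein"] in chicago_nodes,
        "gene expression event": lambda s: s["event_type"] == "gene_expression",
        "commercially available chemical": lambda s: s["chemical"] in purchasable,
    }
    counts = {name: sum(1 for s in statements if check(s))
              for name, check in checks.items()}
    # ASSUMPTION: membership in either model suffices; other checks must all hold.
    selected = [
        s for s in statements
        if (checks["in Petri net model"](s) or checks["in Chicago model"](s))
        and checks["gene expression event"](s)
        and checks["commercially available chemical"](s)
    ]
    return counts, selected

# Toy data only; model contents and availability are made up for the example.
statements = [
    {"protein": "AKT1", "chemical": "curcumin", "event_type": "gene_expression"},
    {"protein": "ESR1", "chemical": "4OHT", "event_type": "gene_expression"},
    {"protein": "ESR1", "chemical": "compound_x", "event_type": "gene_expression"},
    {"protein": "GENE_X", "chemical": "curcumin", "event_type": "binding"},
]
counts, selected = stage2(statements, petri_nodes={"AKT1"},
                          chicago_nodes={"ESR1"},
                          purchasable={"curcumin", "4OHT"})
```

Keeping the per-heuristic counts separate from the final selection mirrors the table, where each row reports how many statements match one heuristic on its own.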