Rezarta Islamaj Dogan, Sun Kim, Andrew Chatr-Aryamontri, Chih-Hsuan Wei, Donald C Comeau, Rui Antunes, Sérgio Matos, Qingyu Chen, Aparna Elangovan, Nagesh C Panyam, Karin Verspoor, Hongfang Liu, Yanshan Wang, Zhuang Liu, Berna Altinel, Zehra Melce Hüsünbeyi, Arzucan Özgür, Aris Fergadis, Chen-Kai Wang, Hong-Jie Dai, Tung Tran, Ramakanth Kavuluru, Ling Luo, Albert Steppi, Jinfeng Zhang, Jinchan Qu, Zhiyong Lu.
Abstract
The Precision Medicine Initiative is a multicenter effort to formulate personalized treatments by combining individual patient data (clinical, genome sequence and functional genomic data) with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs: it reports the genetic and molecular interactions that form the scaffold of cellular regulatory systems and details the influence of genetic variants on these interactions. The BioCreative VI Precision Medicine Track aimed to extract this particular type of information and was organized into two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When the text-mining system predictions were compared with human annotations for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results, we find the Precision Medicine Track to be successful in engaging the text-mining research community. At the same time, the track produced a manually annotated corpus of 5509 PubMed documents, developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPIs affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
Year: 2019; PMID: 30689846; PMCID: PMC6348314; DOI: 10.1093/database/bay147
Source DB: PubMed; Journal: Database (Oxford); ISSN: 1758-0463; Impact factor: 3.451
Figure 1. A positive example from the BioCreative VI Precision Medicine Track corpus.
Statistics of the precision medicine track data set

| Data set | Articles | Positive | Negative | Articles with relations | Relations |
|---|---|---|---|---|---|
| Training | 4082 | 1729 | 2353 | 597 | 752 |
| Testing | 1427 | 704 | 723 | 635 | 869 |

Participating teams and their number of submissions

| Team | Institution | Country | Document triage submissions | Relation extraction submissions |
|---|---|---|---|---|
| 374 | University of Aveiro | Portugal | 3 | - |
| 375 | University of Melbourne | Australia | 3 | 3 |
| 379 | Mayo Clinic | USA | 1 | 2 |
| 391 | Dalian University of Technology | China | - | 3 |
| 405 (Team withdrew) | - | - | 1 | 2 |
| 414 | Boğaziçi University | Turkey | 3 | - |
| 418 | National Technical University of Athens | Greece | 3 | - |
| 419 | Taipei Medical University | Taiwan | 3 | - |
| 420 | University of Kentucky | USA | 1 | 3 |
| 421 | Dalian University of Technology | China | 3 | - |
| 433 | Florida State University | USA | 1 | 1 |

Document triage task results for all submissions

| Team | Run | Average precision | Precision | Recall | F1 | Format |
|---|---|---|---|---|---|---|
| 374 | Run 1 | 0.6616 | 0.5864 | 0.8338 | 0.6886 | JSON |
| 374 | Run 2 | 0.6677 | 0.5700 | 0.8736 | 0.6898 | JSON |
| 374 | Run 3 | 0.6929 | 0.6070 | 0.7898 | 0.6864 | JSON |
| 375 | Run 1 | 0.6822 | 0.5783 | 0.7713 | 0.6610 | JSON |
| 375 | Run 2 | 0.6722 | 0.5936 | 0.7116 | 0.6473 | JSON |
| 375 | Run 3 | 0.6744 | 0.5361 | 0.8849 | 0.6677 | JSON |
| 379 | Run 1 | 0.4904 | 0.4649 | 0.3480 | 0.3981 | XML |
| 405 | Run 1 | 0.5871 | 0.5484 | 0.5710 | 0.5595 | JSON |
| 414 | Run 1 | 0.4847 | 0.4734 | 0.5824 | 0.5223 | XML |
| 414 | Run 2 | 0.5057 | 0.4927 | 0.7202 | 0.5851 | XML |
| 414 | Run 3 | 0.5077 | 0.5022 | | 0.6641 | XML |
| 418 | Run 1 | 0.6959 | 0.6136 | 0.7670 | 0.6818 | XML |
| 418 | Run 2 | 0.7068 | 0.5944 | 0.8139 | 0.6871 | XML |
| 418 | Run 3 | 0.7158 | | 0.7656 | | XML |
| 419 | Run 1 | 0.5797 | 0.5713 | 0.8253 | 0.6752 | XML |
| 419 | Run 2 | 0.5986 | 0.5865 | 0.6065 | 0.5964 | XML |
| 419 | Run 3 | 0.6334 | 0.5992 | 0.6222 | 0.6105 | XML |
| 420 | Run 1 | 0.6439 | 0.5438 | 0.8736 | 0.6703 | JSON |
| 421 | Run 1 | 0.6678 | 0.5850 | 0.8111 | 0.6798 | XML |
| 421 | Run 2 | | 0.6073 | 0.7997 | 0.6904 | XML |
| 421 | Run 3 | 0.7084 | 0.5857 | 0.8352 | 0.6885 | XML |
| 433 | Run 1 | 0.6632 | 0.5413 | 0.8835 | 0.6713 | JSON |
| BASELINE | - | 0.6515 | 0.6122 | 0.6435 | 0.6274 | - |

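
The F-scores reported for the triage task appear to be the usual harmonic mean of precision and recall. The sketch below recomputes that value for two rows of the table above, assuming the two columns before the F-score are precision and recall in that order; the function name `f1_score` is ours, not part of any track software. Because the published inputs are rounded to four decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (team, run, precision, recall, reported F-score), copied from the triage table.
rows = [
    ("374", "Run 1", 0.5864, 0.8338, 0.6886),
    ("418", "Run 1", 0.6136, 0.7670, 0.6818),
]

for team, run, precision, recall, reported in rows:
    computed = f1_score(precision, recall)
    # Compare the recomputed score with the published one.
    print(f"Team {team} {run}: computed {computed:.4f}, reported {reported:.4f}")
```

The same relation can be used to sanity-check any row for which precision, recall and F-score are all listed.
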
Relation extraction task exact match results for all submissions

| Team | Run | Precision | Recall | F1 | Format |
|---|---|---|---|---|---|
| 375 | Run 1 | 0.3506 | 0.3349 | 0.3426 | XML |
| 375 | Run 2 | 0.3506 | 0.3349 | 0.3426 | XML |
| 375 | Run 3 | | 0.3084 | | XML |
| 379 | Run 1 | 0.2602 | 0.0736 | 0.1148 | XML |
| 379 | Run 2 | 0.1015 | | 0.1694 | XML |
| 391 | Run 1 | 0.2253 | 0.1887 | 0.2054 | XML |
| 391 | Run 2 | 0.2222 | 0.1772 | 0.1972 | XML |
| 391 | Run 3 | 0.2306 | 0.1300 | 0.1663 | XML |
| 405 | Run 1 | 0.0590 | 0.0196 | 0.0294 | JSON |
| 405 | Run 2 | 0.0692 | 0.0253 | 0.0371 | JSON |
| 420 | Run 1 | 0.3555 | 0.2336 | 0.2819 | JSON |
| 420 | Run 2 | 0.3739 | 0.2509 | 0.3003 | JSON |
| 420 | Run 3 | 0.3494 | 0.2417 | 0.2857 | JSON |
| 433 | Run 1 | 0.0580 | 0.2014 | 0.0900 | JSON |
| BASELINE | - | 0.1091 | 0.4741 | 0.1774 | - |

Relation extraction task HomoloGene results for all submissions

| Team | Run | Precision | Recall | F1 | Format |
|---|---|---|---|---|---|
| 375 | Run 1 | 0.3807 | 0.3573 | 0.3686 | XML |
| 375 | Run 2 | 0.3807 | 0.3573 | 0.3686 | XML |
| 375 | Run 3 | 0.4318 | 0.3341 | | XML |
| 379 | Run 1 | 0.3102 | 0.0777 | 0.1243 | XML |
| 379 | Run 2 | 0.1160 | | 0.1910 | XML |
| 391 | Run 1 | 0.2348 | 0.1972 | 0.2144 | XML |
| 391 | Run 2 | 0.2337 | 0.1868 | 0.2076 | XML |
| 391 | Run 3 | 0.2398 | 0.1357 | 0.1733 | XML |
| 405 | Run 1 | 0.0804 | 0.0267 | 0.0401 | JSON |
| 405 | Run 2 | 0.1044 | 0.0383 | 0.0560 | JSON |
| 420 | Run 1 | 0.4417 | 0.2900 | 0.3501 | JSON |
| 420 | Run 2 | | 0.3109 | 0.3727 | JSON |
| 420 | Run 3 | 0.4379 | 0.3028 | 0.3580 | JSON |
| 433 | Run 1 | 0.0801 | 0.2749 | 0.1241 | JSON |
| BASELINE | - | 0.1468 | 0.5197 | 0.2290 | - |

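
To illustrate why the HomoloGene scores are higher than the exact match scores, the following sketch compares predicted and curated relations in two ways: by exact Entrez Gene ID, and after mapping each ID to a homology group. It is only a minimal illustration under stated assumptions, not the track's evaluation script: relations are treated as unordered gene pairs per article, counts are pooled over all articles, and the `homolog_group` mapping, the group label `group_A` and the ID `ortholog-of-1026` are hypothetical placeholders rather than real HomoloGene data.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Relation = Tuple[str, str, str]          # (PMID, gene ID, gene ID)
Key = Tuple[str, FrozenSet[str]]         # (PMID, unordered gene pair)


def to_keys(relations: Iterable[Relation],
            homolog_group: Optional[Dict[str, str]] = None) -> Set[Key]:
    """Turn relations into order-independent keys, optionally via homology groups."""
    mapping = homolog_group or {}
    return {
        (pmid, frozenset((mapping.get(a, a), mapping.get(b, b))))
        for pmid, a, b in relations
    }


def precision_recall_f1(gold: Iterable[Relation],
                        predicted: Iterable[Relation],
                        homolog_group: Optional[Dict[str, str]] = None):
    """Precision, recall and F-score of predicted relations against gold relations."""
    gold_keys = to_keys(gold, homolog_group)
    pred_keys = to_keys(predicted, homolog_group)
    true_positives = len(gold_keys & pred_keys)
    precision = true_positives / len(pred_keys) if pred_keys else 0.0
    recall = true_positives / len(gold_keys) if gold_keys else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Curated relations taken from the examples table (PMID and Entrez Gene IDs).
gold = [
    ("15700267", "1398", "1793"),
    ("11463845", "1026", "207"),
    ("15769741", "285", "285"),   # self-interaction: both IDs are the same
]

# Hypothetical system output: the first pair is reversed, and the second uses a
# made-up ID standing in for a homolog of gene 1026.
predicted = [
    ("15700267", "1793", "1398"),
    ("11463845", "ortholog-of-1026", "207"),
]

# Hypothetical homology mapping (NOT real HomoloGene content).
homolog_group = {"1026": "group_A", "ortholog-of-1026": "group_A"}

print("exact match:", precision_recall_f1(gold, predicted))
print("homolog-aware:", precision_recall_f1(gold, predicted, homolog_group))
```

Using a `frozenset` for the gene pair makes the comparison independent of the order in which the two IDs are listed and also handles self-interactions, which collapse to a single-element set.
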
Overview of how many submissions correctly identified the protein interactions affected by mutations in the test set

| Number of submissions | Number of relations | Percentage (%) |
|---|---|---|
| 0 | 249 | 28.89 |
| 1 | 81 | 9.40 |
| 2 | 46 | 5.34 |
| 3 | 115 | 13.34 |
| 4 | 96 | 11.14 |
| 5 | 55 | 6.38 |
| 6 | 39 | 4.52 |
| 7 | 56 | 6.50 |
| 8 | 59 | 6.84 |
| 9 | 18 | 2.09 |
| 10 | 22 | 2.55 |
| 11 | 18 | 2.09 |
| 12 | 3 | 0.35 |
| 13 | 4 | 0.46 |
| Total | 869 | 100.00 |

Examples of relations in the test set. For each example we give the article identifier (PMID), the relation as extracted by curators, specified as two Entrez Gene IDs, the number of systems that extracted that particular relation and a text excerpt from the corresponding abstract that describes the relation. The gene mentions are highlighted in the text excerpt, and the Entrez Gene IDs are given in parentheses. The relations that were not detected by any system are typically described across several sentences, describe the absence of an interaction with another protein or contain a self-interaction.

| PMID | Relation (Entrez Gene IDs) | Number of systems | Text excerpt |
|---|---|---|---|
| 15700267 | 1398, 1793 | 13 | Contrary to the effects of the true dominant negative SH2 domain mutants (R38K CrkII) and SH3-N domain mutants (W170K CrkII) that prevent macromolecular assembly of signaling proteins, W276K CrkII increases association between DOCK180 (1793) and CrkII (1398) as well as constitutive tethering of the Crk/DOCK180/ELMO protein complex that interacted with RhoG. |
| 16969499 | 672, 7157 | 13 | Co-immunoprecipitation assays of |
| 11463845 | 1026, 207 | 5 | Here we demonstrate that Akt (207) phosphorylates the cell cycle inhibitory protein p21(Cip1) (1026) at Thr 145 |
| 9234717 | 12402, 18595 | 4 | |
| 16144832 | 300772, 60590 | 0 | Pias1 (300772) binding to mGluR8-C (60590) required a region N-terminal to a consensus sumoylation motif and was not affected by arginine substitution of the conserved lysine 882 within this motif. |
| 8623535 | 1489075, 1489080 | 0 | The E2 binding activity of E1 deletion and point mutant proteins were assayed using glutathione S-transferase E1 fusion proteins and |
| 14985338 | 6804, 9751 | 0 | cAMP-dependent protein kinase (PKA) can modulate synaptic transmission by acting directly on the neurotransmitter secretory machinery. Here we identify one possible target, syntaphilin, which was identified as a molecular clamp that controls free syntaxin-1 and dynamin-1 availability and thereby regulates synaptic vesicle exocytosis and endocytosis. Deletion mutation and site-directed mutagenesis experiments pinpoint dominant PKA phosphorylation sites to serines 43 and 56. PKA phosphorylation of syntaphilin significantly decreases its binding to syntaxin-1A (6804) |
| 15769741 | 285, 285 | 0 | In addition, improper creation of a new cysteine in Ang2 (285) (Ang2S263C) dramatically induced Ang2 aggregation without activating Tie2. |
| 9099695 | 495516, 495516 | 0 | These mutants confirmed that Ser-190 is a major autophosphorylation site of Pim-1 (495516). |
| 9786907 | 1030, 1030 | 0 | Analytical centrifugation allowed to determine that p15 (1030) assembles as a rod-shaped tetramer. Oxidative cross-linking of N-terminal cysteines of the peptide generated specific covalent oligomers, indicating that the N terminus of p15 is a coiled coil that assembles as a parallel tetramer. Mutation of Lys22 into Asp destabilized the tetramer and put forward the presence of a salt bridge between Lys22 and Asp24 in a model building of the stalk. |