Scalable Text Mining Assisted Curation of Post-Translationally Modified Proteoforms in the Protein Ontology.

Karen E Ross, Darren A Natale, Cecilia Arighi, Sheng-Chih Chen, Hongzhan Huang, Gang Li, Jia Ren, Michael Wang, K Vijay-Shanker, Cathy H Wu.

Abstract

The Protein Ontology (PRO) defines protein classes and their interrelationships from the family to the protein form (proteoform) level within and across species. One of the unique contributions of PRO is its representation of post-translationally modified (PTM) proteoforms. However, progress in adding PTM proteoform classes to PRO has been relatively slow due to the extensive manual curation effort required. Here we report an automated pipeline for creation of PTM proteoform classes that leverages two phosphorylation-focused text mining tools (RLIMS-P, which detects mentions of kinases, substrates, and phosphorylation sites, and eFIP, which detects phosphorylation-dependent protein-protein interactions (PPIs)) and our integrated PTM database, iPTMnet. By applying this pipeline, we obtained a set of ~820 substrate-site pairs that are suitable for automated PRO term generation with literature-based evidence attribution. Inclusion of these terms in PRO will increase PRO coverage of species-specific PTM proteoforms by 50%. Many of these new proteoforms also have associated kinase and/or PPI information. Finally, we show a phosphorylation network for the human and mouse peptidyl-prolyl cis-trans isomerase (PIN1/Pin1) derived from our dataset that demonstrates the biological complexity of the information we have extracted. Our approach addresses scalability in PRO curation and will be further expanded to advance PRO representation of phosphorylated proteoforms.
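The step of turning text-mined mentions into substrate-site pairs with literature-based evidence can be illustrated with a minimal sketch. This is not the authors' pipeline and it does not call RLIMS-P, eFIP, or iPTMnet: the `Mention` record layout, the function names `group_substrate_sites` and `term_stub`, the label wording, and the placeholder identifiers are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Mention(NamedTuple):
    """One text-mined phosphorylation mention (hypothetical record layout)."""
    pmid: str               # article the mention was extracted from
    substrate: str          # substrate identifier; the format is an assumption
    site: str               # residue and position, e.g. "S10"
    kinase: Optional[str]   # None when no kinase was detected


def group_substrate_sites(mentions):
    """Collapse mentions into substrate-site pairs, keeping PMIDs and kinases."""
    pairs = defaultdict(lambda: {"pmids": set(), "kinases": set()})
    for m in mentions:
        entry = pairs[(m.substrate, m.site)]
        entry["pmids"].add(m.pmid)
        if m.kinase:
            entry["kinases"].add(m.kinase)
    return dict(pairs)


def term_stub(substrate, site, entry):
    """Build an illustrative term stub; real PRO terms carry more fields."""
    return {
        "label": f"{substrate} phosphorylated at {site}",
        "evidence": sorted(entry["pmids"]),   # literature-based attribution
        "kinases": sorted(entry["kinases"]),
    }


# Placeholder data only; these are not results from the paper.
mentions = [
    Mention("PMID_1", "PROT_A", "S10", "KIN_X"),
    Mention("PMID_2", "PROT_A", "S10", None),
    Mention("PMID_3", "PROT_B", "T5", None),
]
pairs = group_substrate_sites(mentions)
stubs = [term_stub(sub, site, entry) for (sub, site), entry in pairs.items()]
```

Grouping by the (substrate, site) key means each candidate proteoform is listed once, with every supporting article attached as evidence and kinase information carried along only where it was detected.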

Keywords:  Protein Ontology (PRO); phosphorylation; post-translational modification; proteoform; text mining

Year: 2016   PMID: 28706471   PMCID: PMC5504912

Source DB: PubMed   Journal: CEUR Workshop Proc   ISSN: 1613-0073


