PCOSBase: a manually curated database of polycystic ovarian syndrome.

Nor Afiqah-Aleng1, Sarahani Harun1, Mohd Rusman Arief A-Rahman1, Nor Azlan Nor Muhammad1, Zeti-Azura Mohamed-Hussein1,2.   

Abstract

Polycystic ovarian syndrome (PCOS) is one of the main causes of infertility and affects 5-20% of women of reproductive age. Despite the increased prevalence of PCOS, the mechanisms involved in its pathogenesis and pathophysiology remain unclear. The expansion of omics approaches for studying the mechanisms of PCOS has led to the identification of vast numbers of PCOS-related proteins, which makes it challenging to collate and deposit this deluge of data in one place. A knowledge-based repository named PCOSBase was developed to systematically store all proteins related to PCOS. These proteins were compiled from various online databases and published expression studies. Rigorous criteria were developed to identify those that were highly related to PCOS. They were manually curated and analysed to provide additional information on gene ontologies, pathways, domains, tissue localizations and diseases associated with PCOS. Other proteins that might interact with the PCOS-related proteins identified in this study were also included. Currently, 8185 PCOS-related proteins have been identified and assigned to 13 237 gene ontology terms, 1004 pathways, 7936 domains, 29 disease classes, 1928 diseases, 91 tissues and 320 472 interactions. All publications related to PCOS are also indexed in PCOSBase. Data entries are searchable from the main page and from the search, browse and datasets tabs. A protein advanced search is provided to search for specific proteins. To date, PCOSBase has the largest collection of PCOS-related proteins. PCOSBase aims to become a self-contained database that can be used to further understand PCOS pathogenesis and to support the identification of potential PCOS biomarkers. Database URL: http://pcosbase.org.
© The Author(s) 2017. Published by Oxford University Press.

Year:  2017        PMID: 31725861      PMCID: PMC7243924          DOI: 10.1093/database/bax098

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


Introduction

Polycystic ovarian syndrome (PCOS) is an endocrine disorder characterized by at least two of three features, i.e. ovulatory dysfunction, hyperandrogenism and the presence of polycystic ovaries (1). PCOS is difficult to diagnose because these features can lead to various phenotypic manifestations (2). Clinical findings show that women with PCOS have a higher risk of developing other complications such as endometrial cancer (3), diabetes (4), hypertension (5) and depression (6). These phenotypic manifestations and disease associations significantly hinder progress in deciphering the cause of PCOS (7). Transcriptomics (8) and proteomics (9) have been used to identify differences in genes and proteins between non-PCOS and PCOS women, and the resulting data analysis could be used to elucidate the cause of PCOS. The number of published expression studies has increased significantly since 2003, which contributes to the vast amount of PCOS-related molecular data. Unfortunately, these molecular data are scattered across various general biological databases (GenBank and UniProt) and the literature, which makes it difficult to find all genes and proteins that are related to PCOS. This limitation led us to develop PCOSBase to house 8185 manually curated PCOS-related proteins. These proteins were filtered from 17 492 proteins identified from 30 expression studies and 9 databases. Bioinformatic analyses were performed on these proteins to characterize and classify them into specific datasets based on their molecular characteristics. PCOSBase also provides indexed publications related to PCOS. These features distinguish PCOSBase from the previously published PCOSKB (10), which contained 241 sequences as of July 2017.
PCOSBase provides detailed information on proteins and diseases related to PCOS, but not on protein-drug associations such as those described in Open Targets (www.targetvalidation.org). Open Targets lists 1119 proteins identified as drug targets for PCOS, and 73% of them can be found in PCOSBase (11). PCOS is the focus of this study because its complex molecular mechanism is poorly understood, while clinical findings associate it with many well-described diseases. For this reason, PCOSBase serves as a comprehensive, medically oriented repository that provides and integrates accurate molecular information for an in-depth understanding of PCOS. Herein, the development and current status of PCOSBase are described, and the web interfaces provided are discussed systematically. PCOSBase can be accessed online at http://pcosbase.org (PCOSBase v1.0, last updated on 21 November 2017).

Materials and methods

Data collection

Keywords including ‘Polycystic Ovary Syndrome,’ ‘Polycystic Ovary Syndrome 1,’ ‘PCOS,’ ‘polycystic ovaries,’ ‘PCOS,’ ‘PCO,’ ‘PCO1,’ ‘Stein-Leventhal,’ ‘Stein Leventhal,’ ‘Stein-Leventhal Syndrome,’ ‘Polycystic Ovary Disease,’ ‘Polycystic Ovarian Disease,’ ‘PCOD,’ ‘Sclerocystic Ovarian Degeneration,’ ‘Sclerocystic Ovary Syndrome,’ ‘Sclerocystic Ovarian Disease’ and ‘Bilateral PCOS’ were searched in nine disease-associated databases: OMIM (12), HGMD (13), DisGeNET (14), MalaCards (15), PhenomicDb (16), DISEASES (17), DGA (18), GWASdb (19) and GWAS catalog (20). The same PCOS keywords, combined with additional keywords such as ‘gene expression,’ ‘protein expression,’ ‘expression,’ ‘transcriptomics,’ ‘proteomics’ or ‘microarray,’ were also used to search for relevant publications in PubMed (21), ArrayExpress (22), ScienceDirect and Scopus. Genes and proteins that were significantly expressed in those publications were included as PCOS-related proteins. These publications were indexed and listed in PCOSBase. All genes and proteins from the disease-associated databases and published expression studies were compared against the NCBI Gene (23) and UniProt (24) databases to obtain their unique Gene ID and UniProt ID. Overlapping entries that were obtained from more than one database or study were combined.
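The final merging step can be illustrated with a minimal sketch. It assumes each retrieved entry has already been mapped to a Gene ID and a UniProt ID; the record layout, the function name `merge_by_id` and the source assignments are hypothetical and are not taken from the PCOSBase pipeline.

```python
from collections import defaultdict

# Hypothetical records: (source name, NCBI Gene ID, UniProt ID).
# Identifiers and source assignments are illustrative only.
records = [
    ("OMIM", 367, "P10275"),
    ("DisGeNET", 367, "P10275"),
    ("expression study A", 3643, "P06213"),
    ("DisGeNET", 3643, "P06213"),
    ("GWAS catalog", 2492, "P23945"),
]

def merge_by_id(rows):
    """Collapse rows sharing the same (Gene ID, UniProt ID) pair, keeping every source."""
    merged = defaultdict(set)
    for source, gene_id, uniprot_id in rows:
        merged[(gene_id, uniprot_id)].add(source)
    # Sort the sources so the output is stable and easy to compare.
    return {key: sorted(sources) for key, sources in merged.items()}

merged = merge_by_id(records)
for (gene_id, uniprot_id), sources in sorted(merged.items()):
    print(gene_id, uniprot_id, sources)
```

Keying on the identifier pair rather than on gene names avoids treating synonyms from different databases as distinct entries, while the list of sources preserves where each protein was found.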

Functional annotations

To better understand the function of PCOS-related proteins, extensive information on the proteins, such as chromosomal location, gene ontology (GO), pathway, protein structural information, tissue localization, disease-related information and protein-protein interaction (PPI), was retrieved from online databases such as NCBI Gene (23), UniProt (24), Gene Ontology Consortium (25), KEGG (26), BioCarta (27), WikiPathways (28), InterPro (29), Human Protein Atlas (30), DisGeNET (14) and HIPPIE (31), or was obtained from our own bioinformatics analysis where necessary.

Database organization and architecture

All collected data, including relevant information on PCOS-related proteins, functional annotation information and PCOS publications, were organized in 29 tables. Twenty-eight of these tables are linked to one another; only the PCOS publications table stands alone (Figure 1).
Figure 1.

PCOSBase schema. This schema shows all the 29 tables with the connections from table to table.

PCOSBase was built as a relational database using MySQL Server 5.0.11. The web interfaces were designed using Laravel 5.4 (PHP web framework), HTML and JavaScript.
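To make the relational layout more concrete, the following is a heavily reduced sketch of how linked annotation tables and a standalone publications table might look. It uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server, and every table name, column name and row is an assumption made for illustration; the actual 29-table schema is the one shown in Figure 1.

```python
import sqlite3

# Hypothetical, reduced schema: table and column names are assumptions,
# not the real PCOSBase schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (
    uniprot_id TEXT PRIMARY KEY,
    gene_id    INTEGER,
    name       TEXT
);
CREATE TABLE go_term (
    go_id TEXT PRIMARY KEY,
    name  TEXT
);
-- Link table: one row per protein-to-GO annotation (many-to-many).
CREATE TABLE protein_go (
    uniprot_id TEXT REFERENCES protein(uniprot_id),
    go_id      TEXT REFERENCES go_term(go_id),
    PRIMARY KEY (uniprot_id, go_id)
);
-- Publications are stored on their own, with no foreign keys to other tables.
CREATE TABLE publication (
    pmid  INTEGER PRIMARY KEY,
    title TEXT
);
""")

# Illustrative rows; the GO identifier is a placeholder and the pairing is invented.
conn.execute("INSERT INTO protein VALUES ('P10275', 367, 'androgen receptor')")
conn.execute("INSERT INTO go_term VALUES ('GO:0000000', 'single fertilization')")
conn.execute("INSERT INTO protein_go VALUES ('P10275', 'GO:0000000')")

# Resolve protein-to-GO annotations through the link table.
rows = conn.execute("""
    SELECT p.name, g.name
    FROM protein p
    JOIN protein_go pg ON pg.uniprot_id = p.uniprot_id
    JOIN go_term g ON g.go_id = pg.go_id
""").fetchall()
print(rows)
```

A link table of this kind is a common way to represent many-to-many relations such as protein-to-GO or protein-to-pathway annotations; whether PCOSBase uses exactly this pattern is not stated in the text.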

Results and Discussion

Database summary

Figure 2 depicts the organization of the three data types in PCOSBase, i.e. PCOS-related proteins, diseases and publications. Currently, PCOSBase contains 8185 PCOS-related proteins retrieved from nine databases and 30 expression studies. Characterization of these proteins resulted in their classification into 13 237 GOs, 7936 domains, 91 tissues with cell types, 320 472 interactions and 1004 pathways, with most of the proteins located in the metabolic pathways. Prediction of the diseases associated with PCOS revealed 1928 diseases, which were classified into 29 disease classes. A total of 14 368 articles on PCOS are indexed in this database. The number of entries in each dataset is summarized in Table 1.
Figure 2.

Organization of the data types in PCOSBase. These data types are tables that can be found in the Browse and Datasets menus.

Table 1.

Number of entries in the datasets of PCOSBase

Dataset                     Entries
PCOS-related proteins       8185
Gene ontologies             13 237
  Biological processes      8971
  Cellular components       1305
  Molecular functions       2961
Domains                     7936
Pathways                    1004
Interactions                320 472
PCOS-related diseases       1928
Disease classes             29
Tissues                     91
Databases                   9
Resources                   30
  Transcriptomics           19
  Proteomics                11
Publications                14 368
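As a quick consistency check on Table 1, the sub-category counts can be summed and compared with their parent rows. The sketch below uses only numbers stated in this article (the 17 492 initially identified proteins come from the Introduction); the variable names are arbitrary.

```python
# Counts copied from Table 1.
go_total = 13237
go_parts = {
    "biological processes": 8971,
    "cellular components": 1305,
    "molecular functions": 2961,
}
resources_total = 30
resource_parts = {"transcriptomics": 19, "proteomics": 11}

# Each parent row should equal the sum of its indented sub-rows.
go_consistent = sum(go_parts.values()) == go_total
resources_consistent = sum(resource_parts.values()) == resources_total
print(go_consistent, resources_consistent)

# Share of the 17 492 initially identified proteins retained after curation.
retained = 8185 / 17492
print(f"{retained:.1%}")
```

Both sub-totals agree with their parent rows, and roughly 47% of the initially identified proteins were retained as PCOS-related after filtering.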

Database interface and access

The PCOSBase interface contains six main menus, i.e. About, Search, Browse, Datasets, Network and Help, which help the user navigate to the respective pages. Each entry in PCOSBase has a brief description. For example, if the user searches for or selects a protein such as ‘androgen receptor,’ PCOSBase opens the Description page of ‘androgen receptor,’ where seven tabs present different information on the protein. Clicking one of the entries in the GO tab opens the description page of that GO term, and the PCOS-related proteins associated with this ontology are listed below the GO description. Descriptions of pathways, domains, diseases, tissues, databases, resources and partners appear in the same way when the user clicks on those entries.
The homepage displays the total data statistics for every table and five menus that navigate the user to the pages described below. A search box is also provided on this page. Information on PCOSBase and PCOS can be accessed on the ‘About page.’
The ‘Search page’ provides two search options, i.e. Simple Search and Protein Advanced Search. Simple Search works like the search box on the homepage: users can search for proteins, GO terms, pathways, diseases, domains and tissues that match a particular keyword. For example, searching for the keyword ‘androgen’ returns all entries in PCOSBase that contain the term ‘androgen.’ Protein Advanced Search, in contrast, retrieves proteins with a particular combination of annotations, for instance proteins associated with both the GO term ‘single fertilization’ and the disease ‘female infertility.’ It lets users find proteins that match any combination of six fields (protein description, GO, pathway, domain, tissue and disease).
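The difference between the two search modes can be sketched as follows, assuming that Protein Advanced Search combines its criteria with AND semantics, as the ‘single fertilization’ and ‘female infertility’ example suggests. The in-memory index, the protein names and both function names are hypothetical; the real search runs against the PCOSBase database.

```python
# Hypothetical annotation index: field -> term -> set of protein names.
index = {
    "go": {"single fertilization": {"protein A", "protein B", "protein C"}},
    "disease": {"female infertility": {"protein B", "protein C", "protein D"}},
    "tissue": {"ovary": {"protein C", "protein D"}},
}

def advanced_search(index, **criteria):
    """Return proteins annotated with every requested (field, term) pair."""
    result = None
    for field, term in criteria.items():
        hits = index.get(field, {}).get(term, set())
        # The first criterion seeds the result; later ones narrow it down.
        result = set(hits) if result is None else result & hits
    return result if result is not None else set()

def simple_search(index, keyword):
    """Return (field, term) entries whose term contains the keyword, ignoring case."""
    keyword = keyword.lower()
    return sorted(
        (field, term)
        for field, terms in index.items()
        for term in terms
        if keyword in term.lower()
    )

print(advanced_search(index, go="single fertilization", disease="female infertility"))
print(simple_search(index, "fertil"))
```

Simple Search is a substring match across all annotation types, whereas the advanced variant intersects the protein sets behind each selected annotation, so adding a criterion can only shrink the result.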
Users can access all 11 datasets in PCOSBase through the ‘Browse page.’ These datasets are classified by their biological information, as described below:
PCOS-related proteins dataset: contains the list of 8185 proteins related to PCOS that were retrieved from various sources.
GO dataset: contains GO vocabulary information on all PCOS-related proteins.
Pathways dataset: contains all identified pathways in which PCOS-related proteins are involved.
Interactions dataset: contains information on PPIs of PCOS-related proteins.
Domains dataset: contains information on the domains present in all PCOS-related proteins.
Tissues dataset: provides information on the tissues and cell types in which PCOS-related proteins are expressed.
Databases dataset: contains the list of publicly available databases from which PCOS-related proteins were obtained.
Resources dataset: contains the expression studies of all PCOS-related proteins retrieved from transcriptomic and proteomic data.
PCOS-related diseases dataset: contains identified diseases that are related to PCOS-related proteins.
Disease classes dataset: contains information on PCOS-related diseases based on the Medical Subject Headings tree.
Publications dataset: provides all publications from PubMed that relate to PCOS.
The Datasets dropdown menu links to all datasets in PCOSBase. The Datasets tab is placed in the header and appears on every page of PCOSBase, which allows users to quickly select and go to their desired dataset page.
The Network menu contains all networks constructed using the PCOS-related proteins, Interactions and PCOS-related diseases datasets. Currently, PCOSBase provides only several static PCOS networks. Figure 3 shows one of the networks found in this menu; it depicts the association of PCOS with other diseases.
Figure 3.

PCOS-disease interaction network. This network is predicted based on PPI and 20 diseases have been predicted to be highly associated with PCOS. The network demonstrates the complexity of PCOS-diseases association and the size of the nodes indicates the degree of association between PCOS and diseases. Green node represents PCOS and size of each node denotes number of shared proteins between PCOS and its respective associated disease.
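The node sizing described in the Figure 3 caption, where size reflects the number of proteins shared between PCOS and a disease, can be illustrated with a small sketch. It covers only the counting and ranking step and does not reproduce the PPI-based prediction used to build the published network; the protein sets, disease names and function name are invented.

```python
# Hypothetical protein sets; names and memberships are invented.
pcos_proteins = {"P1", "P2", "P3", "P4", "P5"}
disease_proteins = {
    "disease X": {"P1", "P2", "P3", "P9"},
    "disease Y": {"P4", "P8"},
    "disease Z": {"P7"},
}

def shared_protein_counts(pcos, diseases):
    """Rank diseases by the number of proteins they share with the PCOS set."""
    counts = {name: len(pcos & proteins) for name, proteins in diseases.items()}
    # Drop diseases with no overlap; sort by count (descending), then by name.
    return sorted(
        ((name, count) for name, count in counts.items() if count > 0),
        key=lambda item: (-item[1], item[0]),
    )

ranking = shared_protein_counts(pcos_proteins, disease_proteins)
for name, count in ranking:
    print(name, count)
```

In a network drawing, each count would map to the size of the corresponding disease node, and taking the top entries of the ranking would give a shortlist comparable to the 20 diseases shown in Figure 3.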

The Help menu provides the user manual of PCOSBase, the database schema and all the references that were used to retrieve the data. All terms, definitions and references used in PCOSBase are also provided on the Help page.

Conclusion and future perspective

In the next few years, the volume of PCOS molecular data is expected to increase, especially with the application of new sequencing technologies such as next-generation sequencing to the analysis of PCOS samples. To keep PCOSBase up to date, all information in this database will be updated periodically. Comprehensive cataloguing of all types of data in PCOS publications is important to ensure that they are accessible to PCOS researchers and clinicians for quick and easy reference. Ultimately, the genomic and molecular information in this database will serve as a reliable repository that can be used to search for potential PCOS biomarkers, towards the development of improved diagnostics and treatment for PCOS.
  30 in total

1.  Proteomics of follicular fluid from women with polycystic ovary syndrome suggests molecular defects in follicular development.

Authors:  Aditi S Ambekar; Dhanashree S Kelkar; Sneha M Pinto; Rakesh Sharma; Indira Hinduja; Kusum Zaveri; Akhilesh Pandey; T S Keshava Prasad; Harsha Gowda; Srabani Mukherjee
Journal:  J Clin Endocrinol Metab       Date:  2014-11-13       Impact factor: 5.958

2.  PCOS and obesity: insulin resistance might be a common etiology for the development of type I endometrial carcinoma.

Authors:  Xin Li; Ruijin Shao
Journal:  Am J Cancer Res       Date:  2014-01-15       Impact factor: 6.166

3.  UniProt: a hub for protein information.

Authors: 
Journal:  Nucleic Acids Res       Date:  2014-10-27       Impact factor: 16.971

4.  OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders.

Authors:  Joanna S Amberger; Carol A Bocchini; François Schiettecatte; Alan F Scott; Ada Hamosh
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 19.160

Review 5.  Polycystic ovary syndrome and mental disorders: a systematic review and exploratory meta-analysis.

Authors:  Sergio Luís Blay; João Vicente Augusto Aguiar; Ives Cavalcante Passos
Journal:  Neuropsychiatr Dis Treat       Date:  2016-11-08       Impact factor: 2.570

6.  HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks.

Authors:  Gregorio Alanis-Lobato; Miguel A Andrade-Navarro; Martin H Schaefer
Journal:  Nucleic Acids Res       Date:  2016-10-24       Impact factor: 16.971

7.  InterPro in 2017-beyond protein family and domain annotations.

Authors:  Robert D Finn; Teresa K Attwood; Patricia C Babbitt; Alex Bateman; Peer Bork; Alan J Bridge; Hsin-Yu Chang; Zsuzsanna Dosztányi; Sara El-Gebali; Matthew Fraser; Julian Gough; David Haft; Gemma L Holliday; Hongzhan Huang; Xiaosong Huang; Ivica Letunic; Rodrigo Lopez; Shennan Lu; Aron Marchler-Bauer; Huaiyu Mi; Jaina Mistry; Darren A Natale; Marco Necci; Gift Nuka; Christine A Orengo; Youngmi Park; Sebastien Pesseat; Damiano Piovesan; Simon C Potter; Neil D Rawlings; Nicole Redaschi; Lorna Richardson; Catherine Rivoire; Amaia Sangrador-Vegas; Christian Sigrist; Ian Sillitoe; Ben Smithers; Silvano Squizzato; Granger Sutton; Narmada Thanki; Paul D Thomas; Silvio C E Tosatto; Cathy H Wu; Ioannis Xenarios; Lai-Su Yeh; Siew-Yit Young; Alex L Mitchell
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

8.  Database Resources of the National Center for Biotechnology Information.

Authors: 
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971

9.  WikiPathways: pathway editing for the people.

Authors:  Alexander R Pico; Thomas Kelder; Martijn P van Iersel; Kristina Hanspers; Bruce R Conklin; Chris Evelo
Journal:  PLoS Biol       Date:  2008-07-22       Impact factor: 8.029

10.  GWASdb v2: an update database for human genetic variants identified by genome-wide association studies.

Authors:  Mulin Jun Li; Zipeng Liu; Panwen Wang; Maria P Wong; Matthew R Nelson; Jean-Pierre A Kocher; Meredith Yeager; Pak Chung Sham; Stephen J Chanock; Zhengyuan Xia; Junwen Wang
Journal:  Nucleic Acids Res       Date:  2015-11-28       Impact factor: 16.971
