
MDPD: an integrated genetic information resource for Parkinson's disease.

Suisheng Tang, Zhuo Zhang, Gopalakrishnan Kavitha, Eng-King Tan, See Kiong Ng.

Abstract

Parkinson's disease (PD) is the second most common neurodegenerative disorder affecting millions of people. Both environmental and genetic factors play important roles in its causation and development. Genetic analysis has shown that over 100 genes are correlated with the etiology and pathology of PD. However, accessing genetic information in a consistent and fruitful way is not an easy task. The Mutation Database for Parkinson's Disease (MDPD) is designed to fulfill the need for information integration so that users can easily retrieve, inspect and enhance their knowledge on PD. The database contains 2391 entries on 202 genes extracted from 576 publications and manually examined by biomedical researchers. Each genetic substitution and the resulting impact are clearly labelled and linked to its primary reference. Every reported gene has a summary page that provides information on the variation impact, mutation type, the studied population, mutation position and reference collection. In addition, MDPD provides a unique functionality for users to compare the differences on the type of mutations among ethnic groups. As such, we hope that MDPD will serve as a valuable tool to bridge the gap between genetic analysis and clinical practice. MDPD is publicly accessible at http://datam.i2r.a-star.edu.sg/mdpd/.

Year:  2008        PMID: 18948286      PMCID: PMC2686576          DOI: 10.1093/nar/gkn770

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Parkinson's disease (PD) is a progressive neurological disease that affects millions of people worldwide, including ∼1.8% of the population aged over 65 years (1). The death of dopaminergic neurons in the substantia nigra is a pathological feature of the disease. PD is a complex disease in which both environmental and genetic factors contribute to causation and development. Evidence for a genetic component is that first-degree relatives of familial PD patients are more vulnerable to PD than the general population, and this effect is especially pronounced for early-onset PD (2,3). However, reliable biomarkers or tests to facilitate early and accurate diagnosis are currently not available (4). Genetic testing, if available, could complement clinical diagnostic criteria. Several genetic mutations and variants (point substitutions, deletions, insertions and even polymorphisms) have been positively associated with PD (5,6). PARK2, LRRK2, PINK1, SNCA, UCHL1 and PARK7 are the most frequently studied genes (7). Some genetic variants are considered causal factors, while others may cause neuronal dysfunction indirectly. For example, mutations in the 5′-UTR of NR4A2 significantly decrease the expression of NR4A2 and of its downstream gene, tyrosine hydroxylase (8). At present, over 100 genes have been reported to be associated with PD in various forms. Some of these genetic variants are widespread among patients, while others are ethnicity-related risk factors. LRRK2 G2019S is a common pathogenic mutation found in 5–7% of familial PD and 1–2% of sporadic PD worldwide (9,10). At the same time, this mutation shows specific ethnic prevalence, with exceptionally high frequency in North African Arabs (37–42% in familial and 41% in sporadic PD) (11), but it is rare in Chinese (12,13). Another population-specific example is the GBA gene. Both the R496H and c.84insGGfs mutations are found in Ashkenazi Jewish patients (14,15) and have not been reported in other ethnic groups.
Genetic screening and treatment strategies could be improved if the genetic features of PD were well characterized. Further elucidation of such information may also lead to new developments in diagnostic methods and early treatment. Owing to advances in genetic technology and the polygenic nature of PD, genetic information on the disease has accumulated rapidly in the past decade. The amount of data presents a daunting challenge for individual researchers who search for and examine the information they need. For instance, a PubMed search with the keywords ‘Parkinson's disease’ AND ‘mutation’ returns over 1300 reports. The mixture of gene names, official symbols and aliases used in the literature further amplifies the difficulty of retrieving relevant information. In addition, although information on gene function, gene sequences, protein structure and mutation reports is searchable, it is scattered across various databases such as Entrez Gene (16), GenBank (17), Swiss-Prot (18), OMIM (19) and PubMed (http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed). The Human Gene Mutation Database [HGMD; (20)], which covers mutation information for over 2800 human genes, is the most comprehensive, but its free public version is limited to less up-to-date information. Moreover, it is not built specifically for PD: a search of the HGMD for PD returns mutation information for fewer than 20 genes. Thus, to obtain the desired information, researchers have to perform at least two time-consuming tasks: (i) examine a large volume of data; and (ii) query a number of different databases. A few databases specialize in PD mutation information, but they are limited in either coverage or functionality. For example, the LOVD Parkinson's disease mutation database developed by the Parkinson's Institute in Leiden University (http://www.grenada.lumc.nl/LOVD2/TPI/home.php) briefly covers six genes with a total of 71 variants.
PDGENE (http://www.pdgene.org/) is more comprehensive and contains the most up-to-date list of PD candidate genes, with an emphasis on genetic association studies. However, it lacks functionality for ethnic comparison and does not provide summary or statistical reports. To address some of the current limitations of PD databases and to facilitate effective and comprehensive information acquisition, we have developed the Mutation Database for PD (MDPD). Through a single online location with user-friendly interfaces, researchers are able to retrieve the latest information on PD, covering genetic variation (mutation and polymorphism), population studies, literature evidence and gene sequences. Cross-references to public databases are incorporated to assist further exploration and evaluation.

DATA SOURCE AND CLASSIFICATION

MDPD is a specialist database that presents human mutation information relevant to PD as an online resource. Data from animal models and cell lines are not included. Mutation evidence is extracted from the PubMed database (from 1995 to 10 June 2008) based on a keyword search [‘Parkinson's disease/genetics’ (MeSH)]. Sequences, variants and general gene information are obtained from the other databanks mentioned above (the latest update of MDPD was 4 September 2008). Every mutation entry has been manually examined by a researcher specialized in genetics. Important information, such as the size of the study sample, control group, population, age of onset, type of PD (sporadic versus familial), mutation outcome, reference sequence and possible impact of the variant, has been manually extracted from the data source. In addition to mutations, single nucleotide polymorphisms (SNPs) can link genotype to disease susceptibility (21). SNPs can fall within coding regions or, in most cases, within non-coding regions. In a non-coding region, a SNP does not change the protein sequence but may still affect the risk of disease by altering a splicing site or a transcription factor binding site. Some SNPs have protective effects, while others may increase the risk of PD, or show no significant outcome and need further research. As we believe SNP information plays an important role in the design of genetic tests and in understanding the mechanisms of the disease, SNPs are included in our database. On the other hand, variations that lack precise genetic locations, such as variants in approximate chromosome regions or markers in inter-gene regions, are not included in MDPD. We believe genetic location is critical in mutation studies and such approximate results need to be resolved before inclusion in the database.
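As an illustration of the literature search described above, the following sketch builds a query URL for the NCBI E-utilities ESearch service with a publication-date window. It only constructs the URL and does not contact NCBI. The exact query string, the start day within 1995 and the helper name `build_pubmed_query` are assumptions; the text gives only the MeSH keyword and the date range, not the syntax that was actually used.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term, min_date, max_date, retmax=100):
    """Return an ESearch URL restricted to a publication-date window."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter on publication date
        "mindate": min_date,  # YYYY/MM/DD
        "maxdate": max_date,  # YYYY/MM/DD
        "retmax": retmax,     # maximum number of PMIDs returned
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Assumed query syntax and assumed start day (1 January 1995).
url = build_pubmed_query(
    '"Parkinson Disease/genetics"[MeSH]', "1995/01/01", "2008/06/10"
)
print(url)
```

The records returned by such a query would still need the manual examination described above before entering the database.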
Case–control studies and genome-wide association studies (GWAS) have shown that PD is a polygenic disease in which mutations in multiple genes are responsible for the malfunction (7,22). Even monogenic forms of PD can result from a variety of mutations, and the mutations in a particular gene can yield diverse consequences. A single change in the nucleic acid sequence may lead to a multitude of possible outcomes, including amino acid exchange (missense mutation), no amino acid change (silent mutation), peptide truncation (nonsense mutation), absence of the protein (deletion) and even the production of a different protein (frame shift or insertion). Other types of mutation, such as duplication and compound mutation (more than one type of mutation), are also found in PD. As such, we categorize the genetic variations into the following: missense mutation, silent mutation, nonsense mutation, compound mutation, deletion, insertion, duplication, triplication, frame shift, short repeat and SNP. All of these classifications are based on the primary reference as the trusted data resource. We believe that such categorization can help the user to examine and compare the variations more efficiently. Another classification in MDPD is the variation's impact, which reflects the outcome of the variation. Divergent impacts are expected owing to differences in the method used, the sample size and the studied population, so one variation may have more than one ‘Impact’. For example, the V380L substitution in the PARK2 gene is marked ‘Associated’ since it has been found in a sample of early-onset PD patients (23). It is also tagged as ‘Negative Result’ in another study owing to the lack of significant association with patients (24). Such inevitable discrepancy of impact reflects the complicated nature of PD, in which multiple genetic factors play various roles and interactions between genetic and environmental factors may influence the end result dynamically.
According to the reported effects of each variation, we classified its ‘Impact’ as ‘protective factor’, ‘risk factor’, ‘associated’, ‘questionable’ or ‘negative result’. For example, if a ‘protective’ or ‘risk’ effect has been mentioned in the primary reference of a variant, we assign its impact as ‘protective factor’ or ‘risk factor’. ‘Associated’ is allocated to variants showing a significant difference between patients and controls. If a variant has been reported in both patients and controls without a statistically significant difference, we label it as ‘questionable’. The classification aims to help users summarize information according to the comparison outcome.
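The labelling rules above can be written as a small decision function. This is a sketch under stated assumptions, not the curation procedure actually used for MDPD: the argument names are hypothetical, and the final fallback to ‘negative result’ is an assumption, since the text does not give a precise boundary between ‘questionable’ and ‘negative result’.

```python
from enum import Enum


class Impact(Enum):
    PROTECTIVE_FACTOR = "protective factor"
    RISK_FACTOR = "risk factor"
    ASSOCIATED = "associated"
    QUESTIONABLE = "questionable"
    NEGATIVE_RESULT = "negative result"


def classify_impact(stated_effect, significant, found_in_controls):
    """Map one study's finding to an impact label.

    stated_effect: "protective", "risk" or None, as claimed by the primary reference
    significant: True if patients and controls differ significantly
    found_in_controls: True if the variant was also observed in controls
    """
    if stated_effect == "protective":
        return Impact.PROTECTIVE_FACTOR
    if stated_effect == "risk":
        return Impact.RISK_FACTOR
    if significant:
        return Impact.ASSOCIATED
    if found_in_controls:
        return Impact.QUESTIONABLE
    # Assumption: anything left over is treated as a negative result.
    return Impact.NEGATIVE_RESULT


# One variation can collect several impacts from different studies.
impacts = {
    classify_impact(None, True, False),   # a study reporting an association
    classify_impact(None, False, False),  # a study finding no association
}
```

Keeping the impacts of a variation in a set mirrors the V380L case, where the same substitution carries both ‘Associated’ and ‘Negative Result’ labels from different studies.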

DATABASE STRUCTURE AND USAGE

MDPD is designed to be a publicly accessible online resource with a user-friendly interface. MySQL, a reliable and proven relational database, is used to organize the information. A web-based user interface to the database is provided via an Apache 2.0 HTTP server with the PHP scripting engine. MDPD contains the following functional pages: Browser, Search, Compare, Statistics and Variation Report. Three search options are available on the search page. First, a variation search can be based on the gene name, gene ID or SWISS-PROT accession number. In accordance with the recommendations of the HUGO Nomenclature Committee, official gene symbols are used in each record, but MDPD also provides all known aliases in the result pages for easy reference and retrieval. For example, both PARK1 and SNCA are valid search terms in the database. The two other options are searching by geographic region and searching by the name of an author in the reference collection. Searching by geographic region enables a user to quickly see which genes have been studied in the region (or population), and the number of related publications may indicate regional research efforts. Searching by a reference author's name can help researchers in the community identify the research interests of leading authors and their collaborators. Hyperlinks to each reference and gene symbol are provided in the report page. Various useful features have been built into MDPD and span several web pages for ease of use and navigation. The complete list of web pages and their corresponding features is given in Table 1. We highlight the ‘Variation Report’ page here as we believe that it is a useful feature of MDPD.
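The alias-aware gene search described above can be sketched with two relational tables. MDPD itself runs on MySQL with PHP; this example uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server. The table and column names, the internal IDs and the helper `search_genes` are hypothetical, since the actual MDPD schema is not described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: one row per official symbol, many aliases per gene.
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
    CREATE TABLE gene_alias (
        gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
        alias   TEXT NOT NULL
    );
    -- Placeholder internal IDs, not Entrez Gene IDs.
    INSERT INTO gene VALUES (1, 'SNCA'), (2, 'PARK2');
    INSERT INTO gene_alias VALUES (1, 'PARK1'), (2, 'parkin');
    """
)


def search_genes(conn, term):
    """Return official symbols whose symbol or alias contains `term`."""
    pattern = f"%{term}%"  # partial-name match; LIKE ignores ASCII case in SQLite
    rows = conn.execute(
        """
        SELECT DISTINCT g.symbol
        FROM gene AS g
        LEFT JOIN gene_alias AS a ON a.gene_id = g.gene_id
        WHERE g.symbol LIKE ? OR a.alias LIKE ?
        ORDER BY g.symbol
        """,
        (pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]


print(search_genes(conn, "PARK1"))  # alias lookup resolves to the official symbol
print(search_genes(conn, "SNCA"))   # official symbol lookup
```

Storing aliases in their own table keeps the official symbol as the single key for every record while still letting either name serve as a search term.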
In addition to providing generic gene information and hyperlinks to Entrez, Swiss-Prot and OMIM, the ‘Variation Report’ page also covers detailed information on ‘Variation Impact’, ‘Variation Type’, ‘Studied countries’, ‘Variation sequence’ (at both amino acid and nucleic acid levels) and ‘PubMed collection’. For ‘Variation Impact’ and ‘Variation Type’, we do not attempt to modify the findings from the primary reference, as mentioned previously. Users are advised to judge the classification based on their own knowledge and up-to-date research. To understand the impact of a genetic substitution, the user should be aware of any conflicting results from divergent ethnic groups or from specific subsets of patients. Such information is readily accessible in MDPD. ‘Studied countries’ provides information about the geographic regions and ethnic groups of the patients studied. This information is also valuable for refining population screening targets to avoid wasting resources. Researchers may also use it to identify key genetic factors, namely those for which impactful variations have been reported in many geographic regions (or ethnic groups). For example, MDPD includes 74 literature reports that describe 258 different variants of PARK2 from 33 geographic regions. Deletion, duplication, triplication, insertion, missense mutation, nonsense mutation, silent mutation and compound mutation were all found in this gene. Among them, missense mutation and deletion are the most frequently reported, with 140 and 86 records, respectively. Based on such information, a user can quickly make a reasonable inference about the importance of PARK2. To further confirm this inference, the user can examine the primary references through ‘PubMed collection’ and investigate the mutation ‘hot spot’ through ‘Variation sequence’.
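The per-gene figures quoted above (number of reports, distinct variants, regions, and counts by variation type) are simple aggregates over the variation records. A minimal sketch follows, assuming a flat record layout; the field names, region labels and reference labels are invented placeholders, and only the variant names V380L and G2019S come from the text.

```python
from collections import Counter
from typing import NamedTuple


class VariationRecord(NamedTuple):
    gene: str
    variant: str
    variation_type: str
    region: str      # placeholder labels, not real study regions
    reference: str   # placeholder labels, not real PubMed IDs


records = [
    VariationRecord("PARK2", "V380L", "missense mutation", "region-1", "ref-1"),
    VariationRecord("PARK2", "V380L", "missense mutation", "region-2", "ref-2"),
    VariationRecord("PARK2", "deletion-x", "deletion", "region-1", "ref-1"),
    VariationRecord("LRRK2", "G2019S", "missense mutation", "region-3", "ref-3"),
]


def summarize_gene(records, gene):
    """Aggregate the records of one gene in the manner of a summary page."""
    rows = [r for r in records if r.gene == gene]
    return {
        "records": len(rows),
        "variants": len({r.variant for r in rows}),
        "regions": len({r.region for r in rows}),
        "references": len({r.reference for r in rows}),
        "types": Counter(r.variation_type for r in rows),
    }


summary = summarize_gene(records, "PARK2")
print(summary["types"].most_common(2))
```

Counting distinct variants, regions and references separately from raw records matters because one variant is typically reported by several studies.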
Table 1.

Functional summary of MDPD

| Web page | Contents |
|---|---|
| Browse | Alphabetic list of gene symbol |
| | Chromosomal location of the gene |
| | Links to Entrez Gene and SWISS-PROT databases |
| | Entry to summary page |
| Search | Search gene name, symbol, aliases (allow partial name), gene ID, SWISS-PROT ID |
| | Search studied geographic regions (ethnic group) |
| | Search author's name for primary reference |
| Summary | About the gene |
| | Number of records for the gene |
| | Number of variants reported in SWISS-PROT database |
| | Link to OMIM database |
| | Number of Pubmed reference for the gene in MDPD |
| Variation report | List of variation impact |
| | List of variation type |
| | List of studied geographic regions (ethnic groups) |
| | Variation sequence (in both amino acid and nucleic acid levels) |
| | List of PubMed reference for the gene |
| | Entry to individual variation report (sample size, control group, age, gender, testing variation, impact, geographic and comments) |
| Comparison | Comparing genetic data from any two geographic regions |
| Statistics | Key statistics in MDPD |
| | Top 10 genes with most literature reports |
| | Top 10 genes with most reported negative variants |
| | Top 10 countries/regions with most studies done |
Allowing users to compare mutations between ethnic groups is another helpful element of MDPD. Users can readily obtain a list of mutated genes reported for an ethnic group of interest from ‘Search’. Comparing the mutations between two ethnic groups of interest can also be done easily in ‘Comparison’. Currently, more than 2300 entries covering 202 human genes are stored in MDPD. Through systematic data mining on the integrated information, MDPD offers researchers new means of inspecting and making sense of the mutation evidence in published findings. MDPD is publicly accessible at http://datam.i2r.a-star.edu.sg/mdpd/.
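A comparison between two regions or ethnic groups reduces to set operations on the variants reported for each. The sketch below is a hypothetical illustration and not the MDPD implementation: the function names are invented, and the assignment of variants to `region-1` and `region-2` is placeholder data rather than reported findings.

```python
# (gene, variant, region) triples; the region assignments are placeholders.
records = [
    ("LRRK2", "G2019S", "region-1"),
    ("LRRK2", "G2019S", "region-2"),
    ("PARK2", "V380L", "region-1"),
    ("GBA", "R496H", "region-2"),
]


def variants_in(records, region):
    """Collect the distinct (gene, variant) pairs reported for one region."""
    return {(gene, variant) for gene, variant, rec_region in records
            if rec_region == region}


def compare_regions(records, region_a, region_b):
    """Split variants into those shared by both regions and those unique to each."""
    a = variants_in(records, region_a)
    b = variants_in(records, region_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


result = compare_regions(records, "region-1", "region-2")
print(sorted(result["shared"]))
print(sorted(result["only_a"]))
print(sorted(result["only_b"]))
```

A variant missing from one region may only mean that it has not been studied there, so region-specific sets are best read as leads for further study.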

DISCUSSION

Gene mutations and variations have become the focus of PD research in the last decade. Linkage mapping, case–control studies, pedigree analysis and GWAS are powerful approaches for identifying genetic contributions to PD. However, how the various mutations and variants affect the disease and shape its development remains unclear. In addition, each of these approaches has certain limitations. Various research biases and errors reduce the reproducibility of association findings and diminish the power to assess the relationship between genetic variants and the risk of common disease (25,26). To partially overcome the limitations of accumulated imperfect data, we intend to include all published reports that give precise sample sizes (both cases and controls), variation position, variation impact and geographic location. We expect that multiple independent genetic studies, taken together, can yield meaningful results. At the same time, we remind users to be cautious when interpreting deductions from published data. Many genes, and multiple variants within a gene, have been registered as positively correlated with PD. Possessing the available information is an essential first step toward further understanding and, eventually, toward devising effective strategies for diagnosis and treatment. MDPD is an integrated information system that aims to facilitate PD research. It contains records for over 100 PD-associated genes verified from various genetic tests. Among them, the top 10 most reported mutation genes are LRRK2, PARK2, SNCA, CYP2D6, MAPT, PINK1, UCHL1, PARK7, MAOB and APOE (Table 2). The data in MDPD also reveal that current research has focused on certain key genetic targets: the top 10 genes account for 1053 entries from 326 publications, which make up 44% of the total records and more than half (57%) of the literature reports in MDPD. At the same time, we realize that 202 genes are somewhat under the current research radar.
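The shares quoted above can be re-derived from the Table 2 figures and the database totals given in the abstract (2391 entries, 576 publications). The short check below assumes, as the published total of 326 implies, that the per-gene reference counts are simply summed without removing publications that cover more than one gene.

```python
# Table 2: gene -> (records, countries/regions, references)
TABLE_2 = {
    "LRRK2": (265, 38, 76),
    "PARK2": (329, 28, 66),
    "SNCA": (117, 24, 51),
    "CYP2D6": (61, 16, 31),
    "MAPT": (39, 14, 20),
    "PINK1": (78, 18, 19),
    "UCHL1": (48, 12, 18),
    "PARK7": (58, 12, 16),
    "MAOB": (37, 7, 15),
    "APOE": (21, 8, 14),
}

TOTAL_RECORDS = 2391       # entries in MDPD
TOTAL_PUBLICATIONS = 576   # publications in MDPD

top_records = sum(row[0] for row in TABLE_2.values())
top_references = sum(row[2] for row in TABLE_2.values())

record_share = top_records / TOTAL_RECORDS
reference_share = top_references / TOTAL_PUBLICATIONS

print(f"{top_records} records, {record_share:.1%} of all entries")
print(f"{top_references} references, {reference_share:.1%} of all publications")
```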
Table 2.

Top 10 genes with the most published reference in MDPD

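| Gene name | No. of records | No. of countries/regions | No. of references |
|---|---|---|---|
| LRRK2 | 265 | 38 | 76 |
| PARK2 | 329 | 28 | 66 |
| SNCA | 117 | 24 | 51 |
| CYP2D6 | 61 | 16 | 31 |
| MAPT | 39 | 14 | 20 |
| PINK1 | 78 | 18 | 19 |
| UCHL1 | 48 | 12 | 18 |
| PARK7 | 58 | 12 | 16 |
| MAOB | 37 | 7 | 15 |
| APOE | 21 | 8 | 14 |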
Another interesting outcome from MDPD is the high frequency of ‘negative result’ in the variation reports. For example, >30% of records are labelled ‘negative’ in 9 of the top 10 most reported genes (the exception is PARK2, with 15.2% negative reports). There are at least three implications: (i) many variants have a low incidence in PD patients and may not be good screening targets for surveys; (ii) these variants may have an insignificant impact on PD; and (iii) discrepancies may be caused by ethnicity-related genetic variation, sample size, the methods used or research errors. The variants with the most positive reports could be valuable genetic targets, and further studies on them may lead to breakthroughs in diagnosis and treatment. PD is a multi-factorial disease for which environmental factors and genetic elements are likely to be equally important. Studies have shown that lifestyle (such as smoking and coffee consumption), pesticide and metal exposure, and even drinking well water are factors that influence the risk of disease in both sporadic and early-onset PD (27,28,29). The involvement of multiple genes, the high incidence in the aging population and the high percentage of sporadic cases suggest the possibility of multiple interactions and connections in the etiology of PD. In many cases, it is difficult to isolate the environmental factors and to specify short- and long-term exposure. As such, we did not include environmental study information in MDPD, but the user should be mindful of potential interactions with environmental factors.

FUTURE WORK

Discovering the relationships between the various genetic factors is an essential step toward understanding the mechanism of complex diseases such as PD. From MDPD, we know that at least 202 genes have been examined for their possible involvement in PD. Our future work will be to develop an information system that can assess the impact of disease-causing mutations in terms of the functional changes of their encoded proteins and their interactions. Further integrated information, such as multiple-level protein–protein interactions and the role of the genetic variants in various neurodegenerative pathways, will hopefully provide insights that lead to novel treatments for PD.

FUNDING

Institute for Infocomm Research (I2R); Agency for Science, Technology and Research (A*STAR), Singapore.

Conflict of interest statement. None declared.
  29 in total

Review 1.  Parkinson disease and its differentials. Diagnoses made easy.

Authors:  D K Chan
Journal:  Aust Fam Physician       Date:  2001-11

2.  Human Gene Mutation Database: towards a comprehensive central mutation database.

Authors:  P D Stenson; E Ball; K Howells; A Phillips; M Mort; D N Cooper
Journal:  J Med Genet       Date:  2008-02       Impact factor: 6.318

3.  Comprehensive evaluation of common genetic variation within LRRK2 reveals evidence for association with sporadic Parkinson's disease.

Authors:  Lisa Skipper; Yi Li; Carine Bonnard; Ratnagopal Pavanni; Yuen Yih; Eva Chua; Wing-Kin Sung; Louis Tan; Meng-Cheong Wong; Eng-King Tan; Jianjun Liu
Journal:  Hum Mol Genet       Date:  2005-11-03       Impact factor: 6.150

4.  Mutations in the glucocerebrosidase gene are associated with early-onset Parkinson disease.

Authors:  L N Clark; B M Ross; Y Wang; H Mejia-Santana; J Harris; E D Louis; L J Cote; H Andrews; S Fahn; C Waters; B Ford; S Frucht; R Ottman; K Marder
Journal:  Neurology       Date:  2007-09-18       Impact factor: 9.910

5.  Complex relationship between Parkin mutations and Parkinson disease.

Authors:  Andrew West; Magali Periquet; Sarah Lincoln; Christoph B Lücking; David Nicholl; Vincenzo Bonifati; Nina Rawal; Thomas Gasser; Ebba Lohmann; Jean-François Deleuze; Demetrius Maraganore; Allan Levey; Nick Wood; Alexandra Dürr; John Hardy; Alexis Brice; Matt Farrer
Journal:  Am J Med Genet       Date:  2002-07-08

6.  Genetic and environmental findings in early-onset Parkinson's disease Brazilian patients.

Authors:  Patricia de Carvalho Aguiar; Patricia Silva Lessa; Clecio Godeiro; Orlando Barsottini; Andre Carvalho Felício; Vanderci Borges; Sonia Maria de Azevedo Silva; Roberta Arb Saba; Henrique Ballalai Ferraz; Carlos A Moreira-Filho; Luiz Augusto F Andrade
Journal:  Mov Disord       Date:  2008-07-15       Impact factor: 10.338

7.  A common LRRK2 mutation in idiopathic Parkinson's disease.

Authors:  William P Gilks; Patrick M Abou-Sleiman; Sonia Gandhi; Shushant Jain; Andrew Singleton; Andrew J Lees; Karen Shaw; Kailash P Bhatia; Vincenzo Bonifati; Niall P Quinn; John Lynch; Daniel G Healy; Janice L Holton; Tamas Revesz; Nicholas W Wood
Journal:  Lancet       Date:  2005 Jan 29-Feb 4       Impact factor: 79.321

8.  Entrez Gene: gene-centered information at NCBI.

Authors:  Donna Maglott; Jim Ostell; Kim D Pruitt; Tatiana Tatusova
Journal:  Nucleic Acids Res       Date:  2006-12-05       Impact factor: 16.971

9.  Mendelian Inheritance in Man and its online version, OMIM.

Authors:  Victor A McKusick
Journal:  Am J Hum Genet       Date:  2007-03-08       Impact factor: 11.025

10.  GenBank.

Authors:  Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; David L Wheeler
Journal:  Nucleic Acids Res       Date:  2007-12-11       Impact factor: 16.971

