
Identification and analysis of biomarkers for mismatch repair proteins: A bioinformatic approach.

Abstract

INTRODUCTION: Mismatch repair is a highly conserved process from prokaryotes to eukaryotes. Defects in the human homologues of the Mut proteins impair mismatch repair and affect genomic stability, which can result in microsatellite instability (MI). MI is implicated in most human cancers, and the majority of hereditary nonpolyposis colorectal cancers (HNPCCs) are attributed to defects in MLH1.
MATERIALS AND METHODS: In our study we analyzed the MLH1 protein and the associated nucleotide and protein sequences. The protein sequences involved in mismatch repair in different organisms were found to be evolutionarily related. Several other proteins related to MLH1 were also identified through protein-protein interactions. All associated proteins are either mismatch repair proteins or associated with MLH1 in various pathways. Pathway information was also confirmed through the MMR and other pathways in KEGG. Q-SiteFinder showed that the predicted active site of the MLH1 protein involves residues from the conserved pattern, is involved in ligand-protein interactions, and could be a useful site. To analyze linkage disequilibrium (LD) and common haplotype patterns in disease association, we performed statistical haplotype analysis on HapMap genotype data of SNPs genotyped in the CEU population on chromosome 3 for MLH1.
RESULTS: Various markers were found and an LD plot was generated. Two distinct blocks were identified in the LD plot, which may be independent regions of action; the first and second blocks involve 7 and 17 markers, respectively.
CONCLUSION: An overall correlation of 0.95 was found among all interactions of the genotyped SNPs, which is significant.


Keywords: Haplotype; hyper conservation; linkage disequilibrium; mismatch repair system and proteins; multiple sequence alignment

Year: 2012 PMID: 23225975 PMCID: PMC3510907 DOI: 10.4103/0976-9668.101887

Source DB: PubMed Journal: J Nat Sci Biol Med ISSN: 0976-9668

INTRODUCTION

DNA mismatch repair is a process that takes place in the cells of almost every living organism, both prokaryotic and eukaryotic, reflecting its evolutionary importance. The first evidence for mismatch repair was obtained from Streptococcus pneumoniae, and subsequent work on Escherichia coli identified a number of genes that, when mutationally inactivated, cause hypermutable strains.[1,2] Three of the encoded proteins are essential in detecting the mismatch and directing repair machinery to it: MutS, MutH and MutL (MutS is a homologue of HexA and MutL of HexB). MLH1 heterodimerizes with PMS2 to form MutL alpha, a component of the postreplicative DNA mismatch repair system (MMR). Defects in MLH1 are a cause of mismatch repair cancer syndrome (MMRCS), also known as Turcot syndrome or brain tumor-polyposis syndrome 1 (BTPS1),[3] of Muir-Torre syndrome (MuToS, also abbreviated MTS), and of susceptibility to endometrial cancer (ENDMC).[4] Poor fidelity of the DNA polymerase enzyme, or exposure of the DNA to radiation (gamma rays, X-rays, ultraviolet rays), highly reactive oxygen radicals and various chemicals in the environment, also produces aberrations in the DNA. If the genetic information encoded in the DNA is to remain uncorrupted, these chemical changes must be corrected to avoid mutations. The DNA repair ability of a cell is vital to the integrity of its genome and thus to its normal functioning and that of the organism. Mismatch repair enzymes recognize these errors and correct them. After replication, these enzymes travel down the new DNA molecules and identify mistakes by the “bulge” that results from a mismatched pair. When an error is discovered, the mismatch repair enzymes activate other enzymes that complete the DNA repair.
There are various disorders that occur due to mutations in these mismatch repair proteins, which affect genomic stability and can result in microsatellite instability (MI).[5] MI is implicated in most human cancers, and the majority of hereditary nonpolyposis colorectal cancers (HNPCC) are attributed to defects in MLH1.[6,7] It is also evident that DNA damage and repair are essential processes for understanding the mechanisms of cancer, ageing and various human genetic diseases.[8] Therefore there is a need to analyze these proteins and their roles in various disorders. Our approach involves a diversified analysis of the structural, functional and evolutionary aspects of these proteins. In our study we analyzed the MLH1 protein and other associated proteins. DNA repair is initiated by MutS alpha (MSH2-MSH6) or MutS beta (MSH2-MSH3) binding to a dsDNA mismatch, after which MutL alpha is recruited to the heteroduplex.[9] Assembly of the MutL-MutS-heteroduplex ternary complex in the presence of RFC and PCNA is sufficient to activate the endonuclease activity of PMS2.[10,11] PMS2 introduces single-strand breaks near the mismatch and thus generates new entry points for the exonuclease EXO1 to degrade the strand containing the mismatch. DNA methylation would prevent cleavage and therefore ensure that only the newly mutated DNA strand is corrected. MutL alpha (MLH1-PMS2) interacts physically with the clamp loader subunits of DNA polymerase III, suggesting that it may play a role in recruiting DNA polymerase III to the site of MMR. MLH1 is also implicated in DNA damage signaling, a process which induces cell cycle arrest and can lead to apoptosis in case of major DNA damage. The MLH1 protein, a mismatch repair protein present in many species, has a common signature motif, GFRGE[AG]L. The ability to recognize and repair damaged DNA is common to all forms of life, and numerous DNA repair pathways have evolved to repair almost all possible DNA lesions.
Comparative and functional genome studies of organisms help to identify conserved regions and various related disorders.[12] There is a strong relationship between DNA repair pathways and human genetic disorders, as these disorders reflect defects in several associated genes, e.g. in cancer and in multi-system defects, specifically of the immune and neurological systems.[13] The protein–protein interactions involved in many complex networks and pathways are essential for understanding metabolic and cellular processes and can further serve as novel targets for therapeutic interventions. There is growing interest in understanding haplotype structures in the human genome using identified genetic markers, as haplotype structures may provide critical information on human evolutionary history and on the identification of genetic variants underlying various human traits.[14] Therefore MLH1, a DNA mismatch repair protein involved in various disorders, has been extensively analyzed in this study.

MATERIALS AND METHODS

Various in silico approaches and computational tools were applied for the biological analysis of the MLH1 protein. First, the protein sequence of human MLH1 was retrieved from NCBI and cross-referenced against the UniProt and Swiss-Prot databases. The MLH1 protein sequences from various other organisms such as Saccharomyces cerevisiae, Rattus norvegicus, Bos taurus and Mus musculus were also retrieved from NCBI, and these sequences were then aligned using multiple sequence alignment tools, namely MAFFT[15] and MUSCLE.[16] Conserved motifs in these sequences were compared and confirmed through the PROSITE database.[17] A phylogenetic tree showing the evolutionary relatedness of the sequences was obtained through Treefinder with the GTR-GI model and 10,000 replicates[18] and through the Phylogenetic Web Repeater (POWER).[19] Protein-protein interactions with MLH1 were obtained from the STRING,[20] BIND,[21] IntAct[22] and MINT[23] PPI databases. The sub-cellular localization of MLH1 was obtained from tools such as PSORT,[24] LOCATE,[25] BaCelLo[26] and MultiLoc,[27] and MLH1 was found to be related to various disease pathways in the KEGG database[28] [Table 1]. The protein structure of MLH1 was downloaded from the Protein Data Bank (PDB), and the active site residues were obtained from Q-SiteFinder.[29]
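The motif check described above can be illustrated with a short script. This is only a minimal sketch, assuming the signature GFRGE[AG]L is translated directly into a regular expression; it is not the PROSITE scanning tool used in the study. The function name `find_motif` and the toy sequences are hypothetical and are not real MLH1 sequences.

```python
import re

# The signature motif from the text, GFRGE[AG]L, written as a regular expression.
MOTIF = re.compile(r"GFRGE[AG]L")

def find_motif(sequences):
    """Return {name: [0-based start positions]} for sequences that contain the motif."""
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() for m in MOTIF.finditer(seq.upper())]
        if positions:
            hits[name] = positions
    return hits

# Hypothetical toy fragments for illustration only (NOT real MLH1 sequences).
toy_sequences = {
    "toy_seq_A": "MKLAGFRGEALASI",   # contains GFRGEAL
    "toy_seq_B": "TTGFRGEGLQQ",      # contains GFRGEGL
    "toy_seq_C": "MKLAGFRGETLASI",   # T at the variable position, so no match
}

motif_hits = find_motif(toy_sequences)
print(motif_hits)
```

In practice the input would be the retrieved protein sequences, for example read from a FASTA file, rather than hand-written strings.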

Table 1

Various disease pathways through KEGG

Linkage disequilibrium (LD) is used in population genetics to describe the non-random association of alleles at two or more loci.[30] Various measures have been proposed for characterizing the statistical association between alleles at different loci. The most common measures are D’ and r², both of which range between 0 and 1. D’ is a measure of LD between two genetic markers. D’ = 1 (complete LD) indicates that two SNPs have not been separated by recombination, while D’ < 1 (incomplete LD) indicates that the ancestral LD was disrupted during the history of the population. Only a D’ value near one is a reliable measure of LD extent. r² is also a measure of LD between two genetic markers. r² = 1 (perfect LD) holds for SNPs that have not been separated by recombination and have the same allele frequencies. Here we applied haplotype block and haplotype tagger analysis to reveal information regarding LD.[31] The haplotype analysis was performed using Haploview.
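To make the two measures concrete, the following sketch computes D’ and r² for a pair of biallelic SNPs using the standard population-genetics definitions. It is an illustration under stated assumptions, not the Haploview implementation: the function name `ld_measures` and the example frequencies are hypothetical, and haplotype frequencies are assumed to be already known (phased).

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # Raw disequilibrium coefficient: D = p_AB - p_A * p_B
    d = p_ab - p_a * p_b

    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2

# Hypothetical frequencies: same allele frequency at both loci, no recombinants.
perfect = ld_measures(p_ab=0.3, p_a=0.3, p_b=0.3)

# Hypothetical frequencies: no recombinants, but different allele frequencies.
complete_only = ld_measures(p_ab=0.2, p_a=0.2, p_b=0.5)

print(perfect, complete_only)
```

The second example gives a D’ of 1 with an r² of about 0.25, which shows why r² reaches 1 only when the two SNPs also share the same allele frequencies.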

RESULTS

In the multiple sequence alignment (MSA) performed with MAFFT and MUSCLE [Figure 1], the conserved signature motif for mismatch repair proteins, GFRGE[AG]L, is shown within the rectangle. This indicates that the protein sequences involved in mismatch repair in different organisms are evolutionarily related, as the MLH1 proteins of these species share a common conserved motif, the DNA mismatch repair protein MutL/HexB/PMS1 signature motif. With the PSORT subcellular localization tool, the MLH1 protein was predicted to be nuclear, which was also confirmed by the other available servers. A phylogenetic tree was reconstructed using Treefinder with consensus analysis for 10,000 replicates under the GTR-GI model with optimum values; it shows the evolutionary relationship of the sequences in this study [Figure 2]. The tree is in harmony with available phylogenies from studies based on different marker data. The grouping of Arabidopsis with Solanum is an interesting aspect of the tree: this pair is the most distant and forms a separate clade far from the rest of the species. Further, the longer branches of Drosophila (Insecta) and Schmidtea (Platyhelminthes) support their position between plants and higher organisms. The positions of the fungi (Ascomycetes), zebrafish (Cyprinidae) and Nasonia (Insecta), with longer branches than rodents and other mammals, complete the shape of this phylogenetic tree. The tree is in agreement with the available standard phylogenies, but the distinction of two separately rooted blocks is a unique feature among the species in this study.

Figure 1

MAFFT-generated Partial MSA of MLH1 of various species

Figure 2

Phylogenetic tree (consensus) reconstructed with Treefinder

The pattern of the tree generated by POWER was almost identical to that of the other phylogenetic tree-generating programs with respect to the phylogenetic trends of all the species in this study. All the groups and nodes are recovered with a score of more than 80, except one group containing three entries (Saccharomyces cerevisiae, Sordaria macrospora and Sordaria macrospora k-hell), whose score is nonetheless significant (60–80), as shown in Figure 3.

Figure 3

Phylogenetic tree reconstructed from POWER

The MLH1 protein is known to interact with a number of proteins involved in DNA repair pathways. In the STRING database, MLH1 was found to interact with proteins such as msh2, pms2, msh3, exo1 and msh6, with experimental, text-mining, gene fusion, neighborhood, co-expression and other evidence. Some of the important interactions in the STRING database were also found in the BIND database, as shown in Figure 4 and Table 2. When these interactions were examined in other databases such as IntAct and MINT, certain new interacting proteins were found, which are listed in Tables 3 and 4, respectively. GeneCards provided information on 102 proteins interacting with MLH1 [Table 5]. On comparing all these databases, certain interactions were found in common and will be of interest to researchers.

Figure 4

Protein–protein interactions from STRING database

Table 2

Interactions from BIND database

Table 3

Some important interactions from IntAct database

Table 4

Interactions from MINT database

Table 5

Interacting proteins for MLH1 in GeneCards

When the MLH1 protein was searched in the KEGG (Kyoto Encyclopedia of Genes and Genomes) Pathway database, various biochemical pathways showed the vital role of MLH1 in their processes. Various diseases are closely associated with MLH1, as this protein is found in pathways underlying many cancers, such as colorectal cancer and endometrial cancer, and in the mismatch repair pathway.[32] Putative active site residues were predicted from the MLH1 protein structure so that the sites where a ligand could bind the protein would be known in advance. Because this protein is involved in a number of diseases, it needs to be analyzed in detail, and the identified pockets [Figure 5] where a drug could bind may help in designing new inhibitors for the protein. This kind of analysis can provide insight for therapeutic applications. The protein atoms close to the probe clusters defining the various sites are shown in Table 6.

Figure 5

MLH1 protein with colored active sites

Table 6

QSITE FINDER predicted active site residues (selected)

According to some recent studies, chromosomes are structured in such a way that each chromosome can be divided into many blocks, known as haplotype blocks.[33] Knowledge of local linkage disequilibrium (LD) and common haplotype patterns has the potential to make disease association studies more comprehensive and efficient.[34] Haplotype tagging refers to methods for selecting a minimal number of SNPs that uniquely identify common haplotypes (>5% in frequency). The principal use of tagging is to select a ‘good’ subset of SNPs to be typed in all the studied individuals. We performed haplotype analysis on HapMap genotype data of SNPs genotyped in the CEU population on chromosome 3. An LD plot was generated, and in the haploblock diagram [Figure 6] two distinct blocks were identified within the same locus. This analysis proposes that the two blocks, which are strongly correlated with each other, represent independent sites of action. The first block involves 7 markers and the second 17 markers, with significant statistical support. An overall correlation of 0.95 was found among all interactions of the genotyped SNPs, which is significant [Figure 7].
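The definition of haplotype tagging given above can be sketched in a few lines. This is a brute-force illustration of the idea, assuming phased haplotypes are available as equal-length strings; it is not the algorithm used by Haploview. The function names and the toy haplotypes are hypothetical and are not HapMap data.

```python
from collections import Counter
from itertools import combinations

def common_haplotypes(haplotypes, min_freq=0.05):
    """Return the sorted haplotypes whose sample frequency exceeds min_freq."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return sorted(h for h, c in counts.items() if c / total > min_freq)

def minimal_tag_snps(haplotypes, min_freq=0.05):
    """Return the smallest tuple of SNP indices that tells all common haplotypes apart.

    Exhaustive search over subsets, so only suitable for a small number of SNPs.
    """
    common = common_haplotypes(haplotypes, min_freq)
    if not common:
        return ()
    n_snps = len(common[0])
    for k in range(n_snps + 1):
        for subset in combinations(range(n_snps), k):
            # Project each common haplotype onto the candidate SNP subset.
            projected = {tuple(h[i] for i in subset) for h in common}
            if len(projected) == len(common):
                return subset
    return tuple(range(n_snps))

# Hypothetical phased haplotypes over 4 SNPs (toy data, not HapMap genotypes).
toy_haplotypes = ["ACGT"] * 10 + ["ACGA"] * 6 + ["GCTT"] * 4 + ["GTTA"] * 1

common = common_haplotypes(toy_haplotypes)
tag_snps = minimal_tag_snps(toy_haplotypes)
print(common, tag_snps)
```

In this toy sample the haplotype seen once (frequency below 5%) is ignored, and two of the four SNPs are enough to distinguish the three common haplotypes. Real data sets have far more SNPs, where exhaustive search is impractical and heuristic or LD-based selection is used instead.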

Figure 6

LD plot generated from Haploview

Figure 7

Haplotypes from Haploview


DISCUSSION

From our analysis it can be concluded that a systems biology approach to the interactions of genes, proteins and networks is essential for understanding cellular processes, and that there is a need for detailed analysis of repair pathways and the associated human genetic disorders. The protein sequences involved in mismatch repair in different organisms were found to be evolutionarily related, as a common motif, GFRGE[AG]L, is found in the MLH1 protein of these species. In the multiple sequence alignments produced with the MAFFT and MUSCLE servers, the same pattern was conserved among all species in this study. The phylogenetic tree generated from the MSA is also in agreement with the standard phylogeny available for various biomarkers. Several other related proteins were also identified through protein–protein interactions. All associated proteins are either mismatch repair proteins or associated with MLH1 in various pathways. Pathway information was also confirmed through the MMR and other pathways in KEGG. Further analysis with Q-SiteFinder showed that the active site of the MLH1 protein also involves these residues, and that this conserved pattern is involved in ligand–protein interactions, as confirmed through a complex structure of MLH1. The information generated should aid further research: based on the conserved residues of the active sites and the various ligand interaction cavities, new inhibitors can be designed. Marker information was generated from the sequence level to the structure level, with the conserved signature motif and the active site residues within structural pockets, respectively. In addition, evolutionary information was generated which suggests the selection of a specific and suitable molecular evolutionary model of substitution for MLH1 protein sequences among various organisms. Haplotype analysis revealed 24 (17+7) new alleles with significant statistical scores and confirmed the association of these alleles with various disorders.
Two independent sites of action (two distinct but related blocks) were identified for the same allele, which might be helpful in mapping various markers on genomic data. Overall, this study provides a new direction for the varied analysis of repair proteins.

34 in total

1. The International HapMap Project.

Authors:
Journal: Nature Date: 2003-12-18 Impact factor: 49.962

2. Haploview: analysis and visualization of LD and haplotype maps.

Authors: J C Barrett; B Fry; J Maller; M J Daly
Journal: Bioinformatics Date: 2004-08-05 Impact factor: 6.937

3. MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors: Robert C Edgar
Journal: Nucleic Acids Res Date: 2004-03-19 Impact factor: 16.971

Review 4. Definition and clinical importance of haplotypes.

Authors: Dana C Crawford; Deborah A Nickerson
Journal: Annu Rev Med Date: 2005 Impact factor: 13.739

5. Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites.

Authors: Alasdair T R Laurie; Richard M Jackson
Journal: Bioinformatics Date: 2005-02-08 Impact factor: 6.937

Review 6. SNP discovery in associating genetic variation with human disease phenotypes.

Authors: Yousin Suh; Jan Vijg
Journal: Mutat Res Date: 2005-06-03 Impact factor: 2.433

7. Low allele frequency of MLH1 D132H in American colorectal and endometrial cancer patients.

Authors: Brian Y Shin; Huiping Chen; Laura S Rozek; Leslie Paxton; David J Peel; Hoda Anton-Culver; Gad Rennert; David G Mutch; Paul J Goodfellow; Stephen B Gruber; Steve M Lipkin
Journal: Dis Colon Rectum Date: 2005-09 Impact factor: 4.585

8. POWER: PhylOgenetic WEb Repeater--an integrated and user-optimized framework for biomolecular phylogenetic analysis.

Authors: Chung-Yen Lin; Fan-Kai Lin; Chieh Hua Lin; Li-Wei Lai; Hsiu-Jun Hsu; Shu-Hwa Chen; Chao A Hsiung
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971

9. LOCATE: a mouse protein subcellular localization database.

Authors: J Lynn Fink; Rajith N Aturaliya; Melissa J Davis; Fasheng Zhang; Kelly Hanson; Melvena S Teasdale; Chikatoshi Kai; Jun Kawai; Piero Carninci; Yoshihide Hayashizaki; Rohan D Teasdale
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

10. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics.

Authors: Gangolf Jobb; Arndt von Haeseler; Korbinian Strimmer
Journal: BMC Evol Biol Date: 2004-06-28 Impact factor: 3.260
