CoDP: predicting the impact of unclassified genetic variants in MSH6 by the combination of different properties of the protein.

Hiroko Terui, Kiwamu Akagi, Hiroshi Kawame, Kei Yura.

Abstract

BACKGROUND: Lynch syndrome is a hereditary cancer predisposition syndrome caused by a mutation in one of the DNA mismatch repair (MMR) genes. About 24% of the mutations identified in Lynch syndrome are missense substitutions, and the frequency of missense variants in MSH6 is the highest among these MMR genes. Because of this high frequency, genetic testing has not been used effectively for MSH6 so far. We therefore developed CoDP (Combination of the Different Properties), a bioinformatics tool to predict the impact of missense variants in MSH6.
METHODS: We integrated the prediction results of three methods, namely MAPP, PolyPhen-2 and SIFT. Two other structural properties, namely solvent accessibility and the change in the number of heavy atoms of amino acids in the MSH6 protein, were further combined explicitly. MSH6 germline missense variants classified by their associated clinical and molecular data were used to fit the parameters for the logistic regression model and to assess the prediction. The performance of CoDP was compared with those of other conventional tools, namely MAPP, SIFT, PolyPhen-2 and PON-MMR.
RESULTS: A total of 294 germline missense variants were collected from the variant databases and literature. Of them, 34 variants were available for the parameter training and the prediction performance test. We integrated the prediction results of MAPP, PolyPhen-2 and SIFT, and two other structural properties, namely solvent accessibility and the change in the number of heavy atoms of amino acids in the MSH6 protein, were further combined explicitly. Variants data classified by their associated clinical and molecular data were used to fit the parameters for the logistic regression model and to assess the prediction. The values of the positive predictive value (PPV), the negative predictive value (NPV), sensitivity, specificity and accuracy of the tools were compared on the whole data set. PPV of CoDP was 93.3% (14/15), NPV was 94.7% (18/19), specificity was 94.7% (18/19), sensitivity was 93.3% (14/15) and accuracy was 94.1% (32/34). Area under the curve of CoDP was 0.954, that of MAPP for MSH6 was 0.919, of SIFT was 0.864 and of PolyPhen-2 HumVar was 0.819. The power to distinguish between pathogenic and non-pathogenic variants of these methods was tested by Wilcoxon rank sum test (p < 8.9 × 10^-6 for CoDP, p < 3.3 × 10^-5 for MAPP, p < 3.1 × 10^-4 for SIFT and p < 1.2 × 10^-3 for PolyPhen-2 HumVar), and CoDP was shown to outperform other conventional methods.
CONCLUSION: In this paper, we provide a human curated data set for MSH6 missense variants, and CoDP, the prediction tool, which achieved better accuracy for predicting the impact of missense variants in MSH6 than any other known tools. CoDP is available at http://cib.cf.ocha.ac.jp/CoDP/.

Year:  2013        PMID: 23621914      PMCID: PMC3651391          DOI: 10.1186/1423-0127-20-25

Source DB:  PubMed          Journal:  J Biomed Sci        ISSN: 1021-7770            Impact factor:   8.410


Background

Lynch syndrome (MIM: #120435, #609310), also known as Hereditary Non-Polyposis Colorectal Cancer (HNPCC), is an autosomal dominant disease and the most common hereditary colorectal cancer syndrome [1]. Lynch syndrome accounts for 1-5% of all colorectal cancer (CRC) patients [2-4] and is associated with germline mutations in one of the DNA mismatch repair (MMR) genes, including MLH1, MSH2, MSH6 and PMS2 (MIM: #120436, #609309, #600678, #600259, respectively). MMR gene mutation carriers are at high risk of developing Lynch syndrome associated cancers of the colorectum, endometrium, small bowel, stomach, ovary, ureter and hepatobiliary tract. Individuals at high risk can be identified by genetic testing, and appropriate surveillance programs can be provided to prevent cancer development. Previous studies reported that more than 90% of the detectable mutations in Lynch syndrome were found in MLH1 and MSH2 [5]. Recent data, however, showed that MSH6 contributed to about 20% of the mutations [6,7]. In addition, MSH6 shows the greatest frequency (~37 - 49%) of missense variants among the MMR genes, and most of them are currently “unclassified variants” (UVs) [6,8]. MSH6 mutation carriers tend to develop CRC at an older age than MLH1 and MSH2 mutation carriers and tend to show reduced penetrance [9-12]. These tendencies suggest that the family cancer history of MSH6 mutation carriers is not necessarily dense enough to meet the Amsterdam criteria. Furthermore, colorectal tumors from MSH6 mutation carriers sometimes demonstrate low microsatellite instability (MSI-L) or microsatellite stability (MSS) [13], or a normal staining pattern in immunohistochemistry (IHC) for MMR proteins [11]. It is, therefore, important to analyze and integrate all the available data, and the data derived from in silico tools for the classification of UVs are part of them. A number of methods to predict the biological effects of missense variants as pathogenic or non-pathogenic have been reported. 
For Lynch syndrome, SIFT [14], PolyPhen [15,16] and multivariate analysis of protein polymorphisms (MAPP) [17] have generally been used. Prediction by SIFT is based on sequence conservation, while that by PolyPhen is based on sequence conservation plus protein structural features [14-16]. These methods aim to predict the pathogenicity of variants for proteins in general, and hence they are not tuned to the interpretation of the prediction for a specific protein. MAPP uses the evolutionary variations and scales of six physicochemical properties to evaluate the structural and functional impact of all possible variants [17]. MAPP can be customized for a specific protein. It has been optimized for MLH1 and MSH2 and outperformed SIFT and PolyPhen (MAPP-MMR [18]). This result indicates that an algorithm customized for a specific protein is superior to those applicable to proteins in general. However, the accuracy of prediction by MAPP-MMR is not high enough for genetic testing. Hence, improvement in the prediction method is required. In the field of bioinformatics, especially in the development of prediction methods based on amino acid sequences, it has been pointed out that the prediction accuracy can be improved by integrating many different prediction methods (e.g. [19]). Following this idea, the accuracy of the pathogenicity prediction could be improved by integrating a number of existing methods to predict the biological effects of missense variants. In addition, none of the existing methods directly incorporates the information obtained from the MSH6 protein structure. The three-dimensional structure of the MSH6-MSH2 complex with ADP and DNA has already been solved [20]. The structural data should contain a variety of information, some of which would be useful for the prediction. The easily obtained information related to the effect of a mutation on the structure includes the solvent accessibility of the amino acid residue and the residue volume change. 
Mutations of amino acid residues at the surface of a protein are better tolerated than those in the interior of the protein, and a mutation inside the protein with a small change in residue volume is better tolerated than one with a large volume change [21]. We, therefore, optimized MAPP [17] for MSH6 and then integrated SIFT [14], PolyPhen-2 [15] and two properties from protein structure, namely solvent accessibility and the volume change in amino acid residues. We combined these properties in a logistic regression model and compared the prediction performance with MAPP, SIFT, PolyPhen-2 and PON-MMR [22]. The parameters were adjusted on the data that we gathered from different databases and literature and cross-referenced for this study. The newly developed method achieved the best prediction accuracy, sensitivity and specificity, and can distinguish pathogenic variants from non-pathogenic variants clearly. We named the method CoDP, Combination of Different Properties on MSH6, and made it available at http://cib.cf.ocha.ac.jp/CoDP/.

Methods

The dataset of MSH6 missense variants

MSH6 missense variants and their associated clinical and molecular data were collected from the following databases: InSiGHT (http://www.insight-group.org/), MMRUV (http://www.mmrmissense.net/), UniProt (http://www.uniprot.org/), dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), NHLBI Exome Sequencing Project (ESP) (http://evs.gs.washington.edu/EVS/), HapMap Project (http://hapmap.ncbi.nlm.nih.gov/) and 1000 Genomes (http://www.1000genomes.org/). A systematic literature search was conducted on PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) to compile MSH6 missense variants not registered in the databases above. These data were used to assess the in silico pathogenicity prediction. Clinical and molecular data on carriers with missense variants were also collected. The data included the age at the first diagnosis of CRC or endometrial cancer, any affected relatives with Lynch syndrome associated cancer, microsatellite instability (MSI), IHC, segregation study, allele frequency and biochemical functional assay. The biochemical functional assay included investigations of the following: MMR activity, MSH2 protein interaction, localization, ATP hydrolysis and mismatch recognition. We used the results of the assays as reported in the literature. These clinical and molecular data were used to assign each carrier to one of the following three categories: “likely to be Lynch syndrome (LLS)”, “unlikely to be Lynch syndrome (ULS)” and “unclassified.” An LLS carrier has a pathogenic variant, and a ULS carrier has a non-pathogenic variant. An “unclassified” carrier has a variant of unknown clinical significance, which is usually called an unclassified variant (UV). The division was carried out based on the criteria shown in Table 1. When a carrier fulfilled one or more of the criteria for LLS in Table 1, the carrier was classified as LLS, and when a carrier fulfilled one or more of the criteria for ULS, the carrier was classified as ULS. 
When the criterion that the carrier fulfilled became important, a sub-numbering system was used, such as LLS-1 for a carrier fulfilling the first criterion of LLS.
Table 1

Definition for classification of missense variants in MSH6

LLS (Likely to be Lynch Syndrome): fulfill one or more of the following criteria.
1. Abnormal result of functional assay AND [abnormal IHC of only MSH6 OR MSI-H]
2. Abnormal IHC of only MSH6 AND MSI-H
3. [Abnormal IHC of only MSH6 OR segregation analysis] AND fulfill at least two of the following three criteria.
a) Family history: More than one affected relatives who were diagnosed as CRC or endometrial cancer under 60 years old and at least in two successive generations.
b) Proband's tumor feature: diagnosed as CRC or endometrial cancer under 50 years old and/or synchronous or asynchronous multiple cancers.
c) Control allele frequency = 0.00 (healthy population ≥ 100)

ULS (Unlikely to be Lynch Syndrome): fulfill one or more of the following criteria.
1. Polymorphism (minor allele frequency ≥ 0.01)
2. Normal result of functional assay AND [MSS OR normal IHC of MSH6]
3. MSS AND normal IHC of MSH6
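The criteria in Table 1 are plain Boolean rules, so they can be written down as a short sketch. The field names, the string encodings of the evidence and the choice to test LLS before ULS are assumptions made for this illustration; the paper does not state how a carrier meeting both sets of criteria would be handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Evidence:
    """Evidence for one carrier; all field names are hypothetical."""
    functional_assay: Optional[str] = None   # "abnormal", "normal" or None (not done)
    ihc: Optional[str] = None                # "msh6_only_abnormal", "msh6_normal" or None
    msi: Optional[str] = None                # "H", "L", "S" (MSS) or None
    segregation: bool = False                # positive segregation analysis
    family_history: bool = False             # criterion 3a
    proband_tumor_feature: bool = False      # criterion 3b
    absent_in_controls: bool = False         # criterion 3c
    minor_allele_freq: Optional[float] = None


def lls_criteria(e: Evidence) -> List[int]:
    """Return the numbers of the LLS criteria of Table 1 that are fulfilled."""
    only_msh6 = e.ihc == "msh6_only_abnormal"
    msi_high = e.msi == "H"
    met = []
    if e.functional_assay == "abnormal" and (only_msh6 or msi_high):
        met.append(1)
    if only_msh6 and msi_high:
        met.append(2)
    clinical = sum([e.family_history, e.proband_tumor_feature, e.absent_in_controls])
    if (only_msh6 or e.segregation) and clinical >= 2:
        met.append(3)
    return met


def uls_criteria(e: Evidence) -> List[int]:
    """Return the numbers of the ULS criteria of Table 1 that are fulfilled."""
    normal_ihc = e.ihc == "msh6_normal"
    mss = e.msi == "S"
    met = []
    if e.minor_allele_freq is not None and e.minor_allele_freq >= 0.01:
        met.append(1)
    if e.functional_assay == "normal" and (mss or normal_ihc):
        met.append(2)
    if mss and normal_ihc:
        met.append(3)
    return met


def classify(e: Evidence) -> str:
    # Assumption: LLS is tested first; the source does not define a tie-break.
    if lls_criteria(e):
        return "LLS"
    if uls_criteria(e):
        return "ULS"
    return "unclassified"
```

For example, a carrier with an abnormal functional assay, loss of MSH6 only on IHC and MSI-H fulfils criteria 1 and 2 and would be reported as LLS-1 and LLS-2 in the sub-numbering described above.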

Optimization of MAPP for MSH6

We optimized MAPP [17] to predict the pathogenicity of MSH6 missense variants. MAPP requires an appropriate multiple sequence alignment of MSH6 orthologues for evaluating missense variants. MSH6 amino acid sequences were collected from GenBank (http://www.ncbi.nlm.nih.gov/genbank/) using BLAST [23] with the default parameters and human MSH6 as the query sequence. Sequences were also obtained from the Ensembl genome database (http://www.ensemblgenomes.org/). Including both paralogous and orthologous sequences in the multiple sequence alignment for the training of MAPP is known to worsen the performance of the prediction [14,17]. We, therefore, selected orthologues of human MSH6 based on their domain organization and a phylogenetic tree. There was a wide range of variability in the domain structures of the MSH6 proteins, and MSH6 sequences with the same domain organization as human MSH6 are good candidates for orthologues. Vertebrate MSH6 proteins, the close homologues of human MSH6, generally have a PCNA-binding motif [24], a PWWP domain [25] and a MutS domain [20] (Figure 1). These vertebrate MSH6 sequences were aligned together with other MSH6 homologues by the T-Coffee alignment tool [26] and a phylogenetic tree was built. This phylogenetic tree was compared with the species tree, and the proteins orthologous to human MSH6 were operationally defined as the sequences with the same domain organization that were located around human MSH6 consistently with the species tree. As a result, the vertebrate sequences were selected as an initial set and a multiple sequence alignment of them was built for MAPP prediction.
Figure 1

Domain organization of human MSH6 and the additional sequence set used for optimizing MAPP parameters for MSH6. The MSH6 protein is depicted by a box diagram. A box indicates a distinct domain structure and a line connecting the boxes indicates an inter-domain sequence. The range of the domain is shown above or beneath the box. “−” denotes non-vertebrate sequences in the secondary sequence set added to the initial set. For details, see the Optimization of MAPP for MSH6 section in Results and Discussion.

We then improved the prediction accuracy by increasing the size of the sequence set. An augmented data set was reported to improve the accuracy of the prediction [18]. The addition of amino acid sequences to the data set was limited to the domain regions, because the inter-domain sequences were too diverse to align. Sequences of non-vertebrates were added to the initial sequence set and the prediction accuracy was tested using a receiver operating characteristic (ROC) curve and the area under the curve (AUC).
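The AUC used to compare sequence sets equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen non-pathogenic one. The following minimal sketch computes it by pairwise comparison; the function name and the example scores are hypothetical, and the study itself obtained its values with a statistics package.

```python
from typing import Sequence


def auc(pathogenic_scores: Sequence[float], benign_scores: Sequence[float]) -> float:
    """Area under the ROC curve from two score lists.

    Counts the fraction of (pathogenic, benign) pairs in which the
    pathogenic variant scores higher; ties count as half a win.
    """
    if not pathogenic_scores or not benign_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))


# Hypothetical scores, for illustration only.
example = auc([0.9, 0.4], [0.5, 0.1])
```

An AUC of 1.0 means the two groups are perfectly separated by some threshold, and 0.5 means the score carries no ranking information.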

Structural properties to assess mutations in MSH6

Structural properties for amino acid residue substitutions were obtained from the three-dimensional structure of the MSH6-MSH2-DNA-ADP complex, registered as 2o8b [20] in the Protein Data Bank [27]. The registered structure lacks residues 551, 652, 942 and 992, and loops 720–728, 1099–1104, 1123–1125, 1179–1187 and 1271–1283. These missing parts were modeled using MOE (Chemical Computing Group Inc., Montreal, Canada), a molecular structure building software package. The two properties we focused on were the relative accessible surface area (accessibility) of each residue and the change in residue volume upon substitution. The accessible surface area was calculated using a modified method of Shrake and Rupley [28] with a water radius of 1.4 Å [29]. A threshold of 0.1 was used to separate the locations of residues into two categories: buried and surface. The relevance of accessibility to the prediction was tested based on the correlation between the accessibility and LLS/ULS. The change in volume was quantified by the difference in the number of heavy atoms in the side chains. The relevance of this value to the prediction was tested in the same way as for accessibility.
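The two structural properties are simple enough to sketch. The side-chain heavy-atom counts below are the standard non-hydrogen atom counts beyond the backbone for the 20 amino acids. Taking the absolute value of the difference, and treating an accessibility exactly equal to 0.1 as surface, are assumptions of this illustration, as are the function names.

```python
import re

# Number of non-hydrogen side-chain atoms per amino acid (one-letter codes).
SIDE_CHAIN_HEAVY_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3,
    "I": 4, "L": 4, "D": 4, "N": 4, "M": 4, "E": 5, "Q": 5,
    "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}

_VARIANT = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant(variant: str):
    """Split a missense variant such as 'G566R' into (wild type, position, substituted)."""
    match = _VARIANT.match(variant)
    if match is None:
        raise ValueError(f"not a missense variant in one-letter notation: {variant!r}")
    wild, position, substituted = match.groups()
    return wild, int(position), substituted


def heavy_atom_change(variant: str) -> int:
    """Absolute change in the number of side-chain heavy atoms (assumed to be unsigned)."""
    wild, _, substituted = parse_variant(variant)
    return abs(SIDE_CHAIN_HEAVY_ATOMS[substituted] - SIDE_CHAIN_HEAVY_ATOMS[wild])


def location_category(relative_accessibility: float, threshold: float = 0.1) -> str:
    """Label a residue as buried or surface using the 0.1 threshold from the text."""
    return "buried" if relative_accessibility < threshold else "surface"
```

With these definitions, G566R changes the side chain by seven heavy atoms, whereas E1193K changes it by none.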

Combining different properties

We used the logistic regression model to integrate the properties. The logistic regression analysis gives the probability ($q$) of a categorical outcome variable based on one or more predictor variables ($X$). The logistic regression equation is given by: $\operatorname{logit}(q) = \ln\left[\frac{q}{1-q}\right] = Z + \sum_{i=1}^{n} b_i X_i$, where $Z$ is the constant and $b_1, b_2, \ldots, b_n$ are the partial regression coefficients for $X_1, X_2, \ldots, X_n$. We defined the value $q$ as the joint score in CoDP, and this score was used for predicting the impact of UVs. The scores of MAPP for MSH6, SIFT, PolyPhen-2 and the appropriate structural properties discussed above were used as the predictors $X_i$. Variant sets of LLS and ULS without the biochemical functional assay were used to optimize $b_i$. The applicability of the joint score for prediction was tested on the variants of LLS and ULS with the biochemical functional assay.
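Solving the logit equation for q gives q = 1 / (1 + exp(−(Z + Σ b·X))). The sketch below evaluates this expression; the function name is hypothetical and the numbers in the example are placeholders, not the fitted CoDP parameters.

```python
import math
from typing import Sequence


def joint_score(constant: float,
                coefficients: Sequence[float],
                predictors: Sequence[float]) -> float:
    """Return q such that ln(q / (1 - q)) = constant + sum(b_i * x_i)."""
    if len(coefficients) != len(predictors):
        raise ValueError("one coefficient is required per predictor")
    z = constant + sum(b * x for b, x in zip(coefficients, predictors))
    # Numerically stable form of the logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Placeholder values only: z = -1.0 + 0.5 * 2.0 + 2.0 * 0.5 = 1.0
q = joint_score(-1.0, [0.5, 2.0], [2.0, 0.5])
```

A linear predictor of zero gives q = 0.5, and q approaches 1 as the weighted evidence for pathogenicity grows.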

Performance test

The capability of predicting the impact of UVs was tested using the variants of LLS and ULS. The prediction performance of the tools, CoDP, MAPP for MSH6, SIFT, PolyPhen-2 and PON-MMR, was compared. The comparison was carried out on prediction score distributions. The positive predictive value (PPV), the negative predictive value (NPV), sensitivity, specificity and accuracy were calculated as follows: PPV = TP / (TP + FP); NPV = TN / (FN+TN); Sensitivity = TP / (TP+FN); Specificity = TN / (FP+TN); Accuracy = (TP+TN) / (TP +TN+FP+FN), where TP is true positive, FP is false positive, TN is true negative and FN is false negative. To classify pathogenic variants, the threshold values 0.05 and 0.446 were used in SIFT [14] and PolyPhen-2 [15], respectively. The prediction performance was also compared using AUC. The box and whisker plot for each prediction was drawn to clarify the power to distinguish between LLS and ULS variants. Statistical analyses were carried out on PASW Statistics 18.0.0 software program (SPSS Inc., Chicago, IL, USA).
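The five measures follow directly from the four counts. As a check, the sketch below applies the formulas above to the CoDP counts reported for the whole data set (TP = 14, TN = 18, FP = 1, FN = 1); the function name is hypothetical.

```python
from typing import Dict


def performance(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute PPV, NPV, sensitivity, specificity and accuracy from the confusion counts."""
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# CoDP on the whole data set: 15 LLS and 19 ULS variants.
codp = performance(tp=14, tn=18, fp=1, fn=1)
for name, value in codp.items():
    print(f"{name}: {value:.3f}")
```

The printed values reproduce the 93.3% (14/15), 94.7% (18/19) and 94.1% (32/34) figures quoted for CoDP.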

Results and discussion

The dataset of MSH6 germline missense variants

A total of 294 germline missense variants were collected from the variant databases and literature (Additional file 1: Table S1). Pathogenicity of these variants was determined based on the molecular and clinical data, and the variants were classified into three categories, namely LLS, ULS and UV (Table 1). Of these 294 variants, fifteen were classified as LLS (Tables 2 and 3) and nineteen as ULS (Tables 4 and 5).
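Table 2

Variants classified as “Likely to be Lynch syndrome” (LLS) with functional assay

| No. | Variant | Definition of LLS (a) | Functional assay: MMR activity | Functional assay: interaction with MSH2 | Functional assay: localization | Functional assay: ATP hydrolysis | Functional assay: mismatch recognition | IHC: MLH1 | IHC: MSH2 | IHC: MSH6 | MSI | References |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | G566R | 1 | Inconclusive | Normal | ND | Abnormal | ND | ND | ND | ND | H | [12,30-32] |
| 2 | R976H | 1,2 | ND | Normal | ND | ND | Abnormal | Normal | Normal | Abnormal | H | [30,33] |
| 3 | G1139S | 1,2 | ND | ND | ND | Abnormal | ND | Normal | Inconclusive | Abnormal | H | [34-36] |
| 4 | S1188N | 1,2 | Abnormal | ND | ND | ND | ND | Normal | Normal | Abnormal | H | [38] |
| 5 | E1193K | 1,2 | Abnormal | Abnormal | ND | ND | ND | Normal | Inconclusive | Abnormal | H | [31,37] |

Abbreviations: ND, not done; H, MSI-high.

(a) See Table 1.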

Table 3

Variants classified as LLS without functional assay

| No. | Variant | Definition of LLS (a) | IHC: MLH1 | IHC: MSH2 | IHC: MSH6 | MSI | Segregation study | FH | PTF | Healthy control = 0 (N > 100) | References |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6 | L449P | 2,3 | Normal | Normal | Abnormal | H | ND | Abnormal | Abnormal | ND | [39] |
| 7 | C559Y | 3 | ND | ND | ND | ND | Abnormal | Abnormal | Abnormal | ND | [44] |
| 8 | P591S | 2,3 | Normal | Normal | Abnormal | H | ND | Abnormal | Abnormal | Abnormal | [40] |
| 9 | P623L | 3 | Normal | Normal | Abnormal | L | ND | Normal | Abnormal | Abnormal | [31] |
| 10 | G670R | 2 | Normal | Normal | Abnormal | H | ND | Normal | Normal | ND | [41] |
| 11 | R772W | 2 | Normal | Normal | Abnormal | H | ND | Normal | Normal | Inconclusive (0/95) | [42] |
| 12 | Y969C | 2,3 | Normal | Normal | Abnormal | H | Abnormal | Abnormal | Abnormal | Inconclusive (b) | [43,44] |
| 13 | G1069E | 2 | Normal | Normal | Abnormal | H | ND | Normal | Normal | ND | [45] |
| 14 | R1076C | 3 | Normal | Normal | Abnormal | ND | ND | Abnormal | Abnormal | ND | [47,48] |
| 15 | A1236P | 2,3 | Normal | Normal | Abnormal | H | ND | Abnormal | NA | Abnormal | [46] |

Abbreviations: ND, not done; H, MSI-high; L, MSI-low.

(a) See Table 1.

(b) The number of healthy population is unknown.

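Table 4

Variants classified as “Unlikely to be Lynch syndrome” (ULS) showing normal MMR

| No. | Variant | Definition of ULS (a) | Polymorphism | Functional assay: MMR activity | Functional assay: interaction with MSH2 | Functional assay: localization | Functional assay: ATP hydrolysis | Functional assay: mismatch recognition | IHC: MLH1 | IHC: MSH2 | IHC: MSH6 | MSI | References |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 16 | R128L | 2 | NA | Normal | Normal | ND | ND | ND | Abnormal | Normal | Normal | H | [31] |
| 17 | S144I | 2,3 | <0.01 | Normal | Normal | ND | ND | ND | Normal | Normal | Normal | S | [30,49,50] |
| 18 | L396V | 1,2 | ≥0.01 | Normal | ND | ND | ND | ND | Normal | Normal | Normal | L/H | [32,34] |
| 19 | K728T | 2,3 | NA | Normal | Normal | ND | ND | ND | Abnormal | Abnormal | Abnormal | S | [31] |

Abbreviations: NA, not available; ND, not done; H, MSI-high; L, MSI-low; S, microsatellite stable.

(a) See Table 1.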

Table 5

Variants classified as ULS showing polymorphism or normal IHC and MSS

| No. | Variant | Definition of ULS (a) | Polymorphism | IHC: MLH1 | IHC: MSH2 | IHC: MSH6 | MSI | References |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 20 | K13T | 3 | <0.01 | Normal | Normal | Normal | S | [49] |
| 21 | A25V | 1 | ≥0.01 | ND | ND | ND | ND | dbSNP, 1000 Genomes |
| 22 | G39E | 1 | ≥0.01 | ND | ND | ND | ND | dbSNP, 1000 Genomes |
| 23 | G54A | 3 | NA | Normal | Normal | Normal | S | [51] |
| 24 | S65L | 3 | <0.01 | Normal | Normal | Normal | S | [49] |
| 25 | C196F | 1 | ≥0.01 | ND | ND | ND | ND | dbSNP, 1000 Genomes |
| 26 | R468H | 3 | <0.01 | Normal | Normal | Normal | S | [49] |
| 27 | S503C | 3 | <0.01 | Normal | Normal | Normal | S | [49] |
| 28 | R635G | 3 | NA | Normal | Normal | Normal | S | [52] |
| 29 | I886V | 1 | ≥0.01 | ND | ND | ND | ND | 1000 Genomes |
| 30 | I1054F | 3 | NA | Normal | Normal | Normal | S | [34] |
| 31 | E1163V | 1 | ≥0.01 | ND | ND | ND | ND | 1000 Genomes |
| 32 | E1196K | 1 | ≥0.01 | ND | ND | ND | ND | dbSNP, 1000 Genomes |
| 33 | E1234Q | 1 | ≥0.01 | ND | ND | ND | ND | dbSNP, 1000 Genomes |
| 34 | E1304K | 1 | ≥0.01 | ND | ND | ND | ND | 1000 Genomes |

Abbreviations: NA, not available; ND, not done; S, microsatellite stable.

(a) See Table 1.

Of the fifteen LLS variants, five variants, namely G566R, R976H, G1139S, S1188N and E1193K, showed abnormality in the protein function assay (Table 2). These five variants also showed a high level of MSI (MSI-H), and showed loss of MSH6 expression except for the G566R variant [12,30-38]. Hence, these five variants were LLS-1 and/or LLS-2. Of the remaining ten LLS variants (=15-5), the L449P, P591S, G670R, R772W, Y969C, G1069E and A1236P variants had MSI-H and loss of MSH6 expression like the ones in Table 2, and some of them also fulfilled the clinical criteria, such as family cancer history and probands’ tumor features [39-46]; hence these seven variants were LLS-2 and/or LLS-3 (Table 3). The remaining three LLS variants (=15-5-7), namely C559Y, P623L and R1076C, were LLS-3 [31,44,47,48] (Table 3). Of the nineteen ULS variants, four variants, namely R128L, S144I, L396V and K728T, showed normal function in the protein function assay and a normal staining pattern in IHC, and hence fulfilled definition ULS-2 [30-32,34,49,50] (Table 4). In addition, L396V was a polymorphism and also fulfilled definition ULS-1. 
Of the remaining fifteen ULS variants (=19-4), the K13T, G54A, S65L, R468H, S503C, R635G and I1054F variants demonstrated MSS and showed normal expression of MSH6 [34,49,51,52]; hence these seven variants possessed normal MMR activity and fulfilled definition ULS-3 (Table 5). The remaining eight (=19-4-7) ULS variants, namely A25V, G39E, C196F, I886V, E1163V, E1196K, E1234Q and E1304K, were polymorphisms and fulfilled definition ULS-1 (Table 5). In total, the 34 variants in Tables 2, 3, 4 and 5 were available for prediction assessment, and the remaining 260 variants, which were UVs, were the targets for predicting whether each of them was LLS or ULS. In the following analyses, we used the data in Tables 3 and 5 as the parameter training data set, and the data in Tables 2 and 4 as the prediction test data set. The data for all 34 variants are referred to as the whole data set. Finally, we applied the prediction to the UV data set.

The sequence data set for the multiple alignments

From GenBank and Ensembl, 126 sequences of MSH6 orthologues were selected (Additional file 2: Table S2). Of them, 34 were derived from vertebrates. Most of the vertebrate orthologues had, from the N-terminus, a PCNA-binding motif (Qxx[LI]xx[FF], amino acid 4–11 in human MSH6) [24], a PWWP domain (amino acid 89–194) [25] and a MutS domain (amino acid 362–1355) [20] (Figure 1). These sequences formed the initial set for a multiple sequence alignment. We then added the amino acid sequences of the PCNA-binding motif and of the PWWP domain of 91 non-vertebrate MSH6 to the initial set, and found that the prediction performance was improved. Adding more amino acid sequences of the MutS domain was, however, not straightforward. Three different sets of sequences were made from the non-vertebrate MutS domains. The first set contained all the non-vertebrate MutS domains (91 sequences). The second set contained MutS domains derived from the sequences that comprised both the MutS and PWWP domains (5 sequences). The third set contained MutS domains derived from the sequences that comprised both the MutS domain and the PCNA-binding motif (58 sequences). A multiple sequence alignment was built with the initial sequences plus each of the described sequence sets, and the performance of prediction was tested on the whole data set using an ROC curve. The AUC of the first set was 0.767, that of the second set was 0.689 and that of the third set was 0.811. The initial set plus the third set, namely sequences with both the MutS domain and the PCNA-binding motif, performed best, and this set was used hereafter.

Normalization of the impact score

MAPP determines the pathogenicity of missense variants by an index known as impact score. The threshold of the impact score is required to determine whether the variant is pathogenic or not. The impact score basically depends on the degree of conservation of amino acid types in the alignment position [17]. Therefore, the threshold of the impact score in different domains of MSH6 likely varies. Indeed, the optimum threshold for the initial sequence set was 8.5, that for the PCNA-binding motif was 4.1, that for the PWWP domain was 5.0 and that for the MutS domain was 4.1. The different threshold values of the different domains in the same sequence could cause confusion. We, therefore, normalized the impact scores so as to make the threshold value 1.0 throughout the sequence.
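One way to obtain a common threshold of 1.0 is to divide each raw impact score by the optimum threshold of the region that contains the residue. The text does not spell out the normalization formula, so this division is an assumption, as is the use of the initial-set threshold (8.5) for residues outside the three listed regions. The residue ranges are those given for human MSH6 in the preceding section, and all names are hypothetical.

```python
# (region name, first residue, last residue, optimum MAPP threshold)
REGION_THRESHOLDS = [
    ("PCNA-binding motif", 4, 11, 4.1),
    ("PWWP domain", 89, 194, 5.0),
    ("MutS domain", 362, 1355, 4.1),
]

# Assumption: residues outside the regions above use the initial-set threshold.
DEFAULT_THRESHOLD = 8.5


def threshold_for(position: int) -> float:
    """Return the optimum impact-score threshold for a residue position."""
    for _name, first, last, threshold in REGION_THRESHOLDS:
        if first <= position <= last:
            return threshold
    return DEFAULT_THRESHOLD


def normalized_impact(raw_score: float, position: int) -> float:
    """Scale a raw impact score so that the decision threshold becomes 1.0."""
    return raw_score / threshold_for(position)


def is_predicted_pathogenic(raw_score: float, position: int) -> bool:
    # After scaling, a single cut-off of 1.0 applies along the whole sequence.
    return normalized_impact(raw_score, position) > 1.0
```

Under this scheme a raw score of 10.0 at a PWWP residue becomes 2.0, while the same raw score in the MutS domain becomes about 2.44.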

The prediction performance of MAPP for MSH6

This type of prediction method should ideally distinguish disease-causing variants from benign variants [53]. The distributions of the score of MAPP for MSH6 between LLS and ULS variants in the whole data set were significantly different. The average for LLS and ULS was 2.673 and 0.851, respectively (Student’s t-test: p < .001) and median for LLS and ULS was 2.099 and 0.770, respectively (Mann–Whitney U test: p < .001). The capability of this tool is, therefore, reasonably sufficient to distinguish pathogenic variants from non-pathogenic variants.

Development of CoDP

The prediction performance of SIFT and PolyPhen-2

We examined the prediction performance of both SIFT and PolyPhen-2 on the whole data set. PolyPhen-2 calculates values of both HumDiv and HumVar. HumDiv is used for diagnosis of Mendelian disease, and HumVar is used for the evaluation of rare alleles potentially involved in complex phenotypes [15]. Both SIFT and PolyPhen-2 clearly distinguished the median for LLS variants and that for ULS variants (Mann–Whitney U test: HumVar p < .001, HumDiv p < .001, SIFT p < .001).

Correlation between the structural properties of the MSH6 protein and LLS/ULS

The correlation between the solvent accessibility of the substituted amino acid and LLS/ULS was found to be statistically significant. The averages of the solvent accessibility of the substituted amino acid residues in LLS and in ULS variants were 0.141 and 0.589, respectively (Student’s t-test: p < .001), and the medians in LLS and ULS variants were 0.087 and 0.583, respectively (Mann–Whitney U test: p < .005). The amino acid residues substituted in LLS variants tend to have smaller accessibility than those in ULS variants. Similarly, the correlation between the change in the number of heavy atoms in the side chains of the substituted residues and LLS/ULS was also significant (Figure 2). A minor change in the number of heavy atoms in the side chains was often observed in ULS. These significant differences in the two properties indicate that they can serve as predictors for the pathogenicity of MSH6 variants. When these two properties alone were applied to the whole data set, 11 out of 15 LLS variants and 17 out of 19 ULS variants were correctly distinguished using the most appropriate threshold, which is equivalent to 82.4% accuracy (28/34). It is surprising that this simple and explicit use of protein three-dimensional structure data had a classification power comparable to that of SIFT and PolyPhen-2.
Figure 2

The number of changes in heavy atoms between the original and the substituted amino acid. For instance, in change 0–1 (no or one change in the number of heavy atoms by substitution), the cases of ULS are more frequent than those of LLS. An I-form line on each bar denotes a standard deviation obtained by the bootstrap method with 1,000 resampling. The distributions do not overlap in the number of changes 0–1 and 2–3.


Combining different properties by logistic regression model

To further improve the prediction accuracy, we combined the different prediction methods above in the logistic regression equation, and the weight for each method was optimized using the training data set. The logistic regression equation for joint score q was obtained as: The significance level is less than 1% and hence this model seems to be useful for the prediction. In the equation above, we omitted PolyPhen-2 HumDiv, because HumDiv had low accuracy, as will be explained below. We calculated both the AUC and the cut-off value of joint score q. The AUC was 0.954 and the cut-off value was 0.56. Based on these values, we considered that variants with a joint score q of 0.56 or less have minor impact on the function of the MSH6 protein, and hence are likely to be non-pathogenic. Variants with a joint score q of more than 0.56 are, therefore, likely to be pathogenic. More specifically, variants with a joint score q of more than 0.65 likely have impaired function, and variants with a joint score q between 0.56 and 0.65 likely have moderate impact on function. We applied this prediction procedure to the test data set, namely the variants with the biochemical functional assay (Tables 2 and 4), and found that the procedure predicted those variants correctly (LLS: 5/5 variants, ULS: 4/4 variants). Of the five LLS variants, four variants, namely G566R, G1139S, S1188N and E1193K, were in the category of “impaired function.”
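The cut-off values stated above translate into a three-way interpretation of the joint score. The sketch below encodes them; the function name and the wording of the returned labels are illustrative, and a score of exactly 0.65 is placed in the moderate band because the text reserves "impaired" for scores of more than 0.65.

```python
PATHOGENIC_CUTOFF = 0.56   # q of 0.56 or less: minor impact
IMPAIRED_CUTOFF = 0.65     # q of more than 0.65: impaired function


def interpret_joint_score(q: float) -> str:
    """Map a CoDP joint score to the impact categories described in the text."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("a joint score is a probability and must lie in [0, 1]")
    if q <= PATHOGENIC_CUTOFF:
        return "minor impact (likely non-pathogenic)"
    if q <= IMPAIRED_CUTOFF:
        return "moderate impact (likely pathogenic)"
    return "impaired function (likely pathogenic)"


# Hypothetical scores, for illustration only.
for score in (0.12, 0.60, 0.93):
    print(score, "->", interpret_joint_score(score))
```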

Comparison of prediction performance

The performance of CoDP was first compared with those of other conventional tools, namely MAPP, SIFT, PolyPhen-2 and PON-MMR, on the whole data set. The values of PPV, NPV, sensitivity, specificity and accuracy were compared (Table 6). PPV of CoDP was 93.3% (14/15), NPV was 94.7% (18/19), sensitivity was 93.3% (14/15), specificity was 94.7% (18/19) and accuracy was 94.1% (32/34). All these scores were better than those of the conventional methods except for PON-MMR. PON-MMR predicted eleven out of 34 LLS/ULS variants as either pathogenic or non-pathogenic variants, and the remaining 23 variants as UVs. The eleven variants were predicted correctly, of which three were pathogenic variants and eight were non-pathogenic variants. However, PON-MMR did not classify 23 (= 34–11) variants as pathogenic or non-pathogenic, and hence the method cannot be used for UV curation, which is the aim of our tool. Therefore, we put PON-MMR aside in this comparison. The superiority of CoDP was also shown by AUC. AUC of CoDP was 0.954, that of MAPP for MSH6 was 0.919, of SIFT was 0.864 and of PolyPhen-2 HumVar was 0.819. The power of these methods to distinguish between LLS and ULS was visualized by the box and whisker plot (Figure 3) and further tested by the Wilcoxon rank sum test. The test gave p < 8.9 × 10^-6 for CoDP, p < 3.3 × 10^-5 for MAPP, p < 3.1 × 10^-4 for SIFT and p < 1.2 × 10^-3 for PolyPhen-2 HumVar. These tests demonstrated that CoDP outperformed the other conventional methods.
Table 6

Prediction performance of tools in the whole data set

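| | CoDP | MAPP for MSH6 | SIFT | PolyPhen-2 HumVar | PolyPhen-2 HumDiv |
|---|---|---|---|---|---|
| TP | 14 | 14 | 10 | 14 | 14 |
| TN | 18 | 17 | 15 | 10 | 8 |
| FP | 1 | 2 | 4 | 9 | 11 |
| FN | 1 | 1 | 5 | 1 | 1 |
| PPV | 0.933 (14/15) | 0.875 (14/16) | 0.714 (10/14) | 0.609 (14/23) | 0.560 (14/25) |
| NPV | 0.947 (18/19) | 0.944 (17/18) | 0.750 (15/20) | 0.909 (10/11) | 0.889 (8/9) |
| Sensitivity | 0.933 (14/15) | 0.933 (14/15) | 0.667 (10/15) | 0.933 (14/15) | 0.933 (14/15) |
| Specificity | 0.947 (18/19) | 0.895 (17/19) | 0.789 (15/19) | 0.526 (10/19) | 0.421 (8/19) |
| Accuracy | 0.941 (32/34) | 0.912 (31/34) | 0.735 (25/34) | 0.706 (24/34) | 0.647 (22/34) |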
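The five statistics compared in Table 6 follow directly from the four confusion-matrix counts. The short sketch below recomputes them for CoDP from the counts reported in the text (TP = 14, TN = 18, FP = 1, FN = 1); the function name `performance` is ours and is not part of any of the tools discussed.

```python
def performance(tp, tn, fp, fn):
    """Return the five summary statistics used in the comparison."""
    return {
        "PPV": tp / (tp + fp),            # predicted pathogenic that are LLS
        "NPV": tn / (tn + fn),            # predicted non-pathogenic that are ULS
        "sensitivity": tp / (tp + fn),    # LLS variants recovered
        "specificity": tn / (tn + fp),    # ULS variants recovered
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Counts for CoDP on the whole data set (15 LLS and 19 ULS variants).
codp = performance(tp=14, tn=18, fp=1, fn=1)
for name, value in codp.items():
    print(f"{name}: {value:.3f}")
```

Because TP + FN must equal the number of LLS variants (15) and TN + FP the number of ULS variants (19), the same function can be used to cross-check the counts of any column of the table.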
Figure 3

Box and whisker plots for distributions of prediction scores of tools in LLS and ULS variants. The top and the bottom of the box are the 75th and 25th percentile, respectively, and the black line in the box is the median. × denotes an outlier. The distributions of LLS and ULS in CoDP (a) are better separated than those of MAPP for MSH6 (b), SIFT (c) and PolyPhen-2 (d).

When the performances of the tools were compared on the test data set alone, only CoDP predicted all test variants correctly. The values of PPV, NPV, sensitivity, specificity and accuracy of the tools in the test data set are shown in Table 7 (MAPP LLS: 4/5 variants, ULS: 4/4 variants; SIFT LLS: 4/5 variants, ULS: 4/4 variants; PolyPhen-2 HumVar LLS: 5/5 variants, ULS: 2/4 variants). AUC of CoDP was 1.000, that of MAPP for MSH6 was 0.800, of SIFT was 0.950 and of PolyPhen-2 HumVar was 0.900. The power of these methods to distinguish between LLS and ULS on the test data set was p < 1.5 × 10⁻² for CoDP, p < 1.9 × 10⁻¹ for MAPP, p < 6.5 × 10⁻² for SIFT and p < 1.5 × 10⁻² for PolyPhen-2 HumVar. The box and whisker plots that visualize the distributions of the scores are shown in Additional file 3: Figure S1.
Table 7

Prediction performance of tools in the test set

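| | CoDP | MAPP for MSH6 | SIFT | PolyPhen-2 HumVar | PolyPhen-2 HumDiv |
|---|---|---|---|---|---|
| TP | 5 | 4 | 4 | 5 | 5 |
| TN | 4 | 4 | 4 | 2 | 1 |
| FP | 0 | 0 | 0 | 2 | 3 |
| FN | 0 | 1 | 1 | 0 | 0 |
| PPV | 5/5 | 4/4 | 4/4 | 5/7 | 5/8 |
| NPV | 4/4 | 4/5 | 4/5 | 2/2 | 1/1 |
| Sensitivity | 5/5 | 4/5 | 4/5 | 5/5 | 5/5 |
| Specificity | 4/4 | 4/4 | 4/4 | 2/4 | 1/4 |
| Accuracy | 9/9 | 8/9 | 8/9 | 7/9 | 6/9 |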
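The AUC values quoted for the tools can be read as the probability that a randomly chosen LLS variant scores higher than a randomly chosen ULS variant. This is the Mann-Whitney U statistic divided by the number of LLS/ULS pairs, which is why the AUC and the Wilcoxon rank sum test tell a consistent story. The sketch below computes the AUC in this pairwise way; the scores are hypothetical and are not taken from the study.

```python
def auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical joint scores for five LLS and four ULS variants.
lls_scores = [0.95, 0.81, 0.77, 0.70, 0.66]
uls_scores = [0.40, 0.12, 0.05, 0.01]
print(auc(lls_scores, uls_scores))  # 1.0, because every LLS score exceeds every ULS score
```

With only five LLS and four ULS variants there are just 20 pairs, so a single misordered pair already lowers the AUC by 0.05; this is one reason to treat test-set AUC values with caution.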
The small size of the test data set may raise doubts about the superiority of CoDP. To overcome the paucity of test samples, we also employed a leave-one-out jackknife method and evaluated the performance of the tools. CoDP predicted 85.3% (29/34; LLS 93.3%, 14/15; ULS 78.9%, 15/19) of the variants correctly, and the performance was still better than that of SIFT and PolyPhen-2 HumVar (Table 6). Here, we did not compare the performance of CoDP with that of MAPP for MSH6, because MAPP is based on information retrieved from homologous sequences, and hence it was difficult to leave the information of the target sequence out of the training set.
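The leave-one-out procedure can be sketched as below: each variant is held out in turn, the model is refitted on the remaining variants, and the held-out variant is then predicted. The learner used here is a deliberately simple one-dimensional threshold (the midpoint between the two class means), chosen only to keep the example self-contained; it is an assumption for illustration and is not the logistic regression model of CoDP. All scores and labels are hypothetical.

```python
def leave_one_out_accuracy(xs, ys, fit, predict):
    """Hold out each sample once, refit on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


def fit_threshold(xs, ys):
    """Toy learner: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    """Predict 1 (LLS-like) above the threshold and 0 (ULS-like) otherwise."""
    return 1 if x > threshold else 0


# Hypothetical scores and labels (1 = LLS, 0 = ULS).
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(leave_one_out_accuracy(scores, labels, fit_threshold, predict_threshold))
```

Passing `fit` and `predict` as arguments keeps the resampling loop independent of the model, so the same loop could wrap a logistic regression fit instead of the toy threshold.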

Predicting UVs by CoDP

We then used CoDP to interpret the 260 germline missense variants that were classified as UVs. Of the 260 UVs, 84 variants (32.3%) were predicted as pathogenic variants and 176 variants (67.7%) as non-pathogenic variants; hence, about one third of the UVs detected in MSH6 were predicted to be pathogenic. Of these 84 putative pathogenic variants, three were predicted to have a moderate impact on the protein (0.56 < joint score q ≤ 0.65), and 81 were predicted to have impaired function (joint score q > 0.65) (Table 8).
Table 8

Classification results of UVs in MSH6 by CoDP

Columns 1-6: the variants with no impact on MSH6. Columns 7-8: the variants with moderate impact on MSH6. Columns 9-12: the variants with impact on MSH6.

| Variants | Score | Variants | Score | Variants | Score | Variants | Score | Variants | Score | Variants | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S9G | 0.000 | S360I | 0.000 | L815I | 0.180 | G670V | 0.595 | L370S | 0.832 | A1021D | 0.988 |
| A20V | 0.000 | R361H | 0.000 | P831A | 0.060 | S1049F | 0.572 | Y397C | 0.976 | R1024W | 0.938 |
| A20D | 0.000 | T369I | 0.009 | D857N | 0.426 | I1227L | 0.619 | L435P | 0.942 | D1026Y | 0.995 |
| N21S | 0.000 | E381K | 0.001 | V867G | 0.189 | | | A457P | 0.951 | D1031V | 0.722 |
| A25S | 0.000 | D390N | 0.003 | V878A | 0.009 | | | R468C | 0.992 | R1034Q | 0.724 |
| A36V | 0.000 | Y397F | 0.003 | D880E | 0.000 | | | V474A | 0.930 | A1055T | 0.935 |
| P42S | 0.000 | I425V | 0.115 | Q889H | 0.022 | | | V480L | 0.853 | D1058S | 0.975 |
| W50R | 0.000 | I442T | 0.017 | I891M | 0.031 | | | E484K | 0.826 | V1059A | 0.716 |
| A81T | 0.000 | E446N | 0.027 | L893V | 0.016 | | | V509A | 0.969 | A1064V | 0.846 |
| A81V | 0.000 | N455T | 0.000 | R901H | 0.035 | | | I516N | 0.740 | Y1066C | 0.999 |
| K99N | 0.003 | Q475H | 0.261 | D904E | 0.006 | | | T521I | 0.911 | P1087H | 0.978 |
| I120V | 0.000 | K476E | 0.145 | V907A | 0.001 | | | Y535C | 0.894 | P1087R | 0.995 |
| E122K | 0.000 | M492V | 0.530 | E983Q | 0.074 | | | Y538S | 0.998 | R1095H | 0.692 |
| K125E | 0.000 | R497T | 0.028 | N984H | 0.006 | | | D575Y | 0.997 | R1095C | 0.996 |
| L147H | 0.000 | K498R | 0.000 | F985L | 0.016 | | | S580L | 0.997 | T1100R | 0.860 |
| A159V | 0.000 | Q522R | 0.097 | R988L | 0.017 | | | P656L | 0.943 | I1115T | 0.802 |
| H164P | 0.000 | P531T | 0.003 | P991L | 0.065 | | | S682C | 0.653 | T1142M | 0.864 |
| K185E | 0.000 | E533D | 0.006 | T1008I | 0.302 | | | S682F | 0.998 | G1148R | 1.000 |
| K187T | 0.000 | E546G | 0.031 | R1024Q | 0.053 | | | G685A | 0.939 | G1157S | 0.964 |
| E192V | 0.000 | E546Q | 0.003 | Q1048E | 0.002 | | | L700F | 0.985 | A1162P | 0.970 |
| V195F | 0.015 | S549F | 0.468 | V1056M | 0.360 | | | S702G | 0.951 | T1175S | 0.822 |
| D197H | 0.001 | Y556F | 0.162 | R1068G | 0.312 | | | F706S | 0.996 | E1187G | 0.998 |
| E198A | 0.000 | I570V | 0.054 | P1073S | 0.001 | | | R761G | 0.922 | L1201F | 0.984 |
| P202A | 0.000 | R577H | 0.522 | P1073R | 0.042 | | | C765W | 1.000 | D1213V | 0.932 |
| M208V | 0.000 | F582L | 0.146 | V1078A | 0.004 | | | G770V | 0.994 | E1214A | 0.992 |
| V210A | 0.000 | I608V | 0.033 | P1082S | 0.018 | | | R772Q | 0.954 | R1217K | 0.880 |
| V215I | 0.000 | K610N | 0.009 | P1082L | 0.012 | | | W777R | 0.994 | T1219I | 0.944 |
| D217Y | 0.001 | E619D | 0.291 | P1087T | 0.056 | | | A780G | 0.713 | T1225M | 0.888 |
| E220D | 0.000 | P623A | 0.010 | P1087S | 0.201 | | | I795T | 0.707 | R1242L | 0.966 |
| E221D | 0.000 | G624S | 0.072 | E1090K | 0.007 | | | L798V | 0.919 | T1243S | 0.650 |
| N223D | 0.000 | E639K | 0.005 | T1100M | 0.025 | | | Y850C | 1.000 | V1253E | 0.856 |
| N223S | 0.000 | R644S | 0.057 | K1101N | 0.002 | | | K854M | 0.826 | R1263C | 0.767 |
| S227I | 0.000 | K646R | 0.223 | P1110S | 0.376 | | | S860F | 0.982 | R1263H | 0.669 |
| E229G | 0.008 | I651T | 0.000 | I1113T | 0.045 | | | K866T | 0.685 | M1267T | 0.946 |
| P233R | 0.000 | M654I | 0.001 | E1121D | 0.000 | | | Q889P | 0.682 | C1275Y | 0.992 |
| R243C | 0.005 | S666P | 0.008 | A1151V | 0.055 | | | L909S | 0.967 | T1284M | 0.913 |
| R243H | 0.000 | D667H | 0.453 | V1160I | 0.117 | | | D943Y | 0.900 | A1303T | 0.981 |
| I245L | 0.000 | I669T | 0.000 | D1181E | 0.540 | | | Y977H | 0.945 | A1303G | 0.916 |
| I251V | 0.000 | P673A | 0.405 | M1202V | 0.009 | | | R988C | 0.716 | R1321G | 0.825 |
| I258T | 0.000 | E675D | 0.000 | V1232L | 0.318 | | | Y994H | 0.895 | L1353W | 0.989 |
| F265C | 0.119 | K676R | 0.006 | H1248D | 0.022 | | | S998T | 0.853 | | |
| T269S | 0.000 | Q698K | 0.005 | V1253L | 0.068 | | | | | | |
| K270M | 0.001 | Q698E | 0.006 | V1260I | 0.001 | | | | | | |
| E277D | 0.000 | A704G | 0.008 | N1273S | 0.008 | | | | | | |
| S285I | 0.000 | T719I | 0.006 | E1274K | 0.006 | | | | | | |
| G289D | 0.000 | T720A | 0.033 | S1279P | 0.014 | | | | | | |
| G289E | 0.000 | T720I | 0.024 | I1283V | 0.001 | | | | | | |
| K295E | 0.000 | I725M | 0.000 | E1310D | 0.001 | | | | | | |
| K295R | 0.001 | I725V | 0.000 | E1311D | 0.004 | | | | | | |
| R300P | 0.001 | F726S | 0.208 | R1321S | 0.128 | | | | | | |
| S314I | 0.000 | R761K | 0.015 | M1326I | 0.001 | | | | | | |
| S314R | 0.001 | T764N | 0.005 | M1326T | 0.002 | | | | | | |
| S315F | 0.003 | P768A | 0.201 | S1329L | 0.014 | | | | | | |
| T319M | 0.000 | C783S | 0.409 | R1331L | 0.011 | | | | | | |
| P320T | 0.000 | A787V | 0.063 | R1334Q | 0.000 | | | | | | |
| A326V | 0.000 | V800L | 0.000 | D1346N | 0.001 | | | | | | |
| T327S | 0.000 | V800A | 0.000 | L1354Q | 0.018 | | | | | | |
| F340S | 0.001 | D803G | 0.003 | K1358E | 0.001 | | | | | | |
| S360G | 0.000 | S806F | 0.450 | | | | | | | | |
The higher joint scores of CoDP tend to derive from mutations in the conserved domain, namely the MutS domain. This tendency suggests that missense mutations in the domain have considerable influence on protein function. The MutS domain in MSH6 forms a heterodimer with MSH2 and participates in the early recognition of mismatches and small insertion/deletion loops of DNA [54,55]. For instance, the E1193K variant, classified as LLS, is located in the MutS domain V region (Figure 1). The MutS domain V region is a highly conserved region in MutS homologues [20]. This variant showed remarkable impairment of function, such as the loss of heterodimerization with MSH2 and of MMR activity [31]. CoDP gave the joint score q = 0.813 to the E1193K variant, indicating that the variant likely causes significant damage to the structure of MSH6, which may impair the function of the protein.

Conclusion

In this study, we built CoDP, a new prediction tool to assess MSH6 missense variants. The novelty of CoDP lies in the direct incorporation of protein three-dimensional structure information and the introduction of a logistic regression model for combining the different prediction methods. The former feature was found to have unexpectedly high performance in LLS/ULS classification, and the latter procedure can be interpreted as the introduction of a simple neural network model for combining outputs from different prediction schemes. These new features enabled CoDP to achieve better performance in the classification of the MSH6 variants. The better performance was also supported by the manually curated dataset of MSH6 variants presented in Tables 2, 3, 4, 5, and 6. For adjusting the parameters, we carefully categorized MSH6 germline missense variants into LLS and ULS. In the current dataset, only 34 out of 294 variants could be categorized into LLS and ULS. This was due to the paucity of both biochemical functional assay data and clinical and molecular data linked to the variants of MSH6 in the databases. Because of this data paucity, the present CoDP is not yet clinically applicable. However, the current form of CoDP is useful for supporting risk estimation of UVs in MSH6, as SIFT or PolyPhen-2 is for other proteins. In the future, when more associated data are obtained, the parameters can be refitted and the accuracy of CoDP should improve further.

Abbreviations

AUC: The area under the curve; CRC: Colorectal cancer; HNPCC: Hereditary Non-Polyposis Colorectal Cancer; IHC: Immunohistochemistry; LLS: Likely to be Lynch syndrome; MAPP: Multivariate analysis of protein polymorphisms; MMR: Mismatch repair; MSI: Microsatellite instability; MSI-H: High level of microsatellite instability; MSI-L: Microsatellite instability low; MSS: Microsatellite stable; NPV: The negative predictive value; PPV: The positive predictive value; ROC: A receiver-operating characteristic; ULS: Unlikely to be Lynch syndrome; UVs: Unclassified variants

Competing interests

The authors declare that they have no competing interests.

Authors’ contribution

HT performed the majority of the work presented in this manuscript and drafted the manuscript. HT, KA and KY participated in this research. HK assisted in research carried out. All authors read and approved the final manuscript.

Additional file 1: Table S1

MSH6 missense variants data used for parameter fitting. The file can be read by a standard TIF viewer, such as Preview on Mac OS X.

Additional file 2: Table S2

A list of amino acid sequences used for the multiple sequence alignment of MSH6. The file can be read by a standard TIF viewer, such as Preview on Mac OS X.

Additional file 3: Figure S1

Box and whisker plots for the score distribution of in silico tools evaluated on the test set. The top and the bottom of the box are the 75th and 25th percentile, respectively, and the white line in the box is the median. The distributions of LLS and ULS are divided clearly. The file can be read by a standard TIF viewer, such as Preview on Mac OS X.
  54 in total

1.  Predicting deleterious amino acid substitutions.

Authors:  P C Ng; S Henikoff
Journal:  Genome Res       Date:  2001-05       Impact factor: 9.043

2.  T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Authors:  C Notredame; D G Higgins; J Heringa
Journal:  J Mol Biol       Date:  2000-09-08       Impact factor: 5.469

3.  hMSH3 and hMSH6 interact with PCNA and colocalize with it to replication foci.

Authors:  H E Kleczkowska; G Marra; T Lettieri; J Jiricny
Journal:  Genes Dev       Date:  2001-03-15       Impact factor: 11.361

4.  Involvement of hMSH6 in the development of hereditary and sporadic colorectal cancer revealed by immunostaining is based on germline mutations, but rarely on somatic inactivation.

Authors:  Jens Plaschke; Stefan Krüger; Steffen Pistorius; Franz Theissig; Hans D Saeger; Hans K Schackert
Journal:  Int J Cancer       Date:  2002-02-10       Impact factor: 7.396

5.  High frequency of hereditary colorectal cancer in Newfoundland likely involves novel susceptibility genes.

Authors:  Michael O Woods; Angela J Hyde; Fiona K Curtis; Susan Stuckless; Jane S Green; Aaron F Pollett; J Desmond Robb; Roger C Green; Marina E Croitoru; Amanda Careen; Jason A W Chaulk; Jegan Jegathesan; John R McLaughlin; Steven S Gallinger; H Banfield Younghusband; Bharati V Bapat; Patrick S Parfrey
Journal:  Clin Cancer Res       Date:  2005-10-01       Impact factor: 12.531

6.  Germ-line msh6 mutations in colorectal cancer families.

Authors:  R D Kolodner; J D Tytell; J L Schmeits; M F Kane; R D Gupta; J Weger; S Wahlberg; E A Fox; D Peel; A Ziogas; J E Garber; S Syngal; H Anton-Culver; F P Li
Journal:  Cancer Res       Date:  1999-10-15       Impact factor: 12.701

7.  Functional analysis of MLH1 mutations linked to hereditary nonpolyposis colon cancer.

Authors:  Minna Nyström-Lahti; Claudia Perrera; Markus Räschle; Elena Panyushkina-Seiler; Giancarlo Marra; Anna Curci; Barbara Quaresima; Francesco Costanzo; Michele D'Urso; Salvatore Venuta; Josef Jiricny
Journal:  Genes Chromosomes Cancer       Date:  2002-02       Impact factor: 5.006

8.  Functional analysis of MSH6 mutations linked to kindreds with putative hereditary non-polyposis colorectal cancer syndrome.

Authors:  Reetta Kariola; Tiina E Raevaara; Karin E Lönnqvist; Minna Nyström-Lahti
Journal:  Hum Mol Genet       Date:  2002-05-15       Impact factor: 6.150

9.  Molecular and clinical characteristics of MSH6 variants: an analysis of 25 index carriers of a germline variant.

Authors:  Maran J W Berends; Ying Wu; Rolf H Sijmons; Rob G J Mensink; Tineke van der Sluis; Jannet M Hordijk-Hos; Elisabeth G E de Vries; Harry Hollema; Arend Karrenbeld; Charles H C M Buys; Ate G J van der Zee; Robert M W Hofstra; Jan H Kleibeuker
Journal:  Am J Hum Genet       Date:  2002-01       Impact factor: 11.025

10.  Human non-synonymous SNPs: server and survey.

Authors:  Vasily Ramensky; Peer Bork; Shamil Sunyaev
Journal:  Nucleic Acids Res       Date:  2002-09-01       Impact factor: 16.971

