
Computational Prediction of Probable Single Nucleotide Polymorphism-Cancer Relationships.

Shahab Bakhtiari1, Sadegh Sulaimany2, Mehrdad Talebi3, Kabmiz Kalhor1.   

Abstract

Genetic variations such as single nucleotide polymorphisms (SNPs) can cause susceptibility to cancer. Although thousands of genetic variants have been identified to be associated with different cancers, the molecular mechanisms of cancer remain unknown. There is not a particular dataset of relationships between cancer and SNPs, as a bipartite network, for computational analysis and prediction. Link prediction as a computational graph analysis method can help us to gain new insight into the network. In this article, after creating a network between cancer and SNPs using SNPedia and Cancer Research UK databases, we evaluated the computational link prediction methods to foresee new SNP-Cancer relationships. Results show that among the popular scoring methods based on network topology, for relation prediction, the preferential attachment (PA) algorithm is the most robust method according to computational and experimental evidence, and some of its computational predictions are corroborated in recent publications. According to the PA predictions, rs1801394-Non-small cell lung cancer, rs4880-Non-small cell lung cancer, and rs1805794-Colorectal cancer are some of the best probable SNP-Cancer associations that have not yet been mentioned in any published article, and they are the most probable candidates for additional laboratory and validation studies. Also, it is feasible to improve the predicting algorithms to produce new predictions in the future.
© The Author(s) 2020.

Keywords:  Cancer; SNP; bipartite network; link prediction

Year:  2020        PMID: 32728337      PMCID: PMC7364831          DOI: 10.1177/1176935120942216

Source DB:  PubMed          Journal:  Cancer Inform        ISSN: 1176-9351


Introduction

Cancer has a significant impact on human mortality and stands among the leading causes of death worldwide. The number of cancer cases is increasing at an alarming rate every year. Some behavioral and environmental triggers are believed to lead to cancer, including diet, lifestyle, and chronic and viral infections. Increasing life span is another leading cause, and researchers estimate that about two-thirds of the increase is due to aging. Although thousands of genetic variants (including single nucleotide polymorphisms [SNPs]) have been identified as associated with different cancers, the molecular mechanisms of cancer remain unknown, so researchers continue to explore this field.[1-4] Genetic variations such as SNPs can cause susceptibility to cancer. For example, an SNP in a promoter site can affect gene expression, and in some tumors it can even affect the patient’s overall state of health and mortality risk.[5,6] Most recently, genome-wide association studies (GWASs) have shown relations between some known cancers and specific SNPs.[5,6] For example, Guo et al[7] reviewed 45 SNPs involved in prostate cancer. Zhang et al[8] studied the effect of rs920778 in the HOTAIR gene on esophageal cancer, and Li et al[9] showed that the rs13252298 SNP in the PRNCR1 gene is related to gastric cancer. A recent study[10] identified a link between SNP rs10800708 and breast cancer. There are numerous other examples of relationships between SNPs and cancers. However, GWASs have several limitations. First, at least one-third of the known variants are in non-coding regulatory regions, which affect transcription factor binding. Second, many GWASs show heterogeneity in allele frequencies across populations.[11,12] More studies are therefore needed to identify the relationships between cancers and SNPs, and computational methods can facilitate finding and predicting them.
There are some algorithmic studies for predicting cancer-SNP relationships. These studies are mostly based on machine learning algorithms, such as classification, that need SNP profile data from case and control groups to predict the relationships.[13-15] Their challenges include the simultaneous need for case and control data, restriction to one or a few cancers per study, and computational complexity. There is therefore a tangible need for a general, low-complexity computational solution with few prerequisites for predicting cancer-SNP relationships. To the best of our knowledge, no previous study has used link prediction to forecast cancer-SNP relationships.

Link prediction and its importance

Link prediction, as a graph analysis technique, dates back to the emergence of social networks.[16] Later, it was applied to other networks, such as biological ones.[17] In its basic definition, the primary purpose of link prediction is to find connections that are missing from the network or may form in the future, although it can also be used to remove weak or spurious relations from the network.[18] Some recent articles even predict the addition and removal of links at the same time.[19] To use link prediction algorithms, the problem network must first be represented as a graph, in which entities or elements are vertices (nodes) and their relationships are links (edges). The modeling graph may be one of the common types, such as simple, bipartite, weighted, or directed, and link prediction algorithms have been customized for all of these types. For computation, the graph can be stored in memory as an adjacency matrix. Link prediction ranks the zero entries of this matrix to find the most promising relationships to establish in the graph.[18] The algorithm for proposing the most probable edge in a simple graph is given in the Link Prediction Algorithm listing. Self-loops are not allowed in the simple graph used by the algorithm, which means that N(i, i) is always zero. In addition, because the adjacency matrix is symmetric with respect to the main diagonal, it is enough to calculate the score for only half of it: the score for (i, j) equals the score for (j, i).
The score(i, j) function in that algorithm can be calculated in a variety of ways; one of the simplest is based on topological properties of the network, such as the properties of neighbor nodes.[20] Table 1 lists the most popular topological scoring methods.
Table 1.

Link prediction score functions for topology-based node neighborhood metrics.

| Score metric | Formula |
| --- | --- |
| CN | $\mathrm{Score}(x,y) = \lvert \Gamma(x) \cap \Gamma(y) \rvert$ |
| JC | $\mathrm{Score}(x,y) = \dfrac{\lvert \Gamma(x) \cap \Gamma(y) \rvert}{\lvert \Gamma(x) \cup \Gamma(y) \rvert}$ |
| PA | $\mathrm{Score}(x,y) = \lvert \Gamma(x) \rvert \times \lvert \Gamma(y) \rvert$ |
| AA | $\mathrm{Score}(x,y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \dfrac{1}{\log \lvert \Gamma(z) \rvert}$ |

Abbreviations: AA, Adamic and Adar; CN, common neighbors; JC, Jaccard; PA, preferential attachment.

$\Gamma(x)$ refers to the set of neighbors of vertex x, and $\lvert \Gamma(x) \rvert$ represents the number of members of that set, that is, the number of neighbors of x. For example, for the common neighbors (CN) criterion, the formula indicates that the score of a candidate edge is the number of neighbors shared by the 2 vertices x and y.
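The four neighborhood scores of Table 1 can be illustrated with a minimal Python sketch. The dictionary-of-sets graph representation, the function name `neighbor_scores`, and the toy graph are assumptions made for this illustration; they are not taken from the study's own implementation.

```python
import math

def neighbor_scores(adj, x, y):
    """Return the CN, JC, PA and AA scores for the candidate edge (x, y).

    adj maps each vertex to the set of its neighbors (a hypothetical
    representation chosen for this sketch).
    """
    gx, gy = adj[x], adj[y]
    common = gx & gy                      # shared neighbors of x and y
    union = gx | gy
    cn = len(common)
    jc = cn / len(union) if union else 0.0
    pa = len(gx) * len(gy)
    # A common neighbor z is linked to both x and y, so it has degree >= 2
    # and log(degree) is strictly positive.
    aa = sum(1.0 / math.log(len(adj[z])) for z in common)
    return {"CN": cn, "JC": jc, "PA": pa, "AA": aa}

# Toy simple graph with edges a-b, a-c, b-d, c-d, c-e.
toy = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d", "e"},
    "d": {"b", "c"},
    "e": {"c"},
}
scores = neighbor_scores(toy, "a", "d")
```

For the unlinked pair (a, d), both vertices have exactly the neighbors b and c, so CN is 2, JC is 1, and PA is 4, while AA weights the lower-degree common neighbor b more heavily than c.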

Method

Network construction

To study and predict the SNPs associated with various human cancers, a list of SNPs is needed first. For this purpose, we used the SNPedia database (www.snpedia.com), in which information on each SNP, including its effects on cancers, has been gathered from valid journals. The total number of SNPs in SNPedia is 109 530. To determine the relationships between SNPs and cancers, we also need a complete list of cancer names, which we extracted from the Cancer Research UK online database (www.cancerresearchuk.org), active since 2002 in the field of cancer research and information. Each cancer usually has subgroups that are not always explicitly mentioned in the articles, so creating an SNP-Cancer network poses several challenges:
- Sometimes the articles refer only to the main category and general name of the cancer.
- Occasionally, in articles on the relationship between a cancer and SNPs, associated SNPs are found for all subtypes of the cancer.
- Sometimes the SNPs associated with a cancer are reported only for some of its subtypes, and no evidence is found for the other subtypes.
- In some cases, a specific cancer is mentioned under several names in different articles.
- In some cases, a cancerous tissue is mentioned without the exact cancer name.
- Occasionally, there are no data about a cancer on the SNPedia website and no articles reporting an associated SNP.
For this reason, we had to extract and check the data manually for each cancer. First, the information on the Cancer Research UK website was studied for all cancers. Next, for each cancer, the different names and types were examined and categorized. Then each cancer was searched manually on the website (www.snpedia.com), by both its main name and its subtypes. The output was the set of SNPs whose association with the cancer has been reported.
The resulting list therefore included only those cancers, by primary name or subcategory, for which an SNP was found in the search. In the second step, Java code was implemented to connect to the website (www.snpedia.com) and automatically detect the SNPs associated with the cancers categorized in the previous step. Where an SNP was found for the primary cancer name and there were no data about its association with the cancer subtypes, we considered it linked to all subtypes and not to the primary cancer name. Where SNPs were found only for a few subtypes of a main type, the SNPs associated with the subtypes were generalized to the main type, and only the name of the main type was entered into the final list of cancers, for uniformity of the cancer names. Where a cancer had several names, one name was chosen, the SNPs found under the other aliases were merged under it, and only the selected name was entered into the final list. SNPs associated with a cancerous tissue rather than a named cancer were left out of the final list because of the ambiguity of the cancer name. The SNP-Cancer network was then constructed. The steps performed to prepare the data are shown in Figure 1. The resulting network is bipartite, with 7599 edges (relations), 50 cancer vertices, and 4723 SNP vertices (Figure 2). Because a drawing of the whole network is not informative, owing to the large number of unlabeled, overlapping nodes, Figure 3 presents only a sub-network for a selected subset of cancers, drawn as a bipartite graph with labeled nodes, as a complementary representation. Because this sub-network would still be very large, the number of SNPs was also limited (31 cancers, 33 SNPs, and 222 relationships).
Figure 1.

SNP-Cancer network construction steps. SNP indicates single nucleotide polymorphism.

Figure 2.

Visualization of SNP-Cancer bipartite graph. Red circles are cancers surrounded by SNPs. SNP indicates single nucleotide polymorphism.

Figure 3.

An alternative representation for the SNP-Cancer network. A sub-network with 33 SNPs, 31 cancers, and 222 relationships is presented in the bipartite form to make a better view of the real network. SNP indicates single nucleotide polymorphism.


Bipartite link prediction

After creating the bipartite graph of links between SNPs and cancers, we have a network that identifies which SNPs are associated with which cancers and vice versa. From here onward, hidden links in the bipartite graph are discovered by link prediction calculations. However, link prediction was introduced only in its basic form, for simple graphs, in the “Introduction” section, so the ranking scores must be adapted to bipartite networks. A bipartite network has no self-loops, and the scoring function that ranks the probable links is different. Because no node has a relationship with a node in the same part, direct relations between SNP pairs or between cancer pairs are not considered in this research; only links between the vertices of one part and the vertices of the other part matter. Thus, we chose the scoring functions based on the literature.[21] To clarify the customized formulas, their elements must be defined first. If node x is in the first part of the graph and node y is in the other part, $\Gamma(y)$ refers to the set of neighbors of y, and $\Gamma(\Gamma(y))$ refers to the set of neighbors of the neighbors of y, or simply $\hat{\Gamma}(y)$. For example, $\lvert \Gamma(x) \cap \hat{\Gamma}(y) \rvert$ counts the vertices shared by the neighbors of x and the neighbors of the neighbors of y. The direct neighbors of y cannot be used here because they lie in the same part as x and therefore never intersect the neighbors of x. So, the probable relationship between x and y is ranked based on the neighbors of x and the indirect (2 steps away) neighbors of y. This modification is not necessary for the preferential attachment (PA) scoring method, because PA does not depend on neighborhood intersection; it depends only on the degrees of the 2 nodes, one from each part, that may connect (Table 2).
Table 2.

Link prediction score function for topology-based node neighborhood metrics in bipartite graphs.

| Score metric | Score formula |
| --- | --- |
| CN | $\lvert \Gamma(x) \cap \hat{\Gamma}(y) \rvert$ |
| JC | $\dfrac{\lvert \Gamma(x) \cap \hat{\Gamma}(y) \rvert}{\lvert \Gamma(x) \cup \hat{\Gamma}(y) \rvert}$ |
| PA | $\lvert \Gamma(x) \rvert \times \lvert \Gamma(y) \rvert$ |
| AA | $\sum_{z \in \Gamma(x) \cap \hat{\Gamma}(y)} \dfrac{1}{\log \lvert \Gamma(z) \rvert}$ |

Abbreviations: AA, Adamic and Adar; CN, common neighbors; JC, Jaccard; PA, preferential attachment.

Consequently, we want to know which scoring method has the best prediction accuracy for the SNP-Cancer bipartite network, and then which new connections between SNPs and cancers are suggested by the best edge scoring method. Next, these findings should be supported computationally, and finally the proposed results should be validated against evidence in the scientific databases or passed on to new in vitro experiments. The accuracy of predictions depends on the properties of the examined network, such as scale-free or small-world attributes,[22] and no scoring algorithm is known in advance to be superior to the others. Therefore, the calculations are done for each of the scoring methods in Table 2, and after the accuracy evaluation, the best method is selected for a more practical investigation of the results.
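The bipartite adaptation described in this section can be sketched in Python as follows. It compares the neighbors of x with the 2-step neighborhood of y (the neighbors of the neighbors of y), and leaves PA based on the plain degrees. The function name `bipartite_scores`, the dictionary-of-sets representation, and the toy SNP and cancer labels are assumptions for illustration only, not the authors' code.

```python
import math

def bipartite_scores(adj, x, y):
    """Score the candidate edge (x, y), with x and y in opposite parts.

    adj maps every vertex, from either part, to its set of neighbors.
    """
    gx = adj[x]
    # Two-step neighborhood of y: union of the neighbor sets of y's
    # neighbors. It lies in the same part as y (and contains y itself
    # whenever y has at least one neighbor).
    gy2 = set().union(*(adj[z] for z in adj[y]))
    common = gx & gy2
    union = gx | gy2
    cn = len(common)
    jc = cn / len(union) if union else 0.0
    pa = len(gx) * len(adj[y])            # PA uses the direct degrees
    # Each z in common is linked to x and to some neighbor of y, which
    # differs from x because (x, y) is not an edge; so degree(z) >= 2.
    aa = sum(1.0 / math.log(len(adj[z])) for z in common)
    return {"CN": cn, "JC": jc, "PA": pa, "AA": aa}

# Toy bipartite network: SNPs s1..s3, cancers c1..c2,
# with edges s1-c1, s2-c1, s2-c2, s3-c2.
toy = {
    "s1": {"c1"}, "s2": {"c1", "c2"}, "s3": {"c2"},
    "c1": {"s1", "s2"}, "c2": {"s2", "s3"},
}
result = bipartite_scores(toy, "s1", "c2")
```

In this toy network, s1 and c2 are not linked, but c1 is both a neighbor of s1 and a 2-step neighbor of c2 (through s2), so the pair receives a CN score of 1.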

Evaluation criteria

To compare the efficiency of the link scoring functions, it is common to measure performance against known network information, ie, the edges that already exist. The area under the receiver operating characteristic curve (AUC) or precision is recommended for this evaluation.[23] In our work, AUC is used to quantify the accuracy of the prediction method. If the set of edges in the network is E, we divide it into 2 disjoint parts, ET and EV, whose union is E. ET is the training set of edges, treated as existing information, and EV is the validation set of edges, which we delete from the network at random and withhold from the scoring functions, and which we then try to predict. To ensure that all links are tested, we use 10-fold cross-validation, which repeats the prediction 10 times for 10 disjoint sets. Each set, used in turn as EV, contains 10% of the edges of the network, removed at random. We then report the AUC of each score function as the average of its values over the 10 folds; a larger AUC indicates better link prediction performance. Here, the AUC is the probability that a randomly chosen missing connection is given a higher score by the algorithm than a randomly chosen pair of unconnected vertices. Thus, the degree to which the AUC exceeds 0.5 indicates how much better the predictions are than chance.[24] The AUC is calculated as follows:

$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}$$

where $n$ is the total number of random comparisons, $n'$ is the number of times the randomly chosen missing link has the higher score, and $n''$ is the number of times the two have the same score. Real data from related research can also be used for further validation.
For this purpose, we search the online scientific databases Google Scholar and PubMed for the predicted links that are not currently available on the SNPedia website; any evidence found in research papers is further support for the accuracy of the algorithms.
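The AUC described above can be estimated by sampling, as in the following minimal Python sketch: it counts how often a randomly chosen removed (missing) edge outscores a randomly chosen unconnected pair, with ties counted as one half. The tie weight of 0.5 is the usual convention and is an assumption here, as are the function names, the number of comparisons, and the fixed random seeds.

```python
import random

def auc_estimate(score, missing_edges, non_edges, n=10000, seed=0):
    """Estimate the AUC from n random comparisons.

    score(u, v) is any link scoring function; missing_edges are the
    removed validation edges and non_edges are unconnected pairs.
    """
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n):
        s_missing = score(*rng.choice(missing_edges))
        s_absent = score(*rng.choice(non_edges))
        if s_missing > s_absent:
            higher += 1
        elif s_missing == s_absent:
            ties += 1
    return (higher + 0.5 * ties) / n

def ten_folds(edges, seed=0):
    """Shuffle the edge list and split it into 10 disjoint validation sets."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    return [shuffled[k::10] for k in range(10)]
```

In a full evaluation, each fold would be removed from the network in turn, the scores would be computed on the remaining training edges only, and the 10 resulting AUC values would be averaged.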

Results

Based on the AUC evaluation, the PA method is the most effective of the link prediction scoring functions at predicting potential links between SNPs and cancers (Table 3). The 15 most likely links predicted by the PA method are listed in the first 2 columns of Table 4.
Table 3.

AUC of different node neighborhood similarity–based link prediction scores over bipartite SNP-Cancer network.

| Algorithm | AUC |
| --- | --- |
| CN | 0.90 |
| JC | 0.84 |
| PA | 0.95 |
| AA | 0.89 |

Abbreviations: AA, Adamic and Adar; AUC, area under the receiver operating characteristic curve; CN, common neighbors; JC, Jaccard; PA, preferential attachment; SNP, single nucleotide polymorphism.

Table 4.

Top 15 SNP-Cancer relationships predicted by PA, CN, JC, and AA scoring link prediction approach.

| PA: SNP | PA: Cancer | CN: SNP | CN: Cancer | JC: SNP | JC: Cancer | AA: SNP | AA: Cancer |
| --- | --- | --- | --- | --- | --- | --- | --- |
| rs1801133 | Non-small cell lung cancer | rs1801133 | Pancreatic cancer | rs25489 | Cholangiocarcinoma | rs1801133 | Pancreatic cancer |
| rs1801131 | Non-small cell lung cancer | rs1801133 | Non-small cell lung cancer | rs20417 | Cholangiocarcinoma | rs1801133 | Non-small cell lung cancer |
| rs1048943 | Stomach cancer | rs1801133 | Gallbladder cancer | rs13181 | Laryngeal cancer | rs1801133 | Gallbladder cancer |
| rs1048943 | Prostate cancer | rs1801133 | Hodgkin lymphoma | rs17851045 | Thymoma | rs1801133 | Hodgkin lymphoma |
| rs1799793 | Ovarian cancer | rs1801133 | Thyroid cancer | rs587781525 | Thymoma | rs1801133 | Thyroid cancer |
| rs1805794 | Stomach cancer | rs1801131 | Pancreatic cancer | rs1057519984 | Thymoma | rs1801131 | Pancreatic cancer |
| rs4646903 | Prostate cancer | rs1801131 | Non-small cell lung cancer | rs764146326 | Thymoma | rs1801131 | Non-small cell lung cancer |
| rs1801394 | Non-small cell lung cancer | rs1801131 | Bladder cancer | rs1057520000 | Thymoma | rs1801131 | Bladder cancer |
| rs4880 | Non-small cell lung cancer | rs1801131 | Myeloma | rs28934874 | Thymoma | rs1801131 | Myeloma |
| rs1800566 | Stomach cancer | rs1801131 | Retinoblastoma | rs104894228 | Thymoma | rs1801131 | Retinoblastoma |
| rs3212227 | Stomach cancer | rs1801131 | Hodgkin lymphoma | rs1801131 | Retinoblastoma | rs1801131 | Hodgkin lymphoma |
| rs1805794 | Colorectal cancer | rs1801131 | Thyroid cancer | rs1799793 | Laryngeal cancer | rs1801131 | Thyroid cancer |
| rs2736100 | Breast cancer | rs1801131 | Gallbladder cancer | rs2736100 | Laryngeal cancer | rs1801131 | Gallbladder cancer |
| rs1801133 | Pancreatic cancer | rs1801133 | Skin cancer | rs1801133 | Gallbladder cancer | rs1801133 | Skin cancer |
| rs699947 | Stomach cancer | rs1801133 | Osteosarcoma | rs1801133 | Hodgkin lymphoma | rs13181 | Hodgkin lymphoma |

Abbreviations: AA, Adamic and Adar; CN, common neighbors; JC, Jaccard; PA, preferential attachment; SNP, single nucleotide polymorphism.

Because the idea of predicting unknown links between cancers and SNPs is novel, the results of the other scoring functions are also summarized in Table 4 for further investigation and comparison.

Discussion

Single nucleotide polymorphisms are among the most common genetic variations in the human genome and are located in different gene regions, such as exons, promoters, introns, 5′-UTRs, and 3′-UTRs. Depending on their position in the genes, SNPs have different levels of influence in various diseases, such as cancer, and studies have demonstrated the role of SNPs in cancer in terms of regulation, repair, DNA mismatch, metabolism, cell cycle, and immunity.[25-27] Our understanding of the role of SNPs in cancer susceptibility depends on our molecular understanding of the pathogenicity of cancer.[28] In clinical trials, people are usually identified in the advanced stages of the disease, when the main goal is to prevent its progression; SNP biomarker data are therefore essential for predicting and screening individuals at risk. The validity of the AUC results was also checked by searching the popular scientific databases Google Scholar and PubMed to confirm the probable relationships against reported evidence. The type of reported relation between SNP and cancer is noted as a positive or negative association, with Yes or No, in Tables 5 to 8. The evidence found supports the computational link predictions.
Table 5.

Validation of the prediction results for new SNP-Cancer relationships in PA scoring method.

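| Row | SNP | Cancer | SNPedia | PubMed | Google Scholar | References | Association |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | rs1801133 | Non-small cell lung cancer | X | | | Ding et al[30] | Yes |
| 2 | rs1801131 | Non-small cell lung cancer | X | | | Li et al[31] | Yes |
| 3 | rs1048943 | Stomach cancer | X | | | Hidaka et al[32] | No |
| 4 | rs1048943 | Prostate cancer | X | | | Koda et al[33] | Yes |
| 5 | rs1799793 | Ovarian cancer | X | | | Assis et al[34] | No |
| 6 | rs1805794 | Stomach cancer | X | | | Zhou et al[35] | Yes |
| 7 | rs4646903 | Prostate cancer | X | | | Porchia et al[36] | No |
| 8 | rs1801394 | Non-small cell lung cancer | X | | | | |
| 9 | rs4880 | Non-small cell lung cancer | X | | | | |
| 10 | rs1800566 | Stomach cancer | X | | | Yadav et al[37] | Yes |
| 11 | rs3212227 | Stomach cancer | X | | | Yin et al[38] | Yes |
| 12 | rs1805794 | Colorectal cancer | X | | | | |
| 13 | rs2736100 | Breast cancer | X | | | Aydin et al[39] | No |
| 14 | rs1801133 | Pancreatic cancer | X | | | Nakao et al[40] | No |
| 15 | rs699947 | Stomach cancer | X | | | Ke et al[41] | No |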

Abbreviations: PA, preferential attachment; SNP, single nucleotide polymorphism.

Table 8.

Validation of the prediction results for new SNP-Cancer relationships in AA scoring method.

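| Row | SNP | Cancer | SNPedia | PubMed | Google Scholar | References | Association |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | rs1801133 | Pancreatic cancer | X | | | Nakao et al[40] | No |
| 2 | rs1801133 | Non-small cell lung cancer | X | | | Ding et al[30] | Yes |
| 3 | rs1801133 | Gallbladder cancer | X | | | Dixit et al[42] | No |
| 4 | rs1801133 | Hodgkin lymphoma | X | | | Sud et al[43] | Yes |
| 5 | rs1801133 | Thyroid cancer | X | | | Zara-Lopes et al[44] | Yes |
| 6 | rs1801131 | Pancreatic cancer | X | | | Nakao et al[40] | No |
| 7 | rs1801131 | Non-small cell lung cancer | X | | | Li et al[31] | Yes |
| 8 | rs1801131 | Bladder cancer | X | | | De Maturana et al[45] | No |
| 9 | rs1801131 | Myeloma | X | | | Ma et al[46] | Yes |
| 10 | rs1801131 | Retinoblastoma | X | | | Soleimani et al[47] | No |
| 11 | rs1801131 | Hodgkin lymphoma | X | | | | |
| 12 | rs1801131 | Thyroid cancer | X | | | Yang et al[48] | No |
| 13 | rs1801131 | Gallbladder cancer | X | | | De Maturana et al[45] | No |
| 14 | rs1801133 | Skin cancer | X | | | Xie et al[49] | No |
| 15 | rs13181 | Hodgkin lymphoma | X | | | | |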

Abbreviations: AA, Adamic and Adar; SNP, single nucleotide polymorphism.

Of the 15 links predicted by the PA algorithm that are not included in SNPedia, 12 have been addressed in published papers: 6 were confirmed by experiments and 6 were rejected; the other 3 have not yet been reported. The other link prediction methods, which score lower than PA in terms of AUC (CN, Jaccard [JC], and Adamic and Adar [AA]), also have fewer positive associations confirmed in the literature survey. In particular, JC, which has the weakest predictive power in link prediction research,[29] also has the fewest positive findings and returned more results that have not been examined at all in the literature (Tables 5-8).
Table 6.

Validation of the prediction results for new SNP-Cancer relationships in CN scoring method.

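| Row | SNP | Cancer | SNPedia | PubMed | Google Scholar | References | Association |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | rs1801133 | Pancreatic cancer | X | | | Nakao et al[40] | No |
| 2 | rs1801133 | Non-small cell lung cancer | X | | | Ding et al[30] | Yes |
| 3 | rs1801133 | Gallbladder cancer | X | | | Dixit et al[42] | No |
| 4 | rs1801133 | Hodgkin lymphoma | X | | | Sud et al[43] | No |
| 5 | rs1801133 | Thyroid cancer | X | | | Zara-Lopes et al[44] | Yes |
| 6 | rs1801131 | Pancreatic cancer | X | | | Nakao et al[40] | No |
| 7 | rs1801131 | Non-small cell lung cancer | X | | | Li et al[31] | Yes |
| 8 | rs1801131 | Bladder cancer | X | | | De Maturana et al[45] | No |
| 9 | rs1801131 | Myeloma | X | | | Ma et al[46] | Yes |
| 10 | rs1801131 | Retinoblastoma | X | | | Soleimani et al[47] | No |
| 11 | rs1801131 | Hodgkin lymphoma | X | | | | |
| 12 | rs1801131 | Thyroid cancer | X | | | Yang et al[48] | No |
| 13 | rs1801131 | Gallbladder cancer | X | | | De Maturana et al[45] | Yes |
| 14 | rs1801133 | Skin cancer | X | | | Xie et al[49] | No |
| 15 | rs1801133 | Osteosarcoma | X | | | | |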

Abbreviations: CN, common neighbors; SNP, single nucleotide polymorphism.

Table 7.

Validation of the prediction results for new SNP-Cancer relationships in JC scoring method.

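| Row | SNP | Cancer | SNPedia | PubMed | Google Scholar | References | Association |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | rs25489 | Cholangiocarcinoma | X | | | | |
| 2 | rs20417 | Cholangiocarcinoma | X | | | | |
| 3 | rs13181 | Laryngeal cancer | X | | | Sun et al[50] | No |
| 4 | rs17851045 | Thymoma | X | | | | |
| 5 | rs587781525 | Thymoma | X | | | | |
| 6 | rs1057519984 | Thymoma | X | | | | |
| 7 | rs764146326 | Thymoma | X | | | | |
| 8 | rs1057520000 | Thymoma | X | | | | |
| 9 | rs28934874 | Thymoma | X | | | | |
| 10 | rs104894228 | Thymoma | X | | | | |
| 11 | rs1801131 | Retinoblastoma | X | | | Soleimani et al[47] | No |
| 12 | rs1799793 | Laryngeal cancer | X | | | Lu et al[51] | No |
| 13 | rs2736100 | Laryngeal cancer | X | | | | |
| 14 | rs1801133 | Gallbladder cancer | X | | | De Maturana et al[45] | No |
| 15 | rs1801133 | Hodgkin lymphoma | X | | | Sud et al[43] | Yes |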

Abbreviations: JC, Jaccard; SNP, single nucleotide polymorphism.

However, 2 pairs, rs1801133-Non-small cell lung cancer and rs1801131-Non-small cell lung cancer, are common to the 3 methods AA, PA, and CN, all of which predict them, although with different scores and positions in their ranked lists. The SNPs rs1801133 and rs1801131 are also widely studied and have links to several cancers, while the most frequent cancer in the predictions is non-small cell lung cancer. The AA and CN results are almost identical except in the last position, row 15. There are confirming publications for many of the predicted SNP-Cancer relationships. Column 7 of Tables 5 to 8 briefly cites the latest published paper for each relationship to show the recent findings; that paper also integrates the previous studies and the final finding on the type of association (Yes or No) between SNP and cancer. Several factors, such as the sparsity or completeness of the network, can affect the evaluation results. For example, the network density, the number of existing edges divided by the total number of possible edges, is 0.032 here, so the network is relatively sparse. The denser the network, the better link prediction algorithms perform. The completeness of the investigated dataset also affects the accuracy of the calculations and results. SNPedia is not fully up-to-date, as our computations show and as discussed in the next section. Furthermore, AUC is not the only possible evaluation criterion. It can be complemented by checking the literature for the predicted results, which is known as domain knowledge evaluation. Another notable point is that several predicted relationships have been studied in the literature, but the studies report no association between SNP and cancer.
This is also significant: it shows that these pairs have attracted researchers’ attention, which supports the relevance of our computational predictions as a guide for further in vitro investigations.
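The density figure quoted in this section can be checked with a short calculation. The sketch below assumes that the total number of possible edges in the bipartite network is the number of cancers multiplied by the number of SNPs, using the counts reported in the network construction step; the variable names are illustrative.

```python
# Counts reported for the constructed SNP-Cancer network.
n_edges, n_cancers, n_snps = 7599, 50, 4723

# Assumption: in a bipartite graph, every possible edge joins one
# cancer vertex to one SNP vertex.
possible_edges = n_cancers * n_snps
density = n_edges / possible_edges
```

Under this assumption the density rounds to 0.032, which matches the reported value.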

Conclusions

Based on the promising results of the PA scoring method for predicting new links between SNPs and cancers, we suggest examining and verifying the relationships below in vitro because, to the best of our knowledge, they have not yet been reported in scientific publications. If one or more of these links are verified, more of the PA predictions can be considered to find new SNP-Cancer associations:
- rs1801394-Non-small cell lung cancer
- rs4880-Non-small cell lung cancer
- rs1805794-Colorectal cancer
The many predicted SNP-cancer links that are not reported on the SNPedia reference website indicate that this database is incomplete and can be completed using literature reviews, in vitro tests, or other methods, which can also be used to validate the results of the link prediction method. This is also true of many other biological networks, which can be enriched with the help of link prediction algorithms, and whose hidden or incomplete relations can even be discovered this way. The precision of link prediction also depends on how the network is created, on the network properties, and on the preprocessing of the data used to construct it: the more reliable and accurate that work is, the better the results will be. Only a small number of the basic existing link prediction algorithms were used in this research, all of them unsupervised node neighborhood-based algorithms. Other methods, such as path-based or supervised machine learning–based methods, can also be used to increase the accuracy of the results. In particular, machine learning–based methods can take into account different related features of the network and not just its topology.[18] Link prediction is not used only to predict new or missed relations; newer versions can be used to remove noise or misconnections. This version is known as Negative Link Prediction (NLP)[17] and can be used to identify and eliminate weak associations between SNPs and cancers.
The effectiveness of such a method in noise elimination has been demonstrated on experimental data extracted from high-throughput methods for protein networks.[52] Finally, link prediction can also be used to predict links between SNPs and non-cancerous diseases.
Link Prediction Algorithm
Input: matrix N with n × n dimensions, representing the network under investigation
Output: the best link N(i, j) to be established, with its score
imax = 1, jmax = 2
Max = 0
for i = 1 to n
  for j = i + 1 to n
    if N[i, j] = 0
      Rank = score(i, j)
      if Rank > Max
        Max = Rank
        imax = i, jmax = j
return imax, jmax, Max
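
The exhaustive search in the algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0-based indexing, the optional `is_candidate` filter, and the toy adjacency matrix are all hypothetical. The default score is preferential attachment in its usual form, the product of the degrees of the two nodes.

```python
from typing import Callable, List, Optional, Tuple

Matrix = List[List[int]]


def preferential_attachment(adj: Matrix, i: int, j: int) -> float:
    """PA score of a node pair: product of the two node degrees."""
    return float(sum(adj[i]) * sum(adj[j]))


def best_missing_link(
    adj: Matrix,
    score: Callable[[Matrix, int, int], float] = preferential_attachment,
    is_candidate: Callable[[int, int], bool] = lambda i, j: True,
) -> Optional[Tuple[int, int, float]]:
    """Return (i, j, score) for the highest-scoring absent link, or None.

    Only pairs with i < j are visited, so the matrix is assumed symmetric.
    `is_candidate` (an assumption of this sketch) can restrict the search,
    for example to SNP-cancer pairs in a bipartite network.
    """
    n = len(adj)
    best: Optional[Tuple[int, int, float]] = None
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] != 0 or not is_candidate(i, j):
                continue  # link already exists or pair is not of interest
            rank = score(adj, i, j)
            if best is None or rank > best[2]:
                best = (i, j, rank)  # remember the best pair seen so far
    return best


# Hypothetical toy network: nodes 0-2 stand in for SNPs, nodes 3-5 for cancers.
toy = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]

# Restrict candidates to SNP-cancer pairs (i is an SNP, j is a cancer).
result = best_missing_link(toy, is_candidate=lambda i, j: i < 3 <= j)
print(result)  # (0, 3, 4.0)
```

In the toy network, node 0 and node 3 each have degree 2 and are not yet connected, so their PA score of 4 is the highest among the absent SNP-cancer pairs. Ties are resolved in favor of the first pair encountered, which matches the strict `Rank > Max` comparison in the pseudocode.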