Shahab Bakhtiari, Sadegh Sulaimany, Mehrdad Talebi, Kabmiz Kalhor.
Abstract
Genetic variations such as single nucleotide polymorphisms (SNPs) can cause susceptibility to cancer. Although thousands of genetic variants have been associated with different cancers, the molecular mechanisms of cancer remain unknown. No dedicated dataset represents the relationships between cancers and SNPs as a bipartite network for computational analysis and prediction. Link prediction, a computational graph analysis method, can provide new insight into such a network. In this article, after building a network between cancers and SNPs from the SNPedia and Cancer Research UK databases, we evaluated computational link prediction methods for predicting new SNP-Cancer relationships. The results show that, among the popular scoring methods based on network topology, the preferential attachment (PA) algorithm is the most robust for relation prediction according to computational and experimental evidence, and some of its predictions are corroborated by recent publications. According to the PA predictions, rs1801394-Non-small cell lung cancer, rs4880-Non-small cell lung cancer, and rs1805794-Colorectal cancer are among the most probable SNP-Cancer associations not yet mentioned in any published article, which makes them the leading candidates for additional laboratory and validation studies. The prediction algorithms could also be improved to produce new predictions in the future.
Keywords: Cancer; SNP; bipartite network; link prediction
Year: 2020; PMID: 32728337; PMCID: PMC7364831; DOI: 10.1177/1176935120942216
Source DB: PubMed; Journal: Cancer Inform; ISSN: 1176-9351
Link prediction score functions for topology-based node neighborhood metrics.
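| Score metric | Formula |
|---|---|
| CN | $\lvert \Gamma(x) \cap \Gamma(y) \rvert$ |
| JC | $\dfrac{\lvert \Gamma(x) \cap \Gamma(y) \rvert}{\lvert \Gamma(x) \cup \Gamma(y) \rvert}$ |
| PA | $\lvert \Gamma(x) \rvert \cdot \lvert \Gamma(y) \rvert$ |
| AA | $\sum_{z \in \Gamma(x) \cap \Gamma(y)} \dfrac{1}{\log \lvert \Gamma(z) \rvert}$ |
Abbreviations: AA, Adamic and Adar; CN, common neighbors; JC, Jaccard; PA, preferential attachment. Notation: $\Gamma(x)$ is the set of neighbors of node $x$.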
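The four neighborhood scores named in the table above can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of CN, JC, PA, and AA; the function name `neighborhood_scores` and the toy graph are hypothetical and are not taken from the article.

```python
import math


def neighborhood_scores(adj, x, y):
    """Return the CN, JC, PA and AA scores for the node pair (x, y).

    adj maps each node to the set of its neighbors (undirected graph).
    """
    gx, gy = adj[x], adj[y]
    common = gx & gy
    union = gx | gy
    return {
        "CN": len(common),
        "JC": len(common) / len(union) if union else 0.0,
        "PA": len(gx) * len(gy),
        # A common neighbor touches both x and y, so its degree is at
        # least 2 and log(degree) is strictly positive.
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common),
    }


# Hypothetical toy graph with edges a-c, a-d, b-c, b-d, b-e, d-e.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
scores = neighborhood_scores(adj, "a", "b")
print(scores)
```

Note that PA depends only on the two degrees, whereas the other three scores depend on the shared neighborhood.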
Figure 1. SNP-Cancer network construction steps. SNP indicates single nucleotide polymorphism.
Figure 2. Visualization of the SNP-Cancer bipartite graph. Red circles are cancers surrounded by SNPs. SNP indicates single nucleotide polymorphism.
Figure 3. An alternative representation of the SNP-Cancer network. A sub-network with 33 SNPs, 31 cancers, and 222 relationships is presented in bipartite form to give a clearer view of the real network. SNP indicates single nucleotide polymorphism.
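As a rough illustration of how such a bipartite network might be held in memory, the following Python sketch builds an adjacency map from a list of (SNP, cancer) pairs. The helper name `build_bipartite` and the placeholder identifiers are assumptions made for this example; they are not real database records and do not reproduce the construction steps of Figure 1.

```python
from collections import defaultdict


def build_bipartite(pairs):
    """Build the two node sets and an adjacency map from (SNP, cancer) pairs."""
    snps, cancers = set(), set()
    adj = defaultdict(set)
    for snp, cancer in pairs:
        snps.add(snp)
        cancers.add(cancer)
        adj[snp].add(cancer)   # every edge joins one SNP to one cancer
        adj[cancer].add(snp)
    return snps, cancers, dict(adj)


# Placeholder pairs (hypothetical); the duplicate pair is stored only once.
pairs = [
    ("snp_a", "cancer_x"),
    ("snp_a", "cancer_y"),
    ("snp_b", "cancer_x"),
    ("snp_a", "cancer_x"),
]
snps, cancers, adj = build_bipartite(pairs)
n_edges = sum(len(adj[s]) for s in snps)
print(len(snps), len(cancers), n_edges)
```

Counting edges from the SNP side only avoids counting each relationship twice.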
Link prediction score function for topology-based node neighborhood metrics in bipartite graphs.
| Score metric | Score formula |
|---|---|
| CN | |
| JC | |
| PA | |
| AA | |
Abbreviations: AA, Adamic and Adar; CN, common neighbors; JC, Jaccard; PA, preferential attachment.
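In a bipartite graph a SNP and a cancer can never share a neighbor, because the neighbors of a SNP are cancers and the neighbors of a cancer are SNPs. One common adaptation, assumed here and not confirmed by the table above, compares the neighbors of the SNP with the neighbors-of-neighbors of the cancer. The Python sketch below follows that assumption; `two_hop`, `bipartite_scores`, and the toy edges are hypothetical names and data.

```python
import math
from collections import defaultdict


def build_adj(edges):
    """Adjacency map for an undirected graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def two_hop(adj, node):
    """Union of the neighbor sets of all neighbors of node."""
    reached = set()
    for z in adj[node]:
        reached |= adj[z]
    return reached


def bipartite_scores(adj, snp, cancer):
    """Scores for a (SNP, cancer) pair under the two-hop assumption."""
    g_snp = adj[snp]               # cancers linked to the SNP
    g_hat = two_hop(adj, cancer)   # cancers sharing a SNP with `cancer`
    common = g_snp & g_hat
    union = g_snp | g_hat
    return {
        "CN": len(common),
        "JC": len(common) / len(union) if union else 0.0,
        "PA": len(adj[snp]) * len(adj[cancer]),  # needs no adaptation
        # Skip degree-1 nodes so that log(degree) is never zero.
        "AA": sum(1.0 / math.log(len(adj[z]))
                  for z in common if len(adj[z]) > 1),
    }


# Hypothetical toy network: SNPs s1..s3, cancers c1..c3.
edges = [("s1", "c1"), ("s1", "c2"), ("s2", "c1"),
         ("s2", "c3"), ("s3", "c2"), ("s3", "c3")]
adj = build_adj(edges)
scores = bipartite_scores(adj, "s1", "c3")
print(scores)
```

PA is unchanged from the non-bipartite case because it uses only the two node degrees.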
AUC of different node neighborhood similarity–based link prediction scores over bipartite SNP-Cancer network.
| Algorithm | AUC |
|---|---|
| CN | 0.90 |
| JC | 0.84 |
| PA | 0.95 |
| AA | 0.89 |
Abbreviations: AA, Adamic and Adar; AUC, area under the receiver operating characteristic curve; CN, common neighbors; JC, Jaccard; PA, preferential attachment; SNP, single nucleotide polymorphism.
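For link prediction, AUC is commonly read as the probability that a held-out true link receives a higher score than a non-existent link, with ties counted as one half. The Python sketch below computes it by exhaustive comparison. The evaluation protocol behind the values in the table is not stated here, so the function `link_prediction_auc` and the example scores are illustrative assumptions only.

```python
def link_prediction_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) score pairs ranked
    correctly, counting each tie as half a correct ranking."""
    higher = ties = 0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                higher += 1
            elif p == q:
                ties += 1
    total = len(pos_scores) * len(neg_scores)
    return (higher + 0.5 * ties) / total


# Hypothetical scores: held-out true links vs. non-existent links.
pos = [3, 2, 2]
neg = [1, 2, 0, 0]
auc = link_prediction_auc(pos, neg)
print(round(auc, 4))
```

The exhaustive comparison is quadratic in the number of scores; on large networks the same quantity is usually estimated from randomly sampled pairs.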
Top 15 SNP-Cancer relationships predicted by the PA, CN, JC, and AA link prediction scoring approaches.
| PA: SNP | PA: Cancer | CN: SNP | CN: Cancer | JC: SNP | JC: Cancer | AA: SNP | AA: Cancer |
|---|---|---|---|---|---|---|---|
| rs1801133 | Non-small cell lung cancer | rs1801133 | Pancreatic cancer | rs25489 | Cholangiocarcinoma | rs1801133 | Pancreatic cancer |
| rs1801131 | Non-small cell lung cancer | rs1801133 | Non-small cell lung cancer | rs20417 | Cholangiocarcinoma | rs1801133 | Non-small cell lung cancer |
| rs1048943 | Stomach cancer | rs1801133 | Gallbladder cancer | rs13181 | Laryngeal cancer | rs1801133 | Gallbladder cancer |
| rs1048943 | Prostate cancer | rs1801133 | Hodgkin lymphoma | rs17851045 | Thymoma | rs1801133 | Hodgkin lymphoma |
| rs1799793 | Ovarian cancer | rs1801133 | Thyroid cancer | rs587781525 | Thymoma | rs1801133 | Thyroid cancer |
| rs1805794 | Stomach cancer | rs1801131 | Pancreatic cancer | rs1057519984 | Thymoma | rs1801131 | Pancreatic cancer |
| rs4646903 | Prostate cancer | rs1801131 | Non-small cell lung cancer | rs764146326 | Thymoma | rs1801131 | Non-small cell lung cancer |
| rs1801394 | Non-small cell lung cancer | rs1801131 | Bladder cancer | rs1057520000 | Thymoma | rs1801131 | Bladder cancer |
| rs4880 | Non-small cell lung cancer | rs1801131 | Myeloma | rs28934874 | Thymoma | rs1801131 | Myeloma |
| rs1800566 | Stomach cancer | rs1801131 | Retinoblastoma | rs104894228 | Thymoma | rs1801131 | Retinoblastoma |
| rs3212227 | Stomach cancer | rs1801131 | Hodgkin lymphoma | rs1801131 | Retinoblastoma | rs1801131 | Hodgkin lymphoma |
| rs1805794 | Colorectal cancer | rs1801131 | Thyroid cancer | rs1799793 | Laryngeal cancer | rs1801131 | Thyroid cancer |
| rs2736100 | Breast cancer | rs1801131 | Gallbladder cancer | rs2736100 | Laryngeal cancer | rs1801131 | Gallbladder cancer |
| rs1801133 | Pancreatic cancer | rs1801133 | Skin cancer | rs1801133 | Gallbladder cancer | rs1801133 | Skin cancer |
| rs699947 | Stomach cancer | rs1801133 | Osteosarcoma | rs1801133 | Hodgkin lymphoma | rs13181 | Hodgkin lymphoma |
Abbreviations: AA, Adamic and Adar; CN, common neighbors; JC, Jaccard; PA, preferential attachment; SNP, single nucleotide polymorphism.
Validation of the prediction results for new SNP-Cancer relationships from the PA scoring method.
| Row | SNP | Cancer | SNPedia | PubMed | Google Scholar | References | Association |
|---|---|---|---|---|---|---|---|
| 1 | rs1801133 | Non-small cell lung cancer | X | ✓ | ✓ | Ding et al | Yes |
| 2 | rs1801131 | Non-small cell lung cancer | X | ✓ | ✓ | Li et al | Yes |
| 3 | rs1048943 | Stomach cancer | X | ✓ | ✓ | Hidaka et al | No |
| 4 | rs1048943 | Prostate cancer | X | ✓ | ✓ | Koda et al | Yes |
| 5 | rs1799793 | Ovarian cancer | X | ✓ | ✓ | Assis et al | No |
| 6 | rs1805794 | Stomach cancer | X | ✓ | ✓ | Zhou et al | Yes |
| 7 | rs4646903 | Prostate cancer | X | ✓ | Porchia et al | No | |
| 8 | rs1801394 | Non-small cell lung cancer | X | | | | |
| 9 | rs4880 | Non-small cell lung cancer | X | | | | |
| 10 | rs1800566 | Stomach cancer | X | ✓ | ✓ | Yadav et al | Yes |
| 11 | rs3212227 | Stomach cancer | X | ✓ | ✓ | Yin et al | Yes |
| 12 | rs1805794 | Colorectal cancer | X | | | | |
| 13 | rs2736100 | Breast cancer | X | ✓ | ✓ | Aydin et al | No |
| 14 | rs1801133 | Pancreatic cancer | X | ✓ | ✓ | Nakao et al | No |
| 15 | rs699947 | Stomach cancer | X | ✓ | Ke et al | No | |
Abbreviations: PA, preferential attachment; SNP, single nucleotide polymorphism.
Validation of the prediction results for new SNP-Cancer relationships from the AA scoring method.
| Row | SNP | Cancer | SNPedia | PubMed | Google Scholar | References | Association |
|---|---|---|---|---|---|---|---|
| 1 | rs1801133 | Pancreatic cancer | X | ✓ | ✓ | Nakao et al | No |
| 2 | rs1801133 | Non-small cell lung cancer | X | ✓ | Ding et al | Yes | |
| 3 | rs1801133 | Gallbladder cancer | X | ✓ | ✓ | Dixit et al | No |
| 4 | rs1801133 | Hodgkin lymphoma | X | ✓ | ✓ | Sud et al | Yes |
| 5 | rs1801133 | Thyroid cancer | X | ✓ | ✓ | Zara-Lopes et al | Yes |
| 6 | rs1801131 | Pancreatic cancer | X | ✓ | ✓ | Nakao et al | No |
| 7 | rs1801131 | Non-small cell lung cancer | X | ✓ | ✓ | Li et al | Yes |
| 8 | rs1801131 | Bladder cancer | X | ✓ | ✓ | De Maturana et al | No |
| 9 | rs1801131 | Myeloma | X | ✓ | ✓ | Ma et al | Yes |
| 10 | rs1801131 | Retinoblastoma | X | ✓ | ✓ | Soleimani et al | No |
| 11 | rs1801131 | Hodgkin lymphoma | X | | | | |
| 12 | rs1801131 | Thyroid cancer | X | ✓ | ✓ | Yang et al | No |
| 13 | rs1801131 | Gallbladder cancer | X | ✓ | ✓ | De Maturana et al | No |
| 14 | rs1801133 | Skin cancer | X | ✓ | ✓ | Xie et al | No |
| 15 | rs13181 | Hodgkin lymphoma | X | | | | |
Abbreviations: AA, Adamic and Adar; SNP, single nucleotide polymorphism.
Validation of the prediction results for new SNP-Cancer relationships from the CN scoring method.
| Row | SNP | Cancer | SNPedia | PubMed | Google Scholar | References | Association |
|---|---|---|---|---|---|---|---|
| 1 | rs1801133 | Pancreatic cancer | X | ✓ | ✓ | Nakao et al | No |
| 2 | rs1801133 | Non-small cell lung cancer | X | ✓ | Ding et al | Yes | |
| 3 | rs1801133 | Gallbladder cancer | X | ✓ | ✓ | Dixit et al | No |
| 4 | rs1801133 | Hodgkin lymphoma | X | ✓ | ✓ | Sud et al | No |
| 5 | rs1801133 | Thyroid cancer | X | ✓ | ✓ | Zara-Lopes et al | Yes |
| 6 | rs1801131 | Pancreatic cancer | X | ✓ | ✓ | Nakao et al | No |
| 7 | rs1801131 | Non-small cell lung cancer | X | ✓ | ✓ | Li et al | Yes |
| 8 | rs1801131 | Bladder cancer | X | ✓ | ✓ | De Maturana et al | No |
| 9 | rs1801131 | Myeloma | X | ✓ | ✓ | Ma et al | Yes |
| 10 | rs1801131 | Retinoblastoma | X | ✓ | ✓ | Soleimani et al | No |
| 11 | rs1801131 | Hodgkin lymphoma | X | | | | |
| 12 | rs1801131 | Thyroid cancer | X | ✓ | ✓ | Yang et al | No |
| 13 | rs1801131 | Gallbladder cancer | X | ✓ | ✓ | De Maturana et al | Yes |
| 14 | rs1801133 | Skin cancer | X | ✓ | ✓ | Xie et al | No |
| 15 | rs1801133 | Osteosarcoma | X | | | | |
Abbreviations: CN, common neighbors; SNP, single nucleotide polymorphism.
Validation of the prediction results for new SNP-Cancer relationships from the JC scoring method.
| Row | SNP | Cancer | SNPedia | PubMed | Google Scholar | References | Association |
|---|---|---|---|---|---|---|---|
| 1 | rs25489 | Cholangiocarcinoma | X | | | | |
| 2 | rs20417 | Cholangiocarcinoma | X | | | | |
| 3 | rs13181 | Laryngeal cancer | X | ✓ | ✓ | Sun et al | No |
| 4 | rs17851045 | Thymoma | X | | | | |
| 5 | rs587781525 | Thymoma | X | | | | |
| 6 | rs1057519984 | Thymoma | X | | | | |
| 7 | rs764146326 | Thymoma | X | | | | |
| 8 | rs1057520000 | Thymoma | X | | | | |
| 9 | rs28934874 | Thymoma | X | | | | |
| 10 | rs104894228 | Thymoma | X | | | | |
| 11 | rs1801131 | Retinoblastoma | X | ✓ | ✓ | Soleimani et al | No |
| 12 | rs1799793 | Laryngeal cancer | X | ✓ | ✓ | Lu et al | No |
| 13 | rs2736100 | Laryngeal cancer | X | | | | |
| 14 | rs1801133 | Gallbladder cancer | X | ✓ | ✓ | De Maturana et al | No |
| 15 | rs1801133 | Hodgkin lymphoma | X | ✓ | ✓ | Sud et al | Yes |
Abbreviations: JC, Jaccard; SNP, single nucleotide polymorphism.
Search for the highest-scoring unlinked node pair (N[i, j] = 0 marks a pair with no existing link):

    imax = 1, jmax = 2
    Max = 0
    for i = 1 to n
        for j = i + 1 to n
            if N[i, j] = 0
                Rank = score(i, j)
                if Rank > Max
                    Max = Rank
                    imax = i, jmax = j
    return imax, jmax, Max
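The search loop above can be sketched in Python as follows. This is a minimal illustration under assumptions: `N` is taken to be a symmetric 0/1 adjacency matrix, indices are 0-based, the score shown is preferential attachment computed from row sums, and the function returns the best pair and its score rather than the loop counters. The names `best_unlinked_pair` and `pa_score` are hypothetical.

```python
def best_unlinked_pair(N, score):
    """Return ((i, j), best_score) for the highest-scoring pair with no link.

    Returns (None, 0) when no unlinked pair scores above zero.
    """
    n = len(N)
    best_pair, best_score = None, 0
    for i in range(n):
        for j in range(i + 1, n):
            if N[i][j] == 0:             # consider only missing links
                rank = score(i, j)
                if rank > best_score:    # strict: the first maximum wins
                    best_pair, best_score = (i, j), rank
    return best_pair, best_score


# Hypothetical 5-node graph with edges 0-1, 0-2, 0-3, 1-2, 3-4.
N = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
degree = [sum(row) for row in N]


def pa_score(i, j):
    """Preferential attachment: product of the two node degrees."""
    return degree[i] * degree[j]


pair, value = best_unlinked_pair(N, pa_score)
print(pair, value)
```

In a bipartite SNP-Cancer setting the candidate pairs would presumably be restricted to one SNP and one cancer; that filter is omitted here to stay close to the loop as written.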