Muhammad Aizaz Akmal, Muhammad Awais Hassan, Shoaib Muhammad, Khaldoon S Khurshid, Abdullah Mohamed.
Abstract
N-linked glycosylation is the most common type of glycosylation. It plays a significant role in identifying diseases such as type I diabetes and cancer, and it helps in drug development. Most proteins cannot perform their biological and physiological functions without undergoing this modification. Because experimental identification has limitations, it is essential to identify such sites with computational techniques. This study analyzes and synthesizes the progress made in discovering N-linked sites using machine learning methods. It also explores the performance of currently available tools for predicting such sites. Almost seventy research articles published in recognized journals of the N-linked glycosylation field were shortlisted after a rigorous filtering process. The findings of the studies are reported along multiple aspects: publication channel, feature set construction method, training algorithm, and performance evaluation. Moreover, the literature survey yields a taxonomy of N-linked sequence identification. Our study focuses on the performance evaluation criteria; the importance of N-linked glycosylation motivates us to survey resources that use computational methods instead of experimental methods, given the limitations of the latter.
Keywords: Artificial intelligence; Deep learning; Glycosylation; Machine learning; N-linked; Performance evaluation criteria
Year: 2022 PMID: 36262138 PMCID: PMC9575850 DOI: 10.7717/peerj-cs.1069
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Proposed survey comparison with existing studies.
| Article Ref. No. | Focus | Year | Survey approach | Quality assessment | N-linked model (Tool) | Feature construction | Training algorithm | Organism type | Performance metric (ACC, SN, SP) | Target repository |
|---|---|---|---|---|---|---|---|---|---|---|
| | Glycosylation site prediction tools using AI. | 2021 | Informal | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ |
| | Experimental and computational methods for PTM site prediction. | 2021 | Informal | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| | PTM site prediction models developed using Chou’s 5-step model. | 2019 | Informal | ✗ | ✗ (other PTM) | ✓ | ✓ | ✗ | ✗ | ✗ |
| | Research progress in PTM site prediction. | 2019 | Informal | ✗ | ✗ (glyco type not specified) | ✓ | ✓ | ✗ | ✗ | ✗ |
| | Tools used for PTM. | 2017 | Informal | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ |
| This survey | N-linked site prediction tools, including training algorithms and feature approaches, which help to construct efficient models for other PTMs. | 2021 | Systematic Review | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 5 |
Figure 1. Research strategy.
Figure 2. Research strategy.
Research questions and objective.
| RQ | Research question | Research objective/motivation |
|---|---|---|
| RQ1 | Which are the relevant publishing channels for N-linked glycosylation research? Which channel types and geographical areas target this research? | To identify |
| RQ2 | Which existing prediction models (tools) are used to identify N-linked glycosylation sites, and for which species are these sites identified? | To help researchers identify diseases |
| RQ3 | Which algorithms or methods are used to construct the N-linked feature vector? | To understand the in-depth structure of protein sequences and extract useful information for training a model. |
| RQ4 | Which algorithms or methods are used to train N-linked models? | To develop an efficient tool that predicts N-linked sites through a computational approach. |
| RQ5 | How effective are the existing models at predicting N-linked sites? | By evaluating the |
Figure 3. Keywords used to develop the query string.
Search group used for search query.
| Digital library | Search query | Applied filter |
|---|---|---|
| IEEE Xplore | (“n linked” OR “Post translation modification”) AND (“prediction model” OR “Artificial Intelligence” OR “Neural Network” OR “Deep Learning”) | 2017–2021 |
| Springer link | (“n linked” OR “Post translation modification”) AND (“Glycosylation sites” OR “Glycan”) AND (“prediction model” OR “Artificial Intelligence” OR “Neural Network” OR “Deep Learning”) | 2017–2021 |
| Bioinformatics | (n linked OR Post translation modification) AND (Glycosylation sites OR Glycan) AND (prediction model OR Artificial Intelligence OR Neural Network OR Deep Learning) | 2017–2021 |
| PLOS ONE | (“n linked”) AND (“Glycosylation”) AND (“Neural Network” OR “Deep Learning”) | 2017–2021 |
| Google scholar | (“n linked” OR “Post translation modification”) AND (“Glycosylation sites” OR “Glycan”) AND (“prediction model” OR “Artificial Intelligence” OR “Neural Network” OR “Deep Learning”) | 2017–2021 |
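
The search strings in the table follow one pattern: terms within a keyword group are joined with OR, and the groups are joined with AND. A minimal sketch of that composition is given here; the function name `build_query` and the `quote` flag are illustrative assumptions, not part of the original study.

```python
def build_query(groups, quote=True):
    """Join terms with OR inside each group and join the groups with AND.

    `groups` is a list of keyword lists. `quote` wraps each term in
    straight double quotes for exact-phrase search; the Bioinformatics
    row of the table uses unquoted terms.
    """
    clauses = []
    for terms in groups:
        rendered = [f'"{term}"' if quote else term for term in terms]
        clauses.append("(" + " OR ".join(rendered) + ")")
    return " AND ".join(clauses)


# Keyword groups copied from the Google scholar row of the table.
groups = [
    ["n linked", "Post translation modification"],
    ["Glycosylation sites", "Glycan"],
    ["prediction model", "Artificial Intelligence", "Neural Network", "Deep Learning"],
]

quoted_query = build_query(groups)
unquoted_query = build_query(groups, quote=False)
print(quoted_query)
```

Dropping the middle group reproduces the shorter IEEE Xplore string, so one helper can cover every library in the table.
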
Possible rating for recognized and stable publication score.
| Publication source | +4 | +3 | +2 | +1 | 0 |
|---|---|---|---|---|---|
| Journals | Q1 | Q2 | Q3 | Q4 | No JCR ranking |
| Conference | CORE A* | CORE A | CORE B | CORE C | Not in CORE ranking |
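
The rating table can be read as a simple lookup from ranking to score. The sketch below encodes it that way; the function name `publication_score` and the string labels for source types are assumptions made for illustration.

```python
# Scores copied from the rating table; any rank not listed scores 0.
JOURNAL_SCORES = {"Q1": 4, "Q2": 3, "Q3": 2, "Q4": 1}
CONFERENCE_SCORES = {"A*": 4, "A": 3, "B": 2, "C": 1}


def publication_score(source_type, rank=None):
    """Return the publication score for a JCR quartile or a CORE rank.

    `source_type` is "journal" or "conference" (case-insensitive).
    A missing or unknown rank corresponds to "No JCR ranking" or
    "Not in CORE ranking" and scores 0.
    """
    tables = {"journal": JOURNAL_SCORES, "conference": CONFERENCE_SCORES}
    return tables[source_type.lower()].get(rank, 0)


print(publication_score("Journal", "Q1"))
```
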
Selection phase and results.
| Phase | Selection | Selection criteria | PLOS ONE | Bioinformatics | Springer link | IEEE Xplore | Google scholar | Total articles |
|---|---|---|---|---|---|---|---|---|
| 1 | Search | Keyword ( | 21 | 3 | 47 | 4 | 770 | 845 |
| 2 | Filtering | Title | 15 | 3 | 18 | 3 | 212 | 251 |
| 3 | Filtering | Abstract | 10 | 3 | 13 | 3 | 160 | 189 |
| 4 | Filtering | Introduction and conclusion | 6 | 2 | 7 | 3 | 125 | 143 |
| 5 | Inspection | Full article | 1 | 1 | 2 | 2 | 62 | 68 |
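
The selection phases form a filtering funnel, so the reported totals can be cross-checked against the per-library counts and expressed as a share of the initial search results. The following sketch does both with the numbers from the table; the function names are illustrative.

```python
# Per-library counts copied from the selection table, in the order
# PLOS ONE, Bioinformatics, Springer link, IEEE Xplore, Google scholar,
# followed by the total reported for that phase.
phases = {
    "Keyword search": ((21, 3, 47, 4, 770), 845),
    "Title": ((15, 3, 18, 3, 212), 251),
    "Abstract": ((10, 3, 13, 3, 160), 189),
    "Introduction and conclusion": ((6, 2, 7, 3, 125), 143),
    "Full article": ((1, 1, 2, 2, 62), 68),
}


def check_totals(phases):
    """Map each phase to True when its library counts sum to its total."""
    return {name: sum(counts) == total for name, (counts, total) in phases.items()}


def retention(phases):
    """Map each phase to the percentage of initial hits still retained."""
    initial = next(iter(phases.values()))[1]
    return {name: 100 * total / initial for name, (_, total) in phases.items()}


for name, share in retention(phases).items():
    print(f"{name}: {share:.1f}% of initial search results")
```
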
Classification criteria
| Sr. No. | Ref. No. | P.Year | P.Channel | Research type | Empirical type | Species | PTM type | Feature set method | Model training algorithm | Model | (a) | (b) | (c) | (d) | (e) | (f) | SCORE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | 2017 | Journal | Solution | Computational | Human | N-linked | Position relative and Statistical Moments | ANN/Back propagation | - | 0 | 2 | 1 | 1 | 1 | 4 | 9 |
| 2 | | 2020 | Journal | Solution | Computational | Human and Mouse | N-linked | Sequence, Structure and Function feature | XGBoost | N-GlycoGo | 1 | 2 | 1 | 0 | 1 | 4 | 9 |
| 3 | | 2019 | Journal | Solution | Computational | Human and Mouse | N-linked and O-linked | Sequence and Structure | Deep ANN and SVM | Sprint-Gly | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 4 | | 2021 | Journal | Solution | Computational | Human and Mouse | N-linked | Word embedding Vector Technique | RF, KNN, SVM and XGBoost | - | 0 | 2 | 1 | 1 | 0 | 4 | 8 |
| 5 | | 2019 | Journal | Solution | Computational | Human | N-linked | Sequence | ANN | NetGlyco (Existing) | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 6 | | 2019 | Journal | Solution | Computational | Human | N-linked (and C/O-linked) | Sequence and Structure Feature | PA2DE using AlphaMax | GlycoMine_PU | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 7 | | 2021 | Journal | Solution | Hybrid | Eukaryote | Glycosylation | Sequence feature | Recurrent NN (LSTM) | SweetOrigin | 0 | 2 | 1 | 1 | 1 | 4 | 9 |
| 8 | | 2021 | Journal | Solution | Computational | Animal | N-linked and O-linked | - | - | GlycoWork | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 9 | | 2021 | bioRxiv | Solution | Computational | Not mentioned | Glycosylation | Fingerprint Encoding | MNN (ADAM) | GlyNet | 1 | 2 | 1 | 0 | 1 | 0 | 5 |
| 10 | | 2019 | Journal | Solution | Computational | Human | N-linked | Similarity voting and Gap Peptide | SVM | NGlyDE | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 11 | | 2021 | bioRxiv | Solution | Computational | Human | Glycosylation | Protein-Glycan Sequence Feature | Graph CNN | LectinOracle | 0 | 2 | 1 | 1 | 1 | 0 | 5 |
| 12 | | 2021 | Journal | Solution | Computational | Human | Glycosylation | Graph and Statistical feature | Graph NN | SweetNet | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 13 | | 2020 | Journal | Solution | Hybrid | Human | N-linked | - | ANN/Kinetic Model | - | 0 | 0 | 1 | 1 | 1 | 4 | 7 |
| 14 | | 2021 | Journal | Solution | Experimental | Mammalian | Glycosylation | - | MS | - | 0 | 0 | 0 | 0 | 1 | 4 | 5 |
| 15 | | 2021 | Journal | Review | Computational | Human | Glycosylation | Computational | AI | - | 1 | 2 | 1 | 1 | 4 | 9 | |
| 16 | | 2021 | Journal | Solution | Computational | Not mentioned | N-linked | KDE | Glycan Tree Modeler | Rosetta Carbohydrate Framework | 1 | 2 | 1 | 1 | 0 | 0 | 5 |
| 17 | | 2019 | Journal | Solution | Experimental | Human | N-linked | Flux Balance Analysis | Kinetic | - | 0 | 1 | 0 | 1 | 1 | 2 | 5 |
| 18 | | 2021 | Journal | Solution | Experimental | Human | N-linked | MS | - | - | 0 | 1 | 0 | 0 | 1 | 4 | 6 |
| 19 | | 2019 | Journal | Solution | Computational | Human | N-linked and O-linked | Sequence and Structure | Clustering | Glycan Reader and Modeler | 1 | 2 | 1 | 0 | 0 | 4 | 8 |
| 20 | | 2021 | Journal | Solution | Experimental | Human | N-linked | - | - | - | 0 | 0 | 0 | 0 | 1 | 4 | 5 |
| 21 | | 2021 | Journal | Solution | Computational | Human | Glycosylation (O-linked) | Feature set selected using SVM, then mRMR | SVM, RF and NB | VPTMdb | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 22 | | 2021 | Journal | Solution | Hybrid | Human | N-linked | Stoichiometric | ANN | - | 0 | 1 | 1 | 1 | 1 | 4 | 8 |
| 23 | | 2017 | Journal | Solution | Experimental | Mammalian | N-linked | - | - | - | 0 | 1 | 0 | 0 | 1 | 4 | 6 |
| 24 | | 2021 | Journal | Solution | Computational | Not mentioned | PTM (Amidation) | PseAAC | CNN | IAmideV-deep | 1 | 2 | 1 | 1 | 0 | 2 | 7 |
| 25 | | 2020 | Journal | Solution | Hybrid | Human | N-linked | IQ-GPA human plasma protein | DNN | - | 0 | 1 | 2 | 0 | 4 | 7 | |
| 26 | | 2020 | Journal | Solution | Computational | Human and Avian | Glycosylation | Frequent Subtree mining and mRMR | Regression Classifier | CCARL | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 27 | | 2018 | Journal | Solution | Computational | Human | PTM (including N-linked) | Statistical Moment and F score | RBF Network | PTM Transporter | 1 | 2 | 1 | 1 | 1 | 2 | 8 |
| 28 | | 2019 | Journal | Review | Computational | Not mentioned | N-linked | Provided | Provided | Provided | 1 | 2 | 1 | 0 | 0 | 4 | 8 |
| 29 | | 2017 | Journal | Review | Computational | Not mentioned | N-linked | - | Provided | Provided | 1 | 0 | 1 | 1 | 0 | 4 | 7 |
| 30 | | 2019 | Journal | Review | Experimental | Human | N-linked and O-linked | - | - | - | 0 | 0 | 1 | 4 | 5 | | |
| 31 | | 2020 | Journal | Solution | Experimental | Human | Glycan (including N) | - | - | - | 1 | 1 | 4 | 6 | | | |
| 32 | | 2021 | Journal | Solution | Computational | Not mentioned | Glycosylation (O-linked) | Sequence feature | RF | OGP-Based | 1 | 2 | 1 | 1 | 0 | 4 | 9 |
| 33 | | 2021 | Journal | Review | Computational | Not mentioned | Glycosylation | - | Provided | Provided | 1 | 2 | 1 | 0 | 0 | 4 | 8 |
| 34 | | 2020 | Journal | Solution | Hybrid | Human | N-linked | MS | - | Existing Tool | 1 | 1 | 0 | 1 | 1 | 4 | 8 |
| 35 | | 2021 | Journal | Solution | Experimental | Human | Glycosylation (including N) | - | - | - | 0 | 1 | 0 | 1 | 1 | 2 | 5 |
| 36 | | 2021 | Journal | Solution | Computational | Mammalian | N-linked | Unknown Parameter and Structure | Bayesian Network | - | 0 | 1 | 1 | 0 | 1 | 4 | 7 |
| 37 | | 2019 | Conference | Solution | Computational | Human | Protein Prediction | Frequency Feature of AA and EH Method | SVM and NN | PPSNN | 1 | 2 | 1 | 0 | 1 | 0 | 5 |
| 38 | | 2020 | Journal | Solution | Experimental | Human | N-linked | - | MS | - | 0 | 0 | 1 | 0 | 1 | 4 | 6 |
| 39 | | 2017 | Journal | Solution | Hybrid | Human | N-linked | CfsSubSetEval | SVM | - | 0 | 1 | 2 | 0 | 1 | 4 | 8 |
| 40 | | 2018 | Journal | Solution | Experimental | Human | N-linked | - | MS | - | 0 | 0 | 1 | 1 | 1 | 2 | 5 |
| 41 | | 2018 | Journal | Solution | Hybrid | Human | N-linked | Structural Feature | Maturation | - | 0 | 1 | 1 | 1 | 1 | 4 | 8 |
| 42 | | 2019 | Journal | Solution | Computational | Human | Glycosylation and Phosphorylation | Membrane Buried, Confrontational and average Flexible Indices | NN+ELM+SVM | CMSENN | 1 | 2 | 1 | 0 | 1 | 3 | 8 |
| 43 | | 2019 | Journal | Solution | Experimental | Human | N-linked | - | MS | - | 0 | 0 | 0 | 1 | 1 | 4 | 6 |
| 44 | | 2018 | Journal | Solution | Computational | Not mentioned | Glycosylation (O-linked) | KPCA and FUS | Rotation Forest | OGLYCPred | 1 | 2 | 1 | 1 | 4 | 9 | |
| 45 | | 2019 | Journal | Review | Computational | Human | Non-Glycosylation | KC Chou’s 5 step | - | Provided | 1 | 1 | 0 | 1 | 1 | 1 | 5 |
| 46 | | 2020 | Journal | Solution | Hybrid | Human | N-linked | Statistical Moment | ANN | THETA Model | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 47 | | 2021 | Journal | Solution | Hybrid | Human | Protein Traffic membrane (N and O) | Topology and Putative SLiMs | CNN with Adam | PolarportPred | 1 | 1 | 1 | 0 | 1 | 4 | 8 |
| 48 | | 2020 | Conference | Solution | Hybrid | Human | PTM | Physicochemical, structural and PTM | ML | - | 0 | 2 | 1 | 1 | 1 | 0 | 5 |
| 49 | | 2019 | Journal | Solution | Computational | Human | PTM | Chou’s 5-steps | ANN | - | 0 | 2 | 1 | 1 | 1 | 2 | 7 |
| 50 | | 2017 | Journal | Solution | Computational | Mammalian | Glycosylation (O-linked) | Protein factor base Features | KNN | - | 0 | 2 | 1 | 0 | 1 | 2 | 6 |
| 51 | | 2019 | Journal | Solution | Hybrid | Not mentioned | N-linked | Sequences | RM, Super Learner and Glmnet | - | 0 | 1 | 1 | 1 | 0 | 4 | 7 |
| 52 | | 2021 | Journal | Solution | Experimental | Human | N-linked | - | RanoLC-MS | - | 1 | 1 | 0 | 1 | 1 | 4 | 8 |
| 53 | | 2017 | Journal | Review | Hybrid | Human | Glycosylation | Partially mentioned | Partially mentioned | - | 1 | 1 | 1 | 0 | 1 | 1 | 5 |
| 54 | | 2018 | Journal | Solution | Computational | Not mentioned | Glycosylation (O-linked) | FUS and KPCA | KNN, RM, SVM and NB (SVM performs best) | O-GlcNAcPRED-II | 1 | 2 | 1 | 1 | 0 | 4 | 9 |
| 55 | | 2021 | Journal | Solution | Experimental | Human | N-linked | - | MS | - | 0 | 1 | 0 | 0 | 1 | 4 | 6 |
| 56 | | 2021 | Journal | Solution | Experimental | Human | Glycosylation | - | MS | - | 0 | 0 | 0 | 0 | 1 | 4 | 5 |
| 57 | | 2021 | bioRxiv | Solution | Hybrid | Human | N-linked | Sequence | ML | - | 0 | 2 | 1 | 1 | 1 | 0 | 5 |
| 58 | | 2021 | Journal | Solution | Hybrid | Not mentioned | Glycosylation | MS | SVM | - | 0 | 1 | 1 | 1 | 0 | 4 | 7 |
| 59 | | 2021 | Journal | Solution | Computational | Human | PTM | Binary Encoding, AAC, EAAC and Dipeptide | Deep Learning | CNNrgb | 1 | 2 | 1 | 1 | 1 | 4 | 10 |
| 60 | | 2017 | Conference | Solution | Computational | Human | Glycosylation (O-linked) | Vector Word | SVM | GLycoCell | 1 | 2 | 1 | 1 | 1 | 0 | 6 |
| 61 | | 2021 | Journal | Solution | Computational | Human | PTM | Sequences | AI | - | 0 | 2 | 1 | 0 | 1 | 3 | 7 |
| 62 | | 2020 | Journal | Solution | Computational | Not mentioned | Protein | AAC, PseAAC, NC, PseKNC | AdaBoost and Random Forest | PPAI | 1 | 2 | 1 | 0 | 0 | 4 | 8 |
| 63 | | 2017 | Journal | Solution | Hybrid | Not mentioned | PTM (S-sulfenylated) | Physicochemical and Clustering Method | Ensemble Classifier | - | 0 | 1 | 2 | 1 | 0 | 1 | 5 |
| 64 | | 2021 | bioRxiv | Solution | Computational | Not mentioned | PTM (Ubiquitination) | Statistical Moment | Random Forest | UBISites-SRF | 1 | 2 | 1 | 0 | 1 | 0 | 5 |
| 65 | | 2018 | Journal | Solution | Computational | Not mentioned | PTM (Lipoylation) | Biprofile Bayes Encoding | SVM | LipoPred | 1 | 1 | 1 | 0 | 0 | 3 | 6 |
| 66 | | 2019 | Journal | Solution | Hybrid | Human | PTM | SNP | - | Awesome | 1 | 1 | 1 | 1 | 1 | 4 | 9 |
| 67 | | 2021 | Journal | Solution | Computational | Human | PTM | UbiSite-XGBoost | Extreme gradient boosting classifier | UbiSite-XGBoost | 1 | 2 | 1 | 1 | 1 | 3 | 9 |
| 68 | | 2017 | Journal | Solution | Computational | Human | N-linked | ProtDCal | Jrip Classifier | Sequon | 0 | 2 | 1 | 1 | 1 | 4 | 9 |
| 69 | | 2018 | Journal | Solution | Computational | Human | Palmitoylation | PSSM | SVM | RAREPalm | 1 | 2 | 1 | 0 | 1 | 3 | 8 |
| 70 | | 2018 | Journal | Solution | Computational | Human | PTM | Sequence, Structure | KNN | - | 0 | 2 | 1 | 1 | 1 | 4 | 9 |
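
In the complete rows of the classification table, the SCORE column equals the sum of criteria (a) to (f). This section does not define those criteria, so the sketch below only checks the arithmetic on a few rows copied from the table; the names `rows` and `find_mismatches` are illustrative.

```python
# Sr. No. -> (criteria (a)..(f), reported SCORE), copied from the table.
rows = {
    1: ([0, 2, 1, 1, 1, 4], 9),
    9: ([1, 2, 1, 0, 1, 0], 5),
    39: ([0, 1, 2, 0, 1, 4], 8),
    65: ([1, 1, 1, 0, 0, 3], 6),
}


def find_mismatches(rows):
    """Return the serial numbers whose criteria do not sum to SCORE."""
    return [sr for sr, (criteria, score) in rows.items() if sum(criteria) != score]


print(find_mismatches(rows))
```
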
Figure 4. Year-wise distribution of publications.
Figure 5. Percentage of publication channels.
Figure 6. Demographical distribution of publications.
Quality assessment score.
| Reference | QA Score | Total articles |
|---|---|---|
| | 10 | 10 |
| | 9 | 11 |
| | 8 | 14 |
| | 7 | 9 |
| | 6 | 9 |
| | 5 | 17 |
Percentage count of articles published in channel.
| Publication source | Reference | Count | %age |
|---|---|---|---|
| Amino Acids | | 1 | 1 |
| Analytical and Bioanalytical Chemistry | | 1 | 1 |
| Bioinformatics | | 5 | 7 |
| bioRxiv | | 5 | 7 |
| Biotechnology and Bioengineering | | 1 | 1 |
| BMC Bioinformatics | | 3 | 4 |
| Briefings in Bioinformatics | | 2 | 3 |
| Briefings in Functional Genomics | | 1 | 1 |
| Cell Host & Microbe | | 1 | 1 |
| Cell Reports | | 1 | 1 |
| Chemometrics and Intelligent Laboratory Systems | | 2 | 3 |
| Computational and Structural Biotechnology Journal | | 1 | 1 |
| Computational Biology and Chemistry | | 2 | 3 |
| Computers & Chemical Engineering | | 1 | 1 |
| Computers in Biology and Medicine | | 1 | 1 |
| Current Bioinformatics | | 2 | 3 |
| Current Genomics | | 1 | 1 |
| Current Opinion in Chemical Engineering | | 1 | 1 |
| Environmental Microbiology | | 1 | 1 |
| Expert Review of Proteomics | | 1 | 1 |
| Frontiers in Endocrinology | | 1 | 1 |
| Fuzzy Systems and Data Mining | | 1 | 1 |
| Genomics, Proteomics & Bioinformatics | | 2 | 3 |
| Glycobiology | | 3 | 4 |
| IEEE Access | | 1 | 1 |
| IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT) | | 1 | 1 |
| International Conference of Pioneering Computer Scientists, Engineers and Educators. Springer, Singapore | | 1 | 1 |
| Journal of Biomolecular Techniques | | 1 | 1 |
| Journal of Computational Biology | | 1 | 1 |
| Journal of Molecular Graphics and Modelling | | 1 | 1 |
| Journal of Proteomics | | 2 | 3 |
| Journal of the American Chemical Society | | 2 | 3 |
| Letters in Organic Chemistry | | 1 | 1 |
| Mathematical Biosciences | | 1 | 1 |
| Metabolic Engineering Communications | | 1 | 1 |
| Molecular & Cellular Proteomics | | 1 | 1 |
| Nature Communications | | 1 | 1 |
| Nucleic Acids Research | | 1 | 1 |
| PLoS Computational Biology | | 1 | 1 |
| PLOS ONE | | 1 | 1 |
| Processes | | 1 | 1 |
| Scientific Reports | | 4 | 6 |
| Symmetry | | 1 | 1 |
| The American Journal of Human Genetics | | 1 | 1 |
| Trends Artifi. Intell | | 1 | 1 |
| Trends in Biochemical Sciences | | 1 | 1 |
| Trends in Glycoscience and Glycotechnology | | 1 | 1 |
| Trends in Microbiology | | 1 | 1 |
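
The %age column is consistent with each count divided by the 70 articles in the table and rounded to the nearest whole percent. A small sketch of that calculation follows; the function name `percentage_share` is an illustrative assumption, and the total of 70 is the sum of the Count column.

```python
TOTAL_ARTICLES = 70  # sum of the Count column in the table


def percentage_share(count, total=TOTAL_ARTICLES):
    """Return the share of `count` in `total` as a whole-number percentage."""
    return round(100 * count / total)


# Every distinct count that appears in the table.
shares = {count: percentage_share(count) for count in (1, 2, 3, 4, 5)}
print(shares)
```

Because each value is rounded independently, the column does not have to add up to exactly 100.
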
Figure 7. Tools available for N-linked site identification.
Available tools for N-linked glycosylation.
| Ref. | P.Year | Species | Tool | Finding |
|---|---|---|---|---|
| | 2021 | Eukaryote | SweetOrigin | Model developed to identify glycosylation sites in eukaryotes using a hybrid approach. |
| | 2021 | Animal | GlycoWork | Computational model used to identify both N- and O-linked sites in animals. |
| | 2021 | Not mentioned | GlyNet | Computational model used to identify glycosylation protein sequences. |
| | 2021 | Human | LectinOracle | Computational model used to identify glycosylation protein sequences in human. |
| | 2021 | Human | SweetNet | Computational model used to identify glycosylation protein sequences in human. |
| | 2021 | Not mentioned | Rosetta Carbohydrate Framework | Computational model used to identify N-linked sites; the species is not mentioned. |
| | 2021 | Not mentioned | Provided | Computational model used to identify glycosylation sites; the species is not mentioned. |
| | 2021 | Human | CNNrgb | Computational model used to identify PTM sites in human proteins. |
| | 2021 | Human | UbiSite-XGBoost | Computational model used to identify PTM sites in human proteins. |
| | 2020 | Human and Mouse | N-GlycoGo | Computational model used to identify N-linked sites in human and mouse protein sequences. |
| | 2020 | Human and Avian | CCARL | Computational model used to identify glycosylation sites in human and avian protein sequences. |
| | 2020 | Human | Existing Tool | Hybrid model combining experimental and computational approaches for N-linked site identification in human proteins. |
| | 2020 | Human | THETA Model | Hybrid model combining experimental and computational approaches for N-linked site identification in human proteins. |
| | 2019 | Human and Mouse | Sprint-Gly | Computational model used to identify both N- and O-linked sites in human and mouse. |
| | 2019 | Human | NetGlyco (Existing) | Computational model used to identify N-linked sites in human. |
| | 2019 | Human | GlycoMine_PU | Computational model used to identify N-, O- and C-linked sites in human. |
| | 2019 | Human | NGlyDE | Computational model used to identify N-linked sites in human. |
| | 2019 | Human | Glycan Reader and Modeler | Computational model used to identify both N- and O-linked sites in human. |
| | 2019 | Not mentioned | Provided | Computational model used to identify N-linked sites; the species is not mentioned. |
| | 2019 | Human | Awesome | Hybrid approach developed to identify PTM sites in human. |
| | 2018 | Human | PTM Transporter | Computational approach developed to identify PTM sites, including N-linked sites, in human. |
| | 2017 | Not mentioned | Provided | Computational model used to identify N-linked sites; the species type is missing. |
| | 2017 | Human | Sequon | Computational method to identify N-linked sites in human. |
Feature methods for the N-linked sites identification.
| Ref. | Glycosylation type | Feature method | Finding |
|---|---|---|---|
| | N-Linked | Word embedding Vector Technique | Word embedding technique to efficiently predict N-linked glycosylation sites in ion channels. |
| | N-Linked | KDE | Features extracted using Kernel Density Estimation. |
| | N-Linked | Stoichiometric | Hybrid method that uses experimental data with a stoichiometric approach. |
| | N-Linked | Unknown Parameter and Structure | Protein structure features and some undefined features used to construct the feature vector. |
| | N-Linked | Sequence | Sequence-based features computed. |
| | N-Linked | Sequence, Structure and Function feature | Sequence-, structure- and function-based feature set of human and mouse proteins used to predict sites on an imbalanced dataset. |
| | N-Linked | IQ-GPA human plasma protein | IQ-GPA procedure used to obtain data from human plasma. |
| | N-Linked | MS | Hybrid method trained on mass spectrometry data. |
| | N-Linked | Sequence | Sequence-based features of protein sequences computed. |
| | N-Linked | Similarity voting and Gap Peptide | Similarity voting and gap peptide methods used to construct features. |
| | N-Linked | Sequences | Sequence-based features of protein sequences computed. |
| | N-Linked | Structural Feature | Structure-based features of protein sequences computed. |
| | N-Linked | Position relative and Statistical Moments | Position-relative features and statistical-moment-based features computed. |
| | N-Linked | CfsSubSetEval | Patients with different drug responses |
| | N-Linked | ProtDCal | ProtDCal method used to obtain protein features. |
| | N-Linked | Statistical Moment | Statistical moments computed to construct the feature vector. |
| | N-Linked (and C/O-Linked) | Sequence and Structure Feature | Sequence- and structure-based features of protein sequences computed. |
| | N-Linked and O-Linked | Sequence and Structure | Sequence- and structure-based features of protein sequences computed. |
| | N-Linked and O-Linked | Sequence and Structure | Sequence- and structure-based features of protein sequences computed. |
| | Glycosylation | Sequence feature | Models for glycans trained on a curated dataset of 19,299 unique glycans using sequence-based features. |
| | Glycosylation | Fingerprint Encoding | Feature vector based on the fingerprint encoding method for predicting protein-glycan interaction. |
| | Glycosylation | Protein-Glycan Sequence Feature | Combined protein and glycan sequence features used to construct the feature vector. |
| | Glycosylation | Graph and Statistical feature | Graph algorithm and statistical moments used to construct the feature matrix for glycans. |
| | Glycosylation | MS | Hybrid method trained on mass spectrometry data. |
| | Glycosylation | Frequent Subtree mining and mRMR | Frequent subtree mining and mRMR used for feature vector construction. |
| | PTM | Sequences | Sequence-based features used for feature vector construction. |
| | PTM | UbiSite-XGBoost | Pseudo ACC, K-spaced Acid Pair, Adapted Normal Distribution bi-profile Bayes, AA Index, Encoding Based Group Weight, LASSO, SMOTE and eXtreme Gradient Boosting feature methods are used. |
| | PTM | Physicochemical, structural and PTM | Physicochemical, structural moment of protein and PTM sequence features were used. |
| | PTM | Chou’s 5-steps | Feature vector based on Chou’s 5-steps was used. |
| | PTM | SNP | Single Nucleotide Polymorphism approach used to compute features. |
| | PTM | Sequence, Structure | Sequence- and structure-based features of protein sequences computed. |
| | PTM | Binary Encoding, AAC, EAAC and Dipeptide | Various features extracted, including binary encoding, Amino Acid Composition, Enhanced AAC and Dipeptide. |
| | PTM (including N-Linked) | Statistical Moment and F score | Statistical moments used, then the F-score was computed. |
Training algorithm (method) used for N-linked model.
| Ref. | Model training algorithm | PTM type | Finding |
|---|---|---|---|
| | ANN/Back propagation | N-Linked | Prediction of N-linked glycosylation sites using position-relative features and statistical moments through a multilayered ANN trained with back propagation. |
| | XGBoost | N-Linked | Extreme Gradient Boosting used to predict sites on an imbalanced dataset. |
| | RF, KNN, SVM and XGBoost | N-Linked | Various classifiers were used for prediction, including Random Forest, K-Nearest Neighbor, Support Vector Machine and XGBoost; RF performed best. |
| | ANN | N-Linked | Artificial Neural Network used to identify N-linked sites in influenza virus using an existing model on the dataset. |
| | SVM | N-Linked | N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding, using SVM after collecting the feature vector in two stages. |
| | ANN/Kinetic Model | N-Linked | Artificial neural networks and a kinetic model used for predicting protein glycosylation. |
| | Glycan Tree Modeler | N-Linked | Prediction based on a tree method. |
| | Kinetic | N-Linked | A two-component modeling framework integrating FBA and a glycosylation kinetic model was used for prediction. |
| | ANN | N-Linked | Predicts N-linked sites using stoichiometrically computed features, then trains the model using an ANN with forward propagation. |
| | DNN | N-Linked | N-linked sites identified using a DNN, which is later used to classify fucosylation. |
| | Bayesian Network | N-Linked | Probabilistic model by Bayesian network for the prediction of antibody glycosylation in perfusion and fed-batch cell cultures. |
| | SVM | N-Linked | Drug responses identified using SVM. |
| | ANN | N-Linked | New genotypic approach for predicting HIV-1 CRF02-AG using ANN. |
| | RM, Super Learner and Glmnet | N-Linked | Protein sequence and biological data used to identify N-linked sites using the super learner algorithm. |
| | ML | N-Linked | Guide to Lectin Binding: Machine-Learning Directed Annotation of 57 Unique Lectin Specificities. |
| | Jrip Classifier | N-Linked | Novel “extended sequons” of human N-glycosylation sites improve the precision of qualitative predictions: an alignment-free study of pattern recognition using ProtDCal protein features. |
| | Clustering | N-O Linked | CHARMM-GUI Glycan Modeler for modeling and simulation of carbohydrates and glycoconjugates. |
| | CNN with Adam | N-O Linked | Novel mechanism to collect the dataset using polarization, then train a CNN model. |
| | Deep ANN and SVM | N-O Linked | Predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties through DNN and SVM. |
| | PA2DE using AlphaMax | N-C-O Linked | Positive-unlabeled dataset used to predict sites using the AlphaMax algorithm. |
| | Recurrent NN (LSTM) | Glycosylation | Deep-learning models based on recurrent NNs, trained on a curated dataset of 19,299 unique glycans, that can be used to study and predict glycan functions. |
| | MNN (ADAM) | Glycosylation | A multi-task neural network trained with the ADAM algorithm for predicting protein-glycan interaction. |
| | Graph CNN | Glycosylation | LectinOracle, a model combining transformer-based representations for proteins and graph convolutional neural networks for glycans to predict their interaction. |
| | Graph NN | Glycosylation | Using graph convolutional neural networks to learn a representation for glycans. |
| | Regression Classifier | Glycosylation | Frequent subtree mining and mRMR used for feature selection, then a regression classifier trained for glycan motifs. |
| | SVM | Glycosylation | The local-balanced model for improved machine learning outcomes on mass spectrometry data sets and other instrumental data. |
| | RBF Network | PTM | Prediction of transport proteins (including N-linked) into three classes and six families using an RBF network. |
| | Deep Learning | PTM | nhKcr: a new bioinformatics tool for predicting crotonylation sites on human non-histone proteins based on deep learning. |
| | Ensemble Classifier | PTM | Predicting S-sulfenylation sites using physicochemical properties difference and ensemble classifier. |
| | Random Forest | PTM | Ubiquitination Sites Prediction Using Statistical Moment with Random Forest Approach. |
| | ML | PTM | Machine learning techniques to identify potential drug targets for anti-epileptic drugs. |
| | AI | PTM | Use of artificial intelligence for peptidomics. |
| | SVM | PTM | Predicting protein lysine methylation sites by incorporating single-residue structural features into Chou’s pseudo components. |
| | Extreme gradient boosting classifier | PTM | Prediction of protein ubiquitination sites. |
| | KNN | PTM | Feature extractions for computationally predicting protein post-translational modifications. |
Performance comparison of N-linked models.
| Ref. | Glycosylation type | Result comparison on independent dataset | Tool available | Dataset available | ACC (%) | SN (%) | SP (%) | Finding |
|---|---|---|---|---|---|---|---|---|
| | N-Linked | Yes | No | Yes | 99.9 | 99.8 | 99.9 | Detailed comparison performed and metrics presented, but the tool is not available. |
| | N-Linked | Yes | Yes | No | 84.7 | 82.8 | 84.8 | Detailed comparison performed and metrics presented, but the dataset is not available. |
| | N-Linked and O-Linked | Yes | Yes | Yes | 97.5 | 98 | – | Detailed comparison performed and metrics presented. |
| | N-Linked | Yes | No | Yes | 93.4 | 98.6 | 92.8 | Detailed comparison performed and metrics presented, but the tool is not available. |
| | N-Linked | No | Yes | Yes | 50 | – | – | Results not compared properly. |
| | N-Linked (and C/O-Linked) | Yes | Yes | Yes | 88.6 | – | – | Detailed comparison performed, but SN and SP not computed. |
| | Glycosylation | No | No | Yes | 75 | – | – | Results not compared properly; a few metrics are missing. |
| | Glycosylation | No | Yes | No | 75 | – | – | Tool is available, but the dataset is missing and not all performance metrics were reported. |
| | N-Linked | Yes | Yes | Yes | 74 | 49 | – | Detailed comparison performed, but SP is missing. |
| | Glycosylation | No | No | Yes | 72 | – | – | Results not reported properly: metrics and tool are missing. |
| | Glycosylation | No | Yes | Yes | 85 | – | – | Detailed comparison performed, but a few metrics are missing. |
| | N-Linked | No | No | Yes | – | – | – | Did not specify results. |
| | N-Linked | No | No | Yes | – | – | – | Did not specify results. |
| | N-Linked and O-Linked | No | Yes | No | – | – | – | Did not specify results. |
| | N-Linked | No | No | Yes | – | – | – | Did not specify results. |
| | N-Linked | No | No | No | 99 | 100 | – | Achieved almost full accuracy, but comparison on an independent dataset, the dataset and the tool are all missing. |
| | Glycosylation | No | Yes | Yes | 89 | – | – | Achieved good results, but comparison on an independent dataset is missing and the glycosylation type is not specified. |
| | PTM (including N-Linked) | Yes | Yes | Yes | 92 | – | – | Detailed comparison performed with good results, but the PTM type is not specified. |
| | N-Linked | Yes | No | No | – | – | – | Did not specify results. |
| | N-Linked | Yes | No | No | – | – | – | Did not specify results. |
| | N-Linked | Yes | Yes | Yes | 88 | 86 | 89 | Detailed comparison performed with good results. |
| | PTM | Yes | No | Yes | – | – | – | Did not specify results. |
| | PTM | Yes | No | Yes | – | – | – | Did not specify results. |
| | N-Linked | Yes | No | Yes | 86 | 97 | 39 | Detailed comparison performed with good results. |
| | N-Linked | No | Yes | Yes | – | – | – | Did not specify results. |
| | N-Linked | No | No | Yes | – | – | – | Did not specify results. |
| | Glycosylation | No | No | Yes | 98 | – | – | Achieved good results, but the glycosylation type is not specified and a few metrics are missing. |
| | PTM | Yes | Yes | Yes | 85 | 62 | 90 | Detailed comparison performed with good results, but it is generic for PTM, as the specific type is not mentioned. |
| | PTM | No | No | No | – | – | – | Did not specify results. |
| | PTM | No | Yes | Yes | 97 | – | – | Detailed comparison performed with good results, but SN and SP are missing. |
| | N-Linked | No | No | Yes | 99 | 82 | – | Detailed comparison performed with good results, but the dataset is missing. |
| | PTM | No | No | Yes | – | – | – | Did not specify results. |
Taxonomy coding scheme for SLR.
| Domain | Code | Subdomain | Reference |
|---|---|---|---|
| Feature set method | SMF | Statistical Moment Feature | |
| | SEF | Sequence Based Feature | |
| | SQF | Structure Based Feature | |
| | WEF | Word Embedding Feature | |
| | SVF | Similarity Voting Feature | |
| Machine training algorithm | ANN | Artificial Neural Network | |
| | SVM | Support Vector Machine | |
| | DNN | Deep Neural Network | |
| | GNN | Graph Neural Network | |
| | RBF | Radial Basis Function | |
| Performance metric | ACC | Accuracy | |
| | SP | Specificity | |
| | SN | Sensitivity | |
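
The three performance metrics in the taxonomy have standard definitions in terms of confusion-matrix counts. The sketch below computes them from true/false positive and negative counts; the function name and the example counts are illustrative and are not taken from any of the surveyed studies.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (ACC, SN, SP) as fractions from confusion-matrix counts.

    ACC = (TP + TN) / (TP + TN + FP + FN)   accuracy
    SN  = TP / (TP + FN)                    sensitivity (recall on sites)
    SP  = TN / (TN + FP)                    specificity (recall on non-sites)
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    return acc, sn, sp


# Hypothetical counts: 100 glycosylated sites and 100 non-glycosylated sites.
acc, sn, sp = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(f"ACC={acc:.1%} SN={sn:.1%} SP={sp:.1%}")
```

Reporting SN and SP alongside ACC matters on imbalanced datasets, where a high accuracy can hide a low specificity, as in the comparison table row with ACC 86, SN 97 and SP 39.
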
Figure 8. Taxonomy of N-linked site identification perspective.