Morten Nielsen, Massimo Andreatta.
Abstract
Peptides are extensively used to characterize functional or (linear) structural aspects of receptor-ligand interactions in biological systems, e.g. SH2, SH3, PDZ peptide-recognition domains, the MHC membrane receptors and enzymes such as kinases and phosphatases. NNAlign is a method for the identification of such linear motifs in biological sequences. The algorithm aligns the amino acid or nucleotide sequences provided as a training set, and generates a model of the sequence motif detected in the data. The webserver allows setting up cross-validation experiments to estimate the performance of the model, as well as evaluations on independent data. Many features of the training sequences can be encoded as input, and the network architecture is highly customizable. The results returned by the server include a graphical representation of the motif identified by the method, performance values and a downloadable model that can be applied to scan protein sequences for occurrences of the motif. While its performance for the characterization of peptide-MHC interactions is widely documented, we extended NNAlign to be applicable to other receptor-ligand systems as well. Version 2.0 supports alignments with insertions and deletions, encoding of receptor pseudo-sequences, and custom alphabets for the training sequences. The server is available at http://www.cbs.dtu.dk/services/NNAlign-2.0.
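As an illustration of what scanning a protein sequence for occurrences of a motif involves, the following minimal Python sketch slides a fixed-length window along a sequence and ranks the windows by score. It is not the NNAlign model: `scan_sequence`, `toy_score`, the window length of 9 and the choice of anchor positions are all hypothetical stand-ins, and the toy scoring function merely takes the place of a trained network.

```python
def scan_sequence(sequence, score_window, window=9):
    """Score every window of length `window` and return hits, best first.

    `score_window` is any callable mapping a sequence segment to a number;
    here it stands in for a trained model (an assumption of this sketch).
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        hits.append((start, segment, score_window(segment)))
    # Sort by score, highest first; ties keep their left-to-right order.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def toy_score(segment):
    """Hypothetical scorer: fraction of hydrophobic residues at three
    assumed anchor positions (0-based indices 0, 3 and 5)."""
    hydrophobic = set("AILMFVWY")
    anchors = (0, 3, 5)
    return sum(segment[i] in hydrophobic for i in anchors) / len(anchors)


if __name__ == "__main__":
    for start, segment, score in scan_sequence("GGGLGGAGVGGG", toy_score):
        print(start, segment, round(score, 3))
```

Passing the scorer in as a callable keeps the scanning loop independent of the model, so a different scoring function could be substituted without changing the scan.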
Year: 2017 PMID: 28407117 PMCID: PMC5570195 DOI: 10.1093/nar/gkx276
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Published prediction methods based on NNAlign
| NNAlign method | MHC class | Indels | Pseudo-sequence | PCC | AUC |
|---|---|---|---|---|---|
| NetMHC-3.4 a | I | — | — | 0.691 | 0.870 |
| NetMHC-4.0 a | I | X | — | | |
| NetMHCpan-2.8 b | I | — | X | 0.709 | 0.882 |
| NetMHCpan-3.0 b | I | X | X | | |
| NetMHCII-2.2 c | II | — | — | 0.664 | 0.838 |
| NetMHCIIpan-3.0 c | II | — | X | | |
Features implemented in a given method are marked with an X. Methods with indels allow for insertions and deletions in the sequence alignment; methods without pseudo-sequence encoding are allele-specific, methods with pseudo-sequence encoding are pan-specific. Indels and receptor pseudo-sequence encoding are only available in NNAlign version 2.0.
a Performance values from (12).
b Performance values from (13).
c Performance values from (20).
The best performing method within each class is highlighted in bold.
Figure 1. (A) Sequence motif identified by NNAlign for the binding specificity of HLA-DRB1*03:01, showing distinct amino acid preferences at the anchor positions P1, P4 and P6. (B) Correlation between the target and predicted log-affinities of the training data, calculated in cross-validation; in this example PCC = 0.721 and SRC = 0.702. Both plots are automatically generated by the NNAlign server and displayed as part of the output.
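For readers who want to reproduce the two statistics quoted in panel B on their own data, the sketch below computes a Pearson correlation coefficient (PCC) and a Spearman rank correlation (SRC) between target and predicted values in plain Python. It is a generic illustration, not the server's code; the function names are chosen here for clarity, and ties are handled by average ranks, which is an assumption about how SRC would be computed.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In a cross-validation setting, the predictions for each held-out partition would be collected first, and the two coefficients computed once over the pooled target and predicted values.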
Figure 2. Sequence motifs identified in a mixture of HLA class I binding data. (A) On unlabelled data, NNAlign generates a motif that is an average of the three specificities contained in the training data. (B) If training data points are labelled with the pseudo-sequence of their receptor, the NNAlign model can learn the different specificities contained in the data. Receptor pseudo-sequences are indicated under their respective HLA receptor name.
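The idea of labelling each data point with a receptor pseudo-sequence can be sketched as follows: the peptide and the pseudo-sequence are each turned into a numeric vector and concatenated, so that the same peptide yields a different input when paired with a different receptor. This is a minimal sketch under stated assumptions: one-hot encoding is used purely for simplicity and may differ from the encoding used by NNAlign, and `one_hot`, `encode_example` and the example sequences are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence, alphabet=AMINO_ACIDS):
    """Flatten a sequence into a one-hot vector over `alphabet`."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    vector = []
    for symbol in sequence:
        row = [0.0] * len(alphabet)
        row[index[symbol]] = 1.0
        vector.extend(row)
    return vector


def encode_example(peptide, pseudo_sequence, alphabet=AMINO_ACIDS):
    """Concatenate the peptide encoding with the receptor pseudo-sequence
    encoding, giving one input vector per (peptide, receptor) pair."""
    return one_hot(peptide, alphabet) + one_hot(pseudo_sequence, alphabet)


if __name__ == "__main__":
    # Hypothetical peptide and pseudo-sequences, for illustration only.
    a = encode_example("ACDEFGHIK", "YFAM")
    b = encode_example("ACDEFGHIK", "YYAM")
    print(len(a), a == b)
```

Because the alphabet is a parameter, the same helper accepts a custom alphabet such as `"ACGT"` for nucleotide data.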
Figure 3. Sequence motifs identified by NNAlign for the three transcription factors Tfec (A), Foxo6 (B) and Mybl2 (C), derived from the PBM data of the DREAM5 TF–DNA Motif Recognition Challenge.