Leon Eyrich Jessen, Ilka Hoof, Ole Lund, Morten Nielsen.
Abstract
Identifying which mutation(s) within a given genotype are responsible for an observable phenotype is important in many aspects of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype-phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. Input is a set of protein sequences where each sequence has an associated real number, quantifying a given phenotype. SigniSite will then identify which amino acid residues are significantly associated with the data set phenotype. As output, SigniSite displays a sequence logo, depicting the strength of the phenotype association of each residue, and a heat-map identifying 'hot' or 'cold' regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity determining positions (SDPs), using a set of human immunodeficiency virus protease-inhibitor genotype-phenotype data and corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a data set of protein families with experimentally annotated SDPs. For both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/.
Year: 2013 PMID: 23761454 PMCID: PMC3692133 DOI: 10.1093/nar/gkt497
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Sequence logo. Example of sequence logo (13) output from SigniSite from the analysis of the ATV ∼Antivirogram multiple sequence alignment (MSA), truncated to p1 – p35 for the purpose of illustration (see ‘Materials and Methods’ section). The analysis was performed with default settings. On the x-axis are the MSA positions p and on the y-axis the Z-scores for each amino acid residue a. The height of each letter representing a residue is proportional to its Z-score, i.e. the strength of the statistical association between the residue and the data set phenotype. Residues above the Z = 0 line have a positive Z-score, i.e. they enhance the phenotype, whereas residues below the Z = 0 line have a negative Z-score, i.e. they inhibit the phenotype. For example, the presence of a certain residue with favourable chemical properties may enhance binding (Z > 0), whereas a residue with unfavourable properties may inhibit binding (Z < 0). Colour-coding: acidic [DE]: red, basic [HKR]: blue, hydrophobic [ACFILMPVW]: black and neutral [GNQSTY]: green (14).
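To illustrate how a per-residue Z-score can be obtained without defining subgroups, the following is a minimal sketch that assumes a rank-based test: sequences are ranked by phenotype, and for every residue at every position the mean rank of the sequences carrying that residue is compared with its expectation under random assignment. This is an assumed stand-in and not necessarily the exact statistic used by SigniSite; the function names `average_ranks` and `residue_z_scores` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def residue_z_scores(sequences, phenotypes):
    """Map (position, residue) to a Z-score (assumed rank-based statistic).

    sequences: aligned strings of equal length (rows of an MSA).
    phenotypes: one real number per sequence.
    """
    n = len(sequences)
    ranks = average_ranks(phenotypes)
    mean_rank = sum(ranks) / n
    var_rank = sum((r - mean_rank) ** 2 for r in ranks) / n  # population variance
    z = {}
    for p in range(len(sequences[0])):
        groups = defaultdict(list)
        for seq, r in zip(sequences, ranks):
            groups[seq[p]].append(r)
        for residue, rs in groups.items():
            m = len(rs)
            if m == n or var_rank == 0:
                z[(p + 1, residue)] = 0.0  # conserved position: no signal
                continue
            # Standard error of the mean of m ranks drawn without replacement.
            se = sqrt(var_rank / m * (n - m) / (n - 1))
            z[(p + 1, residue)] = (sum(rs) / m - mean_rank) / se
    return z


if __name__ == "__main__":
    msa = ["AK", "AK", "AR", "AR"]
    fold_change = [1.0, 2.0, 3.0, 4.0]
    for key, score in sorted(residue_z_scores(msa, fold_change).items()):
        print(key, round(score, 3))
```

In this toy alignment, position 1 is fully conserved and gets a Z-score of 0, while at position 2 the residue found in the high-phenotype sequences gets a positive Z-score and the other a negative one.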
Figure 2. SigniSite heatmap from the analysis of the ATV ∼Antivirogram multiple sequence alignment (MSA), truncated to p1 – p35 for the purpose of illustration (see ‘Materials and Methods’ section). The analysis was performed with default settings. On the x-axis are the 20 proteinogenic amino acids a and on the y-axis the positions p in the analysed MSA. The colour coding of the fields is such that fields reflecting are blue, whereas results in a red field. For , nuances in between are used. If a residue has a Z-score of 0, the cell is coloured grey. Absent residues are coloured black. If only one grey cell is present at a given position, this implies that the position is fully conserved, harbouring only this residue. If more grey cells are present, their associated P-values have become after correction for multiple testing.
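The caption refers to P-values corrected for multiple testing, but the correction method is not stated in this text. As a hedged illustration only, the sketch below converts Z-scores to two-sided P-values under a standard normal assumption and applies a Bonferroni correction; both the normal approximation and the choice of Bonferroni are assumptions, and the function names are hypothetical.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided P-value of a Z-score under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2))


def bonferroni(p_values):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


if __name__ == "__main__":
    z_scores = [0.0, 1.96, -3.5, 4.2]
    raw = [two_sided_p(z) for z in z_scores]
    for z, p, q in zip(z_scores, raw, bonferroni(raw)):
        print(f"Z = {z:5.2f}  P = {p:.2e}  corrected P = {q:.2e}")
```

A Z-score of 0 corresponds to a P-value of 1, and the corrected P-values are never smaller than the raw ones.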
Benchmark results
| Measure | | | |
|---|---|---|---|
| SCC | | | |
| MCC | | | |
| SENS | | | |
| SPEC | | | |
(a) Calculated against the RMS.
(b) Calculated against the (RMS + IAS).
Measures are means ± SE. CMT: corrected for multiple testing, SCC: Spearman’s rank correlation, MCC: Matthews Correlation Coefficient, SENS: sensitivity, SPEC: specificity.
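For reference, the four measures named above have standard textbook definitions, which can be sketched as follows. The helper names are hypothetical, binary labels are assumed to be encoded as 0/1, and the example data are made up for illustration; they are not taken from the benchmark.

```python
from math import sqrt


def binary_measures(actual, predicted):
    """Return MCC, sensitivity and specificity for 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "SENS": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "SPEC": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
    }


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    print(binary_measures([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
    print(spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
```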
Figure 3. Measures are mean (AUC) ± SE. Columns are: HIV [SPEER/SIGNI]: SPEER and SigniSite’s predictions on the HIVdb data set; SDP [SPEER/SIGNI]: SPEER and SigniSite’s predictions on the SDP data set. P-values quantifying the significance of the difference in performance were obtained using a two-tailed paired t-test.
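The two quantities behind this figure, AUC per data set and a paired t-test across data sets, can be sketched with their standard definitions. The function names and example numbers below are hypothetical. The sketch returns only the t statistic and degrees of freedom; turning these into a two-tailed P-value requires the t distribution, for example from a statistics library such as SciPy.

```python
from math import sqrt
from statistics import mean, stdev


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a, b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Made-up per-data-set AUC values for two methods.
    method_a = [0.8, 0.9, 0.7]
    method_b = [0.6, 0.8, 0.5]
    print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
    print(paired_t(method_a, method_b))
```

Pairing matters here because both methods are evaluated on the same data sets, so the test is applied to the per-data-set differences in AUC rather than to the two sets of AUC values independently.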
Overview of target table notation
| Notation | Format | Level | Annotating |
|---|---|---|---|
| RMS (a) | Real num. | Residue | Fold-change in PI resistance |
| IAS (b) | Binary | Residue | PI ass. resistance mutations |
| RMS (c) | Binary | Residue | PI ass. resistance mutations |
| (RMS + IAS) (d) | Binary | Residue | PI ass. resistance mutations |
| (RMS + IAS) (e) | Binary | Position | Positions ass. with PI resistance |
(a) It is used when calculating SCC; (b) it is used to look up mutations not annotated in 1, but repeatedly identified by SigniSite; (c) it is used when calculating AUC; (d) it is used for the enriched AUC calculation and when calculating the MCC, SENS and SPEC; (e) it is used as positional targets when comparing the predictive performances of SigniSite and SPEER.
‘num.’, ‘ass.’ and ‘PI’ abbreviate ‘numbers’, ‘association’ and ‘protease inhibitor’, respectively. In all tables, any score is considered an actual positive and any score is considered an actual negative.