
SigniSite: Identification of residue-level genotype-phenotype correlations in protein multiple sequence alignments.

Leon Eyrich Jessen, Ilka Hoof, Ole Lund, Morten Nielsen.

Abstract

Identifying which mutation(s) within a given genotype are responsible for an observable phenotype is important in many aspects of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype–phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. The input is a set of protein sequences, each associated with a real number quantifying a given phenotype. SigniSite then identifies which amino acid residues are significantly associated with the data set phenotype. As output, SigniSite displays a sequence logo depicting the strength of the phenotype association of each residue, and a heat-map identifying 'hot' or 'cold' regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity determining positions (SDP), on two data sets: a set of human immunodeficiency virus protease-inhibitor genotype–phenotype data with corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a set of protein families with experimentally annotated SDPs. On both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/.


Year: 2013; PMID: 23761454; PMCID: PMC3692133; DOI: 10.1093/nar/gkt497

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971

INTRODUCTION

Whether conducting research in vaccine design or trying to elucidate the intimate details of a given receptor–ligand interaction, genotype–phenotype correlation is a powerful tool for understanding the subtle effects that often characterize research in molecular biology. The traditional wet-laboratory approach to genotype–phenotype correlation involves site-directed mutagenesis and subsequent quantification of the mutation's impact on the phenotype, e.g. binding affinity or catalytic efficiency. Mutating every amino acid residue in a given protein in this way is a time-consuming and tedious task. Random mutagenesis, in contrast, has the advantage of introducing a large number of random mutations throughout the protein; one example is its use to increase the signal from near-infrared fluorescent proteins (1). In such a panel of sequenced variants carrying multiple mutations, it is a complex task to systematically pinpoint the exact amino acid residue(s), i.e. the genotype, associated with a given phenotype (e.g. fluorescence). Another area of application is genotype–phenotype association studies in proteins that show inherent natural variability, as is the case, for instance, for proteins involved in the pathogenesis of malaria (2). Here, we present SigniSite, an online application for subgroup-free residue-level genotype–phenotype correlation in protein multiple sequence alignments (MSAs). A number of methods have been developed for the identification of functional sites in protein sequences (3–10), most of which require functional subgroups to be defined before analysis. However, if the phenotype associated with the sequences is not categorical (e.g. substrate specificity) but continuous (e.g. catalytic efficiency), dividing the sequences into subgroups beforehand is non-trivial. In contrast, SigniSite does not require any subgroup division or binary classification.
Instead, SigniSite directly analyses the raw sequences and their associated continuous values. The main novelty of SigniSite is that, unlike conventional methods for the prediction of specificity determining positions (SDP), it not only predicts the positions in the MSA that determine a given protein function but also statistically evaluates which types of amino acid residue substitutions (genotype) at those positions are associated with the observable phenotype. The web server implementation of the SigniSite method described here is an automated online application with easy-to-interpret graphical output. It is designed to be easy to use for the non-expert end-user and aims to help researchers analyse sequence data in which the phenotype is quantified by a real number. A list of abbreviations is available in the Supplementary Data.

THE WEB SERVER

User interface

The SigniSite server is intended to provide the non-expert user with a simple interface. At default settings, an amino acid residue is considered significantly associated with the MSA phenotype if its P-value is smaller than or equal to α = 0.05 after Bonferroni single-step correction for multiple testing (CMT) (11). On the submission page, sequences can be submitted either by pasting them in or via the file upload field. On submission, SigniSite checks whether the submitted sequences are aligned; if not, an MSA is created using MAFFT (12). SigniSite excludes any characters other than the one-letter codes of the 20 standard proteinogenic amino acids from the analysis.
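The two input checks described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names `looks_aligned` and `standard_residues` are hypothetical, and treating equal-length input as already aligned is our heuristic, not necessarily the check SigniSite performs. Unaligned input would instead be passed to MAFFT, which is not reproduced here.

```python
# Hypothetical helpers illustrating the input checks; not SigniSite's actual code.
STANDARD_AA = frozenset("ARNDCQEGHILKMFPSTWYV")  # the 20 standard amino acids


def looks_aligned(seqs):
    """Heuristic (assumption): treat the input as an MSA if all sequences have equal length."""
    return len({len(s) for s in seqs}) == 1


def standard_residues(column):
    """Keep only one-letter codes of the 20 standard amino acids; gaps, 'X', 'B', etc. are dropped."""
    return [c for c in column if c in STANDARD_AA]


print(looks_aligned(["AC-D", "ACKD"]))  # True
print(standard_residues("A-XK"))        # ['A', 'K']
```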

Input

As input, SigniSite takes an MSA in FASTA format (minimum two sequences). Each sequence must have an associated real number, given as the last white-space-separated element of its FASTA header, and at least two different values must be present in the MSA. If the end-placed values are absent, the MSA is assumed to be pre-sorted. A section with options for customizing the analysis is available, with the following user-adjustable parameters: (i) the level of significance ‘α’ (default 0.05); (ii) the method for CMT: ‘Bonferroni Single-Step’ (default), ‘Holm Step-Down’ (11) or ‘no correction’; (iii) the sorting of the sequences: with ‘Decreasing’, the highest sequence-associated value is considered the strongest (e.g. fluorescent protein signals), and vice versa for ‘Increasing’ (e.g. binding affinity). Furthermore, the user can choose a reference sequence to assign sequence-specific positional output numbering, which is useful when the MSA contains insertions. Finally, the user can modify the logo output by choosing to include either ‘Significant Positions’ (default; displays all residues at positions where at least one amino acid residue has been identified as significantly associated with the data set phenotype), ‘Significant Residues’ (as for significant positions, but including only significant residues) or ‘Full Logo’ (all residues at all positions). On the results page, a button below the generated logo allows the user to fully customize the logo using Seq2Logo (13).
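The input convention above (a real number as the last white-space-separated token of each FASTA header) can be illustrated with a small parser. This is a sketch assuming plain FASTA text. The function names are hypothetical. Returning `None` when the last header token is not numeric, and treating an all-`None` input as pre-sorted, is our reading of the ‘pre-sorted’ rule rather than documented server behaviour. The rejection of mixed input (some headers with values, some without) is likewise an assumption.

```python
def parse_fasta_with_values(text):
    """Parse FASTA text into (header, sequence, value) tuples.

    Assumption: the value is the last white-space-separated token of the
    header; if that token is not a number, value is None."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))

    parsed = []
    for head, seq in records:
        tokens = head.split()
        try:
            value = float(tokens[-1]) if tokens else None
        except ValueError:
            value = None
        parsed.append((head, seq, value))
    return parsed


def check_input(parsed):
    """Apply the minimum input requirements stated for SigniSite."""
    if len(parsed) < 2:
        raise ValueError("at least two sequences are required")
    values = [v for _, _, v in parsed]
    if all(v is None for v in values):
        return "pre-sorted"  # no values: input order is taken as the ranking
    if any(v is None for v in values):
        # Assumption: mixing headers with and without values is rejected.
        raise ValueError("either all or no headers must end with a value")
    if len(set(values)) < 2:
        raise ValueError("at least two different values are required")
    return "values"


records = parse_fasta_with_values(">var1 2.5\nPQITL\n>var2 0.8\nPQVTL\n")
print(check_input(records))  # values
```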

Output

The SigniSite output is intended to provide the end-user with an easily interpretable graphical representation of the statistical evaluations performed by SigniSite. An example of a sequence logo (13) generated by SigniSite is shown in Figure 1; the logo gives an overview of residue associations (see the Figure 1 legend for details). SigniSite also generates a heatmap (Figure 2), intended to give a graphic overview of ‘hot’ and ‘cold’ regions in the MSA with respect to the data set phenotype (see the Figure 2 legend for details).

Figure 1.

Sequence logo. Example of sequence logo (13) output from SigniSite from the analysis of the ATV ∼Antivirogram multiple sequence alignment (MSA), truncated to p1–p35 for the purpose of illustration (see ‘Materials and Methods’ section). The analysis was performed with default settings. On the x-axis are the MSA positions p and on the y-axis the Z-scores for each amino acid residue a ($Z_{p,a}$). The height of each letter is proportional to $|Z_{p,a}|$, i.e. the strength of the statistical association between the residue and the data set phenotype. Residues above the Z = 0 line have $Z_{p,a} > 0$, i.e. they enhance the phenotype, whereas residues below the Z = 0 line have $Z_{p,a} < 0$, i.e. they inhibit the phenotype. For example, a residue with favourable chemical properties may enhance binding ($Z_{p,a} > 0$), whereas a residue with unfavourable properties may inhibit binding ($Z_{p,a} < 0$). Colour coding: acidic [DE]: red, basic [HKR]: blue, hydrophobic [ACFILMPVW]: black and neutral [GNQSTY]: green (14).

Figure 2.

SigniSite heatmap from the analysis of the ATV ∼Antivirogram MSA, truncated to p1–p35 for the purpose of illustration (see ‘Materials and Methods’ section). The analysis was performed with default settings. On the x-axis are the 20 proteinogenic amino acids a and on the y-axis the positions p in the analysed MSA. Fields with strongly negative $Z_{p,a}$ are blue, whereas strongly positive $Z_{p,a}$ results in a red field; intermediate values are shown in nuances in between. If a residue has $Z_{p,a} = 0$, the cell is coloured grey. Absent residues are coloured black. If only one grey cell is present at a given position, the position is fully conserved, harbouring only this residue. If more grey cells are present, their associated P-values have become 1 after correction for multiple testing.

RESULTS

As an initial performance evaluation, we analysed 18 human immunodeficiency virus type 1 (HIV-1) MSAs compiled from the Stanford University HIV Drug Resistance Database (HIVdb) (15,16). We used Spearman’s rank correlation (SCC) to correlate the obtained SigniSite Z-scores ($Z_{p,a}$ for each residue a at each position p) with the table of resistance mutation scores (RMS), also available from the HIVdb (see ‘Materials and Methods’ section), i.e. $\mathrm{SCC}(Z_{p,a}, \mathrm{RMS}_{p,a})$. Results are given in Table 1.
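Spearman's rank correlation is the Pearson correlation of the ranks, with tied values receiving their average rank. The sketch below shows one way to compute it. It assumes that the Z-scores and RMS are stored as dictionaries keyed by (position, residue) and correlates them over the residues present in both. The function names and this pairing rule are illustrative assumptions, not the paper's exact procedure.

```python
import math


def average_ranks(xs):
    """1-based ranks in which tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def scc_against_rms(z, rms):
    """Correlate Z-scores with RMS over the (position, residue) keys present in both."""
    keys = sorted(set(z) & set(rms))
    return spearman([z[k] for k in keys], [rms[k] for k in keys])


# Toy values for illustration only; (90, "M") has no RMS entry and is ignored.
z = {(10, "F"): 2.0, (10, "V"): -1.0, (46, "I"): 3.0, (90, "M"): 0.5}
rms = {(10, "F"): 15, (10, "V"): 0, (46, "I"): 60}
print(scc_against_rms(z, rms))  # 1.0
```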

Table 1.

Benchmark results

Measure
SCC^a
MCC^b
SENS^b
SPEC^b

^a Calculated against the RMS.

^b Calculated against (RMS + IAS).

Measures are means ± SE. CMT: corrected for multiple testing, SCC: Spearman’s rank correlation, MCC: Matthews Correlation Coefficient, SENS: sensitivity, SPEC: specificity.

As the SCC evaluation is threshold dependent, a threshold-independent performance evaluation was added using the area under the receiver operating characteristic curve (AUC). Certain mutations not included in the RMS were repeatedly identified by SigniSite. As the majority of these mutations were found in the binary resistance annotations from the International Antiviral Society-USA (IAS) (17), we enriched the RMS with the IAS annotations and recalculated the AUC. This produced a significant performance increase of 0.011 ($P = 5.16 \cdot 10^{-4}$, two-tailed paired t-test). Furthermore, we evaluated the performance of SigniSite against (RMS + IAS) using the Matthews correlation coefficient (MCC), sensitivity (SENS) and specificity (SPEC); see Table 1 for results. Having obtained good results in both the threshold-dependent and -independent evaluations, we benchmarked SigniSite against similar existing methods. In a 2009 benchmark study (18), SPEER (5,19) was identified as the state-of-the-art method for prediction of specificity determining positions (SDP). We therefore compared the performances of SigniSite and SPEER on each of their original benchmark data sets (see ‘Materials and Methods’ section), using (RMS + IAS) as targets for the HIVdb data set. The results, shown in Figure 3, show that SigniSite outperforms SPEER on both data sets. The difference in predictive performance was, however, only statistically significant for the HIVdb data set.
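The threshold-independent and threshold-dependent measures used here are standard, and a compact sketch may help make them concrete. It assumes binary targets (1 = actual positive) and uses the Mann–Whitney formulation of the AUC, in which ties count one half. `paired_t` returns only the t statistic and degrees of freedom, because the two-tailed P-value additionally requires the t distribution (available, for example, via `scipy.stats.ttest_rel`). All function names are hypothetical.

```python
import math
import statistics


def auc(scores, labels):
    """AUC as the probability that a random positive scores above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc_sens_spec(predicted, actual):
    """Matthews correlation coefficient, sensitivity and specificity from binary lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return mcc, sens, spec


def paired_t(a, b):
    """t statistic and degrees of freedom of a paired t-test, e.g. on per-MSA AUCs."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```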

Figure 3.

Measures are mean AUC ± SE. Columns: HIV [SPEER/SIGNI], SPEER’s and SigniSite’s predictions on the HIVdb data set; SDP [SPEER/SIGNI], SPEER’s and SigniSite’s predictions on the SDP data set. P-values quantifying the significance of the differences in performance were obtained using a two-tailed paired t-test.

DISCUSSION

SigniSite aims to provide a simple-to-use method for subgroup-free residue-level genotype–phenotype correlation in protein MSAs. SigniSite thus addresses a long-standing challenge in molecular biology: genotype–phenotype mapping. Genotype–phenotype mapping serves a wide range of purposes in molecular biology, e.g. identifying structural regions responsible for immunity (2), identifying protein variants responsible for the severity of a disease (20) or coupling receptor polymorphisms to surface expression (21). Site-directed mutagenesis of proteins with subsequent quantification of the mutation impact on a given phenotype is a time-consuming and tedious task, and high-throughput methods such as random mutagenesis (1) have therefore been developed. However, these methods only increase the challenge of analysing the ever larger volumes of data being generated. Additionally, large genotype–phenotype data sets (GPDs) can be compiled from publicly available databases, such as the HIVdb (15,16). SigniSite addresses exactly this challenge. SigniSite was benchmarked on publicly available GPDs and RMS from the Stanford University HIV Drug Resistance Database (HIVdb) (15,16). We observed that for each of the 18 different benchmark data sets, SigniSite consistently identified certain residues, not annotated in the RMS table, as significantly associated with antiviral drug resistance. We compared these identifications with binary resistance annotations from the International Antiviral Society-USA (IAS) (17) and found that the majority were indeed annotated as resistance impacting. This observation suggests that the RMS data are not exhaustive, and that the obtained correlation should rather be regarded as a lower bound of the true predictive performance. As the SDP method SPEER (5,19) was found to be the state-of-the-art method in a 2009 benchmark study (18), we chose to compare SigniSite to SPEER.
We observed that SigniSite significantly outperformed SPEER on the HIVdb data set. On the SDP data set (as defined in the SPEER paper), SigniSite likewise outperformed SPEER, with the difference approaching significance. Furthermore, SigniSite was much faster, taking only a few minutes to analyse the largest of the MSAs. SPEER, on the other hand, must be compiled in a slower version for large MSAs and took ∼2 h to complete the analysis. In conclusion, SigniSite provides two important novel features: (i) it does not require any manual annotation of the data before analysis (e.g. binder/non-binder classification), only sequences and associated values; (ii) unlike conventional SDP prediction methods such as SPEER, SigniSite not only identifies positions impacting the phenotype but also pinpoints the exact amino acid residue substitution(s) responsible for the impact detected at the identified position. To the best of our knowledge, this level of resolution has not previously been available.

MATERIALS AND METHODS

Benchmark data sets

A summary is given below; see Supplementary Data for details.

HIVdb resistance mutation scores

The table of RMS was downloaded from the HIVdb (15,16), available at http://hivdb.stanford.edu/DR/cgi-bin/rules_scores_hivdb.cgi?class=PI. It lists mutations (n = 688) relative to wild-type (WT) at positions known to harbour resistance mutations, together with their impact on resistance towards eight different protease inhibitors (PIs). Positive scores lie in the range [3,60] (n = 296) and indicate that the mutation increases resistance towards a given PI. Negative scores (n = 15) indicate decreased resistance. Scores of 0 (n = 377) indicate no resistance impact. At each position annotated in the table of RMS, the consensus residue was assigned an RMS of 0.

IAS resistance annotations

Protease mutations known to impact PI resistance were retrieved from the table ‘mutations in the protease gene associated with resistance to protease inhibitors’ in the International Antiviral Society-USA (IAS) Update of the Drug Resistance Mutations in HIV-1: March 2013 (17). As for the RMS, the consensus residue at annotated resistance positions was assigned an IAS score of 0.

Table transformations

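The following table transformations were performed: $\mathrm{RMS}_{bin} = 1$ if $\mathrm{RMS} > 0$, otherwise $\mathrm{RMS}_{bin} = 0$; $(\mathrm{RMS}+\mathrm{IAS})_{mut} = 1$ if $\mathrm{RMS}_{bin} = 1$ or $\mathrm{IAS} = 1$, otherwise $(\mathrm{RMS}+\mathrm{IAS})_{mut} = 0$; and, for each position in (RMS + IAS), $(\mathrm{RMS}+\mathrm{IAS})_{pos} = 1$ if at least one $(\mathrm{RMS}+\mathrm{IAS})_{mut} = 1$ at that position, otherwise $(\mathrm{RMS}+\mathrm{IAS})_{pos} = 0$. In all tables, any score $> 0$ is considered an actual positive and any score $\le 0$ an actual negative (Table 2).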
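The three target-table transformations can be expressed in a few lines. In this sketch (hypothetical function names), each table is a dictionary keyed by (position, residue). It assumes that a score > 0 marks an actual positive. The toy values in the usage example are invented for illustration and are not taken from the HIVdb or IAS tables.

```python
def to_rms_bin(rms):
    """RMS_bin: 1 where the RMS is > 0 (assumed cut-off), else 0."""
    return {key: int(score > 0) for key, score in rms.items()}


def to_mut(rms_bin, ias):
    """(RMS + IAS)_mut: 1 if either binary table marks the mutation, else 0."""
    keys = set(rms_bin) | set(ias)
    return {k: int(rms_bin.get(k, 0) == 1 or ias.get(k, 0) == 1) for k in keys}


def to_pos(mut):
    """(RMS + IAS)_pos: 1 for a position if at least one of its mutations is 1, else 0."""
    pos = {}
    for (position, _residue), flag in mut.items():
        pos[position] = max(pos.get(position, 0), flag)
    return pos


# Toy tables keyed by (position, residue); values are invented for illustration.
rms = {(10, "F"): 15, (10, "L"): 0, (33, "F"): -5}
ias = {(46, "I"): 1, (46, "M"): 0}
print(sorted(to_pos(to_mut(to_rms_bin(rms), ias)).items()))  # [(10, 1), (33, 0), (46, 1)]
```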

Table 2.

Overview of target table notation

Notation	Format	Level	Annotating
RMS^a	Real num.	Residue	Fold-change in PI resistance
IAS^b	Binary	Residue	PI ass. resistance mutations
RMS_bin^c	Binary	Residue	PI ass. resistance mutations
(RMS + IAS)_mut^d	Binary	Residue	PI ass. resistance mutations
(RMS + IAS)_pos^e	Binary	Position	Positions ass. with PI resistance

^a Used when calculating the SCC.
^b Used to look up mutations not annotated in the RMS but repeatedly identified by SigniSite.
^c Used when calculating the AUC.
^d Used for the enriched AUC calculation and when calculating the MCC, SENS and SPEC.
^e Used as positional targets when comparing the predictive performances of SigniSite and SPEER.

‘num.’, ‘ass.’ and ‘PI’ abbreviate ‘number’, ‘associated’ and ‘protease inhibitor’. In all tables, any score > 0 is considered an actual positive and any score ≤ 0 an actual negative.


MSAs from the HIVdb protease GPDs

GPDs were downloaded from the Stanford University HIV Drug Resistance Database (HIVdb) (15,16) Version 5.0, March 2012, available at http://HIVdb.stanford.edu/cgi-bin/GenoPhenoDS.cgi. MSAs were compiled from the GPDs: each MSA contains the sequences of a set of HIV-1 protease variants with measured fold change in resistance (compared with WT) towards the same PI, measured using the same assay. Only PIs present in both the table of RMS and the GPDs were used, limiting the analysis to six PIs (ATV, IDV, LPV, NFV, SQV and TPV), each of which was assayed using three assays: ‘Antivirogram’ (Virco™), ‘PhenoSense’ (ViroLogic™) and ‘All Others’. A total of 12 714 sequences were constructed and compiled into 18 MSAs (6 PIs × 3 assays). Each protease variant is 99 amino acid residues long.

The SPEER program and SDP benchmark data

SPEER, MSAs and corresponding experimentally annotated specificity determining sites were downloaded from the SPEER repository available at: ftp://ftp.ncbi.nih.gov/pub/SPEER/ (5,19). We downloaded the latest curated version of the data as described by Chakrabarti and Panchenko (18).

The SigniSite method

The method takes a set of protein sequences as input. If the sequences are not aligned, SigniSite uses MAFFT (12) to build an MSA from them. The sequences are then ranked with respect to the real number associated with each sequence, e.g. replicative capacity or catalytic efficiency. For each amino acid at each position in the MSA, a non-parametric test is performed to assess whether the observed ranks deviate significantly from the expected ranks. The resulting P-values may be corrected for multiple testing using the Bonferroni single-step or Holm step-down procedure. The resulting Z-scores per residue are visualized in a logo plot and a heatmap.
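The two correction procedures offered by the server are standard and can be sketched as follows. This is an illustrative implementation of the textbook Bonferroni single-step and Holm step-down adjustments, not SigniSite's code. Here the number of tests m is simply the number of P-values passed in; how SigniSite counts its tests (e.g. whether absent residues are included) is not specified in this section.

```python
def bonferroni(pvals):
    """Bonferroni single-step: multiply each P-value by the number of tests m (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down: the k-th smallest P-value (0-based k) is multiplied by (m - k),
    and the adjusted values are kept monotone non-decreasing in that order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for k, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - k) * pvals[i]))
        adjusted[i] = running_max
    return adjusted
```

By construction, each Holm-adjusted P-value is never larger than the corresponding Bonferroni-adjusted one, so the Holm procedure can only flag the same or more residues at a given α.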

Brief description of the method underlying SigniSite

(See Supplementary Data for details.) Initially, each sequence is assigned a rank by sorting the sequence-associated values (ascending or descending depending on the type of value), assigning rank ‘1’ to the first sequence after sorting, ‘2’ to the second and so forth. Each amino acid residue a observed at position p ($a_p$) in the MSA is then assigned the rank of the sequence to which it belongs, so that each $a_p$ is associated with a specific rank. At each position in the MSA, the mean rank of each residue type is calculated and placed in a rank matrix. In this matrix, each row corresponds to a position in the MSA and each column to one of the 20 standard proteinogenic amino acids, ordered A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y and V (SigniSite excludes any characters but these 20). SigniSite then evaluates, for each position and residue type, the difference between the observed and the expected mean rank. The expected mean rank is the mean rank we would observe if the residue type were randomly distributed over column p of the MSA. This difference is quantified by a Z-score assigned to each residue type at each position, yielding a Z-score matrix. If a given position is fully conserved, Z = 0 is assigned to the conserved residue. Residue types absent at a given position are assigned a distinct value (shown in black in the heatmap). The non-parametric statistics on which SigniSite is based are similar to those of the Wilcoxon test (22). The obtained scores can be approximated by the standard normal distribution, which allows conversion of Z-scores to P-values by standard methods. As one test is performed per residue type per position, SigniSite by default applies Bonferroni single-step CMT (11) to adjust the reported P-values.
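The rank-based Z-score described above can be sketched as follows. This is a minimal sketch that assumes a Wilcoxon rank-sum style normal approximation. The expected mean rank of a residue type is taken to be the mean of all ranks in column p, and its standard deviation assumes sampling without replacement from that column. The following are our assumptions rather than SigniSite's published formulas, which are given in its Supplementary Data: the sign convention (positive Z when a residue's mean rank is better, i.e. lower, than expected, so that it enhances the phenotype as in Figure 1), the tie handling and the function names.

```python
import math
from collections import defaultdict

STANDARD_AA = frozenset("ARNDCQEGHILKMFPSTWYV")


def rank_sequences(values, decreasing=True):
    """Rank 1 = strongest phenotype; ties keep input order (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=decreasing)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def z_scores(msa, ranks):
    """Return {(position, residue): Z} with 1-based positions; absent residues get no entry."""
    z = {}
    for p in range(len(msa[0])):
        column = [(seq[p], r) for seq, r in zip(msa, ranks) if seq[p] in STANDARD_AA]
        m = len(column)
        if m == 0:
            continue
        col_ranks = [r for _, r in column]
        expected = sum(col_ranks) / m  # mean rank under random placement in column p
        pop_var = sum((r - expected) ** 2 for r in col_ranks) / m
        by_residue = defaultdict(list)
        for aa, r in column:
            by_residue[aa].append(r)
        for aa, rs in by_residue.items():
            n = len(rs)
            if n == m:  # fully conserved position: Z = 0
                z[(p + 1, aa)] = 0.0
                continue
            # Variance of the mean of n ranks drawn without replacement from the column.
            var_mean = pop_var / n * (m - n) / (m - 1)
            observed = sum(rs) / n
            z[(p + 1, aa)] = (expected - observed) / math.sqrt(var_mean)
    return z


ranks = rank_sequences([4.0, 3.0, 2.0, 1.0])  # [1, 2, 3, 4]
z = z_scores(["AK", "AK", "GK", "GK"], ranks)
print(round(z[(1, "A")], 3), z[(2, "K")])  # 1.549 0.0
```

When the column ranks are exactly 1..N, this variance reduces to the familiar rank-sum variance n(N − n)(N + 1)/12 for the sum of ranks, which is why it is described as Wilcoxon-like.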

Benchmarking

For each of the 18 MSAs compiled from the HIVdb GPDs (see ‘Materials and Methods’ section), a set of predictions (Z-scores) was made, estimating the strength of the association of each residue type a at each position p ($Z_{p,a}$) with the phenotype of the MSA. The obtained set of $Z_{p,a}$ values was then correlated with the RMS using Spearman’s rank correlation (SCC) at three significance thresholds, each time including only the residues whose P-values passed the threshold; the third threshold was applied after CMT. The SCC was recorded for each of the 18 MSAs, and the mean and the standard error (SE) of the mean were calculated. To evaluate threshold-independent performance, the AUC measure was applied, calculated against two sets of targets: the RMS and the enriched set of targets (RMS + IAS). The mean AUC and SE were calculated for each set of targets. Finally, the sensitivity, specificity and MCC were calculated against the enriched set of targets (RMS + IAS) at the same thresholds as the SCC. These were recorded for each of the 18 MSAs, and their means and SEs were calculated.
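Converting Z-scores to P-values with the standard normal distribution and selecting residues at a given threshold might look as follows. The sketch assumes two-sided P-values and a Bonferroni single-step correction over the P-values passed in. `alpha` is a free parameter here, because the exact thresholds used in the benchmark are not reproduced in this text, and the function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided P-value of a standard-normal Z: P = erfc(|Z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def significant_residues(z_by_key, alpha=0.05, bonferroni=True):
    """Return the (position, residue) keys whose (optionally corrected) P-value is <= alpha."""
    keys = list(z_by_key)
    pvals = [two_sided_p(z_by_key[k]) for k in keys]
    if bonferroni:
        m = len(pvals)
        pvals = [min(1.0, p * m) for p in pvals]
    return {k for k, p in zip(keys, pvals) if p <= alpha}
```

Under these assumptions, a fully conserved residue (Z = 0) receives P = 1, matching its grey cell in the heatmap.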

Comparing SigniSite and SPEER

To compare the performance of SigniSite with that of existing methods, we turned to a 2009 benchmark study by Chakrabarti and Panchenko (18). That study compared the predictive performance of five SDP prediction methods on a set of protein families with experimentally annotated SDPs. As SPEER (5,19) was the best-performing method in this benchmark, we limit our analysis to comparing SigniSite and SPEER, applying both methods to both original benchmark data sets. SPEER outputs positional predictions, whereas SigniSite assigns a Z-score to each residue type at each position. To cast the SigniSite Z-scores into one score per position, the maximum absolute Z-score at each position was used. SigniSite assigns a prediction value to all positions regardless of residue composition, whereas SPEER by default skips fully conserved positions and positions with >20% gaps. To obtain prediction values for all positions, we assigned a value of ‘−100’ to positions not predicted by SPEER (this value is lower than any score predicted by SPEER). SPEER requires each sequence in an MSA to be subgroup-annotated before analysis. To accommodate this requirement, each HIV MSA was split into two subgroups, ‘1’ and ‘2’. The sequences were sorted in descending order of their associated real values and split at the median. Conversely, SigniSite requires each sequence in the MSA to have an associated real number to perform the rank analysis. Of the 20 SDP MSAs, 13 contain only subgroups ‘1’ and ‘2’. We used these 13 MSAs for the benchmark, with the subgroup labels ‘1’ and ‘2’ serving as SigniSite real-number values. In this way, two comparisons were made: SigniSite versus SPEER on the HIV protease data set, and SigniSite versus SPEER on the SDP data set. The AUC measure was used to quantify the performance of each method on each benchmark data set.
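The three data-preparation steps for the SigniSite–SPEER comparison can be sketched as shown below. These steps are collapsing Z-scores to one score per position, filling positions that SPEER skips, and median-splitting the HIV sequences into two subgroups. Function names are hypothetical. Placing values above the median in subgroup ‘1’, with values equal to the median going to ‘2’, is our assumption.

```python
import statistics


def position_scores(z_by_key):
    """One score per position: the maximum absolute Z-score over residue types."""
    scores = {}
    for (position, _residue), z in z_by_key.items():
        scores[position] = max(scores.get(position, 0.0), abs(z))
    return scores


def fill_unpredicted(speer_scores, positions, fill=-100.0):
    """Give positions skipped by SPEER a value lower than any SPEER score."""
    return {p: speer_scores.get(p, fill) for p in positions}


def median_split(values):
    """Subgroup '1' above the median, '2' otherwise (assumed tie handling)."""
    median = statistics.median(values)
    return ["1" if v > median else "2" for v in values]


print(median_split([5, 1, 3, 4]))  # ['1', '2', '2', '1']
```

Sorting the sequences first, as described, does not change which side of the median a value falls on, so the sketch skips the sort.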

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: a supplementary description of the SigniSite method, supplementary descriptions of the benchmark data sets, a supplementary section on the impact of the chosen seed for random number generation, a supplementary description of the benchmark strategy, and supplementary tables of HIV-1 PIs and abbreviations.

FUNDING

National Institutes of Health [HHSN272201200010C]; EU FP7 PepChipOmics: The European Union 7th Framework Program FP7/2007-2013 [222773]; The Center for Genomic Epidemiology (www.genomicepidemiology.org) grant 09-067103/DSF from the Danish Council for Strategic Research; The University of Copenhagen - Program of Excellence. Funding for open access charge: Technical University of Denmark - PhD programme. Conflict of interest statement. None declared.

REFERENCES

1. Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a Machine-Learning approach for feature weighting.

Authors: Kai Ye; K Anton Feenstra; Jaap Heringa; Adriaan P Ijzerman; Elena Marchiori
Journal: Bioinformatics Date: 2007-11-17 Impact factor: 6.937

2. Functional specificity lies within the properties and evolutionary changes of amino acids.

Authors: Saikat Chakrabarti; Stephen H Bryant; Anna R Panchenko
Journal: J Mol Biol Date: 2007-08-22 Impact factor: 5.469

3. Near-infrared fluorescent proteins.

Authors: Dmitry Shcherbo; Irina I Shemiakina; Anastasiya V Ryabova; Kathryn E Luker; Bradley T Schmidt; Ekaterina A Souslova; Tatiana V Gorodnicheva; Lydia Strukova; Konstantin M Shidlovskiy; Olga V Britanova; Andrey G Zaraisky; Konstantin A Lukyanov; Victor B Loschenov; Gary D Luker; Dmitriy M Chudakov
Journal: Nat Methods Date: 2010-09-05 Impact factor: 28.547

4. Multi-Harmony: detecting functional specificity from sequence alignment.

Authors: Bernd W Brandt; K Anton Feenstra; Jaap Heringa
Journal: Nucleic Acids Res Date: 2010-06-04 Impact factor: 16.971

5. Insight into antigenic diversity of VAR2CSA-DBL5ε domain from multiple Plasmodium falciparum placental isolates.

Authors: Sédami Gnidehou; Leon Jessen; Stéphane Gangnard; Caroline Ermont; Choukri Triqui; Mickael Quiviger; Juliette Guitard; Ole Lund; Philippe Deloron; Nicaise Tuikue Ndam
Journal: PLoS One Date: 2010-10-01 Impact factor: 3.240

6. Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification.

Authors: Cristina Marino Buslje; Elin Teppa; Tomas Di Doménico; José María Delfino; Morten Nielsen
Journal: PLoS Comput Biol Date: 2010-11-04 Impact factor: 4.475

7. Cell-specific protein phenotypes for the autoimmune locus IL2RA using a genotype-selectable human bioresource.

Authors: Calliope A Dendrou; Vincent Plagnol; Erik Fung; Jennie H M Yang; Kate Downes; Jason D Cooper; Sarah Nutland; Gillian Coleman; Matthew Himsworth; Matthew Hardy; Oliver Burren; Barry Healy; Neil M Walker; Kerstin Koch; Willem H Ouwehand; John R Bradley; Nicholas J Wareham; John A Todd; Linda S Wicker
Journal: Nat Genet Date: 2009-08-23 Impact factor: 38.330

8. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

Authors: Martin Christen Frølund Thomsen; Morten Nielsen
Journal: Nucleic Acids Res Date: 2012-05-25 Impact factor: 16.971

9. Ensemble approach to predict specificity determinants: benchmarking and validation.

Authors: Saikat Chakrabarti; Anna R Panchenko
Journal: BMC Bioinformatics Date: 2009-07-02 Impact factor: 3.169

10. Characterization and prediction of residues determining protein functional specificity.

Authors: John A Capra; Mona Singh
Journal: Bioinformatics Date: 2008-05-01 Impact factor: 6.937
