Surendra S Negi, Catherine H Schein, Gregory S Ladics, Henry Mirsky, Peter Chang, Jean-Baptiste Rascle, John Kough, Lieven Sterck, Sabitha Papineni, Joseph M Jez, Lucilia Pereira Mouriès, Werner Braun.
Abstract
Proteins are fundamental to life and exhibit a wide diversity of activities, some of which are toxic. Therefore, assessing whether a specific protein is safe for consumption in foods and feeds is critical. Simple BLAST searches may reveal homology to a known toxin, when in fact the protein may pose no real danger. Another challenge in answering this question is the lack of curated databases with a representative set of experimentally validated toxins. Here we have systematically analyzed over 10,000 manually curated toxin sequences using sequence clustering, network analysis, and protein domain classification. We also developed a functional sequence signature method to distinguish toxic from non-toxic proteins. The current database, combined with motif analysis, can be used by researchers and regulators in a hazard screening capacity to assess the potential of a protein to be toxic at early stages of development. Identifying key signatures of toxicity can also aid in redesigning proteins, so as to maintain their desirable functions while reducing the risk of potential health hazards.
Year: 2017 PMID: 29066768 PMCID: PMC5655178 DOI: 10.1038/s41598-017-13957-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Workflow for selecting potential toxin sequences included in the database. Different selections of keywords were combined to provide a broad coverage of toxins.
Figure 2. Cluster and network analysis of protein toxin sequences: (a) Toxin sequences in each of the 35% sequence identity clusters are shown. Most sequences were contained in about 442 clusters. (b) Relation of a cluster at the 95% sequence identity level (indicated by Axxxx) to larger clusters at 35% sequence identity level. The example shows a 35% cluster of conotoxin sequences (G13) composed of multiple 95% clusters.
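The nesting of 95% clusters inside 35% clusters described in Figure 2 can be illustrated with a minimal sketch. It assumes a greedy, representative-based scheme (similar in spirit to tools such as CD-HIT); the record does not state which clustering software was used. The `identity` function is a crude stand-in based on `difflib` rather than a real alignment-based sequence identity, and the names `greedy_cluster` and `nest_clusters` are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude identity proxy (assumption, not a real alignment):
    matched residues divided by the length of the longer sequence."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return matched / max(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Assign each sequence to the first representative it matches at
    >= threshold; otherwise it becomes a new representative."""
    clusters: dict[str, list[str]] = {}
    # Process longest sequences first so they become representatives.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


def nest_clusters(seqs: dict[str, str], fine: float = 0.95, coarse: float = 0.35):
    """Cluster at the fine threshold, then cluster the fine representatives
    at the coarse threshold, giving coarse -> fine -> members."""
    fine_clusters = greedy_cluster(seqs, fine)
    reps = {rep: seqs[rep] for rep in fine_clusters}
    coarse_clusters = greedy_cluster(reps, coarse)
    return {
        coarse_rep: {fine_rep: fine_clusters[fine_rep] for fine_rep in members}
        for coarse_rep, members in coarse_clusters.items()
    }


if __name__ == "__main__":
    # Made-up toy sequences, not real toxins.
    toy = {
        "tox1": "ACDEFGHIKLMNPQRSTVWY",
        "tox2": "ACDEFGHIKLMNPQRSTVWA",
        "tox3": "ACDEFGHIKLWWWWWWWWWW",
        "other": "YYYYYYYYYYYYYYYYYYYY",
    }
    print(nest_clusters(toy, fine=0.9, coarse=0.35))
```

In this toy input, two near-identical sequences share a fine cluster, while a more distant relative forms its own fine cluster inside the same coarse cluster. A real analysis would replace `identity` with an alignment-based measure.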
Figure 3. Most of the 1600 singlet sequences can be related to highly populated clusters using multiple sequence alignments. For example, the first sequence is a cytotoxin-associated protein from Helicobacter pylori included in cluster 4; the other three are from different clusters with only one sequence each. Those sequences are almost identical to the first sequence, but contain deletions spanning positions 69 to 105, or at the C-termini.
Figure 4. The toxin sequences were grouped functionally into 381 PFAM domains. The number of sequences in each PFAM class varied widely (top). The most populated PFAM domains with the number of sequence entries are listed below.
Figure 5. Domain structures of the hemolysins from Vibrio cholerae (a) and Vibrio vulnificus (b). The membrane-active form of both is a heptameric, pore-forming structure.
Figure 6. Dendrotoxin (a) and BPTI (b) group to the same Kunitz inhibitor PFAM domain and share the same 3D fold. Sequence motifs were generated in an alignment of 10 dendrotoxins (motif 1 in red, 2 in cyan and 3 in green). Only motif 3 had a significant score in trypsin inhibitors.
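The idea of scoring a motif derived from one protein family against sequences from another, as in Figure 6, can be sketched with a generic log-odds position profile. This is an illustrative assumption, not the authors' functional sequence signature method. The uniform background, the pseudocount, and the names `build_profile` and `best_hit` are all hypothetical, and the toy motif instances are made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(instances: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Build per-column log2-odds scores from aligned, equal-length motif
    instances, against a uniform background (assumption)."""
    width = len(instances[0])
    if any(len(s) != width for s in instances):
        raise ValueError("motif instances must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        counts = Counter(s[col] for s in instances)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile: list[dict[str, float]], sequence: str) -> tuple[float, int]:
    """Slide the profile along the sequence and return (score, start) of the
    best window; returns (-inf, -1) if the sequence is shorter than the motif."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


if __name__ == "__main__":
    # Made-up motif instances and query sequences, for illustration only.
    profile = build_profile(["CGGC", "CGAC", "CGGC"])
    print(best_hit(profile, "MKTCGGCLLA"))
    print(best_hit(profile, "MKTLLLAAAV"))
```

A sequence that contains the motif scores well at the matching position, while an unrelated sequence scores low everywhere. Deciding whether a score is significant, as the caption does for motif 3, would additionally require a null distribution, which this sketch omits.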