
SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins.

Jose M Dana, Aleksandras Gutmanas, Nidhi Tyagi, Guoying Qi, Claire O'Donovan, Maria Martin, Sameer Velankar.

Abstract

The Structure Integration with Function, Taxonomy and Sequences resource (SIFTS; http://pdbe.org/sifts/) was established in 2002 and continues to operate as a collaboration between the Protein Data Bank in Europe (PDBe; http://pdbe.org) and the UniProt Knowledgebase (UniProtKB; http://uniprot.org). The resource is instrumental in the transfer of annotations between protein structure and protein sequence resources through provision of up-to-date residue-level mappings between entries from the PDB and from UniProtKB. SIFTS also incorporates residue-level annotations from other biological resources, currently comprising the NCBI taxonomy database, IntEnz, GO, Pfam, InterPro, SCOP, CATH, PubMed, Ensembl, Homologene and automatic Pfam domain assignments based on HMM profiles. The recently released implementation of SIFTS includes support for multiple cross-references for proteins in the PDB, allowing mappings to UniProtKB isoforms and UniRef90 cluster members. This development makes structure data in the PDB readily available to over 1.8 million UniProtKB accessions.


Year:  2019        PMID: 30445541      PMCID: PMC6324003          DOI: 10.1093/nar/gky1114

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The rapid evolution of genetic sequencing over the past decades is leading to unprecedented growth in the number of protein sequences available in the UniProt Knowledgebase (UniProtKB, http://uniprot.org), a universal resource for sequence and functional information pertaining to proteins (1). It currently contains over 500 000 manually annotated sequences (UniProtKB/Swiss-Prot) and over 120 million computationally annotated ones (UniProtKB/TrEMBL), despite a nearly 50% reduction in the size of its holdings in 2015 to remove highly redundant sequences. This growth is set to continue, and is likely to accelerate further with the growing appreciation of the role the microbiome plays in health and disease. Most of these protein sequences are unlikely to be experimentally characterised and will therefore not be targeted for manual curation. To annotate this large protein space, the UniProt team has developed a rule-based prediction system (UniRule) that automatically enriches UniProtKB/TrEMBL proteins with functional annotations. The rules in UniRule are created manually on the basis of InterPro family classification and experimental annotation in UniProtKB/Swiss-Prot, and are then applied computationally to annotate millions of protein sequences in the database (1). Knowledge of protein structure can help elucidate function, and thus enhance the computational (and manual) annotations available in UniProtKB. In parallel with the growth in sequencing data, structural biology has undergone revolutionary changes over the past decade, ranging from dramatic improvements in electron microscopy to the wider accessibility and near-complete automation of crystallographic techniques. The Protein Data Bank (PDB) is the single global archive of experimentally determined three-dimensional (3D) biomacromolecular structures and associated experimental data (2).
It is managed by the Worldwide PDB (wwPDB; http://wwpdb.org) (3), an international consortium of which the Protein Data Bank in Europe (PDBe; http://pdbe.org) (4) is a founding member. The PDB receives an increasing number of depositions (over 13 000 in 2017) of ever-increasing complexity, yet its growth is necessarily slower than that of sequence resources, and its coverage of sequence space increases in proportion to the number of PDB entries: from 28 000 unique UniProtKB accessions referenced by 84 000 PDB entries in early 2013 (5) to over 45 000 UniProtKB accessions referenced by over 145 000 PDB entries at present. Robust mechanisms for data discovery and for linking the biological contexts pertaining to proteins are therefore essential. A number of resources use structure data from the PDB to annotate protein sequences within related families and superfamilies (6). Both PDBe and UniProtKB are core resources at the European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) (7) and within the ELIXIR infrastructure (http://elixir-europe.org) (8). Facilitated by their co-location at EMBL-EBI, the PDBe and UniProt teams developed the Structure Integration with Function, Taxonomy and Sequences (SIFTS) resource (9), which enables the transfer of value-added annotations between protein sequences and protein structures, helping to understand the mechanisms of protein interaction and function. SIFTS provides residue-level cross-references between protein sequences in UniProtKB and the 3D atomic models of those proteins in PDB entries. The resource also collates and distributes residue-level annotations from Pfam (10), InterPro (11), SCOP (12) and CATH (13), and whole-sequence-level cross-references from IntEnz (14), GOA (15), PubMed (16) and NCBI taxonomy (17), all of which have been part of the SIFTS process as described previously (9).
The most recent update added cross-references to Homologene (https://www.ncbi.nlm.nih.gov/homologene) (18) and Ensembl (19), and automatic Pfam assignments based on HMM profiles (20,21). To improve the transfer of annotations between protein sequences and structures, the underlying SIFTS pipeline was also re-engineered to support multiple cross-references between UniProtKB and PDB, as described below. The pipeline underlies many features of the PDBe website and REST API (4). Many other bioinformatics resources, such as UniProt (1), RCSB PDB (22), PDBj (23), PDBsum (24), Reactome (25), Pfam (10), SCOP2 (26), MobiDB (27) and InterPro (11), rely on SIFTS to establish cross-references between PDB structures and other biological data in order to serve up-to-date information to their users. Since 2018, SIFTS has been part of the PDBe Knowledge Base resource (PDBe-KB; http://pdbe-kb.org).

METHODOLOGY

The basic SIFTS procedure has been described previously (9). Its two main components remain the same: a semi-automated process to identify sequence cross-references from UniProtKB to the protein sequences in the PDB, and a fully automated process to generate residue-level mappings between the two sequences and to add further cross-reference information from other bioinformatics resources. The original procedure was limited to cross-referencing the polypeptide sequence in a given PDB entry to a single UniProtKB accession. This limitation was overcome in the most recent SIFTS infrastructure update by organising the PDB-UniProtKB cross-references into three categories: (i) mapping to a UniProt canonical protein sequence, unchanged compared to the previous implementation, (ii) mapping to all alternative isoforms of the canonical sequence and (iii) mapping to sequences in UniRef90 clusters. The latter two categories will be discussed below.
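The three categories of PDB-UniProtKB cross-reference could be represented in Python as in the minimal sketch below. The class and field names (`MappingCategory`, `ChainMapping`, `identity`) and the example accessions are hypothetical. They are not taken from the SIFTS code base or its file formats.

```python
from dataclasses import dataclass
from enum import Enum


class MappingCategory(Enum):
    """The three categories of PDB-UniProtKB cross-reference."""
    CANONICAL = "canonical"  # (i) UniProtKB canonical sequence
    ISOFORM = "isoform"      # (ii) alternative isoform of the canonical sequence
    UNIREF90 = "uniref90"    # (iii) member of the same UniRef90 cluster


@dataclass(frozen=True)
class ChainMapping:
    """One cross-reference from a PDB chain to a UniProtKB sequence (hypothetical)."""
    pdb_id: str
    chain_id: str
    accession: str
    category: MappingCategory
    identity: float  # fraction of identical aligned residues, 0.0 to 1.0


def mappings_in(mappings, category):
    """Return only the mappings that belong to the given category."""
    return [m for m in mappings if m.category is category]


# Hypothetical example: one PDB chain with one mapping of each kind.
example = [
    ChainMapping("1abc", "A", "P12345", MappingCategory.CANONICAL, 1.00),
    ChainMapping("1abc", "A", "P12345-2", MappingCategory.ISOFORM, 0.98),
    ChainMapping("1abc", "A", "Q9XXX1", MappingCategory.UNIREF90, 0.93),
]
```

The enum keeps the category explicit on every record. Downstream code can then, for example, exclude UniRef90-derived mappings when only direct evidence is wanted.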

Mappings to isoforms

It is thought that alternative splicing is implicated in a number of diseases, and that nearly all multi-exon protein-coding genes in humans may undergo alternative splicing, giving rise to different protein isoforms (28). One of these products, usually the most prevalent, is designated the 'canonical' sequence in UniProtKB, and it was previously the only option for SIFTS cross-references to protein sequences in the PDB. To overcome this limitation, the SIFTS process was updated as follows (Figure 1A and B):
Figure 1.

Schematic diagrams of the SIFTS process. (A) Overall view of the data flow from PDB, UniProtKB and other resources to data distribution. (B) Calculation of direct mappings between protein structures in PDB and UniProtKB sequences, including isoforms. The process in panel B is invoked weekly and the data are released concurrently with the release of new PDB structures (see text). (C) Calculation of mappings for UniRef90 dataset. The process in panel C is invoked after the weekly release of new PDB structures.

(a) For each polypeptide sequence in the PDB (the query sequence), retrieve the existing manually annotated cross-reference provided by either UniProtKB or the PDB, as described previously (9).
(b) Expand the set of UniProtKB sequences to be analysed with all the isoforms of the accession from (a), unless the query sequence is identified as a chimeric construct. In the latter case, the set of accessions is not expanded beyond the manually annotated ones.
(c) Calculate sequence alignments and sequence identity between the query sequence and each UniProtKB accession in the set defined in (b). For canonical UniProtKB sequences, coverage by the PDB sequence is also calculated.
(d) Annotate the best sequence alignment from (c). Currently, the best alignment is defined as the one with the highest sequence identity, with a preference for the canonical accession in the case of a tie.

Cross-references from Pfam, IntEnz and Homologene are added on the basis of the mappings to the canonical UniProtKB accessions, as these resources do not consider isoform data, whereas those from Ensembl are added on the basis of the isoform information. Cross-references from GOA, InterPro and preliminary Pfam assignments based on HMM profiles are calculated for the actual query sequence from the PDB.

At the time of writing, 727 unique human proteins (in 2412 PDB entries) have a non-canonical isoform as their best mapping. In total, the PDB archive contains 7202 unique human proteins (in 40 325 PDB entries).
Four proteins in seven PDB entries have valid mappings only to non-canonical isoforms (Supplementary Table S1). The above procedure is integrated into the weekly PDBe release process, and the resulting core SIFTS data are made publicly available with the weekly PDB release (00:00 UTC each Wednesday). The data are distributed through the PDBe REST API (http://www.ebi.ac.uk/pdbe/api/doc/sifts.html), as per-entry XML files with residue-level information, and as summary flat files in CSV and TSV formats.
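The isoform procedure above expands the candidates to isoforms unless the construct is chimeric, aligns each one, and keeps the highest-identity alignment, with ties going to the canonical accession. It can be sketched in Python as follows. This is an illustration under stated assumptions, not the SIFTS implementation. The text does not specify the aligner or how identity is normalised, so the sketch makes two choices of its own. It uses a plain Needleman-Wunsch global alignment with unit scores. It then divides the number of identical positions by the query length. Accessions such as `P12345-2` are hypothetical and only follow the UniProtKB convention for isoform identifiers.

```python
def global_identity(query, target, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns identical positions / len(query)."""
    n, m = len(query), len(target)
    # score[i][j] = best score for aligning query[:i] with target[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if query[i - 1] == target[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting identical aligned pairs.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        pair = match if query[i - 1] == target[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            identical += query[i - 1] == target[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return identical / n if n else 0.0


def best_mapping(query, canonical, sequences, is_chimeric=False):
    """Pick the best UniProtKB match for a PDB query sequence.

    sequences maps accession -> sequence and holds the canonical entry plus
    its isoforms. Chimeric constructs are not expanded beyond the canonical
    (manually annotated) accession.
    """
    candidates = [canonical]
    if not is_chimeric:
        candidates += sorted(acc for acc in sequences if acc != canonical)
    # Rank by identity first; on a tie, the canonical accession (True) wins.
    best = max(candidates,
               key=lambda acc: (global_identity(query, sequences[acc]), acc == canonical))
    return best, global_identity(query, sequences[best])
```

Suppose the query is identical to isoform `P12345-2` and differs from the canonical sequence at one of ten positions. In that case `best_mapping` returns the isoform with identity 1.0. Passing `is_chimeric=True` restricts the choice to the canonical accession, which then scores 0.9.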

Mappings to UniRef90 clusters

UniProt Reference Clusters (UniRef) group UniProtKB sequences longer than 10 residues into sets that share a given level of sequence identity (29), computed with the CD-HIT algorithm (30). In particular, UniRef90 is built by clustering UniProtKB sequences such that each cluster is composed of sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence of the cluster (called the seed sequence). Proteins belonging to a given UniRef90 cluster are generally expected to be structurally very similar. It is therefore a useful extension to cross-reference UniProtKB accessions to 3D structures in the PDB via the UniRef90 clusters. With a few configurable modifications, the SIFTS procedure for isoforms described above can also generate mappings to members of UniRef90 clusters (Figure 1C):

(a) For each polypeptide sequence in the PDB (the query sequence), retrieve the canonical UniProtKB cross-reference (primary accession) from the core SIFTS data, and calculate the coverage of the UniProtKB accession by the query sequence.
(b) If the coverage from (a) is greater than 70%, retrieve all UniProtKB accessions belonging to the same UniRef90 cluster(s) as the primary accession. For UniRef90 clusters with more than 5000 members, restrict the expanded set to one randomly chosen UniProtKB accession per taxonomy identifier.
(c) Perform pairwise sequence alignments between the query sequence and the set of UniProtKB accessions from (b), and calculate the sequence identity for each alignment.

Currently, additional cross-references from external resources are not included for mappings to UniRef90 clusters. The PDB-to-UniRef90 mapping procedure currently takes approximately one day to calculate and is therefore run after the weekly release. UniRef90 mapping data become publicly available via the PDBe REST API one week after the corresponding PDB data are released.
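Steps (a) and (b) of the UniRef90 procedure amount to a coverage filter followed by cluster expansion, with per-taxon subsampling of very large clusters. The Python sketch below is hedged and assumes that cluster memberships and taxonomy identifiers have already been loaded into plain dictionaries. The names `clusters_of`, `members_of` and `taxon_of` are hypothetical. Dropping the primary accession from the result is also an assumption, on the grounds that it is already mapped directly.

```python
import random
from collections import defaultdict

COVERAGE_THRESHOLD = 0.70  # step (b): coverage must be greater than 70%
LARGE_CLUSTER = 5000       # step (b): subsample clusters above this size


def expand_via_uniref90(primary, coverage, clusters_of, members_of, taxon_of,
                        rng=None, large_cluster=LARGE_CLUSTER):
    """Return the UniProtKB accessions to align against the query in step (c).

    clusters_of: accession -> list of UniRef90 cluster identifiers
    members_of: cluster identifier -> set of member accessions
    taxon_of: accession -> taxonomy identifier
    """
    if coverage <= COVERAGE_THRESHOLD:
        return set()
    rng = rng or random.Random()
    selected = set()
    for cluster_id in clusters_of.get(primary, ()):
        members = members_of[cluster_id]
        if len(members) > large_cluster:
            # Keep one randomly chosen accession per taxonomy identifier.
            by_taxon = defaultdict(list)
            for acc in sorted(members):
                by_taxon[taxon_of[acc]].append(acc)
            members = {rng.choice(accs) for accs in by_taxon.values()}
        selected.update(members)
    selected.discard(primary)  # assumption: the primary is already mapped
    return selected
```

The `large_cluster` parameter defaults to the 5000-member threshold from the text. Lowering it is only useful for exercising the subsampling branch on small test data. Passing a seeded `random.Random` makes the subsampling reproducible.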

Other improvements

Ultimately, the purpose of SIFTS is to provide an infrastructure for the transfer of annotations and cross-references between the structure and sequence domains, represented by the PDB and UniProtKB data, respectively. In addition to the improvements above, the SIFTS pipeline has therefore expanded its coverage of cross-references to other resources. It has added provisional domain assignments based on Pfam HMM profiles (20), Ensembl identifiers and genomic positions (19), Homologene identifiers (18), and additional PubMed cross-references retrieved from UniProtKB. SIFTS continues to include cross-references from GOA (15), InterPro (11), IntEnz (14), CATH (13), SCOP (12) and Pfam (10). For each identified Pfam domain and provisional domain assignment, the coverage by the PDB structure is calculated.
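The text does not define how domain coverage by the PDB structure is computed. The Python sketch below assumes one plausible definition. Coverage is taken as the fraction of the domain's residues, in UniProtKB numbering, that fall inside at least one segment mapped to the PDB chain. Collecting residues in a set keeps overlapping segments from being counted twice.

```python
def domain_coverage(domain, segments):
    """Fraction of a domain's residues covered by mapped PDB segments.

    domain: (start, end) in UniProtKB numbering, inclusive.
    segments: iterable of (start, end) UniProtKB ranges mapped to the PDB chain.
    """
    d_start, d_end = domain
    covered = set()
    for s_start, s_end in segments:
        lo, hi = max(d_start, s_start), min(d_end, s_end)
        covered.update(range(lo, hi + 1))  # empty range when there is no overlap
    return len(covered) / (d_end - d_start + 1)
```

Take a domain spanning residues 10 to 59, with mapped segments 1 to 30 and 40 to 100. Of the 50 domain residues, 41 are covered, giving a coverage of 0.82.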

DATA DISTRIBUTION

Core SIFTS data continue to be distributed as per-entry XML files from the EMBL-EBI FTP area (ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/). Their structure remains as described previously (9), with the addition of Ensembl genomic position information. Summary information is also distributed as comma- or tab-delimited flat files in the same FTP tree. Compared with the previous description, three new files have been added, describing:

(i) mappings involving only observed PDB residues, i.e. excluding residues that were present in the experimental sample but whose atomic coordinates were not modelled (e.g. because of poor electron density) (ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/csv/uniprot_segments_observed.csv);
(ii) preliminary Pfam assignments based on HMM profiles (ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/csv/pdb_chain_hmmer.csv); and
(iii) Ensembl genomic positions (ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/csv/pdb_chain_ensembl.csv).

Nearly all SIFTS data are also accessible via the PDBe REST API (http://www.ebi.ac.uk/pdbe/api/doc/sifts.html), and some information (e.g. mappings to members of UniRef90 clusters) is available only through this channel. SIFTS data underlie a major part of the PDBe search functionality and the PDB entry pages (4,31).
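The sketch below illustrates one way to work with the summary flat files. It totals the observed UniProtKB residues per PDB chain from a file laid out like `uniprot_segments_observed.csv`. The layout is an assumption. The sketch presumes a leading `#` comment line and a header row containing `PDB`, `CHAIN`, `SP_PRIMARY`, `SP_BEG` and `SP_END` columns, so check the header of the downloaded file before relying on it.

```python
import csv
from collections import defaultdict


def observed_uniprot_residues(lines):
    """Total observed UniProtKB residues per (PDB entry, chain, accession).

    Assumed layout (hypothetical): optional '#' comment lines, then a header
    row with PDB, CHAIN, SP_PRIMARY, SP_BEG and SP_END columns, one row per
    continuous segment of observed residues.
    """
    data = (line for line in lines if not line.startswith("#"))
    counts = defaultdict(int)
    for row in csv.DictReader(data):
        key = (row["PDB"], row["CHAIN"], row["SP_PRIMARY"])
        # SP_BEG and SP_END are taken as inclusive UniProtKB residue numbers.
        counts[key] += int(row["SP_END"]) - int(row["SP_BEG"]) + 1
    return dict(counts)
```

A downloaded copy could be passed in as an open file handle, since `csv.DictReader` accepts any iterable of text lines, e.g. `with open("uniprot_segments_observed.csv", newline="") as fh: totals = observed_uniprot_residues(fh)`.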

APPLICATIONS

The major improvement in the updated SIFTS pipeline is the ability to include multiple mappings between protein sequences in PDB and UniProtKB entries. The two main applications of this development are mappings to isoforms and mappings to UniProtKB sequences from UniRef90 clusters. Including the mappings to members of UniRef90 clusters expands the structural coverage of UniProtKB 40-fold. Coverage grows from ∼45 000 UniProtKB accessions mapped directly to proteins within PDB entries to over 1.8 million UniProtKB accessions that have at least 90% sequence identity to a PDB structure covering 70% or more of the UniProtKB sequence. Narrowing down to the structural coverage of particular species (Table 1), our analysis shows that the PDB contains structures of 3010 unique human proteins with at least 70% coverage of the corresponding UniProtKB accession, and that a further 26 673 unique UniProtKB accessions map to a structure in the PDB via the UniRef90 route. This set contains considerable redundancy, owing to a large number (24 056) of unreviewed (TrEMBL) protein isoforms that are included in the UniRef90 clusters but not in the UniProt human reference proteome (Table 2). The overwhelming majority of these UniProtKB accessions map to human proteins already present in the PDB. However, 1318 UniProtKB accessions (970 protein names) for human proteins currently map only to a non-human protein structure in the PDB, which expands the structural coverage of the human proteome by more than 30%. For the mouse proteome, this expansion more than doubles the coverage (from 764 unique protein names to 1954).
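The headline figures can be checked with simple arithmetic. The sketch uses the rounded totals quoted above (∼45 000 and 1.8 million accessions). As an assumption, it measures the human gain in unique protein names, taking column (6) of Table 1 relative to column (1).

```python
# Figures quoted in the text (rounded) and in Table 1 (human row).
direct_accessions = 45_000       # UniProtKB accessions mapped directly to the PDB
uniref90_accessions = 1_800_000  # accessions reachable via UniRef90 clusters
human_direct_names = 2959        # Table 1, column (1), unique protein names
human_new_names = 970            # Table 1, column (6), unique protein names

fold_increase = uniref90_accessions / direct_accessions
human_gain = human_new_names / human_direct_names

print(f"{fold_increase:.0f}-fold")  # 40-fold
print(f"{human_gain:.1%}")          # 32.8%
```

Measured in accessions instead (1318 relative to 3010), the human gain is larger, so "more than 30%" holds under either reading.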
Table 1.

Structure coverage of proteomes of selected model organisms via the UniRef90 clusters

Values are numbers of UniProtKB accessions from each organism, with the number of unique protein names in parentheses. Columns:
(1) Direct mappings to PDB entries with at least 70% sequence coverage
(2) In SIFTS UniRef90 datasets, excluding accessions in (1)
(3) In SIFTS UniRef90 datasets, and mapping to a PDB sequence from another organism
(4) In SIFTS UniRef90 datasets, and mapping to a PDB sequence from the same organism
(5) In SIFTS UniRef90 datasets, and mapping to PDB sequences from both the same and a different organism
(6) In SIFTS UniRef90 datasets, and mapping only to a PDB sequence from another organism, i.e. inaccessible from the same species

Organism | (1) | (2) | (3) | (4) | (5) | (6)
Homo sapiens | 3010 (2959) | 26 673 (4918) | 1799 (1377) | 26 907 (5287) | 689 (531) | 1318 (970)
Drosophila melanogaster | 203 (202) | 262 (205) | 22 (22) | 263 (206) | - | 21 (21)
Mus musculus | 764 (752) | 4289 (2621) | 3264 (2144) | 1614 (911) | 270 (159) | 3045 (1954)
Escherichia coli (all subspecies) | 2042 (1658) | 272 533 (14 080) | 27 801 (2307) | 258 324 (12 836) | 12 925 (1013) | 27 663 (2288)
Saccharomyces cerevisiae (all subspecies) | 1187 (1168) | 12 070 (3841) | 789 (258) | 12 121 (3894) | 700 (214) | 725 (207)
Schizosaccharomyces pombe (all subspecies) | 156 (156) | 5 (5) | 6 (5) | 1 (1) | - | 4 (4)
Caenorhabditis elegans | 106 (97) | 30 (27) | 10 (9) | 35 (32) | 2 (2) | 8 (8)
Danio rerio | 71 (68) | 493 (341) | 408 (283) | 105 (72) | 7 (6) | 406 (282)
Arabidopsis thaliana | 344 (342) | 674 (472) | 73 (51) | 652 (465) | 1 (1) | 63 (47)
Triticum aestivum | 48 (48) | 396 (118) | 279 (81) | 134 (49) | 12 (8) | 276 (79)
Table 2.

Structure coverage of the UniProt human proteome

Manually curated human proteins (Swiss-Prot)Automatically curated human proteins (TrEMBL) and part of the UniProt Reference ProteomeManually or automatically curated human proteins which are not included in the UniProt Reference Proteome
Number of UniProtKB accessions with a direct SIFTS mapping to proteins in the PDB and with 70% or more sequence coverageCanonical29201107
Other isoforms2618a8b
Number of UniProtKB accessions in UniRef90 clusters with at least one SIFTS mapping to a PDB structure (excluding direct mappings)Canonical2402124056
Other isoforms169a2279b

aThe number of isoforms of manually curated proteins (Swiss-Prot) includes an expansion into all isoforms of the canonical sequences from the corresponding row above.

bThe number of isoforms for mappings (direct or via UniRef90 clusters) to automatically curated proteins (TrEMBL) does not include the expansion of the canonical sequences.

At the time of writing, 27 Enzyme Commission (EC) numbers in the IntEnz database (14) that have no PDB structure map to UniRef90 clusters with at least one PDB entry (Table 3). Structures for these enzymes could therefore potentially be modelled by homology with some confidence. About 4000 species have at least one protein structure in the PDB. Taking the UniRef90 clusters into account, studies of over 86 000 species (distinct taxonomy identifiers) could benefit from available structure data.
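The selection behind Table 3 can be expressed as a set computation. First, collect the EC numbers annotated on any member of a UniRef90 cluster that contains at least one accession mapped to the PDB. Then remove the EC numbers already associated with a PDB entry. The paper does not describe how Table 3 was derived, so this Python sketch and its dictionary inputs are assumptions. The example data reuse the first row of Table 3 with hypothetical cluster identifiers and placeholder accessions.

```python
def ec_numbers_reachable_via_uniref90(cluster_members, ec_of, pdb_mapped, pdb_ec):
    """EC numbers with no PDB structure that share a UniRef90 cluster with one.

    cluster_members: cluster identifier -> set of UniProtKB accessions
    ec_of: accession -> set of EC numbers annotated for that accession
    pdb_mapped: accessions with a direct SIFTS mapping to a PDB entry
    pdb_ec: EC numbers already associated with at least one PDB entry
    """
    reachable = set()
    for members in cluster_members.values():
        if members & pdb_mapped:  # the cluster contains at least one structure
            for accession in members:
                reachable |= ec_of.get(accession, set())
    return reachable - pdb_ec


# Hypothetical inputs modelled on the first row of Table 3.
clusters = {"cluster_1": {"P40925", "P11708"}, "cluster_2": {"ACC_A", "ACC_B"}}
ec_of = {"P40925": {"1.1.1.96"}, "P11708": {"1.1.1.37"}, "ACC_A": {"2.7.1.1"}}
result = ec_numbers_reachable_via_uniref90(clusters, ec_of, {"P11708"}, {"1.1.1.37"})
```

In this toy example, `result` holds only 1.1.1.96. The EC number in `cluster_2` is excluded because no member of that cluster has a structure.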
Table 3.

Enzymes (Enzyme Commission numbers) in the IntEnz resource that are not annotated in the PDB but that belong to UniRef90 clusters with a mapping to PDB structure

Mappings to PDB structures annotated with a different EC number from IntEnz

EC number in UniRef90 | Enzyme name in UniRef90 | UniProtKB accession in UniRef90 | Sequence identity to PDB entries | PDB entries (possible templates) | EC number associated with PDB entry | Enzyme name in PDB entry | UniProtKB accession mapped to PDB structure
1.1.1.96 | Diiodophenylpyruvate reductase | P40925 | 95% | 4mdh 5mdh | 1.1.1.37 | Malate dehydrogenase | P11708
1.6.2.6 | Leghemoglobin reductase | Q41219 | 96% | 1dxl | 1.8.1.4 | Dihydrolipoyl dehydrogenase | P31023
3.4.24.73 | Jararhagin | P30431 | 95% | 3dsl | 3.4.24.49 | Bothropasin | O93523
3.5.4.45 | Melamine deaminase | Q9EYU0 | 98% | 4v1x 4v1y | 3.8.1.8 | Atrazine chlorohydrolase | P72156
3.7.1.13 | 2-hydroxy-6-oxo-6-(2-aminophenyl)hexa-2,4-dienoate hydrolase | Q9AQM4 | 98% | 1j1i | 3.7.1.8 | 2,6-dioxo-6-phenylhexa-3-enoate hydrolase | Q84II3
4.1.2.9 | Phosphoketolase | Q9AEM9 | 95% | 3ahc 3ahd 3ahe 3ahf 3ahg 3ahh 3ahi 3ahj | 4.1.2.22 | Fructose-6-phosphate phosphoketolase | D6PAH1
4.2.3.32 | Levopimaradiene synthase | H8ZM70 | 99% | 3s9v | 4.2.3.18, 4.2.3.132 | Abieta-7,13-diene synthase; Neoabietadiene synthase | Q38710
4.2.3.44 | Isopimara-7,15-diene synthase | H8ZM71 | 92% |  | 5.5.1.12 | Copalyl diphosphate synthase | 
4.5.1.5 | S-carboxymethylcysteine synthase | P0ABK5 | 100% | 5j43 5j5v | 2.5.1.47 | Cysteine synthase | P0ABK6
5.3.1.34 | D-erythrulose 4-phosphate isomerase | Q9ZB26 | 99% | 5ifz | 5.3.1.6 | Ribose-5-phosphate isomerase | Q8YCV4
6.5.1.6 | DNA ligase (ATP or NAD(+)) | Q9HHC4 | 91% | 3rr5 | 6.5.1.1 | DNA ligase (ATP) | C0LJI8

Mappings to PDB structures lacking annotation with an EC number from IntEnz

EC number in UniRef90 | Enzyme name in UniRef90 | UniProtKB accession in UniRef90 | Sequence identity to PDB entries | PDB entries (possible templates) | UniProtKB accession mapped to PDB structure | Unreviewed protein name from mapped UniProtKB accession
1.14.14.11 | Styrene monooxygenase | O50214 | 100% | 3ihm | O33471 | Styrene monooxygenase component A
1.3.1.29 | cis-1,2-Dihydro-1,2-dihydroxynaphthalene dehydrogenase | P0A170 | 98% | 5xtf 5xtg | G9G7I7 | 2,3-dihydroxy-2,3-dihydrophenylpropionate dehydrogenase
1.3.1.60 | Dibenzothiophene dihydrodiol dehydrogenase |  |  |  |  | 
2.3.1.228 | Isovaleryl-homoserine lactone synthase | Q89VI2 | 100% | 5w8a 5w8c 5w8d 5w8e 5w8g | A0A0N0C224 | Autoinducer synthase
2.3.1.60 | Gentamicin 3-N-acetyltransferase | P23181 | 99% | 6bvc | Q53396 | Aminoglycoside-(3)-N-acetyltransferase
2.4.1.292 | GalNAc-alpha-(1→4)-GalNAc-alpha-(1→3)-diNAcBac-PP-undecaprenol alpha-1,4-N-acetyl-D-galactosaminyltransferase | Q0P9C5 | 97% | 6eji 6ejj 6ejk | O86151 | WlaC protein
2.8.2.37 | Trehalose 2-sulfotransferase | A0QQ53 | 100% | 1tex | P84151 | Putative sulfotransferase
2.8.3.10 | Citrate CoA-transferase | P45413 | 92% | 1xr4 | Q8ZRY1 | Citrate lyase alpha chain
3.1.1.59 | Juvenile-hormone esterase | P19985 | 100% | 2fj0 | Q9GPG0 | Carboxylic ester hydrolase
3.2.1.94 | Glucan 1,6-alpha-isomaltosidase | Q44052 | 97% | 5awo 5awp 5awq | Q7WSN5 | Isomaltodextranase
3.5.1.105 | Chitin disaccharide deacetylase | Q99PX1 | 99% | 3wx7 | A6P4T5 | Chitin oligosaccharide deacetylase COD1
4.2.1.163 | 2-Oxo-hept-4-ene-1,7-dioate hydratase | P42270 | 100% | 2eb4 2eb5 2eb6 | Q46982 | 2-hydroxyhexa-2,4-dienoate hydratase
4.2.1.168 | GDP-4-dehydro-6-deoxy-alpha-D-mannose 3-dehydratase | D3QY10 | 100% | 2gms 2gmu | Q9F118 | Putative pyridoxamine 5-phosphate-dependent dehydrase
4.2.3.108 | 1,8-Cineole synthase | O81191 | 92% | 2j5c | A6XH05 | Cineole synthase
6.2.1.13 | Acetate–CoA ligase (ADP-forming) | Q8U3D6 | 92% | 2csu | O58493 | Uncharacterized protein
6.3.2.39 | Aerobactin synthase | Q47318 | 92% | 6cn7 | Q6U605 | IucA/IucC family siderophore biosynthesis protein

CONCLUSION

In conclusion, the SIFTS pipeline was updated to include multiple mappings between protein structures in the PDB and their sequences in UniProtKB. This allows a more accurate representation of structures of specific isoforms: ∼10% of human proteins in the PDB have their best sequence alignment to a non-canonical sequence in UniProtKB. More importantly, expanding the cross-references to protein sequences in UniRef90 clusters increases the structure coverage of protein sequence space 40-fold, extending the applicability of structure-based annotation to over 1.8 million UniProtKB sequences. The inclusion in SIFTS of gene IDs and genomic positions from Ensembl enables more direct cross-referencing between PDB structures and genomic data. SIFTS data are made available via a combination of per-entry XML files, summary CSV and TSV files and the PDBe REST API.

REFERENCES (10 of 31 shown)

1.  Announcing the worldwide Protein Data Bank.

Authors:  Helen Berman; Kim Henrick; Haruki Nakamura
Journal:  Nat Struct Biol       Date:  2003-12

2.  IntEnz, the integrated relational enzyme database.

Authors:  Astrid Fleischmann; Michael Darsow; Kirill Degtyarenko; Wolfgang Fleischmann; Sinéad Boyce; Kristian B Axelsen; Amos Bairoch; Dietmar Schomburg; Keith F Tipton; Rolf Apweiler
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  The NCBI Taxonomy database.

Authors:  Scott Federhen
Journal:  Nucleic Acids Res       Date:  2011-12-01       Impact factor: 16.971

4.  InterPro in 2017-beyond protein family and domain annotations.

Authors:  Robert D Finn; Teresa K Attwood; Patricia C Babbitt; Alex Bateman; Peer Bork; Alan J Bridge; Hsin-Yu Chang; Zsuzsanna Dosztányi; Sara El-Gebali; Matthew Fraser; Julian Gough; David Haft; Gemma L Holliday; Hongzhan Huang; Xiaosong Huang; Ivica Letunic; Rodrigo Lopez; Shennan Lu; Aron Marchler-Bauer; Huaiyu Mi; Jaina Mistry; Darren A Natale; Marco Necci; Gift Nuka; Christine A Orengo; Youngmi Park; Sebastien Pesseat; Damiano Piovesan; Simon C Potter; Neil D Rawlings; Nicole Redaschi; Lorna Richardson; Catherine Rivoire; Amaia Sangrador-Vegas; Christian Sigrist; Ian Sillitoe; Ben Smithers; Silvano Squizzato; Granger Sutton; Narmada Thanki; Paul D Thomas; Silvio C E Tosatto; Cathy H Wu; Ioannis Xenarios; Lai-Su Yeh; Siew-Yit Young; Alex L Mitchell
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

5.  The European Bioinformatics Institute in 2017: data coordination and integration.

Authors:  Charles E Cook; Mary T Bergman; Guy Cochrane; Rolf Apweiler; Ewan Birney
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

6.  Protein Data Bank Japan (PDBj): updated user interfaces, resource description framework, analysis tools for large structures.

Authors:  Akira R Kinjo; Gert-Jan Bekker; Hirofumi Suzuki; Yuko Tsuchiya; Takeshi Kawabata; Yasuyo Ikegawa; Haruki Nakamura
Journal:  Nucleic Acids Res       Date:  2016-10-26       Impact factor: 16.971

7.  UniProt: the universal protein knowledgebase.

Authors:  The UniProt Consortium
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

8.  The Expanding Landscape of Alternative Splicing Variation in Human Populations.

Authors:  Eddie Park; Zhicheng Pan; Zijun Zhang; Lan Lin; Yi Xing
Journal:  Am J Hum Genet       Date:  2018-01-04       Impact factor: 11.025

9.  HMMER web server: 2018 update.

Authors:  Simon C Potter; Aurélien Luciani; Sean R Eddy; Youngmi Park; Rodrigo Lopez; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2018-07-02       Impact factor: 16.971

10.  Ensembl 2018.

Authors:  Daniel R Zerbino; Premanand Achuthan; Wasiu Akanni; M Ridwan Amode; Daniel Barrell; Jyothish Bhai; Konstantinos Billis; Carla Cummins; Astrid Gall; Carlos García Girón; Laurent Gil; Leo Gordon; Leanne Haggerty; Erin Haskell; Thibaut Hourlier; Osagie G Izuogu; Sophie H Janacek; Thomas Juettemann; Jimmy Kiang To; Matthew R Laird; Ilias Lavidas; Zhicheng Liu; Jane E Loveland; Thomas Maurel; William McLaren; Benjamin Moore; Jonathan Mudge; Daniel N Murphy; Victoria Newman; Michael Nuhn; Denye Ogeh; Chuang Kee Ong; Anne Parker; Mateus Patricio; Harpreet Singh Riat; Helen Schuilenburg; Dan Sheppard; Helen Sparrow; Kieron Taylor; Anja Thormann; Alessandro Vullo; Brandon Walts; Amonida Zadissa; Adam Frankish; Sarah E Hunt; Myrto Kostadima; Nicholas Langridge; Fergal J Martin; Matthieu Muffato; Emily Perry; Magali Ruffier; Dan M Staines; Stephen J Trevanion; Bronwen L Aken; Fiona Cunningham; Andrew Yates; Paul Flicek
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971
