Jose Manuel Rodriguez1, Fernando Pozo2, Daniel Cerdán-Vélez2, Tomás Di Domenico2, Jesús Vázquez1,3, Michael L Tress2.
Abstract
APPRIS (https://appris.bioinfo.cnio.es) is a well-established database housing annotations for protein isoforms for a range of species. APPRIS selects principal isoforms based on protein structure and function features and on cross-species conservation. Most coding genes produce a single main protein isoform and the principal isoforms chosen by the APPRIS database best represent this main cellular isoform. Human genetic data, experimental protein evidence and the distribution of clinical variants all support the relevance of APPRIS principal isoforms. APPRIS annotations and principal isoforms have now been expanded to 10 model organisms. In this paper we highlight the most recent updates to the database. APPRIS annotations have been generated for two new species, cow and chicken, the protein structural information has been augmented with reliable models from the EMBL-EBI AlphaFold database, and we have substantially expanded the confirmatory proteomics evidence available for the human genome. The most significant change in APPRIS has been the implementation of TRIFID functional isoform scores. TRIFID functional scores are assigned to all splice isoforms, and APPRIS uses the TRIFID functional scores and proteomics evidence to determine principal isoforms when core methods cannot.
Year: 2022 PMID: 34755885 PMCID: PMC8728124 DOI: 10.1093/nar/gkab1058
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. APPRIS annotations for ACP1 from the GENCODE v38 reference set. Panels A and B show PDB structure 4Z99, the resolved structure of isoform A of low molecular weight protein tyrosine phosphatase (ACP1-001 in the GENCODE annotation), onto which has been mapped the effects of alternative splicing for two different annotated isoforms, ACP1-002 and ACP1-005. Known catalytic residues are shown as red sticks, Src-phosphorylated tyrosines are shown in green. Panel A shows variant ACP1-002, which differs from ACP1-001 by a single tandem duplicated homologous exon. Residues identical between ACP1-001 and ACP1-002 in the exon are shown in light blue; those that differ are shown in light yellow. Catalytic and phosphorylated residues are not affected by this exon. Panel B shows variant ACP1-005. In this case the second tandem duplicated exon is read in a different frame leading to a premature stop codon. The unrelated sequence from the frameshifted amino acids is mapped onto the structure in light orange, but these residues are likely to be unfolded. The remaining protein structure of ACP1-001 (shown in light pink) would be lost in this isoform, eliminating one of the catalytic residues and both phosphorylated tyrosines. Isoforms generated from ACP1-003 and ACP1-007 swap even more of the C-terminal region of the principal isoform for unrelated residues and premature stop codons. Both lose the same catalytic and phosphorylated residues as ACP1-005. Panel C shows the scores from the APPRIS modules for the five isoforms. The principal isoform scores are shown with a green background, the ‘alternative’ isoform with an orange background. The scores for variant ACP1-002 are so similar to those of the principal isoform that APPRIS has to determine the principal isoform from external methods. ACP1-001 is chosen as the principal isoform on the strength of proteomics evidence (PRINCIPAL:3).
TRIFID predicts that ‘ALTERNATIVE’ variant ACP1-002 is highly likely to be functionally important as a protein. The remaining three isoforms lose protein structure, functional domains and residues and have no detectable cross-species conservation. TRIFID predicts that they will not be functionally relevant at the protein level.
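The fallback described in the abstract and in Figure 1 (core methods first, then external evidence when the core scores cannot separate the candidates) can be sketched roughly as follows. This is a minimal illustration and not the APPRIS implementation: the `Isoform` fields, the `tie_margin` threshold, the order in which proteomics evidence and the TRIFID score are consulted, and all numeric values are assumptions made for the example and are not taken from the paper or from Figure 1.

```python
from dataclasses import dataclass


@dataclass
class Isoform:
    name: str
    core_score: float  # hypothetical aggregate of the core-module scores
    peptides: int      # hypothetical count of supporting proteomics peptides
    trifid: float      # hypothetical TRIFID-like functional score in [0, 1]


def pick_principal(isoforms, tie_margin=0.5):
    """Return (isoform name, deciding evidence) for one gene.

    Assumed order of evidence: core score, then proteomics, then TRIFID.
    """
    ranked = sorted(isoforms, key=lambda i: i.core_score, reverse=True)
    best = ranked[0]
    # Isoforms whose core score is within tie_margin of the best one
    # are treated as indistinguishable by the core methods.
    contenders = [i for i in ranked if best.core_score - i.core_score <= tie_margin]
    if len(contenders) == 1:
        return best.name, "core"

    # Core methods cannot decide: look at proteomics evidence first.
    by_peptides = sorted(contenders, key=lambda i: i.peptides, reverse=True)
    if by_peptides[0].peptides > by_peptides[1].peptides:
        return by_peptides[0].name, "proteomics"

    # Still tied: fall back to the functional score.
    top = max(contenders, key=lambda i: i.trifid)
    return top.name, "trifid"


# Placeholder values chosen only to exercise the tie-break path.
example = [
    Isoform("ACP1-001", core_score=10.0, peptides=12, trifid=0.9),
    Isoform("ACP1-002", core_score=9.8, peptides=3, trifid=0.8),
    Isoform("ACP1-005", core_score=2.0, peptides=0, trifid=0.05),
]
print(pick_principal(example))
```

With these placeholder inputs the two top-scoring isoforms fall within the tie margin, so the decision is made by the proteomics count, mirroring the situation the caption describes for ACP1-001 and ACP1-002.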
Figure 2. Improvements in principal isoform coverage in four model species. Bar charts showing the changes in principal isoform coverage with the improvements to the core methods and the change to the new TRIFID-based selection process. The distribution of principal isoforms before the changes is shown in the lighter colour (and labelled ‘Classic’) and the distribution after the changes is in darker bars (‘TRIFID’). Principal isoform types are labelled as P:1 etc., where P:1 is short for PRINCIPAL:1. From top left, (A) human, (B) mouse, (C) chicken, (D) D. melanogaster. There were no P:2 or P:4 isoforms in chicken or D. melanogaster prior to the improvements because these species are not annotated with CCDS (42) evidence. TRIFID plays no part in the selection of P:1 isoforms; improvements here are due to the core methods and include the effects of adding AlphaFold models. The TRIFID selection process means that there are now very few P:5 isoforms in any species. Full details can be found in the supplementary materials.