Elias DeVoe1, Gavin R Oliver2, Roman Zenka2, Patrick R Blackburn1, Margot A Cousin2, Nicole J Boczek2, Jean-Pierre A Kocher2, Raul Urrutia3, Eric W Klee2, Michael T Zimmermann1.
Abstract
MOTIVATION: Genomic data are prevalent, leading to frequent encounters with uninterpreted variants or mutations with unknown mechanisms of effect. Researchers must manually aggregate data from multiple sources and across related proteins, mentally translating effects between the genome and proteome, to attempt to understand mechanisms.
Entities:
Keywords: data aggregation; genetic variation; high-throughput nucleotide sequencing; molecular sequence annotation; protein annotations
Year: 2021 PMID: 34377961 PMCID: PMC8346652 DOI: 10.1093/jamiaopen/ooab065
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Figure 1. P2T2 for UBA1 demonstrates the rich and comprehensive data that our platform can aggregate and efficiently summarize. When the user places their cursor over an amino acid, the position is highlighted. Here we highlight M41 within UBA1, a site with oxidation potential close to the end of an intrinsically disordered region (MobiDB domain is highlighted) and for which homologous experimental structures exist (eg, PDB 4P22 chain A is 99.77% identical). After marking an amino acid, the right-hand panel displays a summary of all available information across that amino acid and the analogous amino acids in the MSA. Color keys for each data type are described in our help page, accessible from the upper toolbar. Pathogenic variants in UBA1 that are associated with muscular dystrophy are noted in the figure. Unlike M41, none of the pathogenic variants are simultaneously annotated with a post-translational regulatory mark.
Figure 2. Data are dynamically viewable. Zooming in on the region around M41, the specific and detailed data and annotations available within P2T2 are more easily viewed. We have highlighted M41 and show branding information for many of the available annotations. Domain annotations are provided through InterProScan. Simple Motifs are colored according to their probability of occurrence in random sequences, with blue indicative of higher random probability and orange of lower random probability. For example, the MOD_CK1_1 motif is overlapped by M41. The 3D organization of the protein is important for determining if this motif is accessible; there is an experimental structure of UBA1. Clicking on any of the graphical elements directs the user to the corresponding online source data. Finally, the paralog MSA view is synced to the protein view, allowing data from both sources to inform interpretation of positions of interest.
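The captions describe summarizing data across a selected amino acid and the analogous amino acids in the MSA, with the paralog MSA view synced to the protein view. The following is a minimal Python sketch of how such a lookup could work in principle, not the P2T2 implementation: it maps a 1-based residue position in one sequence to its alignment column, then reads the residue each other sequence has in that column. The function names `residue_to_column` and `analogous_residues`, the dictionary layout, and the toy alignment are all hypothetical and chosen only for illustration.

```python
from typing import Optional


def residue_to_column(aligned_seq: str, position: int) -> int:
    """Return the 0-based alignment column of a 1-based ungapped residue position."""
    seen = 0
    for col, char in enumerate(aligned_seq):
        if char != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError(f"position {position} is beyond the end of the sequence")


def analogous_residues(
    msa: dict[str, str], query: str, position: int
) -> dict[str, tuple[str, Optional[int]]]:
    """For each aligned sequence, return (residue, ungapped position) in the
    column that holds `position` of the query; gaps yield ("-", None)."""
    col = residue_to_column(msa[query], position)
    result: dict[str, tuple[str, Optional[int]]] = {}
    for name, seq in msa.items():
        char = seq[col]
        if char == "-":
            result[name] = ("-", None)
        else:
            # Count the non-gap characters up to and including this column.
            ungapped_pos = len(seq[: col + 1].replace("-", ""))
            result[name] = (char, ungapped_pos)
    return result


# Toy alignment with made-up sequences (assumption: not real UBA1 paralogs).
toy_msa = {
    "query": "MK-TAYL",
    "paralogA": "MKQT-YL",
    "paralogB": "--QTAYI",
}

if __name__ == "__main__":
    for name, (residue, pos) in analogous_residues(toy_msa, "query", 4).items():
        print(name, residue, pos)
```

Annotations attached to each returned position (for example, variants or post-translational marks) could then be gathered per column, which is one way a cross-paralog summary panel might be populated.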