Michael Hicks, Istvan Bartha, Julia di Iulio, J Craig Venter, Amalio Telenti.
Abstract
Sequence variation data of the human proteome can be used to analyze 3D protein structures to derive functional insights. We used genetic variant data from nearly 140,000 individuals to analyze 3D positional conservation in 4,715 proteins and 3,951 homology models using 860,292 missense and 465,886 synonymous variants. Sixty percent of protein structures harbor at least one intolerant 3D site as defined by significant depletion of observed over expected missense variation. Structural intolerance data correlated with deep mutational scanning functional readouts for PPARG, MAPK1/ERK2, UBE2I, SUMO1, PTEN, CALM1, CALM2, and TPK1 and with shallow mutagenesis data for 1,026 proteins. The 3D structural intolerance analysis revealed different features for ligand binding pockets and orthosteric and allosteric sites. Large-scale data on human genetic variation support a definition of functional 3D sites proteome-wide.
Keywords: deep mutational scanning; exome; genome constraint; protein structure
Year: 2019 PMID: 30988206 PMCID: PMC6500140 DOI: 10.1073/pnas.1820813116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Three-dimensional tolerance to variation in the proteome. (A) Missense variation data from genome and exome sequencing projects are mapped to 3D protein structures. Features extracted from UniProt are also mapped to the 3D structures. Using these features as reference points, a 3D context is constructed, and the corresponding genetic data are extracted. A 3DTS is generated from this information. The 3DTS values are projected back onto the 3D structure. (B) The distribution of tolerance values across the structural proteome for 139,535 3D sites for structures representing 4,715 proteins. The 3DTS value at the 20th percentile (3DTS < 0.14) is used to define intolerant sites. (C) Median 3DTS for a subset of feature types with the interquartile ranges (IQR). The number of each feature type with a 3DTS value is shown above each column. The overall median across the structural proteome is represented by a horizontal dashed line. Feature types are colored by subsections defined by UniProt (https://www.uniprot.org/help/sequence_annotation).
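The percentile cutoff described in panel B can be illustrated with a minimal sketch. The site names and score values below are invented toy data, and the helper names `percentile` and `intolerant_sites` are hypothetical; the linear-interpolation percentile is an assumption, since the caption does not say how the 20th percentile was computed.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) with linear interpolation.

    Assumption: the source does not state which percentile convention
    was used; this is one common choice.
    """
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def intolerant_sites(scores, q=20.0):
    """Return (cutoff, site names whose score is strictly below the cutoff)."""
    cutoff = percentile(list(scores.values()), q)
    flagged = sorted(site for site, value in scores.items() if value < cutoff)
    return cutoff, flagged


# Toy tolerance scores for six hypothetical 3D sites (not real 3DTS values).
toy_scores = {
    "site_a": 0.05,
    "site_b": 0.10,
    "site_c": 0.20,
    "site_d": 0.30,
    "site_e": 0.40,
    "site_f": 0.60,
}

cutoff, flagged = intolerant_sites(toy_scores)
print(cutoff, flagged)
```

In the caption the cutoff derived this way from the full set of 139,535 sites is 0.14; the toy data here would naturally give a different value.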
Fig. 2. Validation of 3DTS. (A) Comparison of deep mutational screen data and in silico 3DTS data for the DNA-binding and ligand-binding domains of PPARG. (Top) Projection of the functional scores described in Majithia et al. (23) for each amino acid and the scores averaged across the 3DTS-defined sites for the crystal structure 3dzy (32). The color scheme is chosen to match the one described in Majithia et al. (Bottom) A projection of 3DTS onto PPARG is seen on the Left, and the 3D site level correlation between 3DTS and the 3D site averaged in vitro functional scores is shown in the plot on the Right. (B) Comparison of deep mutational screen data and 3DTS under different modeling assumptions for all available PDB structures covering 70% of the canonical protein length for nine genes. “Structure” refers to 3D sites defined by secondary structure elements, and “Allfeatures” uses 3D sites defined by all UniProt features as detailed in the . “Constant” and “heptamer” refer to the mutation rates as discussed in the . (C) Comparison of the optimal 3DTS model to 23 other scoring methods at the 3D site level for nine genes. Pearson r² values for comparisons of deep mutational screen data and in silico data at the 3D site level for the nine genes are provided. “NaN” refers to methods with unavailable scores. (D) Shallow mutagenesis data proteome-wide. Here, 3DTS identifies functional sites (loss of function) as more constrained (lower 3DTS values) at all levels of global gene essentiality compared with the rest of the protein. pLI > 0.9 (essential gene) functional to background Kolmogorov–Smirnov two-sided test P value = 9.3E-31; 0.1 < pLI < 0.9 functional to background Kolmogorov–Smirnov two-sided test P value = 2.3E-20; pLI < 0.1 functional to background Kolmogorov–Smirnov two-sided test P value = 1.1E-18.
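The site-level comparison in panels A and C (per-residue functional scores averaged within each 3D site, then correlated with the site's 3DTS) can be sketched as follows. All residue numbers, site labels, and score values are invented for illustration, and `site_average` and `pearson_r2` are hypothetical helper names rather than functions from the authors' pipeline.

```python
from collections import defaultdict
from statistics import mean


def site_average(residue_scores, residue_to_site):
    """Average per-residue scores within each 3D site.

    Residues that are not assigned to any site are skipped.
    """
    grouped = defaultdict(list)
    for residue, score in residue_scores.items():
        site = residue_to_site.get(residue)
        if site is not None:
            grouped[site].append(score)
    return {site: mean(values) for site, values in grouped.items()}


def pearson_r2(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Toy per-residue functional scores (stand-ins for deep mutational scan data).
functional = {1: 0.1, 2: 0.3, 3: 0.5, 4: 0.7, 5: 0.9, 6: 1.1}
# Toy assignment of residues to hypothetical 3D sites.
sites = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C"}
# Toy site-level tolerance scores (stand-ins for 3DTS).
tolerance = {"A": 0.1, "B": 0.3, "C": 0.5}

averaged = site_average(functional, sites)
order = sorted(tolerance)
r2 = pearson_r2([tolerance[s] for s in order], [averaged[s] for s in order])
print(averaged, r2)
```

Averaging before correlating puts both quantities on the same unit of analysis (the 3D site), which is the level at which the caption reports r².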
Fig. 3. Characteristics of druggable sites. (A) Binned 3DTS scores describing active sites, allosteric sites, protein–protein interaction sites, drug ligand-binding sites, and background. The binned fractions for each site type sum to 1. Active-site background Kolmogorov–Smirnov two-sided test P value = 4.9E-110. Allosteric background Kolmogorov–Smirnov two-sided test P value = 1.1E-84. Protein–protein interactions background Kolmogorov–Smirnov two-sided test P value = 1.8E-89. Drug ligand-binding background Kolmogorov–Smirnov two-sided test P value = 3.0E-75. (B) Counts of tolerant and intolerant drug ligand-binding sites grouped by therapeutic area. Here, tolerant is defined as 3DTS > 0.24 (50th percentile of 3DTS), while intolerant is defined as described in the text (3DTS < 0.14; 20th percentile of 3DTS); drug binding sites between these 3DTS values are not included. See Dataset S3 for full details about this dataset.
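The site-versus-background comparisons above rest on the two-sample Kolmogorov–Smirnov test. As a minimal sketch, the statistic underlying that test (the largest gap between the two empirical cumulative distributions) can be computed as below. The score lists are invented toy values, `ks_statistic` is a hypothetical helper name, and the sketch returns only the statistic D, not the P value reported in the caption.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs of the
    two samples, evaluated at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy tolerance scores: hypothetical functional sites vs. background sites.
functional_sites = [0.05, 0.08, 0.10, 0.12]
background_sites = [0.20, 0.25, 0.30, 0.40]

d_separated = ks_statistic(functional_sites, background_sites)
d_identical = ks_statistic(functional_sites, functional_sites)
print(d_separated, d_identical)
```

A D close to 1 indicates nearly non-overlapping distributions, while a D of 0 indicates identical empirical distributions; converting D to a two-sided P value additionally depends on the sample sizes.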