Compendium of proteins containing segments that exhibit zero-tolerance to amino acid variation in humans.

Adam L Sanders1, Jake N Hermanson2, David C Samuels3, Lars Plate4, Charles R Sanders1,5,6.   

Abstract

Genetic missense tolerance ratio (MTR) analysis systematically evaluates all possible segments in a given protein-encoding transcript found in the human population. This method scores each segment for the number of observed missense variants versus the number of silent mutations in that same segment. An MTR score of 0 indicates that no missense mutations are observed within a given segment. This is indicative of evolutionary purifying selection, which excludes mutations in that segment from the general human population. Here, we conducted MTR analysis on each of the roughly 20,000 protein-encoding human genes. We found 257 genes (1.3% of all human genes) with at least one segment encoding 31 residues for which MTR = 0. The proteins encoded by these 257 genes were tabulated along with the sequence location of each intolerant segment, the likely function of the protein, and related annotations. The most functionally enriched family among these proteins is a collection of several dozen proteins that are directly involved in RNA splicing. Some of the other proteins with zero-tolerance segments have thus far escaped significant characterization. Indeed, while a number of these proteins have previously been genetically linked to human disorders, many have not. We hypothesize that this compendium of human proteins with zero-tolerance segments can be used to complement disease mutation data as a pointer to genes and proteins that are associated with interesting and underexplored human biology.
© 2022 The Authors. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.

Keywords:  database; gene; genetic; genome; intolerance; intolerant; missense tolerance ratio; protein; proteome

Year:  2022        PMID: 36040257      PMCID: PMC9387208          DOI: 10.1002/pro.4408

Source DB:  PubMed          Journal:  Protein Sci        ISSN: 0961-8368            Impact factor:   6.993


INTRODUCTION

Genetic intolerance analysis has emerged as a powerful tool for studying protein evolution, structure–function relationships, and linkage of proteins to disease. Here we examine an extreme form of protein sequential intolerance by identifying human proteins that contain segments in which genetic variation is completely disallowed by evolution. Petrovski and coworkers introduced an approach to measure the “missense tolerance ratio” (MTR) for segments of human protein‐encoding genes. For a given gene, this method analyzes the >10^5 sequences for that gene in the gnomAD database and compares the number of missense mutations present in each 31‐amino‐acid segment with the number of observed silent (synonymous) mutations in that same segment. Coding genes with segments that exhibit fewer amino acid‐altering mutations than expected based on the observed number of silent mutations for that segment are deemed to be genetically intolerant. Intolerance indicates that missense mutations within that gene segment are evolutionarily excluded from the human gene pool by “purifying selection.” Because of the limited number of currently available sequences in gnomAD, statistically meaningful single‐codon MTR scores cannot yet be determined. However, analysis of 93‐base segments encoding 31 amino acids usually yields robust statistics. A segmental MTR score of 1.0 indicates that the sequence of the analyzed gene segment is under no purifying selective pressure, whereas an MTR score of 0 means that not even a single amino acid‐altering missense mutation in that segment is seen in gnomAD, indicative of stringent purifying selection against variation within that segment.
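The segment scoring described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the text here describes the score qualitatively, and the formula used below (observed missense fraction divided by the missense fraction among all possible single‐nucleotide variants in the window) is the commonly cited published MTR definition, taken here as an assumption. The function names `window_mtr` and `sliding_mtr`, the per‐codon count lists, and the choice of a centered window that is truncated at the protein termini are all hypothetical.

```python
from typing import List, Optional, Sequence


def window_mtr(obs_mis: int, obs_syn: int,
               poss_mis: int, poss_syn: int) -> Optional[float]:
    """Assumed MTR formula for one window.

    MTR = [obs_mis / (obs_mis + obs_syn)] / [poss_mis / (poss_mis + poss_syn)]
    Returns None when the ratio is undefined (no observed variants at all,
    or no possible missense variants).
    """
    obs_total = obs_mis + obs_syn
    poss_total = poss_mis + poss_syn
    if obs_total == 0 or poss_mis == 0 or poss_total == 0:
        return None
    observed_fraction = obs_mis / obs_total
    expected_fraction = poss_mis / poss_total
    return observed_fraction / expected_fraction


def sliding_mtr(obs_mis: Sequence[int], obs_syn: Sequence[int],
                poss_mis: Sequence[int], poss_syn: Sequence[int],
                window: int = 31) -> List[Optional[float]]:
    """Score every codon using a centered window of `window` codons.

    Inputs are hypothetical per-codon counts. Windows are truncated at the
    termini (an assumption), so the first and last windows span 16 codons
    when window = 31.
    """
    n = len(obs_mis)
    half = window // 2
    scores: List[Optional[float]] = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        scores.append(window_mtr(sum(obs_mis[lo:hi]), sum(obs_syn[lo:hi]),
                                 sum(poss_mis[lo:hi]), sum(poss_syn[lo:hi])))
    return scores
```

Under this assumed formula, a window with silent variants but no observed missense variants scores exactly 0, and a window whose observed missense fraction matches the fraction expected from all possible variants scores 1.0, in line with the interpretation of the two extremes given above.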
Mutations occurring in an intolerant segment of a protein can result in reduced evolutionary fitness through any one of a variety of potential mechanisms, such as triggering the loss of that protein's native function or inducing the formation of toxic aggregates, as we have reviewed elsewhere. Previous studies have explored the relationship of MTR analysis to specific proteins, particularly with respect to the use of segmental intolerance analysis to predict or illuminate the linkage of proteins to human disease. That work highlights the fact that proteins containing intolerant segments are sometimes subject to known disease mutations in other parts of the protein and, more rarely, even within the intolerant segment. The latter instance occurs when a disease mutation observed in a patient‐derived database such as ClinVar is too rare to be seen in the sample of the global (mostly healthy) human population represented by the current gnomAD collection of sequences. Our objective in this paper differs from the previous work. Here we sought to systematically identify all protein‐encoding human genes that contain one or more MTR = 0 (“zero‐tolerance”) segments. The resulting list is the main deliverable of this paper. It is hoped that this list will serve as a useful resource for the research community in identifying proteins that contain segments in which mutations result in such catastrophic consequences that they are filtered out of the human population. Evidently, these proteins are profoundly important and/or perilous, such that their study in some cases may yield groundbreaking insight into human biology and molecular pathophysiology. We also report a few selected observations that can be made regarding the 257 proteins that contain at least one zero‐tolerance segment. One important finding is that proteins involved in RNA splicing are the most common group of proteins that contain absolutely intolerant segments.
Another important finding is that there are many proteins that contain at least one intolerant segment, but for which there exist no known disease or ClinVar pathogenic mutations to date. We hypothesize that some of these proteins must be essential to human reproduction and/or development, despite, in many cases, having escaped much prior attention or recognition.

RESULTS AND DISCUSSION

Human proteins with zero‐tolerance segments

We observed 257 proteins (ca. 1.3% of all human proteins) that contain at least one amino acid segment at least 31 residues long (or an N‐ or C‐terminal segment at least 16 residues long) in which amino acid variations appear not to be evolutionarily tolerated (MTR score = 0), as determined by MTR analysis of the human gene sequences in the gnomAD database. These proteins are listed in Table 1, ordered alphabetically by their corresponding gene symbol. For each entry, supporting information is included, such as the location of the intolerant segment(s) in the protein, the function of the protein, and whether it is a membrane protein. Many of the proteins contain multiple zero‐tolerance segments, and some of these segments extend well beyond 31 residues. Figure 1a shows a histogram of the distribution of MTR scores for all possible 31‐amino‐acid segments within the 257 proteins. Even within these proteins, only 1.8% of all segments exhibited an MTR score at or near zero. Figure 1b shows the distribution of the whole‐protein median MTR score for each protein; the level of genetic tolerance within these proteins is typically not low, with a median score of 0.75 and a mean of 0.71 ± 0.26. These data complement the column of Table 1 that reports the median MTR score over all segments within each protein.
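The segment criteria in this paragraph (at least 31 residues internally, at least 16 residues at a terminus) can be illustrated with a short sketch. It is not the authors' code. It assumes a hypothetical list of per‐residue scores, one per centered 31‐residue window and truncated at the termini, and it reports the residues covered by windows that score exactly 0 as 1‐based spans, the numbering style used in Table 1. The function name `zero_tolerance_segments` is made up for this example.

```python
from typing import List, Optional, Sequence, Tuple


def zero_tolerance_segments(scores: Sequence[Optional[float]],
                            window: int = 31) -> List[Tuple[int, int]]:
    """Merge the residues covered by MTR = 0 windows into 1-based spans.

    scores[i] is the (hypothetical) MTR of the window centered on residue
    i + 1; None marks windows where the score is undefined.
    """
    half = window // 2
    n = len(scores)
    segments: List[Tuple[int, int]] = []
    for i, score in enumerate(scores):
        if score is None or score != 0.0:
            continue
        start = max(0, i - half) + 1      # first covered residue, 1-based
        end = min(n, i + half + 1)        # last covered residue, 1-based
        if segments and start <= segments[-1][1] + 1:
            # Overlaps or touches the previous span: extend it.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments
```

With truncated windows, a zero score centered on the first residue yields the span 1–16, which is one way the 16‐residue terminal segments described above could arise. Two adjacent zero windows in the interior merge into a single 32‐residue span, consistent with some segments extending beyond 31 residues.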
TABLE 1

Human proteins that contain at least one zero‐tolerance segment

Gene symbol, UniProt ID, transcript ID | Encoded protein | Function | GO pathway or process | Protein length | Transmembrane? | Intolerant segment(s) (UniProt numbering) | Median MTR score for entire protein | No. of ClinVar variants in intolerant segment(s) | No. of ClinVar variants in the whole protein

ABL1

P00519‐1

ENST00000318560

Tyrosine‐protein kinase ABL1NR tyrosine kinase that is linked to cell growth and survival, as well as chromatin remodeling. Regulates CDC42 signal transduction.GO:0009790; embryo development1,130No393–423 c 0.821NoneMany

ACTB

P60709‐1 b

ENST00000675515

Actin, cytoplasmic 1Actin componentGO:0030029; actin filament‐based process375No53–91, 124–185, 247–278 b , (based on ENST00000331789)0.12459Many

ACTC1

P68032‐1

ENST00000290378

Actin, alpha cardiac muscle 1ActinGO:0060048; cardiac muscle contraction377No105–1510.4354Many

ACTL6B

O94805‐1

ENST00000160382

Actin‐like protein 6BTranscriptional activation and repression of select genes by chromatin remodeling. Role in neuronal development.GO:0016573; histone acetylation426No1–190.726None2

ACTR2

P61160‐1

ENST00000260641

Actin‐related protein 2ATP binding component of Arp2/3 complex.GO:0007010; cytoskeleton organization394No1–160.654NoneNone

AGO2

Q9UKV8‐1

ENST00000220592

Protein argonaute‐2Essential for RNAi. May inhibit translation.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling859No446–4850.554NoneNone

AP2M1

Q96CW1‐1

ENST00000292807

AP‐2 complex subunit muComponent of AP‐2. Adaptor protein that plays a role in trafficking.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling435No403–447 c 0.4645NoneNone

AR

P10275‐1

ENST00000374690

Androgen receptorSteroid hormone receptor that can affect proliferation and differentiation.GO:0009790; embryo development920No891–9320.849NoneNone

ARF1

P84077‐1

ENST00000541182

ADP‐ribosylation factor 1GTP binding protein involved in protein trafficking.GO:0032880; regulation of protein localization181No16–590.44813

ARF5

P84085‐1

ENST00000000233

ADP‐ribosylation factor 5GTP‐binding protein involved in protein trafficking.GO:0006886; intracellular protein transport180No41–740.603NoneNone

ARIH1

Q9Y4X5‐1

ENST00000379887

E3 ubiquitin‐protein ligase ARIH1E3 ubiquitin ligase. Interacts with cullin‐RING ubiquitin ligase complexes.GO:0000209; protein polyubiquitination557No336–369, 450–4830.5155NoneNone

ATF2

P15336‐1

ENST00000264110

Cyclic AMP‐dependent transcription factor ATF‐2Transcriptional activator involved in anti‐apoptosis, cell growth, and DNA damage response. Can impair mitochondrial membrane potential.GO:0045930; negative regulation of mitotic cell cycle505No365–3950.848None2

ATP1A1

P05023‐1

ENST00000295598

Sodium/potassium‐transporting ATPase subunit alpha‐1Sodium potassium pumpGO:0030001; metal ion transport1,023Yes604–6370.537None8

ATP1A3

P13637‐2 d

ENST00000543770

Sodium/potassium‐transporting ATPase subunit alpha‐3Sodium potassium pumpGO:0030001; metal ion transport1,024Yes355–398 d 0.4945Many

ATP2B1

P20020‐3

ENST00000428670

Plasma membrane calcium‐transporting ATPase 1Calcium transporterGO:0030001; metal ion transport1,220Yes421–4510.712NoneNone

ATP6V0C

P27449‐1

ENST00000330398

V‐type proton ATPase 16 kDa proteolipid subunitProton‐conducting pore forming subunit of the membrane integral V0 complex of vacuolar ATPase responsible for acidifying a variety of intracellular compartments in eukaryotic cells.GO:0030001; metal ion transport155Yes133–1700.3715NoneNone

ATRX

P46100‐1

ENST00000373344

Transcriptional regulator ATRXInvolved in transcriptional regulation and chromatin remodeling. May be involved in telomere maintenance.GO:0065004; protein–DNA complex assembly2,492No1,782–1,814, 2,095–2,142, 2,159–2,2130.8373Many

BCL11B

Q9C0K0‐1

ENST00000357195

B‐cell lymphoma/leukemia 11BKey regulator of differentiation and survival of T‐lymphocytes. Required for CCR7 and CCR9 receptors.GO:0000904; cell morphogenesis involved in differentiation894No789–8220.67514

BRD4

O60885‐1

ENST00000263377

Bromodomain‐containing protein 4Chromatin reader protein that binds acetylated histones and plays a role in epigenetics.GO:0031056; regulation of histone modification1,362No508–5420.76None1

BRD8

Q9H0E9‐1

ENST00000254900

Bromodomain‐containing protein 8May act as a coactivator during transcriptional activation by hormone‐activated nuclear receptors. Component of NuA4 histone acetyltransferase.GO:0016573; histone acetylation1,235No704–7360.891NoneNone

CACNA1A

O00555‐8

ENST00000360228

Voltage‐dependent P/Q‐type calcium channel subunit alpha‐1AVoltage dependent calcium channelGO:0030001; metal ion transport2,506Yes287–3250.7621Many

CACNA1C

Q13936‐11 d

ENST00000347598

Voltage‐dependent L‐type calcium channel subunit alpha‐1CCalcium channelGO:0030001; metal ion transport2,186Yes731–764 d 0.7101Many

CACNA1E

Q15878‐1

ENST00000367573

Voltage‐dependent R‐type calcium channel subunit alpha‐1EVoltage gated calcium channelGO:0030001; metal ion transport2,313Yes1,648–1,6790.786None17

CALM1

P0DP23‐1

ENST00000356978

Calmodulin‐1Modulates the function of numerous proteins in a calcium dependent manner. Involved in centrosome cycle and cytokinesis.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling149No110–1420.3975None12

CALM2

P0DP24‐1

ENST00000272298

Calmodulin‐2Controls a large number of enzymes and, with CCP110 and centrin, is involved in the centrosome cycle and progression through cytokinesis.GO:0055074; calcium ion homeostasis149No78–1180.4675312

CAMK2A

Q9UQM7‐1

ENST00000348628

Calcium/calmodulin‐dependent protein kinase type II subunit alphaKinase that is activated by calcium or calmodulinGO:0030001; metal ion transport; GO:1905114478No111–1630.515None9

CAND1

Q86VP6‐1

ENST00000545606

Cullin‐associated NEDD8‐dissociated protein 1Key assembly factor of SCF ubiquitin ligaseGO:0010265; SCF complex assembly1,23046–760.786None1

CASK

O14936‐1

ENST00000378163

Peripheral plasma membrane protein CASKNeuronal development protein traffickingGO:0030001; metal ion transport926No73–1030.652NoneMany

CDC42

P60953‐2

ENST00000400259

Cell division control protein 42 homologEpithelial polarization, attachment of spindle to microtubules. Cell migration. Present in neuronal cells.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling191No28–1090.2035511

CDC73

Q6P1J9‐1

ENST00000367435

ParafibrominRNA pol II recruitment (PAF1 interaction). Recruits E2 ligases to histones.GO:0050684; regulation of mRNA processing; GO:1905114531No133–1730.721NoneMany

CDK11B

P21127‐1

ENST00000407249

Cyclin‐dependent kinase 11BCyclin dependent kinase involved in many roles. Pre‐mRNA splicing.GO:0050684; regulation of mRNA processing795No733–8010.7135NoneNone

CELF2

O95319‐1

ENST00000416382

CUGBP Elav‐like family member 2RNA splicingGO:0050684; regulation of mRNA processing508No413–4490.552NoneNone

CHD2

O14647‐1

ENST00000394196

Chromodomain‐helicase‐DNA‐binding protein 2DNA binding helicase. Promotes deposition of histone H3.3.GO:0032508; DNA duplex unwinding1,828No484–5190.7431Many

CHD4

Q14839‐1

ENST00000544040

Chromodomain‐helicase‐DNA‐binding protein 4Part of NuRD complex and remodels chromatinGO:0043044; ATP‐dependent chromatin remodeling1,912No1,110–1,160, 1,165–1,2120.672217

CLASRP

Q8N2M8‐1

ENST00000391953

CLK4‐associating serine/arginine rich proteinProbably functions as an alternative splice regulator.GO:0008380; RNA splicing674No1–320.8205NoneNone

CLCN4

P51793‐1

ENST00000380833

H(+)/Cl(−) exchange transporter 4Hydrogen chloride outward rectifying exchangerGO:0006811; ion transport760Yes519–5490.5841Many

CLTC

Q00610‐1

ENST00000269122

Clathrin heavy chain 1Central protein of clathrin coated pits. Key role in endocytosis.GO:0030001; metal ion transport1,675No1,302–1,3360.660None5

CNOT6L

Q96LI5‐1

ENST00000504123

CCR4‐NOT transcription complex subunit 6‐likeHas poly(A) exoribonuclease activity. Catalytic component of the CCR4‐NOT complex.GO:0006402; mRNA catabolic process555No404–4340.7255NoneNone

CPSF4

O95639‐1

ENST00000292476

Cleavage and polyadenylation specificity factor subunit 4Pre‐mRNA processing. Poly‐A capGO:0050684; regulation of mRNA processing269No68–1150.585NoneNone

CREB1

P16220‐2

ENST00000430624

Cyclic AMP‐responsive element‐binding protein 1Phosphorylation‐dependent transcription factor. Binds to CRE and is enhanced by TORC coactivators. Circadian rhythm and differentiation of adipose tissue.GO:0007623; circadian rhythm327No271–315 c 0.668None1

CREBL2

O60519‐1

ENST00000228865

cAMP‐responsive element‐binding protein‐like 2May play a role in cell cycle. Transcriptional activity involved in adipose differentiation.GO:0006351; transcription, DNA‐templated120No20–540.836NoneNone

CSNK2B

P67870‐1

ENST00000375882

Casein kinase II subunit betaRegulatory subunit of casein kinase 2, a normally constitutively active kinase. Participates in Wnt signaling.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling215No1–190.569513

CSTF2

P33240‐1

ENST00000372972

Cleavage stimulation factor subunit 2Required for polyadenylation and pre‐mRNA cleavageGO:0006379; mRNA cleavage577No555–577 c 0.8155NoneNone

CTCF

P49711‐1

ENST00000264010

Transcriptional repressor CTCFInvolved in transcriptional regulation by binding to chromatin insulators. Plays a role in CENPE recruitment during mitosis.GO:0071824; protein–DNA complex subunit organization727No279–3240.6235None16

CUL1

Q13616‐1

ENST00000325222

Cullin‐1Core component of cullin‐RING‐based SCF E3 ubiquitin ligase in ubiquitination of proteins involved in cell cycle progression.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling776No532–5660.484NoneNone

CUL4B

Q13620‐2

ENST00000371322

Cullin‐4BCore component of cullin‐RING‐based E3 ubiquitin ligaseGO:0016567; protein ubiquitination913No709–742 c 0.631NoneMany

DDX3X

O00571‐2 d

ENST00000457138

ATP‐dependent RNA helicase DDX3XATP‐dependent helicase. Binds RNA G4s. Transcription regulation. Required for ATF4 mRNA translation. Mediates virus replication.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling646No476–507 d 0.6031Many

DENND1A

Q8TEH3‐1

ENST00000373624

DENN domain‐containing protein 1AGuanine nucleotide exchange factor regulating clathrin endocytosis through RAB35 activationGO:0046907; intracellular transport1,009No1–180.875NoneNone

DHX15

O43143‐1

ENST00000336812

Pre‐mRNA‐splicing factor ATP‐dependent RNA helicase DHX15Pre‐mRNA processing factor involved in disassembly of spliceosomes.GO:0006397; mRNA processing795No462–509, 532–5730.501NoneNone

DHX9

Q08211‐1

ENST00000367549

ATP‐dependent RNA helicase AHelicase activity. Some mRNA splicing activity.GO:0050684; regulation of mRNA processing1,270No708–7570.666NoneNone

DKC1

O60832‐1

ENST00000369550

H/ACA ribonucleoprotein complex subunit DKC1Catalyzes uridine to pseudouridine in RNAGO:0006396; RNA processing514No88–138, 167–207, 218–256, 371–4040.5842Many

DLG3

Q92796‐1

ENST00000374360

Disks large homolog 3Role in learning, through NMDA receptor signalingGO:2000310; regulation of NMDA receptor activity817No522–5720.800None14

DUSP8

Q13202‐1

ENST00000397374

Dual specificity protein phosphatase 8Phosphatase that regulates MAPK activityGO:0009966; regulation of signal transduction625No610–6250.780NoneNone

EHBP1

Q8NDI1‐1

ENST00000263991

EH domain‐binding protein 1May play a role in actin reorganization.GO:0033036; macromolecule localization1,231No1–210.931NoneNone

EHMT2

Q96KQ7‐1

ENST00000375537

Histone‐lysine N‐methyltransferase EHMT2Histone methyltransferase that mono or di‐methylates H3K9GO:0016570; histone modification1,210No1,070–1,1080.823NoneNone

EIF1AX

P47813‐1

ENST00000379607

Eukaryotic translation initiation factor 1A, X‐chromosomalSeems to be required for maximal protein biosynthesis.GO:0006413; translational initiation144No5–45, 56–1280.2105NoneNone

EIF1AY

O14602‐1

ENST00000361365

Eukaryotic translation initiation factor 1A, Y‐chromosomalSeems to be required for maximal protein biosynthesis rate.GO:0006413; translational initiation144No124–1440.5075NoneNone

EIF2S2

P20042‐1

ENST00000374980

Eukaryotic translation initiation factor 2 subunit 2Initiation of translationGO:0009790; embryo development333No226–285, 318–3330.645NoneNone

EIF2S3

P41091‐1

ENST00000253039

Eukaryotic translation initiation factor 2 subunit 3Subunit of eIF‐2 involved in early steps of protein synthesis.GO:0006413; translational initiation472No150–204, 429–4610.478None6

EIF3A

Q14152‐1

ENST00000369144

Eukaryotic translation initiation factor 3 subunit ASubunit of the eIF‐3 complex. Required for protein synthesis. Targets a subset of mRNA involved in cell proliferation.GO:0006413; translational initiation1,382No1–180.887NoneNone

EIF4A3

P38919‐1 b

ENST00000649764

Eukaryotic initiation factor 4A‐IIIATP dependent helicase. Pre‐mRNA splicing. Core component of exon junction complex. Involved in craniofacial development.GO:0009790; embryo development411No204–234 b (based on ENST00000269349)0.4735None1

ERH

P84090‐1

ENST00000557016

Enhancer of rudimentary homologMay have a role in cell cycleGO:0007049; cell cycle104No30–610.325NoneNone

ETF1

P62495‐1

ENST00000360541

Eukaryotic peptide chain release factor subunit 1Directs termination of nascent peptide synthesis in response to stop codons. Component of SURF complex.GO:0002184; cytoplasmic translational termination437No55–87, 324–3550.490NoneNone

F8

P00451‐1

ENST00000360256

Coagulation factor VIIIFactor VIII, along with calcium and phospholipid, acts as a cofactor for F9/factor IXa, when it converts F10/factor X to the activated form, factor Xa.GO:0016491; oxidoreductase activity2,351No95–1310.8934Many

FGD1

P98174‐1

ENST00000375135

FYVE, RhoGEF, and PH domain‐containing protein 1Activates CDC42. Plays a role in cytoskeleton and cell shapeGO:0007010; cytoskeleton organization961No575–616, 739–7690.76151Many

FMR1

Q06787‐1

ENST00000370475

Synaptic functional regulator FMR1mRNA regulation. Maybe DNA repair in neuronal cells.GO:0050684; regulation of mRNA processing632No69–990.858None16

FOXG1

P55316‐1

ENST00000313071

Forkhead box protein G1Transcription repression factor important for neurogenesis.GO:0007420; brain development489No175–209, 217–2470.5642Many

FOXJ3

Q9UPW0‐1

ENST00000372572

Forkhead box protein J3Transcriptional activator of MEF2C. Plays an important role in spermatogenesis.GO:0010468; regulation of gene expression622No100–1390.848NoneNone

GABPA

Q06546‐1

ENST00000354828

GA‐binding protein alpha chainTranscription factor capable of interacting with purine rich repeats.GO:0009790; embryo development454No376–4060.698NoneNone

GABRA2

P47869‐1

ENST00000514090

Gamma‐aminobutyric acid receptor subunit alpha‐2Ligand gated chloride channel that is a component of the receptor for GABA.GO:0099536; synaptic signaling451Yes279–3160.6545NoneNone

GABRA3

P34903‐1

ENST00000370314

Gamma‐aminobutyric acid receptor subunit alpha‐3GABA receptorGO:0099536; synaptic signaling492Yes306–3380.688NoneNone

GABRB2

P63137‐1

ENST00000274547

Gamma‐aminobutyric acid receptor subunit beta‐2Ligand‐gated chloride channel component of the GABA receptor.GO:0099536; synaptic signaling512Yes151–1900.618NoneMany

GDF11

O95390‐1

ENST00000257868

Growth/differentiation factor 11Secreted signal involved in development.GO:0045664; regulation of neuron differentiation407No1–210.728NoneNone

GJB1

P08034‐1

ENST00000374029

Gap junction beta‐1 proteinForms gap junctionsGO:0007267; cell–cell signaling283Yes50–890.7133Many

GLRA2

P23416‐1

ENST00000218075

Glycine receptor subunit alpha‐2Glycine ligand gated chloride channel. Also triggered by taurine and beta‐alanine.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling452Yes261–3150.698NoneNone

GNAL

P38405‐1

ENST00000535121

Guanine nucleotide‐binding protein G(olf) subunit alphaG protein that may be involved in olfactory and visual transduction.GO:0019932; second‐messenger‐mediated signaling381No35–66, 140–171, 178–215 c 0.61516

GNAQ

P50148‐1

ENST00000286548

Guanine nucleotide‐binding protein G(q) subunit alphaG protein involved in many transmembrane signaling pathways. Is important for B cell selection and chemotaxis of neutrophils and dendritic cells.GO:0019932; second‐messenger‐mediated signaling359No171–2020.575None4

GNAS

Q5JWF2‐1

ENST00000371100

Guanine nucleotide‐binding protein G(s) subunit alpha isoforms shortG protein that is activated by GPCRs including beta‐adrenergic receptors, stimulates Ras signaling.NA1,037No899–9590.8455None17

GNG3

P63215‐1

ENST00000294117

Guanine nucleotide‐binding protein G(I)/G(S)/G(O) subunit gamma‐3G protein subunit and required for GTPase activity.GO:0055074; calcium ion homeostasis75No58–750.819None1

GOLGA8G

Q08AF8‐1 b

ENST00000526619

Putative golgin subfamily A member 8F/8GPossibly a pseudogeneGO:0000226; microtubule cytoskeleton organization430No389–422 b (based on ENST00000525590)0.9045NoneNone

GORASP2

Q9H8Y8‐1

ENST00000234160

Golgi reassembly‐stacking protein 2Role in assembly and membrane stacking of the Golgi cisternae. May regulate intracellular transport. Required for normal acrosome formation in spermiogenesis. Mediates ER‐stress and induced unconventional trafficking of core‐glycosylated CFTR to cell membrane.GO:0045184; establishment of protein localization454No1–160.896NoneNone

GRIA2

P42262‐1

ENST00000264426

Glutamate receptor 2Receptor for glutamate that functions as an ion channel in the CNS.GO:0099536; synaptic signaling833Yes521–5540.6745None1

GRIA3

P42263‐1 b

ENST00000622768

Glutamate receptor 3Glutamate gated ion channelGO:0007215; glutamate receptor signaling pathway894Yes487–543, 602–638, 745–777, 879–894 b (based on ENST00000371256)0.6082Many

GRIN1

Q05586‐1

ENST00000371561

Glutamate receptor ionotropic, NMDA 1NMDA subunit that binds glutamateGO:0050684; regulation of mRNA processing938Yes549–581 c 0.5443Many

GRIN2A

Q12879‐1

ENST00000396573

Glutamate receptor ionotropic, NMDA 2ALigand‐gated ion channelGO:0030001; metal ion transport1,464Yes631–6700.7995Many

GRIN2B

Q13224‐1

ENST00000609686

Glutamate receptor ionotropic, NMDA 2BComponent of NMDA receptor complexGO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling1,484Yes504–567, 662–700, 744–7740.6595Many

GSPT1

P15170‐1

ENST00000563468

Eukaryotic peptide chain release factor GTP‐binding subunit ERF3ATranslation terminationGO:0002184; cytoplasmic translational termination499No170–204 c 0.739NoneNone

HCFC1

P51610‐1

ENST00000310441

Host cell factor 1Control of cell cycle from G1 to S. Coactivator of GABP2.GO:0009790; embryo development2,035No143–173, 207–240, 1,600–1,631, 1,976–2,0070.6481Many

HMGN4

O00479‐1

ENST00000377575

High mobility group nucleosome‐binding domain‐containing protein 4Chromatin bindingGO:0031492; nucleosomal DNA binding90No1–170.978NoneNone

HNF1B

P35680‐1 b

ENST00000617811

Hepatocyte nuclear factor 1‐betaTranscription factor. Binds to FPC element in PLAU gene. Organ development.GO:0009790; embryo development557No281–313 b (based on ENST00000225893)0.8195NoneMany

HNRNPC

P07910‐1

ENST00000554455

Heterogeneous nuclear ribonucleoproteins C1/C2Binds pre‐mRNA and nucleates the assembly of 40S hnRNP particles. May play a role in spliceosome assembly and pre‐mRNA splicing.GO:0043487; regulation of RNA stability306No39–830.67NoneNone

HNRNPD

Q14103‐1

ENST00000313899

Heterogeneous nuclear ribonucleoprotein D0Binds with high affinity to RNA with AU‐rich elements. Functions as transcription factor.GO:0006401; RNA catabolic process355No108–1480.924NoneNone

HNRNPH2

P55795‐1

ENST00000316594

Heterogeneous nuclear ribonucleoprotein H2Component of hnRNP which processes pre‐mRNAs.GO:0008380; RNA splicing449No78–111, 151–1900.504None5

HNRNPK

P61978‐2 d

ENST00000351839

Heterogeneous nuclear ribonucleoprotein KmRNA processing (one of major pre‐mRNA binding proteins). DNA binding. TP53 coactivator.GO:0050684; regulation of mRNA processing463No441–463 a (based on ENST00000376263)0.6985None8

HSD17B10

Q99714‐1

ENST00000168216

3‐hydroxyacyl‐CoA dehydrogenase type‐2Mitochondrial dehydrogenase involved in pathways of fatty acid, branched‐chain amino acid and steroid metabolism.GO:1901575; organic substance catabolic process261No145–1810.628None15

HUWE1

Q7Z6Z7‐1

ENST00000342160

E3 ubiquitin‐protein ligase HUWE1E3 ubiquitin ligaseGO:0000209; protein polyubiquitination4,374No499–529, 547–578, 3,006–3,042, 3,917–3,952, 4,358–4,3900.7281Many

INTS6

Q9UL03‐1

ENST00000311234

Integrator complex subunit 6Component of integrator complex. Involved in U1 and U2 transcription.GO:0006366; transcription by RNA polymerase II887No76–1070.7715NoneNone

IRAK1

P51617‐1

ENST00000369980

Interleukin‐1 receptor‐associated kinase 1Serine/threonine kinase that plays a critical role in initiating the innate immune system.GO:0002218; activation of innate immune response712No26–580.801None1

KAT7

O95251‐1

ENST00000259021

Histone acetyltransferase KAT7Catalytic component of HBO1 histone acetyltransferase complexes.GO:0016573; histone acetylation611No405–4430.678NoneNone

KCNA3

P22001‐1

ENST00000369769

Potassium voltage‐gated channel subfamily A member 3Voltage gated potassium channelGO:0030001; metal ion transport575Yes359–3920.865None1

KCNB1

Q14721‐1

ENST00000371741

Potassium voltage‐gated channel subfamily B member 1Voltage‐gated potassium channels that can form heterotetrameric channels with other potassium channels.GO:0030001; metal ion transport858Yes82–113, 326–357, 369–4130.6643Many

KCNC2

Q96PR1‐1

ENST00000549446

Potassium voltage‐gated channel subfamily C member 2Voltage gated potassium channel. Also acts in various signaling pathways such as NO signaling.GO:0030001; metal ion transport638Yes370–4010.720NoneNone

KCND3

Q9UK17‐1

ENST00000315987

Potassium voltage‐gated channel subfamily D member 3Voltage gated inactivated A‐type potassium channel. May contribute to current in heart or neuron.GO:0030001; metal ion transport655Yes298–332, 364–4070.74053Many

KCNH7

Q9NS40‐1

ENST00000332142

Potassium voltage‐gated channel subfamily H member 7Voltage gated potassium channelGO:0030001; metal ion transport1,196Yes612–6420.834None1

KCNJ3

P48549‐1

ENST00000295101

G protein‐activated inward rectifier potassium channel 1Inward rectifier potassium channel controlled by G proteins and playing a crucial role in regulating heartbeat.GO:0030001; metal ion transport501Yes161–1940.6365NoneNone

KCNMA1

Q12791‐1

ENST00000286628

Calcium‐activated potassium channel subunit alpha‐1Export of potassium triggered by changes in cytosolic calcium or magnesium. Regulates smooth muscles, hair cells in cochlea, transmitter release, and innate immunity.GO:0030001; metal ion transport1,236Yes554–598, 1,009–1,0390.681NoneMany

KCNQ2

O43526‐1

ENST00000359125

Potassium voltage‐gated channel subfamily KQT member 2Heterotetramerizes with KCNQ3 to form a voltage gated channel important for regulation of neuronal excitability.GO:0030001; metal ion transport872Yes197–2370.7215Many

KDM2A

Q9Y2K7‐1

ENST00000529006

Lysine‐specific demethylase 2AHistone demethylase that preferentially demethylates H3K36. Regulates circadian clock.GO:0007623; circadian rhythm1,162No275–311, 585–6270.704NoneNone

KDM3B

Q7LBC6‐1

ENST00000314358

Lysine‐specific demethylase 3BHistone demethylase that specifically demethylates histone H3.GO:0016570; histone modification1,761No1,678–1,714, 1,716–1,7480.786NoneNone

KIF11

P52732‐1

ENST00000260731

Kinesin‐like protein KIF11Motor protein required for establishing a bipolar spindle during mitosis. Also involved in Golgi‐to‐cell surface trafficking.GO:0007051; spindle organization1,056No259–2990.840None13

KIF1A

Q12756‐1

ENST00000498729

Kinesin‐like protein KIF1AMotor for anterograde axonal transport of synaptic vesicle precursors. Interacts with CALM1. Required for neuronal dense core vesicles transport to dendritic spines and axons.GO:0030705; cytoskeleton‐dependent intracellular transport1,791No1,465–1,4980.7791Many

KIF5A

Q12840‐1

ENST00000455537

Kinesin heavy chain isoform 5AKinesin transport of neurofilament proteinsGO:0030705; cytoskeleton‐dependent intracellular transport1,032No230–2640.7881Many

KMT2C

Q8NEZ4‐1

ENST00000262189

Histone‐lysine N‐methyltransferase 2CHistone methyltransferase to H3K4. Chromatin remodeling.GO:0016571; histone methylation4,911No349–3790.923NoneMany

KPNB1

Q14974‐1

ENST00000290158

Importin subunit beta‐1Binds to nuclear localization signals and imports proteins into the nucleus.GO:0051169; nuclear transport876No706–7450.580NoneNone

LPA

P08519‐1

ENST00000316300

Apolipoprotein(a)Main constituent of lipoprotein(a). Serine protease activity. Inhibits plasminogen activator 1.GO:0006508; proteolysis4,584No99–1300.935NoneNone

LUC7L3

O95232‐1

ENST00000505658

Luc7‐like protein 3Binds cAMP regulatory element DNA sequence. May play a role in RNA splicingGO:0008380; RNA splicing432No185–2250.808NoneNone

MAMLD1

Q13495‐4 a

ENST00000426613

Mastermind‐like domain‐containing protein 1Transactivates HES3 independent of NOTCHGO:0006357; regulation of transcription by RNA polymerase II749No713–747 a (based on ENST00000432680)0.916NoneNone

MAPRE2

Q15555‐1

ENST00000300249

Microtubule‐associated protein RP/EB family member 2May be involved in microtubule polymerization by anchoring at centrosome.GO:0051493; regulation of cytoskeleton organization327No45–87, 131–1620.578513

MED12

Q93074‐1

ENST00000374080

Mediator of RNA polymerase II transcription subunit 12Component of mediator complex. Involved in the regulation of nearly all RNA pol‐II dependent genes. May specifically regulate transcription of targets of Wnt signaling pathway and SHH signaling.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling2,177No1,138–1,1690.6931Many

MED14

O60244‐1

ENST00000324817

Mediator of RNA polymerase II transcription subunit 14Component of the mediator complex, needed for nearly all RNA pol II dependent genes.GO:0006366; transcription by RNA polymerase II1,454No1,277–1,3080.845NoneNone

MEF2C

Q06413‐1

ENST00000437473

Myocyte‐specific enhancer factor 2CTranscription activator that binds specifically to MEF2 element in many muscle‐specific genes. Controls cardiac morphogenesis and myogenesis. Plays a role in hippocampal learning. Important for immune cells.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling473No1–360.64856Many

METTL14

Q9HCE5‐1

ENST00000388822

N6‐adenosine‐methyltransferase non‐catalytic subunitComponent of methyltransferase complex that methylates at the N6 position of some mRNAs and regulates circadian rhythm, differentiation of embryonic stem cells and cortical neurogenesis.GO:0032259; methylation456No104–1350.801NoneNone

MMP16

P51512‐1

ENST00000286614

Matrix metalloproteinase‐16Endopeptidase that degrades components of extracellular matrix. Matrix remodeling of blood vessels.GO:0009790; embryo development607Yes199–2330.7975NoneNone

MOB4

Q9Y3A3‐1

ENST00000323303

MOB‐like protein phoceinMay play a role in membrane trafficking, specifically membrane budding.GO:0046872; metal ion binding225No123–1530.7225NoneNone

MRC1

P22897‐1 b

ENST00000569591

Macrophage mannose receptor 1Mediates endocytosis of glycoproteins.GO:0044419; interspecies interaction between organisms1,456Yes168–213, 411–471b (based on ENST00000239761)0.778None1

MYB

P10242‐1

ENST00000367814

Transcriptional activator MybTranscriptional activator. DNA‐binding to YAAC[GT]G. Plays an important role in the control of proliferation and differentiation of hematopoietic progenitor cells.GO:0006338; chromatin remodeling640No118–1570.841NoneNone

NAA10

P41227‐1

ENST00000464845

N‐alpha‐acetyltransferase 10Acetyltransferase, particularly the first amino acid following removal of methionine.GO:0006473; protein acetylation235No17–47, 53–95, 104–1480.4335621

NAA15

Q9BXJ9‐1

ENST00000296543

N‐alpha‐acetyltransferase 15, NatA auxiliary subunitAuxiliary subunit of N‐terminal acetyltransferase activity. May be important for vascular, hematopoietic and neuronal growth and development. Required to control retinal neovascularization.GO:0006473; protein acetylation866No97–1370.795NoneNone

NEDD8

Q15843‐1

ENST00000250495

NEDD8Plays an important role in cell cycle control and embryogenesis via its conjugation to target proteins. Ubiquitin‐likeGO:0043687; post‐translational protein modification81No16–460.4175NoneNone

NIPBL

Q6KC79‐1

ENST00000282516

Nipped‐B‐like proteinLoading of cohesin complex onto chromatin.GO:0009790; embryo development2,804No2,073–2,1070.8163Many

NONO

Q15233‐1

ENST00000276079

Non‐POU domain‐containing octamer‐binding proteinPlays a variety of roles in nuclear processes.GO:0006281; DNA repair471No174–216, 221–2650.600None5

NR4A2

P43354‐1

ENST00000339562

Nuclear receptor subfamily 4 group A member 2Transcriptional regulator for differentiation of neurons during development. Crucial for expression of SLC6A3, SLC18A2, TH, and DRD2.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling598No261–293, 318–3480.825NoneNone

NRBP1

Q9UHY1‐1 a

ENST00000379852

Nuclear receptor‐binding proteinMay play a role in trafficking between ER and the Golgi through interaction with rho‐type GTPases.GO:0006810; transport535No199–232 a (based on ENST00000379863)0.671NoneNone

NSMF

Q6X4W1‐1

ENST00000371475

NMDA receptor synaptonuclear signaling and neuronal migration factorPart of CREB shut off pathway. Couples NMDA‐sensitive glutamate receptor and triggers long lasting changes to dendrites and synapses.GO:0048814; regulation of dendrite morphogenesis530No1–190.804NoneNone

NUDT11

Q96G61‐1

ENST00000375992

Diphosphoinositol polyphosphate phosphohydrolase 3‐betaCleaves a beta‐phosphate from the diphosphate groups in PP‐InsP5GO:0009058; biosynthetic process164No1–200.522NoneNone

NUDT21

O43809‐1

ENST00000300291

Cleavage and polyadenylation specificity factor subunit 5Component of cleavage factor Im. Involved in mRNA processing.GO:0050684; regulation of mRNA processing227No170–2150.5065NoneNone

OGT

O15294‐1

ENST00000373719

UDP‐N‐acetylglucosamine‐peptide N‐acetylglucosaminyltransferase 110 kDa subunitGlycosylates other proteins.GO:0006493; protein O‐linked glycosylation1,046No21–52, 54–84, 212–248, 349–382, 384–452, 499–5380.54215

OR4F17

Q8NGA8‐1

ENST00000585993

Olfactory receptor 4F17Predicted olfactory receptorGO:0007165; signal transduction305Yes1–220.938NoneNone

OTUD5

Q96G74‐1

ENST00000156084

OTU domain‐containing protein 5Deubiquitinase functioning as a negative regulator of the immune system.GO:0016579; protein deubiquitination571No171–201, 343–4110.5085None1

PAK2

Q13177‐1

ENST00000327134

Serine/threonine‐protein kinase PAK 2Serine/threonine kinase. Involved in cytoskeleton regulation, cell motility, cell cycle progression, apoptosis, or proliferation. Downstream of CDC42 and RAC1.GO:0031098; stress‐activated protein kinase signaling cascade524No362–3970.719NoneNone

PAK3

O75914‐1

ENST00000372010

Serine/threonine‐protein kinase PAK 3Serine/threonine kinase that affects cytoskeleton regulation, cell migration, and cell cycle. Acts downstream of CDC42.GO:0006468; protein phosphorylation559No68–99, 290–335, 413–455, 458–489 c 0.645115

PBX1

P40424‐1

ENST00000420696

Pre‐B‐cell leukemia transcription factor 1Binds DNA in conjunction with HOX proteins. Spleen development.GO:0009790; embryo development430No275–3060.632None4

PCBP2

Q15366‐1

ENST00000439930

Poly(rC)‐binding protein 2Single strand nucleotide binding protein that preferentially binds to dC. Acts as adaptor between MAVS and E3 ITCHGO:0043161; proteasome‐mediated ubiquitin‐dependent protein catabolic process365No90–1200.4825NoneNone

PCYT1B

Q9Y5K3‐1

ENST00000379144

Choline‐phosphate cytidylyltransferase BRate‐limiting step in the CDP‐choline pathway for phosphatidylcholine biosynthesisGO:0009058; biosynthetic process369No220–2680.675NoneNone

PHF5A

Q7RTV0‐1

ENST00000216252

PHD finger‐like domain‐containing protein 5AInvolved in PAF1 complex in transcriptional elongation. Involved in pre‐mRNA splicing and deposition of certain histones.GO:0006397; mRNA processing110No1–18, 85–1100.324NoneNone

PIK3CA

P42336‐1

ENST00000263967

Phosphatidylinositol 4,5‐bisphosphate 3‐kinase catalytic subunit alpha isoformSubunit of PI3KGO:0009749; response to glucose1,068No927–975, 1,010–1,0410.691NoneMany

PLS3

P13797‐1

ENST00000355899

Plastin‐3Actin bundling proteins found in microvilli, stereocilia, filopodia and may play a role in bone development.GO:0007010; cytoskeleton organization630No456–5030.766NoneNone

POLR2A

P24928‐1 b

NA

DNA‐directed RNA polymerase II subunit RPB1Forms RNA polymerase active center with another catalytic subunit.GO:0006366; transcription by RNA polymerase II1,970No476–506 b (based on ENST00000572844)0.667NoneNone

POLR2B

P30876‐1

ENST00000381227

DNA‐directed RNA polymerase II subunit RPB2DNA dependent RNA polymerase catalyzing transcription.GO:0006366; transcription by RNA polymerase II1,174No490–522, 524–557, 746–778, 979–1,026, 1,072–1,1150.627NoneNone

POU3F2

P20265‐1

ENST00000328345

Histone‐lysine N‐methyltransferase EHMT2Histone methyltransferase that mono or di‐methylates Lys‐9.GO:0006479; protein methylation443No278–3140.667NoneNone

POU3F3

P20264‐1

ENST00000361360

POU domain, class 3, transcription factor 3Transcription factor that acts synergistically with SOX11 and SOX4. Role in neuronal development.GO:0030900; forebrain development500No317–3520.641None1

PPP1CB

P62140‐1

ENST00000395366

Serine/threonine‐protein phosphatase PP1‐beta catalytic subunitProtein phosphatase that forms complexes with over 200 regulatory proteins. Glycogen metabolism, muscle contractility, protein synthesis, chromatin structure, and cell cycle progression.GO:0000278; mitotic cell cycle327No51–1130.3295211

PPP2CA

P67775‐1

ENST00000481195

Serine/threonine‐protein phosphatase 2A catalytic subunit alpha isoformMajor phosphatase for microtubule‐associated proteins.GO:1904528; Positive regulation of microtubule binding309No142–1810.392NoneNone

PPP3R1

P63098‐1

ENST00000234310

Calcineurin subunit B type 1Regulatory subunit of calcineurin, a calcium‐dependent, calmodulin stimulated protein phosphatase. Confers calcium sensitivity.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling170No62–960.464NoneNone

PRPF4B

Q13523‐1

ENST00000337659

Serine/threonine‐protein kinase PRP4 homologHas a role in pre‐mRNA splicing. Phosphorylates SF2/ASF.GO:0006468; protein phosphorylation1,007No811–8410.761NoneNone

PRPF8

Q6P2Q9‐1

ENST00000572621

Pre‐mRNA‐processing‐splicing factor 8Core component of spliceosome.GO:0000398; mRNA splicing, via spliceosome2,335No505–537, 764–800, 837–879, 1,494–1,539, 1,811–1,848, 1,888–1,9190.527None21

PRPS1

P60891‐1

ENST00000372435

Ribose‐phosphate pyrophosphokinase 1Essential for nucleotide synthesis.GO:0019438; aromatic compound biosynthetic process318No1–30, 84–121, 123–162, 168–1990.4467Many

PSMC1

P62191‐1

ENST00000261303

26S proteasome regulatory subunit 426S proteasome subunitGO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling440No291–3320.652NoneNone

PSMC2

P35998‐1

ENST00000435765

26S proteasome regulatory subunit 7Component of 26S proteasomeGO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling433No284–3180.644NoneNone

PSMC5

P62195‐1

ENST00000310144

26S proteasome regulatory subunit 826S proteasome subunitGO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling406No112–144, 158–1880.563NoneNone

PSMD14

O00487‐1

ENST00000409682

26S proteasome non‐ATPase regulatory subunit 1426S proteasome subunit. Metalloprotease that specifically cleaves “Lys‐63” linked polyubiquitin chains. Plays a role in DSBs and in recombination repair by promoting RAD51 loading.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling310No65–970.515NoneNone

PUF60

Q9UHX1‐1

ENST00000526683

Poly(U)‐binding‐splicing factor PUF60DNA and RNA binding, involved in several nuclear processes such as pre‐mRNA splicing, apoptosis, and transcription regulation. Binds to poly(U) RNA.GO:0000398; mRNA splicing, via spliceosome559No90–1640.517515

PURA

Q00577‐1

ENST00000331327

Transcriptional activator protein Pur‐alphaProbable transcription activator that binds to purine rich single strand of PUR element upstream of the MYC geneGO:0032508; DNA duplex unwinding322No54–920.4642Many

RAB2A

P61019‐1

ENST00000262646

Ras‐related protein Rab‐2ARequired for transport from ER to GolgiGO:0046907; intracellular transport212No8–460.563NoneNone

RAC1

P63000‐1

ENST00000348035

Ras‐related C3 botulinum toxin substrate 1GTPase that cycles between an active GTP‐bound state and an inactive GDP‐bound state and plays a role in secretory processes, phagocytosis of apoptotic cells, epithelial cell polarization, neuronal adhesion, migration and differentiation, and growth‐factor induced formation of membrane ruffles.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling192No141–171 c 0.236None9

RAN

P62826‐1

ENST00000543796

GTP‐binding nuclear protein RanGTPase involved in nucleocytoplasmic import/export. Required for normal progression through mitosis.GO:0071426; ribonucleoprotein complex export from nucleus216No12–50, 118–1870.179NoneNone

RBBP4

Q09028‐1

ENST00000373493

Histone‐binding protein RBBP4Core histone binding subunit. Chromatin remodeling. Component of CAF‐1, HDAC, NuRD, PRC2, and NURF.GO:0043044; ATP‐dependent chromatin remodeling425No12–63, 228–260, 294–333, 335–3840.3305NoneNone

RBBP5

Q15291‐1

ENST00000264515

Retinoblastoma‐binding protein 5Plays crucial role in differentiation potential in embryonic stem cells. Gene regulation. Stimulates histone methyltransferases.GO:0016569; covalent chromatin modification538No1–180.716NoneNone

RBBP7

Q16576‐1

ENST00000380087

Histone‐binding protein RBBP7Core histone binding subunit that may target histone remodeling factors. Component of some histone remodeling complexes.GO:0006338; chromatin remodeling425No122–156 c 0.5405NoneNone

RBM10

P98175‐1

ENST00000377604

RNA‐binding protein 10mRNA processingGO:0050684; regulation of mRNA processing930No332–3750.7125None4

RBM22

Q9NW64‐1

ENST00000199814

Pre‐mRNA‐splicing factor RBM22Required for pre‐mRNA splicing as component of the activated spliceosome.GO:0000398; mRNA splicing, via spliceosome420No20–500.798NoneNone

RBM3

P98179‐1

ENST00000376759

RNA‐binding protein 3RNA bindingGO:0050684; regulation of mRNA processing157No1–200.853NoneNone

RBM39

Q14498‐1

ENST00000253363

RNA‐binding protein 39Acts as pre‐mRNA splicing factor.GO:0008380; RNA splicing530No373–4030.655NoneNone

RBMX2

Q9Y388‐1

ENST00000305536

RNA‐binding motif protein, X‐linked 2Involved in pre‐mRNA splicing as component of spliceosome.GO:0008380; RNA splicing322No1–170.907None1

RBMY1A1

P0DJD3‐1

ENST00000382707

RNA‐binding motif protein, Y chromosome, family 1 member A1mRNA bindingGO:0050684; regulation of mRNA processing496No99–1300.934NoneNone

RHOA

P61586‐1

ENST00000418115

Transforming protein RhoAGTPase involved in cytoskeleton organization. Regulates KCNA2. Can be activated by CaMKII.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling193No1–770.26889

RHOB

P62745‐1

ENST00000272233

Rho‐related GTP‐binding protein RhoBMediates apoptosis in neoplastically transformed cells after DNA damage. Myosin contractile ring formation during cell cycle cytokinesis.GO:0000278; mitotic cell cycle196No18–480.566NoneNone

RPL10

P27635‐1

ENST00000424325

60S ribosomal protein L10Component of large ribosomal subunit. May play a role in embryonic brain development.GO:0009790; embryo development214No48–780.40518

RPL36A

P83881‐1

ENST00000553110

60S ribosomal protein L36aRibosomal proteinGO:0002181; cytoplasmic translation106No81–106 c 0.580NoneNone

RPS28

P62857‐1

ENST00000600659

40S ribosomal protein S28NAGO:0006413; translational initiation69No54–690.5585None1

RPS6KA3

P51812‐1

ENST00000379565

Ribosomal protein S6 kinase alpha‐3Serine/threonine kinase downstream of ERK. Regulates translation. Modulates mTOR signaling. Role in other pathways.GO:0006468; protein phosphorylation740No114–150, 457–494, 559–595, 681–7110.58251Many

RRAGA

Q7L523‐1

ENST00000380527

Ras‐related GTP‐binding protein AGuanine nucleotide binding protein that plays an important role in MTORC1 signaling for amino acid availability. May lead to cell death through TNF‐α signaling.GO:0043200; response to amino acid313No16–460.5725NoneNone

RRM2

P31350‐1

ENST00000304567

Ribonucleoside‐diphosphate reductase subunit M2Provides the precursors necessary for DNA synthesis.GO:0019438; aromatic compound biosynthetic process389No345–378 c 0.769NoneNone

RTF1

Q92541‐1

ENST00000389629

RNA polymerase‐associated protein RTF1 homologComponent of Paf1 complex. Implicated in regulation of development of embryonic stem cell pluripotency. Required for Wnt and Hox genes.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling710No688–7100.672NoneNone

RYR2

Q92736‐1

ENST00000366574

Ryanodine receptor 2Mediates calcium release from the sarcoplasmic reticulum and plays a critical role in cardiac muscle contraction.GO:0030001; metal ion transport4,967Yes4,856–4,8890.8411Many

SAT1

P21673‐1

ENST00000379270

Diamine acetyltransferase 1Acetylation of small molecule polyamines (e.g., spermidine)GO:0009058; biosynthetic process171No148–1710.533NoneNone

SCN2A

Q99250‐1

ENST00000375437

Sodium channel protein type 2 subunit alpha (NaV1.2)Voltage dependent release of sodium permeability. Implicated in hippocampal replay occurring with sharp wave ripples.GO:0030001; metal ion transport2,005Yes404–434, 854–8850.72851Many

SCN8A

Q9UQD0‐1

ENST00000354534

Sodium channel protein type 8 subunit alpha (NaV1.6)Voltage dependent sodium ion channelGO:0030001; metal ion transport1,980Yes396–439, 837–875, 910–961, 1,287–1,323, 1,449–1,499, 1,639–1,671, 1,680–1,716, 1,746–1,7760.639ManyMany

SF1

Q15637‐1

ENST00000377390

Splicing factor 1Required for first step in ATP dependent spliceosome assemblyGO:0000387; spliceosomal snRNP assembly639No223–258 c 0.751NoneNone

SF3A2

Q15428‐1

ENST00000221494

Splicing factor 3A subunit 2Involved in pre‐mRNA splicing as a component of the SF3A complex.GO:0006376; mRNA splice site selection464No42–720.714NoneNone

SF3B1

O75533‐1

ENST00000335508

Splicing factor 3B subunit 1Pre‐mRNA splicing as part of SF3B complex.GO:0000245; spliceosomal complex assembly1,304No537–573, 816–848, 962–1,002, 1,005–1,062, 1,133–1,170, 1,189–1,2360.567None18

SF3B4

Q15427‐1

ENST00000271628

Splicing factor 3B subunit 4mRNA splicingGO:0050684; regulation of mRNA processing424No1–230.64214

SIN3A

Q96ST3‐1

ENST00000394947

Paired amphipathic helix protein Sin3aTranscriptional repressor. Regulates cell cycle progression. Required for cortical neuron differentiation and callosal axon elongation.GO:0009790; embryo development1,273No112–1560.778None4

SLC25A5

P05141‐1

ENST00000317881

ADP/ATP translocase 2ADP:ATP antiporter that mediates ATP synthesis in the mitochondria.GO:0055085; transmembrane transport298Yes283–2980.714NoneNone

SLC9A6

Q92581‐1

ENST00000370698

Sodium/hydrogen exchanger 6Exchange of protons for sodium and potassium across endosomes. Contributes to calcium homeostasis.GO:0030001; metal ion transport699Yes334–371 c 0.758NoneMany

SMARCA2

P51531‐1

ENST00000382203

Probable global transcription activator SNF2L2Component of SWI/SNF complex which carries out chromatin remodeling. Also belongs to the neural progenitors‐specific chromatin remodeling complex (npBAF complex) and the neuron‐specific chromatin remodeling complex (nBAF complex).GO:0006338; chromatin remodeling1,590No933–9690.6341Many

SMARCA4

P51532‐1

ENST00000344626

Transcription activator BRG1Involved in chromatin remodeling, part of SWI/SNF complexGO:0006338; chromatin remodeling1,647No754–789, 879–912, 955–987, 1,035–1,0670.5591Many

SMARCA5

O60264‐1

ENST00000283131

SWI/SNF‐related matrix‐associated actin‐dependent regulator of chromatin subfamily A member 5Helicase that has ATP‐dependent nucleosome‐remodeling activity. Component of ISWI. Binds to histonesGO:0043044; ATP‐dependent chromatin remodeling1,052No290–3210.651NoneNone

SMARCE1

Q969G3‐1

ENST00000348513

SWI/SNF‐related matrix‐associated actin‐dependent regulator of chromatin subfamily E member 1Chromatin remodeling to activate or repress genes. Component of SWI/SNF.GO:0006338; chromatin remodeling411No54–1090.7855Many

SMC1A

Q14683‐1

ENST00000322213

Structural maintenance of chromosomes protein 1ACentral component of the cohesin complex, which is essential for cohesion of sister chromatids after DNA replication. Involved in DNA repair.GO:0007059; chromosome segregation1,233No36–73, 290–321, 636–670, 1,103–1,1530.52054Many

SNAI2

O43623‐1

ENST00000020945

Zinc finger protein SNAI2Transcriptional repressor. Involved in neural development.GO:0031056; regulation of histone modification268No202–2410.838None2

SNRPC

P09234‐1

ENST00000244520

U1 small nuclear ribonucleoprotein CComponent of U1 snRNP spliceosomeGO:0008380; RNA splicing159No9–470.689NoneNone

SNX12

Q9UMY4‐2

ENST00000374274

Sorting nexin‐12May be involved in intra‐cellular traffickingGO:0051049; regulation of transport162No6–370.691NoneNone

SP3

Q02447‐1

ENST00000310015

Transcription factor Sp3Transcription factor that can act as an activator or repressor depending on isoform or PTM. Binds to GT and GC boxes. Cell cycle regulation, hormone induction, and house‐keeping.GO:0009790; embryo development781No625–6580.8405NoneNone

SPIN1

Q9Y657‐1

ENST00000375859

Spindlin‐1Chromatin reader. Activator of Wnt. May play a role in cell‐cycle regulation during transition from gamete to embryo.GO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling262No83–115, 240–2620.616NoneNone

SPOP

O43791‐1

ENST00000393328

Speckle‐type POZ proteinComponent of Cullin ring based BCR E3 ubiquitin ligase.GO:0016567; protein ubiquitination374No19–60, 62–1240.453615

SRF

P11831‐1

ENST00000265354

Serum response factorTranscription factor that binds to serum response element. Together with MRTFA is coupled to cytoskeletal expression and dynamics. Required for cardiac differentiation and maturation.GO:0009790; embryo development508No134–1680.785NoneNone

SRSF10

O75494‐1

ENST00000492112

Serine/arginine‐rich splicing factor 10Pre‐mRNA splicingGO:0050684; regulation of mRNA processing262No1–44, 52–960.79NoneNone

SRSF2

Q01130‐1

ENST00000392485

Serine/arginine‐rich splicing factor 2Splicing of pre‐mRNAGO:0050684; regulation of mRNA processing221No1–170.603NoneNone

SRSF3

P84103‐1

ENST00000373715

Serine/arginine‐rich splicing factor 3RNA binding and pre‐mRNA cleavageGO:0050684; regulation of mRNA processing164No24–560.390NoneNone

SRY

Q05066‐1

ENST00000383070

Sex‐determining region Y proteinTranscriptional regulator that controls a genetic switch in male development.GO:0030238; male sex determination204No129–1630.9095None16

STAG2

Q8N3U4‐1

ENST00000371160

Cohesin subunit SA‐2Component of cohesin complex. Required for cohesion of sister chromatids after DNA replication.GO:0007059; chromosome segregation1,231No113–1450.700None5

SUMO2

P61956‐1

ENST00000420826

Small ubiquitin‐related modifier 2Ubiquitin like protein that can be attached to proteins on lysine residues.GO:0018205; peptidyl‐lysine modification95No17–540.231NoneNone

SUZ12

Q15022‐1

ENST00000322652

Polycomb protein SUZ12Polycomb group protein. Involved in histone methylation.GO:0034968; histone lysine methylation739No308–3420.811None2

TAF1

P21675‐1

ENST00000423759

Transcription initiation factor TFIID subunit 1Largest component and core scaffold of the TFIID basal transcription factor complex. Kinase and histone acetyltransferase activity.GO:0016573; histone acetylation1,872No1,328–1,3700.738225

TAOK1

Q7L7X3‐1

ENST00000261716

Serine/threonine‐protein kinase TAO1Serine/threonine protein kinase involved in MAPK cascade, DNA damage response, and regulation of cytoskeleton stabilityGO:0070507; regulation of microtubule cytoskeleton organization1,001No181–2150.742NoneNone

TBC1D3H

P0C7X1‐1 b

ENST00000455054

TBC1 domain family member 3HActs as a GTPase activating protein for RAB5.GO:0006886; intracellular protein transport549No344–376 b (based on ENST00000455054)0.986NoneNone

TBL1XR1

Q9BZK7‐1

ENST00000430069

F‐box‐like/WD repeat‐containing protein TBL1XR1Recruitment of ubiquitin/19S proteasome to nuclear receptor‐regulated transcription units. Probably acts as integral component of the N‐Cor corepressor complex.GO:0009790; embryo development514No261–294, 322–3550.588NoneMany

TCF4

P15884‐1

ENST00000564999

Transcription factor 4Transcription factor that binds to immunoglobulin enhancer. Involved in neuron differentiation.GO:0006366; transcription by RNA polymerase II667No552–587 c 0.7321Many

THOC2

Q8NI27‐1

ENST00000245838

THO complex subunit 2Required for efficient export of poly‐A spliced mRNA. Component of TREX complex.GO:0006397; mRNA processing1,593No135–188, 698–731, 733–786, 1,059–1,1000.653117

TLK2

Q86UE8‐1

ENST00000326270

Serine/threonine‐protein kinase tousled‐like 2Serine/threonine kinase involved in chromatin assemblyGO:1902275; regulation of chromatin organization772No651–6890.631None2

TOP1

P11387‐1

ENST00000361337

DNA topoisomerase 1Topoisomerase. DNA repair/strain resolution.GO:0009790; embryo development765No476–5120.700None2

TRA2B

P62995‐1

ENST00000453386

Transformer‐2 protein homolog betamRNA splicingGO:0050684; regulation of mRNA processing288No104–1380.757NoneNone

TRIM24

O15164‐1

ENST00000343526

Transcription intermediary factor 1‐alphaTranscriptional coactivator that interacts with numerous nuclear receptors and modulates transcription. Interacts with chromatin histone H3 modifications.GO:0006351; transcription, DNA‐templated1,050No818–8580.854NoneNone

TRPC5

Q9UL62‐1

ENST00000262839

Short transient receptor potential channel 5Calcium channel. Causes neuron apoptosis.GO:0030001; metal ion transport973Yes290–3230.716NoneNone

TUBA1A

Q71U36‐1

ENST00000301071

Tubulin alpha‐1A chainTubulin chainGO:0000226; microtubule cytoskeleton organization451No1–41, 43–172, 174–241, 243–286, 288–338, 340–399, 401–4390.000ManyMany

TUBA1B

P68363‐1

ENST00000336023

Tubulin alpha‐1B chainTubulin chainGO:0007017; microtubule‐based process451No1–18, 81–117, 119–160, 185–231, 243–286, 341–374, 382–4410.16NoneNone

TUBB

P07437‐1

ENST00000327892

Tubulin beta chainMajor component of microtubulesGO:0000226; microtubule cytoskeleton organization444No49–120, 122–157, 190–240, 242–273, 297–3300.171514

TUBB2A

Q13885‐1

ENST00000333628

Tubulin beta‐2A chainTubulin component.GO:0000226; microtubule cytoskeleton organization445No300–3670.256117

TUBB2B

Q9BVA1‐1

ENST00000259818

Tubulin beta‐2B chainTubulin component. Implicated in neuronal migration.GO:0009790; embryo development445No283–3560.2562Many

TUBB4B

P68371‐1

ENST00000340384

Tubulin beta‐4B chainSubunit of microtubulesGO:0007017; microtubule‐based process445No163–1980.257NoneNone

U2AF1

Q01081‐1

ENST00000291552

Splicing factor U2AF 35 kDa subunitPlays critical role in mRNA splicing.GO:0008380; RNA splicing240No1–320.44624

U2AF2

P26368‐1

ENST00000308924

Splicing factor U2AF 65 kDa subunitPre‐mRNA splicingGO:0050684; regulation of mRNA processing475No188–220, 245–3090.4625NoneNone

U2SURP

O15042‐1

ENST00000473835

U2 snRNP‐associated SURP motif‐containing proteinRNA bindingGO:0008380; RNA splicing1,029No291–325, 585–6210.7705NoneNone

UBC

P0CG48‐1

ENST00000536769

Polyubiquitin‐CUbiquitiGO:1905114; cell surface receptor signaling pathway involved in cell–cell signaling685No470–5090.455NoneNone

UBE2D3

P61077‐1

ENST00000453744

Ubiquitin‐conjugating enzyme E2 D3E2 ubiquitin enzymeGO:0006513; protein monoubiquitination147No33–98, 130–147 c 0.178NoneNone

UBE2E3

Q969T4‐1

ENST00000410062

Ubiquitin‐conjugating enzyme E2 E3Accepts ubiquitin from E1 complex. Participates in regulation of transepithelial sodium transport in renal cells.GO:0016567; protein ubiquitination207No65–950.546NoneNone

UBE2H

P62256‐1

ENST00000355621

Ubiquitin‐conjugating enzyme E2 HE2 ubiquitin ligaseGO:0000209; protein polyubiquitination183No33–670.326None1

UBE2I

P63279‐1

ENST00000355803

SUMO‐conjugating enzyme UBC9Covalently attaches SUMO to target proteins.GO:0018205; peptidyl‐lysine modification158No62–98, 117–1480.193NoneNone

UBE2K

P61086‐1

ENST00000261427

Ubiquitin‐conjugating enzyme E2 KE2 ubiquitin ligaseGO:0000209; protein polyubiquitination200No1–230.370NoneNone

UHRF2

Q96PU4‐1

ENST00000276893

E3 ubiquitin‐protein ligase UHRF2E3 ubiquitin ligase that plays important roles in DNA methylation, histone modifications, cell cycle, and DNA repair. Reads for 5‐hydroxymethylcytosine in DNA.GO:0016567; protein ubiquitination802No582–6260.721NoneNone

USP9Y

O00507‐1

ENST00000338981

Probable ubiquitin carboxyl‐terminal hydrolase FAF‐YProbable deubiquitinase. Essential component of TGF‐Beta/BMP signaling.GO:0016579; protein deubiquitination2,555No1,326–1,3810.951NoneNone

UTY

O14607‐1

ENST00000331397

Histone demethylase UTYMale specific histone demethylaseGO:0016570; histone modification1,347No124–158, 873–918, 1,094–1,138, 1,194–1,225 c 0.841NoneNone

VAV1

P15498‐1

ENST00000602142

Proto‐oncogene vavCouples tyrosine kinase signals with Rho/Rac GTPases, leading to cell differentiation and/or proliferation.GO:0010942; Positive regulation of cell death845No365–4010.778None3

WNK3

Q9BYP7‐1

ENST00000354646

Serine/threonine‐protein kinase WNK3Serine/threonine kinase that plays an important role in electrolyte homeostasis.GO:0043270; positive regulation of ion transport1,800No332–3640.915None2

XPR1

Q9UBH6‐1

ENST00000367590

Xenotropic and polytropic retrovirus receptor 1Phosphate homeostasis. Phosphate export. Binds inositol polyphosphates.GO:0006873; cellular ion homeostasis696Yes112–1490.79015

YTHDC1

Q96MU7‐1

ENST00000344157

YTH domain‐containing protein 1Pre‐mRNA splicing; mRNA export; involved in spermatogenesis.GO:0050684; regulation of mRNA processing727No361–3950.872NoneNone

YY1

P25490‐1

ENST00000262238

Transcriptional repressor protein YY1Transcription factor that exhibits positive and negative control on large number of genes. Binds to CCGCCATNTT.GO:0001558; regulation of cell growth414No288–326, 338–4100.52645

ZBTB16

Q05516‐1

ENST00000335953

Zinc finger and BTB domain‐containing protein 16Transcriptional repressor. May play a role in myeloid maturation.GO:0009790; embryo development673No576–6220.7855None1

ZBTB20

Q9HC78‐1

ENST00000474710

Zinc finger and BTB domain‐containing protein 20May be a transcription factor involved in hematopoiesis, oncogenesis, and immune response.GO:0001678; cellular glucose homeostasis741No574–6180.7355919

ZEB2

O60315‐1

ENST00000558170

Zinc finger E‐box‐binding homeobox 2Transcriptional inhibitor of E‐cadherin and represses expression of MEOX2. Binds to CACCT in different promoters.GO:0009790; embryo development1,214No296–326, 1,064–1,0940.7861Many

ZFX

P17010‐1

ENST00000379177

Zinc finger X‐chromosomal proteinProbably a transcriptional activator.GO:0006357; regulation of transcription by RNA polymerase II805No414–4440.720NoneNone

ZFY

P08048‐1

ENST00000383052

Zinc finger Y‐chromosomal proteinProbable transcription factorGO:0006357; regulation of transcription by RNA polymerase II801No654–691 c 0.828NoneNone

ZMAT2

Q96NC0‐1

ENST00000274712

Zinc finger matrin‐type protein 2Involved in pre‐mRNA splicing as a component of the spliceosome.GO:0000398; mRNA splicing, via spliceosome199No67–1010.709NoneNone

ZMYM3

Q14202‐1

ENST00000314425

Zinc finger MYM‐type protein 3Plays a role in cell morphology and cytoskeletal organization.GO:0007010; cytoskeleton organization1,370No1,185–1,2150.724None3

ZMYND8

Q9ULU4‐1

ENST00000311275

Protein kinase C‐binding protein 1Transcriptional corepressor for KDM5D. Function seems to be histone recognition.GO:0060284; regulation of cell development1,186No1,033–1072 c 0.790NoneNone

ZNF84

P51523‐1

ENST00000327668

Zinc finger protein 84May be involved in transcriptionGO:0006357; regulation of transcription by RNA polymerase II738No486–5270.885NoneNone

The transcript containing the intolerant segment is not found in UniProt and is not the canonical transcript. The amino acid numbering provided for the zero‐tolerance segment is based on the sequence of the protein encoded by the indicated transcript.

The transcript containing the intolerant segment is not found in UniProt. The amino acid numbering provided for the zero‐tolerance segment is based on the sequence of the protein encoded by the indicated transcript.

UniProt numbering, which in this case is different from the numbering of the MTR‐designated canonical transcript in gnomAD.

UniProt transcript used that is not considered the canonical sequence by UniProt.

Abbreviations: ATP, adenosine triphosphate; ER, endoplasmic reticulum; GO, gene ontology; GPCR, G‐protein coupled receptor; GDP, guanosine diphosphate; GTP, guanosine triphosphate; MTR, missense tolerance ratio; PTM, post‐translational modification.

FIGURE 1

Histograms for intolerance within the 257 proteins containing a zero‐tolerance segment. (a) Distribution of MTR scores for all possible 31 residue segments. Segments with a score in the “at or near zero” bin represent 1.9% of all segments. The mean MTR score is 0.69 ± 0.26 and the median score is 0.73. (b) Distribution of median protein MTR scores based on analysis of all possible 31 amino acid segments within each protein. The mean of these medians is 0.71 ± 0.26. MTR, missense tolerance ratio

In addition to the 257 proteins with certain zero‐tolerance segments, we also found 33 human proteins that have segments of 31 or more residues with an MTR score of 0, but for which the statistics associated with this score are uncertain because of an insufficient number of observed silent mutations within the intolerant segments. These 33 proteins are listed in Table S1 and will require additional data to determine whether the preliminary MTR = 0 score seen for at least one segment within each of these proteins is confirmed in a statistically robust manner. It may be significant that 7 of these 33 proteins have sites of known ClinVar variants, suggestive of high relevance to human health (Table S1).

Homology of human zero‐tolerance proteins to corresponding proteins from other mammals

For each of the 257 proteins containing one or more zero‐tolerance segments, we conducted BLASTP sequence homology searches for both the entire protein sequence and the zero‐tolerance segments. Results were analyzed for the 250 closest mammalian homologs. Figure 2 gives a representative sample of the results, while Figure S1 shows the results for all 257 proteins. For each protein the statistical distribution of sequence identity to the closest 250 mammalian homologs is presented for both full‐length protein sequence (black) and zero‐tolerance segment(s) (red).
FIGURE 2

Representative examples of sequence identity patterns for proteins containing zero‐tolerance segments, comparing both the whole‐protein (black plots) and the intolerant segmental (red plots) homology levels to the 250 nearest mammalian homologs following BLASTP searches of NCBI. GENE.1, GENE.2, and so forth indicate which non‐contiguous intolerant segment for that gene was searched. The distributions of sequence identities seen for the 250 closest homologs to each protein are presented as box‐and‐whiskers plots. The bold bar is the median, the edges of the box are the quartiles, and the whiskers extend to 1.5 times the interquartile range. The dots are outliers that lie beyond the whiskers. The complete results for all 257 proteins with zero‐tolerance segments are presented in Figure S1.

For most proteins, the median of the distribution of % sequence identity is much higher for the zero‐tolerance segments than for the entire protein sequence. Indeed, for a great many zero‐tolerance segments, the sequence identity to all 250 closest mammalian homologs is 100%. However, Figures 2 and S1 also show a number of exceptions, where the median sequence identity observed for a given zero‐tolerance segment is significantly less than 100%. For example, the intolerant segment found in CTCF exhibits a lower median identity to mammalian homologs than does the full sequence of that protein. There are a variety of potential explanations for why any given zero‐tolerance segment is not absolutely conserved among mammalian homologs. Because there are currently only roughly 150 fully sequenced mammalian genomes, considering the 250 nearest homologs implies that many of the homologs included in the analysis for a given protein are paralogs, not orthologs. Between paralogs, even functionally critical residues are sometimes expected to exhibit variation. Another and particularly intriguing possibility is that some less‐than‐100%‐conserved human zero‐tolerance segments play critical roles in establishing traits that are unique to humans. Only careful future studies of specific instances will provide convincing explanations for why some of the proteins shown in Figures 2 and S1 contain human zero‐tolerance segments that are less than 100% conserved.
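The box‐and‐whiskers summaries described for Figure 2 can be computed for any one protein from its list of percent identities. The following is a minimal sketch and not the authors' script: the function name `box_summary`, the choice of quartile method, and the example identity values are all assumptions made for illustration.

```python
from statistics import quantiles

def box_summary(identities):
    """Summarize percent-identity values the way a box-and-whiskers plot does."""
    values = sorted(identities)
    # Quartiles via the 'inclusive' method (an assumption; plotting tools differ).
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": q2,
        "q1": q1,
        "q3": q3,
        # Whiskers reach the most extreme values still inside the fences.
        "whisker_low": min(inside, default=q1),
        "whisker_high": max(inside, default=q3),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical identities (%) of one zero-tolerance segment to ten homologs.
segment_identities = [100.0] * 8 + [96.8, 71.0]
summary = box_summary(segment_identities)
print(summary["median"], summary["outliers"])
```

In this invented case the box collapses onto 100%, so the two less‐conserved homologs appear as outlier dots, which is the pattern the text describes for segments that are identical in nearly all homologs.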

Likely protein basis for most evolutionary intolerance associated with protein‐encoding genes

While it cannot be ruled out for all entries in Table 1 that the mechanism responsible for evolutionary purifying selection involves changes in parent DNA or mRNA structure (see the previous review), we hypothesize that for the vast majority of cases, evolutionary intolerance stems from the altered properties of the encoded mutant protein. This is supported here by two observations. (a) A number of the proteins listed in Table 1 are known to directly form complexes with other proteins appearing in this table, suggesting that disruption of a critical multi‐protein complex by a single mutation in a single subunit is a common mechanism underlying zero‐tolerance. In some other cases, multiple proteins containing zero‐tolerance segments are seen to be on the same pathways, even if they do not actually form a complex. (b) Many proteins appearing in Table 1 are known to be central players in human biology and physiology, and so are proteins that one might expect to contain intolerant segments. These include calmodulin, ubiquitin, SUMO, clathrin, various tubulin subunits, actin, and the ryanodine receptor. In light of these considerations, this paper focuses on the implications of genetic intolerance as it relates to the encoded proteins.

A case study of intolerance: The voltage‐gated potassium and sodium channels

Although a comprehensive structural study of all 257 proteins with zero‐tolerance regions is beyond the scope of the current work, we nonetheless subjectively perused several dozen of the proteins in Table 1. The results suggest that zero‐tolerance segments tend to occupy well‐structured regions of proteins, often including functionally critical sites. An example is provided by the six voltage‐gated potassium channels and two voltage‐gated sodium channels appearing in Table 1 (see list in Table S2). With only one exception, the 19 intolerant segments documented for these eight channels lie within the part of the channel that extends from the critical transmembrane S4 segment of the voltage sensor domain through the transmembrane S6 segment, which includes the channel gate and ends the pore domain (see Table S2). The structural elements in this span are all known to be critical to voltage‐gated sodium and potassium channel function: S4 and the S4–S5 linker are central to channel regulation by the transmembrane electrical potential, while the actual pore comprises S5, the selectivity filter, the pore helix, and S6. The location of zero‐tolerance segments in these structural elements strongly implicates mutation‐induced alteration of channel function as the mechanistic basis for the evolutionary intolerance associated with these segments. It is interesting that the exact location of the intolerant segments varies from channel to channel. For example, the single zero‐tolerance segment in KCNA3 spans the S4 segment and S4–S5 linker, which are key for voltage regulation, while the single zero‐tolerance segment in KCNH7 spans the pore helix, selectivity filter, and S6, which are critical for ion selectivity and flux. The single zero‐tolerance segment not located within the functionally central S4 through S6 part of the channels is the segment spanning residues 82–113 of the KCNB1 potassium channel. This segment is located in the N‐terminal tetramerization (T) domain, a domain found in some, but not all, voltage‐gated potassium channels. Mutagenesis studies of H105, located within this intolerant segment, revealed that mutations at this site do not interfere with KCNB1 homotetramerization, but rather disrupt heterotetramerization with subunits of voltage‐gated Kv6 potassium channel family members. This strongly suggests that the basis for zero‐tolerance in this segment is not disruption of the formation of homotetrameric KCNB1 channels but rather disruption of the formation of heterotetrameric KCNB1/Kv6 channels. A final observation should be made about the sodium channel SCN2A. Unlike homotetrameric potassium channels, human voltage‐gated sodium channels combine all four subunits in a single long chain in which the four connected “pseudo‐subunits” are homologous but not identical in sequence, resulting in a fourfold semi‐symmetric channel. It is interesting that only two of the four pseudo‐subunits of SCN2A contain zero‐tolerance segments. Some pseudo‐subunits in voltage‐dependent sodium channels are evidently more tolerant of mutations than others.

Previously overlooked proteins containing zero‐tolerance segments

While the zero‐tolerance proteins include a number of prominent proteins, at the opposite end of the spectrum are a number of genes/proteins listed in Table 1 that are almost completely uncharacterized. A February 2022 PubMed search on each of the following nine genes yielded, at most, only eight papers mentioning each: CLASRP, GOLGA8G, HMGN4, OR4F17, RBMX2, TBC1D3H, U2SURP, ZMAT2, and ZNF84. The presence of zero‐tolerance segments within these proteins suggests that at least some of them are associated with critical physiological functions and/or pathophysiology. While only further study will confirm this prediction, the MTR data make a compelling case that such studies are merited. Here, we further highlight the case of OR4F17, which is a membrane protein and putative olfactory receptor. Only 36 of the zero‐tolerance proteins (13%) are integral membrane proteins, which include the aforementioned voltage‐gated channels (Table 1). This is despite the fact that membrane proteins represent roughly 20–30% of all human proteins and are the targets for more than 50% of all approved drugs. This highlights the fact that the factors that decide what represents a good target for drug development correlate only partially with the priorities of natural selection. Indeed, a particularly intriguing observation is that while the human G‐protein coupled receptor (GPCR) superfamily includes the targets for about one third of all approved drugs, OR4F17 is the only GPCR among the 257 proteins of Table 1 and is classified as one of the 500 human olfactory receptors. This raises the question of why an olfactory receptor would contain a zero‐tolerance segment. We suggest three competing hypotheses. First, mutations in the intolerant segment of this receptor (located at its N‐terminus) could result in a toxic gain‐of‐function effect, such as promoting the formation of aggregates or amyloids by this protein. Another possibility is that OR4F17 is not actually an olfactory receptor but has a different and very important physiological function that is disrupted by mutations in its intolerant segment. A third possibility is that it is an olfactory receptor but has additional physiological functions. This would not be unprecedented. Only future experiments will determine which, if any, of these hypotheses is correct. However, this serves as another illustration of the power of intolerance analysis to direct attention to interesting biological questions.

Proteins involved in RNA splicing represent the largest group of proteins containing at least one intolerant segment

We sought preliminary insight into which pathways, networks, and protein complexes are most commonly represented among the 257 proteins with intolerant segments. Cytoscape stringApp with a confidence cutoff of 0.95 was used to determine high‐confidence interactors. This approach yielded protein interaction maps that group proteins based on broad molecular or cellular functional categories (Figure 3). The largest clusters of networked proteins are associated with central cellular processes such as chromatin remodeling, protein degradation, RNA splicing, the cytoskeleton, the cell cycle, and nucleic acid biochemistry.
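The filtering step described here can be illustrated with a small sketch: keep only interactions scoring at or above the cutoff, group the remaining proteins into connected networks, and drop networks of only two proteins, as was done for the visualization in Figure 3. This is not the stringApp implementation. The function name `confident_networks` and the interaction scores below are hypothetical, and real scores would come from the STRING database.

```python
from collections import defaultdict

def confident_networks(edges, cutoff=0.95, min_size=3):
    """Group proteins into networks linked by edges scoring >= cutoff."""
    neighbors = defaultdict(set)
    for a, b, score in edges:
        if score >= cutoff:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, networks = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) >= min_size:
            networks.append(sorted(component))
    return networks

# Hypothetical scores, invented for illustration only.
edges = [
    ("SF3B1", "SF3A2", 0.99), ("SF3A2", "SF3B4", 0.98),
    ("TUBB", "TUBA1A", 0.97),   # a two-protein network, dropped
    ("RAC1", "ZNF84", 0.40),    # below the cutoff, ignored
]
networks = confident_networks(edges)
print(networks)
```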
FIGURE 3

Protein interaction network using Cytoscape stringApp based on an interactor cut‐off stringdb score ≥ 0.95. Not all proteins returned by this analysis (~150) are visualized here, as networks that consisted of two proteins were excluded from the visualization (with one exception). The clusters highlighted were manually assigned by identifying the general functions of proteins in the clustered area

The protein interaction map presented in Figure 3 was further analyzed using stringApp network clustering with a granularity parameter of 3 to help identify sets of proteins that may participate in functional complexes (Figure 4). Major complexes include proteins of the spliceosome, translation, tubulin, and the NMDA receptor.
FIGURE 4

Granulated protein interaction networks among proteins containing intolerant segments. We used a granularity parameter of 3 to form more discrete interaction nodes that may represent specific protein complexes. Proteins are labeled according to gene symbol. The darkness/thickness of the lines connecting nodes is indicative of the Cytoscape stringApp experimental score, which is based on high‐throughput interaction mapping, where thicker, darker lines reflect more confident interactions based on experiments. The networks shown were manually identified based on the general function of the cluster. Sub‐networks of six or fewer proteins are not shown.

GO Panther's overrepresentation analysis was employed to identify biological processes that are overrepresented in the intolerant gene list as compared to the Homo sapiens reference. We filtered for processes that had fewer than 500 proteins in the reference list to avoid very general biological processes and instead focus on more specific pathways. In addition, the p value was required to be less than 5 × 10⁻¹⁰. As can be seen in Figure 5a, the most overrepresented biological processes are all in some way related to mRNA processing, particularly RNA splicing. The two other pathways noted were histone modification and regulation of membrane potential. For further confirmation, the list of proteins was analyzed using Enrichr. Consistent with Panther, three different databases (Bioplanet, WikiPathway 2021 Human, and KEGG 2021 Human) indicated that mRNA processing was the most significantly enriched pathway. Approximately 50 of the 290 proteins with zero‐tolerance segments were identified by Panther as having the gene ontology (GO) term mRNA processing (see Table 1). When GO Panther's overrepresentation analysis results were filtered to retain results only for the proteins with a zero‐tolerance segment that is at least 41 residues long, the only two categories yielding p < 5 × 10⁻¹⁰ were RNA splicing and mRNA processing (Figure 5b), further highlighting the robust enrichment of these protein functional categories in Table 1.
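To make the idea of overrepresentation concrete, the sketch below computes a one‐sided hypergeometric p value for seeing at least a given number of annotated proteins in a gene list. This is an illustration of the general statistical idea only. The text does not state which test or multiple‐testing correction Panther applied, so the choice of test is an assumption, as are the function name `overrepresentation_p` and the reference counts in the example (20,000 genes, 400 of them annotated with the term).

```python
from math import comb

def overrepresentation_p(reference_size, reference_hits, list_size, list_hits):
    """One-sided hypergeometric P(X >= list_hits) for a gene list drawn
    from a reference set in which reference_hits genes carry the term."""
    upper = min(reference_hits, list_size)
    total = comb(reference_size, list_size)
    tail = sum(
        comb(reference_hits, k) * comb(reference_size - reference_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 50 of 257 listed proteins carry a term that
# annotates 400 of 20,000 reference genes (about 5 expected by chance).
p = overrepresentation_p(20_000, 400, 257, 50)
passes_cutoff = p < 5e-10
print(passes_cutoff)
```

Exact integer arithmetic is used for the binomial coefficients, so very small probabilities are not lost to rounding before the final division.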
FIGURE 5

Panther overrepresentation GO biological process term analysis of biological pathways associated with proteins containing a zero‐tolerance segment. Only pathways with fewer than 500 proteins in the Homo sapiens reference list and p < 5 × 10⁻¹⁰ were considered. (a) The results for analysis of proteins in which the minimum size of the zero‐tolerance segment was 31 residues. (b) The results when the minimum length of the zero‐tolerance segment was 41 residues. GO, gene ontology

The gene enrichment analyses all point to RNA splicing as the biological process associated with the largest number of proteins that contain an intolerant segment. This may well reflect the importance of mRNA splicing in early human development (conception to birth), where this process enables proteins to be remodeled to suit varying roles during the developmental phases of human gestation. Failure to express the correct protein isoforms at the right time may be a particularly common mechanism of purifying selective pressure on the responsible gene variations.

Association of ClinVar variants with proteins having intolerant segments

For each protein with an MTR = 0 segment, we also examined whether there are ClinVar missense variants that encode amino acid changes in that protein and recorded this observation in Table 1. As of February 2022, we found that 127 of the 257 proteins contain at least one ClinVar missense mutation encoding an amino acid change in the protein. It is interesting that the other 130 of these proteins have no known or suspected disease mutations associated with them, highlighting the ability of intolerance analysis to detect proteins that may be essential to human reproduction or gestational development but are not associated with known human genetic disorders. Currently, detection of disease variants is usually based on genetic sampling and analysis of people after they have been born, which may explain why mutations in such essential genes have escaped detection. For the 127 intolerant proteins that are associated with ClinVar variations, we also examined whether any of the encoded amino acid changes are located within MTR = 0 segments. We found this to be the case for 68 proteins. For such proteins, while mutations are not observed within their MTR = 0 gene segments in any of the >10⁵ sequences from mostly healthy people in the current gnomAD database, there are very rare variants that are detected in clinical patient populations, often sick children with de novo (non‐inherited) mutations. These very rare variants may cause or contribute to human disorders, but are not absolutely filtered out of the human population because they do not prevent birth.
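Checking whether a variant falls inside a zero‐tolerance segment amounts to a range lookup against the segment column of Table 1. The sketch below parses segment entries written in the table's style, including thousands separators, and tests a residue position against them. The helper names and the variant positions are hypothetical, and only the ZEB2 segment string is taken from Table 1.

```python
import re

# A residue number may contain thousands separators, e.g. "1,064".
# The dash may be a Unicode hyphen (U+2010), an en dash (U+2013), or "-".
RANGE = re.compile(r"(\d+(?:,\d{3})*)\s*[\u2010\u2013-]\s*(\d+(?:,\d{3})*)")

def parse_segments(text):
    """Turn '296–326, 1,064–1,094' into [(296, 326), (1064, 1094)]."""
    return [
        (int(start.replace(",", "")), int(end.replace(",", "")))
        for start, end in RANGE.findall(text)
    ]

def in_zero_tolerance_segment(position, segments):
    """True if a 1-based residue position lies inside any segment."""
    return any(start <= position <= end for start, end in segments)

zeb2_segments = parse_segments("296–326, 1,064–1,094")  # entry from Table 1

# Hypothetical variant positions, invented for illustration.
hits = [pos for pos in (300, 500, 1094) if in_zero_tolerance_segment(pos, zeb2_segments)]
print(zeb2_segments, hits)
```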

CONCLUSION

This paper reports that 257 human proteins contain zero‐tolerance segments, as identified by MTR analysis. Some of these proteins were previously known to be associated with genetic disorders and some were not. While not all proteins containing zero‐tolerance segments can be functionally grouped with other such proteins, about half were found in one of a half dozen functionally‐related groups of protein, the largest of which (containing nearly 20% of all zero‐tolerance proteins) is associated with RNA splicing and related RNA biochemistry. We hope that this report of 257 human proteins that contain zero‐tolerance segments will motivate studies of these proteins to establish exactly how and why mutations in intolerant segments within each protein result in purifying selection in the human population. This will require insight into the human physiological role(s) of each protein and also structural and structure–function data (see Perszyk et al. for a recent method that may support such efforts). For some of these proteins, such as the voltage‐gated potassium and sodium channels, there may already be enough information in the literature to rationalize the presence of zero‐tolerant segments. However, even for these channels, questions remain. For example, mutations in the zero‐tolerance segments of the sodium channels SCN2A and SCN8A are subjected to purifying selection even though these mutations would occur under heterozygous WT/mutant expression conditions and even though sodium channels, unlike potassium channels, form monomeric channels. Does this mean a 50% reduction in the function of SNC2A or SCN8A is sufficient to prevent human reproduction or terminate life before birth or is it instead the case that mutations in zero‐tolerant segments in these proteins induce some sort of toxic gain‐of‐function effect that compounds the impact of partial loss‐of‐function under WT/mutant heterozygous conditions? Future studies may be required to address such questions. 
For proteins that have previously escaped significant notice, such as the putative olfactory receptor OR4F17, observation of a zero‐tolerance segment suggests a critical and previously overlooked role for these proteins in human reproduction and/or health. Observation of zero‐tolerance segments in proteins may be particularly useful as a way of pointing investigators to proteins that are critical for human reproduction and/or pre‐birth development, but for which associated causative mutations have never been detected. Finally, there are other interesting questions triggered by this work. These include the aforementioned question of why some zero‐tolerance segments in human proteins are not 100% conserved among their nearest mammalian relatives. Another question is inspired by Figure 1a, where it is seen that there is a modest population of proteins that not only have a zero‐tolerance segment, but also contain segments with MTR values higher than 1.0, suggesting that these latter segments are experiencing evolutionary pressure to rapidly mutate. Does this suggest that such proteins are critical to human reproduction and/or health, yet also are being pressured either to adapt to changes in the human environment, to further optimize a current function, or to acquire a new function or mode of regulation? We hope that addressing questions such as these will ultimately advance our understanding of the molecular biology of human health, reproduction, development, and disease.

MATERIALS AND METHODS

MTR analysis

An Excel file containing a well‐annotated list of all canonical human genes was provided by Prof. Anthony Capra of the University of California, San Francisco. From this list, we deleted all genes that encode various forms of non‐coding RNA, leaving a list of roughly 20,000 protein‐encoding genes. Each gene was then subjected to MTR analysis using the web‐mounted MTR‐Viewer server (http://biosig.unimelb.edu.au/mtr-viewer/). Version 2 of MTR analysis was run using the default window size of 31 residues. This program conducts MTR analysis in “sliding sequence” fashion for each possible 93‐nucleotide segment in the coding gene transcript and returns a plot of the segmental MTR score versus the position of the amino acid in the middle of the encoded 31‐residue segment. When a residue is within 16 residues of the protein's N‐ or C‐terminus, analysis is still conducted, but with a truncated window. For example, for residue 10 in any given protein, the reported MTR score will be for the gene segment that encodes residues 1–25. The MTR plots generated by the server for each protein were then manually inspected, and the minimum MTR score observed for the analyzed gene/protein was recorded along with the corresponding residue number at the center of the analyzed segment. For proteins having multiple overlapping and/or non‐overlapping MTR = 0 segments, the locations of all such segments were recorded. MTR plots revealed that there were 257 human proteins that exhibited at least one statistically robust MTR = 0 segment. These 257 genes and their encoded proteins are tabulated (Table 1) with both gene codes and UniProt identifiers (https://www.UniProt.org/).
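The window truncation described above can be sketched in a few lines of Python. This is only an illustration, not the MTR‐Viewer implementation: the function names are hypothetical, and the scoring function assumes the published MTR definition (observed missense fraction divided by the possible missense fraction for the same window), which is not spelled out in this section.

```python
from typing import Optional, Tuple

WINDOW = 31          # default MTR-Viewer window size, in residues
HALF = WINDOW // 2   # 15 residues on each side of the central residue


def window_bounds(center: int, protein_length: int) -> Tuple[int, int]:
    """Return the 1-based, inclusive residue range scored for `center`.

    Near either terminus the window is truncated rather than shifted,
    so residue 10 is scored over residues 1-25.
    """
    start = max(1, center - HALF)
    end = min(protein_length, center + HALF)
    return start, end


def mtr_score(obs_missense: int, obs_synonymous: int,
              poss_missense: int, poss_synonymous: int) -> Optional[float]:
    """Assumed MTR definition: observed / possible missense fraction.

    Returns None when no variants at all are observed in the window,
    because the ratio is then undefined rather than zero.
    """
    observed_total = obs_missense + obs_synonymous
    if observed_total == 0:
        return None
    observed_fraction = obs_missense / observed_total
    possible_fraction = poss_missense / (poss_missense + poss_synonymous)
    return observed_fraction / possible_fraction


# Hypothetical counts for one window: silent variants observed, no missense.
print(window_bounds(10, 500))
print(mtr_score(0, 12, 200, 79))
```

Under this assumed definition, a window scores MTR = 0 only when silent variants are observed but missense variants are not, which is why the number of observed silent mutations matters for whether a zero score is statistically robust.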
It was found that the canonical transcript for a given gene analyzed by the MTR version 2 program does not always correspond to the canonical protein sequence listed in UniProt, usually because the MTR‐analyzed transcript is a splice variant of the transcript that encodes the UniProt‐canonical protein. Such instances are noted in Table 1 and, to avoid confusion, the reported amino acid sequence of the intolerant segment is provided using the residue numbering for the canonical UniProt sequence. There were a few cases where the intolerant segment was not found in the sequence of the UniProt‐listed splice variant form(s) of the protein. In these cases, a note is added to the table. In conjunction with the primary MTR plot for each gene/protein, the output of the MTR‐Viewer server also includes a plot of the positions of any known ClinVar variants for the analyzed gene/protein. Along with the tabulated MTR data for each protein entry, we also included the total number of ClinVar variants in the protein and the number that are located within the MTR = 0 segment(s), if any. Proteins with intolerant segments were also manually characterized based on their function. The GO terms were tabulated and pathway analysis was also conducted, as described in the following sections. We also recorded whether each protein entry contains a transmembrane domain. In addition to the 257 proteins with statistically robust zero‐tolerance segments, there were an additional 33 proteins that contained MTR = 0 segments, but for which there were not enough observed silent mutations within these segments in gnomAD to ensure that MTR = 0 is statistically robust. These proteins are listed in Table S1, along with additional information regarding the location of the candidate intolerant segment(s) in each protein's sequence. These 33 proteins remain candidates for having zero‐tolerance segments, but more human sequences will be required to increase the number of silent mutations to the point where statistically reliable MTR scores can be calculated.

Sequence homology searches

For each of the 257 protein sequences of Table 1 that contain one or more zero‐tolerance segments, we ran BLASTP using the default search parameters against all available mammalian protein sequences. For each search we saved the output for the 250 closest mammalian homologs. We also ran BLASTP for each protein's zero‐tolerance segment(s). For each protein, the median % sequence identity for both the full‐length sequence and the zero‐tolerance segment(s) was determined along with related statistics (Figures 2 and S1). BLASTP searches were also conducted for the 33 proteins of Table S1 that contain an MTR = 0 segment, but for which the result is not statistically definitive, as summarized in Figure S2.
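As a rough illustration of how the per‐protein summary statistics could be derived from a set of BLASTP hits, the following Python sketch computes the median, quartiles, and 1.5 × interquartile‐range limits of the kind shown in a box‐and‐whiskers plot. The function name and the input values are hypothetical, and the quartile convention used here (Python's "inclusive" method) is an assumption; the plotting software used for the figures may use a different one.

```python
import statistics
from typing import Dict, List


def identity_summary(percent_identities: List[float]) -> Dict[str, object]:
    """Summarize % identities to the closest homologs for one query.

    `percent_identities` would hold one value per BLASTP hit (up to 250
    in the analysis described above).
    """
    q1, median, q3 = statistics.quantiles(
        percent_identities, n=4, method="inclusive"
    )
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    outliers = sorted(
        v for v in percent_identities if v < lower_limit or v > upper_limit
    )
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "outliers": outliers,
    }


# Hypothetical identities for a full-length sequence and for one segment.
full_length = identity_summary([50.0, 90.0, 92.0, 94.0, 96.0, 98.0, 100.0])
segment = identity_summary([100.0, 100.0, 100.0, 96.8, 100.0])
print(full_length["median"], segment["median"])
```

Comparing the two medians for the same protein is one simple way to ask whether a zero‐tolerance segment is more conserved among mammals than the protein as a whole.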

Protein–protein interaction analysis

Protein networks for proteins with MTR = 0 segments were constructed using the Cytoscape stringApp (https://apps.cytoscape.org/apps/stringapp) with a stringdb confidence score cutoff of 0.95. Next, a granulation value of 3 was applied to determine refined complexes. The thickness of the lines connecting protein pairs in the granulated Cytoscape networks was set based on the stringdb experiment scores.
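The edge filtering and line‐thickness mapping can be pictured with the following Python sketch. It does not call Cytoscape or the stringApp; the `Edge` type, the placeholder protein names, the inclusive treatment of the cutoff, and the linear width mapping are all assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    protein_a: str
    protein_b: str
    combined_score: float    # overall confidence score, 0 to 1
    experiment_score: float  # experimental-evidence channel, 0 to 1


CONFIDENCE_CUTOFF = 0.95  # cutoff stated in the text


def filter_edges(edges: List[Edge],
                 cutoff: float = CONFIDENCE_CUTOFF) -> List[Edge]:
    """Keep only interactions at or above the confidence cutoff."""
    return [e for e in edges if e.combined_score >= cutoff]


def edge_width(edge: Edge, min_width: float = 1.0,
               max_width: float = 6.0) -> float:
    """Map the experiment score linearly onto a line width (assumed mapping)."""
    return min_width + (max_width - min_width) * edge.experiment_score


# Placeholder edges, not real data.
edges = [
    Edge("PROTEIN_A", "PROTEIN_B", 0.99, 0.80),
    Edge("PROTEIN_A", "PROTEIN_C", 0.70, 0.10),
]
for e in filter_edges(edges):
    print(e.protein_a, e.protein_b, edge_width(e))
```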

Pathway analysis

Gene symbols for proteins with at least one MTR = 0 segment were input into GO Panther's statistical overrepresentation test to determine which biological pathways are overrepresented compared to the reference human gene list (http://www.pantherdb.org/). Only GO biological processes that encompass fewer than 500 genes in the human genome reference were considered, in order to favor more specific processes and exclude broad, non‐specific GO terms. In addition to this criterion, the GO term must also have p < 5.0 × 10⁻¹⁰. Additionally, the gene list was input into Enrichr (https://maayanlab.cloud/Enrichr/) to determine the biological pathways involved.
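The two filtering criteria, together with the kind of calculation that underlies an overrepresentation test, can be sketched as follows. This is an illustration under stated assumptions, not the PANTHER implementation: the `GoTerm` type and term names are hypothetical, and the p‐value function is a one‐sided hypergeometric (Fisher‐type) tail, which may differ from the test options and multiple‐testing corrections applied by the PANTHER server.

```python
from math import comb
from typing import List, NamedTuple


class GoTerm(NamedTuple):
    name: str
    reference_gene_count: int  # genes annotated to the term in the reference
    p_value: float


MAX_REFERENCE_GENES = 500  # terms must encompass fewer genes than this
P_VALUE_CUTOFF = 5.0e-10   # terms must have p below this


def passes_filters(term: GoTerm) -> bool:
    """Apply both criteria described in the text."""
    return (term.reference_gene_count < MAX_REFERENCE_GENES
            and term.p_value < P_VALUE_CUTOFF)


def hypergeom_upper_tail(population: int, annotated: int,
                         sample: int, hits: int) -> float:
    """P(X >= hits) when drawing `sample` genes from `population`,
    of which `annotated` carry the GO term (assumed test form)."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favorable / total


# Placeholder terms, not real results.
terms: List[GoTerm] = [
    GoTerm("specific process", 300, 1.0e-12),
    GoTerm("broad process", 4000, 1.0e-20),
    GoTerm("weakly enriched process", 120, 1.0e-4),
]
print([t.name for t in terms if passes_filters(t)])
```

The size limit removes terms that are significant mainly because they are very broad, while the p‐value cutoff keeps only the most strongly enriched specific processes.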

AUTHOR CONTRIBUTIONS

Adam Sanders: Data curation (equal); formal analysis (equal); investigation (equal); writing – review and editing (equal). Jake Hermanson: Data curation (equal); formal analysis (equal); investigation (equal); visualization (equal); writing – review and editing (equal). David Samuels: Investigation (equal); writing – review and editing (equal). Lars Plate: Investigation (equal); methodology (equal); supervision (equal); writing – review and editing (equal). Charles Sanders: Conceptualization (equal); funding acquisition (equal); project administration (equal); supervision (equal); writing – original draft (lead); writing – review and editing (lead).

CONFLICT OF INTEREST

The authors declare no conflict of interest.

Appendix S1. Supporting Information

Figure S1. % Sequence identities for human proteins containing zero‐tolerance segments to their 250 closest mammalian homologs. For each gene, the % sequence identity for the entire encoded protein sequence is given first (black), followed by the % sequence identity for each non‐contiguous intolerant segment found in that gene (red), where GENE.1, GENE.2, and so forth indicate which intolerant protein segment encoded by that gene was searched. The data are presented in the form of “box and whiskers” plots. The bold bar indicates the median, the wings of the bars are the quartiles, the whiskers extend to 1.5 times the interquartile range, and the dots are outliers that lie beyond the whiskers. Zero‐tolerance protein segments for the following genes failed to yield a successful BLASTP search, which is why they do not appear in this figure: DUSP8.1, EIF1AY, GDF11, HCFC1 (third zero‐tolerance segment only), HMGN4, HUWE1 (fourth zero‐tolerance segment only), KCNC2, NSMF, RBMY1A1, RGPD2, RPS28, SCN8A (eighth zero‐tolerance segment only), and TERT. Some of the possible reasons the amino acid sequences for the zero‐tolerance segments failed to return BLASTP results are given at https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=FAQ#nohits

Figure S2. % Sequence identities for the 33 human proteins that contain currently‐statistically‐unconvincing zero‐tolerance segments to their 250 closest mammalian homologs. For each gene, the % sequence identity for the entire encoded protein sequence is given first (black), followed by the % sequence identity for each non‐contiguous intolerant segment found in that gene (red), where GENE.1, GENE.2, and so forth indicate which intolerant protein segment encoded by that gene was searched. The data are presented in the form of “box and whiskers” plots. The bold bar indicates the median, the wings of the bars are the quartiles, the whiskers extend to 1.5 times the interquartile range, and the dots are outliers that lie beyond the whiskers.

Table S1. Human proteins that contain at least one zero‐tolerance segment, but for which the statistics for these segments are uncertain.

Table S2. Locations of zero‐tolerance segments in voltage‐gated potassium channels.