Francesco Raimondi, Matthew J Betts, Qianhao Lu, Asuka Inoue, J Silvio Gutkind, Robert B Russell
Abstract
Members of diverse protein families often perform overlapping or redundant functions, meaning that different variations within them could reflect differences between individual organisms. We investigated likely functional positions within aligned protein families that contained a significant enrichment of nonsynonymous variants in genomes of healthy individuals. We identified more than a thousand enriched positions across hundreds of family alignments with roles indicative of mammalian individuality, including sensory perception and the immune system. The most significant position is the Arginine of the olfactory receptor "DRY" motif, which has more variants in healthy individuals than any other position in the proteome. Odorant-binding data suggest that these variants lead to receptor inactivity, and they are mostly mutually exclusive with other loss-of-function (stop/frameshift) variants. Some DRY Arginine variants correlate with smell preferences in sub-populations, and each of the 2,504 individuals studied has a unique spectrum of active and inactive receptors. The many other variant-enriched positions, across hundreds of other families, might also provide insights into individual differences.
Year: 2017 PMID: 28986545 PMCID: PMC5630595 DOI: 10.1038/s41598-017-12971-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Schematic showing how variant data are combined with aligned protein domains (coloured shapes; top) to identify equivalent alignment positions (below; boxes show conserved positions) and variants (red). (b) Protein domains having the most missense variants within the 1000 Genomes population. (c) Enrichment at each significantly enriched alignment position (Q ≤ 0.01, ≥10 members with variants, log-odds ≥ 1) vs. the fraction of the total 1000 Genomes population having at least one of these variants. Labels give the Pfam alignment position and the most common residue (uppercase = conserved); colours denote Pfam families (7tm_1 and 7tm_4 are GPCR family A, which contains the DRY motif); diameter is proportional to variant count. (d) As for (c), but the y-axis is the number of pairs of individuals having an identical spectrum of variants at a protein family position (dRy is the only position where no two individuals share the same spectrum, hence a value of zero).
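The selection in panel (c) and the pair count in panel (d) can be sketched in Python. This is a minimal illustration, not the authors' pipeline. It assumes the per-position statistics (Q value, number of family members with variants, log-odds) have already been computed upstream, and it represents each individual's spectrum at a position as a set of hypothetical variant identifiers. `PositionStats`, `is_enriched` and `identical_spectrum_pairs` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class PositionStats:
    """Hypothetical per-position summary for one Pfam alignment position."""
    label: str                   # e.g. alignment position plus most common residue
    q_value: float               # multiple-testing-corrected significance
    members_with_variants: int   # family members carrying a variant at this position
    log_odds: float              # enrichment score (its definition is assumed upstream)


def is_enriched(p: PositionStats) -> bool:
    """Apply the Figure 1c cut-offs: Q <= 0.01, >= 10 members with variants, log-odds >= 1."""
    return p.q_value <= 0.01 and p.members_with_variants >= 10 and p.log_odds >= 1


def identical_spectrum_pairs(spectra: dict[str, frozenset[str]]) -> int:
    """Count unordered pairs of individuals whose variant sets at one position are identical."""
    # Individuals sharing a spectrum fall into the same group; a group of n
    # individuals contributes n * (n - 1) / 2 identical pairs.
    group_sizes = Counter(spectra.values())
    return sum(comb(n, 2) for n in group_sizes.values())


# Hypothetical example: two individuals share a spectrum, one differs.
spectra = {
    "ind1": frozenset({"GENE_A:R>W"}),
    "ind2": frozenset({"GENE_A:R>W"}),
    "ind3": frozenset({"GENE_B:R>H"}),
}
print(identical_spectrum_pairs(spectra))  # 1
```

Grouping identical spectra and summing n(n−1)/2 per group avoids comparing every pair of individuals explicitly. A position at which every spectrum is distinct, as described for dRy, yields zero. Whether individuals with no variant at the position are included is left to the caller in this sketch.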
Figure 2. (a) Cartoon showing GPCR positions with the most variants; size is proportional to variant count, numbers denote Pfam 7tm_1 alignment positions, with the Ballesteros/Weinstein scheme given in parentheses. (b) Fraction of GPCR genes containing variants at each domain position, with the proportions in OR (yellow) and non-OR (green) receptors highlighted. (c) Number of standard deviations above (positive) or below (negative) the mean for each position within GPCR family A, for the eight species in Ensembl Variation[56] with sufficient data for the analysis. The mean and standard deviation are computed separately for each species, considering only variants within GPCR family A. The dip in the plot (between H5 and H6) is due to gaps within the alignment (i.e. fewer data points overall within this region).
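Panel (c) is a per-species standardisation: each position's variant count is expressed as a number of standard deviations from that species' own mean. A minimal sketch, assuming variant counts per alignment position have already been tallied for one species. The input mapping and the function name `position_z_scores` are hypothetical, and the population standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, pstdev


def position_z_scores(counts: dict[int, int]) -> dict[int, float]:
    """Express each position's variant count as standard deviations from the mean.

    `counts` maps a (hypothetical) Pfam 7tm_1 alignment position to the number of
    GPCR family A variants observed there in one species. Mean and standard
    deviation are taken over these positions only, so each species is normalised
    against itself.
    """
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population SD; the choice of estimator is an assumption
    if sigma == 0:
        # All positions have the same count, so no position stands out.
        return {pos: 0.0 for pos in counts}
    return {pos: (c - mu) / sigma for pos, c in counts.items()}


# Hypothetical counts for one species; repeat per species to build panel (c).
z = position_z_scores({101: 2, 102: 4, 103: 6})
print(round(z[103], 3))  # 1.225
```

Normalising within each species keeps species with very different amounts of variant data on a comparable scale.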
Figure 3. (a) Percentages of CG dinucleotides for different sets of Arginines; candlestick plots are shown where distributions are available. (b) Difference between the %CG for specific conserved Arginines and that for all domain family Arginines; key positions are labelled by domain name and alignment position. Dashed lines show ±1 standard deviation for positions in domains with at least 20 sequences.
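One plausible reading of the %CG measure is the fraction of Arginine codons that contain a CG dinucleotide. These are the CGN codons (CGT, CGC, CGA, CGG), as opposed to AGA and AGG. The sketch below assumes that definition and ignores CG dinucleotides that span codon boundaries. The function names are hypothetical.

```python
# Standard genetic-code codons for Arginine.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def percent_cg_arginines(codons: list[str]) -> float:
    """Percentage of Arginine codons that are CGN (contain a within-codon CG).

    Non-Arginine codons are ignored; CG dinucleotides spanning codon
    boundaries are not counted (an assumption of this sketch).
    """
    arg = [c.upper() for c in codons if c.upper() in ARG_CODONS]
    if not arg:
        raise ValueError("no Arginine codons supplied")
    cg = sum(1 for c in arg if c.startswith("CG"))
    return 100.0 * cg / len(arg)


def cg_difference(position_codons: list[str], family_codons: list[str]) -> float:
    """Panel (b)-style difference: %CG at one conserved position minus %CG over all family Arginines."""
    return percent_cg_arginines(position_codons) - percent_cg_arginines(family_codons)


print(percent_cg_arginines(["CGA", "AGA", "CGG", "AGG"]))  # 50.0
```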
Figure 4. Hierarchical clustering dendrogram of 2,504 individuals based on distances between their OR loss-of-function (dRy, stop-gain or frameshift) fingerprints. Clustered pie charts show the proportions of sub-populations containing the variants; radii are proportional to population size; colours show 1000 Genomes super-population composition. Loss-of-function variants are given as positional changes (wild-type residue, position and change), where * denotes stop-gains and fs denotes frameshifts (those labelled are dRy variants). Only variants enriched or depleted (log odds ≥ 0.6 or ≤ −0.6) in one 1000 Genomes super-population relative to the others are shown. Variants are shown in boxes together with known ligands for the receptors[17], and arrows show whether they are enriched (green) or depleted (red) in a particular super-population. The grey curve around the mostly African (yellow) clusters is for clarity; arrows touching this line indicate enrichment or depletion in all clusters.
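The caption does not specify the fingerprint distance or the linkage method. As an illustration only, the sketch below treats each individual's fingerprint as a set of OR loss-of-function variant identifiers, uses Jaccard distance, and performs naive average-linkage agglomerative clustering. The metric, the linkage choice and all names are assumptions.

```python
from itertools import combinations

Cluster = tuple[str, ...]


def jaccard_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """1 - |A & B| / |A | B|; two empty fingerprints are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(fingerprints: dict[str, frozenset[str]]) -> list[tuple[Cluster, Cluster, float]]:
    """Naive average-linkage clustering; returns merges as (cluster, cluster, distance)."""
    names = list(fingerprints)
    # Precompute all pairwise individual distances (stored in both orders).
    dist: dict[tuple[str, str], float] = {}
    for x, y in combinations(names, 2):
        d = jaccard_distance(fingerprints[x], fingerprints[y])
        dist[(x, y)] = dist[(y, x)] = d

    def cluster_distance(c1: Cluster, c2: Cluster) -> float:
        # Average of all pairwise individual distances between the two clusters.
        return sum(dist[(x, y)] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters: list[Cluster] = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], cluster_distance(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical fingerprints for three individuals.
fingerprints = {
    "ind1": frozenset({"OR_X:R>W", "OR_Y:fs"}),
    "ind2": frozenset({"OR_X:R>W", "OR_Y:fs"}),
    "ind3": frozenset({"OR_Z:*"}),
}
for merge in average_linkage(fingerprints):
    print(merge)
```

This loop recomputes cluster distances on every merge, so it is only suitable for small examples. For thousands of individuals, a more practical route is `scipy.spatial.distance.pdist(..., metric="jaccard")` on boolean fingerprint arrays, followed by `scipy.cluster.hierarchy.linkage(..., method="average")`.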