Yehudit Hasin, Tsviya Olender, Miriam Khen, Claudia Gonzaga-Jauregui, Philip M Kim, Alexander Eckehart Urban, Michael Snyder, Mark B Gerstein, Doron Lancet, Jan O Korbel.
Abstract
Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (approximately 55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that approximately 50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content.
Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.
Year: 2008 PMID: 18989455 PMCID: PMC2570968 DOI: 10.1371/journal.pgen.1000249
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. A high-resolution map of CNVs in the human OR repertoire.
A) CNV map for OR loci based on high-resolution oligonucleotide tiling arrays. 851 ORs are ordered according to their location along the chromosomes, as indicated on the left; rows represent genes, columns are individuals; gains are shown in red, losses in blue and unchanged dosage in green (calls were made relative to the male reference individual NA19154). Note that the non-uniform genomic distribution of ORs results in an unbalanced representation of chromosomes in panel A. Also, note that ‘gains’ on chromosome X do not represent CNVs but refer to the expected male/female dosage difference. CNV calls are given for all 25 individuals (i.e. a self-vs.-self replicate of NA19154 was included as a control; see Methods). Due to the resolution of the figure, single CNVs may not be visible (all events are given in Table S2). Samples appear in the following order (1–25): NA10851, NA11997, NA12003, NA12004, NA12005, NA12006, NA12246, NA12248, NA12865, NA15510, NA18501, NA18502, NA18504, NA18505, NA18506, NA18508, NA18611, NA18856, NA18945, NA18946, NA18972, NA19103, NA19128, NA19141, NA19154.
B) qPCR and microarray measurements of 122 OR loci for 13 individuals. The right panel represents qPCR results, and the left panel the corresponding microarray measurements (i.e. the measure R; see Methods). Sixty of the 122 ORs were tested in 13 individuals, thus only data for these samples are shown (for the full dataset, see Table S1). OR loci were sorted based on copy-number variability as assessed with our microarrays; the top 40 rows represent genes categorized as CNVs by microarrays; the lower part refers to loci not scored as CNVs with the arrays, but scored as CNVs by qPCR (see Table 2). qPCR data were normalized relative to NA19154, and inverted (values multiplied by −1) to fit the microarray scale. OR2BH1P and OR9G1 showed homozygous deletion in the reference individual, thus the qPCR values of these ORs were not normalized.
Relative intensities are color coded, as indicated by the color scales. Homozygously deleted OR alleles are shown in the right panel in black. Samples appear in the following order: NA12003, NA12004, NA12005, NA12006, NA12246, NA12248, NA12865, NA18504, NA18508, NA18856, NA19103, NA19141, NA19154.
C) qPCR measurements and array (R) measures for the 56 most variable OR loci. The most variable OR loci were selected based on variance in qPCR results. Representation and sample order are as in panel B.
Table 1. Summary of array results for OR gene and pseudogene loci and comparison with DGV.
| | OR genes | OR pseudogenes |
| Total | 385 | 466 |
| | 68 | 122 |
| | 53 | 82 |
| Copy-number variable loci | 93 (24%) | 151 (32%) |
| | 42 | 68 |
| | 51 | 83 |
| | 106 | 118 |
| | 188 | 195 |
Copy-number variable loci are OR loci whose R-measure (the median normalized microarray intensity log2-ratio across an OR locus) falls beyond the cutoff C, i.e. |R| > 0.18 (see Methods). ORs were considered copy-number variable if a gain or loss was identified in at least one individual. The variability status of each locus was compared to the Database of Genomic Variants (DGV). “New CNV”: CNV identified by us, not in DGV; “Confirmed CNV”: CNV identified by us, present in DGV; “Undetected CNV”: present in DGV, not identified in our panel of individuals; “Confirmed not variable”: not identified as copy-number variable by us, and not reported in DGV.
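The calling rule described in this note can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the normalization of probe intensities is assumed to have been done already, and the mapping of positive R to "gain" and negative R to "loss" is an assumption, since the note only states that values beyond the cutoff are called.

```python
from statistics import median

CUTOFF = 0.18  # cutoff on |R| quoted in the table note


def r_measure(probe_log2_ratios):
    """Median of the normalized log2-ratios of all probes across one OR locus."""
    return median(probe_log2_ratios)


def call_locus(r, cutoff=CUTOFF):
    """Classify one locus in one individual.

    Assumption: positive R means a gain relative to the reference
    individual and negative R means a loss.
    """
    if r > cutoff:
        return "gain"
    if r < -cutoff:
        return "loss"
    return "unchanged"


def is_copy_number_variable(r_by_individual, cutoff=CUTOFF):
    """A locus counts as variable if any individual shows a gain or a loss."""
    return any(
        call_locus(r, cutoff) != "unchanged" for r in r_by_individual.values()
    )
```

With this rule, a locus whose R-measure is within the cutoff in every individual is reported as not variable, regardless of how many individuals were tested.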
Table 2. CNVs detected in qPCR experiments that displayed little variability in microarray experiments.
| OR locus | qPCR variance | Array (R-measure) variance | Status in DGV |
| OR10AG1 | 0.693519007 | 0.002244359 | CNV |
| OR10J3 | 0.227390436 | 0.000961263 | NR |
| OR10J5 | 0.22072615 | 0.00106244 | NR |
| OR10Z1 | 0.573520272 | 0.001973186 | NR |
| OR1A2 | 0.176006731 | 0.000929803 | CNV |
| OR1G1 | 0.418577965 | 0.002270532 | CNV |
| OR2AJ1 | 0.414815785 | 0.002147011 | CNV |
| OR2B11 | 0.396684535 | 0.005136635 | CNV |
| OR2G3 | 0.396707452 | 0.001385149 | CNV |
| OR2L13 | 0.511584535 | 0.002920064 | CNV |
| OR2T29 | 0.150669223 | 0.003118704 | CNV |
| OR2T6 | 0.61481576 | 0.000332848 | CNV |
| OR4A45P | 0.226808226 | 0.003287325 | CNV |
| OR4C13 | 0.494025825 | 0.002679713 | CNV |
| OR4E2 | 0.387122516 | 0.001315008 | CNV |
| OR4K14 | 0.162403526 | 0.001719849 | CNV |
| OR5AK2 | 0.261028206 | 0.003661852 | NR |
| OR6C2 | 0.410398478 | 0.003290131 | CNV |
| OR6C4 | 0.193139103 | 0.001050314 | CNV |
| OR6K2 | 0.173307692 | 0.002568683 | NR |
| OR6Q1 | 0.240047903 | 0.002727374 | NR |
| OR6Y1 | 0.162319311 | 0.002412609 | NR |
| OR9G1 | 0.295386696 | 0.001038619 | NR |
Twenty-three ORs were found to be variable in qPCR experiments (beyond a conservative cutoff for the variance of >0.15), but not in the microarrays. NR: not previously reported.
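The selection behind this table can be sketched as follows. This is an illustrative reading, not the authors' code: the function name and input layout are hypothetical, and the use of the sample variance (rather than the population variance) is an assumption, as the note does not say which was used.

```python
from statistics import variance

QPCR_VARIANCE_CUTOFF = 0.15  # conservative cutoff quoted in the table note


def qpcr_only_cnvs(qpcr_values, array_variable):
    """Return loci that vary in qPCR but were not called by the arrays.

    qpcr_values: locus name -> list of normalized qPCR values, one per individual
    array_variable: locus name -> True if the arrays called the locus a CNV
    Assumption: sample variance is used.
    """
    return sorted(
        locus
        for locus, values in qpcr_values.items()
        if variance(values) > QPCR_VARIANCE_CUTOFF and not array_variable[locus]
    )
```

For example, a locus with qPCR values `[0.0, 1.0, -1.0, 0.0]` has a sample variance of about 0.67 and would be listed if the arrays had not called it.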
Figure 2. Copy-number variability expressed as variance of experimental measures.
Variance in array measurements is indicated along OR loci, with loci arranged according to genomic coordinates. The variance of individual array measurements for each OR is plotted in grey. Array variance of ORs that were assayed by qPCR is color-coded; green: OR genes; red: OR pseudogenes. Black squares indicate ORs listed in Table 3; representative ORs from each cluster are indicated by red dotted lines.
Table 3. Summary of deletion alleles followed up in detail.
| No. | # ORs deleted | ORs deleted | Location | Start | End | Length (kb) | Former status | Ref. | # of homozygously deleted samples in qPCR | Estimated deletion allele frequency | Human derived allele |
| I | 6 | OR4C11 OR4P4 OR4S2 OR4C6 OR4V1P OR4P1P | 11q11 | 55,124,730 | 55,207,364 | 82.6 | Deletion | Watson genome | 3 | 0.36 | deletion |
| II | 4 | OR2T34 OR2T10 OR2T11 OR2T35 | 1q44 | 246,794,522 | 246,875,051 | 80.5 | Deletion | Watson genome | 2 | 0.29 | deletion |
| III | 2 | OR8U8 OR8U9 | 11q11 | 53,483,709 | 53,491,314 | 7.6 | NR | - | 0 | 0.09 | deletion |
| IV | 1 | OR2BH1P | 11p14.1 | 28,962,961 | 28,970,373 | 7.4 | CNV-loss | DGV | 8 | 0.59 | deletion |
| V | 1 | OR4A45P | 11p11.2 | 48,557,433 | 48,560,858 | 3.4 | CNV-loss | DGV | 5 | 0.46 | deletion |
| VI | 3 | OR56B2P OR52N5 OR52N1 | 11p15.4 | 5,740,460 | 5,766,804 | 26.3 | CNV-loss | DGV | 1 | 0.2 | deletion |
| VII | 1 | OR52E8 | 11p15.4 | 5,828,206 | 5,839,952 | 11.7 | CNV-loss | DGV | 1 | 0.2 | deletion |
| VIII | 1 | OR9G1 | 11q11 | 53,819,045 | 53,830,192 | ND | NR | - | 3 | 0.36 | deletion |
| IX | 1 | OR5P2 | 11p15.4 | 7,767,796 | 7,792,963 | 25.2 | CNV | DGV | 1 | 0.2 | deletion |
The table summarizes losses affecting OR loci for which we searched for homozygously deleted individuals using qPCR. Abbreviations: NR: not reported previously; DGV: intersecting variant previously reported in the Database of Genomic Variants (http://projects.tcag.ca/variation). “Former status” indicates whether this locus was reported as a “deletion” or “CNV” in previously published data sets. Boundaries of an event were taken from the respective (confirmatory) data source (except for the novel deletions III and VIII).
*For deletion III, 4 heterozygotes (and no homozygous deletion) were identified by combinations of allele-specific qPCR reactions.
**For all deletions except deletion III, allele frequencies were estimated as the square root of the proportion of homozygously deleted individuals. The frequency of the deletion III allele was estimated as the number of deletion alleles (n = 4) out of the total number of chromosomes tested (n = 46).
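The two frequency estimates described in the note can be written out as a short Python sketch. The function names are hypothetical. The square-root estimator corresponds to assuming Hardy-Weinberg proportions, and the figure of 23 individuals used in the usage note is an assumption inferred from the 46 chromosomes mentioned for deletion III.

```python
from math import sqrt


def frequency_from_homozygotes(n_homozygous_deleted, n_individuals):
    """Estimate the deletion allele frequency q from homozygous deletions.

    Under Hardy-Weinberg proportions the homozygous fraction is q**2,
    so q is estimated as the square root of the observed fraction.
    """
    return sqrt(n_homozygous_deleted / n_individuals)


def frequency_from_allele_count(n_deletion_alleles, n_chromosomes):
    """Direct estimate: deletion alleles counted over chromosomes tested."""
    return n_deletion_alleles / n_chromosomes
```

Assuming 23 tested individuals, `frequency_from_homozygotes(3, 23)` gives about 0.36, in line with deletions I and VIII, and `frequency_from_allele_count(4, 46)` gives about 0.09, in line with deletion III.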
Figure 3. Correlation of OR copy-number variability with paralog-similarity.
Red and blue dots indicate copy-number variable and non-variable ORs, respectively (copy-number variability is expressed in terms of the measure R, see Methods, which we found to correlate well with gene dosage). Percentage DNA sequence identity (“% identity”) to the closest paralog in the human genome is plotted versus the array-based (i.e., R-measure-based) variance. Correlation for ORs affected by CNVs is C = 0.26 (P-value = 10^-5), whereas for non-variable ORs it is C = 0.15 (P-value = 10^-4). Linear regression fits for each dataset are indicated with red and blue dashed lines, respectively.
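A correlation coefficient and a linear regression fit of the kind shown in this figure can be computed as in the sketch below. The caption does not state which correlation measure was used, so the Pearson coefficient here is an assumption, as are the function names; the fit is an ordinary least-squares line.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return the means of xs and ys and the sums Sxy, Sxx, Syy about them."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_correlation(xs, ys):
    """Pearson correlation, e.g. of % identity (xs) with array variance (ys)."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares fit y = slope*x + b."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Applying both functions separately to the variable and the non-variable ORs would yield one coefficient and one fitted line per group, matching the layout of the figure.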
Figure 4. CNVs preferentially affect ORs lacking unambiguous one-to-one orthologs in the chimpanzee genome.
A) Gains and losses of OR loci were called using our microarrays (see Methods). Gains are shown in blue; losses in orange; n is the number of total calls considered (24 samples multiplied by the number of genes in each category). OR loci with a one-to-one (“1-2-1”) ortholog in the chimpanzee genome are significantly (P-value < 0.001; Mann-Whitney U test) less often affected by CNVs than loci lacking a 1-2-1 ortholog. B) Frequencies of CNV loci are given separately for intact OR genes and pseudogenes in each of the evolutionary classes. “Frequency”: relative frequency of being called a CNV for a set.
Figure 5. Zoom into a bi-allelic CNV affecting OR4C11.
Plot depicting median normalized log2-ratios of microarray intensities for OR loci affected by deletion I (chr11: 55127497–55238834), a bi-allelic CNV. Each individual is color-coded as indicated in the legend shown to the right. Black arrows indicate samples that consistently failed to produce results in the qPCR and standard PCR assays, indicating a potential homozygous deletion.