Rong Qiu, Chao Chen, Hong Jiang, Libing Shen, Min Wu, Chunyu Liu.
Abstract
DNA variants, such as single nucleotide polymorphisms (SNPs) and copy number variants (CNVs), are unevenly distributed across the human genome. Currently, dbSNP contains more than 6 million human SNPs, and whole-genome genotyping arrays can assay more than 4 million of them simultaneously. In our study, we first asked whether the assays used in published genome-wide association studies (GWASs) cover all regions of the genome well. Using dbSNP build 135 data, we identified 50 genomic regions longer than 100 Kb that do not contain any common SNPs, i.e., SNPs with minor allele frequency (MAF) ≥ 1%. Second, because conserved regions are generally of functional importance, we examined the genes in those large genomic regions without common SNPs. We found 97 genes, which were enriched for reproductive function. In addition, we filtered out regions containing CNVs listed in the Database of Genomic Variants (DGV), segmental duplications from the Human Genome Project, and common variants identified by personal genome sequencing (UCSC). No region survived this filtering. Our analysis suggests that, while there may not be many large genomic regions free of common variants, there are still some "holes" in the current human genomic map for common SNPs. Because GWASs have focused only on common SNPs, interpretation of GWAS results should take this limitation into account. In particular, two recent GWASs of fertility may be incomplete because of this map deficit. Additional SNP discovery efforts should pay close attention to these regions.
Year: 2013 PMID: 23613972 PMCID: PMC3629113 DOI: 10.1371/journal.pone.0061917
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Data Sources Used in This Study.
| Data | URL | Version | Modified date | Data description and summary statistics |
| Common SNP Data in HapMap | | Human Genome assembly hg19 | 18-Dec-2011 | snp135Common.txt.gz; total SNPs: 11,488,259 in chr1-chrY |
| Genome Assembly Gaps data | | Human Genome assembly hg19 | 27-Apr-2009 | gap.txt.gz; total gaps: 357 in chr1-chrY |
| Genomes Unzipped data | | Based on human genome hg18, upgraded to hg19 | 10-Oct-2010 | Total of 1923 SNPs in the chrY.9 sample, 546 common SNPs with MAF > 1%. With data for 9 personal genome sequences. |
| Personal genome variation data | | Based on Human Genome assembly hg19 | 21-Feb-2010 | Total of 9 personal genomes: pgNA12878.txt.gz, pgNA12891.txt.gz, pgNA12892.txt.gz, pgNA19240.txt.gz, pgSjk.txt.gz, pgVenter.txt.gz, pgWatson.txt.gz, pgYh1.txt.gz, pgYoruban3.txt.gz |
| DGV data | | Human Genome assembly hg19 | 07-Mar-2011 | dgv.txt.gz; total: 101605 in chr1-chrY |
| Segmental duplication data | | Human Genome assembly hg19 | 27-Jun-2011 | Inter pairs: 22980; intra pairs: 8763 |
| Genes | | Human Genome assembly hg19 | 21-May-2012 | Total number of genes is 42,742; after eliminating other chromosomes, 30,332 genes in chr1-chrY remain. |
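The scan described in the abstract (regions longer than 100 Kb with no common SNP) can be sketched as a pass over sorted SNP positions on one chromosome. This is a minimal illustration, not the authors' pipeline: the function names `snp_free_regions` and `drop_gap_overlaps` are hypothetical, and discarding candidate regions that overlap assembly gaps is an assumption about how the gap data listed above might be used.

```python
from typing import Iterable, List, Tuple

Region = Tuple[int, int, int]  # (start, end, size)

MIN_LEN = 100_000  # "longer than 100 Kb", as stated in the abstract


def snp_free_regions(positions: Iterable[int], chrom_start: int,
                     chrom_end: int, min_len: int = MIN_LEN) -> List[Region]:
    """Return stretches between consecutive common SNPs that exceed min_len."""
    regions: List[Region] = []
    prev = chrom_start
    # Append the chromosome end so the last stretch is also examined.
    for pos in sorted(set(positions)) + [chrom_end]:
        size = pos - prev
        if size > min_len:
            regions.append((prev, pos, size))
        prev = pos
    return regions


def drop_gap_overlaps(regions: List[Region],
                      gaps: List[Tuple[int, int]]) -> List[Region]:
    """Assumed handling: discard any candidate that overlaps an assembly gap."""
    return [
        r for r in regions
        if not any(r[0] < g_end and g_start < r[1] for g_start, g_end in gaps)
    ]


if __name__ == "__main__":
    # Toy positions, not real dbSNP data.
    toy = snp_free_regions([10, 50, 200_000, 250_000, 400_001], 0, 400_001)
    print(toy)
    print(drop_gap_overlaps(toy, [(300_000, 350_000)]))
```

In practice the positions would come from the common-SNP file and the gaps from the gap file named in the table, processed one chromosome at a time.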
List of 50 common SNP-free regions containing 97 genes.
| Chr | CSFR_start | CSFR_end | CSFR_size | Gene_name | Isochore_type |
| chr1 | 145883118 | 145989503 | 106385 | GPR89C, PDZK1P1 | Isochore_border |
| chr2 | 110524226 | 110704031 | 179805 | RGPD5, RGPD6, LIMS3, LIMS3-LOC440895, LIMS3L | Isochore |
| chr2 | 111191098 | 111347035 | 155937 | LIMS3-LOC440895, LIMS3, LIMS3L, RGPD6, RGPD5 | Isochore_border |
| chr7 | 74765724 | 74866460 | 100736 | GATSL2 | Isochore |
| chr9 | 39379250 | 39551456 | 172206 | LOC653501, ZNF658B | Unknown |
| chr9 | 39829606 | 39961804 | 132198 | FAM75A2, FAM75A1, FAM74A1 | Unknown |
| chr9 | 41497718 | 41635419 | 137701 | FAM75A5, FAM75A7, LOC653501, ZNF658B | Unknown |
| chr9 | 42743905 | 42847394 | 103489 | LOC286297 | Isochore_border |
| chr10 | 46799214 | 46907775 | 108561 | FAM35B | Isochore |
| chr10 | 48185336 | 48300420 | 115084 | LOC642826, AGAP9, FAM25B, FAM25G, FAM25C, ANXA8, ANXA8L1 | Isochore_border |
| chr16 | 33142890 | 33293778 | 150888 | TP53TG3, TP53TG3C, TP53TG3B | Isochore_border |
| chrX | 52098738 | 52395914 | 297176 | XAGE2, XAGE2B, XAGE1B, XAGE1A, XAGE1D, XAGE1C, XAGE1E | Unknown |
| chrX | 52445914 | 52568230 | 122316 | XAGE1A, XAGE1C, XAGE1E, XAGE1D, XAGE1B | Isochore_border |
| chrY | 4834281 | 4935713 | 101432 | PCDH11Y | Isochore_border |
| chrY | 5012892 | 5205540 | 192648 | PCDH11Y | Unknown |
| chrY | 5274434 | 5421065 | 146631 | PCDH11Y | Isochore_border |
| chrY | 6074690 | 6422524 | 347834 | TTTY23, TTTY23B, TSPY2, TTTY1B, TTTY1, TTTY2B, TTTY2, TTTY21, TTTY21B, TTTY7B, TTTY7, TTTY8B, TTTY8 | Isochore |
| chrY | 9381846 | 9492957 | 111111 | RBMY3AP | Isochore |
| chrY | 9524503 | 9768115 | 243612 | TTTY8, TTTY8B, TTTY7B, TTTY7, TTTY21, TTTY21B, TTTY2B, TTTY2, TTTY1, TTTY1B, TTTY22, TTTY23, TTTY23B | Isochore |
| chrY | 14691127 | 14804076 | 112949 | TTTY15 | Isochore_border |
| chrY | 19563894 | 20143885 | 579991 | FAM41AY1, FAM41AY2, LINC00230B, LINC00230A, XKRY, XKRY2, CDY2B, CDY2A | Unknown |
| chrY | 20193885 | 20834702 | 640817 | XKRY, XKRY2, LINC00230A, LINC00230B, FAM41AY1, FAM41AY2, HSFY2, HSFY1, TTTY9B, TTTY9A | Unknown |
| chrY | 20837553 | 21080706 | 243153 | TTTY9B, TTTY9A, HSFY2, HSFY1, NCRNA00185 | Unknown |
| chrY | 22564778 | 22665261 | 100483 | TTTY10 | Unknown |
| chrY | 23473201 | 23580342 | 107141 | RBMY2EP | Isochore_border |
| chrY | 23634362 | 23838234 | 203872 | RBMY1B, RBMY1A1, RBMY1E, RBMY1D, TTTY13 | Isochore_border |
| chrY | 23993156 | 24359930 | 366774 | RBMY1A1, RBMY1D, RBMY1B, RBMY1E, PRY, PRY2, TTTY6, TTTY6B, RBMY1F, RBMY1J | Isochore_border |
| chrY | 24500602 | 24620459 | 119857 | RBMY1F, RBMY1J, TTTY6B, TTTY6 | Unknown |
| chrY | 24620459 | 28160890 | 3540431 | PRY, PRY2, TTTY17B, TTTY17C, TTTY17A, TTTY4C, TTTY4B, TTTY4, BPY2B, BPY2, BPY2C, DAZ1, DAZ4, DAZ3, DAZ2, TTTY3B, TTTY3, CDY1, CDY1B, CSPG4P1Y, GOLGA2P2Y, GOLGA2P3Y | Isochore_border |
| chr9 | 42027732 | 42145811 | 118079 | | Isochore_border |
| chr9 | 44466205 | 44651655 | 185450 | | Isochore_border |
| chr9 | 45128500 | 45250203 | 121703 | | Isochore_border |
| chr9 | 65632583 | 65745692 | 113109 | | Isochore_border |
| chrY | 3016123 | 3134221 | 118098 | | Isochore |
| chrY | 3179117 | 3359419 | 180302 | | Isochore_border |
| chrY | 3833777 | 3966707 | 132930 | | Unknown |
| chrY | 3966708 | 4346934 | 380226 | | Unknown |
| chrY | 4466077 | 4593373 | 127296 | | Unknown |
| chrY | 4593411 | 4807708 | 214297 | | Unknown |
| chrY | 6482140 | 6677618 | 195478 | | Isochore_border |
| chrY | 7401836 | 7548914 | 147078 | | Unknown |
| chrY | 8214827 | 8334874 | 120047 | | Isochore_border |
| chrY | 15039955 | 15234829 | 194874 | | Unknown |
| chrY | 18248698 | 18381734 | 133036 | | Unknown |
| chrY | 18390543 | 18560004 | 169461 | | Isochore_border |
| chrY | 19375294 | 19500106 | 124812 | | Unknown |
| chrY | 22214221 | 22369679 | 155458 | | Isochore_border |
| chrY | 22419679 | 22564743 | 145064 | | Isochore_border |
| chrY | 23241568 | 23361665 | 120097 | | Isochore_border |
| chrY | 28160891 | 28509481 | 348590 | | Isochore_border |
List of 20 common variant-free regions containing 20 genes.
| chr | CVFR_start | CVFR_end | CVFR_size | gene_name |
| chrX | 52098738 | 52231295 | 132557 | XAGE2, XAGE2B |
| chrX | 52267361 | 52395914 | 128553 | XAGE2, XAGE2B |
| chrY | 4834281 | 4935713 | 101432 | PCDH11Y |
| chrY | 4935714 | 5205540 | 269826 | PCDH11Y |
| chrY | 5274434 | 5421065 | 146631 | PCDH11Y |
| chrY | 9524503 | 9640365 | 115862 | TTTY8, TTTY8B, TTTY7B, TTTY7, TTTY21, TTTY21B, TTTY2B, TTTY2, TTTY1, TTTY1B, TTTY22 |
| chrY | 20228333 | 20599266 | 370933 | XKRY, XKRY2, LINC00230A, LINC00230B, FAM41AY1, FAM41AY2 |
| chrY | 3016123 | 3134221 | 118098 | |
| chrY | 3179117 | 3359419 | 180302 | |
| chrY | 4114366 | 4346934 | 232568 | |
| chrY | 4466077 | 4593373 | 127296 | |
| chrY | 4593411 | 4807708 | 214297 | |
| chrY | 6577215 | 6677618 | 100403 | |
| chrY | 8214827 | 8334874 | 120047 | |
| chrY | 15039955 | 15234829 | 194874 | |
| chrY | 17559652 | 17661377 | 101725 | |
| chrY | 18248698 | 18381734 | 133036 | |
| chrY | 18390543 | 18560004 | 169461 | |
| chrY | 19375294 | 19500106 | 124812 | |
| chrY | 23247004 | 23361665 | 114661 |
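Several rows in this table are sub-intervals of regions in the 50-region list, which is what one would expect if positions carrying additional common variants were cut out and only remainders longer than 100 Kb were kept. The sketch below illustrates that kind of interval subtraction. It is an assumption about the procedure, not the authors' code: `subtract_intervals` is a hypothetical helper, and the masked interval in the example is invented so that the output reproduces the two chrX rows above. The actual variant positions are not given in this section.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def subtract_intervals(region: Interval, masks: List[Interval],
                       min_len: int = 100_000) -> List[Tuple[int, int, int]]:
    """Remove masked intervals from region; keep pieces longer than min_len.

    Returns (start, end, size) tuples, with size = end - start as in the table.
    """
    start, end = region
    pieces: List[Interval] = []
    cursor = start
    for m_start, m_end in sorted(masks):
        if m_end <= cursor or m_start >= end:
            continue  # mask lies outside the part still being scanned
        if m_start > cursor:
            pieces.append((cursor, m_start))
        cursor = max(cursor, m_end)
    if cursor < end:
        pieces.append((cursor, end))
    return [(s, e, e - s) for s, e in pieces if e - s > min_len]


if __name__ == "__main__":
    csfr_chrx = (52098738, 52395914)         # chrX region from the 50-region list
    hypothetical_mask = [(52231295, 52267361)]  # illustrative only, not source data
    for piece in subtract_intervals(csfr_chrx, hypothetical_mask):
        print(piece)
```

The size column is computed as end minus start, matching the CVFR_size values in the table.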
Top 6 GO terms from the functional annotation analysis of 97 CSFR genes by DAVID.
| Category | Term | Count | % | P-Value | FDR |
| GOTERM_BP_FAT | sexual reproduction | 9 | 14.8 | 0.00000003 | 0.000033 |
| GOTERM_BP_FAT | spermatogenesis | 8 | 13.1 | 0.000000047 | 0.000052 |
| GOTERM_BP_FAT | male gamete generation | 8 | 13.1 | 0.000000047 | 0.000052 |
| GOTERM_BP_FAT | gamete generation | 8 | 13.1 | 0.00000026 | 0.00028 |
| GOTERM_BP_FAT | multicellular organism reproduction | 8 | 13.1 | 0.0000011 | 0.0012 |
| GOTERM_BP_FAT | reproductive process in a multicellular organism | 8 | 13.1 | 0.0000011 | 0.0012 |
Genes included: RBMY1A1, RBMY1B, RBMY1J, RBMY1F, XKRY, XKRY2, BPY2C, BPY2B, BPY2, CDY1, CDY1B, CDY2B, CDY2A, DAZ2, DAZ3, DAZ4, DAZ1, and TSPY2.
Genes included: RBMY1A1, RBMY1B, RBMY1J, RBMY1F, BPY2C, BPY2B, BPY2, CDY1, CDY1B, CDY2B, CDY2A, DAZ2, DAZ3, DAZ4, DAZ1, and TSPY2.
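The Count and % columns of the GO table can be cross-checked with simple arithmetic. Assuming the % column is Count divided by the number of input genes that DAVID recognized (an assumption about how the column was produced, since this section does not state it), the sketch below searches for the list sizes consistent with the reported, rounded percentages. The function name `implied_list_sizes` is hypothetical.

```python
from typing import List


def implied_list_sizes(count: int, percent: float,
                       max_n: int = 500, tol: float = 0.05) -> List[int]:
    """List sizes n for which 100 * count / n rounds to the reported percent."""
    return [
        n for n in range(count, max_n + 1)
        if abs(100.0 * count / n - percent) <= tol
    ]


# (Count, %) pairs taken from the table: 9 at 14.8% and 8 at 13.1%.
rows = [(9, 14.8), (8, 13.1)]
candidates = set.intersection(*(set(implied_list_sizes(c, p)) for c, p in rows))

if __name__ == "__main__":
    print(sorted(candidates))
```

Under that assumption, both rows are consistent with a single denominator of 61, which is smaller than the 97 genes in the regions. One possible reading is that only some of the 97 gene symbols were mapped by the annotation tool, but the section does not say so.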