Meng-Ru Ho, Kuo-Wang Tsai, Chun-houh Chen, Wen-chang Lin.
Abstract
Gene duplications are scattered widely throughout the human genome. A single-base difference between nearly identical duplicated segments may be misjudged as a single nucleotide polymorphism (SNP) among individuals. Current genotyping methods cannot distinguish the two cases. As next-generation sequencing technologies become more popular for sequence-based association studies, ambiguous SNPs are accumulating rapidly. Thus, analyzing duplication variations in the reference genome to help prevent false-positive SNPs is imperative. We have identified >10% of human genes as associated with duplicated gene loci (DGL). Through meticulous sequence alignments of DGL, we systematically designated 1,236,956 variations as duplicated gene nucleotide variants (DNVs). The DNV database (dbDNV) (http://goods.ibms.sinica.edu.tw/DNVs/) has been established to promote more accurate variation annotation. Aside from the flat file download, users can explore gene-related duplications and the associated DNVs through DGL and DNV searches, respectively. In addition, dbDNV contains 304,110 DNV-coupled SNPs. Through the DNV-coupled SNP search, users can see which SNP records are also variants among duplicates. This is useful because ∼58% of exonic SNPs in DGL are DNV-coupled. Given the high accumulation of ambiguous SNPs, we suggest that annotating SNPs with their possibility of being DNVs should improve association studies of these variants with human diseases.
Year: 2010 PMID: 21097891 PMCID: PMC3013738 DOI: 10.1093/nar/gkq1197
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. SNPs versus Paralogous Sequence Variants (PSVs). The left panel displays a SNP (T/G) in the population. The base at the specific genomic location varies among individuals: some are homozygous and some are heterozygous. The right panel illustrates PSVs. Two copies of the segment (duplications) exist in the genome. The two duplicates carry different but invariant bases (T in one, G in the other) at the position of interest. The PSV may be indistinguishable from the SNP in genotyping.
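The ambiguity described in Figure 1 can be illustrated with a small toy simulation. This is a minimal sketch, not the authors' method: the function `call_genotype`, its `min_fraction` threshold, and the read counts are all hypothetical, and it assumes that reads from both duplicated copies end up aligned to the same reference position.

```python
from collections import Counter


def call_genotype(bases, min_fraction=0.2):
    """Naive genotype call (illustrative only): report every base that
    appears in at least min_fraction of the reads at one position."""
    counts = Counter(bases)
    total = len(bases)
    alleles = sorted(b for b, n in counts.items() if n / total >= min_fraction)
    return "/".join(alleles)


# Case 1: a true heterozygous SNP (T/G) at a single-copy locus.
snp_reads = ["T"] * 10 + ["G"] * 10

# Case 2: a PSV. Copy A always carries T and copy B always carries G,
# but reads from both copies are (by assumption) mapped to one position.
copy_a_reads = ["T"] * 10
copy_b_reads = ["G"] * 10
psv_reads = copy_a_reads + copy_b_reads

# Both cases yield the same apparent genotype.
print(call_genotype(snp_reads))
print(call_genotype(psv_reads))
```

In this toy setting the read pile-ups are identical, so a caller that looks only at base counts has no way to tell the heterozygous SNP from the fixed difference between duplicates.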
Figure 2. Flowchart of the discovery pipeline. The flowchart describes the strategy used to identify DGL and their associated DNVs in the human reference genome. Existing SNPs in dbSNP that occur at DNV positions provide evidence of the ambiguity.
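The last step of the pipeline, checking which known SNPs fall on DNV positions, might be sketched as a simple coordinate lookup. This is an assumption-laden illustration rather than the dbDNV implementation: the function `flag_dnv_coupled`, the tuple layouts, and all identifiers and coordinates below are hypothetical toy values, not real dbSNP or dbDNV records.

```python
def flag_dnv_coupled(snps, dnv_positions):
    """Mark each SNP as DNV-coupled when its (chromosome, position)
    coincides with a known DNV position.

    snps: iterable of (snp_id, chromosome, position) tuples (hypothetical layout)
    dnv_positions: iterable of (chromosome, position) tuples (hypothetical layout)
    """
    dnv_set = set(dnv_positions)  # set gives constant-time membership tests
    return [(snp_id, (chrom, pos) in dnv_set) for snp_id, chrom, pos in snps]


# Toy data for illustration only.
dnvs = [("chr1", 1000), ("chr1", 2050), ("chr7", 500)]
snps = [
    ("snp_a", "chr1", 1000),
    ("snp_b", "chr1", 1500),
    ("snp_c", "chr7", 500),
]

flags = flag_dnv_coupled(snps, dnvs)
coupled_fraction = sum(coupled for _, coupled in flags) / len(flags)

for snp_id, coupled in flags:
    print(snp_id, "DNV-coupled" if coupled else "not DNV-coupled")
```

A flag of this kind could be carried alongside each SNP record so that downstream association analyses can treat DNV-coupled SNPs with extra caution.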