Li Chen, Zhaohui S Qin.
Abstract
OBJECTIVE: The majority of sequence variants identified by genome-wide association studies (GWASs) fall outside protein-coding regions. Unlike coding variants, these noncoding variants are difficult to connect to the pathophysiology of complex diseases/traits because noncoding regions lack functional annotation. To overcome this, we leveraged a rich collection of genomic and epigenomic profiles to develop DIVAN (Disease/trait-specific Variant ANnotation), which assigns a measurement (D-score) to each base of the human genome in a disease/trait-specific manner. To facilitate the use of DIVAN, we pre-computed D-scores for every base of the human genome (hg19) for 45 different diseases/traits.
Keywords: D-score; DIVAN; Non-coding variants; Software
Year: 2017 PMID: 29084591 PMCID: PMC5663107 DOI: 10.1186/s13104-017-2851-y
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 (a) Illustration of using DIVAN to obtain D-scores of known variants by variant identifier. The input file contains a list of variant identifiers, one variant per row. The output file contains tab-delimited columns giving the variant identifier, D-score, chromosome, chromosome position and D-score percentile of each variant. (b) Illustration of using DIVAN to obtain D-scores of known variants that fall inside genomic regions of interest. The input file contains a list of genomic regions as tab-delimited chromosome, start and end positions. The D-scores of known variants located within each genomic region are reported. The output file contains tab-delimited columns giving the chromosome, start and end positions, variant identifier, variant position, and the D-score of each variant with its corresponding percentile. (c) Illustration of using DIVAN to obtain average D-scores of genomic regions of interest. The input file contains a list of genomic regions as tab-delimited chromosome, start and end positions. The mean and standard deviation of the D-scores of all bases within each genomic region are calculated. The output file contains tab-delimited columns giving the chromosome, start and end positions, the mean D-score with its corresponding percentile, and the standard deviation of the D-scores for each region.
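The region summary described in Fig. 1c can be illustrated with a minimal Python sketch. This is not DIVAN's own code or interface: the function names, the in-memory `scores` dictionary, the inclusive treatment of start and end positions, and the use of the sample standard deviation are all assumptions made for illustration, and the percentile column is omitted because the source does not say how it is derived.

```python
import statistics


def parse_regions(text):
    """Parse tab-delimited 'chromosome<TAB>start<TAB>end' lines into tuples."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom, start, end = line.split("\t")
        regions.append((chrom, int(start), int(end)))
    return regions


def summarize_regions(regions, scores):
    """Return one (chrom, start, end, mean, sd) tuple per region.

    `scores` is a hypothetical dict mapping (chrom, position) to a per-base
    score. Start and end are assumed to be inclusive; positions without a
    score are skipped.
    """
    rows = []
    for chrom, start, end in regions:
        values = [
            scores[(chrom, pos)]
            for pos in range(start, end + 1)
            if (chrom, pos) in scores
        ]
        if not values:
            rows.append((chrom, start, end, None, None))
            continue
        mean = statistics.fmean(values)
        # Sample standard deviation (assumption); needs at least two values.
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        rows.append((chrom, start, end, mean, sd))
    return rows


def format_rows(rows):
    """Render the summary rows as tab-delimited text, one region per line."""
    lines = []
    for chrom, start, end, mean, sd in rows:
        mean_txt = "NA" if mean is None else f"{mean:.4f}"
        sd_txt = "NA" if sd is None else f"{sd:.4f}"
        lines.append("\t".join([chrom, str(start), str(end), mean_txt, sd_txt]))
    return "\n".join(lines)
```

In this sketch, regions with no scored bases are reported as `NA` rather than dropped, so the output keeps one row per input region.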
Fig. 2 (a) D-score distribution for rs924080 for 45 diseases/traits. (b) D-score distribution of glucose-associated SNPs located in the SSU72 gene body for 45 diseases/traits
Fig. 3 D-score distributions of the background (all bases in chr22) and of risk variants associated with four diseases: Behcet Syndrome, Macular Degeneration, Bipolar Disorder and Pancreatic Neoplasms