Fan Song1, Yu Tao1, Yue Sun2,3, David Saffen4,5,6.
Abstract
In this study, we present a novel method, based on the multiple coefficient of determination ($R^2_M$), for parsing SNPs located within the chromosomal neighborhood of a gene into semi-independent families, each of which corresponds to one or more functional variants that regulate transcription of the gene. Specifically, our method uses a matrix equation framework to calculate $R^2_M$ values for SNPs within a chromosome region of interest (ROI) based on the choice of 1-4 "index" SNPs (iSNPs) that serve as proxies for underlying regulatory variants. Exhaustive testing of sets of 1-4 candidate iSNPs identifies the iSNP models that best account for the estimated $R^2$ values derived from single-variable linear regression analyses of the correlation between mRNA expression and the genotype of each individual SNP. Subsequent genotype-based estimation of pairwise $r^2$ linkage disequilibrium (LD) coefficients between each iSNP and the other ROI SNPs allows the SNPs to be parsed into semi-independent families. Analysis of mRNA expression and genotype data downloaded from the Gene Expression Omnibus (GEO) and the database of Genotypes and Phenotypes (dbGaP) demonstrates the usefulness of this method for parsing SNPs based on experimental data. We believe that this method will be widely applicable for analyzing the genetic basis of mRNA expression and for visualizing the contributions of multiple genetic variants to the regulation of individual genes.
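The last two steps of the abstract (estimating per-SNP $R^2$ and pairwise $r^2$ LD from genotypes, then grouping SNPs around the iSNPs) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: genotypes are coded as 0/1/2 allele counts, LD is approximated by the squared Pearson correlation between genotype columns, each SNP is assigned to the single iSNP with which it shows the highest $r^2$, and the function names, the `min_r2` cutoff, and the toy data are hypothetical.

```python
import numpy as np

def squared_corr(x, y):
    """Squared Pearson correlation; with one predictor this equals the regression R^2."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)

def parse_into_families(genotypes, index_cols, min_r2=0.2):
    """Label each SNP (column) with the index SNP it is in strongest LD with.

    SNPs whose best r^2 falls below min_r2 (a hypothetical cutoff) are labeled -1.
    """
    ld = np.corrcoef(genotypes, rowvar=False) ** 2      # pairwise r^2 between SNP columns
    ld_to_index = ld[:, index_cols]                     # shape: (n_snps, n_index_snps)
    best = ld_to_index.argmax(axis=1)
    strong_enough = ld_to_index.max(axis=1) >= min_r2
    return np.where(strong_enough, np.asarray(index_cols)[best], -1)

# Toy data: 6 individuals (rows) x 5 SNPs (columns), genotypes coded 0/1/2.
genotypes = np.array([
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 2],
    [2, 2, 0, 0, 0],
    [0, 0, 2, 2, 2],
    [1, 1, 2, 2, 0],
    [2, 2, 2, 2, 2],
])
expression = np.array([1.0, 2.1, 2.9, 1.2, 1.9, 3.1])  # made-up mRNA levels

estimated_r2 = [squared_corr(genotypes[:, j], expression) for j in range(genotypes.shape[1])]
families = parse_into_families(genotypes, index_cols=[0, 2])
print(families)
```

In this toy example, SNPs 0 and 1 fall into the family of index SNP 0, SNPs 2 and 3 into the family of index SNP 2, and SNP 4 is left unassigned because its LD with both index SNPs is weak.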
Year: 2019 PMID: 31882953 PMCID: PMC6934451 DOI: 10.1038/s41598-019-56494-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Solutions to higher-order “constraint” matrix equations.
| Number | Bi-allelic regulatory variants or index SNPs in constraint matrix equation | Values of $R_A^2 = \beta^2/4\alpha^2$ for non-regulatory/non-index SNPs |
|---|---|---|
| 1 | SNP$_B$ | $(R_B b_{12})^2 / b_{11}^2$ |
| 2 | SNP$_B$, SNP$_C$ | $(R_B b_{12} + R_C b_{13})^2 / b_{11}^2$ |
| 3 | SNP$_B$, SNP$_C$, SNP$_D$ | $(R_B b_{12} + R_C b_{13} + R_D b_{14})^2 / b_{11}^2$ |
| 4 | SNP$_B$, SNP$_C$, SNP$_D$, SNP$_E$ | $(R_B b_{12} + R_C b_{13} + R_D b_{14} + R_E b_{15})^2 / b_{11}^2$ |
| 5 | SNP$_B$, SNP$_C$, SNP$_D$, SNP$_E$, SNP$_F$ | $(R_B b_{12} + R_C b_{13} + R_D b_{14} + R_E b_{15} + R_F b_{16})^2 / b_{11}^2$ |
| N | SNP$_B$, SNP$_C$, SNP$_D$, SNP$_E$, SNP$_F$, … SNP$_N$ | $(R_B b_{12} + R_C b_{13} + R_D b_{14} + R_E b_{15} + R_F b_{16} + \dots + R_N b_{1N})^2 / b_{11}^2$ |
$b_{11}$, $b_{12}$, etc. are elements of the adjugate matrix ($\operatorname{adj}\mathbf{R}$) of the correlation matrix $\mathbf{R}$, defined for each set of bi-allelic regulatory variants or index SNPs (SNP$_B$, SNP$_C$, etc.), and $\beta$ and $\alpha$ refer to terms in the quadratic formula used to solve the polynomial equations derived from the “constraint” matrix equations for $R_A$: $R_A = \dfrac{-\beta \pm (\beta^2 - 4\alpha\gamma)^{1/2}}{2\alpha} = -\dfrac{\beta}{2\alpha}$, since $(\beta^2 - 4\alpha\gamma)^{1/2} = 0$ under the defined constraints. (See Supplementary Files 1 and 3 for details concerning mathematical notation and equation derivations.)
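The tabulated solutions can be evaluated numerically along the following lines. This is a minimal sketch, assuming that the correlation matrix is the genotype correlation matrix with the non-index SNP (SNP A) in the first row and column followed by the index SNPs, and that $R_B$, $R_C$, etc. are the signed expression-genotype correlations of the index SNPs; the function name `predicted_r2_a` and the example values are hypothetical, and the exact matrix definition is given in the Supplementary Files rather than here.

```python
import numpy as np

def predicted_r2_a(corr, r_index):
    """Evaluate (R_B*b12 + R_C*b13 + ...)^2 / b11^2 from the first row of adj(corr).

    corr    : (k+1) x (k+1) correlation matrix, SNP A first (assumed ordering).
    r_index : the k expression-genotype correlations R_B, R_C, ... of the index SNPs.
    """
    corr = np.asarray(corr, dtype=float)
    # For an invertible matrix, adj(R) = det(R) * inv(R).
    adj = np.linalg.det(corr) * np.linalg.inv(corr)
    b11, b1j = adj[0, 0], adj[0, 1:]
    return float(np.dot(r_index, b1j) ** 2 / b11 ** 2)

# One index SNP: the expression reduces to R_B^2 * r_AB^2.
r_ab, R_B = 0.8, 0.5
one_index = predicted_r2_a([[1.0, r_ab], [r_ab, 1.0]], [R_B])

# Two index SNPs, with SNP A uncorrelated with both: the predicted value is 0.
two_index = predicted_r2_a(np.eye(3), [0.5, 0.3])
print(one_index, two_index)
```

With a single index SNP, the adjugate of the 2 x 2 matrix has $b_{11} = 1$ and $b_{12} = -r_{AB}$, so the first row of the table reduces to $R_B^2\, r_{AB}^2$, which the first example reproduces.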
Figure 1. Solutions to polynomial equations derived from “constraint” matrix equations.
Figure 2. Multiple coefficient of determination-based analysis of human MTHFR mRNA expression in frontal temporal cortex (FCTX). (A) Screenshot from the UCSC Genome Browser (GRCh37/hg19 version) showing the chromosome 1 ROI containing MTHFR and neighboring genes. The two tracks at the bottom of this panel show: (i) levels of histone H3-lysine 27 acetylation (H3K27Ac), a marker for open, transcriptionally active chromatin detected in multiple cell lines, and (ii) clusters of DNase I-sensitive sites (DNase clusters), which are also markers for open chromatin. (B) Upper graph: an “$R^2$-$R^2$” plot comparing estimated values of the coefficient of determination ($R^2$), derived from single-variable linear regression analyses of correlations between mRNA expression levels and genotypes for 100 genotyped and 298 imputed SNPs in the chromosome ROI [blue (nominal P < 0.05) and grey (nominal P ≥ 0.05) bars], with predicted $R^2$ values calculated as described in the text. Lower graph: an “$R^2$−$\Delta^2$” plot showing the parsing of ROI SNPs into three semi-independent families, each comprising a subset of ROI SNPs that are in LD with one of three index SNPs (iSNPs) selected as described in the text. The three iSNPs listed in the upper-right corners of the two plots were selected as the combination of SNPs that produced the closest agreement (smallest NRMSE) between estimated and predicted $R^2$ values among thousands of randomly selected combinations of three ROI SNPs. The adjusted $R^2_{model}$ (adjusted $R^2$), listed at the top left of the upper plot, was derived from linear regression analysis of the correlation between estimated and predicted $R^2$ values and provides a measure of the “goodness-of-fit” for this combination of iSNPs. (See Supplementary File 2, Online Methods, for details concerning imputation of SNP genotypes.)
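The two fit statistics mentioned in this caption can be computed along the following lines. This is a sketch under assumptions rather than the authors' procedure: the caption does not say how the RMSE is normalized, so the range of the estimated $R^2$ values is used here as one common convention; the adjusted $R^2$ uses the standard formula for a regression with a single predictor; and the candidate models and all numbers are invented for illustration.

```python
import numpy as np

def nrmse(estimated, predicted):
    """RMSE divided by the range of the estimated values (assumed normalization)."""
    estimated = np.asarray(estimated, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((estimated - predicted) ** 2))
    return float(rmse / (estimated.max() - estimated.min()))

def adjusted_r2(estimated, predicted, n_predictors=1):
    """Adjusted R^2 for the linear regression of estimated on predicted values."""
    r2 = np.corrcoef(estimated, predicted)[0, 1] ** 2
    n = len(estimated)
    return float(1 - (1 - r2) * (n - 1) / (n - n_predictors - 1))

# Made-up estimated R^2 values for five SNPs and predictions from two candidate iSNP sets.
estimated = [0.30, 0.20, 0.10, 0.05, 0.00]
candidates = {
    "model_1": [0.28, 0.21, 0.12, 0.04, 0.01],
    "model_2": [0.10, 0.25, 0.02, 0.15, 0.05],
}

# Keep the candidate with the smallest NRMSE, then report its goodness-of-fit.
best = min(candidates, key=lambda name: nrmse(estimated, candidates[name]))
print(best, nrmse(estimated, candidates[best]), adjusted_r2(estimated, candidates[best]))
```

In a full analysis, `candidates` would hold the predicted $R^2$ vector for each tested combination of iSNPs, and the same comparison would be repeated over all combinations.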