Abstract
Uncovering the underlying genetic component of any disease is key to the understanding of its pathophysiology and may open new avenues for development of therapeutic strategies and biomarkers. In the past several years, there has been an explosion of genome-wide association studies (GWAS) resulting in the discovery of novel candidate genes conferring risk for complex diseases, including neurodegenerative diseases. Despite this success, there still remains a substantial genetic component for many complex traits and conditions that is unexplained by the GWAS findings. Additionally, in many cases, the mechanism of action of the newly discovered disease risk variants is not inherently obvious. Furthermore, a genetic region with multiple genes may be identified via GWAS, making it difficult to discern the true disease risk gene. Several alternative approaches are proposed to overcome these potential shortcomings of GWAS, including the use of quantitative, biologically relevant phenotypes. Gene expression levels represent an important class of endophenotypes. Genetic linkage and association studies that utilize gene expression levels as endophenotypes determined that the expression levels of many genes are under genetic influence. This led to the postulate that there may exist many genetic variants that confer disease risk via modifying gene expression levels. Results from the handful of genetic studies which assess gene expression level endophenotypes in conjunction with disease risk suggest that this combined phenotype approach may both increase the power for gene discovery and lead to an enhanced understanding of their mode of action. This review summarizes the evidence in support of gene expression levels as promising endophenotypes in the discovery and characterization of novel candidate genes for complex diseases, which may also represent a novel approach in the genetic studies of Alzheimer's and other neurodegenerative diseases.
Year: 2011 PMID: 21569597 PMCID: PMC3113300 DOI: 10.1186/1750-1326-6-31
Source DB: PubMed Journal: Mol Neurodegener ISSN: 1750-1326 Impact factor: 14.195
Summary of studies on genetics of human gene expression: Study characteristics
| Reference | Reference ID | Organism | Tissue | Sample Size | Samples | Transcript Platform | Genotyping Platform |
|---|---|---|---|---|---|---|---|
| [ | | Human | Lymphoblastoid cell lines (LCL) | 96 | Subjects from the CEPH families (17-37 subjects who were heterozygous for any given gene). | ABI Prism SNaPshot Multiplex Kit for 13 genes. | SNPs for 13 genes. |
| [ | | Mouse | Liver | 111 | F2 mice constructed from two standard inbred strains, C57BL/6J and DBA/2J. | 23,574 transcripts (Rosetta Inpharmatics Merck). | >100 microsatellite markers. |
| | | Z. mays (corn) | Ear leaf tissue | 76 | An F3 cross constructed from standard inbred lines of Z. mays. | 24,473 transcripts (Rosetta Inpharmatics Merck). | NA |
| | | Human | Lymphoblastoid cell lines (LCL) | 56 | Subjects from four CEPH families. | 24,479 transcripts (Rosetta Inpharmatics Merck). | NA |
| [ | | Human | Lymphoblastoid cell lines (LCL) | 45 | 35 unrelated subjects from CEPH families vs. 1 reference pool of 10 subjects. 5 genes assessed in a larger sample size (49 unrelated CEPH subjects, 41 sibs from 5 CEPH families and 10 monozygotic twin pairs). | 5,000 random cDNA clones from IMAGE consortium. | Not done. |
| [ | | Human | Lymphoblastoid cell lines (LCL) | 234 | 94 unrelated grandparents from CEPH families and ~140 subjects from 14 large CEPH families. | 8,500 transcripts (analysis restricted to the 3,554 most variable expression phenotypes). | 2,756 autosomal SNP markers (SNP Consortium). |
| [ | | Human | Lymphoblastoid cell lines (LCL) | 167 | Subjects from 15 CEPH families. | 23,499 transcripts (25K human gene oligonucleotide microarray). | 346 autosomal genetic markers. |
| [ | | Human | Lymphoblastoid cell lines (LCL) | 57 | Unrelated CEPH subjects. | 374 transcripts (subset of Morley et al. 2004); Affymetrix Human Genome Focus arrays. | 770,394 SNPs. |
| [ | | Human | Lymphoblastoid cell lines (LCL) | 60 | HapMap unrelated Caucasian subjects (CEU). | 1,433 transcripts for 630 genes (Illumina BeadArray, custom). | 753,712 SNPs. |
| [ | | Human | Lymphoblastoid cell lines (LCL) | 270 | HapMap subjects, 4 populations (30 Caucasian trios (CEU), 45 unrelated Chinese (CHB), 45 unrelated Japanese (JPT), and 30 Yoruba trios (YRI)). | 14,456 transcripts for 13,643 genes (Illumina, Sentrix Human Whole Genome-6 Expression BeadChip version 1). | >2.2 million SNPs per population. |
| [ | | Human | Lymphoblastoid cell lines (LCL) | 210 | HapMap unrelated subjects, 4 populations (60 Caucasian (CEU), 45 Chinese (CHB), 45 Japanese (JPT), 60 Yoruba (YRI)). | 14,925 transcripts for 14,072 genes. | Phase 1 HapMap SNPs and CNV data from comparative genomic hybridization (CGH) array of 26,574 clones. |
| [ | | Human | Lymphoblastoid cell lines (LCL) | 400 | Affected and unaffected children from families with an asthma proband. | 54,675 transcripts representing 20,599 genes (Affymetrix HG-U133 Plus 2.0 chip). | 109,157 SNPs for 830 subjects (Illumina Sentrix Human-1 Genotyping BeadChip) and 299,116 SNPs for 378 subjects (Illumina Sentrix HumanHap300 BeadChip). |
| [ | | Human | Lymphocytes (not transformed cells) | 1,240 | Multigenerational Mexican-American families from the San Antonio Family Heart Study (SAFHS). | 19,648 transcripts for 18,519 genes (Illumina, Sentrix Human Whole Genome-6 Expression BeadChip version 1). | 432 highly polymorphic microsatellite markers. |
| [ | | Human | Blood and adipose tissue | 1,002 (blood) and 673 (adipose) | Icelandic Family Blood (IFB) cohort (N = 1,002) and the Icelandic Family Adipose (IFA) cohort (N = 673). | 20,877 transcripts. | 1,732 microsatellite markers for linkage analysis and 317,503 SNPs for association analysis (150 unrelated subjects). |
| [ | | Human | Liver | 427 | Unrelated subjects. | 39,280 transcripts for 34,266 genes (custom Agilent microarray). | 782,476 unique SNPs (Affymetrix 500K and Illumina 650Y). |
| [ | | Human | Brain | 193 | Human brain samples that are neuropathologically normal. | 14,078 transcripts (Illumina HumanRefseq-8 Expression BeadChip). | 366,140 SNPs (Affymetrix 500K). |
| [ | | Human | Brain | 176 cases and 188 controls | Human brain samples that are neuropathologically normal and with pathological Alzheimer's disease (AD). | 8,650 transcripts (Illumina HumanRefseq-8 Expression BeadChip). | 380,157 SNPs (Affymetrix 500K). |
Summary of studies on genetics of human gene expression: Analyses, results, conclusions.
| Reference | Reference ID | Analytic Approach | Results | Conclusion |
|---|---|---|---|---|
| [ | | Comparison of relative allelic expression levels within the same cellular sample. | Significant differences in allele-specific expression observed for 6 of 13 genes. Mendelian inheritance detected for expression levels, inherited together with genetic markers. | Gene expression levels can be used to detect genetics of disease susceptibility. |
| [ | | eQTL (expression quantitative trait locus) linkage analysis. | Differential expression detected for 7,861 of 23,574 (>33%) genes in the parental and ≥10% genes in the F2 strain. 9-16% of genes have eQTLs with LOD scores > 4.3. Gene expression profiling identified three distinct expression patterns for 280 genes that distinguish mice at the lower 25th percentile of an obesity trait (fat-pad mass = FPM) and two groups at the upper 25th percentile of the FPM trait. These 280 genes were enriched for eQTLs. Linkage analysis of the obesity trait focused on groups with distinct expression patterns improved the signal. | Gene expression levels can be used to identify more refined disease sub-groups, genes and pathways that are implicated in the disease phenotype. These have implications in understanding genetics of complex diseases and drug discovery aimed at more homogeneous sub-groups of distinct expression patterns. |
| | | eQTL (expression quantitative trait locus) linkage analysis. | 18,805 (77%) genes with differential expression. Of these, 6,481 genes with ≥ 1 eQTL with LOD score > 3.0. Total of 7,322 eQTLs. Interactions detected in <10% of eQTLs. | |
| | | Variance components analysis to test heritability. | 2,726 genes with differential expression (11%). Of those, 29% have a detectable heritability. | |
| [ | | Utilized 3-4 replicate measurements per person. Calculated the variance ratio of each gene's expression by dividing the variance of expression levels among subjects by that within subjects (using replicates). | 50% of genes on the arrays are expressed in the LCLs. 813 genes with valid observations had variance ratios of 0.4-64. 5 genes evaluated in a larger group were found to have the highest variance among unrelated subjects, then sibs, then monozygotic twins (10 pairs). | There is natural variation in gene expression levels which is at least in part determined genetically. Genetic differences among individuals may account for variations in gene expression and suggest underlying heritability. |
| [ | | Variance of expression levels detected from 94 unrelated subjects. 3,554 expression phenotypes with greater variation between subjects than within subjects (replicates) were used for further analysis. Genome-wide linkage analysis conducted for these phenotypes in 14 CEPH families. | Found 984 expression phenotypes with pointwise linkage p < 0.05 genome-wide, more than the 178 false positives expected by chance alone. 142 phenotypes have pointwise p < 0.001, which exceeds chance (3.5 false positives expected). Of the top 142, 27 have a cis- (within 5 Mb) and 110 have a trans-regulator. Of the 984, 164 have multiple regulators (152 with multiple trans and 12 with both cis and trans). There are linkage regions to which multiple expression phenotypes link, called "hotspots". Genes that map to one hotspot have expression levels with higher than expected correlations. Some of these genes have close physical locations. Some cis-SNPs show differential allelic expression. | Genetic factors that influence variation in human gene expression can act in cis (5 Mb) or trans. There are transcriptional "hotspots", which may contain "master regulators" of multiple genes. Mapping genetic factors that influence gene expression could help with the understanding of human biology and disease. |
| [ | | Variance components analysis to test heritability and eQTL analysis. Comparison of biological pathways within GO and KEGG, using genes clustered by genetic correlations (GC) and Pearson's correlations (PC). Correction for multiple testing with Bonferroni and false discovery rate (FDR). | 2,430 of 23,499 genes differentially expressed in ≥50% of children. Of these, 762 were heritable with FDR of 0.05 and median heritability of 0.34. These genes were enriched for immunity pathways. 22 genes have significant eQTLs at the genome-wide level. Did not detect "hotspots". 574 genes analyzed for GC and PC showed that both clusters have similar pathway coherence for GO, but GC has better pathway coherence for KEGG pathways. | Genetic factors influence gene expression in LCL. Important to test other tissues. Random samples may not have transcriptional "hotspots". Gene expression genetics may identify novel biological pathways. |
| [ | | Follow-up association study for the significant linkage findings from Morley et al. (2004). Linear regression association for 374 expression levels with prior evidence of | 65 of 374 expression levels have ≥1 SNP that associates at p < 0.001, 12 with p < 1E-10 and 133 with p < 0.01. Same proportions of associations found for the 5', 3' and genic regions. 14 out of top 27 | Strong linkage predicts strong association for expression levels. eSNPs NOT enriched for 5' or 3' end. eGWAS is feasible and may lead to genetic determinants of expression phenotypes. |
| [ | | Analyzed 374 of 630 genes with expression signals above background and most variable. Linear regression association for these genes (688 probes in total). Three methods for multiple-test correction: Bonferroni, FDR and permutations. | Good concordance between the 3 multiple-test correction methods. For 10-40 of 374 genes, | eGWAS can identify variants with regulatory activity. |
| [ | | Linear regression association analysis in 4 ethnic populations. Heritability estimates in Caucasian and Yoruba trios. Tested for significance by 10,000 permutations and FDR. Candidate trans-SNP analysis (SNPs with | 10% (4,829) and 13% (6,482) of all probes analyzed have heritability >0.2 in CEU and YRI trios, respectively, with 958 overlapping genes. 154 CEU and 217 YRI genes have heritability >0.5, with overlap of 9 genes. 831 genes with significant | There is a substantial number of heritable expression traits detectable in a small population (30 trios), but also substantial non-genetic variation. Substantial overlap between different ethnic groups for significant eSNPs. Ethnic differences could in part be due to differences in SNP frequencies. Most eSNPs act in- |
| [ | | Linear regression association analysis in 4 ethnic populations. | Of 14,072 genes, 888 have ≥1 SNP with significant association in ≥1 ethnic group, 331 of which were significant in ≥2 ethnic groups and 67 of which in all 4 populations. Of 14,072 genes, 238 have ≥1 CGH clone with significant association in ≥1 ethnic group, 28 of which were significant in ≥2 ethnic groups and 5 of which in all 4 populations. Not all CGH clones have detectable CNVs. 1,322 CNV clones detected. 99 genes associate with ≥1 CNV clone in ≥1 ethnic group, 34 of which in ≥2 ethnic groups, and 7 in all 4 populations. Most CNV associations cannot be detected by SNPs (87%). | Both SNP and CNV associations replicate across ethnic groups. CNVs appear to exert their effects by disrupting both regulatory regions and genic regions. Survey of structural variants in addition to SNPs is important in eGWAS. |
| [ | | Variance components analysis to test heritability and eQTL analysis on the subset of transcripts with heritabilities > 0.3. Multiple testing corrections by FDR. | No significant differences between asthmatics and non-asthmatics (unchallenged cells). 15,084 transcripts (28%), representing 6,660 genes, have heritabilities > 0.3. Traits with higher heritability have SNPs that explain a bigger percentage of their heritability and therefore also have a larger LOD score of association (on average the peak SNP explains 18.2% of heritability). SNP interactions could explain transcript levels not explained by single SNPs. | Joint analysis of disease GWAS and eGWAS identified potential candidate genes for asthma (ORMDL3), Crohn's disease (PTGER4), NIDDM (PHACS), thalassemia (HBS1L). eGWAS is a useful approach to detect disease SNPs with a functional role. |
| [ | | Variance components analysis to test heritability and eQTL analysis. | 16,678 transcripts (84.9%) were heritable with median heritability estimate of 22.5%. RefSeq transcripts have higher heritability estimates than non-RefSeq transcripts. At an FDR of 5%, identified 1,345 | Lymphocytes may provide a more accurate representation of the natural gene expression state than lymphoblasts, though there is overlap. |
| [ | | Correlations between obesity traits and blood and adipose tissue expression levels. Variance components analysis to test heritability and eQTL analysis. Linear regression association analysis. Multiple testing correction by FDR. Generated connectivity matrix of genes with high correlation of expression in adipose tissue, compared human and mouse data, identified GO categories enriched for co-regulated genes. | Adipose tissue expression levels (63-72%) correlate better with obesity traits than do blood expression levels (3-9%). 55% of blood and 75% of adipose tissue transcripts are significantly heritable, with average heritability of 30%. 2,529 (12%) significant | Significant overlap in genetic factors underlying gene expression in two different tissue types, but expression levels from clinically relevant tissue correlate better with clinical phenotypes. Expression correlation networks combined with |
| [ | | Linear regression association analysis. | At Bonferroni adjusted p < 0.05, 1,350 expression traits (1,273 genes); at FDR <10%, 3,210 traits (3,043 genes) identified to have at least one significant | Evidence of common genetic control between tissues as well as tissue-specific genetic control of expression. Significant |
| [ | | Linear regression association analysis. | 58% of the transcriptome has expression in ≥5% of control brains. Of these 21% correlate with a | Evidence for genetic control of human brain gene expression. Brain eSNPs may be used in conjunction with disease-SNPs for neurologic or psychiatric illnesses to identify functional variants. |
| [ | | Linear regression association analysis. Cis eSNPs defined as being within the gene or 1 Mb of its 3' or 5' end. Analyzed cases and controls both separately and jointly. In the combined analysis, tested for diagnosis effects on expression by comparing a model with diagnosis only vs. one with diagnosis, SNP and diagnosis × SNP interaction. Multiple testing corrected by permutation approaches. Network analysis was done on the transcripts with a significant eQTL (p ≤ 0.01) and those without a significant eQTL that were differentially expressed between ADs and controls. | 58% of the transcriptome has expression in ≥5% of AD brains. Hybridization date and APOE had the strongest influence and post-mortem interval the least influence on brain expression levels. 1,829 significant | Transcriptome measurements in disease-relevant tissue are important. Brain transcriptome appears to be unique. eQTLs may be used as biomarkers for classifying preclinical subgroups. eQTL approach may help distinguish true disease risk variants. Using tissue from subjects with disease may be needed to capture most eSNPs that have disease interactions, though significant eSNPs without disease interactions and some with disease interactions can be identified in control and disease tissue equally well. |
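Several of the studies summarized above combine linear regression association of expression on SNP genotype with permutation testing and Bonferroni or FDR correction. The following is a minimal Python sketch of that general workflow, not the code of any cited study. All function names, the toy genotype and expression values, and the additive 0/1/2 allele-dosage coding are assumptions made for illustration, and the FDR step uses the Benjamini-Hochberg procedure as one common choice, since the table does not say which FDR method each study applied. The last function reproduces the chance-expectation arithmetic that appears consistent with the linkage row reporting 3,554 phenotypes (3,554 × 0.05 ≈ 178 and 3,554 × 0.001 ≈ 3.5).

```python
import random


def ols_slope(dosage, expression):
    """Least-squares slope of expression on allele dosage (assumed 0, 1 or 2)."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    return sxy / sxx  # assumes the SNP is polymorphic, so sxx > 0


def permutation_pvalue(dosage, expression, n_perm=1000, seed=0):
    """Two-sided empirical p-value: shuffle expression values, recompute slope."""
    rng = random.Random(seed)
    observed = abs(ols_slope(dosage, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(ols_slope(dosage, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def bonferroni(pvalues, alpha=0.05):
    """Flag tests whose p-value passes alpha divided by the number of tests."""
    threshold = alpha / len(pvalues)
    return [p <= threshold for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Flag tests declared significant at false discovery rate q (step-up rule)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


def expected_false_positives(n_tests, alpha):
    """Number of tests expected to pass a pointwise threshold under the null."""
    return n_tests * alpha


# Hypothetical toy data: 12 subjects, expression rising with allele dosage.
toy_dosage = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
toy_expression = [1.0, 1.2, 0.9, 1.1, 1.9, 2.1, 2.0, 1.8, 3.1, 2.9, 3.0, 3.2]
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

if __name__ == "__main__":
    print("slope:", ols_slope(toy_dosage, toy_expression))
    print("permutation p:", permutation_pvalue(toy_dosage, toy_expression))
    print("Bonferroni hits:", sum(bonferroni(toy_pvalues)))
    print("BH hits:", sum(benjamini_hochberg(toy_pvalues)))
    print("expected by chance:", expected_false_positives(3554, 0.05))
```

On the toy p-values, Bonferroni is the stricter of the two corrections, which illustrates why a study may report more significant traits at an FDR threshold than at a Bonferroni-adjusted one. A real analysis would typically also adjust for covariates and relatedness, which this sketch omits.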