
Comparison of strategies for identification of regulatory quantitative trait loci of transcript expression traits.

Nora Franceschini, Mary K Wojczynski, Harald H H Göring, Juan Manuel Peralta, Thomas D Dyer, Xia Li, Hao Li, Kari E North.

Abstract

To identify regulatory genes, we determined the heritability of gene transcripts, performed linkage analysis to identify quantitative trait loci (QTLs), and evaluated the evidence for shared genetic effects among transcripts with co-localized QTLs in non-diseased participants from 14 CEPH (Centre d'Etude du Polymorphisme Humain) Utah families. Seventy-six percent of transcripts had significant heritability, and 54% of these had a LOD score ≥ 1.8. Bivariate genetic analysis of 15 transcripts that had co-localized QTLs on 4q28.2-q31.1 identified significant genetic correlation among some transcripts, although no improvement in the magnitude of LOD scores in this region was noted. Similar results were found in the analysis of 12 transcripts that had co-localized QTLs in the 13q34 region. Principal-component analyses did not improve the ability to identify chromosomal regions of co-localized gene expression.


Year:  2007        PMID: 18466588      PMCID: PMC2367462          DOI: 10.1186/1753-6561-1-s1-s85

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

The Human Genome Project has generated a wealth of information, and the interpretation of these data has become a major area of research. For simple Mendelian disorders, identifying genetic effects is fairly straightforward because the biology that drives these disorders is well understood. For complex oligogenic or polygenic disorders, however, understanding all the interconnections between genes influencing a trait is difficult because knowledge of the underlying biology is still evolving. Multiple gene × gene and gene × environment interactions can influence the expression of phenotypes. Genes can interact by modifying the expression of other genes and therefore function as regulatory genes [1]. In an effort to dissect some of these complexities, we performed linkage analysis of gene expression transcripts in members of Centre d'Etude du Polymorphisme Humain (CEPH) Utah families to determine the heritability of transcripts and the evidence for regulatory quantitative trait loci (QTLs). We also performed pairwise bivariate linkage analysis and principal-component analysis (PCA) for data reduction to evaluate the evidence for shared genetic effects. The ability to assess gene expression traits simultaneously and to link them to QTLs offers the possibility of identifying previously unknown molecular processes for future investigation.

Methods

Population and phenotypes

We used the Genetic Analysis Workshop 15 (GAW 15) Problem 1 microarray gene expression profiles for the analyses. Data were available for 14 three-generation CEPH Utah families. Expression levels of genes were obtained from lymphoblastoid cells using Affymetrix Human Focus Arrays [2]. We were provided with data on 3554 transcripts that showed greater variation between individuals than within the same individual. Family members were genotyped for 2882 autosomal and X-linked single-nucleotide polymorphisms (SNPs) generated by the SNP Consortium. Genetic map positions (in Kosambi centimorgans) were obtained using the SNP Mapping web application developed by the University College Dublin Conway Institute of Biomolecular and Biomedical Research, which uses the Rutgers Combined Linkage-Physical Map of the Human Genome and data taken from NCBI dbSNP Build 123. This information was used to calculate multipoint identity-by-descent matrices (MIBDs) with Merlin and Minx [3], after removal of Mendelian inconsistencies and double recombinants with Preswalk (based on Simwalk mistyping probabilities) [4]. MIBDs were used for linkage analyses. Transcript distributions were normalized using an inverse normalization transformation of z-scores of individual transcripts regressed on the mean individual transcript level. We further adjusted for the effects of age, age², sex, age × sex, and age² × sex interactions using predictive linear regression models in SAS 9.1 (Cary, NC). We generated these residuals as part of our processing of the transcripts for linkage analyses.
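The normalization step can be illustrated with a minimal sketch of a rank-based inverse normal transformation. This is an assumption about the general technique, not the authors' SAS procedure: the rank offset of 0.5, the function name `inverse_normal_transform`, and the expression values are all hypothetical.

```python
from statistics import NormalDist

def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transform; tied values get their average rank.

    The offset (0.5 here) is an assumed convention, not taken from the paper.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Map each rank to a quantile of the standard normal distribution.
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]

# Hypothetical expression levels for one transcript in five individuals.
expression = [7.1, 9.4, 8.2, 15.0, 8.2]
transformed = inverse_normal_transform(expression)
```

The transform preserves the ordering of the values while pulling in outliers such as 15.0, which is why it is commonly applied before variance-components analysis.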

Heritability estimation and linkage analysis

Heritability was estimated using maximum likelihood variance decomposition methods in SOLAR [5,6]. Genome scans were performed using multipoint variance-components models that test for linkage between traits and genetic variants by partitioning the variance of the expression level into its additive genetic and environmental variance components [7]. For transcripts with co-localized QTLs, we performed bivariate linkage analysis to identify shared genetic effects. The bivariate polygenic model estimates correlations caused by residual additive genetic effects (ρG) and correlations caused by random environmental effects (ρE) [8]. To test for additive genetic correlation among pairs of traits, the log likelihood of a model in which ρG is constrained to 0 (null hypothesis, no correlation) or ρG = 1 (null hypothesis, complete shared genetic effect) is compared to that of a model in which ρG is estimated for the traits. Significant differences among the models (ρG ≠ 0) suggest that some of the same genes influence both traits. We also performed linkage analysis using the factors obtained from the PCA in a sample of transcripts with co-localized QTLs.
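The model comparisons described above can be sketched as likelihood-ratio computations. This is a minimal illustration under stated assumptions, not SOLAR's implementation: the function names are hypothetical, the log likelihoods are assumed to be natural logs, and halving the p-value for the ρG = 1 test (a boundary hypothesis) is a common convention that the text does not state.

```python
import math

def lod_score(loglik_linkage, loglik_polygenic):
    """LOD = difference in natural-log likelihoods divided by ln(10)."""
    return (loglik_linkage - loglik_polygenic) / math.log(10)

def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def rho_g_test(loglik_free, loglik_constrained, boundary=False):
    """Likelihood-ratio test of a constrained rho_G against a freely estimated one.

    boundary=True halves the p-value, the assumed treatment for rho_G = 1,
    where the null value lies on the edge of the parameter space.
    """
    stat = 2.0 * (loglik_free - loglik_constrained)
    p = chi2_1df_pvalue(stat)
    return stat, (p / 2.0 if boundary else p)

# Hypothetical log likelihoods for one pair of transcripts.
stat, p_value = rho_g_test(loglik_free=-250.0, loglik_constrained=-253.0)
```

A small p-value from the ρG = 0 comparison suggests shared additive genetic effects; a small p-value from the ρG = 1 comparison suggests the sharing is incomplete.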

Principal-component analysis

PCA was used to reduce the number of expression profiles into statistically meaningful groups while retaining the original total variance using all the expression profiles [9,10]. We selected two different chromosomal regions, each 10 to 12 Mb in length, in which the QTLs of at least 10 transcripts were co-localized. Only transcripts of genes that were not located in these selected chromosomal regions were included in the analyses (trans-regulatory genes). Because of the small number of individuals in the study and concerns about overfitting the model, a maximum of 50 transcript values was considered at one time [11]. The number of factors was determined using the eigenvalue-one requirement [11]. Factors were interpreted by examining the varimax-rotated factor loadings, which are the correlations between each phenotype and the factor in question. Factor loadings greater than or equal to 0.40 in absolute value were used to interpret factors and to characterize the factor structures; this criterion ensures that the individual factor variables share at least 15% of their variance with the given factor [9]. The principal components were obtained by calculating the eigenvalues of the sample covariance matrix, which represent the amount of variance contributed by each factor. Only factors with eigenvalues higher than 1 were considered for linkage analysis.
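The two selection rules above (eigenvalue greater than 1, absolute loading of at least 0.40) can be sketched as follows. The eigenvalues, loadings, and function names are hypothetical and serve only to show how the cutoffs operate; the squared loading gives the share of variance a variable has in common with the factor, so a loading of 0.40 corresponds to 0.16.

```python
def retained_factors(eigenvalues, threshold=1.0):
    """Indices of factors whose eigenvalue exceeds the threshold (eigenvalue-one rule)."""
    return [i for i, ev in enumerate(eigenvalues) if ev > threshold]

def salient_loadings(loadings, cutoff=0.40):
    """Map each variable with |loading| >= cutoff to (loading, shared variance)."""
    return {
        name: (value, value * value)
        for name, value in loadings.items()
        if abs(value) >= cutoff
    }

# Hypothetical eigenvalues and rotated loadings for one factor.
eigenvalues = [3.1, 1.4, 0.9, 0.6]
loadings = {"transcript_A": 0.82, "transcript_B": -0.45, "transcript_C": 0.12}

kept = retained_factors(eigenvalues)   # factors passed on to linkage analysis
salient = salient_loadings(loadings)   # variables used to interpret the factor
```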

Integrating data from linkage analysis for gene co-expression

Linkage signals of individual transcript expression levels were recorded, and the location of each QTL was compared to the location of the transcript's gene in order to identify trans-regulatory sites. In addition, the locations and LOD scores of QTLs identified in single-transcript (univariate) analysis were compared with the locations of the QTLs identified using bivariate analysis or the PCA factors. This comparison allowed us to determine whether bivariate analysis or PCA data reduction improved the evidence for linkage; if so, the transcripts included in the principal components would warrant more in-depth examination for biologic interactions in complex disorders.
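The comparison of QTL location and gene location can be sketched in simplified form. The sketch below compares chromosomes only, which is a coarser rule than the region-based comparison described above; the function names are hypothetical, and the cytobands in the usage lines are those reported for the MX2 transcript in Table 1.

```python
import re

def chromosome_of(cytoband):
    """Return the chromosome label from a cytoband such as '4q28.3' or 'Xp11.2'."""
    match = re.match(r"^(\d{1,2}|X|Y)[pq]", cytoband)
    if match is None:
        raise ValueError(f"unrecognized cytoband: {cytoband!r}")
    return match.group(1)

def on_different_chromosome(qtl_band, gene_band):
    """True when the QTL and the transcript's gene lie on different chromosomes."""
    return chromosome_of(qtl_band) != chromosome_of(gene_band)

# MX2: QTL at 4q28.3, gene at 21q22.3 (values from Table 1).
mx2_is_trans = on_different_chromosome("4q28.3", "21q22.3")
```

A QTL on the same chromosome as its gene but in a distant region would need a position-based check, which this sketch does not attempt.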

Results

Among 194 individuals from 14 families, 17 individuals with missing information on age were excluded. Seventy-six percent (n = 2688) of the transcripts had significant heritability (p < 0.05) and were considered for the linkage analysis. Of these, 1448 (54%) transcripts displayed suggestive evidence of linkage (maximum LOD score ≥ 1.8 [12]). The QTLs of 1661 transcripts (759 of which had LOD ≥ 1.8) were localized in a different region than the transcript's gene (trans-regulatory sites). We used two different chromosomal regions with co-localized transcript QTLs, chromosomes 4q28.2-q31.1 and 13q34, for more in-depth analyses.
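The percentages reported above follow directly from the counts given in the text and in the Methods (3554 transcripts provided); a short check, using only those numbers:

```python
total_individuals = 194
excluded_missing_age = 17
total_transcripts = 3554   # transcripts provided (Methods)
heritable = 2688           # significant heritability, p < 0.05
suggestive_linkage = 1448  # maximum LOD score >= 1.8

analyzed_individuals = total_individuals - excluded_missing_age
pct_heritable = 100 * heritable / total_transcripts
pct_suggestive = 100 * suggestive_linkage / heritable

print(analyzed_individuals)     # individuals retained after exclusion
print(round(pct_heritable))     # share of transcripts with significant heritability
print(round(pct_suggestive))    # share of heritable transcripts with LOD >= 1.8
```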

Chromosome 4q28.2-q31.1 region

Table 1 reports the results for the chromosome 4q28.2-q31.1 region. Fifteen transcripts co-localized in this region in the univariate linkage analysis, and the LOD scores ranged from 1.17 to 3.72. The strongest linkage signals were observed for the transcripts of the MX2, NUCB2, and SNX4 genes. Using PCA, we obtained five factors from the 15 transcripts with eigenvalues greater than 1. Only one factor, with a high positive loading for the MX2 gene transcript, had a significant heritability and a LOD score ≥ 1.8. The linkage analysis using this factor identified the chromosome region for the MX2 gene, but the LOD score was lower than the one obtained by single linkage analysis of the MX2 transcript.
Table 1

Univariate transcript heritability and linkage analysis compared to principal-component approach: chromosome 4q28.2-q31.1 region

| Transcript | Group (a) | Gene name | Gene symbol | H2 (SE) (b) | H2 p-value | LOD score | Trait locus location | Transcript gene locus |
|---|---|---|---|---|---|---|---|---|
| 218935_at | | EH-domain containing 3 | EHD3 | 0.20 (0.13) | 0.01 | 1.98 | 4q28.2 | 2p21 |
| 212652_s_at | 1 | sorting nexin 4 | SNX4 | 0.21 (0.09) | 0.001 | 3.03 | 4q28.2 | 3q21.2 |
| 212426_s_at | 1 | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, theta polypeptide | YWHAQ | 0.40 (0.15) | <0.001 | 2.12 | 4q28.2 | 2p25.1 |
| 207076_s_at | 1 | argininosuccinate synthetase | ASS | 0.19 (0.11) | 0.01 | 1.17 | 4q28.2 | 9q34.1 |
| 213798_s_at | 2 | CAP, adenylate cyclase-associated protein 1 (yeast) | CAP1 | 0.16 (0.09) | 0.008 | 1.73 | 4q28.2 | 1p34.2 |
| 220143_at | | LUC7-like (S. cerevisiae) | LUC7L | 0.16 (0.09) | 0.013 | 1.79 | 4q28.3 | 16p13.3 |
| 204994_at | 1 | myxovirus (influenza virus) resistance 2 (mouse) | MX2 | 0.30 (0.13) | <0.001 | 3.72 | 4q28.3 | 21q22.3 |
| 200974_at | | actin, alpha 2, smooth muscle, aorta | ACTA2 | 0.20 (0.09) | 0.003 | 1.61 | 4q28.3-q31.1 | 10q23.3 |
| 201397_at | 2 | phosphoglycerate dehydrogenase | PHGDH | 0.32 (0.12) | <0.001 | 2.12 | 4q28.3-q31.1 | 1p12 |
| 203882_at | 1 | interferon-stimulated transcription factor 3, gamma 48 kDa | ISGF3G | 0.19 (0.10) | 0.006 | 1.79 | 4q28.3-q31.1 | 14q11.2 |
| 203675_at | 2 | nucleobindin 2 | NUCB2 | 0.47 (0.17) | <0.001 | 3.15 | 4q28.3 | 11p15.1-p14 |
| 201195_s_at | | solute carrier family 7, member 5 | SLC7A5 | 0.16 (0.09) | 0.01 | 1.59 | 4q31.1 | 16q24.3 |
| 201681_s_at | 2 | discs, large homolog 5 (Drosophila) | DLG5 | 0.22 (0.12) | 0.005 | 2.05 | 4q31.1 | 10q23 |
| 202531_at | | interferon regulatory factor 1 | IRF1 | 0.21 (0.14) | 0.02 | 2.80 | 4q31.1 | 5q31.1 |
| 202732_at | 2 | protein kinase (cAMP-dependent, catalytic) inhibitor gamma | PKIG | 0.38 (0.11) | <0.001 | 1.78 | 4q31.1 | 20q12-q13.1 |
| Principal component analysis, 15 transcripts: factor (c) | | | N/A | 0.29 (0.14) | 0.001 | 1.99 | 4q28.3 | N/A |
| Group 1 factor 1 (loading MX2) | | | N/A | 0.27 (0.13) | 0.002 | 2.28 | 4q28.3 | N/A |
| Group 1 factor 2 (loading YWHAQ) | | | N/A | 0.29 (0.13) | 0.001 | 0.00 | 4q28.2-31.1 | N/A |

(a) Groups 1 and 2 contain transcripts that were correlated in bivariate analysis. Transcripts without correlation in bivariate analyses were not assigned a group.

(b) H2, heritability; SE, standard error; N/A, not applicable.

(c) For principal component analysis, only factors with significant heritability (alpha = 0.05) are shown.

We then performed bivariate analysis of all pairwise combinations of co-localized transcripts on 4q28.2-q31.1 and found evidence for genetic correlation among co-localized genes, although with little increase in the magnitude of the LOD score (Figure 1). This analysis identified two networks of gene expression (Figure 1). We obtained two factors using PCA of the first network (Group 1: SNX4, YWHAQ, ASS, MX2, and ISGF3G gene transcripts). Both factors had significant heritability; however, only Factor 1, loading heavily on the MX2 gene, localized to the 4q28.2-q31.1 region (Table 1), and its LOD score (2.28) was lower than that of the univariate MX2 transcript analysis (3.72). The heritability of the one factor obtained using PCA for Group 2 transcripts was not significant, and no further analysis was performed.
Figure 1

Chromosome 4 co-localized gene transcripts: univariate and bivariate analysis results. Each box has the transcript name (in bold) and the univariate transcript LOD score. Genetic correlation (ρG) between two transcripts and p-values are displayed in the outside box along with the bivariate LOD scores. We found two potential networks of regulatory genes among 15 co-expressed transcripts in the 4q28.2 to 4q31.1 region. Five transcripts did not have significant genetic correlation with any other transcript and are not included in this graph.

Chromosome 13q34 region

We performed the same analysis in an additional chromosomal region of co-localized transcripts, the 13q34 region, and noted similar results. Using univariate analysis, 12 transcripts co-localized in this region, and bivariate analysis revealed an intricate network of correlated traits (Table 2 and Figure 2). Using PCA, we obtained five factors, three of them with significant heritability. Similar to our findings on chromosome 4, PCA factors did not improve the magnitude of the LOD scores when compared to univariate analysis.
Table 2

Univariate transcript heritability and linkage analysis compared to principal-component approach: chromosome 13q34 region

| Transcript | Gene name | Gene symbol | H2 (SE) (a) | H2 p-value | LOD score | Trait locus location | Transcript gene locus |
|---|---|---|---|---|---|---|---|
| 200805_at | lectin, mannose-binding 2 | LMAN2 | 0.25 (0.10) | 0.0003 | 1.3 | 13q33.2-q34 | 5q35.3 |
| 209375_at | xeroderma pigmentosum, complementation group C | XPC | 0.23 (0.13) | 0.007 | 1.8 | 13q33.2-q34 | 3p25 |
| 211564_s_at | PDZ and LIM domain 4 | PDLIM4 | 0.20 (0.13) | 0.01 | 2.0 | 13q34 | 5q31.1 |
| 203366_at | polymerase (DNA directed), gamma | POLG | 0.21 (0.11) | 0.002 | 2.0 | 13q34 | 15q25 |
| 210502_s_at | peptidylprolyl isomerase E (cyclophilin E) | PPIE | 0.40 (0.13) | <0.001 | 1.8 | 13q34 | 1p32 |
| 217922_at | mannosidase, alpha, class 1A, member 2 | MAN1A2 | 0.21 (0.10) | 0.003 | 1.6 | 13q34 | 1p13 |
| 209715_at | chromobox homolog 5 (HP1 alpha homolog, Drosophila) | CBX5 | 0.28 (0.11) | <0.001 | 2.4 | 13q34 | 12q13.13 |
| 203880_at | COX17 homolog, cytochrome c oxidase assembly protein (yeast) | COX17 | 0.28 (0.11) | 0.0004 | 1.3 | 13q34 | 3q13.33 |
| 201145_at | HCLS1 associated protein X-1 | HAX1 | 0.33 (0.14) | 0.0004 | 2.0 | 13q34 | 1q21.3 |
| 201157_s_at | N-myristoyltransferase 1 | NMT1 | 0.31 (0.11) | 0.0002 | 2.0 | 13q34 | 17q21.31 |
| 209219_at | RD RNA binding protein | RDBP | 0.36 (0.11) | <0.001 | 1.6 | 13q34 | 6p21.3 |
| 217932_at | mitochondrial ribosomal protein S7 | MRPS7 | 0.11 (0.08) | 0.05 | 1.0 | 13q34 | 17q25 |
| Principal component analysis, factor 2 (loading HAX1) (b) | | N/A | 0.28 (0.13) | 0.001 | 1.6 | 13q34 | N/A |
| factor 3 (loading MRPS7) | | N/A | 0.21 (0.10) | 0.003 | 1.1 | 13q33.1-33.2 | N/A |
| factor 5 (loading NMT1) | | N/A | 0.20 (0.12) | 0.02 | 1.9 | 13q34 | N/A |

(a) H2, heritability; SE, standard error; N/A, not applicable.

(b) For principal component analysis, only factors with significant heritability (alpha = 0.05) are shown.

Figure 2

Chromosome 13 co-localized gene transcripts: univariate and bivariate analysis results. See legend to Figure 1 for explanation of symbols.

Discussion

In this study, we identified co-localized QTLs of individual transcripts and compared the univariate and bivariate linkage results using single transcripts to those using factors obtained from PCA. By using factors that accounted for the variance of multiple transcripts with co-localized QTLs, we attempted to reduce the number of linkage analyses performed and possibly to identify previously unknown patterns of associated gene expression profiles. The PCA did in fact reduce the number of linkage analyses performed, but it did not improve the magnitude of signals in the target QTLs as compared with univariate or bivariate analyses. In at least one case, PCA was unable to detect a linkage signal for the main gene transcript loading on the factor (Table 1, Group 1, Factor 2). We also performed pairwise bivariate genetic analysis on those transcripts that co-localized to the same genomic region, presumably because this area of the genome harbored genes involved in the regulation of these transcripts [2]. We detected significant genetic correlation among these co-localized transcripts, indicating potential gene networks operating in these regions. However, in most cases, bivariate linkage analysis did not improve the magnitude of the LOD score compared to univariate analysis. Most traits were highly correlated (ρG > 0.60) and may therefore provide redundant information, which can reduce the power to detect the bivariate signal [8]. In addition, because ρG measures the overall additive genetic correlation between two traits and not QTL-specific pleiotropy, it is possible that the co-localized linkage signals are not in fact genetically correlated. Further analysis is required to address these issues. The chromosomal regions selected for detailed analyses were chosen arbitrarily, as we identified multiple other regions with co-localized linkage of gene expression.
The results from our univariate genome scan differ markedly from those reported by Morley et al. [2], in part because we analyzed a smaller sample of individuals so that adjustment for the covariate effects of age could be made. Our analysis strategy also adjusted for the effects of age and sex, which could add to the observed differences [13]. Finally, our genome window size for co-localized gene expression was twice as large as the one described in the study of Morley et al.

Conclusion

We identified several chromosomal regions of co-localized trans-regulatory genes with significant heritability. Some of these regulatory genes displayed strong additive genetic correlations, and may be part of genetic networks. However, when compared to univariate analysis, linkage analysis of bivariate phenotypes and factor scores obtained from PCA did not improve the ability to identify chromosomal regions of co-localized gene expressions.

List of Abbreviations

CEPH: Centre d'Etude du Polymorphisme Humain
GAW: Genetic Analysis Workshop
H2: heritability
LOD: logarithm of the odds
MIBD: multipoint identity-by-descent matrices
N/A: not applicable
NCBI: National Center for Biotechnology Information
PCA: principal-component analysis
QTL: quantitative trait locus
SE: standard error of the mean
SNP: single-nucleotide polymorphism
SOLAR: Sequential Oligogenic Linkage Analysis Routines

Competing interests

The author(s) declare that they have no competing interests.

1. D C Rao; C Gu. False positives and false negatives in genome scans. Adv Genet. 2001.
2. Eric Sobel; Jeanette C Papp; Kenneth Lange. Detection and integration of genotyping errors in statistical genetics. Am J Hum Genet. 2002-01-08.
3. Gonçalo R Abecasis; Stacey S Cherny; William O Cookson; Lon R Cardon. Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2001-12-03.
4. Leif E Peterson. Partitioning large-sample microarray-based gene expression profiles using principal components analysis. Comput Methods Programs Biomed. 2003-02.
5. Norbert Hubner; Chana Yagil; Yoram Yagil. Novel integrative approaches to the identification of candidate genes in hypertension. Hypertension. 2005-12-12.
6. L Almasy; T D Dyer; J Blangero. Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet Epidemiol. 1997.
7. L Almasy; J Blangero. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998-05.
8. J L Hopper; J D Mathews. Extensions to multivariate normal models for pedigree analysis. Ann Hum Genet. 1982-10.
9. K Lange; M Boehnke. Extensions to pedigree analysis. IV. Covariance components models for multivariate traits. Am J Med Genet. 1983-03.
10. Michael Morley; Cliona M Molony; Teresa M Weber; James L Devlin; Kathryn G Ewens; Richard S Spielman; Vivian G Cheung. Genetic analysis of genome-wide variation in human gene expression. Nature. 2004-07-21.
