Using high-density exon arrays to profile gene expression in closely related species.

Lan Lin, Song Liu, Heather Brockway, Junhee Seok, Peng Jiang, Wing Hung Wong, Yi Xing.

Abstract

Global comparisons of gene expression profiles between species provide significant insight into gene regulation, evolutionary processes and disease mechanisms. In this work, we describe a flexible and intuitive approach for global expression profiling of closely related species, using high-density exon arrays designed for a single reference genome. The high-density probe coverage of exon arrays allows us to select identical sets of perfect-match probes to measure expression levels of orthologous genes. This eliminates a serious confounding factor in probe affinity effects of species-specific microarray probes, and enables direct comparisons of estimated expression indexes across species. Using a newly designed Affymetrix exon array, with eight probes per exon for approximately 315,000 exons in the human genome, we conducted expression profiling in corresponding tissues from humans, chimpanzees and rhesus macaques. Quantitative real-time PCR analysis of differentially expressed candidate genes is highly concordant with microarray data, yielding a validation rate of 21/22 for human versus chimpanzee differences, and 11/11 for human versus rhesus differences. This method has the potential to greatly facilitate biomedical and evolutionary studies of gene expression in nonhuman primates and can be easily extended to expression array design and comparative analysis of other animals and plants.

Year:  2009        PMID: 19474342      PMCID: PMC2709591          DOI: 10.1093/nar/gkp420

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Comparative genomic analysis of gene expression has become an important tool for studying mechanisms of gene regulation, evolution, and human diseases (1). A large number of studies have utilized microarray technology for global comparison of gene expression profiles between closely related species, such as humans and nonhuman primates (2). A typical gene expression array measures the expression levels of tens of thousands of genes simultaneously based on fluorescent intensities of probes complementary to specific gene targets (3). In past research, two microarray-based approaches were used for comparative analysis of gene expression (2). The first approach, often referred to as ‘cross-species microarray hybridization’, hybridizes RNAs from the species of interest to a microarray platform designed for a closely related species (4–9). For example, Khaitovich and colleagues hybridized human and chimpanzee RNAs to the Affymetrix human U133 Plus 2.0 arrays to examine within-species and between-species gene expression differences in five tissues (9). However, sequence divergence between orthologous genes poses a major problem for cross-species microarray hybridization (2,4,10). Microarray probes designed for a human gene may contain mismatches to orthologous transcripts from nonhuman primates. Although in principle it is possible to remove individual probes targeting non-conserved regions, the small number of probes per gene on conventional gene expression arrays significantly undermines the applicability of this filtering strategy (11). Based on the sequence divergence rate between human, chimpanzee and rhesus macaque genomes, Oshlack et al. estimated that an average of fewer than three probes per probeset on the Affymetrix U133 Plus 2.0 array perfectly matched orthologous mRNA sequences from all three species (11). The second approach is to design species-specific microarray probes for every species being studied (10,12). 
For example, Blekhman and colleagues recently designed a NimbleGen microarray containing species-specific probes for mRNA sequences of humans, chimpanzees and macaques (13). However, it is well known that even microarray probes for the same mRNA target could have substantially different fluorescent intensities due to probe-by-probe variation in hybridization affinity (14,15). In comparative genomic studies using species-specific probes, as probes are designed independently for orthologous genes, probe affinity effects prevent direct comparisons of expression indexes across species (2). In fact, two studies show that the gene expression indexes in human tissues, as measured by an Affymetrix human 3′ array, have poor correlation with expression indexes in corresponding mouse tissues measured by an Affymetrix mouse 3′ array (16,17). After the calculation of expression indexes in individual species, complex and technically challenging statistical procedures are needed to correct for probe affinity effects before it is feasible to compare expression indexes across species (11,12). In this work, we show that high-density exon arrays designed for a single reference genome can be used as a flexible platform for global comparisons of gene expression profiles between closely related species. With the increase of oligonucleotide probe density on microarrays, a new generation of expression arrays allocates multiple probes for every known and predicted exon in the genome (18). For example, the Affymetrix Human Exon 1.0 array has an average of four probes per exon and 147 probes per gene, including an average of 58 ‘core probes’ per gene targeting exon regions supported by RefSeq transcripts (18,19). The new Affymetrix Human Exon Junction array (HJAY) has eight probes per exon for approximately 315 000 exons in the human genome (20,21), representing a 2-fold increase in the density of exon probes when compared to the Exon 1.0 array. 
The increased probe density of the HJAY array in well-annotated exon regions is achieved by removing Exon 1.0 array probes targeting computationally predicted transcripts. With the high probe density of these new arrays, there are a large number of perfectly matched probes between humans and closely related nonhuman primates. In this study, we assess the possibility of using high-density exon arrays of a single species for comparative analysis of gene expression profiles. We introduce a simple computational procedure to construct robust expression indexes of orthologous genes, which are not confounded by probe affinity effects and can be directly compared across multiple species. We test whether this approach can reliably detect between-species differences in gene expression levels, using the HJAY array and quantitative real-time PCR analysis of corresponding human, chimpanzee, and rhesus macaque tissues. We also provide probe annotations and a computer program JETTA (Junction and Exon array Toolkit for Transcriptome Analysis) to support exon array analysis of gene expression in nonhuman primates.

MATERIALS AND METHODS

Identification of Human U133 Plus 2.0, Exon 1.0 and HJAY array probes targeting conserved regions between humans and nonhuman primates

Gene and probe annotations of the Affymetrix Human U133 Plus 2.0 array (GEO platform ID: GPL570) and the Exon 1.0 array (GEO platform ID: GPL5175) were downloaded from Affymetrix (www.affymetrix.com/products_services/arrays/specific/hgu133plus.affx and http://www.affymetrix.com/support/technical/byproduct.affx?product=huexon-st). The Affymetrix HJAY array (GEO platform ID: GPL8444) was purchased from Affymetrix as a Technology Access product. Gene and probe annotations of HJAY arrays were provided by Affymetrix. For each probe, we obtained the coordinate of its target sequence from the hg18 assembly of the human genome. Using UCSC pairwise genome alignments of the human genome (hg18) to the genomes of chimpanzee (panTro2), orangutan (ponAbe2) and rhesus macaque (rheMac2) (22,23), we compiled the list of probes whose 25mer target regions were perfectly conserved in nonhuman primates for each array platform. We used SeqMap (24) to search 25mer sequences of all probes against the genomes of human (hg18), chimpanzee (panTro2), orangutan (ponAbe2) and rhesus macaque (rheMac2). From these results, we identified probes for each platform that matched a single unique location in the human, chimpanzee, orangutan or rhesus macaque genome. By combining UCSC pairwise genome alignment results and SeqMap mapping results, we compiled the list of probes that perfectly matched the human genome and the genomes of nonhuman primates at a single unique location for all platforms in the study.
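The probe filter described in this section amounts to intersecting two criteria per probe. The following is a minimal Python sketch, not the authors' pipeline. It assumes two hypothetical inputs: `conserved`, mapping each probe ID to the set of nonhuman genomes in which the genome alignment shows its 25mer target perfectly conserved, and `hit_counts`, mapping each probe ID to the number of perfect-match locations found per genome by a mapper such as SeqMap. The function name and probe IDs are invented for illustration.

```python
def select_shared_probes(conserved, hit_counts, species, reference="hg18"):
    """Return probe IDs usable for a human-versus-`species` comparison.

    conserved  : dict probe_id -> set of genome names (hypothetical input)
    hit_counts : dict probe_id -> dict genome name -> number of exact hits
    """
    selected = []
    for probe_id, genomes in conserved.items():
        counts = hit_counts.get(probe_id, {})
        # Criterion 1: the 25mer target is perfectly conserved in the other species.
        perfectly_conserved = species in genomes
        # Criterion 2: exactly one perfect-match location in each genome.
        unique_in_both = counts.get(reference, 0) == 1 and counts.get(species, 0) == 1
        if perfectly_conserved and unique_in_both:
            selected.append(probe_id)
    return sorted(selected)


# Toy data with made-up probe IDs.
conserved = {
    "p1": {"panTro2", "rheMac2"},
    "p2": {"panTro2"},
    "p3": {"panTro2", "rheMac2"},
}
hit_counts = {
    "p1": {"hg18": 1, "panTro2": 1, "rheMac2": 1},
    "p2": {"hg18": 1, "panTro2": 1, "rheMac2": 0},
    "p3": {"hg18": 2, "panTro2": 1, "rheMac2": 1},  # maps to two human locations
}
chimp_probes = select_shared_probes(conserved, hit_counts, "panTro2")
rhesus_probes = select_shared_probes(conserved, hit_counts, "rheMac2")
```

In this toy example, probe `p3` is dropped from both lists because it is not unique in the human genome, and `p2` is kept only for the human versus chimpanzee comparison.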

Human exon array data of 11 human tissues

We downloaded a public Affymetrix Exon 1.0 array data set of 11 human tissues (breast, cerebellum, heart, kidney, liver, muscle, pancreas, prostate, spleen, testes and thyroid), with three replicates per tissue (http://www.affymetrix.com/support/technical/sample_data/exon_array_data.affx).

Total RNA preparation and exon array profiling of human, chimpanzee and rhesus macaque tissues

Frozen cerebellums and livers from three chimpanzees and frozen cerebellums from three rhesus macaques were generously provided by Southwest National Primate Research Center (San Antonio, TX). Total RNA was extracted using TRIzol (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. Total human cerebellum RNA (pool of 24 male and female donors) was purchased from Clontech (Mountain View, CA). Single-pass cDNA was synthesized using High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA) according to manufacturer's instructions. We used the Affymetrix Human Exon 1.0 array to profile cerebellum and liver tissues from chimpanzees, with biological replicates from three separate animals. We also used the Affymetrix HJAY array to profile cerebellum tissues from humans (three technical replicates of the pooled cerebellum RNA), chimpanzees and rhesus macaques (biological replicates from three separate animals of each species). Detailed information (e.g. age, gender) of all RNA samples is described in Supplementary Table 1. Sample preparation and hybridization were identical for each platform. RNA samples were prepared using the GeneChip Whole Transcript Sense Target Labeling Assay (Affymetrix). For each sample, 2 µg of total RNA was subjected to ribosomal RNA reduction. Following rRNA reduction, double-stranded cDNA was synthesized with random hexamers tagged with a T7 promoter sequence. The double-stranded cDNA was used as a template for amplification with T7 RNA polymerase to create antisense cRNA. Next, random hexamers were used to reverse transcribe the cRNA to produce single-stranded sense strand DNA. The DNA was fragmented and labeled with terminal deoxynucleotidyl transferase. A hybridization cocktail was prepared, hybridized to the arrays and scanned.

Calculation of gene expression indexes of humans and nonhuman primates

We developed the JETTA program (Junction and Exon array Toolkit for Transcriptome Analysis, http://gluegrant1.stanford.edu/~junhee/JETTA/) to calculate gene expression indexes from Affymetrix Human Exon 1.0 array data of chimpanzee and human tissues (two chimpanzee tissues and 11 human tissues, each with three replicates). To calculate the expression index, we first predicted the background intensities of individual Exon 1.0 array probes, using a sequence-specific linear model trained from ‘anti-genomic’ background probes on the Human Exon 1.0 array (19). These ‘anti-genomic’ background probes were selected by Affymetrix to avoid a broad range of animal, plant and bacterial genomes (see http://www.affymetrix.com/support/technical/datasheets/exon_arraydesign_datasheet.pdf). For every probe, the predicted background intensity was an estimate for the amount of non-specific hybridization to the probe. This background intensity was subtracted from the observed probe intensity before downstream analysis. Second, we calculated gene expression indexes in human and chimpanzee samples. For each gene, starting with all core probes that perfectly matched human and chimpanzee genomes at a single unique location, we used a correlation-based iterative probe selection algorithm (25) to select a subset of probes with highly correlated intensities across all samples. This probe selection algorithm was developed to remove exon array probes that may not reflect overall gene expression levels, such as those targeting alternative exons or putative exon predictions, as well as low-affinity or cross-hybridizing probes (25). Our previous studies show that this probe selection algorithm produces robust expression indexes (19,26,27). The selected probes were regarded as reliable indicators of overall gene expression levels. 
In genes with at least six selected probes, the background-corrected intensities of selected probes were fitted to the Li-Wong model (14) as in (19,25) to construct robust estimates of gene expression indexes. Finally, the expression indexes of all human and chimpanzee samples were normalized using quantile normalization. We used the same procedure to calculate gene expression indexes from the HJAY array data of human, chimpanzee and rhesus macaque cerebellums. The background model was trained from ‘anti-genomic’ background probes on the HJAY array. Expression indexes were calculated from all HJAY probes that perfectly matched human/chimpanzee genomes or human/rhesus genomes at a single unique location, using the correlation-based iterative probe selection algorithm described above (25).
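For readers unfamiliar with the Li-Wong model (14), it treats the background-corrected intensity of probe j in sample i as the product of a sample-level expression index and a probe-level affinity, y_ij ≈ θ_i φ_j, with the affinities constrained so that Σ φ_j² equals the number of probes. The sketch below fits that multiplicative model by alternating least squares in plain Python. It is an illustration under simplifying assumptions, not the JETTA implementation: it omits the outlier handling of the published method and the probe selection step, and the function names and toy data are hypothetical.

```python
import math


def _update_theta(y, phi):
    """Least-squares expression index per sample, given probe affinities."""
    phi_sq = sum(p * p for p in phi)
    return [sum(row[j] * phi[j] for j in range(len(phi))) / phi_sq for row in y]


def _update_phi(y, theta):
    """Least-squares probe affinities given expression indexes, rescaled so
    that sum(phi_j ** 2) equals the number of probes."""
    n_probes = len(y[0])
    theta_sq = sum(t * t for t in theta)
    phi = [sum(y[i][j] * theta[i] for i in range(len(y))) / theta_sq
           for j in range(n_probes)]
    scale = math.sqrt(n_probes / sum(p * p for p in phi))
    return [p * scale for p in phi]


def fit_li_wong(y, n_iter=50):
    """y[i][j]: background-corrected intensity of probe j in sample i.

    Returns (theta, phi): expression index per sample, affinity per probe.
    """
    phi = [1.0] * len(y[0])
    for _ in range(n_iter):
        theta = _update_theta(y, phi)
        phi = _update_phi(y, theta)
    theta = _update_theta(y, phi)  # make theta consistent with the final phi
    return theta, phi


# Toy gene: 3 samples x 4 probes, generated as an exact rank-1 product.
true_theta = [100.0, 200.0, 400.0]
true_phi = [0.5, 1.0, 1.5, 1.0]
y = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_li_wong(y)
```

Because the same probes, and therefore the same φ_j, are used for every species, the fitted θ_i values are directly comparable across samples from different species, which is the property the approach relies on.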

Correlation analysis of Human Exon 1.0 array profiles of human and chimpanzee tissues

For each of the two chimpanzee tissues and 11 human tissues, we first calculated the average gene expression indexes of the three replicates. For all possible pairs of tissues, we calculated the Spearman correlation coefficient using the expression indexes of 2165 genes with large variation in gene expression levels across tissues. These genes were selected by requiring a coefficient of variation (CV) in gene expression indexes of at least 0.8, and expression indexes of over 100 in at least 20% of samples. For each gene, the CV was calculated as the standard deviation of its expression indexes divided by their mean across all samples.
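The gene filter and the similarity metric from this subsection can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the text does not say whether the sample or population standard deviation was used, so the sample standard deviation is an assumption here, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample SD assumed)."""
    return stdev(values) / mean(values)


def is_variable_gene(values, min_cv=0.8, min_index=100.0, min_fraction=0.2):
    """Keep a gene if CV >= 0.8 and its index exceeds 100 in >= 20% of samples."""
    n_expressed = sum(v > min_index for v in values)
    return (coefficient_of_variation(values) >= min_cv
            and n_expressed >= min_fraction * len(values))


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

In use, `is_variable_gene` would be applied to each gene's expression indexes across all samples, and `spearman` to the tissue-averaged indexes of the retained genes for each pair of tissues.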

Detection of differentially expressed genes between human, chimpanzee and rhesus macaque cerebellums using HJAY array data

Using expression indexes calculated from HJAY array data, we performed a pairwise comparison of gene expression levels in human and chimpanzee cerebellums using Significance Analysis of Microarrays (SAM) (28). We filtered out genes whose maximum expression index across the three chimpanzee samples and three human samples was <100. SAM analysis was performed on log-transformed gene expression indexes. We used the default settings of SAM to identify significantly differentially expressed genes with a minimum fold change of 2.0. We used the same procedure to identify differentially expressed genes from the HJAY array data on the human and rhesus macaque cerebellums.
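The expression filter and the fold-change threshold that surround the SAM analysis can be illustrated as follows. This sketch does not implement SAM itself, and it assumes that the fold change is the ratio of arithmetic mean expression indexes, which the text does not state explicitly. Function names are hypothetical; the example values are the BCLAF1 expression indexes listed in Table 3.

```python
import math
from statistics import mean


def passes_expression_filter(human, other, min_index=100.0):
    """Drop a gene only if its maximum index over all six samples is < 100."""
    return max(list(human) + list(other)) >= min_index


def log2_fold_change(human, other):
    """log2 of mean human index over mean index in the other species (assumed)."""
    return math.log2(mean(human) / mean(other))


def signed_fold_change(log2_fc):
    """Convert a log2 ratio to a signed fold change, e.g. -1.0 -> -2.0."""
    fold = 2.0 ** abs(log2_fc)
    return fold if log2_fc >= 0 else -fold


# BCLAF1 expression indexes from Table 3.
human = [638.5, 627.2, 647.8]
chimp = [1794.8, 1831.4, 1943.0]

keep = passes_expression_filter(human, chimp)
fold = signed_fold_change(log2_fold_change(human, chimp))
meets_threshold = abs(fold) >= 2.0
```

For BCLAF1 this ratio of means gives a signed fold change of about −2.91, matching the HJAY value in Table 3; for some other genes in the table the ratio of means differs in the last digit, so the published values may have been computed slightly differently.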

Quantitative real-time PCR validation of differentially expressed genes between human, chimpanzee and rhesus macaque cerebellums

Quantitative real-time polymerase chain reaction (qRT-PCR) was performed using Power SYBR Green PCR Master Mix (Applied Biosystems, Foster City, CA). For qPCR analysis of expression differences between human and chimpanzee cerebellums, a single primer set that perfectly matched both human and chimpanzee mRNAs was designed using PRIMER3 (29). Primer sequences are described in Supplementary Table 2. Using these primers, qPCR was conducted on extracted RNA from human and chimpanzee cerebellums. Two micrograms of total RNA were used for each 20 μl cDNA synthesis reaction. Using a mathematical method described by Pfaffl (30), we calculated the average expression fold change in the pooled human cerebellum sample over each of the three chimpanzee cerebellum samples. All tested mRNA concentrations were normalized to HPRT1 as the reference gene. Similar results were obtained using β-actin as the reference gene (data not shown). To determine the differences in expression between human and rhesus macaque cerebellums, qPCR primers were separately designed using PRIMER3 (29) to amplify orthologous regions in the mRNA. If the human and rhesus primer sets had a difference in amplification efficiency of >10% [as estimated by a standard curve analysis (30)], we designed and tested additional primers to select the human primer set and the rhesus primer set with similar amplification efficiency. Primer sequences are described in Supplementary Table 3. Two micrograms of total RNA were used for each 20 μl cDNA synthesis reaction. Using a mathematical method described by Pfaffl (30), we calculated the average expression fold change in the pooled human cerebellum sample over each of the three rhesus cerebellum samples. All tested mRNA concentrations were normalized to HPRT1 as the reference gene.
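The relative quantification method of Pfaffl (30) expresses the fold change as ratio = E_target^(ΔCt_target) / E_ref^(ΔCt_ref), where each ΔCt is the control Ct minus the sample Ct and E is the amplification efficiency, which can be estimated from a standard curve slope as E = 10^(−1/slope). The sketch below applies this formula with the human sample as "sample" and each chimpanzee as "control". All Ct values are invented for illustration, and taking the arithmetic mean of the three ratios is an assumption, since the text does not specify how the average was formed.

```python
import math
from statistics import mean


def efficiency_from_slope(slope):
    """Amplification efficiency from a standard curve slope (Ct vs log10 input)."""
    return 10 ** (-1.0 / slope)


def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Relative expression of sample over control, normalized to a reference gene."""
    target = e_target ** (ct_target_control - ct_target_sample)
    reference = e_ref ** (ct_ref_control - ct_ref_sample)
    return target / reference


# A slope of about -3.32 corresponds to perfect doubling per cycle (E = 2).
e_perfect = efficiency_from_slope(-1.0 / math.log10(2.0))

# Hypothetical Ct values: one pooled human sample versus three chimpanzees.
human_target_ct, human_ref_ct = 23.0, 20.0
chimp_target_ct = [25.0, 25.5, 24.5]
chimp_ref_ct = [20.0, 20.0, 20.0]

ratios = [
    pfaffl_ratio(2.0, ct_t, human_target_ct, 2.0, ct_r, human_ref_ct)
    for ct_t, ct_r in zip(chimp_target_ct, chimp_ref_ct)
]
average_ratio = mean(ratios)  # arithmetic mean is an assumption
```

With species-specific primer sets, as in the human versus rhesus comparison, the target efficiency would differ between the two sets, which is why the text requires the efficiencies to be within 10% of each other.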

RESULTS

Analysis of Human U133 Plus 2.0, Exon 1.0 and HJAY array probes targeting conserved regions between humans and nonhuman primates

We analyzed three generations of Affymetrix human expression arrays (U133 Plus 2.0 array, Exon 1.0 array and Exon Junction (HJAY) array) to determine the extent of their probe coverage for expression profiling of closely related nonhuman primates. The Human U133 Plus 2.0 array is the latest and most popular version of Affymetrix 3′-biased expression arrays. It uses sets of 11 perfect-match probes complementary to the 3′ ends of mRNA. Individual genes may have multiple probesets that target different regions within the 3′-end or alternative 3′-ends. The Human Exon 1.0 array, released in 2005, is the first generation of Affymetrix exon arrays. This array averages four probes per exon and 58 ‘core probes’ per gene. The HJAY array (Human Exon Junction array) is a second generation of Affymetrix exon array. This array has eight probes per exon for approximately 315 000 exons in the human genome (20,21) and also includes probes for exon–exon junctions. The two-fold increase in exon probe density on HJAY arrays is achieved by removing Exon 1.0 array probes targeting computationally predicted transcripts. For each 25mer probe, we used pairwise alignments of the UCSC human and nonhuman primate genomes to determine if a probe was a perfect-match for its orthologous target region in chimpanzees, orangutans and rhesus macaques (see ‘Materials and Methods’ section). As conventional Affymetrix 3′-biased expression arrays (including the U133 Plus 2.0 array) have 11 perfect-match probes per probeset (14), we asked how many genes on the Exon 1.0 array and the HJAY array have at least 11 or at least 6 perfect-match probes for their orthologous regions in nonhuman primates. In comparison, for the U133 Plus 2.0 array, we counted the number of probesets with at least 6 or 11 perfect-match probes for nonhuman primates. 
For genes with multiple U133 Plus 2.0 probesets, we also combined probes from multiple probesets (regardless of whether these probesets target distinct alternative transcripts) to count the maximum number of probes that perfectly matched nonhuman primates. Our analysis indicates that the HJAY array and the Exon 1.0 array have a much higher number of probes that perfectly match nonhuman primate genomes, when compared to the U133 Plus 2.0 array. As summarized in Table 1, on the HJAY array, the number of genes with at least 11 perfect-match probes in chimpanzees, orangutans and rhesus macaques was 16402, 15322 and 14360, with a median count of 84, 61 and 48 probes per gene, respectively. On the Exon 1.0 array, the number of genes with at least 11 perfect-match probes in chimpanzees, orangutans and rhesus macaques was 16885, 14488 and 12824, with a median count of 41, 32 and 28 probes per gene, respectively. In contrast, the number of U133 Plus 2.0 probesets with at least 11 perfect-match probes was 4213, 735 and 282 for these three species. When we combined multiple probesets for the same gene on the U133 Plus 2.0 array, the number was 10488, 6978 and 4241 for chimpanzee, orangutan and rhesus genomes, with a median count of 20, 17 and 15 probes per gene, respectively. The same trend was observed when we counted the number of genes with at least six perfect-match probes in nonhuman primates (see Table 1). In fact, 12481 genes on the HJAY array and 10106 genes on the Exon 1.0 array had at least 11 probes that perfectly matched all four genomes (human, chimpanzee, orangutan and rhesus genomes), with a median count of 38 probes and 23 probes per gene on these two array platforms. By contrast, only 2281 genes on the U133 Plus 2.0 array had at least 11 probes that matched all four genomes, with a median count of 15 probes per gene. 
It should be noted that our estimate of probe counts for the U133 Plus 2.0 array is an upper bound estimate, since many genes on the U133 Plus 2.0 array have multiple probesets targeting distinct alternative transcripts which should not be combined in the counting of perfect-match probes (see http://www.affymetrix.com/support/technical/technotes/hgu133_p2_technote.pdf).
Table 1.

Number of genes on HJAY array, Exon 1.0 array and U133 Plus 2.0 array with at least 11 or at least six probes that perfectly match human, chimpanzee, orangutan and rhesus genomes

Array | Threshold | Human | Chimpanzee | Orangutan | Rhesus
Human HJAY array | ≥6 PM probes | 17 414[a] (102[b]) | 16 980 (81) | 16 127 (58) | 15 498 (45)
Human HJAY array | ≥11 PM probes | 16 774 (104) | 16 402 (84) | 15 322 (61) | 14 360 (48)
Human Exon 1.0 array | ≥6 PM probes | 18 473 (48) | 17 989 (39) | 16 258 (29) | 15 169 (24)
Human Exon 1.0 array | ≥11 PM probes | 17 991 (49) | 16 885 (41) | 14 488 (32) | 12 824 (28)
Human U133 Plus 2.0 array (Probe-set) | ≥6 PM probes | 35 660 (11) | 31 519 (9) | 17 008 (7) | 8162 (7)
Human U133 Plus 2.0 array (Probe-set) | ≥11 PM probes | 24 432 (11) | 4213 (11) | 735 (11) | 282 (11)
Human U133 Plus 2.0 array (Gene) | ≥6 PM probes | 18 225 (20) | 16 979 (15) | 12 437 (12) | 8984 (10)
Human U133 Plus 2.0 array (Gene) | ≥11 PM probes | 15 264 (22) | 10 488 (20) | 6978 (17) | 4241 (15)

[a] Number of genes.

[b] Median count of perfect match probes.

We further searched 25mer probe sequences against human and nonhuman primate genomes and removed those that matched multiple locations in the genomes. For example, in the human versus rhesus genome alignment analysis, we removed probes that matched multiple locations in either human or rhesus genomes (see ‘Materials and Methods’ section). As summarized in Table 2, 14037 genes had at least 11 HJAY array probes that perfectly matched human and rhesus genomes at a single unique location, with a median count of 47 probes per gene. On the Exon 1.0 array, the number was 12250 genes with a median count of 28 probes per gene. In contrast, on the U133 Plus 2.0 array, only 3865 genes had at least 11 probes that perfectly matched human and rhesus genomes at a single unique location, with a median count of 15 probes per gene (see Table 2).
Table 2.

Number of genes on HJAY array, Exon 1.0 array and U133 Plus 2.0 array with at least 11 or at least six probes that perfectly match both the human genome and the genome of chimpanzees, orangutans or rhesus macaques at a single unique location

Array | Threshold | Human | Chimpanzee | Orangutan | Rhesus
Human HJAY array | ≥6 PM probes | 16 974[a] (96[b]) | 16 745 (77) | 15 955 (56) | 15 226 (43)
Human HJAY array | ≥11 PM probes | 16 329 (100) | 16 151 (80) | 15 119 (59) | 14 037 (47)
Human Exon 1.0 array | ≥6 PM probes | 17 587 (46) | 17 343 (37) | 15 832 (29) | 14 604 (23)
Human Exon 1.0 array | ≥11 PM probes | 16 881 (48) | 16 066 (40) | 14 023 (32) | 12 250 (28)
Human U133 Plus 2.0 array (Probe-set) | ≥6 PM probes | 32 949 (11) | 28 524 (9) | 15 637 (7) | 7373 (7)
Human U133 Plus 2.0 array (Probe-set) | ≥11 PM probes | 18 758 (11) | 3513 (11) | 629 (11) | 241 (11)
Human U133 Plus 2.0 array (Gene) | ≥6 PM probes | 16 813 (19) | 15 689 (14) | 11 724 (11) | 8267 (10)
Human U133 Plus 2.0 array (Gene) | ≥11 PM probes | 13 139 (21) | 9579 (20) | 6485 (17) | 3865 (15)

[a] Number of genes.

[b] Median count of perfect match probes.

Together, these results suggest that we can use a single high-density human exon array to measure expression levels of the vast majority of genes in a variety of nonhuman primates. Compared to the U133 Plus 2.0 array, the second generation of Affymetrix exon array (the HJAY array) has a substantial increase in the number of perfect-match probes for nonhuman primates. For instance, for genes with at least 11 perfect-match probes in all four genomes, the HJAY array has a 5.5-fold higher gene coverage than the U133 Plus 2.0 array, and a 2.5-fold higher probe density per gene. As expected, the HJAY array also has a higher probe density for orthologs of RefSeq human genes in nonhuman primates when compared to the Exon 1.0 array (see Tables 1 and 2).
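The two fold figures quoted for the HJAY array follow directly from the all-four-genome counts reported earlier in this section (12481 versus 2281 genes, and median counts of 38 versus 15 probes per gene). A short worked check in Python, with variable names chosen only for illustration:

```python
# Genes with at least 11 perfect-match probes in all four genomes.
genes_hjay, genes_u133 = 12481, 2281
# Median count of perfect-match probes per gene for those genes.
median_hjay, median_u133 = 38, 15

gene_coverage_ratio = genes_hjay / genes_u133
probe_density_ratio = median_hjay / median_u133

print(f"{gene_coverage_ratio:.1f}-fold gene coverage, "
      f"{probe_density_ratio:.1f}-fold probe density")
# prints "5.5-fold gene coverage, 2.5-fold probe density"
```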

Correlation analysis of human exon 1.0 array profiles of human and chimpanzee tissues

In comparative analyses of gene expression using species-specific arrays, variation in probe affinity is a major confounding factor in comparing expression indexes across species as probes are designed independently for multiple species (16,17). Our proposed approach using high-density exon arrays should not be affected by such probe effects, as we select identical sets of perfect-match probes to measure expression levels of orthologous genes. Thus, we expect a high correlation between the expression profiles of corresponding tissues from different species. To confirm this, we took advantage of a large preexisting Exon 1.0 dataset of 11 human tissues (including cerebellum and liver, see ‘Materials and Methods’ section), and generated triplicate Exon 1.0 array data of chimpanzee cerebellum and liver RNAs for comparisons between humans and chimpanzees. For each gene, starting with all core probes that perfectly matched human and chimpanzee genomes at a single unique location, we used a correlation-based iterative probe selection algorithm to select reliable indicators of overall expression levels (see ‘Materials and Methods’ section). Requiring that at least six probes were selected for a gene, we calculated expression indexes of 15143 genes in human and chimpanzee tissues. From the computed expression indexes, we investigated the similarity of expression profiles between human and chimpanzee tissues. We selected 2165 genes with large variations in expression levels across all samples and calculated their average expression indexes in two chimpanzee and 11 human tissues (see ‘Materials and Methods’ section). For each pair of tissues, we calculated the Spearman correlation coefficient of expression indexes of these 2165 genes as the metric of similarity in expression profiles. Our analysis indicates that the expression profiles of human cerebellum and liver are closest to their chimpanzee counterparts as opposed to any other human tissue (Figure 1). 
We obtained a Spearman correlation coefficient of 0.936 between human and chimpanzee cerebellums, and 0.887 between human and chimpanzee livers. In contrast, the correlation coefficient was −0.159 between human cerebellum and human liver, and −0.127 between chimpanzee cerebellum and chimpanzee liver. These results contrasted with previous analyses of human and mouse tissues using species-specific expression arrays, where probe affinity effects largely obscured the similarity of expression profiles of orthologous tissues (16,17).
Figure 1.

Correlation of Exon 1.0 array profiles of human (Hs) and chimpanzee (Pt) tissues. The heatmap shows that expression profiles of human cerebellum and liver are closest to their chimpanzee counterparts as opposed to any other human tissue.

HJAY array detection and real-time qPCR validation of differentially expressed genes between human, chimpanzee and rhesus macaque cerebellums

A key goal of this study is to assess whether high-density exon arrays could be used to detect expression differences of orthologous genes in corresponding tissues. To test this, we used the HJAY array to generate expression profiles of human, chimpanzee and rhesus macaque cerebellums, with three replicates per species. We chose the HJAY array for this analysis, because it has a higher probe density for orthologs of RefSeq human genes in nonhuman primates. It should be noted that the HJAY array and the Exon 1.0 array use identical sample preparation and hybridization protocols. We first tested HJAY array detection of expression differences between human and chimpanzee cerebellums. Using HJAY exon probes that perfectly matched human and chimpanzee genomes at a single unique location, we calculated expression indexes of 14884 genes. We used Significance Analysis of Microarrays (SAM) (28) under its default settings (see ‘Materials and Methods’ section) and identified 916 genes with a minimum of 2-fold change in expression levels between human and chimpanzee cerebellums, including 453 genes with increased expression in humans and 463 genes with decreased expression in humans. We randomly selected 22 differentially expressed genes for validation by SYBR Green real-time qPCR. Among the 22 genes selected for qPCR validation, 10 genes had increased expression and 12 genes had decreased expression in the human cerebellum according to HJAY array expression indexes. The genes selected for validation span a broad spectrum of functional categories and estimated expression indexes. QPCR analysis was performed on the same samples used for HJAY array profiling. Real-time qPCR data of 21 genes indicated at least 2-fold change in expression levels between human and chimpanzee cerebellums and were concordant with the microarray data (Table 3). 
The only exception was NT5C, for which the HJAY array and qPCR data both indicated a decreased expression level in the human cerebellum, but the fold change estimated by qPCR was only 1.3. Therefore, using a qPCR fold change of 2.0 as the criterion for positive validation, 21 out of 22 candidate genes were validated by qPCR. We also plotted the log2 expression fold changes of these 22 genes between human and chimpanzee cerebellums as estimated by the HJAY array and qPCR. We observed a strong positive correlation between the HJAY array data and qPCR data, with a Spearman correlation coefficient of 0.90 (see Figure 2). Morey and colleagues suggest that a correlation of over 0.8 indicates strong qPCR validation of microarray results (31). It should be noted that the fold change estimated by the HJAY array was typically smaller than the fold change estimated by qPCR (see Table 3). This is expected, as saturation of oligonucleotide probes at high mRNA concentration is known to compress the fold change estimates of differentially expressed genes. Taken together, our results provide strong evidence that the human versus chimpanzee expression differences detected by HJAY arrays are accurate and reliable.
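The validation count and the rank correlation can be recomputed from the fold changes listed in Table 3. The sketch below is an independent illustration rather than the authors' analysis script: the fold-change values are transcribed from Table 3, the validation rule (same direction and a qPCR fold change of at least 2.0 in magnitude) follows the criterion stated above, and the Spearman coefficient uses the no-ties shortcut formula, which applies because no two values in either column are equal.

```python
import math

# (gene, HJAY signed fold change, qPCR signed fold change), from Table 3.
TABLE3 = [
    ("BCLAF1", -2.91, -3.32), ("CABYR", -11.03, -14.27), ("CENPT", 3.57, 7.36),
    ("CHL1", -3.84, -11.38), ("COL6A1", 6.55, 14.71), ("CRYM", 25.28, 49.57),
    ("DNTTIP2", -6.23, -13.50), ("DSEL", -2.62, -2.67), ("EPHA6", -5.70, -4.54),
    ("FOS", 5.33, 7.19), ("GSTM5", 23.35, 2.05), ("HYDIN", 9.28, 3.79),
    ("JMJD1C", -3.02, -8.67), ("KTN1", -5.98, -9.71), ("LPXN", 5.43, 2.19),
    ("NR4A1", 2.44, 2.88), ("NRG4", -2.49, -3.04), ("NT5C", -2.65, -1.30),
    ("SNTG2", -3.37, -2.01), ("SYNGR4", 4.00, 3.70), ("TMF1", -3.96, -5.87),
    ("ZP2", 17.17, 79.06),
]


def signed_log2(fold):
    """Map a signed fold change (e.g. -2.0) to a log2 ratio (e.g. -1.0)."""
    return math.copysign(math.log2(abs(fold)), fold)


def ranks(values):
    """Ranks starting at 1; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


array_log2 = [signed_log2(a) for _, a, _ in TABLE3]
qpcr_log2 = [signed_log2(q) for _, _, q in TABLE3]
rho = spearman_no_ties(array_log2, qpcr_log2)

# Validated: same direction of change and |qPCR fold change| >= 2.0.
validated = [gene for gene, a, q in TABLE3 if a * q > 0 and abs(q) >= 2.0]
```

With the Table 3 values this yields 21 validated genes, with NT5C the only one excluded, and a Spearman coefficient of roughly 0.90, in line with the figures reported above.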
Table 3.

HJAY array and qPCR data of 22 genes in human and chimpanzee cerebellums

| Gene | Transcript cluster ID | Gene description | Gene ID | Chimpanzee #100 | Chimpanzee #327 | Chimpanzee #487 | Human A | Human B | Human C | Human vs chimpanzee change (HJAY array) | Human vs chimpanzee change (qPCR) | Human vs chimpanzee fold change (HJAY array) | Human vs chimpanzee fold change (qPCR) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BCLAF1 | 812232 | BCL2-associated transcription factor 1 | 9774 | 1794.8 | 1831.4 | 1943.0 | 638.5 | 627.2 | 647.8 | Decrease | Decrease | −2.91 | −3.32 |
| CABYR | 829453 | Calcium binding tyrosine-(Y)-phosphorylation regulated | 26256 | 791.7 | 1052.4 | 868.2 | 86.2 | 75.3 | 84.3 | Decrease | Decrease | −11.03 | −14.27 |
| CENPT | 827391 | Centromere protein T | 80152 | 138.2 | 94.1 | 116.5 | 415.1 | 431.9 | 398.3 | Increase | Increase | 3.57 | 7.36 |
| CHL1 | 805763 | Cell adhesion molecule with homology to L1CAM | 10752 | 3185.7 | 3027.8 | 3220.8 | 829.0 | 767.4 | 886.9 | Decrease | Decrease | −3.84 | −11.38 |
| COL6A1 | 832836 | Collagen, type VI, alpha 1 | 1291 | 53.1 | 71.8 | 53.5 | 368.1 | 415.8 | 383.9 | Increase | Increase | 6.55 | 14.71 |
| CRYM | 827150 | Crystallin, mu | 1428 | 41.5 | 41.6 | 38.6 | 1038.4 | 1042.6 | 996.5 | Increase | Increase | 25.28 | 49.57 |
| DNTTIP2 | 802388 | Deoxynucleotidyltransferase, terminal, interacting protein 2 | 30836 | 1150.6 | 1167.4 | 1134.4 | 243.8 | 128.2 | 182.3 | Decrease | Decrease | −6.23 | −13.50 |
| DSEL | 829865 | Dermatan sulfate epimerase-like | 92126 | 2180.7 | 1907.5 | 2288.1 | 806.9 | 818.5 | 810.5 | Decrease | Decrease | −2.62 | −2.67 |
| EPHA6 | 806177 | EPH receptor A6 | 285220 | 810.7 | 752.6 | 823.3 | 156.0 | 130.9 | 131.7 | Decrease | Decrease | −5.70 | −4.54 |
| FOS | 824317 | Proto-oncogene protein c-fos | 2353 | 90.1 | 147.2 | 152.2 | 666.1 | 742.7 | 666.6 | Increase | Increase | 5.33 | 7.19 |
| GSTM5 | 800796 | Glutathione S-transferase M5 | 2949 | 11.4 | 27.8 | 21.4 | 487.0 | 454.1 | 472.6 | Increase | Increase | 23.35 | 2.05 |
| HYDIN | 827427 | Hydrocephalus inducing homolog (mouse) | 54768 | 24.6 | 31.7 | 27.5 | 265.3 | 255.9 | 256.2 | Increase | Increase | 9.28 | 3.79 |
| JMJD1C | 819286 | Jumonji domain-containing protein 1C | 221037 | 2859.7 | 2792.5 | 2711.5 | 960.1 | 872.7 | 933.2 | Decrease | Decrease | −3.02 | −8.67 |
| KTN1 | 824190 | Kinectin (Kinesin receptor) | 3895 | 1061.1 | 921.3 | 969.0 | 185.8 | 135.0 | 172.5 | Decrease | Decrease | −5.98 | −9.71 |
| LPXN | 821047 | Leupaxin | 9404 | 277.9 | 389.5 | 306.7 | 1751.4 | 1801.6 | 1739.6 | Increase | Increase | 5.43 | 2.19 |
| NR4A1 | 821945 | Nuclear receptor subfamily 4, group A, member 1 | 3164 | 54.6 | 69.4 | 98.7 | 200.3 | 177.1 | 165.2 | Increase | Increase | 2.44 | 2.88 |
| NRG4 | 826086 | Neuregulin 4 | 145957 | 1527.5 | 1641.8 | 1736.3 | 702.8 | 618.6 | 651.7 | Decrease | Decrease | −2.49 | −3.04 |
| NT5C | 829187 | 5′, 3′-nucleotidase, cytosolic | 30833 | 1650.5 | 1450.2 | 1464.1 | 379.6 | 572.5 | 773.6 | Decrease | Decrease | −2.65 | −1.30 |
| SNTG2 | 803360 | Gamma-2-syntrophin | 54221 | 209.3 | 235.1 | 213.3 | 69.3 | 62.6 | 63.3 | Decrease | Decrease | −3.37 | −2.01 |
| SYNGR4 | 830567 | Synaptogyrin 4 | 23546 | 61.6 | 74.1 | 63.6 | 291.7 | 246.5 | 259.6 | Increase | Increase | 4.00 | 3.70 |
| TMF1 | 807072 | TATA element modulatory factor 1 | 7110 | 980.1 | 1004.7 | 1082.0 | 294.3 | 243.2 | 236.4 | Decrease | Decrease | −3.96 | −5.87 |
| ZP2 | 827149 | Zona pellucida glycoprotein 2 (sperm receptor) | 7783 | 45.5 | 40.8 | 45.8 | 723.2 | 795.1 | 749.5 | Increase | Increase | 17.17 | 79.06 |
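The validation criterion and the rank correlation reported above can be recomputed from the two fold-change columns of Table 3. The Python sketch below is illustrative only. The helper names (`signed_log2`, `ranks`, `spearman`) are our own, a negative table entry −x is assumed to denote an x-fold decrease in human, and a plain-Python Spearman formula (valid here because no values are tied) stands in for whatever statistics package was originally used.

```python
import math

# (gene, HJAY array fold change, qPCR fold change), copied from Table 3.
TABLE3 = [
    ("BCLAF1", -2.91, -3.32), ("CABYR", -11.03, -14.27), ("CENPT", 3.57, 7.36),
    ("CHL1", -3.84, -11.38), ("COL6A1", 6.55, 14.71), ("CRYM", 25.28, 49.57),
    ("DNTTIP2", -6.23, -13.50), ("DSEL", -2.62, -2.67), ("EPHA6", -5.70, -4.54),
    ("FOS", 5.33, 7.19), ("GSTM5", 23.35, 2.05), ("HYDIN", 9.28, 3.79),
    ("JMJD1C", -3.02, -8.67), ("KTN1", -5.98, -9.71), ("LPXN", 5.43, 2.19),
    ("NR4A1", 2.44, 2.88), ("NRG4", -2.49, -3.04), ("NT5C", -2.65, -1.30),
    ("SNTG2", -3.37, -2.01), ("SYNGR4", 4.00, 3.70), ("TMF1", -3.96, -5.87),
    ("ZP2", 17.17, 79.06),
]


def signed_log2(fold_change):
    """Convert a signed fold change to a log2 ratio (assumes |fold_change| >= 1)."""
    return math.copysign(math.log2(abs(fold_change)), fold_change)


def ranks(values):
    """Rank values 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


array_log2 = [signed_log2(a) for _, a, _ in TABLE3]
qpcr_log2 = [signed_log2(q) for _, _, q in TABLE3]

# Positive validation: same direction on both platforms and qPCR fold change >= 2.
validated = [g for g, a, q in TABLE3 if abs(q) >= 2.0 and (a > 0) == (q > 0)]
failed = [g for g, _, _ in TABLE3 if g not in validated]
rho = spearman(array_log2, qpcr_log2)

print(f"validated: {len(validated)}/{len(TABLE3)}; not validated: {failed}")
print(f"Spearman rho = {rho:.2f}")
```

With the Table 3 values, this gives 21 of 22 genes validated, with NT5C as the only exception, and a Spearman coefficient that rounds to 0.90, in agreement with the text.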
Figure 2.

Correlation of expression fold change between human and chimpanzee cerebellums measured by HJAY array and real-time qPCR. X-axis: log2-fold change of human expression level over chimpanzee expression level measured by HJAY array. Y-axis: log2 fold change of human expression level over chimpanzee expression level measured by real-time qPCR.

To assess whether the HJAY array can also detect expression differences of orthologous genes from more distantly related species, we compared HJAY-based expression indexes of 12473 genes between human and rhesus cerebellums. Using SAM, we identified 893 genes with increased expression levels in humans and 789 genes with decreased expression levels in humans. From the 22 genes in Table 3, we selected 11 that also had significant differences between human and rhesus cerebellums as detected by HJAY data, and examined their expression levels using real-time qPCR. All 11 genes had more than two-fold change between human and rhesus cerebellums according to qPCR (see Table 4), yielding a validation rate of 11/11. The Spearman correlation coefficient between HJAY-estimated fold changes and qPCR-estimated fold changes was 0.85.
Table 4.

HJAY array and qPCR data of 11 genes in human and rhesus cerebellums

| Gene | Transcript cluster ID | Gene description | Gene ID | Rhesus #453 | Rhesus #759 | Rhesus #775 | Human A | Human B | Human C | Human vs rhesus change (HJAY array) | Human vs rhesus change (qPCR) | Human vs rhesus fold change (HJAY array) | Human vs rhesus fold change (qPCR) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CABYR | 829453 | Calcium binding tyrosine-(Y)-phosphorylation regulated | 26256 | 282.9 | 247.1 | 327.3 | 89.3 | 68.7 | 78.5 | Decrease | Decrease | −3.63 | −4.34 |
| CENPT | 827391 | Centromere protein T | 80152 | 52.4 | 50.8 | 60.3 | 390.2 | 409.7 | 361.6 | Increase | Increase | 7.1 | 4.41 |
| COL6A1 | 832836 | Collagen, type VI, alpha 1 | 1291 | 44.8 | 46.1 | 49.4 | 391.4 | 431.6 | 374.2 | Increase | Increase | 8.54 | 217.74 |
| CRYM | 827150 | Crystallin, mu | 1428 | 67.8 | 61.9 | 54.0 | 1134.1 | 1178.4 | 1104.6 | Increase | Increase | 18.6 | 55.35 |
| EPHA6 | 806177 | EPH receptor A6 | 285220 | 35.8 | 32.6 | 30.4 | 205.9 | 162.0 | 143.0 | Increase | Increase | 5.17 | 5.42 |
| HYDIN | 827427 | Hydrocephalus inducing homolog (mouse) | 54768 | 29.5 | 29.3 | 20.9 | 265.7 | 240.6 | 234.5 | Increase | Increase | 9.29 | 16.56 |
| JMJD1C | 819286 | Jumonji domain containing 1C | 221037 | 2433.3 | 2411.0 | 2484.5 | 1000.7 | 912.7 | 979.7 | Decrease | Decrease | −2.53 | −7.92 |
| KTN1 | 824190 | Kinectin 1 (kinesin receptor) | 3895 | 996.0 | 1129.5 | 1054.4 | 188.1 | 132.8 | 176.4 | Decrease | Decrease | −6.39 | −21.88 |
| NT5C | 829187 | 5′, 3′-nucleotidase, cytosolic | 30833 | 67.1 | 55.8 | 69.8 | 173.9 | 174.1 | 160.9 | Increase | Increase | 2.64 | 19.63 |
| TMF1 | 807072 | TATA element modulatory factor 1 | 7110 | 1075.0 | 1030.2 | 1051.9 | 287.8 | 232.6 | 241.0 | Decrease | Decrease | −4.15 | −4.53 |
| ZP2 | 827149 | Zona pellucida glycoprotein 2 (sperm receptor) | 7783 | 33.3 | 34.1 | 44.0 | 611.0 | 668.0 | 605.0 | Increase | Increase | 16.91 | 15173.79 |
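The sign convention of the fold-change columns can be illustrated by recomputing a few HJAY array entries of Table 4 from the replicate expression indexes. This is a sketch under an assumption: we take the fold change to be the ratio of mean expression indexes, reported as a negative number when human expression is lower. The estimator actually used to fill the table is not stated in this section, so the ratio of means should be expected to match the listed values only approximately. The function name `signed_fold_change` is hypothetical.

```python
from statistics import mean

# gene -> (rhesus replicates, human replicates, reported HJAY array fold change),
# copied from Table 4.
TABLE4_SUBSET = {
    "KTN1": ([996.0, 1129.5, 1054.4], [188.1, 132.8, 176.4], -6.39),
    "NT5C": ([67.1, 55.8, 69.8], [173.9, 174.1, 160.9], 2.64),
    "ZP2": ([33.3, 34.1, 44.0], [611.0, 668.0, 605.0], 16.91),
}


def signed_fold_change(other, human):
    """Human-versus-other fold change from mean expression indexes.

    Assumed convention: human/other when human is higher, and the negated
    inverse ratio (other/human) when human is lower.
    """
    ratio = mean(human) / mean(other)
    return ratio if ratio >= 1 else -1 / ratio


for gene, (rhesus, human, reported) in TABLE4_SUBSET.items():
    estimate = signed_fold_change(rhesus, human)
    print(f"{gene}: computed {estimate:.2f}, reported {reported:.2f}")
```

For these three genes the ratio of means agrees with the tabulated value to two decimal places. For some other rows the agreement is only approximate, which is why the estimator is labeled as an assumption.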

DISCUSSION

The transition from conventional ‘probe-poor’ expression arrays to a new generation of ‘probe-rich’ exon arrays marks a major shift in the design strategy of gene expression arrays. In this manuscript, we present a flexible and intuitive approach for comparative analysis of gene expression between closely related species, using high-density exon arrays designed for a single reference genome. Our approach builds on previous work that uses microarrays to examine evolutionary differences in gene expression (5,6,9,10,12,13,32), and is intended to overcome limitations in past research using cross-species microarray hybridization or species-specific microarrays (see ‘Introduction’ section). For example, sequence divergence between species is a serious problem for cross-species hybridization to conventional expression arrays (4). Oshlack et al. estimated that an average of fewer than three probes per probeset on Affymetrix 3′ arrays perfectly match orthologous transcripts from human, chimpanzee and rhesus genomes (11). In this study, we analyzed probe sequences of three generations of Affymetrix expression arrays, including the 3′ biased U133 Plus 2.0 array and two generations of exon arrays (Exon 1.0 array and HJAY array). Our results indicate that high-density exon arrays, in particular the HJAY array, have high probe coverage for measuring gene expression in closely related nonhuman primates. The expression indexes constructed from exon array data have two desirable features. First, for each gene the expression indexes are computed from the signals of a large number of probes tiled over its entire transcribed region. In our HJAY array analysis of human and chimpanzee tissues, on average 80 probes per gene were used in the estimation of expression levels. The increased probe density is likely to produce more accurate gene expression indexes as demonstrated by previous studies (19,33). 
Second, probe affinity effects do not confound between-species comparisons of expression levels, as identical sets of perfect-match probes are used for constructing expression indexes of orthologous genes. Thus, the computed expression indexes from multiple species can be directly imported into standard software tools for high-level analysis of expression data, such as detection of differential expression and hierarchical clustering. The elimination of probe affinity effects during the calculation of expression indexes greatly simplifies downstream data analysis. Our approach is expected to have false negatives and false positives. Even among genes with sufficient probe coverage in nonhuman primates, false negatives could arise due to poor probe affinity or various types of microarray artifacts. As in most microarray experiments, we were unable to systematically assess the false negative rate in our study due to the lack of a large gold-standard set of differentially expressed genes between these human and nonhuman primate RNA samples. In the future, generation of a spike-in data set may allow us to evaluate false negatives of the between-species HJAY array analysis. On the other hand, false discovery rate (i.e. the fraction of false positives among all reported positives) is widely accepted as the most crucial metric to evaluate genome-wide studies such as microarrays (34). Our qPCR validation suggests a low false discovery rate for HJAY array detection of differentially expressed genes between humans and nonhuman primates (1/22 for human versus chimpanzee differences; 0/11 for human versus rhesus differences). Moreover, the fold change values estimated by qPCR are highly concordant with the fold change values estimated by the HJAY arrays, with a Spearman correlation coefficient of 0.90 in the human versus chimpanzee comparison, and 0.85 in the human versus rhesus comparison. 
Collectively, we demonstrate that high-density exon arrays represent a cost-effective and high-throughput tool for detecting expression differences between closely related species. Also, although this study focuses on between-species comparisons of gene expression, high-density exon arrays can be used for standard microarray expression profiling within a single closely related species, such as the comparison between diseased animals and healthy controls. This could circumvent the need for designing custom arrays when an expression array platform for a given species of interest is unavailable. In this work, we use exon array probes that perfectly match orthologous regions of human exons to estimate gene expression levels in nonhuman primates. This assumes that the orthologous regions of human exons are also exons in other primate species. While this assumption is generally true, we know that a small percentage of human exons have had altered splicing patterns during primate evolution (e.g. recent creation of new exons) (35,36). Although evolutionary changes in alternative splicing have extremely interesting implications for function and evolution of eukaryotic genomes (37), such exons will introduce bias into the estimate of overall gene expression levels. Our correlation-based probe selection algorithm will help guard against this scenario, as it is designed to remove probes exhibiting substantially different splicing levels across samples (25). In the future, it will be possible to use transcript sequence data (e.g. cDNAs and mRNA-seq reads) of nonhuman primates to refine the selection of probes.
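The probe-filtering idea that underlies the approach, keeping only probes that perfectly match every genome under comparison at a single unique location, can be sketched in a few lines of Python. This is a toy illustration, not the pipeline used in the study. The 8-mer probes and short "genomes" are invented for the example (the real probes are longer oligonucleotides matched against whole genome assemblies), and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_perfect_matches(probe, genome):
    """Count exact occurrences of a probe on either strand of a genome string."""
    total = 0
    # A set avoids double counting when a probe equals its reverse complement.
    for query in {probe, reverse_complement(probe)}:
        start = genome.find(query)
        while start != -1:
            total += 1
            start = genome.find(query, start + 1)
    return total


def shared_unique_probes(probes, genomes):
    """Names of probes with exactly one perfect match in every genome."""
    return [
        name
        for name, seq in probes.items()
        if all(count_perfect_matches(seq, g) == 1 for g in genomes.values())
    ]


# Invented toy data: p1 is conserved, p2 carries a mismatch in the second
# genome, and p3 is duplicated in the second genome.
PROBES = {"p1": "GATTACAG", "p2": "CCTGAGGT", "p3": "AGCTTCGA"}
GENOMES = {
    "human": "TTT" + "GATTACAG" + "CCC" + "CCTGAGGT" + "AAA" + "AGCTTCGA" + "GGG",
    "chimpanzee": "TTT" + "GATTACAG" + "CCC" + "CCTGAGCT" + "AAA"
                  + "AGCTTCGA" + "TT" + "AGCTTCGA" + "GGG",
}

print(shared_unique_probes(PROBES, GENOMES))
```

Only `p1` survives the filter in this toy case. Because the same retained probes are then used to compute the expression index in every species, probe affinity differences cancel out of the between-species comparison.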

Software/data availability

We developed the JETTA program (http://gluegrant1.stanford.edu/~junhee/JETTA/) to support gene-level and exon-level analysis of HJAY array and Exon 1.0 array data. Probe annotations for HJAY array and Exon 1.0 array analysis of nonhuman primates can be downloaded from http://www.medicine.uiowa.edu/Labs/Xing/Primate-microarray/. These probe annotations can be used directly by JETTA to calculate gene expression indexes of nonhuman primates. Affymetrix Human Exon 1.0 array data of chimpanzee cerebellum/liver and Affymetrix HJAY data of human, chimpanzee and rhesus cerebellums have been deposited to the NCBI GEO database under the accession number GSE15666.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Institutes of Health grant U54-GM062119 (to W.H.W.) and R01-HG004634 (to W.H.W. and Y.X.). University of Iowa research startup fund (to Y.X.). Funding for open access charge: National Institutes of Health grant R01-HG004634.

Conflict of interest statement. None declared.
  37 in total

Review 1.  Microarray data analysis: from disarray to consolidation and consensus.

Authors:  David B Allison; Xiangqin Cui; Grier P Page; Mahyar Sabripour
Journal:  Nat Rev Genet       Date:  2006-01       Impact factor: 53.242

2.  Using DNA microarrays to study gene expression in closely related species.

Authors:  Alicia Oshlack; Adrien E Chabot; Gordon K Smyth; Yoav Gilad
Journal:  Bioinformatics       Date:  2007-03-23       Impact factor: 6.937

Review 3.  Cross-species microarray hybridizations: a developing tool for studying species diversity.

Authors:  Carmiya Bar-Or; Henryk Czosnek; Hinanit Koltai
Journal:  Trends Genet       Date:  2007-02-20       Impact factor: 11.639

4.  Evolutionary conservation of expression profiles between human and mouse orthologous genes.

Authors:  Ben-Yang Liao; Jianzhi Zhang
Journal:  Mol Biol Evol       Date:  2005-11-09       Impact factor: 16.240

5.  Assessing the conservation of mammalian gene expression using high-density exon arrays.

Authors:  Yi Xing; Zhengqing Ouyang; Karen Kapur; Matthew P Scott; Wing Hung Wong
Journal:  Mol Biol Evol       Date:  2007-03-25       Impact factor: 16.240

6.  Intra- and interspecific variation in primate gene expression patterns.

Authors:  Wolfgang Enard; Philipp Khaitovich; Joachim Klose; Sebastian Zöllner; Florian Heissig; Patrick Giavalisco; Kay Nieselt-Struwe; Elaine Muchmore; Ajit Varki; Rivka Ravid; Gaby M Doxiadis; Ronald E Bontrop; Svante Pääbo
Journal:  Science       Date:  2002-04-12       Impact factor: 47.728

7.  28-way vertebrate alignment and conservation track in the UCSC Genome Browser.

Authors:  Webb Miller; Kate Rosenbloom; Ross C Hardison; Minmei Hou; James Taylor; Brian Raney; Richard Burhans; David C King; Robert Baertsch; Daniel Blankenberg; Sergei L Kosakovsky Pond; Anton Nekrutenko; Belinda Giardine; Robert S Harris; Svitlana Tyekucheva; Mark Diekhans; Thomas H Pringle; William J Murphy; Arthur Lesk; George M Weinstock; Kerstin Lindblad-Toh; Richard A Gibbs; Eric S Lander; Adam Siepel; David Haussler; W James Kent
Journal:  Genome Res       Date:  2007-11-05       Impact factor: 9.043

8.  Microarray validation: factors influencing correlation between oligonucleotide microarrays and real-time PCR.

Authors:  Jeanine S Morey; James C Ryan; Frances M Van Dolah
Journal:  Biol Proced Online       Date:  2006-12-12       Impact factor: 3.244

9.  Probe selection and expression index computation of Affymetrix Exon Arrays.

Authors:  Yi Xing; Karen Kapur; Wing Hung Wong
Journal:  PLoS One       Date:  2006-12-20       Impact factor: 3.240

10.  Identification and utilization of inter-species conserved (ISC) probesets on Affymetrix human GeneChip platforms for the optimization of the assessment of expression patterns in non human primate (NHP) samples.

Authors:  Zhining Wang; Mark G Lewis; Martin E Nau; Alma Arnold; Maryanne T Vahey
Journal:  BMC Bioinformatics       Date:  2004-10-26       Impact factor: 3.169
