Nalini Polavarapu, Gaurav Arora, Vinay K Mittal, John F McDonald.
Abstract
BACKGROUND: Although humans and chimpanzees have accumulated significant differences in a number of phenotypic traits since diverging from a common ancestor about six million years ago, their genomes are more than 98.5% identical at protein-coding loci. This modest degree of nucleotide divergence is not sufficient to explain the extensive phenotypic differences between the two species. It has been hypothesized that the genetic basis of the phenotypic differences lies at the level of gene regulation and is associated with the extensive insertion and deletion (INDEL) variation between the two species. To test the hypothesis that large INDELs (80 to 12,000 bp) may have contributed significantly to differences in gene regulation between the two species, we categorized human-chimpanzee INDEL variation mapping in or around genes and determined whether this variation is significantly correlated with previously determined differences in gene expression.
Year: 2011 PMID: 22024410 PMCID: PMC3215961 DOI: 10.1186/1759-8753-2-13
Source DB: PubMed Journal: Mob DNA
Figure 1. Computational pipeline for the detection and characterization of human and chimpanzee insertions and deletions. Using information from the designated databases, we characterized insertions and deletions (INDELs) and analyzed them using various in-house Perl scripts and open source algorithms (Multiz, RepeatMasker [44] and Tandem Repeats Finder [45]). The multiple alignment program Multiz was used to classify chimpanzee gaps (CGs) as insertions or deletions. The UCSC Genome Browser [40] pairwise alignment databases were used for human gap (HG) classification as insertions or deletions. Human and chimpanzee INDELs were associated with the known human and chimpanzee Ensembl genes [30] obtained from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables), and the presence of INDELs was correlated with the microarray gene expression data. INDEL sequences that were obtained from their corresponding reference genomes were searched for various repeat elements using RepeatMasker and Tandem Repeats Finder and classified according to the families of repeat sequences (partial or complete) present within each INDEL. The characterized INDELs were then assessed using various statistical analytical methods.
Number of INDELs associated with different categories of sequences
| Categories of gaps | Human gaps | Chimpanzee gaps | Total INDELs (HGs + CGs) |
|---|---|---|---|
| Total gaps | 11,365 | 15,144 | 26,509 |
| Interspersed repeats (all) | 7,176 | 11,398 | 18,574 |
| Interspersed sequences (retrotransposons) | 7,121 | 11,355 | 18,476 |
| Retrotransposons (SINEs) | 3,494 | 7,021 | 10,515 |
| Retrotransposons (LINEs) | 1,847 | 2,052 | 3,899 |
| Retrotransposons (ERVs) | 519 | 356 | 875 |
| Retrotransposons (SVAs) | 114 | 681 | 795 |
| Retrotransposons (MEs) | 1,147 | 1,245 | 2,392 |
| Interspersed sequences (DNA elements) | 55 | 43 | 98 |
| Noninterspersed sequences (all) | 4,189 | 3,746 | 7,935 |
| Noninterspersed sequences/tandem repeats (NIS/TR) | 1,266 | 1,334 | 2,600 |
| Noninterspersed sequences/unique sequences (NIS/US) | 2,923 | 2,412 | 5,335 |
CG = chimpanzee gap; ERV = endogenous retrovirus; HG = human gap; INDEL = insertion and deletion; LINE = long interspersed nuclear element; ME = mosaic element; NIS = noninterspersed sequence; SINE = short interspersed nuclear element; SVA = biologically active composite elements consisting of fragments of SINE, VNTRs and Alu elements; TR = tandem repeat; US = unique sequence; VNTR = variable number of tandem repeats. Interspersed repeats are transposable element sequences that are present multiple times throughout the genome. The majority of interspersed repeats are retrotransposon sequences (subcategories: SINEs, LINEs, ERVs, SVAs, and MEs). DNA family transposable elements constitute less than 1% of interspersed repeats. Noninterspersed sequences are TRs or USs that map to specific INDEL sites in the genome.
Number of human and chimpanzee INDELs associated with all sequences (retrotransposons and noninterspersed sequences)
| All sequences (retrotransposons + noninterspersed sequences) | Human gaps: CIs | Human gaps: HDs | Human gaps: CIs + HDs | Chimpanzee gaps: HIs | Chimpanzee gaps: CDs | Chimpanzee gaps: HIs + CDs | Total insertions (CIs + HIs) | Total deletions (CDs + HDs) | Total INDELs (CIs + HIs + CDs + HDs) |
|---|---|---|---|---|---|---|---|---|---|
| Total | 5,911 | 5,399 | 11,310 | 10,607 | 4,494 | 15,101 | 16,518 | 9,893 | 26,411 |
CD = chimpanzee deletion; CI = chimpanzee insertion; HD = human deletion; HI = human insertion; INDEL = insertion and deletion. Using Rhesus macaque as an out-group, we characterized INDEL variation as CDs, CIs, HDs or HIs.
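The out-group logic behind the CD/CI/HD/HI labels can be illustrated with a minimal sketch. It assumes a simple parsimony rule: if the out-group carries the sequence, the sequence is treated as ancestral and the species lacking it is scored as having a deletion; otherwise the species carrying it is scored as having an insertion. The function name `classify_gap` and its arguments are hypothetical, and the authors' actual Multiz-based procedure may differ in detail.

```python
def classify_gap(gap_species: str, outgroup_has_sequence: bool) -> str:
    """Label a human-chimpanzee gap as CI, HD, HI or CD.

    gap_species: the genome that LACKS the sequence ('human' or 'chimpanzee').
    outgroup_has_sequence: True if the out-group (here, Rhesus macaque)
        carries the sequence at the orthologous position.
    Assumption: simple parsimony, one event per locus.
    """
    if gap_species == "human":
        # Sequence is present in chimpanzee and absent in human.
        # Ancestral presence implies a human deletion (HD);
        # ancestral absence implies a chimpanzee insertion (CI).
        return "HD" if outgroup_has_sequence else "CI"
    if gap_species == "chimpanzee":
        # Sequence is present in human and absent in chimpanzee.
        return "CD" if outgroup_has_sequence else "HI"
    raise ValueError(f"unknown species: {gap_species!r}")


if __name__ == "__main__":
    for species in ("human", "chimpanzee"):
        for in_outgroup in (True, False):
            print(species, in_outgroup, classify_gap(species, in_outgroup))
```

Under this rule, human gaps resolve to CIs or HDs and chimpanzee gaps to HIs or CDs, which matches the column grouping used in the tables.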
Number of human and chimpanzee INDELs associated with retrotransposons
| Retrotransposon subclass | Human gaps: CIs | Human gaps: HDs | Human gaps: CIs + HDs | Chimpanzee gaps: HIs | Chimpanzee gaps: CDs | Chimpanzee gaps: HIs + CDs | Total insertions (CIs + HIs) | Total deletions (CDs + HDs) | Total INDELs (CIs + HIs + CDs + HDs) |
|---|---|---|---|---|---|---|---|---|---|
| SINE | 2,264 | 1,230 | 3,494 | 5,787 | 1,234 | 7,021 | 8,051 | 2,464 | 10,515 |
| LINE | 1,311 | 536 | 1,847 | 1,756 | 296 | 2,052 | 3,067 | 832 | 3,899 |
| ERV | 208 | 311 | 519 | 156 | 200 | 356 | 364 | 511 | 875 |
| SVA | 98 | 16 | 114 | 680 | 1 | 681 | 778 | 17 | 795 |
| ME | 154 | 993 | 1,147 | 269 | 976 | 1,245 | 423 | 1,969 | 2,392 |
| Total | 4,035 | 3,086 | 7,121 | 8,648 | 2,707 | 11,355 | 12,683 | 5,793 | 18,476 |
CD = chimpanzee deletion; CI = chimpanzee insertion; ERV = endogenous retrovirus; HD = human deletion; HI = human insertion; INDEL = insertion and deletion; LINE = long interspersed nuclear element; ME = mosaic element; SINE = short interspersed nuclear element; SVA = biologically active composite elements consisting of fragments of SINE, VNTRs and Alu elements. Using Rhesus macaque as an out-group, we characterized INDEL variation as CDs, CIs, HDs or HIs.
Number of human and chimpanzee INDELs associated with noninterspersed sequences
| Noninterspersed sequence subclass | Human gaps: CIs | Human gaps: HDs | Human gaps: CIs + HDs | Chimpanzee gaps: HIs | Chimpanzee gaps: CDs | Chimpanzee gaps: HIs + CDs | Total insertions (CIs + HIs) | Total deletions (CDs + HDs) | Total INDELs (CIs + HIs + CDs + HDs) |
|---|---|---|---|---|---|---|---|---|---|
| TRs | 720 | 546 | 1,266 | 814 | 520 | 1,334 | 1,534 | 1,066 | 2,600 |
| USs | 1,156 | 1,767 | 2,923 | 1,145 | 1,267 | 2,412 | 2,301 | 3,034 | 5,335 |
| Total | 1,876 | 2,313 | 4,189 | 1,959 | 1,787 | 3,746 | 3,835 | 4,100 | 7,935 |
CD = chimpanzee deletion; CI = chimpanzee insertion; HD = human deletion; HI = human insertion; INDEL = insertion and deletion; TR = tandem repeat; US = unique sequence. Using Rhesus macaque as an out-group, we characterized INDEL variation as CDs, CIs, HDs or HIs.
Number of genes associated with different types of INDELs
| Type of INDEL | Genes associated with INDELs containing RTs only | Genes associated with INDELs containing NISs only | Genes associated with INDELs containing both RTs and NISs | Total (genes associated with INDELs) |
|---|---|---|---|---|
| HI | 3,149 | 718 | 326 | 4,193 |
| CI | 1,276 | 674 | 175 | 2,125 |
| HD | 1,139 | 740 | 155 | 2,034 |
| CD | 1,309 | 776 | 160 | 2,245 |
| Total | 6,873 | 2,908 | 816 | 10,597 |
CD = chimpanzee deletion; CI = chimpanzee insertion; HD = human deletion; HI = human insertion; INDEL = insertion and deletion; NIS = noninterspersed sequence; RT = retrotransposon sequence. The genes associated with INDELs were classified on the basis of the type of INDEL (HIs, CIs, HDs and CDs) associated with the gene and type of sequence contained in the INDEL (RTs vs NISs).
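The column assignment in the table above can be sketched as a small set-based rule. This is an illustration only: it assumes each INDEL associated with a gene has already been labeled `"RT"` or `"NIS"`, and that "both RTs and NISs" means the gene's associated INDELs include both kinds. The function name `sequence_category` and the labels are hypothetical.

```python
from typing import Iterable


def sequence_category(indel_contents: Iterable[str]) -> str:
    """Assign a gene to 'RT only', 'NIS only' or 'RT and NIS'.

    indel_contents: one label per INDEL associated with the gene,
        each either 'RT' (retrotransposon) or 'NIS' (noninterspersed).
    Assumption: the category depends only on which labels occur.
    """
    kinds = set(indel_contents)
    if not kinds:
        raise ValueError("gene has no associated INDELs")
    unknown = kinds - {"RT", "NIS"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    if kinds == {"RT"}:
        return "RT only"
    if kinds == {"NIS"}:
        return "NIS only"
    return "RT and NIS"


if __name__ == "__main__":
    print(sequence_category(["RT", "RT"]))
    print(sequence_category(["NIS"]))
    print(sequence_category(["RT", "NIS", "RT"]))
```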
Number of genes differentially expressed between humans and chimpanzees across five tissues
| Expressed genes (by tissue type) | Brain | Testis | Heart | Liver | Kidney |
|---|---|---|---|---|---|
| Number of genes expressed | 14,133 | 15,445 | 13,497 | 13,684 | 14,059 |
| Number of genes differentially expressed | 6,884 (49%) | 10,803 (70%) | 6,843 (51%) | 5,308 (39%) | 6,589 (47%) |
ANOVA = analysis of variance. "Expressed genes" are those designated as "present" by the default Affymetrix MAS 5.0 software in at least one tissue in either chimpanzees or humans. ANOVA was used to identify genes whose expression was significantly different (P < 0.05) between humans and chimpanzees for each of the tissue types. The percentages in parentheses were calculated by dividing the number of genes differentially expressed or not in each tissue by the total number of genes expressed in that tissue.
Number of differentially expressed or non-differentially expressed genes associated or not associated with INDELs
| Tissue type | Number of DE genes associated with INDELs | Number of non-DE genes associated with INDELs | Number of DE genes not associated with INDELs | Number of non-DE genes not associated with INDELs | Total expressed genes |
|---|---|---|---|---|---|
| Brain | 2,266 | 2,153 | 4,618 | 5,096 | 14,133 |
| Testis | 3,438 | 1,256 | 7,365 | 3,386 | 15,445 |
| Heart | 2,233 | 1,948 | 4,610 | 4,706 | 13,497 |
| Liver | 1,696 | 2,466 | 3,612 | 5,910 | 13,684 |
| Kidney | 2,179 | 2,144 | 4,410 | 5,326 | 14,059 |
DE = differentially expressed; INDEL = insertion and deletion; kb = kilobase; non-DE = non-differentially expressed. INDELs mapping within or in proximity (± 5 kb) to genes were considered to be associated.
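The ± 5 kb association rule can be expressed as an interval-overlap check. The sketch below is a hedged illustration, not the authors' Perl implementation: the `Interval` type, the function name `is_associated`, and the use of closed (inclusive) coordinates are all assumptions.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive coordinate (assumed convention)
    end: int    # inclusive coordinate (assumed convention)


FLANK_BP = 5_000  # the +/- 5 kb window stated in the table footnote


def is_associated(gene: Interval, indel: Interval, flank: int = FLANK_BP) -> bool:
    """True if the INDEL lies within the gene or within `flank` bp of it."""
    if gene.chrom != indel.chrom:
        return False
    # Extend the gene by the flank on both sides, then test for overlap.
    return indel.start <= gene.end + flank and indel.end >= gene.start - flank


if __name__ == "__main__":
    gene = Interval("chr1", 10_000, 20_000)
    print(is_associated(gene, Interval("chr1", 24_000, 24_500)))  # inside window
    print(is_associated(gene, Interval("chr1", 25_001, 25_500)))  # just outside
```

A full analysis over tens of thousands of INDELs would normally sort the intervals or use an interval tree rather than comparing every pair; the pairwise check is kept here for clarity.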
Proportions of differentially expressed or non-differentially expressed genes associated with INDELs are significantly different
| Tissue type | DE genes associated with INDELs/total DE genes (%) | Non-DE genes associated with INDELs/total non-DE genes (%) | Proportions test (P value) |
|---|---|---|---|
| Brain | 2,266/(2,266 + 4,618) (33%) | 2,153/(2,153 + 5,096) (30%) | 4.054E-05 |
| Testis | 3,438/(3,438 + 7,365) (32%) | 1,256/(1,256 + 3,386) (27%) | 3.93E-09 |
| Heart | 2,233/(2,233 + 4,610) (33%) | 1,948/(1,948 + 4,706) (29%) | 2.7E-05 |
| Liver | 1,696/(1,696 + 3,612) (32%) | 2,466/(2,466 + 5,910) (29%) | 0.0019 |
| Kidney | 2,179/(2,179 + 4,410) (33%) | 2,144/(2,144 + 5,326) (29%) | 2.35E-08 |
DE = differentially expressed; INDEL = insertion and deletion; non-DE = non-differentially expressed. Proportions for all INDELs across all tissues are shown.
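For readers who want to see how a comparison of two proportions of this kind can be computed, here is a minimal sketch using a pooled two-proportion z-test with a two-sided P value. The source does not state which variant of the proportions test was used (for example, whether a continuity correction was applied), so this sketch should not be expected to reproduce the tabulated P values exactly. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Pooled two-proportion z-test; returns (z, two-sided P value).

    x1/n1 and x2/n2 are the two observed proportions.
    Assumption: normal approximation, no continuity correction.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Brain row: DE genes with INDELs vs non-DE genes with INDELs.
    de_with, de_total = 2_266, 2_266 + 4_618
    non_de_with, non_de_total = 2_153, 2_153 + 5_096
    z, p = two_proportion_z_test(de_with, de_total, non_de_with, non_de_total)
    print(f"z = {z:.3f}, P = {p:.3g}")
```

For the brain counts this gives a P value of the same order of magnitude as the tabulated one; small differences are expected because the test variant is an assumption.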
Proportions of differentially expressed or non-differentially expressed genes associated with INDELs are significantly different
| Tissue type | DE genes associated with RT INDELs/total DE genes (%) | Non-DE genes associated with RT INDELs/total non-DE genes (%) | Proportions test (P value) |
|---|---|---|---|
| Brain | 1,916/(2,266 + 4,618) (28%) | 1,790/(2,153 + 5,096) (25%) | 2.42E-05 |
| Testis | 2,862/(3,438 + 7,365) (26%) | 1,072/(1,256 + 3,386) (23%) | 9.63E-06 |
| Heart | 1,876/(2,233 + 4,610) (27%) | 1,636/(1,948 + 4,706) (25%) | 0.00019 |
| Liver | 1,416/(1,696 + 3,612) (26%) | 2,072/(2,466 + 5,910) (25%) | 0.012 |
| Kidney | 1,843/(2,179 + 4,410) (28%) | 1,776/(2,144 + 5,326) (24%) | 1.52E-08 |
DE = differentially expressed; INDEL = insertion and deletion; non-DE = non-differentially expressed, RT = retrotransposon sequence. Proportions for all RT-associated INDELs across all tissues are shown.
Proportions of differentially expressed or non-differentially expressed genes associated with INDELs are significantly different
| Tissue type | DE genes with NIS-associated INDELs/total DE genes (%) | Non-DE genes with NIS-associated INDELs/total non-DE genes (%) | Proportions test (P value) |
|---|---|---|---|
| Brain | 801/(2,266 + 4,618) (12%) | 762/(2,153 + 5,096) (10%) | 0.036 |
| Testis | 1,193/(3,438 + 7,365) (11%) | 440/(1,256 + 3,386) (9.4%) | 0.0041 |
| Heart | 777/(2,233 + 4,610) (11%) | 658/(1,948 + 4,706) (9.8%) | 0.006 |
| Liver | 590/(1,696 + 3,612) (11%) | 838/(2,466 + 5,910) (10%) | 0.041 |
| Kidney | 732/(2,179 + 4,410) (11%) | 768/(2,144 + 5,326) (10%) | 0.11 |
DE = differentially expressed; INDEL = insertion and deletion; non-DE = non-differentially expressed; NIS = noninterspersed sequence. Proportions are shown for all NIS-associated INDELs across all tissues.
Proportions of differentially expressed genes associated or not associated with INDELs are significantly different
| Tissue type | DE genes with INDELs/total genes with INDELs (%) | DE genes without INDELs/total genes without INDELs (%) | Proportions test (P value) |
|---|---|---|---|
| Brain | 2,266/(2,266 + 2,153) (51%) | 4,618/(4,618 + 5,096) (48%) | 4.054E-05 |
| Testis | 3,438/(3,438 + 1,256) (73%) | 7,365/(7,365 + 3,386) (69%) | 3.93E-09 |
| Heart | 2,233/(2,233 + 1,948) (53%) | 4,610/(4,610 + 4,706) (50%) | 2.7E-05 |
| Liver | 1,696/(1,696 + 2,466) (41%) | 3,612/(3,612 + 5,910) (38%) | 0.0019 |
| Kidney | 2,179/(2,179 + 2,144) (50%) | 4,410/(4,410 + 5,326) (45%) | 2.35E-08 |
DE = differentially expressed; INDEL = insertion and deletion. Proportions for all INDELs across all tissues examined are shown.
Figure 2. Overlap (blue region) between genes significantly differentially expressed between humans and chimpanzees and associated with nucleotide differences (green region) [31] or large insertion and deletion differences (red region) between the species. On average, fewer than 9% of genes differentially expressed at the life stages and tissues examined were associated with both types of variation. The number of differentially expressed genes associated with nucleotide differences as determined by Khaitovich et al. [31], as well as the number of differentially expressed genes associated with large insertions and deletions (INDELs) as determined in this study, are shown. The number of overlapping genes is shown at the intersection.