Literature DB >> 16857058

Transposable element derived DNaseI-hypersensitive sites in the human genome.

Leonardo Mariño-Ramírez1, I King Jordan.   

Abstract

BACKGROUND: Transposable elements (TEs) are abundant genomic sequences that have been found to contribute to genome evolution in unexpected ways. Here, we characterize the evolutionary and functional characteristics of TE-derived human genome regulatory sequences uncovered by the high throughput mapping of DNaseI-hypersensitive (HS) sites.
RESULTS: Human genome TEs were found to contribute substantially to HS regulatory sequences characterized in CD4+ T cells: 23% of HS sites contain TE-derived sequences. While HS sites are far more evolutionarily conserved than non HS sites in the human genome, consistent with their functional importance, TE-derived HS sites are highly divergent. Nevertheless, TE-derived HS sites were shown to be functionally relevant in terms of driving gene expression in CD4+ T cells. Genes involved in immune response are statistically over-represented among genes with TE-derived HS sites. A number of genes with both TE-derived HS sites and immune tissue related expression patterns were found to encode proteins involved in immune response such as T cell specific receptor antigens and secreted cytokines as well as proteins with clinical relevance to HIV and cancer. Genes with TE-derived HS sites have higher average levels of sequence and expression divergence between human and mouse orthologs compared to genes with non TE-derived HS sites.
CONCLUSION: The results reported here support the notion that TEs provide a specific genome-wide mechanism for generating functionally relevant gene regulatory divergence between evolutionary lineages. REVIEWERS: This article was reviewed by Wolfgang J. Miller (nominated by Jerzy Jurka), Itai Yanai and Mikhail S.Gelfand.

Entities:  

Year:  2006        PMID: 16857058      PMCID: PMC1538576          DOI: 10.1186/1745-6150-1-20

Source DB:  PubMed          Journal:  Biol Direct        ISSN: 1745-6150            Impact factor:   4.540


Open peer review

Reviewed by Wolfgang J. Miller (nominated by Jerzy Jurka), Itai Yanai and Mikhail S.Gelfand. For the full reviews, please go to the Reviewers' comments section.

Background

Transposable elements (TEs) are DNA sequences capable of moving among chromosomal locations within the genome. TEs are copious genomic entities; at least half of the human genome sequence is derived from TE insertions [1,2]. While TEs were once thought to be purely selfish parasites concerned only with their own proliferation [3], there are now numerous examples of TE sequences that have been domesticated [4] to play some role for the host genomes in which they reside [5]. One way that TEs can achieve such a mutualistic status is through the donation of regulatory sequences that help control the expression of nearby host genes. For instance, recent genome-wide studies have shown that TEs can be found in gene-specific regulatory regions such as proximal promoters and untranslated regions as well as regulatory sequences that exert more global effects like scaffold/matrix attachment regions and locus control regions [6,7]. LINE elements, in particular, have been demonstrated to have genome-wide effects in lowering expression when inserted within transcribed regions [8]. Comparative sequence analyses have shown that many sequences from two particular human TE families have evolved under purifying selection, strongly suggesting a functional role related to gene regulation [9]. More specifically, numerous experimentally characterized cis-regulatory binding sequences have been shown to be derived from TE insertions [6,10], and a number of anecdotal cases of gene regulatory phenotypes governed by TE sequences have been confirmed [11,12]. At the same time, TEs are known to be among the least evolutionarily conserved elements in the human genome [1]. Indeed, TE activity and insertions often lead to the most substantial evolutionary differences between mammalian genome sequences [13,14]. Taken together with their ability to donate regulatory sequences, the lineage-specific nature of TEs suggests that they may provide a specific mechanism for driving the regulatory divergence between evolutionary lineages [10]. There are numerous experimental and computational efforts underway aimed at characterizing the non-coding portion of mammalian genomes [15]. Much of this work is focused on elucidating the location and nature of regulatory sequences that control the expression of nearby genes. An example of this kind of work is the large scale attempt, spearheaded by the National Human Genome Research Institute (NHGRI), to characterize a complete set of human genome DNaseI-hypersensitive (HS) sites . HS sites are associated with gene regulatory regions, for upregulated genes in particular, and mapping of HS sites is considered to be among the most reliable experimental methods for identifying regulatory sequences. HS sites have been shown to be associated with a variety of regulatory regions such as promoters, enhancers, suppressors, insulators, and locus control regions [16]. Using recently developed high throughput experimental methods, thousands of HS sites were cloned from human CD4+ T cells [17,18] and sequenced using massively parallel signature sequencing [19]; results were confirmed with real time PCR [20]. CD4+ T cells are a class of lymphocytes known as helper or effector T cells that serve to activate and direct other immune cells. CD4+ T cells are one of the primary targets of HIV infection, and depletion of these cells leads to AIDS. Thus, the HS sites mapped to the human genome by the NHGRI should correspond to sequences that regulate gene expression related to CD4+ T cell mediated immune response. In this report, we have taken advantage of the genome-wide mapping of HS sites in order to evaluate the contribution of TEs to human gene regulatory sequences. The extent of TE-derived HS sites in the human genome was characterized, and the evolutionary conservation levels of TE-derived versus non TE-derived HS sites were compared. In addition, the expression and functional characteristics of genes with TE-derived HS sites were evaluated along with the evolutionary divergence of their sequences and expression patterns. The results reported here indicate that TEs have provided numerous functionally relevant HS sites to the human genome, and these regulatory sequences have played a role in driving functional divergence along the human evolutionary lineage.

Results and discussion

Human genome DNaseI-hypersensitive sites

A total of 14,216 DNaseI-hypersensitive (HS) sites, covering ~4.2 megabases of DNA, are mapped to the hg17 version (NCBI build 35) of the human genome sequence. These sites consist of clusters of two or more experimentally characterized HS sites that map within 500 bp of each other. These HS sites were defined in CD4+ T cells and are presumed to be functionally relevant with respect to the regulation of gene expression in these cells. Given the functional role played by HS sites, they are expected to be anomalously conserved in terms of their levels of sequence divergence. This is because the evolution of functionally important sequences is constrained by purifying selection (i.e. the removal of deleterious variants). Indeed, this idea is the basis of the phylogenetic footprinting approach that identifies putatively functional genomic elements by virtue of their sequence conservation [21,22]. The expectation that HS sites should be evolutionarily conserved was tested using the binary characterization of human genome positions as conserved or non-conserved based on analysis with the program phastCons [23]. PhastCons employs a probabilistic hidden Markov model (HMM) that represents the levels of DNA substitution at each site in the genome and how these levels change among sites. The phastCons results used here were based on a human query anchored multiple sequence alignment (MSA) of 17 vertebrate genomes. This MSA was assembled with the program multiz [24] from whole genome pairwise alignments generated using blastz [25]. The HMM used by phastCons employs a single phylogenetic tree for all sites with the branch lengths free to vary across sites. The HMM has two states – conserved and non-conserved – based on the values of the branch length scaling parameter estimated from the data. Alignment sites (segments) are predicted as being conserved if they are significantly more likely to have been generated by the conserved state of the HMM. All HS and non HS sites in hg17 were evaluated with respect to their phastCons conservation designation. HS sites in the human genome were found to be far more conserved, on average, than sites non HS sites (Figure 1a): ~13% of HS sites are conserved compared to ~5% of non HS sites. This comparison is conservative because the non HS sites include numerous exons, most of which are conserved. The figure of 5% conserved non HS sites is consistent with previous estimates for the proportion of the human genome evolving under purifying selection [23]. The greater than two-fold difference in conservation between HS and non HS sites, together with the large number of genome positions evaluated, yields a highly statistically significant difference (χ2 = 5.2 × 106, P = 0; Figure 1b). These results are consistent with previous reports that HS sites are i-enriched in human genome regions that align with mouse genomic sequence [17] and ii-often located near sequences conserved in multiple species [18]. However, it should be noted that the majority of HS sites (~87%) analyzed here are not considered conserved by the criteria employed by phastCons. Thus, many of these sites, which are likely to be functionally important, might not be detected using a naïve phylogenetic footprinting approach.
Figure 1

DNaseI-hypersensitive site conservation. a) The percent of phastCons (PC) denoted conserved positions for DNaseI-hypersensitive (HS) versus non DNaseI-hypersensitive (nonHS) sites in the human genome. b) 2 × 2 contingency table for evaluating the difference in evolutionary conservation between HS and nonHS sites.

DNaseI-hypersensitive site conservation. a) The percent of phastCons (PC) denoted conserved positions for DNaseI-hypersensitive (HS) versus non DNaseI-hypersensitive (nonHS) sites in the human genome. b) 2 × 2 contingency table for evaluating the difference in evolutionary conservation between HS and nonHS sites.

Transposable element derived hypersensitive sites

The contribution of TEs to HS sites in the human genome was evaluated by comparing the results of a RepeatMasker [26] analysis of hg17 to the mapped locations of the NHGRI HS sites. There are numerous TE-derived HS sites in the human genome: 3,229 HS sites (~23%) include TE sequence and 11% of all positions covered by HS sites (454,564 bp) are derived from TEs. This substantial fraction of HS sites derived from TEs is likely to be an underestimate since some TEs have diverged beyond recognition and others may not be covered by the RepBase library [27] employed by RepeatMasker. The relative frequency distribution of TE classes found to donate HS sites is largely similar to genomic distribution of TEs with a slight over-representation of SINEs and according under-representation of LINEs (Figure 2). This probably reflects the fact that SINEs, such as Alus, are over-represented in GC and gene rich regions of the genome, while LINEs tend to be found in relatively gene poor AT-rich regions.
Figure 2

Relative percentages of four transposable element (TE) classes. Percentages of the four classes are shown for the human genome as a whole (dark grey) and for HS sites only (light grey).

Relative percentages of four transposable element (TE) classes. Percentages of the four classes are shown for the human genome as a whole (dark grey) and for HS sites only (light grey). The evolutionary conservation of TE-derived HS sites was compared to that of non TE-derived HS sites using the same phastCons-based approach described previously. HS site positions derived from TEs are far less conserved (<1%) than non TE-derived HS site positions (~17%; Figure 3a), and the difference is highly statistically significant (χ2 = 6.5 × 105, P = 0; Figure 3b). This finding is consistent with the highly lineage-specific nature of TE-sequences. Even in the human genome, where they are relatively ancient, TEs are among the least conserved of all sequences [1]. The lack of evolutionary conservation for TE-derived HS sites can be taken to suggest the formal possibility these sites are not functionally relevant. However, the analysis of gene expression data described below indicates that TE-derived HS sites do in fact play a functional role related to CD4+ T cell-specific gene regulation. Thus, it may be the case that TE-derived HS sites provide a specific mechanism that leads to lineage-specific regulatory phenotypes, which in turn are important for the generation of higher level phenotypic differences between species. This interpretation is consistent with what has previously been shown for TE-derived cis-regulatory sequences with demonstrable roles in gene regulation, a majority of which also show little or no evidence of evolutionary conservation [10].
Figure 3

Evolutionary conservation of TE-derived versus non TE-derived HS sites. a) The percent of phastCons (PC) denoted conserved positions for TE-derived (HSTE) and non TE-derived (HSnotTE) sites in the human genome. b) 2 × 2 contingency table for evaluating the difference in evolutionary conservation between HSTE and HSnotTE sites.

Evolutionary conservation of TE-derived versus non TE-derived HS sites. a) The percent of phastCons (PC) denoted conserved positions for TE-derived (HSTE) and non TE-derived (HSnotTE) sites in the human genome. b) 2 × 2 contingency table for evaluating the difference in evolutionary conservation between HSTE and HSnotTE sites.

Gene expression

The HS sites analyzed here were cloned from CD4+ T cells. As such, the HS site locations in the human genome should correspond to genes that are upregulated in CD4+ T cells. This prediction was tested using Affymetrix microarray gene expression data from the Novartis Research Foundation SymAtlas (GNF2) [28]. The locations of all hg17 mapped HS sites were considered with respect to the locations of the so-called known genes from the UCSC genome browser. HS sites were associated with a gene if they mapped within 1-kb of the start or end of transcription for that gene, and Affymetrix probe identifiers were associated with individual genes. The relative levels of CD4+ T cell-specific gene expression were compared for genes that have co-located HS sites and those that have no HS site. Genes associated with HS sites have significantly higher relative CD4+ T cell expression (μ = 0.48) than non HS site (μ = -0.21) genes (Student's t-test P = 4.5e-264; Mann-Whitney U test, P = 8.2e-300; Figure 4a and 4b). This is consistent with a functional role for these HS sites in driving gene expression in CD4+ T cells. Similar results, finding HS sites near genes with demonstrable CD4+ T cell expression, were reported previously [17,18]. In addition, there is no real difference in the levels of CD4+ T cell expression between TE-derived (μ = 0.48) and non TE derived (μ = 0.47) HS sites (Student's t-test, P = 0.82, Mann-Whitney U test, P = 0.33; Figure 4b). This finding underscores the functional relevance of TE-derived HS sites: they too appear to drive CD4+ T cell specific expression.
Figure 4

Comparison of CD4+ T cell expression levels for different classes of human genes. a) Cumulative frequency distributions of CD4+ T cell expression for genes do not have any HS sites (notHS) and genes with HS sites (allHS). b) Comparison of average CD4+ T cell expression levels for without (notHS) and with (allHS) HS sites. Genes with HS sites were further broken down into groups of genes with TE-derived (HSandTE) and not TE-derived (HSnotTE) HS sites.

Comparison of CD4+ T cell expression levels for different classes of human genes. a) Cumulative frequency distributions of CD4+ T cell expression for genes do not have any HS sites (notHS) and genes with HS sites (allHS). b) Comparison of average CD4+ T cell expression levels for without (notHS) and with (allHS) HS sites. Genes with HS sites were further broken down into groups of genes with TE-derived (HSandTE) and not TE-derived (HSnotTE) HS sites.

Gene function

The GNF2 expression data was also used to guide functional evaluation of genes with TE-derived HS sites. Condition-specific gene expression patterns, across 73 (non-cancerous) human tissue types and cell lines, were used to group this set of genes with k-means clustering [29]. A total of 25 clusters, containing genes with similar expression profiles, were resolved in this way, and five clusters (2, 9, 15, 19 & 25) containing genes with pronounced CD4+ T cell over-expression were chosen for functional analysis. Genes with relatively high expression in CD4+ T cells also tend to be over-expressed in other immune related tissues and cells such as blood, bone marrow and lymphoblasts (Figure 5). These coherent expression patterns strongly suggest a functional role related to immune response, and the annotation of the genes in question bear this interpretation out.
Figure 5

Gene expression patterns for a cluster of coexpressed genes with TE-derived HS sites. a) View of relative expression levels across conditions for a subset of cluster 2 genes, color coded according to over (red) and under (green) expression. b) Centroid view with cluster 2 expression averages.

Gene expression patterns for a cluster of coexpressed genes with TE-derived HS sites. a) View of relative expression levels across conditions for a subset of cluster 2 genes, color coded according to over (red) and under (green) expression. b) Centroid view with cluster 2 expression averages. Gene ontology (GO) annotation terms were mapped to individual genes, and the gene members of each cluster were evaluated for the over-representation of specific GO (biological process) terms. GO terms found to be significantly over-represented in two or more clusters are shown in Table 1. When the gene expression clusters are considered from this functional annotation perspective, they resolve into cluster- and function-specific groups. Most prominent among these is the immune response group, which unites clusters 2 and 25 (as well as 19 to a lesser extent) and contains six immune related GO terms. The hierarchical (parent-child) relationships among these immune related terms on the GO graph are shown in Figure 6. The most significantly over-represented GO term covers genes specifically involved in immune response (GO:0006955).
Table 1

Overrepresented GO terms from genes with TE-derived HS sites

Clusters1GO id2GO level3GO name4Counts5P-values6
Immune response group
2GO:00096074response to biotic stimulus214.9 × 10-7
2581.7 × 10-2
2GO:00069525defense response201.0 × 10-6
2581.6 × 10-2
2GO:00069556immune response203.6 × 10-7
2581.3 × 10-2
2GO:00517075response to other organism111.8 × 10-4
2553.3 × 10-2
2GO:00096136response to pest, pathogen or parasite111.0 × 10-4
2552.7 × 10-2
2GO:00300989lymphocyte differentiation36.2 × 10-3
1931.6 × 10-2
Regulation group
15GO:00507892regulation of biological process404.4 × 10-8
25177.5 × 10-3
15GO:00507943regulation of cellular process392.2 × 10-8
25174.1 × 10-3
15GO:00507913regulation of physiological process382.2 × 10-8
25151.3 × 10-6
15GO:00512444regulation of cellular physiological process381.3 × 10-8
25151.3 × 10-2
Metabolism group
15GO:00508753cellular physiological process736.5 × 10-4
19883.8 × 10-3
15GO:00442384primary metabolism631.7 × 10-6
19674.8 × 10-3
15GO:00442374cellular metabolism639.0 × 10-6
19711.3 × 10-3
9GO:00431704macromolecule metabolism262.8 × 10-2
15372.0 × 10-2
19509.1 × 10-4
9GO:00432835biopolymer metabolism182.8 × 10-2
15308.0 × 10-5
19355.1 × 10-4
Cell death group
2GO:00430676regulation of programmed cell death77.8 × 10-3
1973.7 × 10-2
2GO:00429817regulation of apoptosis77.8 × 10-3
1973.7 × 10-2

1Gene coexpression, based on k-means analysis, cluster identifier.

2Biological process Gene Ontology (GO) annotation term.

3Hierarchical level (descending from 1) in the biological process GO directed acyclic graph.

4Name assigned to the GO term

5Number of genes in each cluster associated with the GO term.

6P-value associated with the over-representation of the GO term in each cluster

Figure 6

Path on the GO biological process graph leading to overrepresented functions related to immune response. Hierarchical levels of the directed acyclic GO graph are shown on the left. Nodes in the graph represent individual GO annotation terms. Nodes are color coded according to the statistical significance level for their overrepresentation. Edges in the graph represent parent-child relationships between GO terms.

Overrepresented GO terms from genes with TE-derived HS sites 1Gene coexpression, based on k-means analysis, cluster identifier. 2Biological process Gene Ontology (GO) annotation term. 3Hierarchical level (descending from 1) in the biological process GO directed acyclic graph. 4Name assigned to the GO term 5Number of genes in each cluster associated with the GO term. 6P-value associated with the over-representation of the GO term in each cluster Path on the GO biological process graph leading to overrepresented functions related to immune response. Hierarchical levels of the directed acyclic GO graph are shown on the left. Nodes in the graph represent individual GO annotation terms. Nodes are color coded according to the statistical significance level for their overrepresentation. Edges in the graph represent parent-child relationships between GO terms. The functions of the individual genes that have TE-derived HS sites as well as immune response related expression patterns and functional annotation (GO:0006955) are shown in Table 2. This example further emphasizes the specific functional relevance of TE-derived HS sites. For instance, there are a number of T cell receptor antigens, such as CD6, CD53, CD74 and CD97, with T cell-specific expression patterns influenced by TE insertions. These antigens are involved in processes like cell-cell adhesion and signalling cascades that govern the development and proliferation of immune cells. There are also cases of excreted cytokines (e.g. CKLF & CCLF) that are involved in the inflammatory response by serving as potent chemoattractants and stimulants for T cells. Finally, there are clinically relevant genes with TE-derived HS sites, such as CCL5, which suppresses HIV by serving as a natural ligand for the HIV coreceptor CCR5, and BCL2, which blocks the apoptotic death of lymphocytes and can lead to lymphoma when constitutively expressed.
Table 2

Immune response (GO:0006955) genes with TE-derived HS sites

Cluster1Name2Accn3Description4HS site5TE6
2CD97NM_078481leukocyte antigen; receptor involved in both cell adhesion and signalling processes early after leukocyte activation12460_5SINE/MIR/MIR
2IGLC2BC073786Ig light-chain, partial Ke-Oz- polypeptide, C-term; immunoglobulin lambda constant region 213851_3LTR/ERVL/LTR16A
2IGLC1BC070353constant region of lambda light chains13851_3LTR/ERVL/LTR16A
2MR1CR602405major histocompatibility complex, class I-related1311_2LINE/L2/L2
2ADORA2ANM_000675adenosine receptor subtype A2a; G-protein coupled receptor; reduces the activation status of inflammatory cells13875_2SINE/MIR/MIR3
2CKLFNM_016951chemokine-like factor (cytokine); essential role in the immune and inflammatory responses; potent chemoattractant for neutrophils, monocytes and lymphocytes10591_2SINE/Alu/AluSg
2KLF6NM_001008490Kruppel-like factor 6; core promoter guanine-rich element binding protein; transcriptional activator6794_2SINE/MIR/MIRb
2IL16CR749286interleukin 16; lymphocyte chemoattractant factor; cytokine; modulator of T cell activation; mediated by CD410023_2DNA/MER1/MER117
2ST6GAL1NM_173216sialyltransferase 1 (beta-galactoside alpha-2,6-sialytransferase); role in T-cell death; generation of cell-surface carbohydrate determinants and differentiation antigens3308_3SINE/MIR/MIR3
2HLA-ENM_005516HLA class I histocompatibility lymphocyte antigen, E alpha chain; immunoregulatory role for cytotoxic T-lymphocytes4584_2DNA/MER1/Charlie1
2IL21RNM_181078interleukin 21 receptor; type I cytokine receptor; transduces the growth promoting signal of IL21, and is important for the proliferation and differentiation of T cells, B cells, and natural killer (NK) cells10408_3SINE/MIR/MIRb
2ITGB2NM_000211leukocyte cell surface adhesion glycoprotein; complement receptor C3 beta-subunit; integrin beta 2; macrophage antigen 1; facilitates inflammatory cell recruitment13756_2LINE/L1/LIME4a
2FYBNM_001465FYN-binding protein; adhesion and degranulation-promoting adaptor protein; adaptor helps form immunological synapse between T cell and antigen-presenting cell; mediates signaling from the T cell antigen receptor to integrins3896_2SINE/MIR/MIR3
2IGHG1BC024289immunoglobulin heavy constant region gamma 1; involved in antigen binding and immune response9667_2LINE/L1/L1
2PTPRCY00062protein tyrosine phosphatase receptor; leukocyte-common CD45 antigen; essential regulator of T- and B-cell antigen receptor signaling; regulator of cytokine receptor signaling; involved in hematopoiesis1347_2LINE/L1/L1ME
2LCP2NM_005565lymphocyte cytosolic protein 2; adaptor or scaffold protein that promotes T cell development and activation as well as mast cell and platelet function.4315_2SINE/Alu/AluSx
2CD53NM_000560glycoprotein leukocyte cell surface antigen; contributes to the transduction of CD2-generated signals in T cells and natural killer cells; role in T cell growth regulation964_2LINE/L2/L2
2DOCK2NM_004946dedicator of cytokinesis 2; hematopoietic cell-specific CDM family protein essential for lymphocyte chemotaxis; mediates T cell receptor-induced activation of Rac2 and IL-24313_2SINE/Alu/AluSx SINE/MIR/MIR
2CD74BC024272cell surface antigen; major histocompatibility complex, class II invariant chain; involved in NF-kappaB activation and interleukin-8 production4254_2DNA/MER1/Charlie8
2MX1NM_002462myxovirus resistance protein 1; interferon inducible; role in host defense against viruses13686_3DNA/MER1/MER5A
25CD6U66145glycoprotein cell surface antigen; involved in lymphocyte activation and thymocyte development; role in maturation of the immunological synapseDNA/MER1/MER113
25PLA2G4BNM_005090phospholipase enzyme secreted by neutrophils; produces arachidonic acid used for the biosynthesis of leukotriene in inflammatory response9758_2SINE/MIR/MIRb SINE/Alu/AluY
25BCL2NM_000633B-cell lymphoma protein 2; blocks the apoptotic death of some cells such as lymphocytes12021_2LINE/L1/HAL1
25TRA@BC063432T cell antigen receptor alpha locus; involved in thymocyte developement9169_2 9171_2 9174_4LINE/L1/L1ME2 SINE/MIR/MIR LINE/L2/L2
25CCR9NM_031200chemokine (C-C motif) G protein coupled receptor; regulator of thymocytes migration and maturation in normal and inflammation conditions; functional specialization of immune responses in different segments of the gastrointestinal tract2827_2LINE/L1/L1ME4a
25TCF7NM_003202T cell specific transcription factor; regulates T cell development and peripheral T cell differentiation4139_4LINE/CR1/L3
25CCL5NM_002985T cell specific chemokine (C-C motif) ligand; chemoattractant for blood monocytes, memory T helper cells and eosinophils; causes release of histamine from basophils and activates eosinophils;11206_2LINE/L1/L1MB3
25LATNM_001014987linker for activation of T cells; phosphorylated following activation of the T-cell antigen receptor signal transduction pathway; acts as a docking site and recruits multiple adaptor proteins and downstream signaling molecules into multimolecular signaling complexes10425_2SINE/Alu/AluJb SINE/MIR/MIR

1Gene coexpression cluster from which the gene is taken.

2Official gene name symbol from the HUGO Gene nomenclature committee

3Representative mRNA Genbank accession taken from USC genome browser knownGene table.

4Brief description of the gene function

5NHGRI DNase-I hypersensitive site identifier taken from the UCSC genome browser nhgriDnaseHs table.

6Class/family/name of the transposable element (TE) sequence colocalized with the HS site

Immune response (GO:0006955) genes with TE-derived HS sites 1Gene coexpression cluster from which the gene is taken. 2Official gene name symbol from the HUGO Gene nomenclature committee 3Representative mRNA Genbank accession taken from USC genome browser knownGene table. 4Brief description of the gene function 5NHGRI DNase-I hypersensitive site identifier taken from the UCSC genome browser nhgriDnaseHs table. 6Class/family/name of the transposable element (TE) sequence colocalized with the HS site

Comparative genomics

Gene expression and function analysis point to the importance of genes with TE-derived HS sites. However, these same TE-derived HS sites are not evolutionarily conserved. This suggests that TE-derived HS sites may be important in generating functional differences between evolutionary lineages. Comparative analyses of gene sequence and expression divergence between human and mouse orthologs were performed to evaluate this possibility. Genes with HS sites were divided into those with TE-derived sites and those with non TE-derived sites. These two gene sets were mapped to 9,105 pairs of human-mouse orthologous gene pairs described previously [30], each member of which has GNF2 expression data. Proteins encoded by genes with TE-derived HS sites have slightly higher levels of sequence divergence (0.147 substitutions per site) compared to those encoded by genes with non TE-derived HS sites (0.138 substitutions per site). The difference between these average substitution rates is only marginally significant (Student's t-test, P = 0.12, Mann-Whitney U test, P = 0.08). Genes with TE-derived HS sites also have slightly greater evolutionary differences in CD4+ T cell expression (1.005) than those with non TE-derived HS sites (0.948) as measured by comparison between human and mouse orthologs. The difference in CD4+ T cell expression for TE-derived versus non TE-derived HS site genes is only marginally significant as well (Student's t-test, P = 0.08, Mann-Whitney U test, P = 0.11). However, taken together, the differences in evolutionary divergence at the sequence and expression level are consistent with the idea that TE-derived HS sites help to drive evolutionary changes between lineages. The magnitude of this effect is fairly small though, just under 10% difference for both sequence and expression, contributing to the marginal significance in each case and indicating that many other factors are in play with respect to the evolutionary divergence of these genes and phenotypes.

Conclusion

TEs contribute numerous HS sites to the human genome. While TE-derived HS sites are not evolutionarily conserved, they are functionally relevant, as demonstrated by analyses of gene expression and functional annotations. This distinction between conservation and function can be taken to suggest that TEs provide a specific mechanism for driving regulatory differences between evolutionary lineages, and comparative genomics data bear this notion out to some extent. Genes with TE-derived HS sites are slightly more divergent than those non TE-derived HS sites in terms of both sequence and CD4+ T cell expression. The results reported here point to genome-scale effects that TEs have had in shaping the regulatory evolution of their host genomes.

Methods

Human genome sequence

The May 2004 release – National Center for Biotechnology Information (NCBI) build 35 – of the human genome reference sequence [1] was analyzed using the UCSC genome browser [31]. The UCSC database containing this particular release and all associated data is referred to as hg17. The chromosome coordinates of various attributes mapped onto the hg17 genome sequence were downloaded using the UCSC Table Browser retrieval tool [32]. The Table Browser retrieval tool was also used to perform a number of logical set operations between specific tables (see below) that allowed for the identification of co-located genomic attributes.

Hypersensitive sites

DNaseI-hypersensitive sites (HS) from CD4+ T cells were characterized as described [17-19]. The HS sites have been mapped onto the hg17 sequence and their genome coordinates were retrieved from the table named nhgriDnaseHs. Only clusters of more than on HS site that map within 500 bp of each other are mapped onto hg17.

Transposable elements

The locations and identities of all hg17 TE sequences were characterized using the RepeatMasker program [26], which uses the RepBase library [27] of repeat sequences. The genome coordinates, along with the class, family and name designations, for TEs were retrieved from the table named rmsk.

Gene expression and function

Human and mouse microarray gene expression data are from the Genomics Institute of the Novartis Research Foundation SymAtlas (GNF2) [28]. These data were retrieved from the table named gnfAtlas2. This table stores relative expression values for 79 different human tissues and/or cell types (conditions). Relative expression levels are computed as follows: Two replicate microarray experiments were performed for each condition, and for each individual probe, the expression signal intensity values were averaged for each of the 79 pairs of experiments. Then, for each probe, each of the 79 condition-specific averages was normalized by the median of all values for that probe to determine relative expression levels; these ratios were log2 normalized prior to analysis. Affymetrix probe identifiers are mapped to UCSC Genome Browser known genes and these data were retrieved from the table named knownToGnfAtlas2. The known genes are based mRNA data from the NCBI Reference Sequence (RefSeq) database and GenBank along with protein data from UniProt. The chromosome coordinates for the known genes were retrieved from the table named knownGene. After probe-to-gene mapping, relative CD4+ T cell expression levels were compared for different sets of human genes. Gene expression profiles were visualized and clustered, by k-means clustering, using the program Genesis [29]. The Pearson correlation coefficient was used to compute pairwise similarities between gene expression profiles. Individual clusters with high relative levels of CD4+ T cell expression, as well as high expression in related tissue-types, were chosen for further functional analysis. Clusters (i.e. groups of genes) were evaluated with respect to overrepresented Gene Ontology (GO) [33] functional annotation terms using the program GOstat [34]. The biological process subset of the GO hierarchy was used along with the European Bioinformatics Institute (EBI) human GO mapping. A 2 × 2 contingency table was used to compare the relative frequency of GO terms in the coexpression cluster test set (observed) versus the relative frequency of GO terms in the background set of all human GO terms (expected) using the χ2 test, or the Fisher's exact test is the expected value<5. The Benjamini False Discovery Rate correction for multiple testing was used to adjust the resulting P-values. GO annotation terms that were found to be overrepresented in two or more different clusters were chosen for further analysis. Individual gene functions were explored using the NCBI Entrez Gene database . The graphical (parent-child) relationships among GO terms related to immune response were characterized using the GeneInfoViz program [35]. GO term significance levels were color coded using the program matrix2png [36]. Conserved hg17 genomic sites were characterized using the phastCons program [23]. Alignment, using blastz [25] and multiz [24], and comparison among 17 vertebrate genome sequences were used to characterize conserved sites. The genome coordinates for conserved sites were retrieved from the table named phastConsElements17way. Absolute differences between the relative levels of CD4+ T cell expression were calculated for human and mouse orthologous genes pairs described previously [30]. The evolutionary divergence between human and mouse orthologous proteins was measured as the number of substitutions per site using the Poisson Correction distance [37].

Reviewers' comments

Reviewer's report 1

Wolfgang J. Miller, Laboratories of Genome Dynamics, Center of Anatomy and Cell Biology, Medical University of Vienna, Austria (nominated by Jerzy Jurka, Genetic Information Research Institute, Sunnyvale, CA, USA) Reviewer comments: This paper provides compelling evidence that TEs did contribute significantly to the evolution of novel regulatory sections in humans and other organisms by using an elegant and innovative combination between genomics with microarray gene expression data analyses. The authors show that up to one-fourth of the human DNAse I hypersensitive sites actually stem from TEs that serve important immune response functions in especially rapidly evolving genes. Due to the extreme lineage-specific expansion/silencing dynamics of TEs in different host systems absence of evolutionary conservation of TE-derived HS sites as reported here is not surprising. Therefore these data are not contradicting their functional relevance as important cis-regulatory sections but demonstrate that mobile DNAs in general do provide a highly attractive repertoire of structural and functional information patterns to the host. Even after their successful inactivation via host-directed silencing mechanisms such TE-derived cis-regulatory sections can if proven successful be adopted by the host genome for serving novel and innovative regulatory functions. It would be highly interesting to perform comparative genomic analyses between human and chimpanzee orthologous TE-derived HS sites in the near future.

Author's response

We would like to thank Dr. Miller for taking the time and effort to review our manuscript. Dr. Miller suggests comparative genomic analyses between human and chimpanzee orthologous TE-derived HS sites. This is a good idea and may help to settle an issue, raised by Dr. Itai Yanai (Reviewer #2 below), concerning the role of TE-sequences as space holders versus the actual contribution of TE sequences to human gene regulation.

Reviewer's report 2

Itai Yanai, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA Reviewer Comments: Marino-Ramirez and Jordan show in this paper that HS sites are significantly more conserved in sequence than non-HS sites although HS sites containing TE-derived genes are far less conserved that HS sites lacking TE's. In terms of gene expression, the authors show that TE-derived genes and non-TE derived genes in HS sites are as likely to be expressed in CD4+ T cells. Taken together, these results lead to the conclusion that TE's are useful in promoting gene expression evolution. This is an interesting notion. I imagine that TE's insertion may be instrumental in modulating expression patterns by altering the spacing among transcription binding sites as well as disrupting some motifs through their insertion. Thus their effects on gene expression may be conferred solely by their role as space holders consequently freeing the actual TE sequence to drift. Indeed the authors also find that TE-derived genes evolve slightly faster in terms of sequence and expression. However the signal is so weak that it places into question the generality of this finding. One straightforward interpretation is that since in all likelihood most TE's that happen to lie at HS sites do not contribute to the evolution of gene expression, these dilute the signal to its observed weak level. Overall, these findings are important and should further prompt research to attempt to distinguish those TE's which contribute to genomic function from those that do not. We would like to thank Dr. Yanai for taking the time and effort to review our manuscript. Dr. Yanai proposes the interpretation that "most TE's that happen to lie at HS sites [that] do not contribute to the evolution of gene expression." Indeed, the evolutionary divergence in CD4+ T cell expression levels for genes containing TE-derived HS sites is only marginally greater than that seen for genes with non TE-derived HS sites, and we point this out in the text of the manuscript. However, the functional relevance of TE-derived HS sites is strongly supported by their association with genes that have relatively (significantly) higher levels of CD4+ T cell expression (Figure 4).In addition, genes with TE-derived HS sites have slightly higher sequence substitution rates, on average, than genes with non TE-derived HS sites. Taken together, these lines of evidence point to a potential role for TE-derived HS regulatory sites in facilitating the evolutionary divergence of human genes. To more carefully examine this issue, we plan to investigate regulatory changes of genes with TE-derived HS sites across species at different levels of evolutionary divergence from the human lineage. Dr. Yanai also raises an interesting point about the role that TE sequences may play as 'space holders' as opposed to contributing specific regulatory sequences. Spatial changes among promoter elements caused by TE insertions may certainly have important functional consequences. On the other hand, there are a number of known, experimentally verified, cases of TE sequences providing specific cis-regulatory binding sequences. We are currently investigating the relative rates of evolution for TE sequences that co-locate with regulatory regions versus those that do not, among species that cover a range of evolutionary divergence from human, to further investigate this possibility.

Reviewer's report 3

Mikhail S.Gelfand, Department of Bioinformatics, Institute of Information Transfer Problems, Russian Academy of Science, Moscow, Russia Reviewer Comments: The paper reports analysis of hypersensitive sites (HSs), comparing HSs containing transposable elements (TEs) and HSs in general. Given that HSs are likely to correspond to regulatory regions and tend to be conserved, the large fraction of TE-containing HSs is surprising. Still, even the latter are shown to be functionally relevant. This leads to an important conclusion about the role of TEs in the evolution of regulation. A main problem of this study is the deficit of controls. Basically, only two samples are analyzed, TE-HS and nonTE-HS. For instance: 23% HSs contain TEs and 11% HS positions are covered by TEs (page 7) – is it a lot or not? What would be expected if there were no correlation? (given that >50% of the human genome is TE-derived, I suspect that, not surprisingly, HS tend to avoid HS – but what about other classes of functional regions?) There is no difference in gene expression between TE-HS and nonTE-HS sites (page 9) – this is nice, but, again, what about other classes of genes, e.g. dependent on the degree of conservation in their upstream regions. Section "Gene function": what genes were subject to clustering in order to find over-represented GO annotations – all HS genes? TE-HS and nonTE-HS genes separately? Further, in the last paragraph of this section, it should be explicitly mentioned that it describes just an example, not an exhaustive list. Section "Comparative genomics": missing controls are genes without HS in the same GO categories and just genes without HS: without it, it is difficult to appreciate the difference between the TE-HS and nonTE-HS genes. On the other hand, the results of this paragraph are quite interesting: assuming that TE-HS genes recently experienced a change in regulation (resulting in change in expression), one could expect positive selection towards a new role. It might be interesting to apply the McDonald-Kreitman test to check this hypothesis. But again, more controls with other classes of genes are needed. Overall, I think that, although the obtained results are interesting, they are somewhat preliminary. To make the emerging picture much less shallow and enhance the authors' main point about the contribution of TEs to the evolution of regulation in the human genome, they should consider, where appropriate, the following control groups of genes: genes without HS, genes with non-conserved (but not TE-derived) HS, non-HS genes in the same GO categories as identified in the "Gene function" section. We would like to thank Dr. Gelfand for taking the time and effort to review our manuscript. Dr. Gelfand calls attention to a 'deficit of controls', in several places, pointing out that only two samples are analyzed: TE-HS versus non TE-HS containing genes. Importantly, we have analyzed a third class of genes, namely those that do not have any co-located HS sites characterized in CD4+ T cells (non HS). This latter class of genes has significantly lower levels of CD4+ T cell expression than either class of HS site containing genes (TE-derived or non TE-derived; Figure 4).This underscores the functional relevance of both classes of HS sites characterized in CD4+ T cells, TE-derived and non TE-derived, with respect to CD4+ T cell specific expression. Additional analysis of this third class of genes yields slightly more ambiguous results related to the evolutionary divergence of human gene expression patterns. Non HS containing genes have average levels of CD4+ T cell expression divergence that are intermediate to those seen for TE-HS and non TE-HS containing genes. This is in keeping with the relatively weak signal seen for the differences in the average evolutionary divergence of CD4+ T cell expression levels (as well as gene sequence divergence) across different classes of genes, and related issues were raised by the other reviewers. We have been careful to point out this caveat in the manuscript and plan to perform additional evolutionary comparisons (as described in the answers to Reviewer 1 & 2 above), for instance on more closely related species, which may help to resolve the issue of the contribution of TE-derived HS sites to host gene regulatory divergence. The point raised about 11% of HS sites being covered by TEs is also germane. This fraction is indeed less than you would expect by chance alone given that the genome consists of ~50% TE-derived sites. Clearly, not all TE-derived sites in the human are functionally relevant in terms of expression and many accumulate simply by virtue of the selfish replicative properties of the elements, i.e. without regard to any adaptive benefit they provide to the host genome. In fact most TE insertions into regulatory regions are probably deleterious, and this is consistent with previous results that have shown exclusion of TE sequences from proximal promoter regions. Nevertheless, the fact that a substantial fraction of human gene regulatory sites is derived from TE-sequences underscores the potential for such elements to be co-opted, from time-to-time, to serve some role for the host genome in which they reside. In the Gene function section, we have clarified that genes with TE-derived HS sites were clustered by k-means analysis of their tissue-specific expression patterns. Clusters with pronounced CD4+ T cell expression levels (e.g. Figure 5)were then selected for functional analysis with GO. In addition, as per Dr. Gelfand's suggestion, we explicitly point out that we describe an example, not an exhaustive list, in the last paragraph of this section (i.e. the data in Table 2).

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

LMR and IKJ performed all analyses. IKJ drafted the manuscript. All authors read and approved the final manuscript.
  35 in total

Review 1.  Perspective: transposable elements, parasitic DNA, and genome evolution.

Authors:  M G Kidwell; D R Lisch
Journal:  Evolution       Date:  2001-01       Impact factor: 3.694

2.  Initial sequencing and analysis of the human genome.

Authors:  E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal:  Nature       Date:  2001-02-15       Impact factor: 49.962

3.  Genesis: cluster analysis of microarray data.

Authors:  Alexander Sturn; John Quackenbush; Zlatko Trajanoski
Journal:  Bioinformatics       Date:  2002-01       Impact factor: 6.937

4.  Analysis of primate genomic variation reveals a repeat-driven expansion of the human genome.

Authors:  Ge Liu; Shaying Zhao; Jeffrey A Bailey; S Cenk Sahinalp; Can Alkan; Eray Tuzun; Eric D Green; Evan E Eichler
Journal:  Genome Res       Date:  2003-03       Impact factor: 9.043

5.  Initial sequencing and comparative analysis of the mouse genome.

Authors:  Robert H Waterston; Kerstin Lindblad-Toh; Ewan Birney; Jane Rogers; Josep F Abril; Pankaj Agarwal; Richa Agarwala; Rachel Ainscough; Marina Alexandersson; Peter An; Stylianos E Antonarakis; John Attwood; Robert Baertsch; Jonathon Bailey; Karen Barlow; Stephan Beck; Eric Berry; Bruce Birren; Toby Bloom; Peer Bork; Marc Botcherby; Nicolas Bray; Michael R Brent; Daniel G Brown; Stephen D Brown; Carol Bult; John Burton; Jonathan Butler; Robert D Campbell; Piero Carninci; Simon Cawley; Francesca Chiaromonte; Asif T Chinwalla; Deanna M Church; Michele Clamp; Christopher Clee; Francis S Collins; Lisa L Cook; Richard R Copley; Alan Coulson; Olivier Couronne; James Cuff; Val Curwen; Tim Cutts; Mark Daly; Robert David; Joy Davies; Kimberly D Delehaunty; Justin Deri; Emmanouil T Dermitzakis; Colin Dewey; Nicholas J Dickens; Mark Diekhans; Sheila Dodge; Inna Dubchak; Diane M Dunn; Sean R Eddy; Laura Elnitski; Richard D Emes; Pallavi Eswara; Eduardo Eyras; Adam Felsenfeld; Ginger A Fewell; Paul Flicek; Karen Foley; Wayne N Frankel; Lucinda A Fulton; Robert S Fulton; Terrence S Furey; Diane Gage; Richard A Gibbs; Gustavo Glusman; Sante Gnerre; Nick Goldman; Leo Goodstadt; Darren Grafham; Tina A Graves; Eric D Green; Simon Gregory; Roderic Guigó; Mark Guyer; Ross C Hardison; David Haussler; Yoshihide Hayashizaki; LaDeana W Hillier; Angela Hinrichs; Wratko Hlavina; Timothy Holzer; Fan Hsu; Axin Hua; Tim Hubbard; Adrienne Hunt; Ian Jackson; David B Jaffe; L Steven Johnson; Matthew Jones; Thomas A Jones; Ann Joy; Michael Kamal; Elinor K Karlsson; Donna Karolchik; Arkadiusz Kasprzyk; Jun Kawai; Evan Keibler; Cristyn Kells; W James Kent; Andrew Kirby; Diana L Kolbe; Ian Korf; Raju S Kucherlapati; Edward J Kulbokas; David Kulp; Tom Landers; J P Leger; Steven Leonard; Ivica Letunic; Rosie Levine; Jia Li; Ming Li; Christine Lloyd; Susan Lucas; Bin Ma; Donna R Maglott; Elaine R Mardis; Lucy Matthews; Evan Mauceli; John H Mayer; Megan McCarthy; W Richard McCombie; Stuart McLaren; Kirsten McLay; John D McPherson; Jim Meldrim; Beverley Meredith; Jill P Mesirov; Webb Miller; Tracie L Miner; Emmanuel Mongin; Kate T Montgomery; Michael Morgan; Richard Mott; James C Mullikin; Donna M Muzny; William E Nash; Joanne O Nelson; Michael N Nhan; Robert Nicol; Zemin Ning; Chad Nusbaum; Michael J O'Connor; Yasushi Okazaki; Karen Oliver; Emma Overton-Larty; Lior Pachter; Genís Parra; Kymberlie H Pepin; Jane Peterson; Pavel Pevzner; Robert Plumb; Craig S Pohl; Alex Poliakov; Tracy C Ponce; Chris P Ponting; Simon Potter; Michael Quail; Alexandre Reymond; Bruce A Roe; Krishna M Roskin; Edward M Rubin; Alistair G Rust; Ralph Santos; Victor Sapojnikov; Brian Schultz; Jörg Schultz; Matthias S Schwartz; Scott Schwartz; Carol Scott; Steven Seaman; Steve Searle; Ted Sharpe; Andrew Sheridan; Ratna Shownkeen; Sarah Sims; Jonathan B Singer; Guy Slater; Arian Smit; Douglas R Smith; Brian Spencer; Arne Stabenau; Nicole Stange-Thomann; Charles Sugnet; Mikita Suyama; Glenn Tesler; Johanna Thompson; David Torrents; Evanne Trevaskis; John Tromp; Catherine Ucla; Abel Ureta-Vidal; Jade P Vinson; Andrew C Von Niederhausern; Claire M Wade; Melanie Wall; Ryan J Weber; Robert B Weiss; Michael C Wendl; Anthony P West; Kris Wetterstrand; Raymond Wheeler; Simon Whelan; Jamey Wierzbowski; David Willey; Sophie Williams; Richard K Wilson; Eitan Winter; Kim C Worley; Dudley Wyman; Shan Yang; Shiaw-Pyng Yang; Evgeny M Zdobnov; Michael C Zody; Eric S Lander
Journal:  Nature       Date:  2002-12-05       Impact factor: 49.962

6.  Matrix2png: a utility for visualizing matrix data.

Authors:  Paul Pavlidis; William Stafford Noble
Journal:  Bioinformatics       Date:  2003-01-22       Impact factor: 6.937

7.  The UCSC Genome Browser Database.

Authors:  D Karolchik; R Baertsch; M Diekhans; T S Furey; A Hinrichs; Y T Lu; K M Roskin; M Schwartz; C W Sugnet; D J Thomas; R J Weber; D Haussler; W J Kent
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

8.  Quantification of DNaseI-sensitivity by real-time PCR: quantitative analysis of DNaseI-hypersensitivity of the mouse beta-globin LCR.

Authors:  M McArthur; S Gerum; G Stamatoyannopoulos
Journal:  J Mol Biol       Date:  2001-10-12       Impact factor: 5.469

Review 9.  Origin of a substantial fraction of human regulatory sequences from transposable elements.

Authors:  I King Jordan; Igor B Rogozin; Galina V Glazko; Eugene V Koonin
Journal:  Trends Genet       Date:  2003-02       Impact factor: 11.639

10.  Human-mouse alignments with BLASTZ.

Authors:  Scott Schwartz; W James Kent; Arian Smit; Zheng Zhang; Robert Baertsch; Ross C Hardison; David Haussler; Webb Miller
Journal:  Genome Res       Date:  2003-01       Impact factor: 9.043

View more
  21 in total

1.  Origin and evolution of human microRNAs from transposable elements.

Authors:  Jittima Piriyapongsa; Leonardo Mariño-Ramírez; I King Jordan
Journal:  Genetics       Date:  2007-04-15       Impact factor: 4.562

Review 2.  Transposable elements and the evolution of regulatory networks.

Authors:  Cédric Feschotte
Journal:  Nat Rev Genet       Date:  2008-05       Impact factor: 53.242

3.  An atlas of transposable element-derived alternative splicing in cancer.

Authors:  Evan A Clayton; Lavanya Rishishwar; Tzu-Chuan Huang; Saurabh Gulati; Dongjo Ban; John F McDonald; I King Jordan
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2020-02-10       Impact factor: 6.237

4.  Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps.

Authors:  John L Spouge; Leonardo Mariño-Ramírez; Sergey L Sheetlin
Journal:  Int J Bioinform Res Appl       Date:  2014

Review 5.  CTRL+INSERT: retrotransposons and their contribution to regulation and innovation of the transcriptome.

Authors:  Jonathan Göke; Huck Hui Ng
Journal:  EMBO Rep       Date:  2016-07-11       Impact factor: 8.807

Review 6.  Living Organisms Author Their Read-Write Genomes in Evolution.

Authors:  James A Shapiro
Journal:  Biology (Basel)       Date:  2017-12-06

7.  Transcriptional rewiring of the sex determining dmrt1 gene duplicate by transposable elements.

Authors:  Amaury Herpin; Ingo Braasch; Michael Kraeussling; Cornelia Schmidt; Eva C Thoma; Shuhei Nakamura; Minoru Tanaka; Manfred Schartl
Journal:  PLoS Genet       Date:  2010-02-12       Impact factor: 5.917

8.  A c-Myc regulatory subnetwork from human transposable element sequences.

Authors:  Jianrong Wang; Nathan J Bowen; Leonardo Mariño-Ramírez; I King Jordan
Journal:  Mol Biosyst       Date:  2009-07-21

9.  MIR retrotransposon sequences provide insulators to the human genome.

Authors:  Jianrong Wang; Cristina Vicente-García; Davide Seruggia; Eduardo Moltó; Ana Fernandez-Miñán; Ana Neto; Elbert Lee; José Luis Gómez-Skarmeta; Lluís Montoliu; Victoria V Lunyak; I King Jordan
Journal:  Proc Natl Acad Sci U S A       Date:  2015-07-27       Impact factor: 11.205

10.  The majority of primate-specific regulatory sequences are derived from transposable elements.

Authors:  Pierre-Étienne Jacques; Justin Jeyakani; Guillaume Bourque
Journal:  PLoS Genet       Date:  2013-05-09       Impact factor: 5.917

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.