Human and mouse introns are linked to the same processes and functions through each genome's most frequent non-conserved motifs.

Aristotelis Tsirigos, Isidore Rigoutsos.

Abstract

We identified the most frequent, variable-length DNA sequence motifs in the human and mouse genomes and sub-selected those with multiple recurrences in the intergenic and intronic regions and at least one additional exonic instance in the corresponding genome. We discovered that these motifs have virtually no overlap with intronic sequences that are conserved between human and mouse, and thus are genome-specific. Moreover, we found that these motifs span a substantial fraction of previously uncharacterized human and mouse intronic space. Surprisingly, we found that these genome-specific motifs are over-represented in the introns of genes belonging to the same biological processes and molecular functions in both the human and mouse genomes even though the underlying sequences are not conserved between the two genomes. In fact, the processes and functions that are linked to these genome-specific sequence-motifs are distinct from the processes and functions which are associated with intronic regions that are conserved between human and mouse. The findings show that intronic regions from different genomes are linked to the same processes and functions in the absence of underlying sequence conservation. We highlight the ramifications of this observation with a concrete example that involves the microsatellite instability gene MLH1.

Year: 2008 PMID: 18450818 PMCID: PMC2425492 DOI: 10.1093/nar/gkn155

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Most of the searches for regulatory sequences have operated on the premise that functional motifs ought to be conserved across orthologous sequences (1–7). This cross-species conservation constraint has frequently proven to be a sufficient condition for the identification of regulatory regions. However, there is increasing evidence that such a prerequisite may not be necessary (8–12). Of late, the search for putative regulatory sequences has extended to introns: even though they were predicted to harbor regulatory signals (13,14), some of which were subsequently discovered (2,11,15–25), the true significance of introns remains elusive. The more recent interest in introns has been fueled in part by the discovery within them of microRNA precursors that may not always engage the canonical microRNA pathway (23,26).

Parallel work has begun revealing a complex picture of the organization and functional richness of genomes. Arguably, the ENCODE project provided the latest major discoveries along those lines. In particular, it was found that the human genome is massively transcribed in a complex manner (27–29). Following this and related work, it is now clear that eukaryotic genomes must contain more functional elements than previously estimated. Human genome regions can be classified into three broad categories with respect to the extent of their evolutionary conservation and their coding potential: (a) sequences that are under strong evolutionary constraints and represent ∼5% of the human genome (18,22); (b) conserved non-exonic sequences that are more frequent than expected (30) but do not necessarily comprise functional elements (31); and (c) non-conserved, non-exonic sequences, a category with an unexpectedly high number of functional elements (29). Adding to this complex landscape, several repeat element fragments appear to undergo strong purifying selection and to be exapted into functional elements (1,20,32–35). Moreover, conserved non-exonic regions with repetitive origins appear near developmental genes, suggesting that mobile elements may play a role in gene regulation (32), whereas a special class of fairly long stretches of DNA, termed ultraconserved elements, has been found to be exceptionally well conserved across several genomes (2).

In this discussion, we present results from our exploration of intronic space in the human and mouse genomes. We analyzed human and mouse intronic sequences using ‘pyknons’ as our tool; pyknons are previously described sets of very frequent genome-specific DNA sequence motifs that were shown to have a number of interesting and functionally meaningful properties (11,36). The analysis that follows shows that pyknons span a substantial fraction of previously uncharacterized intronic space. Additionally, as a set, pyknons are distinct from repeat elements. The notable finding of the analysis presented below is that in both the human and mouse genomes these very frequent motifs are over-represented in the introns of genes belonging to the same set of biological processes and molecular functions, even though the underlying sequences are not conserved. Moreover, the intronic instances of these motifs are linked to processes and functions that do not overlap with the processes and functions which are linked to intronic regions that are conserved between human and mouse. Finally, we show that a subset of the pyknons co-localizes extensively with human and mouse piRNAs (37,38) inside human and mouse intronic sequences, respectively. The presentation concludes with a discussion of the potential relevance of these findings in the disease context by analyzing the introns of the microsatellite instability gene MLH1. Our results suggest that extended regions of human and mouse introns are involved in conserved functional links that do not depend on underlying sequence conservation.

METHODS

Data sources

We obtained human and mouse chromosomal sequences and genomic region coordinates for transcripts, exons, 5′UTRs, CDSs and 3′UTRs as well as GO annotations from ENSEMBL release 42. Human/mouse pairwise alignments and repeat regions corresponding to the same genome assembly version (NCBI36) were obtained with the help of the UCSC Genome Browser. The human and mouse piRNA sequences were obtained from the supplementary material of previous work (37,38).

Computing the pyknon sets

Pyknons were recomputed to reflect changes from ENSEMBL Release 39 to Release 42 (39). For this, we used the parallel version of a pattern discovery algorithm that we developed earlier (40). The input comprised the intergenic and intronic sequences of the human and mouse genomes respectively but excluded intergenic and intronic segments that were the reverse complement of the 5′ untranslated, amino acid coding or 3′ untranslated regions of some human gene; more details can be found elsewhere (11). This exclusion ensures that any discovered patterns are not connected to the sequences of known genes, protein motifs or domains, or to the reverse complement of such sequences. The pattern discovery algorithm that we used for this analysis requires the setting of three parameters: L, W and K. The parameter L controls the minimum possible size of the discovered patterns but has no bearing on the patterns’ maximum length; the latter is not constrained in any way. The parameter W satisfies the inequality W ≥ L and controls the ‘degree of conservation’ across the various instances of the reported patterns: smaller (resp. larger) values of W will tolerate fewer (resp. more) mismatches across the instances. Since we are interested in only patterns with identically conserved instances, we set W = L (i.e. the discovered patterns contained no ‘wildcards’). The parameter K controls the minimum required number of appearances before a pattern can be reported by the algorithm. For a given choice of L, W and K the algorithm guarantees the reporting of all patterns that have K or more appearances in the processed input and are such that any L consecutive (but not necessarily contiguous) positions span at most W positions. Human and mouse pyknons were computed using L = 16, W = 16 and K = 30. These values of L and K ensure statistical significance (11).
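The W = L setting can be illustrated with a minimal sketch, under stated assumptions. With no wildcards, the counting step reduces to tallying exact L-mers and keeping those with at least K appearances. This is not the pattern discovery algorithm of ref. (40): it does not extend patterns beyond length L, and the function names `frequent_kmers` and `has_exonic_instance` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter

_DNA = frozenset("ACGT")

def frequent_kmers(sequences, L=16, K=30):
    """Return every exact L-mer with at least K (possibly overlapping)
    occurrences across the input sequences, mapped to its count."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - L + 1):
            kmer = seq[i:i + L]
            # Skip windows that contain N or other ambiguity codes.
            if _DNA.issuperset(kmer):
                counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n >= K}

def has_exonic_instance(kmer, exonic_sequences):
    """True if the motif appears intact in at least one exonic sequence
    (illustrative stand-in for the additional exonic-copy requirement)."""
    return any(kmer in exon.upper() for exon in exonic_sequences)

if __name__ == "__main__":
    # Toy input with small L and K so that the result is easy to inspect.
    toy = ["ACGTACGTAC", "TTACGTTT"]
    print(frequent_kmers(toy, L=4, K=3))
```

For genome-scale input a plain dictionary of 16-mers would need far more memory than this sketch suggests; the point here is only the meaning of the L and K thresholds.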

Computing region overlaps

For any given pair of regions (for example pyknon and repeat regions), we computed their overlap by counting the number of positions in the genome belonging to both sets. All sequences were compared with one another in their 5′ → 3′ direction. As a preprocessing step, we converted each set of regions into a non-redundant set of non-overlapping sequences to avoid double-counting (e.g. different transcripts of the same gene, or genes that overlap). The probability of achieving a given overlap at random given the frequencies of the two sets was computed using the hypergeometric distribution.
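The overlap computation described above can be sketched as follows. This is an illustrative outline rather than the code used in the study: intervals are assumed to be half-open (start, end) pairs on a single strand of one chromosome, and the hypergeometric model treats genomic positions as independent draws, which is a simplification for contiguous regions. All function names are hypothetical.

```python
import math

def merge_intervals(intervals):
    """Collapse half-open (start, end) intervals into a sorted,
    non-overlapping list so that no position is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]

def overlap_size(a, b):
    """Number of positions that belong to both interval sets."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def expected_overlap(N, n_a, n_b):
    """Mean of the hypergeometric distribution: overlap expected by chance."""
    return n_a * n_b / N

def _log_comb(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def hypergeom_pmf(k, N, n_a, n_b):
    """P(overlap == k) when n_a and n_b of N positions are marked at random."""
    if k < max(0, n_a + n_b - N) or k > min(n_a, n_b):
        return 0.0
    return math.exp(_log_comb(n_a, k) + _log_comb(N - n_a, n_b - k)
                    - _log_comb(N, n_b))

def hypergeom_lower_tail(k, N, n_a, n_b):
    """P(overlap <= k): probability of an overlap this small or smaller."""
    return sum(hypergeom_pmf(x, N, n_a, n_b) for x in range(k + 1))
```

The lower tail is shown because the question of interest in the Results is whether an observed overlap is smaller than expected; the upper tail would be used to test for enrichment.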

Analyzing gene GO terms

For each gene, we computed: (i) its intronic sequence, i.e. the union of the introns of all of its transcripts; (ii) its associated GO term set, i.e. the union of the GO term sets associated with its transcripts; and (iii) the concentration of pyknons, conserved elements and repeats in the gene's intronic sequence. Concentration is defined as the fraction of the total intronic sequence of a given gene that is covered by a given set of elements (conserved regions, repeats, pyknons, or combinations thereof), i.e. the number of nucleotides of each type of region that lie inside the gene's intronic sequence divided by the size of the intronic sequence. We tested each GO term x separately for enrichment by comparing two distributions of concentrations: the distribution for the genes that have term x on their list versus the distribution for the genes that do not. First, we compared the two distributions with a t-test statistic, using Student's t-distribution as an approximation; this analysis yielded the initial P-values. In addition, we made use of random permutations and found that the results they generated agree with those that we obtained from the t-test analysis. Using global random permutations, we subsequently determined the appropriate P-value cutoff to ensure a 5% false discovery rate. Finally, a stability test confirmed that the low P-values were not due to the presence of a few extreme values in the distribution of concentrations.
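A minimal sketch of the per-term comparison, under stated assumptions: the t statistic is written in its unequal-variance (Welch) form, which is one possible reading of the test described above, and the permutation test is one-sided because the question is enrichment. The function names, the number of permutations and the seed are hypothetical choices, not values from the study.

```python
import random
from statistics import mean, variance

def concentration(covered_nt, intron_len):
    """Fraction of a gene's intronic sequence covered by a set of elements."""
    return covered_nt / intron_len

def welch_t(x, y):
    """t statistic comparing the mean concentration of genes with a GO term (x)
    against genes without it (y); unequal variances are assumed."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se

def permutation_pvalue(x, y, n_perm=10000, seed=0):
    """One-sided permutation P-value: how often a random relabeling of genes
    yields a mean difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(x) - mean(y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(x)]) - mean(pooled[len(x):])
        if diff >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    with_term = [0.90, 0.80, 0.95, 0.85]          # toy concentrations
    without_term = [0.10, 0.20, 0.15, 0.05, 0.12, 0.18]
    print(welch_t(with_term, without_term))
    print(permutation_pvalue(with_term, without_term, n_perm=2000))
```

Because concentration is normalized by intron length, a long gene does not score higher merely by being long, which is the reason the analysis uses it in place of raw coverage.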

Locating piRNA instances in introns

To locate piRNA instances or their reverse complements in intronic regions, we slid a window of size equal to each piRNA's length along these regions. For each placement of the window, the sequence of the piRNA (or its reverse complement) was compared to the underlying sequence substring and a similarity was computed as the fraction of matching nucleotides. As pointed out earlier (see Supplementary Data 1 of (38)), the pyrosequencing-derived piRNA sequences that were kept and reported contained at most two defects (≤∼5% error or ≥∼95% similarity) along the length of a read. Consequently, in our searches of intronic sequences, we permitted at most two nucleotide mismatches along the length of a piRNA. As a control, we generated a shuffled version of the intronic regions and sought piRNA instances therein using exactly the same criteria. The number of instances found in the shuffled intronic sequences provided an estimate of the expected number of false positives: at the 95% similarity threshold that we used, the false positive error rate was less than 0.00005 for both human and mouse.
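The sliding-window search can be sketched as below. This is an illustrative brute-force version, not the implementation used in the study; the function names are hypothetical, and a single-nucleotide shuffle is assumed for the control because the text does not specify how the shuffling was performed.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G and T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_instances(query, target, max_mismatches=2):
    """Slide a window of len(query) along target and report (start, similarity)
    for every placement with at most max_mismatches mismatching positions.
    Similarity is the fraction of matching nucleotides."""
    query, target = query.upper(), target.upper()
    n = len(query)
    hits = []
    for start in range(len(target) - n + 1):
        mismatches = 0
        for q, t in zip(query, target[start:start + n]):
            if q != t:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many defects, abandon this placement
        else:
            hits.append((start, 1 - mismatches / n))
    return hits

def find_both_strands(query, target, max_mismatches=2):
    """Search for the query and for its reverse complement."""
    return {
        "sense": find_instances(query, target, max_mismatches),
        "antisense": find_instances(reverse_complement(query), target,
                                    max_mismatches),
    }

def shuffled(seq, seed=0):
    """Shuffled copy of a sequence, usable as a false-positive control."""
    chars = list(seq)
    random.Random(seed).shuffle(chars)
    return "".join(chars)

if __name__ == "__main__":
    print(find_instances("ACGT", "TTACGTTTACCT", max_mismatches=1))
```

Running the same search on `shuffled(intron)` and counting the hits gives a rough estimate of how many matches arise by chance at the chosen mismatch threshold.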

RESULTS

We processed the sequences of the human and mouse genomes using the previously outlined pyknon discovery methodology (see Methods section as well as ref. (11)) and generated the corresponding pyknon sets. By definition, each pyknon is a recurrent motif whose sequence has a minimum length, a minimum number of intact copies in the intergenic and intronic regions of the genome, and at least one additional copy in an exonic region. The choices for minimum length (≥16 nucleotides) and minimum copy number (≥30 intact copies) ensure the pyknons’ statistical significance (11). It should be stressed that pyknons are discovered by processing a genome in isolation: consequently, their sequences are not necessarily conserved in other genomes or present in cross-species aligned sequences (11). The human and mouse pyknon sets contain 209 432 and 128 064 members, respectively. These pyknons are predominantly short (∼16–17 nucleotides). Moreover, to the extent that it can be deduced using RNA folding programs, pyknons do not exhibit any characteristic secondary structure. The pyknons’ nucleotide composition is essentially identical to that of the entire genome (A = 30.4%, C = 20.0%, G = 20.6% and T = 28.8% for the pyknons versus A = ∼29.5%, C = ∼20.5%, G = ∼20.5% and T = ∼29.5% for the human genome). Finally, as reported earlier (11), a large fraction of pyknons (∼75%) have at least 100 exact intergenic and/or intronic copies.

Intronic instances of pyknons are distinct from human-mouse conserved regions and from known repeat elements

In order to simplify the presentation, we introduce and define what we will refer to as the ‘intra-genomic conservation’ model. This genome-centered ‘conservation’ manifests itself in the form of sequence fragments with multiple, intact instances in the genome under consideration. These sequence fragments are assumed to have a minimum length and a minimum number of copies. Figure 1 juxtaposes this ‘intra-genomic conservation’ model to the classical, ‘inter-genomic’ conservation model captured by cross-species alignments. In the latter model, an evolutionary relationship involving regions from at least two genomes is captured as a statistically significant sequence similarity that can be traced back to a presumed common evolutionary origin—shown as blue rectangles in Figure 1. The intra-genomic model generally involves much shorter regions (shown as green- or red-striped rectangles in Figure 1) that recur multiple times in a given genome but may not involve inter-genomically conserved sequences. Pyknons represent a special case of the intra-genomic model and involve the intergenic, intronic and exonic sequences of the same genome (11).

Figure 1.

We use a graphic involving two genomes (#1 and #2) to juxtapose the classical ‘inter-genomic’ or cross-species model of conservation with the ‘intra-genomic’ one that we introduced and defined in this presentation.

Do intra-genomically conserved pyknons arise in genomic neighborhoods that were not previously characterized? We addressed this by measuring (i) the extent of overlap between pyknon instances and cross-species conserved regions; and (ii) the extent of overlap between pyknon instances and repeat regions. Since it is generally not the case that the set of pyknons contains both a sequence and its reverse complement, we computed this overlap by comparing all sequences in the 5′ → 3′ direction. Figure 2 shows, in the form of a pie chart, the decomposition of human and mouse introns in terms of conserved regions, repeats and pyknon instances. In order to ensure that the percentages of all regions sum to 100%, we mark as conserved the regions that are exclusively conserved (i.e. both repeat-free and pyknon-free); we apply the same logic to the rest of the regions. The summarized findings of Figure 2 permit several observations regarding the intra-genomic conservation model. First, pyknons allow us to demarcate a significant fraction (7.4% in human and 4.4% in mouse) of the previously uncharacterized intronic space and to link it to one or more known exons (by virtue of the very definition of pyknons). Second, the pyknon instances have a strikingly low overlap with intronic regions that are conserved between human and mouse. The observed overlap of 0.7% in human and 0.4% in mouse is significantly lower than what would be expected by chance (5.6% in human and 3.5% in mouse); the associated P-value is practically zero (see Methods section). Finally, Figure 2 shows that, in intronic space, pyknons are distinct from known human and mouse repeats and only partially overlap with them: as a result, a notable 7.4% of human and 4.4% of mouse previously uncharacterized intronic space is covered by pyknons that are non-conserved and do not overlap repeats in their 5′ → 3′ direction.

Figure 2.

Composition of human (A) and mouse (B) introns and the intra-genomic conservation model. Here, we have labeled as ‘conserved’ the regions that are exclusively conserved (i.e. they are repeat-free and pyknon-free), as ‘repeats’ the regions that are exclusively instances of repeats (i.e. they do not overlap conserved regions or pyknons), and as ‘pyknons’ the regions that are exclusively instances of pyknons (i.e. they are repeat-free and do not overlap conserved regions). Similarly, we have labeled as ‘conserved repeats’ the regions that are known repeats and conserved between human and mouse (but pyknon-free), and so on and so forth. It is clear from these two pie charts that the pyknons cover a substantial segment of the previously uncharacterized intronic sequence (shown in dark green in both cases). At the same time, pyknons exhibit very little overlap with sequences that are conserved between the human and mouse genomes (light green) and with sequences that correspond to repeat elements (pink). All sequence comparisons were done in the 5′ → 3′ direction. See also text for a discussion of these findings.

We next extended this decomposition analysis to the intergenic and exonic regions of human and mouse. The results are shown in Figure 3. For the exonic regions, we distinguished among 5′UTRs, CDSs, and 3′UTRs. The intronic decomposition from Figure 2 was included to facilitate comparison. It is evident that pyknons also demarcate a significant fraction of intergenic space that was previously uncharacterized and link it to known exons (as a result of the definition of pyknons). Moreover, in complete analogy with introns, the intergenic instances of pyknons generally do not overlap conserved intergenic regions and intergenic repeat elements. Again, all sequence comparisons are done in 5′→3′ direction. Not surprisingly, Figure 3 also shows that the exonic regions, particularly CDSs and 5′UTRs, behave very differently from the intergenic and intronic regions. Indeed, the CDS regions of exons are almost entirely conserved, and, as a fraction of the total exonic sequence, they are essentially free of repeats; moreover, pyknon instances are present in both conserved and non-conserved sequences. The abundance of non-conserved pyknons decreases from 3′UTRs to 5′UTRs to CDSs.

Figure 3.

Composition of intergenic and exonic regions in terms of conserved regions, repeat elements and pyknons for the human (top) and mouse (bottom) genomes. For both genomes, we included the intronic decomposition of Figure 2 to facilitate comparison.

Figures 2 and 3 show that the intergenic and intronic regions spanned by pyknon instances are effectively different from those genomic sequences that are conserved between human and mouse or belong to known repeats. In fact, the three types of regions considered here (i.e. conserved, repeats and pyknons) are distinct from one another and largely non-overlapping. The pyknons delineate specific intergenic and intronic regions that are neither conserved between human and mouse nor parts of known and characterized repeats. Interestingly, the same holds true for the 3′UTRs of protein-coding genes. At the other end of the spectrum, the coding regions of the exons are almost entirely conserved between human and mouse (even their pyknon sequences lie inside conserved regions) and essentially repeat-free. These observations remain effectively unchanged in the mouse genome as shown in Figure 3B. Do pyknons merely reflect genomic oddities, or are they linked, somehow, to specific biological processes and molecular functions? We examine this next. In order to ensure that our findings pertain to sequences that are transcribed, the rest of the analysis focuses solely on human and mouse introns.

Intronic instances of pyknons are linked to the same processes and functions in human and mouse even though the underlying sequences are genome-specific and thus not conserved

To determine potential associations with biological processes and molecular functions, we performed an analysis of the GO terms (41) with which human and mouse genes are tagged. We labeled each gene's introns with the GO terms of the corresponding gene products and separately tested whether conserved regions, repeats and pyknon instances show a higher-than-random concentration in the introns of genes associated with certain GO terms (see Methods section). It is important to stress that, had we used actual coverage (i.e. the number of covered nucleotides) instead of concentration, longer introns would have been favored and the GO analysis would have simply rediscovered the well-known fact that genes associated with certain GO terms (e.g. development) tend to have much longer transcripts. Indeed, in all the regions we analyzed for GO term enrichment, coverage values highly correlate with gene length: correlation >0.90 in all cases. On the other hand, concentration values do not correlate with gene length: the absolute value of the correlation was <0.05 in all cases. First, we explored the possibility of links between human-mouse conserved intronic regions and GO terms. In both human and mouse, we identified more than 500 GO terms (at different levels of the GO hierarchy) that are significantly enriched in intronic regions conserved between human and mouse. For clarity, Table 1A includes only biological processes from the top three levels of the GO hierarchy that are enriched in conserved human and mouse introns (see Supplementary Data for complete table). Comparing the full lists of significantly enriched GO terms from the human and mouse analyses shows that they are 83% similar (Table 2): this result is not surprising since the conserved elements come from aligned orthologous sequences. Here, we define similarity as the percentage of GO terms in the shorter of the two lists that is common to both lists.
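The similarity measure defined above is simple enough to state as a short sketch. The function name and the GO identifiers in the usage example are placeholders chosen for illustration and are not taken from the Supplementary Data.

```python
def list_similarity(terms_a, terms_b):
    """Percentage of GO terms in the shorter of the two lists
    that is common to both lists."""
    a, b = set(terms_a), set(terms_b)
    if not a or not b:
        raise ValueError("both lists must contain at least one GO term")
    return 100.0 * len(a & b) / min(len(a), len(b))

if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    human_terms = ["GO:A", "GO:B", "GO:C", "GO:D"]
    mouse_terms = ["GO:C", "GO:D", "GO:E"]
    print(list_similarity(human_terms, mouse_terms))
```

Normalizing by the shorter list means that a short list fully contained in a longer one scores 100%, so the measure reflects shared content and does not penalize a difference in list length.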

Table 1.

Enriched biological processes (representative sample) in human and mouse introns

	P-value (human)	P-value (mouse)
(A) Biological processes associated with intronic conserved regions between human & mouse
Cellular process	5.41E-07	7.30E-14
Cell communication	3.75E-35	1.24E-38
Regulation of cellular process	1.17E-05	1.17E-14
Cell adhesion	6.32E-17	2.95E-17
Cell differentiation	7.38E-39	0.00E+00
Regulation of biological process	3.58E-08	4.93E-19
Negative regulation of biological process	1.10E-16	1.38E-27
Regulation of development	4.87E-10	3.34E-12
Regulation of physiological process	2.14E-06	4.28E-16
Positive regulation of biological process	1.43E-11	1.60E-21
Regulation of growth	1.60E-04	1.65E-09
Interaction between organisms	2.32E-03	6.24E-03
Growth	1.18E-07	8.47E-12
Development	0.00E+00	0.00E+00
Sex differentiation	8.75E-04	1.18E-04
Developmental maturation	3.48E-06	5.90E-09
Anatomical structure development	0.00E+00	0.00E+00
Embryonic development	5.97E-18	8.76E-21
Pattern specification	8.20E-12	1.02E-18
Segmentation	8.48E-04	8.91E-05
Response to stimulus	3.02E-07	6.72E-06
Response to chemical stimulus	1.49E-04	6.38E-03
Response to stress	1.34E-03	6.68E-05
Response to external stimulus	2.00E-14	3.39E-11
Behavior	1.86E-12	2.84E-09
(B) Biological processes associated with pyknon elements in the introns of human & mouse
Cellular physiological process	2.76E-13
Chromosome segregation	5.39E-03	1.64E-05
Cellular metabolism	2.97E-17	3.23E-05
Cell division	4.85E-04	1.12E-05
Cell cycle (mitotic cell cycle, M phase, meiotic cell cycle)	6.58E-04	5.44E-03
Metabolism	2.52E-18	9.94E-07
Catabolism	2.42E-06	1.50E-04
Macromolecule metabolism	4.17E-19	2.64E-12
Primary metabolism	2.84E-14	9.45E-06
Protein localization	3.02E-09	7.90E-07
establishment of protein localization	7.26E-10	9.79E-08
Response to endogenous stimulus	1.91E-06	3.29E-04
response to DNA damage stimulus	2.19E-07	1.62E-04

(A) Enriched biological processes of intronic sequences that are conserved between human and mouse. (B) Enriched biological processes of intronic sequences that correspond to instances of pyknons. For each of the listed processes, the corresponding P-value is shown for the human and mouse genomes. It is important to point out that these enrichment lists hold true for both the human and mouse genomes: this is particularly notable in the case of part B of the Table because pyknons do not reside inside human-mouse conserved regions (Figure 2). See also text for a discussion.

Table 2.

Overlaps of significant GO terms in human and mouse at a false discovery rate of 5%

Similar results were obtained for the more conservative rate of 1% (see Supplement for details).

Next, we repeated the analysis separately for the intronic instances of the human and mouse pyknons and identified more than 200 significantly enriched GO terms in each of the two genomes. Table 1B includes only the high-level biological processes (see Supplementary Data for complete table). To ensure that the observed pyknon-related GO term enrichment is not due to the conserved elements (42) that co-localize with pyknons (0.7% and 0.4% overlap in human and mouse, respectively; see Figure 2), we repeated the analysis after removing the conserved regions that overlap with pyknons; the results remained unchanged. The GO term enrichment also remained unchanged when we repeated the analysis after simultaneously removing conserved and repeat elements. Finally, we analyzed the repeat elements alone and found only a handful of significantly enriched GO terms (16 for human and 34 for mouse; data not shown). These controls demonstrate that the functional links shown in Table 1B are due neither to the conserved regions nor to the presence of sense instances of repeat elements in introns.
The first notable result of our study stems from the comparison of the two complete lists (see Supplementary Data) of significantly enriched GO terms in human and mouse pyknons: in fact, we find that these lists are 75% similar (Table 2). If we consider only high-level biological processes, the human and mouse lists are identical—these shorter lists were presented in Table 1B. What is particularly surprising here is that the same GO terms are enriched in human and mouse introns despite the fact that pyknons do not lie inside intronic regions that are conserved between human and mouse. Note that this functional connection resulted from the analysis of the intronic regions of gene transcripts that contain instances of pyknon sequences; as such it is orthogonal to analogous exonic findings that we described earlier (11).

Cross-genome-conserved sequences and intra-genome-conserved pyknons respectively are linked to non-overlapping lists of processes and functions

The second notable result of our study arises from the comparison of the enriched GO terms that are associated with intronic ‘conserved’ regions and with intronic ‘pyknon’ instances, respectively: we find that the overlap of the two lists of GO terms is very small and ranges from 0% to 4% (Table 2). This is remarkable because it suggests that distinct intronic sequences are linked to distinct regulatory networks in the human and mouse genomes. Table 3 summarizes these results in the context of the classical inter-genomic model (intronic sequences conserved between human and mouse) and the intra-genomic conservation (intronic space covered by organism-specific pyknon sequences) that we introduced above.

Table 3.

Summarizing the similarities and differences between the intragenomic conservation model we defined above and cross-genome conserved regions obtained from human-mouse alignments (Figure 1)

Feature	Intronic regions conserved between human & mouse	Pyknons
Length	Long	Short
Cross-species conservation	Yes	No
Organism-specific conservation	No	Yes
Functional conservation	Yes	Yes

A subset of pyknons co-localizes extensively with piRNAs inside human and mouse introns

We note that Table 1B includes ‘meiosis’ as one of the cellular processes associated with the intronic instances of pyknons. Recently, a new class of short RNAs, the piRNAs, was found to accumulate at the onset of meiosis and was reported in three different organisms, namely human, mouse and rat (37,38,43). The distinct association of piRNAs with the meiotic step, and the fact that some of the cloned mouse sequences were reported to map to introns (38), led us to investigate the possibility of a pyknon–piRNA connection in intronic sequences. After locating all the piRNA instances in introns, we calculated their overlap with conserved regions, repeats, and pyknons, generating relative enrichment values over what would be expected randomly. We repeated the same analysis for the reverse complement of piRNAs as well as for the intersection of piRNAs and their reverse complements, i.e. the intronic regions that are covered by a piRNA and the reverse complement of a (possibly different) piRNA. The results are summarized in Supplementary Figure 1A. Essentially, we find that piRNA instances as well as the instances of their reverse complements are depleted in conserved intronic regions, somewhat enriched in repeat elements present in introns, and highly enriched in intronic pyknon instances. Again, all sequence comparisons are done in 5′→3′ direction. Supplementary Figure 1B shows the ‘recall’ percentages of piRNAs by conserved, repeat and pyknon elements. The computed recall figures indicate that pyknon elements capture (i.e. describe) piRNAs much better than conserved regions and repeat elements do: the overlap of the intronic piRNA instances with intronic pyknons and with intronic repeats is statistically significant, with P-values of ∼10^-10 and ∼10^-3, respectively.

Even though a large fraction of piRNAs co-localizes with pyknons, the converse is not true. Indeed, only a small fraction of the intronic regions occupied by human and mouse pyknons co-localizes with piRNAs (9% and 12% for human and mouse, respectively, at ≥95% similarity). In other words, our pyknon collections contain many sequences that do not co-localize with piRNAs. Does this mean that pyknons merely correspond to piRNAs that have not yet been sequenced? Or do pyknons capture molecular classes beyond piRNAs? If we only consider pyknons that are not similar to the known piRNAs at the sequence level, most of the previously enriched GO terms survive but ‘meiosis’ now disappears from the list of significant GO terms. The latter result holds true even when we permit a false discovery rate as high as 50%. Given that piRNAs have only been found during meiosis, and therefore play a role during this process, the fact that the subset of pyknons which are not associated with the known piRNAs is not linked to meiosis suggests that the sequences of the pyknons capture piRNAs but also other currently unidentified categories of molecules.

Case study: pyknons, introns, piRNAs and the MLH1 gene

Arguably, the picture that is emerging from the above analysis is complicated. We highlight this observation with a concrete example that also shows the relevance of these results in the disease context. The complete list of GO terms (see Supplementary Data) that are significantly enriched in pyknon-containing regions of the human and mouse introns includes the terms ‘GO:0006281/DNA repair’ and ‘GO:0006298/mismatch repair’. We emphasize that these two terms are uniquely associated with pyknons, as is shown in the Supplementary Data; thus, the results that we describe next are associated neither with conserved regions nor with known repeat elements. A search of the ENSEMBL database (39) for human genes labeled with these two GO terms identifies a number of entries; among them is MLH1, a gene that has been associated with hereditary non-polyposis colorectal cancer and other types of carcinomas, microsatellite instabilities, etc. (44–47). The human MLH1 transcript has 17 introns whereas its mouse orthologue has 18. Table 4 lists a few examples of human and mouse pyknons that are present in the introns of MLH1: as can be seen, their distribution and copy numbers across MLH1's introns are rather complex. Also shown is the total number of genomic copies of each listed pyknon: we provide this number only as a reference since, as we showed above, the pyknons are already over-represented in the introns of genes belonging to specific GO processes. We further examined whether the 10 human pyknons shown in Table 4 had any instances in the mouse introns of MLH1: we found no such cases even when we allowed as many as 15% of the pyknon positions to be mismatched. Conversely, we examined whether the four mouse pyknons shown in Table 4 had any instances in the human introns of MLH1: again, we found no such cases even when we allowed as many as 15% of the pyknon positions to be mismatched. 
In addition to their MLH1 instances, each of the shown pyknons has thousands of intact copies in other parts of the corresponding genome. It should also be pointed out that several of the listed pyknons are reverse complement pairs (e.g. GTATTTTTAGTAGAGA and TCTCTACTAAAAATAC), with the members of these pairs having generally different copy numbers that appear in either the same or in different introns of MLH1. Finally, we note that 17 known human piRNAs and 23 known mouse piRNAs can be found intact, in either sense or antisense direction, in nine human introns and six mouse introns of MLH1, respectively. The sequences and the identity of these introns are shown in Table 5.
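The two sequence operations used in this case study, taking the reverse complement of a motif and searching for a motif while tolerating a bounded fraction of mismatched positions, can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the search is a plain gap-free scan (the paper does not state which search procedure was used), and the mismatch budget is computed as a rounded-down percentage of the motif length, which is an assumption.

```python
from typing import List

# Complement table for the DNA alphabet; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_with_mismatches(motif: str, target: str,
                         max_mismatch_percent: int = 15) -> List[int]:
    """Return 0-based start positions where `motif` aligns to `target`
    without gaps and with at most max_mismatch_percent% mismatched positions.

    The allowed number of mismatches is rounded down (assumed convention).
    """
    allowed = len(motif) * max_mismatch_percent // 100
    hits: List[int] = []
    for start in range(len(target) - len(motif) + 1):
        mismatches = 0
        for a, b in zip(motif, target[start:start + len(motif)]):
            if a != b:
                mismatches += 1
                if mismatches > allowed:
                    break  # too many mismatches at this offset
        else:
            hits.append(start)  # inner loop finished without exceeding the budget
    return hits


# A reverse complement pair quoted in the text.
print(reverse_complement("GTATTTTTAGTAGAGA"))  # TCTCTACTAAAAATAC

# A human pyknon from Table 4 located exactly inside a human piRNA from Table 5.
pirna = "TGCCTGTAATCCCAGCACTTTGGGAGGCCG"
print(find_with_mismatches("TAATCCCAGCACTTTGGGA", pirna, 0))  # [6]
```

Searching both a motif and its reverse complement covers the sense and antisense cases. For whole introns, an indexed or bit-parallel matcher would be preferable to this quadratic scan.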

Table 4.

Examples and related information for human and mouse pyknons that are present in the introns of MLH1, a microsatellite instability gene

Human pyknon sequence	Total copies in human genome	How many copies are present in which intron of human MLH1

		1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
AACTCCTGACCTCAGGTGAT	92 215	2		1				2									2
AGTAGCTGGGATTACAG	205 790	1						3				1					2
CTGTAATCCCAGCTACT	205 790		1			1								1	1		2
ATTCTCCTGCCTCAGCCTC	292 883	1		3	1					1		1					2
GTATTTTTAGTAGAGA	323 826	1		1	1	1		1				2					3
TCTCTACTAAAAATAC	323 826		1		1	1		1				1		2	1
CCCAGGCTGGAGTGCA	358 005	1			1			2				1			1		4
TGCACTCCAGCCTGGG	358 005					1		1				1		1	3		2
TAATCCCAGCACTTTGGGA	358 314		1		1			1						1	2			1
TCCCAAAGTGCTGGGATTA	358 314	2	1	1						1							3

Mouse pyknon sequence	Total copies in mouse genome	How many copies are present in which intron of mouse MLH1

		1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18

CTGCCTCTCTGCCTCT	7 758									1		7
TGGAAGAGCAGTCAGT	19 055													1		1
TGGCTGTCCTGGAACTCACT	76 023				1					1
TGTAGACCAGGCTGGCCT	92 150				1					1

There are 17 introns in the human MLH1 and 18 introns in the mouse orthologue. Also listed is the total number of genomic copies for each listed sequence; this number is provided as a reference only since, as we have already described, the pyknons are already over-represented in the introns of genes belonging to specific GO processes. Note that the pairs {AGTAGCTGGGATTACAG, CTGTAATCCCAGCTACT}, {GTATTTTTAGTAGAGA, TCTCTACTAAAAATAC}, {CCCAGGCTGGAGTGCA, TGCACTCCAGCCTGGG}, and {TAATCCCAGCACTTTGGGA, TCCCAAAGTGCTGGGATTA} of the human pyknon examples are reverse complements of one another: the members of a given pair can be present in the same intron or in different introns of MLH1, and generally differ in their number of intronic instances in MLH1. See also text.

Table 5.

The introns of MLH1 also contain intact instances of piRNAs and of the reverse complements of piRNAs

Organism	Intron	Direction	piRNA sequence
Human	2	Sense	CTGCATAGTATTCCATGGTGTATATGTGC
	3	Sense	CCTCCCAAAGTGCTGGGATTACAGGCGTGAG
	4	Sense	TGCCTGTAATCCCAGCACTTTGGGAGGCCG
	4	Sense	TGTAATCCCAGCACTTTGGGAGGCCGAGG
	5	Sense	AGTAGAGACAGGGTTTCACCATGTTGGCCA
	5	Sense	GGCTGGTCTCGAACTCCTGACCTCAGGT
	7	Antisense	CAGCCTCCTGAGTAGCTGGGATTACAGGCA
	7	Antisense	CTCACGCCTGTAATCCCAGCACTTTGGGAGG
	7	Sense	GGCTGGTCTCGAACTCCTGACCTCAGGT
	7	Sense	TGGTCTCGAACTCCTGACCTCAGGTGATCC
	9	Antisense	CCTCGGCCTCCCAAAGTGCTGGGATTACA
	9	Sense	CCTCCCAAAGTGCTGGGATTACAGGCGTGAG
	13	Antisense	ACCTGAGGTCAGGAGTTCGAGACCAGCC
	13	Antisense	GGATCACCTGAGGTCAGGAGTTCGAGACCA
	13	Antisense	TGGATCACCTGAGGTCAGGAGTTCGAGACCA
	14	Sense	TGCCTGTAATCCCAGCTACTCAGGAGGCTG
	16	Sense	TCACTTGAACCCAGGAGGCGGAGGTTGCAGTG

Mouse	2	Sense	TGAGTTCAAATCCCAGCAACCACATGGTGGC
	2	Sense	TGTGTGTGTGTGTGTGTGTGTGTGTG
	4	Antisense	AACTCACTCTGTAGACCAGGCTGGCCTCGAAC
	4	Antisense	AGCCCTGGCTGTCCTGGAACTCACTCTGTA
	4	Antisense	CCTCGAACTCAGAAATCCGCCTGCCTCTGCCT
	4	Sense	ACTCAGAAATCCGCCTGCCTCTGCCTCC
	4	Sense	CTGGCTGTCCTGGAACTCACTCTGTAGA
	4	Sense	TCGAACTCAGAAATCCGCCTGCCTCTGCCTC
	4	Sense	TCNCTCTGTAGACCAGGCTGGCCTCGAACT
	4	Sense	TGGAACTCACTCTGTAGACCAGGCTGGCC
	4	Sense	TGGCCTCGAACTCAGAAATCCGCCTGCCTC
	9	Antisense	GCCCTGGCTGTCCTGGAACTCACTTTGTA
	9	Antisense	TAGCCCTGGCTGTCCTGGAACTCACTTTGNA
	9	Antisense	TGGCTGTCCTGGAACTCACTTTGTA
	9	Sense	GCCCTGGCTGTCCTGGAACTCACTTTGTA
	9	Sense	TCACTTTGTAGACCAGGCTGGCCTCGAACT
	9	Sense	TCGAACTCAGAAATCTGCCTGCCTCTGCCTC
	9	Sense	TGGAACTCACTTTGTAGACCAGGCTGGCCT
	9	Sense	TGGCCTCGAACTCAGAAATCTGCCTGCCTC
	11	Sense	TCCTTCTTCTTGAGCTTCATGTGGTCTGT
	14	Antisense	TTAGGGTTTTACTGCTGTGAACAGACACCA
	14	Sense	TGCTGTGAACAGACACCATGACCAAGGCAAC
	15	Antisense	GCCACCATGTGGTTGCTGGGATTTGAACTCA

The table shows known piRNA sequences and the corresponding MLH1 introns in which they are found in human and mouse.


DISCUSSION

Beginning in the early 1980s with the analysis of amino acid coding sequences (48), an argument was made in support of the hypothesis that ‘sequence conservation implies functional conservation’. The hypothesis was quickly extended to include non-coding sequences and has since been fueling the biological sequence analysis revolution (49). Underlying this hypothesis is an inter-genomic model of conservation according to which genomic regions with functional significance undergo negative selection. In contrast, recent work provided a first example showing that the same type of functional information may exist in multiple genomes in the absence of discernible underlying sequence conservation (9,12). The analysis that we presented above proceeded along similar lines: using organism-specific pyknon sequences from the human and mouse genomes, we demonstrated that functional conservation in the absence of sequence conservation is rather pronounced. Our results revealed an important role for what we defined as the ‘intra-genomic conservation’ model and led to the following surprising result: although pyknons are present in intronic regions that are not conserved between human and mouse, they nonetheless exhibit a preference for the introns of genes belonging to the same biological processes and molecular functions in these two genomes. Analogously, intronic regions that are conserved between human and mouse are also associated with specific biological processes and functions. However, it is notable that these two sets of processes and functions have an insignificant overlap; this indicates that distinct intronic regions in human and mouse are associated with distinct biological processes and molecular functions, suggesting the involvement of introns in regulation. Our findings have intriguing implications for intronic evolution. 
With respect to the conserved intronic regions that we examined, it is apparent that these can be traced back to a common ancestor of the human and mouse genomes. In contrast, the intronic regions that correspond to pyknon instances suggest a substantially more complicated situation: their extent and the high number of the pyknons’ genomic copies suggest that the same basic mechanism may be in action in both the human and the mouse genome. This presumed mechanism, operating on sequences that are not conserved in these two genomes, has given rise to the currently extant collection of introns. It is interesting to note that this presumed mechanism appears to preferentially ‘target’ (actively or passively) the introns of genes that are linked to specific functions, giving rise to the entries of Table 1B (as opposed to the entries of Table 1A). It is not clear at the moment how this presumed mechanism has managed, by acting in an apparently independent manner in two distinct genomes, to ‘delineate’ the pyknon sequences and to ‘arrange’ them inside the transcripts of genes in a manner that, for both the human and mouse genomes, favors destinations belonging to the same set of processes and functions. These functional links exist without conservation of the underlying sequence, and are in agreement with current thinking that sequence conservation is not a prerequisite for functional relevance (8). Evidence has been steadily accumulating in support of a functional significance for introns (50): a large fraction of the known microRNAs as well as snoRNAs originate in intronic space; mutations in intronic sequences have been linked to desirable phenotypes; ncRNAs with currently uncharacterized regulatory roles have been found to originate in intronic space; etc. More recently, intronic sequences were linked to a putative regulatory mechanism for modulating the membrane properties and ion channel gradients of hippocampal neurons (51). 
Such findings together with the ones that we have presented above support a much more active role for introns. This role is perhaps part of a much more pronounced RNA-driven layer of regulation, as conjectured earlier (52).

SUPPLEMENTARY DATA

At the website http://cbcsrv.watson.ibm.com/pyknons_introns.html the user can access the human and mouse pyknon sequences discussed above.

50 in total

Review 1. The evolution of controlled multitasked gene networks: the role of introns and other noncoding RNAs in the development of complex organisms.

Authors: J S Mattick; M J Gagen
Journal: Mol Biol Evol Date: 2001-09 Impact factor: 16.240

2. Initial sequencing and comparative analysis of the mouse genome.

Authors: Robert H Waterston; Kerstin Lindblad-Toh; Ewan Birney; Jane Rogers; Josep F Abril; Pankaj Agarwal; Richa Agarwala; Rachel Ainscough; Marina Alexandersson; Peter An; Stylianos E Antonarakis; John Attwood; Robert Baertsch; Jonathon Bailey; Karen Barlow; Stephan Beck; Eric Berry; Bruce Birren; Toby Bloom; Peer Bork; Marc Botcherby; Nicolas Bray; Michael R Brent; Daniel G Brown; Stephen D Brown; Carol Bult; John Burton; Jonathan Butler; Robert D Campbell; Piero Carninci; Simon Cawley; Francesca Chiaromonte; Asif T Chinwalla; Deanna M Church; Michele Clamp; Christopher Clee; Francis S Collins; Lisa L Cook; Richard R Copley; Alan Coulson; Olivier Couronne; James Cuff; Val Curwen; Tim Cutts; Mark Daly; Robert David; Joy Davies; Kimberly D Delehaunty; Justin Deri; Emmanouil T Dermitzakis; Colin Dewey; Nicholas J Dickens; Mark Diekhans; Sheila Dodge; Inna Dubchak; Diane M Dunn; Sean R Eddy; Laura Elnitski; Richard D Emes; Pallavi Eswara; Eduardo Eyras; Adam Felsenfeld; Ginger A Fewell; Paul Flicek; Karen Foley; Wayne N Frankel; Lucinda A Fulton; Robert S Fulton; Terrence S Furey; Diane Gage; Richard A Gibbs; Gustavo Glusman; Sante Gnerre; Nick Goldman; Leo Goodstadt; Darren Grafham; Tina A Graves; Eric D Green; Simon Gregory; Roderic Guigó; Mark Guyer; Ross C Hardison; David Haussler; Yoshihide Hayashizaki; LaDeana W Hillier; Angela Hinrichs; Wratko Hlavina; Timothy Holzer; Fan Hsu; Axin Hua; Tim Hubbard; Adrienne Hunt; Ian Jackson; David B Jaffe; L Steven Johnson; Matthew Jones; Thomas A Jones; Ann Joy; Michael Kamal; Elinor K Karlsson; Donna Karolchik; Arkadiusz Kasprzyk; Jun Kawai; Evan Keibler; Cristyn Kells; W James Kent; Andrew Kirby; Diana L Kolbe; Ian Korf; Raju S Kucherlapati; Edward J Kulbokas; David Kulp; Tom Landers; J P Leger; Steven Leonard; Ivica Letunic; Rosie Levine; Jia Li; Ming Li; Christine Lloyd; Susan Lucas; Bin Ma; Donna R Maglott; Elaine R Mardis; Lucy Matthews; Evan Mauceli; John H Mayer; Megan McCarthy; W Richard McCombie; Stuart McLaren; 
Kirsten McLay; John D McPherson; Jim Meldrim; Beverley Meredith; Jill P Mesirov; Webb Miller; Tracie L Miner; Emmanuel Mongin; Kate T Montgomery; Michael Morgan; Richard Mott; James C Mullikin; Donna M Muzny; William E Nash; Joanne O Nelson; Michael N Nhan; Robert Nicol; Zemin Ning; Chad Nusbaum; Michael J O'Connor; Yasushi Okazaki; Karen Oliver; Emma Overton-Larty; Lior Pachter; Genís Parra; Kymberlie H Pepin; Jane Peterson; Pavel Pevzner; Robert Plumb; Craig S Pohl; Alex Poliakov; Tracy C Ponce; Chris P Ponting; Simon Potter; Michael Quail; Alexandre Reymond; Bruce A Roe; Krishna M Roskin; Edward M Rubin; Alistair G Rust; Ralph Santos; Victor Sapojnikov; Brian Schultz; Jörg Schultz; Matthias S Schwartz; Scott Schwartz; Carol Scott; Steven Seaman; Steve Searle; Ted Sharpe; Andrew Sheridan; Ratna Shownkeen; Sarah Sims; Jonathan B Singer; Guy Slater; Arian Smit; Douglas R Smith; Brian Spencer; Arne Stabenau; Nicole Stange-Thomann; Charles Sugnet; Mikita Suyama; Glenn Tesler; Johanna Thompson; David Torrents; Evanne Trevaskis; John Tromp; Catherine Ucla; Abel Ureta-Vidal; Jade P Vinson; Andrew C Von Niederhausern; Claire M Wade; Melanie Wall; Ryan J Weber; Robert B Weiss; Michael C Wendl; Anthony P West; Kris Wetterstrand; Raymond Wheeler; Simon Whelan; Jamey Wierzbowski; David Willey; Sophie Williams; Richard K Wilson; Eitan Winter; Kim C Worley; Dudley Wyman; Shan Yang; Shiaw-Pyng Yang; Evgeny M Zdobnov; Michael C Zody; Eric S Lander
Journal: Nature Date: 2002-12-05 Impact factor: 49.962

3. The birth of an alternatively spliced exon: 3' splice-site selection in Alu exons.

Authors: Galit Lev-Maor; Rotem Sorek; Noam Shomron; Gil Ast
Journal: Science Date: 2003-05-23 Impact factor: 47.728

Review 4. MicroRNAs: genomics, biogenesis, mechanism, and function.

Authors: David P Bartel
Journal: Cell Date: 2004-01-23 Impact factor: 41.582

Review 5. The Ensembl core software libraries.

Authors: Arne Stabenau; Graham McVicker; Craig Melsopp; Glenn Proctor; Michele Clamp; Ewan Birney
Journal: Genome Res Date: 2004-05 Impact factor: 9.043

Review 6. RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements.

Authors: J Brosius
Journal: Gene Date: 1999-09-30 Impact factor: 3.688

7. Active conservation of noncoding sequences revealed by three-way species comparisons.

Authors: I Dubchak; M Brudno; G G Loots; L Pachter; C Mayor; E M Rubin; K A Frazer
Journal: Genome Res Date: 2000-09 Impact factor: 9.043

8. A transposable element-mediated gene divergence that directly produces a novel type bovine Bcnt protein including the endonuclease domain of RTE-1.

Authors: Shintaro Iwashita; Naoki Osada; Tomohito Itoh; Mariko Sezaki; Kenshiro Oshima; Etsuko Hashimoto; Yuko Kitagawa-Arita; Ichiro Takahashi; Tohru Masui; Katsuyuki Hashimoto; Wojciech Makalowski
Journal: Mol Biol Evol Date: 2003-06-27 Impact factor: 16.240

9. Sequencing and comparison of yeast species to identify genes and regulatory elements.

Authors: Manolis Kellis; Nick Patterson; Matthew Endrizzi; Bruce Birren; Eric S Lander
Journal: Nature Date: 2003-05-15 Impact factor: 49.962

10. Conservation of RET regulatory function from human to zebrafish without sequence similarity.

Authors: Shannon Fisher; Elizabeth A Grice; Ryan M Vinton; Seneca L Bessling; Andrew S McCallion
Journal: Science Date: 2006-03-23 Impact factor: 47.728
