IAOseq: inferring abundance of overlapping genes using RNA-seq data.

Hong Sun, Shuang Yang, Liangliang Tun, Yixue Li.

Abstract

BACKGROUND: Overlapping transcription constitutes a common mechanism for regulating gene expression. A major limitation of the overlapping transcription assays is the lack of high throughput expression data.
RESULTS: We developed a new tool (IAOseq) that uses read distributions along the transcribed regions to estimate the expression levels of overlapping genes from standard RNA-seq data. Compared with five commonly used quantification methods, IAOseq estimated overlapping transcription levels more accurately. For same strand overlapping transcription, existing high-throughput methods can rarely distinguish which strand was present in the original mRNA template. The IAOseq results showed that the commonly used methods overestimated the expression levels of same strand overlapping genes by 1.6-fold on average.
CONCLUSIONS: This work provides a useful tool for mining overlapping transcription levels from standard RNA-seq libraries. IAOseq could be used to help us understand the complex regulatory mechanism mediated by overlapping transcripts. IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.

Substances:
RNA, Messenger

Year: 2015 PMID: 25707673 PMCID: PMC4331702 DOI: 10.1186/1471-2105-16-S1-S3

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169

Background

The advent of genome-wide techniques for studying transcription has strongly indicated that the majority of the genome can be transcribed [1-3]. Genome-wide overlapping transcription has been reported in various animal and plant species [4-9]. Multifunctional usage of the same genomic space leads to identical cDNA sequences produced from the same or opposite strands of DNA. The overlapping regions can include the exons in mRNAs, and a large number of transcripts from overlapping genes do not encode proteins [10-13]. Overlapping transcription is a highly conserved phenomenon that spans the animal, plant and fungal kingdoms, constituting a common mechanism for regulating gene expression. The overlap of sense-antisense gene pairs can affect the regulation of gene expression at several levels, including transcription, messenger RNA processing, splicing, stability, cellular transport and translation [14-16]. Natural antisense transcripts (NATs) are frequently functional and use diverse transcriptional and post-transcriptional gene regulatory mechanisms to carry out a wide variety of biological roles. Given the diverse regulatory functions and the widespread abundance of NATs in the human genome, it is not surprising that some NATs have been implicated in human diseases. Studies have shown that changes in antisense transcription are implicated in pathogenesis [17-19], indicating that activated antisense transcripts might be potential molecular markers for disease risk as well as novel therapeutic targets. However, apart from a few experimentally validated cases, the physiological roles of antisense transcription and the underlying mechanisms are largely unknown. In-depth analysis of the transcriptome of overlapping genes is a valuable way to understand the regulatory mechanisms mediated by overlapping transcripts. A major limitation to the development of overlapping transcript assays is the lack of high-throughput expression data.
Expression profiles of antisense transcripts and their sense targets can be used to infer the regulatory mechanism and the mode of antisense function. Techniques such as serial analysis of gene expression (SAGE) and cap analysis gene expression (CAGE) have been used extensively for the analysis of overlapping transcription [20,21]. Both of these methods have disadvantages and are expensive to perform [22]. The widely used high-throughput microarray method cannot distinguish signals from the original mRNA templates when probes map to the overlapping regions of same-strand overlapping genes. Next generation sequencing has dramatically improved the sequencing of cDNA derived from cellular RNA in a massively parallel and cost-effective way [23]. Recently developed techniques lead to more efficient assembly of individual transcriptomes. TIF-Seq determines both transcript ends by jointly sequencing the 5' and 3' ends of each RNA molecule [24]. RNA paired-end tags (RNA-PET) can demarcate the genomic boundaries of PET-represented DNA fragments [25]. However, standard libraries for RNA-seq, the most commonly used protocol, do not preserve information about which strand was originally transcribed, and the strand specific RNA-seq method is labor intensive and requires substantial amounts of starting material [26,27]. Furthermore, although strand specific library construction preserves information about the orientation of transcripts, most current studies analyze cDNAs without strand information because of the inefficiency of such protocols and artifacts of reverse transcription. Several methods have been developed to reconstruct novel transcripts [28] and to estimate isoform abundances [29].
There are also several bioinformatics methods that infer strand information from non-strand specific RNA-seq data based on information such as the open reading frame (ORF) in protein coding genes, biases in coverage between 5' and 3' ends, or splice site orientation in eukaryotic genomes [30-32]. However, for reads mapped entirely within an exon, the strand must be inferred without splicing information, which remains a challenge; besides, for reads mapped within the overlapping regions of same strand overlapping genes, even strand specific RNA-seq methods cannot distinguish which strand was present in the original mRNA template. To solve these problems, we developed a new method, IAOseq, to infer the abundance of overlapping genes from high-throughput RNA-seq data generated from standard libraries. Levin et al. built a compendium of yeast libraries using several strand specific protocols and a non-strand specific protocol under the same biological condition [26], which makes it possible to verify the performance of IAOseq. We therefore applied our method to the non-strand specific RNA-seq dataset (nonST in short) to infer expression levels of overlapping genes and used the strand specific dataset to test the validity of the method. Compared with five other commonly used quantification methods, IAOseq yielded much better inferences.

Methods

According to the yeast genome annotation, about eighteen percent of yeast genes are overlapping genes, most of which are located on different strands, and about one-fifth are multi-gene overlaps (Additional file 1: Table S2). The average overlapping length is 290 bp for yeast overlapping genes (Additional file 1: Figure S1), and in mammalian genomes it is longer than 1 kb [3]. Sequence reads obtained from the common next generation sequencing platforms, including Illumina, SOLiD and 454, are often very short (30-400 nt) [27]. Therefore, there is a high probability that reads shorter than the overlapping length are mapped entirely within the overlapping regions. For such reads, strand information cannot be inferred by subsequent computational analyses that rely on information such as splice site orientation, which leads to an overestimation of the expression levels of overlapping genes.

Implementation of IAOseq

To address this issue, we first divide annotated genes into two categories according to their genomic locations: overlapping genes and non-overlapping genes. To accurately infer the expression levels of overlapping genes from nonST data, the overlapping regions are further divided into sub-regions as illustrated in the left box of Figure 1. Assume a transcribed genomic region contains m overlapping genes with expression levels $\Theta = (\theta_1, \theta_2, \ldots, \theta_m)$. The transcribed region is split into n sub-regions with lengths $l_1, l_2, \ldots, l_n$ based on the overlapping pattern. A set of read counts $X = (x_1, x_2, \ldots, x_n)$ is obtained from nonST data, where $x_j$ is the total read count mapped to the j-th sub-region. An indicator matrix $A = (a_{ij})_{m \times n}$ is introduced to describe the overlapping pattern of the transcribed region, where $a_{ij} = 1$ or $a_{ij} = 0$ indicates whether the j-th sub-region is included in or excluded from the i-th gene, respectively.
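
The splitting step can be illustrated with a minimal Python sketch. It is not taken from the IAOseq source: the function name `split_into_subregions`, the half-open coordinate convention and the example coordinates are assumptions made for illustration, and each gene is treated as a single unspliced interval.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates (assumed convention)


def split_into_subregions(
    genes: Dict[str, Interval],
) -> Tuple[List[Interval], List[int], List[List[int]]]:
    """Split the region covered by overlapping genes at every gene boundary.

    Returns (subregions, lengths, indicator), where indicator[i][j] is 1 when
    sub-region j lies inside gene i and 0 otherwise.
    """
    # Every gene start or end is a potential sub-region boundary.
    breakpoints = sorted({pos for start, end in genes.values() for pos in (start, end)})
    candidates = list(zip(breakpoints[:-1], breakpoints[1:]))
    # Keep only the pieces that are covered by at least one gene.
    subregions = [
        (s, e)
        for s, e in candidates
        if any(gs <= s and e <= ge for gs, ge in genes.values())
    ]
    lengths = [e - s for s, e in subregions]
    indicator = [
        [1 if gs <= s and e <= ge else 0 for s, e in subregions]
        for gs, ge in genes.values()
    ]
    return subregions, lengths, indicator


# Hypothetical coordinates: two genes that overlap by 290 bp.
genes = {"geneA": (1000, 2000), "geneB": (1710, 2600)}
subregions, lengths, indicator = split_into_subregions(genes)
```

For this hypothetical pair the region splits into three sub-regions: one unique to each gene and one shared by both, which is the only sub-region whose column in the indicator matrix contains two ones.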

Figure 1

Flowchart of IAOseq.

Under the assumption that sequenced reads are sampled independently and uniformly, and following the Poisson model proposed by Jiang et al. for the distribution of an individual sample [33], the read count $x_j$ of the j-th sub-region follows a Poisson distribution with parameter $\lambda_j$, and $\lambda_j = l_j w \sum_{i=1}^{m} a_{ij} \theta_i$, where $l_j$ is the length of the j-th sub-region, $a_{ij}$ indicates whether the j-th sub-region belongs to the i-th gene, $\theta_i$ is the expression level of the i-th gene, and w is the total number of mapped reads. As read distributions in most RNA-Seq datasets are not uniform [34], two bias curves, the global bias curve (GBC) and the local bias curve (LBC), are introduced to revise the indicator matrix $A = (a_{ij})$. The GBC represents the general tendency of the read distribution for the whole transcriptome, and the LBC depicts the gene-specific read distribution [35]. The GBC is constructed from the non-overlapping gene sets because it does not depend on specific genes. The read distribution of a genomic region covered by overlapping genes is a mixture of the distributions of all its expressed genes. The LBC is thus constructed to approximately describe the trend of the read distribution along each gene [35]. For regions covered by overlapping genes, a step function over the transcribed sub-regions j = 1, 2, ..., n is introduced for each gene, in which the read counts are normalized by the sub-region length and the gene occurrences and weighted by expression level. The LBC of the gene is then obtained by normalizing the step function to have mean 1. A weighted indicator matrix is obtained from the GBC; its non-zero elements are weighted by the expression level of the j-th transcribed sub-region of the i-th gene. In the same way, a weighted indicator matrix is obtained from the LBC. The two weighted indicator matrices are combined, using the weighting parameter α, into a single matrix that takes the place of $A$ in order to revise the parameter of the Poisson distribution. In this study, α is set to 0.1 (Additional file 1: Note and Table S1). For a transcribed sub-region that has reads mapped, the corresponding likelihood function is defined as $L(\Theta \mid x_j) = e^{-\lambda_j} \lambda_j^{x_j} / x_j!$.
Assuming that the read counts of the transcribed sub-regions are independent of each other, the joint log-likelihood function for the gene members of the overlapping group is $\log L(\Theta \mid X) = \sum_{j=1}^{n} \left( x_j \log \lambda_j - \lambda_j - \log x_j! \right)$, where $x_j$ is the read count of the j-th sub-region and $\lambda_j$ is its revised Poisson parameter. Because the negative log-likelihood is convex, the gradient descent method is used to compute the maximum likelihood estimator of Θ [33], that is, the expression levels of the overlapping genes. We set the initial value of each expression level to 1 and iterate the optimization process, updating the estimates after each iteration. Figure 1 illustrates the flowchart of the method.
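
The maximum likelihood step can be sketched in plain Python under simplifying assumptions. This is not the IAOseq implementation: it uses the unweighted 0/1 indicator matrix (no GBC or LBC weighting), it scales lengths to kilobases and library size to millions so that the estimates are on an FPKM-like scale, and it uses gradient ascent with a per-gene step size (which coincides with the classical EM update for Poisson mixtures) rather than whatever step rule the authors used. The function name and the example counts are hypothetical.

```python
from typing import List


def estimate_expression(
    counts: List[float],
    lengths: List[int],
    indicator: List[List[float]],
    total_reads: float,
    n_iter: int = 500,
) -> List[float]:
    """Maximize sum_j (x_j * log(lam_j) - lam_j) with lam_j = c_j * sum_i a_ij * theta_i."""
    m, n = len(indicator), len(counts)
    # Exposure c_j: sub-region length in kb times library size in millions (assumed scaling).
    exposure = [lengths[j] / 1e3 * total_reads / 1e6 for j in range(n)]
    theta = [1.0] * m  # initial value 1 for every gene, as described in the text
    for _ in range(n_iter):
        lam = [
            exposure[j] * sum(indicator[i][j] * theta[i] for i in range(m))
            for j in range(n)
        ]
        updated = []
        for i in range(m):
            gene_exposure = sum(indicator[i][j] * exposure[j] for j in range(n))
            # Partial derivative of the log-likelihood with respect to theta_i.
            gradient = sum(
                indicator[i][j] * exposure[j] * (counts[j] / lam[j] - 1.0)
                for j in range(n)
                if lam[j] > 0
            )
            # Ascent step of size theta_i / gene_exposure; keeps theta non-negative.
            updated.append(max(theta[i] + theta[i] / gene_exposure * gradient, 0.0))
        theta = updated
    return theta


# Hypothetical pair: gene A covers sub-regions 0 and 1, gene B covers 1 and 2.
lengths = [710, 290, 600]
indicator = [[1, 1, 0], [0, 1, 1]]
counts = [71, 35, 12]  # made-up read counts for a library of one million reads
theta = estimate_expression(counts, lengths, indicator, total_reads=1e6)

# Naive estimate for gene B that assigns every read in its span to gene B.
naive_b = (counts[1] + counts[2]) / ((lengths[1] + lengths[2]) / 1e3)
```

With these made-up counts the estimates settle near 100 for gene A and 20 for gene B, whereas counting every read in gene B's span gives a value more than twice as large; this is the kind of overestimation the method is designed to remove.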

Correction of reads count in UTRs

Most overlapping regions involve UTRs; it is therefore necessary to include the UTR regions in the overlapping analysis, since UTRs are important parts of the transcript sequence. Alternative polyadenylation and alternative transcriptional start sites can result in mRNA isoforms that differ in their untranslated regions, so read counts in UTRs are corrected according to a general tendency learned from the read distribution in the UTRs of non-overlapping genes. As read distributions are not uniform, a bias curve UTR(z) is introduced to revise the estimation of reads in UTRs. To capture the general tendency of the read distribution along UTRs, UTR(z) is constructed from those non-overlapping genes that do not intersect any other gene body or extended UTR. Assume the non-overlapping dataset contains t genes (p1, p2, ..., pt). UTR(z) is defined as the normalized general tendency, over these t genes, of the number of reads mapped to the z-th nucleotide, where z stands for the z-th nucleotide from the nearest coding nucleotide and depth(z) is the number of reads mapped to it. The median lengths of yeast UTRs were estimated to be around 50 bp for the 5'UTR and 100 bp for the 3'UTR [36]. Coding regions of yeast genes are therefore extended by 200 bp for 3'UTR correction and by 100 bp for 5'UTR correction. The corrected read count for the extended UTR region of the i-th overlapping gene is estimated from UTR(z), and it replaces the read count in UTRs in the above log-likelihood function.
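
The exact formulas for UTR(z) and for the corrected count are not reproduced in this text, so the following Python sketch shows only one plausible reading and should not be taken as the authors' definition. It assumes that each non-overlapping gene's UTR depth profile is normalized by that gene's mean coding depth and then averaged across genes, and that the expected UTR coverage of an overlapping gene is its coding depth multiplied by the summed curve. All function names and numbers are hypothetical.

```python
from typing import List, Sequence


def utr_bias_curve(
    utr_depths: Sequence[Sequence[float]], coding_depths: Sequence[float]
) -> List[float]:
    """Average, over genes, of UTR depth at position z relative to coding depth.

    utr_depths[k][z] is depth(z) for gene k, with z counted from the nearest
    coding nucleotide; coding_depths[k] is the mean coding depth of gene k.
    """
    usable = [(d, c) for d, c in zip(utr_depths, coding_depths) if c > 0]
    if not usable:
        raise ValueError("need at least one gene with non-zero coding depth")
    width = min(len(d) for d, _ in usable)
    return [sum(d[z] / c for d, c in usable) / len(usable) for z in range(width)]


def expected_utr_depth(curve: Sequence[float], coding_depth: float) -> float:
    """Summed per-base UTR depth expected for a gene with the given coding depth."""
    return coding_depth * sum(curve)


# Two made-up non-overlapping genes with a 4-nt UTR window.
curve = utr_bias_curve([[8, 6, 4, 2], [4, 3, 2, 1]], [10.0, 5.0])
expected = expected_utr_depth(curve, coding_depth=20.0)
```

In this toy input both genes show the same relative decay away from the coding region, so the curve is simply that shared profile; with real data the averaging smooths gene-to-gene variation.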

Data

RNA-seq datasets

Currently, qRT-PCR appears to be the most popular technology for producing "gold standard" abundance measurements; however, public datasets provide qRT-PCR results for too few genes to support the overlapping analysis, and it is also difficult to obtain RNA-seq datasets generated under the same biological condition. Levin et al. built a compendium of yeast libraries using several strand specific protocols and a non-strand specific protocol, and sequenced them to deep coverage [26]. All these libraries were constructed under the same biological condition. Comparisons between these libraries showed that the dUTP second strand marking method (dUTP in short) performed reasonably and had the best quality measures of strand specificity [26]. Therefore, we applied our method to the nonST data to infer expression levels of overlapping genes and used the dUTP dataset to test the validity of the method. All sequencing reads in FASTQ format were aligned to the yeast reference genome using Bowtie [37]. The RSEM program [38] was used to deal with multiple mappings, and the posterior probabilities it assigned were taken into account when estimating transcript abundance.

Simulated RNA-seq dataset

As there are few expression data for overlapping genes, we performed simulation experiments to further study the performance of IAOseq. UTRs are important parts of the transcribed sequences; we therefore extended all the annotated yeast gene loci by 250 nt on both sides. The RSEM program [38] was used to generate a set of 1.3 million RNA-Seq fragments in a non-strand specific manner from the yeast transcriptome. The expression levels estimated from dUTP data were taken as the input abundance estimates, and the sequencing model parameters were set to those obtained from nonST data.

Gene annotations

Yeast genome annotations were downloaded from the SGD database. SGD classifies yeast ORFs into three categories: verified, uncharacterized and dubious ORFs [39]. Though dubious ORFs are unlikely to encode a protein [39], we observed expression evidence for some of them in the dUTP data (Additional file 1: Figure S2). Furthermore, many ORFs classified as "dubious" overlap with ORFs of the class "verified" or "uncharacterized"; we therefore used all annotated genes to test the method in this study. Of the overlapping groups analyzed in this study, forty-seven contain non-coding genes. All the data were converted into a common version for comparison. The annotated yeast transcribed regions were classified into two categories: regions covered by overlapping genes and regions comprising only a single gene. The transcribed regions of overlapping genes were further split into parts based on their overlapping patterns.

Results

There are two principal types of overlapping transcripts: the same strand overlapping type, in which the genes involved are transcribed from the same strand, and the different strand overlapping type, in which at least two genes are transcribed from different strands [3]. Of the overlapping genes in the yeast genome, around 76% are of the different strand type (Additional file 1: Table S2). As strand specific RNA-seq cannot distinguish transcripts from same strand overlapping genes, we first tested our method on pairs of overlapping genes transcribed from different strands, then applied it to infer the expression levels of same strand overlapping genes, and finally to multi-overlapping groups of more than two overlapping genes with a mixture of overlapping types. Short overlapping regions, where reads are longer than the overlap and are mapped across the overlapping junctions, have little impact on the inference of strand information. IAOseq was thus trained on overlapping genes with an overlapping length greater than 150 bp. Expression levels are measured in fragments per kilobase of exon model per million mapped reads (FPKM). The logarithm base 2 of the estimated abundance ratio (LEARatio in short) was introduced as a measure to evaluate performance; the ratio is the expression level deduced from nonST data divided by the expression level deduced from dUTP data. A LEARatio close to zero reflects a more accurate inference. To evaluate IAOseq, we compared its performance with that of five other commonly used quantification methods, i.e. Cufflinks [30], Isoem [40], RSEM [38], eXpress [41] and Bitseq [42]. As little difference was observed between the values inferred using Isoem and using RSEM (data not shown), the average of the abundance values estimated by four methods (Cufflinks, RSEM, eXpress and Bitseq) from dUTP data was used as the denominator of the LEARatio.
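
The two measures defined above can be written out as a short Python sketch. The function names and the example values are hypothetical; the text does not say how genes with zero abundance were handled, so this sketch simply rejects non-positive inputs.

```python
import math
from typing import Sequence


def fpkm(fragments: float, length_bp: int, total_mapped: float) -> float:
    """Fragments per kilobase of exon model per million mapped reads."""
    return fragments * 1e9 / (length_bp * total_mapped)


def lea_ratio(nonst_fpkm: float, dutp_fpkm: float) -> float:
    """log2 of the nonST estimate divided by the dUTP reference estimate."""
    if nonst_fpkm <= 0 or dutp_fpkm <= 0:
        raise ValueError("LEARatio is undefined for non-positive abundances")
    return math.log2(nonst_fpkm / dutp_fpkm)


def reference_abundance(dutp_estimates: Sequence[float]) -> float:
    """Average of the dUTP-based estimates from the comparison methods."""
    return sum(dutp_estimates) / len(dutp_estimates)


# Made-up dUTP estimates from four methods for one gene.
reference = reference_abundance([18.0, 22.0, 20.0, 20.0])
# A nonST estimate of 40 FPKM is a two-fold overestimate, i.e. a LEARatio of 1.
ratio = lea_ratio(40.0, reference)
```

On this scale a LEARatio of 1 corresponds to two-fold overestimation and a LEARatio of -1 to two-fold underestimation, which is why values near zero indicate accurate inference.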

Application on real RNA-seq data

We first applied the five commonly used methods to estimate transcript abundances, and compared the expression levels deduced from the nonST data with those deduced from the dUTP data. The scatter plots showed two distinct patterns, with one group of points concentrated around the diagonal and another group scattered along the left vertical line (Additional file 1: Figure S3), indicating a strong overestimation of expression levels, especially for genes with relatively low transcription levels. Estimates for lowly expressed genes are much more affected by the inclusion of reads transcribed from the opposite strand. In contrast with the five methods, IAOseq greatly reduced the overestimation of transcription levels for lowly transcribed overlapping genes (Figure 2A). For the correlation between expression levels deduced from nonST and dUTP data, we obtained a squared correlation coefficient of 0.61 using IAOseq, which is much greater than that of the other five methods (Additional file 1: Table S3).

Figure 2

Performances on different strand overlapping genes. Scatterplot of the expression levels estimated using IAOseq from nonST data and the average expression levels using other four methods from dUTP data (A), and percentage of genes within LEARatio intervals for the comparison between IAOseq and the five commonly used methods (B).

Compared with the five widely used methods, LEARatios of IAOseq were mostly concentrated in a narrow range close to zero with significantly lower standard deviation (Table 1, Figure 2B). IAOseq significantly reduced the overestimation of expression levels caused by the inclusion of reads transcribed from the opposite strand. Around 37% of overlapping genes are overestimated more than two-fold using IAOseq, which is less than with the other four methods, where the percentage of genes with more than two-fold overestimation is 43% using Cufflinks, 47% using RSEM, 43% using eXpress and 48% using Bitseq. The results indicate the validity of our method for improving RNA-seq data analysis.

Table 1

Summary of LEARatios for the IAOseq and the five commonly used quantification methods performed on different strand overlapping genes.

Method	Mean	Median	Standard deviation	P value (Wilcoxon test)	P value (Ansari-Bradley test)
IAOseq	0.53	0.05	5.70	---	---
Cufflinks	1.87	0.13	5.50	2.3e-08	3.6e-03
Isoem	3.22	0.65	3.70	2.2e-16	2.1e-09
RSEM	3.21	0.69	3.64	2.2e-16	9.8e-10
eXpress	-0.13	0.33	14.4	2.2e-16	4.2e-03
Bitseq	2.80	0.53	3.68	2.6e-10	1.9e-07

To test the significance of performance difference between IAOseq and the five commonly used quantification methods, we used Wilcoxon rank test for the median difference and Ansari-Bradley two-sample test for the variance difference of LEARatios.

Application on same strand overlapping genes

In the yeast genome, more than three hundred genes have same strand overlapping transcripts (Additional file 1: Table S2). When dealing with transcription signals mapped to the overlapping regions of same strand overlapping gene pairs, the most commonly used high-throughput methods for measuring gene expression, i.e. microarray or strand specific RNA-seq, can rarely distinguish which strand was present in the original mRNA template. Our proposed computational pipeline is not restricted to particular overlapping types and can be applied to correct the expression levels of same strand overlapping genes. As transcripts from same strand overlapping genes have identical sequences in the overlapping region, even the strand specific RNA-seq library construction method cannot distinguish from which gene template the transcripts were transcribed. It is therefore reasonable that little difference was observed between the expression levels deduced from nonST data and from dUTP data using the five methods (Additional file 1: Figure S4). In contrast, IAOseq results showed that the expression levels of same strand overlapping genes were much lower than the average of the values estimated by the four methods (Cufflinks, RSEM, eXpress and Bitseq) (Figure 3A, Wilcoxon test, W = 29579, p-value = 5e-07). We estimated that the direct methods for inferring gene expression levels overestimated the expression levels of same strand overlapping genes with a median of 1.61-fold (Figure 3B), and the overestimation is more pronounced in genes with low expression levels (Figure 3C).

Figure 3

Performances on same strand overlapping genes. Scatterplot of the expression levels estimated by IAOseq and the average expression level estimated by other four methods from nonST data (A), overestimation on expression levels of all genes (B) and of genes in different levels of abundance by the commonly used methods (C). Overestimation is defined as the average expression level deduced by other four methods divided by the expression level deduced by IAOseq.
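
The overestimation measure defined in this caption can be computed as in the Python sketch below. The gene names and abundance values are made up for illustration and are not results from the study.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def overestimation(other_estimates: List[float], iaoseq_estimate: float) -> float:
    """Average estimate of the other methods divided by the IAOseq estimate."""
    return mean(other_estimates) / iaoseq_estimate


# Hypothetical FPKM values: (estimates from four other methods, IAOseq estimate).
genes: Dict[str, Tuple[List[float], float]] = {
    "g1": ([32.0, 30.0, 34.0, 32.0], 20.0),
    "g2": ([10.0, 12.0, 11.0, 11.0], 11.0),
    "g3": ([9.0, 9.0, 9.0, 9.0], 3.0),
}
ratios = {name: overestimation(others, iao) for name, (others, iao) in genes.items()}
# Summarize across genes with the median, as reported for Figure 3B.
median_ratio = median(ratios.values())
```

A ratio of 1 means the other methods agree with IAOseq, and larger values mean they report a higher abundance than IAOseq does for the same gene.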

IAOseq performance on simulated data

As there are limited data with which to evaluate the accuracy of the quantification of overlapping gene expression, we further tested IAOseq on simulated data. More genes are overestimated by more than five-fold by the other five methods than by IAOseq (Additional file 1: Figure S6A). Furthermore, for overlapping genes simulated with no expression, IAOseq shows much better performance: more than 72% of these genes are estimated at a low level, whereas overestimation is pronounced using the other five methods (Additional file 1: Figure S6B).

Conclusion

In summary, this project provides a useful tool for inferring overlapping transcription levels, which aims to help us gain a comprehensive understanding of the complex regulatory mechanisms mediated by overlapping transcripts. IAOseq not only performs well in adjusting the expression levels of different strand overlapping genes from nonST data, but can also be used to estimate the expression levels of same strand overlapping genes, which is of particular interest because most high-throughput protocols cannot resolve same strand overlapping genes. IAOseq is as fast as other commonly used quantification methods. Overlapping expression is a universal feature of eukaryotic genomes, and antisense mediated regulation could be an ancient mechanism to enhance the gene expression response to genetic and environmental variation. In such a scenario, the task of inferring expression levels of overlapping genes should be integrated into gene expression profile analysis.

Availability

IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.

List of abbreviations

nonST: non-strand specific RNA-seq dataset; LEARatio: the logarithm base 2 of estimated abundance ratio.

Competing interests

The authors declare no conflict of interest.

Authors' contributions

HS and YL designed the project and directed the analysis. SY and LT implemented the algorithm. SY performed the analysis. HS drafted the manuscript. All authors read and approved the final manuscript.

Additional file 1

This file contains Figures S1-S7 and Tables S1-S3.

41 in total

1. Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Authors: Ali Mortazavi; Brian A Williams; Kenneth McCue; Lorian Schaeffer; Barbara Wold
Journal: Nat Methods Date: 2008-05-30 Impact factor: 28.547

2. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors: Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; 
Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather 
Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal: Nature Date: 2007-06-14 Impact factor: 49.962

3. Transcription of antisense RNA leading to gene silencing and methylation as a novel cause of human genetic disease.

Authors: Cristina Tufarelli; Jackie A Sloane Stanley; David Garrick; Jackie A Sharpe; Helena Ayyub; William G Wood; Douglas R Higgs
Journal: Nat Genet Date: 2003-06 Impact factor: 38.330

Review 4. RNA-Seq: a revolutionary tool for transcriptomics.

Authors: Zhong Wang; Mark Gerstein; Michael Snyder
Journal: Nat Rev Genet Date: 2009-01 Impact factor: 53.242

5. Bidirectional promoters generate pervasive transcription in yeast.

Authors: Zhenyu Xu; Wu Wei; Julien Gagneur; Fabiana Perocchi; Sandra Clauder-Münster; Jurgi Camblong; Elisa Guffanti; Françoise Stutz; Wolfgang Huber; Lars M Steinmetz
Journal: Nature Date: 2009-01-25 Impact factor: 49.962

6. The transcriptional landscape of the mammalian genome.

Authors: P Carninci; T Kasukawa; S Katayama; J Gough; M C Frith; N Maeda; R Oyama; T Ravasi; B Lenhard; C Wells; R Kodzius; K Shimokawa; V B Bajic; S E Brenner; S Batalov; A R R Forrest; M Zavolan; M J Davis; L G Wilming; V Aidinis; J E Allen; A Ambesi-Impiombato; R Apweiler; R N Aturaliya; T L Bailey; M Bansal; L Baxter; K W Beisel; T Bersano; H Bono; A M Chalk; K P Chiu; V Choudhary; A Christoffels; D R Clutterbuck; M L Crowe; E Dalla; B P Dalrymple; B de Bono; G Della Gatta; D di Bernardo; T Down; P Engstrom; M Fagiolini; G Faulkner; C F Fletcher; T Fukushima; M Furuno; S Futaki; M Gariboldi; P Georgii-Hemming; T R Gingeras; T Gojobori; R E Green; S Gustincich; M Harbers; Y Hayashi; T K Hensch; N Hirokawa; D Hill; L Huminiecki; M Iacono; K Ikeo; A Iwama; T Ishikawa; M Jakt; A Kanapin; M Katoh; Y Kawasawa; J Kelso; H Kitamura; H Kitano; G Kollias; S P T Krishnan; A Kruger; S K Kummerfeld; I V Kurochkin; L F Lareau; D Lazarevic; L Lipovich; J Liu; S Liuni; S McWilliam; M Madan Babu; M Madera; L Marchionni; H Matsuda; S Matsuzawa; H Miki; F Mignone; S Miyake; K Morris; S Mottagui-Tabar; N Mulder; N Nakano; H Nakauchi; P Ng; R Nilsson; S Nishiguchi; S Nishikawa; F Nori; O Ohara; Y Okazaki; V Orlando; K C Pang; W J Pavan; G Pavesi; G Pesole; N Petrovsky; S Piazza; J Reed; J F Reid; B Z Ring; M Ringwald; B Rost; Y Ruan; S L Salzberg; A Sandelin; C Schneider; C Schönbach; K Sekiguchi; C A M Semple; S Seno; L Sessa; Y Sheng; Y Shibata; H Shimada; K Shimada; D Silva; B Sinclair; S Sperling; E Stupka; K Sugiura; R Sultana; Y Takenaka; K Taki; K Tammoja; S L Tan; S Tang; M S Taylor; J Tegner; S A Teichmann; H R Ueda; E van Nimwegen; R Verardo; C L Wei; K Yagi; H Yamanishi; E Zabarovsky; S Zhu; A Zimmer; W Hide; C Bult; S M Grimmond; R D Teasdale; E T Liu; V Brusic; J Quackenbush; C Wahlestedt; J S Mattick; D A Hume; C Kai; D Sasaki; Y Tomaru; S Fukuda; M Kanamori-Katayama; M Suzuki; J Aoki; T Arakawa; J Iida; K Imamura; M Itoh; T Kato; H Kawaji; N Kawagashira; T Kawashima; 
M Kojima; S Kondo; H Konno; K Nakano; N Ninomiya; T Nishio; M Okada; C Plessy; K Shibata; T Shiraki; S Suzuki; M Tagami; K Waki; A Watahiki; Y Okamura-Oho; H Suzuki; J Kawai; Y Hayashizaki
Journal: Science Date: 2005-09-02 Impact factor: 47.728

7. Saccharomyces Genome Database: the genomics resource of budding yeast.

Authors: J Michael Cherry; Eurie L Hong; Craig Amundsen; Rama Balakrishnan; Gail Binkley; Esther T Chan; Karen R Christie; Maria C Costanzo; Selina S Dwight; Stacia R Engel; Dianna G Fisk; Jodi E Hirschman; Benjamin C Hitz; Kalpana Karra; Cynthia J Krieger; Stuart R Miyasato; Rob S Nash; Julie Park; Marek S Skrzypek; Matt Simison; Shuai Weng; Edith D Wong
Journal: Nucleic Acids Res Date: 2011-11-21 Impact factor: 16.971