FANTOM5 CAGE profiles of human and mouse samples.

Shuhei Noguchi¹, Takahiro Arakawa^1,2, Shiro Fukuda², Masaaki Furuno^1,2, Akira Hasegawa^1,2, Fumi Hori^1,2, Sachi Ishikawa-Kato^1,2, Kaoru Kaida², Ai Kaiho², Mutsumi Kanamori-Katayama², Tsugumi Kawashima^1,2, Miki Kojima^1,2, Atsutaka Kubosaki², Ri-Ichiroh Manabe^1,2, Mitsuyoshi Murata^1,2, Sayaka Nagao-Sato^1,2, Kenichi Nakazato², Noriko Ninomiya², Hiromi Nishiyori-Sueki^1,2, Shohei Noma^1,2, Eri Saijyo², Akiko Saka², Mizuho Sakai^1,2, Christophe Simon², Naoko Suzuki^1,2, Michihira Tagami^1,2, Shoko Watanabe^1,2, Shigehiro Yoshida², Peter Arner^3,4, Richard A Axton⁵, Magda Babina⁶, J Kenneth Baillie⁷, Timothy C Barnett^8,9, Anthony G Beckhouse¹⁰, Antje Blumenthal¹¹, Beatrice Bodega¹², Alessandro Bonetti^1,2, James Briggs¹³, Frank Brombacher^14,15,16, Ailsa J Carlisle⁷, Hans C Clevers^17,18, Carrie A Davis¹⁹, Michael Detmar²⁰, Taeko Dohi²¹, Albert S B Edge²², Matthias Edinger^23,24, Anna Ehrlund^3,4, Karl Ekwall²⁵, Mitsuhiro Endoh²⁶, Hideki Enomoto²⁷, Afsaneh Eslami²⁸, Michela Fagiolini²⁹, Lynsey Fairbairn⁷, Mary C Farach-Carson³⁰, Geoffrey J Faulkner³¹, Carmelo Ferrai³², Malcolm E Fisher⁷, Lesley M Forrester⁵, Rie Fujita³³, Jun-Ichi Furusawa²⁶, Teunis B Geijtenbeek³⁴, Thomas Gingeras¹⁹, Daniel Goldowitz³⁵, Sven Guhl⁶, Reto Guler^14,15,16, Stefano Gustincich^36,37, Thomas J Ha³⁵, Masahide Hamaguchi³⁸, Mitsuko Hara³⁹, Yuki Hasegawa^1,2, Meenhard Herlyn⁴⁰, Peter Heutink⁴¹, Kelly J Hitchens^8,13, David A Hume⁷, Tomokatsu Ikawa²⁶, Yuri Ishizu^1,2, Chieko Kai^42,43, Hiroshi Kawamoto²⁶, Yuki I Kawamura²¹, Judith S Kempfle²², Tony J Kenna⁴⁴, Juha Kere^25,45, Levon M Khachigian^46,47, Toshio Kitamura⁴⁸, Sarah Klein²⁰, S Peter Klinken⁴⁹, Alan J Knox⁵⁰, Soichi Kojima³⁹, Haruhiko Koseki²⁶, Shigeo Koyasu²⁶, Weonju Lee⁵¹, Andreas Lennartsson²⁵, Alan Mackay-Sim⁵², Niklas Mejhert^3,4, Yosuke Mizuno⁵³, Hiromasa Morikawa³⁸, Mitsuru Morimoto²⁷, Kazuyo Moro²⁶, Kelly J Morris³², Hozumi Motohashi⁵⁴, Christine L Mummery⁵⁵, Yutaka Nakachi^53,56, Fumio Nakahara⁴⁸, Toshiyuki Nakamura⁴², 
Yukio Nakamura⁵⁷, Tadasuke Nozaki⁵⁸, Soichi Ogishima⁵⁹, Naganari Ohkura³⁸, Hiroshi Ohno²⁶, Mitsuhiro Ohshima⁶⁰, Mariko Okada-Hatakeyama^26,61, Yasushi Okazaki^53,56, Valerio Orlando^12,62, Dmitry A Ovchinnikov¹³, Robert Passier⁵⁵, Margaret Patrikakis⁴⁶, Ana Pombo³², Swati Pradhan-Bhatt⁶³, Xian-Yang Qin³⁹, Michael Rehli^23,24, Patrizia Rizzu⁴¹, Sugata Roy², Antti Sajantila⁶⁴, Shimon Sakaguchi³⁸, Hiroki Sato⁴², Hironori Satoh³³, Suzana Savvi^14,15,16, Alka Saxena², Christian Schmidl²³, Claudio Schneider⁶⁵, Gundula G Schulze-Tanzil⁶⁶, Anita Schwegmann^14,15,16, Guojun Sheng⁶⁷, Jay W Shin^1,2, Daisuke Sugiyama⁶⁸, Takaaki Sugiyama⁴², Kim M Summers⁷, Naoko Takahashi², Jun Takai³³, Hiroshi Tanaka²⁸, Hideki Tatsukawa⁶⁹, Andru Tomoiu⁷, Hiroo Toyoda⁵⁴, Marc van de Wetering¹⁷, Linda M van den Berg³⁴, Roberto Verardo⁷⁰, Dipti Vijayan⁷¹, Christine A Wells⁷², Louise N Winteringham⁴⁹, Ernst Wolvetang¹³, Yoko Yamaguchi⁷³, Masayuki Yamamoto³³, Chiyo Yanagi-Mizuochi⁷⁴, Misako Yoneda⁴², Yohei Yonekura²⁷, Peter G Zhang³⁵, Silvia Zucchelli³⁶, Imad Abugessaisa¹, Erik Arner^1,2, Jayson Harshbarger^1,2, Atsushi Kondo^1,2, Timo Lassmann^1,2,75, Marina Lizio^1,2, Serkan Sahin^1,2, Thierry Sengstag², Jessica Severin^1,2, Hisashi Shimoji^2,76, Masanori Suzuki², Harukazu Suzuki^1,2, Jun Kawai^2,77, Naoto Kondo^1,2, Masayoshi Itoh^1,2,77, Carsten O Daub^1,2,25, Takeya Kasukawa¹, Hideya Kawaji^1,2,76,77, Piero Carninci^1,2, Alistair R R Forrest^1,2,49, Yoshihide Hayashizaki^2,77.

Abstract

In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The analysis pipeline started by measuring RNA extracts to assess their quality, and continued with CAGE library production using a robotic or a manual workflow, single-molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data represent the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 peaks for the human and mouse genomes respectively, were identified and annotated to provide the precise locations of known promoters as well as novel ones, and to quantify their activities.

Year: 2017 PMID: 28850106 PMCID: PMC5574368 DOI: 10.1038/sdata.2017.112

Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444

Background & Summary

Since the completion of human genome sequencing, the role of individual bases has been a central question. An international collaborative effort, FANTOM (Functional ANnoTation Of Mammalian Genome)[1], delineated a complex landscape of transcribed RNAs (the transcriptome) and their regulation. The initial key technology driving the project was the production of full-length cDNA clones, which represent the complete primary structure of transcribed RNA molecules. Sequencing of the full-length cDNA clones uncovered an unexpected number of long non-coding RNAs as well as protein-coding genes[2-6]. The CAGE (Cap Analysis Gene Expression)[7,8] protocol, in combination with high-throughput sequencing, was developed to monitor frequencies of transcription initiation by determining the 5′-ends of capped RNAs. The technology was devised to uncover the complexity of the transcriptome[4-6] and to elucidate transcriptional regulatory networks by focusing on promoter elements[9-12]. By taking advantage of a single-molecule sequencer, HeliScopeCAGE was recently developed to provide more sensitive and accurate monitoring of transcription initiation activities[7,8].

In the fifth round of the FANTOM projects, FANTOM5, the challenge was to capture the transcriptome of as many varieties of cell states as possible, to understand the implication of each genomic base in different contexts. In the first phase of the FANTOM5 project, we targeted cells in steady state, called ‘snapshot’ samples[13]. Our central focus was on human primary cells, while cell lines, tissues and mouse samples were chosen to cover cells inaccessible as isolated human primary samples. The resulting data provided an atlas of promoter and enhancer activities in a wide range of cell states[14], which serves as a baseline for understanding complex transcriptional regulation. In the second phase, we focused on transitions of cell states by monitoring ‘time course’ samples, such as activation, differentiation, and development at sequential time points[15]. The monitored activities of promoters and enhancers demonstrated that enhancer activation is the earliest event during dynamic changes of the transcriptome. These data sets are being utilized in many other studies inside and outside of the FANTOM5 consortium.

The data production scheme was implemented based on the FANTOM5 collaboration. Samples were collected at individual institutes, since specific types of samples require dedicated systems with special expertise or settings, and were also purchased from commercial sources. RNA quality was first examined at the place where the samples were obtained (the first RNA quality check). The CAGE assay pipeline established in RIKEN GeNAS (Genome Network Analysis Support Facility) employed two workflows of HeliScopeCAGE: a manual workflow for samples with a small amount of total RNA[8] and a robotic workflow for samples meeting standard requirements[7]. The assay pipeline started with a check of RNA quality (the second RNA quality check), which provides a uniform quality assessment of the profiled RNA extracts. The resulting CAGE libraries were sequenced by HeliScope at RIKEN and also at Helicos Biosciences, and the obtained data were processed by the MOIRAI system[16]. The quality of the resulting CAGE profiles was checked with several statistics as well as by manual inspection using the ZENBU browser[17]. Finally, the CAGE profiles were shared among the consortium for further analysis.

In the course of the two phases focused on ‘snapshot’ and ‘time course’ samples, we profiled 1,816 human and 1,016 mouse samples in total, and obtained approximately four million single-molecule reads successfully aligned to the genome per sample on average. Based on frequencies of the observed 5′-ends of individual capped RNA molecules at single base-pair resolution, we identified 201,802 and 158,966 peaks for human and mouse respectively, where promoters are defined as the sequence immediately upstream of the peaks and the frequencies of observed CAGE reads reflect the activities of the promoters. All data generated during the course of the project were deposited in a public repository (DDBJ Read Archive, DRA) and/or provided at the FANTOM5 web resource (http://fantom.gsc.riken.jp/5/)[18]. Here we describe the data with the processing details and quality metrics.

Methods

Sample collection

Sample collection was performed as described previously[13,15]. Briefly, primary cells were purchased as purified RNAs or frozen cells, or obtained as described previously[19-24] through collaboration in the consortium. Purchased cells were cultured according to the manufacturer’s instructions, and the miRNeasy kit (QIAGEN) was used for RNA extraction. Human post-mortem tissue RNAs were purchased or obtained through the Dutch Brain Bank. Tissues collected through the consortium were snap-frozen in liquid nitrogen, transferred into Lysing Matrix D tubes (MP Biomedicals, Santa Ana, CA) containing chilled Trizol (Gibco), homogenized with a FastPrep Homogenizer (Thermo Savant), and centrifuged. The miRNeasy kit (QIAGEN) was also used for RNA extraction from cultured cell lines as well as frozen cell line stocks. For the purchased samples, lot or catalogue numbers were recorded where available. Collected RNAs of more than 1 μg were measured with an Agilent BioAnalyzer (Agilent Technologies, Santa Clara, CA) and a Nanodrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE) to check the RIN (RNA integrity number) score and the absorbance ratios A260/A230 and A260/A280. The rest of the samples were directly subjected to CAGE library production to avoid wasting material. All 2,832 profiled samples are summarized in Table 1.

Table 1

Summary of FANTOM5 phase 1 and phase 2 samples.

Sample	Phase 1 Human	Phase 1 Mouse	Phase 2 Human	Phase 2 Mouse	Total
Cell lines	259	1	9	0	269
Fractionations	12	0	9	0	21
Primary cells	537	109	24	31	701
Timecourse samples	35	19	748	572	1,374
Tissues	150	237	33	45	465
Quality control samples	0	1	0	1	2
Total	993	367	823	649	2,832

Single molecule CAGE and data processing

HeliScopeCAGE libraries were prepared, sequenced, and processed as described previously[13,15]. Most of the RNAs were subjected to the automated HeliScopeCAGE protocol[7], except for RNAs of less than 1 μg, which were subjected to the manual protocol optimized for low-quantity RNAs[8]. The resulting libraries were measured with the OliGreen fluorescence assay kit (Life Technologies) and sequenced by following the manufacturer’s instructions (LB-016_01, LB-017_01, and LB-001_04; ref. 13). RNAs extracted from mouse whole-body embryo E17.5 (called the internal control) were systematically subjected to this workflow, one per sequencing run. The produced data were processed as previously described[13,15]. Briefly, reads corresponding to ribosomal RNA were removed by using the program rRNAdust (http://fantom.gsc.riken.jp/5/suppl/rRNAdust/), the remaining reads were aligned to the human or mouse reference genome (hg19 or mm9) by using Delve[25], and alignments with a mapping quality of less than 20 (<99% chance of being correct) or a sequence identity of less than 85% were discarded. Frequencies of the CAGE read 5′-ends were counted per CAGE tag start site (CTSS), a single base-pair position on the reference genome. The entire flow of the data is illustrated in Fig. 1, and the number of CAGE profiles (equivalent to CTSS files) is summarized in Table 2.
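The filtering and counting steps described above can be illustrated with a minimal sketch. It is not the FANTOM5 pipeline itself: the `Alignment` record, its field names, and the example reads are hypothetical, and only the two thresholds (mapping quality of at least 20, sequence identity of at least 85%) come from the text. A Phred-scaled mapping quality of 20 corresponds to a 1 - 10^(-20/10) = 99% chance that the alignment is correct.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not the Delve output format)."""
    chrom: str
    start: int        # 0-based leftmost aligned base
    end: int          # 0-based, exclusive
    strand: str       # "+" or "-"
    mapq: int         # Phred-scaled mapping quality
    identity: float   # fraction of matching bases, 0..1


def five_prime_position(aln: Alignment) -> int:
    """Return the 0-based genomic position of the read's 5' end."""
    # On the minus strand the 5' end is the rightmost aligned base.
    return aln.start if aln.strand == "+" else aln.end - 1


def count_ctss(alignments: Iterable[Alignment],
               min_mapq: int = 20,
               min_identity: float = 0.85) -> Counter:
    """Count read 5' ends per (chrom, position, strand) after filtering."""
    counts: Counter = Counter()
    for aln in alignments:
        if aln.mapq < min_mapq or aln.identity < min_identity:
            continue  # discard low-confidence alignments
        counts[(aln.chrom, five_prime_position(aln), aln.strand)] += 1
    return counts


reads = [
    Alignment("chr1", 100, 130, "+", 40, 0.97),
    Alignment("chr1", 100, 128, "+", 25, 0.90),
    Alignment("chr1", 100, 131, "+", 10, 0.99),  # discarded: MAPQ < 20
    Alignment("chr1", 70, 101, "-", 30, 0.95),
    Alignment("chr1", 100, 129, "+", 35, 0.80),  # discarded: identity < 85%
]
ctss = count_ctss(reads)
for key, n in sorted(ctss.items()):
    print(key, n)
```

Keying the counter on strand as well as position keeps initiation events on opposite strands at the same coordinate separate, which matters because a CTSS is strand-specific.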

Figure 1

Data processing scheme.

Data processing scheme from sample preparation to CAGE peak expression and annotation. Sky blue and beige indicate where the data are stored: the FANTOM5 data archive (Data Citations 1 and 10) and the DDBJ Sequence Read Archive (Data Citations 2–9), respectively.

Table 2

Sequence files (CTSS files).

Sample	Phase 1 Human	Phase 1 Mouse	Phase 2 Human	Phase 2 Mouse	Total
Cell lines	261	1	10	0	272
Fractionations	12	0	9	0	21
Primary cells	538	110	26	50	724
Timecourse samples	35	20	750	578	1,383
Tissues	152	236	36	45	469
Quality control samples	0	28	0	122	150
Total	998	395	831	795	3,019

Identification of peaks and their annotations

Non-overlapping peaks based on all the CAGE profiles were identified by using the DPI (decomposition-based peak identification, https://github.com/hkawaji/dpi1/) method and annotated as previously described[13,15]. A ‘robust’ threshold, for which a peak must include a CTSS with more than 10 read counts and 1 TPM (tags per million) in at least one sample, was employed to define a stringent subset of the CAGE peaks. The robust peaks were associated with known transcripts, such as RefSeq[26], UCSC known gene[27], GENCODE[28], Ensembl[29], and mRNAs (full-length cDNA clones), based on the proximity of their 5′-ends to the peaks. Official gene symbols, Entrez Gene IDs, and protein (UniProt) IDs associated with the transcripts were retrieved and assigned as part of the annotation. In addition to these associations, human-readable names and descriptions were assigned to each of the CAGE peaks. Peaks were given a name in the form pN@GENE, where GENE indicates the gene symbol or transcript name and N indicates the rank in the ranked list of promoter activities for that gene. For example, p1@SPI1 represents the peak with the highest number of observations (that is, read counts) in all of the FANTOM5 CAGE profiles, among the peaks associated with the SPI1 gene. Peak identification with the same method and the same threshold was performed twice: first for the ‘snapshot’ samples (phase 1), and second for all samples from both the ‘snapshot’ and ‘time course’ studies (phase 2). We integrated these two peak sets into a hybrid set consisting of all the phase 1 peaks over the robust threshold and a subset of phase 2 peaks that did not overlap with the phase 1 peaks. The annotation of phase 1 peaks was used in the hybrid set, called phase 1+2 peaks, which provides a consistent reference for the definition of promoters.
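As an illustration of the ‘robust’ threshold and the pN@GENE naming scheme, the following sketch applies both rules to toy data. It is not the DPI implementation. The data structures, function names, and numbers are hypothetical, and it assumes one reading of the threshold: a single CTSS must exceed 10 reads and reach at least 1 TPM in the same sample.

```python
from collections import defaultdict


def is_robust(ctss_counts, library_sizes, min_count=10, min_tpm=1.0):
    """Check the assumed reading of the robust threshold for one peak.

    ctss_counts:   {ctss_id: {sample: raw read count}} for the CTSSs in the peak
    library_sizes: {sample: total number of mapped reads}
    """
    for per_sample in ctss_counts.values():
        for sample, count in per_sample.items():
            tpm = count * 1e6 / library_sizes[sample]
            if count > min_count and tpm >= min_tpm:
                return True
    return False


def name_peaks(peak_totals):
    """Assign pN@GENE names, ranking peaks of each gene by total read count.

    peak_totals: {peak_id: (gene, total read count over all profiles)}
    """
    by_gene = defaultdict(list)
    for peak_id, (gene, total) in peak_totals.items():
        by_gene[gene].append((total, peak_id))
    names = {}
    for gene, peaks in by_gene.items():
        ranked = sorted(peaks, key=lambda item: item[0], reverse=True)
        for rank, (_, peak_id) in enumerate(ranked, start=1):
            names[peak_id] = f"p{rank}@{gene}"
    return names


library_sizes = {"s1": 4_000_000, "s2": 20_000_000}
peak_a = {"c1": {"s1": 12, "s2": 3}}   # 12 reads and 3 TPM in s1
peak_b = {"c2": {"s1": 2, "s2": 15}}   # 15 reads but only 0.75 TPM in s2
print(is_robust(peak_a, library_sizes), is_robust(peak_b, library_sizes))

names = name_peaks({
    "peakA": ("SPI1", 5000),
    "peakB": ("SPI1", 120),
    "peakC": ("GATA1", 900),
})
print(names)
```

The second toy peak shows why both conditions matter: in a deeply sequenced library, a read count above 10 can still fall below 1 TPM.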

Quantification of promoter activities

All the obtained CAGE profiles were subjected to peak identification, even if they had some quality issues, since all of them still represent independent observations of RNA 5′-ends. However, promoter activities (that is, expression levels of CAGE peaks) were quantified only in the samples satisfying the following criteria, since expression analysis requires reliable quantification: a RIN score greater than 6, more than 500,000 reads successfully aligned to the genome, and more than 50% of the successful alignments close to the 5′-end of a RefSeq gene model. After discarding a few CAGE profiles of low quality, read counts for individual CTSSs belonging to the same peak were summed, normalization (or scaling) factors were calculated with the RLE (Relative Log Expression)[30] method in edgeR[31], and tags per million (that is, counts per million) were computed as expression levels. The RLE normalization was first performed within the phase 1 samples. A naïve application of this method to the entire data set, consisting of phase 1 and phase 2 samples, might cause inconsistencies in expression levels between the two normalizations. To avoid this, we took the geometric mean of CAGE peak read counts across the phase 1 samples and used it as the reference expression for calculating normalization factors in the same manner as the RLE method. This enabled us to keep the expression levels of phase 1 as they were, and to adjust the expression levels of the phase 2 samples to be comparable[15].
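The idea of a fixed phase 1 reference can be sketched in plain Python. This is a simplified median-of-ratios calculation, not the edgeR code used in the study: it stops at the per-sample scaling factor and omits edgeR's further conversion to library-size-relative factors and their rescaling. Sample names and counts are hypothetical.

```python
import math
from statistics import median


def geometric_mean_reference(counts_by_sample):
    """Per-peak geometric mean across reference samples.

    Peaks with a zero count in any reference sample get None and are
    ignored later, as is usual for median-of-ratios methods.
    """
    samples = list(counts_by_sample.values())
    reference = []
    for i in range(len(samples[0])):
        values = [s[i] for s in samples]
        if all(v > 0 for v in values):
            mean_log = sum(math.log(v) for v in values) / len(values)
            reference.append(math.exp(mean_log))
        else:
            reference.append(None)
    return reference


def size_factor(counts, reference):
    """Median ratio of a sample's counts to the fixed reference."""
    ratios = [c / r for c, r in zip(counts, reference)
              if r is not None and c > 0]
    return median(ratios)


# Hypothetical peak-level read counts (four peaks per sample).
phase1 = {"p1_a": [100, 200, 50, 0], "p1_b": [200, 400, 100, 10]}
phase2 = {"p2_a": [300, 600, 150, 30]}

# The reference is built from phase 1 only, so adding phase 2 samples
# later cannot change the factors already assigned to phase 1.
reference = geometric_mean_reference(phase1)
factors = {name: size_factor(counts, reference)
           for name, counts in {**phase1, **phase2}.items()}
for name, value in factors.items():
    print(name, round(value, 4))
```

Because the reference never includes phase 2 counts, the phase 1 factors are identical whether or not phase 2 samples are present, which is the property the text describes.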

Code availability

All software used in this study is publicly available. rRNAdust, for removing ribosomal RNA reads, is available at http://fantom.gsc.riken.jp/5/suppl/rRNAdust/. The mapping software Delve is available at http://fantom.gsc.riken.jp/5/suppl/delve/. The program implementing the DPI (decomposition-based peak identification) method is available at https://github.com/hkawaji/dpi1/.

Data Records

Data record 1: Metadata

Two types of metadata are available at figshare and LSDB Archive (Data Citations 1 and 10). One is for the samples, including their origins and extracted RNA. The other is for the CAGE assay, including the results of the RNA quality check, library production, and post-processing of the CAGE tag sequences. Both are described in SDRF (Sample and Data Relationship Format)[32]. Sample metadata for human and mouse are ‘HumanSamples2.0.sdrf.xlsx’ and ‘MouseSamples2.0.sdrf.xlsx’, respectively. The metadata for the CAGE assay are available as ‘*sdrf.txt’.

Data record 2: CAGE profiles

All of the CAGE sequences, their alignment to the genomes, and CTSS frequencies are available at DDBJ DRA (DDBJ Sequence Read Archive) (Data Citations 2–9). The accession number of each file is summarized in ‘DRA*.txt’ at figshare (Data Citation 1).

Data record 3: CAGE peaks

Genomic coordinates, annotations and expression levels of the CAGE peaks are available as ‘*phase1and2combined_coord.bed.gz’, ‘*phase1and2combined_ann.txt.gz’, and ‘*phase1and2combined_tpm.osc.txt.gz’, respectively, at figshare (Data Citation 1). Genomic coordinates are in BED format, and the others are in OSCtable (Order Switchable Column table) format. Details of the OSCtable format are available at https://sourceforge.net/projects/osctf/.
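A minimal sketch of reading these two kinds of files is given below. It assumes that the BED files follow the standard column order (chrom, start, end, name, score, strand) and that the OSCtable files are tab-separated with header comment lines starting with ‘##’ before a column-name row. That layout is an assumption here, not a restatement of the format specification. The in-memory example data, peak names, and column names are hypothetical.

```python
import csv
import io


def read_bed(handle):
    """Parse the first six standard BED columns from an open text handle."""
    peaks = []
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/track lines
        fields = line.rstrip("\n").split("\t")
        peaks.append({
            "chrom": fields[0],
            "start": int(fields[1]),   # 0-based, inclusive
            "end": int(fields[2]),     # 0-based, exclusive
            "name": fields[3],
            "strand": fields[5],
        })
    return peaks


def read_table(handle, comment_prefix="##"):
    """Read a tab-separated table, skipping assumed '##' comment lines."""
    rows = (line for line in handle if not line.startswith(comment_prefix))
    return list(csv.DictReader(rows, delimiter="\t"))


# Hypothetical in-memory stand-ins for the downloaded files.
bed_text = "chr1\t100\t120\tpeak_1\t0\t+\nchr2\t500\t510\tpeak_2\t0\t-\n"
table_text = (
    "## hypothetical header comment\n"
    "id\ttpm.sampleA\n"
    "peak_1\t12.5\n"
    "peak_2\t0.0\n"
)

peaks = read_bed(io.StringIO(bed_text))
rows = read_table(io.StringIO(table_text))
print(peaks[0]["name"], peaks[0]["end"] - peaks[0]["start"])
print(rows[0]["id"], float(rows[0]["tpm.sampleA"]))
```

For the gzip-compressed files, the same functions can be given a handle from `gzip.open(path, "rt")` in place of `io.StringIO`.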

Technical Validation

RNA quality

The RNA qualities measured at the second check (that is, immediately before CAGE library production) are shown in Fig. 2a–c. The RNA Integrity Number (RIN) score, measured using an Agilent Bioanalyzer, was 8.96 on average (standard deviation 1.19), and the absorbance ratios at 260/230 nm (A260/A230) and 260/280 nm (A260/A280) were on average 2.01 (standard deviation 0.53) and 2.13 (standard deviation 0.14), respectively. These figures indicate that the majority of the RNAs were of good quality when processed.

Figure 2

RNA and mapping quality control.

Distribution of RIN score (a), A260/A230 (b), A260/A280 (c), mapped reads (d), and promoter rate (e) for samples used for FANTOM5 expression analysis.

Mapped reads

The number of CAGE reads successfully aligned to the genome and the ratio of CAGE reads hitting conventional promoters are shown in Fig. 2d,e. The average number of mapped reads is 4,208,291 per CAGE profile. Of the 2,522 profiles, 98.3% (2,478) consist of at least 500,000 successfully aligned reads, which was a criterion for profiles used in expression analysis[13]. The average ratio of promoter-hitting reads is 76.5%, and 98.6% of all the profiles (2,437/2,472) have a promoter-hitting rate of more than 50%, which was another criterion for profiles used in expression analysis[13].
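The two mapping criteria and the reported percentages can be checked with a short sketch. The profile records and field names are hypothetical. Only the thresholds (at least 500,000 aligned reads, promoter-hitting rate above 50%) and the counts 2,478/2,522 and 2,437/2,472 are taken from the text.

```python
def passes_mapping_qc(profile, min_reads=500_000, min_promoter_rate=0.50):
    """Apply the two mapping-based criteria quoted in this section."""
    return (profile["mapped_reads"] >= min_reads
            and profile["promoter_rate"] > min_promoter_rate)


# Hypothetical profiles; promoter_rate is a fraction between 0 and 1.
profiles = [
    {"id": "A", "mapped_reads": 4_200_000, "promoter_rate": 0.78},
    {"id": "B", "mapped_reads": 320_000, "promoter_rate": 0.81},    # too few reads
    {"id": "C", "mapped_reads": 2_900_000, "promoter_rate": 0.42},  # low promoter rate
    {"id": "D", "mapped_reads": 500_000, "promoter_rate": 0.66},
]
kept = [p["id"] for p in profiles if passes_mapping_qc(p)]
print(kept)

# Reproduce the percentages reported in the text from the stated counts.
pct_reads = round(100 * 2478 / 2522, 1)
pct_promoter = round(100 * 2437 / 2472, 1)
print(pct_reads, pct_promoter)
```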

Sample identity

Hierarchical clustering of the 126 mouse primary cells[13] within phase 1 is shown in Fig. 3, and the same clustering of the 571 human primary cells[13] is shown in Supplementary Fig. 1. The average linkage method was applied to log-scale expression (TPM) profiles at the promoter level, and sample identities were assessed by the expression of marker genes and also by manual inspection of the hierarchical clustering. The figures show that the majority of biological replicates belonged to the same branch of the tree, that is, the same cluster, except for samples with a low number of mapped reads.
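To make the clustering step concrete, here is a small pure-Python sketch of average-linkage agglomerative clustering on log-scale TPM profiles. The text does not state the distance measure or the log transform details, so the Euclidean distance and the pseudocount of 1 are assumptions. The sample names and TPM values are hypothetical.

```python
import math
from itertools import combinations


def log_profile(tpm, pseudocount=1.0):
    """Log2-transform a TPM profile (the pseudocount is an assumption)."""
    return [math.log2(v + pseudocount) for v in tpm]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: {sample name: list of log-scale expression values}
    Returns merge steps as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of sample names.
    """
    names = list(profiles)
    dist = {frozenset(pair): euclidean(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}

    def linkage(pair):
        # Mean pairwise distance between the members of two clusters.
        a, b = pair
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical promoter-level TPM values for two replicate pairs.
tpm = {
    "macrophage_rep1": [120.0, 3.0, 0.0],
    "macrophage_rep2": [110.0, 4.0, 0.0],
    "neuron_rep1": [2.0, 90.0, 40.0],
    "neuron_rep2": [1.0, 95.0, 35.0],
}
merges = average_linkage({name: log_profile(v) for name, v in tpm.items()})
for a, b, d in merges:
    print(a, b, round(d, 3))
```

In this toy input the replicates of each cell type merge with each other before the two cell types are joined, which is the pattern used in the text as evidence of sample identity.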

Figure 3

Hierarchical clustering of primary cells.

Hierarchical clustering of primary cell samples of mouse based on logarithm of expression (TPM). Color shows anatomical categories of samples.

Usage Notes

In addition to providing access to individual data files, we set up a series of interfaces as described in the FANTOM web resource[18,33]. TET (Table Extraction Tool) provides an interface to obtain a subset of data by specifying the desired columns and rows. The BioMart interface[34] and FANTOM5 SSTAR (Semantic catalog of Samples, Transcription initiation And Regulators) provide the metadata of the profiled samples[35]. The CAGE profiles on the genomic axis can be viewed in ZENBU[17] through its interactive interface and also in the UCSC genome browser[36] via a track data hub[37].

Additional Information

How to cite this article: Noguchi, S. et al. FANTOM5 CAGE profiles of human and mouse samples. Sci. Data 4:170112 doi: 10.1038/sdata.2017.112 (2017). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

37 in total

1. Unamplified cap analysis of gene expression on a single-molecule sequencer.

Authors: Mutsumi Kanamori-Katayama; Masayoshi Itoh; Hideya Kawaji; Timo Lassmann; Shintaro Katayama; Miki Kojima; Nicolas Bertin; Ai Kaiho; Noriko Ninomiya; Carsten O Daub; Piero Carninci; Alistair R R Forrest; Yoshihide Hayashizaki
Journal: Genome Res Date: 2011-05-19 Impact factor: 9.043

2. Functional annotation of a full-length mouse cDNA collection.

Authors: J Kawai; A Shinagawa; K Shibata; M Yoshino; M Itoh; Y Ishii; T Arakawa; A Hara; Y Fukunishi; H Konno; J Adachi; S Fukuda; K Aizawa; M Izawa; K Nishi; H Kiyosawa; S Kondo; I Yamanaka; T Saito; Y Okazaki; T Gojobori; H Bono; T Kasukawa; R Saito; K Kadota; H Matsuda; M Ashburner; S Batalov; T Casavant; W Fleischmann; T Gaasterland; C Gissi; B King; H Kochiwa; P Kuehl; S Lewis; Y Matsuo; I Nikaido; G Pesole; J Quackenbush; L M Schriml; F Staubli; R Suzuki; M Tomita; L Wagner; T Washio; K Sakai; T Okido; M Furuno; H Aono; R Baldarelli; G Barsh; J Blake; D Boffelli; N Bojunga; P Carninci; M F de Bonaldo; M J Brownstein; C Bult; C Fletcher; M Fujita; M Gariboldi; S Gustincich; D Hill; M Hofmann; D A Hume; M Kamiya; N H Lee; P Lyons; L Marchionni; J Mashima; J Mazzarelli; P Mombaerts; P Nordone; B Ring; M Ringwald; I Rodriguez; N Sakamoto; H Sasaki; K Sato; C Schönbach; T Seya; Y Shibata; K F Storch; H Suzuki; K Toyo-oka; K H Wang; C Weitz; C Whittaker; L Wilming; A Wynshaw-Boris; K Yoshida; Y Hasegawa; H Kawaji; S Kohtsuki; Y Hayashizaki
Journal: Nature Date: 2001-02-08 Impact factor: 49.962

3. Hepatocyte growth factor promotes lymphatic vessel formation and function.

Authors: Kentaro Kajiya; Satoshi Hirakawa; Beijia Ma; Ines Drinnenberg; Michael Detmar
Journal: EMBO J Date: 2005-07-28 Impact factor: 11.598

4. The transcriptional landscape of the mammalian genome.

Authors: P Carninci; T Kasukawa; S Katayama; J Gough; M C Frith; N Maeda; R Oyama; T Ravasi; B Lenhard; C Wells; R Kodzius; K Shimokawa; V B Bajic; S E Brenner; S Batalov; A R R Forrest; M Zavolan; M J Davis; L G Wilming; V Aidinis; J E Allen; A Ambesi-Impiombato; R Apweiler; R N Aturaliya; T L Bailey; M Bansal; L Baxter; K W Beisel; T Bersano; H Bono; A M Chalk; K P Chiu; V Choudhary; A Christoffels; D R Clutterbuck; M L Crowe; E Dalla; B P Dalrymple; B de Bono; G Della Gatta; D di Bernardo; T Down; P Engstrom; M Fagiolini; G Faulkner; C F Fletcher; T Fukushima; M Furuno; S Futaki; M Gariboldi; P Georgii-Hemming; T R Gingeras; T Gojobori; R E Green; S Gustincich; M Harbers; Y Hayashi; T K Hensch; N Hirokawa; D Hill; L Huminiecki; M Iacono; K Ikeo; A Iwama; T Ishikawa; M Jakt; A Kanapin; M Katoh; Y Kawasawa; J Kelso; H Kitamura; H Kitano; G Kollias; S P T Krishnan; A Kruger; S K Kummerfeld; I V Kurochkin; L F Lareau; D Lazarevic; L Lipovich; J Liu; S Liuni; S McWilliam; M Madan Babu; M Madera; L Marchionni; H Matsuda; S Matsuzawa; H Miki; F Mignone; S Miyake; K Morris; S Mottagui-Tabar; N Mulder; N Nakano; H Nakauchi; P Ng; R Nilsson; S Nishiguchi; S Nishikawa; F Nori; O Ohara; Y Okazaki; V Orlando; K C Pang; W J Pavan; G Pavesi; G Pesole; N Petrovsky; S Piazza; J Reed; J F Reid; B Z Ring; M Ringwald; B Rost; Y Ruan; S L Salzberg; A Sandelin; C Schneider; C Schönbach; K Sekiguchi; C A M Semple; S Seno; L Sessa; Y Sheng; Y Shibata; H Shimada; K Shimada; D Silva; B Sinclair; S Sperling; E Stupka; K Sugiura; R Sultana; Y Takenaka; K Taki; K Tammoja; S L Tan; S Tang; M S Taylor; J Tegner; S A Teichmann; H R Ueda; E van Nimwegen; R Verardo; C L Wei; K Yagi; H Yamanishi; E Zabarovsky; S Zhu; A Zimmer; W Hide; C Bult; S M Grimmond; R D Teasdale; E T Liu; V Brusic; J Quackenbush; C Wahlestedt; J S Mattick; D A Hume; C Kai; D Sasaki; Y Tomaru; S Fukuda; M Kanamori-Katayama; M Suzuki; J Aoki; T Arakawa; J Iida; K Imamura; M Itoh; T Kato; H Kawaji; N Kawagashira; T Kawashima; 
M Kojima; S Kondo; H Konno; K Nakano; N Ninomiya; T Nishio; M Okada; C Plessy; K Shibata; T Shiraki; S Suzuki; M Tagami; K Waki; A Watahiki; Y Okamura-Oho; H Suzuki; J Kawai; Y Hayashizaki
Journal: Science Date: 2005-09-02 Impact factor: 47.728

5. A promoter-level mammalian expression atlas.

Authors: Alistair R R Forrest; Hideya Kawaji; Michael Rehli; J Kenneth Baillie; Michiel J L de Hoon; Vanja Haberle; Timo Lassmann; Ivan V Kulakovskiy; Marina Lizio; Masayoshi Itoh; Robin Andersson; Christopher J Mungall; Terrence F Meehan; Sebastian Schmeier; Nicolas Bertin; Mette Jørgensen; Emmanuel Dimont; Erik Arner; Christian Schmidl; Ulf Schaefer; Yulia A Medvedeva; Charles Plessy; Morana Vitezic; Jessica Severin; Colin A Semple; Yuri Ishizu; Robert S Young; Margherita Francescatto; Intikhab Alam; Davide Albanese; Gabriel M Altschuler; Takahiro Arakawa; John A C Archer; Peter Arner; Magda Babina; Sarah Rennie; Piotr J Balwierz; Anthony G Beckhouse; Swati Pradhan-Bhatt; Judith A Blake; Antje Blumenthal; Beatrice Bodega; Alessandro Bonetti; James Briggs; Frank Brombacher; A Maxwell Burroughs; Andrea Califano; Carlo V Cannistraci; Daniel Carbajo; Yun Chen; Marco Chierici; Yari Ciani; Hans C Clevers; Emiliano Dalla; Carrie A Davis; Michael Detmar; Alexander D Diehl; Taeko Dohi; Finn Drabløs; Albert S B Edge; Matthias Edinger; Karl Ekwall; Mitsuhiro Endoh; Hideki Enomoto; Michela Fagiolini; Lynsey Fairbairn; Hai Fang; Mary C Farach-Carson; Geoffrey J Faulkner; Alexander V Favorov; Malcolm E Fisher; Martin C Frith; Rie Fujita; Shiro Fukuda; Cesare Furlanello; Masaaki Furino; Jun-ichi Furusawa; Teunis B Geijtenbeek; Andrew P Gibson; Thomas Gingeras; Daniel Goldowitz; Julian Gough; Sven Guhl; Reto Guler; Stefano Gustincich; Thomas J Ha; Masahide Hamaguchi; Mitsuko Hara; Matthias Harbers; Jayson Harshbarger; Akira Hasegawa; Yuki Hasegawa; Takehiro Hashimoto; Meenhard Herlyn; Kelly J Hitchens; Shannan J Ho Sui; Oliver M Hofmann; Ilka Hoof; Furni Hori; Lukasz Huminiecki; Kei Iida; Tomokatsu Ikawa; Boris R Jankovic; Hui Jia; Anagha Joshi; Giuseppe Jurman; Bogumil Kaczkowski; Chieko Kai; Kaoru Kaida; Ai Kaiho; Kazuhiro Kajiyama; Mutsumi Kanamori-Katayama; Artem S Kasianov; Takeya Kasukawa; Shintaro Katayama; Sachi Kato; Shuji Kawaguchi; Hiroshi Kawamoto; Yuki I Kawamura; 
Tsugumi Kawashima; Judith S Kempfle; Tony J Kenna; Juha Kere; Levon M Khachigian; Toshio Kitamura; S Peter Klinken; Alan J Knox; Miki Kojima; Soichi Kojima; Naoto Kondo; Haruhiko Koseki; Shigeo Koyasu; Sarah Krampitz; Atsutaka Kubosaki; Andrew T Kwon; Jeroen F J Laros; Weonju Lee; Andreas Lennartsson; Kang Li; Berit Lilje; Leonard Lipovich; Alan Mackay-Sim; Ri-ichiroh Manabe; Jessica C Mar; Benoit Marchand; Anthony Mathelier; Niklas Mejhert; Alison Meynert; Yosuke Mizuno; David A de Lima Morais; Hiromasa Morikawa; Mitsuru Morimoto; Kazuyo Moro; Efthymios Motakis; Hozumi Motohashi; Christine L Mummery; Mitsuyoshi Murata; Sayaka Nagao-Sato; Yutaka Nakachi; Fumio Nakahara; Toshiyuki Nakamura; Yukio Nakamura; Kenichi Nakazato; Erik van Nimwegen; Noriko Ninomiya; Hiromi Nishiyori; Shohei Noma; Shohei Noma; Tadasuke Noazaki; Soichi Ogishima; Naganari Ohkura; Hiroko Ohimiya; Hiroshi Ohno; Mitsuhiro Ohshima; Mariko Okada-Hatakeyama; Yasushi Okazaki; Valerio Orlando; Dmitry A Ovchinnikov; Arnab Pain; Robert Passier; Margaret Patrikakis; Helena Persson; Silvano Piazza; James G D Prendergast; Owen J L Rackham; Jordan A Ramilowski; Mamoon Rashid; Timothy Ravasi; Patrizia Rizzu; Marco Roncador; Sugata Roy; Morten B Rye; Eri Saijyo; Antti Sajantila; Akiko Saka; Shimon Sakaguchi; Mizuho Sakai; Hiroki Sato; Suzana Savvi; Alka Saxena; Claudio Schneider; Erik A Schultes; Gundula G Schulze-Tanzil; Anita Schwegmann; Thierry Sengstag; Guojun Sheng; Hisashi Shimoji; Yishai Shimoni; Jay W Shin; Christophe Simon; Daisuke Sugiyama; Takaai Sugiyama; Masanori Suzuki; Naoko Suzuki; Rolf K Swoboda; Peter A C 't Hoen; Michihira Tagami; Naoko Takahashi; Jun Takai; Hiroshi Tanaka; Hideki Tatsukawa; Zuotian Tatum; Mark Thompson; Hiroo Toyodo; Tetsuro Toyoda; Elvind Valen; Marc van de Wetering; Linda M van den Berg; Roberto Verado; Dipti Vijayan; Ilya E Vorontsov; Wyeth W Wasserman; Shoko Watanabe; Christine A Wells; Louise N Winteringham; Ernst Wolvetang; Emily J Wood; Yoko Yamaguchi; Masayuki 
Yamamoto; Misako Yoneda; Yohei Yonekura; Shigehiro Yoshida; Susan E Zabierowski; Peter G Zhang; Xiaobei Zhao; Silvia Zucchelli; Kim M Summers; Harukazu Suzuki; Carsten O Daub; Jun Kawai; Peter Heutink; Winston Hide; Tom C Freeman; Boris Lenhard; Vladimir B Bajic; Martin S Taylor; Vsevolod J Makeev; Albin Sandelin; David A Hume; Piero Carninci; Yoshihide Hayashizaki
Journal: Nature Date: 2014-03-27 Impact factor: 49.962

6. Ensembl 2011.

Authors: Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Leo Gordon; Maurice Hendrix; Thibaut Hourlier; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Eugene Kulesha; Pontus Larsson; Ian Longden; William McLaren; Bert Overduin; Bethan Pritchard; Harpreet Singh Riat; Daniel Rios; Graham R S Ritchie; Magali Ruffier; Michael Schuster; Daniel Sobral; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Jana Vandrovcova; Albert J Vilella; Simon White; Steven P Wilder; Amonida Zadissa; Jorge Zamora; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; Jan Vogel; Stephen M J Searle
Journal: Nucleic Acids Res Date: 2010-11-02 Impact factor: 16.971

7. Automated workflow for preparation of cDNA for cap analysis of gene expression on a single molecule sequencer.

Authors: Masayoshi Itoh; Miki Kojima; Sayaka Nagao-Sato; Eri Saijo; Timo Lassmann; Mutsumi Kanamori-Katayama; Ai Kaiho; Marina Lizio; Hideya Kawaji; Piero Carninci; Alistair R R Forrest; Yoshihide Hayashizaki
Journal: PLoS One Date: 2012-01-30 Impact factor: 3.240

8. The BioMart community portal: an innovative alternative to large, centralized data repositories.

Authors: Damian Smedley; Syed Haider; Steffen Durinck; Luca Pandini; Paolo Provero; James Allen; Olivier Arnaiz; Mohammad Hamza Awedh; Richard Baldock; Giulia Barbiera; Philippe Bardou; Tim Beck; Andrew Blake; Merideth Bonierbale; Anthony J Brookes; Gabriele Bucci; Iwan Buetti; Sarah Burge; Cédric Cabau; Joseph W Carlson; Claude Chelala; Charalambos Chrysostomou; Davide Cittaro; Olivier Collin; Raul Cordova; Rosalind J Cutts; Erik Dassi; Alex Di Genova; Anis Djari; Anthony Esposito; Heather Estrella; Eduardo Eyras; Julio Fernandez-Banet; Simon Forbes; Robert C Free; Takatomo Fujisawa; Emanuela Gadaleta; Jose M Garcia-Manteiga; David Goodstein; Kristian Gray; José Afonso Guerra-Assunção; Bernard Haggarty; Dong-Jin Han; Byung Woo Han; Todd Harris; Jayson Harshbarger; Robert K Hastings; Richard D Hayes; Claire Hoede; Shen Hu; Zhi-Liang Hu; Lucie Hutchins; Zhengyan Kan; Hideya Kawaji; Aminah Keliet; Arnaud Kerhornou; Sunghoon Kim; Rhoda Kinsella; Christophe Klopp; Lei Kong; Daniel Lawson; Dejan Lazarevic; Ji-Hyun Lee; Thomas Letellier; Chuan-Yun Li; Pietro Lio; Chu-Jun Liu; Jie Luo; Alejandro Maass; Jerome Mariette; Thomas Maurel; Stefania Merella; Azza Mostafa Mohamed; Francois Moreews; Ibounyamine Nabihoudine; Nelson Ndegwa; Céline Noirot; Cristian Perez-Llamas; Michael Primig; Alessandro Quattrone; Hadi Quesneville; Davide Rambaldi; James Reecy; Michela Riba; Steven Rosanoff; Amna Ali Saddiq; Elisa Salas; Olivier Sallou; Rebecca Shepherd; Reinhard Simon; Linda Sperling; William Spooner; Daniel M Staines; Delphine Steinbach; Kevin Stone; Elia Stupka; Jon W Teague; Abu Z Dayem Ullah; Jun Wang; Doreen Ware; Marie Wong-Erasmus; Ken Youens-Clark; Amonida Zadissa; Shi-Jian Zhang; Arek Kasprzyk
Journal: Nucleic Acids Res Date: 2015-04-20 Impact factor: 16.971

9. FANTOM5 transcriptome catalog of cellular states based on Semantic MediaWiki.

Authors: Imad Abugessaisa; Hisashi Shimoji; Serkan Sahin; Atsushi Kondo; Jayson Harshbarger; Marina Lizio; Yoshihide Hayashizaki; Piero Carninci; Alistair Forrest; Takeya Kasukawa; Hideya Kawaji
Journal: Database (Oxford) Date: 2016-07-09 Impact factor: 3.451

10. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors: Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal: Bioinformatics Date: 2009-11-11 Impact factor: 6.937

53 in total

1. SLIT3 deficiency attenuates pressure overload-induced cardiac fibrosis and remodeling.

Authors: Lianghui Gong; Shuyun Wang; Li Shen; Catherine Liu; Mena Shenouda; Baolei Li; Xiaoxiao Liu; John A Shaw; Alan L Wineman; Yifeng Yang; Dingding Xiong; Anne Eichmann; Sylvia M Evans; Stephen J Weiss; Ming-Sing Si
Journal: JCI Insight Date: 2020-06-18

2. Translation-Focused Approaches to GPCR Drug Discovery for Cognitive Impairments Associated with Schizophrenia. (Review)

Authors: Cassandra J Hatzipantelis; Monica Langiu; Teresa H Vandekolk; Tracie L Pierce; Jess Nithianantharajah; Gregory D Stewart; Christopher J Langmead
Journal: ACS Pharmacol Transl Sci Date: 2020-10-28

3. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data.

Authors: Oscar Franzén; Li-Ming Gan; Johan L M Björkegren
Journal: Database (Oxford) Date: 2019-01-01 Impact factor: 3.451

4. Robust T cell activation requires an eIF3-driven burst in T cell receptor translation.

Authors: Dasmanthie De Silva; Lucas Ferguson; Grant H Chin; Benjamin E Smith; Ryan A Apathy; Theodore L Roth; Franziska Blaeschke; Marek Kudla; Alexander Marson; Nicholas T Ingolia; Jamie H D Cate
Journal: Elife Date: 2021-12-31 Impact factor: 8.140

5. RegEl corpus: identifying DNA regulatory elements in the scientific literature.

Authors: Samuele Garda; Freyda Lenihan-Geels; Sebastian Proft; Stefanie Hochmuth; Markus Schülke; Dominik Seelow; Ulf Leser
Journal: Database (Oxford) Date: 2022-06-27 Impact factor: 4.462

6. Contraceptive and Infertility Target DataBase: a contraceptive drug development tool for targeting and analysis of human reproductive specific tissues†.

Authors: Subarna Sinha; Merrill Knapp; John Pywtorak; Greg McCain; Kenneth Wingerden; Colin VanDervoort; J Mark Gondek; Peter Madrid; Toufan Parman; Stephen Gerrard; Jill E Long; Diana L Blithe; Stuart Moss; Min S Lee
Journal: Biol Reprod Date: 2021-12-20 Impact factor: 4.161

7. Integrating convolution and self-attention improves language model of human genome for interpreting non-coding regions at base-resolution.

Authors: Meng Yang; Lichao Huang; Haiping Huang; Hui Tang; Nan Zhang; Huanming Yang; Jihong Wu; Feng Mu
Journal: Nucleic Acids Res Date: 2022-08-12 Impact factor: 19.160

8. Pervasive and CpG-dependent promoter-like characteristics of transcribed enhancers.

Authors: Robin Steinhaus; Tonatiuh Gonzalez; Dominik Seelow; Peter N Robinson
Journal: Nucleic Acids Res Date: 2020-06-04 Impact factor: 16.971

9. Tissue-specific and transcription-dependent mechanisms regulate primary microRNA processing efficiency of the human chromosome 19 MicroRNA cluster.

Authors: Ábel Fóthi; Orsolya Biró; Zsuzsa Erdei; Ágota Apáti; Tamás I Orbán
Journal: RNA Biol Date: 2020-10-23 Impact factor: 4.652

10. Learning a genome-wide score of human-mouse conservation at the functional genomics level.

Authors: Soo Bin Kwon; Jason Ernst
Journal: Nat Commun Date: 2021-05-03 Impact factor: 14.919