ENCODE whole-genome data in the UCSC Genome Browser: update 2012.

Kate R Rosenbloom, Timothy R Dreszer, Jeffrey C Long, Venkat S Malladi, Cricket A Sloan, Brian J Raney, Melissa S Cline, Donna Karolchik, Galt P Barber, Hiram Clawson, Mark Diekhans, Pauline A Fujita, Mary Goldman, Robert C Gravell, Rachel A Harte, Angie S Hinrichs, Vanessa M Kirkup, Robert M Kuhn, Katrina Learned, Morgan Maddren, Laurence R Meyer, Andy Pohl, Brooke Rhead, Matthew C Wong, Ann S Zweig, David Haussler, W James Kent.

Abstract

The Encyclopedia of DNA Elements (ENCODE) Consortium is entering its 5th year of production-level effort generating high-quality whole-genome functional annotations of the human genome. The past year has brought the ENCODE compendium of functional elements to critical mass, with a diverse set of 27 biochemical assays now covering 200 distinct human cell types. Within the mouse genome, which has been under study by ENCODE groups for the past 2 years, 37 cell types have been assayed. Over 2000 individual experiments have been completed and submitted to the Data Coordination Center for public use. UCSC makes this data available on the quality-reviewed public Genome Browser (http://genome.ucsc.edu) and on an early-access Preview Browser (http://genome-preview.ucsc.edu). Visual browsing, data mining and download of raw and processed data files are all supported. An ENCODE portal (http://encodeproject.org) provides specialized tools and information about the ENCODE data sets.

Year: 2011 PMID: 22075998 PMCID: PMC3245183 DOI: 10.1093/nar/gkr1012

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Following a 4-year pilot phase aimed at identifying functional elements in selected regions comprising 1% of the human genome (1–2), the Encyclopedia of DNA Elements (ENCODE) project expanded to a whole-genome scope in September 2007 (3). Now beginning the 5th year of its mission to explore the ‘dark matter’ of the human genome, ENCODE contains an unprecedented range of diverse genomic data. With additional NHGRI support from the federal American Recovery and Reinvestment Act of 2009, complementary study of the mouse genome by ENCODE groups is underway. Previous manuscripts in this publication (4–5) have described the overall project and how the ENCODE Data Coordination Center at the University of California, Santa Cruz works with ENCODE labs worldwide to import their data sets, supporting documentation and metadata, and to make the data accessible to the broader biomedical community. A companion paper in this issue, ‘The UCSC Genome Browser database: Extensions and updates 2012’, provides background information about the UCSC Genome Browser database and infrastructure (6–7) that underlies ENCODE support at UCSC. This article focuses on ENCODE data and access tools introduced in 2011.

NEW DATA AVAILABILITY

With the increasing flood of ENCODE data production and the inevitable delays during quality review of submitted data, demand arose for an early-access site for pre-reviewed data. In February 2011, UCSC deployed a Preview Browser (http://genome-preview.ucsc.edu) to serve this function. The Preview Browser is a weekly mirror of the UCSC internal development server. Data is made available on this site with the caveat that it is subject to change and has undergone only cursory review.

The year 2011 marked the first release of Mouse ENCODE data to the public. The Mouse ENCODE project serves to complement the Human ENCODE project, furthering the understanding of human functional elements through comparative analysis. Mouse experiments aim to be analogous to those in the Human ENCODE project, and also to address experimental conditions not feasible in human, such as genetic knockouts and embryonic tissues. On the public UCSC server this year, we released mouse ENCODE results identifying transcription factor binding sites and histone marks by ChIP-seq, regions of transcription by RNA-seq, and open chromatin by DNase-seq. Data sets representing these functional elements in additional cell and tissue types, developmental stages and treatment conditions are hosted on the Preview Browser in preparation for quality review.

During the previous year the ENCODE Consortium undertook a coordinated effort to remap and re-analyze all data sets from the initial phase of data production (referenced to the March 2006 NCBI36/hg18 human genome assembly) to the current standard human reference genome (February 2009 GRCh37/hg19). At the same time, data file formats were transitioned to newer standards [BAM (8) and bigWig/bigBed (9)]. The hg19 versions of all ENCODE data are now available at UCSC.

The ENCODE human data repertoire expanded with the addition of 90 cell types (for a total of 235) and 57 transcription factors and histone modifications assayed (for a total of 177). Table 1 shows how data sets are distributed across the most intensively studied cell types.

Table 1.

ENCODE experiments in the human genome are focused on a set of cell lines selected by the Consortium for intensive study

Cell lines	Karyotype	Tissue	Description	Datasets
Tier 1
GM12878	Normal	Blood	Lymphoblastoid	166
H1-hESC	Normal	Embryonic stem	Embryonic stem	89
K562	Cancer	Blood	Leukemia	253
Tier 2 existing
HeLa-S3	Cancer	Uterine cervix	Cervical carcinoma	118
HepG2	Cancer	Liver	Liver carcinoma	135
HUVEC	Normal	Umbilical endothelium	Umbilical vein endothelial	54
Tier 2 added in 2011
A549	Cancer	Lung	Lung carcinoma	35
CD14+	Normal	Blood	Monocyte	2
IMR90	Normal	Lung	Lung fibroblast	3
MCF-7	Cancer	Breast	Breast carcinoma	33
SK-N-SH	Cancer	Brain	Neuroblastoma	25
Tier 3
219 additional				928 total

All assays are performed in Tier 1; Tier 2 cell types are designated as the next level of priority.

New types of data provided by UCSC this year include chromatin interaction maps by 5C (10) and ChIA-PET (11), nucleosome positioning by MNase-seq, deep-sequenced DNaseI hypersensitive sites, SNP data for cell lines assayed for copy number variation, and three additional assays of RNA-binding proteins.

The Gencode Gene set (12) has been updated to version 7 (May 2011). This version features 25% more manual annotation, along with improved organization and display of the annotation to make it more intuitive to biologists. Details pages for the annotated elements show evidence used to build the annotation such as UniProt (13), CCDS (14), RefSeq (15) and GenBank (16) sequences, and PubMed IDs for published experimental evidence.

A notable addition this year was the first proteomics data within ENCODE. The new proteogenomics track features mappings of tandem mass spectrometry peptide profiles to the genome (17), complementing transcriptional evidence from RNA-based assays. The scope of DNA-binding site identification has been expanded by the introduction of epitope tagging of proteins (18) where antibodies suitable for chromatin immunoprecipitation are not available.

This year also featured two new integrative tracks provided by ENCODE analysts: a segmentation of the genome into 15 states based on the chromatin state in 9 cell lines (19) and a synthesis of multiple sources of the open chromatin state in 7 cell lines. As integrative analysis is now a major focus of Consortium efforts, more analysis tracks integrating function across primary data sets are expected in the coming year. Table 2 lists the number of data sets currently available for each ENCODE data type.

Table 2.

ENCODE encompasses a diverse set of assays

Data type	No. of experiments
Chromatin Interactions
5C	4
ChIA-PET	6
DNA methylation
Methyl array	63
Methyl RRBS	93
Methyl-seq	20
Histone modifications
ChIP-seq	221
ChIP-seq (MOUSE)	28
Open chromatin
DNase-DGF	19
DNase-seq	135
DNase-seq (MOUSE)	27
FAIRE-seq	27
RNA profiling
CAGE	45
Exon array	120
RNA-chip	26
RNA-PET	22
RNA-seq	151
RNA-seq (MOUSE)	27
Transcription factor binding sites
Epitope-tag ChIP-seq	12
ChIP-seq	745
ChIP-seq (MOUSE)	92
Other
Bi-directional promoters	1
DNA cleavage	1
DNA-PET	6
Gencode genes	5
Genotype	64
Negative regulatory elements	2
Nucleosome positioning	2
Proteogenomics	5
RNA binding proteins	49
Short read mapability	13

Descriptive overviews along with methods and references are included in the description page that accompanies all datasets.

Validation data sets to accompany primary data sets are now available for open chromatin and transcription factor binding site experiments.

NEW ACCESS INFORMATION AND TOOLS

The ENCODE portal (http://encodeproject.org), which is the centralized resource for accessing the information and tools described in this section, was extensively upgraded this year. An entire section for Mouse ENCODE resources has been added. The experimental guidelines and data standards developed by the ENCODE Consortium this year for a broad range of whole-genome assays (RNA-seq, ChIP-seq, DNase-seq, DNA methylation assays) are hosted on a dedicated portal Data Standards page, along with platform characterization summaries and references.

A key resource for learning about ENCODE data is the OpenHelix ENCODE tutorial (openhelix.com/ENCODE), a free online resource released in November 2010. This tutorial provides an overview of the ENCODE project, summarizes the types of data available through ENCODE, and details methods for accessing ENCODE data via the UCSC Genome Browser. The tutorial and accompanying instructional material are free to the public and sponsored by the DCC. Other resources for learning about ENCODE data usage can be found on the new ENCODE portal Education and Outreach page.

The DCC devoted considerable engineering effort this year to developing tools that help users locate data of interest within the overwhelming set of ENCODE data tracks and subtracks. For an overview of ENCODE data, the DCC now provides a Data Summary page on the ENCODE portal. This page includes a spreadsheet in multiple formats itemizing ENCODE experiments by lab, data type, cell type and other experimental variables.

The premier methods for locating ENCODE data are the new Track Search and File Search tools, available from the ENCODE portal and Genome Browser web pages. Both of these tools allow free-text searching by keyword, coupled with an advanced search feature that provides selectable lists of terms from the ENCODE controlled vocabulary (described below) to guide the search. Multiple terms can be applied in both ‘and’ and ‘or’ combinations. For example, in a single advanced search, a user can locate tracks showing evidence of the enhancer-associated histone modifications ‘H3K4me1’ and ‘H3K27Ac’ in either ‘NHLF’ or ‘IMR90’ lung cell lines. The Track Search tool is described more fully in the companion Genome Browser paper in this issue. The File Search tool locates downloadable files for analysis across the full range of ENCODE data sets, and the related track File Downloads tool (available from the track configuration page) selects files within a single track. The Downloads pages of many ENCODE tracks include hundreds or even thousands of files. Using controlled vocabulary terms relevant for each experiment set, the files are now listed in a sortable and filterable table.

In a related effort, the DCC this year implemented an accessioning scheme to group related files and tracks within logical experiments. These accessions make it easier to relate associated files and provide a short, stable identifier for citations. Each experiment groups a set of data from a single providing laboratory for a single assay in a single cell type and set of experimental conditions. All replicates and levels of data (raw sequence files and mappings to multiple genome assemblies, processed data such as peak calls or putative transcription isoforms) associated with a single logical experiment are assigned the same accession. The DCC accession is visible everywhere metadata for a track or file appears. As of this writing, ENCODE comprises 1861 experiments in human and 174 experiments in mouse.

The ENCODE DCC controlled vocabulary (CV) is a mechanism for associating metadata with ENCODE experiments. Metadata terms are added as needed, and the metadata controlled vocabularies have been expanded this year for both human and mouse. There are currently 23 metadata controlled vocabularies. The largest vocabularies are ‘Antibody’ (199 terms) and ‘Cell Line’ (235 human and 34 mouse cell types). The CV has received extensive curation and quality review this year to ensure completeness and eliminate duplicate and confusing terms. This effort has led to a more informative set of metadata associated with each track, including links to term descriptions and supporting documents. Two specific areas where the CV was improved are the cell type karyotype and lineage terms. The karyotype term has been simplified to describe cell lines that are derived from normal or cancerous tissues. At present 72 cell lines have been annotated as normal and 47 cell lines as cancerous. The lineage term has been used to describe the progenitor tissue type from which the source tissue type has differentiated. The values ectoderm, endoderm, mesoderm and inner cell mass are associated with 36, 45, 90 and 12 cell lines, respectively.

A new Genome Browser feature, Data Hubs, supports display of off-site annotations alongside ENCODE data. The first publicly provided hub presents the Roadmap Epigenomics (20) catalog of data sets, enabling close comparison of the voluminous and complementary results from these two consortia. Figure 1 shows a Genome Browser screen showcasing ENCODE and Roadmap Epigenomics data together. For more information about the Data Hubs feature, see the Genome Browser update in this issue.
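The way ‘and’ and ‘or’ combinations of controlled vocabulary terms narrow a search can be illustrated with a small sketch. This is not the DCC's implementation: the `Track` record, its field names and the demo track names are hypothetical, and the sketch assumes one plausible reading of the advanced search, in which terms chosen within a single vocabulary are alternatives (‘or’) while different vocabularies must all match (‘and’).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """One track record with two controlled-vocabulary fields (hypothetical schema)."""
    name: str
    antibody: str
    cell: str


def advanced_search(tracks, **criteria):
    """Return the tracks that satisfy every criterion.

    Each criterion maps a field name to a set of accepted terms.
    Terms within one field are alternatives ('or'); all fields
    must match for a track to be returned ('and').
    """
    return [
        t for t in tracks
        if all(getattr(t, field) in accepted for field, accepted in criteria.items())
    ]


# Illustrative records only; these are not real ENCODE track names.
TRACKS = [
    Track("demo_H3K4me1_NHLF", "H3K4me1", "NHLF"),
    Track("demo_H3K27Ac_NHLF", "H3K27Ac", "NHLF"),
    Track("demo_H3K27Ac_IMR90", "H3K27Ac", "IMR90"),
    Track("demo_H3K4me3_NHLF", "H3K4me3", "NHLF"),
    Track("demo_H3K4me1_K562", "H3K4me1", "K562"),
]

# Enhancer-associated marks in either of the two lung cell lines.
hits = advanced_search(
    TRACKS,
    antibody={"H3K4me1", "H3K27Ac"},
    cell={"NHLF", "IMR90"},
)
for track in hits:
    print(track.name)
```

With the demo records above, the search keeps the three tracks whose mark and cell line both fall in the selected term sets and drops the H3K4me3 and K562 records.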

Figure 1.

ENCODE data displayed in the UCSC Genome Browser together with two annotations from the Roadmap Epigenomics Release III data hub. The genomic region contains two protein coding genes, plasma membrane calcium ATPase 4a (ATP2B4) and lymphocyte transmembrane adaptor 1 isoform a (LAX1). The GENCODE Genes track shows multiple variant transcripts for both genes as well as a snoRNA in the region. The Roadmap Epigenomics tracks just below the GENCODE track show H3K4me3, a histone mark associated with promoters, in two cell lines not assayed by the ENCODE project. These tracks show support for the short, non-coding form of LAX1 in mesenchymal stem cells, and support for the longer isoform in CD34 cells, based on peaks at likely promoter regions. The next three tracks are transparent overlays from seven cell lines assayed by the ENCODE project showing the H3K4me3 mark again, the H3K27Ac mark associated with active regulatory regions, and a log plot of transcription levels in the same cell lines. The histone marks and pattern of transcription show coordinated, cell-type-specific activity; the ATP2B4 gene is most active in NHEK (purple) and K562 (blue) cells, while LAX1 is most active in GM12878 (orange) cells. The DNase and Transcription Factor ChIP-seq clusters shown in the last two tracks summarize data from a much wider range of cell lines and indicate a large number of regulatory regions. Additional details for these annotations are available on click-through.

ACCESSING ENCODE DATA

ENCODE data availability is summarized in Tables 1–3 in this article and in a comprehensive spreadsheet of experiments available from the ENCODE portal Data Summary page. Data sets marked as having ‘released’ status are available from the UCSC public server, http://genome.ucsc.edu. Data sets marked ‘displayed’ or ‘reviewing’ can be viewed at the preview site, http://genome-preview.ucsc.edu. Human ENCODE data is available on two human genome assemblies: NCBI36/hg18 and GRCh37/hg19. Mouse ENCODE data is provided on the mouse NCBI37/mm9 assembly.

All ENCODE data is subject to the Consortium data policy, which places some restrictions on use for the 9 months after the data becomes publicly available. Restriction timestamps for all experiments are prominently displayed on the track and file information pages, and are also listed on the Data Summary spreadsheet. The data policy is described in detail on the Data Policy page of the ENCODE portal.

ENCODE GEO submissions are listed on the GEO ENCODE summary page, http://www.ncbi.nlm.nih.gov/geo/info/ENCODE.html. ENCODE has been assigned NCBI BioProject identifiers to further organize the data: PRJNA30707 for Human ENCODE (with the subproject PRJNA63443 for Production phase data) and PRJNA50617 for Mouse ENCODE. Data in each project is further categorized as epigenomic, functional genomics or transcriptome.
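The two access rules in this section, status-based hosting and the 9-month use restriction, can be sketched in a few lines. The function names are hypothetical, and the date arithmetic is an assumption: it counts calendar months and clamps to the last day of a shorter month. The restriction timestamp shown on the track and file information pages remains the authoritative value.

```python
import calendar
from datetime import date

PUBLIC_SERVER = "http://genome.ucsc.edu"
PREVIEW_SERVER = "http://genome-preview.ucsc.edu"

# Status values and hosts as described in the text above.
STATUS_TO_SERVER = {
    "released": PUBLIC_SERVER,
    "displayed": PREVIEW_SERVER,
    "reviewing": PREVIEW_SERVER,
}

RESTRICTION_MONTHS = 9


def server_for_status(status):
    """Return the browser host where a data set with this status can be viewed."""
    return STATUS_TO_SERVER[status.lower()]


def add_months(start, months):
    """Add calendar months to a date, clamping the day to the target month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def restricted_until(public_date):
    """Estimate the end of the restriction period (assumed calendar-month rule)."""
    return add_months(public_date, RESTRICTION_MONTHS)


print(server_for_status("reviewing"))
print(restricted_until(date(2011, 2, 15)))
```

Under these assumptions, a data set made public on 15 February 2011 would carry restrictions until 15 November 2011.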

FUTURE WORK

Highlights of the fifth and final year of this phase of the ENCODE project will be the fruition of ongoing integrative analysis efforts and dissemination of the results to the DCC, promotion of an additional collection of cell types for Consortium-wide use (see Table 1), expansion of the transcription factor space based on community input, selected new experiment types in high-value areas such as single-cell assays, and additional validation data sets. The Mouse ENCODE project makes its future experiment planning publicly available on the ENCODE portal Mouse Data Summary page. DCC efforts during the 5th year will continue to emphasize data accessibility and usability. We have scheduled an update to the OpenHelix ENCODE tutorial, and are contracting for the design and production of ENCODE Quick Reference Cards. A new Data Matrix web application on the portal will provide table and matrix-based display of the breadth of ENCODE data, with click-through access to search results for selected experiments. Figure 2 shows a snapshot as of September 2011. We expect to release this feature on the ENCODE portal by late fall 2011.

Figure 2.

Data matrix display and selection of files for download. This feature will be linked to the ENCODE portal, and will navigate to the Advanced Search features of File and Track Search.

In upcoming months we expect that the new data hub feature will be adopted more widely, and we anticipate that the larger ENCODE production groups will migrate to hub-based hosting of much of their data. The DCC will be implementing search across data hubs to further enhance the synergy between UCSC-hosted and remote data sources.

CONTACT INFORMATION

General questions and feedback about ENCODE data at UCSC should be directed to the ENCODE mailing list: encode@soe.ucsc.edu. General questions about the Genome Browser should be sent to the UCSC browser mailing list: genome@soe.ucsc.edu. Specific questions about details of laboratory methods or data interpretation should be directed to the ENCODE laboratory contact listed on the description page for that data set. We announce releases of new ENCODE data via the ENCODE announcement list. To subscribe, visit https://lists.soe.ucsc.edu/mailman/listinfo/encode-announce.

FUNDING

National Human Genome Research Institute (grants 5P41HG002371-10 and 3P41HG002371-10S1 to the UCSC Center for Genomic Science, and grants 5U41HG004568-04 and 3U41HG004568-03S1 to the UCSC ENCODE Data Coordination Center); Howard Hughes Medical Institute (to D.H.). Funding for the open access charge: The Howard Hughes Medical Institute. Conflict of interest statement. The authors receive royalties from the sale of UCSC Genome Browser source code licenses to commercial entities.

Table 3.

ENCODE vital statistics, as of September 2011

Category	Human	Mouse
Experiments	1861	174
Assay types	29	3
Cell and tissue types	235	34
ChIP antibodies	179	30

20 in total

1. The human genome browser at UCSC.

Authors: W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal: Genome Res Date: 2002-06 Impact factor: 9.043

2. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements.

Authors: Josée Dostie; Todd A Richmond; Ramy A Arnaout; Rebecca R Selzer; William L Lee; Tracey A Honan; Eric D Rubio; Anton Krumm; Justin Lamb; Chad Nusbaum; Roland D Green; Job Dekker
Journal: Genome Res Date: 2006-09-05 Impact factor: 9.043

3. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors: Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; 
Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather 
Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal: Nature Date: 2007-06-14 Impact factor: 49.962

4. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

5. BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals.

Authors: Ina Poser; Mihail Sarov; James R A Hutchins; Jean-Karim Hériché; Yusuke Toyoda; Andrei Pozniakovsky; Daniela Weigl; Anja Nitzsche; Björn Hegemann; Alexander W Bird; Laurence Pelletier; Ralf Kittler; Sujun Hua; Ronald Naumann; Martina Augsburg; Martina M Sykora; Helmut Hofemeister; Youming Zhang; Kim Nasmyth; Kevin P White; Steffen Dietzel; Karl Mechtler; Richard Durbin; A Francis Stewart; Jan-Michael Peters; Frank Buchholz; Anthony A Hyman
Journal: Nat Methods Date: 2008-04-06 Impact factor: 28.547

6. Mapping and analysis of chromatin state dynamics in nine human cell types.

Authors: Jason Ernst; Pouya Kheradpour; Tarjei S Mikkelsen; Noam Shoresh; Lucas D Ward; Charles B Epstein; Xiaolan Zhang; Li Wang; Robbyn Issner; Michael Coyne; Manching Ku; Timothy Durham; Manolis Kellis; Bradley E Bernstein
Journal: Nature Date: 2011-03-23 Impact factor: 49.962

7. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

Authors: Kim D Pruitt; Tatiana Tatusova; Donna R Maglott
Journal: Nucleic Acids Res Date: 2006-11-27 Impact factor: 16.971

8. NCBI GEO: mining tens of millions of expression profiles--database and tools update.

Authors: Tanya Barrett; Dennis B Troup; Stephen E Wilhite; Pierre Ledoux; Dmitry Rudnev; Carlos Evangelista; Irene F Kim; Alexandra Soboleva; Maxim Tomashevsky; Ron Edgar
Journal: Nucleic Acids Res Date: 2006-11-11 Impact factor: 16.971

9. GENCODE: producing a reference annotation for ENCODE.

Authors: Jennifer Harrow; France Denoeud; Adam Frankish; Alexandre Reymond; Chao-Kung Chen; Jacqueline Chrast; Julien Lagarde; James G R Gilbert; Roy Storey; David Swarbreck; Colette Rossier; Catherine Ucla; Tim Hubbard; Stylianos E Antonarakis; Roderic Guigo
Journal: Genome Biol Date: 2006-08-07 Impact factor: 13.583

10. NCBI Reference Sequences: current status, policy and new initiatives.

Authors: Kim D Pruitt; Tatiana Tatusova; William Klimke; Donna R Maglott
Journal: Nucleic Acids Res Date: 2008-10-16 Impact factor: 16.971
