H-InvDB in 2009: extended database and data mining resources for human genes and transcripts.

Chisato Yamasaki, Katsuhiko Murakami, Jun-ichi Takeda, Yoshiharu Sato, Akiko Noda, Ryuichi Sakate, Takuya Habara, Hajime Nakaoka, Fusano Todokoro, Akihiro Matsuya, Tadashi Imanishi, Takashi Gojobori.

Abstract

We report the extended database and data mining resources newly released in the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). H-InvDB is a comprehensive annotation resource of human genes and transcripts, and consists of two main views and six sub-databases. The latest release of H-InvDB (release 6.2) provides annotation for 219,765 human transcripts in 43,159 human gene clusters based on human full-length cDNAs and mRNAs. H-InvDB now provides several new annotation features, such as mapping of microarray probes, new gene models, relation to known ncRNAs and information from the GlycoGene Database. H-InvDB also provides useful data mining resources: 'Navigation search', 'H-InvDB Enrichment Analysis Tool (HEAT)' and web service APIs. 'Navigation search' is an extended search system that enables complicated searches by combining 16 different search options. HEAT is a data mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set, as compared with the entire set of H-InvDB representative transcripts. H-InvDB now has SOAP and REST web service APIs that allow H-InvDB data to be used in programs, giving users extended data accessibility.

Year: 2009 PMID: 19933760 PMCID: PMC2808976 DOI: 10.1093/nar/gkp1020

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

We held the first international workshop entitled ‘Human Full-length cDNA Annotation Invitational’ (abbreviated as H-Invitational or H-Inv) in Tokyo, Japan, from 25 August to 3 September 2002, and constructed a novel, integrative database of the human transcriptome called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/) (1). H-InvDB is a comprehensive annotation resource of human genes and transcripts. On 20 April 2009, we marked the fifth anniversary of the opening of H-InvDB to the public. During this period, we released six major updates, namely H-InvDB 1.0 (1), 2.0 (2), 3.0, 4.0 (3), 5.0 and 6.0. The latest release (release 6.2) provides annotations for 219 765 human transcripts in 43 159 human gene clusters based on human full-length cDNAs and mRNAs. The increases in the number of entries in H-InvDB are summarized in Table 1.

Table 1.

Statistics of H-InvDB entries

H-InvDB release	Date of release	Number of transcripts (HIT)	Number of gene clusters (HIX)	Number of proteins (HIP)	Annotation jamboree	Date of jamboree
1.0	20 April 2004	41 118	21 037	–	H-Invitational 1^a	August 2002
2.0	31 August 2005	56 419	25 585	–	H-Invitational 2 FA^a	November 2003
3.0	31 March 2006	167 992	35 005	–	All human gene FA meeting 2005^b	October 2005
4.0	28 March 2007	175 542	34 701	173 690	All human gene FA meeting 2006^b	October 2006
5.0	26 December 2007	187 156	36 073	124 280	All human gene FA meeting 2007^b	October 2007
6.0	18 December 2008	219 765	43 159	133 523
6.2	30 March 2009	219 765	43 159	133 629

^a Meeting of the H-Invitational project.

^b Meeting hosted by the Genome Information Integration Project (GIIP).

For these human transcripts, proteins and genes, we now provide several new annotation features, such as mapping of probes, new gene models, relation to known ncRNAs and glycogene information. H-InvDB now also provides useful data mining resources: ‘Navigation search’, ‘H-InvDB Enrichment Analysis Tool (HEAT)’ and web service APIs. Here, we report on the extended database and data mining resources newly released in H-InvDB.

THE EXTENDED DATABASE OF H-InvDB RELEASE 6.2

In H-InvDB release 6.2, our latest release, we annotated 162 395 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD) (4) in addition to 54 927 human full-length cDNAs (FLcDNAs) that were available on 9 May 2008. We mapped these human transcripts onto the human genome sequences (NCBI build 36.2) and determined 43 159 human gene clusters. Among these human gene clusters, we defined 34 511 (80.0%) protein-coding and 7747 (17.9%) non-protein-coding loci, whereas 901 (2.1%) transcribed loci overlapped with predicted pseudogenes. We then followed functional and further comprehensive annotation procedures as described previously (1–3). The statistics of manually curated representative human proteins are summarized in Table 2.
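The locus percentages quoted above can be reproduced from the three counts. The following is a minimal Python sketch for checking that arithmetic; the dictionary and function names are illustrative and are not part of H-InvDB.

```python
# Locus counts reported in the text for H-InvDB release 6.2.
LOCUS_COUNTS = {
    "protein-coding": 34511,
    "non-protein-coding": 7747,
    "overlapping predicted pseudogenes": 901,
}


def locus_percentages(counts):
    """Return each category's share of all gene clusters, in percent (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# The three categories together account for all reported gene clusters.
total_clusters = sum(LOCUS_COUNTS.values())
percentages = locus_percentages(LOCUS_COUNTS)
```

The three categories sum to the 43 159 gene clusters, so they partition the full set of clusters reported for this release.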

Table 2.

Statistics of curated representative H-Inv proteins (H-InvDB release 6.2)

Category	Definition	Number of representative HITs	Percentage
I	Identical to known^a human protein (≥98% identity, =100% coverage)	13 314	37.71
II	Similar to known^a protein (≥50% identity, ≥50% coverage)	3380	9.57
III	InterPro domain containing protein	2584	7.32
IV	Conserved hypothetical protein	4584	12.98
V	Hypothetical protein	5203	14.74
VI	Hypothetical short protein (20–79 amino acids)	5446	15.43
VII	Pseudogene candidates	901	2.55
Total		35 303	100.00

^a ‘Known’ proteins are proteins that have been experimentally validated in the literature.

In H-InvDB, we now include annotation for two kinds of high-quality predicted transcripts: eHITs and pHITs. The eHIT transcripts are computationally and manually annotated gene models whose exon–intron structures are predicted by integrating the information from EST and mRNA sequences. The pHIT transcripts are novel gene candidates predicted from human genome sequences using CAGE tags and several gene prediction programs, whose outputs are combined using JIGSAW (5). In H-InvDB release 6.2, we provided 612 eHIT and 1831 pHIT predicted transcripts. For eHIT gene models, we assigned HIT IDs prefixed with ‘e’ (e.g. eHIT000000001), and for pHIT gene models, we assigned HIT IDs prefixed with ‘p’ (e.g. pHIT000000001). For example, pHIT000015735 is mapped on chromosome 9p13.3 and consists of 18 exons. The functional description for pHIT000015735 is ‘Interleukin-11 receptor alpha chain precursor (IL-11R-alpha) (IL-11RA), Isoform HCR2’, which is classified as H-InvDB similarity category I, Identical to known human protein. For pHIT000015735, HIX0153289 is assigned as the cluster ID and HIP000180408 is assigned as the protein ID. It is a newly identified isoform of a known UniProtKB/Swiss-Prot entry, Q14626-2, which is a soluble form of the Interleukin-11 receptor alpha chain (sIL11RA). In HIX0153289, pHIT000015735 is the only member, and no other human mRNA, RefSeq or Ensembl transcripts are included, suggesting that this is a novel human transcript candidate supported by a UniProtKB/Swiss-Prot entry. An example screen shot of G-integra for pHIT000015735 is shown in Figure 1.
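The ID prefixes described above lend themselves to a simple classifier. The sketch below is in Python and is only an illustration: the digit counts (nine for HIT and HIP, seven for HIX) are inferred from the example IDs quoted in this section and are an assumption, not a documented H-InvDB specification, and the function name is hypothetical.

```python
import re

# Patterns inferred from the example IDs in the text (eHIT000000001,
# pHIT000015735, HIX0153289, HIP000180408). The digit counts are an
# assumption based on those examples only.
_ID_PATTERNS = [
    ("predicted transcript (eHIT)", re.compile(r"^eHIT\d{9}$")),
    ("predicted transcript (pHIT)", re.compile(r"^pHIT\d{9}$")),
    ("transcript (HIT)", re.compile(r"^HIT\d{9}$")),
    ("gene cluster (HIX)", re.compile(r"^HIX\d{7}$")),
    ("protein (HIP)", re.compile(r"^HIP\d{9}$")),
]


def classify_hinv_id(identifier):
    """Return a label for an H-InvDB-style ID, or None if it matches no pattern."""
    text = identifier.strip()
    for label, pattern in _ID_PATTERNS:
        if pattern.match(text):
            return label
    return None
```

For instance, `classify_hinv_id("pHIT000015735")` yields the pHIT label, while a string that fits none of the assumed patterns yields `None`.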

Figure 1.

pHIT gene model in the G-integra genome browser. An image of the G-integra genome browser for a pHIT gene model, pHIT000015735, is shown (http://www.h-invitational.jp/hinv/g-integra/cgi-bin/f_genemap.cgi?id=pHIT000015735). The gene structure of pHIT000015735 is indicated by solid blue boxes in the ‘all human gene’ and JIGSAW tracks.

The H-InvDB annotation resources consist of two main views, Transcript view and Locus view, and six sub-databases: the DiseaseInfo Viewer, H-ANGEL (6), G-integra, Evola (7), the PPI view and the Gene family/group view, with appropriate crosslinks. Here, we describe the viewers that we have extended since our previous report (3). The new annotation features in H-InvDB are summarized in Table 3.

Table 3.

New annotated features in H-InvDB

No.	Annotation item	Area	Available at
1	Mappings of microarray probes to H-InvDB data	Expression	‘Expression’ tab in Transcript view
2	New ID for gene families/groups (HIF)	Gene family	‘Function’ tab in Transcript view, Locus view, and Gene Family/groups view.
3	pHIT gene models	Gene model	Transcript view, Locus view, G-integra and all the related viewers
4	eHIT gene models	Gene model	Transcript view, Locus view, G-integra and all the related viewers
5	Truncation judgment	Quality control	‘Transcript Information’ tab in Transcript view
6	Kozak sequence	Quality control	‘Transcript Information’ tab in Transcript view
7	Anti-sense gene information	Gene structure	‘Gene structure’ tab in Locus view
8	Detailed data of similarity to known ncRNA.	ncRNA	‘Function’ tab in Transcript view
9	Two new species (horse and medaka) for comparative analysis	Comparative	‘Evolution’ tab in Transcript view, G-integra and Evola
10	Detailed annotation for unmapped (UM) transcripts	Gene structure	Topic Annotation viewer
11	Remote integration of GlycoGene Database (GGDB)	Function	‘Function’ tab in Transcript view
12	Remote integration of the functional RNA database (fRNAdb)	ncRNA	‘Function’ tab in Transcript view

New features in Transcript view and Locus view

Transcript view shows all annotations of an H-Inv transcript in 12 section tabs, and Locus view shows all annotations of a locus in 6 section tabs. At the ‘expression’ tab in Transcript and Locus view, the mappings of microarray probes to H-InvDB data are now available. The probes of DNA Chip Research AceGene, Affymetrix GeneChip and Agilent in DNAProbeLocator (http://h-invitational.jp/DNAProbeLocator/) were mapped and linked to H-InvDB entries (both HIT and HIX), and the mappings are displayed. To assess transcript quality, we now provide two new features, truncation judgment (8) and Kozak consensus sequence (9), at the ‘Transcript Info’ tab in Transcript view. We have also integrated the annotated information of the GlycoGene Database (10) and the Functional RNA Database (11) at the ‘function’ tab in Transcript view using web services. The Transcript and Locus views also have links to related external public databases including DDBJ/EMBL/GenBank (4), RefSeq (12), UniProtKB (13), HGNC (14), GeneCards (15), InterPro (16), Ensembl (17), EntrezGene (18), CCDS (19), PubMed (20), dbSNP (21), GO (22), GTOP (23), OMIM (24) and MutationView (25).

New features in G-integra

G-integra is an integrated genome browser in which we can examine the genomic structures of transcripts. The genomic locations, gene structures and alignments against the human genome of H-Inv transcripts, and the corresponding RefSeq and Ensembl entries are shown. We now show the annotations for two types of high-quality gene models, pHIT and eHIT, for all human gene tracks (Figure 1). G-integra provides gene structure annotations for two new species (horse and medaka). In total, the gene structures for humans and 13 non-human species, namely Pan troglodytes (chimpanzee), Macaca sp. (macaque), Mus musculus (mouse), Rattus norvegicus (rat), Canis familiaris (dog), Bos taurus (cow), Monodelphis domestica (opossum), Gallus gallus (chicken), Equus ferus caballus (horse), Danio rerio (zebrafish), Tetraodon nigroviridis (tetraodon), Takifugu rubripes (fugu) and Oryzias latipes (medaka) can be optionally displayed for comparison. The reference gene structures of non-coding RNAs of fRNAdb, pseudogenes of Pseudogene.org (26) and consensus coding sequences of CCDS (19) are also shown.

NEWLY RELEASED DATA MINING RESOURCES IN H-InvDB

H-InvDB now provides newly released useful data mining resources, namely ‘Navigation search’, ‘H-InvDB Enrichment Analysis Tool (HEAT)’ and web service APIs.

Navigation search

‘Navigation search’ is an extended search system that enables complicated searches by any combination of 16 different search contents. This system consists of three interfaces: the search navigation menu, the new advanced search and the search results; images of the user interface are shown in Figure 2. Every view in H-InvDB, including the top page, has a link to ‘Navi’ on the black menu bar (Figure 2A). The search navigation menu provides a list of all searches in H-InvDB (Figure 2B). The new advanced search provides a combined search of 16 search contents (Figure 2C). The search contents and items are summarized in Table 4. The search results page provides the search results and facilities to download them in four formats: flat file format, XML format, list of IDs in text format and sequence FASTA file (Figure 2D).

Figure 2.

‘Navigation search’: a powerful search tool with 16 search contents. Example screen shots of the Navigation search system (http://www.h-invitational.jp/hinv/c-search/). (A) A link to the Navigation system, ‘Navi’, appears on the black menu bar in all the viewers in H-InvDB, including the top page. (B) The search navigation menu provides the list of all searches available in H-InvDB. (C) The new advanced search provides a combined search of 16 search contents, for example, #2 gene structure, #3 alternative splicing (AS) variants, #10 genetic polymorphism and #12 relation to disease. (D) The search results provide the list of HIX IDs, HIT IDs, chromosome number, definition, HGNC gene symbol and links to appropriate H-InvDB and related viewers.

Table 4.

The list of search contents and items in H-InvDB Navigation search

No.	Search content	Search items
1	Keyword or ID	13 IDs and 7 different types of keywords
2	Gene structure	chromosome number, chromosomal band, genome strand and location on the human genome
3	Alternative splicing (AS) variants	splicing site, pattern and location of alternative splicing
4	Non-coding functional RNAs	type and classification of ncRNAs
5	Protein functions	definition, similarity category, gene symbol, EC name and molecular function of GO
6	Functional domains	ID, name and type of InterPro domain
7	Subcellular localization	cellular component of GO and predicted subcellular localization by WoLF PSORT, SOSUI, TMHMM, TargetP and PTS1
8	Metabolic pathways	biological process of GO, ID and name of the KEGG pathway
9	Protein 3D structure	PDB and SCOP IDs of GTOP prediction
10	Genetic polymorphism	types and features of variation such as SNP, microsatellite, copy number variation (CNV), synonymous or nonsynonymous variations
11	Gene expression	tissue specific expression in ten tissue/organ classes, Affymetrix probe ID, promoter motif and upstream transcriptional start site (TSS)
12	Relation to disease	relation to MutationView, ID and disease name of OMIM
13	Molecular evolution	orthologues and genome conservation among human and 13 model organisms
14	Protein–protein interaction	number of interacting proteins
15	Gene families and groups	all the predicted human gene families and four manually curated gene families/groups: Ig, MHC, TCR and OR
16	Transcript information	sequence data provider, molecular type, coding potential and curation status information

‘Navigation search’ extends the data mining applications of H-InvDB. For example, a user can search for human genes on chromosome 6 that have alternative splicing variants with an internal acceptor pattern, contain an SNP and have disease information in OMIM (Figure 2C). To search for the new gene models (pHIT or eHIT transcripts), mol_type = predicted transcript (pHIT) or predicted transcript (eHIT) must be selected in the search content ‘Transcript information’. URL: http://h-invitational.jp/hinv/c-search/hinvNaviTop.jsp
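The chromosome 6 example above amounts to applying several search conditions at once. The following Python sketch illustrates that idea on a few in-memory records. Everything in it is hypothetical: the `LocusRecord` fields, the record values and the assumption that conditions are combined with a logical AND are for illustration only and do not reflect the actual H-InvDB schema or search engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocusRecord:
    """Hypothetical, simplified stand-in for one locus annotation."""
    hix_id: str
    chromosome: str
    as_patterns: frozenset  # alternative splicing patterns seen at the locus
    has_snp: bool
    omim_ids: tuple


def combined_search(records, *conditions):
    """Return the records that satisfy every condition (logical AND)."""
    return [r for r in records if all(cond(r) for cond in conditions)]


# Made-up records for illustration only; these are not real H-InvDB data.
RECORDS = [
    LocusRecord("HIX_A", "6", frozenset({"internal acceptor"}), True, ("OMIM_X",)),
    LocusRecord("HIX_B", "6", frozenset({"internal donor"}), True, ()),
    LocusRecord("HIX_C", "9", frozenset({"internal acceptor"}), True, ("OMIM_Y",)),
]

hits = combined_search(
    RECORDS,
    lambda r: r.chromosome == "6",                   # gene structure
    lambda r: "internal acceptor" in r.as_patterns,  # AS variants
    lambda r: r.has_snp,                             # genetic polymorphism
    lambda r: len(r.omim_ids) > 0,                   # relation to disease
)
```

Only the first made-up record meets all four conditions; each additional condition can only narrow the result set.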

H-InvDB Enrichment Analysis Tool

H-InvDB Enrichment Analysis Tool (HEAT) is a data mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set as compared with the entire set of H-InvDB representative transcripts. This technique is called ‘gene set enrichment analysis’ and is popularly used for analysing the results of microarray experiments. The HEAT analysis requires three steps. (i) Gene-Set Submission: users must submit two or more human gene IDs. Acceptable IDs are H-InvDB Transcript IDs (HIT), Locus IDs (HIX), HUGO Gene Symbols and accession numbers of INSD (DDBJ/EMBL/GenBank). (ii) Execution: the submitted IDs are converted into HIXs of H-InvDB release 6.0 representative transcripts by using the ID Converter System (27). (iii) Results: enriched features of the given gene set are shown. For each feature, a link to the description of the feature, the number of occurrences/genes in the submitted gene set, the number of occurrences/genes among all H-InvDB representative transcripts and the P-value are shown. Only features with P-values smaller than 0.01 are shown, and the list of results is sorted by P-value. Fisher’s exact probability is used in calculating the P-values. The following features of H-InvDB are analysed: InterPro, GO, the KEGG pathway, chromosomal band, gene family, structural domains (SCOP), subcellular localization prediction (using WoLF PSORT) and tissue-specific gene expression (10 tissue categories defined in H-ANGEL). URL: http://hinv.jp/HEAT/search.php?lang=en.
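The enrichment calculation described above can be sketched in a few lines of Python. This is not the HEAT implementation: it assumes a one-sided Fisher's exact test for over-representation (the text does not say whether HEAT uses a one- or two-sided test), it assumes the submitted gene set is a subset of the background, and the function names are illustrative. The 0.01 cutoff and the sorting by P-value follow the description in the text.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact P-value (hypergeometric upper tail).

    k: genes in the submitted set that carry the feature
    n: size of the submitted set
    K: genes in the background that carry the feature
    N: size of the background (assumed to contain the submitted set)
    """
    if not (0 <= k <= n <= N and k <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Probability of seeing k or more feature-carrying genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def enriched_features(feature_counts, set_size, background_size, threshold=0.01):
    """Keep features with P < threshold and sort them by ascending P-value.

    feature_counts maps a feature name to (count in set, count in background).
    """
    results = []
    for feature, (k, K) in feature_counts.items():
        p = enrichment_p_value(k, set_size, K, background_size)
        if p < threshold:
            results.append((feature, p))
    return sorted(results, key=lambda item: item[1])
```

With a background of 10 genes of which 5 carry a feature, a submitted set of 5 genes that all carry it gives P = 1/252, which passes the 0.01 cutoff; a set with only 2 such genes does not.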

H-InvDB web-service APIs: a new data retrieval service

The web service interface is becoming a major way of accessing biological databases (28). H-InvDB now provides a new data retrieval service, a web service with Simple Object Access Protocol (SOAP) and Representational State Transfer (REST) APIs, to retrieve the H-InvDB entries for given IDs or keywords. Entries in H-InvDB can be retrieved in XML or sequence FASTA format. The current H-InvDB web service provides 26 SOAP and 28 REST APIs. To use the REST service, an HTTP connection (e.g. a web browser) and a programming language (e.g. Perl, Java) are required. Although both the POST and GET methods of access are supported, the POST method is recommended. To retrieve entries for a keyword, e.g. ‘cancer’, the method and parameters are as follows: http://h-invitational.jp/hinv/hws/keyword_search.php?query=cancer. To use the SOAP service, users are requested to use the SOAP library of their programming language. Access to WSDL is via http://h-invitational.jp/hinv/hws/API?wsdl. The 12 representative SOAP APIs are listed in Table 5, and complete detailed descriptions are provided at the following URLs:

Table 5.

The list of representative H-InvDB web service APIs (SOAP)

API type	Description of API	WSDL	Query and output
Search entries	Search by IDs	soap_id_search.php?wsdl	query = any ID output = HIT ID
	Search by keywords	soap_keyword_search.php?wsdl	query = any keyword output = HIT ID
	Search by genomic location	soap_location2hit.php?wsdl	query = genomic location output = corresponding HIT ID
Count entries	Total number of HIT	soap_hit_cnt.php?wsdl	output = total number of HIT ID
Convert IDs	Convert INSD accession to HIT	soap_acc2hit.php?wsdl	query = Accession No. output = HIT ID
Retrieve data	Retrieve HIT XML file	soap_hit_xml.php?wsdl	query = HIT ID output = HIT XML file
	Retrieve HIT definition	soap_hit_definition.php?wsdl	query = HIT ID output = HIT definition
	Retrieve HIT evolutionary information	soap_hit_evolution.php?wsdl	query = HIT ID output = evolutionary information
	Retrieve HIT gene expression information	soap_hit_expression.php?wsdl	query = HIT ID output = gene expression information
	Retrieve genomic location of HIT	soap_hit_location.php?wsdl	query = HIT ID output = genomic location of HIT
	Retrieve nucleotide sequence of HIT	soap_hit_nucleotide_seq_xml.php?wsdl	query = HIT ID output = nucleotide sequence of HIT (XML format)
	Retrieve protein sequence of HIT	soap_hit_protein_seq_xml.php?wsdl	query = HIT ID output = protein sequence of HIT (XML format)

REST APIs: http://www.h-invitational.jp/hinv/hws/doc/en/api_list.php

SOAP APIs: http://www.h-invitational.jp/hinv/hws/doc/en/soap_api_list.php

The H-InvDB web service is already used by other databases to retrieve H-InvDB data. For example, MutationView, a database of mutations in human disease genes (25), uses the InterPro domain data in H-InvDB to search for relations among functional domains, human genes and human disease-related mutations.
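The REST keyword search and the WSDL addresses described in this section can be assembled as in the following Python sketch. The base URL and the `query` parameter come from the example in the text. Two points are assumptions: that a WSDL address is formed by substituting a file name from Table 5 for ‘API’ in the WSDL pattern, and that the POST form of the keyword search uses the same `query` parameter as the GET form. The function names are illustrative, and the network call is shown but not exercised here.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base address taken from the keyword search example in the text.
HWS_BASE = "http://h-invitational.jp/hinv/hws/"


def keyword_search_url(keyword):
    """Build the GET form of the REST keyword search URL."""
    return HWS_BASE + "keyword_search.php?" + urlencode({"query": keyword})


def wsdl_url(wsdl_name):
    """Build a WSDL address from a Table 5 entry such as
    'soap_keyword_search.php?wsdl' (assumed substitution for 'API?wsdl')."""
    return HWS_BASE + wsdl_name


def keyword_search(keyword, timeout=30):
    """Send the keyword search as an HTTP POST and return the raw response body.

    Assumes the POST body takes the same 'query' parameter as the GET form.
    """
    data = urlencode({"query": keyword}).encode("ascii")
    request = Request(HWS_BASE + "keyword_search.php", data=data)
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `keyword_search_url("cancer")` reproduces the example URL given in the text; `urlencode` also takes care of escaping keywords that contain spaces or other reserved characters.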

DATA AVAILABILITY AND FUTURE DIRECTIONS

H-InvDB is freely available for both academic and commercial use, and can be accessed online at http://www.h-invitational.jp/ (or hinv.jp). Annotated data can also be downloaded as FASTA sequence files, original-format flat files or XML files from HTTP and FTP servers. Major updates are released once a year and minor updates are released a few times per year when necessary. The next major update of H-InvDB, planned for the end of this year, will provide annotations for the latest human genome assembly, NCBI build 37.

FUNDING

Ministry of Economy, Trade and Industry of Japan (METI); the National Institute of Advanced Industrial Science and Technology (AIST); the Japan Biological Informatics Consortium (JBIC). Funding for open access charge: Advanced Industrial Science and Technology.

Conflict of interest statement. None declared.

28 in total

Review 1. [International collaboration among DDBJ, EMBL Bank and GenBank].

Authors: Yoshio Tateno
Journal: Tanpakushitsu Kakusan Koso Date: 2008-02

2. Medline/PubMed revisited: new, semantic tools to explore the biomedical literature.

Authors: E Giglia
Journal: Eur J Phys Rehabil Med Date: 2009-06 Impact factor: 2.874

3. Web services at the European Bioinformatics Institute-2009.

Authors: Hamish McWilliam; Franck Valentin; Mickael Goujon; Weizhong Li; Menaka Narayanasamy; Jenny Martin; Teresa Miyar; Rodrigo Lopez
Journal: Nucleic Acids Res Date: 2009-05-12 Impact factor: 16.971

4. McKusick's Online Mendelian Inheritance in Man (OMIM).

Authors: Joanna Amberger; Carol A Bocchini; Alan F Scott; Ada Hamosh
Journal: Nucleic Acids Res Date: 2008-10-08 Impact factor: 16.971

5. InterPro: the integrative protein signature database.

Authors: Sarah Hunter; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Alex Bateman; David Binns; Peer Bork; Ujjwal Das; Louise Daugherty; Lauranne Duquenne; Robert D Finn; Julian Gough; Daniel Haft; Nicolas Hulo; Daniel Kahn; Elizabeth Kelly; Aurélie Laugraud; Ivica Letunic; David Lonsdale; Rodrigo Lopez; Martin Madera; John Maslen; Craig McAnulla; Jennifer McDowall; Jaina Mistry; Alex Mitchell; Nicola Mulder; Darren Natale; Christine Orengo; Antony F Quinn; Jeremy D Selengut; Christian J A Sigrist; Manjula Thimma; Paul D Thomas; Franck Valentin; Derek Wilson; Cathy H Wu; Corin Yeats
Journal: Nucleic Acids Res Date: 2008-10-21 Impact factor: 16.971

6. The GOA database in 2009--an integrated Gene Ontology Annotation resource.

Authors: Daniel Barrell; Emily Dimmer; Rachael P Huntley; David Binns; Claire O'Donovan; Rolf Apweiler
Journal: Nucleic Acids Res Date: 2008-10-27 Impact factor: 16.971

7. Low conservation and species-specific evolution of alternative splicing in humans and mice: comparative genomics analysis using well-annotated full-length cDNAs.

Authors: Jun-Ichi Takeda; Yutaka Suzuki; Ryuichi Sakate; Yoshiharu Sato; Masahide Seki; Takuma Irie; Nono Takeuchi; Takuya Ueda; Mitsuteru Nakao; Sumio Sugano; Takashi Gojobori; Tadashi Imanishi
Journal: Nucleic Acids Res Date: 2008-10-05 Impact factor: 16.971

8. Ensembl 2009.

Authors: T J P Hubbard; B L Aken; S Ayling; B Ballester; K Beal; E Bragin; S Brent; Y Chen; P Clapham; L Clarke; G Coates; S Fairley; S Fitzgerald; J Fernandez-Banet; L Gordon; S Graf; S Haider; M Hammond; R Holland; K Howe; A Jenkinson; N Johnson; A Kahari; D Keefe; S Keenan; R Kinsella; F Kokocinski; E Kulesha; D Lawson; I Longden; K Megy; P Meidl; B Overduin; A Parker; B Pritchard; D Rios; M Schuster; G Slater; D Smedley; W Spooner; G Spudich; S Trevanion; A Vilella; J Vogel; S White; S Wilder; A Zadissa; E Birney; F Cunningham; V Curwen; R Durbin; X M Fernandez-Suarez; J Herrero; A Kasprzyk; G Proctor; J Smith; S Searle; P Flicek
Journal: Nucleic Acids Res Date: 2008-11-25 Impact factor: 16.971

9. The GTOP database in 2009: updated content and novel features to expand and deepen insights into protein structures and functions.

Authors: Satoshi Fukuchi; Keiichi Homma; Shigetaka Sakamoto; Hideaki Sugawara; Yoshio Tateno; Takashi Gojobori; Ken Nishikawa
Journal: Nucleic Acids Res Date: 2008-11-04 Impact factor: 16.971

10. The Universal Protein Resource (UniProt) 2009.

Authors:
Journal: Nucleic Acids Res Date: 2008-10-04 Impact factor: 16.971
