
The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts.

Chisato Yamasaki, Katsuhiko Murakami, Yasuyuki Fujii, Yoshiharu Sato, Erimi Harada, Jun-ichi Takeda, Takayuki Taniya, Ryuichi Sakate, Shingo Kikugawa, Makoto Shimada, Motohiko Tanino, Kanako O Koyanagi, Roberto A Barrero, Craig Gough, Hong-Woo Chun, Takuya Habara, Hideki Hanaoka, Yosuke Hayakawa, Phillip B Hilton, Yayoi Kaneko, Masako Kanno, Yoshihiro Kawahara, Toshiyuki Kawamura, Akihiro Matsuya, Naoki Nagata, Kensaku Nishikata, Akiko Ogura Noda, Shin Nurimoto, Naomi Saichi, Hiroaki Sakai, Ryoko Sanbonmatsu, Rie Shiba, Mami Suzuki, Kazuhiko Takabayashi, Aiko Takahashi, Takuro Tamura, Masayuki Tanaka, Susumu Tanaka, Fusano Todokoro, Kaori Yamaguchi, Naoyuki Yamamoto, Toshihisa Okido, Jun Mashima, Aki Hashizume, Lihua Jin, Kyung-Bum Lee, Yi-Chueh Lin, Asami Nozaki, Katsunaga Sakai, Masahito Tada, Satoru Miyazaki, Takashi Makino, Hajime Ohyanagi, Naoki Osato, Nobuhiko Tanaka, Yoshiyuki Suzuki, Kazuho Ikeo, Naruya Saitou, Hideaki Sugawara, Claire O'Donovan, Tamara Kulikova, Eleanor Whitfield, Brian Halligan, Mary Shimoyama, Simon Twigger, Kei Yura, Kouichi Kimura, Tomohiro Yasuda, Tetsuo Nishikawa, Yutaka Akiyama, Chie Motono, Yuri Mukai, Hideki Nagasaki, Makiko Suwa, Paul Horton, Reiko Kikuno, Osamu Ohara, Doron Lancet, Eric Eveno, Esther Graudens, Sandrine Imbeaud, Marie Anne Debily, Yoshihide Hayashizaki, Clara Amid, Michael Han, Andreas Osanger, Toshinori Endo, Michael A Thomas, Mika Hirakawa, Wojciech Makalowski, Mitsuteru Nakao, Nam-Soon Kim, Hyang-Sook Yoo, Sandro J De Souza, Maria de Fatima Bonaldo, Yoshihito Niimura, Vladimir Kuryshev, Ingo Schupp, Stefan Wiemann, Matthew Bellgard, Masafumi Shionyu, Libin Jia, Danielle Thierry-Mieg, Jean Thierry-Mieg, Lukas Wagner, Qinghua Zhang, Mitiko Go, Shinsei Minoshima, Masafumi Ohtsubo, Kousuke Hanada, Peter Tonellato, Takao Isogai, Ji Zhang, Boris Lenhard, Sangsoo Kim, Zhu Chen, Ursula Hinz, Anne Estreicher, Kenta Nakai, Izabela Makalowska, Winston Hide, Nicola Tiffin, Laurens Wilming, Ranajit Chakraborty, Marcelo Bento Soares, Maria Luisa Chiusano, Yutaka Suzuki, Charles Auffray, Yumi Yamaguchi-Kabata, Takeshi Itoh, Teruyoshi Hishiki, Satoshi Fukuchi, Ken Nishikawa, Sumio Sugano, Nobuo Nomura, Yoshio Tateno, Tadashi Imanishi, Takashi Gojobori.

Abstract

Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views, Transcript view and Locus view, and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group view.


Year: 2007 PMID: 18089548 PMCID: PMC2238988 DOI: 10.1093/nar/gkm999

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Human transcripts represent a biologically and functionally rich format for examining the structure of human genes and alternative splicing isoforms. In particular, cloning and sequencing of full-length cDNAs (FLcDNAs) that cover all exons but no introns can facilitate the precise determination of human gene structure (1). Studies on human transcripts have thus been systematically and extensively carried out to draw the outline of the human transcriptome (2–6). The human transcriptome consists of protein-coding mRNAs and non-coding functional RNAs. Analysis of these sequences will provide insights into how genomic information is transformed into higher order biological phenomena. By comparative analysis of the transcriptome with the human genome, we will be able to determine the transcribed regions of the genome and better understand the regulatory machinery of transcription (7,8). It is therefore of great significance to collect information about human transcripts as well as their annotations. We thus held the first international workshop entitled ‘Human Full-length cDNA Annotation Invitational’ (abbreviated as H-Invitational or H-Inv) in Tokyo, Japan from 25th August to 3rd September 2002, and constructed a novel, integrative database of the human transcriptome, called H-InvDB (9,10). The initial release consisted of the annotation of 41 118 human FLcDNAs (Table 1), collected from six high-throughput producers of human FLcDNAs in the world human gene collections. To cover the increased number of human FLcDNAs since the initial release of H-InvDB, we held the second international annotation meeting entitled ‘H-Invitational 2 Functional Annotation Jamboree’ (abbreviated as H-Invitational 2 or H-Inv2) in Tokyo, Japan from 15th to 20th November 2003. The second major release of H-InvDB (release 2.0) was based on the annotation carried out at the H-Inv2 annotation jamboree.
After H-Inv2, we initiated the Genome Information Integration Project (GIIP) and held the third and fourth annotation meetings in October 2005 and October 2006. The products of those two annotation meetings comprised releases 3.0 and 4.0 of H-InvDB. The increases in the number of entries in H-InvDB are summarized in Table 1.

Table 1.

Statistics of H-InvDB entries

H-InvDB release	Date of release	Number of transcripts (HIT)	Number of gene clusters (HIX)	Number of proteins (HIP)	Human genome	Date of sequence data-fix
1.0	2004/4/20	41 118	21 037	–	NCBI build 34.1	2002/7/15
2.0	2005/8/31	56 419	25 585	–	NCBI build 34.1	2003/9/1
3.0	2006/3/31	167 992	35 005	–	NCBI build 35.1	2005/3/1
4.0	2007/3/30	175 542	34 701	116 228	NCBI build 36.1	2006/6/15
4.6	2007/9/27	175 536	34 699	116 142	NCBI build 36.1	2006/6/15


THE ANNOTATION IN OUR LATEST UPDATE, H-InvDB 2007

In our latest release H-InvDB_4.6, we annotated 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD) in addition to 54 978 human FLcDNAs that were available on 15th June 2006. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci, while 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. We followed essentially the same mapping technique that we described previously (9,10). We updated the annotation for the mitochondrial transcripts since the previous major release, H-InvDB_4.0, which resulted in slightly fewer transcripts and clusters (Table 1). We then assigned a standardized functional annotation to each H-Inv transcript by human curation, based on the results of similarity searches and InterProScan (11). The numbers of manually curated human proteins in each category are summarized in Table 2.
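The locus counts above can be checked with simple arithmetic. The following Python sketch recomputes the cluster total and the percentages from the counts given in the abstract; the variable and function names are illustrative only and are not part of H-InvDB.

```python
# Locus counts for H-InvDB_4.6 as given in the abstract.
protein_coding = 34_057
non_protein_coding = 642
pseudogene_overlap = 858

# The two locus classes together account for all gene clusters.
total_clusters = protein_coding + non_protein_coding


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print(total_clusters)                               # number of gene clusters
print(percent(protein_coding, total_clusters))      # protein-coding share
print(percent(non_protein_coding, total_clusters))  # non-protein-coding share
print(percent(pseudogene_overlap, total_clusters))  # pseudogene-overlap share
```

The two locus classes sum to the 34 699 gene clusters, and the rounded shares agree with the 98.1%, 1.9% and 2.5% quoted in the text.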

Table 2.

Statistics of manually curated representative H-Inv proteins

Category	Definition	Number of representative HITs	%
I	Identical to known^a human protein (≥98% identity, =100% coverage)	12 404	36.42
II	Similar to known^a protein (≥50% identity, ≥50% coverage)	3165	9.29
III	InterPro domain containing protein	3056	8.97
IV	Conserved hypothetical protein	4200	12.33
V	Hypothetical protein	5124	15.05
VI	Hypothetical short protein (20–79 amino acids)	5250	15.42
VII	Pseudogene candidates	858	2.52
Total		34 057	100

a ‘Known’ proteins are experimentally validated proteins reported in the literature.
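The category definitions in Table 2 can be read as a decision cascade. The Python sketch below is only an illustration of that reading: the identity, coverage and length thresholds come from Table 2, but the `Evidence` fields, the function name and, in particular, the order in which the categories are tested are assumptions, not the curation procedure actually used for H-InvDB.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical evidence summary for one representative transcript."""
    identity: float = 0.0            # % identity of the best hit to a known protein
    coverage: float = 0.0            # % coverage of that hit
    hit_is_human: bool = False       # best hit is a known human protein
    has_interpro_domain: bool = False
    is_conserved: bool = False       # assumed flag for 'conserved hypothetical'
    orf_length_aa: int = 0           # predicted protein length in amino acids
    is_pseudogene_candidate: bool = False


def assign_category(ev: Evidence) -> str:
    """Return a Table 2 category label (I to VII); test order is assumed."""
    if ev.is_pseudogene_candidate:
        return "VII"
    if ev.hit_is_human and ev.identity >= 98 and ev.coverage == 100:
        return "I"    # identical to a known human protein
    if ev.identity >= 50 and ev.coverage >= 50:
        return "II"   # similar to a known protein
    if ev.has_interpro_domain:
        return "III"  # InterPro domain containing protein
    if ev.is_conserved:
        return "IV"   # conserved hypothetical protein
    if 20 <= ev.orf_length_aa <= 79:
        return "VI"   # hypothetical short protein
    return "V"        # hypothetical protein


print(assign_category(Evidence(identity=99.5, coverage=100, hit_is_human=True)))
print(assign_category(Evidence(orf_length_aa=45)))
```

In this sketch a transcript with no supporting evidence falls through to Category V unless its predicted protein is 20 to 79 amino acids long, in which case it is labelled Category VI.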

For these transcripts and genes, we provide comprehensive annotation including descriptions of their gene structures, alternative splicing isoforms, functional non-protein-coding RNAs, functional domains of proteins, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene-expression profiles, orthologous genes and evolutionary features in model animals, protein–protein interactions (PPI) and annotation for gene families. We have also annotated several new features related to transcript quality.

NEW ANNOTATED FEATURES IN H-InvDB

Classification of ncRNA

We annotated the transcripts that do not have homology to known protein-coding genes or InterPro-domain-containing genes as non-protein-coding transcript candidates. We classified 1216 non-protein-coding transcripts into ‘Identical to known ncRNA’ (124), ‘Similar to known ncRNA’ (74) and ‘Putative ncRNA’ (1018) by homology with known ncRNA databases and discrimination analysis.

Sequence quality features: nonsense-mediated decay (NMD), read-through, reverse orientation

A total of 269 transcripts were annotated as candidates of read-through and 2731 as targets of NMD by the extended sequence quality annotation.

Category VII: pseudogene candidates

To annotate transcribed pseudogene candidates, we proceeded in two steps. First, we filtered out the functional protein-coding genes by only targeting representative category II transcripts and those identified to have frame shifts and/or nonsense mutations. Second, we predicted transcribed pseudogene candidates based on a support vector machine (SVM) method. In the current release, we annotated 858 transcribed pseudogene candidates (Category VII; Table 2).

Annotation of gene families/groups

We annotated four selected gene families/groups: T-cell receptor (TCR), Immunoglobulin (Ig), Major Histocompatibility Complex (MHC) or Human Leukocyte Antigen (HLA) and Olfactory receptor (OR) using the original pipeline based on sequence analysis against genome and protein databases complemented by a text-mining approach. In the current release, we identified 15 TCR, 21 Ig, 72 MHC and 122 OR gene clusters. All the annotation items and features of H-Inv transcript sequences are stored and shown in the main views or sub-databases in H-InvDB.

COMPREHENSIVE ANNOTATION RESOURCES IN H-InvDB

The current H-InvDB annotation resources consist of two main views, Transcript view and Locus view, and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group view with the appropriate cross-links. An overview of the comprehensive annotation resources of the human gene and transcripts in H-InvDB is shown in Figure 1.

Figure 1.

H-InvDB: overview of the comprehensive annotation resource for the human genes and transcripts. The current H-InvDB annotation resources consist of two main views, Transcript view and Locus view, and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group view. The Transcript view and the Locus view are the main viewers to display the annotation of each H-Invitational transcript (HIT) and H-Invitational cluster (HIX). The DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group view are sub-databases to provide detailed annotation for each annotation feature. The links to related databases are provided from the appropriate viewers.

Transcript view

The Transcript view shows all the annotation of an H-Inv transcript in 12 section tabs: (i) gene structure, (ii) gene function, (iii) gene ontology, (iv) predicted CDS, (v) functional motif, (vi) subcellular localization, (vii) protein structure information, (viii) gene expression, (ix) disease/pathology, (x) evolutionary information, (xi) polymorphism (SNP, indel and microsatellite) and interspersed repeat information and (xii) transcript and sequence quality information. As seen in the example of a Transcript view shown in Figure 1, this view also has links to many external public databases including DDBJ/EMBL/GenBank, RefSeq, UniProtKB, HGNC, InterPro, Ensembl, EntrezGene, PubMed, dbSNP, GO and GTOP and to web sites of the original data producers of the FLcDNA clones and sequences including the Chinese National Human Genome Center (CHGC), German cDNA Consortium (DKFZ/MIPS), Helix Research Institute, Inc. (HRI), the Institute of Medical Science in the University of Tokyo (IMSUT), the Kazusa DNA Research Institute (KDRI), the Mammalian Gene Collection (MGC/NCI) and NEDO. This view was previously known as the cDNA view (mRNA view).

Locus view

The Locus view shows all the annotation of a locus in six section tabs: (i) gene structure and location in the human genome, (ii) gene function, (iii) alternative splicing pattern, (iv) gene expression, (v) disease/pathology and (vi) cluster member information. As seen in the example of a Locus view shown in Figure 1, it shows links to external public databases including DDBJ/EMBL/GenBank, RefSeq, EntrezGene, GeneCards, HGNC and OMIM.

DiseaseInfo Viewer

The DiseaseInfo Viewer is a database of known and orphan genetic diseases and their relation to H-Inv clusters with EntrezGene and OMIM cross-links. The DiseaseInfo Viewer provides two kinds of disease information related to H-Inv clusters: known disease-related genes and co-localized orphan diseases. An orphan disease is defined as a disease mapped on a chromosomal region, but for which the responsible gene has not been identified yet. Co-localization does not necessarily mean a direct relationship between gene and disease; however, genes that are cytogenetically co-localized with a disease could be possible candidate genes for that disease. The co-localized H-Inv clusters are chosen by computing the physical range of each cytogenetic band with a 1 Mbp margin.
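The co-localization rule described above amounts to an interval-overlap test after widening the cytogenetic band by the margin. The Python sketch below illustrates that idea; the `Region` type, the coordinate convention (inclusive start and end in base pairs) and the example coordinates are assumptions made for illustration, not H-InvDB data structures or data.

```python
from typing import NamedTuple

MARGIN_BP = 1_000_000  # the 1 Mbp margin mentioned in the text


class Region(NamedTuple):
    """A genomic interval; inclusive coordinates are an assumed convention."""
    chrom: str
    start: int
    end: int


def is_colocalized(cluster: Region, band: Region, margin: int = MARGIN_BP) -> bool:
    """True if the cluster overlaps the band widened by `margin` on each side."""
    if cluster.chrom != band.chrom:
        return False
    return cluster.start <= band.end + margin and cluster.end >= band.start - margin


# Hypothetical coordinates, for illustration only.
band = Region("chr1", 5_000_000, 7_000_000)
near_cluster = Region("chr1", 7_500_000, 7_600_000)  # within the 1 Mbp margin
far_cluster = Region("chr1", 8_000_001, 8_100_000)   # just outside the margin

print(is_colocalized(near_cluster, band))
print(is_colocalized(far_cluster, band))
```

A cluster passing this test is only a positional candidate, in line with the caveat that co-localization does not imply a direct gene-disease relationship.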

Human anatomic gene expression library (H-ANGEL)

H-ANGEL is a database of expression patterns that we constructed to obtain a broad outline of such patterns for human genes (12). We collected gene-expression data in normal and adult human tissues that were generated by three types of methods and in seven different platforms, including: iAFLP, a PCR-based quantitative expression profiling method; DNA arrays (long oligomers, short oligomers and cDNA microarrays); and cDNA sequence tags (SAGE, EST, BodyMap and MPSS). The H-ANGEL database comprises the largest and most comprehensive collection of gene expression patterns so far, which also provides a classification of human genes in terms of their expression.

Clustering Viewer

The Clustering Viewer facilitates the comparison of different clustering results. It allows users to see whether H-Inv transcripts are consistently clustered by different clustering methods. It also displays multiple alignments of transcripts by using CLUSTALW (13). The Clustering Viewer shows all the member transcripts of the H-Inv cluster to which a query sequence belongs.

G-integra

G-integra is an integrated genome browser in which users can examine the genomic structures of the transcripts. As seen in an example view in Figure 1, the location in the human genome and the gene structure of an H-Inv transcript (green), and the corresponding RefSeq and Ensembl entries are shown. The structures of the genes and transcripts for 11 non-human species, Pan troglodytes (chimpanzee), Macaca sp. (macaque), Mus musculus (mouse), Rattus norvegicus (rat), Canis familiaris (dog), Bos taurus (cow), Monodelphis domestica (opossum), Gallus gallus (chicken), Danio rerio (zebrafish), Tetraodon nigroviridis (tetraodon) and Takifugu rubripes (fugu) can be optionally displayed for comparison. Other options allow the results of gene prediction programs such as GenScan (14), HMMgene (15), FGENESH (16) and JIGSAW (17) to be displayed.

TOPO Viewer

The TOPO Viewer is a tool for viewing subcellular targeting signals predicted by TargetP (18) and the presence of transmembrane helices predicted by SOSUI (19) and TMHMM (20). The probabilities that a protein may be delivered to up to nine distinct subcellular locations are predicted by WoLF PSORT (21). TargetP predicts whether a protein contains a signal peptide, a mitochondrial targeting signal or any other type of signal. The TOPO Viewer consists of four tab pages: TABLE, MAP, FILE and GFP. The TABLE tab page displays the prediction results for all the programs used.

Evola

Evola is a database of evolutionary annotation of human genes (22). It provides sequence alignments and phylogenetic trees of manually curated orthologous genes between human and 11 model organisms: Pan troglodytes (chimpanzee), Macaca sp. (macaque), Mus musculus (mouse), Rattus norvegicus (rat), Canis familiaris (dog), Bos taurus (cow), Monodelphis domestica (opossum), Gallus gallus (chicken), Danio rerio (zebrafish), Tetraodon nigroviridis (tetraodon) and Takifugu rubripes (fugu). Sequence alignments and phylogenetic trees of the orthologous genes and homologous genes are shown in Evola.

PPI view

The PPI view displays H-InvDB human PPI information at http://www.jbirc.aist.go.jp/hinv/ppi/. We collected PPI data from five databases (BIND, DIP, MINT, HPRD and IntAct), removed redundancies in the PPI data among the databases based on sequence similarity and integrated the data with the H-Invitational proteins.
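The integration step can be pictured as merging interaction pairs from several sources into one non-redundant set. The Python sketch below is a simplified illustration: it assumes that every interactor has already been mapped to an H-Invitational protein identifier, whereas the text states that redundancy was resolved by sequence similarity. The function name and the example pairs are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]  # unordered pair of protein IDs (one element for self-interactions)


def merge_ppi(sources: Dict[str, Iterable[Tuple[str, str]]]) -> Dict[Pair, Set[str]]:
    """Merge interaction pairs from several databases.

    Returns a mapping from each unordered pair to the set of source
    databases that reported it, so A-B and B-A collapse into one entry.
    """
    merged: Dict[Pair, Set[str]] = {}
    for db_name, pairs in sources.items():
        for protein_a, protein_b in pairs:
            key = frozenset((protein_a, protein_b))
            merged.setdefault(key, set()).add(db_name)
    return merged


# Hypothetical example data: the pairs below are invented for illustration.
example_sources = {
    "BIND": [("HIP000000001", "HIP000000002")],
    "HPRD": [("HIP000000002", "HIP000000001"), ("HIP000000003", "HIP000000003")],
}
merged = merge_ppi(example_sources)
for pair, databases in merged.items():
    print(sorted(pair), sorted(databases))
```

Keeping the set of source databases for each pair preserves provenance after the redundant entries are collapsed.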

Gene family/Group view

The Gene family/Group view provides human-curated annotation datasets for the selected gene families/groups at http://www.jbirc.aist.go.jp/hinv/ahg-db/geneFamilyIndex.jsp. For H-InvDB release 4.0, we provided detailed annotations for four selected gene families/groups: TCR, Ig, MHC and OR. Each page provides the list of genes, gene names, definitions and links for the appropriate H-InvDB views.

H-InvDB New Identifier

We defined and assigned a unique identifier to each annotation unit: transcript, protein or cluster (7,8). The identifier for an H-Invitational transcript, ‘HIT’, consists of the prefix HIT followed by nine digits (e.g. HIT000000001), and the identifier for an H-Invitational cluster, ‘HIX’, consists of the prefix HIX followed by seven digits (e.g. HIX0000001). To track modifications to the sequence or annotation of an H-Inv entry, a version is assigned to each ID and is always stated with the ID. Additionally, we now provide a new identifier for each H-Invitational protein, ‘HIP’, consisting of the prefix HIP followed by nine digits (e.g. HIP000000001).
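The identifier formats described above are regular enough to validate mechanically. The Python sketch below parses HIT, HIX and HIP identifiers using the digit counts given in the text. The text does not specify how the version is written, so the `.N` suffix accepted here is an assumption, as are the function and type names.

```python
import re
from typing import NamedTuple, Optional

# Digit counts per prefix, as described in the text.
_DIGITS = {"HIT": 9, "HIX": 7, "HIP": 9}
_KINDS = {"HIT": "transcript", "HIX": "cluster", "HIP": "protein"}

# Assumption: an optional version is appended as ".<integer>".
_ID_PATTERN = re.compile(r"^(HIT|HIX|HIP)(\d+)(?:\.(\d+))?$")


class HInvId(NamedTuple):
    kind: str               # "transcript", "cluster" or "protein"
    number: int             # numeric part of the identifier
    version: Optional[int]  # None when no version suffix is present


def parse_hinv_id(text: str) -> HInvId:
    """Parse an H-InvDB identifier; raise ValueError if it is malformed."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an H-InvDB identifier: {text!r}")
    prefix, digits, version = match.groups()
    if len(digits) != _DIGITS[prefix]:
        raise ValueError(f"{prefix} identifiers need {_DIGITS[prefix]} digits: {text!r}")
    return HInvId(_KINDS[prefix], int(digits), int(version) if version else None)


print(parse_hinv_id("HIT000000001"))
print(parse_hinv_id("HIX0000001.3"))
```

Checking the digit count per prefix rejects, for example, a cluster identifier written with nine digits.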

H-InvDB Data Availability

H-InvDB is freely available for both academic and commercial use and can be accessed online at http://www.h-invitational.jp/ (or hinv.jp). Annotated data can also be downloaded as FASTA sequence files, original-format flat files or XML files from HTTP and FTP servers. A mirror database is also available at http://hinvdb.ddbj.nig.ac.jp/. Minor updates are released every three months and major updates are released once a year.

21 in total

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

2. HUGE: a database for human large proteins identified in the Kazusa cDNA sequencing project.

Authors: Reiko Kikuno; Takahiro Nagase; Mina Waki; Osamu Ohara
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

3. Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs.

Authors: S Wiemann; B Weil; R Wellenreuther; J Gassenhuber; S Glassl; W Ansorge; M Böcher; H Blöcker; S Bauersachs; H Blum; J Lauber; A Düsterhöft; A Beyer; K Köhrer; N Strack; H W Mewes; B Ottenwälder; B Obermaier; J Tampe; D Heubner; R Wambutt; B Korn; M Klein; A Poustka
Journal: Genome Res Date: 2001-03 Impact factor: 9.043

4. Ab initio gene finding in Drosophila genomic DNA.

Authors: A A Salamov; V V Solovyev
Journal: Genome Res Date: 2000-04 Impact factor: 9.043

5. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences.

Authors: Robert L Strausberg; Elise A Feingold; Lynette H Grouse; Jeffery G Derge; Richard D Klausner; Francis S Collins; Lukas Wagner; Carolyn M Shenmen; Gregory D Schuler; Stephen F Altschul; Barry Zeeberg; Kenneth H Buetow; Carl F Schaefer; Narayan K Bhat; Ralph F Hopkins; Heather Jordan; Troy Moore; Steve I Max; Jun Wang; Florence Hsieh; Luda Diatchenko; Kate Marusina; Andrew A Farmer; Gerald M Rubin; Ling Hong; Mark Stapleton; M Bento Soares; Maria F Bonaldo; Tom L Casavant; Todd E Scheetz; Michael J Brownstein; Ted B Usdin; Shiraki Toshiyuki; Piero Carninci; Christa Prange; Sam S Raha; Naomi A Loquellano; Garrick J Peters; Rick D Abramson; Sara J Mullahy; Stephanie A Bosak; Paul J McEwan; Kevin J McKernan; Joel A Malek; Preethi H Gunaratne; Stephen Richards; Kim C Worley; Sarah Hale; Angela M Garcia; Laura J Gay; Stephen W Hulyk; Debbie K Villalon; Donna M Muzny; Erica J Sodergren; Xiuhua Lu; Richard A Gibbs; Jessica Fahey; Erin Helton; Mark Ketteman; Anuradha Madan; Stephanie Rodrigues; Amy Sanchez; Michelle Whiting; Anup Madan; Alice C Young; Yuriy Shevchenko; Gerard G Bouffard; Robert W Blakesley; Jeffrey W Touchman; Eric D Green; Mark C Dickson; Alex C Rodriguez; Jane Grimwood; Jeremy Schmutz; Richard M Myers; Yaron S N Butterfield; Martin I Krzywinski; Ursula Skalska; Duane E Smailus; Angelique Schnerch; Jacqueline E Schein; Steven J M Jones; Marco A Marra
Journal: Proc Natl Acad Sci U S A Date: 2002-12-11 Impact factor: 11.205

6. Prediction of complete gene structures in human genomic DNA.

Authors: C Burge; S Karlin
Journal: J Mol Biol Date: 1997-04-25 Impact factor: 5.469

7. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors: J D Thompson; D G Higgins; T J Gibson
Journal: Nucleic Acids Res Date: 1994-11-11 Impact factor: 16.971

8. HUNT: launch of a full-length cDNA database from the Helix Research Institute.

Authors: H T Yudate; M Suwa; R Irie; H Matsui; T Nishikawa; Y Nakamura; D Yamaguchi; Z Z Peng; T Yamamoto; K Nagai; K Hayashi; T Otsuki; T Sugiyama; T Ota; Y Suzuki; S Sugano; T Isogai; Y Masuho
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

9. Evola: Ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees.

Authors: Akihiro Matsuya; Ryuichi Sakate; Yoshihiro Kawahara; Kanako O Koyanagi; Yoshiharu Sato; Yasuyuki Fujii; Chisato Yamasaki; Takuya Habara; Hajime Nakaoka; Fusano Todokoro; Kaori Yamaguchi; Toshinori Endo; Satoshi Oota; Wojciech Makalowski; Kazuho Ikeo; Yoshiyuki Suzuki; Kousuke Hanada; Katsuyuki Hashimoto; Momoki Hirai; Hisakazu Iwama; Naruya Saitou; Aiko T Hiraki; Lihua Jin; Yayoi Kaneko; Masako Kanno; Katsuhiko Murakami; Akiko Ogura Noda; Naomi Saichi; Ryoko Sanbonmatsu; Mami Suzuki; Jun-ichi Takeda; Masayuki Tanaka; Takashi Gojobori; Tadashi Imanishi; Takeshi Itoh
Journal: Nucleic Acids Res Date: 2007-11-03 Impact factor: 16.971

10. Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

Authors: Tadashi Imanishi; Takeshi Itoh; Yutaka Suzuki; Claire O'Donovan; Satoshi Fukuchi; Kanako O Koyanagi; Roberto A Barrero; Takuro Tamura; Yumi Yamaguchi-Kabata; Motohiko Tanino; Kei Yura; Satoru Miyazaki; Kazuho Ikeo; Keiichi Homma; Arek Kasprzyk; Tetsuo Nishikawa; Mika Hirakawa; Jean Thierry-Mieg; Danielle Thierry-Mieg; Jennifer Ashurst; Libin Jia; Mitsuteru Nakao; Michael A Thomas; Nicola Mulder; Youla Karavidopoulou; Lihua Jin; Sangsoo Kim; Tomohiro Yasuda; Boris Lenhard; Eric Eveno; Yoshiyuki Suzuki; Chisato Yamasaki; Jun-ichi Takeda; Craig Gough; Phillip Hilton; Yasuyuki Fujii; Hiroaki Sakai; Susumu Tanaka; Clara Amid; Matthew Bellgard; Maria de Fatima Bonaldo; Hidemasa Bono; Susan K Bromberg; Anthony J Brookes; Elspeth Bruford; Piero Carninci; Claude Chelala; Christine Couillault; Sandro J de Souza; Marie-Anne Debily; Marie-Dominique Devignes; Inna Dubchak; Toshinori Endo; Anne Estreicher; Eduardo Eyras; Kaoru Fukami-Kobayashi; Gopal R Gopinath; Esther Graudens; Yoonsoo Hahn; Michael Han; Ze-Guang Han; Kousuke Hanada; Hideki Hanaoka; Erimi Harada; Katsuyuki Hashimoto; Ursula Hinz; Momoki Hirai; Teruyoshi Hishiki; Ian Hopkinson; Sandrine Imbeaud; Hidetoshi Inoko; Alexander Kanapin; Yayoi Kaneko; Takeya Kasukawa; Janet Kelso; Paul Kersey; Reiko Kikuno; Kouichi Kimura; Bernhard Korn; Vladimir Kuryshev; Izabela Makalowska; Takashi Makino; Shuhei Mano; Regine Mariage-Samson; Jun Mashima; Hideo Matsuda; Hans-Werner Mewes; Shinsei Minoshima; Keiichi Nagai; Hideki Nagasaki; Naoki Nagata; Rajni Nigam; Osamu Ogasawara; Osamu Ohara; Masafumi Ohtsubo; Norihiro Okada; Toshihisa Okido; Satoshi Oota; Motonori Ota; Toshio Ota; Tetsuji Otsuki; Dominique Piatier-Tonneau; Annemarie Poustka; Shuang-Xi Ren; Naruya Saitou; Katsunaga Sakai; Shigetaka Sakamoto; Ryuichi Sakate; Ingo Schupp; Florence Servant; Stephen Sherry; Rie Shiba; Nobuyoshi Shimizu; Mary Shimoyama; Andrew J Simpson; Bento Soares; Charles Steward; Makiko Suwa; Mami Suzuki; Aiko Takahashi; Gen Tamiya; Hiroshi Tanaka; Todd Taylor; Joseph D Terwilliger; Per Unneberg; Vamsi Veeramachaneni; Shinya Watanabe; Laurens Wilming; Norikazu Yasuda; Hyang-Sook Yoo; Marvin Stodolsky; Wojciech Makalowski; Mitiko Go; Kenta Nakai; Toshihisa Takagi; Minoru Kanehisa; Yoshiyuki Sakaki; John Quackenbush; Yasushi Okazaki; Yoshihide Hayashizaki; Winston Hide; Ranajit Chakraborty; Ken Nishikawa; Hideaki Sugawara; Yoshio Tateno; Zhu Chen; Michio Oishi; Peter Tonellato; Rolf Apweiler; Kousaku Okubo; Lukas Wagner; Stefan Wiemann; Robert L Strausberg; Takao Isogai; Charles Auffray; Nobuo Nomura; Takashi Gojobori; Sumio Sugano
Journal: PLoS Biol Date: 2004-04-20 Impact factor: 8.029
