TBestDB: a taxonomically broad database of expressed sequence tags (ESTs).

Emmet A O'Brien, Liisa B Koski, Yue Zhang, LiuSong Yang, Eric Wang, Michael W Gray, Gertraud Burger, B Franz Lang.

Abstract

The TBestDB database contains approximately 370,000 clustered expressed sequence tag (EST) sequences from 49 organisms, covering a taxonomically broad range of poorly studied, mainly unicellular eukaryotes, and includes experimental information, consensus sequences, gene annotations and metabolic pathway predictions. Most of these ESTs have been generated by the Protist EST Program, a collaboration among six Canadian research groups. EST sequences are read from trace files up to a minimum quality cut-off, vector and linker sequence is masked, and the ESTs are clustered using phrap. The resulting consensus sequences are automatically annotated by using the AutoFACT program. The datasets are automatically checked for clustering errors due to chimerism and potential cross-contamination between organisms, and suspect data are flagged in or removed from the database. Access to data deposited in TBestDB by individual users can be restricted to those users for a limited period. With this first report on TBestDB, we open the database to the research community for free processing, annotation, interspecies comparisons and GenBank submission of EST data generated in individual laboratories. For instructions on submission to TBestDB, contact tbestdb@bch.umontreal.ca. The database can be queried at http://tbestdb.bcm.umontreal.ca/.

Year: 2007 PMID: 17202165 PMCID: PMC1899108 DOI: 10.1093/nar/gkl770

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Much of the evolutionary diversity and biochemical versatility of the domain Eukarya is contained outside the kingdoms of animals, plants and fungi, in a highly diverse assemblage of poorly studied, mostly unicellular eukaryotes commonly referred to as protists (1–3), many of which are biologically relevant in the fields of human health and agriculture. As the early eukaryotic world must have been exclusively unicellular, protists are the key to understanding the origin and evolution of multicellular eukaryotes. As we know today, close unicellular relatives of the multicellular animals, fungi and land plants are, respectively, choanoflagellates plus Ichthyosporea (4,5), nucleariids [(6–9); E.Steenkamp, S.Baldauf and B.F.Lang, unpublished data], and charophyte algae (10,11). Unfortunately, very few protist genome projects are underway and protist nuclear genomics data are often limited to one or a few standard genes. An effective way of alleviating this shortcoming is to generate expressed sequence tags (ESTs) from cDNA libraries. This technique is fast and cost-effective, and provides a robust approximation of the expressed genetic component of a given organism. The Protist EST Program (PEP) was a large-scale genomics collaboration among six Canadian research groups with the objective of characterizing the expressed portion of the nuclear genome of a large number of different protist species. Most other protist EST and genome projects and their associated databases focus on pathogenic organisms, e.g. ApiEST-DB [protozoans in the phylum Apicomplexa] (12), CryptoDB [Cryptosporidium] (13), Full-Malaria [Plasmodium species] (14), PlasmoDB [Plasmodium falciparum] (15), TcruziDB [Trypanosoma cruzi] (16), ToxoDB [Toxoplasma gondii] (17) and the protist data contained in GeneDB [17 protist data collections, mostly Trypanosoma and Plasmodium species] (18). 
The few exceptions such as the Diatom EST Database [Phaeodactylum tricornutum and Thalassiosira pseudonana] (19), dictyBase [Dictyostelium discoideum] (20) and the Porphyra yezoensis EST index (21) tend to have a very specialized focus. PEP, in contrast, aimed to survey a taxonomically broad collection of protists and other poorly studied eukaryotic groups (Table 1). During the PEP project, a total of ∼550 000 ESTs were generated, of which ∼450 000 passed quality cut-offs and 370 000 of these sequences, from 49 organisms, have been made publicly available in the TBestDB database as of July 1, 2006. Approximately 80 000 ESTs from 19 other datasets, including PEP-related and externally generated data, are still under analysis and will be released into the public domain over the next few months. Researchers are invited to submit their data to TBestDB for free processing and annotation, with private access to the results provided for a limited time.

Table 1

Publicly available sequence content of TBestDB (July 1, 2006)

Organism name	No. of ESTs	No. of clusters
Acanthamoeba castellanii	13 814	5262
Acetabularia acetabulum	3464	2573
Allomyces macrogynus	5073	2149
Amoebidium parasiticum	3623	1557
Antonospora (Nosema) locustae	2376	700
Astasia longa	2730	1718
Bigelowiella natans	3462	2318
Blastocystis hominis	12 759	3330
Capsaspora owczarzaki	8863	2516
Chlamydomonas incerta	5124	1388
Cyanophora paradoxa[Durnford group]	9867	2448
Cyanophora paradoxa[Loeffelhardt group]	4673	1478
Diplonema papillatum	4791	3664
Euglena gracilis[Durnford group]	17 236	8651
Glaucocystis nostochinearum	8745	2831
Hartmannella vermiformis	9505	4986
Helicosporidium sp.	1188	701
Heterocapsa triquetra	6804	2038
Histiona aroides	4009	1763
Hyperamoeba dachnya	2756	1762
Isochrysis galbana CCMP 1323	12 205	6095
Jakoba bahamensis	4323	2286
Jakoba libera	5452	2565
Karlodinium micrum	16 544	11 903
Malawimonas californiana	4437	2314
Malawimonas jakobiformis	9798	4505
Mastigamoeba balamuthi	19 182	4438
Mesostigma viride	5615	1771
Micromonas sp.	3662	2004
Monosiga ovata	6433	2677
Nephroselmis olivacea	126	115
Oxytricha trifallax	2272	1230
Pavlova lutheri	7590	3383
Physarum polycephalum	9684	3078
Polysphondylium pallidum	4445	1247
Polytomella parva	5062	2151
Prototheca wickerhamii	5641	1542
Reclinomonas americana	17 644	6797
Rhizopus oryzae	12 570	5105
Saitoella complicata	3840	1008
Sawyeria marylandensis	9300	3520
Scenedesmus obliquus	6615	2666
Seculamonas ecuadoriensis	5256	2217
Sphaeroforma arctica	8006	2763
Spizellomyces punctatus	5365	2079
Streblomastix strix	4475	2595
Taphrina deformans	3919	1435
Tetrahymena thermophila	31 548	9050
Trimastix pyriformis	9615	2686
Total	371 484	149 058

DATA CONTENT

Information in TBestDB that is publicly accessible at the time of writing is compiled in Table 1. Data include individual EST sequences, consensus sequences and clustering information, conceptual translations, functional annotations drawn from three different sources, as well as metabolic pathway predictions. In addition, the database contains experimental information on cDNA libraries and information on data quality and project status.

EST PROCESSING PIPELINE

The EST processing pipeline comprises three primary steps (Figure 1), starting from the download of sequences submitted by the PEP member laboratories. Annotation is then followed by post-processing steps to detect potential contamination and chimerism.

Figure 1

EST processing pipeline. EST tracefiles are accepted in .scf or .abi format via a dedicated sftp server. Any EST for which phred cannot read more than 60 nt of high-quality sequence is discarded. The default value for quality is 99% certainty of identification of each residue (ABI sequence technology), but this value has been set to slightly lower thresholds in certain instances where justified by the effective quality. The parameters used for cross_match have been adjusted slightly from the defaults: the minscore value has been changed from 20 to 17, to allow for slightly more relaxed matches, as this was found to give the best identification and masking of short linker sequences. At this point any EST sequence containing fewer than 60 unmasked residues is removed from further consideration. AutoFACT combines the most informative of the top 10 BLAST hits from the European Ribosomal Database (BLASTN), UniRef90 (BLASTX), KEGG (BLASTX), COG (BLASTX), Pfam (RPS-BLAST), and NCBI's nr (BLASTX) and est_others (TBLASTX) databases. Default parameters bitscore >40 and E-value <1 × 10⁻⁴ were used. Rapid Annotation is performed using BLASTX against a specialized set of sequences (see Annotation in text) with an E-value cut-off of 1 × 10⁻⁴. Top-BLAST-hit annotations are from TBLASTX hits to NCBI's nr database using an E-value cut-off of 1 × 10⁻⁴. ORF prediction is performed by translating the consensus sequence in all frames, identifying stop codons and marking any potential ORF longer than 20 residues.
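The ORF-marking step described in the caption can be illustrated with a short Python sketch. This is not the TBestDB code: it assumes the standard genetic code (several protists in Table 1 use variant codes), treats any stop-free stretch as a potential ORF without requiring a start codon, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no variant code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string (non-ACGT symbols are kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(consensus, min_len=20):
    """Return (frame, start_residue, peptide) for every stop-free stretch
    longer than min_len residues, over all six reading frames."""
    consensus = consensus.upper()
    orfs = []
    for strand, seq in (("+", consensus), ("-", reverse_complement(consensus))):
        for offset in range(3):
            protein = translate(seq[offset:])
            pos = 0
            for segment in protein.split("*"):  # '*' marks a stop codon
                if len(segment) > min_len:
                    orfs.append((f"{strand}{offset + 1}", pos, segment))
                pos += len(segment) + 1
    return orfs
```

Calling `find_orfs(consensus)` on a consensus sequence returns one tuple per candidate ORF; the `min_len=20` default mirrors the "longer than 20 residues" criterion in the caption.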

Sequence clustering

EST data are accepted as tracefiles in .scf or .abi format. Incoming tracefiles are processed using the phred/phrap package (22), which reads each tracefile, converts it into a sequence file with associated quality assessments for each residue, removes both vector and linker sequences and finally assembles the ESTs into clusters to generate consensus sequences. It should be noted that there is an observed difficulty with phrap in clustering datasets beyond a certain number of readings (starting between 5000 and 10 000 in our experience, depending on the individual dataset), manifesting as a failure to generate some small number, usually <5%, of expected clusters. We have addressed this difficulty by recursively running phrap on the set of unclustered sequences until no new clustering is found.
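The recursive reclustering strategy can be sketched as a simple loop. In this minimal illustration, `cluster_once` is a hypothetical stand-in for one assembler run (for example, a wrapper that calls phrap and parses its output); neither that name nor the toy clustering function below comes from TBestDB.

```python
def recluster_until_stable(reads, cluster_once):
    """Repeatedly cluster the leftover reads until a pass finds no new clusters.

    cluster_once(reads) stands in for one assembler run and must return
    (clusters, leftovers), where clusters is a list of lists of reads.
    """
    all_clusters = []
    remaining = list(reads)
    while remaining:
        clusters, leftovers = cluster_once(remaining)
        if not clusters:          # no new clustering found: stop
            break
        all_clusters.extend(clusters)
        remaining = leftovers
    return all_clusters, remaining


def toy_cluster_once(reads):
    """Toy stand-in: groups reads sharing a 3-letter prefix, but reports only
    one group per pass to mimic clusters missed in a single run."""
    groups = {}
    for read in reads:
        groups.setdefault(read[:3], []).append(read)
    for members in groups.values():
        if len(members) > 1:
            return [members], [r for r in reads if r not in members]
    return [], list(reads)


clusters, singletons = recluster_until_stable(
    ["AAA1", "AAA2", "CCC1", "CCC2", "GGG1"], toy_cluster_once)
```

With the toy function, the second cluster is only recovered on the second pass, which is the situation the recursive procedure is meant to handle.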

Statistical breakdown

Once clustering is completed, various statistics are calculated to facilitate the management of ongoing EST projects. Sequence quality is assessed by monitoring maximal and average reading length after quality clipping, and clone insert sizes, before and after vector clipping, are evaluated globally and by library. The overall progress of a project can be assessed on the basis of the distribution and growth of cluster size, and the evolution of redundancy of individual or multiple libraries for a given organism can be monitored, allowing rapid decisions to be made about the most productive directions for further sequencing.
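As an illustration of the kind of statistic involved, the sketch below computes a simple redundancy measure from EST and cluster counts. The definition used here (the fraction of ESTs that did not open a new cluster) is an assumption made for this example; the text does not state the formula used in TBestDB.

```python
def redundancy(n_ests, n_clusters):
    """Fraction of ESTs that did not open a new cluster (assumed definition)."""
    if n_ests <= 0:
        raise ValueError("n_ests must be positive")
    return 1.0 - n_clusters / n_ests


def ests_per_cluster(n_ests, n_clusters):
    """Mean cluster size."""
    return n_ests / n_clusters


# Counts taken from Table 1.
for name, ests, clusters in [("Tetrahymena thermophila", 31548, 9050),
                             ("Karlodinium micrum", 16544, 11903)]:
    print(f"{name}: redundancy {redundancy(ests, clusters):.2f}, "
          f"{ests_per_cluster(ests, clusters):.1f} ESTs per cluster")
```

Under this definition, a library whose redundancy keeps rising as reads are added is yielding few new clusters, which is the kind of signal that could inform a decision about further sequencing.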

Annotation

TBestDB conducts three kinds of annotation procedures for consensus sequences derived from clustered ESTs. (i) AutoFACT (23) provides the most sophisticated annotations. Using local BLAST comparisons (24), AutoFACT gathers classification information following a hierarchical system, from a collection of seven specialized databases (Table 2). As not all descriptions from top BLAST hits contain biologically meaningful information, AutoFACT adopts an ‘uninformative rule’ to identify the highest scoring BLAST hit that provides a meaningful annotation, generating ∼50% more functionally informative annotations than a top-BLAST-hit approach. Annotations provided by AutoFACT are of high quality, but the process of generating them is time-consuming due to the need for multiple BLAST searches. (ii) The Rapid Annotation procedure was designed to allow quick initial surveys of incoming data. Here, annotations are assigned by searching for sequence similarity to deduced nucleus-encoded proteomes from selected organisms (Arabidopsis thaliana, Ustilago maydis, Neurospora crassa, Homo sapiens, Rickettsia prowazeki and Magnetospirillum magnetotacticum) and deduced mitochondrion-encoded proteins of Reclinomonas americana—all of which have been comprehensively reannotated using AutoFACT—and with collections of representative large and small subunit ribosomal RNAs. Using this procedure, information about ubiquitous proteins and contamination of cDNA libraries with mitochondrial or rRNA sequences is made available to TBestDB users as each new EST dataset is processed. With this system a set of 5000 clusters can be annotated in ∼2 h, which allows for newly submitted data, typically containing 500–1000 EST sequences, to be clustered with existing data from the same organism and the entire dataset to be reannotated within one working day. 
(iii) Finally, to detect similarities with as-yet-unrecognized hypothetical proteins in published DNA sequences, TBLASTX is run against a local copy of NCBI's non-redundant database and the top hit is shown. The time requirement for this step is quite high, ∼10 min per sequence on our 16-CPU cluster.
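The idea behind the "uninformative rule" in procedure (i) can be sketched as follows. This is a simplified illustration, not AutoFACT's implementation: the list of uninformative terms is a guess made for this example, the `Hit` type is hypothetical, and the score thresholds are the defaults quoted in the Figure 1 caption.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    description: str
    bitscore: float
    evalue: float


# Assumed examples of descriptions that carry no functional information.
UNINFORMATIVE = re.compile(
    r"hypothetical|unknown|unnamed|uncharacteri[sz]ed|predicted protein",
    re.IGNORECASE)


def pick_annotation(hits, min_bitscore=40.0, max_evalue=1e-4, top_n=10):
    """Return the description of the highest-scoring informative hit among
    the top_n hits, or None if no hit qualifies."""
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:top_n]
    for hit in ranked:
        significant = hit.bitscore > min_bitscore and hit.evalue < max_evalue
        if significant and not UNINFORMATIVE.search(hit.description):
            return hit.description
    return None


hits = [Hit("hypothetical protein", 250.0, 1e-60),
        Hit("actin", 230.0, 1e-55),
        Hit("tubulin beta chain", 30.0, 1e-2)]
annotation = pick_annotation(hits)
```

A plain top-BLAST-hit approach would report "hypothetical protein" for this example, whereas the rule skips it in favour of the next significant hit.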

Table 2

Databases searched and classification information assigned by AutoFACT

Database	Classification information	Reference
European Ribosomal Database	Large subunit (LSU) ribosomal RNAs	(34)
	Small subunit (SSU) ribosomal RNAs
UniProt's UniRef 90	Gene Ontology terms	(35,36)
	Enzyme Commission numbers
	Locus names
Clusters of Orthologous Groups (COG)	Functional categories	(37,38)
Kyoto Encyclopedia of Genes and Genomes (KEGG)	Metabolic pathways	(39)
	Enzyme Commission numbers
	Locus names
Protein Families Database (Pfam)	Protein domains	(40)
NCBI's non-redundant database (nr)	N/A	(40)
NCBI's est_others database

In addition to the above-mentioned automatic annotations, expert manual annotations are available in some cases, typically provided by the submitter of the sequences. Should all the analyses fail to identify the function of a consensus sequence, it is annotated as of ‘unknown function’. The above annotation procedures are rerun regularly, and in consequence automatically assigned names may change as the reference databases are updated. For this reason any reference to data in TBestDB should use TBestDB's internal cluster IDs in addition to the annotations provided.

Metabolic pathway prediction

AutoFACT annotations are used to build a Pathway Genome Database (25) for each individual organism. On this basis, annotated sequences can be mapped to metabolic pathways available in MetaCyc (26). This allows users to determine which components of a given pathway are present in, or still missing from, the sequenced part of an EST library and, ultimately, to assess the biological versatility of the organisms studied.

POST-PROCESSING

Contamination management

In large sequencing projects, some level of contamination between datasets or from external sources is unavoidable in practice. Sources of contamination include food organisms (bacteria on which many of the organisms documented in TBestDB are grown), symbionts, and human error during culturing, cloning and sequencing. In TBestDB we have implemented an automated system for the identification of potential cross-project contamination, in order to mitigate this problem as far as possible. Each consensus sequence in TBestDB (query cluster) is searched against the consensus sequences for every other organism in the database (retrieved clusters) using BLASTN. Potential contaminants are identified at a threshold of ≥97% sequence identity over at least 50 nt. rRNA sequences and well-known highly conserved proteins such as actin and ubiquitin, which are also retrieved by these criteria, are explicitly excluded from consideration as contaminants. We automatically remove from the database any query cluster that is found to match a retrieved cluster containing at least three times as many ESTs, as this criterion has proven a reliable identifier of contaminating data. Less clear-cut cases of potential contaminants are flagged, and the source laboratory is asked to examine the flagged sequences to determine whether they should remain in TBestDB. All of the ESTs belonging to contaminating clusters are moved into a separate database table, where they are used in further rounds of contamination checking. This procedure is necessary so that the curation of different organisms at different times can identify possible common sources of contamination, such as errors introduced by commercial library services shared by several users.
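The decision rule described above can be written out as a small function. This is a minimal sketch under stated assumptions: the `BlastMatch` record and its field names are hypothetical, and whether a match involves rRNA or a highly conserved protein is assumed to have been determined elsewhere and passed in as a flag.

```python
from typing import NamedTuple


class BlastMatch(NamedTuple):
    query_cluster: str
    query_est_count: int
    hit_cluster: str
    hit_est_count: int
    percent_identity: float
    alignment_length: int
    conserved: bool = False  # rRNA, actin, ubiquitin and similar (set upstream)


def classify_match(m, min_identity=97.0, min_length=50, est_ratio=3):
    """Return 'ignore', 'flag' or 'remove' for the query cluster of a BLASTN match."""
    if m.conserved:
        return "ignore"
    if m.percent_identity < min_identity or m.alignment_length < min_length:
        return "ignore"
    # The matching cluster from another organism holds at least three times
    # as many ESTs: the query cluster is treated as the contaminant.
    if m.hit_est_count >= est_ratio * m.query_est_count:
        return "remove"
    return "flag"  # less clear-cut: refer to the source laboratory
```

For example, a 2-EST query cluster matching a 6-EST cluster from another organism at 98.5% identity over 120 nt would be classified as `"remove"`, whereas the same match against a 5-EST cluster would only be flagged.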

Identification of chimerism

Submitted datasets occasionally include chimeric ESTs (i.e. ESTs containing sequence from two distinct cDNAs), which causes problems during clustering. The identification of such ESTs is not straightforward, but we have implemented automatic tests that identify the bulk of such artifactual sequences. The simplest test is a search for misplaced poly(A) tracts in the EST sequence. A correctly assembled consensus sequence for a complete cDNA should have a single 3′-terminal poly(A) region. In practice, at least 10 A or T residues (depending on the direction of sequencing) are sufficient to identify the 3′ end of a transcript. Any sequence containing an apparent poly(A) or reverse-complemented poly(A) tail at both ends, or an internal poly(A) or poly(T) tract, is flagged as potentially chimeric. Chimerism in EST sequences without poly(A) tails is harder to detect. Our current practice is to identify these ESTs by the effects they have on the clustering process. Sections of chimeric ESTs from different origins are expected to match with different sets of sequences. Therefore, clusters containing chimerism should consist of two distinct ‘blocks’ of ESTs usually linked by only a single sequence where the fusion occurs. (This situation is also occasionally encountered when one of the ESTs in a large cluster contains an unexcised intron.) This pattern can be automatically identified by counting the number of ESTs at every position along the cluster and looking for abrupt changes in that number over a short distance. Obviously, this pattern can only be identified in clusters with sufficient coverage—in our experience, clusters containing 10 or more ESTs. In all cases, clusters identified as potentially chimeric are flagged in the database and the decision whether or not to remove chimeric ESTs is left to the submitter of the data.
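Both tests can be sketched in a few lines of Python. The sketch below is illustrative only: the 10-residue tract length and the 10-EST minimum come from the text, while `end_window`, `window` and `min_jump` are assumed values chosen for the example, and poly(A) and poly(T) tracts are treated alike at either end as a simplification.

```python
import re


def polya_chimera_flag(seq, min_run=10, end_window=5):
    """Flag a sequence with an internal poly(A)/poly(T) tract, or with such
    tracts at both ends. end_window (assumed) tolerates a few trailing bases."""
    seq = seq.upper()
    pattern = re.compile(rf"A{{{min_run},}}|T{{{min_run},}}")
    at_start = at_end = internal = False
    for m in pattern.finditer(seq):
        if m.start() <= end_window:
            at_start = True
        elif m.end() >= len(seq) - end_window:
            at_end = True
        else:
            internal = True
    return internal or (at_start and at_end)


def coverage_depth(intervals, length):
    """Number of ESTs covering each position; intervals are half-open (start, end)."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for d in diff[:length]:
        running += d
        depth.append(running)
    return depth


def abrupt_coverage_change(intervals, length, window=20, min_jump=4, min_ests=10):
    """True if EST depth changes by at least min_jump within window positions.
    Clusters with fewer than min_ests ESTs are not tested."""
    if len(intervals) < min_ests:
        return False
    depth = coverage_depth(intervals, length)
    return any(abs(depth[i + window] - depth[i]) >= min_jump
               for i in range(length - window))


# Two blocks of five ESTs joined by a single bridging read.
chimeric = [(0, 300)] * 5 + [(250, 550)] + [(500, 800)] * 5
# Ten evenly staggered reads, as in an ordinary cluster.
staggered = [(s, s + 300) for s in range(0, 500, 50)]
```

Here `abrupt_coverage_change(chimeric, 800)` reports the sharp drop in depth where the first block ends, while the evenly staggered cluster is left unflagged. As the text notes, an unexcised intron can produce the same pattern, so a flag is a prompt for inspection rather than proof of chimerism.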

DATA ACCESS AND PRESENTATION

When users log in to TBestDB they are presented with a list of organisms currently available in the database. Each organism name on the main page links to the organism's principal data page. Access permissions for each organism are determined by the provider of the data; such permissions may allow data to remain private for up to six months so that those who generate a dataset have time to analyse it before it becomes public. An organism's principal data page contains basic library and reading information and links to pages compiling experimental information and the various statistics detailed above. To maintain data currency, most statistics are calculated dynamically upon access. This page also shows all annotated clusters, with the option to order clusters in several ways and to search the various annotation fields for clusters of interest. The cluster ID links to a page containing detailed information related to that cluster, including download functionality for DNA and deduced protein sequences (Figure 2).

Figure 2

Cluster information page. The head of the cluster information page contains the cluster consensus sequence, links to the ESTs assembled within the cluster and all annotation information. The lower half of the page contains an image illustrating the structure of the cluster. The positions of each EST are indicated. ESTs originating from different libraries are shown in different colours. The read direction of each EST is shown with an arrowhead when that information is available and ESTs that have been internally reverse-complemented by phrap in the process of cluster assembly are indicated in outline. A multiple alignment is then shown depicting the ESTs and clustered consensus sequence in the same pattern (the right-hand portion of the sequence alignment is truncated in order to improve readability of the figure).

The TBestDB main page also links to a set of Pathway Genome Databases (25) that have been built for each organism for which annotated data are available in TBestDB. Via the pathway viewer (25) integrated with TBestDB, users can inspect specific pathways, enzymatic reactions or compounds of interest, as well as visualize which enzymes and pathways are present within the organism under study or shared with other organisms. Finally, it is straightforward to perform BLAST searches against all or selected data included in TBestDB to which a user has access. The corresponding query sequences can be uploaded or copy-pasted into a window, and BLAST search functionality is achieved via a link to the web-based sequence analysis workbench AnaBench (27), developed in-house.

IMPLEMENTATION

The TBestDB database is implemented in PostgreSQL 7.4.1 with a web interface written in PHP v4.3.8. The graphics on the cluster pages are generated using the GD module, version 2.0.25. The pipeline is constructed using Perl (5.8.0) scripts to manage the data, call the programs from the phred suite and insert the results into the database. BLAST searches for sequence annotation by AutoFACT and TBLASTX searches are run on a separate 16-CPU cluster. All other procedures are executed on PCs with two 2.4 GHz or 2.8 GHz Intel Xeon CPUs.

DISCUSSION

The clustering process implemented in TBestDB features a high level of discrimination, capable of distinguishing closely related homologs. Data from the amoebozoan protist Acanthamoeba castellanii provide relevant examples. Clusters ACL00004208 (containing 32 ESTs) and ACL00004800 (42 ESTs) represent two variants of ribosomal protein S3A, differing only at 3 nt positions within the coding region. Similarly, five variant actin sequences are correctly distinguished in this organism (clusters ACL00003090, ACL00003089, ACL00004196, ACL00004782 and ACL00004755). Of the 1125 nt positions encoding 375 amino acids in actin, only 52 are heterogeneous in these five sequences and all except one of the substitutions are silent. The clustering process is also able to discriminate among clusters that are identical within the coding region but differ within the 3′-terminal untranslated region, either because the different clusters represent distinct alleles or because of variation in the location of the polyadenylation site in transcripts of the same gene. In cases where consensus EST cluster sequences have counterparts in partial A. castellanii genomic data (28), the match between EST and genomic sequence is almost always 100%, so that the comparison allows ready recognition of introns. For example, ACL00000330 (53 ESTs) encodes a complete ORF for ribosomal protein S3, and comparison with genomic sequence finds an exact match and precisely identifies two GT … AG spliceosomal introns in the latter sequence.

Notably, the datasets collected in TBestDB allow analyses to be conducted on a number of different scales. On the one hand, these data have provided unprecedented insights into the biology of specific protists, which have not been analysed previously at the molecular level either in substantial depth or substantial breadth. For example, the question of residual plastid functions in the non-photosynthetic green algae Prototheca wickerhamii and Helicosporidium sp. has successfully been addressed by surveying nucleus-encoded plastid-targeted proteins (29,30). On a broader scale, the capacity to carry out analyses across a consistently populated and annotated set of taxonomically diverse data allows for rigorous exploration of fundamental biological questions. These questions include the origin of photosynthesis among eukaryotes (31), the extent of lateral gene transfer within various eukaryotic lineages (32) and the basal resolution of the eukaryotic tree (33).

At a more practical level, another valuable feature of TBestDB is that control of access to data is adaptable to meet the needs of individual users. User accounts can be defined to have access to any possible subset of the data within TBestDB. This feature allows users to restrict access to their data for a specified (but limited) period of time prior to release.

In summary, TBestDB provides a powerful and flexible resource for clustering, annotation and distribution of EST data, a combination of features facilitating in-depth analyses of the genetic and biochemical complexity of individual eukaryotic species, systematic comparisons among taxa and global phylogenetic analyses of eukaryotes.

Outlook

We are currently engaged in adding functionality to TBestDB to allow for expert manual curation of specific subsets of the data, initially by the providers of the data in question. In the future, we intend to incorporate additional data from public sources into TBestDB, including EST data from representatives of highly sampled eukaryotes such as vertebrate animals, vascular plants and fungi.

35 in total

1. KEGG: kyoto encyclopedia of genes and genomes.

Authors: M Kanehisa; S Goto
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

Review 2. Mitochondrial evolution.

Authors: M W Gray; G Burger; B F Lang
Journal: Science Date: 1999-03-05 Impact factor: 47.728

3. The closest unicellular relatives of animals.

Authors: B F Lang; C O'Kelly; T Nerad; M W Gray; G Burger
Journal: Curr Biol Date: 2002-10-15 Impact factor: 10.834

4. Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors: B Ewing; P Green
Journal: Genome Res Date: 1998-03 Impact factor: 9.043

5. Multiple metabolic roles for the nonphotosynthetic plastid of the green alga Prototheca wickerhamii.

Authors: Tudor Borza; Cristina E Popescu; Robert W Lee
Journal: Eukaryot Cell Date: 2005-02

6. The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology.

Authors: Evelyn Camon; Michele Magrane; Daniel Barrell; Vivian Lee; Emily Dimmer; John Maslen; David Binns; Nicola Harte; Rodrigo Lopez; Rolf Apweiler
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

7. TcruziDB: an integrated, post-genomics community resource for Trypanosoma cruzi.

Authors: Fernán Agüero; Wenlong Zheng; D Brent Weatherly; Pablo Mendes; Jessica C Kissinger
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

8. dictyBase, the model organism database for Dictyostelium discoideum.

Authors: Rex L Chisholm; Pascale Gaudet; Eric M Just; Karen E Pilcher; Petra Fey; Sohel N Merchant; Warren A Kibbe
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. CryptoDB: a Cryptosporidium bioinformatics resource update.

Authors: Mark Heiges; Haiming Wang; Edward Robinson; Cristina Aurrecoechea; Xin Gao; Nivedita Kaluskar; Philippa Rhodes; Sammy Wang; Cong-Zhou He; Yanqi Su; John Miller; Eileen Kraemer; Jessica C Kissinger
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

10. The COG database: an updated version includes eukaryotes.

Authors: Roman L Tatusov; Natalie D Fedorova; John D Jackson; Aviva R Jacobs; Boris Kiryutin; Eugene V Koonin; Dmitri M Krylov; Raja Mazumder; Sergei L Mekhedov; Anastasia N Nikolskaya; B Sridhar Rao; Sergei Smirnov; Alexander V Sverdlov; Sona Vasudevan; Yuri I Wolf; Jodie J Yin; Darren A Natale
Journal: BMC Bioinformatics Date: 2003-09-11 Impact factor: 3.169
