Jeff Elhai, Arnaud Taton, J P Massar, John K Myers, Mike Travers, Johnny Casey, Mark Slupesky, Jeff Shrager.
Abstract
BioBIKE (biobike.csbc.vcu.edu) is a web-based environment enabling biologists with little programming expertise to combine tools, data, and knowledge in novel and possibly complex ways, as demanded by the biological problem at hand. BioBIKE is composed of three integrated components: a biological knowledge base, a graphical programming interface and an extensible set of tools. Each of the five current BioBIKE instances provides all available information (genomic, metabolic, experimental) appropriate to a given research community. The BioBIKE programming language and graphical programming interface employ familiar operations to help users combine functions and information to conduct biologically meaningful analyses. Many commonly used tools, such as Blast and PHYLIP, are built-in, allowing users to access them within the same interface and to pass results from one to another. Users may also invent their own tools, packaging complex expressions under a single name, which is immediately made accessible through the graphical interface. BioBIKE represents a partial solution to the difficult question of how to enable those with no background in computer programming to work directly and creatively with mass biological information. BioBIKE is distributed under the MIT Open Source license. A description of the underlying language and other technical matters is available at www.Biobike.org.
Year: 2009 PMID: 19433511 PMCID: PMC2703918 DOI: 10.1093/nar/gkp354
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Current BioBIKEs
| CyanoBIKE: Cyanobacteria (42 genomes) |
| ParaBIKE: Eukaryotic parasites (5 genomes) |
| StaphyloBIKE: Staphylococcus (45 genomes) |
| StreptoBIKE: Streptococcus (25 genomes) |
| ViroBIKE: Viruses (1797 genomes, 20 metagenomes) |
| BIKE: Used for education (0 genomes) |
All instances are available through biobike.csbc.vcu.edu.
Figure 1. BioBIKE function palette and workspace. The green workspace shows the work of a user looking for a regulatory sequence upstream from a gene, by focusing on sequences common amongst upstream sequences of orthologous genes in related organisms. The first function defines the variable gln-orthologs as the set of orthologs in marine cyanobacteria of a gene the user knows to encode glutamine synthetase. The second function is in the midst of being completed. The user is choosing the newly defined variable from the VARIABLES menu to be inserted into a function that will extract the sequences upstream from all the orthologs and then find statistically overrepresented sequences within the set of sequences, using MEME (6).
Figure 2. Example of a nested function. The function makes an alignment of the sequences of all orthologs of the protein Asr1156, starting as many as 100 amino acids before the nominal beginning of the protein but going backwards only up to the first stop codon. The sequences are labeled with the name of the protein, aligned using Clustal (5), and visualized using JalView (19). This is the code used to generate an alignment (discussed in Elhai, Taton, Massar, and Shrager, manuscript submitted for publication) that provides evidence against existing annotations of a family of conserved genes and for the use of nonstandard start codons in cyanobacteria.
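The upstream extension described in the Figure 2 caption (up to 100 amino acids before the nominal start, but no further back than the first stop codon) can be illustrated outside BioBIKE with a minimal Python sketch. This is not BioBIKE code. The function name `extended_start`, the restriction to the forward strand, and the use of the standard stop codons TAA, TAG and TGA are assumptions made only for illustration.

```python
# Hypothetical illustration, not BioBIKE syntax.
# Assumption: standard stop codons, gene on the forward strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extended_start(dna, gene_start, max_codons=100):
    """Return the index of the furthest in-frame upstream position.

    Walks backwards from gene_start one codon at a time, for at most
    max_codons codons, and halts just downstream of the first in-frame
    stop codon or at the beginning of the sequence.
    """
    start = gene_start
    for _ in range(max_codons):
        codon_start = start - 3
        if codon_start < 0:
            break  # ran off the beginning of the sequence
        if dna[codon_start:start].upper() in STOP_CODONS:
            break  # do not extend past an in-frame stop codon
        start = codon_start
    return start


# Toy sequence: 2 bases, a stop codon, two sense codons, then the gene.
toy_dna = "CC" + "TAA" + "GCT" + "GCA" + "ATG" + "AAA" + "TAG"
toy_gene_start = 11  # index of the nominal ATG
upstream_start = extended_start(toy_dna, toy_gene_start)
extension = toy_dna[upstream_start:toy_gene_start]
```

In this toy case the extension covers the two sense codons between the upstream stop codon and the nominal start. Translating the extended region and aligning the resulting proteins would be separate steps, handled in the caption by Clustal and JalView.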
Figure 3. Example of progressive evaluation and iteration in BioBIKE. The pattern of four cysteine residues separated by 2, 2 and 3 amino acids is often found in proteins with 4Fe-4S clusters (20). (A) The first function finds the pattern of cysteines amongst the sequences of all proteins in the cyanobacterium Synechocystis PCC 6803 and assigns the names of the proteins bearing the motif (Result 1) to a user-defined variable called 4fe-4s-proteins. (B) The annotation for each of the proteins is displayed in a separate window (see inset), and the annotations are also returned as result #2. (C) The user is concerned that this motif might well arise by chance on some proteins of Synechocystis. To test this, a set of random protein sequences is generated, each element being a random shuffling of a real protein sequence. The set is assigned to the variable random-sequences, and the random sequences are returned as result #3. (D) This set of random sequences is searched for the characteristic motif, and none are found (no result), lending some confidence to the belief that the presence of the motif in proteins of Synechocystis is of biological significance.
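The logic of the Figure 3 caption, a motif search followed by a shuffled-sequence control, can be sketched in Python as follows. This is not BioBIKE code. The regular expression is one reading of "four cysteine residues separated by 2, 2 and 3 amino acids", and the function names and toy sequences are hypothetical.

```python
import random
import re

# Assumed reading of the caption's pattern: C-x2-C-x2-C-x3-C,
# where x is any amino acid.
MOTIF = re.compile(r"C.{2}C.{2}C.{3}C")


def proteins_with_motif(proteins):
    """Return the names of proteins whose sequence contains the motif."""
    return [name for name, seq in proteins.items() if MOTIF.search(seq)]


def shuffled(proteins, rng):
    """Return a copy in which each sequence is randomly shuffled.

    Shuffling preserves the length and amino acid composition of every
    protein, so the copy serves as a null model for chance matches.
    """
    result = {}
    for name, seq in proteins.items():
        residues = list(seq)
        rng.shuffle(residues)
        result[name] = "".join(residues)
    return result


# Toy sequences, invented for illustration only.
toy_proteins = {
    "toy-ferredoxin": "MAHKCIACNKCVDACPTE",
    "toy-other": "MSTNPKPQRKTKRNTNRRPQ",
}

hits = proteins_with_motif(toy_proteins)
random_sequences = shuffled(toy_proteins, random.Random(0))
chance_hits = proteins_with_motif(random_sequences)
```

Comparing the size of `hits` with the size of `chance_hits` over a whole proteome mirrors steps (A), (C) and (D) of the caption. Repeating the shuffle many times would give a distribution of chance matches rather than a single observation.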