The Ensembl computing architecture.

James A Cuff, Guy M P Coates, Tim J R Cutts, Mark Rae.

Abstract

Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. The project currently automatically annotates 10 complete genomes. This makes very large demands on compute resources, due to the vast number of sequence comparisons that need to be executed. To circumvent the financial outlay often associated with classical supercomputing environments, farms of multiple, lower-cost machines have now become the norm and have been deployed successfully with this project. The architecture and design of farms containing hundreds of compute nodes is complex and nontrivial to implement. This study will define and explain some of the essential elements to consider when designing such systems. Server architecture and network infrastructure are discussed with a particular emphasis on solutions that worked and those that did not (often with fairly spectacular consequences). The aim of the study is to give the reader, who may be implementing a large-scale biocompute project, an insight into some of the pitfalls that may be waiting ahead.

Year:  2004        PMID: 15123594      PMCID: PMC479128          DOI: 10.1101/gr.1866304

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


