
RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.

Thomas Brettin¹, James J Davis¹, Terry Disz², Robert A Edwards³, Svetlana Gerdes⁴, Gary J Olsen⁵, Robert Olson⁶, Ross Overbeek⁴, Bruce Parrello⁴, Gordon D Pusch⁴, Maulik Shukla⁷, James A Thomason⁸, Rick Stevens⁹, Veronika Vonstein⁴, Alice R Wattam⁷, Fangfang Xia⁶.

Abstract

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

Year: 2015 PMID: 25666585 PMCID: PMC4322359 DOI: 10.1038/srep08365

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

The last two decades of research have brought vast changes to the field of genomics. The sequencing of a genome and the subsequent annotation of gene functions, which were originally performed by teams of researchers expending thousands of man-hours of labor, have become standard laboratory techniques that can be performed by a single person in one day. As sequencing technology has advanced and the cost has dropped, the number of genomes being deposited into the public databases has outpaced Moore's Law [1,2]. This has shifted the bottleneck in genomic analysis from the sequencing per se to the tools that are used for annotation and genomic analysis. In 2008, the RAST server (Rapid Annotation using Subsystem Technology) was developed to annotate microbial genomes [3,4]. It works by projecting manually curated gene annotations from the SEED database onto newly submitted genomes [5,6,7]. The key to the consistency and accuracy of the RAST algorithm has been the carefully structured annotation data in the SEED, which are organized into subsystems (sets of logically related functional roles) [5]. As a result, RAST has become one of the most popular sources for consistent and accurate annotations for microbial genomes. The RAST community currently consists of ~10,000 active users who have contributed an average of 1,170 microbial genomes per week in the last year. It is also being used as the foundation for maintaining consistency for automated metabolic modeling in the ModelSEED [8] and KBase (kbase.us), and for comparative genomics in the bacterial pathogen database, PATRIC [9,10]. RAST and other annotation engines encapsulate software for identifying and annotating specific genomic features into a standard annotation pipeline [11,12,13,14,15,16]. This approach has several advantages, including speed, convenience and consistency for the user. In order to annotate with RAST, users submit their contigs to the server where the computation is performed.
This frees users from having to download and install multiple programs, or to perform intensive computations. However, despite these advantages, this approach also has limitations. For instance, the default pipeline may not always be the best choice for a given genome. It is also difficult for researchers to customize annotation pipelines by choosing different tools and adding their own features and annotations. Until recently, it has also been difficult to submit batches of genomes, and demand has been increasing for a version of RAST that can accommodate custom batch submissions. In this paper, we describe a new modular implementation of RAST, which we call the RAST tool kit (RASTtk). RASTtk allows users to build their own annotation pipelines with a choice of gene calling algorithms, annotation scripts, and output formats. It also provides a framework for users to add features and annotations to a processed RAST job. RASTtk can handle batch submissions of genomes, including batch submissions that use custom annotation pipelines. RASTtk can be used on both the RAST website (http://rast.nmpdr.org) and the command line.

Results and Discussion

Accessing RASTtk

In order to make RAST more flexible and to keep pace with advancements that are being made in bioinformatics, we have separated the individual steps of the RAST annotation pipeline to provide a version that is modular, extensible and customizable. RASTtk exists as a set of advanced options on the RAST web server (http://rast.nmpdr.org) (Figure 1). It has also been made available as a set of stand-alone scripts in the Interactive Remote Invocation Service (IRIS) environment (http://iris.theseed.org/). IRIS is a web application that functions like a command line window. Through IRIS, users can use the latest RASTtk scripts to build and compare custom annotation pipelines. In addition, RASTtk scripts can be installed and run locally using the RASTtk.app DMG (Mac only) (https://github.com/TheSEED/RASTtk-Distribution/releases/). Individual scripts are also available through the KBase GitHub page (https://github.com/kbase/genome_annotation). Tutorials for using the RASTtk scripts can be found on the SEED website (http://tutorial.theseed.org).

Figure 1

RASTtk options that are available on the RAST website (http://rast.nmpdr.org).

A table of options is displayed when the user selects the RASTtk annotation scheme and clicks the checkbox for “Customize”. Individual steps can be turned off and on using the check boxes. Parameters and conditions can be changed or added as needed. Dragging and dropping table rows will change the order of the steps.

The RASTtk Default Pipeline

During an annotation job, the data from individual scripts must be collected and integrated into a coherent picture of the genome as a whole. The abstract layout of a RASTtk pipeline is as follows. An initial tool converts a set of contigs (with a minimal amount of metadata) into a special file format called a Genome Typed Object (GTO). A GTO is a JavaScript Object Notation (JSON) formatted file that is human-readable and allows for easy exchange of data objects [www.json.org]. Each step in the pipeline transforms an input GTO into an enhanced GTO. For example, calling genes with Prodigal uses the command “rast-call-features-CDS-prodigal” to add gene calls to an input GTO, which produces an expanded GTO (see Table 1 for a list of the commands implementing the current set of supported transformation tools).
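The GTO-in, GTO-out pattern described above can be illustrated with a minimal Python sketch. The field names (`scientific_name`, `contigs`, `features`) and the two toy callers are hypothetical and chosen only for illustration; they are not the actual GTO schema or the RASTtk scripts.

```python
import json
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a Genome Typed Object (GTO).
GTO = Dict[str, object]


def create_genome(contigs: Dict[str, str], scientific_name: str) -> GTO:
    """Wrap contigs and minimal metadata in a GTO-like dict (assumed fields)."""
    return {
        "scientific_name": scientific_name,
        "contigs": [{"id": cid, "dna": seq} for cid, seq in contigs.items()],
        "features": [],
    }


def add_features(gto: GTO, new_features: List[dict]) -> GTO:
    """Return an enhanced copy of the GTO; the input object is not modified."""
    enhanced = dict(gto)
    enhanced["features"] = list(gto["features"]) + new_features
    return enhanced


def toy_gene_caller(gto: GTO) -> GTO:
    # Placeholder for a real gene caller; adds one made-up CDS.
    return add_features(gto, [{"id": "cds.1", "type": "CDS",
                               "location": ["contig1", 1, "+", 9]}])


def toy_trna_caller(gto: GTO) -> GTO:
    # Placeholder for a real tRNA finder; adds one made-up tRNA.
    return add_features(gto, [{"id": "trna.1", "type": "tRNA",
                               "location": ["contig1", 20, "+", 72]}])


def run_pipeline(gto: GTO, steps: List[Callable[[GTO], GTO]]) -> GTO:
    """Apply each step in order; every step maps a GTO to an enhanced GTO."""
    for step in steps:
        gto = step(gto)
    return gto


genome = create_genome({"contig1": "ATGAAATAA"}, "Example organism")
annotated = run_pipeline(genome, [toy_gene_caller, toy_trna_caller])
gto_text = json.dumps(annotated, indent=2)  # human-readable JSON, as for a GTO
```

Because every step has the same signature, steps can be reordered, dropped, or replaced without touching the others, which is the property that makes a pipeline of this shape customizable.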

Table 1

Characteristics of the RASTtk scripts

Tool	Feature Type Annotated	Input file type	Output file type	Default	Citation
RASTtk default pipeline scripts
rast-create-genome	n/a	Contigs in FASTA format	GTO	yes	This study
rast-process-genome	CDS, RNA, Repeat Regions, CRISPRs	GTO	GTO	yes	This study
rast-export-genome	All feature types	GTO	FASTA, GenBank, feature table etc.	yes	This study
RASTtk individual scripts
rast-add-features	user-defined	tab-delimited text	GTO	no	This study
rast-annotate-proteins-kmer-v1	CDS	GTO	GTO	yes	[3,19]
rast-annotate-proteins-kmer-v2	CDS	GTO	GTO	yes	This study
rast-annotate-proteins-similarity	CDS	GTO	GTO	no	This study
rast-call-features-CDS-genemark	CDS	GTO	GTO	no	[29]
rast-call-features-CDS-glimmer3	CDS	GTO	GTO	yes	[28]
rast-call-features-CDS-prodigal	CDS	GTO	GTO	yes	[18]
rast-call-features-crispr	CRISPR array, CRISPR repeat and CRISPR spacer	GTO	GTO	yes	This study
rast-call-features-insertion-sequences	IS elements	GTO	GTO	no	This study
rast-call-features-prophage-phispy	Prophage	GTO	GTO	no	[31]
rast-call-features-pyrrolysoprotein	CDS	GTO	GTO	yes	[4]
rast-call-features-repeat-region-SEED	Repeat regions	GTO	GTO	yes	This study
rast-call-features-rRNA-SEED	RNA (rRNA)	GTO	GTO	yes	This study
rast-call-features-selenoprotein	CDS	GTO	GTO	yes	[4]
rast-call-features-strep-pneumo-repeat	Repeat regions	GTO	GTO	conditional	[24]
rast-call-features-strep-suis-repeat	Repeat regions	GTO	GTO	conditional	[24]
rast-call-features-tRNA-trnascan	RNA (tRNA)	GTO	GTO	yes	[22]
rast-resolve-overlapping-features	n/a	GTO	GTO	yes	This study
rast-update-annotations	n/a	GTO	GTO	no	This study
Batch annotation scripts
rast-set-metadata	n/a	GTO	GTO	no	This study
rast-process-genome-batch	CDS, RNA, Repeat Regions, CRISPRs, IS elements	GTO	n/a	no	This study
rast-query-genome-batch	n/a	n/a	n/a	no	This study
rast-download-genome-batch	n/a	n/a	GTO	no	This study
Additional analysis tools
rast-call-features-ProtoCDS-kmer-v1	n/a	GTO	GTO	no	This study
rast-call-features-ProtoCDS-kmer-v2	n/a	GTO	GTO	no	This study
rast-compute-special-proteins	n/a	GTO	tab-delimited text	no	[33]
rast-enumerate-special-protein-databases	n/a	n/a	n/a	no	This study

A user can export data from a GTO. Generally, the final product of a sequence of transformation commands would be used to export a tab-delimited spreadsheet or GenBank entry [17]; numerous alternative export formats are also supported. We offer a default pipeline for RASTtk, which represents our recommendation for a rapid and accurate annotation workflow. This workflow differs slightly from the “classic” RAST pipeline (Figure 2). For instance, we have added Prodigal [18] as an additional gene caller because of its improved accuracy with short genes and start positions, and because it is more robust to differences in G+C content [12]. We have also rewritten several of the core RAST algorithms, including the tools that find rRNA genes and resolve overlapping features. We have included a new version of the k-mer-based annotation algorithm [19] and scripts that find repeat regions, CRISPRs, insertion sequences, and Streptococcus repeats.

Figure 2

The RAST workflow.

Each individual step is bounded by a box, and steps are connected by arrows. New RASTtk steps are indicated by red boxes and arrows. Improvements in the original steps are indicated in red text. Steps that are no longer part of the RASTtk pathway are indicated by gray arrows.

Calling ribosomal RNAs

Many tools exist for finding and curating collections of rRNAs [13,20], but we needed a program that is simple, lightweight and fast. The current RAST rRNA finder is a custom script that uses a hand-curated, phylogenetically diverse set of representative sequences of the 23S (currently 81 representatives), 16S (currently 120 representatives) and 5S (currently 292 representatives) rRNAs. These sets represent the diversity of genomes in the SEED and have been curated for the correct endpoints. The rRNAs of a new genome are found using a BLASTN [21] search against this curated set. This script also reports partial-length matches because genome assemblies with incomplete rRNA operons have become common.

Calling tRNAs

In order to find the tRNAs, we currently use tRNAscan-SE, a tool that was written by Lowe and Eddy [22]. It uses a secondary-structure-based search method to find the tRNA genes.

Calling large repeat regions

The annotation of genomic regions that have been acquired by horizontal gene transfer remains a major challenge to maintaining consistency and accuracy in RAST. Repeat regions are often indicative of horizontal gene transfer and are hallmarks of insertion sequences and other mobile elements. Because of this, we have added a new script to the default RASTtk pipeline that performs a BLASTN [21] search of the genome against itself, and reports any region occurring more than once with ≥ 95% nucleotide identity. These precomputed repeat regions can then be used for comparative analyses and as supporting data for more detailed annotation of mobile elements.
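The filtering step of a self-against-self search can be sketched in Python as follows. This is a minimal illustration that assumes the hits are already available in the standard 12-column BLAST tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name and the example rows are hypothetical, and the actual RASTtk script may apply further criteria.

```python
from typing import Iterable, List, Tuple


def repeat_regions(blast_lines: Iterable[str],
                   min_identity: float = 95.0) -> List[Tuple[str, int, int]]:
    """Collect query regions that match another copy in the same genome."""
    regions = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qid, sid, pident = fields[0], fields[1], float(fields[2])
        qstart, qend, sstart, send = (int(x) for x in fields[6:10])
        # A region always matches itself; that trivial hit is not a repeat.
        is_trivial_self_hit = qid == sid and (qstart, qend) == (sstart, send)
        if is_trivial_self_hit or pident < min_identity:
            continue
        regions.add((qid, min(qstart, qend), max(qstart, qend)))
    return sorted(regions)


# Made-up hits: one trivial self-hit, one repeat seen from both copies,
# and one hit below the identity threshold.
example_hits = [
    "contig1\tcontig1\t100.00\t5000\t0\t0\t1\t5000\t1\t5000\t0.0\t9000",
    "contig1\tcontig1\t97.50\t1200\t30\t0\t100\t1299\t3100\t4299\t0.0\t2000",
    "contig1\tcontig1\t97.50\t1200\t30\t0\t3100\t4299\t100\t1299\t0.0\t2000",
    "contig1\tcontig2\t90.00\t800\t80\t0\t2000\t2799\t1\t800\t1e-100\t900",
]
regions = repeat_regions(example_hits)
```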

Calling seleno- and pyrrolysylproteins

Selenoproteins are widespread among the sequenced bacterial and archaeal genomes, occurring in ~25% of the genomes in the CoreSEED (a collection of ~1000 highly curated, diverse bacterial and archaeal genomes utilized by RAST). They contain the rare amino acid selenocysteine, which is incorporated at an in-frame UGA stop codon [23]. In order to find these proteins, a hand-curated set of known selenoproteins is kept in a BLAST database [21]. When a match to a potential selenoprotein is found within a genome, the matching region is then searched for the in-frame stop codon. If the stop is found, then the protein is annotated as a selenoprotein. Pyrrolysylproteins are less common among the currently sequenced genomes, occurring in ~1% of the sequenced bacterial and archaeal genomes in the CoreSEED. Similar to selenocysteine, pyrrolysine is incorporated at a UAG stop codon [23]. We search for pyrrolysylproteins using the same strategy.
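The in-frame stop check can be illustrated with a short Python sketch. It assumes the candidate coding region has already been extracted as DNA in the correct reading frame (so UGA and UAG appear as TGA and TAG); the function name is hypothetical and the sketch does not reproduce the BLAST matching step.

```python
from typing import List


def in_frame_stops(cds_dna: str, stop_codon: str = "TGA") -> List[int]:
    """Return 0-based codon indices of internal, in-frame stop codons.

    The final codon is skipped because it is the ordinary terminator.
    """
    dna = cds_dna.upper()
    n_codons = len(dna) // 3
    return [i for i in range(n_codons - 1)
            if dna[3 * i:3 * i + 3] == stop_codon]


# Codons: ATG TGA AAA TAA -> an internal TGA at codon index 1.
seleno_candidate = in_frame_stops("ATGTGAAAATAA", "TGA")
# Codons: ATG ATG AAA TAA -> "TGA" occurs only out of frame.
out_of_frame = in_frame_stops("ATGATGAAATAA", "TGA")
# Same idea for pyrrolysine, which uses TAG (UAG).
pyl_candidate = in_frame_stops("ATGTAGAAATAA", "TAG")
```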

Calling Streptococcus repeat elements

Streptococcus species contain small interspersed repeats that may modulate gene expression and can be used for epidemiological typing [24,25]. We have added tools created by Croucher et al. for finding these elements [24]. When the user specifies the metadata for the genome, these scripts will be run if the genus is Streptococcus. In the future, we anticipate adding other species-specific annotation tools, and their conditional usage in RASTtk will mirror that of the Streptococcus repeat finder.

Calling CRISPR elements

CRISPRs, clustered regularly interspaced short palindromic repeats, are a special type of repeat region found in many bacterial and archaeal genomes. The CRISPR array contains spacer regions matching foreign DNA that are regularly spaced and bounded by repeat regions. The DNA of the spacer region is transcribed and used to interfere with incoming foreign DNA. Because of their importance in horizontal gene transfer and in biotechnology [26,27], we have added a script to the RASTtk pipeline that finds CRISPR elements. This script works by using a Perl regular expression search [Wall, L., Christiansen, T. & Orwant, J. Programming perl. (O'Reilly Media, Inc., 2004)] to find recurring direct repeats of at least 24 nucleotides in length and spaced at regular intervals. The output of the script is three new feature types: the entire CRISPR array, the CRISPR spacer, and the CRISPR repeat.
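A regular-expression search of this kind can be sketched with a back-reference. The example below uses Python's `re` module rather than Perl, and only the 24-nucleotide minimum repeat length comes from the text; the maximum repeat length, the spacer length bounds, and the minimum number of copies are assumptions made for illustration. A back-referencing pattern like this is also slow on long sequences, so it should be read as a demonstration of the idea, not as the RASTtk implementation.

```python
import re
from typing import Dict, List


def find_crispr_arrays(dna: str, min_repeat: int = 24, max_repeat: int = 48,
                       min_spacer: int = 20, max_spacer: int = 60,
                       min_copies: int = 3) -> List[Dict[str, object]]:
    """Find direct repeats that recur with bounded spacers between copies."""
    # Group 1 is the repeat; each further copy must follow a spacer and
    # match group 1 exactly via the \1 back-reference.
    pattern = re.compile(
        r"([ACGT]{%d,%d})(?:[ACGT]{%d,%d}?\1){%d,}"
        % (min_repeat, max_repeat, min_spacer, max_spacer, min_copies - 1)
    )
    arrays = []
    for match in pattern.finditer(dna.upper()):
        repeat = match.group(1)
        spacers = [s for s in match.group(0).split(repeat) if s]
        arrays.append({"start": match.start(), "end": match.end(),
                       "repeat": repeat, "spacers": spacers})
    return arrays


# Synthetic test sequence: flank, repeat, spacer, repeat, spacer, repeat, flank.
REPEAT = "GTTCACTGCCGTATAGGCAGCTAAGAAG"  # made-up 28-nt repeat
toy_genome = ("T" * 10 + REPEAT + "A" * 30 + REPEAT
              + "C" * 30 + REPEAT + "T" * 10)
arrays = find_crispr_arrays(toy_genome)
```

Each returned entry corresponds to the three feature types named above: the array span (`start`, `end`), the repeat, and the list of spacers.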

Calling genes

RASTtk offers the option to call the open reading frames with Glimmer3 [28], GeneMarkS [29], and Prodigal [18]. The original web-based version of RAST used Glimmer3, and our current default RASTtk pipeline uses both Prodigal and Glimmer3. The output of both programs is added to the GTO file, and then an optimal set of calls is chosen in the overlap resolution step (described below).

Annotating proteins with k-mers

Historically, RAST has made heavy use of the FIGfam collection maintained within the SEED project [6]. FIGfams are protein families in which it is believed that all members of the same family share an identical function and were derived from a common ancestor (i.e., they are all isofunctional homologs). The original implementation of the k-mer-based assignment of function was based on the use of signature k-mers [3,19]. A signature k-mer is defined as an 8-mer amino acid sequence in which the vast majority (over 80%) of occurrences are found within FIGfams sharing a common function, and that does not occur in any FIGfam with a different function. For example, a k-mer for which 93% of the occurrences within the FIGfam collection were in families implementing the function SSU ribosomal protein S13p (S18e) would be considered a “signature of function”. In this case, the signatures depend critically on the FIGfams. Once we had modestly consistent collections of sequences in the SEED, we introduced a version of k-mer analysis that was not based on FIGfams. The notion of signature k-mer was modified to an 8-mer amino acid sequence in which the vast majority (over 80%) of occurrences were within sequences assigned an identical function. This version depends on the collection of protein sequences with consistent annotations. Currently, we base this version on a collection of about 1000 representative genomes present in a subset of the SEED called the CoreSEED (core.theseed.org). The CoreSEED represents our best attempt at annotation consistency and is currently the main focus of our manual annotation efforts. The CoreSEED database attempts to provide the most consistent manual annotations for the core metabolic and housekeeping functions in a relatively small and diverse set of the bacteria and archaea, whereas the PubSEED database (from which the FIGfam collection is generated) attempts to absorb new annotations from the academic community for many genomes.
The default RASTtk workflow first searches against the limited number of more stable annotation-based k-mers from the CoreSEED, and then, if an annotation cannot be found, it searches against the larger collection of FIGfam-based k-mers from the PubSEED. In addition to these two k-mer-based gene-function assignment tools, there are also two analogous tools that use these k-mer sets to search for function-containing regions of DNA and that do not require gene calls. They are useful for searching for genes in regions where calls may have been missed and for assessing functions in unassembled sequence data.
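The annotation-based definition of a signature k-mer (an 8-mer for which over 80% of occurrences fall in sequences with an identical function) can be sketched in Python as below. The function names and the toy proteins are hypothetical, and the majority-vote rule used to assign a function to a query is an assumption for illustration; the source does not specify how k-mer hits are combined.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def signature_kmers(proteins: Iterable[Tuple[str, str]], k: int = 8,
                    min_fraction: float = 0.8) -> Dict[str, str]:
    """Map each signature k-mer to its function.

    proteins: (amino acid sequence, assigned function) pairs.
    """
    counts = defaultdict(Counter)
    for seq, function in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]][function] += 1
    signatures = {}
    for kmer, by_function in counts.items():
        function, hits = by_function.most_common(1)[0]
        if hits / sum(by_function.values()) > min_fraction:
            signatures[kmer] = function
    return signatures


def assign_function(seq: str, signatures: Dict[str, str],
                    k: int = 8) -> Optional[str]:
    """Assumed voting scheme: the function with the most signature hits wins."""
    votes = Counter(signatures[seq[i:i + k]]
                    for i in range(len(seq) - k + 1)
                    if seq[i:i + k] in signatures)
    return votes.most_common(1)[0][0] if votes else None


# Toy reference set: MKTAYIAK is shared by both functions (2 of 3 occurrences
# in "func A", below the 80% cutoff), so it is not a signature.
reference = [("MKTAYIAKQR", "func A"),
             ("MKTAYIAKQL", "func A"),
             ("MKTAYIAKGG", "func B")]
signatures = signature_kmers(reference)
call = assign_function("XKTAYIAKQX", signatures)
```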

Annotating proteins missed by k-mers

If no function can be found for a protein-encoding gene during the k-mer analysis, a final search that uses a combination of BLAT [30] and BLASTP [21] is performed against a set of non-redundant genus-specific protein databases for the organism's genus and, when available, closest relatives. If a matching protein is found with an e-value ≤ 1e-5 and a percent identity ≥ 50%, then the function from the protein in the database is assigned to the gene.
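The acceptance rule can be written as a small Python filter. The two thresholds come from the text; the `Hit` record and the choice of the lowest e-value among passing hits are assumptions for illustration, since the source does not say how ties between several passing hits are resolved.

```python
from typing import Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    subject_function: str    # function of the database protein
    evalue: float
    percent_identity: float


def function_from_hits(hits: Iterable[Hit], max_evalue: float = 1e-5,
                       min_identity: float = 50.0) -> Optional[str]:
    """Return the function of the best passing hit, or None if none pass."""
    passing = [h for h in hits
               if h.evalue <= max_evalue and h.percent_identity >= min_identity]
    if not passing:
        return None
    # Assumed tie-break: prefer the hit with the lowest e-value.
    return min(passing, key=lambda h: h.evalue).subject_function


hits = [
    Hit("hypothetical transporter", 1e-3, 72.0),   # fails the e-value cutoff
    Hit("DNA gyrase subunit A", 1e-40, 45.0),      # fails the identity cutoff
    Hit("DNA gyrase subunit B", 1e-30, 63.5),      # passes both
]
assigned = function_from_hits(hits)
```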

Resolving Overlapping Calls

After using different tools to call open reading frames and annotate features, we try to resolve the results into a coherent picture. To resolve overlapping features, we use a dynamic programming algorithm that resembles the scoring algorithm in Prodigal [18]. It works by scanning the genome and generating a score for each alternative combination of feature calls for tRNA, rRNA and protein-encoding genes. In general, for a given location on the contig, tRNA and rRNA receive a higher score than protein-encoding genes, and large overlaps receive a negative score based on the length of the overlap. Large gaps between genes also result in negative scores. After considering all of the combinations of calls, the genomic arrangement with the highest score is chosen. User-defined features are exempt from consideration by this algorithm.
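The general shape of such a dynamic program can be sketched in Python as a best-scoring chain of calls along a contig. All numeric values below (base scores, the per-base overlap penalty, the gap threshold and gap penalty) are invented for illustration, as are the names; the sketch only mirrors the qualitative rules stated above (RNA calls outrank protein-encoding calls, overlaps are penalized by length, large gaps are penalized) and is not the RASTtk scoring scheme.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    name: str
    kind: str   # "tRNA", "rRNA" or "CDS"
    start: int
    end: int


# Assumed scores and penalties, for illustration only.
BASE_SCORE = {"tRNA": 30.0, "rRNA": 30.0, "CDS": 10.0}
OVERLAP_PENALTY_PER_BASE = 0.5
LARGE_GAP = 300
GAP_PENALTY = 2.0


def transition(prev: Call, nxt: Call) -> float:
    """Score adjustment for placing nxt directly after prev in the chain."""
    distance = nxt.start - prev.end - 1
    if distance < 0:                      # the two calls overlap
        return -OVERLAP_PENALTY_PER_BASE * (-distance)
    if distance > LARGE_GAP:              # unusually large gap
        return -GAP_PENALTY
    return 0.0


def resolve(calls: List[Call]) -> List[Call]:
    """Pick the highest-scoring chain of calls by dynamic programming."""
    if not calls:
        return []
    calls = sorted(calls, key=lambda c: (c.start, c.end))
    best = [BASE_SCORE[c.kind] for c in calls]   # best chain ending at each call
    back = [-1] * len(calls)
    for j, cur in enumerate(calls):
        for i in range(j):
            prev = calls[i]
            if prev.end >= cur.end:               # cur nested in prev: skip
                continue
            cand = best[i] + transition(prev, cur) + BASE_SCORE[cur.kind]
            if cand > best[j]:
                best[j], back[j] = cand, i
    j = max(range(len(calls)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(calls[j])
        j = back[j]
    return chain[::-1]


candidates = [
    Call("z", "CDS", 1, 90),
    Call("x", "CDS", 100, 400),    # overlaps the tRNA by 51 bases
    Call("t", "tRNA", 350, 420),
    Call("y", "CDS", 450, 900),
]
chosen = [c.name for c in resolve(candidates)]
```

In this toy case the CDS that overlaps the tRNA is dropped, because keeping the tRNA and its non-overlapping neighbours yields the higher total score.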

Additional Analysis Scripts

Customizing and updating an annotation job

Users can run custom analysis jobs locally on the command line or in IRIS and then add their own features to an annotation job. In order to input specialty features, the user must create a tab-delimited text file, which contains a unique identifier, the location, feature type and function. These are then added to the GTO file and can be exported in a variety of formats. Users can also update functions directly by providing a tab-delimited file with the identifier and the new function. The update is then logged in the GTO file.
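Reading such a tab-delimited feature file can be sketched in Python with the standard `csv` module. The four columns follow the description above (identifier, location, feature type, function); the column order, the location string format, and the example rows are assumptions, so the RASTtk tutorials remain the reference for the exact file layout expected by the scripts.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_custom_features(handle: TextIO) -> List[Dict[str, str]]:
    """Parse rows of: identifier, location, feature type, function."""
    features = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                       # skip blank lines and comments
        if len(row) != 4:
            raise ValueError("expected 4 tab-separated columns, got %d"
                             % len(row))
        feature_id, location, feature_type, function = row
        features.append({"id": feature_id, "location": location,
                         "type": feature_type, "function": function})
    return features


# Made-up example file; the location syntax here is purely illustrative.
example_file = io.StringIO(
    "# id\tlocation\ttype\tfunction\n"
    "my.1\tcontig1_100_400\tpromoter\tPutative sigma-70 promoter\n"
    "my.2\tcontig1_900_650\tsRNA\tSmall regulatory RNA\n"
)
custom_features = read_custom_features(example_file)
```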

Batch genome submissions

RASTtk supports the ability to upload a directory of GTO files either using IRIS or the RASTtk.app DMG. When a batch upload is performed, the entire directory is uploaded to the RAST server and placed into the queue. A job identifier is returned to the user and this is used to check the status of the job and to download the job when it is completed. A custom RASTtk pipeline can be invoked by adding a special JSON formatted workflow file along with the directory of genomes. For more information on using RASTtk in batch mode, please refer to the RASTtk tutorials (http://tutorial.theseed.org).

Calling prophage elements

In order to find potential prophage elements, we have added PhiSpy [31]. PhiSpy uses a combination of several independent heuristic methods to identify regions in the genome that may be derived from phages or mobile elements.

Finding insertion sequences

We have added a new tool that uses a reference set of end sequences and transposase proteins from the SEED and ISfinder databases [3,32] to search the genome for IS elements. Matches are found by using a combination of BLASTN for the end regions and BLASTX for the proteins [21].

Identifying special gene sets

The PATRIC project is an integration of data and tools for studying bacterial pathogens. In order for RASTtk to support PATRIC, it was necessary to improve the identification of genes relating to virulence and drug development [33]. RASTtk now offers an analysis script for this purpose. It searches against custom BLAST databases that have been built from ARDB [34] and CARD [35] for finding potential antibiotic resistance genes; DrugBank [36] and TTD [37] for finding potential drug targets; VFDB [38], Victors [39] and the PATRIC virulence factors [10,33] for finding potential virulence factors; and the human reference genome sequence [40] for finding potential human homologs, which is an important step in drug screening analyses.

Future Directions

RASTtk enables users to optimize and customize the annotation steps for a given genome, and to apply these pipelines to sets of genomes as customized batch submissions. The modularity of RASTtk also makes it much easier to develop and incorporate software for improving genome annotations, and we anticipate adding tools to RASTtk as they become needed. We also expect the use of RASTtk to result in the community development of annotation pipelines aimed at solving more specialized annotation problems, such as more accurate gene calling in prophages and eukaryotes. We also anticipate providing specialty scripts that improve the accuracy for gene families that are currently difficult to annotate, such as pathogenicity-related genes, transporters and mobile elements.

Author Contributions

T.B., J.J.D., T.D., R.A.E., S.G., G.J.O., R.O.l., R.O.v., B.P., G.D.P., M.S., J.A.T., R.S., V.V., A.R.W. and F.X. designed the project, built the RASTtk software and contribute to the maintenance of the RAST tool kit. J.J.D. and R.O.v. wrote this manuscript. All authors have reviewed this manuscript.

37 in total

1. BLAT--the BLAST-like alignment tool.

Authors: W James Kent
Journal: Genome Res Date: 2002-04 Impact factor: 9.043

2. A System for Automated Bacterial (genome) Integrated Annotation--SABIA.

Authors: Luiz G P Almeida; Roger Paixão; Rangel C Souza; Gisele C da Costa; Frank J A Barrientos; M Trindade dos Santos; Darcy F de Almeida; Ana Tereza R Vasconcelos
Journal: Bioinformatics Date: 2004-04-15 Impact factor: 6.937

3. Identifying bacterial genes and endosymbiont DNA with Glimmer.

Authors: Arthur L Delcher; Kirsten A Bratke; Edwin C Powers; Steven L Salzberg
Journal: Bioinformatics Date: 2007-01-19 Impact factor: 6.937

Review 4. CRISPR--a widespread system that provides acquired resistance against phages in bacteria and archaea.

Authors: Rotem Sorek; Victor Kunin; Philip Hugenholtz
Journal: Nat Rev Microbiol Date: 2008-03 Impact factor: 60.633

5. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes.

Authors: Ross Overbeek; Tadhg Begley; Ralph M Butler; Jomuna V Choudhuri; Han-Yu Chuang; Matthew Cohoon; Valérie de Crécy-Lagard; Naryttza Diaz; Terry Disz; Robert Edwards; Michael Fonstein; Ed D Frank; Svetlana Gerdes; Elizabeth M Glass; Alexander Goesmann; Andrew Hanson; Dirk Iwata-Reuyl; Roy Jensen; Neema Jamshidi; Lutz Krause; Michael Kubal; Niels Larsen; Burkhard Linke; Alice C McHardy; Folker Meyer; Heiko Neuweger; Gary Olsen; Robert Olson; Andrei Osterman; Vasiliy Portnoy; Gordon D Pusch; Dmitry A Rodionov; Christian Rückert; Jason Steiner; Rick Stevens; Ines Thiele; Olga Vassieva; Yuzhen Ye; Olga Zagnitko; Veronika Vonstein
Journal: Nucleic Acids Res Date: 2005-10-07 Impact factor: 16.971

6. ISfinder: the reference centre for bacterial insertion sequences.

Authors: P Siguier; J Perochon; L Lestrade; J Mahillon; M Chandler
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

7. The integrated microbial genomes (IMG) system.

Authors: Victor M Markowitz; Frank Korzeniewski; Krishna Palaniappan; Ernest Szeto; Greg Werner; Anu Padki; Xueling Zhao; Inna Dubchak; Philip Hugenholtz; Iain Anderson; Athanasios Lykidis; Konstantinos Mavromatis; Natalia Ivanova; Nikos C Kyrpides
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

8. RNAmmer: consistent and rapid annotation of ribosomal RNA genes.

Authors: Karin Lagesen; Peter Hallin; Einar Andreas Rødland; Hans-Henrik Staerfeldt; Torbjørn Rognes; David W Ussery
Journal: Nucleic Acids Res Date: 2007-04-22 Impact factor: 16.971

9. The RAST Server: rapid annotations using subsystems technology.

Authors: Ramy K Aziz; Daniela Bartels; Aaron A Best; Matthew DeJongh; Terrence Disz; Robert A Edwards; Kevin Formsma; Svetlana Gerdes; Elizabeth M Glass; Michael Kubal; Folker Meyer; Gary J Olsen; Robert Olson; Andrei L Osterman; Ross A Overbeek; Leslie K McNeil; Daniel Paarmann; Tobias Paczian; Bruce Parrello; Gordon D Pusch; Claudia Reich; Rick Stevens; Olga Vassieva; Veronika Vonstein; Andreas Wilke; Olga Zagnitko
Journal: BMC Genomics Date: 2008-02-08 Impact factor: 3.969

10. PHIDIAS: a pathogen-host interaction data integration and analysis system.

Authors: Zuoshuang Xiang; Yuying Tian; Yongqun He
Journal: Genome Biol Date: 2007 Impact factor: 13.583
