
Inside the Pan-genome - Methods and Software Overview.

Luis Carlos Guimarães¹, Jolanta Florczak-Wyspianska², Leandro Benevides de Jesus³, Marcus Vinícius Canário Viana³, Artur Silva², Rommel Thiago Jucá Ramos², Siomar de Castro Soares⁴.

Abstract

The number of genomes deposited in databases has increased exponentially since the advent of Next-Generation Sequencing (NGS), which produces high-throughput sequence data; this has demanded the development of new bioinformatics software and the creation of new areas, such as comparative genomics. In comparative genomics, the genetic content of an organism is compared against that of other organisms, which helps in the prediction of gene function and coding region sequences, the identification of evolutionary events and the determination of phylogenetic relationships. Moreover, by expanding comparative genomics to a large number of related bacteria, we can infer their lifestyles, gene repertoires and minimal genome size. In this context, a powerful approach called the pan-genome has been developed. This approach involves the genomic comparison of different strains of the same species, or even genus. Its main goal is to establish the total number of non-redundant genes that are present in a given dataset. The pan-genome consists of three parts: the core genome; the accessory or dispensable genome; and species-specific or strain-specific genes. Furthermore, a pan-genome is considered to be "open" as long as each additional genome adds a significant number of new genes to the total repertoire and "closed" when newly added genomes no longer significantly increase the total gene repertoire. To perform all of the required calculations, a substantial amount of software has been developed, based on orthologous and paralogous gene identification.


Keywords: Accessory genome; Comparative genome; Core genome; Pan-genome; Species-specific genome

Year: 2015 PMID: 27006628 PMCID: PMC4765519 DOI: 10.2174/1389202916666150423002311

Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236

BACKGROUND

The advent of Next-Generation Sequencing (NGS) has reduced the time and cost per genome sequenced [1-3]; as a result, we have observed an exponential increase in the number of whole genome sequences deposited in public databases (http://www.genomesonline.org). In this context, the large number of available genomes boosted the development of comparative genomics and, consequently, the rise of the pan-genomic area [4, 5]. Comparative genomics is the direct comparison of the genetic content of one organism against another, and its main aim is to obtain a better biological understanding of many species [6]. This approach can help to determine gene function and coding region sequences of genomes as well as to characterize the frequency of evolutionary events, such as genome plasticity, and to establish phylogenetic relationships [7, 8]. Most comparative analyses aim to identify similarities and differences among organisms [9]. Comparative genomics is used in many different areas of science, such as the comparison of Drosophila melanogaster (fruit fly, a model organism) genes with human genes, in which 548 human genes were identified as having homologs in the fly genome. All of these genes are linked to human diseases of different natures (cardiovascular, visual, auditory, endocrine and skeletal diseases) [10]. Thus, the finding of homologous genes shared between humans and model organisms has opened the possibility of testing new therapies in model organisms [6]. Similarly, comparative genomics can be used in prokaryotic organisms, e.g., in the comparison of Bacillus licheniformis, a Gram-positive bacterium of biotechnological and pharmaceutical interest that is used for protein expression and antibiotic production, with two related species (Bacillus subtilis and Bacillus halodurans). The comparison among these three bacteria not only enabled the assembly of the Bacillus licheniformis genome but also helped in evolutionary studies and the identification of horizontal gene transfer between them [11]. Furthermore, comparative genomics analyses of related species have shown extensive intra-species genomic diversity and highlighted the associated bacterial promiscuity [12]. Comparative genomics can also be applied to a large number of bacteria with distinct lifestyles. One study used 317 genomes to establish patterns among the organisms’ lifestyles, their gene repertoires and the sizes of their genomes. The authors observed that intracellular pathogens are more prone to gene loss, or reductive genome evolution [13]. Thus, the availability of thousands of bacterial genomes in databases and the use of comparative genomics through a variety of approaches have led to new concepts such as the pan-genome, core genome and accessory genome [14-16].

PAN-GENOME

The pan-genome approach is the genomic comparison of different strains of the same species, or even genus [17, 18]. Currently, the availability of a large number of genomes from different isolates of the same pathogen has opened the possibility of investigating several genomic characteristics that are intrinsic to one or more species [16]. One way to investigate these attributes is through the pan-genomic approach [15]. The term pan-genome was first described by Tettelin and colleagues (2005), who used eight different strains of Streptococcus agalactiae, a pathogenic species isolated from humans. After this research, other studies were performed using pan-genomic analysis for different microorganisms, including Bacillus cereus [19], Escherichia coli [20], Sulfolobus islandicus [21], Streptococcus pneumoniae [22], Methanobrevibacter smithii [23], Corynebacterium diphtheriae [24], Corynebacterium pseudotuberculosis [25], and Pantoea ananatis [26], among others. Pan-genomic studies bring significant insights into bacterial evolution, niche adaptation, population structure and host interaction, and they also inform more applied issues, such as vaccine and drug design and the identification of virulence genes. The term “pan-genome” reflects the total number of non-redundant genes that are present in a given dataset [16, 17]. It consists basically of three parts: i) the core genome, formed by genes shared by all genomes and usually involved in essential cellular processes; ii) the accessory or dispensable genome, composed of genes absent in some isolates; and iii) species-specific or strain-specific genes, which are those genes that are present in a single genome [16, 27] (Fig. ). Usually, genes of the accessory genome and species-specific or strain-specific genes are involved in niche adaptation [5, 28]. In this review, we describe the different approaches to studying the pan-genome and its sub-products (core genome, accessory or dispensable genome and species-specific or strain-specific genes), and we discuss the impact that the pan-genome concept has on characterizing the bacterial lifestyle.

CORE GENOME, ACCESSORY OR DISPENSABLE GENOME, AND SPECIES-SPECIFIC OR STRAIN-SPECIFIC GENES

Core Genome

The core genome is the subset of genes that are present in all of the genomes, and it can be determined by comparing the different genomes [15]. Lapierre and Gogarten (2009) reported that over 250 gene families have been characterized as part of the bacterial core genome, and these gene families illustrate the conservative nature of evolution. Normally, genes that are present in the core genome are associated with the maintenance of the basic aspects of the organism’s biology and are mainly related to replication, translation and maintenance of cellular homeostasis [16, 28]. Moreover, the core genome undergoes significant selective pressure in relation to its function, which inhibits the occurrence of drastic changes [14]. The number of genes that compose the core genome can indicate the genetic diversity among the studied organisms; thus, the core genome becomes smaller when diversity increases among the organisms [29]. On the other hand, phylogenetically related genomes tend to share more genes and consequently present a larger core genome [29, 30].

Accessory or Dispensable Genome

The accessory or dispensable genome is the subset of genes that is shared by some organisms but is not present in all of the studied organisms, and it is represented by approximately 8000 gene families [14]. This subset includes genes that have specific functions that are related to survival in different niches; usually, they are associated with virulence or resistance to antibiotics and can be reflected in the organism's lifestyle [27, 31]. The accessory or dispensable genome has been described as variations in gene sequences that can give rise to new gene functions [14]. Although similar at the nucleotide level, these genes show a high diversity in their specificity for substrates. The accessory genome could have emerged by horizontal gene transfer and paraphyletic evolution, where it occurs as gene duplication followed by mutation. Additionally, strain divergence can occur [32, 33]. For example, the ABC transporter gene family presents various types of substrate specificity, which is caused by nucleotide substitutions in its periplasmic binding sites [14].

Species-specific or Strain-specific Genes

Species-specific genes are present in a single species at the inter-species level, whereas strain-specific genes are only present in one strain and are at the intra-species level [15]. Normally, species-specific or strain-specific genes are obtained by horizontal gene transfer among species. According to Lefébure and colleagues [34], who conducted a pan-genomic study with the Streptococcus genus, the species-specific genes are represented by 139,000 gene families [34, 35]. The presence of these genes could confer an adaptive advantage over those strains that lack them. Moreover, studies have shown that those genes have a connection with virulence or pathogenicity in pathogenic organisms [36, 37]. In non-pathogenic organisms, these genes could have a connection with metabolism and could be metabolic islands that are acquired by horizontal gene transfer [38]. In general, this group of genes is under relaxed mutational pressure, with mutations occurring constantly in its sequences, in contrast with genes that are present in the core genome, which are under constant selective pressure to maintain their conserved sequence [39, 40]. When mutations are successful, improving bacterial adaptation to specific environments and conditions, the genes can be maintained in the genome, shared among species and integrated into the accessory genome (bacterial evolution). On the other hand, mutations can lead to the creation of pseudogenes (non-functional genes), which, during the evolutionary process, could be excluded from the genomes [14]. Jordan and colleagues (2001) conducted a study of strain-specific genes in which they analyzed 21 genomes, and they observed that strain-specific genes ranged from 5% to 35% per genome. They also observed that the majority of strain-specific genes were duplicated, i.e., most are paralogous genes that are arranged in tandem. Generally, these genes are considered to be virulence factors because they can encode surface-exposed proteins, which would confer on pathogenic bacteria the ability to bind to host cells [41, 42].
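The three gene categories described in this section can be illustrated with a minimal sketch. It assumes that genes have already been grouped into families and that each genome is represented simply as a set of family identifiers; the function name `partition_pan_genome` and the identifiers are hypothetical and do not come from any of the tools reviewed here.

```python
from typing import Dict, Set


def partition_pan_genome(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into core, accessory and strain-specific sets.

    `presence` maps a genome name to the set of gene family IDs it carries
    (an assumed, pre-computed input).
    """
    genomes = list(presence)
    # Pan-genome: every non-redundant family seen in at least one genome.
    pan: Set[str] = set().union(*presence.values())
    core: Set[str] = set()
    accessory: Set[str] = set()
    strain_specific: Set[str] = set()
    for family in pan:
        count = sum(family in presence[g] for g in genomes)
        if count == len(genomes):
            core.add(family)             # present in all genomes
        elif count == 1:
            strain_specific.add(family)  # present in a single genome
        else:
            accessory.add(family)        # present in some, but not all
    return {
        "pan": pan,
        "core": core,
        "accessory": accessory,
        "strain_specific": strain_specific,
    }


example = {
    "strain_A": {"f1", "f2", "f3"},
    "strain_B": {"f1", "f2", "f4"},
    "strain_C": {"f1", "f5"},
}
parts = partition_pan_genome(example)
```

In this toy input, `f1` is the only core family, `f2` is accessory, and `f3`, `f4` and `f5` are strain-specific; the hard part in practice is the orthology step that produces the families, which the sketch takes as given.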

Open and Closed Pan-genome

To determine the number of genomes to be sequenced to obtain the complete gene repertoire of a given species or related organisms, it is necessary to determine how many extra genes are added for each newly sequenced genome [5, 16]. Thus, we have the concept of the open or closed pan-genome. Tettelin and colleagues [16] used mathematical extrapolation of the data; as a result, they observed that the S. agalactiae pan-genome is enormous and that unique genes will continue to be identified even after hundreds of genomes have been sequenced. In this case, we have an “open” pan-genome, which means that each new genome sequenced will provide novel genes. This “infinite” gene pool is clearly a mathematical extrapolation from the available sequenced genomes; nevertheless, it supports the fact that some species have extremely flexible genetic content [5, 27]. In contrast, some species live in an isolated and restricted niche and lack mechanisms for gene exchange and recombination, which hampers their ability to obtain foreign genes. In this case, the gene pool is no longer expanding after two or three sequenced genomes; in this way, we can infer that these species have a “closed” pan-genome [28]. We must keep in mind, however, that a closed pan-genome does not necessarily mean that all of the strains show the same phenotype, because different nucleotide polymorphisms could confer singular features on the strains; for example, in Buchnera, a single nucleotide mutation in a promoter region alters thermal tolerance [43, 44]. Heaps' law is used to calculate whether the pan-genome is open or closed. Heaps' law is an empirical law that describes the number of distinct words in a document (or set of documents) as a function of the document length, and it is represented by the formula $n = k N^{-\alpha}$ [45]. In a genetic context, n is the expected number of new genes found when the N-th genome is added, N is the number of genomes, and k and α (α = 1-γ) are free parameters that are determined empirically [5]. According to Heaps' law, when α > 1 (γ < 0), the pan-genome is considered to be closed, and the addition of new genomes will not increase the number of new genes significantly. On the other hand, when α < 1 (0 < γ < 1), the pan-genome is open, and for each newly added genome, the number of genes will increase significantly [5].
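As an illustration of how α can be estimated, the following is a minimal sketch that fits the power law n = k·N^(-α) by ordinary least squares on log-transformed data. It assumes that n is read as the number of new genes contributed by the N-th genome (the reading under which α > 1 corresponds to a closed pan-genome); the function names and the synthetic data are hypothetical, and published tools may use different fitting procedures.

```python
import math
from typing import Sequence, Tuple


def fit_heaps_law(genome_counts: Sequence[int],
                  new_genes: Sequence[float]) -> Tuple[float, float]:
    """Fit n = k * N**(-alpha); return (k, alpha).

    Uses linear regression of log(n) on log(N), so every N and n
    must be strictly positive.
    """
    if len(genome_counts) != len(new_genes) or len(genome_counts) < 2:
        raise ValueError("need at least two (N, n) pairs of equal length")
    if any(N <= 0 for N in genome_counts) or any(n <= 0 for n in new_genes):
        raise ValueError("N and n must be positive to take logarithms")
    xs = [math.log(N) for N in genome_counts]
    ys = [math.log(n) for n in new_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct values of N")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # slope of the log-log line is -alpha
    k = math.exp(mean_y - slope * mean_x)   # intercept is log(k)
    return k, -slope


def classify_pan_genome(alpha: float) -> str:
    """Apply the rule from the text: alpha > 1 is closed, otherwise open."""
    return "closed" if alpha > 1 else "open"


# Synthetic example: new-gene counts generated with k = 100, alpha = 1.5.
Ns = list(range(2, 11))
ns = [100 * N ** -1.5 for N in Ns]
k_hat, alpha_hat = fit_heaps_law(Ns, ns)
```

With real data, n would typically be averaged over many orderings of the genomes before fitting, because the number of new genes depends on the order in which genomes are added.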

PAN-GENOME STUDIES

In this section, we describe some pan-genomic studies that were performed with respect to the following species: Pantoea ananatis [26], Lactobacillus rhamnosus [46], Corynebacterium pseudotuberculosis [25], Corynebacterium diphtheriae [24], and Buchnera aphidicola [26] (Table ). Pantoea ananatis belongs to the Enterobacteriaceae family; it is frequently found in a wide variety of environments, such as rivers, soil samples, refrigerated beef and aviation fuel tanks, and is frequently associated with plants and animals [47, 48]. Computing the pan-genome using eight strains of P. ananatis resulted in an open pan-genome in which approximately 106 new protein coding sequences would be added for each new genome. The P. ananatis pan-genome consists of a core genome with 3,876 protein coding sequences and an accessory genome with 1,690 protein coding sequences [26]. Lactobacillus rhamnosus is a Gram-positive lactic acid bacterium that occupies a range of bodily habitats and is typically associated with certain fermented milk products. Isolates of L. rhamnosus are recognized as health-beneficial and are thus used as probiotics [49, 50]. The L. rhamnosus pan-genome study focused on the characterization of relevant surface-exposed proteins, such as those encoded by the spaCBA operon, which encodes pili that have a muco-adhesive phenotype, an uncommon occurrence in this species [46]. Corynebacterium pseudotuberculosis is an important animal pathogen that causes several chronic infectious and contagious diseases, such as caseous lymphadenitis (CLA). This disease normally affects small ruminants (sheep and goats), causing significant economic loss [51]. The C. pseudotuberculosis pan-genome study resulted in an open pan-genome in which approximately 19 new protein coding sequences were added for each new genome. The core genome consists of 1,504 protein coding sequences. More detailed analysis of the pan-genome revealed differences between the biovar ovis and equi strains, where the biovar ovis strains showed a more clonal-like behavior than the biovar equi strains [25]. Corynebacterium diphtheriae is an important human pathogen and the causative agent of classical diphtheria. This disease is an upper respiratory tract illness that is characterized by sore throat, low-grade fever, and the formation of an adherent membrane on the tonsils, pharynx, and/or nasal cavity [52, 53]. The A-B exotoxin called diphtheria toxin, encoded by the gene tox, is the main virulence factor of toxigenic C. diphtheriae [54]. A pan-genomic study with thirteen strains showed an open pan-genome with 4,786 protein coding sequences, which increased at an average of 65 unique genes per newly sequenced strain. The core genome consists of 1,632 protein coding sequences. Analysis of the gene tox revealed that the strain C. diphtheriae 31A harbors a hitherto-unknown tox+ corynephage [24]. Buchnera aphidicola is the obligate intracellular endosymbiont of aphids; it inhabits an isolated and limited niche that impedes the acquisition of external genes, and in addition, it does not have mechanisms for gene exchange and recombination [16, 27]. Pan-genomic analyses with 4 genomes reveal that this bacterium has a closed pan-genome with an estimated number of approximately 2,600 genes [17]. Comparing the B. aphidicola pan-genome with the others previously cited (P. ananatis, L. rhamnosus, C. pseudotuberculosis and C. diphtheriae), we observed that only B. aphidicola has a closed pan-genome. This observation can be correlated with lifestyle because intracellular bacteria have a restricted niche, which could cause gene losses to occur; on the other hand, free-living and facultative intracellular bacteria inhabit several environments and are exposed to many external stresses. Moreover, free-living and facultative intracellular bacteria normally show a capacity to acquire foreign genes by horizontal transfer [13, 16, 27, 37].

METHODS AND SOFTWARE USED IN PAN-GENOME STUDIES

In this section, we describe some methods and tools that have been developed to calculate the pan-genome. All of these pan-genome software systems are based on orthologous and paralogous gene identification, from which the datasets (core genome, dispensable genome, and strain- or species-specific genome) are subsequently predicted.

EDGAR (Efficient Database Framework for Comparative Genome Analyses Using BLAST Score Ratios)

EDGAR is a web-tool (available at https://edgar.computational.bio.uni-giessen.de/). This software performs homology analyses based on a specific cutoff that is automatically adjusted to the query data [55]. The orthology analysis used to calculate the pan-genome, core genome, and singletons is performed using BLAST Score Ratio Values (SRVs). This method divides the BLAST bit score of a hit by the maximum possible bit score, generating the SRV, and the cutoff is calculated using a sliding window instead of the fixed SRV threshold of 30 proposed by Lerat et al. (2003). The core genome is predicted through an iterative pairwise comparison using all of the selected genomes. One genome is selected as a reference, and its gene set (A) is compared with another gene set (B). Genes with a reciprocal best hit between the A and B gene sets are filtered according to an orthology criterion based on the SRVs, and this new gene subset forms the core AB. Subsequently, this subset is compared with another gene set (C), and this comparison continues for all of the genome sets. The pan-genome is predicted in the same way, except that non-orthologous genes are added. One genome forms the pan-genome (A), and non-orthologous genes that are present in another genome (B) are added to it, forming the pan-genome (AB). This process continues until all of the genomes have been analyzed. The singletons are the non-orthologous genes that are present in only a single genome [55].
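The iterative core and pan-genome procedure described above can be sketched as follows. This is an illustrative simplification, not EDGAR's code: it assumes a pre-computed table of BLAST bit scores that includes each gene's self-hit (taken here as the maximum possible score), expresses the SRV as a percentage, and uses a fixed cutoff of 30 in place of the sliding-window cutoff. All function and variable names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (query gene, subject gene) -> BLAST bit score; self-hits must be present.
Bits = Dict[Tuple[str, str], float]


def srv(bits: Bits, query: str, subject: str) -> float:
    """Score ratio value in percent: hit score over the query's self-hit score."""
    return 100.0 * bits.get((query, subject), 0.0) / bits[(query, query)]


def best_hit(bits: Bits, query: str, targets: List[str],
             cutoff: float) -> Optional[str]:
    """Best-scoring target for `query`, or None if its SRV is below the cutoff."""
    best = max(targets, key=lambda s: bits.get((query, s), 0.0), default=None)
    if best is None or srv(bits, query, best) < cutoff:
        return None
    return best


def orthologs(bits: Bits, genes_a: List[str], genes_b: List[str],
              cutoff: float) -> Dict[str, str]:
    """Reciprocal best hits between two gene sets that pass the SRV cutoff."""
    pairs: Dict[str, str] = {}
    for a in genes_a:
        b = best_hit(bits, a, genes_b, cutoff)
        if b is not None and best_hit(bits, b, genes_a, cutoff) == a:
            pairs[a] = b
    return pairs


def core_and_pan(bits: Bits, genomes: Dict[str, List[str]],
                 cutoff: float = 30.0) -> Tuple[List[str], List[str]]:
    """Iterate over genomes: shrink the core set, grow the pan set."""
    names = list(genomes)
    core = list(genomes[names[0]])  # the first genome acts as the reference
    pan = list(genomes[names[0]])
    for name in names[1:]:
        genes = genomes[name]
        kept = orthologs(bits, core, genes, cutoff)
        core = [g for g in core if g in kept]            # keep shared genes only
        matched = set(orthologs(bits, pan, genes, cutoff).values())
        pan.extend(g for g in genes if g not in matched)  # add non-orthologous genes
    return core, pan
```

Core and pan-genome members are reported here by the identifier of the first genome in which they were seen. Singletons could then be obtained as the pan-genome members that have no ortholog in any other genome.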

PGAT (Prokaryotic Genome Analysis Tool)

PGAT is a web-tool (available at http://nwrce.org/pgat) that is used to compare multiple strains of the same species in order to predict genetic differences. Its analyses include pan-genome, synteny, identification of genes present or absent in a dataset, comparison of SNPs (single-nucleotide polymorphisms) in orthologous genes, comparison of genes in metabolic pathways and improvement of functional annotation [56]. The identification of present or absent genes is based on the ortholog assignments. This method improves on ortholog prediction methods that depend on annotation derived from single-genome processing [57]. The ortholog assignment removes the bias of single-genome annotation: the genes are separated into groups and clustered into gene families that are determined through protein BLAST [58]. Additionally, all of the groups are mapped using all six-frame translations, and then the homogenized set of orthologous genes is identified across all of the genomes [56]. SNPs are identified by multiple sequence alignment of orthologous genes using MUSCLE [59]. The metabolic pathways are predicted using KEGG [60].

PGAP – Pan-genome Analysis Pipeline

PGAP is a stand-alone tool (available at http://pgap.sf.net) developed to perform pan-genome analysis, genetic variation analysis, and evolution and function analysis of gene clusters. The software uses two methods: (i) the GF method to detect homologous genes, and (ii) the MP method to detect orthologous genes. The GF method is based on the protein BLAST and MCL algorithms. All of the protein sequences are brought together, and protein BLAST is performed; the results are filtered and clustered using the MCL algorithm [58, 61]. The MP method is based on two algorithms: (i) Inparanoid, which searches for orthologous and paralogous genes using BLAST, and (ii) MultiParanoid, which receives the pairwise ortholog clusters and was specifically developed to search for gene clusters among multiple strains [58, 62-64].

PanGP: A Tool for Quickly Analyzing Bacterial Pan-genome Profiles

PanGP is a stand-alone tool that was developed to perform pan-genome analysis of large numbers of strains with an extremely low time cost. The program works with two algorithms, totally random (TR) and distance guide (DG), which are integrated in the software with a user-friendly graphical interface (available at http://PanGP.big.ac.cn) [65]. The basic difference between the TR and DG algorithms lies in how the sample is drawn: the TR algorithm randomly samples non-redundant genome combinations from all possible combinations, whereas the DG algorithm uses a variable amplification coefficient that controls the sample size according to the genome diversity of the combinations. Tests performed by the authors showed that the DG algorithm has better efficiency [65].
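The idea of random sampling of genome combinations can be sketched as follows. This is a hedged illustration of the general principle behind a totally random strategy, not PanGP's implementation: it assumes each genome is given as a set of gene family identifiers, draws at most `limit` distinct combinations per genome count, and averages the pan-genome and core genome sizes over them. The function names, the `limit` parameter and the seed are hypothetical, and the distance guide algorithm is not covered.

```python
import itertools
import math
import random
from typing import Dict, List, Set, Tuple


def sample_combinations(names: List[str], n: int, limit: int,
                        rng: random.Random) -> List[Tuple[str, ...]]:
    """Return up to `limit` distinct combinations of `n` genomes."""
    if math.comb(len(names), n) <= limit:
        # Few enough combinations: enumerate all of them.
        return list(itertools.combinations(names, n))
    seen: Set[Tuple[str, ...]] = set()
    while len(seen) < limit:
        # Sorting makes each combination order-independent, so duplicates collapse.
        seen.add(tuple(sorted(rng.sample(names, n))))
    return sorted(seen)


def pan_core_profile(presence: Dict[str, Set[str]], limit: int = 100,
                     seed: int = 0) -> List[Tuple[int, float, float]]:
    """For each genome count n, return (n, mean pan size, mean core size)."""
    rng = random.Random(seed)
    names = sorted(presence)
    profile = []
    for n in range(1, len(names) + 1):
        combos = sample_combinations(names, n, limit, rng)
        pan_sizes = [len(set.union(*(presence[g] for g in c))) for c in combos]
        core_sizes = [len(set.intersection(*(presence[g] for g in c))) for c in combos]
        profile.append((n, sum(pan_sizes) / len(combos), sum(core_sizes) / len(combos)))
    return profile


toy = {"A": {"f1", "f2", "f3"}, "B": {"f1", "f2", "f4"}, "C": {"f1", "f5"}}
toy_profile = pan_core_profile(toy)
```

Capping the number of combinations per genome count is what keeps the cost manageable, since the number of possible combinations grows combinatorially with the number of strains.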

ITEP – Integrated Toolkit for the Exploration of Microbial Pan-genomes

ITEP is a collection of scripts written in Python and BASH that is integrated with an SQLite database. This software system is a stand-alone toolkit that is available for download at https://price.systemsbiology.net/itep. The ITEP toolkit was developed to predict protein families, orthologous genes, functional domains, pan-genome (core and variable genes), and metabolic networks for related microbial species [66]. The ITEP workflow consists of a three-step process: Step 1 – Input data: ITEP receives three different types of data (Genbank file format, organism file format, and groups file format), and all of the inputs require pre-processing before running the ITEP toolkit (for more details, see the ITEP documentation); Step 2 – Building a database (startup scripts): In this step, scripts are run to generate the gene locations, BLAST results, and clustering results; Step 3 – Analyzing the database: Once the database is ready, the user can start the analyses, which include core and variable genes, phylogenies, metabolic reconstructions and gene gain and loss patterns [66].

GET_HOMOLOGUES

GET_HOMOLOGUES is a stand-alone, open-source toolkit written in Perl and R that can be installed on personal machines. It was developed to perform pan-genome and comparative-genomic analysis of bacterial strains [67]. To build clusters of orthologous groups, the program starts by running BLAST+ [58] and HMMER [68]. Then, the sequences, features, and intergenic regions are extracted, sorted, and indexed. The results are submitted to the bidirectional best hit (BDBH) algorithm, which sorts the genomes by size, takes the smallest as a reference and then identifies paralogous genes that arose by duplication after speciation. Subsequently, new genomes are added and compared with the reference genome, and their BDBHs are annotated; in the last step, clusters that comprise at least one sequence per genome are retained [67]. Concomitantly, the results are submitted to OrthoMCL [69] version 1.4 and COGtriangles [70].
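The reference-based BDBH clustering described above can be outlined with a small sketch. It is not the GET_HOMOLOGUES implementation: it assumes a pre-computed table giving, for each gene and each target genome, the best-scoring hit in that genome, and it omits the paralog detection step. The names `bdbh_clusters` and `best_hit` are hypothetical.

```python
from typing import Dict, List, Tuple


def bdbh_clusters(genomes: Dict[str, List[str]],
                  best_hit: Dict[Tuple[str, str], str]) -> List[Dict[str, str]]:
    """Build clusters of bidirectional best hits around the smallest genome.

    `genomes` maps genome name -> list of gene IDs.
    `best_hit` maps (query gene, target genome name) -> best-scoring gene
    in the target genome (assumed to come from an all-vs-all BLAST run).
    """
    # Sort genomes by gene count and take the smallest as the reference.
    order = sorted(genomes, key=lambda name: len(genomes[name]))
    reference, others = order[0], order[1:]
    clusters: List[Dict[str, str]] = []
    for gene in genomes[reference]:
        cluster = {reference: gene}
        for other in others:
            hit = best_hit.get((gene, other))
            # Bidirectional: the hit's own best hit in the reference must be `gene`.
            if hit is not None and best_hit.get((hit, reference)) == gene:
                cluster[other] = hit
        # Retain only clusters with one sequence from every genome.
        if len(cluster) == len(genomes):
            clusters.append(cluster)
    return clusters
```

Using the smallest genome as the reference bounds the number of clusters that need to be tracked, since a cluster spanning all genomes cannot exist without a member in the smallest one.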

PanFunPro: PAN-genome Analysis Based on FUNctional PROfiles

PanFunPro is a stand-alone tool for pan-genome analysis that uses functional domains predicted with HMMs (Hidden Markov Models) to group homologous proteins into families based on their functional domain content [71, 72]. In addition to pan-genome analyses, the software performs homology detection and genome annotation using HMMs, genome and proteome estimation as well as gene ontology (GO) information [72, 73]. PanFunPro has four steps: Step 1 – Genome selection: The data set can be submitted as amino acid sequences for all of the encoded proteins. If the data set does not have annotation, then it should first be submitted to the Prodigal software [72, 74] for protein prediction; Step 2 – Prediction of functional domains: Functional domains are predicted in the proteins of the complete data set using PfamA, TIGRFAM, and Superfamily, which are all integrated into the InterProScan software [75-78]; Step 3 – Construction of functional profiles and protein groupings: Here, the software considers HMM hits with an E-value below 0.001 to create functional profiles and protein groupings; Step 4 – Pan, core and accessory genome analyses: In the last step, the pan-genome, core genome, and accessory genome are calculated from the GO terms [72].
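Step 3 can be illustrated with a minimal sketch that groups proteins by their domain content. It is not PanFunPro's code and rests on several assumptions: domain hits are supplied as (genome, protein, domain, E-value) tuples, a functional profile is treated as the unordered set of a protein's significant domains, proteins without any significant hit are simply dropped, and the core and accessory sets are then derived directly from the profiles rather than from GO terms. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Hit = Tuple[str, str, str, float]  # (genome, protein, domain accession, E-value)
Families = Dict[FrozenSet[str], Dict[str, List[str]]]


def build_profiles(hits: Iterable[Hit], max_evalue: float = 1e-3) -> Families:
    """Group proteins that share the same set of significant domains."""
    domains: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for genome, protein, domain, evalue in hits:
        if evalue < max_evalue:  # keep only hits below the E-value threshold
            domains[(genome, protein)].add(domain)
    families: Families = {}
    for (genome, protein), doms in domains.items():
        # profile -> genome -> proteins carrying exactly that domain set
        families.setdefault(frozenset(doms), {}).setdefault(genome, []).append(protein)
    return families


def split_core_accessory(families: Families, genomes: Iterable[str]):
    """Core profiles occur in every genome; all other profiles are accessory."""
    all_genomes = set(genomes)
    core = {p for p, members in families.items() if set(members) == all_genomes}
    accessory = set(families) - core
    return core, accessory
```

Grouping by domain content rather than by sequence similarity means that two proteins with low overall identity can still fall into the same family if they carry the same domains.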

Panseq – Pan-genome Sequence Analysis Program

Panseq is a freely available web-tool written in BioPerl [79], which is available at http://76.70.11.198/panseq. Users can also obtain the BioPerl scripts by contacting the author [80]. In contrast to the other programs described here, Panseq defines the core and accessory genome based on sequence identity and segmentation length and not on the predicted proteins. For this purpose, the NRF module (Novel Region Finder) was developed. The NRF module first splits the genome sequence into fragments of a predefined size, and then the MUMmer alignment program [81] identifies the sequences and contiguous regions that are present or absent in the database [80]. Next, the CAGF module (Core and Accessory Genome Finder) takes a single sequence file and compares it with all of the other sequences. If a fragment fits the predefined parameters, then it is added to the pan-genome, and the newly extended pan-genome is used for subsequent comparisons; this loop continues until all of the fragment sequences have been tested [80].
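The fragment-based idea can be shown with a deliberately simplified sketch. It is not Panseq's algorithm: exact substring matching stands in for the MUMmer alignment and the sequence identity threshold, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def split_fragments(sequence: str, size: int) -> List[str]:
    """Cut a sequence into consecutive fragments of at most `size` bases."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def fragment_pan_genome(genomes: Dict[str, str],
                        size: int) -> Tuple[List[str], List[str]]:
    """Return (core fragments, accessory fragments) for the given genomes.

    A fragment is added to the pan-genome only if it is not already there;
    the growing pan-genome is then used for all subsequent comparisons.
    Exact matching is an assumption that replaces alignment-based identity.
    """
    pan: List[str] = []
    for sequence in genomes.values():
        for fragment in split_fragments(sequence, size):
            if fragment not in pan:
                pan.append(fragment)
    # Core: fragment found in every genome; accessory: everything else.
    core = [f for f in pan if all(f in s for s in genomes.values())]
    accessory = [f for f in pan if f not in core]
    return core, accessory


toy_genomes = {"X": "AAAACCCC", "Y": "AAAAGGGG"}
core_frags, accessory_frags = fragment_pan_genome(toy_genomes, 4)
```

Because the comparison is made on raw sequence fragments, this style of analysis does not depend on gene prediction or annotation quality, which is the contrast with the protein-based tools drawn in the text.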

CONCLUSION

The amount of pan-genome software has increased since the term was first used by Tettelin and colleagues [16], because pan-genome studies enable us to identify, through core-genome analyses, efficient target genes that can be used in vaccine and drug development. Moreover, analyses of genes that belong to the dispensable genome can help us to understand the different symptoms and infections in hosts and niche adaptations, and can support evolutionary studies and strain-level diagnosis.

Table 1

Pan-genome studies.

Organism	No of Genomes	Open/Closed Pan-genome	Pan-genome Size
Pantoea ananatis	8	open	5,566
Lactobacillus rhamnosus	13	open	4,893
Corynebacterium pseudotuberculosis	15	open	2,782
Corynebacterium diphtheriae	13	open	4,786
Buchnera aphidicola	6	closed	2,597

78 in total

1. KEGG: kyoto encyclopedia of genes and genomes.

Authors: M Kanehisa; S Goto
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Lineage-specific gene expansions in bacterial and archaeal genomes.

Authors: I K Jordan; K S Makarova; J L Spouge; Y I Wolf; E V Koonin
Journal: Genome Res Date: 2001-04 Impact factor: 9.043

Review 3. Emerging technologies in DNA sequencing.

Authors: Michael L Metzker
Journal: Genome Res Date: 2005-12 Impact factor: 9.043

Review 4. Next-generation DNA sequencing methods.

Authors: Elaine R Mardis
Journal: Annu Rev Genomics Hum Genet Date: 2008 Impact factor: 8.929

5. Pangenomic study of Corynebacterium diphtheriae that provides insights into the genomic diversity of pathogenic isolates from cases of classical diphtheria, endocarditis, and pneumonia.

Authors: Eva Trost; Jochen Blom; Siomar de Castro Soares; I-Hsiu Huang; Arwa Al-Dilaimi; Jasmin Schröder; Sebastian Jaenicke; Fernanda A Dorella; Flavia S Rocha; Anderson Miyoshi; Vasco Azevedo; Maria P Schneider; Artur Silva; Thereza C Camello; Priscila S Sabbadini; Cíntia S Santos; Louisy S Santos; Raphael Hirata; Ana L Mattos-Guaraldi; Androulla Efstratiou; Michael P Schmitt; Hung Ton-That; Andreas Tauch
Journal: J Bacteriol Date: 2012-04-13 Impact factor: 3.490

6. Pan-genome of the dominant human gut-associated archaeon, Methanobrevibacter smithii, studied in twins.

Authors: Elizabeth E Hansen; Catherine A Lozupone; Federico E Rey; Meng Wu; Janaki L Guruge; Aneesha Narra; Jonathan Goodfellow; Jesse R Zaneveld; Daniel T McDonald; Julia A Goodrich; Andrew C Heath; Rob Knight; Jeffrey I Gordon
Journal: Proc Natl Acad Sci U S A Date: 2011-02-11 Impact factor: 11.205

7. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis.

Authors: Gabriel Ostlund; Thomas Schmitt; Kristoffer Forslund; Tina Köstler; David N Messina; Sanjit Roopra; Oliver Frings; Erik L L Sonnhammer
Journal: Nucleic Acids Res Date: 2009-11-05 Impact factor: 16.971

8. Aphid thermal tolerance is governed by a point mutation in bacterial symbionts.

Authors: Helen E Dunbar; Alex C C Wilson; Nicole R Ferguson; Nancy A Moran
Journal: PLoS Biol Date: 2007-05 Impact factor: 8.029

9. Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species.

Authors: Michael W Rey; Preethi Ramaiya; Beth A Nelson; Shari D Brody-Karpin; Elizabeth J Zaretsky; Maria Tang; Alfredo Lopez de Leon; Henry Xiang; Veronica Gusti; Ib Groth Clausen; Peter B Olsen; Michael D Rasmussen; Jens T Andersen; Per L Jørgensen; Thomas S Larsen; Alexei Sorokin; Alexander Bolotin; Alla Lapidus; Nathalie Galleron; S Dusko Ehrlich; Randy M Berka
Journal: Genome Biol Date: 2004-09-13 Impact factor: 13.583

10. Analysis of the Pantoea ananatis pan-genome reveals factors underlying its ability to colonize and interact with plant, insect and vertebrate hosts.

Authors: Pieter De Maayer; Wai Yin Chan; Enrico Rubagotti; Stephanus N Venter; Ian K Toth; Paul R J Birch; Teresa A Coutinho
Journal: BMC Genomics Date: 2014-05-27 Impact factor: 3.969
