Ulisses de Pádua Pereira, Anderson Rodrigues Dos Santos, Syed Shah Hassan, Flávia Figueira Aburjaile, Siomar de Castro Soares, Rommel Thiago Jucá Ramos, Adriana Ribeiro Carneiro, Luís Carlos Guimarães, Sintia Silva de Almeida, Carlos Augusto Almeida Diniz, Maria Silvanira Barbosa, Pablo Gomes de Sá, Amjad Ali, Syeda Marriam Bakhtiar, Fernanda Alves Dorella, Adhemar Zerlotini, Flávio Marcos Gomes Araújo, Laura Rabelo Leite, Guilherme Oliveira, Anderson Miyoshi, Artur Silva, Vasco Azevedo, Henrique César Pereira Figueiredo.
Abstract
Streptococcus agalactiae (Lancefield group B; GBS) is the causative agent of meningoencephalitis in fish, mastitis in cows, and neonatal sepsis in humans. Meningoencephalitis is a major health problem for tilapia farming and is responsible for high economic losses worldwide. Despite its importance, the genomic characteristics and the main molecular mechanisms involved in virulence of S. agalactiae isolated from fish are still poorly understood. Here, we present the genomic features of the 1,820,886 bp long complete genome sequence of S. agalactiae SA20-06, isolated from a meningoencephalitis outbreak in Nile tilapia (Oreochromis niloticus) from Brazil, and its annotation, consisting of 1,710 protein-coding genes (excluding pseudogenes), 7 rRNA operons, 79 tRNA genes and 62 pseudogenes.
Keywords: Streptococcus agalactiae; fish pathogen; genome sequencing
Year: 2013 PMID: 23991251 PMCID: PMC3746423 DOI: 10.4056/sigs.3687314
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Classification and general features of SA20-06 according to the MIGS recommendations [15].
| MIGS ID | Property | Term | Evidence code |
|---|---|---|---|
| | Classification | Domain | TAS |
| | | Phylum | TAS |
| | | Class | TAS |
| | | Order | TAS |
| | | Family | TAS |
| | | Genus | TAS |
| | | Species | TAS |
| | | Strain SA20-06 | TAS |
| | Gram stain | Positive | TAS |
| | Cell shape | Spherical or ovoid | TAS |
| | Motility | Non-motile | TAS |
| | Sporulation | Non-sporulating | TAS |
| | Temperature range | Mesophile | TAS |
| | Optimum temperature | 28°C (fish isolates) | IDA |
| | Salinity | Usually grows in 4% NaCl, but not in 6.5% | TAS |
| MIGS-22 | Oxygen | Facultative anaerobe | TAS |
| | Carbon source | Cellobiose, beta-glucoside, trehalose, mannose, lactose, fructose | TAS |
| | Energy source | Chemoorganotroph with fermentative metabolism | TAS |
| MIGS-6 | Habitat | Host | TAS |
| MIGS-15 | Biotic relationship | Symbiotic (pathogen) | TAS |
| MIGS-14 | Pathogenicity | Cows, humans, fish and other animals | TAS |
| | Biosafety level | 2 | TAS |
| | Isolation | Kidney of Nile tilapia | TAS |
| MIGS-4 | Geographic location | Paraná state, Brazil | TAS |
| MIGS-5 | Sample collection time | 2006 | TAS |
| MIGS-4.1 | Latitude | Not reported | |
| MIGS-4.3 | Depth | Not reported | |
| MIGS-4.4 | Altitude | Not reported | |
Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [33]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Figure 1. Phylogenetic tree highlighting the position of strain SA20-06 in relation to other selected strains of the species and others from the genus Streptococcus. The tree was based on 1,410 characters of the 16S rRNA gene sequence aligned using ClustalW2 [35]. The tree was inferred under the maximum likelihood criterion using MEGA5 software [36] and rooted with the 16S rRNA sequence of fish pathogen (a member of the ). Branch lengths are scaled to the expected number of substitutions per site. The numbers above the branches are support values from 1,000 bootstrap replicates. The strains and their corresponding GenBank accession numbers (and, when applicable, draft sequence coordinates) for 16S rRNA genes are: 18rs21, NZ_AAJO01000124; ATCC13813, NR_040821; 2603VR, NC_004116; GB00112, AKXO01000029; FSL_S3-026, AEXT01000002; NEM316, AL766845; SA20-06, NC_019048; A909, NC_007432; GD201008-001, CP003810; ATCC 27957, CM001076; 9117, NZ_AMOO01000003; KCT 11537, NC_015558; alab49, NC_017596; ST556, NC_017769; CNRZ1066, NC_006449; ACA-DC 198, NC_016749; , AP009332.
Genome sequencing project information.
| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | Finished |
| MIGS-28 | Libraries used | Two mate-paired libraries (mean size 50 or 60 bp, DNA insert size of 1-2 kb) |
| MIGS-29 | Sequencing platforms | SOLiD v3 plus and SOLiD 5500 |
| MIGS-31.2 | Sequencing coverage | 5700× |
| MIGS-30 | Assemblers | CLC Genome Workbench, Velvet, Edena |
| MIGS-32 | Gene calling method | Glimmer |
| | GenBank ID | CP003919 (chromosome) |
| | GenBank date of release | November 02, 2012 |
| | GOLD ID | Gc02347 |
| | Project relevance | Animal and human pathogen |
Figure 2. Graphical circular map of the genome, generated with the CGView Comparison Tool [49]. From the outer to the inner circle: genes on the forward strand (colored by COG category), genes on the reverse strand (colored by COG category), RNA genes (tRNAs red, rRNAs green, other RNAs black), GC content, and GC skew.
Genome Statistics.
| Attribute | Value | % of total (a) |
|---|---|---|
| Genome size (bp) | 1,820,886 | 100.00% |
| DNA coding region (bp) | 1,547,993 | 85.01% |
| DNA G+C content (bp) | 647,477 | 35.56% |
| Number of replicons | 1 | |
| Extrachromosomal elements | 0 | |
| Total genes (b) | 1,872 | 100.00% |
| RNA genes | 100 | 5.34% |
| rRNA operons | 7 | |
| Protein-coding genes | 1,772 | 94.66% |
| Pseudogenes | 62 | 3.31% |
| Genes with function prediction | 1,515 | 80.93% |
| Genes in paralog clusters | 430 | 22.97% |
| Genes assigned to COGs | 1,469 | 78.47% |
| Genes assigned Pfam domains | 1,547 | 82.64% |
| Genes with signal peptides | 302 | 16.13% |
| Genes with transmembrane helices | 447 | 23.88% |
a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
b) Also includes 62 pseudogenes.
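
The percentages in the Genome Statistics table can be re-derived from the raw counts. The following is a minimal Python sketch of that arithmetic, not part of the original report: the constant and function names are illustrative, and the figure of three rRNA genes per operon (5S, 16S and 23S) is an assumption made here to relate the 7 rRNA operons and 79 tRNA genes to the 100 RNA genes listed.

```python
# Counts copied from the Genome Statistics table and the abstract.
GENOME_SIZE_BP = 1_820_886
CODING_BP = 1_547_993
GC_BP = 647_477
TOTAL_GENES = 1_872       # includes 62 pseudogenes (footnote b)
RNA_GENES = 100
PROTEIN_CODING = 1_772    # as listed in the table, pseudogenes included
PSEUDOGENES = 62
RRNA_OPERONS = 7
TRNA_GENES = 79           # from the abstract

# Assumption for this sketch only: one 5S, one 16S and one 23S gene per operon.
RRNA_GENES_PER_OPERON = 3


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Base-pair attributes are relative to the genome size.
coding_pct = percent(CODING_BP, GENOME_SIZE_BP)
gc_pct = percent(GC_BP, GENOME_SIZE_BP)

# Gene attributes are relative to the total gene count.
rna_pct = percent(RNA_GENES, TOTAL_GENES)
protein_pct = percent(PROTEIN_CODING, TOTAL_GENES)
pseudo_pct = percent(PSEUDOGENES, TOTAL_GENES)

# Cross-checks between the table and the abstract.
intact_protein_coding = PROTEIN_CODING - PSEUDOGENES
rna_from_parts = RRNA_OPERONS * RRNA_GENES_PER_OPERON + TRNA_GENES

print(f"coding: {coding_pct}%  G+C: {gc_pct}%")
print(f"RNA: {rna_pct}%  protein-coding: {protein_pct}%  pseudogenes: {pseudo_pct}%")
print(f"protein-coding excluding pseudogenes: {intact_protein_coding}")
print(f"RNA genes from operons and tRNAs: {rna_from_parts}")
```

Subtracting the 62 pseudogenes from the 1,772 protein-coding genes in the table gives the 1,710 protein-coding genes quoted in the abstract, so the two figures differ only in whether pseudogenes are counted.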
Number of genes associated with the general COG functional categories.
| Code | Value | % of total | Description |
|---|---|---|---|
| J | 146 | 9.20 | Translation, ribosomal structure and biogenesis |
| A | 0 | 0.00 | RNA processing and modification |
| K | 118 | 7.44 | Transcription |
| L | 86 | 5.42 | Replication, recombination and repair |
| B | 0 | 0.00 | Chromatin structure and dynamics |
| D | 17 | 1.07 | Cell cycle control, cell division, chromosome partitioning |
| Y | 0 | 0.00 | Nuclear structure |
| V | 36 | 2.27 | Defense mechanisms |
| T | 66 | 4.16 | Signal transduction mechanisms |
| M | 92 | 5.80 | Cell wall/membrane biogenesis |
| N | 6 | 0.38 | Cell motility |
| Z | 0 | 0.00 | Cytoskeleton |
| W | 0 | 0.00 | Extracellular structures |
| U | 21 | 1.32 | Intracellular trafficking and secretion |
| O | 53 | 3.34 | Posttranslational modification, protein turnover, chaperones |
| C | 46 | 2.90 | Energy production and conversion |
| G | 150 | 9.45 | Carbohydrate transport and metabolism |
| E | 134 | 8.44 | Amino acid transport and metabolism |
| F | 75 | 4.73 | Nucleotide transport and metabolism |
| H | 52 | 3.28 | Coenzyme transport and metabolism |
| I | 43 | 2.71 | Lipid transport and metabolism |
| P | 86 | 5.42 | Inorganic ion transport and metabolism |
| Q | 19 | 1.20 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 192 | 12.10 | General function prediction only |
| S | 149 | 9.39 | Function unknown |
| - | 403 | 21.53 | Not in COGs |
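
The percentage column of the COG table can be checked against its counts. The Python sketch below is illustrative only (all names are hypothetical). It rests on an inference from the arithmetic rather than on anything stated in the text: the per-category percentages appear to be taken relative to the sum of all category assignments (1,587), whereas the "Not in COGs" row appears to be relative to the 1,872 total genes.

```python
# Gene counts per COG category, copied from the table.
COG_COUNTS = {
    "J": 146, "A": 0, "K": 118, "L": 86, "B": 0, "D": 17, "Y": 0,
    "V": 36, "T": 66, "M": 92, "N": 6, "Z": 0, "W": 0, "U": 21,
    "O": 53, "C": 46, "G": 150, "E": 134, "F": 75, "H": 52, "I": 43,
    "P": 86, "Q": 19, "R": 192, "S": 149,
}
NOT_IN_COGS = 403
TOTAL_GENES = 1_872  # from the Genome Statistics table

# Sum of category assignments; a gene placed in two categories counts twice.
assignments = sum(COG_COUNTS.values())

# Inferred denominator: category percentages relative to all assignments.
cog_pct = {
    code: round(100 * count / assignments, 2)
    for code, count in COG_COUNTS.items()
}

# The "Not in COGs" row matches a denominator of total genes instead.
not_in_cogs_pct = round(100 * NOT_IN_COGS / TOTAL_GENES, 2)

# Genes with at least one COG assignment.
genes_in_cogs = TOTAL_GENES - NOT_IN_COGS

print(f"category assignments: {assignments}")
print(f"genes assigned to COGs: {genes_in_cogs}")
print(f"J: {cog_pct['J']}%  R: {cog_pct['R']}%  not in COGs: {not_in_cogs_pct}%")
```

The sum of assignments (1,587) exceeds the 1,469 genes assigned to COGs in the Genome Statistics table, which would be consistent with some genes belonging to more than one functional category.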