
Non-contiguous genome sequence of Mycobacterium simiae strain DSM 44165(T.).

Mohamed Sassi, Catherine Robert, Didier Raoult, Michel Drancourt.

Abstract

Mycobacterium simiae is a non-tuberculous mycobacterium causing pulmonary infections in both immunocompetent and immunocompromised patients. We announce the draft genome sequence of M. simiae DSM 44165(T). The 5,782,968-bp genome, with 65.15% GC content (one chromosome, no plasmid), contains 5,727 open reading frames (33% of unknown function and 11 ORFs longer than 5,000 bp), three rRNA operons, 52 tRNAs, one 66-bp tmRNA matching tmRNA tags from Mycobacterium avium, Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium microti, Mycobacterium marinum and Mycobacterium africanum, and 389 DNA repetitive sequences. When the ORF size distribution of M. simiae was compared with those of five other Mycobacterium species, M. simiae clustered with M. abscessus and M. smegmatis. A 40-kb prophage was predicted in addition to two prophage-like elements, 7 kb and 18 kb in size, but no mycobacteriophage was seen after the observation of 10^6 M. simiae cells. Fifteen putative CRISPRs were found. Three genes were predicted to encode resistance to aminoglycosides, beta-lactams and macrolide-lincosamide-streptogramin B. A total of 163 CAZymes were annotated. M. simiae contains ESX-1 to ESX-5 genes encoding type VII secretion systems. Availability of the genome sequence may help depict the unique properties of this environmental, opportunistic pathogen.


Keywords: Mycobacterium simiae draft genome; SOLiD; non-tuberculous mycobacteria

Year: 2013 PMID: 23991261 PMCID: PMC3746425 DOI: 10.4056/sigs.3707349

Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277

Introduction

Mycobacterium simiae is the type species of the M. simiae complex and is phylogenetically related to several other mycobacterial species [1], [2], [3], [4], [5], [6], [7], [8], [9]. M. simiae is slow growing and photochromogenic, appearing rust-colored after exposure to light, and is the only non-tuberculous mycobacterium that is niacin positive, like M. tuberculosis [10]. M. simiae was initially isolated from rhesus macaques in 1965 [11]. In immunocompetent patients, M. simiae causes lymphadenitis [12,13], bone infection [14], respiratory tract infection [15] and skin infection [16]. M. simiae also causes infection in immunocompromised HIV-infected patients [17,18], including patients with immune reconstitution [19]. Tap water has proven to be a source of both community-acquired and hospital-acquired infection [20,21]. To understand the genetics of M. simiae in detail, we sequenced and annotated a draft genome of the M. simiae type strain (DSM 44165T).

Classification and features

M. simiae strain DSM 44165T is the only strain within the M. simiae complex whose genome has been sequenced (Table 1).

Table 1

Classification and general features of M. simiae DSM 44165T [22].

MIGS ID	Property	Term	Evidence code
	Current classification	Domain Bacteria	TAS [23]
		Phylum Actinobacteria	TAS [24]
		Class Actinobacteria	TAS [25]
		Subclass Actinobacteridae	TAS [25,26]
		Order Actinomycetales	TAS [25-28]
		Suborder Corynebacterineae	TAS [25,26]
		Family Mycobacteriaceae	TAS [25-27,29]
		Genus Mycobacterium	TAS [27,30,31]
		Species Mycobacterium simiae	TAS [11,27]
	Gram stain	Weakly positive	TAS [11]
	Motility	Non-motile	TAS [11]
	Sporulation	Nonsporulating	NAS
	Temperature range	mesophile	TAS [11]
	Optimum temperature	37°C	TAS [11]
	Salinity	normal	TAS [11]
MIGS-22	Oxygen requirement	aerobic	TAS [11]
MIGS-6	Habitat	Soil	TAS [11]
MIGS-15	Biotic relationship	Free-living	NAS
MIGS-14	Pathogenicity	none	NAS
	Biosafety level	2	NAS
	Isolation	Macacus rhesus	TAS [11]
MIGS-4	Geographic location	Country India	TAS [11]
MIGS-5	Sample collection time	1965	TAS [11]
MIGS-4.1	Latitude	20.593684	NAS [11]
MIGS-4.2	Longitude	78.96288	NAS [11]
MIGS-4.3	Depth	Not reported	TAS [11]
MIGS-4.4	Altitude	Not reported	TAS [11]

Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [32].

The 16S rRNA gene sequence derived from the genome sequence of strain DSM 44165T showed 100% sequence similarity to that of type strain DSM 44165T/ATCC 25275T previously deposited in GenBank (GenBank accession: GQ153280.1) and 99% sequence similarity to another mycobacterial 16S rRNA sequence (GenBank accession: AY353699.1). The rpoB gene sequence of M. simiae showed 98% similarity to that of its closest mycobacterial relative (GenBank accession: GQ166762.1). The rpoB gene sequence-based phylogenetic tree (Figure 1) shows that M. simiae DSM 44165T is phylogenetically closest to other species of the M. simiae complex.
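The similarity percentages above compare aligned sequences. As a minimal sketch, assuming a pairwise alignment is already available and that gapped columns are simply skipped (BLAST, MEGA and other tools may count gaps differently), percent identity can be computed as follows. The sequences are toy examples, not M. simiae data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped; this is one
    convention among several, assumed here for illustration.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # 9 of 10 columns match
```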

Figure 1

rpoB gene sequence-based phylogenetic tree highlighting the position of M. simiae DSM 44165T relative to other type strains within the genus Mycobacterium. Phylogenetic inferences were obtained using the neighbor-joining method within MEGA. Numbers at the nodes are bootstrap percentages obtained by repeating the analysis 1,000 times to generate a majority consensus tree. P14 was used as an outgroup.

The M. simiae genome shares 87%, 83%, 79% and 76% nucleotide similarity with the closest sequenced genomes, those of Mycobacterium sp. MOTT36Y (CP003491.1), M. intracellulare ATCC 13950 (ABIN00000000), M. indicus pranii MTCC 9506 (CP002275.1) and M. avium 104 (CP000479.1), respectively. To complement the phenotypic traits previously reported for M. simiae [10], we observed 10^6 cells by electron microscopy as previously described [33]. Briefly, cells were deposited on carbon-reinforced Formvar-coated grids and negatively stained with 1.5% (w/v) phosphotungstic acid (pH 7.0). The grids were examined using a Hitachi HU-12 electron microscope (FEI, Lyon, France) at 89× magnification. No phage was observed in DSM 44165T cultures. M. simiae cells measured 1,226 nm in length and 594 nm in width (Figure 2).
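Figure 1 was built with the neighbor-joining method in MEGA. The sketch below illustrates only the neighbor-joining criterion itself on a textbook five-taxon distance matrix. It is not MEGA's implementation: it applies no substitution model and no bootstrap, and the midpoint placement of the final edge is an assumption made solely so that a Newick string can be printed.

```python
from itertools import combinations

Matrix = dict[str, dict[str, float]]


def symmetric(pairs: dict[tuple[str, str], float]) -> Matrix:
    """Build a symmetric nested-dict distance matrix from unordered pairs."""
    d: Matrix = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d


def neighbor_joining(dist: Matrix) -> tuple[str, list[tuple[str, str]]]:
    """Return a Newick string and the order in which pairs were joined."""
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    nodes = list(d)
    joins: list[tuple[str, str]] = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from each joined node to the new internal node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j))
    a, b = nodes
    half = d[a][b] / 2  # midpoint of the last edge, used only for printing
    return f"({a}:{half:g},{b}:{half:g});", joins


# Textbook five-taxon example (toy distances, not M. simiae data).
example = symmetric({
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
})
tree, order = neighbor_joining(example)
print(tree)
```

Ties in Q are broken by iteration order. A production tool would also handle negative branch lengths and resample alignment columns for bootstrap support.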

Figure 2

Electron micrograph of M. simiae DSM 44165T.

Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [34]. The spectra were imported into the MALDI BioTyper software (version 2.0, Bruker, Wissembourg, France) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 3,769 bacteria in the BioTyper database (updated March 15, 2012), including spectra from 79 validly named mycobacterial species used as reference data. The identification method considers m/z values from 3,000 to 15,000. For every spectrum, at most 100 peaks were taken into account and compared with the spectra in the database. For DSM 44165T, the score obtained was 1.7, matching that of strain 423-B-I-2007-BSI and thus suggesting that our isolate was a member of the same species. We added the spectrum of DSM 44165T to our database (Figure 3).
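The peak-selection step described above (an m/z window of 3,000 to 15,000 and at most 100 peaks per spectrum) can be sketched as follows. The matching function is a hypothetical stand-in that reports the fraction of sample peaks found in a reference within an assumed 500-ppm tolerance. It is not the BioTyper log-score, whose exact computation is not described here.

```python
Peak = tuple[float, float]  # (m/z, intensity)


def select_peaks(peaks: list[Peak], mz_min: float = 3000.0,
                 mz_max: float = 15000.0, max_peaks: int = 100) -> list[Peak]:
    """Keep peaks inside [mz_min, mz_max], then the max_peaks most intense ones."""
    in_range = [p for p in peaks if mz_min <= p[0] <= mz_max]
    in_range.sort(key=lambda p: p[1], reverse=True)
    return sorted(in_range[:max_peaks])  # return in m/z order


def matched_fraction(sample: list[Peak], reference: list[Peak],
                     tol_ppm: float = 500.0) -> float:
    """Share of sample peaks with a reference peak within tol_ppm.

    A hypothetical similarity measure for illustration only.
    """
    if not sample:
        return 0.0
    ref_mz = [mz for mz, _ in reference]
    hits = 0
    for mz, _ in sample:
        tol = mz * tol_ppm * 1e-6  # tolerance scales with m/z
        if any(abs(mz - r) <= tol for r in ref_mz):
            hits += 1
    return hits / len(sample)
```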

Figure 3

Reference mass spectrum of M. simiae strain DSM 44165T. Spectra from five individual colonies were compared and a reference spectrum was generated.

Genome sequencing and annotation

Genome project history

M. simiae is the first member of the M. simiae complex for which a genome sequence has been determined. This organism was selected to gain a detailed understanding of the genetics of the M. simiae complex (Table 2).

Table 2

Project information

MIGS ID	Property	Term
MIGS-31	Finishing quality	High-quality draft
MIGS-28	Libraries used	One 454 paired end 3-kb library
MIGS-29	Sequencing platforms	454 GS FLX Titanium
MIGS-31.2	Fold coverage	15.33
MIGS-30	Assemblers	Newbler version 2.5.3
MIGS-32	Gene calling method	Prodigal
	EMBL-EBI/NCBI project ID	PRJEB1560
	EMBL-EBI/Genbank ID	CBMJ020000001-CBMJ020000359
	EMBL-EBI Date of Release	June 27, 2013
MIGS-13	Source material identifier	DSM 44165^T
	Project relevance	Pangenome of opportunistic mycobacteria

Growth conditions and DNA isolation

M. simiae strain DSM 44165T was grown in 7H9 broth (Difco, Bordeaux, France) enriched with 10% OADC (oleic acid, bovine serum albumin, dextrose and catalase) in 8-mL tubes at 37°C. The culture was centrifuged at 8,000 × g for 10 min, and the pellet was resuspended in 250 µL of phosphate-buffered saline (PBS) and inactivated by heating at 95°C for 1 h. The sample was then transferred into a sterile screw-cap Eppendorf tube containing 0.3 g of acid-washed glass beads (Sigma, Saint-Quentin Fallavier, France) and shaken using a Bio 101 FastPrep instrument (Qbiogene, Strasbourg, France) at level 6.5 (full speed) for 45 s. The supernatant was incubated overnight at 56°C with 25 µL of proteinase K (20 mg/mL) and 180 µL of T1 buffer from the NucleoSpin Tissue Mini kit (Macherey-Nagel, Hoerdt, France). After a second mechanical lysis and a 15-min incubation at 70°C, total DNA was extracted using the NucleoSpin Tissue Mini kit (Macherey-Nagel, Hoerdt, France). The extracted DNA was eluted into 100 µL of elution buffer and stored at -20°C until used.

Genome sequencing and assembly

The DNA concentration, measured using a Quant-iT PicoGreen kit (Invitrogen) on a Genios Tecan fluorometer, was 79.36 ng/µL. A 5-µg quantity of DNA was mechanically fragmented on the Covaris device (KBioScience-LGC Genomics, Teddington, UK) using a miniTUBE-Red 5Kb. The DNA fragmentation was visualized on an Agilent 2100 BioAnalyzer using a DNA LabChip 7500, with an optimal size of 3.57 kb. The library was constructed according to the 454 Titanium paired-end protocol (Roche, Boulogne-Billancourt, France). Circularization and nebulization were performed to generate a fragment size distribution with an optimum at 415 bp. After PCR amplification through 17 cycles followed by double size selection, the single-stranded paired-end library was quantified with the Quant-iT RiboGreen kit (Invitrogen) on the Genios Tecan fluorometer at 865 pg/µL. The library concentration was calculated as 1.91E+09 molecules/µL. The library was stored at -20°C until used. The library was clonally amplified at 0.5 copies per bead (cpb) in two emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche, Boulogne-Billancourt, France). The emPCR yield was 20.2%, slightly above the 5 to 20% range expected from the Roche procedure. A total of 790,000 beads were loaded on a GS Titanium PicoTiterPlate (PTP Kit 70x75) and sequenced with a GS Titanium Sequencing Kit XLR70 (Roche, Boulogne-Billancourt, France). The run was performed overnight and analyzed on the cluster using gsRunBrowser and the Roche gsAssembler. A total of 241,405 passed-filter wells were obtained, generating 88.64 Mb with an average read length of 367 bp. The passed-filter sequences were assembled with gsAssembler (Roche, Boulogne-Billancourt, France) using 90% identity and a 40-bp overlap. The assembly yielded one scaffold and 338 large contigs (>1,500 bp) and a genome size of 5.78 Mb, which corresponds to a coverage of 15.33 genome equivalents.
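The library and coverage figures above follow from simple arithmetic. As a sketch, assuming an average mass of 660 g/mol per double-stranded base pair and the 415-bp optimum fragment size, 865 pg/µL converts to roughly 1.9E+09 molecules/µL, close to the reported value. The authors' exact constants are not stated. Fold coverage is total sequenced bases divided by genome size.

```python
AVOGADRO = 6.022e23          # molecules per mole
G_PER_MOL_PER_BP = 660.0     # assumed mean mass of one double-stranded base pair


def molecules_per_ul(conc_pg_per_ul: float, fragment_bp: float) -> float:
    """Convert a mass concentration into molecules per microliter."""
    grams_per_ul = conc_pg_per_ul * 1e-12
    moles_per_ul = grams_per_ul / (fragment_bp * G_PER_MOL_PER_BP)
    return moles_per_ul * AVOGADRO


def fold_coverage(total_bases: float, genome_bp: float) -> float:
    """Sequencing depth expressed as genome equivalents."""
    return total_bases / genome_bp


print(f"{molecules_per_ul(865, 415):.3g} molecules/uL")  # ~1.9e+09; reported 1.91E+09
print(f"{fold_coverage(88.64e6, 5_782_968):.2f}x")       # 15.33x, as reported
print(f"{88.64e6 / 241_405:.0f} bp mean read length")    # ~367 bp, as reported
```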

Genome annotation

Open reading frames (ORFs) were predicted using Prodigal [35,36] with default parameters. The predicted protein sequences were searched against the NCBI NR database, UniProt [37] and COGs [38] using BLASTP. The ARAGORN software tool [39] was used to find tRNA genes, whereas ribosomal RNAs were found using RNAmmer [40] and BLASTN against the NR database. Proteins were also checked for conserved domains using a hidden Markov model (HMM) search against the PFAM database [41]. Tandem Repeats Finder was used for repetitive DNA prediction [42]. Prophage regions were predicted using PHAST (PHAge Search Tool) [43]. CRISPRs were found using CRISPRFinder [44]. The antibiotic resistance genes were annotated using. The CAZymes, enzymes involved in the synthesis, metabolism and transport of carbohydrates, were annotated using the CAZymes Analysis Toolkit (CAT) (mothra.ornl.gov/cgi-bin/cat.cgi?tab=CAZymes).

Genome properties

The genome of M. simiae strain DSM 44165T consists of a 5,782,968-bp chromosome (65.15% GC content) and no plasmids (Figure 4). Table 3 presents the nucleotide content and gene counts of the genome, and Table 4 presents the distribution of genes into COG functional categories.
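GC content is the share of G and C among the bases, and the GC skew track of a circular genome map is usually (G - C)/(G + C) computed in sliding windows. The following is a minimal sketch of both. The window size and step are hypothetical parameters, because the tool and settings used to draw the map are not stated.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); symbols such as N are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def gc_skew(seq: str, window: int, step: int) -> list[float]:
    """(G - C) / (G + C) in sliding windows; windows without G or C give 0.0."""
    s = seq.upper()
    skews = []
    for start in range(0, len(s) - window + 1, step):
        w = s[start:start + window]
        g, c = w.count("G"), w.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Cross-check the reported GC content from the base counts in Table 3.
print(f"{100 * 3_767_609 / 5_782_968:.2f}%")
```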

Figure 4

Table 3

Nucleotide content and gene count levels of the genome

Attribute	Value	% of total^a
Genome size (bp)	5,782,968	100
DNA coding region (bp)	5,072,379	87.71
DNA G+C content (bp)	3,767,609	65.15
Number of replicons	1
Extrachromosomal elements	0
Total genes	5,782	100
RNA genes	55	0.95
Protein-coding genes	5,727	99.04
Genes with function prediction	4,673	81.6
Genes assigned to COGs	4,105	71.67
Genes with peptide signals	377	6.58
Genes with transmembrane helices	1,144	19.97
CRISPR repeats	15

a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome
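The percentages in Table 3 can be recomputed from the counts. In this sketch, the RNA-gene and protein-coding rows are divided by the 5,782 total genes. That choice is an assumption: footnote a mentions only genome size and protein-coding genes, but the protein-coding row only matches with this denominator. Several printed values look truncated rather than rounded, so agreement is checked to within 0.01 percentage point.

```python
GENOME_BP = 5_782_968
TOTAL_GENES = 5_782
PROTEIN_CODING = 5_727


def pct(part: int, whole: int) -> float:
    return 100.0 * part / whole


table3 = {
    # row: (count, denominator, percentage as printed in Table 3)
    "DNA coding region (bp)": (5_072_379, GENOME_BP, 87.71),
    "DNA G+C content (bp)": (3_767_609, GENOME_BP, 65.15),
    "RNA genes": (55, TOTAL_GENES, 0.95),
    "Protein-coding genes": (5_727, TOTAL_GENES, 99.04),
    "Genes with function prediction": (4_673, PROTEIN_CODING, 81.6),
    "Genes assigned to COGs": (4_105, PROTEIN_CODING, 71.67),
    "Genes with peptide signals": (377, PROTEIN_CODING, 6.58),
    "Genes with transmembrane helices": (1_144, PROTEIN_CODING, 19.97),
}

for row, (count, whole, printed) in table3.items():
    computed = pct(count, whole)
    # Allow for truncation versus rounding in the printed table.
    assert abs(computed - printed) < 0.011, row
    print(f"{row}: {computed:.2f} (table: {printed})")
```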

Table 4

Number of genes associated with the 25 general COG functional categories

Code	Value	% of total^a	Description
J	157	2.74	Translation
A	1	0.02	RNA processing and modification
K	410	7.16	Transcription
L	171	2.99	Replication, recombination and repair
B	2	0.03	Chromatin structure and dynamics
D	34	0.59	Cell cycle control, mitosis and meiosis
Y	0	0	Nuclear structure
V	41	0.72	Defense mechanisms
T	169	2.95	Signal transduction mechanisms
M	162	2.83	Cell wall/membrane biogenesis
N	48	0.84	Cell motility
Z	0	0	Cytoskeleton
W	0	0	Extracellular structures
U	23	0.40	Intracellular trafficking and secretion
O	132	2.30	Posttranslational modification, protein turnover, chaperones
C	400	6.98	Energy production and conversion
G	212	3.70	Carbohydrate transport and metabolism
E	151	2.64	Amino acid transport and metabolism
F	11	0.19	Nucleotide transport and metabolism
H	158	2.76	Coenzyme transport and metabolism
I	418	7.30	Lipid transport and metabolism
P	192	3.35	Inorganic ion transport and metabolism
Q	433	7.56	Secondary metabolites biosynthesis, transport and catabolism
R	656	11.45	General function prediction only
S	291	5.08	Function unknown
	1,622	28.32	Not in COGs

a) The total is based on the total number of protein coding genes in the annotated genome.
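Table 4 counts genes per one-letter COG category relative to the 5,727 protein-coding genes. Its category counts sum to 4,272, which is more than the 4,105 genes assigned to COGs in Table 3. That is what one would expect if a gene with several category letters is counted once in each category. The sketch below tallies categories under that assumption. The gene IDs and letters are toy data.

```python
from collections import Counter


def cog_category_table(assignments: dict[str, str], n_protein_genes: int):
    """Tally one-letter COG categories per gene.

    assignments maps a gene ID to its category letters ("" if not in COGs).
    A gene with several letters is counted once in each of its categories.
    """
    counts: Counter = Counter()
    not_in_cogs = 0
    for letters in assignments.values():
        if not letters:
            not_in_cogs += 1
            continue
        counts.update(set(letters))
    percentages = {c: round(100.0 * n / n_protein_genes, 2) for c, n in counts.items()}
    return counts, percentages, not_in_cogs


toy = {"gene_1": "J", "gene_2": "KL", "gene_3": "", "gene_4": "K"}
counts, percentages, missing = cog_category_table(toy, n_protein_genes=4)
print(counts, percentages, missing)
```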

Graphical circular map of the chromosome. From outside to the center: genes on the forward strand (colored by COG categories), genes on the reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red), GC content and GC skew.

The genome contains three rRNA genes (5S, 23S and 16S rRNA), 52 tRNA genes, one transfer-messenger RNA (tmRNA) gene and 5,727 ORFs, of which 4,673 (81.6%) have at least one PFAM domain. The properties and statistics of the genome are summarized in Table 3. Of the coding sequences, 66% could be assigned to COG families (Table 4). The draft genome has 389 DNA repetitive sequences and contains a 40-kb prophage-like region with attachment sites. Two additional prophage-like elements, 7 kb and 8 kb in size, contain six and 12 phage-like proteins, respectively. A total of 15 questionable CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats) were found, and three genes encoding resistance to aminoglycosides, beta-lactams and macrolide-lincosamide-streptogramin B (Table 3) were annotated. DSM 44165T harbors 163 carbohydrate-active enzyme (CAZyme) genes belonging to 36 CAZy families (supplementary data S1). Analysis of the ORF size distribution revealed 11 ORFs longer than 5,000 bp, including two longer than 10,000 bp. A 12,942-bp ORF showed 77% similarity to an M. avium 104 gene encoding linear gramicidin synthase subunit D, and a 14,415-bp ORF showed no similarity to any sequence in the NR database. We verified the reading frames of these two ORFs using the online ORF Finder software [45] and found that they encode proteins of 4,313 and 4,804 amino acids, respectively.
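The protein lengths reported for the two large ORFs follow from their nucleotide lengths: 12,942 bp is 4,314 codons, and dropping the stop codon leaves 4,313 amino acids. The sketch below also shows a basic six-frame ATG-to-stop scan. It is a simplification assumed for illustration that uses only ATG starts and the standard stop codons. NCBI ORF Finder offers more options, and Prodigal uses a far more elaborate gene model.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def protein_length(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    return orf_bp // 3 - 1


def find_orfs(seq: str, min_aa: int = 1) -> list[tuple[str, int, int, int]]:
    """ATG-to-stop ORFs in all six frames, standard genetic code only.

    Returns (strand, start, end, length_bp); coordinates are 0-based on the
    scanned strand, end-exclusive, and include the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG gives the longest ORF for this stop
                elif start is not None and codon in STOP_CODONS:
                    length_bp = i + 3 - start
                    if protein_length(length_bp) >= min_aa:
                        orfs.append((strand, start, i + 3, length_bp))
                    start = None
    return orfs


print(protein_length(12_942), protein_length(14_415))  # 4313 4804
```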
A heatmap based on the ORF size distributions of M. simiae and five other mycobacterial genomes was generated in R [46]. M. simiae clusters with M. abscessus and M. smegmatis, indicating that the three genomes have similar ORF size distributions (Figure 5).
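The clustering behind such a heatmap can be sketched in Python. R's heatmap() by default orders rows with hclust (complete linkage) on Euclidean distances. We assume, without confirmation from the text, that similar defaults were used. The length bins and the three toy genomes below are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations
from math import dist

BIN_EDGES_BP = [300, 600, 900, 1200, 1500, 3000, 5000]  # hypothetical bin edges


def size_profile(orf_lengths: list[int]) -> list[float]:
    """Fraction of ORFs in each length bin (last bin: 5,000 bp or longer)."""
    counts = [0] * (len(BIN_EDGES_BP) + 1)
    for length in orf_lengths:
        counts[bisect_right(BIN_EDGES_BP, length)] += 1
    return [c / len(orf_lengths) for c in counts]


def complete_linkage(profiles: dict[str, list[float]]) -> list[tuple[str, str]]:
    """Agglomerative clustering with complete linkage on Euclidean distances."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            a, b = pair
            # Complete linkage: distance between the farthest members.
            return max(dist(profiles[x], profiles[y])
                       for x in clusters[a] for y in clusters[b])
        a, b = min(combinations(clusters, 2), key=linkage)
        members = clusters.pop(a) + clusters.pop(b)
        clusters[f"({a},{b})"] = members
        merges.append((a, b))
    return merges


profiles = {
    "A": size_profile([200, 400, 800]),
    "B": size_profile([250, 450, 850]),
    "C": size_profile([4000, 6000, 7000]),
}
merges = complete_linkage(profiles)
print(merges)
```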

Figure 5

Heatmap of the ORF size distribution of M. simiae compared with five other mycobacterial genomes.

Recent evidence shows that mycobacteria have developed novel, specialized secretion systems for transporting extracellular proteins across their hydrophobic, highly impermeable cell wall [47]. Mycobacterial genomes encode up to five of these transport systems, and the ESX-1 and ESX-5 systems are involved in virulence [47]. By BLASTP comparison with the M. tuberculosis H37Rv type VII secretion clusters, a total of 77 proteins belonging to type VII secretion systems were annotated in M. simiae (supplementary data II). ESX-5 seems to be a conserved cluster between and , in agreement with opportunistic pathogenicity of
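The BLASTP comparison with the H37Rv type VII secretion clusters can be post-processed from BLAST+ tabular output (-outfmt 6, default 12 columns). The sketch keeps the best-scoring hit per M. simiae protein and counts hits per cluster. The identity and e-value thresholds, and subject IDs of the form "esx5_<protein>", are hypothetical choices rather than the authors' settings.

```python
import csv
import io
from collections import Counter

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text: str, min_pident: float = 30.0,
              max_evalue: float = 1e-5) -> dict[str, str]:
    """Best-scoring subject per query after identity and e-value filters."""
    best: dict[str, tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(FIELDS, row))
        if float(rec["pident"]) < min_pident or float(rec["evalue"]) > max_evalue:
            continue
        bits = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (rec["sseqid"], bits)
    return {q: subject for q, (subject, _) in best.items()}


def hits_per_cluster(hits: dict[str, str]) -> Counter:
    """Count hits per ESX cluster, assuming subject IDs like 'esx5_<protein>'."""
    return Counter(subject.split("_")[0] for subject in hits.values())
```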
