
Data on whole genome sequencing of extrapulmonary tuberculosis clinical isolates from India.

Jayshree Advani, Kusum Sharma, Renu Verma, Oishi Chatterjee, Hitendra S Solanki, Aman Sharma, Subhash Varma, Manish Modi, Pallab Ray, Megha Sharma, M S Dhillion, Akhilesh Pandey, Harsha Gowda, T S Keshava Prasad.

Abstract

This article describes whole genome sequencing data from 5 extrapulmonary tuberculosis clinical isolates. Sequencing was carried out on the Illumina MiSeq platform to identify single nucleotide variations (SNVs) associated with drug resistance. The analysis identified a total of 214 SNVs in coding and promoter regions. Of these, 18 SNVs fell in genes known to be associated with resistance to first and second line drugs. The data is related to the research article "Whole genome sequencing of Mycobacterium tuberculosis isolates from extrapulmonary sites" (Sharma et al., 2017) [1].


Year: 2018 PMID: 30197919 PMCID: PMC6127979 DOI: 10.1016/j.dib.2018.08.048

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409

Value of the data

This data provides insight into the genomic profiles of M. tuberculosis clinical isolates from extrapulmonary sites.

Lineage-specific SNVs identified by whole genome sequencing allow accurate strain typing and provide information on the lineage distribution of EPTB isolates.

The data also provides information on SNVs associated with resistance to anti-tubercular drugs.

Because the genomic profiles of EPTB isolates remain largely unexplored, this data adds to current knowledge of the genomes of M. tuberculosis isolated from different infection sites.

Data

The data represents whole genome sequencing of 5 extrapulmonary isolates from 3 different sites. All five clinical isolates sequenced in this data set belonged to the East-African-Indian lineage (Lineage 3) (Fig. 1A). A scientific interpretation of this data set was performed by Sharma et al. [1]. Data analysis identified 15 SNVs in the coding regions of genes (Fig. 1B) that are known to confer resistance to first and second line anti-tubercular drugs (Supplementary Table 1A). Apart from the known drug resistance SNVs, we also identified 199 SNVs in promoter regions corresponding to 157 genes (Supplementary Table 1B) (Fig. 2). Three of these 157 genes are associated with drug resistance and show promoter region SNVs in all 5 isolates (Fig. 1B).
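The counts in this paragraph can be checked against the totals in the abstract with a few lines of arithmetic. This is a minimal sketch. The variable names are illustrative, and the final step assumes that each of the three drug resistance genes carries exactly one promoter SNV, which the text does not state.

```python
# Counts taken from the Data section.
coding_dr_snvs = 15    # coding-region SNVs in known drug resistance genes
promoter_snvs = 199    # promoter-region SNVs spread over 157 genes

# Coding plus promoter SNVs should match the total given in the abstract.
total_snvs = coding_dr_snvs + promoter_snvs
print(total_snvs)

# ASSUMPTION (not stated in the article): one promoter SNV in each of the
# three drug resistance genes, which would account for the 18 SNVs in
# drug resistance genes mentioned in the abstract.
promoter_dr_snvs = 3
dr_snvs = coding_dr_snvs + promoter_dr_snvs
print(dr_snvs)
```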

Fig. 1

(A) Phylogenetic tree of five EPTB clinical isolates. (B) Distribution of SNVs in the coding and promoter region of genes associated with drug resistance in the five EPTB isolates.

Fig. 2

Circos plot depicting the promoter region SNVs identified in the study.


Experimental design, materials and methods

Culturing and DNA isolation of extrapulmonary isolates

The 5 EPTB isolates were obtained from the Department of Medical Microbiology, The Postgraduate Institute of Medical Education and Research, Chandigarh, India. The isolates were cultured and maintained as described in [1]. The LJ slants were incubated at 37 °C for a maximum of 8 weeks and inspected daily for growth and contamination. The isolates were then tested to rule out non-tuberculous mycobacteria (NTM) or other infection and were cultured for DNA extraction as previously described [1]. DNA was extracted from the isolates cultured on the LJ slants using the cetyltrimethylammonium bromide (CTAB) protocol [2].

Library preparation and sequencing

DNA libraries were constructed and sequenced on an Illumina MiSeq instrument as described previously [1]. Sequencing was performed using a 2 × 100 paired-end (PE) configuration (Table 1).

Table 1

Raw data statistics.

Platform: Illumina MiSeq (2 × 100 paired-end)
Sample ID	Category	R1	R2	Total Reads
PGI-14	Cerebrospinal fluid (CSF)	2,532,274	2,532,274	5,064,548
PGI-98	Joint aspirate pus	2,250,203	2,250,203	4,500,406
PGI-100	Fine needle aspiration cytology (cervical lymph node)	2,088,387	2,088,387	4,176,774
PGI-103	Fine needle aspiration cytology (cervical lymph node)	2,315,946	2,315,946	4,631,892
PGI-155	Fine needle aspiration cytology (cervical lymph node)	2,454,773	2,454,773	4,909,546
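
The read counts in Table 1 can be turned into a rough upper bound on sequencing depth. The sketch below is illustrative only. It assumes a reference length of 4,411,532 bp for H37Rv (NC_000962.3), a figure not given in this article, and it assumes every read is a full 100 bp and maps to the genome, so real depth after quality filtering and mapping will be lower.

```python
# Read counts (R1, R2) copied from Table 1.
reads = {
    "PGI-14": (2_532_274, 2_532_274),
    "PGI-98": (2_250_203, 2_250_203),
    "PGI-100": (2_088_387, 2_088_387),
    "PGI-103": (2_315_946, 2_315_946),
    "PGI-155": (2_454_773, 2_454_773),
}

READ_LENGTH_BP = 100          # from the 2 x 100 paired-end configuration
H37RV_LENGTH_BP = 4_411_532   # ASSUMPTION: reference length, not stated in the article


def total_reads(r1, r2):
    """Total reads for a sample is the sum of both mates."""
    return r1 + r2


def max_depth(total, read_len=READ_LENGTH_BP, genome_len=H37RV_LENGTH_BP):
    """Upper bound on mean depth: sequenced bases divided by genome length."""
    return total * read_len / genome_len


for sample, (r1, r2) in reads.items():
    total = total_reads(r1, r2)
    print(f"{sample}: {total:,} reads, at most ~{max_depth(total):.0f}x depth")
```

Under these assumptions the five isolates come out at roughly 95x to 115x, and the computed totals match the Total Reads column.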


Variant calling and data analysis

Paired-end reads were quality checked using FastQC version 0.11.5. Raw reads with a Phred quality score below 20 were discarded. High-quality reads were mapped to the H37Rv reference genome (NC_000962.3) using the Burrows-Wheeler Alignment Tool (BWA version 0.7.15) [3]. Variants were identified using GATK [4] and annotated using in-house Perl scripts. Phylogenetic analysis was carried out using KvarQ version 0.12.2 [5]. SNVs identified in the isolates were used to generate a phylogenetic tree with FastTree version 2.1.10 [6].
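
The Phred filtering step can be illustrated with a short sketch. The article does not say which tool applied the cutoff or whether the score was computed per read or per base. The code below assumes a mean per-read score and the Phred+33 encoding used by current Illumina FASTQ files, and the function names are hypothetical rather than taken from the authors' scripts.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into per-base Phred scores (Phred+33)."""
    return [ord(ch) - offset for ch in quality]


def mean_phred(quality, offset=33):
    """Mean Phred score of one read; 0.0 for an empty quality string."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_fastq(lines, min_mean_q=20):
    """Yield (header, sequence, quality) for reads whose mean Phred >= cutoff.

    `lines` is any iterable of FASTQ lines in well-formed 4-line records.
    ASSUMPTION: the cutoff applies to the mean score of each read.
    """
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        if mean_phred(qual) >= min_mean_q:
            yield header.rstrip("\n"), seq, qual


# Two toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = [
    "@read_high", "ACGT", "+", "IIII",
    "@read_low", "ACGT", "+", "####",
]
kept = list(filter_fastq(example))
print([header for header, _, _ in kept])
```

For paired-end data, both mates of a pair would need to be dropped together so that the R1 and R2 files stay synchronized before mapping. This sketch handles one file at a time and leaves that bookkeeping out.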

Specifications Table

Subject area	Biology
More specific subject area	Infectious diseases
Type of data	Raw fastq files, Excel tables and figures
How data was acquired	Illumina MiSeq
Data format	Raw and analysed data
Experimental factors	Extrapulmonary isolates from cerebrospinal fluid (CSF), joint aspirate pus and fine needle aspiration cytology were cultured on LJ slants, and genomic DNA was isolated using the cetyltrimethylammonium bromide (CTAB) method
Experimental features	Library preparation and sequencing were performed according to Illumina MiSeq-specific protocols
Data source location	Punjab and Bangalore, India
Data accessibility	Data is with this article and whole genome sequencing data is available in NCBI SRA database with accession PRJNA358480, https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA358480.
	https://www.ncbi.nlm.nih.gov/sra/SRX2439868
	https://www.ncbi.nlm.nih.gov/sra/SRX2439869
	https://www.ncbi.nlm.nih.gov/sra/SRX2439870
	https://www.ncbi.nlm.nih.gov/sra/SRX2439871
	https://www.ncbi.nlm.nih.gov/sra/SRX2439872
Related research article	Whole genome sequencing of Mycobacterium tuberculosis isolates from extrapulmonary sites [1].

References

1. Whole Genome Sequencing of Mycobacterium tuberculosis Isolates From Extrapulmonary Sites.

Authors: Kusum Sharma; Renu Verma; Jayshree Advani; Oishi Chatterjee; Hitendra S Solanki; Aman Sharma; Subhash Varma; Manish Modi; Pallab Ray; Kanchan K Mukherjee; Megha Sharma; Mandeed Singh Dhillion; Mrutyunjay Suar; Aditi Chatterjee; Akhilesh Pandey; Thottethodi Subrahmanya Keshava Prasad; Harsha Gowda
Journal: OMICS Date: 2017-07

2. Occurrence and stability of insertion sequences in Mycobacterium tuberculosis complex strains: evaluation of an insertion sequence-dependent DNA polymorphism as a tool in the epidemiology of tuberculosis.

Authors: D van Soolingen; P W Hermans; P E de Haas; D R Soll; J D van Embden
Journal: J Clin Microbiol Date: 1991-11 Impact factor: 5.948

3. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

4. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors: Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal: Genome Res Date: 2010-07-19 Impact factor: 9.043

5. KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes.

Authors: Andreas Steiner; David Stucki; Mireia Coscolla; Sonia Borrell; Sebastien Gagneux
Journal: BMC Genomics Date: 2014-10-09 Impact factor: 3.969

6. FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors: Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal: PLoS One Date: 2010-03-10 Impact factor: 3.240
