
Data on whole genome sequencing of extrapulmonary tuberculosis clinical isolates from India.

Jayshree Advani, Kusum Sharma, Renu Verma, Oishi Chatterjee, Hitendra S Solanki, Aman Sharma, Subhash Varma, Manish Modi, Pallab Ray, Megha Sharma, M S Dhillion, Akhilesh Pandey, Harsha Gowda, T S Keshava Prasad.

Abstract

This article describes the whole genome sequencing data from 5 extrapulmonary tuberculosis clinical isolates. Whole genome sequencing was carried out on the Illumina MiSeq platform to identify single nucleotide variations (SNVs) associated with drug resistance. A total of 214 SNVs in the coding and promoter regions were identified in the whole genome sequencing analysis. Of these, 18 SNVs were identified in genes known to be associated with first- and second-line drug resistance. The data is related to the research article "Whole genome sequencing of Mycobacterium tuberculosis isolates from extrapulmonary sites" (Sharma et al., 2017) [1].


Year:  2018        PMID: 30197919      PMCID: PMC6127979          DOI: 10.1016/j.dib.2018.08.048

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

- This data provides insight into the genomic profiles of M. tuberculosis clinical isolates from extrapulmonary sites.
- Lineage-specific SNVs identified by whole genome sequencing allow accurate strain typing and provide information on the lineage distribution of EPTB isolates.
- The data also provides information on SNVs associated with resistance to anti-tubercular drugs.
- Since the genomic profiles of EPTB isolates remain largely unexplored, this data adds to current knowledge of the genomes of M. tuberculosis isolated from different infection sites.

Data

The data represents whole genome sequencing of 5 extrapulmonary isolates from 3 different sites. All five clinical isolates sequenced in this data set belonged to the East-African-Indian lineage (Lineage 3) (Fig. 1A). A scientific interpretation of this data set was performed by Sharma et al. [1]. Data analysis led to the identification of 15 SNVs in the coding regions of genes (Fig. 1B) that are known to confer resistance to first- and second-line anti-tubercular drugs (Supplementary Table 1A). Apart from known drug resistance SNVs, we also identified 199 SNVs in the promoter regions corresponding to 157 genes (Supplementary Table 1B) (Fig. 2). Three of these 157 genes are associated with drug resistance and showed promoter region SNVs in all 5 isolates (Fig. 1B).
Fig. 1

(A) Phylogenetic tree of five EPTB clinical isolates. (B) Distribution of SNVs in the coding and promoter region of genes associated with drug resistance in the five EPTB isolates.

Fig. 2

Circos plot depicting the promoter region SNVs identified in the study.
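The counts quoted in the abstract and in the Data section can be reconciled with a little arithmetic. The sketch below is only a consistency check on the reported numbers. The variable names are illustrative, and the figure of three drug-resistance promoter SNVs (one per gene) is an assumption inferred from the text, not a value stated in the article.

```python
# Counts as reported in the article.
coding_dr_snvs = 15        # coding-region SNVs in known drug-resistance genes
promoter_snvs = 199        # promoter-region SNVs across 157 genes
reported_total = 214       # total SNVs quoted in the abstract
reported_dr_total = 18     # drug-resistance SNVs quoted in the abstract

# ASSUMPTION (not stated in the article): each of the three
# drug-resistance genes with promoter variants carries one promoter SNV.
assumed_dr_promoter_snvs = 3

total_snvs = coding_dr_snvs + promoter_snvs
dr_snvs = coding_dr_snvs + assumed_dr_promoter_snvs

print(f"coding + promoter = {total_snvs} (abstract: {reported_total})")
print(f"drug-resistance SNVs = {dr_snvs} (abstract: {reported_dr_total})")
```

Under that assumption both totals agree with the abstract; the per-gene breakdown itself is in Supplementary Table 1.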


Experimental design, materials and methods

Culturing and DNA isolation of extrapulmonary isolates

The 5 EPTB isolates were obtained from the Department of Medical Microbiology, The Postgraduate Institute of Medical Education and Research, Chandigarh, India. The isolates were cultured and maintained as described in [1]. The LJ slants were incubated at 37 °C for a maximum of 8 weeks and inspected daily for growth or contamination. The isolates were then tested to rule out non-tuberculous mycobacteria (NTM) or other infections and were cultured for DNA extraction as previously described [1]. DNA was extracted from the isolates cultured on the LJ slants using the cetyltrimethylammonium bromide (CTAB) protocol [2].

Library preparation and sequencing

DNA libraries were constructed and sequenced on an Illumina MiSeq instrument as described previously [1]. Sequencing was performed using a 2 × 100 paired-end (PE) configuration (Table 1).
Table 1

Raw data statistics. Platform: Illumina MiSeq (2 × 100), paired end.

Sample ID | Category | R1 | R2 | Total reads
----------|----------|----|----|------------
PGI-14 | Cerebrospinal fluid (CSF) | 2,532,274 | 2,532,274 | 5,064,548
PGI-98 | Joint aspirate pus | 2,250,203 | 2,250,203 | 4,500,406
PGI-100 | Fine needle aspiration cytology (cervical lymph node) | 2,088,387 | 2,088,387 | 4,176,774
PGI-103 | Fine needle aspiration cytology (cervical lymph node) | 2,315,946 | 2,315,946 | 4,631,892
PGI-155 | Fine needle aspiration cytology (cervical lymph node) | 2,454,773 | 2,454,773 | 4,909,546
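The read counts in Table 1 also give a rough idea of sequencing depth. The sketch below checks that each total is the sum of R1 and R2 and estimates the expected depth over the H37Rv reference (NC_000962.3, 4,411,532 bp). It assumes every read is a full 100 bp and that all reads map, so the values are upper bounds rather than measured coverage. The function and variable names are illustrative.

```python
H37RV_LENGTH_BP = 4_411_532   # length of NC_000962.3
READ_LENGTH_BP = 100          # ASSUMPTION: untrimmed 2 x 100 reads

# (R1, R2) read counts copied from Table 1.
read_counts = {
    "PGI-14": (2_532_274, 2_532_274),
    "PGI-98": (2_250_203, 2_250_203),
    "PGI-100": (2_088_387, 2_088_387),
    "PGI-103": (2_315_946, 2_315_946),
    "PGI-155": (2_454_773, 2_454_773),
}


def expected_depth(n_reads, read_len=READ_LENGTH_BP, genome_len=H37RV_LENGTH_BP):
    """Expected mean depth = total sequenced bases / genome length."""
    return n_reads * read_len / genome_len


# sample -> (total reads, estimated depth)
summary = {}
for sample, (r1, r2) in read_counts.items():
    total = r1 + r2
    summary[sample] = (total, expected_depth(total))

for sample, (total, depth) in summary.items():
    print(f"{sample}\t{total:,}\t{depth:.1f}x")
```

Under these assumptions the five isolates fall between roughly 95x and 115x expected depth.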

Variant calling and data analysis

Paired-end reads were quality checked using FastQC version 0.11.5. Raw reads with a Phred quality score below 20 were discarded. High-quality reads were mapped to the H37Rv reference genome (NC_000962.3) using the Burrows-Wheeler Alignment Tool (BWA version 0.7.15) [3]. Variants were identified using GATK [4] and annotated using in-house Perl scripts. Phylogenetic analysis was carried out using KvarQ version 0.12.2 [5]. SNVs identified in the isolates were used to generate a phylogenetic tree with FastTree version 2.1.10 [6].
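The quality filter can be illustrated with a minimal sketch. The article does not say which tool applied the Phred cut-off or whether it was applied per base or per read. The code below assumes a per-read mean quality threshold of 20 and Phred+33 encoding, and its function names are hypothetical. It is not the authors' pipeline.

```python
from statistics import mean

PHRED_OFFSET = 33   # ASSUMPTION: Phred+33 (Sanger / Illumina 1.8+) encoding
MIN_QUALITY = 20    # threshold quoted in the text


def parse_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue          # skip blank lines between records
        seq = next(it)
        next(it)              # '+' separator line
        qual = next(it)
        yield header, seq, qual


def mean_phred(qual, offset=PHRED_OFFSET):
    """Mean Phred score of one read's quality string."""
    return mean(ord(c) - offset for c in qual)


def filter_reads(records, min_quality=MIN_QUALITY):
    """Keep reads whose mean Phred score is at least min_quality."""
    return [r for r in records if mean_phred(r[2]) >= min_quality]


# Two toy reads: 'I' encodes Phred 40, '#' encodes Phred 2.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "####",
]
kept = filter_reads(parse_fastq(example))
print([header for header, _, _ in kept])
```

For paired-end data, both mates of a pair would normally be kept or dropped together so that the R1 and R2 files stay in sync before mapping.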

Specifications Table

Subject area: Biology
More specific subject area: Infectious diseases
Type of data: Raw fastq files, Excel tables and figures
How data was acquired: Illumina MiSeq
Data format: Raw and analysed data
Experimental factors: Extrapulmonary isolates from cerebrospinal fluid (CSF), joint aspirate pus and fine needle aspiration cytology were cultured on LJ slants, and genomic DNA was isolated using the cetyltrimethylammonium bromide (CTAB) method
Experimental features: Library preparation and sequencing were performed according to Illumina MiSeq-specific protocols
Data source location: Punjab and Bangalore, India
Data accessibility: Data is with this article, and the whole genome sequencing data is available in the NCBI SRA database under accession PRJNA358480, https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA358480.
https://www.ncbi.nlm.nih.gov/sra/SRX2439868
https://www.ncbi.nlm.nih.gov/sra/SRX2439869
https://www.ncbi.nlm.nih.gov/sra/SRX2439870
https://www.ncbi.nlm.nih.gov/sra/SRX2439871
https://www.ncbi.nlm.nih.gov/sra/SRX2439872
Related research article: Whole genome sequencing of Mycobacterium tuberculosis isolates from extrapulmonary sites [1].

References

1.  Whole Genome Sequencing of Mycobacterium tuberculosis Isolates From Extrapulmonary Sites.

Authors:  Kusum Sharma; Renu Verma; Jayshree Advani; Oishi Chatterjee; Hitendra S Solanki; Aman Sharma; Subhash Varma; Manish Modi; Pallab Ray; Kanchan K Mukherjee; Megha Sharma; Mandeed Singh Dhillion; Mrutyunjay Suar; Aditi Chatterjee; Akhilesh Pandey; Thottethodi Subrahmanya Keshava Prasad; Harsha Gowda
Journal:  OMICS       Date:  2017-07

2.  Occurrence and stability of insertion sequences in Mycobacterium tuberculosis complex strains: evaluation of an insertion sequence-dependent DNA polymorphism as a tool in the epidemiology of tuberculosis.

Authors:  D van Soolingen; P W Hermans; P E de Haas; D R Soll; J D van Embden
Journal:  J Clin Microbiol       Date:  1991-11

3.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18

4.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19

5.  KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes.

Authors:  Andreas Steiner; David Stucki; Mireia Coscolla; Sonia Borrell; Sebastien Gagneux
Journal:  BMC Genomics       Date:  2014-10-09

6.  FastTree 2: approximately maximum-likelihood trees for large alignments.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  PLoS One       Date:  2010-03-10
