
Proteomic dataset: Profiling of cultivated Escherichia coli isolates from Crohn's disease patients and healthy individuals.

Daria Rakitina1, Julia Baikova1, Olga Pobeguts1, Olga Bukato1, Ivan Butenko1, Irina Garanina1, Mark Levites1, Alexander Manolov1, Alexandra Kanygina2, Elena Kostryukova1, Tatiana Semashko1, Irina Karpova1, Vladislav Babenko1, Petr Scherbakov1,3, Igor Khalif4, Marina Shapina4, Asfold Parfenov3, Irina Ruchkina3, Oleg Knyazev3, Nina Fadeeva3, Alexandr Subbotin5, Sergey Chamkin6, Alexandr Pyrkh7, Marina Ivantsova8, Vadim Govorun1.   

Abstract

One of the dysbioses often observed in Crohn's disease (CD) patients is an increased abundance of Escherichia coli (10- to 100-fold relative to healthy individuals) (Gevers et al., 2014). The data reported here are a large-scale proteome profile of E. coli isolates collected from CD patients and healthy individuals. Forty-three isolates were obtained from 30 CD patients (17 male, 12 female, median age 30) and 19 isolates from 7 healthy individuals (7 male, median age 19). Isolates were cultivated in LB medium under aerobic conditions to mid-log phase. Proteins were extracted with sodium deoxycholate (DCNa) and urea, reduced with tris(2-carboxyethyl)phosphine and alkylated with iodoacetamide. Protein trypsinolysis was performed as described in (Matyushkina et al., 2016). Total cell proteomes were analysed by shotgun proteomics with HPLC-MS/MS on a maXis qTOF mass spectrometer. The data, including HPLC-MS/MS raw files and exported Mascot search results, were deposited in the PRIDE repository (project accession PXD010920, https://doi.org/10.6019/PXD010920).

Keywords:  Crohn's disease; E. coli; HPLC-MS/MS; Proteome

Year:  2019        PMID: 31372401      PMCID: PMC6660468          DOI: 10.1016/j.dib.2019.103734

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Data

Escherichia coli is often observed as an abundant bacterium in the intestines of Crohn's disease (CD) patients (Gevers et al., 2014) [1], in contrast with healthy individuals. To identify proteins expressed in E. coli isolates from CD patients and healthy individuals (listed in Supplementary Table 1), we carried out HPLC-MS/MS proteome analysis of cultivated bacterial cells. Analyses were performed on a maXis qTOF mass spectrometer, and the dataset covers 104 samples. Lists of proteins identified by searches against the three databases are given in Supplementary Tables 2, 3 and 4. Proteins significantly overrepresented in CD or healthy isolates are listed in Supplementary Table 5, and their functions are summarized in Supplementary Table 6 and Table 1. The numbers of proteins significantly overrepresented in CD or healthy isolates, identified by searches against the three databases, are shown in Fig. 1. Principal component analysis (PCA) plots of E. coli proteomes, indicating patient's sex, isolate source and diagnosis, are shown in Fig. 2, Fig. 3, Fig. 4.
Table 1

Functions of proteins significantly overrepresented in CD or healthy isolates of E. coli (summary of the functions of overrepresented proteins found in at least one database). Functions and the number of proteins involved in each function are listed for CD-enriched and healthy-enriched proteins.

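General function | CD-overrepresented function | Number of proteins | Healthy-overrepresented function | Number of proteins
antibiotic resistance | antibiotic resistance | 2 | antibiotic and metal resistance | 3
respiration/oxidation | electron transfer | 3 | energy homeostasis and adenine nucleotide metabolism | 2
respiration/oxidation | anaerobic respiration | 1 | anaerobic respiration, response to DNA damage | 1
respiration/oxidation | cell redox homeostasis | 3 | cell redox homeostasis | 1
respiration/oxidation | Fe uptake | 2 | electron transfer | 6
respiration/oxidation | - | - | oxidoreductase | 5
respiration/oxidation | - | - | NAD(+) biosynthesis | 1
respiration/oxidation | - | - | NAD-oxidoreductase, DNA damage stress | 1
respiration/oxidation | - | - | Fe uptake | 3
respiration/oxidation | - | - | protoporphyrin-IX biosynthesis | 1
metabolic | carbohydrate metabolism | 7 | alcohol metabolism | 7
metabolic | glycolytic process | 1 | amino sugar metabolism | 1
metabolic | glyoxylate and dicarboxylate metabolism | 1 | ATP biosynthesis | 2
metabolic | indole production | 1 | carbohydrate metabolism | 22
metabolic | alcohol metabolism | 2 | ethanol biosynthetic process | 1
metabolic | lactate metabolism | 1 | fatty acid degradation | 1
metabolic | propanoate metabolism | 1 | glutathione biosynthesis | 1
metabolic | carbon utilization | 1 | glycolysis | 1
metabolic | molybdopterin biosynthesis | 1 | IMP biosynthesis via de novo pathway | 1
metabolic | mycothiol biosynthesis | 1 | isoprenoid biosynthetic process | 1
metabolic | - | - | phosphonate metabolism | 1
metabolic | - | - | polyol metabolism | 4
metabolic | - | - | pyruvate metabolism | 3
metabolic | - | - | sulfur metabolism | 1
metabolic | - | - | teichoic acid biosynthesis | 1
metabolic | - | - | tricarboxylic acid cycle | 2
capsule biosynthesis | - | - | biofilm formation | 1
capsule biosynthesis | - | - | capsule biosynthesis | 1
capsule biosynthesis | - | - | cell division | 1
capsule biosynthesis | - | - | cell envelope, cell wall biogenesis | 6
capsule biosynthesis | - | - | spore coat biogenesis | 1
membrane biosynthesis | fatty acid metabolism, biosynthesis | 1 | lipid biosynthesis | 1
membrane biosynthesis | glycerolipid biosynthesis | 1 | lipopolysaccharide biosynthesis | 1
membrane biosynthesis | lipid biosynthesis | 2 | glycerophospholipid metabolism | 1
membrane biosynthesis | lipopolysaccharide biosynthesis | 2 | lipoprotein | 6
membrane biosynthesis | lipoprotein biosynthesis | 1 | outer membrane protein | 3
membrane proteins | flagellin | 1 | outer membrane transporter | 3
membrane proteins | outer membrane transporter | 6 | membrane protein | 1
membrane proteins | inner membrane protein | 3 | molybdate ion transport | 1
membrane proteins | antigen | 1 | quorum sensing | 1
DNA | cell division | 1 | DNA | 1
DNA | DNA binding | 2 | DNA damage | 2
DNA | purine metabolism/biosynthesis | 1 | DNA recombination | 4
DNA | pyrimidine metabolism | 2 | DNA replication | 1
DNA | - | - | purine metabolism/biosynthesis | 9
DNA | - | - | pyrimidine metabolism/biosynthesis | 5
DNA | - | - | nucleotide metabolism | 2
DNA | - | - | nucleotide sugar metabolism | 2
RNA | RNA degradation | 1 | transcription | 4
RNA | transcription | 7 | tRNA biosynthesis | 4
RNA | - | - | translation | 2
RNA | - | - | ribosomal | 3
protein processing | peptidase | 1 | peptidase | 4
protein processing | protein phosphorylation | 2 | protease | 2
protein processing | protein transport | 1 | protein dephosphorylation | 1
protein processing | proteolysis | 1 | protein folding | 1
protein processing | - | - | protein kinase | 1
protein processing | - | - | protein maturation | 3
protein processing | - | - | protein secretion | 1
protein processing | - | - | signal peptide processing | 1
protein processing | - | - | enzyme activity regulation | 1
translation | amino acid metabolism, biosynthesis, transport | 7 | amino acid biosynthesis | 23
translation | translation | 4 | amino acid catabolism | 1
translation | ribosomal | 31 | amino acid metabolism | 4
translation | tRNA biosynthesis | 5 | amino acid transport | 1
translation | - | - | cytosol protein | 2
stress protection | chaperone | 2 | chaperone | 3
stress protection | stress protein | 8 | stress | 18
uncharacterized/hypothetical protein | uncharacterized/hypothetical protein | 6 | uncharacterized/hypothetical protein | 9
vitamin, coenzyme, cofactor | vitamin transport, oxidation | 1 | vitamin biosynthesis | 8
vitamin, coenzyme, cofactor | coenzyme, cofactor biosynthesis | 4 | cofactor, coenzyme biosynthesis | 3
virulence | - | - | virulence, host interaction protein | 2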
Fig. 1

Number of common and unique proteins significantly (p-value ≤ 0.05) overrepresented in Crohn's disease (CD) or healthy (H) isolates, identified by searches against the three databases.

Fig. 2

PCA clustering plot of proteins identified against the 16032015 database in the proteomes of CD and healthy E. coli isolates. A, B, C: all proteins; D: proteins significantly overrepresented in the CD or healthy group. Sample parameters indicated: A, patient's sex (black dots, female; green, male; red, lab strains); B, isolate source (black dots, biopsy; blue, lumen; red, feces; green, lab strains); C, D, diagnosis (black dots, CD; red, healthy; green, lab strains).

Fig. 3

PCA clustering plot of proteins identified against the LF82 database in the proteomes of CD and healthy E. coli isolates. A, B, C: all proteins; D: proteins significantly overrepresented in the CD or healthy group. Sample parameters indicated: A, patient's sex (black dots, female; green, male; red, lab strains); B, isolate source (black dots, biopsy; blue, lumen; red, feces; green, lab strains); C, D, diagnosis (black dots, CD; red, healthy; green, lab strains).

Fig. 4

PCA clustering plot of proteins identified against the Nissle 1917 database in the proteomes of CD and healthy E. coli isolates. A, B, C: all proteins; D: proteins significantly overrepresented in the CD or healthy group. Sample parameters indicated: A, patient's sex (black dots, female; green, male; red, lab strains); B, isolate source (black dots, biopsy; blue, lumen; red, feces; green, lab strains); C, D, diagnosis (black dots, CD; red, healthy; green, lab strains).

Experimental design, materials, and methods

Patients and samples

Escherichia coli isolates were obtained from feces, ileal biopsies and liquid ileal content of Crohn's disease (CD) patients, and from ileal content and feces of healthy individuals. Samples from CD patients were collected during diagnostic endoscopy at the Central Scientific Institute of Gastroenterology (Moscow Clinical Research Centre, Moscow), the State Scientific Center of Coloproctology (Ministry of Health of the Russian Federation, Moscow) and the Federal Research and Clinical Center of Physical-Chemical Medicine (Federal Medico-Biological Agency, Moscow). Disease duration ranged from four months to eight years; in all patients, Crohn's disease had been confirmed at least three months before enrolment. The inclusion criteria were age over 18 years and Crohn's disease diagnosed endoscopically and radiologically and confirmed histologically. The exclusion criteria were signs of indeterminate colitis, infectious diseases, a history of total colectomy, presence of a stoma, and recent antibiotic treatment. Feces from healthy individuals were collected at the Clinical Center of Physical-Chemical Medicine (Federal Medico-Biological Agency, Moscow). Most material was collected in Moscow; however, some samples from CD patients were obtained by FRCCPCM specialists during official visits to hospitals in other regions of the Russian Federation: Khabarovsk (2 patients), Syktyvkar (2 patients) and Nizhny Novgorod (2 patients). Material collection was approved by the local ethics committees, and patients gave written informed consent for research and publication of the data.

E. coli isolation and cultivation

E. coli was isolated as follows. Liquid aspirates were diluted approximately 10⁶-fold with sterile PBS. Approximately 0.05 ml of feces was placed into 0.5 ml of sterile PBS and vortexed to homogeneity, and an aliquot was diluted approximately 10⁶-fold. Biopsy samples were vortexed in 0.2 ml of sterile PBS. For all samples, 0.1 ml of the resulting liquid was spread onto LB agar plates. After overnight incubation at 37 °C, isolated colonies were identified as Escherichia coli by MALDI mass spectrometry on a Microflex LT instrument with MALDI Biotyper software (Bruker Daltonics, Germany). Isolates were cultivated in LB at 37 °C (200 rpm) for 14 h; this was the third passage from the initial sample. Overnight cultures were diluted to OD540 = 0.04 and grown under the same conditions until mid-log phase (OD540 = 0.4). Bacterial cells were harvested by centrifugation (3500 × g, 15 min), and the pellet was washed twice with PBS.
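The subculturing step above is a C1V1 = C2V2 dilution. The Python sketch below shows that arithmetic, assuming OD540 is proportional to cell density in the measured range; the function names, the overnight OD of 2.0 and the 20 ml culture volume are hypothetical illustrations, not values from the study. It also estimates the approximate overall dilution of fecal samples (about 11-fold in PBS, then about 10⁶-fold).

```python
def inoculum_volume_ml(od_overnight: float, od_target: float, final_volume_ml: float) -> float:
    """Volume of overnight culture that, topped up with fresh LB to
    final_volume_ml, starts at od_target (C1*V1 = C2*V2).

    Assumes OD540 scales linearly with cell density; dense overnight
    cultures are usually measured after dilution to stay in that range.
    """
    if not 0 < od_target < od_overnight:
        raise ValueError("target OD must be positive and below the overnight OD")
    return od_target * final_volume_ml / od_overnight


def step_dilution(sample_ml: float, diluent_ml: float) -> float:
    """Dilution factor of a single step: (sample + diluent) / sample."""
    return (sample_ml + diluent_ml) / sample_ml


if __name__ == "__main__":
    # Hypothetical overnight culture at OD540 = 2.0 and a 20 ml subculture.
    v = inoculum_volume_ml(2.0, 0.04, 20.0)
    print(f"add {v:.2f} ml culture to {20.0 - v:.2f} ml LB")  # add 0.40 ml culture to 19.60 ml LB

    # Feces: ~0.05 ml in 0.5 ml PBS (~11-fold), then a further ~10**6-fold.
    overall = step_dilution(0.05, 0.5) * 1e6
    print(f"overall fecal dilution ~{overall:.1e}")
```

Because the OD is only proportional to density over a limited range, the computed volume is a starting point; the diluted culture's OD would normally be checked before growth to mid-log phase.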

Tryptic digestion of E. coli proteins

Protein trypsinolysis was performed as described in (Matyushkina et al., 2016) [2], with some modifications. Cell pellets were washed with PBS. The bacterial pellet was resuspended in 10 μl of 100 mM NH4HCO3 containing 0.5 mg/ml lysozyme and 1/10 volume of protease inhibitor mix, and the suspension was incubated for 1 h at +4 °C. Then 10 μl of 10% sodium deoxycholate (DCNa) and 1 μl of nuclease mix (Promega) were added, and the suspension was incubated for another 1 h at +4 °C. The sample was then diluted with 100 μl of 100 mM Tris-HCl, pH 8.0, containing 2.5 mM EDTA, and the cells were lysed by ultrasonication for 1 min. Proteins were extracted by dissolving urea in each sample to a concentration of 6 M and incubating for 20 min at room temperature. After centrifugation for 10 min at 12,000 × g, the protein concentration in the supernatant was measured by the Bradford assay (Quick Start Bradford Protein Assay, Bio-Rad), and protein amounts were equalized across samples. Reduction and alkylation were performed as follows. The reducing agent tris(2-carboxyethyl)phosphine (TCEP) was added to 10 mM and samples were incubated at 37 °C for 30 min. Then iodoacetamide (IAA) was added to 30 mM and samples were kept at room temperature in the dark for 30 min. To avoid unwanted chemical modifications and remove the unreacted IAA, samples were treated with 5 mM TCEP and incubated for 20 min at room temperature. Proteins were digested with trypsin (20 μg per sample, Trypsin Gold, Mass Spectrometry Grade, Promega) for 16 h at room temperature. The samples were then diluted with six volumes of 100 mM Tris-HCl, pH 8.0, and a second digestion was performed by adding trypsin (trypsin:protein ratio 1:50, Trypsin Gold, Mass Spectrometry Grade, Promega) in 0.1% SDS and incubating at 37 °C for 17 h. Trypsinolysis was then stopped by adding 10% TFA and incubating at 37 °C for 30 min.
After centrifugation for 10 min at 12,000 × g, the supernatant was collected and desalted on C18 cartridges (Discovery DSC-18 Tube, Supelco) according to the manufacturer's protocol. The resulting peptide extracts were dried in a SpeedVac (Labconco) and dissolved in 15 μl of LC-MS/MS sample buffer containing 3% acetonitrile and 0.1% trifluoroacetic acid. An amount equivalent to 5 μg of protein was loaded for HPLC-MS/MS analysis.
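The reagent additions in the digestion protocol are given as final concentrations and as an enzyme-to-substrate ratio. A small Python sketch of that arithmetic follows; the stock concentrations (500 mM TCEP, 500 mM IAA), the 120 μl starting volume and the 100 μg protein amount are hypothetical placeholders, and the volume increase caused by dissolving urea is ignored.

```python
def spike_volume_ul(volume_ul: float, stock_mM: float, final_mM: float) -> float:
    """Volume of stock to add so the reagent reaches final_mM in the combined volume.

    C_stock * v = C_final * (V0 + v)  =>  v = C_final * V0 / (C_stock - C_final).
    Assumes the sample contains none of the reagent beforehand.
    """
    if not 0 < final_mM < stock_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_mM * volume_ul / (stock_mM - final_mM)


def trypsin_ug(protein_ug: float, ratio: float = 50.0) -> float:
    """Trypsin mass for an enzyme:substrate ratio of 1:ratio (w/w)."""
    return protein_ug / ratio


if __name__ == "__main__":
    volume = 120.0  # hypothetical lysate volume in ul
    v_tcep = spike_volume_ul(volume, stock_mM=500.0, final_mM=10.0)
    volume += v_tcep  # the IAA is added to the TCEP-containing sample
    v_iaa = spike_volume_ul(volume, stock_mM=500.0, final_mM=30.0)
    print(f"TCEP stock: {v_tcep:.2f} ul, IAA stock: {v_iaa:.2f} ul")
    print(f"trypsin for 100 ug protein at 1:50: {trypsin_ug(100.0):.1f} ug")
```

Solving for the added volume, rather than using V0 alone, keeps the final concentration exact even when the spike is a noticeable fraction of the sample.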

HPLC-MS/MS analysis

The HPLC-MS/MS analysis of the tryptic peptides was carried out on an Ultimate-3000 HPLC system (Thermo Scientific) coupled to a maXis qTOF mass spectrometer with the HDC-cell upgrade (Bruker) and a nano-electrospray source. Peptides were separated on a trap-elute system: a trap column (Zorbax 300SB-C18, 5 mm × 0.3 mm, 5 μm particles, Dionex) and an analytical column (Zorbax 300SB-C18, 150 mm × 75 μm, 3.5 μm particles, Agilent). Peptides were eluted with a 120-min gradient of 5–35% acetonitrile in aqueous 0.1% (v/v) formic acid at a flow rate of 0.3 μl/min. Positive-mode MS and MS/MS spectra were acquired in AutoMSMS mode (capillary voltage 1700 V, curtain gas flow 4 l/min at 170 °C, spectra rate 4 Hz, 20 precursors, m/z range 200–1500, active exclusion after 2 spectra with release after 0.5 min). Peak lists (MGF files) were generated with Compass DataAnalysis (Bruker) after lock-mass calibration (m/z 445.1200).
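MGF peak lists are plain text: each spectrum sits between BEGIN IONS and END IONS, with KEY=value header lines (such as TITLE, PEPMASS and CHARGE) followed by m/z and intensity pairs. The Python sketch below reads such files with the standard library only. It is a minimal reader under these assumptions, not the parser used by Mascot or Compass DataAnalysis; the class and function names are hypothetical, and global parameters written before the first BEGIN IONS are ignored.

```python
import io
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, TextIO, Tuple


@dataclass
class Spectrum:
    params: Dict[str, str] = field(default_factory=dict)
    peaks: List[Tuple[float, float]] = field(default_factory=list)

    @property
    def precursor_mz(self) -> float:
        # PEPMASS holds the precursor m/z, optionally followed by its intensity.
        return float(self.params["PEPMASS"].split()[0])


def read_mgf(handle: TextIO) -> Iterator[Spectrum]:
    """Yield one Spectrum per BEGIN IONS ... END IONS block."""
    current = None
    for raw in handle:
        line = raw.strip()
        if not line or line[0] in "#;!/":
            continue  # blank or comment line
        if line == "BEGIN IONS":
            current = Spectrum()
        elif line == "END IONS":
            if current is not None:
                yield current
            current = None
        elif current is None:
            continue  # global parameter outside any spectrum block: ignored
        elif "=" in line:
            key, value = line.split("=", 1)  # values (e.g. TITLE) may contain "="
            current.params[key.strip().upper()] = value.strip()
        else:
            fields = line.split()
            intensity = float(fields[1]) if len(fields) > 1 else 0.0
            current.peaks.append((float(fields[0]), intensity))


if __name__ == "__main__":
    example = """BEGIN IONS
TITLE=hypothetical_spectrum_1
PEPMASS=445.12 1000
CHARGE=2+
200.1 50
300.2 80
END IONS
"""
    spectra = list(read_mgf(io.StringIO(example)))
    print(len(spectra), spectra[0].precursor_mz, len(spectra[0].peaks))  # 1 445.12 2
```

Reading lazily with a generator keeps memory use flat, which matters for 120-min runs that can contain many thousands of spectra per file.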

Protein identification and quantitative analysis

Proteins were identified with Mascot Data Search using the following parameters: peptide mass tolerance 0.05 Da, fragment mass tolerance 0.1 Da, variable modifications carbamidomethyl (C) and oxidation (M), enzyme trypsin, and up to 1 missed cleavage per peptide. Peptides were searched against protein sequence databases created as follows. Ecoli-16032016-kerat.fasta was created by annotating the genomes of 14 CD E. coli isolates and 12 isolates from healthy individuals (summarized and described in Rakitina et al., 2017 [3]) with PROKKA 1.7 and translating the predicted genes. Similar proteins (>80% homology over >80% of the sequence) were merged, and the member showing maximum similarity to the other members of its group was used as the representative. This database contained 92,600 sequences (32,006,615 residues); the ion score cut-off for identity was >28 (p-value < 0.05). Nissle1917_goodProt_kerat.fasta was built from the genome of Nissle 1917, a typical symbiotic E. coli strain. Escherichia_coli_LF82_uid161965-1.fasta was built from the genome of LF82, a typical CD-associated E. coli strain. Amino acid sequences of trypsin (Promega) and human keratins were added to all databases to avoid misinterpreting contaminating proteins. A protein was considered identified if at least two unique peptides scored above the threshold. Lists of identified proteins are given in Supplementary Tables 2, 3 and 4. Protein abundances were estimated by a label-free method using the emPAI (Exponentially Modified Protein Abundance Index) calculated by Mascot Data Search for each identified protein (Shinoda et al., 2010) [4]. Proteins significantly overrepresented in the CD or healthy group are listed in Supplementary Table 5.
Numbers of proteins significantly overrepresented in CD or healthy isolates, identified by searches against the three databases, are shown in Fig. 1.
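emPAI is conventionally defined as 10^PAI - 1, where PAI is the number of observed peptides of a protein divided by its number of theoretically observable peptides; dividing each protein's emPAI by the sum over all proteins gives an approximate molar percentage. The Python sketch below illustrates this together with the two-unique-peptide identification rule described above. It does not reproduce how Mascot counts observed and observable peptides (those counts are taken as inputs), the score threshold of 28 is the one reported for Ecoli-16032016-kerat.fasta, and the protein names and numbers are hypothetical.

```python
from typing import Dict, Iterable


def is_identified(unique_peptide_scores: Iterable[float],
                  threshold: float = 28.0, min_peptides: int = 2) -> bool:
    """True if at least min_peptides unique peptides score above threshold."""
    return sum(1 for s in unique_peptide_scores if s > threshold) >= min_peptides


def empai(n_observed: int, n_observable: int) -> float:
    """emPAI = 10 ** (N_observed / N_observable) - 1."""
    if n_observable <= 0:
        raise ValueError("a protein needs at least one observable peptide")
    return 10 ** (n_observed / n_observable) - 1


def mol_percent(empai_values: Dict[str, float]) -> Dict[str, float]:
    """Approximate molar percentage: emPAI / sum(emPAI) * 100."""
    total = sum(empai_values.values())
    return {name: 100.0 * value / total for name, value in empai_values.items()}


if __name__ == "__main__":
    # Hypothetical proteins: best ion score per unique peptide and peptide counts.
    proteins = {
        "protA": {"scores": [45.0, 39.5, 31.2], "observed": 3, "observable": 12},
        "protB": {"scores": [52.1, 27.0], "observed": 2, "observable": 8},
    }
    values = {
        name: empai(p["observed"], p["observable"])
        for name, p in proteins.items()
        if is_identified(p["scores"])
    }
    print(values)  # protB is dropped: only one of its peptides scores above 28
    print(mol_percent(values))
```

The exponential form compensates for the fact that observed peptide counts grow less than linearly with protein amount, which is why emPAI rather than raw PAI is used for relative abundance.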

Protein abundance comparison between CD and healthy groups of E. coli isolates

Over- and under-represented proteins in the CD and healthy groups of E. coli isolates were identified with a two-sided Fisher's exact test applied separately to each protein. Principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) were used for data analysis. Principal components are obtained by an orthogonal transformation of the analyzed data set; the first components point along the directions of maximum variance, so a 2D plot of the first two components shows a projection of the distances among samples in the multidimensional space. Samples in the 2D plot may form clusters that reflect correlations among them, as in cluster analysis. PCA was performed in R with prcomp. t-SNE is a machine learning algorithm for visualizing high-dimensional data based on nonlinear dimensionality reduction; it was performed in R with Rtsne. The plotted 2D projections are shown in Fig. 2, Fig. 3, Fig. 4, with patient's sex, isolate source and diagnosis indicated.
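The per-protein comparison can be illustrated with a two-sided Fisher's exact test on a 2 × 2 table. The Python sketch below assumes, as a hypothetical unit of counting, that the table holds the numbers of isolates in which a protein was or was not detected in each group. The p-value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, which is the usual two-sided definition; the function name and detection counts are illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-7):  # small tolerance so exact ties are counted
            total += p
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical protein detected in 30 of 43 CD isolates and 5 of 19 healthy ones.
    detected_cd, total_cd = 30, 43
    detected_h, total_h = 5, 19
    p = fisher_exact_two_sided(detected_cd, total_cd - detected_cd,
                               detected_h, total_h - detected_h)
    higher = "CD" if detected_cd / total_cd > detected_h / total_h else "healthy"
    print(f"p = {p:.3g}; higher detection rate in {higher} isolates")
```

Running the test once per protein and keeping proteins with p ≤ 0.05 matches the threshold quoted for Fig. 1; the direction of enrichment comes from comparing the two detection rates, not from the p-value itself.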

Specifications table

Subject area: Biology
More specific subject area: Proteomics
Type of data: LC-MS/MS data and identification data
How data was acquired: maXis qTOF with the HDC-cell upgrade (Bruker, Germany) and a nano-electrospray source, coupled to an Ultimate-3000 HPLC system (Thermo Scientific, USA)
Data format: Raw and analyzed data
Experimental factors: 104 HPLC-MS/MS runs were performed: 65 samples for 43 isolates from 30 CD patients (17 male, 12 female, median age 30), 35 samples for 19 isolates from 7 healthy individuals (7 male, median age 19), and 4 lab strain samples.
Experimental features: E. coli isolates collected from Crohn's disease patients and healthy individuals were cultivated in LB medium under aerobic conditions to mid-log phase, and their total proteomes were analyzed by shotgun proteomics using HPLC-MS/MS.
Data source location: Research and Clinical Center of Physical-Chemical Medicine, Moscow, Russian Federation
Data accessibility: Data were deposited in the PRIDE repository: project accession PXD010920, https://doi.org/10.6019/PXD010920
Related research article: Bukato O, Garanina I, Matyshkina D, Pobeguts O, Rakitina D, Baykova J, Ladygina V, Scherbakov P, Govorun V. (2017) Proteomic profiling of E. coli, isolated from Crohn's disease patients. FEBS Journal, 284: SpT.5.3001, https://doi.org/10.1111/febs.141.

Value of the data

The dataset contains the first published wide-range proteome analysis of Escherichia coli isolates from Crohn's disease patients and healthy individuals (104 raw HPLC-MS/MS analyses searched against three different databases) and is valuable for researchers interested in bacterial proteomics.

The data can be of value for studies of pathogenic and nonpathogenic Escherichia coli.

The data might be useful in studies of the mechanisms of Crohn's disease pathogenesis.
