Mathieu Almeida, Agnès Hébert, Anne-Laure Abraham, Simon Rasmussen, Christophe Monnet, Nicolas Pons, Céline Delbès, Valentin Loux, Jean-Michel Batto, Pierre Leonard, Sean Kennedy, Stanislas Dusko Ehrlich, Mihai Pop, Marie-Christine Montel, Françoise Irlinger, Pierre Renault.
Abstract
BACKGROUND: Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high-throughput shotgun sequencing is a promising approach to characterize their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned.
Year: 2014 PMID: 25496341 PMCID: PMC4320590 DOI: 10.1186/1471-2164-15-1101
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Origin of the 142 selected dairy bacterial isolates according to the type of dairy product (A) and the geographic area (B).
Figure 2. Global phylogeny of the 117 dairy bacterial isolates sequenced in the present study. The phylogenetic tree is an ITOL circular visualization [68] with branch lengths and bootstrap values displayed. The tree is based on a concatenated alignment of 40 universal marker protein families [40]. Only genome sequences from which a minimum of 10 markers could be extracted and which showed no evidence of contaminating sequences were considered. The genome of Methanobrevibacter smithii ATCC35061 was used to root the tree. The colors correspond to the different phyla.
Figure 3. Mapping of the good-quality reads from the metagenomic sequencing of DNA from the surfaces of three cheeses. The good-quality reads from three cheese surface samples were aligned to 5873 genomes from NCBI and 117 genomes from our project. The pie charts show the distribution of the good-quality reads that map only to the NCBI genomes (blue), only to the genomes sequenced in our project (green), to both NCBI and our genomes (light green), and to the Bos taurus genome (orange). Unmapped good-quality reads are shown in dark grey (those lacking a reference) and light grey (those potentially unmappable for technical reasons).
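The read categories described for Figure 3 can be illustrated with a minimal Python sketch. The label names (`ncbi`, `project`, `bos_taurus`), the function names, and the rule that a Bos taurus hit takes precedence over bacterial hits are assumptions made for illustration, not details taken from the study. The two unmapped subcategories are merged here because the criterion that separates them is not given in this record.

```python
from collections import Counter


def classify_read(sources):
    """Assign one read to a Figure 3 style category.

    `sources` is the set of reference collections the read aligned to,
    using the hypothetical labels "ncbi", "project" and "bos_taurus".
    """
    if "bos_taurus" in sources:
        # Assumption: host (cow) reads are set aside first.
        return "Bos taurus"
    in_ncbi = "ncbi" in sources
    in_project = "project" in sources
    if in_ncbi and in_project:
        return "both"
    if in_ncbi:
        return "NCBI only"
    if in_project:
        return "project only"
    return "unmapped"


def category_fractions(read_hits):
    """Return the fraction of reads in each category.

    `read_hits` maps a read identifier to the set of collections it hit.
    """
    counts = Counter(classify_read(s) for s in read_hits.values())
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}
```

Each fraction corresponds to one slice of a pie chart; a read with no hit in any collection falls into the single `unmapped` category.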
Most prevalent microorganisms detected by metagenomic sequencing of three cheese surface samples
| Reference genome | New (a) | Commercial cultures (b) | Number of reads (c) | Number of CDS (d) | Cumulative CDS length | Covered CDS (%) (e) | Covered sequence length (%) (f) | Mean genome coverage | Mapped reads (%) (g) | Sequences covered by perfect match reads (%) (h) |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | 2919327 | 3299 | 2875518 | 95.4 | 93.1 | 34.64 | 19.80 | 99.5 |
| | | | 1259254 | 2941 | 2824830 | 99.4 | 94.3 | 15.48 | 8.54 | 99.9 |
| | | | 768119 | 3353 | 3158767 | 95.6 | 82.2 | 8.36 | 5.21 | 92.3 |
| | | | 384444 | 3454 | 3327182 | 94.1 | 70.4 | 4.00 | 2.61 | 93.1 |
| | | | 378769 | 6925 | 10213537 | 98.3 | 38.6 | 1.30 | 2.57 | 95.5 |
| | | | 221008 | 3526 | 3281784 | 95.6 | 77.1 | 2.31 | 1.50 | 93.5 |
| | | | 105179 | 2088 | 1869362 | 95.3 | 54.9 | 1.80 | 0.71 | 96.5 |
| | | | 77071 | 6295 | 9107395 | 91.7 | 16.6 | 0.30 | 1.27 | 85.6 |
| | | | | | | | | | | |
| | | | 1434355 | 3454 | 3327182 | 93.7 | 89.5 | 14.95 | 17.03 | 97.6 |
| | | | 883652 | 3526 | 3281784 | 100.0 | 99.3 | 9.16 | 10.49 | 99.97 |
| | | | 146974 | 2584 | 2414775 | 95.6 | 76.5 | 2.08 | 1.74 | 91.5 |
| | | | 143400 | 2470 | 1969568 | 92.5 | 35.4 | 0.82 | 1.70 | 98.1 |
| | | | 83277 | 14611 | 22234696 | 85.1 | 10.8 | 0.13 | 0.99 | 97.7 |
| | | | 65937 | 3353 | 3158767 | 90.0 | 27.6 | 0.70 | 0.78 | 85.9 |
| | | | 53280 | 3824 | 3516737 | 83.6 | 26.7 | 0.51 | 0.63 | 55.3 |
| | | | 45165 | 6925 | 10213537 | 95.0 | 12.7 | 0.15 | 0.54 | 95 |
| | | | | | | | | | | |
| | | | 2878361 | 3553 | 3110335 | 98.6 | 95.7 | 31.77 | 18.58 | 99.2 |
| | | | 1179878 | 1508 | 1340406 | 98.7 | 95.4 | 29.44 | 7.62 | 99.8 |
| | | | 633123 | 14611 | 22234696 | 99.4 | 54.2 | 1.00 | 4.09 | 96.5 |
| | | | 597916 | 1827 | 1462709 | 98.3 | 86.1 | 13.62 | 3.86 | 99.4 |
| | | | 254435 | 12630 | 23447373 | 98.5 | 26.3 | 0.38 | 1.64 | 98.4 |
| | | | 146457 | 3454 | 3327182 | 93.5 | 57.8 | 1.53 | 0.95 | 92.3 |
| | | | 80179 | 6295 | 9107395 | 94.2 | 17.6 | 0.31 | 0.52 | 97.1 |
| | | | 74153 | 2830 | 2734881 | 93.7 | 41.7 | 0.87 | 0.48 | 81.5 |
(a) Genomes sequenced in the present study (1) or from the NCBI database (0).
(b) Species known to be components of commercial cheese-making cultures.
(c) Number of reads mapped on CDS from the reference genome with three or fewer mismatches over 35 nucleotides.
(d) Number of CDS in the genome. CDS corresponding to insertion sequences, prophages and potential repeated and transferable elements were removed.
(e) Percentage of CDS covered with at least one read.
(f) Percentage of sequence covered by at least one read (the sequence is restricted to the selected CDS).
(g) Number of reads aligned with this genome divided by the number of good quality reads.
(h) Length of sequence covered by perfect match reads (with no mismatch on the 35 nt length alignment) divided by the length of the sequence covered by reads.
For each cheese, the data presented correspond to the eight reference genomes with the highest numbers of mapped reads.
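The definitions in footnotes (c), (e), (f), (g) and (h) can be made concrete with a minimal Python sketch. It assumes a simplified alignment record (CDS identifier, 0-based start position, mismatch count) and ungapped 35-nucleotide reads; the `Alignment` class, the function names and the dictionary keys are hypothetical and are not the pipeline used in the study.

```python
from dataclasses import dataclass

READ_LENGTH = 35      # read length stated in footnote (c)
MAX_MISMATCHES = 3    # mismatch threshold stated in footnote (c)


@dataclass
class Alignment:
    cds_id: str       # identifier of the CDS the read maps to (hypothetical)
    start: int        # 0-based start of the alignment on that CDS
    mismatches: int   # mismatches over the aligned read


def covered_positions(alignments, cds_lengths):
    """Return, per CDS, the set of positions covered by at least one read."""
    covered = {cds_id: set() for cds_id in cds_lengths}
    for aln in alignments:
        end = min(aln.start + READ_LENGTH, cds_lengths[aln.cds_id])
        covered[aln.cds_id].update(range(aln.start, end))
    return covered


def table_metrics(alignments, cds_lengths, n_good_reads):
    """Compute the footnote-defined columns for one reference genome."""
    kept = [a for a in alignments if a.mismatches <= MAX_MISMATCHES]
    perfect = [a for a in kept if a.mismatches == 0]
    cov_all = covered_positions(kept, cds_lengths)
    cov_perfect = covered_positions(perfect, cds_lengths)
    total_length = sum(cds_lengths.values())
    n_cov = sum(len(p) for p in cov_all.values())
    n_cov_perfect = sum(len(p) for p in cov_perfect.values())
    return {
        # (c) reads mapped with three or fewer mismatches
        "number_of_reads": len(kept),
        # (e) share of CDS with at least one read
        "covered_cds_pct": 100 * sum(1 for p in cov_all.values() if p) / len(cds_lengths),
        # (f) share of CDS sequence covered by at least one read
        "covered_length_pct": 100 * n_cov / total_length,
        # (g) mapped reads relative to all good-quality reads
        "mapped_reads_pct": 100 * len(kept) / n_good_reads,
        # (h) sequence covered by perfect-match reads relative to covered sequence
        "perfect_match_pct": 100 * n_cov_perfect / n_cov if n_cov else 0.0,
    }
```

Position sets keep the sketch easy to follow; for genome-scale data, interval merging or per-base coverage arrays would be a more practical choice.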