Fabian Ripp, Christopher Felix Krombholz, Yongchao Liu, Mathias Weber, Anne Schäfer, Bertil Schmidt, Rene Köppel, Thomas Hankeln.
Abstract
BACKGROUND: DNA-based methods like PCR efficiently identify and quantify the taxon composition of complex biological materials, but are limited to detecting species targeted by the choice of the primer assay. We show here how untargeted deep sequencing of foodstuff total genomic DNA, followed by bioinformatic analysis of sequence reads, facilitates highly accurate identification of species from all kingdoms of life, at the same time enabling quantitative measurement of the main ingredients and detection of unanticipated food components.
Year: 2014 PMID: 25081296 PMCID: PMC4131036 DOI: 10.1186/1471-2164-15-639
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Outline of the AFS pipeline.
Mapping results from simulated datasets
| Species | Reads assigned | Proportion [%] | Target value [%] | Difference abs. [%] | Difference rel. [%] |
|---|---|---|---|---|---|
| | 26,555 | 2.95 | 3 | 0.05 | 1.67 |
| | 224,077 | 24.91 | 25 | 0.09 | 0.36 |
| | 8,969 | 1.00 | 1 | 0.00 | 0.00 |
| | 89,421 | 9.94 | 10 | 0.06 | 0.60 |
| | 9,042 | 1.01 | 1 | 0.01 | 1.00 |
| | 541,432 | 60.19 | 60 | 0.19 | 0.32 |
Simulated quantification of sequence reads obtained from six different genomes using the AFS pipeline. “Difference abs.” shows the difference between the proportions of reads, as determined by AFS (“proportion”), relative to the expected amounts existing in the sample (“target value”). “Difference rel.” is calculated by dividing “Difference abs.” by the expected proportion value.
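The definitions in this caption can be written out as a small calculation. The sketch below is not the AFS implementation; it assumes that "proportion" is the share of reads assigned to a genome among all assigned reads, which reproduces the proportions in the table. The keys `genome_1` to `genome_6` are placeholder labels (the species names are not present in this copy of the table), and the function name `quantify` is hypothetical.

```python
# Read counts and target values taken from the table above.
# The genome_N keys are placeholders; the species names are not available here.
READS = {
    "genome_1": 26_555,
    "genome_2": 224_077,
    "genome_3": 8_969,
    "genome_4": 89_421,
    "genome_5": 9_042,
    "genome_6": 541_432,
}
TARGETS = {
    "genome_1": 3.0,
    "genome_2": 25.0,
    "genome_3": 1.0,
    "genome_4": 10.0,
    "genome_5": 1.0,
    "genome_6": 60.0,
}


def quantify(reads_assigned, target_pct):
    """Return {name: (proportion, diff_abs, diff_rel)}, all in percent.

    Assumption: proportion = reads assigned to a genome / all assigned reads.
    diff_rel is None when the target is 0, since the ratio is undefined.
    """
    total = sum(reads_assigned.values())
    rows = {}
    for name, reads in reads_assigned.items():
        target = target_pct[name]
        proportion = 100.0 * reads / total
        diff_abs = abs(proportion - target)
        diff_rel = 100.0 * diff_abs / target if target else None
        rows[name] = (proportion, diff_abs, diff_rel)
    return rows


if __name__ == "__main__":
    for name, (prop, d_abs, d_rel) in quantify(READS, TARGETS).items():
        print(f"{name}: proportion={prop:.2f} diff_abs={d_abs:.2f} diff_rel={d_rel:.2f}")
```

The relative differences printed in the table appear to be derived from the already rounded absolute differences (for example 0.05 / 3 = 1.67 %), so values computed from unrounded proportions can differ slightly in the second decimal.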
Effect of reference genome choice
| Species | Target value [%] | Proportion without E. coli reference [%] | Proportion with E. coli reference [%] | Difference before [%] | Difference after [%] |
|---|---|---|---|---|---|
| | 58.82 | 61.81 | 58.53 | 2.99 | 0.29 |
| | 5.89 | 0.00 | 5.86 | 5.89 | 0.02 |
| | 11.76 | 13.00 | 11.74 | 1.24 | 0.02 |
| | 23.53 | 25.19 | 23.86 | 1.66 | 0.33 |
This simulation demonstrates the effect of choosing adequate reference genomes for quantification by AFS. Initially, the E. coli reference genome was omitted from the mapping step. After E. coli reads were observed in the metagenomic analysis, the E. coli genome was added to the mapping procedure, and the species proportions were then recovered with much higher accuracy.
Mapping results for the reference sausage KalD
| Species | Target value [%] | Proportion [%] (AFS-quant) | Proportion [%] (AFS-spec) | Difference abs. [%] (AFS-quant) | Difference abs. [%] (AFS-spec) | Difference rel. [%] (AFS-quant) | Difference rel. [%] (AFS-spec) |
|---|---|---|---|---|---|---|---|
| | 35 | 36.05 ± 0.04 | 41.16 ± 0.02 | 1.05 ± 0.04 | 6.16 ± 0.02 | 3 ± 0.11 | 17.6 ± 0.03 |
| | 1 | 1.27 ± 0.01 | 1.45 ± 0.01 | 0.28 ± 0.01 | 0.45 ± 0.01 | 27.67 ± 0.67 | 45 ± 1 |
| | 9 | 7.22 ± 0.05 | 7.59 ± 0.09 | 1.79 ± 0.05 | 1.41 ± 0.09 | 19.85 ± 0.48 | 15.67 ± 1 |
| | 55 | 54.76 ± 0.09 | 49.71 ± 0.08 | 0.24 ± 0.09 | 5.29 ± 0.08 | 0.44 ± 0.17 | 9.62 ± 0.15 |
| | 0 | 0.64 ± 0.03 | 0.07 ± 0 | 0.64 ± 0.03 | 0.07 ± 0 | n.a. | n.a. |
| Total | 100 | | | 4 ± 0.1 | 13.38 ± 0.04 | | |
Quantitative species analysis obtained by Illumina sequencing of DNA from the “KalD” reference sausage [37]. The AFS-quant and AFS-spec approaches (see text for details) were compared. Each dataset tested contained 1 million paired-end sequence reads, randomly selected from a larger dataset. Three different sub-datasets (1 million reads each) were analyzed; mean values and standard deviations are displayed. “Difference abs.” shows the difference between the proportion of reads as determined by AFS (“proportion”) and the expected amounts existing in the sample (“target value”). “Difference rel.” is calculated by dividing “Difference abs.” by the expected proportion value.
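The replicate scheme described in this caption (several random sub-datasets drawn from a larger read set, summarized as mean and standard deviation) can be illustrated with a toy example. This is only a sketch under stated assumptions: reads are represented by the label of the species they were assigned to, the data are synthetic, the function name `subsample_proportions` is hypothetical, and the sample standard deviation is used because the caption does not say which estimator the authors applied.

```python
import random
import statistics


def subsample_proportions(read_labels, n_reads, n_replicates=3, seed=0):
    """Draw random sub-datasets and summarize per-species proportions.

    read_labels: one species label per read (a stand-in for mapping results).
    Returns {species: (mean_pct, sd_pct)} over the replicates.
    Assumption: sample standard deviation (statistics.stdev).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    species = sorted(set(read_labels))
    per_species = {s: [] for s in species}
    for _ in range(n_replicates):
        # Sampling without replacement, as when picking reads from a larger dataset
        sample = rng.sample(read_labels, n_reads)
        for s in species:
            per_species[s].append(100.0 * sample.count(s) / n_reads)
    return {
        s: (statistics.mean(values), statistics.stdev(values))
        for s, values in per_species.items()
    }


if __name__ == "__main__":
    # Synthetic toy data, not taken from the study: 70 % "A" reads, 30 % "B" reads
    toy_reads = ["A"] * 7000 + ["B"] * 3000
    for name, (mean_pct, sd_pct) in subsample_proportions(toy_reads, 1000).items():
        print(f"{name}: {mean_pct:.2f} ± {sd_pct:.2f} %")
```

With real data, each sub-dataset would be mapped separately and the proportions taken from the mapping results, so the labels here only stand in for that step.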
Figure 2. Metagenomic analysis of unmapped reads. Results of the metagenomic analysis of sequence reads obtained from the KalD reference sausage. The global result of the BLAST/MEGAN step is shown in the box (grey frame). A more detailed classification of matches is displayed for mammals, viruses, bacteria and plants.
Figure 3. Determination of the optimal number of sequence reads necessary to obtain accurate quantification results for species components. The number of sequence reads used in the mapping (x-axis) was plotted against the values of mapping accuracy (y-axis), calculated as the cumulated absolute deviation in % of mapping results versus expected species proportions.
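The accuracy measure named in this caption, the cumulated absolute deviation, can be sketched as the sum of absolute differences between measured and expected proportions. The example below applies it to the mean proportions reported for the KalD reference sausage; the function name `cumulated_abs_deviation` is hypothetical, and the species order simply follows the table rows.

```python
def cumulated_abs_deviation(measured_pct, expected_pct):
    """Sum of |measured - expected| over all species, in percentage points."""
    if len(measured_pct) != len(expected_pct):
        raise ValueError("measured and expected must have the same length")
    return sum(abs(m - e) for m, e in zip(measured_pct, expected_pct))


# Target values and mean proportions from the KalD table (row order as printed).
KALD_TARGET = [35.0, 1.0, 9.0, 55.0, 0.0]
KALD_AFS_QUANT = [36.05, 1.27, 7.22, 54.76, 0.64]
KALD_AFS_SPEC = [41.16, 1.45, 7.59, 49.71, 0.07]

if __name__ == "__main__":
    print(f"AFS-quant: {cumulated_abs_deviation(KALD_AFS_QUANT, KALD_TARGET):.2f}")
    print(f"AFS-spec:  {cumulated_abs_deviation(KALD_AFS_SPEC, KALD_TARGET):.2f}")
```

Because the inputs are the rounded means from the table, the AFS-quant result is marginally below the total of 4 reported there, which was presumably computed from unrounded values. Evaluating the same function on mappings of increasing read counts would give the kind of curve the figure describes.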