Niina Haiminen1,2, Stefan Edlund1,3, David Chambliss1,3, Mark Kunitomi1,3, Bart C Weimer1,4, Balasubramanian Ganesan1,5,6,7, Robert Baker1,5,7, Peter Markwell1,5,7, Matthew Davis1,3, B Carol Huang1,4, Nguyet Kong1,4, Robert J Prill1,3, Carl H Marlowe1,8, André Quintanar1,9, Sophie Pierre1,9, Geraud Dubois1,3, James H Kaufman1,3, Laxmi Parida1,2, Kristen L Beck1,3.
Abstract
Here we propose that using shotgun sequencing to examine food leads to accurate authentication of ingredients and detection of contaminants. To demonstrate this, we developed a bioinformatic pipeline, FASER (Food Authentication from SEquencing Reads), designed to resolve the relative composition of mixtures of eukaryotic species using RNA or DNA sequencing. Our comprehensive database includes >6000 plants and animals that may be present in food. FASER accurately identified eukaryotic species with 0.4% median absolute difference between observed and expected proportions on sequence data from various sources including sausage meat, plants, and fish. FASER was applied to 31 high protein powder raw factory ingredient total RNA samples. The samples mostly contained the expected source ingredient, chicken, while three samples unexpectedly contained pork and beef. Our results demonstrate that DNA/RNA sequencing of food ingredients, combined with a robust analysis, can be used to find contaminants and authenticate food ingredients in a single assay.
Keywords: Food microbiology; Metagenomics
Year: 2019 PMID: 31754632 PMCID: PMC6863864 DOI: 10.1038/s41538-019-0056-6
Source DB: PubMed Journal: NPJ Sci Food ISSN: 2396-8370
Fig. 1 Pipeline applied to food sample sequencing data to determine matrix species and their relative proportions. In the taxonomic assignment step (shown with an exemplary diagram), a read with multiple hits is placed on the lowest common ancestor (LCA) of the nodes that it hits. In the relative quantification step, the read counts at internal nodes are re-assigned to the species at the leaf nodes
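The two steps named in the Fig. 1 caption can be illustrated on a toy taxonomy. This is a minimal sketch and not the FASER implementation: the `PARENT` map, the example read hits, and in particular the proportional redistribution rule are assumptions made for illustration, since the caption states only that internal-node counts are re-assigned to leaf species.

```python
from collections import Counter

# Hypothetical toy taxonomy as child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "Bovidae": "root",
    "Bos taurus": "Bovidae",
    "Bubalus bubalis": "Bovidae",
    "Sus scrofa": "root",
}
INTERNAL = {p for p in PARENT.values() if p is not None}

def lineage(node):
    """Path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def lowest_common_ancestor(hit_nodes):
    """Deepest taxonomy node shared by the lineages of all hit nodes."""
    nodes = list(hit_nodes)
    common = set(lineage(nodes[0]))
    for n in nodes[1:]:
        common &= set(lineage(n))
    # Walking upward from any hit, the first shared node is the deepest one.
    return next(a for a in lineage(nodes[0]) if a in common)

def leaves_under(node):
    """All leaf (species) nodes below a node, or the node itself if a leaf."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return [node]
    return [leaf for c in children for leaf in leaves_under(c)]

def redistribute(counts):
    """Assumed rule: split each internal-node count among its leaves in
    proportion to the reads placed directly on those leaves (evenly if none)."""
    direct = {n: c for n, c in counts.items() if n not in INTERNAL}
    result = Counter({n: float(c) for n, c in direct.items()})
    for node, c in counts.items():
        if node not in INTERNAL:
            continue
        leaves = leaves_under(node)
        weights = [direct.get(leaf, 0) for leaf in leaves]
        total = sum(weights)
        for leaf, w in zip(leaves, weights):
            share = w / total if total else 1 / len(leaves)
            result[leaf] += c * share
    return dict(result)

# Made-up hits per read: the set of species each read matched.
read_hits = (
    [{"Bos taurus"}] * 6
    + [{"Bubalus bubalis"}] * 2
    + [{"Bos taurus", "Bubalus bubalis"}] * 4  # ambiguous reads land on Bovidae
    + [{"Sus scrofa"}] * 3
)
lca_counts = Counter(lowest_common_ancestor(h) for h in read_hits)
species_counts = redistribute(lca_counts)
```

In this toy input the four ambiguous reads are placed on `Bovidae` and then split between the two bovid species according to their directly assigned reads, so no read is lost from the final species-level totals.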
Fig. 2 Illustration of the minimum subsample size needed to obtain a desired limit of detection. The required number of reads is shown as a function of the frequency x of species S in the full sample. In this example, with a total of N = 300 million reads, we require that with high probability P (here P ≥ 0.9999) at least L = 100 sampled reads come from species S when S is present. For example, when the frequency of S is 0.1% (x = 0.001), a subsample of 141,499 reads from the total 300 million reads is required (marked with a square). When the frequency of S is 2% (x = 0.02), fewer than 10,000 reads are required (marked with a circle)
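The calculation behind Fig. 2 can be approximated as follows. This is a sketch under a stated assumption: reads are treated as independent draws (a binomial model), which is a close approximation to sampling without replacement when the subsample is much smaller than N. The exact model used for the figure is not given in the caption, so the result may differ slightly from the value quoted there. The function names are illustrative.

```python
import math

def prob_at_least(n, x, L):
    """P(at least L of n sampled reads come from species S) when S has
    frequency x, under a binomial model (assumption)."""
    log_p, log_q = math.log(x), math.log1p(-x)
    log_n_fact = math.lgamma(n + 1)
    below = 0.0  # accumulates P(fewer than L reads from S)
    for k in range(L):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        below += math.exp(log_term)
    return 1.0 - below

def min_subsample_size(x, L=100, P=0.9999, N=300_000_000):
    """Smallest subsample size n (L <= n <= N) with prob_at_least(n) >= P.
    Uses binary search, since the probability increases with n."""
    lo, hi = L, N
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_at_least(mid, x, L) >= P:
            hi = mid
        else:
            lo = mid + 1
    return lo

n_rare = min_subsample_size(0.001)   # species S at 0.1% of the sample
n_common = min_subsample_size(0.02)  # species S at 2% of the sample
print(n_rare, n_common)
```

The terms are summed in log space via `math.lgamma` so that the binomial coefficients do not overflow for subsamples of hundreds of thousands of reads.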
Fig. 3 FASER pipeline accuracy on two simulated food mixtures. a Simulated food matrix 1. b Simulated food matrix 2. Insets are shown separately to accommodate different scales. Details regarding the input genomes are given in Supplementary Table 6
Food matrix authentication results from 150,000 simulated reads of single species food matrix samples from (A) chicken (Gallus gallus), (B) pork (Sus scrofa), (C) beef (Bos taurus)
| Taxon name | Common name | TaxId | Hits at e-value 10^−40 (FASER) | Hits at e-value 10^−10 | % assignment at 10^−40 (FASER) | % assignment at 10^−10 |
|---|---|---|---|---|---|---|
| (A) Simulated chicken reads | | | | | | |
| Gallus gallus | Chicken | 9031 | 116,085 | 119,994 | 99.98% | 99.98% |
| Meleagris gallopavo | Turkey | 9103 | 11 | 18 | 0.01% | 0.01% |
| Coturnix japonica | Japanese quail | 93934 | 7 | 9 | 0.01% | 0.01% |
| Anser cygnoides | Swan goose | 8845 | 0 | 1 | 0.00% | 0.00% |
| Total | | | 116,103 | 120,022 | | |
| (B) Simulated pork reads | | | | | | |
| Sus scrofa | Pork | 9823 | 107,075 | 115,178 | 100.00% | 100.00% |
| Orcinus orca | Killer whale | 9733 | 0 | 1 | 0.00% | 0.00% |
| Total | | | 107,075 | 115,179 | | |
| (C) Simulated beef reads | | | | | | |
| Bos taurus | Beef | 9913 | 114,699 | 117,906 | 99.91% | 99.88% |
| Bubalus bubalis | Water buffalo | 89462 | 85 | 121 | 0.07% | 0.10% |
| Pantholops hodgsonii | Tibetan antelope | 59538 | 10 | 13 | 0.01% | 0.01% |
| Capra hircus | Goat | 9925 | 5 | 9 | 0.00% | 0.01% |
| Ovis aries | Sheep | 9940 | 0 | 3 | 0.00% | 0.00% |
| Total | | | 114,799 | 118,052 | | |
Paired-end reads were simulated from the respective genomes listed in Supplemental File 3 (highlighted in blue). BLAST e-value thresholds of 10^−40 and 10^−10 were applied; 10^−40 is the threshold used in the FASER pipeline. The number of read hits is shown, along with the percentage of simulated reads that were assigned to each listed species
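The effect of the two e-value thresholds can be sketched as a filter over BLAST tabular output. The column list below is the standard 12-column layout that BLAST+ writes with `-outfmt 6`; everything else is an assumption for illustration: the example rows are made up, the function name is hypothetical, and whether FASER treats the threshold as inclusive is not stated in the source.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def filter_hits(lines, max_evalue=1e-40):
    """Keep hits whose e-value is at or below max_evalue (inclusive by assumption)."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, line.split("\t")))
        if float(row["evalue"]) <= max_evalue:
            kept.append(row)
    return kept

# Made-up example hits; subject names are placeholders, not real accessions.
example = "\n".join([
    "read1\tsubject_bos\t99.0\t150\t1\t0\t1\t150\t1000\t1149\t3e-70\t270",
    "read2\tsubject_bubalus\t92.0\t150\t12\t0\t1\t150\t500\t649\t2e-25\t115",
    "read3\tsubject_ovis\t85.0\t90\t13\t1\t1\t90\t40\t129\t5e-08\t60",
])

strict = filter_hits(example.splitlines(), max_evalue=1e-40)
lenient = filter_hits(example.splitlines(), max_evalue=1e-10)
print(len(strict), len(lenient))
```

As in the table above, the stricter threshold keeps fewer hits overall, and the hits it drops are the weaker matches that tend to fall on related species.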
Novel BLAST promiscuous hit filtering on 1000 paired-end simulated Bos taurus reads
| Species name | Common name | Observed % (with promiscuity filter) | Observed % (without filter) |
|---|---|---|---|
| | Beef | 100.00% | 92.55% |
| | Water buffalo | – | 2.21% |
| | Wild yak | – | 1.86% |
| | Zebu | – | 1.51% |
| | Bison | – | 1.28% |
| | Sheep | – | 0.23% |
| | Chiru | – | 0.23% |
| | Goat | – | 0.12% |
| Number of reads with hits | | 784 | 859 |
Left: after filtering, right: before filtering
FASER results on experimental samples
| Sample name | Sequence type | Expected species | Observed % of expected species | Difference of observed from expected | Other species observed at >0.1% (common names shown where available) | Sample identifier |
|---|---|---|---|---|---|---|
| Chicken embryo | RNA (polyA selected) | | 98.34% | −1.66% | Turkey 1.02%; Japanese quail 0.43%; Green junglefowl 0.11% | SRR1804235 |
| Pork ovaries | RNA (polyA selected) | | 96.38% | −3.62% | Water buffalo 0.47%; Minke whale 0.34%; Sperm whale 0.26%; Orca 0.20%; Alpaca 0.16%; Baiji dolphin 0.16%; Walrus 0.15%; Wolf 0.15%; Chiru 0.14%; Bottlenosed dolphin 0.11% | SRR6236882 |
| Yellowfin tuna muscle | RNA (polyA selected) | | 99.80% | −0.20% | *Bluefin tuna reported; Black rockcod 0.13% | SRR4436659 |
| Carp spleen | RNA (polyA selected) | | 96.47% | −3.53% | | SRR3239506 |
| Rice root | RNA (polyA selected) | | 99.57% | −0.43% | Date palm 0.14%; Oryza brachyantha 0.11%; Human 0.11% | SRR7079262 |
| Maize leaf | RNA (polyA selected) | | 99.97% | −0.03% | – | ERR712359 |
| Poultry meal (paired samples) | Total RNA | | 99.96% | −0.04% | – | MFMB-03 |
| Poultry meal (paired samples) | Total DNA | | 99.71% | −0.29% | Maize 0.17% | MFMB-08 |
| Meat and bone meal (paired samples) | Total RNA | | 99.72% | −0.28% | Chicken 0.28% | MFMB-02 |
| Meat and bone meal (paired samples) | Total DNA | | 99.51% | −0.49% | Chicken 0.43% | MFMB-06 |
Results on six single ingredient polyA-selected RNA datasets from NCBI and on four high protein powder paired total DNA and RNA samples (MFMB-02 through MFMB-08). Additional dataset details are given in Supplementary Table 1
FASER results on experimental food mixture
| Species name | Common name | BLAST hits count | Observed | Expected | Observed − expected |
|---|---|---|---|---|---|
| | Sheep | 149,726 | 54.07% | 54.49% | −0.42% |
| | Beef | 97,224 | 35.11% | 34.67% | 0.44% |
| | Pork | 18,459 | 6.67% | 8.92% | −2.25% |
| | Horse | 3048 | 1.10% | 0.99% | 0.11% |
| | Goat | 5068 | 1.83% | 0% | 1.83% |
| | Water buffalo | 1930 | 0.70% | 0% | 0.70% |
| | Tibetan antelope | 1132 | 0.41% | 0% | 0.41% |
| Total ALL species (incl. those not listed here) | | 276,912 | 99.88% | 99.068% | |
Accuracy evaluation of FASER on DNA data from All-Food-Seq raw sausage meat mixture experiment. Percentages for expected (based on ingredient weights) vs. observed (based on fraction of species-level BLAST hits) are shown. Species with at least 100 hits are included in the table. The remaining 0.932% expected content is from plants (see Supplementary Table 3)
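The observed percentages and differences in the table above follow directly from the hit counts, as the short check below shows. The hit counts, total, and expected percentages are copied from the table; the function name and the summary statistic at the end are illustrative only. The median here covers just these seven listed species and is not the 0.4% figure reported in the abstract, which spans several data sources.

```python
from statistics import median

TOTAL_HITS = 276_912  # all species-level BLAST hits, including species not listed

# (common name, BLAST hits, expected % by ingredient weight), from the table above
ROWS = [
    ("Sheep", 149_726, 54.49),
    ("Beef", 97_224, 34.67),
    ("Pork", 18_459, 8.92),
    ("Horse", 3_048, 0.99),
    ("Goat", 5_068, 0.0),
    ("Water buffalo", 1_930, 0.0),
    ("Tibetan antelope", 1_132, 0.0),
]

def compare(rows, total_hits):
    """Observed % = species hits / total hits; difference = observed - expected."""
    out = []
    for name, hits, expected in rows:
        observed = 100.0 * hits / total_hits
        out.append((name, round(observed, 2), round(observed - expected, 2)))
    return out

results = compare(ROWS, TOTAL_HITS)
for name, observed, diff in results:
    print(f"{name:18s} {observed:6.2f}% {diff:+.2f}")

# Median absolute difference over the seven listed species only.
median_abs_diff = median(abs(diff) for _, _, diff in results)
```

Note that the denominator is the total over all species, so the listed observed percentages sum to slightly less than 100%.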
Fig. 4 Raw high protein powder (poultry meal) samples’ FASER results showing unexpected non-chicken components. a Percentage of expected content (chicken). b Percentage of unexpected content showing species with relative proportion >0.1% of total matrix composition. Content from Bos taurus (beef) and Sus scrofa (pork) is detected for samples MFMB-04, MFMB-20, and MFMB-38
High protein powder sequences mapping to observed source genomes
| | MFMB-04 | MFMB-20 | MFMB-38 | MFMB-39 | MFMB-83 | MFMB-95 |
|---|---|---|---|---|---|---|
| TOTAL concordant hits | 952,168 | 965,429 | 967,969 | 960,505 | 977,856 | 974,453 |
| TOTAL exclusive hits % | 94.10% | 95.74% | 96.23% | 95.32% | 92.72% | 93.30% |
| Chicken exclusive hits % | 79.19% | 94.36% | 95.24% | 95.29% | 92.67% | 93.26% |
| Pork exclusive hits % | | | | 0.02% | 0.03% | 0.02% |
| Beef exclusive hits % | | | | 0.01% | 0.03% | 0.02% |
Confirmation of poultry meal contamination by read mapping to genomes from each observed food matrix source (chicken, pork, beef) from three matrix-contaminated (MFMB-04, MFMB-20, MFMB-38) and three chicken-only (MFMB-39, MFMB-83, MFMB-95) high protein powder (poultry meal) samples. Exclusive hits mapped to only one of the three genomes. Numbers in bold indicate increased contaminant mapping rates compared with chicken-only samples
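The notion of an exclusive hit, a read that maps to only one of the three genomes, can be sketched as below. This is an illustration with made-up data, not the mapping workflow used for the table: the input structure and function name are hypothetical, and the denominator (all reads that mapped to at least one genome) is an assumption, since the caption does not state what the percentages are relative to.

```python
from collections import Counter

GENOMES = ("chicken", "pork", "beef")

def exclusive_hit_percentages(read_to_genomes):
    """read_to_genomes maps a read id to the set of genomes it mapped to.
    Returns, per genome, the % of mapped reads that hit only that genome."""
    total = len(read_to_genomes)  # assumed denominator: all mapped reads
    exclusive = Counter()
    for genomes in read_to_genomes.values():
        if len(genomes) == 1:  # exclusive: mapped to exactly one genome
            exclusive[next(iter(genomes))] += 1
    return {g: 100.0 * exclusive[g] / total for g in GENOMES}

# Made-up example: 10 mapped reads, one of which hits two genomes.
toy_reads = {f"read{i}": {"chicken"} for i in range(7)}
toy_reads["read7"] = {"pork"}
toy_reads["read8"] = {"beef"}
toy_reads["read9"] = {"chicken", "pork"}  # shared hit, not exclusive

per_genome = exclusive_hit_percentages(toy_reads)
total_exclusive = sum(per_genome.values())
print(per_genome, total_exclusive)
```

Counting only exclusive hits discards reads from regions conserved across the three genomes, which is why a raised pork or beef exclusive rate is informative about contamination.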