Joan Mas-Lloret, Mireia Obón-Santacana, Gemma Ibáñez-Sanz, Elisabet Guinó, Miguel L Pato, Francisco Rodriguez-Moranta, Alfredo Mata, Ana García-Rodríguez, Victor Moreno, Ville Nikolai Pimenoff.
Abstract
The gut microbiome has a fundamental role in human health and disease. However, studying the complex structure and function of the gut microbiome using next-generation sequencing is challenging and prone to reproducibility problems. Here, we obtained cross-sectional colon biopsies and faecal samples from nine participants in our COLSCREEN study and sequenced them at high coverage using Illumina paired-end shotgun (for faecal samples) and IonTorrent 16S (for paired faeces and colon biopsies) technologies. The metagenomes consisted of between 47 and 92 million reads per sample, and the targeted sequencing covered more than 300 k reads per sample across seven hypervariable regions of the 16S gene. Our data are freely available and coupled with code for the presented metagenomic analysis using up-to-date bioinformatics algorithms. These results will add to the insights needed to design comprehensive microbiome analyses and also provide data for further testing of unambiguous gut microbiome analysis.
Year: 2020 PMID: 32179734 PMCID: PMC7075950 DOI: 10.1038/s41597-020-0427-5
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Clinical descriptives. Colorectal cancer (CRC) risk-factor information. "Former" smoking status indicates a non-smoker for the 12 months prior to sampling. NSAID use "Yes" indicates that the participant consumed non-steroidal anti-inflammatory drugs (NSAIDs) in the 12 months prior to sampling.
| Sample ID | Sex | Age | Weight (kg) | Height (cm) | Smoking | Red meat (g/day) | Processed meat (g/day) | Vegetables (g/day) | Alcohol (g/day) | NSAID use | Family history CRC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AE1235 | M | 62 | 64 | 164 | Current | NA | NA | NA | NA | No | No |
| AE1236 | F | 67 | 62 | 148 | Never | 19.1 | 3.7 | 280.4 | 0 | Yes | No |
| AE1237 | F | 63 | 63 | 155 | Former | NA | NA | NA | NA | Yes | No |
| AE1238 | M | 61 | 73 | 172 | Current | 5.8 | 14.7 | 264.3 | 720.1 | Yes | Yes |
| AE1239 | F | 54 | 69 | 166 | Current | 8.6 | 8.5 | 182.5 | 196.7 | Yes | No |
| AE1240 | M | 63 | 83 | 168 | Never | 49 | 0.8 | 197.9 | 142.7 | No | No |
| AE1241 | F | 67 | 74 | 160 | Never | 19.9 | 6.6 | 109.7 | 265 | No | No |
| AE1242 | F | 67 | 65 | 152 | Never | NA | NA | NA | NA | No | No |
| AE1243 | F | 55 | 85 | 160 | Never | 13 | 0.8 | 113.3 | 557.8 | No | No |
Clinical characteristics of the samples and DNA yields. HRA = high-risk adenoma; IRA = intermediate-risk adenoma; neg = healthy colon.
| Sample | Sex | Age | FIT result | Condition | DNA (stool, | DNA (tissue, |
|---|---|---|---|---|---|---|
| AE1235 | Male | 62 | − | HRA | 4.3 | 9.1 |
| AE1236 | Female | 67 | − | neg | 3.0 | 15.2 |
| AE1237 | Female | 63 | + | HRA | 4.2 | 31.6 |
| AE1238 | Male | 61 | − | IRA | 9.8 | 15.4 |
| AE1239 | Female | 54 | + | neg | 5.2 | 11.4 |
| AE1240 | Male | 63 | − | neg | 3.5 | 9.3 |
| AE1241 | Female | 68 | + | IRA | 5.4 | 6.5 |
| AE1242 | Female | 67 | + | IRA | 6.5 | 13.6 |
| AE1243 | Female | 55 | + | HRA | 2.4 | 17.1 |
Quality control. Numbers indicate the amount of original microbial paired-end reads and the amount of paired-end reads passing quality control, as well as percentages of read pairs excluded due to duplication or quality and adapter trimming.
| Sample | Microbial | High quality | Deduplicated (%) | Trimmed (%) |
|---|---|---|---|---|
| AE1235 | 27,510,304 | 19,991,742 | 7.42 | 19.91 |
| AE1236 | 45,050,043 | 29,097,088 | 12.47 | 22.94 |
| AE1237 | 25,720,634 | 18,745,351 | 7.78 | 19.34 |
| AE1238 | 34,831,431 | 25,727,431 | 7.78 | 18.36 |
| AE1239 | 36,353,427 | 25,946,121 | 8.15 | 20.47 |
| AE1240 | 31,699,249 | 23,225,137 | 8.08 | 18.65 |
| AE1241 | 34,083,370 | 24,830,987 | 8.04 | 19.11 |
| AE1242 | 31,592,814 | 23,239,834 | 7.77 | 18.67 |
| AE1243 | 23,476,326 | 17,887,436 | 7.80 | 16.01 |
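
As a rough consistency check on the quality-control table, the share of read pairs retained can be computed from the two count columns and compared with what the two percentage columns imply. The sketch below uses three rows copied from the table; the assumption that both percentages are expressed relative to the original microbial read pairs is ours, not something the caption states.

```python
# Rows copied from the quality-control table:
# (sample, microbial pairs, high-quality pairs, deduplicated %, trimmed %)
qc_rows = [
    ("AE1235", 27_510_304, 19_991_742, 7.42, 19.91),
    ("AE1236", 45_050_043, 29_097_088, 12.47, 22.94),
    ("AE1243", 23_476_326, 17_887_436, 7.80, 16.01),
]


def retained_percent(microbial, high_quality):
    """Percentage of the original read pairs that passed quality control."""
    return 100.0 * high_quality / microbial


def implied_percent(dedup_pct, trim_pct):
    """Retention implied by the two loss columns.

    Assumption: both losses are relative to the original microbial pairs.
    """
    return 100.0 - dedup_pct - trim_pct


for sample, microbial, high_quality, dedup_pct, trim_pct in qc_rows:
    observed = retained_percent(microbial, high_quality)
    implied = implied_percent(dedup_pct, trim_pct)
    print(f"{sample}: observed {observed:.2f}%, implied {implied:.2f}%")
```

For these rows the two figures agree to within rounding, which is consistent with the stated assumption.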
Fig. 1 Taxonomic classification of samples at family level. (a) Classification of shotgun samples using three different classifiers. (b) Classification of 16S sequences, split by region and source material, using DADA2 and IdTaxa.
Metagenome Assembled Genomes (MAGs). Summary of high quality MAGs present in at least two samples (see times observed).
| Phylum | Family | Species name | Completeness (%) | Genome size (Mb) | N50 (Kb) | Times observed |
|---|---|---|---|---|---|---|
| Actinobacteria | Coriobacteriaceae | Collinsella aerofaciens | 95–100 | 2.1–2.2 | 67–72 | 2 |
| Bacteroidetes | Bacteroidaceae | Bacteroides uniformis | 96–97 | 4.2–4.5 | 75–117 | 2 |
| Bacteroidetes | Prevotellaceae | Paraprevotella clara | 92–97 | 3.2–3.4 | 24–55 | 2 |
| Bacteroidetes | Rikenellaceae | Alistipes putredinis | 92–98 | 2.0–2.3 | 61–110 | 5 |
| Euryarchaeota | Methanobacteriaceae | Methanobrevibacter smithii | 95–100 | 1.6–1.9 | 76–189 | 3 |
| Firmicutes | Clostridiaceae | Clostridium sp CAG 127 | 91–97 | 2.4–2.6 | 53–240 | 3 |
| Firmicutes | Clostridiaceae | Clostridium sp CAG 217 | 96–97 | 1.9–2.0 | 257–320 | 2 |
| Firmicutes | Clostridiaceae | Clostridium sp L2 50 | 94–99 | 2.4–2.6 | 60–162 | 2 |
| Firmicutes | Clostridiaceae | Clostridium sp | 97–98 | 2.5–2.7 | 33–75 | 3 |
| Firmicutes | Erysipelotrichaceae | Holdemanella SGB6796 | 94–96 | 2.1–2.2 | 25–89 | 2 |
| Firmicutes | Eubacteriaceae | Eubacterium sp CAG 202 | 99 | 2.1–2.3 | 53–76 | 2 |
| Firmicutes | Eubacteriaceae | Eubacterium sp CAG 251 | 99 | 1.8–1.9 | 53–143 | 3 |
| Firmicutes | Lachnospiraceae | Coprococcus eutactus | 96 | 2.6–2.7 | 22–59 | 2 |
| Firmicutes | Lachnospiraceae | Dorea longicatena | 95–99 | 2.4–3.2 | 28–54 | 2 |
| Firmicutes | Lachnospiraceae | Eubacterium rectale | 97–99 | 2.2–2.8 | 22–91 | 5 |
| Firmicutes | Lachnospiraceae | Fusicatenibacter saccharivorans | 96–97 | 2.7–2.9 | 42–82 | 3 |
| Firmicutes | Lachnospiraceae | Roseburia sp CAG 45 | 96–98 | 2.6–2.7 | 63–138 | 3 |
| Firmicutes | Ruminococcaceae | Faecalibacterium prausnitzii | 91–99 | 2.1–2.5 | 28–123 | 4 |
| Firmicutes | Ruminococcaceae | Faecalibacterium sp CAG 74 | 98–99 | 2.8–3.0 | 40–133 | 3 |
| Firmicutes | Ruminococcaceae | Gemmiger formicilis | 94–97 | 2.3–2.7 | 25–89 | 2 |
| Firmicutes | Ruminococcaceae | Ruminococcus bromii | 98–99 | 1.9–2.0 | 28–40 | 2 |
| Firmicutes | Ruminococcaceae | Ruminococcus sp | 91–99 | 2.3–2.7 | 24–107 | 4 |
| Firmicutes | Ruminococcaceae | Ruminococcus torques | 92–95 | 2.2–2.3 | 24–61 | 2 |
| Verrucomicrobia | Akkermansiaceae | Akkermansia muciniphila | 98 | 2.8–2.9 | 105–325 | 2 |
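
The N50 column summarises assembly contiguity: it is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal sketch of the calculation follows; the contig lengths are hypothetical toy values, not taken from any MAG in the table.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (in kb) for illustration only.
toy_contigs_kb = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs_kb))
```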
Targeted 16S data. Percentage of 16S reads covering each region in the corresponding sample.
| Source | Sample | Total reads | V2 (%) | V3 (%) | V4 (%) | V6-V7 (%) | V7-V8 (%) | Other (%) |
|---|---|---|---|---|---|---|---|---|
| Faeces | AE1235 | 739819 | 3.2 | 40.2 | 14.3 | 21.6 | 18.8 | 1.9 |
| Faeces | AE1236 | 450511 | 2.9 | 43.6 | 15.0 | 20.6 | 16.0 | 2.0 |
| Faeces | AE1237 | 767495 | 4.1 | 36.0 | 14.4 | 17.6 | 24.8 | 3.2 |
| Faeces | AE1238 | 740788 | 3.6 | 38.5 | 14.5 | 20.6 | 21.0 | 1.8 |
| Faeces | AE1239 | 997171 | 5.9 | 36.1 | 14.2 | 24.2 | 17.6 | 2.0 |
| Faeces | AE1240 | 458735 | 2.4 | 39.0 | 13.5 | 17.3 | 24.8 | 2.9 |
| Faeces | AE1241 | 590541 | 3.5 | 40.0 | 14.0 | 19.6 | 21.0 | 1.9 |
| Faeces | AE1242 | 467170 | 3.4 | 37.8 | 14.7 | 19.7 | 22.6 | 1.9 |
| Faeces | AE1243 | 386045 | 3.3 | 41.0 | 14.6 | 21.0 | 18.1 | 2.0 |
| Tissue | AE1235 | 321453 | 4.3 | 61.1 | 14.2 | 15.1 | 4.5 | 0.9 |
| Tissue | AE1236 | 621908 | 8.3 | 46.8 | 16.7 | 18.7 | 8.7 | 0.8 |
| Tissue | AE1237 | 726770 | 8.2 | 43.8 | 17.5 | 18.4 | 11.0 | 1.1 |
| Tissue | AE1238 | 735109 | 7.4 | 42.3 | 18.7 | 17.8 | 11.5 | 2.3 |
| Tissue | AE1239 | 577808 | 6.8 | 49.1 | 16.5 | 20.7 | 6.2 | 0.8 |
| Tissue | AE1240 | 601785 | 9.5 | 42.3 | 19.1 | 21.4 | 6.6 | 1.0 |
| Tissue | AE1241 | 649667 | 7.9 | 45.7 | 17.3 | 24.9 | 3.4 | 0.8 |
| Tissue | AE1242 | 589330 | 5.4 | 50.4 | 16.6 | 23.2 | 3.6 | 0.9 |
| Tissue | AE1243 | 447223 | 7.0 | 48.0 | 19.4 | 16.7 | 8.1 | 0.8 |
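
Because the table reports a total read count and a percentage per region, approximate per-region read counts can be recovered by simple multiplication. The sketch below does this for the faecal sample AE1235, using values copied from the table; the results are only approximate because the percentages are rounded to one decimal place.

```python
def reads_in_region(total_reads, percent):
    """Approximate number of reads assigned to a region."""
    return round(total_reads * percent / 100.0)


# Values copied from the faecal AE1235 row of the table.
total_reads = 739_819
region_percent = {
    "V2": 3.2,
    "V3": 40.2,
    "V4": 14.3,
    "V6-V7": 21.6,
    "V7-V8": 18.8,
    "Other": 1.9,
}

for region, percent in region_percent.items():
    print(region, reads_in_region(total_reads, percent))
```

For V2 this gives roughly 23,674 reads, close to the 23675 input reads listed for the same sample and region in the DADA2 results table.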
DADA2 results. Total amount of reads entering the pipeline and passing all the quality controls are indicated, as well as percentages of reads filtered in each step.
| Source | Sample ID | Region | Input | Output | Filtered (%) | Denoised (%) | Chimeras (%) |
|---|---|---|---|---|---|---|---|
| Stool | AE1235 | v2 | 23675 | 18409 | 16.27 | 2.99 | 2.98 |
| Stool | AE1235 | v3 | 297069 | 204763 | 14.80 | 0.26 | 16.01 |
| Stool | AE1235 | v4 | 105530 | 72361 | 26.79 | 1.17 | 3.47 |
| Stool | AE1235 | v6v7 | 160139 | 118416 | 14.27 | 1.74 | 10.04 |
| Stool | AE1235 | v7v8 | 139431 | 102517 | 23.41 | 1.19 | 1.87 |
| Stool | AE1236 | v2 | 13177 | 10091 | 20.25 | 3.00 | 0.17 |
| Stool | AE1236 | v3 | 196436 | 148363 | 12.94 | 0.30 | 11.22 |
| Stool | AE1236 | v4 | 67353 | 46528 | 28.87 | 1.27 | 0.78 |
| Stool | AE1236 | v6v7 | 92647 | 71073 | 13.38 | 1.78 | 8.13 |
| Stool | AE1236 | v7v8 | 72100 | 55878 | 18.57 | 1.26 | 2.66 |
| Stool | AE1237 | v2 | 31697 | 22779 | 21.13 | 2.02 | 4.98 |
| Stool | AE1237 | v3 | 276040 | 201847 | 14.04 | 0.34 | 12.50 |
| Stool | AE1237 | v4 | 110375 | 82233 | 19.16 | 0.98 | 5.36 |
| Stool | AE1237 | v6v7 | 135004 | 91005 | 16.34 | 1.28 | 14.98 |
| Stool | AE1237 | v7v8 | 190178 | 126317 | 18.27 | 0.72 | 14.59 |
| Stool | AE1238 | v2 | 26631 | 21196 | 14.94 | 3.29 | 2.18 |
| Stool | AE1238 | v3 | 285027 | 206419 | 12.46 | 0.37 | 14.74 |
| Stool | AE1238 | v4 | 107172 | 80701 | 19.20 | 1.72 | 3.77 |
| Stool | AE1238 | v6v7 | 152748 | 111924 | 11.94 | 2.03 | 12.76 |
| Stool | AE1238 | v7v8 | 155514 | 111841 | 18.88 | 1.02 | 8.19 |
| Stool | AE1239 | v2 | 58730 | 46507 | 14.39 | 1.74 | 4.68 |
| Stool | AE1239 | v3 | 359574 | 251532 | 15.33 | 0.24 | 14.48 |
| Stool | AE1239 | v4 | 141973 | 103323 | 21.22 | 1.19 | 4.82 |
| Stool | AE1239 | v6v7 | 241379 | 173393 | 11.71 | 1.53 | 14.93 |
| Stool | AE1239 | v7v8 | 175774 | 130720 | 18.40 | 1.03 | 6.20 |
| Stool | AE1240 | v2 | 11200 | 8381 | 16.34 | 4.73 | 4.10 |
| Stool | AE1240 | v3 | 179016 | 123229 | 16.20 | 0.47 | 14.50 |
| Stool | AE1240 | v4 | 62106 | 47971 | 18.49 | 1.67 | 2.60 |
| Stool | AE1240 | v6v7 | 79313 | 50315 | 17.02 | 3.24 | 16.30 |
| Stool | AE1240 | v7v8 | 113851 | 83697 | 18.19 | 1.64 | 6.65 |
| Stool | AE1241 | v2 | 20533 | 15287 | 18.88 | 3.23 | 3.43 |
| Stool | AE1241 | v3 | 236319 | 164152 | 15.45 | 0.40 | 14.68 |
| Stool | AE1241 | v4 | 82470 | 62916 | 20.12 | 1.63 | 1.96 |
| Stool | AE1241 | v6v7 | 115842 | 83998 | 13.58 | 2.75 | 11.16 |
| Stool | AE1241 | v7v8 | 124095 | 89112 | 19.74 | 1.26 | 7.19 |
| Stool | AE1242 | v2 | 16093 | 12590 | 16.98 | 3.80 | 0.98 |
| Stool | AE1242 | v3 | 176603 | 116141 | 17.49 | 0.39 | 16.36 |
| Stool | AE1242 | v4 | 68441 | 51756 | 19.43 | 1.91 | 3.03 |
| Stool | AE1242 | v6v7 | 91881 | 67003 | 16.06 | 2.16 | 8.86 |
| Stool | AE1242 | v7v8 | 105442 | 81780 | 15.77 | 1.39 | 5.28 |
| Stool | AE1243 | v2 | 12651 | 9882 | 16.73 | 3.60 | 1.56 |
| Stool | AE1243 | v3 | 158164 | 112772 | 13.44 | 0.37 | 14.89 |
| Stool | AE1243 | v4 | 56432 | 40641 | 24.63 | 1.38 | 1.97 |
| Stool | AE1243 | v6v7 | 81212 | 57972 | 13.32 | 2.92 | 12.38 |
| Stool | AE1243 | v7v8 | 69949 | 52240 | 19.07 | 2.26 | 3.99 |
| Tissue | AE1235 | v2 | 13680 | 10741 | 18.41 | 1.69 | 1.39 |
| Tissue | AE1235 | v3 | 196304 | 144394 | 11.75 | 0.23 | 14.46 |
| Tissue | AE1235 | v4 | 45755 | 35944 | 20.18 | 0.42 | 0.84 |
| Tissue | AE1235 | v6v7 | 48383 | 39295 | 15.96 | 0.67 | 2.16 |
| Tissue | AE1235 | v7v8 | 14445 | 11208 | 21.16 | 0.97 | 0.28 |
| Tissue | AE1236 | v2 | 51480 | 42622 | 15.80 | 0.50 | 0.91 |
| Tissue | AE1236 | v3 | 291280 | 226960 | 11.57 | 0.16 | 10.35 |
| Tissue | AE1236 | v4 | 103690 | 79166 | 22.58 | 0.21 | 0.86 |
| Tissue | AE1236 | v6v7 | 116437 | 101656 | 11.56 | 0.19 | 0.94 |
| Tissue | AE1236 | v7v8 | 53800 | 40664 | 20.83 | 0.57 | 3.01 |
| Tissue | AE1237 | v2 | 59739 | 48980 | 14.92 | 0.61 | 2.47 |
| Tissue | AE1237 | v3 | 318023 | 228121 | 12.38 | 0.16 | 15.73 |
| Tissue | AE1237 | v4 | 126872 | 94309 | 24.77 | 0.14 | 0.76 |
| Tissue | AE1237 | v6v7 | 133901 | 111136 | 13.67 | 0.33 | 3.00 |
| Tissue | AE1237 | v7v8 | 79930 | 58141 | 23.29 | 0.52 | 3.46 |
| Tissue | AE1238 | v2 | 54373 | 43554 | 16.29 | 0.82 | 2.79 |
| Tissue | AE1238 | v3 | 311029 | 227554 | 13.57 | 0.24 | 13.03 |
| Tissue | AE1238 | v4 | 137377 | 106679 | 20.87 | 0.32 | 1.16 |
| Tissue | AE1238 | v6v7 | 130753 | 112947 | 11.57 | 0.26 | 1.79 |
| Tissue | AE1238 | v7v8 | 84391 | 62281 | 23.08 | 0.60 | 2.52 |
| Tissue | AE1239 | v2 | 39380 | 32759 | 14.47 | 0.86 | 1.49 |
| Tissue | AE1239 | v3 | 283485 | 206573 | 11.36 | 0.16 | 15.61 |
| Tissue | AE1239 | v4 | 95146 | 74237 | 20.74 | 0.24 | 1.00 |
| Tissue | AE1239 | v6v7 | 119410 | 102233 | 11.41 | 0.35 | 2.63 |
| Tissue | AE1239 | v7v8 | 35846 | 27409 | 19.80 | 1.07 | 2.66 |
| Tissue | AE1240 | v2 | 57468 | 45978 | 16.02 | 0.77 | 3.20 |
| Tissue | AE1240 | v3 | 254594 | 182648 | 13.86 | 0.23 | 14.17 |
| Tissue | AE1240 | v4 | 115056 | 89991 | 20.65 | 0.13 | 1.01 |
| Tissue | AE1240 | v6v7 | 129027 | 106387 | 15.19 | 0.33 | 2.03 |
| Tissue | AE1240 | v7v8 | 39782 | 30472 | 20.14 | 0.62 | 2.63 |
| Tissue | AE1241 | v2 | 51322 | 42185 | 16.15 | 0.85 | 0.80 |
| Tissue | AE1241 | v3 | 297068 | 231915 | 12.17 | 0.10 | 9.66 |
| Tissue | AE1241 | v4 | 112313 | 85034 | 22.84 | 0.29 | 1.16 |
| Tissue | AE1241 | v6v7 | 161575 | 140379 | 12.25 | 0.20 | 0.67 |
| Tissue | AE1241 | v7v8 | 22036 | 16680 | 20.72 | 1.37 | 2.22 |
| Tissue | AE1242 | v2 | 31761 | 26112 | 16.67 | 1.04 | 0.07 |
| Tissue | AE1242 | v3 | 297138 | 233551 | 12.07 | 0.12 | 9.21 |
| Tissue | AE1242 | v4 | 97818 | 76855 | 20.07 | 0.17 | 1.18 |
| Tissue | AE1242 | v6v7 | 136577 | 116654 | 12.59 | 0.26 | 1.74 |
| Tissue | AE1242 | v7v8 | 21025 | 16087 | 21.35 | 0.86 | 1.28 |
| Tissue | AE1243 | v2 | 31236 | 25427 | 16.92 | 1.12 | 0.56 |
| Tissue | AE1243 | v3 | 214598 | 161786 | 12.69 | 0.26 | 11.66 |
| Tissue | AE1243 | v4 | 86913 | 69844 | 18.09 | 0.45 | 1.10 |
| Tissue | AE1243 | v6v7 | 74483 | 65530 | 10.91 | 0.53 | 0.58 |
| Tissue | AE1243 | v7v8 | 36358 | 28409 | 18.68 | 1.01 | 2.18 |
Bioinformatic tools. Software versions and related resources.
| Software | Use | Version |
|---|---|---|
| Bowtie2 | Human reads mapping | 2.3.4 |
| Samtools | Extraction of non-human reads | 1.8 |
| FASTQC | Reads quality assessment | 0.11.7 |
| Clumpify | Removal of duplicate reads | 38.26 |
| BBDuk | Quality and adapter trimming | 38.26 |
| Kraken | Taxonomic classification of shotgun reads | 2.0.8-beta |
| Bracken | Re-estimation of taxonomic profiles | 2.2 |
| MetaPhlAn2 | Taxonomic classification of shotgun reads | 2.7.8 |
| Kaiju | Taxonomic classification of shotgun reads | 1.6.3 |
| HUMAnN2 | Functional profiling of shotgun reads | 0.11.1 |
| metaSPADES | Metagenomic assembly | 3.13.1 |
| metaBAT | Binning of scaffolds | 2.12.1 |
| checkM | Bins quality assessment | 1.0.12 |
| PhyloPhlAn2 | Taxonomic classification of bins | 0.35 |
| Reformat | Generation of lower coverage samples | 38.26 |
| DADA2 (R) | Denoising of 16S reads | 1.10.1 |
| IdTaxa (R) | Taxonomic classification of 16S sequences | 2.10.1 |
| vegan (R) | Computation of alpha diversity | 2.5.3 |
| zCompositions (R) | Compositional data analysis | 0.99.3 |
| CoDaSeq (R) | Compositional data analysis | 1.2.0 |
Fig. 2 Ordination. Principal components analysis of the datasets after centred log-ratio transformation of the family-level classifications. (a) 16S data, with each sample stratified by region and source material. (b) Shotgun data, classified using Kraken2, Kaiju and MetaPhlAn2. (c) 16S data from faeces (V4 region only) and shotgun data (classified using Kraken2).
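
The centred log-ratio (CLR) transformation applied before the principal components analysis can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: zeros are handled here with a simple pseudocount (an assumption made for brevity, standing in for the model-based zero replacement that packages such as zCompositions provide), and the counts are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's family-level counts.

    Each value becomes log(x_i) minus the mean of the logs, that is, the log
    of x_i divided by the sample's geometric mean.
    Assumption: zeros are replaced by adding a pseudocount to every count.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical family-level counts for one sample, for illustration only.
toy_counts = [120, 30, 0, 50]
transformed = clr(toy_counts)
print([round(v, 3) for v in transformed])
```

CLR values sum to zero within each sample, so the transformed profiles depend only on the ratios between families and not on sequencing depth, which is what makes them suitable input for a standard PCA.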
Fig. 3 Alpha diversity. The Shannon index is shown as a function of the number of read pairs. Top row: taxonomic levels (species, genus, phylum), as classified by Kraken2. Bottom row: functional levels (gene families: UniRef90; functional groups: KEGG orthogroups; metabolic pathways: MetaCyc), as classified by HUMAnN2. Five random samples were created at each level.
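
The idea behind this figure, computing the Shannon index on repeated random subsamples of decreasing size, can be sketched as follows. This is an illustrative toy, not the authors' workflow: the profile is hypothetical, subsampling is done on feature counts rather than on raw reads, and the natural logarithm is assumed.

```python
import math
import random
from collections import Counter


def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over the non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def subsample(profile, depth, rng):
    """Draw `depth` reads without replacement from a {feature: count} profile."""
    pool = [feature for feature, n in profile.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


# Hypothetical read counts per family, for illustration only.
toy_profile = {
    "Bacteroidaceae": 500,
    "Lachnospiraceae": 300,
    "Ruminococcaceae": 150,
    "Akkermansiaceae": 50,
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
for depth in (50, 200, 800):
    # Five random subsamples per depth, mirroring the figure's design.
    values = [
        shannon_index(subsample(toy_profile, depth, rng).values())
        for _ in range(5)
    ]
    print(depth, round(sum(values) / len(values), 3))
```

The spread of the five values at each depth gives a rough sense of how stable the diversity estimate is at that depth.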
| Field | Value |
|---|---|
| Measurement(s) | genome • rRNA_16S |
| Technology Type(s) | DNA sequencing |
| Factor Type(s) | sex • age • Smoking • Weight • Height • Diet • Medication |
| Sample Characteristic - Organism | Homo sapiens |
| Sample Characteristic - Environment | feces • colon |
16S alignment validation. Region(s) covered by 16S reads with exact matches to the SILVA database. The first column represents the region(s) called by our pipeline, while the third and fourth show the exact matching positions in the SILVA database. This shows consistency between the variable region called by our pipeline and the expected position it occupies along the 16S gene. SILVA IDs: B. fragilis: FQ312004.3243020.3244552; B. vulgatus: CP000139.2183533.2185042; F. nucleatum: AE009951.530422.531923; R. gnavus: AZJF01000012.178214.179732.
| Region | Species | Start | End |
|---|---|---|---|
| v2 | | 134 | 389 |
| v2 | | 108 | 362 |
| v2 | | 110 | 364 |
| v2 | | 108 | 361 |
| v3 | | 330 | 540 |
| v3 | | 327 | 537 |
| v4 | | 531 | 818 |
| v4 | | 500 | 788 |
| v4 | | 522 | 810 |
| v6v7 | | 944 | 1207 |
| v6v7 | | 917 | 1177 |
| v6v7 | | 936 | 1194 |
| v6v7 | | 933 | 1193 |