Richard J Abdill, Elizabeth M Adamowicz, Ran Blekhman.
Abstract
The importance of sampling from globally representative populations has been well established in human genomics. In human microbiome research, however, we lack a full understanding of the global distribution of sampling in research studies. This information is crucial to better understand global patterns of microbiome-associated diseases and to extend the health benefits of this research to all populations. Here, we analyze the country of origin of all 444,829 human microbiome samples that are available from the world's 3 largest genomic data repositories, including the Sequence Read Archive (SRA). The samples are from 2,592 studies of 19 body sites, including 220,017 samples of the gut microbiome. We show that more than 71% of samples with a known origin come from Europe, the United States, and Canada, including 46.8% from the US alone, despite the country representing only 4.3% of the global population. We also find that central and southern Asia is the most underrepresented region: Countries such as India, Pakistan, and Bangladesh account for more than a quarter of the world population but make up only 1.8% of human microbiome samples. These results demonstrate a critical need to ensure more global representation of participants in microbiome studies.
Year: 2022 PMID: 35167588 PMCID: PMC8846514 DOI: 10.1371/journal.pbio.3001536
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Fig 1. Global microbiome representation.
(a) Total samples by country. The color of each country indicates the total number of samples originating in that country using a log10 scale. (b) Relative representation by country. The color of each country indicates its representation in human microbiome datasets, relative to its share of world population. Red colors mark countries that are overrepresented relative to their population, and blue colors mark countries that are underrepresented. Countries with zero samples in the dataset are marked with dark blue. (c) Cumulative microbiome samples by world region. The x-axis indicates the year, and the y-axis indicates the cumulative microbiome samples available at the end of that year. Colors indicate the cumulative microbiome samples from each of the world regions specified in the legend. The colored bar to the right of the plot indicates the share of the world population living in each of the regions using the same colors. (d) Proportion of annual samples. The x-axis indicates the year, and the y-axis indicates the proportion of samples from each world region published in that year. Colors correspond to the world regions shown in panel C. The data and code needed to generate this figure can be found at https://doi.org/10.5281/zenodo.5351179. All maps are based on public domain Natural Earth data; the base layer is available for download at https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip.
Samples per country.
| Position | Country | Samples | Share |
|---|---|---|---|
| 1 | US | 178,960 | 40.2% |
| | Unknown | 62,118 | 14.0% |
| 2 | China | 36,162 | 8.1% |
| 3 | UK | 16,076 | 3.6% |
| 4 | Denmark | 11,497 | 2.6% |
| 5 | Australia | 9,266 | 2.1% |
| 6 | the Netherlands | 9,173 | 2.1% |
| 7 | Canada | 8,829 | 2.0% |
| 8 | Finland | 7,855 | 1.8% |
| 9 | Italy | 6,265 | 1.4% |
| 10 | Germany | 5,531 | 1.2% |
| 11 | Spain | 5,517 | 1.2% |
| 12 | Sweden | 5,248 | 1.2% |
| 13 | Israel | 4,831 | 1.1% |
| 14 | New Zealand | 4,354 | 1.0% |
| 15 | Japan | 4,298 | 1.0% |
| 16 | Chile | 3,616 | 0.8% |
| 17 | Bangladesh | 3,502 | 0.8% |
| 18 | France | 3,402 | 0.8% |
| 19 | Malawi | 3,052 | 0.7% |
| 20 | India | 2,997 | 0.7% |
| | Rest of world | 52,280 | 11.8% |
Samples by body site.
Each row indicates a body site related to the human microbiome. The “Samples” column indicates the total number of samples categorized under each body site, and the “Countries” column indicates the number of unique countries with at least 1 sample in that category. Samples without a known country are included in the sample count, but not the country count. Body sites map directly to categories defined in the NCBI Taxonomy Browser; see for a list of category IDs combined for each body site.
| Body site | Samples | Countries |
|---|---|---|
| Gut | 220,017 | 96 |
| Human metagenome* | 69,697 | 58 |
| Oral | 47,798 | 63 |
| Skin | 36,593 | 44 |
| Vaginal | 17,784 | 31 |
| Lung | 17,307 | 30 |
| Nasopharyngeal | 15,646 | 22 |
| Feces | 6,858 | 13 |
| Reproductive system | 3,180 | 6 |
| Blood | 2,707 | 9 |
| Saliva | 2,503 | 15 |
| Milk | 2,060 | 9 |
| Urinary tract | 1,187 | 4 |
| Tracheal | 520 | 2 |
| Sputum | 364 | 3 |
| Eye | 359 | 8 |
| Semen | 203 | 3 |
| Bile | 45 | 2 |
| Skeleton | 1 | 1 |
*Samples under the “human metagenome” label refer to an NCBI category that does not specify a particular body site.
NCBI, National Center for Biotechnology Information.
Samples and population by region.
| Region | Samples | 2020 population (estimated, in thousands) | % of samples | % of samples (known location) | % of population | Representation proportion |
|---|---|---|---|---|---|---|
| Europe and Northern America | 272,544 | 1,116,506 | 61.3% | 71.2% | 14.3% | 4.97 |
| Eastern and Southeastern Asia | 49,007 | 2,346,709 | 11.0% | 12.8% | 30.1% | 0.43 |
| Sub-Saharan Africa | 18,651 | 1,094,366 | 4.2% | 4.9% | 14.0% | 0.35 |
| Latin America and the Caribbean | 15,264 | 653,962 | 3.4% | 4.0% | 8.4% | 0.49 |
| Australia/New Zealand | 13,620 | 30,322 | 3.1% | 3.6% | 0.4% | 9.14 |
| Central and Southern Asia | 6,685 | 2,014,709 | 1.5% | 1.7% | 25.8% | 0.07 |
| Northern Africa and Western Asia | 5,621 | 525,869 | 1.3% | 1.5% | 6.7% | 0.22 |
| Oceania | 1,178 | 12,356 | 0.3% | 0.3% | 0.2% | 1.94 |
| Unknown | 62,259 | | 14.0% | | | |
| Least developed countries | 15,254 | 1,057,438 | 3.4% | 4.0% | 13.6% | 0.29 |
| Rest of world | 367,457 | 6,737,361 | 82.6% | 96.0% | 86.4% | 1.11 |
| Unknown | 62,118 | | 14.0% | | | |
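Representation proportion is calculated by dividing a region's percentage of known-location samples by its percentage of the world population. A value above 1 indicates overrepresentation; a value below 1 indicates underrepresentation.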
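The calculation described in the note above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis code: the `REGIONS` dictionary and the `representation_proportions` function are hypothetical names, the sample and population figures are copied from the regional rows of the table, and the totals are assumed to be the sums of those eight rows (so samples with unknown location are excluded). Because shares are computed from unrounded counts, the last digit may differ slightly from the published column for some regions.

```python
# Region -> (samples, 2020 population in thousands), copied from the regional
# rows of the table. Samples with unknown location are deliberately left out.
REGIONS = {
    "Europe and Northern America": (272_544, 1_116_506),
    "Eastern and Southeastern Asia": (49_007, 2_346_709),
    "Sub-Saharan Africa": (18_651, 1_094_366),
    "Latin America and the Caribbean": (15_264, 653_962),
    "Australia/New Zealand": (13_620, 30_322),
    "Central and Southern Asia": (6_685, 2_014_709),
    "Northern Africa and Western Asia": (5_621, 525_869),
    "Oceania": (1_178, 12_356),
}


def representation_proportions(regions):
    """Return region -> (share of known samples) / (share of population)."""
    total_samples = sum(samples for samples, _ in regions.values())
    total_population = sum(population for _, population in regions.values())
    proportions = {}
    for name, (samples, population) in regions.items():
        sample_share = samples / total_samples
        population_share = population / total_population
        proportions[name] = sample_share / population_share
    return proportions


if __name__ == "__main__":
    # Print regions from most overrepresented to most underrepresented.
    ranked = sorted(
        representation_proportions(REGIONS).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    for name, proportion in ranked:
        print(f"{name}: {proportion:.2f}")
```

Under these assumptions, the same function could be applied to the "Least developed countries" and "Rest of world" rows by passing a two-entry dictionary instead of `REGIONS`.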