Joannella Morales1, Danielle Welter2, Emily H Bowler2, Maria Cerezo2, Laura W Harris2, Aoife C McMahon2, Peggy Hall3, Heather A Junkins3, Annalisa Milano2, Emma Hastings2, Cinzia Malangone2, Annalisa Buniello2, Tony Burdett2, Paul Flicek2, Helen Parkinson2, Fiona Cunningham2, Lucia A Hindorff3, Jacqueline A L MacArthur4.
Abstract
The accurate description of ancestry is essential to interpret, access, and integrate human genomics data, and to ensure that these benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the representation of ancestry information. Here we describe a framework for the accurate and standardized description of sample ancestry, and validate it by application to the NHGRI-EBI GWAS Catalog. We confirm known biases and gaps in diversity, and find that African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations. It is our hope that widespread adoption of this framework will lead to improved analysis, interpretation, and integration of human genomics data.
Keywords: Ancestry; Diversity; GWAS Catalog; Genome-wide association studies; Genomics; Population genetics
Year: 2018 PMID: 29448949 PMCID: PMC5815218 DOI: 10.1186/s13059-018-1396-2
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Ancestry categories: distinct regional population groupings used in this framework
| Ancestry category | Definition | Examples of detailed descriptions for samples included in the category |
|---|---|---|
| Aboriginal Australian | Includes individuals who either self-report or have been described by authors as Australian Aboriginal. These are expected to be descendants of early human migration into Australia from Eastern Asia and can be distinguished from other Asian populations by mtDNA and Y chromosome variation | Martu Australian Aboriginal |
| African American or Afro-Caribbean | Includes individuals who either self-report or have been described by authors as African American or Afro-Caribbean. This category also includes individuals who genetically cluster with reference populations from this region, for example, 1000 Genomes and/or HapMap ACB or ASW populations. We note that there is likely to be significant admixture with European ancestry populations | African American, African Caribbean |
| African unspecified | Includes individuals who either self-report or have been described as African, but for whom there was not sufficient information to allow classification as African American, Afro-Caribbean, or Sub-Saharan African | African, non-Hispanic black |
| Asian unspecified | Includes individuals who either self-report or have been described as Asian, but for whom there was not sufficient information to allow classification as East Asian, Central Asian, South Asian, or South East Asian | Asian, Asian American |
| Central Asian | Includes individuals who either self-report or have been described by authors as Central Asian | Silk Road (founder/genetic isolate) |
| East Asian | Includes individuals who either self-report or have been described by authors as East Asian or one of the sub-populations from this region (e.g., Chinese). This category also includes individuals who genetically cluster with reference populations from this region, for example, 1000 Genomes and/or HapMap CDX, CHB, CHS, and JPT populations | Chinese, Japanese, Korean |
| European | Includes individuals who either self-report or have been described by authors as European, Caucasian, white, or one of the sub-populations from this region (e.g., Dutch). This category also includes individuals who genetically cluster with reference populations from this region, for example, 1000 Genomes and/or HapMap CEU, FIN, GBR, IBS, and TSI populations | Spanish, Swedish |
| Greater Middle Eastern (Middle Eastern, North African, or Persian) | Includes individuals who self-report or were described by authors as Middle Eastern, North African, Persian, or one of the sub-populations from this region (e.g., Saudi Arabian) | Tunisian, Arab, Iranian |
| Hispanic or Latin American | Includes individuals who either self-report or are described by authors as Hispanic, Latino, Latin American, or one of the sub-populations from this region. This category includes individuals with known admixture of primarily European, African, and Native American ancestries, though some may also have a degree of Asian ancestry (e.g., in Peru). We also note that the levels of admixture vary by country; for example, Caribbean countries carry higher levels of African admixture than South American countries. This category also includes individuals who genetically cluster with reference populations from this region, for example, 1000 Genomes and/or HapMap CLM, MXL, PEL, and PUR populations | Brazilian, Mexican |
| Native American | Includes indigenous individuals of North, Central, and South America, descended from the original human migration into the Americas from Siberia | Pima Indian, Plains American Indian |
| Not reported | Includes individuals for whom no ancestry or country of recruitment information is available | |
| Oceanian | Includes individuals who either self-report or have been described by authors as Oceanian or one of the sub-populations from this region (e.g., Native Hawaiian) | Solomon Islander, Micronesian |
| Other | Includes individuals for whom an ancestry descriptor is known but insufficient information is available to allow assignment to one of the other categories | Surinamese, Russian |
| Other admixed ancestry | Includes individuals who either self-report or have been described by authors as admixed and do not fit the definition of the other admixed categories already defined (“African American or Afro-Caribbean” or “Hispanic or Latin American”) | |
| South Asian | Includes individuals who either self-report or have been described by authors as South Asian or one of the sub-populations from this region (e.g., Asian Indian). This category also includes individuals who genetically cluster with reference populations from this region, for example, 1000 Genomes and/or HapMap BEB, GIH, ITU, PJL, and STU populations | Bangladeshi, Sri Lankan Sinhalese |
| South East Asian | Includes individuals who either self-report or have been described by authors as South East Asian or one of the sub-populations from this region (e.g., Vietnamese). This category also includes individuals who genetically cluster with reference populations from this region, for example, the 1000 Genomes KHV population. We note that East Asian and South East Asian populations are often conflated; however, recent studies indicate a unique genetic background for South East Asian populations | Thai, Malay |
| Sub-Saharan African | Includes individuals who either self-report or have been described by authors as Sub-Saharan African or one of the sub-populations from this region (e.g., Yoruban). This category also includes individuals who genetically cluster with reference populations from this region, for example, 1000 Genomes and/or HapMap ESN, LWK, GWD, MSL, MKK, and YRI populations | Yoruban, Gambian |
Ancestry categories are assigned to samples with distinct and well-defined patterns of genetic variation, in addition to individuals with inferred relatedness to these samples. A full list of GWAS Catalog sample descriptions assigned to each category can be found in Additional file 3: Table S2.
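The reference populations named in the table can be collected into a simple lookup. The following is a minimal sketch, not the Catalog's curation procedure: the population codes and category names are taken from the table above, while the names `REFERENCE_POPULATIONS`, `CODE_TO_CATEGORY`, and `category_for_reference_population` are hypothetical and introduced only for illustration. Categories for which the table lists no reference populations are omitted.

```python
from typing import Optional

# Reference population codes (1000 Genomes and/or HapMap) as listed in the
# table above, grouped by ancestry category. Categories for which the table
# names no reference population are intentionally left out.
REFERENCE_POPULATIONS = {
    "African American or Afro-Caribbean": ["ACB", "ASW"],
    "East Asian": ["CDX", "CHB", "CHS", "JPT"],
    "European": ["CEU", "FIN", "GBR", "IBS", "TSI"],
    "Hispanic or Latin American": ["CLM", "MXL", "PEL", "PUR"],
    "South Asian": ["BEB", "GIH", "ITU", "PJL", "STU"],
    "South East Asian": ["KHV"],
    "Sub-Saharan African": ["ESN", "LWK", "GWD", "MSL", "MKK", "YRI"],
}

# Inverted index: population code -> ancestry category.
CODE_TO_CATEGORY = {
    code: category
    for category, codes in REFERENCE_POPULATIONS.items()
    for code in codes
}


def category_for_reference_population(code: str) -> Optional[str]:
    """Return the ancestry category for a reference population code.

    Hypothetical helper: returns None when the code is not one of the
    reference populations named in the table.
    """
    return CODE_TO_CATEGORY.get(code.strip().upper())


if __name__ == "__main__":
    for code in ("YRI", "khv", "XYZ"):
        print(code, "->", category_for_reference_population(code))
```

A lookup of this kind only covers samples that cluster with a named reference population; samples described by self-report or by authors still need the definitions in the table.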
Fig. 1 Representation of ancestry data in the GWAS Catalog search interface (https://www.ebi.ac.uk/gwas/). Ancestry-related data are found in the Studies and Associations tables (underlined in black) when searching the Catalog. This figure shows the results of a search for PubMed Identifier 27145994. The sample description can be found in the Studies table, either by pressing “Expand all Studies” or the “+” on the study of interest (highlighted in red). Sample ancestry is captured in two forms: (1) detailed description (highlighted in blue); and (2) ancestry category (highlighted in green). The latter follows the format: sample size, category, (country of recruitment). In cases where multiple ancestries are included in a study, the ancestry associated with a particular association is found as an annotation in the p value column in the Associations table (highlighted in pink).
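The ancestry category format described in the caption (sample size, category, country of recruitment in parentheses) lends itself to simple parsing. The sketch below assumes the annotation is available as a plain string such as `1,200 East Asian (Japan)`; the exact punctuation used by the Catalog interface is not specified here, so the pattern tolerates optional commas between fields. The names `AncestryAnnotation` and `parse_ancestry_annotation`, and the example strings, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class AncestryAnnotation:
    sample_size: int
    category: str
    countries: List[str]


# Assumed layout: "<sample size> <category> (<country>[, <country>...])".
# Commas after the size and after the category are optional. The size may
# contain thousands separators; the category may itself contain parentheses.
_PATTERN = re.compile(r"(\d[\d,]*\d|\d)\s*,?\s*(.+?)\s*,?\s*\(([^)]*)\)")


def parse_ancestry_annotation(text: str) -> AncestryAnnotation:
    """Split an ancestry category annotation into its three fields."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized ancestry annotation: {text!r}")
    size, category, countries = match.groups()
    return AncestryAnnotation(
        sample_size=int(size.replace(",", "")),
        category=category,
        countries=[c.strip() for c in countries.split(",") if c.strip()],
    )


if __name__ == "__main__":
    # Hypothetical example strings, not taken from the Catalog.
    print(parse_ancestry_annotation("1,200 East Asian (Japan)"))
    print(parse_ancestry_annotation("500, European, (U.K., Netherlands)"))
```

Because the final parenthesized group is always read as the country list, a category name that itself contains parentheses, such as "Greater Middle Eastern (Middle Eastern, North African, or Persian)", is kept intact.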
Fig. 2 Ancestry category distribution in the GWAS Catalog. This figure summarizes the distribution of ancestry categories, in percentages, of individuals (N = 110,291,046; a), individuals over time (N = 110,291,046; b), studies (N = 4,655; c), and associations (N = 60,970; d). The largest category in all panels is European (aqua). At the level of individuals (a), the largest non-European category is Asian (bright pink), with East Asian (light pink) accounting for the majority. Non-European, non-Asian categories together (yellow) comprise 4% of individuals, and for 6% of samples (white) no ancestry category could be specified. Panel b compares the distribution, in percentages, of individuals included in the 915 studies published between 2005 and 2010 with that of individuals included in the 2905 studies published between 2011 and 2016. Panel d shows the disproportionate contribution of associations from African (blue) and Hispanic/Latin American (purple) categories, when compared to the percentage of individuals (a, blue, purple, respectively) and studies (c, blue, purple, respectively).
Recommendations for authors reporting ancestry data in publications. These recommendations were generated by expert curators following a detailed review of the over 3200 GWAS publications included in the Catalog.
| 1. Provide detailed information for each distinct group of samples, |