Sarah E Ali-Khan, Tomasz Krakowski, Rabia Tahir, Abdallah S Daar.
Abstract
Post-Human Genome Project progress has enabled a new wave of population genetic research, and intensified controversy over the use of race/ethnicity in this work. At the same time, the development of methods for inferring genetic ancestry offers more empirical means of assigning group labels. Here, we provide a systematic analysis of the use of race/ethnicity and ancestry in current genetic research. We base our analysis on key published recommendations for the use and reporting of race/ethnicity which advise that researchers: explain why the terms/categories were used and how they were measured, carefully define them, and apply them consistently. We studied 170 population genetic research articles from high impact journals, published 2008-2009. A comparative perspective was obtained by aligning study metrics with similar research from articles published 2001-2004. Our analysis indicates a marked improvement in compliance with some of the recommendations/guidelines for the use of race/ethnicity over time, while showing that important shortfalls still remain: no article using 'race', 'ethnicity' or 'ancestry' defined or discussed the meaning of these concepts in context; a third of articles still do not provide a rationale for their use, with those using 'ancestry' being the least likely to do so. Further, no article discussed potential socio-ethical implications of the reported research. As such, there remains a clear imperative for highlighting the importance of consistent and comprehensive reporting on human populations to the genetics/genomics community globally, to generate explicit guidelines for the uses of ancestry and genetic ancestry, and importantly, to ensure that guidelines are followed.
Keywords: Ancestry; Ethnicity; Genetic ancestry inference; Genetic research; Race; Terminology
Year: 2011 PMID: 22276086 PMCID: PMC3237839 DOI: 10.1007/s11568-011-9154-5
Source DB: PubMed Journal: Hugo J ISSN: 1877-6558
Sample set characteristics, N (%)
| Total sample | |
|---|---|
| Year of publication | |
| 2008 | |
| 2009 | |
| Journal of publication (2008 impact factor) | |
| American Journal of Human Genetics (10.153) | |
| Human Genetics (4.042) | |
| Nature (31.434) | |
| Nature Genetics (30.259) | |
| PLoS Genetics (8.883) | |
| Science (28.103) | |
| Article general field of interest | |
| Population genetics | |
| Medical | |
| Methods | |
| Non-medical | |
Sample set coding frequencies, N (%)
| Variables coded | N (%) |
|---|---|
| Basic features | |
| Hypothesis | 169 (99.4%) |
| Limitations | 87 (52.4%) |
| Sample origin | 163 (95.9%) |
| Reason for using population | |
| Why populations | 112 (65.9%) |
| Why this population | 117 (68.8%) |
| Basis for assigning population label | 150 (88.2%) |
| Use of genotyping data to infer genetic ancestry | 88 (51.8%) |
| SNP genotypes or ancestry informative markers (AIMs) used to infer ancestry proportions of individual participants’ DNA samples | 20 (23.3%) |
| Genotype data used to assess the genetic homogeneity of population by principal components cluster analysis; samples outlying from population clusters of interest excluded from further analysis | 36 (41.9%) |
| Text briefly states that potential population stratification was examined in the research populations, but no further details are provided | 32 (36.4%) |
| Defines generic ‘race and ethnicity’ or ‘ancestry’ | 0 (0%) |
| Defines specific population label/describes population group | 102 (60.0%) |
| Ways of using populations in research | |
| Label for study population only | 78 (45.9%) |
| Independent variable | 87 (51.2%) |
| Dependent variable | 1 (0.59%) |
| DNA with a label | 23 (13.5%) |
| Discusses social and ethical implications | 0 (0%) |
Comparison of current data with earlier study
| | Sankar et al. | Current study |
|---|---|---|
| Data derived from articles from publication years | 2001–2004 | 2008–2009 |
| # Articles | 330 | 170 |
| Sample selection criteria | Medline search strategy: race and ethnicity, genetics and population keywords; AND publication in one of 3 journal type samples (genetics, clinical, and general); mainly high impact journals | PubMed search strategy: (race OR ethnicity OR ancestry) AND (SNP OR polymorphism OR CNV) keywords; AND publication in one of six leading journals for the publication of human genetic research; mainly high impact journals |
* Indicates statistically significant difference, P < 0.05
Presence of coded article features by generic terminology used
| Total sample set | Race and ethnicity | Ancestry | Other | P value |
|---|---|---|---|---|
| Basic features | ||||
| Hypothesis | 79 (98.8%) | 38 (100%) | 52 (100%) | 0.568 |
| Limitations | 43 (55.1%) | 16 (45.7%) | 28 (53.8%) | 0.319 |
| Sample origin | 75 (93.8%) | 37 (97.4%) | 51 (98.1%) | 0.413 |
| Reason for using population | ||||
| Why populations | 60 (75.0%) | 17 (44.7%) | 36 (69.2%) | 0.004* |
| Why this population | 55 (68.8%) | 23 (60.5%) | 38 (73.1%) | 0.372 |
| Basis for assigning population label | 71 (88.8%) | 35 (92.1%) | 44 (84.6%) | 0.542 |
| Use of empirical genomic methods | 44 (55.0%) | 25 (65.8%) | 19 (36.5%) | 0.017* |
| Defines generic ‘race and ethnicity’ or ‘ancestry’ | 0 (0%) | 0 (0%) | 0 (0%) | |
| Defines specific population label | 51 (63.8%) | 24 (63.2%) | 27 (51.9%) | 0.361 |
| Ways of using populations in research | ||||
| Label for study population only | 34 (43.0%) | 21 (56.8%) | 27 (51.9%) | 0.352 |
| Independent variable | 46 (58.2%) | 17 (45.9%) | 24 (46.2%) | 0.296 |
| Dependent variable | 0 (0%) | 0 (0%) | 1 (1.9%) | 0.319 |
| DNA with a label | 14 (17.5%) | 3 (7.9%) | 6 (11.5%) | 0.319 |
| General article field of interest | ||||
| Population genetics | 10 (12.5%) | 1 (2.6%) | 15 (28.8%) | 0.002* |
| Medical | 68 (85%) | 32 (84.2%) | 27 (51.9%) | <0.001* |
| Methods | 1 (1.3%) | 2 (5.3%) | 6 (11.5%) | 0.036* |
| Non-medical | 1 (1.3%) | 3 (7.9%) | 4 (7.7%) | 0.134 |
| | <0.001* | <0.001* | <0.001* | |
* Indicates statistically significant difference, P < 0.05
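The P values in the table above can be checked against the reported counts. The following is a minimal sketch, not the authors' procedure: it assumes that each P value comes from a Pearson chi-square test on a 2 x 3 table of present/absent counts, and that the group sizes are 80, 38 and 52 articles, as implied by the reported percentages (for example 60 / 0.750 = 80). The function names are illustrative only.

```python
import math


def chi_square_2xk(yes_counts, group_sizes):
    """Pearson chi-square statistic for a 2 x k table of yes/no counts."""
    total = sum(group_sizes)
    total_yes = sum(yes_counts)
    total_no = total - total_yes
    stat = 0.0
    for yes, n in zip(yes_counts, group_sizes):
        no = n - yes
        # Expected counts under independence of feature and terminology group
        exp_yes = n * total_yes / total
        exp_no = n * total_no / total
        stat += (yes - exp_yes) ** 2 / exp_yes + (no - exp_no) ** 2 / exp_no
    return stat


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    With 2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    return math.exp(-stat / 2.0)


# Assumed group sizes: race and ethnicity, ancestry, other (inferred, see above)
group_sizes = [80, 38, 52]
# 'Why populations' row: number of articles giving a rationale in each group
why_populations = [60, 17, 36]

stat = chi_square_2xk(why_populations, group_sizes)
p = p_value_df2(stat)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")  # chi-square = 10.84, p = 0.004
```

Under the same assumptions, the counts for 'Use of empirical genomic methods' (44, 25, 19) give a P value of about 0.017, which agrees with the table. Three groups and a yes/no outcome give (2 - 1) x (3 - 1) = 2 degrees of freedom, which is why the closed-form tail probability can be used here without a statistics library.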
Terms used, and ways of describing populations compiled from our sample set
| Terms and ways of describing or referring to populations | Example |
|---|---|
| Ancestry/ancestral groups | ‘Despite wide variation in allele frequency, these genetic variants show notable homogeneity of effect across populations of European ancestry living at different latitudes and show independent association to disease risk’ (Bishop et al.)
| Anthropological names | ‘The names we use are the ones by which the groups are described anthropologically, but are not unique identifiers’ (Reich et al.)
| ‘X’-derived | ‘Variants in the FTO gene have been associated with obesity measures in mainly European-derived populations’ (Wing et al.)
| Of ‘X’-descent | ‘Significant associations with individual SNPs at a common locus were observed in the two independent populations of African descent’ (Garner et al.)
| Ethnicity/ethnic | ‘Importantly, we made similar observations when comparing populations of the same ethnicity’ (Shi et al.)
| Ethnogeographic groups | ‘These results also show that two individuals carrying the same mtDNA haplotype can be classified in opposite ethnogeographic groups…’ (Keyser et al.)
| Linguistic groups | ‘The structure results, population phylogenies, and PCA results all show that populations from the same linguistic group tend to cluster together’ (HUGO Pan-Asian SNP Consortium et al.)
| Of ‘X’-origin | ‘The clinical characteristics of participants in five independent cohorts—the white U.S. GWAS sample (n = 1000), the white US family sample (n = 1972), the Chinese hip fracture (HF) sample (n = 700), the Chinese BMD sample (n = 2995), and the Tobago cohort of African origin (n = 908 men)—are described in Tables
| Race/racial groups | ‘We also performed race stratified analyses to control for potential confounding by race as well as to evaluate the previously reported race-specific results’ (Crosslin et al.)
| Only population identifier or name used | ‘Using genome-wide association data from 1,376 French individuals, we identified 16,360 SNPs nominally associated with T2D and studied these SNPs in an independent sample of 4,977 French individuals’ (Rung et al.)
Use of empirical genomic methods by article field of interest
| General article field of interest | Population genetics | Medical | Methods | Non-medical | Chi sq |
|---|---|---|---|---|---|
| Used genomic methods | 6 (23.1%) | 74 (53.3%) | 3 (33.3%) | 5 (62.5%) | 0.006* |
Examples of indistinct, interchangeable or confusing usage of race and ethnicity and ancestry compiled from our sample set
| Example from text | Comment |
|---|---|
| (1) ‘To minimize confounding by ethnic variation we restricted our study population to individuals of self-reported European descent’ (Amos et al.) | Authors do not explain why or how ‘ethnic variation’ would confound results. The relationship between ‘ethnic variation’, ‘self-reported European descent’ and genetic background is not explicated. No term defined |
| (2) ‘All genetic association analyses were stratified by self-identified race (white vs. African American) to avoid spurious associations due to population stratification’ (Rasmussen-Torvik et al.) | The relationship between ‘self-identified race’ and population stratification is not explicated. No term defined |
| (3) Research populations (Gullah, African American and European American) are referred to as being of African and European descent respectively in main article body, while in the supplementary text they are referred to as ‘races’ (Nath et al.) | Use of differing terminology to refer to the same populations. ‘Race’ is not defined |
| (4) ‘The self-identified race/ethnicity information for these AGRE individuals is listed below’; however, the table is entitled ‘AGRE self-identified ancestry’ and lists American Indian/Alaskan Native; Asian; Black or African American; More Than One Race; Native Hawaiian or other Pacific Islander; Unknown; and White (Wang et al.) | Interchangeable use of ancestry, and race and ethnicity. No term defined |
| (5) ‘All samples must have Caucasian ethnicity based on hierarchical clustering of AIMs genotypes, and all other samples were excluded’. ‘Ancestry’ only, used in main article body, ethnicity used only in supplementary text (Glessner et al.) | Authors are referring to the inference of population ancestral identity using empirical genomic methods. However, how ‘ethnicity’ relates to genetic background is not explicated. Inappropriate use of ‘ethnicity’, rather than ancestry. Use of anachronistic ‘Caucasian’, rather than ‘European’ terminology. No term defined |
| (6) ‘Only subjects that self-reported as being of European ancestry were retained, regardless of their self-reported race’; Genetically inferred population identity referred to as ‘imputed race’ (Yeager et al.) | Relationship between ‘self-reported ancestry’, ‘self-reported race’, and ‘imputed race’ not explicated. Inappropriate use of ‘race’ with respect to ‘imputed race’. No term defined |
| (7) ‘Distributions of racial ancestries were the same in cases and controls’ (Walsh et al.) | Inappropriate use of ‘racial’ and ‘ancestry’ together. No term defined |
Recommendations for the genetics community and biomedical journal editors from our analysis, for the reporting of genetic research in human populations
(1) Provide a comprehensive explanation of the methods used for genetic ancestry imputations, including assumptions made, algorithms and parameters used, descriptions of population samples involved, and the limitations of inferences
(2) Define and differentiate the concepts of race, ethnicity, and ancestry used in the context of the reported research
(3) When empirical methods are used to assign ancestry labels, specify that ‘genetic ancestry’ or ‘inferred genetic ancestry’ is being referred to, rather than simply ‘ancestry’
(4) Provide an acknowledgment or brief discussion of social, ethical, legal, economic etc. issues raised by the reported research, if applicable
(5) Form a working group consisting of representatives from the spectrum of countries and cultures to engage the genetics community globally to:
• Highlight the importance of careful and consistent reporting on, and naming and description of, human populations in genetic research
• Address concerns and ambiguities in the implementation and reporting of genetic research in human populations
• Revise extant guidelines and explicitly generate guidelines for the uses of ancestry and genetic ancestry
• Gain broad endorsement of these guidelines/standards/requirements throughout the genetics community
(6) Ensure biomedical journals consistently enforce these standards and requirements in genetic research reporting