Jacob B Hall, Logan Dumitrescu, Holli H Dilks, Dana C Crawford, William S Bush.
Abstract
Recently, the development of biobanks linked to electronic medical records has presented new opportunities for genetic and epidemiological research. Studies based on these resources, however, present unique challenges, including the accurate assignment of individual-level population ancestry. In this work we examine the accuracy of administratively-assigned race in diverse populations by comparing assigned races to genetically-defined ancestry estimates. Using 220 ancestry informative markers, we generated principal components for patients in our dataset, which were used to cluster patients into groups based on genetic ancestry. Consistent with other studies, we find strong overall agreement (Kappa = 0.872) between genetic ancestry and assigned race, with higher rates of agreement for African-descent and European-descent assignments, and reduced agreement for Hispanic, East Asian-descent, and South Asian-descent assignments. These results suggest caution when selecting study samples of non-African and non-European backgrounds when administratively-assigned race from biobanks is used.
Year: 2014 PMID: 24896101 PMCID: PMC4045967 DOI: 10.1371/journal.pone.0099161
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Distribution of administratively-assigned race.
| Race | Study Sample | BioVU | Synthetic Derivative | Davidson Co.* |
| Caucasian | 4,232 (58.4%) | 102,018 (64.4%) | 1,116,837 (51.6%) | 385,039 (61.4%) |
| African American | 1,094 (15.1%) | 14,223 (9.0%) | 191,246 (8.8%) | 173,730 (27.7%) |
| Asian/Pacific | 228 (3.1%) | 1,380 (0.9%) | 14,449 (0.7%) | 15,083 (2.4%) |
| Hispanic | 230 (3.2%) | 2,147 (1.3%) | 37,466 (1.7%) | ** |
| Native American | 184 (2.5%) | 212 (0.1%) | 1,868 (0.1%) | 2,091 (0.3%) |
| Indian | 7 (0.1%) | 1,711 (1.1%) | 20,613 (1.0%) | 4,338 (0.7%) |
| Unknown | 1,277 (17.6%) | 36,696 (23.2%) | 781,074 (36.1%) | 46,400 (7.5%) |
Race categories listed are based on classification options originating from the Synthetic Derivative (SD). Our BioVU dataset contained no individuals labeled Other (O). Vanderbilt University Medical Center is located in Davidson County, Tennessee; 2010 US Census data are shown for the county [25].
* For Davidson County, “Asian/Pacific” includes Asian (Non-Indian), Native Hawaiian, and Pacific Islander individuals; “Native American” includes Native American (American Indian) and Alaskan Native individuals; “Indian” includes Asian Indian individuals; and “Unknown” includes ‘some other race’ and individuals who reported two or more races for the census.
** “Hispanic” is not listed as a race in the US Census; rather, Hispanic origin is indicated separately and is not exclusive to any racial category. For example, 25,156 individuals in Davidson County who self-identified as ‘White’ also self-identified, separately, as Hispanic. Within Davidson County, 9.8% of individuals indicated Hispanic origin.
Percentages of each administratively-assigned race assigned to each genetic ancestry group.
| Assigned Race (rows) / Genetic Ancestry (columns) | European | African | East Asian | Hispanic | South Asian |
| Caucasian | 4,174 (98.6%) | 24 (0.6%) | 8 (0.2%) | 16 (0.4%) | 10 (0.2%) |
| African American | 11 (1.0%) | 1,080 (98.7%) | 0 (0.0%) | 3 (0.3%) | 0 (0.0%) |
| Asian/Pacific | 9 (3.9%) | 0 (0.0%) | 182 (79.8%) | 2 (0.9%) | 35 (15.4%) |
| Hispanic | 58 (25.2%) | 8 (3.5%) | 2 (0.8%) | 154 (67.0%) | 8 (3.5%) |
| Native American | 90 (48.9%) | 17 (9.2%) | 18 (9.8%) | 18 (9.8%) | 41 (22.3%) |
| Indian | 3 (42.8%) | 2 (28.6%) | 0 (0.0%) | 0 (0.0%) | 2 (28.6%) |
| Unknown | 1,126 (88.3%) | 83 (6.5%) | 26 (2.0%) | 21 (1.6%) | 21 (1.6%) |
Percentages reflect the proportion of individuals assigned to a genetic ancestry cluster for given administratively-assigned race.
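The row-wise percentages can be recomputed from the counts in the table above. The following is a minimal sketch, not the authors' code; the names `ANCESTRIES`, `COUNTS`, and `row_percentages` are illustrative. Each percentage is a cell count divided by its row total, rounded to one decimal place. A few printed values (for example 0.8% and 42.8%) differ from plain rounding by 0.1.

```python
# Counts per administratively-assigned race (rows), in the order of the
# genetic ancestry clusters listed in ANCESTRIES (columns).
ANCESTRIES = ["European", "African", "East Asian", "Hispanic", "South Asian"]

COUNTS = {
    "Caucasian":        [4174, 24, 8, 16, 10],
    "African American": [11, 1080, 0, 3, 0],
    "Asian/Pacific":    [9, 0, 182, 2, 35],
    "Hispanic":         [58, 8, 2, 154, 8],
    "Native American":  [90, 17, 18, 18, 41],
    "Indian":           [3, 2, 0, 0, 2],
    "Unknown":          [1126, 83, 26, 21, 21],
}


def row_percentages(counts):
    """Return, for each race, the percentage of its individuals in each cluster."""
    result = {}
    for race, row in counts.items():
        total = sum(row)
        result[race] = [round(100 * n / total, 1) for n in row]
    return result


for race, pcts in row_percentages(COUNTS).items():
    cells = ", ".join(f"{a}: {p:.1f}%" for a, p in zip(ANCESTRIES, pcts))
    print(f"{race} (n={sum(COUNTS[race]):,}): {cells}")
```

The row totals printed by this sketch should equal the Study Sample counts in the distribution table, which is a quick consistency check between the two tables.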
Figure 1. Comparison of administratively-assigned race and genetic ancestry, based on principal component analysis.
A) All pairwise combinations of principal components (PCs) 1 through 3, by administratively-assigned race. B) All pairwise combinations of PCs 1 through 3, by cluster assignments corresponding to genetic ancestry. Comparing Frames 1A and 1B identifies individuals whose administratively-assigned race differs from their genetically defined ancestry cluster. For example, the East Asian-descent cluster (1B; blue) contains individuals with administratively-assigned race (1A) of Caucasian (green), Hispanic (purple), and Other (orange).
Agreement between genetic and assigned ancestry.
| Genetic Ancestry | Overall | Male | Female |
| Overall | 0.872 (0.009) | 0.862 (0.015) | 0.876 (0.012) |
| European-descent | 0.906 (0.013) | 0.906 (0.020) | 0.904 (0.017) |
| African-descent | 0.964 (0.013) | 0.970 (0.020) | 0.960 (0.017) |
| East Asian-descent | 0.825 (0.013) | 0.800 (0.020) | 0.836 (0.017) |
| Hispanic-descent | 0.718 (0.013) | 0.683 (0.020) | 0.738 (0.017) |
| South Asian-descent | 0.284 (0.012) | 0.237 (0.018) | 0.318 (0.016) |
Notation: Cohen's Kappa coefficient (standard error).
South Asian-descent includes individuals with Native American and Indian race codes in BioVU.
Samples with administratively-assigned race of “Unknown” were excluded from this analysis.
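As an illustration of how these agreement values relate to the cross-tabulated counts, here is a minimal sketch of Cohen's Kappa, $\kappa = (p_o - p_e)/(1 - p_e)$, computed from the race-by-ancestry counts. It is not the authors' code and rests on assumptions: the mapping `RACE_TO_ANCESTRY` (with Native American and Indian pooled into South Asian-descent, following the note above), the one-vs-rest collapse used for per-ancestry values, and all function names are illustrative. "Unknown" samples are excluded. Standard errors are not computed.

```python
ANCESTRIES = ["European", "African", "East Asian", "Hispanic", "South Asian"]

# Counts per assigned race (rows) across genetic ancestry clusters (columns).
COUNTS = {
    "Caucasian":        [4174, 24, 8, 16, 10],
    "African American": [11, 1080, 0, 3, 0],
    "Asian/Pacific":    [9, 0, 182, 2, 35],
    "Hispanic":         [58, 8, 2, 154, 8],
    "Native American":  [90, 17, 18, 18, 41],
    "Indian":           [3, 2, 0, 0, 2],
}

# Assumption: the ancestry group each race code is expected to match.
RACE_TO_ANCESTRY = {
    "Caucasian": "European",
    "African American": "African",
    "Asian/Pacific": "East Asian",
    "Hispanic": "Hispanic",
    "Native American": "South Asian",
    "Indian": "South Asian",
}


def confusion_matrix():
    """Pool race rows into a square matrix: expected ancestry x genetic ancestry."""
    k = len(ANCESTRIES)
    matrix = [[0] * k for _ in range(k)]
    for race, row in COUNTS.items():
        i = ANCESTRIES.index(RACE_TO_ANCESTRY[race])
        for j, n in enumerate(row):
            matrix[i][j] += n
    return matrix


def cohen_kappa(matrix):
    """Cohen's Kappa for a square contingency table of counts."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_obs = sum(matrix[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def one_vs_rest(matrix, idx):
    """Collapse the table to 2x2: category idx versus all other categories."""
    n = sum(sum(row) for row in matrix)
    both = matrix[idx][idx]
    row = sum(matrix[idx])
    col = sum(r[idx] for r in matrix)
    return [[both, row - both], [col - both, n - row - col + both]]


matrix = confusion_matrix()
print(f"Overall: {cohen_kappa(matrix):.3f}")
for idx, name in enumerate(ANCESTRIES):
    print(f"{name}-descent: {cohen_kappa(one_vs_rest(matrix, idx)):.3f}")
```

Under these assumptions the sketch gives an overall Kappa that rounds to the reported 0.872, and the per-ancestry values agree with the Overall column to within about 0.001. That suggests, but does not confirm, that the reported values were derived in a similar way.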