OBJECTIVE: Data about Arab-Americans, a growing ethnic minority, are not routinely collected in vital statistics, registry, or administrative data in the USA. The difficulty in identifying Arab-Americans using publicly available data sources is a barrier to health research about this group. Here, we validate an empirically based probabilistic Arab name algorithm (ANA) for identifying Arab-Americans in health research. DESIGN: We used data from all Michigan birth certificates between 2000 and 2005. Fathers' surnames and mothers' maiden names were coded as Arab or non-Arab according to the ANA. We calculated sensitivity, specificity, and positive (PPV) and negative predictive values (NPV) of Arab ethnicity inferred using the ANA as compared to self-reported Arab ancestry. RESULTS: Statewide, the ANA had a specificity of 98.9%, a sensitivity of 50.3%, a PPV of 57.0%, and an NPV of 98.6%. Both the false-positive and false-negative rates were higher among men than among women. As the concentration of Arab-Americans in a study locality increased, the ANA false-positive rate increased and false-negative rate decreased. CONCLUSION: The ANA is highly specific but only moderately sensitive as a means of detecting Arab ancestry. Future research should compare health characteristics among Arab-American populations defined by Arab ancestry and those defined by the ANA.
OBJECTIVE: Data about Arab-Americans, a growing ethnic minority, are not routinely collected in vital statistics, registry, or administrative data in the USA. The difficulty in identifying Arab-Americans using publicly available data sources is a barrier to health research about this group. Here, we validate an empirically based probabilistic Arab name algorithm (ANA) for identifying Arab-Americans in health research. DESIGN: We used data from all Michigan birth certificates between 2000 and 2005. Fathers' surnames and mothers' maiden names were coded as Arab or non-Arab according to the ANA. We calculated sensitivity, specificity, and positive (PPV) and negative predictive values (NPV) of Arab ethnicity inferred using the ANA as compared to self-reported Arab ancestry. RESULTS: Statewide, the ANA had a specificity of 98.9%, a sensitivity of 50.3%, a PPV of 57.0%, and an NPV of 98.6%. Both the false-positive and false-negative rates were higher among men than among women. As the concentration of Arab-Americans in a study locality increased, the ANA false-positive rate increased and false-negative rate decreased. CONCLUSION: The ANA is highly specific but only moderately sensitive as a means of detecting Arab ancestry. Future research should compare health characteristics among Arab-American populations defined by Arab ancestry and those defined by the ANA.
Authors: Michelle Housey; Peter DeGuire; Sarah Lyon-Callo; Lu Wang; Wendy Marder; W Joseph McCune; Charles G Helmick; Caroline Gordon; J Patricia Dhar; James Leisen; Emily C Somers Journal: Am J Public Health Date: 2015-03-19 Impact factor: 9.308