Identification of Factors Associated With Variation in US County-Level Obesity Prevalence Rates Using Epidemiologic vs Machine Learning Models.

David Scheinker¹,², Areli Valencia³, Fatima Rodriguez⁴.

Abstract

Importance: Obesity is a leading cause of high health care expenditures, disability, and premature mortality. Previous studies have documented geographic disparities in obesity prevalence.

Objective: To identify county-level factors associated with obesity using traditional epidemiologic and machine learning methods.

Design, Setting, and Participants: Cross-sectional study using linear regression models and machine learning models to evaluate the associations between county-level obesity and county-level demographic, socioeconomic, health care, and environmental factors, using summary statistics for each of 3138 US counties extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data. The explanatory power of the linear multivariate regression and the top-performing machine learning model was compared using mean R² measured in 30-fold cross validation.

Exposures: County-level demographic factors (population; rural status; census region; and race/ethnicity, sex, and age composition), socioeconomic factors (median income, unemployment rate, and percentage of population with some college education), health care factors (rate of uninsured adults and primary care physicians), and environmental factors (access to healthy foods and access to exercise opportunities).

Main Outcomes and Measures: County-level obesity prevalence in 2018, its association with each county-level factor, and the percentage of variation in county-level obesity prevalence explained by linear multivariate and gradient boosting machine regression, measured with R².

Results: Among the 3138 counties studied, the mean (range) obesity prevalence was 31.5% (12.8%-47.8%). In multivariate regressions, demographic factors explained 44.9% of variation in obesity prevalence; socioeconomic factors, 33.0%; environmental factors, 15.5%; and health care factors, 9.1%. The county-level factors with the strongest association with obesity were census region, median household income, and percentage of population with some college education. R² values of univariate regressions of obesity prevalence were 0.238 for census region, 0.218 for median household income, and 0.160 for percentage of population with some college education. Multivariate linear regression and gradient boosting machine regression (the best-performing machine learning model) of obesity prevalence using all county-level demographic, socioeconomic, health care, and environmental factors had R² values of 0.58 and 0.66, respectively (P < .001).

Conclusions and Relevance: Obesity prevalence varies significantly between counties. County-level demographic, socioeconomic, health care, and environmental factors explain the majority of variation in county-level obesity prevalence. Machine learning models may explain significantly more of the variation in obesity prevalence.

Year: 2019; PMID: 31026030; PMCID: PMC6487629; DOI: 10.1001/jamanetworkopen.2019.2884

Source DB: PubMed; Journal: JAMA Netw Open; ISSN: 2574-3805

Introduction

Obesity, defined as a body mass index (BMI, calculated as weight in kilograms divided by height in meters squared) of 30 or greater, is a leading risk factor for and contributor to morbidity and mortality.[1,2] Prior research has linked the obesity epidemic to cardiovascular disease, cancer, and premature mortality. Geographic disparities in obesity prevalence have been documented and associated with demographic, urbanization, socioeconomic, health care, and environmental factors.[3,4,5,6,7] The Centers for Disease Control and Prevention (CDC) has published updated statistics on obesity prevalence by age, education, and state.[8] The Robert Wood Johnson Foundation County Health Rankings (CHR)[9] used these and other data to interpolate 2018 county-level estimates, which make it possible to build statistical models of how county-level factors are associated with obesity prevalence. Obesity is a multifactorial problem resulting from individual, community, and geographic influences.[4,10,11] To better inform public health strategies to combat the obesity epidemic, it is important to understand how county-level factors are associated with obesity prevalence. Previous studies have used traditional epidemiologic methods and factors to explore geographic disparities in obesity.[4,5] Machine learning has been proposed as an appealing alternative for building models of obesity with more predictive power than linear regression. A trade-off is that most machine learning models are based on mathematical functions that do not have readily interpretable variable coefficients.[7,12,13,14] Our objective was to determine which factors best explain county-level variation in 2018 obesity prevalence and whether traditional epidemiologic methods or machine learning methods are better suited for doing so.
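As a concrete illustration of the BMI definition above, the following minimal Python sketch computes BMI and applies the obesity cutoff of 30 or greater used in the CHR data. The function names `bmi` and `is_obese` are hypothetical and are not part of any study code.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_m: float) -> bool:
    """Obesity as defined for the CHR data: a BMI of 30 or greater."""
    return bmi(weight_kg, height_m) >= 30.0


# Example: 95 kg at 1.75 m gives a BMI of about 31.0, which meets the cutoff.
print(round(bmi(95, 1.75), 1))  # 31.0
print(is_obese(95, 1.75))       # True
```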

Methods

Data Sources

This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cross-sectional studies. We used data from the 2018 Robert Wood Johnson Foundation CHR.[9] The CHR is an annually produced county-level data set based on a statistical compilation and interpolation of data from the Behavioral Risk Factor Surveillance System, the Dartmouth Institute, the American Community Survey, the CDC Diabetes Interactive Atlas, CDC WONDER mortality data, the Centers for Medicare & Medicaid Services National Provider Identification, the US Census, the US Department of Agriculture Food Environment Atlas, and the US Department of Education. The CHR annual county-level rate of obesity is the interpolated county-level percentage of survey respondents whose self-reported height and weight in the Behavioral Risk Factor Surveillance System correspond to a BMI of 30 or greater.[2,9] The CHR county-level factors include demographic factors (population, percentage rural, percentage female, percentage younger than 18 years, percentage 65 years and older, percentage African American, percentage Hispanic, percentage Asian, percentage American Indian/Alaskan Native, and percentage Native Hawaiian/Other); socioeconomic factors (median household income, percentage of children in poverty, percentage with some college education, percentage food insecure, percentage unemployed, and percentage with severe housing problems); health care factors (percentage of adults uninsured and primary care provider rate); and environmental factors (percentage with access to exercise opportunities and food environment index). The CHR data were merged with US Census data to identify each county's census region. A detailed list of the factors, their definitions, and the original data sources on which they are based is included in eTable 1 in the Supplement.
This study was based on publicly available and unidentifiable data; thus, Stanford’s institutional review board determined it exempt from review and waived consent.

Statistical Analysis

Discrepancies in county names were reconciled using the latest US Census data. Counties missing data for any factor considered were omitted from the training and testing of the linear regression models but were included in the training and testing of machine learning algorithms that allow for missing data, such as gradient boosting machine (GBM) regression. For each pair of county-level factors with a pairwise linear correlation of 0.8 or greater, the factor with the weaker association with obesity prevalence, as measured by univariate regression, was excluded. We also excluded factors whose association with obesity would have been rendered uninterpretable by endogeneity (ie, if the errors in the estimates of those factors were likely to be correlated with the errors in the estimate of county-level obesity prevalence).[15] These were county-level factors whose values were estimated from the same survey used to estimate obesity prevalence (the Behavioral Risk Factor Surveillance System). To reduce the influence of outliers and improve the interpretability of the regression coefficients, continuous variables with skewed distributions and values much greater than 100 (eg, population) were log normalized and then scaled to have a maximum value of 100. Details of county name changes, counties with missing data, data exclusions, and data normalization are provided in eTable 2 in the Supplement. Univariate linear regression models were used to determine the association between county-level obesity prevalence and each prespecified individual factor. Multivariate linear regression models were fit separately for each group of factors (demographic, socioeconomic, health care, and environmental) and for all 4 groups combined.
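The two preprocessing rules described above, log normalization followed by scaling to a maximum of 100 and dropping the weaker member of each highly correlated pair, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code. The exact log transform is not given in the text, so `log_scale_to_100` assumes 100 × log(x) / max(log(x)) for values greater than 1 (the ratio of logs does not depend on the log base). `prune_correlated` assumes the 0.8 threshold applies to the absolute correlation and visits pairs from most to least correlated. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def log_scale_to_100(values):
    """Log-transform, then rescale so the largest transformed value is 100.

    Assumed formula: 100 * log(x) / max(log(x)). Requires every value > 1 so
    that all logs are positive; the study's exact transform may differ.
    """
    logs = [math.log(v) for v in values]
    top = max(logs)
    return [100.0 * v / top for v in logs]


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(features, outcome, threshold=0.8):
    """Drop the weaker factor from each pair with |r| >= threshold.

    "Weaker" means a smaller univariate R^2 against the outcome; for a single
    continuous predictor, that R^2 equals the squared correlation.
    """
    r2 = {name: pearson_r(col, outcome) ** 2 for name, col in features.items()}
    pair_r = {
        (a, b): abs(pearson_r(features[a], features[b]))
        for a, b in combinations(features, 2)
    }
    dropped = set()
    # Visit the most strongly correlated pairs first (an assumption).
    for (a, b), r in sorted(pair_r.items(), key=lambda item: -item[1]):
        if r < threshold or a in dropped or b in dropped:
            continue
        dropped.add(a if r2[a] < r2[b] else b)
    return [name for name in features if name not in dropped]


# Hypothetical toy data: two near-duplicate income measures and one unrelated factor.
obesity = [30.0, 32.0, 35.0, 28.0, 33.0]
features = {
    "income": [60.0, 55.0, 45.0, 70.0, 50.0],
    "income_alt": [61.0, 54.0, 46.0, 69.0, 52.0],
    "rural": [10.0, 80.0, 30.0, 50.0, 20.0],
}
kept = prune_correlated(features, obesity)
print(kept)  # one of the two income measures is dropped; "rural" is kept
```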
The distributions of obesity prevalence in the different census regions were compared using the Kolmogorov-Smirnov test with multiple-testing correction. We compared the percentage of variation in obesity prevalence explained by several machine learning models using all demographic, socioeconomic, health care, and environmental factors. The models were GBM; regression trees; random forest; a linear model chosen using the Akaike information criterion, the Bayesian information criterion, and their variants from among all models including each factor and each second-order interaction between factors; and a penalized linear model chosen using elastic net variants of the least absolute shrinkage and selection operator (LASSO) from among all models including each available factor and each second-order interaction between factors.[16,17,18,19] To balance underfitting and overfitting (ie, bias and variance), the parameters of each model were tuned using 5-fold cross validation on a training data set of 1000 counties randomly selected from the original data. The training data were divided into 5 subsets, or folds. For each model, all combinations of parameter values from predetermined ranges were combined into a search grid, and the combinations in the grid were evaluated sequentially. For example, for GBM, the parameters and their ranges were 11 values for the number of trees (150, 160, 170, ..., 250); 6 values for interaction depth (10, 12, 14, ..., 20); 5 values for shrinkage (0.01, 0.02, ..., 0.05); and 5 values for the minimum number of observations in a node (2, 4, 6, ..., 10), for a total of 1650 (11 × 6 × 5 × 5) combinations of parameter values (see eTable 3, eFigure 1, and eFigure 2 in the Supplement for details of the parameter tuning of the other models). For each combination of parameter values in the grid, 1 fold was held out for testing, the model was trained on the other 4 folds, and the R² was evaluated on the held-out fold.
This was repeated 5 times, once with each fold held out, and the average of the R² values over the 5 testing folds was reported (ie, no model was tested on the data on which it had been trained). The top-performing model and its parameters were selected based on mean R².
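The size of the GBM search grid and the fold logic described above can be checked with a short Python sketch. The grid values come from the text; the parameter names in the comments mirror the tuning parameters of caret's `gbm` method (`n.trees`, `interaction.depth`, `shrinkage`, `n.minobsinnode`). `k_fold_indices`, `cross_validated_r2`, `fit_line`, and `predict_line` are hypothetical helpers: they assume the rows were shuffled beforehand, and they compute R² as 1 − SS_res/SS_tot on each held-out fold, whereas caret's default summary reports R² as the squared correlation between observed and predicted values, so numbers may differ slightly. A one-predictor least-squares line stands in for the study's models.

```python
from itertools import product

# GBM tuning ranges as listed in the text.
n_trees = list(range(150, 251, 10))                    # n.trees: 150, 160, ..., 250
interaction_depth = list(range(10, 21, 2))             # interaction.depth: 10, 12, ..., 20
shrinkage = [round(0.01 * k, 2) for k in range(1, 6)]  # shrinkage: 0.01, ..., 0.05
min_obs_in_node = list(range(2, 11, 2))                # n.minobsinnode: 2, 4, ..., 10

grid = list(product(n_trees, interaction_depth, shrinkage, min_obs_in_node))
print(len(grid))  # 1650 = 11 x 6 x 5 x 5


def k_fold_indices(n, k=5):
    """Split row indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_r2(x, y, fit, predict, k=5):
    """Mean held-out R^2; each fold is scored by a model trained on the other k - 1."""
    scores = []
    for test in k_fold_indices(len(y), k):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in test]
        preds = [predict(model, x[i]) for i in test]
        mean_obs = sum(obs) / len(obs)
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, preds))
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        scores.append(1.0 - ss_res / ss_tot)
    return sum(scores) / len(scores)


def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor (stand-in model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return my - slope * mx, slope


def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x
```

Because every row lands in exactly one test fold, each model is scored only on data it was not trained on, which is the property the text emphasizes.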

Comparison of Linear and Top Performing Models

The amount of variation in obesity prevalence explained by demographic, socioeconomic, health care, and environmental factors using linear regression and the top-performing machine learning model was compared using 30-fold cross validation. The R² values from the 30 held-out data sets were compared using the paired Wilcoxon signed-rank test, the nonparametric alternative to the paired t test. To test whether additional county-level factors beyond those described above explained more of the variation in obesity prevalence, the comparison was repeated using all variables available in the data set. All analyses were performed using R version 3.5.1 (The R Foundation), RStudio version 1.0.143, and the R package caret.[20] Statistical significance was set at 2-sided P < .05.
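The paired comparison of per-fold R² values can be sketched in Python. This is a minimal illustration, not the authors' R code: it computes the signed-rank statistic W+ with average ranks for tied absolute differences and uses the large-sample normal approximation for a 2-sided P value, without continuity or tie corrections. R's `wilcox.test` computes an exact P value by default when there are fewer than 50 pairs and no ties, so results on 30 folds may differ somewhat. The function name and the per-fold values used below are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """2-sided paired Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped; tied |differences| share their average rank.
    Returns (W_plus, p_value). Requires at least one nonzero difference.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p


# Hypothetical per-fold R^2 values in which GBM beats the linear model in all 30 folds.
gbm = [0.66 + 0.001 * i for i in range(30)]
linear = [0.58 + 0.001 * i for i in range(30)]
w, p = wilcoxon_signed_rank(gbm, linear)
```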

Results

Among the 3138 counties studied, the mean (range) obesity prevalence was 31.5% (12.8%-47.8%) (Figure 1A). The 25th, 50th, and 75th percentiles of 2018 county-level obesity prevalence were 28.8%, 31.8%, and 34.4%, respectively. Mean obesity prevalence was 32.9% in the South census region, 32.2% in the Midwest, 28.6% in the Northeast, and 26.6% in the West. The distribution of obesity prevalence differed significantly between regions (P < .001) (Figure 1B).
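The regional comparison of distributions can be sketched in pure Python. The text states only that a Kolmogorov-Smirnov test with multiple-testing correction was used; this sketch assumes a 2-sample test with the asymptotic Kolmogorov P value and a Bonferroni correction over every pair of regions (6 pairs for 4 census regions), which may differ from the correction the authors applied. All names and the toy data in the check are hypothetical.

```python
import math
from itertools import combinations


def ks_statistic(a, b):
    """2-sample KS statistic D: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] == x:  # step both empirical CDFs past the value x
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic 2-sided P value from the Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.1:
        return 1.0  # the series converges slowly here and P is essentially 1
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def pairwise_ks_bonferroni(groups):
    """Compare every pair of groups; multiply each P value by the number of pairs."""
    pairs = list(combinations(sorted(groups), 2))
    results = {}
    for g1, g2 in pairs:
        a, b = groups[g1], groups[g2]
        d = ks_statistic(a, b)
        p = ks_pvalue(d, len(a), len(b))
        results[(g1, g2)] = (d, min(1.0, p * len(pairs)))
    return results
```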

Figure 1.

Distribution of Obesity Prevalence by County and Census Region

A, Map of US counties by obesity prevalence. B, Density plot of county-level obesity prevalence in each US Census region.

The greatest variation in county-level obesity prevalence, as measured by R² in univariate regression, was explained by census region (23.8%), normalized median household income (21.8%), and percentage of population with some college education (16.0%). Details of univariate regressions for these and all other factors appear in Table 1. In multivariate regressions, demographic factors explained 44.9% of variation in obesity prevalence; socioeconomic factors, 33.0%; environmental factors, 15.5%; and health care factors, 9.1% (Table 2). Multivariate linear regression and gradient boosting machine regression (the best-performing machine learning model) of obesity prevalence using all county-level demographic, socioeconomic, health care, and environmental factors had R² values of 58.0% and 66.0%, respectively (P < .001). The changes in obesity prevalence associated with a 1-percentage-point or 1-unit change in each factor, controlling for all other factors, are shown in Table 3.
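For a categorical factor such as census region, a univariate regression with dummy-coded levels fits each county its region's mean, so its R² equals the between-region share of the total sum of squares. A minimal Python sketch of that identity follows; the function name and the toy values are hypothetical and are not study data.

```python
def categorical_r2(groups, values):
    """R^2 of regressing values on a categorical factor (intercept + dummies).

    The fitted value for each observation is its group mean, so
    R^2 = 1 - SS_within / SS_total = SS_between / SS_total.
    """
    grand = sum(values) / len(values)
    members = {}
    for g, v in zip(groups, values):
        members.setdefault(g, []).append(v)
    means = {g: sum(vs) / len(vs) for g, vs in members.items()}
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = sum((v - means[g]) ** 2 for g, v in zip(groups, values))
    return 1.0 - ss_within / ss_total


# Toy example: two regions with two counties each.
print(categorical_r2(["South", "South", "West", "West"], [33.0, 35.0, 26.0, 28.0]))
```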

Table 1.

Variables Included in the Regression Analysis With Summary Statistics and Univariate Regression Results for 2018 County-Level Obesity Prevalence

Variable	Summary Statistics, Mean (SD) [Range], %	Univariate Regression Coefficient (SE)	Univariate Regression R²
Demographic factors
Population	59.1 (10.4) [16.7-100]	−0.0721 (0.0076)^a	0.0277
Rural	58.6 (31.5) [0-100]	0.0345 (0.0025)^a	0.0579
Female	49.9 (2.3) [27.8-56.5]	0.1195 (0.0354)^a	0.0036
Aged <18 y	22.3 (3.5) [0-40.9]	0.2495 (0.0228)^a	0.0369
Aged ≥65 y	18.4 (4.6) [4.6-56.3]	−0.0624 (0.0176)^a	0.0040
African American	9.0 (14.3) [0-85.2]	0.1042 (0.0053)^a	0.1092
Hispanic	9.3 (13.7) [0.5-96.3]	−0.0946 (0.0057)^a	0.0820
Asian	1.5 (2.9) [0-44.3]	−0.5008 (0.0268)^a	0.1005
American Indian/Alaskan Native	2.3 (7.7) [0-93.1]	0.0568 (0.0105)^a	0.0093
Native Hawaiian/other	0.1 (1.0) [0-50]	−0.3945 (0.0815)^a	0.0074
Census region			0.2377
Midwest	NA	32.2 (3.0)^a,b	NA
Northeast	NA	28.6 (4.0)^a,b	NA
South	NA	32.9 (4.2)^a,b	NA
West	NA	32.4 (4.8)^a,b	NA
Socioeconomic factors
Household income^c	91.3 (2.1) [84.7-100]	−1.0254 (0.0347)^a	0.2179
Some college	57.2 (11.6) [15.5-94.0]	−0.1563 (0.0064)^a	0.1597
Food insecure	14.1 (4.2) [3.4-37.9]	0.4065 (0.0176)^a	0.1455
Unemployed	5.3 (1.9) [1.7-23.5]	0.6532 (0.0412)^a	0.0743
Severe housing problems	14.5 (4.8) [2.7-70.1]	−0.1626 (0.0166)^a	0.0297
Health care factors
Uninsured	12.0 (5.1) [2.1-37.4]	−0.0571 (0.0158)^a	0.0041
Primary care physician rate^c	12.1 (7.7) [0-100]	−0.1769 (0.0102)^a	0.0907
Environmental factors
Access to exercise opportunities	63.0 (23.2) [0-100]	−0.0694 (0.0033)^a	0.1269
Food environment index	7.4 (1.2) [0-10.0]	−1.0379 (0.0657)^a	0.0741

Abbreviation: NA, not applicable because variables were not used in the corresponding model.

^a P < .01.

^b Mean (SD) reported.

^c Variables were log normalized and scaled to have a maximum value of 100.

Table 2.

Multivariate Regression Results

Variable	Demographic Factors^a	Socioeconomic Factors^b	Health Care Factors^c	Environmental Factors^d	Combined^e
Observations, No.	3135	3137	3003	3117	2984
R²	0.452	0.331	0.092	0.156	0.603
Adjusted R²	0.449	0.330	0.091	0.155	0.600

^a Demographic factors include population, percentage rural, percentage female, percentage younger than 18 years, percentage 65 years and older, percentage African American, percentage Hispanic, percentage Asian, percentage American Indian/Alaskan Native, percentage Native Hawaiian/other, and census region.

^b Socioeconomic factors include household income, percentage of children in poverty, percentage with some college, percentage food insecure, percentage unemployed, and percentage with severe housing problems.

^c Health care factors include percentage uninsured and primary care physician rate.

^d Environmental factors include percentage with access to exercise opportunities and food environment index.

^e Combined includes all factors.
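The percentages quoted in the Results for each factor group match the adjusted R² row of Table 2 (for example, 0.449 for demographic factors), which penalizes R² for the number of predictors. The standard formula is sketched below in Python. The number of predictors p must count every dummy variable (for example, 3 for the 4 census regions); because the exact predictor count of each model is not stated here, the sketch is not tied to the Table 2 values, and the function name is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n: number of observations; p: number of predictors, excluding the intercept.
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

With thousands of counties and a few dozen predictors, the penalty is small, which is why R² and adjusted R² in Table 2 differ only in the third decimal place.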

Table 3.

Multivariate Regression

Values are coefficient (SE), %.
Variable	Model 1	Model 2	Model 3	Model 4	Model 5^a
Demographic factors
Population	0.007 (0.010)				0.004 (0.010)
Rural	0.018 (0.003)^b				−0.005 (0.003)
Female	−0.170 (0.034)^b				0.031 (0.034)
Aged <18 y	0.351 (0.029)^b				0.297 (0.028)^b
Aged ≥65 y	0.016 (0.023)				−0.080 (0.021)^b
African American	0.082 (0.005)^b				0.055 (0.006)^b
Hispanic	−0.073 (0.005)^b				−0.071 (0.006)^b
Asian	−0.256 (0.025)^b				−0.047 (0.027)
American Indian/Alaskan Native	0.057 (0.009)^b				0.076 (0.010)^b
Native Hawaiian/other	0.160 (0.064)^c				0.307 (0.145)^c
Census region
Northeast	−1.876 (0.271)^b				−1.777 (0.242)^b
South	0.184 (0.163)				−0.473 (0.166)^b
West	−4.390 (0.208)^b				−3.899 (0.199)^b
Socioeconomic factors
Household income		−0.340 (0.052)^b			−0.667 (0.051)^b
Some college		−0.073 (0.008)^b			−0.077 (0.008)^b
Food insecure		0.310 (0.023)^b			−0.017 (0.033)
Unemployed		0.151 (0.045)^b			0.199 (0.039)^b
Severe housing problems		−0.305 (0.016)^b			−0.156 (0.017)^b
Health care factors
Uninsured			0.003 (0.016)		−0.139 (0.017)^b
Primary care physician rate			−0.177 (0.010)^b		−0.044 (0.008)^b
Environmental factors
Access to exercise opportunities				−0.687 (0.066)^b	−0.009 (0.003)^b
Food environment index				−0.058 (0.003)^b	0.025 (0.088)
Observations, No.	3135	3137	3003	3117	2984
R²	0.452	0.331	0.092	0.156	0.603
Adjusted R²	0.449	0.330	0.091	0.155	0.600

Abbreviation: SE, standard error.

^a Combined category includes all factors.

^b P < .01.

^c P < .05.

Comparison of Machine Learning Regression Models

Gradient boosting machine outperformed random forest, regression tree, and models selected using variants of the Akaike information criterion, Bayesian information criterion, and LASSO, as measured by R² in 5-fold cross validation. The top-performing model was GBM, with an R² of 0.65. The next-best model was LASSO with all second-order variable interactions, with an R² of 0.64. The parameters of the GBM model with the highest R² were number of trees = 180, interaction depth = 20, shrinkage = 0.05, and minimum number of observations in node = 8. See eFigure 1 and eFigure 2 in the Supplement for the performance of GBM and LASSO across a variety of parameter settings, eTable 3 in the Supplement for the top performance and corresponding parameters of each model considered, and eFigure 3 in the Supplement for the relative importance of the variables in the GBM model.
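To make the tuned parameters concrete, the following pure-Python sketch shows gradient boosting for squared-error loss: each round fits a small tree to the current residuals and adds it to the prediction scaled by the shrinkage (learning rate), with every leaf required to hold at least a minimum number of observations. It is a deliberately simplified stand-in, not the implementation the study used (its parameter names match those of caret's `gbm` method): every tree here is a single split (depth 1), so it cannot represent the interactions that an interaction depth of 20 allows. Function names are hypothetical; the defaults echo the selected values of 180 trees, shrinkage 0.05, and a minimum of 8 observations per node.

```python
def fit_stump(rows, residuals, min_obs):
    """Find the single split that minimizes the squared error of the residuals.

    Returns (feature, threshold, left_value, right_value), or None when no
    split leaves at least min_obs observations on each side.
    """
    best, best_sse = None, float("inf")
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[f] <= t]
            right = [r for row, r in zip(rows, residuals) if row[f] > t]
            if len(left) < min_obs or len(right) < min_obs:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = (f, t, lm, rm), sse
    return best


def gradient_boost(rows, ys, n_trees=180, shrinkage=0.05, min_obs=8):
    """Boosted stumps; residuals are the negative gradient of half the squared error."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(rows, residuals, min_obs)
        if stump is None:
            break
        f, t, lm, rm = stump
        stumps.append(stump)
        # Add a shrunken copy of the new tree to every prediction.
        preds = [p + shrinkage * (lm if row[f] <= t else rm) for row, p in zip(rows, preds)]
    return base, shrinkage, stumps


def predict(model, row):
    """Sum the base value and every shrunken stump output for one row."""
    base, shrinkage, stumps = model
    return base + sum(shrinkage * (lm if row[f] <= t else rm) for f, t, lm, rm in stumps)
```

Smaller shrinkage values make each tree contribute less, so more trees are needed to reach the same fit; this is why the number of trees and the shrinkage are tuned jointly in the grid.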

Comparison of Linear Multivariate and GBM Regression Models

When trained on all demographic, socioeconomic, environmental, and health care access factors and tested on new data, the linear multivariate and GBM regression models explained 58.1% and 66.1% of the variation in obesity prevalence, respectively (P < .001) (Figure 2). Adding county-level factors beyond those described led to small mean increases in the percentage of variation explained by each model; the increase was significant for the linear model but not for the GBM model (eTable 4 in the Supplement).

Figure 2.

Comparison of Performance of Gradient Boosting Machine Regression and Linear Multivariate Regression Using 30-Fold Cross Validation

Violin plots of the distribution of the R² values of the gradient boosting machine and linear models. The box plots inside each violin plot summarize the distribution of R² for each model: the middle lines indicate the medians; the bottom and top of each box show the 25th and 75th percentiles, respectively; the bottom whiskers show the 25th percentile minus 1.5 × the interquartile range; the top whiskers show the 75th percentile plus 1.5 × the interquartile range; and the points below and above the whiskers are outliers.
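The box-plot quantities named in the caption can be computed as in the Python sketch below. It assumes linearly interpolated quartiles (`statistics.quantiles` with `method="inclusive"`, which corresponds to R's default quantile type 7) and flags points beyond 1.5 × the interquartile range from the box as outliers; plotting libraries may instead draw whiskers only to the most extreme observation inside those limits. The function name and data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, 1.5 x IQR limits, and the points lying beyond those limits."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3,
            "lower": lower, "upper": upper, "outliers": outliers}


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```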

Discussion

Using 2018 national county-level data, we found that county-level obesity prevalence showed significant geographic heterogeneity and that this heterogeneity was largely explained by county-level demographic, socioeconomic, health care, and environmental factors. Using traditional epidemiologic approaches, these factors explained 58% of the variation in obesity prevalence at the county level; using a machine learning approach, they explained two-thirds of the variation. Demographic and socioeconomic factors explained the largest shares of the variation in county-level obesity (44.9% and 33.0%, respectively, in multivariate regressions). The individual factors that explained the greatest percentage of variation were census region (Northeast, South, West, Midwest), median household income, and percentage of population with some college education. These findings are consistent with previous studies that have identified significant geographic disparities in obesity prevalence.[4,10] In particular, the South has been strongly positively associated with obesity prevalence.[10] Census region remained significantly associated with obesity prevalence after adjustment for all available factors. This suggests substantive regional differences in obesity prevalence beyond those explained by demographic or socioeconomic factors.
The association of county-level obesity prevalence with median household income and percentage of population with some college education accords with studies documenting an inverse relationship between socioeconomic status and obesity prevalence.[3,4,6] Socioeconomic differences in physical activity are attributable in part to differences in access to recreational resources and perceived neighborhood safety.[21,22] Our finding that the percentage of African American individuals in the population explained more than 10% of the variation in county-level obesity is noteworthy and concordant with other studies.[4,10,23] This association may partly reflect the higher proportion of African American individuals living in the South and in counties with lower median income, although the percentage of African American individuals remains an important independent predictor of obesity.[4] Counties with higher proportions of African American individuals may have fewer healthy food options and poorer opportunities for physical activity.[24,25] In contrast, there was a negative association between the percentage of Hispanic persons in a county and obesity prevalence, despite Hispanic persons having higher obesity rates than other racial/ethnic groups. Previous county-level studies[13] have also documented this negative association, although an increase in the Hispanic population has been associated with an increase in obesity prevalence.[23] Some have speculated that this may be because Hispanic populations are concentrated in regions with lower obesity prevalence.[4,10]
Our study complements and extends earlier literature by showing that machine learning may explain more of the variation in county-level obesity prevalence than traditional epidemiologic models.[7,12,13,14] To our knowledge, this is the first study to analyze national county-level data using machine learning algorithms.
Our top-performing machine learning model explained two-thirds of the variation in county-level obesity prevalence, significantly more than traditional multivariate linear models. Epidemiologic approaches that use a limited set of preselected variables may offer more interpretable results. We found that machine learning approaches significantly increased the amount of variation in obesity prevalence explained, and they may improve estimates of obesity prevalence in counties for which this information is unavailable. When weighing the interpretability of linear regression for decision making against the performance of machine learning models, 3 factors should be considered. First, multivariate regression models may appear more interpretable than they are, for example, owing to confounding variables. Second, machine learning algorithms offer partially interpretable outputs, such as variable importance (eFigure 3 in the Supplement). Third, some machine learning models offer both superior performance and interpretability on par with that of multivariate linear regression: our second-best performing model, LASSO with all second-order interactions, is substantially simpler than GBM and achieved similar performance. The take-home message from these considerations is that for some decisions there may be more benefits and fewer drawbacks to using powerful machine learning models. Each of our models, including the linear regression, performed significantly better on the data on which it was trained than on the data on which it was tested, which demonstrates the importance of evaluating performance on testing data not previously seen by the model. We measured the percentage of variation explained using 30-fold cross validation; that is, each model was trained and then tested on entirely separate held-out data 30 times.
This approach favors models that identify relationships that exist in the data over models that overfit the data with spurious mathematical relationships. It contrasts with the common practice of fitting a single model to the data and reporting its performance (eg, R²) only on the data on which it was fit.
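The gap between training and held-out performance described above can be illustrated with a deliberately overfit model. In this Python sketch, a 1-nearest-neighbor regressor (a hypothetical stand-in, not one of the study's models) memorizes its training data, so its R² on those data is exactly 1, while its R² on separate test data is lower. The data are simulated, not CHR data, and all names are hypothetical.

```python
import random


def r_squared(obs, pred):
    """1 - SS_res / SS_tot for observed and predicted values."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def fit_one_nn(train_x, train_y):
    """1-nearest-neighbor regressor: returns the y of the closest training x."""
    def predict(x):
        k = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[k]
    return predict


rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [x + rng.gauss(0, 2) for x in xs]  # linear signal plus noise
train_x, train_y = xs[:100], ys[:100]
test_x, test_y = xs[100:], ys[100:]

model = fit_one_nn(train_x, train_y)
in_sample = r_squared(train_y, [model(x) for x in train_x])
held_out = r_squared(test_y, [model(x) for x in test_x])
print(f"training R^2 = {in_sample:.2f}, held-out R^2 = {held_out:.2f}")
```

Reporting only `in_sample` would make this memorizing model look perfect; the held-out score reveals that much of that fit is noise.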

Limitations

Our findings should be interpreted in light of several limitations. Our analysis is based on CHR data, many of the fields of which are self-reported, sampled randomly from the population, and interpolated using statistical methods. It is likely that self-reported obesity underestimates obesity prevalence.[26,27] If this bias is nondifferential by county or other factors considered, our statistical results remain directionally valid. Furthermore, obesity prevalence was based on BMI, which is an indirect measure of adiposity and health risk. At the same BMI level, non-Hispanic African American adults have lower adiposity compared with non-Hispanic white adults.[28] Health risks begin at a lower BMI among Asian adults than among non-Hispanic white adults.[29] Therefore, BMI is an indirect measure of the health risks associated with increased adiposity. Our analyses and conclusions are restricted to the variables that are routinely captured in these data sets. Individual-level risk is not accounted for. Owing to the nature of the mathematical models underlying machine learning algorithms, such models do not produce readily interpretable variable coefficients. They do not establish causal relationships or make clear the reasons certain predictors are more important than others.

Conclusions

County-level demographic, socioeconomic, health care, and environmental factors explain the majority of the variation in county-level obesity prevalence. Machine learning models explain significantly more of the variation in obesity prevalence than traditional models. For decisions about obesity prevalence based on population characteristics, there may be more benefits and fewer drawbacks to using powerful machine learning models.

References

A review of machine learning in obesity.
Authors: K W DeGregory; P Kuiper; T DeSilvio; J D Pleuss; R Miller; J W Roginski; C B Fisher; D Harness; S Viswanath; S B Heymsfield; I Dungan; D M Thomas
Journal: Obes Rev; Date: 2018-02-09

Association between body-mass index and risk of death in more than 1 million Asians.
Authors: Wei Zheng; Dale F McLerran; Betsy Rolland; Xianglan Zhang; Manami Inoue; Keitaro Matsuo; Jiang He; Prakash Chandra Gupta; Kunnambath Ramadas; Shoichiro Tsugane; Fujiko Irie; Akiko Tamakoshi; Yu-Tang Gao; Renwei Wang; Xiao-Ou Shu; Ichiro Tsuji; Shinichi Kuriyama; Hideo Tanaka; Hiroshi Satoh; Chien-Jen Chen; Jian-Min Yuan; Keun-Young Yoo; Habibul Ahsan; Wen-Harn Pan; Dongfeng Gu; Mangesh Suryakant Pednekar; Catherine Sauvaget; Shizuka Sasazuki; Toshimi Sairenchi; Gong Yang; Yong-Bing Xiang; Masato Nagai; Takeshi Suzuki; Yoshikazu Nishino; San-Lin You; Woon-Puay Koh; Sue K Park; Yu Chen; Chen-Yang Shen; Mark Thornquist; Ziding Feng; Daehee Kang; Paolo Boffetta; John D Potter
Journal: N Engl J Med; Date: 2011-02-24

A comparison of direct vs. self-report measures for assessing height, weight and body mass index: a systematic review.
Authors: S Connor Gorber; M Tremblay; D Moher; B Gorber
Journal: Obes Rev; Date: 2007-07

The geographic concentration of US adult obesity prevalence and associated social, economic, and environmental factors.
Authors: Tim Slack; Candice A Myers; Corby K Martin; Steven B Heymsfield
Journal: Obesity (Silver Spring); Date: 2014-02-06

Decomposing Racial Disparities in Obesity Prevalence: Variations in Retail Food Environment.
Authors: Chelsea R Singleton; Olivia Affuso; Bisakha Sen
Journal: Am J Prev Med; Date: 2015-10-23

Regional disparities in obesity prevalence in the United States: A spatial regime analysis.
Authors: Candice A Myers; Tim Slack; Corby K Martin; Stephanie T Broyles; Steven B Heymsfield
Journal: Obesity (Silver Spring); Date: 2014-12-17

Deaths Attributable to Diabetes in the United States: Comparison of Data Sources and Estimation Approaches.
Authors: Andrew Stokes; Samuel H Preston
Journal: PLoS One; Date: 2017-01-25

Socioeconomic differences in lack of recreational walking among older adults: the role of neighbourhood and individual factors.
Authors: Carlijn Bm Kamphuis; Frank J van Lenthe; Katrina Giskes; Martijn Huisman; Johannes Brug; Johan P Mackenbach
Journal: Int J Behav Nutr Phys Act; Date: 2009-01-05

Change in Obesity Prevalence across the United States Is Influenced by Recreational and Healthcare Contexts, Food Environments, and Hispanic Populations.
Authors: Candice A Myers; Tim Slack; Corby K Martin; Stephanie T Broyles; Steven B Heymsfield
Journal: PLoS One; Date: 2016-02-05

Redrawing the US Obesity Landscape: Bias-Corrected Estimates of State-Specific Adult Obesity Prevalence.
Authors: Zachary J Ward; Michael W Long; Stephen C Resch; Steven L Gortmaker; Angie L Cradock; Catherine Giles; Amber Hsiao; Y Claire Wang
Journal: PLoS One; Date: 2016-03-08
