Yanan Long, Qi Chen, Henrik Larsson, Andrey Rzhetsky.
Abstract
The human sex ratio at birth (SRB), defined as the ratio of the number of newborn boys to the total number of newborns, is typically slightly greater than 1/2 (more boys than girls) and tends to vary across geographical regions and time periods. In this large-scale study, we sought to validate previously reported associations and test new hypotheses using statistical analysis of two very large datasets incorporating electronic medical records (EMRs). One dataset represents over half (∼ 150 million) of the US population for over 8 years (IBM Watson Health MarketScan insurance claims), while the other covers the entire Swedish population (∼ 9 million) for over 30 years (the Swedish National Patient Register). After testing more than 100 hypotheses, we showed that neither dataset supported models in which the SRB changed seasonally or in response to variations in ambient temperature. However, increased levels of a diverse array of air and water pollutants were associated with lower SRBs, including increased levels of industrial and agricultural activity, which served as proxies for water pollution. Moreover, some exogenous factors generally considered to be environmental toxins turned out to induce higher SRBs. Finally, we identified new factors with signals for either higher or lower SRBs. In all cases, the effect sizes were modest but highly statistically significant owing to the large sizes of the two datasets. We suggest that while it is unlikely that the associations arose from sex-specific selection mechanisms, they are still useful for public health surveillance if they can be corroborated by empirical evidence.
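The interplay between modest effect sizes and very large samples can be illustrated with a short sketch. It computes the SRB as defined above and applies a two-sided z-test against a reference proportion. The counts, the reference value 0.512, and the function names are hypothetical choices for illustration, not data or methods from the study.

```python
import math

def srb(boys: int, girls: int) -> float:
    """Sex ratio at birth: newborn boys divided by all newborns."""
    return boys / (boys + girls)

def z_test_proportion(boys: int, total: int, p0: float):
    """Two-sided z-test of the observed proportion of boys against p0.

    Uses the normal approximation to the binomial distribution.
    """
    p_hat = boys / total
    se = math.sqrt(p0 * (1.0 - p0) / total)
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value

# Hypothetical counts: the same 0.2 percentage-point shift at two sample sizes.
for total in (10_000, 10_000_000):
    boys = round(0.510 * total)
    z, p = z_test_proportion(boys, total, p0=0.512)
    print(f"n={total:>10,}  SRB={boys / total:.3f}  z={z:+.2f}  p={p:.3g}")
```

Under these assumed numbers the shift is indistinguishable from noise at ten thousand births but strongly significant at ten million, which is the pattern the abstract describes.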
Year: 2021 PMID: 34855745 PMCID: PMC8638995 DOI: 10.1371/journal.pcbi.1009586
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Exogenous factors reported in the literature to have an impact on the SRB [6, 14].
A “-” indicates that sample sizes were not mentioned in the articles reporting or reviewing the corresponding results.
| Exogenous Factor | Number of Studies | Sample Size |
|---|---|---|
| Dioxins | 13 | 291 |
| Polychlorinated biphenyls (PCBs) | 9 | 98 |
| 1,2-Dibromo-3-chloropropane (DBCP) | 2 | 29 |
| Dichlorodiphenyltrichloroethane (DDT) | 4 | 1623 |
| Hexachlorobenzene (HCB) | 2 | 262 |
| Vinclozolin | 1 | 95 |
| Multiple pesticides | 5 | 382 |
| Lead | 5 | 6566 |
| Methylmercury | 1 | 4808 |
| Multiple metals | 10 | 1015 |
| Non-ionizing radiation | 12 | 2926 |
| Ionizing radiation | 15 | 4959 |
| Seasonality | 2 | - |
| Ambient temperature | 4 | - |
| Economic stress | 1 | - |
| Terrorist attacks | 2 | - |
Fig 1. Airborne health-related substances and their association with the SRB.
A: Comparison of airborne pollutant concentrations across the US (cyan violin plots) and Sweden (pink violin plots). Only 4 air components are measured in both countries: fine particulate matter (PM2.5), coarse particulate matter (PM10), sulfur dioxide (SO2), and nitrogen dioxide (NO2). US counties appear to have higher mean pollution levels and greater variability in pollution. B-M: A sample of 12 single-environmental-factor logistic regression models that are most explanatory with respect to the SRB. For each environmental factor, we partition counties into 7 equal-sized groups (septiles), ordered by measurement level, so that the first septile corresponds to the lowest concentration and the seventh septile to the highest. Each plot shows the regression coefficients (bars) and 95% confidence intervals (error bars) of the second to the seventh septiles, with the first septile chosen as the reference level. The 12 models are restricted to statistically significant factors with at least one statistically significant coefficient and are ranked by decreasing ΔIC; septiles whose coefficients are not significantly different from 0 at the 95% confidence level are plotted with a reduced alpha level. Blue bars represent positive coefficients, whereas red bars represent negative coefficients. “Negative food-related businesses” is a term used by the Environmental Protection Agency’s Environmental Quality Index team and is explained as “businesses like fast-food restaurants, convenience stores, and pretzel trucks.” “Percent vacant units” stands for “percent of vacant housing units.” Substances contributing to clusters 10 and 25 are listed in Table 2. See Table K in S1 Appendix for more details regarding the factors’ and clusters’ identities.
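The septile coding described in the Fig 1 caption can be sketched as follows. The sketch assumes a model with a single categorical predictor and no other covariates, in which case the maximum-likelihood logistic regression coefficients reduce to differences in log-odds between each septile and the reference septile, with Wald standard errors computed from the four cell counts. The function names and all county data are hypothetical; the study's actual models may include further terms.

```python
import math

def septile_labels(values, n_groups=7):
    """Assign each county a group 0..n_groups-1 by the rank of its measurement."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_groups // len(values)
    return labels

def septile_coefficients(levels, boys, girls, n_groups=7):
    """Log-odds of a boy in septiles 2..7 relative to septile 1, with 95% Wald CIs.

    Returns a list of (septile number, coefficient, lower bound, upper bound).
    """
    labels = septile_labels(levels, n_groups)
    b = [0] * n_groups  # boys per septile
    g = [0] * n_groups  # girls per septile
    for lab, nb, ng in zip(labels, boys, girls):
        b[lab] += nb
        g[lab] += ng
    ref_logit = math.log(b[0] / g[0])  # first septile is the reference level
    out = []
    for k in range(1, n_groups):
        beta = math.log(b[k] / g[k]) - ref_logit
        se = math.sqrt(1 / b[k] + 1 / g[k] + 1 / b[0] + 1 / g[0])
        out.append((k + 1, beta, beta - 1.96 * se, beta + 1.96 * se))
    return out

# Hypothetical data: 14 counties, with fewer boys in the two most polluted ones.
levels = [0.1 * i for i in range(14)]
boys = [5120] * 12 + [5000] * 2
girls = [4880] * 12 + [5000] * 2
for septile, beta, lo, hi in septile_coefficients(levels, boys, girls):
    flag = "significant" if lo > 0 or hi < 0 else "n.s."
    print(f"septile {septile}: beta={beta:+.4f}  95% CI=({lo:+.4f}, {hi:+.4f})  {flag}")
```

A coefficient whose interval excludes 0 corresponds to a fully opaque bar in the figure; the sign determines whether it would be drawn in blue or red.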
Pollutant clusters discovered by applying Ward’s method to the EQI raw measurements dataset.
| Cluster number | factor |
|---|---|
| 1 | a_hcbd_ln,a_hccpd_ln |
| 2 | a_nitrobenzene_ln,a_dma_ln |
| 3 | a_2clacephen_ln,a_bromoform_ln |
| 4 | a_pnp_ln,a_toluene_ln |
| 5 | a_be_ln,a_se_ln |
| 6 | a_dmf_ln,a_edb_ln,a_edc_ln |
| 7 | a_teca_ln,a_procl2_ln,a_cl4c2_ln,a_vycl_ln,county_pop_2000 |
| 8 | a_benzyl_cl_ln,a_me2so4_ln |
| 9 | mean_zn_ln,mean_cu_ln |
| 10 | mean_al_pct,mean_p_pct |
| 11 | numdays_close_activity_tot,numdays_cont_activity_tot |
| 12 | mean_as_ln,mean_se_ln |
| 13 | a_glycol_ethers_ln,a_etn_ln,a_vyac_ln |
| 14 | mean_na__pct_ln,mean_mg_pct_ln,mean_ca_pct_ln |
| 15 | a_cs_ln,a_edcl2_ln |
| 16 | a_ccl4,a_mtbe_ln |
| 17 | pct_harvest_acres,herbicides_ln,insecticides_ln |
| 18 | a_112tca_ln,a_ch3cn_ln |
| 19 | a_hcb_ln,a_pcp_ln,a_pcbs_ln |
| 20 | mg_ln_ave,k_ln_ave |
| 21 | pct_defoliate_acres_ln,pct_disease_acres_ln,pct_nematode_acres_ln |
| 22 | a_so2_mean_ln,a_no2_mean_ln,a_o3_mean_ln,so4_mean_ave |
| 23 | med_hh_value,med_hh_inc |
| 24 | rate_food_env_pos_log,rate_rec_env_log |
| 25 | ca_ln_ave,nh4_mean_ave |
| 26 | w_as_ln,w_ba_ln,w_cd_ln,w_cr_ln,w_cn_ln,w_fl_ln,w_hg_ln,w_no3_ln,w_no2_ln,w_se_ln,w_sb_ln,w_be_ln,w_ti_ln,w_endrin_ln,w_lindane_ln,w_methoxychlor_ln,w_toxaphene_ln,w_dalapon_ln,w_deha_ln,w_oxamyl_ln,w_simazine_ln,w_dehp_ln,w_picloram_ln,w_dinoseb_ln,w_hccpd_ln,w_carbofuran_ln,w_atrazine_ln,w_alachlor_ln,w_heptachlor_ln,w_heptachlor_epox_ln,w_24d_ln,w_silvex_ln,w_hcb_ln,w_benzoap_ln,w_pcp_ln,w_124tcib_ln,w_pcb_ln,w_dbcp_ln,w_edb_ln,w_xylenes_ln,w_chlordane_ln,w_dcm_ln,w_odcb_ln,w_pdcb_ln,w_vcm_ln,w_11dce_ln,w_t12dce_ln,w_edc_ln,w_111trichlorane_ln,w_ccl4_ln,w_pdc_ln,w_trichlorene_ln,w_112tca_ln,w_c2cl4_ln,w_cl1benz_ln,w_benzene_ln,w_toluene_ln,w_ethylbenz_ln,w_stryene_ln,w_alpha_ln,w_dce_ln |
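Ward's method, named in the table caption, merges at each step the two clusters whose union gives the smallest increase in within-cluster sum of squares. A minimal pure-Python sketch of that criterion is given below. It assumes each factor is represented by a vector of standardized measurements across counties; the four profiles are invented for illustration (only the factor names come from the table), and the preprocessing and stopping rule used in the study are not stated here. In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would be the usual choice.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    `points` maps a factor name to its (hypothetical) measurement profile.
    Returns a list of clusters, each a sorted list of factor names.
    """
    # Each cluster is a pair: (member names, centroid vector).
    clusters = [([name], list(vec)) for name, vec in points.items()]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                ni, nj = len(mi), len(mj)
                dist2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                # Increase in within-cluster sum of squares if i and j merge.
                cost = ni * nj / (ni + nj) * dist2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        centroid = [(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((mi + mj, centroid))
    return [sorted(members) for members, _ in clusters]

# Hypothetical standardized profiles over four counties.
profiles = {
    "mean_zn_ln": [1.0, 0.8, -0.9, -1.1],
    "mean_cu_ln": [0.9, 1.0, -1.0, -0.8],
    "med_hh_value": [-1.0, 0.9, 1.1, -0.9],
    "med_hh_inc": [-0.9, 1.0, 0.9, -1.1],
}
for members in ward_clusters(profiles, n_clusters=2):
    print(members)
```

The pairwise search makes this sketch cubic in the number of factors, which is acceptable for a few hundred variables but not a substitute for an optimized implementation.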
Test results for factors selected from the literature reports (Table 1).
We included a factor only if both its ΔIC and the coefficient of at least one of its septiles were statistically significant.
| Factor name | effect |
|---|---|
| PCBs (air and water) | ↑ |
| DBCP (water) | − |
| Lead (land) | ↓ |
| Lead (air) | − |
| Aluminium (air) | ↑ |
| Chromium (air) | − |
| Chromium (water) | ↑ |
| Arsenic (land) | − |
| Arsenic (water) | ↑ |
| Cadmium (air and water) | − |
| Total mercury deposition | ↑ |
| Violent crime rate | − |
| Unemployed rate | − |
| Working out of county (long commute) | − |
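The ΔIC used as an inclusion criterion can be illustrated with a small sketch. The information criterion is not specified in this section, so the code below assumes the Akaike information criterion (AIC) and defines ΔIC as the AIC of an intercept-only model minus the AIC of the model with the septile factor; both the choice of criterion and the sign convention are assumptions. With a single categorical predictor, the maximized log-likelihoods have closed forms, so no iterative fitting is needed. All counts are hypothetical.

```python
import math

def binom_loglik(boys, girls):
    """Maximized Bernoulli log-likelihood for one group (binomial constant omitted)."""
    n = boys + girls
    p = boys / n
    return boys * math.log(p) + girls * math.log(1.0 - p)

def delta_aic(group_counts):
    """AIC(intercept only) minus AIC(septile factor).

    `group_counts` is a list of (boys, girls) per septile. Positive values
    favor the model that includes the factor (assumed sign convention).
    """
    total_b = sum(b for b, _ in group_counts)
    total_g = sum(g for _, g in group_counts)
    ll_null = binom_loglik(total_b, total_g)          # one pooled proportion
    ll_full = sum(binom_loglik(b, g) for b, g in group_counts)
    k_null, k_full = 1, len(group_counts)             # number of parameters
    aic_null = 2 * k_null - 2 * ll_null
    aic_full = 2 * k_full - 2 * ll_full
    return aic_null - aic_full

# Hypothetical septile counts: six similar groups and one with fewer boys.
counts = [(51200, 48800)] * 6 + [(50000, 50000)]
print(f"delta AIC = {delta_aic(counts):.1f}")
```

When all septiles share the same proportion of boys, the factor adds six parameters without improving the fit, so this ΔIC is negative and the factor would not be retained.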
Test results for additional factors with statistically significant effects.
We included a factor only if both its ΔIC and the coefficient of at least one of its septiles were statistically significant.
| Factor name | effect |
|---|---|
| Iron | ↓ |
| Nitrate | ↑ |
| 2-Nitropropane | ↑ |
| Carbon monoxide | ↑ |
| Bis-2-ethylhexyl phthalate | ↓ |
| Ethyl chloride | ↑ |
| Isophorone | ↑ |
| Hydrazine | ↓ |
| Phosphorus | ↑ |
| Quinoline | ↓ |
| Extreme drought | ↑ |
| Traffic fatality rate | ↑ |
| Industrial permits per 1000 km of stream | ↓ |
| Animal units | ↓ |
| Irrigation | ↓ |
| Negative food related businesses | ↓ |
| Renter occupation | ↓ |
| Vacant units | ↑ |
Fig 2. County-level geographical septile distribution for the first 12 statistically significant factors with at least one statistically significant coefficient ranked by decreasing ΔIC.
The factors labelled A–M are the same as shown in Fig 1, Plates B–M and are ordered identically in both figures. Base map was taken from https://github.com/hrbrmstr/albersusa/blob/master/inst/extdata/composite_us_counties.geojson.gz.
Fig 3. Time series plots and out-of-sample forecasts for SRB data grouped into 7-day periods and fitted with seasonal ARIMA models.
The blue shaded region is the 95% confidence interval. The observed SRBs for the first five months after the intervention are shown as red dots, whereas the observed SRBs for 7 to 9 months after the intervention are shown as purple dots. A: Hurricane Katrina, all states; B: Hurricane Katrina, Louisiana and Mississippi only; C: Virginia Tech shooting, all states; D: Virginia Tech shooting, adjacent states only.
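The grouping of births into 7-day periods mentioned in the caption can be sketched as below. The record layout (a birth date paired with "M" or "F"), the function name, and the sample records are assumptions made for illustration. The resulting series is what a seasonal ARIMA model would then be fitted to, for example with the `SARIMAX` class in statsmodels; the model orders used in the study are not given here.

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_srb(births, start):
    """Group (birth_date, sex) records into consecutive 7-day periods from `start`.

    Returns a list of (period_start_date, srb) pairs sorted by period, where
    srb is the number of boys divided by all births in that period.
    """
    counts = defaultdict(lambda: [0, 0])  # period index -> [boys, total]
    for day, sex in births:
        idx = (day - start).days // 7
        counts[idx][0] += sex == "M"
        counts[idx][1] += 1
    return [
        (start + timedelta(days=7 * idx), b / n)
        for idx, (b, n) in sorted(counts.items())
    ]

# Hypothetical records spanning two 7-day periods.
records = [
    (date(2005, 8, 1), "M"),
    (date(2005, 8, 3), "F"),
    (date(2005, 8, 8), "M"),
    (date(2005, 8, 9), "M"),
    (date(2005, 8, 12), "F"),
]
for period_start, ratio in weekly_srb(records, start=date(2005, 8, 1)):
    print(period_start.isoformat(), f"{ratio:.3f}")
```

Periods with no births are simply absent from the output; a real analysis would need to decide how to treat such gaps before fitting a time series model.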
Fig 4. Time series plots and out-of-sample forecasts for SRB data grouped into 7-day periods and fitted with state space models.
The blue shaded region is the 95% confidence interval. The observed SRBs for the first five months after the intervention are shown as red dots, whereas the observed SRBs for 7 to 9 months after the intervention are shown as purple dots. A: Hurricane Katrina, all states; B: Hurricane Katrina, Louisiana and Mississippi only; C: Virginia Tech shooting, all states; D: Virginia Tech shooting, adjacent states only.
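As an illustration of how a state space model yields out-of-sample forecasts with a 95% band, the sketch below implements the simplest such model, a local level model, with a Kalman filter. This is a stand-in chosen for brevity: the state space specification used in the study is not described in this section, and the fixed variances, the function name, and the SRB values are all hypothetical.

```python
import math

def local_level_forecast(y, obs_var, level_var, steps):
    """Kalman filter for y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    `obs_var` and `level_var` are the (assumed known) variances of eps and eta.
    Returns (mean, lower, upper) 95% forecast bounds for the next `steps` points.
    """
    mu, p = y[0], obs_var              # initialize the level at the first observation
    for obs in y[1:]:
        p += level_var                 # predict: level uncertainty grows
        gain = p / (p + obs_var)       # Kalman gain
        mu += gain * (obs - mu)        # update the level estimate
        p *= 1.0 - gain                # update the level variance
    forecasts = []
    for _ in range(steps):
        p += level_var                 # uncertainty keeps growing out of sample
        half = 1.96 * math.sqrt(p + obs_var)
        forecasts.append((mu, mu - half, mu + half))
    return forecasts

# Hypothetical weekly SRBs before an event, and observations after it.
pre_event = [0.512, 0.514, 0.510, 0.513, 0.511, 0.512, 0.513, 0.511]
post_event = [0.512, 0.509, 0.498]
bands = local_level_forecast(pre_event, obs_var=4e-6, level_var=1e-7,
                             steps=len(post_event))
for (mean, lo, hi), obs in zip(bands, post_event):
    status = "inside" if lo <= obs <= hi else "outside"
    print(f"forecast={mean:.4f}  band=({lo:.4f}, {hi:.4f})  observed={obs:.3f}  {status}")
```

Post-event observations that fall outside the band are the ones that would stand out against the shaded region in a plot of this kind.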