Literature DB >> 30371821

Development and validation of the PEPPER framework (Prenatal Exposure PubMed ParsER) with applications to food additives.

Mary Regina Boland1,2,3,4, Aditya Kashyap5, Jiadi Xiong5, John Holmes1,2, Scott Lorch6.   

Abstract

Background: Globally, 36% of deaths among children can be attributed to environmental factors. However, no comprehensive list of environmental exposures exists. We seek to address this gap by developing a literature-mining algorithm to catalog prenatal environmental exposures.
Methods: We designed a framework called. PEPPER: Prenatal Exposure PubMed ParsER to a) catalog prenatal exposures studied in the literature and b) identify study type. Using PubMed Central, PEPPER classifies article type (methodology, systematic review) and catalogs prenatal exposures. We coupled PEPPER with the FDA's food additive database to form a master set of exposures.
Results: We found that of 31 764 prenatal exposure studies only 53.0% were methodology studies. PEPPER consists of 219 prenatal exposures, including a common set of 43 exposures. PEPPER captured prenatal exposures from 56.4% of methodology studies (9492/16 832 studies). Two raters independently reviewed 50 randomly selected articles and annotated presence of exposures and study methodology type. Error rates for PEPPER's exposure assignment ranged from 0.56% to 1.30% depending on the rater. Evaluation of the study type assignment showed agreement ranging from 96% to 100% (kappa = 0.909, p < .001). Using a gold-standard set of relevant prenatal exposure studies, PEPPER achieved a recall of 94.4%. Conclusions: Using curated exposures and food additives; PEPPER provides the first comprehensive list of 219 prenatal exposures studied in methodology papers. On average, 1.45 exposures were investigated per study. PEPPER successfully distinguished article type for all prenatal studies allowing literature gaps to be easily identified.

Entities:  

Mesh:

Substances:

Year:  2018        PMID: 30371821      PMCID: PMC6213088          DOI: 10.1093/jamia/ocy119

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


Introduction

Importance of environment in human health and during the prenatal period

Environmental exposures are critically important for understanding human health and disease. Globally, 23% of all deaths and 36% of deaths among children (ages 0-14 years) can be attributed to environmental factors. Several types of factors are collectively termed “environment” by researchers. These include lifestyle factors (exercise, diet, and stress), climate factors (sunlight, precipitation, and wind speed), socioeconomic factors (occupation, income, insurance status), pharmacological factors (opioids, antidepressants, Non-Steroidal Anti-Inflammatory Drugs or NSAIDS) and pollutant factors (phthalates, fine and coarse air particulates). These 5 types of “factors” are collectively termed “environmental exposures” by the scientific community. Each of these plays a role in variance in health outcomes. However, no comprehensive list of environmental exposures exists.

Informatics and exposome-related science

Several informatics methods have investigated environmental perturbations and their effects on human health and disease. Studies have used Electronic Health Record (EHR) data to investigate environmental challenge in disease progression, identified thyroid cancer hotspots in Vermont, and correlated air pollution with disease risk in Italy. Environment-Wide Association Studies have been performed using EHR data. Boland et al. developed a method correlating birth season and trimester information with climate and pollution variables. Boland et al demonstrated that informatics methods can overcome EHR biases and be used to probe the disease-environment interaction with findings confirmed in canines. There remains a need for informatics methods to investigate environmental effects on human health following prenatal exposure. Importantly, many studies investigating the effect of the environment on human disease often look at a single environmental factor (eg air pollution) without exploring simultaneous exposure to multiple environmental exposures. The classic EWAS study investigated relationships between all factors measured in the National Health and Nutrition Examination Survey (NHANES) and their relationship with 1 outcome - type 2 diabetes. However, the NHANES contains elements from a survey and is not a comprehensive list of all environmental exposures. This purpose of this study is to derive a comprehensive list of environmental exposures from the environmental exposures studied in biomedical literature.

Literature mining of PubMed Central and cataloging exposures

Many methods utilize publically available abstracts from PubMed including PubMatrix, PubTator, LitInspector, PolySearch and COSMIC. Some methods use PubMed abstracts as 1 source in their biological networks, such as the STRING database. Another method links genes, diseases and drugs together into biological networks while another method linked PubMed abstracts with information from EHRs for pancreatic cancer prediction. Many methods use PubMed abstracts exclusively because the full text remains behind a pay wall. These articles often list exclusive use of article abstracts as a critical limitation. Recently, a study demonstrated that using the full text outperformed use of only abstracts. Alternatively, PubMed Central (PMC) is a subset of PubMed containing only freely accessible full-text manuscripts. Using the full text of the manuscript allows an algorithm to be constructed that distinguishes methodology type (eg original research articles, non-systematic reviews, editorials, perspectives, and clinical practice guidelines). Study type is important because some prenatal exposures are discussed frequently in the literature in guidelines and perspectives, but rarely studied in original research articles. Therefore, a gap in the literature exists for those exposures that would not be readily apparent from a quick search in Pubmed. We constructed our algorithm to identify study methodology type to easily identify literature gaps for environmental exposures. The purpose of this study is to develop a methodological framework that uses knowledge from the literature to derive a comprehensive list of environmental exposures studied during the prenatal period. This will allow researchers to assess the “state-of-the-field” while identifying literature gaps in the prenatal exposure space. In addition, we provide the first list of environmental exposures studied during the prenatal period for other researchers to use. The method we developed is called PEPPER: Prenatal Exposure PubMed ParsER. PEPPER uses full-text research articles from PMC to identify environmental exposures studied during the prenatal period in methodological studies. Using PEPPER, we are able to identify exposures that are studied more frequently in review papers vs methodological studies, which may indicate a form of publication bias. We provide a case study of PEPPER’s use within the food additive domain and identify literature gaps, which can be addressed by further study.

Methods

Dataset: Pubmed Central

PubMed Central (https://www.ncbi.nlm.nih.gov/pmc/) is a publicly available database maintained by the National Library of Medicine that provides the full text of each article. We used PMC as our literature data source for this study. PEPPER used the requests and BeautifulSoup libraries of Python 2.7, along with PMCID lists obtained from PMC.

Preliminary steps: extracting all relevant prenatal exposure articles

A simple query of “prenatal” and “exposure” on PMC retrieves 48 066 articles (query performed in March 2018). However, we found that many of those articles mentioned the term “prenatal” and “exposure” in the reference section making the results not relevant. This was especially true for environmental exposures. For example, general studies on lead, including chemical synthesis and reactivity studies would cite a reference on prenatal exposure to lead because of the media attention placed on the importance of prenatal exposure to lead. We refined the query and restricted the terms “prenatal” and “exposure” to the body of the manuscript excluding the references using the query: (“prenatal”[Body—All Words] AND “exposure”[Body—AllWords]). Additional details are available in Supplementary Appendix. This resulted in relevant retrieved articles and a dataset of 31 764 studies.

Construction of PEPPER framework

We extracted all prenatal exposure papers with mention of the terms prenatal and exposure in the body of the text (excluding reference mentions). The PEPPER framework a) identifies articles that are methodological in nature and b) extracts relevant exposures identified in those studies. We designed PEPPER to identify whether or not a study was methodological because no “method” tag exists for searching PMC. The corpus of 31 764 articles served as input into our PEPPER Framework along with an initial exposure list seed (development described subsequently, Figure 1).
Figure 1.

PEPPER flowchart detailing the steps of the algorithmic framework.

PEPPER flowchart detailing the steps of the algorithmic framework. The first step was to query PMC to retrieve articles relevant for specific exposures. An example of the PMC query is shown below. We employ the Arden syntax “curly bracket” notation to represent the exposure variables that PEPPER iterates over:where “{term}” is the particular chemical or contaminant of interest. (“prenatal”[Body—All Words] AND “exposure”[Body—AllWords])AND(“”[Title]OR“”[Abstract]) We decided to restrict the exposure term to occur only within the abstract and title because certain exposures (eg “phthalates”) were mentioned frequently in the Introduction and Background of prenatal exposure papers (even if the topic of the paper was about a different exposure, such as “mercury”). We made one exception for lead. We used the chemical element name “pb” (which includes “Pb”) instead of “lead” because “lead” is used commonly in other contexts (eg as a verb). PEPPER ignores capitalization of terms.

Distinguishing study methodology type

We used the following steps to identify articles with methods from those without methods. Step 1, we acquired all h2 section headers for each article. Step 2, the html was converted to lxml markup. Step 3, we searched for headings containing the term “method” in the h2 header (eg “Materials and Methods,” “Methodological Framework”). To distinguish methodological review papers (eg systematic reviews) from other methodological papers, we searched for the word “review” in the title. This allowed us to distinguish papers without methods, review papers with a methods section, and non-review papers with a methods section (ie the “original research studies”). Importantly, we only used the word “review” in the title of the paper for methodology studies to separate out the systematic reviews from original research articles because both article types have methods sections. Reviews without methods sections were binned in the “non-methodological study” category.

Development of initial exposure seed list and iterative refinement

To retrieve articles related to specific exposures, we needed an initial exposure list seed. We carefully reviewed all articles published between April 2016 and September 2017 and manually extracted exposures from methodology studies and meta-analyses. This resulted in a set of 419 articles, which included reviews and other non-methodology studies. Our manual review resulted in an initial exposure list of 29 exposures. Another member of the team carefully reviewed this list, and both individuals met to discuss any discrepancies.

PEPPER semi-automatic exposure generation process

Each exposure term from the initial 29-exposure seed was queried iteratively (one exposure at a time). Querying PMC for “prenatal” and “exposure” returned some articles that did not study an exposure listed in the initial seed set. We retained these articles and then performed additional processing to extract titles and calculate term frequencies. Titles were chosen because the majority of articles can be understood using the title alone. We extracted all titles from the articles not in the exposure list. The scraped articles (in html) were processed in the following manner. First, stop words were removed using the nltk library of python. Next, using the re library of python, regular expressions were written to remove all the content within brackets, along with figure numbers and table numbers. Words such as “he/she” and “male/female” that contained a backslash within it or any other special character were also removed. Words which were 3 letters or shorter in length were removed in order to eliminate abbreviations. Finally, we eliminated non-ASCII characters, such as “ä” and “æ.” PEPPER also ignores the case (ie capitalization) of the terms. Term frequency analysis on these titles allowed for easy manual review and subsequent updating of the environmental exposure seed list. The term frequency lists were manually reviewed by two co-authors (AK, MRB) to ensure that non-environmental exposure topics, including age, schooling, and so forth were removed. We coupled the 43-exposure set with a database of food additives, described in section 2.5 for a final exposure set of 219 exposures (Supplementary Files S1 and S2).

PEPPER exposure extraction and labeling across entire corpus

PEPPER extracted and labeled all articles in the 31 764-article corpus with the appropriate exposures. Studies were not assigned a particular exposure by PEPPER until the entire set of exposures was defined. In addition a study could be assigned multiple exposures given that some studies investigate multiple exposures. For example, 1 study may investigate the effects of obesity, high-fat diet and smoking on prenatal development. In that case “obesity,” “high-fat diet,” and “smoking” would each be assigned to the study.

Sex-specific study analysis

Studies were defined as having sex-specific outcomes, if they were returned following a separate PMC query given below:where “{term}” is the particular chemical or contaminant of interest. (“prenatal”[Body—All Words] AND “exposure”[Body—AllWords])AND(“”[Title]OR“”[Abstract])AND((“sexspecific”[Title]OR“sexdifferences”[Title])OR(“sexspecific”[Abstract]OR“sexdifferences”[Abstract])) An example query for “alcohol” is given below: (“prenatal”[Body—All Words] AND “exposure”[Body—AllWords])AND(“”[Title]OR“”[Abstract])AND((“sexspecific”[Title]OR“sexdifferences”[Title])OR(“sexspecific”[Abstract]OR“sexdifferences”[Abstract])) Knowledge of sex-specific outcomes is important for exposures, especially Endocrine Disrupting Chemicals (EDCs). Sex-specific outcomes were only explored in the 43 common exposures.

Methods for an example case study for PEPPER: food additives

Food additives have been studied since the 1940s and continue to be studied throughout the following decades. For our case study, we expanded our exposure list to include all possible food additives. The United States Food and Drug Administration (FDA) maintains a database of food additives called “Everything Added to Food in the United States” or EAFUS. EAFUS contains information on 3968 compounds. We downloaded the publicly available EAFUS, including information on all 3968 food additives in February 2018 (https://www.fda.gov/Food/IngredientsPackagingLabeling/FoodAdditivesIngredients/ucm115326.htm). We used only publicly available information and did not clean this list, but used the raw terms in our case study of PEPPER. As shown in Table 1, each compound is assigned a toxicological status code. There are 6 categories provided (Table 1), we grouped these into 4 categories for ease of interpretation. We provide our 4 super-categories - 1) commonly used; 2) new; 3) limited to no use; and 4) banned -along with the original EAFUS group in Table 1. All 3968 food additive compounds were run through PEPPER. We only made one small modification to PEPPER removing the quotes around the exposure term. By removing the quotes around the exposure term, we are making use of PMC’s built-in term expansion software, which is critical given that these exposures were not manually curated. For example, the food additive “calamus root,” would be queried as follows:
Table 1.

EAFUS food additive categories and PEPPER super-categories

EAFUS Food Additive Categories
CodeCategory DescriptionPEPPER Super- Category
ASPFully up-to-date toxicology information has been sought.Commonly_used
EAFThere is reported use of the substance, but it has not yet been assigned for toxicology literature search.New
NEWThere is reported use of the substance, and an initial toxicology literature search is in progress.New
NILAlthough listed as added to food, there is no current reported use of the substance, and, therefore, although toxicology information may be available in PAFA, it is not being updated.Limited_use
NULThere is no reported use of the substance and there is no toxicology information available in PAFA.Limited_use
BANThe substance was formerly approved as a food additive but is now banned; there may be some toxicology data available.Banned
EAFUS food additive categories and PEPPER super-categories (“prenatal”[Body—All Words] AND “exposure”[Body—AllWords])AND([Title]OR[Abstract])

Evaluation of PEPPER

Our evaluation for PEPPER consisted of 3 parts: 1) evaluation of the exposure assignment; 2) evaluation of the study type assignment, and 3) evaluation of the recall of PEPPER. For the first part, 2 raters manually reviewed 50 retrieved articles by PEPPER and annotated exposures using a set of 43 exposures (not including EAFUS). Because each article could have multiple exposures, each rater annotated all exposures mentioned in the title and abstract. For the evaluation of the study type assignment, another random set of 50 articles was extracted. Each rater assigned the manuscript as a methodology paper, non-methodology paper, PDF-only paper or systematic review. We distinguish PDF papers from other types of papers, because PEPPER (along with PMC’s search functionality) cannot parse the full text of PDF-only studies. After each rater conducted their evaluations independently, they met to discuss differences and develop consensus.

Evaluating the recall of PEPPER

In information retrieval, recall is assessed to determine how many of a set of gold standard relevant documents are retrieved by a given algorithm. Our gold-standard consists of a systematic review and meta-analysis published in 2017 on fetal alcohol spectrum disorder. This paper contained 86 references that are key in the field. References were excluded from the gold-standard if they were: a) websites; b) software packages (eg python, R); c) statistical methods papers, and d) references to background birth rates that were not exposure-related. This process resulted in a set of 64 gold-standard relevant alcohol exposure studies. We reviewed each study for presence/absence in PMC and if absent from PMC then presence/absence in Pubmed was ascertained. Of 64 relevant articles, only 18 were found in PMC. PEPPER’s recall was determined using these 18 relevant alcohol exposure studies.

Results

For researchers interested in running an article or a set of articles through PEPPER, we provide code and relevant datasets at: https://github.com/bolandlab/PEPPER/.

Corpus

Our final prenatal exposure study corpus consisted of all articles on PMC with a mention of “prenatal” and “exposure” within the body of the article. The final set included 31 764 articles ranging in publication date from 1921 through 2018.

Study methodology type varies by prenatal exposure

We investigated the study methodology type and how it varies by prenatal exposure. Overall, 16 832 prenatal exposure articles were methodological out of 31 764 articles (53.0%). We highlight the percent of studies that are methodological and the percent of studies constituting systematic reviews in Figure 2. We sorted exposures by the total number of studies per exposure for clarity. Notice that some exposures, eg iodine (second-highest peak in Figure 2B), have relatively fewer methodological studies and relatively more systematic reviews. This indicates that fewer original research articles contribute to those systematic reviews. Other exposures, such as fluoride (highest peak in Figure 2B) have a relatively large proportion of systematic reviews with 31 total studies including 16 methodological and 3 systematic reviews. Not all exposures that were low in methodological studies were necessarily high in systematic reviews because some non-methodological studies are editorials, perspectives, case studies and clinical practice guidelines.
Figure 2.

Percent of studies that are methodological vs systematic reviews by prenatal exposure. Exposures are sorted by the total number of studies per exposure. Notice that some exposures have less then 50% of the research consisting of methodological studies, for example radiation, genetics, iodine, and thiamine (Figure 2A). Other exposures are frequently studied in systematic reviews, for example fluoride, iodine, and medications (Figure 2B). Notice that not all exposures that are low in methodological studies are necessarily high in systematic reviews (although some like iodine are), and this is because some non-methodological studies are editorials, perspectives, case studies, and clinical practice guidelines.

Percent of studies that are methodological vs systematic reviews by prenatal exposure. Exposures are sorted by the total number of studies per exposure. Notice that some exposures have less then 50% of the research consisting of methodological studies, for example radiation, genetics, iodine, and thiamine (Figure 2A). Other exposures are frequently studied in systematic reviews, for example fluoride, iodine, and medications (Figure 2B). Notice that not all exposures that are low in methodological studies are necessarily high in systematic reviews (although some like iodine are), and this is because some non-methodological studies are editorials, perspectives, case studies, and clinical practice guidelines. PEPPER allowed us to compute the number of methodology studies per number of systematic reviews for each exposure. This enabled us to estimate the average number of methodology studies underlying each systematic review. The overall average across all 43 common exposures (excluding exposures with 0 systematic reviews) was 28.95 methodology studies per systematic review with a standard deviation of 28.24. We provide the proportion of methodology studies per systematic review for each exposure in Table 2. Some exposures, such as testosterone, have a high proportion—148—indicating that a large number of methodology studies exist per systematic review. This occurs when an exposure is well studied in the literature. Other exposures, such as HIV and soy, have low proportions. An exposure can have a low proportion for two reasons: 1) there is a relatively large number of systematic reviews given the total number of studies (eg soy) or 2) there is a relatively low number of methodological studies (eg HIV). Exposures such as HIV tend to have a large number of non-methodological studies, including guidelines, case reports and perspectives. We provide the breakdown of study methodology type for each of the 43 common exposures and the 176 EAFUS food exposures as Supplementary files (Supplementary Files S3 and S4).
Table 2.

Proportion of methodology studies per systematic review by prenatal exposure

ExposureOriginal Research Articles (ie Methodological Studies)Systematic Review StudiesNum. Methodological Studies / Num. of Systematic Reviews
Genetic13109214.239
Stress16006624.242
Alcohol12694230.214
Smoking12615921.373
Obesity7375712.93
HIV4883314.788
Testosterone4383146
Estrogen263643.833
Cocaine334655.667
Nicotine2931915.421
Medication2622311.391
Mercury2281120.727
Air pollution2631616.438
Vitamin D2371318.231
Radiation118148.429
BPA237926.333
Twin167918.556
Folic acid1641312.615
Phthalate202633.667
Arsenic1691313
Endocrine disrupting chemicals8999.889
Opioid1381112.545
Marijuana157352.333
Anti depressant103911.444
Malaria109521.8
Temperature94331.333
High fat diet136268
Lead114619
Choline1051105
DDT64232
Cannabis4859.6
Manganese63512.6
DEHP84242
Iodine4075.714
Valproic acid63163
PFAS47223.5
Soy34217
Acetaminophen250NA*
Fluoride1635.333
Farming150NA*
Domoic acid70NA*
TCE60NA*
Thiamine20NA*

Because there are 0 Systematic Reviews this number is not possible to compute.

Proportion of methodology studies per systematic review by prenatal exposure Because there are 0 Systematic Reviews this number is not possible to compute.

Studies evaluating sex-specific effects

The percent of prenatal exposure studies investigating sex-specific effects is shown in Figure 3, along with the breakdown of sex-specific methodology studies. Sex-specific variance in exposure outcomes was studied frequently for testosterone, estrogen, endocrine disrupting chemicals (or EDCs), BPA (bisphenol A), high-fat diet, phthalate, and soy.
Figure 3.

Percent of prenatal exposure studies on a particular exposure that investigates a sex-specific difference. Exposures are sorted by the total number of studies per exposure (far left is the most number of studies, far right is the least number of studies). The percent of all studies that investigate sex-specific effects is shown along with the percent of methodology studies investigating sex-specific effects (out of all studies).

Percent of prenatal exposure studies on a particular exposure that investigates a sex-specific difference. Exposures are sorted by the total number of studies per exposure (far left is the most number of studies, far right is the least number of studies). The percent of all studies that investigate sex-specific effects is shown along with the percent of methodology studies investigating sex-specific effects (out of all studies). Studies on exposures such as thiamine, iodine, HIV, vitamin D, malaria, acetaminophen, and radiation did not typically investigate sex-specific outcomes following the exposure. This represents a potential literature gap, and a possibility for measurement bias (only known endocrine disruptors were investigated for a sex-related effect). Therefore, non-endocrine disruptors (eg mercury) were not frequently studied with regards to their sex-related outcomes. This is an example of measurement bias because the outcome is mainly studied among exposures where an effect is expected. The effects of pollutants on sex ratios is not fully known and, therefore, worth investigating more broadly (ie not just among known endocrine disrupting chemicals).

Methodological studies investigate multiple exposures

Environmental exposures rarely occur in isolation. We found that on average methodological studies investigated 1.45 prenatal exposures per study (min = 1, max = 8) among the 9492 studies where PEPPER captured at least 1 exposure. To illustrate the high-degree of overlap among studies, we performed Multi-Dimensional Scaling (MDS). MDS is a form of dimensionality reduction that visualizes the overlap of exposures across studies. We performed MDS for all 9492 studies with exposures captured by PEPPER. We grouped exposures to illustrate the breakdown of lifestyle exposures, endocrine disrupting pollutants, non-endocrine disrupting pollutants, vitamins and minerals, infections, genetics, traditional environmental exposures (ie air pollution, radiation, temperature), and food additives from EAFUS. Figure 4 illustrates how certain exposure types are studied in patterns that are distinct from other exposure types. For instance, food additive studies are located in regions II and IV (Figure 4) while genetics; infection, non-endocrine disrupting pollutants, smoking/nicotine, and alcohol are located only in region I. Certain exposures are located throughout the graph including the traditional environment exposures, medications and illicit drugs, hormones and endocrine-disrupting pollutants, stress/diet, and vitamins and minerals (Figure 4).
Figure 4.

Multi-Dimensional scaling plots visualize the overlap among prenatal exposures present in original research articles. The graph showing all 219 exposures is the upper left hand subplot. For the purpose of the MDS, all 176 EAFUS food additive exposures were grouped together into a single term. Therefore 44 distinct exposures are shown in these subplots. The graphs are divided into 4 keys areas denoted by Roman Numerals (I, II, III, IV). In addition, exposures are grouped according to their types: traditional environment (ie air pollution, radiation, temperature), medications and illicit drugs, food additives from EAFUS, genetics, infection, hormonal disrupting pollutant, non-hormonal disrupting pollutant, vitamins and minerals, and various lifestyle exposures including stress/diet, smoking/nicotine and alcohol and other lifestyle exposures. The overall average number of exposures per study was 1.45 with a range from 1 to 6. The overlap is also easily identified in these MDS subplots.

Multi-Dimensional scaling plots visualize the overlap among prenatal exposures present in original research articles. The graph showing all 219 exposures is the upper left hand subplot. For the purpose of the MDS, all 176 EAFUS food additive exposures were grouped together into a single term. Therefore 44 distinct exposures are shown in these subplots. The graphs are divided into 4 keys areas denoted by Roman Numerals (I, II, III, IV). In addition, exposures are grouped according to their types: traditional environment (ie air pollution, radiation, temperature), medications and illicit drugs, food additives from EAFUS, genetics, infection, hormonal disrupting pollutant, non-hormonal disrupting pollutant, vitamins and minerals, and various lifestyle exposures including stress/diet, smoking/nicotine and alcohol and other lifestyle exposures. The overall average number of exposures per study was 1.45 with a range from 1 to 6. The overlap is also easily identified in these MDS subplots.

Example case study for PEPPER: food additives

We used PEPPER for a case study on food additives. Using FDA’s EAFUS, PEPPER determined how many studies on food additives investigate prenatal exposure. We compared this against the total number of studies mentioning the food additive to obtain the proportion of studies assessing a prenatal effect. Prenatal exposure effects of food additives were studied for 176 compounds out of 3968 (4.4%) compounds contained in EAFUS. Of 16 832 prenatal exposure methodology studies, only 1886 (11.2%) investigate food additive effects. Among these, 768 (4.6%) investigated only a food additive contained in EAFUS (ie not any of the 43 common exposures) while another 1118 (6.6%) investigated food additives and a commonly studied prenatal exposure (Figure 5).
Figure 5.

Breakdown of the percentage of all studies in PMC on food additive exposures by study methodology type and food additive type. The percent of food additive studies investigating a prenatal exposure to the additive is provided. We also provide the percent of food additive studies that are methodological and investigate another common exposure (in the 43-exposure set) and also the food additive studies that only investigate a food additive exposure.

Breakdown of the percentage of all studies in PMC on food additive exposures by study methodology type and food additive type. The percent of food additive studies investigating a prenatal exposure to the additive is provided. We also provide the percent of food additive studies that are methodological and investigate another common exposure (in the 43-exposure set) and also the food additive studies that only investigate a food additive exposure. In total, 3117 studies investigated prenatal exposure to food additives. The majority of these were methodology studies (60.5%), followed by non-methodology studies (27.2%), PDF only (8.9%) and systematic reviews (3.4%) (Table 3). Prenatal exposure to commonly used food additives (EAFUS category ASP) are rarely studied with a rate of only 0.24% of methodology studies. Surprisingly, there is also a paucity of research on the effects of banned food additives on prenatal development. Of 2105 research articles investigating banned food additives, only 4 (0.19%) investigate effects during the prenatal period and only 3 (0.14%) were methodology studies (Table 4).
Table 3.

Breakdown of studies investigating prenatal exposure to food additives by study type

Study TypeAll Food Additive Studies (N = 3117)All Studies (N = 31 764)
Methodology (not including systematic review)1886 (60.5%)16 832 (53.0%)
Systematic Review (with methods)107 (3.4%)974 (3.1%)
Non-methodology Paper847 (27.2%)9060 (28.5%)
PDF only (no method assessment possible)277 (8.9%)4898 (15.4%)
Table 4.

Food additives from EAFUS breakdown by study type

Food AdditiveAll StudiesPrenatal Exposure StudiesPrenatal Exposure Methodology Studies*Prenatal Exposure Methodology Studies with Common Exposure AND EAFUS Exposure**Prenatal Exposure Methodology Studies with Only EAFUS Exposure
Commonly Used (ASP)463 898306718541111743
New (EAF)39 82139325616393
Limited Use (NIL)5571241174
Limited Use (NUL)27 493130805228
Banned (BAN)21054303
Total Unique Studies445 142311718861118768

excluding non-methods, systematic reviews.

lacking other common exposures.

Breakdown of studies investigating prenatal exposure to food additives by study type Food additives from EAFUS breakdown by study type excluding non-methods, systematic reviews. lacking other common exposures. Two raters independently assigned exposures to 50 articles. PEPPER’s error rate for exposure assignment ranged from 0.56% to 1.30% depending on the rater. Evaluation of the study type assignment showed agreement ranging from 96% to 100% depending on the rater (kappa = 0.916, p < .001). There was 100% agreement with regards to the PDF-only categorization among both raters and PEPPER. Two discrepancies existed between raters. One involved a state-of-the-art review article on a scientific method, and; therefore, one rater annotated it as a non-methodological paper and the other as a methodological paper. The second discrepancy involved a publication of a set of abstracts from a conference (assigned to one PMC ID). One rater decided this was not a methodological study because it was a set of abstracts (which is the same as PEPPER’s assignment) and the other decided that the term “method” was used in the abstracts, and; therefore, this should be considered a methodology study. Overall, agreement was high between raters (Fleiss kappa = 0.916, p < .001).

PEPPER’s recall

Using a gold standard set of 64 prenatal alcohol exposure studies, we computed PEPPER’s recall. Only 18 of the 64 relevant studies were found in PMC (28.1%), most likely this is because PMC requires free access to the manuscript’s fulltext. PEPPER successfully retrieved 17 of the 18 relevant studies (94.4%) available within PMC. The only relevant article that PEPPER failed to retrieve was a fetal alcohol syndrome study where the prenatal exposure to alcohol was implied implicitly but not stated explicitly. Diseases and disorders that are based on a prenatal exposure, such as “fetal alcohol syndrome disorder” sometimes omit mentioning that the exposure was prenatal because this knowledge is implied in the diagnosis.

Discussion

In this study, we describe the development of PEPPER: Prenatal Exposure PubMed ParsER. PEPPER performs 2 tasks: a) catalogs prenatal exposures studied in the literature and b) identifies study type. PEPPER allowed us to develop a list of commonly studied exposures. We are also able to identify exposures that have a large number of systematic reviews vs methodology studies and exposures that are discuss more in non-methodological studies (eg perspectives, guidelines, and editorials). This allowed us to identify gaps in the literature, especially pertaining to food additives (Figure 5). A main contribution of PEPPER is the ability to distinguish original research studies (ie methodology studies) from systematic reviews (containing a methods section) and also from non-methodology studies (eg perspectives, editorials, clinical practice guidelines). As of June 2018, PMC has added a “Methods – Key Terms” item to their metadata options for querying, which should facilitate this for other researchers in the future (however as of July 2018, it is still under development and not fully functional). The literature gaps that PEPPER identifies are vital for distinguishing exposures mentioned frequently in non-methodological papers vs commonly reviewed vs studied in formal research articles. Exposure assignment can be challenging. PEPPER does not distinguish between the main exposure outcome and the confounder variable in a study. Instead, it identifies all exposures present in the title or abstract of the study. Of 9492 articles where we identified an exposure, we found that on average 1.45 exposures were studied per methods paper. In Figure 4, we display the articles by exposure type and the patterns of overlap across studies. Exposures can be grouped by their patterns. Interestingly, food additives formed a unique pattern. Exposures can be grouped by their type (eg lifestyle vs genetics) or by the pattern obtained in Figure 4 illustrating how they are studied. These patterns can be utilized by others studying environmental exposures. Food additives are ubiquitous in modern day life. For example, caffeine is a well-known food additive with known prenatal effects. Furthermore, it is reported that 85% of the US population ingests a caffeine beverage per day with some estimates up to 89%. PEPPER identified 81 studies investigating prenatal exposure effects of caffeine. Therefore, this is a well-studied exposure. However, across all food additives, PEPPER revealed that less then 1% of studies for each of the major food additive groupings by the FDA investigate prenatal exposure. The highest proportion of prenatal exposure studies was found in the “new” food additives. Additionally, only 60.5% of food additive studies investigating prenatal exposure were methodological studies (Table 3). Across all major food additive categories, >1% of prenatal exposure studies are methodological in nature (Figure 5) illustrating an important literature gap. Food additives are important not only because of their potential teratogenicity, but also because of food allergies and their effects on the developing fetus. Food allergies are becoming more common with estimates between 1 to 10% of the general population. Since immune reactions during pregnancy can affect fetal development, studying food additives during the prenatal period is important to distinguish the food additive effect from the allergy. Another important literature gap is the paucity of literature on currently banned food additives. Only 3 articles describing prenatal exposure effects were methodological studies. Only 2 food additives were tested – coumarin and thiourea. All 3 studies were investigating the same exposure, namely thiourea (a pesticide). The coumarin study was a PDF-only “state-of-the-art” study from 1985. Thiourea (or ethylene thiourea) is a metabolite of mancozeb. Mancozeb is sprayed on banana plantations in Costa Rica, where pregnant women have been exposed to large doses, because they live in close proximity to the banana plantations. Another study reported elevated thiourea levels in pregnant agricultural rose workers in Ecuador. Some banned food additives such as calamus root have never been studied (in PMC) in the prenatal exposure context. Calamus root is an herbal remedy used by indigenous peoples of the Americas, especially among the Chipewyan people, to treat fungal infections. It is also noted to have hallucinogenic properties – most likely the reason it was banned as a food additive. However, studying its effects, at least in an animal model setting, is important as individuals may ingest this substance. Without knowledge of the effects of prenatal exposure on the offspring, we would be unlikely to detect this type of poisoning at birth. PEPPER’s main contributions are in identifying literature gaps with regard to methodological studies of prenatal exposure to various compounds, including food additives. In addition, we highlight exposures that are reviewed heavily in the literature with proportionally lower amounts of methodological studies (Table 2). We can also identify exposures where sex-specific outcomes have been investigated heavily vs exposures – including HIV – where very few sex-specific outcomes have been investigated (Figure 3). This form of “deep” analysis of the literature across the breadth of prenatal exposures is important for future researchers, especially given that no comprehensive list of prenatal exposures exists. Our list of 219 exposures can serve as an important starting point for the community while highlighting areas for future research. There are several limitations of our work. PEPPER provides a list of 219 exposures studied in the literature during the prenatal period. However, PEPPER does not capture every single prenatal exposure studied. PEPPER also depended on manual review of frequent terms. We are aware that this exposure set is not fully complete and requires additional refinement and analysis. Future work includes exploring applications of other methods to learn exposures from PMC. Another limitation is that while PEPPER captures study type (ie methodological, non-methodological, systematic review or PDF-only) it does not capture the outcomes following prenatal exposure. This would require another algorithm to extract the result of the prenatal exposure from the articles and remains a focus of future work.

Conclusion

In conclusion, we contribute the PEPPER framework that mines full-text articles from PubMed Central and extracts articles investigating exposure to compounds during the prenatal period. PEPPER distinguishes methodology studies from non-methodological studies (eg editorials, perspectives), systematic reviews and PDF-only papers. Overall, only 53.0% of prenatal exposure studies were methodology studies. PEPPER provides a master set of 219 prenatal exposures studied in the literature. No prior comprehensive list existed and therefore PEPPER provides the first such contribution. PEPPER captures exposures from 9492 prenatal exposure methodology studies out of 16 832 studies or 56.4%. PEPPER achieved a recall of 94.4%. PEPPER also highlights important literature gaps, including the paucity of research on prenatal exposure to food additives.

Funding

This project was made possible by generous funding from the Perelman School of Medicine, University of Pennsylvania.

Competing interests

None.

Contributors

Conceived Study Design: MRB Developed methodology: MRB, AK Performed evaluations: MRB, AK, JX Assisted with initial review: MRB, JX Provided informatics and medical advice pertinent to study problem: JH, SL Wrote Paper: MRB Reviewed, Edited, and Approved Final Manuscript: MRB, AK, JX, JH, SL

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online. In addition code is available at: https://github.com/bolandlab/PEPPER/ Click here for additional data file.
  36 in total

1.  On The Correlation Between Geo-Referenced Clinical Data And Remotely Sensed Air Pollution Maps.

Authors:  Arianna Dagliati; Andrea Marinoni; Carlo Cerra; Paolo Gamba; Riccardo Bellazzi
Journal:  Stud Health Technol Inform       Date:  2015

2.  Urticaria; detection of ingested allergens; the single food additive diet.

Authors:  B H WINSTON; R L SUTTON
Journal:  Practitioner       Date:  1948-05

Review 3.  Global Prevalence of Fetal Alcohol Spectrum Disorder Among Children and Youth: A Systematic Review and Meta-analysis.

Authors:  Shannon Lange; Charlotte Probst; Gerrit Gmel; Jürgen Rehm; Larry Burd; Svetlana Popova
Journal:  JAMA Pediatr       Date:  2017-10-01       Impact factor: 16.193

Review 4.  Systematic review of the potential adverse effects of caffeine consumption in healthy adults, pregnant women, adolescents, and children.

Authors:  Daniele Wikoff; Brian T Welsh; Rayetta Henderson; Gregory P Brorby; Janice Britt; Esther Myers; Jeffrey Goldberger; Harris R Lieberman; Charles O'Brien; Jennifer Peck; Milton Tenenbein; Connie Weaver; Seneca Harvey; Jonathan Urban; Candace Doepker
Journal:  Food Chem Toxicol       Date:  2017-04-21       Impact factor: 6.023

5.  Evidence for a food additive as a cause of ketosis-prone diabetes.

Authors:  T Helgason; M R Jonasson
Journal:  Lancet       Date:  1981-10-03       Impact factor: 79.321

6.  Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts.

Authors:  Chih-Hsuan Wei; Bethany R Harris; Donghui Li; Tanya Z Berardini; Eva Huala; Hung-Yu Kao; Zhiyong Lu
Journal:  Database (Oxford)       Date:  2012-11-17       Impact factor: 3.451

7.  STRING v10: protein-protein interaction networks, integrated over the tree of life.

Authors:  Damian Szklarczyk; Andrea Franceschini; Stefan Wyder; Kristoffer Forslund; Davide Heller; Jaime Huerta-Cepas; Milan Simonovic; Alexander Roth; Alberto Santos; Kalliopi P Tsafou; Michael Kuhn; Peer Bork; Lars J Jensen; Christian von Mering
Journal:  Nucleic Acids Res       Date:  2014-10-28       Impact factor: 16.971

8.  Uncovering exposures responsible for birth season - disease effects: a global study.

Authors:  Mary Regina Boland; Pradipta Parhi; Li Li; Riccardo Miotto; Robert Carroll; Usman Iqbal; Phung-Anh Alex Nguyen; Martijn Schuemie; Seng Chan You; Donahue Smith; Sean Mooney; Patrick Ryan; Yu-Chuan Jack Li; Rae Woong Park; Josh Denny; Joel T Dudley; George Hripcsak; Pierre Gentine; Nicholas P Tatonetti
Journal:  J Am Med Inform Assoc       Date:  2018-03-01       Impact factor: 4.497

9.  Content-rich biological network constructed by mining PubMed abstracts.

Authors:  Hao Chen; Burt M Sharp
Journal:  BMC Bioinformatics       Date:  2004-10-08       Impact factor: 3.169

10.  Cardiovascular Disease Risk Varies by Birth Month in Canines.

Authors:  Mary Regina Boland; Marc S Kraus; Eddie Dziuk; Anna R Gelzer
Journal:  Sci Rep       Date:  2018-05-17       Impact factor: 4.379

View more
  1 in total

1.  Applied Veterinary Informatics: Development of a Semantic and Domain-Specific Method to Construct a Canine Data Repository.

Authors:  Mary Regina Boland; Margret L Casal; Marc S Kraus; Anna R Gelzer
Journal:  Sci Rep       Date:  2019-12-09       Impact factor: 4.379

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.