Evolution of colorectal cancer screening research in the past 25 years: text-mining analysis of publication trends and topics.

Shelly Soffer, Eyal Klang, Noam Tau, Roni Zemet, Shomron Ben-Horin, Yiftach Barash, Uri Kopylov.

Abstract

BACKGROUND: There is a growing research effort in the field of colorectal cancer (CRC) screening, with varying topics and shifting research foci over the years. The aim of this study was to apply a text-mining technique to evaluate trends in publications for CRC screening in the last 25 years.
METHODS: We retrieved MEDLINE/PubMed datasets from 1992–2017. We selected keywords from Medical Subject Headings to identify CRC screening-related publications. For each article, we extracted the following data: title, journal, publication date, abstract, article type, citation frequency, and country of origin. Articles were categorized into topics using a word-combination and title-matching technique.
RESULTS: From 1992 to 2017, 14,119 CRC screening-related papers were published. The US had the highest number of papers (n = 4824), and China had the highest growth rate in publications. Overall, the most researched topic was "screening and surveillance programs" (38%). The topics of "quality assurance" (r = 0.87) and "racial disparities" (r = 0.91) gained increasing research attention over the years. In total, 11 of the 20 most cited articles in the field were published in The New England Journal of Medicine.
CONCLUSION: The number of publications devoted to CRC screening has grown, with high-quality research reaching top-tier journals. Publications have surged in countries previously less involved in research in the field. Screening programs remain the most researched topic, and quality indicators are attracting growing attention. Text-mining analysis of CRC screening research contributes to an understanding of publication trends and topics and can point to areas for future investigation.
© The Author(s), 2020.

Keywords:  cancer screening; colorectal cancer; text mining

Year:  2020        PMID: 32733602      PMCID: PMC7372615          DOI: 10.1177/1756284820941153

Source DB:  PubMed          Journal:  Therap Adv Gastroenterol        ISSN: 1756-283X            Impact factor:   4.409


Introduction

Colorectal cancer (CRC) is a worldwide healthcare problem with high morbidity and mortality; it is the second leading cause of cancer-related death globally.[1] The annual expenditures associated with CRC are substantial, estimated at 14 billion US dollars per year in the US alone.[2] Several factors make CRC suitable for population screening, including its high incidence, a prolonged course of disease development, and effective endoscopic treatment options at premalignant stages.[3] CRC screening has therefore been adopted on a wide scale and incorporated into international guidelines,[4,5] and it has significantly reduced the incidence and mortality rates of the disease over the last two decades.[6,7]

An extensive body of CRC screening research has been published, assessing various aspects including available screening tools (endoscopic, fecal, radiologic, and blood tests), screening programs, high-risk populations, and cost-effectiveness. Manually summarizing this volume of research is impractical. Previous efforts to identify research trends in other medical fields have been restricted to a defined number of cited articles,[8] to specific journals,[9] or to a limited number of years.[10]

Advances in computational power and machine learning have given rise to a technique termed “text-mining,” which extracts information from texts using computational statistical methods.[11] Text-mining can be applied to identify trends and to investigate the dynamics of a research field.[12-15] The aim of our study was to apply a text-mining technique to evaluate the published literature on CRC screening over the last 25 years, performing a trend analysis to discover patterns in CRC screening publications.

Methods

Institutional review board (IRB) approval was granted for this study, and the IRB waived the requirement for informed consent.

Search strategy

The US National Library of Medicine (NLM) produces an annual version of MEDLINE/PubMed data which is freely available for download.[16-18] We used the 2018 MEDLINE/PubMed baseline dataset in this study. We retrieved all available MEDLINE/PubMed annual datasets from 1992 to the end of 2017 (25 years). Data lock and citation retrieval were performed on 1 August 2019.
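The baseline is distributed as gzipped XML files of PubmedArticle records. As a minimal sketch only (not the authors' code), those records could be streamed with Python's standard-library `xml.etree.ElementTree.iterparse`, assuming the usual MEDLINE element layout (MedlineCitation/PMID, Article/ArticleTitle, Article/Abstract/AbstractText, and so on). The helper names `iter_records` and `read_baseline_file` and the tiny `SAMPLE` record are hypothetical.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from typing import IO, Any, Dict, Iterator, Optional


def _text(elem: Optional[ET.Element]) -> str:
    """All text inside an element; titles may contain inline markup such as <i>."""
    return "".join(elem.itertext()).strip() if elem is not None else ""


def iter_records(stream: IO[bytes]) -> Iterator[Dict[str, Any]]:
    """Stream PubmedArticle records so a large baseline file need not fit in memory."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "PubmedArticle":
            continue
        cit = elem.find("MedlineCitation")
        art = cit.find("Article")
        first_author = art.find("AuthorList/Author")
        # Some records carry <MedlineDate> instead of <Year>; those get year=None here.
        year = art.findtext("Journal/JournalIssue/PubDate/Year")
        yield {
            "pmid": cit.findtext("PMID"),
            "title": _text(art.find("ArticleTitle")),
            "journal": art.findtext("Journal/Title"),
            "year": int(year) if year else None,
            # Structured abstracts are split into several <AbstractText> sections.
            "abstract": " ".join(_text(a) for a in art.findall("Abstract/AbstractText")),
            "language": art.findtext("Language"),
            "pub_types": [_text(p) for p in art.findall("PublicationTypeList/PublicationType")],
            "affiliation": (first_author.findtext("AffiliationInfo/Affiliation")
                            if first_author is not None else None),
        }
        elem.clear()  # release the parsed subtree once the record has been read


def read_baseline_file(path: str) -> Iterator[Dict[str, Any]]:
    """Read one gzipped baseline file; the path is supplied by the caller."""
    with gzip.open(path, "rb") as fh:
        yield from iter_records(fh)


# Tiny hand-written record in the baseline's element layout, for illustration only.
SAMPLE = b"""<PubmedArticleSet><PubmedArticle><MedlineCitation>
<PMID>1</PMID><Article><Journal><JournalIssue><PubDate><Year>1993</Year></PubDate>
</JournalIssue><Title>N Engl J Med</Title></Journal>
<ArticleTitle>Colorectal cancer screening trial.</ArticleTitle>
<Abstract><AbstractText>Screening for colon cancer.</AbstractText></Abstract>
<AuthorList><Author><AffiliationInfo><Affiliation>Mayo Clinic, USA</Affiliation>
</AffiliationInfo></Author></AuthorList><Language>eng</Language>
<PublicationTypeList><PublicationType>Randomized Controlled Trial</PublicationType>
</PublicationTypeList></Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"""

records = list(iter_records(io.BytesIO(SAMPLE)))
```

Streaming with `iterparse` and clearing each finished record keeps memory flat, which matters because the full baseline spans many files and millions of records.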

Data processing

Data processing and results visualization were implemented in Python (version 3.6.5, 64-bit). We used the open-source libraries Pandas (version 0.24.2) for data handling, Geopandas (version 0.4.1) for geographical visualization, SciPy (version 1.3.0) for statistical analysis, NLTK (version 3.4.4) for text handling, and Matplotlib (version 3.1.0) for results visualization. For text-mining, each title, abstract, and first author’s affiliation was tokenized: all punctuation and double spaces were removed, and each word became a single entry in a list.
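The tokenization step can be sketched with a regular expression in place of NLTK. This is an illustrative approximation, not the study's code: lower-casing the text and replacing each punctuation mark with a space (so hyphenated words split in two) are assumptions.

```python
import re
from typing import List

# Any character that is not a letter, digit, underscore, or whitespace counts as punctuation.
_PUNCT = re.compile(r"[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Lower-case, replace punctuation with spaces, and split into a list of words."""
    cleaned = _PUNCT.sub(" ", text.lower())
    # str.split() with no argument also collapses runs of repeated spaces.
    return cleaned.split()


tokens = tokenize("Colorectal-cancer screening:  a  U.S. perspective.")
```

Note the trade-off: abbreviations such as "U.S." become the separate tokens "u" and "s", which is harmless for word-combination counting but worth knowing when matching country names.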

Inclusion and exclusion criteria

To create a subset of papers relevant to our topic, we used Medical Subject Headings (MeSH). MeSH is the controlled vocabulary used for indexing and searching articles in the PubMed database.[13] The following keywords were selected from MeSH to create the subset of articles relating to the colon and rectum: “colorectal,” “colon,” “rectal,” “rectum,” “colonic,” and “CRC.” These terms were matched against the tokenized titles to retrieve a subset of records. From this subset, we included papers whose abstracts contained one of the terms “cancer,” “carcinoma,” “adenocarcinoma,” “adenoma,” “polyp,” or “mass,” and also one of the terms “screening,” “surveillance,” or “screen.” Abstracts shorter than 50 words were excluded.
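The inclusion rules can be expressed as a single predicate over tokenized text. A minimal sketch, assuming lower-cased tokens, exact token matching (so a plural such as "polyps" would not match "polyp"), and that the screening terms are also searched in the abstract; `is_included` is a hypothetical name.

```python
from typing import Iterable

# Term lists taken from the inclusion criteria described above.
SITE_TERMS = {"colorectal", "colon", "rectal", "rectum", "colonic", "crc"}
LESION_TERMS = {"cancer", "carcinoma", "adenocarcinoma", "adenoma", "polyp", "mass"}
SCREEN_TERMS = {"screening", "surveillance", "screen"}
MIN_ABSTRACT_WORDS = 50


def is_included(title_tokens: Iterable[str], abstract_tokens: Iterable[str]) -> bool:
    """Apply the title, abstract-term, and abstract-length criteria to one record."""
    abstract = list(abstract_tokens)
    if len(abstract) < MIN_ABSTRACT_WORDS:  # abstracts shorter than 50 words are excluded
        return False
    title = set(title_tokens)
    words = set(abstract)
    return bool(title & SITE_TERMS) and bool(words & LESION_TERMS) and bool(words & SCREEN_TERMS)


# A 50-token illustrative abstract that mentions both a lesion term and a screening term.
abstract = ["colorectal", "cancer", "screening"] + ["word"] * 47
```

Using set intersections keeps each check linear in the number of tokens, so the filter stays cheap when applied to millions of records.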

Data extraction

The following data were extracted from each included article: PubMed unique article identifier (PMID), title, journal, publication date (year and month), abstract text, article type (e.g. review, randomized controlled trial), article language, and authors (including the first author’s affiliation, if available). We then used a free application provided by the National Center for Biotechnology Information (NCBI) to retrieve the number of times each article was cited, based on its PMID.[19] When available, the first author’s country was identified by comparing the first author’s affiliation with a country list extracted from the Geopandas library. We normalized the number of publications and the number of citations for each country to its population, using yearly population sizes from the World Bank Catalog.[20]
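Country assignment and population normalization could look roughly like this. The small alias table is a hypothetical stand-in for the Geopandas country list, and choosing the country mentioned last in the affiliation string is an assumption (affiliations usually end with the country, which helps with strings such as "China Medical University, Taichung, Taiwan"). The counts and population figures in the example are illustrative, not study data.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Hypothetical stand-in for a country-name list; maps lower-case aliases to one country name.
COUNTRY_ALIASES: Dict[str, str] = {
    "united states": "United States",
    "usa": "United States",
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "netherlands": "Netherlands",
    "china": "China",
    "taiwan": "Taiwan",
    "israel": "Israel",
}


def affiliation_country(affiliation: Optional[str]) -> Optional[str]:
    """Return the country whose name appears last in the affiliation, or None."""
    if not affiliation:
        return None
    text = affiliation.lower()
    best, best_pos = None, -1
    for alias, country in COUNTRY_ALIASES.items():
        # Word boundaries stop "uk" from matching inside words such as "ukraine".
        for match in re.finditer(rf"\b{re.escape(alias)}\b", text):
            if match.start() > best_pos:
                best, best_pos = country, match.start()
    return best


def per_million(counts: Dict[str, int], population: Dict[str, float]) -> Dict[str, float]:
    """Convert raw counts to counts per million people, skipping unknown populations."""
    return {c: n / (population[c] / 1e6) for c, n in counts.items() if population.get(c)}


affiliations = [
    "Department of Gastroenterology, Sheba Medical Center, Israel",
    "China Medical University, Taichung, Taiwan",
    "Mayo Clinic, Rochester, MN, USA",
]
counts = Counter(affiliation_country(a) for a in affiliations)
```

Normalizing by population lets small countries with dense research output (the Results report the Netherlands, Denmark, and Israel) stand out against countries with large absolute counts.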

Topic modeling

All included studies were divided into topics using the following methodology. First, each study’s title was analyzed after omitting stop words such as “the,” “a,” “an,” and “in,” as listed in the NLTK (version 3.4.4) stopwords corpus. The 1000 most frequent two-word combinations across all titles were then listed in descending order of frequency. A gastroenterologist (KU) defined 10 topics in the field of CRC screening: screening and surveillance programs, risk stratification, non-invasive screening, epidemiology, inflammatory bowel disease (IBD) screening, quality assurance, racial disparities, treatment, quality of life, and cost-effectiveness. Each word combination in the list was manually labeled as either non-specific or related to 1 of the 10 topics. Finally, each study record was assigned to one of the 10 topics by comparing the words in its title with the labeled topic list.
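The two-word-combination (bigram) step and the title-to-topic matching can be sketched with the standard library. The short stop-word set stands in for the full NLTK list, the `LABELS` mapping is a hypothetical fragment of the manual labeling, and resolving a title to the topic with the most labeled bigrams (ties going to the first one seen) is an assumed matching rule, since the exact rule is not specified.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Bigram = Tuple[str, str]

# Hypothetical short stand-in for the NLTK English stop-word list.
STOP_WORDS = {"the", "a", "an", "in", "of", "for", "and", "with", "to", "on"}

# Hypothetical fragment of the manual labeling; unlabeled bigrams count as non-specific.
LABELS: Dict[Bigram, str] = {
    ("quality", "indicators"): "quality assurance",
    ("racial", "disparities"): "racial disparities",
    ("fecal", "immunochemical"): "non-invasive screening",
}


def title_bigrams(tokens: Sequence[str]) -> List[Bigram]:
    """Adjacent word pairs in a title after stop-word removal."""
    words = [t for t in tokens if t not in STOP_WORDS]
    return list(zip(words, words[1:]))


def top_bigrams(titles: Sequence[Sequence[str]], n: int = 1000) -> List[Tuple[Bigram, int]]:
    """The n most frequent bigrams across all titles, most frequent first."""
    counts = Counter(bg for title in titles for bg in title_bigrams(title))
    return counts.most_common(n)


def assign_topic(tokens: Sequence[str], labels: Dict[Bigram, str]) -> Optional[str]:
    """Topic with the most labeled bigrams in the title; None if nothing matches."""
    hits = Counter(labels[bg] for bg in title_bigrams(tokens) if bg in labels)
    # most_common keeps first-encountered order for equal counts, which breaks ties.
    return hits.most_common(1)[0][0] if hits else None


titles = [
    ["quality", "indicators", "for", "colonoscopy"],
    ["racial", "disparities", "in", "colorectal", "cancer", "screening"],
    ["colorectal", "cancer", "screening", "in", "the", "elderly"],
]
```

Working on the most frequent bigrams first means a single manual pass over a ranked list of 1000 phrases can cover most titles.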

Data analysis

We used Pearson correlation to evaluate normalized trends in topics over 5-year intervals, and univariable linear regression to evaluate trends in country growth rates. The statistical significance of each slope is reported as the p-value returned by the regression (SciPy). For the article type and country analyses, the citation rate was calculated by dividing the total number of citations by the total number of publications.
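The three statistics map onto SciPy's `scipy.stats.pearsonr` and `scipy.stats.linregress` plus a simple ratio. A minimal sketch, assuming each topic's normalized share has already been computed per 5-year interval and each country's yearly series is passed in as-is (the exact normalization behind the reported growth rates is not specified); the input numbers below are illustrative only.

```python
from typing import Sequence, Tuple

from scipy import stats


def topic_trend(interval_shares: Sequence[float]) -> Tuple[float, float]:
    """Pearson r and p-value between interval order (0, 1, 2, ...) and a topic's share."""
    r, p = stats.pearsonr(list(range(len(interval_shares))), list(interval_shares))
    return float(r), float(p)


def growth_rate(years: Sequence[int], yearly_values: Sequence[float]) -> Tuple[float, float]:
    """Slope of a univariable linear regression on year, with the slope's p-value."""
    result = stats.linregress(years, yearly_values)
    return float(result.slope), float(result.pvalue)


def citation_rate(total_citations: int, total_publications: int) -> float:
    """Number of citations divided by number of publications."""
    return total_citations / total_publications


# Illustrative inputs: a topic share falling steadily across five intervals,
# and a yearly series growing by two units per year.
r, p_trend = topic_trend([0.40, 0.35, 0.30, 0.25, 0.20])
slope, p_slope = growth_rate([2013, 2014, 2015, 2016], [10, 12, 14, 16])
```

With only five intervals, each correlation rests on five points, so even fairly large values of r can miss conventional significance.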

Results

A total of 19,657,610 records published between 1992 and 2017 were retrieved from the NLM database. Of these, 14,119 publications were related to CRC screening. A flow diagram of the search is provided in Figure 1. Most papers (93.5%) were published in English.
Figure 1.

Flow diagram of included studies using MEDLINE/PubMed search.

Time trend analysis

The number of annual publications relating to CRC screening increased between 1992 and 2014 (Figure 2), with a slight decline since 2014. The overall number of annual publications increased from 124 publications in 1992 to 992 publications in 2017.
Figure 2.

Trends in the number of colorectal cancer screening publications from 1992 to 2017.

Article type analysis

MEDLINE/PubMed article type was specified for 2862/14,119 (20.3%) papers. Among these, 1429/2862 (49.9%) were review articles, 519/2862 (18.1%) were randomized controlled trials, and 412/2862 (14.4%) were multi-center studies. Guideline papers had the highest citation rate (number of citations/number of publications; 69.2), followed by multi-center studies (27.4) and randomized controlled trials (27.3). Figure 3 shows the distribution of article types and their corresponding citation rates.
Figure 3.

Distribution of colorectal cancer screening publications by article type, indicating (a) publication volume and (b) citation rate (i.e. number of citations per number of publications).

Country analysis

Publications on CRC screening originated from 97 countries, mainly in North America and Europe. The US had the highest number of papers (n = 4824), followed by the UK (n = 927) and China (n = 848). After normalization to population size, the Netherlands had the largest number of publications per million people (n = 27), followed by Denmark (n = 23) and Israel (n = 20). The countries with the most citations were the US (n = 73,638), the UK (n = 12,678), and Germany (n = 6642). After normalization to population size, the Netherlands had the largest number of citations per million people (n = 373), followed by Finland (n = 288) and the US (n = 250). The countries with the highest citation rate (number of citations/number of publications) were Finland (18), the US (15), and the UK (13). Figure 4 presents the distributions of publications, citations, and citation rates among the 20 countries with the highest number of publications.
Figure 4.

Distribution of colorectal cancer screening publications by country, indicating (a) publication volume, (b) citation frequency, and (c) citation rate (i.e. number of citations per number of publications). The left axis presents absolute numbers and the right axis presents values normalized to country population size.

Figure 5 shows the growth rate in the number of annual publications by country of origin. China (0.14, p < 0.001), Spain (0.13, p < 0.001), and Taiwan (0.10, p < 0.001) had the highest growth rates over time.
Figure 5.

World map indicating colorectal cancer screening publication growth rate by country. The color index represents the calculated growth rate. Countries with fewer than 100 publications overall were omitted from the growth rate analysis (shown in white).

Topic analysis

The most researched topic was “screening and surveillance programs” (38%), although research attention to this topic decreased continuously over the past 25 years (r = −0.91, p = 0.035). “Non-invasive screening” attracted consistent research interest (14%; r = 0.85, p = 0.93). The topic of “risk stratification” showed a non-linear trend over time (r = −0.72, p = 0.63), increasing until 1997 (24%) and then decreasing steadily until the end of 2017 (16%). The topic of “quality assurance” was steady until 2007 and has gained increased research attention in the past decade (r = 0.87, p = 0.052). Research interest in “racial disparities” increased nearly fivefold from 1992 to 2007 (from 0.7% to 3.3%) and has remained stable since then (r = 0.91, p = 0.032). Quality assurance was the fastest-growing topic in the last 5 years (2012–2017). Figure 6 presents the distribution of all assigned topics by year of publication.
Figure 6.

Published article topic popularity and trends over time.

Most frequently cited articles

Table 1 lists the 20 most cited articles on CRC screening published in the past 25 years. The mean number of citations per article is 98. The top 20 most cited articles were published in five journals, most often in The New England Journal of Medicine (n = 11), and originated mainly from the US (n = 13).
Table 1.

The top 20 most frequently cited articles published on colorectal cancer screening in the past 25 years.

| Article rank | Study | Title | Publication year | Number of times cited | Country | Journal |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | U.S. Preventive Services Task Force[21] | Screening for colorectal cancer: U.S. Preventive Services Task Force recommendation statement. | 2008 | 560 | United States | Annals of Internal Medicine |
| 2 | Mandel et al.[22] | Reducing mortality from colorectal cancer by screening for fecal occult blood. Minnesota Colon Cancer Control Study. | 1993 | 554 | United States | The New England Journal of Medicine |
| 3 | Levin et al.[23] | Screening and surveillance for the early detection of colorectal cancer and adenomatous polyps, 2008: a joint guideline from the American Cancer Society, the US Multi-Society Task Force on Colorectal Cancer, and the American College of Radiology. | 2008 | 541 | United States | Gastroenterology |
| 4 | Edwards et al.[24] | Annual report to the nation on the status of cancer, 1975–2006, featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. | 2009 | 526 | United States | Cancer |
| 5 | Hardcastle et al.[25] | Randomised controlled trial of faecal-occult-blood screening for colorectal cancer. | 1996 | 497 | United Kingdom | Lancet (London, England) |
| 6 | Zauber et al.[26] | Colonoscopic polypectomy and long-term prevention of colorectal-cancer deaths. | 2012 | 470 | United States | The New England Journal of Medicine |
| 7 | Kronborg et al.[27] | Randomised study of screening for colorectal cancer with faecal-occult-blood test. | 1996 | 437 | Denmark | Lancet (London, England) |
| 8 | Winawer et al.[28] | Colorectal cancer screening and surveillance: clinical guidelines and rationale-update based on new evidence. | 2003 | 416 | United States | Gastroenterology |
| 9 | Rex et al.[29] | American College of Gastroenterology guidelines for colorectal cancer screening 2009. | 2009 | 403 | United States | The American Journal of Gastroenterology |
| 10 | Atkin et al.[30] | Once-only flexible sigmoidoscopy screening in prevention of colorectal cancer: a multicentre randomised controlled trial. | 2010 | 385 | United Kingdom | Lancet (London, England) |
| 11 | Lieberman et al.[31] | Use of colonoscopy to screen asymptomatic adults for colorectal cancer. Veterans Affairs Cooperative Study Group 380. | 2000 | 304 | United States | The New England Journal of Medicine |
| 12 | Pickhardt et al.[32] | Computed tomographic virtual colonoscopy to screen for colorectal neoplasia in asymptomatic adults. | 2003 | 303 | United States | The New England Journal of Medicine |
| 13 | Mandel et al.[33] | The effect of fecal occult-blood screening on the incidence of colorectal cancer. | 2000 | 300 | United States | The New England Journal of Medicine |
| 14 | Baxter et al.[34] | Association of colonoscopy and death from colorectal cancer. | 2008 | 292 | Canada | Annals of Internal Medicine |
| 15 | Hampel et al.[35] | Screening for the Lynch syndrome (hereditary nonpolyposis colorectal cancer). | 2005 | 287 | United States | The New England Journal of Medicine |
| 16 | Kaminski et al.[36] | Quality indicators for colonoscopy and the risk of interval cancer. | 2010 | 282 | Poland | The New England Journal of Medicine |
| 17 | Nishihara et al.[37] | Long-term colorectal-cancer incidence and mortality after lower endoscopy. | 2013 | 260 | United States | The New England Journal of Medicine |
| 18 | Järvinen et al.[38] | Controlled 15-year trial on screening for colorectal cancer in families with hereditary nonpolyposis colorectal cancer. | 2000 | 248 | Finland | Gastroenterology |
| 19 | Selby et al.[39] | A case-control study of screening sigmoidoscopy and mortality from colorectal cancer. | 1992 | 235 | United States | The New England Journal of Medicine |
| 20 | Whitlock et al.[40] | Screening for colorectal cancer: a targeted, updated systematic review for the U.S. Preventive Services Task Force. | 2008 | 229 | United States | Annals of Internal Medicine |

Discussion

In our study, we applied a text-mining approach to present an overview of 14,119 CRC screening publications over the past 25 years. The number of CRC screening publications has increased over the years: in 2017, eight times as many CRC screening papers were published as in 1992. This increase coincides with the general trend of increasing global publication output in the medical field.[41] Several factors may explain this trend in CRC screening publications. The growth may result from the expansion of CRC screening programs and the implementation of population-based programs.[42,43] Growing awareness of CRC screening is consistent with worldwide efforts focused on cancer prevention.[44-46] Furthermore, the increase in CRC screening publications could be linked to the rise in CRC incidence, particularly in countries in Eastern Europe, Asia, and South America.[47] Another possible factor is the emergence of new technologies. For example, computed tomographic colonography was introduced by Vining et al. in 1994 and was followed by a large number of related publications.[48] Research in the field of CRC screening started with several seminal publications about 25 years ago.[3,22,25] These papers established that CRC screening can effectively reduce CRC mortality: they showed that colonoscopic polypectomy resulted in a lower-than-expected incidence of CRC and that annual fecal occult-blood testing decreased CRC mortality. These papers likely promoted interest in CRC screening research and added momentum to publication output. Among article types, guideline articles had the highest citation rate.
Guidelines usually synthesize a large body of research and can influence clinical practice.[49] Over the past 25 years, CRC screening guidelines have been issued by professional societies and expert panels of gastroenterologists. They offer recommendations to assist practitioners and patients in screening decisions, addressing average-risk persons, individuals with a high-risk family history, screening tools, and quality indicators.[23,28,50,51] The beneficial effects of guidelines depend on their successful adaptation to clinical settings, and the high citation rate of CRC screening guidelines reflects their contribution to the field. Most CRC screening research has been performed in North America and Europe, where greater resources are available and screening is more frequently implemented.[4] CRC incidence and the implementation of CRC screening differ among continents and countries.[4] The US leads in the number of publications and citations in CRC screening, reflecting the prominent position of the US gastroenterology community in international CRC screening research. The advancement of screening programs in the US can also be attributed to the efforts of national organizations, including the U.S. Preventive Services Task Force, the American Cancer Society, the American Gastroenterological Association, the American Society for Gastrointestinal Endoscopy, and the National Colorectal Cancer Roundtable. As reported by the American Cancer Society, extensive research and the progression of screening programs in the US have contributed to a decrease in CRC incidence and mortality over the past two decades.[52] Over the last few decades, CRC incidence and mortality have been increasing in Asia.[53] We observed a high growth rate of CRC screening publications in China.
Screening programs in China are still relatively limited.[54,55] The rising trend in CRC screening publications may promote understanding of the significance of screening, which may ultimately influence screening behavior in the wider population. In our study, we performed a text-mining analysis of two-word combinations, which allowed us to identify “hot topics” in the field of CRC screening. As expected, the most researched topic in the field was “screening and surveillance programs.” This topic has remained dominant over 25 years, even as its share has declined. We found that “quality assurance” was the fastest-growing topic over the last 5 years, which may help predict, to some extent, future trends in CRC publications. “Quality assurance” refers to optimizing the benefit-to-risk ratio of screening colonoscopy.[36,56] Initially, research focused on implementing CRC screening programs, but over time, emphasis has also been placed on the quality of screening. “Non-invasive screening” is a prominent topic, with a slight, non-significant increase in the number of studies during the last decade. New laboratory tests include DNA, RNA, and protein biomarker stool and blood tests.[57] Novel imaging tests include colon capsule endoscopy[58] and magnetic resonance colonography.[59] Research on this topic can be attributed to attempts to develop and implement non-invasive tests, thereby reducing the need for colonoscopy in low-risk populations. Although relatively few studies have focused on “racial disparities,” interest in race-related research increased in 2002 and has plateaued since then. Minority groups experience disparities in CRC screening, with screening rates remaining low among African Americans, Hispanics, and Asians.[60-62] The accumulated research on “racial disparities” can promote effective interventions designed to decrease gaps in CRC screening.
The research topics of “screening among IBD patients” and “risk stratification” have declined over the years. This may indicate that the foundation for screening recommendations in high-risk groups has already been effectively formulated. The 20 most cited articles were published in top-ranking medical journals, 11 of them in The New England Journal of Medicine, reflecting the importance of the subject.

Our research has several limitations. First, this comprehensive study covers 25 years of research conducted in 97 countries; as such, it can only provide a global-level representation of CRC screening research. Second, citation frequency was extracted from data provided by NCBI; other sources, such as Google Scholar, might have produced different results. Lastly, we used two-word combinations for topic modeling. Other approaches, such as latent Dirichlet allocation, are available but were found to be less effective in our study.

In conclusion, the number of publications devoted to CRC screening has risen substantially, with high-quality research reaching top-tier journals. Publications have surged in countries previously much less involved in academic research in the field. Screening programs remain the most researched topic, and quality indicators in screening colonoscopy have been attracting attention in recent years. Text-mining analysis of CRC screening research contributes to the understanding of current publication trends and topics and may help anticipate future trends in CRC publications.
References

1. Computed tomographic virtual colonoscopy to screen for colorectal neoplasia in asymptomatic adults.

Authors: Perry J Pickhardt; J Richard Choi; Inku Hwang; James A Butler; Michael L Puckett; Hans A Hildebrandt; Roy K Wong; Pamela A Nugent; Pauline A Mysliwiec; William R Schindler
Journal: N Engl J Med; Date: 2003-12-01

2. Evaluation and adaptation of clinical practice guidelines.

Authors: Ian D Graham; Margaret B Harrison
Journal: Evid Based Nurs; Date: 2005-07

3. Colorectal cancer screening: a global overview of existing programmes.

Authors: Eline H Schreuders; Arlinda Ruco; Linda Rabeneck; Robert E Schoen; Joseph J Y Sung; Graeme P Young; Ernst J Kuipers
Journal: Gut; Date: 2015-06-03

4. Evaluation of the PillCam Colon capsule in the detection of colonic pathology: results of the first multicenter, prospective, comparative study.

Authors: R Eliakim; Z Fireman; I M Gralnek; K Yassin; M Waterman; Y Kopelman; J Lachter; B Koslowsky; S N Adler
Journal: Endoscopy; Date: 2006-10

5. Colorectal cancer screening: Recommendations for physicians and patients from the U.S. Multi-Society Task Force on Colorectal Cancer.

Authors: Douglas K Rex; C Richard Boland; Jason A Dominitz; Francis M Giardiello; David A Johnson; Tonya Kaltenbach; Theodore R Levin; David Lieberman; Douglas J Robertson
Journal: Gastrointest Endosc; Date: 2017-06-06

6. Randomised study of screening for colorectal cancer with faecal-occult-blood test.

Authors: O Kronborg; C Fenger; J Olsen; O D Jørgensen; O Søndergaard
Journal: Lancet; Date: 1996-11-30

7. Global trends in medical journal publishing.

Authors: Youngsuk Chi
Journal: J Korean Med Sci; Date: 2013-08

8. Effect of screening colonoscopy on colorectal cancer incidence and mortality.

Authors: Charles J Kahi; Thomas F Imperiale; Beth E Juliar; Douglas K Rex
Journal: Clin Gastroenterol Hepatol; Date: 2009-01-11

9. Evolution of Inflammatory Bowel Disease Research From a Bird's-Eye Perspective: A Text-Mining Analysis of Publication Trends and Topics.

Authors: Yiftach Barash; Eyal Klang; Noam Tau; Shomron Ben-Horin; Hussein Mahajna; Asaf Levartovsky; Naila Arebi; Shelly Soffer; Uri Kopylov
Journal: Inflamm Bowel Dis; Date: 2021-02-16

10. Text mining for identifying topics in the literatures about adolescent substance use and depression.

Authors: Shi-Heng Wang; Yijun Ding; Weizhong Zhao; Yung-Hsiang Huang; Roger Perkins; Wen Zou; James J Chen
Journal: BMC Public Health; Date: 2016-03-19