Simon Joss, Daan Schraven, Martin de Jong.
Abstract
This data article presents a tripartite dataset that formed the empirical basis for a comprehensive bibliometric analysis of the use of city labels denoting sustainable urbanism in the scientific literature (Schraven, 2021). The tripartite dataset was generated using the abstract and citation database Scopus (Elsevier). Dataset A lists 148 city labels denoting different approaches to urban planning and development. It was used to select 35 city labels that specifically address sustainable urbanism ('sustainable city', 'smart city', 'compact city' etc.). Dataset B references 11,337 journal and review articles spanning the period 1990-2019. All retrieved articles contain at least one of the 35 city labels in the title, abstract, and author keywords. This database was used to calculate the frequency of the selected city labels across time, and to analyze the co-occurrences of city labels. It was further used to calculate the future trajectory of scientific outputs using the Logistic Growth Model (LGM). Dataset C entails 22,820 author keywords extracted from across the 11,337 articles. This was used to analyze the co-occurrences of keywords with city labels. The data article describes the methods of data collection and curation, the analysis performed, and the potential for reusing the data for further research. The comprehensiveness of the bibliometric corpus - spanning three decades and 35 city labels - lends itself to further investigation of how sustainable urban development has evolved as a topic in the scientific literature since the 1990s. Furthermore, the robust methodology developed could be adapted to other scientific repositories and, indeed, other research problems and questions.
Keywords: Bibliometrics; City labels; SDGs; Scientometrics; Smart city; Sustainable city; Urban development; Urban futures
Year: 2022 PMID: 35252490 PMCID: PMC8889341 DOI: 10.1016/j.dib.2022.107966
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Tripartite dataset A–C.
Methodological procedures for datasets A–C.
- Check existing bibliometric studies on multiple city labels.
- Input the 12 city labels as search query in Scopus to retrieve further city labels from author keywords of retrieved articles, resulting in 148 city labels.
- Delete any duplicate city labels; carry out qualitative review (triangulated among researchers) based on three joint criteria (derived from
- Formulate the 35 city labels as search query: see footnote 1.
- Enter search query in Scopus, setting the period to 1990–2019, thus retrieving 11,337 articles.
- Collect bibliometric data: (i) title; (ii) abstract; (iii) author keywords.
- Arrange 5-yearly temporal incisions resulting in 6 cumulative periods: 1990–1994; 1990–1999; 1990–2004; 1990–2009; 1990–2014; 1990–2019.
- Count all articles in database mentioning a given city label at least once (an article is counted only once even if the given city label is mentioned twice or more); repeat for each of the 35 city labels.
- Tabulate city labels from highest to lowest counts, across the six cumulative time periods.
- Draw line graph showing yearly counts 1990–2019 for all city labels; apply logarithmic scale for legibility.
- Draw scatter plot showing relative positions (cumulative frequencies) and new entry points of the 35 city labels across the six time periods.
- Count all articles mentioning a pair of city labels (e.g. ‘sustainable city’ AND ‘smart city’) at least once; repeat for all unique pairs of city labels; and repeat for each cumulative period.
- Store all counts of unique pairs in 6 matrices (35 × 35 cells) representing the 6 cumulative periods.
- In Pajek software, draw a social network graph using each of the 6 matrices.
- Use 6th matrix (1990–2019) to list the 10 highest co-occurrence frequencies in ranking order.
- Use 6th matrix (1990–2019) to list city labels co-occurring with ‘sustainable city’ and ‘smart city’, respectively, in order of strength of connection.
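The counting steps above (one count per article per label, and pairwise co-occurrence per cumulative period) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the record format, the four sample articles, and the plain substring matching are all assumptions made for the example, and Scopus query matching will behave differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: (publication_year, text), where text stands in for the
# concatenated title, abstract and author keywords of one article.
ARTICLES = [
    (1993, "The compact city and the sustainable city debate"),
    (2008, "Towards a sustainable city: eco city pilots"),
    (2016, "Smart city platforms for the sustainable city; smart city governance"),
    (2019, "Smart city data"),
]
# Illustrative subset of the 35 labels.
LABELS = ["sustainable city", "smart city", "compact city", "eco city"]
# End years of the six cumulative periods; every period starts in 1990.
PERIOD_ENDS = [1994, 1999, 2004, 2009, 2014, 2019]


def labels_in(text):
    """Return the set of labels found in text, so repeated mentions count once."""
    lowered = text.lower()
    return {label for label in LABELS if label in lowered}


def cumulative_counts(articles, period_end):
    """Count label occurrences and label-pair co-occurrences for 1990..period_end."""
    occurrences, pairs = Counter(), Counter()
    for year, text in articles:
        if 1990 <= year <= period_end:
            found = labels_in(text)
            occurrences.update(found)
            # Sorting gives each unordered pair a single canonical key.
            pairs.update(combinations(sorted(found), 2))
    return occurrences, pairs


results = {end: cumulative_counts(ARTICLES, end) for end in PERIOD_ENDS}
occ_2019, pairs_2019 = results[2019]
```

Each `pairs` counter holds the upper triangle of one symmetric label-by-label matrix, so the six counters in `results` correspond to the six cumulative matrices that would then be exported for network drawing.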
- Extrapolate future trajectory of city labels from occurrence rates 1990–2019, by applying Logistic Growth Model Curve to city label occurrences as follows:
  - Extract from database city label occurrences per year.
  - Following General Limit Theorem, exclude city labels with <30 occurrences, thus withdrawing 10 city labels.
  - For each of the 25 retained city labels, create a regression model of the cumulative growth of articles (Y) per year (X):
  - Plot no. of articles over time following general logistic growth pattern in the shape of S-curve.
  - Lock the position of each of the city labels on S-curve at final complete publication year: 2019.
  - Normalize S-curve to relative growth, where
  - Draw development stages ‘infant’, ‘growth’, ‘mature’.
  - For each of the 25 city labels, use regression model to predict start and finish of three development stages (Zeng et al., 2019), and store predictions in matrix.
  - Sort city labels by predicted longevity, from ‘open city’ (till 2077) to ‘ubiquitous city’ (till 2024), and draw stacked bar chart of 25 city labels with development stages shown.
- Count all articles mentioning at least one city label and at least one keyword (e.g., ‘sustainable city’ AND ‘planning’); repeat for all unique pairs of city labels and keywords; store resulting counts in large 35 × 22,820 matrix.
- Calculate degree of centrality (co-occurrence with no. of city labels) of all keywords; rank the keywords with 15 highest degrees (cut-off at degree of centrality 10).
- Harvest and rank the 15 most frequent keywords for each city label, yielding a total of 149 keywords.
- Filter and store 149 keyword counts in 35 × 149 matrix, and draw social network graph in Pajek.
- Draw two graphs based on extracted cluster (A) ‘smart’-‘intelligent’-‘digital’-‘ubiquitous’-‘future’-‘creative’-‘connected’ and cluster (B) ‘sustainable’-‘low-carbon’-‘liveable’-‘green’-‘eco’-‘compact’.
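As an illustration of the logistic growth extrapolation described above, the sketch below fits an S-curve to cumulative article counts and inverts it to find the year at which a given share of saturation is reached. It is a simplified stand-in under stated assumptions, not the authors' regression model: the three-parameter form k / (1 + exp(-r (x - x0))), the grid search over candidate saturation levels, the synthetic input series, and the 10% and 90% stage thresholds are all chosen for the example and are not taken from the source.

```python
import math


def logistic(year, k, r, x0):
    """Cumulative articles at year for saturation k, growth rate r, midpoint x0."""
    return k / (1.0 + math.exp(-r * (year - x0)))


def fit_given_k(years, cumulative, k):
    """For fixed k, fit the line ln(y / (k - y)) = r * (year - x0) by least squares."""
    z = [math.log(y / (k - y)) for y in cumulative]
    n = len(years)
    mean_x, mean_z = sum(years) / n, sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(years, z))
    r = sxz / sxx
    x0 = mean_x - mean_z / r
    sse = sum((logistic(x, k, r, x0) - y) ** 2 for x, y in zip(years, cumulative))
    return sse, r, x0


def fit_logistic(years, cumulative, k_candidates):
    """Grid-search k (must exceed the largest observation); return (k, r, x0)."""
    # Years with a zero cumulative count are dropped because ln(0) is undefined.
    points = [(x, y) for x, y in zip(years, cumulative) if y > 0]
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    fits = [fit_given_k(xs, ys, k) + (k,) for k in k_candidates if k > max(ys)]
    sse, r, x0, k = min(fits)
    return k, r, x0


def year_reaching(fraction, r, x0):
    """Year at which the curve reaches fraction of saturation (0 < fraction < 1)."""
    return x0 + math.log(fraction / (1.0 - fraction)) / r


# Synthetic, noise-free series for one hypothetical label, 1990-2019.
YEARS = list(range(1990, 2020))
CUMULATIVE = [logistic(x, 1000, 0.3, 2015) for x in YEARS]

k, r, x0 = fit_logistic(YEARS, CUMULATIVE, range(800, 2001, 100))
# Placeholder stage boundaries (assumed 10% and 90% of saturation).
growth_start = year_reaching(0.1, r, x0)
mature_start = year_reaching(0.9, r, x0)
```

On real counts the fit will not be exact, and the result can be sensitive to the range of saturation levels searched, so a nonlinear least-squares routine may be preferable to this linearised grid search.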
| Subject | Social Sciences; Library and Information Sciences |
| Specific subject area | Bibliometrics; Scientometrics; sustainable urban development |
| Type of data | Figure (visualized data) |
| How the data were acquired | The data were acquired through the Elsevier Scopus abstract and citation database, using an initial search query for 12 city labels (for Dataset A), followed by an extended search query for 35 city labels (for Datasets B and C). |
| Data format | Curated (re-ordered; filtered); analyzed |
| Description of data collection | The tripartite dataset consists of: Dataset A: 148 city labels obtained from an initial search query in Scopus, used to select 35 city labels based upon qualitative analysis. Dataset B: 11,337 journal articles (author names, journal publication, year of publication, article title, abstract, author keywords) retrieved from Scopus using a search query containing the 35 city labels. Dataset C: 22,820 author keywords extracted from Dataset B. |
| Data source location | The primary data was sourced from Scopus (Elsevier) at: |
| Data accessibility | The curated data |
| Related research article | D. Schraven, S. Joss, M. de Jong, Past, present, future: Engagement with sustainable urban development through 35 city labels in the scientific literature 1990–2019, Journal of Cleaner Production, 292 (2021) 125924, doi: |