Neal Robert Haddaway, Alexandra Mary Collins, Deborah Coughlin, Stuart Kirk.
Abstract
Google Scholar (GS), a commonly used web-based academic search engine, catalogues between 2 and 100 million records of both academic and grey literature (articles not formally published by commercial academic publishers). Google Scholar collates results from across the internet and is free to use. As a result it has received considerable attention as a method for searching for literature, particularly in searches for grey literature, as required by systematic reviews. The reliance on GS as a standalone resource has been greatly debated, however, and its efficacy in grey literature searching has not yet been investigated. Using systematic review case studies from environmental science, we investigated the utility of GS in systematic reviews and in searches for grey literature. Our findings show that GS results contain moderate amounts of grey literature, with the majority found on average at page 80. We also found that, when searched for specifically, the majority of literature identified using Web of Science was also found using GS. However, our findings showed moderate/poor overlap in results when similar search strings were used in Web of Science and GS (10-67%), and that GS missed some important literature in five of six case studies. Furthermore, a general GS search failed to find any grey literature from a case study that involved manual searching of organisations' websites. If used in systematic reviews for grey literature, we recommend that searches of article titles focus on the first 200 to 300 results. We conclude that whilst Google Scholar can find much grey literature and specific, known studies, it should not be used alone for systematic review searches. Rather, it forms a powerful addition to other traditional search methods. In addition, we advocate the use of tools to transparently document and catalogue GS search results to maintain high levels of transparency and the ability to be updated, critical to systematic reviews.
Year: 2015 PMID: 26379270 PMCID: PMC4574933 DOI: 10.1371/journal.pone.0138237
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Typical characteristics of academic citation databases and search engines.
| Feature | Academic Citation Databases | Academic Citation Search Engines |
|---|---|---|
| Time coverage | Depends on the database, but time restrictions apply for all (for earliest entry) and may depend on institutional subscription | No time restrictions (full |
| Access | Via an online platform for which a subscription is often required (e.g. Proquest) | Service provided purely through a free-to-access online search engine |
| Inclusion | Typically selectively included according to a predefined list of journals, publishers or subject areas | Anything that matches a set of criteria automatically included. Criteria (for Google Scholar): 1) must have a dedicated page with a title, 2) title must be closely followed by authorship list, 3) manuscript should be PDF, HTML or DOC file, 4) manuscript file should include a ‘References’ or ‘Bibliography’ section |
| Update frequency | Variable–may be as often as weekly, but some databases are monthly or less frequent (e.g. Biological Abstracts, 6 weeks). Updates are based on print versions of journals so will not include ‘early view’ manuscripts until they appear in print. Updates are based on citations submitted by catalogued journals | Typically 1–2 weeks |
| Examples | Web of Science, Biological Abstracts | Google Scholar, Microsoft Academic Search |
| Search facility | Full Boolean strings allowed | Variable–Google Scholar allows limited Boolean operators (no nesting using parentheses permitted) and search string limited to 256 characters |
| Results displayed | Unlimited results from within the database returned, but numbers estimated for large record sets (> c. 5,000). Results sortable by many different fields | Typically limited–Google Scholar limited to first 1,000 with no explanation of or alteration to sort order |
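
The two Google Scholar search constraints listed in the table (a 256-character limit and no nested parentheses) can be checked before a search string is adapted from a citation database. The following is a minimal sketch only: the function name `check_gs_search_string` is hypothetical, and treating "no nesting" as "parenthesis depth greater than one" is an assumption about how the limitation applies.

```python
def check_gs_search_string(query, max_chars=256):
    """Return a list of detected problems; an empty list means none were found.

    Assumption: "no nesting" is interpreted as a parenthesis depth above 1.
    """
    problems = []
    if len(query) > max_chars:
        problems.append(f"string is {len(query)} characters; limit is {max_chars}")

    depth = 0
    max_depth = 0
    balanced = True
    for ch in query:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no matching opener
                balanced = False
                break
    if depth != 0:
        balanced = False

    if not balanced:
        problems.append("unbalanced parentheses")
    if max_depth > 1:
        problems.append("nested parentheses")
    return problems


# A single-level string of the kind used in the case studies
flat = 'marine AND (reserve OR "protected area" OR sanctuary OR "harvest refuge")'
# A hypothetical nested string that a citation database would accept
nested = "peat AND (carbon OR (methane AND flux))"

print(check_gs_search_string(flat))
print(check_gs_search_string(nested))
```

A string that fails either check would need to be shortened or split into several simpler searches before use in Google Scholar.
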
Systematic reviews (SRs) used as case studies and their search strings (along with modifications to WoS search strings necessary to function in Google Scholar advanced search facility as indicated by strikethrough text).
Searches were performed on 06/02/15. Web of Science includes the following databases as part of the MISTRA EviEM subscription: KCI-Korean Journal Database, SciELO Citation Index and Web of Science Core Collection.
| Systematic review title and reference | Search string | Original systematic review methods | Google Scholar search results | Web of Science search results |
|---|---|---|---|---|
| Evaluating effects of land management on greenhouse gas fluxes and carbon balances in boreo-temperate lowland peatland systems (SR1) [ | peat AND (“greenhouse gas” OR GHG OR CO2 OR "carbon dioxide" OR CH4 OR N2O OR methane OR "nitrous oxide" OR DOC OR carbon) | Full search in WoS, SR searched first 50 records in Google Scholar | 318,000 (full text), 1,120 (title) | 4,151 (topic), 419 (title) |
| Systematic review of effects on biodiversity from oil palm production (SR2) [ | “oil palm” AND tropic* AND (diversity OR richness OR abundance OR similarity OR composition OR community OR deforestation OR “land use change” OR fragmentation OR “habitat loss” OR connectivity OR “functional diversity” OR ecosystem OR displacement) | Full search in WoS, SR searched first 50 records in Google Scholar | 126,000 (full text), 968 (title) | 290 (topic), 3 (title) |
| Which components or attributes of biodiversity influence which dimensions of poverty? (SR3) [ | (biodiversity | Full search in WoS, SR searched first 50 records in Google Scholar (term ‘wildlife’ removed from our search as multiple OR sub-strings not possible in Google Scholar) | 835,000 (full text), 591 (title) | 4,435 (topic), 114 (title) |
| Evaluating the biological effectiveness of fully and partially protected marine areas (SR4) [ | marine AND (reserve OR "protected area" OR sanctuary OR "harvest refuge") | Full search in WoS, SR searched first 50 records in Google Scholar | 554,000 (full text), 4,310 (title) | 47,932 (topic), 1,303 (title) |
| Human well-being impacts of terrestrial protected areas (SR5) [ | "protected area" AND (poverty OR “human OR well*” OR socioeconomic* OR econom* OR “human OR health” OR livelihood OR “social OR capital” OR “social OR welfare” OR empowerment OR equity OR “ecosystem OR service” OR perception OR attitude) | Full search in WoS, GS search not performed | 49,700 (full text), 68 (title) | 1,059 (topic), 122 (title) |
| Evidence on the environmental impacts of farm land abandonment in high altitude/mountain regions: a systematic map (SR6) [ | abandonment AND (grassland OR farm OR cropland OR agriculture OR land OR pasture) | Full search in WoS, SR searched first 260 records in Google Scholar | 216,000 (full text), 517 (title) | 2,550 (topic), 180 (title) |
| A systematic review of phenotypic responses to between-population outbreeding (SR7) [ | depression AND ("out-breeding" OR outcrossing OR "out-crossing" OR "out-mating" OR outmating) | Full search in WoS, GS search not performed | 15,200 (full text), 50 (title) | 1,071 (topic), 31 (title) |
Fig 1. Proportion of total a) full text and b) title Google Scholar search results by literature type for 7 case studies (see Table 2 for descriptions of SR codes).
Overlap between Web of Science (WoS) and Google Scholar (GS) for title searches in Web of Science and the first 1,000 search results from title searches in Google Scholar.
See Table 2 for case study explanations.
| Case Study | Number of overlapping search results (% of WoS records) | No. of WoS title search results | No. of GS title search results |
|---|---|---|---|
| SR1 | 157 (37.8%) | 415 | 1,120 |
| SR2 | 2 (66.7%) | 3 | 968 |
| SR3 | 32 (49.2%) | 65 | 591 |
| SR4 | 223 (17.1%) | 1,301 | 4,310 |
| SR5 | 6 (10.3%) | 58 | 68 |
| SR6 | 68 (37.8%) | 180 | 517 |
| SR7 | 18 (58.1%) | 31 | 50 |
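
The overlap percentages above are expressed relative to the number of Web of Science records. The sketch below illustrates that calculation under stated assumptions: the helper names, the title normalisation (lowercasing and stripping punctuation) and the example titles are all hypothetical, and are not the matching procedure used in the study.

```python
import re


def normalise_title(title):
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def overlap_with_wos(wos_titles, gs_titles):
    """Return (number of shared titles, overlap as a percentage of WoS records)."""
    wos = {normalise_title(t) for t in wos_titles}
    gs = {normalise_title(t) for t in gs_titles}
    shared = wos & gs
    pct = 100 * len(shared) / len(wos) if wos else 0.0
    return len(shared), pct


# Hypothetical titles, for illustration only
wos_example = ["Peatland Carbon Fluxes.", "Methane emissions from fens", "Oil palm and birds"]
gs_example = ["peatland carbon fluxes", "Methane Emissions from Fens", "An unrelated report"]
n_shared, pct = overlap_with_wos(wos_example, gs_example)
print(n_shared, round(pct, 1))

# The same arithmetic applied to the SR1 row of the table: 157 of 415 WoS records
table_pct = round(100 * 157 / 415, 1)
print(table_pct)
```

Exact matching on normalised titles is a deliberately simple choice; it will miss records whose titles differ between sources.
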
Overlap between Web of Science (WoS) and Google Scholar (GS) for topic word searches in Web of Science and the first 1,000 search results from full text searches in Google Scholar.
n/a corresponds to search results that were too voluminous to download in full. See Table 2 for case study explanations.
| Case Study | No. of WoS topic word search records | No. of GS full text search records | Number of overlapping search results (% of WoS records) |
|---|---|---|---|
| SR1 | 4,504 | 318,000 | 255 (5.7%) |
| SR2 | 230 | 126,000 | 11 (4.8%) |
| SR3 | 4,240 | 835,000 | 106 (2.5%) |
| SR4 | 47,932 | 554,000 | n/a |
| SR5 | 1,059 | 49,700 | 87 (8.2%) |
| SR6 | 2,549 | 216,000 | 5 (0.2%) |
| SR7 | 1,071 | 15,200 | 212 (19.8%) |
Duplication rates (proportion of total results that are duplicates) for Google Scholar and Web of Science for title-level, topic word and full text searches using 7 case study systematic review search strings.
Numbers in parentheses correspond to the standard deviations of the individual case study duplication rates. Sample size refers to the number of search records in total, followed by the number of independent search strings (i.e. the number of case studies investigated).
| Search Platform | Search Type | Duplication Rate (mean %) | Sample Size (search records, search strings) |
|---|---|---|---|
| Google Scholar | Full text | 0.56 (± 0.59) | 6988, 7 |
| Google Scholar | Title | 2.93 (± 1.47) | 4194, 7 |
| Web of Science | Topic words (sorted by publication date, newest first) | 0.00 (± 0.00) | 6359, 7 |
| Web of Science | Topic words (sorted by relevance) | 0.03 (± 0.05) | 4000, 4 |
| Web of Science | Title | 0.05 (± 0.03) | 2102, 7 |
Duplication rates (proportion of total results that are duplicates) in Google Scholar and Web of Science searches across the 7 case studies.
Duplication rates are assessed for up to 1,000 search records (or the total number where less than c. 1,300). For Web of Science the full text results were ordered by publication date (newest first) and relevance where more than 1,000 results were returned. Numbers are duplication rate (%) followed by total search records in parentheses.
| Case Study | Google Scholar: Title Search | Google Scholar: Full Text Search | Web of Science: Title Search | Web of Science: Full Text Search (publ. date) | Web of Science: Full Text Search (relevance) |
|---|---|---|---|---|---|
| Evaluating effects of land management on greenhouse gas fluxes and carbon balances in boreo-temperate lowland peatland systems (SR1) [ | 3.4 (1000) | 0.2 (998) | 0 (415) | 0 (1000) | 0 (1000) |
| Systematic review of effects on biodiversity from oil palm production (SR2) [ | 2.6 (968) | 0.2 (990) | 0 (3) | 0 (230) | n/a |
| Which components or attributes of biodiversity influence which dimensions of poverty? (SR3) [ | 4.4 (591) | 0.7 (1000) | 0 (114) | 0 (1000) | 0 (1000) |
| Evaluating the biological effectiveness of fully and partially protected marine areas (SR4) [ | 1.0 (1000) | 0.0 (1000) | 0.1 (1301) | 0 (1000) | 0.1 (1000) |
| Human well-being impacts of terrestrial protected areas (SR5) [ | 1.5 (68) | 0.1 (1000) | 0 (58) | 0 (1058) | n/a |
| Evidence on the environmental impacts of farm land abandonment in high altitude/mountain regions: a systematic map (SR6) [ | 4.8 (517) | 1.3 (1000) | 0 (180) | 0 (1000) | 0 (1000) |
| A systematic review of phenotypic responses to between-population outbreeding (SR7) [ | 4.0 (50) | 1.4 (1000) | 0 (31) | 0 (1071) | n/a |
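
The per-case-study rates in this table can be combined into a single record-weighted (pooled) rate per search type. The sketch below does this for the two Google Scholar columns using the rounded values shown above. The pooled figures it produces appear to correspond to the Google Scholar mean rates reported in the summary duplication table, but that the summary values were derived this way is an assumption, and the function name `pooled_rate` is hypothetical.

```python
# (duplication rate in %, number of search records) for SR1 to SR7,
# copied from the Google Scholar columns of the table above
gs_title = [(3.4, 1000), (2.6, 968), (4.4, 591), (1.0, 1000),
            (1.5, 68), (4.8, 517), (4.0, 50)]
gs_full_text = [(0.2, 998), (0.2, 990), (0.7, 1000), (0.0, 1000),
                (0.1, 1000), (1.3, 1000), (1.4, 1000)]


def pooled_rate(rows):
    """Return (record-weighted duplication rate in %, total number of records)."""
    total = sum(n for _, n in rows)
    duplicates = sum(rate / 100 * n for rate, n in rows)
    return 100 * duplicates / total, total


title_rate, title_total = pooled_rate(gs_title)
full_rate, full_total = pooled_rate(gs_full_text)
print(round(title_rate, 2), title_total)  # 2.93 4194
print(round(full_rate, 2), full_total)    # 0.56 6988
```

Because the inputs are already rounded to one decimal place, small discrepancies from rates computed on raw record counts are to be expected.
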
The ability of Google Scholar to find included articles from six published systematic reviews.
Records identified as citations are found only within reference lists of other articles (their existence is not verified by the presence of a publisher version or full text article, unlike hyperlinked citations).
| Review | Identified | Identified (as citation) | Not identified | (Of which, findable in WoS) |
|---|---|---|---|---|
| Evaluating effects of land management on greenhouse gas fluxes and carbon balances in boreo-temperate lowland peatland systems (SR1) [ | 59 | 0 | 1 | 0 |
| Evaluating the biological effectiveness of fully and partially protected marine areas (SR4) [ | 158 | 24 | 11 | 3 |
| Human well-being impacts of terrestrial protected areas (SR5) [ | 162 | 4 | 10 | 3 |
| Evidence on the environmental impacts of farm land abandonment in high altitude/mountain regions: a systematic map (SR6) [ | 180 | 4 | 1 | 0 |
| What are the impacts of reindeer/caribou (Rangifer tarandus L.) on arctic and alpine vegetation? [ | 35 | 2 | 0 | 0 |
| What is the influence on water quality in temperate eutrophic lakes of a reduction of planktivorous and benthivorous fish? [ | 77 | 8 | 39 | 0 |
1 For those articles not found using Google Scholar, Web of Science searches were carried out using Bangor University subscription (Biological Abstracts, MEDLINE, SciELO Citation Index, Web of Science Core Collections, Zoological Record).
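
The counts in this table can be summarised as the share of each review's included articles that Google Scholar did not identify. The sketch below is illustrative only: it assumes that the three count columns sum to the total number of included articles per review, and the short review labels and function name are hypothetical.

```python
# (identified, identified as citation only, not identified), copied from the table above
results = {
    "SR1": (59, 0, 1),
    "SR4": (158, 24, 11),
    "SR5": (162, 4, 10),
    "SR6": (180, 4, 1),
    "Reindeer/caribou review": (35, 2, 0),
    "Eutrophic lakes review": (77, 8, 39),
}


def share_not_identified(identified, as_citation, not_identified):
    """Percentage of included articles that Google Scholar did not identify.

    Assumption: the three counts together make up all included articles.
    """
    total = identified + as_citation + not_identified
    return 100 * not_identified / total


reviews_with_misses = [name for name, row in results.items() if row[2] > 0]
print(len(reviews_with_misses), "of", len(results), "reviews had unidentified articles")

for name, row in results.items():
    print(f"{name}: {share_not_identified(*row):.1f}% not identified")
```

Counting reviews with at least one unidentified article gives five of six, in line with the statement in the abstract.
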