Are we filling the data void? An assessment of the amount and extent of plant collection records and census data available for tropical South America.

Kenneth Feeley

Abstract

Large-scale studies are needed to increase our understanding of how large-scale conservation threats, such as climate change and deforestation, are impacting diverse tropical ecosystems. These types of studies rely fundamentally on access to extensive and representative datasets (i.e., "big data"). In this study, I assess the availability of plant species occurrence records through the Global Biodiversity Information Facility (GBIF) and the distribution of networked vegetation census plots in tropical South America. I analyze how the amount of available data has changed through time and the consequent changes in taxonomic, spatial, habitat, and climatic representativeness. I show that there are large and growing amounts of data available for tropical South America. Specifically, there are almost 2,000,000 unique geo-referenced collection records representing more than 50,000 species of plants in tropical South America and over 1,500 census plots. However, there is still a gaping "data void" such that many species and many habitats remain so poorly represented in either of the databases as to be functionally invisible for most studies. It is important that we support efforts to increase the availability of data, and the representativeness of these data, so that we can better predict and mitigate the impacts of anthropogenic disturbances.

Year:  2015        PMID: 25927831      PMCID: PMC4416035          DOI: 10.1371/journal.pone.0125629

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Big problems call for big ecology. Big ecology needs big data. There is a rapidly increasing need for large-scale studies in order to predict and mitigate the effects of large-scale conservation threats, such as deforestation and climate change [1,2]. For example, various studies have used massive collections of natural history records, range maps, and census plot data to estimate patterns of biodiversity across continental-scale areas and to predict how diversity will be impacted under different scenarios of climate change and habitat loss [3-10]. These large-scale studies use “big data”, that is, data that are generally beyond the scope of what can be collected by individual researchers or through individual projects [2]. As such, these studies often depend heavily on extensive and expansive collations of datasets that are standardized and made available through collaborative networks or data clearinghouses.

One of the most important clearinghouses for biogeographic and natural history data is the Global Biodiversity Information Facility (GBIF; http://www.gbif.org/). Indeed, because it contains copious amounts of data, is easy to use, is compatible with popular biogeographic methods (e.g., species distribution modeling), and is entirely open access, GBIF has rapidly become one of the most widely used and important resources in ecology, biogeography, and conservation biology since its launch in 2007. According to its own statistics, GBIF data have been used in nearly 900 peer-reviewed scientific publications to date. While species occurrence databases, such as those linked through GBIF, are clearly a powerful and important resource, many studies have pointed out their potential limitations for biogeographic and ecological studies [11]. These limitations can be due to problems with data quality.
For example, collections data are prone to taxonomic and georeferencing errors [12,13] and may suffer from biases in the taxonomic and spatial representativeness of the samples [14-18] due in part to the understandable tendency of collectors to focus their efforts in accessible areas and areas with well-established logistical and intellectual infrastructures [19-21]. Another potential limitation of occurrence databases is simply insufficient data quantity [14]. In 2011, Feeley and Silman reported on the extreme paucity of collections data in GBIF (and a similar database for Brazil named SpeciesLink; http://splink.cria.org.br/) for tropical plant species. Specifically, using data downloaded in 2009, they estimated that only about 65% of tropical plant species were represented by any available geo-referenced collections and that, of the represented species, only about 8% or 0.5% (approx. 5% or 0.3%, respectively, of all tropical plant species) had enough available records to be used in species distribution models or other analyses requiring a minimum of 20 or 100 samples, respectively [22]. Given the dominant role that GBIF (and at the time, SpeciesLink) plays in distributing natural history records, this lack of data from the tropics was considered a major constraint on studies of tropical species and diversity. Perhaps more troubling, the lack of available records from the tropics was considered symptomatic of a more general lack of knowledge about the distribution and ecology of most tropical species as well as a lack of knowledge about the composition and structure of vast expanses of the tropics. Feeley and Silman referred to this lack of knowledge as the “data void” [22].

Since Feeley and Silman published their study in 2011, GBIF has continued to grow and the amount of data available from all regions, including the tropics, has greatly increased.
This growth has been due to ongoing collection efforts, the digitization of additional pre-existing records, and the inclusion of new datasets into GBIF. There have also been laudable efforts at data standardization and cleaning (e.g., the Taxonomic Name Resolution Service, TNRS; http://tnrs.iplantcollaborative.org/), which affect the number of species represented in the dataset (in most cases decreasing the number of species through the elimination of synonyms and false species created by spelling errors) and the number of records available per species (generally increasing the number of records per species through the combination of records formerly assigned to different species names).

While clearly important, natural history and occurrence records such as those provided by GBIF are inherently limited in their utility. For example, the geo-referenced data contained in natural history records can be used to map species ranges in relation to large climatic gradients, but they provide no information about local patterns of occurrence, species abundances, alpha diversity, or community composition. These types of patterns are better assessed through analyses of intensive plot inventories or censuses. Over the past several years there have been notable attempts to collate and standardize tropical forest inventory data (i.e., plot data) through collaborative networks. For example, the Amazon Tree Diversity Network (ATDN; http://web.science.uu.nl/Amazon/atdn/) and the RAINFOR Amazon Forest Inventory Network (http://www.rainfor.org/) have each compiled data from hundreds of pre-existing forest inventory plots in the Amazon basin, supplemented with subsequent installations of new plots in targeted areas. These and other networks of plot census data are being used to look at large-scale patterns in forest structure and composition across the Amazon [5,6,23-26].
Here, I assess the availability of occurrence and census data from the tropics and examine how this availability has varied through time as well as how it varies through space. Specifically, I quantify the amount of occurrence data available through GBIF for plant species and from different habitats in tropical South America and examine how data availability has changed through time. I also analyze the spatial and habitat distribution of South American forest census plots as represented in several of the most prominent plot networks. The goal of this study is to characterize the state of data availability for tropical species and habitats of South America and to evaluate the rate at which we are, or are not, filling the data void. By understanding where our data limitations are, we can better assess the generalizability of the results emerging from analyses of the existing data and direct future studies to help reduce these data limitations.

Methods

Data

All records for plant species (kingdom Plantae) occurring in the tropical latitudes of South America were downloaded from the GBIF data portal on February 1st 2014 (see S1 File for list of contributing databases and herbaria). The records were screened to exclude those without geo-referencing information or with obvious errors in geo-referencing data (i.e., flagged by GBIF for data quality issues or with coordinates occurring in the ocean or in areas outside of South America). All of the species names listed with the collection records were then verified using the online Taxonomic Name Resolution Service (TNRS) to remove synonyms and correct for spelling errors. Duplicate records were removed by screening for records with identical species names and collection coordinates.

I collated the locations of inventory plots as published online and in published articles for five of the largest and most prominent South American census plot networks: RAINFOR, ATDN, Forestplots.net (https://www.forestplots.net/; [6]), the Smithsonian Institution’s Center for Tropical Forest Science (CTFS; http://www.ctfs.si.edu/), and the Red de Bosques (http://www.condesan.org/redbosques/). Many individual census plots are members of multiple networks, so all duplicate records were identified and removed.

In order to assess the representation of different habitats in the herbarium and plot datasets, I classified the habitats of tropical South America according to WWF ecoregions (http://www.worldwildlife.org/publications/terrestrial-ecoregions-of-the-world; [27]). In addition, I divided tropical South America into discrete climatic zones on the basis of mean annual temperature (MAT) and total annual precipitation (TAP). Estimates of MAT and TAP were based on “current” (mean of 1960–1990) conditions according to the WorldClim database (http://www.worldclim.org/ [28]).
Climatic zones were characterized as those areas having different combinations of MAT = 0–2°C, 2–4°C, 4–6°C, 6–8°C, 8–10°C, 10–12°C, 12–14°C, 14–16°C, 16–18°C, 18–20°C, 20–22°C, 22–24°C, 24–26°C, 26–28°C, 28–30°C; and TAP = 0–500 mm, 500–1000 mm, 1000–1500 mm, 1500–2000 mm, 2000–3000 mm, 3000–4000 mm, 4000–6000 mm, and >6000 mm.
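
The zone assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the bin edges are taken from the MAT and TAP classes listed in the text and in Tables 3–5, and the choice to treat each bin as lower-inclusive and upper-exclusive is an assumption, since the text does not say how boundary values were handled.

```python
from bisect import bisect_right

# Bin edges from the Methods text; names and boundary handling are assumptions.
MAT_EDGES = list(range(0, 32, 2))                          # 0, 2, ..., 30 (deg C)
TAP_EDGES = [0, 500, 1000, 1500, 2000, 3000, 4000, 6000]   # mm; last class is >6000


def bin_label(value, edges, open_ended=False):
    """Return a label such as '24-26' for the bin holding value, or None if outside."""
    if value < edges[0]:
        return None
    i = bisect_right(edges, value) - 1          # index of the bin's lower edge
    if i >= len(edges) - 1:                     # at or beyond the final edge
        return f">{edges[-1]}" if open_ended else None
    return f"{edges[i]}-{edges[i + 1]}"


def climatic_zone(mat_c, tap_mm):
    """Return the (MAT class, TAP class) pair for one grid cell."""
    return (bin_label(mat_c, MAT_EDGES), bin_label(tap_mm, TAP_EDGES, open_ended=True))


if __name__ == "__main__":
    print(climatic_zone(25.3, 1800))   # a warm, moderately wet lowland cell
    print(climatic_zone(3.0, 7000))    # a cold, very wet cell
```

With real data, `mat_c` and `tap_mm` would come from the WorldClim rasters for each grid cell; here they are plain numbers so the sketch stays self-contained.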

Analyses

To calculate the change in data availability in GBIF through time, I tallied the number of records available in each year from 2007 (when GBIF launched) to the end of 2013. I then tallied the number of records available per species in each year. I calculated and mapped the average density of records (no. of records per km2) per 0.5° x 0.5° latitude/longitude cell and how this record density has changed through time. Using the collated list of plot locations, I measured the straight-line distance from the center of every 30 arc-second grid cell in tropical South America to the closest census plot. Finally, I calculated and mapped the average density of collections and census plots within each of the WWF ecoregions and within each of the climatic zones as defined above.
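
The two spatial calculations described above (record density per 0.5° cell and distance to the closest census plot) can be sketched as follows. This is a minimal illustration rather than the study's actual workflow: all function names are hypothetical, the Earth is treated as a sphere of radius 6371 km, and great-circle (haversine) distance stands in for the "straight-line distance" of the text, whose exact definition is not given.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # spherical approximation (assumption)


def cell_index(lat, lon, size=0.5):
    """Return the (row, col) index of the size-degree cell containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def cell_area_km2(row, size=0.5):
    """Approximate area of a size x size degree cell on a sphere, from its row index."""
    lat_s = math.radians(row * size)
    lat_n = math.radians((row + 1) * size)
    return EARTH_RADIUS_KM ** 2 * math.radians(size) * (math.sin(lat_n) - math.sin(lat_s))


def record_density(records, size=0.5):
    """Map each occupied cell to its number of records per km2."""
    counts = Counter(cell_index(lat, lon, size) for lat, lon in records)
    return {cell: n / cell_area_km2(cell[0], size) for cell, n in counts.items()}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest_plot(lat, lon, plots):
    """Distance in km from one grid-cell center to the closest plot in a list of (lat, lon)."""
    return min(haversine_km(lat, lon, plat, plon) for plat, plon in plots)
```

The brute-force minimum over all plots is fine for roughly 1,500 plots and a modest number of query points; for every 30 arc-second cell of a continent, a spatial index would be the more practical design.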

Results

The number of georeferenced plant records available through GBIF for tropical South America has increased rapidly since 2007 (Table 1 and Fig 1a–1g), as has the number of represented species and the number of collections available per species (Table 1 and Fig 2a). Notably, most of this increase is due to the inclusion of additional pre-existing records rather than new collections. For example, of the 177,925 records added to GBIF in 2013, only 2,910 (1.6%) were of collections that were actually made in 2013. The largest increase in data availability came between 2010 and 2011, when the number of records nearly tripled (from 386,042 to 1,098,386; Table 1); this increase was driven in large part by the incorporation of SpeciesLink data into GBIF.

Table 1

The availability of plant collections data through the Global Biodiversity Information Facility (GBIF).

Year | No. of collections | No. of species | Mean no. of collections per species* | Median no. of collections per species* | Mean no. col / km2 | Median no. col / km2 | % of area with zero collections
2007 | 164214 | 13339 | 3.15 (12.40) | 0 (3) | 0.01 | 0.00 | 54.81
2008 | 206218 | 17454 | 3.95 (12.93) | 0 (3) | 0.02 | 0.00 | 51.25
2009 | 331856 | 27532 | 6.36 (12.15) | 1 (3) | 0.02 | 0.00 | 30.91
2010 | 386042 | 29824 | 7.40 (13.04) | 1 (4) | 0.03 | 0.00 | 27.50
2011 | 1098386 | 46056 | 21.05 (23.98) | 4 (6) | 0.08 | 0.01 | 18.50
2012 | 1638600 | 50026 | 31.40 (32.92) | 5 (6) | 0.12 | 0.01 | 17.73
2013 | 1816525 | 52432 | 34.81 (34.81) | 6 (6) | 0.13 | 0.01 | 13.14

* the values in parentheses indicate the mean or median number of collections per species if only the species represented in that year are included.
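
The growth in record counts can be summarized directly from the "No. of collections" column of Table 1. The sketch below is illustrative only (the function names are hypothetical) and shows two common ways to express an annual rate. On these numbers the arithmetic mean of the six year-over-year increases comes to roughly 58%, while the compound annual rate over 2007–2013 is roughly 49%; which definition underlies any particular annual figure quoted in the text is not stated, so both are shown.

```python
# Cumulative record counts copied from Table 1 (No. of collections).
RECORDS = {
    2007: 164214, 2008: 206218, 2009: 331856, 2010: 386042,
    2011: 1098386, 2012: 1638600, 2013: 1816525,
}


def yearly_increase_pct(counts):
    """Percent increase of each year over the previous year."""
    years = sorted(counts)
    return {y: 100.0 * (counts[y] / counts[p] - 1.0) for p, y in zip(years, years[1:])}


def mean_annual_increase_pct(counts):
    """Arithmetic mean of the year-over-year percent increases."""
    increases = yearly_increase_pct(counts)
    return sum(increases.values()) / len(increases)


def compound_annual_growth_pct(counts):
    """Constant yearly rate that would carry the first count to the last."""
    years = sorted(counts)
    span = years[-1] - years[0]
    return 100.0 * ((counts[years[-1]] / counts[years[0]]) ** (1.0 / span) - 1.0)


if __name__ == "__main__":
    for year, pct in yearly_increase_pct(RECORDS).items():
        print(year, round(pct, 1))
    print("mean of yearly increases:", round(mean_annual_increase_pct(RECORDS), 1))
    print("compound annual growth:", round(compound_annual_growth_pct(RECORDS), 1))
```

The gap between the two summaries is driven by the single large jump between 2010 and 2011, which pulls the arithmetic mean well above the compound rate.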

Fig 1

Locations of occurrence records and census plots in tropical South America.

Maps showing the location of plant occurrence records available through GBIF each year from 2007 to 2013 (Panels A-G, respectively) and the location of the networked vegetation census plots included in the analyses (Panel H).

Fig 2

Number of occurrence records per species and density of collection in tropical South America.

Panel A shows the cumulative number of occurrence records per plant species of tropical South America available through GBIF in 2007–2013. Panel B shows the cumulative mean density of occurrence records (no. per km2) available through GBIF for tropical South America in 2007–2013.

In terms of spatial representation, the average and median density of collections have increased by an order of magnitude since 2007 (Table 1 and Fig 2b). In 2007, the majority of tropical South America was unrepresented by any GBIF herbarium collections (when collections are aggregated at a spatial scale of 0.5° latitude/longitude); by 2013, more than 85% of tropical South America was represented by at least one GBIF collection (Table 1 and Fig 3a). The density of collections varies greatly across space (Fig 3a), between ecoregions (Table 2 and Fig 4a), and between climatic zones (Tables 3 and 4). The greatest density of collections comes from the Northern Andean Paramo ecoregion. This ecoregion is relatively small (approximately 25,000 km2) but is represented by more than 40,000 unique plant collections (Table 2). The 2nd through 5th best-collected ecoregions are also Andean (High Monte, Northwestern Andean Montane Forests, Eastern Cordillera Real Montane Forests, and Bolivian Yungas). Accordingly, the greatest density of collections comes from cool, wet habitats such as those occurring in the montane Paramo. More generally, drier areas and areas with hot (>20°C) or very cold (<10°C) mean annual temperatures are underrepresented in the GBIF database (Table 4).

Fig 3

Density of occurrence records and distance to census plots in tropical South America.

Panel A maps the mean density of plant occurrence records available in 2013 for tropical South America. Density is mapped at a spatial resolution of 0.5 x 0.5°; white pixels are areas where there are no available occurrence records. Panel B maps the distance from points in tropical South America to the closest vegetation census plot (mapped at a spatial resolution of 30 arc seconds).

Table 2

Density of collections data and census plots in different ecoregions of tropical South America (ecoregions defined according to the World Wildlife Fund; http://www.worldwildlife.org/publications/terrestrial-ecoregions-of-the-world; [27]).

Ecoregion name | Area (km2)* | No. collections | No. col / km2 | No. plots | No. plots / 10000 km2
Cerrado | 1914593 | 69234 | 0.036 | 39 | 0.204
Southwest Amazon moist forests | 760847 | 196813 | 0.259 | 175 | 2.300
Caatinga | 737482 | 27906 | 0.038 | 0 | 0.000
Madeira-Tapajos moist forests | 721309 | 10059 | 0.014 | 33 | 0.458
Guianan moist forests | 482415 | 97136 | 0.201 | 233 | 4.830
Uatuma-Trombetas moist forests | 470092 | 10173 | 0.022 | 203 | 4.318
Mato Grosso seasonal forests | 410995 | 14911 | 0.036 | 20 | 0.487
Llanos | 380653 | 20805 | 0.055 | 6 | 0.158
Dry Chaco | 347572 | 16802 | 0.048 | 7 | 0.201
Tapajos-Xingu moist forests | 338782 | 1259 | 0.004 | 27 | 0.797
Alto Parana Atlantic forests | 310138 | 10593 | 0.034 | 0 | 0.000
Japura-Solimoes-Negro moist forests | 271443 | 3714 | 0.014 | 22 | 0.810
Xingu-Tocantins-Araguaia moist forests | 267363 | 2276 | 0.009 | 45 | 1.683
Napo moist forests | 258925 | 250810 | 0.969 | 96 | 3.708
Jurua-Purus moist forests | 244283 | 431 | 0.002 | 25 | 1.023
Bahia interior forests | 230463 | 12721 | 0.055 | 0 | 0.000
Chiquitano dry forests | 230418 | 20922 | 0.091 | 35 | 1.519
Guianan piedmont and lowland moist forests | 229187 | 13029 | 0.057 | 7 | 0.305
Central Andean dry puna | 211583 | 3388 | 0.016 | 0 | 0.000
Negro-Branco moist forests | 204934 | 43609 | 0.213 | 25 | 1.220
Caqueta moist forests | 196159 | 7201 | 0.037 | 39 | 1.988
Tocantins/Pindare moist forests | 194526 | 3441 | 0.018 | 65 | 3.341
Peruvian Yungas | 186496 | 43757 | 0.235 | 16 | 0.858
Sechura desert | 185253 | 8100 | 0.044 | 0 | 0.000
Solimoes-Japura moist forests | 178847 | 30188 | 0.169 | 29 | 1.621
Purus-Madeira moist forests | 174201 | 765 | 0.004 | 62 | 3.559
Pantanal | 170444 | 3111 | 0.018 | 0 | 0.000
Central Andean puna | 157648 | 16545 | 0.105 | 2 | 0.127
Purus varzea | 150857 | 2929 | 0.019 | 32 | 2.121
Guianan Highlands moist forests | 144118 | 15749 | 0.109 | 15 | 1.041
Maranhao Babacu forests | 143813 | 1800 | 0.013 | 1 | 0.070
Beni savanna | 123015 | 3492 | 0.028 | 1 | 0.081
Central Andean wet puna | 122709 | 16060 | 0.131 | 0 | 0.000
Ucayali moist forests | 116663 | 38358 | 0.329 | 11 | 0.943
Atlantic dry forests | 112157 | 8017 | 0.071 | 0 | 0.000
Bahia coastal forests | 112007 | 21895 | 0.195 | 0 | 0.000
Magdalena Valley montane forests | 108222 | 90259 | 0.834 | 17 | 1.571
Eastern Cordillera Real montane forests | 105832 | 110215 | 1.041 | 67 | 6.331
Guianan savanna | 105492 | 5861 | 0.056 | 8 | 0.758
Iquitos varzea | 103580 | 66528 | 0.642 | 18 | 1.738
Rio Negro campinarana | 94963 | 2722 | 0.029 | 24 | 2.527
Bolivian Yungas | 92087 | 95328 | 1.035 | 12 | 1.303
Marajo varzea | 87750 | 527 | 0.006 | 7 | 0.798
Northwestern Andean montane forests | 83136 | 91144 | 1.096 | 1 | 0.120
Atacama desert | 81276 | 529 | 0.007 | 0 | 0.000
Magdalena-Uraba moist forests | 76294 | 6196 | 0.081 | 0 | 0.000
Bolivian montane dry forests | 73599 | 17495 | 0.238 | 4 | 0.543
Cordillera Oriental montane forests | 69768 | 22947 | 0.329 | 0 | 0.000
Monte Alegre varzea | 69280 | 654 | 0.009 | 24 | 3.464
La Costa xeric shrublands | 68871 | 4332 | 0.063 | 3 | 0.436
Apure-Villavicencio dry forests | 68637 | 6859 | 0.100 | 14 | 2.040
Choco-Darien moist forests | 60304 | 36148 | 0.599 | 0 | 0.000
Humid Chaco | 57333 | 3395 | 0.059 | 0 | 0.000
Serra do Mar coastal forests | 55436 | 19440 | 0.351 | 0 | 0.000
Pantepui | 48698 | 8866 | 0.182 | 3 | 0.616
Southern Andean Yungas | 44794 | 11735 | 0.262 | 35 | 7.813
Tumbes-Piura dry forests | 42161 | 1977 | 0.047 | 0 | 0.000
Cauca Valley montane forests | 34628 | 30424 | 0.879 | 4 | 1.155
Western Ecuador moist forests | 33726 | 22367 | 0.663 | 0 | 0.000
Amazon-Orinoco-Southern Caribbean mangroves | 33167 | 5505 | 0.166 | 2 | 0.603
Maracaibo dry forests | 30841 | 1485 | 0.048 | 2 | 0.648
Venezuelan Andes montane forests | 29935 | 8765 | 0.293 | 12 | 4.009
Guajira-Barranquilla xeric scrub | 28053 | 2168 | 0.077 | 4 | 1.426
Orinoco Delta swamp forests | 27895 | 1227 | 0.044 | 1 | 0.358
Campos Rupestres montane savanna | 26940 | 8806 | 0.327 | 0 | 0.000
Sinu Valley dry forests | 26434 | 2133 | 0.081 | 0 | 0.000
Northern Andean paramo | 24388 | 40023 | 1.641 | 2 | 0.820
Ecuadorian dry forests | 22366 | 6670 | 0.298 | 0 | 0.000
Pernambuco interior forests | 21750 | 1138 | 0.052 | 0 | 0.000
Catatumbo moist forests | 21075 | 1490 | 0.071 | 0 | 0.000
Magdalena Valley dry forests | 19231 | 7324 | 0.381 | 0 | 0.000
Pernambuco coastal forests | 17689 | 1343 | 0.076 | 0 | 0.000
Lara-Falcon dry forests | 17577 | 935 | 0.053 | 0 | 0.000
Paraguana xeric scrub | 15553 | 204 | 0.013 | 0 | 0.000
Cordillera La Costa montane forests | 15243 | 4418 | 0.290 | 0 | 0.000
Maranon dry forests | 12314 | 3797 | 0.308 | 0 | 0.000
Cordillera Central paramo | 11595 | 1500 | 0.129 | 0 | 0.000
Northeastern Brazil restingas | 11002 | 81 | 0.007 | 0 | 0.000
Gurupa varzea | 9633 | 107 | 0.011 | 0 | 0.000
Araucaria moist forests | 7556 | 309 | 0.041 | 0 | 0.000
Guianan freshwater swamp forests | 6850 | 90 | 0.013 | 0 | 0.000
South American Pacific mangroves | 6534 | 1405 | 0.215 | 0 | 0.000
Araya and Paria xeric scrub | 6090 | 482 | 0.079 | 0 | 0.000
Southern Atlantic mangroves | 6019 | 918 | 0.153 | 0 | 0.000
Orinoco wetlands | 5432 | 21 | 0.004 | 0 | 0.000
Santa Marta montane forests | 5072 | 2568 | 0.506 | 0 | 0.000
Cauca Valley dry forests | 4803 | 3138 | 0.653 | 0 | 0.000
Atlantic Coast restingas | 3299 | 424 | 0.129 | 0 | 0.000
Guayaquil flooded grasslands | 3096 | 1263 | 0.408 | 0 | 0.000
Caatinga Enclaves moist forests | 3089 | 98 | 0.032 | 0 | 0.000
Cordillera de Merida paramo | 2721 | 827 | 0.304 | 0 | 0.000
Chilean matorral | 2211 | 18 | 0.008 | 0 | 0.000
Patia Valley dry forests | 2064 | 97 | 0.047 | 0 | 0.000
Santa Marta paramo | 1352 | 378 | 0.280 | 0 | 0.000
High Monte | 1261 | 1397 | 1.108 | 0 | 0.000
Eastern Panamanian montane forests | 341 | 26 | 0.076 | 0 | 0.000

* area only within tropical South America; some ecoregions may extend beyond this region.
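
The relationship between the two density columns of Table 2 can be explored with a Pearson correlation. The sketch below is a plain-Python illustration, not the analysis code behind the paper: `pearson_r` is a hypothetical helper, and the six rows in `SUBSET` are a hand-picked sample of Table 2 values chosen only to show the mechanics, so the coefficient they produce should not be expected to match the value reported for the full set of ecoregions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Illustrative subset of Table 2: (ecoregion, collections per km2, plots per 10000 km2).
SUBSET = [
    ("Cerrado", 0.036, 0.204),
    ("Napo moist forests", 0.969, 3.708),
    ("Monte Alegre varzea", 0.009, 3.464),
    ("Southern Andean Yungas", 0.262, 7.813),
    ("Western Ecuador moist forests", 0.663, 0.000),
    ("Northern Andean paramo", 1.641, 0.820),
]

if __name__ == "__main__":
    collection_density = [row[1] for row in SUBSET]
    plot_density = [row[2] for row in SUBSET]
    print(round(pearson_r(collection_density, plot_density), 3))
```

To repeat the calculation for all ecoregions, the same function can be applied to the full "No. col / km2" and "No. plots / 10000 km2" columns.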

Fig 4

Density of occurrence records and census plots in ecoregions of tropical South America.

Maps showing the mean density of (A) occurrence records and (B) vegetation census plots in each of the ecoregions of tropical South America.

Table 3

The extent of land area (km2) under different climatic conditions as defined by current Total Annual Precipitation (TAP) and Mean Annual Temperature (MAT).

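MAT (°C) \ TAP (mm) | 0–500 | 500–1000 | 1000–1500 | 1500–2000 | 2000–3000 | 3000–4000 | 4000–6000 | >6000 | >0
0–2 | 7781 | 15654 | 0 | 0 | 0 | 0 | 0 | 0 | 23435
2–4 | 36708 | 54744 | 0 | 0 | 342 | 0 | 0 | 0 | 91452
4–6 | 49923 | 63869 | 3383 | 342 | 1019 | 0 | 0 | 0 | 118536
6–8 | 75382 | 73829 | 7499 | 2400 | 1717 | 0 | 0 | 0 | 160827
8–10 | 104778 | 65755 | 13303 | 6523 | 1019 | 0 | 0 | 0 | 191378
10–12 | 44748 | 35271 | 23235 | 8590 | 2734 | 0 | 0 | 0 | 114578
12–14 | 44012 | 46770 | 19396 | 7882 | 4451 | 0 | 0 | 0 | 122511
14–16 | 51591 | 43582 | 23777 | 19147 | 11294 | 342 | 0 | 0 | 149732
16–18 | 53790 | 45071 | 29256 | 32344 | 25799 | 3776 | 0 | 0 | 190035
18–20 | 46792 | 42818 | 123692 | 82946 | 28958 | 9591 | 677 | 0 | 335475
20–22 | 11238 | 162000 | 365341 | 119160 | 72751 | 27076 | 4109 | 0 | 761676
22–24 | 85953 | 460572 | 782311 | 466536 | 126930 | 43135 | 12599 | 685 | 1978721
24–26 | 52874 | 550604 | 800039 | 1322593 | 1805551 | 420056 | 22959 | 13373 | 4988049
26–28 | 5399 | 201323 | 500020 | 1038270 | 2339775 | 220335 | 9575 | 14750 | 4329448
28–30 | 5056 | 9459 | 15610 | 9520 | 35546 | 3409 | 1705 | 0 | 80305
>0 | 676025 | 1871321 | 2706862 | 3116252 | 4457545 | 727720 | 51625 | 28808 | 13636159

Table 4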

The density of collection records (No. col / km2) available for areas under different climatic conditions as defined by current Total Annual Precipitation (TAP) and Mean Annual Temperature (MAT).

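MAT (°C) \ TAP (mm) | 0–500 | 500–1000 | 1000–1500 | 1500–2000 | 2000–3000 | 3000–4000 | 4000–6000 | >6000 | >0
0–2 | 0.018 | 0.014 | - | - | - | - | - | - | 0.009
2–4 | 0.010 | 0.053 | - | - | 6.376 | - | - | - | 0.032
4–6 | 0.019 | 0.115 | 0.844 | 0.056 | 3.553 | - | - | - | 0.117
6–8 | 0.069 | 0.162 | 1.440 | 0.767 | 0.474 | - | - | - | 0.158
8–10 | 0.078 | 0.205 | 0.890 | 1.586 | 0.692 | - | - | - | 0.190
10–12 | 0.045 | 0.642 | 1.334 | 0.847 | 0.520 | - | - | - | 0.544
12–14 | 0.079 | 0.513 | 0.859 | 0.740 | 1.240 | - | - | - | 0.425
14–16 | 0.022 | 0.481 | 0.791 | 0.628 | 0.898 | 0.056 | - | - | 0.414
16–18 | 0.022 | 0.349 | 1.007 | 0.450 | 0.762 | 0.753 | - | - | 0.433
18–20 | 0.063 | 0.446 | 0.271 | 0.336 | 0.602 | 0.851 | 0.095 | - | 0.317
20–22 | 0.022 | 0.162 | 0.121 | 0.256 | 0.430 | 0.457 | 1.875 | - | 0.200
22–24 | 0.029 | 0.048 | 0.067 | 0.067 | 0.183 | 0.541 | 0.966 | 0.026 | 0.083
24–26 | 0.056 | 0.030 | 0.049 | 0.069 | 0.125 | 0.365 | 0.186 | 0.803 | 0.109
26–28 | 0.009 | 0.012 | 0.030 | 0.091 | 0.131 | 0.158 | 0.420 | 1.181 | 0.110
28–30 | 0.003 | 0.025 | 0.025 | 0.052 | 0.096 | 0.102 | 0.056 | - | 0.062
>0 | 0.046 | 0.110 | 0.113 | 0.105 | 0.146 | 0.323 | 0.549 | 0.978 | 0.131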

Combined, the included plot networks represent 1535 census plots distributed throughout tropical South America (Table 2 and Fig 1h). Any given point in tropical South America is, on average, 235.7 km from the closest census plot (median distance = 149.2 km; Fig 3b). Only 1.3% of tropical South America is within 10 km of the closest census plot; 14.9% is within 50 km of a plot; 33.8% is within 100 km of a plot; and 86.7% is within 500 km of a plot. The greatest distance from any point in tropical South America to the closest of the included plots is 1273.3 km. As with the collections data, the greatest concentration of plots is in the tropical montane ecoregions (the Southern Andean Yungas, Eastern Cordillera Real Montane Forests, and Venezuelan Andes Montane Forests are the best, second-best, and fifth-best represented ecoregions, respectively; Table 2 and Fig 4b). The best-represented climatic zones, by far, are the areas with 4000–6000 mm rainfall and mean annual temperatures of 20–22°C, and areas with 3000–4000 mm rainfall and mean annual temperatures of 16–18°C. The other climatic zones have markedly lower densities of plots (Table 5).

Table 5

The density of census plots (No. plots / 10000 km2) in areas under different climatic conditions as defined by current Total Annual Precipitation (TAP) and Mean Annual Temperature (MAT).

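MAT (°C) \ TAP (mm) | 0–500 | 500–1000 | 1000–1500 | 1500–2000 | 2000–3000 | 3000–4000 | 4000–6000 | >6000 | >0
0–2 | 0.000 | 0.000 | - | - | - | - | - | - | 0.000
2–4 | 0.000 | 0.000 | - | - | 0.000 | - | - | - | 0.000
4–6 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | - | - | - | 0.000
6–8 | 0.000 | 0.000 | 2.667 | 0.000 | 0.000 | - | - | - | 0.124
8–10 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | - | - | - | 0.000
10–12 | 0.000 | 0.567 | 0.000 | 2.328 | 0.000 | - | - | - | 0.349
12–14 | 0.454 | 0.000 | 0.000 | 17.762 | 4.494 | - | - | - | 1.469
14–16 | 0.000 | 1.147 | 0.841 | 0.000 | 1.771 | 0.000 | - | - | 0.601
16–18 | 0.000 | 1.331 | 6.153 | 0.000 | 2.713 | 42.375 | - | - | 2.473
18–20 | 0.000 | 1.868 | 0.081 | 0.241 | 0.000 | 8.341 | 0.000 | - | 0.566
20–22 | 0.000 | 0.370 | 0.849 | 0.755 | 0.825 | 0.369 | 55.968 | - | 0.998
22–24 | 0.000 | 0.000 | 0.179 | 0.514 | 0.551 | 2.550 | 0.794 | 0.000 | 0.288
24–26 | 0.000 | 0.054 | 0.525 | 0.832 | 1.313 | 2.643 | 0.000 | 0.000 | 1.008
26–28 | 0.000 | 0.248 | 0.220 | 0.905 | 2.740 | 2.088 | 0.000 | 0.000 | 1.841
28–30 | 0.000 | 0.000 | 0.000 | 1.050 | 0.563 | 0.000 | 0.000 | - | 0.374
>0 | 0.030 | 0.187 | 0.447 | 0.821 | 2.028 | 2.652 | 4.649 | 0.000 | 1.126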

Discussion

In order to advance conservation science we need to overcome the Wallacean and Linnean shortfalls [29-31]. In other words, we need to know what species are out there and where they occur [32]. One tool that can help us to bypass these shortfalls is the rapidly expanding availability of natural history and collections data. For tropical South America, the amount of collections data that is available online has skyrocketed over the past decade (Fig 1a–1g). Indeed, since the launch of GBIF in 2007, the number of records available from tropical South America has increased by nearly 60% annually.

The rapid increase in available collection data has led to a marked decrease in the “data void”; however, the data void still exists and in some regards remains unacceptably large. This is because the majority of newly added collections have gone towards increasing the number of species represented, while relatively few have gone towards augmenting the sample sizes available for already-collected species. In other words, the records now include many more species than previously. For example, the number of species represented in the available data rose from <15,000 species in 2007 to >52,000 species in 2013. However, most species remain so poorly represented that they are functionally invisible to ecological studies, since their sample sizes are too small for them to be included in most modelling exercises or conservation assessments (e.g., in 2013, <14,000 species [26%] had 20 or more available records, while >30,000 species [57%] were represented by fewer than 10 records and nearly 10,000 species [19%] were represented by just a single record; Fig 2a). It is also highly likely that there are many more species that remain unnamed or that are not represented by the online databases [31,33].
Indeed, many species will probably never be represented in herbaria or online databases due to their rarity as well as the difficulty of collecting flowers and fruits, which are often needed to accurately identify the collections to species. The vast majority of specimens gathered for ecological studies in the tropics are sterile; therefore many collections are not identified to species or are identified incorrectly [6].

Likewise, from a spatial perspective, the data void has shrunk but still remains distressingly large. To date, more than 10% of tropical South America is still represented by no collection records at all, and an additional 15% remains with a density of less than 0.0005 available records per km2 (in other words, with just one collection for every 2000 km2; Fig 3a). Indeed, the overall collection density across the region is approximately 1 collection record for every 10 km2. Making matters worse, the density of collection records is not evenly distributed amongst habitat types or climatic zones. Many ecoregions are very poorly represented in the GBIF collections database. For example, the Cerrado is one of South America’s largest, most diverse, and most threatened ecoregions [4,34], but it is represented by an average of just one record for approximately every 30 km2. The Caatinga Dry Forest ecoregion of northern Brazil is represented by an approximately equal density of collection records, while other ecoregions such as the Madeira-Tapajos Moist Forests are represented by even lower densities of records (1 collection for every 70 km2). In contrast, several of the Andean ecoregions are relatively well-represented, with densities of more than 1 collection record per km2 (Table 2). In accord with these patterns, there are also large disparities between climatic zones in their collection intensities. Hot, dry habitats are the least-represented, potentially due to lower diversities and densities of plants in these areas.
However, there are other very large and relatively hospitable climate zones which are also very poorly represented. For example, large parts of tropical South America have mean annual rainfalls of 1500–2000 mm and mean annual temperatures of 24–28°C, but these areas are not well-represented in the collections database (Table 4). This may be due to a lack of access (due to physical or bureaucratic impediments) or infrastructure in these areas.

Natural history and herbarium records are only one form of data. In the past decades there have been multiple independent efforts to collate and standardize census data. However, even when combining these efforts, the number of plots pales in comparison to the extent and diversity of tropical South America. Across all of the included networks, there are a total of approximately 1500 plots (Fig 1h) covering roughly 15 km2 of forest in tropical South America. In other words, combining all our efforts we are still only censusing about 0.0001% of the total land area (or <0.0003% of the Amazon). For comparison, this is 1/20th the spatial density of plots in the USA that are maintained by the US Forest Service’s Forest Inventory and Analysis Program alone (http://fia.fs.fed.us/). Most places in tropical South America are more than 150 km from the closest census plot (Fig 3b) and most ecoregions in tropical South America have less than 1 plot per 30,000 km2 (Fig 4b). The relative density of census plots across South American ecoregions is not significantly correlated with the density of collection records (Pearson correlation coefficient R = 0.16). In other words, habitats that are well represented in the plot networks are often only poorly represented in the collections data and vice versa (Table 2).
In some cases, this discrepancy is understandable given the emphasis of the included plot networks on tree censuses and hence the preclusion of plots from some areas with low forest cover but with high plant diversities and thus many collections (such as the Northern Andean Paramo ecoregion). As such, it is possible that the discrepancy between plot and collection densities could be reduced through the inclusion of additional plots focused on non-forest habitats (for example, the GLobal Observation Research Initiative in Alpine Environments network [GLORIA; http://www.gloria.ac.at/] promotes standardized methodologies to census alpine vegetation around the world, including in the high Andes; the GLORIA network was not included in this analysis because their plot locations and data are not readily available online). In other cases (e.g., high plot density but low collection density in the Monte Alegre Varzea and Southern Andean Yungas ecoregions, and vice versa in the Western Ecuador Moist Forests ecoregion), the discrepancy in plot vs. collection density has no apparent explanation other than differential sampling efforts between habitats and regions. Interestingly, the density of neither plots nor collections appears to be strongly linked to the actual representation of the different habitats in ecological studies. For example, Andean ecosystems are greatly underrepresented in the ecological literature but are fairly well-represented in the two types of databases explored here [35]. Given the high spatial variability in composition, structure, and dynamics across the Amazon and tropical South America, there is a clear need for more and better-distributed census plots, collection campaigns, and research efforts.

From a diversity standpoint, it is hard to assess the representativeness of the census plots.
This is because many species remain unidentified [31] and nomenclature and taxonomy have still not been fully standardized between, or even within, separate plot networks [36]. In one recent study [6], the taxonomy and nomenclature of collections from the ATDN, the largest of the tropical plot networks, were standardized. The ATDN's network of 1,170 plots was found to include 4,962 species of trees. This is only 30% of the estimated Amazonian tree diversity and less than 10% of the total tropical South American plant diversity as represented in the herbarium collection database. Furthermore, most species were poorly represented within and across the plots. Of the represented species, 21% occurred within just a single plot and 13% had only a single individual [6]. In other words, as with the herbarium records, most species remain functionally invisible to ecology and conservation studies due to small sample sizes. Making matters worse, census plot data are not usually open access. Each plot network understandably has its own policies for data distribution and sharing, but in most cases data are provided only upon request for use in specific, pre-approved analyses. In other words, census data are not typically available for data mining, data exploration, preliminary studies, or general assessments (for this study I used only the published locations of plots and not any actual plot data). Limited access to data can hinder the advancement of large-scale ecological, biogeographic, and conservation studies; data should be made publicly available whenever possible (with proper attribution to the data creators and managers). The results of this assessment clearly illustrate that while recent expansions of collection databases and census plot networks have greatly increased the amount of data available for tropical South America, the data void remains far from filled. There are many species that are still not adequately represented in either the collection or the plot databases. 
There are also huge parts of tropical South America, containing many distinct and diverse habitats and climatic zones, that remain very poorly represented in either the plot or the collections databases. The lack of data for particular species, habitats, and climate zones limits our ability to predict the impacts of climate change and other large-scale anthropogenic disturbances, especially when these disturbances themselves may disproportionately impact different species and habitats. This limitation is expected to be even more severe in other tropical regions, such as Africa and Southeast Asia, where data availability is generally lower [22]. To conclude, I stress that my goal in this study is not to criticize the workers who are diligently adding to or administering the natural history databases and plot networks. Quite the opposite: my motivation is to help highlight how valuable these data are, and how important ongoing efforts to increase sample sizes through the generation of new data and the publishing of existing datasets will be [18,37]. Simply put, we need more data.

Contributing Herbaria.

Citations for herbaria and databases contributing information to the Global Biodiversity Information Facility as used in this study (Data was downloaded from GBIF in January and February 2014). (DOCX)