Abstract
Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.
Year: 2011 PMID: 21765886 PMCID: PMC3135593 DOI: 10.1371/journal.pone.0018657
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
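The abstract's estimate of approximately 45% adjusts the observed sharing rate for the sensitivity of the automated dataset search. The abstract gives neither the correction formula nor the sensitivity value. The sketch below therefore assumes the standard Rogan-Gladen correction for an imperfect detector. The function name `corrected_prevalence` and the input values are hypothetical and chosen only for illustration.

```python
def corrected_prevalence(observed: float, sensitivity: float, specificity: float = 1.0) -> float:
    """Rogan-Gladen estimate of true prevalence from an apparent (detected) prevalence.

    With specificity = 1.0 (the search never reports a dataset that does not
    exist), this reduces to observed / sensitivity.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed + specificity - 1.0) / denominator
    # Clamp to [0, 1]: sampling noise can push the raw estimate out of range.
    return min(max(estimate, 0.0), 1.0)


# Hypothetical inputs, not values from the paper: 30% of articles have a
# detected dataset, and the search is assumed to find 70% of existing datasets.
print(round(corrected_prevalence(0.30, sensitivity=0.70), 2))  # 0.43
```

With perfect specificity the correction is just a division by the sensitivity. A search that misses a sizeable share of deposited datasets therefore makes the true rate noticeably higher than the observed one.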
Figure 1. Proportion of articles with shared datasets, by year (error bars are 95% confidence intervals of the proportions).
Figure 2. Proportion of articles with shared datasets, by journal (error bars are 95% confidence intervals of the proportions).
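Figures 1 and 2 report 95% confidence intervals for proportions without naming the interval method. This is a minimal sketch assuming the normal-approximation (Wald) interval. With whole-percent rounding it reproduces the intervals printed in the subgroup prevalence table at the end of this entry, for example 629/2624 = 24% [22%, 26%]. A Wilson or exact interval would give nearly identical values at sample sizes of that magnitude. The function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the normal-approximation (Wald) interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because the Wald interval can overshoot near 0% or 100%.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


p, lo, hi = wald_ci(629, 2624)
print(f"{p:.0%} [{lo:.0%}, {hi:.0%}]")  # 24% [22%, 26%]
```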
First-order factor loadings.
| Large NIH grant |
| 0.97 num.post2005.morethan1000k.tr |
| 0.96 num.post2005.morethan750k.tr |
| 0.92 num.post2004.morethan750k.tr |
| 0.91 num.post2004.morethan1000k.tr |
| 0.91 num.post2005.morethan500k.tr |
| 0.89 num.post2006.morethan1000k.tr |
| 0.89 num.post2006.morethan750k.tr |
| 0.86 num.post2004.morethan500k.tr |
| 0.85 num.post2006.morethan500k.tr |
| 0.84 num.post2003.morethan750k.tr |
| 0.84 num.post2003.morethan1000k.tr |
| 0.80 num.post2003.morethan500k.tr |
| 0.74 has.U.funding |
| 0.71 has.P.funding |
| 0.58 nih.sum.avg.dollars.tr |
| 0.56 nih.sum.sum.dollars.tr |
| 0.44 nih.max.max.dollars.tr |
| Has journal policy |
| 1.00 journal.policy.contains..geo.omnibus |
| 0.95 journal.policy.at.least.requests.sharing.array |
| 0.95 journal.policy.mentions.any.sharing |
| 0.93 journal.policy.contains.word.microarray |
| 0.91 journal.policy.requests.sharing.other.data |
| 0.85 journal.policy.says.must.deposit |
| 0.83 journal.policy.contains.word.arrayexpress |
| 0.72 journal.policy.requires.microarray.accession |
| 0.71 journal.policy.requests.accession |
| 0.58 journal.policy.contains.word.miame.mged |
| 0.48 journal.microarray.creating.count.tr |
| 0.45 journal.policy.mentions.consequences |
| 0.42 journal.policy.general.statement |
| NOT institution NCI or intramural |
| 0.59 pubmed.is.funded.non.us.govt |
| 0.55 institution.is.higher.ed |
| −0.89 institution.nci |
| −0.86 pubmed.is.funded.nih.intramural |
| −0.42 country.usa |
| Count of R01 & other NIH grants |
| 1.15 has.R01.funding |
| 1.14 has.R.funding |
| 0.89 num.grants.via.nih.tr |
| 0.86 nih.cumulative.years.tr |
| 0.82 num.grant.numbers.tr |
| 0.80 max.grant.duration.tr |
| 0.66 pubmed.is.funded.nih |
| 0.50 nih.max.max.dollars.tr |
| 0.45 num.nih.is.nigms.tr |
| 0.44 country.usa |
| 0.42 has.T.funding |
| 0.41 num.nih.is.niaid.tr |
| Journal impact |
| 0.88 journal.5yr.impact.factor.log |
| 0.88 journal.impact.factor.log |
| 0.85 journal.immediacy.index.log |
| 0.70 journal.policy.mentions.exceptions |
| 0.54 journal.num.articles.2008.tr |
| 0.51 journal.policy.contains.word.miame.mged |
| −0.61 journal.policy.contains.word.arrayexpress |
| −0.48 pubmed.is.open.access |
| Last author num prev pubs & first year pub |
| 0.84 last.author.num.prev.pubs.tr |
| 0.74 last.author.year.first.pub.ago.tr |
| 0.73 last.author.num.prev.pmc.cites.tr |
| 0.68 last.author.num.prev.other.sharing.tr |
| 0.48 country.japan |
| 0.44 last.author.num.prev.microarray.creations.tr |
| Journal policy consequences & long half-life |
| 0.78 journal.policy.mentions.consequences |
| 0.73 journal.cited.halflife |
| 0.60 pubmed.is.bacteria |
| 0.42 journal.policy.requires.microarray.accession |
| −0.54 pubmed.is.open.access |
| −0.45 journal.policy.general.statement |
| Institution high citations & collaboration |
| 0.76 institution.mean.norm.citation.score |
| 0.72 institution.international.collaboration |
| 0.64 institution.mean.norm.impact.factor |
| 0.41 country.germany |
| −0.67 country.china |
| −0.61 country.korea |
| −0.56 last.author.gender.not.found |
| −0.43 country.japan |
| NO geo reuse & YES high institution output |
| 0.66 institution.research.output.tr |
| 0.58 institution.harvard |
| 0.46 has.K.funding |
| 0.42 institution.stanford |
| −0.79 pubmed.is.geo.reuse |
| −0.62 country.australia |
| −0.46 institution.rank |
| NOT animals or mice |
| 0.51 pubmed.is.humans |
| 0.43 pubmed.is.diagnosis |
| 0.40 pubmed.is.effectiveness |
| −0.93 pubmed.is.animals |
| −0.86 pubmed.is.mice |
| Humans & cancer |
| 0.84 pubmed.is.humans |
| 0.75 pubmed.is.cancer |
| 0.67 pubmed.is.cultured.cells |
| 0.52 institution.is.medical |
| 0.47 pubmed.is.core.clinical.journal |
| −0.68 pubmed.is.plants |
| −0.49 pubmed.is.fungi |
| Institution is government & NOT higher ed |
| 0.92 institution.is.govnt |
| 0.70 country.germany |
| 0.65 country.france |
| 0.46 institution.international.collaboration |
| −0.78 institution.is.higher.ed |
| −0.56 country.canada |
| −0.51 institution.stanford |
| −0.42 institution.is.medical |
| NO K funding or P funding |
| 0.56 has.R01.funding |
| 0.49 has.R.funding |
| 0.41 num.post2006.morethan500k.tr |
| 0.41 num.post2006.morethan750k.tr |
| 0.40 num.post2006.morethan1000k.tr |
| −0.65 has.K.funding |
| −0.63 has.P.funding |
| Authors prev GEOAE sharing & OA & microarray creation |
| 0.83 last.author.num.prev.geoae.sharing.tr |
| 0.74 last.author.num.prev.microarray.creations.tr |
| 0.73 last.author.num.prev.oa.tr |
| 0.60 first.author.num.prev.geoae.sharing.tr |
| 0.47 first.author.num.prev.oa.tr |
| 0.46 first.author.num.prev.microarray.creations.tr |
| 0.40 institution.stanford |
| −0.44 years.ago.tr |
| First author num prev pubs & first year pub |
| 0.83 first.author.num.prev.pubs.tr |
| 0.77 first.author.year.first.pub.ago.tr |
| 0.73 first.author.num.prev.pmc.cites.tr |
| 0.52 first.author.num.prev.other.sharing.tr |
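The factor lists above appear to follow one display rule. Only loadings with a magnitude of at least 0.40 are shown; the second-order tables later in this entry appear to use 0.30. Positive loadings come first, from largest to smallest, and negative loadings follow, from largest to smallest magnitude. That rule is inferred from the printed values and is not stated in the source. The sketch below applies it to the loadings of the factor "NOT institution NCI or intramural". `format_loadings` and the extra below-cutoff variable are hypothetical.

```python
def format_loadings(loadings: dict[str, float], cutoff: float = 0.40) -> list[str]:
    """List one factor's loadings in the order the tables above appear to use.

    Keeps loadings whose magnitude is at least `cutoff`, puts positive loadings
    first (largest first), then negative loadings (largest magnitude first).
    """
    kept = [(name, value) for name, value in loadings.items() if abs(value) >= cutoff]
    kept.sort(key=lambda item: (item[1] < 0, -abs(item[1])))
    # The tables print a typographic minus sign (U+2212) for negative loadings.
    return [f"{value:.2f}".replace("-", "\u2212") + f" {name}" for name, value in kept]


factor = {
    "country.usa": -0.42,
    "institution.nci": -0.89,
    "hypothetical.small.loading": 0.12,  # made-up value below the cutoff
    "institution.is.higher.ed": 0.55,
    "pubmed.is.funded.nih.intramural": -0.86,
    "pubmed.is.funded.non.us.govt": 0.59,
}
for line in format_loadings(factor):
    print(line)
```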
Figure 3. Association between shared data and first-order factors.
Percentage of studies with shared data is shown for each quartile for each factor. Univariate analysis.
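Figure 3 groups studies into quartiles of each factor score and reports the share of studies with deposited data in each quartile. The sketch below shows that computation under two assumptions the caption does not settle. First, quartile boundaries use linear interpolation. Second, a score exactly on a boundary falls into the upper quartile. The function name and the eight-study data set are hypothetical.

```python
import bisect
import statistics


def sharing_by_quartile(scores: list[float], shared: list[bool]) -> list[float]:
    """Percentage of studies with shared data in each quartile of one factor score."""
    if len(scores) != len(shared) or len(scores) < 2:
        raise ValueError("need two or more scores, one sharing flag per score")
    # 25th, 50th and 75th percentile cut points (linear interpolation).
    cuts = statistics.quantiles(scores, n=4, method="inclusive")
    totals = [0, 0, 0, 0]
    sharers = [0, 0, 0, 0]
    for score, did_share in zip(scores, shared):
        quartile = bisect.bisect_right(cuts, score)  # 0 = lowest, 3 = highest
        totals[quartile] += 1
        sharers[quartile] += did_share
    # Empty bins (possible with heavily tied scores) are reported as NaN.
    return [100 * s / t if t else float("nan") for s, t in zip(sharers, totals)]


# Toy data, not from the paper: eight studies with made-up factor scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
shared = [False, False, False, True, False, True, True, True]
print(sharing_by_quartile(scores, shared))  # [0.0, 50.0, 50.0, 100.0]
```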
Figure 4. Odds ratios of data sharing for first-order factors, multivariate model.
Odds ratios are calculated by varying each factor score from its 25th percentile value to its 75th percentile value. Horizontal lines show the 95% confidence intervals of the odds ratios.
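The caption describes an interquartile-range odds ratio. Assume each factor score enters a logistic regression linearly with coefficient beta. Moving the score from its 25th to its 75th percentile then changes the log-odds by beta * (Q75 - Q25). The odds ratio is therefore exp(beta * (Q75 - Q25)), and a Wald interval for beta maps to the interval shown. The sketch below uses a made-up coefficient, standard error and set of scores, and the function name is hypothetical. If the model used nonlinear terms such as splines, the calculation would differ.

```python
import math
import statistics


def iqr_odds_ratio(beta: float, se: float, scores: list[float], z: float = 1.96):
    """Odds ratio and 95% CI for moving a linear logistic predictor from Q25 to Q75."""
    q25, _, q75 = statistics.quantiles(scores, n=4, method="inclusive")
    delta = q75 - q25  # interquartile range of the factor score
    odds_ratio = math.exp(beta * delta)
    # Wald interval on beta, scaled by the same interquartile range.
    lower = math.exp((beta - z * se) * delta)
    upper = math.exp((beta + z * se) * delta)
    return odds_ratio, lower, upper


# Hypothetical coefficient, standard error and factor scores.
or_, lo, hi = iqr_odds_ratio(beta=0.5, se=0.1, scores=[-2.0, -1.0, 0.0, 1.0, 2.0])
print(f"OR = {or_:.2f} [{lo:.2f}, {hi:.2f}]")  # OR = 2.72 [1.84, 4.02]
```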
Second-order factor loadings, by first-order factors.
| Amount of NIH funding |
| 0.89 Count of R01 & other NIH grants |
| 0.49 Large NIH grant |
| −0.55 NO K funding or P funding |
| Cancer & humans |
| 0.83 Humans & cancer |
| OA journal & previous GEO-AE sharing |
| 0.59 Authors prev GEOAE sharing & OA & microarray creation |
| 0.43 Institution high citations & collaboration |
| 0.31 First author num prev pubs & first year pub |
| −0.36 Last author num prev pubs & first year pub |
| Journal impact factor and policy |
| 0.57 Journal impact |
| 0.51 Last author num prev pubs & first year pub |
| Higher Ed in USA |
| 0.40 NO geo reuse & YES high institution output |
| −0.44 Institution is government & NOT higher ed |
Second-order factor loadings, by original variables.
| Amount of NIH funding |
| 0.87 nih.cumulative.years.tr |
| 0.85 num.grants.via.nih.tr |
| 0.84 max.grant.duration.tr |
| 0.82 num.grant.numbers.tr |
| 0.80 pubmed.is.funded.nih |
| 0.79 nih.max.max.dollars.tr |
| 0.70 nih.sum.avg.dollars.tr |
| 0.70 nih.sum.sum.dollars.tr |
| 0.59 has.R.funding |
| 0.59 num.post2003.morethan500k.tr |
| 0.58 country.usa |
| 0.58 has.U.funding |
| 0.57 has.R01.funding |
| 0.55 num.post2003.morethan750k.tr |
| 0.53 has.T.funding |
| 0.53 num.post2003.morethan1000k.tr |
| 0.49 num.post2004.morethan500k.tr |
| 0.45 num.post2004.morethan750k.tr |
| 0.44 has.P.funding |
| 0.43 num.post2004.morethan1000k.tr |
| 0.43 num.nih.is.nci.tr |
| 0.35 num.post2005.morethan500k.tr |
| 0.32 num.nih.is.nigms.tr |
| 0.31 num.post2005.morethan750k.tr |
| Cancer & humans |
| 0.60 pubmed.is.cancer |
| 0.59 pubmed.is.humans |
| 0.52 pubmed.is.cultured.cells |
| 0.43 pubmed.is.core.clinical.journal |
| 0.39 institution.is.medical |
| −0.58 pubmed.is.plants |
| −0.50 pubmed.is.fungi |
| −0.37 pubmed.is.shared.other |
| −0.30 pubmed.is.bacteria |
| OA journal & previous GEO-AE sharing |
| 0.40 first.author.num.prev.geoae.sharing.tr |
| 0.37 pubmed.is.open.access |
| 0.37 first.author.num.prev.oa.tr |
| 0.35 last.author.num.prev.geoae.sharing.tr |
| 0.32 pubmed.is.effectiveness |
| 0.32 last.author.num.prev.oa.tr |
| 0.31 pubmed.is.geo.reuse |
| −0.38 country.japan |
| Journal impact factor and policy |
| 0.48 journal.impact.factor.log |
| 0.47 jour.policy.requires.microarray.accession |
| 0.46 jour.policy.mentions.exceptions |
| 0.46 pubmed.num.cites.from.pmc.tr |
| 0.45 journal.5yr.impact.factor.log |
| 0.45 jour.policy.contains.word.miame.mged |
| 0.42 last.author.num.prev.pmc.cites.tr |
| 0.41 jour.policy.requests.accession |
| 0.40 journal.immediacy.index.log |
| 0.40 journal.num.articles.2008.tr |
| 0.39 years.ago.tr |
| 0.36 jour.policy.says.must.deposit |
| 0.35 pubmed.num.cites.from.pmc.per.year |
| 0.33 institution.mean.norm.citation.score |
| 0.32 last.author.year.first.pub.ago.tr |
| 0.31 country.usa |
| 0.31 last.author.num.prev.pubs.tr |
| 0.31 jour.policy.contains.word.microarray |
| −0.31 pubmed.is.open.access |
| Higher Ed in USA |
| 0.36 institution.stanford |
| 0.36 institution.is.higher.ed |
| 0.35 country.usa |
| 0.35 has.R.funding |
| 0.33 has.R01.funding |
| 0.30 institution.harvard |
| −0.37 institution.is.govnt |
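The second-order loadings by original variables can be related to the two earlier loading tables. The first step of the Schmid-Leiman transformation is one standard way to express a second-order factor in terms of the original variables: it multiplies the first-order loading matrix by the second-order loading matrix. The source does not say how this table was computed, so treat the sketch below as an assumption. It uses two printed values: pubmed.is.cancer loads 0.75 on "Humans & cancer", and that factor loads 0.83 on "Cancer & humans". Every unprinted loading is treated as zero. The product, about 0.62, is close to but not the same as the 0.60 printed above, a gap that omitted small loadings or a different method could explain. The function name is hypothetical.

```python
def second_order_loadings(
    first_order: dict[str, dict[str, float]],
    second_order: dict[str, dict[str, float]],
) -> dict[str, dict[str, float]]:
    """Loadings of original variables on second-order factors via a matrix product.

    first_order[variable][factor1] and second_order[factor1][factor2] hold
    loadings; any missing entry is treated as 0.
    """
    result: dict[str, dict[str, float]] = {}
    for variable, loads in first_order.items():
        row: dict[str, float] = {}
        for f1, l1 in loads.items():
            for f2, l2 in second_order.get(f1, {}).items():
                # Sum over first-order factors: sum_k lambda[variable, k] * gamma[k, f2].
                row[f2] = row.get(f2, 0.0) + l1 * l2
        result[variable] = row
    return result


# Values copied from the tables above; loadings the tables omit are treated as 0.
first = {"pubmed.is.cancer": {"Humans & cancer": 0.75}}
second = {"Humans & cancer": {"Cancer & humans": 0.83}}
print(round(second_order_loadings(first, second)["pubmed.is.cancer"]["Cancer & humans"], 2))  # 0.62
```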
Figure 5. Association between shared data and second-order factors.
Percentage of studies with shared data is shown for each quartile for each factor. Univariate analysis.
Figure 6. Odds ratios of data sharing for second-order factors, multivariate model.
Odds ratios are calculated by varying each factor score from its 25th percentile value to its 75th percentile value. Horizontal lines show the 95% confidence intervals of the odds ratios.
Data sharing prevalence of subgroups divided at medians of two second-order factors [95% confidence intervals].
| Number of studies with shared data / number of studies | Above the median value for the factor “Cancer & Humans” | Below the median value for the factor “Cancer & Humans” | Total |
|---|---|---|---|
| | 629/2624 = 24% [22%, 26%] | 1184/3178 = 37% | 1813/5802 = 31% [30%, 32%] |
| | 440/3178 = 14% | 648/2623 = 25% [23%, 26%] | 1088/5801 = 19% [18%, 20%] |
| Total | 1069/5802 = 18% [17%, 19%] | 1832/5801 = 32% [30%, 33%] | 2901/11603 = 25% |
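The table's column totals support a simple unadjusted comparison. This is a minimal sketch computing a crude odds ratio of sharing for studies above versus below the median of "Cancer & Humans", with a Woolf (log-scale) 95% interval. It is not the multivariate model behind Figures 4 and 6, and the function name `crude_odds_ratio` is hypothetical.

```python
import math


def crude_odds_ratio(shared_a: int, n_a: int, shared_b: int, n_b: int, z: float = 1.96):
    """Unadjusted odds ratio of sharing, group A vs group B, with a Woolf 95% CI."""
    a, b = shared_a, n_a - shared_a  # group A: shared, not shared
    c, d = shared_b, n_b - shared_b  # group B: shared, not shared
    if min(a, b, c, d) == 0:
        raise ValueError("the Woolf interval needs all four cells to be non-zero")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Column totals from the table: above vs below the median of "Cancer & Humans".
or_, lo, hi = crude_odds_ratio(1069, 5802, 1832, 5801)
print(f"OR = {or_:.2f} [{lo:.2f}, {hi:.2f}]")  # OR = 0.49 [0.45, 0.53]
```

An odds ratio near 0.5 means the odds of sharing are roughly halved for studies above the median on this factor. This agrees with the abstract's observation that studies on cancer and human subjects were least likely to share.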