Mark Stalzer, Chris Mentzel.
Abstract
The Gordon and Betty Moore Foundation ran an Investigator Competition as part of its Data-Driven Discovery Initiative in 2014. We received about 1100 applications and each applicant had the opportunity to list up to five influential works in the general field of "Big Data" for scientific discovery. We collected nearly 5000 references and 53 works were cited at least six times. This paper contains our preliminary findings.
Year: 2016 PMID: 27540499 PMCID: PMC4975741 DOI: 10.1186/s40064-016-2888-8
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
Works that were cited at least ten times, with count, year, and citation
| Count | Year | Citation |
|---|---|---|
| 63 | 2008 | MapReduce (Dean and Ghemawat) |
| 51 | 2009 | |
| 43 | 2009 | |
| 30 | 2001 | Initial sequencing of the human genome (Lander et al.) |
| 24 | 1948 | A mathematical theory of communication (Shannon) |
| 23 | 2000 | Sloan Digital Sky Survey (York et al.) |
| 20 | 1990 | BLAST (Altschul et al.) |
| 19 | 1996 | Lasso (Tibshirani) |
| 19 | 2003 | Latent Dirichlet allocation (Blei et al.) |
| 17 | 1977 | EM algorithm (Dempster et al.) |
| 17 | 1995 | Support vector networks (Cortes and Vapnik) |
| 15 | 2001 | Random forests (Breiman) |
| 14 | 2006 | |
| 14 | 1998 | Anatomy of web search engine (Brin and Page) |
| 13 | 2007 | |
| 11 | 1979 | Bootstrap methods (Efron) |
| 11 | 1953 | Equation of state calculations (Metropolis et al.) |
| 11 | 1977 | Exploratory data analysis (Tukey) |
| 11 | 1988 | |
| 10 | 1999 | PageRank (Page et al.) |
| 10 | 2013 | |
| 10 | 2009 | Unreasonable effectiveness of data (Halevy et al.) |
Fig. 1 Fit of the influential works to a power law (x is index, y is count). The correlation coefficient is
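
A power-law fit of this kind can be sketched as follows. This is a minimal illustration, not the fitting procedure used for Fig. 1: it assumes an ordinary least-squares fit on log-log axes, and it uses only the 22 counts listed in the table of works cited at least ten times, whereas the figure covers the influential works more broadly. The function name `fit_power_law` is hypothetical.

```python
import math

# Counts from the table of works cited at least ten times, in rank order.
# Assumption: the index x is the rank (1, 2, 3, ...) of each work.
counts = [63, 51, 43, 30, 24, 23, 20, 19, 19, 17, 17,
          15, 14, 14, 13, 11, 11, 11, 11, 10, 10, 10]


def fit_power_law(ys):
    """Least-squares fit of y = a * x**b in log-log space, x = 1..len(ys).

    Returns (a, b, r), where r is the Pearson correlation coefficient
    between log(x) and log(y).
    """
    n = len(ys)
    lx = [math.log(i) for i in range(1, n + 1)]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                    # exponent (slope on log-log axes)
    a = math.exp(my - b * mx)        # prefactor (from the intercept)
    r = sxy / math.sqrt(sxx * syy)   # correlation on log-log axes
    return a, b, r


a, b, r = fit_power_law(counts)
print(f"y ~ {a:.1f} * x^{b:.2f}, correlation r = {r:.3f}")
```

Because the fit is done on logarithms, a negative exponent `b` with `r` close to -1 indicates that the counts fall off roughly as a power of the index. The values obtained from this 22-row subset need not match the coefficient reported for the figure.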
A clustering of the 53 influential works with associated sections
| Count | Cluster | Key topics |
|---|---|---|
| 7 | | Astronomy |
| | | Genomics |
| 29 | | Theory |
| | | Statistical methods |
| | | Machine learning |
| 9 | | |
| | | General tools |
| 8 | | |
| 53 | ALL | |
Key to reference tags and fields
| Tag | Field |
|---|---|
| ACM | Applied and computational mathematics |
| AG | Agriculture |
| APHYS | Applied physics |
| ASPC | Aerospace |
| ASTRO | Astronomy and astrophysics |
| ASTROB | Astrobiology |
| ATMOS | Atmospheric science |
| BCS | Brain and cognitive science |
| BIO | Biology |
| BIOE | Bioengineering |
| BIOI | Bioinformatics |
| CBIO | Computational biology |
| CE | Computer engineering |
| CHEM | Chemistry |
| CHEME | Chemical engineering |
| CIVE | Civil engineering |
| CLI | Climate science |
| CS | Computer science |
| CSS | Computational social science |
| CSYS | Complex systems |
| DM | Data mining |
| EBIO | Evolutionary biology |
| ECO | Ecology |
| EE | Electrical engineering |
| ENGR | Engineering (general) |
| EPS | Earth and planetary science |
| ESE | Environmental science and engineering |
| EST | Energy science and technology |
| GENE | Genetics |
| GENOM | Genomics |
| GEOP | Geophysics |
| MATH | Mathematics |
| MATS | Materials science |
| MBIO | Biochemistry and molecular biophysics |
| ME | Mechanical engineering and solid mechanics |
| MED | Medicine |
| MMO | Marine microbiology and oceanography |
| NEURO | Neuroscience |
| OPSR | Operations research |
| PHYS | Physics |
| REMS | Remote sensing |
| SBIO | Systems biology |
| SML | Statistics and machine learning |