Abstract
The computer software used for genomic analysis has become a crucial component of the infrastructure for life sciences. However, genomic software is still typically developed in an ad hoc manner, with inadequate funding, and by academic researchers not trained in software development, at substantial costs to the research community. I examine the roots of the incongruity between the importance of and the degree of investment in genomic software, and I suggest several potential remedies for current problems. As genomics continues to grow, new strategies for funding and developing the software that powers the field will become increasingly essential.
Year: 2019 PMID: 31358028 PMCID: PMC6664559 DOI: 10.1186/s13059-019-1763-7
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 a William and Caroline Herschel’s 40-foot telescope in Slough, England, 1789 [1]. b Two of the teams of scientists that contributed to the Human Genome Project: (top) Sanger Center, Hinxton, UK [2]; (bottom) Washington University Genome Sequencing Center, St. Louis, USA, both circa 2000. c Some relics of the pre-Internet world: (clockwise from top left) the author learning to program in BASIC on a Commodore 64 in a cold upstate New York basement, 1983 (credit: Virginia Siepel), as one of the millions of children who were introduced to home computers during the 1980s and 1990s, some of whom would go on to write much of the software that powers genomics today; floppy disk for PAUP version 3.1.1, © 1993; Sun Microsystems SPARCstation 1 with Mosaic web browser faintly visible on screen, 1994 [3]; screen shot from the MASE alignment program [4]. d Prof. David Haussler of UC Santa Cruz with the original Dell computer cluster that his team used to assemble the human genome, 2000. Photo © UC Santa Cruz, used with permission
Highly cited genomic software tools
| Program name | Year (a) | Primary institution(s) (b) | Primary funding source(s) (c) | Refs. (d) | Citations (e) |
|---|---|---|---|---|---|
| Homology searching and alignment | | | | | |
| FASTA | 1988 | U. Virginia, NIH | | [ | 13,496 |
| CLUSTAL | 1988 | Trinity College, Dublin; EMBL, Heidelberg; EBI | European Community Biotechnology Action Programme | [ | 94,789 |
| BLAST | 1990 | NCBI | NIH | [ | 75,328 |
| PSI-BLAST | 1997 | NCBI | NIH | [ | 69,604 |
| HMMer | 1998 | Washington U., St. Louis | NIH, HHMI | [ | 8,836 |
| T-Coffee | 2000 | Nat. Inst. Med. Res., London | Swiss Nat’l Science Fnd. | [ | 6,247 |
| BLAT | 2002 | UC Santa Cruz | NIH, HHMI | [ | 6,911 |
| MUSCLE | 2004 | | | [ | 24,261 |
| MAFFT | 2013 | Kyoto U. | Ministry of Education, Culture, Sports, and Technology of Japan | [ | 21,486 |
| Phylogenetic modeling and tree inference | | | | | |
| PHYLIP | 1980 | U. Washington | NIH, NSF | [ | 21,851 |
| MacClade | 1986 | Sinauer Assoc. | (Commercial) | [ | 10,255 |
| PAUP | 1989 | Illinois Nat. Hist. Survey, Sinauer Assoc. | (Commercial) | [ | 62,807 |
| PAML | 1993 | UC Berkeley, Univ. College London | NSF of China, NIH, NSF | [ | 11,375 |
| MEGA | 1993 | Penn. State U., Arizona State U. | NIH, NSF, Burroughs-Wellcome | [ | 119,268 |
| MrBayes | 2001 | U. Rochester, Uppsala U. | NSF | [ | 52,742 |
| Mesquite | 2001 | U. Arizona, U. British Columbia | Packard, NSF | [ | 7,693 |
| PhyML | 2003 | CNRS, Montpellier | Montpellier Genopole, InterEPST Bioinformatics Program | [ | 24,614 |
| PHAST | 2004 | UC Santa Cruz, Cornell | NSF, Packard, NIH | [ | 4,690 |
| RAxML | 2004 | Technical U. Munich | Heidelberg Institute for Theoretical Studies | [ | 27,550 |
| HyPhy | 2005 | UC San Diego, NC State | | [ | 2,159 |
| BEAST | 2007 | U. Auckland, U. Edinburgh | Wellcome Trust, Royal Society | [ | 12,027 |
| FastTree/FastTree2 | 2009 | Lawrence Berkeley Nat’l Lab, UC Berkeley | DOE, GTL Program | [ | 5,308 |
| Gene prediction, motif finding, and RNA folding | | | | | |
| MEME | 1994 | UC San Diego | NIH, NSF | [ | 11,790 |
| Genscan | 1997 | Stanford | NIH, NSF | [ | 4,061 |
| tRNAscan-SE | 1997 | Washington U., St. Louis | | [ | 7,559 |
| Vienna package | 2003 | Institute for Theoretical Chemistry, Austria | Austrian Science Fund | [ | 4,781 |
| Visualization | | | | | |
| Jalview | 1996 | EBI, Sanger, Oxford | BBSRC | [ | 5,895 |
| TreeView | 1998 | Stanford | NIH | [ | 17,796 |
| UCSC Genome Browser | 2000 | UC Santa Cruz | NIH, DOE, HHMI | [ | 11,365 |
| ENSEMBL Browser | 2000 | EBI, Sanger | Wellcome Trust, NIH, EMBL | [ | 5,235 |
| Cytoscape | 2003 | Inst. Systems Biology, Whitehead Inst., UC San Diego | Pfizer, NIH, NSF | [ | 17,862 |
| IGV | 2011 | Broad | NIH | [ | 4,678 |
| Statistical and population genomics | | | | | |
| STRUCTURE | 2000 | Oxford | NIH, Burroughs-Wellcome, BBSRC | [ | 30,948 |
| PHASE/fastPHASE | 2001 | Oxford, U. Washington | Wellcome Trust, BBSRC, Engineering and Physical Sciences Research Council | [ | 10,073 |
| ms | 2002 | U. Chicago | | [ | 2,119 |
| PolyPhen | 2002 | EMBL, Max Delbrück Center for Mol. Med., Engelhardt Inst. Mol. Biol. | NIH | [ | 11,136 |
| SIFT | 2003 | Fred Hutchinson Cancer Res. Ctr. | NIH | [ | 7,024 |
| EIGENSTRAT | 2006 | Harvard, Broad | Millennium Pharmaceuticals, Burroughs Wellcome | [ | 6,812 |
| PLINK | 2007 | MGH, Broad, U. Hong Kong | NIH | [ | 17,938 |
| TASSEL | 2007 | USDA-ARS, Cornell | USDA-ARS, NSF | [ | 2,609 |
| BEAGLE | 2007 | U. Auckland | University of Auckland Research Committee, NIH | [ | 2,997 |
| IMPUTE/IMPUTE2 | 2007 | Oxford | Wellcome Trust, NIH | [ | 4,930 |
| VCFtools | 2011 | Sanger | Medical Research Council, British Heart Foundation, Wellcome Trust, NIH | [ | 3,133 |
| CADD | 2014 | U. Washington | NIH | [ | 2,353 |
| Functional genomics, annotations, and transcriptomics | | | | | |
| Gene Ontology | 2000 | UC Berkeley, Stanford | NIH, Astra Zeneca | [ | 22,898 |
| GSEA | 2005 | Broad | | [ | 16,135 |
| MACS/MACS2 | 2008 | Dana-Farber, Harvard | NIH | [ | 5,965 |
| TopHat/Cufflinks | 2009 | U. Maryland | NIH, NSF | [ | 28,242 |
| ChromHMM | 2010 | MIT, Broad | NSF, NIH | [ | 3,977 |
| BEDtools | 2010 | U. Virginia | NIH, Burroughs-Wellcome | [ | 7,137 |
| edgeR | 2010 | Garavan Inst. Med. Res., Walter & Eliza Hall Inst. Med. Res., Australia | NHMRC | [ | 9,992 |
| Trinity | 2011 | MIT, Broad | NIH, US-Israel Binational Science Foundation | [ | 7,178 |
| DESeq/DESeq2 | 2012 | EMBL | | [ | 16,355 |
| Assembly, read mapping, and base/variant calling | | | | | |
| Staden package | 1977 | LMB | | [ | 5,029 |
| Phred | 1993 | Washington U., St. Louis, U. Washington | NIH | [ | 12,172 |
| MAQ | 2008 | Sanger | Wellcome Trust | [ | 2,777 |
| ALLPATHS/ALLPATHS-LG | 2008 | Broad, MGH | NIH | [ | 2,079 |
| Velvet | 2008 | EBI | EMBL | [ | 7,635 |
| Bowtie/Bowtie2 | 2009 | U. Maryland | NIH | [ | 26,607 |
| BWA | 2009 | Sanger | Wellcome Trust | [ | 17,546 |
| SOAP2 | 2009 | Beijing Genomics Inst., U. Southern Denmark | National Natural Science Foundation of China, Danish Natural Science Research Council | [ | 2,818 |
| SAMtools | 2009 | Sanger | Wellcome Trust, NIH | [ | 17,811 |
| ABySS | 2009 | Genome Sciences Centre, Vancouver, BC | Genome Canada, Genome British Columbia, British Columbia Cancer Foundation | [ | 2,761 |
| GATK | 2010 | Broad, MGH | NIH | [ | 9,291 |
| SOAPdenovo/SOAPdenovo2 | 2010 | Beijing Genomics Inst. | Chinese Academy of Science, National Natural Science Foundation of China | [ | 4,295 |
| STAR | 2013 | CSHL | NIH | [ | 6,013 |
a Approximate first year available, or year of first publication if unknown
b Institutions most central in supporting project, or affiliations of first and last authors of first publication if unknown. Broad Eli & Edythe Broad Institute of MIT & Harvard, USA; CNRS Centre National de la Recherche Scientifique, France; CSHL Cold Spring Harbor Laboratory, USA; EBI European Bioinformatics Institute; EMBL European Molecular Biology Laboratory; HSPH Harvard School of Public Health, USA; LMB Laboratory of Molecular Biology, UK; MGH Massachusetts General Hospital, USA; Sanger Wellcome Trust Sanger Institute, UK
c BBSRC Biotechnology & Biological Sciences Research Council, UK; HHMI Howard Hughes Medical Institute, US; NA not applicable; NCBI National Center for Biotechnology Information, US; NHMRC The National Health & Medical Research Council, Australia; NIH National Institutes of Health, US; NSF National Science Foundation, US; USDA-ARS United States Department of Agriculture - Agriculture Research Service; Packard David & Lucile Packard Foundation
d Most highly cited associated publications (at most five)
e Total number of citations, obtained from Google Scholar on Feb. 22, 2019
Grant opportunities for genomic software development
| Title | Source | Country | Last call | Funding rate |
|---|---|---|---|---|
| Bioinformatics and Computational Biology | Genome Canada | Canada | 2017 | CAD$12 M |
| Cyberinfrastructure Initiative | Canada Foundation for Innovation | Canada | 2017 | ~ CAD$10 M |
| Research Software Program | CANARIE | Canada | Open | CAD$4.5 M |
| ELIXIR Tools Platform | ELIXIR | (Europe) | Open | |
| Call for Challenges and Unlocking of Technological and Scientific Barriers | Institut Français de Bioinformatique (IFB) | France | Open | |
| Accelerating Scientific Discovery | Netherlands eScience | Netherlands | 2018 | ~€1 M |
| Bioinformatics and Biological Resources Fund | BBSRC | UK | 2017 | Up to £6 M |
| Transformative Research Technologies | BBSRC, EPSRC, MRC | UK | 2017 | Up to £3.5 M |
| Collaborative Computational Tools for the Human Cell Atlas | Chan-Zuckerberg Initiative | USA | 2017 | $15 M |
| Continued Development and Maintenance of Software | NIH | USA | 2014 | |
| Cyberinfrastructure for Sustained Scientific Innovation | NSF (spans Directorates) | USA | Open | $46.5 M |
| Data-Driven Discovery Investigator Competition | Gordon and Betty Moore Foundation | USA | 2014 | $22.5 M |
| Extended Development, Hardening & Dissemination of Technologies in Biomedical Computing, Informatics & Big Data Science | NIH | USA | 2014 | |
| Informatics Technology for Cancer Research | NCI/NIH | USA | 2018 | |
| Infrastructure Capacity for Biology | NSF Division of Biological Infrastructure (DBI) | USA | Open | $40 M |
| Innovation in Cancer Informatics | Fund for Innovation in Cancer Informatics | USA | Open | ~ $1 M |
| Investigator Initiated Research in Computational Genomics and Data Science | NHGRI/NIH | USA | Open | |
| BBSRC-NSF/BIO Lead Agency Opportunity in Bioinformatics and Synthetic Biology | NSF Directorate for Biological Sciences (NSF/BIO), BBSRC | USA/UK | 2018 | |
BBSRC Biotechnology and Biological Sciences Research Council, UK; EPSRC Engineering and Physical Sciences Research Council, UK; MRC Medical Research Council, UK; NA not applicable; NCI National Cancer Institute, US; NHGRI National Human Genome Research Institute, US; NIH National Institutes of Health, US; NSF National Science Foundation, US