
The rise of a digital immune system.

Abstract

Driven by million-fold improvements in biotechnology, biology is increasingly shifting towards high-resolution, quantitative approaches to study the molecular dynamics of entire populations. One exciting application enabled by this new era of biology is the "digital immune system". It would work in much the same way as an adaptive, biological immune system: by observing the microbial landscape, detecting potential threats, and neutralizing them before they spread beyond control. With the potential to have an enormous impact on public health, it is time to integrate the necessary biotechnology, computational, and organizational systems to seed the development of a global, sequencing-based pathogen surveillance system.


Year: 2012 PMID: 23587178 PMCID: PMC3617452 DOI: 10.1186/2047-217X-1-4

Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524

The “mega-genomics” era

If the last two decades marked the beginning of the genomics era, starting with the sequencing and publication of the genome of the first free-living organism in 1995 and then the human genome in 2001, the current decade marks the beginning of the “mega-genomics” era, where large numbers of genomes are analyzed with diverse, sequencing-based assays to infer the molecular diversity and dynamics of life. Examples include projects to determine the molecular basis of complex human diseases such as cancer [1], to study the incredible diversity and function of the human microbiome [2], to rapidly identify the origins of pathogen outbreaks [3], and to generally develop a deeper understanding of the living world through the increasing use of large-scale sequencing. These breakthroughs are driven by a shift from single-reference genomics to more quantitative, population-wide analyses. Biology has moved beyond developing a merely qualitative understanding of cellular and evolutionary processes, and now strives for base-pair resolution and predictive models of biological systems and disease. This has been enabled through the combination of dramatically improved biotechnology, computer technology, algorithms, and statistical models. Through sophisticated protocols and assays, sequencing is no longer limited to just reading DNA, but has been creatively adapted to measure transcript abundance, protein-DNA binding patterns, and the three-dimensional configuration of DNA or RNA, among others (see [4] for an overview of available applications). Sequencing throughput and costs have improved by more than a million-fold, and these advances have risen alongside similarly radical advances in computational technology and algorithm sophistication [5]. Amazingly, there seems to be no end to the exponential capability growth we have witnessed, and vendor roadmaps continue to project breakneck innovation well into the next decade.
Worldwide sequencing capacities currently exceed 15 petabases per year, and compute clouds with seemingly infinite capacity can now be rented on demand. On the sequencing side, real-time, single-molecule sequencing has been achieved by Pacific Biosciences, and Oxford Nanopore has promised to deliver a mobile, disposable sequencing device the size of a thumb-drive [6]. With equally amazing advancements happening every year, it is virtually certain that the confluence of cheap sequencing and “big data” computer science will enable many new, digital forms of biology.

A digital immune system

One exciting application of digital biology with the potential to have enormous public health impact is the “digital immune system.” The term, coined by David Lipman of NCBI, draws an analogy between computing and biology, a recurring technique of computational scientists (viruses, genetic algorithms, neural networks). A digital immune system would work in much the same way as an adaptive, biological immune system: by observing the microbial landscape, detecting potential threats, and neutralizing them before they cause widespread harm. This simple strategy, effectively tested over millions of years, can now start to be replicated in silico with the combination of distributed “sensor” sequencing and bioinformatics, where a network of mobile sequencing devices serves a real-time stream of microbial genomes to a global compute cloud for analysis. An effective immune response relies on the ability to differentiate normal from abnormal. In the digital realm, this ability will rely on extensive knowledge of microbial diversity. However, unlike the macroscopic world where outliers can often be easily recognized, microbial diversity is less well characterized, with only a small fraction of the world’s microbes ever sequenced [7]. It is difficult to characterize an emerging outbreak, for example, when only a handful of known genomes exist. Effective pathogen detection and response require a complete catalog of genomic diversity, antibiotic resistance, and virulence across both temporal and geospatial dimensions. This must be achieved by sequencing and archiving huge numbers of microbial genomes, both from clinical cases and known environmental reservoirs, on a continual basis. Just as an immunological memory improves with each exposure, genome databases will also expand and improve over time as new outbreaks and environments are analyzed, but only if this digital memory is properly managed.
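The idea of differentiating normal from abnormal against a catalog of known genomes can be illustrated with a toy example. The sketch below is not a method proposed in this article: it assumes a simple k-mer Jaccard similarity as a stand-in for real genome comparison, and the function names, the value of k, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 4) -> set[str]:
    """Return the set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_match(query: str, catalog: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Return (name, similarity) of the catalog genome most similar to the query."""
    q = kmers(query, k)
    scores = {name: jaccard(q, kmers(seq, k)) for name, seq in catalog.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def is_novel(query: str, catalog: dict[str, str],
             threshold: float = 0.5, k: int = 4) -> bool:
    """Flag the query as 'abnormal' if no catalog genome is similar enough.

    The threshold is an arbitrary illustrative value, not a calibrated one.
    """
    if not catalog:
        return True  # with an empty catalog, everything looks novel
    _, score = closest_match(query, catalog, k)
    return score < threshold
```

The empty-catalog branch mirrors the point made above: with only a handful of known genomes, almost every sample looks novel, so the usefulness of any such test grows with the size of the catalog.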
Standardized sequences and metadata must be made freely available in real-time and on a global scale, requiring a daunting level of cooperation. The primary nucleotide archives NCBI, EMBL and DDBJ are obvious candidates for this task, but these archives must rapidly adapt to the new era of population sequencing. The current database models are outdated; the number of genomes being submitted lags far behind the genomes being sequenced and those submitted often lack essential metadata. Barriers must be eliminated and new incentives structured to encourage the submission of usable, large-scale data: “more data, faster” should be the guiding principle and the minimum metadata of “what, where, when” (sequence, location, time) must be reliably captured. An explosion of openly available microbial genomes, linked with temporal and geospatial metadata, would undoubtedly lead to new discoveries in epidemiology and ultimately lead to more predictive biology. Open data sharing has already reduced outbreak attribution to a matter of weeks, as evidenced by the “crowd-sourced” responses to the swine flu [8] and Escherichia coli O104:H4 [9] outbreaks, while the NIAID Influenza Genome Sequencing Project, spearheaded by The Institute for Genomic Research (now JCVI) [10], has hinted at the predictive potential. Influenza research has exploded with the ongoing generation and release of these genomes, spawning many follow-on studies and predictive models that have shown, among other things, that seasonal influenza severity can be predicted by the genetic diversity of the circulating strains [11]. The potential power of expanding such surveillance efforts is extremely compelling and would drastically shape the future of infectious disease—potentially stopping the next outbreak before it happens. The technology necessary to implement these proposals is imminent, although a few significant obstacles remain. 
Computational hurdles, while daunting, are perhaps the most manageable, with computer science, high-energy physics, and astronomy already leading the big-data charge. Rather, a more pressing need is for inexpensive and portable sequencing devices that can act as the sensors in a distributed, real-time sequencing network, just as atmospheric sensors feed real-time data to sophisticated weather modeling programs. These sequencing sensors would also require significant advances in sample preparation to allow easy nucleic acid extraction directly from any sample, so that they could be widely deployed and operated by health care providers rather than specialists. While the necessary technologies continue to mature, there is no need to wait. Recent studies have shown that implementing this vision on a limited scale can yield tremendous insight (e.g. [12]). In addition, many universities and hospitals already perform routine pathogen sequencing and environmental screens. What is needed now is an organized effort towards making these genomes as widely available as possible to enable a digital immune system with the potential to drastically advance human health.
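The minimum metadata of “what, where, when” (sequence, location, time) described above could be captured in a record like the one sketched below. This is an illustration under stated assumptions, not a format defined by this article or by any archive: the class name `GenomeSubmission`, its field names, and the validation rules are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class GenomeSubmission:
    """Hypothetical minimal record: what (sequence), where, when."""
    sequence: str           # "what": nucleotide sequence
    latitude: float         # "where": decimal degrees
    longitude: float
    collected_at: datetime  # "when": timezone-aware collection time


def validate(sub: GenomeSubmission) -> list[str]:
    """Return a list of problems; an empty list means the record is usable."""
    problems = []
    # Assumed alphabet: A, C, G, T plus N for unknown bases.
    if not sub.sequence or set(sub.sequence.upper()) - set("ACGTN"):
        problems.append("sequence must be non-empty and use only A, C, G, T, N")
    if not -90 <= sub.latitude <= 90:
        problems.append("latitude must be between -90 and 90")
    if not -180 <= sub.longitude <= 180:
        problems.append("longitude must be between -180 and 180")
    if sub.collected_at.tzinfo is None:
        problems.append("collection time must be timezone-aware")
    elif sub.collected_at > datetime.now(timezone.utc):
        problems.append("collection time must not be in the future")
    return problems
```

Checking these three fields at submission time is one way to make sure that the “what, where, when” is reliably captured before a genome enters a shared archive.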

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Both authors contributed equally to the drafting of the manuscript. Both authors read and approved the final manuscript.

12 in total

Review 1. Next-generation genomics: an integrative approach.

Authors: R David Hawkins; Gary C Hon; Bing Ren
Journal: Nat Rev Genet Date: 2010-07 Impact factor: 53.242

2. Faster sequencers, larger datasets, new challenges.

Authors: Christopher E Mason; Olivier Elemento
Journal: Genome Biol Date: 2012 Impact factor: 13.583

3. The human microbiome project.

Authors: Peter J Turnbaugh; Ruth E Ley; Micah Hamady; Claire M Fraser-Liggett; Rob Knight; Jeffrey I Gordon
Journal: Nature Date: 2007-10-18 Impact factor: 49.962

4. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution.

Authors: Elodie Ghedin; Naomi A Sengamalay; Martin Shumway; Jennifer Zaborsky; Tamara Feldblyum; Vik Subbu; David J Spiro; Jeff Sitz; Hean Koo; Pavel Bolotov; Dmitry Dernovoy; Tatiana Tatusova; Yiming Bao; Kirsten St George; Jill Taylor; David J Lipman; Claire M Fraser; Jeffery K Taubenberger; Steven L Salzberg
Journal: Nature Date: 2005-10-05 Impact factor: 49.962

5. Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany.

Authors: David A Rasko; Dale R Webster; Jason W Sahl; Ali Bashir; Nadia Boisen; Flemming Scheutz; Ellen E Paxinos; Robert Sebra; Chen-Shan Chin; Dimitris Iliopoulos; Aaron Klammer; Paul Peluso; Lawrence Lee; Andrey O Kislyuk; James Bullard; Andrew Kasarskis; Susanna Wang; John Eid; David Rank; Julia C Redman; Susan R Steyert; Jakob Frimodt-Møller; Carsten Struve; Andreas M Petersen; Karen A Krogfelt; James P Nataro; Eric E Schadt; Matthew K Waldor
Journal: N Engl J Med Date: 2011-07-27 Impact factor: 91.245

Review 6. Advances in understanding cancer genomes through second-generation sequencing.

Authors: Matthew Meyerson; Stacey Gabriel; Gad Getz
Journal: Nat Rev Genet Date: 2010-10 Impact factor: 53.242

7. Cloud computing and the DNA data race.

Authors: Michael C Schatz; Ben Langmead; Steven L Salzberg
Journal: Nat Biotechnol Date: 2010-07 Impact factor: 54.908

8. Projection of seasonal influenza severity from sequence and serological data.

Authors: Yuri I Wolf; Anastasia Nikolskaya; Joshua L Cherry; Cecile Viboud; Eugene Koonin; David J Lipman
Journal: PLoS Curr Date: 2010-12-03

9. Evolution of MRSA during hospital transmission and intercontinental spread.

Authors: Simon R Harris; Edward J Feil; Matthew T G Holden; Michael A Quail; Emma K Nickerson; Narisara Chantratita; Susana Gardete; Ana Tavares; Nick Day; Jodi A Lindsay; Jonathan D Edgeworth; Hermínia de Lencastre; Julian Parkhill; Sharon J Peacock; Stephen D Bentley
Journal: Science Date: 2010-01-22 Impact factor: 47.728

10. Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans.

Authors: Rebecca J Garten; C Todd Davis; Colin A Russell; Bo Shu; Stephen Lindstrom; Amanda Balish; Wendy M Sessions; Xiyan Xu; Eugene Skepner; Varough Deyde; Margaret Okomo-Adhiambo; Larisa Gubareva; John Barnes; Catherine B Smith; Shannon L Emery; Michael J Hillman; Pierre Rivailler; James Smagala; Miranda de Graaf; David F Burke; Ron A M Fouchier; Claudia Pappas; Celia M Alpuche-Aranda; Hugo López-Gatell; Hiram Olivera; Irma López; Christopher A Myers; Dennis Faix; Patrick J Blair; Cindy Yu; Kimberly M Keene; P David Dotson; David Boxrud; Anthony R Sambol; Syed H Abid; Kirsten St George; Tammy Bannerman; Amanda L Moore; David J Stringer; Patricia Blevins; Gail J Demmler-Harrison; Michele Ginsberg; Paula Kriner; Steve Waterman; Sandra Smole; Hugo F Guevara; Edward A Belongia; Patricia A Clark; Sara T Beatrice; Ruben Donis; Jacqueline Katz; Lyn Finelli; Carolyn B Bridges; Michael Shaw; Daniel B Jernigan; Timothy M Uyeki; Derek J Smith; Alexander I Klimov; Nancy J Cox
Journal: Science Date: 2009-05-22 Impact factor: 47.728
