Harris A Lewin1,2, Stephen Richards3, Erez Lieberman Aiden4, Miguel L Allende5,6, John M Archibald7, Miklós Bálint8,9, Katharine B Barker10, Bridget Baumgartner11, Katherine Belov12, Giorgio Bertorelle13, Mark L Blaxter14, Jing Cai15, Nicolette D Caperello3, Keith Carlson16, Juan Carlos Castilla-Rubio17, Shu-Miaw Chaw18, Lei Chen15, Anna K Childers19, Jonathan A Coddington20, Dalia A Conde21,22, Montserrat Corominas23,24, Keith A Crandall25,26, Andrew J Crawford27, Federica DiPalma28, Richard Durbin29,30, ThankGod E Ebenezer31, Scott V Edwards32,33, Olivier Fedrigo34, Paul Flicek30,35, Giulio Formenti36, Richard A Gibbs37, M Thomas P Gilbert38,39, Melissa M Goldstein40, Jennifer Marshall Graves41,42, Henry T Greely43, Igor V Grigoriev44,45, Kevin J Hackett46, Neil Hall47, David Haussler48,49, Kristofer M Helgen50, Carolyn J Hogg12, Sachiko Isobe51, Kjetill Sigurd Jakobsen52, Axel Janke8, Erich D Jarvis34,49, Warren E Johnson53,54, Steven J M Jones55, Elinor K Karlsson56,57, Paul J Kersey58, Jin-Hyoung Kim59, W John Kress60, Shigehiro Kuraku61,62, Mara K N Lawniczak14, James H Leebens-Mack63, Xueyan Li64, Kerstin Lindblad-Toh57,65, Xin Liu66, Jose V Lopez67,68, Tomas Marques-Bonet69,70,71,72, Sophie Mazard73, Jonna A K Mazet74, Camila J Mazzoni75,76, Eugene W Myers77, Rachel J O'Neill78,79, Sadye Paez34, Hyun Park80, Gene E Robinson81, Cristina Roquet82,83, Oliver A Ryder84,85, Jamal S M Sabir86,87, H Bradley Shaffer88,89, Timothy M Shank90, Jacob S Sherkow81,91, Pamela S Soltis92,93, Boping Tang94, Leho Tedersoo95,96, Marcela Uliano-Silva14, Kun Wang15, Xiaofeng Wei66, Regina Wetzer97,98, Julia L Wilson30, Xun Xu66, Huanming Yang66, Anne D Yoder99,100, Guojie Zhang64,66,101,102. 1. Department of Evolution and Ecology, College of Biological Sciences, University of California, Davis, CA 95616; lewin@ucdavis.edu. 2. Department of Population Health and Reproduction, University of California, Davis, CA 95616. 3. 
University of California Davis Genome Center, University of California, Davis, CA 95616. 4. DNA Zoo and The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030. 5. Center for Genome Regulation, Universidad de Chile 3425 Santiago, Chile. 6. Facultad de Ciencias, Universidad de Chile 3425 Santiago, Chile. 7. Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, NS B3H 4H7, Canada. 8. LOEWE Centre of Translational Biodiversity Genomics, Senckenberg Leibniz Institution for Biodiversity and Earth System Research 60325 Frankfurt am Main, Germany. 9. Institute for Insect Biotechnology, Justus-Liebig University 35392 Giessen, Germany. 10. Global Genome Biodiversity Network Secretariat, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560. 11. Revive & Restore, Sausalito, CA 94965. 12. School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia. 13. Department of Life Sciences and Biotechnology, University of Ferrara 44121 Ferrara, Italy. 14. Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, United Kingdom. 15. School of Ecology and Environment, Northwestern Polytechnical University 710072 Xi'an, China. 16. The Novim Group, University of California, Santa Barbara, CA 93106. 17. Spacetime Ventures 05449-050 São Paulo, Brazil. 18. Biodiversity Research Center, Academia Sinica 11529 Taipei, Taiwan. 19. Bee Research Laboratory, Beltsville Agricultural Research Center, US Department of Agriculture, Agriculture Research Service, Beltsville, MD 20705. 20. Global Genome Initiative, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560. 21. Conservation Science, Species360 Conservation Science Alliance, Bloomington, MN 55425. 22. Department of Biology, University of Southern Denmark 5230 Odense M, Denmark. 23. Department of Genetics, Microbiology, and Statistics, Universitat de Barcelona 08028 Barcelona, Spain. 24. 
Catalan Society for Biology, Institute for Catalan Studies 08001 Barcelona, Spain. 25. Department of Biostatistics & Bioinformatics, Computational Biology Institute, George Washington University, Washington, DC 20052. 26. Department of Biostatistics & Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC 20052. 27. Department of Biological Sciences, Universidad de los Andes 111711 Bogotá, Colombia. 28. Genome British Columbia, Vancouver, BC V5Z 0C4, Canada. 29. Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom. 30. Wellcome Sanger Institute, Cambridge CB10 1SA, United Kingdom. 31. UniProt, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, United Kingdom. 32. Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138. 33. Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138. 34. Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY 10065. 35. European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge CB10 1SD, United Kingdom. 36. Vertebrate Genome Laboratory, The Rockefeller University, New York, NY 10065. 37. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030. 38. GLOBE Institute, University of Copenhagen 1350 Copenhagen, Denmark. 39. University Museum, Norwegian University of Science and Technology 7491 Trondheim, Norway. 40. Department of Health Policy and Management, George Washington University, Washington, DC 20052. 41. School of Life Sciences, La Trobe University, Bundoora, VIC 3086, Australia. 42. Institute for Applied Ecology, University of Canberra, Bruce, ACT 2617, Australia. 43. Stanford Law School, Stanford University, Stanford, CA 94305. 44. US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720. 45. 
Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720. 46. Office of National Programs, US Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705. 47. Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, United Kingdom. 48. Genome Institute, University of California, Santa Cruz, CA 95060. 49. HHMI, Chevy Chase, MD 20815. 50. Australian Museum Research Institute, Australian Museum, Sydney, NSW 2000, Australia. 51. Department of Frontier Research and Development, Kazusa DNA Research Institute, Chiba 292-0818, Japan. 52. Department of Biosciences, University of Oslo, Oslo 0316, Norway. 53. Walter Reed Biosystematics Unit, Smithsonian Institution, Suitland, MD 20746. 54. Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, VA 22630. 55. Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC V5Z 4S6, Canada. 56. Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605. 57. Broad Institute of MIT and Harvard, Cambridge, MA 02142. 58. Royal Botanic Gardens, Kew, Richmond TW9 3AE, United Kingdom. 59. Division of Life Sciences, Korea Polar Research Institute 21990 Incheon, South Korea. 60. Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012. 61. Department of Genomics and Evolutionary Biology, National Institute of Genetics 411-8540 Shizuoka, Japan. 62. Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research 650-0047 Hyogo, Japan. 63. Department of Plant Biology, University of Georgia, Athens, GA 30602. 64. State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences 650223 Yunnan, China. 65. Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University 752 36 Uppsala, Sweden. 66. 
BGI-Research, Beijing Genomics Institute-Shenzhen 518083 Shenzhen, China. 67. Department of Biological Sciences, Halmos College of Arts and Sciences, Nova Southeastern University, Dania Beach, FL 33004. 68. Guy Harvey Oceanographic Center, Dania Beach, FL 33004. 69. Institute of Evolutionary Biology, Pompeu Fabra University, Consejo Superior de Investigaciones Cientificas, Parc de Recerca Biomedica de Barcelona 08003 Barcelona, Spain. 70. Catalan Institute of Research and Advanced Studies 08010 Barcelona, Spain. 71. Centre Nacional d'Anàlisi Genòmica, Centre for Genomic Regulation, Barcelona Institute of Science and Technology 08028 Barcelona, Spain. 72. Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona 08193 Barcelona, Spain. 73. Bioplatforms Australia, Macquarie University, Sydney, NSW 2109, Australia. 74. One Health Institute, University of California Davis, CA 95616. 75. Berlin Center for Genomics in Biodiversity Research 14195 Berlin, Germany. 76. Evolutionary Genetics Department, Leibniz Institute for Zoo and Wildlife Research 10315 Berlin, Germany. 77. Max Planck Institute for Molecular Cell Biology and Genetics 01307 Dresden, Germany. 78. Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269. 79. Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269. 80. Division of Biotechnology, Korea University 02841 Seoul, Korea. 81. Department of Entomology, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801. 82. Systematics and Evolution of Vascular Plants Associated Unit to Consejo Superior de Investigaciones Cientificas, Departament de Biologia Animal, Biologia Vegetal i Ecologia, Universitat Autònoma de Barcelona 08193 Bellaterra, Spain. 83. Laboratoire d'Ecologie Alpine, University Grenoble Alpes, University Savoie Mont Blanc, CNRS 38000 Grenoble, France. 84. 
Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027. 85. Division of Biology, Department of Evolution, Behavior, and Ecology, University of California, San Diego, La Jolla, CA 92039. 86. Department of Biological Sciences, Faculty of Science, King Abdulaziz University 21589 Jeddah, Saudi Arabia. 87. Centre of Excellence in Bionanoscience Research, King Abdulaziz University 21589 Jeddah, Saudi Arabia. 88. La Kretz Center for California Conservation Science, Institute of Environment and Sustainability, University of California, Los Angeles, CA 90024. 89. Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095. 90. Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543. 91. College of Law, University of Illinois at Urbana-Champaign, Champaign, IL 61820. 92. Florida Museum of Natural History, University of Florida, Gainesville, FL 32611. 93. Biodiversity Institute, University of Florida, Gainesville, FL 32611. 94. Jiangsu Key Laboratory for Bioresources of Saline Soils, Jiangsu Provincial Key Laboratory of Coastal Wetland Bioresources and Environmental Protection, Jiangsu Synthetic Innovation Center for Coastal Bio-agriculture, School of Wetlands, Yancheng Teachers University 224002 Yancheng, China. 95. Center of Mycology and Microbiology, University of Tartu 50411 Tartu, Estonia. 96. College of Science, King Saud University 11451 Riyadh, Saudi Arabia. 97. Research and Collections, Natural History Museum of Los Angeles County, Los Angeles, CA 90007. 98. Biological Sciences, University of Southern California, Los Angeles, CA 90089. 99. Department of Biology, Duke University, Durham, NC 27708. 100. Duke Center for Genomic and Computational Biology, Duke University, Durham, NC 27708. 101. Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen 2100 Copenhagen, Denmark. 102. 
China National Genebank, Beijing Genomics Institute 51803 Shenzhen, China.
November 2020 marked 2 y since the launch of the Earth BioGenome Project (EBP), which aims to sequence all known eukaryotic species in a 10-y timeframe. Since the launch, significant progress has been made across all aspects of the EBP roadmap, as outlined in the 2018 article describing the project's goals, strategies, and challenges (1). The launch phase has ended and the clock has started on reaching the EBP's major milestones. This Special Feature explores the many facets of the EBP, including a review of progress, a description of major scientific goals, exemplar projects, the ethical, legal, and social issues raised by the project, and applications of biodiversity genomics. In this Introduction, we summarize the current status of the EBP as presented at the Biodiversity Genomics 2020 conference, held virtually October 5 to 9, 2020, including recent updates through February 2021. References to the nine Perspective articles included in this Special Feature are cited to guide the reader toward a deeper understanding of the goals and challenges facing the EBP.

It is urgent that the EBP move forward. The year 2020 marked a global failure to meet any of the 20 "Aichi goals" for the preservation of wildlife and ecosystems (2). The International Union for Conservation of Nature now counts more than 35,000 (28%) of all surveyed species of plants and animals as threatened with extinction (3). The Earth may lose 50% of its biodiversity by the end of this century if nothing is done to mitigate the anthropogenic factors that drive species to extinction and destroy the health of the global ecosystems that sustain human existence (2). Degradation of aquatic and terrestrial ecosystems has continued unabated, and we may soon face massive ecosystem collapse on a global scale.

Such a collapse would have an enormous impact not only on biodiversity, but also on global political stability, and might ultimately affect the survival of our own species.
Biological diversity underpins ecosystem services: that is, the services provided by nature that generate food and clean air and water, regulate critical environmental processes and biogeochemical cycles, and form the basis for deep cultural and esthetic ties between humans and the natural world. Biodiversity is also foundational for the rapidly growing global bioeconomy, which exceeds $500 billion per year in the United States and European Union alone (4, 5), and it is essential for sustainable food security (6). If biodiversity disappears, so too will the potential for a new, inclusive bioeconomy made possible by combining genomics, computational biology, and synthetic biology, which the World Economic Forum has identified as key to the Fourth Industrial Revolution (7) and which is estimated to be worth up to US $3 to 5 trillion per annum (8).

The year 2020 will also be remembered as the beginning of the COVID-19 pandemic. The virus that causes COVID-19, SARS-CoV-2, evolved from a bat betacoronavirus (9), possibly finding its way into the human population through an intermediate host that has yet to be identified (10). Spillover of SARS-CoV-2 infection to wildlife, pets, and captive-bred animals demonstrates the interconnectedness of life on Earth, reinforcing the One Health concept that all organisms are interdependent: the health of one impacts the health of all (11). A One Health approach to addressing the biodiversity crisis critically relies on supporting infrastructures, such as the genomic infrastructure that can be provided by the EBP and affiliated projects. The economic disaster and devastating human death toll caused by the pandemic illustrate just how critical it is to have knowledge of potential human pathogens and their hosts before such events arise (12). DNA sequence information on the virus and its potential hosts has helped the world to manage COVID-19 and, we hope, will soon help to contain it.
Similarly, creating a library of DNA sequences for all known eukaryotic life can contribute critical data necessary to generate effective tools for preventing biodiversity loss and pathogen spread, monitoring and protecting ecosystems, and enhancing ecosystem services [see The Darwin Tree of Life Project Consortium, this issue (13)]. The EBP’s proactive stance on understanding the ethical, legal, and social issues surrounding the project will also inform recommendations on access and commercial benefit sharing, equity, and inclusion in the biodiversity genomics community and in indigenous communities within the world’s most biodiverse countries [see McCartney et al., this issue (14)].
Organization and Governance
The critical roles of the EBP organization are to develop and promote standards for the scalable production of reference-quality genomes; disseminate best practices; coordinate sequencing, annotation, data analysis, and training activities; ensure public accessibility of data; and communicate the project's progress. To accomplish these goals, the EBP was established as an international network-of-networks: organizations that specialize in sample acquisition and vouchering; technology centers for sequencing, assembly, and annotation; and affiliated projects with deep expertise in specific taxonomic groups, biomes, and ecosystems (Box 1). In addition, the EBP develops ethical standards for project participation, data sharing, and access and benefit sharing of intellectual property derived from whole-genome sequencing [see Sherkow et al., this issue (15)], and promotes programs for diversity, equity, inclusion, and justice among the project's participants. The EBP Member Institutions and Affiliated Projects are committed to open data access and to compliance with the principles of Access and Benefit Sharing under the Convention on Biological Diversity and the Nagoya Protocol (16).
The EBP communicates progress and information about the project through its website (https://www.earthbiogenome.org), its Twitter handle (@EBPgenome), and other social media accounts, which currently have more than 2,000 followers.

Box 1. The EBP international network-of-networks functions to support the three proposed phases of the EBP
Phase I: An annotated reference genome for one representative of each taxonomic family of eukaryotes (∼9,400 species) in 3 y.
Phase II: Reference genomes for one representative of each genus (∼180,000 species) in years 4 to 7.
Phase III: Reference genomes for the remaining ∼1.65 million known eukaryotic species in the final 3 y of the project.

The EBP Secretariat is located at the University of California, Davis, and operates under a Memorandum of Understanding between participating institutions, available at the EBP website, https://www.earthbiogenome.org. The representatives of member institutions have adopted an interim governance structure (). This interim governance committee, the Earth BioGenome Project Working Group, consists, as of February 2021, of one representative from each of the 43 Memorandum of Understanding-signing institutions (see list on the EBP website, https://www.earthbiogenome.org) and 44 affiliated projects (Dataset S1; brief summaries of 21 affiliated projects can be found in ), with membership up 121% and 153%, respectively, since 2018. The Chair of the EBP Working Group coordinates the activities of all the working committees and conducts extensive international outreach to promote collaboration between member institutions and affiliated projects, implement standards, assist the formation of national and regional projects, and coordinate activities across the EBP network-of-networks.
The International Science Committee consists of a chairperson and five subcommittees that are responsible for standards development in the following areas: sample collection and processing, sequencing and assembly, annotation, information technology and informatics, and data analyses. Committee reports are available on the EBP website (https://www.earthbiogenome.org) and summarized in this issue. The EBP plans to formally adopt a permanent governance structure in 2021. Institutions and projects that are interested in joining the EBP should contact the Secretariat through the EBP website for further information.

The EBP's Committee on Ethical, Legal, and Social Issues (ELSI), established in 2020, makes recommendations to the EBP Working Group on legal obligations relating to the Nagoya Protocol on Access and Benefit Sharing; ethical considerations relating to collection of samples, societal concerns, and biosecurity; and collaboration standards (e.g., sample information, digital sequence information, intellectual property, authorship and publication guidelines). The committee's outline of the ELSI issues facing the EBP can be found in this issue (15). A Committee on Diversity, Equity, Inclusion, and Justice (DEIJ) was approved recently by the EBP Working Group. DEIJ recommendations will be based on participatory approaches with fair treatment and meaningful involvement of all people to define processes and practices for creating a welcoming, inclusive, and supportive biodiversity genomics community.
Global Status of Biodiversity Sequencing
Our current ability to investigate the diversity and evolution of Earth’s biota is severely constrained by the absence of high-quality genome sequences for most of the species on the eukaryotic tree of life. There are now ∼1.84 million taxonomically classified eukaryotic species, but the estimated number of eukaryotic species is 12 to 15 million, including 8.1 million plants and animals (17). The EBP aims to sequence all classified species and to facilitate the discovery and classification of new species. As of March 4, 2021, the International Nucleotide Sequence Database Collaboration (INSDC) contained whole-genome DNA sequence information on 6,480 unique species, representing 81.4% of eukaryotic phyla, 64.7% of classes, 40.1% of orders, 15.5% of families, 2.3% of genera, and just 0.43% of all species (Fig. 1).
Fig. 1.
Global progress in whole-genome sequencing across all eukaryotic taxonomic levels. Data source: National Center for Biotechnology Information, March 4, 2021 (18).
However, the assembly quality of these 6,480 species' genomes varies greatly (). A majority (63.1%) of the assemblies fall into the short-read draft category, with contig N50 < 100 kb and scaffold N50 < 10 Mb. Relatively few draft-quality assemblies have achieved greater contiguity using scaffolding methods such as Hi-C, linked reads, and optical maps (19). The number of unique eukaryotic species with whole-genome assemblies has more than doubled since 2018 (Fig. 2), most of them of short-read draft quality. The number of reference-quality, chromosome-scale assemblies of unique species representing taxonomic families has nearly tripled since 2018, from 210 to 583. EBP-affiliated projects produced about half of these new reference-quality assemblies (see below), demonstrating the efficacy of shared goals and standards.
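The assembly categories above are defined by contig and scaffold N50. As a minimal sketch (not the pipeline used to produce Fig. 1 or Fig. 2), the Python below computes N50 from a list of sequence lengths and applies only the thresholds quoted in this article: the short-read draft category (contig N50 < 100 kb and scaffold N50 < 10 Mb) and the Phase II/III minimum given later (contig N50 > 100 kb, scaffold N50 > 1 Mb, QV30). The function names and category labels are hypothetical, and QV is assumed to be the usual Phred-scaled consensus quality, −10 log10 of the per-base error rate.

```python
"""Sketch: contig/scaffold N50 and the assembly thresholds quoted in the text."""
from math import log10


def n50(lengths: list[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # not reached for non-negative lengths: the full sum always covers half
    raise AssertionError("unreachable")


def qv_from_error_rate(error_rate: float) -> float:
    """Phred-scaled consensus quality: QV = -10 * log10(per-base error rate)."""
    return -10 * log10(error_rate)


def classify(contig_n50: int, scaffold_n50: int, qv: float) -> str:
    """Hypothetical labels built only from thresholds quoted in this article.

    The 'chromosome scale for smaller genomes' alternative is ignored here.
    """
    if contig_n50 < 100_000 and scaffold_n50 < 10_000_000:
        return "short-read draft"
    if contig_n50 > 100_000 and scaffold_n50 > 1_000_000 and qv >= 30:
        return "meets Phase II/III minimum"
    return "unclassified"
```

For example, `n50([5, 4, 3, 2, 1])` returns 4: the two longest sequences (5 + 4 = 9) already cover at least half of the 15 total. In practice the lengths would come from the contig or scaffold FASTA of an assembly.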
Fig. 2.
Year-over-year progress in whole genome sequencing for all eukaryotic taxa (Upper) and family-level (Lower) eukaryotic taxa, 2010 to March 4, 2021. The metrics for draft and reference quality assemblies are given in the text.
Progress of the EBP toward Phase I Goals
The past 2 y represent the start-up phase of the EBP. The major activities of the international EBP network-of-networks include the development of standards; the evaluation of strategies for producing reference genomes; the organization of regional, national, and transnational projects; and community building through regular working committee meetings and an annual conference. The "Biodiversity Genomics 2020" conference was held virtually and had 3,000 registrants from 89 countries. The full recording of the meeting is available (20). The EBP is also developing new initiatives in training, broadening diversity and inclusion in project leadership, and building support for project funding from government agencies and private foundations around the world.

The current line-up of 43 EBP-affiliated projects covers most of the major groups of eukaryotic taxa and provides access to tens of thousands of high-quality samples held in museum collections and by field biologists. The institutional members and affiliated projects span 21 countries on every continent except Antarctica. The first African nodes came online in 2021 as part of the Africa BioGenome Project. The EBP also aims to expand member institutions and affiliated projects across additional biodiverse regions of the world, including the Indian subcontinent, Southeast Asia, and South America [for example, see Huddart et al. (21), this issue]. Because endemism is high in these regions, the ultimate success of the EBP requires building scientific capacity in developing nations and respecting national laws for access and benefit sharing.

EBP-affiliated projects, such as the Darwin Tree of Life Project [see The Darwin Tree of Life Project Consortium, this issue (22)], the Vertebrate Genomes Project, the 1000 Fungal Genomes Project, B10K (sequencing 10,000 bird species), and others have led the way in producing publicly accessible high-quality genomes (Table 1 and ).
A Perspective on sequencing plant genomes is included in this Special Feature (23). EBP-affiliated sequencing centers around the world are now coming online to produce reference genomes with a simplified pipeline that combines long reads with Hi-C (or equivalent) and other scaffolding methods, such as optical mapping, and that uses public-domain assembly tools, such as the recently developed hifiasm for generating long-read–based contigs (24) and SALSA for generating Hi-C scaffolds (25). This simplified approach, within the reach of most EBP-affiliated laboratories, yields chromosome-scale assemblies that meet the EBP standard (see above).
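hifiasm reports its contigs in GFA format, whereas SALSA expects the contigs to be scaffolded as a FASTA file alongside the Hi-C read alignments. The sketch below shows one small glue step in such a pipeline under those assumptions: it pulls the sequence out of each GFA1 segment (`S`) line and emits it as FASTA. It is illustrative only, not the configuration used by any EBP sequencing center; the function name and the 60-column line width are arbitrary choices.

```python
"""Sketch: extract contig sequences from GFA segment (S) lines into FASTA."""
from typing import Iterable, Iterator


def gfa_segments_to_fasta(gfa_lines: Iterable[str], width: int = 60) -> Iterator[str]:
    """Yield FASTA lines for every GFA1 segment record.

    A GFA1 segment line is tab-separated: 'S', segment name, sequence,
    then optional tags. Other record types (H, L, ...) are skipped.
    """
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S":
            continue  # header, link, and other non-segment records
        if len(fields) < 3:
            raise ValueError(f"malformed segment line: {line!r}")
        name, seq = fields[1], fields[2]
        if seq == "*":
            # '*' means the sequence is not stored inline in this file
            raise ValueError(f"segment {name!r} has no inline sequence")
        yield f">{name}"
        # wrap the sequence at a fixed line width
        for start in range(0, len(seq), width):
            yield seq[start:start + width]
```

A typical call might stream a hifiasm primary-contig file, such as a hypothetical `asm.bp.p_ctg.gfa`, through `gfa_segments_to_fasta` and write each yielded line to a `.fasta` file; the generator keeps memory use bounded by the longest contig rather than the whole assembly.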
Table 1.
Progress of EBP affiliated projects in whole-genome sequencing and the production of reference genomes
| Project name | No. of species | No. of references | No. of families with reference | No. of references, 2021 (projected) | No. of drafts, 2021 (projected) |
|---|---|---|---|---|---|
| 1000 Fungal Genomes | 663 | 20 | 10 | 10 | 100 |
| B10K (birds) | 400 | 32 | 29 | 0 | 400 |
| Zoonomia (mammals) | 130 | 0 | 0 | 0 | 0 |
| VGP (vertebrates) | 128 | 129 | 119 | 200 | 0 |
| i5K (arthropods) | 86 | 2 | 2 | 2 | 8 |
| Darwin Tree of Life | 71 | 71 | 14 | 1,500 | 0 |
| Tree of Life, Sanger | 50 | 50 | 50 | 300 | 0 |
| LOEWE Center | 43 | 43 | 43 | 84 | 95 |
| Ungulates Genome Project | 41 | 0 | 0 | 4 | 6 |
| CanSeq150 | 36 | 21 | 0 | 0 | 0 |
| 10KP (plants) | 21 | 2 | 2 | 16 | 42 |
| All other | 80 | 42 | 30 | 905 | 1,505 |
| Total | 1,719 | 412 | 316 | 3,021 | 2,156 |
All tabulated genomes in the first three columns have been submitted to the INSDC or other public domain databases. Numbers in the last two columns are projected additional species genomes for 2021. A complete table with INSDC project identifiers can be found in Dataset S1. Totals include some species that overlap between projects.
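As a quick consistency check on Table 1, the sketch below recomputes four column totals from the per-project rows (values transcribed from the table; the dictionary layout is just a convenience). The genome-count columns sum exactly to the reported totals, whereas the species column sums to more than the reported 1,719, which is consistent with the note that some species are counted by more than one project.

```python
"""Sketch: recompute Table 1 column totals from the per-project rows."""

# (species sequenced, reference genomes, projected references 2021,
#  projected drafts 2021), transcribed from Table 1
TABLE_1 = {
    "1000 Fungal Genomes": (663, 20, 10, 100),
    "B10K (birds)": (400, 32, 0, 400),
    "Zoonomia (mammals)": (130, 0, 0, 0),
    "VGP (vertebrates)": (128, 129, 200, 0),
    "i5K (arthropods)": (86, 2, 2, 8),
    "Darwin Tree of Life": (71, 71, 1500, 0),
    "Tree of Life, Sanger": (50, 50, 300, 0),
    "LOEWE Center": (43, 43, 84, 95),
    "Ungulates Genome Project": (41, 0, 4, 6),
    "CanSeq150": (36, 21, 0, 0),
    "10KP (plants)": (21, 2, 16, 42),
    "All other": (80, 42, 905, 1505),
}
REPORTED_TOTALS = (1719, 412, 3021, 2156)

# zip(*rows) transposes the rows into columns
column_sums = tuple(sum(col) for col in zip(*TABLE_1.values()))

for label, summed, reported in zip(
    ("species", "references", "references 2021", "drafts 2021"),
    column_sums,
    REPORTED_TOTALS,
):
    # a row sum above the reported total is consistent with species
    # counted by more than one project (the overlap noted under Table 1)
    print(f"{label}: sum of rows = {summed}, reported = {reported}, "
          f"difference = {summed - reported}")
```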
The EBP-affiliated projects have sequenced the genomes of 1,719 eukaryotic species, all of which have assemblies deposited in public domain databases (Table 1 and Dataset S1). Of these, 316 are reference-quality genomes, constituting ∼50% of all the genomes in the INSDC that meet the EBP reference standard. Furthermore, these already represent more than 200 taxonomically distinct, nonredundant families. Thus, in the start-up phase, EBP-affiliated projects have sequenced ∼2% of extant eukaryotic families to reference-level quality. A further 3,021 family-level reference genomes are expected to be completed in 2021. Thus, by the end of 2021, the first full year of the project, we project that ∼3,200 taxonomic families will have at least one reference genome, corresponding to 34% completion of the EBP Phase I goal.

Other large-scale initiatives with complementary goals have joined the EBP as affiliated projects. These include BIOSCAN (26) and the Global Virome Project (27). BIOSCAN aims to DNA barcode every eukaryotic species on Earth, which will be critical to the EBP sample vouchering process and to accessing rare samples for sequencing. Partnership with the Global Virome Project creates an exciting avenue to identify potentially pathogenic viruses linked with their host species and to codevelop biosurveillance strategies (12). Integrated high-level coordination between these projects will have synergistic effects on biodiversity science and societal outcomes.
A broad perspective on the scientific challenges and opportunities enabled by large-scale comparative genomics is provided by Stephan et al., this issue (28).
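The Phase I projection above follows from simple arithmetic. The sketch below reproduces it under two stated assumptions: 200 is used as a floor for the "more than 200" families already represented, and the 3,021 references expected in 2021 are assumed to fall in families not yet covered. The constant and function names are hypothetical.

```python
"""Sketch: Phase I progress arithmetic from the figures quoted in the text."""

PHASE_I_FAMILIES = 9_400         # ~number of eukaryotic families (Box 1)
FAMILIES_WITH_REFERENCE = 200    # floor for "more than 200" families to date
EXPECTED_2021 = 3_021            # projected family-level references, 2021


def percent_of_phase_i(families: int) -> float:
    """Share of the Phase I target covered by `families` families, in %."""
    return 100 * families / PHASE_I_FAMILIES


projected = FAMILIES_WITH_REFERENCE + EXPECTED_2021  # assumes no overlap
print(f"to date: {percent_of_phase_i(FAMILIES_WITH_REFERENCE):.1f}%")
print(f"end of 2021: ~{projected} families, "
      f"{percent_of_phase_i(projected):.1f}%")
```

With these inputs the script reports about 2.1% coverage to date and about 3,221 families (34.3%) by the end of 2021, in line with the ∼2%, ∼3,200, and 34% figures in the text.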
The Challenges Ahead
Although the number of reference-quality genomes at the family level nearly tripled from 2018 to March 4, 2021 (Fig. 2), the EBP will have to produce nearly 3,000 genomes per year to meet the EBP Phase I goal of producing at least one reference genome for each of the ∼9,400 eukaryotic families in 3 y. The main challenges in meeting this target are given in Box 2.

Box 2. Challenges in meeting EBP goals
Sourcing, vouchering, and permitting thousands of specimens globally
High molecular weight DNA and RNA isolation at scale
Sequencing capacity and throughput
Assembly and curation at scale
Annotation at scale
Managing data flow in the context of current and future international regulations on data access and sharing
Whole genome alignments at scale
Comparative genomic analysis, population genomics, and data visualization at scale

To meet the EBP Phase I goal, the EBP network-of-networks will need to produce nine genomes per day, 365 d/y. Is this feasible? The Wellcome Sanger Institute alone plans to produce 1,500 reference-quality genomes in 2021 as part of the Darwin Tree of Life Project, corresponding to about four genomes per day. As presented in Table 1, the Institute is already well on its way to achieving this goal. The Vertebrate Genomes Project aims to produce six genomes per week so that, by the end of 2021, it will have completed high-quality assemblies for species representing 260 vertebrate lineages separated by 50 million y or more from a common ancestor (19). With current technology and funded commitments for 2021 by EBP-affiliated sequencing centers, we anticipate reaching the goal of 9 genomes per day globally, or nearly 3,000 annually (Table 1). The main challenge will be sourcing high-quality, taxonomically identified samples for the isolation of the high molecular weight DNA and RNA required for long-read DNA sequencing, scaffolding, and annotation.
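The throughput figures in this section (about 9 genomes per day for Phase I, 123 per day for Phase II, and roughly another order of magnitude for Phase III, discussed below) can be reproduced from the species counts and durations in Box 1. The sketch below assumes uniform daily output over each phase and uses the Box 1 count of ∼180,000 species for Phase II; it illustrates the arithmetic and is not a production plan.

```python
"""Sketch: genomes per day needed for each EBP phase (Box 1 figures)."""

DAYS_PER_YEAR = 365

# phase -> (approximate number of species, duration in years), from Box 1
PHASES = {
    "Phase I": (9_400, 3),
    "Phase II": (180_000, 4),
    "Phase III": (1_650_000, 3),
}


def genomes_per_day(species: int, years: int) -> float:
    """Average daily output needed to finish `species` genomes in `years`."""
    return species / (years * DAYS_PER_YEAR)


rates = {name: genomes_per_day(*spec) for name, spec in PHASES.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} genomes/day")

# fold increase in daily throughput from one phase to the next
print(f"Phase II / Phase I: {rates['Phase II'] / rates['Phase I']:.1f}x")
print(f"Phase III / Phase II: {rates['Phase III'] / rates['Phase II']:.1f}x")

# single-center example from the text: 1,500 genomes planned for 2021
print(f"Sanger, 2021: {1_500 / DAYS_PER_YEAR:.1f} genomes/day")
```

Run as-is, it prints about 9, 123, and 1,507 genomes per day for the three phases, fold increases of about 14.4 and 12.2 between successive phases, and about 4.1 genomes per day for the Sanger Institute's 2021 plan.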
Separate from the current commitments above, about 50% of the taxonomic families could be obtained today from existing collections in the Global Genome Biodiversity Network () (29). Obtaining samples from many countries may require diverse permit processes that can take weeks to years. The EBP is working to develop long-term collaborations to facilitate sample access across the world.

Another critical challenge will be obtaining reference-quality assemblies from small organisms, single-cell eukaryotes, and some green plants. New low-DNA-input methods (30) have essentially solved the problem for most metazoans, but not for single-cell eukaryotes that cannot be cultured. Producing reference-quality genomes thus remains a significant challenge for a large part of the eukaryotic tree of life. Standards for generating and storing the complex set of genomes that characterize green plants will need to accommodate the immense variation in their size, transposable element content, and structure, while enabling research into the molecular and evolutionary processes that have produced this enormous genomic variation (23). Recommendations for sample collection and processing are included in this issue.

Accelerating the annotation pipeline will also present major challenges as genome production scales up. Planned 2021 annotation throughput is 300, 400, and 500 species for the National Center for Biotechnology Information, Joint Genome Institute, and European Molecular Biology Laboratory–European Bioinformatics Institute, respectively: a combined 1,200 species, well short of the nearly 3,000 genomes per year needed in Phase I. This shortfall can be addressed by expanding capacity and creating more efficient genome annotation tools (31). Current recommendations for genome annotation are provided in this issue.

To achieve the outputs required for Phase II and Phase III, dramatic increases in genome sequence production and efficiency will be required.
Sequencing one representative for each of ∼165,000 genera in 4 y will require an increase in the throughput of genomes from 9 per day to 123 per day, or about 14-fold above the Phase I target. Phase III will require another 10-fold increase above the Phase II target in order to complete the project in 10 y. We are optimistic that within 5 y, sample processing and sequencing technology will improve and costs will fall, so that reference-quality genomes can be produced for all species for under USD $1,000 per 2-Gb genome. We note that the cost, accuracy, and contiguity of assemblies produced today with long reads were not achievable 2 y ago. High-quality draft assemblies based on long reads can already be produced for ∼$2,000 in reagents and compute per 1-Gb genome on average, approaching the $800 originally envisioned for short-read draft-quality genomes (1). Sequencing done for Phases II and III should meet or exceed the minimum standards for short-read–based draft assemblies: contig N50 > 100 kb, scaffold N50 > 1 Mb (or chromosome scale for smaller genomes), and QV30. Although the EBP aspires to produce chromosome-level assemblies for all species, for uncultured microbial eukaryotes and highly repetitive genomes the project will sacrifice perfection for progress in the near term.

In 2018, we estimated a total EBP cost of USD $4.7 billion. This is less than the cost of sequencing the human genome: the original USD $2.7 billion (1991 dollars), equivalent to about USD $5.2 billion today. We note that producing complete telomere-to-telomere assemblies for all human chromosomes is a mission that is only now being realized (32), and that the true cost of sequencing the human genome is significantly higher than the original USD $2.7 billion price tag. Reference-quality genomes currently being produced by the EBP’s sequencing nodes are of far greater quality (i.e., contiguity, completeness, phasing) than the original “complete” human genome sequence [e.g., Rhie et al. (19)], and can now be produced for about USD $10,000 per 2-Gb genome, including transcriptome data for annotation. This is 20% of the cost of a similar-quality assembly only 3 y ago, when the original estimates were made. These improvements will save about USD $186 million in Phase I, bringing the total cost of Phase I down from $600 million to $414 million.

The EBP has embraced the strategy of supporting funding efforts by states and nations: for example, the California Conservation Genomics Project, 1000 Chilean Genomes, and EBP-Colombia (21). This strategy has proven highly successful because it allows local and regional concerns to be addressed in the funding drive. For example, in Australia there is great interest in conserving endangered marsupial species [see Hogg, this issue (33)], which has led to a funded project that will produce five new marsupial reference genomes in 2021 (Table 1). Another example is the Catalan Initiative for the Earth BioGenome Project, which aims to prioritize sequencing of endemic species, with the goal of eventually sequencing all species in the Catalan territories. National funding also provides an inherent mechanism for compliance with national laws on access and benefit sharing, which may prove essential for building trust and, ultimately, for obtaining all taxonomically classified species for sequencing. Capacity building in developing countries will be a direct benefit of participation.
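The minimum Phase II and Phase III assembly standards stated above (contig N50 > 100 kb, scaffold N50 > 1 Mb, QV30) rely on two summary statistics that the text does not define. The Python sketch below uses their common definitions: N50 is the length L such that sequences of length L or longer contain at least half of the assembled bases, and QV is the Phred-scaled per-base consensus error, QV = −10 log10(error rate), so QV30 corresponds to about one error per 1,000 bases. The function names are hypothetical, the per-base error rate is taken as a given input (in practice it has to be estimated, for example from k-mers or read alignments), and the check ignores the “chromosome scale for smaller genomes” alternative for scaffolds.

```python
import math
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that sequences of length >= L hold at least
    half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def qv(error_rate: float) -> float:
    """Phred-scaled consensus quality: QV = -10 * log10(per-base error rate)."""
    return -10 * math.log10(error_rate)


def meets_minimum_standard(contig_lengths: Iterable[int],
                           scaffold_lengths: Iterable[int],
                           error_rate: float) -> bool:
    """Hypothetical check against the Phase II/III minimums quoted in the text.

    Ignores the 'chromosome scale for smaller genomes' alternative for scaffolds.
    """
    return (n50(contig_lengths) > 100_000            # contig N50 > 100 kb
            and n50(scaffold_lengths) > 1_000_000    # scaffold N50 > 1 Mb
            and qv(error_rate) >= 30)                # QV30: <= ~1 error per 1,000 bases
```

For the toy lengths [80, 70, 50, 40, 30, 20, 10] (300 bases in total), `n50` returns 70, because 80 + 70 = 150 already reaches half of the total. Sorting once and walking a running sum keeps the calculation linear after the sort and makes the "at least half" tie rule explicit.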
Conclusions
The past year has been one of great progress for the EBP, marking the start of the clock for completing Phase I of the project. There are many challenges ahead in meeting the Phase II and Phase III goals. Clearly, the ultimate aim of sequencing 1.84 million eukaryotic species cannot be achieved by a single country or private entity. The coordinated efforts of thousands of scientists and institutions around the world are needed to produce ∼9,400 family reference genomes in 3 y. The project needs significant amounts of new funding, but the investments required on a global scale should be obtainable given the importance of the project to conserving and enhancing ecosystem services in the context of climate change and to promoting a new bioeconomy. Despite limited financial resources for coordination, the EBP international network-of-networks has matured into the world’s most technically advanced organization to tackle the grand challenge of sequencing all known eukaryotes, identifying their genes and functions, advancing our understanding of the evolution of life on Earth, and developing a complete genomic characterization of Earth’s critical ecosystems. Based on a survey of institutional members and affiliates, the EBP now includes more than 5,000 scientists and technical staff around the world who are dedicated to the EBP’s mission. The EBP has unleashed tremendous passion and energy among the project’s participants, particularly its younger generation of scientists, and among the general public.

Given the precarious condition of Earth’s biodiversity, it is essential that the EBP and its affiliated projects achieve their ambitious goals. In the words of David Attenborough, “Extinction is forever—so our action must be immediate.” Every eukaryotic species is the product of millions of years of evolution.
Recorded in their genomes are secrets that can fundamentally change our understanding of the evolution of life on Earth—its very existence and essence—and may lead to radical new approaches for mitigating the effects of climate change on biodiversity, improving agriculture, growing a sustainable global bioeconomy, saving species and repairing ecosystems, and preventing future pandemics. Let us go forth and sequence!
Authors: Dennis Carroll; Peter Daszak; Nathan D Wolfe; George F Gao; Carlos M Morel; Subhash Morzaria; Ariel Pablos-Méndez; Oyewale Tomori; Jonna A K Mazet Journal: Science Date: 2018-02-23
Authors: Harris A Lewin; Gene E Robinson; W John Kress; William J Baker; Jonathan Coddington; Keith A Crandall; Richard Durbin; Scott V Edwards; Félix Forest; M Thomas P Gilbert; Melissa M Goldstein; Igor V Grigoriev; Kevin J Hackett; David Haussler; Erich D Jarvis; Warren E Johnson; Aristides Patrinos; Stephen Richards; Juan Carlos Castilla-Rubio; Marie-Anne van Sluys; Pamela S Soltis; Xun Xu; Huanming Yang; Guojie Zhang Journal: Proc Natl Acad Sci U S A Date: 2018-04-24
Authors: Joana Damas; Graham M Hughes; Kathleen C Keough; Corrie A Painter; Nicole S Persky; Marco Corbo; Michael Hiller; Klaus-Peter Koepfli; Andreas R Pfenning; Huabin Zhao; Diane P Genereux; Ross Swofford; Katherine S Pollard; Oliver A Ryder; Martin T Nweeia; Kerstin Lindblad-Toh; Emma C Teeling; Elinor K Karlsson; Harris A Lewin Journal: Proc Natl Acad Sci U S A Date: 2020-08-21
Authors: Karen H Miga; Sergey Koren; Arang Rhie; Mitchell R Vollger; Ariel Gershman; Andrey Bzikadze; Shelise Brooks; Edmund Howe; David Porubsky; Glennis A Logsdon; Valerie A Schneider; Tamara Potapova; Jonathan Wood; William Chow; Joel Armstrong; Jeanne Fredrickson; Evgenia Pak; Kristof Tigyi; Milinn Kremitzki; Christopher Markovic; Valerie Maduro; Amalia Dutra; Gerard G Bouffard; Alexander M Chang; Nancy F Hansen; Amy B Wilfert; Françoise Thibaud-Nissen; Anthony D Schmitt; Jon-Matthew Belton; Siddarth Selvaraj; Megan Y Dennis; Daniela C Soto; Ruta Sahasrabudhe; Gulhan Kaya; Josh Quick; Nicholas J Loman; Nadine Holmes; Matthew Loose; Urvashi Surti; Rosa Ana Risques; Tina A Graves Lindsay; Robert Fulton; Ira Hall; Benedict Paten; Kerstin Howe; Winston Timp; Alice Young; James C Mullikin; Pavel A Pevzner; Jennifer L Gerton; Beth A Sullivan; Evan E Eichler; Adam M Phillippy Journal: Nature Date: 2020-07-14
Authors: Arang Rhie; Shane A McCarthy; Olivier Fedrigo; Joana Damas; Giulio Formenti; Sergey Koren; Marcela Uliano-Silva; William Chow; Arkarachai Fungtammasan; Juwan Kim; Chul Lee; Byung June Ko; Mark Chaisson; Gregory L Gedman; Lindsey J Cantin; Francoise Thibaud-Nissen; Leanne Haggerty; Iliana Bista; Michelle Smith; Bettina Haase; Jacquelyn Mountcastle; Sylke Winkler; Sadye Paez; Jason Howard; Sonja C Vernes; Tanya M Lama; Frank Grutzner; Wesley C Warren; Christopher N Balakrishnan; Dave Burt; Julia M George; Matthew T Biegler; David Iorns; Andrew Digby; Daryl Eason; Bruce Robertson; Taylor Edwards; Mark Wilkinson; George Turner; Axel Meyer; Andreas F Kautt; Paolo Franchini; H William Detrich; Hannes Svardal; Maximilian Wagner; Gavin J P Naylor; Martin Pippel; Milan Malinsky; Mark Mooney; Maria Simbirsky; Brett T Hannigan; Trevor Pesout; Marlys Houck; Ann Misuraca; Sarah B Kingan; Richard Hall; Zev Kronenberg; Ivan Sović; Christopher Dunn; Zemin Ning; Alex Hastie; Joyce Lee; Siddarth Selvaraj; Richard E Green; Nicholas H Putnam; Ivo Gut; Jay Ghurye; Erik Garrison; Ying Sims; Joanna Collins; Sarah Pelan; James Torrance; Alan Tracey; Jonathan Wood; Robel E Dagnew; Dengfeng Guan; Sarah E London; David F Clayton; Claudio V Mello; Samantha R Friedrich; Peter V Lovell; Ekaterina Osipova; Farooq O Al-Ajli; Simona Secomandi; Heebal Kim; Constantina Theofanopoulou; Michael Hiller; Yang Zhou; Robert S Harris; Kateryna D Makova; Paul Medvedev; Jinna Hoffman; Patrick Masterson; Karen Clark; Fergal Martin; Kevin Howe; Paul Flicek; Brian P Walenz; Woori Kwak; Hiram Clawson; Mark Diekhans; Luis Nassar; Benedict Paten; Robert H S Kraus; Andrew J Crawford; M Thomas P Gilbert; Guojie Zhang; Byrappa Venkatesh; Robert W Murphy; Klaus-Peter Koepfli; Beth Shapiro; Warren E Johnson; Federica Di Palma; Tomas Marques-Bonet; Emma C Teeling; Tandy Warnow; Jennifer Marshall Graves; Oliver A Ryder; David Haussler; Stephen J O'Brien; Jonas Korlach; Harris A Lewin; Kerstin Howe; Eugene W Myers; Richard Durbin; Adam M Phillippy; Erich D Jarvis Journal: Nature Date: 2021-04-28
Authors: Gabriele Droege; Katharine Barker; Jonas J Astrin; Paul Bartels; Carol Butler; David Cantrill; Jonathan Coddington; Félix Forest; Birgit Gemeinholzer; Donald Hobern; Jacqueline Mackenzie-Dodds; Éamonn Ó Tuama; Gitte Petersen; Oris Sanjur; David Schindel; Ole Seberg Journal: Nucleic Acids Res Date: 2013-10-16
Authors: Sarah B Kingan; Haynes Heaton; Juliana Cudini; Christine C Lambert; Primo Baybayan; Brendan D Galvin; Richard Durbin; Jonas Korlach; Mara K N Lawniczak Journal: Genes (Basel) Date: 2019-01-18
Authors: ThankGod Echezona Ebenezer; Anne W T Muigai; Simplice Nouala; Bouabid Badaoui; Mark Blaxter; Alan G Buddie; Erich D Jarvis; Jonas Korlach; Josiah O Kuja; Harris A Lewin; Roksana Majewska; Ntanganedzeni Mapholi; Suresh Maslamoney; Michèle Mbo'o-Tchouawou; Julian O Osuji; Ole Seehausen; Oluwaseyi Shorinola; Christian Keambou Tiambo; Nicola Mulder; Cathrine Ziyomo; Appolinaire Djikeng Journal: Nature Date: 2022-03