The Earth BioGenome Project 2020: Starting the clock.

Harris A Lewin, Stephen Richards, Erez Lieberman Aiden, Miguel L Allende, John M Archibald, Miklós Bálint, Katharine B Barker, Bridget Baumgartner, Katherine Belov, Giorgio Bertorelle, Mark L Blaxter, Jing Cai, Nicolette D Caperello, Keith Carlson, Juan Carlos Castilla-Rubio, Shu-Miaw Chaw, Lei Chen, Anna K Childers, Jonathan A Coddington, Dalia A Conde, Montserrat Corominas, Keith A Crandall, Andrew J Crawford, Federica DiPalma, Richard Durbin, ThankGod E Ebenezer, Scott V Edwards, Olivier Fedrigo, Paul Flicek, Giulio Formenti, Richard A Gibbs, M Thomas P Gilbert, Melissa M Goldstein, Jennifer Marshall Graves, Henry T Greely, Igor V Grigoriev, Kevin J Hackett, Neil Hall, David Haussler, Kristofer M Helgen, Carolyn J Hogg, Sachiko Isobe, Kjetill Sigurd Jakobsen, Axel Janke, Erich D Jarvis, Warren E Johnson, Steven J M Jones, Elinor K Karlsson, Paul J Kersey, Jin-Hyoung Kim, W John Kress, Shigehiro Kuraku, Mara K N Lawniczak, James H Leebens-Mack, Xueyan Li, Kerstin Lindblad-Toh, Xin Liu, Jose V Lopez, Tomas Marques-Bonet, Sophie Mazard, Jonna A K Mazet, Camila J Mazzoni, Eugene W Myers, Rachel J O'Neill, Sadye Paez, Hyun Park, Gene E Robinson, Cristina Roquet, Oliver A Ryder, Jamal S M Sabir, H Bradley Shaffer, Timothy M Shank, Jacob S Sherkow, Pamela S Soltis, Boping Tang, Leho Tedersoo, Marcela Uliano-Silva, Kun Wang, Xiaofeng Wei, Regina Wetzer, Julia L Wilson, Xun Xu, Huanming Yang, Anne D Yoder, Guojie Zhang.

Year: 2022 PMID: 35042800 PMCID: PMC8795548 DOI: 10.1073/pnas.2115635118

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779

November 2020 marked 2 y since the launch of the Earth BioGenome Project (EBP), which aims to sequence all known eukaryotic species in a 10-y timeframe. Since then, significant progress has been made across all aspects of the EBP roadmap, as outlined in the 2018 article describing the project’s goals, strategies, and challenges (1). The launch phase has ended and the clock has started on reaching the EBP’s major milestones. This Special Feature explores the many facets of the EBP, including a review of progress, a description of major scientific goals, exemplar projects, ethical, legal, and social issues, and applications of biodiversity genomics. In this Introduction, we summarize the current status of the EBP as reported at the Biodiversity Genomics 2020 conference, held virtually October 5 to 9, 2020, including recent updates through February 2021. References to the nine Perspective articles included in this Special Feature are cited to guide the reader toward deeper understanding of the goals and challenges facing the EBP.
It is urgent that the EBP move forward. The year 2020 marked a global failure in meeting any of the 20 “Aichi goals” for the preservation of wildlife and ecosystems (2). The International Union for Conservation of Nature now counts more than 35,000 (28%) of all surveyed species of plants and animals as threatened with extinction (3). The Earth may lose 50% of its biodiversity by the end of this century if nothing is done to mitigate the anthropogenic factors that drive species to extinction and destroy the health of global ecosystems that sustain human existence (2). Degradation of aquatic and terrestrial ecosystems has continued unabated, and we may soon face the possibility of massive ecosystem collapse on a global scale. Such a collapse would have an enormous impact not only on biodiversity, but also on global political stability, and might ultimately affect the survival of our own species.
Biological diversity underpins ecosystem services: that is, those services provided by nature that generate food, clean air and water, regulation of critical environmental processes and biogeochemical cycles, and are the basis for deep cultural and esthetic ties between humans and the natural world. Biodiversity is also foundational for the rapidly growing global bioeconomy that exceeds $500 billion each year in just the United States and European Union (4, 5), and it is essential for sustainable food security (6). If biodiversity disappears, so too will the potential for a new inclusive bioeconomy that is possible through a combination of genomics, computational biology, and synthetic biology, identified by the World Economic Forum as key to the fourth Industrial Revolution (7) and estimated to be worth up to US $3 to 5 trillion per annum (8). The year 2020 will also be remembered in history as the beginning of the COVID-19 pandemic. The virus that causes COVID-19, SARS-CoV-2, evolved from a bat betacoronavirus (9), possibly finding its way into the human population through an intermediate host that has yet to be identified (10). Spillover of SARS-CoV-2 infection to wildlife, pets, and captive-bred animals demonstrates the interconnectedness of life on Earth, reinforcing the One Health concept that all organisms are interdependent: the health of one impacts the health of all (11). A One Health approach to addressing the biodiversity crisis critically relies on supporting infrastructures, such as the genomic infrastructure that can be provided by the EBP and affiliated projects. The economic disaster and devastating human death toll caused by the pandemic illustrate just how critical it is to have knowledge of potential human pathogens and their hosts before such events arise (12). Clearly, DNA sequence information on the virus and its potential hosts has helped the world to manage and hopefully soon contain COVID-19. 
Similarly, creating a library of DNA sequences for all known eukaryotic life can contribute critical data necessary to generate effective tools for preventing biodiversity loss and pathogen spread, monitoring and protecting ecosystems, and enhancing ecosystem services [see The Darwin Tree of Life Project Consortium, this issue (13)]. The EBP’s proactive stance on understanding the ethical, legal, and social issues surrounding the project will also inform recommendations on access and commercial benefit sharing, equity, and inclusion in the biodiversity genomics community and in indigenous communities within the world’s most biodiverse countries [see McCartney et al., this issue (14)].

Organization and Governance

A critical role of the EBP organization is to develop and promote standards for the scalable production of reference-quality genomes; disseminate best practices; coordinate sequencing, annotation, data analysis, and training activities; ensure public accessibility of data; and communicate the project’s progress. To accomplish these goals, the EBP was established as an international network-of-networks: organizations that specialize in sample acquisition and vouchering; technology centers for sequencing, assembly, and annotation; and affiliated projects with deep expertise with specific taxonomic groups, biomes, and ecosystems (Box 1). In addition, the EBP develops ethical standards for project participation, data sharing, and access and benefit sharing of intellectual property derived from whole-genome sequencing [see Sherkow et al., this issue (15)], and promotes programs for diversity, equity, inclusion, and justice among the project’s participants. The EBP Member Institutions and Affiliated Projects are committed to open data access and compliance with the principles of Access and Benefit Sharing under the Convention on Biological Diversity and the Nagoya Protocol (16). The EBP communicates progress and information about the project through its website (https://www.earthbiogenome.org), its Twitter handle (@EBPgenome), and other social media accounts, currently with more than 2,000 followers.
The EBP international network-of-networks functions to support the three proposed phases of the EBP:
Phase I: An annotated reference genome for one representative of each taxonomic family of eukaryotes (∼9,400 species) in 3 y.
Phase II: Reference genomes for one representative of each genus (∼180,000 species) in years 4 to 7.
Phase III: Reference genomes for the remaining ∼1.65 million known eukaryotic species in the final 3 y of the project.
The EBP Secretariat is located at the University of California, Davis, and operates under a Memorandum of Understanding between participating institutions, available at the EBP website (https://www.earthbiogenome.org). The representatives of member institutions have adopted an interim governance structure. An interim governance committee, the Earth BioGenome Project Working Group, is in place; as of February 2021 it consists of one representative of each of the 43 Memorandum of Understanding-signing institutions (see list on the EBP website) and 44 affiliated projects (Dataset S1; brief summaries of 21 affiliated projects are also provided), with membership up 121% and 153%, respectively, since 2018. The Chair of the EBP Working Group coordinates the activities of all the working committees and conducts extensive international outreach to promote collaboration between member institutions and affiliated projects, implement standards, assist the formation of national and regional projects, and coordinate activities across the EBP network-of-networks. The International Science Committee consists of a chairperson and five subcommittees that are responsible for standards development in the following areas: sample collection and processing, sequencing and assembly, annotation, information technology and informatics, and data analyses. Committee reports are available on the EBP website and summarized in this issue. The EBP plans to formally adopt a permanent governance structure in 2021. Institutions and projects that are interested in joining the EBP should contact the Secretariat through the EBP website for further information.
The EBP’s Committee on Ethical, Legal, and Social Issues (ELSI), established in 2020, makes recommendations to the EBP Working Group on legal obligations relating to the Nagoya Protocol on Access and Benefit Sharing; ethical considerations relating to collection of samples, societal concerns, and biosecurity; and collaboration standards (e.g., sample information, digital sequence information, intellectual property, authorship and publication guidelines). The committee’s outline of the ELSI issues facing the EBP can be found in this issue (15). A Committee on Diversity, Equity, Inclusion, and Justice (DEIJ) was approved recently by the EBP Working Group. DEIJ recommendations will be based on participatory approaches with fair treatment and meaningful involvement of all people to define processes and practices for creating a welcoming, inclusive, and supportive biodiversity genomics community.

Global Status of Biodiversity Sequencing

Our current ability to investigate the diversity and evolution of Earth’s biota is severely constrained by the absence of high-quality genome sequences for most of the species on the eukaryotic tree of life. There are now ∼1.84 million taxonomically classified eukaryotic species, but the estimated number of eukaryotic species is 12 to 15 million, including 8.1 million plants and animals (17). The EBP aims to sequence all classified species and to facilitate the discovery and classification of new species. As of March 4, 2021, the International Nucleotide Sequence Database Collaboration (INSDC) contained whole-genome DNA sequence information on 6,480 unique species, representing 81.4% of eukaryotic phyla, 64.7% of classes, 40.1% of orders, 15.5% of families, 2.3% of genera, and just 0.43% of all species (Fig. 1).

Fig. 1.

Global progress in whole-genome sequencing across all eukaryotic taxonomic levels. Data source: National Center for Biotechnology Information, March 4, 2021 (18).

However, the assembly quality of these 6,480 species’ genomes varies greatly. A majority (63.1%) of the assemblies fall into the short-read draft category, with contig N50 < 100 kb and scaffold N50 < 10 Mb. A relatively small number of the draft-quality assemblies have achieved greater contiguity using scaffolding methods, such as Hi-C, linked-reads, and optical maps (19). The number of unique eukaryotic species with whole-genome assemblies has more than doubled since 2018 (Fig. 2), although most of these assemblies are of short-read draft quality. The number of reference-quality chromosome-scale assemblies of unique species representing taxonomic families has nearly tripled since 2018, from 210 to 583. EBP-affiliated projects produced about half of these new reference-quality assemblies (see below), demonstrating the efficacy of shared goals and standards.
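
The contiguity thresholds quoted above are stated in terms of N50. As a minimal sketch, assuming the usual definition of N50 (the length L such that sequences of length L or longer contain at least half of the assembled bases), the metric and the short-read draft cutoffs from the text can be expressed as follows. The function names and the toy contig lengths are hypothetical and are not taken from any EBP pipeline.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def is_short_read_draft(contig_n50_bp, scaffold_n50_bp):
    """Apply the cutoffs quoted in the text for the short-read draft
    category: contig N50 < 100 kb and scaffold N50 < 10 Mb."""
    return contig_n50_bp < 100_000 and scaffold_n50_bp < 10_000_000


# Toy example (hypothetical lengths): total = 290, half = 145.
# The two longest contigs (80 + 70 = 150) already cover half, so N50 = 70.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # 70
```

In practice the lengths would be read from an assembly FASTA index or an assembly report; that step is omitted here to keep the sketch self-contained.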

Fig. 2.

Year-over-year progress in whole genome sequencing for all eukaryotic taxa (Upper) and family-level (Lower) eukaryotic taxa, 2010 to March 4, 2021. The metrics for draft and reference quality assemblies are given in the text.

Progress of the EBP toward Phase I Goals

The past 2 y represent the start-up phase of the EBP. The major activities of the international EBP network-of-networks include: the development of standards; the evaluation of strategies for producing reference genomes; organizing regional, national, and transnational projects; and building communities through regular working committee meetings and an annual conference. The “Biodiversity Genomics 2020” conference was held virtually and had 3,000 registrants from 89 countries. The full recording of the meeting is available (20). The EBP is also developing new initiatives in training, broadening diversity and inclusion in project leadership, and building support for project funding from government agencies and private foundations around the world.
The current line-up of 43 EBP-affiliated projects covers most of the major groups of eukaryotic taxa and represents access to tens of thousands of high-quality samples in museum collections and from field biologists. The institutional members and affiliated projects span 21 countries across all continents except Antarctica. The first African nodes have come online in 2021 as part of the Africa BioGenome Project. The EBP also aims to expand member institutions and affiliated projects across additional biodiverse regions of the world, including the Indian subcontinent, Southeast Asia, and South America [for example, see Huddart et al. (21), this issue]. With high endemism concentrated in these regions, the ultimate success of the EBP requires building scientific capacity in developing nations and respecting national laws for access and benefit sharing. EBP-affiliated projects, such as the Darwin Tree of Life Project [see The Darwin Tree of Life Project Consortium, this issue (22)], the Vertebrate Genomes Project, the 1000 Fungal Genomes Project, B10K (sequencing 10,000 bird species), and others have led the way in producing publicly accessible high-quality genomes (Table 1).
A Perspective on sequencing of plant genomes is included in this special issue (23). EBP-affiliated sequencing centers around the world are now coming online for the production of reference genomes using a simplified pipeline consisting of long reads and Hi-C (or equivalent), and other scaffolding methods, such as optical mapping, and public domain assembly tools, such as the recently developed hifiasm for generating long-read–based contigs (24) and SALSA for generating Hi-C scaffolds (25). This simplified approach, within the reach of most EBP-affiliated laboratories, yields chromosome-scale assemblies that meet the EBP standard (see above).

Table 1.

Progress of EBP affiliated projects in whole-genome sequencing and the production of reference genomes

Project name	No. of species	No. of references	No. of families with reference	No. of references 2021	No. of drafts 2021
1000 Fungal Genomes	663	20	10	10	100
B10K (birds)	400	32	29	0	400
Zoonomia (mammals)	130	0	0	0	0
VGP (vertebrates)	128	129	119	200	0
i5K (arthropods)	86	2	2	2	8
Darwin Tree of Life	71	71	14	1,500	0
Tree of Life, Sanger	50	50	50	300	0
LOEWE Center	43	43	43	84	95
Ungulates Genome Project	41	0	0	4	6
CanSeq150	36	21	0	0	0
10KP (plants)	21	2	2	16	42
All other	80	42	30	905	1,505
Total	1,719	412	316	3,021	2,156

All tabulated genomes in the first three columns have been submitted to the INSDC or other public domain databases. Numbers in the last two columns are projected additional species genomes for 2021. A complete table with INSDC project identifiers can be found in Dataset S1. Totals include some species that overlap between projects.

The EBP-affiliated projects have sequenced the genomes of 1,719 eukaryotic species, all of which have assemblies deposited in public domain databases (Table 1 and Dataset S1). Of these, 316 are reference-quality genomes, constituting ∼50% of all the genomes in the INSDC that meet the EBP reference standard. Furthermore, these already represent more than 200 taxonomically distinct nonredundant families. Thus, in the start-up phase, EBP-affiliated projects have sequenced ∼2% of extant eukaryotic families to reference-level quality. There are 3,021 family-level reference genomes expected to be completed in 2021. Thus, by the end of 2021, the first full year of the project, we project that ∼3,200 taxonomic families will have been sampled with at least one reference genome, corresponding to 34% completion of the EBP Phase I goal.
Other large-scale initiatives with complementary goals have joined EBP as affiliated projects. These include BIOSCAN (26) and the Global Virome Project (27). BIOSCAN aims to DNA barcode every eukaryotic species on Earth, which will be critical to the EBP sample vouchering process and for accessing rare samples for sequencing. Partnership with the Global Virome Project creates an exciting avenue to identify potentially pathogenic viruses linked with their host species and for codevelopment of biosurveillance strategies (12). Integrated high-level coordination between these projects will have synergistic effects on biodiversity science and societal outcomes.
A broad perspective on the scientific challenges and opportunities enabled by large-scale comparative genomics is provided by Stephan et al., this issue (28).
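
The Phase I completion figure given in this section can be reproduced with simple arithmetic. The sketch below uses only numbers quoted in the text (more than 200 families already sequenced, 3,021 family-level references planned for 2021, ∼9,400 families in total). It assumes, as a simplification, that every planned reference covers a family not yet sampled; the function name is hypothetical.

```python
def phase1_completion(families_done, references_planned, families_total):
    """Project how many families will have a reference genome and the
    corresponding fraction of the Phase I goal.

    Simplifying assumption: each planned reference genome represents a
    family that has not been sampled before.
    """
    projected = families_done + references_planned
    return projected, projected / families_total


# Figures quoted in the text; 200 is a lower bound ("more than 200").
projected, fraction = phase1_completion(200, 3_021, 9_400)
print(projected, f"{fraction:.0%}")  # 3221 34%
```

The result, about 3,200 families or 34% of Phase I, matches the projection stated above.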

The Challenges Ahead

Although the number of reference-quality genomes at the family level tripled from 2018 to March 4, 2021 (Fig. 2), the EBP will have to produce nearly 3,000 genomes per year to meet the EBP Phase I goal of producing at least one reference genome from all ∼9,400 eukaryotic families in 3 y. The main challenges in meeting this target are given in Box 2.

Box 2.

Challenges in meeting EBP goals

Sourcing, vouchering, and permitting thousands of specimens globally
High molecular weight DNA and RNA isolation at scale
Sequencing capacity and throughput
Assembly and curation at scale
Annotation at scale
Managing data flow in the context of international current and future data access and sharing regulations
Whole genome alignments at scale
Comparative genomic analysis, population genomics, and data visualization at scale

To meet the EBP Phase I goal, the EBP network-of-networks will need to produce nine genomes per day, 365 d/y. Is this feasible? The Wellcome Sanger Institute alone plans to produce 1,500 reference-quality genomes in 2021 as part of the Darwin Tree of Life Project, corresponding to four genomes per day. As presented in Table 1, the Institute is already well on its way to achieving this goal in the coming year. The Vertebrate Genomes Project aims to produce six genomes per week to complete its goal of producing high-quality assemblies for species representing 260 vertebrate lineages separated by 50 million y or more from a common ancestor (19), by the end of 2021. With current technology and funded commitments for 2021 by EBP-affiliated sequencing centers, reaching the goal of 9 genomes per day globally, or nearly 3,000 annually, is anticipated (Table 1). The main challenge will be sourcing high-quality taxonomically identified samples for the isolation of high molecular weight DNA and RNA required for long-read DNA sequencing, scaffolding, and annotation.
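
The daily throughput target quoted above follows directly from the phase sizes and durations stated in the Organization and Governance section. The sketch below is illustrative only: it assumes 365 production days per year and uses hypothetical names (`PHASES`, `genomes_per_day`).

```python
# Phase sizes and durations as stated in the Organization and
# Governance section: (number of genomes, duration in years).
PHASES = {
    "Phase I": (9_400, 3),        # one genome per family
    "Phase II": (180_000, 4),     # one genome per genus, years 4 to 7
    "Phase III": (1_650_000, 3),  # remaining known species
}


def genomes_per_day(n_genomes, years, days_per_year=365):
    """Average daily output needed to finish n_genomes in the given time."""
    return n_genomes / (years * days_per_year)


for name, (n_genomes, years) in PHASES.items():
    print(f"{name}: {genomes_per_day(n_genomes, years):.1f} genomes/day")
```

Under these assumptions Phase I requires roughly 9 genomes per day and Phase II roughly 123 per day, a 14-fold increase, while Phase III requires about 1,500 per day.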
Separate from the current commitments above, about 50% of the taxonomic families could be obtained today from existing collections in the Global Genome Biodiversity Network (29). Obtaining samples from many countries may require diverse permit processes that can last weeks to years. The EBP is working to develop long-term collaborations to facilitate sample access across the world.
Another critical challenge will be obtaining reference-quality assemblies from small organisms, single-cell eukaryotes, and some green plants. New low-DNA input methods (30) have essentially solved the problem for most metazoans, but not for single-cell eukaryotes that cannot be cultured. Producing reference-quality genomes thus remains a significant challenge for a large part of the eukaryotic tree of life. Setting standards for the generation and storage of the complex set of genomes that characterize green plants will need to accommodate the immense variation in their size, transposable element content, and structure, while enabling research into the molecular and evolutionary processes that have resulted in this enormous genomic variation (23). Recommendations for sample collection and processing are included in this issue.
Accelerating the annotation pipeline will also present major challenges as the production of genomes scales up. Planned 2021 annotation throughput is 300, 400, and 500 species for the National Center for Biotechnology Information, Joint Genome Institute, and European Molecular Biology Laboratory–European Bioinformatics Institute, respectively, which remains short of what will be necessary. This issue can be addressed by expanding capacity and creating more efficient genome annotation tools (31). Current recommendations for genome annotation are provided in this issue.
To achieve the outputs required for Phase II and Phase III, dramatic increases in genome sequence production and efficiency will be required.
Sequencing one representative for each of ∼180,000 genera in 4 y will require an increase in the throughput of genomes from 9 per day to 123 per day, or 14-fold above the Phase I target. Phase III will require another 10-fold increase above the Phase II target in order to complete the project in 10 y. We are optimistic that within 5 y, sample processing and sequencing technology will improve and costs will be reduced so that reference-quality genomes can be produced for all species for under USD $1,000 for a 2-Gb genome. We note that the cost, accuracy, and contiguity of assemblies produced today with long reads were not achievable 2 y ago. High-quality draft assemblies based on long reads can already be produced for ∼$2,000 in reagents and compute per 1-Gb genome average, getting closer to the $800 originally envisioned for short-read draft-quality genomes (1). Sequencing done for Phases II and III should meet or exceed the minimum standards for short-read–based draft assemblies: contig N50 > 100 kb, scaffold N50 > 1 Mb (or chromosome scale for smaller genomes), QV30. Although the EBP aspires to produce chromosome-level assemblies for all species, for uncultured microbial eukaryotes and highly repetitive genomes, the project will sacrifice perfection for progress in the near term.
In 2018, we estimated a total EBP cost of USD $4.7 billion. This is less than the USD $2.7 billion (1991 dollars) cost of sequencing the human genome, which is equivalent to about USD $5.2 billion today. We note that producing complete telomere-to-telomere assemblies for all human chromosomes is a mission that is now being realized (32), and that the true cost of sequencing the human genome is significantly higher than the original USD $2.7 billion price tag. Reference-quality genomes currently being produced by the EBP’s sequencing nodes are of far greater quality (i.e., continuity, completeness, phasing) than the original “complete” human genome sequence [e.g., Rhie et al. (19)], and can now be produced for about USD $10,000 per 2-Gb genome, including transcriptome data for annotation. This amount is 20% of the cost of a similar quality assembly only 3 y ago when the original estimates were made. The project will save about USD $186 million in Phase I due to these improvements, bringing the total cost of Phase I down to USD $414 million from USD $600 million.
The EBP has embraced the strategy of supporting funding efforts by states and nations: for example, the California Conservation Genomics Project, 1000 Chilean Genomes, and EBP-Colombia (21). This effort has proven highly successful as it allows for local and regional concerns to be addressed in the funding drive. For example, in Australia there is great interest in conserving endangered marsupial species [see Hogg, this issue (33)]. This has led to a funded project that will produce five new marsupial reference genomes in 2021 (Table 1). Other examples include the Catalan Initiative for the Earth BioGenome Project, which aims to prioritize sequencing of endemic species with the goal of eventually sequencing all species in the Catalan territories. National funding also provides an inherent mechanism for compliance with national laws on access and benefit sharing, which may prove essential for building trust, and ultimately obtaining all taxonomically classified species for sequencing. Capacity building in developing countries will be a direct benefit of participation.

Conclusions

The past year has been one of great progress for the EBP, marking the start of the clock for completing Phase I of the project. There are many challenges ahead in meeting Phase II and Phase III goals. Clearly, the ultimate aim of sequencing 1.84 million eukaryotes cannot be achieved by a single country or private entity. The coordinated efforts of thousands of scientists and institutions around the world are needed to produce ∼9,400 family reference genomes in 3 y. The project needs significant amounts of new funding, but the investments required on a global scale should be obtainable given the importance of the project to conserving and enhancing ecosystem services in the context of climate change and promoting a new bioeconomy. Despite limited financial resources for coordination, the EBP international network-of-networks has matured as the world’s most technically advanced organization to tackle the grand challenge of sequencing all known eukaryotes, identifying their genes and functions, advancing our understanding of the evolution of life on Earth, and developing a complete genomic characterization of Earth’s critical ecosystems. Based on a survey of institutional members and affiliates, the EBP now includes more than 5,000 scientists and technical staff around the world who are dedicated to EBP’s mission. The EBP has unleashed tremendous passion and energy among the project’s participants, particularly its younger generation of scientists and the general public. Given the precarious condition of Earth’s biodiversity, it is essential that the EBP and its affiliated projects achieve their ambitious goals. In the words of David Attenborough, “Extinction is forever—so our action must be immediate.” Every eukaryotic species is the product of millions of years of evolution. 
Recorded in their genomes are secrets that can fundamentally change our understanding of the evolution of life on Earth—its very existence and essence—and may lead to radical new approaches for mitigating the effects of climate change on biodiversity, improving agriculture, growing a sustainable global bioeconomy, saving species and repairing ecosystems, and preventing future pandemics. Let us go forth and sequence!

13 in total

1. BIOSCAN: DNA barcoding to accelerate taxonomy and biogeography for conservation and sustainability.

Authors: Donald Hobern
Journal: Genome Date: 2020-04-08 Impact factor: 2.166

2. The Global Virome Project.

Authors: Dennis Carroll; Peter Daszak; Nathan D Wolfe; George F Gao; Carlos M Morel; Subhash Morzaria; Ariel Pablos-Méndez; Oyewale Tomori; Jonna A K Mazet
Journal: Science Date: 2018-02-23 Impact factor: 47.728

3. Earth BioGenome Project: Sequencing life for the future of life.

Authors: Harris A Lewin; Gene E Robinson; W John Kress; William J Baker; Jonathan Coddington; Keith A Crandall; Richard Durbin; Scott V Edwards; Félix Forest; M Thomas P Gilbert; Melissa M Goldstein; Igor V Grigoriev; Kevin J Hackett; David Haussler; Erich D Jarvis; Warren E Johnson; Aristides Patrinos; Stephen Richards; Juan Carlos Castilla-Rubio; Marie-Anne van Sluys; Pamela S Soltis; Xun Xu; Huanming Yang; Guojie Zhang
Journal: Proc Natl Acad Sci U S A Date: 2018-04-24 Impact factor: 11.205

4. Opinion: Intercepting pandemics through genomics.

Authors: W John Kress; Jonna A K Mazet; Paul D N Hebert
Journal: Proc Natl Acad Sci U S A Date: 2020-06-03 Impact factor: 11.205

5. Broad host range of SARS-CoV-2 predicted by comparative and structural analysis of ACE2 in vertebrates.

Authors: Joana Damas; Graham M Hughes; Kathleen C Keough; Corrie A Painter; Nicole S Persky; Marco Corbo; Michael Hiller; Klaus-Peter Koepfli; Andreas R Pfenning; Huabin Zhao; Diane P Genereux; Ross Swofford; Katherine S Pollard; Oliver A Ryder; Martin T Nweeia; Kerstin Lindblad-Toh; Emma C Teeling; Elinor K Karlsson; Harris A Lewin
Journal: Proc Natl Acad Sci U S A Date: 2020-08-21 Impact factor: 11.205

6. A pneumonia outbreak associated with a new coronavirus of probable bat origin.

Authors: Peng Zhou; Xing-Lou Yang; Xian-Guang Wang; Ben Hu; Lei Zhang; Wei Zhang; Hao-Rui Si; Yan Zhu; Bei Li; Chao-Lin Huang; Hui-Dong Chen; Jing Chen; Yun Luo; Hua Guo; Ren-Di Jiang; Mei-Qin Liu; Ying Chen; Xu-Rui Shen; Xi Wang; Xiao-Shuang Zheng; Kai Zhao; Quan-Jiao Chen; Fei Deng; Lin-Lin Liu; Bing Yan; Fa-Xian Zhan; Yan-Yi Wang; Geng-Fu Xiao; Zheng-Li Shi
Journal: Nature Date: 2020-02-03 Impact factor: 69.504

7. Telomere-to-telomere assembly of a complete human X chromosome.

Authors: Karen H Miga; Sergey Koren; Arang Rhie; Mitchell R Vollger; Ariel Gershman; Andrey Bzikadze; Shelise Brooks; Edmund Howe; David Porubsky; Glennis A Logsdon; Valerie A Schneider; Tamara Potapova; Jonathan Wood; William Chow; Joel Armstrong; Jeanne Fredrickson; Evgenia Pak; Kristof Tigyi; Milinn Kremitzki; Christopher Markovic; Valerie Maduro; Amalia Dutra; Gerard G Bouffard; Alexander M Chang; Nancy F Hansen; Amy B Wilfert; Françoise Thibaud-Nissen; Anthony D Schmitt; Jon-Matthew Belton; Siddarth Selvaraj; Megan Y Dennis; Daniela C Soto; Ruta Sahasrabudhe; Gulhan Kaya; Josh Quick; Nicholas J Loman; Nadine Holmes; Matthew Loose; Urvashi Surti; Rosa Ana Risques; Tina A Graves Lindsay; Robert Fulton; Ira Hall; Benedict Paten; Kerstin Howe; Winston Timp; Alice Young; James C Mullikin; Pavel A Pevzner; Jennifer L Gerton; Beth A Sullivan; Evan E Eichler; Adam M Phillippy
Journal: Nature Date: 2020-07-14 Impact factor: 49.962

8. Towards complete and error-free genome assemblies of all vertebrate species.

Authors: Arang Rhie; Shane A McCarthy; Olivier Fedrigo; Joana Damas; Giulio Formenti; Sergey Koren; Marcela Uliano-Silva; William Chow; Arkarachai Fungtammasan; Juwan Kim; Chul Lee; Byung June Ko; Mark Chaisson; Gregory L Gedman; Lindsey J Cantin; Francoise Thibaud-Nissen; Leanne Haggerty; Iliana Bista; Michelle Smith; Bettina Haase; Jacquelyn Mountcastle; Sylke Winkler; Sadye Paez; Jason Howard; Sonja C Vernes; Tanya M Lama; Frank Grutzner; Wesley C Warren; Christopher N Balakrishnan; Dave Burt; Julia M George; Matthew T Biegler; David Iorns; Andrew Digby; Daryl Eason; Bruce Robertson; Taylor Edwards; Mark Wilkinson; George Turner; Axel Meyer; Andreas F Kautt; Paolo Franchini; H William Detrich; Hannes Svardal; Maximilian Wagner; Gavin J P Naylor; Martin Pippel; Milan Malinsky; Mark Mooney; Maria Simbirsky; Brett T Hannigan; Trevor Pesout; Marlys Houck; Ann Misuraca; Sarah B Kingan; Richard Hall; Zev Kronenberg; Ivan Sović; Christopher Dunn; Zemin Ning; Alex Hastie; Joyce Lee; Siddarth Selvaraj; Richard E Green; Nicholas H Putnam; Ivo Gut; Jay Ghurye; Erik Garrison; Ying Sims; Joanna Collins; Sarah Pelan; James Torrance; Alan Tracey; Jonathan Wood; Robel E Dagnew; Dengfeng Guan; Sarah E London; David F Clayton; Claudio V Mello; Samantha R Friedrich; Peter V Lovell; Ekaterina Osipova; Farooq O Al-Ajli; Simona Secomandi; Heebal Kim; Constantina Theofanopoulou; Michael Hiller; Yang Zhou; Robert S Harris; Kateryna D Makova; Paul Medvedev; Jinna Hoffman; Patrick Masterson; Karen Clark; Fergal Martin; Kevin Howe; Paul Flicek; Brian P Walenz; Woori Kwak; Hiram Clawson; Mark Diekhans; Luis Nassar; Benedict Paten; Robert H S Kraus; Andrew J Crawford; M Thomas P Gilbert; Guojie Zhang; Byrappa Venkatesh; Robert W Murphy; Klaus-Peter Koepfli; Beth Shapiro; Warren E Johnson; Federica Di Palma; Tomas Marques-Bonet; Emma C Teeling; Tandy Warnow; Jennifer Marshall Graves; Oliver A Ryder; David Haussler; Stephen J O'Brien; Jonas Korlach; Harris A Lewin; Kerstin Howe; Eugene W Myers; Richard Durbin; Adam M Phillippy; Erich D Jarvis
Journal: Nature Date: 2021-04-28 Impact factor: 49.962

9. The Global Genome Biodiversity Network (GGBN) Data Portal.

Authors: Gabriele Droege; Katharine Barker; Jonas J Astrin; Paul Bartels; Carol Butler; David Cantrill; Jonathan Coddington; Félix Forest; Birgit Gemeinholzer; Donald Hobern; Jacqueline Mackenzie-Dodds; Éamonn Ó Tuama; Gitte Petersen; Oris Sanjur; David Schindel; Ole Seberg
Journal: Nucleic Acids Res Date: 2013-10-16 Impact factor: 16.971

10. A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing.

Authors: Sarah B Kingan; Haynes Heaton; Juliana Cudini; Christine C Lambert; Primo Baybayan; Brendan D Galvin; Richard Durbin; Jonas Korlach; Mara K N Lawniczak
Journal: Genes (Basel) Date: 2019-01-18 Impact factor: 4.096

11 in total

Review 1. A Revised Phylogenetic Classification for Viola (Violaceae).

Authors: Thomas Marcussen; Harvey E Ballard; Jiří Danihelka; Ana R Flores; Marcela V Nicola; John M Watson
Journal: Plants (Basel) Date: 2022-08-27

Review 2. Proteotranscriptomics - A facilitator in omics research.

Authors: Michal Levin; Falk Butter
Journal: Comput Struct Biotechnol J Date: 2022-07-09 Impact factor: 6.155

3. Africa: sequence 100,000 species to safeguard biodiversity.

Authors: ThankGod Echezona Ebenezer; Anne W T Muigai; Simplice Nouala; Bouabid Badaoui; Mark Blaxter; Alan G Buddie; Erich D Jarvis; Jonas Korlach; Josiah O Kuja; Harris A Lewin; Roksana Majewska; Ntanganedzeni Mapholi; Suresh Maslamoney; Michèle Mbo'o-Tchouawou; Julian O Osuji; Ole Seehausen; Oluwaseyi Shorinola; Christian Keambou Tiambo; Nicola Mulder; Cathrine Ziyomo; Appolinaire Djikeng
Journal: Nature Date: 2022-03 Impact factor: 69.504

4. Shotgun metagenomics of soil invertebrate communities reflects taxonomy, biomass, and reference genome properties.

Authors: Alexandra Schmidt; Clément Schneider; Peter Decker; Karin Hohberg; Jörg Römbke; Ricarda Lehmitz; Miklós Bálint
Journal: Ecol Evol Date: 2022-06-06 Impact factor: 3.167

5. A decade of GigaScience: A perspective on conservation genetics.

Authors: Stephen J O'Brien
Journal: Gigascience Date: 2022-06-14 Impact factor: 7.658

6. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly.

Authors: Sheina B Sim; Renee L Corpuz; Tyler J Simmonds; Scott M Geib
Journal: BMC Genomics Date: 2022-02-22 Impact factor: 3.969

Review 7. Methodologies for the De novo Discovery of Transposable Element Families.

Authors: Jessica M Storer; Robert Hubley; Jeb Rosen; Arian F A Smit
Journal: Genes (Basel) Date: 2022-04-17 Impact factor: 4.141

8. A chromosome-level reference genome of Ensete glaucum gives insight into diversity and chromosomal and repetitive sequence evolution in the Musaceae.

Authors: Ziwei Wang; Mathieu Rouard; Manosh Kumar Biswas; Gaetan Droc; Dongli Cui; Nicolas Roux; Franc-Christophe Baurens; Xue-Jun Ge; Trude Schwarzacher; Pat J S Heslop-Harrison; Qing Liu
Journal: Gigascience Date: 2022-04-30 Impact factor: 7.658

Review 9. Pervasive genome duplications across the plant tree of life and their links to major evolutionary innovations and transitions.

Authors: Xin Qiao; Shaoling Zhang; Andrew H Paterson
Journal: Comput Struct Biotechnol J Date: 2022-06-15 Impact factor: 6.155

10. Spatial Genomic Resource Reveals Molecular Insights into Key Bioactive-Metabolite Biosynthesis in Endangered Angelica glauca Edgew.

Authors: Amna Devi; Romit Seth; Mamta Masand; Gopal Singh; Ashlesha Holkar; Shikha Sharma; Ashok Singh; Ram Kumar Sharma
Journal: Int J Mol Sci Date: 2022-09-21 Impact factor: 6.208