
Draft genome of the aardaker (Lathyrus tuberosus L.), a tuberous legume.

Pádraic J Flood¹, Minou Nowrousian², Bruno Huettel³, Christian Woehle³, Kerstin Becker⁴, Tassilo Erik Wollenweber⁴, Dominik Begerow⁵, Christopher Grefen⁶.

Abstract

OBJECTIVES: Lathyrus tuberosus is a nitrogen-fixing member of the Fabaceae which forms protein-rich tubers. To aid future domestication programs for this legume plant and facilitate evolutionary studies of tuber formation, we have generated a draft genome assembly based on Pacific Biosciences sequence reads. DATA DESCRIPTION: Genomic DNA from L. tuberosus was sequenced with PacBio's HiFi sequencing chemistry generating 12.8 million sequence reads with an average read length of 14 kb (approximately 180 Gb of sequence data). The reads were assembled to give a draft genome of 6.8 Gb in 1353 contigs with an N50 contig length of 11.1 Mb. The GC content of the genome assembly was 38.3%. BUSCO analysis of the genome assembly indicated a genome completeness of at least 96%. The genome sequence will be a valuable resource, for example, in assessing genomic consequences of domestication efforts and developing marker sets for breeding programs. The L. tuberosus genome will also aid in the analysis of the evolutionary history of plants within the nitrogen-fixing Fabaceae family and in understanding the molecular basis of tuber evolution.


Keywords: Fabaceae; Genome sequencing; Lathyrus tuberosus; PacBio sequencing; tuber formation


Year: 2022 PMID: 36057561 PMCID: PMC9441078 DOI: 10.1186/s12863-022-01083-5

Source DB: PubMed Journal: BMC Genom Data ISSN: 2730-6844

Objective

Our current modus operandi for adapting food production to changing climates is to continuously improve existing crops for projected future environments. This is a sensible strategy and is of great importance for future food security. A complementary approach, which receives little attention, is to select species with properties we deem useful for future food production and convert them into crops. The lack of research into this approach hampers our ability to make full use of the functional and biological diversity that surrounds us. In addition to diversifying our crop portfolio to improve food supply, one key requirement for a sustainable future is to move away from animal-derived protein towards plant-derived protein by growing more protein-rich crops. Peas, beans, and nuts are obvious candidates. However, wider adoption of protein-rich plant-based diets also requires alternatives to beans and nuts. Protein-rich tubers are good candidates, and one plant that produces such tubers is Lathyrus tuberosus, a nitrogen-fixing member of the Fabaceae which produces tubers with up to 20% protein [1]. L. tuberosus is native to Eurasia and North Africa, with a wide geographical distribution extending from Mediterranean to boreal environments. For centuries, it was cultivated or harvested from the wild on a small to medium scale throughout its range for food (leaves, seeds, and tubers) [2-5]; however, large-scale adoption of L. tuberosus as a crop was hampered by poor yields. L. tuberosus is diploid with seven chromosome pairs (2n = 14) and an estimated genome size of 6 Gb [6, 7]. To aid future domestication programs for this legume, we have generated a draft genome assembly based on Pacific Biosciences (PacBio) HiFi reads.

Data description

L. tuberosus seeds were obtained commercially from Vreeken’s Zaden (Dordrecht, Netherlands) and grown in Wageningen (Netherlands). Formal identification of the plant material was performed by one of the authors (PJF). A tuber was sent to the botanical garden of the Ruhr-University Bochum (Germany), where it is maintained as a living collection (sample ID: Lathyrus tuberosus NL20). For DNA extraction, shoot tips were collected from the plants grown in Wageningen in September 2020 and immediately frozen in liquid nitrogen. High molecular weight genomic DNA was extracted with the NucleoBond HMW kit (Macherey-Nagel, Germany). PacBio HiFi sequencing libraries were prepared with the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, USA), size-fractionated with the SageELF system (Sage Science, USA), and sequenced on a PacBio Sequel II in six SMRT cells. This resulted in approximately 12.8 million HiFi reads with an average read length of 14 kb (Table 1, Data set 1, [9]), providing approximately 30-fold coverage of the L. tuberosus genome, whose size was previously estimated at 6 Gb by flow cytometry [7].

The sequence reads were assembled with hifiasm [11], and the purge_haplotigs tool was subsequently used to remove duplicated contigs [12]. The resulting 1668 contigs were searched for putative chloroplast and mitochondrial sequences using BLASTN [13] against the chloroplast sequence of Lathyrus sativus and the mitochondrial sequence of Pisum sativum [14, 15]. The PacBio sequence reads were mapped against the putative chloroplast and mitochondrial contigs with GraphMap [16], and the mapped reads, together with the L. sativus or P. sativum organelle genomes, were used for similarity-assisted reassembly of these contigs with AlignGraph2 [17], resulting in one putative chloroplast contig and five putative mitochondrial contigs. The final L. tuberosus genome assembly (including the reassembled chloroplast and mitochondrial contigs) consists of 1353 contigs with a total length of 6.8 Gb, a contig N50 of 11.1 Mb and a GC content of 38.3% (Table 1, Data file 1, Data set 2, [8, 10]), including five mitochondrial contigs (total length 476 kb, GC content 45.2%) and a single chloroplast contig (total length 124 kb, GC content 35.2%) (Table 1, Data file 2, [8]).
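The 30-fold coverage figure follows from simple arithmetic on the sequencing numbers: total yield is roughly the read count times the mean read length, and fold coverage is that yield divided by the genome size. The Python sketch below reproduces the calculation. The function names are hypothetical helpers for illustration, and because the read count and mean length quoted in the text are rounded, the result is only approximate.

```python
def sequencing_yield_gb(n_reads: int, mean_read_length_bp: float) -> float:
    """Approximate total sequenced bases in gigabases (1 Gb = 1e9 bp)."""
    return n_reads * mean_read_length_bp / 1e9


def fold_coverage(yield_gb: float, genome_size_gb: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return yield_gb / genome_size_gb


# Rounded figures from the text: ~12.8 million HiFi reads, ~14 kb mean length,
# and a flow-cytometry genome size estimate of 6 Gb.
yield_gb = sequencing_yield_gb(12_800_000, 14_000)
coverage = fold_coverage(yield_gb, 6.0)
print(f"yield ~{yield_gb:.1f} Gb, coverage ~{coverage:.1f}x")
# prints: yield ~179.2 Gb, coverage ~29.9x
```

Using the 6.8 Gb assembly size instead of the 6 Gb flow-cytometry estimate, the same yield corresponds to roughly 26-fold coverage.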

Table 1

Overview of data files/data sets

Label	Name of data file/data set	File types (file extension)	Data repository and identifier (DOI or accession number)
Data file 1	Basic statistics of the L. tuberosus genome assembly	Spreadsheet (.xlsx)	Figshare, 10.6084/m9.figshare.19535053.v2 [8]
Data file 2	Basic statistics of the L. tuberosus mitochondrial and chloroplast contigs	Spreadsheet (.xlsx)	Figshare, 10.6084/m9.figshare.19535053.v2 [8]
Data file 3	Short BUSCO summary of the L. tuberosus genome assembly	Spreadsheet (.xlsx)	Figshare, 10.6084/m9.figshare.19535053.v2 [8]
Data set 1	PacBio sequence reads of L. tuberosus genomic DNA	fastq files (.fastq)	NCBI Sequence Read Archive (https://identifiers.org/ncbi/insdc.sra:SRR18139057) [9]
Data set 2	Genome assembly of L. tuberosus	fasta file (.fna)	NCBI GenBank (https://identifiers.org/ncbi/bioproject:PRJNA810344) [10]
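Data file 1 and Data file 2 summarise statistics such as contig N50 and GC content. As a minimal sketch of how these two numbers are conventionally defined (not necessarily the tool the authors used), the Python below parses FASTA text, computes N50 as the length of the contig at which the cumulative length of contigs sorted from longest to shortest first reaches half of the total, and computes GC content over unambiguous A/C/G/T bases. Excluding N and other ambiguity codes from the denominator is an assumption here; some tools count them. The helper names `parse_fasta`, `n50` and `gc_content` are hypothetical.

```python
import io
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records as {sequence ID: sequence}; the ID is the first
    whitespace-delimited word of each '>' header line."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


def gc_content(sequences: Iterable[str]) -> float:
    """GC percentage over unambiguous bases only (N and IUPAC codes are skipped)."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100 * gc / (gc + at)


# Toy input for illustration only.
fasta_text = io.StringIO(">ctg1 example\nACGTAC\nGG\n>ctg2\nATAT\n")
contigs = parse_fasta(fasta_text)
lengths = [len(seq) for seq in contigs.values()]
print(lengths)                                  # [8, 4]
print(n50(lengths))                             # 8
print(round(gc_content(contigs.values()), 1))   # 41.7
```

The same functions could in principle be applied to the assembly FASTA in Data set 2, but for a 6.8 Gb file it would be more practical to stream contig lengths and base counts rather than hold every sequence in one dictionary.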

BUSCO (Benchmarking Universal Single-Copy Orthologs) analysis of the assembly showed 96–100% completeness, depending on the BUSCO lineage dataset used for the analysis (Table 1, Data file 3, [8]). Between 18 and 30% of BUSCO groups were duplicated, most likely due to unphased heterozygous regions (see Limitations). The L. tuberosus draft genome sequence will be a valuable resource in future domestication programs, e.g. for developing marker sets for breeding programs. In addition, the L. tuberosus genome will aid in the analysis of the evolutionary history of plants within the nitrogen-fixing Fabaceae family.
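BUSCO reports its short summary as a one-line notation in which C (complete) is split into S (complete, single-copy) and D (complete, duplicated), followed by F (fragmented), M (missing) and n (number of BUSCO groups searched). The sketch below parses such a line so that, for example, the duplicated percentage discussed here can be extracted from each summary in Data file 3 and checked for internal consistency. It assumes the layout used by recent BUSCO versions (any fields after n are ignored), and the example line is invented for illustration; it is not one of the results in Data file 3.

```python
import re
from dataclasses import dataclass

# Matches e.g. "C:97.0%[S:72.0%,D:25.0%],F:1.0%,M:2.0%,n:1000";
# anything after the n: field is ignored.
BUSCO_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


@dataclass(frozen=True)
class BuscoSummary:
    complete: float    # C, percent of BUSCO groups
    single: float      # S, complete and single-copy
    duplicated: float  # D, complete and duplicated
    fragmented: float  # F
    missing: float     # M
    n_groups: int      # n, groups in the lineage dataset

    def is_consistent(self, tol: float = 0.2) -> bool:
        """S + D should equal C, and C + F + M should be 100, up to rounding."""
        return (abs(self.single + self.duplicated - self.complete) <= tol
                and abs(self.complete + self.fragmented + self.missing - 100) <= tol)


def parse_busco_summary(line: str) -> BuscoSummary:
    """Extract the BUSCO short-summary fields from a single line of text."""
    match = BUSCO_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"no BUSCO summary found in {line!r}")
    return BuscoSummary(
        complete=float(match["C"]),
        single=float(match["S"]),
        duplicated=float(match["D"]),
        fragmented=float(match["F"]),
        missing=float(match["M"]),
        n_groups=int(match["n"]),
    )


# Invented example values, not taken from Data file 3.
summary = parse_busco_summary("C:97.0%[S:72.0%,D:25.0%],F:1.0%,M:2.0%,n:1000")
print(summary.duplicated)       # 25.0
print(summary.is_consistent())  # True
```

Using `search` rather than `match` lets the same function accept a summary line that still carries leading whitespace or a label, as it often does in a BUSCO short-summary text file.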

Limitations

The assembly still contains a relatively high proportion of duplicated BUSCO groups (up to 30%), most likely due to unphased heterozygous regions. This might also explain why the assembly (6.8 Gb) is larger than the previous flow-cytometry estimate (6 Gb) [7]. L. tuberosus is thought to be an obligate outcrosser, so obtaining homozygous material is likely to be challenging. Instead, the duplicated regions might be resolved in future studies, e.g. by single-cell sequencing of pollen grains (gametes) to generate a set of recombinant haploid genotypes that could be used to phase heterozygous loci (gamete binning [18]).
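To put the size discrepancy in proportion, the relative excess of the assembly over the flow-cytometry estimate can be worked out from the two rounded values given in the text. This is only a back-of-the-envelope figure: both values are rounded, flow-cytometry estimates carry their own uncertainty, and the calculation does not by itself show that the excess consists of unphased heterozygous sequence.

```latex
\[
\frac{6.8\,\text{Gb} - 6.0\,\text{Gb}}{6.0\,\text{Gb}}
  = \frac{0.8\,\text{Gb}}{6.0\,\text{Gb}}
  \approx 0.13
\]
```

That is, the assembly is roughly 13% (about 0.8 Gb) larger than the flow-cytometry estimate.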

Cited articles

1. Genome size and DNA base composition of geophytes: the mirror of phenology and ecology?

Authors: Pavel Veselý; Petr Bures; Petr Smarda; Tomás Pavlícek
Journal: Ann Bot Date: 2011-10-21

2. Localized hypermutation and associated gene losses in legume chloroplast genomes.

Authors: Alan M Magee; Sue Aspinall; Danny W Rice; Brian P Cusack; Marie Sémon; Antoinette S Perry; Sasa Stefanović; Dan Milbourne; Susanne Barth; Jeffrey D Palmer; John C Gray; Tony A Kavanagh; Kenneth H Wolfe
Journal: Genome Res Date: 2010-10-26

3. The Chromosomes of Lathyrus Tuberosus.

Authors: E L Fisk
Journal: Proc Natl Acad Sci U S A Date: 1931-09

4. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01

5. AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads.

Authors: Shien Huang; Xinyu He; Guohua Wang; Ergude Bao
Journal: Brief Bioinform Date: 2021-02-23

6. Discordant evolution of organellar genomes in peas (Pisum L.).

Authors: Vera S Bogdanova; Natalia V Shatskaya; Anatoliy V Mglinets; Oleg E Kosterin; Gennadiy V Vasiliev
Journal: Mol Phylogenet Evol Date: 2021-03-06

7. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm.

Authors: Haoyu Cheng; Gregory T Concepcion; Xiaowen Feng; Haowen Zhang; Heng Li
Journal: Nat Methods Date: 2021-02-01

8. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies.

Authors: Michael J Roach; Simon A Schmidt; Anthony R Borneman
Journal: BMC Bioinformatics Date: 2018-11-29

9. Gamete binning: chromosome-level and haplotype-resolved genome assembly enabled by high-throughput single-cell sequencing of gamete genomes.

Authors: José A Campoy; Hequan Sun; Manish Goel; Wen-Biao Jiao; Kat Folz-Donahue; Nan Wang; Manuel Rubio; Chang Liu; Christian Kukat; David Ruiz; Bruno Huettel; Korbinian Schneeberger
Journal: Genome Biol Date: 2020-12-29

10. Fast and sensitive mapping of nanopore sequencing reads with GraphMap.

Authors: Ivan Sović; Mile Šikić; Andreas Wilm; Shannon Nicole Fenlon; Swaine Chen; Niranjan Nagarajan
Journal: Nat Commun Date: 2016-04-15