
Chromosome-level reference genome of X12, a highly virulent race of the soybean cyst nematode Heterodera glycines.

Yun Lian¹, He Wei¹, Jinshe Wang¹, Chenfang Lei¹, Haichao Li¹, Jinying Li¹, Yongkang Wu¹, Shufeng Wang¹, Hui Zhang¹, Tingfeng Wang¹, Pei Du¹, Jianqiu Guo², Weiguo Lu¹.

Abstract

Soybean cyst nematode (SCN, Heterodera glycines) is a major pest of soybean that is spreading across major soybean production regions worldwide. Increased SCN virulence has recently been observed in both the United States and China. However, no study has reported a genome assembly for H. glycines at the chromosome scale. Herein, the first chromosome-level reference genome of X12, an unusual SCN race with high infection ability, is presented. Using whole-genome shotgun (WGS) sequencing, Pacific Biosciences (PacBio) sequencing, Illumina paired-end sequencing, 10X Genomics linked reads and high-throughput chromatin conformation capture (Hi-C) genome scaffolding techniques, a 141.01-megabase (Mb) assembled genome was obtained with scaffold and contig N50 sizes of 16.27 Mb and 330.54 kilobases (kb), respectively. The assembly showed high integrity and quality, with over 90% of Illumina reads mapped to the genome. The assembly quality was evaluated using Core Eukaryotic Genes Mapping Approach and Benchmarking Universal Single-Copy Orthologs. A total of 11,882 genes were predicted using de novo, homolog and RNAseq data generated from eggs, second-stage juveniles (J2), third-stage juveniles (J3) and fourth-stage juveniles (J4) of X12, and 79.0% of homologous sequences were annotated in the genome. These high-quality X12 genome data will provide valuable resources for research in a broad range of areas, including fundamental nematode biology, SCN-plant interactions and co-evolution, and also contribute to the development of technology for overall SCN management.


Keywords: Heterodera glycines; Soybean cyst nematode; X12; chromosome scale; evolution; genome assembly

Substances：
Glycine

Year: 2019 PMID： 31339217 PMCID： PMC6899682 DOI： 10.1111/1755-0998.13068

Source DB: PubMed Journal: Mol Ecol Resour ISSN： 1755-098X Impact factor: 7.090

INTRODUCTION

Soybean cyst nematode (SCN) has become a major pest in soybean (Glycine max Merr.) worldwide, posing a serious threat to the sustainability of the soybean industry (Kim et al., 2016; Koenning & Wrather, 2010; Woo et al., 2014). SCN is estimated to cause annual yield losses of more than $1.2 billion in the United States (Koenning & Wrather, 2010) and more than $120 million in China (Wang, Zhao, & Chu, 2015). Moreover, increased virulence of SCN has been observed, and the dominant races appear to be shifting (Howland, Monnig, Mathesius, Nathan, & Mitchum, 2018; Hua et al., 2018; Lian et al., 2017). To some extent, this indicates that the ecological environment has caused evolution of nematode virulence, which is a serious threat to agro‐ecology and agricultural production. Classifying SCN populations by the published race scheme (Riggs & Schmitt, 1988) or the HG type test (Niblack et al., 2002) based on their virulence phenotype involves assessing the reproductive potential of a given population on a set of soybean indicator lines. Currently, planting SCN‐resistant cultivars is the primary method of controlling the nematode (Mitchum, 2016; Mitchum, Wrather, Heinz, Shannon, & Danekas, 2007; Niblack, Colgrove, Colgrove, & Bond, 2008). Because SCN‐resistant cultivars can invoke a defence against SCN population, a solid understanding of SCN genome information is the basis for analysing the mechanisms underlying pathogenicity and breeding for new SCN‐resistant cultivars (Gardner, Heinz, Wang, & Mitchum, 2017; Kadam et al., 2016; Patil et al., 2019). Race X12 was isolated from a soybean field heavily infected by SCN in Shanxi Province, China. 
To date, this race is able to successfully parasitize all resistant soybean germplasm tested, including the four indicator lines of the race scheme (Peking, Pickett, PI88788 and PI90763) (Riggs & Schmitt, 1988), the seven indicator lines of the HG type test (Peking, PI88788, PI90763, PI437654, PI 209332, PI 89772 and PI548316) (Niblack et al., 2002) and ZDD2315, the most promising elite resistant germplasm from China. Indeed, ZDD2315 is resistant to all SCN populations identified thus far, except for the newly identified race X12 (Lian et al., 2017). PI437654 is another elite resistant germplasm from the United States that is vulnerable to few natural SCN populations (Donald & Young, 2004; Jiao et al., 2015). Accordingly, X12 is thought to express additional or new virulence factors compared with other races, and it constitutes a potentially serious threat to soybean production, especially in China. Overall, genetic and genomic information for X12 is crucial for understanding the evolution of SCN parasitism genes and breeding additional resistant cultivars. Genome sequencing of the free‐living nematode Caenorhabditis elegans (The C. elegans sequencing consortium, 1998) and the parasitic nematodes Meloidogyne hapla (Opperman et al., 2008) and Globodera rostochiensis (Akker et al., 2016) has provided reference genomes that can be utilized for comparison with parasitic nematodes. The genetic map of SCN was reported in 2005 with 10 linkage groups (Atibalentja et al., 2005). Although a draft genome sequence for SCN was recently published with 738 contigs in the genome, these contigs were not successfully assembled into chromosomes (Masonbrink et al., 2019). Genome sequencing of the X12 race is extremely important for our understanding of SCN virulence genes. In this study, PacBio sequencing, 10X Genomics sequencing and Hi‐C were applied to assemble the genome of the newly reported SCN race X12. 
The sequence information provided in this study combined with other published nematode genomes will allow for comparative genomic approaches to study fundamental nematode biology, gene function, nematode parasitism and evolution. As plant–parasitic nematodes are among the most damaging and difficult‐to‐control agricultural pests, available genome sequences will help scientists meet the current and future worldwide demands for food and bioenergy by providing powerful information for the development of new control paradigms and by minimizing crop losses.

MATERIALS AND METHODS

Selection of individuals for sequencing

The genome of Heterodera glycines is challenging to sequence and assemble because these animals are dioecious with exceptionally high levels of population heterozygosity (Masonbrink et al., 2019; reviewed by Jones et al., 2013). To reduce this heterozygosity before assembly, H. glycines race X12 was first purified using ZDD2315, an elite resistant soybean germplasm in China (Lu, Gai, Zheng, & Li, 2006) that is highly resistant to all races detected in the SCN survey in the Huang‐Huai Valleys (Lian et al., 2016), except for race X12, to which it is highly susceptible. X12 (Hg type 1.2.3.4.5.6.7) was grown on ZDD2315 in a greenhouse at the Henan Academy of Agricultural Sciences. The starting culture was a single cyst selected from the X12 population, which was bulked for eight generations on ZDD2315 planted in steam‐pasteurized soil and grown with approximately 16 daylight hours at 28°C. The X12 cysts used for genome sequencing were cultivated from soil infected with the 8th‐generation purified X12 population.
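
As an illustration of how an Hg type designation such as 1.2.3.4.5.6.7 can be read, the following minimal sketch derives the type string from female counts on the seven indicator lines. The 10% female‐index threshold, the function name and all counts are assumptions made for this illustration; they are not data from this study.

```python
# Indicator lines in the order listed in the text for the HG type test.
INDICATOR_LINES = ["Peking", "PI88788", "PI90763", "PI437654",
                   "PI209332", "PI89772", "PI548316"]


def hg_type(females_on_line, females_on_susceptible, threshold=10.0):
    """Return an HG type string such as '1.2.3.4.5.6.7'.

    females_on_line: mean female counts per indicator line, in the same
    order as INDICATOR_LINES.
    females_on_susceptible: mean female count on the susceptible check.
    threshold: female index (%) at or above which a line counts as
    parasitized; the default of 10% is an assumption of this sketch.
    """
    numbers = []
    for i, count in enumerate(females_on_line, start=1):
        female_index = 100.0 * count / females_on_susceptible
        if female_index >= threshold:
            numbers.append(str(i))
    return ".".join(numbers) if numbers else "0"


# Hypothetical counts, for illustration only.
virulent_on_all = hg_type([80, 120, 60, 45, 110, 70, 130], 200)
partial = hg_type([50, 90, 5, 2, 8, 60, 75], 200)
print(virulent_on_all, partial)
```

With these made‐up counts, the first population reproduces on every indicator line, whereas the second reproduces on only four of them.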

DNA/RNA isolation

The forms of H. glycines include cysts (Figure 1a) and early (Figure 1b) (second‐stage juveniles, J2), middle (third‐stage juveniles, J3) and late (fourth‐stage juveniles, J4) life stages (as reviewed by Jones et al., 2013). Cyst nematodes moult to J2 in the egg; J2 is the dormant stage of the life cycle. J2, J3 and J4 are similar in morphology. Specimens for the four life stages of SCN were isolated from the X12 population. J2 were collected, followed by collection of J3 and J4 at 3, 8 and 15 days postinfection according to standard nematological methods (De Boer, Yan, Smant, Davis, & Baum, 1998). The samples were purified by sucrose gradient centrifugation (De Boer et al., 1996). Genomic DNA was extracted from approximately 20,000 eggs using MasterPure Complete DNA Purification Kit, and total RNA was extracted from 10,000 eggs or 300 J2/ J3/ J4 using Exiqon miRCURY RNA Isolation Kit.

Figure 1

The combination of cysts on soybean roots (a) and micrograph of Heterodera glycines (soybean cyst nematode) second‐stage juvenile (J2) (b) [Colour figure can be viewed at http://wileyonlinelibrary.com]

Genome sequencing

The DNA extracted from H. glycines was sheared with a sonication device (Bioruptor Pico) for paired‐end library construction. Libraries with 350‐base pair (bp) insert sizes were produced according to the instructions provided with the Illumina X Ten Library Preparation Kit. The Illumina HiSeq X Ten platform was used to generate 13.65 gigabases (Gb) of whole‐genome sequencing data, and the clean reads obtained from this process were employed for subsequent analyses. Construction of a 10X Genomics library produced 31.32 Gb of sequencing data. For PacBio library construction, H. glycines genomic DNA was sheared to ~20 kb, and filtered fragments were converted into the proprietary SMRTbell library using the PacBio DNA Template Preparation Kit. In total, 28.48 Gb of quality‐filtered data was obtained from PacBio sequencing.

Genome assembly and quality control

The genome size was estimated based on the k‐mer spectrum of de novo data. All raw reads from the PacBio platform were aligned to each other using ‘daligner’, executed by the main script of the falcon (v0.7) assembler. Overlapping reads and raw subreads were processed to generate consensus sequences, and error correction of the assembly was performed using the consensus‐calling algorithm Quiver (smrtlink_5.0.7). The assembly was then further corrected with the paired‐end clean reads from the Illumina platform using Pilon (v1.22), and the sequences obtained after this strict error correction were used for the subsequent scaffolding. The 10X Genomics scaffold extension was performed using fragScaff (v140324.1) software, in which the linked reads generated from the 10X Genomics library were aligned to the consensus sequence of the PacBio assembly. To obtain the superscaffolds, only consensus sequences with linked‐read support were used for assembly. To assess the accuracy of the assembled X12 genome, reads from a small‐fragment library were aligned to the assembled genome using bwa software (v0.7.8) (Burton et al., 2013; Li & Durbin, 2009; Rao et al., 2014; Yaffe & Tanay, 2011). Core Eukaryotic Genes Mapping Approach (CEGMA: http://korflab.ucdavis.edu/dataseda/cegma/; Parra, Bradnam, & Korf, 2007) analysis was performed to assess the completeness and continuity of the SCN genome (X12 race) assembly, along with six additional published genomes, based on a core eukaryotic gene (CEG) library with 248 conserved genes. In addition, the assembly was evaluated with Benchmarking Universal Single‐Copy Orthologs (BUSCO: http://busco.ezlab.org/; Simao, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015).
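
The k‐mer‐based size estimate mentioned at the start of this section can be sketched as follows. This is a minimal illustration of the general idea (total k‐mer count divided by the depth of the main peak), not the software used in this study. The function name, the error cut‐off and the toy histogram are assumptions.

```python
def estimate_genome_size(kmer_histogram, min_depth=3):
    """Estimate genome size from a k-mer depth histogram.

    kmer_histogram maps depth -> number of distinct k-mers seen at that
    depth. K-mers below min_depth are treated as sequencing errors and
    ignored (the cut-off of 3 is an assumption of this sketch).
    Returns (estimated_size, peak_depth).
    """
    solid = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth at which the largest number of distinct k-mers is observed.
    peak_depth = max(solid, key=solid.get)
    # Total number of k-mer occurrences among the retained k-mers.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram: low-depth error k-mers plus one main peak at depth 20.
toy_histogram = {1: 500, 2: 50, 19: 100, 20: 1000, 21: 100}
size, peak = estimate_genome_size(toy_histogram)
print(size, peak)
```

For a highly heterozygous population such as SCN, a real histogram would typically show an additional peak at about half the main depth, so the choice of peak would need more care than in this sketch.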

Repeat prediction

LTR_FINDER (v1.0.7), RepeatScout (v1.0.5) and RepeatModeler (v1.0.3) were used for de novo identification of repeat elements and for generating a repeat element database. This database was used in RepeatMasker (v4.07) to predict repeat elements. Putative repeats were further filtered on the basis of copy number.

Gene structure prediction

For gene structure prediction, de novo and homology‐based approaches were combined to predict protein‐coding genes in the SCN genome. For the homology‐based approach, gene sets from Ascaris suum (wormbase.WBPS6), Brugia malayi (ensembl.metazoa.v32), Caenorhabditis briggsae (ensembl.metazoa.v32), C. elegans (ensembl.metazoa.v32), Drosophila melanogaster (ensembl.metazoa.v32) and Onchocerca volvulus (ensembl.metazoa.v34) were used as queries to search against the SCN genome. For the de novo approach, Augustus (v3.2.3), GlimmerHMM (v3.0.4), snap (v2013.11.29) and Genscan (v1.0) were employed as engines to predict gene models. The gene prediction results derived from both methods were merged using GLEAN to generate a consensus gene set.

Functional annotation of protein‐coding genes

Translated coding sequences were aligned to known databases such as Swiss‐Prot (v20180824), Nr (v20180716), Pfam (v31.0), kegg (v20160503) and InterPro (v5.31‐70.0). We annotated all protein‐coding genes identified in this study by retrieving functional terms according to Swiss‐Prot, Nr, kegg, InterPro, GO and Pfam.
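
How per‐database and overall annotation rates relate can be sketched as follows: a gene counts as annotated overall if it has a hit in at least one database. The function name and the toy gene identifiers are hypothetical; the sketch only illustrates the bookkeeping and does not reproduce the annotation pipeline.

```python
def annotation_summary(hits_by_database, all_genes):
    """Return (per-database percentages, overall percentage annotated).

    hits_by_database maps a database name to the genes with a hit there.
    A gene is annotated overall if it appears in at least one database.
    """
    all_genes = set(all_genes)
    per_db = {
        db: 100.0 * len(set(genes) & all_genes) / len(all_genes)
        for db, genes in hits_by_database.items()
    }
    annotated = set().union(*hits_by_database.values()) & all_genes
    return per_db, 100.0 * len(annotated) / len(all_genes)


# Hypothetical gene identifiers, for illustration only.
genes = ["g1", "g2", "g3", "g4", "g5"]
hits = {
    "Swiss-Prot": {"g1", "g2"},
    "Pfam": {"g2", "g3"},
    "GO": {"g1"},
}
per_db, overall = annotation_summary(hits, genes)
print(per_db, overall)
```

Because the databases overlap, the overall rate is the size of the union of hits, not the sum of the per‐database rates.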

Chromosome preparation

Chromosome preparation was performed according to the method of Du et al. (2016). Briefly, eggs or J2 were collected and treated with colchicine (0.005 g/L) in 2‐ml tubes for 3–5 hr before fixation in a solution of 3:1 ethanol:acetic acid (v/v) for 2–3 days. The samples were suspended in 45% acetic acid and squashed on a slide. After freezing at −70°C overnight, the slides that contained mitotic chromosomes were dehydrated in 100% ethanol followed by DAPI (4′‐,6‐diamidino‐2‐phenylindole) staining, and the chromosomes were observed and photographed under a fluorescence microscope using 450–490 nm excitation.

Hi‐C scaffolding of the assembly to the chromosome level

The Hi‐C clean data were aligned to the primary assembly using bwa software. Only read pairs with both reads in the pair aligned to contigs were considered for scaffolding. The scaffolds (greater than 100 bp) were selected by Lachesis (v201701) to scaffold the assembly at the chromosome level.
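
The read‐pair filtering rule described above can be sketched as follows. This is a simplified illustration, not the Lachesis implementation. The function name and contig names are hypothetical, and skipping pairs that fall within a single contig is an additional assumption of the sketch.

```python
from collections import Counter


def count_contig_links(read_pairs):
    """Count Hi-C links between pairs of contigs.

    read_pairs: iterable of (contig_of_read1, contig_of_read2), where
    None marks an unaligned read. Pairs with an unaligned mate are
    dropped, mirroring the rule that both reads must align to contigs.
    Pairs within one contig are skipped (an assumption of this sketch:
    they carry no information for joining different contigs).
    """
    links = Counter()
    for contig1, contig2 in read_pairs:
        if contig1 is None or contig2 is None:
            continue
        if contig1 == contig2:
            continue
        # Store each contig pair in a fixed order so A-B equals B-A.
        links[tuple(sorted((contig1, contig2)))] += 1
    return links


# Hypothetical alignments, for illustration only.
pairs = [
    ("ctgA", "ctgB"),
    ("ctgB", "ctgA"),
    ("ctgA", None),
    ("ctgA", "ctgA"),
    ("ctgB", "ctgC"),
]
links = count_contig_links(pairs)
print(links)
```

Link counts of this kind are the raw signal that proximity‐guided scaffolders cluster and order; contigs on the same chromosome are expected to share more links than contigs on different chromosomes.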

Phylogenetic analysis and species divergence time estimation

To investigate the phylogenetic position of H. glycines, orthologous and paralogous groups were assigned by OrthoMCL using H. glycines and the following 11 species: Ascaris suum (pig roundworm), B. malayi, C. briggsae, C. elegans, D. melanogaster, Haemonchus contortus, Heterorhabditis bacteriophora, Loa loa, M. hapla, O. volvulus and Trichoplax adhaerens. Orthologous groups that contained only one gene for each species were represented by the gene encoding the longest protein sequence. Genes encoding protein sequences shorter than 50 amino acids were filtered out to exclude putative fragmented genes. All‐against‐all blastP was applied to identify similarities among the filtered protein sequences in these species with an E‐value cut‐off of 1e−5. muscle (Edgar, 2004) with default parameters was used to generate a multiple sequence alignment of the protein sequences in each single‐copy family. The alignments of each family were then concatenated to form a superalignment that was used for phylogenetic tree reconstruction using maximum‐likelihood methods (Guindon et al., 2010; Yang & Rannala, 2012). Species divergence time was estimated using mcmctree (http://abacus.gene.ucl.ac.uk/software/paml.html) in the paml software package. The calibration time points were the divergences between T. adhaerens and D. melanogaster (1147–713 million years ago), D. melanogaster and M. hapla (946–551 million years ago), and M. hapla and C. elegans (217.5–190.0 million years ago).
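
The length filter and the construction of the superalignment described above can be sketched as follows. The function names, species keys and toy sequences are hypothetical. The sketch assumes that every single‐copy family has already been aligned, so that all sequences within a family have the same length.

```python
def filter_proteins(proteins, min_length=50):
    """Drop proteins shorter than min_length amino acids."""
    return {name: seq for name, seq in proteins.items()
            if len(seq) >= min_length}


def concatenate_alignments(family_alignments, species):
    """Concatenate per-family alignments into one superalignment.

    family_alignments: list of dicts mapping species -> aligned sequence.
    Each family must contain every species, with equal aligned lengths.
    """
    parts = {sp: [] for sp in species}
    for alignment in family_alignments:
        lengths = {len(alignment[sp]) for sp in species}
        if len(lengths) != 1:
            raise ValueError("aligned sequences in a family must have equal length")
        for sp in species:
            parts[sp].append(alignment[sp])
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Toy alignments for two single-copy families, for illustration only.
species = ["H_glycines", "M_hapla", "C_elegans"]
families = [
    {"H_glycines": "MK-L", "M_hapla": "MKAL", "C_elegans": "MR-L"},
    {"H_glycines": "GT", "M_hapla": "G-", "C_elegans": "GT"},
]
superalignment = concatenate_alignments(families, species)
print(superalignment)
```

Each species ends up with one row of identical length, which is the form expected by tree‐building programs.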

RESULTS AND DISCUSSION

Genome summary

Multiple libraries with different insert sizes were constructed from DNA extracted from the eggs of the purified X12 population. In total, 95.22 Gb of sequencing data was generated, of which 13.65 Gb (96.81X coverage) was produced from Illumina reads, 28.48 Gb (201.97X coverage) from PacBio reads, 31.32 Gb (222.13X coverage) from 10X Genomics linked‐read libraries and 21.77 Gb (154.39X coverage) from the Hi‐C library (Table S1). The assembled genome is 141.01 Mb, with scaffold and contig N50 sizes of 16.27 Mb and 330.54 kb, respectively (Figure 2b). In addition, the sequencing results (SCN_Lian) were compared with the recently released sequencing results (SCN_Masonbrink) (Masonbrink et al., 2019) and with the genomes of the plant–parasitic nematode M. hapla (Opperman et al., 2008) and the free‐living nematode C. elegans (The C. elegans sequencing consortium, 1998) (Table 1). The genome size of SCN_Lian is 141.01 Mb, which is somewhat larger than, but comparable to, that of SCN_Masonbrink (123 Mb). Notably, SCN_Masonbrink did not assemble the genome of H. glycines at the chromosome scale, whereas SCN_Lian did. The BUSCO value of SCN_Lian is 53.4% compared with the 72% reported for SCN_Masonbrink, but the BUSCO value of SCN_Masonbrink is ~54% when analysed using the nematode database and the genomic data supplied by Masonbrink et al. Therefore, there is little difference in assembly quality between the genomes of SCN_Masonbrink and SCN_Lian. The GC content of SCN_Lian (36.89%) is similar to that of C. elegans (35.4%), whereas M. hapla has an unusually low GC content of 27.4%. SCN_Masonbrink annotated 29,769 genes, and SCN_Lian annotated 11,882 genes.
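
The coverage figures above follow from dividing each data volume by the assembled genome size. The following minimal sketch repeats that arithmetic using the values given in the text. The variable and function names are illustrative, and it is an assumption that the reported coverage was computed against the 141.01‐Mb assembly.

```python
# Values taken from the text.
GENOME_SIZE_MB = 141.01
DATA_GB = {
    "Illumina": 13.65,
    "PacBio": 28.48,
    "10X Genomics": 31.32,
    "Hi-C": 21.77,
}


def coverage(data_gb, genome_mb):
    """Sequencing depth (X) = bases sequenced / genome size."""
    return data_gb * 1000.0 / genome_mb


total_gb = sum(DATA_GB.values())
depths = {name: coverage(gb, GENOME_SIZE_MB) for name, gb in DATA_GB.items()}

for name, depth in depths.items():
    print(f"{name}: {depth:.2f}X")
print(f"total: {total_gb:.2f} Gb")
```

The recomputed depths agree with the reported values to within a few hundredths, and the four data volumes sum to the reported total.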

Figure 2

Chromatin conformation capture‐based improved assembly of the Heterodera glycines genome. (a) Postclustering heat map showing the density of Hi‐C interactions between scaffolds from the proximity‐guided assembly. (b) Statistics of the completeness of the hybrid de novo assembly of the X12 race genome. Listed are the assembled genome of ~141 Mb with scaffold and contig N50 sizes of 16.27 Mb and 330.54 kb. Also listed in the table are the size and number of N60, N70, N80 and N90 of contigs and scaffolds. (c) Clustering of scaffolds using Hi‐C data into pseudochromosome‐scale scaffolds. Listed are the 258 scaffolds of total length ~12 Mb used for clustering. Also listed in the table are the cluster numbers, the number of contigs and the reference length of contigs [Colour figure can be viewed at http://wileyonlinelibrary.com]

Table 1

Comparison of Heterodera glycines genome statistics with those of the plant–parasitic nematode Meloidogyne hapla and the free‐living nematode Caenorhabditis elegans

Statistic	H. glycines (SCN‐Masonbrink)	H. glycines (SCN‐Lian)	M. hapla	C. elegans
Sequencing material	Inbred population TN10 (Hg type 1.2.6.7)	Natural population X12 (Hg type 1.2.3.4.5.6.7)
Genome size, Mb	123.85	141.01	54	100
Contigs	738	889	3,452	N/A
Contig N50, bp	304,130	330,544	N/A	N/A
Scaffolds	N/A	267	1,523	N/A
Scaffold N50, bp	N/A	16,265,615	83,645	17,494,000
Assembled, bp	N/A	141,354,287	53,578,246	100,267,623
Sequence coverage, %	N/A	98.33	99.2	100
Complete BUSCO, %	72	53.4	59	99.6
G+C, %	N/A	36.89	27.4	35.4
Annotated genes	29,769	11,882	14,420	20,060
Repeats, % of genome	34	51.10	17	16.5
Identified SNPs	1,619,134	247,046	N/A	N/A
Chromosomes	N/A	18	16	6
Chromosome‐level assembly	N/A	Yes	N/A	Yes

Abbreviation: SCN, soybean cyst nematode.

The data quality control results are shown in Tables S2–S4 and Figures S1 and S2. For the PacBio polymerase reads, the read number was 2,080,111, with a mean read length of 13,703 bp and a read length N50 of 23,355 bp. For the inserts, the read number was 2,080,111, with a mean read length of 9,875 bp and a read length N50 of 14,429 bp. For the subreads, the read number was 3,179,171, with a mean read length of 8,948 bp and a read length N50 of 12,988 bp. According to bwa software, the mapping rate of all small‐fragment reads to the genome was approximately 90.72%, and the coverage rate was approximately 98.33% (Table S5); thus, the reads show good agreement with the assembled genome.
After sorting by chromosome coordinates, removing repeated sequences and performing single nucleotide polymorphism (SNP) calling on the bwa alignment results, 247,046 SNPs were obtained, with 0.213% SNP heterozygosity and 0.0024% SNP homozygosity based on SAMtools (http://samtools.sourceforge.net/) (Table S6); therefore, the genome assembly has high single‐base accuracy. In addition, the GC content and average depth of the assembled genome were calculated in nonoverlapping 10‐kb windows. The results showed that the GC content of the windows is concentrated in a single region around 40%, without apparent separation into distinct clusters, which indicates that the genome was not contaminated by foreign sources (Table S7 and Figure S3). The results of CEGMA analysis indicated that the assembly was largely complete, with a mapping rate of 86.29% (a total of 214 genes) (Table S8). In the BUSCO evaluation, 53.4% of 978 homologous single‐copy genes were assembled as complete single‐copy genes (Table S9). Notably, only 53.4% of these genes are single copy in the H. glycines assembly according to the BUSCO analysis, with 3.7% duplicated. For comparison, the BUSCO results for SCN_Masonbrink indicate that 56% of the genes in H. glycines are single copy, with 16% duplicated (Masonbrink et al., 2019). Results of repeat prediction showed that the X12 genome contains 51.10% repeat sequences. Repetitive sequence statistics and classification results are shown in Tables S10 and S11 and Figure S4. The genome of H. glycines is diploid and contains repeated sequences with higher nucleotide divergence (19.21%) than the genomes of Meloidogyne species, which are polyploid and consist of duplicated regions with low nucleotide divergence (~8%) (Abad et al., 2008; Blanc‐Mathieu et al., 2017; Sato et al., 2018; Szitenberg et al., 2017). 
Gene structure prediction identified 11,882 protein‐coding genes, with a mean coding sequence (CDS) length of 1,233.92 bp and 8.3 exons per gene (Table S12 and Figure S5). The lengths of genes, CDSs, exons and introns of SCN are comparable to those of the genomes used for homology‐based prediction (Table S13 and Figure S6). In addition, noncoding RNA genes were predicted in the SCN genome, with total lengths of 17,688 bp for ribosomal RNA (rRNA), 46,685 bp for transfer RNA (tRNA), 39,375 bp for microRNA (miRNA) and 21,549 bp for snRNA genes (Table S14). Based on functional annotation of protein‐coding genes, 64.5% (7,663), 76.5% (9,093), 60% (7,126), 70.7% (8,405), 49.1% (5,840) and 61.5% (7,303) of genes are annotated in Swiss‐Prot, Nr, kegg, InterPro, GO and Pfam, respectively. The four life stages of SCN were isolated and then mixed before sequencing for genome annotation. In total, 9,383 protein‐coding genes (79.0%) with conserved functional motifs and functional terms were successfully annotated (Table S15 and Figure S7). The distribution of genes, GC content, long terminal repeats (LTRs), long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), tRNAs, miRNAs, snRNAs and rRNAs on the X12 chromosomes is shown in Figure 3.
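
The windowed GC calculation used for the contamination check can be sketched as follows. The function name and the toy sequence are assumptions; the sketch uses nonoverlapping windows and ignores ambiguous bases. An assembly free of foreign DNA is expected to give window values that form a single cluster.

```python
def gc_by_window(sequence, window=10_000):
    """Return the GC fraction of consecutive nonoverlapping windows.

    Bases other than A, C, G and T (for example N) are excluded from
    the denominator; windows without any called base are skipped.
    """
    values = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        if called == 0:
            continue
        gc = chunk.count("G") + chunk.count("C")
        values.append(gc / called)
    return values


# Toy 60-bp sequence with a 20-bp window, for illustration only.
toy = "GGCC" * 5 + "ATAT" * 5 + "ACGT" * 5
gc_values = gc_by_window(toy, window=20)
print(gc_values)
```

In practice, each window's GC value would be plotted against its average read depth, so that a second cluster at a different GC content or depth stands out.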

Figure 3

The genome characteristics of Heterodera glycines. Circos plot showing the genomic features. 1u = 40 kb; the small scale represents 5u and the large scale represents 25u. From outer to inner circles: Track a: nine chromosomes of the genome; Track b: gene distribution in nine chromosomes; Track c: GC content distribution in nine chromosomes; Track d: LTR distribution in nine chromosomes; Track e: LINE distribution in nine chromosomes; Track f: SINE distribution in nine chromosomes; Track g: tRNA located on chromosomes; Track h: miRNA located on chromosomes; Track i: snRNA located on chromosomes; Track j: rRNA located on chromosomes [Colour figure can be viewed at http://wileyonlinelibrary.com]

There are some differences between the results for SCN_Lian and SCN_Masonbrink, such as the number of annotated genes. The possible reasons are as follows. First, there were differences in the sequencing technologies used. For SCN_Masonbrink, PacBio long‐read technology was mainly used, whereas combined Illumina short‐read and PacBio long‐read technologies were employed for SCN_Lian. Second, there were differences in the materials sequenced. The inbred population TN10 (Hg type 1.2.6.7) was used for SCN_Masonbrink, but the natural population X12 (Hg type 1.2.3.4.5.6.7), which is the most virulent SCN population identified to date, was utilized for SCN_Lian. The differences in the pathogenicity of these populations may also be reflected in the differing proportions of single‐copy (S) genes (50.4% and 45.3% in SCN_Lian and SCN_Masonbrink, respectively) and duplicated (D) genes (2.3% and 8.7% in SCN_Lian and SCN_Masonbrink, respectively) in the BUSCO results (Table 2). Third, different annotation methods were applied. Gene annotations were performed using Braker for SCN_Masonbrink with an unmasked assembly, which annotated 29,769 genes including 12,357 expressed repetitive elements and showed that the H. glycines genome has a significant number of repeats, at 34% of the genome. 
To avoid inflating the gene count with false positives arising from repeats during gene annotation, repeat masking was performed before structure annotation for SCN_Lian, as has been done in many other studies (Xu et al., 2013; Zhang et al., 2019). To obtain more comprehensive and accurate repeat sequences, both homologous sequence alignment and ab initio prediction were performed. Ultimately, 11,882 annotated genes and 51.10% nonredundant repeat sequences were obtained.

Table 2

Genomic statistics and Benchmarking Universal Single‐Copy Orthologs analysis using nematode database

Scientific name	Version	Genome size	Gene number	BUSCO genome
Caenorhabditis_elegans	ensembl.metazoa.v32	98M		C:98.6% (S:98.0%, D:0.6%), F:0.8%, M:0.6%, n:982
Caenorhabditis_briggsae	ensembl.metazoa.v32	106M		C:97.7% (S:97.0%, D:0.7%), F:1.5%, M:0.8%, n:982
Ascaris_suum	ensembl.metazoa.v32	265M		C:89.8% (S:88.0%, D:1.8%), F:6.6%, M:3.6%, n:982
Brugia_malayi	wormbase.WBPS6	93M		C:96.6% (S:96.0%, D:0.6%), F:2.4%, M:1.0%, n:982
Onchocerca_volvulus	ensembl.metazoa.v32	94M		C:97.6% (S:97.3%, D:0.3%), F:1.7%, M:0.7%, n:982
Meloidogyne hapla		54M	14420	C:59.9% (S:58.7%, D:1.2%), F:9.4%, M:30.7%, n:982
Meloidogyne incognita		184M	43718/45351	C:61.8% (S:25.8%, D:36.0%), F:8.1%, M:30.1%, n:982
Heterodera glycines (SCN‐Lian)		135M	11882	C:52.7% (S:50.4%, D:2.3%), F:9.6%, M:37.7%, n:982
H. glycines (SCN‐Masonbrink)		129M	29769	C:54.0% (S:45.3%, D:8.7%), F:10.4%, M:35.6%, n:982

Abbreviation: SCN, soybean cyst nematode.
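
The BUSCO summaries in Table 2 use the notation C (complete), S (single copy), D (duplicated), F (fragmented), M (missing) and n (number of genes searched), where C = S + D and C + F + M = 100%. The following minimal sketch parses such a string and checks both identities. The function names and the rounding tolerance are assumptions of the sketch.

```python
import re

# Accepts both "D:2.3%" and "D2.3%" (colon optional after D).
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\s*\(S:(?P<S>[\d.]+)%,\s*D:?(?P<D>[\d.]+)%\),\s*"
    r"F:(?P<F>[\d.]+)%,\s*M:(?P<M>[\d.]+)%,\s*n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Parse a BUSCO short summary string into a dict of numbers."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    fields = {key: float(value) for key, value in match.groupdict().items()}
    fields["n"] = int(fields["n"])
    return fields


def is_consistent(fields, tol=0.11):
    """Check C = S + D and C + F + M = 100, allowing for rounding."""
    return (abs(fields["S"] + fields["D"] - fields["C"]) <= tol
            and abs(fields["C"] + fields["F"] + fields["M"] - 100.0) <= tol)


# Summary strings for the two H. glycines assemblies, as given in Table 2.
lian = parse_busco("C:52.7% (S:50.4%, D:2.3%), F:9.6%, M:37.7%, n:982")
masonbrink = parse_busco("C:54.0% (S:45.3%, D:8.7%), F:10.4%, M:35.6%, n:982")
print(is_consistent(lian), is_consistent(masonbrink))
```

Parsed this way, the two H. glycines assemblies can be compared directly on their S and D proportions.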


Chromosome observation and Hi‐C scaffolding

The chromosome number of H. glycines during meiosis was observed under a fluorescence microscope using 450–490 nm excitation (2n = 18) (Figure 4). The Illumina‐based Hi‐C data were remapped to the PacBio assembly, clustering into nine pseudomolecules using the Proximo Hi‐C scaffolding pipeline (Figure 2a). The Hi‐C scaffolding was able to anchor and order with high confidence all of the 258 scaffolds into nine pseudomolecules. The scaffold sizes ranged from 7.6 to 185 Mb with an N50 of 16.3 Mb (Figure 2c). The overall scaffolding rate was 91.2% (Table S16).
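
The N50 statistic quoted here and elsewhere in the text can be computed as in the following minimal sketch: sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the assembly. The function name and the toy lengths are illustrative. The same function yields N60 to N90 by changing the fraction.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length of an assembly (N50 when fraction is 0.5).

    Nx is the length L such that sequences of length >= L together
    contain at least the given fraction of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths, for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_lengths)
n90 = nx(toy_lengths, fraction=0.9)
print(n50, n90)
```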

Figure 4

Observation of the chromosome of Heterodera glycines in meiosis under fluorescence microscope with 450–490 nm excitation light (2n = 18) [Colour figure can be viewed at http://wileyonlinelibrary.com]

Evolutionary analysis

A total of 25,535 gene family clusters were constructed. The genes used for gene family clustering in each species are shown in Table S17. In total, 482 single‐copy gene families are common to all 12 species. The distribution of single‐copy orthologs, multiple‐copy orthologs, genes unique to H. glycines and other orthologs in different species is shown in Table S18. Protein sequences from the 482 single‐copy gene families were used for phylogenetic tree reconstruction, and the estimation of divergence time was performed (Figure 5) with mcmctree software. Synteny diminished as phylogenetic relatedness declined, and our results showed that the divergence time between H. glycines and M. hapla is approximately 143.6 million years. Thus, the divergence of H. glycines preceded that of the model nematode C. elegans. Moreover, because plant parasitism is a lifestyle found in three different clades in the nematode tree of life, plant parasitism appeared at least three times independently during the evolution of nematodes (Danchin & Perfus‐Barbeoch, 2009). It was also inferred that plant parasites evolved from fungus‐feeding nematodes, according to previous results that showed consistent coclustering of plant parasites with fungivorous species (Holterman et al., 2006).

Figure 5

Phylogenetic relationships of species related to Heterodera glycines. Phylogenetic tree based on the single‐copy gene families shared by the 12 species, representing the relatedness of the species. The red box denotes the sequenced species H. glycines. Node labels represent node ages [Colour figure can be viewed at http://wileyonlinelibrary.com]

CONCLUSIONS

The chromosome‐level reference genome sequence of X12, a notable race of the SCN H. glycines, has immediate and important implications for research on plant nematodes and also for broader biological studies. In total, approximately 95.22 Gb of sequencing data were generated using a combination of long‐ and short‐read techniques. The assembled genome contains 267 scaffolds, with an N50 scaffold length of 16.27 Mb and a total length of 141.01 Mb. The assembly was estimated to contain 86.29% of core genes according to CEGMA analysis and 53.4% of complete single‐copy genes according to BUSCO analysis. Compared with the previously published H. glycines genome (Masonbrink et al., 2019), the assembly has a 1.09‐fold greater N50 contig length, although it contains 1.2‐fold more contigs, and it is scaffolded to the chromosome level. The mapping rate for reads back to the assembled genome is approximately 90.72%. A total of 11,882 genes were predicted, assisted by RNA sequencing data, and 79.0% of them were functionally annotated. This high‐quality genome assembly of H. glycines will help to enable the identification of virulence‐related genes.

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

AUTHOR CONTRIBUTIONS

Yun Lian and Weiguo Lu conceived and designed the experiments. He Wei, Chenfang Lei and Jinying Li collected the cyst samples and managed the data. Jinshe Wang performed the data analysis. Haichao Li collected the sequencing data. Jianqiu Guo collected the soil sample containing X12 from the soybean field. Pei Du participated in the chromosome preparation. Yongkang Wu, Shufeng Wang, Hui Zhang and Tingfeng Wang revised the manuscript.

37 in total

Review 1. Molecular phylogenetics: principles and practice.

Authors: Ziheng Yang; Bruce Rannala
Journal: Nat Rev Genet Date: 2012-03-28 Impact factor: 53.242

Review 2. Soybean Resistance to the Soybean Cyst Nematode Heterodera glycines: An Update.

Authors: Melissa G Mitchum
Journal: Phytopathology Date: 2016-09-06 Impact factor: 4.025

3. Complete Characterization of the Race Scheme for Heterodera glycines.

Authors: R D Riggs; D P Schmitt
Journal: J Nematol Date: 1988-07 Impact factor: 1.402

4. Sequence and genetic map of Meloidogyne hapla: A compact nematode genome for plant parasitism.

Authors: Charles H Opperman; David M Bird; Valerie M Williamson; Dan S Rokhsar; Mark Burke; Jonathan Cohn; John Cromer; Steve Diener; Jim Gajan; Steve Graham; T D Houfek; Qingli Liu; Therese Mitros; Jennifer Schaff; Reenah Schaffer; Elizabeth Scholl; Bryon R Sosinski; Varghese P Thomas; Eric Windham
Journal: Proc Natl Acad Sci U S A Date: 2008-09-22 Impact factor: 11.205

5. A genetic linkage map of the soybean cyst nematode Heterodera glycines.

Authors: N Atibalentja; S Bekal; L L Domier; T L Niblack; G R Noel; K N Lambert
Journal: Mol Genet Genomics Date: 2005-04-06 Impact factor: 3.291

Review 6. Genome sequence of the nematode C. elegans: a platform for investigating biology.

Authors:
Journal: Science Date: 1998-12-11 Impact factor: 47.728

7. Survey of Heterodera glycines Population Densities and Virulence Phenotypes During 2015-2016 in Missouri.

Authors: Amanda Howland; Nick Monnig; Jeff Mathesius; Manjula Nathan; Melissa G Mitchum
Journal: Plant Dis Date: 2018-10-23 Impact factor: 4.438

Review 8. Advancements in breeding, genetics, and genomics for resistance to three nematode species in soybean.

Authors: Ki-Seung Kim; Tri D Vuong; Dan Qiu; Robert T Robbins; J Grover Shannon; Zenglu Li; Henry T Nguyen
Journal: Theor Appl Genet Date: 2016-10-28 Impact factor: 5.699

9. Hybridization and polyploidy enable genomic plasticity without sex in the most devastating plant-parasitic nematodes.

Authors: Romain Blanc-Mathieu; Laetitia Perfus-Barbeoch; Jean-Marc Aury; Martine Da Rocha; Jérôme Gouzy; Erika Sallet; Cristina Martin-Jimenez; Marc Bailly-Bechet; Philippe Castagnone-Sereno; Jean-François Flot; Djampa K Kozlowski; Julie Cazareth; Arnaud Couloux; Corinne Da Silva; Julie Guy; Yu-Jin Kim-Jo; Corinne Rancurel; Thomas Schiex; Pierre Abad; Patrick Wincker; Etienne G J Danchin
Journal: PLoS Genet Date: 2017-06-08 Impact factor: 5.917

10. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937
