Current status of structural variation studies in plants.

Yuxuan Yuan¹,², Philipp E Bayer¹, Jacqueline Batley¹, David Edwards¹.

Abstract

Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.

Keywords: DNA sequencing; breeding; gene expression; optical mapping; phenotypic variation; structural variation

Year: 2021 PMID: 34101329 PMCID: PMC8541774 DOI: 10.1111/pbi.13646

Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 9.803

Introduction

Structural variations (SVs) are genetic differences between individuals that can lead to gene loss, gene duplication and the generation of novel genes, thereby contributing to phenotypic variation within a species. An SV is defined as a region of DNA that has a change in sequence length, copy number, orientation or chromosomal location between individuals (Escaramis et al., 2015). Generally, an SV can be classified as a deletion, an insertion, a copy number variation (CNV), an inversion or a translocation. In contrast to single nucleotide polymorphisms (SNPs) and small indels (insertions and deletions), SVs are longer (generally defined as >50 bp) and can have a greater influence on gene expression and protein function than SNPs (Chiang et al., 2017). In early plant genomic studies, the limitations of the available technologies and the lack of high‐quality reference genome assemblies prevented the comprehensive exploration of SVs in plants. Many plants have large and complex genomes, with polyploidy occurring in up to 80% of plant species (Meyers and Levin, 2006), making the identification of SVs in plant genomes a challenge. Recent advances in genomic technologies, particularly long‐read sequencing and whole‐genome mapping, promise the production of high‐quality plant genome and pangenome assemblies and access to a broad range of SVs to assess their potential role in plant phenotypic variation. Although improvements in DNA sequencing, whole‐genome mapping and novel algorithms have made it feasible to characterize SVs on a genome‐wide scale with higher accuracy, reports of SV studies in plants remain limited. Significant effort and resources are needed to comprehensively decipher the association between SVs and agronomic traits in order to support plant improvement and, in turn, economies and food security. Here, we discuss the current progress and challenges of SV studies in plants and the potential to apply knowledge of SVs to improve crop varieties.

Limitations of early technologies for SV identification

Before the widespread use of molecular markers and DNA sequencing, SVs were characterized by microscopy at the karyotype level, with a resolution >3 Mb (Figure 1a) (Feuk et al., 2006). Due to the low throughput and limited resolution of microscopic observation, however, there are few novel SV studies using microscopic techniques, which are now mostly applied to confirm known SVs. Approximately 15 years ago, the advent of hybridization‐based microarray approaches made it possible to perform SV studies with a greater resolution and lower cost than microscopic methods. Two commonly used methods were array comparative genomic hybridization (array‐CGH) and SNP arrays.

Figure 1

Methods used to identify structural variations from the past to the present. The figure lists commonly used methods to identify SVs, from the early (a) microscope observation, (b) array comparative genomic hybridization and (c) SNP array to the current (d) DNA sequencing.

Array‐CGH can efficiently detect CNVs at multiple genomic loci (Figure 1b) and has been applied to diverse studies, including gene discovery, epigenetic modification and chromatin conformation (Bejjani and Shaffer, 2006). Nevertheless, array‐CGH cannot detect balanced SVs (e.g. reciprocal translocations and inversions) or the absolute copy number of a DNA segment, because it detects genetic imbalances between two individual genomes, that is, regions where one sample has more or less of a specific genetic material than the other (Escaramis et al., 2015). Additionally, array‐CGH is specifically designed for diploid individuals and is not sensitive to higher degrees of ploidy (>2 sets of chromosomes). A well‐assembled reference is also essential during the design of the array (Park et al., 2010). In contrast to array‐CGH, SNP arrays are more sensitive to allele‐specific CNVs and can help to identify large‐scale CNVs in diverse populations (Figure 1c; Alkan et al., 2011). However, SNP arrays provide a poor signal‐to‐noise ratio compared with array‐CGH due to the smaller target size (Hester et al., 2009). As with array‐CGH, SNP arrays cannot be used to detect insertions. The number of SVs detected depends on the density, or the presence and absence, of SNPs in the target genome regions. Moreover, SNP arrays were initially designed for diploid samples and struggle to characterize repeat‐rich and duplicated regions. The design of SNP arrays also depends on the quality of the reference genome assemblies. Furthermore, the breakpoints of SVs cannot be easily detected by SNP arrays or array‐CGH.
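The imbalance-based logic of array‐CGH can be illustrated with a minimal Python sketch. It assumes hypothetical per-probe fluorescence intensities for a test and a reference sample, and the ±0.3 log2 thresholds are illustrative choices, not values from the studies cited here. Because an inversion or a reciprocal translocation leaves the amount of DNA at each probe unchanged, such events yield a ratio near zero and stay invisible to this kind of analysis.

```python
import math

def log2_ratios(test_intensity, ref_intensity):
    """Per-probe log2(test / reference) intensity ratios."""
    return [math.log2(t / r) for t, r in zip(test_intensity, ref_intensity)]

def call_imbalances(ratios, gain=0.3, loss=-0.3):
    """Label each probe 'gain', 'loss' or 'neutral'.

    The thresholds are illustrative assumptions, not published cut-offs.
    """
    calls = []
    for ratio in ratios:
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls

# Hypothetical probe intensities along one chromosome.
test_sample = [1000, 1020, 1500, 1490, 500, 990]
reference = [1000, 1000, 1000, 1000, 1000, 1000]

calls = call_imbalances(log2_ratios(test_sample, reference))
print(calls)
```

Only relative differences between the two samples are reported, which is why absolute copy numbers cannot be recovered from the ratios alone.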

Current technologies for SV identification

With advances in DNA sequencing, whole‐genome analysis has become viable for a wide range of plant species. Examining whole genomes by DNA sequencing has permitted SV characterization at the nucleotide level, and the detection of inversions and translocations, as well as recombination breakpoints, has become more efficient. Initially, single‐end reads (sequenced only in one direction from a DNA strand) were used. With the expansion of sequencing methods, paired reads, sequenced from both forward and reverse orientations from complementary DNA strands, with a known approximate distance between the pairs, have been used to overcome the challenges of associating short single reads with regions of the genome. However, the short length (<600 bp) of these reads still poses challenges for the characterization of repetitive regions (Michael and VanBuren, 2020), and thus, the accuracy of SV detection based on short sequence reads is limited. Recent advances in long‐read sequencing and high‐throughput chromosome conformation capture (Hi‐C) technologies offer solutions to overcome some of the problems associated with short sequence reads. Hi‐C read pairs can physically span entire chromosomes and can be applied to detect large‐scale SVs (Ho et al., 2020), while long‐read sequencing, comprising synthetic long‐read sequencing and single‐molecule long‐read sequencing (Goodwin et al., 2016), can average 10 to >100 kb in read length and so resolve SVs that cannot readily be assayed using short reads. The previously high error rates (5–15%), low throughput and relatively high cost of single‐molecule long‐read sequencing have limited its application (Yuan et al., 2017). However, with reducing costs and the continued advances in sequencing technology and computational algorithms, more accurate data (accuracy >99%), such as PacBio HiFi reads and Oxford Nanopore R10.3 reads, have been produced, which could further improve the accuracy of genome analysis, particularly for haplotype‐aware genome assembly and SV studies (Wenger et al., 2019).

Optical mapping in nanochannels is complementary to DNA sequencing and provides an approach for large‐scale SV detection (Yuan et al., 2020). DNA from the plant species is nicked or directly labelled by specific enzymes such as Nt.BspQI, Nb.BssSI and DLE‐1, and strands are loaded and stretched in nanochannels, labelled by fluorescence and scanned in an optical mapping device (Lam et al., 2012). The fluorescence images produced are then converted into single‐molecule maps based on the positions of the enzyme recognition sites. The average length of single‐molecule maps is around 225 kb (Shelton et al., 2015), and thus, optical mapping can capture larger genomic structural variation that is not easily detected by DNA sequencing.
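To make the idea of a label-position map concrete, the following Python sketch performs an in silico "digest": it scans a sequence for an enzyme recognition motif on both strands and reports the label positions and the distances between them. The motif `GCTCTTC` is used here on the assumption that it is the Nt.BspQI recognition site, and the toy sequence is invented. Real optical maps are built from imaged molecules with kilobase-scale resolution, so this is an illustration of the principle only.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def label_positions(sequence, motif):
    """Sorted 0-based start positions of the motif on either strand."""
    sequence = sequence.upper()
    positions = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = sequence.find(pattern)
        while start != -1:
            positions.add(start)
            start = sequence.find(pattern, start + 1)
    return sorted(positions)

def label_intervals(positions):
    """Distances between consecutive labels; these form the 'map'."""
    return [b - a for a, b in zip(positions, positions[1:])]

# Assumed recognition motif (labelled as an assumption, see lead-in).
ASSUMED_MOTIF = "GCTCTTC"

# Invented toy sequence: two forward-strand sites and one reverse-strand site.
toy = ("AA" + ASSUMED_MOTIF + "AAAAA" + reverse_complement(ASSUMED_MOTIF)
       + "TTTT" + ASSUMED_MOTIF + "AA")

positions = label_positions(toy, ASSUMED_MOTIF)
print(positions, label_intervals(positions))
```

Comparing the interval pattern of a query map with that of a reference is what allows large insertions, deletions or inversions to be inferred without reading the bases in between.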

Strategies for SV identification

There are two commonly used strategies to detect SVs using DNA sequencing (Figure 1d). One is to directly compare de novo genome assemblies, and the other is to use information from reads mapped to a reference, such as paired reads (PR), read depth (RD) and split reads (SR) (Escaramis et al., 2015). Since the release of the Arabidopsis thaliana genome assembly in 2000 (Arabidopsis Genome, 2000), approximately 450 plant genomes have been assembled (https://www.plabipd.de). The continued increase in high‐quality genome assemblies makes SV characterization in plants more reliable. Whole‐genome comparison can identify SVs by comparing the genome of one individual to another. Several tools have been developed for this purpose, including Mauve (Darling et al., 2004), MUMmer (Kurtz et al., 2004), LASTZ (Harris, 2007), Assemblytics (Nattestad and Schatz, 2016), paftools (Li, 2018), SyRI (Goel et al., 2019) and SVIM‐asm (Heller and Vingron, 2020). However, due to the difficulty and expense of producing high‐quality genome assemblies, and the challenge of differentiating between real genomic differences and assembly or annotation artefacts (Bayer et al., 2018; Bayer et al., 2017), the application of whole‐genome comparison in SV detection is limited (Wala et al., 2018), while SV analysis using read mapping is more common.

In principle, paired reads can be used to detect all kinds of SVs, as SVs change the paired read mapping patterns (Ye et al., 2016). Briefly, when the sequenced sample carries a deletion relative to the reference, the mapped distance between paired reads is larger than the average insert size, whereas an insertion in the sample reduces the mapped distance. If an inversion occurs, the orientation of reads can be reversed. Translocations can also be detected using the information from mapped paired reads, as the two reads of a pair may map to different chromosomal locations. For CNVs, the mapped read depth increases or decreases depending on the copy number of the target genome regions. However, due to the short read length and the repetitiveness and complexity of plant genomes, up to 89% of SV calls have been reported to be false positives, so comprehensive filtering is needed to ensure robust results (Sedlazeck et al., 2018). Although short sequence reads can be less efficient for SV detection than longer reads, they are still applied to characterize SVs due to their relatively low cost. To facilitate such analysis, many tools have been developed for using short reads to detect SVs (Table 1).
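The paired-read and read-depth signatures described above can be sketched in a few lines of Python. The sketch takes the perspective of the sequenced sample: a deletion in the sample makes a pair map farther apart on the reference than expected, and an insertion makes it map closer together. The `ReadPair` record, the insert-size parameters and the three-standard-deviation cut-off are hypothetical simplifications; production callers such as those in Table 1 cluster many supporting pairs and combine several signals before reporting an SV.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    """Hypothetical minimal record of one mapped read pair."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str

def classify_pair(pair, mean_insert=500, sd=50, n_sd=3):
    """Return the SV signature suggested by a single read pair."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"      # mates map to different chromosomes
    if pair.strand1 == pair.strand2:
        return "inversion"          # expected orientation is +/-
    span = abs(pair.pos2 - pair.pos1)
    if span > mean_insert + n_sd * sd:
        return "deletion"           # sample lacks sequence present in reference
    if span < mean_insert - n_sd * sd:
        return "insertion"          # sample has sequence absent from reference
    return "concordant"

def copy_number_from_depth(window_depth, genome_median_depth, ploidy=2):
    """Naive copy-number estimate from read depth in one window."""
    return round(ploidy * window_depth / genome_median_depth)

pairs = [
    ReadPair("chr1", 1000, "+", "chr1", 1500, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 3000, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 1200, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 1500, "+"),
    ReadPair("chr1", 1000, "+", "chr2", 1500, "-"),
]
signatures = [classify_pair(p) for p in pairs]
print(signatures)
print(copy_number_from_depth(45, 30), copy_number_from_depth(15, 30))
```

A single discordant pair is weak evidence, which is one reason why short-read call sets need the filtering discussed above; the default ploidy of 2 is also an assumption that does not hold for polyploid plants.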

Table 1

Software used to detect structural variations. A check mark (✓) indicates an SV type that the software calls.

Software	Language	Insertion	Deletion	Inversion	CNV	Translocation	Data type	References
ETCHING	C and C++		✓	✓		✓	PE	Choi et al. (2020)
Scpluscnv	R	✓	✓	✓	✓	✓	PE	Lopez et al. (2020)
CONY	R				✓		PE	Wei and Huang (2020)
cuteSV	Python	✓	✓	✓		✓	PB; ONT	Jiang et al. (2020)
NanoVar	C++; Python; C; shell	✓	✓	✓			ONT	Tham et al. (2019)
SVIM	Python	✓	✓	✓			PB; ONT	Heller and Vingron (2019)
PBSV	Python	✓	✓	✓	✓	✓	PB	PacificBiosciences (2018)
Sniffles	C++; C; HTML	✓	✓	✓		✓	PB; ONT	Sedlazeck et al. (2018)
Picky	Perl	✓	✓	✓		✓	PB; ONT	Gong et al. (2018)
NanoSV	Python; shell	✓	✓	✓		✓	PB; ONT	Cretu Stancu et al. (2017)
SVachra	Ruby	✓	✓	✓		✓	PE; MP	Hampton et al. (2017)
PSSV	R	✓	✓	✓		✓	PE	Chen et al. (2017)
Seeksv	C++	✓	✓	✓			PE	Liang et al. (2017)
novoBreak	Perl; shell		✓	✓		✓	PE	Chong et al. (2017)
Manta	C++; Python	✓	✓				PE	Chen et al. (2016)
SoftSV	C++		✓	✓		✓	PE	Bartenhagen and Dugas (2016)
SV‐STAT	Shell; Perl	✓	✓	✓		✓	PE; SE	Davis et al. (2016)
MUMdex	C++	✓	✓	✓		✓	PE	Andrews et al. (2016)
MetaSV	Python; HTML; Shell	✓	✓	✓		✓	PE	Mohiyuddin et al. (2015)
BreaKmer	Python	✓	✓	✓		✓	SE	Abo et al. (2015)
Genome STRiP2	Java; R		✓	✓			PE	Handsaker et al. (2015)
Hydra‐multi	C++; Python; Shell; Perl		✓	✓		✓	PE	Lindberg et al. (2015)
Ulysses	Python; R	✓	✓	✓		✓	MP	Gillet‐Markowska et al. (2015)
LUMPY	C; C++; Python; Shell		✓	✓	✓	✓	PE	Layer et al. (2014)
Scalpel	Perl; C++	✓	✓				PE	Narzisi et al. (2014)
Gustaf	C++		✓	✓		✓	PE; SE	Trappe et al. (2014)
PBHoney	Python	✓	✓	✓		✓	PB	English et al. (2014)
Socrates	Java		✓	✓		✓	PE; SE	Schroder et al. (2014)
FACTERA	Perl		✓	✓		✓	PE	Newman et al. (2014)
SMuFin	C	✓	✓	✓		✓	PE	Moncunill et al. (2014)
CNVeM	C				✓		PE	Wang et al. (2013)
Breakpointer	Fortran; Python					✓	PE	Drier et al. (2013)
Bellerophon	Perl					✓	PE	Hayes and Li (2013)
PeSV‐Fisher	Python		✓	✓		✓	PE; MP	Escaramis et al. (2013)
RetroSeq	Perl	✓					PE	Keane et al. (2013)
SOAPindel	Perl; C++	✓	✓				PE	Li et al. (2013)
cn.MOPS	R				✓		PE; SE	Klambauer et al. (2012)
Magnolya	Python				✓		PE	Nijkamp et al. (2012)
Cortex	C	✓	✓				PE; SE	Iqbal et al. (2012)
CNAnorm	R				✓		PE	Gusnanto et al. (2012)
Control‐FREEC	C++				✓		PE; SE	Boeva et al. (2012)
cnvHiTSeq	Java				✓		PE	Bellos et al. (2012)
CLEVER	C++	✓	✓				PE	Marschall et al. (2012)
Delly	C++		✓	✓		✓	PE	Rausch et al. (2012)
GASVPro	Java; C++; Perl; Python	✓	✓	✓		✓	PE	Sindi et al. (2012)
PRISM	N/A		✓	✓			PE	Jiang et al. (2012)
SVMiner	C++; Perl	✓	✓				PE	Hayes et al. (2012)
BIC‐seq	Perl; R				✓		PE; SE	Xi et al. (2011)
ReadDepth	R				✓		PE	Miller et al. (2011)
CNVnator	C++; Perl				✓		PE; SE	Abyzov et al. (2011)
JointSLM	R				✓		PE; SE	Magi et al. (2011)
Clipcrop	JavaScript	✓	✓	✓		✓	PE	Suzuki et al. (2011)
CREST	Perl		✓	✓		✓	PE; SE	Wang et al. (2011)
inGAP‐sv	Java	✓	✓	✓		✓	PE	Qi and Zhao (2011)
Splitread	C; Shell	✓	✓				PE	Karakoc et al. (2011)
rSW‐seq	C				✓		SE	Kim et al. (2010)
cnD	D				✓		PE	Simpson et al. (2010)
CNVer	Shell; C				✓		PE	Medvedev et al. (2010)
SVMerge	Perl; Shell	✓	✓	✓		✓	PE; SE	Wong et al. (2010)
SVDetect	Perl	✓	✓	✓		✓	PE; MP	Zeitouni et al. (2010)
VariationHunter	N/A	✓					PE	Hormozdiari et al. (2010)
NovelSeq	C++	✓					PE	Hajirasouliha et al. (2010)
SLOPE	C++	✓	✓			✓	PE; SE	Abel et al. (2010)
BreakSeq	Python; Perl	✓	✓				PE	Lam et al. (2010)
mrCaNaVaR	C		✓		✓		PE	Alkan et al. (2009)
CNV‐seq	Perl; R				✓		PE	Xie and Tammi (2009)
RDXplorer	Shell; C				✓		SE	Yoon et al. (2009)
BreakDancer	Perl; C++	✓	✓	✓		✓	PE	Chen et al. (2009)
MoDIL	N/A	✓	✓				PE	Lee et al. (2009)
PEMer	Python; Perl; C++	✓	✓	✓		✓	PE	Korbel et al. (2009)
Pindel	C++; Perl; Python; Shell	✓	✓	✓			PE	Ye et al. (2009)

Data type: PE – paired end; SE – single end; MP – mate pair; PB – PacBio; ONT – Oxford Nanopore. N/A: not available.

With continued advances in DNA sequencing and algorithms, long DNA sequence reads have increasingly been adopted for SV detection. Compared to short‐read‐based mapping approaches, long sequence reads can more accurately identify SVs, particularly in complex regions that cannot be spanned by short sequence reads (Sedlazeck et al., 2018; Spielmann et al., 2018). Long sequence reads are particularly useful for insertion detection, which can be challenging using short sequence reads. For example, in a human SV study, Huddleston et al. (2017) used PacBio long sequence reads and detected 1967 novel SVs that had been missed in previous studies. Using 10× Genomics reads, Wong et al. (2018) found that short sequence reads were inefficient in large‐scale insertion detection, with 1842 unique insertions having been missed. In plants, a chromosome‐level assembly of A. thaliana Nd‐1 using PacBio long sequence reads revealed that 385 genes initially identified in A. thaliana Col‐0 have at least two copies in Nd‐1 (Pucker et al., 2019).

Although long‐read sequencing has provided improved resolution in detecting SVs that may not readily be identified by short‐read sequencing, both technologies are inefficient in large‐scale SV detection. In these cases, optical mapping and Hi‐C technologies afford useful solutions. By mapping the physical locations of nicking sites in reference and query genomes, optical mapping can detect large‐scale variations in genome structure (Cao et al., 2014). However, although optical maps are long, the accuracy of SVs detected by optical mapping is highly dependent on the quality of the reference genome and the density of nicking sites (Yuan et al., 2018). In contrast, Hi‐C detects large‐scale SVs based on 3D chromatin structure, with the coverage of Hi‐C reads supporting the accuracy of SV detection (Bickhart et al., 2017).

With the increasing understanding of SVs between individuals of the same species, there has been a growth in the production of pangenome references, which aim to capture presence and absence variations (PAVs) (Golicz et al., 2016a). A pangenome describes the whole gene set in a species, comprising genes present in all individuals (core genes) and genes present only in some individuals (variable or dispensable genes; Bayer et al., 2020; Danilevicz et al., 2020; Golicz et al., 2020). First applied to the studies of microorganisms (Read et al., 2013; Tettelin et al., 2005), pangenome studies have been extended to more complex organisms including plants, and the definition has also been expanded to include all genomic elements, not just expressed genes. Several plant pangenomes have been analysed, including wheat (Montenegro et al., 2017; Walkowiak et al., 2020), barley (Jayakodi et al., 2020), maize (Hirsch et al., 2014; Hufford et al., 2021; Lu et al., 2015; Unterseer et al., 2017), rice (Schatz et al., 2014; Sun et al., 2017; Wang et al., 2018), soybean (Li et al., 2014; Liu et al., 2020; Valliyodan et al., 2021), Brassica rapa (Lin et al., 2014), Brassica oleracea (Golicz et al., 2016b), Brassica napus (Hurgobin et al., 2018; Song et al., 2020), chickpea (Varshney et al., 2019), grapevine (Magris et al., 2015), Medicago truncatula (Zhou et al., 2014), Arabidopsis thaliana (Cao et al., 2011; Jiao and Schneeberger, 2020), pigeonpea (Zhao et al., 2020), Brachypodium distachyon (Vogel et al., 2016), cultivated pepper (Ou et al., 2018), sesame (Yu et al., 2019), sunflower (Hubner et al., 2019), tomato (Alonge et al., 2020; Gao et al., 2019), apple (Sun et al., 2020) and poplar (Pinosio et al., 2016). The methods to study PAVs in a pangenome are similar to those described here for SV detection, and with the further improvement of genomic technologies and algorithms, the study of pangenomes in plants will become more common.
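The distinction between core and variable genes can be expressed directly on a presence/absence matrix. The Python sketch below assumes a hypothetical dictionary that maps each gene to its presence call in each accession; the gene and accession names are invented, and real pangenome pipelines derive such calls from read coverage or assembly comparison and often allow for a small fraction of missing data.

```python
def classify_genes(pav):
    """Split genes into core and variable sets.

    pav maps gene -> {accession: True if present, False if absent}.
    Genes absent from every accession are ignored.
    """
    core, variable = [], []
    for gene, calls in pav.items():
        if all(calls.values()):
            core.append(gene)        # present in all accessions
        elif any(calls.values()):
            variable.append(gene)    # present in some accessions only
    return sorted(core), sorted(variable)

# Invented toy presence/absence matrix for three accessions.
toy_pav = {
    "geneA": {"acc1": True, "acc2": True, "acc3": True},
    "geneB": {"acc1": True, "acc2": False, "acc3": True},
    "geneC": {"acc1": False, "acc2": False, "acc3": True},
    "geneD": {"acc1": True, "acc2": True, "acc3": True},
}

core_genes, variable_genes = classify_genes(toy_pav)
print(core_genes, variable_genes)
```

As more accessions are added, genes tend to move from the core to the variable set, so reported core fractions depend on sampling depth.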

Current status of SV studies in plants

Structural variation studies in plants are increasing and are being applied to understand genomic changes during evolution, domestication and breeding. Recently, several pangenome studies have been conducted for different plant species, and PAV diversity has been investigated. For example, in a wheat pangenome study, Montenegro et al. (2017) used 18 wheat cultivars to identify PAVs associated with important agronomic traits, including response to environmental stress and defence response. In this study, they also demonstrated that the reference genome cultivar, Chinese Spring, poorly represented modern wheat lines. Recently, Walkowiak et al. (2020) studied 15 representative wheat cultivars collected from around the world and found that a translocation that occurred in some of the cultivars between chromosomes 5B and 7B is selectively neutral during breeding. In a subsequent study using 538 wheat lines, they found that the translocation occurred in 66% of the selected lines (Walkowiak et al., 2020). In a recent barley pangenome study, Jayakodi et al. (2020) found that a large‐scale inversion (~10 Mb) on chromosome 2H is frequently found in germplasm from northern Europe. Golicz et al. (2016b) reported that SVs affected the presence of flowering time genes such as FLOWERING LOCUS C (FLC) in Brassica oleracea. Through the pangenome study of Brassica oleracea and Brassica napus, Bayer et al. (2019) and Dolatabadian et al. (2020) revealed that disease resistance genes show diverse PAV patterns among different Brassica accessions and that this seems to be a common feature of plant pangenomes (Dolatabadian et al., 2017). By examining 725 tomato accessions, Gao et al. (2019) discovered 4873 genes demonstrating PAV and identified a rare allele deletion in the TomLoxC promoter that affects the flavour of tomato. In the 3000 Rice Genomes Project, Fuentes et al. (2019) demonstrated that rice genomic regions with frequent SVs were enriched in stress response genes. Zhang et al. (2015) reported that SVs affected the coding regions of 1676 cucumber genes, and they found that genes in deleted regions were associated with histone methylation and abiotic stress response, while duplicated genes were often involved in the reproductive process. Genes encoded in inversion regions played an important role in the response to chemical stimulus, and genes in insertion regions were related to histone acetylation.

With the recent development and application of long‐read sequencing and optical mapping, SV studies in plants have been further refined, and numerous high‐quality SV studies have been reported (Table 2). For example, Michael et al. (2018) assembled one Arabidopsis genome and found that, compared to the Col‐0 genome assembly, the new Oxford Nanopore genome assembly has 4280 SVs with a total length of 9.5 Mb, among which repeat‐related SVs account for 58%, followed by insertions and deletions (31%). Zhou et al. (2019) studied 50 grapevine cultivars and 19 wild relatives and found that inversions and translocations have strong associations with selection. In a soybean study, Xie et al. (2019) compared wild and cultivated soybeans using optical mapping and confirmed a large inversion at the I locus that can affect seed coat colour during domestication. Using PacBio long‐read sequencing, Song et al. (2020) de novo assembled eight canola genomes and found 77.2–149.6 Mb of PAVs among these accessions. After a PAV‐based genome‐wide association study (GWAS), they identified three FLC genes that are related to ecotype differentiation. Recently, two SV studies by Liu et al. (2020) and Alonge et al. (2020) used long‐read sequencing to study the role of SVs in plants. In Liu et al. (2020), more than 776 000 SVs were discovered in 26 representative soybean accessions. They also identified a 10‐kb PAV on chromosome 15 that has a significant association with seed lustre. Alonge et al. (2020) performed a ‘panSV’ study using Oxford Nanopore data for 100 tomato accessions, identifying 238 490 SVs. After associating SVs with QTL involved in the metabolism of guaiacol and fruit weight, different haplotypes were resolved that had been missed in previous GWAS.

Table 2

Recent structural variations studies in plants

Species	Methods	Major SV findings	References
Melon	Short‐read alignment	A 1,070‐bp deletion at 23.85 kb upstream of MELO3C019694 was found which might impair the transcriptional regulation of this gene	Zhao et al. (2019)
Setaria viridis	Whole‐genome comparison	Approximately 22% of the genes were variable genes	Mamidi et al. (2020)
Brassica nigra	Long‐read alignment	Approximately 6000–7000 SVs found in the two B. nigra accessions and among the SVs 63.4−70% were deletions	Perumal et al. (2020)
Eggplant	Short‐read alignment	Asymmetric SV accumulation was found in potential regulatory regions of protein‐coding genes among the different eggplant genomes	Wei et al. (2020)
Peach	Short‐read alignment	A 9‐bp insertion in Prupe.4G186800 had an association with early fruit maturity; a 487‐bp deletion in the promoter of PpMYB10.1 was associated with flesh colour around the stone; a 1.67 Mb inversion was highly associated with fruit shape; a gene adjacent to the inversion breakpoint of PpOFP1 regulated flat shape formation	Guo et al. (2020)
Banana	Fluorescence in situ hybridization (FISH)	Large differences in chromosome structure discriminated individual banana accessions	Simonikova et al. (2020)
Maize	Whole‐genome comparison, short‐read alignment, and FISH	A 1.8 Mb duplication on the Gametophyte factor1 locus which was for unilateral cross‐incompatibility; increased copy number of carotenoid cleavage dioxygenase 1 (ccd1) in A188 was associated with elevated expression during seed development	Lin et al. (2020)
Peanut	Short‐read alignment, whole‐genome comparison	A. hypogaea showed more enrichment of deletions and insertions in the upstream regions of the coding sequences than A. monticola	Yin et al. (2020)
Brassica napus	whole‐genome comparison	77.2–149.6 Mb sequences showed PAV patterns, which included more than 9.5% of the genes	Song et al. (2020)
Rice	Short‐read alignment, whole‐genome comparison and long‐read alignment	The site‐frequency spectrum of SVs was skewed towards lower frequency variants than synonymous SNPs; peaks of SV divergence were enriched for known domestication genes	Kou et al. (2020)
Maize	Whole‐genome comparison	21.9% of the polymorphic SVs showed low linkage disequilibrium with nearby SNPs; A new significant locus for oil concentration and long‐chain fatty acid composition (C18_1, C18_2 and C20_1) on chromosome 4 was found associating with SVs	Yang et al. (2019)
Solanum pimpinellifolium	Whole‐genome comparison and long‐read alignment	SVs overlapping genes played a role in breeding traits such as fruit weight and lycopene content; SVs contribute to complex regulatory networks, such as fruit quality traits	Wang et al. (2020)
Rice	Whole‐genome comparison and long‐read alignment	organelle‐to‐nucleus DNA transfers resulted in numerous SVs that participated in the nuclear genome divergence of rice species and subspecies	Ma et al. (2020)
Banyan tree	Whole‐genome comparison	A chromosome fusion event found in FmChr03, FhChr03 and FhChr07, which was followed by two inversions. Genes within the rearranged regions of FmChr03 and FhChr03 have an association with plant immunity	Zhang et al. (2020)
Apple	Short‐read alignment	PAV genes were highly associated with pollination, signal transduction and response to stress	Sun et al. (2020)
Brassica napus	Long‐read alignment	SVs played a role in B. napus eco‐geographical adaptation and disease resistance	Chawla et al. (2020)
Tomato	Long‐read alignment	SVs could change tomato gene dosage and expression levels modified fruit flavour, size and production	Alonge et al. (2020)
Sunflower	Whole‐genome comparison	SVs had associations with flowering time and seed size	Todesco et al. (2020)
Soybean	Whole‐genome comparison	PAV was a major contributor to driving genome size variation. A 10‐kb PAV of a hydrophobic protein‐encoding gene may be responsible for seed lustre	Liu et al. (2020)
Wheat	Short‐read alignment	23% of the genes were variable and 330 genes were absent from the reference. Variable genes tended to be enriched in functions like protein phosphorylation and protein catabolic process	De Oliveira et al. (2020)

Conclusions and perspective

Structural variation represents an important part of genetic diversity in plants and plays a role in phenotypic variation. The limitations of technology and methods used to analyse SVs have previously hindered our understanding of the extent and importance of these variations. With recent advances in DNA sequencing and optical mapping, together with the development of advanced bioinformatics tools, the study of SVs in plants is becoming more common, and there is an increasing awareness that SVs are as important as SNPs and small indels (Wellenreuther et al., 2019). Although the current technology and methods have dramatically increased the resolution of SV identification, false positives remain. Filtration and further validation are required to make SV detection more reliable. New computational algorithms for SV calling that consider different ploidy levels and genome repetitiveness, particularly algorithms using long‐read sequencing and long‐range genomic information, are expected to be developed for plant genome data. Refined SV identification pipelines are also needed to further increase sensitivity. Machine learning approaches may be developed to integrate SV calls from different algorithms to reduce false positives. With the improved accuracy and read length in long‐read sequencing, haplotype‐aware plant genome assemblies are expected to be produced, which could support a detailed mining of allelic or heterozygous variation and the hidden genes missing in current linear assemblies. Techniques such as Strand‐seq (Falconer et al., 2012) can be applied to further assess allele‐based SVs, particularly inversions. Other techniques, such as short‐read, long‐read and direct RNA sequencing, may be useful to check the accuracy of SV identification through gene expression. With more individuals studied, improved visualization of SVs between different individuals, such as using a genome graph, can be produced. However, more effort is needed to solve the problems in using a graph genome, such as finding an efficient way to switch sequence coordinates between assemblies. Mining SVs or genes altered by SVs can be useful for breeding. Currently, there are few methods to directly link SVs with particular phenotypes; therefore, SV‐specific genome‐wide association study approaches are needed to efficiently associate SVs with phenotypes. Genome editing using systems such as CRISPR/Cas provides a way to validate or induce SVs of interest in plants to produce advanced crop varieties (Zhang et al., 2018). Building pangenomes or genus‐wide pangenomes (Khan et al., 2020) provides a useful way of mining SV‐related genes. To further benefit plant breeding, SV‐phenotype‐related databases for different species are needed. By searching such databases, breeders and crop researchers can identify candidate SVs that can be used in their breeding programmes to produce improved varieties.
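The idea of integrating SV calls from different algorithms to reduce false positives can be illustrated with a much simpler stand-in for the machine learning approaches anticipated above: a voting scheme based on reciprocal overlap. In the Python sketch below, the tuple layout `(chrom, start, end, svtype)`, the 50% overlap threshold and the two-caller minimum are all illustrative assumptions, and the call sets are invented.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for calls a and b.

    Each call is a (chrom, start, end, svtype) tuple; calls on different
    chromosomes or of different types never match.
    """
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))

def consensus_calls(callsets, min_callers=2, min_overlap=0.5):
    """Keep calls from the first caller that other callers also support."""
    primary, others = callsets[0], callsets[1:]
    kept = []
    for call in primary:
        support = 1 + sum(
            any(reciprocal_overlap(call, other) >= min_overlap for other in calls)
            for calls in others
        )
        if support >= min_callers:
            kept.append(call)
    return kept

# Invented call sets from three hypothetical callers.
caller_a = [("chr1", 1000, 2000, "DEL"), ("chr1", 5000, 5200, "DEL"),
            ("chr2", 300, 900, "INV")]
caller_b = [("chr1", 1050, 2100, "DEL"), ("chr2", 300, 900, "DEL")]
caller_c = [("chr1", 900, 1950, "DEL")]

consensus = consensus_calls([caller_a, caller_b, caller_c])
print(consensus)
```

This sketch only retains calls made by the first caller, so it trades sensitivity for precision; a learned model could instead weight each caller by its measured reliability for a given SV type and size.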

Funding

This work was funded by the Australian Research Council (ARC) (grant no: LP160100030).

Conflict of interest

There is no conflict of interest to disclose.

Author contributions

Y.Y. drafted this manuscript and prepared all figures and tables. P.B., J.B. and D.E. edited this manuscript. All authors read and approved this manuscript.
