
Current status of structural variation studies in plants.

Yuxuan Yuan1,2, Philipp E Bayer1, Jacqueline Batley1, David Edwards1.   

Abstract

Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
© 2021 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

Keywords:  DNA sequencing; breeding; gene expression; optical mapping; phenotypic variation; structural variation

Year:  2021        PMID: 34101329      PMCID: PMC8541774          DOI: 10.1111/pbi.13646

Source DB:  PubMed          Journal:  Plant Biotechnol J        ISSN: 1467-7644            Impact factor:   9.803


Introduction

Structural variations (SVs) are genetic differences between individuals that can lead to gene loss, gene duplication and the generation of novel genes, and thereby to phenotypic variation within a species. An SV is defined as a region of DNA that differs in sequence length, copy number, orientation or chromosomal location between individuals (Escaramis et al., 2015). Generally, an SV can be classified as a deletion, an insertion, a copy number variation (CNV), an inversion or a translocation. In contrast to single nucleotide polymorphisms (SNPs) and small indels (insertions and deletions), SVs are longer (>50 bp) and can have a greater influence on gene expression and protein function (Chiang et al., 2017). In early plant genomic studies, technological limitations and the lack of high‐quality reference genome assemblies prevented the comprehensive exploration of SVs. Many plants have large and complex genomes, with polyploidy occurring in up to 80% of plant species (Meyers and Levin, 2006), which makes the identification of SVs in plant genomes a challenge. Recent advances in genomic technologies, particularly long‐read sequencing and whole‐genome mapping, promise high‐quality plant genome and pangenome assemblies and access to a broad range of SVs whose potential role in plant phenotypic variation can then be assessed. Although improvements in DNA sequencing, whole‐genome mapping and novel algorithms have made it feasible to characterize SVs on a genome‐wide scale with higher accuracy, reports of SV studies in plants are still limited. Significant effort and resources are needed to comprehensively decipher the association between SVs and agronomic traits in order to support plant improvement and, through it, economies and food security. Here, we discuss the current progress and challenges of SV studies in plants and the potential to apply knowledge of SVs to improve crop varieties.

Limitations of early technologies for SV identification

Before the widespread use of molecular markers and DNA sequencing, SVs were characterized by microscopy at the karyotype level, with a resolution of >3 Mb (Figure 1a) (Feuk et al., 2006). Because of the low throughput and limited resolution of microscopic observation, however, few novel SV studies use microscopic techniques, which are now mostly applied to confirm known SVs. Approximately 15 years ago, the advent of hybridization‐based microarray approaches made it possible to perform SV studies with greater resolution and at lower cost than microscopic methods. Two commonly used methods were array comparative genomic hybridization (array‐CGH) and SNP arrays.
Figure 1

Methods used to identify structural variations from the past to the present. The figure lists commonly used methods to identify SVs from the early (a) microscope observation, (b) array comparative genomic hybridization and (c) SNP array to the current (d) DNA sequencing.

Array‐CGH can efficiently detect CNVs at multiple genomic loci (Figure 1b) and has been applied to diverse studies, including gene discovery, epigenetic modification and chromatin conformation (Bejjani and Shaffer, 2006). Nevertheless, array‐CGH cannot detect balanced SVs (e.g. reciprocal translocations and inversions) or absolute copy numbers of a DNA segment, because it detects genetic imbalances between two individual genomes, where a sample has more or less of a specific genetic material than others (Escaramis et al., 2015). Additionally, array‐CGH is specifically designed for diploid individuals and it is not sensitive to higher degrees of ploidy (>2 sets of chromosomes). A well‐assembled reference is also essential during the design of the array (Park et al., 2010). In contrast to array‐CGH, SNP arrays are more sensitive to allele‐specific CNVs and can help to identify large‐scale CNVs in diverse populations (Figure 1c; Alkan et al., 2011). However, SNP arrays provide a poor signal‐to‐noise ratio compared with array‐CGH due to the smaller target size (Hester et al., 2009). As with array‐CGH, SNP arrays cannot be used to detect insertions. The number of SVs detected is dependent on the density or presence and absence of SNPs in the target genome regions. Moreover, SNP arrays were initially designed for diploid samples and struggle to characterize repeat‐rich and duplicated regions. The design of SNP arrays also depends on the quality of the reference genome assemblies. Furthermore, the breakpoints of SVs cannot be easily detected by SNP arrays or array‐CGH.
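The point that array‐CGH reports only relative imbalance between two genomes can be illustrated with a minimal sketch. It assumes hypothetical probe intensities for a test and a reference sample and an arbitrary log2‐ratio threshold of 0.3; neither the threshold nor the function name comes from the studies cited above.

```python
import math

def call_relative_copy_change(test_intensity, ref_intensity, threshold=0.3):
    """Classify one probe from the log2 ratio of test to reference signal.

    The threshold is an illustrative assumption, not a published cut-off.
    """
    log2_ratio = math.log2(test_intensity / ref_intensity)
    if log2_ratio >= threshold:
        return "gain", log2_ratio
    if log2_ratio <= -threshold:
        return "loss", log2_ratio
    return "neutral", log2_ratio

# Hypothetical (test, reference) intensities for three probes.
probes = {
    "probe_1": (1500.0, 1000.0),  # more signal in the test sample
    "probe_2": (480.0, 1000.0),   # less signal in the test sample
    "probe_3": (1020.0, 1000.0),  # balanced signal
}
calls = {name: call_relative_copy_change(t, r)[0] for name, (t, r) in probes.items()}
print(calls)
```

A probe inside a balanced inversion or reciprocal translocation would give a ratio close to 1 and be called neutral, which is why such events are invisible to this kind of comparison.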

Current technologies for SV identification

With advances in DNA sequencing, whole‐genome analysis has become viable for a wide range of plant species. Examining whole genomes by DNA sequencing has permitted SV characterization at the nucleotide level, and the detection of inversions and translocations, as well as recombination breakpoints, has become more efficient. Initially, single‐end reads (sequenced only in one direction from a DNA strand) were used. With the expansion of sequencing methods, paired reads, sequenced from both forward and reverse orientations from complementary DNA strands, with a known approximate distance between the pairs, have been used to overcome the challenges of associating short single reads with regions of the genome. However, the short length (<600 bp) of these reads still poses challenges for the characterization of repetitive regions (Michael and VanBuren, 2020), and thus, the accuracy of SV detection based on short sequence reads is limited. Recent advances in long‐read sequencing and high‐throughput chromosome conformation capture (Hi‐C) technologies offer solutions to overcome some of the problems associated with short sequence reads. Hi‐C read pairs can physically span the entire chromosomes and can be applied to detect large‐scale SVs (Ho et al., 2020), while long‐read sequencing, comprising synthetic long‐read sequencing and single‐molecule long‐read sequencing (Goodwin et al., 2016) can average 10 to >100 kb in length to resolve SVs that cannot readily be assayed using short reads by short‐read sequencing. The previous high error rates (5–15%), low throughput and relatively high cost of single‐molecule long‐read sequencing have limited their application (Yuan et al., 2017). 
However, with falling costs and continued advances in sequencing technology and computational algorithms, more accurate data (accuracy >99%), such as PacBio HiFi reads and Oxford Nanopore R10.3 reads, have been produced, which could further improve the accuracy of genome analysis, particularly for haplotype‐aware genome assembly and SV studies (Wenger et al., 2019). Optical mapping in nanochannels is complementary to DNA sequencing and provides an approach for large‐scale SV detection (Yuan et al., 2020). DNA from the plant species is nicked or directly labelled by specific enzymes such as Nt.BspQI, Nb.BssSI and DLE‐1, and strands are loaded and stretched in nanochannels, labelled by fluorescence and scanned in an optical mapping device (Lam et al., 2012). The fluorescence images produced are then converted into single‐molecule maps based on the positions of the enzyme recognition sites. The average length of single‐molecule maps is around 225 kb (Shelton et al., 2015), and thus, optical mapping can capture larger genomic structural variation that is not easily detected by DNA sequencing.
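The idea of a label map can be sketched with an in silico digest: find the positions of an enzyme recognition motif in a sequence and compare the distances between neighbouring labels. The sketch below is a toy under stated assumptions. It takes GCTCTTC as the recognition sequence for Nt.BspQI, uses hypothetical sequences only tens of bases long, and ignores the sizing error and label resolution of real optical maps, where label spacing is measured in kilobases.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def label_positions(seq, motif="GCTCTTC"):
    """Return sorted 0-based start positions of the motif on either strand."""
    seq = seq.upper()
    positions = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            positions.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(positions)

def label_intervals(positions):
    """Distances between consecutive labels: the pattern an optical map records."""
    return [b - a for a, b in zip(positions, positions[1:])]

# Hypothetical reference and sample; the sample has 15 extra bases
# between the first and second label.
reference = "GCTCTTC" + "A" * 20 + "GAAGAGC" + "A" * 10 + "GCTCTTC"
sample = "GCTCTTC" + "A" * 35 + "GAAGAGC" + "A" * 10 + "GCTCTTC"

ref_intervals = label_intervals(label_positions(reference))
sample_intervals = label_intervals(label_positions(sample))
differences = [s - r for r, s in zip(ref_intervals, sample_intervals)]
print(ref_intervals, sample_intervals, differences)
```

An interval that is longer in the sample than in the reference suggests an insertion between those two labels, and a shorter one suggests a deletion; the breakpoint itself can only be placed somewhere between the flanking labels, which is one reason label density matters.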

Strategies for SV identification

There are two commonly used strategies to detect SVs using DNA sequencing (Figure 1d). One is to directly compare de novo genome assemblies, and the other is to use the information from mapping reads to a reference, such as paired reads (PR), read depth (RD) and split reads (SR) to detect SVs (Escaramis et al., 2015). Since the release of the Arabidopsis thaliana genome assembly in 2000 (Arabidopsis Genome, 2000), approximately 450 plant genomes have been assembled (https://www.plabipd.de). The continued increase in high‐quality genome assemblies makes SV characterization in plants more reliable. Whole‐genome comparison can identify SVs by comparing the genome of one individual to another. Several tools have been developed for this purpose, including Mauve (Darling et al., 2004), MUMMER (Kurtz et al., 2004), LASTZ (Harris, 2007), Assemblytics (Nattestad and Schatz, 2016), paftools (Li, 2018), SyRI (Goel et al., 2019) and SVIM‐asm (Heller and Vingron, 2020). However, due to the difficulty and expense of producing high‐quality genome assemblies, and the challenge of differentiating between real genomic differences and assembly or annotation artefacts (Bayer et al., 2018; Bayer et al., 2017), the application of whole‐genome comparison in SV detection is limited (Wala et al., 2018), while SV analysis using read mapping is more common. In principle, paired reads can be used to detect all kinds of SVs, as SVs change the paired read mapping patterns (Ye et al., 2016). Briefly, when the sequenced sample carries a deletion relative to the reference, the mapped distance between paired reads is larger than the average insert size, whereas when the sample carries an insertion, the mapped distance is smaller than the average. If an inversion occurs, the orientation of one read in a pair can be reversed. Translocations can also be detected using the information from mapped paired reads, as the two reads of a pair may map to different chromosomal locations.
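The paired‐read signatures can be written down as a small decision rule. The sketch below describes each SV in the sample relative to the reference, so a deletion in the sample makes a pair map farther apart than the library insert size and an insertion makes it map closer together. It assumes a forward/reverse paired‐end library; the `ReadPair` fields, the insert‐size mean and standard deviation, and the three‐standard‐deviation cut‐off are illustrative assumptions rather than the settings of any tool in Table 1.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    """Mapped positions and strands of the two reads of one pair (hypothetical layout)."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str

def classify_pair(pair, mean_insert=400, sd_insert=40, n_sd=3):
    """Return the SV signature suggested by a single read pair."""
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"
    # In a forward/reverse library, concordant mates lie on opposite strands.
    if pair.strand1 == pair.strand2:
        return "inversion_candidate"
    span = abs(pair.pos2 - pair.pos1)
    if span > mean_insert + n_sd * sd_insert:
        return "deletion_candidate"   # mates map too far apart
    if span < mean_insert - n_sd * sd_insert:
        return "insertion_candidate"  # mates map too close together
    return "concordant"

pairs = [
    ReadPair("chr1", 1000, "+", "chr1", 1400, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 3000, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 1150, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 1400, "+"),
    ReadPair("chr1", 1000, "+", "chr5", 1400, "-"),
]
signatures = [classify_pair(p) for p in pairs]
print(signatures)
```

A single discordant pair is weak evidence; in practice, callers cluster many pairs that support the same event before reporting it.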
For CNVs, read mapping can lead to increased or decreased mapped read depth depending on the copy number of the target genome regions. However, owing to the short read length and the repetitiveness and complexity of plant genomes, up to 89% of SV calls have been reported to be false positives, so comprehensive filtering is needed to ensure robust results (Sedlazeck et al., 2018). Although short sequence reads can be less efficient for SV detection than longer reads, they are still applied to characterize SVs due to their relatively low cost. To facilitate such analysis, many tools have been developed for using short reads to detect SVs (Table 1).
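The read‐depth signal for CNVs can be sketched as follows: scale the depth in each genomic window by a baseline depth that is assumed to represent the normal copy number. The window depths, the use of the median as baseline and the diploid default are assumptions for illustration; real callers also correct for GC content and mappability, which this sketch omits.

```python
from statistics import median

def estimate_copy_number(window_depths, ploidy=2):
    """Estimate an integer copy number per window from mean read depth.

    The median depth across windows is taken as the depth of a region
    present at the normal (ploidy) copy number.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return [round(ploidy * depth / baseline) for depth in window_depths]

# Hypothetical mean depths for nine consecutive windows.
depths = [30, 31, 29, 60, 62, 15, 30, 0, 30]
copies = estimate_copy_number(depths)
print(copies)
```

Windows with an estimate above the ploidy suggest a duplication, and those below it a heterozygous or homozygous deletion.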
Table 1

Software used to detect structural variations

Software | Language | Data type | References
(SV calling type columns: Insertion, Deletion, Inversion, CNV, Translocation; the per‐tool entries are not preserved here.)
ETCHING | C and C++ | PE | Choi et al. (2020)
Scpluscnv | R | PE | Lopez et al. (2020)
CONY | R | PE | Wei and Huang (2020)
cuteSV | Python | PB; ONT | Jiang et al. (2020)
NanoVar | C++; Python; C; shell | ONT | Tham et al. (2019)
SVIM | Python | PB; ONT | Heller and Vingron (2019)
PBSV | Python | PB | PacificBiosciences (2018)
Sniffles | C++; C; HTML | PB; ONT | Sedlazeck et al. (2018)
Picky | Perl | PB; ONT | Gong et al. (2018)
NanoSV | Python; shell | PB; ONT | Cretu Stancu et al. (2017)
SVachra | Ruby | PE; MP | Hampton et al. (2017)
PSSV | R | PE | Chen et al. (2017)
Seeksv | C++ | PE | Liang et al. (2017)
novoBreak | Perl; shell | PE | Chong et al. (2017)
Manta | C++; Python | PE | Chen et al. (2016)
SoftSV | C++ | PE | Bartenhagen and Dugas (2016)
SV‐STAT | Shell; Perl | PE; SE | Davis et al. (2016)
MUMdex | C++ | PE | Andrews et al. (2016)
MetaSV | Python; HTML; Shell | PE | Mohiyuddin et al. (2015)
BreaKmer | Python | SE | Abo et al. (2015)
Genome STRiP2 | Java; R | PE | Handsaker et al. (2015)
Hydra‐multi | C++; Python; Shell; Perl | PE | Lindberg et al. (2015)
Ulysses | Python; R | MP | Gillet‐Markowska et al. (2015)
LUMPY | C; C++; Python; Shell | PE | Layer et al. (2014)
Scalpel | Perl; C++ | PE | Narzisi et al. (2014)
Gustaf | C++ | PE; SE | Trappe et al. (2014)
PBHoney | Python | PB | English et al. (2014)
Socrates | Java | PE; SE | Schroder et al. (2014)
FACTERA | Perl | PE | Newman et al. (2014)
SMuFin | C | PE | Moncunill et al. (2014)
CNVeM | C | PE | Wang et al. (2013)
Breakpointer | Fortran; Python | PE | Drier et al. (2013)
Bellerophon | Perl | PE | Hayes and Li (2013)
PeSV‐Fisher | Python | PE; MP | Escaramis et al. (2013)
RetroSeq | Perl | PE | Keane et al. (2013)
SOAPindel | Perl; C++ | PE | Li et al. (2013)
cn.MOPS | R | PE; SE | Klambauer et al. (2012)
Magnolya | Python | PE | Nijkamp et al. (2012)
Cortex | C | PE; SE | Iqbal et al. (2012)
CNVnorma | R | PE | Gusnanto et al. (2012)
Control‐FREEC | C++ | PE; SE | Boeva et al. (2012)
cnvHiTSeq | Java | PE | Bellos et al. (2012)
CLEVER | C++ | PE | Marschall et al. (2012)
Delly | C++ | PE | Rausch et al. (2012)
GASVPro | Java; C++; perl; python | PE | Sindi et al. (2012)
PRISM | N/A | PE | Jiang et al. (2012)
SVMiner | C++; Perl | PE | Hayes et al. (2012)
BIC‐seq | Perl; R | PE; SE | Xi et al. (2011)
ReadDepth | R | PE | Miller et al. (2011)
CNVnator | C++; Perl | PE; SE | Abyzov et al. (2011)
JointSLM | R | PE; SE | Magi et al. (2011)
Clipcrop | JavaScript | PE | Suzuki et al. (2011)
CREST | Perl | PE; SE | Wang et al. (2011)
inGAP‐sv | Java | PE | Qi and Zhao (2011)
Splitread | C; Shell | PE | Karakoc et al. (2011)
rSW‐seq | C | SE | Kim et al. (2010)
cnD | D | PE | Simpson et al. (2010)
CNVer | Shell; C | PE | Medvedev et al. (2010)
SVMerge | Perl; Shell | PE; SE | Wong et al. (2010)
SVDetect | Perl | PE; MP | Zeitouni et al. (2010)
VariationHunter | N/A | PE | Hormozdiari et al. (2010)
NovelSeq | C++ | PE | Hajirasouliha et al. (2010)
SLOPE | C++ | PE; SE | Abel et al. (2010)
BreakSeq | Python; Perl | PE | Lam et al. (2010)
mrCaNaVaR | C | PE | Alkan et al. (2009)
CNV‐seq | Perl; R | PE | Xie and Tammi (2009)
RDXplorer | Shell; C | SE | Yoon et al. (2009)
BreakDancer | Perl; C++ | PE | Chen et al. (2009)
MoDIL | N/A | PE | Lee et al. (2009)
PEMer | Python; Perl; C++ | PE | Korbel et al. (2009)
Pindel | C++; Perl; Python; Shell | PE | Ye et al. (2009)

Data type: PE – paired end; SE – single end; MP – mate pair. PB: PacBio; ONT: Oxford nanopore; N/A: not available.

With continued advances in DNA sequencing and algorithms, long DNA sequence reads have increasingly been adopted for SV detection. Compared to short‐read‐based mapping approaches, long sequence reads can more accurately identify SVs, particularly in complex regions that cannot be spanned by short sequence reads (Sedlazeck et al., 2018; Spielmann et al., 2018). Long sequence reads are particularly useful for insertion detection, which can be challenging using short sequence reads. For example, in a human SV study, Huddleston et al. (2017) used PacBio long sequence reads and detected 1967 novel SVs that had been missed in previous studies. Using 10× Genomics reads, Wong et al. (2018) found that short sequence reads were inefficient in large‐scale insertion detection, with 1842 unique insertions having been missed. In plants, a chromosome‐level assembly of A. thaliana Nd‐1 using PacBio long sequence reads revealed 385 genes initially identified in A. thaliana Col‐0, having at least two copies in Nd‐1 (Pucker et al., 2019). Although long‐read sequencing has provided improved resolution in detecting SVs that may not readily be identified by short‐read sequencing, both technologies are inefficient in large‐scale SV detection. In these cases, optical mapping and Hi‐C technologies afford useful solutions. By mapping the physical locations of nicking sites in reference and query genomes, optical mapping can detect large‐scale variations in genome structure (Cao et al., 2014). However, although optical maps are long, the accuracy of SVs detected by optical mapping is highly dependent on the quality of the reference genome and the density of nicking sites (Yuan et al., 2018). 
In contrast, Hi‐C detects large‐scale SVs based on 3D chromatin structure, while the coverage of Hi‐C reads can support the accuracy of SV detection (Bickhart et al., 2017). With the increasing understanding of SVs between individuals of the same species, there has been a growth in the production of pangenome references which aim to capture presence and absence variations (Golicz et al., 2016a). A pangenome describes the whole gene set in a species, involving genes present in all individuals (core genes) and genes present only in some individuals (variable or dispensable genes; Bayer et al., 2020; Danilevicz et al., 2020; Golicz et al., 2020). First applied to the studies of microorganisms (Read et al., 2013; Tettelin et al., 2005), pangenome studies have been extended to more complex organisms including plants, and the definition has also been expanded to include all genomic elements, not just expressed genes. Several plant pangenomes have been analysed, including wheat (Montenegro et al., 2017; Walkowiak et al., 2020), barley (Jayakodi et al., 2020), maize (Hirsch et al., 2014; Hufford et al., 2021; Lu et al., 2015; Unterseer et al., 2017), rice (Schatz et al., 2014; Sun et al., 2017; Wang et al., 2018), soybean (Li et al., 2014; Liu et al., 2020; Valliyodan et al., 2021), Brassica rapa (Lin et al., 2014), Brassica oleracea (Golicz et al., 2016b), Brassica napus (Hurgobin et al., 2018; Song et al., 2020), chickpea (Varshney et al., 2019), grapevine (Magris et al., 2015), Medicago truncatula (Zhou et al., 2014), Arabidopsis thaliana (Cao et al., 2011; Jiao and Schneeberger, 2020), pigeonpea (Zhao et al., 2020), Brachypodium distachyon (Vogel et al., 2016), cultivated pepper (Ou et al., 2018), sesame (Yu et al., 2019), sunflower (Hubner et al., 2019), tomato (Alonge et al., 2020; Gao et al., 2019), apple (Sun et al., 2020) and poplar (Pinosio et al., 2016). 
The methods to study PAVs in a pangenome are similar to these described here for SV detection, and with the further improvement of genomic technologies and algorithms, the study of pangenomes in plants will be more common.
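The core/variable distinction can be made concrete with a presence/absence matrix. The following minimal sketch assumes a hypothetical matrix of four genes across three accessions, stored as nested dictionaries; gene and accession names are invented for illustration.

```python
def classify_pangenome(pav):
    """Split genes into core (present in all accessions) and variable
    (present in some but not all) from a presence/absence matrix.

    pav maps gene name -> {accession name: True/False}.
    """
    core, variable = [], []
    for gene, presence in pav.items():
        if all(presence.values()):
            core.append(gene)
        elif any(presence.values()):
            variable.append(gene)
    return sorted(core), sorted(variable)

# Hypothetical presence/absence calls for three accessions.
pav_matrix = {
    "gene_A": {"acc1": True, "acc2": True, "acc3": True},
    "gene_B": {"acc1": True, "acc2": False, "acc3": True},
    "gene_C": {"acc1": False, "acc2": False, "acc3": True},
    "gene_D": {"acc1": True, "acc2": True, "acc3": True},
}
core_genes, variable_genes = classify_pangenome(pav_matrix)
print(core_genes, variable_genes)
```

Because the core set can only shrink as accessions are added, the proportion of variable genes reported for a species depends on how many and which individuals were sampled.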

Current status of SV studies in plants

Structural variation studies in plants are increasing and are being applied to understand genomic changes during evolution, domestication and breeding. Recently, several pangenome studies have been conducted for different plant species, and PAV diversity has been investigated. For example, in a wheat pangenome study, Montenegro et al. (2017) used 18 wheat cultivars to identify PAVs associated with important agronomic traits, including response to environmental stress and defence response. In this study, they also demonstrated that the reference genome cultivar, Chinese Spring, poorly represented modern wheat lines. Recently, Walkowiak et al. (2020) studied 15 representative wheat cultivars collected from around the world and found that a translocation that occurred in some of the cultivars between chromosomes 5B and 7B is selectively neutral during breeding. In a subsequent study using 538 wheat lines, they found that the translocation occurred in 66% of the selected lines (Walkowiak et al., 2020). In a recent barley pangenome study, Jayakodi et al. (2020) found that a large‐scale inversion (~10 Mb) on chromosome 2H is frequently found in germplasm from northern Europe. Golicz et al. (2016b) reported that SVs affected the presence of flowering time genes such as FLOWERING LOCUS C (FLC) in Brassica oleracea. Through the pangenome study of Brassica oleracea and Brassica napus, Bayer et al. (2019) and Dolatabadian et al. (2020) revealed that disease resistance genes show diverse PAV patterns among different Brassica accessions and that this seems to be a common feature of plant pangenomes (Dolatabadian et al., 2017). By examining 725 tomato accessions, Gao et al. (2019) discovered 4873 genes demonstrating PAV and identified a rare allele deletion in the TomLoxC promoter that affects the flavour of tomato. In a 3000 rice genome project, Fuentes et al. (2019) demonstrated that rice genomic regions with frequent SVs were enriched in stress response genes.
Zhang et al. (2015) reported that SVs affected the coding regions of 1676 cucumber genes, and they found that genes in deleted regions were associated with histone methylation and abiotic stress response, while duplicated genes were often involved in the reproductive process. Genes encoded in inversion regions played an important role in the response to chemical stimulus, and genes in insertion regions were related to histone acetylation. With recent development and application of long‐read sequencing and optical mapping, SV studies in plants have been further refined, and numerous high‐quality SV studies have been reported (Table 2). For example, Michael et al. (2018) assembled one Arabidopsis genome and found that, compared to the Col‐0 genome assembly, the new Oxford Nanopore genome assembly has 4280 SVs with a total length of 9.5 Mb, among which, repeat‐related SVs account for 58%, followed by insertions and deletions (31%). Zhou et al. (2019) studied 50 grapevine cultivars and 19 wild relatives and found that inversions and translocations have strong associations with selection. In a soybean study, Xie et al. (2019) compared wild and cultivated soybeans using optical mapping and confirmed a large inversion at the I locus that can affect seed coat colour during domestication. Using PacBio long‐read sequencing, Song et al. (2020) de novo assembled eight canola genomes and found 77.2–149.6 Mb of PAVs among these accessions. After a PAV‐based genome‐wide association study (GWAS), they identified three FLC genes that are related to ecotype differentiation. Recently, two SV studies by Liu et al. (2020) and Alonge et al. (2020) used long‐read sequencing to study the role of SVs in plants. In Liu et al. (2020), more than 776 000 SVs were discovered in 26 representative soybean accessions. They also identified a 10‐kb PAV on chromosome 15 that has a significant association with seed lustre.
Alonge et al. (2020) performed a ‘panSV’ study using Oxford Nanopore data for 100 tomato accessions with 238 490 SVs identified. After associating SVs with QTL involved in the metabolism of guaiacol and fruit weight, different haplotypes were resolved that had been missed in previous GWAS.
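The basic question behind a PAV‐based association test is whether accessions carrying a variant differ in a trait from accessions lacking it. The sketch below tests one PAV against one quantitative trait with a permutation test. It is not the method used in the studies cited above: the function name, the phenotype values and the permutation count are hypothetical, and a real GWAS would also correct for population structure and multiple testing.

```python
import random
from statistics import mean

def pav_association(present, phenotype, n_perm=2000, seed=0):
    """Permutation test for a difference in mean phenotype between
    accessions with and without a given PAV.

    Returns the observed absolute difference in means and an empirical p-value.
    """
    rng = random.Random(seed)

    def abs_mean_difference(labels):
        carriers = [p for p, has in zip(phenotype, labels) if has]
        non_carriers = [p for p, has in zip(phenotype, labels) if not has]
        return abs(mean(carriers) - mean(non_carriers))

    observed = abs_mean_difference(present)
    labels = list(present)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between PAV and trait
        if abs_mean_difference(labels) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)

# Hypothetical data: ten accessions, five carrying the PAV.
has_pav = [True] * 5 + [False] * 5
trait = [10.1, 9.8, 10.3, 10.0, 9.9, 7.1, 6.9, 7.3, 7.0, 6.8]
effect, p_value = pav_association(has_pav, trait)
print(effect, p_value)
```

Repeating such a test for every PAV in a pangenome gives the raw material for a PAV‐based scan, with the caveats noted above.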
Table 2

Recent structural variations studies in plants

Species | Methods | Major SV findings | References
Melon | Short‐read alignment | A 1,070‐bp deletion at 23.85 kb upstream of MELO3C019694 was found which might impair the transcriptional regulation of this gene | Zhao et al. (2019)
Setaria viridis | Whole‐genome comparison | Approximately 22% of the genes were variable genes | Mamidi et al. (2020)
Brassica nigra | Long‐read alignment | Approximately 6000–7000 SVs found in the two B. nigra accessions and among the SVs 63.4–70% were deletions | Perumal et al. (2020)
Eggplant | Short‐read alignment | Asymmetric SV accumulation was found in potential regulatory regions of protein‐coding genes among the different eggplant genomes | Wei et al. (2020)
Peach | Short‐read alignment | A 9‐bp insertion in Prupe.4G186800 had an association with early fruit maturity; a 487‐bp deletion in the promoter of PpMYB10.1 was associated with flesh colour around the stone; a 1.67 Mb inversion was highly associated with fruit shape; a gene adjacent to the inversion breakpoint of PpOFP1 regulated flat shape formation | Guo et al. (2020)
Banana | Fluorescence in situ hybridization (FISH) | Large differences in chromosome structure discriminated individual banana accessions | Simonikova et al. (2020)
Maize | Whole‐genome comparison, short‐read alignment, and FISH | A 1.8 Mb duplication on the Gametophyte factor1 locus which was for unilateral cross‐incompatibility; increased copy number of carotenoid cleavage dioxygenase 1 (ccd1) in A188 was associated with elevated expression during seed development | Lin et al. (2020)
Peanut | Short‐read alignment, whole‐genome comparison | A. hypogaea showed more enrichment of deletions and insertions in the upstream regions of the coding sequences than A. monticola | Yin et al. (2020)
Brassica napus | Whole‐genome comparison | 77.2–149.6 Mb sequences showed PAV patterns, which included more than 9.5% of the genes | Song et al. (2020)
Rice | Short‐read alignment, whole‐genome comparison and long‐read alignment | The site‐frequency spectrum of SVs was skewed towards lower frequency variants than synonymous SNPs; peaks of SV divergence were enriched for known domestication genes | Kou et al. (2020)
Maize | Whole‐genome comparison | 21.9% of the polymorphic SVs showed low linkage disequilibrium with nearby SNPs; A new significant locus for oil concentration and long‐chain fatty acid composition (C18_1, C18_2 and C20_1) on chromosome 4 was found associating with SVs | Yang et al. (2019)
Solanum pimpinellifolium | Whole‐genome comparison and long‐read alignment | SVs overlapping genes played a role in breeding traits such as fruit weight and lycopene content; SVs contribute to complex regulatory networks, such as fruit quality traits | Wang et al. (2020)
Rice | Whole‐genome comparison and long‐read alignment | Organelle‐to‐nucleus DNA transfers resulted in numerous SVs that participated in the nuclear genome divergence of rice species and subspecies | Ma et al. (2020)
Banyan tree | Whole‐genome comparison | A chromosome fusion event found in FmChr03, FhChr03 and FhChr07, which was followed by two inversions. Genes within the rearranged regions of FmChr03 and FhChr03 have an association with plant immunity | Zhang et al. (2020)
Apple | Short‐read alignment | PAV genes were highly associated with pollination, signal transduction and response to stress | Sun et al. (2020)
Brassica napus | Long‐read alignment | SVs played a role in B. napus eco‐geographical adaptation and disease resistance | Chawla et al. (2020)
Tomato | Long‐read alignment | SVs could change tomato gene dosage and expression levels modified fruit flavour, size and production | Alonge et al. (2020)
Sunflower | Whole‐genome comparison | SVs had associations with flowering time and seed size | Todesco et al. (2020)
Soybean | Whole‐genome comparison | PAV was a major contributor to driving genome size variation. A 10‐kb PAV of a hydrophobic protein‐encoding gene may be responsible for seed lustre | Liu et al. (2020)
Wheat | Short‐read alignment | 23% of the genes were variable and 330 genes were absent from the reference. Variable genes tended to be enriched in functions like protein phosphorylation and protein catabolic process | De Oliveira et al. (2020)

Conclusions and perspective

Structural variation represents an important part of genetic diversity in plants and plays a role in phenotypic variation. The limitations of technology and methods used to analyse SVs have previously hindered our understanding of the extent and importance of these variations. With recent advances in DNA sequencing and optical mapping, together with the development of advanced bioinformatics tools, the study of SVs in plants is becoming more common, and there is an increasing awareness that SVs are as important as SNPs and small indels (Wellenreuther et al., 2019). Although the current technology and methods have dramatically increased the resolution of SV identification, false positives remain. Filtration and further validation are required to make SV detection more reliable. New computational algorithms for SV calling that take account of different ploidy levels and genome repetitiveness, particularly ones using long‐read sequencing and long‐range genomic information, are expected to be developed for plant genome data. Refined SV identification pipelines are also needed to further increase sensitivity. Machine learning approaches may be developed to integrate SV calls from different algorithms to reduce false positives. With the improved accuracy and read length in long‐read sequencing, haplotype‐aware plant genome assemblies are expected to be produced, which could support a detailed mining of allelic or heterozygous variation and the hidden genes missing in current linear assemblies. Techniques such as Strand‐seq (Falconer et al., 2012) can be applied to further assess allele‐based SVs, particularly inversions. Other techniques, such as short‐read, long‐read and direct RNA sequencing, may be useful to check the accuracy of SV identification through gene expression. As more individuals are studied, improved visualization, such as a genome graph, can be used to display SVs between different individuals. 
However, more effort is needed to solve the problems of using a graph genome, such as finding an efficient way to switch sequence coordinates between assemblies. Mining SVs or genes altered by SVs can be useful for breeding. Currently, there are few methods to directly link SVs with particular phenotypes; therefore, SV‐specific genome‐wide association study approaches are needed to efficiently associate SVs with phenotypes. Genome editing, for example using the CRISPR/Cas system, provides a way to validate or induce SVs of interest in plants to produce advanced crop varieties (Zhang et al., 2018). Building pangenomes or genus‐wide pangenomes (Khan et al., 2020) provides a useful way of mining SV‐related genes. To further benefit plant breeding, SV‐phenotype‐related databases for different species are needed. By searching such databases, breeders and crop researchers can identify candidate SVs that can be used in their breeding programmes to produce improved varieties.
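Integrating calls from different algorithms to reduce false positives can be approximated, well short of machine learning, by a rule‐based consensus: keep a call only if another caller reports an SV of the same type with sufficient reciprocal overlap. The sketch below assumes calls stored as (chromosome, start, end, type) tuples, a 50% reciprocal‐overlap cut‐off and support from at least two callers; the caller names and thresholds are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for calls (chrom, start, end, svtype).

    Returns 0.0 when the calls differ in chromosome or SV type or do not overlap.
    """
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))

def consensus_calls(callsets, min_overlap=0.5, min_support=2):
    """Keep one representative of each call reported by at least min_support callers."""
    kept = []
    for caller, calls in callsets.items():
        for call in calls:
            support = {caller}
            for other, other_calls in callsets.items():
                if other != caller and any(
                    reciprocal_overlap(call, o) >= min_overlap for o in other_calls
                ):
                    support.add(other)
            already_kept = any(reciprocal_overlap(call, k) >= min_overlap for k in kept)
            if len(support) >= min_support and not already_kept:
                kept.append(call)
    return kept

# Hypothetical call sets from three callers.
callsets = {
    "caller_a": [("chr1", 1000, 2000, "DEL"), ("chr2", 500, 900, "DUP")],
    "caller_b": [("chr1", 1050, 2100, "DEL")],
    "caller_c": [("chr1", 990, 1980, "DEL"), ("chr3", 10, 5000, "INV")],
}
consensus = consensus_calls(callsets)
print(consensus)
```

Requiring agreement trades sensitivity for precision: calls reported by only one caller are dropped, including any that are real.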

Funding

This work was funded by the Australian Research Council (ARC) (grant no: LP160100030).

Conflict of interest

There is no conflict of interest to disclose.

Author contributions

Y.Y drafted this manuscript and prepared all figures and tables. P.B, J.B and D.E edited this manuscript. All authors read and approved this manuscript.