
Structural variations in plant genomes.

Rachit K Saxena, David Edwards, Rajeev K Varshney.

Abstract

Differences between plant genomes range from single nucleotide polymorphisms to large-scale duplications, deletions and rearrangements. The large polymorphisms are termed structural variants (SVs). SVs have received significant attention in human genetics and were found to be responsible for various chronic diseases. However, little effort has been directed towards understanding the role of SVs in plants. Many recent advances in plant genetics have resulted from improvements in high-resolution technologies for measuring SVs, including microarray-based techniques, and more recently, high-throughput DNA sequencing. In this review we describe recent reports of SV in plants and describe the genomic technologies currently used to measure these SVs.


Keywords: copy number variations (CNVs); inversions; next-generation sequencing (NGS); presence and absence variations (PAVs); structural variations (SVs); translocations


Year: 2014 PMID: 24907366 PMCID: PMC4110416 DOI: 10.1093/bfgp/elu016

Source DB: PubMed Journal: Brief Funct Genomics ISSN: 2041-2649 Impact factor: 4.241

INTRODUCTION

Plant species frequently possess unique features in terms of their habitat, growth and reproduction, often owing to differences in their genomes. Unlocking the information present within plant genomes will advance our understanding of some of the basic biological phenomena that make individual plant species special and may help in the improvement of agronomic crop species. A central challenge in genome studies is to correlate genomic DNA variation with observed heritable phenotypes [1]. The ability to detect genomic differences between individuals is the foundation of these studies, and technologies to detect genomic variation have advanced significantly in recent years. Plant genome variation exists in many forms, and these variations can be beneficial, neutral or deleterious to the plant. The first differences observed in plant genome composition were mainly in the number and structure of chromosomes, observed using microscopy. However, during the past two decades, the application of molecular genetic markers has dominated this experimental landscape [2]. Molecular marker technology has advanced from laborious and expensive restriction fragment length polymorphisms to high-throughput sequence-based markers such as simple sequence repeats and single nucleotide polymorphisms (SNPs) [3]. Since the introduction of next-generation DNA sequencing (NGS) technology, SNPs have come to dominate molecular genetic studies [2, 4–6]. Recent developments have demonstrated that SNPs do not capture all the meaningful genomic variations that contribute to phenotypic differences [7] and that larger structural variants (SVs) also play an important role. SVs are defined as genomic variations that involve segments of DNA larger than 1 kb in length [8]. SVs refer to insertions/deletions (InDels), inversions, translocations and copy number variations (CNVs) [8]. SVs can also be classified as microscopic or submicroscopic depending on the method of their detection.
The mechanism of SV formation has been an active area of research. Human studies revealed two main mechanisms of SV formation, which rely on sequence similarity at DNA breakpoints. The first mechanism is known as nonhomologous end-joining (NHEJ) and requires a very low level of sequence similarity at the breakpoints. NHEJ is the result of aberrant repair of uneven double-stranded breaks produced following DNA damage [9, 10]. A second mechanism proposed for repetitive sequences in the genome is termed non-allelic homologous recombination and this requires high sequence similarity at the breakpoints [11, 12]. Plant genomes host large numbers of repetitive sequences ranging from 10% in Arabidopsis to >80% in bread wheat (Triticum aestivum), and many plants contain multiple copies of entire chromosomes in the form of ploidy levels (from diploid to octaploid and higher) that arise from spontaneous genome duplication (autopolyploidy) or hybridization of chromosomes from different species (allopolyploidy). In addition to recent genome duplications, there is substantial evidence of ancient duplication events in various evolutionary lineages (paleopolyploidy). SVs can arise through duplication events, with differential loss of genes between lineages. In addition, transposons can play important roles in genome evolution and may also generate SVs. Several other mechanisms for SV production have also been proposed, such as fork stalling and template switching (FoSTeS) [13]. In human genetics, SVs have been extensively studied for their association with chronic disease [14]. However, in plants, studies of SVs are more limited. In the 10 years since the sequencing of the Arabidopsis genome, the genomes of several plant species have become available [15], and the cost of sequencing or re-sequencing genomes has reduced significantly, enabling the high-throughput genome-wide analysis of variants such as SNPs and SVs. 
Recently, SVs have been identified in several plant species, including Arabidopsis [16], barley (Hordeum vulgare) [17, 18], foxtail millet (Setaria italica) [19], maize (Zea mays) [7, 20, 21], rice (Oryza sativa) [22], sorghum (Sorghum bicolor) [23], soybean (Glycine max) [24] and wheat (T. aestivum) [25], and in several cases, SVs were found to be associated with phenotypic variation (Table 1). In this review we focus on submicroscopic SVs and present methods for their identification and characterization. In addition, we provide a brief account of current research into microscopic SVs.

Table 1:

List of structural variations and their associations with phenotypes in plant species

Plant species	CNVs identified	PAVs identified	Genes covered	Accessions	Trait/genes associated with SVs	Method used for detection of SVs	Studies
Arabidopsis	1059 regions	–	500 genes	80 inbred lines	Adaptation to diverse environment	Re-sequencing and de novo assembly	Cao et al. [26]
	14 CNV events (comparing 16°C lineage with reference lineage at 22°C); 11 CNV events (comparing 28°C lineage with reference lineage at 22°C); 13 CNV events (comparing biotic stress lineage with reference lineage)	–	400 (comparing 16°C lineage with reference lineage at 22°C); 292 (comparing 28°C lineage with reference lineage at 22°C); 402 (comparing biotic stress lineage with reference lineage)	Three siblings from five lineages derived from a common ancestor	Different environmental conditions (temperature and biotic stress)	CGH	DeBolt [16]
	2315 large InDels including CNVs		316 genes	Ler accession compared with Col-0	Stress responsive genes	Re-sequencing	Lu et al. [27]
	1220 (Eil-0), 1312 (Lc-0), 1344 (Sav-0) and 987 (Tsu-1) genes with deletions were identified			Eil-0, Lc-0, Sav-0, Tsu-1 and Col-0 (reference)	Common ancestry and history of rearrangements	CGH and re-sequencing	Santuari et al. [28]

Barley	1 kb insertion upstream of the HvAACT1 coding region			265 cultivated and 154 wild barley accessions were used to examine the presence of the specific insertion	Aluminium tolerance	Targeted re-sequencing	Fujii et al. [29]
Barley	Four times as many Bot1 copies in barley landrace Sahara 3771 as in Clipper genotypes				Boron toxicity tolerance	Combination of mapping approaches, hybridization and qPCR	Sutton et al. [17]

Foxtail millet	37 232 SVs in SLX_Yugu1; 41 514 SVs in SLX_Zhang gu		1612 SVs in genes in SLX_Yugu1; 2163 SVs in genes in SLX_Zhang gu	Landrace, Shi-Li-Xiang (SLX) compared with the two reference genome sequences	–	Re-sequencing	Bai et al. [19]

Maize	>2000 regions	–	All CNVs present in genes	14 inbred lines	Disease response and heterosis	CGH	Beló et al. [20]
	10 000 segments	–	The majority (70%) of genes had read-depth variants in at least one line	103 lines across pre-domestication and domesticated Zea mays lines	–	Re-sequencing	Chia et al. [30]
	–	296 genes putatively missing from one or more lines		Six elite maize inbred lines	–	Re-sequencing	Lai et al. [31]
	Tandem triplication of MATE1 gene	–	MATE1 gene	Three-copy allele identified from maize and teosinte diversity panels and validated in recombinant inbred lines	Aluminium tolerance	qPCR	Maron et al. [32]
	>400 segments	>1700	∼50 genes associated with CNVs and 180 genes associated with PAVs	Mo17 and B73	Domestication	CGH	Springer et al. [7]
	3410 genes			19 inbred and 14 wild lines	–	CGH	Swanson-Wagner et al. [21]
	333 genes			278 inbred lines	Breeding selection	Re-sequencing	Jiao et al. [33]
Opium	–	10 genes		Three varieties, F₂ population of 271 individuals	Noscapine synthesis	Re-sequencing	Winzer et al. [34]

Pigeonpea	–	29 regions	–	4 lines	Cytoplasmic male sterility	Re-sequencing	Tuteja et al. [35]

Potato	Four genes associated with CNVs			16 lines	Growth and development	FISH	Iovene et al. [36]

Rice	1676 segments	1327 genes	50% CNVs and all PAVs associated with genes	40 cultivated and 10 wild lines	Disease resistance and domestication	Re-sequencing	Xu et al. [22]
Rice	641 segments	–	–	One line each from japonica and indica	–	CGH	Yu et al. [37]

Sorghum	17 111	16 487	CNVs associated with 2600 genes and PAVs associated with 1416 genes	Two sweet and one grain sorghum inbred lines	Disease resistance and selection	Re-sequencing	Zheng et al. [23]

Soybean	Significant levels of CNVs identified	25 genes		Williams 82 individuals and parental lines	Stress responsive genes	CGH	Haun et al. [38]
	–	18 600 regions	856 genes	14 cultivated and 17 wild lines	Metabolic and catalytic processes and disease resistance	Re-sequencing	Lam et al. [39]
	188–267 segments	133 regions	672 genes associated with CNVs	Archer, Minsoy, Noir 1 and Williams 82	Disease resistance and biotic stress	CGH	McHale et al. [24]

Wheat	Two to three copies of Vrn-A1	–	–	–	Flowering time	Targeted re-sequencing	Díaz et al. [40]
	–	Deletion in upstream region of Ppd-1 gene	–	–	Heading time	qPCR	Nishida et al. [25]
	85	7	–	–	Biotic and abiotic stresses	Re-sequencing	Saintenac et al. [41]


TYPES OF SVs

Microscopic SVs

After defining chromosomes as the carrier of the genes in the early 20th century, a number of karyotype studies were conducted to determine the size and number of chromosomes in different species. Features could be visualized directly on chromosomes through a microscope using cytogenetic techniques such as chromosome painting or fluorescent in situ hybridization (FISH). The earliest unbanded karyotypes consisted of relatively short condensed chromosomes that were barely distinguishable from one another. However, changes in chromosome numbers and highly abnormal chromosomes could be distinguished. Later, solid-stained chromosomes were used to detect secondary constrictions, satellite regions and size variations in heterochromatic regions [42]. By using chromosome-banding techniques, more discrete structural variations could be identified in plant genomes. An alternative strategy, FISH, allows the positioning of unique sequences and repetitive DNA on chromosomes. At this resolution, common variations such as changes in length or inversions of the pericentric heterochromatic region of chromosomes could be identified. Genomic in situ hybridization was the first technique that used fluorescent labels for analysing genome organization in interspecific hybrids, allopolyploid species and interspecific introgression lines [43]. FISH, together with chromosomal arm ratio analysis and the mapping of heterochromatic regions, was conducted for inbred lines of maize and lily (Lilium spp.) [44, 45]. In several plant species, large cloned genomic regions maintained as bacterial artificial chromosomes (BACs) have also been successfully used as FISH probes to determine the chromosomal location of specific sequences [46, 47]. Recently, FISH has been used to survey CNVs using 18 randomly selected potato (Solanum tuberosum) BAC clones in 16 potato cultivars with diverse genetic backgrounds. Six BACs with insert sizes of 137–145 kb were found to be associated with large CNVs.
Four genes affected by CNVs displayed a dosage effect in transcription and were probably affecting the growth and development of the potato plants [36]. FISH screening using subtracted random polymerase chain reaction (PCR) libraries as probes also provided the positions of microsatellite and chromosome-specific subtelomeric sequences [48]. Cytogenetically detectable heterochromatic variants have been used for species distinction and relationship studies in plants [49, 50]. These initial studies have provided knowledge of genome size variation that demonstrated the relatively consistent nature of genomes within a species. However, microscopic variations could be found even among closely related species, and these might be correlated with various adaptive features at the nuclear and organismic levels in plants. Microscopic variations in some genera occur in a discontinuous manner, forming groups of taxa, which are separated by regular time intervals. However, some genera showed continuous variation [49]. These facts demonstrated that microscopic genome variations could be used as corroborative evidence in plant systematics.

Submicroscopic SVs

Recent advances in DNA sequencing technology have allowed plant structural genetic variations to be analysed at a higher resolution than the microscopic studies described above. These SVs have been identified in either a genome-wide or a targeted manner, with varying degrees of resolution. Relatively little is known about genomic SVs and their association with phenotypic characteristics in plants. However, reports on such variants have started to appear (Table 1). Here we review recent SV studies in plant genomes.

Copy number variations

The term CNV is used to define sequences that demonstrate a variable copy number between individuals. The term has been used to describe duplications, deletions and insertions [51]. CNVs have been extensively characterized in maize. In one study, genome-wide comparison of two inbred lines, B73 and Mo17, identified 400 putative CNVs, which were reported to be the result of tandem duplications [7]. In a subsequent study, genome-wide comparison of a set of 14 inbred maize lines identified thousands of CNVs [20]. In a further study in maize, CNVs were examined in 19 diverse inbred maize lines and 14 teosinte accessions [21]. This study identified 479 genes with higher copy number and 3410 genes with fewer copies following comparison with a reference genome. Most of these CNVs were found to be present in related wild individuals, suggesting that these CNVs were not associated with deleterious genes responsible for lethality or major fitness loss [21]. In the small genome model plant Arabidopsis, CNVs were detected in 402 genes [16], while in rice, a comparison of japonica and indica cultivars identified 641 CNVs [37]. The majority of these rice CNVs suggested a loss of genomic segments in the indica cultivar ‘Guang-lu-ai 4’. Japonica and indica rice diverged around 0.2–0.4 million years ago and display a high degree of DNA sequence variation [52]. Genome-wide patterns of CNVs have also been detected in sorghum by comparing two sweet and one grain inbred sorghum lines, identifying 3234 CNVs in 2600 genes [23]. Soybean was the first legume species to have its genome analysed for CNVs, and a total of 267 CNVs with an average size of 18–23 kb were detected across the genomes assayed [24] (Table 1). The relationship between CNV occurrence and recombination frequency is not fully understood. In general, CNVs are scattered across plant genomes.
Studies conducted in the maize genome have revealed that low-recombination regions such as telomeres show a greater number of CNVs [20, 21]. In contrast to maize, higher levels of CNV were identified in high-recombination regions in soybean and barley [18, 24].

Presence and absence variations

Sequences that are present in one genome and absent in another have been termed presence–absence variation (PAV). PAVs can be considered to be extreme CNVs, where the sequence is completely missing from one or more individuals. A comparison of sequence data from two maize inbred lines (B73 and Mo17) detected 1783 PAVs that were present in the B73 genome and absent in the Mo17 genome. These PAVs relate to 1270 genes, suggesting that PAV affects a significant portion of the maize genome. Analysis of these PAVs highlighted their association with ancestral evolution events and domestication [7]. Initially, CNVs and PAVs were combined for analysis of genome-wide variation in maize [21]. However, the mechanism of PAV formation was found to be different from that for CNVs and is not influenced by recombination. A deletion mechanism based on short direct repeats likely contributes to the high rate of PAV among maize genotypes [53]. Comparing sequence data from sweet sorghum and grain sorghum lines identified 16 487 PAVs associated with 1416 genes [23]. In pigeonpea (Cajanus cajan), PAVs have been reported in the mitochondrial genomes of male-sterile (A-), maintainer (B-), hybrid (H-) and wild (W-) lines [35]. Similar mitochondrial structural variations have been identified in other plant species including maize [54] and Arabidopsis [55].
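The view of PAVs as extreme CNVs can be illustrated with a read-depth ratio. The following is a minimal sketch, assuming that normalized read depths for a sample and a reference are already available for each region. The function name `classify_region` and the gain/loss thresholds are hypothetical choices for illustration and are not taken from the studies cited above.

```python
def classify_region(sample_depth, reference_depth, gain=1.5, loss=0.5):
    """Label one region from normalized read depths (sample vs. reference).

    The gain/loss thresholds are illustrative assumptions.
    """
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    ratio = sample_depth / reference_depth
    if ratio == 0:
        return "PAV"        # sequence entirely absent from the sample
    if ratio >= gain:
        return "CNV gain"
    if ratio <= loss:
        return "CNV loss"
    return "unchanged"


# Hypothetical depths for four regions, each against a reference depth of 30x.
for depth in (0, 12, 29, 62):
    print(depth, classify_region(depth, 30))
```

In practice a region with zero depth may also reflect poor mappability rather than true absence, so a call of this kind would normally need independent validation.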

Other structural variations

Other types of submicroscopic structural variation include inversions and translocations. These variations have been reported in nuclear and organelle genomes and are of considerable interest, as they can introduce novel diversity in plants. Several studies have reported the presence of subgenomic structural variations in mitochondrial genomes that have arisen from inversions and translocations [56, 57]. While such events in plant mitochondria increase organelle genome complexity, recombination has also been found to maintain genomic stability and may provide a mechanism to increase genetic variation in the absence of sexual reproduction [58]. Genomic inversions can be a driver of speciation, and this has been studied in plants using comparative genomics [59, 60]. An inverted region may not successfully recombine with its counterpart chromosome and might lead to infertility. Inversions are highly polymorphic in some species and may play a critical role in local adaptation [61]. Large-scale inversions have also been characterized in the chloroplast genomes of land plants [62]. Cytological studies have previously been conducted to characterize genomic inversions in various plant species; however, the application of large-scale genome sequencing will significantly help in characterizing the complex landscape of inversions and translocations in plant genomes.

APPROACHES TO IDENTIFY SUBMICROSCOPIC STRUCTURAL VARIATIONS

The on-going revolution in DNA sequencing technology known as NGS, together with advances in bioinformatics, has allowed structural genetic variations to be analysed at high resolution at a genome-wide level [63, 64]. SVs differ in size and complexity and hence different techniques have been used to characterize them in plant genomes. PCR-based approaches have been used for targeted regions of the genome. For example, real-time quantitative PCR (qPCR) was used to detect multiple copies of the Bot1 gene in barley genotypes [17] and the MATE1 gene in maize genotypes [32], and a deletion in the upstream region of Ppd-1 homeologs of wheat [25]. This technique offers a highly sensitive, high-throughput alternative to the more traditional Southern blot used for determining gene copy number. PCR can also identify small translocations and inversions, as well as InDel polymorphisms and CNVs [65]. Below we discuss approaches that have had a major impact on the discoveries of submicroscopic variants in the plant genome.
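Copy number estimation by qPCR is commonly expressed with the comparative Ct calculation (2 to the power of minus delta-delta-Ct). The sketch below is an illustration under stated assumptions, not the procedure of the cited studies: it assumes near 100% amplification efficiency, a single-copy reference gene, and a calibrator genotype with a known copy number. The function and parameter names are hypothetical.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator,
                         calibrator_copies=1.0):
    """Estimate target gene copy number with the comparative Ct calculation.

    Assumes amplification efficiency close to 100% (signal doubles per cycle).
    """
    # Normalize the target gene against the reference gene in each genotype.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Each cycle of difference corresponds to a two-fold difference in template.
    return calibrator_copies * 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the target amplifies two cycles earlier in the sample.
print(relative_copy_number(22.0, 24.0, 24.0, 24.0))
```

With these made-up values the target is detected two cycles earlier in the sample than in the calibrator, which corresponds to a four-fold higher copy number under the stated assumptions.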

Microarrays

Microarray-based techniques were among the first used to detect genome-wide variation in human and plant genomes. Using array comparative genomic hybridization (aCGH), differentially labelled DNA from the test genome and a reference genome are hybridized to an array. Such an array contains thousands of probes developed from known gene sequences. BACs are the most popular arrayed targets in aCGH experiments, as they provide extensive coverage of the genome; however, cDNAs, PCR products and oligonucleotides can all be used as array targets. To increase the resolution of aCGH, the ‘complexity’ of the input DNA is reduced by a method called representation or whole-genome sampling [66]. A number of variations have been included in this approach to improve its efficiency, for instance using spotted oligonucleotides on Affymetrix arrays [67]. aCGH was first developed and applied for cancer genomics [14], and later used extensively in plant genomics to detect SVs [7, 16, 21, 24]. An early version of an array used in maize was composed of 14 423 BACs [7]. In comparison, the latest maize array contains 32 450 maize genes [21]. In Arabidopsis, a whole-genome CGH array was used to estimate SVs [16], and a recently developed high-resolution CGH platform was used to investigate the structure and diversity of genomic introgressions in two classical soybean near isogenic line populations [68]. Several factors affect aCGH-based SV detection. 
Gene distribution along the genome captured in arrays is not uniform, leading to bias; the majority of the probes are often designed to be complementary to a single genotype, reducing the efficiency of detecting SVs in other genotypes; sequences that are present in individuals but not in the reference sequence from which the CGH arrays were designed are not represented; hybridization signals may deviate owing to DNA polymorphisms and lead to the false calling of SVs; and finally there remains a need to physically map the location of each probe in the genome. A further challenge is applying moderate-density arrays to highly repetitive plant genomes. In this scenario, a high-density microarray platform designed for aCGH would greatly improve the efficiency of detection and estimation of SVs. Evolving NGS techniques offer several advantages over aCGH by enabling the direct detection of DNA variations and recombination breakpoints [69]. NGS-based approaches also provide the ability to detect inversions and translocations that are not generally detected by aCGH. However, aCGH would still be beneficial in genomic regions with multiple repeats where NGS-based assembly is difficult.
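The aCGH comparison of test and reference hybridization signals is usually summarized per probe as a log2 ratio. The following is a minimal sketch of that idea, assuming background-corrected, positive signal intensities. The function names and the threshold of 0.5 are hypothetical and chosen only for illustration.

```python
import math


def log2_ratios(test_signal, reference_signal):
    """Per-probe log2(test / reference); all signals must be positive."""
    return [math.log2(t / r) for t, r in zip(test_signal, reference_signal)]


def call_probes(ratios, threshold=0.5):
    """Label each probe as gain, loss or neutral using a symmetric threshold."""
    calls = []
    for value in ratios:
        if value >= threshold:
            calls.append("gain")
        elif value <= -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Hypothetical intensities for three probes.
ratios = log2_ratios([200, 100, 50], [100, 100, 100])
print(ratios, call_probes(ratios))
```

As noted above, sequence polymorphisms within a probe can also reduce hybridization signal, so a "loss" call from a single probe does not by itself demonstrate a deletion.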

Genome sequencing/re-sequencing

In recent years, sequencing technologies have rapidly evolved from classical Sanger sequencing to NGS [70]. This has significantly lowered the cost of sequencing DNA. However, there are some limitations associated with these technologies, such as the length of a DNA molecule that can be sequenced, though there are continuous improvements in this area. At present, read lengths produced by the various technologies range from 25 bp to 15 kb. There is usually a compromise between read length, cost and accuracy, with low-cost or longer-read sequencing generally demonstrating significantly lower accuracy than some of the more popular technologies. The Illumina sequencing systems currently dominate the NGS market and they produce accurate reads of 150 bp for the HiSeq2500 and 300 bp for the MiSeq. Many NGS technologies such as those from Illumina use paired end or mate pair sequencing protocols, where two reads are generated with a known orientation and approximate distance between them. This significantly assists the specificity of mapping or assembling the sequence data. Evolving technologies such as Single-Molecule Real-Time (SMRT) sequencing from Pacific Biosciences and Moleculo technology from Illumina have demonstrated the ability to read long DNA molecules of up to 10–20 kb [71]. Nanopore technology also promises advances in this area, though little is known about the specific applications. Advances in DNA sequencing technology will continue to drive genomics and enhance the ability to detect structural variations with increasing resolution over a greater number of samples.
There are three main approaches that can be used for the detection of SVs in plant genomes using DNA sequence data: (i) de novo assembly, (ii) re-sequencing and (iii) pan-genome analysis.
In the de novo assembly approach, two or more unique assemblies can be compared to identify and characterize SVs. Once the assemblies have been generated, this is a very efficient approach and can detect all types of SVs including CNVs, PAVs, translocations and inversions (Figure 1). The initial assembly needs high sequence coverage and sophisticated algorithms to reconstruct the genome from short overlapping sequences [72, 73]. This approach is the most robust for the characterization of SVs in a genome; however, the production of de novo assembled genomes of suitable quality remains the chief limitation. Draft plant genome assemblies are often highly fragmented and may contain many collapsed repeat regions that confound CNV detection. Improving and validating genome assemblies is an active research area, which is advancing through the application of novel algorithms and improved DNA sequence data. However, until sequencing costs fall significantly and reads become substantially longer, the de novo assembly of all genotypes representing a species is unfeasible, and this approach is usually restricted to the detection of inter-species variation. Different draft genome assemblies from various plant species have been used to detect translocations and inversions between lineages [59, 60, 74].
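The known orientation and approximate spacing of paired-end reads, mentioned above, is also what allows mapped read pairs to hint at SVs. The sketch below illustrates commonly used read-pair signatures, assuming a standard paired-end library in which the two reads of a pair point towards each other. The function name, the expected span of 400 bp and the tolerance are hypothetical values, not parameters from any cited study.

```python
def classify_pair(left_strand, right_strand, span, expected=400, tolerance=100):
    """Classify one read pair mapped to a single chromosome.

    left_strand/right_strand: "+" or "-" for the leftmost and rightmost read.
    span: mapped distance between the outer ends of the pair, in bp.
    expected/tolerance: hypothetical library span and allowed deviation.
    """
    if left_strand == right_strand:
        return "inversion candidate"           # both reads on the same strand
    if (left_strand, right_strand) == ("-", "+"):
        return "tandem duplication candidate"  # reads point away from each other
    if span > expected + tolerance:
        return "deletion candidate"            # sample lacks reference sequence
    if span < expected - tolerance:
        return "insertion candidate"           # sample carries extra sequence
    return "concordant"


# Hypothetical pairs: normal, same-strand, and stretched.
for pair in (("+", "-", 410), ("+", "+", 390), ("+", "-", 2500)):
    print(pair, classify_pair(*pair))
```

A single discordant pair is weak evidence; tools built on this idea generally require clusters of pairs supporting the same event.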

Figure 1:

Two major NGS approaches to detect SVs are de novo assembly and re-sequencing. The de novo assembly method is highly efficient at detecting all types of SVs including CNVs, PAVs, inversions and translocations. Re-sequencing approaches are viable options to detect CNVs and PAVs.

In the re-sequencing approach, DNA sequence reads from individual genotypes are aligned to a closely related reference genome (Figure 1). Differences between genomes then appear as variations between the aligned reads and the reference genome. This approach can also be used for the detection of inversions, based on the orientation of aligned reads relative to the reference genome. Although this approach may not have such a high resolution as the de novo assembly approach, it will remain, in our opinion, the preferred method to detect intra-specific variation owing to its relatively low cost and because it avoids the complexity of generating a de novo genome assembly for each variety. The re-sequencing approach has been used in sorghum, where a set of nearly 1500 genes differentiating sweet and grain sorghum were identified harbouring SVs [23]. Re-sequencing-based approaches are currently being applied to detect SVs in several other projects including the 1001 genome project in Arabidopsis [75], the maize Panzea project (http://www.panzea.org) and the rice variation catalogue [22]. We are currently using this approach in pigeonpea, chickpea (Cicer arietinum) and peanut (Arachis hypogaea), re-sequencing 300 lines from reference sets for each species. These on-going efforts in a variety of plant species will provide insight into the distribution of SVs in plants as well as their evolution.
The pan-genome is composed of a core genome and a dispensable genome. The core genome contains genome segments or genes that are present in all accessions, while a dispensable genome is composed of partially shared and accession-specific DNA sequence elements. This concept of separate core and dispensable genomes was first described in prokaryotes [76]. A single genome sequence does not possess the entire genomic architecture of a species and so a pan-genome approach enables the description of a species rather than an individual at the genome level. Multiple accession sequencing projects in several plant species enable the creation of pan-genomes by defining the core and dispensable genome components of a species. The pan-genome has been described in some plants, e.g. maize [77–79] and Arabidopsis thaliana [80, 81].
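The split between core and dispensable genomes can be expressed with simple set operations. The following is a minimal sketch, assuming that gene presence has already been determined for each accession. The accession names and gene identifiers are hypothetical.

```python
def split_pan_genome(presence):
    """Split genes into core and dispensable sets.

    presence: dict mapping accession name -> set of gene IDs detected in it.
    """
    if not presence:
        raise ValueError("at least one accession is required")
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)          # every gene seen in any accession
    core = set.intersection(*gene_sets)    # genes shared by all accessions
    return core, pan - core


# Hypothetical accessions and gene IDs, for illustration only.
presence = {
    "acc_A": {"g1", "g2", "g3"},
    "acc_B": {"g1", "g2"},
    "acc_C": {"g1", "g4"},
}
core, dispensable = split_pan_genome(presence)
print(sorted(core), sorted(dispensable))
```

Here the dispensable set contains both partially shared genes (g2) and accession-specific genes (g3, g4), matching the definition given above.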

ASSOCIATION OF SVs WITH PLANT PHENOTYPES

The role of SVs has been found to be important in human evolution and disease [13, 21], and SVs have been shown to be more frequent than SNPs in human genomes [13]. Although SVs have also been discovered in plants, their discovery and characterization are heavily reliant on the availability of at least one reference genome [82]. Few studies have been conducted to characterize the role of SVs in shaping plant phenotypes. The role of PAVs in determining plant phenotype has been demonstrated in opium (Papaver somniferum), where a cluster of 10 genes spanning a 221 kb genomic region were found to be associated with noscapine synthesis. Analysis of an F2 mapping population indicated that these genes are tightly linked and absent in non-noscapine-producing lines [34]. Many of the CNVs identified in maize were found to be associated with domestication [21, 30]. The effect of selection on maize diversity has been estimated by sequencing 278 temperate maize inbred lines from different stages of breeding history. The results demonstrated that modern breeding has introduced highly dynamic genetic variations in the form of SNPs, InDels and CNVs, and affected a number of genic and non-genic regions in the maize genome [33]. The first-generation maize HapMap was constructed using sequence polymorphisms between 27 diverse inbred lines. This identified 18 regions that have undergone selective sweeps, including one region of 11 Mb on the long arm of chromosome 10 [83]. The second-generation maize HapMap was constructed using 103 lines and identified SVs that are enriched at loci associated with important traits [30]. An RNA-seq experiment using diverse lines of maize detected 757 loci that were restricted to a subset of the lines. Using de novo assembly of unmapped reads, novel transcripts were identified. It was also demonstrated that PAVs observed between different heterotic groups were transcribed. Furthermore, a core set and dispensable set of genes were identified [84]. 
Similarly Lai et al. [31] re-sequenced six elite maize inbred lines, including the parents of the commercial hybrids, and found 296 genes in B73 that were missing from at least one of the six inbred lines. Inbred lines representing different heterotic groups contained different sets of deleted genes. In both RNA-seq [84] and re-sequencing [31] studies it was postulated that unique transcripts or genes present in different heterotic groups might be contributing to the genetic basis of heterosis. In a recent study in maize by Maron et al. [32], CNVs were identified for the MATE1 gene in aluminium-tolerant lines, but these were not common in teosinte. This study suggested that multiple copies of the MATE1 gene arose recently and probably after domestication, and that CNVs were selected for their association with aluminium tolerance. MATE1 expression found to be associated with CNV, where three MATE1 copies were identical and part of a tandem triplication. Only three maize-inbred lines carrying the three-copy allele and demonstrating higher aluminium tolerance were identified from maize and teosinte diversity panels [32]. CNV of a 31 kb repeat segment observed in different haplotypes of the Rhg1 locus encode multiple gene products in soybean cyst nematode (SCN)-resistant varieties. In SCN-susceptible varieties, one copy of the 31 kb segment per haploid genome was present. SCN resistance was found to be associated with increased expression of the CNV-related genes [85]. In an interesting study in palmer amaranth (Amaranthus palmeri), some plants were found resistant to herbicide glyphosate. These resistant plants contained 5–160 copies more of the EPSPS gene than susceptible plants. Expression and protein level of EPSPS gene was positively correlated with enhanced copy number [86]. In wheat, the recent association of SVs with plant phenotype has come in form of CNVs and large InDel polymorphisms. 
CNV of the Vrn-A1 gene is associated with intermediate or late flowering phenotypes. CNV of Ppd-B1 has been found to contribute to photoperiod sensitivity in wheat [40]. Genotypes with a single copy of the Ppd-B1 gene were photoperiod sensitive, while genotypes with elevated copy numbers were early flowering and day-neutral [40]. An InDel polymorphism in the 50 bp upstream region of the Ppd-1 gene was also associated with heading time of wheat cultivars [25]. In barley, a CACTA-like transposon insertion 5 kb upstream of the open reading frame (ORF) of the aluminium tolerance gene HvAACT1 enhances HvAACT1 expression and alters its tissue localization [29]. Another example of a trait-associated CNV in barley involves the boron efflux carrier gene Bot1, which plays an important role in boron tolerance [17]. CNVs have also been found to be associated with nucleotide-binding leucine-rich repeat (NB-LRR) genes and receptor-like kinase (RLK) genes, which are known to be involved in plant defence mechanisms. CNVs related to disease resistance and biotic stress responses have been identified in Arabidopsis [27], rice [22] and soybean [24]. Variable copy numbers of these genes may be advantageous in the face of changing environmental conditions and the threats posed by continuously evolving pests and pathogens.
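
The distinction between a core and a dispensable gene set, as reported for the maize RNA-seq study [84], can be illustrated with a minimal sketch. It assumes that presence/absence calls have already been made for each line; the function name, line names and gene identifiers below are hypothetical toy values and do not come from any of the cited studies. Genes detected in every line are treated as core, and genes missing from at least one line as dispensable.

```python
from typing import Dict, Set, Tuple


def split_core_dispensable(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Partition genes into core and dispensable sets.

    `presence` maps a line name to the set of genes detected in that line.
    Core genes occur in every line; dispensable genes occur in at least one
    line but not in all of them.
    """
    if not presence:
        return set(), set()
    gene_sets = list(presence.values())
    core = set.intersection(*gene_sets)   # shared by all lines
    pan = set.union(*gene_sets)           # seen in any line
    return core, pan - core


# Hypothetical toy presence/absence calls for three lines.
toy_presence = {
    "line_A": {"g1", "g2", "g3"},
    "line_B": {"g1", "g3", "g4"},
    "line_C": {"g1", "g2", "g3", "g4"},
}

core_genes, dispensable_genes = split_core_dispensable(toy_presence)
print("core:", sorted(core_genes))
print("dispensable:", sorted(dispensable_genes))
```

In practice the presence/absence calls themselves are the difficult step, since they depend on the reference genome, read depth and mapping thresholds used; the sketch only covers the final set arithmetic.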

OUTLOOK

Results from plant genome analysis have demonstrated the importance of SVs in evolutionary and biological processes. Initial studies conducted in a limited number of plant species suggest that a range of SVs are present and distributed across the genomes. It is anticipated that SVs will contribute as much to the overall variation observed in the genome as SNPs. The low level of sequence diversity that is often suggested to exist in some self-pollinated or partially cross-pollinated crop species, which is based largely on SNPs, might therefore be an underestimate. There remain challenges that need to be resolved before we achieve a complete understanding of the genome and its relationship with the plant phenotype. These include the effect of combinations of variants, interactions between genetic and environmental factors, and epigenetic mechanisms. At present, no single method can detect the total complement of genomic structural variations. Even genome re-sequencing, which is being applied in a number of important plant species, resolves only a proportion of the structural variation present in the genome. The highest resolution studies of SVs can be achieved by using a de novo assembly-based approach; however, this is not currently feasible for large numbers of individuals. Continued improvements in sequencing technologies and reductions in cost should eventually make it possible to detect nearly all variants between genomes. Even after de novo assembly, a significant amount of information can be lost owing to the challenges of assembling SVs using the available algorithms, and major advances in sequencing technology are required to facilitate accurate whole-genome assembly on a large scale. Improved assembly algorithms, combined with the ability to accurately sequence long stretches of DNA, would help to overcome many of these limitations. 
On-going and future efforts will greatly facilitate studies aimed at correlating genetic variation with plant performance. These efforts will also provide a better understanding of population history, natural selection and the impact of structural variation in plant genomes. This review describes recent reports of structural variations (SVs) in plant genomes and the genomic technologies currently used to measure these SVs. Much of the recent attention in plant genetics is the result of the availability of high-resolution technologies for measuring these variants, including microarray-based techniques and, more recently, high-throughput DNA sequencing. On-going projects in a number of plant species promise to explore and characterize SVs and their associations with plant phenotypes.

78 in total

1. Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor.

Authors: Ruth A Swanson-Wagner; Steven R Eichten; Sunita Kumari; Peter Tiffin; Joshua C Stein; Doreen Ware; Nathan M Springer
Journal: Genome Res Date: 2010-10-29 Impact factor: 9.043

2. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution.

Authors: Graham R Bignell; Thomas Santarius; Jessica C M Pole; Adam P Butler; Janet Perry; Erin Pleasance; Chris Greenman; Andrew Menzies; Sheila Taylor; Sarah Edkins; Peter Campbell; Michael Quail; Bob Plumb; Lucy Matthews; Kirsten McLay; Paul A W Edwards; Jane Rogers; Richard Wooster; P Andrew Futreal; Michael R Stratton
Journal: Genome Res Date: 2007-08-03 Impact factor: 9.043

Review 3. Structural variants: changing the landscape of chromosomes and design of disease studies.

Authors: Lars Feuk; Christian R Marshall; Richard F Wintle; Stephen W Scherer
Journal: Hum Mol Genet Date: 2006-04-15 Impact factor: 6.150

4. Copy number variation in potato - an asexually propagated autotetraploid species.

Authors: Marina Iovene; Tao Zhang; Qunfeng Lou; C Robin Buell; Jiming Jiang
Journal: Plant J Date: 2013-05-13 Impact factor: 6.417

5. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes.

Authors: Xun Xu; Xin Liu; Song Ge; Jeffrey D Jensen; Fengyi Hu; Xin Li; Yang Dong; Ryan N Gutenkunst; Lin Fang; Lei Huang; Jingxiang Li; Weiming He; Guojie Zhang; Xiaoming Zheng; Fumin Zhang; Yingrui Li; Chang Yu; Karsten Kristiansen; Xiuqing Zhang; Jian Wang; Mark Wright; Susan McCouch; Rasmus Nielsen; Jun Wang; Wen Wang
Journal: Nat Biotechnol Date: 2011-12-11 Impact factor: 54.908

6. The 1001 genomes project for Arabidopsis thaliana.

Authors: Detlef Weigel; Richard Mott
Journal: Genome Biol Date: 2009-05-27 Impact factor: 13.583

7. Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement.

Authors: Rajeev K Varshney; Chi Song; Rachit K Saxena; Sarwar Azam; Sheng Yu; Andrew G Sharpe; Steven Cannon; Jongmin Baek; Benjamin D Rosen; Bunyamin Tar'an; Teresa Millan; Xudong Zhang; Larissa D Ramsay; Aiko Iwata; Ying Wang; William Nelson; Andrew D Farmer; Pooran M Gaur; Carol Soderlund; R Varma Penmetsa; Chunyan Xu; Arvind K Bharti; Weiming He; Peter Winter; Shancen Zhao; James K Hane; Noelia Carrasquilla-Garcia; Janet A Condie; Hari D Upadhyaya; Ming-Cheng Luo; Mahendar Thudi; C L L Gowda; Narendra P Singh; Judith Lichtenzveig; Krishna K Gali; Josefa Rubio; N Nadarajan; Jaroslav Dolezel; Kailash C Bansal; Xun Xu; David Edwards; Gengyun Zhang; Guenter Kahl; Juan Gil; Karam B Singh; Swapan K Datta; Scott A Jackson; Jun Wang; Douglas R Cook
Journal: Nat Biotechnol Date: 2013-01-27 Impact factor: 54.908

8. Double-strand break repair processes drive evolution of the mitochondrial genome in Arabidopsis.

Authors: Jaime I Davila; Maria P Arrieta-Montiel; Yashitola Wamboldt; Jun Cao; Joerg Hagmann; Vikas Shedge; Ying-Zhi Xu; Detlef Weigel; Sally A Mackenzie
Journal: BMC Biol Date: 2011-09-27 Impact factor: 7.431

9. Genomic heterogeneity and structural variation in soybean near isogenic lines.

Authors: Adrian O Stec; Pudota B Bhaskar; Yung-Tsi Bolon; Rebecca Nolan; Randy C Shoemaker; Carroll P Vance; Robert M Stupar
Journal: Front Plant Sci Date: 2013-04-24 Impact factor: 5.753

10. Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome.

Authors: María Muñoz-Amatriaín; Steven R Eichten; Thomas Wicker; Todd A Richmond; Martin Mascher; Burkhard Steuernagel; Uwe Scholz; Ruvini Ariyadasa; Manuel Spannagl; Thomas Nussbaumer; Klaus F X Mayer; Stefan Taudien; Matthias Platzer; Jeffrey A Jeddeloh; Nathan M Springer; Gary J Muehlbauer; Nils Stein
Journal: Genome Biol Date: 2013-06-12 Impact factor: 13.583