A Genome Assembly of the Barley 'Transformation Reference' Cultivar Golden Promise.

Miriam Schreiber^1, Martin Mascher^2,3, Jonathan Wright^4, Sudharsan Padmarasu^2, Axel Himmelbach^2, Darren Heavens^4, Linda Milne^5, Bernardo J Clavijo^4, Nils Stein^2,6, Robbie Waugh^7,8,9.

Abstract

Barley (Hordeum vulgare) is one of the most important crops worldwide and is also considered a research model for the large-genome small grain temperate cereals. Although genomic resources for barley continue to improve, they remain limited for cv. Golden Promise, the most efficient genotype for genetic transformation. We have developed a barley cv. Golden Promise reference assembly integrating Illumina paired-end reads, long mate-pair reads, Dovetail Chicago in vitro proximity ligation libraries and chromosome conformation capture sequencing (Hi-C) libraries into a contiguous reference assembly. The assembled genome of 7 chromosomes is 4.13 Gb in size, has a super-scaffold N50 after Chicago libraries of 4.14 Mb and contains only 2.2% gaps. Evaluated with BUSCO (benchmarking universal single copy orthologous genes), the genome assembly contains 95.2% of complete and single copy genes from the plant database. A high-quality Golden Promise reference assembly will be useful to the whole barley research community and particularly valuable for CRISPR-Cas9 experiments.

Keywords: Barley; Golden Promise; reference assembly

Year: 2020 PMID: 32241919 PMCID: PMC7263683 DOI: 10.1534/g3.119.401010

Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154

Barley is a true diploid with 14 chromosomes (2n = 14). Its genome is around 5 Gb in size and mainly consists of repetitive elements (International Barley Genome Sequencing Consortium 2012). Barley has been an important crop for thousands of years (Mascher ). It was the fourth most produced cereal in 2016 worldwide (Faostat, http://www.fao.org/faostat/en/#home) and the second most produced in the UK. While the majority of barley is used as feed, the most important market for 2-row spring barley is the whisky industry. An iconic historical variety is cv. Golden Promise, which was used extensively for malting and whisky production; some distilleries still use it today. Golden Promise is a 2-row spring type which was mainly grown in Scotland in the 1970s and early 1980s and was identified as a semi-dwarf mutant after a gamma-ray treatment of the cultivar Maythorpe.

In recent years, the main research interest in Golden Promise has come from its genetic transformability. Most barley transformations are conducted using Golden Promise as it usually achieves the best shoot recovery from callus (Hensel ). While many other cultivars have been tested and some successfully used, the transformation efficiency of Golden Promise is always superior (Murray ; Ibrahim ; Lim ).

With the rise of the CRISPR-Cas9 genome editing technology, the prospect of a Golden Promise reference assembly has already sparked wide interest in the barley community. The use of CRISPR-Cas9 ideally requires a complete and correct reference assembly for the identification of target sites (Karkute ). The Cas9 enzyme targets a position in the genome based on a sgRNA (single-guide RNA) followed by a PAM (protospacer-adjacent motif). The guide RNA is usually designed to be 20 bp long and target-specific to avoid any off-target effects. The PAM region consists of three nucleotides “NGG” (Belhaj ; Lawrenson ). Any nucleotide variation between different cultivars can therefore cause problems with the CRISPR-Cas9 genome editing technology (Bortesi ; Jaganathan ). The time and cost involved in such increasingly common experiments highlight the value of a high-quality Golden Promise reference assembly.
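
The target-site rule described above (a 20-nucleotide protospacer immediately followed by an NGG PAM) can be illustrated with a minimal Python sketch. The function name find_cas9_targets and the toy sequences are hypothetical and not part of the original study; real guide design also scores off-targets, which this sketch does not attempt.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_cas9_targets(seq: str, guide_len: int = 20):
    """List (strand, start, protospacer, pam) for each protospacer followed by NGG.

    `start` is 0-based on the strand being scanned (hypothetical convention).
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Toy example: the same locus in two cultivars, differing by one base in the PAM.
reference_allele = "A" * 20 + "TGG"
variant_allele = "A" * 20 + "TGA"
print(find_cas9_targets(reference_allele))  # one site on the + strand
print(find_cas9_targets(variant_allele))    # no site: the PAM is lost
```

A single substitution in the PAM (TGG to TGA in the toy example) removes the site, which is why cultivar-specific variants matter when guides are designed against a reference from a different cultivar.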

Materials and Methods

Contig construction and scaffolding

DNA extraction, library construction and sequencing:

High molecular weight barley DNA was isolated from leaf material of 3-week-old Golden Promise plants that had been kept in the dark for 48 h to reduce starch levels. DNA was extracted using the GE Life Sciences Nucleon PhytoPure kit (GE Healthcare Life Sciences, Buckinghamshire, UK) according to the manufacturer's instructions. Both paired-end and long mate-pair libraries were constructed and sequenced at the Earlham Institute by the Genomics Pipelines Group. A total of 2 µg of DNA was sheared targeting 1 kbp fragments on a Covaris-S2 (Covaris, Brighton, UK) and size selected on a Sage Science Blue Pippin 1.5% cassette (Sage Science, Beverly, USA) to remove DNA molecules <600 bp, and amplification-free, paired-end libraries were constructed using the Kapa Biosciences Hyper Prep Kit (Roche, New Jersey, USA). Long mate-pair libraries were constructed from 9 µg of DNA according to the protocol described in Heavens based on the Illumina Nextera Long Mate Pair Kit (Illumina, San Diego, USA). Sequencing was performed on Illumina HiSeq 2500 instruments with a 2x250 bp read metric targeting >60x raw coverage of the amplification-free library and 30x coverage of a combination of long mate-pair libraries with insert sizes >7 kbp.

Contig and scaffold generation:

Contigging was performed using the w2rap-contigger (Clavijo ). Three mate-pair libraries were produced with insert sizes 6.5, 8 and 9.5kb and sequenced to generate approximately 284 million 2x250 bp reads. Mate-pair reads were processed and used to scaffold contigs as described in the w2rap pipeline (Clavijo ; https://github.com/bioinfologics/w2rap). Scaffolds less than 500 bp were removed from the final assembly.

Chromosome conformation capture

Dovetail:

Leaf material from 10-day-old Golden Promise plants was sent to Dovetail Genomics (Santa Cruz, CA, USA) for the construction of Chicago libraries. Dovetail extracted high molecular weight DNA and conducted the library preparations. The Chicago libraries were sequenced on an Illumina HiSeqX (Illumina, San Diego, CA, USA) with 150 bp paired-end reads. Using the scaffold assembly as input, the HiRise scaffolding pipeline was used to build super scaffolds (Putnam ).

Hi-C:

The Hi-C library was constructed from one-week-old Golden Promise seedlings following the protocol described in Padmarasu , using DpnII for digestion of crosslinked chromatin. Sequencing of the Hi-C library was conducted on an Illumina HiSeq 2500 (Illumina, San Diego, CA, USA) with 101 bp paired-end reads. Super scaffolds from Dovetail were ordered and orientated to build the final pseudomolecules using the TRITEX assembly pipeline (Monat ), for which a detailed user guide is available (https://tritexassembly.bitbucket.io).

Repeat and transcript annotation

The final assembly was analyzed for repetitive regions using RepeatMasker (version 4.0.9) (Smit et al. 2013-2015) with the TREP repeat library (trep-db_complete_Rel-16) (Wicker ), changing repetitive regions to lower case (-xsmall parameter) [repeat library downloaded from: http://botserv2.uzh.ch/kelldata/trep-db/downloadFiles.html]. The output of RepeatMasker was condensed using the Perl script “one-code-to-find-them-all” (Bailly-Bechet ) with the parameters --strict and --unknown. Transcript annotation was transferred from the BaRT transcriptome dataset (Rapazote-Flores ) and the TRITEX gene annotation (Monat ) using Gmap (version 2018-03-25) with the following parameters: -f 2 -n 1 --min-trimmed-coverage=0.8 --min-identity=0.9 (both files are available to download from figshare. BaRT: https://doi.org/10.6084/m9.figshare.9705278; TRITEX: https://doi.org/10.6084/m9.figshare.9705125).
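
With -xsmall, RepeatMasker writes masked bases in lower case, so the masked proportion of a sequence can be read directly from the soft-masked FASTA. The following is a minimal Python sketch under that assumption; the helper name softmasked_fraction and the toy record are hypothetical, and the per-class percentages reported in this paper were derived from the condensed RepeatMasker output, not from a count like this.

```python
def softmasked_fraction(fasta_text: str) -> float:
    """Return the fraction of sequence characters that are lower case.

    Header lines (starting with '>') and empty lines are skipped.
    Gap characters (N) are counted as unmasked sequence in this sketch.
    """
    masked = 0
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    if total == 0:
        raise ValueError("no sequence found")
    return masked / total


# Toy soft-masked record: 6 of 12 bases are lower case.
toy_fasta = ">scaffold_1\nACGTacgt\nNNac\n"
print(softmasked_fraction(toy_fasta))  # 0.5
```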

Data validation and quality control

We used BUSCO with the plant dataset (embryophyta_odb9). For gene prediction BUSCO uses Augustus (Version 3.3) (Stanke ; König ). For the gene finding parameters in Augustus we set species to wheat and ran BUSCO in the genome mode (-m geno -sp wheat).

Data availability

Raw reads have been deposited in the NCBI Sequence Read Archive under BioProject PRJNA533066 [SRA: Paired-end reads: SRR9291461, SRR9291462, SRR9291463, SRR9291464; Long mate-pair reads: SRR9266823, SRR9266824, SRR9266825, SRR9266826, SRR9266827, SRR9266828; Dovetail reads: SRR9202370, SRR9202371, SRR9202372, SRR9202373, SRR9202374; Hi-C data: SRR8922888]. The reference assembly is available to download either from figshare (https://doi.org/10.6084/m9.figshare.9332045) or through the European Nucleotide Archive (GCA_902500625).

Results and Discussion

Genome assembly

Here we report a full-length Golden Promise genome assembly which was generated by integrating short read sequencing and two chromosome conformation sequencing approaches. Approximately 624 million 2x250 bp paired reads were generated, providing an estimated 62.4x coverage of the genome. 245,820 scaffolds were generated comprising 4.11 Gb of sequence with an N50 of 86.6 kb. Gaps comprised only 1.6% of the scaffolds (Table 1). To generate a chromosome-scale assembly, we utilized two different chromosome conformation capture approaches. In a first step, we used Chicago Dovetail data, which is generated by in vitro proximity ligation of large DNA fragments, to increase the scaffold size and to correct misjoins from the previous scaffolding. In the next step, we integrated Hi-C data, which uses the native chromatin folding, to increase the contiguity to full chromosome size. This resulted in a final assembly of 4.13 Gb and 7 chromosomes plus an extra pseudomolecule containing the unassigned scaffolds. We have provided the reference sequence as a blast and gmap searchable website for easy access: https://ics.hutton.ac.uk/gmapper/.
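
The coverage estimate above follows from simple arithmetic: total sequenced bases divided by genome size. The sketch below assumes the denominator was the roughly 5 Gb genome size given in the introduction (an assumption, although it reproduces the reported figure); the function name is hypothetical.

```python
def estimated_coverage(read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Estimate sequencing depth from paired-end read counts.

    Each pair contributes two reads of `read_length_bp` bases.
    """
    total_bases = read_pairs * 2 * read_length_bp
    return total_bases / genome_size_bp


# 624 million pairs of 2x250 bp reads over an assumed 5 Gb genome.
coverage = estimated_coverage(624_000_000, 250, 5_000_000_000)
print(f"{coverage:.1f}x")  # 62.4x
```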

Table 1

Statistics for the different stages of the assembly process

	Contigs	Scaffolds	Dovetail	Hi-C
N50	22.4kb	86.67kb	4.14Mb	/
Number	786,696	245,820	128,283	8
Longest	352,153bp	1,540,019bp	22,832,123bp	612,216,794bp
Size	4.02Gb	4.11Gb	4.12Gb	4.13Gb
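
For readers unfamiliar with the statistics in Table 1, the following minimal Python sketch shows how an N50 and a gap fraction are conventionally computed. The function names and toy inputs are hypothetical; the values in Table 1 were produced by the assembly pipelines, not by this code.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequences given")


def gap_fraction(sequence: str) -> float:
    """Return the fraction of positions that are gap characters (N or n)."""
    if not sequence:
        raise ValueError("empty sequence")
    return sequence.upper().count("N") / len(sequence)


# Toy assembly: total length 32, so the N50 is reached once 16 bases are covered.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
print(gap_fraction("ACGTNNNNAC"))     # 0.4
```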

Completeness of the assembly

We used the spectra-cn function from the Kmer Analysis Toolkit (KAT) (Mapleson ) to check for content inclusion in the contigs and scaffolds. KAT generates a k-mer frequency distribution from the paired-end reads and identifies how many times k-mers from each part of the distribution appear in the assembly being compared. It is assumed that with high coverage of paired-end reads, every part of the underlying genome has been sampled. Ideally, an assembly should contain all k-mers found in the reads (not including k-mers arising from sequencing errors) and no k-mers not present in the reads. The spectra-cn plot in Figure 1a generated from the contigs shows sequencing errors (k-mer multiplicity <20) appearing in black as these are not included in the assembly. The majority of the content appears in a single red peak indicating sequence that appears once in the assembly. The black region under the main peak is very small indicating that most of this content from the reads is present in the assembly. The content that appears to the right of the main peak and is present twice or three times in the assembly represents repeats.
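
The idea behind a spectra-cn comparison can be sketched in a few lines of Python: count each k-mer in the reads, count it in the assembly, and tabulate how many distinct k-mers fall at each (read multiplicity, assembly copy number) pair. This is a toy illustration only and not KAT's implementation; in particular it ignores reverse complements, which real k-mer tools typically handle by counting canonical k-mers, and all names are hypothetical.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every k-mer of `seq` in order."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def spectra_cn(reads, assembly: str, k: int) -> Counter:
    """Tabulate distinct k-mers by (multiplicity in reads, copies in assembly).

    Keys with assembly copy number 0 are read k-mers missing from the assembly
    (the "black" part of a spectra-cn plot); keys with read multiplicity 0 are
    assembly k-mers that never occur in the reads.
    """
    read_counts = Counter(km for read in reads for km in kmers(read, k))
    assembly_counts = Counter(kmers(assembly, k))
    table = Counter()
    for kmer, multiplicity in read_counts.items():
        table[(multiplicity, assembly_counts.get(kmer, 0))] += 1
    for kmer, copies in assembly_counts.items():
        if kmer not in read_counts:
            table[(0, copies)] += 1
    return table


toy_reads = ["ACGT", "ACGT", "ACGA"]  # the last read carries a simulated error
toy_assembly = "ACGTT"                # the final T is not supported by any read
for key, count in sorted(spectra_cn(toy_reads, toy_assembly, k=3).items()):
    print(key, count)
```

In the toy output, the low-multiplicity k-mer CGA is absent from the assembly (as sequencing errors should be), while GTT appears in the assembly with read multiplicity 0, the kind of content a good assembly should not contain.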

Figure 1

Spectra-cn plots comparing k-mers from the paired-end reads to k-mers in (a) the contig assembly and (b) the scaffold assembly.

Scaffolds generally contain more misassemblies than contigs and this is reflected in the spectra-cn plot in Figure 1b generated from the scaffolds. The red bar at k-mer multiplicity 0, which is not present in the contigs spectra-cn plot, reflects k-mers that appear in the scaffolds but do not appear in the reads. Approximately 7.2 million k-mers are represented in this region, less than 0.15% of the total.

Repetitive regions

The Golden Promise reference assembly was analyzed for repetitive regions using RepeatMasker with the TREP repeat library. This identified 73.2% (2.95 Gb) of the Golden Promise assembly as transposable elements (Table 2), almost all of them retroelements. The same analysis was also done for MorexV1 and MorexV2, showing that all three have very similar results (Table 2). Differences from the published results for the MorexV1 and MorexV2 assemblies (International Barley Genome Sequencing Consortium 2012; Mascher ; Monat ) are due to the different repeat libraries used.

Table 2

Identified repetitive elements in the Golden Promise assembly. Values represent percentage coverage of the genome

	Golden Promise	MorexV1	MorexV2
Total	72.88	70.65	74.93
Class I: Retroelement
LTR Retrotransposon	63.16	62.25	64.25
LTR/Copia	19.87	21	20.94
LTR/Gypsy	42.97	40.93	42.99
Unclassified LTR	0.32	0.31	0.32
Non-LTR Retrotransposon
LINE	0.25	0.24	0.24
SINE	0.03	0.03	0.03
Class II: DNA Transposon
DNA Transposon Superfamily	8.25	7.39	8.97
CACTA superfamily (DTC)	7.77	6.92	8.49
hAT superfamily (DTA)	0.004	0.004	0.004
Mutator superfamily (DTM)	0.13	0.13	0.13
Tc1/Mariner superfamily (DTT)	0.2	0.19	0.2
Harbinger superfamily (DTH)	0.13	0.12	0.13
Unclassified (DTX)	0.02	0.02	0.02
MITE (DXX)	0.01	0.01	0.01
Helitron (DHH)	0.08	0.09	0.09
Unclassified Element (XXX)	0.46	0.3	0.74
Simple Sequence Repeats	0.63	0.36	0.59

Transcript annotation

For transcript annotation we transferred the latest barley annotation from MorexV2 onto the Golden Promise reference assembly. From a total of 63,658 genes in MorexV2, 62,605 genes could be transferred onto Golden Promise. Among these genes 7.2% did not contain a valid start codon, 7.7% had a different nucleotide length and 5% had a premature stop codon in the gene. As some transcripts contained a combination of those errors, this still left 84% of correctly transferred transcripts.

We used two approaches to evaluate the quality of the Golden Promise assembly based on gene content. The analysis was done for each of the steps along the assembly process. The first approach was done with BUSCO (Benchmarking Universal Single-Copy Orthologs, v3.0.2) (Simão ; Waterhouse ). It assesses the completeness of a genome by identifying conserved single-copy, orthologous genes. Even at the contig stage, the assembly already contained more complete single copy genes (92.4%) than the published MorexV1 assembly of the cultivar Morex (91.5%) (Figure 2a). Throughout the assembly process this improved to 95.2% of complete and single copy genes in the final pseudomolecule. This is very close to the recently published MorexV2 assembly with 97.2% of single copy genes. As expected, the proportion of fragmented genes decreased during the assembly process from 2.8% to only 1.1% in the pseudomolecule.
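
The three error classes reported for the transferred genes (no valid start codon, different length, premature stop codon) can overlap, which is why the individual percentages sum to more than the 16% of transcripts that were not transferred correctly. The sketch below illustrates one way such checks could be written; the paper does not describe its implementation, so the function name check_cds, the use of ATG as the only valid start and the toy sequences are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds: str, reference_length: int) -> set:
    """Return the set of problems found in a transferred coding sequence.

    Assumes (for this sketch) that `cds` is in frame from its first base
    and that the final complete codon is the expected terminal stop.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    problems = set()
    if not cds.startswith("ATG"):
        problems.add("no_valid_start")
    if len(cds) != reference_length:
        problems.add("length_differs")
    # Any stop codon before the last codon truncates the protein.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.add("premature_stop")
    return problems


print(check_cds("ATGAAATAA", 9))     # set(): transferred correctly
print(check_cds("CTGAAATAA", 9))     # {'no_valid_start'}
print(check_cds("ATGTAAAAATAA", 9))  # two problems in one transcript
```

The last toy transcript is both longer than its reference and interrupted by an early stop codon, so it would be counted in two of the error classes at once.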

Figure 2

Completeness assessment of the Golden Promise assembly in comparison to the previous steps of the assembly process and the published barley references MorexV1 and MorexV2 for both the BUSCO analysis (a) and the flcDNA mapping analysis (b).

The second approach used a flcDNA dataset which consists of 22,651 sequences generated from the cultivar Haruna Nijo (Sato ; Matsumoto ). These sequences were created from 12 different conditions and represent a good snapshot of the barley transcriptome. They can be used to identify the number of retained sequences in the Golden Promise pseudomolecule and give an impression of the segmentation of the pseudomolecule, highlighted by cDNAs which have been split within or across chromosomes. The 22,651 flcDNAs were mapped to the Golden Promise pseudomolecule using Gmap (version 2018-03-25; Wu and Watanabe 2005) with the following parameters: a minimum identity of 98% and a minimum trimmed coverage of 95%. The results for this dataset are very similar to the BUSCO analysis. The contigs already contained 81.4% of complete and single copy genes in comparison to the 73% of the MorexV1 reference (Figure 2b). The final assembly contained 87.1% of complete and single copy genes, about 14 percentage points more than the barley reference MorexV1 and around 400 genes more than MorexV2, a difference of 1.9%. As in the BUSCO analysis, the numbers of duplicated complete genes and of fragmented genes are lower in the Golden Promise assembly. Again, the overall comparison to MorexV2 shows very similar results, emphasizing the high quality of both barley genomes.

Conclusion

Here, we presented a Golden Promise reference assembly that improves on the currently available barley reference MorexV1 (International Barley Genome Sequencing Consortium 2012; Mascher ) and is near-equivalent to the recently released MorexV2 (Monat ). Importantly, Golden Promise is a European 2-row cultivar, so the assembly expands barley genomic resources to European breeding material, in contrast to the American 6-row cultivar Morex. The importance of having another genome assembly has already been demonstrated in the analysis of the highly divergent Jekyll genes (Radchuk ). We anticipate it will benefit the whole barley research community but will be especially useful for groups working on CRISPR-Cas9.

24 in total

1. In Situ Hi-C for Plants: An Improved Method to Detect Long-Range Chromatin Interactions.

Authors: Sudharsan Padmarasu; Axel Himmelbach; Martin Mascher; Nils Stein
Journal: Methods Mol Biol Date: 2019

2. Method for hull-less barley transformation and manipulation of grain mixed-linkage beta-glucan.

Authors: Wai Li Lim; Helen M Collins; Rohan R Singh; Natalie A J Kibble; Kuok Yap; Jillian Taylor; Geoffrey B Fincher; Rachel A Burton
Journal: J Integr Plant Biol Date: 2018-02-28 Impact factor: 7.061

3. Genomic analysis of 6,000-year-old cultivated grain illuminates the domestication history of barley.

Authors: Martin Mascher; Verena J Schuenemann; Uri Davidovich; Nimrod Marom; Axel Himmelbach; Sariel Hübner; Abraham Korol; Michal David; Ella Reiter; Simone Riehl; Mona Schreiber; Samuel H Vohr; Richard E Green; Ian K Dawson; Joanne Russell; Benjamin Kilian; Gary J Muehlbauer; Robbie Waugh; Tzion Fahima; Johannes Krause; Ehud Weiss; Nils Stein
Journal: Nat Genet Date: 2016-07-18 Impact factor: 38.330

4. A chromosome conformation capture ordered sequence of the barley genome.

Authors: Martin Mascher; Heidrun Gundlach; Axel Himmelbach; Sebastian Beier; Sven O Twardziok; Thomas Wicker; Volodymyr Radchuk; Christoph Dockter; Pete E Hedley; Joanne Russell; Micha Bayer; Luke Ramsay; Hui Liu; Georg Haberer; Xiao-Qi Zhang; Qisen Zhang; Roberto A Barrero; Lin Li; Stefan Taudien; Marco Groth; Marius Felder; Alex Hastie; Hana Šimková; Helena Staňková; Jan Vrána; Saki Chan; María Muñoz-Amatriaín; Rachid Ounit; Steve Wanamaker; Daniel Bolser; Christian Colmsee; Thomas Schmutzer; Lala Aliyeva-Schnorr; Stefano Grasso; Jaakko Tanskanen; Anna Chailyan; Dharanya Sampath; Darren Heavens; Leah Clissold; Sujie Cao; Brett Chapman; Fei Dai; Yong Han; Hua Li; Xuan Li; Chongyun Lin; John K McCooke; Cong Tan; Penghao Wang; Songbo Wang; Shuya Yin; Gaofeng Zhou; Jesse A Poland; Matthew I Bellgard; Ljudmilla Borisjuk; Andreas Houben; Jaroslav Doležel; Sarah Ayling; Stefano Lonardi; Paul Kersey; Peter Langridge; Gary J Muehlbauer; Matthew D Clark; Mario Caccamo; Alan H Schulman; Klaus F X Mayer; Matthias Platzer; Timothy J Close; Uwe Scholz; Mats Hansson; Guoping Zhang; Ilka Braumann; Manuel Spannagl; Chengdao Li; Robbie Waugh; Nils Stein
Journal: Nature Date: 2017-04-26 Impact factor: 49.962

5. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics.

Authors: Robert M Waterhouse; Mathieu Seppey; Felipe A Simão; Mosè Manni; Panagiotis Ioannidis; Guennadi Klioutchnikov; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal: Mol Biol Evol Date: 2018-03-01 Impact factor: 16.240

Review 6. CRISPR for Crop Improvement: An Update Review.

Authors: Deepa Jaganathan; Karthikeyan Ramasamy; Gothandapani Sellamuthu; Shilpha Jayabalan; Gayatri Venkataraman
Journal: Front Plant Sci Date: 2018-07-17 Impact factor: 5.753

7. The highly divergent Jekyll genes, required for sexual reproduction, are lineage specific for the related grass tribes Triticeae and Bromeae.

Authors: Volodymyr Radchuk; Rajiv Sharma; Elena Potokina; Ruslana Radchuk; Diana Weier; Eberhard Munz; Miriam Schreiber; Martin Mascher; Nils Stein; Thomas Wicker; Benjamin Kilian; Ljudmilla Borisjuk
Journal: Plant J Date: 2019-05-25 Impact factor: 6.417

8. Development of 5006 full-length cDNAs in barley: a tool for accessing cereal genomics resources.

Authors: Kazuhiro Sato; Tadasu Shin-I; Motoaki Seki; Kazuo Shinozaki; Hideya Yoshida; Kazuyoshi Takeda; Yukiko Yamazaki; Matthieu Conte; Yuji Kohara
Journal: DNA Res Date: 2009-01-15 Impact factor: 4.458

Review 9. CRISPR/Cas9 Mediated Genome Engineering for Improvement of Horticultural Crops.

Authors: Suhas G Karkute; Achuit K Singh; Om P Gupta; Prabhakar M Singh; Bijendra Singh
Journal: Front Plant Sci Date: 2017-09-22 Impact factor: 5.753

10. BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq.

Authors: Paulo Rapazote-Flores; Micha Bayer; Linda Milne; Claus-Dieter Mayer; John Fuller; Wenbin Guo; Pete E Hedley; Jenny Morris; Claire Halpin; Jason Kam; Sarah M McKim; Monika Zwirek; M Cristina Casao; Abdellah Barakate; Miriam Schreiber; Gordon Stephen; Runxuan Zhang; John W S Brown; Robbie Waugh; Craig G Simpson
Journal: BMC Genomics Date: 2019-12-11 Impact factor: 3.969
