Daniel Frei1, Elisabeth Veekman2, Daniel Grogg3, Ingrid Stoffel-Studer3, Aki Morishima4, Rie Shimizu-Inatsugi4, Steven Yates3, Kentaro K Shimizu4,5, Jürg E Frey1, Bruno Studer3, Dario Copetti3,4.
Abstract
Despite the progress made in DNA sequencing over the last decade, reconstructing telomere-to-telomere genome assemblies of large and repeat-rich eukaryotic genomes is still difficult. More accurate basecalls or longer reads could address this issue, but no current sequencing platform can provide both simultaneously. Perennial ryegrass (Lolium perenne L.) is an example of an important species for which the lack of a reference genome assembly hindered a swift adoption of genomics-based methods into breeding programs. To fill this gap, we optimized the Oxford Nanopore Technologies' sequencing protocol, obtaining sequencing reads with an N50 of 62 kb, a very high value for a plant sample. The assembly of such reads produced a highly complete (2.3 of 2.7 Gb), correct (QV 45), and contiguous (contig N50 and N90 11.74 and 3.34 Mb, respectively) genome assembly. We show how read length was key in determining the assembly contiguity. Sequence annotation revealed the dominance of transposable elements and repeated sequences (81.6% of the assembly) and identified 38,868 protein coding genes. Almost 90% of the bases could be anchored to seven pseudomolecules, providing the first high-quality haploid reference assembly for perennial ryegrass. This protocol will enable the production of longer Oxford Nanopore Technologies reads for more plant samples and usher forage grasses into modern genomics-assisted breeding programs.
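To put the reported correctness figure in perspective, the sketch below converts a quality value into an expected per-base error rate. It assumes that the QV in the abstract is a Phred-scaled consensus quality (QV = -10 log10 of the error probability); the function names are illustrative and are not taken from the study.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # Assumption: QV 45 from the abstract is Phred-scaled.
    p = qv_to_error_rate(45)
    print(f"error rate: {p:.3e}")
    print(f"about one error every {1 / p:,.0f} bases")
```

Under that assumption, QV 45 corresponds to roughly one erroneous base in every 30,000 assembled bases.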
Keywords: Lolium perenne; Oxford Nanopore; forage grasses; genome assembly; genomics; perennial ryegrass
Year: 2021 PMID: 34247248 PMCID: PMC8358221 DOI: 10.1093/gbe/evab159
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Features of the Kyuss genome and assembly. (a) Flow cytometry trace of Kyuss nuclei showing the occurrence of peaks at the same position as in the diploid parent (b) when compared with the tomato external standard. (c) KAT comp spectra describing the occurrence of mostly single-copy k-mers (red area) in the assembly under the main peak. The assembly is overall very complete (very small 0× area at multiplicities higher than 20) and the repeated sequences are correctly represented (purple and green enrichment values at multiples of the main peak). (d) Long-read coverage distribution upon read alignment. The lack of shoulders or additional peaks indicates the absence of collapsed or allelic regions in the assembly. (e) BUSCO analysis of the Kyuss and other public ryegrass assemblies (Byrne et al. 2015; Copetti et al. 2021; Knorst et al. 2019). The Kyuss assembly shows the highest completeness in terms of conserved single-copy orthologs (SCOs), with only 4% of the models being fragmented or missing. The “Rabiosa” assembly is a diploid assembly, thus most of the SCOs are expected to be identified twice. The columns with the asterisk denote BUSCO scores for the predicted gene models. Blue: single copy, orange: duplicated, yellow: fragmented, green: missing models. (f) Total size and contiguity of the ryegrass assemblies evaluated by cumulative sequence length. The expected total size of the assemblies is around 2,500 to 2,700 Mb, except for Rabiosa, where the diploid assembly should result in approximately 5,200 Mb. The high contiguity of the Kyuss assembly is denoted by the sharp vertical rise of the cumulative length curve, which rapidly approaches the total assembly size. In comparison, the “P226/135/16” and “M2289” assemblies show dramatically lower completeness and contiguity.
Statistics of the Lolium perenne Kyuss Genome Assembly and Comparison with Other Public Ryegrass Assemblies
| | Kyuss | P226/135/16 | Rabiosa | M2289 |
|---|---|---|---|---|
| Reference | This study | | | |
| Est. genome size (Gb) | 2.720 | 2.068 | 2.464 | 2.500 |
| Assembly size (Gb) | 2.281 | 1.128 | 4.531 | 0.585 |
| % of genome assembled | 83.9 | 54.6 | 183.9 | 23.4 |
| # of sequences | 1,935 | 48,415 | 226,949 | 129,579 |
| N50 (kb) | 11,276 | 70 | 2,941 | 5 |
| N90 (kb) | 3,320 | 14 | 283 | 2 |
| L50 (#) | 65 | 4,908 | 443 | 37,162 |
| L90 (#) | 209 | 16,951 | 1,984 | 103,446 |
Note.—The fraction of the assembled genome is based upon the genome size estimation provided in the respective studies. In the Rabiosa assembly, most of the allelic regions are represented as separate sequences, thus reaching the diploid genome size.
Statistics of the contigs or scaffolds before being placed on pseudomolecules.
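The contiguity metrics in the table follow the usual definitions: Nx is the length of the shortest sequence in the smallest set of longest sequences that together cover x% of the assembly, and Lx is the number of sequences in that set. The sketch below shows one way to compute them, along with the fraction of the genome assembled. The function names and the toy contig lengths are illustrative assumptions, not data or code from the study.

```python
from typing import Sequence, Tuple


def nx_lx(lengths: Sequence[int], fraction: float) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given sequence lengths.

    fraction is the share of the total assembly length to cover,
    e.g. 0.5 for N50/L50 or 0.9 for N90/L90.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Fallback for floating-point edge cases: all sequences are needed.
    return ordered[-1], len(ordered)


def percent_assembled(assembly_size: float, genome_size: float) -> float:
    """Assembly size as a percentage of the estimated genome size (same units)."""
    return 100.0 * assembly_size / genome_size


if __name__ == "__main__":
    toy_contigs = [100, 80, 60, 40, 20]  # hypothetical lengths, total 300
    print(nx_lx(toy_contigs, 0.5))       # N50 and L50 of the toy set
    print(nx_lx(toy_contigs, 0.9))       # N90 and L90 of the toy set
    # Kyuss values from the table: 2.281 Gb assembled of an estimated 2.720 Gb.
    print(round(percent_assembled(2.281, 2.720), 1))
```

A higher N50 combined with a lower L50 means that fewer, longer sequences make up half of the assembly, which is the pattern the Kyuss column shows relative to the other assemblies.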