Ruth Freire¹, Marius Weisweiler¹, Ricardo Guerreiro¹, Nadia Baig¹, Bruno Hüttel², Evelyn Obeng-Hinneh³, Juliane Renner³, Stefanie Hartje³, Katja Muders⁴, Bernd Truberg⁴, Arne Rosen⁴, Vanessa Prigge⁵, Julien Bruckmüller⁶, Jens Lübeck⁶, Benjamin Stich¹,⁷
Abstract
Potato (Solanum tuberosum L.) is one of the most important crops, with a worldwide production of 370 million metric tons. The objectives of this study were (1) to create a high-quality consensus sequence across the two haplotypes of a diploid clone derived from a tetraploid elite variety and to assess its sequence divergence from the available potato genome assemblies, as well as between the two haplotypes; (2) to evaluate the usefulness of the new assembly for various genomic methods; and (3) to assess the performance of phasing in diploid and tetraploid clones using linked-read sequencing technology. We used PacBio long reads coupled with 10x Genomics reads and proximity ligation scaffolding to create the dAg1_v1.0 reference genome sequence. The final assembly is 812 Mb, of which 750 Mb are anchored to 12 chromosomes, making it larger than other available potato reference sequences. High proportions of properly paired reads were observed even for clones unrelated by pedigree to dAg1. Comparisons of the dAg1_v1.0 sequence with other potato genome sequences highlight the high divergence between potato varieties and illustrate the potential of the dAg1_v1.0 sequence for breeding applications.
Keywords: chromosome-scale; elite potato variety; genome divergence; intragenomic diversity; reference sequence
Year: 2021 PMID: 34534288 PMCID: PMC8664475 DOI: 10.1093/g3journal/jkab330
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Assembly statistics for each step of the final genome assembly strategy for dAg1
| Assembly step | No. of contigs | Assembly size (Mb) | Largest contig (Mb) | N50 (Mb) | N90 (Mb) | L50 | L90 | Ns per 100 kb | BUSCO (%) |
|---|---|---|---|---|---|---|---|---|---|
| **Assembling** | | | | | | | | | |
| Canu | 14,037 | 1,343.9 | 4.559 | 0.203 | 0.035 | 1,643 | 7,787 | 0 | 95 |
| Falcon | 2,109 | 845.7 | 4.904 | 0.618 | 0.206 | 393 | 1,315 | 0 | 95 |
| quickmerge | 1,592 | 889.7 | 10.609 | 0.865 | 0.276 | 267 | 974 | 0 | 95 |
| Arcs 1× | 1,055 | 895.9 | 13.589 | 1.440 | 0.407 | 176 | 635 | 757 | 95 |
| Arcs 2× | 704 | 788.1 | 13.585 | 1.656 | 0.548 | 136 | 445 | 977 | 95 |
| **Hi-C scaffolding** | | | | | | | | | |
| Hi-C Salsa | 385 | 788.4 | 29.219 | 5.059 | 1.007 | 41 | 175 | 1,006 | 95 |
| Hi-C 3D-DNA | 12 (+614) | 812.2 | 89.719 | 57.412 | 52.458 | 6 | 12 | 994 | 94 |
For details see Materials and Methods.
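The N50/L50, N90/L90 and "Ns per 100 kb" columns follow the usual contiguity definitions: sort sequences from longest to shortest and accumulate lengths until the running sum reaches 50% (or 90%) of the assembly size; Nx is the length of the sequence at which that happens and Lx is how many sequences were needed. The sketch below is a minimal illustration of those definitions, assuming lengths in bp; the function names are hypothetical and this is not the tool the authors used to produce the table.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], fraction: float) -> Tuple[int, int]:
    """Return (Nx, Lx) for contig or scaffold lengths (hypothetical helper).

    Nx: length of the shortest sequence among the longest sequences that
    together cover `fraction` of the total assembly size.
    Lx: number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("empty length list")


def ns_per_100kb(sequence: str) -> float:
    """Number of N (gap) bases per 100 kb of sequence (hypothetical helper)."""
    n_count = sequence.upper().count("N")
    # Multiply before dividing to keep the arithmetic exact for small inputs.
    return n_count * 100_000 / len(sequence)
```

For lengths of 100, 80, 50, 40 and 30 bp (300 bp total), the running sum first reaches 150 bp at the second sequence, so N50 = 80 bp and L50 = 2.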
Figure 1. Hi-C contact map of the dAg1_v1.0 sequence (A). Dot plots of whole-genome alignments of dAg1_v1.0 (vertical axis) vs the DM_v6.1 (B), RH89 (C), and Solyntus_v1.1 (D) genomes (horizontal axis). Each dot indicates an alignment of ≥1000 bp between the two genomes (≥100 bp for D). Forward and reverse alignments are represented as blue and red dots, respectively.
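The dot-plot rule in Figure 1 (keep alignments of at least 1000 bp, or 100 bp for panel D, and color them by strand) can be sketched as a simple filter. This assumes alignments stored in minimap2's PAF format, whose 12 mandatory tab-separated columns include the strand (column 5) and the alignment block length (column 11); the source does not say which aligner or format the authors used, and `DotSegment`/`dot_segments` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple


class DotSegment(NamedTuple):
    query: str
    q_start: int
    q_end: int
    target: str
    t_start: int
    t_end: int
    color: str  # "blue" for forward, "red" for reverse alignments


def dot_segments(paf_lines: Iterable[str], min_len: int = 1000) -> Iterator[DotSegment]:
    """Yield plot segments for alignments whose block length is >= min_len.

    Assumes PAF input: index 4 is the strand, index 10 the block length.
    """
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or malformed lines
        if int(fields[10]) < min_len:
            continue  # too short to be drawn
        color = "blue" if fields[4] == "+" else "red"
        yield DotSegment(fields[0], int(fields[2]), int(fields[3]),
                         fields[5], int(fields[7]), int(fields[8]), color)
```

Passing `min_len=100` reproduces the lower threshold stated for panel D.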
Figure 2. Percentage of 10xG linked reads from different potato clones mapped to different potato assemblies (A), and percentage of 10xG linked reads that were properly paired when mapped against different potato assemblies (B).
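Mapping and proper-pairing rates like those in Figure 2 are commonly derived from the FLAG field of SAM/BAM records. The sketch below is a minimal version under stated assumptions: it counts only primary records (skipping secondary 0x100 and supplementary 0x800 records) and uses all primary records as the denominator; the authors' exact denominator is not given in the source, and `mapping_rates` is a hypothetical name.

```python
from typing import Iterable, Tuple

# FLAG bits as defined in the SAM format specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rates(flags: Iterable[int]) -> Tuple[float, float]:
    """Return (% mapped, % properly paired) over primary alignment records."""
    total = mapped = proper = 0
    for flag in flags:
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
        if flag & PAIRED and flag & PROPER_PAIR:
            proper += 1
    if total == 0:
        raise ValueError("no primary records")
    return 100 * mapped / total, 100 * proper / total
```

In practice the flags would be read from a BAM file with a library such as pysam; plain integers keep the sketch self-contained.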
Figure 3. Percentage of dAg1 PacBio reads (A), tAg PacBio reads (B), and tAg high-quality Iso-seq RNA reads (C) mapped to different potato assemblies.
Number of variants (SNVs and indels) and of genes with at least one deleterious variant among the haplotypes of each potato clone
| Clone | Variants: total | Variants: heterozygous | Variants: homozygous | Genes with ≥1 deleterious variant: total | Genes with ≥1 deleterious variant: homozygous |
|---|---|---|---|---|---|
| dAg1 | 7,829,534 | 7,829,534 | — | 13,287 | — |
| dAg2 | 9,790,584 | 7,710,744 | 2,079,840 | 16,365 | 1,838 |
| dAg3 | 9,461,662 | 7,975,910 | 1,485,752 | 16,766 | 1,436 |
| tAg | 25,559,532 | 25,495,186 | 64,346 | 26,134 | 25 |
| tPa1 | 30,680,341 | 29,831,031 | 849,310 | 28,357 | 669 |
| tPa2 | 28,666,770 | 27,156,995 | 1,509,775 | 27,927 | 1,060 |
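In every row, the total number of variants equals the heterozygous plus the homozygous count (for example, 7,710,744 + 2,079,840 = 9,790,584 for dAg2), and dAg1, the clone the reference was built from, has no homozygous variants. A minimal sketch of such a tally from VCF-style genotype (GT) strings of any ploidy follows; treating a call as heterozygous when its alleles differ and homozygous when all alleles are the same non-reference allele is an assumption, and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable


def classify_genotype(gt: str) -> str:
    """Classify a VCF GT string such as '0/1', '1|1' or '0/0/1/1'."""
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "reference" if alleles[0] == "0" else "homozygous"


def count_variants(genotypes: Iterable[str]) -> Counter:
    """Tally calls; reference-only and missing calls are not variants."""
    counts = Counter(classify_genotype(gt) for gt in genotypes)
    counts["total"] = counts["heterozygous"] + counts["homozygous"]
    return counts
```

The same rule covers diploid (dAg1-3) and tetraploid (tAg, tPa1-2) calls, since a tetraploid genotype such as 0/0/1/1 is heterozygous and 1/1/1/1 is homozygous.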
Figure 4. Distribution of genomic features across the potato genome. The outermost circle denotes the chromosome number and the physical position. The next inner circles show the distributions of genes (black), repeats (green, measured as the percentage of masked bp), and structural variations (blue). The four innermost circles show, respectively, the proportion of genes with at least one deleterious variant in dAg1-3 (black) and tPa1-2 (orange), and the heterozygous variants in dAg1, dAg2, and dAg3, in 1-Mb windows. The gray bars mark the pericentromeric regions; the yellow bars mark the regions with the largest difference between dAg1-3 and tPa1-2 in the proportion of genes with at least one deleterious variant.
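The per-window tracks in Figure 4 amount to binning features into 1-Mb windows. A minimal sketch for the deleterious-gene track follows, assuming each gene is given as (chromosome, start position in bp, whether it carries at least one deleterious variant) and is assigned to the window containing its start; that assignment rule and the function name are assumptions, not the authors' stated procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW = 1_000_000  # 1-Mb windows, as stated in the Figure 4 caption


def deleterious_fraction(
    genes: Iterable[Tuple[str, int, bool]], window: int = WINDOW
) -> Dict[Tuple[str, int], float]:
    """Fraction of genes with >=1 deleterious variant per (chromosome, window)."""
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    hits: Dict[Tuple[str, int], int] = defaultdict(int)
    for chrom, start, has_deleterious in genes:
        key = (chrom, start // window)  # window index on this chromosome
        totals[key] += 1
        hits[key] += int(has_deleterious)
    return {key: hits[key] / totals[key] for key in totals}
```

Comparing the output for dAg1-3 against tPa1-2 window by window is one way to locate regions like the yellow-marked ones.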
95th percentile (Q95), median, and 5th percentile (Q5) of the length (bp) of phased-variant blocks
| Clone | Q95 (bp) | Median (bp) | Q5 (bp) |
|---|---|---|---|
| dAg1 | 626,568 | 6,824 | 6 |
| dAg2 | 341,204 | 267 | 2 |
| dAg3 | 251,607 | 301 | 2 |
| tAg | 1,207 | 188 | 16 |
| tPa1 | 872 | 131 | 11 |
| tPa2 | 822 | 116 | 9 |
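Summaries like Q95, median and Q5 can be computed from per-block lengths with Python's `statistics` module. In this sketch, a block's length is assumed to be the span from its first to its last phased variant (inclusive, 1-based positions), and percentiles use linear interpolation (`method="inclusive"`); the source does not state either convention, and the helper names are hypothetical.

```python
import statistics
from typing import Dict, Iterable, List


def block_length(positions: List[int]) -> int:
    """Span in bp from the first to the last phased variant of one block.

    Assumption: length = last - first + 1 (inclusive, 1-based positions).
    """
    return max(positions) - min(positions) + 1


def block_length_summary(blocks: Iterable[List[int]]) -> Dict[str, float]:
    """Q95, median and Q5 of phase-block lengths (needs >= 2 blocks)."""
    lengths = [block_length(positions) for positions in blocks]
    # n=20 returns the 5th, 10th, ..., 95th percentiles; "inclusive" is the
    # linear-interpolation definition also used by NumPy's default.
    cuts = statistics.quantiles(lengths, n=20, method="inclusive")
    return {"Q95": cuts[-1], "median": statistics.median(lengths), "Q5": cuts[0]}
```

Because block-length distributions are highly skewed (the dAg1 median is 6,824 bp but Q95 is 626,568 bp), reporting quantiles rather than a mean is the more informative choice.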
Percentage of phased blocks in which the haplotypes of the progenies occurred in zero to multiple copies in the parental clones
| Samples | dAg1: 0/1/2 (%) | dAg2: 0/1/2 (%) | dAg3: 0/1/2 (%) | tAg: 0/1/2/3/4 (%) |
|---|---|---|---|---|
| tAg | 2.3/9.5/88.2 | 2.0/10.8/87.2 | 2.1/9.2/88.7 | – |
| tPa1 | 2.1/11.8/86.1 | 2.9/11.2/85.9 | 3.3/11.2/85.5 | 1.8/5.0/14.5/23.7/55.0 |
| tPa2 | 2.1/11.4/86.5 | 2.2/10.9/86.9 | 2.9/11.2/85.9 | 1.6/4.9/14.7/24.9/53.9 |
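Each cell lists percentages of blocks for 0, 1, 2 (diploid columns) or 0 to 4 (tetraploid tAg column) copies, and each cell sums to 100%. One plausible reading, which is an assumption since the source gives no formula, is: for every phased block, count how many of the progeny clone's haplotypes are also present among the parental clone's haplotypes in the same block, then tabulate those counts. A minimal sketch under that reading, with hypothetical names and haplotypes represented as allele strings over the block's variants:

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# One phased block: (progeny haplotypes, parental haplotypes); each haplotype
# is an allele string over the block's variant positions, e.g. "ACT".
Block = Tuple[Sequence[str], Sequence[str]]


def shared_haplotype_distribution(blocks: Iterable[Block]) -> Dict[int, float]:
    """Percentage of blocks in which 0..k progeny haplotypes occur in the parent.

    k is the progeny ploidy (2 for a diploid, 4 for a tetraploid clone).
    """
    tally: Counter = Counter()
    ploidy = 0
    for progeny, parent in blocks:
        ploidy = max(ploidy, len(progeny))
        parental = set(parent)
        tally[sum(hap in parental for hap in progeny)] += 1
    n_blocks = sum(tally.values())
    if n_blocks == 0:
        raise ValueError("no blocks")
    return {k: 100 * tally[k] / n_blocks for k in range(ploidy + 1)}
```

Under this reading, a cell such as 2.3/9.5/88.2 for tAg vs dAg1 would mean that both dAg1 haplotypes were found in tAg in 88.2% of blocks.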