Gina M Pham, John P Hamilton, Joshua C Wood, Joseph T Burke, Hainan Zhao, Brieanne Vaillancourt, Shujun Ou, Jiming Jiang, C Robin Buell.
Abstract
BACKGROUND: Worldwide, the cultivated potato, Solanum tuberosum L., is the No. 1 vegetable crop and a critical food security crop. The genome sequence of DM1-3 516 R44, a doubled monoploid clone of S. tuberosum Group Phureja, was published in 2011 using a whole-genome shotgun sequencing approach with short-read sequence data. Current advanced sequencing technologies now permit generation of near-complete, high-quality chromosome-scale genome assemblies at minimal cost.
Keywords: chromosome-scale; long-read; potato; reference genome
Year: 2020 PMID: 32964225 PMCID: PMC7509475 DOI: 10.1093/gigascience/giaa100
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Doubled monoploid potato clone, DM1–3 516 R44. (a) Aboveground tissues and (b) tubers from the doubled monoploid potato clone, DM1–3 516 R44. Photos courtesy of Joseph Coombs.
Assembly metrics of the DM 1–3 R44 v4 and v6 assemblies
| Parameter | v4.03 | v4.04 | v6.1 |
|---|---|---|---|
| Total assembly size, Mb | 773.0 | 884.1 | 741.6 |
| Total non-gapped size, Mb | 676.3 | 728.7 | 741.5 |
| Contig N50 size, bp | 31,914 | 29,071 | 17,312,182 |
| Total contig No. | 60,068 | 170,833 | 1,382 |
| Scaffold N50 size, bp | 1,344,915 | 1,344,915 | 59,670,755 |
| Scaffold No. | 14,853 | 14,853 | 288 |
PGSC contigs and scaffolds downloaded from NCBI: AEWC01000001-AEWC01060068; JH137791-JH152643 [1, 2].
DM v4.04 is composed of v4.03 plus an additional 110,765 unanchored contigs (55.7 Mb) [3].
The DM v6.1 scaffolds are composed of the 12 chromosome-scale pseudomolecules and 276 unanchored scaffolds.
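The N50 values in the table follow from the sequence lengths alone. The sketch below assumes the conventional definition, which this record does not spell out: N50 is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` is illustrative. The 276 unanchored scaffolds are lumped into a single placeholder entry because their individual lengths are not listed here; this does not change the result, since together they are shorter than the smallest chromosome.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# DM v6.1 pseudomolecule lengths (bp), chr01 to chr12, as reported in this record.
V6_1_CHROMOSOMES = [
    88_591_686, 46_102_915, 60_707_570, 69_236_331,
    55_599_697, 59_091_578, 57_639_317, 59_226_000,
    67_600_300, 61_044_151, 46_777_387, 59_670_755,
]

# Assumption: the 276 unanchored scaffolds are represented by their combined
# length (10,297,348 bp) because individual lengths are not available here.
V6_1_UNANCHORED_TOTAL = 10_297_348

scaffold_n50 = n50(V6_1_CHROMOSOMES + [V6_1_UNANCHORED_TOTAL])
print(scaffold_n50)  # 59670755, the length of chr12
```

The result agrees with the scaffold N50 reported for v6.1, which is simply the length of chr12.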
Chromosome lengths and gap (N) content in DM v4.04 and v6.1
| Chromosome | v4.04 total chromosome length (bp) | v4.04 total sequence length (bp) | v4.04 % sequence | v4.04 total gap length (bp) | v4.04 % gaps | v6.1 total chromosome length (bp) | v6.1 total sequence length (bp) | v6.1 % sequence | v6.1 total gap length (bp) | v6.1 % gaps |
|---|---|---|---|---|---|---|---|---|---|---|
| chr01 | 88,663,952 | 77,894,594 | 87.85 | 10,769,358 | 12.15 | 88,591,686 | 88,579,186 | 99.99 | 12,500 | 0.01 |
| chr02 | 48,614,681 | 42,696,816 | 87.83 | 5,917,865 | 12.17 | 46,102,915 | 46,100,415 | 99.99 | 2,500 | 0.01 |
| chr03 | 62,290,286 | 53,928,846 | 86.58 | 8,361,440 | 13.42 | 60,707,570 | 60,704,570 | 100.00 | 3,000 | 0.00 |
| chr04 | 72,208,621 | 62,203,573 | 86.14 | 10,005,048 | 13.86 | 69,236,331 | 69,230,831 | 99.99 | 5,500 | 0.01 |
| chr05 | 52,070,158 | 46,610,373 | 89.51 | 5,459,785 | 10.49 | 55,599,697 | 55,591,197 | 99.98 | 8,500 | 0.02 |
| chr06 | 59,532,096 | 51,644,783 | 86.75 | 7,887,313 | 13.25 | 59,091,578 | 59,085,578 | 99.99 | 6,000 | 0.01 |
| chr07 | 56,760,843 | 49,550,308 | 87.30 | 7,210,535 | 12.70 | 57,639,317 | 57,635,317 | 99.99 | 4,000 | 0.01 |
| chr08 | 56,938,457 | 49,300,183 | 86.59 | 7,638,274 | 13.41 | 59,226,000 | 59,217,000 | 99.98 | 9,000 | 0.02 |
| chr09 | 61,540,751 | 53,891,571 | 87.57 | 7,649,180 | 12.43 | 67,600,300 | 67,594,300 | 99.99 | 6,000 | 0.01 |
| chr10 | 59,756,223 | 52,349,496 | 87.61 | 7,406,727 | 12.39 | 61,044,151 | 61,037,651 | 99.99 | 6,500 | 0.01 |
| chr11 | 45,475,667 | 40,128,174 | 88.24 | 5,347,493 | 11.76 | 46,777,387 | 46,772,387 | 99.99 | 5,000 | 0.01 |
| chr12 | 61,165,649 | 53,902,062 | 88.12 | 7,263,587 | 11.88 | 59,670,755 | 59,658,755 | 99.98 | 12,000 | 0.02 |
| Total pseudomolecules | 725,017,384 | 634,100,779 | 87.46 | 90,916,605 | 12.54 | 731,287,687 | 731,207,187 | 99.99 | 80,500 | 0.01 |
| Unanchored sequences | 159,090,912 | 94,595,563 | 59.46 | 64,495,349 | 40.54 | 10,297,348 | 10,289,348 | 99.92 | 8,000 | 0.08 |
| Total assembly | 884,108,296 | 728,696,342 | 82.42 | 155,411,954 | 17.58 | 741,585,035 | 741,496,535 | 99.99 | 88,500 | 0.01 |
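The gap columns in this table can be derived from the total chromosome length and the total sequence length. The sketch below assumes that gap length is the difference between the two and that percentages are rounded to two decimal places, which is consistent with the chr01 values shown; the function name `gap_stats` is illustrative.

```python
def gap_stats(total_length_bp, sequence_length_bp):
    """Return (gap length in bp, % sequence, % gaps) for one chromosome."""
    gap_bp = total_length_bp - sequence_length_bp
    pct_sequence = round(100 * sequence_length_bp / total_length_bp, 2)
    pct_gaps = round(100 * gap_bp / total_length_bp, 2)
    return gap_bp, pct_sequence, pct_gaps


# chr01 in DM v4.04 and DM v6.1, using the lengths from the table.
print(gap_stats(88_663_952, 77_894_594))  # (10769358, 87.85, 12.15)
print(gap_stats(88_591_686, 88_579_186))  # (12500, 99.99, 0.01)
```

For chr01, the gap length drops from roughly 10.8 Mb in v4.04 to 12.5 kb in v6.1.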
Figure 2: Genome-wide LTR Assembly Index (LAI) [38] scores for DM assembly v4.04 (V4) and v6.1 (V6). LAI was calculated for 3-Mb sliding windows with a 300-kb step size.
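The windowing scheme in the Figure 2 caption (3-Mb windows advanced in 300-kb steps) can be sketched as follows. This only enumerates window coordinates; it does not compute LAI. How the final, shorter window at the chromosome end is treated is not stated in the caption, so truncating it at the chromosome end is an assumption, as are the function name and the 0-based half-open coordinates.

```python
def sliding_windows(chrom_length, window=3_000_000, step=300_000):
    """Yield 0-based, half-open (start, end) windows along a chromosome."""
    start = 0
    while start < chrom_length:
        # Assumption: the last window is truncated at the chromosome end.
        end = min(start + window, chrom_length)
        yield start, end
        if end == chrom_length:
            break
        start += step


# chr12 of DM v6.1 is 59,670,755 bp long.
windows = list(sliding_windows(59_670_755))
print(len(windows))   # 190
print(windows[0])     # (0, 3000000)
print(windows[-1])    # (56700000, 59670755)
```

Because the step is one tenth of the window size, most positions are covered by ten overlapping windows, which smooths the plotted score.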
Figure 3: Distribution of subtelomeric repeat sequences, centromeric repeat sequences, CENH3 ChIP-seq alignments, and oligonucleotide fluorescent in situ hybridization (oligo-FISH) probes. (A) Distribution of features on DM v4.04 assembly. (B) Distribution of features on DM v6.1 assembly. Red and green rectangles represent the positions of the 2 “barcode” oligo-FISH probes [42]. For CENH3 ChIP-seq reads, chromosomes were divided into 100-kb windows and CENH3 read number in each window was calculated and plotted [45]. Circles represent centromeric repeats [45]. Triangles represent subtelomeric repeats [55].
Figure 4: Improved assembly of the centromeric regions in DM v6.1. (A) CENH3 read distribution on centromere 7. (B) CENH3 read distribution on centromere 10. Chromosomes were divided into 100-kb windows and the CENH3 ChIP-seq read number [45] in each window was calculated and plotted. Red dots represent centromeric repeats. Upper panel shows the CENH3 ChIP-seq read distribution in the DM v4.04 assembly; lower panel shows the distribution in the DM v6.1 assembly.
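The per-window read counts described in the Figure 3 and Figure 4 captions amount to binning alignment positions into non-overlapping 100-kb windows. The sketch below is one way to do that, not the authors' pipeline: it assumes each read is represented by a single 0-based start coordinate on one chromosome, and the positions used are made-up toy values.

```python
from collections import Counter


def reads_per_window(read_starts, window=100_000):
    """Count reads per non-overlapping window, keyed by window index."""
    counts = Counter(position // window for position in read_starts)
    return dict(sorted(counts.items()))


# Hypothetical read start positions on a single chromosome.
toy_read_starts = [5, 99_999, 100_000, 250_000, 250_001]
print(reads_per_window(toy_read_starts))  # {0: 2, 1: 1, 2: 2}
```

Window index `i` covers positions `i * 100_000` up to, but not including, `(i + 1) * 100_000`; windows without reads are simply absent from the result.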
Figure 5: Whole-genome alignment of the DM v4.04 and DM v6.1 genome assemblies. Alignment of the long-read, chromosome-scale DM v6.1 assembly with the DM v4.04 assembly using D-GENIES reveals concordance in the euchromatic arms but misassemblies in the pericentromeric regions.