Priyanka Sharma, Ardashir Kharabian Masouleh, Bruce Topp, Agnelo Furtado, Robert J Henry.
Abstract
Recent advances in the sequencing and assembly of plant genomes have allowed the generation of genomes with increasing contiguity and sequence accuracy. Chromosome-level genome assemblies built from long-read sequence contigs have relied on proximity analysis (Hi-C) or traditional genetic maps to guide the placement of contigs within chromosomes. The development of highly accurate long reads by repeated sequencing of circularized DNA (HiFi; PacBio) has greatly increased the size of contigs. We now report the use of HiFiasm to assemble the genome of Macadamia jansenii, a genome that has been used as a model to test sequencing and assembly. This achieved an almost complete chromosome-level assembly from the sequence data alone, without the need for higher-level chromosome map information. Eight of the 14 chromosomes were represented by a single large contig (six with telomere repeats at both ends) and the other six were assembled from two to four main contigs. The small number of chromosome breaks appears to be the result of highly repetitive regions, including ribosomal genes, that cannot be assembled by these approaches. De novo assembly of near-complete chromosome-level plant genomes now appears possible using these sequencing and assembly tools. Further targeted strategies might allow these remaining gaps to be closed.
Keywords: Macadamia jansenii; HiFi reads; HiFiasm; de novo genome assembly; mitochondrial genome; nuclear genome; nuclear ribosomal RNA; plastid genome; technical advance
Year: 2021 PMID: 34784084 PMCID: PMC9300133 DOI: 10.1111/tpj.15583
Source DB: PubMed Journal: Plant J ISSN: 0960-7412 Impact factor: 7.091
HiFiasm contigs in different size categories, and comparison of the primary and haploid assemblies generated by the HiFiasm genome assembler
| | Number of contigs | Assembly length (Mb) | N50 (Mb) | N75 (Mb) | |
|---|---|---|---|---|---|
| HiFiasm assembly | |||||
| Total contigs | 779 | 826 | 46 | 25 | 99.6 |
| Contigs >40 Mb | 10 | 524 | 50 | 46 | 68.7 |
| Contigs >10 Mb | 19 | 746 | 48 | 39 | 93.9 |
| Contigs >1 Mb | 30 | 784 | 46 | 30 | 99.1 |
| Contigs >100 kb | 94 | 805 | 46 | 27 | 99.0 |
| Between 100 kb and 1 Mb | 64 | 20 | 0.49 | 0.22 | 0.20 |
| Between 10 kb and 100 kb | 685 | 22 | 0.032 | 0.028 | 0.00 |
| Comparison of HiFiasm primary and haploid assemblies | |||||
| Primary assembly | 779 | 827 | 46.1 | 25 | 99.60 |
| Hap 1_assembly | 879 | 816 | 24.4 | 8.9 | 98.80 |
| Hap 2_assembly | 363 | 776 | 14.3 | 5.4 | 97.90 |
| Hap 1 >1 Mb | 96 | 736 | 16.4 | 6.8 | 96.70 |
| Hap 2 >1 Mb | 72 | 766 | 24.5 | 12.3 | 98.10 |
Figure 2. Dotplots of HiFiasm contigs against Hi‐C pseudo‐molecules. (a) Pseudo‐molecules that are covered by a single HiFiasm contig. (b) Pseudo‐molecules that are covered by more than one HiFiasm contig.
Chromosomal location of HiFiasm contigs >1 Mb
| Contig ID (>1 Mb) | Length (bp) | Corresponding Hi‐C pseudo‐molecule |
|---|---|---|
| ptg000016l | 71 935 981 | Chr 1 + Ribo RNA |
| ptg000003l | 57 251 071 | Chr 6 |
| ptg000017l | 57 081 251 | Chr 4 |
| ptg000011l | 56 513 637 | Chr 5 |
| ptg000004l | 49 863 231 | Chr 10 |
| ptg000012l | 48 320 516 | Chr 11 |
| ptg000023l | 47 997 562 | Chr 13 |
| ptg000010l | 46 138 073 | Chr 2 + Ribo RNA |
| ptg000008l | 46 131 124 | Chr 14 |
| ptg000014l | 43 049 961 | Chr 9 |
| ptg000009l | 39 279 660 | Chr 3 |
| ptg000002l | 29 700 554 | Chr 8 |
| ptg000001l | 26 771 894 | Chr 12 |
| ptg000006l | 25 189 511 | Chr 2 |
| ptg000007l | 23 138 637 | Chr 7 + Ribo RNA |
| ptg000013l | 22 539 440 | Chr 8 |
| ptg000020l | 22 399 594 | Chr 7 |
| ptg000052l | 20 335 125 | Chr 12 |
| ptg000021l | 13 354 688 | Chr 3 |
| ptg000019l | 8 098 418 | Chr 7 |
| ptg000022l | 6 676 624 | Chr 3 |
| ptg000005l | 6 127 021 | Chr 9 |
| ptg000072l | 4 271 045 | Chr 12 |
| ptg000018l | 2 743 534 | Ribo RNA |
| ptg000025l | 2 713 795 | Part of Chr 2 |
| ptg000062l | 1 651 603 | Ribo RNA |
| ptg000074l | 1 299 006 | Ribo RNA |
| ptg000034l | 1 171 806 | Ribo RNA |
| ptg000036l | 1 154 310 | Ribo RNA |
| ptg000033l | 1 122 141 | Part of Chr 7 |
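The contig lengths listed above are enough to recompute the N50 and N75 values reported for the larger size categories. The following is a minimal sketch, assuming the usual definition of Nx (the length of the contig at which the running total, taken longest first, reaches x% of the summed length); the function names `nx` and `summarise` are illustrative and are not taken from the paper or from HiFiasm.

```python
# Lengths (bp) of the 30 HiFiasm contigs >1 Mb, copied from the table above.
CONTIG_LENGTHS = [
    71_935_981, 57_251_071, 57_081_251, 56_513_637, 49_863_231,
    48_320_516, 47_997_562, 46_138_073, 46_131_124, 43_049_961,
    39_279_660, 29_700_554, 26_771_894, 25_189_511, 23_138_637,
    22_539_440, 22_399_594, 20_335_125, 13_354_688, 8_098_418,
    6_676_624, 6_127_021, 4_271_045, 2_743_534, 2_713_795,
    1_651_603, 1_299_006, 1_171_806, 1_154_310, 1_122_141,
]


def nx(lengths, fraction):
    """Return the contig length at which the running total (longest first)
    first reaches `fraction` of the summed length (0.5 gives N50)."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


def summarise(lengths, min_length):
    """Summarise the contigs strictly longer than `min_length` bp."""
    kept = [n for n in lengths if n > min_length]
    return {
        "contigs": len(kept),
        "total_bp": sum(kept),
        "n50": nx(kept, 0.50),
        "n75": nx(kept, 0.75),
    }


if __name__ == "__main__":
    for label, cutoff in [(">40 Mb", 40_000_000),
                          (">10 Mb", 10_000_000),
                          (">1 Mb", 1_000_000)]:
        print(label, summarise(CONTIG_LENGTHS, cutoff))
```

For the three categories that this list covers completely (>40 Mb, >10 Mb and >1 Mb), the contig counts match the size-category table and the lengths, N50 and N75 agree with it to within about 1 Mb.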
HiFiasm contigs (>1 Mb) covering each of the Hi‐C pseudo‐molecules
| Hi‐C pseudo‐molecule (A) | Size of Hi‐C pseudo‐molecules (B) | HiFiasm contigs corresponding to Hi‐C scaffolds (C) | HiFiasm contigs length (with explanation) (D) | HiFiasm combined contigs length (E) | Extra HiFiasm length (HiFiasm contig length − Hi‐C scaffold length) (E − B) |
|---|---|---|---|---|---|
| Chr 1 | 67 682 215 | ptg000016l | 71 935 981 | 71 935 981 | 4 253 766 |
| Chr 2 | 63 669 590 | ptg000006l + ptg000025l + ptg000010l | 74 041 379 (=25 189 511 + 2 713 795 + 46 138 073) | 74 041 379 | 10 371 789 |
| Chr 3 | 58 143 993 | ptg000021l + ptg000009l + ptg000022l | 59 310 972 (=13 354 688 + 39 279 660 + 6 676 624) | 59 310 972 | 1 166 979 |
| Chr 4 | 56 076 407 | ptg000017l | 57 081 251 | 57 081 251 | 1 004 844 |
| Chr 5 | 55 220 784 | ptg000011l | 56 513 637 | 56 513 637 | 1 292 853 |
| Chr 6 | 53 595 462 | ptg000003l | 57 251 071 | 57 251 071 | 3 655 609 |
| Chr 7 | 52 077 970 | ptg000019l + ptg000020l + ptg000033l + ptg000007l | 54 758 790 (=8 098 418 + 22 399 594 + 1 122 141 + 23 138 637) | 54 758 790 | 2 680 820 |
| Chr 8 | 49 563 658 | ptg000013l + ptg000002l | 52 239 994 (=22 539 440 + 29 700 554) | 52 239 994 | 2 676 336 |
| Chr 9 | 49 085 581 | ptg000014l + ptg000005l | 49 176 982 (=43 049 961 + 6 127 021) | 49 176 982 | 91 401 |
| Chr 10 | 48 974 653 | ptg000004l | 49 863 231 | 49 863 231 | 888 578 |
| Chr 11 | 47 698 009 | ptg000012l | 48 320 516 | 48 320 516 | 622 507 |
| Chr 12 | 46 713 600 | ptg000001l + ptg000072l + ptg000052l | 51 378 064 (=26 771 894 + 4 271 045 + 20 335 125) | 51 378 064 | 4 664 464 |
| Chr 13 | 45 610 911 | ptg000023l | 47 997 562 | 47 997 562 | 2 386 651 |
| Chr 14 | 45 288 529 | ptg000008l | 46 131 124 | 46 131 124 | 842 595 |
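The combined-length and extra-length columns above are simple arithmetic on the contig lengths, and can be re-derived as a consistency check. The sketch below is illustrative only: the dictionary `CHROMOSOMES` and the function `extra_length` are hypothetical names, and the numbers are copied from the table for the six pseudo‐molecules that are covered by more than one contig.

```python
# Hi-C pseudo-molecule size (bp) and the lengths (bp) of the HiFiasm contigs
# assigned to it, copied from the table above (multi-contig chromosomes only).
CHROMOSOMES = {
    "Chr 2": (63_669_590, [25_189_511, 2_713_795, 46_138_073]),
    "Chr 3": (58_143_993, [13_354_688, 39_279_660, 6_676_624]),
    "Chr 7": (52_077_970, [8_098_418, 22_399_594, 1_122_141, 23_138_637]),
    "Chr 8": (49_563_658, [22_539_440, 29_700_554]),
    "Chr 9": (49_085_581, [43_049_961, 6_127_021]),
    "Chr 12": (46_713_600, [26_771_894, 4_271_045, 20_335_125]),
}


def extra_length(hic_bp, contig_lengths):
    """Return (combined contig length, combined length minus Hi-C length)."""
    combined = sum(contig_lengths)
    return combined, combined - hic_bp


if __name__ == "__main__":
    for name, (hic_bp, contigs) in CHROMOSOMES.items():
        combined, extra = extra_length(hic_bp, contigs)
        print(f"{name}: combined = {combined:,} bp, extra = {extra:,} bp")
```

Every pseudo‐molecule in this subset yields a positive difference, that is, the HiFiasm contigs together are longer than the corresponding Hi‐C scaffold.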
Figure 1. Dotplot of Macadamia jansenii Hi‐C genome assembly against HiFiasm contigs. (a) HiFiasm longest contigs (>1 Mb size), (b) HiFiasm medium size contigs (<1 Mb and >100 kb) and (c) HiFiasm smallest contigs (<100 kb).
Figure 5. Dotplot of Macadamia jansenii nuclear ribosomal RNA sequence against HiFiasm contigs. (a) HiFiasm longest contigs (>1 Mb size), (b) HiFiasm medium size contigs (<1 Mb and >100 kb) and (c) HiFiasm smallest contigs (<100 kb).
Figure 3. Dotplot of Macadamia jansenii chloroplast genome sequence against HiFiasm contigs. (a) HiFiasm longest contigs (>1 Mb size), (b) HiFiasm medium size contigs (<1 Mb and >100 kb) and (c) HiFiasm smallest contigs (<100 kb).
Figure 4. Dotplot of Macadamia jansenii mitochondrial genome sequence against three sets of HiFiasm contigs. (a) HiFiasm longest contigs (>1 Mb size), (b) HiFiasm medium size contigs (<1 Mb and >100 kb) and (c) HiFiasm smallest contigs (<100 kb).
Presence of telomere repeats and rRNA at the ends of HiFiasm contigs
| Hi‐C pseudo‐molecule | HiFiasm contigs | Terminal 1 (HiFiasm contig) | Terminal 2 (HiFiasm contig) |
|---|---|---|---|
| Hi‐C pseudo‐molecules covered by a single HiFiasm contig | |||
| Chr 1 | ptg000016l | Telomere | 18S rRNA |
| Chr 4 | ptg000017l | Telomere | Telomere |
| Chr 5 | ptg000011l | Telomere | 18S rRNA |
| Chr 6 | ptg000003l | Telomere | Telomere |
| Chr 10 | ptg000004l | Telomere | Telomere |
| Chr 11 | ptg000012l | Telomere | Telomere |
| Chr 13 | ptg000023l | Telomere | Telomere |
| Chr 14 | ptg000008l | Telomere | Telomere |
| Hi‐C pseudo‐molecules covered by more than one HiFiasm contig | |||
| Chr 2 | ptg000006l | – | Telomere |
| | ptg000025l | – | 28S rRNA |
| | ptg000010l | 18S rRNA | 28S rRNA |
| Chr 3 | ptg000021l | – | Telomere |
| | ptg000009l | – | – |
| | ptg000022l | – | Telomere |
| Chr 7 | ptg000019l | – | Telomere |
| | ptg000020l | – | – |
| | ptg000033l | – | – |
| | ptg000007l | Telomere | – |
| Chr 8 | ptg000013l | Telomere | Repeats |
| | ptg000002l | Telomere | Repeats |
| Chr 9 | ptg000014l | Telomere | – |
| | ptg000005l | Telomere | – |
| Chr 12 | ptg000001l | Telomere | – |
| | ptg000072l | – | 5S rRNA |
| | ptg000052l | Telomere | 5S rRNA |
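The table above can be tallied to check the abstract's summary of single-contig chromosomes and of those with telomere repeats at both ends. This is a small sketch over the rows for the single-contig pseudo‐molecules only; `CONTIG_ENDS` and `telomere_to_telomere` are hypothetical names introduced for illustration, and the code only counts table entries rather than detecting telomeres in sequence.

```python
# (pseudo-molecule, contig, terminal 1, terminal 2), copied from the rows for
# Hi-C pseudo-molecules covered by a single HiFiasm contig in the table above.
CONTIG_ENDS = [
    ("Chr 1", "ptg000016l", "Telomere", "18S rRNA"),
    ("Chr 4", "ptg000017l", "Telomere", "Telomere"),
    ("Chr 5", "ptg000011l", "Telomere", "18S rRNA"),
    ("Chr 6", "ptg000003l", "Telomere", "Telomere"),
    ("Chr 10", "ptg000004l", "Telomere", "Telomere"),
    ("Chr 11", "ptg000012l", "Telomere", "Telomere"),
    ("Chr 13", "ptg000023l", "Telomere", "Telomere"),
    ("Chr 14", "ptg000008l", "Telomere", "Telomere"),
]


def telomere_to_telomere(rows):
    """Return the pseudo-molecules whose single contig has a telomere
    reported at both terminals."""
    return [chrom for chrom, _contig, end1, end2 in rows
            if end1 == "Telomere" and end2 == "Telomere"]


if __name__ == "__main__":
    complete = telomere_to_telomere(CONTIG_ENDS)
    print(f"{len(CONTIG_ENDS)} single-contig pseudo-molecules")
    print(f"{len(complete)} with telomeres at both ends: {', '.join(complete)}")
```

The two single-contig pseudo‐molecules that are not telomere-to-telomere (Chr 1 and Chr 5) both end in 18S rRNA at one terminal, consistent with ribosomal repeats limiting the assembly.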
Comparative repetitive elements of Hi‐C pseudo‐molecules and HiFiasm assembly
| Pseudo‐molecule | Genome assembler | Size of pseudo‐molecules | Total repeats (%) | LINE (%) | LTR (%) | DNA elements (%) | Unclassified (%) | Simple repeats (%) |
|---|---|---|---|---|---|---|---|---|
| Chr 1 | Hi‐C | 67 682 215 | 62 | 4.13 | 30.3 | 0.52 | 26.8 | 0.64 |
| | HiFiasm | 71 935 981 | 62.2 | 3.88 | 34.8 | 0.58 | 22.6 | 0.65 |
| Chr 2 | Hi‐C | 63 669 590 | 66 | 3.31 | 38.2 | 1.12 | 23.3 | 0.86 |
| | HiFiasm | 74 041 379 | 68 | 2.98 | 39.6 | 1.22 | 23.8 | 0.44 |
| Chr 3 | Hi‐C | 58 143 993 | 52.3 | 6.15 | 20.5 | 1.13 | 24.2 | 0.67 |
| | HiFiasm | 59 310 972 | 54.3 | 7.93 | 20.7 | 0.97 | 24.1 | 0.72 |
| Chr 4 | Hi‐C | 56 076 407 | 55.1 | 6.26 | 22.8 | 0.79 | 24.3 | 1.13 |
| | HiFiasm | 57 081 251 | 57.2 | 7.21 | 21.6 | 0.90 | 26.56 | 1.23 |
| Chr 5 | Hi‐C | 55 220 784 | 53.1 | 3.27 | 31.4 | 0.96 | 17.0 | 0.78 |
| | HiFiasm | 56 513 637 | 54.3 | 3.47 | 31.6 | 0.75 | 17.9 | 0.85 |
| Chr 6 | Hi‐C | 53 595 462 | 55.1 | 8.75 | 19.9 | 1.18 | 25.5 | 0.79 |
| | HiFiasm | 57 251 071 | 58.8 | 9.47 | 22.6 | 1.43 | 24.0 | 1.44 |
| Chr 7 | Hi‐C | 52 077 970 | 52.9 | 7.04 | 21.3 | 1.42 | 22.4 | 0.85 |
| | HiFiasm | 54 758 790 | 51.2 | 6.65 | 21.3 | 1.25 | 21.4 | 0.79 |
| Chr 8 | Hi‐C | 49 563 658 | 44.0 | 5.39 | 15.5 | 0.64 | 21.3 | 1.25 |
| | HiFiasm | 52 239 994 | 41.8 | 6.25 | 22.0 | 1.30 | 12.3 | 0 |
| Chr 9 | Hi‐C | 49 085 581 | 48.1 | 5.39 | 18.4 | 1.76 | 22.0 | 0.86 |
| | HiFiasm | 49 176 982 | 45.9 | 5.41 | 24.8 | 1.65 | 14.0 | 0 |
| Chr 10 | Hi‐C | 48 974 653 | 48.1 | 6.24 | 17.3 | 0.90 | 22.8 | 1.02 |
| | HiFiasm | 49 863 231 | 44.7 | 5.91 | 22.9 | 2.82 | 13.1 | 0 |
| Chr 11 | Hi‐C | 47 698 009 | 48.1 | 6.24 | 17.3 | 0.90 | 22.8 | 1.02 |
| | HiFiasm | 48 320 516 | 44.7 | 4.01 | 26.3 | 2.82 | 11.6 | 0 |
| Chr 12 | Hi‐C | 46 713 600 | 44.6 | 4.47 | 16.5 | 1.39 | 21.3 | 0.85 |
| | HiFiasm | 51 378 064 | 25.6 | 4.47 | 21.2 | 2.44 | 11.7 | 0 |
| Chr 13 | Hi‐C | 45 610 911 | 42.2 | 5.52 | 13.7 | 0.70 | 21.1 | 1.31 |
| | HiFiasm | 47 997 562 | 39.7 | 5.42 | 18.8 | 1.63 | 13.8 | 0 |
| Chr 14 | Hi‐C | 45 288 529 | 42.5 | 5.82 | 12.9 | 0.96 | 21.5 | 1.50 |
| | HiFiasm | 46 131 124 | 41.1 | 5.53 | 19.6 | 1.94 | 13.9 | 0 |