Catherine J Nock, Abdul Baten, Ramil Mauleon, Kirsty S Langdon, Bruce Topp, Craig Hardner, Agnelo Furtado, Robert J Henry, Graham J King.
Abstract
Macadamia integrifolia is a representative of the large basal eudicot family Proteaceae and the main progenitor species of the Australian native nut crop macadamia. Since its commercialisation in Hawaii fewer than 100 years ago, global production has expanded rapidly. However, genomic resources are limited compared with other horticultural crops. The first draft assembly of M. integrifolia had good coverage of the functional gene space, but its high fragmentation has restricted its use in comparative genomics and association studies. Here we have generated an improved assembly of cultivar HAES 741 (4,094 scaffolds, 745 Mb, N50 413 kb) using a combination of Illumina paired and PacBio long-read sequences. Scaffolds were anchored to 14 pseudo-chromosomes using seven genetic linkage maps. This assembly has improved contiguity and coverage, with >225 Mb of additional sequence. Following annotation, 34,274 protein-coding genes were predicted, representing 90% of the expected gene content. Our results indicate that the macadamia genome is repetitive and heterozygous. The total repeat content was 55%, and genome-wide heterozygosity, estimated by read mapping, was 0.98%, or an average of one SNP per 102 bp. This is the first chromosome-scale genome assembly for macadamia and the Proteaceae. It is expected to be a valuable resource for breeding, gene discovery, conservation and evolutionary genomics.
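Two figures in the abstract follow from short calculations. A heterozygosity of 0.98% (heterozygous sites per base) corresponds to a mean SNP spacing of 1 / 0.0098 ≈ 102 bp, and the scaffold N50 is the length at which scaffolds of that size or longer hold at least half of the assembled bases. The Python below is a minimal sketch of both; the function names and the toy scaffold lengths are hypothetical and are not the authors' pipeline or data.

```python
def snp_spacing_bp(heterozygosity):
    """Mean distance between heterozygous sites, given heterozygosity as a fraction of bases."""
    if not 0 < heterozygosity <= 1:
        raise ValueError("heterozygosity must be a fraction in (0, 1]")
    return 1 / heterozygosity


def n50(lengths):
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one sequence")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


print(round(snp_spacing_bp(0.0098)))  # 102
print(n50([100, 80, 60, 40, 20]))     # 80: 100 + 80 = 180 bp covers half of 300 bp
```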
Keywords: Proteaceae; genome; nut crop; pseudo-chromosome; transcriptome
Year: 2020; PMID: 32747341; PMCID: PMC7534425; DOI: 10.1534/g3.120.401326
Source DB: PubMed; Journal: G3 (Bethesda); ISSN: 2160-1836; Impact factor: 3.154
Figure 1. Macadamia integrifolia: (a) orchard, (b) nut in husk, (c) racemes.
Data files and library information for Macadamia integrifolia genome sequencing. *Data deposited for draft assembly v1.1
| Sequencing platform | Library | Insert size (bp) | Read length (bp) | Sequence reads (million) | Sequence bases (Gb) | Accession |
|---|---|---|---|---|---|---|
| Genomic data | | | | | | |
| Illumina | Paired-end | 200 | 2 × 125 | 228.8 | 57.2 | SRR10896963 |
| | Paired-end | 350 | 2 × 125 | 119.0 | 29.8 | SRR10896962 |
| | Paired-end | 550 | 2 × 125 | 107.6 | 26.9 | SRR10896961 |
| | Paired-end | 480 | 2 × 150 | 58.7 | 17.7 | ERX1468522-23* |
| | Paired-end | 700 | 2 × 150 | 28.7 | 8.7 | ERX1468524* |
| | Mate pair | 8000 | 2 × 100 | 200.3 | 40.5 | ERX1468525* |
| PacBio | NA | | 9769 | 1.0 | 6.4 | 10896960 |
| Transcriptomic data | | | | | | |
| Illumina | Young leaf | 300 | 2 × 100 | 77.9 | 15.6 | SRR10897159 |
| | Shoot | 300 | 2 × 100 | 71.6 | 14.3 | SRR10897158 |
| | Flower | 300 | 2 × 100 | 83.3 | 16.7 | SRR10897157 |
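For paired libraries, sequence bases should equal read pairs × 2 mates × read length. The sketch below recomputes the yield of the libraries generated for the new assembly (the SRR rows) as a consistency check. Treating the read counts as pairs is an inference from the arithmetic rather than something the table states, and the 0.06 Gb tolerance is an assumption chosen to absorb rounding of both columns.

```python
# Rows from the sequencing table: (library, read pairs in millions, read length per mate in bp, reported Gb)
ROWS = [
    ("Paired-end 200 bp", 228.8, 125, 57.2),
    ("Paired-end 350 bp", 119.0, 125, 29.8),
    ("Paired-end 550 bp", 107.6, 125, 26.9),
    ("RNA young leaf", 77.9, 100, 15.6),
    ("RNA shoot", 71.6, 100, 14.3),
    ("RNA flower", 83.3, 100, 16.7),
]


def yield_gb(pairs_millions, read_len_bp, mates=2):
    """Total bases in Gb: (pairs x 1e6) x mates x read length / 1e9."""
    return pairs_millions * mates * read_len_bp / 1000


def mismatches(rows, tolerance_gb=0.06):
    """Libraries whose recomputed yield differs from the reported value by more than the tolerance."""
    return [
        (name, round(yield_gb(pairs, length), 2), reported)
        for name, pairs, length, reported in rows
        if abs(yield_gb(pairs, length) - reported) > tolerance_gb
    ]


for name, pairs, length, reported in ROWS:
    print(f"{name}: {yield_gb(pairs, length):.2f} Gb (reported {reported})")
print("mismatches:", mismatches(ROWS))
```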
Comparison of the new Macadamia integrifolia cv. HAES 741 genome assembly with the previously published draft assembly. Coverage is based on a k-mer-estimated genome size of 895.7 Mb. Gaps are ambiguous base calls.
| Assembly | Post-QC data (Gb) | Data coverage (X) | Scaffolds (number) | Scaffold N50 (kb) | Longest scaffold (Mb) | Assembled genome (Mb) | Gaps (Mb) | Assembled genome coverage (%) | Repeats (%) | Gene models | Expected gene content (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| v1.1 | 51.6 | 58 | 193,493 | 4.7 | 0.64 | 518.49 | 70 | 58 | 37 | 35,337 | 77.4 |
| v2 | 161.5 | 180 | 4,094 | 413.4 | 2.19 | 744.64 | 11 | 83 | 55 | 34,274 | 90.2 |
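The caption states that coverage is measured against a k-mer-estimated genome size of 895.7 Mb. A minimal sketch of the two coverage columns follows: read depth is post-QC bases divided by genome size, and assembly coverage is assembled length as a percentage of genome size. The helper names are hypothetical.

```python
GENOME_SIZE_MB = 895.7  # k-mer-based estimate quoted in the table caption


def read_depth(post_qc_gb, genome_mb=GENOME_SIZE_MB):
    """Fold coverage: sequenced bases (Gb converted to Mb) divided by estimated genome size."""
    return post_qc_gb * 1000 / genome_mb


def assembly_coverage_pct(assembled_mb, genome_mb=GENOME_SIZE_MB):
    """Percentage of the estimated genome size represented by the assembly."""
    return 100 * assembled_mb / genome_mb


# version -> (post-QC Gb, assembled Mb), copied from the table
ASSEMBLIES = {"v1.1": (51.6, 518.49), "v2": (161.5, 744.64)}

for version, (gb, mb) in ASSEMBLIES.items():
    print(f"{version}: {read_depth(gb):.0f}X, {assembly_coverage_pct(mb):.0f}% of genome")

added_mb = ASSEMBLIES["v2"][1] - ASSEMBLIES["v1.1"][1]
print(f"additional sequence in v2: {added_mb:.2f} Mb")
```

Rounded to whole numbers, this reproduces the 58X/180X and 58%/83% values in the table, and the difference between the two assembled sizes is about 226 Mb.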
Summary of the assembled chromosomes of macadamia
| Chromosome | Length (bp) | Scaffolds | Genes |
|---|---|---|---|
| Chr01 | 36,399,236 | 126 | 1543 |
| Chr02 | 44,010,915 | 120 | 2214 |
| Chr03 | 37,831,390 | 112 | 2039 |
| Chr04 | 37,507,012 | 91 | 2133 |
| Chr05 | 46,991,412 | 110 | 2334 |
| Chr06 | 40,842,004 | 101 | 1941 |
| Chr07 | 36,789,931 | 112 | 1940 |
| Chr08 | 34,869,527 | 94 | 1975 |
| Chr09 | 42,247,492 | 109 | 2019 |
| Chr10 | 34,167,173 | 87 | 1701 |
| Chr11 | 33,967,801 | 107 | 1488 |
| Chr12 | 31,654,987 | 87 | 1523 |
| Chr13 | 29,219,948 | 100 | 1398 |
| Chr14 | 32,845,142 | 109 | 1531 |
| Unplaced | 225,433,603 | 2629 | 8495 |
| Total | | | 34,274 |
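The per-chromosome rows can be summed to check them against the assembly-level figures. The sketch below copies the table into a dictionary (the variable names are hypothetical) and totals genes and scaffolds, along with the length and scaffold count anchored to the 14 pseudo-chromosomes.

```python
# Chromosome summary table: name -> (length in bp, scaffolds, genes)
CHROMOSOMES = {
    "Chr01": (36_399_236, 126, 1543),
    "Chr02": (44_010_915, 120, 2214),
    "Chr03": (37_831_390, 112, 2039),
    "Chr04": (37_507_012, 91, 2133),
    "Chr05": (46_991_412, 110, 2334),
    "Chr06": (40_842_004, 101, 1941),
    "Chr07": (36_789_931, 112, 1940),
    "Chr08": (34_869_527, 94, 1975),
    "Chr09": (42_247_492, 109, 2019),
    "Chr10": (34_167_173, 87, 1701),
    "Chr11": (33_967_801, 107, 1488),
    "Chr12": (31_654_987, 87, 1523),
    "Chr13": (29_219_948, 100, 1398),
    "Chr14": (32_845_142, 109, 1531),
    "Unplaced": (225_433_603, 2629, 8495),
}

total_genes = sum(genes for _, _, genes in CHROMOSOMES.values())
total_scaffolds = sum(scaffolds for _, scaffolds, _ in CHROMOSOMES.values())

# Only the pseudo-chromosomes, i.e. everything except the unplaced bin
anchored = {name: row for name, row in CHROMOSOMES.items() if name != "Unplaced"}
anchored_bp = sum(length for length, _, _ in anchored.values())
anchored_scaffolds = sum(scaffolds for _, scaffolds, _ in anchored.values())

print(f"genes: {total_genes:,}")          # 34,274
print(f"scaffolds: {total_scaffolds:,}")  # 4,094
print(f"anchored: {anchored_scaffolds:,} scaffolds, {anchored_bp:,} bp")  # 1,465 scaffolds, 519,343,970 bp
```

The gene and scaffold totals match the 34,274 gene models and 4,094 scaffolds reported for assembly v2.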
Figure 2. Genetic length (centimorgans, cM) vs. physical length (megabases, Mb) plotted for the Macadamia integrifolia cv. HAES 741 genome.
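A plot of genetic against physical position (a Marey map) is often summarised as an average recombination rate in cM/Mb, taken as the least-squares slope of genetic position on physical position. The sketch below assumes hypothetical marker positions and is not the analysis used in the paper. Recombination usually varies along a chromosome, so a single slope is only a coarse average.

```python
def recombination_rate(markers):
    """Least-squares slope (cM per Mb) of genetic position on physical position.

    markers: sequence of (physical position in Mb, genetic position in cM).
    """
    n = len(markers)
    if n < 2:
        raise ValueError("need at least two markers")
    mean_mb = sum(mb for mb, _ in markers) / n
    mean_cm = sum(cm for _, cm in markers) / n
    sxx = sum((mb - mean_mb) ** 2 for mb, _ in markers)
    if sxx == 0:
        raise ValueError("markers must span more than one physical position")
    sxy = sum((mb - mean_mb) * (cm - mean_cm) for mb, cm in markers)
    return sxy / sxx


# Hypothetical markers on one pseudo-chromosome: (Mb, cM)
MARKERS = [(0.0, 0.0), (10.0, 20.0), (20.0, 40.0)]
print(f"{recombination_rate(MARKERS):.2f} cM/Mb")  # 2.00 cM/Mb
```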
Figure 3. The 25-mer distribution for estimation of genome heterozygosity and size. Peaks at approximately 50, 100 and 200 represent heterozygous, homozygous and repeated k-mers, respectively.
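The caption does not describe the estimation method. A common back-of-envelope approach divides the total number of non-error k-mers by the depth of the homozygous peak (about 100 here). Model-fitting tools such as GenomeScope refine this approach. The sketch below assumes that simple method, a hypothetical error cutoff, and a toy histogram. It shows how k-mers at the heterozygous (~50) and repeat (~200) peaks contribute in proportion to their depth.

```python
def estimate_genome_size(histogram, homozygous_peak, error_cutoff):
    """Rough haploid genome size (in bases) from a k-mer spectrum.

    histogram maps depth -> number of distinct k-mers observed at that depth.
    K-mers with depth below error_cutoff are treated as sequencing errors and dropped.
    """
    if homozygous_peak <= 0:
        raise ValueError("peak depth must be positive")
    total_kmers = sum(depth * n for depth, n in histogram.items() if depth >= error_cutoff)
    return total_kmers / homozygous_peak


# Toy spectrum: error k-mers at depth 1, heterozygous at 50, homozygous at 100, 2-copy repeats at 200
TOY = {1: 5000, 50: 200, 100: 1000, 200: 50}
print(estimate_genome_size(TOY, homozygous_peak=100, error_cutoff=5))  # 1200.0
```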
Figure 4. Genome-wide SNP density. Thousands of SNPs per 1 Mb window, shown across each chromosome.
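A density track like Figure 4 comes from binning SNP positions into fixed 1 Mb windows per chromosome. The sketch below assumes SNP calls are already available as (chromosome, 1-based position) pairs, for example parsed from a VCF (parsing is not shown). The input list is hypothetical.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1 Mb windows, as in the figure


def snp_density(snps, window_bp=WINDOW_BP):
    """Count SNPs per (chromosome, window index); positions are 1-based as in VCF.

    Window i covers positions i*window_bp + 1 .. (i+1)*window_bp.
    Windows with no SNPs are simply absent from the returned Counter.
    """
    counts = Counter()
    for chrom, pos in snps:
        if pos < 1:
            raise ValueError(f"positions must be 1-based, got {pos}")
        counts[(chrom, (pos - 1) // window_bp)] += 1
    return counts


# Hypothetical SNP calls: (chromosome, position)
SNPS = [("Chr01", 1), ("Chr01", 999_999), ("Chr01", 1_000_000),
        ("Chr01", 1_000_001), ("Chr02", 2_500_000)]

for (chrom, window), n in sorted(snp_density(SNPS).items()):
    start_mb, end_mb = window * WINDOW_BP // 1_000_000, (window + 1) * WINDOW_BP // 1_000_000
    print(f"{chrom} {start_mb}-{end_mb} Mb: {n} SNPs")
```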
Figure 5. Interspersed repeats and low-complexity elements representing 55.1% of the macadamia genome assembly.
Figure 6. Venn diagram of orthologous gene clusters for six eudicot species, including the basal eudicots M. integrifolia and N. nucifera (Proteales) and the core eudicots P. persica (Rosales), A. thaliana (Brassicales), C. canephora (Gentianales) and E. grandis (Myrtales).
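Each region of a Venn diagram like Figure 6 counts the ortholog clusters whose genes come from exactly that combination of species. Assuming the clusters have already been built (the clustering method is not specified here), the counts reduce to grouping clusters by their species set. The cluster IDs and memberships below are hypothetical.

```python
from collections import Counter

SPECIES = frozenset({
    "M. integrifolia", "N. nucifera", "P. persica",
    "A. thaliana", "C. canephora", "E. grandis",
})


def venn_regions(clusters):
    """Count clusters per exact species combination (one Venn region per key)."""
    regions = Counter()
    for cluster_id, species in clusters.items():
        unknown = set(species) - SPECIES
        if unknown:
            raise ValueError(f"{cluster_id}: unexpected species {sorted(unknown)}")
        regions[frozenset(species)] += 1
    return regions


# Hypothetical ortholog clusters: cluster ID -> species with at least one member gene
CLUSTERS = {
    "OG0001": set(SPECIES),
    "OG0002": set(SPECIES),
    "OG0003": {"M. integrifolia"},
    "OG0004": {"M. integrifolia", "N. nucifera"},
}

regions = venn_regions(CLUSTERS)
print("shared by all six:", regions[SPECIES])
print("unique to macadamia:", regions[frozenset({"M. integrifolia"})])
print("Proteales only:", regions[frozenset({"M. integrifolia", "N. nucifera"})])
```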