John W Davey, Mathieu Chouteau, Sarah L Barker, Luana Maroja, Simon W Baxter, Fraser Simpson, Richard M Merrill, Mathieu Joron, James Mallet, Kanchon K Dasmahapatra, Chris D Jiggins.
Abstract
The Heliconius butterflies are a widely studied adaptive radiation of 46 species spread across Central and South America, several of which are known to hybridize in the wild. Here, we present a substantially improved assembly of the Heliconius melpomene genome, developed using novel methods that should be applicable to improving other genome assemblies produced using short-read sequencing. First, we whole-genome-sequenced a pedigree to produce a linkage map incorporating 99% of the genome. Second, we incorporated haplotype scaffolds extensively to produce a more complete haploid version of the draft genome. Third, we incorporated ∼20x coverage of Pacific Biosciences (PacBio) sequencing, and scaffolded the haploid genome using an assembly of this long-read sequence. These improvements result in a genome of 795 scaffolds, 275 Mb in length, with an N50 length of 2.1 Mb and an N50 number of 34; 99% of the genome is placed and 84% is anchored on chromosomes. We use the new genome assembly to confirm that the Heliconius genome underwent 10 chromosome fusions since the split with its sister genus Eueides, over a period of about 6 million yr.
Keywords: Eueides; Heliconius; chromosome fusions; genome assembly; linkage mapping
Year: 2016; PMID: 26772750; PMCID: PMC4777131; DOI: 10.1534/g3.115.023655
Source DB: PubMed; Journal: G3 (Bethesda); ISSN: 2160-1836; Impact factor: 3.154
Figure 1. Genome assembly quality. A perfect assembly would appear as an almost straight vertical line. Horizontal plateaus indicate many very small scaffolds. The top right end of each curve shows the number of scaffolds and genome size in the whole assembly. See Table 1 for statistics.
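To make the shape of these curves concrete, the sketch below computes the points of one curve from a list of scaffold lengths. It assumes, because the caption does not say so, that scaffolds are added longest first, with the number of scaffolds on the x-axis and cumulative length on the y-axis; `cumulative_curve` is a hypothetical name.

```python
from itertools import accumulate


def cumulative_curve(scaffold_lengths):
    """Return (scaffold count, cumulative length) points, longest scaffolds first.

    Assumption: the Figure 1 curves sort scaffolds by decreasing length.
    """
    ordered = sorted(scaffold_lengths, reverse=True)
    running_totals = accumulate(ordered)
    return [(count, total) for count, total in enumerate(running_totals, start=1)]


# Hypothetical toy assembly: three large scaffolds, then 50 tiny ones.
points = cumulative_curve([5_000, 3_000, 2_000] + [10] * 50)
print(points[:3])   # [(1, 5000), (2, 8000), (3, 10000)]  steep start
print(points[-1])   # (53, 10500)  top-right end: scaffold count, total length
```

In the toy assembly the first three scaffolds lift the curve to 10,000 bp in three steps (the near-vertical part), while the 50 ten-base scaffolds move it 50 steps along the x-axis but only 500 bp up, which is the horizontal plateau the caption describes.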
Table 1. Statistics for genome assembly versions
| Assembly | Length (Mb) | Scaffolds | Scaffold N50 Number | Scaffold N50 Length | Contig N50 Length (kb) |
|---|---|---|---|---|---|
| Hmel1.1 | 273 | 4309 | 345 | 194 kb | 51 |
| Hmel1.1 with haplotypes | 343 | 12,386 | 567 | 128 kb | 33 |
| Hmel1.1 haploid | 289 | 6689 | 346 | 214 kb | 47 |
| PacBio FALCON | 325 | 11,121 | 719 | 96 kb | 96 |
| PacBio haploid | 256 | 4565 | 345 | 178 kb | 178 |
| Hmel1.1 + PacBio | 283 | 2961 | 113 | 629 kb | 316 |
| Hmel2 | 275 | 795 | 34 | 2.1 Mb | 330 |
N50 number, number of scaffolds as long as or longer than the N50 length; N50 length, length of scaffold or contig such that 50% of the genome is in scaffolds or contigs of this length or longer.
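The two definitions above can be made concrete with a short sketch that takes only a list of sequence lengths. `nx_stats` is a hypothetical helper; the N90 and N95 rows of the Lepidopteran comparison table follow by replacing 50% with another percentage. Following this legend, the N50 number counts every sequence at least as long as the N50 length, so tied lengths are all counted.

```python
def nx_stats(lengths, percent=50):
    """Return (Nx length, Nx number) for scaffold or contig lengths.

    Nx length: the length L such that at least `percent`% of the total
    assembly length lies in sequences of length L or longer.
    Nx number: how many sequences are as long as or longer than L
    (the legend's definition, so tied lengths are all counted).
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids float rounding exactly at the threshold.
        if running * 100 >= percent * total:
            nx_length = length
            break
    nx_number = sum(1 for length in ordered if length >= nx_length)
    return nx_length, nx_number


toy = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, total 300
print(nx_stats(toy))       # (70, 2): 80 + 70 reaches half of 300
print(nx_stats(toy, 90))   # (30, 5): 80 + 70 + 50 + 40 + 30 = 270
```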
Figure 2. The Hmel2 genome assembly. Chromosome numbers shown on the left. Each chromosome has a genetic map and a physical map. Linkage markers (alternating blue and orange vertical lines) connect to physical ranges for each marker (alternating blue and orange horizontal lines) scaled to maximum chromosome length (x-axis at the bottom of each page). Scaffolds are shown in green (anchored), orange (one unoriented scaffold placed at a marker), and alternating light and dark red (multiple unordered scaffolds placed at one marker). Red scaffolds at each marker are arbitrarily ordered by length. Eueides chromosome synteny is shown above each chromosome (see Figure 4).
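Figure 2's color scheme distinguishes scaffolds by how much the linkage map constrains them. The sketch below classifies scaffolds that way and converts the result into "placed" and "anchored" percentages of the kind quoted in the Abstract (99% and 84% for Hmel2). It is a hypothetical illustration, not the authors' procedure: the input format, the function names, and especially the rule that two distinct marker positions are enough to anchor (order and orient) a scaffold are assumptions.

```python
from collections import Counter


def placement_categories(scaffold_markers):
    """Assign each scaffold one of Figure 2's placement categories.

    scaffold_markers maps a scaffold name to the set of (chromosome, cM)
    marker positions found on it. ASSUMPTION (not stated in the source):
    two or more distinct positions fix order and orientation (anchored);
    a single position places a scaffold without orienting it.
    """
    # Number of single-position scaffolds sharing each marker position.
    sharing = Counter(
        next(iter(positions))
        for positions in scaffold_markers.values()
        if len(positions) == 1
    )
    categories = {}
    for name, positions in scaffold_markers.items():
        if len(positions) >= 2:
            categories[name] = "anchored"        # green in Figure 2
        elif len(positions) == 1:
            (position,) = positions
            if sharing[position] == 1:
                categories[name] = "unoriented"  # orange: alone at its marker
            else:
                categories[name] = "unordered"   # red: shares its marker
        else:
            categories[name] = "unplaced"        # no linkage markers at all
    return categories


def percent_placed_and_anchored(categories, lengths):
    """Percent of total length that is placed, and percent that is anchored."""
    total = sum(lengths.values())
    placed = sum(lengths[n] for n, c in categories.items() if c != "unplaced")
    anchored = sum(lengths[n] for n, c in categories.items() if c == "anchored")
    return 100 * placed / total, 100 * anchored / total


# Hypothetical toy scaffolds and marker positions (chromosome, cM).
markers = {
    "s1": {(1, 0.0), (1, 5.2)},
    "s2": {(1, 10.4)},
    "s3": {(1, 15.0)},
    "s4": {(1, 15.0)},
    "s5": set(),
}
lengths = {"s1": 600, "s2": 200, "s3": 50, "s4": 50, "s5": 100}
cats = placement_categories(markers)
print(cats)
print(percent_placed_and_anchored(cats, lengths))  # (90.0, 60.0)
```

Here only single-marker scaffolds are counted when deciding whether a marker is shared; how the figure treats an anchored scaffold that also touches that marker is not stated in the caption.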
Figure 4. Chromosome fusions in H. melpomene. Chromosomes of H. melpomene ordered by length. Unfused Heliconius chromosomes in pink; fused Eueides/Melitaea chromosomes in orange and blue, with the longer chromosome of each fused pair in blue. Melitaea chromosome numbers in white. Black line, beginning of H. melpomene chromosome in Hmel2. Black labels, loci known to be associated with color pattern features or altitude (alt) in H. melpomene or H. erato (Nadeau); see Table S4 for details.
Figure 3. SNPs across the B/D locus scaffold for the major marker types: Maternal (F1 mother heterozygous, F1 father homozygous), Paternal (F1 father heterozygous, F1 mother homozygous), and Intercross (both F1 parents heterozygous); see Table B in File S2 for marker type details. Kinesin, Dennis, Rays, and Optix are major features of the locus (Baxter; Reed; Wallbank). Vertical lines, SNPs; horizontal lines, linkage map marker ranges (cf. Figure 2). SNP colors: black, maternal pattern for chromosome 18; alternating blue and orange, linkage map markers from 1.45 cM to 11.6 cM on chromosome 18 (cf. Figure 2); gray, misassembly, now on chromosome 16.
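The three major marker types in this legend depend only on which F1 parent is heterozygous at a SNP. A minimal sketch, assuming diploid genotypes written as pairs of allele letters; `marker_type` and `is_heterozygous` are hypothetical names, and the full scheme in Table B of File S2 may include classes not modeled here.

```python
def is_heterozygous(genotype):
    """A diploid genotype such as ("A", "G") is heterozygous if its alleles differ."""
    first, second = genotype
    return first != second


def marker_type(mother, father):
    """Classify a SNP by the F1 parents' genotypes, as in the Figure 3 legend."""
    mother_het = is_heterozygous(mother)
    father_het = is_heterozygous(father)
    if mother_het and father_het:
        return "Intercross"
    if mother_het:
        return "Maternal"
    if father_het:
        return "Paternal"
    # Both parents homozygous: the SNP cannot segregate among their offspring.
    return None


examples = [
    (("A", "G"), ("A", "A")),
    (("C", "C"), ("C", "T")),
    (("A", "G"), ("A", "G")),
]
for mother, father in examples:
    print(marker_type(mother, father))
# Maternal
# Paternal
# Intercross
```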
Genome assembly statistics for Hmel1.1, Hmel2, and other published Lepidopteran genomes
| Statistic | Hmel1.1 | Hmel2 |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|
| Scaffolds | 4309 | 795 | 43,462 | 5397 | 29,988 | 8261 | 68,029 | 3873 | 5572 | 1819 |
| Total length (bp) | 273,786,188 | 275,198,613 | 481,803,763 | 248,564,116 | 298,173,436 | 389,907,520 | 375,987,417 | 227,005,758 | 243,890,167 | 394,062,517 |
| Mean scaffold size (bp) | 63,538 | 346,161 | 11,085 | 46,055 | 9943 | 47,198 | 5526 | 58,612 | 43,770 | 216,636 |
| Maximum scaffold size (bp) | 1,451,426 | 9,352,983 | 16,203,812 | 6,243,218 | 3,082,282 | 668,473 | 1,977,235 | 9,881,032 | 16,292,344 | 3,493,687 |
| Scaffold N50 length (bp) | 194,302 | 2,102,720 | 4,008,358 | 715,606 | 525,349 | 119,328 | 230,299 | 3,672,263 | 6,198,915 | 737,182 |
| Scaffold N90 length (bp) | 38,051 | 273,111 | 61,147 | 160,499 | 60,308 | 29,598 | 2022 | 930,396 | 533,617 | 152,088 |
| Scaffold N95 length (bp) | 21,864 | 124,798 | 928 | 68,064 | 1913 | 16,097 | 945 | 417,439 | 160,478 | 72,492 |
| Scaffold N50 number | 345 | 34 | 38 | 101 | 160 | 970 | 421 | 21 | 16 | 155 |
| Scaffold N90 number | 1634 | 176 | 258 | 366 | 689 | 3396 | 7589 | 63 | 48 | 575 |
| Scaffold N95 number | 2105 | 251 | 5679 | 483 | 3385 | 4263 | 21,037 | 81 | 91 | 753 |
| Contigs | 11,607 | 3105 | 87,972 | 10,545 | 52,985 | 45,618 | 96,532 | 13,441 | 10,483 | 15,764 |
| Mean contig size (bp) | 23,231 | 88,314 | 4907 | 22,939 | 5466 | 7914 | 3754 | 16,239 | 22,697 | 24,557 |
| Contig N50 length (bp) | 51,611 | 330,037 | 15,765 | 113,903 | 18,018 | 15,003 | 12,958 | 51,561 | 133,779 | 59,184 |
| Gaps | 7298 | 2310 | 44,510 | 5148 | 22,997 | 37,357 | 28,503 | 9568 | 4911 | 13,945 |
| Total gap length (bp) | 4,132,701 | 981,612 | 50,083,569 | 6,664,276 | 8,535,705 | 28,877,732 | 13,599,067 | 8,725,522 | 5,949,704 | 6,937,203 |
| Gap % | 1.5 | 0.4 | 10.4 | 2.7 | 2.9 | 7.4 | 3.6 | 3.8 | 2.4 | 1.8 |
| Complete BUSCOs % | 81.6 | 85.5 | 75.5 | 87.1 | 77.7 | 55.8 | 75.8 | 76.7 | 84.2 | 75.0 |
| Duplicated BUSCOs % | 2.9 | 3.1 | 2.2 | 3.6 | 2.7 | 1.7 | 2.7 | 2.5 | 3.1 | 20.4 |
| Fragmented BUSCOs % | 11.1 | 9.5 | 16.1 | 10.1 | 13.9 | 20.6 | 14.6 | 12.3 | 8.3 | 11.8 |
| Missing BUSCOs % | 7.3 | 5.0 | 8.4 | 2.8 | 8.4 | 23.6 | 9.6 | 11.0 | 7.5 | 13.2 |
See Table 1 legend for definitions of N50 length and number. BUSCO (Benchmarking Universal Single-Copy Ortholog) values are based on a set of 2675 arthropod BUSCOs (Simão). Duplicated BUSCOs are included in the count of complete BUSCOs, so the complete, fragmented, and missing percentages sum to 100%. See Materials and Methods for details of genomes and calculation of statistics.
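Several rows of this table can be derived from others, which makes them a useful consistency check. The relationships are inferred from the printed numbers rather than stated in the source: contigs = scaffolds + gaps; mean scaffold size = total length / scaffolds; mean contig size = (total length minus total gap length) / contigs; gap % = total gap length / total length; and complete + fragmented + missing BUSCOs ≈ 100%. The published means appear to be truncated to whole base pairs, so the sketch uses integer division. `AssemblyStats` and `busco_sums_to_100` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    """Raw values from one column of the table (lengths in bp)."""
    total_length: int
    scaffolds: int
    gaps: int
    gap_length: int

    @property
    def contigs(self):
        # Assumes every gap lies inside a scaffold, separating two contigs.
        return self.scaffolds + self.gaps

    @property
    def mean_scaffold_size(self):
        return self.total_length // self.scaffolds

    @property
    def mean_contig_size(self):
        # Gap bases are excluded from contig length.
        return (self.total_length - self.gap_length) // self.contigs

    @property
    def gap_percent(self):
        return round(100 * self.gap_length / self.total_length, 1)


def busco_sums_to_100(complete, fragmented, missing, tolerance=0.15):
    """Complete (incl. duplicated), fragmented and missing should total ~100%.

    The tolerance allows for each value being rounded to one decimal place.
    """
    return abs(complete + fragmented + missing - 100) <= tolerance


hmel2 = AssemblyStats(total_length=275_198_613, scaffolds=795, gaps=2310, gap_length=981_612)
print(hmel2.contigs, hmel2.mean_scaffold_size, hmel2.mean_contig_size, hmel2.gap_percent)
# 3105 346161 88314 0.4
print(busco_sums_to_100(85.5, 9.5, 5.0))  # True
```

The printed values match the Hmel2 column for contigs, mean scaffold size, mean contig size, and gap %.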