Shivani Mahajan, Kevin H-C Wei, Matthew J Nalley, Lauren Gibilisco, Doris Bachtrog.
Abstract
While short-read sequencing technology has resulted in a sharp increase in the number of species with genome assemblies, these assemblies are typically highly fragmented. Repeats pose the largest challenge for reference genome assembly, and pericentromeric regions and the repeat-rich Y chromosome are typically excluded from sequencing projects. Here, we assemble the genome of Drosophila miranda using long reads for contig formation; chromatin interaction maps for scaffolding; and short reads, optical mapping, and bacterial artificial chromosome (BAC) clone sequencing for consensus validation. Our assembly recovers entire chromosomes and contains large fractions of repetitive DNA, including about 41.5 Mb of pericentromeric and telomeric regions, and >100 Mb of the recently formed, highly repetitive neo-Y chromosome. While Y chromosome evolution is typically characterized by global sequence loss and shrinkage, the neo-Y increased in size by almost 3-fold because of the accumulation of repetitive sequences. Our high-quality assembly allows us to reconstruct the chromosomal events that have led to the unusual sex chromosome karyotype in D. miranda, including the independent de novo formation of a pair of sex chromosomes at two distinct time points and the reversion of a former Y chromosome to an autosome.
Year: 2018 PMID: 30059545 PMCID: PMC6117089 DOI: 10.1371/journal.pbio.2006348
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Fig 1. Drosophila miranda is a model species to study sex chromosome evolution.
A. Male (left) and female (right) D. miranda. B. Mitotic chromosome squashes of male D. miranda. Both the ancestral X (XL/XR) and the Y chromosome (YD/neo-Y) show large blocks of dark staining (Giemsa), indicative of heterochromatin. The acrocentric rods are the neo-X, and chromosomes 2 and 4. C. Polytene chromosomes of a female D. miranda stained for HP1 (heterochromatin protein 1). Note the large blocks of heterochromatin (arrows) on chromosomes 2 and 4. D. Karyotype evolution in D. miranda. Chromosomal fusions between the sex chromosomes and autosomes have resulted in both the reversal of a Y to an autosome and the independent de novo formation of new sex chromosomes from autosomes at two distinct evolutionary time points (XR and YD were formed about 15 MY ago, and the neo-X and neo-Y originated about 1.5 MY ago). Genome analysis allows us to reconstruct the temporal dynamics and molecular processes involved in sex chromosome evolution in this species. chr, chromosome; dot, dot chromosome; XL and XR, left and right arm of the X chromosome; YD, Y chromosome resulting from the unfused D element; HP1, heterochromatin protein 1; MY, million years; MYA, million years ago; Y, ancestral Y chromosome.
Fig 2. Assembly and validation of the Drosophila miranda genome.
A. Overview of the assembly pipeline. The steps include assembly of male PacBio reads, scaffolding using Hi-C, extensive QC using BioNano reads and BAC clone sequencing, and gene and repeat annotation. B. Hi-C linkage density map. Chromatin interaction maps allow recovery of entire chromosome arms. Note that the Y-linked contigs were scaffolded separately from X-linked and autosomal contigs. Unlinked regions with many contacts indicate repetitive regions. C. Comparison of the current (D.mir2.0) and previous (D.mir1.0) D. miranda assemblies. Note that the Y/neo-Y was not assembled in D.mir1.0, and the dot plot indicates homology between our neo-Y assembly and the neo-X. Other repeat-rich regions, such as the large pericentromeric block on AD, are also missing from D.mir1.0. D. BAC clone mapping for assembly verification. BAC clones are color coded according to how many genomic regions they map to in our assembly; green lines indicate stitch points of scaffolds based on Hi-C contacts, and the black line gives the local repeat content along the genome. Three hundred sixty-one sequenced BAC clones (97%) map contiguously and uniquely to our genome assembly. BAC, bacterial artificial chromosome; F, female; M, male; QC, quality control; Repeat %, local repeat content.
Assembly statistics.
| Assembly | Contigs + Scaffolds | Scaffolds | Unplaced Contigs | N50 (bp) | Assembly Size (Mb) | Assembly in Scaffolds (%) |
|---|---|---|---|---|---|---|
| PacBio Falcon | 625 | NA | 625 | 2,242,328 | 273 | NA |
| PacBio Canu | 521 | NA | 521 | 3,884,273 | 296 | NA |
| Quickmerge | 271 | NA | 271 | 5,177,776 | 295 | NA |
| PacBio + Hi-C | 102 | 14 | 88 | 37,186,217 | 289 | 96.5 |
| D.mir1.0 (female only, stitched with | 4,236 | 6 | 530 | 28,826,359 | 140 | 97.9 |
| D.mir1.0 (not stitched with | 47,035 | NA | NA | 5,007 | 112 | NA |
| D.mir2.0; X-linked and autosomal scaffolds | 40 | 6 | 34 | 32,539,883 | 177 | 97.1 |
| D.mir2.0; Y-linked scaffolds | 62 | 8 | 54 | 36,637,378 | 111 | 95.7 |
| D.mir2.0 | 102 | 14 | 88 | 35,263,102 | 287 | 96.6 |
Abbreviations: NA, not applicable; N50, 50% of the assembly is contained in contigs or scaffolds equal to or larger than this value.
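The N50 definition above can be illustrated with a minimal Python sketch. The function name `n50` and the toy lengths are assumptions chosen for illustration; they are not taken from the study, and this is not necessarily how the authors computed the values in the table.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths (bp).

    Sequences are sorted from longest to shortest; N50 is the length of
    the sequence at which the running total first reaches at least half
    of the summed assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues for odd totals.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (bp), invented for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70: 80 + 70 = 150 is half of the 300 bp total
```

Because a few long scaffolds can carry most of the sequence, N50 can be high even when many short unplaced contigs remain, which is why the table reports contig counts alongside it.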