Flavia J Krsticevic, Carlos G Schrago, A Bernardo Carvalho.
Abstract
The autosomal gene Mst77F of Drosophila melanogaster is essential for male fertility. In 2010, Krsticevic et al. (Genetics 184: 295-307) found 18 Y-linked copies of Mst77F ("Mst77Y"), which collectively account for 20% of the functional Mst77F-like mRNA. The Mst77Y genes were severely misassembled in the then-available genome assembly and were identified by cloning and sequencing polymerase chain reaction products. The genomic structure of the Mst77Y region and the possible existence of additional copies remained unknown. The recent publication of two long-read assemblies of D. melanogaster prompted us to reinvestigate this challenging region of the Y chromosome. We found that the Illumina Synthetic Long Reads assembly failed in the Mst77Y region, most likely because of its tandem duplication structure. The PacBio MHAP assembly of the Mst77Y region seems to be very accurate, as revealed by comparisons with the previously found Mst77Y genes, a bacterial artificial chromosome sequence, and Illumina reads of the same strain. We found that the Mst77Y region spans 96 kb and originated from a 3.4-kb transposition from chromosome 3L to the Y chromosome, followed by tandem duplications inside the Y chromosome and invasion of transposable elements, which account for 48% of its length. Twelve of the 18 Mst77Y genes found in 2010 were confirmed in the PacBio assembly, the remaining six being polymerase chain reaction-induced artifacts. There are several identical copies of some Mst77Y genes, coincidentally bringing the total copy number to 18. Besides providing a detailed picture of the Mst77Y region, our results highlight the utility of PacBio technology in assembling difficult genomic regions such as tandemly repeated genes.
Keywords: Drosophila melanogaster; Mst77F; PacBio; Y chromosome; long-read assembly
Year: 2015 PMID: 25858959 PMCID: PMC4478544 DOI: 10.1534/g3.115.017277
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Mst77Y genes in different assemblies of the D. melanogaster genome
| Assembly | Mst77Y Genes (Total) | Perfect Matches | With Errors | Number of Scaffolds | Scaffold Size, kb |
|---|---|---|---|---|---|
| SLR | 10 | 8 | 2 | 7 | 3−13 |
| MHAP | 18 | 18 | − | 1 | 747 |
| PBcR | 20 | 17 | 3 | 2 | 20; 177 |
| FALCON | 18 | 11 | 7 | 1 | 619 |
| WGS3 | 6 | 2 | 4 | 6 | <2 |
Perfect matches are 100% identical over the entire length to some gene described in Krsticevic et al. (2010).
Figure 1. General view of the Mst77Y region (MHAP assembly). All 18 Mst77Y genes are located in a single contig (JSAE01000257). Gene names were abridged (Mst77Y-1 as “Y1,” Mst77Y-17ψ as “Y17,” and so forth). All genes have the same orientation (not visible at this scale). The red tick near 110 kb marks the unmatched k-mer found in this region (caused by a C/T substitution in an intergenic region). The pseudogenes of Pka-R1 and CG3618, which flank each Mst77Y gene, were omitted for the sake of clarity. Repeats (mostly retrotransposons) occupy 48% of the sequence.
Figure 2. Validation of Mst77Y genes by alignment with PacBio reads. Gene names were abridged: Mst77F to 77F, Mst77Y-1 to Y1, and so forth. PacBio reads were aligned with bwa against the 18 Mst77Y genes identified by Krsticevic et al. (2010), and alignment depth was calculated with bedtools. Sequencing depth is ∼90× for autosomes (dashed line marked with “diploid”) and ∼45× for sex chromosomes (“haploid” dashed line; http://bergmanlab.smith.man.ac.uk/?p=2176). Note that the six genes absent from the assembly (Y2, Y5, Y8, Y9, Y11, and Y14) have essentially zero coverage, and hence are artifacts (see MHAP assembly and description of the Mst77Y region section). Note also that Y6 and Y7 behave as diploids (and indeed have two copies in the assembled scaffold), whereas the coverage of Y4 and Y12 suggests three copies (which indeed are found in the assembly). We used PacBio reads before “polishment” (i.e., error correction), so these are essentially raw reads.
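The copy-number reasoning in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function name `estimate_copy_number` is hypothetical, the ∼45× haploid depth is taken from the caption, and genes whose copy number the caption does not state are assumed to be single-copy (an assumption that happens to reproduce the total of 18 reported in the abstract).

```python
HAPLOID_DEPTH = 45.0  # approximate sex-chromosome depth quoted in the caption


def estimate_copy_number(depth, haploid_depth=HAPLOID_DEPTH):
    """Round a gene's mean read depth to the nearest whole number of haploid copies."""
    return int(round(depth / haploid_depth))


# Genes reported as PCR artifacts (essentially zero coverage).
artifacts = {2, 5, 8, 9, 11, 14}

# Copy numbers stated in the caption; every other confirmed gene is
# ASSUMED to be single-copy for this illustration.
stated_copies = {6: 2, 7: 2, 4: 3, 12: 3}

copies = {
    f"Y{i}": stated_copies.get(i, 1)
    for i in range(1, 19)
    if i not in artifacts
}
total_copies = sum(copies.values())

print(len(copies), "distinct genes,", total_copies, "copies in total")
```

Under these assumptions the tally gives 12 distinct genes and 18 copies, matching the abstract's remark that duplicates coincidentally bring the total back to 18.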
Assembly errors of PacBio assemblies in the Mst77Y region
| Assembly | Contig | Coordinates | Unmatched k-mers | Regions With Zero Coverage | Total Base Pairs With Zero Coverage |
|---|---|---|---|---|---|
| MHAP | JSAE01000257 | 85040−180612 | 1 | 0 | 0 |
| PBcR | 0_176540 | 87315−173699 | 9 | 1 | 245 |
| FALCON | 0032_03 | 436429−531977 | 36 | 11 | 138 |
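As a quick consistency check, the coordinates in this table can be converted to region lengths. The sketch below assumes the coordinates are 1-based and inclusive, which the table does not state; `span_bp` is a hypothetical helper name.

```python
# Coordinates of the Mst77Y region in each PacBio assembly (from the table).
regions = {
    "MHAP": (85040, 180612),
    "PBcR": (87315, 173699),
    "FALCON": (436429, 531977),
}


def span_bp(start, end):
    """Length in bp of a region, ASSUMING 1-based inclusive coordinates."""
    return end - start + 1


for name, (start, end) in regions.items():
    length = span_bp(start, end)
    print(f"{name}: {length} bp (about {length / 1000:.0f} kb)")
```

With that assumption the MHAP span is 95,573 bp, in line with the 96 kb quoted in the abstract. The PBcR span comes out roughly 9 kb shorter.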
Figure 3. Misassemblies in a region of the PBcR assembly. This snapshot of the IGV browser shows a region of contig 0_176540 that seemed to contain a new Mst77Y gene (characterized by a 5′ deletion; labeled as “Mst77Y-?”). Note the “zero Illumina coverage” region and the presence of unmatched k-mers (marked in red), which show that the region was misassembled, and that the new gene is an artifact.
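The idea behind an unmatched k-mer check can be illustrated with a toy example: an assembly k-mer that never occurs in the short reads (on either strand) points to a likely assembly error. The sketch below is not the authors' tool. The function names, the value of k, the toy sequences, and the choice to merge overlapping unmatched k-mers into runs are all assumptions made for illustration; how the paper counts unmatched k-mers is not stated here.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def read_kmers(reads, k):
    """Set of all k-mers seen in the reads, on both strands."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            kmers.add(kmer)
            kmers.add(revcomp(kmer))
    return kmers


def unmatched_runs(assembly, reads, k):
    """Runs of consecutive assembly k-mers absent from the reads.

    Returns (first_start, last_start) pairs of 0-based k-mer start positions.
    """
    known = read_kmers(reads, k)
    runs = []
    for i in range(len(assembly) - k + 1):
        if assembly[i:i + k] not in known:
            if runs and runs[-1][1] == i - 1:
                runs[-1][1] = i  # extend the current run
            else:
                runs.append([i, i])  # start a new run
    return [tuple(run) for run in runs]


# Toy data (hypothetical): one "read" covering the true sequence, and an
# assembly that differs from it by a single C-to-T substitution at position 8.
truth = "ATGGCGTACCTGAAGTC"
assembly = truth[:8] + "T" + truth[9:]

print(unmatched_runs(truth, [truth], k=5))
print(unmatched_runs(assembly, [truth], k=5))
```

In this toy case the single substitution makes the five overlapping 5-mers that span it unmatched, and they are reported as one run, whereas the error-free sequence yields no runs.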