Literature DB >> 28143951

Single-Molecule Sequencing of the Drosophila serrata Genome.

Scott L Allen1, Emily K Delaney2, Artyom Kopp2, Stephen F Chenoweth3.   

Abstract

Long-read sequencing technology promises to greatly enhance de novo assembly of genomes for nonmodel species. Although the error rates of long reads have been a stumbling block, sequencing at high coverage permits the self-correction of many errors. Here, we sequence and de novo assemble the genome of Drosophila serrata, a species from the montium subgroup that has been well-studied for latitudinal clines, sexual selection, and gene expression, but which lacks a reference genome. Using 11 PacBio single-molecule real-time (SMRT cells), we generated 12 Gbp of raw sequence data comprising ∼65 × whole-genome coverage. Read lengths averaged 8940 bp (NRead50 12,200) with the longest read at 53 kbp. We self-corrected reads using the PBDagCon algorithm and assembled the genome using the MHAP algorithm within the PBcR assembler. Total genome length was 198 Mbp with an N50 just under 1 Mbp. Contigs displayed a high degree of chromosome arm-level conservation with the D. melanogaster genome and many could be sensibly placed on the D. serrata physical map. We also provide an initial annotation for this genome using in silico gene predictions that were supported by RNA-seq data.
Copyright © 2017 Allen et al.

Entities:  

Keywords:  Celera; Drosophila; PacBio; genome assembly; long reads; montium

Mesh:

Year:  2017        PMID: 28143951      PMCID: PMC5345708          DOI: 10.1534/g3.116.037598

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


Second-generation sequencing (2GS) platforms, such as Illumina sequencing-by-synthesis, have dramatically reduced genome sequencing costs while increasing throughput exponentially (Shendure and Ji 2008). The relatively low cost and massive throughput of 2GS platforms have paved the way for sequencing and de novo assembly of thousands of species’ genomes (Alkan ). 2GS methods generate short reads (less than a few hundred base pairs in length) that have limitations for de novo genome assembly, where assembly is performed without the aid of a reference genome (Green 1997; Miller ; Nagarajan and Pop 2013; Alkan ). With short reads, de novo assembly is an inherently difficult computational problem because repetitive DNA sequences are often much longer than the length of each read (Ukkonen 1992). For instance, it has been estimated that short-read de novo assemblies could be missing up to 20% of sequence information because repeat DNA sequences can increase the number of misassembled and fragmented regions (Schatz ; Alkan ; Ukkonen 1992). One way to alleviate the problem of repetitive DNA in the de novo assembly process has been to incorporate a second set of mate-pair libraries with very long inserts (> 2 kbp) (Li ; Chaisson ; Simpson ; Alkan ; Butler ). Mate-pair libraries can resolve repeats (Treangen and Salzberg 2012; Wetzel ) and improve scaffolding (van Heesch ), but paired-end contamination and insert size misestimation can also lead to misassemblies (Phillippy ; Sahlin ). More recently, third-generation (3GS) single-molecule sequencing technologies, such as Pacific Biosciences’ (PacBio) SMRT sequencing and Oxford Nanopore’s MinION sequencing, which currently produce much longer reads of up to 54 kbp (Lee ) and > 10 kbp (Quick ), respectively, can overcome some of the shortcomings of 2GS de novo assembly (Berlin ). Although long-read sequencing technology produces reads with a high error rate, ranging from 82.1% (Chin ) to 84.6% accuracy (Rasko ), sequencing errors occur at more or less random positions across long reads (Chin ) and can be corrected with 2GS short-read data (Koren ) or by using excess 3GS reads for self-correction (Chin ). In this paper, we use PacBio long-read sequencing to de novo assemble the genome of the fly, Drosophila serrata, which has been particularly well-studied from an evolutionary standpoint. D. serrata is a member of the D. montium subgroup, which split from the D. melanogaster subgroup ∼40 MYA (Tamura ), and consists of an estimated 98 species (Brake and Bächli 2008). At present, only one draft genome assembly (D. kikkawai) is available (Chen ) from this species-rich subgroup. D. serrata has a broad geographical distribution, ranging from Papua New Guinea to south eastern Australia and has emerged as a powerful model for addressing evolutionary questions such as the evolution of species borders (Blows and Hoffman 1993; Hallas ; Magiafoglou ) and climate adaptation (Frentiu and Chenoweth 2010; Latimer ; Kellermann ). The species has also been used to investigate sexual selection (Hine ; Gosden and Chenoweth 2011; Frentiu and Chenoweth 2008; Chenoweth ), male mate choice (Chenoweth and Blows 2003; Chenoweth ), mate recognition (Higgie ), sexual dimorphism (Chenoweth ; Yassin ), sexual conflict (Delcourt ), and indirect genetic effects (Chenoweth ). In addtion, its cuticular hydrocarbons, which serve as contact pheromones (Chung ), have been extensively used to develop novel multivariate quantitative genetic approaches for exploring genetic constraints on adaptation (Blows ; Chenoweth ; McGuigan ; Rundle ). Despite the importance of D. serrata as a model for evolutionary research, our poor understanding of its genome remains a significant limitation. Linkage and physical genome maps are available (Stocker ) and an expressed sequence tag (EST) library has been developed (Frentiu ), but the species lacks a draft genome. Here, we report the sequencing and assembly of the D. serrata genome using exclusively PacBio SMRT technology. We also provide an initial annotation of the genome based on in silco gene predictors supported by empirical RNA-seq data. Our de novo genome and its annotation will provide a resource for ongoing population genomic and trait mapping studies in this species as well as facilitate broader studies of genome evolution in the family Drosophilidae.

Materials and Methods

Fly strains and DNA extraction

We sequenced a mix of ∼100 mg of males and females from a single inbred line that originated from Forster, Australia, and had been inbred via full-sib mating for 10 generations before being maintained at a large population size (N ∼ 250 individuals) (McGuigan ). A single further generation of full-sib inbreeding was applied before extraction of DNA. This same inbred line was used for the D. serrata linkage map, was the founding line for previous mutation accumulation studies (Latimer ; McGuigan , 2014a,b), and is fixed for the light female abdominal pigmentation phenotype mapped by Yassin . High molecular weight DNA was extracted from fly bodies (heads were excluded to reduce eye pigment contamination) using a QIAGEN Gentra Puregene Tissue Kit (Cat #158667), which produced fragments > 100 kbp (measured using pulsed-field gel electrophoresis). Two phenolchloroform extractions were performed at the University of California, Davis at the DNA Technologies Core prior to preparation of a standard sequencing library.

Genome sequencing and assembly

DNA was sequenced using 11 SMRT cells and P6-C4 chemistry on the PacBio RS II platform. In total, this produced ∼13 Gbp spanning 136,119 filtered subreads with a mean read length of 8840 bp and an N50 of 12,220 bp (Supplemental Material, Figure S1). The PacBio genome was assembled using the PBcR pipeline, which implements the MHAP algorithm within the Celera Assembler (version 8.3rc2) (Berlin ), and polished with Quiver (GenomicConsensus version 0.9.2 and ConsensusCore version: 0.8.8) (Chin ) in three steps: (1) errors were corrected in reads using PBDagCon, which requires at least 50 × genome coverage and utilizes the consensus of oversampled sequences (Chin ); (2) overlapping sequences were assembled using MHAP and the Celera Assembler (Berlin ); and (3) contigs were polished with Quiver to correct for spurious SNP calls and small indels (Chin ). The “sensitive” setting was used for both read correction and genome assembly (Berlin ) whereas the default settings were used for polishing with Quiver (Chin ). We elected to correct all reads as opposed to the default longest 40 ×. The longest 25 × corrected reads were subsequently used for genome assembly. The PBDagCon correction was performed on a computer with 60 CPU cores and 1 TB of RAM; 58 CPU cores were used for the assembly and the amount of RAM used, although not tracked, was far less than machine capacity. Error correction with PBDagCon took ∼26 days. Assembly of corrected reads using MHAP and the Celera Assembler took ∼19 hr using 28 CPU cores. Our initial runs using the much faster error correction algorithm (HGAP) produced a slightly shorter assembly (194 Mbp compared to 198 Mbp) with a slightly lower N50 (0.88 Mbp vs. 0.95 Mbp). Therefore, we chose to use the more sensitive PBDagCon correction method.

Transcriptome sequencing and assembly

The same inbred fly strain that was used for DNA sequencing was also used for adult mRNA sequencing to annotate the D. serrata genome. Adult males and females were transferred to fresh vials shortly after eclosion and held in groups of ∼25 where they were allowed to mate and lay eggs for 2 d. They were then sexed under light CO2 anesthesia and snap frozen using liquid nitrogen in groups of 10; at the time of freezing, all flies were assumed to be nonvirgins. Total RNA was extracted from each pool of flies using the standard TRIzol protocol. Initial quality assessment of the total RNA using a NanoDrop and gel electrophoresis indicated that the RNA was of high quality, this was later confirmed with a RNA integrity number > 7 (measured uisng a BioAnalyzer). RNA was stored at −80° for several days before being shipped for sequencing. One male and one female 75 bp paired-end sequencing library was prepared using the TruSeq Stranded mRNA Library prep kit and sequenced on an Illumina NextSeq500 at the Ramaciotti Centre for Genomics, University of New South Wales, Australia. In total, 79 and 88 million reads were produced for males and females, respectively. Quality assessment of the RNA-seq data using FastQC (Andrews 2010) indicated that the reads were of a high quality and therefore no trimming of reads was performed. The transcriptome was de novo assembled for each sex separately using Trinity version 2.1.1 (Grabherr ), where all reads were used and the jaccard_clip option was enabled to minimize gene fusion events caused by UTR overlap in high gene density regions.

Annotation

Maker version 2.31.8 (Campbell ; Holt and Yandell 2011) was used to annotate the PacBio genome via incorporation of in silico gene models detected by Augustus (Stanke and Morgenstern 2005) and/or SNAP (Johnson ), the de novo D. serrata male and female transcriptomes, and protein sequences from 12 Drosophila species genomes (D. ananassae r1.04, D. erecta r1.04, D. grimshawi r1.3, D. melanogaster r6.07, D. mojavensis r1.04, D. persmillis r1.3, D. pseudoobscura pseudoobscura r3.03, D. sechellia 1.3, D. simulans, r2.01, D. virilis r1.03, D. willistoni r1.04, and D. yakuba r1.04) obtained from FlyBase (McQuilton ; Attrill ). Repeat masking was performed based on D. melanogaster training (Smit ). Maker was run with default settings apart from allowing Maker to take extra steps to identify alternate splice variants and correct for erroneous gene fusion events.

Data availability

All sequence data including PacBio and RNA-seq reads have been submitted to public repositories and are available via the D. serrata genome NCBI project accession PRJNA355616. The genome assembly and annotation tracks are available from http://www.chenowethlab.org. We also supply a list of D. melanogaster orthologs in Table S1.

Results and Discussion

To assemble a draft D. serrata genome, we sequenced DNA from a pool of adult males and females (that originated from a single inbred line) to a coverage of ∼65 × using PacBio long-read, SMRT sequencing technology. This produced 136,119 filtered subreads with a mean read length of 8940 bp and a read N50 of 12,200 bp that spanned > ∼13 Gbp (Figure S1). The PacBio reads were assembled using the MHAP algorithm within the Celera Assembler (Miller ; Berlin ) after self-correction using PBDagCon (Chin ). The final genome was polished with a single iteration of Quiver (Chin ) and consisted of 1360 contigs spanning > 198 Mbp with a GC content of 39.13% (Table 1). The longest contig was ∼7.3 Mbp and the N50 of all contigs was ∼0.95 Mbp. Flow cytometry studies suggest that species of the montium subgroup commonly have genome lengths over 200 Mbp (Gregory and Johnston 2008) with the estimate for the female D. serrata genome being ∼215 Mbp (0.22 pg). This estimate is in broad agreement with our assembly length of 198 Mbp for the female genome.
Table 1 D.

serrata genome assembly statistics

DescriptionStatistic
Number of contigs1360
Genome size (bp)198,298,763
Longest contig (bp)7,300,740
< 1 kbp0.0%
1–10 kbp3.3%
10–100 kbp78.8%
100–1000 kbp15.3%
> 1 Mbp2.6%
N50 (bp)942,627
GC content39.13%

Contig length percentages refer to percent total length in each size bin.

Contig length percentages refer to percent total length in each size bin.

Completeness

Genome completeness was assessed using BUSCO gene set analysis version 2.0, which includes a set of 2799 genes specific to Diptera (Simao ). The D. serrata assembly contained 96.2% of the BUSCO genes with 94.1% being complete single-copy (defined as complete when the gene’s length is within 2 SDs of the BUSCO group’s mean length) and 2.5% detected as fragmented. Only 1.3% of the BUSCO genes were not found in the D. serrata assembly (Table 2). Completeness of the D. serrata genome was similar to the reference D. melanogaster genome (version r6.05), which contained 98.7% complete BUSCO genes. As a further point of comparison, we computed BUSCO metrics for a recent PacBio-only assembly of the D. melanogaster ISO1 strain genome using all 790 contigs rather than the 132 that were constructed from > 50 reads only [http://www.cbcb.umd.edu/software/PBcR/MHAP/ (quivered full assembly)], and we also analyzed the only other member of the montium subgroup with a publicly available genome assembly, D. kikkawai, (https://www.hgsc.bcm.edu/arthropods/drosophila-modencode-project; NCBI PRJNA62319). Although these assemblies tended to contain marginally lower numbers of missing BUSCOs, metrics were generally very similar (Table 2), indicating a high level of completeness for the D. serrata assembly.
Table 2

BUSCO gene content assessment for D. serrata and two different D. melanogaster assemblies, version r6.05 from www.flybase.org, and the full ISO 1 PacBio assembly of Berlin consisting of 790 contigs, also constructed with the PBcR pipeline

CategoryD. serrataD. kikkawaiD. melanogaster
r6.05PacBio
Complete Single-copy  BUSCOs (%)94.197.198.297.7
Duplicated (%)2.11.00.50.6
Fragmented BUSCOs (%)2.51.20.80.8
Missing BUSCOs (%)1.30.80.50.9

A total of 2799 BUSCOs were searched that form a set of highly conserved Dipteran genes. PacBio, Pacific Biosciences; BUSCO, Benchmarking Universal Single-Copy Ortholog.

A total of 2799 BUSCOs were searched that form a set of highly conserved Dipteran genes. PacBio, Pacific Biosciences; BUSCO, Benchmarking Universal Single-Copy Ortholog.

Fragmentation and misassemblies

Although our assembly consisted of 1360 contigs with a N50 of 0.94 Mbp, which was an N50 at the upper end of what might be expected for a short-read assembly, it is much lower than a recent PacBio-only assembly of the D. melanogaster genome (Berlin ). There are several reasons why this might be the case. First, we report metrics on all contigs in the assembly rather than excluding those that incorporated < 50 reads, as was the case for the D. melanogaster assembly (Berlin ) (132 contigs with an N50 of 13.6 M). Excluding such contigs resulted in a D. serrata assembly of only 273 contigs with a total genome length of 175 Mbp (vs. 198 Mbp) and an N50 of 1.4 Mbp. In this reduced assembly, half of the genome was represented in only 25 contigs, which is closer to the performance seen for D. melanogaster. While contigs with < 50 read support were generally short (median 23.5 kbp and range 6.3–110 kbp) and could be excluded in some cases on the basis of quality, when we examined the D. serrata annotation data, we saw that many of these contigs contained predicted genes that had RNA-seq support, including 14 complete single-copy BUSCOs. Therefore, we have retained all contigs in our assembly. Second, although our N50 filtered subread length of 12,200 kbp is on a par with the D. melanogaster P5-C3 filtered subread lengths (12.2–14.2 kbp) (Kim ), we had approximately half the coverage of the D. melanogaster assembly (65 × vs. 130 ×), which may have reduced our ability to span repetitive regions of the D. serrata genome. To examine this further, we reran the PBcR pipeline with D. melanogaster data from Kim but downsampled it to 65 ×. We did not see genome contiguity drop to the levels seen for D. serrata (data not shown) and note that similar findings were observed by Chakraborty (see their Figure 5). Therefore, it seems likely that the D. serrata genome, which is longer than that of D. melanogaster, may also be more complex due to longer repetitive regions. Therefore, adequate repeat-spanning coverage would presumably require additional very long reads to achieve the same assembly contiguity seen for D. melanogaster. A third factor possibly contributing to a higher degree of fragmentation in our assembly is residual heterozygosity, which may have been higher in our D. serrata line than the ISO1 D. melanogaster line. We used several methods to assess the quality of the genome with regards to misassemblies. First, because the D. serrata physical map indicates very strong chromosome arm-level conservation of gene content between D. serrata and D. melanogaster (Stocker ), we examined possible misassemblies between chromosomal arms by aligning the six largest contigs (total length ∼37 Mbp) to the D. melanogaster genome using MUMmer (Kurtz ). If there were no chromosome arm misplacements, then it was expected that each contig would align to a single D. melanogaster chromosome arm, albeit fragmented due to changes in gene order. This was largely the case (Figure 1), where each contig aligned to a single D. melanogaster chromosome arm but with minor sections of alignment to other chromosome arms toward the contig edges where repetitive elements were more likely to be found. The one major exception to this general pattern of conservation was found in the longest contig in the assembly, contig 3208, which aligned mainly to D. melanogaster 3R but contained an ∼600 kbp segment that aligned to D. melanogaster 3L. To test whether this was likely to be a misassembly, we searched the contig for previously published SNP markers that have been placed on the D. serrata linkage map. The marker m25 (Stocker ), which maps to 3L, was located in the suspected misassembled region (contig 3208 and position 3,537,591) indicating that a misassembly rather than a genomic translocation rearrangement between 3R and 3L was most likely.
Figure 1

Alignment of the six longest contigs from the D. serrata assembly to D. melanogaster genome version 6.05. Red dots indicate a MUMmer alignment that matches to the D. melanogaster genome in the forward orientation; blue dots indicate a MUMmer alignment that matches to the D. melanogaster genome in the reverse orientation. M, million.

Alignment of the six longest contigs from the D. serrata assembly to D. melanogaster genome version 6.05. Red dots indicate a MUMmer alignment that matches to the D. melanogaster genome in the forward orientation; blue dots indicate a MUMmer alignment that matches to the D. melanogaster genome in the reverse orientation. M, million. To further examine assembly quality, we compared our assembly to the entire physical genome map of D. serrata (Stocker ), where in situ hybridization was used to physically locate 78 genes. We were able to assess possible misassemblies when a contig contained multiple physically mapped genes (11 contigs ranging in size from ∼1 to ∼6 Mbp). Using this approach, we observed no apparent chromosome arm-level assignment errors beyond that seen for contig 3208 (Figure 2). Furthermore, when contigs contained three or more physically mapped genes, gene order could be examined. We saw three cases of apparent gene order reversal (two on 2R and one on 3L). Interestingly, two of these regions map to the positions of known chromosomal inversions (Mavragani-Tsipidou ), which is perhaps not unexpected given that different inbred lines were used for the physical map and genome sequencing. After considering these probable inversions, gene order and location appears to be largely correct for these 11 contigs at least. For contig 3208, each section that aligned to 3R could be placed on the physical map only after splitting the contig into three pieces based on the previously identified misassembly.
Figure 2

Comparison between the draft genome assembly and the physical D. serrata genome map, image is adapted from Stocker . Genes in red were mapped by Drosopoulou and Scouras (1995, 1998), Drosopoulou , 1997, 2002), and Pardali . Genes in blue are also included in the linkage map produced by Stocker . Thin red lines are inversions found by Stocker and thin black lines are inversions found by Mavragani-Tsipidou . Contig3208 (shown in red), was split into three parts based on the misassembly; parts 1 and 3 aligned with D. melanogaster 3R and part 2 aligned with 3L (Figure 1). Markers Act88F and hsp70 were not mapped to contigs because the former appears twice and nomenclature changes meant we could not be certain exactly which gene hsp70 was referring to.

Comparison between the draft genome assembly and the physical D. serrata genome map, image is adapted from Stocker . Genes in red were mapped by Drosopoulou and Scouras (1995, 1998), Drosopoulou , 1997, 2002), and Pardali . Genes in blue are also included in the linkage map produced by Stocker . Thin red lines are inversions found by Stocker and thin black lines are inversions found by Mavragani-Tsipidou . Contig3208 (shown in red), was split into three parts based on the misassembly; parts 1 and 3 aligned with D. melanogaster 3R and part 2 aligned with 3L (Figure 1). Markers Act88F and hsp70 were not mapped to contigs because the former appears twice and nomenclature changes meant we could not be certain exactly which gene hsp70 was referring to. The conservation of chromosome arm-level gene content was a common feature of the remaining contigs as well. For example, while only 354 contigs contained significant tBLASTx hits to at least one D. melanogaster gene (genome version 6.05), these contigs spanned 167 Mbp, and the vast majority had > 90% tBLASTx hits to a single D. melanogaster chromosomal arm (mean = 96.35% and median = 100%) (Figure 3). Furthermore, only 34 contigs displayed < 90% similarity to D. melanogaster and a linear regression where contig size predicted percent similarity indicated no significant relationship (F(1,32) = 0.4003, P = 0.5314), suggesting that very large contigs were no more likely to be misassembled than short contigs.
Figure 3

Comparison of D. serrata gene locations relative to D. melanogaster. On average, > 95% of tBLASTx hits to D. melanogaster genes (version 6.05) in each contig map to a single D. melanogaster arm.

Comparison of D. serrata gene locations relative to D. melanogaster. On average, > 95% of tBLASTx hits to D. melanogaster genes (version 6.05) in each contig map to a single D. melanogaster arm. To facilitate annotation of the D. serrata genome, we sequenced mRNA from male and female adult flies. The in silico gene predictors SNAP (Johnson ) and Augustus (Stanke and Morgenstern 2005) found 22,718 and 15,984 genes, respectively. Of these in silico predicted genes, a total of 14,271 protein coding genes were sufficiently supported by RNA-seq and/or protein sequence data to be annotated by Maker2 (Holt and Yandell 2011). Maker scores annotations using the annotation edit distance (AED), a zero-to-one score where a value of zero indicates that the in silico annotation and the empirical evidence are in perfect agreement and a value of one indicates that the in silico annotation has no support from empirical data (Eilbeck ). The AED for the D. serrata genome had a mean score of 0.18 and median of 0.13, suggesting that most annotations were of high quality with strong empirical support. While the number of genes we annotated in D. serrata is similar to the 13,929 protein coding genes that have currently been annotated in D. melanogaster (genome version 6.05), we annotated far fewer total transcripts (31,482 identified in D. melanogaster versus 16,202 in D. serrata) (Attrill ); this is likely due to the larger number of tissue types and life stages for which D. melanogaster gene expression has been characterized with RNA-seq. For instance, considering that in Drosophila appreciable numbers of genes peak in expression during early life stages such as embryogenesis (Arbeitman ), our use of adult fly RNA-seq data may mean that some such genes are yet to be annotated. Furthermore, as we used mRNA-seq, we have not yet annotated noncoding genes of which there are 3503 in the D. melanogaster genome (Attrill ). Future RNA-seq datasets will be used to update the existing gene models. We observed differences in gene, exon, and intron lengths between D. serrata and D. melanogaster. In D. serrata, there were on average 3.9 exons per protein coding gene and the gene, exon, and intron lengths were 4655, 451, and 699 bp respectively. Apart from average exon number, which does not differ between the two species, these values are lower than those for D. melanogaster protein coding genes (genome version 6.05), where the mean gene, exon, and intron lengths are 6962, 539, and 1704 bp, respectively (Attrill ). The lower average intron length observed in D. serrata may be a consequence of annotating far fewer alternate splice variants. In total, coding sequence comprised 33.6% of the genome when including introns and 15.4% of the genome when considering only exons. Lower percentage intron content has been associated with overall longer genomes in Drosophilidae (Gregory and Johnston 2008), which is consistent with our observations here. Many of the annotated genes in D. serrata were found to be putative orthologs of D. melanogaster genes (Table S1). In total, 10,995 (77%) were found to be orthologs via best reciprocal BLAST (Huynen and Bork 1998; Moreno-Hagelsieb and Latimer 2008; Tatusov ) using tBLASTx with default settings (Camacho ) and version 6.05 of the D. melanogaster genome (Drosophila 12 Genomes Consortium 2007; McQuilton ). The median e-value of each reciprocal comparison was zero, indicating that most orthologs are very similar to one another. Furthermore, when comparing D. serrata genes to D. melanogaster, the largest e-value was 1.6 with only 85 orthologs having an e-value > 1e−10. Similarly, when comparing D. melanogaster genes to D. serrata, the largest e-value was 0.18 with only 78 orthologs having an e-value > 1e−10. The correlation between e-values for the reciprocal BLAST was 0.88.

Conclusions

We have assembled a draft genome for a species with no existing genome using only 3GS data. Our study indicates the feasibility of long-read-only genome assembly for nonmodel species with modest sized genomes when using an inbred line. While either greater 3GS coverage or a hybrid merged assembly (Chakraborty ) may be required to provide greater genome contiguity, it is clear that the genome has a high degree of completeness in terms of gene content and that misassemblies at chromosome arm-level are rare. The genome and its initial annotation provide a useful resource of future population genomic and trait mapping studies in this species.

Supplementary Material

Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.116.037598/-/DC1. Click here for additional data file. Click here for additional data file.
  77 in total

1.  Positive genetic correlation between female preference and offspring fitness.

Authors:  Emma Hine; Shelly Lachish; Megan Higgie; Mark W Blows
Journal:  Proc Biol Sci       Date:  2002-11-07       Impact factor: 5.349

2.  Orientation of the genetic variance-covariance matrix and the fitness surface for multiple male sexually selected traits.

Authors:  Mark W Blows; Stephen F Chenoweth; Emma Hine
Journal:  Am Nat       Date:  2004-03-09       Impact factor: 3.926

3.  Isolation, characterization, and localization of beta-tubulin genomic clones of three Drosophila montium subgroup species.

Authors:  Elena Drosopoulou; Karin Wiebauer; Minas Yiangou; Penelope Mavragani-Tsipidou; Horst Domdey; Zacharias G Scouras
Journal:  Genome       Date:  2002-06       Impact factor: 2.166

4.  Natural selection and the reinforcement of mate recognition.

Authors:  M Higgie; S Chenoweth; M W Blows
Journal:  Science       Date:  2000-10-20       Impact factor: 47.728

5.  Inversion frequencies in Drosophila serrata along an eastern Australian transect.

Authors:  Ann Jacob Stocker; Brad Foley; Ary Hoffmann
Journal:  Genome       Date:  2004-12       Impact factor: 2.166

6.  Clinal variation in Drosophila serrata for stress resistance and body size.

Authors:  Rebecca Hallas; Michele Schiffer; Ary A Hoffmann
Journal:  Genet Res       Date:  2002-04       Impact factor: 1.588

7.  Temporal patterns of fruit fly (Drosophila) evolution revealed by mutation clocks.

Authors:  Koichiro Tamura; Sankar Subramanian; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2003-08-29       Impact factor: 16.240

8.  Versatile and open software for comparing large genomes.

Authors:  Stefan Kurtz; Adam Phillippy; Arthur L Delcher; Michael Smoot; Martin Shumway; Corina Antonescu; Steven L Salzberg
Journal:  Genome Biol       Date:  2004-01-30       Impact factor: 13.583

9.  Signal trait sexual dimorphism and mutual sexual selection in Drosophila serrata.

Authors:  Stephen F Chenoweth; Mark W Blows
Journal:  Evolution       Date:  2003-10       Impact factor: 3.694

10.  Gene expression during the life cycle of Drosophila melanogaster.

Authors:  Michelle N Arbeitman; Eileen E M Furlong; Farhad Imam; Eric Johnson; Brian H Null; Bruce S Baker; Mark A Krasnow; Matthew P Scott; Ronald W Davis; Kevin P White
Journal:  Science       Date:  2002-09-27       Impact factor: 47.728

View more
  10 in total

1.  A phylogeny for the Drosophila montium species group: A model clade for comparative analyses.

Authors:  William R Conner; Emily K Delaney; Michael J Bronski; Paul S Ginsberg; Timothy B Wheeler; Kelly M Richardson; Brooke Peckenpaugh; Kevin J Kim; Masayoshi Watada; Ary A Hoffmann; Michael B Eisen; Artyom Kopp; Brandon S Cooper; Michael Turelli
Journal:  Mol Phylogenet Evol       Date:  2020-12-31       Impact factor: 4.286

2.  SMRT sequencing of the full-length transcriptome of the white-backed planthopper Sogatella furcifera.

Authors:  Jing Chen; Yaya Yu; Kui Kang; Daowei Zhang
Journal:  PeerJ       Date:  2020-06-09       Impact factor: 2.984

3.  Single-molecule real-time sequencing facilitates the analysis of transcripts and splice isoforms of anthers in Chinese cabbage (Brassica rapa L. ssp. pekinensis).

Authors:  Chong Tan; Hongxin Liu; Jie Ren; Xueling Ye; Hui Feng; Zhiyong Liu
Journal:  BMC Plant Biol       Date:  2019-11-27       Impact factor: 4.215

4.  Maintenance of quantitative genetic variance in complex, multitrait phenotypes: the contribution of rare, large effect variants in 2 Drosophila species.

Authors:  Emma Hine; Daniel E Runcie; Scott L Allen; Yiguan Wang; Stephen F Chenoweth; Mark W Blows; Katrina McGuigan
Journal:  Genetics       Date:  2022-09-30       Impact factor: 4.402

5.  A Genomic Reference Panel for Drosophila serrata.

Authors:  Adam J Reddiex; Scott L Allen; Stephen F Chenoweth
Journal:  G3 (Bethesda)       Date:  2018-03-28       Impact factor: 3.154

6.  Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses.

Authors:  Michael J Bronski; Ciera C Martinez; Holli A Weld; Michael B Eisen
Journal:  G3 (Bethesda)       Date:  2020-05-04       Impact factor: 3.154

7.  An investigation of Y chromosome incorporations in 400 species of Drosophila and related genera.

Authors:  Eduardo G Dupim; Gabriel Goldstein; Thyago Vanderlinde; Suzana C Vaz; Flávia Krsticevic; Aline Bastos; Thadeo Pinhão; Marcos Torres; Jean R David; Carlos R Vilela; Antonio Bernardo Carvalho
Journal:  PLoS Genet       Date:  2018-11-02       Impact factor: 5.917

8.  Highly Contiguous Genome Assemblies of 15 Drosophila Species Generated Using Nanopore Sequencing.

Authors:  Danny E Miller; Cynthia Staber; Julia Zeitlinger; R Scott Hawley
Journal:  G3 (Bethesda)       Date:  2018-10-03       Impact factor: 3.154

9.  Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data.

Authors:  Vasanthan Jayakumar; Yasubumi Sakakibara
Journal:  Brief Bioinform       Date:  2019-05-21       Impact factor: 11.622

10.  The Transposable Elements of the Drosophila serrata Reference Panel.

Authors:  Zachery Tiedeman; Sarah Signor
Journal:  Genome Biol Evol       Date:  2021-09-01       Impact factor: 3.416

  10 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.