Literature DB >> 29703783

A High-Quality Reference Genome for the Invasive Mosquitofish Gambusia affinis Using a Chicago Library.

Sandra L Hoffberg1, Nicholas J Troendle2, Travis C Glenn2,3,4, Ousman Mahmud2,5,6, Swarnali Louha4, Domitille Chalopin2, Jeffrey L Bennetzen2, Rodney Mauricio2.   

Abstract

The western mosquitofish, Gambusia affinis, is a freshwater poecilid fish native to the southeastern United States but with a global distribution due to widespread human introduction. Gambusia affinis has been used as a model species for a broad range of evolutionary and ecological studies. We sequenced the genome of a male G. affinis to facilitate genetic studies in diverse fields including invasion biology and comparative genetics. We generated Illumina short read data from paired-end libraries and in vitro proximity-ligation libraries. We obtained 54.9× coverage, N50 contig length of 17.6 kb, and N50 scaffold length of 6.65 Mb. Compared to two other species in the Poeciliidae family, G. affinis has slightly fewer genes that have shorter total, exon, and intron length on average. Using a set of universal single-copy orthologs in fish genomes, we found 95.5% of these genes were complete in the G. affinis assembly. The number of transposable elements in the G. affinis assembly is similar to those of closely related species. The high-quality genome sequence and annotations we report will be valuable resources for scientists to map the genetic architecture of traits of interest in this species.
Copyright © 2018 Hoffberg et al.

Entities:  

Keywords:  Dovetail Genomics; Gambusia affinis; Poecilid; de novo assembly; whole genome sequencing

Mesh:

Substances:

Year:  2018        PMID: 29703783      PMCID: PMC5982815          DOI: 10.1534/g3.118.200101

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


Mosquitofish (Gambusia affinis) are small freshwater fish in the Poeciliidae family that are native to the southeastern United Stated and northern Mexico and live on the banks of streams or lakes. Many species in the Poeciliidae have been used as models for a broad range of evolutionary and ecological studies. For example, since the poeciliids are viviparous, they have been studied for the evolution of the placenta and viviparity (Pollux ). Related fish in the Poeciliidae family have been used to study life history evolution (Reznick and Endler 1982), phenotypic plasticity (Trexler and Travis 1990), adaptations to extreme environments (Tobler ; Tobler ), sex-chromosome evolution (Lamatsch ), sex determination systems (Volff and Schartl 2001), sexual selection (Basolo 1990), genital evolution (Langerhans ), biological invasions (Sakai ) and social behavior (Constanz 1975; Farr 1980). Since the early 1900s, the mosquitofish has rapidly expanded from its native range in North America and is now considered one of the most widespread freshwater fish in the world (Pyke 2008). It was purposely introduced into every continent besides Antarctica (Krumholz 1948; Lever 1996) to control mosquito populations during malaria and yellow fever outbreaks (Howard 1920; Howard 1910) (although it is not especially effective for this purpose; Lloyd ), and has led to declines in native insect, fish, and amphibian populations in its introduced range (Pyke 2008). The creation of a high-quality genome assembly for G. affinis will provide an important resource to study the genetic basis of traits commonly studied in Poeciliidae (described above). Because G. affinis is highly invasive, a high-quality genome will also allow us to understand the genetic basis of traits that significantly differ between invasive and non-invasive species and can help us to understand how invasiveness evolves. Currently, few members of the Poeciliidae family have sequenced genomes, including platyfish and swordtails in the genus Xiphophorus (Schartl ; Shen ), as well as the guppy, Poecilia reticulata (Fraser ). A sequenced genome in the Gambusia genus will provide a sequenced genome in the third major clade in this family and facilitate genomic and phylogenetic comparisons between Xiphophorus, P. reticulata, and G. affinis, and could be used as a reference genome for comparative genomic studies in diverse fields. We created a high-quality genome assembly of G. affinis using Illumina sequencing of both traditional paired-end libraries and an in vitro proximity-ligation Chicago library (Putnam ). The Chicago library method in conjunction with the HiRise software pipeline is designed to bridge gaps in alignment due to repetitive sequences (Putnam ) and thereby increase assembly contiguity by up to twenty-fold, despite only using short read data from the Illumina platform. After the creation of a high-quality genome, we annotated this genome using genic sequences from X. maculatus and P. reticulata, and compared the gene and repetitive element content and quality of the G. affinis genome and assembly to related fish species.

Methods

Library preparation and de novo shotgun assembly

We constructed our de novo genome assembly for G. affinis using a short insert library of DNA (Fig S1). We collected a single male fish from the Zuibaiji River in Japan (located at the GPS coordinates: 33.59111, 130.25444) in 2010 and stored this fish in 70% ethanol until we extracted DNA from the muscle tissue using a phenol-chloroform extraction. DNA was stored in TE until library preparation. A male fish was used because males possess homomorphic sex chromosomes (Volff and Schartl 2001). We sheared the genomic DNA using the Covaris S2 (Covaris, Woburn, MA, USA) targeting a 600bp average fragment size. The sheared DNA was end-repaired, adenylated, and ligated to TruSeq LT adapters using a TruSeq DNA PCR-Free Library Preparation Kit (Illumina, San Diego, CA, USA). We purified the ligation reaction using a Qiaquick Gel Extraction Kit (Qiagen, Venlo, The Netherlands) from a 2% agarose gel. We sequenced the library on an Illumina HiSeq 2500 at the University of California at Los Angeles Genotyping and Sequencing Core to obtain paired-end (PE) ∼100 bp reads. Reads were sent to Dovetail Genomics (Santa Cruz, CA, USA) for construction of a draft genome assembly using Meraculous v. 2.0 (Chapman ). We used default parameters when available with a kmer size of 31bp for de novo assembly, and ignored kmers with a depth less than 3. We compared the kmer depth on trimmed reads using Meraculous’ mercounting and untrimmed reads using Dovetail’s mercounting software, which fully tallies very high copy counts. The bubble_depth was set to 20 because that was near a local minimum average depth of kmer coverage (21; Fig S2) on bubbletigs (unique-extension contigs extended by popping “bubbles” caused by SNPs). Reads were trimmed for quality and sequencing adapters using Trimmomatic (Bolger ). This produced a 594.6 Mbp assembly, with scaffold N50 of 31 kbp and contig N50 of 13.9 kbp.

Chicago library prep and scaffolding the draft genome

To improve the de novo assembly, we created a Chicago library (Putnam ) at Dovetail Genomics. This required extracting DNA from another sample to obtain higher molecular weight DNA. Briefly, ≥ 0.5 μg of high molecular weight genomic DNA was extracted from muscle of a male G. affinis from the Platt River in Nebraska (located at the GPS coordinates: 40.8379, -96.7072) as in Allen with modifications made to the extraction buffer, incubation time, and temperature by Dovetail so that the mean DNA fragment size was ∼50 kbp. The DNA was further purified using a Qiagen Genomic column. Chromatin was reconstituted in vitro onto naked DNA, and fixed with formaldehyde. Fixed chromatin was digested with DpnII, resulting 5′ overhangs were filled in with biotinylated nucleotides, and free blunt ends were ligated together. After ligation, crosslinks were reversed and DNA was purified from protein. Biotin that was not internal to ligated fragments was removed from DNA, DNA was sheared to a mean fragment size of ∼350 bp, and sequencing libraries were generated using NEBNext Ultra enzymes (New England Biolabs, Ipswich, MA, USA) and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of the library. The Chicago library was sequenced on an Illumina HiSeq 2500 at Dovetail Genomics to obtain PE∼100 bp reads. The G. affinis draft genome in FASTA format, shotgun sequences, and Chicago library sequence (57M read pairs; PE∼125 bp) in FASTQ format were used as input data for HiRise (Putnam ). HiRise is a software pipeline designed specifically for using Chicago library sequence data to assemble genomes. We aligned the shotgun data and Chicago library sequences to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). Shotgun data were used to detect regions of the assembly with abnormally high coverage, which were omitted when scoring joins and breaks. We analyzed the separations of Chicago read pairs mapped within draft scaffolds by HiRise to produce a likelihood model, and used the resulting likelihood model to identify putative misjoins and score prospective joins. Then we used scaffolding and shotgun sequences to close gaps between contigs that were not in the same scaffold in the draft input assembly in Meraculous’s gap-closing “marauder” component. In this way, HiRise uses the Meraculous feature to close the gaps it creates when making joins.

Gene prediction and annotation

We used the MAKER genome annotation pipeline (Campbell ) to identify the locations of G. affinis genes. MAKER combines several classes of data, including RNAseq data or proteins from closely related species, to generate ab initio gene predictions. The MAKER pipeline consisted of the following steps: 1) RNAseq and protein sequences from X. maculatus (Schartl ) and P. reticulata (Fraser ) were used for the initial annotations; 2) the initial annotations were used to train SNAP gene prediction tool (Korf 2004) multiple times; and 3) the final set of gene annotations were generated from the trained ab initio SNAP predictions (Files S1-S5). To assess the quality of our G. affinis gene annotations, we used BLAST to compare the number of gene annotations in G. affinis to those in X. maculatus, P. reticulata, and Oryzias latipes (medaka; Kasahara ) by setting an e-value cutoff of 10−10 (File S6). These species were chosen because X. maculatus and P. reticulata are Poeciliidae with fully sequenced genomes and O. latipes has a high-quality genome sequence that is often used for comparisons in fish species. To functionally annotate G. affinis genes, we identified the best homologs from the UniProt/Swiss-Prot protein database (Pundir ) using BLASTP with an e-value cutoff of 10−20. In this way, putative functions were assigned to gene annotations. To assess the quality and completeness of annotations in the G. affinis genome in an evolutionary context, we ran Benchmarking Universal Single-Copy Orthologs (BUSCO) v 2.0.1 (Simão ; File S7). BUSCO uses a set of genes from major lineages (e.g., 66 species of fish with a sequenced genome, actinopterygii_odb9; Malmstrøm ) that are orthologous groups with genes present as single-copy orthologs in at least 90% of the species in the group. Therefore, we quantitatively checked for expected gene content while allowing for rare gene duplications or losses. The BUSCO pipeline incorporates AUGUSTUS v3.2.2 (Stanke and Morgenstern 2005), BLAST+ v2.6.0 (Camacho ), and HMMER v3.1b2 (Finn ).

Noncoding RNA prediction

Transfer RNAs in the G. affinis genome were predicted using tRNAscan-SE 2.0 (Lowe and Chan 2016; File S8). The training set used for training the covariance model employed by tRNAscan-SE 2.0 was comprised of eukaryotic tRNAs. Ten of the predicted tRNAs decoding for amino acids were selected randomly and their sequences were searched against databases of tRNAs, GtRNAdb (Chan and Lowe 2016) and tRNAdb (Juhling ). Lastly, we compared the predicted classes of tRNAs in the G. affinis genome with tRNAs reported in the genomes of X. maculatus (Schartl ), P. reticulata (Künstner ), and O. latipes (Chan and Lowe 2016). Homology-based prediction was used to detect rRNAs (ribosomal RNA), snRNAs (small nuclear RNA), snoRNAs (small nucleolar RNA) and miRNAs (microRNAs) in the G. affinis genome (File S8). ncRNAs from O. latipes, X. maculatus, G. aculeatus, and D. rerio were downloaded from Ensemble (http://useast.ensembl.org/info/data/ftp/index.html) to create separate multispecies ncRNA databases for the rRNAs, snRNAs, snoRNAs and miRNAs. The following versions of fish databases were downloaded from Ensemble: BROAD S1 (Gasterosteus aculeatus), Xipmac4.4.2 (Xiphophorus maculatus), HdrR (Oryzias latipes) and GRCz10 (Danio rerio). These databases were used as queries by BLASTN to predict homologous rRNAs, snRNAs, snoRNAs and miRNAs in the G. affinis genome and the duplicates were removed from the output files. An e-value cutoff of less than 10−5 was employed to filter out significant hits. miRNAs were identified using the RNAfold program of the Vienna RNA package (v2.4.3) of MiRscan (http://genes.mit.edu/mirscan). miRNA sequences corresponding to structures having a minimum free energy of < -20kcal/mol were retained in the final output.

Transposable elements

We compared the proportion and composition of transposable elements (TEs) in X. maculatus, P. reticulata, O. latipes, and G. affinis genomes. To determine the repeat diversity in the assembly, we used RepeatModeler (Smit and Hubley 2008–2015) with default parameters to identify and build a library containing transposable elements, simple repeats and low complexity regions (this library contained 737 consensus sequences). We also classified Miniature Inverted Repeat Transposable Elements (MITEs), which are not found with RepeatModeler, using MITE-Hunter (Han and Wessler 2010). As MITEs are non-autonomous sequences that lack protein-coding regions and distinctive features such as poly A tracts, MITE-Hunter may detect false positives. To avoid false positives, we used various MITE-specific criteria: 1) Terminal Inverted Repeats (TIRs) and Target Site Duplications (TSDs) identified by multiple sequence alignment (MSA) while flanking regions were divergent; 2) the high repetition of MITEs in their host genome; 3) and the identification of associated autonomous DNA transposons. Thus, we performed the following analyses for each consensus sequence: TSD identification in the MSA file, use of BlastN2 and RNAfold (Lorenz ) to help identify the TIRs and TSDs, family identification using CENSOR (Kohany ), and copy number estimation in the assembly via BLAST analysis. We used RepeatMasker 4.0.0 (Smit, AFA, Hubley, R & Green, P. RepeatMasker Open-4.0) with the -lib option to specify the G. affinis-specific library to estimate the number of copies of each class and transposable element family as well as the coverage in the assembly.

Data availability

Raw reads have been deposited in the NCBI Sequence Read Archive (SRR5601730 for the Nebraska fish/HiRise assembly, and SRR5601729 for the Japanese fish/Meraculous assembly). This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession NHOQ00000000. The version described in this paper is version NHOQ01000000. The genome sequence, annotations, and aligned reads (in BAM format) are available at gambusia.genetics.uga.edu. Figure S1 compares the size distribution of library inserts in the Meraculous and HiRise assemblies. Figure S2 shows the frequency of kmers at each kmer length. Figure S3 shows the distribution of scaffold lengths in the HiRise assembly. Figure S4 shows the cumulative percent of the assembly for a given scaffold size in the Meraculous and HiRise assemblies. Table S1 presents a detailed list of the number of copies and percent of the assembly of transposons and repeatable elements. Files S1-S4 contain the MAKER submission script, executable file (maker_exe.ctl), specifications for downstream filtering of BLAST and Exonerate alignments (maker_bopts.ctl), and primary configuration of MAKER specific options (maker_opts.ctl), respectively. File S5 contains the commands for training SNAP. File S6 contains the submission script for BLAST comparing Gambusia with related fish. File S7 contains the submission script for BUSCO. File S8 contains the submission script for predicting ncRNAs. File S9 contains the Illumina reads aligned to the reference in BAM format. The sequence and structure of tRNAs can be found in Files S10 and S11, respectively. File S12 contains the rRNA, snRNA, snoRNA, and miRNA sequences. Supplemental material available at Figshare: https://doi.org/10.25387/g3.6157706.

Results

Assembly

We sequenced the whole genome of the mosquitofish, Gambusia affinis, using one male fish from the invasive range in Japan for the initial shotgun sequencing and a second male fish from the invasive range in Nebraska, USA for the HiRise sequencing. We produced a 598.7 Mb genome assembly with 54.9× coverage on average (File S9). Using kmer analysis, we estimate the size of the genome to be 683 Mbp with Meraculous’ mercounter and 759 Mbp with Dovetail’s mercounter, with 18.5% of the genome repetitive. The N50 contig size was 17.6 kb and scaffold size was 6.65 Mb (Table 1; Figs. S1 and S3). This was a large improvement over the initial shotgun assembly, which had 24× coverage and contig and scaffold sizes of 13.9 kb and 31 kb, respectively (Table 1; Fig S4). In addition to the increase in scaffold size, we also had a large increase in contiguity, with the number of scaffolds above the median length decreasing from 5,240 in the Meraculous assembly to 26 in the HiRise assembly (Table 1). The overall number of scaffolds decreased from 38,526 to 2,943 (Table 1).
Table 1

Quality statistics of initial shotgun sequencing assembled by Meraculous and final assembly by HiRise

Meraculous AssemblyDovetail HiRise Assembly
Total length594.6 Mb598.7 Mb
Scaffold N5031 kb6.65 Mb
Scaffold N907 kb914 kb
Scaffold L505,240 scaffolds26 scaffolds
Scaffold L9020,613 scaffolds117 scaffolds
Longest scaffold324,44424,339,338
Number of scaffolds38,5262,943
Number of scaffolds >1 kb38,5192,940
Contig N5013.9 kb17.6 kb
Contig N903.56 kb4.23 kb
Contig L5012,100 contigs9,490 contigs
Contig L9044,284 contigs35,674 contigs
Number of gaps >= 100 bpa18,14540,532
Percent of genome in gaps0.972%1.34%

HiRise arbitrarily sizes gaps to 100 Ns.

HiRise arbitrarily sizes gaps to 100 Ns. The final annotation set of the G. affinis genome from the MAKER annotation pipeline contained 21,163 predicted genes (Table 2), fewer than closely related species (P. reticulata, X. maculatus and O. latipes). BLASTP analyses revealed 20,511 (97%), 19,904 (94%) and 18,880 (89%) of predicted G. affinis genes had significant hits to P. reticulata, X. maculatus and O. latipes respectively. Average gene, exon, and intron lengths are shorter in G. affinis when compared to closely related organisms, but average coding sequence length and the number of exons per gene are similar (Table 2). A total of 17,565 gene annotations were assigned putative functions through BLASTP analyses.
Table 2

Comparison of genes predicted in Gambusia affinis from BLAST to genome annotations for Poecilia reticulate (guppy), Xiphophorus maculatus (platyfish), and Oryzias latipes (medaka) from NCBI ‬

G. affinisP. reticulataaX. maculatusbO. latipesc
Number of protein-encoding genes21,14422,98222,08222,658
Mean gene length (bp)13,51018,44115,70216,221
Mean CDS length (bp)1,8272,1751,7141,893
# of G. affinis BLASTP Hits20,51119,90418,880
Number of exons236,097276,363227,016258,916
Mean exon length (bp)164267189260
Mean number of exons per gene11.212.910.611.0
Number of introns214,953248,065205,251230,293
Mean intron length (bp)1,1512,0001,5001,726

http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Poecilia_reticulata/100/

http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Xiphophorus_maculatus/101/

http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Oryzias_latipes/101/

http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Poecilia_reticulata/100/ http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Xiphophorus_maculatus/101/ http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Oryzias_latipes/101/ In 66 sequenced fish genomes, 4584 genes are found as single copy in at least 90% of these species. In the G. affinis genome, 95.5% (4379) of these 4584 genes had “complete” orthologs, defined as genes that scored within the expected range and were within the expected length. Of these, 93.4% of the total were found in single copy and 2.1% were duplicated. About 2.6% (120) of genes were “fragmented”, meaning that there was a significant match to a gene within the G. affinis genome, but the length was outside of two standard deviations of the BUSCO group mean length, either because the gene is only partially present or indicating a problem with the genome assembly. The last 1.9% (85 genes) had no significant matches, indicating that the ortholog is missing or highly divergent, the gene prediction failed, or those genes are incorrectly assembled.

ncRNA prediction

1769 tRNAs were detected by tRNAscan-SE 2.0 in total (see Files S10 and S11 for the sequence and structure of tRNAs), out of which 260 were found to decode for amino acids, including a single tRNA which decodes for selenocysteine. 22 tRNAs had undetermined isotypes. Related species had more tRNAs decoding amino acids and tRNAs with undetermined isotypes (Table 3). 1453 tRNAs were detected as pseudogenes with poor primary/secondary structures, more than in O. latipes but fewer than in P. reticulata. These were found to have a low Infernal as well as Isotype score in the predicted output from tRNAscan-SE 2.0. Thirty-four tRNAs were chimeric with mismatched isotypes, meaning they have specific identity elements in their bodies which are recognized by specific tRNA synthetases, but they code for mRNAs corresponding to different amino acids due to point mutations in their anticodon sequence. Hence, there exists a disagreement in their functional classification, with predicted isotype based on the anticodon sequence and another predicted by the isotype-specific covariance model (Lowe and Chan 2016).
Table 3

The number of tRNAs predicted in the Gambusia affinis genome compared to Xiphophorus maculatus (platyfish), Poecilia reticulata (guppy), and Oryzias latipes (medaka)

G. affinisP. reticulataX. maculatusO. latipes
tRNAs decoding standard 20 AA260439535726
Selenocysteine tRNAs134
Possible suppressor tRNAs012
tRNAs with undetermined or unknown isotypes2265603
Predicted pseudogenes14534186497
284 tRNAs had introns, out of which 257 were predicted to be pseudogenes, two were chimeras, and 25 decoded for the twenty standard amino acids. No suppressor tRNAs were found in the analysis. The subset of predicted tRNAs decoding for amino acids were also predicted in a large number of other species in both GtRNAdb and tRNAdb. 665 miRNAs, 4 rRNAs, 50 snRNAs, and 164 snoRNAs were predicted in the G. affinis genome (Table 4; File S12). Compared to other fish with sequenced genomes, including X. maculatus, O. latipes, G. aculeatus, and D. rerio, G. affinis had the highest number of miRNAs predicted, but fewest rRNAs, snRNAs, and snoRNAs.
Table 4

The number of noncoding RNAs predicted in Gambusia affinis compared to Xiphophorus maculatus (platyfish), Oryzias latipes (medaka), Gasterosteus aculeatus (stickleback), and Danio rerio (zebrafish)

G. affinisX. maculatusO. latipesG. aculeatusD. rerio
miRNA665342366504440
rRNA46574161579
snRNA50763661287
snoRNA164225297305
MITE-hunter found 170 consensus elements and, of these, 102 consensus elements were from 24 families and the other 68 were from singlet families. After further sequence analyses, 35 sequences were added in the repeat library. Twenty of these 205 total repetitive sequences were found to be very conserved relative to the X. maculatus genome (full length sequence, >90% of identity). Non-genic repeats accounted for ∼20% of the assembly with the great majority (∼17.7%) coming from TEs (Table 5, Table S1). Among TEs, DNA transposons are the most abundant class, with the TcMariner and hAT families particularly abundant. The G. affinis assembly is less repetitive than other sequenced poeciliid genomes (from the Xiphophorus genus, Shen ), primarily due to higher contents of TcMariner and hAT families in other fish genomes.
Table 5

Number and percent of transposons and other repeats in the Gambusia affinis genome

ClassificationNumber of copiesPercentage of assembly
DNA Transposons318,3319.361
LTR Retrotransposons12,6020.379
LINE Retrotransposons50,0481.401
SINE Retrotransposons16,6090.427
Unknown198,5646.23
Total transposable elementsa596,15417.799
Low complexity regionsb33,0730.255
Satellitesc4,9140.229
Microsatellites219,9651.431
Total854,10619.714

Includes DNA transposons, LTR, LINE, SINE retrotransposons and unknown.

Regions composed of a single or two nucleotides, e.g.: A-rich, GA-rich, C-rich.

Duplications of complex sequences 100-200 bp long.

Includes DNA transposons, LTR, LINE, SINE retrotransposons and unknown. Regions composed of a single or two nucleotides, e.g.: A-rich, GA-rich, C-rich. Duplications of complex sequences 100-200 bp long.

Discussion

We sequenced and assembled the genome of the mosquitofish, Gambusia affinis, using short read Illumina data from paired-end and in vitro proximity-ligation Chicago libraries. The resulting genome assembly had high coverage, improved contigs, and long scaffold sizes compared to other assemblies that used Illumina mate-paired libraries (Poecilia reticulata; Künstner ), assemblies that utilized Roche 454 long insert sequencing (Xiphophorus maculatus; Schartl ), or multiple types of sequencing reads, including PacBio reads (Pootakham ). The Chicago library improves the scaffold contiguity because it provides links between genomic regions hundreds of kb apart and uses information about proximity ligation libraries to obtain a highly continuous genome assembly (Putnam ). The result is a high-quality genome sequence composed of 26 (N50) scaffolds, just more than the haploid number of chromosomes (n = 24; Chen and Ebeling 1968). Our genome assembly was 598.7 Mbp, slightly shorter than the Meraculous kmer estimate of 683 Mbp and the Dovetail kmer estimate of 759 Mbp. The kmer estimates differ because the Dovetail kmercounter used untrimmed reads, and therefore had deeper coverage and better discrimination of homozygotes and heterozygotes, and fully counted the repeats without a maximum copy count. Previous estimates of n = 0.74 to 0.76 pg (724 to 743 Mbp) from white blood cells from 50 native fish, averaged between males and females (Tiersch ) and n = 0.695 to 0.855 pg (680 to 836 Mbp) from blood of two invasive fish, where the sex was not recorded (Jianxun ), have been reported using flow cytomery. G. affinis has dimorphic (WZ) sex chromosomes, where males are homomorphic ZZ and females are heteromorphic ZW, and the W chromosome is the single largest metacentric chromosome and Zs are the smallest acrocentric chromosomes (Black and Howell 1979; Chen and Ebeling 1968). Therefore, we expect a smaller assembly size for our male fish than flow cytometry estimates that average both females and males (Jianxun ; Tiersch ). We found that approximately 90% or more of the genes in G. affinis had hits in the Poeciliid family and in other fish species. Similarly, the majority of genes in the BUSCO gene set were detected in single copy in the G. affinis genome, indicating that the G. affinis genome was largely complete. The number of TEs reported here comprise slightly less of the G. affinis genome than other Poeciliidae genomes, which average ∼21% TEs (Shen ), but this difference is well within the range of expected variation among species and accuracy of the estimated scaffolds, especially when considering the variance in approaches used for scaffolding. Although we find fewer tRNAs, rRNAs, snRNAs and snoRNAs than in related species, we have high confidence in the predicted ncRNAs we report because we used conservative cutoffs to reduce false positives. We find similar relative abundances of each type of ncRNA as in related fish species. G. affinis is a model organism in diverse fields of ecology and evolution, such as life-history evolution (Haynes and Cashner 1995), behavior (Cote ), and biological invasions (Rehage ). The genome assembly and annotations we have created will be a useful resource for those interested in mapping a genetic architecture to traits of interest in this species. In addition, this genome serves as a resource in comparative genomics among Poecilids and teleosts.
  31 in total

1.  Variability of genetic sex determination in poeciliid fishes.

Authors:  J N Volff; M Schartl
Journal:  Genetica       Date:  2001       Impact factor: 1.082

2.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

3.  THE IMPACT OF PREDATION ON LIFE HISTORY EVOLUTION IN TRINIDADIAN GUPPIES (POECILIA RETICULATA).

Authors:  David Reznick; John A Endler
Journal:  Evolution       Date:  1982-01       Impact factor: 3.694

4.  A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide.

Authors:  G C Allen; M A Flores-Vergara; S Krasynanski; S Kumar; W F Thompson
Journal:  Nat Protoc       Date:  2006       Impact factor: 13.491

5.  MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences.

Authors:  Yujun Han; Susan R Wessler
Journal:  Nucleic Acids Res       Date:  2010-09-29       Impact factor: 16.971

6.  ViennaRNA Package 2.0.

Authors:  Ronny Lorenz; Stephan H Bernhart; Christian Höner Zu Siederdissen; Hakim Tafer; Christoph Flamm; Peter F Stadler; Ivo L Hofacker
Journal:  Algorithms Mol Biol       Date:  2011-11-24       Impact factor: 1.405

7.  Chromosome-scale shotgun assembly using an in vitro method for long-range linkage.

Authors:  Nicholas H Putnam; Brendan L O'Connell; Jonathan C Stites; Brandon J Rice; Marco Blanchette; Robert Calef; Christopher J Troll; Andrew Fields; Paul D Hartley; Charles W Sugnet; David Haussler; Daniel S Rokhsar; Richard E Green
Journal:  Genome Res       Date:  2016-02-04       Impact factor: 9.043

8.  tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes.

Authors:  Todd M Lowe; Patricia P Chan
Journal:  Nucleic Acids Res       Date:  2016-05-12       Impact factor: 16.971

9.  Whole genome sequencing data and de novo draft assemblies for 66 teleost species.

Authors:  Martin Malmstrøm; Michael Matschiner; Ole K Tørresen; Kjetill S Jakobsen; Sissel Jentoft
Journal:  Sci Data       Date:  2017-01-17       Impact factor: 6.444

10.  tRNAdb 2009: compilation of tRNA sequences and tRNA genes.

Authors:  Frank Jühling; Mario Mörl; Roland K Hartmann; Mathias Sprinzl; Peter F Stadler; Joern Pütz
Journal:  Nucleic Acids Res       Date:  2008-10-28       Impact factor: 16.971

View more
  6 in total

1.  Independent Origin of XY and ZW Sex Determination Mechanisms in Mosquitofish Sister Species.

Authors:  Verena A Kottler; Romain Feron; Indrajit Nanda; Christophe Klopp; Kang Du; Susanne Kneitz; Frederik Helmprobst; Dunja K Lamatsch; Céline Lopez-Roques; Jerôme Lluch; Laurent Journot; Hugues Parrinello; Yann Guiguen; Manfred Schartl
Journal:  Genetics       Date:  2019-11-08       Impact factor: 4.562

2.  Adapterama III: Quadruple-indexed, double/triple-enzyme RADseq libraries (2RAD/3RAD).

Authors:  Natalia J Bayona-Vásquez; Travis C Glenn; Troy J Kieran; Todd W Pierson; Sandra L Hoffberg; Peter A Scott; Kerin E Bentley; John W Finger; Swarnali Louha; Nicholas Troendle; Pindaro Diaz-Jaimes; Rodney Mauricio; Brant C Faircloth
Journal:  PeerJ       Date:  2019-10-11       Impact factor: 2.984

3.  The Genomes of the Livebearing Fish Species Poeciliopsis retropinna and Poeciliopsis turrubarensis Reflect Their Different Reproductive Strategies.

Authors:  Henri van Kruistum; Michael W Guernsey; Julie C Baker; Susan L Kloet; Martien A M Groenen; Bart J A Pollux; Hendrik-Jan Megens
Journal:  Mol Biol Evol       Date:  2020-05-01       Impact factor: 16.240

4.  A First Insight into a Draft Genome of Silver Sillago (Sillago sihama) via Genome Survey Sequencing.

Authors:  Zhiyuan Li; Changxu Tian; Yang Huang; Xinghua Lin; Yaorong Wang; Dongneng Jiang; Chunhua Zhu; Huapu Chen; Guangli Li
Journal:  Animals (Basel)       Date:  2019-10-01       Impact factor: 2.752

5.  Chromosome-level genome assembly of the female western mosquitofish (Gambusia affinis).

Authors:  Feng Shao; Arne Ludwig; Yang Mao; Ni Liu; Zuogang Peng
Journal:  Gigascience       Date:  2020-08-01       Impact factor: 6.524

6.  Phylogenetic analysis and ontogenetic changes in the cone opsins of the western mosquitofish (Gambusia affinis).

Authors:  Chia-Hao Chang; Yu-Chun Wang; Yi Ta Shao; Shih-Hui Liu
Journal:  PLoS One       Date:  2020-10-13       Impact factor: 3.240

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.