Sandra L Hoffberg, Nicholas J Troendle, Travis C Glenn, Ousman Mahmud, Swarnali Louha, Domitille Chalopin, Jeffrey L Bennetzen, Rodney Mauricio.
Abstract
The western mosquitofish, Gambusia affinis, is a freshwater poecilid fish native to the southeastern United States but with a global distribution due to widespread human introduction. Gambusia affinis has been used as a model species for a broad range of evolutionary and ecological studies. We sequenced the genome of a male G. affinis to facilitate genetic studies in diverse fields including invasion biology and comparative genetics. We generated Illumina short read data from paired-end libraries and in vitro proximity-ligation libraries. We obtained 54.9× coverage, N50 contig length of 17.6 kb, and N50 scaffold length of 6.65 Mb. Compared to two other species in the Poeciliidae family, G. affinis has slightly fewer genes, and those genes have shorter total, exon, and intron lengths on average. Using a set of universal single-copy orthologs in fish genomes, we found 95.5% of these genes were complete in the G. affinis assembly. The number of transposable elements in the G. affinis assembly is similar to that of closely related species. The high-quality genome sequence and annotations we report will be valuable resources for scientists to map the genetic architecture of traits of interest in this species.
Keywords: Dovetail Genomics; Gambusia affinis; Poecilid; de novo assembly; whole genome sequencing
Year: 2018 PMID: 29703783 PMCID: PMC5982815 DOI: 10.1534/g3.118.200101
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Quality statistics of initial shotgun sequencing assembled by Meraculous and final assembly by HiRise
| | Meraculous Assembly | Dovetail HiRise Assembly |
|---|---|---|
| Total length | 594.6 Mb | 598.7 Mb |
| Scaffold N50 | 31 kb | 6.65 Mb |
| Scaffold N90 | 7 kb | 914 kb |
| Scaffold L50 | 5,240 scaffolds | 26 scaffolds |
| Scaffold L90 | 20,613 scaffolds | 117 scaffolds |
| Longest scaffold | 324,444 bp | 24,339,338 bp |
| Number of scaffolds | 38,526 | 2,943 |
| Number of scaffolds >1 kb | 38,519 | 2,940 |
| Contig N50 | 13.9 kb | 17.6 kb |
| Contig N90 | 3.56 kb | 4.23 kb |
| Contig L50 | 12,100 contigs | 9,490 contigs |
| Contig L90 | 44,284 contigs | 35,674 contigs |
| Number of gaps >= 100 bp | 18,145 | 40,532 |
| Percent of genome in gaps | 0.972% | 1.34% |
HiRise arbitrarily sizes gaps to 100 Ns.
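The N50/N90 and L50/L90 rows above follow the usual assembly-statistics definitions. As a minimal sketch of how such values are derived from a list of scaffold or contig lengths, the function below sorts lengths from longest to shortest and walks down the list until the requested fraction of the total length is covered. The function name `nx_lx` and the toy lengths are hypothetical and are not taken from the paper; the authors' own statistics came from their assembly tools.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a collection of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover `fraction` of the total length.
    Lx is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count

    # Floating-point safety net: fall back to the full set.
    return ordered[-1], len(ordered)


# Hypothetical toy assembly (lengths in arbitrary units), total = 300.
toy_scaffolds = [100, 80, 50, 30, 20, 10, 10]

n50, l50 = nx_lx(toy_scaffolds, 0.5)
n90, l90 = nx_lx(toy_scaffolds, 0.9)
print(f"N50 = {n50}, L50 = {l50}")
print(f"N90 = {n90}, L90 = {l90}")
```

Read this way, the table shows that HiRise scaffolding raised the scaffold N50 from 31 kb to 6.65 Mb while cutting the L50 from 5,240 scaffolds to 26. Half of the assembly now sits in far fewer, far longer scaffolds.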
Comparison of genes predicted in Gambusia affinis from BLAST to genome annotations for Poecilia reticulata (guppy), Xiphophorus maculatus (platyfish), and Oryzias latipes (medaka) from NCBI
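| | Gambusia affinis | Poecilia reticulata (guppy) | Xiphophorus maculatus (platyfish) | Oryzias latipes (medaka) |
|---|---|---|---|---|
| Number of protein-encoding genes | 21,144 | 22,982 | 22,082 | 22,658 |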
| Mean gene length (bp) | 13,510 | 18,441 | 15,702 | 16,221 |
| Mean CDS length (bp) | 1,827 | 2,175 | 1,714 | 1,893 |
| # of | – | 20,511 | 19,904 | 18,880 |
| Number of exons | 236,097 | 276,363 | 227,016 | 258,916 |
| Mean exon length (bp) | 164 | 267 | 189 | 260 |
| Mean number of exons per gene | 11.2 | 12.9 | 10.6 | 11.0 |
| Number of introns | 214,953 | 248,065 | 205,251 | 230,293 |
| Mean intron length (bp) | 1,151 | 2,000 | 1,500 | 1,726 |
http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Poecilia_reticulata/100/
http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Xiphophorus_maculatus/101/
http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Oryzias_latipes/101/
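The derived rows in the gene table can be cross-checked with simple arithmetic. The sketch below uses the first data column, which the caption order and the abstract suggest is G. affinis (an assumption, since the extracted table does not label its columns). It also assumes one transcript model per gene, in which case a gene with k exons contributes k - 1 introns; the variable names are illustrative only.

```python
# Values copied from the first data column of the gene table
# (assumed here to be Gambusia affinis).
genes = 21_144
exons = 236_097
introns = 214_953

# Mean exons per gene, to compare with the "Mean number of exons per gene" row.
exons_per_gene = exons / genes

# Assumption: one transcript model per gene, so introns = exons - genes.
expected_introns = exons - genes

print(f"Mean exons per gene: {exons_per_gene:.1f}")
print(f"Introns expected from exons - genes: {expected_introns:,}")
print(f"Matches reported intron count: {expected_introns == introns}")
```

For this column, the reported intron count equals exons minus genes exactly. The same identity does not hold for the other three columns, whose counts come from NCBI annotations.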
The number of tRNAs predicted in the Gambusia affinis genome compared to Xiphophorus maculatus (platyfish), Poecilia reticulata (guppy), and Oryzias latipes (medaka)
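| | Gambusia affinis | Xiphophorus maculatus (platyfish) | Poecilia reticulata (guppy) | Oryzias latipes (medaka) |
|---|---|---|---|---|
| tRNAs decoding standard 20 AA | 260 | 439 | 535 | 726 |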
| Selenocysteine tRNAs | 1 | 3 | – | 4 |
| Possible suppressor tRNAs | 0 | 1 | – | 2 |
| tRNAs with undetermined or unknown isotypes | 22 | 65 | – | 603 |
| Predicted pseudogenes | 1,453 | 4,186 | – | 497 |
The number of noncoding RNAs predicted in Gambusia affinis compared to Xiphophorus maculatus (platyfish), Oryzias latipes (medaka), Gasterosteus aculeatus (stickleback), and Danio rerio (zebrafish)
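| | Gambusia affinis | Xiphophorus maculatus (platyfish) | Oryzias latipes (medaka) | Gasterosteus aculeatus (stickleback) | Danio rerio (zebrafish) |
|---|---|---|---|---|---|
| miRNA | 665 | 342 | 366 | 504 | 440 |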
| rRNA | 4 | 6 | 57 | 416 | 1579 |
| snRNA | 50 | – | 76 | 366 | 1287 |
| snoRNA | 164 | – | 225 | 297 | 305 |
Number and percent of transposons and other repeats in the Gambusia affinis genome
| Classification | Number of copies | Percentage of assembly |
|---|---|---|
| DNA Transposons | 318,331 | 9.361 |
| LTR Retrotransposons | 12,602 | 0.379 |
| LINE Retrotransposons | 50,048 | 1.401 |
| SINE Retrotransposons | 16,609 | 0.427 |
| Unknown | 198,564 | 6.23 |
| Low complexity regions | 33,073 | 0.255 |
| Satellites | 4,914 | 0.229 |
| Microsatellites | 219,965 | 1.431 |
Includes DNA transposons, LTR, LINE, SINE retrotransposons and unknown.
Low complexity regions: regions composed of one or two nucleotides, e.g., A-rich, GA-rich, C-rich.
Satellites: duplications of complex sequences 100-200 bp long.
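One footnote above describes an aggregate category made up of DNA transposons, LTR, LINE, and SINE retrotransposons, and unknown elements, but the aggregate row itself does not appear in the table as extracted here. As a rough sketch, the five listed classes can be summed directly from the table. The result is a simple sum and may differ from any total the authors report, for example because of rounding or overlapping annotations.

```python
# (number of copies, percentage of assembly), copied from the table above.
interspersed_repeats = {
    "DNA Transposons": (318_331, 9.361),
    "LTR Retrotransposons": (12_602, 0.379),
    "LINE Retrotransposons": (50_048, 1.401),
    "SINE Retrotransposons": (16_609, 0.427),
    "Unknown": (198_564, 6.23),
}

# Simple sums over the five classes named in the footnote.
total_copies = sum(copies for copies, _ in interspersed_repeats.values())
total_percent = sum(percent for _, percent in interspersed_repeats.values())

print(f"Total copies: {total_copies:,}")
print(f"Total percentage of assembly: {total_percent:.3f}")
```

By this sum the five classes account for 596,154 copies, or roughly 17.8% of the assembly, with DNA transposons contributing just over half of that share.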