Lisa K Johnson, Ruta Sahasrabudhe, James Anthony Gill, Jennifer L Roach, Lutz Froenicke, C Titus Brown, Andrew Whitehead.
Abstract
BACKGROUND: Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using long-read Oxford Nanopore Technology (ONT) PromethION and short-read Illumina platforms.
Keywords: Oxford Nanopore; genome assemblies; genomes; killifish; long reads; polish
Year: 2020 PMID: 32556169 PMCID: PMC7301629 DOI: 10.1093/gigascience/giaa067
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Four Fundulus killifish (left to right): the marine diamond killifish, Fundulus xenicus; the freshwater northern studfish, Fundulus catenatus (south central United States); the freshwater bayou topminnow, Fundulus nottii; and the freshwater blackspotted topminnow, Fundulus olivaceus. Drawings used with permission from the artist, Joseph R. Tomelleri.
Figure 2: Field inversion gels with red boxes showing the samples sequenced, in order from left to right: F. catenatus (sheared vs. unsheared), F. olivaceus, F. nottii, and F. xenicus. DNA was extracted from fresh tissues for F. xenicus and F. nottii, and from frozen tissues for F. catenatus and F. olivaceus.
ONT data collected from each species
| Species | Bases called (Gb) | Coverage (×) | Mean read length (bp) | Reads N50 (bp) | Q>5 bases called (Gb) | Q>5 mean read length (bp) | ONT signal accession | ONT FASTQ accession |
|---|---|---|---|---|---|---|---|---|
| | 38.5 | 35.0 | 2,449 | 5,733; n = 1,373,426 | 36.42 | 2,699 | ERR3385273 | ERR3385269 |
| | 40.3 | 36.6 | 1,699 | 3,439; n = 2,687,295 | 34.28 | 2,021 | ERR3385274 | ERR3385270 |
| | 33.4 | 30.4 | 6,480 | 12,995; n = 700,534 | 31.06 | 7,548 | ERR3385275 | ERR3385271 |
| | 50.1 | 45.5 | 4,595 | 11,670; n = 987,921 | 45.97 | 5,365 | ERR3385276 | ERR3385272 |
Coverage assumes that the genome size of each species is 1.1 Gb, as estimated for F. heteroclitus [40]. Untrimmed reads were deposited in the ENA under study PRJEB29136. Reads N50 represent the N50 length of all ONT reads before filtering and assembly, followed by the number (n) of reads constituting 50% of the length of all ONT reads. Data used for subsequent genome assemblies were filtered with a requirement for having a mean Phred quality score >Q5. The remaining bases called and mean read length that are >Q5 are listed.
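The coverage and read N50 definitions in the note above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' pipeline: the function names and the toy read lengths are hypothetical, and the 1.1 Gb genome size is the assumption stated in the note.

```python
def coverage(total_bases: float, genome_size: float = 1.1e9) -> float:
    """Fold coverage = sequenced bases / assumed genome size (1.1 Gb here)."""
    return total_bases / genome_size


def read_n50(lengths):
    """Return (N50 length, number of reads needed to reach 50% of all bases)."""
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no reads supplied")


# Hypothetical read lengths, used only to show the calculation.
toy_lengths = [2, 3, 4, 5, 6, 10]
n50, n_reads = read_n50(toy_lengths)

# 38.5 Gb of called bases over an assumed 1.1 Gb genome.
ont_cov = round(coverage(38.5e9), 1)
```

With the toy lengths, the two longest reads (10 and 6) already hold more than half of the 30 total bases, so the N50 is 6 with n = 2. The same coverage division applied to 38.5 Gb gives the 35.0× listed in the table.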
Illumina data collected from each species (all reads were paired-end 150)
| Species | Platform | Reads (M) | Coverage (×) | FASTQ accessions |
|---|---|---|---|---|
| | Illumina HiSeq | 327.5 | 89.3 | ERR3385278, ERR3385279 |
| | Illumina HiSeq | 316.5 | 86.3 | ERR3385280, ERR3385281 |
| | Illumina HiSeq | 197.0 | 53.7 | ERR3385282, ERR3385283 |
| | Illumina NovaSeq | 601.9 | 164.0 | ERR3385284, ERR3385285 |
Coverage assumes 1.1 Gb genome size measured for F. heteroclitus [40].
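As a rough check of the Illumina coverage figures, the sketch below assumes that the "Reads (M)" column counts read pairs, so that each entry contributes two 150-base reads. That reading is an assumption, not something the table states. It reproduces the three HiSeq rows to one decimal place, while the NovaSeq row comes out at about 164.2×, slightly above the reported 164.0×. The names `paired_end_coverage`, `READ_LENGTH`, and `GENOME_SIZE` are illustrative.

```python
READ_LENGTH = 150      # paired-end 150, as stated for the Illumina data
GENOME_SIZE = 1.1e9    # assumed genome size (F. heteroclitus estimate)


def paired_end_coverage(read_pairs_millions: float) -> float:
    # Assumption: the table's "Reads (M)" value counts read pairs,
    # so every pair contributes 2 * READ_LENGTH bases.
    bases = read_pairs_millions * 1e6 * 2 * READ_LENGTH
    return bases / GENOME_SIZE


# The three HiSeq rows from the table above.
hiseq_cov = [round(paired_end_coverage(m), 1) for m in (327.5, 316.5, 197.0)]
```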
Figure 3: (A) Quality score profiles for representative R1 Illumina reads from F. xenicus, F. catenatus, F. nottii, and F. olivaceus. For Illumina data, Phred quality scores were consistently above Q30 across all reads. Average read quality scores (Q score) vs. read lengths for ONT PromethION from (B) F. xenicus, (C) F. catenatus, (D) F. nottii, and (E) F. olivaceus.
Statistics for Illumina-only assemblies using ABySS (version 2.1.5) for each species
| Species | Bases in the Illumina-only assembly | N contigs | Mean length | Largest contig | N50 | Illumina-only BUSCO C; CS/CD/F/M |
|---|---|---|---|---|---|---|
| | 1,283,257,056 | 5,195,861 | 246.98 | 71,596 | 2,571; n = 107,350 | 57.1%; 56.4/0.7/33.3/9.6 |
| | 1,205,429,912 | 3,989,534 | 302.15 | 70,870 | 3,629; n = 80,839 | 53.8%; 52.8/1.0/36.0/10.2 |
| | 1,167,835,004 | 3,875,693 | 301.32 | 92,540 | 3,740; n = 72,810 | 62.7%; 61.7/1.0/27.4/9.9 |
| | 1,252,948,998 | 4,509,089 | 277.87 | 70,765 | 3,670; n = 77,136 | 65.7%; 64.0/1.7/25.1/9.2 |
The BUSCO Eukaryota database (303 genes) was used to evaluate the completeness of each assembly [46]. BUSCO numbers reported are percentage complete (C) followed by the percentages of complete single-copy (CS), complete duplicated (CD), fragmented (F), and missing (M) out of 303 genes.
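The "C; CS/CD/F/M" notation can be unpacked and sanity-checked with a short script. The sketch below is a minimal illustration, not part of BUSCO itself: `parse_busco` and `is_consistent` are hypothetical helper names, and the tolerance is an assumed allowance for rounding to one decimal place. It relies on two properties implied by the definitions above: complete equals single-copy plus duplicated, and complete, fragmented, and missing together account for all genes.

```python
def parse_busco(summary: str) -> dict:
    """Split a 'C%; CS/CD/F/M' string into named percentages."""
    complete, parts = summary.split(";")
    cs, cd, frag, missing = (float(x) for x in parts.strip().split("/"))
    return {
        "C": float(complete.strip().rstrip("%")),
        "CS": cs,
        "CD": cd,
        "F": frag,
        "M": missing,
    }


def is_consistent(b: dict, tol: float = 0.11) -> bool:
    """Check C == CS + CD and C + F + M == 100, within rounding tolerance."""
    return (
        abs(b["C"] - (b["CS"] + b["CD"])) <= tol
        and abs(b["C"] + b["F"] + b["M"] - 100.0) <= tol
    )


# First row of the Illumina-only table above.
row = parse_busco("57.1%; 56.4/0.7/33.3/9.6")
row_ok = is_consistent(row)
```

For this row, 56.4 + 0.7 gives the 57.1% complete figure, and 57.1 + 33.3 + 9.6 totals 100%, so the check passes.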
ONT PromethION assemblies using the wtdbg2 version 2.3 assembler [47] followed by polishing with pilon version 1.23 [33]
| Species | Contigs | Contig N50 | Assembly size (bases) | BUSCO C; CS/CD/F/M after wtdbg2 (ONT-only) | BUSCO C; CS/CD/F/M after pilon polishing |
|---|---|---|---|---|---|
| | 5,621 | 888,041; n = 325 | 1,075,031,690 | 10.2%; 10.2/0/11.6/78.2 | 90.5%; 87.5/3.0/3.0/6.5 |
| | 5,854 | 436,102; n = 780 | 1,163,592,740 | 28.4%; 28.4/0/24.4/47.2 | 90.4%; 88.4/2.0/2.6/7.0 |
| | 2,242 | 2,701,963; n = 95 | 1,081,276,623 | 11.2%; 11.2/0/22.1/66.7 | 94.4%; 92.1/2.3/1.0/4.6 |
| | 2,622 | 2,669,230; n = 105 | 1,198,526,423 | 23.4%; 23.4/0/25.7/50.9 | 92.1%; 89.8/2.3/1.3/6.6 |
Polishing with pilon markedly improved the complete BUSCO metric, from 10.2% to 28.4% in the ONT-only assemblies to 90.4% to 94.4% after polishing. BUSCO numbers reported are percentage complete (C) followed by the percentages of complete single-copy (CS), complete duplicated (CD), fragmented (F), and missing (M) out of the 303 genes in the BUSCO Eukaryota database [46].