Karine A Martinez-Viaud1, Cindy Taylor Lawley2,3, Milmer Martinez Vergara4, Gil Ben-Zvi5, Tammy Biniashvili5, Kobi Baruch5, Judy St Leger6, Jennie Le1, Aparna Natarajan1, Marlem Rivera1,3, Marbie Guillergan1, Erich Jaeger1, Brian Steffy1, Aleksey Zimin7.
Abstract
High-quality genomes are essential to resolve challenges in breeding, comparative biology, medicine, and conservation planning. New library preparation techniques along with better assembly algorithms result in continued improvements in assemblies for non-model organisms, moving them toward reference-quality genomes. We report on the latest genome assembly of the Atlantic bottlenose dolphin, leveraging Illumina sequencing data coupled with a combination of several library preparation techniques. These include Linked-Reads (Chromium, 10x Genomics), mate pairs (MP), long insert paired ends, and standard paired end. Data were assembled with the commercial DeNovoMAGIC assembly software, resulting in two assemblies, a traditional "haploid" assembly (Tur_tru_Illumina_hap_v1) that is a mosaic of the two parental haplotypes and a phased assembly (Tur_tru_Illumina_phased_v1) where each scaffold has sequence from a single homologous chromosome. We show that Tur_tru_Illumina_hap_v1 is more complete and more accurate compared to the current best reference based on the amount and composition of sequence, the consistency of the MP alignments to the assembled scaffolds, and on the analysis of conserved single-copy mammalian orthologs. The phased de novo assembly Tur_tru_Illumina_phased_v1 is the first publicly available for this species and provides the community with novel and accurate ways to explore the heterozygous nature of the dolphin genome.
Keywords: Tursiops truncatus; de novo genome assembly; 10x Genomics; DeNovoMAGIC; Illumina; bottlenose dolphin
Year: 2019 PMID: 30698692 PMCID: PMC6443575 DOI: 10.1093/gigascience/giy168
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Table 1: Summary of the sequencing data collected to create Tur_tru_Illumina_hap_v1 and Tur_tru_Illumina_phased_v1
| Library type | Read length | Insert size | Genomic coverage |
|---|---|---|---|
| | 2 × 250 bp | 450 bp | 101x |
| | 2 × 160 bp | 800 bp | 123x |
| | 2 × 150 bp | 2–4 Kbp (peak 4.2 Kbp) | 37x |
| | 2 × 150 bp | 5–7 Kbp (peak 6.0 Kbp) | 61x |
| | 2 × 150 bp | 8–10 Kbp (peak 9.9 Kbp) | 58x |
| | 2 × 150 bp | – | 70x |
Table 2: Comparison of quantitative statistics for different assemblies of the bottlenose dolphin
| | Tur_tru v1 | Tur_tru_Illumina_hap_v1 | Tur_tru_Illumina_phased_v1 |
|---|---|---|---|
| | 2,120,283,832 | 2,383,130,043 | 4,678,362,582 |
| | 2,647 | 481 | 98,209 |
| | 96,299,184 | 83,924,496 | 10,429,594 |
| | 23,564,561 | 26,997,441 | 777,432 |
| | 26 | 30 | 1,509 |
| No. | 116,650 | 139,544 | 355,974 |
| | 403,070 | 320,783 | 298,006 |
| | 37,749 | 30,985 | 25,997 |
| | 17,321 | 23,199 | 53,738 |
| | 40.85 | 41.25 | 41.95 |
The total sequence listed excludes Ns (ambiguous nucleotides). Ns were also removed from the scaffolds for N50 computations. We used a genome size of 2,383,130,043 bp, equal to the total amount of sequence in the scaffolds of the larger haploid assembly, to compare the N50 contig and scaffold sizes between the two assemblies. The Tur_tru_Illumina_hap_v1 and Tur_tru v1 assemblies have comparable scaffold N50 sizes, and Tur_tru v1 has bigger contigs. The Tur_tru_Illumina_hap_v1 assembly has more sequence, and our Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis (Table 3) shows that it is likely more complete. The N50 values for the haplotype-resolved Tur_tru_Illumina_phased_v1 assembly are shown for completeness, computed with twice the genome size (2 × 2,383,130,043 = 4,766,260,086 bp).
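The N50 convention described in this note (Ns excluded, and a fixed genome size used in place of each assembly's own total) can be illustrated with a minimal Python sketch. The function name and the toy sequences are hypothetical and are not taken from the authors' pipeline; the sketch only shows the arithmetic.

```python
def n50_with_genome_size(sequences, genome_size):
    """Return (N50, L50) measured against a fixed genome size, ignoring Ns.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # Count only called bases, so runs of N do not inflate sequence lengths.
    lengths = sorted(
        (len(s) - s.upper().count("N") for s in sequences), reverse=True
    )
    target = genome_size / 2
    running = 0
    for rank, length in enumerate(lengths, start=1):
        running += length
        if running >= target:
            # length: N50; rank: number of sequences needed to reach it (L50).
            return length, rank
    return None  # the sequences cover less than half of genome_size


# Toy scaffolds (not real data): N-free lengths are 100, 40, 20, and 2 bp.
scaffolds = ["ACGT" * 25, "ACNNGT" * 10, "ACGT" * 5, "AC"]

print(n50_with_genome_size(scaffolds, 200))
print(n50_with_genome_size(scaffolds, 300))
```

Fixing the genome size makes the N50 values of assemblies with different total lengths directly comparable, which is why the same 2,383,130,043 bp value is applied to both haploid assemblies and doubled for the phased one.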
Table 3: Comparison of BUSCO 3.0.2 Mammalia single-copy orthologs among the three dolphin assemblies
| BUSCOs | Tur_tru v1 | Tur_tru_Illumina_hap_v1 | Tur_tru_Illumina_phased_v1 |
|---|---|---|---|
| Complete | 3,647 | 3,837 | 3,537 |
| Complete single-copy | 3,614 | 3,760 | 1,310 |
| Complete duplicated | 33 | 77 | 2,227 |
| Fragmented | 187 | 107 | 301 |
| Missing | 270 | 160 | 266 |
| Total | 4,104 | 4,104 | 4,104 |
The table shows that the Tur_tru_Illumina_hap_v1 assembly is more complete, with 110 fewer missing single-copy orthologs than the Tur_tru v1 assembly. The Tur_tru_Illumina_hap_v1 assembly has 44 extra duplicated orthologs (77 vs. 33), which possibly points to incomplete filtering of redundant haplotypes. While the Tur_tru v1 assembly has bigger contigs, the Tur_tru_Illumina_hap_v1 assembly has many fewer fragmented BUSCOs. The haplotype-resolved Tur_tru_Illumina_phased_v1 assembly is less contiguous and less complete. As expected, more than half of its complete BUSCOs are duplicated, corresponding to the two resolved haplotypes.
Figure 1: Venn diagram of BUSCOs present in two dolphin assemblies. Out of 4,104 BUSCOs in the Mammalia set, 105 are missing from both assemblies. Our assembly has 165 BUSCOs not present in Tur_tru v1, and Tur_tru v1 has 55 BUSCOs that are not present in our assembly.
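The BUSCO counts in the table and the Venn diagram regions can be cross-checked with a few lines of Python. This is only a consistency sketch: the variable names are illustrative, and it assumes that "present" in the figure means "not missing" (complete or fragmented), which is an inference from the arithmetic rather than something the caption states.

```python
TOTAL_BUSCOS = 4104

# Counts copied from the BUSCO comparison table.
table = {
    "Tur_tru v1": {
        "single": 3614, "duplicated": 33, "fragmented": 187, "missing": 270},
    "Tur_tru_Illumina_hap_v1": {
        "single": 3760, "duplicated": 77, "fragmented": 107, "missing": 160},
    "Tur_tru_Illumina_phased_v1": {
        "single": 1310, "duplicated": 2227, "fragmented": 301, "missing": 266},
}


def summarize(counts):
    """Derive the complete count, the row total, and the duplicated share."""
    complete = counts["single"] + counts["duplicated"]
    total = complete + counts["fragmented"] + counts["missing"]
    return {
        "complete": complete,
        "total": total,
        "duplicated_fraction": counts["duplicated"] / complete,
    }


summary = {name: summarize(counts) for name, counts in table.items()}

# Venn regions from the Figure 1 caption (assumption: present = not missing).
missing_both, only_hap, only_v1 = 105, 165, 55
in_both = TOTAL_BUSCOS - missing_both - only_hap - only_v1

for name, s in summary.items():
    print(f"{name}: complete={s['complete']}, total={s['total']}, "
          f"duplicated={s['duplicated_fraction']:.1%}")
print("BUSCOs present in both haploid assemblies:", in_both)
```

Under that assumption the figure and the table agree: the BUSCOs missing from Tur_tru v1 (270) equal the 105 missing from both plus the 165 found only in the new assembly, and the 160 missing from Tur_tru_Illumina_hap_v1 equal 105 plus 55.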
Table 4: Comparison of the number of MPs from the 5–7 Kbp library uniquely aligned to the Tur_tru_Illumina_hap_v1, Tur_tru_Illumina_phased_v1, and Tur_tru v1 assemblies
| | Tur_tru_Illumina_hap_v1 | Tur_tru_Illumina_phased_v1 | Tur_tru v1 |
|---|---|---|---|
| | 228,285,253 | 73,514,546 | 219,277,963 |
| | 125,224,663 | 37,248,420 | 117,942,390 |
| | 158,547 | 51,533 | 1,164,118 |
| | 82,948 | 9,183 | 274,458 |
| | 102,819,095 | 36,205,410 | 99,896,997 |
| | 5,629,393 | 738,471 | 5,576,802 |
| | 61,284,891 | 26,228,913 | 68,707,025 |
The alignments were done with Bowtie2. Only reads that mapped uniquely were used for this computation; thus, the number of MPs uniquely mapping to the haplotype-resolved assembly is much smaller. Same scaffold means that both mates mapped to the same scaffold; happy mates aligned in the correct orientation with a mate distance within 5 standard deviations of the mean; misoriented mates aligned in the wrong orientation; long mates aligned with the distance between the mates exceeding the mean by more than 5 standard deviations; short mates aligned with a distance of less than 1,000 bp. Same scaffold mate ALL is the total number of MPs where both mates aligned to the same scaffold.
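The mate-pair categories defined in this note can be expressed as a small classifier. The Python sketch below is an illustration under stated assumptions, not the authors' code: the function name, the order in which the rules are checked (short before misoriented), the "other" bucket for pairs that fit no listed category, and the example standard deviation of 500 bp are all hypothetical. Only the 1,000 bp cutoff and the 5 standard deviation window come from the note, and the 6,000 bp mean is taken from the library's reported peak insert size.

```python
def classify_mate_pair(same_scaffold, correct_orientation, distance,
                       mean, sd, n_sd=5, short_cutoff=1000):
    """Assign one uniquely aligned mate pair to a category.

    Hypothetical helper; the precedence of the rules is an assumption.
    """
    if not same_scaffold:
        return "different_scaffold"
    if distance < short_cutoff:
        # Assumption: the short check takes precedence over orientation.
        return "short"
    if not correct_orientation:
        return "misoriented"
    if distance > mean + n_sd * sd:
        return "long"
    if abs(distance - mean) <= n_sd * sd:
        return "happy"
    # At least 1,000 bp but more than n_sd deviations below the mean.
    return "other"


# Illustrative library: mean 6,000 bp (reported peak), sd 500 bp (assumed).
MEAN, SD = 6000, 500
examples = [
    (True, True, 6200),   # close to the mean, correct orientation
    (True, True, 9000),   # beyond mean + 5 sd
    (True, False, 6000),  # wrong orientation
    (True, True, 500),    # below the 1,000 bp cutoff
    (False, True, 0),     # mates on different scaffolds
]
for same, oriented, dist in examples:
    print(dist, classify_mate_pair(same, oriented, dist, MEAN, SD))
```

Counting the labels over all uniquely mapped pairs would give per-category totals of the kind reported in the table, although the exact counts depend on the precedence rules assumed here.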
Figure 2: An example mummerplot of the alignments of the phased assembly to the "haploid" one, spanning about 15 Mbp of sequence of scaffold314. The circles represent contig ends, and the lines joining them represent aligned sequence. The color indicates the direction of the alignment, with red for forward and blue for reverse. For most locations on the x-axis (haploid assembly coordinates) there are two alignments on the y-axis, corresponding to the two phased haplotypes. The few regions with a single contig aligning represent long homozygous regions of the genome that we were unable to phase.
Figure 3: Alignment of the Tur_tru_Illumina_hap_v1 assembly to the human GRCh38 reference (primary chromosomes only). Each dot represents an alignment, with red indicating forward direction and blue indicating reverse direction. Human reference coordinates are on the x-axis and Tur_tru_Illumina_hap_v1 assembly alignment coordinates are on the y-axis. (A) Alignment of the entire assembly to the human reference, with alignments to human chromosome 1 highlighted by the black box. Synteny between the dolphin scaffolds and human chromosome 1 is clearly visible. No other human chromosome shows clear synteny. Dolphin scaffolds with syntenic alignments spanning over 50% of the scaffold were extracted. (B) Alignments only to human chromosome 1.