Joshua A Udall, Evan Long, Chris Hanson, Daojun Yuan, Thiruvarangan Ramaraj, Justin L Conover, Lei Gong, Mark A Arick, Corrinne E Grover, Daniel G Peterson, Jonathan F Wendel.
Abstract
Cotton is an agriculturally important crop. Because of its importance, a genome sequence of a diploid cotton species (Gossypium raimondii, D-genome) was first assembled using Sanger sequencing data in 2012. Since then, advances in DNA sequencing technology have improved the accuracy and correctness of assembled genome sequences. Here we report new de novo genome assemblies of G. raimondii and its close relative G. turneri. The two genomes were assembled to chromosome level using PacBio long-read technology, HiC, and Bionano optical mapping. This report corrects some minor assembly errors found in the Sanger assembly of G. raimondii. We also compare the genome sequences of these two species for gene composition, repetitive element composition, and collinearity. Most of the identified structural rearrangements between these two species are due to intra-chromosomal inversions. More inversions were found in the G. turneri genome sequence than in the G. raimondii genome sequence. These findings and updates to the D-genome sequence will improve the accuracy and translation of genomics to cotton breeding and genetics.
Keywords: Gossypium raimondii; Gossypium turneri; PacBio; cotton; genome sequence
Year: 2019 PMID: 31462444 PMCID: PMC6778788 DOI: 10.1534/g3.119.400392
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
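Assembly metrics of the G. turneri genome, the G. raimondii genome (our current assembly, D5), and the previous G. raimondii assembly (Paterson et al. 2012)
| Metric | G. turneri | G. raimondii (current assembly, D5) | G. raimondii (Paterson et al. 2012) |
|---|---|---|---|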
| Contigs | 220 | 187 | 16,924 |
| Max Contig | 23,475,487 | 24,216,129 | 1,162,971 |
| Mean Contig | 3,432,648 | 3,929,767 | 43,597 |
| Contig N50 | 7,909,293 | 6,291,832 | 136,998 |
| Contig N90 | 1,624,019 | 2,044,991 | 32,166 |
| Total Contig Length | 755,182,540 | 734,866,495 | 737,837,083 |
| Assembly GC | 33.21 | 33.19 | 33.19 |
| Scaffolds | 13 | 13 | 13 |
| Max Scaffold | 67,704,245 | 65,701,939 | 70,713,020 |
| Mean Scaffold | 58,092,557 | 56,529,546 | 57,632,930 |
| Scaffold N50 | 60,464,062 | 58,819,159 | 62,175,169 |
| Scaffold N90 | 50,570,303 | 46,322,098 | 45,765,648 |
| Total Scaffold Length | 755,203,240 | 734,884,094 | 749,228,090 |
| Captured Gaps | 207 | 174 | 16,911 |
| Max Gap | 100 | 200 | 63,138 |
| Mean Gap | 100 | 101 | 674 |
| Gap N50 | 100 | 100 | 2,607 |
| Total Gap Length | 20,700 | 17,599 | 11,391,007 |
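
The N50 and N90 rows above follow the usual assembly-statistics convention: sort sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the summed length. The sketch below illustrates that convention on made-up lengths. The function name `nx` and the toy values are hypothetical, and this is not the authors' pipeline, only a minimal illustration of the definition.

```python
def nx(lengths, percent):
    """Return the Nx length: walking the sequences from longest to shortest,
    the length at which the running total first reaches `percent` percent
    of the summed length (N50 -> percent=50, N90 -> percent=90)."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    return min(lengths)


# Hypothetical toy contig lengths (not taken from either assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # sums to 300

n50 = nx(toy_contigs, 50)  # 80 + 70 = 150 reaches half of 300
n90 = nx(toy_contigs, 90)  # 80 + 70 + 50 + 40 + 30 = 270 reaches 90% of 300
print(n50, n90)
```

Because N90 requires covering more of the assembly, it is never larger than N50, which matches every column of the table.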
Figure 1. Genome comparisons between G. raimondii (D5), G. turneri (D10), G. raimondii (2012), and the DT-genome of G. hirsutum (DT). A) Genome alignment between G. turneri (D10) and G. raimondii (D5). B) Genome alignment between DT and D10. C) Genome alignment between DT and D5. D) Genome alignment between D5 (2012) and D5 (new). Red circles indicate assembly errors in the 2012 sequence as identified by these alignments and independent HiC data (e.g., D5_13 – Chr01, D5_11 – Chr03, D5_04 – Chr09, D5_02 – Chr13).
Figure 2. HiC interactions detected in the previously published G. raimondii genome sequence (Paterson et al. 2012). A) Most interaction maps of chromosome sequences suggested that the genome sequence was assembled in the correct order. B) A sequence was incorrectly assembled within Chr. 9 (now D5_05), creating a large insertion (red box). Few interactions were found between the inserted segment and the remainder of Chr. 9. C) Corresponding interactions were identified in the HiC interaction plot between Chr. 9 and Chr. 12 (now D5_04), as well as a ‘pinch’ within the diagonal interaction map in Chr. 12, indicating the true position of the incorrectly assembled sequence.
Figure 3. Genomic assembly data of the new G. raimondii sequence suggest that the previously reported mitochondrial insertion was likely due to an assembly error. A) Genome alignments between G. raimondii Chr. 01 (Paterson et al. 2012) and our new genome sequence of D5_07. The red circle indicates the putative position of the mitochondrial genome insertion in the previous G. raimondii sequence relative to the new assembly. B) Alignment of G. raimondii PacBio reads (Track 2) to the new reference genome of G. raimondii (Track 1). The multi-colored bars represent individual PacBio reads (Track 2). The previous reference genome of G. raimondii had a mitochondrial insertion somewhere in this 14 kb region, indicated by the blue bar of Track 3. There are no PacBio reads that span the gap between the flanking regions of the 6,071 repeat and the repeat itself. C) Bionano data mapped to the previous reference genome sequence of G. raimondii (Paterson et al. 2012) also suggest an insertion of a sequence that is non-contiguous with the flanking regions. The Ref1 track refers to the originally published genome sequence of G. raimondii, with a mitochondrial insertion between ~23 Mb and ~24 Mb. Independently constructed Bionano contigs were aligned to the 2012 reference sequence. A Bionano contig matched the reference sequence in the mitochondrial insertion region, but the flanking regions of the Bionano contig (yellow) did not match flanking Bionano contigs or the reference sequence.
Inversions between the de novo genome assemblies of G. turneri and G. raimondii
| Chromosome | Inv. number | Total Length | Gene number |
|---|---|---|---|
| 1 | 9 | 4,856,224 | 132 |
| 2 | 9 | 7,086,444 | 114 |
| 3 | 5 | 5,569,613 | 431 |
| 4 | 4 | 2,192,874 | 60 |
| 5 | 5 | 4,213,508 | 179 |
| 6 | 6 | 1,597,287 | 164 |
| 7 | 3 | 2,453,735 | 159 |
| 8 | 7 | 16,167,439 | 345 |
| 9 | 8 | 5,741,456 | 267 |
| 10 | 2 | 417,545 | 9 |
| 11 | 9 | 7,708,678 | 501 |
| 12 | 4 | 1,771,113 | 44 |
| 13 | 10 | 4,944,400 | 187 |
| Total | 81 | 64,720,316 | 2,592 |
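
As a quick consistency check on the inversion table, the per-chromosome rows can be summed and compared with the reported Total row. The sketch below transcribes the table values directly. The variable names are hypothetical, and it is only an arithmetic check, not part of the authors' analysis.

```python
# Rows transcribed from the inversion table:
# (chromosome, inversion count, total inverted length in bp, gene number)
rows = [
    (1, 9, 4_856_224, 132),
    (2, 9, 7_086_444, 114),
    (3, 5, 5_569_613, 431),
    (4, 4, 2_192_874, 60),
    (5, 5, 4_213_508, 179),
    (6, 6, 1_597_287, 164),
    (7, 3, 2_453_735, 159),
    (8, 7, 16_167_439, 345),
    (9, 8, 5_741_456, 267),
    (10, 2, 417_545, 9),
    (11, 9, 7_708_678, 501),
    (12, 4, 1_771_113, 44),
    (13, 10, 4_944_400, 187),
]

# Values from the table's Total row.
reported_total = (81, 64_720_316, 2_592)

# Sum each numeric column across the 13 chromosomes.
computed_total = tuple(sum(row[i] for row in rows) for i in (1, 2, 3))

# Chromosome carrying the largest summed inversion length.
largest = max(rows, key=lambda row: row[2])

print(computed_total == reported_total)
print(largest[0], largest[2])
```

The column sums agree with the Total row, and chromosome 8 accounts for the largest share of inverted sequence.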
Each of the de novo genome assemblies was annotated for gene content using Maker-P
| Predicted Features | | | |
|---|---|---|---|
| CDS | 205,333 | 235,836 | 486,043 |
| exon | 200,384 | 236,559 | 527,563 |
| gene | 38,489 | 40,743 | 37,505 |
| mRNA | 39,553 | 41,030 | 77,267 |
Repetitive content of the newly sequenced G. turneri and G. raimondii genomes, and the previously published G. raimondii genome (Paterson et al. 2012). No LINE or SINE elements were detected. The genome size of G. turneri is 910 Mb and that of G. raimondii is 880 Mb.
| Family | Fragments | Copies | Total (Mb) | Fragments | Copies | Total (Mb) | Fragments | Copies | Total (Mb) |
|---|---|---|---|---|---|---|---|---|---|
| | 20,199 | 12,453 | 18.28 | 22,503 | 13,764 | 20.63 | 23,474 | 13,969 | 20.27 |
| | 2 | 1 | 0.00 | 2 | 2 | 0.00 | 14 | 9 | 0.00 |
| | 2,443 | 1,385 | 3.92 | 3,172 | 1,864 | 5.24 | 3,648 | 1,878 | 4.87 |
| | 30 | 22 | 0.01 | 58 | 41 | 0.03 | 42 | 28 | 0.02 |
| | 2,725 | 1,712 | 1.01 | 3,079 | 1,895 | 1.01 | 3,209 | 1,966 | 1.03 |
| | 1,255 | 638 | 1.56 | 1,256 | 618 | 1.49 | 1,290 | 633 | 1.54 |
| | 98 | 51 | 0.07 | 76 | 40 | 0.06 | 84 | 43 | 0.06 |
| | 13,590 | 8,592 | 11.71 | 14,828 | 9,280 | 12.79 | 15,145 | 9,381 | 12.73 |
| | 52 | 50 | 0.01 | 21 | 19 | 0.00 | 25 | 23 | 0.00 |
| | 4 | 2 | 0.00 | 11 | 5 | 0.00 | 17 | 8 | 0.01 |
| | 338,644 | 199,672 | 277.72 | 325,760 | 190,122 | 264.75 | 336,908 | 196,564 | 267.24 |
| | 224 | 216 | 0.02 | 214 | 206 | 0.02 | 311 | 304 | 0.03 |
| | 48,098 | 28,294 | 45.51 | 48,911 | 29,032 | 45.29 | 50,993 | 29,965 | 45.72 |
| | 290,322 | 171,162 | 232.19 | 276,635 | 160,884 | 219.44 | 285,604 | 166,295 | 221.49 |