Corrinne E Grover, Mengqiao Pan, Daojun Yuan, Mark A Arick, Guanjing Hu, Logan Brase, David M Stelly, Zefu Lu, Robert J Schmitz, Daniel G Peterson, Jonathan F Wendel, Joshua A Udall.
Abstract
Cotton is an important crop that has made significant gains in production over the last century. Emerging pests such as the reniform nematode have threatened cotton production. The rare African diploid species Gossypium longicalyx is a wild species that has been used as an important source of reniform nematode immunity. While mapping and breeding efforts have made some strides in transferring this immunity to the cultivated polyploid species, the complexities of interploidal transfer combined with substantial linkage drag have inhibited progress in this area. Moreover, this species shares its most recent common ancestor with the cultivated A-genome diploid cottons, thereby providing insight into the evolution of long, spinnable fiber. Here we report a newly generated de novo genome assembly of G. longicalyx. This high-quality genome leveraged a combination of PacBio long-read technology, Hi-C chromatin conformation capture, and BioNano optical mapping to achieve a chromosome-level assembly. The utility of the G. longicalyx genome for understanding reniform immunity and fiber evolution is discussed.
Keywords: Gossypium longicalyx; PacBio; cotton fiber; genome sequence; nematode resistance
Year: 2020 PMID: 32122962 PMCID: PMC7202014 DOI: 10.1534/g3.120.401050
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Statistics for assembly versions
| | Longicalyx V1.0 | Longicalyx V3.0 | Longicalyx V4.0 | Longicalyx V5.0 |
|---|---|---|---|---|
| Method | PacBio/Canu | +Chicago HighRise+Hi-C | +BioNano | +Illumina+MinION |
| Coverage | 79.45 | | | |
| Total Contig Number | 229 | 135 | 17 | 17 |
| Assembly Length | 1196.17 Mb | 1196.19 Mb | 1190.66 Mb | 1190.67 Mb |
| Average Contig Length | 5.22 Mb | 8.86 Mb | 70.04 Mb | 70.04 Mb |
| Total Length of Ns | 0 | 18200 | 18000 | 8488 |
| N50 | 28.88 Mb | 95.88 Mb | 95.88 Mb | 95.88 Mb |
| N90 | 7.58 Mb | 76.48 Mb | 76.48 Mb | 76.29 Mb |
Statistics for Longicalyx_V2.0 not calculated.
Genome size for G. longicalyx is 1311 Mb (Hendrix and Stewart 2005).
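The N50 and N90 rows follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembly. The sketch below illustrates that definition on made-up lengths; the function name `nx` and the toy values are assumptions for illustration, not the authors' pipeline. It also computes the fraction of the 1311 Mb genome size estimate covered by the 1190.67 Mb V5.0 assembly, using only figures from the table and footnote above.

```python
def nx(lengths, fraction):
    """Length of the sequence at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches `fraction` of the total."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, arbitrary units).
toy_lengths = [45, 30, 20, 5]
toy_n50 = nx(toy_lengths, 0.5)
toy_n90 = nx(toy_lengths, 0.9)

# Share of the estimated genome size captured by the V5.0 assembly.
assembly_mb = 1190.67      # Longicalyx V5.0 assembly length
genome_size_mb = 1311.0    # genome size estimate cited above
assembled_fraction = assembly_mb / genome_size_mb

print(toy_n50, toy_n90, round(assembled_fraction * 100, 1))
```

Under these figures, the V5.0 assembly covers roughly 91% of the estimated genome size.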
BUSCO and LAI scores for the G. longicalyx genome compared to existing cotton genomes
| Complete BUSCO, total | Complete BUSCO, single | Complete BUSCO, duplicated | Incomplete BUSCO, fragmented | Incomplete BUSCO, missing | LAI score | Reference |
|---|---|---|---|---|---|---|
| 95.80% | 86.00% | 9.80% | 1.00% | 3.20% | 8.51 | ( |
| 92.80% | 85.10% | 7.70% | 2.70% | 4.50% | 10.57 | ( |
| 98.00% | 87.30% | 10.70% | 0.70% | 1.30% | 8.51 | ( |
| 94.70% | 85.20% | 9.50% | 1.00% | 4.30% | 12.59 | ( |
| 96.30% | 12.20% | 84.10% | 0.80% | 2.90% | 10.38 | (Wang |
| 97.70% | 14.50% | 83.20% | 0.50% | 1.80% | 10.61 | (Wang |
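The BUSCO columns are related by simple arithmetic: the complete total is the sum of single-copy and duplicated, and complete plus fragmented plus missing should account for 100% of the searched orthologs. A minimal sketch of that check on the six rows above follows; the function name and the rounding tolerance are assumptions chosen for illustration.

```python
# Rows as (complete, single, duplicated, fragmented, missing), in percent,
# copied from the table above in the same order.
busco_rows = [
    (95.80, 86.00, 9.80, 1.00, 3.20),
    (92.80, 85.10, 7.70, 2.70, 4.50),
    (98.00, 87.30, 10.70, 0.70, 1.30),
    (94.70, 85.20, 9.50, 1.00, 4.30),
    (96.30, 12.20, 84.10, 0.80, 2.90),
    (97.70, 14.50, 83.20, 0.50, 1.80),
]


def busco_consistent(complete, single, duplicated, fragmented, missing, tol=0.05):
    """True when complete = single + duplicated and the categories sum to 100%."""
    parts_match = abs(complete - (single + duplicated)) <= tol
    sums_to_100 = abs(complete + fragmented + missing - 100.0) <= tol
    return parts_match and sums_to_100


checks = [busco_consistent(*row) for row in busco_rows]
print(checks)
```

All six rows pass this check, so the column order used in the table is internally consistent.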
Orthogroups between G. longicalyx and two related diploid species. Numbers of genes are listed, with percentages within species in parentheses. The relationships in the last four rows of the table represent one or many G. longicalyx genes relative to one or many genes from G. arboreum or G. raimondii.
| | G. longicalyx | G. arboreum | G. raimondii |
|---|---|---|---|
| Number of genes | 38,378 | 40,960 | 37,223 |
| Genes in orthogroups | 33,369 (86.9%) | 38,404 (93.8%) | 35,207 (94.6%) |
| Unassigned genes | 5,009 (13.1%) | 2,556 (6.2%) | 2,016 (5.4%) |
| Orthogroups containing species** | 26,591 (78.5%) | 29,763 (87.8%) | 29,153 (86.0%) |
| Genes in species-specific orthogroups** | 74 (0.2%) | 0 | 8 (0.0%) |
| 1-to-1 relationship | | 26,249 (70.5%) | 25,637 (68.9%) |
| 1-to-many relationship | | 1,207 (3.2%) | 1,153 (3.1%) |
| many-to-1 relationship | | 1,438 (3.9%) | 1,172 (3.1%) |
| many-to-many relationship | | 513 (1.4%) | 290 (0.8%) |
| * only designated primary transcripts were included | 3158 | 8.23% | |
| ** orthogroups may contain one or more genes per species | 2615 | 6.81% | |
Transposable element content in G. longicalyx vs. the sister clade (subsection Gossypium)
| | G. longicalyx | | |
|---|---|---|---|
| Genome size (Mb) | 1311 | 1667 | 1711 |
| LTR/Gypsy (Ty3) (Mb) | 557 | 876 | 943 |
| LTR/Copia (Ty1) (Mb) | 39 | 43 | 41 |
| LTR, unspecified (Mb) | 44 | 62 | 57 |
| DNA, all element types (Mb) | 2.3 | 2.7 | 2.4 |
| Unknown (Mb) | 18 | 27 | 25 |
| Total repetitive clustered (Mb) | 660 | 1011 | 1067 |
| Repetitive fraction of genome | 50% | 61% | 62% |
| Gypsy fraction of genome | 42% | 53% | 55% |
| Gypsy fraction of repetitive content | 84% | 87% | 88% |
Figure 1. Accessible chromatin regions (ACRs) in the G. longicalyx genome. a. Categorization of ACRs in relation to the nearest gene annotation: distal (dACRs), proximal (pACRs), and genic (gACRs). b. Length distribution of ACRs identified by both HOMER and MACS2, grouped by genomic region. c. Distance of gACRs and pACRs to the nearest annotated genes. d. Boxplot of GC content in ACRs and control regions.
Figure 2. Diagram of the RenLon region in G. longicalyx. Marker BNL1231, which co-segregates with nematode resistance, is located at approximately 95.3 Mb on chromosome F11.
Orthogroup identity (by OrthoFinder) for defense-related genes in the RenLon region and the copy number per species. In G. longicalyx, this number includes genes found outside of the RenLon region. G. hirsutum and G. barbadense copy numbers are split into genes found on the A or D chromosomes, or on scaffolds/contigs not placed on a chromosome.
| Description | Orthogroup | G. longicalyx gene | G. longicalyx copy number | | | | |
|---|---|---|---|---|---|---|---|
| adenylyl-sulfate kinase 3-like | OG0053444 | Golon.011G359300 | 1 | — | — | — | — |
| L-type lectin-domain containing receptor kinase IV.2-like | OG0053450 | Golon.011G361200 | 1 | — | — | — | — |
| T-complex protein 1 subunit theta-like | OG0053447 | Golon.011G360400 | 1 | — | — | — | — |
| protein STRICTOSIDINE SYNTHASE-LIKE 10-like | OG0000242 | Golon.011G363400 | 6 | 4 | 2 | 6 A | 9 A, 5 scaffold |
| | | Golon.011G363500 | | | | | |
| | | Golon.011G363600 | | | | | |
| | | Golon.011G363700 | | | | | |
| | | Golon.011G363800 | | | | | |
| | OG0053454 | Golon.011G363300 | 1 | — | — | — | — |
| TMV resistance protein N-like | OG0000022 | Golon.011G360100 | 25 | 22 | 5 | 10 A, 22 D | 12 A, 21 D, 1 scaffold |
| | | Golon.011G360300 | | | | | |
| | | Golon.011G360500 | | | | | |
| | | Golon.011G360700 | | | | | |
| | | Golon.011G360800 | | | | | |
| | | Golon.011G361000 | | | | | |
| | | Golon.011G361100 | | | | | |
| | | Golon.011G361400 | | | | | |
| | | Golon.011G361900 | | | | | |
| | | Golon.011G362000 | | | | | |
| | | Golon.011G362400 | | | | | |
| | | Golon.011G362700 | | | | | |
| | | Golon.011G362800 | | | | | |
| | | Golon.011G362900 | | | | | |
| | | Golon.011G364000 | | | | | |
| | OG0028874 | Golon.011G359900 | 4 | — | — | — | — |
| | | Golon.011G362600 | | | | | |
| | OG0028544 | Golon.011G363200 | 3 | — | — | 1 A | — |
| | OG0030067 | Golon.011G360200 | 1 | — | — | 2 A | — |
| | OG0030069 | Golon.011G362500 | 1 | — | — | 1 A | 1 A |
| | OG0053445 | Golon.011G359800 | 1 | — | — | — | — |
| | OG0053446 | Golon.011G360000 | 1 | — | — | — | — |
| | OG0053448 | Golon.011G360600 | 1 | — | — | — | — |
| | OG0053451 | Golon.011G361700 | 1 | — | — | — | — |
| | OG0053452 | Golon.011G361800 | 1 | — | — | — | — |
| | OG0053453 | Golon.011G362100 | 1 | — | — | — | — |
| TMV resistance protein N-like isoform X1 | OG0053449 | Golon.011G360900 | 1 | — | — | — | — |
| TMV resistance protein N-like isoform X2 | OG0028874 | Golon.011G362300 | 4 | — | — | — | — |
| | OG0033549 | Golon.011G363900 | 1 | — | — | — | 1 A |
This gene is syntenically conserved with G. arboreum in the COGE-GEVO analysis.
This orthogroup is split between two related, but separately named, annotations.
Figure 3. Synteny between G. longicalyx and domesticated G. arboreum. Mean percent identity, including intergenic regions, is illustrated by the color (93–94% identity from blue to red). Lower right inset: distribution of pairwise p-distances between coding regions of predicted orthologs (i.e., exons only, start to stop) between G. longicalyx and either G. arboreum (blue) or G. raimondii (green). Only orthologs with <5% divergence are shown; these comprise most orthologs in each comparison.
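A pairwise p-distance is the proportion of aligned sites at which two sequences differ. The sketch below shows one common way to compute it for a pair of aligned coding sequences; skipping sites with gaps or ambiguous bases is an assumption made here, since the record does not say how the authors treated them, and the function name and example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT character are
    skipped (an assumption of this sketch, not a documented choice).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in valid and base_b in valid:
            compared += 1
            if base_a != base_b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned fragments: one substitution and one gapped site.
example = p_distance("ATGGCC-TAA", "ATGGCTATAA")
print(example)
```

In this example nine sites are comparable and one differs, giving a p-distance of 1/9. The <5% cutoff mentioned in the caption would correspond to keeping pairs with a value below 0.05.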