
The draft genome of Corchorus olitorius cv. JRO-524 (Navin).

Debabrata Sarkar¹, Ajay Kumar Mahato², Pratik Satya¹, Avijit Kundu¹, Sangeeta Singh², Pawan Kumar Jayaswal², Akshay Singh², Kaushlendra Bahadur², Sasmita Pattnaik², Nisha Singh², Avrajit Chakraborty¹, Nur Alam Mandal¹, Debajeet Das¹, Tista Basu¹, Amitha Mithra Sevanthi², Dipnarayan Saha¹, Subhojit Datta¹, Chandan Sourav Kar¹, Jiban Mitra¹, Karabi Datta³, Pran Gobinda Karmakar¹, Tilak Raj Sharma², Trilochan Mohapatra⁴, Nagendra Kumar Singh².

Abstract

Here, we present the draft genome (377.3 Mbp) of Corchorus olitorius cv. JRO-524 (Navin), a leading dark jute variety developed from a cross between African (cv. Sudan Green) and indigenous (cv. JRO-632) types. From the draft genome, we predicted a total of 57,087 protein-coding genes with annotated functions. We identified 1765 disease resistance-like and defense response genes in the jute genome. The annotated genes showed the highest sequence similarity with those of Theobroma cacao, followed by Gossypium raimondii. Seven chromosome-scale genetically anchored pseudomolecules were constructed with a total size of 8.53 Mbp and used for synteny analyses with the cocoa and cotton genomes. As in other plant species, gypsy and copia retrotransposons were the most abundant classes of repeat elements in jute. The raw data of our study are available in the NCBI SRA database under accession number SRX1506532. The genome sequence has been deposited at DDBJ/EMBL/GenBank under the accession LLWS00000000, and the version described in this paper is the first version (LLWS01000000).


Keywords: Bast fibre; Corchorus olitorius; Dark jute; Illumina MiSeq; Whole genome sequence

Year: 2017 PMID: 28540183 PMCID: PMC5432662 DOI: 10.1016/j.gdata.2017.05.007

Source DB: PubMed Journal: Genom Data ISSN: 2213-5960

Direct link to deposited data

Deposited data for Corchorus olitorius cv. JRO-524:
BioProject: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA278717
SRA: http://www.ncbi.nlm.nih.gov/sra/SRX1506532
BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN04160039

Introduction

Corchorus olitorius L. (2n = 2x = 14; Malvaceae s. l.), commonly known as dark jute or jute mallow, is an important ligno-cellulosic bast fibre crop that accounts for > 80% of the world's jute-growing area. Grown in tropical lowland areas, it produces one of the strongest vegetable fibres and is second only to cotton in terms of production [1]. Though it is ideally suited for transplanted paddy-based crop rotation and makes softer and stronger fibre than its cultivated counterpart C. capsularis (white jute), several biological constraints limit its diversified uses in the textile industry [2]. Besides yield enhancement, there is an urgent need to develop dark jute varieties with better fibre quality, in terms of fibre fineness, tensile strength and low lignin content, using genomics-assisted breeding approaches. Recently, the draft genome sequence of C. olitorius cv. O-4 was released by Bangladesh [3]. However, the variety sequenced by Bangladesh is a pure line selection from a local landrace [4]. Since C. olitorius originated in Africa [5] and reached India together with many African crops in prehistory [6], it is of potential interest to decode a genome that represents an admixture of both African and Indian gene pools. In this study, we sequenced a leading Indian variety, JRO-524 (Navin), which was developed from a cross between African (cv. Sudan Green from Sudan) and indigenous (cv. JRO-632; a local selection) types. Our results provide new insights into the C. olitorius genome, and its availability should not only facilitate jute research and development, but also foster the application of translational genomics in jute improvement.

Experimental design, material and methods

Plant material and DNA isolation

Seeds of C. olitorius cv. JRO-524 were germinated in petri dishes and leaves were collected from 10-day-old seedlings. Twenty leaves collected from ten seedlings were pooled and used for DNA extraction using the GenElute™ Plant Genomic DNA Miniprep Kit (Sigma-Aldrich Co., St. Louis, USA).

Genome sequencing, de-novo assembly and annotation

DNA was fragmented using the Covaris AFA™ system (Covaris, Inc., Woburn, USA) with a median fragment size of 544 bp, and shotgun libraries were prepared using the Illumina TruSeq DNA PCR-Free Sample Preparation Kit (Illumina, San Diego, USA). Paired-end sequencing was performed on two flow cells of an Illumina MiSeq (2 × 250 bp) platform. The sequence reads were quality-checked using FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Following adapter trimming, the poor-quality bases were removed using Trimmomatic v0.36 [7]. The genome size was evaluated using the K-mer Analysis Toolkit (KAT) [8]. High-quality reads were merged using PANDASeq v2.7 [9], and then assembled de novo using Newbler v2.6 with default parameters (Roche Inc., Germany). We used the FGENESH gene prediction pipeline from the software package Molquest v4.5 (http://www.softberry.com) for the in silico prediction of genes. The predicted genes were annotated using a BLASTX (E < 10⁻⁶) search against the NCBI non-redundant (nr) protein database.

Synteny mapping and pseudomolecule construction

SyMap v3.4 [10] was used for pairwise synteny mapping with cocoa (Theobroma cacao) and diploid cotton (Gossypium raimondii) that showed the highest sequence similarities with our assembled C. olitorius genome during the BLAST similarity search. For the construction of seven chromosome-scale pseudomolecules, we used ALLMAPS [11] to integrate the genome assembly with a RAD-SNP-based genetic map of C. olitorius [12].

Identification of disease resistance-like and defense response genes

The disease resistance-like (R-like) and defense response (DR) genes were manually categorized using different keywords/phrases that represent R-like and DR genes into five main classes as follows: (i) NBS-LRR (matching with NBS-LRR, but not with LZ-NBS-LRR and LRR, CC-NBS-LRR, Pib, Pita, Rp 1-d8, Lr10, Mla 1 and rust resistance), (ii) LZ-NBS-LRR (matching with LZ-NBS-LRR, but not with NBS-LRR, CC-NBS-LRR, LRR and RPM1), (iii) LRR-TM (matching with Xa21, serine/threonine kinases and Cf2/Cf5 resistance), (iv) LRR (matching with disease resistance, viral resistance, Yr10, LRR, but not with NBS-LRR, CC-NBS-LRR, LZ-NBS-LRR), and (v) defense response genes (matching with glucanases, chitinases and thaumatin like genes) [13]. We mapped these R-like and DR genes to an integrated RAD-SNP-based genetic map of jute [12].
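The keyword-based categorization described above can be illustrated with a minimal sketch. It is not the authors' procedure: the rule table below is a simplified, hypothetical reading of the five classes, it uses plain case-insensitive substring matching on a BLAST hit description, and it omits the gene-name exclusions (Pib, Pita, Rp 1-d8, Lr10, Mla 1, RPM1) for brevity. The names `RULES` and `classify` are illustrative only.

```python
# Hypothetical, simplified keyword rules: (class label, include keywords, exclude keywords).
# A description is assigned to the first class with at least one include keyword
# and no exclude keyword. Matching is case-insensitive substring search.
RULES = [
    ("NBS-LRR", ["nbs-lrr"], ["lz-nbs-lrr", "cc-nbs-lrr"]),
    ("LZ-NBS-LRR", ["lz-nbs-lrr"], ["cc-nbs-lrr"]),
    ("LRR-TM", ["xa21", "serine/threonine kinase", "cf2", "cf5"], []),
    ("LRR", ["disease resistance", "viral resistance", "yr10", "lrr"], ["nbs-lrr"]),
    ("DR", ["glucanase", "chitinase", "thaumatin"], []),
]


def classify(description):
    """Return the first matching class label, or None if no rule applies."""
    text = description.lower()
    for label, include, exclude in RULES:
        has_include = any(keyword in text for keyword in include)
        has_exclude = any(keyword in text for keyword in exclude)
        if has_include and not has_exclude:
            return label
    return None


if __name__ == "__main__":
    for hit in ["NBS-LRR resistance protein", "receptor kinase Xa21", "class IV chitinase"]:
        print(hit, "->", classify(hit))
```

In practice, substring rules of this kind need manual review, which is consistent with the manual categorization reported here.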

Repeat elements and SSR identification

All assembled contigs were screened for the presence of simple sequence repeats (SSRs) using MISA (http://pgrc.ipk-gatersleben.de/misa/). The assembled contigs were analyzed to identify repeat sequences using RepeatModeler and RepeatMasker with Repbase library v22.01 [14].
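As an illustration of what an SSR screen looks for, the sketch below finds perfect tandem repeats of 1 to 6 bp motifs with a regular expression. It is not MISA and does not reproduce its behaviour; the minimum repeat counts in `MIN_REPEATS` are assumed values chosen for the example, not the thresholds used in this study, and the function names are hypothetical.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA', 'AGAG')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-bp motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "A" * 12 + "TTC" + "AG" * 7 + "CCT"
    print(find_ssrs(demo))
```

The primitive-motif check prevents a mononucleotide run from being counted again as a di- or tri-nucleotide repeat.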

Data description

Illumina MiSeq sequencing generated 52,507,986 overlapping 2 × 250 bp paired-end raw reads (~15.65 Gbp sequence) that were processed to yield 24,996,514 merged high-quality reads with an average read length of 450 bp (~12.9 Gbp) and a 31.32× coverage of the K-mer based estimated 415 Mbp genome of C. olitorius cv. JRO-524. The longer merged reads from the Illumina MiSeq platform facilitated economical de novo assembly of the jute genome into 52,373 contigs (377.3 Mbp) covering 90.8% of the estimated genome size. The mean contig size was 7206 bp, while the N50 size was 16,573 bp (Table 1). The raw sequence data are available in the NCBI SRA database under accession number SRX1506532, and the assembled genome sequence has been deposited at DDBJ/EMBL/GenBank under the accession number LLWS00000000, with BioProject PRJNA278717 and BioSample SAMN04160039. We predicted 76,881 gene models, with an average gene size of 1.3 kbp and a largest gene size of 37 kbp. In total, 59,531 (77.4%) of the predicted genes were annotated using BLASTx, while 17,350 genes (22.6%) remained non-annotated and were thus considered unique to the C. olitorius cv. JRO-524 genome. Of these, 57,087 were protein-coding genes with annotated functions. The predicted genes showed the highest sequence similarity with those of T. cacao (37.45%), followed by G. raimondii (9.68%). Using a restriction site-associated DNA (RAD)-SNP linkage map, we have shown earlier that C. olitorius has the maximum syntenic relationship with cocoa followed by diploid cotton [12]. Recently, Islam et al. [3] also reported the same pattern of syntenic relationship for C. olitorius. In the present study, 501 (99.6%) of the published RAD-SNP markers were mapped to 288 contigs (8.53 Mbp) of the draft genome (Table 2).

Table 1

Summary statistics of de novo-assembled draft genome of C. olitorius cv. JRO-524.

Index	Statistics
Raw reads	52,507,986
High-quality merged reads	24,996,514
Number of assembled contigs	52,373
Size of assembled contigs (bp)	377,376,943
Longest contig (bp)	177,749
Shortest contig (bp)	500
Number of contigs > 1 kb	41,086
Number of contigs > 10 kb	11,958
Number of contigs > 100 kb	38
Mean contig size (bp)	7206
Contig N50 (bp)	16,573
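For readers unfamiliar with the assembly metrics in Table 1, the sketch below shows how a mean contig size and an N50 are typically computed. The `n50` helper and the toy contig lengths are illustrative assumptions (the individual contig lengths of this assembly are not listed here); only the total assembly size and contig count are taken from Table 1.

```python
def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one contig length")


# Toy example: total = 300 bp; the two longest contigs (100 + 80) reach half.
toy_contigs = [100, 80, 60, 40, 20]

# Values from Table 1: total assembled size (bp) and number of contigs.
ASSEMBLY_BP = 377_376_943
N_CONTIGS = 52_373
mean_contig_bp = ASSEMBLY_BP / N_CONTIGS

if __name__ == "__main__":
    print("toy N50:", n50(toy_contigs))
    print("mean contig size (bp):", round(mean_contig_bp))
```

The mean derived from the two Table 1 totals rounds to the 7206 bp listed in the table.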

Table 2

Summary of seven chromosome-scale pseudomolecules of C. olitorius cv. JRO-524. The assembled genome was integrated with a RAD-SNP-based genetic map of C. olitorius [12] and anchored contigs were joined together with 50 Ns to generate the chromosome-scale pseudomolecules.

Chromosome	No. of RAD-SNP markers in genetic map	No. of mapped RAD-SNP markers in genome	No. of anchored contigs	Size of anchored contigs (bp)
Chr1	139	139	76	2,336,828
Chr2	119	119	65	1,979,308
Chr3	114	114	69	2,035,515
Chr4	48	47	38	742,950
Chr5	32	32	17	582,942
Chr6	29	29	6	400,300
Chr7	22	21	17	441,461
Total	503	501	288	8,519,304
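The relationship between the anchored contig sizes in Table 2 and the pseudomolecule sizes can be sketched as follows. The sketch assumes that exactly one 50-N gap sits between each pair of adjacent contigs and that no Ns are added at the ends of a pseudomolecule; this is an assumption about the layout, not a documented detail. Under that assumption the computed lengths reproduce the mean size (1,219,051 bp) and N50 (2,038,915 bp) reported in the text, and account for the difference between the 8,519,304 bp of anchored contigs and the ~8.53 Mbp pseudomolecule total.

```python
GAP_BP = 50  # Ns inserted between adjacent anchored contigs (from the Table 2 caption)

# Chromosome -> (number of anchored contigs, size of anchored contigs in bp), from Table 2.
ANCHORED = {
    "Chr1": (76, 2_336_828),
    "Chr2": (65, 1_979_308),
    "Chr3": (69, 2_035_515),
    "Chr4": (38, 742_950),
    "Chr5": (17, 582_942),
    "Chr6": (6, 400_300),
    "Chr7": (17, 441_461),
}


def pseudomolecule_length(n_contigs, contig_bp, gap_bp=GAP_BP):
    """Contig bases plus one gap between each pair of adjacent contigs (assumed layout)."""
    return contig_bp + gap_bp * (n_contigs - 1)


def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one length")


lengths = {chrom: pseudomolecule_length(n, bp) for chrom, (n, bp) in ANCHORED.items()}
total_bp = sum(lengths.values())
mean_bp = total_bp / len(lengths)

if __name__ == "__main__":
    for chrom, bp in lengths.items():
        print(chrom, bp)
    print("total:", total_bp, "mean:", round(mean_bp), "N50:", n50(list(lengths.values())))
```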

Further, we annotated 1765 genes with disease resistance (R-like) and defense response (DR) functions. Of the total R-like and DR genes, 831 (47.1%) belong to the LRR-TM, 440 (25%) to the NBS-LRR, 352 (19.9%) to the LRR and 44 (2.49%) to the LZ-NBS-LRR categories. We also identified 87 (4.9%) DR genes and categorized them into three sub-categories: chitinases (40 genes), glucanases (28 genes) and thaumatin-like proteins (19 genes).

In the genome of C. olitorius cv. JRO-524, 51.9% of the assembly was masked as repeat elements, which was much higher than that reported for its closest related published genome, T. cacao (25.7%) [15], but lower than that of its second-closest related species, G. raimondii (57.0%) [16]. As expected, our assembled jute genome was characterized by a much higher proportion of retrotransposons (45.7%) than DNA transposons (5.5%). The most dominant classes of transposable elements (TEs) were gypsy (34.3%) and copia (5.7%), both of which belong to the LTR superfamily. Earlier, Begum et al. [17] also predicted a high number of LTR retrotransposons in jute. Further, we identified a total of 185,698 genomic SSRs, with mononucleotide repeats being the most abundant class (76.0%), followed by di- (16.0%), tri- (5.7%), tetra- (0.8%), penta- (0.2%) and hexa-nucleotide (0.2%) repeats.

Using genetically anchored contigs, seven chromosome-scale pseudomolecules were constructed with a mean size of 1,219,051 bp and an N50 of 2,038,915 bp (Table 2). Chromosome 1 was the longest, while chromosome 6 was the shortest pseudomolecule. Comparative analysis of the seven genetically anchored jute chromosomes with the 10 chromosomes of T. cacao [15] revealed a significant syntenic relationship between the two species; however, collinearity was not conserved (Fig. 1). Jute chromosomes 1, 4 and 7 showed synteny with cocoa chromosomes 9, 5 and 2, respectively, whereas chromosome 2 shared synteny with cocoa chromosomes 3 and 10, and chromosome 3 with cocoa chromosomes 4 and 2. Jute chromosomes 5 and 6, however, both shared synteny with a single cocoa chromosome, chromosome 1. Similarly, comparative analysis of jute and the diploid cotton species G. raimondii [16] revealed synteny of jute chromosomes 6 and 7 with cotton chromosomes 4 and 13, respectively (Fig. 1), with chromosomes 1, 2 and 3 showing matches with multiple chromosomes of cotton, viz., (1, 4, 9 and 10), (3, 4, 8 and 11) and (5, 6, 7 and 12), respectively. Thus, comparative analysis with a small fraction (8.53 Mbp) of the genetically anchored jute genome revealed chromosome-level synteny of jute with both the cocoa and cotton genomes.

Fig. 1

Genomic syntenic relationships of C. olitorius (2n = 2x = 14) with T. cacao (2n = 2x = 20) and G. raimondii (2n = 2x = 26).

Conclusions

To our knowledge, the work presented here is the first whole genome sequence for a C. olitorius genotype derived from an African jute. C. olitorius cv. Sudan Green, one of the parents of cv. JRO-524, was primarily used to transfer premature flowering resistance (in early sowing) to indigenous types [18]. Thus, an in-depth comparison of the present sequence with the recently published draft genome [3] would provide new insights that could help in understanding the mechanisms underlying premature flowering vis-à-vis photoperiodic control of bast fibre development in jute. This would allow breeding of high-yielding varieties with durable premature flowering resistance, which has recently been observed to break down when dark jute crops are sown early under long-day conditions, possibly due to climate change.

Conflict of interest

The authors declare that they have no conflict of interest.

Specifications
Organism/cell line/tissue	Dark jute (Corchorus olitorius cv. JRO-524)/leaves
Sex	Hermaphrodite
Sequence or array type	Illumina MiSeq
Data format	Raw and processed
Experimental factors	The draft genome sequence of Corchorus olitorius cv. JRO-524 (Navin)
Experimental features	DNA was extracted from seedling leaves of C. olitorius cv. JRO-524, and shotgun libraries were prepared followed by paired-end sequencing on an Illumina MiSeq platform, generating 2 × 250 bp overlapping reads. The cleaned sequence reads were merged with PANDASeq and assembled de novo using Newbler software. Genes were predicted by FGENESH and annotated using BLASTx against the NCBI non-redundant protein database. We used SyMap for pairwise synteny mapping and ALLMAPS to integrate our draft genome with a RAD-SNP-based genetic map of C. olitorius.
Consent	N/A
Sample source location	Barrackpore, Kolkata, India (22°46′2.7372″ N 88°23′18.0384″ E)

References

1. The genome of Theobroma cacao.

Authors: Xavier Argout; Jerome Salse; Jean-Marc Aury; Mark J Guiltinan; Gaetan Droc; Jerome Gouzy; Mathilde Allegre; Cristian Chaparro; Thierry Legavre; Siela N Maximova; Michael Abrouk; Florent Murat; Olivier Fouet; Julie Poulain; Manuel Ruiz; Yolande Roguet; Maguy Rodier-Goud; Jose Fernandes Barbosa-Neto; Francois Sabot; Dave Kudrna; Jetty Siva S Ammiraju; Stephan C Schuster; John E Carlson; Erika Sallet; Thomas Schiex; Anne Dievart; Melissa Kramer; Laura Gelley; Zi Shi; Aurélie Bérard; Christopher Viot; Michel Boccara; Ange Marie Risterucci; Valentin Guignon; Xavier Sabau; Michael J Axtell; Zhaorong Ma; Yufan Zhang; Spencer Brown; Mickael Bourge; Wolfgang Golser; Xiang Song; Didier Clement; Ronan Rivallan; Mathias Tahi; Joseph Moroh Akaza; Bertrand Pitollat; Karina Gramacho; Angélique D'Hont; Dominique Brunel; Diogenes Infante; Ismael Kebe; Pierre Costet; Rod Wing; W Richard McCombie; Emmanuel Guiderdoni; Francis Quetier; Olivier Panaud; Patrick Wincker; Stephanie Bocs; Claire Lanaud
Journal: Nat Genet Date: 2010-12-26 Impact factor: 38.330

2. Comparative genomics of two jute species and insight into fibre biogenesis.

Authors: Md Shahidul Islam; Jennifer A Saito; Emdadul Mannan Emdad; Borhan Ahmed; Mohammad Moinul Islam; Abdul Halim; Quazi Md Mosaddeque Hossen; Md Zakir Hossain; Rasel Ahmed; Md Sabbir Hossain; Shah Md Tamim Kabir; Md Sarwar Alam Khan; Md Mursalin Khan; Rajnee Hasan; Nasima Aktar; Ummay Honi; Rahin Islam; Md Mamunur Rashid; Xuehua Wan; Shaobin Hou; Taslima Haque; Muhammad Shafiul Azam; Mahdi Muhammad Moosa; Sabrina M Elias; A M Mahedi Hasan; Niaz Mahmood; Md Shafiuddin; Saima Shahid; Nusrat Sharmeen Shommu; Sharmin Jahan; Saroj Roy; Amlan Chowdhury; Ashikul Islam Akhand; Golam Morshad Nisho; Khaled Salah Uddin; Taposhi Rabeya; S M Ekramul Hoque; Afsana Rahman Snigdha; Sarowar Mortoza; Syed Abdul Matin; Md Kamrul Islam; M Z H Lashkar; Mahboob Zaman; Anton Yuryev; Md Kamal Uddin; Md Sharifur Rahman; Md Samiul Haque; Md Monjurul Alam; Haseena Khan; Maqsudul Alam
Journal: Nat Plants Date: 2017-01-30 Impact factor: 15.793

3. The draft genome of a diploid cotton Gossypium raimondii.

Authors: Kunbo Wang; Zhiwen Wang; Fuguang Li; Wuwei Ye; Junyi Wang; Guoli Song; Zhen Yue; Lin Cong; Haihong Shang; Shilin Zhu; Changsong Zou; Qin Li; Youlu Yuan; Cairui Lu; Hengling Wei; Caiyun Gou; Zequn Zheng; Ye Yin; Xueyan Zhang; Kun Liu; Bo Wang; Chi Song; Nan Shi; Russell J Kohel; Richard G Percy; John Z Yu; Yu-Xian Zhu; Jun Wang; Shuxun Yu
Journal: Nat Genet Date: 2012-08-26 Impact factor: 38.330

4. Comparative molecular cytogenetic analyses of a major tandemly repeated DNA family and retrotransposon sequences in cultivated jute Corchorus species (Malvaceae).

Authors: Rabeya Begum; Falk Zakrzewski; Gerhard Menzel; Beatrice Weber; Sheikh Shamimul Alam; Thomas Schmidt
Journal: Ann Bot Date: 2013-05-10 Impact factor: 4.357

5. PANDAseq: paired-end assembler for illumina sequences.

Authors: Andre P Masella; Andrea K Bartram; Jakub M Truszkowski; Daniel G Brown; Josh D Neufeld
Journal: BMC Bioinformatics Date: 2012-02-14 Impact factor: 3.169

6. SyMAP v3.4: a turnkey synteny system with application to plant genomes.

Authors: Carol Soderlund; Matthew Bomhoff; William M Nelson
Journal: Nucleic Acids Res Date: 2011-03-11 Impact factor: 16.971

7. ALLMAPS: robust scaffold ordering based on multiple maps.

Authors: Haibao Tang; Xingtan Zhang; Chenyong Miao; Jisen Zhang; Ray Ming; James C Schnable; Patrick S Schnable; Eric Lyons; Jianguo Lu
Journal: Genome Biol Date: 2015-01-13 Impact factor: 13.583

8. Genome-Wide Distribution, Organisation and Functional Characterization of Disease Resistance and Defence Response Genes across Rice Species.

Authors: Sangeeta Singh; Suresh Chand; N K Singh; Tilak Raj Sharma
Journal: PLoS One Date: 2015-04-22 Impact factor: 3.240

9. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies.

Authors: Daniel Mapleson; Gonzalo Garcia Accinelli; George Kettleborough; Jonathan Wright; Bernardo J Clavijo
Journal: Bioinformatics Date: 2017-02-15 Impact factor: 6.937

10. Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors: Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal: Bioinformatics Date: 2014-04-01 Impact factor: 6.937
