Cultivar-specific transcriptome prediction and annotation in Ficus carica L.

Liceth Solorzano Zambrano¹, Gabriele Usai¹, Alberto Vangelisti¹, Flavia Mascagni¹, Tommaso Giordani¹, Rodolfo Bernardi¹, Andrea Cavallini¹, Riccardo Gucci¹, Giovanni Caruso¹, Claudio D'Onofrio¹, Mike Frank Quartacci¹, Piero Picciarelli¹, Barbara Conti¹, Andrea Lucchi¹, Lucia Natali¹.

Abstract

The availability of transcriptomic sequence data is a key step for functional genomics studies. Recently, a repertoire of predicted genes of a Japanese cultivar of fig (Ficus carica L.) was released. Because of the great phenotypic variability found in this species, we studied another fig genotype, the Italian cv. Dottato, to enable comparative studies between the two cultivars and to extend the pan-genome of the species. We isolated, sequenced and assembled fig genomic DNA from young fruits of cv. Dottato. Then, putative gene sequences were predicted and annotated. Finally, the predicted transcriptomes of cvs. Dottato and Horaishi were compared. Our data provide a resource (available at the Sequence Read Archive database under SRP109082) for functional genomics of fig, helping to fill the knowledge gap that still exists in this species concerning plant development, defense and adaptation to the environment.

Year: 2017 PMID: 28736702 PMCID: PMC5510491 DOI: 10.1016/j.gdata.2017.07.005

Source DB: PubMed Journal: Genom Data ISSN: 2213-5960

Direct link to deposited data

Deposited data can be found at: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP109082.

Experimental design, materials and methods

Sample collection, DNA isolation, generation and trimming of sequence data

Epidermal and sub-epidermal tissues of young fruits of the cv. Dottato, a common parthenocarpic variety, were isolated under a dissection microscope. Fig DNA was then isolated using the CTAB protocol described by Mascagni et al. [1]. Nuclear DNA was used for the construction of paired-end libraries (insert size of 500–600 bp) using the TruSeq DNA sample kit (Illumina Inc., San Diego, CA, USA) according to the standard Illumina protocol. DNA sequencing was carried out on two sequencers, a MiSeq and a HiSeq2000 (Illumina). HiSeq and MiSeq paired reads were trimmed using Trimmomatic [2] to remove adapters and low-quality regions, using the following parameters: ILLUMINACLIP:2:30:10; LEADING:20; TRAILING:20; SLIDINGWINDOW:4:20; and MINLEN:25. Duplicated reads were discarded using CLC-BIO Genomic Workbench 8.0 (CLC-BIO, Aarhus, Denmark).
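The quality-trimming thresholds listed above can be illustrated with a minimal Python sketch. This is not Trimmomatic's implementation: it is a simplified model of the LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps applied in that order, it ignores adapter clipping, and the function name `trim_read` and its arguments are hypothetical.

```python
def trim_read(quals, leading=20, trailing=20, window=4, window_q=20, minlen=25):
    """Return (start, end) of the retained region, or None if the read is dropped.

    quals is a list of per-base Phred scores. Simplified model only;
    adapter clipping (ILLUMINACLIP) is not reproduced here.
    """
    start, end = 0, len(quals)
    # LEADING: remove bases below the threshold from the 5' end.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: remove bases below the threshold from the 3' end.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan 5' to 3' and cut at the first window
    # whose mean quality falls below window_q.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < window_q:
            end = i
            break
    # MINLEN: discard reads that end up shorter than minlen.
    if end - start < minlen:
        return None
    return start, end


# Toy read: two poor bases at the 5' end, three at the 3' end.
print(trim_read([10, 12] + [35] * 30 + [5] * 3))
```

With the defaults set to the values reported above, a read is kept only if at least 25 bases survive all three quality steps.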

Sequence assembly

The HiSeq reads that passed the quality check (12.96 genome equivalents, 25 to 110 nt long) were analysed with KmerGenie [3] to detect the best k-mer size for the assembly (best k = 25). De novo assembly of these reads was performed using CLC-BIO Genomic Workbench 8.0 (with mismatch cost = 2, insertion cost = 3, deletion cost = 3, length fraction = 0.5, similarity fraction = 0.8, word size = 25). HiSeq reads produced 158,440 contigs, with N50 = 1137 nt. The MiSeq reads were used to reconstruct long reads by 3′ overlapping with a minimum overlap of 30 bp and a maximum mismatch ratio of 0.4. Errors at read ends were mutually corrected using the best-scoring bases. After the quality check, reads (25.64 genome equivalents, 35 to 511 nt long) were analysed with KmerGenie to find the best k-mer size for the assembly (best k = 57). Assembly was then performed using CLC-BIO Genomic Workbench 8.0 (with the same parameters as above, except word size = 57). MiSeq reads produced 277,111 contigs, with N50 = 2575 nt. A hybrid assembly of all contigs previously assembled by CLC-BIO Genomic Workbench 8.0 was then performed using Minimus2 (-D REFCOUNT = 158,440 -D MINID = 90), a tool from the AMOS toolbox [4], yielding 52,167 supercontigs (mean length 3615 nt, N50 = 5341 nt) and 236,059 single contigs. Contigs and supercontigs with organellar read contamination were removed by masking against a Rosaceae organellar database using RepeatMasker (-s -no_is -nolow -X -lib) [5]. After organellar removal, scaffolds were obtained from the pre-assembled sequences using the SSPACE 2.0 software (-k 5 -a 0.70 -T 5 -n 15 -p 1) [6]. This produced 264,088 scaffolds with average size = 1225 nt (max size = 41,760 nt), N50 = 2523 nt and GC content = 33.6%. Overall, 323,708,138 nt of sequence were produced, corresponding to 87.5% of the fig genome size. Whole DNA-Seq data were submitted to the NCBI Sequence Read Archive (accession number SRP109082).
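The N50 values reported for each assembly stage follow the usual definition: the length L such that sequences of length L or longer account for at least half of the assembled bases. A minimal Python sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative and are not taken from the assembly itself.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy contig lengths (nt); the total is 54, so half is 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Applied to the sequence lengths of a FASTA file, the same function would allow the contig, supercontig and scaffold N50 values to be compared on an equal footing.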

Gene prediction and annotation

Gene prediction was performed on scaffolds and supercontigs longer than 1000 nt using AUGUSTUS [7] with Arabidopsis gene models and default parameters. After retaining only the best score for each predicted gene, a total of 41,857 predicted genes were found, with an average gene length of 2135 bp and an average CDS length of 1230 bp. Total predicted gene length was 89,366,702 nt (corresponding to 24.2% of the genome). Total putative intron length was 33,896,665 nt, corresponding to 37.9% of the gene portion. A FASTA file with the predicted genes of cv. Dottato is available at the Department of Agriculture, Food, and Environment of the University of Pisa repository website (http://www.agr.unipi.it/index.php/ricerca/plant-genetics-and-genomics-lab/sequence-repository). Predicted CDSs were subjected to BLAST2GO [8] to find similarities with known protein sequences and to collect the corresponding gene ontologies. In order to identify the biological pathways active in F. carica, the predicted CDSs were also annotated with corresponding EC numbers against the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways database [9]. By mapping EC numbers to the reference canonical pathways, a total of 6608 contigs (15.8%) were assigned to 146 KEGG biochemical pathways. Fig. 1 shows the 30 KEGG metabolic pathways most represented by unique sequences of F. carica. The most abundant pathways include purine metabolism, thiamine metabolism and biosynthesis of antibiotics.
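The summary figures in this section can be cross-checked with simple arithmetic. The Python sketch below does so under two assumptions that are not stated explicitly in the text: that the 24.2% gene fraction is computed against the genome size implied by the 87.5% assembly coverage, and that the 15.8% KEGG share is computed against the 41,857 predicted genes. The function name `summarize` is hypothetical.

```python
def summarize(total_gene_nt, n_genes, total_intron_nt,
              assembled_nt, assembled_fraction, kegg_assigned):
    """Derive the ratios quoted in the text from the reported totals."""
    # Genome size implied by the assembled length and its stated coverage.
    genome_size = assembled_nt / assembled_fraction
    return {
        "mean_gene_len": total_gene_nt / n_genes,
        "intron_share_of_genes": total_intron_nt / total_gene_nt,
        "implied_genome_size": genome_size,
        "gene_share_of_genome": total_gene_nt / genome_size,
        "kegg_share": kegg_assigned / n_genes,
    }


# Totals as reported in the assembly and gene prediction sections.
stats = summarize(89_366_702, 41_857, 33_896_665, 323_708_138, 0.875, 6608)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

Under these assumptions the derived values agree with the reported ones: a mean gene length of about 2135 nt, introns at about 37.9% of gene length, genes at about 24.2% of an implied genome of roughly 370 Mb, and about 15.8% of predicted genes assigned to KEGG pathways.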

Fig. 1

Top 30 KEGG metabolic pathways in F. carica cv. Dottato predicted transcriptome.

Comparative analysis between predicted transcriptomes of two fig cultivars

A BLASTN analysis was performed to evaluate the differences between the predicted transcriptomes of the two fig cultivars, the Italian cv. Dottato (this article) and the Japanese cv. Horaishi [10]. Whilst the vast majority of genes were found in both cultivars, a number of genes were recovered specifically in either the Japanese cultivar or the Italian cultivar (Fig. 2). Genes predicted only in the Japanese genotype could simply be missing from the Italian fig assembly because of the lower sequence coverage used in our experiments. By contrast, predicted genes specific to cv. Dottato might represent genes that are not present (or are largely different) in the cv. Horaishi genome. Among the KEGG pathways significantly over-represented in the cv. Dottato predicted transcriptome, we found phosphoglycerolipid metabolism, involved in membrane composition and signal transduction, and cyano amino acid metabolism, involved in the chemical defense against herbivores and pathogens.
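The partition behind such a Venn diagram can be sketched in Python as a set operation on BLASTN hits. The text does not report the similarity thresholds used, so the sketch assumes the hit list has already been filtered; the function name `venn_partition` and the gene identifiers are hypothetical.

```python
def venn_partition(dottato_ids, horaishi_ids, hits):
    """Split two gene sets into shared and cultivar-specific subsets.

    hits is an iterable of (dottato_id, horaishi_id) pairs that passed
    whatever similarity filter was applied (not specified in the text).
    """
    hits = list(hits)  # allow generators to be traversed twice
    dottato_matched = {d for d, _ in hits}
    horaishi_matched = {h for _, h in hits}
    dottato_ids, horaishi_ids = set(dottato_ids), set(horaishi_ids)
    return {
        "dottato_specific": dottato_ids - dottato_matched,
        "horaishi_specific": horaishi_ids - horaishi_matched,
        "dottato_shared": dottato_ids & dottato_matched,
        "horaishi_shared": horaishi_ids & horaishi_matched,
    }


# Toy example with made-up identifiers.
result = venn_partition(
    ["d1", "d2", "d3"],
    ["h1", "h2", "h3", "h4"],
    [("d1", "h1"), ("d2", "h1"), ("d2", "h3")],
)
print({name: sorted(ids) for name, ids in result.items()})
```

Because several genes of one cultivar can match the same gene of the other, the shared counts are kept separately for each cultivar rather than being forced into a single number.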

Fig. 2

Venn diagram showing a comparison between the predicted fig transcriptomes of cv. Horaishi and cv. Dottato.

Discussion

The availability of a gDNA-based reference transcriptome is the best option for RNA-seq analyses of gene expression. Such transcriptomes have been used, for example, in tree species under abiotic stress [11], [12], [13]. A reference transcriptome of this kind is available for fig [10]. However, differences in genome (and even transcriptome) composition are known to occur among genotypes of the same species. For example, large variations in the coding portion of the genome were found between maize inbreds [14]. In this sense, the availability of the predicted transcriptome of a specific genotype allows a more precise and complete analysis of gene expression in that genotype. Moreover, extending the number of reference transcriptomes of a species allows the characterization of the pan-genome of that species. Overall, 41,857 predicted genes of F. carica cv. Dottato were included in the fig reference transcriptome. Predicted genes were characterized by gene ontology and metabolic pathway. Among KEGG metabolic pathways, the most represented was purine metabolism (1328 members), a metabolic pathway of central significance in plant growth and development [15]. Differences were observed in the predicted gene repertoire of the two cultivars, with 4803 and 2383 genes specifically found in the Horaishi and in the Dottato predicted transcriptomes, respectively. Interestingly, many genes specific to the cv. Dottato predicted transcriptome are related to the chemical defense against herbivores and pathogens. Our data serve as a resource for fig functional genomics and can be employed to address open questions in this species relating to development, defense and adaptation to the environment.

Conflict of interest

The authors declare no conflict of interest.

Specifications
Organism/cell line/tissue	Ficus carica/Cv. Dottato/developing fruit (2 cm in diameter) epidermal and sub-epidermal tissue
Sex	F
Sequencer or array type	Illumina MiSeq and HiSeq2000
Data format	Raw data: FASTQ files, processed data: txt files
Experimental factors	Genomic DNA
Experimental features	gDNA-seq dataset for genome assembly and gene prediction
Consent	N/A
Sample source location	43°35′22.1″N, 10°38′27.9″E, Capannoli, Pisa, Italy

References

1. The KEGG resource for deciphering the genome.

Authors: Minoru Kanehisa; Susumu Goto; Shuichi Kawashima; Yasushi Okuno; Masahiro Hattori
Journal: Nucleic Acids Res Date: 2004-01-01

2. Scaffolding pre-assembled contigs using SSPACE.

Authors: Marten Boetzer; Christiaan V Henkel; Hans J Jansen; Derek Butler; Walter Pirovano
Journal: Bioinformatics Date: 2010-12-12

3. Informed and automated k-mer size selection for genome assembly.

Authors: Rayan Chikhi; Paul Medvedev
Journal: Bioinformatics Date: 2013-06-03

4. Pyrimidine and purine biosynthesis and degradation in plants.

Authors: Rita Zrenner; Mark Stitt; Uwe Sonnewald; Ralf Boldt
Journal: Annu Rev Plant Biol Date: 2006

5. Gene prediction with a hidden Markov model and a new intron submodel.

Authors: Mario Stanke; Stephan Waack
Journal: Bioinformatics Date: 2003-10

6. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research.

Authors: Ana Conesa; Stefan Götz; Juan Miguel García-Gómez; Javier Terol; Manuel Talón; Montserrat Robles
Journal: Bioinformatics Date: 2005-08-04

7. Minimus: a fast, lightweight genome assembler.

Authors: Daniel D Sommer; Arthur L Delcher; Steven L Salzberg; Mihai Pop
Journal: BMC Bioinformatics Date: 2007-02-26

8. Identification of RAN1 orthologue associated with sex determination through whole genome sequencing analysis in fig (Ficus carica L.).

Authors: Kazuki Mori; Kenta Shirasawa; Hitoshi Nogata; Chiharu Hirata; Kosuke Tashiro; Tsuyoshi Habu; Sangwan Kim; Shuichi Himeno; Satoru Kuhara; Hidetoshi Ikegami
Journal: Sci Rep Date: 2017-01-25

9. Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors: Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal: Bioinformatics Date: 2014-04-01

10. Repetitive DNA and Plant Domestication: Variation in Copy Number and Proximity to Genes of LTR-Retrotransposons among Wild and Cultivated Sunflower (Helianthus annuus) Genotypes.

Authors: Flavia Mascagni; Elena Barghini; Tommaso Giordani; Loren H Rieseberg; Andrea Cavallini; Lucia Natali
Journal: Genome Biol Evol Date: 2015-11-24
