Literature DB >> 30642874

Iso-Seq Allows Genome-Independent Transcriptome Profiling of Grape Berry Development.

Andrea Minio¹, Mélanie Massonnet¹, Rosa Figueroa-Balderas¹, Amanda M Vondras¹, Barbara Blanco-Ulate², Dario Cantu³.

Abstract

Transcriptomics has been widely applied to study grape berry development. With few exceptions, transcriptomic studies in grape are performed using the available genome sequence, PN40024, as reference. However, differences in gene content among grape accessions, which contribute to phenotypic differences among cultivars, suggest that a single reference genome does not represent the species' entire gene space. Though whole genome assembly and annotation can reveal the relatively unique or "private" gene space of any particular cultivar, transcriptome reconstruction is a more rapid, less costly, and less computationally intensive strategy to accomplish the same goal. In this study, we used single molecule-real time sequencing (SMRT) to sequence full-length cDNA (Iso-Seq) and reconstruct the transcriptome of Cabernet Sauvignon berries during berry ripening. In addition, short reads from ripening berries were used to error-correct low-expression isoforms and to profile isoform expression. By comparing the annotated gene space of Cabernet Sauvignon to other grape cultivars, we demonstrate that the transcriptome reference built with Iso-Seq data represents most of the expressed genes in the grape berries and includes 1,501 cultivar-specific genes. Iso-Seq produced transcriptome profiles similar to those obtained after mapping on a complete genome reference. Together, these results justify the application of Iso-Seq to identify cultivar-specific genes and build a comprehensive reference for transcriptional profiling that circumvents the necessity of a genome reference with its associated costs and computational weight.

Entities: Chemical Disease Gene Species

Keywords: RNA-seq; Vitis vinifera; berry ripening; transcriptome reference

Mesh：

Year: 2019 PMID： 30642874 PMCID： PMC6404599 DOI： 10.1534/g3.118.201008

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Grape berries undergo a series of complex physiological and biochemical changes during their development that determine their characteristics at harvest (Kuhn ). Genome-wide expression studies using microarray and, more recently, RNA sequencing (RNA-seq) revealed that berry development involves the expression and modulation of approximately 23,000 genes (Massonnet ) and that the ripening transition is associated with a major transcriptome shift (Fasoli ). Transcriptomic studies characterized the ripening program across grapevine cultivars (Venturini ; Da Silva ; Jiao ; Massonnet ), identifying key ripening-related genes (Palumbo ; Massonnet ) and determining the impact of stress and viticultural practices on ripening (Deluc ; Pastore ; Xi ; Amrine ; Blanco-Ulate , 2017; Corso ; Hopper ; Savoi ; Zenoni ; Lecourieux ; Massonnet ). This knowledge increases the possibility of exerting control over the ripening process, improving fruit composition under suboptimal or adverse conditions, and enhancing desirable traits in a crop with outstanding cultural and commercial significance (Savoi , 2017; Serrano ; Zenoni ). These genome-wide expression analyses were possible because a highly contiguous assembly for the species was produced (Jaillon ); this first effort used a grape line (PN40024) created by several rounds of backcrossing to reduce heterozygosity, facilitating genome assembly (Jaillon ). Though poor by current standards, this pioneering, chromosome-resolved assembly served as the basis for numerous publications. However, the structural diversity of grape genomes makes using a single one-size-fits-all reference genome inappropriate (Golicz , 2016b). There is substantial unshared gene content between cultivars, with 8–10% of the genes missing when two cultivars are compared (Da Silva ). Although many of these genes are not essential for plant survival, they can account for 80% of the expression within their respective families and expand key gene families possibly associated with cultivar-specific traits (Da Silva ). Assembling genome references for all interesting cultivars is impractical, in part because its cost remains prohibitive and because of genomic features that impede the development of high-quality genome assemblies for any grape cultivar. Although the V. vinifera genome is relatively small (∼487Mb) (Lodhi and Reisch 1995; Jaillon ) and as repetitive as other plant genomes of similar size (41.4%) (Jaillon ; Michael and Jackson 2013), it is highly heterozygous (∼13%) (Jaillon ; Velasco ). Most domesticated grape cultivars are crosses between distantly related parents; this and clonal propagation cause the high heterozygosity observed in the species (Strefeler ; Ohmi ; Bowers and Meredith 1997; Sefc ; Lopes ; Di Gaspero ; Tapia ; Ibáñez ; Cipriani ; Myles ; Lacombe ; Chin ; Minio ; Zhou ). Earlier attempts using short reads struggled to resolve complex, highly heterozygous genomes (Gnerre ; Huang ; Di Genova ; Kajitani ; Safonova ). A limited ability to call consensus polymorphic regions yields highly fragmented assemblies where structural ambiguity occurs and alternative alleles at heterozygous sites are excluded altogether (Velasco ). Single Molecule Real Time (SMRT) DNA sequencing (Pacific Biosciences, California, USA) has emerged as the leading technology for reconstructing highly contiguous, diploid assemblies of long, repetitive genomes that include phased information about heterozygous sites (Chin , 2016; Doi ; Gordon ; Pryszcz and Gabaldón 2016; Ricker ; Seo ; Vij ; Huddleston ). Recently, we used Vitis vinifera cv. Cabernet Sauvignon to test the ability of SMRT reads and the FALCON-Unzip assembly pipeline to resolve both alleles at heterozygous sites in the genome (Chin ). The assembly produced was significantly more contiguous (contig N50 = 2.17 Mb) than the original PN40024 assembly (contig N50 = 102.7 kbp) and provided the first phased sequences of the diploid V. vinifera genome (Minio ). Despite recent advances in genome reconstruction methodologies, assembling a complex plant genome is still costly. Transcriptome reconstruction is the only alternative strategy to depict known and unknown gene content information (Venturini ; Da Silva ; Jiao ). De novo assembly of RNA-seq reads is widely used for this purpose (Grabherr ; Ashrafi ; Venturini ; Bellucci ). SMRT technology was recently deployed to investigate expressed gene isoforms (Iso-Seq) in a variety of organisms, including a handful of plant species (Liu ; Zulkapli ; Filichkin ). Long reads delivered by this methodology report full-length transcripts sequenced from their 5′-ends to polyadenylated tails (Dong ; Weirather ; Gao ; Tombácz ; Kuo ; Price and Gibas 2017; Workman ), making Iso-Seq an ideal technology for reconstructing a transcriptome without a reference genome sequence and without assembling fragments to resolve the complete isoform sequence (Honaas ; Ju ). Moreover, alternative transcripts that contribute to the gene space complexity (Brett ) and vary with cell type (Swarup ), developmental stage (Thatcher ), and stress (Yan ; Liu ) cannot be definitively characterized without full-length transcript information. The objective of this study was to test whether full-length cDNA sequencing with Iso-Seq technology is a suitable alternative to traditional genome sequencing, assembly, and annotation for reconstructing a grape transcriptome reference for transcriptional profiling. We compared how Cabernet Sauvignon’s Iso-Seq transcriptome fares as a reference for RNA-seq analysis vs. its annotated genome. We sequenced the full-length transcripts of ripening berries with Iso-Seq and Illumina RNA-seq reads. The high-coverage short-read data were used to profile gene expression and to error-correct low-expression isoforms that would have been otherwise lost by the standard Iso-Seq pipeline. The transcriptome reference built with Iso-Seq data represented most of the expressed genes in the grape berries and included cultivar-specific or “private” genes. When used as the reference for RNA-seq, Iso-Seq generated transcriptome profiles quantitatively similar to those obtained by mapping on a complete genome reference. These results support using Iso-Seq to capture the gene space of a plant and build a comprehensive reference for transcriptional profiling without a pre-defined reference genome.

Methods

Plant material and RNA isolation

Grape berries from Cabernet Sauvignon FPS clone 08 were collected in Summer 2016 from vines grown in the Foundation Plant Services (FPS) Classic Foundation Vineyard (Davis, CA, USA). Between 10 and 15 berries were sampled at pre-véraison, véraison, post-véraison, and at commercial maturity. Table S1 provides weather information for the sampling days. The ripening stages were visually assessed based on color development and confirmed by measurements of soluble solids (Figure S1; Table S2). On the day of sampling, berries were deseeded, frozen in liquid nitrogen, and ground to powder (skin and pulp). Total RNA was isolated using a Cetyltrimethyl Ammonium Bromide (CTAB)-based extraction protocol as described in Blanco-Ulate . RNA purity was evaluated with a Nanodrop 2000 spectrophotometer (Thermo Scientific, Hanover Park, IL, USA). RNA was quantified with a Qubit 2.0 Fluorometer using the RNA broad range kit (Life Technologies, Carlsbad, CA, USA). RNA integrity was assessed using electrophoresis and an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Only RNA with integrity number (RIN) greater than 8.0 was used for SMRTbell library preparation.

Library preparation and sequencing

RNAs from four biological replicates per developmental stage were pooled in equal amounts. One µg of the pooled RNA was used for cDNA synthesis and for SMRTbell library construction using the SMARTer PCR cDNA synthesis kit (Clontech Laboratories, Inc. Mountain View, CA, USA). First-strand cDNA synthesis was performed using the SMRTScribe Reverse Transcriptase (Clontech Laboratories, Inc. Mountain View, CA, USA). Each developmental stage was individually barcoded (Table S3). To minimize artifacts during large-scale amplification, a cycle optimization step was performed by collecting five 5 µl aliquots at 10, 12, 14, 16, and 18 PCR cycles. PCR reaction aliquots were loaded on an E-Gel pre-cast agarose gel 0.8% (Invitrogen, Life Technologies, Carlsbad, CA, USA) to determine the optimal cycle number. Second-strand cDNA was synthesized and amplified using the Kapa HiFi PCR kit (Kapa Biosystems, Wilmington, MA, USA) with the 5′ PCR primer IIA (Clontech Laboratories, Inc. Mountain View, CA, USA) following the manufacturer’s instructions. Large-scale PCR was performed using the number of cycles determined during the optimization step (14 cycles). Barcoded double-stranded cDNAs were pooled at equal amounts and used for size selection. Size selection was carried out with a BluePippin (Sage Science, Beverly, MA, USA) and 1-2 kbp, 2-3 kbp, 3-6 kbp, and 5-10 kbp fractions were collected. After size selection, each fraction was PCR-enriched prior to SMRTbell template library preparation. cDNA SMRTbell libraries were prepared using 1-3 µg of PCR enriched size-selected samples, followed by DNA damage repair and SMRTbell ligation using the SMRTbell Template Prep Kit 1.0 (Pacific Biosciences, Menlo Park, CA, USA). A second size selection was performed on the 3-6 kbp and 5-10 kbp fractions to remove short contaminating SMRTbell templates. A total of 8 SMRT cells were sequenced on a PacBio Sequel system (DNA Technologies Core, University of California, Davis, USA). Demultiplexing, filtering, quality control, clustering and polishing of the Iso-Seq sequencing data were performed using SMRT Link ver. 4.0.0 (Table S4). Iso-Seq read error rates were estimated using the identity of the best alignment on the diploid Cabernet Sauvignon genomic assembly (Chin ). Alignment was performed with GMAP ver. 2015-09-29 (Wu and Watanabe 2005) using the parameters “-K 20000 -B 4 -f 2”. Coding sequences (CDS) were identified using Transdecoder (Haas ) as implemented in the PASA ver. 2.1.0 (Haas ). RNA-seq libraries were prepared using the Illumina TruSeq RNA sample preparation kit v2 (Illumina, San Diego, CA, USA), following the low-throughput protocol. Each biological replicate was barcoded individually. Libraries were evaluated for quantity and quality with the High Sensitivity chip on a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) and sequenced on an Illumina HiSeq4000 (DNA Technologies Core Facility, University of California, Davis, USA; Table S5). Quality filtering and adapter trimming were performed with Trimmomatic ver. 0.36 (Bolger ) using the following parameters: “ILLUMINACLIP:2:30:10 LEADING:7 TRAILING:7 SLIDINGWINDOW:10:20 MINLEN:36” (Table S5). Error correction of the Full-length Non-Chimeric (FLNC) Iso-Seq reads was carried out using high-quality Illumina reads and LSC ver. 2.0 (Au ) with a minimum coverage threshold of 5 reads (“–short_read_coverage_threshold 5”).

Transcriptome reconstruction and annotation

Isoforms identified by the SMRT Link pipeline and error-corrected Iso-Seq reads were merged using EvidentialGene (Gilbert 2013) in order to obtain a non-redundant transcriptome (ISNT). Contaminant sequences were searched by parsing blastn (Altschul ) alignments against the NCBI NT database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/, retrieved January 17th, 2017) using Megan ver. 6.6.5 (Huson ) with default parameters. Sequences detected as non-viridiplantae were removed. Isoforms with no RNA-seq read mapping on their sequence over the 16 samples (662 isoforms) were considered as putative artifacts and were also discarded. ISNT sequences are provided in File S1. Functional annotation was performed with blastx (Altschul ) using RefSeq plant proteins as database (ftp://ftp.ncbi.nlm.nih.gov/refseq, retrieved January 17th, 2017) imposing an HSP length cutoff of 50 amino acids. Functional domains were identified with InterProScan ver. 5.28-68.0 (Jones ; File S2). Tree view of identified GO terms was generated using WEGO ver. 2.0 (Ye ). Iso-Seq reads were considered derived from repetitive regions when showing a RepeatMasker (Smit ) hit with coverage ≥ 75% and an identity ≥ 50% using the custom-created repeat library (see below). Non-coding RNAs were identified with Infernal ver. 1.1.2 (Nawrocki ) using the Rfam database ver. 12.2 (Nawrocki ). Secondary overlapping alignments and structures with an e-value ≥ 0.01 were rejected. Hits on the minus strand of the Iso-Seq reads were rejected as well as matches that were truncated or covering less than 80% of the entire read.

Cabernet Sauvignon genome annotation

Cabernet Sauvignon primary contigs and haplotigs (Chin ) were used as genomic reference. Primary contigs are the backbone sequence of the genome assembly and haplotigs are the alternative haplotype sequences where the assembler could phase the two haplotypes. A repeat library was created ad hoc for Cabernet Sauvignon. MITEs were identified with MITEHunter ver. 11.2011 (Han and Wessler 2010); LTRs and TRIMs were identified with LTRharvest (GenomeTools ver. 1.5.7; Ellinghaus ) and LTRdigest (GenomeTools ver. 1.5.7; Steinbiss ). RepeatModeler ver. 1.0.8 (Smit and Hubley 2008), and RepeatMasker ver. open-4.0.6 (Smit ) were used to generated a custom library of repeat models. Repetitive elements in Cabernet Sauvignon genome were then identified with RepeatMasker (Smit ) using both custom and plant repeat models altogether (Table S6). Ab initio trainings and predictions were carried out with SNAP ver. 2006-07-28 (Korf 2004), Augustus ver. 3.0.3 (Stanke ), GeneMark-ES ver. 4.32 (Lomsadze ), GlimmerHMM ver. 3.0.4 (Majoros ), GeneID ver. 1.4.4 (Parra ) and Twinscan ver. 4.1.2 (Korf ; Brent 2008) using TAIR10 annotation for Arabidopsis as informant species. MAKER-P ver. 2.31.3 (Campbell ) was used to integrate the ab initio predictions with the experimental evidence listed in Table S7 using the parameters reported in File S3. Only MAKER-P models showing an Annotation Edit Distance (AED) < 0.5 were kept. Gene structure refinement was carried out with PASA ver. 2.1.0 (Haas ), parameters can be found in File S4. As transcriptomic evidence, we used the Iso-Seq data as well all the publicly available grape transcriptomic data (Table S7). Public Cabernet Sauvignon RNA-seq data (Table S7) were de novo assembled separately using HISAT2 and Stringtie ver. 1.1.3 (Pertea ). De novo transcript sequences were then clustered in a non-redundant dataset using CD-HIT-EST ver. 4.6 (Li and Godzik 2006) with an identity threshold of 99%. Genome annotation summary is reported in Table S8. Alternative splicing forms were classified using AStalavista ver. 3.0 (Foissac and Sammeth 2007). Non-coding RNAs were searched with Infernal ver. 1.1.2 (Nawrocki ) as described above (Table S9). The predicted transcripts’ functional annotation was made with blastp search using the RefSeq plant protein database. Functional domain identification was done with InterProScan as described above for the ISNT. For each gene locus, a non-redundant list of the GO terms attributed to all the alternative transcripts was generated.

Gene expression analysis with RNA-seq

For expression profiling, short reads were aligned on transcript sequences using Bowtie2 ver. 2.26 (Langmead and Salzberg 2012) with options “–sensitive–dpad 0–gbar 99999999–mp 1,1–np 1–score-min L,0,-0.1”. Evaluation of expression at isoform and gene locus levels was carried out using RSEM ver. 1.1.14b3 (Li and Dewey 2011) with default parameters. Differential expression analysis was performed using EBSeq ver. 1.16.0 (Leng ). For each pairwise comparison of consecutive growth stages, size factors were calculated with median normalization using five iterations of the EM algorithm. Genes were considered as significantly differentially expressed if they had a minimum posteriori estimate probability (PPDE) threshold of 0.95 and mean RPKM greater than 1 in at least one of two growth stages. Pearson correlation matrices and PCAs were performed using the log2-transformed RPKM values. Heatmaps of the Pearson correlation matrices and dendrograms were generated using heatmap.2 function from the gplots R package ver. 3.0.1 (Warnes ). PCAs were carried out in R with the FactoMineR package ver. 1.41 (Lê ). In order to compare expression values of ISNT transcripts and gene loci in the Cabernet Sauvignon genes, ISNT transcripts were aligned to the Cabernet Sauvignon genome using GMAP as described above. ISNT transcripts were associated with gene loci annotated on the Cabernet Sauvignon genome if they aligned at least for 66% of their length. Alignments with translocation were excluded (Table S10). For each ISNT-gene locus association, gene expression was calculated as the sum of the mean expression values of all ISNT transcripts and all Cabernet Sauvignon gene loci, separately, using RSEM ver. 1.1.14b3 (Li and Dewey 2011). Differential expression analysis at cluster level was performed using EBSeq ver. 1.16.0 (Leng ).

Data Availability

Sequencing data are accessible through NCBI (PRJNA433195) and other relevant datasets, such as protein-coding gene and repeat coordinates, can be retrieved from the Cantu lab github repository (http://cantulab.github.io/data.html). Supplemental material available at Figshare: https://doi.org/10.25387/g3.7291943.

Results

Isoform sequencing of the grape transcriptome during berry development

To obtain a comprehensive representation of the transcripts expressed during berry development, we isolated RNA from Cabernet Sauvignon berries (Figure 1) before the onset of ripening (4.35 ± 0.39 °Brix), at véraison (10.94 ± 0.26 °Brix), after véraison (18.38 ± 0.61 °Brix), and at commercial ripeness (20.33 ± 0.76 °Brix). To avoid loading bias, cDNAs were fractionated based on their length to produce four libraries at each developmental stage in size ranges of 1-2 kbp, 2-3 kbp, 3-6 kbp, or 5-10 kbp (Figure 1). Libraries derived from different developmental stages were barcoded and libraries with similar cDNA size were pooled together. Each library pool was sequenced independently on two SMRT cells of a Pacific Biosciences Sequel system generating a total of 23.6 Gbp. In parallel, the same samples were sequenced using Illumina technology (25,655,771 ± 3,512,980 high-quality reads per sample) to provide high-coverage sequence information for error correction and for gene expression quantification (Table S5). Demultiplexing, filtering and quality control of SMRT sequencing data were performed using SMRT Link as described in the Methods section. A total of 672,635 full-length non-chimeric (FLNC; Figure 2) reads with a maximum length of 14.6 kbp and a N50 of 3.5 kbp were generated (Table S4). FLNC reads were further polished and clustered into 46,675 single representatives of expressed transcripts (henceforth, polished-clustered Iso-Seq reads or PCIRs) ranging from 400 bp to 8.8 kbp with a N50 of 3.6 kbp (Table S4). The alignment of FLNC reads and PCIRs to the genomic DNA contigs of the same Cabernet Sauvignon clone (Chin ; Minio ) confirmed that sequence clustering and polishing successfully increased sequence accuracy, whose median values were 95.4% in FLNC reads and 99.6% in the PCIRs. The increase in sequence accuracy was also reflected by the significantly longer detectable coding sequences (CDS) in the PCIRs compared to the short and fragmented CDS found in the FLNC reads (Figure 2). The residual sequence discrepancy between PCIRs and the genomic contigs could be explained by heterozygosity and/or sequencing errors, but unexpectedly not by expression level (Figure S2).

Figure 1

Figure 2

Diagram depicting the main steps of Iso-Seq read analysis. Raw Iso-Seq reads were processed to obtain Full-Length Non-Chimeric (FLNC) reads and clustered isoform reads (PCIRs). FLNC reads that did not cluster in any of the PCIRs were error-corrected using RNA-seq data (C-FLNC). The final dataset described in this study included PCIRs and C-FLNC reads. For each step, sequencing accuracy and predicted coding sequence (CDS) length distributions are reported.

Full-length cDNA sequencing (Iso-Seq) of Cabernet Sauvignon berries. Representative pictures of berry clusters at the four growth stages used in this study and the read length distribution of the Iso-Seq libraries by cDNA size fraction. FLNC, Full-Length Non-Chimeric. Diagram depicting the main steps of Iso-Seq read analysis. Raw Iso-Seq reads were processed to obtain Full-Length Non-Chimeric (FLNC) reads and clustered isoform reads (PCIRs). FLNC reads that did not cluster in any of the PCIRs were error-corrected using RNA-seq data (C-FLNC). The final dataset described in this study included PCIRs and C-FLNC reads. For each step, sequencing accuracy and predicted coding sequence (CDS) length distributions are reported. Over 18.5% of the FLNC reads did not cluster with any other reads and were discarded by the SMRT Link pipeline. When mapped on the genomic contigs, the uncorrected reads displayed a sequence accuracy that reflected the typical error rate of 10–20% of the technology (Figure 2) (Giordano ; Zimin ; Koren ). High error rates also resulted in short and fragmented detectable CDS (Figure 2). To recover the information carried by these 124,185 uncorrected FLNCs, which represented an important fraction of the transcriptome (see below), we error-corrected their sequences with LSC (Au ) using the short reads generated with Illumina technology. As for the PCIRs, error correction resulted in greater sequence accuracy and longer CDS (Figure 2). PCIRs and error-corrected FLNC (C-FLNC) reads were finally combined into a single dataset of 170,860 corrected Iso-Seq isoforms (CISIs). As low as 1.7% (2,826) of the CISIs showed significant homology with interspersed repeats. LTRs and LINEs were the most abundant orders with 778 and 729 representatives, respectively. Chloroplast and mitochondria genes represented a small fraction of the CISIs with only 89 isoforms (0.05%) having a significant match (50% identity and mutual alignment coverage). CISIs were also searched for non-coding RNA (ncRNA) using the covariance models of the Rfam database; only 182 isoforms were annotated as ncRNAs and were all ribosomal RNA (145 attributed to the large subunit, clan CL00112, and 37 to the small subunit, clan CL00111). Excluding these transcribed isoforms, only 164 CISIs (0.1%) failed to align to the Cabernet Sauvignon genomic contigs, confirming the completeness of the genome assembly and the negligible biological contamination of the berry samples. To reconstruct an Iso-Seq non-redundant transcriptome (ISNT) that would be tested as a reference for expression profiling, CISIs were clustered with EvidentialGene (Gilbert 2013) to reduce the redundancy between stages and libraries. One representative for each expressed transcript was retained for a total of 28,721 isoforms (File S1). Less than four percent of these isoforms did not contain a complete open reading frame, likely due to residual errors in their sequence. Mapping on the Cabernet Sauvignon genome showed that they potentially derive from 24,378 gene loci (i.e., protein-coding gene sequences) over both primary contigs (13,583 loci) and haplotigs (10,795 loci; File S5). Most isoforms (61.2%) aligned to both a locus on a primary contig and a locus on its associated haplotig, suggesting that despite the differences between haplotypes the mapping could not differentiate between alleles. For 10,727 loci (44%), multiple isoforms (2.03 ± 3.49 isoforms/locus) mapped on the same locus possibly due to alternative splicing and structural differences between alleles. ISNT included 77.6% of the BUSCO (Simão ) conserved orthologous; while far from representing the complete grape transcriptome, the ISNT dataset included a remarkably high fraction of these conserved genes, particularly considering that ISNT was constructed using expression data from berries only. Interestingly, 301 BUSCO complete gene models (20%) were found in multiple copies in the ISNT, suggesting that alternative isoforms of these highly conserved genes are expressed during ripening. Putative functions were assigned to the ISNT transcripts as described in the Methods section (Table S11). Only four sequences did not match any sequence in the databases used. Gene Ontology (GO) terms were assigned to 23,386 transcripts with an average of ∼6.9 GO terms per transcript (Table S11, File S6-S8).

Isoform sequencing allows the discovery of private Cabernet Sauvignon genes

Previous analyses of gene content in a limited number of grape cultivars showed that up to 10% of grape isoforms were not shared between genotypes. Some of these “dispensable” genes were associated with cultivar-specific characteristics (Da Silva ). To identify protein-coding transcripts characteristic of Cabernet Sauvignon (i.e., private genes), we looked for homologous sequences among the ISNT transcripts in the PN40024 genome (Jaillon ; Vitulo ) and in the transcriptomes of Corvina (Venturini ), Tannat (Da Silva ), and Nebbiolo (Gambino ). Approximately five percent of the ISNT (1,501 isoforms) did not have a homologous copy in any of the four datasets (Table S12). These putative Cabernet Sauvignon private isoforms were involved in various biological processes of berry development and ripening like phenylpropanoid/flavonoid biosynthesis (a chalcone synthase, a flavanone 3-hydroxylase, and a flavonoid 3′-hydroxylase (Falginella )), sugar accumulation and transport (ten sucrose-phosphate synthases, a phosphofructokinase, a glucose-6-phosphate dehydrogenase, a sucrose transport SUC4-like, a polyol transporter, and an inositol transporter (Afoufa-Bastien ; Xin )), water transport (eleven aquaporins), and cell wall metabolism and loosening (six cellulose synthases, a xyloglucan galactosyltransferase, one xyloglucan glycosyltransferase 9-like, two expansins, two xyloglucan endotransglucosylase/hydrolases, two pectinesterases, a pectin methylesterase/invertase inhibitor, seven polygalacturonases, and two β-galactosidases (Carey ; Cosgrove 2000, 2005)).

Protein-coding gene models in the Cabernet Sauvignon genomic assembly

To evaluate the non-redundant Iso-Seq transcriptome’s completeness and usefulness as a reference for RNA-seq analysis, the protein-coding genes in the Cabernet Sauvignon genome were predicted as described in Figure S3. First, the repetitive regions of the genome were masked using a custom-made library of Cabernet Sauvignon MITE, LTR, and TRIM information. Overall, ∼51% of the assembly consisted of repetitive elements (Table S6), with 412,994 repetitive elements on the primary assembly (313 Mb) and 274,123 on the haplotigs (177Mb), LTRs were the most abundant class, covering over 335 Mb of the genomic sequences, with Gypsy and Copia families accounting for 201 Mb and 104 Mb, respectively. Next, MAKER-P (Campbell ) identified putative protein-coding loci, combining the results of six ab initio predictors trained ad hoc with publicly available experimental evidences. Ab initio predictors were trained using a custom set of 4,000 randomly selected gene models out of the 5,636 high-quality, non-redundant, and highly conserved gene models of the PN40024 V1 transcriptome (4,459 multiexonic and 1,177 monoexonic). Experimental evidence from public databases (Table S7) was incorporated and used to validate the predicted models. The final MAKER-P prediction included 38,227 high-quality gene models (AED < 0.5) on the primary contigs and 26,789 on the associated haplotigs. Using the covariance models from the Rfam database, 5,780 non-overlapping putative long non-coding RNAs (ncRNAs; 3,239 on primary contigs and 2,541 on haplotigs) belonging to 275 different families were annotated (Table S9). Gene models were further improved using the information from all Iso-Seq full-length datasets (PCIRs, C-FLNC, FLNC), RNA-seq, and the publicly available grapevine transcriptome assemblies. This final refinement improved the annotation of the UTRs and added isoform information. PCIRs helped identify 155 new loci not detected by MAKER-P, update the structure of 10,801 gene models, and add 2,712 alternative transcripts. C-FLNC reads introduced 830 additional missing loci and added 3,738 alternative transcripts to the annotation. Together, 14,388 gene models were updated. FLNC reads introduced 14 new loci and 20,493 alternative transcripts, bringing the number of updated model structures to 24,945. Predicted genes without similarity to known proteins in the RefSeq database and without any functional domains identified by InterProScan (Jones ) were removed. The final predicted transcriptome included 55,887 transcripts on 36,689 loci on primary contigs and 40,444 transcripts on 25,479 loci on haplotigs (Table S8). GO terms were assigned to 80,752 transcripts (83.8%) based on homology with protein domains in RefSeq and InterPro databases (Table S13, File S9-S11). A genome browser for Cabernet Sauvignon, its annotation, and an associated blast tool are available at http://cantulab.github.io/data. For 2,995 (11%) of the 25,479 protein-coding gene loci identified on the haplotigs, we could not find a corresponding homolog in the primary contigs, likely reflecting the diversity in gene content between parental genomes (Cabernet Franc and Sauvignon Blanc (Bowers and Meredith 1997)), as in Corvina (2,321 transcripts; Venturini ), Tannat (1,873 transcripts; Da Silva ), and Nebbiolo (3,961 transcripts; Gambino ). We could not find homologous genes in PN40024, Corvina, Tannat, and Nebbiolo for 1,714 protein-coding gene loci (Table S14). Those genes included likely members of the phenylpropanoid/flavonoid biosynthesis pathway: four phenylalanine ammonia-lyase, three chalcone synthases, a chalcone isomerase, a dihydroflavonol-4-reductase, a flavanone 3-hydroxylase, a flavonoid 3′,5′-hydroxylase, a leucoanthocyanidin dioxygenase, and an anthocyanin acyltransferase. Other private Cabernet Sauvignon genes were associated with terpenoid biosynthesis, including six valencene synthases that may play a role in grapevine flower aroma, three vinorine synthases (Lücker ; Martin ), and a (E,E)-geranyllinalool synthase. The incorporation of Iso-Seq data in the gene prediction pipeline also allowed the structural annotation of alternative transcripts. Twenty five percent (15,509) of the 62,168 annotated gene loci had two or more alternative isoforms, an average of 1.55 ± 1.29 alternative transcripts per locus, confirming previous reports in PN40024 (Vitulo ). The frequency of splicing variant types was similarly observed in other plant species (Reddy ). Intron retention was the most abundant type, accounting for over 44% (File S12), similar to what was observed for rice (45-55%; Zhang ), Arabidopsis (30-64%; Marquez ; Reddy ; Zhang ) and maize (40-58%; Zhang ; Wang ). Alternative acceptor sites (13%), alternative donor sites (10%), and exon skipping (8%) were the other most abundant types of alternative splicing found in the Cabernet Sauvignon genome; a full description of the selected splicing events is reported in File S12.

Iso-Seq transcriptome as reference for RNA-seq analysis

The final step of the analysis was to evaluate the effectiveness of the reconstructed ISNT as reference for RNA-seq analysis of berry development compared to the gene space predicted on the Cabernet Sauvignon genome. Comparisons between the predicted transcripts and the reconstructed ISNT as references for RNA-seq are summarized in Table 1. Only about three percent more RNA-seq reads mapped on the Cabernet Sauvignon predicted transcriptome (90.6 ± 0.8%) than on the ISNT (87.2 ± 0.8%), suggesting that Iso-Seq reconstructed most of the transcripts detectable by RNA-seq at a coverage of ∼26 M reads/sample. Approximately 75% of the ISNT (21,494 transcripts) and ∼49% of the predicted gene space (30,501 gene loci) was detected as expressed (mean RPKM ≥ 1) in at least one stage (Figure 3A, Table S15-S16). In both datasets, the number of expressed genes was slightly higher at pre-véraison stage than at later developmental stages, consistent with previous observations of ripening Cabernet Sauvignon berries (Fasoli et al. 2018). For both datasets, Pearson’s correlation matrix and Principal Component Analysis (PCA) showed a clear distinction between pre-véraison stage and the three ripening stages, as well as a stronger correlation between post-véraison and full-ripe berry transcriptomes (Figure 4), confirming the well-known transcriptional reprogramming associated with the onset of ripening (Fasoli ; Massonnet ) and suggesting that similar global transcriptomic dynamics of berry development can be obtained using either Iso-Seq or the whole genome as reference. We then applied a sequence clustering approach to define associations between ISNT isoforms and gene loci to directly compare the expression values of each gene in the two transcriptomes. Based on reciprocal overlap of the alignment, we were able to associate 25,306 ISNT transcripts with 26,873 gene loci in the Cabernet Sauvignon genome (Table S10). Gene expression levels measured on the two transcriptomes were well-correlated (R = 0.92; P-value < 2.2 e-16; Figure 3B; Tables S17-18). Differential gene expression analysis identified 14,477 ISNT transcripts and 18,600 Cabernet Sauvignon genes significantly differentially expressed (PPDE ≥ 0.95) at least once during berry development (Table S15-S16). More genes were differentially regulated between pre-véraison and véraison than during ripening for both transcriptomes, (Figure 5A & B) as previously observed (Palumbo ; Massonnet ). Ninety one percent of the differentially expressed ISNT isoforms were also differentially expressed when RNA-seq data were mapped on genomic loci. Similar relative amounts of Biological Process GO terms among differentially expressed genes were observed between the two transcriptomes (Figure 5C & D). Interestingly, 302 Cabernet Sauvignon private isoforms (transcripts not found in other cultivars) were differentially expressed during berry development, including transcripts encoding a polyol transporter, an inositol transporter, and five aquaporins.

Table 1

Summary of the RNA-seq results when Cabernet Sauvignon berry ISNT and the Cabernet Sauvignon genome were used as reference for short-read mapping

	Iso-Seq transcriptome	Cabernet Sauvignon genome
Mapping rate	87.2 ± 0.8%	90.6 ± 0.8%
Expressed features (mean RPKM ≥ 1 at least at one growth stage)	21,494 isoforms (74.8%)*	30,501 gene loci (49%)**
Expressed features with BP GO term annotation	14,431 isoforms	21,588 gene loci
Total BP GO terms among expressed features	61,603 GO terms	80,103 GO terms
DE features (PPDE ≥ 0.95 and mean RPKM ≥ 1 at least at one growth stage)	14,477 isoforms (50.4%)*	18,600 gene loci (29.9%)**
DE features with BP GO term annotation	10,237 isoforms	14,026 gene loci
Total BP GO terms among DE features	44,179 GO terms	53,349 GO terms

Biological process, BP; differentially expressed, DE.

Percentage of ISNT.

Percentage of Cabernet Sauvignon predicted transcriptome.

Figure 3

Gene expression analysis during berry development using the ISNT as reference and comparison with the Cabernet Sauvignon genome. (A) Number of ISNT transcripts (ISNTs) and gene loci expressed overall and at each ripening stage. Abbreviation of berry growth stages: PRV, Pre-véraison; V, Véraison; PV, Post- véraison; H, harvest. (B) Scatterplot showing the correlation between the gene expression level (log2(RPKM + 1)) of ISNT isoforms and Cabernet Sauvignon (CS) gene loci. Line of best fit and correlation coefficient factor (R) are provided.

Figure 4

Global transcriptomic changes during berry development ripening using the ISNT and Cabernet Sauvignon gene space as references. (A, B) Pearson correlation matrices of the 16 berry transcriptomes. (C, D) PCA plots of the 16 berry transcriptomes. Each point represents a biological replicate and ellipses define confidence areas (95%) for each berry growth stage. All analyses were performed using the log2-transformed RPKM values of expressed features (mean RPKM ≥ 1 at least at one stage).

Figure 5

Comparison of the differentially expressed (DE) ISNT transcripts and Cabernet Sauvignon gene loci during berry development. (A, B) Number of up- and down-regulated genes between each pairwise comparison of developmental stages (PPDE ≥ 0.95 and mean RPKM ≥ 1 at least at one stage). (C, D) Relative amount of Biological Process Gene Ontology (GO) terms among differentially expressed genes. Abbreviation of berry growth stages: PRV, Pre-véraison; V, Véraison; PV, Post- véraison; H, harvest.

Biological process, BP; differentially expressed, DE. Percentage of ISNT. Percentage of Cabernet Sauvignon predicted transcriptome. Gene expression analysis during berry development using the ISNT as reference and comparison with the Cabernet Sauvignon genome. (A) Number of ISNT transcripts (ISNTs) and gene loci expressed overall and at each ripening stage. Abbreviation of berry growth stages: PRV, Pre-véraison; V, Véraison; PV, Post- véraison; H, harvest. (B) Scatterplot showing the correlation between the gene expression level (log2(RPKM + 1)) of ISNT isoforms and Cabernet Sauvignon (CS) gene loci. Line of best fit and correlation coefficient factor (R) are provided. Global transcriptomic changes during berry development ripening using the ISNT and Cabernet Sauvignon gene space as references. (A, B) Pearson correlation matrices of the 16 berry transcriptomes. (C, D) PCA plots of the 16 berry transcriptomes. Each point represents a biological replicate and ellipses define confidence areas (95%) for each berry growth stage. All analyses were performed using the log2-transformed RPKM values of expressed features (mean RPKM ≥ 1 at least at one stage). Comparison of the differentially expressed (DE) ISNT transcripts and Cabernet Sauvignon gene loci during berry development. (A, B) Number of up- and down-regulated genes between each pairwise comparison of developmental stages (PPDE ≥ 0.95 and mean RPKM ≥ 1 at least at one stage). (C, D) Relative amount of Biological Process Gene Ontology (GO) terms among differentially expressed genes. Abbreviation of berry growth stages: PRV, Pre-véraison; V, Véraison; PV, Post- véraison; H, harvest.

Discussion

Full-length cDNA sequencing with SMRT technology (Iso-Seq) can be used to rapidly reconstruct the grape berry transcriptome, enabling the identification of cultivar-specific isoforms, refinement of the Cabernet Sauvignon genome annotation, and the creation of a reference for transcriptome-wide expression profiling. In contrast to transcriptome reconstruction using short-read sequencing that requires de novo assembly, Iso-Seq delivers full-length transcripts that eliminate the introduction of assembly errors and artifacts like chimeric transcripts and incomplete fragments due to PolyA capture (Chang ; Huang ; Moreton ; Smith-Unna ; Geniza and Jaiswal 2017; Ungaro ). The incorporation of high-coverage short-read sequencing is still necessary to benefit from the complete transcript sequencing enable by Iso-Seq. Although Iso-Seq provides much longer reads than second-generation sequencing platforms and as a result is excellent in resolving transcript structure, its sequencing error rate is high (10–20%) and throughput is still relatively low (Giordano ; Zimin ; Koren ). Here we show that combining Iso-Seq with Illumina sequencing at high coverage enables expression profiling and sequence error correction of Iso-Seq reads, particularly those derived from low-expression genes. The clustering analysis of the SMRT link pipeline discarded ∼18.5% of the FLNC reads, likely caused by low sequence accuracy. To overcome this technical issue, we applied a hybrid error correction pipeline consisting in performing the error correction of the unclustered FLNC reads, followed by an additional clustering step of both to resolve redundancies. Error correction with Illumina reads recovered a significant amount of Iso-Seq reads that would have otherwise been removed by the standard Iso-Seq pipeline, highlighting the importance of integrating multiple sequencing technologies with complementary features (Koren ; Au ; Salmela and Rivals 2014; Hu ). Transcriptome reconstruction has been widely used to develop references for genome-wide expression profiling in the absence of an annotated genome assembly (Simon ; Garber ; Martin and Wang 2011; Yang and Kim 2015). Though a genome reference is available for grape, transcriptome reconstruction overcomes the limitations of a cultivar-specific reference that lacks the gene content of other cultivars. Although cultivar-specific genes appear nonessential for berry development, those private genes could contribute to cultivar characteristics. For example, the wine grape Tannat accumulates unusually high levels of polyphenols in the berry; its cultivar-specific genes account for more than 80% of the expression of phenolic and polyphenolic compound biosynthetic enzymes (Da Silva ). De novo transcriptome assembly from short RNA-seq reads has been used to explore the gene content diversity in Tannat (Da Silva ), Corvina (Venturini ), and Nebbiolo (Gambino ). Iso-Seq identified 1,501 Cabernet Sauvignon transcripts expressed during berry development that were found in neither the genome of PN40024 nor the transcriptomes of Tannat, Nebbiolo and Corvina. Some private Cabernet Sauvignon transcripts have functions potentially associated with traits characteristic of Cabernet Sauvignon grapes and wines like their color and sugar content. These transcripts included three sugar transporter-coding genes, which could be involved in the accumulation of glucose and fructose during berry ripening (Lecourieux ), and a chalcone synthase, a flavanone 3-hydroxylase, and a flavonoid 3′-hydroxylase, all involved in the flavonoid pathway. Chalcone synthases catalyze the first committed step of the flavonoid biosynthesis pathway (Sparvoli ), which produces different classes of metabolites in grape berry, including flavonols (yellow pigments), flavan-3-ols and proanthocyanidins (mouth-feel and smooth sensory perceptions), and anthocyanins (red, purple, and blue pigments). In addition, products of the flavonoid 3′-hydroxylase can lead to the synthesis of cyanidin-3-glucoside, a red anthocyanin (Castellarin ). The analysis of the gene space in the genome assembly showed that private Cabernet Sauvignon genes identified using Iso-Seq are only a fraction of the private Cabernet Sauvignon transcriptome. As in other transcriptome reconstruction methods, Iso-Seq can only identify transcripts that are expressed in the organs and developmental stages used for RNA sequencing. Obtaining the full set of private transcripts without genome assembly would require sequencing additional organs and developmental stages. In addition, it is challenging to differentiate isoforms derived from close paralogous genes, alleles of the same gene, and alternative splicing variants, in any transcriptome obtained by RNA sequencing (including Iso-Seq); this potentially leads to an overestimation of the genes in the final transcriptome reference. This study could not resolve isoform redundancy in the final transcriptome for about 37% of the gene loci in the Cabernet Sauvignon genome. This is a limitation of Iso-Seq as well as of all transcriptome references that cannot be overcome without a complete genome assembly. In this study, we tested whether the transcriptome reconstructed using Iso-Seq can be used for expression profiling. Only an approximately 3% difference in read alignment between ISNT and the genome reference was observed, implying that at high coverage, ISNT detects almost all genes expressed during berry development. The slight difference in mapping rate between the two references can be explained by either the absence of some low-expression transcripts in the ISNT or the residual error rate in isoform sequences. Gene expression analysis using the ISNT as reference showed similar results compared to the Cabernet Sauvignon genome assembly, with a very high correlation of expression level and differential gene expression, and with similar global transcriptomic changes. However, we observed differences in the number of expressed and differentially expressed features that depend on the reference used. Those differences could be explained by the diploid phasing of the Cabernet Sauvignon genome assembly and that multiple ISNT transcripts might correspond to a single gene locus. Nonetheless, similar relative amounts of Biological Process GO terms were found among the differentially expressed genes, confirming that the transcriptome obtained using Iso-Seq captured the transcriptional reprogramming underlying the main physiological and biochemical changes during grape berry development. In addition, gene expression analysis revealed that some private isoforms (20%) are significantly modulated during berry development, indicating that in addition to identifying the private gene space, the ISNT reference makes it possible to observe its expression. In conclusion, this study demonstrates that Iso-Seq data can be used to create and refine a comprehensive reference transcriptome that represents most genes expressed in a tissue undergoing extensive transcriptional reprogramming during development. In grapes, this approach can aid developing transcriptome references and is particularly valuable given diverse cultivars with private transcripts and accessions that are genetically distant from available genome references, like the non-vinifera Vitis species used as rootstocks or for breeding. The pipeline described here can be useful in efforts to reconstruct the gene space in plant species with large and complex genomes still unresolved.

128 in total

1. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors: Weizhong Li; Adam Godzik
Journal: Bioinformatics Date: 2006-05-26 Impact factor: 6.937

2. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla.

Authors: Olivier Jaillon; Jean-Marc Aury; Benjamin Noel; Alberto Policriti; Christian Clepet; Alberto Casagrande; Nathalie Choisne; Sébastien Aubourg; Nicola Vitulo; Claire Jubin; Alessandro Vezzi; Fabrice Legeai; Philippe Hugueney; Corinne Dasilva; David Horner; Erica Mica; Delphine Jublot; Julie Poulain; Clémence Bruyère; Alain Billault; Béatrice Segurens; Michel Gouyvenoux; Edgardo Ugarte; Federica Cattonaro; Véronique Anthouard; Virginie Vico; Cristian Del Fabbro; Michaël Alaux; Gabriele Di Gaspero; Vincent Dumas; Nicoletta Felice; Sophie Paillard; Irena Juman; Marco Moroldo; Simone Scalabrin; Aurélie Canaguier; Isabelle Le Clainche; Giorgio Malacrida; Eléonore Durand; Graziano Pesole; Valérie Laucou; Philippe Chatelet; Didier Merdinoglu; Massimo Delledonne; Mario Pezzotti; Alain Lecharny; Claude Scarpelli; François Artiguenave; M Enrico Pè; Giorgio Valle; Michele Morgante; Michel Caboche; Anne-Françoise Adam-Blondon; Jean Weissenbach; Francis Quétier; Patrick Wincker
Journal: Nature Date: 2007-08-26 Impact factor: 49.962

Review 3. Short-read sequencing technologies for transcriptional analyses.

Authors: Stacey A Simon; Jixian Zhai; Raja Sekhar Nandety; Kevin P McCormick; Jia Zeng; Diego Mejia; Blake C Meyers
Journal: Annu Rev Plant Biol Date: 2009 Impact factor: 26.379

Review 4. Computational methods for transcriptome annotation and quantification using RNA-seq.

Authors: Manuel Garber; Manfred G Grabherr; Mitchell Guttman; Cole Trapnell
Journal: Nat Methods Date: 2011-05-27 Impact factor: 28.547

5. The grapevine expression atlas reveals a deep transcriptome shift driving the entire plant into a maturation program.

Authors: Marianna Fasoli; Silvia Dal Santo; Sara Zenoni; Giovanni Battista Tornielli; Lorenzo Farina; Anita Zamboni; Andrea Porceddu; Luca Venturini; Manuele Bicego; Vittorio Murino; Alberto Ferrarini; Massimo Delledonne; Mario Pezzotti
Journal: Plant Cell Date: 2012-09-04 Impact factor: 11.277

6. Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis.

Authors: Yamile Marquez; John W S Brown; Craig Simpson; Andrea Barta; Maria Kalyna
Journal: Genome Res Date: 2012-03-05 Impact factor: 9.043

7. Gene identification in novel eukaryotic genomes by self-training algorithm.

Authors: Alexandre Lomsadze; Vardges Ter-Hovhannisyan; Yury O Chernoff; Mark Borodovsky
Journal: Nucleic Acids Res Date: 2005-11-28 Impact factor: 16.971

8. Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding.

Authors: Shubha Vij; Heiner Kuhl; Inna S Kuznetsova; Aleksey Komissarov; Andrey A Yurchenko; Peter Van Heusden; Siddharth Singh; Natascha M Thevasagayam; Sai Rama Sridatta Prakki; Kathiresan Purushothaman; Jolly M Saju; Junhui Jiang; Stanley Kimbung Mbandi; Mario Jonas; Amy Hin Yan Tong; Sarah Mwangi; Doreen Lau; Si Yan Ngoh; Woei Chang Liew; Xueyan Shen; Lawrence S Hon; James P Drake; Matthew Boitano; Richard Hall; Chen-Shan Chin; Ramkumar Lachumanan; Jonas Korlach; Vladimir Trifonov; Marsel Kabilov; Alexey Tupikin; Darrell Green; Simon Moxon; Tyler Garvin; Fritz J Sedlazeck; Gregory W Vurture; Gopikrishna Gopalapillai; Vinaya Kumar Katneni; Tansyn H Noble; Vinod Scaria; Sridhar Sivasubbu; Dean R Jerry; Stephen J O'Brien; Michael C Schatz; Tamás Dalmay; Stephen W Turner; Si Lok; Alan Christoffels; László Orbán
Journal: PLoS Genet Date: 2016-04-15 Impact factor: 5.917

Review 9. Analysis of Whole Transcriptome Sequencing Data: Workflow and Software.

Authors: In Seok Yang; Sangwoo Kim
Journal: Genomics Inform Date: 2015-12-31

10. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

Authors: Aleksey V Zimin; Daniela Puiu; Ming-Cheng Luo; Tingting Zhu; Sergey Koren; Guillaume Marçais; James A Yorke; Jan Dvořák; Steven L Salzberg
Journal: Genome Res Date: 2017-01-27 Impact factor: 9.043

27 in total

1. A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis.

Authors: Runxuan Zhang; Richard Kuo; Max Coulter; Cristiane P G Calixto; Juan Carlos Entizne; Wenbin Guo; Yamile Marquez; Linda Milne; Stefan Riegler; Akihiro Matsui; Maho Tanaka; Sarah Harvey; Yubang Gao; Theresa Wießner-Kroh; Alejandro Paniagua; Martin Crespi; Katherine Denby; Asa Ben Hur; Enamul Huq; Michael Jantsch; Artur Jarmolowski; Tino Koester; Sascha Laubinger; Qingshun Quinn Li; Lianfeng Gu; Motoaki Seki; Dorothee Staiger; Ramanjulu Sunkar; Zofia Szweykowska-Kulinska; Shih-Long Tu; Andreas Wachter; Robbie Waugh; Liming Xiong; Xiao-Ning Zhang; Ana Conesa; Anireddy S N Reddy; Andrea Barta; Maria Kalyna; John W S Brown
Journal: Genome Biol Date: 2022-07-07 Impact factor: 17.906

2. Resolving the Microalgal Gene Landscape at the Strain Level: a Novel Hybrid Transcriptome of Emiliania huxleyi CCMP3266.

Authors: Martin Sperfeld; Dayana Yahalomi; Einat Segev
Journal: Appl Environ Microbiol Date: 2021-11-10 Impact factor: 5.005

3. Long-read RNA sequencing reveals widespread sex-specific alternative splicing in threespine stickleback fish.

Authors: Alice S Naftaly; Shana Pau; Michael A White
Journal: Genome Res Date: 2021-06-15 Impact factor: 9.043

4. VviERF6Ls: an expanded clade in Vitis responds transcriptionally to abiotic and biotic stresses and berry development.

Authors: Haley S Toups; Noé Cochetel; Dennis Gray; Grant R Cramer
Journal: BMC Genomics Date: 2020-07-09 Impact factor: 3.969

5. Diploid Genome Assembly of the Wine Grape Carménère.

Authors: Andrea Minio; Mélanie Massonnet; Rosa Figueroa-Balderas; Alvaro Castro; Dario Cantu
Journal: G3 (Bethesda) Date: 2019-05-07 Impact factor: 3.154

6. Linking Terpene Synthases to Sesquiterpene Metabolism in Grapevine Flowers.

Authors: Samuel Jacobus Smit; Melané Alethea Vivier; Philip Richard Young
Journal: Front Plant Sci Date: 2019-02-21 Impact factor: 5.753

7. The Impact of cDNA Normalization on Long-Read Sequencing of a Complex Transcriptome.

Authors: Nam V Hoang; Agnelo Furtado; Virginie Perlo; Frederik C Botha; Robert J Henry
Journal: Front Genet Date: 2019-07-23 Impact factor: 4.599

8. Unraveling the Deep Genetic Architecture for Seedlessness in Grapevine and the Development and Validation of a New Set of Markers for VviAGL11-Based Gene-Assisted Selection.

Authors: Nallatt Ocarez; Nicolás Jiménez; Reynaldo Núñez; Rocco Perniola; Antonio Domenico Marsico; Maria Francesca Cardone; Carlo Bergamini; Nilo Mejía
Journal: Genes (Basel) Date: 2020-01-30 Impact factor: 4.096

Review 9. A Bioinformatics Guide to Plant Microbiome Analysis.

Authors: Rares Lucaciu; Claus Pelikan; Samuel M Gerner; Christos Zioutis; Stephan Köstlbacher; Harald Marx; Craig W Herbold; Hannes Schmidt; Thomas Rattei
Journal: Front Plant Sci Date: 2019-10-23 Impact factor: 5.753

10. Rootstock influences the effect of grapevine leafroll-associated viruses on berry development and metabolism via abscisic acid signalling.

Authors: Amanda M Vondras; Larry Lerno; Mélanie Massonnet; Andrea Minio; Adib Rowhani; Dingren Liang; Jadran Garcia; Daniela Quiroz; Rosa Figueroa-Balderas; Deborah A Golino; Susan E Ebeler; Maher Al Rwahnih; Dario Cantu
Journal: Mol Plant Pathol Date: 2021-06-01 Impact factor: 5.663