Literature DB >> 30158558

A gene-rich fraction analysis of the Passiflora edulis genome reveals highly conserved microsyntenic regions with two related Malpighiales species.

Carla Freitas Munhoz1, Zirlane Portugal Costa1, Luiz Augusto Cauz-Santos1, Alina Carmen Egoávil Reátegui1, Nathalie Rodde2, Stéphane Cauet2, Marcelo Carnier Dornelas3, Philippe Leroy4, Alessandro de Mello Varani5, Hélène Bergès2, Maria Lucia Carneiro Vieira6.   

Abstract

Passiflora edulis is the most widely cultivated species of passionflowers, cropped mainly for industrialized juice production and fresh fruit consumption. Despite its commercial importance, little is known about the genome structure of P. edulis. To fill in this gap in our knowledge, a genomic library was built, and now completely sequenced over 100 large-inserts. Sequencing data were assembled from long sequence reads, and structural sequence annotation resulted in the prediction of about 1,900 genes, providing data for subsequent functional analysis. The richness of repetitive elements was also evaluated. Microsyntenic regions of P. edulis common to Populus trichocarpa and Manihot esculenta, two related Malpighiales species with available fully sequenced genomes were examined. Overall, gene order was well conserved, with some disruptions of collinearity identified as rearrangements, such as inversion and translocation events. The microsynteny level observed between the P. edulis sequences and the compared genomes is surprising, given the long divergence time that separates them from the common ancestor. P. edulis gene-rich segments are more compact than those of the other two species, even though its genome is much larger. This study provides a first accurate gene set for P. edulis, opening the way for new studies on the evolutionary issues in Malpighiales genomes.

Entities:  

Mesh:

Year:  2018        PMID: 30158558      PMCID: PMC6115403          DOI: 10.1038/s41598-018-31330-8

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

The Passifloraceae family belongs to the Malpighiales order and is a member of the Rosids clade, according to classical and molecular phylogenetic analysis. The family consists of 700 species, classified in 16 genera. The majority of species belong to the genus Passiflora (~530 species), popularly known as passion fruits[1]. This genus is widely distributed in tropical and subtropical regions of the Neotropics. Approximately 150 species are native to Brazil, which is acknowledged to be an important centre of diversity[2]. Among the American tropical species of Passiflora, 60 fruit-bearing species are marketed for human consumption. Moreover, several species and hybrids have been produced for ornamental purposes (see www.passiflora.it;)[3], and pharmacologists have found that passion fruit vines contain bioactive compounds that are used in traditional folk medicines as anxiolytics and antispasmodics[4]. Passiflora edulis is the major species of passionflowers grown for fresh fruit consumption and juice production in climates ranging from cool subtropical (purple variety) to warm tropical (yellow variety). Species grown particularly in Brazil include P. edulis (sour passion fruit) and P. alata (sweet passion fruit). Because of the quality of its fruit and yield for processing into commercial juices, P. edulis is grown in 90% of the commercial orchards. The most recent agricultural production survey showed that 58,089 hectares were planted with passion fruits, yielding 838,444 tons per year[5]. P. edulis is a diploid (2n = 18)[6], self-incompatible species[7,8], with perfect, insect-pollinated flowers. Over the last two decades, our research group has carried out studies for estimating the genetic parameters of experimental populations[9], as well as constructing genetic maps[10,11] and mapping quantitative loci associated with the response to Xanthomonas axonopodis infection[12]. Munhoz and co-workers were able to determine which gene expression patterns were significantly modulated during the P. edulis-X. axonopodis interaction[13]. Despite its commercial success, little is known about the genome structure of P. edulis. The genome size has been estimated at ~1,230 Mb (1 C DNA content = 1.258 pg by flow cytometric analysis)[14]. To fill in this gap in our knowledge, a large-insert genomic BAC (Bacterial Artificial Chromosomes) library was built and denoted Ped-B-Flav (https://cnrgv.toulouse.inra.fr/library/genomic_resource/Ped-B-Flav). It contains 83,000 clones, which are kept at the National Centre for Plant Genomic Resources (CNRGV: cnrgv.toulouse.inra.fr) at INRA in Toulouse, France. In addition, previous studies provided initial insights into the P. edulis genome using BAC-end sequence (BES) data as a major resource[15], and described the structural organization of the plant’s chloroplast genome, which differs from that of various Malpighiales species due to rearrangement events[16]. Although based on small-sized sequences, BAC-end sequences can be mapped to intervals of sequenced related genomes[17] in order to identify collinear microsyntenic regions as a preliminary step towards selecting clones for full sequencing, which can be done with high accuracy using the single-molecule real-time (SMRT) sequencing (Pacific Biosciences). This method produces long, unbiased sequences that, in turn, facilitate subsequent assembly[18], a critical step in plants due to the high proportion of repetitive sequences throughout their genomes[19]. Most of the projects aimed at obtaining a draft or a complete plant genome were performed using large-insert based sequencing methods[20,21] to allow estimation of the number of genes, and abundance of transposable elements and microsatellites. In the functional part of the genome in particular, the annotation of large-inserts can provide an arsenal of biological information to facilitate comparison against databases and, in addition, to determine the distribution of BAC inserts relative to related genomes in order to examine the degree of synteny between them and gain insights into evolutionary relationships[22,23]. In this scenario, the P. edulis genome is continuing to be studied based on the large-insert BAC library and using the SMRT sequencing platform to completely sequence over 100 inserts of BAC clones. These clones were pre-selected based on BES microsynteny results and probes homologous to transcripts from a subtractive library of P. edulis in response to Xanthomonas axonopodis infection, which allowed us to obtain a gene-rich fraction of this genome. The repetitive content, predicted genes, and coding sequences were annotated. Also, microsyntenic regions of P. edulis common to Populus trichocarpa (Salicaceae, 485 Mb[24]) and Manihot esculenta (Euphorbiaceae, 742 Mb[25]), two related Malpighiales species with available fully sequenced and well-annotated genomes, were identified.

Material and Methods

BAC Selection and DNA Preparation

BAC clones were selected from the findings of Santos et al.[15], which provides an initial overview of the P. edulis genome using BAC-end sequence (BES) data as a major resource. The results of comparative mapping between P. edulisBES and the reference genomes of Arabidopsis thaliana, Populus trichocarpa and Vitis vinifera were also used to choose BAC clones for sequencing. In addition, based on BES functional annotation results, the BAC-inserts with coding sequences (CDS) in one or both BESs were also selected. A second selection procedure was performed after screening the genomic library using the probes homologous to P. edulis transcripts described in[13]. Briefly, the authors used suppression subtractive hybridization to construct two cDNA libraries enriched for transcripts induced and repressed by Xanthomonas axonopodis, respectively, 24 h after inoculation with a highly virulent bacterial strain. The homologous probes were prepared via PCR, using as a template the genomic DNA from ‘IAPAR-123’, the accession used to construct the Ped-B-Flav BAC library. Specific primers were used to generate a single amplicon (200 to 600 bp in size) for each probe gene sequence. The ‘DecaLabel DNA Labeling Kit’ (Fermentas) was used for radiolabeling the probes. The amplification products were then purified with ‘Illustra ProbeQuantTM G-50 Micro Columns’ (GE Healthcare). The library was previously gridded onto macroarrays in which 41,472 clones were double-spotted on each 22 × 22 cm nylon membrane. These membranes were submerged in a bath of SSC (Saline-Sodium Citrate) solution (6×, 17 min., 50 °C); incubated overnight (68 °C) in hybridization buffer [6× SSC, 5× Denhardt’s Solution, 0.5% (w/v) SDS (Sodium Dodecyl Sulfate)]; hybridized with denatured probes (10 min, 95 °C; 1 min., cooled on ice); and washed twice in buffer 1 [2× SSC, 0.1% (w/v) SDS] (15 min., 50 °C) and buffer 2 [0.5× SSC, 0.1% (w/v) SDS] (30 min., 50 °C). Next, the hybridized membranes were placed in a film cassette for 24 h.; radioactive signals were detected using a PhosphorImagerTM and Storm 820 scanner (Amersham Biosciences) and analyzed using HDFR3 software, to identify the positive clones. Each positive clone was individually validated by PCR. In order to estimate insert sizes, the preserved cultures were scraped and a positive single colony of each BAC grown in a 96-well plate (overnight, 37 °C) containing 1200 µL of LB medium with chloramphenicol (12.5 µg/mL) and glycerol (6%). DNAs were then isolated using a NucleoSpin® 96 Flash (Macherey-Nagel) BAC DNA purification kit, digested with 5 U of FastDigest™ NotI enzyme (Fermentas) and size-fractioned by PFGE (6 V.cm−1, 5 to 15 s switch time, 16 h run time, 12.5 °C) in a Chef Mapper XA Chiller System 220 V (BioRad), followed by ethidium bromide staining and visualization. The insert sizes were determined by comparison with PFGE (pulsed-field gel electrophoresis) standard size markers. To prepare the DNA for sequencing, 1 μl of the above cultures was allowed to regrow in 20 mL of LB medium (plus 12.5 µg/mL chloramphenicol at 37 °C overnight) under shaking (250 rpm). The cultures were then mixed in pools, at a maximum of 20 clones per pool. DNA extraction was performed using the Nucleobond Xtra Midi Plus kit (Macherey-Nagel) according to the manufacturer’s instructions.

DNA Sequencing and Assembly From Long Sequence Reads

Approximately 5 µg of each pool was used for the construction of a SMRT library based on the standard Pacific Biosciences (San Francisco, CA, USA) preparation protocol for 10-kb libraries. Each pool was sequenced in one SMRT Cell using P6 polymerase in combination with C4 chemistry, following the manufacturer’s standard operating procedures and using the PacBio RS II long-read sequencer. Reads were assembled by a hierarchical genome assembly process (HGAP workflow)[26], and using the v2.2.0 SMRT® analysis software suite for HGAP implementation. Reads were first aligned by the PacBio long-read aligner or BLASR[27] against the complete genome of Escherichia coli, strain K12, substrain DH10B (GenBank: CP000948.1). The E. coli reads, as well as low quality reads (minimum read length of 500 bp and minimum read quality of 0.80) were removed from the data set. Filtered reads were then preassembled to yield long, highly accurate sequences. To perform this step, the smallest and the longest reads were separated from each other to correct errors by mapping single-pass reads to the longest reads (seed reads), which represent the longest portion of the read length distribution. Next, sequences were filtered against vector (BAC) sequences, and the Celera assembler used to assemble data and obtain draft assemblies. The last step was performed in order to significantly reduce the remaining indels and base substitution errors in the draft assembly. The Quiver algorithm was used for this purpose. This quality-aware consensus algorithm uses rich quality scores (Quality Value/QV scores) and QV is a per-base estimate of base accuracy. QV scores over 20 are from very good data with only 1% error probability. Finally, Quiver polishes the assembly for final consensus[26]. Once the refined assembly was obtained, each BAC-insert sequence was individualized by matching the end sequences to the pool of assembled sequences using BLAST. Read coverage was assessed by aligning the raw reads on the assembled sequences with BLASR.

Identification and Annotation of Repetitive Sequences

Eukaryotic genomes contain a substantial portion of repetitive elements which are organized into three main classes: dispersed repeats (mostly transposable elements and retrotransposed genes), local repeats (tandem repeats and simple sequence repeats or microsatellites) and segmental duplications (duplicated genomic fragments)[28]. It is highly recommended to identify and mask repetitive regions before gene prediction. Otherwise, unmasked repeats can produce spurious BLAST alignments, resulting in false evidence for gene annotations[29]. The v2.2 REPET package was used for de novo detection and annotation of transposable elements (TEs). The annotation process starts with self-alignment of the sequences by all-by-all comparison. Matching clusters are then identified based on the same cluster sequences in a given family. A consensus for each family is created, and each consensus is classified according to the structures and domains present. The last step entails annotating TE copies[30,31]. The resulting elements were then compared with sequences deposited in the Viridiplantae section of the Repbase repeat database[32]. They were classified by PASTEC, a tool for classifying TEs by searching for structural features and similarities[33] and implementing the hierarchical classification system proposed by[34]. Repeat masking was subsequently performed with RepeatMasker Open-3.0[35] using the library generated by the REPET and Repbase Viridiplantae dataset[32]. MISA[36] was used to search for microsatellites based on microsatellite sequences with at least 10 nucleotides in the repeat for mono-, 5 for di -, and 3 for tri-, tetra-, penta- or hexanucleotides. Composite microsatellites were also identified. They are formed by multiple, adjacent, repetitive motifs. Hence, a microsatellite is considered composite if it has a maximum interruption of 10 bp between motifs[37,38].

Gene Prediction and Functional Annotation

Evidence-driven gene prediction was performed based on gene models of Arabidopsis thaliana and Theobroma cacao and using the following software: Augustus[39], GlimmerHMM[40], GeneMark.hmm[41], and SNAP[42]. Ab initio gene finding was performed with the BRAKER pipeline[43]. Protein homology detection and potential intron resolution were detected by Exonerate software[44] against the annotated genomes of Populus trichocarpa, Salix purpurea, Ricinus communis and Manihot esculenta, downloaded from the Phytozome website[45]. These species are among the plant genomes with the highest number of top hits for P. edulis[15]. Additionally, a P. edulis RNA-seq library (see details below) was used to support gene model predictions. PASA[46] was used to produce alignment assemblies based on overlapping transcript alignments from P. edulis RNA-seq data. The results were combined by EVidence Modeler software[47], and PASA was used to update the EVidence Modeler consensus predictions, adding UTR annotations and models for alternatively spliced isoforms. Exon-intron boundaries were manually examined using GenomeView[48] and adjusted where necessary. RNA-seq reads (2 × 100 bp; Illumina HiSeq2000) were trimmed based on quality (Phred quality score >20). Contaminants, remaining adapters, and sequences (<50 bp) were removed using SeqyClean v1.9.9[49]. Total RNA-seq assembly was implemented by Trinity[50]. In brief, RNA-seq reads were derived from three libraries (each replicated three times) of shoot apexes of juvenile, vegetative and reproductive adult plants of P. edulis, constructed with the aim of performing comparisons of these three developmental stages (Dornelas M.C. et al., unpublished data). Functional annotation of the predicted gene sequences was performed using Blast2GO v3.2 tools[51] for assigning ontological terms in accordance with BLASTX results (e-value cut-off of 1 × 10−6). In addition, protein signature recognition was performed using the InterProScan tool[52].

Microsynteny Analysis

The 20 P. edulis BAC-inserts with the highest number of annotated genes were used for the identification of potential microsyntenic regions between P. edulis and Populus trichocarpa (Salicaceae), and P. edulis and Manihot esculenta (Euphorbiaceae), two related Malpighiales species with entirely sequenced and well-annotated genomes. P. edulis coding sequences were compared with these two genome sequences, available in the Phytozome database[45] using BLASTN. Based on the phylogenetic relationships among the Malpighiales species, we chose P. trichocarpa because it is the closest species to P. edulis. Taxonomically speaking, Passifloraceae appears as a sister group to Salicaceae. On the other hand, M. esculenta is the most distant species from P. edulis among those Malpighiales with fully sequenced and well-annotated genomes. To consider two genes as orthologs, the alignment had to show an e-value < 10−10 and coverage >50%. After identifying the orthologs, microsyntenic regions were defined. These are regions with more than four pairs of orthologous genes. All gene positions in the microsyntenic regions were recorded to construct comparative graphs. The analysis was carried out on JBrowse, (Phytozome v12.1 platform)[45] to search for genes exhibiting each P. edulis microsyntenic region and in the P. trichocarpa and M. esculenta genome. The initial and final positions of the orthologous genes and chromosome identification were used as a basis for constructing comparative graphs. Using the GenomeView browser[48], each of the microsyntenic regions was visualized and confirmed. Finally, comparative graphs were constructed using a graphics application.

Results

BAC Selection, Sequencing and Assembly

A total of 66 BAC inserts were selected for complete sequencing based on our previous BAC-end sequencing results[15], and 46 were selected using probes homologous to transcripts of P. edulis[53] (Supplementary Table S1). Thus, in total, 112 BAC inserts from the P. edulis genomic library were sequenced. The sequencing process resulted in 571,565 high quality reads, ranging from 500 to 46,831 bp in length. Sequences were between 24,316 and 142,456 bp in length, corresponding to their respective band sizes resolved by PFGE. The high quality of the long reads (QV > 47) and high coverage of the contigs (on average 278×) are indications of the reliability of our data (Supplementary Table S2), leading to the conclusion that all inserts were completely sequenced and assembled. The assembly, gene models, and genome browser are available at https://genomevolution.org/coge/GenomeInfo.pl?gid=52053. The sequencing method was of sufficient quality to provide a single contig per insert, with only two exceptions; in the assembly process, insert sequences Pe101K14 and Pe141H13 had overlapping regions that resulted in a single contig of 172,337 bp; similarly, Pe20N3 and Pe64C12 resulted in a single contig of 114,997 bp. In addition, of the 112 BAC insert sequences, three corresponded to organelle DNA, and therefore these sequences were not included. Thus, 107 sequences were subjected to annotation, totaling 10,401,671 bp (10.4 Mb) corresponding to approximately 1.0% of the P. edulis genome. GC content across this genome fraction was 41.09%, and in the CDS 46.49%.

Gene Representativeness, Structure and Functional Annotation

Structural sequence annotation resulted in the prediction of 1,883 genes ranging from 153 to 24,687 bp in length, with an average of 2,448 bp. These gene sequences represented 44% of the total sequenced nucleotides, corresponding to 4,608,830 bp. Intergenic regions covered from 0 (overlapped genes) to 92,497 bp, with a mean length of 3,184 bp. Between 3 and 36 predicted genes were identified per sequenced insert, with an average of 17.6 predicted genes per insert (Table 1, Supplementary Table S3). Taking into account the estimated size of the P. edulis genome (~1,230 Mb), the high number of genes identified herein (1,833) endorses the efficiency of the strategy used for selecting BAC-inserts that were supposedly gene-rich.
Table 1

Gene content in a gene-rich fraction of the Pasiflora edulis genome and structural annotation.

BAC codeNo. of predicted genesIntronless genesExons per geneGene length variation (bp)Average gene length (bp)Intergenic spacer length variation (bp)Average intergenic spacer length (bp)CDS length (bp)Average CDS length (bp)
Pe101K14 + 141H1336*172–17264–11,7782,72033–6,3122,070264–6,5761,187
Pe185D1136*122–12201–4,7781,54816–9,7301,802201–1,725689
Pe164B1829*92–19243–16,2792,31342–7,4491,316243–11,4091,393
Pe214H1129*42–39799–13,9563,857194–5,7281,134174–4,5721,636
Pe164D928*132–11198–5,8171,868114–5,8441,600156–2,2021,066
Pe186E1928*42–14770–7,4502,65111–13,5011,559210–2,3071,098
Pe43L227*32–18339–10,0972,718162–2,768973279–3,1231,145
Pe86F927132–5201–20,5011,622147–12,5072,776201–1,740607
Pe164K1726*42–13436–9,5023,03711–7,7751,761204–5,3341,310
Pe215I826*52–18312–8,2383,007230–13,3382,168180–3,5011,253
Pe75K1525142–5186–4,19385710–11,7212,951186–2,100591
Pe84I1425*62–12345–8,1183,01469–4,352936198–4,2751,295
Pe84M232552–13305–8,6522,75352–5,197998177–3,0181,168
Pe93M225*52–16399–7,0692,274135–11,9332,170192–2,9611,109
Pe171P1325*82–20461–9,7272,759158–15,9602,392330–4,0351,193
Pe207D1125*122–17213–6,7561,8965–20,5512,838213–2,730897
Pe93N724*52–11921–8,8893,12018–7,5881,421387–5,0851,486
Pe108C1624*82–14234–6,5531,89234–9,1132,209234–3,252974
Pe173B1624*62–32475–15,3903,079151–15,1272,134279–6,3751,523
Pe185J1624*42–21447–8,7732,432201–6,9242,083237–2,3671,035
Pe198H2324*82–6180–5,2791,9431–11,0082,681180–3,5101,143
Pe212I124*52–35234–12,6942,71553–15,1332,607234–3,5671,080
Pe93J92332–16615–6,1312,9073–9,0661,824201–3,3211,295
Pe135J122362–15162–9,5432,71481–8,7581,868162–4,4331,260
Pe195F42322–20261–8,3642,8439–11,1332,208177–5,4421,192
Pe74I62292–39204–17,6553,407146–6,1911,764204–4,3741,164
Pe84M182262–10321–8,1242,56322–15,2242,364321–4,3561,160
Pe101O42262–19624–9,7022,678315–10,4992,782300–2,235884
Pe141J232262–15189–9,2582,567608–12,0792,407189–2,550870
Pe201C1122112–17195–5,4521,865288–17,8914,128195–2,634822
Pe69G182132–22228–8,6582,95861–19,1042,304210–3,5821,192
Pe69H242122–14335–6,4612,752445–57052,306234–2,5591,142
Pe93K192132–12792–10,3733,523196–6,3221,422387–4,6291,593
Pe125I232152–14414–7,9932,52651–8,6592,406414–1,7761,106
Pe164A122172–11384–7,9642,35426–7,4061,675228–4,5031,050
Pe168B172132–11321–6,8612,50947–16,9324,619174–4,1401,234
Pe214A182172–11243–6,3141,944237–27,5863,916243–2,184924
Pe7M1520112–15213–9,0312,38812–17,4203,676213–3,4951,046
Pe28D1120162–4189–2,43078022–28,0735,567189–1,410558
Pe60G102062–24351–9,9252,51391–10,9472,767261–3,3781,291
Pe65F72082–14306–7,0811,97312–25,5392,702213–3,252844
Pe175N82082–27219–14,2452,94115–11,2372,495219–3,6631,299
Pe214N192092–13234–5,9131,59437–15,5983,485189–2,470788
Pe43D21932–8447–7,3382,601271–19,6332,158222–4,8721,120
Pe51C21952–16357–8,8893,603493–6,7562,110357–5,0881,520
Pe85B191972–18372–10,1152,85142–8,1032,368183–3,2281,157
Pe101P71932–20234–8,4843,74216–2,340963234–2,7121,247
Pe134H151982–11295–7,2902,527208–5,9532,351219–1,899844
Pe216F31922–37393–14,1513,198241–3,160914393–8,9431,626
Pe216F91952–13207–9,2743,547420–5,5732,107207–3,4171,180
Pe20N3 + Pe64C121852–12441–6,9412,557266–10,5192,009276–2,3641,223
Pe24G1918122–6165–3,8031,054184–22,1763,639165–1,593598
Pe69C71872–22210–8,5053,745132–18,0292,165210–4,1641,450
Pe69O161844–19590–17,6704,33986–1,976767177–14,5832,292
Pe212D71872–36171–21,1312,654415–20,0354,436171–9,3301,229
Pe27H1717132–3177–2,134620197–13,5114,390177–1,071464
Pe85I91752–12207–8,5782,908334–20,2102,892207–1,9561,107
Pe89E1017102–13183–4,327974342–18,5845,178174–1,794509
Pe101P131742–21666 –13,5524,43790–4,9411,072210–2,3071,261
Pe209G151732–14219–8,3533,108118–17,1052,754219–3,0841,416
Pe21O151672–13189–4,5701,512106–14,5723,633156–1,902595
Pe63J1816102–5441–6,9412,750266–10,5192,054213–3,429970
Pe84K81632–18162–12,3563,570178–4,8671,891162–2,2951,072
Pe93M416102–7216–3,06397215–37,5084,704216–1,998640
Pe117C1716112––12153–6,8529797–18,1685,302153–1,188414
Pe138G1716102–10178–6,1131,39540–13,3944,513178–2,934731
Pe141K81642–241,053–11,5924,060283–5,0912,179387–3,9751,653
Pe216B221612–151013–8,8153,93147–19,8623,119795–3,7681,575
Pe216I51664–16201–5,9293,296462–4,5631,373201–2,8621,458
Pe61E21543–12231–8,5983,100223–18,1873,244231–2,103973
Pe99P161592–33249–15,0222,441501–9,3872,582216–4,605908
Pe123N81552–22163–10,0512,93870–13,30639,979163–2,3971,028
Pe3F101442–14652–6,5522,47190–4,3891,557285–3,2521,080
Pe28E221412–12379–11,1073,66113–16,0732,221261–2,7181,247
Pe34M71462–4225–1,29865282–39,7016,611192–1,026459
Pe75F201462–13198–6,4181,859182–21,9795,567198–1,842541
Pe85H41412–51489–22,4813,938178–17,5782,764300–5,7061,546
Pe85J231422–15760–9,6313,222362–9,6092,597492–3,0661,087
Pe101H1514102–5225–24,6872,257122–15,1956,521255–1,008524
Pe69F221302–14438–6,5973,680196–26,1184,433207–1,7101,029
Pe75A211383–10162–5,7301,56910–15,5694,038162–2,076630
Pe84M61382–13185–3,0261,059262–16,4554,686185–1,578792
Pe86H71372–3213–4,4971,42931–28,5756,964213–3,459875
Pe34H91232–14258–6,2851,96149–44,5326,154258–1,623748
Pe213C91282–5327–3,5991,246213–31,6537,880234–2,016749
Pe71E31123–9207–3,7272,185362–31,4896,138207–1,6981,047
Pe93A71182–4162–1,37458218–25,4727,604162–759373
Pe93F51122–8192–11,0412,7455 – 24,1677,152192–1722707
Pe93O181132–11387–7,6432,714596–49,4829,337387–1,6321,080
Pe101F211172–7243–4,83594758–27,1728,438198–1,806534
Pe141B121142–15288 – 6,7692,412251–24,6115,214282–3,4171,142
Pe75D121062–5219–3,255778109–39,9458,052216–1,224456
Pe75N151082204–71444478–32,2437,353204–714402
Pe9E4942 – 14342–6,1002,099654 – 13,9256,177342–2,8981,171
Pe15E1942–13270–2,8961,153700–33,0219,014270–1,578714
Pe20E10942–2159–1,578605278–35,1129,958159–1,578496
Pe212M5942–6267–3,1701,020851–10,4684,056267–1,566727
Pe103M2822–17222–12,6563,122418–32,4536,547222–2,010807
Pe28I20752–2237–88146767–30,51611,363237–762437
Pe75F13742–3180–1,63665416,743–92,49758,535180–1,245519
Pe85O9712–8441–3,3242,079515–6,4471,784441–1,329765
Pe1M17612–4312–2,4731,099256–10,8485,311312–1,404784
Pe212J12622–15405–4,3571,37781–12,7083,133381–1,644692
Pe216B2612–24218–15,9695,097830–4,5752,306218–3,8191,605
Pe113A7532156–22541,2063472–26,02613,464156–681503
Pe1K19302–9958–4,7373,111287–37,48718,877840–897869
Pe33M2323210–2,0378244,001–69,19936,600210–697377

*BAC-inserts with the highest number of annotated genes, used for microsynteny analysis.

Gene content in a gene-rich fraction of the Pasiflora edulis genome and structural annotation. *BAC-inserts with the highest number of annotated genes, used for microsynteny analysis. One third of the genes (631) had no introns. The remaining (1,252) had up to 50 introns. A total of 6,122 introns (ranging from 26 to 7,869 bp in length) and 8,005 exons (ranging from 3 to 6,249 bp) were recognized. CDS ranged from 153 to 14,583 bp in length, totaling 1,985,892 bp, with a mean of 1,054 bp. A total of 61 were insert-end sequences and therefore incomplete gene sequences. According to the RNA-seq read alignment results, 252 genes exhibited more than one transcript (Supplementary Table S3), including glutamine synthetase leaf enzyme, chloroplastic (6 transcripts), ultraviolet-B receptor UVR8, a protein responsive to UV-B (5), the auxin response factor (2), an abscisic acid insensitive protein (2) and an ethylene receptor protein (2). Of the 1,883 predicted genes, 1,502 showed significant levels of similarity (e-values < 1 × 10−6) to plant proteins (Supplementary Table S3) according to the Blast2GO results. The top hits for this large fraction of genes (~80%) were from Jatropha curcas (298), Populus trichocarpa (275), Populus euphratica (232) and Ricinus communis (212). These results were expected, since among all available plant genomes, these species are phylogenetically close to P. edulis, and all belong to the Malpighiales order. Functional annotation resulted in 3,178 ontological terms assigned to 1,191 genes. These GO terms were related to several processes, and are usually classified into three broad categories (known as level 1): biological process, molecular function and cellular component. The distribution of level 2 terms within each of these major categories is shown in Fig. 1 and matches the results of BES annotation[15].
Figure 1

Distribution of GO annotations assigned to gene products in ontological categories: (A) Biological process, (B) Molecular function and (C) Cellular component. GO annotations were extracted from all sequences (10.4 Mb) of Passiflora edulis.

Distribution of GO annotations assigned to gene products in ontological categories: (A) Biological process, (B) Molecular function and (C) Cellular component. GO annotations were extracted from all sequences (10.4 Mb) of Passiflora edulis. Regarding the 46 regions selected using probes homologous to transcripts induced and repressed by X. axonopodis infection, none of the functional categories related to plant defense were found to be overrepresented. However, protein signatures related to plant immunity and defense functions were identified. The serine/threonine-protein kinase active site (32 genes), and the leucine-rich repeat domain, L domain-like (27 genes) were among the most represented signatures (Table 2). In total, automated searches for protein signatures recognized 1,383 signatures in 1,488 genes of P. edulis: 783 domains, 453 protein families, 125 sites and 22 replicates (Table 2). Most of these signatures (769) were taken from the Pfam database[54], and the remainder from SuperFamily (239)[55] and Smart (223)[56].
Table 2

Most frequent protein signatures (≥10) recognized in genes of Passiflora edulis according to InterProScan results.

InterProScan IDNo. of genes
IPR005162 [Domain]: Retrotransposon gag domain58
IPR011009 [Domain]: Protein kinase-like domain51
IPR000719 [Domain]: Protein kinase domain49
IPR027417 [Domain]: P-loop containing nucleoside triphosphate hydrolase39
IPR001878 [Domain]: Zinc finger, CCHC-type36
IPR011990 [Domain]: Tetratricopeptide-like helical domain34
IPR008271 [Active_Site]: Serine/threonine-protein kinase, active site32
IPR013083 [Domain]: Zinc finger, RING/FYVE/PHD-type31
IPR029058 [Domain]: Alpha/Beta hydrolase fold30
IPR017441 [Binding_Site]: Protein kinase, ATP binding site30
IPR016024 [Domain]: Armadillo-type fold27
IPR032675 [Domain]: Leucine-rich repeat domain, L domain-like27
IPR013320 [Domain]: Concanavalin A-like lectin/glucanase domain25
IPR009057 [Domain]: Homeodomain-like25
IPR002885 [Repeat]: Pentatricopeptide repeat25
IPR011989 [Domain]: Armadillo-like helical22
IPR016040 [Domain]: NAD(P)-binding domain19
IPR013242 [Domain]: Retroviral aspartyl protease19
IPR001841 [Domain]: Zinc finger, RING-type19
IPR017986 [Domain]: WD40-repeat-containing domain18
IPR012337 [Domain]: Ribonuclease H-like domain18
IPR015943 [Domain]: WD40/YVTN repeat-like-containing domain18
IPR001128 [Family]: Cytochrome P45017
IPR001611 [REPEAT] - Leucine-rich repeat17
IPR012677 [Domain]: Nucleotide-binding alpha-beta plait domain16
IPR001680 [Repeat]: WD40 repeat16
IPR001005 [Domain]: SANT/Myb domain15
IPR029044 [Domain]: Nucleotide-diphospho-sugar transferases15
IPR026960 [Domain]: Reverse transcriptase zinc-binding domain15
IPR017853 [Domain]: Glycoside hydrolase superfamily15
IPR000504 [Domain]: RNA recognition motif domain14
IPR013210 [Domain]: Leucine-rich repeat-containing N-terminal, plant-type14
IPR001245 [Domain]: Serine-threonine/tyrosine-protein kinase catalytic domain14
IPR018247 [Binding_Site]: EF-Hand 1, calcium-binding site13
IPR005135 [Domain]: Endonuclease/exonuclease/phosphatase13
IPR011598 [Domain]: Myc-type, basic helix-loop-helix (bHLH) domain13
IPR011992 [Domain]: EF-hand domain pair13
IPR002401 [Family]: Cytochrome P450, E-class, group I13
IPR005123 [Domain]: Oxoglutarate/iron-dependent dioxygenase12
IPR002048 [Domain]: EF-hand domain12
IPR012334 [Domain]: Pectin lyase fold11
IPR013781 [Domain]: Glycoside hydrolase, catalytic domain11
IPR011050 [Domain]: Pectin lyase fold/virulence factor11
IPR017930 [Domain]: Myb domain11
IPR017972 [Conserved_Site]: Cytochrome P450, conserved site11
IPR006121 [Domain]: Heavy metal-associated domain, HMA10
IPR001810 [Domain]: F-box domain10
IPR000620 [Domain]: EamA domain10
IPR012336 [Domain]: Thioredoxin-like fold10
IPR016140 [Domain]: Bifunctional inhibitor/plant lipid transfer protein/seed storage helical10
IPR025558 [Domain]: Domain of unknown function DUF428310
Most frequent protein signatures (≥10) recognized in genes of Passiflora edulis according to InterProScan results.

Richness of Transposable Elements and Microsatellites

The search for transposable elements resulted in the identification of 250 TEs that, in turn, were automatically classified as Class I (retrotransposons) and Class II (DNA transposons), and in terms of order[33]. These TEs represented 17.6% of total data, corresponding to 1,830,620 bp. Class I was prevalent with 96.4% (241/250) retrotransposons (Table 3). These TEs were preferentially hosted in intergenic regions (70.4%, 176/250); 74 TEs were found within genes, including 70 exonic TEs, and only four were located in introns.
Table 3

Classification of transposable elements identified in a gene-rich fraction of the Pasiflora edulis genome.

ClassNumber of elementsPercentage of nucleotides*
Class I total (RXX)24117.15
DIRS total (RYX)111.11
  DIRS incomplete11
  DIRS potential chimeric11
LINE total (RIX)70.52
  LINE complete3
  LINE incomplete4
  LINE potential chimeric6
LTR total (RLX)18113.64
  LTR complete73
  LTR incomplete108
  LTR potential chimeric36
SINE total (RSX)20.01
  SINE incomplete2
LARD total (RXX-LARD)361.82
LARD potential chimeric 2
TRIM total (RXX-TRIM)40.05
Classe II total (DXX)90.45
Helitron total (DHX)20.13
  Helitron complete2
TIR total (DTX)60.31
  TIR incomplete6
  TIR potential chimeric1
MITE total (DXX-MITE)10.01
Total25017.60

*Percentage of nucleotides in 10.4 Mb of P. edulis sequences.

Classification of transposable elements identified in a gene-rich fraction of the Pasiflora edulis genome. *Percentage of nucleotides in 10.4 Mb of P. edulis sequences. The LTR (Long Terminal Repeat) retrotransposon was the most frequent order, and accounted for 75.1% (181/241) of retrotransposons, corresponding to 1,418,389 bp or 13.6% (1,418,389 bp/10,401,671 bp) of all sequence data. The other orders of Class I were poorly represented, but note that LARDs (Large Retrotransposon Derivatives) accounted for 36 elements (Table 3). Only 3.6% (9/250) of TEs were of Class II, the majority (6) classified as TIR (Terminal Inverted Repeats) (Table 3). The search for microsatellites resulted in the identification of 11,020 simple sequence repeats (SSR), representing 1.05% of all sequence data (109,695 bp/10,401,671 bp). In CDS (1,985,806 bp) there were 1,762 SSRs (~16% of the total). Taking into account all sequence data, 106 SSRs were found every 100 kb (one SSR every 0.94 kb). Analyzing the CDS region, 89 SSRs were found every 100 kb (one SSR every 1.12 kb); hence, the frequency of SSRs was slightly lower in the CDS region (~1.2×, 1.12 kb/0.94 kb). Our estimates were 10× lower than those reported in[15] using P. edulis BES data as a major resource (10.8 SSRs every 100 kb or one SSR every 9.25 kb). Microsatellite sequences were grouped according to motif, and all possible classes of repeats were found, with trinucleotides the most prevalent in both data sources. Compound SSRs accounted for 17.4% (1,919/11,020) of all SSRs, and 15.7% (278/1,762) of these SSRs were found in CDS (Fig. 2A). Among the mononucleotides, the A/T motif far surpassed the number of G/C motifs. The most frequent dinucleotides were AT/AT (49.3%), followed by AG/CT (35.4%), which were prevalent in CDS (74%). Among the trinucleotides, AAG/CTT were the most frequent in both data sources (~23%). Other occurrences (tetra-, penta- and hexanucleotides) are shown in Fig. 2B.
Figure 2

(A) Percentage of mono-, di-, tri-, tetra-, penta- and hexanucleotides in microsatellites (SSRs) found in all sequences (10.4 Mb) of Passiflora edulis (blue bars) and in coding DNA sequences (CDS, orange bars). (B) Percentage of the most frequent motifs in each class of microsatellites (SSRs) found in all sequences (blue bars) and in coding DNA sequences (CDS, orange bars) of Passiflora edulis.

(A) Percentage of mono-, di-, tri-, tetra-, penta- and hexanucleotides in microsatellites (SSRs) found in all sequences (10.4 Mb) of Passiflora edulis (blue bars) and in coding DNA sequences (CDS, orange bars). (B) Percentage of the most frequent motifs in each class of microsatellites (SSRs) found in all sequences (blue bars) and in coding DNA sequences (CDS, orange bars) of Passiflora edulis.

Microsynteny Analysis Results

The following 20 P. edulis BAC-inserts were used for microsynteny analysis: Pe101K14 + 141H13 (36), Pe185D11 (36), Pe164B18 (29), Pe214H11 (29), Pe164D9 (28), Pe186E19 (28), Pe43L2 (27), Pe164K17 (26), Pe215I8 (26), Pe84I14 (25), Pe84M23 (25), Pe93M2 (25), Pe171P13 (25), Pe207D11 (25), Pe93N7 (24), Pe108C16 (24), Pe173B16 (24), Pe185J16 (24), Pe198H23 (24) and Pe212I1 (24). These regions were found to contain the highest number of annotated genes (given in parenthesis) and account for 2,243,840 bp, encompassing 534 genes (Table 1). Microsynteny analysis showed that 18 of the 20 P. edulis regions contained syntenic P. trichocarpa chromosomal regions, and 15 P. edulis regions had syntenic M. esculenta chromosomal regions (Figs 3−7, S1−S13). In some comparisons, the microsyntenic region of P. edulis had the opposite orientation with respect to the chromosomes of both (see Fig. 3) or one of the species compared.
Figure 3

Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and Populus trichocarpa chromosome 2 (green bar) and Manihot esculenta chromosomes 12 and 13 (brown bars). Note the opposite orientation of the P. edulis microsyntenic region relative to the chromosomes of both species. The orthologous genes of P. edulis are duplicated in M. esculenta chromosomes.

Figure 7

Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and Populus trichocarpa chromosomes 4 and 9 (green bars) and Manihot esculenta chromosome 4 (brown bar). Note the opposite orientation of the P. edulis microsyntenic region relative to P. trichocarpa chromosomes, and the large segment of P. trichocarpa chromosome 4 that is missing in P. edulis. The orthologous genes of P. edulis are duplicated in P. trichocarpa chromosomes.

Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and Populus trichocarpa chromosome 2 (green bar) and Manihot esculenta chromosomes 12 and 13 (brown bars). Note the opposite orientation of the P. edulis microsyntenic region relative to the chromosomes of both species. The orthologous genes of P. edulis are duplicated in M. esculenta chromosomes. Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and Populus trichocarpa chromosomes 4 and 9 (green bars). Note the opposite orientation of P. edulis microsyntenic region. The orthologous genes of P. edulis are duplicated in P. trichocarpa chromosomes. Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and Populus trichocarpa chromosome 14 (green bar) and Manihot esculenta chromosomes 1 and 5 (brown bars). Note the opposite orientation of M. esculenta chromosome 1, and rearranged segments at the end of the P. edulis microsyntenic region. The orthologous genes of P. edulis are duplicated in M. esculenta chromosomes. Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and Populus trichocarpa chromosome 1 (green bars) and Manihot esculenta chromosome 6 and 14 (brown bars). Note the opposite orientation of M. esculenta chromosome 6. There are translocated segments in the P. edulis microsyntenic region relative to chromosome 1 of P. trichocarpa. The orthologous genes of P. edulis are duplicated in M. esculenta chromosomes. Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and Populus trichocarpa chromosomes 4 and 9 (green bars) and Manihot esculenta chromosome 4 (brown bar). Note the opposite orientation of the P. edulis microsyntenic region relative to P. trichocarpa chromosomes, and the large segment of P. trichocarpa chromosome 4 that is missing in P. edulis. The orthologous genes of P. edulis are duplicated in P. trichocarpa chromosomes. The 18 P. edulis regions span 1,702,975 bp and contain 406 genes. They matched syntenic segments of P. trichocarpa chromosomes that span 7,137,451 bp and contain 966 genes, including 501 orthologs (Table 4). Ten of the syntenic regions of P. edulis have orthologous genes that are duplicated in P. trichocarpa chromosomes. Interestingly, a continuous region in P. edulis (Pe214H11) is syntenic to segments of P. trichocarpa chromosome 4, and these segments are separated by 1.4 Mb. The same is true for segments of chromosome 9, separated by 1.2 Mb (Fig. 4). Other large segments of the P. trichocarpa chromosome 4 are also missing in the corresponding P. edulis syntenic region (Fig. 7). These presumably relate to deletion events that occurred in P. edulis.
Table 4

Characterization of 18 Passiflora edulis regions found to have syntenic Populus trichocarpa chromosomal regions.

Passiflora edulis Populus trichocarpa
BAC codeInsert length (bp)Syntenic region length (bp)Syntenic region length (bp)ChromosomeNumber of orthologous genes
Pe101K14 + 141H13172,337159,949213,9421412
Pe108C1696,75368,880137,749616
65,309130,2291813
Pe164B18104,102103,945369,800420
103,945189,230918
Pe164D993,52780,789430,901427
85,112209,2531726
Pe164K17113,504113,313332,6371423
110,607307,065216
Pe171P13111,12385,809340,005712
Pe173B16109,801105,875409,775428
105,875166,729929
Pe185D11119,061110,316253,596222
Pe185J16103,09547,587231,4191210
Pe186E19115,21817,44227,58315
92,977268,11718
Pe207D11111,69031,090122,49718
Pe212I1121,38485,114162,212214
85,114169,126513
Pe214H11142,45679,416221,003917
64,482248,247414
60,720202,191913
62,181222,504411
Pe215I8129,73779,415166,694112
Pe84I1497,84893,065141,6471413
Pe84M2393,21792,795171,100215
89,339206,947512
Pe93M2100,43698,828199,3501217
88,334207,9611518
Pe93N7106,968105,007340,655623
99,896337,2871816
Total2,042,2571,702,975*7,137,451501

*Non-redundant data.

Figure 4

Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and Populus trichocarpa chromosomes 4 and 9 (green bars). Note the opposite orientation of P. edulis microsyntenic region. The orthologous genes of P. edulis are duplicated in P. trichocarpa chromosomes.

Characterization of 18 Passiflora edulis regions found to have syntenic Populus trichocarpa chromosomal regions. *Non-redundant data. Average gene length in P. edulis (2,785 bp) is slightly lower than that of P. trichocarpa (3,290 bp). However, the average intergenic spacer length in P. trichocarpa (8,694 bp) is four times that of P. edulis (1,871 bp) (Supplementary Table S4). The gene order is conserved in most of the syntenic regions, but rearrangements were observed. On comparing P. edulis with P. trichocarpa, two typical inversion events in the gene order were recognized (Supplementary Figs S3 and S6). Moreover, two adjacent genes in P. trichocarpa chromosome 1 were found to be inverted, and also interrupted in the P. edulis syntenic region (Fig. 6). Finally, it is worth noting the occurrence of particular gene duplications within the syntenic regions involving two to seven copies. Figure 4 shows two P. edulis genes (8th and 22nd) that have four copies in P. trichocarpa chromosome 9.
Figure 6

Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and Populus trichocarpa chromosome 1 (green bars) and Manihot esculenta chromosome 6 and 14 (brown bars). Note the opposite orientation of M. esculenta chromosome 6. There are translocated segments in the P. edulis microsyntenic region relative to chromosome 1 of P. trichocarpa. The orthologous genes of P. edulis are duplicated in M. esculenta chromosomes.

In the comparison with M. esculenta, the 15 regions of P. edulis span 1,392,795 bp and contain 348 genes, matching syntenic segments of M. esculenta chromosomes that span 5,053,254 bp and contain 633 genes, including 365 orthologs (Table 5). Eleven of the syntenic regions of P. edulis contain orthologous genes that are duplicated in M. esculenta chromosomes.
Table 5

Characterization of 15 Passiflora edulis regions found to have syntenic Manihot esculenta chromosomal regions.

Passiflora edulis Manihot esculenta
BAC codeBAC length (bp)Syntenic region length (bp)Syntenic region length (bp)ChromosomeNumber of orthologous genes
Pe101K14-141H13172,337170,391183,133116
164,887259,161514
Pe108C1696,75368,88076,043310
63,47488,4581610
Pe164B18104,102103,945182,7201717
103,945345,2431512
Pe164D993,52793,489206,242225
85,112118,187115
Pe164K17113,504101,996189,788111
110,607393,258517
Pe173B16109,80192,992235,649420
Pe185D11119,061110,279317,8861227
112,597254,2961318
Pe185J16103,09588,563308,705112
Pe186E19115,21850,679304,339149
50,679101,36168
Pe207D11111,69028,90248,780156
Pe212I1121,38485,114172,1431814
Pe215I8129,737118,786162,3631714
124,698193,7251514
Pe84I1497,84896,433135,657112
94,441211,686514
Pe84M2393,21766,677148,6821813
53,520137,51128
Pe93M2100,43698,828126,299617
78,587151,9391412
TOTAL1,681,7101,392,795*5,053,254365

*Non-redundant data.

Characterization of 15 Passiflora edulis regions found to have syntenic Manihot esculenta chromosomal regions. *Non-redundant data. The average P. edulis gene length (2,641 bp) is slightly lower than that of M. esculenta (3,886 bp). However, the average intergenic spacer length (6,777 bp) was three times that of P. edulis (1,850 bp) (Supplementary Table S4). Gene order is also conserved in most of the syntenic regions, but rearrangements were recognized in genes of both P. edulis and M. esculenta (Figs S1, S2, S6, S7). The occurrence of particular gene duplications within syntenic regions involving two to five copies was also detected. Figure 3 shows three copies of a P. edulis gene (18th) arranged in tandem on chromosome 13 of M. esculenta and two copies in tandem on chromosome 12, totaling 5 copies. The 2nd gene within the P. edulis microsyntenic region is also duplicated in M. esculenta chromosome 12. In terms of specific genes, note that a single copy of the gene encoding a KIN1-related stress-induced protein was found in P. edulis but there are seven orthologous copies in P. trichocarpa chromosome 4 and three in chromosome 17 (Supplementary Fig. S2). Moreover, five copies in tandem of the gene encoding an endo-1,3 1,4-beta-D-glucanase were found in P. edulis, but no orthologs were found in P. trichocarpa and M. esculenta. Finally, four copies in tandem of the salicylic acid-binding protein 2-like gene were found in P. edulis: an orthologous copy was found in chromosome 4 and three in chromosome 9 of P. trichocarpa, but only one copy was found in chromosome 17 of M. esculenta (Supplementary Fig. S1). There is a higher degree of comparative microsynteny between P. edulis and P. trichocarpa than between P. edulis and M. esculenta. The number of genes is significantly high in most P. trichocarpa and M. esculenta chromosomes compared to that found in P. edulis microsyntenic regions (Tables 4 and 5). The highest level of synteny conservation was found between Pe173B16 and P. trichocarpa chromosome 9, with 29 orthologous, collinear gene pairs (Table 4; Fig. 7), and between Pe185D11 and M. esculenta chromosome 12, with 27 orthologous, collinear gene pairs (Table 5; Fig. 3).

Discussion

Despite great advances in genome sequencing, the process of sequencing a plant genome is still laborious, due primarily to the size and complexity of genome regions which pose a challenge when it comes to sequencing and assembly. For instance, Passiflora species are extensively diversified in morphological terms, with genome sizes ranging from 207 Mb to 2.15 Gb[14] and there are no draft genomes for any passion fruits, even the most cultivated species, P. edulis. In this study, a gene-rich fraction of the P. edulis genome was sequenced and assembled from long sequence reads, allowing us to obtain 10.4 Mb of highly curated data. About half of all sequences (44%) matched P. edulis gene sequences and annotation revealed several functional categories and protein domains. Interestingly, the most frequent domain was retrotransposon gag, associated with transcripts of the LTR retrotransposon, followed by the kinase domains. This abundance was to be expected, since kinases belong to a superfamily of proteins with copies in the hundreds or thousands and are components of all cellular functions. These proteins use ATP γ-phosphate to phosphorylate serine and threonine or tyrosine residues from other proteins[57]. Note that to date there is an enormous scarcity of information on Passiflora nuclear genes in databases. This means that obtaining gene-based probes for selecting new regions for whole sequencing is practically impossible. The structural and functional annotation of 1,883 genes provides a significant set of high quality gene sequences that can be used in many other studies on Passiflora (see Supplementary Table S3). Transposable elements (TEs) are highly widespread in plant genomes, accounting for 14% of the Arabidopsis thaliana genome[58], up to 80% of the maize genome[59] and 17.6% of all P. edulis sequences. The vast majority are retroelements that belong to Class I (96.4%), and especially to the LTR order. This abundance is very similar to that previously reported[15] analyzing ~10,000 BES (18.5% TEs, 94.1% Class I TEs, the majority belonging to the LTR order), and this pattern should be repeated in P. edulis. On examining high quality genomes, several authors have stated that the spread of TEs (mostly retrotransposons) is the main driver of genome size variation in plants. This is particularly true of LTR retrotransposons due to the replication mechanism. LTRs are found mainly in centromeric regions, playing important role in chromatin structure maintenance, centromere performance and the regulation of host gene expression[60-62]. The content of LTR elements in P. edulis is comparable to that identified in related Malpighiaceae species with completely sequenced genomes, although the abundance of TEs is highly variable. This variation is to be expected and is indicative of particular TE-driven evolutionary processes[60]. For instance, ~42% of the P. trichocarpa genome consists of transposable elements (although only 12.9% of the sequences could be classified as known TEs), the majority belonging to the LTR order (~60%). These figures relate to the draft genome of P. trichocarpa[24], and the authors state that this genome could contain even more non-classified LTRs. In R. communis, approximately 50% of the genome consists of transposable elements, and LTRs were the most abundant, making up ~16% of the genome[63], close to the value observed in P. edulis (13.6%), although the genome size of this species is ~3.8× larger than that of R. communis. Finally, in Manihot esculenta, ~25.7% of the genome consists of transposable elements, and LTR is also the most represented order among classified TEs, forming ~11% of the genomic sequences[25]. In this case, the genome report was based on 65% of an assembled genome of the domesticated variety. In terms of microsatellite abundance, ~1.0% of all P. edulis sequences consisted of SSRs, with trinucleotide repeats prevalent (55.6%), even in CDS (93.8%). Microsatellite abundance generally varies from one genome region to another, but trinucleotides are usually overrepresented in coding sequences, due to selection pressures against mutations that may alter the reading frames[64]. The P. edulis results corroborate the findings of a pioneer study[65] with regard to the effect that trinucleotide repeats are significantly more abundant in the expressed regions of plant genomes. Recently, a total of 1,300 perfect microsatellite sites were described in P. edulis genomic regions (with minimum 15× coverage as a cut off; Illumina paired-end reads) that were selected for marker development and Passiflora diversity analysis[66]. In this significant sample, the prevalence of tri-, tetra- and dinucleotides was found to be 41.0%, 36.4% and 22.6%, respectively. In the P. trichocarpa genome, the predominance of mono- (69.8%), di- (19.5%) and trinucleotides (9.0%) decreased stepwise as the motif length increased (mono- to hexanucleotide repeats); 98% of P. trichocarpa mononucleotides consist of A/T motifs and only 2% of C/G motifs. The same applies to P. edulis (Fig. 2B). For di- and trinucleotides, the most frequent motifs were AT/AT (60.5%) and AAT/ATT (48.2%). In terms of coding sequences, 90.3% and 76.6% of the mono- and dinucleotides consist respectively of A/T and AG/CT motifs. Trinucleotides consist mainly of AAG/CTT, ACC/GGT and AGG/CCT motifs (~20% of each), and the frequencies of tetra-, penta- and hexanucleotides were very low[67]. In M. esculenta, 37.4% of all SSRs corresponded to dinucleotides, and tri- and pentanucleotides were found in the same proportion (~24%); within the coding sequences, tri- and hexanucleotides accounted for 95.6%. AT/AT and AAT/ATT were the most common di- and trinucleotide motifs (~23% and ~12%, respectively) and, as in P. edulis, AG/CT and AAG/CTT were the most prevalent in coding sequences (~4% and ~23%, respectively)[68]. In the R. communis genome, most of the SSRs found were also dinucleotides (70.4%), followed by trinucleotides (24.9%). AT/TA was the most frequent motif among dinucleotides (75.3%) and AAT/TTA among trinucleotides (71%)[69]. Clearly, the particular occurrence of certain motifs in plant genomes and in different genome regions is due to selection pressure during evolution[70,71], and structural and functional genome attributes, like GC content and codon usage bias, may be responsible for the unique content and distribution patterns of microsatellites[72,73]. Remarkable, there are several benefits that can be derived from the knowledge we have generated. First, a draft sequencing of the Passiflora edulis nuclear genome, especially of a gene-rich fraction, provides a platform for functional analysis and development of genomic tools in applied passion fruit improvement. Our work also represents a first step towards full sequencing of the P. edulis genome. Moreover, wild Passiflora species harbor a variety of characteristics that determine their ecological importance and adaptability. The availability of gene sequences could help researchers test for the presence of gene variants or polymorphisms in different environments. This is also possible for cultivated species. Gene prediction has yielded around 1,900 genes, and functional annotation has associated genes with plant immunity and defense functions (Supplementary Table S3). Taxonomically speaking, the genus is subdivided into four subgenera: three clades were recognized as monophyletic (Astrophea, Decaloba, and Passiflora), but the position of Deidamioides remained unresolved, as this particular clade was found to be paraphyletic. Therefore, gene sequences could be used in phylogenetic analysis to obtain accurate evolutionary information. By providing information on the levels of synteny conservation and rearrangements within the microcollinear regions (inverted and translocated segments, deletion and gene duplication events), this study will help confirm the relationships between a Passiflora species and related Malpighiales, with important taxonomical implications. Our previous phylogenetic analyses based on the available chloroplast genomes of members of the four families that compose the Malpighiales order indicated that the Passifloraceae are more closely related to the Salicaceae than to the Euphorbiaceae[16]. This proximity is definitively confirmed herein by microsynteny analysis, confirming the importance of using comparative genomic approaches as an additional resource for elucidating the phylogenetic relationships in the families that compose the Malpighiales order, one of the largest of flowering plants. Although P. edulis microsyntenic regions were compared with whole genomes of P. trichocarpa (Salicaceae) and M. esculenta (Euphorbiaceae), i.e. species that belong to different taxonomic families, the analysis showed that overall gene order was well conserved. The level of microsynteny observed between the majority of P. edulis BAC inserts and these genomes is surprising, given the long divergence time that separates them from the common ancestor of the Malpighiales, some 100 million years ago[74]. The event of whole genome duplication (WGD) in P. trichocarpa occurred about 60−65 million years ago and reached around 92% of its genome[24]. On the other hand, M. esculenta has undergone a paleo-genome duplication event, and a number of its genes were found to have only two copies[25,75]. This may be related to the loss of one of the homologous copies in M. esculenta owing to selection pressure that restored the single-copy state of genes that impair fitness when present in multiple copies[76]. The genome size of P. edulis is estimated at ~1.23 Gb, significantly higher than the estimated genome sizes of P. trichocarpa (~485 Mb)[24] and M. esculenta (~742 Mb)[25]. These differences raise the question: did an ancestor of the passionflowers undergo genome duplication? Possibly. According to cytogenetic studies, the basic chromosome number in the genus Passiflora is x = 6, with several species containing secondary numbers, as in the case of P. edulis (x = 9). These species with secondary chromosome numbers are possibly of polyploid origin[77,78]. Nevertheless, there is evolutionary evidence indicating x = 12 as the basic chromosome number, since x = 6 was reported to occur only in the subgenus Decaloba. In primitive Passiflora species, such as those of the Astrophea subgenus, x = 12, and the same applied to other species of the Passifloraceae family[78,79]. This suggests that descending dysploidy events may have occurred in the Passiflora (x = 9) and Decaloba (x = 6) subgenera, lending weight to the hypothesis that genome duplication occurred in an ancestor of the Passifloraceae. In actual fact the diploid numbers 2n = 12, 18, 24, and 72 have been reported for Passiflora species[80]. An examination of the microsyntenic regions shows that the P. edulis gene-rich segments are more compact than those of the species compared, even though its genome size is three times longer than that of P. trichocarpa, and almost twice the size of the M. esculenta genome. The limited sampling of P. edulis genome analyzed herein does not account for these apparently contradictory attributes regarding the compactness of gene regions and genome sizes. Further studies are required to elucidate the abundance of repetitive DNA (including TEs) associated with gene-poor regions and/or the occurrence of large heterochromatin blocks in P. edulis[81,82]. Finally, wide variations in genome size occur within the genus Passiflora[14] indicating that genome duplication, DNA sequence acquisition and loss throughout the evolution of the genus (favoring species disruption) have occurred since its diversification from the common ancestor about 38 million years ago[83].

Conclusion

The outcome of this research was a unique set of high quality sequence data on a gene-rich fraction of the Passiflora edulis genome, describing gene content and abundance of repetitive elements. The structural and functional annotations of 1,883 genes of P. edulis are detailed. It is proposed that there is a relatively high degree of conservation in gene regions of P. edulis, Populus trichocarpa and Manihot esculenta, according to our microsynteny analysis results. Collinear orthologous genes are shown to be prevalent, although some disruptions of collinearity have occurred due to rearrangements (inversion, translocation events) within microsyntenic regions. Interestingly, even though the P. edulis genome is much larger than those of P. trichocarpa (3×) and M. esculenta (2×), which evolved by polyploidy, the P. edulis gene-rich segments are much more compact. In this study the first steps have been taken, but further studies are required to elucidate the abundance of repetitive DNA associated with gene-poor regions and/or the occurrence of large heterochromatin blocks in P. edulis, in order to contribute to our understanding of the evolutionary issues that these genomes raise. Supplementary Figures S1-S13 Supplementary Tables S1 and S2 Supplementary Tables S3 and S4
  66 in total

1.  Self-incompatibility in passionfruit: evidence of gametophytic-sporophytic control.

Authors:  T de M F Suassuna; H Bruckner; R de Carvalho; A Borém
Journal:  Theor Appl Genet       Date:  2002-11-15       Impact factor: 5.699

Review 2.  A beginner's guide to eukaryotic genome annotation.

Authors:  Mark Yandell; Daniel Ence
Journal:  Nat Rev Genet       Date:  2012-04-18       Impact factor: 53.242

3.  BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

Authors:  Katharina J Hoff; Simone Lange; Alexandre Lomsadze; Mark Borodovsky; Mario Stanke
Journal:  Bioinformatics       Date:  2015-11-11       Impact factor: 6.937

4.  Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants.

Authors:  Riet De Smet; Keith L Adams; Klaas Vandepoele; Marc C E Van Montagu; Steven Maere; Yves Van de Peer
Journal:  Proc Natl Acad Sci U S A       Date:  2013-02-04       Impact factor: 11.205

5.  RAPD-based genetic linkage maps of yellow passion fruit (Passiflora edulis Sims. f. flavicarpa Deg.).

Authors:  Monalisa Sampaio Carneiro; Luis Eduardo Aranha Camargo; Alexandre Siqueira Guedes Coelho; Roland Vencovsky; Pereira Leite Júnior Rui; Neusa Maria Colauto Stenzel; Maria Lucia Carneiro Vieira
Journal:  Genome       Date:  2002-08       Impact factor: 2.166

6.  A physical, genetic and functional sequence assembly of the barley genome.

Authors:  Klaus F X Mayer; Robbie Waugh; John W S Brown; Alan Schulman; Peter Langridge; Matthias Platzer; Geoffrey B Fincher; Gary J Muehlbauer; Kazuhiro Sato; Timothy J Close; Roger P Wise; Nils Stein
Journal:  Nature       Date:  2012-10-17       Impact factor: 49.962

7.  Linkage and mapping of resistance genes to Xanthomonas axonopodis pv. passiflorae in yellow passion fruit.

Authors:  Ricardo Lopes; Maria Teresa Gomes Lopes; Monalisa Sampaio Carneiro; Frederico de Pina Matta; Luis Eduardo Aranha Camargo; Maria Lucia Carneiro Vieira
Journal:  Genome       Date:  2006-01       Impact factor: 2.166

8.  Genome-wide distribution and organization of microsatellites in plants: an insight into marker development in Brachypodium.

Authors:  Humira Sonah; Rupesh K Deshmukh; Anshul Sharma; Vinay P Singh; Deepak K Gupta; Raju N Gacche; Jai C Rana; Nagendra K Singh; Tilak R Sharma
Journal:  PLoS One       Date:  2011-06-21       Impact factor: 3.240

9.  PASTEC: an automatic transposable element classification tool.

Authors:  Claire Hoede; Sandie Arnoux; Mark Moisset; Timothée Chaumier; Olivier Inizan; Véronique Jamilloux; Hadi Quesneville
Journal:  PLoS One       Date:  2014-05-02       Impact factor: 3.240

10.  Microsatellite marker development by partial sequencing of the sour passion fruit genome (Passiflora edulis Sims).

Authors:  Susan Araya; Alexandre M Martins; Nilton T V Junqueira; Ana Maria Costa; Fábio G Faleiro; Márcio E Ferreira
Journal:  BMC Genomics       Date:  2017-07-21       Impact factor: 3.969

View more
  5 in total

1.  Identification of passion fruit (Passiflora edulis) chromosomes using BAC-FISH.

Authors:  M A Sader; Y Dias; Z P Costa; C Munhoz; H Penha; H Bergès; M L C Vieira; Andrea Pedrosa-Harand
Journal:  Chromosome Res       Date:  2019-07-18       Impact factor: 5.239

2.  Cryopreservation and germinative behavior of Passiflora spp. seeds.

Authors:  Jailton de Jesus Silva; Tatiana Góes Junghans; Carlos Alberto da Silva Ledo; Fabiane de Lima Silva; Everton Hilo de Souza; Kuang Hongyu; Fernanda Vidigal Duarte Souza
Journal:  3 Biotech       Date:  2022-09-12       Impact factor: 2.893

3.  Identification and evaluation of reference genes for quantitative real-time PCR analysis in Passiflora edulis under stem rot condition.

Authors:  Yanyan Wu; Qinglan Tian; Weihua Huang; Jieyun Liu; Xiuzhong Xia; Xinghai Yang; Haifei Mou
Journal:  Mol Biol Rep       Date:  2020-03-25       Impact factor: 2.316

4.  Chromosome-scale genome assembly provides insights into the evolution and flavor synthesis of passion fruit (Passiflora edulis Sims).

Authors:  Zhiqiang Xia; Dongmei Huang; Shengkui Zhang; Wenquan Wang; Funing Ma; Bin Wu; Yi Xu; Bingqiang Xu; Di Chen; Meiling Zou; Huanyu Xu; Xincheng Zhou; Rulin Zhan; Shun Song
Journal:  Hortic Res       Date:  2021-01-08       Impact factor: 6.793

5.  Transposable element discovery and characterization of LTR-retrotransposon evolutionary lineages in the tropical fruit species Passiflora edulis.

Authors:  Zirlane Portugal da Costa; Luiz Augusto Cauz-Santos; Geovani Tolfo Ragagnin; Marie-Anne Van Sluys; Marcelo Carnier Dornelas; Hélène Berges; Alessandro de Mello Varani; Maria Lucia Carneiro Vieira
Journal:  Mol Biol Rep       Date:  2019-09-24       Impact factor: 2.316

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.