Zerihun Yemataw, Sadik Muzemil, Daniel Ambachew, Leena Tripathi, Kassahun Tesfaye, Alemayheu Chala, Audrey Farbos, Paul O'Neill, Karen Moore, Murray Grant, David J Studholme.
Abstract
We present raw sequence reads and genome assemblies for 17 accessions of the Ethiopian orphan crop plant enset (Ensete ventricosum (Welw.) Cheesman), generated using the Illumina HiSeq and MiSeq platforms. We also present a catalogue of single-nucleotide polymorphisms inferred from the sequence data, at an average density of approximately one per kilobase of genomic DNA.
Year: 2018 PMID: 29896517 PMCID: PMC5996239 DOI: 10.1016/j.dib.2018.03.026
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Phenotypic variation among sequenced accessions of E. ventricosum. Panels A, B and C show cultivars Mazia, Lochingie and Nobo, respectively.
Fig. 2. Phylogenetic positions of the enset accessions sequenced here compared with that of the previously sequenced enset genome, based on sequences of the trnF–trnT barcode voucher region of the chloroplast DNA. This locus has previously been used as a barcode and phylogenetic indicator, and sequence data for this locus are available from previously published studies (Bekele and Shigeta, [36]; Li et al. [19]; Harrison et al. [18]). There was no sequence variation at this locus among the 17 genomes presented here, as judged by BWA alignments of raw sequence reads against the trnF–trnT sequence. Thus, the branch indicated by the black circle represents the phylogenetic position of all 17 sequenced accessions. A black triangle highlights the position of the “Jungle Seeds” individual whose genome was previously sequenced. The Maximum Likelihood tree presented here is based on a multiple sequence alignment of trnF–trnT sequences generated using MUSCLE (Edgar, 2004). Evolutionary history was inferred using the Maximum Likelihood method based on the Tamura-Nei model (Tamura and Nei [37]). The tree with the highest log likelihood (-1249.11) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with the superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 32 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 666 positions in the final dataset. Evolutionary analyses were conducted in MEGA7 (Kumar et al. [38]).
Illumina sequencing of E. ventricosum accessions. Pairs of 100-bp reads were generated using the Illumina HiSeq 2500 in normal mode except where indicated. A single asterisk (*) indicates use of the Illumina HiSeq 2500 in rapid-run mode to generate pairs of 300-bp reads and two asterisks (**) indicate use of the Illumina MiSeq to generate pairs of 300-bp reads.
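| Accession number | Landrace | Zone of origin | Depth of coverage | SRA run accessions |
|---|---|---|---|---|
| 362 | Arkiya | Dawro | 7.36× | SRR4304969, SRR4304970 |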
| 455 | Arkiya | Wolaita | 8.04× | SRR4304981*, SRR4304987 |
| 112 | Astara | Sidama | 15.64× | SRR4304989 |
| n/a | Bedadeti | Unknown | 45.81× | SRR1515268, SRR1515269** |
| 406 | Buffero | West Arsi | 18.25× | SRR4304990 |
| 435 | Derea | Gurage | 18.43× | SRR4308285, SRR4308286 |
| 451 | Erpha 13 | Dawro | 9.21× | SRR4304991*, SRR4304992 |
| 449 | Erpha 20 | Dawro | 9.43× | SRR4304971, SRR4304993* |
| 221 | Lochingie | Dawro | 8.86× | SRR4304972*, SRR4304973 |
| 253 | Lochingie | Wolaita | 8.66× | SRR4304974*, SRR4304975 |
| 208 | Mazia | Wolaita | 7.00× | SRR4304976*, SRR4304977 |
| 429 | Mazia | Dawro | 8.24× | SRR4304978*, SRR4304979 |
| 39 | Nechuwe | Gurage | 20.69× | SRR4304982 |
| 49 | Nobo | Sheka | 17.16× | SRR4304983 |
| 170 | Onjamo | Kembata-Tembaro | 21.75× | SRR4308284 |
| 183 | Siyuti | Wolaita | 16.54× | SRR4304984 |
| 54 | Yako | Kaffa | 17.96× | SRR4304985 |
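The depth values in the table above are reported without a stated formula. A minimal sketch of the usual calculation follows, assuming depth is the total number of sequenced bases divided by the genome length; the read-pair count is hypothetical, and using the Bedadeti assembly length as the denominator is an assumption rather than the authors' stated method.

```python
def depth_of_coverage(read_pairs: int, read_length: int, genome_length: int) -> float:
    """Mean depth = total sequenced bases / genome length (paired-end reads)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total_bases = 2 * read_pairs * read_length  # two reads per pair
    return total_bases / genome_length


# Hypothetical run: 20 million pairs of 100-bp reads.
# Denominator: total length of the Bedadeti assembly (an assumed choice).
BEDADETI_ASSEMBLY_LENGTH = 451_284_018
depth = depth_of_coverage(20_000_000, 100, BEDADETI_ASSEMBLY_LENGTH)
print(f"{depth:.2f}x")
```

Accessions with more than one run would sum the bases over all runs before dividing.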
Assembly statistics for E. ventricosum genomes.
| GenBank accession number | Enset accession | Total length (bp) | Contig N50 (bp) | Scaffold N50 (bp) |
|---|---|---|---|---|
| GCA_000818735.2 | Bedadeti | 451,284,018 | 20,943 | 21,097 |
| GCA_001884805.1 | Derea (435) | 429,479,738 | 10,278 | n.d. |
| GCA_001884845.1 | Onjamo (170) | 444,841,970 | 15,546 | 16,208 |
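For readers unfamiliar with the N50 statistic reported above, the following sketch shows one common way to compute it from a list of contig or scaffold lengths: sort the lengths in descending order and report the length at which the running total first reaches half of the assembly size. The example lengths are invented for illustration and are not taken from the enset assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequence lengths given")


# Invented contig lengths, for illustration only.
example_contigs = [2, 3, 4, 5, 6]
print(n50(example_contigs))
```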
Fig. 3. Overview of genetic variation in the sequenced E. ventricosum genomes. Each column in the heat-map represents one of 20,000 single-nucleotide variant sites. Each row represents one of the sequenced genomes. Colour indicates the relative frequency of aligned sequence reads with the variant nucleotide at that site in that genome, on a yellow-orange-red palette. Thus, heterozygous sites would be expected to be orange, while homozygous sites would be yellow (same as Bedadeti reference genome sequence) or red (variant from the Bedadeti reference genome sequence). These frequency values were inferred from mpileup-formatted files, generated by aligning genomic sequence reads against the Bedadeti reference genome sequence. The Perl script used to extract these from the mpileup files is included in the Supplementary Material.
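The authors' Perl script is not reproduced here. As an illustration only, the Python sketch below shows how a variant-read frequency of the kind described in the caption could be derived from the read-bases column of an mpileup line. It assumes the frequency is the number of non-reference base calls divided by the number of reference plus non-reference base calls, ignoring indels and deletion placeholders; the function names and the example line are hypothetical.

```python
import re
from typing import Optional


def variant_fraction(read_bases: str) -> Optional[float]:
    """Fraction of base calls at one site that differ from the reference.

    In the mpileup read-bases column, '.' and ',' mean a match to the
    reference, and letters mean a mismatch. Returns None if no base
    calls remain after filtering.
    """
    s = re.sub(r"\^.", "", read_bases)  # read-start marker + mapping-quality char
    s = s.replace("$", "")              # read-end marker
    calls = []
    i = 0
    while i < len(s):
        if s[i] in "+-":                # indel: sign, length digits, then that many bases
            j = i + 1
            while j < len(s) and s[j].isdigit():
                j += 1
            i = j + int(s[i + 1:j])
            continue
        calls.append(s[i])
        i += 1
    ref = sum(c in ".," for c in calls)
    alt = sum(c in "ACGTacgt" for c in calls)
    if ref + alt == 0:
        return None
    return alt / (ref + alt)


def site_frequency(line: str):
    """Parse one tab-separated mpileup line: chrom, pos, ref, depth, bases, quals."""
    chrom, pos, _ref, _depth, bases = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), variant_fraction(bases)


# Hypothetical mpileup line (contig name and values are invented).
print(site_frequency("contig_1\t100\tA\t8\t.,.,GGgg\tIIIIIIII"))
```

Under this assumption, a value near 0 corresponds to the yellow end of the palette, a value near 0.5 to orange, and a value near 1 to red.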
Oligonucleotide primers for PCR-RFLP genotyping assays.
| No. | Forward and reverse primer sequences | PCR product size (bp) | Restriction enzyme | Genomic coordinates of PCR target (GenBank accession number: start-end) | Corresponding location in banana genome |
|---|---|---|---|---|---|
| 1 | TAGACTGCCAAGAGACTGCC, GAGTTTGTTCTCCACTTGCTG | 395 | EcoRV | JTFG02000023: 86778–87172 | Chromosome 9 |
| 2 | CAATGAAATGAGCTCTCGAATGA, CCTCCCTCCCTCTACACAAG | 453 | ClaI | JTFG02000451: 2383–2835 | Chromosome 3 |
| 3 | AGCTGCCTACTTATGTGCCA, AGGATGGGAGGATTTCACTCA | 296 | ClaI | JTFG02001079: 44094–44389 | No match |
| 4 | GAAAGATTCAACCACGCAACA, CAAAGTTGCCCAAATAATAGGGG | 100 | HindIII | JTFG02001701: 16598–16697 | Chromosome 9 |
| 5 | ACGTAGGAAACAGAAGGCGT, AGAATGAAAACCGGACAGATGA | 400 | BglII | JTFG02004430: 21696–22095 | Chromosome 10 |
| 6 | GACCAAGGTTGCAACGATGT, AACTCCCTAAAGTGGACCCG | 296 | HindIII | JTFG02004708: 2865–3160 | No match |
| 7 | TGCCAATTGTAGCACGCTTT, TCCCAATGATCAGGATGTCATC | 321 | BglII | JTFG02007725: 4758–5078 | Chromosome 4 |
| 8 | AGCTGATCGGTAGGCTGTTT, TGTTCACTTGCTCAACTTCAATG | 329 | EcoRV | JTFG02008123: 5568–5896 | Chromosome 4 |
| 9 | CGAAGGAACAAGAGGACGT, CGGCATGAACTAACCGCTTA | 380 | BglII | JTFG02010045: 2436–2815 | No match |
| 10 | AGAGTAGAGGTCAGCGCATC, AGGCGAGTGACTAAAGTGCT | 385 | HindIII | JTFG02015245: 4512–4896 | No match |
| 11 | GTCATGTAGAATTCAAAAGCCCA, ACCCATGACCAAGACTTTTCT | 458 | ClaI | JTFG02000797: 35394–35851 | Chromosome 10 |
| 12 | GCAGAATCCCGTGAACCATC, TGTAAGTTTCTTCTCCTCCGCT | 377 | BglII | JTFG02001387: 44650–45026 | Chromosome 10 |
| 13 | TGCTTTAACCTAGTGAGCTACAA, ACGTCGCCCTTTTACTTTTCT | 400 | BamHI | JTFG02001793: 29736–30135 | Chromosome 7 |
| 14 | GCCCATGCCATTCTTAAGGA, TCCAATTCCATCCTTCTTCATCT | 398 | BglII | JTFG02003127: 17456–17853 | Matches multiple chromosomes |
| 15 | ACTACACAATCCTGGTCCAAAA, CGTAGTTTCCGCCCTTTGAG | 113 | EcoRV | JTFG02004277: 15220–15332 | Chromosome 5 |
| 16 | CCTGGTTGAGAATGCGGATG, CGACCAATTACACTAAGCCCA | 419 | BglII | JTFG02006088: 4069–4489 | Matches several chromosomes |
| 17 | TCCAGCCCAACAATTGATTCTT, CTGAACCTCGGCCAACCT | 400 | ClaI | JTFG02006206: 13985–14384 | Matches several chromosomes |
| 18 | TGCCAACCGAACCTCTCAG, TCAGCCATCTACGACATTTACA | 400 | PstI | JTFG02010369: 10275–10674 | No match |
| 19 | TGCTTACTGACTATGGAGAGCT, TGCCTGTTTGAGTCCATATAAGT | 487 | BamHI | JTFG02011833: 6273–6759 | Matches several chromosomes |
| 20 | CTCGTTAAGGTTCCCCATGC, CCAGCGTGGGAGATCTTTTG | 452 | EcoRV | JTFG02024842: 425–876 | No match |
| 21 | CGAGGGCTTCATCGAAAAGG, GCTGCCGACGAGTTGTTC | 391 | BamHI | JTFG02043259: 629–1019 | No match |
| 22 | CGATCGTTACGTTGCTTCAG, GGAGCCACAACCAACCAATT | 446 | PstI | JTFG02009519: 11979–12424 | No match |
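A PCR-RFLP assay distinguishes alleles because a single-nucleotide variant creates or destroys a restriction site within the amplicon, changing the fragment sizes seen after digestion. The sketch below illustrates that logic with an in silico digest. The recognition sequences and top-strand cut positions for the six enzymes in the table are the standard ones. The amplicon sequences are invented, and the source does not state which allele is cut in each assay.

```python
# Recognition site and top-strand cut offset (bases from the start of the site).
# All six sites are palindromic, so scanning one strand is sufficient.
SITES = {
    "EcoRV": ("GATATC", 3),    # GAT^ATC
    "ClaI": ("ATCGAT", 2),     # AT^CGAT
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BglII": ("AGATCT", 1),    # A^GATCT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "PstI": ("CTGCAG", 5),     # CTGCA^G
}


def digest(amplicon: str, enzyme: str) -> list:
    """Return top-strand fragment lengths after complete digestion."""
    site, offset = SITES[enzyme]
    seq = amplicon.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Invented 26-bp amplicons differing by one nucleotide inside the EcoRV site.
allele_cut = "A" * 10 + "GATATC" + "C" * 10
allele_uncut = "A" * 10 + "GATGTC" + "C" * 10
print(digest(allele_cut, "EcoRV"))
print(digest(allele_uncut, "EcoRV"))
```

An amplicon that retains the site yields two fragments, while one carrying the variant remains at full length, which is the difference that would be scored on a gel.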
| Subject area | |
| More specific subject area | |
| Type of data | |
| How data was acquired | |
| Data format | |
| Experimental factors | |
| Experimental features | |
| Data source location | |
| Data accessibility |