George Tsiolas, Sofia Michailidou, Antiopi Tsoureki, Anagnostis Argiriou.
Abstract
The genetic material of Vitis varieties is crucial for the wine sector. In addition, genomic technologies applied to Vitis germplasm characterization are important for the conservation of indigenous genetic reservoirs. Until recently, the most common methods for genetically identifying Vitis varieties were Simple Sequence Repeat (SSR) markers and SNP chips. With progress in Next Generation Sequencing (NGS) technologies and the falling sequencing cost per base, methods for the genetic identification of plant species have shifted. Among them, low-coverage Whole-Genome Sequencing (lcWGS) followed by downstream bioinformatic analysis for variant discovery and phylogenetic characterization is gaining scientific attention. This dataset presents shotgun sequencing data for two Greek Vitis varieties, 'Razaki' and 'Vlachiko'. The cultivars were collected from the ampelographic collection of the Aristotle University of Thessaloniki (AUTH) and have previously been characterized phenotypically and genetically. WGS libraries were sequenced on an Illumina® NovaSeq 6000 platform with the Illumina® NovaSeq 6000 S2 Reagent Kit (300 cycles). The raw sequence data used for analysis are available in the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA805368. Reads were aligned to the Vitis vinifera reference genome available from the EnsemblPlants database, and the analysis was conducted with the Genome Analysis Toolkit 4 (GATK4) pipeline. The data can enrich our knowledge of the genetic background of Vitis cultivars and can serve as a starting point for the scientific community towards the construction of a genomic database of Vitis cultivars.
Keywords: SNPs; Variant analysis; Vitis cultivars; Whole-genome sequencing
Year: 2022 PMID: 35572792 PMCID: PMC9092844 DOI: 10.1016/j.dib.2022.108216
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Generated genomic data of Vitis varieties ‘Razaki’ and ‘Vlachiko’.
| BioSample | SRA Accession | Variety | Raw Reads | Bases | Depth of coverage |
|---|---|---|---|---|---|
| SAMN25855225 | SRR17982063 | | 47,292,258 | 12,917,903,048 | 26.57 |
| SAMN25855226 | SRR17982062 | | 47,404,592 | 12,714,283,858 | 26.15 |
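The depth-of-coverage figures can be reproduced from the base counts in the table above. The sketch below assumes that depth is the total number of sequenced bases divided by the reference genome length (486,265,422 bp, as reported in the variant statistics table); the source does not state the formula, and the function and variable names are illustrative only.

```python
# Reference genome length in bp, as reported in the variant statistics table.
REFERENCE_LENGTH_BP = 486_265_422

# Sequenced bases per SRA run, copied from the table above.
sequenced_bases = {
    "SRR17982063": 12_917_903_048,
    "SRR17982062": 12_714_283_858,
}


def depth_of_coverage(bases: int, genome_length: int = REFERENCE_LENGTH_BP) -> float:
    """Mean depth, assumed here to be sequenced bases / reference length."""
    return bases / genome_length


# Round to two decimals to match the precision used in the table.
depths = {run: round(depth_of_coverage(n), 2) for run, n in sequenced_bases.items()}

for run, depth in depths.items():
    print(f"{run}: {depth:.2f}x")
```

Under this assumption the two runs come out at 26.57x and 26.15x, matching the reported values.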
Type and number of variants per variety.
| Summary Variant Statistics | Razaki InDels | Razaki SNPs | Vlachiko InDels | Vlachiko SNPs |
|---|---|---|---|---|
| Total number of loci | 889,874 | 5,687,476 | 927,939 | 5,931,063 |
| Number of variants (before filtering) | 926,009 | 5,735,996 | 968,309 | 5,984,608 |
| Number of variants processed (after filtering) | 915,881 | 5,704,855 | 957,137 | 5,950,310 |
| Number of multi-allelic variants (more than two alleles) | 36,135 | 48,520 | 40,370 | 53,545 |
| Number of effects | 1,700,732 | 9,494,769 | 1,769,227 | 9,906,520 |
| Reference genome total length | 486,265,422 | 486,265,422 | 486,265,422 | 486,265,422 |
| Reference genome effective length | 486,265,422 | 486,265,422 | 486,265,422 | 486,265,422 |
| Variant rate | 1 every 530 bases | 1 every 85 bases | 1 every 508 bases | 1 every 81 bases |
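Two rows of this table can be derived from the others, which makes a quick consistency check possible. The sketch below rests on two assumptions inferred from the numbers rather than stated in the source: the variant rate is the effective genome length divided by the number of variants after filtering, rounded down, and the multi-allelic count is the difference between variants before filtering and the number of loci. All names are illustrative.

```python
EFFECTIVE_LENGTH_BP = 486_265_422  # "Reference genome effective length" row

# (variety, variant type) -> counts copied from the table above.
counts = {
    ("Razaki", "InDels"):   {"loci": 889_874,   "before": 926_009,   "after": 915_881},
    ("Razaki", "SNPs"):     {"loci": 5_687_476, "before": 5_735_996, "after": 5_704_855},
    ("Vlachiko", "InDels"): {"loci": 927_939,   "before": 968_309,   "after": 957_137},
    ("Vlachiko", "SNPs"):   {"loci": 5_931_063, "before": 5_984_608, "after": 5_950_310},
}


def variant_rate(n_variants: int, length: int = EFFECTIVE_LENGTH_BP) -> int:
    """Bases per variant, rounded down (assumed rounding rule)."""
    return length // n_variants


def multi_allelic(before: int, loci: int) -> int:
    """Variants before filtering minus loci (assumed relationship)."""
    return before - loci


rates = {key: variant_rate(c["after"]) for key, c in counts.items()}
extra_alleles = {key: multi_allelic(c["before"], c["loci"]) for key, c in counts.items()}

for key in counts:
    variety, kind = key
    print(f"{variety} {kind}: 1 variant every {rates[key]} bases, "
          f"{extra_alleles[key]:,} multi-allelic")
```

Both derived rows agree with the reported values for all four columns.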
The number of variants and the affected Sequence Ontologies (SO).
| Sequence Ontologies (SO) affected | Razaki InDels | Razaki SNPs | Vlachiko InDels | Vlachiko SNPs |
|---|---|---|---|---|
| 3_prime_UTR_truncation | 3 | 0 | 2 | 0 |
| 3_prime_UTR_variant | 10,404 | 44,682 | 11,375 | 48,289 |
| 5_prime_UTR_premature_start_codon_gain_variant | 0 | 3,776 | 0 | 3,889 |
| 5_prime_UTR_truncation | 4 | 0 | 2 | 0 |
| 5_prime_UTR_variant | 4,571 | 23,523 | 4,792 | 24,673 |
| bidirectional_gene_fusion | 1 | 0 | 0 | 0 |
| conservative_inframe_deletion | 786 | 0 | 794 | 0 |
| conservative_inframe_insertion | 807 | 0 | 907 | 0 |
| disruptive_inframe_deletion | 1,335 | 0 | 1,391 | 0 |
| disruptive_inframe_insertion | 952 | 0 | 1,012 | 0 |
| downstream_gene_variant | 371,034 | 1,811,783 | 386,922 | 1,906,153 |
| exon_loss_variant | 13 | 0 | 13 | 0 |
| frameshift_variant | 6,847 | 0 | 7,024 | 0 |
| gene_fusion | 1 | 0 | 3 | 0 |
| initiator_codon_variant | 0 | 51 | 0 | 50 |
| intergenic_region | 686,851 | 4,261,933 | 709,662 | 4,401,118 |
| intragenic_variant | 5 | 0 | 0 | 0 |
| intron_variant | 203,918 | 1,154,763 | 1,241,972 | 1,241,972 |
| missense_variant | 0 | 119,532 | 0 | 126,314 |
| non_coding_transcript_exon_variant | 122 | 968 | 133 | 1,216 |
| non_coding_transcript_variant | 0 | 307 | 0 | 331 |
| splice_acceptor_variant | 356 | 545 | 340 | 554 |
| splice_donor_variant | 428 | 522 | 422 | 570 |
| splice_region_variant | 3,220 | 14,691 | 3,346 | 16,125 |
| start_lost | 181 | 405 | 170 | 412 |
| start_retained_variant | 0 | 15 | 0 | 16 |
| stop_gained | 291 | 2,591 | 282 | 2,628 |
| stop_lost | 133 | 517 | 129 | 524 |
| stop_retained_variant | 25 | 231 | 35 | 253 |
| synonymous_variant | 0 | 95,902 | 0 | 103,131 |
| upstream_gene_variant | 412,606 | 1,973,306 | 423,890 | 2,045,032 |
Fig. 1. Heatmap of variants for the ‘Razaki’ and ‘Vlachiko’ varieties. Rows depict the affected Sequence Ontologies, and columns depict the SNPs and InDels for each variety. The color scale shows log10(variants + 1).
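The color-scale transform named in the caption can be illustrated with a few counts from the Sequence Ontology table. This is a minimal sketch of the log10(variants + 1) mapping only, not the authors' plotting code; the function name and the selection of rows are illustrative. Adding 1 keeps categories with zero variants defined, since they map to 0.

```python
import math


def heatmap_value(count: int) -> float:
    """Color-scale value from the caption: log10(count + 1)."""
    return math.log10(count + 1)


# A few Razaki InDel counts copied from the Sequence Ontology table.
razaki_indels = {
    "intergenic_region": 686_851,
    "downstream_gene_variant": 371_034,
    "frameshift_variant": 6_847,
    "exon_loss_variant": 13,
    "initiator_codon_variant": 0,
}

scaled = {term: heatmap_value(n) for term, n in razaki_indels.items()}

for term, value in scaled.items():
    print(f"{term:28s} {value:.2f}")
```

The transform compresses counts spanning almost six orders of magnitude into a range of roughly 0 to 6, so rare and common categories remain distinguishable on one color scale.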
| Subject | Biological sciences: Omics: Genomics |
| Specific subject area | Low-coverage whole-genome sequencing of two Greek Vitis cultivars for cultivar identification and variant discovery |
| Type of data | Tables and Figures |
| How the data were acquired | WGS libraries were constructed using Illumina's Nextera DNA Flex library preparation kit. Sequencing was performed on an Illumina® NovaSeq 6000 platform using the Illumina® NovaSeq 6000 S2 Reagent Kit (300 cycles). Variant discovery was conducted using the Genome Analysis Toolkit 4 pipeline. |
| Data format | Raw and Analyzed |
| Description of data collection | Leaves from two grapevine varieties, ‘Razaki’ (white grape variety) and ‘Vlachiko’ (red grape variety), were obtained from the Ampelographic Collection of the Aristotle University of Thessaloniki. |
| Data source location | Institution: Institute of Applied Biosciences – Centre for Research and Technology Hellas; City: Thessaloniki; Country: Greece; Latitude and longitude for analyzed data: 40.56806, 22.99713 |
| Data accessibility | Repository name: NCBI SRA; Data identification number: PRJNA805368; Direct URL to data: |