Nazima Habibi, Fadila Al Salameen, Muhammed Rahman, Vinod Kumar, Sami Al Amad, Anisha Shajan, Farhana Zakir, Nasreem Abdul Razzack, Waiel Hussain Tinwala.
Abstract
Acacia tree populations are declining in several countries, especially in the Arabian Peninsula, due to human-induced activities. The tree has potential medicinal benefits and economic value as a source of fuel and timber. It can fix nitrogen, a significant property that assists in desert rehabilitation. However, the lack of genomic information for Acacia pachyceras hampers its genetic study and breeding. We performed paired-end sequencing of A. pachyceras at a depth of 120X to obtain 108.9 GB of raw sequences with a per-base quality >Q30. The filtered data were assembled into a FASTA file of 4 GB. The assembled genomic sequences contained 901,755 simple sequence repeats (SSRs). In total, 11,596 primer pairs were designed against these SSR motifs. The data provide baseline genomic information about the species and form a basis for further sequencing of A. pachyceras using PacBio and Hi-C technologies. The newly developed SSR markers will facilitate genetic diversity and conservation studies for Acacia species.
Keywords: De novo assembly; Genome survey; Molecular markers; Native plants; Whole genome sequencing
Year: 2022 PMID: 35313494 PMCID: PMC8933827 DOI: 10.1016/j.dib.2022.108031
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Statistics of clean sequence data.
| Raw Data (in Mb) | Insert Size (bp) | Read Length (bp) | Filtered Data (in Mb) |
| 108,900 | 300–400 | 100;100 | 96,785 |
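The share of data that survived filtering follows directly from the two volumes in the table. The short sketch below only restates that arithmetic; the variable names are illustrative and not taken from the article.

```python
# Values copied from the table above (in Mb).
raw_mb = 108_900
filtered_mb = 96_785

# Percentage of the raw data retained after quality filtering.
retained_pct = 100 * filtered_mb / raw_mb
removed_mb = raw_mb - filtered_mb

print(f"Retained after filtering: {retained_pct:.1f}%")
print(f"Removed by filtering: {removed_mb:,} Mb")
```

With these figures, roughly 89% of the raw data were kept.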
Fig. 1. Sequence quality score. The x-axis represents the average Phred scores. The y-axis depicts the raw reads.
Basic statistics of genome assembly of Acacia pachyceras.
| Statistics | PRJNA754103 |
| # contigs (>= 0 bp) | 51,761,594 |
| # contigs (>= 1000 bp) | 269 |
| Total length (>= 0 bp) | 2,654,428,893 |
| Total length (>= 1000 bp) | 330,734 |
| # contigs | 6,096 |
| Largest contig | 3,140 |
| Total length | 3,904,753 |
| N50 | 609 |
| N75 | 543 |
| L50 | 2,514 |
| L75 | 4,220 |
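For readers unfamiliar with the contiguity metrics in the table above, the following minimal sketch shows how N50/L50 and N75/L75 are conventionally defined: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x% of the assembly length, while Lx is the number of contigs needed to get there. The function name `nx_lx` and the toy contig lengths are assumptions for illustration; this is not the tool or the data used by the authors.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    fraction=0.5 gives N50/L50, fraction=0.75 gives N75/L75.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    threshold = sum(ordered) * fraction       # share of the total to cover
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy example (hypothetical contig lengths, total = 1500 bp).
toy_contigs = [100, 200, 300, 400, 500]
n50, l50 = nx_lx(toy_contigs, 0.5)
n75, l75 = nx_lx(toy_contigs, 0.75)
print(n50, l50, n75, l75)
```

In the toy example, the two longest contigs (500 + 400 bp) already cover half of the 1500 bp total, so N50 is 400 and L50 is 2.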
Fig. 2. Guanine plus cytosine (GC) content analysis. (a) The x-axis represents the GC content and the y-axis the number of contigs. (b) The x-axis represents the GC content and the y-axis the number of windows.
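As a rough illustration of what the two panels summarise, the sketch below computes GC content for a whole sequence and for consecutive non-overlapping windows. The window size used for the figure is not stated in the source, so the `window` parameter, the function names, and the example sequence are all assumptions.

```python
def gc_content(seq):
    """GC percentage of a sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


def windowed_gc(seq, window):
    """GC percentage of consecutive, non-overlapping windows of fixed size."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Hypothetical toy sequence, not taken from the assembly.
example = "ATATGCGC"
print(gc_content(example))
print(windowed_gc(example, window=4))
```

Per-contig values of `gc_content` would feed a histogram like panel (a), and per-window values from `windowed_gc` one like panel (b).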
Fig. 3. SSR motifs mined from the genome assembly of Acacia pachyceras. (a) Distribution of SSR motifs; (b) percentage of di- and trinucleotides; (c) SSR length distribution; (d) distribution of paired SSR motifs; (e) contigs with the highest occurrence of SSRs; (f) SSR count versus sequence length.
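To make the idea of SSR mining concrete, here is a minimal regular-expression sketch that reports perfect di- and trinucleotide repeats in a sequence. The source does not name the mining software or its thresholds in this excerpt, so the function `find_ssrs`, the minimum repeat counts (6 for dinucleotides, 5 for trinucleotides), and the test sequence are illustrative assumptions rather than the authors' method.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    min_repeats maps motif length to the minimum number of tandem copies.
    The default thresholds are illustrative assumptions only.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, reps in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least reps - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, reps - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical sequence containing (AT)7 and (GAA)5.
toy = "GG" + "AT" * 7 + "CCG" + "GAA" * 5 + "TT"
print(find_ssrs(toy))
```

Dedicated SSR tools additionally handle compound and imperfect repeats and pass the flanking sequence to primer design, which this sketch does not attempt.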
| Subject | |
| Specific subject area | |
| Type of data | |
| How the data were acquired | Paired-end (2 × 150 cycles) sequencing on Illumina HiSeq 2500 |
| Data format | Raw, analysed, filtered |
| Description of data collection | Fresh leaf samples were collected from the single specimen growing in the SANR area. DNA was isolated by the CTAB method in triplicate. DNA purity and concentration were measured before sequencing. DNA sequences were obtained on the Illumina HiSeq 2500 platform followed by |
| Data source location | • Institution: Kuwait Institute for Scientific Research |
| Data accessibility | Repository Name: National Center for Biotechnology Information and figshare |