Sergei S Ryakhovsky, Victoria A Dikaya, Vitaly I Korchagin, Andrey A Vergun, Lavrentii G Danilov, Sofia D Ochkalova, Anastasiya E Girnyk, Daria V Zhernakova, Marine S Arakelyan, Vladimir B Brukhin, Aleksey S Komissarov, Alexey P Ryskov.
Abstract
Darevskia rock lizards include 29 sexual and seven parthenogenetic species of hybrid origin distributed in the Caucasus. All seven parthenogenetic species of the genus Darevskia arose through interspecific hybridization among only four sexual species. The main evolutionary advantages of interspecific hybridization combined with a switch to parthenogenetic reproduction in reptiles remain unknown. Whole-transcriptome sequencing data from parthenogens and their parental species can provide valuable insight into this problem. Here we sequenced ovary tissue transcriptomes from the unisexual parthenogenetic lizard D. unisexualis and its bisexual parental species to facilitate subsequent annotation and to obtain collinear characteristics for comparison with other lizard species. RNA-seq of total mRNA from ovary tissue of D. unisexualis, D. valentini and D. raddei yielded 58932755, 51634041 and 62788216 reads, respectively. The reads were assembled with the Trinity assembler, giving 95141, 62123 and 61836 contigs with N50 values of 2409, 2801 and 2827, respectively. For further analysis, the top Gene Ontology terms were annotated for all species and the number of transcripts was calculated. The raw data were deposited in the NCBI SRA database (BioProject PRJNA773939). The assemblies are available in Mendeley Data and can be accessed via doi:10.17632/rtd8cx7zc3.1.
Keywords: AMP; Darevskia lizards; Ovaries; Parthenogenesis; Transcriptome analysis
Year: 2021 PMID: 34917712 PMCID: PMC8666336 DOI: 10.1016/j.dib.2021.107685
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Statistics of the RNA-seq data generated from the three lizard species.
| Species | Total reads | Total bases | Q20 bases | Q30 bases | GC content |
|---|---|---|---|---|---|
| D. unisexualis | 58.932755 M | 17.313222 G | 97.98% | 94.34% | 48.05% |
| D. valentini | 51.634041 M | 15.593480 G | 97.65% | 93.35% | 46.59% |
| D. raddei | 62.788216 M | 18.962041 G | 97.13% | 92.33% | 46.77% |
Q20 - percentage of bases with a Phred quality score of at least 20, i.e. an error probability of no more than 1 in 100.
Q30 - percentage of bases with a Phred quality score of at least 30, i.e. an error probability of no more than 1 in 1,000.
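The Q20 and Q30 columns follow from the Phred scale, on which a quality score Q corresponds to an error probability of 10^(-Q/10). The following is a minimal sketch of how such percentages could be computed for a single read. It assumes Phred+33 encoded quality strings (common for Illumina FASTQ files, but not stated in the source), and the function names are hypothetical.

```python
def phred_to_error_prob(q: int) -> float:
    """Error probability implied by a Phred quality score Q."""
    return 10 ** (-q / 10)


def fraction_at_least(quality_string: str, threshold: int, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is >= threshold.

    Assumes Phred+33 ASCII encoding (an assumption, not taken from the source).
    """
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(s >= threshold for s in scores) / len(scores)


# Toy quality string: 'I' encodes Q40, '5' encodes Q20, '+' encodes Q10.
toy = "IIII55++"
q20 = fraction_at_least(toy, 20)  # bases with Q >= 20
q30 = fraction_at_least(toy, 30)  # bases with Q >= 30
```

For a whole run, the same counts would be accumulated over every read before dividing by the total number of bases.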
Summary characteristics of the transcriptome assemblies for all three samples.
| Metric | D. raddei | D. valentini | D. unisexualis |
|---|---|---|---|
| # contigs (>= 0 bp) | 122746 | 126141 | 228862 |
| # contigs (>= 1000 bp) | 39398 | 39145 | 57984 |
| # contigs (>= 5000 bp) | 3517 | 3533 | 2447 |
| # contigs (>= 10000 bp) | 134 | 118 | 8 |
| # contigs (>= 25000 bp) | 0 | 0 | 0 |
| # contigs (>= 50000 bp) | 0 | 0 | 0 |
| Total length (>= 0 bp) | 139903614 | 140424203 | 204321108 |
| Total length (>= 1000 bp) | 105256962 | 104453618 | 138481341 |
| Total length (>= 5000 bp) | 22937215 | 22851989 | 14738047 |
| Total length (>= 10000 bp) | 1524044 | 1315073 | 90998 |
| Total length (>= 25000 bp) | 0 | 0 | 0 |
| Total length (>= 50000 bp) | 0 | 0 | 0 |
| # contigs | 61836 | 62123 | 95141 |
| Largest contig | 16187 | 14170 | 12181 |
| Total length | 121008445 | 120525010 | 164367910 |
| GC (%) | 46.23 | 46.19 | 46.06 |
| N50 | 2827 | 2801 | 2409 |
| N75 | 1567 | 1558 | 1378 |
| L50 | 13677 | 13716 | 22778 |
| L75 | 27887 | 27951 | 45070 |
| # N's per 100 kbp | 0.0 | 0.0 | 0.0 |
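For readers unfamiliar with the N50, N75, L50 and L75 rows, the sketch below shows the usual definition: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x% of the assembly length, while Lx is the number of contigs needed to get there. The function name and the toy lengths are hypothetical; this is an illustration of the definition, not the tool used to produce the table.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    fraction=0.5 gives N50/L50, fraction=0.75 gives N75/L75.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    # Not reached for valid input; kept as a defensive fallback.
    return ordered[-1], len(ordered)


# Toy assembly with a total length of 30.
toy_contigs = [10, 8, 6, 4, 2]
n50, l50 = nx_lx(toy_contigs)        # target 15: 10 + 8 = 18
n75, l75 = nx_lx(toy_contigs, 0.75)  # target 22.5: 10 + 8 + 6 = 24
```

A lower N50 together with a higher L50, as in the third column, indicates an assembly split into more and shorter contigs.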
Fig. 1. Structural characteristics of the three transcriptomes: number of annotated proteins (A) and of TRINITY contigs and transcripts (B).
Fig. 2. Distribution of the top 10 GO terms for each of the three species.
Summary of Trinotate/Gene Ontologies.
| Species | Total transcripts with GO | Total transcripts with only one GO | Total transcripts with multiple GO | Total GO in the file | Total unique GO in the file |
|---|---|---|---|---|---|
| | 38844 | 1241 | 37603 | 553827 | 16934 |
| | 38756 | 1195 | 37561 | 550189 | 16885 |
| | 63219 | 1796 | 61423 | 880200 | 17771 |
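The five columns of this summary can be derived from a mapping of transcripts to their GO terms. The sketch below is one way to do that, assuming a hypothetical in-memory dictionary rather than the actual Trinotate output format, and assuming that "Total GO" counts every transcript-term assignment while "Total unique GO" counts distinct terms.

```python
def summarize_go(transcript_to_go):
    """Summarize GO assignments per transcript.

    transcript_to_go: dict mapping a transcript ID to a list of GO term IDs
    (a hypothetical input structure chosen for illustration).
    """
    # Keep only transcripts that carry at least one GO term.
    annotated = {t: terms for t, terms in transcript_to_go.items() if terms}
    single = sum(1 for terms in annotated.values() if len(terms) == 1)
    all_terms = [go for terms in annotated.values() for go in terms]
    return {
        "with_go": len(annotated),
        "only_one_go": single,
        "multiple_go": len(annotated) - single,
        "total_go": len(all_terms),
        "unique_go": len(set(all_terms)),
    }


# Toy example with made-up transcript IDs.
toy = {
    "t1": ["GO:0005524"],
    "t2": ["GO:0005524", "GO:0003677"],
    "t3": [],
    "t4": ["GO:0016020", "GO:0005634", "GO:0003677"],
}
summary = summarize_go(toy)
```

By construction, the "only one GO" and "multiple GO" counts add up to the number of transcripts with GO, which is also true of each row in the table.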
Open Reading Frames (ORFs) prediction numbers using TransDecoder.
| Species | Total | Complete | 5-prime partial | 3-prime partial | Internal |
|---|---|---|---|---|---|
| | 55816 | 34333 | 12663 | 3133 | 22136 |
| | 55808 | 34499 | 12893 | 3026 | 5327 |
| | 81344 | 43408 | 22136 | 5390 | 10473 |
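The four ORF categories differ in whether the predicted coding sequence includes a start codon, a stop codon, both, or neither. The sketch below illustrates that distinction on an in-frame coding sequence, assuming the standard genetic code. It is a simplified illustration with a hypothetical function name, not TransDecoder's own implementation.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code


def classify_orf(cds: str) -> str:
    """Classify an in-frame coding sequence by its start and stop codons."""
    seq = cds.upper()
    if len(seq) < 3 or len(seq) % 3 != 0:
        raise ValueError("expected an in-frame sequence of whole codons")
    has_start = seq[:3] == START
    has_stop = seq[-3:] in STOPS
    if has_start and has_stop:
        return "complete"
    if has_stop:
        return "5prime_partial"  # start codon missing
    if has_start:
        return "3prime_partial"  # stop codon missing
    return "internal"            # both ends missing


examples = {
    "ATGAAATAA": classify_orf("ATGAAATAA"),
    "AAACCCTAA": classify_orf("AAACCCTAA"),
    "ATGAAACCC": classify_orf("ATGAAACCC"),
    "AAACCCGGG": classify_orf("AAACCCGGG"),
}
```

Partial and internal ORFs typically arise when a transcript was assembled only in part, so the coding region runs off one or both ends of the contig.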
Fig. 3. Venn diagrams showing overlapping hits for the three species (A) and overlap of EggNOG genes (B).
| Subject | Biology |
| Specific subject area | Transcriptomics |
| Type of data | Transcriptome assemblies, raw sequences |
| How data were acquired | Ovary RNA from three lizard species was isolated and sequenced by Macrogen Inc. (Korea) |
| Data format | Analyzed, Raw |
| Parameters for data collection | Data collection contains raw transcriptome data for ovary tissues of three lizard species: unisexual (parthenogenetic) |
| Description of data collection | Data collection includes total Illumina HiSeq2500 generated transcriptome reads, transcripts, TRINITY contigs, predicted proteins, and ORFs. |
| Data source location | All lizards were collected from populations in Armenia. |
| Data accessibility | Raw data - BioProject |