Guo-Qi Li, Li-Xiao Song, Chang-Qing Jin, Miao Li, Shi-Pei Gong, Ya-Fang Wang.
Abstract
Apocynum venetum is an eco-economic plant that exhibits high stress resistance. In the present paper, we carried out a whole-genome survey of A. venetum in order to provide a foundation for its whole-genome sequencing. High-throughput sequencing technology (Illumina NovaSeq) was first used to measure the genome size of A. venetum, and bioinformatics methods were employed to evaluate the genome size, heterozygosity ratio, repeated sequences, and GC content. The sequencing analysis results indicated that the preliminary estimated genome size of A. venetum was 254.40 Mbp, and its heterozygosity ratio and percentage of repeated sequences were 0.63% and 40.87%, respectively, indicating that it has a complex genome. We used k-mer = 41 to carry out a preliminary assembly and obtained a contig N50 of 3841 bp with a total contig length of 223949699 bp. Further assembly yielded a scaffold N50 of 6196 bp with a total scaffold length of 227322054 bp. We performed simple sequence repeat (SSR) molecular marker prediction based on the A. venetum genome data and identified a total of 101918 SSRs. The numbers of the different types of nucleotide repeats differed greatly, with mononucleotide repeats being the most numerous and hexanucleotide repeats the least numerous. We recommend the use of the '2+3' (Illumina+PacBio) sequencing combination to supplement the Hi-C technique and resequencing technique in future whole-genome research in A. venetum.
Keywords: Apocynum venetum; GC content; K-mer analysis; SSR molecular marker; genome sequencing; heterozygosity ratio
Year: 2019 PMID: 31189745 PMCID: PMC6591564 DOI: 10.1042/BSR20190146
Source DB: PubMed Journal: Biosci Rep ISSN: 0144-8463 Impact factor: 3.840
Data statistics
| Lib ID | Raw base (bp) | Clean base (bp) | Effective rate (%) | Error rate (%) | Q20 (%) | Q30 (%) |
|---|---|---|---|---|---|---|
| NDES00224 | 36620525700 | 36605877489 | 99.96 | 0.04 | 96.09 | 89.95 |
Abbreviations: Q20, percentage of bases with quality value ≥ 20; Q30, percentage of bases with quality value ≥ 30.
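The quality columns in the table above can be read with two small calculations. The sketch below assumes the standard Phred definition (error probability = 10^(-Q/10)) and assumes that the effective rate is clean bases divided by raw bases; the record does not state the formula, but this definition reproduces the tabulated 99.96%. The function names are illustrative.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to a base-call error probability."""
    return 10 ** (-q / 10)


def effective_rate(raw_bases: int, clean_bases: int) -> float:
    """Percentage of raw bases kept after filtering (assumed definition)."""
    return 100.0 * clean_bases / raw_bases


raw_bases = 36620525700    # "Raw base (bp)" for library NDES00224
clean_bases = 36605877489  # "Clean base (bp)" for library NDES00224

print(f"Q20 error probability: {phred_to_error_probability(20):.4f}")
print(f"Q30 error probability: {phred_to_error_probability(30):.4f}")
print(f"Effective rate: {effective_rate(raw_bases, clean_bases):.2f}%")
```

Under these assumptions, a Q20 of 96.09% means that about 96% of bases have an estimated error probability of at most 1 in 100, and a Q30 of 89.95% means that about 90% have at most 1 in 1000.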
Figure 1. Distribution of GC content
The left half of the figure (before the dotted line) shows the GC content distribution of read 1, and the right half shows that of read 2. Different colors represent different base types. The plot is used to check whether AT or GC separation is present.
Figure 2. K-mer distribution curve
K-mer analysis used to predict the genome size of A. venetum; the expected k-mer depth is determined from the position of the main peak.
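As a rough illustration of how a main-peak depth leads to a genome size, the sketch below applies a commonly used approximation: genome size ≈ total number of k-mers / expected k-mer depth. This record does not give the authors' exact procedure, so the formula, the low-depth cutoff `min_depth`, and the toy histogram are all assumptions made for illustration, not the study's data.

```python
from typing import Dict, Tuple


def estimate_genome_size(histogram: Dict[int, int],
                         min_depth: int = 3) -> Tuple[int, float]:
    """Return (peak_depth, estimated_genome_size) from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing-error k-mers and ignored
    (the cutoff value is an assumption).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    # The main peak is the depth with the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * c for d, c in usable.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram, invented for illustration only.
toy = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100, 40: 50}
peak, size = estimate_genome_size(toy)
print(f"main peak depth = {peak}, estimated genome size = {size:.0f} bp")
```

In practice the histogram would come from a k-mer counter run on the clean reads, and heterozygous or repeat peaks complicate the simple ratio used here.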
Statistics of the assembled genome sequences in A. venetum
| Item | Contig length (bp) | Contig number | Scaffold length (bp) | Scaffold number |
|---|---|---|---|---|
| N50 | 3841 | 10027 | 6196 | 6502 |
| N60 | 2016 | 18198 | 3091 | 11744 |
| N70 | 1074 | 33671 | 1468 | 22655 |
| N80 | 562 | 62797 | 696 | 45616 |
| N90 | 255 | 122424 | 300 | 96392 |
| Total length (bp) | 223949699 | | 227322054 | |
| Total number | | 282245 | | 239333 |
| Max length (bp) | 134265 | | 191270 | |
| GC content (%) | 32.91 | | | |
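For readers unfamiliar with the N50 to N90 rows, the sketch below computes an Nx statistic under the usual definition: sort sequences from longest to shortest and report the length at which the running total first reaches x% of the total assembly length. Reading the "Number" column as the count of sequences needed to reach that point is an assumption, as are the function name `nx_stat` and the toy lengths.

```python
from typing import List, Tuple


def nx_stat(lengths: List[int], x: int) -> Tuple[int, int]:
    """Return (Nx length, number of sequences needed to reach x% of the total)."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy contig lengths, invented for illustration only.
toy_lengths = [100, 80, 60, 40, 20]
for x in (50, 90):
    length, count = nx_stat(toy_lengths, x)
    print(f"N{x}: length = {length} bp, number = {count}")
```

Applied to the real contig or scaffold lengths of an assembly, the same function would yield rows analogous to those in the table.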
Figure 3. Distribution of contig coverage depth and length
The highest peak in the figure is the main peak; the heterozygosity of the genome was judged from the peak located at half the depth of the main peak.
Figure 4. Distribution of contig coverage depth and number
The highest peak in the figure is the main peak; the heterozygosity of the genome was judged from the peak located at half the depth of the main peak.
Figure 5. Distribution of GC_depth
Type and proportion of SSR
| SSR repeat type | Motif | Number | Proportion (%) |
|---|---|---|---|
| Mononucleotide | A/T | 59078 | 57.966 |
| Mononucleotide | C/G | 6142 | 6.026 |
| Dinucleotide | AC/GT | 3050 | 2.993 |
| Dinucleotide | AG/CT | 5552 | 5.448 |
| Dinucleotide | AT/TA | 20655 | 20.266 |
| Dinucleotide | CG/GC | 19 | 0.019 |
| Trinucleotide | AAC/GTT | 120 | 0.118 |
| Trinucleotide | AAG/CTT | 1390 | 1.364 |
| Trinucleotide | AAT/ATT | 4020 | 3.944 |
| Trinucleotide | ACC/GGT | 215 | 0.211 |
| Trinucleotide | ACG/CGT | 27 | 0.026 |
| Trinucleotide | ACT/AGT | 135 | 0.132 |
| Trinucleotide | AGC/CTG | 124 | 0.122 |
| Trinucleotide | AGG/CCT | 166 | 0.163 |
| Trinucleotide | ATC/ATG | 395 | 0.388 |
| Trinucleotide | CCG/CGG | 40 | 0.039 |
| Tetranucleotide | AAAC/GTTT | 18 | 0.018 |
| Tetranucleotide | AAAG/CTTT | 112 | 0.110 |
| Tetranucleotide | AAAT/ATTT | 248 | 0.243 |
| Tetranucleotide | AACC/GGTT | 1 | 0.001 |
| Tetranucleotide | AACT/AGTT | 5 | 0.005 |
| Tetranucleotide | AAGG/CCTT | 4 | 0.004 |
| Tetranucleotide | AATC/ATTG | 6 | 0.006 |
| Tetranucleotide | AATG/ATTC | 5 | 0.005 |
| Tetranucleotide | AATT/AATT | 37 | 0.036 |
| Tetranucleotide | ACAG/CTGT | 2 | 0.002 |
| Tetranucleotide | ACAT/ATGT | 42 | 0.041 |
| Tetranucleotide | ACCC/GGGT | 4 | 0.004 |
| Tetranucleotide | ACTC/AGTG | 2 | 0.002 |
| Tetranucleotide | AGAT/ATCT | 25 | 0.025 |
| Tetranucleotide | AGCC/CTGG | 1 | 0.001 |
| Tetranucleotide | AGGC/CCTG | 1 | 0.001 |
| Tetranucleotide | AGGG/CCCT | 10 | 0.010 |
| Tetranucleotide | ATCC/ATGG | 4 | 0.004 |
| Tetranucleotide | ATGC/ATGC | 2 | 0.002 |
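To make the SSR categories concrete, the following is a minimal sketch of regex-based SSR detection. It is not the tool or the settings used in the study, which this record does not name. The minimum repeat counts in `MIN_REPEATS`, the function names, and the test sequence are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Minimum number of repeats per motif length (hypothetical thresholds).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_reducible(motif: str) -> bool:
    """True if motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence: str) -> List[Tuple[int, str, int]]:
    """Return sorted (start, motif, repeat_count) tuples for SSRs in sequence."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # already counted under the shorter motif
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Invented test sequence: A x12, AT x7, and AAG x5 separated by short flanks.
seq = "GGC" + "A" * 12 + "CGC" + "AT" * 7 + "GCC" + "AAG" * 5 + "TC"
ssrs = find_ssrs(seq)
by_length = Counter(len(motif) for _, motif, _ in ssrs)
for motif_len, n in sorted(by_length.items()):
    print(f"motif length {motif_len}: {n} SSR(s), {100 * n / len(ssrs):.3f}%")
```

Unlike the table, this sketch does not merge a motif with its reverse complement (for example A with T, or AC with GT); a fuller version would canonicalize motifs before counting.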