Tingyu Shan, Junxian Wu, Daqing Yu, Jin Xie, Qingying Fang, Liangping Zha, Huasheng Peng.
Abstract
Atractylodes lancea (Thunb.) DC. is a traditional Chinese medicine rich in sesquiterpenes that has been widely used in China and Japan for the treatment of viral infections. Despite its important pharmacological value, genomic information regarding A. lancea is currently unavailable. In the present study, the whole genome sequence of A. lancea was obtained using an Illumina sequencing platform. The results revealed an estimated genome size for A. lancea of 4,159.24 Mb, with 2.28% heterozygosity and a repeat rate of 89.2%, which together indicate a highly heterozygous and highly repetitive genome. Based on the genomic data of A. lancea, 27,582 simple sequence repeat (SSR) markers were identified. The repeat types differed widely in abundance: mononucleotide repeats were the most abundant (54.74%) and pentanucleotide repeats the least abundant (0.10%), while GA/TC (31.17%) and TTC/GAA (7.23%) were the most abundant motifs among the dinucleotide and trinucleotide repeats, respectively. A total of 93,434 genes matched known genes in common databases, including 48,493 genes in the Gene Ontology (GO) database and 34,929 genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. This is the first report to sequence and characterize the whole genome of A. lancea and will provide a theoretical basis and reference for further genome-wide deep sequencing and SSR molecular marker development of A. lancea.
Keywords: Atractylodes lancea; Functional annotation; Genome size; Genome survey; Simple sequence repeat (SSR)
Year: 2020 PMID: 33026067 PMCID: PMC7593537 DOI: 10.1042/BSR20202709
Source DB: PubMed Journal: Biosci Rep ISSN: 0144-8463 Impact factor: 3.840
Statistical data from the 17-mer analysis
| K-mer | K-mer number | Genome size (Mb) | Repeat (%) | Heterozygous ratio (%) | Used bases (Gb) | Used sequence depth (X) |
|---|---|---|---|---|---|---|
| 17 | 223162157589 | 4159 | 89.2 | 2.28 | 251.12 | 60 |
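The figures in this table can be cross-checked with simple arithmetic. The sketch below assumes the usual k-mer survey relation (genome size ≈ total k-mer number / k-mer peak depth) and the relation sequencing depth = used bases / genome size. The record does not state the k-mer peak depth, so the value computed here is only what the reported numbers imply, not a figure from the authors. The function names are illustrative.

```python
# Values taken from the record: genome size in megabases, used bases in gigabases.
KMER_NUMBER = 223_162_157_589
GENOME_SIZE_MB = 4159.24
USED_BASES_GB = 251.12


def sequencing_depth(used_bases_gb: float, genome_size_mb: float) -> float:
    """Average sequencing depth (X) = used bases / genome size."""
    return used_bases_gb * 1000 / genome_size_mb  # 1 Gb = 1000 Mb


def implied_kmer_depth(kmer_number: int, genome_size_mb: float) -> float:
    """K-mer depth implied by: genome size = k-mer number / k-mer depth.

    Assumption: the genome size was estimated with this standard relation.
    """
    return kmer_number / (genome_size_mb * 1_000_000)


depth = sequencing_depth(USED_BASES_GB, GENOME_SIZE_MB)
kmer_depth = implied_kmer_depth(KMER_NUMBER, GENOME_SIZE_MB)
print(f"sequencing depth ~ {depth:.1f}X, implied k-mer depth ~ {kmer_depth:.1f}")
```

The computed sequencing depth rounds to the 60X reported in the table, which suggests the table columns are internally consistent.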
Figure 1. The 17-mer and GC-depth distribution of the A. lancea genome
(A) The distribution curve of 17-mer. (B) The GC-depth distribution of the A. lancea genome.
Genomic information statistics of A. lancea
| Statistic | Scaffold length (bp) | Scaffold number | Contig length (bp) | Contig number |
|---|---|---|---|---|
| N10 | 4256 | 68307 | 2857 | 94469 |
| N20 | 2477 | 211142 | 1622 | 299948 |
| N30 | 1629 | 438525 | 1071 | 629298 |
| N40 | 1120 | 775151 | 750 | 1110774 |
| N50 | 778 | 1260281 | 533 | 1790925 |
| N60 | 526 | 1967071 | 375 | 2751303 |
| N70 | 335 | 3047918 | 253 | 4135094 |
| N80 | 190 | 4851250 | 165 | 6261804 |
| N90 | 119 | 7855566 | 112 | 9405484 |

| Statistic | Scaffold | Contig |
|---|---|---|
| max_len (bp) | 289109 | 289109 |
| Total_length (bp) | 4508659206 | 4277961587 |
| number ≥ 100 bp | 12088217 | 13462724 |
| number ≥ 2000 bp | 311077 | 199719 |
| GC_rate | 0.367 | 0.384 |
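For readers unfamiliar with the N10 to N90 rows, the following is a minimal sketch of how an Nx statistic is conventionally computed from a list of sequence lengths. It assumes the common definition (the length of the shortest sequence in the smallest set of longest sequences covering x% of the total assembly length) and that the "Number" column counts the sequences in that set. The record does not say which tool or exact definition the authors used, and `nx_stat` is a hypothetical helper name.

```python
def nx_stat(lengths, x):
    """Return (Nx length, number of sequences) for a list of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    they cover at least x percent of the total length.
    """
    if not lengths or not 0 < x <= 100:
        raise ValueError("need a non-empty list and 0 < x <= 100")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count


# Toy example with made-up lengths (not data from the study).
toy_lengths = [10, 8, 6, 4, 2]
print(nx_stat(toy_lengths, 50))
print(nx_stat(toy_lengths, 90))
```

Under this definition, a small N50 paired with a very large sequence count, as in the table, points to a highly fragmented draft assembly.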
SSR types detected in the A. lancea sequences
| Searching item | Number | Ratio (%) |
|---|---|---|
| total_SSR_number | 27582 | |
| total_SSR_length (bp) | 476561 | |
| Number of SSR-containing sequences | 25144 | 91.16 |
| Number of sequences containing more than one SSR | 2212 | 8.02 |
| Total number of identified SSR | 26034 | 100.00 |
| Mononucleotide | 14250 | 54.74 |
| Dinucleotide | 7219 | 27.73 |
| Trinucleotide | 4217 | 16.20 |
| Tetranucleotide | 273 | 1.05 |
| Pentanucleotide | 26 | 0.10 |
| Hexanucleotide | 49 | 0.19 |
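The ratios in this table can be reproduced from the counts. The sketch below recomputes them and makes the two denominators explicit: the per-type ratios are consistent with a total of 26,034 (the sum of the six repeat classes), while the two sequence-level ratios are consistent with 27,582 as the denominator. This is an observation from the arithmetic only; the record does not explain why the two totals differ.

```python
# Counts copied from the SSR table.
ssr_counts = {
    "Mononucleotide": 14250,
    "Dinucleotide": 7219,
    "Trinucleotide": 4217,
    "Tetranucleotide": 273,
    "Pentanucleotide": 26,
    "Hexanucleotide": 49,
}


def percentages(counts):
    """Percentage share of each repeat class, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


type_total = sum(ssr_counts.values())
type_pct = percentages(ssr_counts)

# Sequence-level ratios appear to use total_SSR_number as the denominator.
TOTAL_SSR_NUMBER = 27582
containing_pct = round(100 * 25144 / TOTAL_SSR_NUMBER, 2)
multi_pct = round(100 * 2212 / TOTAL_SSR_NUMBER, 2)

print(type_total, type_pct)
print(containing_pct, multi_pct)
```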
Figure 2. Distribution of various classes of simple repeat motifs with different numbers of repeats in the A. lancea genome
X-axis, number of SSR repeats; Y-axis, frequency of SSR type.
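To illustrate what "number of SSR repeats" means for each motif class, here is a small regex-based SSR scan. It is a sketch only: the record does not name the SSR detection tool or its thresholds, so the minimum repeat counts below (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides) are assumed, commonly used values, and `find_ssrs` and `is_primitive` are hypothetical helper names.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    The reported motif is the leftmost reading frame that matches, so
    'GA' and 'AG' repeats are not merged here.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        # A k-base unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence (not data from the study): A x12, GA x7, TTC x5.
demo = "A" * 12 + "C" + "GA" * 7 + "G" + "TTC" * 5
print(find_ssrs(demo))
```

Counting the third element of each tuple per motif class would give a repeat-number distribution of the kind plotted in the figure.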
Figure 3. Percentage of different motifs in dinucleotide and trinucleotide repeats in A. lancea
(A) Frequency of different dinucleotide SSR motifs. (B) Frequency of different trinucleotide SSR motifs.
Figure 4. GO functional classification of A. lancea unigenes
Figure 5. KEGG functional classification of A. lancea unigenes