Wen-Hua Qi, Ting Lu, Cheng-Li Zheng, Xue-Mei Jiang, Hang Jie, Xiu-Yue Zhang, Bi-Song Yue, Gui-Jun Zhao.
Abstract
Forest musk deer (Moschus berezovskii, FMD) is an endangered artiodactyl species in which the males produce musk. We sequenced the whole genome of FMD, completed its assembly and annotation, and performed bioinformatic analyses. Microsatellites (SSRs) were nonrandomly distributed across genomic regions, and SSR abundance was much higher in the intronic and intergenic regions than in other genomic regions. Tri- and hexanucleotide perfect SSRs (P-SSRs) predominated in coding regions (CDSs), whereas tetra- and pentanucleotide P-SSRs were less abundant. Trifold P-SSRs had higher GC-content in the 5'-untranslated regions (5'UTRs) and CDSs than in other genomic regions, whereas mononucleotide P-SSRs had the lowest GC-content. For each of the mono- to hexanucleotide P-SSR classes, the repeat copy number (RCN) was distributed differently among genomic regions. The RCN of trinucleotide P-SSRs was significantly higher in the CDSs than in the transposable elements (TEs), introns, and intergenic regions. Analysis of the coefficient of variability (CV) showed that the RCN of mononucleotide P-SSRs varied the most across genomic regions, and that the CV of RCN then decreased in the order dinucleotide > trinucleotide > tetranucleotide > pentanucleotide > hexanucleotide P-SSRs. Within each motif class, the CV of RCN was relatively higher in the introns and intergenic regions, intermediate in the TEs, and relatively lower in the 5'UTRs, CDSs, and 3'UTRs. Fifty-eight novel polymorphic SSR loci were detected by genotyping DNA from 36 captive FMD, and 22 SSR markers ultimately showed polymorphism, stability, and repeatability.
Keywords: GC; forest musk deer genome; genomic regions; microsatellites; variation analysis
Year: 2020 PMID: 32155132 PMCID: PMC7093171 DOI: 10.18632/aging.102895
Source DB: PubMed Journal: Aging (Albany NY) ISSN: 1945-4589 Impact factor: 5.682
Overview of mono- to hexanucleotide P-SSRs in the FMD genome.
| Parameter | Mono- | Di- | Tri- | Tetra- | Penta- | Hexa- | Total |
|---|---|---|---|---|---|---|---|
| # of P-SSRs | 273,518 | 148,175 | 122,105 | 39,977 | 96,262 | 598 | 680,635 |
| GC-content (%) | 1.72 | 37.14 | 62.56 | 29.13 | 40.27 | 59.97 | 32.41 |
| Total length of P-SSRs (bp) | 3,192,531 | 2,719,558 | 2,077,536 | 678,452 | 106,801 | 15,420 | 8,790,299 |
| Relative abundance (#/Mb) | 100.38 | 53.40 | 44.00 | 14.41 | 35.33 | 0.22 | 247.75 |
| P-SSR percentage (%a) | 40.19 | 21.77 | 17.94 | 5.87 | 14.14 | 0.09 | 100.00 |
%a = share of all P-SSRs in the whole FMD genome contributed by each of the mono- to hexanucleotide classes.
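The percentage row can be checked against the count row with a few lines of arithmetic. The sketch below is only a consistency check, not the authors' pipeline. It assumes the tetra- and pentanucleotide counts read 39,977 and 96,262, which is the reading under which the six classes sum to the stated total of 680,635; the variable and function names are illustrative.

```python
# Counts of perfect SSRs per motif length, as read from the overview table.
# Assumption: the tetra- and pentanucleotide counts are 39,977 and 96,262.
counts = {
    "mono": 273_518,
    "di": 148_175,
    "tri": 122_105,
    "tetra": 39_977,
    "penta": 96_262,
    "hexa": 598,
}

total = sum(counts.values())


def percentage(count: int, total_count: int) -> float:
    """Share of all P-SSRs contributed by one motif class, in percent."""
    return round(100.0 * count / total_count, 2)


percentages = {name: percentage(n, total) for name, n in counts.items()}

# Genome span implied by the "Total" column: count / (count per Mb).
implied_genome_mb = total / 247.75

if __name__ == "__main__":
    print("total P-SSRs:", total)
    for name, pct in percentages.items():
        print(f"{name:>5}: {pct:.2f}%")
    print(f"implied genome span: {implied_genome_mb:.0f} Mb")
```

Under that reading the computed shares match the tabulated percentages to two decimals, and the total count divided by the total relative abundance implies an assembled span of roughly 2.7 Gb.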
The most frequent P-SSR motifs in the FMD genome.
| Mono- | Di- | Tri- | Tetra- | Penta- | Hexa- |
|---|---|---|---|---|---|
| A (80.48) | AC (34.47) | ACG (17.94) | AAAT (4.12) | AACTG (0.87) | AACCCT (0.02) |
| C (1.08) | AG (4.17) | AGC (17.74) | AAAC (1.78) | AGTTC (0.87) | ACCCCC (0.02) |
| — | AT (14.62) | AAC (2.00) | AAAG (0.88) | AAGTG (0.02) | AGGGTT (0.02) |
| — | CG (0.14) | AAT (1.97) | AGGT (0.64) | AAACA (0.01) | AAACAA (0.01) |
| — | — | ACC (1.39) | ACGT (0.61) | AAGGC (0.01) | ACACAG (0.01) |
| — | — | CCG (1.09) | ACCT (0.59) | GCCTT (0.01) | ACTGCT (0.01) |
a The numbers in parentheses refer to relative abundance.
Figure 1. The proportion of mono- to hexanucleotide P-SSRs in different genomic regions of the FMD genome.
Number, percentage, and relative abundance of P-SSRs in the different genomic regions of the FMD genome.
| Motif length | Parameter | 5'UTR | CDS | Intron | 3'UTR | TE | Intergenic |
|---|---|---|---|---|---|---|---|
| Mono- | # of P-SSRs | 115 | 26 | 72,591 | 598 | 58,987 | 135,726 |
| | #/Mb | 59.10 | 0.76 | 104.31 | 78.20 | 49.01 | 91.61 |
| Di- | # of P-SSRs | 60 | 16 | 45,758 | 316 | 27,997 | 89,153 |
| | #/Mb | 30.83 | 0.47 | 54.79 | 41.32 | 22.93 | 62.54 |
| Tri- | # of P-SSRs | 178 | 2,419 | 36,275 | 83 | 6,789 | 53,607 |
| | #/Mb | 91.47 | 70.38 | 43.43 | 10.85 | 5.35 | 36.18 |
| Tetra- | # of P-SSRs | 38 | 23 | 13,458 | 56 | 12,619 | 28,975 |
| | #/Mb | 19.53 | 0.67 | 16.11 | 7.32 | 8.58 | 19.56 |
| Penta- | # of P-SSRs | 9 | 18 | 32,870 | 19 | 3,683 | 50,814 |
| | #/Mb | 4.62 | 0.52 | 39.36 | 2.48 | 2.94 | 34.30 |
| Hexa- | # of P-SSRs | 5 | 86 | 173 | 2 | 74 | 791 |
| | #/Mb | 2.57 | 2.50 | 0.21 | 0.13 | 0.05 | 0.56 |
| Total | # of P-SSRs | 405 | 2,588 | 201,125 | 1,074 | 110,149 | 359,066 |
| | #/Mb | 208.12 | 75.30 | 258.21 | 140.45 | 88.86 | 244.75 |
Figure 2. GC-content of mono- to hexanucleotide P-SSRs in different genomic regions of the FMD genome.
Figure 3. Distribution of different motifs of mono- to trinucleotide P-SSRs in different genomic regions of the FMD genome.
Figure 4. Comparative analysis of repeat copy number (RCN) of mono- to hexanucleotide P-SSRs in different genomic regions of the FMD genome.
Figure 5. The CV analysis of RCN of P-SSRs in different genomic regions of the FMD genome.
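To make the RCN and CV quantities concrete, the following is a minimal sketch of how perfect SSRs could be located in a sequence and how the coefficient of variability of their repeat copy numbers could be computed per motif length. It is not the software used in the study. The minimum-repeat thresholds in `MIN_REPEATS`, the function names, and the toy sequence are all assumptions made for illustration, and the CV here uses the sample standard deviation, which may differ from the authors' choice. Motifs are reported as found, without merging rotations or reverse complements.

```python
import re
import statistics
from collections import defaultdict

# Hypothetical minimum repeat copy numbers per motif length (1-6 bp).
# The thresholds actually used in the study are not stated in this text.
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_copy_number) tuples."""
    seq = seq.upper()
    hits = []
    for k, n_min in min_repeats.items():
        # A k-bp unit followed by at least n_min - 1 exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "ACAC" reported as a tetramer
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def cv_percent(values):
    """Coefficient of variability: sample SD / mean * 100 (None if n < 2)."""
    if len(values) < 2:
        return None
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def rcn_cv_by_motif_length(hits):
    """Group repeat copy numbers by motif length and compute the CV of each."""
    groups = defaultdict(list)
    for _start, motif, rcn in hits:
        groups[len(motif)].append(rcn)
    return {k: cv_percent(v) for k, v in sorted(groups.items())}


# Toy sequence (made up for illustration, not FMD data).
toy = "G" + "A" * 12 + "CGT" + "AC" * 7 + "TTG" + "AC" * 9 + "G" + "AGC" * 5 + "T"
toy_hits = find_perfect_ssrs(toy)
toy_cv = rcn_cv_by_motif_length(toy_hits)

if __name__ == "__main__":
    for hit in toy_hits:
        print(hit)
    print(toy_cv)
```

In a region-level analysis, the same grouping would be applied separately to the SSRs falling in each annotated region (5'UTR, CDS, intron, 3'UTR, TE, intergenic), giving one CV per motif length per region. The primitive-motif filter matters because a long dinucleotide run also satisfies the tetranucleotide pattern.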
Characteristics of the novel microsatellite marker system and genetic diversity of the captive FMD population, including locus name, primer sequence, accession number, repeat unit, fluorescent dye, annealing temperature (Tm), length (bp), number of individuals genotyped (N), number of alleles (k), observed heterozygosity (HO), expected heterozygosity (HE), allelic richness (Ar), polymorphism information content (PIC), and P value for the Hardy-Weinberg equilibrium (HWE) test (P-value).
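| Locus | Primer sequence | Accession number | Repeat unit | Fluorescent dye | Tm (°C) | Length (bp) | N | k | HO | HE | Ar | PIC | P-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LS-2-1 | F: GATCGAGTTGCAGGAGTC | KT390284 | (GCAG)10 | FAM | 57 | 416-440 | 36 | 6 | 0.385 | 0.65 | 6.000 | 0.593 | 0.03 |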
| LS-6-1 | F: CAGGATCTGCTTCTGACATT | KT390285 | (GATG)8 | HEX | 59 | 420-432 | 36 | 3 | 0.538 | 0.555 | 3.000 | 0.484 | 0.376 |
| LS-7-1 | F: TAATTAGAGGGGTGTAAGCG | KT390286 | (AGGA)8 | HEX | 57 | 412-428 | 36 | 2 | 0.154 | 0.145 | 2.000 | 0.132 | 0.884 |
| LS-8-1 | F: TGTTCCTGGGATTCTTGAAG | KT390287 | (AGAC)8 | FAM | 55 | 408-432 | 36 | 5 | 0.654 | 0.675 | 5.000 | 0.598 | 0.311 |
| LS-9-1 | F: ATGAATCAACTCAGTCCCTG | KT390288 | (ATAG)8 | HEX | 59 | 410-430 | 36 | 3 | 0.192 | 0.278 | 2.895 | 0.255 | 0.054 |
| LS-12-3 | F: GCGGGATCATGAGAATAGGT | KT390289 | (CAGA)8 | FAM | 61 | 408-432 | 36 | 3 | 0.538 | 0.679 | 3.00 | 0.592 | 0.097 |
| LS-13-1 | F: TTGATCCAGTTCAGCAAAGT | KT390290 | (AGAA)8 | FAM | 61 | 400-432 | 36 | 6 | 0.615 | 0.655 | 6.000 | 0.582 | 0.253 |
| LS-14-2 | F: GGTCTTTCCTGTCACTCCTC | KT390291 | (TGCG)8 | FAM | 57 | 396-432 | 36 | 6 | 0.692 | 0.735 | 5.997 | 0.683 | 0.562 |
| LS-16-1 | F: AGCCATATTCTCAAACCATTC | KT390292 | (AGAC)8 | HEX | 57 | 406-430 | 36 | 4 | 0.577 | 0.508 | 3.990 | 0.454 | 0.181 |
| LS-17-1 | F: TTAACATGACATATGGGAGAG | KT390293 | (TATG)8 | FAM | 57 | 295-315 | 36 | 4 | 0.615 | 0.63 | 4.000 | 0.548 | 0.503 |
| LS-18-1 | F: CATCCATTCATCTGTCCCTT | KT390294 | (CCAT)8 | HEX | 57 | 412-430 | 36 | 3 | 0.577 | 0.562 | 2.993 | 0.472 | 0.391 |
| LS-24-1 | F: TTAAACATATGCCTAAGAGTCC | KT390295 | (TTGG)7 | HEX | 57 | 404-428 | 36 | 4 | 0.615 | 0.613 | 4.000 | 0.555 | 0.607 |
| LS-27-1 | F: CAGGGTAGCTCTAGATTTGT | KT390296 | (ATGG)7 | HEX | 55 | 368-388 | 36 | 4 | 0.308 | 0.278 | 3.995 | 0.257 | 0.54 |
| LS-28-1 | F: CCTAATTTTCCAGCTTGCAG | KT390297 | (ATCC)7 | FAM | 57 | 396-428 | 36 | 3 | 0.154 | 0.147 | 3.000 | 0.138 | 0.883 |
| LS-29-1 | F: GGAAACACACATCAGAACTC | KT390298 | (TTTA)7 | HEX | 57 | 308-328 | 36 | 3 | 0.346 | 0.386 | 3.000 | 0.343 | 0.351 |
| LS-30-1 | F: CATCACTGAAGCGACTTAGA | KT390299 | (ATCC)7 | FAM | 57 | 391-403 | 36 | 2 | 0.231 | 0.208 | 1.968 | 0.183 | 0.722 |
| LS-31-1 | F: GTGCTGTATTAGGCTTCAGA | KT390300 | (ATGG)7 | HEX | 57 | 408-428 | 36 | 2 | 0.154 | 0.208 | 2.000 | 0.183 | 0.274 |
| LS-35-1 | F: CCCTCAATTCCCTTCGATAG | KT844932 | (TGGA)7 | FAM | 57 | 178-190 | 36 | 2 | 0.115 | 0.111 | 1.965 | 0.103 | 0.94 |
| LS-47-1 | F: GCCCAGCAATTCTACTTCTA | KT844933 | (CCAA)15 | FAM | 59 | 220-244 | 36 | 4 | 0.615 | 0.571 | 4.000 | 0.512 | 0.319 |
| LS-50-1 | F: GGGTTGGTATGGAAAGTTCT | KT844934 | (TCCA)12 | HEX | 61 | 215-235 | 36 | 4 | 0.5 | 0.455 | 3.995 | 0.397 | 0.235 |
| LS-56-1 | F: GTACAGTACCATGCAGTCTT | KT844936 | (CATA)12 | HEX | 59 | 270-286 | 36 | 3 | 0.308 | 0.305 | 3.000 | 0.277 | 0.656 |
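For readers unfamiliar with the diversity statistics in this table, the sketch below shows how HO, HE, and PIC are commonly computed from diploid genotypes at a single locus. It uses the standard definitions (HE as one minus the sum of squared allele frequencies, optionally with Nei's small-sample correction, and the Botstein et al. formula for PIC). Which software and estimators the authors used is not stated here, so this is an assumption. The genotypes in the example are invented allele sizes, not data from the study.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid (allele1, allele2) tuples."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    n_alleles = sum(counts.values())
    return {allele: c / n_alleles for allele, c in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles (HO)."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(freqs, n_individuals=None):
    """HE = 1 - sum(p_i^2); pass n_individuals for Nei's unbiased estimate."""
    he = 1.0 - sum(p * p for p in freqs.values())
    if n_individuals is not None:
        he *= 2 * n_individuals / (2 * n_individuals - 1)
    return he


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    cross_term = sum(2 * x * x * y * y for x, y in combinations(p, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical genotypes (allele sizes in bp) for four individuals at one locus.
example_genotypes = [(412, 412), (412, 416), (416, 420), (412, 416)]
example_freqs = allele_frequencies(example_genotypes)

if __name__ == "__main__":
    print("k  =", len(example_freqs))
    print("HO =", observed_heterozygosity(example_genotypes))
    print("HE =", expected_heterozygosity(example_freqs))
    print("PIC =", pic(example_freqs))
```

PIC is always at most HE for the same allele frequencies, which is consistent with the pattern in the table, where each locus's PIC is slightly below its HE.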