San-Xu Liu, Wei Hou, Xue-Yan Zhang, Chang-Jun Peng, Bi-Song Yue, Zhen-Xin Fan, Jing Li.
Abstract
The Tibetan macaque, which is endemic to China, is currently listed as a Near Threatened primate species by the International Union for Conservation of Nature (IUCN). Short tandem repeats (STRs) are repetitive elements of genome sequence whose repeat units range in length from 1 to 6 bp. They are found in many organisms and are widely applied in population genetic studies. To clarify the distribution characteristics of genome-wide STRs and understand their variation among Tibetan macaques, we conducted a genome-wide survey of STRs with next-generation sequencing of five macaque samples. A total of 1 077 790 perfect STRs were mined from our assembly, which had an N50 of 4 966 bp. Mono-nucleotide repeats were the most abundant, followed by tetra- and di-nucleotide repeats. GC content and repeat distribution were consistent with those reported for other macaques. Furthermore, using the STR analysis software lobSTR, we found that the proportion of base pair deletions in the STRs was greater than that of insertions in the five Tibetan macaque individuals (P < 0.05, t-test). We also found a greater number of homozygous STRs than heterozygous STRs (P < 0.05, t-test), with the Emei and Jianyang Tibetan macaques showing more heterozygous loci than the Huangshan Tibetan macaques. The proportion of insertions and the mean variation of alleles in the Emei and Jianyang individuals were slightly higher than those in the Huangshan individuals, revealing differences in STR allele size between the Sichuan (Emei and Jianyang) and Huangshan populations. The polymorphic STR loci identified based on the reference genome showed good amplification efficiency and could be used to study population genetics in Tibetan macaques. The neighbor-joining tree classified the five macaques into two different branches according to their geographical origin, indicating high genetic differentiation between the Huangshan and Sichuan populations.
We elucidated the distribution characteristics of STRs in the Tibetan macaque genome and provided an effective method for screening polymorphic STRs. Our results also lay a foundation for future genetic variation studies of macaques.
Keywords: Next-generation sequencing; Polymorphism; Short tandem repeats; Tibetan macaque (Macaca thibetana) genome; Variation analysis
Year: 2018 PMID: 29643326 PMCID: PMC5968858 DOI: 10.24272/j.issn.2095-8137.2018.047
Source DB: PubMed Journal: Zool Res ISSN: 2095-8137
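The abstract describes mining perfect STRs (uninterrupted repeats of a 1-6 bp motif) from the assembly. A minimal sketch of that idea using regular expressions is shown below. It is not the authors' pipeline: the function name `find_perfect_strs` and the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, since the record does not state the thresholds used.

```python
import re

# Minimum number of repeat units per motif length.
# These thresholds are assumptions for illustration, not the study's settings.
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def find_perfect_strs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect STR in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n-1 further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each locus is reported only under its shortest motif.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GG" + "A" * 12 + "CT" + "AC" * 7 + "G"
print(find_perfect_strs(example))
```

Dedicated STR finders also handle ambiguous bases, overlapping loci, and imperfect or compound repeats, which this sketch ignores.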
Assembly results for the M. thibetana genome
| Item | Number |
|---|---|
| Total number of sequences examined | 1 223 752 |
| Total size of examined sequences (bp) | 2 658 459 556 |
| Mean length of examined sequences (bp) | 2 172.38 |
| N50 (bp) | 4 966 |
| N90 (bp) | 1 108 |
| GC content (%) | 40.48 |
| Number of sequences containing STRs | 521 523 |
| Number of sequences containing more than one STR | 252 041 |
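The N50 and N90 values in the table above summarize assembly contiguity. Under the usual definition, Nx is the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the assembly. A minimal sketch follows; the helper name `nxx` and the toy lengths are hypothetical, and the real contig lengths are not part of this record.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nx statistic (N50 for fraction=0.5, N90 for fraction=0.9)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until the target share is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


toy_lengths = [10, 8, 6, 4, 2]  # hypothetical contig lengths in bp
print(nxx(toy_lengths, 0.5), nxx(toy_lengths, 0.9))
```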
Number, length, frequency, density, and GC content of perfect STRs
| Types | Total counts | Total length (bp) | Average length (bp) | Frequency (loci/Mb) | Density (bp/Mb) | GC bases (bp) | GC content (%) |
|---|---|---|---|---|---|---|---|
| Mono- | 623 930 | 10 397 456 | 16.66 | 234.7 | 3 911.083 | 50 200 | 0.48 |
| Di- | 165 769 | 3 646 216 | 22 | 62.36 | 1 371.552 | 1 777 603 | 48.75 |
| Tri- | 64 529 | 1 216 338 | 18.85 | 24.27 | 457.535 | 436 005 | 35.85 |
| Tetra- | 181 344 | 4 372 964 | 24.11 | 68.21 | 1 644.924 | 1 407 970 | 32.20 |
| Penta- | 35 871 | 873 535 | 24.35 | 13.49 | 328.587 | 230 585 | 26.40 |
| Hexa- | 6 347 | 172 866 | 27.24 | 2.39 | 65.025 | 60 271 | 34.87 |
| Total | 1 077 790 | 20 679 375 | 19.19 | 405.42 | 7 778.706 | 3 962 634 | 19.16 |
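The derived columns of the table above can be reproduced from the raw counts if frequency is read as loci per Mb of examined sequence, density as STR base pairs per Mb, and GC content as the share of GC bases within the STR sequence. That reading is inferred from the numbers rather than stated in this record, and the function name `str_summary` is hypothetical. A short check against the mono-nucleotide row:

```python
GENOME_BP = 2_658_459_556  # total size of examined sequences, from the assembly table


def str_summary(count, total_length_bp, gc_bases, genome_bp=GENOME_BP):
    """Derive per-type STR statistics from raw counts."""
    genome_mb = genome_bp / 1e6
    return {
        "average_length_bp": total_length_bp / count,
        "frequency_loci_per_mb": count / genome_mb,
        "density_bp_per_mb": total_length_bp / genome_mb,
        "gc_content_pct": 100 * gc_bases / total_length_bp,
    }


# Mono-nucleotide row: 623 930 loci, 10 397 456 bp, 50 200 GC bases.
mono = str_summary(623_930, 10_397_456, 50_200)
for name, value in mono.items():
    print(f"{name}: {value:.3f}")
```

The same arithmetic applied to the column sums gives the Total row, which is how the total length of 20 679 375 bp can be cross-checked.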
Figure 1. Distribution of STR motifs in M. thibetana
Figure 2. Distribution of STR types in the M. thibetana genome by number of repeats
Figure 3. Alignment results of STRs
Number of alleles, heterozygosity, and PIC of six polymorphic STR loci
| Locus | Na | N | Ho | He | PIC |
|---|---|---|---|---|---|
| JR05 | 4 | 16 | 0.875 | 0.659 | 0.590 |
| JR09 | 7 | 16 | 0.750 | 0.808 | 0.754 |
| JR10 | 6 | 15 | 0.400 | 0.577 | 0.524 |
| JR12 | 6 | 15 | 0.400 | 0.811 | 0.751 |
| JR18 | 5 | 15 | 0.467 | 0.602 | 0.519 |
| JR20 | 4 | 15 | 0.400 | 0.701 | 0.613 |
Na, observed number of alleles; N, sample number; Ho, mean observed heterozygosity; He, mean expected heterozygosity; PIC, polymorphic information content.
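The statistics defined in the footnote above can be computed from diploid genotypes with the standard formulas He = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i. The sketch below uses made-up genotypes and a hypothetical function name `locus_stats`. The record does not say which software or estimator the authors used, and some programs apply a small-sample correction to He, so values need not match the table exactly.

```python
from collections import Counter


def locus_stats(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples, one per diploid individual."""
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]

    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Expected heterozygosity under Hardy-Weinberg (uncorrected form).
    he = 1 - sum(p * p for p in freqs)
    # PIC subtracts the pairwise cross term from He.
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return {"Na": len(counts), "N": n, "Ho": ho, "He": he, "PIC": he - cross}


# Hypothetical allele sizes (bp) for four individuals at one locus.
toy = [(100, 104), (100, 104), (100, 100), (104, 104)]
print(locus_stats(toy))
```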
Figure 4. Comparison of the four allelotype categories for each repeat type among the five Tibetan macaques
Figure 5. Distribution of allele size differences in STRs from the reference for the five M. thibetana individuals