Jing Li6,7, Zhenxin Fan2, Tianlin Sun3, Changjun Peng4, Bisong Yue5, Jing Li6,7.
Abstract
Macaca is of great importance in evolutionary and biomedical research. To elucidate genetic diversity patterns and potential biomedical applications of macaques, we characterized single nucleotide variations (SNVs) of six Macaca species based on the reference genome of Macaca mulatta. Using eight whole-genome sequences, representing the most comprehensive genomic SNV study in Macaca to date, we focused on the discovery and comparison of nonsynonymous SNVs (nsSNVs) with bioinformatic tools. We observed that SNV distribution patterns were generally congruent among the eight individuals. Outlier tests of nsSNV distribution patterns detected 319 bins with significantly distinct genetic divergence among macaques, including differences in genes associated with taste transduction, homologous recombination, and fat and protein digestion. Genes with specific nsSNVs in various macaques were differentially enriched for metabolism pathways, such as glycolysis and protein digestion and absorption. On average, 24.95% and 11.67% of specific nsSNVs were putatively deleterious according to PolyPhen2 and SIFT4G, respectively, among which the shared deleterious SNVs were located in 564 to 1981 genes. These genes displayed enrichment signals in the 'obesity-related traits' disease category for all surveyed macaques, confirming that they are suitable models for obesity-related studies. Additional enriched disease categories were observed in some macaques, indicating promising potential for biomedical application. Positively selected genes identified by PAML in most tested Macaca species played roles in the immune and nervous systems, growth and development, and fat metabolism. We propose that metabolism and body size play important roles in the evolutionary adaptation of macaques.
Keywords: Macaca; SNVs; biomedical applications; comparative genomics; genetic diversity; macaques
Year: 2018 PMID: 30314376 PMCID: PMC6212917 DOI: 10.3390/ijms19103123
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Table 1. Information on genome data.
| Scientific Names | Species Symbol | Sample Identifier(s) | GenBank Accession(s) | Sequencing Platform(s) | #Reads | Depth | Total Usable Sites | Sex | Sample Origin(s) | Source(s) |
|---|---|---|---|---|---|---|---|---|---|---|
| | IR | IR | -- | Illumina | 20,100,000 | 5.1X | --- | Female | Washington National Primate Research Center | Gibbs et al. 2007 |
| | CR | CR1 | SRA023856 | Illumina | 3,299,851,568 | 45.65X | 2,264,143,011 | Female | Yunnan, China | Yan et al. 2011 |
| | CE | CE1 | SRA023855 | Illumina | 3,299,851,568 | 43.96X | 2,245,482,535 | Female | Vietnam | Yan et al. 2011 |
| | CE | CE2 | -- | SOLiD 3+ | 3,692,987,634 | 24.69X | 2,261,105,771 | Female | Malaysia | Higashino et al. 2012 |
| | SM | SM1 | SRX1470574 | Illumina | 1,001,034,260 | 34.55X | 2,280,352,231 | Female | Southwestern China | Fan et al. 2018 |
| | SM | SM2 | SRX1470575 | Illumina | 471,805,366 | 20.51X | 2,079,812,789 | Female | Southwestern China | Fan et al. 2018 |
| | TM | TM1 | SRP032525 | Illumina | 1,275,012,390 | 36.92X | 2,281,638,762 | Female | Sichuan, China | Fan et al. 2014 |
| | AM | AM1 | SRX1470561 | Illumina | 1,231,654,664 | 54.04X | 2,011,347,545 | Male | Yunnan, China | Fan et al. 2018 |
| | PM | PM1 | SRX1022644 | Illumina | 770,413,198 | 25.59X | 2,246,079,419 | Female | Washington National Primate Research Center | Baylor College of Medicine |
-- means there is no GenBank Accession.
SNV information for each analyzed macaque (see species symbol and sample identifiers in Table 1) including the total number of SNVs, the number of heterozygous (het.) or homozygous (homo.) SNVs, and the number of specific SNVs.
| Species Symbol | Sample Identifier(s) | #SNVs | %SNVs | #Homo. | #Het. | %Het. | Ti/Tv | #Specific | %Specific | #Specific Het. |
|---|---|---|---|---|---|---|---|---|---|---|
| CR | CR1 | 9,384,359 | 3.47/kb | 3,458,482 | 5,925,877 | 63.15 | 2.23 | 2,614,186 | 27.86 | 2,297,127 |
| CE | CE1 | 11,751,302 | 4.35/kb | 5,004,945 | 6,746,357 | 57.41 | 2.21 | 2,892,996 | 24.62 | 2,464,551 |
| CE | CE2 | 12,000,848 | 4.44/kb | 4,812,493 | 7,188,355 | 59.90 | 2.23 | 3,317,314 | 27.64 | 2,900,657 |
| CE | | 5,089,889 † | -- | 2,712,160 | 2,377,729 | 46.71 | 2.25 | 762,701 | 14.98 | 369,245 |
| SM | SM1 | 12,712,801 | 4.69/kb | 8,985,648 | 3,727,153 | 29.32 | 2.21 | 803,928 | 6.32 | 740,450 |
| SM | SM2 | 11,035,407 | 4.08/kb | 7,861,537 | 3,173,870 | 28.76 | 2.17 | 753,236 | 6.83 | 696,062 |
| SM | | 9,353,661 † | -- | 6,931,659 | 2,422,002 | 25.89 | 2.19 | 2,005,117 | 21.43 | 661,821 |
| TM | TM1 | 11,937,445 | 4.42/kb | 9,889,106 | 2,048,339 | 17.16 | 2.21 | 1,633,457 | 13.68 | 701,115 |
| AM | AM1 | 12,249,208 | 4.52/kb | 6,770,425 | 5,478,783 | 44.73 | 2.17 | 2,638,128 | 21.54 | 2,300,208 |
| PM | PM1 | 13,914,612 | 5.15/kb | 7,613,888 | 6,300,724 | 45.28 | 2.18 | 5,307,739 | 38.15 | 3,860,264 |
† SNVs shared by two individuals of the same species. -- means not applicable here.
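The percentage columns in the SNV table follow from the counts: %Het. is #Het. divided by #SNVs, and %Specific is #Specific divided by #SNVs. The following is a minimal Python sketch of that arithmetic using three rows copied from the table; the helper name `pct` and the `summary` dictionary are illustrative assumptions, not part of the source.

```python
# Counts copied from the SNV table: (#SNVs, #Homo., #Het., #Specific).
rows = {
    "CR1": (9_384_359, 3_458_482, 5_925_877, 2_614_186),
    "TM1": (11_937_445, 9_889_106, 2_048_339, 1_633_457),
    "PM1": (13_914_612, 7_613_888, 6_300_724, 5_307_739),
}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals (hypothetical helper)."""
    return round(100.0 * part / whole, 2)


summary = {}
for sample, (total, homo, het, specific) in rows.items():
    # Homozygous and heterozygous calls should partition the SNV set.
    assert homo + het == total
    summary[sample] = (pct(het, total), pct(specific, total))

for sample, (het_pct, spec_pct) in summary.items():
    print(f"{sample}: %Het.={het_pct:.2f} %Specific={spec_pct:.2f}")
```

The same check applies to the rows for shared SNVs (marked †), where the denominator is the number of SNVs shared by the two individuals.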
Figure 1. UpSetR plot illustrating the numbers of SNVs shared by different pairs or sets of macaques. Only the first twenty sets are displayed. Intersection Size on the y-axis represents the number of shared SNVs in the pair or set of macaques shown on the x-axis.
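The counting behind an UpSet-style plot can be sketched in a few lines of Python. This sketch assumes the plot uses exclusive intersections (each SNV is counted once, under the exact combination of individuals that carry it), which is the usual UpSet convention. The function name `exclusive_intersections` and the toy SNV keys are hypothetical and are not taken from the study's data or from UpSetR itself.

```python
from collections import Counter


def exclusive_intersections(snv_sets: dict[str, set[str]]) -> Counter:
    """Count SNVs by the exact combination of samples that carry them."""
    membership: dict[str, set[str]] = {}
    for sample, snvs in snv_sets.items():
        for snv in snvs:
            membership.setdefault(snv, set()).add(sample)
    return Counter(frozenset(samples) for samples in membership.values())


# Toy data: SNV keys are made-up "chrom:pos:alt" strings, not real calls.
toy = {
    "CR1": {"chr1:100:A", "chr1:200:T", "chr2:50:G", "chr5:5:A"},
    "CE1": {"chr1:100:A", "chr2:50:G", "chr3:10:C", "chr5:5:A"},
    "TM1": {"chr1:100:A", "chr4:77:T"},
}
counts = exclusive_intersections(toy)

# Largest intersections first, mirroring the bar ordering of the plot.
for combo, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print("+".join(sorted(combo)), n)
```

Sets containing a single sample correspond to the "specific" SNVs of that individual.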
Functional annotation of (a) all processed SNVs and (b) exonic SNVs in Macaca species based on rheMac2 provided by ANNOVAR.
(a)
| | #SNVs | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CR1 | 9,384,359 | 6,172,058 | 65.77% | 3,212,301 | 34.23% | 144,929 | 56,515 | 0.60% | 426 | 2,976,764 | 28,536 | 5131 |
| CE1 | 11,751,302 | 7,595,956 | 64.64% | 4,155,346 | 35.36% | 181,392 | 71,807 | 0.61% | 524 | 3,858,404 | 37,045 | 6174 |
| CE2 | 12,000,848 | 7,836,824 | 65.30% | 4,164,024 | 34.70% | 189,573 | 82,821 | 0.69% | 545 | 3,844,219 | 39,977 | 6889 |
| SM | 9,353,661 | 5,956,902 | 63.69% | 3,396,759 | 36.31% | 149,351 | 63,179 | 0.68% | 411 | 3,145,443 | 33,775 | 4600 |
| TM1 | 11,937,445 | 7,642,434 | 64.02% | 4,295,011 | 35.98% | 188,920 | 77,819 | 0.65% | 519 | 3,980,620 | 40,541 | 6592 |
| AM1 | 12,249,208 | 7,880,410 | 64.33% | 4,368,798 | 35.67% | 184,730 | 73,890 | 0.60% | 505 | 4,063,127 | 40,412 | 6134 |
| PM1 | 13,914,612 | 8,944,532 | 64.28% | 4,970,080 | 35.72% | 214,348 | 87,789 | 0.63% | 612 | 4,613,059 | 46,798 | 7474 |

(b)
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| CR1 | All | 33,295 | 58.91% | 22,839 | 40.41% | 328 | 0.58% | 53 | 0.09% |
| CR1 | Specific | 9576 | 60.79% | 6077 | 38.58% | 92 | 0.58% | 8 | 0.05% |
| CE1 | All | 42,590 | 59.31% | 28,772 | 40.07% | 392 | 0.55% | 53 | 0.07% |
| CE1 | Specific | 10,392 | 59.50% | 6963 | 39.87% | 101 | 0.58% | 10 | 0.06% |
| CE2 | All | 48,535 | 58.60% | 33,769 | 40.77% | 464 | 0.56% | 53 | 0.06% |
| CE2 | Specific | 12,259 | 59.93% | 8091 | 39.55% | 100 | 0.49% | 7 | 0.03% |
| SM | All | 38,424 | 60.82% | 24,478 | 38.74% | 237 | 0.38% | 40 | 0.06% |
| SM | Specific | 7753 | 61.25% | 4839 | 38.23% | 61 | 0.48% | 5 | 0.04% |
| TM1 | All | 45,939 | 59.03% | 31,441 | 40.40% | 375 | 0.48% | 64 | 0.08% |
| TM1 | Specific | 6057 | 58.76% | 4195 | 40.70% | 52 | 0.50% | 4 | 0.04% |
| AM1 | All | 43,945 | 59.47% | 29,497 | 39.92% | 400 | 0.54% | 48 | 0.06% |
| AM1 | Specific | 10,381 | 60.70% | 6599 | 38.58% | 115 | 0.67% | 8 | 0.05% |
| PM1 | All | 51,763 | 58.96% | 35,503 | 40.44% | 466 | 0.53% | 57 | 0.06% |
| PM1 | Specific | 20,034 | 61.25% | 12,507 | 38.24% | 154 | 0.47% | 15 | 0.05% |
Figure 2. The mean distribution frequencies of the total, genic, and exonic SNVs for all Macaca species, structurally annotated based on the rhesus macaque genome.
Figure 3. Outlier test of nonsynonymous SNV distribution patterns on chromosome 1 for eight macaque individuals using Cook’s distance test in R. The blue circles represent outlier chromosomal bins that hold significantly more nsSNVs than others. The numbers next to the blue circles are the Cook’s distances of the outlier bins. The dark blue line shows the threshold used, which is 30 times the mean Cook’s distance.
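The thresholding rule in the caption (flag a bin when its Cook's distance exceeds 30 times the mean) can be illustrated with a short Python sketch. The caption does not state which regression model was fit, so the simple linear regression of one per-bin count on another, the toy data, and the function names `cooks_distances` and `outlier_bins` are all assumptions made for illustration; only the 30-fold threshold comes from the source.

```python
def cooks_distances(x: list[float], y: list[float]) -> list[float]:
    """Cook's distance for each point of a simple linear regression y ~ x."""
    n, p = len(x), 2  # p = number of fitted parameters (intercept, slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]  # leverages h_ii
    # D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]


def outlier_bins(x: list[float], y: list[float], factor: float = 30.0) -> list[int]:
    """Indices of bins whose Cook's distance exceeds factor * mean distance."""
    d = cooks_distances(x, y)
    threshold = factor * sum(d) / len(d)
    return [i for i, di in enumerate(d) if di > threshold]


# Toy per-bin counts (hypothetical): 200 bins, one with a large excess in y.
n_bins = 200
x = [float(i % 20) for i in range(n_bins)]
y = [xi + (i % 3) - 1 for i, xi in enumerate(x)]
y[57] += 40.0  # artificial outlier bin

flagged = outlier_bins(x, y)
print(flagged)
```

Because no single value can exceed n times the mean of n non-negative values, a 30-fold threshold can only flag bins when more than 30 bins are compared.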
Enrichment outputs of genes with nonsynonymous SNVs in windows with distinct nonsynonymous SNV distribution patterns based on outlier test in R, (a) GO term enrichment, (b) KEGG pathway enrichment.
(a)
| GO term | GO ID | | | | Genes |
|---|---|---|---|---|---|
| Proteolysis | GO:0006508 | 16 | 151 | 0.0198 | ENSMMUG00000008649, ENSMMUG00000013071, ENSMMUG00000012849, ENSMMUG00000005527, ENSMMUG00000007838, ENSMMUG00000008264, ENSMMUG00000003339, ENSMMUG00000004413, ENSMMUG00000015029, ENSMMUG00000007209, ENSMMUG00000006734, ENSMMUG00000016370, ENSMMUG00000007785, ENSMMUG00000001344, ENSMMUG00000019294, ENSMMUG00000005120 |
| Regulation of proteolysis | GO:0030162 | 6 | 40 | 0.0374 | ENSMMUG00000001344, ENSMMUG00000012849, ENSMMUG00000005527, ENSMMUG00000007209, ENSMMUG00000006734, ENSMMUG00000005120 |
| Positive regulation of innate immune response | GO:0045089 | 3 | 12 | 0.0448 | ENSMMUG00000019932, ENSMMUG00000003373, ENSMMUG00000008854 |
| Cation transmembrane transporter activity | GO:0008324 | 12 | 116 | 0.0463 | ENSMMUG00000031030, ENSMMUG00000015607, ENSMMUG00000006442, ENSMMUG00000018390, ENSMMUG00000013626, ENSMMUG00000030358, ENSMMUG00000007087, ENSMMUG00000032213, ENSMMUG00000004969, ENSMMUG00000007062, ENSMMUG00000007061, ENSMMUG00000010257 |

(b)
| KEGG pathway | Pathway ID | | | | Genes |
|---|---|---|---|---|---|
| Taste transduction | mcc04742 | 10 | 70 | 0.0113 | ENSMMUG00000007062, ENSMMUG00000020698, ENSMMUG00000021005, ENSMMUG00000015717, ENSMMUG00000016272, ENSMMUG00000011771, ENSMMUG00000022440, ENSMMUG00000022439, ENSMMUG00000032291, ENSMMUG00000004773 |
| Homologous recombination | mcc03440 | 5 | 23 | 0.0163 | ENSMMUG00000007197, ENSMMUG00000003130, ENSMMUG00000022442, ENSMMUG00000014487, ENSMMUG00000019014 |
| Fat digestion and absorption | mcc04975 | 5 | 26 | 0.0247 | ENSMMUG00000007692, ENSMMUG00000000825, ENSMMUG00000031036, ENSMMUG00000000148, ENSMMUG00000002724 |
| Pancreatic secretion | mcc04972 | 9 | 73 | 0.0339 | ENSMMUG00000020698, ENSMMUG00000031036, ENSMMUG00000018390, ENSMMUG00000015298, ENSMMUG00000010306, ENSMMUG00000000148, ENSMMUG00000021397, ENSMMUG00000032208, ENSMMUG00000002724 |
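Enrichment tables of this kind are commonly produced with a hypergeometric (one-sided Fisher) over-representation test. The source does not state which tool or test generated the tabulated values, so the following Python sketch is generic rather than a reproduction of the authors' method. It assumes a term with `K` annotated genes in a background of `N` genes and a query list of `n` genes, `k` of which hit the term. The function name `hypergeom_sf` and the background and query sizes in the example call are hypothetical, so the result is not expected to match the table.

```python
from math import comb


def hypergeom_sf(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K belong to the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Small case that can be checked by hand: N=20, K=5, n=5, k=3.
p_small = hypergeom_sf(3, 5, 5, 20)

# Hypothetical sizes: 10 of 500 query genes fall in a 70-gene pathway,
# against an assumed background of 20,000 annotated genes.
p_example = hypergeom_sf(10, 70, 500, 20_000)
print(p_small, p_example)
```

In practice the raw P-values from many terms would also need a multiple-testing correction, which this sketch omits.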
Positively selected genes (PSGs) identified by PAML for different Macaca species/population. CE_Viet represents Vietnamese cynomolgus macaque (CE1 in Table 1), and CE_Mal stands for Malaysian cynomolgus macaque (CE2 in Table 1).
| Species/Population Symbol | CR | CE_Viet | CE_Mal | SM | TM | AM | PM |
|---|---|---|---|---|---|---|---|
| Gene Counts | 13 | 14 | 17 | 18 | 12 | 18 | 33 |
| Gene Symbol | BRI3 †, KSR1, ZMYND10, PGGT1B, SHARPIN, EVI2B, EP400, RBP3, TBX4, MYRF, CHST1, NCKAP5 ‡, SEC16B ‡ | KSR1 †, PIN1, ZNF787, DDT, IGFL1, FCAMR, CCDC33 ‡, EVC2 ‡, FRMPD2 ‡, SF3B1, FAM53A, HAP1 ‡, KIAA0825, ZNF474 ‡ | CMYA5 †‡, BAHD1, EVI2B, HTT ‡, ZNF646, KIAA1671 ‡, KCNMB2, NHLRC1, BAP1, RNF222, SH2D2A, DROSHA | ETFB †, ACE, SKIV2L, BAHD1, EP400, URB2, C9ORF131, PRR14L, LMNB1, DIS3L2, KLF13, APOBR ‡, ASXL1 ‡, DACT2, AIM1, SPAG5, THEM6, RSL24D1 | CSRP1 †, VSTM2L, LYST ‡, DIS3L2, BAHD1, ACAN, DDIAS, KIAA1549, EXOSC6, THEM6, LYRM5, RSL24D1 | KCNK1 †, AFAP1L1, JSRP1, OGFR, INPP5E, MYD88, BAHD1, ASPM ‡, EVC2, LMNB1, KIF26B ‡, KIAA1671 ‡, GAMT, SRRM2, AQP1 ‡, LYST ‡, ZNF330, THEM6 | HTT †‡, SRRM2 ‡, CCDC17 ‡, WHAMM, BDP1 ‡, ALPK3 ‡, NDNF, ACAN ‡, TRIM28, LYST ‡, LRRC10B, RARRES1 ‡, CMIP, RCSD1, ERCC6 ‡, TFG, GAMT, KIF26B ‡, SLC9C2, NAT6, DHRS9, EXO1 ‡, HAVCR1, TNK2 ‡, DDX31, XIRP2 ‡, WDR73, HVCN1 ‡, TMEM126B ‡, APOBR, NDUFV3, PRDM1, YWHAE |
† Gene symbol in bold represents the top PSG for each species/population. ‡ PSGs with probably damaging nsSNVs.