Zhenxin Fan, Guang Zhao, Peng Li, Naoki Osada, Jinchuan Xing, Yong Yi, Lianming Du, Pedro Silva, Hongxing Wang, Ryuichi Sakate, Xiuyue Zhang, Huailiang Xu, Bisong Yue, Jing Li.
Abstract
Macaques are the most widely distributed nonhuman primates and are used as animal models in biomedical research. The availability of full-genome sequences from them would be essential to both biomedical and primate evolutionary studies. Previous studies have reported whole-genome sequences from the rhesus macaque (Macaca mulatta) and the cynomolgus macaque (M. fascicularis; CE), both of which belong to the fascicularis group. Here, we present a 37-fold coverage genome sequence of the Tibetan macaque (M. thibetana; TM). TM is a species endemic to China that belongs to the sinica group. On the basis of mapping to the rhesus macaque genome, we identified approximately 11.9 million single-nucleotide variants (SNVs), of which 3.9 million were TM specific, as assessed by comparison with two Chinese rhesus macaque (CR) and two CE genomes. Some genes carried TM-specific homozygous nonsynonymous variants (TSHNVs) that were scored as deleterious in humans by both PolyPhen-2 and SIFT (Sorting Intolerant From Tolerant) and were enriched in eye disease genes. In total, 273 immune response and disease-related genes carried at least one TSHNV. The heterozygosity rates of the two CRs (0.002617 and 0.002612) and the two CEs (0.003004 and 0.003179) were approximately three times higher than that of TM (0.000898). Polymerase chain reaction resequencing of 18 TM individuals showed that 29 TSHNVs exhibited high allele frequencies, thus confirming their low heterozygosity. Genome-wide genetic divergence analysis demonstrated that TM was more closely related to CR than to CE. We further detected regions of unusually low divergence between TM and CR. In addition, after applying statistical criteria to detect putative introgression regions (PIRs) in the TM genome, up to 239,620 kb of PIRs (8.84% of the genome) were identified.
Given that TM and CR have overlapping geographical distributions, shared the same refuge during the Middle Pleistocene, and show similar mating behaviors, it is highly likely that there was an ancient introgression event between them. Moreover, demographic inferences revealed that TM had a demographic history similar to that of other macaques until 0.5 Ma, but then maintained a lower effective population size until the present. Our study provides new insight into macaque evolutionary history, confirming hybridization events between macaque species groups based on genome-wide data.
Keywords: SNVs; Tibetan macaque; demographic trajectories; genetic divergence; introgression; whole-genome sequencing
Year: 2014 PMID: 24648498 PMCID: PMC4032132 DOI: 10.1093/molbev/msu104
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: Genotyping pipeline and mean genome-wide coverage. (A) Overview of the sequence alignment and customized genotyping pipeline in our study. Details of the major steps and postgenotype filters can be found in supplementary file S4, Supplementary Material online. (B) Proportion of the covered genome per sample as a function of minimum depth of coverage. The numbers in the legend are the mean genome-wide coverage.
Numbers of Useable Sites, Autosomal Heterozygosity, and SNV Rate in Five Macaques.
| Sample | Nonvariant Sites | Heterozygous SNVs | Homozygous SNVs | Total SNVs | Total Useable Sites | Heterozygosity | SNV Rate |
|---|---|---|---|---|---|---|---|
| CR1 | 2,254,758,652 | 5,925,877 | 3,458,482 | 9,384,359 | 2,264,143,011 | 0.002617 | 0.004145 |
| CR2 | 1,631,118,438 | 4,277,233 | 1,974,865 | 6,252,098 | 1,637,370,536 | 0.002612 | 0.003818 |
| CE1 | 2,233,731,233 | 6,746,357 | 5,004,945 | 11,751,302 | 2,245,482,535 | 0.003004 | 0.005233 |
| CE2 | 2,249,104,923 | 7,188,355 | 4,812,493 | 12,000,848 | 2,261,105,771 | 0.003179 | 0.005308 |
| TM | 2,269,701,317 | 2,048,339 | 9,889,106 | 11,937,445 | 2,281,638,762 | 0.000898 | 0.005232 |
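The last two columns of the table above appear to follow directly from the counts. The sketch below assumes that heterozygosity is the number of heterozygous SNVs divided by the total useable sites, and that the SNV rate is the total number of SNVs divided by the total useable sites. These definitions are inferred from the tabulated numbers, not stated in the text, and the names `COUNTS` and `site_rates` are illustrative only.

```python
# Counts copied from the table: (nonvariant sites, heterozygous SNVs, homozygous SNVs).
COUNTS = {
    "CR1": (2_254_758_652, 5_925_877, 3_458_482),
    "CR2": (1_631_118_438, 4_277_233, 1_974_865),
    "CE1": (2_233_731_233, 6_746_357, 5_004_945),
    "CE2": (2_249_104_923, 7_188_355, 4_812_493),
    "TM": (2_269_701_317, 2_048_339, 9_889_106),
}


def site_rates(nonvariant, het, hom):
    """Return (total useable sites, heterozygosity, SNV rate).

    Assumed definitions (inferred from the table, not given in the text):
    heterozygosity = het / total useable sites
    SNV rate       = (het + hom) / total useable sites
    """
    total_useable = nonvariant + het + hom
    return total_useable, het / total_useable, (het + hom) / total_useable


RATES = {sample: site_rates(*counts) for sample, counts in COUNTS.items()}

if __name__ == "__main__":
    for sample, (total, heterozygosity, snv_rate) in RATES.items():
        print(f"{sample}: {total:,} sites, het={heterozygosity:.6f}, snv={snv_rate:.6f}")
```

Under these assumptions the computed values agree with the tabulated heterozygosity and SNV rate to six decimal places, which also shows that TM's low heterozygosity comes from its small share of heterozygous SNVs rather than from a low overall SNV rate.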
Numbers of Small Indels (Deletions and Insertions) in TM.
| Type | Total | Within Gene | Within Exon | Within UTR | Cause Frame Shifting |
|---|---|---|---|---|---|
| Deletions | 1,125,876 | 367,317 | 962 | 5,268 | 623 |
| Insertions | 1,032,913 | 336,218 | 875 | 4,894 | 613 |
Note.—Total numbers of indels and numbers of indels within genes, exons, and UTRs are shown. “Cause frame shifting” refers to indels within exons whose lengths are not multiples of 3 bp and which could therefore shift the reading frame.
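The frame-shift criterion in the note can be expressed as a one-line length check. This is a minimal sketch; the function names are hypothetical and the example lengths are made up for illustration, not taken from the study.

```python
def causes_frameshift(indel_length_bp: int) -> bool:
    """An exonic indel shifts the reading frame when its length is not a multiple of 3."""
    if indel_length_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    return indel_length_bp % 3 != 0


def count_frameshifts(exon_indel_lengths):
    """Count how many exonic indels in an iterable of lengths would shift the frame."""
    return sum(causes_frameshift(length) for length in exon_indel_lengths)
```

For example, `count_frameshifts([1, 2, 3, 6, 7])` counts the indels of length 1, 2, and 7, because 3 and 6 preserve the reading frame.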
Figure: Distribution of small indel lengths detected in the TM genome. (A) Total numbers of indels and the numbers of genic indels. (B) Numbers of exon indels.
SNV Distributions in TM.
| Category | Total | Homozygous | Heterozygous |
|---|---|---|---|
| Total SNVs | 11,937,445 (3,936,546) | 9,889,106 (2,861,190) | 2,048,339 (1,075,356) |
| Gene region | 3,833,049 (1,332,527) | 3,228,837 (999,602) | 604,212 (332,925) |
| Exon | 63,626 (26,205) | 52,992 (19,784) | 10,634 (6,421) |
| Intron | 3,262,692 (1,130,354) | 2,750,155 (847,520) | 512,537 (282,834) |
| UTR | 50,118 (18,892) | 41,776 (14,454) | 8,342 (4,438) |
| Noncoding | 456,613 (157,076) | 383,914 (117,844) | 72,699 (39,232) |
Note.—In parentheses are the numbers of SNVs that are specific to the TM when compared with other macaque individuals.
Figure: Single-nucleotide divergence and pairwise differences. (A) Single-nucleotide divergence between TM and other macaques in 50 kb nonoverlapping windows across the 20 autosomes. Heterozygous variants were not used in this test. (B) Pairwise differences between and within macaques estimated in 50 kb nonoverlapping windows across the genome. Genetic distance was calculated using the metric from Gronau et al. (2011). (C) Divergence between TM and CRs (TM/CRs) compared with different control sets in 50 kb nonoverlapping windows. Divergence between the two CRs and divergence between CRs and CEs were used as control sets. Numbers in parentheses are the percentages of windows in which TM/CRs minus the control set is smaller than zero. A value smaller than zero means that the divergence between TM and CRs is lower than that of the control set.
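The percentages reported in panel (C) can be sketched as follows, assuming two equally long lists of per-window divergence values, one for TM/CRs and one for a control set. The function name and the example values are hypothetical, and the divergence values themselves would come from whatever distance metric is used upstream.

```python
def percent_windows_below_zero(tm_cr_divergence, control_divergence):
    """Percentage of windows where TM/CRs divergence minus control divergence is < 0."""
    if len(tm_cr_divergence) != len(control_divergence):
        raise ValueError("both inputs must cover the same windows")
    if not tm_cr_divergence:
        raise ValueError("at least one window is required")
    differences = [tm - ctrl for tm, ctrl in zip(tm_cr_divergence, control_divergence)]
    n_negative = sum(diff < 0 for diff in differences)
    return 100.0 * n_negative / len(differences)


if __name__ == "__main__":
    # Made-up per-window values, for illustration only.
    tm_cr = [0.001, 0.004, 0.002, 0.003]
    control = [0.002, 0.003, 0.003, 0.003]
    print(percent_windows_below_zero(tm_cr, control))
```

Windows where the two divergences are equal are not counted as negative in this sketch.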
Size and Proportion of the PIRs in TM Genome with Different P Values under Different Window Sizes.
| P Value | Term | 10 kb | 20 kb | 30 kb | 40 kb | 50 kb | 60 kb | 80 kb | 100 kb |
|---|---|---|---|---|---|---|---|---|---|
| 0.01 | Cutoff | −0.5 | −0.31 | −0.233 | −0.172 | −0.135 | −0.115 | −0.069 | −0.043 |
| 0.01 | Sizes (kb) | 35,880 | 36,660 | 34,020 | 35,400 | 36,450 | 33,360 | 34,320 | 36,600 |
| 0.01 | Proportion of the genome (%) | 1.324 | 1.353 | 1.255 | 1.306 | 1.345 | 1.231 | 1.267 | 1.351 |
| 0.05 | Cutoff | −0.131 | −0.032 | 0.0125 | 0.045 | 0.062 | 0.076 | 0.104 | 0.123 |
| 0.05 | Sizes (kb) | 215,640 | 239,620 | 218,100 | 171,000 | 141,200 | 99,300 | 82,720 | 63,300 |
| 0.05 | Proportion of the genome (%) | 7.958 | 8.843 | 8.049 | 6.311 | 5.211 | 3.665 | 3.053 | 2.336 |
Note.—The 1% and 5% quantiles of Rdiff in the simulated data are used as the Rdiff cutoffs, which correspond to P values of 0.01 and 0.05 when applied to the real data. The control set in both the real and the simulated data is the divergence between CR1 and CE1.
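The cutoff procedure in the note can be sketched as below: a lower quantile of the simulated Rdiff values is taken as the cutoff, real windows at or below it are flagged, and their total size is reported as a share of the genome. This is a sketch under stated assumptions rather than the authors' implementation. The nearest-rank quantile, the `<=` comparison, nonoverlapping windows of equal size, and all function and parameter names are assumptions, and Rdiff values are taken as precomputed inputs because their definition is not given here.

```python
def empirical_quantile(values, percent: int):
    """Nearest-rank quantile: the k-th smallest value, k = ceil(percent * n / 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(values)
    rank = -(-percent * len(ordered) // 100)  # integer ceiling, avoids float rounding
    return ordered[max(rank, 1) - 1]


def summarize_pirs(real_rdiff, simulated_rdiff, percent, window_kb, genome_kb):
    """Return (cutoff, total PIR size in kb, PIR proportion of the genome in %).

    Assumption: a real window counts as a putative introgression region when its
    Rdiff is at or below the chosen quantile of the simulated Rdiff values.
    """
    cutoff = empirical_quantile(simulated_rdiff, percent)
    n_flagged = sum(value <= cutoff for value in real_rdiff)
    size_kb = n_flagged * window_kb
    return cutoff, size_kb, 100.0 * size_kb / genome_kb
```

Because the cutoff depends only on the simulated data, the same function can be rerun for each window size and for `percent=1` or `percent=5` to fill in one column of a table like the one above.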
Figure: Distribution of PIRs in the TM genome. The red bars represent the total size of PIRs on each chromosome, and the blue plot represents the proportion of PIRs on each chromosome (based on the full size of the reference genome). The last column, “all,” is the sum over the 20 autosomes. (A) With P value = 0.01 as the cutoff. (B) With P value = 0.05 as the cutoff.
Figure: Allele frequencies of 29 TSHNVs. The allele frequencies of 29 TSHNVs in 14 important immune or disease genes were obtained by PCR resequencing of 14–18 unrelated TM individuals from Sichuan province, China. The gene symbol, chromosome, and genomic position of each variant are shown; details of these variants and genes can be found in supplementary file S3, table S15, Supplementary Material online. Detailed sequencing results can be found in supplementary file S3, table S16, Supplementary Material online.
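For a diploid sample, the frequency of the variant allele follows from genotype counts in the standard way. The sketch below assumes that each resequenced individual is scored as homozygous variant, heterozygous, or homozygous reference; the function name and the example counts are hypothetical and are not taken from the supplementary tables.

```python
def variant_allele_frequency(n_hom_variant: int, n_het: int, n_hom_reference: int) -> float:
    """Variant allele frequency in a diploid sample, computed from genotype counts."""
    n_individuals = n_hom_variant + n_het + n_hom_reference
    if n_individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    # Each homozygous-variant individual carries two variant alleles, each heterozygote one.
    return (2 * n_hom_variant + n_het) / (2 * n_individuals)


if __name__ == "__main__":
    # Made-up example: 16 homozygous-variant and 2 heterozygous individuals out of 18.
    print(round(variant_allele_frequency(16, 2, 0), 3))
```

A variant that is fixed in the sample gives a frequency of 1.0, which is the pattern expected for a variant that is truly homozygous across the species.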
Figure: Historical changes in effective population size. Historical patterns of effective population size for the five macaque genomes, reconstructed from the genomic distribution of heterozygous sites using the pairwise sequentially Markovian coalescent (PSMC) method.