Whole-genome sequencing of six Mauritian Cynomolgus macaques (Macaca fascicularis) reveals a genome-wide pattern of polymorphisms under extreme population bottleneck.

Naoki Osada¹, Nilmini Hettiarachchi², Isaac Adeyemi Babarinde², Naruya Saitou², Antoine Blancher³.

Abstract

Cynomolgus macaques (Macaca fascicularis) were introduced to the island of Mauritius by humans around the 16th century. The unique demographic history of the Mauritian cynomolgus macaques provides an opportunity not only to examine the genetic background of a well-established nonhuman primate model for biomedical research but also to understand the effect of an extreme population bottleneck on the pattern of polymorphisms in genomes. We sequenced the whole genomes of six Mauritian cynomolgus macaques and obtained an average of 20-fold coverage of the genome for each individual. The overall level of nucleotide diversity was 23% lower than that of the Malaysian cynomolgus macaques, and a reduction of low-frequency polymorphisms was observed. We also confirmed that the Mauritian cynomolgus macaques were genetically closer to a representative of the Malaysian population than to a representative of the Indochinese population. The excess of low-frequency nonsynonymous polymorphisms, which has been observed in many other species, was not very strong in the Mauritian samples, and the proportion of heterozygous nonsynonymous polymorphisms relative to synonymous polymorphisms was higher within individuals in Mauritian than in Malaysian cynomolgus macaques. These patterns indicate that, under the extreme population bottleneck, purifying selection was overwhelmed by genetic drift in the population. Finally, we estimated the number of founding individuals by using the genome-wide site frequency spectrum of the six samples. Assuming a simple demographic scenario with a single bottleneck followed by exponential growth, the estimated number of founders (∼20 individuals) is largely consistent with previous estimates.

Keywords: Mauritian cynomolgus macaque; genome sequence; population bottleneck

Substances: Nucleotides

Year: 2015 PMID: 25805843 PMCID: PMC5322541 DOI: 10.1093/gbe/evv033

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Introduction

Nonhuman primates, in particular macaque monkeys, are important biological resources because of their genetic similarity with humans (approximately 94% nucleotide sequence identity), which is much higher than that of nonprimate mammalian animal models (Gibbs et al. 2007; Shively and Clarkson 2009). However, the difficulty of obtaining genetically homogeneous individuals in primates hampers their use in several fields of experimental medicine. Therefore, it is important to elucidate the genetic background of macaques to be able to use these animals for future biomedical research. The cynomolgus macaque (Macaca fascicularis) is one of the most widely used experimental animals in biomedical research, and has been used to study the effect of various medications as well as vaccines against infectious diseases. This species is widely distributed in Southeast Asia, including areas of Indochina, Malaysia, Indonesia, and the Philippines, and is also found on the island of Mauritius, where the animals were only recently introduced by humans (Fooden 1976). Cynomolgus macaques are evolutionarily closely related to rhesus macaques (Macaca mulatta), another species that has been extensively studied. Polymorphisms shared between the cynomolgus and rhesus macaques suggest historical gene introgression between the two species, particularly in populations living near the boundary between their geographical distribution areas in the north of the Indochinese peninsula (Bonhomme et al. 2009; Stevison and Kohn 2009; Higashino et al. 2012). The average genetic divergence between the cynomolgus and rhesus macaques is 0.4–0.5% per site in the nuclear genome (Osada et al. 2010), which is close to the average genetic diversity within each species.
After the Indian government banned the export of the rhesus macaque to foreign countries in 1978, the importance of the cynomolgus macaque as an alternative resource for biomedical research has been increasingly appreciated (Wade 1978; Pavlin et al. 2009). Previous studies have shown that cynomolgus macaques are genetically highly heterogeneous (Osada et al. 2010; Yan et al. 2011), and that this genetic heterogeneity could contribute to varied responses to drugs and pathogens (Menninger et al. 2002; Drevon-Gaillot et al. 2006) and influence various biological parameters (Aarnink, Garchon, et al. 2011; Aarnink et al. 2013). Studies using mitochondrial and nuclear genome data have revealed that cynomolgus macaque populations are divided into four major genetic groups (Smith et al. 2007; Blancher et al. 2008; Osada et al. 2010): the Indonesian-Malaysian, Indochinese, Philippine, and Mauritian populations. These four populations show different levels of genetic diversity and have different demographic histories. The Indonesian-Malaysian population is thought to be the ancestral population of cynomolgus macaques, and shows the highest level of nucleotide diversity (π), estimated to be 3.0–3.2 × 10⁻³ per site (Osada et al. 2010; Higashino et al. 2012; Fan et al. 2014), which is approximately three times higher than that in the entire human population (Prado-Martinez et al. 2013). The macaque population in the Philippines shows slightly reduced genetic diversity, probably because of a recent population size contraction (Osada et al. 2013). Phylogenetic trees of the mitochondrial DNA suggest that the Philippine population was derived from the Indonesian-Malaysian population (Smith et al. 2007; Tosi and Coke 2007; Blancher et al. 2008; Kanthaswamy et al. 2008; Stevison and Kohn 2008). The Indochinese cynomolgus population is thought to have experienced a nonnegligible amount of gene introgression from the rhesus macaques (Kanthaswamy et al. 2008; Bonhomme et al. 2009; Stevison and Kohn 2009), although the historical effect of this interspecies gene flow has not been restricted to the Indochinese population (Osada et al. 2010).
Among the four major population groups, the Mauritian population has a particularly interesting demographic history: a small number of individuals were brought to the island of Mauritius in the 16th century, where they settled and gave rise to a quickly expanding population (Sussman and Tattersall 1986). Consistent with this historical record, the Mauritian population is characterized by a limited number of major histocompatibility complex (MHC) alleles (Leuchte et al. 2004; Krebs et al. 2005; Aarnink, Apoil, et al. 2011; Blancher et al. 2012), a small number of mitochondrial haplotypes (Smith et al. 2007; Tosi and Coke 2007; Blancher et al. 2008), and small numbers of microsatellite alleles at various loci (Bonhomme et al. 2008; Kawamoto et al. 2008). Because of their large population on the island (where they are an invasive species) and the relatively simple configuration of their MHC alleles, Mauritian cynomolgus macaques have been used in several biomedical studies, and their genome was the first cynomolgus macaque genome to be sequenced (Ebeling et al. 2011). Although Mauritian cynomolgus macaques are thought to have a highly homogeneous genetic background, recent studies using single nucleotide variant (SNV) markers unexpectedly identified genetic structure, indicating that there may be two or three subpopulations within the Mauritian cynomolgus macaques (Ogawa and Vallender 2014). However, this genetic structure does not correspond to their geographic distribution (Satkoski Trask et al. 2013). Studies of the MHC locus have demonstrated that the repertoire of MHC haplotypes has been reduced by the founder effect; however, the impact of this population bottleneck on the other nuclear genes has not been well studied at the SNV level.
The extreme population bottleneck in Mauritian cynomolgus macaques also provides evolutionary insight into how, in such circumstances, deleterious mutations can accumulate in genomes. Theoretical studies predict that a reduction of effective population size reduces the efficacy of natural selection and could result in the segregation and fixation of slightly deleterious mutations (Ohta 1973). An extreme population bottleneck reduces genetic diversity and hence may cause a decline in the average fitness of a population. Mauritian cynomolgus macaques have thrived on the island and rapidly expanded their population size (Sussman and Tattersall 1986). The well-documented demographic history of the Mauritian cynomolgus macaque may provide a good opportunity for investigating the effect of an extreme population bottleneck on the genome-wide pattern of polymorphisms. To date, whole-genome sequences of one Malaysian (Higashino et al. 2012), one Vietnamese (Yan et al. 2011), and one Mauritian (Ebeling et al. 2011) cynomolgus macaque have been analyzed. More recently, large-scale genome sequencing of Mauritian cynomolgus macaques was performed to find genetic causes of viral susceptibility (Ericsen et al. 2014). However, randomly sampled individuals have not yet been analyzed to infer the past demography of the Mauritian cynomolgus macaques. Clarifying the genetic background of Mauritian cynomolgus macaques is of great importance for both biomedical and evolutionary research. Here, we report the whole-genome sequences of six Mauritian cynomolgus macaques with approximately 20-fold coverage of the genome.

Materials and Methods

DNA Sequencing

We extracted DNA from blood samples of wild-caught male Mauritian cynomolgus macaques, initially used for studying the sex-matched response to SIV infection (Aarnink, Dereuddre-Bosquet, et al. 2011). At sampling, there was no evidence that they were closely related to each other. Genome sequencing libraries of approximately 450 bp length were constructed for each of the six macaques. Paired-end sequences of 100 bp were determined using HiSeq2000 (Illumina Inc, San Diego, CA). The library construction, sequencing, and initial quality check were performed at Beijing Genomics Institute (Shenzhen, China).

SNV Calling

Reads were mapped to the draft genome of the rhesus macaque (rheMac2), the draft Y chromosome sequence (Hughes et al. 2012), and the mitochondrial genome (DDBJ/GenBank/EMBL accession number: AY612638) using the BWA aln/sampe algorithm with default parameter settings, except for quality trimming score of −15 (Li and Durbin 2009). Among the samples, the average mapping rate of reads was 93.3%. Potential polymerase chain reaction duplicates were marked using Picard software (http://picard.sourceforge.net, last accessed March 5, 2015). SNVs were jointly called on all six samples using the Best Practice pipeline of the Genome Analysis Toolkit software package (Mckenna et al. 2010), which includes base quality score recalibration, insertion/deletion (indel) realignment, SNV calling, and variant quality score recalibration (Van Der Auwera et al. 2002; Depristo et al. 2011). After calling the initial set of variants, we applied the following hard filters: FS > 60.0, HaplotypeScore > 13.0, MQ < 40.0, MQRankSum < −12.5, QD < 2.0, ReadPosRankSum < −8.0. SNVs on fragmented scaffolds (chrUr) were not included in the analysis. Heterozygosity within individuals and nucleotide diversity (π) were estimated using only high-coverage sites (≥10-fold). All raw read sequences and initial sets of variants are deposited in a public database (EMBL-EBI accession number: PRJEB7871).
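
The hard filters listed above can be read as a set of independent thresholds, with a variant removed when any one condition is met. The following is a minimal sketch of that logic, not the pipeline used in the study: the dictionary layout, the function names, and the decision to skip annotations that are absent from a record are all assumptions made for illustration.

```python
# Thresholds quoted in the text; a variant is flagged if ANY condition holds.
HARD_FILTERS = {
    "FS": (">", 60.0),
    "HaplotypeScore": (">", 13.0),
    "MQ": ("<", 40.0),
    "MQRankSum": ("<", -12.5),
    "QD": ("<", 2.0),
    "ReadPosRankSum": ("<", -8.0),
}


def failed_filters(info):
    """Return the names of the hard filters that a variant fails.

    `info` is a hypothetical dict mapping annotation names to floats.
    Annotations missing from the record are skipped (an assumption).
    """
    failed = []
    for name, (op, threshold) in HARD_FILTERS.items():
        value = info.get(name)
        if value is None:
            continue
        if (op == ">" and value > threshold) or (op == "<" and value < threshold):
            failed.append(name)
    return failed


def passes_hard_filters(info):
    """True when no hard filter is triggered."""
    return not failed_filters(info)
```

The comparisons are strict, so a value exactly equal to a threshold is retained.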

Principal Component Analysis and Population Tree

Principal component analysis (PCA) was conducted using the smartpca program in the EIGENSOFT software package (Patterson et al. 2006). Extraction, filtering, and processing of data were performed using custom-made perl scripts. A population tree was constructed using three additional genome sequences of macaques (Yan et al. 2011; Higashino et al. 2012). PHYLIP software (Felsenstein 1989) was used to generate the distance matrix for the macaque individuals by Nei’s genetic distance (Nei 1972). For the tree construction, the allele frequency data of SNV sites (coverage ≥ 10) found in all nine individuals were used. The phylogenetic tree was constructed using the neighbor-joining method (Saitou and Nei 1987) implemented in MEGA6 (Tamura et al. 2013).
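
As an illustration of the distance measure named above, Nei's (1972) standard genetic distance can be written for biallelic sites as D = −ln(Jxy / √(Jx·Jy)), where Jx and Jy are the within-population and Jxy the between-population allele identities summed over sites. The sketch below is not the PHYLIP implementation; the function name and the input format (one allele frequency per site per population) are assumptions.

```python
import math


def nei_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance for biallelic sites.

    freqs_x, freqs_y: per-site frequencies of one allele in populations
    X and Y (hypothetical input format; same sites in the same order).
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("need two non-empty frequency lists of equal length")
    jx = jy = jxy = 0.0
    for p, q in zip(freqs_x, freqs_y):
        jx += p * p + (1 - p) * (1 - p)    # identity within X
        jy += q * q + (1 - q) * (1 - q)    # identity within Y
        jxy += p * q + (1 - p) * (1 - q)   # identity between X and Y
    identity = jxy / math.sqrt(jx * jy)
    if identity <= 0.0:
        return math.inf                    # no shared alleles at any site
    return -math.log(identity)
```

With six diploid animals per population, each frequency would be a multiple of 1/12; for a single genome it can only be 0, 0.5, or 1.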

Pairwise Sequentially Markovian Coalescent Method

The analysis was performed using Pairwise Sequentially Markovian Coalescent (PSMC) software (Li and Durbin 2011). Consensus genome sequences for PSMC input were constructed using samtools and vcf2fq utility (Li et al. 2009). The time interval parameter of 4 + 25*2 + 4 + 6 and the number of iterations of 25 were used for the parameters of PSMC.
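
The time interval parameter 4 + 25*2 + 4 + 6 is a compact pattern: a bare number k is one free parameter spanning k atomic time intervals, and m*k is m free parameters spanning k intervals each. The helper below is a small illustrative parser (its name is hypothetical and it is not part of the PSMC software) that expands the pattern so that the number of intervals and free parameters can be counted.

```python
def expand_psmc_pattern(pattern):
    """Expand a pattern such as "4+25*2+4+6".

    Returns a list with one entry per free parameter, giving the number
    of atomic time intervals that parameter spans.
    """
    spans = []
    for term in pattern.replace(" ", "").split("+"):
        if "*" in term:
            repeats, width = (int(x) for x in term.split("*"))
        else:
            repeats, width = 1, int(term)
        spans.extend([width] * repeats)
    return spans


spans = expand_psmc_pattern("4 + 25*2 + 4 + 6")
print(len(spans), "free parameters over", sum(spans), "atomic intervals")
```

For the pattern used here this gives 28 free population size parameters spread over 64 atomic intervals.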

Prediction of Disease Causing Mutations

To infer the disease causality of nonsynonymous mutations in the Mauritian cynomolgus macaques, we identified respective nonsynonymous mutations in human orthologs and predicted the functional effect using PolyPhen-2 (Adzhubei et al. 2010). Gene annotation of the rhesus macaque followed the annotation in a previous study (Higashino et al. 2012). The human–macaque ortholog information was retrieved from the Ensembl database (Flicek et al. 2014). Human and macaque protein sequences were aligned using ClustalW (Thompson et al. 1994) and only the sites that have the same amino acid residues between human and macaque reference proteins in the alignment were analyzed through the PolyPhen-2 website (http://genetics.bwh.harvard.edu/pph2/, last accessed March 5, 2015). From this analysis, we had a final functional prediction of 7,976 nonsynonymous mutations.

Site Frequency Spectrum

The folded site frequency spectrum (fSFS) of the ith class was defined as $C_i + C_{n-i}$ for $i < n/2$ and $C_i$ for $i = n/2$, where $C_i$ represents the number of variants observed on i chromosomes and n is the number of sampled chromosomes. The number of sampled chromosomes in our study is 12 (diploid chromosomes of six individuals). To correct the excess of SNVs that are heterozygous in all individuals, most of which are thought to be due to genotyping error (see also Results and Discussion section), we applied a simple correction method assuming Hardy–Weinberg equilibrium. We assumed that the allele frequency of all SNV sites that showed heterozygosity in all six individuals was 0.5, for which the highest proportion (1/2) of heterozygotes is expected. It should be noted that this assumption is conservative. We denoted the observed number of SNV sites that showed heterozygosity in all six individuals by $C_{6\_H6}$, with a corresponding expected probability of $0.5^6$. The observed number of SNV sites that have a frequency of 0.5 and are not heterozygous in all samples is $C_{6\_nonH6}$. If $\hat{C}_{6\_H6}$ is the true number of mutations that are heterozygous in all individuals, the following relationship should hold: $\hat{C}_{6\_H6} / (\hat{C}_{6\_H6} + C_{6\_nonH6}) = 0.5^6$. The value of $\hat{C}_{6\_H6}$ was estimated by solving this equation.
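
The folding step and the Hardy–Weinberg correction can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the folding rule is the standard one (classes i and n − i are pooled, the middle class is left alone), and the correction assumes the relationship x / (x + c) = 0.5^6, where x is the corrected number of sites heterozygous in all six animals and c is the number of frequency-0.5 sites that are not. Function and argument names are hypothetical.

```python
def fold_sfs(counts):
    """Fold an unfolded spectrum.

    counts[i] is the number of variants whose counted allele appears on
    i of the n sampled chromosomes (i = 0..n). Returns {i: folded count}
    for i = 1..n//2.
    """
    n = len(counts) - 1
    folded = {}
    for i in range(1, n // 2 + 1):
        if 2 * i == n:
            folded[i] = counts[i]              # middle class is not doubled
        else:
            folded[i] = counts[i] + counts[n - i]
    return folded


def corrected_h6(non_h6_count, n_individuals=6):
    """Solve x / (x + non_h6_count) = 0.5 ** n_individuals for x."""
    p = 0.5 ** n_individuals
    return non_h6_count * p / (1.0 - p)
```

With six individuals the expected probability is 1/64, so the corrected count is simply the non-H6 count divided by 63.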

Estimating the Number of Founders

The level of the past population bottleneck was inferred by fitting the expected fSFS to the observed fSFS using the analytical formula obtained by Marth et al. (2004). We considered a single bottleneck event followed by exponential growth, which has two population genetic parameters to be estimated: the time of the bottleneck (Tb) and the size of the bottleneck (Nb). The ancestral population size (Na) and the current population size (N0) were fixed for each estimation. Because the model is scalable to any population size, we computed the fSFS for Na = 100,000 and scaled the parameters after fitting. The deviance of the expected from the observed fSFS was evaluated using χ² statistics on a grid with very small intervals of Tb and Nb. In addition, we confirmed that the analytical formula and coalescent simulations gave highly consistent expected fSFS under our population growth model (data not shown).
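
The fitting procedure amounts to a grid search that minimises a χ² deviance. The sketch below shows only that outer loop. The expected spectrum itself is left to a caller-supplied function, `expected_proportions`, which is a hypothetical stand-in for the analytical formula of Marth et al. (2004) and is not implemented here.

```python
def chi_square(observed, expected):
    """Pearson chi-square deviance between observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def grid_fit(observed, expected_proportions, tb_values, nb_values):
    """Return (deviance, Tb, Nb) minimising chi-square over a grid.

    expected_proportions(tb, nb) must return the expected fSFS as
    proportions summing to 1 (hypothetical callable; Na and N0 are
    assumed to be fixed inside it).
    """
    total = sum(observed)
    best = None
    for tb in tb_values:
        for nb in nb_values:
            expected = [total * p for p in expected_proportions(tb, nb)]
            deviance = chi_square(observed, expected)
            if best is None or deviance < best[0]:
                best = (deviance, tb, nb)
    return best
```

Fixing `tb_values` to a single value reproduces the case where only Nb is estimated for a given Tb.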

Results and Discussion

Identification of SNVs and Estimation of Nucleotide Diversity

We obtained 100-bp Illumina paired-end sequences from six unrelated Mauritian cynomolgus macaques and mapped the reads to the reference rhesus macaque genome (see Materials and Methods section). We did not map the reads to the reference genome of the Mauritian cynomolgus macaque (Ebeling et al. 2011), because the rhesus macaque reference has better gene annotation and previous studies have shown that rhesus macaque genomes are sufficiently close to cynomolgus macaque genomes for read mapping by typical short-read mappers (Yan et al. 2011; Higashino et al. 2012). The average coverage is approximately 20-fold for each individual (table 1). In total, we identified approximately 21.8 million SNVs and 1.9 million indels against the reference genome among the six macaques on the autosomes. Because all samples are males, the sex chromosomes and mitochondrial genome are all haploid in our samples. Therefore, we mainly focused on the pattern of SNVs on the autosomes in this study. A summary of SNVs identified on the sex chromosomes is shown in supplementary table S1, Supplementary Material online. Each individual has an average of 5.8 million heterozygous and 8.3 million homozygous SNVs, and these numbers are highly consistent among individuals. Here, homozygous variants are defined against the reference genome sequence of the rhesus macaque. Nucleotide diversity (π) within the Mauritian cynomolgus macaque population was estimated to be 2.28 × 10⁻³. We retrieved the previously published Malaysian cynomolgus macaque genome and estimated heterozygosity using the same criteria for SNV identification (π = 2.96 × 10⁻³). The heterozygosity of the Mauritian cynomolgus macaques was 23% lower than that of the Malaysian cynomolgus macaque, which is thought to have very high genetic diversity.

Table 1

Summary of Variant Calling in Six Mauritian Cynomolgus Macaques

Sample ID	Average Sequencing Depth^a	Total SNV	Heterozygous SNV	Homozygous SNV	Heterozygosity
Tlse-8102 (MCM1)	20.80	14,048,997	5,676,969	8,372,028	0.00225
Tlse-8141 (MCM2)	20.42	14,045,116	5,814,396	8,230,720	0.00231
Tlse-8249 (MCM3)	19.14	14,175,393	5,947,573	8,227,820	0.00236
Tlse-9204 (MCM4)	20.33	14,116,067	5,837,878	8,278,189	0.00232
Tsle-9413 (MCM5)	19.61	14,150,408	5,883,805	8,266,603	0.00234
Tlse-9859 (MCM6)	20.42	14,080,208	5,753,557	8,326,651	0.00229
Malaysian cynomolgus macaque	26.1	12,758,246	7,177,728	5,580,518	0.00296

^a Reads mapped on autosomes.
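
The summary figures quoted in the text can be checked directly against Table 1. The short script below is only an arithmetic check: it copies the heterozygous and homozygous SNV counts from the table and the two π values from the text, then recomputes the per-individual means and the relative reduction in nucleotide diversity.

```python
# Heterozygous and homozygous SNV counts for MCM1..MCM6, copied from Table 1.
het = [5_676_969, 5_814_396, 5_947_573, 5_837_878, 5_883_805, 5_753_557]
hom = [8_372_028, 8_230_720, 8_227_820, 8_278_189, 8_266_603, 8_326_651]

mean_het = sum(het) / len(het)
mean_hom = sum(hom) / len(hom)

# Nucleotide diversity estimates quoted in the text.
pi_mauritian = 2.28e-3
pi_malaysian = 2.96e-3
reduction = 1 - pi_mauritian / pi_malaysian

print(f"mean heterozygous SNVs: {mean_het / 1e6:.1f} million")
print(f"mean homozygous SNVs:   {mean_hom / 1e6:.1f} million")
print(f"reduction in pi:        {reduction:.0%}")
```

The means round to 5.8 and 8.3 million, and the reduction rounds to 23%, in agreement with the text.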

Genetic Relationship between and within Populations

In addition to the Malaysian cynomolgus macaque genome, we retrieved two more previously published macaque genomes (a Vietnamese cynomolgus macaque and a Chinese rhesus macaque). Genetic relationships among the six Mauritian cynomolgus macaque individuals were examined using a PCA plot (fig. 1). We confirmed that no individuals closely overlapped in the plot. A plot including all nine macaque genomes is presented in supplementary figure S1, Supplementary Material online. We further examined whether the Mauritian cynomolgus macaques are genetically closer to the Malaysian or to the Indochinese cynomolgus macaques. Figure 2 shows the phylogenetic relationship of the four macaque populations. Consistent with the results from mitochondrial data (Smith et al. 2007), the Mauritian cynomolgus macaques are genetically closer to the Malaysian cynomolgus macaques. Because genome sequences of the Indonesian populations have not been analyzed, we were not able to determine the detailed origin of the Mauritian cynomolgus macaques.

Fig. 1. PCA plot of the six Mauritian cynomolgus macaque individuals. The individual ID is given beside each data point. The x- and y-axes represent the first and second principal components, respectively.

Fig. 2. Phylogenetic relationships of the four populations. MFA and MMU designate M. fascicularis and M. mulatta, respectively. The branch length represents Nei’s genetic distance. Bootstrap confidence values (percentages) are shown on the branches.

In addition, the past demography was estimated using the PSMC method (Li and Durbin 2011). The inferred demographic histories are shown in figure 3. The six Mauritian cynomolgus macaques showed very similar trends of past demography. This result indicates that they are not derived from genetically distinct origins, which agrees with the mitochondrial data (Smith et al. 2007). However, we should note that PSMC cannot properly scale time and population size for the Mauritian cynomolgus macaques because the method is limited in its ability to infer very recent population size changes. The actual bottleneck in the Mauritian cynomolgus macaques was too recent to be inferred by PSMC. If the genome-wide heterozygosity was dramatically changed by very recent demographic events, the scaling parameters would be unreliable. In this study, our purpose in using PSMC was to check whether the population size trajectories overlap with each other, not to estimate the demographic parameters themselves; therefore, in figure 3, we show only parameter values scaled by N0 = 10,000, which was arbitrarily chosen. The confidence intervals of the population size estimates are shown in supplementary figure S2, Supplementary Material online. To infer the recent demography, we applied a method that uses information on polymorphism frequency, which is described in a later section.

Fig. 3. The change in population size inferred by PSMC. Six individuals, MCM1–MCM6, are labeled by red, blue, green, black, purple, and orange lines, respectively. The time from the present and effective population size are shown on the x- and y-axes, respectively. Note that the time and the size were arbitrarily scaled by a baseline effective population size (N0) equal to 10,000 (see Results and Discussion).

Site Frequency Spectrum of SNVs

To elucidate a more detailed pattern of the polymorphisms in the Mauritian cynomolgus macaques, we calculated the SFS of mutations in the six samples. Because we cannot assume that the reference rhesus macaque genome has ancestral states, the spectrum is folded (see Materials and Methods section). Before evolutionary inference, we carefully examined the potential genotyping errors that could affect the pattern of SFS. We found an excess of SNVs for which all six macaques were heterozygous (H6 sites); this fraction of H6 sites was enriched with nonsynonymous mutations. Fixation of segmental duplication with subsequent mutations may cause false identification of heterozygosity at such sites; alternatively, these sites would be observed when one of the duplicated loci is not present in the reference genome sequence. If either of these cases were the cause of the H6 sites, we would expect these sites to have higher genome sequencing coverage. To evaluate this, we compared the occurrence of H6 sites with the average coverage depth among the six genomes at those sites. Repeat regions of the genome were excluded from this analysis to avoid the complex effect of repetitive sequences. The results showed that the coverage distribution for H6 sites was skewed toward higher coverage, and the coverage distribution for nonsynonymous and synonymous sites among the H6 sites was more strongly biased toward higher coverage (fig. 4). Although we cannot identify the reason for these systematic biases, this pattern of miscalling should be carefully interpreted in future whole-genome sequencing studies. To remove the potential genotyping errors for the analysis of fSFS data, we corrected the miscalling of H6 sites by assuming the Hardy–Weinberg equilibrium (see Materials and Methods).

Fig. 4. Genome sequencing coverage of SNV sites. The height of the lines shows the density estimation of read coverage. The black, red, and blue lines represent the estimated density for all SNV sites, noncoding H6 sites, and coding H6 sites, respectively. H6 sites are the sites where all six samples are heterozygous.

In figure 5A, fSFS for nonsynonymous, synonymous, and noncoding sites is shown. Notably, nonsynonymous and synonymous sites are defined among the six Mauritian cynomolgus macaque alleles. Compared with a neutral expectation with a constant population size, Mauritian cynomolgus macaques harbor significantly fewer low-frequency polymorphisms, particularly singletons (P < 10⁻¹⁵; χ² test). A reduction of population size is expected to affect low-frequency polymorphisms more than common-frequency polymorphisms (Luikart et al. 1998). Because π is more sensitive to differences in common-frequency polymorphisms, we expected that π would not be greatly affected by the very recent population bottleneck (Tajima 1989).
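
For reference, the neutral expectation under a constant population size can be computed from the standard result that the expected number of variants carried by i of n chromosomes is proportional to 1/i. Folding gives a weight of 1/i + 1/(n − i) for each class, or 1/i for the middle class. The function below is a sketch of that textbook expectation (the function name is hypothetical), not the code used to draw figure 5A.

```python
def neutral_folded_sfs(n):
    """Expected folded SFS proportions under neutrality and constant size.

    n is the number of sampled chromosomes. Class i (1 <= i <= n//2)
    has weight 1/i + 1/(n - i), or 1/i when i == n - i.
    """
    weights = []
    for i in range(1, n // 2 + 1):
        w = 1.0 / i
        if i != n - i:
            w += 1.0 / (n - i)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


expected = neutral_folded_sfs(12)   # 12 chromosomes from six diploid animals
for i, p in enumerate(expected, start=1):
    print(i, round(p, 3))
```

With 12 chromosomes the singleton class is expected to hold about 36% of all segregating sites, which is the baseline against which the observed deficit of singletons is judged.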

Fig. 5. fSFS for nonsynonymous (dark gray), synonymous (light gray), and noncoding (white) sites among the six Mauritian cynomolgus macaque individuals. The expected frequency with constant population size is shown by the black bar (A). Prediction of disease-causing mutations was performed using PolyPhen-2. Probably damaging, possibly damaging, and benign nonsynonymous mutations are shown in the black, dark gray, and light gray bars, respectively (B).

Interestingly, the patterns of fSFS for noncoding, synonymous, and nonsynonymous mutations are not strongly different within the Mauritian cynomolgus macaque population. In particular, most of the large-scale population genetic studies in humans (e.g., Fujimoto et al. 2010) have found an excess of low-frequency nonsynonymous mutations; however, this was not observed in the Mauritian macaque, although the difference between synonymous and nonsynonymous mutations was statistically significant (P < 10⁻¹⁵; χ² test). In addition, nonsynonymous and synonymous mutations showed similar levels of singletons (P = 0.67; χ² test). Because a recent population bottleneck mostly affects the pattern of rare polymorphisms, mutations segregating within the macaque population at low frequencies were rapidly lost during the bottleneck period, and the time elapsed since the bottleneck has been too short to allow for the appearance of new mutations.

We examined the phenotypic effect of mutations using predictions of disease causality in human genes. Predictions of the functional effect of nonsynonymous SNVs were performed using PolyPhen-2 (Adzhubei et al. 2010), which predicts the potential impact of an amino acid substitution based on protein structure and evolutionary conservation. We excluded all H6 mutations from the disease-causing mutation analyses because most of them are likely genotyping errors. The fraction of potentially damaging mutations for each fSFS category is shown in figure 5B. The excess of damaging mutations in the singleton class was not statistically significant (P = 0.17; χ² test), which is considerably different from the pattern in humans (Andrés et al. 2009).

On average, Mauritian cynomolgus macaques have 10,565 nonsynonymous and 13,533 synonymous heterozygous SNVs. The ratio of nonsynonymous to synonymous polymorphisms was 0.78, which is significantly higher than the ratio observed in the Malaysian cynomolgus macaque individual (0.68; Higashino et al. 2012; P < 10⁻¹⁵; χ² test). The higher ratio in the Mauritian cynomolgus macaques indicates that more deleterious mutations are segregating at greater frequency in the population. In addition, we found that 12,467 nonsynonymous and 17,749 synonymous changes are fixed among the Mauritian cynomolgus macaque samples compared with the reference rhesus macaque genome. Considering that the ratio of heterozygous nonsynonymous to synonymous SNVs is higher in Mauritian individuals than in the Malaysian individual, we concluded that the pattern of polymorphisms in the Mauritian cynomolgus macaques has been predominantly shaped by strong genetic drift, which overwhelmed purifying selection during the population bottleneck. However, the data also showed that, at the same time, low-frequency nonsynonymous polymorphisms have been effectively removed from the population by genetic drift. Therefore, there have been both gain and loss of deleterious mutations in the population. This observation is consistent with recent theoretical and experimental studies in humans, which found that the genetic load of the population is not strongly affected by recent demographic changes (Lohmueller 2014; Simons et al. 2014; Do et al. 2015).
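
The nonsynonymous-to-synonymous ratios follow directly from the counts given in the text. The snippet below simply reproduces that arithmetic for the heterozygous and the fixed classes; it does not attempt the χ² comparison with the Malaysian individual, for which the underlying counts are not given here.

```python
# Counts quoted in the text (heterozygous counts are per-individual averages).
het_nonsyn, het_syn = 10_565, 13_533
fixed_nonsyn, fixed_syn = 12_467, 17_749

het_ratio = het_nonsyn / het_syn        # polymorphic within individuals
fixed_ratio = fixed_nonsyn / fixed_syn  # fixed relative to the rhesus reference

print(f"heterozygous N/S: {het_ratio:.2f}")
print(f"fixed N/S:        {fixed_ratio:.2f}")
```

The heterozygous ratio reproduces the reported 0.78, and the same calculation for the fixed differences gives about 0.70.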

Estimation of Demography

To investigate whether the observed fSFS agrees with the extreme population bottleneck known from the historical record, we estimated the level of the population bottleneck by fitting the expected fSFS to the observed fSFS. To this end, we assumed a simple demographic scenario in which a small number of individuals were introduced to the island from an ancestral population of constant size, followed by a rapid increase in population size through exponential growth (fig. 6). Four parameters are involved in this model: the ancestral effective population size (Na), the current effective population size (N0), the effective population size at the bottleneck (Nb), and the timing of the bottleneck (Tb). Our approach estimated Nb for a given Na, N0, and Tb by fitting the observed fSFS to the expected fSFS with grid sampling of Nb and Tb (Marth et al. 2004). Because the number of analyzed sites is so large (approximately 16 million), the confidence intervals of the estimate became very narrow. Therefore, although these results should be interpreted with care, here we present only the maximum-likelihood point estimates of Nb for a given Na, N0, and Tb.

Fig. 6. Proposed demographic model for estimating the number of founding individuals. The width of the shaded area represents effective population sizes. We assumed that the bottleneck occurred at time Tb and that Nb individuals were randomly selected at the bottleneck from an ancestral population with a constant population size of Na. After the bottleneck, the size of the population recovered to N0 with exponential growth.

According to the historical record, macaques were introduced to Mauritius around 400–500 years ago. The estimation of a generation time for macaques is uncertain, ranging from 5 to 12 years (Gage 1998). Any bias in the generation time estimate would affect the accuracy of the estimation of Nb. Therefore, we used the two long-term demographic studies of Japanese macaques (Macaca fuscata) to calculate the average time of reproduction of females, which yielded a generation time of 9.6–11.4 years (Koyama et al. 1992; Fujimoto et al. 2010). In this study, we applied a 10-year generation time, which means the population bottleneck occurred 40–50 generations ago. In the following estimation, we assumed Tb = 40. Assuming Na = 30,000 and N0 = 30,000, the estimated number of individuals during the bottleneck is 16. This estimated number of founders is robust to the assumed values of Na and N0. For example, Nb = 15 assuming Na = 50,000 and N0 = 25,000, Nb = 14. In table 2, the estimated numbers of founders with different values of Na and N0 are shown. The range of the estimated number of founders does not contradict the previous microsatellite (Bonhomme et al. 2008) and mitochondrial data (Smith et al. 2007).
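
The demographic model can be written down compactly. Going back t generations from the present, exponential growth from Nb founders at time Tb to N0 today implies N(t) = N0·exp(−r·t) with r = ln(N0/Nb)/Tb, and a constant size Na before the bottleneck. The sketch below encodes this reading of the model; the function names are hypothetical, and treating t = Tb itself as the bottleneck size is an assumption.

```python
import math


def generations_since(years, generation_time=10.0):
    """Convert years before present into generations."""
    return years / generation_time


def growth_rate(n0, nb, tb):
    """Per-generation rate r such that nb * exp(r * tb) == n0."""
    return math.log(n0 / nb) / tb


def population_size(t_ago, n0, nb, tb, na):
    """Effective size t_ago generations before the present."""
    if t_ago > tb:
        return na                       # constant ancestral population
    r = growth_rate(n0, nb, tb)
    return n0 * math.exp(-r * t_ago)    # equals nb at t_ago == tb


tb = generations_since(400)             # 400 years at 10 years per generation
print(tb, round(growth_rate(30_000, 16, tb), 3))
```

For Nb = 16, N0 = 30,000, and Tb = 40, the implied growth rate is roughly 0.19 per generation.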

Table 2

Estimated Number of Founders with Different Na and N0

N0 setting	Na = 50,000	Na = 30,000
N0 = Na	15	16
N0 = Na/2	17	18
N0 = Na/5	20	21

In addition to the exponential growth model, we examined logistic growth models. These models showed a better fit than the previous study using microsatellite data (Bonhomme et al. 2008). In general, the logistic growth models yielded a smaller number of founding individuals than the exponential growth model. This is because the population grows more rapidly in the early phase under the logistic model, which weakens the effect of the bottleneck. Using parameter settings similar to those of Bonhomme et al. (2008), namely a generation time of 5 years and a growth rate of 0.3, we obtained a slightly smaller number of founding individuals (2–8 founders) than the number estimated by Bonhomme et al. (12 founders). However, we did not thoroughly explore different growth models in this study because our data have a limited sample size and may not have enough power to infer a very recent demographic history. The number of founders estimated under exponential growth may be an overestimate, because large-scale haplotyping of the MHC region has identified only seven founding MHC haplotypes in the Mauritian cynomolgus macaque population (Wiseman et al. 2007; Mee et al. 2009; Budde et al. 2010; Blancher et al. 2012; Aarnink et al. 2014) and eight haplotypes in the killer cell immunoglobulin-like receptor (KIR) region (Bimber et al. 2008). Because each diploid founder carries at most two haplotypes, the lower limit of the number of founding individuals in this scenario is 4, which is closer to our estimate assuming logistic growth. However, the probability of allele loss depends strongly on the initial pattern of population growth, which is difficult to estimate accurately, and the process could be highly stochastic with a small number of founders.
Natural selection could have preserved the number of alleles at the MHC locus; in particular, a recent study found that there is MHC class I semi-incompatibility between mother and offspring in cynomolgus macaques; thus, natural selection would act against the loss of MHC alleles in this population (Aarnink et al. 2014). It is of interest to further investigate the effect of natural selection on the genetic diversity at the MHC locus in future studies using whole-genome sequences of more individuals from the populations from which the Mauritian macaques originated.
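Two points in the preceding discussion lend themselves to a short numerical sketch in Python: the lower bound on the number of founders implied by a haplotype count, and the weaker drift under faster early growth. The sketch is illustrative only. It uses the standard expectation that heterozygosity decays by a factor of 1 - 1/(2N) per generation, and it compares the two growth curves from the same starting size over the same number of generations. The carrying capacity (set equal to N0), the shared Tb of 40, and the function names are assumptions made for the comparison, not settings reported in the source.

```python
import math


def min_founders_from_haplotypes(n_haplotypes):
    """Fewest diploid founders able to carry n distinct haplotypes."""
    return math.ceil(n_haplotypes / 2)


def exponential_sizes(nb, n0, tb):
    """Sizes for generations 0..tb-1 under exponential growth from nb to n0."""
    r = math.log(n0 / nb) / tb
    return [nb * math.exp(r * t) for t in range(tb)]


def logistic_sizes(nb, carrying_capacity, rate, tb):
    """Continuous logistic curve starting at nb, sampled once per generation."""
    k = carrying_capacity
    return [k / (1.0 + (k / nb - 1.0) * math.exp(-rate * t)) for t in range(tb)]


def heterozygosity_retained(sizes):
    """Expected fraction of heterozygosity kept: product of (1 - 1/(2N_t))."""
    kept = 1.0
    for n in sizes:
        kept *= 1.0 - 1.0 / (2.0 * n)
    return kept


if __name__ == "__main__":
    print("MHC (7 haplotypes):", min_founders_from_haplotypes(7), "founders at least")
    print("KIR (8 haplotypes):", min_founders_from_haplotypes(8), "founders at least")

    # Same founders and duration; only the shape of growth differs.
    exp_kept = heterozygosity_retained(exponential_sizes(16, 30_000, 40))
    log_kept = heterozygosity_retained(logistic_sizes(16, 30_000, 0.3, 40))
    print(f"heterozygosity retained, exponential: {exp_kept:.3f}")
    print(f"heterozygosity retained, logistic:    {log_kept:.3f}")
```

With these assumed settings the logistic curve leaves the small-population phase sooner, so less diversity is lost for the same number of founders. Fitting the same observed loss of diversity under logistic growth therefore calls for fewer founders, which is the direction of the difference described above.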

Conclusions

In this article, we report the genome sequences of six Mauritian cynomolgus macaques. The pattern of polymorphisms in these animals shows a reduced level of genetic diversity, particularly in low-frequency polymorphisms. This pattern agrees well with the historical record of an extreme population bottleneck during the founding of this population. The low efficacy of purifying selection on their genomes may provide further insight into the specific phenotypic characteristics of Mauritian cynomolgus macaques. The lower genetic diversity in this population is of great importance for better reproducibility of drug testing and viral infection experiments. In addition, the whole-genome sequences of the Mauritian cynomolgus macaques will provide further insights into the genetic basis of variation among macaques in drug and viral responses in future biomedical research.

Supplementary Material

Supplementary table S1 and figures S1 and S2 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

61 in total

1. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations.

Authors: Gabor T Marth; Eva Czabarka; Janos Murvai; Stephen T Sherry
Journal: Genetics Date: 2004-01 Impact factor: 4.562

2. Genetic diversity of longtail macaques (Macaca fascicularis) on the island of Mauritius: an assessment of nuclear and mitochondrial DNA polymorphisms.

Authors: Y Kawamoto; S Kawamoto; K Matsubayashi; K Nozawa; T Watanabe; M-A Stanley; D Perwitasari-Farajallah
Journal: J Med Primatol Date: 2008-02 Impact factor: 0.667

3. India bans monkey export: U.S. may have breached accord.

Authors: N Wade
Journal: Science Date: 1978-01-20 Impact factor: 47.728

4. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism.

Authors: F Tajima
Journal: Genetics Date: 1989-11 Impact factor: 4.562

5. Impact of MHC class II polymorphism on blood counts of CD4+ T lymphocytes in macaque.

Authors: Alice Aarnink; Henri-Jean Garchon; Bénédicte Puissant-Lubrano; Marie Blancher-Sardou; Pol-André Apoil; Antoine Blancher
Journal: Immunogenetics Date: 2010-11-18 Impact factor: 2.846

6. Inference of human population history from individual whole-genome sequences.

Authors: Heng Li; Richard Durbin
Journal: Nature Date: 2011-07-13 Impact factor: 49.962

7. Use of Cumulative Poisson Probability Distribution as an Estimator of the Recombination Rate in an Expanding Population: Example of the Macaca fascicularis Major Histocompatibility Complex.

Authors: Antoine Blancher; Alice Aarnink; Nicolas Savy; Naoyuki Takahata
Journal: G3 (Bethesda) Date: 2012-01-01 Impact factor: 3.154

8. Population structure and eigenanalysis.

Authors: Nick Patterson; Alkes L Price; David Reich
Journal: PLoS Genet Date: 2006-12 Impact factor: 5.917

9. Great ape genetic diversity and population history.

Authors: Javier Prado-Martinez; Peter H Sudmant; Jeffrey M Kidd; Heng Li; Joanna L Kelley; Belen Lorente-Galdos; Krishna R Veeramah; August E Woerner; Timothy D O'Connor; Gabriel Santpere; Alexander Cagan; Christoph Theunert; Ferran Casals; Hafid Laayouni; Kasper Munch; Asger Hobolth; Anders E Halager; Maika Malig; Jessica Hernandez-Rodriguez; Irene Hernando-Herraez; Kay Prüfer; Marc Pybus; Laurel Johnstone; Michael Lachmann; Can Alkan; Dorina Twigg; Natalia Petit; Carl Baker; Fereydoun Hormozdiari; Marcos Fernandez-Callejo; Marc Dabad; Michael L Wilson; Laurie Stevison; Cristina Camprubí; Tiago Carvalho; Aurora Ruiz-Herrera; Laura Vives; Marta Mele; Teresa Abello; Ivanela Kondova; Ronald E Bontrop; Anne Pusey; Felix Lankester; John A Kiyang; Richard A Bergl; Elizabeth Lonsdorf; Simon Myers; Mario Ventura; Pascal Gagneux; David Comas; Hans Siegismund; Julie Blanc; Lidia Agueda-Calpena; Marta Gut; Lucinda Fulton; Sarah A Tishkoff; James C Mullikin; Richard K Wilson; Ivo G Gut; Mary Katherine Gonder; Oliver A Ryder; Beatrice H Hahn; Arcadi Navarro; Joshua M Akey; Jaume Bertranpetit; David Reich; Thomas Mailund; Mikkel H Schierup; Christina Hvilsom; Aida M Andrés; Jeffrey D Wall; Carlos D Bustamante; Michael F Hammer; Evan E Eichler; Tomas Marques-Bonet
Journal: Nature Date: 2013-07-03 Impact factor: 49.962

10. Whole-genome sequencing of Tibetan macaque (Macaca thibetana) provides new insight into the macaque evolutionary history.

Authors: Zhenxin Fan; Guang Zhao; Peng Li; Naoki Osada; Jinchuan Xing; Yong Yi; Lianming Du; Pedro Silva; Hongxing Wang; Ryuichi Sakate; Xiuyue Zhang; Huailiang Xu; Bisong Yue; Jing Li
Journal: Mol Biol Evol Date: 2014-03-18 Impact factor: 16.240
