Hong-Xin Xie1,2, Xi-Xi Liang1, Zhi-Qiang Chen3, Wei-Ming Li1,2, Chun-Rong Mi1,2, Ming Li1, Zheng-Jun Wu4,5, Xu-Ming Zhou1, Wei-Guo Du1. 1. Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China. 2. University of Chinese Academy of Sciences, Beijing, China. 3. Novogene Bioinformatics Institute, Beijing, China. 4. Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection, Ministry of Education, Guangxi Normal University, Guilin, China. 5. Guangxi Key Laboratory of Rare and Endangered Animal Ecology, College of Life Science, Guangxi Normal University, Guilin, China.
Abstract
The purging of deleterious alleles has been hypothesized to mitigate inbreeding depression, but its effectiveness in endangered species remains debatable. To understand how deleterious alleles are purged during population contractions, we analyzed genomes of the endangered Chinese crocodile lizard (Shinisaurus crocodilurus), which is the only surviving species of its family and currently isolated into small populations. Population genomic analyses revealed four genetically distinct conservation units and sharp declines in both effective population size and genetic diversity. By comparing the relative genetic load across populations and conducting genomic simulations, we discovered that seriously deleterious alleles were effectively purged during population contractions in this relict species, although inbreeding generally enhanced the genetic burden. However, despite with the initial purging, our simulations also predicted that seriously deleterious alleles will gradually accumulate under prolonged bottlenecking. Therefore, we emphasize the importance of maintaining a minimum population capacity and increasing the functional genetic diversity in conservation efforts to preserve populations of the crocodile lizard and other endangered species.
The purging of deleterious alleles has been hypothesized to mitigate inbreeding depression, but its effectiveness in endangered species remains debatable. To understand how deleterious alleles are purged during population contractions, we analyzed genomes of the endangered Chinese crocodile lizard (Shinisaurus crocodilurus), which is the only surviving species of its family and currently isolated into small populations. Population genomic analyses revealed four genetically distinct conservation units and sharp declines in both effective population size and genetic diversity. By comparing the relative genetic load across populations and conducting genomic simulations, we discovered that seriously deleterious alleles were effectively purged during population contractions in this relict species, although inbreeding generally enhanced the genetic burden. However, despite with the initial purging, our simulations also predicted that seriously deleterious alleles will gradually accumulate under prolonged bottlenecking. Therefore, we emphasize the importance of maintaining a minimum population capacity and increasing the functional genetic diversity in conservation efforts to preserve populations of the crocodile lizard and other endangered species.
Endangered species are often confined to small, isolated populations. Under these conditions, genetic drift and inbreeding are predicted to cause the loss of genetic diversity and decline in fitness (Caughley 1994; Frankham 2005). It is generally acknowledged that inbreeding depression (reduced fitness in inbred populations) is primarily caused by the homozygosity of recessive deleterious alleles, which are typically masked by heterozygosity (Charlesworth and Willis 2009; Hedrick 2012; Hedrick et al. 2016). In response to these deleterious alleles exposed by inbreeding, reducing the frequency of strongly deleterious alleles by purging helps mitigate inbreeding depression (Glémin 2003; García-Dorado 2012; Hedrick and García-Dorado 2016). However, most inbreeding experiments have been conducted on self-fertilizing plants and model animals in the laboratory (Bijlsma et al. 2000; Miller and Hedrick 2001; Hedrick et al. 2016). The cause, severity, and consequences of inbreeding in natural animal populations and whether purging works effectively in the wild remain largely unknown (Hedrick and García-Dorado 2016; Hasselgren and Norén 2019).Isolated populations with historically larger population sizes are expected to harbor more recessive deleterious alleles, and experience more severe inbreeding depression when shrinking to small populations (Hedrick et al. 2016; Robinson et al. 2019; Kyriazis et al. 2021). Exploring the demographic history and the accumulation or purging of deleterious alleles of endangered species can provide a unique opportunity for understanding the role of purging in the mitigation of inbreeding depression during a population decline. Advances in whole-genome sequencing enable the reconstruction of demographic history and genome-wide assessment of mutation loads in nonmodel species (Zhao et al. 2013, 2014; Beichman et al. 2018; Jiang et al. 2019). Unbiased estimates of inbreeding in wild populations, in the absence of pedigrees, are also feasible through genomic data (Kardos et al. 2016, 2018; Robinson et al. 2019). Recent research has used genomic data to address the purging of deleterious alleles in some mammals and birds. However, there have been conflicting results on the purging effect in small wild populations (Xue et al. 2015; Robinson et al. 2016, 2018; Rogers and Slatkin 2017; Grossen et al. 2020; Wang et al. 2021). For instance, observed data from isolated gorilla populations provided direct evidence of the purging of strongly deleterious alleles in mountain gorillas and eastern lowland gorillas (Xue et al. 2015). In contrast, studies on island foxes (Robinson et al. 2016), ancient wooly mammoths (Rogers and Slatkin 2017), and brown-eared pheasants (Wang et al. 2021) reported inefficiency of purging in small isolated populations. This between-species discrepancy in purging effectiveness implies that the purging of deleterious alleles during population contraction in endangered species may be more complicated than expected, and therefore merits further research.This study focused on the critically endangered Chinese crocodile lizard, Shinisaurus crocodilurus, the only living species of the family Shinisauridae. The crocodile lizard was first described in 1930 and is considered a primitive squamate species based on its morphological characteristics (Zhang 2002), yet we still know little about its evolutionary history. Currently, habitat destruction and illegal poaching have brought the crocodile lizard to the brink of extinction (van Schingen et al. 2014). The remaining populations are restricted to fragmented areas in southern China and northern Vietnam. The total population in China was reported to be 950 in 2008 (Huang et al. 2008), and the Vietnamese population (VN) was estimated to be less than 200 in 2016 (van Schingen, Ha, et al. 2016). Understanding the demographic history, genetic diversity, level of inbreeding, and strength of purging in crocodile lizard populations is vital for the future conservation management of this species. Although all crocodile lizard populations are small, they exhibit different census sizes. The Guangdong Maoming population (GDMM) in China was unknown to the public until its discovery in 2005, with the population size estimated to be over 1,000 in 2008 (Huang et al. 2015). In contrast, the other two populations in China, the Guangxi Daguishan (GXDGS) and Guangdong Luokeng (GDLK) populations, were reported to be 100 and 220, respectively (Huang et al. 2008). These three sites provide a unique opportunity to study inbreeding and the efficiency of purging deleterious genes for populations of varying sizes. The ineffective purging of deleterious alleles in smaller populations might increase the risk of local extinction.After de novo assembly of a chromosome-level crocodile lizard genome, we further sequenced samples from all available extant wild populations. The whole-genome sequencing data we generated enabled us to clarify the phylogenetic relationships of extant populations and reconstruct their demographic histories, which were previously unable to be resolved using mitochondrial fragments (Huang et al. 2014; van Schingen, Le, et al. 2016). We then explored the patterns of genetic diversity, inbreeding, and the purging of deleterious alleles in different crocodile lizard populations. Based on our population comparisons and genomic simulations, we found that the effectiveness of purging is driven both by the contracted population size and the duration of population contraction. Our results have valuable implications for the management and long-term survival of small populations often threatened by the “extinction vortex” (Caughley 1994).
Results
Genome Assembly and Population Structure
We sequenced the whole genome of a female Chinese crocodile lizard. We generated a 2.19 Gb high-quality reference genome with contig N50 of 3.34 Mb using single-molecule real-time (PacBio Sequel), paired (Illumina HiSeq), linked-read (10X Genomics), and Hi-C (Phase Genomics, Inc.) sequencing approaches (supplementary fig. S1, supplementary table S1, and supplementary note 1, Supplementary Material online). Contigs were assigned to 16 pseudo-chromosomes, resulting in a chromosome-level genome assembly (supplementary figs. S2 and S3, Supplementary Material online). Compared with a previously published short-read-based assembly (Gao et al. 2017), the new genome has a 285-fold increase in continuity (contig N50) and 202-fold increase in scaffold N50. Our species tree supports the sister relationship between anguimorphs (S.crocodilurus, Varanus komodoensis, and Ophisaurus gracilis) and iguanians (Anolis carolinensis and Pogona vitticeps) (fig. 1; supplementary fig. S4, Supplementary Material online), which is consistent with previously published marker gene-based analyses (Pyron et al. 2013; Zheng and Wiens 2016; Burbrink et al. 2020). Species divergence time analysis revealed that the Chinese crocodile lizard and its most closely related extant species (V. komodoensis) diverged approximately 93.9 Ma (95% credibility interval: 45.5–180.1 Ma), which is in the range of the Cretaceous period (fig. 1). Additionally, we collected 47 saliva samples from four extant wild populations of crocodile lizards (fig. 1). Whole-genome sequencing data resulted in an average depth of 9.88-fold and 95.85% average sequencing coverage (supplementary tables S2 and S3, Supplementary Material online). Single-nucleotide polymorphism (SNP) calling following strict quality control criteria resulted in a set of 4,568,681 high-quality SNPs that were used for following analyses. Of these, 1.36%, or 62,248 SNPs, were in exonic regions (supplementary table S4, Supplementary Material online).
Fig. 1.
Phylogenetics and genetic structure of the crocodile lizard. (a) Time calibrated phylogenetic tree of 25 representative vertebrate species. Nodes with solid circle represent fossil calibration points used in divergence time estimation. The node with a hollow star shows the divergence time between the crocodile lizard and monitor lizards. (b) Map showing the sampling sites of the four extant wild populations. (c) Phylogenetic relationship of 47 individuals. Ophisaurus gracilis was added as an outgroup. (d) Principal components analysis of 47 individuals. (e) Population structure estimated by ADMIXTURE. (f) RCCR between crocodile lizard populations. The RCCR value is close to 1 when the two populations are well mixed and 0 after they have fully split.
Phylogenetics and genetic structure of the crocodile lizard. (a) Time calibrated phylogenetic tree of 25 representative vertebrate species. Nodes with solid circle represent fossil calibration points used in divergence time estimation. The node with a hollow star shows the divergence time between the crocodile lizard and monitor lizards. (b) Map showing the sampling sites of the four extant wild populations. (c) Phylogenetic relationship of 47 individuals. Ophisaurus gracilis was added as an outgroup. (d) Principal components analysis of 47 individuals. (e) Population structure estimated by ADMIXTURE. (f) RCCR between crocodile lizard populations. The RCCR value is close to 1 when the two populations are well mixed and 0 after they have fully split.To explore the genetic relationship of extant populations, we first constructed a phylogenetic tree using O. gracilis as an outgroup (fig. 1; supplementary fig. S5, Supplementary Material online). This demonstrates that individuals from the same population cluster together, with the VN diverging from Chinese populations (GDMM, GXDGS, and GDLK) at the first instance. The GXDGS and GDLK populations were sister groups and then coalesced into a branch with the GDMM population. Clustering by principal components analysis (PCA) (Patterson et al. 2006) using average genome distances also supports four distinct groups (fig. 1). We then used ADMIXTURE (Alexander et al. 2009) to estimate the genetic components of each sample with varying numbers of ancestors (K). When K = 2, the VN population diverged from the other populations, and when K = 4, the best value of K indicated by cross-validation error (supplementary fig. S6, Supplementary Material online), four divergent populations could be identified (fig. 1). Overall, phylogenetic and cluster analyses support four genetically distinct populations, and an early divergence of the Vietnamese from the Chinese populations. To explore their separation history and gene flow, we conducted relative cross-coalescence rate (RCCR) analyses using the multiple sequentially Markovian coalescent (MSMC) (Schiffels and Durbin 2014) method (fig. 1; supplementary fig. S7, Supplementary Material online). Unsurprisingly, the three Chinese populations have never been fully isolated, according to their RCCR. However, the RCCR curve of the VN population compared with the other three populations converged to 0 at about 20,000–25,000 years ago, suggesting full separation since the last glacial maximum (LGM). We also calculated statistics of the three-population test (F3 statistics) (Patterson et al. 2012) to test any apparent admixture among the Chinese populations. All resulting F3 statistics are positive and do not support admixture (supplementary table S5, Supplementary Material online). We calculated the pairwise fixation index (FST) between the four populations to quantify their genetic distances (supplementary table S6, Supplementary Material online). The FST values between the VN population and the three Chinese populations GDMM, GXDGS, and GDLK are 0.35, 0.38, and 0.45, respectively, suggesting a very high level of genetic differentiation (Wright 1978). The level of differentiation between any two of the Chinese populations is also high (with FST values between the range of 0.15 and 0.25), supporting genetically distinct populations.
Decline of Population Size and Genetic Diversity
In order to evaluate the effective population sizes (Ne) of crocodile lizards, we first reconstructed the dynamics of historic Ne using the pairwise sequentially Markovian coalescent (PSMC) method (Li and Durbin 2011). The four populations showed a high correlation in historical Ne changes through the quaternary glaciation cycles (fig. 2). There was a dramatic decrease of Ne from about 2 to 1 Ma. After the population decline, the Ne of the crocodile lizards stayed relatively stable, around 100,000 for nearly 800,000 years until about 200,000 years ago. During the last glaciation process, Ne of crocodile lizards continuously declined until reaching a severe bottleneck during the LGM (about 20,000–30,000 years ago). There were expansions after the bottleneck, as shown by MSMC analyses (Schiffels and Durbin 2014) (supplementary fig. S8, Supplementary Material online). We used PopSizeABC (Boitard et al. 2016), an approximate Bayesian computation approach, to estimate the very recent population sizes (from the generation before the present to the most recent common ancestor). Here, we successfully reconstructed the population size history of the four populations from the LGM to the present (fig. 2). The four populations showed similar Ne curves, reached a local maximum about 6,000–10,000 years ago (early to mid-Holocene), and a sharp decline followed. Herein the most recent 1,000 years.
Fig. 2.
Population demography of the crocodile lizard. (a) Estimation of historical Ne using the PSMC model. Each population was represented with an individual with the highest average sequencing coverage. Estimates of global sea surface temperature in the Pleistocene were adopted from the literature (Hansen et al. 2013). (b) Demographic history estimation of the latest 20,000 years using the PopSizeABC model. Dotted lines represent 90% confidence interval.
Population demography of the crocodile lizard. (a) Estimation of historical Ne using the PSMC model. Each population was represented with an individual with the highest average sequencing coverage. Estimates of global sea surface temperature in the Pleistocene were adopted from the literature (Hansen et al. 2013). (b) Demographic history estimation of the latest 20,000 years using the PopSizeABC model. Dotted lines represent 90% confidence interval.We then estimated the genetic diversity of the four crocodile lizard populations. To reduce sampling biases when comparing population parameters in the following analyses, we excluded 16 samples to ensure no close relation (no greater than a third-degree relationship) among the remaining samples (supplementary fig. S9, Supplementary Material online). Genome-wide nucleotide diversity values (θ) differ in the four populations: VN (3.85 × 10−4) < GDLK (3.89 × 10−4) < GXDGS (5.01 × 10−4) < GDMM (5.47 × 10−4). Whole-genome heterozygosity (heterozygous sites divided by total sequence) values calculated for each sample showed a similar trend (fig. 3). The VN and GDLK populations have significantly lower heterozygosity than GXDGS and GDMM. Compared with other endangered species, the whole-genome heterozygosity of crocodile lizards (0.020–0.057%) is lower than pandas (0.135%) (Li et al. 2010) and great apes (0.065–0.178%) (Prado-Martinez et al. 2013), comparable to tigers (0.026–0.072%) and snub-nosed monkeys (0.015–0.068%) (Zhou et al. 2016), and higher than the Chinese alligator (0.015%) (Wan et al. 2013) and the Yangtze River dolphin (0.0121%) (Zhou et al. 2013). To further characterize the inbreeding level of the crocodile lizards, we first calculated the individual inbreeding coefficient (FIS) of the 31 samples. The average FIS of the four populations is consistent with the trend of genetic diversity (supplementary table S7, Supplementary Material online): VN (0.48) = GDLK (0.48) > GXDGS (0.39) > GDMM (0.21). The higher the average FIS is, the lower the genetic diversity. The direct consequence of inbreeding on the genome is creating long ranges of homozygous regions across chromosomes (Kardos et al. 2018). We characterized the heterozygosity across 16 chromosomes of crocodile lizards using the fraction of heterozygous SNPs (supplementary fig. S10, Supplementary Material online). Surprisingly, several individuals have almost an entire chromosome in homozygosity, such as GDLK_4, GXDGS_5, and VN_2. Exceedingly long homozygous regions in the genome indicated very recent inbreeding.
Fig. 3.
Characterization of inbreeding and genetic loads in crocodile lizards. (a) Heterozygosity comparisons within the four populations. Whole genome heterozygosity for each sample was calculated as the total number of heterozygous sites divided by the length of the total covered region of the genome. (b) Comparison of FROH among the four populations. Letters on top of each box in (a) and (b) are results of post hoc tests using Student–Newman–Keuls method after significance of one-way analysis of variance. Different letters between two populations represent pairwise significance (P < 0.05). (c) Regression of whole genome heterozygosity against FROH (R2 = 0.82, F(1,29) = 130.10, P < 0.0001). (d) Linear regression of the proportion of LoF mutations in homozygotes against FROH (R2 = 0.85, F(1,20) = 113.20, P < 0.0001). (e) Linear regression of scaled LoF allele frequency against FROH (R2 = 0.42, F(1,20) = 14.71, P = 0.001). (f) Linear regression of scaled missense allele frequency against FROH (R2 = 3.2 × 10−5, F(1,20) = 6.5 × 10−4, P = 0.98). (g) The normalized R/ statistic for LoF and missense mutations between the three populations. Horizontal line of 1 indicates nondifference in relative number of derived alleles. Error bars represent ±2 SD. (h) Circles show the number of LoF variant sites in each population where homozygosity exists for the derived allele. Boxplots show the distribution of the same statistic for 10,000 sampled missense variant sites. Numbers are scaled by the median of missense samples. Whiskers show 5th and 95th percentiles. P values are the proportion of synonymous samples lower than the corresponding LoF count. (i) Circles show the number of LoF variant sites in ROH regions and outside ROH regions normalized by the number of synonymous variant sites in the same region for each individual. P values were from Kolmogorov–Smirnov test for difference in distribution between homozygous ROH and outside ROH regions within each population. If there exist recessive LoF alleles with a deleterious effect on viability or survival, they should be less common in homozygous regions, where they are exposed, than elsewhere in the genome.
Characterization of inbreeding and genetic loads in crocodile lizards. (a) Heterozygosity comparisons within the four populations. Whole genome heterozygosity for each sample was calculated as the total number of heterozygous sites divided by the length of the total covered region of the genome. (b) Comparison of FROH among the four populations. Letters on top of each box in (a) and (b) are results of post hoc tests using Student–Newman–Keuls method after significance of one-way analysis of variance. Different letters between two populations represent pairwise significance (P < 0.05). (c) Regression of whole genome heterozygosity against FROH (R2 = 0.82, F(1,29) = 130.10, P < 0.0001). (d) Linear regression of the proportion of LoF mutations in homozygotes against FROH (R2 = 0.85, F(1,20) = 113.20, P < 0.0001). (e) Linear regression of scaled LoF allele frequency against FROH (R2 = 0.42, F(1,20) = 14.71, P = 0.001). (f) Linear regression of scaled missense allele frequency against FROH (R2 = 3.2 × 10−5, F(1,20) = 6.5 × 10−4, P = 0.98). (g) The normalized R/ statistic for LoF and missense mutations between the three populations. Horizontal line of 1 indicates nondifference in relative number of derived alleles. Error bars represent ±2 SD. (h) Circles show the number of LoF variant sites in each population where homozygosity exists for the derived allele. Boxplots show the distribution of the same statistic for 10,000 sampled missense variant sites. Numbers are scaled by the median of missense samples. Whiskers show 5th and 95th percentiles. P values are the proportion of synonymous samples lower than the corresponding LoF count. (i) Circles show the number of LoF variant sites in ROH regions and outside ROH regions normalized by the number of synonymous variant sites in the same region for each individual. P values were from Kolmogorov–Smirnov test for difference in distribution between homozygous ROH and outside ROH regions within each population. If there exist recessive LoF alleles with a deleterious effect on viability or survival, they should be less common in homozygous regions, where they are exposed, than elsewhere in the genome.The low genetic diversity and a high degree of inbreeding are reflected in the fraction of runs of homozygosity (ROH or long stretches of homozygous regions) across a genome. Of the 31 crocodile lizard individuals analyzed, a total of 69,246 ROHs ranging from 0.1 to 7.3 Mb were identified. We then analyzed the cumulative percentage of ROH in the genomes of crocodile lizards. Homozygous regions of the crocodile lizard genome accounted for about 20–60% of the total genome length (supplementary fig. S11, Supplementary Material online), which is lower than in the highly inbred brown-eared pheasant populations (>80%) (Wang et al. 2021), comparable to mountain and eastern lowland gorillas (34.5% and 38.4%), and larger than in most human populations (3.2–38.7%) (Pemberton et al. 2012). Excepting the GDMM population, the crocodile lizard populations have a higher fraction of ROH in their genomes (most individuals > 30%), with the highest individual exceeding 60%. This indicated greater inbreeding in the GXDGS, GDLK, and VN populations. The fraction of ROH in the genome (FROH) of GDMM is significantly smaller than in the other three populations (fig. 3). The whole-genome heterozygosity of an individual is negatively correlated with FROH (R2 = 0.82, F(1,29) = 130.10, P < 0.0001) (fig. 3). These results demonstrated that severe inbreeding caused ROH accumulation in the genome and a dramatic decline in the overall genetic diversity in crocodile lizard populations.
Enhanced Homologous Segments and Purging of Deleterious Alleles
The analysis of genomic diversity and ROH revealed a high level of inbreeding in crocodile lizard populations. We further quantified the genetic load by analyzing the accumulation of deleterious alleles. SNPs in gene regions were grouped into three categories based on their potential impact on gene function: loss of function (LoF), missense, and synonymous. The fraction of LoF alleles in homozygotes was positively correlated with FROH (R2 = 0.85, F(1,20) = 113.20, P < 0.0001) (fig. 3). The fractions of missense and synonymous variant alleles showed similar trends (supplementary figs. S12 and S13, Supplementary Material online). In contrast, the frequencies of LoF and missense alleles did not accrue with FROH, but rather showed a negative correlation for LoF alleles (R2 = 0.42, F(1,20) = 14.71, P = 0.001) (fig. 3) and no significant linear correlation for missense alleles (R2 = 3.2 × 10−5, F(1,20) = 6.5 × 10−4, P = 0.98) (fig. 3). The pairwise R-statistic validated the relatively scarce number of LoF alleles in the more inbred populations (fig. 3). Compared with the less inbred GDMM population, GXDGS and GDLK populations had relatively fewer LoF alleles. In terms of missense alleles, there was a much smaller difference between the populations. We also counted the average number of derived LoF and missense variants for each population as an alternative measurement of the amount of putatively deleterious alleles per population. In accordance with the R-statistic, the two more inbred populations showed a greater reduction in the number of deleterious missense variants (supplementary table S8, Supplementary Material online). These results imply that although inbreeding has caused the increase in overall genetic loads, the purging of strongly deleterious alleles is also working effectively in the more inbred GXDGS and GDLK populations. Compared with missense variants, the GDMM population showed no deficit of homozygous LoF genotypes (fig. 3), and all three populations did not lack LoF genotypes in the ROH regions (fig. 3).
Simulations of the Effect of Purging in Contracting Populations
Our genomic results revealed an enhanced level of purging in highly inbred populations of crocodile lizards. Additionally, the reconstructed demographic history showed that crocodile lizard populations all experienced sharp population declines in the last 1,000 years. These findings allowed for further exploration of the effect of purging in contracting populations. A model of a contracting population using parameters based on population demography and history of crocodile lizards was created (fig. 4), where the first contraction in the model corresponded to the population decline in the last 1,000 years. The second contraction, caused by rapid habitat destruction and poaching, was set to be ∼100 years ago, which led to the current endangered population. The carrying capacity of 50 individual lizards is an estimate based on field surveys along a 1.7-km stream located within the range of the GXDGS population in 2018. Forward-in-time simulations were conducted in SLiM using the non-Wright–Fisher method to allow for overlapping generations and an absolute fitness (Haller and Messer 2019; Kyriazis et al. 2021). Deleterious mutations were classified into very strongly deleterious (s ≤ −0.05), strongly deleterious (s ≤ −0.01), moderately deleterious (−0.01 < s ≤ −0.001), and weakly deleterious (−0.001 < s ≤ −0.00001). Our results confirmed the purging of strongly deleterious alleles in the simulated genomes based on the crocodile lizard population history, and further predicted long-term patterns under small population size (fig. 4). First, the moderate contraction over 1,000 generations (carrying capacity K = 1,000) resulted in the average purging of 67.1% of very strongly deleterious alleles (Wilcoxon rank-sum test, P < 0.0001) and 48.1% of strongly deleterious alleles (Wilcoxon rank-sum test, P < 0.0001) per genome. However, there were no significant changes in the quantity of moderately and weakly deleterious alleles. In the meantime, the average heterozygosity decreased by 20.4% (Wilcoxon rank-sum test, P < 0.0001). Mean fitness had only a slight 0.27% decrease (Wilcoxon rank-sum test, P = 0.0036), and long ROHs in the genome started to emerge. Following a second severe contraction to a carrying capacity of K = 50, strongly deleterious alleles underwent further purging, with the average number of very strongly deleterious alleles continuing to decline by 34.6% in 100 generations (Wilcoxon rank-sum test, P = 0.0020) and 25.1% in terms of strongly deleterious alleles (Wilcoxon rank-sum test, P = 0.0004). Moderately and weakly deleterious alleles still did not significantly change in the 100 generations after the second contraction. However, aside from the very strongly deleterious alleles that were able to be purged effectively and almost completely, others experienced gradual accumulation during periods of long-term small population size (supplementary fig. S14, Supplementary Material online). The accumulated deleterious alleles mainly consist of fixed deleterious mutations (supplementary fig. S15, Supplementary Material online). Mean heterozygosity had a sharp 40.7% decrease in 100 generations after the second contraction (Wilcoxon rank-sum test, P < 0.0001), and continued to decline until nearly all heterozygotes were lost due to inbreeding. The mean FROH (only including ROHs > 1 Mb) reached 0.029, and the mean fitness declined by 2.0% (Wilcoxon rank-sum test, P < 0.0001) during this contraction. With more generations with a small population size, FROH increased sharply until reaching an equilibrium of about 0.96, whereas the mean fitness declined gradually.
Fig. 4.
Genetic purging in a simulated history of the crocodile lizard. (a) Schematic depictions of demographic models used in simulations. The “present” refers to 100 generations after the second contraction, whereas our simulations continue for a total of 6,000 generations. (b)–(f) Charts showing different genomic parameters monitored during the simulation with K = 50 for the second contraction. Dots are the mean of the individuals within a single replicate; in total, 25 replicates were run. Bar height represents the mean across all replicates. Note that the x-axis shows discrete points during the simulation rather than continuous time. Overlapping generations were used in our simulations, so a generation corresponds to a year (reproduction happens every year).
Genetic purging in a simulated history of the crocodile lizard. (a) Schematic depictions of demographic models used in simulations. The “present” refers to 100 generations after the second contraction, whereas our simulations continue for a total of 6,000 generations. (b)–(f) Charts showing different genomic parameters monitored during the simulation with K = 50 for the second contraction. Dots are the mean of the individuals within a single replicate; in total, 25 replicates were run. Bar height represents the mean across all replicates. Note that the x-axis shows discrete points during the simulation rather than continuous time. Overlapping generations were used in our simulations, so a generation corresponds to a year (reproduction happens every year).Simulations with different carrying capacities in the second severe contraction (K = 200, 100, 50, and 25) were performed to see if the purging patterns and drift happened and compare their differences. The purging of very strongly deleterious and strongly deleterious alleles was faster with smaller K values. However, after the initial decline, the accumulation of strongly deleterious alleles occurred earlier with smaller K (fig. 5). Specifically, when K = 200, the strongly deleterious alleles kept decreasing, and did not show significant turnover even after 5,000 generations of small population size (Wilcoxon rank-sum test, P = 0.7635, compared with the point of 2,000 generations after the second contraction). Moderately deleterious alleles accumulated faster with smaller K values, whereas weakly deleterious alleles accumulated but showed no clear trend under different K values (fig. 5). Except for the very strongly deleterious mutations that were rarely fixated, deleterious mutations generally underwent faster fixation under smaller K values, which explains the long-term accumulation of strongly deleterious alleles despite initial purging after contracting to extremely small population sizes (supplementary fig. S16, Supplementary Material online). With smaller K values, more rapid loss of genetic diversity (supplementary fig. S17, Supplementary Material online), accumulation of long ROHs in the genome (supplementary fig. S18, Supplementary Material online), and decline in the mean fitness of the population (supplementary fig. S19, Supplementary Material online) were noted.
Fig. 5.
The changes of deleterious alleles after the second contraction with different K values. (a)–(d) The changes of very strongly, strongly, moderately, and weakly deleterious alleles, respectively. Smaller carrying capacity resulted in faster purging of strongly deleterious alleles and an earlier turnover after initial purging. Lines were drawn from the mean of 25 replicates in each generation from generation 1,000 (beginning of the second contraction) to 6,000.
The changes of deleterious alleles after the second contraction with different K values. (a)–(d) The changes of very strongly, strongly, moderately, and weakly deleterious alleles, respectively. Smaller carrying capacity resulted in faster purging of strongly deleterious alleles and an earlier turnover after initial purging. Lines were drawn from the mean of 25 replicates in each generation from generation 1,000 (beginning of the second contraction) to 6,000.These results might explain the relatively fewer LoF alleles in the two more inbred crocodile lizard populations (GDLK and GXDGS) compared with the GDMM population. Although the GXDGS and GDLK populations were reported to be 100 and 220 respectively in 2008 (Huang et al. 2008), field surveys at the time of our sample collection suggested less than 100 adults in each population. The GDMM population of crocodile lizards is the last wild population found in China and has been poorly surveyed due to difficulties accessing its habitat, implying less human disturbance. The GDMM population was estimated to be over 1,000 in 2008 (Huang et al. 2015). At our sampling time, it had a larger population than GXDGS and GDLK, leading to less inbreeding, and relatively more strongly deleterious alleles, as predicted by our simulations.We applied several alternative models to explore the nature of deleterious alleles in small populations. We first compared the simulation results of simple constant models with small population sizes (K = 200, 100, 50, and 25) and moderate population size (K = 1,000). Following the original crocodile lizard models, strongly and moderately deleterious alleles accumulated over time in small populations, but a population size of K = 200 prevented the accumulation of strongly deleterious alleles. A population size of K = 1,000 could maintain the number of moderately deleterious alleles to a relatively low level (supplementary fig. S20, Supplementary Material online), indicating effective purging with a moderate population size. The effect of a bigger ancestral population on the efficiency of purging was then tested. A 2-fold ancestral population size of K = 20,000 resulted in a 39.2% increase of strongly deleterious alleles in the ancestral population, but they were effectively purged after a population contraction, leaving the remaining number of strongly deleterious alleles only 18.0% more than that of the original model at the end of the first contraction (supplementary fig. S21, Supplementary Material online). As the mutation rate of crocodile lizards is relatively low compared with other squamates (supplementary note 2, Supplementary Material online), simulations with mutation rates that were 2- and 3-fold that of the original parameter were also performed. A faster mutation rate resulted in more deleterious alleles after the burn-in but the same pattern of purging after population contractions was noted (supplementary figs. S22 and S23, Supplementary Material online).
Discussion
The Chinese crocodile lizard is widely recognized as a “living fossil” based on its morphological characteristics, absence of closely related species, and unique habitat preferences. Here, we conducted population genomic analyses of this relic species and found that the effective population size of S.crocodilurus showed a trend of declining over the quaternary glaciation cycles in the last 1 My. The distribution of crocodile lizards in the wild has dwindled to a handful of fragmented populations across China and Vietnam. The genetic diversity in the remaining populations is relatively low compared with other endangered animals. The GDMM population, the largest of the four populations, had smaller portions of their genomes in ROH compared with the other populations we studied, suggesting a lower level of inbreeding. We also found evidence that severe inbreeding had already resulted in reducing genetic diversity and accumulation of genetic loads in some populations. However, the frequency of LoF alleles in the highly inbred GXDGS and GDLK populations was lower than the frequency of LoF alleles in the GDMM population, helping to buffer the effect of inbreeding depression (Xue et al. 2015; Robinson et al. 2018). The lack of ancient samples of crocodile lizards or any closely related species prevented the direct quantification of changes in the rate of deleterious alleles over crocodile lizard history. Nevertheless, our simulation results have illustrated the rapid purging of strongly deleterious alleles in a contracted population based on our crocodile lizard model. We found that very strongly deleterious alleles are almost completely purged in all populations (average number < 1 for an individual) after two population contractions to a final carrying capacity below 200 or less. Overall, strongly deleterious alleles were continually purged following the second contraction, but a smaller population led to faster purging in the first hundred generations.Kyriazis et al. (2021) suggested that populations that were larger in the past might be at a higher risk of extinction following a bottleneck due to higher accumulation and masking of deleterious alleles, which quickly become exposed after contraction due to inbreeding. However, their main models assumed a rather extreme population history (a single abrupt bottleneck after the burn-in), and they added multiple random factors when simulating small populations (including random shifts in carrying capacity and random natural catastrophes). These may have reduced the effectiveness of genetic purging. Our gradual contraction model using crocodile lizard demographics shows that effective purging of strongly deleterious alleles occurs after population contractions. Apart from differences in genomic parameters and demographic models, our simulation results used models with moderate ecological stochasticity under small populations by only adding random natural catastrophes after the second contraction. We noted that while increasing the stochasticity reduced the effectiveness of purging (supplementary fig. S24, Supplementary Material online), decreasing random factors enhanced purging (supplementary fig. S25, Supplementary Material online). Recessive deleterious alleles are exposed to selection when contracting to small populations due to intensive inbreeding. Genetic purging mitigates inbreeding depression initially but, with prolonged bottlenecking, the continual decrease of Ne may reduce the power of selection against strongly and moderately deleterious alleles, leading to the fixation of deleterious mutations. This may be common in some endangered animals. For example, a recent study found that brown-eared pheasant populations had an average of >80% of ROHs in the genome, nearly twice the number of ROHs in the crocodile lizard (Wang et al. 2021). Extremely high levels of inbreeding in the brown-eared pheasant are caused either by extremely small population size or long-term inbreeding in small populations, both contributing to the accumulation of deleterious alleles. The island fox has also been isolated in small populations for thousands of years, resulting in the loss of genetic diversity and accumulation of deleterious alleles (Robinson et al. 2016, 2018).Our empirical observations demonstrated a higher efficiency in the purge of deleterious alleles in smaller crocodile lizard populations. These results suggested no genetic indication of extinction risk in crocodile lizards, despite the increased inbreeding and loss of genetic diversity. This supports a recent perspective that, though disputed (DeWoody et al. 2021; Kardos et al. 2021), overall genetic diversity is poorly correlated with the risk of species extinction (Teixeira and Huber 2021). The traditional conception about the importance of genetic diversity is that neutral diversity is correlated with adaptive diversity (Mackintosh et al. 2019; DeWoody et al. 2021; Fernandez-Fournier et al. 2021). The contraction of population size inevitably leads to the loss of genetic diversity over time, which is suggested to limit the ability to adapt to changing environments, and thus increasing the risk of extinction (Kardos 2021). However, our results show that, at least for deleterious variants, strongly deleterious alleles behaved differently to more neutral ones in a small population size. Evidence shows that the diversity of functional genes, such as the major histocompatibility complex genes, can be maintained in populations with a low neutral diversity (Aguilar et al. 2004; Zhang et al. 2018; Escoda and Castresana 2021). Functional variants serve as the basis of adaptive potential (DeWoody et al. 2021; Teixeira and Huber 2021), although whether overall genetic diversity is adequate to assess population viability is still under debate. It is important to develop functional markers, which may greatly influence the fitness of populations in the wild, using genomic techniques (Kardos et al. 2018; Escoda and Castresana 2021). Our results also emphasize the importance of knowing the purging stage of functional variants, particularly the strongly deleterious variants. A better understanding of the functional genetic diversity will facilitate the effective genetic management of endangered populations. In the case of crocodile lizards, its populations, which are highly fragmented with extremely low census numbers, are under threat from enhanced environmental stochasticity, although our results suggest that inbreeding depression is not a major threat to its current survival. Conservation efforts should focus on the recovery of the population size to reduce the risk of extinction from random factors. Increasing the gene flow by genetic rescue may also be considered for increasing functional genetic diversity.More generally, our results provide field evidence for a viable role of the effect of purging on maintaining population fitness in endangered species with diminishing populations. Here, demographic history influenced the effect of purging in small populations due to the extent of population contraction and duration of bottlenecks. Inconsistency in the results from different studies may reflect the population history experienced before the sample collection, further highlighting the importance of accurate demographic information for the conservation and management of endangered species. Genetic purging may reduce the decline of fitness in the early stages of population contraction, but deleterious alleles can accumulate in the long term, as predicted by the extinction vortex model (Gilpin and Soulé 1986; Caughley 1994). Future exploration of these genomic effects during population contractions using empirical studies and in silico simulations would be of great value for assessing the genetic vulnerability of endangered species.
Materials and Methods
Genome Sequencing and Annotation
The individual used for de novo genome assembly was a captive female crocodile lizard that died from an infection in August 2017 at the breeding center of Daguishan Nature Reserve, Guangxi province, China (individual ID 128, 3 years old). Sampling for sequencing was approved and the lizard’s body was kept at −20 °C immediately after death and stored at −80 °C after transport to the laboratory. Muscle tissue was used for genomic sequencing. Genomic DNA was extracted using the Qiagen DNA Genomic kit (Qiagen, Valencia, CA). Four types of whole-genome sequencing data were generated: Illumina short-reads, PacBio long-reads, 10× Genomics reads, and Hi-C reads. Details of the sequencing libraries, genome assembly and assessment, repeat annotation, and functional gene annotation are described in supplementary note 1, Supplementary Material online.
Phylogenetic Tree Construction Using Genomic Data
We combined a 801,084-bp whole-genome alignment DNA sequence and a 252,961 aa one-to-one ortholog protein sequence to get a less biased data set. The whole-genome DNA alignment of the 25 vertebrate species we used was extracted as previously described (Chen et al. 2019). Genomes were first aligned to the reference genome (S. crocodilurus) by LAST v809 (Kiełbasa et al. 2011) using the “lastal” command with the parameter –E0.05. The best pairwise aligned blocks were then obtained using the “maf-swap” command. Finally, shared pairwise alignment of all the species were generated using MULTIZ v11.2 (Blanchette et al. 2004). For protein alignment, one-to-one ortholog proteins were obtained using Orthomcl V1.4 (Chen et al. 2006). A total of 651 one-to-one ortholog protein sequences were concatenated and trimmed by trimAl v1.4 (Capella-Gutiérrez et al. 2009) using the option “automated1.” A partition tree combining DNA and protein data (Chernomor et al. 2016) was generated using iqtree v1.6.12 (Nguyen et al. 2015) with the parameter –bb10000 (ultrafast bootstrap) (Hoang et al. 2018). Divergence time estimation was conducted under Bayesian model in the MCMCTREE program in PAML v4.7 packages (Yang 2007). The fossil calibrations are available in supplementary table S9, Supplementary Material online.
Saliva Sampling and Sequencing
Wild crocodile lizard saliva samples from three Chinese populations, GXDGS (Daguishan Nature Reserve, Guangxi province, n = 16), GDLK (Luokeng Nature Reserve, Guangdong province, n = 10), and GDMM (Linzhouding Nature Reserve, Maoming city, Guangdong province, n = 11) and one Vietnamese population VN (Tay Yen Tu Nature Reserve, Bac Giang Province; Yen Tu and Dong Son-Ky Thuong Nature Reserves, Quang Ninh Province; n = 10) were collected. Field sampling was done in 2017 for the VN populations and in 2018 for the three Chinese populations. Field samples were collected at night when the lizards perched on high branches, and they were released in the same place to minimize disturbance. No more than five samples were collected from lizards along the same stream to minimize close relatives. Note that six samples for the GXDGS population (GXDGS_11–GXDGS_16) were collected from individuals rescued from highly destroyed habitats during 2010 to 2013 and kept in the breeding center. All samples for the GDMM population were collected from individuals rescued from poachers in 2008. Saliva samples were collected using sterile oral swabs. The tips were then stored in 100% ethanol and kept at −20 °C after being transported to the laboratory. Whole-genome DNA was extracted from the swab tips using the Qiagen DNA Genomic kit. Short-read libraries were constructed with 350 bp insert sizes and sequenced on an Illumina HiSeq platform using paired-end 150 bp reads. A total of 1,100 Gb of clean data were generated.
Read Mapping and Variant Calling
Clean reads were obtained by removing adaptors and low-quality reads (reads with more than 10% unidentified nucleotides or reads with more than 50% low-quality bases [Phred Q-score < 10]). Valid clean reads were mapped to the genome using BWA v0.7.8 (Li and Durbin 2009) (parameters: mem -t 4 -k 32 -M). Duplicates were removed using Samtools v1.3 (Li et al. 2009) option “rmdup.” Variant calling for all samples were performed using Samtools with parameter “mpileup -q 1 -C 50 -t SP -t DP -m 2 -F 0.002.” Variants were then filtered using the following criteria: 1) read depth should be higher than 5 and lower than 200, 2) genotype quality of a site should be more than 20, and 3) minor allele frequency was 0.1 and variant sites missing more than 10% of data were filtered out.
Phylogenetic Tree and Population Structure
We constructed neighbor-joining trees using TreeBest v1.9.2 (Vilella et al. 2009) with 1,000 bootstrap replicates. Short-read data of O.gracilis (SRA: SRP052050) were mapped to the S. crocodilurus genome and served as the outgroup. GCTA (version 1.24.2) software (Yang et al. 2011) was used to conduct PCA. The population structure of the 47 individuals was inferred using ADMIXTURE v1.23 (Zhou et al. 2011). Genetic clusters K were defined from two to five, and the best structure was determined with the lowest cross-validation error. F3 statistics were calculated using the qp3Pop program from AdmixTools v5.1 (Patterson et al. 2012). Pairwise FST was calculated in a 40-kb sliding window with 20 kb step size using VCFtools v0.1.14 (Danecek et al. 2011), and averaged genome-wide to compare between populations.
Population Demography Reconstruction
The PSMC (Li and Durbin 2011) method was applied to estimate historical Ne. Scaffolds shorter than 50 kb were removed from the input data set. The generation time was set to 4 years according to captive breeding data from Daguishan Nature Reserve, Guangxi province, China. The mutation rate (the number of substitutions per site per generation) was estimated to be 3.05 × 10−9. The neutral mutation rate was calculated using 4-fold degenerate site alignment of five species (Chinese crocodile lizard, Asian glass lizard, green anole lizard, Australian bearded dragon lizard, and Japanese gecko) in the BaseML program in the PAML4 packages (Yang 2007). The squamate lineages showed different molecular evolutionary rates (supplementary fig. S4, Supplementary Material online), where the crocodile lizard has a relatively lower molecular evolutionary rate than the other squamates (supplementary note 2, Supplementary Material online). We used a local clock with three different mutation rates (the branch of the Chinese crocodile lizard was the second rate and the branches of the Australian bearded dragon lizard and green anole lizard were the third rate). The divergence time between the Chinese crocodile lizard and the Asian glass lizard was used for calibration, and the alpha parameter was not fixed. The four individuals with the highest average sequencing depth represented each population. Estimates of the sea surface temperature in the Pleistocene were taken from the literature (Hansen et al. 2013).Additionally, MSMC (Schiffels and Durbin 2014) analyses were conducted. Four individuals per population were used for historical Ne estimation and the RCCR analyses were conducted pairwise. PopSizeABC was completed using the reformatted version with a config file (https://github.com/stsmall/popsizeabc) (Boitard et al. 2016). Prior distributions of the population sizes were sampled uniformly from log10(N0) (N0 ranging from 10 to 100,000), and a uniform prior between 1 × 10−9 and 1 × 10−8 was used for the recombination rate. The mutation rate of the crocodile lizard was set to 3.05 × 10−9 per site per generation, assuming a generation time of 4 years. Two summary statistics, the site frequency spectrum and linkage disequilibrium statistics, were used in the ABC analysis. Summary statistics were calculated at 21 discrete time windows (the oldest time window started 130,000 generations before the present) based on the empirical data set, and compared with the corresponding statistics calculated from simulated data sets. Minor allele frequency for calculating summary statistics was 0.2 to minimize prediction error. For simulated data sets, the number of segments was 100 (the size of a segment was 2,000,000 bp), and the number of simulated data sets was 100,000. Other parameters remained at default. ABC estimations of the population parameters were obtained by neural network regression with 0.01 tolerance.
Relatedness Analysis
Pairwise relatedness estimation using KING v2.2.7 (Manichaikul et al. 2010) was performed. The pairwise kinship coefficient of the 47 individuals was inferred from whole-genome SNPs, and estimated kinship coefficient ranges of [>0.354], [0.177, 0.354], [0.0884, 0.177], and [0.0442, 0.0884] were assigned to monozygotic twin, first-degree, second-degree, and third-degree relations, respectively. A total of 16 individuals were excluded to eliminate relations stronger than a third-degree relationship between any pairs in the remaining samples (supplementary fig. S9, Supplementary Material online). The following analyses used the remaining 31 individuals for comparison of population parameters.
Genetic Diversity Estimation and Detection of ROH
θ was calculated for each population in a 40-kb sliding window with 20 kb step size using VCFtools v0.1.14 (Danecek et al. 2011), and was averaged to obtain a genome-wide statistic. Whole-genome heterozygosity for each sample was calculated as the total number of heterozygous sites divided by the length of the total covered region of the genome. The inbreeding coefficient (FIS) parameter for every individual was obtained using PLINK v1.90b6.10 (Purcell et al. 2007). ROH were identified using PLINK. The scanning window was set to 5,000 kb (20 SNPs). The minimum length of a segment to be called homozygous was set to 100 kb (10 SNPs) and the max inverse density (in kb/SNP) was 10. Other parameters were set as default. Overlapping homozygous segments were grouped with the option “homozyg-group.” We assessed individual inbreeding levels using FROH, which was defined as the fraction of the genome in ROH (Kardos et al. 2015).
Genetic Load Estimation and Comparison
We annotated genomic SNPs using SnpEff v4.3t (Cingolani et al. 2012). Because deleterious SNPs often exist in low frequency, we did not filter minor allele frequencies of our SNP sites (other filters were used as previously described). Estimating deleterious loads requires the assignment of an ancestral state for comparison. As there is no close relative of the Chinese crocodile lizard, we used the most divergent population (Vietnamese population) to serve as the outgroup. We made the Chinese populations the focus of genetic load estimation. We excluded sites where the outgroup population contained multiple alleles. SnpEff annotated variants to different types after comparing the reference genome sequence and gene annotation information. The variants were grouped in three categories: 1) LoF (variants with high impact on gene structures [stop gained, stop lost, start lost, start gained]); 2) missense; and 3) synonymous. The fraction of derived alleles in homozygotes was calculated with the formula:The allele frequency of a set of sites was calculated as:derived allele count in all sites ÷ total allele count.The frequency of synonymous variants normalized allele frequencies of LoF and missense variants before comparison to minimize potential biases, such as different sample sizes.The relative abundance of derived allele frequency of a particular category of sites (A) between two populations (X and Y) was calculated following the formula (Do et al. 2015; Xue et al. 2015):
where L is the total expected likelihood value of a derived allele to be found in a population but not the other under random samplings. L was calculated using:The derived allele frequency (f) of each site was calculated using , where is the number of derived alleles in population X in site i, and is the total number of alleles. We calculated the corresponding RX/Y(S) for synonymous sites for normalization, accounting for potential biases. The final statistic is:Estimates of the variance of was obtained using 100 block jackknife samples on the set of sites in A. is expected to be 1 if there is no difference in derived allele frequencies between the two populations. We calculated pairwise for both LoF and missense variants for the three Chinese populations to compare mutation loads.We further counted the number of LoF variant sites with at least one individual in homozygosity of derived alleles (nhom) for every population and the number of LoF variant sites occurring in ROH and non-ROH regions for every individual (Xue et al. 2015). nhom of the LoF variants was normalized by nhom of the synonymous variants. We randomly sampled synonymous variant sites with the same number of LoF variant sites in the population 10,000 times, and determined nhom. The nhom of LoF sites was scaled by the median of 10,000 synonymous samples. The nhom of synonymous samples was also scaled by its median to form distribution, and the P value of how LoF sites differed from synonymous sites was assessed by the proportion of synonymous samples lower than the observed LoF sites in nhom. LoF variant sites in ROH and non-ROH regions were normalized by the number of synonymous sites in the same region.
Genomic Simulation of the Purging of Deleterious Alleles
Forward-in-time simulations were performed in SLiM v3.6 (Haller and Messer 2019) under a non-Wright–Fisher model, as initially implemented by Kyriazis et al.(2021). Based on genome characteristics of the crocodile lizard, we simulated a diploid genome with 20,000 genes. Each gene was 1,500 bp long, representing the total exon length of a protein-coding gene, and the gene accumulated mutations at a rate of 3.05 × 10−9 per site per generation (calculated assuming a 4-year generation time). Sixteen chromosomes carried genes with gene numbers proportional to chromosome lengths. Recombination rate between genes was set to 1 × 10−3 per site per generation to reach an effective recombination rate of 1 × 10−8 per site per generation in the 100 kb noncoding region between two genes. Recombination within a gene disallowed and recombination between chromosomes was free. The ratio of deleterious to neutral mutations was set to 2.31:1 (Huber et al. 2017) and the selection coefficient (s) for deleterious alleles was obtained by the distribution of fitness effects inferred from human data (Kim et al. 2017). Mixed dominance coefficients were used, given that more deleterious alleles tend to be more recessive (h = 0 when s < −0.01 and h = 0.25 when s ≥ −0.01) (Kyriazis et al. 2021). Based on our demographic results showing that crocodile lizard populations have experienced continuous contractions in the last 1,000 years, we ran each simulation with an ancestral carrying capacity (K) of 10,000 for 100,000 generations (10 × K) for the burn-in process. The first contraction ran for 1,000 generations with K = 1,000, and the second contraction ran until the population went extinct or after 5,000 generations in a small population with K = 200, 100, 50, and 25. Random natural catastrophes were modeled in the final small population by adding random deaths. The probability of mortality for each generation was drawn from a beta distribution with α = 0.5 and β = 8. We assumed a hermaphroditic random mating population, and individuals with age > 1 reproduced every generation. During each simulation, we monitored the average number of alleles per individual for very strongly (s ≤ −0.05), strongly (s ≤ −0.01), moderately (−0.01 < s ≤ −0.001), and weakly (−0.001 < s ≤ −0.00001) deleterious mutations. The mean heterozygosity, FROH (calculated using ROHs > 1 Mb to monitor recent inbreeding), and population fitness were also tracked. We ran 25 replicates of each model to test statistical significance. Wilcoxon rank-sum tests were implemented in R v4.0.5. Scripts for simulations are available on Github at https://github.com/XieHongX/Crocodile_lizard_simulation.
Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.Click here for additional data file.
Authors: Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin Journal: Bioinformatics Date: 2011-06-07 Impact factor: 6.937