Alessio Iannucci1, Andrea Benazzo2, Chiara Natali1, Evy Ayu Arida3, Moch Samsul Arifin Zein3, Tim S Jessop4, Giorgio Bertorelle2, Claudio Ciofi1. 1. Department of Biology, University of Florence, Firenze, Italy. 2. Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy. 3. Research Center for Biology, The Indonesian Institute of Sciences (LIPI), Cibinong Science Center, Cibinong, Indonesia. 4. School of Life and Environmental Sciences, Deakin University, Geelong, Vic., Australia.
Abstract
Population and conservation genetics studies have greatly benefited from the development of new techniques and bioinformatic tools associated with next-generation sequencing. Analysis of extensive data sets from whole-genome sequencing of even a few individuals allows the detection of patterns of fine-scale population structure and detailed reconstruction of demographic dynamics through time. In this study, we investigated the population structure, genomic diversity and demographic history of the Komodo dragon (Varanus komodoensis), the world's largest lizard, by sequencing the whole genomes of 24 individuals from the five main Indonesian islands comprising the entire range of the species. Three main genomic groups were observed. The populations of the Island of Komodo and the northern coast of Flores, in particular, were identified as two distinct conservation units. Degrees of genomic divergence among island populations were interpreted as a result of changes in sea level affecting connectivity across islands. Demographic inference suggested that Komodo dragons probably experienced a relatively steep population decline over the last million years, reaching a relatively stable Ne during the Saalian glacial cycle (400-150 thousand years ago) followed by a rapid Ne decrease. Genomic diversity of Komodo dragons was similar to that found in endangered or already extinct reptile species. Overall, this study provides an example of how whole-genome analysis of a few individuals per population can help define population structure and intraspecific demographic dynamics. This is particularly important when applying population genomics data to conservation of rare or elusive endangered species.
Population and conservation genetics studies have greatly benefited from the development of new techniques and bioinformatic tools associated with next-generation sequencing. Analysis of extensive data sets from whole-genome sequencing of even a few individuals allows the detection of patterns of fine-scale population structure and detailed reconstruction of demographic dynamics through time. In this study, we investigated the population structure, genomic diversity and demographic history of the Komodo dragon (Varanus komodoensis), the world's largest lizard, by sequencing the whole genomes of 24 individuals from the five main Indonesian islands comprising the entire range of the species. Three main genomic groups were observed. The populations of the Island of Komodo and the northern coast of Flores, in particular, were identified as two distinct conservation units. Degrees of genomic divergence among island populations were interpreted as a result of changes in sea level affecting connectivity across islands. Demographic inference suggested that Komodo dragons probably experienced a relatively steep population decline over the last million years, reaching a relatively stable Ne during the Saalian glacial cycle (400-150 thousand years ago) followed by a rapid Ne decrease. Genomic diversity of Komodo dragons was similar to that found in endangered or already extinct reptile species. Overall, this study provides an example of how whole-genome analysis of a few individuals per population can help define population structure and intraspecific demographic dynamics. This is particularly important when applying population genomics data to conservation of rare or elusive endangered species.
Single‐ and multilocus molecular assays have been efficiently employed in a multitude of population and conservation genetic studies on endangered species to assess population divergence and gene flow, and how palaeogeographical and historical factors as well as habitat fragmentation may have affected contemporary population dynamics (e.g., Amos & Balmford, 2001; DeSalle & Amato, 2004; Hedrick, 2001). More recently, next‐generation sequencing technologies and advances in bioinformatic tools have introduced a wider, genomic perspective to population ecology and conservation (Amato et al., 2009; Avise, 2010; Hohenlohe et al., 2020; Supple & Shapiro, 2018).Parallel sequencing of pools of DNA molecules results in the detection of tens of thousands of single nucleotide polymorphisms (SNPs) and indels distributed along the whole genome. This allows a comprehensive description of the patterns of genetic variation among individuals (Ellegren, 2014; Hohenlohe et al., 2018; Luikart et al., 2018). Population structure can be estimated for multiple polymorphisms in a sliding window analysis along the genome, resulting in increased accuracy and the possibility of better detecting differences at specific genomic regions (Corander et al., 2013; Gaughran et al., 2018; Martin & Van Belleghem, 2017; Steane et al., 2015; Waples et al., 2016). Genomic analysis also provides a powerful tool to reconstruct the demographic history of populations, infer fluctuations in effective population size, test for population expansion and contraction, and delineate conservation and management units (Funk et al., 2012; Sato et al., 2020; Smith & Flaxman, 2020; Smith et al., 2018; Younger et al., 2017). Moreover, demographic events can be reconstructed over much wider timescales than it is possible by using high‐mutation‐rate loci, such as microsatellites (Li & Durbin, 2011).A great advantage of using genomic data is that many independent loci allow for comprehensive investigations to be conducted even when the sample size is small. Genetic diversity and population differentiation estimated over thousands of loci in a few individuals per population may, in fact, be comparable to those obtained by genotyping a high number of individuals using traditional molecular markers (Attard et al., 2018; Gaughran et al., 2018; Wright et al., 2020). This is of particular importance when studying species of conservation concern, with few or very elusive individuals available for sample collection, as well as relatively small island populations, where a more comprehensive analysis of genomic diversity can better describe demographic patterns resulting from complex biogeographical scenarios (e.g., Jensen et al., 2018; Meröndun et al., 2019; Sjodin et al., 2020).Among a variety of molecular techniques available for population genomic studies, the most comprehensive approach is to sequence the whole genomes of target individuals (Fuentes‐Pardo & Ruzzante, 2017; Schlötterer et al., 2014; Therkildsen & Palumbi, 2017; Wright et al., 2020). In this study, we assessed patterns of population structure and demographic history in island populations of the Komodo dragon (Varanus komodoensis), the world's largest lizard, by comparison of whole genome sequences obtained from a relatively limited number of animals. The Komodo dragon is endemic to Komodo National Park (KNP) and the Island of Flores in the Lesser Sunda region of eastern Indonesia, and has one of the smallest known range distributions of any large vertebrate (Ariefiandy, Purwandana, Azmi, et al., 2021; Ciofi & de Boer, 2004; Jessop et al., 2018). Extant populations occupy different sized islands and persist at very different abundances and densities (Jessop et al., 2007; Laver et al., 2012; Purwandana et al., 2014). The Komodo dragon is a keystone and umbrella species for the dry monsoon forest ecosystem, one of the biodiversity hotspots for conservation (Myers et al., 2000). It is still considered “vulnerable” by the IUCN (2020), although the range distribution has been substantially reduced over the last five decades and the species now comprises fewer than 4,000 individuals in the wild (Ariefiandy & Purwandana, 2019; Purwandana et al., 2014).The palaeogeography of KNP and Flores reflects patterns of vicariance and connectivity among islands which are mainly the result of recurrent past eustatic changes in sea level. Despite physical and sensory capabilities for long‐distance movements on land and limited sea water crossing, Komodo dragons show little dispersal both within and across islands (Jessop et al., 2018). This scenario, along with historical and more recent population changes due to habitat encroachment and expansion of human populations, particularly on the Island of Flores (Ariefiandy, Purwandana, Azmi, et al., 2021; Ariefiandy et al., 2015; Ciofi & de Boer, 2004), make the current insular distribution of V. komodoensis an excellent case study for a genome‐wide assessment of population structure. Komodo dragon population genetics have been investigated using species‐specific microsatellite loci (Ciofi & Bruford, 1998; Ciofi et al., 2011), which recovered a gradient of population distinctiveness and gene flow across islands with different levels of proximity (Ciofi et al., 1999; Ciofi & Bruford, 1999). In this work, we investigate whether analysis of whole‐genome sequencing data for a few individuals per island corroborates previous results based on multilocus allelic variation over a much wider sample size, and/or reveals previously undetected fine‐scale demographic patterns.
MATERIALS AND METHODS
Sampling, DNA isolation and sequencing by synthesis
Komodo dragons are found on the islands of Komodo (KM), Rinca (RN), Nusa Kode (NK), Gili Motang (GM), all part of KNP, and the West (WF) and North (NF) coastal areas of Flores (Ariefiandy, Purwandana, Azmi, et al., 2021; Ciofi & de Boer, 2004). Samples were collected as described in Ariefiandy et al. (2013) on the two large islands of Komodo (311.5 km2) and Rinca (204.8 km2) and two smaller islands of Gili Motang (9.5 km2) and Nusa Kode (7.8 km2). On Flores, Komodo dragons were sampled on the western (Wae Wuul nature reserve and the Lenteng area) and northern coast (Pota and Riung districts). We obtained either blood or tissue samples of four individuals from each of the six sampling locations for a total of 24 samples (Figure 1; Table S1). Whole DNA was extracted using a PureLink Genomic DNA Mini Kit (Invitrogen). DNA integrity was assessed by 1.5% agarose gel electrophoresis and DNA concentration was measured using a Qubit 4 fluorometer Broad Range Assay (Invitrogen). Sex was determined by polymerase chain reaction (PCR) amplification of sex‐specific genomic regions as described in Halverson and Spelman (2002). Short‐read genomic libraries were constructed using a Nextera DNA Flex Library Prep Kit (Illumina) according to the manufacturer's protocol. Target coverage was 10× for all samples except NF1, WF1, NK1, GM1, RN1 and KM1 (one from each sampling site), for which a 25× coverage was obtained. This sequencing strategy was used as demographic inference methods need high‐confidence genotypes (coverage ≥20×) for at least one individual per group. Lower coverages are sufficient to discover polymorphisms segregating at high frequency in all the other individuals from the same population (Nielsen et al., 2011). Libraries were pooled with a 2.5:1 concentration ratio of high‐ vs. low‐coverage individuals. A free adapter blocking reagent (Illumina) was used to reduce index hopping. Libraries were sequenced paired‐end on an Illumina NovaSeq 6000 System using a 300‐cycle NovaSeq 6000 S1 Reagent Kit v1.0.
FIGURE 1
Map of the study sites. A total of 24 Komodo dragons were sampled for genomic analyses on four islands in Komodo National Park (dotted line) and on the western and northern coast of Flores, covering the entire range of the species. The current distribution of Varanus komodoensis is shown by the light red areas
Map of the study sites. A total of 24 Komodo dragons were sampled for genomic analyses on four islands in Komodo National Park (dotted line) and on the western and northern coast of Flores, covering the entire range of the species. The current distribution of Varanus komodoensis is shown by the light red areas
Mapping and SNP calling
Demultiplexing and conversion of sequencing data from bcl to fastq formats were performed using bcl2fastq version 2.20 (Illumina). Quality control of the reads was assessed with fastqc version 0.11.8 (Andrews, 2010). Reads were then processed with adapterremoval version 2 (Schubert et al., 2016) to remove residual Illumina adapters. Read tails with a mean Phred‐quality score <10 over a 4‐bp sliding window were trimmed and subsequently aligned to the Komodo dragon reference genome (Lind et al., 2019) using the mem algorithm implemented in the bwa version 0.7.15 aligner (Li & Durbin, 2009). Alignments in sam format were sorted, indexed and compressed in bam format using samtools version 1.9 (Li & Durbin, 2009). Polymerase chain reaction duplicates, produced during library preparation, and optical duplicates were removed using the MarkDuplicates tool in the picard toolkit version 2.18.20 (http://broadinstitute.github.io/picard/). Regions close to indels showing putative alignment errors were identified and realigned using the RealignerTargetCreator and the IndelRealignment tools in gatk version 3.5 (McKenna et al., 2010). Alignment statistics were calculated using the CollectAlignmentSummaryMetrics tool, and bam files were validated with the ValidateSamFile tool of the picard toolkit version 2.18.20. Observed coverage was computed using the depth command of samtools version 1.9 with the “‐aa” flag activated.We also downloaded paired‐end reads of the Chinese crocodile lizard Shinisaurus crocodilurus (Gao et al., 2017) from GenBank (accession no.: PRJNA353147), and used it as outgroup for population structure analyses by applying the same informatics pipeline as described above.Single nucleotide polymorphisms and indels were called using the HaplotypeCaller algorithm implemented in gatk version 3.5. We excluded nucleotides with a base phred quality score <20 or those located in reads with a mapping phred quality score <20. The raw callset was then filtered by excluding variants matching at least one of the following criteria: not a biallelic SNP, a SNP phred quality score (QUAL) <60, a significant fisher strand test (FS >60), a Variant Confidence/Quality by Depth (QD) <2, a root mean square of the Mapping Quality (MQ) <40, an MQRankSum < −20 or a significant read position bias (ReadPosRankSum < −8.0). Genomic regions showing a depth of coverage lower than 0.25× or higher than 4× the mean coverage across samples were removed. We additionally removed SNPs within 5 bp of called indels with a QUAL >60. The quality of the variants was further improved by removing singletons, private doubletons and sites showing a frequency of the minor allele lower than 0.042 or that were missing in more than four individuals. Finally, we removed variants in genomic regions (i) showing an excessive coverage (>4 times the mean coverage) in at least one individual, (ii) containing repetitive elements (see Lind et al., 2019) and (iii) having a low mappability score (p < 1) computed using gem (Derrien et al., 2012) by setting a maximum mismatch of 4% in a 150‐bp read. Variants located in scaffolds of <500 kb in length, or scaffolds showing a coverage across individuals lower than half or higher than 3× the mean coverage were also removed from the final set. A total of 135 scaffolds were retained, corresponding to 96% of the V. komodoensis genome length. All remaining variants were phased using a two‐stage approach. Initially, whatshap version 0.18 (Martin et al., 2016) was used to phase genomic variants by considering all sequencing reads spanning multiple heterozygous sites. shapeit version 4 (Delaneau et al., 2019) was then run to phase all the remaining unphased variants by setting the “‐‐use‐PS” and “‐‐sequencing” flags.
Analysis of population structure
The autosomal SNP data set was used to estimate individual ancestry using admixture version 1.3 (Alexander et al., 2009). This method provides maximum‐likelihood estimates of the proportion of each sequenced genome that belonged to each of K populations. We explored co‐ancestry for a number of K ancestral populations between one and 10. The optimal number of hypothetical K ancestral groups was inferred using the cross‐validation (CV) error estimation method, whereby the CV error for each K is inferred by first masking and then re‐inferring genotypes. The optimal value of K was that with the lowest CV error across 20 replicates. The analysis was restricted to variants having an maf ≥0.05 and a minimum distance between SNPs of 20 kb.A multivariate discriminant analysis of principal components (DAPC, Jombart et al., 2010) was also performed using adegenet version 1.2.8 (Jombart & Collins, 2015) in R 3.5.1 (Team, 2018). We determined the optimal number of principal components (PCs) by cross‐validation using the “xvalDapc” function with 1,000 replicates. We then selected the number of PCs associated with the lowest root mean squared error value. We ran DAPC using all the available discriminant functions.Single nucleotide polymorphisms contained in the autosomal regions were used to compute an individual pairwise distance matrix between samples using the “‐‐distance square0 1‐ibs flat‐missing” command in plink version 1.9 (Chang et al., 2015). The distance matrix was then converted to the nexus format using the phangorn R package (Schliep, 2011), and splitstree version 4.14.6 (Huson & Bryant, 2006) was used to obtain a phylogenetic network according to the neighbour‐net algorithm (Bryant & Moulton, 2004).Evolutionary relationships were further estimated from SNPs using snapp (Bryant et al., 2012), a coalescent‐based method implemented in beast2 (Bouckaert et al., 2014). We selected one variant every 100 kb, for a total of 13,482 SNPs, in order to reduce correlation between markers. We ran snapp for 1,000,000 Markov chain Monte Carlo (MCMC) generations, sampling every 1,000 steps and setting a burn‐in of 10%. We set mutation rates equal to 1 and used default parameters for the gamma prior (alpha 11.75, beta 109.73). The trees distribution was visualized using densitree version 2.1 (Bouckaert et al., 2014).The level of divergence between populations was assessed by the Weir and Cockerham (1984) θ estimator of the F
ST parameter computed using vcftools. A population phylogenetic tree was built based on the pairwise θ matrix using the neighbour‐joining algorithm implemented in the ape R package (Paradis & Schliep, 2019).
Mitochondrial DNA analysis
Mitochondrial DNA (mtDNA) sequencing reads were extracted by filtering whole genome alignments for the scaffold corresponding to the mtDNA of the Komodo dragon reference genome (NCBI accession no.: SJPD01001108.1) using samtools view. Alignments with MAPQ <30 were filtered out. Mitochondrial region alignments were visually screened using geneious prime 2020.1.1 (Kearse et al., 2012). Consensus sequences were called for each individual from the most frequent nucleotide at each site with a 60% consensus threshold. An mtDNA phylogenetic tree was built using raxml version 8.2.7 implemented in geneious prime 2020.1.1 by applying the GTR GAMMA nucleotide model with rapid bootstrapping and search for best‐scoring maximum‐likelihood trees across 100 bootstrap replicates (Stamatakis, 2014). We used the complete mtDNA sequence of the water monitor V. salvator as outgroup (NCBI accession no.: EU747731.1).
Z chromosome analysis
Reads aligned to scaffolds associated with the Z chromosome of the Komodo dragon (Lind et al., 2019) were used to call variants against the reference sequence using bcftools version 1.10.2 (Li, 2011). We set a minimum mapping quality of 40, a minimum base quality score of 30 and the “‐C50” flag to adjust the mapping quality of reads containing excessive mismatches. The variants were called in female individuals by setting a ploidy equal to 1 because of the Z chromosome's hemizygous state, and applying the same genomic masks used for autosomes. Positions showing a phred quality score (QUAL) <60, a sequencing depth <5 reads or an INDEL were excluded. The filtered set of variants, together with the Komodo dragon reference sequence, was used to produce a consensus sequence for each female using bcftools version 1.10.2. Consensus sequence quality was improved by repeating the variant calling procedure and setting a ploidy of 2 with the same quality filters as above. Single nucleotide polymorphisms showing within‐individual heterozygosity were masked with “N” in the final sequences. Consensus sequences were used to build a phylogenetic tree using the same method described for the mtDNA sequences.
Genomic diversity
Genetic diversity of individuals and populations was evaluated using Watterson's Theta estimator (Watterson, 1975). The number of segregating sites was counted for each callable region, defined by genomic intervals showing good mappability, low repetitiveness and appropriate coverage levels (see Section 2.2). The total number of segregating sites was first divided by the (n − 1) harmonic number, where n is the number of haploid chromosome copies, and then by the total size of the callable regions to obtain the per‐base estimator θ
W. The same approach was used to obtain a θ
W estimate over neutral regions and exons. Neutral regions were defined by callable regions located in intergenic regions that were at least 25 kb from the closest gene. Exon regions were extracted directly from the reference genome annotation by merging overlapping elements in different strands and discarding portions that were not in callable regions.Genomic diversity estimates for Komodo dragon individuals were compared to published values for the Aldabra giant tortoise Aldabrachelys gigantea and Pinta Island Galápagos tortoise Chelonoidis abingdonii (Quesada et al., 2019), saltwater crocodile Crocodylus porosus, Indian gharial Gavialis gangeticus and American alligator Alligator mississippiensis (Green et al., 2014). Genetic variants in callable, neutral and exon genomic partitions were processed using snpsift (Cingolani et al., 2012) to identify private segregating sites in each of the six groups.
Run of homozygosity and inbreeding
Runs of homozygosity (ROH) were first identified by estimating the heterozygosity levels in 250‐kb nonoverlapping windows using Rohan's probabilistic method (Renaud et al., 2019). Each genomic segment was then defined to be in ROH based on a Hidden Markov Models (HMM) classifier. The analysis was performed on bam alignments considering base and mapping errors. We used a transition/transversion rate of 2.251, estimated by vcftools across the entire data set, and an expected θ
W in ROH regions (rohmu flag) of 2 × 10−5. The parameter θ
W was estimated by either including or excluding ROH regions. Although the method was developed to provide reliable ROH estimates for different coverages (>5×) (Renaud et al., 2019), the six individuals sequenced at higher coverage were downsampled to 10× in order to facilitate comparison with the other samples.A second approach to identify ROH was based on the HMM implemented in bcftools version 1.10.2. Regions in autozygosity were called using genotype likelihoods as input and a fixed recombination rate of 1 centiMorgan (cM) Mb–1. The analysis was repeated three times by setting the per‐nucleotide frequency of the alternate allele to (i) the value observed across all 24 individuals, (ii) the per‐population estimate (n = 4) or (iii) a fixed value of 0.4. Finally, ROH was also detected using plink (Chang et al., 2015) under default parameters. All of these methods used ROH regions ≥1 Mb to estimate the fraction of the whole genome that was in ROH state (F
ROH).
Demographic analyses
Trajectories of effective population size (N) through time were inferred using a Multiple Sequentially Markovian Coalescent (MSMC) (Schiffels & Wang, 2020) on one high‐coverage sample. Input data were generated using the “generate_multihetsep.py” script selecting all callable segments from autosomal scaffolds and removing those having a minimum length <500 kb (Gower et al., 2018). One‐hundred bootstrap replicates were generated using the “multihetsep_bootstrap.py” script which created, at each iteration, a set of 35 chromosomes each composed of 20 random chunks of 2 Mb in length from the original data set (total size of 1.4 Gb for each replicate). The same method was also used to estimate the Relative Cross Coalescent Rate (RCCR) between pairs of high‐coverage individuals by applying a more stringent approach to exclude genomic regions containing phasing artefacts that could bias the inference. Each of the six high‐coverage genomes was processed to remove all 50‐kb genomic segments containing at least one heterozygous site that we were unable to phase using paired‐end read information. The total proportion of masked base pairs along the genome was 0.28, 0.36, 0.34, 0.35, 0.45 and 0.40 in NF1, NK1, GM1, RN1, WF1 and KM1, respectively. The “‐s” flag was also activated to avoid sites with ambiguous phasing, as suggested by Schiffels and Wang (2020). The time at which the RCCR decreased below the 0.5 threshold was taken as a point estimate of the divergence time. All parameter estimates were scaled using a mutation rate of 7.9 × 10−9 bp−1 per generation (Green et al., 2014) and a generation time of 12 years (Auffenberg, 1981). Results of genomic analyses were integrated with known eustatic changes in sea level during the last five glacial cycles (Grant et al., 2014) and the approximate arrival of anatomically modern human (AMHs) on the Island of Flores (Aubert et al., 2014).
RESULTS
SNP and genotype calling
We produced a total of ~3.92 billion reads which uniquely mapped to the Komodo dragon genome. The mean coverage for six high‐coverage individuals was 28.4× while the mean coverage for the remaining 18 individuals was 12.3×. Individuals previously classified as females by using end‐point PCR amplification of sex chromosome genes showed, for Z chromosome scaffolds, approximately half of the mean coverage found in males for the same scaffolds and autosomal scaffolds (Figure S1). This was expected considering that female Komodo dragons have heteromorphic sex chromosomes (ZW) while males have two copies of the Z chromosome (Iannucci et al., 2019; Johnson Pokorná et al., 2016).The number of reads, percentage of aligned reads and final mean coverage are reported for each sample in Table S1. A total of 608,471 SNPs were retained across 24 individuals after filtering. Of these, 6,991 were located in coding regions. The number of heterozygous SNPs ranged from 95,013 in GM3 to 172,607 in WF2. An average of 1.4% of the heterozygous SNPs was found to be within coding regions. The mean percentage of phased heterozygous SNPs across all samples was 68% in the first phasing step and 100% in the final step.The population with the highest number of SNPs was West Flores (333,416) followed by Komodo (331,214), Rinca (312,421), North Flores (276,933), Nusa Kode (271,718) and Gili Motang (250,404). Details on SNP numbers are reported for each individual and population in Tables S2 and S3, respectively.
Population structure
Based on the cross‐validation error value (Figure S2), the clustering analysis performed with admixture suggested the presence of three distinct genetic clusters (Figure 2a). These correspond to the Island of Komodo, the northern coast of Flores and a third group including all remaining individuals. There were no admixed individuals except in West Flores, where we observed a North Flores genetic component ranging from 4% (WF1, WF2 and WF4) to 8% (WF3). The cross‐validation error was slightly higher for K = 4 and K = 5. In particular, a genomic differentiation of Gili Motang was supported for K = 4, while a further separation of West Flores and Nusa Kode was recovered for K = 5. However, at K = 5 the Rinca samples were not a homogeneous group but rather a mixture of the West Flores and Nusa Kode genetic components. The DAPC analysis (Figure 2b), the networks based on whole‐genome analysis (Figure 3a) and Z chromosome data (Figure S3), and the phylogenetic trees based on mtDNA (Figure 3b) and SNPs (Figure 3c) all support the overall scenario of three main genomic units described by admixture.
FIGURE 2
Results of the population structure analyses performed using admixture version 1.3 (a), and a multivariate discriminant analysis of principal components (b). Each column in (a) corresponds to an individual Komodo dragon, and each colour represents the proportion of an individual's genome belonging to each of the K clusters
FIGURE 3
Phylogenetic network based on autosomal SNPs showing relationships among 24 Komodo dragons from six locations in Komodo National Park and the Island of Flores (a), phylogenetic tree based on mitochondrial DNA genomes (b), and phylogenetic tree based on SNPs (c). The cloudogram represents the posterior distribution of lineage trees from the Bayesian phylogenetic analysis performed using snapp. Higher density areas indicate greater agreement in tree topologies
Results of the population structure analyses performed using admixture version 1.3 (a), and a multivariate discriminant analysis of principal components (b). Each column in (a) corresponds to an individual Komodo dragon, and each colour represents the proportion of an individual's genome belonging to each of the K clustersPhylogenetic network based on autosomal SNPs showing relationships among 24 Komodo dragons from six locations in Komodo National Park and the Island of Flores (a), phylogenetic tree based on mitochondrial DNA genomes (b), and phylogenetic tree based on SNPs (c). The cloudogram represents the posterior distribution of lineage trees from the Bayesian phylogenetic analysis performed using snapp. Higher density areas indicate greater agreement in tree topologiesThe neighbour‐joining network constructed using pairwise F
ST values between sampling locations also confirms the individual‐based structure whereby the Island of Komodo and North Flores represent distinct groups equally diverging from a third one composed of the remaining populations (Figure S4). Low levels of differentiation were recorded among Rinca, Nusa Kode and West Flores, with Gili Motang separated by a more pronounced branch length.
Genomic diversity and inbreeding
Single individual genome‐wide heterozygosity levels for high‐coverage Komodo dragon samples were rather homogeneous, with θ
W estimates ranging from 7.62 × 10–5 in southern Rinca to 1.31 × 10–4 in western Flores. These values were consistently lower than θ
W estimates recorded for other reptiles including vulnerable, critically endangered or even extinct species (Figure 4a). At the population level, the lowest θ
W estimate was recorded for the small island populations of Gili Motang (7.37 × 10–5). The islet of Nusa Kode and the North Flores region, both of limited extent and small population abundances, showed a slightly higher (8.00 × 10–5 and 8.15 × 10–5, respectively) genomic diversity values. Relatively higher θ
W estimates were instead recorded in West Flores (9.81 × 10–5) and for the largest island populations of Komodo (9.75 × 10–5) and Rinca (9.20 × 10–5).
FIGURE 4
Genomic diversity estimates based on Watterson's Theta θ
W (a), and estimates of θ
W per site in exons and neutral regions (b). Solid colour bars show values recorded for each of the six Komodo dragon individuals with a 25× genome coverage (one individual per location). Striped colour bars indicate θ
W estimates per location. Grey bars show values for other reptile species. VU: Vulnerable; CR: Critically Endangered; EX: Extinct
Genomic diversity estimates based on Watterson's Theta θ
W (a), and estimates of θ
W per site in exons and neutral regions (b). Solid colour bars show values recorded for each of the six Komodo dragon individuals with a 25× genome coverage (one individual per location). Striped colour bars indicate θ
W estimates per location. Grey bars show values for other reptile species. VU: Vulnerable; CR: Critically Endangered; EX: ExtinctAnalysis of genetic diversity recovered a marked reduction in diversity in neutral regions with respect to exons for populations with low genomic diversity (Figure 4b; Table S4). Neutral regions were 10.8%, 8.3% and 8.2% more variable than exons in the populations of Komodo, West Flores and Rinca, while a strong reduction to 3.6%, 2.1% and 1.7% was recorded in Nusa Kode, North Flores and Gili Motang, respectively.The genome‐wide diversity was substantially different if ROH regions were either included or excluded. Outside ROH, all individuals showed a θ
W midpoint estimate between 1.91 × 10–4 and 2.45 × 10–4, with negligible differences between mean values across populations (Figure 5a). Inclusion of ROH in the analysis caused an average decrease in θ
W estimates of ~25% (Figure 5b). The fraction of the genome being in ROH (F
ROH) was quite different across samples. Some individuals had relatively high F
ROH, while others showed negligible F
ROH (Figure 5c). The North Flores population had the highest mean F
ROH (12%) with sample NF1 showing 30% of its genome in ROH. A similar pattern was observed in the populations of Nusa Kode and Komodo with a mean F
ROH value of 11% and 10%, respectively, while the populations of West Flores, Rinca and Gili Motang showed mean F
ROH values lower than 10% (Table S5). plink and bcftools produced F
ROH estimates that were positively correlated to values estimated by rohan (Pearson correlation coefficient ranging from 0.31 to 0.71). However, no significant correlation was observed when the bcftools analysis was conducted using an alternate frequency across individuals or a fixed frequency of 0.4 in each population. This may suggest that the frequency threshold is a critical parameter to identify the proper within‐individual F
ROH and that a single frequency value may not be suitable for all populations (Table S6).
FIGURE 5
Population genomic diversity estimated using rohan including (a), or excluding (b) ROH regions; (c) proportion of the genome composed of ROH (F
ROH) ≥ 1 Mb
Population genomic diversity estimated using rohan including (a), or excluding (b) ROH regions; (c) proportion of the genome composed of ROH (F
ROH) ≥ 1 MbPrivate segregating sites were not uniformly distributed across populations. Komodo and North Flores showed the highest fraction of private polymorphisms (11.7% and 8.9% of the total SNP variation in the genome, respectively). Proportions of private SNPs were lower in West Flores (0.4%) and the populations of Rinca, Nusa Kode and Gili Motang (0.3%). Similar proportions were observed in exons and neutral regions (Figure S5).
Demographic analysis
Estimates of past demographic patterns based on whole genome analysis of extant populations recovered a very large effective population size between 1 million and 500 thousand years ago (ka). Moving forward in time, in North Flores we estimated a long and gradual population decline that ended ~3 ka with an N of a few hundred individuals (Figure 6). A similar pattern was recovered for the other populations, where an initial, steady decrease in N was followed by a period of constant population size during the Saalian cold period (spanning from ~400 to 150 ka), and then by a further population decline that ended between 5 and 3 ka.
FIGURE 6
Evolutionary dynamics of Komodo dragon populations inferred across the whole genomes using MSMC (Schiffels & Wang, 2020). Upper plot: effective population size through time in six populations (dashed lines) with bootstrap intervals (coloured areas) based on high‐coverage individuals. Eustatic changes in sea levels relative to the present are shown in dark grey. Light grey columns represent the time span of the last glacial maximum (LGM; 31–16 ka) and the Saalian cold period (SCP; 400–150 ka). Lower plot: relative cross coalescent rates (RCCR) through time between populations. Each line depicts the RCCR profile estimated using pairs of high‐coverage genomes. The pairwise comparisons involving the North Flores and Komodo Island populations vs. all the other sampling sites are shown by the black and red lines, respectively. Comparisons among West Flores, Nusa Kode, Gili Motang and Rinca populations are shown in grey. Values below the threshold of 0.5 (horizontal dashed line) indicate negligible gene flow between groups
Evolutionary dynamics of Komodo dragon populations inferred across the whole genomes using MSMC (Schiffels & Wang, 2020). Upper plot: effective population size through time in six populations (dashed lines) with bootstrap intervals (coloured areas) based on high‐coverage individuals. Eustatic changes in sea levels relative to the present are shown in dark grey. Light grey columns represent the time span of the last glacial maximum (LGM; 31–16 ka) and the Saalian cold period (SCP; 400–150 ka). Lower plot: relative cross coalescent rates (RCCR) through time between populations. Each line depicts the RCCR profile estimated using pairs of high‐coverage genomes. The pairwise comparisons involving the North Flores and Komodo Island populations vs. all the other sampling sites are shown by the black and red lines, respectively. Comparisons among West Flores, Nusa Kode, Gili Motang and Rinca populations are shown in grey. Values below the threshold of 0.5 (horizontal dashed line) indicate negligible gene flow between groupsThe relative cross‐coalescent rates analysis highlighted three time periods with reduced gene flow between populations (Figure 6). Approximately 20 ka, the gene flow between North Flores and all other populations started to decrease, reaching the 0.5 threshold ~15 ka. More recently, ~10 ka, Komodo Island showed a decreasing RCCR with respect to other populations (excluding North Flores) going below 0.5 ~5 ka. All other pairwise comparisons involving Rinca, Nusa Kode, West Flores and Gili Motang showed oscillating RCCR through time, with rates going below the 0.5 threshold ~1 ka. These values, however, were never equal to 0, an indication of historical and ongoing gene flow among island populations across the central part of KNP and West Flores. The most informative RCCR dynamics refers to the central part of the plot of Figure 6. Although the analysis was restricted to high‐quality phased regions, the presence of residual incorrectly phased sites might affect the reconstruction of the RCCR profile in this time window (Schiffels & Wang, 2020).
DISCUSSION
In this study, whole genome sequencing was used to assess population structure, genomic diversity and demographic history of Komodo dragons. We performed genome resequencing of 24 individuals from six islands, covering the entire distribution of the species. Overall, our results show how whole genome analysis of a few individuals per population can refine and improve information on intraspecific genetic diversity that can be otherwise obtained using a data set based on a much larger sample size genotyped at a few genetic markers (Ciofi et al., 1999; Ciofi & Bruford, 1999). This is in accordance with other studies where genomic analysis of a very limited number of animals was used to corroborate multilocus assessments of population structure in an insular ecosystem (e.g., Gaughran et al., 2018).We recorded a clear genomic distinction of Komodo dragons of the island of Komodo and the north coast of Flores from the rest of the archipelago. The degree of isolation of Komodo dragons could be associated with the Island's palaeogeographical history. According to eustatic sea level variations over the past 250 ka (Chappell & Shackleton, 1986; McCulloch et al., 1999; Voris, 2000) and bathymetric data of the study area, the Island of Komodo was probably connected to the eastern islands for relatively short time intervals, ~140 and 18 ka, during the last two Pleistocene glacial maxima. On the other hand, Flores and Rinca, currently separated by narrow and shallow waters, remained isolated during a high sea level event about 125 ka and were then reconnected until ~10 ka (Voris, 2000). The smaller islands of Gili Motang and Nusa Kode were also connected several times to Flores and Rinca. The increased distance between Komodo and Rinca following the sea level rise after the last glaciation might have represented a major barrier to gene flow. By contrast, gene flow was probably maintained between West Flores, Rinca, Nusa Kode and, to some extent, Gili Motang given the Komodo dragons' ability to swim over short distances (Auffenberg, 1981).Our genomic data also advocate the strong pattern of genetic divergence of the North Flores population, described by previous analysis of microsatellite allelic diversity (Ciofi et al., 1999), from the other islands and the population found on the western coast of Flores. Similar levels of within‐island genetic structure have been recorded for other amphibian and reptile species on Flores (e.g., Reilly, 2016). Reilly et al. (2019) pointed at Flores's palaeogeography as a possible explanation for the population structure of fanged frogs. In particular, the observed patterns of genetic differentiation could have been related to the existence of ancient volcanic islands that later coalesced into a single island. However, this explanation contrasts with fossil records that support the appearance of Komodo dragons on Flores ~900 ka (Hocknull et al., 2009), a relatively short time period compared to the geological time of Flores Island formation. The observed divergence between the West and North Flores Komodo dragon populations might be instead the result of an isolation by distance (IBD) process. We suggest that Komodo dragons previously had a much wider distribution on Flores (Ariefiandy, Purwandana, Azmi, et al., 2021; Auffenberg, 1981; Ciofi & de Boer, 2004). The limited dispersal of the species (Jessop et al., 2018), coupled with anthropogenic habitat fragmentation and other ecological barriers, may have resulted in a gradient of genetic diversity across the north coast of Flores, and eventually to peripatric divergence. Patterns of IBD related to sedentary habits and habitat fragmentation have also been described for other reptile species (e.g., Driscoll, 2004; Heath et al., 2012; Moore et al., 2008). Unfortunately, no samples are available for locations in between the northern and western coast of Flores in order to effectively test this hypothesis.Although Komodo dragons sampled on the islands of Rinca, Nusa Kode, Gili Motang and on West Flores probably experienced higher population connectivity levels over the last glacial periods, as confirmed by the RCCR analysis, a slight differentiation of the Gili Motang population was observed (Figure 6). This result could be due to the remote position of this islet and the strong currents originating from the exchanges of water masses between the Indian and Pacific Oceans (Gordon et al., 1994), which may significantly hinder gene flow between Gili Motang and the other nearby islands.Estimates of θ
W showed that genomic diversity was low for all populations. In particular, genomic diversity values were lower than estimates reported for vulnerable and critically endangered reptiles, such as the Indian gharial (Green et al., 2014), the Aldabra giant tortoise and the now extinct Pinta Island Galápagos giant tortoise (Quesada et al., 2019). Low genomic diversity has commonly been associated with an increased susceptibility to genetic diseases and a decreased adaptive potential, both of which can lead to increased extinction rates in vertebrates (Clark et al., 2011; Johnson et al., 2010; Reed & Frankham, 2003; de Villemereuil et al., 2019). Nevertheless, in other species with similarly low genetic diversity, no detrimental consequences were reported (e.g., Benazzo et al., 2017; Westbury et al., 2018; Xue et al., 2015). This may also be the case for Komodo dragons, which currently show no evident sign of severe deleterious mutations. A possible explanation for this state may be the absence of a significant difference in genomic diversity recorded for Komodo dragons in neutral regions with respect to exons (Figure 4b; Table S4). As suggested for other species (Morin et al., 2020; Westbury et al., 2019), such a condition may imply the retention of diversity in coding regions relative to the noncoding regions. If so, this process could help maintain adaptive potential in Komodo dragons, enabling the species to better adapt to environmental changes (but see Jones et al., 2020). Conversely, the overall low levels of heterozygosity and lack of variation in diversity levels between coding and noncoding regions may suggest that heterozygosity has reached a minimum, stable threshold, and any further decrease in genomic diversity could affect survival (Morin et al., 2020; Purwandana et al., 2015; Westbury et al., 2019).Overall, the proportion of the genome in long ROH (F
ROH) was moderate. The most inbred sample had ~30% of its genome composed of ROH. However, F
ROH was highly variable among individuals within populations and among populations. This pattern was particularly evident in Komodo, Rinca and West Flores, where two individuals per group showed an F
ROH lower than 5% while the others had F
ROH between 10% and 20%. This result may suggest that inbreeding levels in Komodo dragon populations have been increasing with respect to higher, historical values maintained by gene flow among islands. Such average values of inbreeding coefficients across populations, where most of the variation actually occurs within populations, are neither unexpected nor uncommon events. Segregation and recombination are random processes, mating can be assortative and it is more likely to occur in small populations where some individuals may have highly consanguineous parents (Kardos et al., 2015; Schraiber et al., 2012). Although our results suggest moderate intraspecific levels of inbreeding, a 10% F
ROH increase in a single individual may severely impact its fitness, especially in certain age classes, as observed for other organisms with small effective population size (Stoffel et al., 2020).
Demographic history
Whole‐genome analysis allowed reconstruction of the demographic history of Komodo dragons in the last one million years. All populations showed a similar demographic pattern consisting of a rapid N decrease in the ancient past, followed by a period of relatively stable effective population size during the Saalian glacial cycle, and a further decline following the colonization of Flores by AMHs.Studies of demographic trends over extended periods of time are scarce for reptiles, but they all report patterns of effective population size decrease in the Pleistocene. Green et al. (2014) investigated demographic trends in crocodilians and found that all three studied species experienced a sharp N decline between 100 and 10 ka. Similar results were reported for the Chinese alligator (Wan et al., 2013) and elapid snakes (Ludington & Sanders, 2020). Marine reptiles also showed a decline in N size over the last glacial period (Fitak & Johnsen, 2018; Kishida et al., 2020; Ludington & Sanders, 2020), suggesting that global climatic events including a generalized decrease in temperature (Van de Wal et al., 2011) may have had major demographic consequences for ectothermic species. A reduction in temperature could have played only a minor role in Komodo dragon N reduction. Varanids are more independent of ambient temperature than other lizards, especially large individuals (McNab & Auffenberg, 1976). On the other hand, temperature lowering could have indirectly influenced the Komodo dragon's abundance by affecting the habitat quality of this species (Jones et al., 2020). Palaeoecological reconstructions of the Quaternary habitat of the Banda Sea area suggest a dry environment and a reduction in precipitation levels, with open vegetation replacing rainforest in some areas (van der Kaars et al., 2000). This could have influenced the population dynamics of V. komodoensis, considering that monsoon forest offers better conditions to Komodo dragons for thermoregulating than does savannah habitat (Harlow et al., 2010).A period of relatively constant effective population size was recorded for all populations over the Saalian glacial cycle (400–150 ka), except for the North Flores population. At this time, a decrease in sea level reduced the distance among several islands of the Lesser Sunda and Banda Arcs, and increased habitat availability for Komodo dragons (Voris, 2000). This might have led to either a temporary population range expansion or an increase in gene flow between islands. Both events could explain a temporary population recovery. The MSMC reconstruction is expected to track changes in effective population size, but N trajectories might also represent changes in connectivity in metapopulation systems (Mather et al., 2020; Mazet et al., 2016), as supported by the RCCR analysis. The above‐mentioned ecological conditions might have had a marginal effect on North Flores, where the effective population size was constantly decreasing. Since new habitat areas were probably available for all populations, this may not be true for the connectivity between groups. Western populations probably experienced more gene flow due to their geographical proximity, whereas a reduced gene flow towards the North Flores group due to its geographical location might have promoted its isolation. For this reason, gene flow could have been the main factor explaining the population recovery observed during the Saalian period between 400 and 150 ka.Considering the differences in the N curve between the Saalian glaciation and the last glacial maximum, where no population recovery was recorded, it is possible that other factors in addition to changes in environmental temperature and habitat availability affected the demography of Komodo dragons in the Quaternary. The colonization of Flores by AMHs around 50 ka and their subsequent population expansion (Aubert et al., 2014; Tucci et al., 2018) coincides with the beginning of a steep descent of the Komodo dragon N curve. Anthropogenic interference has been suggested as one of the drivers of population decline in vertebrates in the early and mid‐Holocene (e.g., Cooke et al., 2017; Dong et al., 2021). However, further research is needed to assess whether humans have directly or indirectly negatively affected Komodo dragon population size and distribution.Time trajectories of N resulted in an interesting demographic scenario for Komodo dragons over the last 1 million years. However, while MSMC is proving a robust approach for estimating N and patterns of population size variation over distant time periods (Li & Durbin, 2011), it does also depend on assumptions that may bias the calculation of the actual values of N, particularly for very recent population histories (Liu & Fu, 2015; Sheehan et al., 2013). The assumption of MSMC that populations are isolated and the uncertainty over a precise mutation rate estimate are additional factors that need further evaluation for ad hoc analysis of Komodo dragon effective population size.
Conservation outcomes
Our study provides an example of how whole‐genome analysis of a few individuals per population can help assess fine‐scale population structure and intraspecific demographic dynamics. This is particularly important when applying population genomics data to management and conservation of endangered species, for which extended field effort is required in order to obtain an adequate sample size for analyses based on more traditional molecular markers.Our data advocate the genomic distinction of the populations of the Island of Komodo and the northern coast of Flores, both of which should be managed as separate conservation units (Casacci et al., 2014; Ciofi et al., 1999; Crandall et al., 2000; DeWeerdt, 2002; de Guia & Saitoh, 2007). However, while the Komodo Island population is fairly well protected within the boundaries of KNP, Komodo dragons from North Flores suffer from habitat encroachment and other human‐related threats (Ciofi & de Boer, 2004). Only a small proportion of the extant populations of Flores Island are found in protected areas. The unambiguous genetic distinction of Komodo dragons from the northern coast of Flores is, therefore, important information to support ongoing collaborative efforts with local communities and authorities for the protection of V. komodoensis outside KNP (Ariefiandy et al., 2015; Ariefiandy, Purwandana, Azmi, et al., 2021).Future directions in the definition and management of conservation units of Komodo dragons could rely on genome sequencing of a broader sample set in order to assess adaptive genetic variation among populations. This information will be valuable to prioritize which populations to focus management efforts on, and which populations to use as sources for translocation, demographic reinforcement and assisted migration efforts (Barbosa et al., 2018; Funk et al., 2012).Our data showed levels of genomic diversity in Komodo dragons to be lower than other threatened or even extinct reptile species. IUCN assigns Red List status based mainly on population size and trends and degree of population fragmentation, while genetic diversity is yet to be considered as an important parameter in evaluating the status of a species (IUCN, 2020). Genetic diversity is critical to the sustainability of small populations (Reed & Frankham, 2003), and many authors have argued that Red List status should be determined in part by the degree of genetic diversity of a species with respect to closely related lineages (e.g., Brüniche‐Olsen et al., 2018; Willoughby et al., 2015). Therefore, results of genomic analysis should be integrated with data on current population size and distribution (Ariefiandy, Purwandana, Azmi, et al., 2021; Purwandana et al., 2014), differences in population ecology and carrying capacity across islands (Ariefiandy et al., 2016; Jessop et al., 2006, 2007; Purwandana et al., 2015), as well as deterministic and stochastic threats to extant populations (Ariefiandy et al., 2015; Jones et al., 2020) to try and re‐evaluate the conservation status of Komodo dragons, particularly for groups living outside protected area networks (Jessop et al., 2020).
AUTHOR CONTRIBUTIONS
A.I., A.B. and C.C. designed and conceived the study. C.C. collected the samples and secured funding for consumables relating to laboratory work. A.I., C.N. and E.A.A. carried out laboratory work. G.B. contributed to data analysis. A.I. and A.B. wrote the paper. C.C., T.S.J., E.A.A., M.S.A.Z. and G.B. revised the paper. All authors commented on and approved the final manuscript.Supplementary MaterialClick here for additional data file.