Probing the genomic limits of de-extinction in the Christmas Island rat.

Jianqing Lin, David Duchêne, Christian Carøe, Oliver Smith, Marta Maria Ciucani, Jonas Niemann, Douglas Richmond, Alex D Greenwood, Ross MacPhee, Guojie Zhang, Shyam Gopalakrishnan, M Thomas P Gilbert.

Abstract

Three principal methods are under discussion as possible pathways to "true" de-extinction; i.e., back-breeding, cloning, and genetic engineering.1,2 Of these, while the latter approach is most likely to apply to the largest number of extinct species, its potential is constrained by the degree to which the extinct species' genome can be reconstructed. We explore this question using the extinct Christmas Island rat (Rattus macleari) as a model, an endemic rat species that was driven extinct between 1898 and 1908.3-5 We first re-sequenced its genome to an average of >60× coverage, then mapped it to the reference genomes of different Rattus species. We then explored how evolutionary divergence from the extant reference genome affected the fraction of the Christmas Island rat genome that could be recovered. Our analyses show that even when the extremely high-quality Norway brown rat (R. norvegicus) genome is used as a reference, nearly 5% of the genome sequence is unrecoverable, with 1,661 genes recovered at lower than 90% completeness, and 26 completely absent. Furthermore, we find that the distribution of affected regions is not random: for example, if 90% completeness is used as the cutoff, genes related to immune response and olfaction are disproportionately affected. Ultimately, our approach demonstrates the importance of applying similar analyses to candidates for de-extinction through genome editing in order to provide critical baseline information about how representative the edited form would be of the extinct species.
Copyright © 2022 The Author(s). Published by Elsevier Inc. All rights reserved.

Keywords:  Christmas Island rat; Rattus macleari; ancient DNA; de-extinction; evolutionary divergence; genomic sequencing; molecular dating

Year:  2022        PMID: 35271794      PMCID: PMC9044923          DOI: 10.1016/j.cub.2022.02.027

Source DB:  PubMed          Journal:  Curr Biol        ISSN: 0960-9822            Impact factor:   10.900


Results and discussion

Unlike alternative potential de-extinction approaches such as targeted back-breeding and interspecies somatic cell nuclear transfer (iSCNT), genome reconstruction through genetic engineering is not constrained by the requirement of working with still-living/viable material. Instead, such approaches propose to take advantage of recent advances in both ancient DNA (aDNA) and genome editing technology to potentially revive extinct species for which neither genomic tracts preserved in living species (as needed for back-breeding) nor viable frozen somatic cells (as needed for iSCNT) are available. Genetic engineering for de-extinction is conceptually based upon the idea of first describing the sequence of the extinct species, then editing the genomes of living cells from related species, for example using CRISPR-Cas9 technologies. However, currently this process is not straightforward. First, since DNA recovered from most historic/ancient samples is typically heavily fragmented, the extinct species' genome is unlikely to be reconstructed through de novo genome assembly. Rather, the extinct species' genome sequence is obtained through mapping its DNA against the (ideally) de novo sequenced genome of a closely related living species in order to identify sequence differences for use in the subsequent editing. There are at least two key hurdles inherent in de-extinction through this route, both ultimately derived from the evolutionary divergence that separates the extinct from the extant species. First, as current gene editing technologies are typically limited to introducing several tens to several hundreds of edits per cycle, multiple rounds of edits would be required to fully modify a genome that may differ at many thousands of positions (or even many more). Second, even if genome editing technology can be improved to efficiently edit every required site in a single generation, an additional challenge remains that may be far more problematic.
Because ancient DNA molecules are typically very short as a result of post mortem diagenesis (most typically well under 50 bp in length), they map poorly and/or ambiguously (if at all) to any regions of the genome that are highly divergent from the reference, thus potentially rendering those regions unrecoverable. Although some computational solutions to this challenge have been proposed, such as reducing the evolutionary divergence through mapping to in silico predicted ancestral nodes on phylogenies, at best the effect is reduced, not eliminated. As such, given that the ultimate goal of at least some de-extinction projects may be the regeneration of species whose genomes are as representative as possible of the lost form (as opposed to the definition adopted by the IUCN Species Survival Commission, that is, the creation of “a proxy of an extinct species” that is “a functional equivalent able to restore ecological functions or processes that might have been lost as a result of the extinction of the original species”), a key question is how exactly evolutionary divergence affects genome reconstruction success. In particular, given that evolutionary rates can vary greatly across the genome, how might this information inform us about the biological reality of any resurrected species created in this way?
We extracted and sequenced aDNA from two dry preserved skin samples of the Christmas Island rat (Rattus macleari), originally collected between 1900 and 1902 and held as part of the Oxford University Museum of Natural History collections. We assume that, should gene editing be used to attempt resurrection, the Norway brown rat (Rattus norvegicus) would represent an ideal system for editing for several reasons. First, the Christmas Island rat and Norway brown rat are relatively closely related (previously estimated split at ca. 2.6 million years ago (mya) based on molecular phylogenetic analysis, assuming a mutation rate of 1.655 × 10^-9 per generation per base pair and a generation time of 0.5 years). Second, the Norway brown rat is widely used as a laboratory model, both in general genomic studies and in those that require genome editing. Third, it has an excellent quality (i.e., highly complete and contiguous) reference genome that is more complete than that of another possibly relevant candidate, the black rat (Rattus rattus), with regard to contig N50 (29.20 Mb versus 1.64 Mb), number of scaffolds (176 versus 2,173), and number of contigs (757 versus 1,635,336).
Following sequencing using both Illumina and BGISeq technologies, the sequence data were trimmed and mapped to the Norway brown rat reference genome (mRatBN7.2, NCBI: GCA_015227675.2, male) using the Paleomix pipeline. The sequences displayed characteristic aDNA damage profiles such as misincorporations and fragmentation (Figure S1; Table S1). The amount of sequence data generated allowed us to map the Christmas Island rat's genome sequence to an average depth of 60.81× once the data from the two samples were merged. Nevertheless, despite this high average depth of coverage, the data only spanned 95.15% of the Norway brown rat reference genome (Table 1), raising the question as to why. While this can be partly explained by the observation that 0.81% of the bases in the reference genome are undetermined (Ns), we hypothesized that the remaining 4.04% of the reference genome was unmappable because (1) the short length of the ancient DNA templates (in the BGISeq data, for example, 48.27% of the reads are shorter than 50 bp; Figure S1A) reduces their mapping ability; (2) the AT richness of some genome regions introduces PCR amplification bias, and thus sequencing bias; and/or (3) the missing regions are unmappable due to the evolutionary divergence of the two species.
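The accounting behind the 4.04% figure can be reproduced in a few lines of Python. This is a minimal sketch, not part of the authors' pipeline: the helper names `breadth` and `unexplained_percent` and the toy depth list are illustrative assumptions, and only the 95.15% and 0.81% values are taken from the text.

```python
def breadth(depths, min_depth=1):
    """Fraction of reference positions covered by at least `min_depth` reads."""
    return sum(d >= min_depth for d in depths) / len(depths)


def unexplained_percent(covered_pct, undetermined_pct):
    """Percent of the reference that is neither covered nor made of N bases."""
    return round(100.0 - covered_pct - undetermined_pct, 2)


# Hypothetical per-position depths, only to show how breadth is defined.
toy_depths = [0, 3, 12, 0, 25]
toy_breadth_1x = breadth(toy_depths, min_depth=1)
toy_breadth_10x = breadth(toy_depths, min_depth=10)

# Values reported in the text for the merged Christmas Island rat data.
missing_pct = unexplained_percent(covered_pct=95.15, undetermined_pct=0.81)
print(toy_breadth_1x, toy_breadth_10x, missing_pct)
```

Breadth (the fraction of positions with at least one read) is the quantity that stays below 100% here even though the mean depth exceeds 60×.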
Table 1

Summary of mapping genomic sequencing data of five Rattus species to Norway brown rat reference genome, related to Table S1 and Figure S1

Sample | Species | Hit reads | Coverage (×) | MaxDepth | Fraction of reference covered | | 10×
Christmas Island rat/Maclear’s rat (Merged data) | Rattus macleari | 2892096135 | 60.80977094 | 287 | 0.9515 | 0.9142 | 0.8793
Christmas Island rat/Maclear’s rat (BGISeq data) | Rattus macleari | 1095698394 | 22.59070271 | 97 | 0.9365 | 0.8601 | 0.766
Christmas Island rat/Maclear’s rat (HiSeq data) | Rattus macleari | 1794361241 | 38.17020817 | 205 | 0.937 | 0.8771 | 0.816
Christmas Island rat/Maclear’s rat (MiSeq data) | Rattus macleari | 2036500 | 0.048860062 | 3 | 0.0444 | 0.0001 | 0.0001
Norway brown rat (Simulative ancient DNA) | Rattus norvegicus | 2921567669 | 60.77177418 | 114 | 0.9919 | 0.9919 | 0.9919
Norway brown rat (Simulative modern DNA) | Rattus norvegicus | 1616683907 | 60.79049948 | 83 | 0.9919 | 0.9919 | 0.9919
Norway brown rat (Real modern DNA, five samples) (a) | Rattus norvegicus | 1555330238 | 60.83049687 | 205 | 0.9914 | 0.987 | 0.9759
Norway brown rat (Real modern DNA, four samples) (b) | Rattus norvegicus | 1591467983 | 60.79035188 | 184 | 0.9912 | 0.9845 | 0.9593
Himalayan field rat (seven samples) | Rattus nitidus | 2132796965 | 120.4604476 | 447 | 0.9848 | 0.9748 | 0.9664
Himalayan field rat (three samples) (c) | Rattus nitidus | 1015013828 | 57.69764693 | 208 | 0.9805 | 0.9647 | 0.9502
Himalayan field rat (four samples) (d) | Rattus nitidus | 1117783137 | 62.7628007 | 246 | 0.9815 | 0.9657 | 0.9512
Asian house rat | Rattus tanezumi | 587134689 | 29.87708106 | 101 | 0.9276 | 0.8634 | 0.8002
Black rat | Rattus rattus | 1152792528 | 42.48457157 | 139 | 0.9481 | 0.9098 | 0.8815

(a) Norway brown rat (Real modern DNA, five samples): China1+Mali+AH2+BJ+Cambodia5

(b) Norway brown rat (Real modern DNA, four samples): Mali+AH1+SD+Cambodia5

(c) Himalayan field rat (three samples): NZ1+NZ2+WH3

(d) Himalayan field rat (four samples): SG1+SG2+WH1+WH2

To test these hypotheses, we undertook several different analyses. First, we used gargammel to generate in silico simulative modern (60.79×) and ancient (60.77×) data for the Norway brown rat and mapped them back to the Norway brown rat reference genome. The results showed that both the simulative modern and ancient datasets covered over 99.19% of the reference genome. Second, we mapped two sets of real Norway brown rat sequencing data of 60.83× and 60.79× coverage to its reference genome, and found that they covered 99.14% and 99.12% of the genome, respectively (Table 1). The slight difference between the real and simulative modern data in terms of recovering genomic regions was the result of the genetic variation between the sequenced individuals and the reference genome. Third, we explored the relationship between depth of sequencing coverage and AT content and found that the regions in the Norway rat genome with higher AT content did tend to have lower coverage (Figure S2), suggesting that PCR amplification bias may partly contribute to the problem, but not to the degree needed to explain the observations. Fourth, we explored the role of evolutionary divergence in reducing the mappability of sequence reads by obtaining the sequence datasets of three other Rattus species from public databases (the Himalayan field rat [Rattus nitidus], the Asian house rat [Rattus tanezumi], and the black rat) and mapping them to the Norway brown rat reference genome.
Using this nuclear genome dataset, we both inferred the phylogenetic placement and evolutionary divergence times among the five Rattus species, and calculated the percent genome coverage for each species recovered after mapping to the Norway rat. Our results not only provide a new, nuclear genome-based estimate of the divergence times of the Christmas Island rat from other species, but more importantly show that although all five Rattus species share a last common ancestor only ca. 2.3 mya (Figure 1), the percentage genome coverage rapidly decreases to as low as 92.76% for R. tanezumi (Table 1).
Figure 1

The phylogenetic placement and evolutionary timescale of R. norvegicus, Rattus nitidus, R. macleari, Rattus rattus, and Rattus tanezumi

Numbers following the species names indicate the coverage of genomic sequencing data for the corresponding species when mapped to the Norway brown rat reference genome. Related to Table S6.

In summary, the above analyses provide clear evidence that a major part of the 4.04% of the Norway rat genome that is not covered by Christmas Island rat sequences derives from evolutionary divergence, as opposed to the quality of the reference genome itself or damage to the ancient DNA templates. In light of this, an interesting question is: how representative would a hypothetically regenerated Christmas Island rat be of the authentic extinct form? To answer this question, we explored the genomic distribution of the 128,423,913 bp of the Norway brown rat genome that was not covered by Christmas Island rat sequence data, and found that ca. one-quarter of it fell within gene regions (Table S2), implying that the missing information would likely have functional consequences. We then calculated the coverage of each of the 34,200 genes annotated in the Norway rat reference genome, including 22,228 protein coding genes and 11,972 non-coding genes (Figure 2; Table S3). We found that 17,121 (50.19%) genes were covered at higher than 0.99 completeness. Almost all genes (83/86) encoding keratins or keratin-associated proteins, which are the key structural materials of hair and whiskers, have coverage higher than 90%. Additionally, all eight orthologs of the genes involved in the human round-ear phenotype (CEP57, ERF, MYH3, NALCN, PSMC3, TNNI2, TNNT3, and TPM2; https://hpo.jax.org/app/browse/term/HP:0100830) were found to be covered at higher than 97%.
These results suggest that most of the long thick black hair, long dark whisker, and round ear phenotypes of the Christmas Island rat could likely be recreated if genome editing of a Norway rat was attempted. However, another 1,661, 677, 235, and 100 genes exhibited coverages lower than 0.90, 0.75, 0.50, and 0.25, respectively. Notably, 26 genes, including MAGEB18, PUF-like, five endogenous retrovirus group K members (ERVKs), eight snRNAs, ten snoRNAs, and one tRNA, were entirely absent from the Christmas Island rat data (Figure 2; Table S3). We speculate that the absence of MAGEB18 and the ERVK genes may simply reflect the two species’ different evolutionary histories, while the 19 non-coding RNAs may be unrecoverable simply due to their very short lengths.
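The threshold counts above amount to tallying per-gene covered fractions. The following is a minimal sketch of that tally, assuming a tab-separated input in which the gene name sits in the fourth column and the covered fraction is the last column (the default layout of `bedtools coverage` output for a BED file with a name column). The records and gene names are hypothetical, and the helper names are not from the original study.

```python
from bisect import bisect_left


def parse_fraction(line):
    """Return (gene name, covered fraction) from one tab-separated record."""
    fields = line.rstrip("\n").split("\t")
    return fields[3], float(fields[-1])


def count_below(fractions, thresholds=(0.90, 0.75, 0.50, 0.25)):
    """Number of genes whose covered fraction is strictly below each threshold."""
    ordered = sorted(fractions)
    return {t: bisect_left(ordered, t) for t in thresholds}


# Hypothetical records: chrom, start, end, name, reads, bases covered, length, fraction.
toy_records = [
    "chr1\t0\t1000\tgeneA\t40\t1000\t1000\t1.0000000",
    "chr1\t2000\t3000\tgeneB\t35\t995\t1000\t0.9950000",
    "chr1\t4000\t5000\tgeneC\t20\t890\t1000\t0.8900000",
    "chr2\t0\t500\tgeneD\t9\t300\t500\t0.6000000",
    "chr2\t600\t700\tgeneE\t1\t20\t100\t0.2000000",
    "chr2\t800\t880\tgeneF\t0\t0\t80\t0.0000000",
]

fractions = dict(parse_fraction(record) for record in toy_records)
below = count_below(fractions.values())
absent = [gene for gene, fraction in fractions.items() if fraction == 0.0]
print(below, absent)
```

Because the counts are cumulative, a gene below 0.25 is also counted below 0.50, 0.75, and 0.90, which matches how the figures in the text nest (1,661 ≥ 677 ≥ 235 ≥ 100).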
Figure 2

Numbers of genes found at different coverage levels after mapping Christmas Island rat genomic sequencing data to the Norway brown rat reference genome

Coverage levels (on a scale of 0–1) are shown next to the figure. Related to Tables S2 and S3 and Figure S2.

We furthermore found that these incompletely covered genes are not a random sample of the genome. Rather, genes that exhibit a coverage lower than 0.9 are enriched for GO/KEGG terms related to immune response (“autoimmune thyroid disease,” “antigen processing and presentation,” “herpes simplex virus 1 infection,” “MHC class I protein binding,” “immune response,” etc.) and olfaction (“olfactory receptor activity” and “odorant binding”) (Figures 3A and 3B; Tables S4 and S5). Additionally, even when ontology categories that appear at least superficially similar were compared, striking differences were observed. While the coverages of major histocompatibility complex (MHC) class I genes were significantly lower than those of other genes (q = 3.19E-12), MHC class II genes yielded significantly higher coverage (higher than 0.9) (Figure S3A). The vomeronasal 2 receptor (Vom2r) genes, one of the olfactory receptor families associated with the detection of peptide pheromones, yielded significantly lower coverage than vomeronasal 1 receptor (Vom1r) genes, and indeed genes in other categories (Figures 3C and 3D).
Figure 3

Annotation of genes with unrecoverable regions in the Christmas Island rat genome

(A) GO enrichment (q < 0.05) of Christmas Island rat genes obtained at coverage lower than 0.9. Numbers following bars: the number of genes; x axes: −log10(q value); y axes: the GO terms enriched in genes with coverage lower than 0.9.

(B) KEGG enrichment (q < 0.05) in genes with coverage lower than 0.9; x axes: rich factor, number of genes with coverage lower than 0.9/total genes in KEGG terms; y axes: the KEGG pathways enriched in genes with coverage lower than 0.9.

(C and D) The coverage of genomic sequencing data on major histocompatibility complex (MHC) and vomeronasal receptor (VoR) genes in the Norway brown rat reference genome.

Related to Tables S4 and S5.

Conclusion

Our results clearly demonstrate that, should genome editing (ignoring current technical limitations) be applied to the Norway brown rat in order to recreate the Christmas Island rat through editing every identifiable difference, a remarkable number of genes would either only partially resemble the extinct form, or in the worst case, remain 100% Norway rat-like. Naturally, given that evolutionary divergence is ultimately driving this phenomenon, the use of a more closely related species (e.g., the black rat) would lead to some improvements in the amount of the genome reconstructed, although any gains are likely to be small. For example, mapping of our Christmas Island rat data to the black rat allows recovery of a maximum of 96.56% of its genome, compared to 95.15% of the Norway rat genome. Furthermore, it is clear that the non-random distribution of these genes would have consequences for the resulting biology of the reconstructed animals, potentially precluding reintroduction of the species to its original environment. For example, given the role of olfaction in many critical behaviors, such as foraging and food selection, detecting predators, and mate choice, a reconstructed Christmas Island rat would lack attributes likely critical to surviving in its natural or natural-like environment. In contrast, however, in light of the hypothesis that the Christmas Island rat was driven extinct by an infectious disease introduced from black rats, one might hypothesize that maintaining the immune genes of the Norway rat could even have some potential benefit. Current de-extinction work is focused on species such as Mammuthus primigenius, the woolly mammoth (μ = 3.83 × 10^-8, estimated generation time = 31 years), and Ectopistes migratorius, the passenger pigeon (μ = 5.68 × 10^-9, estimated generation time = 4 years) (Table S6), and we suspect it is unfortunately unlikely that serious efforts will ever be attempted to bring back a rat species.
Nevertheless, by using, as an example here, closely related extinct and extant rat species, we highlight that the scale of the challenge will only multiply as the evolutionary divergence between extinct and living species increases. In this context we note that the genomic divergence between the woolly mammoth and Asian elephant (Elephas maximus) is similar to that between the Christmas Island rat and the Norway brown rat, while the genomic divergence between the passenger pigeon and band-tailed pigeons (Patagioenas fasciata) is much larger (2.24 times) (Table S6). As divergence relates not only to absolute time, but generation time, ideally analyses such as ours should be done on a case-by-case basis. Therefore, we hope that the approach demonstrated here may offer a framework that others can consider when exploring the viability of other proposed de-extinction projects.
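The dependence of divergence on generation time as well as absolute time can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a simple neutral, strict-clock approximation in which the expected pairwise divergence per site is 2 × (μ / generation time) × split time; this is an illustrative simplification and not necessarily the calculation behind Table S6. The function names are hypothetical; the mutation rates, generation times, and the 2.6 mya split estimate are the values quoted in the text.

```python
def per_year_rate(mu_per_generation, generation_time_years):
    """Convert a per-generation mutation rate into a per-year rate."""
    return mu_per_generation / generation_time_years


def expected_pairwise_divergence(mu_per_generation, generation_time_years, split_years):
    """Expected substitutions per site between two lineages (both branches counted)."""
    return 2.0 * per_year_rate(mu_per_generation, generation_time_years) * split_years


# Christmas Island rat vs Norway brown rat, using the previously estimated split.
rat_divergence = expected_pairwise_divergence(1.655e-9, 0.5, 2.6e6)

# Per-year rates implied by the values quoted for each group.
rates = {
    "rat": per_year_rate(1.655e-9, 0.5),
    "woolly mammoth": per_year_rate(3.83e-8, 31),
    "passenger pigeon": per_year_rate(5.68e-9, 4),
}
print(rat_divergence, rates)
```

Under this approximation a short generation time inflates the per-year rate, so a recent split in rats can accumulate divergence comparable to a much older split in a long-lived species.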

STAR★Methods

Key resources table

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Jianqing Lin (linjianqing@stu.edu.cn).

Materials availability

This study did not generate new unique reagents.

Experimental model and subject details

Historic specimens

The specimens, ID numbers 18844 and 18845 from the collections of the Oxford University Museum of Natural History, are dried skin samples collected on Christmas Island between 1900-1902, and originally sampled for genetic analyses for a prior study that explored whether the species’ extinction could be ascribed to introduced pathogens.

Method details

Ancient DNA methods

DNA was extracted from ca. 1 × 1 cm of the skin samples using the digestion buffer of Gilbert et al. combined with a silica-based purification following Dabney et al.

Library construction and sequencing

Specimen 18844 was used for Illumina (HiSeq/MiSeq) and specimen 18845 for BGISEQ-500 library construction. Extracted DNA was converted into both Illumina and BGISEQ-500 compatible libraries using blunt-end protocols, and both were sequenced with 100 bp single-end (SE) chemistry. Illumina library construction used the NEBNext 6070L kit, following Wales et al. (2015), while BGISEQ library construction followed Mak et al. (2017). In total 2,694,229,632 reads of Illumina and 2,754,802,455 reads of BGISeq data were generated from these libraries.

Mapping and calculation of coverage/depth

Before mapping, the last 10 bases of each read from the BGISEQ-500 sequencing platform were removed because they represent the index. Subsequently the sequence data were trimmed and mapped against the reference genomes of the Norway brown rat (mRatBN7.2, GCA_015227675.2) and black rat, using Paleomix v1.3.2. Specifically, the adapters in the BGISEQ-500 data (--adapter1: AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA; --adapter2: GAACGACATGGCTACGATCCGACTT) and Illumina data (--adapter1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG; --adapter2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT) were trimmed by AdapterRemoval v2.3.1 with default settings. bwa v0.7.17 (the backtrack algorithm) was used to map the reads to the genome with options “MinQuality: 0; FilterUnmappedReads: yes; UseSeed: no.” In the mapping step, we used the same settings for modern and ancient samples to avoid introducing any biases. For the ancient samples, mapDamage v2.2.1 was used to estimate the ancient DNA damage parameters, both to validate that the data are truly ancient and to provide input values for the gargammel simulations. We recovered the mtDNA consensus sequences (see below) from each of the Christmas Island specimens, compared them using MEGA X, and found that they exhibited very little genetic distance (0.002890); we therefore elected to merge the two sequence datasets to obtain the final high coverage dataset. The bam files generated from each species were merged into one bam file using samtools v1.9. We used the “paleomix coverage” and “paleomix depths” commands to calculate the coverage and depth histogram for each bam file. The “bedtools coverage” command in bedtools v2.29.0 was used to calculate the coverage rate of each gene in the Norway rat genome. The “bedtools nuc” command in bedtools v2.29.0 was used to calculate the base composition of each 100-bp window across chromosome NC_051336.1 of the Norway rat genome.
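The per-window base composition step can be illustrated without bedtools. The following is a minimal pure-Python stand-in, not a reimplementation of `bedtools nuc`: the function name `at_content_windows`, the choice to exclude N bases from the denominator, and the toy sequence are all assumptions made for illustration.

```python
def at_content_windows(sequence, window=100):
    """Return (window start, AT fraction) for consecutive non-overlapping windows.

    The AT fraction is computed over called bases (A, C, G, T) only; a window
    made entirely of other symbols such as N yields None.
    """
    sequence = sequence.upper()
    results = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        called = sum(chunk.count(base) for base in "ACGT")
        at_bases = chunk.count("A") + chunk.count("T")
        results.append((start, at_bases / called if called else None))
    return results


# Hypothetical 12-bp sequence split into 4-bp windows for readability.
toy_windows = at_content_windows("AATTGGCCNNNN", window=4)
print(toy_windows)
```

Pairing each window's AT fraction with its mean read depth is one way to look for the kind of AT-related coverage drop described for Figure S2.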

Consensus genome

We identified the SNPs in each of the rat and mouse species’ genomes, and replaced them in the Norway brown rat reference genome to create the consensus genomes, using the bam file from one sample of each species together with samtools v1.9 and bcftools v1.9. The “seqkit fq2fa” command in seqkit v0.16.1 was used to convert fastq files into fasta format.

Quantification and statistical analysis

Sequence simulation

We used gargammel, a DNA sequence simulator, to generate simulative modern and simulative ancient Norway brown rat data. The simulated reads were set to be single end and 100 bp in length (consistent with the BGISEQ-500 data). We set the overall raw data coverage of the modern data to 62.63× and that of the ancient data to 63.59× to ensure that the coverage of both the modern and ancient data would be about 60.81×, consistent with that of the Christmas Island rat. For the modern data, the fixed fragment length (-l) was 100 bp. For the ancient data, the size frequency file (-f) and the misincorporation file (-mapdamage) input values were taken from the estimates made by mapDamage on the BGISEQ-500 Christmas Island rat data.

GO and KEGG annotation and enrichment

GO term and KEGG pathway enrichment analyses were carried out using the KOBAS 3.0 web server. The statistical test was Fisher’s exact test, and the false discovery rate (FDR) was controlled using the Benjamini-Hochberg method. GO terms and KEGG pathways with an FDR (q value) < 0.05 were regarded as significantly enriched.
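The two statistical steps named above can be sketched with the standard library alone. This is an illustrative sketch rather than KOBAS's implementation: it assumes a one-sided (enrichment) Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution, followed by the Benjamini-Hochberg adjustment. The function names and the toy p values are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) under the hypergeometric model.

    N: all annotated genes; K: genes in the term;
    n: genes in the low-coverage set; k: low-coverage genes in the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


toy_p = enrichment_p_value(k=2, n=4, K=3, N=10)
toy_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in toy_q]
print(toy_p, toy_q, significant)
```

With thousands of GO terms and KEGG pathways tested at once, the adjustment step is what keeps the q < 0.05 cutoff meaningful.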

Phylogenetics and molecular dating of Rattus

We inferred the phylogenetic placement and evolutionary timescale of five rat species using a set of 3,095 loci of 1,000 nucleotides each (3,095,000 nucleotides). Loci were first identified by mapping the genomic data for five species of rats (R. macleari, R. rattus, R. tanezumi, R. norvegicus, and R. nitidus) and the mouse (M. musculus) to the reference genome of R. norvegicus. While it is ideal to use an outgroup species as reference, we found that using the mouse as a reference led to a near-complete lack of phylogenetic signal among rat species. This can be explained by the substantial distance of the rat species to the mouse, relative to the distances among rat species. This difference leads to a bias toward excessively slowly evolving sites that lack information about very recent divergences. Therefore, we focused on the data using the Norway rat genome as a reference. Future research using a close outgroup relative as reference would be a valuable contribution. We randomly extracted contiguous windows of 100 Kb each from 1 Mb windows in the genomes. We then performed automated multiple sequence alignment using MAFFT v7.4, and randomly extracted 1 Kb windows from the alignments, in order to minimize the impact of recombination breakpoints on the data. Maximum likelihood phylogenetic searches were performed for each locus under a GTR+R4 substitution model as implemented in IQ-TREE v1.6. Data selection and individual locus tree inference were followed by two methods of species tree inference. First, we concatenated our loci and performed maximum likelihood phylogenetic inference on the concatenated dataset under the GTR+R10 substitution model using IQ-TREE. Approximate likelihood ratio tests (aLRT) per branch were used as branch supports with 1,000 bootstrap replicates. Second, we inferred the species tree under the multispecies coalescent, using the individual locus trees as input to the summary coalescent method implemented in ASTRAL-III.
Local posterior probabilities were taken as branch supports following the multispecies coalescent analysis. Both methods of species tree inference led to identical tree topologies and maximal statistical supports for all branches. Most branches were supported by nearly the whole set of loci and nucleotides. The single exception was the placement of the Christmas Island rat, which was supported by 60.3% of gene trees and 53.8% of nucleotide sites. This low concordance across the data is likely driven by short times between divergence events involving these taxa, and therefore large amounts of incomplete lineage sorting in the data. Molecular dating was performed by assuming our inferred species tree topology and a further subset of 1,407 loci (1,407,000 nucleotides). Loci were selected for molecular dating so as to minimize rate variation among lineages and gene tree discordance. The loci included were those that led to locus trees with coefficients of variation in root-to-tip lengths (non-clocklikeness) < 0.1, and Robinson-Foulds distances to our inferred species tree ≤ 2. Selected loci were then concatenated for molecular dating. A single time-calibration was used at the root of the tree, taking the mouse-rat divergence to have occurred between 11 and 12.3 mya following evidence from palaeontology (genus Progonomys) and phylogenetics. The prior distribution for this root calibration was uniform with soft maximum and minimum bounds, with a 2.5% prior probability of the age occurring beyond each bound. Bayesian molecular dating was performed using a GTR+Γ substitution model and an uncorrelated-gamma relaxed clock model as implemented in MCMCtree, using PAML v4.8. Approximate Bayesian computation was implemented to improve the efficiency of the analysis. The posterior distribution was approximated using Markov chain Monte Carlo (MCMC), starting with a burn-in phase of 10^5 MCMC steps, and then drawing samples every 10^3 MCMC steps over a total of 10^7 steps.
Convergence to the stationary distribution was verified by comparing parameter estimates from four independent runs, and confirming that effective sample sizes were above 200 for sampled parameters.
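The locus filter and the MCMC sampling schedule described above reduce to simple arithmetic. The sketch below is illustrative only: it assumes the coefficient of variation is the population standard deviation divided by the mean of the root-to-tip lengths, that Robinson-Foulds distances have already been computed elsewhere, and that the 10^7 steps are counted after the burn-in. All function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def root_to_tip_cv(root_to_tip_lengths):
    """Coefficient of variation of root-to-tip path lengths (non-clocklikeness)."""
    return pstdev(root_to_tip_lengths) / mean(root_to_tip_lengths)


def keep_locus(root_to_tip_lengths, rf_distance, max_cv=0.1, max_rf=2):
    """Apply the two retention criteria: CV < 0.1 and Robinson-Foulds distance <= 2."""
    return root_to_tip_cv(root_to_tip_lengths) < max_cv and rf_distance <= max_rf


# Hypothetical loci: (root-to-tip lengths per taxon, RF distance to the species tree).
clocklike = keep_locus([1.00, 1.02, 0.98, 1.01, 0.99, 1.00], rf_distance=2)
uneven_rates = keep_locus([1.0, 1.5, 0.5], rf_distance=0)
discordant = keep_locus([1.0, 1.0, 1.0], rf_distance=4)

# Sampling schedule quoted in the text.
burn_in, sample_every, total_steps = 10**5, 10**3, 10**7
n_samples = total_steps // sample_every
print(clocklike, uneven_rates, discordant, n_samples)
```

Under these assumptions each run retains 10,000 posterior samples, which is what the effective sample size check is then applied to.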

Additional resources

The reference genomes of the Norway brown rat (R. norvegicus), black rat (Rattus rattus) and house mouse (Mus musculus) were downloaded from the NCBI website (assembly accessions: GCA_015227675.2, GCF_011064425.1 and GCF_000001635.27, respectively). Additional Norway brown rat resequencing datasets were downloaded from CNCB-NGDC (http://gsa.big.ac.cn/) under accession IDs CRX019583 (Mali), CRX019522 (Cambodia5), CRX019633 (China1), CRX019515 (AH1), CRX019516 (AH2), CRX019517 (BJ1) and CRX019639 (SD1). The black rat resequencing datasets were downloaded from CNCB-NGDC under accession ID CRX019632, from EBI under accession ID SAMEA2051945 and from NCBI under accession ID SRX9009079. The Himalayan field rat (R. nitidus) sequence data were downloaded from NCBI under accession IDs SAMN05425704 (NZ2), SAMN05425705 (SG1), SAMN05425706 (SG2), SAMN05425709 (NZ1), SAMN05425641 (WH1), SAMN05425642 (WH2), and SAMN05425643 (WH3). The Asian house rat (R. tanezumi) sequence data were downloaded from NCBI under accession ID SAMN05425710. The house mouse sequence data were downloaded from NCBI under accession ID SRX10650663.
REAGENT or RESOURCE | SOURCE | IDENTIFIER

Biological samples

Dry preserved skin samples of the Christmas Island rat (Rattus macleari) | Oxford University Museum of Natural History | 18844
Dry preserved skin samples of the Christmas Island rat (Rattus macleari) | Oxford University Museum of Natural History | 18845

Deposited data

Christmas Island rat (Rattus macleari) resequencing datasets | This study; ENA (https://www.ebi.ac.uk/) | SAMEA12813846 (18844), SAMEA12813847 (18845)
Reference genome of Norway brown rat | NCBI website (https://www.ncbi.nlm.nih.gov/assembly/) | GCF_015227675.2
Reference genome of black rat | NCBI website (https://www.ncbi.nlm.nih.gov/assembly/) | GCF_011064425.1
Reference genome of house mouse | NCBI website (https://www.ncbi.nlm.nih.gov/assembly/) | GCF_000001635.27
Norway brown rat resequencing datasets | CNCB-NGDC (http://gsa.big.ac.cn/) | CRX019583 (Mali), CRX019522 (Cambodia5), CRX019633 (China1), CRX019515 (AH1), CRX019516 (AH2), CRX019517 (BJ1) and CRX019639 (SD1)
Black rat resequencing dataset #1 | CNCB-NGDC (http://gsa.big.ac.cn/) | CRX019632
Black rat resequencing dataset #2 | ENA (https://www.ebi.ac.uk/) | SAMEA2051945
Black rat resequencing dataset #3 | NCBI (https://www.ncbi.nlm.nih.gov/sra/) | SRX9009079
Himalayan field rat resequencing datasets | NCBI (https://www.ncbi.nlm.nih.gov/sra/) | SAMN05425704 (NZ2), SAMN05425705 (SG1), SAMN05425706 (SG2), SAMN05425709 (NZ1), SAMN05425641 (WH1), SAMN05425642 (WH2) and SAMN05425643 (WH3)
Asian house rat resequencing dataset | NCBI (https://www.ncbi.nlm.nih.gov/sra/) | SAMN05425710
House mouse resequencing dataset | NCBI (https://www.ncbi.nlm.nih.gov/sra/) | SRX10650663

Software and Algorithms

AdapterRemoval v2.3.1 | Schubert et al., 2016 [29] | https://adapterremoval.readthedocs.io/en/stable/
bwa | Li and Durbin [30] | http://bio-bwa.sourceforge.net/
Paleomix v1.3.2 | Schubert et al., 2014 [20] | https://paleomix.readthedocs.io/en/stable/
mapDamage v2.2.1 | Jónsson et al. [31] | https://github.com/ginolhac/mapDamage/
MEGA X | Kumar et al. [32] | https://www.megasoftware.net/
samtools v1.9 | Li et al. [33] | https://github.com/samtools/samtools
bedtools v2.29.0 | Quinlan [34] |
bcftools v1.9 | Li et al. [33] | https://github.com/samtools/bcftools
seqkit v0.16.1 | Shen et al. [35] | https://github.com/shenwei356/seqkit/tree/v0.16.1
gargammel | Renaud et al. [21] | https://grenaud.github.io/gargammel/
KOBAS 3.0 | Bu et al. [36] | http://bioinfo.org/kobas
MAFFT v7.4 | Katoh et al. [37] | https://mafft.cbrc.jp/alignment/software/
IQ-TREE v1.6 | Minh et al. [38] | http://www.iqtree.org/
PAML v4.8 | Yang [39] | https://github.com/abacus-gene/paml
References (47 in total; the first 10 are listed)

1.  New insights on single-stranded versus double-stranded DNA library preparation for ancient DNA.

Authors:  Nathan Wales; Christian Carøe; Marcela Sandoval-Velasco; Cristina Gamba; Ross Barnett; José Alfredo Samaniego; Jazmín Ramos Madrigal; Ludovic Orlando; M Thomas P Gilbert
Journal:  Biotechniques       Date:  2015-12-01

2.  Paleontological evidence to date the tree of life.

Authors:  Michael J Benton; Philip C J Donoghue
Journal:  Mol Biol Evol       Date:  2006-10-17

3.  PAML 4: phylogenetic analysis by maximum likelihood.

Authors:  Ziheng Yang
Journal:  Mol Biol Evol       Date:  2007-05-04

4.  Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX.

Authors:  Mikkel Schubert; Luca Ermini; Clio Der Sarkissian; Hákon Jónsson; Aurélien Ginolhac; Robert Schaefer; Michael D Martin; Ruth Fernández; Martin Kircher; Molly McCue; Eske Willerslev; Ludovic Orlando
Journal:  Nat Protoc       Date:  2014-04-10

5.  Ancient DNA: extraction, characterization, molecular cloning, and enzymatic amplification.

Authors:  S Pääbo
Journal:  Proc Natl Acad Sci U S A       Date:  1989-03

6.  Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth.

Authors:  Eleftheria Palkopoulou; Swapan Mallick; Pontus Skoglund; Jacob Enk; Nadin Rohland; Heng Li; Ayça Omrak; Sergey Vartanyan; Hendrik Poinar; Anders Götherström; David Reich; Love Dalén
Journal:  Curr Biol       Date:  2015-04-23

7.  BEDTools: The Swiss-Army Tool for Genome Feature Analysis.

Authors:  Aaron R Quinlan
Journal:  Curr Protoc Bioinformatics       Date:  2014-09-08

8.  KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis.

Authors:  Dechao Bu; Haitao Luo; Peipei Huo; Zhihao Wang; Shan Zhang; Zihao He; Yang Wu; Lianhe Zhao; Jingjia Liu; Jincheng Guo; Shuangsang Fang; Wanchen Cao; Lan Yi; Yi Zhao; Lei Kong
Journal:  Nucleic Acids Res       Date:  2021-06-04

9.  Recent Evolution in Rattus norvegicus Is Shaped by Declining Effective Population Size.

Authors:  Eva E Deinum; Daniel L Halligan; Rob W Ness; Yao-Hua Zhang; Lin Cong; Jian-Xu Zhang; Peter D Keightley
Journal:  Mol Biol Evol       Date:  2015-06-01

10.  ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees.

Authors:  Chao Zhang; Maryam Rabiee; Erfan Sayyari; Siavash Mirarab
Journal:  BMC Bioinformatics       Date:  2018-05-08
