Chromosomal level assembly and population sequencing of the Chinese tree shrew genome.

Yu Fan1,2, Mao-Sen Ye1,3, Jin-Yan Zhang1,3, Ling Xu1,2, Dan-Dan Yu1,2, Tian-Le Gu1,3, Yu-Lin Yao1,3, Jia-Qi Chen4, Long-Bao Lv4, Ping Zheng2,3,4,5,6,7, Dong-Dong Wu2,6, Guo-Jie Zhang2,6, Yong-Gang Yao8,2,3,4,5.   

Abstract

Chinese tree shrews (Tupaia belangeri chinensis) have become an increasingly important experimental animal in biomedical research due to their close relationship to primates. An accurately sequenced and assembled genome is essential for understanding the genetic features and biology of this animal. In this study, we used long-read single-molecule sequencing and high-throughput chromosome conformation capture (Hi-C) technology to obtain a high-quality chromosome-scale scaffolding of the Chinese tree shrew genome. The new reference genome (KIZ version 2: TS_2.0) resolved problems in presently available tree shrew genomes and enabled accurate identification of large and complex repeat regions, gene structures, and species-specific genomic structural variants. In addition, by sequencing the genomes of six Chinese tree shrew individuals, we produced a comprehensive map of 12.8 M single nucleotide polymorphisms and confirmed that the major histocompatibility complex (MHC) loci and immunoglobulin gene family exhibited high nucleotide diversity in the tree shrew genome. We updated the tree shrew genome database (TreeshrewDB v2.0: http://www.treeshrewdb.org) to include the genome annotation information and genetic variations. The new high-quality reference genome of the Chinese tree shrew and the updated TreeshrewDB will facilitate the use of this animal in many different fields of research.

Keywords:  Chromosomal level assembly genome; Database; Population sequencing; Tupaia belangeri

Year:  2019        PMID: 31418539      PMCID: PMC6822927          DOI: 10.24272/j.issn.2095-8137.2019.063

Source DB:  PubMed          Journal:  Zool Res        ISSN: 2095-8137


INTRODUCTION

Tree shrews (Tupaia belangeri) are widely distributed throughout South Asia, Southeast Asia (Fuchs & Corbach-Söhle, 2010), and South and Southwest China (Peng et al., 1991). They possess many characteristics that make them useful models in biomedical research, such as small adult body size (100–150 g), easy and low-cost maintenance, short reproductive cycle (~6 weeks), moderate life span (6–8 years), high brain-to-body mass ratio, and very close relationship to primates (Fan et al., 2013; Xiao et al., 2017; Xu et al., 2012; Yao, 2017; Zheng et al., 2014). Hitherto, tree shrews have been used in a wide variety of studies, including research on viral infection (Amako et al., 2010; Guo et al., 2018; Kock et al., 2001; Li et al., 2014; Li et al., 2016; Yang et al., 2013), visual cortex function (Bosking et al., 2002; Lee et al., 2016; MacEvoy et al., 2009; Mooser et al., 2004; Veit et al., 2014), brain development and aging (Fan et al., 2018; Wei et al., 2017), and neuropsychiatric disorders induced by social stress (Fuchs, 2005; Meyer et al., 2001). Previously, we successfully sequenced the genome of the Chinese tree shrew (Tupaia belangeri chinensis) using Illumina short-read sequencing (KIZ version 1: TS_1.0) and showed its close relationship to non-human primates, thereby settling a long-running debate regarding the phylogenetic position of tree shrews within eutherian mammals (Fan et al., 2013). Furthermore, to advance the use of the tree shrew genome, we developed a user-friendly tree shrew database (TreeshrewDB: www.treeshrewdb.org) (Fan et al., 2014). The successful genome sequencing (Fan et al., 2013) and genetic manipulation of tree shrews (Li et al., 2017) have opened up new avenues for the wide usage of this species in biomedical research (Yao, 2017). Accurate genome sequencing and assembly are essential for understanding phylogenetic relationships and genome and phenome evolution (Kronenberg et al., 2018).
Despite the fact that short-read sequencing technologies remain the most popular methods used to generate high-throughput data at relatively low cost (Schatz et al., 2010), whole-genome assembly of mammalian genomes based on these older sequencing technologies contains many problems, including assembly gaps and incomplete gene models (Sohn & Nam, 2018). For instance, approximately 50% of the human genome comprises non-random repeat elements (Cordaux & Batzer, 2009) and a complex sequence structure, which is a major challenge in reference genome assembly (Phillippy et al., 2008). Although our earlier Chinese tree shrew genome (KIZ version 1: TS_1.0) produced in 2013 (Fan et al., 2013) had high sequencing coverage (79x), the assembled genome still contained 223 607 gaps (including 65 222 gaps in the genic region), and thus did not fully meet research needs. Single-molecule sequencing technology can generate reads tens of kilobases in size and can span most repeat sequences, which allows for complete reference genome assembly (Bickhart et al., 2017; Chaisson et al., 2015). High-throughput chromosome conformation capture (Hi-C) technology can be used to study the three-dimensional architecture of genomes and can order, orient, and anchor contigs into chromosome-scale scaffolds (Burton et al., 2013). Here, we applied both long-read single-molecule sequencing and Hi-C technology to obtain a new reference genome for the Chinese tree shrew. We also generated a single nucleotide polymorphism (SNP) map of the tree shrew by whole-genome sequencing of six individuals. We updated the TreeshrewDB v2.0 (http://www.treeshrewdb.org) to incorporate the new reference genome and population genetic variations.

MATERIALS AND METHODS

Tissue samples and genome sequencing

A male Chinese tree shrew from the Experimental Animal Center of the Kunming Institute of Zoology, Chinese Academy of Sciences, was used for single-molecule, real-time (SMRT) long-read sequencing (PacBio) and Hi-C sequencing. Ear tissues of six Chinese tree shrews were used for whole-genome sequencing using Illumina HiSeq X Ten (USA). This study was approved by the Institutional Review Board of the Kunming Institute of Zoology, Chinese Academy of Sciences (KIZ-SYDW-20101015-001 and KIZ-SMKX-20160315-001). We generated long-insert (20–40 kb) genomic libraries based on standard SMRT sequencing protocols developed by Pacific Biosciences (PacBio). The libraries were sequenced using the PacBio RS II instrument with the P6-C4 sequencing reagent. Brain tissue from the same individual was used to construct the Hi-C libraries. Briefly, minced brain tissue was fixed in 2% formaldehyde for 10 min and then lysed in 2.5 mol/L glycine. Cross-linked genomic DNA was digested with MboI (#B7024, New England Biolabs, UK). Sticky ends were filled with nucleotides, one of which was biotinylated. Ligation was performed under extremely dilute conditions favoring intramolecular ligation events: the MboI site was lost and a NheI (#R3131, New England Biolabs, UK) site was created. DNA was purified and sheared, and biotinylated junctions were isolated using streptavidin beads. Interacting fragments were sequenced by Illumina HiSeq X Ten (USA). For whole-genome sequencing of the six tree shrew individuals, short-insert (300 bp) genomic libraries were constructed using the Illumina TruSeq Nano DNA Library Prep Kits (USA) and sequenced using the Illumina HiSeq X Ten (USA).

Genome assembly and quality evaluation

We applied Canu (Koren et al., 2017) to correct the SMRT reads, then used smartdenovo (https://github.com/ruanjue/smartdenovo) to perform de novo assembly. The assembly was error-corrected using Quiver (Chin et al., 2013) and Pilon (Walker et al., 2014) based on alignment of 30-fold Illumina paired-end reads. The Hi-C sequencing reads were aligned to the assembled contigs using the bowtie2 end-to-end algorithm (Langmead & Salzberg, 2012). We used Lachesis (Burton et al., 2013) to cluster, order, and orient the assembled contigs onto 31 pseudo-chromosomes (TS_2.0 assembly), which were arbitrarily defined based on the number of haploid chromosomes of the tree shrew (Liu et al., 1989). A total of 4 104 benchmarking universal single-copy orthologs in the mammalian dataset of the Benchmarking with Universal Single-Copy Orthologs (BUSCO) (Simao et al., 2015) were mapped to the assembled contigs using tBlastn (Altschul et al., 1997) to assess overall assembly quality. We also used the whole-genome sequencing data of the male Chinese tree shrew to assess the quality of the TS_2.0 assembly. In brief, we aligned the reads to the TS_2.0 assembly and previous TS_1.0 assembly (Fan et al., 2013) using BWA (Li & Durbin, 2009). We called genetic variants (SNPs and indels (insertions and deletions)) using FreeBayes (Garrison & Marth, 2012) and structural variants (SVs) using Lumpy-SV (Layer et al., 2014). The feature response curve (FRC) (Vezzi et al., 2012) was estimated based on the aligned reads. The quality value (QV) was calculated as described previously (Bickhart et al., 2017): $\mathrm{QV} = -10 \log_{10}(S/B)$, where S indicates the cumulative length of all SNPs and indels identified using FreeBayes (Garrison & Marth, 2012) that had a probability of being heterozygous greater than 0.5, and B indicates the number of base pairs in the assembly that had at least 3x sequencing coverage.
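The QV calculation can be illustrated with a minimal Python sketch. It assumes the Phred-style form QV = -10 × log10(S/B) associated with Bickhart et al. (2017); the function name and the input numbers are hypothetical and are not taken from this study.

```python
import math


def quality_value(variant_bases: int, covered_bases: int) -> float:
    """Phred-scaled assembly quality, assuming QV = -10 * log10(S / B).

    variant_bases (S): cumulative length of SNPs and indels counted as errors.
    covered_bases (B): assembly bases with at least 3x read coverage.
    """
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    if variant_bases < 0:
        raise ValueError("variant_bases must not be negative")
    if variant_bases == 0:
        return math.inf  # no discrepant bases observed
    return -10.0 * math.log10(variant_bases / covered_bases)


# Hypothetical inputs: 1 000 discrepant bases over 1 000 000 covered bases.
example_qv = quality_value(variant_bases=1_000, covered_bases=1_000_000)
print(f"QV = {example_qv:.2f}")
```

Under this definition, each 10-point increase in QV corresponds to a 10-fold lower rate of discrepant bases.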

Annotation of repeats in genome

We employed Tandem Repeats Finder v4.09 (Benson, 1999) to annotate the tandem repeats in the TS_2.0 assembly. The transposable elements (TEs) were identified based on a combination of de novo and homology-based predictions, as described in our previous study (Fan et al., 2013). Briefly, the RepeatModeler (Chen, 2004) was used to construct a de novo repeat library. We used RepeatMasker and RepeatProteinMask (Chen, 2004) to identify different types of TEs by aligning the TS_2.0 assembly with the known RepBase library (Chen, 2004) and the constructed de novo repeat library.

Gene prediction and annotation

A total of 13 RNA-seq datasets from our previous studies (Fan et al., 2013, 2018) were cleaned using Trimmomatic (Bolger et al., 2014), then aligned to the TS_2.0 assembly using Tophat2 (Kim et al., 2013). The cleaned reads were also de novo assembled using Trinity (Grabherr et al., 2011). The above RNA-seq assemblies were further combined using PASA (Haas et al., 2008). For homology-based gene prediction, we downloaded protein sequences of humans (Homo sapiens), chimpanzees (Pan troglodytes), macaques (Macaca mulatta) and mice (Mus musculus) from Ensembl (release 71; https://asia.ensembl.org/index.html), which display more accurate annotation of gene models. These protein sequences were mapped to the TS_2.0 assembly using TblastN (Altschul et al., 1997). GeneWise (Birney et al., 2004) was used to define gene models. For ab initio gene prediction, Augustus (Stanke & Waack, 2003), Genescan (Salamov & Solovyev, 2000), SNAP (Korf, 2004), and GeneMark (Besemer & Borodovsky, 2005) were used to predict coding genes. We employed EVidenceModeler (Haas et al., 2008) to combine the RNA-seq, cDNA, and protein alignments with different weights (RNA-seq > cDNA/protein > ab initio gene predictions) to achieve a comprehensive and non-redundant reference gene set. This gene set was further updated using PASA (Haas et al., 2008), followed by annotation based on the best matches derived from the protein sequence alignments described in the SwissProt and TrEMBL databases (O’Donovan et al., 2002) using Blastp (with default parameters) (Altschul et al., 1997). We annotated motifs and domains of proteins using InterPro (Mulder & Apweiler, 2007) to search publicly available databases, including Pfam (http://pfam.sanger.ac.uk/), PRINTS (http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php), PROSITE (http://prosite.expasy.org/), ProDom (http://prodom.prabi.fr/prodom/current/html/home.php), and SMART (http://smart.embl-heidelberg.de/).
Descriptions of gene products, such as Gene Ontology (Ashburner et al., 2000) information, were retrieved from InterPro (Mulder & Apweiler, 2007). Pathway information was obtained by blasting the above reference gene set with the KEGG database (Kanehisa & Goto, 2000), with the best hit for each gene used for the annotation.

Gene synteny map among different species

We used the human (hg38; https://www.ncbi.nlm.nih.gov/grc/human), macaque (rheMac3 (Yan et al., 2011)), and mouse (GRCm38; https://www.ncbi.nlm.nih.gov/grc/mouse) genomes and TS_2.0 to build a gene synteny map, as described previously (Fan et al., 2013). Briefly, the gene synteny map was constructed on the basis of orthologous genes. We did not use whole-genome alignment because of the great sequence divergence among the species. For genes with alternative splicing variants, the longest human, macaque, tree shrew, and mouse transcripts were chosen to represent each gene. All protein sequences from the four species were aligned against the same protein set using BlastP with an e-value cutoff of 1×10^-5. With the human protein set as a reference, we found the best hit for each protein in the other species, with a criterion that more than 30% of the aligned sequence showed identity above 30%. Reciprocal best-match pairs were defined as orthologs. Orthologs not in the gene synteny blocks were removed from further analysis. For example, for three contiguous genes (A, B, and C) in the human genome, if all three orthologs could be identified between humans and tree shrews based on the cutoff threshold described above, but the tree shrew ortholog of B did not lie between the orthologs of A and C (because it was located on another scaffold or elsewhere within the same scaffold), then gene B was removed. Using this method, we identified four-way gene synteny relationships for humans, macaques, tree shrews, and mice. The gene order information of the human genome was used to identify the macaque, tree shrew, and mouse genomic SVs.
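The reciprocal best-match step can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the `Hit` record, the ranking of hits by bit score, and the reading of the 30%/30% criterion as "alignment covers more than 30% of the query with identity above 30%" are all assumptions, and the toy identifiers are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One protein alignment record (hypothetical layout)."""
    query: str
    subject: str
    evalue: float
    identity: float      # percent identity of the alignment
    aligned_frac: float  # fraction of the query covered by the alignment
    bitscore: float


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5,
              min_identity: float = 30.0, min_aligned: float = 0.30) -> Dict[str, str]:
    """Return the top-scoring subject for each query after filtering."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if h.identity <= min_identity or h.aligned_frac <= min_aligned:
            continue
        current = best.get(h.query)
        # Assumption: the "best" hit is the one with the highest bit score.
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return {query: h.subject for query, h in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Toy data with hypothetical identifiers (h* = human, t* = tree shrew).
human_vs_shrew = [
    Hit("hA", "tA", 1e-50, 85.0, 0.90, 300.0),
    Hit("hA", "tB", 1e-20, 40.0, 0.50, 120.0),
    Hit("hB", "tB", 1e-3, 60.0, 0.80, 50.0),   # fails the e-value cutoff
]
shrew_vs_human = [
    Hit("tA", "hA", 1e-48, 85.0, 0.90, 295.0),
    Hit("tB", "hA", 1e-20, 40.0, 0.50, 120.0),
]
orthologs = reciprocal_best_hits(human_vs_shrew, shrew_vs_human)
print(sorted(orthologs))
```

The subsequent synteny filter (dropping an ortholog that does not sit between the orthologs of its human neighbors) would operate on the gene order of these pairs and is not shown here.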

Whole genome sequencing and SNP calling

Low-quality raw short reads were removed using Trimmomatic v0.32 (Bolger et al., 2014) with the parameters “LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36”. Quality-filtered reads were aligned to the reference TS_2.0 assembly using BWA-MEM (Li & Durbin, 2009). Picard Tools (http://broadinstitute.github.io/picard/) were used to flag duplicate reads. Only non-duplicate reads were used for subsequent analyses. GenomeAnalysisTK-3.7 (GATK) (McKenna et al., 2010) was used to realign indels and recalibrate base quality. We retained all single-nucleotide variants (SNVs) called by GATK UG with a Phred-quality score>Q10. The SNVs were hard filtered with the parameters “DP>8 & QD>5.0 & HRun<5 & SB<0.00 & QUAL>50 & FS<60.0 & MQ>40.0 & HaplotypeScore>13.0”. ANNOVAR was used to classify variants into different functional categories according to their locations and expected effects on encoded gene products (Wang et al., 2010).

Population genetic analyses

Nucleotide diversity (π) and Tajima’s D value (Tajima, 1989) were estimated using VCFtools based on the six wild Chinese tree shrews, with a sliding window of 100 kb in each genome. For each coding gene, we estimated the population genetic parameters, including π, Watterson theta estimate (θw) (Watterson, 1975), Tajima’s D (Tajima, 1989), Fu and Li’s D (Fu & Li, 1993), Fu and Li’s F (Fu & Li, 1993), and Fay and Wu’s H (Fay & Wu, 2000), using in-house Perl scripts. Manhattan plot analysis was performed using the R package qqman (https://cran.r-project.org/web/packages/qqman/index.html).

mRNA expression analysis

The raw RNA-seq data (Supplementary Table S1) were trimmed to remove sequencing adapters, reads containing more than 5% ambiguous bases (N), and low-quality reads (more than 20% of bases with a quality score below 10). The filtered reads were aligned to the reference genome TS_2.0 assembly using HISAT2 (Kim et al., 2015). HTSeq-count (Anders et al., 2015) was used to count aligned reads mapped to the above reference gene set. We calculated the FPKM (fragments per kilobase per million mapped reads) value using an in-house Perl script to quantify mRNA expression as follows: $\mathrm{FPKM} = \dfrac{10^{9} \times C}{N \times L}$, where FPKM refers to the mRNA expression level of gene A, C is the number of fragments uniquely aligned to gene A, N is the total number of reads uniquely aligned to all genes, and L is the base number in the coding region of gene A. For co-expression analysis, we used reported RNA-seq data from seven tree shrew brain tissues (Fan et al., 2013, 2018) to calculate Pearson’s correlation coefficients for each gene pair. A co-expression gene pair was defined by a Pearson’s correlation coefficient cut-off of 0.8.
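The expression measure and the co-expression criterion can be sketched in Python as below. The sketch assumes the standard form FPKM = 10^9 × C / (N × L) and treats the 0.8 cut-off as "r ≥ 0.8" on the signed coefficient; both are assumptions, and the function names and example values are hypothetical rather than taken from the in-house scripts.

```python
import math
from typing import Sequence


def fpkm(fragments: int, total_mapped: int, coding_length_bp: int) -> float:
    """Fragments per kilobase per million mapped reads: 1e9 * C / (N * L)."""
    if total_mapped <= 0 or coding_length_bp <= 0:
        raise ValueError("total_mapped and coding_length_bp must be positive")
    return 1e9 * fragments / (total_mapped * coding_length_bp)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def is_coexpressed(x: Sequence[float], y: Sequence[float],
                   cutoff: float = 0.8) -> bool:
    """Assumed reading of the cut-off: a pair qualifies when r >= cutoff."""
    return pearson(x, y) >= cutoff


# Hypothetical gene: 500 fragments, 20 million mapped reads, 1 500 bp coding region.
example_fpkm = fpkm(fragments=500, total_mapped=20_000_000, coding_length_bp=1_500)

# Hypothetical FPKM profiles across seven tissues.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
gene_c = [7.0, 1.0, 5.0, 2.0, 6.0, 3.0, 4.0]
print(round(example_fpkm, 2), is_coexpressed(gene_a, gene_b), is_coexpressed(gene_a, gene_c))
```

With only seven tissues per gene, a correlation threshold of this kind is sensitive to single outlying samples, so the cut-off is best read as a screening criterion.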

Tree shrew database

The tree shrew database (TreeshrewDB v2.0) runs on a dual-processor server with an Ubuntu operating system and is implemented on the LAMP (Linux-Apache-MySQL-Perl) software stack. The Chinese tree shrew genome TS_2.0 assembly, gene set, gene annotation, and other information are stored in MySQL and administered with phpMyAdmin. The web interfaces were developed using HTML, CSS, JavaScript, and Perl.

RESULTS

Assembly of reference genome and quality evaluation

We generated ~55x (148.58 Gb) whole-genome sequence coverage for the sampled adult male Chinese tree shrew using SMRT long-read sequencing technology (PacBio). After filtering poor-quality reads, we applied a combined de novo assembly strategy to generate a high-quality tree shrew genome (KIZ version 2: TS_2.0, with a size of 2.67 Gb). The new assembly produced a total of 3 344 sequence contigs, a 112-fold reduction in the number of contigs compared with our previous assembly (KIZ version 1: TS_1.0) based on short reads (Fan et al., 2013) (Table 1). The contig N50 of TS_2.0 was 3.2 Mb, a 146-fold improvement over that of the previous assembly (TS_1.0) (Fan et al., 2013). Nearly 60% of contigs (1 963/3 344) were longer than 100 kb and accounted for 97.4% of the assembled genome. The longest contig was 16.2 Mb (Table 1; Figure 1). We used BUSCO analysis, which is a powerful tool for assessing genome assembly and annotation completeness with single-copy orthologs (Simao et al., 2015), to evaluate the quality of the TS_2.0 contigs. About 91.4% of the 4 104 core genes in the mammalian dataset were complete BUSCO genes (Table 2). These tests all showed that the newly assembled tree shrew genome contigs had superior quality to those in recently reported ape genomes (Kronenberg et al., 2018).
Table 1

Comparison of Chinese tree shrew assembly quality between assemblies TS_1.0 and TS_2.0

Item | Contig length (bp) | Scaffold length (bp) | Contig No. | Scaffold No.

Short-read assembly (KIZ version 1: TS_1.0)
Total | 2 719 442 484 | 2 861 790 358 | 374 120 | 150 513
Max_length | 187 505 | 19 269 909 | – | –
>2 000 bp | – | – | 180 802 | 4 525
>100 kb | – | – | 305 | 1 418
N50 | 22 000 | 3 655 608 | 36 335 | 234
N60 | 17 500 | 3 042 664 | 50 199 | 319
N70 | 13 431 | 2 302 651 | 67 915 | 427
N80 | 9 571 | 1 648 848 | 91 810 | 573

Long-read assembly (KIZ version 2: TS_2.0)
Total | 2 667 337 536 | 2 667 507 236 | 3 344 | 1 647
Max_length | 16 177 999 | 224 450 918 | – | –
>2 000 bp | – | – | 3 344 | 1 647
>100 kb | – | – | 1 963 | 281
N50 | 3 217 288 | 104 643 080 | 229 | 10
N60 | 2 462 062 | 94 037 081 | 323 | 13
N70 | 1 641 093 | 71 760 103 | 457 | 16
N80 | 995 871 | 57 328 337 | 664 | 20

Contig: Contiguous length of genomic sequence in which the order of bases has a high confidence level. Gaps occur where reads from two sequenced ends of at least one fragment overlap with other reads in two different contigs. Scaffolds are composed of contigs and gaps. N50: Statistic that describes assembly quality in terms of contiguity. With contigs ranked from longest to shortest, N50 is the length of the shortest contig in the smallest set of longest contigs whose cumulative length reaches 50% of the total assembly length; N60, N70, and N80 are defined analogously. –: Not available.
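The N50 definition in the Table 1 footnote can be made concrete with a short Python sketch. The function name `nx` and the toy contig lengths are hypothetical; the function returns both the Nx length and the number of contigs needed to reach the threshold, mirroring the length and count columns of Table 1.

```python
from typing import Iterable, Tuple


def nx(lengths: Iterable[int], percent: int = 50) -> Tuple[int, int]:
    """Return (Nx length, number of contigs needed) for a set of contig lengths.

    Contigs are ranked from longest to shortest and accumulated until the
    running total reaches `percent` of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length, count
    raise AssertionError("unreachable: the full set always reaches the threshold")


# Toy contig lengths (hypothetical); total length is 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50_length, n50_count = nx(toy_contigs, 50)
n80_length, n80_count = nx(toy_contigs, 80)
print(n50_length, n50_count, n80_length, n80_count)
```

In the toy set, the two longest contigs (80 + 70) already cover half of the total length, so the N50 is the length of the second contig.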

Figure 1

Assembly, annotation, and nucleotide diversity of Chinese tree shrew genome

A: Contig length distribution of long-read assembly (TS_2.0) in comparison with short-read assembly (TS_1.0) (Fan et al., 2013). B: Circos plot showing genome-wide distribution profiles of genes, SNPs, and indels across Chinese tree shrew genome, and values of population genetic parameters (π and Tajima’s D). C: Manhattan plot of nucleotide diversity (π) at gene level based on SNPs located in coding regions of six wild tree shrews. Top 30 genes are shown in plot, with a cut-off π value of 0.025.

Table 2

Assessment of assembly completeness in Chinese tree shrew using BUSCO

Parameter | No. | Percentage (%)
Complete genes | 3 749 | 91.40
Complete and single-copy genes | 3 696 | 90.10
Complete and duplicated genes | 53 | 1.30
Fragmented genes | 184 | 4.50
Missing genes | 171 | 4.10
Total genes | 4 104 | –

BUSCO: Benchmarking with Universal Single-Copy Orthologs. A total of 4 104 benchmarking universal single-copy orthologs of the mammalian dataset were retrieved from BUSCO (Simao et al., 2015). These genes were mapped to the TS_2.0 assembly using tBlastn (Altschul et al., 1997). –: Not available.

We also generated ~264 x (705 Gb) Hi-C data (Table 3) to cluster the contigs into chromosome-scale scaffolds. A total of 1 728 contigs (comprising 96.2% of the assembled genome sequence) were anchored into 31 pseudo-molecules, whereas 1 616 contigs (102 Mb, 3.8% of assembled genome sequence) were unanchored (Table 4). The final chromosome-scale scaffolding of the de novo genome assembly of the Chinese tree shrew (TS_2.0) had a scaffold N50 length of 104 Mb, which is much more complete than the previous TS_1.0 assembly (Fan et al., 2013) (Table 1).
Table 3

Statistics of Hi-C data for mapping

Parameter | Hi-C data
Clean data | 705 Gb
Clean paired-end reads | 2 351 150 069
Unmapped paired-end reads | 47 514 976
Unmapped paired-end reads rate (%) | 2.02
Paired-end reads with singleton | 303 352 692
Paired-end reads with singleton rate (%) | 12.9
Multi mapped paired-end reads | 443 734 174
Multi mapped ratio (%) | 18.87
Unique mapped paired-end reads | 1 556 548 227
Unique mapped ratio (%) | 66.2
Table 4

Pseudo-chromosome sizes and assignment of Hi-C scaffolds

Pseudo-chromosome | Contig No. | Length (bp) of pseudo-chromosome
chr1 | 154 | 224 402 198
chr2 | 107 | 187 971 973
chr3 | 111 | 137 178 494
chr4 | 61 | 121 533 334
chr5 | 74 | 120 860 892
chr6 | 88 | 117 379 583
chr7 | 67 | 108 205 678
chr8 | 56 | 108 052 698
chr9 | 62 | 104 638 498
chr10 | 64 | 101 327 006
chr11 | 71 | 97 509 983
chr12 | 49 | 94 027 333
chr13 | 54 | 92 296 458
chr14 | 58 | 89 547 586
chr15 | 69 | 71 741 294
chr16 | 42 | 69 742 744
chr17 | 48 | 66 945 814
chr18 | 41 | 63 456 188
chr19 | 27 | 57 308 528
chr20 | 32 | 54 551 840
chr21 | 35 | 49 758 179
chr22 | 53 | 52 049 165
chr23 | 27 | 43 809 476
chr24 | 23 | 42 251 409
chr25 | 33 | 41 996 642
chr26 | 100 | 30 565 635
chr27 | 50 | 25 814 610
chr28 | 34 | 26 506 761
chr29 | 27 | 22 607 893
chr30 | 28 | 21 670 314
chrX | 452 | 118 492 391
Total anchored | 2 197 | 2 564 200 597
Unanchored | 1 616 | 102 561 400

We used Lachesis (Burton et al., 2013) to cluster, order, and direct the assembled contigs onto 31 pseudo-chromosomes, which were defined according to number of haploid chromosomes of the tree shrew (Liu et al., 1989). Contig No.: Number of contigs assembled onto each chromosome by Hi-C. Total anchored: total number of contigs that could be anchored into 31 pseudo-chromosomes. Unanchored: total number of contigs that could not be anchored into 31 pseudo-chromosomes.

To compare the long-read tree shrew genome assembly (TS_2.0) in this study with the previous short-read assembly (TS_1.0) (Fan et al., 2013), we generated ~30x coverage Illumina paired-end read sequences from another tree shrew and aligned it to both assemblies. The identified SNPs and indels were used to estimate assembly accuracy. The TS_2.0 assembly (quality value=28.56) had a higher quality value, as estimated using the number of non-matching base calls from FreeBayes (Bickhart et al., 2017; Garrison & Marth, 2012), than that of the TS_1.0 assembly (quality value=26.75) (Table 5). In addition, the TS_2.0 assembly had 3-fold fewer SVs than that of TS_1.0 (Fan et al., 2013), thus suggesting fewer assembly errors (Table 5). Quality evaluation using the FRC method (Vezzi et al., 2012) also showed TS_2.0 to be a better assembly (Table 5).
Table 5

Assembly quality score value statistics

Parameter | Long-read assembly (TS_2.0) | Short-read assembly (TS_1.0)
Quality value | 28.56 | 26.75
Translocation | 2 824 | 6 034
Deletion | 3 733 | 12 607
Duplication | 142 | 438
Inversion | 80 | 99
Errors Per 100 Mbp | 253.89 | 718.27
HIGH_COV_PE | 12 016 | 66 655
HIGH_NORM_COV_PE | 12 415 | 69 902
HIGH_OUTIE_PE | 137 | 1 594
HIGH_SINGLE_PE | 10 | 151
HIGH_SPAN_PE | 1 237 | 32 751
LOW_NORM_COV_PE | 536 | 72 38
STRECH_PE | 31 741 | 66 763
COMPR_PE | 13 818 | 20 437

Quality value was estimated based on number of non-matching base calls from FreeBayes (Garrison & Marth, 2012). Errors per 100 Mbp were calculated as a sum ratio of Lumpy (Layer et al., 2014) structural variants (SV) to a standardized genome size of 2.67 Gbp. FRC features (Vezzi et al., 2012) can assess assembly errors, including LOW_COV_PE: Low read coverage; HIGH_COV_PE: High read coverage; LOW_NORM_COV_PE: Low coverage of normal paired-end reads; HIGH_NORM_COV_PE: High coverage of normal paired-end reads; COMPR_PE: Areas with low CE statistics; STRECH_PE: Areas with high CE statistics; HIGH_SINGLE_PE: Regions with high numbers of unmapped pairs; HIGH_SPAN_PE: Regions with high numbers of discordant pairs that map to different contigs/scaffolds; HIGH_OUTIE_PE: Regions with high numbers of misoriented or distant pairs. With the exception of the QV score, lower counts are indicative of better assembly.
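The "Errors per 100 Mbp" row of Table 5 can be checked with a few lines of Python. The sketch assumes that the measure is the total number of Lumpy structural variants (translocations, deletions, duplications, and inversions) divided by the number of 100-Mbp units in the standardized 2.67-Gbp genome; the function name is hypothetical, and the counts are those reported in Table 5.

```python
from typing import Dict


def errors_per_100_mbp(sv_counts: Dict[str, int],
                       genome_size_bp: float = 2.67e9) -> float:
    """Total structural variants per 100 Mbp of a standardized genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(sv_counts.values()) / (genome_size_bp / 1e8)


# Structural variant counts as listed in Table 5.
ts2_svs = {"translocation": 2824, "deletion": 3733, "duplication": 142, "inversion": 80}
ts1_svs = {"translocation": 6034, "deletion": 12607, "duplication": 438, "inversion": 99}

ts2_rate = errors_per_100_mbp(ts2_svs)
ts1_rate = errors_per_100_mbp(ts1_svs)
print(f"TS_2.0: {ts2_rate:.2f}  TS_1.0: {ts1_rate:.2f}  ratio: {ts1_rate / ts2_rate:.2f}")
```

Under this reading, the computed rates agree with the tabulated values of 253.89 and 718.27 to within rounding in the last digit, and their ratio is close to the roughly 3-fold difference in SVs between the two assemblies.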

About 73% of the gaps (163 220 gaps, 93.10 Mb) in the TS_1.0 assembly (Fan et al., 2013) were filled by the 49.15 Mb long-read sequences in the TS_2.0 assembly. Among these fully closed gaps, 65 222 were located in the genic regions. Only 39 gaps in TS_2.0 were fully closed by TS_1.0 (Table 6). We note that 4 112 gaps in TS_1.0 had flanking sequences that were mapped to separate pseudo-chromosomes in TS_2.0, indicating assembly errors in TS_1.0 (Fan et al., 2013).
Table 6

Gap closure statistics of the two genome assemblies

Parameter | Long-read assembly (TS_2.0) | Short-read assembly (TS_1.0)
Total number of gaps | 1 697 | 223 607
Partially closed gap using TS_1.0 | 476 | –
Partially closed gap using TS_2.0 | – | 0
Fully closed gap using TS_1.0 | 39 | –
Fully closed gap using TS_2.0 | – | 163 220
Fully closed gap in genic region | 0 | 65 222
Trans-scaffold gaps | 264 | 4 112

Partially closed gap: Gap in one assembly was filled by a scaffold of another assembly, but still had some ambiguous (N) bases within the filled region. Fully closed gap: Gap in one assembly was filled by a contig of another assembly, without any ambiguous (N) bases. Trans-scaffold gap: Flanking sequences of a gap were aligned to two separate scaffolds or pseudo-chromosomes, which most likely reflects an assembly error. –: Not available.

The updated genome assembly is available at TreeshrewDB v2.0 (http://www.treeshrewdb.org) and has been deposited in GSA (accession No. PRJCA001472; http://gsa.big.ac.cn/) (Wang et al., 2017).

Repeat content in tree shrew genome

Repeat content in a genome poses a major difficulty for sequence assembly (Kronenberg et al., 2018). The function of repeat content has also begun to be recognized (Chuong et al., 2017). The TS_2.0 assembly presented an opportunity to identify and study full-length repeats. Here, 49.14% (up to 1.31 Gb) of the TS_2.0 assembly was identified as interspersed repeats, which represents an increase of 308 Mb repeat elements relative to TS_1.0 (Fan et al., 2013) (Table 7). Among the defined repeat elements, LINE1 (L1, long interspersed nuclear elements 1) repeats accounted for the highest proportion in TS_2.0 (18.54% of genome size; Table 8), similar to that of L1 in the human genome (Beck et al., 2010). The tree shrew-specific tRNA-derived Tu-III family, which makes up the largest proportion of the SINEs (short interspersed nuclear elements) in the Chinese tree shrew genome (Fan et al., 2013), accounted for 15.17% of genome size in TS_2.0 (Table 8). The reason for the unusually high prevalence of the tRNA-derived Tu-III family in the tree shrew genome remains to be determined. Because of the improvement in genome quality, we were able to identify 127 long transposable elements (each >20 kb). We also defined 4 411 709 satellites (total length of 131 Mb) in TS_2.0. Among them, 4 293 990 were short tandem repeats (each <150 bp) and 1 152 were long tandem repeats (each >5 kb). The longest tandem repeat was mapped to a non-coding region at the end of pseudo-chromosome 26 and had a length of 168.3 kb (period size=359 bp, copy number=471). We note that some long tandem repeats within the genic regions were located in gaps in the TS_1.0 assembly (Fan et al., 2013). For instance, a long tandem repeat (period size=1 917 bp, copy number=26) overlapped with the coding sequence of the OS9 gene (osteosarcoma amplified 9, endoplasmic reticulum lectin), which plays a key role in the endoplasmic reticulum stress response associated with hypoxia (Satoh et al., 2010).
Therefore, these long tandem repeats in the tree shrew genome may be functional. However, focused studies are required for their characterization.
Table 7

Comparison of transposable elements in Chinese tree shrews between short-read assembly (KIZ version 1: TS_1.0) and long-read assembly (KIZ version 2: TS_2.0)

| Type | TS_2.0 length (Mb) | TS_2.0 % in genome | TS_1.0 length (Mb) | TS_1.0 % in genome |
|---|---|---|---|---|
| DNA | 96.8 | 3.6 | 76.6 | 2.7 |
| LINE | 553.3 | 20.8 | 295.2 | 10.3 |
| SINE | 663.1 | 24.9 | 527.2 | 18.8 |
| LTR | 138.0 | 5.2 | 113.1 | 4 |
| Other | 0.0005 | 0.0 | 0.06 | 0.002 |
| Unknown | 68.0 | 2.6 | 0.9 | 0.03 |
| Total | 1 310.5 | 49.1 | 1 001.9 | 35 |

DNA: Deoxyribonucleic acid transposon. LINE: Long interspersed nuclear element. SINE: Short interspersed nuclear element. LTR: Long terminal repeat.
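
The totals in Table 7 can be cross-checked with simple arithmetic. The following is a minimal sketch in plain Python (the function and variable names are illustrative and do not come from the study). It recomputes the gain in annotated repeat length between the two assemblies and the genome size implied by each total and its percentage.

```python
# Totals from Table 7: (repeat length in Mb, percentage of genome).
TOTALS = {
    "TS_2.0": (1310.5, 49.1),
    "TS_1.0": (1001.9, 35.0),
}


def implied_genome_size_mb(length_mb, percent):
    """Genome size (Mb) implied by a repeat length and its share of the genome."""
    return length_mb / (percent / 100.0)


def repeat_gain_mb(totals):
    """Extra repeat sequence (Mb) annotated in TS_2.0 relative to TS_1.0."""
    return totals["TS_2.0"][0] - totals["TS_1.0"][0]


if __name__ == "__main__":
    print(f"Repeat gain: {repeat_gain_mb(TOTALS):.1f} Mb")
    for name, (length_mb, percent) in TOTALS.items():
        size = implied_genome_size_mb(length_mb, percent)
        print(f"{name}: implied genome size ~{size:.0f} Mb")
```

The computed gain is consistent with the increase of about 308 Mb reported in the text.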

Table 8

Comparison of transposable element subtypes in Chinese tree shrews between short-read assembly (KIZ version 1: TS_1.0) and long-read assembly (KIZ version 2: TS_2.0)

| TE subtype | TS_2.0 length (Mb) | TS_2.0 % in genome | TS_1.0 length (Mb) | TS_1.0 % in genome |
|---|---|---|---|---|
| DNA/En-Spm | 7.92 | 0.30 | 4.87 | 0.17 |
| DNA/hAT | 34.29 | 1.28 | 33.77 | 1.18 |
| DNA/TcMar | 52.11 | 1.96 | 26.90 | 0.94 |
| LINE/CR1 | 5.57 | 0.21 | 2.00 | 0.07 |
| LINE/L1 | 494.51 | 18.54 | 267.29 | 9.34 |
| LINE/L2 | 49.48 | 1.86 | 22.04 | 0.77 |
| LINE/Penelope | 2.02 | 0.08 | 2.29 | 0.08 |
| LTR/ERV1 | 37.66 | 1.41 | 31.77 | 1.11 |
| LTR/ERVK | 18.55 | 0.70 | 8.87 | 0.31 |
| LTR/ERVL | 78.13 | 2.93 | 68.40 | 2.39 |
| LTR/Gypsy | 2.59 | 0.10 | 2.86 | 0.1 |
| SINE/Alu | 10.60 | 0.40 | 3.15 | 0.11 |
| SINE/B4 | 3.02 | 0.11 | 1.72 | 0.06 |
| SINE/MIR | 47.40 | 1.78 | 23.75 | 0.83 |
| SINE/tRNA-Lys | 15.04 | 0.56 | 1.14 | 0.04 |
| SINE/Tu-III | 404.49 | 15.17 | 410.09 | 14.33 |

Gene annotation updates

We combined homology-based, de novo, and transcriptome-based methods (Haas et al., 2008) to predict protein-coding genes in the TS_2.0 assembly and identified a total of 23 568 non-redundant protein-coding genes (Fan et al., 2013) (Table 9; Figure 1B). Most of these genes (22 907) were supported by RNA-seq data reported in our recent studies (Fan et al., 2013, 2018). Compared with the TS_1.0 gene set (Fan et al., 2013), the updated gene set had longer coding sequences, which were, on average, composed of more exons (Table 9). The gaps in 2 091 exons in TS_1.0 (Fan et al., 2013) were all filled in TS_2.0, thus providing better annotation information for the affected genes. For instance, LILRB3 (leukocyte immunoglobulin like receptor B3), which binds to major histocompatibility complex (MHC) class I molecules on antigen-presenting cells to inhibit stimulation of the immune response (Huang et al., 2010), was complete in TS_2.0, whereas less than 50% of this gene sequence was retrieved in TS_1.0 (Fan et al., 2013). BMP8A (bone morphogenetic protein 8a), which plays a role in the development of the reproductive system (Wu et al., 2017), exhibited low protein sequence identity (57.22%) with its human homolog in TS_1.0 (Fan et al., 2013) due to assembly errors and gaps, but the sequence identity reached 88.94% in TS_2.0. In addition, ALOX15 (arachidonate 15-lipoxygenase), which uses polyunsaturated fatty acid substrates to generate various bioactive lipid mediators, such as eicosanoids, hepoxilins, and lipoxins (Kuhn et al., 2018; Singh & Rao, 2019), had only one copy in TS_1.0 (Fan et al., 2013) but four copies in TS_2.0. The updated versions of these important genes provide a good basis for further specific functional characterization.
Table 9

Comparison of Chinese tree shrew gene annotation between short-read assembly (KIZ version 1: TS_1.0) and long-read assembly (KIZ version 2: TS_2.0)

| Parameter | Long-read assembly (TS_2.0) | Short-read assembly (TS_1.0) |
|---|---|---|
| Total number of genes | 23 568 | 22 121 |
| Complete ORFs | 21 117 | 21 085 |
| Annotated genes | 20 811 | 20 225 |
| Average mRNA length | 40 114 | 33 712 |
| Average CDS length | 1 527 | 1 404 |
| Average exon number | 8.86 | 7.54 |
| Average exon length | 172 | 186 |
| Average intron length | 4 907 | 4 937 |

ORF: open reading frame. CDS: coding-region sequences.

Of the 23 568 predicted genes, 20 811 (88.3%) were functionally classified according to InterPro (Mulder & Apweiler, 2007), GO (Ashburner et al., 2000), KEGG (Kanehisa & Goto, 2000), Swissprot, and TrEMBL (O'Donovan et al., 2002). In addition, 586 genes were newly annotated in TS_2.0 (Table 10). All these genes can be retrieved from TreeshrewDB v2.0.
Table 10

Comparison of Chinese tree shrew gene functional annotation between short-read assembly (KIZ version 1: TS_1.0) and long-read assembly (KIZ version 2: TS_2.0)

| Functional annotation | TS_1.0 No. | TS_1.0 percent (%) | TS_2.0 No. | TS_2.0 percent (%) |
|---|---|---|---|---|
| Total | 22 121 | – | 23 568 | – |
| InterPro | 17 420 | 78.7 | 17 534 | 74.4 |
| KEGG | 16 593 | 75.0 | 16 964 | 72.0 |
| Swissprot & TrEMBL | 20 225 | 91.4 | 20 811 | 88.3 |
| Unannotated | 1 896 | 8.6 | 2 309 | 11.7 |

InterPro (http://www.ebi.ac.uk/interpro/). KEGG (https://www.kegg.jp/). Swissprot & TrEMBL (https://web.expasy.org/docs/swiss-prot_guideline.html). –: Not available.
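
The annotation percentages in Table 10 are each database's gene count divided by the total gene count of the assembly. A minimal sketch of that arithmetic is shown below, using the counts from the table. The function names are illustrative only, and reading the 586 newly annotated genes as the difference in Swissprot & TrEMBL counts is an assumption based on the numbers reported here.

```python
# Gene counts from Table 10: total genes and genes with a hit in each resource.
COUNTS = {
    "TS_1.0": {"Total": 22121, "InterPro": 17420, "KEGG": 16593,
               "Swissprot & TrEMBL": 20225},
    "TS_2.0": {"Total": 23568, "InterPro": 17534, "KEGG": 16964,
               "Swissprot & TrEMBL": 20811},
}


def annotation_percentages(counts):
    """Percentage of all genes annotated by each resource, to one decimal place."""
    total = counts["Total"]
    return {name: round(100.0 * n / total, 1)
            for name, n in counts.items() if name != "Total"}


def newly_annotated(old, new, key="Swissprot & TrEMBL"):
    """Difference in annotated gene counts between two assemblies (assumed reading)."""
    return new[key] - old[key]


if __name__ == "__main__":
    for assembly, counts in COUNTS.items():
        print(assembly, annotation_percentages(counts))
    print("Newly annotated:", newly_annotated(COUNTS["TS_1.0"], COUNTS["TS_2.0"]))
```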

It should be mentioned that the MHC region (from MOG to COL11A2 (Beck et al., 1999) on pseudo-chromosome 3) was completely assembled in TS_2.0 (Figure 2A). This region was previously difficult to assemble using short-read sequencing technologies because it is highly polymorphic and repetitive. All 412 gaps in the MHC region in TS_1.0 (Fan et al., 2013) were filled in TS_2.0. Based on the complete region, the tree shrew has more MHC class I genes (n=8 according to TS_2.0) than humans (n=6), although fewer than mice (n=12) (Elmer & McAllister, 2012) (Figure 2B).
Figure 2

Chinese tree shrew and human MHC genes

A: Synteny of MHC genes between Chinese tree shrews and humans. HLA class I & II genes are in red, other genes are in black. Tree shrew TS_2.0 assembly and human genome (hg38; https://www.ncbi.nlm.nih.gov/grc/human) were used for comparison. B: Phylogenetic relationship of MHC-class I genes in humans, tree shrews, and mice.

Genomic structural variants

The TS_2.0 genome assembly improved sequence continuity and provided an opportunity to explore species-specific genomic structural variants (SVs) in genic regions. We used the human (hg38; https://www.ncbi.nlm.nih.gov/grc/human), macaque (rheMac3 (Yan et al., 2011)), and mouse (GRCm38; https://www.ncbi.nlm.nih.gov/grc/mouse) genomes and TS_2.0 to construct a synteny map of orthologous genes, using the human genome as a reference. We identified 221 SVs in tree shrews (Supplementary Table S2), 188 SVs in macaques (Supplementary Table S3), and 387 SVs in mice (Supplementary Table S4). Because all SVs were defined relative to the human reference, the lower count in tree shrews than in mice suggests that the tree shrew’s genomic structure is, overall, more similar to that of primates than the mouse’s is. A detailed comparison of the SVs showed that the tree shrew had a seemingly mosaic pattern, with some similarities to rodents and others to primates. For instance, the tree shrew and primates (human and macaque) shared a specific genomic structure in the region from MYSM1 to SLC35D1, which was inverted in the mouse genome (Figure 3A). In other regions, such as that from PRKAB2 to POLR3GL, the tree shrew and mouse shared a genomic structure that differed in primates (Figure 3B). Note that in this region, GPR89B (G protein-coupled receptor 89B) and NOTCH2NL (notch 2 N-terminal like A) were only present in the human genome (Figure 3B). The updated TS_2.0 assembly has thus provided more opportunities to understand the evolution of SVs and potentially disrupted genes in the tree shrew genome. The exact reasons for the occurrence of species-specific SVs and their potential evolutionary and functional effects await further study.
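
To illustrate how an inversion appears in a synteny map of orthologous genes, the sketch below scans the gene order of a query genome against a reference gene order and reports runs of adjacent genes whose reference positions run backwards. This is a simplified illustration and not the pipeline used in this study. The function name and the gene labels (g1 to g6) are hypothetical placeholders, and real SV calling would also need to handle strand, duplications, and translocations.

```python
def find_inverted_blocks(ref_order, query_order, min_genes=2):
    """Return runs of at least `min_genes` adjacent query genes whose
    reference positions decrease by exactly one at each step."""
    rank = {gene: i for i, gene in enumerate(ref_order)}
    # Keep only genes that have an ortholog in the reference ordering.
    shared = [g for g in query_order if g in rank]
    blocks, start = [], 0
    for i in range(1, len(shared) + 1):
        run_ends = (i == len(shared)
                    or rank[shared[i]] != rank[shared[i - 1]] - 1)
        if run_ends:
            if i - start >= min_genes:
                blocks.append(shared[start:i])
            start = i
    return blocks


if __name__ == "__main__":
    # Hypothetical gene labels; g3..g5 are reversed in the query genome.
    reference = ["g1", "g2", "g3", "g4", "g5", "g6"]
    query = ["g1", "g2", "g5", "g4", "g3", "g6"]
    print(find_inverted_blocks(reference, query))
```

Counting such blocks per species against a common reference gives one simple, order-based measure of structural divergence.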
Figure 3

Examples of structural variants in mouse, macaque, tree shrew, and human genomes

A: Chinese tree shrews and humans, but not mice, shared a specific genomic structure in the region from MYSM1 to SLC35D1. B: Chinese tree shrews and mice shared a specific genomic structure in the region from PRKAB2 to POLR3GL, which has undergone dramatic changes in humans. GPR89B (G protein-coupled receptor 89B) and NOTCH2NL (notch 2 N-terminal like A) genes, marked in green, were only present in the human genome. These genomes were retrieved from public domains (mouse: GRCm38; https://www.ncbi.nlm.nih.gov/grc/mouse; macaque: rheMac3 (Yan et al., 2011); human: hg38; https://www.ncbi.nlm.nih.gov/grc/human) or generated in this study (tree shrew: TS_2.0).

Genomic sequence variations at population level

To understand genomic sequence variations in the Chinese tree shrew, we analyzed the whole-genome sequencing data of six individuals (each with a sequencing depth of 30×). After mapping to TS_2.0, we identified a total of 12.8 million (M) SNPs in these individuals (Figure 1B), of which 293 128 (194 751 synonymous and 98 377 non-synonymous SNPs) were located in coding regions. We estimated population genetic parameters for the Chinese tree shrew using the six captive individuals. We calculated nucleotide diversity (π) based on SNPs located in coding regions and identified 30 genes with high nucleotide diversity using a cut-off π value of 0.025 (Figure 1C). Among these genes, five were located in the MHC loci or belonged to the immunoglobulin gene family, suggesting that immune genes may have a relatively high evolutionary rate in tree shrews, although this needs to be validated by analyzing more samples and including non-coding regions (Figure 1C). Whether or not this pattern reflects a compensatory effect due to the loss of RIG-I in the tree shrew genome (Fan et al., 2013; Xu et al., 2016; Yao, 2017) remains to be studied. We calculated Tajima’s D (Tajima, 1989) for each gene and found that the values were scattered, with no obvious clustering (Supplementary Figure S1). Results for Fu and Li’s D test (Fu & Li, 1993), Fu and Li’s F test (Fu & Li, 1993), and Fay and Wu’s H test (Fay & Wu, 2000) all showed a pattern similar to that of Tajima’s D test. Nonetheless, these results should be treated with caution, as they may be biased by the limited sample size.
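
For readers unfamiliar with these statistics, the sketch below shows how π, Watterson’s θw, and Tajima’s D can be computed for a single gene from SNP allele counts, following the standard formulas of Watterson (1975) and Tajima (1989). It is a minimal illustration under stated assumptions and not the software used in this study: the function names are hypothetical, all sites are assumed to be biallelic, and six diploid individuals are treated as n=12 sampled chromosomes.

```python
import math


def harmonic_sums(n):
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    return a1, a2


def diversity_stats(allele_counts, n, length):
    """Compute (pi per site, theta_w per site, Tajima's D) for one gene.

    allele_counts: copies of one allele (1..n-1) at each segregating site.
    n: number of sampled chromosomes (12 for six diploid individuals).
    length: number of sites (bp) considered in the gene.
    """
    s = len(allele_counts)
    a1, a2 = harmonic_sums(n)
    # Mean number of pairwise differences, summed over segregating sites.
    k = sum(2.0 * c * (n - c) / (n * (n - 1)) for c in allele_counts)
    pi = k / length
    theta_w = s / a1 / length
    if s == 0:
        return pi, theta_w, float("nan")  # D is undefined without SNPs
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d


if __name__ == "__main__":
    # Toy input: ten SNPs in a 1 000 bp coding sequence, n = 12 chromosomes.
    pi, theta_w, d = diversity_stats([1, 1, 2, 3, 6, 6, 5, 1, 4, 2], 12, 1000)
    print(f"pi={pi:.5f} theta_w={theta_w:.5f} D={d:.3f}")
```

Under this sketch, a gene would pass the cut-off used above when the returned π exceeds 0.025. An excess of rare alleles gives a negative D and an excess of intermediate-frequency alleles gives a positive D, which is why small samples such as n=12 leave these estimates noisy.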

Tree shrew database updates

Based on the TS_2.0 assembly, we updated TreeshrewDB to v2.0 (Figure 4) to distribute the new high-quality tree shrew genome together with our newly annotated gene and genome information. The main updates included revised and expanded genomic data, gene co-expression patterns, population genetic statistics, and improvements to the web interface. Briefly, for the retrieval module, we updated the reference sequence ID, genomic location and map, transcript sequence, and functional annotation based on the new gene set. We added five primate species (gibbon (Nomascus leucogenys), golden snub-nosed monkey (Rhinopithecus roxellana), black snub-nosed monkey (R. bieti), Bolivian squirrel monkey (Saimiri boliviensis boliviensis), and bushbaby (Otolemur garnettii)) to the orthologous gene sets from Ensembl (release 71; https://asia.ensembl.org/index.html) to allow better comparison of one-to-one homologs. The mRNA expression pattern was updated based on RNA-seq data from 26 tissues and/or cells (Supplementary Table S1). For each gene query, it is possible to retrieve basic information on the queried gene, sequence alignment with homologs of other species, mRNA expression levels in tissues/cells, co-expression patterns in brain tissues, sequence variations at the population level, and population genetic parameters (including π, Watterson theta estimate (θw) (Watterson, 1975), Tajima’s D (Tajima, 1989), Fu and Li’s D (Fu & Li, 1993), Fu and Li’s F (Fu & Li, 1993), and Fay and Wu’s H (Fay & Wu, 2000)) (Figure 4).
Figure 4

Overview of updated tree shrew database (TreeshrewDB version 2.0)

Inclusion of the high-quality reference genome assembly (TS_2.0) in TreeshrewDB version 2.0 provided a comprehensive update of gene annotation information, genomic variations, population genetic features, and mRNA expression. Population genetic parameters (including π, Watterson theta estimate (θw) (Watterson, 1975), Tajima’s D (Tajima, 1989), Fu and Li’s D (Fu & Li, 1993), Fu and Li’s F (Fu & Li, 1993), and Fay and Wu’s H (Fay & Wu, 2000)) were estimated based on SNPs located in coding regions in the whole-genome sequences of six wild tree shrews. The database is freely accessible at http://www.treeshrewdb.org.

We added the new Chinese tree shrew reference genome (TS_2.0) to TreeshrewDB v2.0, where it is free to download. The updated gene sequences can be extracted in batches or individually using our in-house ExtractSeq tool. We incorporated Blast (Altschul et al., 1997) and Genewise (Birney et al., 2004) to show the mapping of genes in the genome. Overall, the updated database now provides a comprehensive annotation of the Chinese tree shrew genome to satisfy the needs of evolutionary analysis and biomedical research.
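
Once a gene-sequence file has been downloaded, batch extraction by gene ID can also be done locally. The sketch below is a generic FASTA filter written for illustration. It is not the ExtractSeq tool, whose interface is not described here, and the function names and record IDs are hypothetical.

```python
def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The record ID is the first whitespace-delimited token of the header.
            header = line[1:].split()
            record_id, chunks = (header[0] if header else ""), []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def extract_sequences(lines, wanted_ids):
    """Return {record_id: sequence} for the requested IDs only."""
    wanted = set(wanted_ids)
    return {rid: seq for rid, seq in read_fasta(lines) if rid in wanted}


if __name__ == "__main__":
    # Hypothetical records standing in for a downloaded gene-sequence file.
    example = [">geneA some description", "ATGC", "GGTA", ">geneB", "TTAA"]
    print(extract_sequences(example, ["geneA"]))
```

With a real file, an open file handle can be passed in place of the example list, since the reader only iterates over lines.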

DISCUSSION

The combination of long-read sequencing and long-range chromosome interaction mapping (such as Hi-C) represents the most efficient approach to producing high-quality reference genome assemblies (Bickhart et al., 2017; Kronenberg et al., 2018). In this study, we used these techniques to generate an updated reference genome for the Chinese tree shrew (KIZ version 2: TS_2.0) and resolved some of the problems in our earlier tree shrew genome (Fan et al., 2013). The updated TS_2.0 assembly enabled accurate identification of large and complex repeat regions, gene structures, and species-specific genomic SVs in genic regions. This high-quality tree shrew genome will facilitate the use of this species in both biomedical and basic research, such as the annotation and interpretation of RNA-seq data from normal and pathological tissues (Supplementary Table S1), and will allow a more comprehensive understanding of the evolution of primate-specific SVs and their potential regulatory changes (Kronenberg et al., 2018). For instance, we identified 221 SVs in the genic regions of the Chinese tree shrew genome and found that the overall pattern of SVs in the tree shrew more closely resembled that of primates than that of rodents (Supplementary Tables S2–4), further supporting the very close relationship between tree shrews and primates (Fan et al., 2013; Yao, 2017). It should be mentioned that the TS_2.0 assembly still misses many large and complex SVs due to the limitations of current sequencing technology and assembly approaches. Moreover, we did not experimentally validate the SVs between TS_1.0 and TS_2.0; such validation would offer further information for the construction of a well-defined reference genome of the Chinese tree shrew. We will continue to refine the tree shrew genome using more data in the future.
In general, the new TS_2.0 assembly filled most of the gaps and corrected most of the assembly errors present in the previous tree shrew genome (Fan et al., 2013), thereby providing better gene annotations. Detailed studies should be carried out in the future to understand the unique genetic features of the tree shrew genome, such as long tandem repeats, repeat content, and genomic SVs. In our previous study, we built TreeshrewDB (Fan et al., 2014) for easy access to the Chinese tree shrew genome data based on short-read sequencing technology (Fan et al., 2013); it has been visited frequently and used by many researchers. Here, we comprehensively updated TreeshrewDB to v2.0 based on the new high-quality reference genome (TS_2.0) generated in this study. We optimized the visualization of gene annotation and genomic variations of the tree shrew and included population genetic parameters for this species. Furthermore, the inclusion of the reported transcriptomic data from 26 tissues and cells (Supplementary Table S1) has enhanced our knowledge of mRNA expression profiles in the Chinese tree shrew. This database will be regularly updated to include recently released genetic data and will serve as a platform for data sharing among tree shrew studies and for further elucidation of the genetic features of this animal. We believe that the tree shrew genome assembly TS_2.0 and the database updates will meet the increasing needs of the field.

SUPPLEMENTARY DATA

Supplementary data to this article can be found online.
REFERENCES

1. J C Fay; C I Wu. Hitchhiking under positive Darwinian selection. Genetics, 2000-07.

2. William H Bosking; Justin C Crowley; David Fitzpatrick. Spatial coding of position and orientation in primary visual cortex. Nat Neurosci, 2002-09.

3. Ewan Birney; Michele Clamp; Richard Durbin. GeneWise and Genomewise. Genome Res, 2004-05.

4. Nicola Mulder; Rolf Apweiler. InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol Biol, 2007.

5. Nansheng Chen. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics, 2004-05.

6. Ling Xu; Dandan Yu; Yu Fan; Li Peng; Yong Wu; Yong-Gang Yao. Loss of RIG-I leads to a functional replacement with MDA5 in the Chinese tree shrew. Proc Natl Acad Sci U S A, 2016-09-12.

7. Yu Fan; Dandan Yu; Yong-Gang Yao. Tree shrew database (TreeshrewDB): a genomic knowledge base for the Chinese tree shrew. Sci Rep, 2014-11-21.

8. Yanqing Wang; Fuhai Song; Junwei Zhu; Sisi Zhang; Yadong Yang; Tingting Chen; Bixia Tang; Lili Dong; Nan Ding; Qian Zhang; Zhouxian Bai; Xunong Dong; Huanxin Chen; Mingyuan Sun; Shuang Zhai; Yubin Sun; Lei Yu; Li Lan; Jingfa Xiao; Xiangdong Fang; Hongxing Lei; Zhang Zhang; Wenming Zhao. GSA: Genome Sequence Archive. Genomics Proteomics Bioinformatics, 2017-02-02.

9. Zi-feng Yang; Jin Zhao; Yu-tong Zhu; Yu-tao Wang; Rong Liu; Sui-shan Zhao; Run-feng Li; Chun-guang Yang; Ji-qiang Li; Nan-shan Zhong. The tree shrew provides a useful alternative model for the study of influenza H1N1 virus. Virol J, 2013-04-10.

10. Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol, 2011-05-15.