Literature DB >> 35137004

The genome of the Paleogene relic tree Bretschneidera sinensis: insights into trade-offs in gene family evolution, demographic history, and adaptive SNPs.

Hai-Lin Liu^1,2,3,4, A J Harris^1,5, Zheng-Feng Wang^1,6,7,8, Hong-Feng Chen^1,5, Zhi-An Li^1,6,7,8, Xiao Wei⁹.

Abstract

Among relic species, genomic information may provide the key to inferring their long-term survival. Therefore, in this study, we investigated the genome of the Paleogene relic tree species, Bretschneidera sinensis, which is a rare endemic species within southeastern Asia. Specifically, we assembled a high-quality genome for B. sinensis using PacBio high-fidelity and high-throughput chromosome conformation capture reads and annotated it with long and short RNA sequencing reads. Using the genome, we then detected a trade-off between active and passive disease defences among the gene families. Gene families involved in salicylic acid and MAPK signalling pathways expanded as active defence mechanisms against disease, but families involved in terpene synthase activity as passive defences contracted. When inferring the long evolutionary history of B. sinensis, we detected population declines corresponding to historical climate change around the Eocene-Oligocene transition and to climatic fluctuations in the Quaternary. Additionally, based on this genome, we identified 388 single nucleotide polymorphisms (SNPs) that were likely under selection, and showed diverse functions in growth and stress responses. Among them, we further found 41 climate-associated SNPs. The genome of B. sinensis and the SNP dataset will be important resources for understanding extinction/diversification processes using comparative genomics in different lineages.

Entities: Chemical

Keywords: SNP; gene family; genome assembly; population genetics; resequencing

Mesh：

Year: 2022 PMID： 35137004 PMCID： PMC8825261 DOI： 10.1093/dnares/dsac003

Source DB: PubMed Journal: DNA Res ISSN： 1340-2838 Impact factor: 4.477

1. Introduction

Relic species are surviving members of lineages that were once widespread geographically and/or contained considerable taxonomic diversity that is lacking today. Although many relict species are currently confined to historical refugia and are endangered, they are remarkable for their persistence, whereas other related lineages or populations have become extinct. Understanding how relic species have survived historical environmental changes, including under recent anthropogenic influence, is fundamental to conservation and restoration studies. Considering many relic lineages contain only a few species or are monotypic, their genomes can provide species-specific information, such as on gene family evolution, demographic history, and adaptive single nucleotide polymorphisms (SNPs), to support long-term survival. With the rapid development of genome sequencing techniques, it is increasingly possible to obtain the whole genomes relatively easily, which benefits the understanding and conservation of relic species. The relic species, Bretschneidera sinensis Hemsl (2n = 18; Fig. 1), is a relic deciduous broad-leaved forest tree that was once a component of the boreotropical flora found throughout the Northern Hemisphere. However, in modern times, B. sinensis is regionally endemic and mainly restricted to elevations of 300–1,700 in remote mountainous areas of southern China at a latitude between 20°N and 30°N with other scattered individuals occurring in northern Vietnam, Thailand, and Myanmar. As with many relic plants, all natural population sizes of B. sinensis are small, often comprising 30 or less mature individuals in each population based on field investigations. Presently, the species is listed as a Category-I endangered species in the ‘Key List of Protection of Wild Plants in China’ and as endangered globally by the International Union for Conservation of Nature Red List., Despite its small populations and concerning conservation status, B. sinensis occupies a larger geographic breadth than many relic species, and, therefore, occurs in a relatively wide range of environments. This indicates that this species may have an evolutionary mechanism or adaptive genetic variation that supports its persistence.

Figure 1

Representative photographs of Bretschneidera sinensis, including the whole tree, flowers, and fruits.

Representative photographs of Bretschneidera sinensis, including the whole tree, flowers, and fruits. Since B. sinensis was first discovered at the end of the 19th century, its taxonomic status has been controversial.,, Within various taxonomic treatments, it has been classified into different families and orders, including Brassicales and Sapindales. However, classification systems broadly agree that the Bretschneidera genus should be regarded as monotypic because no close living relatives of the species are known to exist. Moreover, treatments also agree that B. sinensis belongs within the malvids lineage, which includes both Brassicales and Sapindales. The phylogeny using the complete chloroplast genome sequence of B. sinensis and other species of malvids further indicated that B. sinensis is genetically closest to Carica papaya. In addition to taxonomy, other studies on B. sinensis have primarily focussed on its conservation, especially to identify the factors that possibly underlie its rarity and endangered status, such as seed germination, plant growth rates,, water and light utilization,, leaf and root anatomy, reproductive strategies,, and genetic diversity., Results from these studies showed that, in the lab, the seed germination rate of B. sinensis was about 70%, but can be increased to 80% with gibberellic acid (GA3) treatment. However, the seed germination rate dramatically decreases to ∼15% in the field. With respect to growth rate, B. sinensis is a slow- to medium-growing plant exhibiting a curvilinear pattern of height gain; increasing in rate from the seeding to juvenile stage (∼0.26 m/year) but decreasing in the second and third years (∼0.18 m/year)., The growth rate gradually increases again beyond the fourth year, but at only ∼0.40 m/year, and remains lower than many other local tree species. During growth, B. sinensis requires shade for the seedling stage and full sun at maturity.,, It prefers a moist climate and is highly sensitive to drought at all stages of growth,, including during seed dormancy and germination. The reproductive strategy of B. sinensis includes typical insect pollination of extremely attractive, strongly zygomorphic white flowers containing pink or reddish veins, but its pollen grains are relatively large and not sticky and are, thus, not easily carried by insects., Moreover, B. sinensis has a stigma that excretes a relatively small amount of fluid and a small receptacle. These factors contribute to low pollination and fertilization rates in B. sinensis and therefore low fecundity. However, this species has an outcrossing breeding system through which it maintains genetic diversity, leading to cautious optimism that conservation of the species is feasible. Nevertheless, taken together, the growth and reproduction strategies of B. sinensis may lead to a lower competitive ability compared with other tree species within plant communities where it occurs, thus explaining its rarity.,, Anatomical studies of leaves and roots of B. sinensis have also shed light on the potential causes of its rarity because these two organ systems are responsible for light, water, and nutrition absorption and are, thus, tied to plant growth. In particular, studies of leaf anatomy revealed that seedlings of B. sinensis have leaves with a higher trichome density, the presence of flower-like papillae, and thinner cuticles compared with leaves of mature individuals., The trichomes, papillae, and cuticles are useful in preventing water loss from the leaf, and papillae may also function in increasing the leaf surface area, consequently enhancing the absorbance of diffuse light within the forest understory during the seedling stage. Thus, these structures appear to represent highly adaptive strategies for B. sinensis. However, studies of root anatomy revealed that the roots of B. sinensis lack root hairs. Root hairs are critical for the absorption of water and soil nutrients. Therefore, the lack of root hairs in this species may yield slower growth rates and decreased competitive ability. Although prior studies have elucidated many aspects of B. sinensis, no studies have investigated its gene family evolution, demographic history, and potential adaptive genetic variation, which could elucidate the genomic and/or environmental mechanisms of its long-term persistence and aid its restoration. Therefore, to illustrate these within B. sinensis, we generated and annotated a high-quality reference genome for the species, resequenced geographically representative populations, and mapped the resequencing data to the assembled genome to obtain SNPs. We used PacBio high-fidelity (HiFi) and high-throughput chromosome conformation capture (Hi-C) reads for reference genome assembly, transcripts from both PacBio isoform and Illumina RNA sequencing (RNA-Seq) for genome annotation, and Illumina short reads for resequencing. We believe that the B. sinensis genome and the newly identified SNPs will not only benefit conservation efforts in this species but also provide a pipeline that can be applied to elucidate the genomic basis for the modern-day relic status of other, similar species.

2. Materials and methods

Library construction and sequencing

For genome assembly, we sampled fresh leaves from one B. sinensis individual planted in the South Botanical Garden, China (SCBG, Guangzhou, China). From this leaf material, we constructed a total of three genomic libraries and two transcriptomic libraries for five different next-generation sequencing approaches. These comprised PacBio long-read libraries for circular consensus sequencing of DNA (PacBio HiFi) and full-length RNA transcript sequencing (Iso-Seq), Illumina short-read libraries for whole-genome sequencing (WGS) and RNA-Seq, and a Hi-C library for genome scaffolding. For DNA sequencing using Illumina and PacBio, we extracted genomic DNA using a modified cetyltrimethylammonium bromide protocol. For the Hi-C library, we extracted genomic DNA following cross-linkage with formaldehyde. Additionally, we extracted the total RNA from B. sinensis leaves and then synthesized them to cDNA for the Illumina RNA-Seq and Iso-Seq libraries. All library preparations and sequencing were carried out by the Annoroad Gene Technology Company (Beijing, China). Illumina short-read sequencings were performed using an Illumina HiSeq X Ten platform with 150 bp paired-end reads (PE-150 bp) and long-read sequencing with the PacBio Sequel II platform. For resequencing of B. sinensis individuals, we applied Illumina HiSeq X Ten with PE-150 bp to 154 individuals from 13 populations (Table 1 and Fig. 2A). Each sample yielded about 20 Gb of data.

Table 1

Sampling and climate information for 13 B. sinensis is populations in China

Population	Abbreviation	Longitude	Latitude	Sample size	AMT (_°C)	RH (%)	AP (mm)	E (mm)	AS (mm)	LWR (W/m²)	SWR (W/m²)
Enshi	ES	109_˚08′01″	30_˚02′54″	6	12.066	86.004	1,413.520	1,022.056	45.793	315.045	185.184
Chongyi	CY	114_˚18′30 ″	25_˚40′55″	15	16.835	85.914	1,617.485	1,202.128	28.841	346.931	195.432
Jingning	JN	119_˚38′09 ″	27_˚58′24″	15	15.289	88.085	2,152.491	1,226.724	38.463	340.070	193.905
Longquan	LQ	119_˚08′29 ″	28_˚04′29″	6	17.425	86.121	1,821.218	1,253.450	12.913	351.562	192.497
Nanping	NP	118_˚10′39 ″	26_˚38′30″	15	18.612	82.797	1,809.063	1,255.184	2.377	360.926	188.629
Huizhou	HZ	113_˚52′06 ″	23_˚37′57″	13	17.508	81.551	2,249.696	1,266.683	6.929	347.709	199.541
Longsheng	LS	109_˚55′11 ″	25_˚37′31″	15	14.477	85.583	1,896.454	1,169.204	50.396	332.110	198.686
Jinxiu	JX	110_˚12′42 ″	24_˚09′35″	15	14.594	82.132	1,694.328	1,138.587	33.598	329.956	201.576
Liping	LP	109_˚08′12 ″	26_˚13′49″	15	16.124	86.940	1,425.229	1,152.270	26.413	343.832	192.702
Pingbian	PB	103_˚41′15 ″	22_˚59′01″	8	16.874	78.295	1,589.989	1,195.990	1.225	338.621	223.474
Zhangjiajie	ZJJ	109_˚41′45 ″	29_˚39′18″	15	11.956	86.213	1,472.285	1,027.856	64.263	315.403	190.303
Taibei	TB	121_˚51′34 ″	25_˚07′34″	10	22.234	83.709	2,404.350	1,474.141	0.000	384.059	180.040
Wuzhishan	WZS	109_˚23′19 ″	18_˚58′18″	11	20.126	81.715	1,874.228	1,155.587	0.000	364.364	214.244
			Total	154

Climate variables are averaged from year 1982 to 2020 as downloaded from https://disc.gsfc.nasa.gov (last accessed 20 December 2021).

AMT: annual mean temperature; RH: relative humidity; AP: annual precipitation; E: Evapotranspiration; AS: annual snowfall; LWR: long-wave radiation; SWR: short-wave radiation.

Figure 2

Sampling locations and population structures in B. sinensis. (A) Geographic distribution of 13 B. sinensis populations in China. (B) First two components of a PCA of SNPs. (C)ADMIXTURE results for K = 2. Sampling and climate information for 13 B. sinensis is populations in China Climate variables are averaged from year 1982 to 2020 as downloaded from https://disc.gsfc.nasa.gov (last accessed 20 December 2021). AMT: annual mean temperature; RH: relative humidity; AP: annual precipitation; E: Evapotranspiration; AS: annual snowfall; LWR: long-wave radiation; SWR: short-wave radiation. After all sequencing was completed, we performed genome assembly, annotation, mapping, and SNP calling. For all data analyses, we ran programs under default settings except where otherwise indicated below.

Genome assembly and evaluation

Prior to genome assembly, we removed reads with base quality scores <30 and lengths shorter than 80 bp from the raw Illumina WGS and Hi-C datasets using Sickle v1.33 (https://github.com/najoshi/sickle, last accessed 21 June 2019). We further corrected the Illumina WGS reads in RECKONER v1.1.0 and applied the result for genome size estimation of B. sinensis in KmerGenie v1.7044 under the parameters ‘–diploid -k 141’. We also employed these reads to estimate the level of heterozygosity in the B. sinensis genome with GenomeScope 2.0 under the k-mer size of 21 and maximum k-mer coverage of 10,000. We processed the raw PacBio HiFi reads using CCS algorithm v4.2.0 (https://github.com/PacificBiosciences/ccs/releases/tag/v4.2.0, last accessed 13 August 2020) to obtain consensus reads, which we assembled in Hifiasm v0.11 (https://github.com/chhylp123/hifiasm, last accessed 26 August 2020). After assembly, we used both Pseudohaploid (https://github.com/schatzlab/pseudohaploid, last accessed 28 August 2020) and Purge_Dups v1.2.5 to examine and remove duplications in the genome. We identified areas of potential mis-assembly and determined breaks in the contigs using Scaffhic v1.1 (https://github.com/wtsi-hpag/scaffHiC, last accessed 28 August 2020) according to the Illumina Hi-C reads. Thereafter, we applied the Juicer pipeline v1.6 and 3d-dna v180114 to perform scaffolding, and we visualized the results in Juicebox v1.11.08 and manually corrected the errors. We further refined the scaffolds in MisjoinDetect (https://github.com/caixu0518/MisjoinDetect, last accessed 3 September 2020) before re-running the Juicer pipeline, 3d-dna, and juicebox. For the final scaffolded genome, we used LR_Gapcloser to close gaps, and we performed a final check for possible duplications by re-running Purge_Dups. To evaluate the quality of assembly, we used the eudicots_odb10 database within Benchmarking Universal Single-Copy Orthologs (BUSCO) v4.0.6. We also assessed quality based on the Illumina WGS reads in SQUAT through read mapping quality analytics with the parameter ‘-sample-size 200,000,000’. SQUAT uses two different alignment algorithms, BWA-MEM and BWA-backtrack, to assess mapping quality and report the percentages of uniquely mapped, multiply mapped, and unmapped reads. For uniquely mapped reads, SQUAT further classifies them into additional categories, such as those that are perfectly matched, those containing substitutions, those with mismatches at the ends (i.e. clips), and others.

Repetitive sequence annotation

To identify repeat sequences within the B. sinensis genome, we used Extensive de-novo TE Annotator (EDTA) v1.8.3 and REpeat Detector (RED) v2.0 and combined the results using the ‘merge’ command in bedtools v2.29.2. Based on the merged outcome, we masked repeat sequences using the ‘maskfasta’ command in bedtools.

Gene prediction and annotation

We performed gene prediction in LoReAn v2, an automated pipeline designed for the annotation of eukaryotic genomes. In addition to conducting ab initio gene prediction in LoReAn, we used the program for gene prediction based on RNA- and Iso-seq reads and protein sequences (Supplementary Table S1). The protein sequences were used for homology-based gene prediction. We chose them from the generally related species in the order of Brassicales and Sapindales in which B. sinensis was supposed to belong (Introduction), but more species in the former were selected considering the phylogeny shown in Huang et al. For the Iso-seq reads, we first processed them in IsoSeq v3 (https://github.com/ben-lerch/IsoSeq-3.0, last accessed 12 October, 2019) to obtain the full-length transcripts. After LoReAn prediction, we used the results as input for the funannotate pipeline v1.8.2 (https://github.com/nextgenusfs/funannotate, last accessed 19 October 2020) to acquire final integrated and consensus gene sets using the command of ‘funannotate predict’. During funannotate prediction, we applied the parameters ‘-max_intronlen 100,000 -busco_db embryophyta -organism other’. After gene prediction, we further used the funannotate pipeline for gene functional annotation with the command ‘funannotate annotate’. The annotation databases used included dbCAN v8.0, eggNOG v5.0, which includes GO (Gene Ontology), and KEGG (Kyoto Encyclopedia of Genes and Genomes), InterPro v79, MEROPS v12.2, Pfam v32.0, and UniProt v2020_02. We also performed secretome prediction in SignalP v4.1.

Orthologous gene group identification and analysis

We used OrthoFinder v2.4.0, to identify orthologous gene groups representing gene families within B. sinensis as well as in 11 other species representing malvids and species from large, related clades based on publicly available data (Supplementary Table S2). After the identification of orthologous groups, we allowed OrthoFinder to automatically select single-copy orthologs to generate the species trees. We used the species tree to obtain a dated phylogeny in TreePL,, in which we set nine-time calibration points for species pairs using their divergence time data (million years ago, MYA) from the Timetree database (http://timetree.org/, last accessed 26 October 2020; Supplementary Table S3). Based on the dated phylogeny, we estimated the expansions and contractions of orthologous gene families in CAFE v5. Prior to performing analyses in CAFE, we followed the users’ manual and removed gene families with more than 200 genes. For significantly expanded and contracted gene families, we followed up by conducting enrichment analysis according to the GO and KEGG databases using Tbtools v1.068. For significantly enriched GO terms, we visualized their inter-relationships in the Direct Acyclic Graph (DAG) with agriGO v2.0 and generated a treemap with REVIGO.

Genome comparison

We also compared the functional annotations of genes among the genomes of B. sinensis and eight additional sampled species representing Sapindales, Malvales, and Brassicales that comprised a clade with B. sinensis in phylogenetic analyses (Results). To accomplish this, we used the ‘funannotate compare’ command within the funannotate pipeline. This command applies Fisher’s exact test with a Benjamini–Hochberg correction to conduct enrichment analysis of GO terms. However, because the gene feature file (.gff3) of Brassica rapa downloaded from http://brassicadb.org (last accessed 1 November 2020), could not be parsed by the ‘funannotate compare’ command, we annotated its genome following the same procedure carried out on our assembled genome of B. sinensis and used short- (SRR12214240–SRR12214243, SRR12214246, SRR12214247, SRR12214250, and SRR12214251) and long-read RNA-seq data (SRR10259626–SRR10259628) from GenBank for RNA-seq-based prediction.

Gene duplications and syntenic block identification

We used the DupGen_finder pipeline to examine gene duplications in B. sinensis. In addition to potential whole-genome duplications (WGDs), DupGen_finder also identifies tandem duplications (TDs; separated by 5 or fewer genes), proximal duplications (PDs; separated by 10 or fewer genes), transposed duplications (TRDs; duplications mediated by transposable elements), and dispersed duplications (DSDs; random and non-neighbouring duplications) of genes (see Reference65 for detailed definitions of all types of duplications). All types of duplications were determined based on paired genes within species using an all-versus-all BLASTP search against Hevea brasiliensis, which we regarded as an outgroup. We also identified areas of homologous gene regions among chromosomes of B. sinensis using MCScan (Python version) implemented in the jcvi package and visualized them in Shinycircos.

Demographic history analysis

Based on the assembled genome and our Illumina WGS reads, we inferred the demographic history of B. sinensis using MSMC2 v2.1.3. This method of inference requires the mutation rate and generation time of species as input in addition to the genome. Because the mutation rate in B. sinensis is not available, we used the ‘evolutionary rate’, which has been shown to have little difference from the mutation rate, as a surrogate. To determine that, we first aligned the single copy orthologous genes shared among B. sinensis and C., papaya (i.e. its sister based on our reconstructed phylogeny; Results) and outgroup species Gossypium austral. We then performed Tajima’s relative rate test to examine if these genes were in the equality of evolutionary rate using the Relative-Rate-Test (https://github.com/lyy005/Relative-Rate-Test, last accessed 25 December 2021). The genes in the equality were selected and concatenated these into a single matrix, and removed gaps. Using this matrix, we calculated the proportion of different loci (D) between the sequences of B. sinensis and C. papaya and used this to estimate substitutions (k) according to the formula of k = (–3/4) ln (1–4D/3) following the Jukes and Cantor model. Thereafter, we inferred the mutation rate as k/2t, where t is the divergence time between B. sinensis and C. papaya from our dated phylogeny. We estimated the generation time of B. sinensis to be 20 years based on our field observations of when mature trees become reproductive. Moreover, in the demographic inference, we masked repeated sequences following the guidance of Patil et al.

SNP calling and genetic structure

Using the assembled genome of B. sinensis generated in this study as a reference, we applied dDocent v2.7.6, a bash pipeline, to call the SNPs in 154 individuals of the species. Within this pipeline, we used sickle v1.33 (https://github.com/najoshi/sickle, last accessed 7 April 2021) to trim the paired-end reads of each individual by removing reads with lengths of <80 bp and base quality values <30. After SNP calling, we applied the software implemented within dDocent to filter out SNPs of low quality, those deviating from Hardy–Weinberg equilibrium, and indels. We further filtered SNPs with Plink v1.9 to exclude SNPs that showed high linkage disequilibrium based on a threshold value of 0.2 for the squared correlation (R2) between all SNP pairs. Using the detected high-quality SNPs, we examined the genetic structure of B. sinensis via principal component analysis (PCA) and ADMIXTURE. We performed the PCA analysis in SNPRelate v1.24.0 and conducted ADMIXTURE in the AdmixPipe v2.0.2 pipeline. Within AdmixPipe, we performed 20 replicates for each possible genetic grouping (K) from 1 to 10, and we determined the best value of K according to cross-validation (CV) error and change of likelihood (K). For the best value of K, we applied CLUMPAK v1.1 to estimate the mean membership coefficient using 20 replicates per individual.

Genome-wide scans for local adaptation

We used two approaches to detect SNPs under selection: PCAdapt v4.3.3, and BayPass v2.2. PCAdapt indicated that two principal components were suitable for regressing SNPs, and within this, we implemented a q-value (i.e. adjusted ) < 0.01 as the threshold for determining outliers. For BayPass, we set the default parameters to estimate the XtX statistic, which is akin to Fst but accounts for the variance–covariance structure. Thereafter, we simulated pseudo-observed datasets (100,000 SNPs) to provide a calibrated threshold (99%) so that we could identify SNPs putatively under selection, i.e. adaptive SNPs. For adaptive SNPs, to detect their association with environments, we reran BayPass by only using adaptive SNPs but combining climate variables. We obtained seven climate variables (Table 1) from 1982 to 2020 from https://disc.gsfc.nasa.gov (last accessed 20 December 2021) for the specific coordinates of each population in the form of rasters at 10 km spatial resolution. Before input into BayPass, we averaged each climate variable across 39 years. After performing BayPass, we used a Redundancy Analysis (RDA) implemented in the vegan package to visualize the association results. Finally, for both adaptive and non-adaptive SNPs, we tested their isolation-by-distance (IBD) using IBD v1.5.2 by regression of pairwise genetic differentiation (Fst) with their geographic distance. We calculated pairwise Fst between populations using Pixy v1.2.5.beta1.

3. Results and discussion

Genome sequencing and assembly

The five sequenced libraries yielded ∼22 Gb of PacBio HiFi reads after performing consensus calling, ∼127 Gb of Illumina WGS reads, ∼105 Gb of Illumina Hi-C reads, ∼22 Gb of Illumina RNA-seq reads, and ∼300 Mb of PacBio Iso-seq reads after performing full-length transcript identification. The genome size of B. sinensis estimated in Kmergenie was 1,134,469,815 bp, and the genome heterozygosty rate was 0.71%. The assembled genomes in Hifiasm consisted of 1,222,797,721 bp, comprising 1,416 contigs and a contig N50 of 24,630,967 bp (Table 2). Among these contigs, 137 remained after processing in Pseudohaploid and Purge_Dups to reduce duplications. The remaining contigs were corrected and scaffolded using the Hi-C reads, and these yielded a final genome that was 1,170,884,685 with 1,120,988,409 bp (95.74%) of sequences assembled into nine chromosomes (Fig. 3A) and a scaffold N50 value of 137,464,959 bp. In the final assembled genome, the GC content was 35.80%, and the chromosomes ranged in size from 152,689,475 (Chromosome 1) to 87,593,448 bp (Chromosome 9).

Table 2

Statistics of the B. sinensis genome assembly

Statistics of HiFi assembly		Statistics of the Hi-C scaffolded assembly
Sequence length (bp)	Order of sequence length	Sequence length (bp)	Order of sequence length	Sequence length in the genome (%)
N10 = 64,553,695	L10 = 2	N10 = 152,689,475	L10 = 1	13.04
N20 = 54,398,357	L20 = 4	N20 = 151,999,468	L20 = 2	12.98
N30 = 48,448,289	L30 = 6	N30 = 144,368,824	L30 = 3	12.33
N40 = 29,147,647	L40 = 10	N40 = 140,467,444	L40 = 4	12.00
N50 = 24,630,967	L50 = 14	N50 = 137,464,959	L50 = 5	11.74
N60 = 19,150,349	L60 = 20	N60 = 103,046,003	L60 = 6	8.80
N70 = 14,033,915	L70 = 28	N70 = 102,104,410	L70 = 7	8.72
N80 = 9,569,185	L80 = 38	N80 = 101,254,378	L80 = 8	8.65
N90 = 3,151,252	L90 = 58	N90 = 87,593,448	L90 = 9	7.48
N100 = 10,839	L100 = 1,416	N100 = 1,000	L100 = 1,621
Total length	1,222,797,721 bp	1,170,884,685 bp
Average	863,557.71 bp	722,322.45 bp
Largest	89,979,610 bp	152,689,475 bp
Shortest	10,839 bp	1,000 bp

Figure 3

Features of the genome assembly (A) Hi-C interaction heat map (bin length 210,000 bp) for the B. sinensis genome. (B) Genome features across nine inferred chromosomes of B. sinensis. The grey-shaded tracks represent nine assembled chromosomes (scale: Mb); green represents repeat density; red represents gene density, and dark blue represents SNP density; and the cyan lines inside the circle represent syntenic blocks. The densities of repeats, genes, and SNPs were estimated based on a 106 bp window.

Completeness of the genome and quality evaluation

Based on BUSCO, we found that 2,270 (97.6%) of the 2,326 expected genes within the core eudicotyledons were captured, including 1,596 complete single-copy genes (68.6%) and 764 complete, duplicated genes (29.0%). There were an additional 17 genes (0.7%) that were fragmented and 39 (1.7%) appeared to be missing. SQUAT revealed that only 1% of reads were poorly mapped, and the percentages of different qualities and completeness of mapped reads are shown in Supplementary Table S4. Moreover, our simple assessment of genome integrity by mapping all Illumina WGS reads to the assembled genome using BWA-MEM indicated that 99.77% of reads were properly mapped.

Repeat annotation

EDTA and RED revealed that 60.47% and 58.38% of the genome represented repetitive regions, respectively. According to EDTA, there were more long terminal repeats (LTRs) than other kinds of repetitive sequences, accounting for 47.62% (557,597,939 bp) of the genome (Supplementary Table S5). The largest proportions of LTRs were Gypsy-like (285,140,648 bp; 24.35%) and Copia-like (185,533,829 bp; 15.85%). LTRs were followed in abundance by terminal inverted repeats (TIRs) comprising 10.21% (119,567,301 bp) of the genome. By combining the results of EDTA and RED, we determined that 788,029,873 bp (67.30%) of the assembled genome consisted of repetitive components and thus was subsequently masked. The density of repeat sequences in the genome is shown in Fig. 3B. When compared with the genome size of C. papaya, which was 372 Mb, the size of B. sinensis was almost three times larger. The two genomes contained a similar GC content, with C. papaya at 35.3% and B. sinensis at 35.80%. The B. sinensis genome contained a higher proportion of repetitive sequences than the C. papaya genome which had 51.9% repetitive sequences. However, after excluding repetitive sequences in B. sinensis, the remaining sequence size of 382,854,812 bp of B. sinensis was still larger than the C. papaya genome. Therefore, the large genome size of B. sinensis is not only caused by repetitive sequence expansion. Both genomes contained the highest proportion of Gypsy-like repetitive elements in their own genome, and the proportion (27.8%) of this element in C. papaya was higher than that in B. sinensis. However, considering 10.21% TIR elements in B. sinensis, the proportion of these repetitive elements in C. papaya was very low, with <1% of the genome. TIRs play important roles in the transposition of chromosomal fragments throughout the genome, altering gene expression and generating genetic diversity. Therefore, how TIRs influence the genetic variation and subsequently associate with the B. sinensis long-term adaptation needs further study in the future. With LoReAn, we identified 42,761 genes within the B. sinensis genome. After integrating them via the funannotate pipeline, we inferred a total of 48,870 genes encoding 53,020 proteins. Of these protein-coding genes, 44,685 (84.28%) were annotated to at least one database based on functional annotation (Supplementary Table S6).

Orthologous gene groups identification and analysis

A total of 29,808 putative gene families were identified in OrthoFinder representing 488,061 protein-encoding genes across all 13 analysed species. For 53,020 protein-encoding genes in B. sinensis, 47,025 (88.69%) of them were assigned to 17,270 (57.93%) gene families and 1,525 gene families containing 7,508 genes were specific to B. sinensis. The resulting species tree using OrthoFinder resolved B. sinensis within Brassicales and showed that it was a sister to C.papaya (Fig. 4). The divergence time between B. sinensis and C. papaya was estimated in TreePL to be 69.8 MYA.

Figure 4

A phylogenetic tree with mapped gene family contractions (–) and expansions (+) in B. sinensis and other species. The number at the root denotes the total gene families within the most recent common ancestor (MRCA). Please note that not all gene family contractions or expansions presented in the tree are significant. A time scale (in millions of years) is shown beneath the tree. Numbers of identified TPS genes with their Pfam ID PF01397 and PF03936 in malvids species are shown on the left of the tree. Ultimately, we performed analyses in CAFE on a total of 13,903 gene families after removing six that showed > 200 copies in at least one species and after CAFE removed 15,899 due to their lack of occurrence on the phylogenetic root. Among the 13,903 gene families, 5,876 families exhibited expansions, and 2,327 families exhibited contractions in B. sinensis (Fig. 4). Moreover, 11 families from the expanded families exhibited significant (P < 0.05) expansions, whereas 8 families from the contracted families underwent significant contractions. The significantly expanded and contracted families contained 219 and 17 protein-encoding genes, respectively. Biological processes of the significantly expanded gene families based on GO enrichment analysis (Supplementary Table S7) could be roughly grouped into four clusters corresponding to resistance to disease, adaption to shade, protein phosphorylation, and signalling (Fig. 5 and Supplementary Figs S1 and S2). According to DAG (Supplementary Fig. S2), the cluster related to resistance to disease converged to the genes related to positive regulation of defence response to virus by host (GO:0002230), defence response to fungus (GO:0050832), defence response, incompatible interaction (GO:0009814), response to molecule of fungal origin (GO:0002238), plant-type hypersensitive response (GO:0009626), and cellular response to salicylic acid (SA) stimulus (GO:0071446). The lowest-level GO term related to adaptation to shade was response to absence of light (GO:0009646), while protein phosphorylation itself (GO:0006468) was the lowest-level GO term related to that process, and signalling (GO:0023052) was alone in its cluster. In the Molecular Function GO category, the most abundant lowest-level GO terms representing expanded gene families were protein serine/threonine kinase activity (GO:0004674) and calmodulin binding (GO:0005516; Supplementary Fig. S3). The significantly contracted genes were all within the Biological Process category and were mostly related to responses to biotic stimuli (GO:0009607 and GO:0043207) and interspecies interactions between organisms (GO:0044419; Supplementary Table S8).

Figure 5

GO enrichment analysis of the genes in significantly expanded gene families.

GO enrichment analysis of the genes in significantly expanded gene families. The KEGG enrichment analysis revealed that significantly expanded gene families mainly had functions related to environmental adaptation, plant–pathogen interaction, the mitogen-activated protein kinase (MAPK) signalling pathway, signal transduction, and protein kinases (Supplementary Table S9). The analyses of KEGG enrichment for the significantly contracted gene families indicated that these genes were especially related to terpenoid biosynthesis, including monoterpenoid biosynthesis, terpenoid backbone biosynthesis, and metabolism of terpenoids and polyketides (Supplementary Table S10). The above results indicated that the most significantly expanded genes were largely involved in plant defences and, in particular, linked to disease resistance. Among the expanded gene families, we found that the lowest levels of the GO hierarchy were those pertaining directly to viral and fungal resistance (GO:0002230, GO:0050832, and GO:0002238), as well as hypersensitive responses (GO:0009626), cellular response to SA stimulus (GO:0071446), calmodulin binding (GO:0005516), and protein serine/threonine kinase activity (GO:0004674), all of which are closely associated with disease resistance in plants. Notably, calmodulin and SA are known to be critical within the signalling response pathways involved in plant disease resistance; specifically, calmodulin-binding proteins can affect the biosynthesis of signalling of hormones, such as SA. Additionally, serine/threonine-protein kinases are receptor-like kinases that typically contain a leucine-rich repeat structure that is used to detect biological pathogens., Moreover, plant hypersensitive responses are effective in defences against disease but come at the cost of a reduction in biomass accumulation because they cause cell death., Our KEGG enrichment analysis largely agrees with the GO enrichment results and suggests that genes related to disease resistance are expanded within B. sinensis. Among them, the MAPK signalling pathway has been demonstrated to actuate responses and resistance to plant disease. MAPK, along with SA, is a core mediator for the hypersensitive response and subsequent cell death., Genes that were over-represented in B. sinensis compared with the other sampled species were mainly related to RNA–DNA hybrid ribonuclease activity (GO:0004523; Supplementary Table S11 and Fig. S4), whereas under-represented genes were primarily associated with terpene synthase (TPS) activity (GO:0010333), zinc ion binding (GO:0008270), and adenosine diphosphate (ADP) binding (GO:0043531; Supplementary Table S11 and Fig. S5). Because the genes representing TPS activity (GO:0010333) in B. sinensis were significantly under-represented, we detected genes across species bearing TPS domains (Pfam IDs PF03936 and PF01397) and observed that B. sinensis displayed the smallest number of TPS genes (Fig. 4). The sister to B. sinensis, C. papaya, also possessed a small number of TPS genes in comparison to other species. Terpenes are important natural products with a wide range of applications in plants, but they most often serve as continuously available, passive, toxic defences against biological enemies. The low number of TPS genes in B. sinensis is unexpected because plants generally exhibit high numbers of TPS genes. Among 44 plant species summarized by Jiang et al., only Physcomitrella patens, Marchantia polymorpha, and Zostera marina, which comprise a moss, liverwort, and flowering sea grass, respectively, have smaller numbers of TPS genes than B. sinensis. Within B. sinensis, the reduced number of TPS genes could represent a trade-off between passive and active defence, where the utilization of SA and MAPK, which are enriched in the species, cause cell death and represent an active approach. The DupGen_Finder pipeline revealed 17,798 gene pairs in B. sinensis resulting from WGD, 2,949 gene pairs from TD, 834 gene pairs from PD, 5,710 gene pairs from TRD, and 8,250 gene pairs from DSD. MCScan revealed 421 syntenic blocks containing 23,017 genes and 23,306 gene pairs. The largest syntenic block size was 30,550,842 bp, and the smallest was 174,612 bp (Fig. 3B). The analysis in BUSCO also showed a relatively high level of gene duplication (29.0%). High levels of duplication in BUSCO analyses may be produced by WGD events or represent mis-assembly due to heterozygous contigs unpurged in the genome. However, we detected no heterozygous haplotigs in the Purge_Dups analysis, and DupGen_Finder indicated that the majority of the duplicated genes are attributed to WGD. Thus, WGD seems to be the most plausible explanation for the high levels of detected duplication. In contrast to B. sinensis, we found that only 0.8% of genes from the BUSCO database were duplicated within the genome of C. papaya, which is resolved as a sister to B. sinensis in our reconstructed phylogeny. The total BUSCO for the C. papaya genome was 80.7%, suggesting that incompleteness might also contribute to the lower duplication rate. To further verify that the observed rate of duplication in B. sinensis was not an artefact of sequencing, we sampled and sequenced an additional individual of the species using PacBio and Illumina DNA sequencing, as described above with assembly via the pb-assembly pipeline v0.06 (https://github.com/PacificBiosciences/pb-assembly, last accessed 6 September 2019) and Flye v2.8-b1674., The new genome comprised approximately 73 Gb of PacBio long reads and 145 Gb of Illumina short reads. After removing duplications using Pseudohaploid and Purge_Dups (Materials and methods), analyses in BUSCO for the pb-assembly revealed 73.7% completeness for single-copy genes and 23.4% for duplicated genes. Similarly, the Flye assembly showed 76.1% completeness of single-copy genes and 15.2% completeness of duplicated genes. Although the Flye assembly had relatively low gene duplication, the genome was relatively complete (91.3% vs. 97.1% of pb-assembly). Thus, these results suggest that our detection of duplication in our original result was not artificial and represented true WGD events.

Demographic history

We identified a total of 2,311 orthologous genes shared among B. sinensis, C. papaya, and G. australe. After performing a Relative-Rate-Test, 856 genes remained. By concatenating them, the aligned length of the sequence was 761,782 bp. We found a total of 130,040 SNPs between B. sinensis and C. papaya, and we estimated the mutation rate to be 1.39 × 10−9. Therefore, to determine the demographic history of B. sinensis, we applied this rate plus the estimated generation time of 20 years within MSMC2. This approach revealed substantial demographic fluctuations in the species over time (Fig. 6). In particular, we observed an initial clear population size decline from ca. 40 to 20 MYA. Thereafter, the population size increased slightly, but declined again around 3 MYA. After another slight recovery, it declined dramatically around 0.5 MYA to the present rare state.

Figure 6

Demographic history of B. sinensis shows the historical changes in effective population size. Arrows indicate large declines in population size that are described in the text.

Demographic history of B. sinensis shows the historical changes in effective population size. Arrows indicate large declines in population size that are described in the text. The first decline happened around 40 MYA, coinciding with a major, global climatic cooling trend around the Eocene-Oligocene transition, and the second major decline happened around 0.5 MYA, corresponding to the Naynayxungla glaciation (0.5–0.72 MYA), one of the most extensive glaciation events during the Quaternary Period. Similar two-step declines in population size have also been detected in other Paleogene relic species, such as Ginkgo biloba, which shows distribution overlap with B. sinensis in China. The timing of population declines in G. biloba are not fully consistent with those we found for B. sinensis, and this could be due to differences in analytical methods,, especially that repetitive sequences were not removed in the analysis of G. biloba.

SNP calling and genetic structure analysis

We obtained 1,284,251 SNPs from 154 B. sinensis individuals. After removing 388 SNPs putatively under selection (see below), 1,283,863 remained for genetic structure analyses. Based on these SNPs, a PCA revealed that the first principal component divided 154 B. sinensis samples into western populations (JX, LS, LP, ZJJ, ES, and PB) and eastern ones (WZS, HZ, CY, NP, LQ, JN, and TB; Fig. 2B). The second principal component further separated the TB (Taiwan) population from the other six populations among the eastern groups. In the ADMIXTURE analysis, the CV errors decreased constantly from K = 1 to K = 10, but the main decrease occurred at K = 2 (Supplementary Fig. S6A). The K analysis also indicated that the optimal K value was 2 (Supplementary Fig. S6B). When K = 2, the 154 samples were also divided into western and eastern groups according to their geographic distribution (Fig. 2C), which agrees with the first principal component in the PCA. Our results are consistent with a previous phylogeographic study of B. sinensis based on three chloroplast DNA fragments. That study proposed that the western group of populations occurs in an area (i.e. the Yungui Plateau) that was less influenced by Neogene and Quaternary glaciations than other geographic areas of China, whereas the eastern group may have survived in one or more refugia. However, the prior study did not include the WZS population from Hainan because it was only discovered recently. Here, we included WZS and found that it was within the eastern group. Although both the WZS and TB populations were separated from mainland China by the Qiongzhou and Taiwan straits, respectively, the TB population showed higher divergence from the other eastern populations than the WZS population based on the second principal component of our PCA (Fig. 2B). The timing of the formation of the Qiongzhou and Taiwan straits is still uncertain. Nevertheless, our results indicate that the Taiwan Strait might have formed earlier than the Qiongzhou Strait, leading to longer isolation of the TB population and hence greater divergence. PCAdapt identified 259,076 SNPs putatively under selection, whereas BayPass identified 7,182, and of these, 388 SNPs were identified according to both methods. For the shared SNPs, 94 were in exonic regions of 59 genes, and 22 were in the 3 prime untranslated regions of 11 genes (Supplementary Table S12). These genes were presumably related to diverse functions such as involvement in the growth of B. sinensis (e.g. PLIM2B_1, JHS1, and MES17 genes) and stress response (e.g. VTE5, RH1_3, and CYCL11_1genes) based on their annotations established in this study. These genes have been characterized in Arabidopsis thaliana and other model species, in which PLIM2B plays a crucial role in actin configuration during pollen germination and tube growth,, whereas JHS1 plays an important role in DNA replication and damage repair, meristem maintenance, and development.MES17 is a plant hormone-related gene and efficiently and specifically hydrolyzes MeIAA (Methyl Indole-3-Acetic Acid) to IAA, while IAA is known to be important in both root and shoot development in plants., Among the genes related to stress response, VTE5 encodes phytol kinase and is involved in tocopherol production, and in A. thaliana, it is related to resistance to root-knot nematodes, as well as to high-light and high-temperature stress tolerance in tomato.RH1 is in the DEAD-box-containing RNA helicase family, which modulates the secondary and tertiary structure of RNA. Previous studies indicated that this gene family was involved in many stress responses, such as to disease, drought, salinity, cold, and oxidation.,CYCL11_1 encodes a cognate cyclin for cyclin-dependent kinase G1 and G2 and modulates flowering time in response to temperature. The fact that all of these stress response and growth-related genes are under selection in B. sinensis indicates that the species still harbours diverse adaptive potential to potentially support its persistence. Among 388 adaptive SNPs, BayPass identified 41 SNPs that were significantly associated with climate variables. RDA indicated that they were mainly correlated with short-wave radiation (SWR) and long-wave radiation (LWR), in contrast to relative humidity (RH) and annual snowfall (AS). Both LWR and SWR are energy source and related to temperature., These results indicated that the significantly selected SNPs in climate were resistant to low humidity and high temperature. Bretschneiderasinensis is a subtropical species that grows in a cool and humid climate, where it experiences frost but not excessive heat. The association of SNPs and populations in RDA (Fig. 7B), which showed most populations clustered together along the RH and AS axis clearly confirmed this.

Figure 7

RDA shows the association of climate variables with (A) SNPs and (B) individuals. The vectors indicate the climate variables of the first two RDA components. IBD shows the association between genetic differentiation () and geographic distance using (C) non-adaptive and (D) adaptive SNPs in B. sinensis. ‘**’ means P < 0.01. In total 9 of the 41 climate-associated SNPs were identified in two genes (Fig. 7A), BLS_040253-T1 and BLS_045871-T1. Annotation analysis indicated that these two genes were SKIP2 and NAP1 (Supplementary Table S12). SKIP2 is an F-box family protein that is involved in the protein ubiquitination pathway. In plants, F-box proteins are one of the largest families, playing a pivotal role in plant growth and development and adverse conditions adaptation.,SKIP2, in particular, is related to root development.Nuclesome assembly protein 1 (NAP1), belongs to a family of histone chaperones. Because plants are sessile and vulnerable to stressful environments causing DNA damage, NAP1 combined with the other proteins modulates the repair of DNA damage, which is important to the maintenance of chromatin architecture facilitating plant genome stability and normal growth., However, how these SNPs alter the gene function promoting climate adaptation in B. sinensis needs further experimental confirmation. We detected significant IBD for both adaptive and non-adaptive SNPs in overall populations and the western group, but not in the eastern group (Fig. 7C and D). One possible reason for the lack of IBD in the eastern group could be attributed to spatial expansion and genetic interexchange among different refugia eroding the IBD effect. Furthermore, although adaptive SNPs are expected to correlate with environments or the other selection factors, they have also been observed following IBD in plants due to the correlation of geographic distance with selection factors. In this study, for all climate variables except AS, we observed that they were significantly correlated with geographic distance (Mantel test, P < 0.05), suggesting that both IBD and isolation-by-environment influenced the genetic diversities of B. sinensis.

4. Conclusions

As the sole species in the Bretschneidera genus and a Paleogene relic tree species, chromosome-level genome assembly and resequencing for adaptive SNP identification analysis provided important reference information for phylogeny, genetics, evolution, and the endangerment mechanism of B. sinensis. Demographic history dynamics revealed that the effective population size was influenced by historical events, and population genetic results will help in the study of current survival and evolutionary potential, species distribution, and migration.

Data availability

We deposited the sequenced reads to NCBI Sequence Read Archive under the accession number SRR12656547 for the PacBio HiFi reads, SRR12666035 for the Illumina WGS reads, SRR12548980 for the Illumina Hi-C reads, SRR13013654 for the Illumina RNA-seq reads, SRR13013685 for the PacBio Iso-seq reads, and SRR14234257–SRR14234410 for the 154 resequencing reads. The high-quality, assembled genome was submitted to GenBank under the accession number GCA_018105755.1. The assembly, repeats and gene annotation, SNPs, and raw outputs for syntenic block, DupGen_fin der and OrthoFinder are available at https://doi.org/10.6084/m9.figshare.15057996.v1.

Authors’ contributions

H.-F.C. and Z.-F.W. designed the study. H.-F.C. funded genome sequencing. Z.-F.W. performed computational analyses. H.-L.L., A.J.H., Z.-F.W., H.-F.C., Z.-A.L., and X.W. wrote the article. Click here for additional data file.

100 in total

Review 1. MAPK cascades in plant defense signaling.

Authors: S Zhang; D F Klessig
Journal: Trends Plant Sci Date: 2001-11 Impact factor: 18.313

2. The DEAD box RNA helicase family in Arabidopsis thaliana.

Authors: S Aubourg; M Kreis; A Lecharny
Journal: Nucleic Acids Res Date: 1999-01-15 Impact factor: 16.971

3. Arabidopsis LIM proteins: a family of actin bundlers with distinct expression patterns and modes of regulation.

Authors: Jessica Papuga; Céline Hoffmann; Monika Dieterle; Danièle Moes; Flora Moreau; Stéphane Tholl; André Steinmetz; Clément Thomas
Journal: Plant Cell Date: 2010-09-03 Impact factor: 11.277

4. Global cooling during the eocene-oligocene climate transition.

Authors: Zhonghui Liu; Mark Pagani; David Zinniker; Robert Deconto; Matthew Huber; Henk Brinkhuis; Sunita R Shah; R Mark Leckie; Ann Pearson
Journal: Science Date: 2009-02-27 Impact factor: 47.728

5. GenomeScope: fast reference-free genome profiling from short reads.

Authors: Gregory W Vurture; Fritz J Sedlazeck; Maria Nattestad; Charles J Underwood; Han Fang; James Gurtowski; Michael C Schatz
Journal: Bioinformatics Date: 2017-07-15 Impact factor: 6.937

6. Essential role for phytol kinase and tocopherol in tolerance to combined light and temperature stress in tomato.

Authors: Livia Spicher; Juliana Almeida; Katharina Gutbrod; Rosa Pipitone; Peter Dörmann; Gaétan Glauser; Magdalena Rossi; Felix Kessler
Journal: J Exp Bot Date: 2017-12-16 Impact factor: 6.992

7. InterPro in 2019: improving coverage, classification and access to protein sequence annotations.

Authors: Alex L Mitchell; Teresa K Attwood; Patricia C Babbitt; Matthias Blum; Peer Bork; Alan Bridge; Shoshana D Brown; Hsin-Yu Chang; Sara El-Gebali; Matthew I Fraser; Julian Gough; David R Haft; Hongzhan Huang; Ivica Letunic; Rodrigo Lopez; Aurélien Luciani; Fabio Madeira; Aron Marchler-Bauer; Huaiyu Mi; Darren A Natale; Marco Necci; Gift Nuka; Christine Orengo; Arun P Pandurangan; Typhaine Paysan-Lafosse; Sebastien Pesseat; Simon C Potter; Matloob A Qureshi; Neil D Rawlings; Nicole Redaschi; Lorna J Richardson; Catherine Rivoire; Gustavo A Salazar; Amaia Sangrador-Vegas; Christian J A Sigrist; Ian Sillitoe; Granger G Sutton; Narmada Thanki; Paul D Thomas; Silvio C E Tosatto; Siew-Yit Yong; Robert D Finn
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

8. Resequencing 545 ginkgo genomes across the world reveals the evolutionary history of the living fossil.

Authors: Yun-Peng Zhao; Guangyi Fan; Ping-Ping Yin; Shuai Sun; Ning Li; Xiaoning Hong; Gang Hu; He Zhang; Fu-Min Zhang; Jing-Dan Han; Ya-Jun Hao; Qiwu Xu; Xianwei Yang; Wenjie Xia; Wenbin Chen; Han-Yang Lin; Rui Zhang; Jiang Chen; Xiao-Ming Zheng; Simon Ming-Yuen Lee; Joongku Lee; Koichi Uehara; Jian Wang; Huanming Yang; Cheng-Xin Fu; Xin Liu; Xun Xu; Song Ge
Journal: Nat Commun Date: 2019-09-13 Impact factor: 14.919

9. The Pfam protein families database in 2019.

Authors: Sara El-Gebali; Jaina Mistry; Alex Bateman; Sean R Eddy; Aurélien Luciani; Simon C Potter; Matloob Qureshi; Lorna J Richardson; Gustavo A Salazar; Alfredo Smart; Erik L L Sonnhammer; Layla Hirsh; Lisanna Paladin; Damiano Piovesan; Silvio C E Tosatto; Robert D Finn
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971