Jian Xu, Jiong-Tang Li, Yanliang Jiang, Wenzhu Peng, Zongli Yao, Baohua Chen, Likun Jiang, Jingyan Feng, Peifeng Ji, Guiming Liu, Zhanjiang Liu, Ruyu Tai, Chuanju Dong, Xiaoqing Sun, Zi-Xia Zhao, Yan Zhang, Jian Wang, Shangqi Li, Yunfeng Zhao, Jiuhui Yang, Xiaowen Sun, Peng Xu.
Abstract
The Amur ide (Leuciscus waleckii) is a cyprinid fish that is widely distributed in Northeast Asia. The Lake Dali Nur population inhabits one of the most extreme aquatic environments on Earth, with alkalinity of up to 50 mmol/L (pH 9.6). It thus provides an exceptional model for characterizing the mechanisms of genomic evolution underlying adaptation to extreme environments. Here, we developed a reference genome assembly for L. waleckii from Lake Dali Nur. Intriguingly, we identified unusually expanded long terminal repeats (LTRs) with higher nucleotide substitution rates than in many other teleosts, suggesting their more recent insertion into the L. waleckii genome. We also identified expansions in genes encoding egg coat proteins and natriuretic peptide receptors, which may underlie adaptation to extreme environmental stress. We further sequenced the genomes of 10 additional individuals from a freshwater population and 18 from the Lake Dali Nur population, and detected a total of 7.6 million SNPs across both populations. In a genome scan comparing these two populations, we identified a set of genomic regions under selective sweeps. These regions harbor genes involved in ion homeostasis, acid-base regulation, the unfolded protein response, reactive oxygen species elimination, and urea excretion. Our findings provide comprehensive insight into the genomic mechanisms underlying the adaptation of teleost fish to extreme alkaline environments.
Keywords: Leuciscus waleckii; acid–base regulation; adaptation; alkaline; genome; urea excretion
Year: 2016 PMID: 28007977 PMCID: PMC5854124 DOI: 10.1093/molbev/msw230
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Summary of Genome Assembly.
| Genome assembly | N50 (size/number) | N90 (size/number) | Longest (kb) | Total length |
|---|---|---|---|---|
| Contigs | 37.3 kb/5,701 | 9.6 kb/20,566 | 303.5 | 738 Mb |
| Scaffolds | 447.7 kb/477 | 95.1 kb/1,840 | 3,277.6 | 752 Mb |

| Chromosomes | Scaffolds used | Total length |
|---|---|---|
| 24 pseudo-chromosomes | 538 | 510.2 Mb (74%) |

| Repetitive elements | Number | Total length (Mb) | Percentage of genome (%) |
|---|---|---|---|
| Total | 1,652,292 | 284 | 37.8 |
| DNA transposons | 618,675 | 118 | 16.0 |
| Retroelements | 139,335 | 43 | 5.7 |

| Genes | Number |
|---|---|
| Total | 23,560 |
| Annotated | 23,068 |
| Un-annotated | 492 |
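The N50 and N90 columns give two values, written as size/number. The size is the length of the shortest sequence among the longest sequences that together cover 50% (or 90%) of the total assembly length. The number is how many sequences that set contains.

The sketch below is a minimal Python version of this standard definition. It assumes the assembly's own total length is used as the reference, rather than an estimated genome size (that variant is called NG50). The function name is illustrative, and the contig lengths in the example are hypothetical, not data from this assembly.

```python
from typing import List, Tuple


def nx_stat(lengths: List[int], fraction: float) -> Tuple[int, int]:
    """Return (Nx length, number of sequences) for 0 < fraction <= 1.

    Sequences are sorted longest first and their lengths summed until the
    running total reaches `fraction` of the total assembly length.
    """
    if not lengths or not 0 < fraction <= 1:
        raise ValueError("need at least one sequence and 0 < fraction <= 1")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    # Defensive fallback; not reached for valid input
    return ordered[-1], len(ordered)


contigs = [100, 80, 60, 40, 20]  # hypothetical contig lengths
print(nx_stat(contigs, 0.5))  # N50 -> (80, 2)
print(nx_stat(contigs, 0.9))  # N90 -> (40, 4)
```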
Fig. 1. Repetitive element content in L. waleckii. Age distribution of repetitive elements in (a) L. waleckii and (b) D. rerio. The average number of substitutions per site for each repeat fragment was estimated using the Jukes-Cantor model. Substitution levels correlate with the ages of the repetitive elements. LINE, long interspersed element; SINE, short interspersed element; LTR, long terminal repeat.
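The Jukes-Cantor correction converts the observed proportion p of differing sites into an estimated number of substitutions per site: d = -(3/4) ln(1 - 4p/3). This accounts for multiple hits at the same site.

The sketch below is a minimal Python version. It assumes each repeat copy has already been aligned to a consensus sequence and that gapped positions are skipped. Both assumptions are illustrative, since tools differ in how they treat gaps and the exact pipeline is not specified here. The function names and the toy sequences are hypothetical.

```python
import math


def p_distance(copy: str, consensus: str) -> float:
    """Fraction of aligned, ungapped sites where a repeat copy differs from its consensus."""
    compared = mismatched = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps (an assumption, see above)
        compared += 1
        mismatched += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatched / compared


def jukes_cantor(p: float) -> float:
    """Substitutions per site under Jukes-Cantor: d = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); the correction saturates at 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned repeat copy vs. consensus: 2 mismatches in 10 sites
p = p_distance("ACGTTCGTAA", "ACGTACGTAC")
print(p, round(jukes_cantor(p), 4))  # 0.2 0.2326
```

Because d grows faster than p as divergence increases, older repeat copies are pushed further right in an age-distribution plot than their raw mismatch rate alone would place them.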
Fig. 2. Comparison of evolutionary features of L. waleckii and other vertebrates, and expanded gene families. (a) Phylogenetic tree and numbers of gene families undergoing expansion (red) or contraction (green). My, million years ago. (b) Venn diagram showing unique and overlapping gene families in L. waleckii, G. aculeatus, O. latipes, C. idellus, and D. rerio. (c) Phylogenetic tree of ZP proteins in vertebrates showing gene expansion in the L. waleckii genome. (d) Phylogenetic tree of VMO1 proteins in vertebrates showing gene expansion in the L. waleckii genome. (e) Phylogenetic tree of NPR proteins in vertebrates showing gene expansion in the L. waleckii genome. The proteins of L. waleckii are marked with red stars in (c), (d) and (e).
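Each region of a Venn diagram like panel (b) holds the gene families found in exactly one combination of species. The sketch below shows that partition in Python. It assumes gene families have already been clustered across species; the clustering method is not described here. The family IDs are hypothetical, and only three of the five species are included, to keep the example small.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(families: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group gene families by the exact set of species that contain them.

    Each key is one Venn region (a species combination); its value holds the
    families present in every species of that combination and in no other.
    """
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for family in set().union(*families.values()):
        owners = frozenset(sp for sp, fams in families.items() if family in fams)
        regions.setdefault(owners, set()).add(family)
    return regions


# Hypothetical family IDs for three of the five species
toy = {
    "L. waleckii": {"fam1", "fam2", "fam3"},
    "D. rerio": {"fam1", "fam2"},
    "C. idellus": {"fam1", "fam4"},
}
for species, fams in sorted(venn_regions(toy).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(species), sorted(fams))
```

The families unique to L. waleckii are the region keyed by that species alone. Families shared by all species form the central region.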
Fig. 3. Population genetics and genomic regions under selective sweeps. (a) Maximum-likelihood phylogenetic tree of the freshwater (FW, n = 10) and Lake Dali Nur alkaline (ALK, n = 18) populations. Blue dots represent FW individuals, whereas red triangles represent ALK individuals. (b) Population structure. The length of each colored segment represents the proportion of an individual's genome derived from each of K = 2 ancestral populations. (c) Distribution of π ratios (FW/ALK) and Fst values, calculated in non-overlapping 10 kb windows (10 kb step). Data points to the right of the vertical dashed line (the 5% right tail of the empirical π ratio distribution, π ratio = 2.988) and above the horizontal dashed line (the 5% right tail of the empirical Fst distribution, Fst = 0.147) were identified as selected regions for the ALK population (red points). (d) Selective sweeps on five selected genes. The π ratios, Fst values and Tajima's D values were plotted using 10 kb sliding windows. Genomic regions above the red dashed line (top 5% of Fst values, Fst = 0.147) and above the black dashed line (top 5% of π ratios, π ratio = 2.988) were designated as regions under strong selective sweeps in the ALK population (grey regions). Genome annotations are shown at the bottom (black bar, coding sequences (CDS); blue bar, genes). The boundaries of CAII, CDC42, SLC4A3, KCNH2A, and NPR3 are marked in red.
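Panels (c) and (d) flag windows that fall in the top 5% of both the π ratio (FW/ALK) and the Fst distributions. Reduced diversity in ALK relative to FW, combined with high differentiation, is the footprint expected of a sweep in the ALK population.

The sketch below implements that joint-tail rule in Python under stated assumptions:

- Fst is taken as a precomputed per-window input, because the estimator is not specified here.
- Per-window π uses the standard per-site diversity 2k(n - k)/(n(n - 1)) summed over SNPs. Here n is the number of haplotypes (twice the number of diploid individuals).
- All function names are hypothetical.

The 2.988 and 0.147 cutoffs quoted above come from the real data and are not reproduced by toy input.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def windowed_pi(positions: Sequence[int], alt_counts: Sequence[int],
                n_haplotypes: int, window: int = 10_000) -> Dict[int, float]:
    """Per-bp nucleotide diversity in non-overlapping windows (0-based positions).

    Each SNP with k alternate alleles among n haplotypes contributes
    2k(n - k) / (n(n - 1)); contributions are summed per window and divided by
    the window length. Windows without SNPs (pi = 0) are absent from the result.
    """
    n = n_haplotypes
    totals: Dict[int, float] = defaultdict(float)
    for pos, k in zip(positions, alt_counts):
        totals[pos // window] += 2.0 * k * (n - k) / (n * (n - 1))
    return {w: s / window for w, s in totals.items()}


def top_tail_cutoff(values: Sequence[float], tail: float = 0.05) -> float:
    """Smallest value within the top `tail` fraction (at least one value)."""
    k = max(1, round(tail * len(values)))
    return sorted(values, reverse=True)[k - 1]


def sweep_windows(pi_fw: Sequence[float], pi_alk: Sequence[float],
                  fst: Sequence[float], tail: float = 0.05) -> List[int]:
    """Indices of windows in the top tail of both pi_FW/pi_ALK and Fst."""
    # The pi ratio is undefined where the ALK window has zero diversity; skip those
    usable = [i for i, a in enumerate(pi_alk) if a > 0]
    if not usable:
        return []
    ratios = {i: pi_fw[i] / pi_alk[i] for i in usable}
    ratio_cut = top_tail_cutoff(list(ratios.values()), tail)
    fst_cut = top_tail_cutoff(fst, tail)
    return [i for i in usable if ratios[i] >= ratio_cut and fst[i] >= fst_cut]
```

Requiring both tails, rather than either one, trades sensitivity for fewer false positives. A window with high Fst but similar diversity in both populations, or with low ALK diversity but little differentiation, is not flagged.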
Fig. 4. Selective sweeps on urea excretion genes. (a) Selective sweeps on three selected genes. The π ratios, Fst values and Tajima's D values were plotted using 10 kb sliding windows. Genomic regions above the red dashed line (top 5% of Fst values, Fst = 0.147) and above the black dashed line (top 5% of π ratios, π ratio = 2.988) were designated as regions under strong selective sweeps in the ALK population (grey regions). Genome annotations are shown at the bottom [black bar, coding sequences (CDS); blue bar, genes]. The boundaries of NAGS, CPS1 and URICASE are marked in red. (b) Schematic of the urea cycle. CPS1 and NAGS, which lie in regions under selective sweeps, are marked in red and green, respectively.
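Tajima's D is plotted alongside the π ratio and Fst. It compares the mean number of pairwise differences (π) with Watterson's estimate S/a1, where S is the number of segregating sites.

The sketch below implements Tajima's (1989) standard formula for a single window in Python. It assumes the per-window inputs are precomputed:

- the sample size n, counted in haplotypes;
- the segregating-site count S;
- π summed over the window, not per bp.

It is not necessarily the software the authors used, and the function name is hypothetical.

```python
import math


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D for one window.

    n: number of sampled sequences (haplotypes), at least 4
    seg_sites: number of segregating sites S in the window
    pi: mean pairwise differences summed over the window (not per bp)
    Returns NaN when S == 0, where D is undefined.
    """
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if seg_sites == 0:
        return math.nan
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = seg_sites
    # Difference between the two theta estimates, scaled by its standard deviation
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A selective sweep removes variation, and the recovering region carries an excess of rare alleles. Windows under a recent sweep therefore tend to show negative D.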