Mingzhou Li1, Lei Chen2, Shilin Tian1,3, Yu Lin3, Qianzi Tang1, Xuming Zhou4, Diyan Li1, Carol K L Yeung3, Tiandong Che1, Long Jin1, Yuhua Fu1,5, Jideng Ma1, Xun Wang1, Anan Jiang1, Jing Lan2, Qi Pan3, Yingkai Liu1, Zonggang Luo2, Zongyi Guo2, Haifeng Liu1, Li Zhu1, Surong Shuai1, Guoqing Tang1, Jiugang Zhao2, Yanzhi Jiang1, Lin Bai1, Shunhua Zhang1, Miaomiao Mai1, Changchun Li5, Dawei Wang3, Yiren Gu6, Guosong Wang1,7, Hongfeng Lu3, Yan Li3, Haihao Zhu3, Zongwen Li3, Ming Li8, Vadim N Gladyshev4, Zhi Jiang3, Shuhong Zhao5, Jinyong Wang2, Ruiqiang Li3, Xuewei Li1.
Abstract
Uncovering genetic variation through resequencing is limited by the fact that only sequences with similarity to the reference genome are examined. Reference genomes are often incomplete and cannot represent the full range of genetic diversity as a result of geographical divergence and independent demographic events. To more comprehensively characterize genetic variation of pigs (Sus scrofa), we generated de novo assemblies of nine geographically and phenotypically representative pigs from Eurasia. By comparing them to the reference pig assembly, we uncovered a substantial number of novel SNPs and structural variants, as well as 137.02-Mb sequences harboring 1737 protein-coding genes that were absent in the reference assembly, revealing variants left by selection. Our results illustrate the power of whole-genome de novo sequencing relative to resequencing and provide valuable genetic resources that enable effective use of pigs in both agricultural production and biomedical research.
Year: 2016 PMID: 27646534 PMCID: PMC5411780 DOI: 10.1101/gr.207456.116
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Comparison of SNP calling between the assembly-versus-assembly method and resequencing approaches based on read mapping. The Venn diagram with colors corresponding to the bar chart shows the sharing of identified SNPs among the assembly-versus-assembly method and two resequencing algorithms as implemented in SAMtools and GATK. An average of 4.25 M SNPs per breed were specifically identified by the assembly-versus-assembly method (marked as yellow), while only 0.24 k SNPs per breed were categorized by resequencing approaches (marked as red). A significant fraction of the detected SNPs by SAMtools (8.11 M per individual) and GATK (7.77 M per individual) was coincident (7.41 M; or 91.24% of SAMtools and 95.34% of GATK) (Supplemental Fig. S8).
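The sharing summarized in the Venn diagram can be illustrated with plain set operations. The following is a minimal sketch, not the authors' pipeline: it assumes each call set has already been reduced to hashable `(chromosome, position, alternate allele)` tuples, and the function names `venn3_counts` and `concordance` are hypothetical.

```python
def venn3_counts(assembly, samtools, gatk):
    """Count SNPs in each occupied region of a three-set Venn diagram.

    Keys are alphabetically sorted tuples naming the call sets that
    contain a SNP, e.g. ("assembly",) for assembly-specific SNPs.
    """
    sets = {
        "assembly": set(assembly),
        "samtools": set(samtools),
        "gatk": set(gatk),
    }
    counts = {}
    for snp in set().union(*sets.values()):
        key = tuple(sorted(name for name, s in sets.items() if snp in s))
        counts[key] = counts.get(key, 0) + 1
    return counts


def concordance(calls_a, calls_b):
    """Return the shared fraction of each call set, as (of A, of B)."""
    a, b = set(calls_a), set(calls_b)
    shared = len(a & b)
    return shared / len(a), shared / len(b)


# Toy data for illustration only (not taken from the paper).
assembly = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr2", 50, "G")}
samtools = {("chr1", 100, "A"), ("chr1", 300, "C")}
gatk = {("chr1", 100, "A"), ("chr1", 300, "C"), ("chr2", 50, "G")}

counts = venn3_counts(assembly, samtools, gatk)
frac_samtools, frac_gatk = concordance(samtools, gatk)
```

The two values returned by `concordance` play the role of the "91.24% of SAMtools and 95.34% of GATK" figures in the caption: the same shared count divided by each caller's own total.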
Figure 2. Genomic variation between Chinese and European pigs. (A) Geographic locations of the original pig breeds. The Duroc (donor of the reference genome; it is denoted by a star) and Hampshire pigs were developed mainly in North America but originated in Europe. (B) Neighbor-joining phylogenetic tree, number of SNPs, transition/transversion ratio (Ts/Tv), heterozygous SNP ratio, patterns of regions of homozygosity (ROHs), and length and number of indels in the 10 breeds (left to right). Violin plots of the heterozygous SNP ratio and Ts/Tv ratio were generated using nonoverlapping 1-Mb windows (the medians are shown). For ROH, the circled area indicates the total length of ROHs in each breed. (C) Pairwise genomic similarity of Chinese and European pigs by identical score (IS) values within each 10-kb window across the genome (n = 259,511).
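To make the windowed Ts/Tv summary in panel B concrete, here is a small sketch of how a per-window ratio and its median could be computed. It assumes biallelic SNPs supplied as `(chromosome, 1-based position, ref, alt)` tuples; the function name `windowed_ts_tv` and the choice to skip windows without transversions are assumptions of this example, not details given in the source.

```python
from collections import defaultdict
from statistics import median

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def windowed_ts_tv(snps, window=1_000_000):
    """Return {(chrom, window_index): Ts/Tv} for nonoverlapping windows.

    Windows with no transversions are skipped to avoid division by zero
    (an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # [transitions, transversions]
    for chrom, pos, ref, alt in snps:
        key = (chrom, (pos - 1) // window)  # 1-based position to window index
        if (ref, alt) in TRANSITIONS:
            counts[key][0] += 1
        else:
            counts[key][1] += 1
    return {key: ts / tv for key, (ts, tv) in counts.items() if tv > 0}


# Toy data for illustration only (not taken from the paper).
snps = [
    ("chr1", 10, "A", "G"),
    ("chr1", 20, "C", "T"),
    ("chr1", 30, "A", "C"),
    ("chr1", 1_000_001, "G", "A"),
    ("chr1", 1_500_000, "G", "T"),
]
ratios = windowed_ts_tv(snps)
median_ratio = median(ratios.values())
```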
Figure 3. Identification of breed-specific selective sweeps. (A) Number of homozygous SNPs in breed-specific selected regions. Of 74.21 k homozygous SNPs in 20.10 Mb selected regions, 65.75 k (88.60%) were unique to a particular breed, which was highly concentrated in a small fraction (0.79%) of the genome and likely contributed to diversifying selection. (B) Selective sweep regions identified in the Rongchang pig. (Top panels, top half) Genes residing within or in the vicinity (±5 kb) of the selected regions are presented for each chromosome and ordered according to their locations. (Top panels, lower half) Degree of haplotype sharing of selected regions in pairwise comparisons among the 10 breeds. Homozygous SNP frequencies in individual breeds were used to calculate identity scores in 10-kb windows. Boxes (left) indicate pairwise comparison presented on that row (E, European pigs; C, Chinese pigs) according to the color assigned to each pig breed (right). Heat map colors indicate identity scores. (Middle panel) Percentage stacked column showing RSD values in the Rongchang-specific selected regions across 10 breeds sequenced. Rongchang showed predominantly higher RSD values than other breeds, indicating that only this breed has SNPs compared to the reference genome in this region. (Bottom half) RSD in 10-kb windows for Rongchang plotted along chromosomes. Black lines indicate selected regions (FDR < 0.05). Nine selected genes orthologous to the mammalian fat deposition genes are marked in red.
Summary of missing sequences and genes of the reference genome (Sscrofa10.2)
Figure 4. Details of assembled ALPK3 gene and selected variants. (A) Structure of assembled ALPK3. (Top panel) The interassembly collinear genes (colored rectangles) among 10 assemblies are linked by gray lines, and the genes not present in all 10 assemblies are marked in black. ALPK3 is denoted by a circle. Different scaffolds are shown as alternating white and gray backgrounds. (Bottom panel) Comparison of structure of ALPK3 among the 10 assemblies. Boxes and lines indicate exons and introns, respectively. (B) Coverage and depth for the longest gene model of ALPK3 (Gene ID: RCGENE17759) by cross-mapping reads from paired-end DNA libraries (insert sizes of 180 and 500 bp) of the 10 assemblies. The higher coverage depth (≥30×) suggests slightly different structures of ALPK3, which is attributable to limitations of short read assembly; as such, the longest gene model is considered more reliable and used for subsequent analyses. (C) Two selected missense mutations (T1,696-G and G1,733-C) in ALPK3 between Chinese wild boars (n = 6) and domestic Min pigs (n = 6). (Top panels) FST and heterozygosity/(1−FST), FDR (Arlequin), and Q-values (BayeScan) are plotted for 45 coding SNPs (18 missense and 27 synonymous mutations). (Bottom panels) LD pattern of 45 SNPs in 101 domestic pigs from China (n = 41), North America (n = 12), and Europe (n = 48). Squares shaded in pink or red indicate significant LD between SNP pairs (bright red indicates pairwise D′ = 1), white squares indicate no evidence of significant LD, and blue squares indicate pairwise D′ = 1 without statistical significance. The adjacent T1,696-G and G1,733-C are closely linked (D′ = 1, r² = 0.975, LOD = 41.6).
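The caption reports D′ = 1 together with r² slightly below 1 for the two adjacent missense sites. The sketch below shows the standard definitions of both statistics and why they can differ. It assumes phased two-site haplotypes are available, which is a simplification: LD software typically has to estimate haplotype frequencies from unphased genotypes. The function name `ld_stats` and the toy haplotypes are hypothetical.

```python
def ld_stats(haplotypes, allele_a, allele_b):
    """Return (D', r^2) for two biallelic sites from phased haplotypes.

    haplotypes: iterable of (allele_at_site_1, allele_at_site_2) pairs.
    allele_a / allele_b: the allele tracked at site 1 / site 2.
    """
    haps = list(haplotypes)
    n = len(haps)
    p_a = sum(1 for a, _ in haps if a == allele_a) / n
    p_b = sum(1 for _, b in haps if b == allele_b) / n
    p_ab = sum(1 for a, b in haps if a == allele_a and b == allele_b) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both sites must be polymorphic")

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Toy haplotypes for illustration only (not taken from the paper).
# Only two haplotypes are observed: complete linkage, D' = 1 and r^2 = 1.
perfect = [("T", "G")] * 6 + [("G", "C")] * 4
# A third haplotype appears: D' stays at 1 while r^2 drops below 1.
three_haps = [("T", "G")] * 5 + [("G", "C")] * 4 + [("G", "G")] * 1

d_prime_perfect, r2_perfect = ld_stats(perfect, "T", "G")
d_prime_three, r2_three = ld_stats(three_haps, "T", "G")
```

D′ remains 1 whenever at most three of the four possible haplotypes are observed, whereas r² reaches 1 only when the allele frequencies at the two sites also match, which is consistent with a pair of sites showing D′ = 1 and r² just under 1.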