Ruiyi Lin, Xiaoyong Du, Sixue Peng, Liubin Yang, Yunlong Ma, Yanzhang Gong, Shijun Li.
Abstract
The duck is one of the most economically important waterfowl species, serving as a source of meat, eggs, and feathers. Characterizing the genetic variation in duck species is an important step toward linking genes or genomic regions with phenotypes. Human-driven selection during duck domestication and subsequent breed formation has likely left detectable signatures in the duck genome. In this study, we employed a panel of >1.4 million single-nucleotide polymorphisms (SNPs) identified from the RNA sequencing (RNA-seq) data of 15 duck individuals. The density of the resulting SNPs is significantly positively correlated with gene density across the duck genome, which demonstrates that using RNA-seq data enriched the variants in functional categories such as coding exons, untranslated regions (UTRs), introns, and upstream/downstream regions. We performed a genome-wide scan for selection signatures in the ducks using the composite likelihood ratio (CLR) and found 76 candidate regions of selection, many of which harbor genes related to digestive-system function and fat metabolism, including TCF7L2, EIF2AK3, ELOVL2, and members of the fatty acid-binding protein family. This study illustrates the potential of population genetic approaches for identifying genomic regions that affect domestication-related phenotypes and adds to the genetic information available for this economically important animal.
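The relationship between SNP density and gene density described above can be quantified per genomic window with a correlation coefficient. The sketch below computes Pearson's r from scratch. The per-window counts are hypothetical toy values, not data from this study, and the choice of Pearson (rather than, for example, Spearman) correlation is an assumption, since the abstract does not name the test used.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy per-window counts (hypothetical, NOT data from the study):
# annotated genes and called SNPs in each genomic window.
genes_per_window = [2, 5, 1, 8, 0, 3]
snps_per_window = [150, 420, 90, 700, 10, 260]

r = pearson_r(genes_per_window, snps_per_window)
print(f"Pearson r = {r:.3f}")
```

A positive r close to 1 means windows with more genes also carry more SNPs, which is what RNA-seq-derived calls would be expected to show because reads are concentrated on transcribed regions.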
Keywords: Anas platyrhynchos; RNA-seq; SNP; selection signatures
Year: 2015 PMID: 26819540 PMCID: PMC4721680 DOI: 10.4137/EBO.S21545
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. Comparison of SNPs identified by the two approaches. Using GATK–DNA-seq combined with the SNPiR pipeline, we detected 916,407 SNPs by mapping reads to the reference genome (green) and 175,998 SNPs by mapping reads to the transcriptome (purple), together comprising 974,251 unique variants. In addition, the GATK–RNA-seq approach detected 1,196,422 variants (red).
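Because the two mapping strategies share some calls, the caption's numbers imply an overlap by inclusion–exclusion: 916,407 + 175,998 − 974,251 = 118,154 SNPs called by both. The sketch below checks that arithmetic and shows a union of call sets keyed by (chromosome, position, ref, alt). It assumes the transcriptome-based calls have already been converted to reference-genome coordinates; `merge_calls` is a hypothetical helper for illustration, not part of GATK or SNPiR.

```python
def overlap_size(n_a: int, n_b: int, n_union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return n_a + n_b - n_union

def merge_calls(*call_sets):
    """Hypothetical helper: union of SNP calls keyed by (chrom, pos, ref, alt)."""
    merged = set()
    for calls in call_sets:
        merged.update(calls)  # identical keys collapse into one variant
    return merged

# Counts taken from the Figure 1 caption.
shared = overlap_size(916_407, 175_998, 974_251)
print("called by both approaches:", shared)

# Toy call sets (not study data), already in reference-genome coordinates.
genome_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")}
transcriptome_calls = {("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")}
print(len(merge_calls(genome_calls, transcriptome_calls)))
```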
Figure 2. Admixture plot showing the genetic structure of the 15 duck individuals. The length of each colored segment represents the proportion of the individual’s genome derived from each of K = 3 ancestral populations. A single-colored bar for Baigai White, Ma–Liancheng White, and Peking duck indicates no admixture.
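Each bar in such a plot is one row of per-individual ancestry proportions over the K clusters, similar to the rows of an ADMIXTURE `.Q` output file (the caption does not name the software). The sketch below, assuming that layout, reports each individual's dominant cluster and whether it counts as non-admixed. The 0.99 cutoff and the toy proportions are hypothetical choices, not values from the study.

```python
def classify_admixture(q_rows, threshold=0.99):
    """Return (dominant_cluster, is_unadmixed) for each individual.

    q_rows: per-individual ancestry proportions over K clusters; each row
    should sum to ~1. threshold: hypothetical cutoff above which the bar is
    treated as a single color, i.e. no admixture.
    """
    results = []
    for row in q_rows:
        if abs(sum(row) - 1.0) > 1e-3:
            raise ValueError(f"proportions do not sum to 1: {row}")
        best = max(range(len(row)), key=lambda k: row[k])
        results.append((best, row[best] >= threshold))
    return results

# Toy K = 3 proportions for three individuals (not study data).
q = [
    [0.99999, 0.00001, 0.0],
    [0.60, 0.35, 0.05],
    [0.0, 0.00001, 0.99999],
]
for ind, (cluster, pure) in enumerate(classify_admixture(q)):
    print(ind, cluster, pure)
```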
Summary of SNPs in ducks.
| CATEGORY | COUNT | PERCENT (%) | NOTE |
|---|---|---|---|
| Sample size | N = 15 | – | |
| SNP | 1,468,452 | – | |
| UPSTREAM | 269,271 | 11.48 | |
| UTR_5_PRIME | 2,778 | 0.12 | |
| Exonic | 190,444 | 8.12 | |
| Non_coding_exon | 1,207 | 0.05 | |
| Frameshift | 885 | 0.04 | |
| NON_SYNONYMOUS | 57,628 | 2.46 | |
| Synonymous | 130,575 | 5.57 | |
| Nonsyn/Syn ratio (ω) | 0.44 | – | Ratio of NON_SYNONYMOUS to Synonymous counts (57,628/130,575). |
| INTRON | 552,382 | 23.56 | |
| SPLICE_SITE_REGION | 115,458 | 4.92 | The variant is within 2 bp of a splice junction. |
| SPLICE_SITE_DONOR | 34,861 | 1.49 | |
| SPLICE_SITE_ACCEPTOR | 13,864 | 0.59 | |
| UTR_3_PRIME | 30,477 | 1.30 | |
| DOWNSTREAM | 507,849 | 21.66 | |
| INTERGENIC | 627,198 | 26.75 | |
| Number of effects | 2,344,610 | 100 | Percentages are relative to this total; one SNP can have more than one annotated effect. |
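The PERCENT column and the ratio row can be reproduced from the COUNT column: each percentage is the category count divided by the total number of effects (2,344,610), not by the number of SNPs. The sketch below recomputes a subset of rows using only numbers copied from the table.

```python
# Counts copied from the SNP summary table. Percentages are relative to the
# total number of annotated effects, because one SNP can have several effects.
TOTAL_EFFECTS = 2_344_610
counts = {
    "UPSTREAM": 269_271,
    "Exonic": 190_444,
    "NON_SYNONYMOUS": 57_628,
    "Synonymous": 130_575,
    "INTRON": 552_382,
    "DOWNSTREAM": 507_849,
    "INTERGENIC": 627_198,
}

percent = {k: round(100 * v / TOTAL_EFFECTS, 2) for k, v in counts.items()}
nonsyn_syn = round(counts["NON_SYNONYMOUS"] / counts["Synonymous"], 2)

for k, p in percent.items():
    print(f"{k:15s} {p:6.2f}%")
print("Nonsyn/Syn ratio:", nonsyn_syn)
```

Note that this ratio is a ratio of raw counts; it is not normalized by the number of nonsynonymous and synonymous sites, so it is not the same quantity as a per-site dN/dS estimate.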
Figure 3. Average read depth on different elements (UTR, exon, intron) of 165 genes randomly selected from the duck transcriptome. Coverage of ∼10× at the 3′ end (far right) trails off to ∼3× at the 5′ end (far left), indicating that in some cases part of the 5′ end of a transcript was lost during RNA-seq library preparation. The lower coverage at the 5′ end resulted in fewer SNP calls at the 5′ end than at the 3′ end. The much lower density and quality of SNP calls in intronic regions are mainly due to the very low short-read coverage of introns in RNA-seq data.
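A profile like this is typically built by splitting each transcript into the same number of equal-length bins and averaging the binned depths across transcripts, so that transcripts of different lengths become comparable. The sketch below assumes per-base depth arrays that are already oriented 5′ to 3′ (minus-strand transcripts reversed); the bin count and toy depths are hypothetical and do not reproduce the study's data.

```python
def binned_profile(depths, n_bins=10):
    """Mean depth in n_bins equal-length segments of one transcript (5' -> 3')."""
    length = len(depths)
    if length < n_bins:
        raise ValueError("transcript shorter than the number of bins")
    profile = []
    for b in range(n_bins):
        start = b * length // n_bins
        end = (b + 1) * length // n_bins  # non-empty because length >= n_bins
        segment = depths[start:end]
        profile.append(sum(segment) / len(segment))
    return profile

def gene_body_coverage(transcripts, n_bins=10):
    """Average the binned profiles of all transcripts, bin by bin."""
    profiles = [binned_profile(d, n_bins) for d in transcripts]
    return [sum(p[b] for p in profiles) / len(profiles) for b in range(n_bins)]

# Toy depths (not study data): coverage rising from the 5' end to the 3' end.
t1 = [3, 4, 5, 6, 7, 8, 9, 10]
t2 = [3, 3, 3, 3, 10, 10, 10, 10]
print(gene_body_coverage([t1, t2], n_bins=4))
```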
Summary of significant candidate regions (CRs; P ≤ 0.01) and the distribution of SNPs in five duck populations.
| CHR | WINDOWS (N) | CR (N) | CR SNPS (N) | CHR SNPS (N) | CR LENGTH (KB) | CHR LENGTH (MBP) | CR GENES (N) | CHR GENES (N) |
|---|---|---|---|---|---|---|---|---|
| chr1 | 29 | 11 | 3,682 | 186,609 | 4,200 | 198.28 | 43 | 2,124 |
| chr2 | 17 | 12 | 1,890 | 121,342 | 3,000 | 154.285 | 38 | 1,365 |
| chr3 | 10 | 9 | 2,684 | 103,474 | 1,900 | 115.727 | 24 | 1,207 |
| chr4 | 17 | 9 | 2,573 | 59,448 | 2,900 | 74.523 | 46 | 758 |
| chr5 | 6 | 5 | 1,249 | 75,816 | 1,100 | 63.518 | 20 | 947 |
| chr6 | 4 | 3 | 1,614 | 38,315 | 800 | 36.433 | 13 | 514 |
| chr7 | 2 | 2 | 694 | 41,748 | 400 | 39.268 | 7 | 527 |
| chr8 | 3 | 2 | 386 | 44,361 | 500 | 31.228 | 4 | 522 |
| chr9 | 5 | 3 | 904 | 41,473 | 800 | 26.143 | 15 | 448 |
| chr10 | 0 | 0 | 0 | 28,568 | 0 | 18.705 | 0 | 320 |
| chr11 | 1 | 1 | 181 | 37,742 | 200 | 21.689 | 2 | 419 |
| chr12 | 0 | 0 | 0 | 29,961 | 0 | 20.949 | 0 | 350 |
| chr13 | 0 | 0 | 0 | 35,703 | 0 | 21.836 | 0 | 338 |
| chr14 | 4 | 4 | 2,292 | 36,637 | 800 | 19.493 | 16 | 345 |
| chr15 | 0 | 0 | 0 | 36,406 | 0 | 17.612 | 0 | 430 |
| chr16 | 2 | 1 | 448 | 35,585 | 300 | 15.016 | 13 | 374 |
| chr17 | 0 | 0 | 0 | 4,888 | 0 | 0.387 | 0 | 39 |
| chr18 | 1 | 1 | 624 | 32,263 | 200 | 11.812 | 7 | 308 |
| chr19 | 3 | 3 | 984 | 27,947 | 600 | 12.468 | 11 | 318 |
| chr20 | 0 | 0 | 0 | 32,858 | 0 | 11.803 | 0 | 343 |
| chr21 | 4 | 3 | 1,226 | 25,870 | 801 | 15.674 | 20 | 346 |
| chr22 | 0 | 0 | 0 | 26,388 | 0 | 7.939 | 0 | 251 |
| chr23 | 0 | 0 | 0 | 9,916 | 0 | 4.482 | 0 | 111 |
| chr24 | 0 | 0 | 0 | 25,486 | 0 | 7.225 | 0 | 240 |
| chr25 | 0 | 0 | 0 | 15,503 | 0 | 7.33 | 0 | 175 |
| chr26 | 0 | 0 | 0 | 7,475 | 0 | 1.284 | 0 | 80 |
| chr27 | 0 | 0 | 0 | 27,757 | 0 | 6.462 | 0 | 257 |
| chr28 | 0 | 0 | 0 | 19,044 | 0 | 4.768 | 0 | 201 |
| chr29 | 1 | 1 | 629 | 21,146 | 200 | 4.454 | 11 | 198 |
| chrW | 1 | 1 | 188 | 1,491 | 200 | 2.089 | 10 | 40 |
| chrZ | 16 | 5 | 947 | 32,634 | 2,300 | 74.036 | 29 | 735 |
| Total | 126 | 76 | 23,195 | 1,263,854 | 21,201 | 1,046.918 | 329 | 14,630 |
Notes:
WINDOWS (N): windows of size 100 kb with P < 10E-4.
CR (N): distinct CRs with P < 10E-4.
CR SNPS (N): total number of SNPs forming significant CRs.
CHR SNPS (N): total number of SNPs used on the chromosome.
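The table reports 126 significant windows collapsing into 76 CRs, which suggests that neighboring significant windows were combined into a single region. The source does not describe the merging rule, so the sketch below is only an assumption: it merges 100-kb windows on one chromosome whenever the gap between them is at most `max_gap` bp (a hypothetical parameter). Its output need not match the CR LENGTH column.

```python
def merge_windows(starts, window=100_000, max_gap=0):
    """Merge significant windows on one chromosome into candidate regions.

    starts: start coordinates (bp) of significant windows.
    max_gap: hypothetical tolerance; a window starting within max_gap bp
    of the current region's end is merged into that region.
    Returns a list of half-open (start, end) intervals.
    """
    regions = []
    for s in sorted(starts):
        e = s + window
        if regions and s - regions[-1][1] <= max_gap:
            regions[-1] = (regions[-1][0], max(regions[-1][1], e))  # extend
        else:
            regions.append((s, e))  # start a new candidate region
    return regions

# Toy significant-window starts (not study data): six windows, three regions.
significant = [0, 100_000, 300_000, 400_000, 500_000, 900_000]
print(merge_windows(significant))
```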
Figure 4. Circos plot of the genome-wide distribution of genes, SNP variants, and selective-sweep signatures. From outside to inside, the circles show gene density (yellow), SNP density (green), and CLR values (blue). Genes located in regions with significant, strong sweep signatures are shown as outliers. High values in each track (gene density > 10/100 kb, SNP density > 1,000/100 kb, and CLR value > 30) are marked as red histogram bars.
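The density tracks are counts per fixed 100-kb window, and the red bars mark windows above a per-track cutoff. The sketch below counts positions per window and flags windows strictly above a threshold, using the caption's SNP cutoff of 1,000 per 100 kb. Counting genes by their start positions (for the gene track) is an assumption, and the positions are toy values.

```python
from collections import Counter

WINDOW = 100_000  # 100-kb bins, matching the caption's per-100-kb densities

def density_per_window(positions, window=WINDOW):
    """Count features (e.g. SNP positions or gene start positions) per window."""
    return Counter(pos // window for pos in positions)

def flag_high(counts, threshold):
    """Indices of windows whose count is strictly above the threshold."""
    return sorted(w for w, c in counts.items() if c > threshold)

# Toy SNP positions (not study data): 1,001 SNPs in window 0, 5 in window 3.
snp_positions = list(range(0, 1001 * 50, 50)) + [300_010, 300_020, 300_030, 300_040, 300_050]
snp_density = density_per_window(snp_positions)
print(flag_high(snp_density, 1000))
```

The same `flag_high` call applies to the gene track (threshold 10) and, given one CLR value per window, to the CLR track (threshold 30).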
Figure 5. Selective-sweep results for the entire chromosome 1. (A) CLR scan for the sweep signal in each 100-kb window across chromosome 1. Outlier genomic regions with significant, strong sweep signatures at a threshold of 10E-4 are shown in red. (B) CLR scan for the sweep signal in each 100-kb window across chromosome 1, using only SNP data from exon/UTR regions.
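The CLR idea is to multiply per-SNP likelihoods as if SNPs were independent (a "composite" likelihood) and to compare two models of the site frequency spectrum (SFS). The sketch below is a deliberately simplified instance: it compares a window's own SFS with the genome-wide background SFS. Dedicated tools such as SweepFinder and SweeD instead fit an explicit selective-sweep model, and the source does not state which implementation was used. The sample size, derived-allele counts, and pseudocount are hypothetical toy values.

```python
import math
from collections import Counter

def sfs(derived_counts, n, pseudo=1e-9):
    """Unfolded SFS: estimated P(derived-allele count = k) for k = 1..n-1.

    A tiny pseudocount keeps unobserved frequency classes from having
    probability 0. Counts outside 1..n-1 are not allowed.
    """
    c = Counter(derived_counts)
    total = len(derived_counts) + pseudo * (n - 1)
    return {k: (c[k] + pseudo) / total for k in range(1, n)}

def window_clr(window_counts, background, n):
    """2 * [log L(window SFS) - log L(background SFS)], SNPs treated as independent."""
    local = sfs(window_counts, n)
    return 2.0 * sum(math.log(local[k]) - math.log(background[k]) for k in window_counts)

n = 10  # number of sampled chromosomes (toy value)
genome_counts = [1] * 50 + [2] * 25 + [3] * 15 + [5] * 7 + [9] * 3
background = sfs(genome_counts, n)

neutral_like = [1, 1, 2, 3]
sweep_like = [9, 9, 9, 1]  # excess of high-frequency derived alleles
print(round(window_clr(neutral_like, background, n), 2))
print(round(window_clr(sweep_like, background, n), 2))
```

A window whose SFS is skewed toward high-frequency derived alleles, a classic sweep pattern, departs strongly from the background and receives a larger CLR.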
Enriched biological process terms from the GO analysis.
| TERM | GENES | P-VALUE |
|---|---|---|
| GO:0051099∼positive regulation of binding | AMH, NCOA3, HIPK2, JAK2, PRDX3, EIF2AK3 | 0.0054 |
| GO:0031016∼pancreas development | ALDH1A2, HNF1A, EIF2AK3, TCF7L2 | 0.0139 |
| GO:0046907∼intracellular transport | ENAH, GRPEL1, NPEPL1, NUP155, VTI1A, COPB2, APP, COG5, GBF1, ZFYVE16, LYST, STX16, GNAS, JAK2, ATP5O, SRP9, HSPA9, TOB1 | 0.0149 |
| GO:0043388∼positive regulation of DNA binding | AMH, NCOA3, HIPK2, JAK2, PRDX3 | 0.0186 |
| GO:0050796∼regulation of insulin secretion | INHBB, HNF1A, JAK2, TCF7L2 | 0.0200 |
| GO:0001655∼urogenital system development | AMH, ALDH1A2, LAMA5, ADAMTS1, NID1, CA2 | 0.0219 |
| GO:0007018∼microtubule-based movement | APP, KIF15, TUBE1, TUBB1, DNAH8, KIF26B | 0.0242 |
| GO:0007167∼enzyme linked receptor protein signaling pathway | WFIKKN2, FGFR3, KL, CD8B, ZFYVE16, HIPK2, COL1A2, JAK2, EIF2AK3, EPHB1, TOB1 | 0.0274 |
| GO:0002763∼positive regulation of myeloid leukocyte differentiation | GNAS, CA2, RUNX1 | 0.0274 |
| GO:0006633∼fatty acid biosynthetic process | TBXAS1, HNF1A, ELOVL3, ELOVL2, DEGS2 | 0.0277 |
| GO:0015031∼protein transport | EIF4ENIF1, GRPEL1, NPEPL1, NUP155, VTI1A, COPB2, COG5, RNF103, SCFD2, ZFYVE16, LYST, STX16, EXOC3, GNAS, JAK2, NUP35, SRP9, HSPA9, TOB1 | 0.0281 |
| GO:0002791∼regulation of peptide secretion | INHBB, HNF1A, JAK2, TCF7L2 | 0.0288 |
| GO:0006891∼intra-Golgi vesicle-mediated transport | COPB2, COG5, STX16 | 0.0304 |
| GO:0045184∼establishment of protein localization | EIF4ENIF1, GRPEL1, NPEPL1, NUP155, VTI1A, COPB2, COG5, RNF103, SCFD2, ZFYVE16, LYST, STX16, EXOC3, GNAS, JAK2, NUP35, SRP9, HSPA9, TOB1 | 0.0304 |
| GO:0008104∼protein localization | EIF4ENIF1, NPEPL1, GRPEL1, HNF1A, AMN, NUP155, VTI1A, COPB2, COG5, RNF103, SCFD2, ZFYVE16, LYST, STX16, EXOC3, GNAS, JAK2, NUP35, SRP9, HSPA9, TOB1 | 0.0310 |
| GO:0015718∼monocarboxylic acid transport | HNF1A, ABCC3, FABP1, FABP2 | 0.0357 |
| GO:0016050∼vesicle organization | COPB2, GBF1, LYST, ZFYVE16 | 0.0394 |
| GO:0006754∼ATP biosynthetic process | ATP5E, AK3, ATP10D, ATP5O, ATP5J | 0.0403 |
| GO:0043523∼regulation of neuron apoptosis | NMNAT3, HIPK2, JAK2, PRDX3, ITSN1 | 0.0417 |
| GO:0006917∼induction of apoptosis | DCC, APP, RASGRF2, LYST, HIPK2, NAIF1, JAK2, ITSN1, PDCD6, TRAF3 | 0.0436 |
| GO:0012502∼induction of programmed cell death | DCC, APP, RASGRF2, LYST, HIPK2, NAIF1, JAK2, ITSN1, PDCD6, TRAF3 | 0.0442 |
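Term-level p-values of this kind are usually one-sided over-representation tests: given a background of genes, how surprising is it that at least k genes from the submitted list fall into a term? The sketch below assumes a standard hypergeometric test. The source does not name the tool; the `GO:…∼term` labels resemble DAVID output, and DAVID's EASE score is a more conservative variant that subtracts one from the list hit count. All numbers in the usage line are hypothetical and do not reproduce the table's values.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M background genes, n in the term, N drawn)."""
    top = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, top + 1))
    return hits / comb(M, N)

def go_term_p(list_hits, list_size, term_size, background_size):
    """One-sided over-representation p-value for a single GO term."""
    return hypergeom_sf(list_hits, background_size, term_size, list_size)

# Hypothetical inputs for illustration only (not the study's numbers):
# 4 of 300 listed genes fall in a term with 40 members among 14,000 background genes.
print(go_term_p(list_hits=4, list_size=300, term_size=40, background_size=14_000))
```

Because many terms are tested at once, such raw p-values are normally followed by a multiple-testing correction before interpretation.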