Literature DB >> 27845353

SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks.

Feng Zhu1, Qian-Qian Cui1, Zhuo-Cheng Hou1.   

Abstract

Genomic selection and genome-wide association studies need thousands to millions of SNPs. However, many non-model species do not have reference chips for detecting variation. Our goal was to develop and validate an inexpensive but effective method for detecting SNP variation. Genotyping by sequencing (GBS) can be a highly efficient strategy for genome-wide SNP detection, as an alternative to microarray chips. Here, we developed a GBS protocol for ducks and tested it to genotype 49 Pekin ducks. A total of 169,209 SNPs were identified from all animals, with a mean of 55,920 SNPs per individual. The average SNP density reached 1156 SNPs/MB. In this study, the first application of GBS to ducks, we demonstrate the power and simplicity of this method. GBS can be used for genetic studies in to provide an effective method for genome-wide SNP discovery.

Entities:  

Mesh:

Year:  2016        PMID: 27845353      PMCID: PMC5109183          DOI: 10.1038/srep36223

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


The domestic duck is an economically important agriculture animal and is consumed worldwide, especially in Asia1. Meanwhile, duck is also a suitable material for population genetics and evolutionary studies23. However, there is a limitation for population genetics studies and genomic selection due to the lack of a duck-specific DNA chip platform. Reduced-representation methods using restriction enzymes for the digestion reduce the genome complexity and are suitable for assaying SNPs from large numbers of samples with high reproducibility and low per-sample cost4. Genotyping by sequencing (GBS) is one such highly efficient strategy for genome-wide SNP detection and this approach has been successfully applied to aquatic, plants, and animals like chicken, pig and cattle56789101112. In this study, we developed a GBS strategy (Fig. 1) and applied it for SNP detection in the domestic Pekin duck, and evaluated the results by PCR-RFLP. Our theoretic analysis and experimental data showed this is a low cost and effective method for discovering SNPs in animal genomes for which chip microarrays are not yet available.
Figure 1

The GBS strategy pipeline.

The procedure mainly included three parts: 1. Genome-represent-reduced sequencing; 2. SNP discovery; 3. SNP validation.

Results

The selection of restriction enzyme

We used 11 commonly used restriction enzymes to conduct an in silico digestion study. The results of simulated digestion are illustrated in Fig. 2A. Tag number is one important index for evaluating enzyme digestion performance. MseI digestion was predicted to achieve more tags than other enzymes. The smooth tag size distribution curve for MseI supported its choice as a good candidate for this GBS study (Fig. 2B). Aside from the tag number, the genome-wide distribution of tags is another characteristic for enzyme selection. Theoretic results also suggest that MseI is better than other enzymes and has an even distribution pattern across the genome (Fig. 2C, Supplementary Figs S1 and S2). From consideration of tags on repeat regions and degenerative sites, MseI also achieved the highest tag number (Supplementary Table S1). Based on these analyses, MseI is the best candidate enzyme, from the eleven considered, for a GBS study in ducks. Regarding reads length and sequencing depth, 500 bp tag size would be suitable for this study. There are 211,898 tags whose length ranged from 400–500 bp in the in silico study (Fig. 2B).
Figure 2

(A) Simulation of DNA tags with restriction enzyme digestion. Tag counts for restriction enzyme NarI, BmtI, AflII, BanI, AseI, PstI, DpnII, BfaI, ApeKI, CviAII and MseI were estimated from the duck genome (BGI 1.0), based on loci of restriction enzyme nucleotide sequence recognition. (B) Distribution of Tag lengths with MseI digestion. The red area represents the suitable tag size which ranged from 400–500 bp. (C) Distribution of sequenced fragments count in 40-kb windows along the 30 longest scaffolds.

SNP discovery

A total of 544 million clean reads (63.25 Gb) were generated and 96.12% (523 million reads) of these were mapped to the duck genome with an average mapping rate of 96.25% (Supplementary Table S2). In total, about 13% of the genome was covered with tags, compared with 6.9% coverage predicted from in silico digestion. In total, 49,413 of GBS fragments were detected; tag length ranged from 39830 to 4230 bp, with median length 465 bp. Individual data information is shown in Supplementary Table S3. A total of 169,209 high-confidence SNPs were retained from all samples, with a mean of 55,920 SNPs identified for each individual (40,897 to 63,927) (Supplementary Table S4). Among the called genotypes, the number of SNPs was counted at 10-kb window size along the pseudo-chromosome displayed in Fig. 3. The SNP density reached 1156 SNPs/Mb, with an average of 41.23 SNPs identified for each fragment. The mean of the SNP missing rate was 5.64% and only 4 samples had a missing rate greater than 20% (Supplementary Table S4).
Figure 3

Distribution of SNP counts in 10-kb windows along the genome.

Only SNPs on the 30 longest scaffolds are shown. The shade of red scale represents the variation density of SNPs per window.

SNP validation

To validate results, we performed an in silico study with 50 randomly selected SNPs. After considering chromosomal distribution of SNPs and suitable enzyme digestion loci, we chose 24 SNPs to perform PCR-RFLP analysis. These SNPs were randomly distributed in the duck genome with approximately one selected site per chromosome. The results of PCR-RFLP assay are illustrated in Supplementary Table S5. A total of 982 sites were identified in PCR-RFLP and 280 SNPs were found, of which 90% (251/280) of identified SNPs were concordant with GBS’s results. Moreover, 94% (921/982) of genotypes were consistent with the GBS results. All the SNPs found using GBS were successfully validated by PCR-RFLP. Although randomicity and incomplete sequence coverage could lead to inconsistent results between GBS and PCR-RFLP, the PCR-RFLP assay results showed that the SNP library obtained by GBS were highly credible.

Discussion

Genomic selection and genome-wide association studies need 105 to 106 SNPs. However, many non-model species do not have reference chips for detecting variation. To solve this issue, one option is to generate more sequencing data as sequencing costs continue to fall quickly. The GBS method has great potential for application to the genomes of agricultural animals, as an alternatives to chip platforms. Pertille and coworkers sequenced 462 chicken using GBS method and identified 67,096 SNP with a 4.66% coverage of whole genome11. A pig sequencing experiment detected putative SNPs with an average density of 0.33 SNPs/10 Kb9. Additionally 63,797 SNPs were identified in a cattle study12. In this study, about 0.5-1X genome coverage data were obtained and the SNP density was found to be 1156 SNPs/Mb. Compared to other agricultural animals, our results showed excellent performance and high coverage for digested tags in ducks. The performance was slightly lower than that of the chicken study due to the population size of our study and the quality of the genome. A few individuals with relatively high missing rate (>20%) can be rescued using imputation methods when the sample size is large1314, especially in the designing genomic selection study. We observed that the ratio of genome coverage was a mean of 13%, higher than the 6.9%, predicted coverage. Two possibilities might lead to this result. Firstly, the duck reference genome still has many gaps. In the current duck reference genome, there are more than 70,000 contigs/scaffold. Therefore, the real number of predicted fragments cannot represent the real data until the quality of the reference genome is improved. Another reason is that shorter fragments were used in sequencing, even though a narrow selection range was set using a Pippin system. In practice, higher genome coverage will obtain more SNPs, but with reduced sequencing depth in some loci with the same sequencing data. In summary, we genotyped 49 Pekin ducks using GBS and identified 169,209 confident SNPs. We have demonstrated that GBS is a highly effective method for accessible and low cost genome-wide genotyping.

Methods

Ethics statement

All experiments were performed according to regulations and guidelines established by the Animal Care and Use Committee of China Agricultural University (permit number: DK996). All protocols and procedures were approved by the Beijing Administration Committee of Laboratory Animals under the leadership of the Beijing Association for Science and Technology (permit number: SYXK 2007–0023). Blood was extracted from the wing-vein via vacuum piping, with 75% alcohol/cotton ball for disinfection. All efforts were made to minimize animal suffering during the study.

Samples preparation

Forty-nine Pekin ducks from the same flock were randomly selected at the Beijing Jinxing Golden Star Duck Centre. Birds were fed ad libitum from 0 to 6 weeks. A blood sample was collected from each individual.

DNA extract, library construction, Sequencing

Genomics DNA was extracted from blood using the standard phenol/chloroform method. An in silico digestion of the duck reference genome (BGI 1.0, Ensemble 82) was performed to choose the appropriate restriction endonuclease using R package SimRAD15. According to the results of simulated digestion, 100 ug genomic DNA was digested with restriction endonuclease MseI, which recognizes a 4-bp sequence (TTAA) and creates a 2-bp overhang (Supplementary Table S6). Then a set of variable barcode adapters that recognize Mse1-compatible sequences were ligated to the digested DNA fragments. The ligation mixture was purified using Ampure XP beads. Fragments ranging from 550 to 580 bp, including adapter sequences, were purified with gel extraction. Next, the restriction fragments were enriched by PCR amplification with adapter-specific primers. The quality evaluation was performed by ABI StepOne Plus. The data of 2 × 125 bp pair-end reads were generated by the Illumina HiSeq2500.

Mapping

The raw reads that had <20 sequence quality score and <50-bp of sequence length were removed, and then barcode sequences were eliminated. The clean sequences were aligned to the duck reference genome (BGI 1.0, Ensemble 82) using Burrows-Wheeler Aligner (BWA) with the default parameters16. Read grouping and removal of PCR duplicates were done using Picard (http://picard.sourceforge.net.). The data were deposited in the NCBI sequence read archive (SRP068685).

Mutation detection

The genome analysis toolkit (GATK) was used to perform local realignment of reads to correct misalignments, and then to detect the SNPs and call the genotypes (-stand_call_conf 20 -stand_emit_conf 20, other parameters were default)17. Two criteria were used to identify the SNPs: 1. the missing rate of each locus could not be more than 0.2; 2. the mapping depth of each locus per sample should be more than 4. The information of tags calling si illustrated in Supplementary Table S3

PCR-RFLP

Restriction enzymes for the PCR-RFLP assay were selected using information from REBASE (http://rebase.neb.com)18. Primers for PCR were designed using Primer-Blast (http://www.ncbi.nlm.nih.gov/tools/primer-blast/)19. Conditions for the PCR were as follows: 94 °C for 5 min; 34 cycles of 94 °C for 30 s, 60–62 °C for 30 s and 72 °C for 1 min. This wasfollowed by a further 10 min extension at 72 °C. The restriction assay was performed at 37 °C for 2 h.

Additional Information

How to cite this article: Zhu, F. et al. SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks. Sci. Rep. 6, 36223; doi: 10.1038/srep36223 (2016). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
  19 in total

1.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

Review 2.  Genome-wide genetic marker discovery and genotyping using next-generation sequencing.

Authors:  John W Davey; Paul A Hohenlohe; Paul D Etter; Jason Q Boone; Julian M Catchen; Mark L Blaxter
Journal:  Nat Rev Genet       Date:  2011-06-17       Impact factor: 53.242

3.  Use of genotyping by sequencing data to develop a high-throughput and multifunctional SNP panel for conservation applications in Pacific lamprey.

Authors:  Jon E Hess; Nathan R Campbell; Margaret F Docker; Cyndi Baker; Aaron Jackson; Ralph Lampman; Brian McIlraith; Mary L Moser; David P Statler; William P Young; Andrew J Wildbill; Shawn R Narum
Journal:  Mol Ecol Resour       Date:  2014-06-05       Impact factor: 7.090

4.  SimRAD: an R package for simulation-based prediction of the number of loci expected in RADseq and similar genotyping by sequencing approaches.

Authors:  Olivier Lepais; Jason T Weir
Journal:  Mol Ecol Resour       Date:  2014-05-26       Impact factor: 7.090

5.  De novo transcriptome assembly and identification of genes associated with feed conversion ratio and breast muscle yield in domestic ducks.

Authors:  Feng Zhu; Jian-Ming Yuan; Zhen-He Zhang; Jin-Ping Hao; Yu-Ze Yang; Shen-Qiang Hu; Fang-Xi Yang; Lu-Jiang Qu; Zhuo-Cheng Hou
Journal:  Anim Genet       Date:  2015-10-12       Impact factor: 3.169

6.  Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction.

Authors:  Jian Ye; George Coulouris; Irena Zaretskaya; Ioana Cutcutache; Steve Rozen; Thomas L Madden
Journal:  BMC Bioinformatics       Date:  2012-06-18       Impact factor: 3.169

7.  REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.

Authors:  Richard J Roberts; Tamas Vincze; Janos Posfai; Dana Macelis
Journal:  Nucleic Acids Res       Date:  2014-11-05       Impact factor: 16.971

8.  Genotyping-by-sequencing map permits identification of clubroot resistance QTLs and revision of the reference genome assembly in cabbage (Brassica oleracea L.).

Authors:  Jonghoon Lee; Nur Kholilatul Izzah; Beom-Soon Choi; Ho Jun Joh; Sang-Choon Lee; Sampath Perumal; Joodeok Seo; Kyounggu Ahn; Eun Ju Jo; Gyung Ja Choi; Ill-Sup Nou; Yeisoo Yu; Tae-Jin Yang
Journal:  DNA Res       Date:  2015-11-29       Impact factor: 4.458

9.  High-throughput and Cost-effective Chicken Genotyping Using Next-Generation Sequencing.

Authors:  Fábio Pértille; Carlos Guerrero-Bosagna; Vinicius Henrique da Silva; Clarissa Boschiero; José de Ribamar da Silva Nunes; Mônica Corrêa Ledur; Per Jensen; Luiz Lehmann Coutinho
Journal:  Sci Rep       Date:  2016-05-25       Impact factor: 4.379

10.  Rapid genotype imputation from sequence without reference panels.

Authors:  Simon Myers; Richard Mott; Robert W Davies; Jonathan Flint
Journal:  Nat Genet       Date:  2016-07-04       Impact factor: 38.330

View more
  7 in total

1.  Detection and Classification of Hard and Soft Sweeps from Unphased Genotypes by Multilocus Genotype Identity.

Authors:  Alexandre M Harris; Nandita R Garud; Michael DeGiorgio
Journal:  Genetics       Date:  2018-10-12       Impact factor: 4.562

2.  Genome-wide association study reveals novel loci associated with feeding behavior in Pekin ducks.

Authors:  Guang-Sheng Li; Feng Zhu; Fan Zhang; Fang-Xi Yang; Jin-Ping Hao; Zhuo-Cheng Hou
Journal:  BMC Genomics       Date:  2021-05-08       Impact factor: 3.969

3.  Genome-Wide Association Study of Growth and Feeding Traits in Pekin Ducks.

Authors:  Feng Zhu; Si-Rui Cheng; Yu-Ze Yang; Jin-Ping Hao; Fang-Xi Yang; Zhuo-Cheng Hou
Journal:  Front Genet       Date:  2019-07-26       Impact factor: 4.599

4.  Targeted genotyping by sequencing: a new way to genome profile the cat.

Authors:  M Longeri; A Chiodi; M Brilli; A Piazza; L A Lyons; G Sofronidis; M C Cozzi; C Bazzocchi
Journal:  Anim Genet       Date:  2019-09-12       Impact factor: 3.169

5.  Trait Analysis in Domestic Rabbits (Oryctolagus cuniculus f. domesticus) Using SNP Markers from Genotyping-by-Sequencing Data.

Authors:  Congyan Li; Yuying Li; Jie Zheng; Zhiqiang Guo; Xiuli Mei; Min Lei; Yongjun Ren; Xiangyu Zhang; Cuixia Zhang; Chao Yang; Li Tang; Yang Ji; Rui Yang; Jifeng Yu; Xiaohong Xie; Liangde Kuang
Journal:  Animals (Basel)       Date:  2022-08-11       Impact factor: 3.231

6.  Refining the Camelus dromedarius Myostatin Gene Polymorphism through Worldwide Whole-Genome Sequencing.

Authors:  Silvia Bruno; Vincenzo Landi; Gabriele Senczuk; Samantha Ann Brooks; Faisal Almathen; Bernard Faye; Suheil Semir Bechir Gaouar; Mohammed Piro; Kwan Suk Kim; Xavier David; André Eggen; Pamela Burger; Elena Ciani
Journal:  Animals (Basel)       Date:  2022-08-14       Impact factor: 3.231

7.  Single Nucleotide Polymorphism Discovery and Genetic Differentiation Analysis of Geese Bred in Poland, Using Genotyping-by-Sequencing (GBS).

Authors:  Joanna Grzegorczyk; Artur Gurgul; Maria Oczkowicz; Tomasz Szmatoła; Agnieszka Fornal; Monika Bugno-Poniewierska
Journal:  Genes (Basel)       Date:  2021-07-14       Impact factor: 4.096

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.