
Genome-Wide Copy Number Variations Using SNP Genotyping in a Mixed Breed Swine Population.

Ralph T Wiedmann, Dan J Nonneman, Gary A Rohrer.

Abstract

Copy number variations (CNVs) are increasingly understood to affect phenotypic variation. This study uses SNP genotyping of trios of mixed breed swine to add to the catalog of known genotypic variation in an important agricultural animal. PorcineSNP60 BeadChip genotypes were collected from 1802 pigs that combined to form 1621 trios. These trios were from the crosses of 50 boars with 525 sows producing 1621 piglets. The pigs were part of a population that was a mix of ¼ Duroc, ½ Landrace and ¼ Yorkshire breeds. Merging the overlapping CNVs that were observed in two or more individuals to form CNV regions (CNVRs) yielded 502 CNVRs across the autosomes. The CNVRs intersected genes, as defined by RefSeq, 84% of the time - 420 out of 502. The results of this study are compared and contrasted to other swine studies using similar and different methods of detecting CNVR. While progress is being made in this field, more work needs to be done to improve consistency and confidence in CNVR results.

Year:  2015        PMID: 26172260      PMCID: PMC4501702          DOI: 10.1371/journal.pone.0133529

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Copy number variation (CNV) refers to segments of DNA, typically larger than 1 kb, that exist as variable numbers of copies among members of a species. CNV are a form of genetic variation distinct from the more commonly studied single nucleotide polymorphisms (SNP), and they have been shown to affect a larger number of nucleotides than SNPs [1]. Many studies have identified CNV in humans [2-4], other model organisms [5,6] and agricultural animals (reviewed in Clop [7]), including pigs [8-21], the focus of this study. CNVs can affect gene dosage and disrupt normal gene regulation, leading to complex disease traits in humans (reviewed by Stankiewicz and Lupski [22]). In humans, some of the missing heritability in SNP-based GWAS of complex traits has been assigned to CNVs [23,24]. The most commonly discussed example of CNV affecting pigs is the white coat phenotype caused by copy number variation of the KIT gene [25,26].
CNVs are typically detected using either array comparative genomic hybridization (aCGH) or an SNP genotyping array, although high-throughput sequencing is increasingly being used (reviewed by Kaplan et al. [27]). The main advantage of aCGH is a higher signal-to-noise ratio. However, SNP genotyping chips use less DNA, are less expensive and provide genotypes for the population of animals, so that SNP and CNV contributions to heritability can be determined simultaneously. High-throughput sequencing, given sufficient investment, has superior resolution across the genome, but requires greater computational resources. Recently published results for detection of CNVs in pigs cover all three methods of detection: aCGH [8, 9, 20], SNP array both with [11,12] and without [13-15, 21] pedigree information, and high-throughput sequencing [16-18]. One study used the SNP array method on 217 highly inbred Iberian pigs and then used high-throughput sequencing on four of those pigs for validation [19].
Most of the pigs studied were either pure or half Chinese breeds, in contrast to the present study, which utilizes composite pigs from Landrace, Duroc and Yorkshire lines. Thus, the current results may be more relevant to the commercial swine industry. This study uses the Illumina PorcineSNP60 BeadChip (Illumina, San Diego, CA) coupled with the PennCNV algorithm [28]. PennCNV was chosen in part because of its success when compared with competing algorithms [29] and because of its ability to integrate pedigree relationships of boar-sow-offspring trios.

Results

Every pig had at least one CNV called; the mean was 19.9 and the median was 14 CNV called per animal. CNV regions (CNVRs) were determined for the population by merging CNV that overlapped between animals. Including singletons, the full set of 949 CNVR covered 28.8% of the genome. Filtering out the singleton CNV reduced the results to 502 CNVR that cover 19.1% of the genome. The latter number is more consistent with other studies. Requiring more than one observation should also eliminate any non-germline CNV as well as many false positives. S1 Table lists the chromosomal position of each of the 502 CNVR along with its length and the number of pigs that contributed to it. The median number of pigs per CNVR was 8, with a range from 2 to 1129. The lengths of the CNVR ranged from 933 to 31,727,386 bp, with a median value of 147,171 bp. The total length of all 502 CNVR is 495.29 Mb. Table 1 shows the coverage of each chromosome by CNVR, from the low of 3% in chromosome 7 to the high of 61% in chromosome 11. It also lists the total number of CNVR, their average length and the number that intersect known genes as reported by RefSeq [30]. Chromosome 8 exhibited the lowest percentage of CNVR that overlapped genes at 70%, while chromosome 12 had the highest rate of gene overlap at 100%. On an absolute basis, chromosome 13 had the most CNVR with 63 and the most CNVR that overlapped known genes with 52, slightly ahead of chromosome 1 with 59 and 44, respectively. The total number of RefSeq genes that intersect the CNVRs in this study is 5422, with 1418 being characterized well enough to be assigned gene symbols.
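The merging and singleton filtering described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the tuple layout of the input, the inclusive-coordinate length and the choice to drop regions supported by a single animal after merging are all assumptions made for the example.

```python
from collections import defaultdict


def merge_cnvs_into_cnvrs(cnv_calls, min_animals=2):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    cnv_calls: iterable of (chrom, start, end, animal_id) tuples (hypothetical layout).
    min_animals: regions seen in fewer distinct animals are dropped (assumed filter).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, animal in cnv_calls:
        by_chrom[chrom].append((start, end, animal))

    cnvrs = []
    for chrom, calls in by_chrom.items():
        calls.sort()  # order by start position within each chromosome
        merged = []   # each entry: [start, end, set of contributing animals]
        for start, end, animal in calls:
            if merged and start <= merged[-1][1]:
                # Call overlaps the current region: extend it and record the animal.
                merged[-1][1] = max(merged[-1][1], end)
                merged[-1][2].add(animal)
            else:
                merged.append([start, end, {animal}])
        for start, end, animals in merged:
            if len(animals) >= min_animals:
                cnvrs.append({
                    "chrom": chrom,
                    "start": start,
                    "end": end,
                    "length": end - start + 1,  # assumes inclusive coordinates
                    "n_animals": len(animals),
                })
    return cnvrs


# Toy calls, invented for illustration only.
example_calls = [
    ("1", 100, 200, "pigA"),
    ("1", 150, 300, "pigB"),
    ("1", 500, 600, "pigC"),   # singleton region, filtered out
    ("2", 10, 20, "pigA"),
    ("2", 15, 30, "pigA"),     # same animal twice, still one animal
]
example_cnvrs = merge_cnvs_into_cnvrs(example_calls)
print(example_cnvrs)
```

In this toy input only the chromosome 1 region spanning 100 to 300 survives, because it is the only merged region supported by two different animals.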
Table 1

Summary of the CNVR content of each autosome and the frequency of overlap with genes.

Chr  Length (bp)  CNVR length (bp)  Coverage  # CNVR  Avg length (Kb)  # Genes  % Genes
1    315321320    36925232          0.117     59      629              44       75
2    162569373    37201656          0.229     31      1200             29       94
3    144787320    18957457          0.131     20      948              16       80
4    143465941    5322451           0.037     16      333              12       81
5    111506439    8337930           0.075     21      397              16       76
6    157765591    34634623          0.22      21      1649             16       76
7    134764509    4102732           0.03      18      228              15       83
8    148491824    19480680          0.131     20      974              14       70
9    153670195    44313723          0.288     15      2954             13       87
10   79102372     14048002          0.178     24      585              23       92
11   87690580     53738586          0.613     27      1990             23       85
12   63588570     27230880          0.428     20      1361             20       100
13   218635233    53013240          0.242     63      841              52       83
14   153851968    56829302          0.369     43      1321             36       84
15   157681620    38702927          0.245     42      921              38       90
16   86898990     30933018          0.356     33      937              29       88
17   69701580     7402936           0.106     22      336              19       86
18   61220070     4111482           0.067     7       587              5        71
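As a worked check of how the derived columns of Table 1 appear to relate to the raw ones, the short sketch below recomputes coverage, average CNVR length and percent gene overlap for chromosome 11. The function and argument names are hypothetical, and the formulas are an assumption inferred from the column headings and the Results text rather than a statement of how the authors computed them.

```python
def summarize_chromosome(chrom_length_bp, cnvr_length_bp, n_cnvr, n_cnvr_with_genes):
    """Recompute the derived Table 1 columns from the raw ones (assumed formulas)."""
    coverage = cnvr_length_bp / chrom_length_bp        # fraction of chromosome in CNVR
    avg_length_kb = cnvr_length_bp / n_cnvr / 1000     # mean CNVR length in kilobases
    pct_genes = 100 * n_cnvr_with_genes / n_cnvr       # percent of CNVR overlapping genes
    return coverage, avg_length_kb, pct_genes


# Chromosome 11 values as listed in Table 1.
cov, avg_kb, pct = summarize_chromosome(87690580, 53738586, 27, 23)
print(f"{cov:.3f} {avg_kb:.0f} {pct:.0f}")  # prints "0.613 1990 85"
```

The printed values agree with the chromosome 11 row, the chromosome with the highest coverage.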

Discussion

CNVR have been detected in many species and clearly are important components contributing to the missing heritability of complex traits. This study used an SNP genotyping BeadChip containing 49,208 usable elements spread throughout the genome. Unfortunately, the broad and uneven spacing of these SNP severely limits the accuracy of predicting the end positions of the CNVR, and filtering results to regions spanning at least three consecutive SNP, which minimizes false positives, prevents the identification of many small CNVR. Selection of predominantly single locus SNP to include on BeadChips limits the use of this technology to discover CNVR that have copy numbers greater than two. In addition to these technological limits, prior studies in cattle and swine have shown great variation between breeds in CNVR content and a sizable increase in CNVR detection rate for crossbred animals [11, 31].
This study uses a mixed breed population with SNP array detection and pedigree information to produce its results. The most similar published studies are those of Wang et al. [15], whose population consisted of 585 pigs that were a cross of Large White and Minzhu, and Chen et al. [12], who tested 752 pigs that were an F2 cross of White Duroc and Erhualian. In the same study, Chen et al. [12] also reported results for 941 additional pigs covering 17 other populations. In an attempt to find the most robust CNVR that could be used for future investigations, the intersection of CNVR among this study and those of Wang et al. [15] and Chen et al. [12] was determined (Fig 1). Of the 502 CNVR reported in the present study, 237 (47%) overlapped at least one CNVR in the previous studies. There were 48 CNVR (9.6%), some very large, common to both Wang et al. [15] and Chen et al. [12] that overlapped a total of 77 CNVR reported in the present study. The intersection of all three sets of CNVR resulted in 77 regions spanning 12.51 Mb, as listed in Table 2. Included in Table 2 is a list of 52 RefSeq genes with a defined gene symbol that intersect the CNVRs.
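The cross-study comparison described above amounts to an interval-overlap test between sets of regions. The sketch below shows one simple way to find regions from one study that overlap at least one region in each of the other studies. The function names and the toy coordinates are invented for illustration, and the one-base overlap criterion is an assumption, since the text does not state the threshold used.

```python
def overlaps(a, b):
    """Return True when two (chrom, start, end) regions share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def regions_shared_with_all(query_regions, *other_studies):
    """Keep query regions that overlap at least one region in every other study."""
    return [
        q for q in query_regions
        if all(any(overlaps(q, o) for o in study) for study in other_studies)
    ]


# Toy coordinates, invented for illustration only (not taken from any study).
this_study = [("1", 100, 500), ("1", 900, 1200), ("2", 50, 80)]
study_a = [("1", 450, 600), ("2", 10, 60)]
study_b = [("1", 480, 520), ("1", 1000, 1100)]

shared = regions_shared_with_all(this_study, study_a, study_b)
print(shared)  # [('1', 100, 500)]
```

This quadratic scan is adequate for a few hundred regions per study. For much larger sets, sorting the regions and sweeping through them would be the usual refinement.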
Fig 1

Comparison of CNVR discovered in pigs.

Comparison of CNVR discovered with the Illumina SNP60 BeadChip in the current study (USMARC_2015, black) with the results of Chen et al. [12] (Chen_2012, green) and Wang et al. [15] (Wang_2012, blue). In addition, the results of Li et al. [9], which used CGH arrays (Li_2012, red), are also displayed. Diagram was generated using PhenoGram (http://visualization.ritchielab.psu.edu/phenograms/document).

Table 2

CNVR in common across three independent studies.

Chr  Start      End        Overlap  Genes
1    52040272   52093058   1
1    97924182   97965258   2        PHIP
1    98775053   98820531   3
1    100050453  100165684  4
1    172242183  172748312  5        RPSA
1    293928611  293991451  6
1    294821198  295288944  7        OR7A17
2    13853694   13902607   8
7    82332463   82456371   9
8    424993     849384     10       WHSC1, WHSC2
8    110985246  111104427  11
8    114376195  114671101  12       RPE65
9    3598721    3721967    13       GVIN1, TP53, NLGN1
9    5186145    5500684    14
9    5757660    5851208    15
9    85547743   85644823   16       MIOS
11   20495264   20567616   17
11   26534774   26591544   18       KBTBD6, MTRF1
11   27293298   27433171   19
11   27718673   27954931   20
11   27888543   27954931   21
11   29125223   29212417   22
11   29592086   29994790   23       OLR1
11   32581919   32878292   24       GABPAP
11   35866787   36037488   25
11   39815374   40013884   26       FSHB, PPP4R2
11   40927966   40982195   27
11   43582814   44113334   28
11   45003730   45537697   29       CSRNP3, PRSS16
11   45857201   45927843   30
11   46459624   46616903   31       KLHL1
11   49681894   49765012   32
11   51114485   51188421   33
12   5078288    5154794    34       RNF157
12   7402572    7651064    35
12   9877780    10017988   36       NFAT5
12   17351240   17403368   37       CRHR1, RPL13A
12   18614641   18686677   38       KIF18B
12   30823563   31335320   39       S100A16, NLGN1
13   59941517   60165976   40       GXYLT2
13   60647697   60744903   41       PDZRN3
13   68045456   68128761   42
13   92117925   92262954   43       SLC9A9, CYP39A1
13   92650317   92939983   44
13   103033108  103194490  45       MME
14   570328     647506     46
14   2742319    2865914    47       SYK
14   6661819    6747693    48       XPO7, NPM2
14   7363912    7418622    49       BIN3
14   11989764   12229929   50       TRIM35
14   14829391   14922687   51
14   17250219   17374883   52
14   18052581   18116518   53
14   19336577   19467538   54
14   20698075   20820863   55
14   21157940   21214238   56
14   47529479   47801982   57
14   48731419   48962616   58
14   50282254   50440156   59       TBC1D10A, SF3A1
14   89402153   89494786   60
14   99757737   99814376   61
14   101770612  102244897  62
14   102629597  102728645  63
14   103086988  103663341  64       NLRP8
14   106744337  107085009  65       CSTF2T, PRKG1
14   113074497  113136615  66       BTAF1
14   137895546  137953225  67
14   144641312  144665611  68
14   144818165  144909224  69       GPR26, GALNT11, CPXM2
15   14002523   14019896   70
15   15115750   15336530   71       NFAT5, SPOPL, PRSS16
15   17772757   17819850   72
16   7694530    7700720    73
16   47536798   47650300   74       NLN, CSPP1
16   73039197   73071316   75
17   2300488    2345614    76       SGCZ
17   3459317    3510331    77       S100A16

Different statistical methods to discover CNVR from SNP BeadChip data are available and each method produces a unique set of CNVR. Winchester et al. [29] conducted an objective evaluation of different methods using human HapMap data and concluded that the statistical method used should be one developed for the type of data to be analyzed. In addition, they indicated that inclusion of pedigree information in the analyses reduces the number of false positives. Similarly, Wang et al. [15] analyzed their data with four different software programs and found that PennCNV yielded the most CNVR that were also discovered with at least one of the other programs. As PennCNV is the only software program that incorporates pedigree information with Illumina SNP data, it has been used in all studies with pigs in which genotypic data were collected on both parents as well as progeny (trios).
High-throughput sequencing, due to its kilobase resolution, is able to discover the more abundant smaller CNVR. Over 80% of the CNVR discovered by Jiang and coworkers were smaller than the average interval between adjacent SNP on the BeadChip (50 kb) and more than half of the CNVR discovered were between 10 and 20 kb [18]. In the study of Fernández et al., in which sequencing was used on four of the pigs with SNP genotyping data available, only 16 of 65 BeadChip CNVRs were confirmed by overlapping high-throughput analysis [19]. To illustrate the differences between BeadChip CNVR and sequencing CNVR, CNVR 32 on chromosome 10 in Table 2 of Fernández et al. [19] is 268 Kb long by BeadChip analysis and is overlapped by 51 smaller CNV found through sequencing. The large spacing of SNP in the Illumina PorcineSNP60 BeadChip and the filtering of single-SNP CNVR create low-resolution CNVR that may be an aggregate of multiple smaller CNVR. The low confirmation rate of BeadChip CNVRs is not due to low resolution, but may be a technical issue related to the design and chemistry of this system. Therefore, stringent criteria need to be applied to limit the number of false positives reported. Inclusion of pedigree information of genotyped trios and the use of PennCNV reduce the number of false positives.
Each study likely finds only a fraction of the CNVR in its population. Poor overlap between swine studies may be due to a high rate of undetected CNVR within each population as well as the dramatically different breeds used in each of the studies. The high-throughput study of Rubin et al. reported 1928 CNVR in a population of 117 European pigs and wild boars [16]. These CNVR were found to overlap, or nearly overlap, 557 known genes. Of those, only five are in common with the genes listed in Table 2, further indicating an unfortunate lack of consensus between studies. Only 72 genes from Rubin et al. [16] were in common with the 1418 known genes that intersect CNVR observed in the present study.
Although several studies have successfully reported CNVR in a wide range of swine breeds, insufficient progress has been made in determining the phenotypic effects, and in particular the economically significant effects, of these genetic variations. Rubin et al. found few CNVR within regions where signatures of selection were documented [16]. However, their study was based on a comparison between improved and unselected breeds. Two experiments were able to detect significant associations between CNVR and estimated breeding values for boars. Fowler et al. [32] conducted a GWAS for back fat thickness by genotyping boars with extremely different breeding values. Along with the GWAS, they also used two different analyses to identify CNVR. Fowler et al. [32] reported 12 different CNVR along with 32 SNP associated with back fat thickness. Revay et al. [33] genotyped boars with extremely high and extremely low breeding values for a fertility trait (direct boar effect on litter size) and reported 35 CNVR, seven of which remained significantly associated with fertility when tested in a validation set of animals. However, more detailed studies are required to identify CNVR that affect phenotypic variation within populations.
Failure to identify similar CNVR across studies is concerning. While refinement in experimental protocols is needed, the problem is amplified by variability between breeds and between detection methods. The experiment by Revay et al. [33] utilized purebred boars from the same breeds used to develop the composite population for the current study and 40% of their CNVR associated with fertility were identified in this study. Two of the lines studied for back fat thickness by Fowler et al. [32] were similar to germplasm in this study and 50% of the CNVR associated with back fat thickness were identified in this study. While the primary objective of these two reports was to detect associations with performance, they are the only two studies that used comparable commercially relevant germplasm. More work needs to be done to improve detection techniques for high-throughput testing of animals, thus facilitating detection of significant CNVR effects on economically important traits.

Materials and Methods

The experimental procedures were approved and performed in accordance with the U.S. Meat Animal Research Center’s (USMARC) Animal Care and Use committee and the Guide for Care and Use of Agricultural Animals in Research and Teaching (FASS, 2010).

Animals

A composite swine population was developed at the USMARC starting in 2001 by crossing mixed Landrace-Yorkshire sows with one of 24 founding boars – 12 Landrace and 12 Duroc. The second generation was produced by mating Landrace-sired animals to Duroc-sired animals. Subsequent generations were created by choosing one male and ten females produced by each founding boar then randomly mating them while avoiding full-sib and half-sib pairings [34]. This study uses trios from crosses of 50 boars with 525 sows producing 1621 piglets, all born in the years 2005–2010. The piglets were members of the 5th through 8th filial generations of this closed composite population. Animals in this population were managed under typical commercial standards and either sold or slaughtered at the USMARC abattoir using conventional humane stunning methods followed by exsanguination.

DNA Isolation, SNP Array Genotyping, and Quality Control

Genomic DNA was extracted from frozen tail sections, clipped from each pig at 1 day of age, using the Wizard SV Genomic DNA Purification kit (Promega, Madison, WI). The DNA samples were genotyped with the Illumina PorcineSNP60 BeadChip (Illumina, San Diego, CA) [35]. Genotype reactions were completed at the USMARC (Clay Center, NE) and the chips were then scanned at the USDA-ARS Bovine Functional Genomics Laboratory (Beltsville, MD). The scan results were interpreted at the USMARC using Illumina’s BeadStudio Genotyping software. SNP with call rates < 80% or minor allele frequencies < 0.05 were excluded from the data set, as were SNP that did not map or mapped to multiple positions in the Sus scrofa genome assembly 10.2. A final set of 49,208 SNP was used for further analysis.
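The call-rate and minor-allele-frequency thresholds above can be illustrated with a small sketch. It is not the BeadStudio workflow: the genotype encoding ("AA", "AB", "BB", with None for a missing call) and the function names are assumptions for the example, and the mapping-based exclusions are not modeled.

```python
def call_rate_and_maf(genotypes):
    """Return (call rate, minor allele frequency) for one SNP.

    genotypes: list of "AA", "AB", "BB" or None for a no-call (assumed encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0, 0.0
    call_rate = len(called) / len(genotypes)
    b_alleles = sum(g.count("B") for g in called)
    b_freq = b_alleles / (2 * len(called))  # two alleles per called animal
    return call_rate, min(b_freq, 1 - b_freq)


def passes_qc(genotypes, min_call_rate=0.80, min_maf=0.05):
    """Keep SNP unless call rate < 80% or MAF < 0.05, as stated in the text."""
    call_rate, maf = call_rate_and_maf(genotypes)
    return call_rate >= min_call_rate and maf >= min_maf


# Toy genotype vectors, invented for illustration only.
well_typed = ["AA", "AB", "BB", None, "AB"]   # call rate 0.8, MAF 0.5
monomorphic = ["AA"] * 10                     # MAF 0
poorly_called = ["AB", None, None]            # call rate 1/3

print(passes_qc(well_typed), passes_qc(monomorphic), passes_qc(poorly_called))
```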

Identification of Pig CNVs

Pig CNVs in this study were identified using PennCNV software [28]. PennCNV primarily utilizes the Log R Ratio (LRR) and the B Allele Frequency (BAF) output by BeadStudio, and the population frequency of the B allele (PFB) calculated from the genotyping results. To improve the accuracy of the calls, PennCNV was provided a gcmodel file generated by calculating the GC content for the nearest 1 Mb of sequence around each SNP. A minimum of three consecutive SNP was required to call a CNV. PennCNV also utilizes pedigree information to significantly improve the accuracy of CNV calls, and this study exclusively used pig samples with full trio information. To further improve the reliability of the results, all CNVs that were called only once in the population were discarded. CNV regions (CNVRs) were created by merging overlapping CNVs.
Mention of trade names or commercial products is solely for the purpose of providing information and does not imply recommendation, endorsement or exclusion of other suitable products by the U.S. Department of Agriculture.
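As a rough illustration of the GC content input described under Identification of Pig CNVs, the sketch below computes the percent GC in a window centered on a SNP position. It is an assumption-laden example, not the script used to build the gcmodel file: the function name, the 0-based position, the symmetric window and the decision to ignore non-ACGT bases are all choices made here for illustration.

```python
def gc_percent_around(sequence, position, window=1_000_000):
    """Percent GC in a window centered on a 0-based position (assumed convention).

    Bases other than A, C, G and T (for example N) are ignored in the denominator.
    """
    half = window // 2
    start = max(0, position - half)              # clip at chromosome start
    end = min(len(sequence), position + half)    # clip at chromosome end
    region = sequence[start:end].upper()
    gc = region.count("G") + region.count("C")
    acgt = gc + region.count("A") + region.count("T")
    return 100.0 * gc / acgt if acgt else float("nan")


# Toy sequence with a 4-base window so the arithmetic is easy to follow.
toy_gc = gc_percent_around("GGCCAATTNN", position=5, window=4)
print(toy_gc)  # window covers "CAAT", so 1 of 4 bases is G or C: 25.0
```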

S1 Table

Information on all CNVR discovered.

Chromosome position, length, and number of pigs contributing to each of the 502 CNVR identified in the present study. (XLSX)
References

1.  Detection of large-scale variation in the human genome.

Authors:  A John Iafrate; Lars Feuk; Miguel N Rivera; Marc L Listewnik; Patricia K Donahoe; Ying Qi; Stephen W Scherer; Charles Lee
Journal:  Nat Genet       Date:  2004-08-01       Impact factor: 38.330

2.  Copy number variants, diseases and gene expression.

Authors:  Charlotte N Henrichsen; Evelyne Chaignat; Alexandre Reymond
Journal:  Hum Mol Genet       Date:  2009-04-15       Impact factor: 6.150

3.  PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data.

Authors:  Kai Wang; Mingyao Li; Dexter Hadley; Rui Liu; Joseph Glessner; Struan F A Grant; Hakon Hakonarson; Maja Bucan
Journal:  Genome Res       Date:  2007-10-05       Impact factor: 9.043

4.  A large duplication associated with dominant white color in pigs originated by homologous recombination between LINE elements flanking KIT.

Authors:  Elisabetta Giuffra; Anna Törnsten; Stefan Marklund; Erik Bongcam-Rudloff; Patrick Chardon; James M H Kijas; Susan I Anderson; Alan L Archibald; Leif Andersson
Journal:  Mamm Genome       Date:  2002-10       Impact factor: 2.957

5.  Global variation in copy number in the human genome.

Authors:  Richard Redon; Shumpei Ishikawa; Karen R Fitch; Lars Feuk; George H Perry; T Daniel Andrews; Heike Fiegler; Michael H Shapero; Andrew R Carson; Wenwei Chen; Eun Kyung Cho; Stephanie Dallaire; Jennifer L Freeman; Juan R González; Mònica Gratacòs; Jing Huang; Dimitrios Kalaitzopoulos; Daisuke Komura; Jeffrey R MacDonald; Christian R Marshall; Rui Mei; Lyndal Montgomery; Kunihiro Nishimura; Kohji Okamura; Fan Shen; Martin J Somerville; Joelle Tchinda; Armand Valsesia; Cara Woodwark; Fengtang Yang; Junjun Zhang; Tatiana Zerjal; Jane Zhang; Lluis Armengol; Donald F Conrad; Xavier Estivill; Chris Tyler-Smith; Nigel P Carter; Hiroyuki Aburatani; Charles Lee; Keith W Jones; Stephen W Scherer; Matthew E Hurles
Journal:  Nature       Date:  2006-11-23       Impact factor: 49.962

6.  Molecular basis for the dominant white phenotype in the domestic pig.

Authors:  S Marklund; J Kijas; H Rodriguez-Martinez; L Rönnstrand; K Funa; M Moller; D Lange; I Edfors-Lilja; L Andersson
Journal:  Genome Res       Date:  1998-08       Impact factor: 9.043

7.  Distribution and functional impact of DNA copy number variation in the rat.

Authors:  Victor Guryev; Kathrin Saar; Tatjana Adamovic; Mark Verheul; Sebastiaan A A C van Heesch; Stuart Cook; Michal Pravenec; Timothy Aitman; Howard Jacob; James D Shull; Norbert Hubner; Edwin Cuppen
Journal:  Nat Genet       Date:  2008-05       Impact factor: 38.330

8.  A high-resolution map of segmental DNA copy number variation in the mouse genome.

Authors:  Timothy A Graubert; Patrick Cahan; Deepa Edwin; Rebecca R Selzer; Todd A Richmond; Peggy S Eis; William D Shannon; Xia Li; Howard L McLeod; James M Cheverud; Timothy J Ley
Journal:  PLoS Genet       Date:  2006-11-22       Impact factor: 5.917

9.  Copy number variations in high and low fertility breeding boars.

Authors:  Tamas Revay; Anh T Quach; Laurence Maignel; Brian Sullivan; W Allan King
Journal:  BMC Genomics       Date:  2015-04-10       Impact factor: 3.969

10.  A snapshot of CNVs in the pig genome.

Authors:  João Fadista; Marianne Nygaard; Lars-Erik Holm; Bo Thomsen; Christian Bendixen
Journal:  PLoS One       Date:  2008-12-16       Impact factor: 3.240
