Literature DB >> 34329477

The Chicken Pan-Genome Reveals Gene Content Variation and a Promoter Region Deletion in IGF2BP1 Affecting Body Size.

Kejun Wang1,2, Haifei Hu3, Yadong Tian1,2, Jingyi Li4, Armin Scheben5, Chenxi Zhang1,2, Yiyi Li1,2, Junfeng Wu1,2, Lan Yang1,2, Xuewei Fan1,2, Guirong Sun1,2, Donghua Li1,2, Yanhua Zhang1,2, Ruili Han1,2, Ruirui Jiang1,2, Hetian Huang1,2, Fengbin Yan2, Yanbin Wang2, Zhuanjian Li1,2, Guoxi Li1,2, Xiaojun Liu1,2, Wenting Li1,2, David Edwards3, Xiangtao Kang1,2.   

Abstract

Domestication and breeding have reshaped the genomic architecture of chicken, but the retention and loss of genomic elements during these evolutionary processes remain unclear. We present the first chicken pan-genome constructed using 664 individuals, which identified an additional approximately 66.5-Mb sequences that are absent from the reference genome (GRCg6a). The constructed pan-genome encoded 20,491 predicated protein-coding genes, of which higher expression levels are observed in conserved genes relative to dispensable genes. Presence/absence variation (PAV) analyses demonstrated that gene PAV in chicken was shaped by selection, genetic drift, and hybridization. PAV-based genome-wide association studies identified numerous candidate mutations related to growth, carcass composition, meat quality, or physiological traits. Among them, a deletion in the promoter region of IGF2BP1 affecting chicken body size is reported, which is supported by functional studies and extra samples. This is the first time to report the causal variant of chicken body size quantitative trait locus located at chromosome 27 which was repeatedly reported. Therefore, the chicken pan-genome is a useful resource for biological discovery and breeding. It improves our understanding of chicken genome diversity and provides materials to unveil the evolution history of chicken domestication.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  zzm321990 IGF2BP1zzm321990 ; body size; chicken; major gene; pan-genome

Mesh:

Year:  2021        PMID: 34329477      PMCID: PMC8557422          DOI: 10.1093/molbev/msab231

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

Chicken (Gallus gallus) is the most abundant domesticated animal in the world. The publication of the chicken genome in 2004 (Hillier et al. 2004) paved the way to identify the quantitative trait loci (QTLs) or quantitative trait genes (QTGs) involved in economically important traits, dissect the evolutionary processes of domestication, and understand the genetic basis of distinct phenotypes differentiating domesticated chickens and their wild relatives. Recently, the domestic chicken G. gallus domesticus was reported to have been domesticated from one subspecies of red jungle fowl, G. gallus spadiceus (Wang, Thakur, et al. 2020). Nevertheless, subspecies of G. gallus and other jungle fowls can introgress with G. gallus domesticus and these interspecies hybridizations have affected the genetic content of the species during evolution (Barton 2001; Desta 2019; Lawal et al. 2020; Wang, Thakur, et al. 2020). Traits such as yellow skin, pencilled feathers, and the spotted comb of domesticated chickens are likely the result of introgressions from G. sonneratii, G. varius, and G. lafayettii (Morejohn 1968b; Eriksson et al. 2008; Fallahshahroudi et al. 2019). Hybridizations leading to fertile offspring have been documented between Gallus species (Danforth 1958; Morejohn 1968a). These indicate that G. gallus domesticus is an admixed species, not only derived from red jungle fowl (Wang, Thakur, et al. 2020). A recent study also found different genome sizes between red jungle fowl and domestic chicken lineages (Piegu et al. 2020). Moreover, growing evidence suggests that structural variations are present in a substantial proportion of the genomes of many animals (Bickhart and Liu 2014), including human (Sherman et al. 2019), pig (Zhao et al. 2016; Li, Chen, et al. 2017; Tian et al. 2020), salmon (Bertolotti et al. 2020), and chicken (Kerstens et al. 2011; Seol et al. 2019). A range of phenotypes in chicken was reported to be determined by structural variations, such as feathered legs (Li, Lee, et al. 2020), crest (Li et al. 2021), blue egg shell (Wang et al. 2013), muffs and beard (Guo et al. 2016), comb (Wright et al. 2009; Imsland et al. 2012), and fibromelanosis (Dorshorst et al. 2011). The current chicken reference genome (GRCg6a) is derived from a single red jungle fowl individual. This reference therefore cannot fully capture the genetic diversity of domesticated chickens and may be unable to reveal the genetic basis of some phenotypes. Recently, an increasing number of reports for pan-genomes in human (Sherman et al. 2019), pig (Tian et al. 2020), goat (Li et al. 2019), and also in plants (Bayer et al. 2020), have focused on capturing genetic variations between different individuals within the species. The pan-genome represents the gene set of the species rather than a representative individual, which can uncover the genetic diversity and resolve structural variations that are missed by studies using a single reference genome. The pan-genome can also provide a straightforward way to detect presence/absence variations (PAVs) and explore the distributions of these variants at the population level. Body size is an important quantitative trait that has been intensively selected during chicken improvement and possibly associated with genome structural variations. One of the well-known candidate genes linked to body size is insulin-like growth factor 2 mRNA-binding protein 1 (IGF2BP1). IGF2BP1 can regulate cell proliferation, differentiation, morphology, and metabolism through regulating mRNA localization, stability, and translation of targeted genes (Stohr et al. 2012; Bell et al. 2013). In recent studies, IGF2BP1 was reported as N6-methyladenosine (m6A) readers to regulate the above functions (Huang et al. 2018; Zhu et al. 2020; Zhang, Wan, et al. 2021). Knockout of IGF2BP1 in mouse led to mild active colitis, mild-to-moderate active enteritis, and decreasing of barrier function and body weight (Singh et al. 2020). Dwarfism and impaired gut development were also observed in IGF2BP1-deficient mice (Hansen et al. 2004). Evidence from genome-wide association studies (GWASs) and QTL mapping revealed that the genomic regions upstream of IGF2BP1 were significantly associated with body weight, head weight, gizzard weight, chest width, leg weight, and wing weight in chicken and duck (Sheng et al. 2013; Ma et al. 2019; Zhou et al. 2018; Wang, Bu, et al. 2020; Wang, Cao, et al. 2020; Zhang, Wang, et al. 2021). However, the causal variations of IGF2BP1 that responsible for body sizes in chicken and duck remain unclear. Here, we constructed the first chicken pan-genome and comprehensively investigated PAV using this pan-genome, revealing changes in allele frequencies associated with chicken evolution. We found that deletions in the promoter region of IGF2BP1 can increase transcriptional activity and gene expression, regulating the body size in commercial chickens. Dissection of the causal variation of IGF2BP1 associated with body size can accelerate the breeding process for high growth rate chickens using marker-assisted selection. These findings will improve our understanding of changes in chicken gene content during domestication and breeding and help to design highly productive chicken breeds in the future.

Results

Pan-Genome Construction of Chicken

We constructed the first G. gallus pan-genome using an iterative mapping and assembly approach based on the chicken reference genome GRCg6a assembly. A set of whole-genome sequencing data including 664 individuals was used in the pan-genome construction, which contains 5 G. gallus wild subspecies, 28 native breeds (indigenous chicken breeds raised by farmers that did not experience intense artificial selection), and 4 commercial breeds (supplementary table S1, Supplementary Material online; fig. 1).
Fig. 1.

Pan-genome of chicken. (a) Geographical distribution of samples used for pan-genome construction. (b) Pan-genome gene classification. (c) Word cloud of the GO enrichment of biological process for variable genes. (d) Pan-genome modeling. The pan-genome modeling shows no more dramatic increases when the number of accession genomes is over 220, indicating that selected individuals were sufficient to capture the majority of PAVs within Gallus gallus. Upper and lower lines represent the pan-genome number and core-genome number, respectively.

Pan-genome of chicken. (a) Geographical distribution of samples used for pan-genome construction. (b) Pan-genome gene classification. (c) Word cloud of the GO enrichment of biological process for variable genes. (d) Pan-genome modeling. The pan-genome modeling shows no more dramatic increases when the number of accession genomes is over 220, indicating that selected individuals were sufficient to capture the majority of PAVs within Gallus gallus. Upper and lower lines represent the pan-genome number and core-genome number, respectively. The G. gallus pan-genome identified an additional approximately 66.5-Mb sequences that are absent from the reference genome (GRCg6a), encoding an additional 4,063 high-confidence genes (supplementary tables S2 and S3, Supplementary Material online). Of these, 49% (1,976 genes) nonreference genes are only present in a small proportion of chickens (fig. 1). Together, the chicken pan-genome, including reference and nonreference sequences, consists of 1,131.9 Mb and contains 20,941 predicted protein-coding genes. A total of 81 RNA-seq data sets from 27 tissues (including digestive, respiratory, kinetic, urinary, reproductive, endocrine, circulatory, nervous, immune, epithelium, and connective system) were used to investigate the gene expression (supplementary table S4, Supplementary Material online). We observed an average normalized transcript per million abundance greater than 1 for 90.6% of the autosomal genes in the reference genome and 19.4% of the nonreference genes. This pattern is similar to those found in other pan-genome studies (Zhao et al. 2018; Gao et al. 2019), which showed that genes in the reference genome generally have higher expression than genes in the nonreference contigs (supplementary fig. S1, Supplementary Material online).

Discovery of Gene PAV

After sample selection (see Materials and Methods and supplementary fig. S2, Supplementary Material online), a total of 268 individuals with average sequence depth larger than 10× based on pan-genome estimation were available for gene PAV detection, including 6 wild, 217 native, and 45 commercial individuals (supplementary table S1, Supplementary Material online). We categorized genes in the chicken pan-genome according to their gene presence frequencies. A total of 15,205 (76.32%) core genes are shared by 268 individuals. 4,738 genes are variable including 391 softcore, 2,351 shell, and 1,976 cloud genes, which are present in more than 99%, 1–99%, and less than 1% of all individuals, respectively (fig. 1). The chicken pan-genome showed a moderate core gene content (76.32%) compared with that of human (96.88%) (Duan et al. 2019), mussel (69.2%) (Gerdol et al. 2020), and plants (35–81%) (Gao et al. 2019). Gene Ontology (GO) enrichment results of each cluster of variable genes are presented in supplementary table S5, Supplementary Material online. Variable genes were enriched in the function associated with reproduction, nutrient absorption, metabolic and biosynthetic process (fig. 1). RNA-seq analysis revealed that the expression level of flexible genes (shell and cloud genes) was significantly lower than that of conserved genes (core and softcore genes) (supplementary fig. S1, Supplementary Material online). No apparent difference of expression was identified between conserved genes in the reference and nonreference sequences, but the expression of conserved genes was significantly higher than that of flexible genes in both reference and nonreference sequences (supplementary fig. S1, Supplementary Material online). Pan-genome modeling revealed a closed pan-genome with an estimated total of 19,190 genes (genes on sex chromosomes were excluded because the gene content was different between chromosomes Z and W) (fig. 1). This suggests the chicken pan-genome assembled using our selected 268 individuals included all or nearly all of the G. gallus gene contents.

Gene PAV Shaped by Selection, Genetic Drift, and Hybridization

We observed a broad gene PAV distribution within different groups, with substantial variation in the native chickens and wild relatives (red jungle fowls) (fig. 2). PAV-based Principal Component Analysis (PCA) and phylogenetic analysis also showed high diversity among wild relatives and native chickens, whereas commercial broiler and layer clustered together (fig. 2). Moreover, two clades of commercial chickens were further separated into two groups, meat-production (two broiler breeds: BRA and BRB) and egg-production (two layer breeds: BL and WL). These differences between commercial and native or wild chickens are likely due to selection, but the genetic drift and other factors cannot be ruled out. Therefore, we further investigated whether selection, genetic drift, and hybridization can alter gene PAV content since single-nucleotide polymorphism (SNP)-based allelic frequencies can be shaped by selection, genetic drift, and hybridization (Edwards 2008).
Fig. 2.

Distribution of gene PAV. (a) The heatmap shows the PAV of variable genes within wild relatives, native breeds, and commercial breeds. (b) The principal component analysis of chicken breeds based on gene PAV. Wild: wild relatives (red jungle fowls); Native, native breeds; commercial breeds consist of two broiler breeds (BRA and BRB) and two layer breeds (BL and WL). (c) Neighbor-Joining phylogenetic tree constructed based on gene PAV matrix.

Distribution of gene PAV. (a) The heatmap shows the PAV of variable genes within wild relatives, native breeds, and commercial breeds. (b) The principal component analysis of chicken breeds based on gene PAV. Wild: wild relatives (red jungle fowls); Native, native breeds; commercial breeds consist of two broiler breeds (BRA and BRB) and two layer breeds (BL and WL). (c) Neighbor-Joining phylogenetic tree constructed based on gene PAV matrix. We analyzed the pool sequencing data of “Virginia body weight lines” and compared the gene PAV content between high weight selected (HWS) lines and low weight selected (LWS) lines which were divergently selected from the same founder White Plymouth Rock population (Lillie et al. 2018). Two lines had been suffered from intensive bidirectional selection for 8-week body weight, between which about 15-fold phenotypic difference presented. PAV-based PCA and phylogenetic analysis showed two distinct clusters were consistent with selected lines (supplementary fig. S3 and b, Supplementary Material online). This suggests that gene PAV can be shaped by intensively artificial selection. We have further compared the frequencies of gene PAV between HWS and LWS and identified the candidate genes related to the intensive bidirectional selection for body weight. Twenty-four genes were found to be completely absent in HWS and present in LWS or entirely present in HWS and absent in LWS (supplementary fig. S3, Supplementary Material online). The well-studied SH3 domain containing ring finger 2 (SH3RF2, ENSGAT00000090177) gene, regulating appetite and affecting body weight, was also identified as one of these genes that is completely absent in HWS but present in LWS (Rubin et al. 2010; Jing et al. 2020). We further investigated whether gene PAV within the chicken population can be affected by genetic drift or hybridization. Firstly, we studied conserved populations of varying size. The subpopulations GS1, GS2, and GS3 are from the Gushi chicken populations, of which GS1 (n = 30) and GS2 (n = 30) were sampled from a small conserved population in 2010 and 2019, respectively, whereas GS3 (n = 30) was sampled from a large conserved population in 2019. The subpopulations XB1 (n = 30) and XB2 (n = 30) are the Xichuan black-bone chicken populations, which were sampled from a large conserved population in 2010 and 2019, respectively (supplementary note;supplementary fig. S4, Supplementary Material online). We did not observe the change of PAV content during short period (<9 years), whatever in a small or big conservation population, by comparing XB1 and XB2, or GS1 and GS2. However, we observed an apparent division between GS1 or GS2 and GS3 based on the results of PCA and phylogenetic analysis (supplementary fig. S4b–d, Supplementary Material online). We also found a significant reduction of genetic diversity in GS1 and GS2 in comparison with GS3 based on SNP heterozygosity and allelic richness analysis (observed heterozygosity: 0.18 in GS1, 0.17 in GS2, 0.23 in GS3; allelic richness: 0.53 in GS1, 0.47 in GS2, 0.75 in GS3). These results are consistent with significant differences in gene PAV content (supplementary fig. S4b–d, Supplementary Material online) and previous studies showing that small conserved populations suffer from genetic drift after long periods of isolation which leads to a reduction of genetic diversity (Whitlock 2000). Based on the above evidence, we compared the gene PAV frequencies between GS3 and GS1+GS2 to investigate gene PAV involving genetic drift by long periods of isolation. According to Fisher’s exact test (false discovery rate < 0.001) (Gao et al. 2019), we only identified six genes that were significantly different in frequencies between GS3 and GS1+GS2 (supplementary fig. S4, Supplementary Material online), and none of these genes has a clear functional annotation (annotated as proteins without known function). Of these, four gene PAVs were fixed or nearly fixed in GS1+GS2 that is consistent with the reduction of genetic diversity of GS1 and GS2. Secondly, we compared the gene PAV between Gushi chickens and the Gushi×Anak F2 population (supplementary fig. S5, Supplementary Material online) and identified a clear divergence between Gushi breeds and the F2 population according to the PAV distribution (supplementary fig. S5, Supplementary Material online). PAV-based PCA and the phylogenetic analysis also revealed that Gushi pure breeds and hybrid population fall into two distinct clades. These results suggest a relatively larger effect of hybridization on gene content, which is also significantly more extensive than that from genetic drift by comparing their clustering (supplementary fig. S5 and d, Supplementary Material online).

Change of Gene PAV Frequency during Breeding

Gene PAV can be shaped by domestication and improvement; therefore, PAV within populations can also be applied to track the evolutionary history of a species (Gao et al. 2019; Guo et al. 2020). By comparing the gene presence frequency between the commercial and native breeds, we identified 30 significantly increased genes and 83 significantly decreased genes associated with postdomestication breeding improvement (supplementary table S6 and fig. S6, Supplementary Material online). Of these, ten significant genes (seven increased and three decreased) are located on the reference genome. We observed that two uncharacterized genes (PanGallus_Gene02610 and ENSGALT00000098327) are lost in modern breeds. We also observed four immune-related genes significantly decreased during improvement, including a class I histocompatibility antigen (ENSGALT00000081489), a B-cell differentiation antigen CD72-like (PanGallus_Gene00218), a T-cell differentiation antigen CD6 (PanGallus_Gene04583), and an Immunoglobulin G-binding protein A (PanGallus_Gene03891). Tibetan chicken living at the Tibetan Plateau shows the environmental adaptation to high altitudes, particularly to the hypoxic environment (Wang et al. 2015). Therefore, we compared the gene PAV frequencies between Tibetan chicken and other lower land indigenous chicken to identify candidate genes associated with the environmental adaptation to high altitude (supplementary table S7, Supplementary Material online). A total of 121 genes showing significant difference in PAV frequencies were identified, of which frequencies of 118 genes were significantly increased in Tibetan chicken. Vasodilator-stimulated phosphoprotein (VASP, ENSGALT00000100137) was found to have a high presence frequency (0.906) in Tibetan chicken compared with other lower land native chickens (0.476). VASP has been reported to protect endothelial barrier function during hypoxia (Schmit et al. 2012). Vasculature of VASP deletion mouse exhibited patterning defects and lacks structural integrity, leading to edema and hemorrhaging (Furman et al. 2007). This evidence suggests that VASP is likely to play an essential role in vasculature function and structure in a hypoxic environment. Transitional endoplasmic reticulum ATPase gene (ENSGALT00000056168) was nearly completely lost in Tibetan chicken (frequency is 0.093), while had moderate frequency in other lower land native chickens (0.568). Previous studies revealed that transitional endoplasmic reticulum ATPase activity is significantly inhibited during hypoxia in rat and western painted turtles (Henrich and Buckler 2013; Smith et al. 2015). This suggests that the absence of transitional endoplasmic reticulum ATPase gene is potentially associated with the adaptation to hypoxic environment.

Change of PAV Frequency in Promoter Regions during Breeding

Most PAV analysis in previous pan-genome studies has focused on the protein-coding regions. However, further investigations of the roles of regulatory regions are also required since they can affect gene expression and phenotype (Van Laere et al. 2003; Swinnen et al. 2019). Similarity between orthologous promoters drastically decreased when distance was longer than 2 kb from the gene transcription start site (TSS) (Keightley et al. 2005). Therefore, promoter regions are generally anchored within the 2-kb upstream genomic region of the TSS (Farre et al. 2007; Abe and Gemmell 2014). In this study, the promoter region was defined as the 3-kb upstream genomic region from TSS to maximize the captured promotor regions. In order to detect smaller PAV in the promoter region, we divided each of the promoter region into three windows: 0–1, 1–2, and 2–3 kb upstream of the gene and investigated the frequencies of PAV in each window (fig. 3). We observed that frequencies of 143 PAVs in the 0–1 kb region of commercial chickens were significantly different from that of native chickens, which contains 117 increased and 26 decreased. In the same comparison, the frequencies of 80 PAVs differed significantly in the 1–2 kb regions with 56 increased and 24 decreased, and 78 PAVs differed in the 2–3 kb regions with 55 increased and 23 decreased (fig. 3; supplementary table S8, Supplementary Material online). We found 12 genes in the olfactory receptor gene family that showed reduced presence frequency in the promoter regions of commercial chickens relative to native chickens (supplementary fig. S7, Supplementary Material online). We also observed that the presence frequencies of the promoter region of nine immunoglobulin-related genes were altered during improvement (supplementary fig. S7, Supplementary Material online). Genes with significantly altered PAVs frequencies in promoter regions during breeding were enriched mainly in the GO terms of modulation by virus of host process, cyclin-dependent protein kinase holoenzyme complex, and p53 binding (supplementary fig. S8, Supplementary Material online).
Fig. 3.

Change of PAV frequency in promoter region during breeding and PAV-based GWAS. Scatter plots showing gene occurrence frequencies in Native breeds and Com (commercial) breeds for 0–1 kb (a), 1–2 kb (b), and 2–3 kb (c) upstream promoter regions, respectively. Manhattan plots showing significant promoter region PAVs associated with 151 traits for 0–1 kb (d), 1–2 kb (e), and 2–3 kb (f) upstream promoter regions. All association analysis result was plotted according to the physical location and P-value, with each dot representing an association analysis result. The upper and lower dashed lines represent the significant and suggestive thresholds, respectively. CW1, claw weight; CR, the ratio of claw weight to body weight; DPW, double pinion weight; SEW, semi-evisceration weight.

Change of PAV frequency in promoter region during breeding and PAV-based GWAS. Scatter plots showing gene occurrence frequencies in Native breeds and Com (commercial) breeds for 0–1 kb (a), 1–2 kb (b), and 2–3 kb (c) upstream promoter regions, respectively. Manhattan plots showing significant promoter region PAVs associated with 151 traits for 0–1 kb (d), 1–2 kb (e), and 2–3 kb (f) upstream promoter regions. All association analysis result was plotted according to the physical location and P-value, with each dot representing an association analysis result. The upper and lower dashed lines represent the significant and suggestive thresholds, respectively. CW1, claw weight; CR, the ratio of claw weight to body weight; DPW, double pinion weight; SEW, semi-evisceration weight. Interestingly, we found two PAVs located at both 1–2 and 2–3 kb upstream region of IGF2BP1 gene, respectively, which their presence frequencies are significantly less in commercial chickens than native chickens (fig. 3). A high loss rate was observed in commercial breeds compared with native breeds, with a 1–2 kb promoter region presence frequency of 0.04 in commercial breeds and 0.83 in native breeds (supplementary table S8, Supplementary Material online). Similarly, the 2–3 kb promoter region presence frequency was 0.04 in commercial breeds and 0.89 in native breeds (fig. 3).

PAV-Based GWAS on Promoter Regions

To uncover traits determined by promoter region PAV, we further conducted PAV-based GWAS on the promoter regions using the Gushi×Anak population with 204 F2 individuals (fig. 3). Anak chicken is a commercial broiler breed from Israel, whereas Gushi is an indigenous chicken of China that did not experience from an intensive selection. We identified 56 association events for 0–1 kb promoter regions, 61 for 1–2 kb promoter regions, and 78 for 2–3 kb promoter regions (supplementary table S8, Supplementary Material online). These association events are involved in 81 traits, including body size, growth, carcass, meat quality, and physiological traits (supplementary note, Supplementary Material online). For example, the PAV for 2–3 kb upstream region of ENSGALG00000052768 (low-density lipoprotein receptor precursor, LDLR) was functionally related to serum CREA (creatinine) level. ENSGALG00000051173 (olfactory receptor 14C36-like) was found to be associated with ileum length (IL), jejunum length (JL), and cecum length (CL). We also found that the promoter region PAV of immune-related genes showed associations with production traits. For instance, ENSGALG00000054397 (class I histocompatibility antigen, F10 alpha chain-like isoform X1) was associated with breast bone length (BBL12) and ENSGALG00000050329 (class I histocompatibility antigen, F10 alpha chain-like isoform X1) was correlated with body weight at birth. ENSGALG00000051088 (G. gallus class I histocompatibility antigen, F10 alpha chain-like) was linked with BBL12 and body slanting length at 12 weeks (BSL12) (supplementary table S8, Supplementary Material online). Immunoglobulin-related genes were also identified to correlate with production traits. ENSGALG00000049846 (immunoglobulin-like receptor CHIR2D-751 precursor) was associated with breast muscle weight (BMW) and the ratio of head weight to body weight at 12 weeks (HR1). ENSGALG00000045164 (leukocyte immunoglobulin-like receptor subfamily A member 2) was associated with BMW and shank girth (SG8). ENSGALG00000050779 (immunoglobulin superfamily member 1) was linked with six carcass composition traits, and ENSGALG00000050638 (immunoglobulin-like receptor CHIR2D-878 precursor) was associated with shank length (SL12) (supplementary table S8, Supplementary Material online). As expected, we also found that the promoter region of IGF2BP1 was associated with growth traits, including claw weight (CW1), the ratio of claw weight to body weight (CR), double pinion weight (DPW), and semi-evisceration weight (SEW), based on PAV-based GWAS of both 1–2 kb and 2–3 kb upstream regions (fig. 3; supplementary table S8, Supplementary Material online). The most significant association was identified between IGF2BP1 and CW1 (P = 1.92E-07) based on 1–2 kb PAV-based GWAS (fig. 3).

Dissection of the Structure and Function of IGF2BP1 Promoter Region

To dissect the structure of IGF2BP1 promoter region, we comprehensively analyzed the results of polymerase chain reaction (PCR) and WGS read mapping. Three alleles of IGF2BP1 promoter region were identified, which were defined as W (wild type), L1 allele (3.2 kb deletion at GRCg6a chr27:6082202–6085435), and L2 allele (1.5 kb deletion at chr27:6083984–6085538) (fig. 4supplementary fig. S9, Supplementary Material online). We conducted allele-specific PCR to genotype these three alleles in wild, native, and commercial chickens (fig. 4). As expected, the W allele was dominant in native breeds and wild relatives. In contrast, all the two commercial broiler breeds and commercial crossed chickens mainly had absence variant (L1 or L2 alleles). Absence variant (L1 or L2) was also dominant in commercial layer breeds, except for White Leghorn breeds. This result is consistent with the distribution of 1–2 and 2–3 kb upstream region PAV frequency for the IGF2BP1 promoter region, which showed that commercial breeds were almost uniform for the mutant absence variant. We also compared the promoter region of IGF2BP1 between HWS lines and LWS lines using their pool sequencing data (Lillie et al. 2018). We found that L1 was fixed in high body weight lines, whereas W was fixed in the low body weight lines, including the relaxed selection lines (supplementary fig. S9, Supplementary Material online). It implies that W and L1 alleles have been selected to be fixed at an earlier time, before the divergence of relaxed selection lines.
Fig. 4.

Structure and frequency of the three alleles in IGF2BP1 promoter region. (a) Genomic structure of three alleles in IGF2BP1 promoter region in relation to evolutionarily conserved elements (77 vertebrates basewise PhyloP conservation score). Variant alleles in the promoter region of IGF2BP1 include wild type (W) and two mutant alleles (L1 and L2). The conserved elements are indicated by red arrows. Asp-F, 2k-F, and Asp-R are the PCR primers for the identification of the allelic type. (b) Allelic frequency of IGF2BP1 promoter region in the validated population by allelic-specific PCR genotyping. PCR product sizes of W, L1 and L2 are 2345, 290 and 791 bp, respectively. The gel shows the six genotypes derived from the combinations of the three alleles.

Structure and frequency of the three alleles in IGF2BP1 promoter region. (a) Genomic structure of three alleles in IGF2BP1 promoter region in relation to evolutionarily conserved elements (77 vertebrates basewise PhyloP conservation score). Variant alleles in the promoter region of IGF2BP1 include wild type (W) and two mutant alleles (L1 and L2). The conserved elements are indicated by red arrows. Asp-F, 2k-F, and Asp-R are the PCR primers for the identification of the allelic type. (b) Allelic frequency of IGF2BP1 promoter region in the validated population by allelic-specific PCR genotyping. PCR product sizes of W, L1 and L2 are 2345, 290 and 791 bp, respectively. The gel shows the six genotypes derived from the combinations of the three alleles. Via the single genotype marker association analysis, the associations between the L1 allele and the body size, body weight or carcass composition still hold true when enlarging the sample size of Gushi×Anak F2 population to 734 (fig. 5; supplementary fig. S10, Supplementary Material online). The associated traits include claw weight (CW1, CR), shank length (SL12), breast bone length (BBL8 and BBL12), wing weight (DPW), evisceration weight (EW and SEW), head weight (HW1), carcass weight (CW), leg weight (LW and LMW), pelvis breadth (PB12), shank girth (SG12 and SG8), body slanting length (BSL8 and BSL12), gizzard weight (GW), body weight (BWHR, BW6 and BW10), and growth rate (GR0_4). In those traits, the L1L1 genotype was always linked to better performance of production (such as body size, carcass weight, and body weight) than the WW genotype (fig. 5; supplementary fig. S10, Supplementary Material online). The significant association between the IGF2BP1 genotype and body size confirms the PAV-based GWAS results in promoter regions (fig. 3). Of these, associations are most significant in traits CW1 (P = 2.32E-14) and CR (P = 3.70E-12), which account for 4.01% and 3.85% of the phenotypic variations, respectively. Interestingly, we observed a larger effect of L1 in females relative to males, which explained 11.5% in females and 7.3% in males of the phenotypic variations for CW1 trait (fig. 5). Besides, a female phenotype variation of 8.2% for DPW and 6.2% for SL12 was explained by L1. These associations are also consistent with the chicken and duck SNP-based GWAS results, which indicated that SNPs located near IGF2BP1 were associated with body weight, head weight, gizzard weight, chest width, leg weight, and wing weight (Ma et al. 2019; Zhou et al. 2018; Wang, Bu, et al. 2020; Wang, Cao, et al. 2020; Zhang, Wang, et al. 2021). Unexpectedly, L2 allele was not found in the F2 population.
Fig. 5.

Single-marker genotype association of IGF2BP1 promoter region in the validated Gushi×Anak F2 population with 734 individuals. Eight representative association events were included and others were showed in supplementary figure S10, Supplementary Material online. The number in the bracket is the proportion of phenotype variance explained by IGF2BP1 loci. CW1, claw weight; CR, the ratio of claw weight to body weight; SL12, shank length; BBL12, breast bone length; DPW, double pinion weight; SEW, semi-evisceration weight; CW, carcass weight; LW, leg weight. All traits were phenotyped at 12 weeks of age.

Single-marker genotype association of IGF2BP1 promoter region in the validated Gushi×Anak F2 population with 734 individuals. Eight representative association events were included and others were showed in supplementary figure S10, Supplementary Material online. The number in the bracket is the proportion of phenotype variance explained by IGF2BP1 loci. CW1, claw weight; CR, the ratio of claw weight to body weight; SL12, shank length; BBL12, breast bone length; DPW, double pinion weight; SEW, semi-evisceration weight; CW, carcass weight; LW, leg weight. All traits were phenotyped at 12 weeks of age. To further verify the molecular effects of the deletions, luciferase expression levels were investigated to represent the transcriptional activity through transfecting three kinds of recombinant plasmids (pGL3-L1, pGL3-L2, and pGL3-W) into chicken DF-1 cells (fig. 6). Before performing the luciferase activity experiment, we screened the genome region which inserted the pGL3 construct and confirmed that we did not find any difference except the L1 and L2 deletion. Therefore, the activity difference among the three constructs was derived from the deletions. The transcriptional activities of these two deletions (L1 and L2) were significantly higher than that of wild type (W). Further, the activity of the L1 genotype is also higher than that of L2 (fig. 6). Subsequently, we compared the mRNA expression level between the L1L1 (Ross 308) and WW genotypes (Gushi chicken). The expression of IGF2BP1 mRNA in WW genotype is significantly lower than that in the L1L1 genotype in almost all investigated tissues at 6 weeks of age (fig. 6). To reduce the difference in the genetic background among individuals with different genotypes and investigate the effect of deletions more accurately, we performed cross-breeding between chickens with the same heterozygous genotypes (L1W×L1W and L2W× L2W) to generate L1L1, L2L2, and WW genotype chicken with half-sib or full-sib relationship, and then compared the expressions of IGF2BP1. In spleen and duodenum tissues at 3 weeks of age, we observed higher expressions in L1L1 and L2L2 than WW genotype, whereas L1L1 also showed a higher value than L2L2 (fig. 6). This is completely consistent with the result of transcriptional activity (fig. 6). We also observed three conserved elements located in L1 deletion based on 77 vertebrates basewise PhyloP conservation score (https://hgdownload.soe.ucsc.edu/goldenPath/galGal6/phastCons77way/, last accessed January 2021), of which one conserved element located in L2 deletion (fig. 4). These suggest the functional importance of three conserved elements which possibly regulating IGF2BP1 expression.
Fig. 6.

Comparison of transcriptional activity and expression among three IGF2BP1 genotypes. (a) Comparison of transcriptional activity among different IGF2BP1 promoter region in chicken DF-1 cells. Left shows the constructions of the inserted fragment into the pGL3-Basic plasmid. Significance of two-tailed Student’s t-test: **P < 0.01; ***P < 0.001. (b) Comparison of mRNA expression of IGF2BP1 between L1L1 (Ross 308) and WW (Gushi) chickens in five tissues at 6 weeks of age. Breast, breast muscle; Leg, leg muscle. P-values were calculated using a two-tailed Student’s t-test. (c) Comparison of mRNA expression of IGF2BP1 between L1L1, L2L2, and WW in an IGF2BP1 genotype segregating population at 3 weeks of age.

Comparison of transcriptional activity and expression among three IGF2BP1 genotypes. (a) Comparison of transcriptional activity among different IGF2BP1 promoter region in chicken DF-1 cells. Left shows the constructions of the inserted fragment into the pGL3-Basic plasmid. Significance of two-tailed Student’s t-test: **P < 0.01; ***P < 0.001. (b) Comparison of mRNA expression of IGF2BP1 between L1L1 (Ross 308) and WW (Gushi) chickens in five tissues at 6 weeks of age. Breast, breast muscle; Leg, leg muscle. P-values were calculated using a two-tailed Student’s t-test. (c) Comparison of mRNA expression of IGF2BP1 between L1L1, L2L2, and WW in an IGF2BP1 genotype segregating population at 3 weeks of age.

Investigation of the Genomic Regions Flanking the Deletion

Since IGF2BP1 was also reported as the body size candidate gene using SNP-based studies (Sheng et al. 2013; Ma et al. 2019; Wang, Bu, et al. 2020; Wang, Cao, et al. 2020; Zhang, Wang, et al. 2021), we explored the SNPs within the region from 10 kb upstream of L1 deletion and 10 kb downstream in order to test whether any of the signal driving SNPs in previous studies could be the causal. SNP calling was done using the same 664 individual sequencing data for building the pan-genome; however, 210 individuals were excluded for the low quality of SNP calling or their heterozygous genotype. The remaining individuals include 325 WW, 117 L1L1, and 12 L2L2 samples. We searched for SNPs that associated with the deletions in three different ways, L1L1 versus WW, L2L2 versus WW, and L1L1 + L2L2 versus WW. Altogether, five associated SNPs were detected. Among them, the highest PhyloP conservation score is 0.97, and that SNP (chr27: 6087849) is not within a conservation element. The other four SNPs have negative conservation scores. This implies that none of these five associated SNPs is highly conserved, which supports that the deletion is likely to be the only functional mutation within this region.

Discussion

Construction of the First Chicken Pan-Genome and Dissection of Genetic Changes in the Chicken Population

Here, we constructed the first pan-genome of chicken, capturing approximately 66.5-Mb novel sequences that are absent from the reference genome (GRCg6a). Similar novel additional pan-genome sequences were captured in pig (Tian et al. 2020) (∼72.5 Mb), human (Sherman et al. 2019) (∼296.5 Mb), and plants (Yao et al. 2015; Golicz, Bayer, et al. 2016; Montenegro et al. 2017) (15.8–350 Mb). Absent sequences from the reference genome were predicted to encode additional 4,063 high-confidence genes. We also found that about one-third of the gene PAV is variable among the 268 individuals used for PAV calling. This highlights the heterogenicity of genetic makeup among chicken breeds and shows a potential utility for further breeding (Gao et al. 2019). We observed that red jungle fowls and native chickens contained most of the genetic diversity of chickens, whereas limited genetic diversity was found in commercial chickens (fig. 2). This result is consistent with known reductions in genetic diversity of modern livestock compared with their wild ancestors (Malomane et al. 2019; Frantz et al. 2020). Similarly, peach (Guo et al. 2020), chickpea (Varshney et al. 2019), and tomato (Gao et al. 2019) pan-genome studies found that their wild relatives and landraces are more genetically diverse compared with modern cultivars. We also found that intraspecies gene content variation can be affected by selection, genetic drift, or hybridization (supplementary figs. S3–S5, Supplementary Material online). We proposed that the reduction of genetic diversity in commercial chickens might occur due to intensive artificial selection during breeding, but other factors cannot be ruled out, such as genetic drift.

PAVs Are Associated with Physiological Traits and the Presence Frequency of Immune-Related Loci Was Reduced during Modern Chicken Breeding

We found that the promoter region PAV of genes showed associations with physiological related traits, such as LDLR and olfactory receptors (supplementary table S8, Supplementary Material online). Lipid accumulation can enhance LDLR expression leading to an increase of serum creatinine (Sun et al. 2013; Zhang et al. 2016). LDLR knockout mouse and rat showed substantial increases in plasma creatinine (Bisgaard et al. 2016; Sithu et al. 2017). Variation in the promoter region of LDLR may reduce its expression and further upregulate serum creatinine level. Olfactory receptors were first discovered in the olfactory epithelium, functioning in odorant recognition involving various physiological behaviors, such as food choice and intake. However, recent studies indicate that these genes are also expressed in the intestinal tract (Priori et al. 2015; Kim et al. 2017; Kotlo et al. 2020), and olfactory receptors play a role in intestinal inflammatory reaction (Kotlo et al. 2020), secretion (Kim et al. 2017), and microbiota metabolites (Priori et al. 2015). We also found olfactory receptor 14C36-like gene associated with IL, JL, and CL (supplementary table S8, Supplementary Material online). It is thus possible that olfactory receptors are involved in feed digestion and conversion via regulation of intestine development and thus were under selection during modern breeding (supplementary fig. S7, Supplementary Material online). We observed the presence frequency of immune-related gene or promoter region (including Major Histocompatibility Complex [MHC] and immunoglobulin) decreased in commercial chicken compared with the native breed. Of these, some immune gene PAVs showed significant association with production traits (supplementary table S8, Supplementary Material online). This is consistent with a previous report that a high immune response is negatively correlated with chicken egg production and body weight (Warner et al. 1987). MHC genes are involved in immune recognition and susceptibility to infectious disease (Sommer 2005). There is a possible genetic linkage between MHC genes and growth or reproduction genes (Warner et al. 1987). Another possible explanation is that increased productivity may also increase the metabolic burden of immune gene maintenance in modern breeds. A trade-off might occur between the conservation of production-related genes and the loss of immune-related genes due to human selection for desirable production traits (van der Most et al. 2011).

IGF2BP1 Deletion Is the Causal Variant for a Major QTL Associated with Body Size

Many QTGs or QTLs associated with chicken growth traits have been identified, of which loci located at chromosomes 27, 4, and 1 have the largest impact on growth in chicken (Sheng et al. 2013; Ma et al. 2019; Wang, Bu, et al. 2020; Wang, Cao, et al. 2020; Zhang, Wang, et al. 2021). To our knowledge, the study in 2003 was the first time to report that a large QTL region located between 4.0 and 6.1 Mb in chromosome 27 was associated with chicken body size (Kerje et al. 2003). After that, many studies identified the chicken growth trait QTL in chromosome 27, including the gene IGF2BP1 by SNP-based GWAS (Sheng et al. 2013; Ma et al. 2019; Wang, Bu, et al. 2020; Wang, Cao, et al. 2020). Our previous GWAS also revealed a signal peak correlated to body size trait, which was located at the genomic upstream of IGF2BP1 (Zhang, Wang, et al. 2021). GWAS in duck also revealed SNPs located at the genomic upstream region of IGF2BP1 that showed significant association with body size traits, whereas a higher expression level of IGF2BP1 is correlated to better performance (Zhou et al. 2018). Altogether, IGF2BP1 is a potential major gene associated with body size traits, but the causal variant regulating these traits has not been reported previously. In this study, using a genotype–phenotype association, we found two mutant alleles in the IGF2BP1 promoter region that contributed to larger body size. We also observed a stronger association in females than males (fig. 5; supplementary fig. S10, Supplementary Material online). We compared the phenotypes among L1W, L1L1, and WW chickens to estimate the inheritance mode of the deletions. Taking the CW1 trait as an example, we found no significant difference in CW1 between L1W and L1L1 (P = 0.68) in males, whereas both are significantly heavier than WW (L1W vs. WW, P = 1.26E-3, L1L1 vs. WW, P = 2.60E-5). We inferred that there is a possible dominant effect of L1 against W in males. In females, however, we found no significant difference between WW and L1W (P = 0.42), whereas L1L1 are significantly heavier than L1W (P = 5.35E-7) and WW (P = 4.0E-6). There is a possible recessive effect of L1 against W in females. One possible reason is that this autosomal deletion locus shows sex-influenced inheritance, with a dominant effect in males and a recessive effect in females. There may be a putative binding site of androgen-mediated transcription factor located on this deletion region. We also found three conserved elements based on 77 vertebrates basewise PhyloP conservation score, suggesting a putative regulatory function (fig. 4). These deletions in the promoter region may increase IGF2BP1 expression by upregulating its transcriptional activity (fig. 6). Further studies are required to elucidate the upstream regulatory pathway. Together with our GWAS analysis, the mutant genotype is associated with higher expression of IGF2BP1 and improved productivity traits (figs. 5 and 6). Our findings are consistent with findings that higher expression of IGF2BP1 is linked to the larger body size in duck (Zhou et al. 2018). Although the IGF2BP1 mutation only explains a moderate 2–4% of phenotypic variation, this is in fact a substantial effect for a complex quantitative trait like body size. For instance, in humans two key variants for lean body mass explained 0.23% and 0.16% of the variance (Zillikens et al. 2017) and approximately 50 variants for height only explain approximately 5% of the variance (Yang et al. 2010). After examining the flanking regions of the deletion, the only five SNPs correlated to the deletion showed extremely low conservation scores implying that the deletion is the unique functional variant in this region. Based on this combined evidence, we propose that the deletion in the IGF2BP1 promoter region is the causal variant for the QTL located at chromosome 27 that was previously reported to be related to body size in chicken.

Conclusion

Collectively, this first chicken pan-genome provides a foundation for future chicken population genetics and evolutionary genomics studies. PAV analysis offers an opportunity to uncover genomic architecture and identify the change of gene content during domestication and improvement, helping the designing of future chicken breeds with desired traits. We dissect the causal variant of one of the major QTLs contributing to body size in chicken using PAV-based GWAS. The deletions that we found can be applied as markers for breeding programs using marker-assisted selection. As pan-genomic studies become more common, PAV-based GWAS will provide a powerful complement to SNP-based GWAS for identifying functional variants of economically or evolutionary important traits.

Materials and Methods

Genomic Sequencing of Chicken

A total of 868 individuals were used in this study, of which 664 were used to construct the chicken pan-genome (supplementary table S1, Supplementary Material online). We downloaded 509 accessions, published in recent genome resequencing studies (Fan et al. 2013; Wang et al. 2015; Ulfah et al. 2016; Li, Che, et al. 2017; Lawal et al. 2018; Qanbari et al. 2019; Huang et al. 2020; Wang, Thakur, et al. 2020), from the National Center for Biotechnology Information (NCBI) Sequence Read Archive database (supplementary table S1, Supplementary Material online). Sequencing data of 150 Henan indigenous chickens and 204 Gushi×Anak F2 individuals were generated in this study, and further data for an additional five Xichuan black-bone chickens were generated in our previous study (Li, Sun, et al. 2020). Genomic DNA was extracted from chicken blood using Qiagen DNeasy Kit. Paired-end libraries with approximately 500 bp insert size were constructed and then subjected to sequencing using the BGISEQ-500 platform to generate paired-end 150 bp reads (BGI Genomics Co., Ltd. and Beijing Fuyu Biotechnology Co., Ltd, China). We also downloaded ten pool sequencing data, including five HWS and five LWS pool data from the NCBI database using project number PRJNA516366 (Lillie et al. 2018).

Pan-Genome Construction and Annotation

Raw reads were processed to remove low-quality reads and generate adaptor free clean reads using Trimmomatic (v0.36) (Bolger et al. 2014). The pan-genome was constructed by a reference-based iterative mapping and assembly approach using the GRCg6a assembly as a starting reference genome (Golicz, Batley, et al. 2016; Golicz, Bayer, et al. 2016). The reference-based iterative mapping and assembly approach (Golicz, Batley, et al. 2016; Golicz, Bayer, et al. 2016) was first applied in a pan-genome study of the crop Brassica oleracea. This approach allows using sequencing data from a large range of individuals from different populations to construct a pan-genome. Briefly, clean reads were mapped to the reference genome (Ensemble Gallus_gallus.GRCg6a.dna.toplevel.fa) using bowtie2 (v2.3.5.1) (Langmead and Salzberg 2012). Unmapped reads were extracted using SAMtools and then assembled using MaSuRCA v3.3.1 (Zimin et al. 2013). After pan-genome construction, newly assembled contigs of nonreference sequences with length larger than 500 bp were kept. Contaminant sequences were filtered by the following two steps. Firstly, contigs were aligned using BlastN v2.9.0 (Camacho et al. 2009) against the NT database (v5, 07-03-2019) of contaminant taxid groups, which includes archaea, viruses, bacteria, fungi, and Viridiplantae. Secondly, the remaining contigs were classified and filtered using Kraken2 (v 2.0.9-beta) based on the kraken2-microbial database, which consists of archaea, bacteria, fungi, protozoa, viral and human sequences (https://lomanlab.github.io/mockcommunity/mc_databases.html, last accessed April 2019) (Wood et al. 2019). The unclassified contigs were defined as contamination free. The final contamination-free nonreference sequences and the reference Gallus/GRCg6a genome were merged to generate the chicken pan-genome. A custom repeat library was constructed by scanning the final nonreference sequence using RepeatModeler (v1.0.11) (Flynn et al. 2020). A custom repeat library and the RepBase database (downloaded in June 2019) of vertebrates were used to detect the repeat sequences with RepeatMasker (v4.0.8) (Tarailo-Graovac and Chen 2009). The MAKER2 annotation pipeline was used to obtain a set of high-confidence annotation based on RNA-seq evidence, homologous protein evidence, and ab initio gene prediction evidence(Holt and Yandell 2011). RNA-seq evidence was generated using Hisat2-Stringtie pipeline (Pertea et al. 2016) with published data from available tissues (supplementary table S2, Supplementary Material online). Protein sequences of chicken, human, and other mammals and vertebrates were collected from the Uniprot database (https://www.uniprot.org/, last accessed July 2019). Ab initio gene prediction was implemented using SNAP (Korf 2004) and Augustus (Stanke et al. 2006) with the “chicken” model selected. Finally, redundant assembled protein sequences were filtered with CD-HIT (Fu et al. 2012) (-c 0.9 -n 5 -M 16000 -T 18) with the threshold of 90% similarity.

PAV Calling

Gene PAV was determined based on the cumulative coverage of exons of each gene. The longest transcripts were retrieved as the gene body to avoid redundant gene counts. If at least two reads covered more than 5% cumulative coverage of all exons, this gene was defined as present in an individual. Otherwise, it was defined as absent (Golicz, Bayer, et al. 2016). Clean reads were aligned to the pan-genome using BWA-MEM (v0.7.17) (Li and Durbin 2009) with default parameters, and the sequences depth of each sample was captured using Mosdepth package (v0.2.5)(Pedersen and Quinlan 2018). High-depth sequencing data (>30×) are preferable to increase the robustness of PAV analysis; however, it is not economical to sequence large samples numbers at this depth. Low-depth data (<15×) is a viable and more economical means to carry out PAV analysis in large sets of diverse samples (Gao et al. 2019; Sherman et al. 2019; Jayakodi et al. 2020). To estimate the impact of the sequencing depth on gene PAV calling, we extracted reads from reference genome individual with varying depths of sequences to determine the minimum sequence depth required to call a confident gene PAV. An average sequence depth of 10× was considered as the threshold for including a sample since this threshold is estimated to allow a 99.94% recovery rate of gene PAV (Gao et al. 2019) (supplementary fig. S2, Supplementary Material online). We also performed additional simulation analysis using sequencing data of random seven breeds and found that 98.4–99.5% of pan-genome genes can be called when the average sequence depth reaching 10× (supplementary fig. S2, Supplementary Material online). Thus, to get a high confident PAV matrix, individuals with an average depth above 10× were kept to perform gene PAV calling. Additionally, the sequencing data of red jungle fowls in Thailand were reported to be contaminated by domestic chicken sequences and were removed for the PAV calling (Qanbari et al. 2019; Wang, Thakur, et al. 2020). PAV calling for promoter region was performed using the same method of gene PAV calling that is described above but based on the gene promoter regions. We divided the promoter region into three 1-kb windows based on the distance to the TSS. The three blocks were in the 0–1, 1–2, and 2–3 kb regions upstream to the TSS of genes in the reference genome. A PAV was considered as present if more than 50% cumulative coverage with at least two reads was identified; otherwise, it was considered absent (Golicz, Bayer, et al. 2016).

PAV Analysis

The gene PAV matrix was subjected to population genetic analysis. Principal component analysis and neighbor-joining phylogenetic analysis were conducted using TASSEL5 (Bradbury et al. 2007). To identify the PAV with frequency significantly changed during improvement, the PAV frequency of each gene was compared between the native breeds and commercial breeds. Fisher’s exact test was employed to identify significant PAV with false discovery rate 0.001 (Gao et al. 2019). Significantly increased genes were defined as genes having a significantly higher frequency in the commercial breeds than the native breeds. Inversely, we consider genes with a significantly lower frequency as significantly decreased genes. To identify the promoter region with significantly changed during chicken improvement, PAV patterns were also analyzed using the same method as gene PAV frequency calculation.

PAV-Based GWAS

PAV-based GWAS was also implemented to identify the candidate genes associated with 151 traits in a Gushi×Anak F2 mapping population with 204 individuals. To reduce bias, gene PAVs were removed if they were located on sex chromosomes or showed a minor allele frequency less than 0.05. A general linear model (GLM) was employed for association analysis using TASSEL5 (Bradbury et al. 2007), with sex and the first five PCA eigenvectors defined as fixed effects. A Bonferroni test was used to define the genome-wide significant (0.05/number of loci) or suggestive (0.1/number of loci) cut-off threshold.

GO Annotation

Functional annotation of the pangenome was performed using command line Blast2GO (Conesa et al. 2005) v2.5. The pan-genome genes were aligned to the proteins in the Uniref90 database (downloaded on Sep 2019) using BlastP (Camacho et al. 2009), and only alignments with E-values < 1 × 10−5 were used. Then, the BLAST results were reformatted to satisfy Blast2GO naming requirements. GO annotation of the variable genes was conducted by the R package topGO (Alexa et al. 2006) using Fisher’s exact test with the approach “elim” used to correct for multiple comparisons.

Genotyping of IGF2BP1 PAV and Association Analysis

Three primers, including one forward and two reverse primers, were designed based on the sequence of the IGF2BP1 promoter region (fig. 4). One pair of primer, Asp-F and Asp-R, was used for genotyping L1 and W alleles, whereas another pair, 2k-F and Asp-R, was used to genotypes L2 and W (fig. 4supplementary table S8, Supplementary Material online). PCR was conducted as described below: 5 pmol of each primer, 100 ng of genomic DNA, 2 µl 10× PCR buffer (Takara), 100 uM dNTP mixture, and 1 µl Taq polymerase. Association analysis of the validation population was conducted between genotypes (L1L1, L1W, and WW) in IGF2BP1 PAV and 151 traits (supplementary note, Supplementary Material online) in F2 population with 734 individuals using GLM as described as above. The value of marker R-squared was used to explain the phenotype variation of IGF2BP loci, as computed from the marker sum of squares after fitting all other model terms divided by the total sum of squares (Bradbury et al. 2007).

Functional Assay of IGF2BP1 Promoter Region and IGF2BP1 Expression

Three kinds of IGF2BP1 promoter region (L1, L2, and W) were cloned into pGL3-Basic luciferase vector (Promega) using Clone-F and Clone-R primers (supplementary table S9, Supplementary Material online). All recombinant plasmids, together with the pRL-TK plasmid (Promega), were transfected into DF-1 (chicken fibroblast cell) cell line. After 48 h, the transcriptional activity was investigated by the Dual-Luciferase Reporter Assay System (Promega). Quantitative PCR was conducted to investigate the mRNA level of IGF2BP1 using primers IGF2BP1-qF and IGF2BP1-qR (supplementary table S9, Supplementary Material online). The relative expression level of IGF2BP1 was normalized by GAPDH using the 2−△△ct method.

Investigating the Flanking SNPs of Deletions

The deletion and its flanking regions (chr27:6072202–6095435) were analyzed by GATK (v3.8) pipeline (McKenna et al. 2010) using the same 664 individuals for building the pan-genome. Genotypes of the IGF2BP1 deletion of each sample were determined by the GATK results and manually checking the alignments by IGV (version 2.4.3). Then samples with the same genotypes were grouped together, and the SNP associated with the IGF2BP1 genotypes was defined as 1) significant in the chi-squared test, 2) the mutant allele of the SNP has an allelic frequency higher than 0.8 in the deletion group, and 3) the allelic frequency difference between the two compared groups greater than 0.5. Three different deletion groups were used in three different comparisons. They are L1L1 group, L2L2 group, and L1L1 + L2L2 group, all compared with WW group, respectively.

Ethics Declarations

Ethics approval for this study was obtained from Henan Agricultural University.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online. Click here for additional data file.
  102 in total

1.  Using RepeatMasker to identify repetitive elements in genomic sequences.

Authors:  Maja Tarailo-Graovac; Nansheng Chen
Journal:  Curr Protoc Bioinformatics       Date:  2009-03

2.  Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.

Authors:  Mihaela Pertea; Daehwan Kim; Geo M Pertea; Jeffrey T Leek; Steven L Salzberg
Journal:  Nat Protoc       Date:  2016-08-11       Impact factor: 13.491

3.  Cytosolic calcium regulation in rat afferent vagal neurons during anoxia.

Authors:  Michael Henrich; Keith J Buckler
Journal:  Cell Calcium       Date:  2013-10-17       Impact factor: 6.817

4.  Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries.

Authors:  Hindrik Hd Kerstens; Richard Pma Crooijmans; Bert W Dibbits; Addie Vereijken; Ron Okimoto; Martien Am Groenen
Journal:  BMC Genomics       Date:  2011-02-03       Impact factor: 3.969

5.  Liraglutide Reduces Both Atherosclerosis and Kidney Inflammation in Moderately Uremic LDLr-/- Mice.

Authors:  Line S Bisgaard; Markus H Bosteen; Lisbeth N Fink; Charlotte M Sørensen; Alexander Rosendahl; Christina K Mogensen; Salka E Rasmussen; Bidda Rolin; Lars B Nielsen; Tanja X Pedersen
Journal:  PLoS One       Date:  2016-12-16       Impact factor: 3.240

6.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

7.  CD-HIT: accelerated for clustering the next-generation sequencing data.

Authors:  Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li
Journal:  Bioinformatics       Date:  2012-10-11       Impact factor: 6.937

8.  Genomic data for 78 chickens from 14 populations.

Authors:  Diyan Li; Tiandong Che; Binlong Chen; Shilin Tian; Xuming Zhou; Guolong Zhang; Miao Li; Uma Gaur; Yan Li; Majing Luo; Long Zhang; Zhongxian Xu; Xiaoling Zhao; Huadong Yin; Yan Wang; Long Jin; Qianzi Tang; Huailiang Xu; Mingyao Yang; Rongjia Zhou; Ruiqiang Li; Qing Zhu; Mingzhou Li
Journal:  Gigascience       Date:  2017-06-01       Impact factor: 6.524

9.  Detection of CNV in the SH3RF2 gene and its effects on growth and carcass traits in chickens.

Authors:  Zhenzhu Jing; Xinlei Wang; Yingying Cheng; Chengjie Wei; Dan Hou; Tong Li; Wenya Li; Ruili Han; Hong Li; Guirong Sun; Yadong Tian; Xiaojun Liu; Xiangtao Kang; Zhuanjian Li
Journal:  BMC Genet       Date:  2020-02-28       Impact factor: 2.797

10.  Large meta-analysis of genome-wide association studies identifies five loci for lean body mass.

Authors:  M Carola Zillikens; Serkalem Demissie; Yi-Hsiang Hsu; Laura M Yerges-Armstrong; Wen-Chi Chou; Lisette Stolk; Gregory Livshits; Linda Broer; Toby Johnson; Daniel L Koller; Zoltán Kutalik; Jian'an Luan; Ida Malkin; Janina S Ried; Albert V Smith; Gudmar Thorleifsson; Liesbeth Vandenput; Jing Hua Zhao; Weihua Zhang; Ali Aghdassi; Kristina Åkesson; Najaf Amin; Leslie J Baier; Inês Barroso; David A Bennett; Lars Bertram; Rainer Biffar; Murielle Bochud; Michael Boehnke; Ingrid B Borecki; Aron S Buchman; Liisa Byberg; Harry Campbell; Natalia Campos Obanda; Jane A Cauley; Peggy M Cawthon; Henna Cederberg; Zhao Chen; Nam H Cho; Hyung Jin Choi; Melina Claussnitzer; Francis Collins; Steven R Cummings; Philip L De Jager; Ilja Demuth; Rosalie A M Dhonukshe-Rutten; Luda Diatchenko; Gudny Eiriksdottir; Anke W Enneman; Mike Erdos; Johan G Eriksson; Joel Eriksson; Karol Estrada; Daniel S Evans; Mary F Feitosa; Mao Fu; Melissa Garcia; Christian Gieger; Thomas Girke; Nicole L Glazer; Harald Grallert; Jagvir Grewal; Bok-Ghee Han; Robert L Hanson; Caroline Hayward; Albert Hofman; Eric P Hoffman; Georg Homuth; Wen-Chi Hsueh; Monica J Hubal; Alan Hubbard; Kim M Huffman; Lise B Husted; Thomas Illig; Erik Ingelsson; Till Ittermann; John-Olov Jansson; Joanne M Jordan; Antti Jula; Magnus Karlsson; Kay-Tee Khaw; Tuomas O Kilpeläinen; Norman Klopp; Jacqueline S L Kloth; Heikki A Koistinen; William E Kraus; Stephen Kritchevsky; Teemu Kuulasmaa; Johanna Kuusisto; Markku Laakso; Jari Lahti; Thomas Lang; Bente L Langdahl; Lenore J Launer; Jong-Young Lee; Markus M Lerch; Joshua R Lewis; Lars Lind; Cecilia Lindgren; Yongmei Liu; Tian Liu; Youfang Liu; Östen Ljunggren; Mattias Lorentzon; Robert N Luben; William Maixner; Fiona E McGuigan; Carolina Medina-Gomez; Thomas Meitinger; Håkan Melhus; Dan Mellström; Simon Melov; Karl Michaëlsson; Braxton D Mitchell; Andrew P Morris; Leif Mosekilde; Anne Newman; Carrie M Nielson; Jeffrey R O'Connell; Ben A Oostra; Eric S Orwoll; Aarno Palotie; Stephen C J Parker; Munro Peacock; Markus Perola; Annette Peters; Ozren Polasek; Richard L Prince; Katri Räikkönen; Stuart H Ralston; Samuli Ripatti; John A Robbins; Jerome I Rotter; Igor Rudan; Veikko Salomaa; Suzanne Satterfield; Eric E Schadt; Sabine Schipf; Laura Scott; Joban Sehmi; Jian Shen; Chan Soo Shin; Gunnar Sigurdsson; Shad Smith; Nicole Soranzo; Alena Stančáková; Elisabeth Steinhagen-Thiessen; Elizabeth A Streeten; Unnur Styrkarsdottir; Karin M A Swart; Sian-Tsung Tan; Mark A Tarnopolsky; Patricia Thompson; Cynthia A Thomson; Unnur Thorsteinsdottir; Emmi Tikkanen; Gregory J Tranah; Jaakko Tuomilehto; Natasja M van Schoor; Arjun Verma; Peter Vollenweider; Henry Völzke; Jean Wactawski-Wende; Mark Walker; Michael N Weedon; Ryan Welch; H-Erich Wichmann; Elisabeth Widen; Frances M K Williams; James F Wilson; Nicole C Wright; Weijia Xie; Lei Yu; Yanhua Zhou; John C Chambers; Angela Döring; Cornelia M van Duijn; Michael J Econs; Vilmundur Gudnason; Jaspal S Kooner; Bruce M Psaty; Timothy D Spector; Kari Stefansson; Fernando Rivadeneira; André G Uitterlinden; Nicholas J Wareham; Vicky Ossowski; Dawn Waterworth; Ruth J F Loos; David Karasik; Tamara B Harris; Claes Ohlsson; Douglas P Kiel
Journal:  Nat Commun       Date:  2017-07-19       Impact factor: 17.694

View more
  10 in total

1.  Intestinal Microbiota and Host Cooperate for Adaptation as a Hologenome.

Authors:  Hao Zhou; Lingyu Yang; Jinmei Ding; Ronghua Dai; Chuan He; Ke Xu; Lingxiao Luo; Lu Xiao; Yuming Zheng; Chengxiao Han; Fisayo T Akinyemi; Christa F Honaker; Yan Zhang; Paul B Siegel; He Meng
Journal:  mSystems       Date:  2022-01-11       Impact factor: 6.496

2.  Genetic effect of an InDel in the promoter region of the NUDT15 and its effect on myoblast proliferation in chickens.

Authors:  Chengjie Wei; Yufang Niu; Bingjie Chen; Panpan Qin; Yanxing Wang; Dan Hou; Tong Li; Ruiting Li; Chunxiu Wang; Huadong Yin; Ruili Han; Huifen Xu; Yadong Tian; Xiaojun Liu; Xiangtao Kang; Zhuanjian Li
Journal:  BMC Genomics       Date:  2022-02-16       Impact factor: 3.969

3.  Pan-Genome Analysis Reveals the Abundant Gene Presence/Absence Variations Among Different Varieties of Melon and Their Influence on Traits.

Authors:  Yang Sun; Jing Wang; Yan Li; Bin Jiang; Xu Wang; Wen-Hui Xu; Yu-Qing Wang; Pei-Tao Zhang; Yong-Jun Zhang; Xiang-Dong Kong
Journal:  Front Plant Sci       Date:  2022-03-25       Impact factor: 5.753

Review 4.  Expanding Gene-Editing Potential in Crop Improvement with Pangenomes.

Authors:  Cassandria G Tay Fernandez; Benjamin J Nestor; Monica F Danilevicz; Jacob I Marsh; Jakob Petereit; Philipp E Bayer; Jacqueline Batley; David Edwards
Journal:  Int J Mol Sci       Date:  2022-02-18       Impact factor: 5.923

5.  Identification of Body Size Determination Related Candidate Genes in Domestic Pig Using Genome-Wide Selection Signal Analysis.

Authors:  Bing Pan; Haoyuan Long; Ying Yuan; Haoyuan Zhang; Yangyang Peng; Dongke Zhou; Chengli Liu; Baiju Xiang; Yongfu Huang; Yongju Zhao; Zhongquan Zhao; Guangxin E
Journal:  Animals (Basel)       Date:  2022-07-19       Impact factor: 3.231

6.  Gene expression profiles of specific chicken skeletal muscles.

Authors:  Hua Kui; Bo Ran; Maosen Yang; Xin Shi; Yingyu Luo; Yujie Wang; Tao Wang; Diyan Li; Surong Shuai; Mingzhou Li
Journal:  Sci Data       Date:  2022-09-08       Impact factor: 8.501

7.  Prediction of transcript isoforms in 19 chicken tissues by Oxford Nanopore long-read sequencing.

Authors:  Dailu Guan; Michelle M Halstead; Alma D Islas-Trejo; Daniel E Goszczynski; Hans H Cheng; Pablo J Ross; Huaijun Zhou
Journal:  Front Genet       Date:  2022-10-03       Impact factor: 4.772

8.  ELOVL gene family plays a virtual role in response to breeding selection and lipid deposition in different tissues in chicken (Gallus gallus).

Authors:  Dandan Wang; Xinyan Li; Panpan Zhang; Yuzhu Cao; Ke Zhang; Panpan Qin; Yulong Guo; Zhuanjian Li; Yadong Tian; Xiangtao Kang; Xiaojun Liu; Hong Li
Journal:  BMC Genomics       Date:  2022-10-17       Impact factor: 4.547

9.  Large-Scale Whole Genome Sequencing Study Reveals Genetic Architecture and Key Variants for Breast Muscle Weight in Native Chickens.

Authors:  Xiaodong Tan; Lu Liu; Xiaojing Liu; Huanxian Cui; Ranran Liu; Guiping Zhao; Jie Wen
Journal:  Genes (Basel)       Date:  2021-12-21       Impact factor: 4.096

10.  De Novo Assembly of 20 Chicken Genomes Reveals the Undetectable Phenomenon for Thousands of Core Genes on Microchromosomes and Subtelomeric Regions.

Authors:  Ming Li; Congjiao Sun; Naiyi Xu; Peipei Bian; Xiaomeng Tian; Xihong Wang; Yuzhe Wang; Xinzheng Jia; Rasmus Heller; Mingshan Wang; Fei Wang; Xuelei Dai; Rongsong Luo; Yingwei Guo; Xiangnan Wang; Peng Yang; Dexiang Hu; Zhenyu Liu; Weiwei Fu; Shunjin Zhang; Xiaochang Li; Chaoliang Wen; Fangren Lan; Amam Zonaed Siddiki; Chatmongkon Suwannapoom; Xin Zhao; Qinghua Nie; Xiaoxiang Hu; Yu Jiang; Ning Yang
Journal:  Mol Biol Evol       Date:  2022-04-10       Impact factor: 8.800

  10 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.