Yiyuan Yan, Guoqiang Yi, Congjiao Sun, Lujiang Qu, Ning Yang.
Abstract
Insertions and deletions (INDELs) are among the main events contributing to genetic and phenotypic diversity, yet they have received less attention than SNPs and large structural variants. To better understand INDEL variation in the chicken genome, we applied next-generation sequencing to 12 diverse chicken breeds at an average effective depth of 8.6×. Over 1.3 million non-redundant short INDELs (1–49 bp) were obtained, the vast majority (92.48%) of which were novel. Follow-up validation assays confirmed that most (88.00%) of the randomly selected INDELs represent true variants. The majority (95.76%) of INDELs were shorter than 10 bp. Deletions exceeded insertions in both the number detected and the number of bases affected. In total, INDELs covered 3.8 Mbp, corresponding to 0.36% of the chicken genome, and the average genomic INDEL density was estimated at 0.49 per kb. INDELs were ubiquitous but non-uniformly distributed across chromosomes, with lower density on micro-chromosomes than on other chromosomes, and functional regions such as exons and UTRs harboured fewer INDELs than introns and intergenic regions. A total of 620,253 INDELs fell in genic regions; 1,765 (0.28%) of these were located in exons, spanning 1,358 (7.56%) unique Ensembl genes. Many of these genes are associated with economically important traits, and some are homologues of human disease-related genes. We demonstrate that sequencing multiple individuals at medium depth is a promising way to identify INDELs reliably. The coding INDELs are valuable candidates for further elucidating the association between genotypes and phenotypes. The chicken INDELs revealed by our study can be useful for future studies, including development of INDEL markers, construction of high-density linkage maps, design of INDEL arrays and, hopefully, molecular breeding programs in chicken.
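The abstract restricts the analysis to short INDELs of 1–49 bp and splits them into insertions and deletions. As an illustration only, the sketch below shows one way such a classification could be made from VCF-style REF/ALT allele pairs. The `Indel` type, the `classify` helper and the prefix-sharing allele convention are assumptions for this example, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Indel:
    kind: str    # "insertion" or "deletion"
    length: int  # number of inserted or deleted bases


def classify(ref: str, alt: str, max_len: int = 49) -> Optional[Indel]:
    """Classify a biallelic REF/ALT pair as a short INDEL (hypothetical helper).

    Assumes VCF-style alleles where the shorter allele is a prefix of the longer
    one (e.g. REF=A, ALT=AT). Returns None for substitutions, complex alleles,
    and INDELs longer than max_len.
    """
    if alt.startswith(ref) and len(alt) > len(ref):
        kind = "insertion"
    elif ref.startswith(alt) and len(ref) > len(alt):
        kind = "deletion"
    else:
        return None  # SNP, MNP or complex event: outside this sketch
    length = abs(len(ref) - len(alt))
    if length > max_len:
        return None  # beyond the 1-49 bp "short INDEL" range used in the abstract
    return Indel(kind, length)


calls = [("A", "AT"), ("ACGT", "A"), ("G", "C"), ("T", "T" + "A" * 12)]
indels = [i for i in (classify(r, a) for r, a in calls) if i is not None]
under_10bp = sum(i.length < 10 for i in indels) / len(indels)
print(indels)
print(f"{under_10bp:.2%} shorter than 10 bp")
```

The 49 bp cap mirrors the abstract's definition of short INDELs; real call sets would also need left-normalization and multi-allelic handling, which this sketch skips.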
Year: 2014 PMID: 25133774 PMCID: PMC4136736 DOI: 10.1371/journal.pone.0104652
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of sequencing and mapping statistics.
| Chicken breed | Raw reads | Mapped reads (% of raw) | Q20 reads (% of raw) | Effective depth (×) | Coverage (%) |
|---|---|---|---|---|---|
| BY | 122,734,374 | 108,899,430(89) | 98,055,844(80) | 8.2 | 94.78 |
| CS | 113,814,596 | 102,118,711(90) | 81,636,500(72) | 6.8 | 94.67 |
| DX | 160,799,966 | 146,498,490(91) | 125,254,482(78) | 10.5 | 95.26 |
| LX | 146,127,228 | 129,219,851(88) | 100,947,015(69) | 8.4 | 95.03 |
| RIR | 168,078,474 | 151,117,490(90) | 98,330,708(59) | 8.2 | 95.12 |
| RJF | 161,325,436 | 144,056,310(89) | 100,985,637(63) | 8.4 | 94.92 |
| SG | 141,222,608 | 124,890,623(88) | 82,966,840(59) | 6.9 | 94.42 |
| SK | 115,578,334 | 104,212,249(90) | 91,427,763(79) | 7.6 | 94.83 |
| TB | 132,544,720 | 121,217,370(91) | 103,736,982(78) | 8.7 | 95.07 |
| WC | 143,636,242 | 132,110,332(92) | 114,868,135(80) | 9.6 | 95.24 |
| WL | 131,298,592 | 120,759,421(92) | 112,326,911(86) | 9.4 | 95.42 |
| WR | 143,375,106 | 132,693,886(93) | 123,918,088(86) | 10.4 | 95.42 |
| Average | 140,044,640 | 126,482,847(90) | 102,871,242(73) | 8.6 | 95.02 |
*Chicken abbreviations: BY, Beijing You; CS, Cornish; DX, Dongxiang; LX, Luxi Game; RIR, Rhode Island Red; RJF, Red Jungle Fowl; SG, Shouguang; SK, Silkie; TB, Tibetan; WC, Wenchang; WL, White Leghorn; WR, White Plymouth Rock.
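The percentages in parentheses can be recomputed from the read counts; doing so shows that both the mapped and the Q20 percentages are expressed relative to raw reads. The sketch below is a minimal consistency check with the counts copied from the table; the `TABLE1` name, the `ratio_mismatches` helper and the rounding to whole percent are choices made for this example.

```python
# (raw reads, mapped reads, mapped %, Q20 reads, Q20 %) copied from the table above
TABLE1 = {
    "BY": (122_734_374, 108_899_430, 89, 98_055_844, 80),
    "CS": (113_814_596, 102_118_711, 90, 81_636_500, 72),
    "DX": (160_799_966, 146_498_490, 91, 125_254_482, 78),
    "LX": (146_127_228, 129_219_851, 88, 100_947_015, 69),
    "RIR": (168_078_474, 151_117_490, 90, 98_330_708, 59),
    "RJF": (161_325_436, 144_056_310, 89, 100_985_637, 63),
    "SG": (141_222_608, 124_890_623, 88, 82_966_840, 59),
    "SK": (115_578_334, 104_212_249, 90, 91_427_763, 79),
    "TB": (132_544_720, 121_217_370, 91, 103_736_982, 78),
    "WC": (143_636_242, 132_110_332, 92, 114_868_135, 80),
    "WL": (131_298_592, 120_759_421, 92, 112_326_911, 86),
    "WR": (143_375_106, 132_693_886, 93, 123_918_088, 86),
}


def ratio_mismatches(table):
    """Return breeds whose recomputed percentages (relative to raw reads) differ."""
    bad = []
    for breed, (raw, mapped, mapped_pct, q20, q20_pct) in table.items():
        if round(100 * mapped / raw) != mapped_pct or round(100 * q20 / raw) != q20_pct:
            bad.append(breed)
    return bad


# Mean raw reads per breed, comparable to the "Average" row
mean_raw = round(sum(row[0] for row in TABLE1.values()) / len(TABLE1))
print(ratio_mismatches(TABLE1), mean_raw)
```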
Short INDELs detected in 12 diverse chicken breeds.
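| Chicken breed | INDELs, total | INDELs, insertions | INDELs, deletions | Affected bases, total (bp) | Affected bases, insertions (bp) | Affected bases, deletions (bp) | Novel INDELs (ratio, %) | Maximum insertion length (bp) | Maximum deletion length (bp) | INDEL rate (kb⁻¹) |
|---|---|---|---|---|---|---|---|---|---|---|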
| BY | 415,540 | 196,981 | 201,852 | 1,176,135 | 526,774 | 649,361 | 370,997(89.28) | 29 | 47 | 0.48 |
| CS | 368,813 | 175,456 | 179,557 | 1,050,011 | 470,460 | 579,551 | 327,938(88.92) | 29 | 47 | 0.52 |
| DX | 497,358 | 233,956 | 241,907 | 1,439,295 | 643,553 | 795,742 | 445,606(89.59) | 29 | 49 | 0.45 |
| LX | 435,935 | 205,138 | 213,611 | 1,266,168 | 562,840 | 703,328 | 390,120(89.49) | 28 | 45 | 0.50 |
| RIR | 421,309 | 200,618 | 203,666 | 1,219,384 | 550,048 | 669,336 | 375,640(89.16) | 29 | 47 | 0.49 |
| RJF | 451,695 | 213,427 | 219,902 | 1,294,215 | 582,830 | 711,385 | 407,594(90.24) | 30 | 47 | 0.51 |
| SG | 383,782 | 182,043 | 186,521 | 1,092,190 | 489,868 | 602,322 | 342,332(89.20) | 29 | 44 | 0.53 |
| SK | 400,982 | 189,927 | 195,966 | 1,146,582 | 512,067 | 634,515 | 356,250(88.84) | 29 | 45 | 0.50 |
| TB | 448,575 | 211,393 | 218,550 | 1,284,935 | 572,294 | 712,641 | 401,789(89.57) | 30 | 45 | 0.49 |
| WC | 476,889 | 223,743 | 233,162 | 1,384,543 | 614,443 | 770,100 | 427,478(89.64) | 29 | 45 | 0.47 |
| WL | 484,471 | 229,597 | 231,861 | 1,371,739 | 620,051 | 751,688 | 431,757(89.12) | 29 | 44 | 0.49 |
| WR | 528,174 | 248,212 | 254,789 | 1,519,228 | 681,962 | 837,266 | 472,901(89.54) | 29 | 45 | 0.49 |
| Union | 1,343,782 | 549,806 | 701,623 | 3,794,977 | 1,439,988 | 2,354,989 | 1,242,748(92.48) | 30 | 49 | 1.49 |
INDELs that have multiple genotypes were excluded from the insertion and deletion counts, which therefore do not sum to the total.
Corrected for INDELs called in more than one individual.
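Several properties of this table can be checked directly: the novel percentages equal novel/total counts, the per-breed INDEL rates average to the 0.49 per kb quoted in the abstract, and in every breed the insertion and deletion counts fall short of the total, consistent with the footnote on INDELs with multiple genotypes. The sketch below uses counts copied from the table; `TABLE2`, `novel_pct_ok` and `unclassified` are illustrative names.

```python
# (total, insertions, deletions, novel, novel %, INDEL rate per kb) copied from the table above
TABLE2 = {
    "BY": (415_540, 196_981, 201_852, 370_997, 89.28, 0.48),
    "CS": (368_813, 175_456, 179_557, 327_938, 88.92, 0.52),
    "DX": (497_358, 233_956, 241_907, 445_606, 89.59, 0.45),
    "LX": (435_935, 205_138, 213_611, 390_120, 89.49, 0.50),
    "RIR": (421_309, 200_618, 203_666, 375_640, 89.16, 0.49),
    "RJF": (451_695, 213_427, 219_902, 407_594, 90.24, 0.51),
    "SG": (383_782, 182_043, 186_521, 342_332, 89.20, 0.53),
    "SK": (400_982, 189_927, 195_966, 356_250, 88.84, 0.50),
    "TB": (448_575, 211_393, 218_550, 401_789, 89.57, 0.49),
    "WC": (476_889, 223_743, 233_162, 427_478, 89.64, 0.47),
    "WL": (484_471, 229_597, 231_861, 431_757, 89.12, 0.49),
    "WR": (528_174, 248_212, 254_789, 472_901, 89.54, 0.49),
}


def novel_pct_ok(row, tol=0.005):
    """True if novel/total reproduces the reported percentage to two decimals."""
    total, _, _, novel, novel_pct, _ = row
    return abs(100 * novel / total - novel_pct) <= tol


# INDELs counted in the total but not as insertion or deletion
# (presumably the multi-genotype calls mentioned in the footnote)
unclassified = {b: t - ins - dels for b, (t, ins, dels, *_) in TABLE2.items()}
mean_rate = sum(row[5] for row in TABLE2.values()) / len(TABLE2)

print(all(novel_pct_ok(row) for row in TABLE2.values()))
print(min(unclassified.values()), round(mean_rate, 2))
```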
Figure 1. Distribution of INDEL length.
INDELs with multiple genotypes were not included.
Figure 2. INDEL and SNP density on each chromosome.
INDEL density was calculated as the number per 10 kb and SNP density as the number per kb. Densities were averaged across individual chickens and corrected for read depth. Coverage was calculated based on Q20 reads.
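The caption reports INDEL density per 10 kb and SNP density per kb. A minimal sketch of how such densities can be computed from variant positions follows. The `window_counts` and `density` helpers, 1-based positions and non-overlapping windows are assumptions for illustration; the read-depth correction mentioned in the caption is not reproduced because its method is not described here.

```python
from collections import Counter


def window_counts(positions, chrom_length, window=10_000):
    """Count variants in consecutive non-overlapping windows of `window` bp.

    Positions are assumed to be 1-based and <= chrom_length; the last window
    may be shorter than `window`.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = Counter((pos - 1) // window for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]


def density(n_variants, length_bp, per_bp):
    """Variants per `per_bp` bases, e.g. per_bp=10_000 for 'per 10 kb'."""
    return n_variants / length_bp * per_bp


indel_positions = [1, 9_999, 10_000, 10_001, 25_000]
print(window_counts(indel_positions, chrom_length=30_000))   # [3, 1, 1]
print(density(len(indel_positions), 30_000, per_bp=10_000))  # INDELs per 10 kb
```

Using `per_bp=1_000` with SNP counts gives the per-kb scale used for SNPs in the same figure.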
Figure 3. SNP-to-INDEL ratio.
Ratios were plotted for both the non-redundant data (Union) and the per-chicken averages (Average). A: SNP-to-INDEL ratio across chromosomes. B: SNP-to-INDEL ratio in functional categories.
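The ratios in panel B can be approximated from the counts in the table of INDELs and SNPs in functional regions below; the INDEL counts there (intergenic, flanking and genic) sum to the 1,343,782 union INDELs, so they correspond to the Union series rather than the per-chicken Average. The sketch computes SNP-to-INDEL ratios for a few categories; `COUNTS`, `snp_to_indel` and the category selection are illustrative choices.

```python
# (INDELs, SNPs) copied from the "Statistics of INDELs and SNPs in functional regions" table
COUNTS = {
    "Intergenic": (690_303, 7_035_013),
    "Intronic": (600_181, 5_997_846),
    "5′UTR": (1_388, 19_814),
    "3′UTR": (16_372, 139_967),
    "Exonic": (1_765, 168_125),
}


def snp_to_indel(counts):
    """Return the SNP-to-INDEL ratio for each category."""
    return {category: snps / indels for category, (indels, snps) in counts.items()}


# Print categories from lowest to highest ratio
for category, ratio in sorted(snp_to_indel(COUNTS).items(), key=lambda kv: kv[1]):
    print(f"{category:<10} {ratio:6.1f}")
```

Exonic sites carry roughly nine to ten times as many SNPs per INDEL as intronic or intergenic sites, consistent with the abstract's observation that exons harbour fewer INDELs.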
Statistics of INDELs and SNPs in functional regions.
| Category | INDELs | Category | SNPs |
|---|---|---|---|
| Intergenic | 690,303 | Intergenic | 7,035,013 |
| Flanking region | 33,226 | Flanking region | 345,361 |
| Flanking: upstream | 13,925 | Flanking: upstream | 163,068 |
| Flanking: downstream | 18,374 | Flanking: downstream | 171,990 |
| Flanking: up/downstream | 927 | Flanking: up/downstream | 10,303 |
| Genic | 620,253 | Genic | 6,328,186 |
| Genic: 5′UTR | 1,388 | Genic: 5′UTR | 19,814 |
| Genic: 3′UTR | 16,372 | Genic: 3′UTR | 139,967 |
| Genic: 5′/3′UTR | 10 | Genic: 5′/3′UTR | 151 |
| Genic: splicing | 318 | Genic: splicing | 543 |
| Genic: ncRNA | 219 | Genic: ncRNA | 1,740 |
| Genic: intronic | 600,181 | Genic: intronic | 5,997,846 |
| Genic: exonic | 1,765 | Genic: exonic | 168,125 |
| Exonic: non-frameshift | 720 | Exonic: synonymous | 119,816 |
| Exonic: frameshift | 1,022 | Exonic: non-synonymous | 47,915 |
| Exonic: stop gain/loss | 23 | Exonic: stop gain/loss | 394 |
Regions within 1 kb of the transcription start site.
Variants located in both upstream and downstream regions (possibly of two different genes).
Variants located in both 5′UTR and 3′UTR regions (possibly of two different genes).
Variants located in transcripts without coding annotation in the current Ensembl gene annotation.
Variants causing gain or loss of a stop codon.
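Two features of this table can be made concrete. First, exonic INDELs are split by their effect on the reading frame; a common simplified rule, assumed here rather than taken from the paper's annotation software, treats an INDEL whose length is a multiple of 3 as non-frameshift. Second, each parent row equals the sum of its sub-rows, and intergenic, flanking and genic INDELs together give the 1,343,782 union INDELs reported in the INDEL table. The helpers `frame_effect` and `inconsistent_rows` are hypothetical names.

```python
def frame_effect(length: int) -> str:
    """Simplified length-only rule for a coding INDEL (an assumption, not the paper's tool).

    A length divisible by 3 inserts or removes whole codons, so the downstream
    reading frame is kept; any other length shifts it. Stop gain/loss depends on
    the actual sequence and is not detected by this rule.
    """
    if length <= 0:
        raise ValueError("INDEL length must be positive")
    return "non-frameshift" if length % 3 == 0 else "frameshift"


# INDEL counts copied from the table above: parent row -> (total, sub-row counts).
# The "All INDELs" total is the Union count from the INDEL table.
HIERARCHY = {
    "Flanking region": (33_226, [13_925, 18_374, 927]),
    "Genic": (620_253, [1_388, 16_372, 10, 318, 219, 600_181, 1_765]),
    "Exonic": (1_765, [720, 1_022, 23]),
    "All INDELs": (1_343_782, [690_303, 33_226, 620_253]),
}


def inconsistent_rows(hierarchy):
    """Return parent rows whose sub-rows do not sum to the reported total."""
    return [name for name, (total, parts) in hierarchy.items() if sum(parts) != total]


print([frame_effect(n) for n in (1, 2, 3, 6, 7)])
print(inconsistent_rows(HIERARCHY))
```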