Avhashoni A Zwane¹,², Robert D Schnabel³,⁴, Jesse Hoff³, Ananyo Choudhury⁵, Mahlako Linah Makgahlela¹,⁶, Azwihangwisi Maiwashe¹,⁶, Este Van Marle-Koster², Jeremy F Taylor³.
Abstract
Single nucleotide polymorphism (SNP) arrays have created new possibilities for genome-wide studies that detect genomic regions harboring sequence variants that affect complex traits. However, most validated SNPs with estimated allele frequencies come primarily from European breeds. The objective of this study was to perform SNP discovery in three South African indigenous breeds (Afrikaner, Drakensberger, and Nguni) using whole-genome sequencing. DNA was extracted from blood and hair samples, quantified, and prepared at a concentration of 50 ng/μl for sequencing on an Illumina HiSeq 2500 at the Agricultural Research Council Biotechnology Platform. Variants were called from the FASTQ files using the Genome Analysis Toolkit (GATK). A total of 1,678,360 SNPs were identified as novel by comparison with Run 6 of the 1000 Bull Genomes Project. The identified variants were annotated and classified into functional categories. Within coding regions, about 30% of the SNPs were non-synonymous substitutions that encode alternate amino acids. Analysis of the distribution of SNPs across the genome identified regions with notable differences in SNP density among the breeds and highlighted many regions of functional significance. Gene ontology terms identified genes such as MLANA, SYT10, and CDC42EP5, which have been associated with coat color in mice, and ADAMS3, DNAJC3, and PAG5, which have been associated with fertility in cattle. Further analysis of the variants detected 688 candidate selective sweeps (ZHp Z-scores ≤ -4) across all three breeds, of which 223 regions were assigned as putative selective sweeps (ZHp Z-scores ≤ -5). We also identified 96 regions with extremely low ZHp Z-scores (≤ -6) in Afrikaner and Nguni. Genes in these regions included KIT and MITF, which have been associated with skin pigmentation in cattle, and CACNA1C, which has been associated with bipolar disorder in humans.
This study provides the first analysis of sequence data to discover SNPs in indigenous South African cattle breeds. The information will play an important role in our efforts to understand the genetic history of our cattle and in designing appropriate breed improvement programmes.
Keywords: annotation; genes; indigenous breeds; novel variants; sequencing
Year: 2019 PMID: 30988672 PMCID: PMC6452414 DOI: 10.3389/fgene.2019.00273
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Sequencing results for indigenous Afrikaner, Drakensberger, and Nguni cattle breeds.
| Breed | Animals | Raw reads | Non-duplicated reads | Properly paired reads | Mapped reads | High-quality reads (% of raw) | Average depth |
|---|---|---|---|---|---|---|---|
| AFR | 30 | 537,681,018 | 518,717,587 | 500,986,036 | 536,215,468 | 424,043,570 (79%) | 21.2X |
| DRA | 30 | 540,797,394 | 498,063,449 | 502,707,076 | 537,486,252 | 385,388,748 (71%) | 15.4X |
| NGI | 30 | 682,407,201 | 646,078,421 | 640,580,750 | 680,935,451 | 528,151,411 (77%) | 26.6X |
| Total | 90 | 1,760,885,613 | 1,662,859,457 | 1,644,273,862 | 1,754,637,171 | 1,337,583,729 (76%) | 21.1X |
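Two derived columns in this table can be recomputed from the others: the high-quality percentage is high-quality reads divided by raw reads, and the Total average depth is the mean of the three breed depths. The following Python sketch reproduces them from the values shown; the dictionary layout is an illustrative assumption, not the authors' pipeline.

```python
# Read counts and depths copied from the sequencing-results table above.
raw = {"AFR": 537_681_018, "DRA": 540_797_394, "NGI": 682_407_201}
high_quality = {"AFR": 424_043_570, "DRA": 385_388_748, "NGI": 528_151_411}
depth = {"AFR": 21.2, "DRA": 15.4, "NGI": 26.6}

def pct_high_quality(breed: str) -> int:
    """High-quality reads as a whole-number percentage of raw reads."""
    return round(100 * high_quality[breed] / raw[breed])

# Pooled percentage over all breeds, and the unweighted mean depth.
total_pct = round(100 * sum(high_quality.values()) / sum(raw.values()))
mean_depth = round(sum(depth.values()) / len(depth), 1)

for breed in raw:
    print(breed, f"{pct_high_quality(breed)}%")
print("Total", f"{total_pct}%", f"{mean_depth}X")
```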
Summary of SNPs and Indels identified in Afrikaner (AFR), Drakensberger (DRA), and Nguni (NGI).
| Breed | No. variants | No. SNPs | Proportion SNPs | No. indels | Proportion indels |
|---|---|---|---|---|---|
| AFR | 11,165,172 | 9,950,384 | 0.89 | 1,212,231 | 0.11 |
| DRA | 7,049,789 | 6,327,515 | 0.90 | 721,628 | 0.10 |
| NGI | 12,514,952 | 11,164,415 | 0.89 | 1,347,215 | 0.11 |
| Total | 17,243,304 | 15,442,314 | 0.89 | 1,908,137 | 0.11 |
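Variant calls are partitioned into SNPs and indels before proportions like those in this table are computed. A minimal Python sketch of such a split, assuming one ALT allele per call and the simple rule that a single-base REF and ALT is a SNP while unequal allele lengths are an indel; the helper names are hypothetical, and calls that fit neither rule (such as multi-nucleotide substitutions) are counted as "other" rather than forced into either class.

```python
from collections import Counter

def classify(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair (hypothetical helper, one ALT per call)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g., multi-nucleotide substitutions such as AC -> GT

def summarize(calls):
    """Count each class and express it as a proportion of all calls."""
    counts = Counter(classify(ref, alt) for ref, alt in calls)
    total = sum(counts.values())
    return counts, {cls: n / total for cls, n in counts.items()}

# Toy calls as (REF, ALT) pairs; real inputs would be parsed from a VCF.
calls = [("A", "G"), ("C", "T"), ("G", "GA"), ("AT", "A"), ("AC", "GT")]
counts, proportions = summarize(calls)
print(dict(counts), proportions)
```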
FIGURE 1. The number of SNPs identified among the three indigenous South African breeds.
Novel variants identified in the three breeds through comparison to 1000 Bull Genomes Project Run 6 data.
| Breed | All variants: known | All variants: novel | All variants: total | All variants: proportion novel | SNPs: known | SNPs: novel | SNPs: total | SNPs: proportion novel |
|---|---|---|---|---|---|---|---|---|
| AFR | 9,381,545 | 614,536 | 9,996,081 | 0.07 | 8,576,732 | 617,296 | 9,194,028 | 0.07 |
| DRA | 6,307,154 | 381,743 | 6,688,897 | 0.06 | 5,764,627 | 413,795 | 6,178,422 | 0.06 |
| NGI | 10,693,999 | 631,412 | 11,325,411 | 0.07 | 9,793,635 | 647,269 | 10,440,904 | 0.07 |
| Total | 26,382,698 | 1,627,691 | 28,010,389 | 0.07 (Av) | 24,134,994 | 1,678,360 | 25,813,354 | 0.07 (Av) |
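Variants were labelled novel when absent from 1000 Bull Genomes Project Run 6. The sketch below illustrates that kind of comparison in Python, assuming variants are matched on a (chromosome, position, REF, ALT) key; the matching key, helper names, and toy data are assumptions rather than the study's documented procedure.

```python
def split_known_novel(breed_variants, reference_variants):
    """Partition variant keys into those present in / absent from a reference set."""
    reference = set(reference_variants)  # set lookup keeps each check O(1)
    known = [v for v in breed_variants if v in reference]
    novel = [v for v in breed_variants if v not in reference]
    return known, novel

def proportion_novel(known, novel):
    """Fraction of a breed's variants that are not in the reference set."""
    total = len(known) + len(novel)
    return len(novel) / total if total else 0.0

# Toy (chrom, pos, ref, alt) keys; real inputs would come from VCF files.
run6 = {("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("2", 500, "G", "A")}
breed = [("1", 1000, "A", "G"), ("1", 2500, "T", "C"),
         ("2", 500, "G", "A"), ("3", 42, "C", "G")]
known, novel = split_known_novel(breed, run6)
print(len(known), len(novel), proportion_novel(known, novel))
```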
FIGURE 2. Validation of novel SNPs using Run 6 of the 1000 Bull Genomes Project data.
Counts of SNPs within each functional class for gene regions. Percentages are relative to each breed's total SNP count.
| SNP class | AFR count | AFR % | DRA count | DRA % | NGI count | NGI % | Total count |
|---|---|---|---|---|---|---|---|
| Downstream_gene | 437,355 | 4.4 | 288,515 | 4.6 | 440,357 | 3.9 | 1,166,227 |
| Stop_lost | 318 | 0.003 | 200 | 0.003 | 350 | 0.003 | 868 |
| Stop_gain | 38 | 0.0004 | 22 | 0.0003 | 15 | 0.0001 | 75 |
| Splice_site | 7,650 | 0.08 | 5,305 | 0.08 | 7,553 | 0.07 | 20,508 |
| Upstream_gene | 433,495 | 4.4 | 435,935 | 6.9 | 435,955 | 3.9 | 1,305,385 |
| Intronic | 2,726,502 | 27.4 | 1,800,155 | 28.4 | 2,731,530 | 24.5 | 7,258,187 |
| miRNA | 32,911 | 0.33 | 21,670 | 0.34 | 33,544 | 0.3 | 88,125 |
| Synonymous_coding | 38,537 | 0.4 | 29,836 | 0.47 | 40,694 | 0.36 | 109,067 |
| Nonsynonymous_coding | 31,205 | 0.31 | 22,395 | 0.35 | 31,130 | 0.28 | 84,730 |
| 3′_UTR | 18,999 | 0.2 | 13,163 | 0.21 | 18,968 | 0.17 | 51,130 |
| 5′_UTR | 3,974 | 0.04 | 3,055 | 0.05 | 3,805 | 0.034 | 10,834 |
| Within_non_coding_gene | 8,561 | 0.09 | 5,608 | 0.09 | 8,725 | 0.08 | 22,894 |
| Essential_splice_site | 182 | 0.002 | 124 | 0.002 | 192 | 0.002 | 498 |
| Total | 3,739,545 | 37.6 | 2,625,859 | 41.5 | 3,752,626 | 33.6 | 10,033,300 |
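The percentages in this table are each class count divided by the breed's total SNP count; for example, 437,355 downstream-gene SNPs out of 9,950,384 AFR SNPs gives 4.4%. A short Python sketch of that calculation, taking the breed totals from the SNP and indel summary table; the subset of classes and the dictionary layout are illustrative choices.

```python
# Total SNPs per breed, from the summary table of SNPs and indels.
total_snps = {"AFR": 9_950_384, "DRA": 6_327_515, "NGI": 11_164_415}

# A few class counts copied from the functional-class table.
class_counts = {
    "Downstream_gene": {"AFR": 437_355, "DRA": 288_515, "NGI": 440_357},
    "Intronic": {"AFR": 2_726_502, "DRA": 1_800_155, "NGI": 2_731_530},
    "Nonsynonymous_coding": {"AFR": 31_205, "DRA": 22_395, "NGI": 31_130},
}

def class_percentages(counts, totals, digits=2):
    """Express each class count as a percentage of that breed's SNP total."""
    return {
        cls: {b: round(100 * n / totals[b], digits) for b, n in by_breed.items()}
        for cls, by_breed in counts.items()
    }

percentages = class_percentages(class_counts, total_snps)
for cls, pcts in percentages.items():
    print(cls, pcts)
```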
Counts of indels by functional class for gene regions. Percentages are relative to each breed's total indel count.
| Indel class | AFR count | AFR % | DRA count | DRA % | NGI count | NGI % | Total count |
|---|---|---|---|---|---|---|---|
| Downstream_gene | 126,159 | 10.4 | 73,669 | 10.2 | 50,823 | 3.8 | 250,651 |
| Stop_lost | 43 | 0.004 | 49 | 0.007 | 26 | 0.002 | 118 |
| Stop_gain | 82 | 0.007 | 115 | 0.016 | 34 | 0.003 | 231 |
| Splice_site | 2,481 | 0.2 | 1,667 | 0.23 | 952 | 0.07 | 5,100 |
| Upstream_gene | 123,341 | 10.2 | 71,747 | 10.4 | 48,080 | 3.6 | 243,168 |
| Intronic | 745,500 | 61.5 | 431,225 | 59.8 | 317,114 | 23.5 | 1,493,839 |
| miRNA | 10,296 | 0.85 | 5,644 | 0.8 | 3,816 | 0.28 | 19,756 |
| Synonymous_coding | 1,004 | 0.08 | 855 | 0.12 | 449 | 0.033 | 2,308 |
| Nonsynonymous_coding | 2,943 | 0.24 | 2,293 | 0.32 | 1,145 | 0.085 | 6,381 |
| 3′_UTR | 5,574 | 0.46 | 3,165 | 0.44 | 2,166 | 0.16 | 10,905 |
| 5′_UTR | 842 | 0.07 | 660 | 0.09 | 376 | 0.028 | 1,878 |
| Within_non_coding_gene | 2,141 | 0.18 | 1,311 | 0.18 | 545 | 0.04 | 3,997 |
| Total | 1,020,406 | 84.1 | 592,400 | 82.1 | 425,526 | 31.6 | 2,038,332 |
FIGURE 3. Distribution of SNPs in 1 Mb windows: (A) all SNPs, (B) missense SNPs, (C) loss-of-function (LoF; stop-gain and stop-loss) SNPs, and (D) novel SNPs. Values are expressed as the number of SNPs per kilobase. The inner circle represents AFR, the middle circle DRA, and the outer circle NGI.
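The densities plotted in Figure 3 are SNP counts in 1 Mb windows expressed per kilobase. A minimal Python sketch of that binning, assuming non-overlapping windows that start at position 1 of each chromosome and inputs given as 1-based (chromosome, position) pairs; the window layout and the function name are assumptions.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1 Mb windows, as described for Figure 3

def snp_density_per_kb(positions, window_bp=WINDOW_BP):
    """Map (chrom, window_index) -> SNPs per kilobase for 1-based positions."""
    # Window 0 covers bases 1..1,000,000, window 1 covers 1,000,001..2,000,000, etc.
    counts = Counter((chrom, (pos - 1) // window_bp) for chrom, pos in positions)
    window_kb = window_bp / 1_000
    return {key: n / window_kb for key, n in sorted(counts.items())}

snps = [("1", 15), ("1", 999_999), ("1", 1_000_001), ("2", 2_500_000)]
densities = snp_density_per_kb(snps)
print(densities)
```

A 1 Mb window spans 1,000 kb, so two SNPs in one window give 0.002 SNPs per kb. The sketch divides every window by the full 1,000 kb, so the last (usually shorter) window on each chromosome is slightly underestimated; dividing by its actual length would correct that.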
FIGURE 4. Distribution of ZHp Z-scores across all 29 autosomes for Afrikaner (AFR), Drakensberger (DRA), and Nguni (NGI). The horizontal lines indicate ZHp Z-score thresholds of -4 and -5 used to define candidate and putative selective sweep regions in this study.
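The ZHp statistic itself is not defined in this section. A common definition, assumed here, is pooled heterozygosity per window, Hp = 2·ΣnMAJ·ΣnMIN / (ΣnMAJ + ΣnMIN)², where nMAJ and nMIN are the major- and minor-allele (or read) counts summed over the SNPs in the window, followed by a Z-transformation across windows, ZHp = (Hp − μHp) / σHp. The Python sketch below applies the -4, -5, and -6 thresholds named in the study; the use of the sample standard deviation, the label names, and the toy counts are all assumptions.

```python
import statistics

def pooled_heterozygosity(major_counts, minor_counts):
    """Hp for one window: 2*sum(nMAJ)*sum(nMIN) / (sum(nMAJ) + sum(nMIN))**2."""
    s_maj, s_min = sum(major_counts), sum(minor_counts)
    total = s_maj + s_min
    return 2 * s_maj * s_min / total**2 if total else float("nan")

def zhp_scores(hp_values):
    """Z-transform Hp across windows (sample standard deviation is an assumption)."""
    mu = statistics.mean(hp_values)
    sigma = statistics.stdev(hp_values)
    return [(hp - mu) / sigma for hp in hp_values]

def classify_window(z):
    """Label a window by the strictest threshold it reaches (hypothetical labels)."""
    if z <= -6:
        return "extreme"
    if z <= -5:
        return "putative"
    if z <= -4:
        return "candidate"
    return "background"

# Toy data: 29 windows with balanced allele counts and one window where
# almost every observation carries the major allele (low heterozygosity).
windows = [([16, 15, 17], [14, 15, 13])] * 29 + [([29, 30, 28], [1, 0, 2])]
hp = [pooled_heterozygosity(major, minor) for major, minor in windows]
z = zhp_scores(hp)
labels = [classify_window(score) for score in z]
print(round(z[-1], 2), labels[-1])
```

Because putative and extreme windows also satisfy the candidate threshold, a window labelled "putative" here would be counted among the candidate sweeps as well, matching the nesting of the 688 candidate and 223 putative regions.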
List of genes within SNP-enriched genomic regions in the top 100 kb windows.
| Gene | CHR | Function | Species | Reference |
|---|---|---|---|---|
|  | 3 | Gene silencing by miRNA | Human |  |
|  | 11 | Abnormal coat/hair pigmentation, thin skin, decreased body weight, kidney failure, anemia, hypertension, increased heart rate | Mouse |  |
|  | 11 | Increased anxiety, feeding behavior, heart failure, decreased drinking behavior, parkinsonian disorders | Mouse, rat |  |
|  | 11 | Premature death, abnormal heart morphology | Mouse |  |
|  | 11 | Cardiovascular system phenotype, decreased anxiety-related response | Mouse |  |
|  | 2 | Serkal syndrome, female sex determination, kidney failure, male sex differentiation | Mammals, mouse |  |
|  | 2 | Negative regulation of gene expression, hair follicle placode formation, spinal cord injuries, bipolar disorder, epilepsy, arthritis | Mouse, rat |  |
|  | 8 | Diluted coat color, hair morphology | Mouse |  |
|  | 4 | Decreased total body fat amount, pilocytic astrocytoma (brain tumor) | Human, mouse |  |
|  | 3 | Decreased lean body mass, length, increased total body fat amount | Mouse |  |
|  | 10 | Prostatic neoplasms | Rat |  |
|  | 3 | Decreased embryo size, neonatal lethality, cell cycle checkpoint | Mouse, human |  |
|  | 3 | Spastic paraplegia, autosomal recessive | Mouse, human |  |
|  | 3 | Autoimmune diseases, enlarged spleen, diabetes mellitus, insulin-dependent | Human, mouse, rat |  |
|  | 4 | Suppression by virus of host molecular function, endosome to lysosome transport | Mouse |  |
|  | 10 | Increased T-cell proliferation, abnormal self-tolerance | Mouse |  |
|  | 10 | Gene silencing by miRNA, wound healing, spreading of epidermal cells, heart contraction, decreased rate, abnormal cell migration | Human, zebrafish, mouse |  |
|  | 10 | Decreased susceptibility to pharmacologically induced seizures | Mouse |  |
|  | 8 | Abnormal lung lobe morphology, Notch signaling involved in heart development, cilium assembly | Human, mouse |  |
|  | 11 | Abnormal hair texture, decreased body weight, embryonic lethality | Mouse |  |
|  | 17 | Skeletal system morphogenesis | Human |  |
|  | 5 | Shortened circadian period (sleep disorder), sensory perception of smell | Mouse |  |
|  | 22 | Congenital disorder of glycosylation | Human |  |
|  | 18 | Deafness, autosomal dominant 4b | Human, mouse |  |
|  | 16 | Dendritic spine development | Mouse |  |
|  | 19 | Nanophthalmia, hemorrhage | Human, mouse |  |
|  | 18 | Staphylococcal pneumonia, bronchiolitis obliterans | Mouse |  |
|  | 8 | Fatty liver, myocarditis, diabetes mellitus | Rat |  |
|  | 4 | Congenital disorder | Human |  |
|  | 1 | Reduced fertility, thyroid, and eye inflammation | Mouse |  |