Ashutosh Das, Frank Panitz, Vivi Raundahl Gregersen, Christian Bendixen, Lars-Erik Holm.
Abstract
BACKGROUND: Over the last few years, continuous development of high-throughput sequencing platforms and sequence analysis tools has facilitated reliable identification and characterization of genetic variants in many cattle breeds. Deep sequencing of entire genomes within a cattle breed that has not been thoroughly investigated is expected to uncover functional variants underlying phenotypic differences. Here, we sequenced Danish Holstein cattle to high coverage to detect and characterize single nucleotide polymorphisms (SNPs), insertions/deletions (indels), and loss-of-function (LoF) variants in protein-coding genes, in order to provide a comprehensive resource for subsequent detection of causal variants for recessive traits.
Year: 2015 PMID: 26645365 PMCID: PMC4673847 DOI: 10.1186/s12864-015-2249-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of the alignment statistics for the genomes of four Holstein cows
| Animal ID | Number of total reads | Number of mapped reads | Number of bases in mapped reads | Mean depth | Genome coverage |
|---|---|---|---|---|---|
| 44–6 | 766,207,530 | 720,413,573 (94.0 %) | 72.8 Gb | 27.3 X | 98.7 % |
| 46–25 | 731,266,080 | 691,737,221 (94.6 %) | 69.9 Gb | 26.2 X | 98.7 % |
| 49–25 | 766,453,890 | 724,694,871 (94.5 %) | 73.2 Gb | 27.4 X | 98.8 % |
| 50–38 | 771,642,408 | 718,944,575 (93.2 %) | 72.6 Gb | 27.2 X | 98.7 % |
| Total | 3,035,569,908 | 2,855,790,240 (94.1 %) | 288.5 Gb | 27.0 X | - |
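The mapping percentages and the totals in this table follow directly from the read counts. As a minimal sketch, using only the counts shown above (the function and variable names are illustrative and not from the source), they can be recomputed as follows:

```python
# Read counts copied from the alignment table: (total reads, mapped reads).
reads = {
    "44–6": (766_207_530, 720_413_573),
    "46–25": (731_266_080, 691_737_221),
    "49–25": (766_453_890, 724_694_871),
    "50–38": (771_642_408, 718_944_575),
}


def mapping_rate(total: int, mapped: int) -> float:
    """Return the percentage of reads that mapped to the reference."""
    return 100.0 * mapped / total


# Column totals across the four animals.
total_reads = sum(total for total, _ in reads.values())
mapped_reads = sum(mapped for _, mapped in reads.values())

for animal, (total, mapped) in reads.items():
    print(f"{animal}: {mapping_rate(total, mapped):.2f} % mapped")
print(
    f"Total: {total_reads:,} reads, {mapped_reads:,} mapped "
    f"({mapping_rate(total_reads, mapped_reads):.2f} %)"
)
```

Printing two decimals makes it easier to see how each value was rounded to the single decimal reported in the table.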
Summary statistics of the identified variants
| Variant class | Number (%) |
|---|---|
| Total SNPs | 10,796,794 |
| Homozygous SNPs | 5,078,645 (47.0) |
| Heterozygous SNPs | 5,718,149 (53.0) |
| Novel SNPs | 482,835 (4.5) |
| Known SNPs | 10,313,959 (95.5) |
| Biallelic SNPs | 10,780,608 (99.9) |
| Triallelic SNPs | 16,186 (0.1) |
| Ts:Tv | 2.11:1.00 |
| Total indels | 1,295,036 |
| Heterozygous indels | 566,749 (43.8) |
| Homozygous indels | 728,287 (56.2) |
| Novel indels | 231,359 (17.9) |
| Known indels | 1,063,677 (82.1) |
| Deletions | 655,942 (50.7) |
| Insertions | 639,094 (49.3) |
Values in parentheses are the percentage of variants in the given class out of all variants of that type (SNPs or indels)
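Two of the quantities in this table are simple to reproduce: the class percentages and the transition-to-transversion (Ts:Tv) ratio. The sketch below is illustrative only. The helper names and the toy SNP list are hypothetical; the only real data are the homozygous and heterozygous SNP counts copied from the table. Transitions are A<->G and C<->T changes, and every other single-base change is a transversion.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)."""
    return (ref.upper(), alt.upper()) in TRANSITIONS


def ts_tv_ratio(snps) -> float:
    """Ts:Tv ratio for an iterable of (ref, alt) single-base pairs."""
    snps = list(snps)
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv  # raises ZeroDivisionError if there are no transversions


def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to one decimal."""
    return round(100.0 * count / total, 1)


# Hypothetical toy SNPs, for illustration only: 4 transitions, 2 transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
print(ts_tv_ratio(toy_snps))

# Counts copied from the table.
total_snps = 10_796_794
print(share(5_078_645, total_snps), share(5_718_149, total_snps))
```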
Fig. 1 Genome-wide SNP densities. The plot was generated using SNP density per kb on the y-axis for a bin size of 1 Mb on each chromosome (x-axis; chromosome indicated)
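The binning described in the caption (SNPs per kb in 1 Mb windows) can be sketched as below. This is an assumption-laden illustration rather than the authors' plotting code: the function name and the toy positions are hypothetical, positions are taken to be 1-based, and the last bin of a chromosome is treated as a full 1 Mb even if it is shorter.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb bins, as stated in the figure caption


def snp_density_per_kb(positions, bin_size: int = BIN_SIZE) -> dict:
    """Map bin index -> SNPs per kb for 1-based positions on one chromosome.

    Bins without any SNP are omitted from the result.
    """
    counts = Counter((pos - 1) // bin_size for pos in positions)
    kb_per_bin = bin_size / 1000
    return {b: n / kb_per_bin for b, n in sorted(counts.items())}


# Hypothetical SNP positions on a single chromosome, for illustration only.
toy_positions = [10, 500_000, 1_000_000, 1_000_001, 2_500_000]
print(snp_density_per_kb(toy_positions))
```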
Comparison of BovineHD chip homozygous alternative genotypes to sequencing calls
| Animal ID | BovineHD | Sequencing calls | Concordant | Inconsistent | Homozygous > heterozygous |
|---|---|---|---|---|---|
| 44–6 | 247,562 | 247,519 (99.9 %) | 247,237 (99.8 %) | 17 | 265 (0.1 %) |
| 46–25 | 255,886 | 255,296 (99.8 %) | 254,826 (99.6 %) | 3 | 467 (0.2 %) |
| 49–25 | 259,773 | 259,240 (99.8 %) | 259,001 (99.7 %) | 4 | 235 (0.1 %) |
| 50–38 | 255,427 | 254,729 (99.7 %) | 254,407 (99.6 %) | 4 | 318 (0.1 %) |
Concordant, the same alleles at the same sites were detected by both the BovineHD chip and sequencing calls; Inconsistent, homozygous calls by both the BovineHD chip and sequencing calls but with different alleles; Homozygous > heterozygous, homozygous SNPs on the BovineHD chip that were over-called as heterozygous in sequencing calls
Comparison of BovineHD chip heterozygous genotypes to sequencing calls
| Animal ID | BovineHD | Sequencing calls | Concordant | Heterozygous > homozygous |
|---|---|---|---|---|
| 44–6 | 223,103 | 215,822 (96.7 %) | 215,348 (96.5 %) | 474 (0.2 %) |
| 46–25 | 225,985 | 214,607 (94.9 %) | 214,065 (94.7 %) | 542 (0.2 %) |
| 49–25 | 215,929 | 204,225 (94.6 %) | 203,639 (94.3 %) | 586 (0.3 %) |
| 50–38 | 222,315 | 208,328 (93.7 %) | 207,498 (93.3 %) | 830 (0.4 %) |
Concordant, the same alleles at the same sites were detected by both the BovineHD chip and sequencing calls; Heterozygous > homozygous, heterozygous SNPs on the BovineHD chip that were under-called as homozygous in sequencing calls
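The categories used in the two chip comparison tables can be expressed as a small classification rule. The sketch below follows the definitions given in the table notes; the function name, the genotype representation (a pair of allele letters) and the toy genotype pairs are assumptions made for illustration, and the fallback label "other" is not a category from the source.

```python
from collections import Counter


def classify_call(chip: tuple, seq: tuple) -> str:
    """Compare a BovineHD chip genotype with a sequencing genotype.

    Each genotype is a pair of alleles, e.g. ("A", "G"); allele order is ignored.
    """
    chip_alleles, seq_alleles = set(chip), set(seq)
    chip_hom = len(chip_alleles) == 1
    seq_hom = len(seq_alleles) == 1
    if chip_alleles == seq_alleles:
        return "concordant"
    if chip_hom and seq_hom:
        return "inconsistent"  # both homozygous, different alleles
    if chip_hom:
        return "homozygous > heterozygous"  # over-called as heterozygous
    if seq_hom:
        return "heterozygous > homozygous"  # under-called as homozygous
    return "other"  # both heterozygous with different alleles


# Hypothetical (chip, sequencing) genotype pairs, for illustration only.
toy_pairs = [
    (("G", "G"), ("G", "G")),
    (("G", "G"), ("A", "A")),
    (("G", "G"), ("A", "G")),
    (("A", "G"), ("G", "G")),
    (("A", "G"), ("G", "A")),
]
summary = Counter(classify_call(chip, seq) for chip, seq in toy_pairs)
print(summary)
```

Dividing each category count by the number of chip genotypes gives rates of the kind reported in the tables.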
Annotation of variants by functional class
| Functional class | SNP | Indel |
|---|---|---|
| Intergenic | 7,345,721 (68.0) | 866,042 (66.9) |
| Intronic | 2,656,868 (24.6) | 334,542 (25.8) |
| Upstream | 367,709 (3.4) | 46,226 (3.6) |
| Downstream | 317,069 (2.9) | 41,330 (3.2) |
| 3′ UTR | 19,677 (0.2) | 3150 (0.2) |
| 5′ UTR | 4364 (0.0) | 453 (0.0) |
| Splice region<sup>a</sup> | 6393 (0.1) | 896 (0.1) |
| Splice donor<sup>b</sup> | 230 (0.0) | 53 (0.0) |
| Splice acceptor<sup>c</sup> | 218 (0.0) | 82 (0.0) |
| Initiator codon<sup>d</sup> | 74 (0.0) | 1 (0.0) |
| Stop gain | 395 (0.0) | - |
| Frameshift | - | 1302 (0.1) |
| Missense | 34,183 (0.3) | 44 (0.0) |
| Synonymous | 40,055 (0.4) | - |
| Coding sequence<sup>e</sup> | 125 (0.0) | 135 (0.0) |
| Inframe deletion | - | 261 (0.0) |
| Inframe insertion | - | 194 (0.0) |
| Stop lost | 29 (0.0) | - |
| Stop retained | 25 (0.0) | - |
| Within non coding exon<sup>f</sup> | 3569 (0.0) | 270 (0.0) |
| Within mature miRNA | 70 (0.0) | 29 (0.0) |
| Nc transcript<sup>f</sup> | 20 (0.0) | 26 (0.0) |
| Total | 10,796,794 (100.0) | 1,295,036 (100.0) |
<sup>a</sup> Variant in which a change has occurred within the region of the splice site, either within 1–3 bases of the exon or 3–8 bases of the intron
<sup>b</sup> Variant is located in the first two bases of an intron
<sup>c</sup> SNP is located in the last two bases of an intron
<sup>d</sup> SNP changes at least one base of the first codon of a transcript
<sup>e</sup> SNP is located in coding sequence with indeterminate effect
<sup>f</sup> SNP is a transcript variant of a non-coding RNA
Values in parentheses are the percentage of variants in the functional class of the total variants in the column
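The positional definitions in footnotes a to c can be turned into a simple rule. The following is a sketch under stated assumptions, not the annotation tool's actual logic: it handles a single intron on the forward strand, uses 1-based inclusive coordinates, assumes the intron is at least 16 bp long so the windows do not overlap, and the function name and the labels "intronic" and "outside splice site" are hypothetical.

```python
def splice_class(pos: int, intron_start: int, intron_end: int) -> str:
    """Classify a 1-based position relative to one forward-strand intron.

    intron_start and intron_end are the first and last intronic bases (inclusive).
    """
    if intron_start <= pos <= intron_start + 1:
        return "splice donor"  # first two bases of the intron
    if intron_end - 1 <= pos <= intron_end:
        return "splice acceptor"  # last two bases of the intron
    in_intron_window = (
        intron_start + 2 <= pos <= intron_start + 7  # intron bases 3-8 from the 5' end
        or intron_end - 7 <= pos <= intron_end - 2  # intron bases 3-8 from the 3' end
    )
    in_exon_window = (
        intron_start - 3 <= pos <= intron_start - 1  # last 3 exon bases before the intron
        or intron_end + 1 <= pos <= intron_end + 3  # first 3 exon bases after the intron
    )
    if in_intron_window or in_exon_window:
        return "splice region"
    if intron_start < pos < intron_end:
        return "intronic"
    return "outside splice site"


# Hypothetical intron spanning positions 100-200.
for p in (97, 100, 102, 108, 193, 199, 203, 204):
    print(p, splice_class(p, 100, 200))
```

On the reverse strand the donor and acceptor ends swap, which this sketch deliberately leaves out.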
Fig. 2 Characteristics and length distribution of indels (≤15 bp) in coding sequence (CDS). The horizontal axis shows the length of indels and the vertical axis indicates the count of indels
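A length distribution like the one in this figure separates naturally into frameshift and in-frame indels, since an indel lying entirely within coding sequence preserves the reading frame only when its length is a multiple of three. The sketch below illustrates that rule; the function names and the toy alleles are hypothetical, and real annotation tools also account for the exact position of the indel relative to codon and exon boundaries.

```python
from collections import Counter


def indel_length(ref: str, alt: str) -> int:
    """Signed length: positive for insertions, negative for deletions."""
    return len(alt) - len(ref)


def coding_effect(ref: str, alt: str) -> str:
    """Classify a coding indel as frameshift or in-frame by its length."""
    length = indel_length(ref, alt)
    if length == 0:
        raise ValueError("ref and alt have equal length: not an indel")
    if abs(length) % 3 != 0:
        return "frameshift"
    return "inframe insertion" if length > 0 else "inframe deletion"


# Hypothetical (ref, alt) alleles, for illustration only.
toy_indels = [("A", "AT"), ("ACGT", "A"), ("G", "GCAT"), ("CTT", "C")]

effects = Counter(coding_effect(ref, alt) for ref, alt in toy_indels)
# Histogram of absolute lengths, restricted to indels of at most 15 bp.
length_histogram = Counter(
    abs(indel_length(ref, alt))
    for ref, alt in toy_indels
    if abs(indel_length(ref, alt)) <= 15
)
print(effects)
print(sorted(length_histogram.items()))
```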
Numbers of LoF variants before filtering and putative deleterious LoF variants after filtering
| Variant type | Before filtering: total (novel) | Before filtering: gene count | Before filtering: LoFhom | Before filtering: LoFhet | After filtering: AH1 | After filtering: consistent | After filtering: inconsistent |
|---|---|---|---|---|---|---|---|
| Stop gain | 395 (210) | 345 | 28 | 367 | 97 | 97 | 0 |
| Splice site | 448 (235) | 392 | 72 | 376 | 77 | 76 | 1 |
| Frameshift indel | 1302 (1123) | 931 | 614 | 688 | 171 | 95 | 76 |
| Total | 2145 (1568) | - | 714 | 1431 | 345 | 268 | 77 |
LoFhom, variants that were homozygous in all four sequenced Danish Holstein cows; LoFhet, variants for which at least one of the four sequenced cows was heterozygous; AH1, LoF variants for which none of the 288 sequenced Holstein animals from the 1000 bull genomes project was homozygous; Consistent, variants called concordantly in both the four Danish Holstein cows and the 288 Holstein animals; Inconsistent, variants called as discordant variant types (SNP as indel or indel as SNP) between the four Danish Holstein cows and the 288 bulls from the 1000 bull genomes project; Novel, variants not annotated in dbSNP build 133
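The AH1 criterion (no homozygous animal among the 288 reference Holsteins) amounts to a simple genotype filter. The sketch below is an illustration under assumptions: genotypes are coded as the number of LoF alleles carried (0, 1 or 2, with `None` for a missing call), "homozygous" is read as homozygous for the LoF allele, missing calls are ignored, and the variant names and panels are hypothetical. The table counts at the end are copied from the source.

```python
def is_ah1(lof_allele_counts) -> bool:
    """True if no animal in the panel is homozygous for the LoF allele.

    Genotypes are coded as LoF allele counts (0, 1, 2); None marks a missing call.
    """
    return all(g != 2 for g in lof_allele_counts if g is not None)


# Hypothetical genotype panels (one entry per animal), for illustration only.
toy_panel = {
    "variant_1": [0, 0, 1, 0, None, 1],  # carriers only: passes the filter
    "variant_2": [0, 2, 1, 0, 0, 0],  # one homozygous animal: filtered out
}
kept = [name for name, genotypes in toy_panel.items() if is_ah1(genotypes)]
print(kept)

# Counts after filtering, copied from the table: (AH1, consistent, inconsistent).
after_filtering = {
    "Stop gain": (97, 97, 0),
    "Splice site": (77, 76, 1),
    "Frameshift indel": (171, 95, 76),
}
for variant_type, (ah1, consistent, inconsistent) in after_filtering.items():
    print(variant_type, ah1 == consistent + inconsistent)
```

In the table, each AH1 count equals the sum of its consistent and inconsistent counts, which the final loop checks.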
Fig. 3 Distribution and densities of LoF variants across the genome. The blue bars on the x-axis represent the number of putative deleterious LoF variants on each chromosome, whereas the red line indicates the density of LoF variants per Mb on the chromosome
Fig. 4 Minor allele frequency distribution for putative deleterious LoF variants called by both GATK and SAMtools. Minor allele frequency was calculated using data for the 288 Holstein animals from the 1000 bull genomes project
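Minor allele frequency for a biallelic variant can be derived from genotype counts in the reference panel. The following is a minimal sketch, assuming diploid genotypes and a biallelic site; the function name and the example counts are hypothetical and are not taken from the 1000 bull genomes data.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Minor allele frequency of a biallelic variant from diploid genotype counts."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotyped animals")
    alt_freq = (n_het + 2 * n_hom_alt) / n_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


# Hypothetical panel of 288 animals: 260 homozygous reference, 28 carriers,
# and no animal homozygous for the alternative allele.
maf = minor_allele_frequency(260, 28, 0)
print(f"{maf:.4f}")
```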