Sebastian H Eck, Anna Benet-Pagès, Krzysztof Flisikowski, Thomas Meitinger, Ruedi Fries, Tim M Strom.
Abstract
BACKGROUND: The majority of the 2 million bovine single nucleotide polymorphisms (SNPs) currently available in dbSNP have been identified in a single breed, Hereford cattle, during the bovine genome project. In an attempt to evaluate the variance of a second breed, we have produced a whole genome sequence at low coverage of a single Fleckvieh bull.
Year: 2009 PMID: 19660108 PMCID: PMC2745763 DOI: 10.1186/gb-2009-10-8-r82
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Distribution of read depth. (a) Distribution of mapped read depth in all autosomal chromosomes. Read depth is sampled at every position along the chromosomes. The solid line represents a Poisson distribution with the same mean. (b) Distribution of read depth as a function of GC-content. GC-content and read depth were calculated for non-overlapping windows of 500 bp.
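As a rough illustration of the two panels, the following Python sketch compares an empirical depth distribution with a Poisson distribution of the same mean and computes the GC fraction in non-overlapping windows. The function names and the toy depth values are hypothetical and not taken from the study; only the 500 bp window size comes from the legend.

```python
import math
from collections import Counter


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean (mean > 0)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def depth_distribution(depths):
    """Fraction of positions observed at each read depth."""
    counts = Counter(depths)
    n = len(depths)
    return {d: c / n for d, c in sorted(counts.items())}


def gc_by_window(seq, window=500):
    """GC fraction per non-overlapping window; a trailing partial window is dropped."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


# Toy per-position depths (hypothetical values, not data from the study).
depths = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]
mean_depth = sum(depths) / len(depths)
observed = depth_distribution(depths)

# Print observed fraction next to the Poisson expectation for each depth.
for d, frac in observed.items():
    print(d, round(frac, 3), round(poisson_pmf(d, mean_depth), 3))
```

With real data, a depth distribution that is broader than the Poisson curve would point to coverage biases such as the GC dependence shown in panel (b).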
Figure 2. Analysis procedure. Sequence reads were aligned to the reference sequence (bosTau4) by the MAQ software. SNPs were called and filtered by MAQ and custom scripts, resulting in a final set of 2.44 million SNPs. Comparison with 25,726 array-based genotypes revealed a false-negative detection rate of 49%. A false-positive detection rate of 1.1% was determined by comparison with 196 randomly selected SNPs genotyped with MALDI-TOF spectroscopy. By determining the false-positive detection rate in 75 coding SNPs with high coverage (≥16), we found evidence that the high false-positive detection rate in these SNPs is due to mapping errors caused by duplications that are not reflected in the reference sequence rather than to sequencing errors.
Figure 3. Small indels. Distribution of the size of 115,371 small indels (68,354 deletions and 47,017 insertions). Positive and negative values on the x-axis correspond to the presence or absence of bases relative to the reference sequence.
Identified SNPs and small indels
| | All | Coding | Non-synonymous | Splice-site | UTR |
| SNPs (Ensembl) | 2,443,637 (18%) | 22,070 (18%) | 9,360 (15%) | 148 (14%) | 8,114 (20%) |
| Indels (Ensembl) | 115,371 | 425 | | | |
| SNPs (RefSeq) | 2,443,637 (18%) | 7,619 (18%) | 3,139 (16%) | 40 (15%) | 6,292 (20%) |
| Indels (RefSeq) | 115,371 | 203 | | | |
The proportions of SNPs that have been previously reported are given in parentheses. UTR, untranslated region.
Concordant calls
| | BovineSNP50 | MAQ calls | Concordant calls |
| Homozygote reference | 22,999 | | |
| Homozygote variant | 12,043 | 8,974 (74.51%) | 8,949 (99.72%) |
| Heterozygote | 13,683 | 5,882 (42.98%) | 4,157 (70.67%) |
Comparison of the SNP calls made from genotype data and the sequence: concordant calls. Genotype data were generated using the Infinium BovineSNP50 BeadChip. Homozygote reference denotes an array-based genotype that is homozygous for the reference allele. Homozygote variant denotes an array-based genotype that is homozygous for a non-reference allele. Heterozygote denotes a heterozygous array-based genotype containing one reference allele and a variant allele.
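The percentages in the table can be approximately reproduced from the counts. The sketch below assumes that the "MAQ calls" percentage is the share of array genotypes that MAQ called, and that the "Concordant calls" percentage is the share of those MAQ calls that agree with the array. It also shows that the fraction of non-reference array genotypes without a concordant MAQ call is about 49%, which is the same value as the false-negative rate quoted in the Figure 2 legend; whether the authors derived that rate in exactly this way is an assumption. Variable names are hypothetical.

```python
# Counts copied from the concordant-calls table.
array_genotypes = {"hom_variant": 12043, "het": 13683}
maq_calls = {"hom_variant": 8974, "het": 5882}
concordant = {"hom_variant": 8949, "het": 4157}


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


for genotype in array_genotypes:
    called = pct(maq_calls[genotype], array_genotypes[genotype])
    agree = pct(concordant[genotype], maq_calls[genotype])
    print(f"{genotype}: called {called:.2f}%, concordant {agree:.2f}%")

# Assumed derivation of the false-negative rate: non-reference array
# genotypes that did not receive a concordant MAQ call.
total_array = sum(array_genotypes.values())
total_concordant = sum(concordant.values())
false_negative_pct = 100.0 - pct(total_concordant, total_array)
print(f"false-negative rate: {false_negative_pct:.1f}%")
```

Under this reading, most of the missed genotypes are heterozygotes, which is the pattern expected when low coverage often samples only one of the two alleles.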
Discordant calls
| | Discordant calls |
| All disagreements | 1,766 (6.86%) |
| GT-het>Seq-hom | 1,720 (6.68%) |
| Seq-het>GT-hom | 10 (0.03%) |
| Different homozygotes | 15 (0.06%) |
| Different heterozygotes | 5 (0.02%) |
| Seq-SNP>GT-Ref | 16 (0.09%) |
Comparison of the SNP calls made from genotype data and the sequence: discordant calls. GT-het>Seq-hom indicates a heterozygote under-call by MAQ (array-based genotype heterozygote, MAQ-based genotype homozygote). Seq-het>GT-hom indicates a possible heterozygote under-call by the array (array-based genotype homozygote for the reference allele, MAQ-based genotype heterozygote). Different homozygotes denotes homozygous genotypes on both platforms that both differ from the reference genotype. Different heterozygotes denotes heterozygous genotypes on both platforms where one allele differs. Seq-SNP>GT-Ref indicates a MAQ-based genotype that differs from the reference sequence while the chip-based genotype displayed only the reference allele.
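The categories above can be expressed as a small decision function. This is a minimal Python sketch, not the authors' script: genotypes are assumed to be two-character allele strings, and the function name `classify` is hypothetical. As worded, the definitions of Seq-het>GT-hom and Seq-SNP>GT-Ref overlap when the array genotype is homozygous reference and the MAQ genotype is heterozygous; the sketch resolves this by the order of the checks, which is an assumption.

```python
def is_het(genotype):
    """True if the two alleles of a genotype differ."""
    return genotype[0] != genotype[1]


def classify(array_gt, maq_gt, ref):
    """Assign an array/MAQ genotype pair to one of the table's categories."""
    a = tuple(sorted(array_gt))
    m = tuple(sorted(maq_gt))
    if a == m:
        return "concordant"
    if is_het(a) and not is_het(m):
        return "GT-het>Seq-hom"          # heterozygote under-call by MAQ
    if not is_het(a) and is_het(m):
        return "Seq-het>GT-hom"          # possible under-call by the array
    if is_het(a) and is_het(m):
        return "Different heterozygotes"
    # Both genotypes are homozygous and differ from each other.
    if m[0] == ref:
        return "no MAQ variant call"     # not a category in the table
    if a[0] == ref:
        return "Seq-SNP>GT-Ref"
    return "Different homozygotes"


# Toy examples with reference allele "A" (hypothetical genotypes).
for array_gt, maq_gt in [("AG", "GG"), ("AA", "AG"), ("GG", "TT"),
                         ("AG", "AT"), ("AA", "GG")]:
    print(array_gt, maq_gt, classify(array_gt, maq_gt, "A"))
```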
SNPs called by MAQ compared with calls by MALDI-TOF genotyping
| Concordant calls | 186 |
| MAQ heterozygote under-call | 8 |
| MALDI-TOF homozygous, MAQ heterozygous | 2 |
| Error rate (without heterozygote under-calls) | 1.1% |
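The 1.1% error rate is consistent with dividing the two discordant heterozygous MAQ calls by the number of SNPs that remain once the eight heterozygote under-calls are set aside. The short Python check below makes that arithmetic explicit; the choice of denominator is an assumption that reproduces the reported value, and the variable names are hypothetical.

```python
# Counts copied from the MALDI-TOF comparison table.
concordant = 186
maq_het_undercalls = 8
maq_het_maldi_hom = 2

total_snps = concordant + maq_het_undercalls + maq_het_maldi_hom

# Assumed denominator: all genotyped SNPs except the heterozygote under-calls.
denominator = total_snps - maq_het_undercalls
error_rate_pct = 100.0 * maq_het_maldi_hom / denominator

print(total_snps)                # 196
print(round(error_rate_pct, 1))  # 1.1
```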
Figure 4. Minor allele frequency (MAF) spectrum of randomly selected SNPs. Genotypes of 196 SNPs were determined by MALDI-TOF mass spectroscopy in 48 Fleckvieh and 48 Braunvieh bulls.
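For readers unfamiliar with the quantity plotted, the minor allele frequency of a biallelic SNP is the frequency of its rarer allele among all genotyped chromosomes. A minimal Python sketch follows, assuming genotypes are given as two-character allele strings; the function name and the example genotypes are hypothetical and not data from the study.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele at a biallelic SNP.

    Returns 0.0 when only one allele is observed (monomorphic sample).
    """
    allele_counts = Counter(allele for gt in genotypes for allele in gt)
    if len(allele_counts) < 2:
        return 0.0
    total = sum(allele_counts.values())
    return min(allele_counts.values()) / total


# Four animals: allele A occurs 5 times, allele G occurs 3 times.
example = ["AA", "AG", "GG", "AA"]
print(minor_allele_frequency(example))  # 0.375
```

Computing this value for each SNP and binning the results gives a frequency spectrum of the kind shown in the figure.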