Hindrik H D Kerstens, Richard P M A Crooijmans, Albertine Veenendaal, Bert W Dibbits, Thomas F C Chin-A-Woeng, Johan T den Dunnen, Martien A M Groenen.
Abstract
BACKGROUND: The development of second generation sequencing methods has enabled large scale DNA variation studies at moderate cost. For the high throughput discovery of single nucleotide polymorphisms (SNPs) in species lacking a sequenced reference genome, we set up an analysis pipeline based on a short read de novo sequence assembler and a program designed to identify variation within short reads. To illustrate the potential of this technique, we present the results obtained with a randomly sheared, enzymatically generated, 2-3 kbp genome fraction of six pooled Meleagris gallopavo (turkey) individuals.
Year: 2009 PMID: 19835600 PMCID: PMC2772860 DOI: 10.1186/1471-2164-10-479
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of DNA sequence filtering results.
| l32 n. | 107888201 | 94.48 | 24.78 | |
| l32 n. q20 o230 | 27979963 | 24.50 | 46.64 | |
| l32 n. q10 o230 | 32941906 | 28.84 | 40.23 |
1 Sequences are filtered for a length of 32 bases and for the absence of base-call errors (n or .). Singly represented reads are required to have a per base-call quality score of at least 20 (assembly data set) or at least 10 (SNP data set). Sequences more than four times overrepresented, based on the expected 56× coverage, were discarded.
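The filtering rules in this footnote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `filter_reads`, the `(sequence, qualities)` input format, and the default cutoff of 4 × 56 copies are all hypothetical (the filter labels in the table read "o230", so the cutoff actually used may differ).

```python
from collections import defaultdict


def filter_reads(reads, read_length=32, min_quality=20, max_copies=4 * 56):
    """Filter short reads along the lines described in the table footnote.

    reads: iterable of (sequence, qualities) pairs, where qualities is a
    list of per-base integer quality scores (hypothetical input format).
    Returns a dict mapping each retained sequence to its copy number.
    """
    groups = defaultdict(list)
    for seq, quals in reads:
        seq = seq.upper()
        # Keep only reads of the expected length.
        if len(seq) != read_length:
            continue
        # Drop reads containing base-call errors (n or .).
        if "N" in seq or "." in seq:
            continue
        groups[seq].append(quals)

    kept = {}
    for seq, qual_lists in groups.items():
        copies = len(qual_lists)
        # Discard sequences that are overrepresented relative to coverage.
        if copies > max_copies:
            continue
        # Only singly represented reads must pass the per-base quality check.
        if copies == 1 and min(qual_lists[0]) < min_quality:
            continue
        kept[seq] = copies
    return kept
```

Calling `filter_reads(reads, min_quality=10)` would correspond to the more permissive setting described for the SNP data set.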
Short read assembly results.
| edena -c 33 -m 16 | velvet 15² | SSAKE | |
| 230741 | 24 | 627600 | |
| 8965681 | NA | 13964267 | |
| 17487533 | 2812 | 36163074 | |
| 90 | 129 | 53 | |
1 algorithm versions were: edena-2.1.1, velvet_0.3, SSAKE_v2.0
2 parameter applies to hash_length
Quality estimation of turkey short read contigs based on alignment to the chicken genome.
| 124480 | 47 | - | 2 | - | 1 | - | 50 | - | ||
| 38382 | 53 | 2 | 1 | 56 | ||||||
| 25808 | 73 | 1 | 0 | 74 | ||||||
| 8878 | 69 | 1 | 0 | 70 | ||||||
| 6453 | 63 | 1 | 0 | 64 | ||||||
| 2372 | 51 | 1 | 0 | 52 | ||||||
| 1192 | 40 | 1 | 0 | 40 | ||||||
| 682 | 30 | 1 | 0 | 31 | ||||||
| 688 | 18 | 0 | 0 | 18 | ||||||
| 308 | 11 | 0 | 0 | 11 | ||||||
| 380 | 6 | 1 | 0 | 7 | ||||||
1 frequency in which contigs and BES (in italics) occurred per size category
2 per size category percentage of contigs and BES (in italics) that mapped to the chicken genome
Figure 1. Distribution of short read turkey contigs, turkey BES, and SNPs on chicken chromosome 4. In black, short read contigs <100 bp; in blue, short read contigs ≥ 100 bp; in red, BES; in yellow, BES-short-read contigs; and in green, SNPs. On the X-axis, the chicken genome in 200 kb intervals. On the Y-axis, the frequency of mapped turkey features for a specific chicken genome interval.
Figure 2. Distribution of nucleotide polymorphisms across 32 bp genome analyzer reads. The X-axis represents the 32 base sequence read. On the Y-axis is the cumulative number of identified SNPs per base position of the sequence read.
Overview of SNPs identified.
| 209623 | 7609 | 2254 | 5218 | 6454 | |
| 195928 | 7930 | 2760 | 5636 | 6834 | |
| 192731 | 11287 | 5620 | 8902 | 10125 | |
1 40/40 refers to SNPs that are flanked on both sides by at least 40 nucleotides of genomic sequence.
20/20 refers to SNPs that are flanked on both sides by at least 20 nucleotides of genomic sequence.
2/40 refers to SNPs that have at least 2 nucleotides of flanking sequence on one side and at least 40 nucleotides on the other.
2 c50 refers to the reference genome consisting of short read contigs of 50 bp or more.
c50ca is the extended genome assembly based on chicken alignment.
c50caB is the extended genome assembly based on chicken alignment and turkey BES.
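The three flank categories defined in these footnotes can be expressed as a short check on the position of a SNP within its contig. The sketch below is illustrative only; the function name `flank_classes` and the 0-based position convention are assumptions, not part of the original analysis.

```python
def flank_classes(snp_index, contig_length):
    """Return the flank categories satisfied by a SNP.

    snp_index: 0-based position of the SNP within the contig (assumed
    convention); contig_length: total contig length in nucleotides.
    """
    left = snp_index                        # nucleotides before the SNP
    right = contig_length - snp_index - 1   # nucleotides after the SNP
    shorter, longer = sorted((left, right))

    classes = []
    if shorter >= 40:
        classes.append("40/40")  # at least 40 nt on both sides
    if shorter >= 20:
        classes.append("20/20")  # at least 20 nt on both sides
    if shorter >= 2 and longer >= 40:
        classes.append("2/40")   # at least 2 nt on one side, 40 nt on the other
    return classes
```

Under these definitions every 40/40 SNP also qualifies as 20/20 and 2/40, whereas a 20/20 SNP qualifies as 2/40 only if one of its flanks reaches 40 nucleotides.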
Figure 3. Distribution of 6,134 SNPs that mapped uniquely to the genome, 343 of which were selected for validation. In blue, 5,791 putative SNPs identified using the c50caB reference sequence and mapping uniquely to the chicken genome; in red, 343 uniquely mapping putative SNPs selected for validation. On the X-axis, the chicken genome in 1 Mb intervals. On the Y-axis, the frequency of mapped putative turkey SNPs for a specific chicken genome interval.
SNP performance statistics.
| 304 | 88.6 | 20 | 48.8 | |
| 12 | 3.5 | 4 | 9.8 | |
| 27 | 7.9 | 17 | 41.5 | |
Genotyping performance of 343 SNPs discovered in short read contigs that were uniquely mapped on the chicken genome and 41 SNPs discovered in contigs that were not, or not uniquely, mapped on the chicken genome.
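The percentages in the table are consistent with each count being divided by its column total (343 and 41, as given in the caption). A small Python check, with a hypothetical helper `percentages` and the rows referred to only by position because their labels are not preserved here:

```python
def percentages(counts):
    """Express each count as a percentage of the column total, to one decimal."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


# Counts copied from the table rows, in order.
uniquely_mapped = [304, 12, 27]       # sums to 343
not_uniquely_mapped = [20, 4, 17]     # sums to 41

print(percentages(uniquely_mapped))       # [88.6, 3.5, 7.9]
print(percentages(not_uniquely_mapped))   # [48.8, 9.8, 41.5]
```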