Robert H S Kraus, Hindrik H D Kerstens, Pim Van Hooft, Richard P M A Crooijmans, Jan J Van Der Poel, Johan Elmberg, Alain Vignal, Yinhua Huang, Ning Li, Herbert H T Prins, Martien A M Groenen.
Abstract
BACKGROUND: Next-generation sequencing technologies make it possible to obtain, at low cost, the genomic sequence information that is currently lacking for most economically and ecologically important organisms. For the mallard duck, genomic data are limited. Besides being a species of great agricultural and societal importance, the mallard is also the focal species for long-distance dispersal of avian influenza. For large-scale identification of SNPs, we performed Illumina sequencing of wild mallard DNA and compared our data with ongoing genome and EST sequencing of domesticated conspecifics. This is the first study of its kind for waterfowl.
Year: 2011 PMID: 21410945 PMCID: PMC3065436 DOI: 10.1186/1471-2164-12-150
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of DNA sequence filtering results
| | raw (76 bp) | l62 N. q12 o152 (1) | % | paired-end | % | single | % |
|---|---|---|---|---|---|---|---|
| Sequences | 34818352 | 16611852 | 47.7 | 10793170 | 65.0 | 5818682 | 35.0 |
| Bases | 2547361732 | 1029934824 | 40.4 | 669176540 | 65.0 | 360758284 | 35.0 |
Paired and single sequence reads remaining after filtering raw reads.
1 = Raw sequences were filtered for a length of 62 bases. Only reads without base-call errors (N or .) were considered. Singly represented reads were required to have a per-base call quality of 12. Sequences more than four times overrepresented relative to the raw RRL coverage (38×, see Methods) were discarded.
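The filtering criteria in this footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `filter_reads`, the input layout (a list of sequence and quality-list pairs), trimming to the first 62 bases, and treating both `N` and `.` as no-call characters are all assumptions made for the example. The threshold of 152 copies is simply four times the 38× raw RRL coverage.

```python
from collections import Counter

READ_LENGTH = 62       # "l62" in the table header
MIN_QUALITY = 12       # "q12"
MAX_COPIES = 4 * 38    # "o152": four times the 38x raw RRL coverage

def filter_reads(reads):
    """Filter (sequence, per-base qualities) pairs.

    Hypothetical helper: the input layout and the trimming step are
    assumptions made for illustration only.
    """
    # Assumption: drop reads shorter than 62 bases, trim longer ones to 62.
    trimmed = [(seq[:READ_LENGTH], qual[:READ_LENGTH])
               for seq, qual in reads if len(seq) >= READ_LENGTH]
    # Discard reads containing a no-call (assumed to be "N" or ".").
    clean = [(s, q) for s, q in trimmed if "N" not in s and "." not in s]
    # Count how often each sequence occurs among the remaining reads.
    copies = Counter(s for s, _ in clean)
    kept = []
    for seq, qual in clean:
        n = copies[seq]
        if n > MAX_COPIES:
            continue  # overrepresented sequence
        if n == 1 and min(qual) < MIN_QUALITY:
            continue  # singly represented read with a low-quality base
        kept.append((seq, qual))
    return kept
```

Note that in this sketch the quality threshold is applied only to reads that occur once, mirroring the wording of the footnote; reads seen more than once are kept regardless of quality.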
Figure 1 Minor allele frequency distributions. In boxplot A, MAC distributions of d-RRL (SNPs identified in this study) and d-Shared (SNPs that d-RRL shares with d-EST or d-WGS; see also the Venn diagram in Figure 2D) are compared. Histograms B and C show MAC distributions of d-RRL and d-Shared at a bin width of 0.05.
Figure 2 SNP distributions within datasets and between datasets. Diagrams A-C show the distribution of SNP predictions over the nucleotide position in the sequence reads for d-RRL, d-Shared and d-Between. Each filled dot represents the cumulative number of times that read position was involved in a SNP inference. Open dots represent the average TS:TV ratio of SNPs identified at that read position. Diagram D shows how many SNPs are shared between the independent SNP sets d-EST (SNPs identified by EST sequencing of domesticated duck (Vignal, unpublished data)), d-WGS (SNPs identified in the whole genome assembly of domesticated duck (Huang et al., in prep.)) and d-RRL (SNPs identified in RRL sequencing of wild mallard (this study)).
Transition/transversion ratios in SNP subsets
| R | Y | M | W | S | K | Total | TS:TV (1) |
|---|---|---|---|---|---|---|---|
| 42313 | 42602 | 9658 | 9051 | 9114 | 9675 | 122,413 | 2.3 |
| 7300 | 7442 | 1396 | 1227 | 1334 | 1484 | 20,184 | 2.7 |
| 20156 | 21333 | 5464 | 5165 | 4804 | 4830 | 61,752 | 2.0 |
1 = The transitions total divided by the transversions total for a data subset.
The two transitions and four transversions are abbreviated by their nucleotide ambiguity codes R, Y and M, W, S, K.
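As a worked example of the footnote, the TS:TV ratio can be recomputed from the six substitution counts. The sketch below assumes that the count columns are in the order R, Y, M, W, S, K, as listed in the note above; the function name `ts_tv_ratio` is hypothetical. R (A/G) and Y (C/T) are the transitions, and M (A/C), W (A/T), S (C/G) and K (G/T) are the transversions.

```python
TRANSITIONS = ("R", "Y")                # A/G and C/T
TRANSVERSIONS = ("M", "W", "S", "K")    # A/C, A/T, C/G and G/T

def ts_tv_ratio(counts):
    """Transitions total divided by transversions total."""
    ts = sum(counts[code] for code in TRANSITIONS)
    tv = sum(counts[code] for code in TRANSVERSIONS)
    return ts / tv

# First row of the table, assuming the column order R, Y, M, W, S, K.
first_row = dict(zip("RYMWSK", [42313, 42602, 9658, 9051, 9114, 9675]))
print(round(ts_tv_ratio(first_row), 1))  # 2.3
```

For the first row this gives 84915 / 37498, which rounds to the 2.3 reported in the table.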
Figure 3 Distribution of mallard SNPs uniquely mapped on the chicken genome. The 4272 mallard SNPs with a unique mapping position on the chicken genome are shown in blue (see text for mapping algorithms). The 384 mapped SNPs that were selected for genotyping are shown in red. The X-axis shows the chicken genome in 400 kb intervals, and the Y-axis shows the frequency (0-15) of mapped mallard SNPs for each chicken genome interval.
Figure 4 Genotyping minor allele frequency and heterozygosity distributions. Validation of the d-Shared subset involved genotyping of 384 selected SNPs on 765 ducks, including the nine mallards that made up the SNP discovery panel. Minor allele frequency (MAF) and heterozygosity of SNPs were calculated for the discovery panel, as well as for the whole set of genotyped ducks.
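The two per-SNP statistics named in this caption could be computed along the following lines. This is a minimal sketch under assumptions, not the method used in the study: genotypes are represented as two-letter strings such as "AG", missing calls as `None`, SNPs are taken to be biallelic, and heterozygosity is taken to mean observed heterozygosity (the caption does not say whether observed or expected heterozygosity was used). The function name is hypothetical.

```python
from collections import Counter

def maf_and_heterozygosity(genotypes):
    """Return (minor allele frequency, observed heterozygosity) for one SNP.

    genotypes: list of two-letter strings such as "AG"; None marks a
    missing call. This layout is an assumption made for the example.
    """
    called = [g for g in genotypes if g]
    if not called:
        raise ValueError("no called genotypes for this SNP")
    # Count every allele copy across all called genotypes.
    alleles = Counter(allele for g in called for allele in g)
    total = sum(alleles.values())
    # A monomorphic SNP has no minor allele, so its MAF is 0.
    maf = min(alleles.values()) / total if len(alleles) > 1 else 0.0
    # Observed heterozygosity: share of individuals with two different alleles.
    het = sum(1 for g in called if g[0] != g[1]) / len(called)
    return maf, het

# Toy data: four called individuals and one missing call.
print(maf_and_heterozygosity(["AA", "AA", "AG", "AA", None]))  # (0.125, 0.25)
```

Applying the same function once to the discovery panel and once to all genotyped individuals would give the two sets of values that the caption describes.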