Guilhem Sempéré, Katayoun Moazami-Goudarzi, André Eggen, Denis Laloë, Mathieu Gautier, Laurence Flori.
Abstract
BACKGROUND: The advent and democratization of next-generation sequencing and genotyping technologies have led to a huge amount of data for characterizing population genetic diversity in model and non-model species. However, efficient storage, management, cross-analysis and exploration of such dense genotyping datasets remain challenging. This is particularly true for the bovine species, for which many SNP datasets have been generated in various cattle populations with different genotyping tools.
DESCRIPTION: We developed WIDDE, a Web-Interfaced Next Generation Database that stands as a generic tool applicable to a wide range of species and marker types (http://widde.toulouse.inra.fr). As a first illustration, we describe its first version, dedicated to cattle biodiversity, which includes a large and evolving cattle genotyping dataset of over 750,000 SNPs available for 129 (89 public) cattle populations representative of worldwide bovine genetic diversity and for 7 outgroup bovid species. This version offers an optional marker and individual filtering step, export of genotyping data in several popular formats, and exploration of genetic diversity through principal component analysis. Users can also explore their own genotyping data together with data from WIDDE, assign their samples to WIDDE populations using a distance-based assignment method and supervised clustering, and estimate their ancestry composition relative to the populations represented in the database.
Year: 2015 PMID: 26573482 PMCID: PMC4647285 DOI: 10.1186/s12864-015-2181-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 WIDDE architecture diagram. This high-level diagram shows the entities involved when using the information system, the data flows between them, and the third-party software used in the process.
Fig. 2 Web interface used to select individuals and markers, apply quality filters, export data in various formats and launch a principal component analysis.
Fig. 3 Plot of the individuals according to their coordinates on the first two principal components of a principal component analysis of 44,554 SNPs genotyped on 685 individuals from 22 cattle populations representative of cattle genetic diversity. The selected populations, all genotyped on the Illumina Bovine SNP50v1, comprise eight European taurine (EUT) populations (Abondance/ABO, Angus/ANG, Aubrac/AUB, Charolais/CHA, Holstein/HOL, Montbéliard/MON, Normande/NOR and Salers/SAL), four African taurine (AFT) populations (Baoulé/BAO, Lagune/LAG, N’Dama/NDA and Somba/SOM), six zebu (ZEB) populations (Brahman/BRM, Nelore/NEL, Gir/GIR, Zebu Bororo/ZBO, Zebu Fulani/ZFU and Zebu from Madagascar/ZMA) and four admixed populations (Borgou/BOR, Kouri/KUR, Oulmès Zaër/OUL and Santa Gertrudis/SGT). Data were filtered using default parameters.
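As an illustration of the kind of computation behind a plot such as Fig. 3, the following is a minimal sketch of a principal component analysis on a small genotype matrix. It is not WIDDE's implementation. It assumes genotypes are coded as allele counts (0, 1 or 2) with no missing values, and all function names are hypothetical. To stay dependency-free it uses power iteration on the individual-by-individual Gram matrix, which is only practical for toy data.

```python
import math
import random


def center_columns(genotypes):
    """Subtract each SNP's mean allele count (rows = individuals, columns = SNPs)."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def gram_matrix(x):
    """Individual-by-individual matrix of dot products between centered rows."""
    n = len(x)
    return [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]


def top_eigenpair(g, iterations=500, seed=0):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    rng = random.Random(seed)
    n = len(g)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iterations):
        w = [sum(g[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # matrix has no remaining variance along this direction
        v = [c / norm for c in w]
    gv = [sum(g[i][k] * v[k] for k in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, gv))
    return eigenvalue, v


def pca_coordinates(genotypes, n_components=2):
    """Coordinates of each individual on the first principal components."""
    g = gram_matrix(center_columns(genotypes))
    n = len(g)
    coords = [[] for _ in range(n)]
    for _ in range(n_components):
        eigenvalue, v = top_eigenpair(g)
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate: remove the component just extracted before finding the next.
        g = [[g[i][k] - eigenvalue * v[i] * v[k] for k in range(n)]
             for i in range(n)]
    return coords


# Toy data (invented): two groups of three individuals, five SNPs.
toy_genotypes = [
    [0, 0, 0, 2, 2], [0, 0, 1, 2, 2], [0, 1, 0, 2, 2],
    [2, 2, 2, 0, 0], [2, 2, 1, 0, 0], [2, 1, 2, 0, 0],
]
toy_coords = pca_coordinates(toy_genotypes)
```

In this toy example the first three and the last three individuals fall on opposite sides of the first component, which is the kind of separation between population groups that a plot of the first two components is meant to reveal.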
Fig. 4 Proportion of assigned individuals and misassignment rate in assignment tests based on supervised clustering. The 2250 individuals from 45 public populations of the world reference dataset were assigned against the world reference dataset, using 32,966 (33K) SNPs, 10K SNPs and 1K SNPs, with different values of the EM algorithm’s ε stopping criterion (0.01, 0.1 and 1). The proportion of assigned individuals (a) and the misassignment rate (b) were plotted against ancestry thresholds (0–1).
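The two quantities plotted in Fig. 4 can be illustrated with a minimal sketch. It rests on assumptions not stated in this record: an individual counts as assigned when its largest estimated ancestry proportion reaches the threshold, and the misassignment rate is the fraction of assigned individuals whose assigned population differs from their population of origin. The function name and the toy ancestry values are hypothetical.

```python
def assignment_summary(ancestries, true_populations, threshold):
    """Return (proportion assigned, misassignment rate) for one ancestry threshold.

    ancestries: one dict per individual mapping population code -> ancestry proportion.
    true_populations: known population of origin of each individual.
    """
    assigned = 0
    misassigned = 0
    for proportions, truth in zip(ancestries, true_populations):
        best_population = max(proportions, key=proportions.get)
        # Assumption: assign only if the top ancestry reaches the threshold.
        if proportions[best_population] >= threshold:
            assigned += 1
            if best_population != truth:
                misassigned += 1
    proportion_assigned = assigned / len(true_populations)
    # Assumption: the rate is computed among assigned individuals only.
    misassignment_rate = misassigned / assigned if assigned else 0.0
    return proportion_assigned, misassignment_rate


# Toy ancestry estimates (invented values, two reference populations).
toy_ancestries = [
    {"HOL": 0.95, "NDA": 0.05},
    {"HOL": 0.60, "NDA": 0.40},
    {"HOL": 0.30, "NDA": 0.70},
    {"HOL": 0.55, "NDA": 0.45},
]
toy_truth = ["HOL", "HOL", "NDA", "NDA"]

for t in (0.5, 0.9):
    print(t, assignment_summary(toy_ancestries, toy_truth, t))
```

Raising the threshold in this toy example assigns fewer individuals but removes the single wrong assignment, which is the trade-off such curves are meant to show.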