Ezequiel Luis Nicolazzi, Matteo Picciolini, Francesco Strozzi, Robert David Schnabel, Cindy Lawley, Ali Pirani, Fiona Brew, Alessandra Stella.
Abstract
BACKGROUND: Currently, six commercial whole-genome SNP chips are available for cattle genotyping, produced by two different genotyping platforms. Technical issues need to be addressed to combine data that originate from the different platforms, or from different versions of the same array generated by the manufacturer. For example: i) genome coordinates for SNPs may refer to different genome assemblies; ii) reference genome sequences are updated over time, changing the positions of SNPs or even removing sequences that contain them; iii) not all commercial SNP IDs are searchable within public databases; iv) SNPs can be coded using different formats and referencing different strands (e.g. A/B or A/C/T/G alleles, referencing forward/reverse, top/bottom or plus/minus strand); v) because new information keeps being discovered, higher density chips do not necessarily include all the SNPs present in the lower density chips; and vi) SNP IDs may not be consistent across chips and platforms. Most researchers and breed associations manage SNP data in real time and thus require tools to standardise data in a user-friendly manner.
DESCRIPTION: Here we present SNPchiMp, a MySQL database linked to an open access web-based interface. Features of this interface include, but are not limited to, the following functions: 1) referencing the SNP mapping information to the latest genome assembly, 2) extraction of information contained in dbSNP for SNPs present in all commercially available bovine chips, and 3) identification of SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). In addition, SNPchiMp can retrieve this information on subsets of SNPs, accessing such data either via physical position on a supported assembly, or by a list of SNP IDs, rs or ss identifiers.
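The third function listed above, finding SNPs shared by two or more chips, can be illustrated as a set intersection on rs IDs. The following is only a minimal sketch, not the SNPchiMp implementation: the chip contents, the SNP and rs identifiers, and the function name `common_rs_ids` are all hypothetical and invented for illustration.

```python
from typing import Dict, Set

# Hypothetical toy data: chip-specific SNP ID -> rs ID.
# These identifiers are invented; they are not real SNPchiMp records.
CHIP_A: Dict[str, str] = {"snpA_1": "rs100", "snpA_2": "rs200", "snpA_3": "rs300"}
CHIP_B: Dict[str, str] = {"snpB_9": "rs200", "snpB_8": "rs300", "snpB_7": "rs400"}


def common_rs_ids(*chips: Dict[str, str]) -> Set[str]:
    """Return the rs IDs present on every chip passed in."""
    if not chips:
        return set()
    # Chip-specific SNP IDs may differ between chips, so compare rs IDs only.
    rs_sets = [set(chip.values()) for chip in chips]
    return set.intersection(*rs_sets)


shared = common_rs_ids(CHIP_A, CHIP_B)
print(sorted(shared))  # ['rs200', 'rs300']
```

Comparing rs IDs rather than chip-specific SNP IDs reflects point vi) above: the same SNP may carry different names on different chips.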
Year: 2014 PMID: 24517501 PMCID: PMC3923093 DOI: 10.1186/1471-2164-15-123
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. E/R diagram of the SNPchiMp database. Primary keys are marked with a black dot and underlined.
Consistency of information across SNP chips and assemblies
| Chip | Total SNPs (a) | No rs ID (b) | 2 SNP IDs, same rs ID (c) | 3 SNP IDs, same rs ID (d) | Unmapped rs IDs (e) | Unmapped rs IDs (e) | Unmapped rs IDs (e) |
|---|---|---|---|---|---|---|---|
| Bovine3k (f) | 2,900 | 14 | 0 | 0 | 17 | 5 | 20 |
| BovineLD (g) | 6,909 | 3 | 0 | 1 | 94 | 14 | 111 |
| BovineSNP50v.1 (h) | 54,001 | 0 | 29 | 0 | 1,970 | 204 | 1,852 |
| BovineSNP50v.2 (i) | 54,609 | 18 | 29 | 0 | 2,154 | 340 | 2,031 |
| BovineHD (j) | 777,962 | 13 | 96 | 5 | 72,014 | 3,338 | 64,528 |
| Axiom BOS 1 (k) | 648,875 | 288,601 | 1 | 0 | 130,618 | 0 | 16,207 |
(a) Total number of SNPs/SNP probes within the chip, not considering cross-references of SNP IDs.
(b) Total number of SNPs without an SNP ID – rs ID association.
(c) Total number of SNPs with two different SNP IDs and the same rs ID.
(d) Total number of SNPs with three different SNP IDs and the same rs ID.
(e) Total number of SNPs whose rs ID is not mapped in the corresponding assembly (coded as chromosome 99 and position 0 in the SNPchiMp database).
(f) Illumina Golden Gate Bovine3K BeadChip.
(g) Illumina Infinium BovineLD BeadChip.
(h) Illumina Infinium BovineSNP50 v.1 BeadChip.
(i) Illumina Infinium BovineSNP50 v.2 BeadChip.
(j) Illumina Infinium BovineHD BeadChip.
(k) Affymetrix Axiom Genome-Wide BOS 1.
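The footnote on unmapped SNPs says they are coded as chromosome 99 and position 0. A downstream script would probably need to separate such records before any position-based analysis. The sketch below assumes a simple in-memory record; the `SnpPosition` type, its field names, and the example rs IDs are hypothetical and do not reflect the actual SNPchiMp schema.

```python
from typing import List, NamedTuple, Tuple

# Sentinel values taken from the table footnote on unmapped rs IDs.
UNMAPPED_CHROMOSOME = 99
UNMAPPED_POSITION = 0


class SnpPosition(NamedTuple):
    """Hypothetical record layout; not the real SNPchiMp table structure."""
    rs_id: str
    chromosome: int
    position: int


def is_unmapped(snp: SnpPosition) -> bool:
    """True when the SNP carries the 'chromosome 99, position 0' sentinel."""
    return (snp.chromosome == UNMAPPED_CHROMOSOME
            and snp.position == UNMAPPED_POSITION)


def split_mapped(snps: List[SnpPosition]) -> Tuple[List[SnpPosition], List[SnpPosition]]:
    """Split records into (mapped, unmapped), preserving input order."""
    mapped = [s for s in snps if not is_unmapped(s)]
    unmapped = [s for s in snps if is_unmapped(s)]
    return mapped, unmapped


# Invented example records.
records = [
    SnpPosition("rs100", 1, 135098),
    SnpPosition("rs200", 99, 0),
    SnpPosition("rs300", 29, 51000000),
]
mapped, unmapped = split_mapped(records)
print(len(mapped), len(unmapped))  # 2 1
```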
Consensus information across SNP chips
| Chip | Bovine3k (a) | BovineLD (b) | BovineSNP50v.1 (c) | BovineSNP50v.2 (d) | BovineHD (e) | Axiom BOS 1 (f) |
|---|---|---|---|---|---|---|
| Bovine3k (a) | 3,204 (304) | | | | | |
| BovineLD (b) | 2,388 (229) | 7,633 (724) | | | | |
| BovineSNP50v.1 (c) | 3,190 (304) | 7,589 (724) | 58,276 (4,275) | | | |
| BovineSNP50v.2 (d) | 3,122 (298) | 7,578 (724) | 56,494 (4,154) | 58,763 (4,154) | | |
| BovineHD (e) | 2,915 (275) | 7,561 (717) | 52,569 (3,835) | 53,099 (3,754) | 781,797 (3,835) | |
| Axiom BOS 1 (f) | 1,861 (187) | 4,510 (450) | 38,527 (2,797) | 38,376 (2,712) | 87,398 (2,496) | 648,875 (0) |
(a) Illumina Golden Gate Bovine3K BeadChip.
(b) Illumina Infinium BovineLD BeadChip.
(c) Illumina Infinium BovineSNP50 v.1 BeadChip.
(d) Illumina Infinium BovineSNP50 v.2 BeadChip.
(e) Illumina Infinium BovineHD BeadChip.
(f) Affymetrix Axiom Genome-Wide BOS 1.
For cross-platform integration, rs IDs were used. Numbers of cross-referenced IDs are given in parentheses. The total number of SNPs for each chip is displayed on the diagonal.
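Read alongside the consistency table, each diagonal total here appears to equal that chip's total SNP count (which excludes cross-references) plus the cross-referenced count in parentheses. The source does not state this relationship; it is an observation from the published numbers, and the short check below only confirms the arithmetic. The dictionary names are chosen for illustration.

```python
# Totals from the consistency table (cross-references of SNP IDs not counted).
TOTALS_WITHOUT_CROSS_REFS = {
    "Bovine3k": 2_900,
    "BovineLD": 6_909,
    "BovineSNP50v.1": 54_001,
    "BovineSNP50v.2": 54_609,
    "BovineHD": 777_962,
    "Axiom BOS 1": 648_875,
}

# Diagonal of the consensus table: (total, cross-referenced IDs in parentheses).
DIAGONAL = {
    "Bovine3k": (3_204, 304),
    "BovineLD": (7_633, 724),
    "BovineSNP50v.1": (58_276, 4_275),
    "BovineSNP50v.2": (58_763, 4_154),
    "BovineHD": (781_797, 3_835),
    "Axiom BOS 1": (648_875, 0),
}

# Collect any chip where total-without-cross-refs + cross-refs != diagonal total.
mismatches = [
    chip
    for chip, (diagonal_total, cross_refs) in DIAGONAL.items()
    if TOTALS_WITHOUT_CROSS_REFS[chip] + cross_refs != diagonal_total
]
print(mismatches)  # []
```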