Abstract
BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose.
Year: 2013 PMID: 23457633 PMCID: PMC3574019 DOI: 10.1371/journal.pone.0056883
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. SNP data transformed with PCA and t-SNE (1/2).
On the left is a PCA-plot with the first two components, on the right a t-SNE-plot of the very same data from each data source. Data sources: Panel (a) is from the 1001 genomes project, (b) from the RegMap panel and (c) from hapmap3 r2.
Figure 2. SNP data transformed with PCA and t-SNE (2/2).
On the left is a PCA-plot with the first two components, on the right a t-SNE-plot of the very same data from each data source. Data sources: Panel (a) from hapmap3 r3 (compare with Fig. 1c) and (b) from the Rice Haplotype Map Project (only wild type where the label information was available).
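The PCA panels place each individual on the first two principal components of its SNP genotype vector. The following is a minimal, dependency-free sketch of that projection, not the authors' pipeline: the 0/1/2 genotype coding, the toy matrix, and every function name are assumptions made for illustration. t-SNE is not implemented here; library implementations exist, for example `sklearn.manifold.TSNE` in scikit-learn.

```python
import math
import random


def center_columns(rows):
    """Subtract the per-SNP mean from every genotype value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (SNP x SNP) of column-centered data."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def top_eigenvector(matrix, iterations=200, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector."""
    rng = random.Random(seed)
    v = [rng.random() for _ in matrix]
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    mv = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return sum(a * b for a, b in zip(v, mv)), v


def pca_2d(rows):
    """Project each row onto the first two principal components."""
    centered = center_columns(rows)
    cov = covariance(centered)
    components = []
    for _ in range(2):
        eigenvalue, v = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [
            [cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
            for i in range(len(v))
        ]
    return [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]


# Hypothetical toy data: 6 individuals x 5 SNPs, coded as 0/1/2 allele counts.
genotypes = [
    [0, 0, 1, 2, 2],
    [0, 1, 1, 2, 2],
    [0, 0, 0, 2, 1],
    [2, 2, 1, 0, 0],
    [2, 1, 1, 0, 0],
    [2, 2, 2, 0, 1],
]
coords = pca_2d(genotypes)
for (pc1, pc2) in coords:
    print(f"{pc1:7.3f} {pc2:7.3f}")
```

In this toy example the first three and the last three individuals fall on opposite sides of the first component. For real SNP panels a linear algebra library would be the practical choice; the power iteration above only serves to make the steps visible.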
Dunn's Validity Index and Silhouette Validation Method of the transformed SNP data.
| Data | Dunn's Validity Index: PCA | Dunn's Validity Index: t-SNE | Dunn's Validity Index: diff | Silhouette Validation Method: PCA | Silhouette Validation Method: t-SNE | Silhouette Validation Method: diff |
|---|---|---|---|---|---|---|
| 1001 genomes project | 0.52 (0.09) | 0.61 (0.07) | 0.09 | 0.07 (0.04) | 0.22 (0.04) | 0.15 |
| RegMap | 0.50 (0.06) | 0.50 (0.04) | 0.00 | 0.08 (0.02) | 0.15 (0.02) | 0.07 |
| hapmap3 r2 | 0.16 (0.01) | 0.25 (0.02) | 0.09 | 0.27 (0.02) | 0.31 (0.02) | 0.04 |
| hapmap3 r3 | 0.16 (0.01) | 0.35 (0.01) | 0.19 | 0.26 (0.02) | 0.32 (0.02) | 0.06 |
| Rice | 0.06 (0.07) | 0.10 (0.10) | 0.04 | −0.54 (0.04) | −0.46 (0.04) | 0.08 |
Values of two cluster validity indices, used as a measure of how structured each transformed data set is. The diff (difference) column gives the comparison between PCA and t-SNE. The number in brackets is the standard deviation of the index over 1000 permutations of the labels.
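Both indices can be computed from the transformed coordinates and the known group labels. The sketch below is illustrative only. This record does not state which variant of Dunn's index the authors used, so the definition here (smallest distance between points of different groups divided by the largest distance within a group) is an assumption, as are the function names and the toy points. `permutation_sd` mirrors the bracketed values: the standard deviation of an index over repeated shuffles of the labels.

```python
import math
import random
import statistics


def dunn_index(points, labels):
    """Smallest between-group distance over largest within-group distance."""
    min_between = math.inf
    max_within = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if labels[i] == labels[j]:
                max_within = max(max_within, d)
            else:
                min_between = min(min_between, d)
    return min_between / max_within


def silhouette(points, labels):
    """Mean silhouette width; assumes at least two groups."""
    scores = []
    for i, p in enumerate(points):
        by_group = {}
        for j, q in enumerate(points):
            if j != i:
                by_group.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_group.pop(labels[i], None)
        if not own:
            # A point alone in its group gets a silhouette of 0 by convention.
            scores.append(0.0)
            continue
        a = statistics.fmean(own)  # mean distance to own group
        b = min(statistics.fmean(ds) for ds in by_group.values())  # nearest other group
        scores.append((b - a) / max(a, b))
    return statistics.fmean(scores)


def permutation_sd(index, points, labels, n_permutations=1000, seed=0):
    """Standard deviation of an index when the labels are shuffled."""
    rng = random.Random(seed)
    shuffled = list(labels)
    values = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        values.append(index(points, shuffled))
    return statistics.stdev(values)


# Hypothetical 2-D coordinates (e.g. from PCA or t-SNE) with two known groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "a", "b", "b", "b"]

print("Dunn:", dunn_index(points, labels))
print("Silhouette:", silhouette(points, labels))
print("SD under permutation:", permutation_sd(silhouette, points, labels))
```

Higher values of either index indicate more compact, better separated groups, which is how the diff column is read when comparing the two transformations.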
Percent correctly classified with various machine learning methods acting on transformed SNP data.
| % | 1001 genomes project: PCA | 1001 genomes project: t-SNE | RegMap: PCA | RegMap: t-SNE | hapmap3 r2: PCA | hapmap3 r2: t-SNE | hapmap3 r3: PCA | hapmap3 r3: t-SNE | Rice: PCA | Rice: t-SNE |
|---|---|---|---|---|---|---|---|---|---|---|
| | 55.6 | 72.7 | 79.2 | 89.7 | 72.9 | 90.5 | 72.9 | 87.5 | 41.3 | 66.6 |
| | 60.6 | 76.8 | 77.6 | 89.1 | 72.7 | 90.9 | 73.3 | 87.6 | 39.7 | 64.9 |
| | 67.7 | 76.8 | 80.7 | 85.8 | 70.3 | 85.1 | 72.2 | 84.8 | 50.5 | 56.4 |
| | 62.6 | 75.8 | 75.2 | 80.3 | 74.6 | 87.2 | 71.8 | 84.1 | 40.7 | 42.3 |
| Mean difference (t-SNE minus PCA) | 13.9 | | 8.1 | | 15.8 | | 13.5 | | 14.5 | |
| SD of difference | 3.6 | | 3.4 | | 2.6 | | 1.2 | | 12.5 | |
Percent correctly classified, used as a measure of how easily a model can be learned from the transformed data. For each data set, the difference between its two columns gives the comparison between PCA and t-SNE. All models perform better than random.
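The two rows of five values at the bottom of the table agree, up to rounding, with the mean and the sample standard deviation of the four per-method differences for each data set. The short check below recomputes them from the values in the table. It assumes that within each pair the first value belongs to PCA and the second to t-SNE, following the left/right order of the figures; the variable and function names are illustrative.

```python
import statistics

# Percent correctly classified, copied from the table: for each data set,
# one (PCA, t-SNE) pair per machine learning method (four methods).
# Assumption: the first value of each pair is PCA, the second is t-SNE.
accuracy = {
    "1001 genomes project": [(55.6, 72.7), (60.6, 76.8), (67.7, 76.8), (62.6, 75.8)],
    "RegMap": [(79.2, 89.7), (77.6, 89.1), (80.7, 85.8), (75.2, 80.3)],
    "hapmap3 r2": [(72.9, 90.5), (72.7, 90.9), (70.3, 85.1), (74.6, 87.2)],
    "hapmap3 r3": [(72.9, 87.5), (73.3, 87.6), (72.2, 84.8), (71.8, 84.1)],
    "Rice": [(41.3, 66.6), (39.7, 64.9), (50.5, 56.4), (40.7, 42.3)],
}


def summarize(pairs):
    """Mean and sample standard deviation of the t-SNE minus PCA differences."""
    diffs = [tsne - pca for pca, tsne in pairs]
    return statistics.fmean(diffs), statistics.stdev(diffs)


summary = {name: summarize(pairs) for name, pairs in accuracy.items()}
for name, (mean_diff, sd_diff) in summary.items():
    print(f"{name}: mean difference {mean_diff:.2f}, SD {sd_diff:.2f}")
```

The large standard deviation for Rice reflects that two of the four methods gain about 25 percentage points with t-SNE while the other two gain less than 6.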