| Literature DB >> 31508797 |
Yaw Nti-Addae1, Dave Matthews2, Victor Jun Ulat3, Raza Syed1, Guilhem Sempéré4, Adrien Pétel5, Jon Renner6, Pierre Larmande7, Valentin Guignon8, Elizabeth Jones1, Kelly Robbins9.
Abstract
MOTIVATION: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses needed to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems.Entities:
Mesh:
Year: 2019 PMID: 31508797 PMCID: PMC6737464 DOI: 10.1093/database/baz096
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1Different orientation of genotyping data. (a) markers in rows and samples in columns, whereas (b) shows markers in columns and samples in rows.
Server configuration and software versions
|
| |
|---|---|
| Processor | Intel Xeon E5 2620 V3 2.4GHz 6C 15 Mb |
| Number of Processors | 24 |
| Memory | 128GB DDR-42133 MHz ECC/REG |
| Raid | 6 |
| Network File Storage | 50 TB |
| Operating System | Debian |
|
|
|
| HDF5 | 1.8 |
| MariaDB | 10.1 |
| MonetDB | 1.1 |
| MongoDB | 3.6.0 (compression level: snappy) |
| PostgreSQL | 9.5 |
| Spark | 2.3 |
| Java Hotspot | 1.8 (64 bit) |
| Elasticsearch | 2.3.3 (under Java 1.8.0_92 64bit) |
Data stored in “marker-fast” orientation in PostgreSQL and MariaDB. Marker column is of type string and contains the name of the marker, and Data column is of type JSON object for PostgreSQL and Dynamic column for MariaDB, and contains a JSON formatted string
| Marker | Data |
|---|---|
| S6_120018 |
|
| S6_120046 |
|
Figure 2Load times for database systems.
Figure 3Times for extracting increasing number of markers across all samples. (a) Times for extracting a contiguous set of markers for all samples. Times for MariaDB are excluded because they exceed 25 hours, and times for MongoDB and MonetDB were essentially identical. (b) Times for extracting a random set of markers for all samples. Times for MariaDB and MonetDB are excluded since they exceed 25 hours. (c) A zoom-in at extract times up to 500,000 random markers to show if there is significant difference between HDF5, MongoDB and PostgreSQL.
Figure 4Times for extracting increasing number of samples across all markers. (a) Times for extracting contiguous set of samples for all 32 million markers. (b) Times for extracting random set of samples for all 32 million markers. Times for MariaDB and PostgreSQL are excluded in both (a) and (b) because their queries exceeded 20 hours.
Figure 5Times for extracting increasing cross-section of samples and 1 million markers. Time for extracting a block contiguous number of samples and 1 million contiguous markers. Extract times for MariaDB are excluded since they exceeded 25 hours. (b) Times for extracting random set of samples across 1 million random markers. Extract times for MariaDB and MonetDB are excluded because their queries exceeded 25 hours.
Times in minutes for extracting M markers for all samples
| Contiguous | # of markers (mil) | MonetDB | HDF5 | MongoDB | PostgreSQL | MariaDB | Spark |
|---|---|---|---|---|---|---|---|
| Yes | 0.001 | 0.1 | 0 | 0 | 0.1 | 1566 | 19.5 |
| Yes | 0.005 | 0.2 | 0 | 0.1 | 0.5 | 1623 | 43.45 |
| Yes | 0.01 | 0.3 | 0 | 0.3 | 0.8 | 1523 | 71.3 |
| Yes | 0.05 | 1.2 | 0.1 | 1.4 | 3.9 | 1579 | 373.08 |
| Yes | 0.1 | 3.7 | 0.3 | 2.5 | 7.8 | 1663 | 423.07 |
| Yes | 0.5 | 10 | 1.2 | 12.3 | 39.2 | 1520 | 420.32 |
| Yes | 1 | 21.7 | 2.3 | 26.8 | 78.4 | 1632 | 426.8 |
| Yes | 5 | 134.2 | 11.6 | 134.3 | 391.8 | 475 | |
| No | 0.001 | 44.5 | 0.1 | 0.2 | 0.5 | 1545 | 252 |
| No | 0.005 | 47.2 | 0.4 | 0.7 | 0.6 | 2043 | 251 |
| No | 0.01 | 47.8 | 0.9 | 1.8 | 1.2 | 2108 | 251 |
| No | 0.05 | 58.6 | 4.4 | 7.1 | 6 | 3017 | 252 |
| No | 0.1 | 74.8 | 8.6 | 13.7 | 13.2 | 6021 | 252 |
| No | 1 | 2700 | 80.8 | 130.8 | 149.5 | 264 | |
| No | 5 | 443 | 858.8 | 945.6 | 316 |
Times in minutes for extracting N samples for all markers
| Contiguous | # of samples | MonetDB | HDF5 | MongoDB | PostgreSQL | MariaDB | Spark |
|---|---|---|---|---|---|---|---|
| Yes | 50 | 1.8 | 2 | 190.8 | 1282 | 1830 | 2.7 |
| Yes | 100 | 4 | 4 | 195.8 | 1301 | 1868 | 2.82 |
| Yes | 500 | 42.1 | 19.6 | 234.4 | 1401 | 13.73 | |
| Yes | 1000 | 103.2 | 40.5 | 291.4 | 1528 | 32.63 | |
| Yes | 2000 | 238.5 | 84.7 | 405.5 | 1804 | 92.52 | |
| Yes | 3000 | 432.5 | 128.4 | 518.2 | 2069 | 149.38 | |
| Yes | 4000 | 698.5 | 169 | 632.8 | 2327 | 211.52 | |
| No | 50 | 1.9 | 1.7 | 188.5 | 1482 | 1575.63 | 2.9 |
| No | 100 | 4.7 | 2.4 | 195.6 | 1505 | 1535.53 | 3.3 |
| No | 500 | 48.7 | 15.6 | 239.3 | 1648 | 14 | |
| No | 1000 | 118.3 | 22.6 | 295.7 | 1696 | 35.5 | |
| No | 2000 | 306.6 | 58.5 | 409 | 1678 | 108 | |
| No | 3000 | 547 | 81.6 | 621.3 | 1674 | 171 | |
| No | 4000 | 626.9 | 107.7 | 677.9 | 1681 | 239 |
Times in minutes for extracting M markers by N samples
| Contiguous? | # of markers | # of Samples | MonetDB | HDF5 | MongoDB | PostgreSQL | MariaDB | Spark |
|---|---|---|---|---|---|---|---|---|
| Yes | 1k | 1000 | 0.00 | 0.0 | 0.00 | 0.10 | 1287 | 1.60 |
| Yes | 1k | 3000 | 0.10 | 0.0 | 0.00 | 0.10 | 1684 | 8.55 |
| Yes | 1k | 4000 | 0.10 | 0.0 | 0.00 | 0.10 | 1718 | 13.37 |
| No | 1k | 1000 | 8.40 | 0.1 | 0.20 | 0.20 | 1073 | 20.4 |
| No | 1k | 3000 | 28.60 | 0.1 | 0.20 | 0.20 | 1163 | 144 |
| No | 1k | 4000 | 24.70 | 0.1 | 0.20 | 27.10 | 1662 | 201 |
| Yes | 50k | 1000 | 0.10 | 0.10 | 0.70 | 2.60 | 1726 | 38.08 |
| Yes | 50k | 3000 | 0.40 | 0.10 | 1.20 | 3.40 | 1635 | 204.3 |
| Yes | 50k | 4000 | 0.80 | 0.20 | 1.50 | 3.80 | 1111 | 299.3 |
| No | 50k | 1000 | 10.80 | 4.10 | 6.60 | 5.00 | 1297 | 20.05 |
| No | 50k | 3000 | 33.20 | 4.20 | 7.20 | 5.20 | 1623 | 145 |
| No | 50k | 4000 | 32.40 | 4.20 | 8.00 | 4.80 | 1756 | 203 |
| Yes | 1m | 1000 | 3.40 | 0.90 | 12.10 | 49.90 | 1045.00 | 41.32 |
| Yes | 1m | 3000 | 10.60 | 2.10 | 22.10 | 68.20 | 1048.00 | 222.12 |
| Yes | 1m | 4000 | 14.70 | 2.60 | 29.60 | 76.10 | 1492.00 | 314.2 |
| No | 1m | 1000 | 192.1 | 72.6 | 109.60 | 121.80 | 1802.00 | 22.77 |
| No | 1m | 3000 | 1846.6 | 78.5 | 138.10 | 143.40 | 1911.00 | 152.5 |
| No | 1m | 4000 | 3291.7 | 83.0 | 153.90 | 151.00 | 1935.00 | 212.6 |