Bo Xu¹, Changlong Li¹, Hang Zhuang¹, Jiali Wang¹, Qingfeng Wang¹, Chao Wang², Xuehai Zhou¹.
Abstract
BACKGROUND: A clinical decision support system can overcome the limits of doctors' knowledge and reduce the likelihood of misdiagnosis, thereby improving health care. Traditional genetic data storage and analysis methods, which run in a stand-alone environment, have limited scalability and struggle to meet the computational requirements of rapidly growing genetic data.
Keywords: Alluxio; Clinical decision support system; Cloud computing; Genetic data analysis; Read mapping; Spark
Year: 2018 PMID: 30454054 PMCID: PMC6245588 DOI: 10.1186/s12920-018-0415-1
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 GCDSS workflow
Fig. 2 Workflow of disease identification and discovery
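The last stage of the workflow in Fig. 2 relates called variants to diseases. As a rough illustration only, that step can be pictured as looking up each called variant in a disease-annotation table keyed by chromosome, position, reference allele and alternate allele. The `Variant` type, the `identify_diseases` function and the toy database entry are hypothetical; the actual knowledge base and matching rules used by GCDSS are not described in this section.

```python
from typing import Dict, List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical knowledge base: (chrom, pos, ref, alt) -> associated disease label.
DiseaseDB = Dict[Tuple[str, int, str, str], str]


def identify_diseases(variants: List[Variant], db: DiseaseDB) -> List[Tuple[Variant, str]]:
    """Return the called variants that exactly match a disease-associated entry."""
    hits = []
    for v in variants:
        disease = db.get((v.chrom, v.pos, v.ref, v.alt))
        if disease is not None:
            hits.append((v, disease))
    return hits


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    db = {("chr1", 12345, "A", "G"): "example_disease_A"}
    calls = [Variant("chr1", 12345, "A", "G"), Variant("chr2", 500, "C", "T")]
    print(identify_diseases(calls, db))
```

Exact-key matching is the simplest possible policy; a production system would typically also normalize variant representation before the lookup.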
Fig. 3 Framework of the CloudBWA algorithm
Fig. 4 Workflow of the CloudBWA algorithm
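Figs. 3 and 4 describe CloudBWA, a distributed read mapper. The general shape of partitioned read mapping can be sketched as: build a reference index once, share it with every worker, split the reads into independent partitions, map each partition in parallel, and concatenate the results. This is a minimal sketch under strong simplifications: threads stand in for cluster executors, and a k-mer seed plus exact-match check stands in for BWA's FM-index seeding and gapped extension. All function names are hypothetical and this is not CloudBWA's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Read = Tuple[str, str]  # (read name, sequence)


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def map_read(seq: str, reference: str, index: Dict[str, List[int]], k: int) -> Optional[int]:
    """Seed with the read's first k-mer, then verify an exact full-length match."""
    for pos in index.get(seq[:k], []):
        if reference[pos:pos + len(seq)] == seq:
            return pos
    return None  # unmapped; a real aligner would also try inexact alignments


def map_partition(reads: List[Read], reference: str,
                  index: Dict[str, List[int]], k: int) -> List[Tuple[str, Optional[int]]]:
    return [(name, map_read(seq, reference, index, k)) for name, seq in reads]


def distributed_map(reads: List[Read], reference: str, k: int = 4,
                    num_partitions: int = 2) -> List[Tuple[str, Optional[int]]]:
    # Build the index once and share it with every worker.
    index = build_kmer_index(reference, k)
    # Round-robin split of the reads into independent partitions.
    partitions = [reads[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(lambda part: map_partition(part, reference, index, k), partitions)
        return [hit for part in results for hit in part]
```

In a Spark job the same shape would usually appear as a broadcast variable holding the index plus `mapPartitions` over an RDD of reads; whether CloudBWA is organized exactly this way is not stated here.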
Real datasets
| Public dataset | Description | Size |
|---|---|---|
| GRCh38 | Reference genome | About 3.2 billion bases |
| ERR000589 | Paired-end reads | 23,928,016 reads |
| SRR062634 | Paired-end reads | 48,297,986 reads |
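To relate the read counts in this table to the size of the reference, the expected mean sequencing depth can be estimated with the Lander-Waterman relation C = N·L/G, where N is the number of reads, L the read length and G the genome size. The read length is not given in this section, so the 100 bp used below is an assumption for illustration; it is also not stated whether the counts refer to individual reads or read pairs (for pairs, N would double).

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth C = N * L / G (Lander-Waterman)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


if __name__ == "__main__":
    # The 100 bp read length is an assumption, not a value from the paper.
    print(round(mean_coverage(48_297_986, 100, 3_200_000_000), 2))  # 1.51
```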
Fig. 5 Impact evaluation of different batch size and output mode
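Fig. 5 varies the batch size used when feeding reads to the mapper. How CloudBWA forms its batches is not detailed in this section; the minimal sketch below assumes that paired-end mates arrive as two parallel iterables (as with separate mate-1 and mate-2 FASTQ files) and groups them into fixed-size batches so that both mates of a pair always land in the same batch. The function name `batch_pairs` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]  # (mate 1, mate 2)


def batch_pairs(mate1: Iterable[str], mate2: Iterable[str],
                batch_size: int) -> Iterator[List[Pair]]:
    """Yield lists of up to batch_size read pairs, keeping mates together."""
    # Checked when iteration starts, since this is a generator.
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pairs = zip(mate1, mate2)  # stops at the shorter input
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch
```

Larger batches generally amortize per-task overhead at the cost of more memory per task, which is presumably the trade-off a batch-size sweep such as Fig. 5 probes.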
Fig. 6 Impact evaluation of different input data format
Fig. 7 The speedup improvement by increasing the number of nodes
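Scaling results such as Fig. 7 are usually read through two standard quantities: speedup S(n) = T(baseline) / T(n), and parallel efficiency, which divides the speedup by the factor by which the node count grew. The sketch below computes both; the timings in the usage line are made-up numbers for illustration and are not results from the paper.

```python
def speedup(t_baseline: float, t_n: float) -> float:
    """S(n) = T(baseline) / T(n)."""
    return t_baseline / t_n


def parallel_efficiency(t_baseline: float, t_n: float, n_nodes: int,
                        baseline_nodes: int = 1) -> float:
    """Speedup divided by the factor by which the node count grew (1.0 is ideal)."""
    return speedup(t_baseline, t_n) / (n_nodes / baseline_nodes)


if __name__ == "__main__":
    # Hypothetical timings: 1200 s on 1 node, 200 s on 8 nodes.
    print(speedup(1200, 200), parallel_efficiency(1200, 200, 8))  # 6.0 0.75
```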
Fig. 8 Performance comparison with distributed algorithms
Fig. 9 Performance comparison with real data
Fig. 10 Performance evaluation of calibration
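Assuming "calibration" here refers to base quality score recalibration (BQSR) as performed in GATK-style pipelines, its core idea is to compare the quality each base was reported with against the error rate actually observed at sites not expected to vary. The minimal sketch below bins bases by reported quality only and converts the observed mismatch rate back to a Phred score, Q = -10·log10(error rate). The +1/+2 smoothing and the function name are illustrative choices, not the paper's method.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def empirical_quality(observations: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each reported quality to an empirical Phred quality.

    observations: (reported_quality, is_mismatch) for each aligned base,
    assumed to come from sites that are not known variants.
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # Q -> [bases, mismatches]
    for q, mismatch in observations:
        counts[q][0] += 1
        counts[q][1] += int(mismatch)
    table = {}
    for q, (total, errors) in counts.items():
        error_rate = (errors + 1) / (total + 2)  # smoothed so log10 never sees 0
        table[q] = -10 * math.log10(error_rate)
    return table
```

Real BQSR also models covariates such as machine cycle and sequence context and excludes known variant sites, so this covers only the reported-quality dimension.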
Fig. 11 Scalability evaluation of calibration
Fig. 12 Performance evaluation of variant discovery and genotyping
Fig. 13 Scalability evaluation of variant discovery and genotyping
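Genotyping assigns each candidate variant site a diploid genotype from the reads covering it. As a simplified illustration, not the model used in the paper, a common textbook approach scores the three genotypes 0/0, 0/1 and 1/1 with a binomial likelihood of the observed reference and alternate read counts, given a per-read error rate, and picks the most likely one. The function names and the default error rate of 0.01 are assumptions.

```python
import math
from typing import Dict

# Genotype label -> number of alternate alleles in a diploid genome.
GENOTYPES = {"0/0": 0, "0/1": 1, "1/1": 2}


def genotype_log_likelihoods(ref_count: int, alt_count: int,
                             error_rate: float = 0.01) -> Dict[str, float]:
    """Log-likelihood of the read counts under each genotype (constant term omitted).

    error_rate must lie strictly between 0 and 1.
    """
    result = {}
    for gt, alt_alleles in GENOTYPES.items():
        frac = alt_alleles / 2
        # Probability that a single read shows the alternate allele.
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        result[gt] = alt_count * math.log(p_alt) + ref_count * math.log(1 - p_alt)
    return result


def call_genotype(ref_count: int, alt_count: int, error_rate: float = 0.01) -> str:
    """Return the maximum-likelihood genotype."""
    lls = genotype_log_likelihoods(ref_count, alt_count, error_rate)
    return max(lls, key=lls.get)
```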
Evaluation of disease identification and discovery
| Number of raw reads | 4,000,000 | 20,000,000 | 40,000,000 |
|---|---|---|---|
| Number of mapped reads | 3,911,329 | 19,553,895 | 39,107,115 |
| Number of mate-mapped reads | 3,824,558 | 19,117,950 | 38,234,922 |
| Number of reads after variant discovery | 3,655,139 | 17,772,692 | 37,313,251 |
| Number of reads after genotyping | 5071 | 13,797 | 15,571 |
| Number of reads after disease analysis | 3 | 14 | 34 |
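The read-level rows of this table can be turned into retention fractions relative to the raw input, which makes the three runs easier to compare. The sketch below uses only the counts from the table; the dictionary layout and function name are illustrative. For example, the mapped fraction comes out at about 97.8% in all three runs.

```python
from typing import Dict, List

RAW = [4_000_000, 20_000_000, 40_000_000]

# Read counts copied from the table above (read-level stages only).
STAGES: Dict[str, List[int]] = {
    "mapped": [3_911_329, 19_553_895, 39_107_115],
    "mate mapped": [3_824_558, 19_117_950, 38_234_922],
    "after variant discovery": [3_655_139, 17_772_692, 37_313_251],
}


def fraction_of_raw(stage_counts: List[int], raw: List[int] = RAW) -> List[float]:
    """Fraction of the raw reads that survive to this stage, per run."""
    return [count / total for count, total in zip(stage_counts, raw)]


if __name__ == "__main__":
    for stage, counts in STAGES.items():
        print(stage, [f"{f:.2%}" for f in fraction_of_raw(counts)])
```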