Uday S Evani, Danny Challis, Jin Yu, Andrew R Jackson, Sameer Paithankar, Matthew N Bainbridge, Adinarayana Jakkamsetti, Peter Pham, Cristian Coarfa, Aleksandar Milosavljevic, Fuli Yu.
Abstract
BACKGROUND: Until recently, sequencing has primarily been carried out in large genome centers, which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. Recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data to highly diverse research groups. Clinical sequencing is expected to become part of diagnostic routines shortly. However, limited access to computational infrastructure and high quality bioinformatic tools, together with the demand for personnel skilled in data analysis and interpretation, remains a serious bottleneck. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues.
Year: 2012 PMID: 23134663 PMCID: PMC3481437 DOI: 10.1186/1471-2164-13-S6-S19
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1A: High level representation of the Atlas2 Genboree pipeline. Figure 1B: Specific steps involved in running Atlas2 Genboree.
Figure 2A: Shows how various folders are represented within Genboree, and how to navigate the menu bar to get to Atlas2 Suite. Figure 2B: Atlas2 Suite customization window.
Figure 3: Schematic representation of the Atlas2 Amazon pipeline.
Figure 4A: Screenshot of the submissions page. Figure 4B: Screenshot of the monitoring page.
Table 1: Summarizes the amount of computation and time required to upload the data (Chr 2 and Chr 19) to Genboree and to run the variant calling steps.
| Resource usage (Chr 2/Chr 19) | PatientX | PatientY |
|---|---|---|
| Size of BAM file (GB) | 22/5 | 24/6 |
| Time to upload (Min) | 70/12 | 85/13 |
| Atlas2 Runtime (Min) | 390/27 | 420/33 |
| Atlas2 Memory Usage (MB) | 1196/275 | 1192/270 |
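The upload figures in the table above can be turned into an average transfer rate, which makes the four uploads easier to compare. The following is a minimal sketch, not part of the original analysis: the function name `upload_rate_mb_per_s` is hypothetical, and the conversion 1 GB = 1000 MB is an assumption, since the source does not say whether decimal or binary units were used.

```python
# Values copied from the resource-usage table:
# (BAM size in GB, upload time in minutes) per patient and chromosome.
uploads = {
    ("PatientX", "Chr 2"): (22, 70),
    ("PatientX", "Chr 19"): (5, 12),
    ("PatientY", "Chr 2"): (24, 85),
    ("PatientY", "Chr 19"): (6, 13),
}


def upload_rate_mb_per_s(size_gb, minutes):
    """Average upload rate in MB/s, assuming 1 GB = 1000 MB (an assumption)."""
    return size_gb * 1000 / (minutes * 60)


# One average rate per upload, keyed the same way as the input table.
rates = {key: upload_rate_mb_per_s(*value) for key, value in uploads.items()}

for (patient, chrom), rate in rates.items():
    print(f"{patient} {chrom}: {rate:.1f} MB/s")
```

These are averages over the whole transfer; they say nothing about peak bandwidth or about where the bottleneck was.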
Table 2: Summarizes the total number of raw variants found in chromosomes 2 and 19 of the two patients.
| Variant calls: nucleotide variants | PatientX | PatientY |
|---|---|---|
| All Variants | 229484 | 235450 |
| %dbSNP | 87.51 | 87.66 |
| Coding | 1867 | 2062 |
| Nonsynonymous | 921 | 983 |
| Coding (novel) | 170 | 198 |
| Nonsynonymous (novel) | 124 | 129 |
| Candidate genes | 5 | 6 |
Raw variants were then filtered with dbSNP (ver 129) and annotated with genetic information.
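The filtering step described above can be illustrated with a small sketch. The source does not say how Atlas2 matches calls against dbSNP, so matching on chromosome, position, reference and variant allele is an assumption here, and the names `Variant`, `split_known_novel` and `percent_known` are hypothetical. The variants in the example are toy values, not data from the study.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def split_known_novel(calls, known):
    """Split calls into those present in a known-variant set and novel ones.

    A call counts as known only if chromosome, position, reference and
    variant allele all match (an assumed matching rule).
    """
    known_set = set(known)
    in_db = [v for v in calls if v in known_set]
    novel = [v for v in calls if v not in known_set]
    return in_db, novel


def percent_known(calls, known):
    """Percentage of calls found in the known set (analogous to a %dbSNP figure)."""
    in_db, _ = split_known_novel(calls, known)
    return 100.0 * len(in_db) / len(calls) if calls else 0.0


# Toy data for illustration only.
calls = [
    Variant("2", 100, "A", "T"),
    Variant("2", 200, "C", "G"),
    Variant("19", 300, "G", "A"),
    Variant("19", 400, "T", "C"),
]
known = [
    Variant("2", 100, "A", "T"),
    Variant("19", 300, "G", "A"),
    Variant("19", 999, "C", "T"),
]

in_db, novel = split_known_novel(calls, known)
print(len(in_db), "known;", len(novel), "novel;", percent_known(calls, known), "% known")
```

A real dbSNP release is far too large for an in-memory list; an indexed lookup would be the usual choice, but the set-membership logic stays the same.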
Table 3: The following three genes were found to contain two or more predicted amino-acid-altering heterozygous mutations in both patients.
| Chr | Position | Reference/Variant | Gene | Reproducibility: PatientX | Reproducibility: PatientY |
|---|---|---|---|---|---|
| 2 | 72972139 | A/T | | Found | Found |
| 2 | 72969094 | A/G | | Not Found | Found |
| 19 | 63464322 | A/G | | Found | Found |
| 19 | 63464133 | C/A | | Found | Found |
| 2 | 27657528 | C/T | | Found | Found |
| 2 | 27655701 | G/C | | Found | Found |
The table shows how many of those mutations we reproduced.
Table 4: Summarizes the cost of running Atlas-INDEL2 on whole exome capture SOLiD and Illumina BAMs using Atlas2 Amazon.
| | SOLiD | Illumina |
|---|---|---|
| | 3 | 1 |
| | 34.6 | 13.9 |
| | 0.9 | 1.8 |
| | 10.5 | 7.7 |
| | 4.84 | 1.95 |
| | 0.09 | 0.18 |
| | 3.74 | 2.72 |
Table 5: Summarizes the cost projections for analyzing 1, 3, 10, 50, 100 and 1000 BAMs using Atlas2 Amazon.
| No of BAM | 1 | 3 | 10 | 50 | 100 | 1000 |
|---|---|---|---|---|---|---|
| | 20 | 60 | 200 | 1000 | 2000 | 20000 |
| | 1 | 3 | 10 | 50 | 100 | 1000 |
| | 8 | 24 | 80 | 400 | 800 | 8000 |
| | 16.8 | 50.4 | 168 | 840 | 1592.16 | 15092.16 |
| | 0.1 | 0.3 | 1 | 5 | 10 | 100 |
| | 2.72 | 8.16 | 27.2 | 136 | 272 | 2720 |
| | 19.62 | 58.86 | 196.2 | 981 | 1874.16 | 17912.16 |
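The row labels of the cost-projection table did not survive in this record, but the numbers suggest that the last row is the sum of the three rows directly above it. The sketch below checks that reading and derives a cost per BAM from it. Treating the last row as a total is an inference from the arithmetic, not something the source states, and the variable names are hypothetical; which of the three component rows corresponds to storage, I/O or compute is left open.

```python
import math

# Column headers of the cost-projection table.
bam_counts = [1, 3, 10, 50, 100, 1000]

# The three rows directly above the last row, copied as-is.
# Their labels are missing in the source, so they are left unnamed here.
component_rows = [
    [16.8, 50.4, 168, 840, 1592.16, 15092.16],
    [0.1, 0.3, 1, 5, 10, 100],
    [2.72, 8.16, 27.2, 136, 272, 2720],
]

# The last row of the table, assumed (not confirmed) to be a total.
reported_last_row = [19.62, 58.86, 196.2, 981, 1874.16, 17912.16]

# Sum each column of the three component rows.
computed_totals = [sum(column) for column in zip(*component_rows)]

# Compare with a small tolerance to absorb floating-point rounding.
matches = [
    math.isclose(computed, reported, abs_tol=0.005)
    for computed, reported in zip(computed_totals, reported_last_row)
]

# Cost per BAM under the "last row is a total" reading.
cost_per_bam = [total / n for total, n in zip(reported_last_row, bam_counts)]

for n, ok, per_bam in zip(bam_counts, matches, cost_per_bam):
    print(f"{n:>5} BAMs: sum matches last row = {ok}, per BAM = {per_bam:.2f}")
```

Under this reading the per-BAM figure is flat up to 50 BAMs and drops slightly at 100 and 1000, because the first component row grows less than linearly at those scales.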
Figure 5: Graph based on Table 5 projecting the cost of storage, I/O and compute as we scale up data.