Charles Cole, Konstantinos Krampis, Konstantinos Karagiannis, Jonas S Almeida, William J Faison, Mona Motwani, Quan Wan, Anton Golikov, Yang Pan, Vahan Simonyan, Raja Mazumder.
Abstract
BACKGROUND: Next-generation sequencing (NGS) technologies have produced petabytes of scattered data, decentralized across archives, databases and sometimes isolated hard disks that are inaccessible for browsing and analysis. Curated secondary databases are expected to help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it.
Year: 2014 PMID: 24467687 PMCID: PMC3916084 DOI: 10.1186/1471-2105-15-28
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Cloud BioLinux environment. A) Computational infrastructure. B) Cloud BioLinux SRA provides a command-line interface for mapping and SNV identification, and the Oracle VM VirtualBox manager allows the user to edit settings. C) The Appliance Import Wizard allows import of an appliance in Open Virtualization Format. D) Snapshot of applications available through the Cloud BioLinux environment.
Figure 2. Short-read sequence mapping and nsSNV analysis workflow. nsSNVs are mapped to proteins to identify amino acid changes. Functional site-specific information is extracted from UniProtKB, RefSeq and the Conserved Domain Database.
Snapshot of information obtained upon searching the CSR database with a protein or gene accession number
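| RefSeq nucleotide accession | Position (1) | Nucleotide change (2) | RefSeq protein accession | Position (3) | Amino acid change | UniProtKB/Swiss-Prot accession (4) | Position (5) | Sample ID | Sample type | Sequencing method |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NM_000059.3 | 1092 | A|C | NP_000050.2 | 289 | N|H | P51587 | 289 | TCGA-BH-A0BW-01A-11D-A10Y-09 | Breast Cancer Case | WXS (6) |
| NM_000059.3 | 1205 | C|A | NP_000050.2 | 326 | S|R | P51587 | 326 | TCGA-BH-A0AZ-01A-21D-A12Q-09 | Breast Cancer Case | WXS |
| NM_000059.3 | 1341 | A|C | NP_000050.2 | 372 | N|H | P51587 | 372 | TCGA-AC-A2FF-01A-11D-A17D-09 | Breast Cancer Case | WXS |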
(1) Position in RefSeq nucleotide entry.
(2) Nucleotide change.
(3) Position in RefSeq protein entry.
(4) UniProtKB/Swiss-Prot accession.
(5) Position in UniProtKB/Swiss-Prot protein entry.
(6) Whole exome sequencing.
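The nucleotide and protein positions in the rows above agree under a simple codon calculation. The following is a minimal sketch, not the authors' pipeline: `locate_codon` is a hypothetical helper, and the coding-sequence start of 228 for NM_000059.3 is an assumption that is not stated in the source, chosen because it reproduces all three rows.

```python
from typing import Tuple

# Assumption (not from the source): the coding sequence of NM_000059.3
# begins at 1-based transcript position 228.
CDS_START = 228


def locate_codon(transcript_pos: int, cds_start: int = CDS_START) -> Tuple[int, int]:
    """Map a 1-based transcript position to (residue number, base within codon).

    Both returned values are 1-based; the base within the codon is 1, 2 or 3.
    """
    if transcript_pos < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    offset = transcript_pos - cds_start  # 0-based offset into the CDS
    return offset // 3 + 1, offset % 3 + 1


# (transcript position, protein position) pairs copied from the table rows.
table_rows = [(1092, 289), (1205, 326), (1341, 372)]

for nt_pos, aa_pos in table_rows:
    residue, base_in_codon = locate_codon(nt_pos)
    print(nt_pos, "-> residue", residue, "codon base", base_in_codon,
          "| table says", aa_pos)
```

Under this assumption, the two A|C changes fall on the first base of their codons and the C|A change on the third base, which is compatible with the N|H and S|R substitutions listed.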
Figure 3. Classification of known and novel SNVs based on comparison with dbSNP. A) The proportion of novel and known SNVs in breast cancer cases and controls. The last bar indicates SNVs that overlap between cases and controls. B) Distribution of common and rare SNVs in dbSNP. As expected, rare SNVs (found in fewer than 10% of the samples analyzed) are less likely to be found in dbSNP. C) Functional annotation of novel and known SNVs. For functional groups with lower numbers, a zoomed-in view is shown.
Figure 4. SNV statistics. A) Total numbers of nucleotides and amino acids affected by SNVs. B) Visual representation of the distribution of SNVs across samples, showing that nearly 68% of all SNVs occur at a frequency of 10% or lower.
Functional analysis of novel nsSNV-containing genes
| Category | Term | Observed genes | Expected genes | Over (+) / under (-) representation | P-value |
| --- | --- | --- | --- | --- | --- |
| PANTHER pathways | Integrin signaling pathway | 147 | 99.09 | + | 6.43E-04 |
| | Cadherin signaling pathway | 109 | 74.91 | + | 2.15E-02 |
| | Endothelin signaling pathway | 62 | 38.88 | + | 6.55E-02 |
| | Gonadotropin releasing hormone receptor pathway | 174 | 133.70 | + | 7.71E-02 |
| | Nicotinic acetylcholine receptor signaling pathway | 75 | 49.78 | + | 8.78E-02 |
| PANTHER protein classification | Cell adhesion molecule | 443 | 316.70 | + | 9.28E-10 |
| | G-protein modulator | 346 | 238.47 | + | 3.82E-09 |
| | Enzyme modulator | 906 | 730.11 | + | 5.69E-09 |
| | Kinase | 431 | 312.91 | + | 1.28E-08 |
| | Cytoskeletal protein | 585 | 452.77 | + | 1.02E-07 |
| GO biological process | Cell communication | 2333 | 2002.60 | + | 3.16E-14 |
| | Cell adhesion | 820 | 616.80 | + | 6.15E-14 |
| | Signal transduction | 2204 | 1905.41 | + | 5.32E-12 |
| | Cellular component organization | 773 | 595.00 | + | 4.72E-11 |
| | Protein modification process | 804 | 630.55 | + | 5.80E-10 |
| GO molecular function | Hydrolase activity | 1295 | 1053.92 | + | 1.81E-12 |
| | Transferase activity | 962 | 759.03 | + | 1.12E-11 |
| | Protein binding | 1715 | 1456.91 | + | 5.40E-11 |
| | Transmembrane transporter activity | 609 | 460.35 | + | 9.79E-10 |
| | Enzyme regulator activity | 732 | 576.51 | + | 1.05E-08 |
| GO cellular component | Cytoskeleton | 585 | 452.77 | + | 2.03E-08 |
| | Intracellular | 681 | 540.47 | + | 3.98E-08 |
| | Actin cytoskeleton | 302 | 229.94 | + | 8.66E-05 |
| | MHC protein complex | 18 | 40.30 | - | 2.37E-03 |
| | Extracellular matrix | 327 | 264.55 | + | 3.33E-03 |
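The +/- column in the table above can be read as the direction of the observed count relative to the expected count. The sketch below computes a fold enrichment (observed divided by expected) and its direction for two rows copied from the table. It assumes the two numeric columns are observed and expected gene counts, and `enrichment` is a hypothetical helper; it does not attempt to reproduce the reported P-values, whose test and multiple-testing correction are not described here.

```python
from typing import Tuple


def enrichment(observed: int, expected: float) -> Tuple[float, str]:
    """Return (fold enrichment, direction) for one functional term.

    Direction is "+" when more genes are observed than expected and "-"
    otherwise, mirroring the sign column of the table.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    fold = observed / expected
    return fold, "+" if fold > 1 else "-"


# Two rows copied from the table: (term, observed, expected).
examples = [
    ("Integrin signaling pathway", 147, 99.09),
    ("MHC protein complex", 18, 40.30),
]

for term, observed, expected in examples:
    fold, direction = enrichment(observed, expected)
    print(f"{term}: fold = {fold:.2f} ({direction})")
```

On these two rows the integrin signaling pathway comes out at roughly 1.5 times the expected count, while the MHC protein complex, the only under-represented term listed, comes out at less than half of it.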
Figure 5. Distribution of functional sites from UniProtKB/Swiss-Prot that are affected by nsSNVs. A) Distribution of loss of functional sites caused by nsSNVs from cases. B) Distribution of loss of functional sites caused by nsSNVs from controls. C) Heatmap representation of the overlap of nsSNVs in cases and controls (nsSNVs found in both cases and controls are marked green; those unique to cases or controls are marked red). All protein accessions and corresponding positions were given a numerical ID (Additional file 1: Table S1) to facilitate visualization of the vertical axis in the heatmap.
Figure 6. Phylogenetic analysis of patient samples based on SNVs. An SNV-shrunk genome alignment comprising only the nucleotides at the variant positions is used (delta 0). Colored branches represent the ethnicity of the person from whom the sample was taken, with blue representing White, orange representing Asian, and red representing African-American. Bootstrap values of 75 and higher are shown. Two major groups are observed, with bootstrap values of 98 and 78.
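The idea of an SNV-shrunk alignment can be illustrated with a toy example that keeps only the alignment columns where the samples differ. This is a sketch under assumptions, not the authors' tool: `shrink_to_variant_columns` is a hypothetical function, the sequences are invented, and reading "delta 0" as "no flanking bases kept around each variant position" is an interpretation rather than something stated in the caption.

```python
from typing import Dict


def shrink_to_variant_columns(aligned: Dict[str, str]) -> Dict[str, str]:
    """Keep only the columns of an alignment in which samples differ.

    `aligned` maps a sample name to its aligned sequence; all sequences
    must have the same length.
    """
    sequences = list(aligned.values())
    if not sequences:
        return {}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have equal length")
    # A column is variant if more than one distinct character occurs in it.
    variant_columns = [
        i for i in range(length) if len({seq[i] for seq in sequences}) > 1
    ]
    return {
        name: "".join(seq[i] for i in variant_columns)
        for name, seq in aligned.items()
    }


# Invented toy alignment: columns 3 and 5 (1-based) vary between samples.
toy_alignment = {
    "sample_1": "ACGTAC",
    "sample_2": "ACCTAC",
    "sample_3": "ACGTTC",
}

shrunk = shrink_to_variant_columns(toy_alignment)
print(shrunk)
```

The shrunk alignment is much shorter than the full genome alignment while retaining every position that distinguishes the samples, which is what a tree-building step needs.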