Patrik L Ståhl, Magnus K Bjursell, Hovsep Mahdessian, Sophia Hober, Karin Jirström, Joakim Lundeberg.
Abstract
Biomarker identification is of utmost importance for the development of novel diagnostics and therapeutics. Here we make use of a translational database selection strategy, utilizing data from the Human Protein Atlas (HPA) on differentially expressed protein patterns in healthy and breast cancer tissues as a means to filter out potential biomarkers for underlying genetic causatives of the disease. DNA was isolated from ten breast cancer biopsies, and the protein coding and flanking non-coding genomic regions corresponding to the selected proteins were extracted in a multiplexed format from the samples using a single DNA sequence capture array. Deep sequencing revealed an even enrichment of the multiplexed samples and a great variation of genetic alterations in the tumors of the sampled individuals. Benefiting from the upstream filtering method, the final set of biomarker candidates could be completely verified through bidirectional Sanger sequencing, revealing a 40 percent false positive rate despite high read coverage. Of the variants encountered in translated regions, nine novel non-synonymous variations were identified and verified, two of which were present in more than one of the ten tumor samples.
Year: 2011 PMID: 21698250 PMCID: PMC3115972 DOI: 10.1371/journal.pone.0020794
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Principle of a differentially expressed protein, as seen on the tissue array images in the Human Protein Atlas database.
Immunohistochemical staining using antibodies targeting the interrogated protein in the tissue sections gives rise to a localized and distinct brown color. In the first case (top) the healthy breast tissue does not seem to show any expression of the PIP protein, whereas the breast cancer tissue shows heavy expression of the targeted protein. In the second case (bottom) the healthy breast tissue shows heavy expression of the ACTG1 protein, whereas the breast cancer tissue does not seem to show any expression of the targeted protein.
Distribution of mapped reads, bases, coverage and variants per sample.
| Sample | Mapped reads | Mapped bases (millions) | Average coverage | SNVs | Coding SNVs |
| --- | --- | --- | --- | --- | --- |
| MID1 | 35243 | 9.9 | 33 | 247 | 66 |
| MID2 | 30264 | 8.6 | 28 | 215 | 50 |
| MID3 | 2739 | 0.8 | 3 | 27 | 8 |
| MID4 | 22952 | 6.4 | 21 | 217 | 55 |
| MID5 | 30869 | 8.5 | 28 | 230 | 46 |
| MID6 | 18608 | 5.2 | 17 | 221 | 63 |
| MID7 | 20239 | 5.6 | 18 | 185 | 48 |
| MID8 | 38539 | 10.7 | 35 | 252 | 53 |
| MID9 | 7815 | 2.2 | 7 | 129 | 34 |
| MID10 | 40046 | 11.4 | 38 | 259 | 65 |
| Total | 247314 | 69.3 | 228 | 1982 | 488 |
Distribution of mapped reads, bases, coverage and single nucleotide variants (SNVs) across the ten multiplex-enriched breast cancer tumor samples. The distribution is even across all samples except those tagged with multiplex identifiers (MIDs) 3 and 9. Poor amplification characteristics of MID 3 were later verified by qPCR [21] to be the reason for the poor representation of the corresponding sample among the sequence reads. The sample tagged with MID 9 was added in a lower molar amount upon pooling due to a low DNA concentration after library preparation.
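The evenness of the pooled samples can be checked directly from the mapped-read counts in the table. The following is a minimal sketch, not the authors' analysis: the function names and the cutoff of half the even share are illustrative assumptions, chosen only to show how the two outlying samples stand out.

```python
# Mapped reads per multiplex identifier (MID), copied from the table above.
mapped_reads = {
    "MID1": 35243, "MID2": 30264, "MID3": 2739, "MID4": 22952, "MID5": 30869,
    "MID6": 18608, "MID7": 20239, "MID8": 38539, "MID9": 7815, "MID10": 40046,
}


def read_shares(reads):
    """Return each sample's fraction of the pooled mapped reads."""
    total = sum(reads.values())
    return {sample: count / total for sample, count in reads.items()}


def underrepresented(reads, min_fraction_of_even=0.5):
    """List samples whose share falls below a fraction of the even share (1/n).

    The 0.5 cutoff is an illustrative assumption, not a value from the study.
    """
    even_share = 1 / len(reads)
    shares = read_shares(reads)
    return [
        sample for sample, share in shares.items()
        if share < min_fraction_of_even * even_share
    ]


if __name__ == "__main__":
    for sample, share in read_shares(mapped_reads).items():
        print(f"{sample}: {share:.1%} of mapped reads")
    print("Under-represented:", underrepresented(mapped_reads))
```

With ten samples the even share is 10 percent of the pool, so under this assumed cutoff only the samples tagged with MID 3 and MID 9 are flagged, in line with the caption.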
Encountered single nucleotide variants (SNVs) overview.
| Total unique positions with single nucleotide variants (SNVs) | 579 |
| In target genes | 467 |
| In exons (incl UTR) | 266 |
| In introns | 201 |
| In coding sequence (CDS) | 149 |
| In non-coding sequence | 430 |
| Non-synonymous in CDS | 66 |
| Novel non-synonymous not in dbSNP | 15 |
| Verified novel non-synonymous not in dbSNP | 9 |
| Novel non-synonymous present in more than one patient | 6 |
An overview of the single nucleotide variations encountered across the samples. In total, 149 unique positions with single nucleotide variations (SNVs) were found in protein coding sequence, 15 of which gave rise to a different amino acid and were previously unreported in dbSNP (version 130). Six of these 15 turned out to be false positives when verified by bidirectional Sanger sequencing. Two of the remaining nine mutations were present in more than one individual and were located in the SATB1 and DDXB26 genes.
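The 40 percent false positive rate quoted in the abstract follows from the counts in this overview: 15 novel non-synonymous candidates, of which 9 were confirmed by Sanger sequencing. A minimal sketch of that arithmetic, with variable and function names that are illustrative assumptions rather than anything from the study:

```python
# Counts copied from the SNV overview table.
novel_nonsynonymous = 15  # novel non-synonymous candidates not in dbSNP
verified = 9              # confirmed by bidirectional Sanger sequencing


def false_positive_rate(candidates, confirmed):
    """Fraction of candidate variant calls that failed validation."""
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    if not 0 <= confirmed <= candidates:
        raise ValueError("confirmed must lie between 0 and candidates")
    return (candidates - confirmed) / candidates


rate = false_positive_rate(novel_nonsynonymous, verified)
print(f"False positive rate: {rate:.0%}")  # False positive rate: 40%
```

The same counts explain the six false positives mentioned in the caption: 15 candidates minus 9 verified variants.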