Jelle Scholtalbers, Sebastian Boegel, Thomas Bukur, Marius Byl, Sebastian Goerges, Patrick Sorn, Martin Loewer, Ugur Sahin, John C Castle.
Abstract
Human cancer cell lines are an important resource for research and drug development. However, the available annotations of cell lines are sparse, incomplete, and distributed in multiple repositories. Re-analyzing publicly available raw RNA-Seq data, we determined the human leukocyte antigen (HLA) type and abundance, identified expressed viruses and calculated gene expression of 1,082 cancer cell lines. Using the determined HLA types, public databases of cell line mutations, and existing HLA binding prediction algorithms, we predicted antigenic mutations in each cell line. We integrated the results into a comprehensive knowledgebase. Using the Django web framework, we provide an interactive user interface with advanced search capabilities to find and explore cell lines and an application programming interface to extract cell line information. The portal is available at http://celllines.tron-mainz.de.
Year: 2015 PMID: 26589293 PMCID: PMC4653878 DOI: 10.1186/s13073-015-0240-5
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
External data processed and integrated into the cell line portal
| Data type | Source | Number of cell lines | Reference |
|---|---|---|---|
| Cancer cell line RNA-Seq data (2 × 101 bp) | CCLE | 781 | [ |
| Cancer cell line RNA-Seq data (2 × 75 bp) | Klijn | 301 | [ |
| Mutations | CCLE | 781 | [ |
| Mutations | Klijn | 675 | [ |
| HLA Class I and Class II types | Adams | 49 | [ |
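The abstract states that gene expression was calculated from these RNA-Seq reads, and the portal's search (Fig. 3) filters on expression in RPKM (reads per kilobase of transcript per million mapped reads). The sketch below shows the standard RPKM normalization, assuming per-gene read counts, gene lengths, and the total number of mapped reads are already known. The function names are hypothetical, and this is not claimed to be the authors' pipeline.

```python
from __future__ import annotations


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (per kilobase of gene length) * 1e6 (per million mapped reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def rpkm_table(
    counts: dict[str, int], lengths_bp: dict[str, int], total_mapped_reads: int
) -> dict[str, float]:
    """Apply rpkm() to every gene; total_mapped_reads should also include
    reads that mapped but were not assigned to any gene."""
    return {
        gene: rpkm(n, lengths_bp[gene], total_mapped_reads)
        for gene, n in counts.items()
    }


# Toy numbers (not from the portal): 500 reads on a 2,000 bp gene
# in a library of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # → 25.0
```

For paired-end libraries such as those listed in the table, fragments rather than individual reads are often counted (FPKM); the arithmetic is the same.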
Fig. 1 Data integration and computational workflow. RNA-Seq data from 1,083 human cancer cell lines are downloaded from CCLE and Genentech (a), and mutation information for the cell lines is retrieved (b). The RNA-Seq reads are processed by our in-house pipeline (c), which performs HLA typing and quantification, virus identification, gene expression analysis, and neo-epitope prediction. These data are integrated using consistent cell line names as the primary identifier, and tissue and disease information is annotated using the NCI Thesaurus ontology (d). The results are freely accessible in the TRON Cell Line Portal (e) at http://celllines.tron-mainz.de
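Fig. 1 describes integrating the CCLE and Genentech data using consistent cell line names as the primary identifier. Below is a minimal sketch of one way to build such a key, assuming names differ only in case, punctuation, and a CCLE-style tissue suffix after the first underscore (e.g. `HCT116_LARGE_INTESTINE`). The normalization rule, function names, and toy records are hypothetical and are not the portal's documented method.

```python
from __future__ import annotations

import re


def normalize_cell_line_name(name: str) -> str:
    """Map variants such as 'HCT 116', 'hct-116' and 'HCT116_LARGE_INTESTINE'
    to a single key ('HCT116')."""
    # Assumption: anything after the first underscore is a tissue suffix.
    base = name.split("_", 1)[0]
    # Remove spaces, hyphens, dots and other punctuation; ignore case.
    return re.sub(r"[^A-Za-z0-9]", "", base).upper()


def merge_by_cell_line(*sources: dict[str, dict]) -> dict[str, dict]:
    """Combine per-source records that share a normalized cell line name.
    Later sources overwrite earlier ones on conflicting keys."""
    merged: dict[str, dict] = {}
    for source in sources:
        for raw_name, record in source.items():
            key = normalize_cell_line_name(raw_name)
            merged.setdefault(key, {}).update(record)
    return merged


# Toy records keyed by differently spelled names of the same cell line.
ccle = {"HCT116_LARGE_INTESTINE": {"in_ccle": True}}
klijn = {"HCT 116": {"in_klijn": True}}
print(merge_by_cell_line(ccle, klijn))
# → {'HCT116': {'in_ccle': True, 'in_klijn': True}}
```

A rule this aggressive can merge distinct lines whose names differ only by punctuation, so a curated synonym table would be the safer choice for a production knowledgebase.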
Fig. 2 The TRON Cell Line Portal (TCLP) offers two main views. (a) The sample information page shows the information for the selected cell line. (b) The advanced search lets users search by combining and excluding criteria.
Fig. 3 Example search. (a) ‘Show me all melanoma cell lines that (i) are HLA-A*02:01 positive, (ii) express EGFR (between 1 and 1000 RPKM), (iii) have a BRAF p.V600E mutation, and (iv) are derived from a female donor.’ (b) This search returns three cell lines.
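The advanced search combines inclusion and exclusion criteria (Fig. 2b, Fig. 3). Below is a minimal in-memory sketch of that logic, assuming each cell line is a simple record with disease, donor sex, HLA alleles, per-gene RPKM, and (gene, protein change) mutations. The record layout, predicate helpers, and toy cell lines are hypothetical and do not reflect the portal's Django models or its API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class CellLine:
    name: str
    disease: str
    sex: str
    hla_alleles: set[str] = field(default_factory=set)
    rpkm: dict[str, float] = field(default_factory=dict)
    mutations: set[tuple[str, str]] = field(default_factory=set)  # (gene, protein change)


Criterion = Callable[[CellLine], bool]


def disease_is(disease: str) -> Criterion:
    return lambda c: c.disease == disease


def sex_is(sex: str) -> Criterion:
    return lambda c: c.sex == sex


def has_hla(allele: str) -> Criterion:
    return lambda c: allele in c.hla_alleles


def expresses(gene: str, low: float, high: float) -> Criterion:
    # Genes without a measured value are treated as 0 RPKM.
    return lambda c: low <= c.rpkm.get(gene, 0.0) <= high


def has_mutation(gene: str, change: str) -> Criterion:
    return lambda c: (gene, change) in c.mutations


def search(lines: Iterable[CellLine], include: list[Criterion],
           exclude: list[Criterion] | None = None) -> list[CellLine]:
    """Keep lines that match every include criterion and no exclude criterion."""
    exclude = exclude or []
    return [c for c in lines
            if all(p(c) for p in include) and not any(p(c) for p in exclude)]


# Two toy (hypothetical) cell lines that differ only in donor sex.
lines = [
    CellLine("TOY-A", "melanoma", "female", {"HLA-A*02:01"},
             {"EGFR": 12.0}, {("BRAF", "p.V600E")}),
    CellLine("TOY-B", "melanoma", "male", {"HLA-A*02:01"},
             {"EGFR": 12.0}, {("BRAF", "p.V600E")}),
]
query = [disease_is("melanoma"), has_hla("HLA-A*02:01"),
         expresses("EGFR", 1, 1000), has_mutation("BRAF", "p.V600E"),
         sex_is("female")]
print([c.name for c in search(lines, query)])  # → ['TOY-A']
```

Expressing each criterion as a predicate keeps inclusion and exclusion symmetric: the same helper can be placed in either list.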
Fig. 4 Neo-epitope catalog of HCT116. Columns 1 to 3 describe the mutation; columns 4 to 7 show the HLA allele, the percentile rank, the sequence, and the IC50 of the predicted strongest-binding neo-epitope, respectively. Columns 8 to 11 show information for the corresponding wild-type sequence. The marked row is the neo-epitope that was eluted and identified by mass spectrometry [27].
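The abstract describes predicting antigenic mutations from each cell line's HLA types, its mutations, and existing HLA binding prediction algorithms, and Fig. 4 reports the best mutant peptide next to its wild-type counterpart. Such predictors are typically run on short peptides that overlap the mutated residue. The sketch below covers only that peptide-generation step for a single amino acid substitution: it enumerates every window containing the changed residue, paired with the wild-type window. The 9-residue default is an assumption (a common length for HLA class I ligands); this record does not state the lengths used, and the binding predictor itself is not shown. The function name and toy sequence are hypothetical.

```python
from __future__ import annotations


def mutation_windows(protein: str, position: int, alt: str,
                     length: int = 9) -> list[tuple[str, str]]:
    """Return (mutant, wild_type) peptide pairs of `length` residues that
    cover the substitution of `alt` at 1-based `position` of `protein`."""
    if len(alt) != 1:
        raise ValueError("alt must be a single amino acid")
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the protein sequence")
    i = position - 1  # 0-based index of the substituted residue
    mutant = protein[:i] + alt + protein[i + 1:]
    # A window start s covers the residue if s <= i <= s + length - 1,
    # and the window must fit inside the protein.
    first = max(0, i - length + 1)
    last = min(i, len(protein) - length)
    return [(mutant[s:s + length], protein[s:s + length])
            for s in range(first, last + 1)]


# Toy 20-residue sequence with a hypothetical L10W substitution.
for mut, wt in mutation_windows("ACDEFGHIKLMNPQRSTVWY", 10, "W"):
    print(mut, wt)
```

Each mutant peptide would then be scored against every HLA allele of the cell line with a binding predictor, and the strongest binder (for example, the one with the lowest percentile rank) reported alongside its wild-type counterpart, as in Fig. 4.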