Ian Streeter1, Peter W Harrison1, Adam Faulconbridge1, Paul Flicek1, Helen Parkinson1, Laura Clarke2.
Abstract
The Human Induced Pluripotent Stem Cell Initiative (HipSci) is establishing a large catalogue of human iPSC lines, arguably the most well characterized collection to date. The HipSci portal enables researchers to choose the right cell line for their experiment, and makes HipSci's rich catalogue of assay data easy to discover and reuse. Each cell line has genomic, transcriptomic, proteomic and cellular phenotyping data. Data are deposited in the appropriate EMBL-EBI archives, including the European Nucleotide Archive (ENA), European Genome-phenome Archive (EGA), ArrayExpress and PRoteomics IDEntifications (PRIDE) databases. The project will make 500 cell lines from healthy individuals, and from 150 patients with rare genetic diseases; these will be available through the European Collection of Authenticated Cell Cultures (ECACC). As of August 2016, 238 cell lines are available for purchase. Project data is presented through the HipSci data portal (http://www.hipsci.org/lines) and is downloadable from the associated FTP site (ftp://ftp.hipsci.ebi.ac.uk/vol1/ftp). The data portal presents a summary matrix of the HipSci cell lines, showing available data types. Each line has its own page containing descriptive metadata, quality information, and links to archived assay data. Analysis results are also available in a Track Hub, allowing visualization in the context of public genomic annotations (http://www.hipsci.org/data/trackhubs).
Year: 2016 PMID: 27733501 PMCID: PMC5210631 DOI: 10.1093/nar/gkw928
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. HipSci data flow. Sample metadata is collected from healthy donors and donors with rare genetic disease, each of whom provides a skin biopsy. Sample metadata is also collected on the fibroblast and iPSC lines derived from these biopsies. This sample metadata is registered in EMBL-EBI's BioSamples database immediately upon creation, before any assays are performed. For each cell line a range of data is generated, including quality control and genomics assays conducted by the Wellcome Trust Sanger Institute, proteomics assays by the University of Dundee, and cellular phenotyping by King's College London. The quality control, genomics, and proteomics data is deposited in the relevant EMBL-EBI archive, and the cellular phenotyping data is released to the HipSci public FTP site. Table 1 lists the specific archive to which each type of assay data is submitted and shows how the destination differs between open and managed access data. Our web portal infrastructure is based on an Elasticsearch engine into which the sample and assay data is loaded. The public JSON API uses standard Elasticsearch query syntax and, through an intermediary web server, allows any appropriate search query to be passed through to the search engine. The web portal app displays data fed from the public API, creating searchable and filterable views of all of HipSci's rich data.
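Because the public API accepts standard Elasticsearch query syntax, a search request body can be assembled as an ordinary JSON document. The following is a minimal sketch only: the field names `diseaseStatus.value` and `assays.name` are hypothetical placeholders rather than the portal's documented index mapping, and the endpoint to which the body would be posted is not given here, so no request is sent.

```python
import json


def build_cell_line_query(disease_status=None, assay=None, size=10):
    """Build an Elasticsearch query-DSL body as a plain dict.

    The field names used below are assumptions for illustration;
    check the real index mapping before relying on them.
    """
    filters = []
    if disease_status is not None:
        # Hypothetical field holding the donor's disease state.
        filters.append({"term": {"diseaseStatus.value": disease_status}})
    if assay is not None:
        # Hypothetical field listing the assays available for a line.
        filters.append({"term": {"assays.name": assay}})

    # With no filters, fall back to matching every document.
    query = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    return {"size": size, "query": query}


body = build_cell_line_query(disease_status="Normal", assay="RNA-seq", size=5)
print(json.dumps(body, indent=2))
```

The `bool` query with `filter` clauses and `term` matches is standard Elasticsearch query DSL; only the field names and values are invented for the example.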
Table 1. HipSci assay data archival strategy
| Assay | Data type | Consent | Archive |
|---|---|---|---|
| Sample collection | Sample descriptions | Open and Managed | BioSamples |
| Genotyping array | Genotypes and imputed genotypes | Open | EVA |
| Genotyping array | Genotypes and imputed genotypes | Managed | EGA |
| Expression array | Array Signal intensity data | Open | ArrayExpress |
| Expression array | Array Signal intensity data | Managed | EGA |
| Exome-seq | Aligned reads | Open | ENA |
| Exome-seq | Aligned reads | Managed | EGA |
| Exome-seq | Variant calls and imputed genotypes | Open | EVA |
| Exome-seq | Variant calls and imputed genotypes | Managed | EGA |
| RNA-seq | Aligned reads | Open | ENA |
| RNA-seq | Aligned reads | Managed | EGA |
| RNA-seq | Abundance of transcripts | Open | ENA |
| RNA-seq | Abundance of transcripts | Managed | EGA |
| Methylation array | Array Signal intensity data | Open | ArrayExpress |
| Methylation array | Array Signal intensity data | Managed | EGA |
| Proteomics | Mass spectrometry | Open and Managed | PRIDE |
| Cellular phenotyping | Morphology and DAPI/EdU staining intensity data | Open and Managed | HipSci FTP site |
This table describes the archive used for each assay and consent type combination. Consent for data is either ‘Open’ or ‘Managed’, corresponding to the terms agreed by the donor at the time of sample donation.
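The archive routing in the table can also be expressed as a small lookup keyed on assay, data type, and consent. This is an illustrative sketch that transcribes the rows above; the names `ARCHIVES` and `archive_for` are hypothetical and are not part of any HipSci software.

```python
# (open-access archive, managed-access archive) for each assay and data type,
# transcribed from the archival strategy table.
ARCHIVES = {
    ("Sample collection", "Sample descriptions"): ("BioSamples", "BioSamples"),
    ("Genotyping array", "Genotypes and imputed genotypes"): ("EVA", "EGA"),
    ("Expression array", "Array Signal intensity data"): ("ArrayExpress", "EGA"),
    ("Exome-seq", "Aligned reads"): ("ENA", "EGA"),
    ("Exome-seq", "Variant calls and imputed genotypes"): ("EVA", "EGA"),
    ("RNA-seq", "Aligned reads"): ("ENA", "EGA"),
    ("RNA-seq", "Abundance of transcripts"): ("ENA", "EGA"),
    ("Methylation array", "Array Signal intensity data"): ("ArrayExpress", "EGA"),
    ("Proteomics", "Mass spectrometry"): ("PRIDE", "PRIDE"),
    ("Cellular phenotyping", "Morphology and DAPI/EdU staining intensity data"):
        ("HipSci FTP site", "HipSci FTP site"),
}


def archive_for(assay, data_type, consent):
    """Return the archive for an assay, data type, and consent ('Open' or 'Managed')."""
    if consent not in ("Open", "Managed"):
        raise ValueError(f"unknown consent type: {consent!r}")
    open_archive, managed_archive = ARCHIVES[(assay, data_type)]
    return open_archive if consent == "Open" else managed_archive


print(archive_for("RNA-seq", "Aligned reads", "Managed"))
```

Rows listed as "Open and Managed" in the table map both consent types to the same archive.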
Figure 2. Using the HipSci cell line and data browser. The HipSci data browser provides views to explore the complex data the project has collected. (A) The main page of the browser is a table listing all the available cell lines, with both a search box and specific filters to restrict the table by attributes such as disease state, assay availability, banking availability, and source material. The table contains cell line links, which take the user to cell line summary pages, and assay links, which take the user to pages listing all the files available for that line and assay type. (B) The cell line summary page contains descriptive information about a line, including disease state, derivation method, donor sex and tissue provider. Below the cell line summary, a table lists all files associated with the line, the assay that produced them, and the culture conditions and passage number under which they were produced. (C) Below the assay file table are the line's QC results for Pluritest and the HipSci copy number variation (CNV) check. These graphs present the results for the given line, its clones, and the control data generated using the donor tissue sample.