Uma R Chandran, Olga P Medvedeva, M Michael Barmada, Philip D Blood, Anish Chakka, Soumya Luthra, Antonio Ferreira, Kim F Wong, Adrian V Lee, Zhihui Zhang, Robert Budden, J Ray Scott, Annerose Berndt, Jeremy M Berg, Rebecca S Jacobson.
Abstract
BACKGROUND: The Cancer Genome Atlas Project (TCGA) is a National Cancer Institute effort to profile at least 500 cases of 20 different tumor types using genomic platforms and to make these data, both raw and processed, available to all researchers. TCGA data are currently over 1.2 petabytes in size and include whole genome sequence (WGS), whole exome sequence, methylation, RNA expression, proteomic, and clinical datasets. Publicly accessible TCGA data are released through public portals, but many challenges exist in navigating and using data obtained from these sites. We developed TCGA Expedition to support the research community focused on computational methods for cancer research. Data obtained, versioned, and archived using TCGA Expedition support command line access at high-performance computing facilities as well as some functionality with third party tools. For a subset of TCGA data collected at the University of Pittsburgh, we also re-associate TCGA data with de-identified data from the electronic health records. Here we describe the software as well as the architecture of our repository, methods for loading TCGA data to multiple platforms, and security and regulatory controls that conform to federal best practices.
Year: 2016 PMID: 27788220 PMCID: PMC5082933 DOI: 10.1371/journal.pone.0165395
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
TCGA Expedition Modules and associated TCGA Datatypes managed.
| TCGA Datatype | Data Source | Level | File Type | Repository Code Module |
|---|---|---|---|---|
| WGS_(cgHub) | cgHub | 1 | Bam | python script + BAMMetadataManager |
| WXS_(cgHub) | cgHub | 1 | Bam | python script + BAMMetadataManager |
| Protected_Mutations | TCGA | 2 | Vcf | ProtectedMutationsNoSplit |
| Protected_Mutations_MAF | TCGA | 2 | Maf | MafModule |
| Somatic_Mutations | TCGA | 2 | Maf | MafModule |
| RNA-Seq_(cgHub) | cgHub | 1 | bam, fastq | python script + BAMMetadataManager |
| RNASeq | TCGA | 2 | Vcf | RNASeqLevel2Module |
| RNASeq | TCGA | 3 | Txt | RNASeqLevel3Module |
| RNASeqV2 | TCGA | 3 | Txt | RNASeqV2Level3Module |
| CNV_(CN_Array) | TCGA | 1 | txt, mat | CNAModule |
| | TCGA | 2 | Txt | CNAModule |
| | TCGA | 3 | Txt | CNAModule |
| CNV_(SNP_Array) | TCGA | 1 | txt, cel | CNVModule |
| | TCGA | 2 | Txt | CNVModule |
| | TCGA | 3 | Txt | CNVModule |
| | Firebrowse | 4 | txt, pdf, png, hml, gistic, Rdata, bed, properties | CN_Level4 |
| CNV_(Low_Pass_DNASeq) | TCGA | 2 | Vcf | ProtectedMutationsNoSplit |
| Expression_Exon | TCGA | 1 | txt, cel | ExpExonModule |
| | TCGA | 2 | Txt | ExpExonModule |
| | TCGA | 3 | Txt | ExpExonModule |
| Expression_Gene | TCGA | 1 | txt, cel | ExpGeneModule |
| | TCGA | 2 | Txt | ExpGeneModule |
| | TCGA | 3 | Txt | ExpGeneModule |
| Expression_Protein | TCGA | 0 | Txt | ExpProteinModule |
| | TCGA | 1 | tif, txt | ExpProteinModule |
| | TCGA | 2 | Txt | ExpProteinModule |
| | TCGA | 3 | Txt | ExpProteinModule |
| | Georgetown | 4 | tar.gz, tsv | MassSpecModule |
| Bisulfite-Seq_(cgHub) | cgHub | 1 | Bam | python script + BAMMetadataManager |
| DNA_Methylation (array based) | TCGA | 1 | idat, txt | MethylModule |
| | TCGA | 2 | Txt | MethylModule |
| | TCGA | 3 | Txt | MethylModule |
| miRNA-Seq_(cgHub) | cgHub | 1 | Bam | python script + BAMMetadataManager |
| miRNASeq | TCGA | 3 | Txt | miRNASeqModule |
| Fragment_Analysis_Result (microsatellite instability) | TCGA | 1 | txt, fsa | MSIModule |
| Diagnostic_images | TCGA | 1 | Svs | ImageModule |
| Tissue_images | TCGA | 1 | Svs | ImageModule |
| Clinical (patient history and biospecimen) | TCGA | 2 | Txt | ClinicalModule |
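
One way to read the table is as a lookup from a (datatype, level) pair to the repository code module that manages it. The following is a minimal sketch of that idea, not code from TCGA Expedition: the dictionary name `MODULE_BY_DATATYPE_LEVEL` and the helper `module_for` are hypothetical, and only the datatype names, levels, and module names are copied from a few rows of the table.

```python
from typing import Dict, Tuple

# Hypothetical lookup table: (TCGA datatype, data level) -> repository code module.
# Only a handful of rows from the table are included for illustration.
MODULE_BY_DATATYPE_LEVEL: Dict[Tuple[str, int], str] = {
    ("Protected_Mutations", 2): "ProtectedMutationsNoSplit",
    ("Protected_Mutations_MAF", 2): "MafModule",
    ("Somatic_Mutations", 2): "MafModule",
    ("RNASeq", 2): "RNASeqLevel2Module",
    ("RNASeq", 3): "RNASeqLevel3Module",
    ("RNASeqV2", 3): "RNASeqV2Level3Module",
    ("miRNASeq", 3): "miRNASeqModule",
}


def module_for(datatype: str, level: int) -> str:
    """Return the module listed for a datatype at a given level.

    Hypothetical helper: raises KeyError when the pair is not in the
    illustrative subset above.
    """
    key = (datatype, level)
    if key not in MODULE_BY_DATATYPE_LEVEL:
        raise KeyError(f"no module listed for {datatype!r} at level {level}")
    return MODULE_BY_DATATYPE_LEVEL[key]


if __name__ == "__main__":
    # The same datatype can map to different modules at different levels.
    print(module_for("RNASeq", 2))
    print(module_for("RNASeq", 3))
```

Keying on the pair rather than on the datatype alone reflects the table, where RNASeq is handled by a different module at level 2 than at level 3.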
Fig 1. TCGA Expedition file name components.
Fig 2. TCGA Expedition Directory Structure.
Fig 3. Architecture of the Pittsburgh Genome Resource Repository.