David F Grabski, Aakrosh Ratan, Laurie R Gray, Stefan Bekiranov, David Rekosh, Marie-Louise Hammarskjold, Sara K Rasmussen.
Abstract
Human Endogenous Retroviruses are a class of genomic elements that are the result of ancient retroviral infection of the human germline. Many are biologically active elements that have been implicated in multiple diseases, including cancer. The most recent class to invade the human genome is the HERV-K(HML-2) (HERV-K) family. Approximately 90 HERV-K proviruses and many smaller elements have been identified to date in the human genome. Additional proviruses are continually being discovered with the rapid advancement of deep-sequencing and long-read sequencing technologies. HERV-K proviruses are poorly annotated in human transcriptome databases, making their analysis in RNA-seq data difficult. To enable analysis, we compiled the sequences of 91 HERV-K proviruses identified in NCBI GenBank (ID JN675007-JN675097) and created a proviral alignment tool for visualizing RNA-seq reads aligned across individual proviruses. This allowed us to analyse publicly available RNA-seq data from 10 hepatoblastoma samples and 3 normal liver controls (GEO Accession ID: GSE89775). This data report includes the raw FASTA sequence files of the HERV-K proviruses from NCBI, a differential gene expression list between hepatoblastoma samples, and genomic alignment figures from 5 HERV-K proviruses identified as differentially expressed in the companion research article "Upregulation of Human Endogenous Retrovirus-K (HML-2) mRNAs in hepatoblastoma: Identification of potential new immunotherapeutic targets and biomarkers" [1]. The data provided here are available for other research groups interested in evaluating individual HERV-K proviral expression using RNA-seq data. Furthermore, the data analysis is highly flexible and will accommodate the addition of other HERV-K proviruses.
Keywords: Genomic alignment; Hepatoblastoma; Human endogenous retrovirus-K; Transcriptome analysis
Year: 2020 PMID: 32637500 PMCID: PMC7330144 DOI: 10.1016/j.dib.2020.105895
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Gene Ontology (GO) molecular function analysis following differential gene expression analysis of high HERV-K expressing hepatoblastoma vs low HERV-K expressing hepatoblastoma.
| Functional Category | Genes in list | Total genes | Enrichment FDR |
|---|---|---|---|
| Phospholipid binding | 32 | 441 | 0.021578265 |
| Collagen binding | 10 | 72 | 0.023567822 |
| Lipid binding | 45 | 761 | 0.023567822 |
| Identical protein binding | 92 | 1871 | 0.023567822 |
| Extracellular matrix structural constituent | 16 | 179 | 0.040987689 |
| Growth factor binding | 14 | 150 | 0.043859833 |
| Extracellular matrix binding | 8 | 56 | 0.043859833 |
| Protein kinase binding | 39 | 673 | 0.046968545 |
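
The enrichment FDR column reports p-values adjusted for multiple testing. The source does not say which correction was applied; the sketch below assumes the widely used Benjamini-Hochberg procedure purely for illustration. The function name `benjamini_hochberg` and the input p-values are hypothetical and are not taken from the dataset.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only as an illustration; the source does
    not state which FDR procedure produced the tabulated values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each
    # adjusted value is capped by the adjusted values of larger p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four categories.
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

Because each adjusted value is capped by the ones above it, neighbouring categories can end up with identical adjusted values, which is one possible reason several rows in the table share the same FDR.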
Gene Ontology (GO) cellular localization analysis following differential gene expression analysis of high HERV-K expressing hepatoblastoma vs low HERV-K expressing hepatoblastoma (top 20 terms).
| Functional Category | Genes in list | Total genes | Enrichment FDR |
|---|---|---|---|
| Secretory granule | 80 | 946 | 9.06E−12 |
| Vesicle | 225 | 4252 | 1.39E−11 |
| Secretory vesicle | 87 | 1108 | 1.39E−11 |
| Extracellular region part | 201 | 3693 | 2.14E−11 |
| Vesicle lumen | 45 | 386 | 3.07E−11 |
| Extracellular organelle | 141 | 2326 | 6.29E−11 |
| Cytoplasmic vesicle lumen | 44 | 385 | 6.29E−11 |
| Extracellular exosome | 140 | 2300 | 6.29E−11 |
| Extracellular vesicle | 141 | 2324 | 6.29E−11 |
| Extracellular space | 188 | 3479 | 1.41E−10 |
| Secretory granule lumen | 41 | 367 | 6.38E−10 |
| Extracellular region | 228 | 4617 | 2.16E−09 |
| Cytoplasmic vesicle part | 109 | 1761 | 7.46E−09 |
| Cytoplasmic vesicle | 144 | 2625 | 2.85E−08 |
| Intracellular vesicle | 144 | 2628 | 2.88E−08 |
| Collagen-containing extracellular matrix | 38 | 425 | 1.26E−06 |
| Platelet alpha granule lumen | 14 | 70 | 1.92E−06 |
| Extracellular matrix | 44 | 551 | 2.64E−06 |
| Endomembrane system | 225 | 4988 | 6.27E−06 |
| Lysosome | 54 | 797 | 1.78E−05 |
Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis following differential gene expression analysis of high HERV-K expressing hepatoblastoma vs low HERV-K expressing hepatoblastoma.
| Functional Category | Genes in list | Total genes | Enrichment FDR |
|---|---|---|---|
| Amoebiasis | 14 | 96 | 0.000793149 |
| Complement and coagulation cascades | 12 | 78 | 0.001119036 |
| Fatty acid degradation | 8 | 44 | 0.005062775 |
| Legionellosis | 9 | 55 | 0.005062775 |
| Peroxisome | 10 | 82 | 0.012590092 |
| Focal adhesion | 17 | 199 | 0.012590092 |
| Human papillomavirus infection | 24 | 330 | 0.012590092 |
| PI3K-Akt signaling pathway | 24 | 353 | 0.020654396 |
| Rheumatoid arthritis | 10 | 89 | 0.020654396 |
| ECM-receptor interaction | 9 | 82 | 0.034941883 |
| AGE-RAGE signaling pathway in diabetic complications | 10 | 100 | 0.034941883 |
| Epithelial cell signaling in Helicobacter pylori infection | 8 | 68 | 0.034941883 |
| Salmonella infection | 9 | 85 | 0.036101621 |
| Regulation of actin cytoskeleton | 16 | 214 | 0.036577774 |
| Tryptophan metabolism | 6 | 42 | 0.038157306 |
| Oocyte meiosis | 11 | 124 | 0.041198024 |
| IL-17 signaling pathway | 9 | 92 | 0.047460432 |
| Toxoplasmosis | 10 | 111 | 0.04997533 |
Fig. 1. Graphical representation of uniquely aligned reads across HERV-K proviruses (A) 17p13.1, (B) 12q24.33, (C) 1q21.3, (D) 3q27.2 and (E) 7q22.2, created in the bioinformatics platform Geneious. The x-axis represents the genomic position along the provirus. Major annotated regions of the proviral genome at each provirus are illustrated at the bottom of the panel. Coding regions for the viral proteins Gag, Pro, Pol, Env, Rec or Np9 are represented by green bars, but a bar does not necessarily imply an intact open reading frame for the protein. Individual reads from each sample are represented on the y-axis. Abbreviations: FT, fetal tumor (hepatoblastoma); NC, normal control (liver).
| Subject | Immunology and Microbiology: Virology |
| Specific subject area | Human Endogenous Retroviruses and Oncology |
| Type of data | Tables |
| How data were acquired | Bioinformatic analysis of HERV-K elements in RNA-seq data (Salmon, HISAT2, DESeq2) |
| Data format | Raw and Analysed |
| Parameters for data collection | 91 HERV-K proviral sequences contained in the NCBI Data Repository (GenBank ID JN675007-JN675097) were concatenated into a single FASTA file. The HERV-K FASTA file was used to analyze a publicly available RNA-seq dataset of hepatoblastoma and normal liver controls. |
| Description of data collection | The HERV-K FASTA file was used to perform a standard differential gene expression analysis across conditions, with ensuing gene enrichment analysis (GO & KEGG), as well as a positional alignment analysis of RNA-seq reads across individual proviruses. |
| Data source location | University of Virginia School of Medicine Charlottesville, Virginia |
| Data accessibility | With the article |
| Related research article | David F Grabski, Aakrosh Ratan, Laurie R Gray, Stefan Bekiranov, David Rekosh, Marie-Louise Hammarskjold, Sara K Rasmussen; Upregulation of Human Endogenous Retrovirus-K (HML-2) mRNAs in hepatoblastoma: Identification of potential new immunotherapeutic targets and biomarkers; Journal of Pediatric Surgery; Submitted. |
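
The concatenation step described under "Parameters for data collection" can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each provirus has already been downloaded as its own FASTA file, and the function names `accession_range` and `concatenate_fasta`, as well as the file naming in the usage comment, are hypothetical.

```python
from pathlib import Path


def accession_range(prefix, first, last):
    """List consecutive accession IDs, e.g. JN675007 through JN675097."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


def concatenate_fasta(input_paths, output_path):
    """Copy every record from input_paths into one FASTA file.

    Blank lines are dropped. Returns the number of header lines
    (records) written, which can be checked against the expected count.
    """
    n_records = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            for line in Path(path).read_text().splitlines():
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    n_records += 1
                out.write(line + "\n")
    return n_records


if __name__ == "__main__":
    ids = accession_range("JN", 675007, 675097)
    print(len(ids))  # 91 accessions, matching the count given in the text
    # Hypothetical usage, assuming one file per accession in ./fasta/:
    # concatenate_fasta([f"fasta/{i}.fasta" for i in ids], "hervk_all.fasta")
```

Comparing the returned record count with the 91 expected proviruses is a cheap sanity check before the combined file is passed to an aligner or quantifier.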