Matthew G Turnbull¹, Renée N Douville¹,².
Abstract
Background: Endogenous retrovirus-K (ERVK) is a group of related genomic elements descended from retroviral infections of human ancestors. HML2 is the ERVK clade that contains the most intact proviral copies. These elements can be transcribed and translated in both healthy and diseased tissues, and some of them produce active retroviral enzymes such as protease. In exogenous retroviral infections, retroviral gene products, including protease, contribute to illness, and efforts are ongoing to test antiretroviral regimens against endogenous retroviruses. Herein, we examine the potential activity and diversity of human endogenous retrovirus-K proteases and their potential impact on immunity and human disease.
Keywords: RNAseq; active site motifs; amyotrophic lateral sclerosis; breast cancer; endogenous retrovirus-K (ERVK); prostate cancer; protease; protease inhibitor
Year: 2018 PMID: 30072963 PMCID: PMC6058741 DOI: 10.3389/fmicb.2018.01577
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
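Retroviral profile hidden Markov models (HMMs) from Pfam, sorted by the retroviral protein product from which they were derived (MA, matrix; CA, capsid; NC, nucleocapsid; PR, protease; RT, reverse transcriptase; RH, RNase H; IN, integrase; SU, surface envelope glycoprotein; DU, dUTPase).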
| Protein | Pfam HMMs |
|---|---|
| MA, CA, NC | Gag_p10, Gag_p12, zf-CCHC_5, zf-CCHC, Gag_p30, Gag_p17, Gag_p24, zf-H2C2, Gag_p19, Gag_MA, gag_pre-integrs, Retro_M, Retrotrans_gag |
| PR | Asp_protease_2, RVP, Spuma_A9PTase, gag-asp_protease, RVP_2 |
| RT | RVT_1, RVT_3, RVT_N, RVT_2, RVT_connect, RVT_thumb |
| RH | RNase_H |
| IN | IN_DBD_C, Rve, Integrase_Zn |
| SU | GP41, HERV-K_env_2, Env_gp36, Avian_gp85, MMTV_SAg |
| DU | dUTPase |
| Accessory proteins | Deltaretro_Tax, Myc-N, Tax, F-protein, Myc-LZ, VPR |
| Other | SPX, Ribosomal_L22, UBN2_2, Borrelia_orfX, Chromo, UBN2_3, Gypsy, PHD, Exo_endo_phos_2, Peptidase_A17, PHO4, Ets, MRI, G-patch, Flu_PA, TYA, PhoH, RdRP_1, Orbi_VP4, EXS, Mononeg_RNA_pol, EVI2A, Exo_endo_phos, Flavi_NS5, Peptidase_A2B, Tctex-1, Bromo_coat, ALIX_LYPXL_bnd, MRVI1, RdRP4, Reo_sigma1, Pkinase_Tyr, Peptidase_A3, TLV_coat, DUF1725, Flavi_glycop_C, DUF1011, DUF2155, BAF, HLH, AA_permease_2, RNA_replicase_B, Pkinase, Cupin_8, Birna_RdRp, Ras, AA_permease_C, Peptidase_C34, bZIP_1, DUF4219, BLVR, UQ_con |
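The table above works as a lookup from Pfam family names to retroviral protein products. A minimal Python sketch of that use follows, assuming the domain names reported for a candidate element (for example by LTRdigest or an HMMER search) are already available as a list of strings. The dictionary holds only a subset of the table, and the function name `classify_domains` is hypothetical; this is an illustration, not the authors' pipeline.

```python
from collections import defaultdict

# Hypothetical subset of the Pfam-to-protein table above; not the full mapping.
PFAM_TO_PROTEIN = {
    "Gag_p10": "MA, CA, NC",
    "Gag_p24": "MA, CA, NC",
    "zf-CCHC": "MA, CA, NC",
    "Asp_protease_2": "PR",
    "RVP": "PR",
    "RVP_2": "PR",
    "gag-asp_protease": "PR",
    "RVT_1": "RT",
    "RVT_thumb": "RT",
    "RNase_H": "RH",
    "Rve": "IN",
    "Integrase_Zn": "IN",
    "HERV-K_env_2": "SU",
    "dUTPase": "DU",
    "Chromo": "Other",
}


def classify_domains(domain_names):
    """Group Pfam domain names by the retroviral protein category they map to.

    Names absent from the lookup are collected under "Unassigned".
    """
    groups = defaultdict(list)
    for name in domain_names:
        groups[PFAM_TO_PROTEIN.get(name, "Unassigned")].append(name)
    return dict(groups)


hits = ["Gag_p24", "RVP", "RVT_1", "RVT_thumb", "Rve", "Chromo"]
print(classify_domains(hits))
```

Hard-coding the mapping keeps the sketch self-contained; a real analysis would load the complete table from a file instead.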
Distribution of protease (PR) and reverse transcriptase (RT) search results across the chromosomes of the human genome assembly GRCh38.
| Chromosome | PR, tBLASTn | RVP, LTRdigest (curated) | RVP, LTRdigest (raw) | RT, tBLASTn | RVT_1, LTRdigest (raw) | RVT_1, LTRdigest (curated) |
|---|---|---|---|---|---|---|
| 1 | 32 | 15 | 97 | 34 | 115 | 15 |
| 2 | 17 | 6 | 79 | 36 | 95 | 7 |
| 3 | 28 | 14 | 109 | 57 | 136 | 13 |
| 4 | 35 | 14 | 113 | 53 | 144 | 15 |
| 5 | 23 | 11 | 74 | 39 | 140 | 9 |
| 6 | 30 | 6 | 95 | 37 | 104 | 9 |
| 7 | 18 | 3 | 58 | 24 | 92 | 3 |
| 8 | 24 | 13 | 91 | 32 | 85 | 13 |
| 9 | 8 | 1 | 39 | 13 | 49 | 3 |
| 10 | 19 | 5 | 52 | 23 | 80 | 8 |
| 11 | 18 | 12 | 71 | 40 | 104 | 12 |
| 12 | 15 | 10 | 75 | 37 | 82 | 10 |
| 13 | 5 | 2 | 34 | 19 | 30 | 1 |
| 14 | 10 | 4 | 35 | 13 | 57 | 4 |
| 15 | 7 | 3 | 17 | 7 | 26 | 3 |
| 16 | 10 | 4 | 23 | 12 | 14 | 2 |
| 17 | 11 | 3 | 21 | 17 | 10 | 2 |
| 18 | 2 | 3 | 19 | 5 | 28 | 2 |
| 19 | 42 | 6 | 76 | 38 | 18 | 6 |
| 20 | 5 | 0 | 15 | 0 | 20 | 0 |
| 21 | 3 | 0 | 12 | 6 | 10 | 0 |
| 22 | 7 | 1 | 14 | 6 | 12 | 1 |
| X | 22 | 7 | 108 | 14 | 267 | 7 |
| Y | 37 | 7 | 85 | 10 | 48 | 4 |
| Alternative | 52 | N/A | 64 | N/A | N/A | N/A |
| Total | 480 | 150 | 1412 | 572 | 1766 | 149 |
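Per-chromosome counts like those above come from tallying search hits by the genomic sequence they land on. A minimal sketch, assuming tBLASTn was run with tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The e-value cutoff, the query name `ERVK_PR`, and the example lines are hypothetical; the study's actual search parameters are not given here.

```python
from collections import Counter


def hits_per_chromosome(outfmt6_lines, max_evalue=1e-5):
    """Count tBLASTn hits per subject sequence from BLAST tabular (-outfmt 6) lines.

    Column 2 (index 1) is the subject ID, here a chromosome name;
    column 11 (index 10) is the e-value. The cutoff is an illustrative filter.
    """
    counts = Counter()
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            counts[subject] += 1
    return counts


# Hypothetical hit lines for illustration only.
example = [
    "ERVK_PR\tchr1\t85.0\t99\t15\t0\t1\t99\t1000\t1297\t1e-40\t150",
    "ERVK_PR\tchr1\t80.2\t95\t19\t0\t3\t97\t5000\t5285\t1e-30\t120",
    "ERVK_PR\tchrY\t70.0\t90\t27\t0\t5\t94\t700\t970\t1e-3\t40",
]
print(hits_per_chromosome(example))  # Counter({'chr1': 2})
```

The weak chrY hit falls above the assumed cutoff and is dropped; relaxing `max_evalue` would include it.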
Number of sequences classified into each category from each search method (NT, nucleotide sequences; AA, amino acid sequences).
| Group | PR (NT) | PR (AA) | RVP (NT) | RVP (AA) | RT (NT) | RT (AA) | RVT_1 (NT) | RVT_1 (AA) |
|---|---|---|---|---|---|---|---|---|
| ERVP | 12 | 12 | 1 | 1 | ||||
| ERV9 | 34 | 31 | 114 | 114 | 22 | 22 | ||
| ERVW | 13 | 13 | 53 | 53 | 10 | 9 | ||
| ERVH | 24 | 22 | 62 | 61 | 11 | 9 | ||
| ERVF | 2 | 2 | ||||||
| ERVFXA | 7 | 7 | ||||||
| ERVFB | 1 | 1 | ||||||
| ERVRB | 1 | 2 | 2 | |||||
| ERVI | 20 | 18 | ||||||
| ERVE | 3 | 3 | 20 | 20 | 3 | 3 | ||
| ERV3 | 2 | 2 | 2 | 2 | 1 | 1 | ||
| ERVT | 4 | 4 | 4 | 4 | ||||
| Gammaretrovirus total | 81 | 75 | 283 | 280 | 51 | 48 | ||
| HK1 | 39 | 39 | 1 | 1 | 16 | 16 | 2 | 2 |
| HK2 | 62 | 60 | 32 | 20 | 53 | 53 | 36 | 35 |
| HK4 | 7 | 7 | 2 | 2 | 6 | 6 | 2 | 2 |
| HK9 (K14C) | 15 | 12 | 2 | 2 | 5 | 5 | 2 | 2 |
| HK10 (KC4) | 5 | 1 | 1 | |||||
| HK3 | 216 | 215 | 22 | 22 | 81 | 80 | 14 | 14 |
| HK5 | 36 | 36 | 38 | 38 | 1 | 1 | ||
| HK6 | 52 | 54 | 11 | 11 | ||||
| HK7 | 3 | 3 | 2 | 2 | 10 | 10 | 7 | 7 |
| HK8 | 34 | 35 | 8 | 8 | 35 | 34 | 17 | 16 |
| Betaretrovirus total | 464 | 461 | 69 | 57 | 260 | 254 | 82 | 79 |
| ERVL | 6 | 6 | 5 | 4 | ||||
| UNPLACED | 16 | 19 | 0 | 18 | 10 | 19 | 10 | 17 |
| INTERNAL | 1 | 1 | ||||||
| Total | 480 | | 150 | | 572 | | 149 | |
Frequency of co-occurrence of the observed ERVK protease active-site motifs (B1) and associated helix motifs (C2). Number gives the count of sequences carrying each B1/C2 combination.
| Number | B1 | C2 | Number | B1 | C2 | Number | B1 | C2 |
|---|---|---|---|---|---|---|---|---|
| 52 | DTGAD | GRDLL | 17 | DTGVD | GRDLL | |||
| 4 | DTRAD | GRDLL | 1 | DTVVD | GRDLL | 1 | DTGDD | GRDLL |
| 4 | DTEAD | GRDLL | 1 | DTVAD | GRDLL | 1 | DTGAN | GRDLL |
| 3 | DTGSD | GRDLL | 1 | DTGVD | GRHLL | 1 | DTGAN | GKDLL |
| 3 | DTGED | GRDLL | 1 | DTGVD | GKDLL | 1 | DTGAD | GRELL |
| 3 | DTGAD | GQDLL | 1 | DTGTD | GRDLL | 1 | DTGAD | GRDIL |
| 3 | DTGAD | GKDLL | 1 | DTGMD | GRDLL | 1 | DTGAD | GQELL |
| 3 | DTDAD | GRDLL | 1 | DTGID | GRDLL | 1 | DTAAD | GRDLL |
| 2 | DTEVD | GRDLL | 1 | DTGGD | GRDLL | |||
| 4 | DTGAD | RRDLL | 1 | DTRSD | GRDLL | 1 | DTGAD | GHLL |
| 3 | DTGAD | ERDLL | 1 | DTRMD | GKEIY | 1 | DTEAD | GQDLL |
| 2 | HTGAD | GRDLL | 1 | DTRAD | GWDPL | 1 | DRGMD | GRDLL |
| 2 | DTVVD | GGTLL | 1 | DTGVD | LSPHTFI | 1 | DMGAD | DQDLL |
| 2 | WAWAV | GWDLL | 1 | DTGVA | GRDLL | 1 | DMGAD | |
| 1 | VTGVD | GRDLL | 1 | DTGPD | | 1 | DIGVD | GRDLL |
| 1 | ITWGR | GVDN | 1 | DTGVD | RRDLL | 1 | DIGAD | GGDLL |
| 1 | | GWDLL | 1 | DTGAD | VWDLL | 1 | DIGAD | ERDLLL |
| 1 | ETGVD | GWDLL | 1 | DTGAD | GVDLL | 1 | DAGAD | GRDLL |
| 1 | DTVAD | GGDLL | 1 | DTGAD | GTRPI | 1 | ATGAD | GRDLL |
| 1 | DTRTD | GRDVL | 1 | DTGAD | GRYLL | 1 | ALGAD | RRDLL |
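The counts in this table amount to tallying (B1, C2) motif pairs across sequences. A minimal sketch, assuming the two motifs have already been extracted from each protease sequence as strings. It also flags whether B1 begins with the core aspartic protease triad Asp, Thr/Ser, Gly, written here as the regular expression `^D[TS]G`. The function names and the short input list (drawn from the table for illustration, not the full dataset) are hypothetical.

```python
import re
from collections import Counter

# Core aspartic protease active-site triad: Asp, then Thr or Ser, then Gly.
ACTIVE_SITE = re.compile(r"^D[TS]G")


def tally_motif_pairs(pairs):
    """Count each (B1, C2) combination; B1 is the active-site motif, C2 the helix motif."""
    return Counter(pairs)


def has_canonical_active_site(b1):
    """Return True if the B1 motif starts with the D-T/S-G triad."""
    return bool(ACTIVE_SITE.match(b1))


# Illustrative motif pairs taken from the table (not the full dataset).
pairs = (
    [("DTGAD", "GRDLL")] * 3
    + [("DTGVD", "GRDLL")] * 2
    + [("HTGAD", "GRDLL"), ("WAWAV", "GWDLL")]
)
counts = tally_motif_pairs(pairs)
for (b1, c2), n in counts.most_common():
    label = "canonical" if has_canonical_active_site(b1) else "non-canonical"
    print(n, b1, c2, label)
```

A regular expression keeps the active-site check explicit; it only tests the motif string and says nothing about whether a given protease is catalytically active.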
RNA-Seq Libraries from the Sequence Read Archive analyzed in this study.
| Accession | Condition | Tissue | Library | Sequencing platform | Samples | Reference |
|---|---|---|---|---|---|---|
| SRP064478 | sALS | Cervical spinal cord | TruSeq Stranded Total RNA HT Kit (Illumina) | NextSeq 500 from ∼260 bp fragments | 7 sALS, 8 non-ALS, without technical replication | |
| ERP000550 | Prostate cancer | Prostate | Oligo dT purification, fragmentation, and random hexamer PCR | HiSeq 2000 2 × 90 bp from ∼200 bp fragments | 14 tumor/non-tumor pairs, each from the same patient, without technical replication | |
| SRP058722 | High-grade ductal carcinoma | Breast | ScriptSeq v2 RNA-Seq Kit (Epicentre) | HiSeq 2000 2 × 76 bp, fragment length unreported | 25 ductal carcinoma samples | |
Databases and software used in this study.
| Name | URL | Version or accession date |
|---|---|---|
| Bowtie2 | | 2.3.3 |
| BLAST | | 2.2.28 |
| Cytological bands | | 2015-11-20 |
| FastQC | | 0.11.5 |
| FigTree | | 1.4.1 |
| Geneious | | |
| GenomeTools | | 1.5.1 |
| GIMP | | 2 |
| GRCh38 | | 2014-08-27 |
| HMMER | | 3.1b1 |
| HTSlib | | 1.5-22-g7a6854b |
| Java | | 1.8.0-144 |
| jModelTest | (D Darriba) | 2 |
| MACSE | (V Ranwez) | 1.01b |
| Pfam-A | | 2014-09-05 |
| ProtTest | (D Darriba) | 3 |
| RAxML | | 7.2.8 |
| Repbase | (J Jurka) | 19.07 |
| rsync | | v3.0.6, protocol v30 |
| Samtools | | 1.5-9-g473d6a4 |
| SRA Toolkit | | 2.8.2-1 |
| Trimmomatic | | 0.36 |