Witold Tatkiewicz, James Dickie, Franchesca Bedford, Alexander Jones, Mark Atkin, Michele Kiernan, Emmanuel Atangana Maze, Bora Agit, Garry Farnham, Alexander Kanapin, Robert Belshaw.
Abstract
BACKGROUND: The cell-surface attachment protein (Env) of the HERV-K(HML-2) lineage of endogenous retroviruses is a potentially attractive tumour-associated antigen for anti-cancer immunotherapy. The human genome contains around 100 integrated copies (called proviruses or loci) of the HERV-K(HML-2) virus and we argue that it is important for therapy development to know which and how many of these contribute to protein expression, and how this varies across tissues. We measured relative provirus expression in HERV-K(HML-2), using enriched RNA-Seq analysis with both short- and long-read sequencing, in three Mantle Cell Lymphoma cell lines (JVM2, Granta519 and REC1). We also confirmed expression of the Env protein in two of our cell lines using Western blotting, and analysed provirus expression data from all other relevant published studies.
Keywords: Cancer immunotherapy; HERV-K; HERV-K(HML-2); Leukemia; NGS; RNA-Seq; Transcriptomics; Transposable element; minION
Year: 2020 PMID: 32055257 PMCID: PMC7007669 DOI: 10.1186/s13100-020-0204-1
Source DB: PubMed Journal: Mob DNA
Table 1 Details of HML-2 Env-encoding proviruses^a

| Provirus name | Other common names | Genome coordinates (orientation) | GenBank accession | Percentage of population with provirus^b | Provirus age (my) | Full-length ORFs |
|---|---|---|---|---|---|---|
| 6q14.1 | K109 | chr6:78427019–36083(−) | AF164615 | 100 | < 2 | gag, env^c |
| 7p22.1a & b^d | HML-2.HOM, K108 L & R | chr7:4622057–40031(−) | AC072054 | 100 | < 2 | a = pol, env^c; b = pol, env |
| 8p23.1a | K115 | chr8:7355397–64859(−) | AY037929 | 15 | 5^e | pol, env^c,f |
| 11q22.1 | K118 | chr11:101565794–75259(+) | N/A | 78 | < 2 | pol^g |
| 12q14.1 | K119 | chr12:58721242–30698(−) | AC074261 | 67 | < 2 | gag, env^c |
| 19p12b | K113 | chr19:21841539^h | AY037928 | 10–30 | < 0.5 | gag, pol, env^c |
| 19q11 | ERVK-19 | chr19:28128498–37361(−) | Y17833 | 100 | < 5 | gag, pol, env^c |
| Xq21.33 | N/A | chrX:93606603^h | N/A | < 5 | 0.67–1.3 | gag, pol, env |

^a Data, including name, from ref. [20] (genome coordinates from GRCh37/hg19) unless otherwise indicated
^b Data on the proportion of individuals carrying the full-length provirus taken from refs [18, 21, 22]; note that provirus 7p22.1 is polymorphic for the tandem duplication
^c Protein expression shown by transfection [23] (12q14.1 and 19q11 identified as K74261 and K17833, respectively, in that study)
^d Tandem duplication
^e LTR divergence suggests an age of 5–9 my for 8p23.1a [20] but, because the provirus is human-specific, the integration date must be at the lower boundary (human-chimpanzee divergence is ~5 mya)
^f The one-nucleotide deletion in gag may be a sequencing error
^g As mentioned in the main text, this provirus has a premature stop codon 38 amino acid positions before the normal terminus in env, which might not prevent expression at the cell surface
^h Pre-integration site coordinates (Xq21.33 from ref. [18])
Fig. 1 Western blot showing Env expression in JVM2 and REC1. MCF7 is present as a positive control. The uncleaved ~100 kDa full-length Env protein is clearly present in all cell lines. Other bands represent cleavage products, multiple glycosylation states and, at 55 kDa, non-specific binding (see Methods)
Fig. 2 Illustrated summary of workflow in our study. See Methods for details
Table 2 Summary of sequencing results for the MCL cell lines^a

| Cell line | JVM2 | JVM2 | JVM2 | JVM2 | JVM2 | G519 | REC1 |
|---|---|---|---|---|---|---|---|
| Sequencing method | Ion Torrent (short-read) | Ion Torrent (short-read) | Ion Torrent (short-read) | Ion Torrent (short-read) | MinION (long-read) | Ion Torrent (short-read) | Ion Torrent (short-read) |
| Enrichment | No | Yes (Growth 1) | Yes (Growth 2) | Yes (Growth 3) | Yes | Yes | Yes |
| Total reads after QC | 3,255,142 | 2,749,743 | 2,672,868 | 3,508,762 | 218,872 | 2,839,730 | 3,073,933 |
| Total reads assigned to HML-2 (%) | 113^b (0.003%) | 155,700 (5.6%) | 232,687 (8.7%) | 101,495 (2.9%) | 14,147 (6.9%) | 113,807 (4.0%) | 28,014 (0.91%) |
| Percentage reads from Env-encoding proviruses^c | N/A | 5.3% | 15.9% | 12.8% | 20.8% | 17.2% | 2.3% |

^a Raw Ion Torrent run reports are shown in Additional file 3
^b After excluding the probably artifactual 52 hits to provirus 9q34.3 (see Additional file 2)
^c Percentage of reads mapping to Env-encoding proviruses was calculated after normalisation via conversion to RPKM values (Reads Per Kilobase of transcript per Million mapped reads)
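
The RPKM normalisation described in the footnote above can be illustrated with a minimal Python sketch. The function names, provirus labels, read counts and lengths here are hypothetical placeholders rather than values from the study, and using the sum of the supplied counts as the "mapped reads" total is an assumption.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def env_share_percent(counts, lengths_bp, env_encoding):
    """Percentage of RPKM-normalised expression from Env-encoding proviruses.

    counts and lengths_bp map a provirus name to its read count and length;
    env_encoding is the set of names treated as Env-encoding.
    """
    # Assumption: the denominator is the total of the counts supplied here.
    total_mapped = sum(counts.values())
    values = {
        name: rpkm(counts[name], lengths_bp[name], total_mapped)
        for name in counts
    }
    env_total = sum(v for name, v in values.items() if name in env_encoding)
    return 100.0 * env_total / sum(values.values())


# Hypothetical example: two proviruses with equal per-kilobase coverage.
example_counts = {"provirus_A": 100, "provirus_B": 300}
example_lengths = {"provirus_A": 1000, "provirus_B": 3000}
print(env_share_percent(example_counts, example_lengths, {"provirus_A"}))
```

Because the total number of mapped reads is a common factor across proviruses, it cancels in the percentage; the normalisation matters here because it corrects for differences in provirus length.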
Fig. 3 Relative expression of HML-2 proviruses in our study. a All experiments with default mapping. b Default mapping in JVM2 compared to counting only unique mappings and the results of a search for unique SNPs (data in Table 3; mean number of SNP hits calculated). Colours match those in Fig. 4. Env-encoding proviruses listed in same order in each bar. Provirus age and full-length ORFs indicated [20] (provirus 11q22.1 has a premature stop codon near its Env C-terminus). Genomic coordinates in Table 1 or as follows: 1p31.1 = chr1:75842771–9143; 1q21.3 = chr1:150605284–8361; 1q23.3 = chr1:160660575–9806; 1q22 = chr1:155596457–605636; 1q32.2 = chr1:207808457–12636; 3q12.3 = chr3:101410737–9859; 3q21.2 = chr3:125609302–18416; 5q33.3 = chr5:156084717–93896; 7q22.2 = chr7:104388369–93266; 11q12.3 = chr11:62135963–50563; 19p12a = chr19:20387400–97512; 19p12c = chr19:22757824–64561; 22q11.21 = chr22:18926187–35307; 22q11.23 = chr22:23879930–88810
Table 3 Determining which of the Env-encoding proviruses are likely to contribute to protein expression in the JVM2 cell line

| Provirus | Number of uniquely mapping short reads (long reads in parentheses)^a | Unique SNP^b | Number of short reads with unique SNP allele (long reads in parentheses)^c |
|---|---|---|---|
| 6q14.1 | 31 (0) | T(655)C | 0 (0) |
| | | G(799)A | 15 (0) |
| | | G(806)A | 9 (0) |
| 7p22.1a + b^d | 83 (0) | C(51)T | 2 (0) |
| | | T(371)C | 0 (0) |
| | | A(1116)G | 0 (0) |
| 8p23.1a | 280 (1) | C(293)T | 0 (0)^e |
| | | G(958)A | 365 (3) |
| | | G(1707)C | 47 (0) |
| | | C(1983)T | 1 (10) |
| 11q22.1 | 33 (0) | C(537) | 3 (0)^f |
| | | G(1804)A | 0 (2)^f |
| | | G(2005)A | 68 (3)^f |
| 12q14.1 | 962 (9) | G(96)A | 275 (14) |
| | | T(465)C | 12 (0) |
| | | C(586)A | 89 (0)^g |
| | | C(1484)T | 75 (0) |
| 19p12b | 17 (1) | C(421)T | 0 (0)^h |
| | | C(970)G | 0 (0) |
| | | C(1885)A | 0 (1)^i |
| | | A(1996)C | 0 (0) |
| 19q11 | 19 (1) | A(657)T | 0 (1) |
| | | T(1355)C | 0 (1) |
| | | T(1416)A | 0 (0) |
| | | T(1416)A | 0 (0)^f |
| Xq21.33 | 52 (0) | C(52)A | 0 (0) |
| | | G(827)A | 16 (9) |
| | | G(1219)A | 0 (1) |

^a All multi-mapping short reads excluded. All long reads have a mapping quality score of at least 20 (equivalent to a mapping error of p = 0.01)
^b Env SNPs with an allele that is found only in a single provirus. Positions relate to the Env alignment available in Additional file 9, with ancestral state inferred by commonality. In a few instances there is a second SNP within the 31 nt sequence and it is the combination that is unique
^c Average number of short-read matches to a 31 nt sequence spanning the SNP that are unique to the provirus, with the corresponding result from the single MinION run (17 nt match) in parentheses
^d Because these proviruses are almost identical (resulting from a recent tandem duplication), and hence each would have few uniquely mapping reads, we repeated the analysis with provirus 7p22.1a deleted
^e SNP allele also present in unexpressed provirus Xq11.1
^f SNP allele also present in several other proviruses
^g SNP allele also in provirus 5p12, which has only 5 unique short-read hits
^h SNP allele also in unexpressed provirus 1q24.1
^i SNP allele also in the expressed provirus 6p21.1
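
Two calculations underlie the footnotes above: converting a Phred-scaled mapping quality into an error probability, and counting reads that contain a short sequence spanning a provirus-specific SNP allele. The Python sketch below illustrates both under stated assumptions. The function names and toy reads are hypothetical, exact substring matching on both strands is an assumption, and this is not presented as the authors' pipeline.

```python
def mapq_to_error_prob(mapq):
    """Phred-scaled quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-mapq / 10)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_snp_hits(reads, probe):
    """Count reads containing the probe exactly, on either strand.

    probe is a short sequence spanning the SNP with the provirus-specific
    allele (31 nt for short reads and 17 nt for long reads in the table).
    """
    probe = probe.upper()
    probe_rc = reverse_complement(probe)
    return sum(
        1 for read in reads
        if probe in read.upper() or probe_rc in read.upper()
    )


# Toy example with a hypothetical 5 nt probe; real probes would be longer.
toy_reads = ["GGACGTTGG", "CCAACGTCC", "TTTTTTT"]
print(mapq_to_error_prob(20))
print(count_snp_hits(toy_reads, "ACGTT"))
```

Exact matching is a deliberately strict choice: a read carrying a sequencing error inside the probe window is missed, which may be one reason a shorter match length is practical for error-prone long reads.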
Fig. 4 Relative expression of HML-2 proviruses in our and other studies. Relative expression of proviruses is shown as thickness of the pie slice. Env-encoding proviruses are indicated with an asterisk. Our three Mantle Cell Lymphoma cell lines, JVM2 (Ion Torrent 1–3 and MinION), G519 and REC1, are compared to published data from healthy donor lymphocytes and from other cancer cell lines and tissues (see text and Additional file 6 for details). The sequencing method is shown in parentheses after the name. Results for Sanger and SMRT (Single Molecule Real Time) sequencing of three prostate biopsies are shown (one above the other; PN233 is benign, the other two are cancerous), but note that the absence of provirus 22q11.23 from the Sanger sequencing is an artifact of the RT-PCR primers used, which incidentally were the same as those used in the melanoma and other cancers analysed by Sanger sequencing. Results from two other lymphocyte donors not shown are very similar to the three shown here. The tandem duplicates 7p22.1a + b (which have identical env sequences) are treated as one provirus in most studies, so their expression values are combined here. Raw data available in Additional file 11
Fig. 5 Problems in identifying proviruses from sequenced env transcripts or proteoforms. Hypothetical unique alleles in single nucleotide polymorphisms (SNPs) or single amino acid variants (SAAVs) are represented as coloured vertical bars (absence of the coloured bar denotes presence of the alternate variant) and premature stop codons are represented as an asterisk. The figure shows possible difficulties that may arise in attempting to determine which proviruses gave rise to the Env protein in a patient or cell line. See Additional file 7 for further explanation of the mechanisms