Search for an aetiological virus candidate in chronic lymphocytic leukaemia by extensive transcriptome analysis.

Natalia Rego, Sergio Bianchi, Pilar Moreno, Helena Persson, Anders Kvist, Alvaro Pena, Pablo Oppezzo, Hugo Naya, Carlos Rovira, Guillermo Dighiero, Otto Pritsch.

Abstract

As an approach to determining the aetiology of chronic lymphocytic leukaemia (CLL), we searched for a virus expressed in human CLL B-cells by combining high-throughput sequencing and digital subtraction. Pooled B-cell mRNA transcriptomes from five CLL patients and five healthy donors were sequenced with 454 Life Sciences technology. Human reads were excluded by BLAST (Basic Local Alignment Search Tool) and BLAT (BLAST-like alignment tool) searches. Remaining reads were screened with BLAST against viral databases. Purified B-cells from two CLL patients, with and without stimulation by phorbol-esters, were sequenced using Illumina technology to achieve depth of sequencing. Burrows-Wheeler Aligner mapping and BLAST searches were used for the Illumina data. Pyrosequencing resulted in about 400 000 reads per sample. No viral candidate could be found. Illumina single-end sequencing for 115 cycles yielded an average of 26 ± 2·5 million filtered reads per sample, of which 2·2 ± 0·6 million remained unmapped to human references. BLAST searches of these reads against viral and human databases assigned nine reads to an Epstein-Barr virus origin, in one sample following phorbol-ester stimulation. Other reads showing a putative viral origin were dismissed after further analysis. Despite an in-depth analysis of the CLL transcriptome reaching more than 100 million sequences, we have not found evidence for a putative viral candidate in CLL.

Year: 2012 PMID: 22489537 PMCID: PMC7161782 DOI: 10.1111/j.1365-2141.2012.09116.x

Source DB: PubMed Journal: Br J Haematol ISSN: 0007-1048 Impact factor: 6.998

Introduction

Chronic lymphocytic leukaemia (CLL) is the most common form of leukaemia in Western countries and mainly affects elderly individuals. It follows an extremely variable course, with survival ranging from months to decades. Available treatments often induce disease remission, but almost all patients will relapse and there is a consensus that CLL remains incurable. Multiple instances of the disease in some families and a low incidence among individuals of Japanese origin (including those who migrated to Hawaii) suggest that genetic influences are stronger than environmental factors in its pathogenesis (Weiss, 1979). Genome‐wide association studies have detected some loci influencing CLL risk (Sellick et al, 2007; Crowther‐Swanepoel et al, 2010; Slager et al, 2011) and recent whole‐genome and whole‐exome sequencing studies addressed the repertoire of somatic mutations and other genetic lesions in the disease (Fabbri et al, 2011; Puente et al, 2011; Wang et al, 2011; Quesada et al, 2012). Among the few genes found recurrently mutated, NOTCH1 and SF3B1 emerged as predictors of poor survival (Sportoletti et al, 2010; Fabbri et al, 2011; Puente et al, 2011; Rossi et al, 2011; Wang et al, 2011; Quesada et al, 2012). Several studies have started to shed light on the nature of the genetic predisposition to CLL; however, none of the reported genetic aberrations (mutations, deletions or trisomies) has been shown to be a major cause of the disease and the basis of this disorder remains unknown (Sellick et al, 2007; Dighiero & Hamblin, 2008).

Since Peyton Rous's discovery in 1911 that avian sarcoma could be transmitted by non‐cellular filtrates, followed by the discovery of mammary tumour virus, murine leukaemia virus and polyoma virus between 1931 and 1958, at least six different viruses have been implicated in about 15% of human cancers (Klein, 2002). They include the DNA viruses Epstein‐Barr virus (EBV), human papilloma virus, hepatitis B virus and human herpesvirus 8 (HHV‐8), as well as RNA viruses, such as hepatitis C virus and human adult T‐cell leukaemia virus (HTLV). Class I or direct‐transforming RNA tumour viruses carry cellular oncogenes but are not known to play any role in tumour causation in nature. Class II or chronic RNA tumour viruses do not carry cell‐derived oncogenes but act through proviral DNA insertion into the immediate neighbourhood of a cellular oncogene. Feline, murine, and avian leukaemia viruses belong to this category. HTLV and bovine leukaemia virus (BLV) expand the pre‐neoplastic cell population through transactivation induced by the viral Tax protein, thereby providing the seed for secondary cellular changes.

An animal model of CLL, enzootic bovine leukaemia (EBL), is induced by BLV, a retrovirus belonging to the Deltaretrovirus genus and closely related to HTLV‐1. This virus induces hyperlymphocytosis of CD5+ B‐cells in a significant number of animals and causes aggressive disease in about 10% of cases (Gillet et al, 2007). Despite this suggestive animal model, little research using classical polymerase chain reaction (PCR) technology has been devoted to this issue, and no known virus has been identified in human CLL (Hermouet et al, 2003). Attempts have been made to associate EBV with CLL, but accumulated data argue against a role of EBV in the early development of the disease and suggest only a possible involvement in the generation of secondary malignancies (Tsimberidou et al, 2006; Dolcetti & Carbone, 2010; Tarrand et al, 2010). Recent studies have also addressed the presence of Merkel cell polyomavirus (MCPyV) in CLL (Koljonen et al, 2009; Shuda et al, 2009; Pantulu et al, 2010; Tolstov et al, 2010; Teman et al, 2011), but the results regarding MCPyV DNA are contradictory, showing an extremely low MCPyV copy number when the virus is detected at all. Given the high seroprevalence of this virus in the analysed populations and the progressive immunodeficiency associated with CLL, MCPyV detection is more likely explained by background infection or low‐level viral reactivation in CLL‐induced immunodeficiency (Koljonen et al, 2009; Shuda et al, 2009; Pantulu et al, 2010; Tolstov et al, 2010; Teman et al, 2011).

In recent years, molecular techniques have been successfully applied in the identification of infectious agents, such as Borna virus, Kaposi sarcoma‐associated herpesvirus (HHV‐8), West Nile virus, and the Severe Acute Respiratory Syndrome (SARS) coronavirus. Such efforts fail, however, when the agents in question are truly novel or too distant in sequence from related agents to allow hybridization, or if they are poorly expressed at the RNA level. The advent of high‐throughput sequencing in combination with bioinformatics analysis could pave the way to the discovery of new infectious pathogens. This approach led to the identification of a new species of arenavirus in patients suffering from an infectious disease for which a causative pathogen could not be detected using any of the available diagnostic procedures (Palacios et al, 2008). Similarly, the novel MCPyV was characterized from Merkel cell carcinoma samples and recognized as a contributing factor in the pathogenesis (Feng et al, 2008). Thus, this methodology is suitable for applications in which the infectious pathogen is unknown. In the current study, we searched for the presence of a virus expressed at the RNA level in human CLL, by using massive sequencing technology associated with a digital transcriptome subtraction method.

Methods

Patients

Peripheral blood samples were obtained from seven CLL patients meeting the diagnostic criteria of the International Workshop on Chronic Lymphocytic Leukaemia‐sponsored Working Group (Hallek et al, 2008) and five healthy donors. Written informed consent was obtained in accordance with the ethical regulations of Uruguay and the Declaration of Helsinki. The selected CLL patients were diagnosed with a progressive phenotype (Table 1). Peripheral blood mononuclear cells (PBMCs) were isolated by centrifugation on Ficoll‐Hypaque (Pharmacia Fine Chemicals, Uppsala, Sweden) and immediately cryopreserved in liquid nitrogen.

Table 1

Clinical and molecular characteristics of patients

Patient	Sex	Age (years)	Binet	CD38	LPL	MS	Outcome
454 pool
CLL 046	F	45	A	Neg	Pos	UM	Progressor
CLL 072	M	61	C	Neg	Pos	UM	Indolent
CLL 080	F	63	C	Neg	Pos	UM	Progressor
CLL 083	M	60	B	Pos	ND	UM	Progressor
CLL 096	M	70	C	Neg	Neg	UM	Progressor
Illumina
CLL 238	M	76	C	Neg	Neg	M	Progressor
CLL 250	M	57	B	ND	Pos	UM	Progressor

Binet, Binet stage; LPL, lipoprotein lipase; MS, mutational status; F, female; M, male; Neg, negative; Pos, positive; ND, no data; UM, unmutated; M, mutated.

454 Life Sciences pyrosequencing

Total RNA from five CLL patients and five healthy donors was extracted using the RNeasy Midi Kit (Qiagen, Alameda, CA, USA). The integrity of the RNA was analysed on the Agilent 2100 Bioanalyser (Quantum Analytics, Foster City, CA, USA). Poly‐adenylated RNA was purified with Dynabeads® mRNA Purification Kit (Invitrogen, Carlsbad, CA, USA). Two double‐stranded cDNA samples were synthesized for a pool of the five CLL patients and a pool of the normal individuals with oligo(dT) primer using the SuperScript Double‐stranded cDNA Synthesis Kit (Invitrogen). Libraries were made from both pools and each library was pyrosequenced on one full LR70 plate (two plates total) using the Standard 454 Life Sciences FLX Sequencing Chemistry (454 Life Sciences, Branford, CT, USA).

Illumina Genome Analyzer sequencing

Two additional CLL patients were sequenced individually. Purification of the B‐cell population was performed by flow cytometry (MoFlo cell sorter; Beckman Coulter Inc., Brea, CA, USA) with phycoerythrin‐conjugated anti‐CD19 monoclonal antibody (DAKO, Glostrup, Denmark). The purity of isolated sub‐populations was shown to be greater than 97% after flow cytometric evaluation. For activation of isolated B‐cells, 12‐O‐tetradecanoylphorbol‐13‐acetate (TPA) was dispensed from a 0·3 mmol/l stock solution in dimethyl sulfoxide to a final concentration of 0·15 μmol/l. Following TPA addition, CLL B‐cells were incubated for 17 h. Maturation of TPA‐treated cells was documented by a 20% increase in mean cell volume and by characteristic changes in cell morphology. Total RNA from activated and non‐activated B‐cells was isolated with TRIzol® (Invitrogen) and quality control was performed as described above. Libraries were prepared using a slightly modified version of the pre‐release protocol for directional RNA‐Seq available from Illumina (Illumina Inc., San Diego, CA, USA). Normalization using duplex‐specific thermostable nuclease (DSN; Evrogen, Moscow, Russia) was then performed before the cluster generation step. The normalization is based on denaturation of DNA libraries followed by addition of DSN after partial renaturation. Molecules derived from highly abundant transcripts, such as ribosomal RNA, tRNA and house‐keeping genes, rapidly renature to form double‐stranded DNA and are degraded by the DSN enzyme, while DNA molecules derived from less abundant transcripts are preserved (Zhulidov et al, 2004; Bogdanova et al, 2009). Single‐read sequencing was conducted for 115 cycles on the Illumina Genome Analyzer IIx, using one flowcell lane per library. Base calling and quality scores were produced using the Illumina Genome Analyzer software v1.6.
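The TPA dilution stated above can be checked with simple C1·V1 = C2·V2 arithmetic. The sketch below is illustrative only: the 1 ml culture volume is a hypothetical value that the text does not report, and the function names are ours.

```python
# Dilution arithmetic for the TPA activation step (C1 * V1 = C2 * V2).
# Concentrations come from the text; the culture volume is a hypothetical example.

STOCK_UMOL_PER_L = 0.3 * 1000   # 0.3 mmol/l stock, expressed in umol/l
FINAL_UMOL_PER_L = 0.15         # final concentration in the culture, umol/l


def dilution_factor(stock, final):
    """Fold dilution needed to go from the stock to the final concentration."""
    return stock / final


def stock_volume_ul(culture_volume_ml, stock, final):
    """Microlitres of stock to add to a culture of the given volume (in ml)."""
    return culture_volume_ml * 1000 * final / stock


factor = dilution_factor(STOCK_UMOL_PER_L, FINAL_UMOL_PER_L)
volume = stock_volume_ul(1.0, STOCK_UMOL_PER_L, FINAL_UMOL_PER_L)
print(f"dilution: 1:{factor:.0f}; stock per 1 ml culture: {volume:.2f} ul")
```

Under these assumptions the stock is diluted 2000-fold, which also keeps the dimethyl sulfoxide carried over into the culture at a small fraction of the final volume.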

Bioinformatics analysis

Sequences were filtered and trimmed as appropriate for the sequencing technology used in each case. After removal of ribosomal sequences, reads were mapped against the human genome and transcriptome references downloaded from the UCSC (University of California, Santa Cruz) Genome Browser (Kent et al, 2002) and Ensembl (Flicek et al, 2011) databases. Remaining reads (i.e., those for which we could not determine a human origin) were screened through BLAST (Basic Local Alignment Search Tool) (Altschul et al, 1997) searches in RefSeq (Pruitt et al, 2009) viral databases to identify putative xenobiotic sequences, if present. These steps follow the main idea of the original digital subtraction proposal (Weber et al, 2002), but we introduced modifications or changed alignment tools in order to cope with the different attributes of the sequence data produced by each technology (Fig 1A–B summarizes the pipelines employed for 454 and Illumina data, respectively). A detailed description of the analysis and simulations conducted is given in online supplementary Data S1, Fig S2 and Table SI.
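The logic of digital subtraction can be illustrated with a toy sketch. This is not the pipeline used in the study: exact k-mer sharing stands in for the BLAST, BLAT and BWA alignments, and the sequences, function names and thresholds below are all hypothetical.

```python
# Toy digital subtraction: drop reads that look human, then screen the
# remainder against viral references. Exact k-mer sharing is a deliberately
# crude stand-in for real alignment tools; all sequences are invented.

def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Union of the k-mers of every reference sequence."""
    index = set()
    for ref in references:
        index |= kmers(ref, k)
    return index


def hit_fraction(read, index, k):
    """Fraction of the read's k-mers that occur in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & index) / len(read_kmers)


def digital_subtraction(reads, human_refs, viral_refs, k=8, min_fraction=0.5):
    """Assign each read to 'human', 'viral_candidate' or 'unassigned'."""
    human_index = build_index(human_refs, k)
    viral_index = build_index(viral_refs, k)
    result = {"human": [], "viral_candidate": [], "unassigned": []}
    for name, seq in reads.items():
        if hit_fraction(seq, human_index, k) >= min_fraction:
            result["human"].append(name)            # subtracted
        elif hit_fraction(seq, viral_index, k) >= min_fraction:
            result["viral_candidate"].append(name)  # kept for inspection
        else:
            result["unassigned"].append(name)
    return result


HUMAN_REF = "ACGTACGTTGCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCC"
VIRAL_REF = "TTGACCGGTTAACCGGAATTCCGGTTAAGGCCTTAGGCCAATTGG"
reads = {
    "read_human": HUMAN_REF[5:30],
    "read_viral": VIRAL_REF[3:28],
    "read_unknown": "GGGGGGGGCCCCCCCCAAAAAAAATT",
}
result = digital_subtraction(reads, [HUMAN_REF], [VIRAL_REF])
print(result)
```

The ordering matters: human references are tested first, so a read that resembles both human and viral sequence is subtracted rather than reported as a candidate.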

Figure 1

Computational subtraction approach developed to identify non‐human sequences in the chronic lymphocytic leukaemia (CLL) transcriptome. (A) Subtraction strategy for 454 data. (B) Subtraction strategy for Illumina data.

Results

Analysis of 454 data

Pyrosequencing resulted in 400 324 and 454 150 raw reads for the pooled CLL and Normal mRNA samples, respectively (Fig S1). After a trimming and filtering step including removal of ribosomal reads, 376 731 and 433 255 reads remained for the CLL and Normal samples, respectively [CLL average length: 224 ± 56 nucleotides (nts); Normal average length: 241 ± 47 nts]. To isolate reads present only in the CLL pooled sample, an initial search using BLAST allowed us to discard 259 881 CLL reads that were also present in the Normal sample. The remaining sequences were sequentially aligned with BLAT (BLAST‐like alignment tool) (Kent, 2002) to the human transcriptome and genome, returning a list of 2968 candidate reads (reads with no or poor alignments). No significant matches were found for these reads in viral databases; the few recovered alignments were short and had high E‐values (Fig S1). Though we did not necessarily expect high similarity and alignment quality to known viral agents, when carefully re‐examined, these reads also displayed alignments of similar quality against the human transcriptome or genome. Thus, we were not able to find any putative sequence of viral origin (known or novel) that was present in the CLL pooled sample but absent in the Normal pool.
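The read counts reported for the 454 subtraction can be followed step by step with a few lines of arithmetic. The numbers are taken from the paragraph above; the number of reads passed on to BLAT is derived here by subtraction and is not stated explicitly in the text.

```python
# Bookkeeping for the 454 subtraction steps (CLL pool), using reported counts.
raw_cll = 400_324
filtered_cll = 376_731          # after trimming, filtering, rRNA removal
shared_with_normal = 259_881    # discarded by BLAST against the Normal pool
candidates = 2_968              # no or poor BLAT alignment to human references

to_blat = filtered_cll - shared_with_normal        # derived, not reported
candidate_pct = 100 * candidates / filtered_cll    # share of filtered reads

print(f"reads sent to BLAT: {to_blat}")
print(f"candidates as % of filtered CLL reads: {candidate_pct:.2f}")
```

Fewer than 1% of the filtered CLL reads survive to the viral database search, which is where the short, high E‐value alignments described above were found.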

Analysis of Illumina data

The 454 sequencing was limited to approximately 400 000 sequences from the CLL transcriptome, a sequencing depth that bioinformatics analysis suggested would be insufficient to detect rare viral transcripts (see details below). We therefore decided to study two additional patients using Illumina sequencing technology, which offers greatly increased depth at the expense of shorter read length. Both patients were diagnosed with a progressive malignancy, even in the IgG non‐mutated case (Table 1). Two substantial improvements in this dataset were the preparation of libraries from normalized total RNA (rather than only poly‐adenylated mRNAs) and a 90‐fold increase in the amount of reads, enabling the detection of transcripts with very low expression level. Furthermore, to exclude the possibility that the virus could be integrated in the genome and barely transcribed, we analysed the isolated cells prior to and after treatment with phorbol‐ester, a potent protein kinase C (PKC) activator (Blumberg, 1988). Illumina sequencing yielded 36 ± 2·7 million raw reads per sample, from which 26 ± 2·5 million reads per sample remained after filtering steps (Table 2, Data S1). As previously mentioned, given this very large amount of sequence data, changes to the mapping tools and final BLAST searches of the subtraction pipeline were necessary. To begin with, we used Burrows‐Wheeler Aligner (BWA) (Li & Durbin, 2009) to discard reads of human origin by successive mapping to ribosomal RNAs, the human genome and transcriptome databases. A summary of BWA mapping results (Table 2) shows that the ribosomal content of our total RNA libraries was 18·0% ± 3·3%, a figure comparable to other ribosomal depletion protocols. Regarding non‐ribosomal reads, 67·3% ± 6·2% and 7·2% ± 0·6% aligned to genome and transcriptome databases respectively, while 2·2 ± 0·6 million reads remained unmapped.
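The effect of sequencing depth on the chance of sampling a rare transcript can be illustrated with a Poisson model. This is a sketch under stated assumptions, not the analysis reported in Data S1: the viral abundance of one read per million is hypothetical, and reads are assumed to be sampled independently.

```python
import math

# Probability of observing at least k reads from a transcript that makes up
# a given fraction of the library, under a Poisson sampling model.
# The abundance used below (1 read per million) is a hypothetical value.

def prob_at_least(k, n_reads, fraction):
    expected = n_reads * fraction
    below_k = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below_k


VIRAL_FRACTION = 1e-6            # assumption, for illustration only
p_454 = prob_at_least(1, 400_000, VIRAL_FRACTION)
p_illumina = prob_at_least(1, 26_000_000, VIRAL_FRACTION)
print(f"454 (~400 000 reads): P(at least 1 read) = {p_454:.2f}")
print(f"Illumina (~26 million reads): P(at least 1 read) = {p_illumina:.6f}")
```

With this assumed abundance a 454 run would miss the transcript about two times out of three, whereas a single Illumina lane would be expected to sample it roughly 26 times.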

Table 2

Illumina sequencing, filtering and mapping statistics

Reads	CLL250	CLL250_act	CLL238	CLL238_act
Raw counts	34 248 055	38 889 936	33 948 020	38 421 323
Filtering steps
Purity filter	28 771 020	32 477 113	29 210 502	31 924 855
Additional filter	22 479 555	27 956 701	26 039 540	27 613 647
Burrows‐Wheeler Aligner mapping steps
Ribosomal	5 050 513	5 022 371	3 851 214	4 627 416
Genomic	13 790 644	18 914 191	18 738 268	17 791 494
Transcriptomic	1 634 399	2 050 829	1 677 217	2 144 262
Final unmapped	2 004 005	1 969 310	1 772 841	3 050 475
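Several summary figures quoted in the text can be recomputed from Table 2. The sketch below uses the sample standard deviation (n − 1) and takes the 'Additional filter' row as the denominator for the ribosomal percentage; both choices are our assumptions, made because they reproduce the reported values.

```python
from statistics import mean, stdev

# Values copied from Table 2 (reads per sample).
samples = ["CLL250", "CLL250_act", "CLL238", "CLL238_act"]
filtered = [22_479_555, 27_956_701, 26_039_540, 27_613_647]   # 'Additional filter'
ribosomal = [5_050_513, 5_022_371, 3_851_214, 4_627_416]
unmapped = [2_004_005, 1_969_310, 1_772_841, 3_050_475]


def mean_sd(values):
    """Mean and sample standard deviation (n - 1 denominator)."""
    return mean(values), stdev(values)


filtered_m = mean_sd([v / 1e6 for v in filtered])
unmapped_m = mean_sd([v / 1e6 for v in unmapped])
ribo_pct = mean_sd([100 * r / f for r, f in zip(ribosomal, filtered)])

print(f"filtered reads: {filtered_m[0]:.1f} +/- {filtered_m[1]:.1f} million")
print(f"unmapped reads: {unmapped_m[0]:.1f} +/- {unmapped_m[1]:.1f} million")
print(f"ribosomal content: {ribo_pct[0]:.1f}% +/- {ribo_pct[1]:.1f}%")
```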

In the second phase, these unmapped reads were first subjected to a nucleotide BLAST similarity search against a database of viral genomes [Fig 1B; see Data S1 for a discussion on the BLASTN (nucleotide BLAST) protocol used]. An average of 219 499 ± 5111 reads per sample had matches in at least one viral genome (Table 3). These reads were thus considered as initial candidate viral reads. However, many candidate viral sequences also matched the human genome or transcriptome and could, by means of a linear discriminant function (see Data S1 for details), be assigned to a human origin (Table 3). Next, BLAST similarity searches were performed for both unmapped candidate viral reads and reads with a poor match in human (posterior probability lower than 0·9), against the whole non‐redundant nucleotide database (nt), followed by a careful inspection of the results.
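The discriminant step can be sketched as a two-class linear discriminant on a single feature. The actual features and training data are described in Data S1 and are not reproduced here, so everything below is hypothetical: the feature (viral alignment score minus human alignment score), the training values and the function names. Only the 0·9 posterior threshold comes from the text.

```python
import math
from statistics import fmean

# One-dimensional, two-class linear discriminant with a shared variance.
# Feature (hypothetical): viral alignment score minus human alignment score.

def fit_lda_1d(human_values, viral_values):
    """Return (slope, intercept) of the log-odds of 'viral' versus 'human'."""
    n_h, n_v = len(human_values), len(viral_values)
    mu_h, mu_v = fmean(human_values), fmean(viral_values)
    ss = sum((x - mu_h) ** 2 for x in human_values)
    ss += sum((x - mu_v) ** 2 for x in viral_values)
    pooled_var = ss / (n_h + n_v - 2)
    prior_h = n_h / (n_h + n_v)
    prior_v = 1.0 - prior_h
    slope = (mu_v - mu_h) / pooled_var
    intercept = -(mu_v ** 2 - mu_h ** 2) / (2 * pooled_var)
    intercept += math.log(prior_v / prior_h)
    return slope, intercept


def posterior_human(x, slope, intercept):
    """Posterior probability that a read with feature x is human."""
    log_odds_viral = slope * x + intercept
    return 1.0 / (1.0 + math.exp(log_odds_viral))


def needs_followup(x, slope, intercept, threshold=0.9):
    """True if the read is not confidently human and goes on to the nt search."""
    return posterior_human(x, slope, intercept) < threshold


# Invented training values for the two classes.
human_train = [-20, -15, -25, -18, -22]
viral_train = [18, 22, 25, 15, 20]
slope, intercept = fit_lda_1d(human_train, viral_train)

for score_diff in (-10.0, 0.0, 10.0):
    p = posterior_human(score_diff, slope, intercept)
    print(score_diff, round(p, 3), needs_followup(score_diff, slope, intercept))
```

The threshold is asymmetric by design in this sketch: a read is set aside only when it is confidently human, so ambiguous reads stay in the candidate set for the search against the non‐redundant database.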

Table 3

BLASTN results of unmapped reads

	CLL250	CLL250_act	CLL238	CLL238_act
Initial viral reads	230 221	200 276	163 277	284 222
Non‐human reads (a)	689	551	705	790
Final viral reads (b)	0	0	0	9

(a) All initial viral reads were BLASTN searched in human databases. Non‐human reads are those that were not aligned and, if aligned, those classified by the discriminant function either as ‘viral’ or ‘human’ with posteriors lower than 0·9.

(b) After a careful inspection of BLASTN searches in the non‐redundant database, only nine reads were conclusively assigned to a viral origin.

As expected, reads with higher similarity to mammalian sequences constituted an abundant class in the four samples (Fig 2; Table SII and Data S2). Most reads matching viral sequences mapped anti‐sense to the primer binding site (PBS) of the Mason‐Pfizer monkey virus (MPMV). At its 5′ end, this primate betaretrovirus carries two long terminal repeats separated by 63 nucleotides complementary to tRNA‐Lys, the usual primer for reverse transcription of the viral genome (Sonigo et al, 1986). Further analysis revealed that these reads are chimeric‐like sequences, in which the first stretch of the read derives from human tRNA‐Lys precursors and the following region is variable (Data S1 and Fig S3). Because no additional MPMV reads (i.e., reads aligning to MPMV genome loci other than the PBS) were present in our transcriptomes, we definitively excluded a true viral origin for these reads (Data S1). Other reads showing a putative viral origin were also dismissed because: (i) they aligned to viral genomes with qualities no better than to human sequences [most of them were classified as human by linear discriminant analysis (LDA), although with a posterior probability lower than 0·9]; (ii) they had low‐complexity repeats; or (iii) they matched viruses not known to be related to human disease (see Grouper iridovirus, Sindbis virus, Glypta fumiferanae and a Lausannevirus isolate in online supplementary Data S2). Finally, nine reads from one of the two phorbol‐ester activated samples (Table 3 and Data S2) showed high similarity to the Epstein‐Barr virus (EBV) genome, in a region of the BcLF1 gene. In this case, alignments to the EBV genome involved the full length of the reads, with no hits to either the human genome or the transcriptome. Thus, these nine reads were considered to have a true viral origin.

Figure 2

Taxonomy of BLASTN results of the final candidate viral reads in the non‐redundant database. The shown classes were defined arbitrarily (online Data S1 and Table SII). Only the ‘Epstein‐Barr virus’ class involves reads with a final viral assignment. MPMV, Mason‐Pfizer monkey virus.

Illumina versus 454 data

As expected, the use of Illumina sequencing technology drastically increased the transcriptome depth reached. Comparison of read counts from all CLL samples showed: (i) 14 980 and 20 436 genes were detectable by 454 and Illumina, respectively; (ii) no gene present in 454 data was absent from Illumina data; (iii) absent genes from 454 showed counts with medians higher than 0 in Illumina samples (Fig S4 and Data S1). For instance, 3993 genes absent from 454 data were detected by at least five reads with Illumina sequencing (3355 fulfilled the additional requirement of being present in at least two samples). In addition, we focused on two well‐known genes in CLL disease: ADAM29 and LPL. Specifically, these genes are known to be differentially expressed in mutated and unmutated CLL patients respectively, showing an opposite behaviour between them (Oppezzo et al, 2005). Raw read counts from the 454 assay (pool of unmutated patients) could only detect seven reads for LPL and one read for ADAM29. Meanwhile, Illumina technology accounted for 477 reads for LPL and only one read for ADAM29, in the case of the unmutated sample. For the mutated patient, Illumina raw read counts were one and 420 for LPL and ADAM29, respectively. Notably, these expected LPL/ADAM29 expression ratios could only be detected by Illumina sequencing, demonstrating the higher resolution of this method.
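The LPL/ADAM29 comparison can be expressed as a log2 ratio of the raw read counts quoted above. This is a minimal illustration: the counts are not normalized for library size or transcript length, and the sample labels are ours.

```python
import math

# Raw read counts quoted in the text for LPL and ADAM29.
counts = {
    "454 pool (unmutated)": {"LPL": 7, "ADAM29": 1},
    "Illumina, unmutated": {"LPL": 477, "ADAM29": 1},
    "Illumina, mutated": {"LPL": 1, "ADAM29": 420},
}


def log2_ratio(sample):
    """log2(LPL / ADAM29); positive values indicate LPL predominance."""
    return math.log2(sample["LPL"] / sample["ADAM29"])


ratios = {name: log2_ratio(sample) for name, sample in counts.items()}
for name, value in ratios.items():
    print(f"{name}: log2(LPL/ADAM29) = {value:+.1f}")
```

The sign of the ratio flips between the unmutated and mutated Illumina samples, while the 454 pool yields too few reads for either gene to support a comparable contrast.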

Discussion

The first draft of the human genome enabled the development of new techniques to detect the presence of foreign sequences in a human transcriptome through in silico filtering of the transcripts against the reference genome (Weber et al, 2002). Xu et al (2003) showed that this simple digital subtraction approach was successful in detecting viral sequences from an Epstein‐Barr virus infected tissue. With the advent of massively parallel sequencing technologies, a renewed interest in the discovery of novel pathogens arose and new viruses have recently been identified as causative agents of human disease (Feng et al, 2008; Palacios et al, 2008). To date, 454 Life Sciences has been the technology of choice for discovering new pathogens from transcriptome data, mainly because of the longer read length, which simplifies identification of the novel agent once the non‐human reads are detected. However, as Illumina technology has achieved increased read lengths and developed protocols for paired‐end sequencing, the advantage of 454 has been outweighed by the impressive amount of sequence data produced by Illumina Genome Analyzer and HiSeq sequencers, at least in cases where the foreign agent is expected to be weakly expressed in the sample library. In this respect, Illumina sequencing paired with a digital subtraction strategy has recently been shown to be sensitive enough to mine RNA‐Seq libraries with decreasing amounts of a spiked RNA‐virus (Moore et al, 2011). Recently, small RNA Illumina sequencing also proved suitable for discovering viruses in plants and insects, a strategy based on the small interfering RNA (siRNA) immune response that these organisms trigger against virus infection (Kreuze et al, 2009; Wu et al, 2010; Ma et al, 2011).

Presently, no causative factor has been firmly linked to the aetiology of CLL, the most common form of leukaemia among Caucasian populations. To analyse the putative involvement of a viral agent in its onset, we conducted two surveys on CLL transcriptomes acquired by massively parallel sequencing technologies. First, we applied digital subtraction of human sequences to 454 sequencing data for polyA‐positive RNA from a pool of five CLL patients. To improve sensitivity, we then prepared paired samples of B‐cells with and without prior treatment with a potent cell activator for two additional patients. Normalized libraries for total RNA were sequenced with Illumina technology, increasing the data yield over 454 sequencing. As we have shown, this approach allowed a substantial increase in transcript detection, most pronounced for genes expressed at low levels.

Nine reads of viral origin (EBV) were detected in one activated sample. EBV prevalence in humans is approximately 90% and the evidence accumulated so far strongly supports its involvement in the pathogenesis of a wide spectrum of human malignancies, particularly in the case of immuno‐compromised patients (Thompson & Kurzrock, 2004). As for the role of EBV in CLL development, EBV infection has been shown to play a role only in secondary malignancies, generally involving a new clone (Tsimberidou et al, 2006; Dolcetti & Carbone, 2010; Tarrand et al, 2010). Thus, given available data, the assignment of nine reads to an EBV origin was considered neither unexpected nor relevant to this work.

Retroviruses were first discovered as tumour‐inducing agents in animals. The discovery of HTLV‐1 and its role in adult T cell leukaemia confirmed that retroviruses can also be oncogenic in humans, initiating extensive research on retroviral aetiology of human chronic diseases and cancer. A caveat of our study is that a retroviral candidate for CLL would not be detectable using this strategy if it shows high similarity to human endogenous retroviral (HERV) sequences, because its reads would map to the human genome assembly and would not be analysed further (Voisset et al, 2008). Likewise, a transcribed HERV related to CLL would not be detected by our pipeline. However, it is worth mentioning that the analogous bovine model of CLL is caused by a deltaretrovirus (BLV). As there are no HERVs related to deltaretroviruses, we did not expect to specifically face these confounding factors. Furthermore, with the exceptions of HIV and HTLV, the role of retroviruses in human diseases remains highly controversial (Voisset et al, 2008; Weiss, 2010). For instance, the last of the RNA ‘rumour’ viruses, the Xenotropic murine leukaemia virus‐related virus, seems to have been erroneously related to disease as a consequence of a contamination artefact (Robinson et al, 2011; Simmons et al, 2011).

In conclusion, despite having analysed CLL transcriptomes from seven different patients by massively parallel sequencing technology and provided the depth needed to detect a putative viral agent, we failed to identify any candidate viral sequence. Although this search was conducted with two powerful, up‐to‐date technologies, our data do not definitively exclude a viral origin for human CLL because a novel virus sufficiently distant in sequence from any related agents would be difficult to detect. Given that our study included a limited number of patients, we cannot exclude that an infectious agent could be found in a small percentage of patients, nor can we exclude the possibility that the virus could operate as a hit‐and‐run agent. Also, an infectious agent might operate indirectly through attacking microenvironment cells. Further studies at the genomic level might be required to definitively exclude the presence of an integrated retrovirus. A truly interesting and novel concept would be a population genomics study, designed to explore the potential role of polymorphic HERVs as aetiological factors of CLL.

Authorship contributions

SB, PM and HP performed the experiments; NR and HN conducted the bioinformatics analysis; AK, HP and AP contributed to the analysis; SB, PO, CR, GD and OP designed the study; GD and OP coordinated the study. All authors were involved in the analysis and the interpretation of the results. All authors read, gave comments and approved the final version of the manuscript.

Conflict of interests

The authors reported no potential conflicts of interest.

Supporting information

Data S1. Methods and results.
Data S2. Similarity search results of the BLAST search of final viral candidate reads (Illumina reads) against NCBI's non‐redundant database.
Fig S1. Computational subtraction approach developed to identify non‐human sequences in the CLL transcriptome obtained with the 454 technology.
Fig S2. Simulated reads from HIV and EBV genomes were pooled and searched using different BLASTN protocols in the viral genome database.
Fig S3. Compositional analysis of 1203 reads that aligned anti‐sense to the primer binding site (PBS) of the MPMV genome (sample CLL250).
Fig S4. Median values of Illumina raw read counts for genes with 0‐5 read counts when previously analysed by 454.
Table SI. Performance of different BLASTN protocols when searching simulated HIV reads.
Table SII. Distribution of BLASTN hits in the non‐redundant database.

References (47 in total; 10 listed)

1. Mutations of the SF3B1 splicing factor in chronic lymphocytic leukemia: association with progression and fludarabine-refractoriness.

Authors: Davide Rossi; Alessio Bruscaggin; Valeria Spina; Silvia Rasi; Hossein Khiabanian; Monica Messina; Marco Fangazio; Tiziana Vaisitti; Sara Monti; Sabina Chiaretti; Anna Guarini; Ilaria Del Giudice; Michaela Cerri; Stefania Cresta; Clara Deambrogi; Ernesto Gargiulo; Valter Gattei; Francesco Forconi; Francesco Bertoni; Silvia Deaglio; Raul Rabadan; Laura Pasqualucci; Robin Foà; Riccardo Dalla-Favera; Gianluca Gaidano
Journal: Blood Date: 2011-10-28 Impact factor: 22.113

2. Human Merkel cell polyomavirus infection I. MCV T antigen expression in Merkel cell carcinoma, lymphoid tissues and lymphoid tumors.

Authors: Masahiro Shuda; Reety Arora; Hyun Jin Kwun; Huichen Feng; Ronit Sarid; María-Teresa Fernández-Figueras; Yanis Tolstov; Ole Gjoerup; Mahesh M Mansukhani; Steven H Swerdlow; Preet M Chaudhary; John M Kirkwood; Michael A Nalesnik; Jeffrey A Kant; Lawrence M Weiss; Patrick S Moore; Yuan Chang
Journal: Int J Cancer Date: 2009-09-15 Impact factor: 7.396

3. Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of viruses.

Authors: Jan F Kreuze; Ana Perez; Milton Untiveros; Dora Quispe; Segundo Fuentes; Ian Barker; Reinhard Simon
Journal: Virology Date: 2009-04-23 Impact factor: 3.616

4. Failure to confirm XMRV/MLVs in the blood of patients with chronic fatigue syndrome: a multi-laboratory study.

Authors: Graham Simmons; Simone A Glynn; Anthony L Komaroff; Judy A Mikovits; Leslie H Tobler; John Hackett; Ning Tang; William M Switzer; Walid Heneine; Indira K Hewlett; Jiangqin Zhao; Shyh-Ching Lo; Harvey J Alter; Jeffrey M Linnen; Kui Gao; John M Coffin; Mary F Kearney; Francis W Ruscetti; Max A Pfost; James Bethel; Steven Kleinman; Jerry A Holmberg; Michael P Busch
Journal: Science Date: 2011-09-22 Impact factor: 47.728

5. The LPL/ADAM29 expression ratio is a novel prognosis indicator in chronic lymphocytic leukemia.

Authors: Pablo Oppezzo; Yuri Vasconcelos; Catherine Settegrana; Dominique Jeannel; Françoise Vuillier; Magali Legarff-Tavernier; Eliza Yuriko Kimura; Stéphane Bechet; Gérard Dumas; Martine Brissard; Hélène Merle-Béral; Mihoko Yamamoto; Guillaume Dighiero; Frédéric Davi
Journal: Blood Date: 2005-03-31 Impact factor: 22.113

6. SF3B1 and other novel cancer genes in chronic lymphocytic leukemia.

Authors: Lili Wang; Michael S Lawrence; Youzhong Wan; Petar Stojanov; Carrie Sougnez; Kristen Stevenson; Lillian Werner; Andrey Sivachenko; David S DeLuca; Li Zhang; Wandi Zhang; Alexander R Vartanov; Stacey M Fernandes; Natalie R Goldstein; Eric G Folco; Kristian Cibulskis; Bethany Tesar; Quinlan L Sievers; Erica Shefler; Stacey Gabriel; Nir Hacohen; Robin Reed; Matthew Meyerson; Todd R Golub; Eric S Lander; Donna Neuberg; Jennifer R Brown; Gad Getz; Catherine J Wu
Journal: N Engl J Med Date: 2011-12-12 Impact factor: 91.245

7. Guidelines for the diagnosis and treatment of chronic lymphocytic leukemia: a report from the International Workshop on Chronic Lymphocytic Leukemia updating the National Cancer Institute-Working Group 1996 guidelines.

Authors: Michael Hallek; Bruce D Cheson; Daniel Catovsky; Federico Caligaris-Cappio; Guillaume Dighiero; Hartmut Döhner; Peter Hillmen; Michael J Keating; Emili Montserrat; Kanti R Rai; Thomas J Kipps
Journal: Blood Date: 2008-01-23 Impact factor: 22.113

8. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

9. Human RNA "rumor" viruses: the search for novel human retroviruses in chronic disease.

Authors: Cécile Voisset; Robin A Weiss; David J Griffiths
Journal: Microbiol Mol Biol Rev Date: 2008-03 Impact factor: 13.044

10. Common variants at 2q37.3, 8q24.21, 15q21.3 and 16q24.1 influence chronic lymphocytic leukemia risk.

Authors: Dalemari Crowther-Swanepoel; Peter Broderick; Maria Chiara Di Bernardo; Sara E Dobbins; María Torres; Mahmoud Mansouri; Clara Ruiz-Ponte; Anna Enjuanes; Richard Rosenquist; Angel Carracedo; Jesper Jurlander; Elias Campo; Gunnar Juliusson; Emilio Montserrat; Karin E Smedby; Martin J S Dyer; Estella Matutes; Claire Dearden; Nicola J Sunter; Andrew G Hall; Tryfonia Mainou-Fowler; Graham H Jackson; Geoffrey Summerfield; Robert J Harris; Andrew R Pettitt; David J Allsup; James R Bailey; Guy Pratt; Chris Pepper; Chris Fegan; Anton Parker; David Oscier; James M Allan; Daniel Catovsky; Richard S Houlston
Journal: Nat Genet Date: 2010-01-10 Impact factor: 38.330
