Jérémie Becker1, Philippe Pérot1, Valérie Cheynet1, Guy Oriol1, Nathalie Mugnier2, Marine Mommert1,3, Olivier Tabone3, Julien Textoris3, Jean-Baptiste Veyrieras2, François Mallet4,5.
Abstract
BACKGROUND: Human endogenous retroviruses (HERVs) have received much attention for their implications in the etiology of many human diseases and their profound effect on evolution. Notably, recent studies have highlighted associations between HERV expression and cancers (Yu et al., Int J Mol Med 32, 2013), autoimmunity (Balada et al., Int Rev Immunol 29:351-370, 2010) and neurological conditions (Christensen, J Neuroimmune Pharmacol 5:326-335, 2010). Their repetitive nature makes them particularly challenging to study, and expression studies have largely focused on individual loci (De Parseval et al., J Virol 77:10414-10422, 2003) or general trends within families (Forsman et al., J Virol Methods 129:16-30, 2005; Seifarth et al., J Virol 79:341-352, 2005; Pichon et al., Nucleic Acids Res 34:e46, 2006).
Keywords: Biostatistics; Microarray; Repetitive elements; Transcriptomics
Year: 2017 PMID: 28390408 PMCID: PMC5385096 DOI: 10.1186/s12864-017-3669-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Main steps of the HERV-V3 array design. The design involved three steps: (a) database creation, where HERV copies were either detected by RepeatMasker using 42 prototypes or reconstructed from Dfam predictions; (b) development of a hybridization model, illustrated by model predictions and observed intensities on the Affymetrix probeset associated with the CD59 gene; and (c) design of probes and probesets. The difference in annotation level between consensus and prototypes is shown: LTR subregions and ORFs are only identified in prototypes. The agreement between observed and predicted intensities increases with k-mer size and the complexity of spatial information (a more thorough description is provided in Additional file 3: Figure S1)
Number of elements and functional sub-regions contained in HERVgDB4 (left) and designed on HERV-V3 (right), where one probeset is defined per sub-region
| Repertoire | HERVgDB4 (database): number of elements | HERVgDB4 (database): number of sub-regions | HERV-V3 (array): number of elements | HERV-V3 (array): number of probesets | HERV-V3 (array): number of elements |
|---|---|---|---|---|---|
| HERV prototypes | 29,859 | 90,106 | 29,807 | 45,374 | 29,859 |
| HERV centromeric | 192 | 589 | 24 | 29 | 192 |
| HERV Dfam | 169,821 | 342,482 | 154,535 | 283,641 | 169,821 |
| MaLR Dfam | 228,429 | 45,543 | 179,323 | 311,286 | 228,429 |
| LINE1 | 1072 | 4627 | 664 | 1416 | 1072 |
| lncRNA | 3812 | 3819 | 3777 | 3777 | 3812 |
| Viruses | 291 | 386 | 289 | 368 | 2044 |
| gPEHM | 1559 | 1559 | 1559 | 1559 | 8743 |
| gU133 | 1559 | NA | 1559 | 3884 | 42,964 |
| gHTA | 1559 | NA | 1559 | 35,398 | 344,002 |
| Affymetrix Controls | NA | NA | NA | 177 | 20,895 |
| Total | 435,040 | 898,998 | 372,976 | 686,869 | 2,651,585 |
The discrepancy between the number of elements in the database and on the array is due to cross-hybridizing elements discarded during the design
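The size of this discrepancy differs considerably between repertoires. As a minimal sketch, the retained fraction can be computed from the element counts in the table above (database versus array columns); the function and variable names are illustrative and not part of the original work.

```python
# Element counts copied from the table above: (HERVgDB4 database, HERV-V3 array).
counts = {
    "HERV prototypes": (29859, 29807),
    "HERV centromeric": (192, 24),
    "HERV Dfam": (169821, 154535),
    "MaLR Dfam": (228429, 179323),
    "LINE1": (1072, 664),
    "lncRNA": (3812, 3777),
    "Viruses": (291, 289),
}


def retained_fraction(n_database: int, n_array: int) -> float:
    """Fraction of database elements that are still represented on the array."""
    return n_array / n_database


# Hypothetical helper structure: repertoire name -> retained fraction.
retention = {
    name: retained_fraction(n_db, n_array)
    for name, (n_db, n_array) in counts.items()
}

for name, fraction in retention.items():
    print(f"{name}: {fraction:.1%} of database elements retained on the array")
```

Read this way, the centromeric HERV repertoire loses the largest share of its elements during design, while the prototype-based HERV repertoire is almost fully retained.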
Fig. 2 Platform evaluation. a Pre-processing methods were evaluated on the whole array using the titration response as a function of the fold-change between samples A and B. Probesets were binned according to the fold-change values between A and B. Unlike GCBG-RMA, the three methods RMA-TPRN, RMA and Li-Wong present narrow titration curves, indicative of good performance. The two confounding factors (b) intensity and (c, same colour code as in 2b) probeset size distribution are represented in the HERVs/MaLRs, gU133/gHTA and gPEHM compartments: the intensities are lower in HERVs/MaLRs than in genes (gPEHM, gU133/gHTA), reflecting a smaller proportion of expressed loci in the former. The three compartments, HERVs/MaLRs, gU133/gHTA, gPEHM, and downsized gPEHM (dgPEHM) are compared on (d) repeatability (CV) and accuracy measured both by (e) the titration response and (f) the estimated dilution mixture. The grey horizontal lines in (f) symbolize the theoretical mixture values β C and β D. Only probesets differentially expressed between samples A and B (fold-change A/B and B/A > 2, P < 0.01) were used to generate the boxplots in (f). The gene repertoires show similar levels of repeatability and accuracy (similar median CVs, titration curves and distributions), whereas HERVs/MaLRs performance is slightly lower, due to smaller probesets
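Panel (d) reports repeatability as a coefficient of variation (CV). The sketch below assumes the usual definition, sample standard deviation divided by the mean across replicate measurements of one probeset; the caption does not state the exact formula or intensity scale used, and the probeset names and intensity values are made up for illustration only.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed CV definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Hypothetical replicate intensities for two probesets (not data from the study).
replicates = {
    "probeset_a": [100.0, 110.0, 90.0],
    "probeset_b": [50.0, 50.0, 50.0],
}

cvs = {name: coefficient_of_variation(vals) for name, vals in replicates.items()}

for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.3f}")
```

A lower CV means the replicate measurements of a probeset agree more closely, which is the sense in which the compartments are compared on repeatability.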
Fig. 3 Consistency with Affymetrix design and model validation. Gene expression variation is compared across the three gene compartments based on fold-change correlation (a–c) and intersections of genes differentially expressed in the gene repertoires (d). The hybridization model PEHM is evaluated by correlating predicted and observed intensities on gU133 probes (e) and HERV-V2 training set (f)
Fig. 4 Biological validation. a Intensity heatmap of tissue- and pathology-specific loci in seven HERV-V3 arrays: the observed intensities correlate well with the expected loci specificity. For each of the eight loci, the family and the probeset names are indicated (the family name and the sub-region annotation are abbreviated in the probeset name). b Distribution of differentially expressed loci (DELs) between hPSCs and embryoid bodies. While most DELs are found in MaLR-Dfam, HERV-Dfam and HERV-H, when normalized within family, the proportion of DELs is higher in HERV-H and HERV-XA34, consistent with Wang et al. [13]. c Intersection between pluripotent loci identified by HERV-V3 and NGS (Wang et al.): despite a small number of shared loci (115), 55.7% of HERV-V3 loci coverage is contained in this intersection