Tianjiao Zhang1, Garry Wong1. 1. Faculty of Health Sciences, University of Macau, Taipa, Macau SAR 999078, China.
Abstract
Piwi interacting RNAs (piRNAs) are small non-coding single-stranded RNA species 20-31 nucleotides in size generated from distinct loci. In germline tissues, piRNAs are amplified via a "ping-pong cycle" to produce secondary piRNAs, which act in transposon silencing. In contrast, the role of somatic-derived piRNAs remains obscure. Here, we investigated the identity and distribution of piRNAs in human somatic tissues to determine their function and potential role in Parkinson's disease (PD). Human datasets were curated from the Gene Expression Omnibus (GEO) database and a workflow was developed to identify piRNAs, which revealed 902 somatic piRNAs of which 527 were expressed in the brain. These were mainly derived from chromosomes 1, 11, and 19 compared to the germline tissues, which were from 15 and 19. Approximately 20% of somatic piRNAs mapped to transposon 3' untranslated regions (UTRs), but a large proportion were sensed to the transcript in contrast to germline piRNAs. Gene set enrichment analysis suggested that somatic piRNAs function in neurodegenerative disease. piRNAs undergo dysregulation in different PD subtypes (PD and Parkinson's disease dementia (PDD)) and stages (premotor and motor). piR-has-92056, piR-hsa-150797, piR-hsa-347751, piR-hsa-1909905, piR-hsa-2476630, and piR-hsa-2834636 from blood small extracellular vesicles were identified as novel biomarkers for PD diagnosis using a sparse partial least square discriminant analysis (sPLS-DA) (accuracy: 92%, AUC = 0.89). This study highlights a role for piRNAs in PD and provides tools for novel biomarker development.
Piwi interacting RNAs (piRNAs) are small non-coding single-stranded RNA species 20-31 nucleotides in size generated from distinct loci. In germline tissues, piRNAs are amplified via a "ping-pong cycle" to produce secondary piRNAs, which act in transposon silencing. In contrast, the role of somatic-derived piRNAs remains obscure. Here, we investigated the identity and distribution of piRNAs in human somatic tissues to determine their function and potential role in Parkinson's disease (PD). Human datasets were curated from the Gene Expression Omnibus (GEO) database and a workflow was developed to identify piRNAs, which revealed 902 somatic piRNAs of which 527 were expressed in the brain. These were mainly derived from chromosomes 1, 11, and 19 compared to the germline tissues, which were from 15 and 19. Approximately 20% of somatic piRNAs mapped to transposon 3' untranslated regions (UTRs), but a large proportion were sensed to the transcript in contrast to germline piRNAs. Gene set enrichment analysis suggested that somatic piRNAs function in neurodegenerative disease. piRNAs undergo dysregulation in different PD subtypes (PD and Parkinson's disease dementia (PDD)) and stages (premotor and motor). piR-has-92056, piR-hsa-150797, piR-hsa-347751, piR-hsa-1909905, piR-hsa-2476630, and piR-hsa-2834636 from blood small extracellular vesicles were identified as novel biomarkers for PD diagnosis using a sparse partial least square discriminant analysis (sPLS-DA) (accuracy: 92%, AUC = 0.89). This study highlights a role for piRNAs in PD and provides tools for novel biomarker development.
piRNAs (piwi-interacting RNAs) form the largest class of small non-coding RNAs in animals and are characterized by their length distribution of 21–30 nucleotides, interaction with PIWI proteins, and 2′-O-methylation at the 3′end of the single-stranded mature product [1,2,3,4]. To date, the most vital function of piRNAs known is the formation of a piRNA-PIWI silencing complex, which suppresses the expression of transposons in germline cells [5,6]. Because of the ubiquitous epigenetic reprogramming in the germline, transposons may transform their silent status to activation and subsequently alter their chromosome position or propagate, leading to a significant increased risk of genomic instability [7,8,9]. The deficiency of the piRNA-PIWI complex gives rise to the activation of transposons and causes sterility [10,11,12]. In animal models, the piRNA-PIWI complex suppresses transposons by depositing repressing genetic biomarkers (usually H3K9me3) at a transcriptional level or by utilizing PIWI-dependent cleavage of target mRNAs at a post-transcriptional level [13,14,15]. However, in mammalian germ cells, most pre-pachytene piRNAs appear to be within coding regions or are 3′ UTR derived [16,17]. The most abundant piRNA population, pachytene piRNAs, also show lack of transposon sequences [17,18,19]. The widespread existence and high expression level of non-transposon recognizing piRNAs suggest additional functions beyond transposon-silencing [20]. Previous studies have revealed that Aub (homolog of Piwi in Drosophila), a piRNA binding complex, increases the translation of self-renewal and differentiation factors in germ stem cells, as well as stabilize specific mRNAs by a piRNA-guided pathway [13,21]. Recently, another study suggested that a MIWI (homolog of Piwi in mouse)/piRNA/elF3f/HuR super-complex contains a translation-activating function in spermiogenesis [15].In the germline, piRNAs’ biogenesis requires the ping-pong cycle for amplification of secondary piRNAs and is characterized by a 5′ U bias of primary piRNAs [22,23]. The first 10 base-pairs of primary piRNAs and their corresponding secondary piRNAs are complementary [22,23]. In contrast, the piRNA biogenesis pathway in somatic cells shows a distinct process [24,25,26]; however, the details remain elusive and controversial, especially in mammals. Many studies suggests somatic piRNAs exist and function in body regeneration [26,27], cancer [28], embryonic development [29], and neural development [30]. In Homo sapiens, transposons have the potential to increase the diversity of neurons [31]. For example, L1 can induce mosaicism in the neural genome [32]; and Alu is hypothesized to be related to human intelligence, cognition, and neurodegeneration [33]. Therefore, the transposon-repressing function of piRNAs seems to be necessary for neuronal systems. In addition, some studies also indicate that piRNAs function as gene expression regulators in somatic cells [34,35,36]. Many studies have found piRNAs to participate in bioprocesses in lower eukaryotes, including axon regeneration [37], foraging behavior [38], and memory [30,39,40,41]. The somatic piRNAs in the mammalian central nervous system were identified in mouse hippocampus and related to spine morphogenesis [42].Parkinson’s disease (PD) is the second most common neurodegenerative disease with the most consistent neuropathological feature, the gradual reduction of dopaminergic neurons in substantia nigra pars compacta [43,44,45]. At least nine risk factors independently modify risk for PD, of which three decrease (coffee consumption, smoking, and physical activity) and six increase (family history of PD, dyspepsia, exposure to pesticides, oils, metals, and anesthesia) risk [44,46]. PD patients present with motor symptoms that typically include akinesia, bradykinesia, tremor, and rigidity but also may have gait disturbance, impaired handwriting, and speech deficits [47]. Non-motor symptoms include sensory deficits (hyposmia and impaired color vision), neuropsychiatric and neurological features (hallucinations, pain, anxiety, depression, early cognitive dysfunction, and dementia), sleep disturbance and bladder hyper-reflexia, and pain [48]. Some symptoms may overlap, for example, while visual and auditory hallucinations are known, olfactory hallucinations have been found in up to 11.3% of patients [49]. Apart from the substantia nigra, selective deterioration of neurons also to some extent exists in other brain regions, including the amygdala, basal nucleus of Meynert, hypothalamus, hippocampus, temporal cortex, cingulate cortex, and prefrontal cortex, which usually varies with different subtypes, stages, and symptoms [50,51]. Another remarkable characteristic of PD is the presence of Lewy bodies, which is the consequence of aberrant protein aggregation, mitochondria damage, lysosome dysfunction, and inflammation [52,53,54]. Studies have established that Lewy bodies consists of aggregated α-synuclein, [55], are facilitated by mitophagy and PINK1 [56,57], develop when the lysosome is unable to break down aggregating protein [58], and involve inflammation particularly through microglia [59,60]. A previous study indicated piRNAs were dysregulated in neuronal cells derived from induced pluripotent stem cells of sporadic PD patients compared to controls [61]. In the study, SINE and LINE-derived piRNAs were downregulated. In addition, two studies found different expression patterns of piRNAs in amyotrophic lateral sclerosis (ALS) and tauopathies [62,63]. In the ALS study, a Drosophila Cabeza (homolog of FUS causing ALS) knockdown model demonstrated increased pre-piRNA but lower mature piRNA levels [62]. In the tauopathy model, Drosophila was used again via neuron specific overexpression of tauR406W, a mutant form of tau. The animals showed locomotor deficits as well as neuronal degeneration where the RNAs from the head displayed 50 increased and 60 decreased transposable elements [63]. These investigators further validated their findings via observing increased transcripts of the retroviral class of transposable elements in human AD tissues [63]. These results suggest that piRNAs may be potential biomarkers for age-related neurodegenerative disease, including PD. Nonetheless, existing knowledge concerning piRNAs in PD remains obscure. Therefore, deeper and more accurate knowledge on the identity, abundance, distribution, and differences of piRNAs in neuronal tissues will help to inform and provide evidence of piRNAs as playing an essential role in neurodegenerative disease processes.The piRNA-related discoveries in human somatic tissues are limited [64]. Moreover, some somatic piRNAs investigated by related research appear ambiguous as their alignment positions vastly overlap with other ncRNAs (e.g., tRNAs, rRNAs, and snoRNAs) [28,42,65,66], which suggests these somatic piRNAs might be overestimated as they are probably not bona fide piRNAs. In this study, we analyzed high-throughput small RNA sequencing data from several datasets downloaded from the GEO database with accession numbers and references provided therein. We used a relatively strict piRNA identification method by filtering other annotated small ncRNAs first and then annotated the rest of the sequences by a piRNA annotation database. Then, we analyzed the expression patterns and characteristics of germline piRNAs in tissue-specific somatic cells. To investigate piRNAs in PD tissues, we performed a comparative study of PD patients and controls and evaluated piRNAs’ functional role in disease onset and progression. Our investigation identifies a small but appreciable number of somatic piRNAs that exist in somatic cells and are dysregulated in human PD tissues. Furthermore, we identify the nature and potential gene targets of these piRNAs and propose a subset as disease biomarkers.
2. Materials & Methods
2.1. Workflow
In order to identify and compare piRNA expression patterns in germline tissues and somatic tissues, we built an analysis workflow. The workflow of the study includes pre-processing steps (quality control (QC), adapter trimming, and length filtering), annotation of piRNAs (mapping and annotation based on piRNA database). Next, tissue-specific piRNA features (chromosome distribution, genomic context, length distribution, and ping-pong cycle) were investigated. For the comparison study between PD and the control, we implemented differential expression analysis, followed by gene enrichment analysis for piRNAs-aligned to genes/predicted targets. Finally, classifiers were used to differentiate between the control and PD.
2.2. Collection of High-Throughput Small RNA Sequencing Datasets from GEO
We collected germline and somatic tissue small RNA sequencing datasets from GEO [67]. A previous study demonstrated that small RNA sequencing captures and sequences small RNAs with moderate-to-low-abundance while able to detect modest expression differences across samples [68]. We obtained the piRNA expression patterns in various tissues by analyzing their corresponding small RNA sequencing datasets. The description of datasets used for the study is provided in Table 1.
Table 1
Description of small RNA datasets curated for the analysis.
Tissue
Accession
Sample Size
Library Size
References
Testis
PRJNA196749
3
6.84–25.30 million
[69]
Sperm
PRJNA564759
102
0.36–1.63 million
[70]
Ovary
PRJNA272542
12Note: 4 adult ovaries; 4 ovaries from 1st trimester embryos; 4 ovaries from 2st trimester embryos
13.3–15.8 million
[71]
Dorsolateral Prefrontal Cortex (DLPFC)
PRJNA185476
4
6–12.25 million
[72]
Pancreas
PRJNA490335
3
12.2–12.5 million
[73]
Knee synovial tissues
PRJNA389258
10
8.39–17.21 million
[74]
Liver
PRJNA246372
4
5.83–40.70 million
[75]
Prefrontal cortex (PFC)
PRJNA295431PRJNA272617
26 (Parkinson’s disease)Note: including 17 Parkinson’s disease (PD); 9 Parkinson’s disease with dementia (PDD)25 (Control)
6.09–33.40 million
[76,77,78]
Amygdala
PRJNA381204
14 (Parkinson’s disease)Note: including 7 premotor stage; 7 motor stage14 (Control)
13.5–17.7 million
[79,80]
Blood (extracellular vesicles)
PRJNA655240
9 (Parkinson’s disease)Note: in each sample, both large (LEV) and small extracelluar vesciles (SEVs) were tested 6 (Control)Note: in each sample, both large (LEV) and small extracelluar vesciles (SEVs) were tested
2.37–30.7 million
[81]
2.3. Pre-Processing and piRNAs Annotation
Sequenced raw data were adaptor trimmed and size filtered (20–32 nucleotides) by cutadapt (v2.8) [82]. We then aligned the processed sequencing data to hg38 Human Genome assembly [83] by bowtie (v1.2.3) [84] with one mismatch tolerance, and alignments with more than 50 distinct positions restrained. To filter other small non-coding RNAs (rRNAs, tRNAs, miRNAs, scaRNAs, snoRNAs, miscRNAs, scRNAs, sRNAs, and snRNA) in our datasets, we created an aggregated small non-coding RNA annotation file based upon UCSC [85], RefSeq [86], Gencode [87], DASHR [88], and Ensemble databases and annotations [83]. We excluded reads aligned to these other small non-coding RNAs from further investigation. We used featureCounts (v2.0.0) [89] and piRNA annotation file download from piRbase [90] to annotate piRNAs in the remainder of the reads. The reads annotated by piRBase were considered as annotated piRNAs in our study. The general workflow of the study is shown below as Figure 1.
Figure 1
Workflow of this study. The raw data length filtered (20,32) is shown in brackets.
2.4. Transposons and Transcripts Annotation
Repeats’ annotation was downloaded from UCSC repeat browser [91], while transcripts annotation (protein-coding, lncRNAs, pseudogenes) was obtained from Ensemble [83]. In our study, we classified LINE, SINE, LTR, and DNA in repeats as transposons and the rest of the repetitive sequence types, such as simple repeats, as other repeats.
2.5. Differential Analysis and Enrichment Analysis
We used the sample size, variance homogeneity of variance, and existence of outliers to construct differential analysis by the Wilcoxon test (piRNAs) or Deseq2 (v1.30.1) [92]. Combat-seq algorithm [93] was used to correct batch effects when needed. We implemented gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis by a web-based resource g: Profiler [94].
2.6. Classifier Construction
Sparse partial least squares discriminant analysis (sPLS-DA) [95,96,97] was used to construct control and PD classifiers. For each classifier, we used expression counts data (piRNAs, piRNAs in each repeat region, and piRNAs in each gene region) normalized by library size to train the classifier. In each training process, we first evaluated the performance of sPLS-DA by component numbers from one to ten, using leave-one-out cross-validation [98,99] to obtain optimized component numbers. We then used the grid-search method to access the optimal number of variables to select on each component for a further sparse partial least squares discriminant analysis (sPLS-DA). Finally, we used accuracy and area under the curve (AUC) of the receiver operating characteristic curve (ROC) [100] to identify the quality of each classifier. We used package mixOmics [101] to complete the processes mentioned above.
3. Results
3.1. piRNA Expression in Somatic Cells and Comparison to Germline Cells
In order to characterize the properties of somatic piRNAs, we built a pipeline for extracting and identifying piRNAs from next generation sequencing (NGS) datasets. The pipeline is shown in (Figure 1). As the ovarian tissue dataset has a different experimental design and sequencing depth with others, piRNAs were considered expressed if reads were >2 in all samples or reads > 5 in 50% of samples, while for other tissues, the standard was reads > 2 in all samples or reads > 10 in 50%. We first compared piRNAs from somatic tissue to the testis and ovaries. The total number of piRNAs identified from the testis, ovary, and somatic tissue were 19981, 6363, and 902, respectively (Figure 2a). Consistent with previous studies, we identified a large number of testis piRNAs and a small number of somatic RNAs [102]. There were 49 common piRNAs between tissues studied, 51 between somatic and testis tissues, and 312 between somatic and ovaries, representing >30% of all somatic piRNAs. When the brain piRNAs were considered, the total number of piRNAs were 527 (Figure 2b). There was a dramatic reduction in piRNAs overlapping with testis, 3 in total, while 208 overlapped with ovaries. When we further compared brain regions, we observed that piRNAs were widely dispersed between the amygdala, prefrontal cortex, and dorsal lateral prefrontal cortex (Figure 2c). The expression pattern of somatic piRNA types appears to be tissue-specific with a large proportion of different piRNAs in the germline (testis, sperm, and ovary) (Figure 2a–c) (Supplementary Figure S1), which suggests that somatic piRNAs might function differently than those from germline piRNAs.
Figure 2
Venn plots of piRNAs identified from germline and somatic tissues. The number of piRNA types among somatic (dorsolateral prefrontal cortex, prefrontal cortex, amygdala, pancreas, liver, and knee synovial), germline tissues (testis, sperm, and ovary), and brain tissues (dorsolateral prefrontal cortex (DLPFC), prefrontal cortex (PFC), and amygdala) were compared. (a) Comparison between somatic and germline tissues. (b) Comparison between brain and germline tissues. (c) Comparison between different brain regions.
piRNAs are known to be expressed from specific loci in the genome. In order to identify the specific regions of the genome from where the piRNAs originated, we mapped the reads. Testis/sperm piRNAs display a pronounced chromosome 15 and 19 alignment preference, which is lost in the ovary (Figure 3a). Chromosome 19 alignment can also be observed in somatic tissue; however, there appears to be piRNA loci, such as chromosome 4 for ovary, 1 for brain tissues, liver, and pancreas, and 11 for liver (Figure 3a). Somatic piRNAs also show a high mitochondrial chromosome alignment rate among all three brain tissues studied and the liver in contrast to germline tissues (Figure 3b).
Figure 3
Chromosome distribution of piRNAs. (a) The distribution of piRNA reads mapped to the genome across different tissues was plotted. The y-axis represents RPKM which is calculated by . A fraction count to each multi-alignment reads was assigned (each aligned-read carries 1/x count where x is the number of alignment positions of that read). (a) shows piRNA distribution in autosomes while (b) shows piRNA distribution in the mitochondria chromosome.
To determine whether somatic piRNAs were functioning via the well-established transposon suppressing pathway, we identified the transposon sequences aligned to the piRNAs. We unexpectedly detected a lower percentage (~20%) of piRNAs aligned to transposon sequences in both somatic tissues and germline tissues (Figure 4a). This lower percentage in germline was also found by other studies [103,104]. Unlike other tissues, amygdala and germline piRNAs contain more reverse-stranded repeat transposons than stranded. Considering the prevalent model related to piRNA-guided transposon silencing, where the PIWI-piRNA complex interacts with a nascent transposon transcript, a higher reverse-stranded piRNA percentage might indicate that repressing transposon expression is a vital function in both amygdala and germline tissues. In LINE-alignment piRNAs, somatic and ovarian piRNAs show a reverse-stranded preference, while testis and sperm piRNAs present the opposite. However, testis piRNAs show a reverse-stranded preference in SINE-alignment piRNAs, while somatic piRNAs are just the opposite (Figure 4a). Because LINE and SINE might have different impacts in somatic and germline tissues, their aligned-piRNAs might have varying regulatory functions.
Figure 4
Genomic context of piRNAs. The percentage of annotated piRNAs reads mapped to transposons/transposon subtypes (a) and lncRNA/protein-coding genes/pseudogenes (b) of total annotated piRNA reads number. Ratio = reads number mapped to transposons (or lncRNA, protein-coding genes, pseudogenes)/total annotated piRNAs reads number. Symbols: *, p < 0.05; ** p < 0.01; *** p <0.001.
We observed that testis and sperm piRNAs showed a higher percentage of lncRNA-alignment (Figure 4b). These are mainly mammalian pachytene piRNAs, which are primarily generated from transposon-depleted lncRNAs and thought to regulate genes required for male fertility [17]. We detected a higher ratio of protein-coding derived piRNAs, especially in somatic tissues, suggesting a function different that in germline cells. This phenomenon is partly contributed by some coding-gene (~25%) containing transposon sequences in the 3′UTR [105]. Some studies demonstrated that these genes have the potential to be controlled by PIWI and transposon-derived piRNA complex [106], but it is more likely that there are more protein-coding-derived piRNAs that exist across tissues. Currently the detailed function of coding-protein-derived piRNAs remains unknown.We also identified many piRNAs aligned to pseudogenes, which is more evident in testis and sperm tissue as their lower protein-coding gene alignment rate (Figure 4b). A recent study suggested that some pseudogene-derived piRNAs have the potential to target and regulate their cognate protein-coding genes [106]. A more detailed study might thus unravel a regulation network among pseudogenes, piRNAs, and protein-coding genes [107].Next, to more clearly define a function for somatic piRNAs, we performed KEGG enrichment analysis with genes aligned by piRNAs at the 3′UTR region. Previous studies reported the existence of pre-pachytene piRNAs derived from 3′UTR of protein-coding genes in germline cells [17]. To our surprise, both brain tissue and testis tissue achieve enrichment results for “Parkinson’s disease” and “Pathways of Neurodegeneration–multiple diseases”, respectively (Figure 5). However, the function of these sense-piRNAs is indistinct. They might be transacting with their derived mRNAs by an unknown mechanism. When we used the transcript (including CDS, exon, 3′UTR, and 5′UTR) aligned by piRNAs to perform enrichment analysis of the three different brain tissues, testis, and ovary, all tissues identified an enrichment of neurodegeneration related genes.
Figure 5
KEGG enrichment results for 3′UTR-piRNA alignment genes. KEGG enrichment analysis with protein-coding genes containing 3′UTR aligned by annotated piRNAs in brain tissues (a–c) and testis tissue (d). Gene ratio = Gene number in query list (piRNAs aligned in gene list)/Total gene number of specific pathway in database.
To further define attributes of piRNAs from somatic tissues, we characterized their length features. Somatic piRNAs show shorter and more evenly distributed lengths than germline piRNAs (Figure 6). According to the ping-pong cycle biogenesis pathway, tissues with significant ping-pong cycle signals synthesize piRNAs, which possess a 5′U bias and an A nucleotide bias at the 10th position. We found these attributes in testis, sperm, and ovary derived piRNAs from our analysis pipeline. However, we were not able to uncover evidence of ping-pong cycle biogenesis in somatic tissues based upon the distribution of mapped reads (Figure 6). We further confirmed this with an algorithm, pingpongpro [108], which also was not able to detect a ping-pong cycle signal in somatic tissues.
Figure 6
The length distribution of annotated piRNAs. The y-axis represents the ratio of piRNA reads with specific length/base to total annotated piRNAs reads in that tissue. Colors represent the first base of annotated piRNAs.
3.2. piRNAs in Parkinson’s Disease
To determine the significance of somatic piRNAs in human disease, we detected 296 piRNAs in the prefrontal cortex of which 20 piRNA expression levels were significantly different in PD; and 508 piRNAs in amygdala of which 55 piRNA expression levels were significantly different in PD (Figure 7a). Among these, one piRNA (piR-hsa-748391) was expressed differently in PD in both prefrontal cortex and amygdala tissues. Unfortunately, none of 3′UTR derived piRNAs from neurodegeneration-related genes mentioned above were among these differently expressed piRNAs. A target predicting method raised by a previous paper [109] showed 55 differently expressed piRNAs in amygdala target 20 proteins. Of these 20 proteins, MT-CO1 and MT-CO3 were included in the Parkinson’s disease KEGG enrichment result, while MT-CO1, MT-CO3, and GRIA4 was included in the Pathways of neurodegeneration-multiple diseases KEGG enrichment result. However, this stringent target prediction method predicted that none of the genes were targeted by piRNAs differently expressed in PD prefrontal cortex.
Figure 7
Venn plots of differently expressed piRNAs. Venn plots of differentially expressed piRNAs from prefrontal cortex (a) and amygdala (b). The Wilcoxon test was used to detect differently expressed piRNAs between different groups with p-value < 0.05. PD: Parkinson’s disease; PDD: Parkinson’s disease dementia.
We also found five piRNAs were differently expressed among the control, premotor stage PD, and motor stage PD in amygdala but not in the control vs. PD (premotor plus motor stage) (Figure 7b). Among these, piR-hsa-131693 is predicted to target MT-CO3 and MT-CO3 related pseudogenes (MT-CO3, MTCO3P12, MTCO3P18, MTCO3P8, MTCO3P38, MTCO3P7, MTCO3P13, and MTCO3P22) and the expression of piR-has-131693 decreases as PD disease progresses in amygdala. In the prefrontal cortex, we also identified different piRNAs expression patterns in PD and PDD, although there were no significant enrichment results with their predicted target genes (Figure 7a). In PD SEVs, we found 17 differently expressed piRNAs, of which piR-hsa-2435261 was predicted to target TANGO2 and piR-hsa-1516701 was predicted to target JAK3. In PD LEVs, two differently expressed piRNAs were detected with no specific predicted target genes.We observed a higher percentage of transposon-derived piRNAs in the PD group in the prefrontal cortex (control vs. PD, control vs. PDD, control vs. PD plus PDD), but no significant difference between PD and PDD (Figure 8a). However, the transposon-derived piRNA expression difference between PD and control in amygdala is not significant (Figure 8b). In contrast, piRNAs in PD showed higher pseudogene alignment in the prefrontal cortex (Figure 9a), where the different piRNA-aligned pseudogenes were HSP90AA1 and EEF1A1-related pseudogenes. Interestingly, these two genes have been reported as PD-related genes and showed a downregulation trend in PD-affected tissues [110,111]. In amygdala, the overall piRNAs-aligned pseudogenes ratio is not different in Parkinson’s disease (Figure 9b), but some related pseudogene-aligned piRNAs, including HSP90, MTCO1, MTCO2, YWHAZ, GAPDH, and MTND were upregulated in the PD group.
Figure 8
The percentage of annotated piRNAs reads mapped to transposons in prefrontal cortex (a) and amygdala (b) Ratio = reads number mapped to transposons (or subtypes)/total annotated piRNAs reads number.
Figure 9
The percentage of annotated piRNAs reads mapped to transcripts in prefrontal cortex (a) and amygdala (b) Ratio = reads number mapped to lncRNA (or protein-coding genes, or pseudogenes)/total annotated piRNAs reads number.
We defined piRNAs-aligned genes with a stringent standard (piRNAs-aligned number > 10 in more than 50% samples or piRNAs-aligned number > 2 in all samples). Using differently expressed analysis (p < 0.05), we identified 24 genes with different piRNAs-alignment numbers between the control and PD in prefrontal cortex and 162 genes in amygdala. Figure 10 shows the top 10 results for each enrichment term. We also obtained Parkinson’s disease KEGG results in the amygdala tissue and prefrontal tissue with a relaxed standard (piRNAs-aligned number >2 in more than 10 samples) (data not shown). From the Parkinson’s disease KEGG enrichment result, five of differently expressed piRNAs-aligned genes (NDUFA4, CALM3, UCHL1, CALM2, and HSPA5) in the prefrontal cortex (15 genes) overlapped with differently expressed piRNAs-aligned genes in the amygdala (16 genes).
Figure 10
Gene set enrichment analysis for genes with different piRNAs-aligned rates. piRNAs-aligned genes was defined by a stringent standard (piRNAs-aligned number > 10 in > 50% of samples or > 2 in 100% of samples). The gene ontology and KEGG pathway enrichment analyses were established by genes with different piRNA aligned rates in PD and control groups. The top 10 results for each enrichment term: BP (biological process), CC (cellular component), MF (molecular function), and KEGG are shown. The tissues are indicated (a–h). Gene ratio = gene number in query list (piRNAs aligned in gene list)/total gene number of specific pathway in database.
3.3. sPLS-DA of piRNAs Expression Level in Control and PD
Lastly, we employed a sparse partial least square discriminant analysis, a supervised principal component analysis analog statistical method [95,97], to distinguish control and PD based on piRNAs expression levels. The leave-one-out cross-validation method was applied to construct 2-classes classifiers/3-classes classifiers (results are shown in Supplementary Table S1). These results show the potential that piRNAs can be biomarkers to separate PD from controls, suggesting piRNAs undergo perturbation in different PD subtypes (PD and PDD) and stages (premotor and motor). We also used piRNAs counts in piRNAs-aligned genes/repeats as indicators to train sPLS-DA classifiers, which showed validation performance improvement over the above piRNAs model to some extent. We also used a random forest algorithm and the results obtained were similar (data not shown). We also observed that a classifier based on piRNAs in blood extracellular vesicles achieve an excellent performace (accuracy: 92%, AUC = 0.89). The sPLS-DA based scatter plots for individuals show a clear separation (Figure 11).
Figure 11
sPLS-DA of piRNA expression data. The 3D sPLS-DA plots for prefrontal cortex (a) and amygdala (b) piRNA expression levels are shown. The optimized components value of blood SEVs is 1, therefore was not presented as a 3d PLS-DA plot.
4. Discussion
In this study, we identified piRNAs that display specific expression patterns across germline and somatic tissues. Compared with germline derived piRNAs, the overall abundance in somatic tissues was much lower but still appreciable. A previous study proposed that piRNA targeting has a greater correlation with binding energy compared to piRNA abundance, suggesting that piRNAs might act in a concentration-independent manner [112]. This supports the notion that somatic piRNAs can exert their functions with low expression levels. We found evidence for a ping-pong cycle biogenesis pathway signal in germline (testis, sperm, and ovary) but not in somatic tissues, which suggests a different piRNA generation pathway in the soma. The chromosomal loci from where piRNAs are generated showed some overlap between somatic and germline tissues; however, most somatic piRNAs were generated from loci distinct from germline cells.While the existence of piRNAs from somatic tissues has been described, the function of these piRNAs remains unclear [107,113,114]. It could be hypothesized that somatic piRNAs function similarly to germline piRNAs, where the primary function is to repress transposons [115]. We found only 20% of piRNAs derived from transposon areas in both germline and somatic tissues. As somatic cells have more extensive epigenetic biomarkers than germline cells [116], the primary function of somatic piRNAs might be different from repressing transposons. Notably, transposon-derived somatic piRNAs have different expression levels and stranded preferences with transposon-derived germline piRNAs, suggesting that transposon repression may not be its only function.Previous studies have shown evidence that piRNAs regulate protein expression and participate in many bioprocesses [117,118,119,120]. Some studies suggest that piRNAs repress gene expression by cleaving mRNAs of protein-coding genes [13,14,121,122]. Another study showed non-transposon-derived piRNAs have the potential to induce DNA methylation at non-transposon regions in a complementary manner in two human cell lines [123]. In addition, studies have indicated that piRNAs activate mRNA translation [15] and localize mRNA [124,125]. Conversely, some evidence has also shown that mRNAs act as a source of piRNAs, which in turn regulate their target mRNAs [22,126]. We hypothesized that piRNAs’ generation and function follow base-pairing rules, and we performed enrichment analysis with the genes aligned by piRNAs. The enrichment results showed neurodegeneration disease in both brain tissues (dorsolateral prefrontal cortex, prefrontal cortex, and amygdala) and germline tissues (testis and ovary), which suggested piRNAs may be involved in neurodegenerative processes. To further test this, we performed a comparative study of Parkinson’s disease in the prefrontal cortex and amygdala. The prefrontal cortex and amygdala were previously shown to be affected in Parkinson’s disease in a study using lesions with immunoreactions against α-synuclein and focusing on limbic and motor regions, and in PET imaging studies measuring blood flow changes, respectively [127,128]. Especially, amygdala was demonstrated to be damaged in the early stage and related to some incipient symptoms, such as anxiety determined by the association of anxiety symptoms and amygdala volume in 110 early stage patients [129]. Imaging studies have also addressed the link between anxiety and the amygdala in Parkinson’s disease especially highlighting the fear circuit [130]. We found that the piRNAs in Parkinson’s disease brains have a higher pseudogene alignment rate, while the cognate protein-coding genes of some of those pseudogenes showed neurodegeneration-related functions. Pseudogene-derived piRNAs have been proposed to suppress their cognate gene level in late spermatocytes [106]. Based on these results, we propose a pseudogenes-piRNAs protein-coding gene regulation network, where those pseudogenes-related piRNAs have the potential to regulate their cognate genes in brains.Finally, we used piRNAs expression levels as features to construct sPLS-DA classifiers with the prefrontal cortex, amygdala, and blood extracellular vesicles [97]. The blood extracellular vesicle-related classifier is particularly interesting, as it has potential to be measured as a noninvasive biomarker in Parkinson’s disease [131,132,133,134,135] To our surprise, the classifier’s performance with small extracellular vesicle piRNAs was excellent (accuracy: 92%, AUC = 0.89). This compares favorably to previous studies where miRNAs were proposed as biomarkers and may further complement studies where the amount of exosomes, α-synuclein, or other proteins were proposed [134,136,137,138]. The variable importance in projection (VIP) score in sPLS-DA indicates a feature’s relevance and importance to the trained model. We found piR-has-92056, piR-hsa-150797, piR-hsa-347751, piR-hsa-1909905, piR-hsa-2476630, and piR-hsa-2834636 showed the highest relevance in our trained model. piR-hsa-92056 and piR-hsa-150797 are derived from SPPL3, which is a signal peptide peptidase and identified as a risk gene of PD in a GWAS study [139]. piR-hsa-2476630 is derived from DOK3, which inhibits lipopolysaccharide signaling in macrophages [140], while lipopolysaccharide causes chronic neuroinflammation and progressive neurodegeneration [141]. piR-has-347751 is derived from RHEB, which alleviates neurodegeneration and induces axon regrowth [142]. piR-hsa-2834636 is derived from HBA2, of which the mRNA is dysregulated in the frontal cortex of neurodegenerative disease [143]. While these associations suggest specific targets through which the piRNAs may act, validation of each specific biochemical step remains to be performed. Model organisms may thus help in this regard, and a study from our lab demonstrated dysregulation of specific miRNAs and piRNAs in a PD model [144]. Ultimately, the function piRNAs in blood extracellular vesicles might reveal information concerning cell–cell communication and disease progression.While we were able to detect specific piRNAs in somatic tissue and demonstrate their dysregulation, the abundance is very small, especially compared to germline tissues. Moreover, we were not able to detect evidence of an amplification system or biological process to produce secondary piRNAs. How these modest levels of piRNAs affect their cognate genes remains to be determined. Recent studies suggest that multiple piRNAs may target a specific mRNA or have broad targeting capacity [109,112]. Thus, the predicted targets of the piRNAs in this study may require further verification and/or refinement. Furthermore, while the notion of dysregulated piRNAs from blood is attractive, their biological origins, such as from brain or other tissues, remain elusive. Previous work on the role of inflammatory processes in neurodegeneration suggests a blood-based pathway but requires additional investigation [145]. Finally, while recent studies have linked transposon expression in the brain to Alzheimer’s disease processes, the role of these sequences in PD is largely unknown [146,147].
5. Conclusions
In summary, we identified and characterized somatic piRNAs from a multitude of non-germline tissues. Their distribution and source of biogenesis appeared to differ from germline tissues. Their function, based upon cognate RNA targets appeared to be different as well. Surprisingly, their gene targets were enriched for neurodegenerative disease and specific genes identified highlighted pseudogenes related to Parkinson’s disease. The results presented here provide insights into the potential role of piRNAs in human neurodegenerative disorders.
Authors: Zheng Yan; Hai Yang Hu; Xi Jiang; Vera Maierhofer; Elena Neb; Liu He; Yuhui Hu; Hao Hu; Na Li; Wei Chen; Philipp Khaitovich Journal: Nucleic Acids Res Date: 2011-05-05 Impact factor: 16.971
Authors: Christopher M Lee; Galt P Barber; Jonathan Casper; Hiram Clawson; Mark Diekhans; Jairo Navarro Gonzalez; Angie S Hinrichs; Brian T Lee; Luis R Nassar; Conner C Powell; Brian J Raney; Kate R Rosenbloom; Daniel Schmelter; Matthew L Speir; Ann S Zweig; David Haussler; Maximilian Haeussler; Robert M Kuhn; W James Kent Journal: Nucleic Acids Res Date: 2020-01-08 Impact factor: 16.971
Authors: David Haussler; Sofie R Salama; Maximilian Haeussler; Jason D Fernandes; Armando Zamudio-Hurtado; Hiram Clawson; W James Kent Journal: Mob DNA Date: 2020-03-31