
piRBase: integrating piRNA annotation in all aspects.

Jiajia Wang1,2,3,4, Yirong Shi3,5, Honghong Zhou1,2,3, Peng Zhang3,6, Tingrui Song3, Zhiye Ying1,2, Haopeng Yu1,2, Yanyan Li1,2,3, Yi Zhao1,2,7, Xiaoxi Zeng1,2, Shunmin He1,2,3,4,6, Runsheng Chen1,2,3,6,8.   

Abstract

Piwi-interacting RNAs are a type of small noncoding RNA that have various functions. piRBase is a manually curated resource focused on assisting piRNA functional analysis. piRBase release v3.0 is committed to providing more comprehensive piRNA related information. The latest release covers >181 million unique piRNA sequences, including 440 datasets from 44 species. More disease-related piRNAs and piRNA targets have been collected and displayed. The regulatory relationships between piRNAs and targets have been visualized. In addition to the reuse and expansion of the content in the previous version, the latest version has additional new content, including gold standard piRNA sets, piRNA clusters, piRNA variants, splicing-junction piRNAs, and piRNA expression data. In addition, the entire web interface has been redesigned to provide a better experience for users. piRBase release v3.0 is free to access, browse, search, and download at http://bigdata.ibp.ac.cn/piRBase.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.


Year:  2022        PMID: 34871445      PMCID: PMC8728152          DOI: 10.1093/nar/gkab1012

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Piwi-interacting RNAs (piRNAs) are a type of small noncoding RNA that bind to Piwi proteins (1,2) to form a complex and perform vital functions in the germline, somatic tissues and diseases. Unlike the other two types of small RNAs, microRNAs (miRNAs) and siRNAs, piRNAs are 24–31 nucleotides (nt) in length and 2′-O-methylated at the 3′ terminus (3–5). Recent studies have shown that piRNAs loaded with the ovary-specific Piwi protein, PIWIL3, have no 2′-O-methylation at the 3′ end (6,7). Many piRNAs originate from large genomic regions called piRNA clusters (2,8,9), which can be transcribed into long single-stranded precursors and further processed into mature piRNAs (10–14). Some piRNA precursors have been reported to contain introns (15). Numerous studies have shown that piRNAs have many important biological functions (16). piRNAs are necessary for the maintenance of genome stability. The Piwi/piRNA complex can regulate the activity of transposons at the genomic level through heterochromatin formation to silence transcription (17–21) and at the posttranscriptional level by repressing expression (9,22–26). Moreover, piRNAs are also involved in sex determination (27,28) and the regulation of mRNA and lncRNA expression (28–38). Accumulating studies have shown that piRNAs are expressed not only in germline cells but also in cancer cells (39–45) and somatic cells (46,47). piRNAs that are abnormally expressed in various cancers can be used as potential diagnostic and prognostic biomarkers. For example, Fu et al. found that the SNP rs132306 in piR-021285 is significantly associated with an increased risk of breast cancer (48). Beyond cancers, some piRNAs are aberrantly expressed in other diseases. In Alzheimer's disease (AD), Qiu et al. found that >100 piRNAs are differentially expressed in diseased versus normal brain tissue, and most of these piRNAs are associated with risk SNPs (49). piRNAs can also serve as potential biomarkers for AD (50).
piRNAs also play important roles in other noncancer diseases, such as cardiovascular disease (51–53). Previous studies have shown that defects in the Piwi protein, which influence the piRNA pathway, can lead to male sterility (22). Recent studies have revealed that the production of fertile oocytes in female golden hamsters depends on piRNAs (54–56). These studies have extended our understanding of piRNA functions. piRBase is currently the largest and most comprehensive piRNA database, and has been imported into RNAcentral (57) as an expert database. The number of different piRNAs in the current release has reached over 181 million, including 440 datasets from 44 species. Information about newly added piRNAs is annotated and displayed based on the previous modules. More information about piRNA-like small RNAs (piRNA-like sRNAs) and diseases, including cancers and other types of diseases, has been collected. Moreover, piRBase release v3.0 visualizes the regulatory network among the piRNAs and their targets. In addition to expanding the existing content, we added new content in piRBase release v3.0. At present, a very large number (up to tens of millions) of piRNA sequences are available from various methods, including chromatography, Piwi protein IP, Piwi protein CLIP-seq, oxidation and sRNA-seq. To obtain a smaller and more representative piRNA set, we introduced the concept of the gold standard piRNA set in piRBase release v3.0 according to the characteristics of piRNAs. We offer the gold standard piRNA set for six species, which will help users study piRNAs more effectively. Additionally, information on piRNA clusters, piRNA variants, splicing-junction piRNAs and piRNA expression profiles of various datasets was added to the new version. Moreover, the web interface of piRBase release v3.0 has been redesigned.

DATA COLLECTION AND PROCESSING

Data collection and piRBase ID allocation

According to the data collection method in the previous versions of piRBase (58), new sequences defined as piRNAs in the corresponding article were collected from the literature, supplementary files and related GEOs (59). In addition, other relevant information, such as piRNA clusters, piRNA targets, piRNAs/piRNA-like sRNAs and diseases, was extracted from the published literature. New piRNA sequences of existing species in previous versions were assigned sequential piRBase IDs. For the new species added to the current piRBase, the naming rules of the piRNA sequence are piR + three characters (representing a species) + sequential numbers. Similarly, the IDs of piRNA clusters in piRBase are clus + three characters (representing a species) + sequential numbers. The list of three letter abbreviations and species is shown in Supplementary Table S1.
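The naming rule above can be illustrated with a minimal sketch. The hyphen separator, the helper names `make_pirna_id` and `make_cluster_id`, and the species code `xyz` are assumptions made for illustration only; the actual three-letter abbreviations are those listed in Supplementary Table S1, and the exact separator used by piRBase is not stated in the text.

```python
def _make_id(prefix: str, species_abbr: str, serial: int, sep: str) -> str:
    """Join prefix + three-letter species code + sequential number."""
    if len(species_abbr) != 3 or not species_abbr.isalpha():
        raise ValueError("species abbreviation must be exactly three letters")
    if serial < 1:
        raise ValueError("sequential number must be a positive integer")
    return f"{prefix}{sep}{species_abbr.lower()}{sep}{serial}"


def make_pirna_id(species_abbr: str, serial: int, sep: str = "-") -> str:
    """piRNA sequence ID: 'piR' + species code + sequential number."""
    return _make_id("piR", species_abbr, serial, sep)


def make_cluster_id(species_abbr: str, serial: int, sep: str = "-") -> str:
    """piRNA cluster ID: 'clus' + species code + sequential number."""
    return _make_id("clus", species_abbr, serial, sep)


# 'xyz' is a placeholder species code, not a real piRBase abbreviation.
print(make_pirna_id("xyz", 1))
print(make_cluster_id("xyz", 42))
```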

Genome alignment, splicing-junction piRNAs and annotation

To obtain the piRNA alignment information, Bowtie (60) was used to map the piRNA sequences to the corresponding genome with parameters ‘-v 1 -a -m 10 --best --strata’, as described in the previous versions of piRBase. However, some piRNAs cannot be mapped to the genome with this alignment strategy. Considering the possibility of splicing, the piRNAs left unmapped by Bowtie were realigned with STAR (61) using transcript annotation in piRBase release v3.0. The results show that many piRNAs span intron regions. We call these splicing-junction piRNAs. Based on their genomic loci, piRNAs are annotated into two categories, gene- or repeat-related piRNAs, as in previous versions of piRBase.
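For readers who want to reproduce the first alignment step, the sketch below assembles a Bowtie (version 1) command line from the quoted parameters. Only ‘-v 1 -a -m 10 --best --strata’ comes from the text; the `-f` (FASTA input) and `-S` (SAM output) flags, the file names and the helper name `bowtie_command` are assumptions, and the positional argument layout may differ between Bowtie versions. The command is only built and printed, not executed.

```python
import shlex


def bowtie_command(index_prefix: str, reads_fasta: str, out_sam: str) -> list:
    """Build a Bowtie 1 argument list using the parameters quoted in the text."""
    return [
        "bowtie",
        "-v", "1",    # allow at most one mismatch per alignment
        "-a",         # report all valid alignments for a read ...
        "-m", "10",   # ... but suppress reads with more than 10 reportable alignments
        "--best",     # order alignments from best to worst
        "--strata",   # keep only alignments in the best mismatch stratum
        "-f",         # assumption: reads are supplied as FASTA
        "-S",         # assumption: write SAM output
        index_prefix,
        reads_fasta,
        out_sam,
    ]


# Hypothetical file names, for illustration only.
cmd = bowtie_command("genome_index", "pirna_sequences.fa", "pirna_alignments.sam")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` once the index and input paths point at real files.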

piRNA variant annotation

The alignment results show that not all piRNAs match the genome perfectly. Some mismatches between piRNAs and genome sequences could be caused by genomic variants. To extract variant information, uniquely mapped records with one mismatch were selected from the Bowtie results. A mismatch site was designated as a piRNA variant if >10 nonredundant piRNA sequences supported the same mismatch site. Human piRNA variants are annotated by dbSNP (62) and associated with RS IDs if they exist.
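The support rule described above can be sketched as follows. This is an illustrative reading, not the authors' code: the record layout `(sequence, chrom, pos, ref_base, alt_base)`, the function name `call_variant_sites` and the choice to key a site by position together with the substituted base are assumptions. The input is assumed to already contain only uniquely mapped alignments with exactly one mismatch.

```python
from collections import defaultdict

MIN_SUPPORT = 10  # a site needs MORE than this many distinct piRNA sequences


def call_variant_sites(records, min_support=MIN_SUPPORT):
    """Return {site: number of supporting nonredundant sequences} for sites
    whose support exceeds min_support.

    records: iterable of (sequence, chrom, pos, ref_base, alt_base) tuples.
    """
    support = defaultdict(set)
    for seq, chrom, pos, ref, alt in records:
        # A set keeps each piRNA sequence once, i.e. nonredundant support.
        support[(chrom, pos, ref, alt)].add(seq)
    return {
        site: len(seqs)
        for site, seqs in support.items()
        if len(seqs) > min_support
    }


# Toy data: 11 distinct sequences support one site, 10 support another.
toy = [(f"SEQ{i}", "chr1", 100, "A", "G") for i in range(11)]
toy += [(f"SEQ{i}", "chr2", 5, "C", "T") for i in range(10)]
toy.append(("SEQ0", "chr1", 100, "A", "G"))  # duplicate record, counted once
variants = call_variant_sites(toy)
print(variants)
```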

piRNA expression profiles and gold standard piRNA sets

To determine the expression of piRNAs in various tissues, piRNA read counts in various datasets were collected, and these data are shown in piRBase (Supplementary Table S2). In this version, the CPM (counts per million) in each dataset was calculated for each piRNA sequence based on the total read count in the dataset and the read count of each sequence. piRNAs interact with the Piwi protein (1,2) and have 2′-O-methylation at their 3′ termini (3–5). Although some piRNAs have no 2′-O-methylation (6,7), the 2′-O-methylation enriched sequences are considered a subset of piRNAs. These properties allow enrichment of piRNA sequences through methods such as immunoprecipitation (IP) of Piwi proteins and oxidation treatment (3). piRBase release v3.0 defines gold standard piRNA sets based on support from different kinds of piRNA enrichment datasets. Matched Piwi protein IP or oxidation treatment versus small RNA sequencing data were collected. We used a binomial model to calculate the P values of piRNA enrichment based on piRNA counts in enriched (ne) and matched control small RNA (nc) datasets. If a piRNA has equal proportions in the two libraries, then ne/nc should be equal to Ne/Nc, where Ne and Nc are the total read counts of the two libraries. Let p = Ne/(Ne + Nc) and n = ne + nc; then, under this null hypothesis, ne follows a binomial distribution, X ∼ B(n, p). The enrichment P value can be calculated as the probability of X ≥ ne. To include datasets lacking matched controls, the distributions of significantly enriched piRNAs at different abundance levels were examined for matched datasets. Considering the impact of library depth, piRNAs were sorted by counts in the enriched library, and the proportion of significantly (P < 0.05) enriched piRNAs was calculated in the most abundant piRNAs whose count sum exceeded a certain proportion of the total count in the enriched library (Figure 1).
Based on the distributions, we chose a threshold of 0.2 to select the top piRNAs as the gold standard piRNAs in the enriched datasets lacking matched controls. For matched libraries, the top piRNAs with the same threshold were also selected regardless of significance, while piRNAs falling into the 0.2–0.5 count proportion range were added to the gold standard piRNA set only if they were significantly (P < 0.05) enriched.
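The enrichment test and the selection rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation (their source code is provided as Supplementary file 1): the function names, the log-space summation, the tie and boundary handling at the 0.2 and 0.5 cutoffs, and the toy counts are all assumptions.

```python
import math


def enrichment_p_value(ne: int, nc: int, Ne: int, Nc: int) -> float:
    """Upper-tail binomial probability P(X >= ne) for X ~ B(n, p),
    with n = ne + nc and p = Ne / (Ne + Nc). Requires Ne > 0 and Nc > 0."""
    n = ne + nc
    p = Ne / (Ne + Nc)
    log_p, log_q = math.log(p), math.log(1.0 - p)
    terms = []
    for k in range(ne, n + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), computed in log space
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        terms.append(math.exp(log_term))
    return min(1.0, math.fsum(terms))


def select_gold_standard(counts, p_values=None, top=0.2, upper=0.5, alpha=0.05):
    """Select gold standard sequences from one enriched library.

    counts:   {sequence: read count in the enriched library}
    p_values: {sequence: enrichment P value}, or None if no matched control
    """
    total = sum(counts.values())
    selected = set()
    if total == 0:
        return selected
    before = 0  # cumulative count of all more abundant sequences
    for seq, c in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if before <= top * total:
            selected.add(seq)  # top tier: kept regardless of significance
        elif p_values is not None and before <= upper * total:
            if p_values.get(seq, 1.0) < alpha:
                selected.add(seq)  # 0.2-0.5 tier: kept only if significant
        else:
            break  # beyond the last applicable tier
        before += c
    return selected


# Toy example: equal library sizes, so p = 0.5.
print(enrichment_p_value(ne=3, nc=1, Ne=100, Nc=100))
toy_counts = {"d": 40, "a": 30, "b": 15, "c": 15}
print(select_gold_standard(toy_counts, {"a": 0.01, "b": 0.001}))
```

For large libraries the explicit sum can be slow; a library routine such as `scipy.stats.binom.sf(ne - 1, n, p)` computes the same upper-tail probability.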
Figure 1.

The proportion of significantly enriched piRNAs with different cutoffs of the most abundant piRNAs. The piRNAs in each enriched dataset were sorted by counts, and the cutoffs of the top piRNAs were determined when the sum of the top piRNA counts exceeded a certain proportion of the total read count in the enriched dataset. The matched libraries are shown in Supplementary Table S3. 16.5dpc, 16.5 days post-coitum. fetal1st, 1st trimester embryos. fetal2nd, 2nd trimester embryos. Ox, oxidation treatment. Mili, Mili IP. Miwi, Miwi IP. Miwi2, Miwi2 IP.


DATABASE CONTENT

piRBase release v3.0 not only expanded on previous content and modules but also added some new content that could assist piRNA function studies more comprehensively. Given the large number of piRNAs, gold standard piRNA sets and piRNA clusters are provided in piRBase release v3.0. In addition to general information such as piRNA loci, piRNA targets and some basic annotations, more information on piRNAs is introduced into the current piRBase. Variants in piRNAs increase the diversity of piRNA sequences, and they may result in abnormal biological functions. Relevant information on splicing-junction piRNAs is shown on the detailed information page and is visualized in JBrowse (63). The collected piRNA-related data and the results processed by further analyses are released in v3.0.

New piRNA records and statistics

piRBase has the largest number of piRNA sequences and the most species compared to other similar databases. More than 181 million unique piRNA sequences from 44 species are included in piRBase release v3.0, and the number of datasets reaches 440. Figure 2 shows the statistics for the data in the current piRBase. In Figure 2A, the number of piRNA sequences for each species is presented. Figure 2B and C shows the number of datasets and the length distribution of piRNA sequences from different species in piRBase release v3.0, respectively.
Figure 2.

Statistics of piRNA data in piRBase release v3.0. (A) The number of piRNA sequences from each species. (B) Dataset composition of piRBase release v3.0. Three characters represent species (refer to Figure 2A), and the numbers represent the number of datasets. (C) The length distribution of piRNAs from each species.

piRBase release v3.0 contains the most piRNA sequences compared with other piRNA-related databases. piRBase includes all piRNA sequences in piRNAQuest (64) and piRNABank (65) except for platypus, which has no piRNA sequences provided in the related literature. In addition, there are 16 datasets in piRNAdb (https://www.pirnadb.org/), 4 of which are not in the 440 datasets of piRBase because these 4 datasets do not meet our inclusion criteria.

Gold standard piRNA set

To obtain a more representative piRNA set, we defined the gold standard piRNA set based on two features of piRNAs: binding to Piwi protein and 2′-O-methylation at the 3′ end. We collected matched high-throughput sequencing data from six species (human, mouse, rat, Drosophila melanogaster, cow and crab-eating macaque) and marked the enriched sequences in Piwi IP- or oxidation-treated libraries as the gold standard piRNA set.

Splicing-junction piRNAs

Some piRNA precursors have been reported to contain introns (15). In piRBase release v3.0, piRNAs not mapped to the genome in the first alignment step are further mapped using known splicing junctions. Some unmapped piRNAs could be located based on splicing junctions (Figure 3A). The relevant information is presented in the ‘Location’ panel of the piRNA detailed information page. The genes and repeats overlapping with splicing-junction piRNAs are also shown in the same panel. In addition, by clicking the location link, users can browse information near a piRNA, such as gene annotation and adjacent piRNAs, in JBrowse (63).
Figure 3.

New content and visualizations in piRBase release v3.0. (A) The number of splicing-junction piRNAs in the unmapped piRNAs during the Bowtie alignment step. (B) An example of an expression profile for one piRNA in human. (C, D) piRNA regulatory relationships in simplified mode (C) and detailed mode (D).


piRNA clusters

A substantial fraction of piRNAs originate from genomic loci termed piRNA clusters. We manually collected information on piRNA clusters from the related literature and assigned piRBase cluster IDs. piRBase release v3.0 incorporates piRNA clusters of four species, with information on genomic location, strand, the number of piRNAs within the cluster region, genome version and species. The general information of a specific cluster and the piRNAs in the cluster are presented on the detailed information page. All the related piRNAs in this cluster can be browsed, and the detailed information page of each piRNA can be accessed through the piRNA name. piRNAclusterDB (66) includes data for many small RNA sequences that are not specific to piRNA studies, and its piRNA clusters are predicted using proTRAC (67), whereas the piRNA clusters in piRBase were collected from high-quality literature. piRBase integrated only the information of piRNA clusters from different literature sources.

piRNA variants

Single nucleotide polymorphisms (SNPs) are a common type of genomic variant, and many SNPs are associated with phenotypes and diseases (68). The updated piRBase version shows the variants in piRNAs based on the genome position, the type of base substitution, and the number of different piRNAs containing a specific variant. In the detailed information page of piRNA variants, the general information and the related piRNAs containing the variant are displayed. For human, piRNA variants are annotated by RS IDs with links to other SNP function annotation databases, such as dbSNP (62), HaploReg (69), eQTL (70), PancanQTL (71) and piRNA-eQTL (72).

piRNA expression profiles

Numerous studies have shown that piRNAs are not specific to the germline. They also exist in many somatic tissues (73,74). In piRBase release v3.0, the CPM expression values of piRNAs from five species (Supplementary Table S2) were calculated, and they can be visualized for specific piRNAs on the piRNA details page (Figure 3B). In addition, users can select samples of interest to calculate the P value of the differential expression of piRNAs. These data provide the basis for the study of piRNA functions in different tissues.
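The CPM normalization mentioned here follows directly from its definition: the read count of a sequence divided by the total read count of the dataset, scaled to one million. A minimal sketch is given below; the function name `counts_per_million` and the toy sequences are illustrative assumptions.

```python
def counts_per_million(read_counts):
    """Convert {sequence: read count} for one dataset into {sequence: CPM}."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("dataset has no reads")
    return {seq: count * 1_000_000 / total for seq, count in read_counts.items()}


# Toy dataset with four reads in total.
toy_dataset = {"TGAGGTAGTAGGTTGTATAGTTAAAA": 1, "TCCATCTGGACAAGTGGTTCAGTGCC": 3}
cpm = counts_per_million(toy_dataset)
print(cpm)
```

Because each dataset is scaled by its own total, CPM values are comparable across libraries of different sequencing depth.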

piRNA targets

In the previous version, mRNA targets of piRNAs in mouse, silkworm, Caenorhabditis elegans and predicted lncRNA targets in mouse testis (58,75) were included. In this update, more piRNA targets were extracted from the published literature and added to piRBase. The target information of five species was included. As described in the previous version, piRNA target records can be browsed by selecting a species and accessing links to the detailed information page of piRNAs by piRBase ID. Moreover, on the detailed information page, there is a network that visualizes the regulatory relationships between a specific piRNA and its targets, which can be displayed in both simplified and detailed mode (Figure 3C, D).

piRNA/piRNA-like sRNA and disease

An increasing number of piRNA/piRNA-like sRNAs have been closely associated with diseases (45). In piRBase release v3.0, we manually collected piRNA/piRNA-like sRNAs related not only to cancers but also to other types of diseases from the literature. In total, 17 types of diseases (13 types of cancer, cardiovascular disease, stroke, Parkinson's disease and Alzheimer's disease) were incorporated, and related information is shown on the corresponding pages.

Optimization of online tools

piRBase release v3.0 provides a homepage search function. Users can search the information collected in piRBase by entering a piRBase ID. The ‘Name Convert Tool’ under the ‘Tools’ menu on the piRBase website converts piRNA names from the NCBI database (76), piRNABank (65), piRNAclusterDB (66) and other databases into piRBase names.

User interface and visualization

In piRBase release v3.0, the entire web interface has been redesigned to provide a better experience for users. All modules and functionality in previous versions have been kept and optimized. The new modules for piRNA variants and piRNA clusters have been added to the current version. In the detailed information page for each piRNA, there are panels for visualizing the expression of the piRNA in different datasets and the regulatory relationships between piRNAs and targets. In the current piRBase, we used the JBrowse (63) genome browser to visualize piRNA positions and annotations in different genomic regions. JBrowse is fast and easily embedded into websites. All species whose genome and annotation files are available can be visualized by selecting the appropriate species in JBrowse. In addition, information such as piRNA variants, piRNA target sites and piRNA-related epigenetic data can be linked to JBrowse in corresponding pages.

CONCLUSION

piRBase is a manually curated resource for piRNAs that focuses on piRNA function analyses. piRBase release v3.0 covers more species and unique piRNA sequences than the previous version. According to the characteristics of piRNAs, piRBase release v3.0 provides gold standard piRNA sets. To further expand the research on piRNA functions, information on splicing-junction piRNAs and piRNA variants is included in piRBase release v3.0. Because piRNAs are potential disease biomarkers, information on disease-related piRNA/piRNA-like sRNAs is collected not only for cancers but also for other diseases, such as cardiovascular disease, stroke, Parkinson's disease and Alzheimer's disease. In addition to the further collection of piRNA targets, piRBase release v3.0 visualizes the regulation of piRNA targets, which makes the regulatory network of piRNAs more intuitive. The current version also provides the expression profiles of piRNAs in various tissues at different developmental stages and supports research on piRNAs in different tissues. Finally, the web interface of piRBase release v3.0 has been redesigned to provide a better user experience. We believe that piRBase release v3.0 will make valuable contributions to the field of piRNA.

DATA AVAILABILITY

piRBase release v3.0 is free to access, browse, search and download at http://bigdata.ibp.ac.cn/piRBase. The source code of analyses has been provided as Supplementary file 1.
  76 in total

1.  piR-823, a novel non-coding small RNA, demonstrates in vitro and in vivo tumor suppressive activity in human gastric cancer cells.

Authors:  Jia Cheng; Hongxia Deng; Bingxiu Xiao; Hui Zhou; Fei Zhou; Zhisen Shen; Junming Guo
Journal:  Cancer Lett       Date:  2011-10-10       Impact factor: 8.679

2.  A distinct small RNA pathway silences selfish genetic elements in the germline.

Authors:  Vasily V Vagin; Alla Sigova; Chengjian Li; Hervé Seitz; Vladimir Gvozdev; Phillip D Zamore
Journal:  Science       Date:  2006-06-29       Impact factor: 47.728

3.  The 3' termini of mouse Piwi-interacting RNAs are 2'-O-methylated.

Authors:  Tomoya Ohara; Yuriko Sakaguchi; Takeo Suzuki; Hiroki Ueda; Kenjyo Miyauchi; Tsutomu Suzuki
Journal:  Nat Struct Mol Biol       Date:  2007-03-25       Impact factor: 15.369

4.  Developmentally regulated piRNA clusters implicate MILI in transposon control.

Authors:  Alexei A Aravin; Ravi Sachidanandam; Angelique Girard; Katalin Fejes-Toth; Gregory J Hannon
Journal:  Science       Date:  2007-04-19       Impact factor: 47.728

5.  C. elegans piRNAs mediate the genome-wide surveillance of germline transcripts.

Authors:  Heng-Chi Lee; Weifeng Gu; Masaki Shirayama; Elaine Youngman; Darryl Conte; Craig C Mello
Journal:  Cell       Date:  2012-06-25       Impact factor: 41.582

6.  NCBI GEO: archive for functional genomics data sets--update.

Authors:  Tanya Barrett; Stephen E Wilhite; Pierre Ledoux; Carlos Evangelista; Irene F Kim; Maxim Tomashevsky; Kimberly A Marshall; Katherine H Phillippy; Patti M Sherman; Michelle Holko; Andrey Yefanov; Hyeseung Lee; Naigong Zhang; Cynthia L Robertson; Nadezhda Serova; Sean Davis; Alexandra Soboleva
Journal:  Nucleic Acids Res       Date:  2012-11-27       Impact factor: 16.971

7.  Widespread expression of piRNA-like molecules in somatic tissues.

Authors:  Zheng Yan; Hai Yang Hu; Xi Jiang; Vera Maierhofer; Elena Neb; Liu He; Yuhui Hu; Hao Hu; Na Li; Wei Chen; Philipp Khaitovich
Journal:  Nucleic Acids Res       Date:  2011-05-05       Impact factor: 16.971

8.  Hamster PIWI proteins bind to piRNAs with stage-specific size variations during oocyte maturation.

Authors:  Kyoko Ishino; Hidetoshi Hasuwa; Jun Yoshimura; Yuka W Iwasaki; Hidenori Nishihara; Naomi M Seki; Takamasa Hirano; Marie Tsuchiya; Hinako Ishizaki; Harumi Masuda; Tae Kuramoto; Kuniaki Saito; Yasubumi Sakakibara; Atsushi Toyoda; Takehiko Itoh; Mikiko C Siomi; Shinichi Morishita; Haruhiko Siomi
Journal:  Nucleic Acids Res       Date:  2021-02-15       Impact factor: 16.971

9.  JBrowse: a dynamic web platform for genome visualization and analysis.

Authors:  Robert Buels; Eric Yao; Colin M Diesh; Richard D Hayes; Monica Munoz-Torres; Gregg Helt; David M Goodstein; Christine G Elsik; Suzanna E Lewis; Lincoln Stein; Ian H Holmes
Journal:  Genome Biol       Date:  2016-04-12       Impact factor: 13.583

10.  Sequence-dependent but not sequence-specific piRNA adhesion traps mRNAs to the germ plasm.

Authors:  Anastassios Vourekas; Panagiotis Alexiou; Nicholas Vrettos; Manolis Maragkakis; Zissimos Mourelatos
Journal:  Nature       Date:  2016-03-07       Impact factor: 49.962
