
Cross-kingdom sequence similarities between human micro-RNAs and plant viruses.

Jovan D Rebolledo-Mendez, Radhika A Vaishnav, Nigel G Cooper, Robert P Friedland.

Abstract

Micro-RNAs regulate the expression of cellular and tissue phenotypes at a post-transcriptional level through a complex process involving complementary interactions between micro-RNAs and messenger-RNAs. Similar nucleotide interactions have been shown to occur as cross-kingdom events; for example, between plant viruses and plant micro-RNAs and also between animal viruses and animal micro-RNAs. In this study, this view is expanded to look for cross-kingdom similarities between plant virus and human micro-RNA sequences. A method to identify significant nucleotide sequence similarities between plant viruses and hsa micro-RNAs was created. Initial analyses demonstrate that plant viruses contain nucleotide sequences which exactly match the seed sequences of human micro-RNAs in both parallel and anti-parallel directions. For example, the bean common mosaic virus strain NL4 from Colombia contains sequences that exactly match the seed sequence of hsa-mir-1226 in the parallel direction, which suggests a cross-kingdom conservation. Similarly, the rice yellow stunt viral cRNA contains a sequence that is an exact match in the anti-parallel direction to the seed sequence of hsa-micro-RNA let-7b. The functional implications of these results need to be explored. The finding of these cross-kingdom sequence similarities is a useful starting point in support of bench level investigations.


Keywords: Tobacco mosaic virus; bioinformatics; environment; evolutionary conservation; health; human miRNA; molecular mimicry; plant virus; post-transcriptional regulation; sequence homology

Year: 2013 PMID: 24228136 PMCID: PMC3821693 DOI: 10.4161/cib.24951

Source DB: PubMed Journal: Commun Integr Biol ISSN: 1942-0889

Introduction

Micro-RNAs (miRs), first described approximately two decades ago, have gained much attention in the research community for their proposed regulatory influences on biological systems. These short non-coding RNA molecules, 19–26 nucleotides in length, can post-transcriptionally regulate gene expression and influence cellular phenotypes in contexts such as cell cycle regulation, oral carcinoma, heart disease and development of the central nervous system. Thus, miRs are key modulators of messenger-RNAs (mRNAs), usually operating through their complementary interactions with the 3′ untranslated regions (3′ UTRs) of mRNA sequences. These complementary interactions between miRs and mRNAs regulate the process of translation and thereby alter the abundance of specific proteins and modulate downstream cellular processes. Interference with miR-associated regulatory events will likely be relevant to the etiology and expression of particular cellular and tissue phenotypes, and of many pathologies. The putative target genes whose expression is regulated by particular miRs can be predicted by statistical and computational modeling methods if the miR sequence is known. Similarly, bioinformatics approaches have been used for the prediction of novel miRs, as well as for predicting the targets of miRs. The miR based regulatory system is also present in plants, and a recent provocative study described the presence of plant derived miRs in mammalian serum. That study demonstrates the ability of plant miRs to escape the GI tract, enter various organs and tissues of the body and influence gene expression. The importance of this report lies in its indication of cross-kingdom interactions. There are other recent reports of potential involvement of miRs in cross-kingdom and cross-species interactions.
Historically, plant viruses have generally been considered safe for human exposure, with little direct evidence of their ability to replicate in humans, or cause or contribute to cellular function or disease. However, it was recently shown that Cowpea mosaic virus (CPMV) can enter the bloodstream after oral administration and can localize in endothelial cells of the blood brain barrier. A commonly consumed plant virus, the pepper mild mottle virus (PMMV), is detectable in stool and can induce an IgM antibody response in humans. Besides the possibility of invoking immune responses, plant viruses may be capable of influencing human gene expression via more direct interactions with genetic materials. The need to look for miR-like sequences in plant viruses is further supported by a recent bioinformatics study demonstrating that some animal viruses contain miRs. Our goal was to determine whether plant virus sequences have sequence similarity with hsa-miRs, either in the parallel (plus/plus) or antiparallel (plus/minus) directions, where similarity means the percent identity of nucleotide sequences. In pursuit of this objective, we developed a sequence processing pipeline in which data are filtered and carefully curated to allow for rational selection of candidate plant virus-human miR interactions for future bench assays of function, in order to better understand cross-kingdom interactions, which may be beneficial or harmful.

Results

For step 1, we used DPVweb, which contains both complete and partial genomes for plant viruses. Therefore, for plant viruses, 37,266 genome sequences were downloaded from DPVweb. A total of 1,424 human miRs were downloaded from miRBase as data set-B (Fig. 1). As many as 478 of these miRs had two mature sequences. Therefore, the total number of sequences in data set-B was actually 1,902. The final data sets from these external databases included 37,266 plant viral sequences in data set-A and 1,902 miR sequences in data set-B.

Figure 1. Pipeline method for determination of similarities between plant viruses and human miRs. Process pipeline developed for finding similarities between named plant viruses and human miRs. Step 1 includes the establishment of the two data sets including (A) plant viruses downloaded from DPVweb, and (B) hsa-miRs, downloaded from miRBase. Step 2 includes the use of NCBI BLASTn for sequence alignments to find sequence similarities between data sets A and B, where the maximum expectation value found is 9.5 and the shortest matching sequence is 6 nucleotides. Step 3 includes the use of filters based on rules: (A) plant viruses must have at least 1,221 nucleotides; (B) there must be at least 11 nucleotides of similarity in such a match; and (C) the BLAST sequence should start in any of the nucleotides from within the miR seed region. The final database can be used for additional data integration and analyses. It was used here to perform clustering analyses and also for identification of viruses that contained sequences that showed similarities with miR seed sequences.

In step 2 of the pipeline, the alignment of nucleotide sequences from among these two data sets was accomplished with the aid of BLASTN version 2.2.25+. The alignment resulted in two streams of output which included “matches” between viral and miR sequences, consisting of strings of six or more consecutive nucleotides, and “no-matches.” The data stream without matching sequences was discarded and the stream with matching sequences was saved and used for further processing. The saved data stream contained over twenty-thousand matching sequences, which were of two categories. One category included 11,653 matching pairs of parallel sequences (plus/plus) and the other category included 11,595 matched pairs of anti-parallel (plus/minus) or complementary sequences.
The output of step 2 was a list of statistically significant matching sequences that showed some level of sequence matching (plus/plus and plus/minus matching) between plant viral sequences and human miR sequences. The data stream was further processed in step 3 with the aid of selection filters to ensure that only high quality matches were retained. For example, one of the filters selected only those matches in which the plant virus sequence was at least 1,221 nucleotides long, to avoid possible duplications from partial viral sequences. In addition, the sequence match had to be 11 or more nucleotides in length, and this could include the first nucleotide of the mature miR. This filter made it less likely that sequence matches were the result of chance. Finally, we included the matches whose sequence region included the complete seed region (nucleotides 2–7) of the miR. The total filtering process reduced the data to a total of 2,479 matching sequences, including parallel and anti-parallel sequence sub-categories. Among the 2,479 matched sequence pairs there were interactions between 78 miRs (4.1% of the 1,902 miRs) and 1,320 different plant viruses (3.54% of the 37,266 plant viruses).
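The three step 3 rules can be sketched as a simple predicate over alignment records. This is a minimal illustration, not the pipeline actually used: the `Hit` record, its field names and the helper functions are hypothetical, and the seed rule is read here as "the match must span miR nucleotides 2–7". The two example hits use coordinates from Table 4; the virus length given for aj890348 is only the lower bound implied by its match coordinates, and 254 nt is the length stated in the s66280 record name.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One viral/miR alignment (hypothetical record layout)."""
    mir_id: str
    virus_id: str
    length: int    # alignment length in nucleotides
    q_start: int   # 1-based start of the match within the miR
    q_end: int     # 1-based end of the match within the miR
    s_start: int   # start of the match within the virus sequence
    s_end: int     # end of the match within the virus sequence


MIN_VIRUS_LEN = 1221   # rule A: minimum plant virus sequence length
MIN_MATCH_LEN = 11     # rule B: minimum length of the shared sequence
SEED = (2, 7)          # rule C: miR seed region, nucleotides 2-7


def strand(hit: Hit) -> str:
    """Descending virus coordinates indicate an anti-parallel match."""
    return "plus/plus" if hit.s_start <= hit.s_end else "plus/minus"


def passes_filters(hit: Hit, virus_len: int) -> bool:
    """Apply rules A-C; the seed rule is an assumed interpretation."""
    covers_seed = hit.q_start <= SEED[0] and hit.q_end >= SEED[1]
    return (virus_len >= MIN_VIRUS_LEN
            and hit.length >= MIN_MATCH_LEN
            and covers_seed)


# Coordinates taken from Table 4.
pvy = Hit("hsa-mir-105-1", "aj890348", 11, 1, 11, 6757, 6767)
blcmv = Hit("hsa-mir-1226", "s66280", 14, 1, 14, 193, 206)

print(strand(pvy), passes_filters(pvy, virus_len=6767))    # lower-bound length
print(strand(blcmv), passes_filters(blcmv, virus_len=254)) # too short for rule A
```

Under these assumptions the short s66280 segment is rejected by rule A even though its 14-nucleotide match covers the seed, which is the behavior the length filter is meant to provide.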

Data analyses

A clustering analysis of the sequences in the database was adopted, as described below, to get a better understanding of the overall content of the database.

Plant viruses

This analysis showed the presence of four major clusters (A–D), in which plant viruses were matched through various levels of sequence identity with one or more hsa-miRs. The identification of clusters was based on the BLAST bit-scores and the length of their shared sequence similarities (Fig. 2). The four clusters were labeled according to size. For example, the smallest cluster A contained 25 viruses, each of which had sequences of 14 nucleotides in length that shared identities with hsa-miRs. The cluster B contained 63 viruses with slightly shorter sequence identities and cluster C contained 204 matching viruses with still shorter sequence identities. Cluster D, the largest cluster, contains 615 viruses each with sequences of 11 nucleotides in length sharing identities with hsa-miRs. Each of the major clusters intersected with one or more of the other clusters. For example, the intersection between clusters A and B consists of 14 viruses, with either 14 or 13 nucleotide sequence lengths, sharing identities. The intersection between B and C contained just one virus, rice tungro spherical virus (am234049) that matched sequences of either 11 or 12 nucleotides in length depending on the major cluster. While clusters A and B are subsets of cluster D, cluster C was not. It overlapped with clusters D and B, but not with A.

Figure 2. Overlapping clusters of plant viruses based on human miRs with which they have sequence similarities. Venn diagram showing the four subsets (A–D), based on the bit-score for all the plant viruses that had sequence similarities with human miRs. These data were obtained as the result of Step 3 (Fig. 1). The clusters represent the entire holdings in the database. The priority in this diagram was to show the BLAST bit-scores distributed among the different subsets of plant viruses. Subset (A) (color olive) contains 25 matching viruses with a score of 28.4 bits and matches 14 nucleotides long; subset (B) (color dark khaki) contains 63 matching viruses with a score of 30.2 bits and matches 13 nucleotides long; subset (C) (color dark green) contains 204 matching viruses with a score of 24.3 bits and matches 12 nucleotides long; subset (D) (color pink) contains 615 matching viruses with a score of 22.3 bits and matches 11 nucleotides long; cluster (I1) (color gray) contains 14 matching viruses with scores of 24.3 bits (14 nucleotides) and 26.3 bits (13 nucleotides); and cluster (I2) (color bright green) contains 1 matching virus, rice tungro spherical virus (am234049), with scores of 26.3 bits (13 nucleotides long), 24.3 bits (12 nucleotides long) and 22.3 bits (11 nucleotides long).

Rank-ordered plant viruses

The viruses that shared some sequence identity (plus/plus or plus/minus) with one or more miRs were rank-ordered according to the number of miRs with which they showed some level of similarity. The two top ranked viruses are listed (Table 1) with their corresponding miR partners. The top ranked plant virus was the rice yellow stunt virus (ab516283), which has sequence similarities with twelve different human miRs. The second most frequent was Citrus tristeza virus, for which there are multiple strains. Strain T30 (eu937520) matched with ten miRs. Strains [Citrus tristeza virus-isolate NZRB-M12 (fj525431); Citrus tristeza virus isolate NZRB-TH28 (fj525433); Citrus tristeza virus isolate NZRB-TH30 (fj525434); and, Citrus tristeza virus isolate NZRB-M17 (fj525435)] all matched the same ten miRs. Finally, Citrus tristeza virus isolate NZRB-G90 (fj525432) shared some sequence identity with nine miRs. Citrus tristeza virus T30 (af260651) also shares sequence identity with nine different miRs.

Table 1. Plant virus ranking

Rank	Virus name	DPVweb plant virus index	Virus strain	hsa miRs
1	Rice yellow stunt virus viral cRNA	ab516283		mir-1253, mir-1248, let-7b, let-7a-1, let-7a-3, mir-107, let-7a-2, mir-103a-1, mir-1-1, let-7c, mir-103b-1, mir-103b-2
2	Citrus tristeza virus	eu937520	Strain T30	*mir-122, mir-1244-1, mir-1244-2, mir-1244-3, let-7f-1, let-7f-2, mir-1184-1, mir-1184-2, mir-1184-3, mir-1228
		fj525431, fj525433, fj525434, fj525435	Isolate NZRB-M12, isolate NZRB-TH28, isolate NZRB-TH30, isolate NZRB-M17	mir-1244-1, mir-1244-2, mir-1244-3, let-7f-1, mir-1184-1, mir-1184-2, mir-1184-3, let-7f-2, mir-1180, mir-1193
		fj525432	Isolate NZRB-G90	mir-1244-1, mir-1244-2, mir-1244-3, let-7f-1, mir-1184-1, mir-1184-2, mir-1184-3, let-7f-2, mir-1193
		af260651	T30	*mir-122, mir-1244-1, mir-1244-2, mir-1244-3, let-7f-1, mir-1184-1, mir-1184-2, mir-1184-3, let-7f-2

Ranking of the two plant viruses that have the largest number of sequences matching human miRs. Rice yellow stunt virus viral cRNA, followed by Citrus tristeza virus, were the plant viruses with the most matching similarities. The latter virus has seven strains that matched hsa miRs, where mir-122 can be used to distinguish the fj strains from the af and eu strains.
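The rank ordering in Table 1 amounts to counting distinct miR partners per virus accession. A minimal sketch, using three of the accessions and their miR lists exactly as given in Table 1; the variable and function names are illustrative only.

```python
# miR partners per DPVweb accession, copied from Table 1.
partners = {
    "ab516283": ["mir-1253", "mir-1248", "let-7b", "let-7a-1", "let-7a-3",
                 "mir-107", "let-7a-2", "mir-103a-1", "mir-1-1", "let-7c",
                 "mir-103b-1", "mir-103b-2"],
    "eu937520": ["mir-122", "mir-1244-1", "mir-1244-2", "mir-1244-3",
                 "let-7f-1", "let-7f-2", "mir-1184-1", "mir-1184-2",
                 "mir-1184-3", "mir-1228"],
    "fj525432": ["mir-1244-1", "mir-1244-2", "mir-1244-3", "let-7f-1",
                 "mir-1184-1", "mir-1184-2", "mir-1184-3", "let-7f-2",
                 "mir-1193"],
}


def rank_by_partner_count(table):
    """Sort accessions by number of distinct miR partners, largest first."""
    counts = {acc: len(set(mirs)) for acc, mirs in table.items()}
    # Ties are broken alphabetically by accession for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_partner_count(partners)
for accession, n_mirs in ranking:
    print(accession, n_mirs)
```

The same counting, applied in the other direction (viruses per miR), gives the kind of ranking reported for the miRs.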

Human micro-RNAs

This analysis showed the presence of four major miR-clusters (A–D), in which hsa-miRs shared some level of identity with one or more viruses, based on the BLAST bit-scores and the length of shared sequence similarities (Fig. 3). For example, the smallest cluster A had the best bit-scores, with ten miRs, each of which had a 14 nucleotide length of sequence similarities with plant viruses. Cluster B had 16 miRs, each with a 13-nucleotide sequence identity with viruses. Cluster C contained 48 miRs with lengths of 12 nucleotides sharing identities with viruses. Cluster D contained 73 miRs, each of which had sequence similarities of 11 nucleotides in length with viruses. Some of the clusters had overlapping subsets. For example, the intersection of clusters A and B contained two miRs shared by each cluster. In addition, several miRs were found to have more than one sequence similarity with a virus sequence. While there are 314 such matches, there was only one instance of a miR having three sequence similarities with a virus sequence (data not shown).

Figure 3. Overlapping clusters of hsa-miRs based on the number of plant viruses with which they have sequence similarities. Diagram showing the clustering of human miRs based on their sequence similarities with plant viruses. These data were obtained as an outcome of Step 3 (Fig. 1). This figure represents the entire holdings in the database. The priority for this particular result was set to show the subsets of hsa-miRs based on their BLAST bit-scores. Subset (A) (color bright green) contains 10 matching miRs with a score of 28.2 bits and matches 14 nucleotides long; subset (B) (color dark khaki) contains 16 matching miRs with a score of 26.3 bits and matches 13 nucleotides long; subset (C) (color dark green) contains 48 matching miRs with a score of 24.3 bits and matches 12 nucleotides long; subset (D) (color pink) contains 73 matching miRs with matches 11 nucleotides long; and subset (I1) (color yellow green) contains 2 matching miRs that are contained in each other score subset.
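For perfect ungapped matches, the BLAST bit-score depends only on the match length, which is why each cluster is characterized by a single score. The sketch below uses the standard conversion from raw score to bit-score, S' = (lambda * S - ln K) / ln 2. The values of lambda and K are an assumption (those commonly reported for nucleotide scoring with match +1 and mismatch -3); the scoring parameters of the actual BLASTN run are not stated in the text.

```python
import math

# Assumed Karlin-Altschul parameters for ungapped nucleotide scoring with
# match = +1 / mismatch = -3. These values are NOT given in the article.
LAMBDA = 1.374
K = 0.711


def bit_score(match_length: int, reward: int = 1) -> float:
    """Bit-score of a perfect ungapped match of the given length."""
    raw_score = reward * match_length
    return (LAMBDA * raw_score - math.log(K)) / math.log(2)


for n in range(11, 15):
    print(n, round(bit_score(n), 1))
```

With these assumed parameters the function gives 22.3, 24.3, 26.3 and 28.2 bits for matches of 11, 12, 13 and 14 nucleotides, in line with the scores quoted for subsets D through A of the miR clusters.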

Rank-ordered micro-RNAs

The top four rank-ordered human miRs are listed (Table 2) with their corresponding numbers of plant virus partners. These sequences contained the complete seed sequence of the identified miR and sequence similarities of 11 or more nucleotides in length. Some of the plant viruses have a number of different strain identities (for example, see Table 1). The miR with the highest number of shared sequence identities was hsa-mir-1253 with some sequence identity to 136 plant viruses/strains (Table 2). The second highest was hsa-mir-1243 with some level of similarity to 110 plant viruses. The third ranked included hsa-mir-105-1 and hsa-mir-105-2 with similarities to the same 83 plant viruses. The fourth ranked miR was hsa-mir-1207, which had similarities to 81 plant viruses. A top four miR listing, rank ordered by the number of shared identities with named plant viruses, is available as a .

Table 2. miR ranking

Rank	miRs	Number of viruses/strains
1	hsa-mir-1253	136
2	hsa-mir-1243	110
3	hsa-mir-105-1	83
3	hsa-mir-105-2	83
4	hsa-mir-1207	81

Top rank-ordered hsa-miRs that share identity with one or more plant viruses, with 11 or more matching nucleotides including the complete seed region.

Analysis of the seed sequence

The region of the miRs considered to be the most important in miR-mRNA interactions is the miR-seed region (nucleotides 2–7). For our analysis, the total seed sequence of the miRs had to be evident in the viral sequences in a parallel or complementary fashion (plus/plus or plus/minus). In addition, for this particular analysis, the total sequence similarity had to be at least 11 nucleotides long, which is about 50% the length of a mature miR. Together, these requirements led to the output of a total of 1,073 sequence matches. This list was examined for sequence matches that appeared to be the most representative of genuine miRs. For example, it contained sequence matches between viruses and miRs of between 11 and 15 nucleotides in length. Two subgroups were also identified: (1) those in which the seed sequence between nucleotides 2–7 were identical between plant virus and miR; and (2) those in which the seed sequences were identical and where the first nucleotide prior to the seed sequence was also a match. These requirements led to a list of 555 sequences with the plus/plus or parallel configuration (whose selected top 20 candidates are found in the ). A few of these are listed below (Table 3). While there were a number of sequence similarities between plant viruses and miRs which had lengths exceeding 14 nucleotides, their bit-scores were either not as good as those described above or they did not match the complete seed sequence (data not shown). Perhaps it is not surprising that strains of a given virus have the same sequence similarities even when it contains a miR seed sequence (Table 3, Fig. 4).
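The seed requirement described above can be illustrated with a direct string search for miR nucleotides 2–7 on the viral plus strand (plus/plus) and for their reverse complement (plus/minus). This is a simplified sketch rather than the BLAST-based procedure used in the study; the function names are illustrative, and the flanking "aaaa"/"cc" nucleotides in the first example are invented padding. The miR fragments and the anti-parallel example are taken from Tables 3 and 4.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def seed(mir: str) -> str:
    """Seed region: nucleotides 2-7 (1-based) of a sequence starting at miR nt 1."""
    return mir[1:7]


def find_seed(virus: str, mir: str):
    """Return (orientation, 1-based position) for every seed occurrence."""
    hits = []
    targets = (("plus/plus", seed(mir)),
               ("plus/minus", reverse_complement(seed(mir))))
    for orientation, target in targets:
        start = virus.find(target)
        while start != -1:
            hits.append((orientation, start + 1))
            start = virus.find(target, start + 1)
    return hits


# hsa-mir-1226 fragment shared with bean common mosaic virus (Table 3),
# embedded in invented flanking nucleotides.
print(find_seed("aaaa" + "gtgagggcatgcag" + "cc", "gtgagggcatgcag"))

# Anti-parallel case (Table 4): the viral plus strand carries the reverse
# complement of the miR fragment tcaccagccct.
print(find_seed(reverse_complement("tcaccagccct"), "tcaccagccct"))
```

The first call reports a single plus/plus seed occurrence and the second a single plus/minus occurrence, mirroring the two subgroups of matches distinguished in the text.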

Table 3. Best plus/plus matching ranking based on seed region

DPVweb plant virus index	Virus name	Virus sequence	miR sequence	miR name
dq666332	Bean common mosaic virus strain NL4 from Colombia	gtgagggcatgcag	gtgagggcatgcag	1226
eu761198	Bean common mosaic virus isolate MS1	gtgagggcatgcag	gtgagggcatgcag	1226
af017780	Sour cherry green ring mottle virus	caaatgctcagact	caaatgctcagact	105-1
aj312438	Bean common mosaic virus cowpea isolate Y	gtgagggcatgcag	gtgagggcatgcag	1226
af017780	Sour cherry green ring mottle virus	caaatgctcagact	caaatgctcagact	105-2
ay575773	Blackeye cowpea mosaic virus	gtgagggcatgcag	gtgagggcatgcag	1226
aj312437	Bean common mosaic virus cowpea isolate R	gtgagggcatgcag	gtgagggcatgcag	1226
l32603	Sonchus yellow net virus	cagttatcacagtg	cagttatcacagtg	101-1

Figure 4. Nucleotide sequences showing similarity between mir-105-1, mir-105-2 and sour cherry green ring mottle virus (af017780). The partial sequences for mir-105-1 and mir-105-2 are aligned with a section of af017780. Note that miR105-1 and miR105-2 are both quite similar, with the exception of a single nucleotide (dotted-lined box). Both mir-105-1 and mir-105-2 contain 81 nucleotides. The length of the plant virus is 8,372 nucleotides, but the significant subset, nucleotides 695–708, that accompanies the similarities with mir-105-1 and mir-105-2 is shown here. The solid-lined box shows the seed region matching completely in all three sequences.

Highest ranking viruses which contain the complete seed sequence of hsa-miRs with plus/plus strand and a similarity of 14 nts. Among these, Bean common mosaic virus has different strains that have a high matching with the seed region of miRs. miR 1226 shares complete seed region matching with two different plant viruses.

Statistical analysis of similarities

Although we used BLAST’s bit-score to define statistical significance, we also compared these results with those obtained from randomly generated sequences. The corresponding p values were 8.94E-3 for the length of the sequence similarity and 4.23745E-10 for the relative location of the similarity inside the plant virus genome. These results demonstrate that the two data sets, namely the similarities obtained from the randomly generated plant virus sequences and those obtained from the real data, are significantly different from each other and that their statistical distributions also differ. These results indicate the presence of a much higher degree of specificity in nature.
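The statistical test behind these p values is not specified in the text. As one possible way to make such a comparison concrete, the sketch below estimates an empirical p value by asking how often a randomly generated sequence shares a run with a miR at least as long as the observed one. It assumes uniform base composition and uses a simple longest-shared-run measure; both choices, and all function names, are illustrative assumptions and not the authors' method.

```python
import random


def random_sequence(length: int, rng: random.Random) -> str:
    """Random nucleotide sequence with uniform base composition (assumed)."""
    return "".join(rng.choice("acgt") for _ in range(length))


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest contiguous substring common to a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def empirical_p(observed: int, mir: str, virus_length: int,
                n_random: int = 200, seed: int = 0) -> float:
    """Fraction of random sequences matching at least as well as observed."""
    rng = random.Random(seed)
    as_good = sum(
        longest_shared_run(random_sequence(virus_length, rng), mir) >= observed
        for _ in range(n_random)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (as_good + 1) / (n_random + 1)


mir_fragment = "gtgagggcatgcag"   # hsa-mir-1226 fragment from Table 3
p_value = empirical_p(observed=14, mir=mir_fragment, virus_length=200)
print(p_value)
```

A realistic analysis would use full-length viral sequences, their actual base composition and many more random replicates; the short length and small replicate count here only keep the example fast.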

Specific cases

While the presence of viruses in plants has obvious relevance to the agricultural economy, it has been suggested that there may be consequences for human health. We investigated the presence of miR-like sequences in plant viruses associated with four particular plants for which some health relevance is clearly plausible. In this case, the viral sequence had to match half the length of a miR (11 nucleotides) or more. Those data that included complete seed sequences are summarized below and in Table 4. For the following analyses, matching sequences from both complete and incomplete viral genomes were selected.

Table 4. Selected plant viruses matched sequences

Plant virus index	Plant virus name	Plant virus sequence	miR sequence	hsa miR
ay555269	Tobacco mosaic virus coat protein mRNA, complete cds.	224 cggtgctggat 234	2 cggtgctggat 12	hsa-mir-1250
z29370	Tobacco mosaic virus (Crucifer) genomic RNA for RNA-dependent RNA polymerase; 122K protein; transport protein and coat protein	832 tcaccagccct 822	1 tcaccagccct 11	hsa-mir-1226
am236832	Potato virus Y partial gene for polyprotein, helper component-proteinase, HC-Pro region, genomic RNA, isolate Anhui5	683 actggatcaatt 672	2 actggatcaatt 13	hsa-mir-1243
aj439544	Potato virus Y gene for polyprotein, genomic RNA, isolate SON41	6646 cagttatcacag 6657	1 cagttatcacag 12	hsa-mir-101-1
aj890348	Potato virus Y strain C gene for polyprotein, genomic RNA, isolate Adgen	6757 acggatgtttg 6767	1 acggatgtttg 11	hsa-mir-105-1
aj890348	Potato virus Y strain C gene for polyprotein, genomic RNA, isolate Adgen	6757 acggatgtttg 6767	1 acggatgtttg 11	hsa-mir-105-2
u06789	Potato virus Y-VN genome, 3′ untranslated region.	850 gagggtcttgg 860	1 gagggtcttgg 11	hsa-mir-1182
af395678	Blackeye Cowpea mosaic virus polyprotein gene, partial cds.	1054 gtgagggcatgcag 1067	1 gtgagggcatgcag 14	hsa-mir-1226
aj515903	Blackeye Cowpea mosaic virus partial gene for polyprotein, genomic RNA	629 gtgagggcatgcag 642	1 gtgagggcatgcag 14	hsa-mir-1226
s66280	Polymerase, coat protein [blackeye Cowpea mosaic virus BlCMV, W, Genomic, 254 nt, segment 2 of 2].	193 gtgagggcatgcag 206	1 gtgagggcatgcag 14	hsa-mir-1226
x00206	Cowpea mosaic virus bottom component RNA (B RNA)	4830 tcaccagccct 4820	1 tcaccagccct 11	hsa-mir-1226
af395678	Blackeye Cowpea mosaic virus polyprotein gene, partial cds.	267 cagatgatcta 257	2 cagatgatcta 12	hsa-mir-1245b
s66253	Polymerase, coat protein [blackeye Cowpea mosaic virus BlCMV, W, Genomic, 945 nt, segment 1 of 2].	348 cagatgatcta 338	2 cagatgatcta 12	hsa-mir-1245b
dq198144	Blackeye Cowpea mosaic virus strain Coimbatore coat protein gene, partial cds.	417 tcacacctgcc 427	1 tcacacctgcc 11	hsa-mir-1228
eu170481	Southern Cowpea mosaic virus coat protein gene, complete cds.	773 gtgttcacagc 763	2 gtgttcacagc 12	hsa-mir-124-1
eu170481	Southern Cowpea mosaic virus coat protein gene, complete cds.	773 gtgttcacagc 763	2 gtgttcacagc 12	hsa-mir-124-2
eu170481	Southern Cowpea mosaic virus coat protein gene, complete cds.	773 gtgttcacagc 763	2 gtgttcacagc 12	hsa-mir-124-3
aj308228	Pepper mild mottle virus proviral gene for 183 kDa protein, gene for 126 kDa protein, gene for 28 kDa protein and cp gene, genomic RNA	531 ctgcgcaagct 521	1 ctgcgcaagct 11	hsa-let-7i
fn594778	Pepper mild mottle virus partial 126K gene for replicase small subunit, isolate P96/41, genomic RNA	51 ctgcgcaagct 41	1 ctgcgcaagct 11	hsa-let-7i
fn594779	Pepper mild mottle virus partial 126K gene for replicase small subunit, isolate P03/17, genomic RNA	51 ctgcgcaagct 41	1 ctgcgcaagct 11	hsa-let-7i
fn594780	Pepper mild mottle virus partial 126K gene for replicase small subunit, isolate P95/25, genomic RNA	51 ctgcgcaagct 41	1 ctgcgcaagct 11	hsa-let-7i
fn594781	Pepper mild mottle virus partial 126K gene for replicase small subunit, isolate P95/22, genomic RNA	51 ctgcgcaagct 41	1 ctgcgcaagct 11	hsa-let-7i
fn594782	Pepper mild mottle virus partial 126K gene for replicase small subunit, isolate P02/4, genomic RNA	51 ctgcgcaagct 41	1 ctgcgcaagct 11	hsa-let-7i
fn594783	Pepper mild mottle virus partial 126K gene for replicase small subunit, isolate P02/5, genomic RNA	51 ctgcgcaagct 41	1 ctgcgcaagct 11	hsa-let-7i
fn594784	Pepper mild mottle virus partial 126K gene for replicase small subunit, isolate P84/4, genomic RNA	51 ctgcgcaagct 41	1 ctgcgcaagct 11	hsa-let-7i
fn594785	Pepper mild mottle virus partial 126K gene for replicase small subunit, isolate P89/4, genomic RNA	51 ctgcgcaagct 41	1 ctgcgcaagct 11	hsa-let-7i
fr671374	Pepper mild mottle virus partial 126K gene for replicase, isolate P95/23, genomic RNA	51 ctgcgcaagct 41	1 ctgcgcaagct 11	hsa-let-7i

Selected plant viruses in order of TMV, PVY, CPMV, and PMMoV, showing matching between plant virus and miR sequences.

Tobacco mosaic virus (TMV) is a positive-sense, single-stranded RNA virus that infects more than 150 plant species including tobacco, tomatoes and peppers. TMV is found in cigarettes and chewing (smokeless) tobacco and in the saliva of smokers, and we have detected antibodies to TMV in human serum. Two TMV strains shared some sequence identities with hsa-mir-1226 and hsa-mir-1250, with a plus/minus and a plus/plus strand respectively (Fig. 5 and Table 4). Both of these shared sequences included coverage of the complete miR seed sequences (Fig. 5).

Figure 5. Two strains of Tobacco mosaic virus, z29370 and ay555269, show sequence similarity to human miRs. The mir-1226 and virus z29370 have an antisense similarity (plus/minus), while mir-1250 and virus ay555269 have a sense (plus/plus) similarity. The seed region of the miRs is shown in the solid-line box. In (A) the similarity starts one nucleotide before the seed region, and it is followed by three nucleotides after such similarity. (B) shows the similarity is extended four nucleotides after the seed region.

Potato virus Y (PVY) is an RNA virus in the family Potyviridae that infects potatoes, peppers, tomatoes and tobacco. Infection with PVY limits crop yield but does not destroy all growth. PVY is found worldwide, and it is estimated that 15% of potato crops are infected. It is likely that some potatoes consumed by humans are infected with PVY. Previous studies have shown similarities between this plant virus protein and amyloid Aβ 1–42 peptide, a protein found in plaques in Alzheimer-diseased brains. Specifically, mouse antibodies to PVY bind to amyloid Aβ. An analysis showed sequence similarity of PVY with five human miRs including hsa-mir-1243, hsa-mir-101-1, hsa-mir-105-1, hsa-mir-105-2 and hsa-mir-1182 (Table 4). The miR-gene targets for each of these miRs were identified in miRBase (data not shown). The gene targets for these miRs were examined for any potential overlap (Figs. 6 and 7). Overlapping gene identities included KLF12, C17orf39, RUNX1, ANKRD52, CAMK2G and the locus ID, LOC100507421 (Fig. 6). Similarly, the clusters of the targeted genes for miR-1243, miR-101 and miR-105 contained three genes in common, including SMAD2, SLMO2, NAP1L1 (Fig. 7).
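The overlap analysis of predicted target genes reduces to a set intersection across the target lists of the miRs. In the sketch below only the six shared genes/loci are taken from the text; the `PLACEHOLDER_*` entries are invented stand-ins for the remaining, unlisted targets of each miR, and the variable names are illustrative.

```python
# The six genes/loci reported as common to miR-101, miR-105 and miR-1182.
reported_common = {"LOC100507421", "KLF12", "C17orf39",
                   "RUNX1", "ANKRD52", "CAMK2G"}

# Hypothetical, heavily truncated target sets: the real sets are reported
# to hold 803, 479 and 110 genes. PLACEHOLDER_* names are not real genes.
targets = {
    "miR-101": reported_common | {"PLACEHOLDER_A", "PLACEHOLDER_B"},
    "miR-105": reported_common | {"PLACEHOLDER_B", "PLACEHOLDER_C"},
    "miR-1182": reported_common | {"PLACEHOLDER_D"},
}

# Genes targeted by all three miRs.
shared_by_all = set.intersection(*targets.values())
print(sorted(shared_by_all))

# Genes shared by exactly one pair of miRs can be found the same way.
shared_101_105_only = (targets["miR-101"] & targets["miR-105"]) - targets["miR-1182"]
print(sorted(shared_101_105_only))
```

Replacing the placeholder sets with the full target lists for each miR would reproduce the kind of three-way overlap summarized in the Venn diagrams.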

Figure 6. Gene sets similarities across miRs relevant to miR-101/101ab, miR-105/105ab, and miR-1182. Clusters from the sets that contain the target genes, taken from miRBase, for the three miRs: miR-101/101ab (with 803 genes), miR-105/105ab (with 479 genes) and miR-1182 (with 110 genes). The cluster that belongs to all the three groups is formed by 6 genes/loci: LOC100507421, KLF12, C17orf39, RUNX1, ANKRD52, CAMK2G.

Figure 7. Gene sets similarities across miRs relevant to miR-101/101ab, miR-105/105ab, and miR-1243. Groups from the sets that contain the target genes, taken from miRBase, for the three miRs: miR-101/101ab (with 803 genes), miR-105/105ab (with 479 genes) and miR-1243 (with 139 genes). The cluster that belongs to all the three groups is formed by three genes: SMAD2, SLMO2, NAP1L1.

Cowpea mosaic virus (CPMV) is a plant comovirus in the superfamily of picornaviruses that has recently been explored as a prospective nanoparticle delivery system for use via oral and systemic routes. It has been shown to enter the bloodstream and cells of mammals following oral intake. Plant virus strains af395678, aj515903, s66280 and x00206 share sequence similarity with hsa-mir-1226. Strains af395678 and s66253 have a match with hsa-mir-1245b. Strain dq198144 has a match with hsa-mir-1228, and strain eu170481 has matches with three miRs including: hsa-mir-124-1, hsa-mir-124-2 and hsa-mir-124-3 (Table 4). Pepper mild mottle virus (PMMoV) is a non-enveloped, rod-shaped, single-stranded, positive-sense RNA virus classified in the genus Tobamovirus. It is extremely resistant to physical and chemical agents and is one of the major pathogens of Capsicum (chili peppers). The pepper mild mottle virus is highly abundant in human stool and is easily transmitted by the oral route, by handling during cultivation, and is also transmitted through contaminated seeds.
Ten strains of PMMoV (aj308228, fn594778, fn594779, fn594780, fn594781, fn594782, fn594783, fn594784, fn594785, fr671374) contain sequences that share sequence identity, including the complete seed sequence, with the miR hsa-let-7i (Fig. 8 and Table 4).

Figure 8. Sequence similarities between pepper mild mottle virus and hsa-let-7i. The seed region is inside the solid-line box. The complete sequence similarity between the different strains of the virus and hsa-let-7i consists of 11 nucleotides. Every PMMoV strain is complementary (plus/minus) to the miR. The virus strain aj308228 shares the similarity from nucleotide 531, while all the other strains do so from nucleotide 51 (antisense).

Discussion

We report here the novel finding of sequence similarities between plant viruses and human cellular miRs. To our knowledge, such a large-scale comparison of plant virus and human miR sequences has not previously been reported. While miRs have been identified in DNA viruses, there has been less evidence for the presence of miRs in RNA viruses, and it is currently held that the viruses most likely to encode miRs are DNA viruses. The novel finding of cross-kingdom similarities, in particular between the seed sequences of hsa-miRs and plant viral sequences, most of which come from RNA viruses, may provide an indication of the evolutionary conservation of this particular structural-functional feature of miRs. As viruses are known to undergo rapid molecular evolution, it may be useful to explore further the relative conservation between the virus seed-like and non-seed sequences. Computational tools are helping to open up the field of host-pathogen interactions, in part through the ability to assess similarities and/or complementarities between nucleotide sequences in large data sets. The results presented here indicate the presence of sequence similarities in both the parallel and anti-parallel directions. One example of similarity in the parallel (plus/plus) direction between a plant virus and a hsa-miR is the bean common mosaic virus cowpea isolate Y and hsa-mir-1226, which share 14 nucleotides, including the miR's seed region. It remains to be seen whether a full-length viral miR would comprise this 14-nucleotide sequence plus other contiguous nucleotides, but virus-encoded miRs have certainly been reported. The existence of viral miR-like sequences opens the intriguing possibility that they may be able to interact with cellular mRNAs and essentially compete with cellular miRs. In humans, there are 312 conserved targets for hsa-mir-1226, including NRP1, involved in control of cell migration, and NPAS3, involved in neurogenesis.
Thus, viral miR-like sequences could have far-reaching consequences if they gain access to the cellular machinery. However, it is not clear that the real targets of plant virus miRs will be found in the cells of multicellular organisms, whether plant or animal, and it is possible that the natural targets of viral miR-like sequences will be found within the RNA genome of the virus itself. These are questions that await further bench level investigations. The principal action of cellular miRs is to downregulate the activity of cellular mRNAs through complementary base-pairing interactions. Perez-Quintero and colleagues (2010) propose a broader view in which plant miRs provide antiviral activity for plants. Similarly, animal miRs appear to have antiviral properties against animal viruses. Plant cellular miRs do not appear to have more putative base-pairing targets with plant-infecting viruses than with animal-infecting viruses. If so, it would seem reasonable to suggest that the defensive nature of the miR activities is a conserved property which, to some extent at least, employs a mode of action that crosses kingdoms. This suggests that base-pairing between miR sequences and other RNA nucleotide sequences may have implications beyond the cellular mRNA regulatory machinery with which it has been characteristically associated. For example, hsa-miR-122 has been shown to interact with complementary sites on the Hepatitis C RNA genome and may be regarded as a regulator of viral replication. Another example is the inhibitory effect exerted by respiratory syncytial virus (RSV) on hsa-mir-221. Uncovering the general principle of such activities could lead to a more complete understanding of host defense mechanisms.
miR-like sequences present in the viral RNA genome will compete with and impair appropriate cellular miR function if accessibility is not an issue. We also found that both the set of real plant viruses and the set of randomly created sequences have similarities with human miRs, based on the statistical significance of BLAST's bit-score. Nevertheless, and based on p values, we obtained strong evidence that the random set does not match the real set statistically. Different examples of plant virus sequences that were anti-parallel to hsa-miR seed sequences were found. For example, the soybean dwarf virus (ab038147) contains a plus/minus sequence that has a 100% match with the seed regions of hsa-mir-103a-1 and hsa-mir-103a-2, while also matching a total of 14 nucleotides with them. For those viruses that contain such anti-parallel or plus/minus sequences, it could be that they too are players in the viral regulatory machinery. Alternatively, if cellular systems are considered, these viral sequences would most likely base pair with the cellular miRs to reduce their activity. The hsa-mir-103a family has 650 predicted target genes associated with it, including SCN1A, a sodium channel responsible for action potential initiation and propagation in neurons; GABRG2, a GABA-A receptor with associated chloride channels; and CACNA2D1, a calcium channel gene also present in neurons. Again, it is not clear that such viral complementary sequences would have access to hsa-miRs or vice versa, but if they did they might have rather widespread effects. It has been shown that noncoding RNA in Herpesvirus saimiri binds to and causes the degradation of human miR-27, resulting in enhanced infection. Alternatively, cellular miRs may bind to complementary viral sequence and thereby regulate its ability to transfect cells.
Other studies have suggested interactions between viral sequences and cellular miRs. To our knowledge this is the first large-scale demonstration of sequence similarities between plant viruses and human miRs. We found a large number of such similarities, many of which demonstrate that plant virus genomes contain sequences that are identical to human miRs. In addition, some plant viruses contain sequences that are complementary to human miRs. These results may be important from an evolutionary point of view. We propose to use the data reported in our study to support future bench level investigations of functional significance.

Materials and Methods

A data processing pipeline was developed for collecting sequences from plant virus genomes and hsa-miRs and for procuring high quality comparative data, as depicted in Figure 1.

Step 1

Sequences from two databases were downloaded: (1) Descriptions of Plant Viruses (DPVweb), including the complete set of independent files corresponding to plant virus genomes (Fig. 1, data set A; downloaded on April 7th, 2011); and (2) the miRBase sequence database (version 16, November 2nd, 2011), a repository of published miR sequences and associated annotations. From miRBase, we selected only those sequences for human miRs (hsa-miRs; Fig. 1, data set B). The set of plant viruses and the set of hsa-miRs were FASTA formatted for alignment with the Basic Local Alignment Search Tool for nucleotide-nucleotide matching (BLAST+ 2.2.25+), distributed by the National Center for Biotechnology Information (NCBI), and BLASTN was run on a local server.
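
The selection of human entries in Step 1 can be sketched in a few lines of Python. This sketch assumes that the miR records are available as FASTA text whose header identifiers begin with a species prefix such as `hsa-`. The helper names and the two example records are hypothetical, and the U-to-T conversion is an assumption about how miR sequences might be made comparable with viral sequences stored in the DNA alphabet, not a documented step of the pipeline.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_human(records, prefix="hsa-"):
    """Keep only records whose identifier starts with the species prefix."""
    return [(h, s) for h, s in records if h.startswith(prefix)]


def to_fasta(records, width=60):
    """Write records as FASTA text in the DNA alphabet (assumed U -> T)."""
    lines = []
    for header, seq in records:
        seq = seq.upper().replace("U", "T")
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical records, not real miRBase entries.
example = (
    ">hsa-toy-1 hypothetical record\nUGAGGUAGUAGG\n"
    ">ath-toy-2 hypothetical record\nUCGGACCAGGCU\n"
)
human_fasta = to_fasta(select_human(parse_fasta(example)))
print(human_fasta, end="")
```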

Step 2

BLASTN 2.2.25+ was used to detect matches that demonstrated sequence similarity between plant viral RNAs and human miRs. A sequence of six or more consecutive matching nucleotides was required. The bit-score was used to select the statistically significant subsets. This score is a numerical value that describes the overall quality of an alignment, with higher bit-scores associated with better alignments.
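
For reference, the bit-score S' follows from the raw alignment score S through the Karlin-Altschul relation S' = (λS − ln K) / ln 2, and the expected number of chance hits in a search space of size m × n is E = m · n · 2^(−S'). The Python sketch below evaluates both. The default λ and K values are assumptions: they are the ungapped parameters commonly reported for a +1/−3 nucleotide scoring scheme, and the scoring parameters actually used in this study are not stated in the text.

```python
import math


def bit_score(raw_score, lam=1.374, k=0.711):
    """Convert a raw alignment score to a bit-score.

    lam and k are Karlin-Altschul parameters; the defaults are assumed
    values for a +1/-3 match/mismatch scheme, not taken from the study.
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits(bits, query_len, db_len):
    """Expected number of chance alignments with at least this bit-score."""
    return query_len * db_len * 2.0 ** (-bits)


# Eleven consecutive matches scored +1 each give a raw score of 11.
bits = bit_score(11)
print(round(bits, 1))
```

Because short alignments can only reach low bit-scores, comparing real viral genomes against randomized sequences is a sensible complement to the score itself.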

Step 3

All sequence matches were then filtered to select higher quality data. First, we used only those plant viruses with sequence lengths equal to or greater than 1,221 nucleotides to filter out partial viral sequences. In addition, we required any matching sequence to span at least half the length of a mature miR, that is, 11 nucleotides. A third filtering step required the presence of at least some part of the miR seed sequence, nucleotides 2–7. The seed sequence is important for complementary interactions of miRs with their gene targets and is reported to include the second through the seventh nucleotides of the mature miR sequence. Selecting just those similarities that begin at the second nucleotide of the human miR yielded 393 sequence similarities. In addition, we allowed searches of partial virus genomes larger than 1,221 nucleotides for studies of four specific crop plants (tobacco, potato, pepper and cowpea) for which there were only incomplete viral genomes available.

Finally, we performed a statistical analysis on the results. GenRGenS software was used to randomly generate genomic sequences. We created randomized versions of every single plant virus genome in the DPV database. The settings used were the Markov model, with order and phase equal to one and to those in our original database. Using the same miRBase-filtered human miRs as in our original study, we analyzed similarities to the new randomly generated plant virus-like sequences with the same pipeline. By filtering the results for similarities starting at the second nucleotide of the human miRs, we obtained 9,359 similarities (compared with only 393 found between naturally existing plant viruses and human miRs). We statistically compared both sets of analyses to determine if there were any similarities.
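
The three filters of Step 3 can be sketched as below. The record layout (dictionary keys such as `virus_len`, `mir_start` and `mir_end`) is hypothetical and only stands in for whatever tabular BLASTN output was actually parsed; coordinates are assumed to be 1-based and inclusive on the mature miR, and the example hits are invented for illustration.

```python
MIN_VIRUS_LEN = 1221          # minimum viral sequence length (nucleotides)
MIN_MATCH_LEN = 11            # half the length of a mature miR
SEED_START, SEED_END = 2, 7   # seed positions on the mature miR


def overlaps_seed(mir_start, mir_end):
    """True if the matched miR interval covers any seed position."""
    return mir_start <= SEED_END and mir_end >= SEED_START


def passes_filters(hit):
    """Apply the length, match-length and seed-overlap filters to one hit."""
    match_len = hit["mir_end"] - hit["mir_start"] + 1
    return (
        hit["virus_len"] >= MIN_VIRUS_LEN
        and match_len >= MIN_MATCH_LEN
        and overlaps_seed(hit["mir_start"], hit["mir_end"])
    )


def starts_at_seed(hit):
    """True if the match begins at the second nucleotide of the miR."""
    return hit["mir_start"] == SEED_START


# Hypothetical hits, not results from the study.
hits = [
    {"virus": "toyA", "virus_len": 6000, "mir_start": 2, "mir_end": 15},
    {"virus": "toyB", "virus_len": 900, "mir_start": 2, "mir_end": 15},
    {"virus": "toyC", "virus_len": 6000, "mir_start": 9, "mir_end": 21},
    {"virus": "toyD", "virus_len": 6000, "mir_start": 1, "mir_end": 8},
    {"virus": "toyE", "virus_len": 6000, "mir_start": 5, "mir_end": 16},
]
kept = [h["virus"] for h in hits if passes_filters(h)]
seed_anchored = [h["virus"] for h in hits if passes_filters(h) and starts_at_seed(h)]
print(kept, seed_anchored)
```

Running the same functions on hits from real and from randomized genomes would give the two counts that are compared in the statistical analysis.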

The database

The pipeline fed into a set of curated data that was subsequently used for a variety of additional analyses. For example, a clustering analysis of these data provides an overview of the collected information from either the virus's or the miR's point of view. In addition, some filtering was used to identify viral sequences that shared identity with the complete miR seed sequences. Finally, viruses associated with a few agriculturally important crops were examined.

46 in total

1. Exogenous plant MIR168a specifically targets mammalian LDLRAP1: evidence of cross-kingdom regulation by microRNA.

Authors: Lin Zhang; Dongxia Hou; Xi Chen; Donghai Li; Lingyun Zhu; Yujing Zhang; Jing Li; Zhen Bian; Xiangying Liang; Xing Cai; Yuan Yin; Cheng Wang; Tianfu Zhang; Dihan Zhu; Dianmu Zhang; Jie Xu; Qun Chen; Yi Ba; Jing Liu; Qiang Wang; Jianqun Chen; Jin Wang; Meng Wang; Qipeng Zhang; Junfeng Zhang; Ke Zen; Chen-Yu Zhang
Journal: Cell Res Date: 2011-09-20 Impact factor: 25.617

2. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets.

Authors: Benjamin P Lewis; Christopher B Burge; David P Bartel
Journal: Cell Date: 2005-01-14 Impact factor: 41.582

3. GenRGenS: software for generating random genomic sequences and structures.

Authors: Yann Ponty; Michel Termier; Alain Denise
Journal: Bioinformatics Date: 2006-03-30 Impact factor: 6.937

Review 4. MicroRNAs: powerful new regulators of heart disease and provocative therapeutic targets.

Authors: Eva van Rooij; Eric N Olson
Journal: J Clin Invest Date: 2007-09 Impact factor: 14.808

5. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

Authors: S Karlin; S F Altschul
Journal: Proc Natl Acad Sci U S A Date: 1990-03 Impact factor: 11.205

6. Viral nanoparticles associate with regions of inflammation and blood brain barrier disruption during CNS infection.

Authors: Leah P Shriver; Kristopher J Koudelka; Marianne Manchester
Journal: J Neuroimmunol Date: 2009-04-25 Impact factor: 3.478

7. Human cytomegalovirus infection alters the expression of cellular microRNA species that affect its replication.

Authors: Fu-Zhang Wang; Frank Weber; Carlo Croce; Chang-Gong Liu; Xudong Liao; Philip E Pellett
Journal: J Virol Date: 2008-07-02 Impact factor: 5.103

8. microRNA target predictions across seven Drosophila species and comparison to mammalian targets.

Authors: Dominic Grün; Yi-Lu Wang; David Langenberger; Kristin C Gunsalus; Nikolaus Rajewsky
Journal: PLoS Comput Biol Date: 2005-06-24 Impact factor: 4.475

Review 9. Computational challenges in miRNA target predictions: to be or not to be a true target?

Authors: Christian Barbato; Ivan Arisi; Marcos E Frizzo; Rossella Brandi; Letizia Da Sacco; Andrea Masotti
Journal: J Biomed Biotechnol Date: 2009-06-17

10. Humans have antibodies against a plant virus: evidence from tobacco mosaic virus.

Authors: Ruolan Liu; Radhika A Vaishnav; Andrew M Roberts; Robert P Friedland
Journal: PLoS One Date: 2013-04-03 Impact factor: 3.240
