Sihao Huang, Wen Zhang, Christopher D Katanski, Devin Dersh, Qing Dai, Karen Lolans, Jonathan Yewdell, A Murat Eren, Tao Pan.
Abstract
Pseudouridine (Ψ) is an abundant mRNA modification in the mammalian transcriptome, but its functions have remained elusive because transcriptome-wide mapping is difficult. We develop a nanopore native RNA sequencing method for quantitative Ψ prediction (NanoPsu) that utilizes native content training, machine learning modeling, and single-read linkage analysis. Biologically, we find interferon-inducible Ψ modifications in interferon-stimulated gene transcripts, consistent with a role of Ψ in enabling the efficacy of mRNA vaccines.
Keywords: Interferon; Machine learning; Nanopore sequencing; Pseudouridine
Year: 2021 PMID: 34872593 PMCID: PMC8646010 DOI: 10.1186/s13059-021-02557-y
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Ψ prediction model training using model organisms and microbiome rRNA Ψ modification. a Overview of the experiments to generate the Ψ prediction model by nanopore sequencing. b Features of a region in human 18S rRNA from Illumina sequencing and nanopore sequencing. c Features of a region in a microbial rRNA from Illumina sequencing and nanopore sequencing. d Box-and-whisker plots (whiskers at 1.5 times the interquartile range) of the 12 feature candidates of U and Ψ sites derived from nanopore sequencing. Ins, insertion rate after the base. Ins_len, insertion length mean. Del, deletion rate after the base. Del_len, deletion length mean. Del_site, deleted site ratio (the site is in a deletion). Mis, overall mismatching ratio. Mis_A, mutation to A ratio. Mis_C, mutation to C ratio. Mis_G, mutation to G ratio. Base_qual_mean, average base quality score. Base_qual_STD, base quality score standard deviation. Base_qual_count_0, ratio of bases with a quality score of 0 at a site. e Mutation preference for the Ψ sites in all rRNAs in a ternary plot. Red, Ψ sites in model organisms. Blue, Ψ sites in the microbiome. f Correlation matrix of modification state (Ψ=1, U=0) and the 12 feature candidates. The value of the correlation coefficient is indicated in each box. Same labels as panel d. Label type, modification state. g ROC (receiver operating characteristic) curves of EXT models with different numbers of features included. The number of features and AUC (area under the curve) value of each model are indicated in the legend. The features are added to the model in the order of their correlation with the modification state indicated in panel f. For example, 1 feature means “mis_C”, 2 features means “mis_C” and “mis”, and so on. h ROC curve of the testing set predicted by the optimized EXT model. The AUC value is indicated in the graph
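The feature ordering described for panels f and g can be illustrated with a minimal sketch. It ranks candidate features by the absolute Pearson correlation with the modification state (Ψ=1, U=0) and scores a single feature by a rank-based AUC. It does not train the EXT model itself, and it is not the authors' code: the function names and the toy per-site values below are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no signal
    return cov / sqrt(vx * vy)


def rank_features(features, labels):
    """Order feature names by |correlation| with the label, strongest first."""
    scored = [(name, pearson(vals, labels)) for name, vals in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


def auc(scores, labels):
    """Rank-based AUC: chance that a Ψ site scores above a U site (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six sites, the first three are Ψ, the last three are U.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "mis_C": [0.30, 0.25, 0.40, 0.02, 0.05, 0.01],
    "mis": [0.35, 0.20, 0.45, 0.10, 0.25, 0.05],
    "ins": [0.02, 0.03, 0.01, 0.02, 0.03, 0.01],
}

ranking = rank_features(features, labels)
for name, r in ranking:
    print(f"{name}: r={r:.3f}, single-feature AUC={auc(features[name], labels):.3f}")
```

In a setup like the one in panel g, the models with 1, 2, 3, ... features would take the first 1, 2, 3, ... names from such a ranking.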
Fig. 2 Interferon treatment elicits more Ψ modification in mRNA. a Log10 expression levels of genes in untreated sample and IFN β-treated (left) or IFN γ-treated (right) samples. Expression level is calculated as the peak height of the piled reads. Red, genes with a >2-fold increase in expression. Blue, genes with a >2-fold decrease in expression. b Venn diagram of the GO terms of the genes containing the 500 U sites with the highest Ψ probabilities in each sample. c Scatter plot showing the mean modification probability change versus log10 expression fold change of each gene between untreated and IFN β-treated (left) or IFN γ-treated (right) sample. Red, genes with a >2-fold increase in expression. Blue, genes with a >2-fold decrease in expression. d Mean Ψ modification probability of genes assigned to groups based on expression fold change between untreated and IFN β-treated (left) or IFN γ-treated (right) samples. ***p < 10⁻³ and ****p < 10⁻⁴. e GO analysis of the 50 genes with the highest mean Ψ probability change between untreated and IFN β-treated (top) or IFN γ-treated (bottom) samples. Blue vertical line indicates p = 0.05. f Mean Ψ probability change of the top 50 genes between untreated and IFN β-treated (left) or IFN γ-treated (right) samples. Genes with a significant increase in expression levels are marked in red (>10-fold) or orange (5–10-fold). g Relative Ψ level of mRNA transcript of ACTB (left panel, data from set 1 and set 2 primers) and ISG15 (right panel, data from set 1, set 2, and set 3 primers) in the untreated and interferon-treated samples measured by RT-qPCR. *p < 0.05; **p < 0.01. h Single-read prediction results for the partially modified Ψ sites in human rRNA. The stoichiometry predicted by our method is compared with the stoichiometry reported previously by quantitative LC/MS. The correlation coefficient is 0.6566 (Pearson’s r).
i Clustering heatmap showing the Ψ probability of two pairs of sites in single reads of the B2M transcript in the IFN γ-treated sample. Each row represents a read. Site numbers are defined as the chromosomal locations in the hg38 nomenclature. These two pairs show either negative linkage (left) or no linkage (right). j Reads in panel i are assigned to “Ψ” and “U” groups based on the posterior probabilities of site 1 in a Gaussian mixture model (k=2). The cumulative distribution curves of Ψ probabilities of site 2 are drawn for reads in the “Ψ” or “U” groups or for all reads. The curves for the “Ψ” and “U” groups are compared with a two-sample Kolmogorov-Smirnov test; p values are < 2.2 × 10⁻¹⁶ (left) and 0.7684 (right). k p value in the two-sample Kolmogorov-Smirnov test for selected pairs of sites in the B2M transcript in the untreated and IFN-treated samples
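The single-read linkage test described for panels i to k can be sketched as follows. This is a simplified illustration, not the authors' pipeline: a fixed 0.5 threshold on the site 1 probability stands in for the Gaussian mixture model (k=2) used in the paper, only the Kolmogorov-Smirnov statistic D is computed (a p value would additionally need the test's null distribution, for example via `scipy.stats.ks_2samp`), and the function names and read values are hypothetical.

```python
from bisect import bisect_right


def split_reads_by_site1(reads, threshold=0.5):
    """Group the site 2 probabilities by the call at site 1.

    reads: list of (p_site1, p_site2) tuples, one per read.
    ASSUMPTION: a fixed threshold replaces the paper's Gaussian mixture model.
    """
    psi_group = [p2 for p1, p2 in reads if p1 >= threshold]
    u_group = [p2 for p1, p2 in reads if p1 < threshold]
    return psi_group, u_group


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical reads with negative linkage: a high Ψ probability at site 1
# comes with a low Ψ probability at site 2, and vice versa.
reads = [(0.90, 0.10), (0.80, 0.20), (0.95, 0.15),
         (0.10, 0.85), (0.20, 0.90), (0.05, 0.80)]

psi_group, u_group = split_reads_by_site1(reads)
d_linked = ks_statistic(psi_group, u_group)
print("KS statistic D:", d_linked)
```

A large D means the site 2 distribution differs between reads called Ψ and reads called U at site 1, which is the pattern expected for linked sites. A D near 0 corresponds to the no-linkage case.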