Paul J Sample, Kirk W Gaston, Juan D Alfonzo, Patrick A Limbach.
Abstract
Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide-containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm; (ii) manual sequencing with real-time spectrum labeling and cumulative intensity scoring; and (iii) a hybrid approach, coined 'variable sequencing', which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing.
Year: 2015 PMID: 25820423 PMCID: PMC4446411 DOI: 10.1093/nar/gkv145
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Typical RNA oligomer fragmentation products generated by collision-induced dissociation (CID). The most abundant products are typically c- and y- fragment ions; however, w- and a-B ions (a- ions lacking the adjacent base) are also commonly observed.
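As a rough illustration of the c- and y- series named in the caption, the sketch below computes negative-ion m/z values for an unmodified oligomer. It assumes a 5′-OH and 3′-phosphate terminus (as in an RNase T1 product), monoisotopic residue masses for the four canonical nucleotides only, and complementary c/y pairs whose neutral masses sum to the precursor mass. The function name `cy_ladder` and the mass table are illustrative and are not taken from RoboOligo; w- and a-B ions are omitted.

```python
# Monoisotopic masses (Da) of nucleotide residues
# (nucleoside monophosphate minus water). Canonical nucleotides only.
RESIDUE = {"A": 329.0525, "C": 305.0413, "G": 345.0474, "U": 306.0253}
WATER = 18.0106
PROTON = 1.00728


def cy_ladder(seq, charge=1):
    """Return (c_ions, y_ions) m/z lists for a 5'-OH ... 3'-phosphate oligomer.

    Assumption: the c fragment carries the cleaved phosphate without water,
    and the y fragment carries a 5'-OH plus the terminal 3'-phosphate.
    """
    residues = [RESIDUE[n] for n in seq]
    c_ions, y_ions = [], []
    for i in range(1, len(seq)):
        c_neutral = sum(residues[:i])           # first i nucleotides
        y_neutral = sum(residues[i:]) + WATER   # remaining nucleotides
        c_ions.append((c_neutral - charge * PROTON) / charge)
        y_ions.append((y_neutral - charge * PROTON) / charge)
    return c_ions, y_ions


seq = "CAG"  # read as CAGp under the 3'-phosphate assumption
c_ions, y_ions = cy_ladder(seq)
for i, (c, y) in enumerate(zip(c_ions, y_ions), start=1):
    print(f"c{i} = {c:.2f}   y{len(seq) - i} = {y:.2f}")
```

Modified nucleotides would need their own entries in the mass table, and a 3′-OH terminus would shift every y- ion by the mass of HPO3.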
Figure 2. The design of the de novo sequencing algorithm. Potential compositions are produced for a calculated oligonucleotide mass and each composition is used for the de novo sequence analysis. The algorithm builds sequences one nucleotide at a time and tests the fit of the data to the expected products. This is done in the 5′ to 3′ direction using c- ions and in the 3′ to 5′ direction using y- ions. Once a potential sequence is found, w- and a-B ions are calculated and the sequences are scored based on the summed intensities of the fragment ions for that sequence.
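The first step in the caption, producing candidate compositions for a measured oligonucleotide mass, can be sketched as a brute-force enumeration. This is a minimal illustration rather than the program's own routine: it assumes a 3′-phosphate product, a pool restricted to the four canonical nucleotides, and hypothetical names (`compositions`, `RESIDUE`). No digestion constraint is applied.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da); canonical nucleotides only (assumption).
RESIDUE = {"A": 329.0525, "C": 305.0413, "G": 345.0474, "U": 306.0253}
WATER = 18.0106
PROTON = 1.00728


def compositions(precursor_mz, charge, pool="ACGU", tol=1.0, max_len=12):
    """List nucleotide compositions whose mass fits the precursor.

    precursor_mz : observed m/z of the deprotonated precursor
    charge       : absolute charge state (1 for -1, 2 for -2, ...)
    tol          : total mass tolerance in Da
    """
    neutral = precursor_mz * charge + charge * PROTON
    target = neutral - WATER  # sum of residues for a 5'-OH, 3'-phosphate oligomer
    lightest = min(RESIDUE[n] for n in pool)
    hits = []
    for length in range(1, max_len + 1):
        if length * lightest > target + tol:
            break  # longer oligomers can only be heavier
        for combo in combinations_with_replacement(sorted(pool), length):
            if abs(sum(RESIDUE[n] for n in combo) - target) <= tol:
                hits.append("".join(combo))
    return hits


print(compositions(667.14, 1))  # precursor m/z taken from the CGp row
```

Each returned composition is an unordered multiset; ordering the nucleotides against the c- and y- ion series is the separate sequencing step described in the caption.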
Figure 3. Automated de novo sequencing. (A) MS/MS data can be selected and moved to the ‘To be sequenced’ box to be included in the analysis. (B) MS/MS data with fewer than a user-selected number of m/z reads and/or less than a user-selected maximum ion abundance can be removed from the ‘Scan collection’ box. (C) Nucleotides to be included in the automated de novo sequencing attempt: oligomers of <8 nt can utilize pools of up to 20 nt, while longer oligomers return the best results with pools of <12 nt. New entries in the ‘Saved pools’ dropdown list can be added by editing the ‘NucleotidePools.txt’ file. (D) The CID m/z tolerance, total mass tolerance, RNase digestion context, 5′ and 3′ ends and number of allowed skips are all modifiable by the user.
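The scan filter described in panel (B) amounts to two thresholds per MS/MS scan. The sketch below assumes each scan is a list of `(m/z, intensity)` pairs keyed by scan number; the function name, data layout and default thresholds are hypothetical and are not part of the program's interface.

```python
def filter_scans(scans, min_peaks=20, min_max_intensity=1000.0):
    """Keep scans with enough m/z reads and a sufficiently intense base peak.

    scans : dict mapping scan number -> list of (mz, intensity) tuples
    """
    kept = {}
    for scan_id, peaks in scans.items():
        if not peaks or len(peaks) < min_peaks:
            continue  # too few m/z reads
        if max(intensity for _, intensity in peaks) < min_max_intensity:
            continue  # most abundant ion is below the threshold
        kept[scan_id] = peaks
    return kept


example = {
    1: [(304.0, 500.0), (362.1, 2000.0), (633.1, 800.0)],
    2: [(304.0, 50.0)],
    3: [(304.0, 10.0), (362.1, 20.0), (633.1, 30.0)],
}
print(sorted(filter_scans(example, min_peaks=3, min_max_intensity=1000.0)))
```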
Summary of results generated by the automated sequencing of E. coli ΔqueC ΔqueF pGAT-queC tRNAAsp and E. coli tRNAGln(UUG)
| Sequence | Precursor | Charge state | Rank | % Diff | Complete sequences | Compositions | Time (ms) |
|---|---|---|---|---|---|---|---|
| CGp | 667.37 | −1 | 1 | 17 | 2 | 2 | 41 |
| AGp | 691.27 | −1 | 1 | NA | 1 | 1 | 40 |
| [s4U]AGp (a) | 1013.4 | −1 | 1 | 37 | 4 | 2 | 58 |
| [D]CGp | 975.52 | −1 | 1 | 40 | 4 | 2 | 150 |
| CAGp | 996.39 | −1 | 1 | 20 | 4 | 2 | 52 |
| CCA (b) | 876.26 | −1 | 2 | 1.5 | 8 | 2 | 44 |
| [m5U][Ψ]CGp (c,d) | 1293.4 | −1 | 1 | 12 | 12 | 6 | 122 |
| [D][D]AGp | 653.53 | −1 | 1 | 33 | 3 | 1 | 47 |
| [Ψ]CCGp (d) | 1278.5 | −1 | 1 | 36 | 5 | 2 | 73 |
| [m7G]UCGp (e) | None found | | | | | | |
| UCCCGp (f) | 791.83 | −2 | 1 | 3 | 30 | 6 | 71 |
| UCCCGp | 791.83 | −2 | 1 | 8 | 14 | 3 | 60 |
| UUCAGp | 803.91 | −2 | 1 | 8 | 25 | 3 | 100 |
| AAUACCUGp (f) | 1286.0 | −2 | 1 | 1 | 367 | 15 | 891 |
| CCU[preQ0]UC[m2A]CGp (f,g) | 1453.5 | −2 | 1 | 5 | 720 | 34 | 491 |
| CCU[G+]UC[m2A]CGp (f,g) | 1461.9 | −2 | 1 | 6 | 195 | 44 | 1019 |
The RNase T1 digestion constraint was used to match the sample preparation. Unless otherwise noted, all automated sequencing attempts were allowed one skip, a target mass tolerance of 1 Da and a mass-to-charge tolerance of 0.3. ‘% Diff’ is the cumulative intensity difference between the correct oligomer sequence and the second-highest-scoring oligomer when the correct sequence is the highest-scoring result. Otherwise, it measures the difference between the highest-scoring sequence and the independently verified correct sequence. ‘Complete sequences’ is the number of sequences that the m/z data support, given the user-defined sequencing parameters. ‘Compositions’ is the number of nucleotide compositions that fit the target mass range, given the user-defined nucleotide pool. ‘Time’ is the time in milliseconds that the program takes to return the automated sequencing results.
(a) [s4U] = 1sU.
(b) CAC scored 1.5% higher than CCA and was analyzed with the 3′-OH setting and no digestion constraint.
(c) [m5U] = 1mU.
(d) [Ψ] = U.
(e) [m7G] = 1mG.
(f) Target mass tolerance = 2 Da.
(g) [m2A] = 1mA.
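The cumulative intensity score and the ‘% Diff’ column described in the table note can be illustrated with a short sketch. Two points are assumptions made here, not statements from the source: a peak counts toward a candidate's score if it lies within the m/z tolerance of any expected fragment, and the percentage is taken relative to the higher score. The function names and the example peak list are hypothetical.

```python
def cumulative_intensity(peaks, expected_mz, tol=0.3):
    """Sum the intensities of peaks that match any expected fragment m/z.

    peaks       : list of (mz, intensity) tuples from one MS/MS scan
    expected_mz : fragment m/z values predicted for a candidate sequence
    tol         : m/z tolerance (0.3 is the default quoted in the table note)
    """
    total = 0.0
    for mz, intensity in peaks:
        if any(abs(mz - e) <= tol for e in expected_mz):
            total += intensity
    return total


def percent_diff(higher, lower):
    """Score difference as a percentage of the higher score (assumed convention)."""
    return 100.0 * (higher - lower) / higher


# Made-up peak list and two candidate fragment sets, for illustration only.
peaks = [(304.0, 50.0), (362.1, 100.0), (633.1, 30.0), (500.0, 20.0)]
candidate_a = [304.034, 633.087, 362.051, 691.103]
candidate_b = [304.034, 362.051]

score_a = cumulative_intensity(peaks, candidate_a)
score_b = cumulative_intensity(peaks, candidate_b)
print(f"{percent_diff(score_a, score_b):.1f}% difference")
```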
Summary of results generated by the automated sequencing of E. coli tRNAGln(UUG)
| Sequence | Precursor | Charge state | Rank | % Diff | Complete sequences | Compositions | Time (ms) |
|---|---|---|---|---|---|---|---|
| CGp | 667.14 | −1 | 1 | 20 | 2 | 2 | 61 |
| C[Gm]Gp (a) | 1026.1 | −1 | 1 | 14 | 6 | 4 | 55 |
| CCA (g) | 876.14 | −1 | 1 | 14 | 8 | 2 | 57 |
| [D]AAGp | 663.73 | −2 | 1 | 18 | 4 | 3 | 65 |
| [m5U][Ψ]CGp (b,c) | 646.64 | −2 | 1 | 23 | 14 | 6 | 79 |
| UA[s4U]CGp (d) | 811.68 | −2 | 1 | 8 | 68 | 6 | 94 |
| CCAAGp | 814.89 | −2 | 1 | 10 | 14 | 4 | 207 |
| CACCGp | 802.76 | −2 | 1 | 17 | 11 | 3 | 113 |
| U[Um]U[cmnm5s2U]UGp (b) | 1004.5 | −2 | 1 | 12 | 41 | 24 | 186 |
| [m2A]UACCGp (e) | 974.84 | −2 | 1 | 1 | 91 | 12 | 231 |
| AAUCCAGp (f) | 1132.4 | −2 | 1 | 12 | 12 | 40 | 588 |
| UACCCCAGp | 1273.0 | −2 | 1 | 8 | 56 | 19 | 137 |
| CAUUCCCUGp (f) | 942.91 | −3 | 1 | 11 | 428 | 12 | 380 |
The RNase T1 digestion constraint was used to match the sample preparation. Unless otherwise noted, all automated sequencing attempts were allowed one skip, a target mass tolerance of 1 Da and a mass-to-charge tolerance of 0.3.
(a) [Gm] = 1mG.
(b) [m5U], [Um] = 1mU.
(c) [Ψ] = U.
(d) [s4U] = 1sU.
(e) [m2A] = 1mA.
(f) Target mass tolerance = 2 Da.
(g) CCA was analyzed with the 3′-OH setting and no digestion constraint.
Figure 4. Sequencing accuracy in relation to oligomer length. (A) Accuracy as a function of RNA oligomer length. (B) Relative rank of all tested oligomers: first, 95% (73/77); second, 3% (2/77); fifth, 1% (1/77); and uninterpreted, 1% (1/77).
Figure 5. (A) Annotated MS/MS spectrum from the program output of a sequence identified as 1mUUCG, which would correspond to the expected [m5U][Ψ]CGp digestion product from Escherichia coli Gln-tRNAUUG. (B) The 15 MS/MS scans annotated as 1mUUCG from the E. coli Gln-tRNAUUG dataset. Nearly every MS/MS scan yielded the complete, expected c-ion series, while almost half of the MS/MS scans could be annotated with the complementary y-ion series.