Abstract
BACKGROUND: This paper describes a web-based tool that uses a combination of sonification and an animated display to inquire into the SARS-CoV-2 genome. The audio data is generated in real time from a variety of RNA motifs that are known to be important in the functioning of RNA. Additionally, metadata relating to RNA translation and transcription has been used to shape the auditory and visual displays. Together these tools provide a unique approach to further understand the metabolism of the viral RNA genome. This audio provides a further means to represent the function of the RNA in addition to traditional written and visual approaches.
Keywords: Auditory display; COVID-19; Molecular animation; RNA sequence; SARS-CoV-2; Sonification
Year: 2020 PMID: 33008363 PMCID: PMC7530539 DOI: 10.1186/s12859-020-03760-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The animated display. Panel A shows the sliding window of the animated display in translation mode. Key features of the animated display are labelled, such as the translated peptide sequences and the frame in which they occur; start and stop codons are highlighted in green and red, respectively. The location of the audio play-head is represented to coincide with the peptidyl-transferase centre of the ribosome. The sonified audio is generated as the SARS-CoV-2 genome sequence passes through the play-head. The direction in which the ribosome moves relative to the RNA sequence is indicated. Panel B shows the animated display in transcription mode. The newly synthesised minus RNA strand is shown below the genome sequence with the 3′ extended nucleotide shown in the play-head. The direction in which the replicase protein complex moves in relation to the genome sequence is indicated
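The two display modes described in this caption can be illustrated with a minimal Python sketch. It is not the tool's own code: the function names `codon_events` and `minus_strand` are hypothetical, frames are assumed to be numbered 1 to 3 from the first nucleotide of the supplied sequence, and the minus strand is written base by base under the genome sequence (complemented but not reversed), as the caption suggests.

```python
START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_events(rna):
    """List (position, frame, kind) for each start or stop codon.

    Frames are numbered 1-3 relative to the first nucleotide of `rna`
    (an assumption made for this sketch).
    """
    events = []
    for i in range(len(rna) - 2):
        codon = rna[i:i + 3]
        frame = i % 3 + 1
        if codon == START:
            events.append((i, frame, "start"))  # would be drawn in green
        elif codon in STOPS:
            events.append((i, frame, "stop"))   # would be drawn in red
    return events


def minus_strand(rna):
    """Complement each base so the new strand aligns under the genome."""
    return "".join(COMPLEMENT[base] for base in rna)


if __name__ == "__main__":
    print(codon_events("AAUGCCUAAG"))
    print(minus_strand("AUGC"))
```

Scanning every offset rather than stepping in threes lets a single pass report events for all three frames at once, which matches a display that shows the frames side by side.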
The mapping of each RNA feature into a layer of the auditory display
| Description | RNA feature | Note range | When is the feature sonified |
|---|---|---|---|
| As the sequence is processed each is sonified to create a constant audio stream | Nucleotide | 4 | Throughout the genome |
| | Di-nucleotide | 16 | Throughout the genome |
| | GC Content (10 bp) | 10 | Throughout the genome |
| | GC Content (100 bp) | 10 | Throughout the genome |
| Three of the same nucleotide repeats | Example: the poly-A tail | 4 | Whenever the condition is true |
| Codons (translation only) | Codon Frame 1 | 20 | Between start and stop codons |
| | Codon Frame 2 | 20 | Between start and stop codons |
| | Codon Frame 3 | 20 | Between start and stop codons |
| Trinucleotides (transcription only) | Only the 1st and 3rd nucleotides are considered | 16 | Throughout the genome |
| Untranslated regions | Intragenic UTR regions (excluding 5′ and 3′ UTRs) | 16 | At genomic regions defined by GenBank metadata. Individual nucleotides were mapped to higher octave ranges for the sake of audio clarity |
| Transcription regulating sequences (TRS) | Each nucleotide in TRS1 through to TRS10 | 16 | |
| Polyprotein cleavage sites (translation only) | Nucleotides that code for the cleaved AA residues | 4 | |
| Stem and Loop regions (SL) | Each nucleotide in the identified region | 16 | |
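The note ranges in this table follow from the number of distinct values each feature can take: 4 nucleotides, 16 di-nucleotides, 16 combinations of the 1st and 3rd nucleotide of a trinucleotide, and 10 GC levels. The Python sketch below shows one way such features could be turned into note indices. The base ordering `ACGU`, the function names and the binning of GC content into ten equal levels are assumptions made for illustration, not details taken from the paper.

```python
BASES = "ACGU"  # assumed ordering; any fixed ordering gives 4 distinct indices


def nucleotide_index(base):
    """Index 0-3 for a single nucleotide (note range 4)."""
    return BASES.index(base)


def dinucleotide_index(pair):
    """Index 0-15 for a di-nucleotide (note range 16)."""
    return 4 * BASES.index(pair[0]) + BASES.index(pair[1])


def trinucleotide_index(triplet):
    """Index 0-15 using only the 1st and 3rd nucleotides (note range 16)."""
    return dinucleotide_index(triplet[0] + triplet[2])


def gc_index(window, levels=10):
    """Index 0-9 for the GC content of a window (note range 10).

    Integer arithmetic avoids floating-point rounding; a window that is
    entirely G/C is clamped into the top level.
    """
    gc_count = sum(base in "GC" for base in window)
    return min(gc_count * levels // len(window), levels - 1)


if __name__ == "__main__":
    print(nucleotide_index("G"), dinucleotide_index("UU"))
    print(trinucleotide_index("GAU"), gc_index("GGCCAAUUAA"))
```

Each index would then select a note from the scale degrees and octaves assigned to that layer.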
Description of the navigation buttons from where users can begin playing the audio and visual displays
| Button | Description |
|---|---|
| 5′UTR | 5′ untranslated region |
| Poly-/-protein | Two buttons representing the coding region before and after the -1 frameshift position of the large polyprotein |
| 9 U regions | Each navigates to an untranslated region between ORFs |
| -S- | Region coding for the canonical S protein |
| -E- | Region coding for the canonical E protein |
| -M- | Region coding for the canonical M protein |
| -N- | Region coding for the canonical N protein |
| ORF 3a, ORF 6, ORF 7a, ORF 7b, ORF 8, ORF 10 | Regions thought to code for other proteins or polypeptides |
| 3′UTR | 3′ untranslated region |
| 5′UTR | 5′ untranslated region |
| N1–N16 | Location of the 16 NSP proteins within the large polyprotein |
| 14 C sites | Cleavage sites within the translated polyprotein giving rise to the 16 individual NSP proteins |
| S–ORF 10 | Region of the RNA sequence downstream of the polyprotein |
| 3′UTR | 3′ untranslated region |
| 5′UTR | 5′ untranslated region |
| T1–T10 | Location of TRS 1 to TRS 10. TRS 1 is sometimes referred to as the leader TRS and is linked to the subsequent TRS 2–10 to produce the sub-genomic regions during transcription |
| 5 SL regions | Stem Loop regions giving rise to structured regions of RNA. These are formed due to sequence complementarity and base pairing |
| 12 Seq regions | Undefined sequences between the TRS regions; these often correspond closely to the ORF regions |
| 3′UTR | 3′ untranslated region |
Scale degrees and instrumentation of the RNA features being sonified
| Sonified motif | Instrument | Pan | Translation (Bb Aeolian mode): scale degrees | Translation (Bb Aeolian mode): octave | Transcription (C Lydian mode): scale degrees | Transcription (C Lydian mode): octave |
|---|---|---|---|---|---|---|
| Nucleotide | Synth | L | 1, 3 | 2, 3 | 1, 5 | 2, 3 |
| Di-nucleotide | Synth | R | 1, 4, 5, 6 | 1, 2, 3, 4 | 1, 3, 5 | 1 |
| GC Content (10 bp) | AM synth + delay | L | 1, 3, 6, 7 | 2, 3 | 1, 3, 5, 7 | 4, 5 |
| GC Content (100 bp) | AM synth + delay | R | 1, 3, 6, 7 | 2, 3 | 1, 3, 5, 7 | 4, 5 |
| 3 bp repeat | Synth | L | 1, 3 | 4 | 1, 4, 5 | 6 |
| Codon Frame 1 (translation) | FM synth + distortion | L | 1, 3, 4, 5, 7 | 2, 3, 4, 5 | – | – |
| Codon Frame 2 (translation) | FM synth + distortion | C | 1, 3, 4, 5, 7 | 2, 3, 4, 5 | – | – |
| Codon Frame 3 (translation) | FM synth + distortion | R | 1, 3, 4, 5, 7 | 2, 3, 4, 5 | – | – |
| Tri-nucleotide (transcription) | FM synth + distortion | L | – | – | 1, 3, 4, 5, 7 | 3, 4, 5 |
| Untranslated regions | AM synth | R | 1, 2, 3 | 5 | 1, 4, 6, 7 | 3 |
| Transcription regulating sequences (TRS) | AM synth | L | 1, 2, 4, 5, 6 | 5 | 1, 2, 3, 4, 5, 6, 7 | 6 |
| Cleavage sites in the polyprotein | AM synth + distortion | L | 1, 6, 7 | 4 | 1, 2, 3 | 6 |
| Stem and loop regions (SL) | AM synth + delay + distortion | R | 1, 2, 6, 7 | 5 | 1, 4, 5, 7 | |
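To make the scale-degree and octave columns concrete, the following Python sketch converts a (scale degree, octave) pair into a MIDI note number and a frequency. It rests on several assumptions that the table does not state: standard MIDI numbering (C4 = 60, A4 = 440 Hz), equal temperament, and the convention that the octave number refers to the octave of the tonic, with higher degrees counted upwards from it. The function names are hypothetical.

```python
# Semitone offsets of scale degrees 1-7 above the tonic
AEOLIAN = [0, 2, 3, 5, 7, 8, 10]   # natural minor, used in translation mode
LYDIAN = [0, 2, 4, 6, 7, 9, 11]    # used in transcription mode

TONIC_PITCH_CLASS = {"Bb": 10, "C": 0}


def midi_note(tonic, mode, degree, octave):
    """MIDI number of a scale degree, counting upwards from the tonic.

    Assumes `octave` names the octave of the tonic (C4 = MIDI 60).
    """
    return 12 * (octave + 1) + TONIC_PITCH_CLASS[tonic] + mode[degree - 1]


def frequency_hz(midi):
    """Equal-tempered frequency with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((midi - 69) / 12)


if __name__ == "__main__":
    # Nucleotide layer, translation mode: degrees 1 and 3 in octaves 2 and 3
    for octave in (2, 3):
        for degree in (1, 3):
            note = midi_note("Bb", AEOLIAN, degree, octave)
            print(degree, octave, note, round(frequency_hz(note), 1))
```

Under these assumptions the nucleotide layer in translation mode yields four pitches (two degrees in two octaves), consistent with a note range of 4 for that feature.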
Fig. 2 Multitrack wave files representing a portion of an auditory display. These tracks play in unison to generate the auditory display, and each represents approximately 80 nucleotides beginning at nucleotide position 65. This sequence is located in the 5′ untranslated region and includes a TRS region and a uORF. Each audio stream was generated by a different algorithm; only nucleotides that gave rise to audio are shown (the entire nucleotide sequence is shown in track 2). In track 1, each nucleotide generates a note on every beat unless it repeats the previous nucleotide, in which case the length of the note is extended. In track 2, each di-nucleotide generates a note every second beat. In tracks 3 and 4, audio from the GC track is triggered only when the GC ratio changes by an increment of 0.1. Each change in the GC ratio is indicated by a plus (+) or minus (−) symbol on the wave files. In track 5, only codon sequences beginning with a start codon (AUG) are shown, through to the next stop codon (e.g. UAA). Isolated stop codons also give rise to a note. This track is a compilation of audio from three sub-tracks, each representing a different reading frame, and notes from these sub-tracks are panned left, centre or right, respectively. Track 6 represents the audio generated from metadata that indicates the location of a TRS region. Additionally, the consensus sequence within this region is coloured purple in the visual display. Track 7 represents audio generated by the occurrence of three nucleotides of the same type. Other data tracks are not represented since no audio was generated in them during processing of this sequence of the genome. Additionally, the amino acid sequence of the ORF is shown in the codon track 5
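The per-track rules in this caption can be sketched in Python as follows. This is an illustrative reading of the caption, not the tool's implementation: all function names are hypothetical, GC ratios are assumed to be quantised to steps of 0.1 over a sliding window, and reading frames are numbered 0 to 2.

```python
from itertools import groupby

STOPS = {"UAA", "UAG", "UGA"}


def nucleotide_notes(rna):
    """Track 1: one note per run of identical bases, as (base, beats)."""
    return [(base, len(list(run))) for base, run in groupby(rna)]


def gc_events(rna, window=10):
    """Tracks 3 and 4: (position, '+' or '-') when the GC level changes.

    The GC ratio is quantised to tenths (an assumption); no event is
    emitted for the first window because there is nothing to compare.
    """
    events, previous = [], None
    for start in range(len(rna) - window + 1):
        gc = sum(base in "GC" for base in rna[start:start + window])
        level = gc * 10 // window
        if previous is not None and level != previous:
            events.append((start, "+" if level > previous else "-"))
        previous = level
    return events


def orf_codons(rna, frame):
    """Track 5: codons from AUG to the next stop, plus isolated stops."""
    sounded, inside = [], False
    for i in range(frame, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon == "AUG":
            inside = True
        if inside or codon in STOPS:
            sounded.append((i, codon))
        if codon in STOPS:
            inside = False
    return sounded


def triple_repeats(rna):
    """Track 7: start positions of three identical consecutive bases."""
    return [i for i in range(len(rna) - 2)
            if rna[i] == rna[i + 1] == rna[i + 2]]


if __name__ == "__main__":
    print(nucleotide_notes("AAUGG"))
    print(gc_events("A" * 10 + "G"))
    print(orf_codons("UAAAUGCCCUGAGGG", frame=0))
    print(triple_repeats("CAAAAG"))
```

Calling `orf_codons` once per frame (0, 1 and 2) would give the three sub-tracks that the caption describes as panned left, centre and right.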
Fig. 3 Alignment of the raw stereo waveforms. Two stereo waveforms are shown that depict the audio from examples 1 and 2. The vertical cursor indicates the transition across the TRS1 consensus sequence. Panel A depicts the audio from the ‘UTR to Surface Glycoprotein’ example and panel B depicts that from the ‘Untranslated ends’ example. To the left of the cursor the stereo waveforms are identical leading up to the TRS1 region. To the right of the cursor the waveforms diverge. Panel A represents translation of a template produced through discontinuous transcription, whereas panel B represents translation of contiguous genome sequence
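Why the two waveforms agree up to TRS1 and diverge afterwards can be illustrated with a toy Python sketch of discontinuous transcription. The sequence below is invented and far shorter than any real genome, the function name `subgenomic_rna` is hypothetical, and the core motif `ACGAAC` is the TRS consensus commonly reported for SARS-CoV-2 rather than a value given in this text. The join is modelled as the leader up to the end of the first TRS followed by everything after a downstream TRS.

```python
TRS_CORE = "ACGAAC"  # commonly reported TRS core motif; an assumption here


def trs_sites(genome):
    """Start positions of every occurrence of the TRS core motif."""
    last_start = len(genome) - len(TRS_CORE)
    return [i for i in range(last_start + 1)
            if genome.startswith(TRS_CORE, i)]


def subgenomic_rna(genome, body_index=1):
    """Join the leader (through the first TRS) to the body after a later TRS."""
    sites = trs_sites(genome)
    leader_end = sites[0] + len(TRS_CORE)
    body_start = sites[body_index] + len(TRS_CORE)
    return genome[:leader_end] + genome[body_start:]


if __name__ == "__main__":
    toy_genome = "GGACGAACUUUUUUACGAACAUGCC"  # invented example sequence
    joined = subgenomic_rna(toy_genome)
    print(joined)
    # Both sequences share the leader through TRS1, then differ
    print(toy_genome[:8] == joined[:8], toy_genome[8:] == joined[8:])
```

Sonifying `toy_genome` and `joined` with the same mapping would give identical audio for the shared leader and different audio after the junction, which is the pattern the two panels show.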