
G4HunterApps.

Abstract

MOTIVATION: To help G4Hunter users and make the tool more accessible, I have developed a set of small applications within the Shiny/R framework.
RESULTS: Each application fulfils a simple task, ranging from computing the G4Hunter score for a sequence or a list of sequences to extracting sequences with a G4Hunter score above a threshold from a sequence of up to 5 Mb or from a list of short sequences. The applications can be installed either on the user's computer within RStudio or on an RStudio server.
AVAILABILITY AND IMPLEMENTATION: The source code for the ShinyApps is available on GitHub (https://github.com/LacroixLaurent).


MeSH: Software

Year: 2019 PMID: 30445487 PMCID: PMC6596896 DOI: 10.1093/bioinformatics/bty951

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Nucleic acid sequences not only carry genetic information, but can also adopt various structures that go beyond double-helical DNA or stem/loop RNA combinations. These shapes and their dynamics could encode another level of genetic/genomic information. Alternative or unusual nucleic acid structures have recently received growing attention, as sequences prone to ‘unorthodox’ conformations are proposed to have nucleic acid-related functions (Bacolla et al.; Kouzine et al.). Guanine quadruplexes (G4) are a family of alternative nucleic acid structures that have attracted attention because of their high structural stability under physiological conditions (Davis, 2004) and the widespread distribution of sequences compatible with G4 formation (Hänsel-Hertsch et al.; Maizels, 2012). Many recent papers also point towards biological effects that are, or could be, mediated through G4 formation (Maizels, 2015; Rhodes and Lipps, 2015). For a long time, pattern-matching algorithms were used to search for genomic sequences able to form such structures (Huppert and Balasubramanian, 2005; Todd et al.), but other types of prediction algorithms have recently become available (Beaudoin et al.; Garant et al.; Sahakyan et al.; Varizhuk et al.). We have previously developed G4Hunter, one such algorithm (Bedrat et al.). G4Hunter can re-evaluate the widespread occurrence of G4-forming sequences in various genomes, in addition to mapping potential G4s that eluded other algorithms. In the original paper (Bedrat et al.), we published the R code for the algorithm, allowing anyone to use G4Hunter. However, I came to realize that many people are reluctant to code or might need G4Hunter only for a simple task, such as calculating the G4Hunter score for an oligonucleotide or retrieving the G4Hunter-predicted hits for their favourite gene or transcript. Therefore, I have developed small applications in the Shiny/R framework that perform such tasks through a user-friendly interface.
For whole-genome scans, scripts are available within the original G4Hunter publication (Bedrat et al.).

2 Materials and methods

Four applications have been written in Shiny/R, starting from the published scripts (Bedrat et al.). All apps require the Shiny library. The additional libraries required are XVector, Biostrings (Huber et al.) and GenomicRanges (Lawrence et al.). All source code is available on GitHub (https://github.com/LacroixLaurent) and can be run on a personal computer via RStudio (https://www.rstudio.com/). Setting up the apps on an RStudio server is outside the scope of this article.

3 Results

3.1 G4HunterScore

This app simply computes the G4Hunter score of a sequence using the published rule (Bedrat et al.). Letters other than G or C are not translated and have a score of 0. Spaces are automatically removed. The app also reports the length of the sequence.
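As an illustration only, the scoring step can be sketched in Python (the apps themselves are written in R, and this is not the author's code). The sketch assumes the rule as I understand it from the original G4Hunter publication: each G in a run of n consecutive Gs scores min(n, 4), each C in a run of Cs scores the negative equivalent, every other letter scores 0, and the sequence score is the mean of the per-base scores. The function names `base_scores` and `g4hunter_score` are hypothetical.

```python
from itertools import groupby


def base_scores(seq):
    """Per-base scores: +min(run, 4) inside G runs, -min(run, 4) inside C runs, else 0."""
    scores = []
    for base, run in groupby(seq):
        n = len(list(run))
        sign = {"G": 1, "C": -1}.get(base, 0)  # any other letter counts as 0
        scores.extend([sign * min(n, 4)] * n)
    return scores


def g4hunter_score(seq):
    """Mean per-base score of a sequence; spaces are removed first (assumed rule)."""
    seq = seq.upper().replace(" ", "")
    if not seq:
        return 0.0
    return sum(base_scores(seq)) / len(seq)


if __name__ == "__main__":
    telomeric = "GGGTTAGGGTTAGGGTTAGGG"
    print(len(telomeric), round(g4hunter_score(telomeric), 3))
```

Under this assumed rule, a G-rich sequence gets a positive score and its C-rich complement gets the same score with the opposite sign.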

3.2 G4HunterTable

The G4HunterTable app computes G4Hunter scores for a list of sequences supplied either in a text format (one sequence per line) or in a multifasta format (one sequence per fasta entry). For text-type entry, the app tolerates the presence of a header (first line), which can be removed with the ‘Remove the header’ option. Results are reported in a table that can be exported as tab-separated values with three columns: the sequence, the G4Hunter score and the length.
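The input handling and the three-column export described above can be sketched as follows. This is a hedged Python illustration, not the app's R code: the column names, the rounding to two decimals and the helper names `read_sequences` and `score_table` are assumptions, and the scoring helper repeats the assumed G4Hunter rule (runs of G or C capped at 4, mean over the sequence) so that the block is self-contained.

```python
import csv
import io
import re


def g4hunter_score(seq):
    """Assumed rule: each base in a G (or C) run of length n scores +/-min(n, 4)."""
    total = 0
    for match in re.finditer(r"G+|C+", seq):
        run = match.group()
        sign = 1 if run[0] == "G" else -1
        total += sign * min(len(run), 4) * len(run)
    return total / len(seq) if seq else 0.0


def read_sequences(text, remove_header=False):
    """Read one sequence per line, or one sequence per '>' entry for multifasta."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines and lines[0].startswith(">"):
        seqs = []
        for ln in lines:
            if ln.startswith(">"):
                seqs.append("")  # a new fasta entry starts
            else:
                seqs[-1] += ln  # sequence lines of one entry are joined
        return seqs
    return lines[1:] if remove_header else lines


def score_table(seqs):
    """Return a tab-separated table: sequence, score, length (hypothetical headers)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sequence", "score", "length"])
    for raw in seqs:
        seq = raw.upper().replace(" ", "")
        writer.writerow([seq, round(g4hunter_score(seq), 2), len(seq)])
    return out.getvalue()


if __name__ == "__main__":
    text = "my_sequences\nGGGTTAGGGTTAGGGTTAGGG\nCCCTAACCCTAACCCTAACCC\n"
    print(score_table(read_sequences(text, remove_header=True)))
```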

3.3 G4HunterSeeker

The top part of the application reuses the G4HunterScore app so that the user can quickly check the G4Hunter score of a given sequence from within the other apps, without starting the G4HunterScore app. Users can either type or paste a sequence (manual entry) or upload a fasta file containing the sequence (fasta file entry). Users also need to specify the sequence type (DNA or RNA alphabet) so that the sequence is imported properly. The Threshold and Window size set the parameters for the sequence search, as described in the original publication. The higher the threshold, the more stringent the search: fewer G4 motifs will be found, but these will be the most stable/likely ones.

The procedure extracts sequences that have a G4Hunter score above the threshold (in absolute value) in a window, fuses the overlapping sequences and then refines these sequences by removing bases at the extremities that are not G for sequences with a positive score (or not C for sequences with a negative score). It also looks at the first neighbouring base and adds it to the sequence if it is a G for sequences with a positive score (or a C for sequences with a negative score). Please see the original G4Hunter publication (Bedrat et al.), in particular Supplementary Figure S1B, for more details.

The result is typically a table reporting: (i) the name of the input sequence (seqnames); (ii) the start, end and width of the hit (after refining the sequence as explained above); (iii) the strand of the hit, which is (+) if the proposed G4-forming sequence is in the Input Sequence and (−) if the G4-forming sequence is on the reverse complementary strand; (iv) the score, which is the G4Hunter score of the refined hit; (v) max_score, which is the highest score in absolute value found during the search in a window of the chosen window size; (vi) the threshold and window, which are respectively the Threshold and Window size used for the search; and (vii) the sequence, which corresponds to the refined sequence in the Input Sequence. The sequence field is sensitive to the Report G-sequences option (see below) and is not present if the Report sequences option is not selected.

The first line of the fasta file (after the > sign) sets the sequence name (seqnames) in the output. This can be changed by checking the Alternate Seqname option and entering the chosen sequence name in the New Seqname option. The Report G-sequences option changes sequences with a negative score (C-rich sequences) into their reverse complement, so that the output reports only G-rich sequences. The Number of Hits reports the number of sequences retrieved that match the settings. The Length of the Input Sequence corresponds to the length of the DNA sequence you enter with your fasta file or manually. The output can be exported to a text file (tab-separated values) that can be opened directly with Microsoft Excel.

There is no limit on the size of the input sequence when using the manual entry option, and sequences up to several megabases are well tolerated by the app. For fasta file entry, there is a limit set by the default settings of the Shiny environment: the file can be up to 5 Mb. However, this setting can be modified (see the README file from the source code for details).
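The search procedure described in this section (window scan, fusion of overlapping windows, trimming of the extremities, one-base extension) can be sketched in Python as below. This is a simplified illustration under stated assumptions, not a port of the published R code: windows are kept when the absolute mean score is greater than or equal to the threshold, positive and negative windows are fused separately, the score of a hit is recomputed on the extracted sequence, max_score keeps its sign, and only the DNA alphabet is handled. The names `find_hits`, `base_scores`, `mean_score` and `reverse_complement` are hypothetical.

```python
from itertools import accumulate, groupby


def base_scores(seq):
    """Assumed rule: +min(run, 4) inside G runs, -min(run, 4) inside C runs, else 0."""
    scores = []
    for base, run in groupby(seq):
        n = len(list(run))
        sign = {"G": 1, "C": -1}.get(base, 0)
        scores.extend([sign * min(n, 4)] * n)
    return scores


def mean_score(seq):
    return sum(base_scores(seq)) / len(seq)


def reverse_complement(seq):
    # DNA alphabet only; an RNA input would need U handled as well.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_hits(seq, threshold=1.5, window=25, report_g_sequences=False):
    seq = seq.upper().replace(" ", "")
    prefix = [0] + list(accumulate(base_scores(seq)))  # prefix sums for window means
    hits = []
    for sign, target, strand in ((1, "G", "+"), (-1, "C", "-")):
        regions = []  # [start, end (exclusive), best window mean], 0-based
        for i in range(len(seq) - window + 1):
            mean = (prefix[i + window] - prefix[i]) / window
            if mean * sign <= 0 or abs(mean) < threshold:
                continue
            if regions and i <= regions[-1][1]:  # fuse overlapping/adjacent windows
                regions[-1][1] = i + window
                regions[-1][2] = max(regions[-1][2], mean, key=abs)
            else:
                regions.append([i, i + window, mean])
        for start, end, best in regions:
            while seq[start] != target:  # trim the left extremity
                start += 1
            while seq[end - 1] != target:  # trim the right extremity
                end -= 1
            if start > 0 and seq[start - 1] == target:  # first neighbouring base
                start -= 1
            if end < len(seq) and seq[end] == target:
                end += 1
            hit = seq[start:end]
            flip = report_g_sequences and sign < 0
            hits.append({
                "start": start + 1, "end": end, "width": end - start,  # 1-based
                "strand": strand, "score": round(mean_score(hit), 2),
                "max_score": round(best, 2),
                "sequence": reverse_complement(hit) if flip else hit,
            })
    return sorted(hits, key=lambda h: h["start"])


if __name__ == "__main__":
    for hit in find_hits("TTTTGGGGTTTT", threshold=2, window=4):
        print(hit)
```

A short window is used in the usage example only to keep the input small; the trimming step reduces the fused region to the G run itself.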

3.4 G4HunterMultiFastaSeeker

This app is very similar to the G4HunterSeeker app but allows the user to import multifasta files in order to perform the search on several input sequences. The size limit for the fasta file is 5 Mb. If the number of entries in your multifasta file is large (>1000), the app will run significantly slower, and you might need to use R outside of the Shiny interface. To avoid unwanted long computation times, the G4Hunter process only starts when you click on the button ‘Please click here to start the computation’, after uploading your file and checking the number of fasta entries. Up to 1000 entries typically result in a computing time below 1 min, but for more than a few thousand entries, users might have to wait a few minutes. The options (Threshold, Window Size, Report sequences and Report G-sequences) are similar to those of the G4HunterSeeker app. The output table has a unique name for each hit (hitnames), corresponding to the concatenation of the target entry name (from the fasta file) and the start of the hit in this target. The output can be exported to a text file (tab-separated values).
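The construction of hitnames can be illustrated with a short Python sketch. It is not the app's code: the separator between the entry name and the start position is an assumption (an underscore here), `parse_multifasta` and `label_hits` are hypothetical names, and the search itself is passed in as any function returning hits with a `start` field, so a trivial stand-in is used in the usage example.

```python
def parse_multifasta(text):
    """Return (name, sequence) pairs, one per '>' entry."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            entries.append([line[1:], ""])  # name is the text after '>'
        elif line and entries:
            entries[-1][1] += line  # join the sequence lines of one entry
    return [tuple(entry) for entry in entries]


def label_hits(entries, find_hits, sep="_"):
    """Prefix each hit with a unique name: entry name + separator + hit start."""
    labelled = []
    for name, seq in entries:
        for hit in find_hits(seq):
            labelled.append({"hitnames": f"{name}{sep}{hit['start']}", **hit})
    return labelled


if __name__ == "__main__":
    fasta = ">geneA\nTTGG\nGG\n>geneB\nCCCC\n"
    entries = parse_multifasta(fasta)
    print(len(entries), "fasta entries")  # check the count before computing

    # Stand-in search for illustration: reports the 1-based position of the first G.
    def first_g(seq):
        return [{"start": seq.find("G") + 1}] if "G" in seq else []

    print(label_hits(entries, first_g))
```

Counting the entries before running the search mirrors the app's workflow, where the computation starts only after the user has checked the number of fasta entries.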

References (16 in total)

1. G-quartets 40 years later: from 5'-GMP to molecular biology and supramolecular chemistry.

Authors: Jeffery T Davis
Journal: Angew Chem Int Ed Engl Date: 2004-01-30 Impact factor: 15.336

2. DNA G-quadruplexes in the human genome: detection, functions and therapeutic potential.

Authors: Robert Hänsel-Hertsch; Marco Di Antonio; Shankar Balasubramanian
Journal: Nat Rev Mol Cell Biol Date: 2017-02-22 Impact factor: 94.444

3. Breakpoints of gross deletions coincide with non-B DNA conformations.

Authors: Albino Bacolla; Adam Jaworski; Jacquelynn E Larson; John P Jakupciak; Nadia Chuzhanova; Shaun S Abeysinghe; Catherine D O'Connell; David N Cooper; Robert D Wells
Journal: Proc Natl Acad Sci U S A Date: 2004-09-17 Impact factor: 11.205

4. G4 motifs in human genes.

Authors: Nancy Maizels
Journal: Ann N Y Acad Sci Date: 2012-09 Impact factor: 5.691

5. Software for computing and annotating genomic ranges.

Authors: Michael Lawrence; Wolfgang Huber; Hervé Pagès; Patrick Aboyoun; Marc Carlson; Robert Gentleman; Martin T Morgan; Vincent J Carey
Journal: PLoS Comput Biol Date: 2013-08-08 Impact factor: 4.475

6. Highly prevalent putative quadruplex sequence motifs in human DNA.

Authors: Alan K Todd; Matthew Johnston; Stephen Neidle
Journal: Nucleic Acids Res Date: 2005-05-24 Impact factor: 16.971

7. Prevalence of quadruplexes in the human genome.

Authors: Julian L Huppert; Shankar Balasubramanian
Journal: Nucleic Acids Res Date: 2005-05-24 Impact factor: 16.971

8. New scoring system to identify RNA G-quadruplex folding.

Authors: Jean-Denis Beaudoin; Rachel Jodoin; Jean-Pierre Perreault
Journal: Nucleic Acids Res Date: 2013-10-10 Impact factor: 16.971

9. Re-evaluation of G-quadruplex propensity with G4Hunter.

Authors: Amina Bedrat; Laurent Lacroix; Jean-Louis Mergny
Journal: Nucleic Acids Res Date: 2016-01-20 Impact factor: 16.971

10. Machine learning model for sequence-driven DNA G-quadruplex formation.

Authors: Aleksandr B Sahakyan; Vicki S Chambers; Giovanni Marsico; Tobias Santner; Marco Di Antonio; Shankar Balasubramanian
Journal: Sci Rep Date: 2017-11-06 Impact factor: 4.379
