Omer An, Kar-Tong Tan, Ying Li, Jia Li, Chan-Shuo Wu, Bin Zhang, Leilei Chen, Henry Yang.
Abstract
Next-generation sequencing (NGS) is a widely used technology in biomedical research for understanding the role of molecular genetics of cells in health and disease. A variety of computational tools have been developed to analyse the rapidly growing volume of NGS data, but they often require bioinformatics skills, tedious work and a significant amount of time. To facilitate data processing and bridge the gap between biologists and bioinformaticians, we developed CSI NGS Portal, an online platform that gathers established bioinformatics pipelines to provide fully automated NGS data analysis and sharing through a user-friendly website. The portal currently provides 16 standard pipelines for analysing data from DNA, RNA, smallRNA, ChIP, RIP, 4C, SHAPE, circRNA, eCLIP, Bisulfite and scRNA sequencing, and can be expanded with new pipelines. Users can upload raw data in FASTQ format and submit jobs in a few clicks; the results are then accessible via the portal to view, download or share in real time. Depending on the pipeline, the output can be used directly as the final report or as input for other tools. Overall, CSI NGS Portal helps researchers rapidly analyse their NGS data and share results with colleagues without the aid of a bioinformatician. The portal is freely available at: https://csibioinfo.nus.edu.sg/csingsportal.
Keywords: NGS data analysis; NGS pipelines; bioinformatics pipelines
Year: 2020 PMID: 32481589 PMCID: PMC7312552 DOI: 10.3390/ijms21113828
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Website Framework of CSI NGS Portal. The middle panel shows the logical flow of the website from top to bottom, the left panel describes the automated steps of the data transitions from one page to another, and the right panel gives the function of each page that expects user input or action.
Figure 2. Website Interface and Usage of CSI NGS Portal. (a) Upload page, (b) Annotation page, (c) Submit page, (d) Jobs page, and (e) Browse page. Key features and usage information are highlighted in text boxes. Further details on the quotas, rules and usage instructions are given in the “README” sections on the website.
Bioinformatics pipelines implemented on CSI NGS Portal.
| Bioinformatics pipeline | Analysis | Tools and packages | Sequencing | Normal/Control/Reference Samples | Replicates | Overall runtime |
|---|---|---|---|---|---|---|
| DNA-Seq | Genome alignment | BWA (mem) | Single/Paired end | Optional b | NA | ~1 day |
| | Mutation calling | GATK4 Mutect2 | | | | |
| | Mutation annotation | ANNOVAR | | | | |
| RNA-Seq | Genome alignment | STAR | Single/Paired end | NA | NA | ~2 h |
| | Gene expression | HTSeq-count | | | | |
| | Isoform expression | Salmon | | | | |
| | Alternative splicing | in-house Perl | | | | |
| Diff-Exp | Genes table | Bioconductor DESeq2 | Single/Paired end c | Required | Required | ~10 min |
| | Genes report | Bioconductor regionReport | | | | |
| | Heatmap | Superheat | | | | |
| | Volcano | ggplot2 (Wickham 2016) | | | | |
| | Pathway enrichment | Bioconductor ReactomePA | | | | |
| | Gene set enrichment analysis | GSEA | | | | |
| | Isoforms report | Bioconductor DEXSeq | | | | |
| Pathway-Enrichment | Enrichment plots | Bioconductor ReactomePA | NA | NA | NA | ~1 min |
| RNA-Editing | Genome alignment | BWA (mem) | Single/Paired end | NA | NA | ~7 h |
| | Variant calling | Samtools mpileup | | | | |
| | Candidates selection | adapted from a cited method | | | | |
| | AEI calculation | RNAEditingIndexer | | | | |
| | UCSC track hub | in-house Bash | | | | |
| smallRNA | Genome alignment | NovoAlign | Single/Paired end | NA | NA | ~1 h |
| | smallRNA expression | in-house Perl | | | | |
| 4C-Seq | Genome alignment | BWA (mem) | Single/Paired end | Optional | Optional | ~10 min |
| | Interactions | Bioconductor r3Cseq | | | | |
| | Report | Bioconductor r3Cseq | | | | |
| ChIP-Seq | Genome alignment | Bowtie2 | Single/Paired end | Required | NA | ~2 h |
| | Peak calling | MACS2 | | | | |
| | Motif enrichment | HOMER | | | | |
| | UCSC track hub | in-house Bash | | | | |
| RIP-Seq | Genome alignment | STAR | Paired end | Required | Optional | ~8 h |
| | Peak calling | in-house Bash | | | | |
| | UCSC track hub | in-house Bash | | | | |
| SHAPE-Seq | Transcriptome alignment | Bowtie2 | Single/Paired end | Required | NA | ~10 h |
| | Reactivity calculation | icSHAPE | | | | |
| | Structure prediction | RNAfold | | | | |
| rMATS | Genome alignment | STAR | Single/Paired end | Required | Required (2–10 samples) | ~2 h |
| | Alternative splicing | rMATS | | | | |
| circRNA | Genome alignment | STAR | Single/Paired end | NA | NA | ~1 h |
| | circRNA expression | in-house Perl | | | | |
| eCLIP-Seq | Demultiplexing | eclipdemux | Single/Paired end | Required | NA | ~1 day |
| | Mapping | STAR | | | | |
| | Peak calling | clipper | | | | |
| | Peak normalisation | eCLIP | | | | |
| | Peak annotation | HOMER | | | | |
| | Motif enrichment | HOMER | | | | |
| | UCSC track hub | in-house Bash | | | | |
| Bisulfite-Seq | Genome alignment | Bowtie2 | Single/Paired end | NA | NA | ~3 days |
| | Methylation calling | Bismark | | | | |
| | UCSC track hub | in-house Bash | | | | |
| | DMRs | metilene | | | | |
| scRNA-Seq | Genome alignment | STAR | Paired end | NA | NA | ~4 h |
| | Single cell analysis | Cell Ranger (10× Genomics) | | | | |
| ngsplot-deepTools | Genome alignment | STAR | Single/Paired end | NA | NA | ~4 h |
| | Plots | ngsplot | | | | |
| | Plots | deepTools | | | | |
a Ideally technical replicates rather than biological replicates. Numbers in parentheses denote the samples in total. b For somatic mutation calling, a matched normal DNA sample is highly recommended. Use of “tumor-only mode” is useful only for specific purposes. c Not directly applicable to the “Diff-Exp” pipeline; it refers instead to the samples from the target jobs from which this pipeline starts.

Seq: Sequencing, BWA: Burrows-Wheeler Aligner, GATK: GenomeAnalysisToolkit, ANNOVAR: Annotate Variation, STAR: Spliced Transcripts Alignment to a Reference, GSEA: Gene Set Enrichment Analysis, AEI: Alu Editing Index, 4C: Chromosome Conformation Capture-on-Chip, ChIP: Chromatin Immunoprecipitation, RIP: RNA Immunoprecipitation, MACS: Model-based Analysis of ChIP-Seq, icSHAPE: in vivo click Selective 2-Hydroxyl Acylation and Profiling Experiment, MATS: Multivariate Analysis of Transcript Splicing, eCLIP: enhanced Crosslinking and Immunoprecipitation, DMR: Differentially Methylated Regions, NA: Not Applicable.

Usage of the tools and packages in CSI NGS Portal and website links to their original sources are given in Supplementary Table S1. The detailed descriptions, expected input and output of the pipelines are given in Supplementary Data and on the website Docs page.

Overall runtime is the approximate time elapsed for one sample to finish all the analysis steps once the job starts running, and may vary depending on the data size, pipeline parameters and server load. However, the runtime for additional samples under the same job does not multiply proportionally, owing to parallelisation. When a job contains multiple samples, each sample starts running as soon as resources are available on the server, and the samples keep running in parallel until they all finish. This uses system resources efficiently while providing results to the user as quickly as possible.
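The effect of running samples in parallel can be illustrated with a minimal Python sketch. It is not the portal's code: `run_pipeline`, `run_job` and the sample names are hypothetical, and a short `time.sleep` stands in for a real analysis. The point is only that the wall-clock time for a multi-sample job stays close to the time for one sample when enough workers are free.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(sample: str, seconds: float = 0.2) -> str:
    """Hypothetical stand-in for one sample's analysis (sleeps instead of working)."""
    time.sleep(seconds)
    return f"{sample}: done"


def run_job(samples: list[str], max_workers: int = 4) -> list[str]:
    """Run every sample of a job concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_pipeline, samples))


if __name__ == "__main__":
    samples = ["sample_A", "sample_B", "sample_C"]  # illustrative names only
    start = time.perf_counter()
    results = run_job(samples)
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f} s for {len(samples)} samples")
```

With three samples and four workers, the elapsed time should be near the duration of a single sample rather than three times that. If there are more samples than workers, the extra samples wait for a free worker, which mirrors the "as soon as resources are available" behaviour described above.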
Features of CSI NGS Portal.
| Feature |
|---|
| All pipelines run from input to output without intervention and with minimal user input. |
| User-friendly, simple design with interactive tables offering search, filter, sort, edit, export and share options. |
| The repertoire of pipelines is easy to expand within the existing website framework. |
| Pipelines written in virtually any scripting language can be integrated independently of the website code. |
| Pipeline documentation is available online, with descriptions and code. |
| The website displays functionally on multiple devices and platforms with different window/screen sizes. |
| A FastQC report is auto-generated upon file upload; sequence and quality trimming are optionally available with multiple options. |
| No personal information is collected; secure, random cookies are used for authorisation and dynamic usernames for data sharing. |
| Data can be edited, deleted or shared only by the owner; expired data are completely removed from the server. |
| Uploaded raw FASTQ files are private to the user; results can optionally be shared or unshared with other users at any time. |
| Data are fully accessible via the portal until expiry (10 days, subject to revision depending on usage and server capacity). |
| All data can be downloaded to a local computer in a few clicks via the browser and the command line. |
| Alignment (.bam, .bigwig) and mutation (.vcf) data can be viewed in a local IGV without downloading the original files. |
| Peak regions (.bigbed, .bigwig) and sites from supported pipelines can be viewed online in the UCSC Genome Browser as a track hub. |
| A real-time overall job progress log and individual tool log files are generated, which are useful for tracking and debugging. |
| The user is notified upon job completion if an e-mail address is provided during job submission (optional). |
| Jobs are parallelised by multi-threading and by running multiple samples simultaneously wherever possible. |
| Popular and established bioinformatics pipelines for new data types are continuously added. |
| All tools and packages are regularly updated to the latest stable versions available. |
Icons are made by Freepik and obtained from www.flaticon.com.
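The 10-day data expiry listed among the features can be sketched as a simple date calculation. This is an illustration under stated assumptions, not the portal's implementation: the names `RETENTION`, `expiry_time` and `is_expired` are hypothetical, and the source does not say whether the period is counted from upload or from job completion, so the starting timestamp is left to the caller.

```python
from datetime import datetime, timedelta

# 10 days is the expiry period stated in the features table; the source notes
# that it is subject to revision, so it is kept as an overridable default.
RETENTION = timedelta(days=10)


def expiry_time(created: datetime, retention: timedelta = RETENTION) -> datetime:
    """Return the moment at which data created at `created` expires."""
    return created + retention


def is_expired(created: datetime, now: datetime,
               retention: timedelta = RETENTION) -> bool:
    """True once `now` has reached or passed the expiry time (an assumed rule)."""
    return now >= expiry_time(created, retention)


if __name__ == "__main__":
    created = datetime(2020, 6, 1, 12, 0)  # illustrative timestamp
    print(expiry_time(created))
    print(is_expired(created, datetime(2020, 6, 5, 12, 0)))
    print(is_expired(created, datetime(2020, 6, 12, 12, 0)))
```

Keeping the retention period as a parameter reflects the note that the 10-day limit may change with usage and server capacity.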
Comparison of CSI NGS Portal to other NGS data analysis platforms.
| Platform | Number of Pipelines/ | Full | Data Visualisation | Data | Custom Workflow Building | Code | Local Installation | Registration/Login |
|---|---|---|---|---|---|---|---|---|
| CSI NGS Portal | 16 a | Yes | Static f | Yes | No | Pipeline level | In progress | Not required |
| | Multiple b | No | Dynamic | Yes | Yes | Source level | Yes | Required |
| | 7 c | Yes | Dynamic | Yes | Limited | No | No | Required |
| | 1 d | Yes | Static | No | No | No | No | Required |
| | 1 e | Yes | Static | No | No | No | No | Not required |
Features common to all platforms such as user data upload, results download, job log, pipelines documentation, etc. are omitted, and commercially available or paid platforms are excluded from the comparison. a DNA-Seq, RNA-Seq, Diff-Exp, Pathway-Enrichment, RNA-Editing, smallRNA, 4C-Seq, ChIP-Seq, RIP-Seq, SHAPE-Seq, rMATS, circRNA, eCLIP-Seq, Bisulfite-Seq, scRNA-Seq, ngsplot-deepTools; b Pipelines are available as workflows for different data types; c RNA-Seq, ChIP-Seq, Bisulfite-Seq, Exome-Seq, De novo genome sequencing, Metagenome, CAGE/SAGE-Seq; d RNA-Seq; e miRNA-Seq; f Diverse plots in pdf format, comprehensive interactive html reports, links to IGV and UCSC track hubs are provided depending on the pipeline and accessible via single clicks on the browser.