Ben Busby, Matthew Lesko, Lisa Federer.
Abstract
In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon's conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team.
Keywords: Bioconductor; Education; Genome Annotation; Genomics; Next Generation Sequencing; Open-Source; Pharmacogenomics; Software
Year: 2016 PMID: 27134733 PMCID: PMC4837979 DOI: 10.12688/f1000research.8382.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Outline of the pipeline of the Homogeneous RNA-seq Mapping (HRM) team.
The leftmost column shows the procedures; the next columns show, respectively, the tools used in each step and the files created by each tool. HISAT directly accesses the SRA data of interest and writes the aligned reads to a SAM file. Picard classifies the reads, once sorted by SAMtools, into functional categories using the RefFlat file. After a quality check by qc.pl, HTSeq calculates raw read counts for each region.
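The order of steps in the caption can be sketched as a list of command lines. This is a minimal illustration under stated assumptions, not the HRM team's actual scripts. The executable names (`hisat2`, a `picard` wrapper), the choice of Picard's `CollectRnaSeqMetrics` tool, the file names, and the way `qc.pl` is invoked are all assumptions made for the example. The caption names only the tools and the RefFlat file. The function only builds the commands and does not run them.

```python
from typing import Dict, List


def build_hrm_commands(sra_accession: str, hisat_index: str,
                       ref_flat: str, gtf: str,
                       hisat_bin: str = "hisat2") -> Dict[str, List[str]]:
    """Return one argv list per pipeline step; nothing is executed here.

    All output file names are hypothetical and derived from the accession.
    """
    sam = f"{sra_accession}.sam"
    bam = f"{sra_accession}.sorted.bam"
    metrics = f"{sra_accession}.rna_metrics.txt"
    return {
        # Align reads fetched directly from SRA (--sra-acc as documented
        # for HISAT2; assumes a build with SRA support).
        "align": [hisat_bin, "-x", hisat_index,
                  "--sra-acc", sra_accession, "-S", sam],
        # Coordinate-sort the alignments and write BAM.
        "sort": ["samtools", "sort", "-o", bam, sam],
        # Assumption: CollectRnaSeqMetrics is the Picard tool that assigns
        # reads to functional categories from a RefFlat file.
        "classify": ["picard", "CollectRnaSeqMetrics",
                     f"I={bam}", f"O={metrics}",
                     f"REF_FLAT={ref_flat}", "STRAND_SPECIFICITY=NONE"],
        # Hypothetical invocation: the arguments of qc.pl are not given
        # in the source.
        "qc": ["perl", "qc.pl", metrics],
        # Raw read counts per annotated region; htseq-count prints to stdout.
        "count": ["htseq-count", "-f", "bam", "-r", "pos", bam, gtf],
    }


if __name__ == "__main__":
    commands = build_hrm_commands("SRR0000001", "genome_index",
                                  "refFlat.txt", "genes.gtf")
    for step, argv in commands.items():
        print(f"{step}: {' '.join(argv)}")
```

Each argv list could be passed to `subprocess.run(argv, check=True)`, with the output of the `count` step redirected to a file, once the placeholder accession, index, and annotation paths are replaced with real ones.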