Abstract
DNA and RNA sequencing is a core technology in biological and medical research. The high throughput of these technologies and the consistent development of new experimental assays and biotechnologies demand the continuous development of methods to analyze the resulting data. The RECOMB Satellite Workshop on Massively Parallel Sequencing brings together leading researchers in computational genomics to discuss emerging frontiers in algorithm development for massively parallel sequencing data. The 10th meeting in this series, RECOMB-Seq 2020, was scheduled to be held in Padua, Italy, but due to the ongoing COVID-19 pandemic, the meeting was held virtually instead. The online workshop featured keynote talks by Paola Bonizzoni and Zamin Iqbal, two highlight talks, ten regular talks, and three short talks. Seven of the works presented in the workshop are featured in this edition of iScience, and many of the talks are available online on the RECOMB-Seq 2020 YouTube channel.
Keywords: Bioinformatics; Genomics; Quantitative Genetics
Year: 2020 PMID: 33437938 PMCID: PMC7788091 DOI: 10.1016/j.isci.2020.101956
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. The growth in available genomic data, both raw sequencing data and processed data, is staggering
The plot in Figure 1A shows the growth, over time, of the total number of genomes deposited in the RefSeq database (Leary et al., 2015); note that the y-axis is on a log scale. The number of available assemblies has been increasing at an exponential rate, and the availability of such a wide and growing variety of references highlights the importance of developing scalable approaches for pan-genomic representation and indexing. Likewise, the plot in Figure 1B (with data as reported in [Svensson et al., 2019]) shows the growth, over time, of the total number of reported cells in different single-cell RNA-seq experiments, with the blue line signifying the total cumulative number of reported cells. The clear trend is that more recent studies report sequencing results on more individual cells, with one recent study (Cao et al., 2020) reporting millions of cells.
Figure 2. Graph-based representations of pangenomes and the trade-offs between what can be represented and what can be efficiently indexed
General graphs can express all variations, and directed acyclic graphs (DAGs) miss only some structural variations, but neither representation is efficiently indexable (Equi et al., 2020). Haplotype-aware graphs (Sirén et al., 2019) and founder block graphs (Mäkinen et al., 2020) are restricted forms of DAGs that can be efficiently indexed. Colored de Bruijn graphs (Iqbal et al., 2012) are efficiently indexable (Almodaresi et al., 2018) but collapse repeats.
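To make the last trade-off concrete, the following is a minimal toy sketch of a node-centric colored de Bruijn graph in Python. It is not the implementation used by any of the cited tools; the function name `colored_de_bruijn`, the sample names, and the example sequences are hypothetical and chosen only for illustration. It shows how a k-mer that occurs twice in one sequence becomes a single node, so the number and order of its occurrences are no longer recoverable from the graph alone.

```python
from collections import defaultdict


def colored_de_bruijn(samples, k):
    """Build a toy colored de Bruijn graph (illustrative only).

    samples: dict mapping a color (sample name) to a sequence string.
    Returns (colors, edges):
      colors maps each distinct k-mer to the set of samples containing it,
      edges maps each k-mer to the set of k-mers that follow it in some sample.
    """
    colors = defaultdict(set)
    edges = defaultdict(set)
    for color, seq in samples.items():
        # All overlapping k-mers of this sample, in order of occurrence.
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for kmer in kmers:
            colors[kmer].add(color)
        # Consecutive k-mers overlap by k-1 characters and form an edge.
        for a, b in zip(kmers, kmers[1:]):
            edges[a].add(b)
    return dict(colors), dict(edges)


# Hypothetical input: s1 contains the 4-mer ACGT twice, s2 contains it once.
samples = {
    "s1": "ACGTACGTT",
    "s2": "ACGTT",
}
colors, edges = colored_de_bruijn(samples, k=4)

for kmer in sorted(colors):
    print(kmer, sorted(colors[kmer]), sorted(edges.get(kmer, ())))
```

In this sketch, both occurrences of `ACGT` in `s1` map to the same node, which therefore has two successors (`CGTA` and `CGTT`). The graph records that the k-mer is present in both samples but not how many times it appears in each, which is the sense in which repeats are collapsed.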