A Comprehensive Study of De Novo Genome Assemblers: Current Challenges and Future Prospective.

Abdul Rafay Khan¹, Muhammad Tariq Pervez¹, Masroor Ellahi Babar², Nasir Naveed³, Muhammad Shoaib⁴.

Abstract

BACKGROUND: Current advancements in next-generation sequencing technology have made it possible to sequence whole genomes, but assembling a large number of short sequence reads is still a big challenge. In this article, we present a comparative study of seven assemblers, namely, ABySS, Velvet, Edena, SGA, Ray, SSAKE, and Perga, using prokaryotic and eukaryotic paired-end as well as single-end data sets from the Illumina platform.
RESULTS: On single-end data sets, Velvet and ABySS outperformed the other assemblers, with comparatively low assembly time and high genome fraction. Velvet consumed less memory than any other assembler. On paired-end data sets, Velvet had the shortest assembly time and produced the highest genome fraction after ABySS and Ray. In terms of low memory usage, SGA and Edena outperformed all the other assemblers. Ray also showed good genome fraction; however, its extremely long assembly time might make it prohibitively slow on larger single-end and paired-end data sets.
CONCLUSIONS: Our comparison will help scientists select an assembler suited to their data sets and will also assist developers in upgrading existing assemblers or developing new ones for de novo assembly.


Keywords: DBG (de Bruijn graph); ENA (European Nucleotide Archive); NGS (next-generation sequencing); OLC (overlap layout consensus); bps (base pairs)

Year: 2018 PMID: 29511353 PMCID: PMC5826002 DOI: 10.1177/1176934318758650

Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625

Introduction

DNA sequencing has driven major advances in science and technology and is widely used in applied fields such as medicine, genetic engineering, and food science.[1] Next-generation sequencing (NGS) is currently the most advanced DNA sequencing technology, offering much higher throughput and speed than the earlier Sanger sequencing.[2] Paired-end sequencing, in which both ends of each DNA fragment are sequenced, further improves accuracy and makes it possible to detect indels that are difficult to detect with single-end sequencing.[3] NGS produces millions of short reads, and assembling them without a reference genome is one of the most challenging tasks for de novo assemblers.[4] In the past few years, several de novo assembly algorithms have been developed to assemble these large volumes of short reads into longer fragments called contigs, but choosing the appropriate assembler for paired-end or single-end data is still difficult.[5] The currently available assembly approaches include de Bruijn graph (DBG), overlap layout consensus (OLC), string graph, greedy, and hybrid algorithms.[6] The DBG approach splits each read into shorter overlapping substrings of length k (k-mers); consecutive k-mers in a read overlap by k − 1 bases, and these overlaps link the k-mers in the graph. Dividing the reads into k-mers also helps cope with reads of different initial lengths. OLC is also a graph-based approach; it builds an overlap graph by finding overlaps between similar reads.[7] Finding these overlaps is usually the slowest part of the assembly; the overlap graph is then laid out and a consensus sequence is derived for each contig. The DBG algorithm is faster, whereas the OLC algorithm performs better on longer reads.
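To make the k-mer idea concrete, the Python sketch below builds a minimal de Bruijn graph in which each k-mer is an edge from its (k − 1)-base prefix to its (k − 1)-base suffix, and then spells a contig by following nodes that have a single outgoing edge. It is an illustration under simplifying assumptions (error-free reads, one strand only, no reverse complements, no tip or bubble removal) and does not reproduce how ABySS or Velvet actually build or traverse their graphs; the function names are hypothetical.

```python
from collections import defaultdict


def kmers(read: str, k: int):
    """Yield every k-mer of a read; consecutive k-mers overlap by k - 1 bases."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def de_bruijn_graph(reads, k: int):
    """Build a de Bruijn graph as {prefix (k-1)-mer: {suffix (k-1)-mer: count}}.

    Each k-mer contributes one edge from its first k-1 bases to its last
    k-1 bases; repeated k-mers increase the edge count.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def walk_unambiguous(graph, start: str) -> str:
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, {})) == 1:
        (nxt,) = graph[node]  # the single successor node
        if nxt in seen:  # stop instead of looping around a cycle
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


graph = de_bruijn_graph(["ATGGCG", "GGCGTG"], k=4)
print(walk_unambiguous(graph, "ATG"))  # ATGGCGTG
```

The two reads share the 4-mer GGCG, so its edge has a count of 2, and the walk reconstructs the single sequence from which both reads could have come.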
The string graph algorithm is a variant of OLC that simplifies the overlap graph by removing redundant sequences and edges.[8] Greedy algorithms start by joining the reads with the best overlaps to produce contigs. Most greedy assemblers use heuristics designed to avoid misassembly of repeated sequences.[9] Hybrid assembly combines different assembly algorithms and is used to reduce the number of contigs and the errors produced by individual algorithms.[10] Many de novo assemblers available online implement one of these five approaches. Our study evaluated de novo assemblers on Illumina paired-end and single-end short-read data sets. It provides guidance to biologists and bioinformaticians in selecting an appropriate assembler for their data sets and assists developers in upgrading existing assemblers or developing new ones for de novo assembly.
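A greedy assembler can be sketched in a few lines: repeatedly find the pair of sequences with the longest suffix-prefix overlap, merge them, and stop when no overlap of at least a minimum length remains. The Python below is a hypothetical illustration of that idea, not the algorithm used by SSAKE or Perga; it ignores sequencing errors, reverse complements, reads contained inside other reads, and the repeat-handling heuristics mentioned above. Its overlap search is also the core operation an OLC assembler performs when building an overlap graph.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b`, or 0.

    Only overlaps of at least `min_len` bases are considered.
    """
    start = 0
    while True:
        # Candidate positions in `a` where the first min_len bases of `b` occur.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):  # the rest of `a` must also match `b`
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len: int = 3):
    """Merge the best-overlapping pair until no overlap >= min_len remains."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return seqs  # remaining sequences are the contigs
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)


print(greedy_assemble(["ATGGCG", "GGCGTG", "CGTGCA"]))  # ['ATGGCGTGCA']
```

Because the loop always commits to the single best overlap, it can make an early wrong join inside a repeat, which is exactly the misassembly risk that greedy heuristics try to limit.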

Materials and Methods

Data sets

To compare the performance of each assembler, Illumina HiSeq 2000 short reads were downloaded from the publicly available European Nucleotide Archive (ENA)[11] (Tables 1 and 2). To estimate genome fraction, all reference genomes were downloaded from the National Center for Biotechnology Information (NCBI) genome database. The short-read data comprised 7 paired-end and 8 single-end prokaryotic data sets and 5 paired-end and 5 single-end eukaryotic data sets. All data sets have a maximum read length of 100 bp.

Table 1.

Prokaryotic data sets used in this study.

S. no.	Data set	ENA run accession	Data set type	No. of reads
1	Staphylococcus aureus	ERR353143	Paired-end	137 022
2	Streptococcus pneumoniae	ERR490828	Paired-end	321 004
3	Escherichia coli	ERR490638	Paired-end	737 008
4	Mycobacterium tuberculosis	ERR495003	Paired-end	770 994
5	Neisseria flava	DRR015798	Paired-end	1 218 573
6	Aeromonas salmonicida	DRR015726	Paired-end	2 267 875
7	Rothia mucilaginosa	DRR015851	Paired-end	4 098 002
8	Streptococcus suis	DRR015872	Single-end	113 512
9	Streptococcus pyogenes	SRR1148216	Single-end	724 546
10	Salmonella enterica	ERR233905	Single-end	1 490 584
11	Neisseria gonorrhoeae	SRR969383	Single-end	1 840 438
12	Chlamydia muridarum	SRR1736648	Single-end	3 099 636
13	Clostridioides difficile	ERR465798	Single-end	5 094 314
14	Bacillus anthracis	ERR1596542	Single-end	7 466 661
15	Chlamydia trachomatis	SRR1038047	Single-end	9 129 274

Table 2.

Eukaryotic data sets used in this study.

S. no.	Data set	ENA run accession	Data set type	No. of reads
1	Homo sapiens	DRR002191	Paired-end	126 605 856
2	Drosophila melanogaster	DRR016722	Paired-end	95 461 377
3	Arabidopsis thaliana	ERR1224454	Paired-end	30 841 688
4	Saccharomyces cerevisiae	ERR052652	Paired-end	17 584 902
5	Fungi	SRR1614243	Paired-end	22 344 195
6	Homo sapiens	DRR002191	Single-end	126 605 856
7	Drosophila melanogaster	DRR016722	Single-end	95 461 377
8	Arabidopsis thaliana	ERR1224454	Single-end	30 841 688
9	Saccharomyces cerevisiae	ERR052652	Single-end	17 584 902
10	Fungi	SRR1614243	Single-end	22 344 195


Genome assemblers

Seven assemblers (Table 3), which represent 5 different assembly algorithm strategies, were selected to assemble paired-end and single-end data sets.

Table 3.

De novo assemblers selected for this study.

S. no.	Assembler	Programming language	Algorithm	Input reads
1	ABySS[14]	C++	De Bruijn graph (DBG)	Paired-end and single-end
2	Velvet[15]	C	De Bruijn graph (DBG)	Paired-end and single-end
3	Edena[16]	C++	Overlap/layout/consensus (OLC)	Paired-end and single-end
4	SGA[17]	C++	String graph	Paired-end
5	Ray[18]	C++	Hybrid	Paired-end and single-end
6	SSAKE[19]	Perl	Greedy	Paired-end and single-end
7	Perga[20]	C	Greedy	Paired-end and single-end

All the selected assemblers were run on a virtual machine created with Oracle VM VirtualBox, with 2 vCPUs, 4 GB of RAM, and the 64-bit Ubuntu Server 14.04 Linux operating system (supplementary file 1).

Efficiency evaluation

The efficiency of each assembler was evaluated using total assembly time, maximum memory usage, and maximum CPU usage.
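One way to collect these three measurements for a single run is sketched below in Python, assuming a Linux system. It records wall-clock time, reads the peak resident memory of child processes via getrusage, and expresses CPU usage as (user + system CPU time) / wall time, the same ratio GNU time reports as "Percent of CPU this job got"; values above 100% mean more than one core was used. This is an assumption about how such figures can be obtained, not the authors' exact procedure, and profile_command is a hypothetical helper.

```python
import resource
import subprocess
import sys
import time


def profile_command(cmd):
    """Run `cmd` to completion and return wall time, peak memory, and CPU usage.

    Assumes Linux, where ru_maxrss is reported in kilobytes (macOS uses bytes).
    RUSAGE_CHILDREN covers all waited-for children of this process, so the
    CPU times are cumulative and ru_maxrss is the largest child's peak.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises if the command fails
    wall_s = time.perf_counter() - start
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu_s = usage.ru_utime + usage.ru_stime
    return {
        "total_time_min": wall_s / 60,
        "max_memory_mb": usage.ru_maxrss / 1024,
        "cpu_percent": 100 * cpu_s / wall_s,  # >100 means several cores used
    }


if __name__ == "__main__":
    # Placeholder command; substitute the real assembler invocation here.
    print(profile_command([sys.executable, "-c", "pass"]))
```

Because RUSAGE_CHILDREN accumulates over every child the process has waited for, profiling each assembler run in a fresh process keeps the numbers independent.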

Accuracy evaluation

The output of each assembler is a contig file containing the assembled contigs. These contig files were used to evaluate the accuracy of each assembler using parameters including the total number of contigs and the N50 contig length. These metrics were collected with the Assemblathon 2 script,[12] a Perl script that calculates assembly statistics for each contig file. Genome fraction was calculated with the QUAST tool[13] to measure how much of the reference genome is covered by the contig sequences.
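For readers unfamiliar with these metrics, the Python sketch below computes the number of contigs and the N50 from FASTA-formatted contig lines: N50 is the largest length L such that contigs of length L or longer together contain at least half of the assembled bases. It is an illustrative reimplementation, not the Assemblathon 2 script, which reports many more statistics and may apply its own minimum contig length cutoffs; the function names are hypothetical.

```python
def contig_lengths(fasta_lines):
    """Return the length of each sequence in FASTA-formatted text lines."""
    lengths, current = [], None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):  # a header starts a new contig
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)  # sequences may wrap over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # empty assembly


fasta = [">contig1", "ACGTACGTAC", ">contig2", "ACGTAC", "GT", ">contig3", "ACG"]
lengths = contig_lengths(fasta)
print(len(lengths), n50(lengths))  # 3 8
```

Here the contigs are 10, 8, and 3 bases long (21 in total); the 10-base contig alone covers less than half, but together with the 8-base contig it covers 18 bases, so the N50 is 8.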

Statistical analyses

Data were analyzed in R (version 3.3.2). The Shapiro-Wilk test was used to check whether the data were normally distributed, and parametric or nonparametric tests were then applied accordingly to determine statistical significance. Two-tailed P values less than .05 were considered significant.
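In R, the tests named here correspond to shapiro.test() and wilcox.test() in the stats package. To show what the Mann-Whitney comparison used in the Results actually measures, the Python sketch below computes the U statistic from pooled ranks, assigning tied values their average rank. It deliberately stops short of the P value, which R derives from the exact distribution or a normal approximation; the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic of sample x versus y: U = R_x - n_x(n_x + 1)/2."""
    ranks = average_ranks(list(x) + list(y))
    r_x = sum(ranks[:len(x)])  # rank sum of the first sample
    return r_x - len(x) * (len(x) + 1) / 2


print(mann_whitney_u([1.2, 1.5, 1.9], [5.0, 6.1, 7.3]))  # 0.0
```

For samples x and y, U(x, y) + U(y, x) = n_x · n_y, so a U near 0 or near n_x · n_y indicates that one assembler's values are consistently lower or higher than the other's.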

Results

The efficiency and accuracy of each assembler were analyzed with the evaluation techniques described above, using the generated contig files for the accuracy metrics. Our study evaluated 7 assemblers that use different assembly algorithms: ABySS and Velvet, the DBG-based assemblers; Edena, an OLC-based assembler; SGA, which uses a string graph algorithm; SSAKE and Perga, the greedy assemblers; and Ray, which uses a hybrid algorithm (Table 3).

Total assembling time

Total assembly time in minutes was measured with the Linux time command, and the medians of the assemblers were compared using the Mann-Whitney test. On prokaryotic data sets, Ray, the hybrid assembler, took longer than any other assembler, with a median time of 553.95 minutes on paired-end and 373.15 minutes on single-end data sets (P < .05). On eukaryotic data sets, SSAKE, the greedy assembler, took the longest on single-end data sets, with a median time of 223.10 minutes, whereas its median time on paired-end data sets was the lowest, at 13.85 minutes. Velvet, the DBG assembler, had the lowest median time on prokaryotic data sets: 1.49 minutes on paired-end and 1.26 minutes on single-end data sets; on eukaryotic single-end data sets, Velvet also showed the lowest time, with a median of 4.90 minutes. ABySS, also a DBG assembler, had the second-lowest median time, 1.93 minutes, on single-end prokaryotic and eukaryotic data sets. SSAKE had the second-lowest median time on paired-end prokaryotic data sets, also 1.93 minutes (Figure 1).

Figure 1.

The comparison of total median assembling time of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.

Memory and CPU usage

Maximum memory usage in megabytes (MB) and CPU usage in percent (%) were also measured with a Linux command, and the assemblers were compared using independent-samples tests. On prokaryotic paired-end data sets, ABySS, the DBG assembler, and Perga, the greedy assembler, consumed more memory than any other assembler (P < .05), whereas SGA, the string graph assembler, and Edena, the OLC assembler, used the least memory. On eukaryotic paired-end data sets, ABySS and Velvet consumed the most memory, whereas SSAKE used the least. On prokaryotic single-end data sets, SSAKE and Perga, the greedy assemblers, consumed the most memory, whereas Edena consumed the least. On eukaryotic single-end data sets, Velvet and Edena consumed the least memory, whereas Perga and SSAKE consumed the most (Figure 2).

Figure 2.

The mean comparison of memory usage and CPU usage of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.

In terms of CPU usage, ABySS, Velvet, and SGA, the graph-based assemblers, had the highest CPU usage on prokaryotic paired-end data sets, whereas Edena and SSAKE had the lowest; SSAKE also had the lowest CPU usage on eukaryotic paired-end data sets. On prokaryotic and eukaryotic single-end data sets, Ray, Perga, and Velvet used much more CPU than SSAKE and Edena, which had the lowest CPU usage (Figure 2).

Total number of contigs

For further analysis of the assembled contigs, the number of contigs was calculated with the Assemblathon script. Ideally, an assembly produces the smallest possible number of contigs covering the whole genome sequence. On prokaryotic and eukaryotic paired-end data sets, Velvet, the DBG assembler, assembled the short reads into a larger number of relatively short contigs (P < .05), whereas on single-end data sets ABySS produced the highest number of contigs, followed by Velvet and SSAKE. SSAKE and Perga produced few contigs on paired-end data sets (Figure 3).

Figure 3.

The comparison of the total number of contigs by median of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.

N50 contig length

N50 contig length was calculated by running the Assemblathon script on the contig files produced by each assembler. On prokaryotic and eukaryotic paired-end data sets, ABySS produced a high N50 contig length, whereas Velvet produced a low one. On prokaryotic single-end data sets, Velvet produced the highest N50, with a median of 1530.00 bp, followed by ABySS with a median of 1054.00 bp, whereas SSAKE and Perga produced low N50 values, with medians of 260.50 and 348.00 bp, respectively. On eukaryotic single-end data sets, Edena produced the highest N50, with a median of 57 252.00 bp, whereas Perga produced the lowest, with a median of 12 654.50 bp (Figure 4).

Figure 4.

The comparison of the N50 contig length by median of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.

Genome fraction

By mapping all contigs onto the reference genomes with QUAST, we calculated the genome fraction for each assembler, that is, the percentage of reference genome bases covered by aligned contigs (Table 4). On prokaryotic data sets, ABySS showed the highest genome fraction, with a mean of 66.3% on paired-end and 69.8% on single-end data sets. On prokaryotic paired-end data sets, Ray showed the second-highest genome fraction, with a mean of 58.8%, followed by Velvet with a mean of 57.1%, whereas Perga, Edena, and SGA showed intermediate accuracy, with mean genome fractions of 51.9%, 51.4%, and 50.4%, respectively, and SSAKE showed the worst accuracy, with a mean genome fraction of 13.2%.
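The genome fraction reported by QUAST is the percentage of reference bases covered by at least one aligned contig. Assuming the contig-to-reference alignments are already available as (start, end) intervals on a single reference sequence, the calculation reduces to merging overlapping intervals so that each base is counted once, as in the Python sketch below. QUAST performs the alignment itself and handles multi-sequence references and misassemblies, none of which is modelled here; genome_fraction is a hypothetical helper.

```python
def genome_fraction(intervals, genome_length: int) -> float:
    """Percent of reference bases covered by at least one aligned interval.

    `intervals` are half-open (start, end) reference coordinates of contig
    alignments on one reference sequence; overlaps are merged so that each
    reference base is counted only once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:  # close the previous merged block
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:  # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / genome_length


print(genome_fraction([(0, 50), (40, 70), (80, 100)], 200))  # 45.0
```

The first two alignments overlap and merge into 70 covered bases, the third adds 20, so 90 of 200 reference bases (45%) are covered; counting the overlap twice would have inflated the result.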

Table 4.

List of all assemblers with their mean genome fraction (%).

Assembler	Prokaryotic single-end	Prokaryotic paired-end	Eukaryotic single-end	Eukaryotic paired-end
ABySS	69.8	66.3	85.4	82.4
Velvet	59.6	57.1	82.6	85.6
Edena	43.8	51.4	62.2	90.4
SGA	—	50.4	—	52.4
Ray	48.7	58.8	—	—
SSAKE	44.3	13.2	49.2	74.0
Perga	57.6	51.9	82.0	83.2

On prokaryotic single-end data sets, Velvet showed the second-highest genome fraction, with a mean of 59.6%, followed by Perga with a mean of 57.6%, whereas Ray, SSAKE, and Edena showed intermediate accuracy, with mean genome fractions of 48.7%, 44.3%, and 43.8%, respectively. On eukaryotic paired-end data sets, Edena showed the highest genome fraction, with a mean of 90.4%, followed by Velvet with a mean of 85.6%. SSAKE showed the lowest genome fraction on eukaryotic single-end data sets, with a mean of 49.2%; on eukaryotic paired-end data sets its mean was 74.0%, above only SGA (52.4%) (Figure 5). In practice, an assembler that produces fewer contigs with a high N50 and a high genome fraction is considered ideal.

Figure 5.

The comparison of mean genome fraction of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.

Discussion

We evaluated the selected assemblers on prokaryotic and eukaryotic paired-end and single-end Illumina short reads on a Linux-based server. Ray, the hybrid assembler, took the longest to complete whole-genome assembly on prokaryotic paired-end and single-end data sets,[21] and it could not run on the eukaryotic paired-end and single-end data sets because it required a large amount of RAM and multiple CPUs to assemble so many reads. The DBG assemblers, Velvet and ABySS, are the best options for both types of prokaryotic data sets because of their high assembly speed, having the lowest assembly times of all the assemblers; on eukaryotic data, however, they are the best options only for single-end data sets.[22] Edena, the OLC assembler, consumed the least memory on both prokaryotic paired-end and single-end data sets; Velvet and Edena consumed the least memory on eukaryotic single-end data sets; and SSAKE did so on eukaryotic paired-end data sets. SGA, the string graph assembler, was also a good choice for assembling paired-end data sets with low memory,[23] but it spends additional time on indexing, error correction, duplicate removal, and overlap computation before assembly, which makes it more complex to run than Edena, which needs only an overlap step before assembly. Velvet also consumed little memory on single-end data sets, second only to Edena. In terms of high memory usage, ABySS topped the paired-end data sets and SSAKE the single-end data sets. In summary, for paired-end and single-end prokaryotic genomes, ABySS assembled genomes efficiently and quickly but used a large amount of memory,[24] whereas Velvet was both time-efficient and memory-efficient, but only on single-end data sets.
Edena was memory-efficient on both types of data sets, and SGA was also memory-efficient, but it supports only paired-end data. ABySS and Velvet also scaled to large amounts of data better than the rest of the assemblers. In terms of the total number of contigs, on paired-end data sets Velvet produced the largest number of contigs but a low N50, whereas ABySS produced the largest number of contigs on single-end data sets and a high N50 on both types of data sets.[25] In contrast, Edena, SGA, Ray, SSAKE, and Perga produced fewer contigs but with a low N50. Ideally, we expected contigs with a high N50 and a high genome fraction, but Velvet and ABySS were more conservative than the other assemblers in merging small contigs into larger ones, which yielded assemblies with more contigs.[26] Several factors might have contributed to this result, such as the k-mer size used for assembly, the quality of the single-end versus paired-end data, and other parameter settings used to build the assemblies. To check the accuracy of the assemblies, the contigs were aligned to their reference genomes with QUAST. ABySS showed the highest genome fraction on both paired-end and single-end data sets, followed by Ray on paired-end data sets and Velvet on single-end data sets. Velvet and ABySS could be the best choices for both paired-end and single-end prokaryotic data sets, with the highest genome fractions among all the selected assemblers,[27] but ABySS still needs improvement in several respects. Its memory and CPU consumption on paired-end data sets is much higher than on single-end data sets, and it relies mostly on mate pairs to assemble its contigs.
This approach may perform poorly when coverage is low, and ABySS has a known deadlocking issue at higher k values. Addressing these issues and reducing its memory and CPU usage could make ABySS the best of all the assemblers tested. Many research groups worldwide are working on better genome assemblers. Researchers at the European Bioinformatics Institute[28] developed the DBG-based assembler Velvet, and Canada's Michael Smith Genome Sciences Centre[29] developed ABySS. These groups continue to improve their assemblers and periodically release new versions. De Bruijn graph–based assemblers are considered the best genome assemblers.[30]

Conclusions

Our study evaluated 7 de novo sequence assemblers in terms of memory, time, and accuracy. We found that each assembler is capable of assembling whole prokaryotic or eukaryotic genomes, except that the hybrid assembler Ray cannot assemble whole eukaryotic genomes with about 4 GB of RAM or less. The best assembler depends on the characteristics of the data set and the user's requirements. On single-end data sets, Velvet and ABySS generally produced the best results of all 7 assemblers, with comparatively low assembly time and high prokaryotic and eukaryotic genome fractions. Velvet also had among the lowest memory usage on both prokaryotic and eukaryotic single-end data sets. ABySS needs some improvements, including reduced memory and CPU usage. On paired-end data sets, when a large amount of memory is not available, SGA and Edena might be good choices. The hybrid assembler Ray also showed a high genome fraction; however, its extremely long assembly time might make it prohibitively slow on larger data sets.

References (10 of 22 shown)

1. Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors: Daniel R Zerbino; Ewan Birney
Journal: Genome Res Date: 2008-03-18 Impact factor: 9.043

2. ABySS: a parallel assembler for short read sequence data.

Authors: Jared T Simpson; Kim Wong; Shaun D Jackman; Jacqueline E Schein; Steven J M Jones; Inanç Birol
Journal: Genome Res Date: 2009-02-27 Impact factor: 9.043

3. A memory-efficient data structure representing exact-match overlap graphs with application for next-generation DNA assembly.

Authors: Hieu Dinh; Sanguthevar Rajasekaran
Journal: Bioinformatics Date: 2011-06-02 Impact factor: 6.937

4. Efficient de novo assembly of large genomes using compressed data structures.

Authors: Jared T Simpson; Richard Durbin
Journal: Genome Res Date: 2011-12-07 Impact factor: 9.043

5. Evaluation of methods for de novo genome assembly from high-throughput sequencing reads reveals dependencies that affect the quality of the results.

Authors: Niina Haiminen; David N Kuhn; Laxmi Parida; Isidore Rigoutsos
Journal: PLoS One Date: 2011-09-07 Impact factor: 3.240

6. MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning.

Authors: Kengo Sato; Yasubumi Sakakibara
Journal: DNA Res Date: 2014-11-27 Impact factor: 4.458

7. Hybrid error correction and de novo assembly of single-molecule sequencing reads.

Authors: Sergey Koren; Michael C Schatz; Brian P Walenz; Jeffrey Martin; Jason T Howard; Ganeshkumar Ganapathy; Zhong Wang; David A Rasko; W Richard McCombie; Erich D Jarvis
Journal: Nat Biotechnol Date: 2012-07-01 Impact factor: 54.908

8. PERGA: a paired-end read guided de novo assembler for extending contigs using SVM and look ahead approach.

Authors: Xiao Zhu; Henry C M Leung; Francis Y L Chin; Siu Ming Yiu; Guangri Quan; Bo Liu; Yadong Wang
Journal: PLoS One Date: 2014-12-02 Impact factor: 3.240

9. How to apply de Bruijn graphs to genome assembly.

Authors: Phillip E C Compeau; Pavel A Pevzner; Glenn Tesler
Journal: Nat Biotechnol Date: 2011-11-08 Impact factor: 54.908

10. Accurate indel prediction using paired-end short reads.

Authors: Dominik Grimm; Jörg Hagmann; Daniel Koenig; Detlef Weigel; Karsten Borgwardt
Journal: BMC Genomics Date: 2013-02-27 Impact factor: 3.969
