Matteo Comin, Michele Schimd.
Abstract
BACKGROUND: Sequencing technologies are generating enormous amounts of read data; however, the assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics known as alignment-free measures, which do not require reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads, but with an error rate of around 15%. In this context it will be fundamental to exploit quality-value information within the framework of alignment-free measures.
Keywords: Alignment-free measures; Meta-genomes; Phylogeny without assembly; Reads quality values
Year: 2016 PMID: 27535823 PMCID: PMC4989896 DOI: 10.1186/s12920-016-0193-6
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1. Average R-F distance between estimated and original trees as sequence length N varies
Fig. 2. Average R-F distance between estimated and original trees as the number of reads M varies
Fig. 3. Average R-F distance between estimated and original trees as the read length β varies
Fig. 4. Average R-F distance between estimated and original trees as parameter k varies
Fig. 5. Output dendrogram of hierarchical clustering when the dissimilarity matrix is produced using measure (k=8) on all the datasets for the three groups LV, M and MA
Fig. 6. Height of the node that correctly clusters the two types of metagenomes for the groups LV, M and MA, as a function of the parameter k of the measure used to produce the dissimilarity matrix