Martin Kollmar, Lotte Kollmar, Björn Hammesfahr, Dominic Simm.
Abstract
Eukaryotic genomes are the basis for understanding the complexity of life from populations to the molecular level. Recent technological innovations have revolutionized the speed of data generation, enabling the sequencing of eukaryotic genomes and transcriptomes within days. The database diArk (http://www.diark.org) has been developed to provide access to all available assembled genomes and transcriptomes. As of September 2014, diArk contains about 2600 eukaryotes with 6000 genome and transcriptome assemblies, of which 22% are not available via NCBI/ENA/DDBJ. Several indicators of assembly quality are provided to facilitate comparison and selection of the most appropriate dataset for further studies. diArk has a user-friendly web interface with extensive options for filtering and browsing the sequenced eukaryotes. In this new version of the database, we have also integrated species for which transcriptome assemblies are available, and we provide more analyses of assemblies.
Year: 2014 PMID: 25378341 PMCID: PMC4384042 DOI: 10.1093/nar/gku990
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Representation of the nonredundant species (i.e. one strain per species) in diArk with their sequencing type and method. For comparison, all species for which transcriptome data and/or genome assemblies are available via NCBI/ENA/DDBJ are marked. Nine hundred and eighty-five of the assemblies have been published, but only 784 of them are linked to the genome assemblies at NCBI.
Figure 2. Evolution of the fraction of nonredundant species compared to all sequenced species over time.
Figure 3. Distribution of species for which EST/cDNA data, genome assemblies and transcriptome assemblies are available. For each sequencing type, the pie charts show the percentage of sequenced species for selected taxa.