Johnathan Lo, Michelle M Jonika, Heath Blackmon.
Abstract
Microsatellites are repetitive DNA sequences usually found in non-coding regions of the genome. Their quantification and analysis have applications in fields from population genetics to evolutionary biology. As genome assemblies become commonplace, the need for software that can facilitate analyses has never been greater. R packages that can analyze genomic data are particularly important, since R is one of the most popular software environments for biologists. We created an R package, micRocounter, to quantify microsatellites. We have optimized our package for speed, accessibility, and portability, making the automated analysis of large genomic data sets feasible. Computationally intensive algorithms were built in C++ to increase speed. Tests using benchmark datasets show a 200-fold improvement in speed over existing software. A moderately sized genome of 500 Mb can be processed in under 50 sec. Results are output as an object in R, increasing accessibility and flexibility for practitioners.
Keywords: genome analysis; genomics; microsatellite; repetitive sequences
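To make the task described in the abstract concrete, the following is a minimal Python sketch of microsatellite detection on a single sequence. It is not micRocounter's algorithm or API: the function name `find_microsatellites`, its parameters, and the default thresholds (motifs of 1 to 6 bp, at least 3 tandem copies) are all assumptions chosen for illustration, and the regular-expression approach shown here makes no attempt to match the speed of the package's C++ implementation.

```python
import re


def find_microsatellites(seq, min_repeats=3, max_motif_len=6):
    """Return (start, motif, copies) for each perfect tandem repeat in seq.

    Illustrative only: thresholds and names are assumptions, not taken
    from micRocounter. Only perfect (uninterrupted) repeats are reported.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # Group 1 is the motif, tried shortest-first (lazy quantifier), so the
    # reported motif is never itself a repeat of a shorter unit.
    # \1{n,} then requires at least min_repeats - 1 further copies.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    example = "TTGACACACACGGTCAGCAGCAGCAGTT"
    print(find_microsatellites(example))
    # prints [(3, 'AC', 4), (14, 'CAG', 4)]
```

Each tuple gives the zero-based start position, the repeat motif, and the number of tandem copies. A genome-scale tool would additionally need to stream FASTA records and handle ambiguous bases, which this sketch leaves out.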
Year: 2019 PMID: 31375475 PMCID: PMC6778809 DOI: 10.1534/g3.119.400335
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Insect genomes used in benchmarking and testing micRocounter. Assembly size is the size of the assembled genome and is not necessarily representative of the true genome size, since some assemblies are highly fragmented or missing significant proportions of the genome. All genomes were downloaded from NCBI.
| Order | Species | Assembly Size (Mbp) | Assembly Version | Accession Number |
|---|---|---|---|---|
| Blattodea | | 2037 | 1 | GCA_003018175.1 |
| Blattodea | | 1018 | 1 | GCA_002891405.2 |
| Coleoptera | | 2409 | 2 | GCA_003013835.2 |
| Coleoptera | | 12 | 1 | GCA_000281835.1 |
| Diptera | | 1383 | 5 | GCA_002204515.1 |
| Diptera | | 253 | 1 | GCA_000298335.1 |
| Diptera | | 144 | 6+ | GCA_000001215.4 |
| Diptera | | 69 | 1 | GCA_001014935.1 |
| Diptera | | 412 | 1 | GCA_001015175.1 |
| Hemiptera | | 706 | 1 | GCA_000181055.3 |
| Hymenoptera | | 589 | 1 | GCA_900490025.1 |
| Lepidoptera | | 855 | 1 | GCA_002245475.1 |
| Lepidoptera | | 357 | 1 | GCA_002938995.1 |
| Odonata | | 1628 | 1 | GCA_002093875.1 |
| Phasmatidae | | 3802 | 1 | GCA_002778355.1 |
Figure 1. Processing time and memory usage of micRocounter across 15 representative genomes. In each panel the x-axis represents the genome size of the benchmark genomes in Mb. A) Comparison of execution time for micRocounter and Palfinder on the benchmark genome set. B) Execution time for micRocounter on the benchmark genome set, with time on a log scale. C) Peak memory usage when running micRocounter on the benchmark genomes.