Theodore R Gibbons1, Stephen M Mount2,3, Endymion D Cooper4, Charles F Delwiche5,6. 1. Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, 20742, Maryland. trgibbons@gmail.com. 2. Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, 20742, Maryland. smount@umd.edu. 3. Center for Bioinformatics and Computational Biology, University of Maryland, College Park, 20742, Maryland. smount@umd.edu. 4. Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, 20742, Maryland. endymion.dante.cooper@gmail.com. 5. Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, 20742, Maryland. delwiche@umd.edu. 6. Maryland Agricultural Experiment Station, University of Maryland, College Park, 20742, Maryland. delwiche@umd.edu.
Unfortunately, [1] was published without incorporation of all requested corrections. The correct equation for computing the E-value from a bit score is not The image file used for Figure 2 was incorrect. The correct figure is provided here as Fig. 1. The name of Additional File 1 should be “Supplemental Figures”, not “Sensitivity performance comparison for each test database.” The University of Maryland, College Park is located in College Park, Maryland, not College Park, Baltimore, Maryland. The correct addresses are provided in this erratum.Illustration of simulated sequence fragmentation. This example illustrates the four different ways in which the fragmentation scheme 1323 would be applied to a toy input test database with only three clusters containing sequences from only four organisms. The four resulting test sets represent a cross of two variables, arranged here into rows and columns. The sequences in the top row (a & b) have been split into even subsequences. The sequences in the lower row (c & d) have been randomly fragmented into uneven subsequences. In the left column (a & c), the user-defined integer assigned to each organism directly determines the number of subsequences into which each sequence is split. In the right column (b & d), these integers are first mapped to all sequences within a cluster, but are then shuffled within that cluster before fragmentation