| Literature DB >> 35172880 |
Paul P Gardner1,2, James M Paterson3, Stephanie McGimpsey4, Fatemeh Ashari-Ghomi5, Sinan U Umu6, Aleksandra Pawlik7, Alex Gavryushkin8,9, Michael A Black10.
Abstract
BACKGROUND: Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade software speed for accuracy may be employed. We have studied these trade-offs using the results of a large number of independent software benchmarks, and evaluated whether external factors, including speed, author reputation, journal impact, recency and developer efforts, are indicative of accurate software.Entities:
Mesh:
Year: 2022 PMID: 35172880 PMCID: PMC8851831 DOI: 10.1186/s13059-022-02625-x
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1A A heatmap indicating the relationships between different features of bioinformatic software tools. Spearman’s rho is used to infer correlations between metrics such as citation-based metrics, the year and relative age of publication, version number, GitHub-derived activity measures, and the mean relative speed and accuracy rankings. Red colours indicate a positive correlation; blue colours indicate a negative correlation. Correlations with a P value less than 0.05 (corrected for multiple testing using the Benjamini-Hochberg method) are indicated with an ‘X’ symbol. The correlations with accuracy are illustrated in more detail in B; the relationship between speed and accuracy is shown in more detail in Fig. 2. B Violin plots of Spearman’s correlations for permuted accuracy ranks and different software features. The unpermuted correlations are indicated with a red asterisk. For each benchmark, 1000 permuted sets of accuracy and speed ranks were generated, and the ranks were normalised to lie between 0 and 1 (see Methods for details). Circled asterisks are significant (empirical P value < 0.05, corrected for multiple testing using the Benjamini-Hochberg method)
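The analysis in Fig. 1 combines three standard techniques: Spearman correlation, an empirical P value from permuted ranks, and Benjamini-Hochberg correction for multiple testing. A minimal sketch of these building blocks is below. This is an illustration, not the authors' pipeline: the function names are hypothetical, `spearman_rho` ignores tied ranks for brevity, and the permutation scheme here shuffles one vector rather than permuting within each benchmark as the caption describes.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors.
    Simplified sketch -- ties are not averaged as in the full definition."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def permutation_pvalue(x, y, n_perm=1000, rng=None):
    """Empirical P value: fraction of permutations whose |rho| is at
    least as extreme as the observed |rho| (add-one to avoid P = 0)."""
    rng = np.random.default_rng(rng)
    observed = abs(spearman_rho(x, y))
    hits = sum(abs(spearman_rho(rng.permutation(x), y)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvals):
    """BH-adjusted P values for multiple-testing correction."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)          # indices of P values, smallest first
    adj = np.empty(m)
    running_min = 1.0              # enforce monotonicity from the top down
    for rank_idx in range(m - 1, -1, -1):
        i = order[rank_idx]
        running_min = min(running_min, p[i] * m / (rank_idx + 1))
        adj[i] = running_min
    return adj
```

In practice one would use `scipy.stats.spearmanr` and `statsmodels.stats.multitest.multipletests(method="fdr_bh")` rather than hand-rolled versions; the sketch only makes the caption's steps explicit.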
Fig. 2A A heatmap indicating the relative paucity or abundance of software in the range of possible accuracy and speed rankings. Redder colours indicate an abundance of software tools in an accuracy and speed category, while bluer colours indicate scarcity. The abundance is quantified using a Z-score computed for each bin, derived from 1000 random permutations of the speed and accuracy ranks from each benchmark. Mean normalised ranks of accuracy and speed have been binned into 9 classes (a 3×3 grid) that range from comparatively slow and inaccurate to comparatively fast and accurate. Z-scores with a P value less than 0.05 are indicated with an ‘X’. B The Z-score distributions from the permutation tests (indicated with the wheat-coloured violin plots) compared to the Z-scores for the observed values for each of the corner and middle squares of the heatmap
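The per-bin Z-score in Fig. 2A can be sketched as follows: count the tools landing in each cell of a 3×3 grid of normalised accuracy and speed ranks, then compare the observed count to the mean and standard deviation of counts obtained under random permutations. The function names are hypothetical, and unlike the paper this sketch permutes one pooled set of speed ranks rather than permuting within each benchmark, so it illustrates the Z-score computation only.

```python
import numpy as np

def bin_counts(acc, speed, n_bins=3):
    """Count tools in each cell of an n_bins x n_bins grid of
    normalised (0..1) accuracy and speed ranks."""
    a = np.clip((np.asarray(acc) * n_bins).astype(int), 0, n_bins - 1)
    s = np.clip((np.asarray(speed) * n_bins).astype(int), 0, n_bins - 1)
    counts = np.zeros((n_bins, n_bins), dtype=int)
    for i, j in zip(a, s):
        counts[i, j] += 1
    return counts

def bin_zscores(acc, speed, n_perm=1000, rng=None):
    """Z-score per bin: (observed count - permuted mean) / permuted SD,
    with the permutation null built by shuffling the speed ranks."""
    rng = np.random.default_rng(rng)
    observed = bin_counts(acc, speed)
    permuted = np.stack([bin_counts(acc, rng.permutation(np.asarray(speed)))
                         for _ in range(n_perm)])
    mu = permuted.mean(axis=0)
    sd = permuted.std(axis=0)
    return (observed - mu) / np.where(sd == 0, 1.0, sd)  # guard empty bins
```

With strongly correlated ranks, the slow/inaccurate and fast/accurate corner bins receive positive Z-scores (more tools than the permutation null predicts), matching the red cells described in the caption.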