Izaskun Mallona, Miguel A Peinado.
Abstract
BACKGROUND: Genomic datasets accompanying scientific publications show a surprisingly high rate of gene name corruption. This error is generated when files and tables are imported into Microsoft Excel and certain gene symbols are automatically converted into dates.
Keywords: Data conversion; Excel; Gene symbol; Machine readability; Structured data
Year: 2017 PMID: 28327106 PMCID: PMC5359807 DOI: 10.1186/s12864-017-3631-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Data flow and usage example. a, Truke data flow. b, Tabular data with the corrupted (left) and fixed (right) gene symbols. The data correspond to Ziemann's [2] meta-analysis (Additional file 1) and were processed as if formatted as mm/dd/yyyy. Rows 1, 6 and 13 exemplify dates that are not recoverable. Rows 3-5, 7 and 10-12 depict dates that map to more than one gene symbol and therefore require further manual parsing. Rows 8, 9, 14 and 15 are unambiguous fixes.
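The three outcomes named in the caption (no recoverable symbol, several candidate symbols, one unambiguous fix) can be illustrated with a minimal Python sketch. This is not Truke's implementation. The lookup tables, the function names, and the `d-Mon` cell format are assumptions made for illustration, and the symbol list is a small hand-picked subset rather than a complete nomenclature.

```python
import re

# Assumption: Excel rendered the corrupted cell as "<day>-<Mon>", e.g. "1-Mar".
DATE_RE = re.compile(r"^(\d{1,2})-([A-Z][a-z]{2})$")

# Hypothetical, illustrative tables (not taken from the paper or from Truke).
# Each month abbreviation lists the gene-symbol prefixes that could produce it.
MONTH_PREFIXES = {
    "Mar": ("MARCH", "MARC"),
    "Sep": ("SEPT", "SEP"),
    "Dec": ("DEC",),
}
# Small subset of symbols used only to decide which candidates are plausible.
KNOWN_SYMBOLS = {"MARCH1", "MARC1", "MARCH2", "MARC2", "SEPT2", "SEPT9", "DEC1"}


def candidate_symbols(cell):
    """Return the known gene symbols that a date-like cell could stand for."""
    match = DATE_RE.match(cell.strip())
    if match is None:
        return []  # not a date-like cell, nothing to fix
    day, month = int(match.group(1)), match.group(2)
    candidates = (f"{prefix}{day}" for prefix in MONTH_PREFIXES.get(month, ()))
    return sorted(symbol for symbol in candidates if symbol in KNOWN_SYMBOLS)


def classify(cell):
    """Label a cell by how many candidate symbols it maps to."""
    hits = candidate_symbols(cell)
    if not hits:
        return "no_candidate"
    return "unambiguous" if len(hits) == 1 else "ambiguous"


if __name__ == "__main__":
    for cell in ("1-Mar", "1-Dec", "2-Sep", "5-Jan"):
        print(cell, candidate_symbols(cell), classify(cell))
```

In this sketch `1-Mar` is ambiguous because both `MARCH1` and `MARC1` collapse to the same date, which is the kind of case the caption flags for manual parsing, whereas `1-Dec` has a single candidate.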