MOTIVATION: At present, mapping of sequence identifiers across databases is a daunting, time-consuming and computationally expensive process, usually achieved by sequence similarity searches with strict threshold values. SUMMARY: We present a rapid and efficient method to map sequence identifiers across databases. The method uses the MD5 checksum algorithm for message integrity to generate sequence fingerprints and uses these fingerprints as hash strings to map sequences across databases. The program, called MagicMatch, is able to cross-link any of the major sequence databases within a few seconds on a modest desktop computer.
MOTIVATION: At present, mapping of sequence identifiers across databases is a daunting, time-consuming and computationally expensive process, usually achieved by sequence similarity searches with strict threshold values. SUMMARY: We present a rapid and efficient method to map sequence identifiers across databases. The method uses the MD5 checksum algorithm for message integrity to generate sequence fingerprints and uses these fingerprints as hash strings to map sequences across databases. The program, called MagicMatch, is able to cross-link any of the major sequence databases within a few seconds on a modest desktop computer.
Authors: Peter D Karp; Christos A Ouzounis; Caroline Moore-Kochlacs; Leon Goldovsky; Pallavi Kaipa; Dag Ahrén; Sophia Tsoka; Nikos Darzentas; Victor Kunin; Núria López-Bigas Journal: Nucleic Acids Res Date: 2005-10-24 Impact factor: 16.971
Authors: Sayoni Das; Ian Sillitoe; David Lee; Jonathan G Lees; Natalie L Dawson; John Ward; Christine A Orengo Journal: Nucleic Acids Res Date: 2015-05-11 Impact factor: 16.971