Kai Kang (1,2), Hui Chong (1), Kang Ning (1)
(1) Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
(2) Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China.
Abstract
BACKGROUND: Microbial community samples are accumulating faster than ever, with hundreds of thousands of samples being sequenced each year. Mining such a large volume of multisource, heterogeneous data is increasingly difficult, so efficient and accurate sample comparison and search are urgently needed: with millions of samples in data repositories, traditional sample comparison and search approaches fall short in speed and accuracy. FINDINGS: Here we propose Meta-Prism 2.0, a microbial community sample analysis method that pushes time and memory efficiency to a new limit without compromising accuracy. Using a sparse data structure, a time-saving instruction pipeline, and SIMD optimization, Meta-Prism 2.0 enables ultra-fast, memory-efficient, flexible, and accurate search among millions of samples. Meta-Prism 2.0 was tested on several data sets, the largest containing 1 million samples. The results show that, with a comparison speed of 0.00001 s per sample pair and a memory requirement of 8 GB for searching against 1 million samples, Meta-Prism 2.0 is one of the most efficient sample analysis methods. In addition, Meta-Prism 2.0 achieves accuracy comparable to or better than that of other contemporary methods. Furthermore, Meta-Prism 2.0 can precisely identify the biome of origin for samples, thus enabling sample source tracking. Finally, we provide a web server for fast online search of microbial community samples. CONCLUSIONS: In summary, Meta-Prism 2.0 turns resource-intensive sample search into an efficient procedure that researchers can run routinely, even on a laptop, for insightful sample search, similarity analysis, and knowledge discovery. Meta-Prism 2.0 can be accessed at https://github.com/HUST-NingKang-Lab/Meta-Prism-2.0, and the web server can be accessed at https://hust-ningkang-lab.github.io/Meta-Prism-2.0/.
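To make the idea of searching over sparse sample representations concrete, the following is a minimal sketch and not the Meta-Prism 2.0 implementation. It assumes, purely for illustration, that each sample is a sparse map from taxon name to abundance and that similarity is the summed minimum abundance over shared taxa (equivalent to 1 minus the Bray-Curtis dissimilarity for normalized samples). The abstract does not specify the similarity measure used by Meta-Prism 2.0, and the names `SparseSample`, `normalize`, `overlap_similarity`, and `search` are hypothetical. SIMD and pipeline optimizations are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a sample is a sparse map from taxon name to
# abundance; taxa that are absent from a sample are simply not stored.
SparseSample = Dict[str, float]


def normalize(sample: SparseSample) -> SparseSample:
    """Scale abundances to sum to 1; empty or all-zero samples become {}."""
    total = sum(sample.values())
    if total <= 0:
        return {}
    return {taxon: value / total for taxon, value in sample.items() if value > 0}


def overlap_similarity(a: SparseSample, b: SparseSample) -> float:
    """Sum of minimum abundances over shared taxa.

    For normalized samples this equals 1 - Bray-Curtis dissimilarity.
    This measure is an illustrative assumption, not the one from the paper.
    """
    # Iterate over the smaller sample so the cost scales with its non-zero entries.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return sum(min(value, large[taxon]) for taxon, value in small.items() if taxon in large)


def search(
    query: SparseSample, database: Dict[str, SparseSample], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_n (sample_id, similarity) pairs, most similar first."""
    q = normalize(query)
    scores = [(sid, overlap_similarity(q, normalize(s))) for sid, s in database.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy data, invented for this example only.
database = {
    "gut": {"Bacteroides": 60, "Prevotella": 40},
    "soil": {"Streptomyces": 70, "Bacillus": 30},
}
query = {"Bacteroides": 50, "Prevotella": 30, "Bacillus": 20}
results = search(query, database)
```

With this toy data, the query shares most of its abundance with the "gut" sample, so that sample ranks first. Because only non-zero entries are stored and visited, both memory use and comparison cost depend on the number of taxa actually present in each sample rather than on the total number of known taxa.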