Eric Marinier¹, Rahat Zaheer¹, Chrystal Berry¹, Kelly A Weedmark¹, Michael Domaratzki², Philip Mabon¹, Natalie C Knox¹, Aleisha R Reimer¹, Morag R Graham¹,³, Linda Chui⁴,⁵, Laura Patterson-Fortin⁵, Jian Zhang⁶, Franco Pagotto⁷, Jeff Farber⁷, Jim Mahony⁸, Karine Seyer⁹, Sadjia Bekal¹⁰,¹¹, Cécile Tremblay¹⁰,¹¹, Judy Isaac-Renton¹², Natalie Prystajecky¹²,¹³, Jessica Chen¹⁴, Peter Slade¹⁵, Gary Van Domselaar¹,³
Abstract
The ready availability of vast amounts of genomic sequence data has created the need to rethink comparative genomics algorithms using 'big data' approaches. Neptune is an efficient system for rapidly locating differentially abundant genomic content in bacterial populations using an exact k-mer matching strategy, while accommodating k-mer mismatches. Neptune's locus discovery process uses probabilistic models to identify sequences that are sufficiently common to a group of target sequences and sufficiently absent from non-target sequences. Neptune uses parallel computing to efficiently identify and extract these loci from draft genome assemblies without requiring multiple sequence alignments or other computationally expensive comparative sequence analyses. Tests on simulated and real datasets showed that Neptune rapidly identifies regions with both high sensitivity and high specificity. We demonstrate that this system can identify trait-specific loci from different bacterial lineages. Neptune is broadly applicable to comparative bacterial analyses but will particularly benefit pathogenomic applications, owing to its efficient and sensitive discovery of differentially abundant genomic loci. The software is available for download at http://github.com/phac-nml/neptune.
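The inclusion/exclusion idea described in the abstract (keep sequence that is common to targets and absent from non-targets, then extract contiguous loci) can be illustrated with a minimal sketch. This is not Neptune's implementation: the function names `kmers`, `candidate_kmers` and `extract_regions`, the fixed fractional cutoffs `min_incl`/`max_excl` (standing in for Neptune's probabilistic models), the forward-strand-only k-mers, and the `max_gap` merging rule are all hypothetical simplifications, and parallelism is omitted.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in seq (forward strand only; an assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_kmers(targets, non_targets, k, min_incl=1.0, max_excl=0.0):
    """Keep k-mers present in at least min_incl of targets and at most max_excl of non-targets.

    The fixed fractions are a hypothetical stand-in for Neptune's probabilistic cutoffs.
    """
    target_counts = Counter()
    for seq in targets:
        target_counts.update(kmers(seq, k))  # count each k-mer once per genome
    non_target_counts = Counter()
    for seq in non_targets:
        non_target_counts.update(kmers(seq, k))

    keep = set()
    for kmer, n in target_counts.items():
        incl = n / len(targets)
        excl = non_target_counts[kmer] / len(non_targets) if non_targets else 0.0
        if incl >= min_incl and excl <= max_excl:
            keep.add(kmer)
    return keep


def extract_regions(reference, keep, k, max_gap=0):
    """Merge retained k-mer hits along one reference into half-open (start, end) regions.

    A new hit joins the current region if it starts at most max_gap bases past its end,
    so a small max_gap tolerates short runs of missing k-mers (e.g. caused by a mismatch).
    """
    regions = []
    start = end = None
    for i in range(len(reference) - k + 1):
        if reference[i:i + k] not in keep:
            continue
        if start is not None and i <= end + max_gap:
            end = i + k  # extend the current region
        else:
            if start is not None:
                regions.append((start, end))
            start, end = i, i + k  # open a new region
    if start is not None:
        regions.append((start, end))
    return regions


if __name__ == "__main__":
    targets = ["AAAACGTGCATTTT", "CCAACGTGCATGGG"]
    non_targets = ["AAAATTTTGGGG"]
    keep = candidate_kmers(targets, non_targets, k=4)
    for s, e in extract_regions(targets[0], keep, k=4):
        print(s, e, targets[0][s:e])  # prints: 2 11 AACGTGCAT
```

In this toy run, the 9-base segment `AACGTGCAT` is shared by both targets and absent from the non-target, so it is the only region extracted. Setting `max_gap` above zero would let a region bridge the run of k-mers lost to a single mismatch, which loosely mirrors the abstract's point about accommodating mismatches without alignment.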
Year: 2017 PMID: 29048594 PMCID: PMC5737611 DOI: 10.1093/nar/gkx702
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971