
Development of a program for in silico optimized selection of oligonucleotide-based molecular barcodes.

In Seok Yang, Sang Won Bae, BeumJin Park, Sangwoo Kim.

Abstract

Short DNA oligonucleotides (~4 mer) have been used to index samples from different sources, as in multiplex sequencing. Longer oligonucleotides (8-12 mer) are now used as molecular barcodes to distinguish among individual original DNA molecules in many advanced sequencing analyses, including low-frequency mutation detection, quantitative transcriptome analysis, and single-cell sequencing. Although molecular barcodes with random sequences have some advantages, this approach makes it impossible to know the exact barcode sequences used in an experiment, and it can lead to inaccurate interpretation when barcodes are misclustered because of unexpected mutations within them. The present study introduces VFOS, a tool for selecting an optimal barcode subset for molecular barcoding. The program considers five barcode factors: GC content, homopolymers, simple sequence repeats with dinucleotide repeat units, Hamming distance, and complementarity between barcodes. To evaluate a selected barcode set, penalty scores for these factors are defined based on their distributions observed in random barcodes. The algorithm comprises two steps: i) random generation of an initial set and ii) optimal barcode selection via iterative replacement. Users can run the program by specifying only the barcode length and the number of barcodes to generate. For advanced use, the program also accepts user-defined values for other parameters, including penalty scores, so it can be applied under various conditions. In many test runs generating 100,000 barcodes of 12 nucleotides each, the program was fast enough to produce optimal barcode sets on an ordinary desktop PC.
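Each of the five barcode factors listed above can be measured with a short function. The Python sketch below is illustrative only: the function names are hypothetical, "complementarity" is approximated as ungapped antiparallel base pairing, and the functions return raw measures rather than the program's penalty scores, whose exact definitions are not given in this abstract.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a barcode."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base, e.g. 'AAAAC' -> 4."""
    longest = current = 1
    for prev, base in zip(seq, seq[1:]):
        current = current + 1 if base == prev else 1
        longest = max(longest, current)
    return longest


def longest_dinucleotide_repeat(seq: str) -> int:
    """Most consecutive copies of a two-base unit, e.g. 'ACACAC' -> 3.
    Units made of one repeated base ('AA') are skipped; those are homopolymers."""
    best = 0
    for start in range(len(seq) - 1):
        unit = seq[start:start + 2]
        if unit[0] == unit[1]:
            continue
        copies, pos = 1, start + 2
        while seq[pos:pos + 2] == unit:
            copies += 1
            pos += 2
        best = max(best, copies)
    return best


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]


def complementary_positions(a: str, b: str) -> int:
    """Positions where `a` base-pairs with `b` when the two strands are aligned
    antiparallel end to end (ungapped; a crude proxy for cross-hybridization)."""
    return sum(x == y for x, y in zip(a, reverse_complement(b)))


if __name__ == "__main__":
    bc = "ACACACGGTTCA"
    # prints: 0.5 2 3
    print(gc_content(bc), longest_homopolymer(bc), longest_dinucleotide_repeat(bc))
```

Hamming distance and complementarity are pairwise measures, so a tool scoring a whole set would apply them to every pair of barcodes, while the other three factors are computed per barcode.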
We also showed that, compared with two other tools (DNABarcodes and FreeBarcodes), VFOS offers comparable performance, flexible run options, consideration of simple sequence repeats, and fast computation. Owing to its versatility and speed, we expect many researchers to adopt the program for selecting optimal barcode sets in their experiments, including next-generation sequencing.


Year: 2021; PMID: 33600481; PMCID: PMC7891705; DOI: 10.1371/journal.pone.0246354

Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240


References (29 in total; the first 10 are listed)

1.  Illumina sequencing library preparation for highly multiplexed target capture and sequencing.

Authors:  Matthias Meyer; Martin Kircher
Journal:  Cold Spring Harb Protoc       Date:  2010-06

2.  Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex.

Authors:  Micah Hamady; Jeffrey J Walker; J Kirk Harris; Nicholas J Gold; Rob Knight
Journal:  Nat Methods       Date:  2008-02-10       Impact factor: 28.547

3.  DNABarcodes: an R package for the systematic construction of DNA sample tags.

Authors:  Tilo Buschmann
Journal:  Bioinformatics       Date:  2017-03-15       Impact factor: 6.937

4.  Detection of ultra-rare mutations by next-generation sequencing.

Authors:  Michael W Schmitt; Scott R Kennedy; Jesse J Salk; Edward J Fox; Joseph B Hiatt; Lawrence A Loeb
Journal:  Proc Natl Acad Sci U S A       Date:  2012-08-01       Impact factor: 11.205

5.  Molecular indexing enables quantitative targeted RNA sequencing and reveals poor efficiencies in standard library preparations.

Authors:  Glenn K Fu; Weihong Xu; Julie Wilhelmy; Michael N Mindrinos; Ronald W Davis; Wenzhong Xiao; Stephen P A Fodor
Journal:  Proc Natl Acad Sci U S A       Date:  2014-01-21       Impact factor: 11.205

6.  Coming of age: ten years of next-generation sequencing technologies. [Review]

Authors:  Sara Goodwin; John D McPherson; W Richard McCombie
Journal:  Nat Rev Genet       Date:  2016-05-17       Impact factor: 53.242

7.  Next-Generation Sequencing Informatics: Challenges and Strategies for Implementation in a Clinical Environment. [Review]

Authors:  Somak Roy; William A LaFramboise; Yuri E Nikiforov; Marina N Nikiforova; Mark J Routbort; John Pfeifer; Rakesh Nagarajan; Alexis B Carter; Liron Pantanowitz
Journal:  Arch Pathol Lab Med       Date:  2016-02-22       Impact factor: 5.534

8.  Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. [Review]

Authors:  Jesse J Salk; Michael W Schmitt; Lawrence A Loeb
Journal:  Nat Rev Genet       Date:  2018-03-26       Impact factor: 53.242

9.  A comparative study of techniques for differential expression analysis on RNA-Seq data.

Authors:  Zong Hong Zhang; Dhanisha J Jhaveri; Vikki M Marshall; Denis C Bauer; Janette Edson; Ramesh K Narayanan; Gregory J Robinson; Andreas E Lundberg; Perry F Bartlett; Naomi R Wray; Qiong-Yi Zhao
Journal:  PLoS One       Date:  2014-08-13       Impact factor: 3.240

10.  Benefits and Challenges with Applying Unique Molecular Identifiers in Next Generation Sequencing to Detect Low Frequency Mutations.

Authors:  Ruqin Kou; Ham Lam; Hairong Duan; Li Ye; Narisra Jongkam; Weizhi Chen; Shifang Zhang; Shihong Li
Journal:  PLoS One       Date:  2016-01-11       Impact factor: 3.240
