
Aptaligner: automated software for aligning pseudorandom DNA X-aptamers from next-generation sequencing data.

Emily Lu, Miguel-Angel Elizondo-Riojas, Jeffrey T Chang, David E Volk.

Abstract

Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.

Year:  2014        PMID: 24866698      PMCID: PMC4059528          DOI: 10.1021/bi500443e

Source DB:  PubMed          Journal:  Biochemistry        ISSN: 0006-2960            Impact factor:   3.162


DNA aptamers are quickly maturing as research and therapeutic tools because of their ability to bind specific proteins with high affinity.[1−3] With solution methods,[4] DNA aptamers are generally identified through serial selection rounds. During bead-based aptamer selection,[5,6] however, one collects beads in a single round or uses serial magnet heights. Traditionally, subcloned and sequenced aptamers were aligned using methods such as ClustalW.[7] More recently, methods such as Tallymer,[8] which uses k-mer analysis to find transposable elements, and structural alignments[9] have been reported. X-aptamers[10] are created by a bead-based pseudorandom process incorporating encoded specialty 5-X-dU bases (labeled as W, X, Y, or Z). Traditional alignment algorithms ignore the inherent bead-based, encoded design of X-aptamers and fail to properly align and decode the sequences. Thus, we sought to create mapping software for X-aptamers, similar to BLAST[11] or BLAT,[12] but using a Markov model based on the X-aptamer library designs.

Traditional aptamers use two primers flanking a random region. X-aptamers,[10] however, use a pseudorandom process wherein the variable region is built in 10 or more stages, using a split and pool method[5,6,10] in which only a small number of DNA fragments are allowed at any one stage (Table S1 of the Supporting Information). Some fragments contain functionalized deoxyuracil bases (5-X-dU) that are sequenced as a T. Although next-generation sequencing (NGS) provides copious data, analyzing a single aptamer or X-aptamer project per chip is not cost-effective. For optimal utility, NGS aptamer alignment software should be capable of separating NGS data into multiple projects (Figure S2 of the Supporting Information) and subprojects when using bar coding. Aptaligner was designed to accomplish such analyses for X-aptamers and other aptamers.
Logistically, Aptaligner performs the following steps: (1) determines the number of projects, (2) reads and error-checks library design files, (3) reads FastQ NGS data and reduces it to an enumerated unique list, (4) removes sequences outside of a user-defined length range, (5) removes rare sequences with a user-defined noise level, (6) asks for the number of central processing units (CPUs) to use, (7) builds Markov models for each library and finds the optimal alignment, (8) assigns each sequence to a project, (9) decodes positions W–Z, and (10) conducts statistical analysis. These steps are explained in more detail below and are explicitly detailed in the program user guide.

In the first two steps, Aptaligner requests the number of projects having data in the FastQ file. For each project, graphical user interfaces (GUIs) ask for a project name and library design file (.csv format). The design file describes aptamer library construction from the 5′-end to the 3′-end. To sequence multiple aptamer selection rounds at once, bar codes (tags) may be added during the final polymerase chain reaction step, and they may be located at either end of the 5′- or 3′-primer. For each line, two flags indicate whether the line contains primer information or bar code information (Table S1 of the Supporting Information). As a precaution, the code checks each library design file for typical errors and aborts if needed. In step 3, the FastQ file is reduced to a set of enumerated unique sequences. In our experience with a 5 million sequence NGS chip, this step typically reduces the list from more than 5 million nonunique sequences to ∼3.5 million unique sequences, which avoids reanalyzing identical sequences thousands of times. In step 4, the user supplies a length error cutoff value to remove sequences that are excessively short or long.
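The read-reduction and filtering steps (3 to 5) can be illustrated with a minimal Python sketch. This is not Aptaligner's code: the function names, the four-line FastQ record layout, and the reading of the length cutoff as an allowed deviation from an expected library length are all assumptions made for illustration. The noise filter drops any sequence seen `noise_level` or fewer times.

```python
from collections import Counter
from io import StringIO


def read_fastq_sequences(handle):
    """Yield the sequence line of each record (assumes strict 4-line FastQ records)."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:
            yield line.strip().upper()


def unique_counts(handle):
    """Step 3: collapse reads into an enumerated unique list {sequence: count}."""
    return Counter(read_fastq_sequences(handle))


def length_filter(counts, expected_length, cutoff):
    """Step 4 (assumed semantics): keep sequences within `cutoff` bases of the expected length."""
    return {s: n for s, n in counts.items() if abs(len(s) - expected_length) <= cutoff}


def noise_filter(counts, noise_level):
    """Step 5: drop sequences that occur `noise_level` or fewer times."""
    return {s: n for s, n in counts.items() if n > noise_level}


# Tiny invented FastQ sample: one sequence seen twice, one short read, one long read.
demo = StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGTACGT\n+\nIIIIIIII\n"
    "@r3\nACGTACG\n+\nIIIIIII\n"
    "@r4\nACGTACGTACGTAAA\n+\nIIIIIIIIIIIIIII\n"
)
counts = unique_counts(demo)
after_length = length_filter(counts, expected_length=8, cutoff=1)
kept = noise_filter(after_length, noise_level=1)
print(kept)
```

Collapsing to counts first means every later stage touches each distinct sequence once, which is the saving the text describes for the alignment stage.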
The GUI shows the number of unique sequences and how many of them will be retained using length cutoff values of 0 (∼20%), 5 (∼50%), 10 (∼55%), 15 (∼60%), and the current choice. The default value is five bases, and the user may accept or change this value; after a change, the window recalculates the number of retained sequences. In step 5, a noise level cutoff removes sequences that occur “noise-level” or fewer times. The dialogue box indicates the number of unique sequences retained in step 4 (length cutoff) and how many will be retained after application of the noise filter using noise levels from 0 to 5. A noise level of 1 often reduces the number of sequences by 90%, greatly reducing the computation time.

After the user inputs the number of CPUs (cores) to use for the calculation (step 6), Aptaligner performs the alignments using a Markov model (step 7). The topology of the Markov model (Figure S1 of the Supporting Information) is derived individually from each library design file (Table S1 of the Supporting Information). Each base is represented as a node, with transitions linking each base to the next. Alternative sequences (alternate bar codes, alternate DNA fragments) are modeled as parallel tracks, with no transitions between them. Insertion nodes link each base to the next, allowing for bases inserted because of technical errors. Deletions are modeled by transitions from each node to every possible node that occurs later in the library, with a deletion penalty based on the number of bases skipped; in addition, special transitions allow whole fragments to be deleted. Given the Markov model and a specification of base match, mismatch, insertion, and deletion probabilities, we align each observed sequence and recover the highest-scoring alignment using the Viterbi algorithm.[13] A theoretical X-aptamer library might contain up to 100 billion perfect sequences. For each sequence, the alignment scores are used to choose the winning project (step 8).
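The scoring idea can be sketched in Python as a much-simplified stand-in for the model described above. Each candidate track is treated as a linear chain of base nodes, a Viterbi-style dynamic program finds the best log-probability path using match, mismatch, insertion, and deletion terms, and the project with the best-scoring track wins. The probability values, the function names, and the per-base deletion penalty are assumptions; the whole-fragment deletion transitions and the library-derived topology of the real model are not reproduced here.

```python
import math

# Hypothetical log-probabilities, chosen only for illustration.
LOG_MATCH = math.log(0.97)
LOG_MISMATCH = math.log(0.01)
LOG_INSERT = math.log(0.01)
LOG_DELETE = math.log(0.01)


def viterbi_score(observed, track):
    """Best log-probability of emitting `observed` along one linear track of base nodes."""
    n, m = len(observed), len(track)
    # best[i][j]: best score after consuming i observed bases and j track nodes
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:  # emit a base from a node (match or mismatch)
                same = observed[i - 1] == track[j - 1]
                options.append(best[i - 1][j - 1] + (LOG_MATCH if same else LOG_MISMATCH))
            if i > 0:  # extra observed base (insertion)
                options.append(best[i - 1][j] + LOG_INSERT)
            if j > 0:  # skipped node (deletion, one penalty per skipped base)
                options.append(best[i][j - 1] + LOG_DELETE)
            best[i][j] = max(options)
    return best[n][m]


def assign_project(observed, projects):
    """projects maps a project name to its parallel tracks; return (winner, score)."""
    scored = {
        name: max(viterbi_score(observed, track) for track in tracks)
        for name, tracks in projects.items()
    }
    winner = max(scored, key=scored.get)
    return winner, scored[winner]


# Invented example: two projects, the first with two alternative tracks.
projects = {"projA": ["ACGTTGCA", "ACGTAGCA"], "projB": ["TTTTCCCC"]}
winner, score = assign_project("ACGTTGCA", projects)
print(winner, round(score, 3))
```

Tracks must be written in their sequenced form (specialty bases as T) before scoring. This sketch costs O(NM) per track because deletions move one node at a time; a model with transitions from each node to every later node, as described in the text, does more work per cell.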
Good scores typically range from −11 to −60 (Figure S2 of the Supporting Information) but are slightly project-dependent. A list of sequences, scores, and their winning projects is retained in the main directory, and project-specific sequences are recorded in project subdirectories. Once sequences are assigned to a project, the specialty 5-X-dU bases are decoded (step 9) from T back to W, X, Y, or Z. A fragment substitution is performed for perfect fragments. For imperfect fragments, the code calculates all possible Levenshtein edit distances[14] between the fragment and the theoretical choices. If, and only if, a unique smallest distance is found with a T/X alignment, the T is converted to X. Any letter other than B, J, A, C, G, T, and U can denote a specialty base, but additional mapping (“--lib2seq X:T”) is needed. The final step calculates statistics for each bar code or random fragment, clusters the top sequences (Figure S3 of the Supporting Information), and, for the top 50 sequences, finds all close sequences (edit distance of <3) in the top 5000 sequences (Table S2 of the Supporting Information). In addition to providing superior alignment (Table 1), by tolerating small primer errors Aptaligner finds 40–100% more total counts per random region than exact primer matching (Exact Match). Examples are listed in Table 2.
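The decoding rule in step 9 can be sketched as follows. This is an illustrative Python reading of the text, not the program's implementation: the function names and example fragments are invented, and the specialty letters are limited to W, X, Y, and Z. Each design fragment is compared in its sequenced form (specialty bases read as T); only when exactly one fragment has the smallest Levenshtein distance is the alignment traced back and each observed T that lines up with a specialty base relabeled.

```python
SPECIALTY = set("WXYZ")


def sequenced_form(fragment):
    """Specialty 5-X-dU bases are read as T by the sequencer."""
    return "".join("T" if base in SPECIALTY else base for base in fragment)


def edit_table(a, b):
    """Levenshtein table: table[i][j] is the edit distance between a[:i] and b[:j]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        table[i][0] = i
    for j in range(len(b) + 1):
        table[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            table[i][j] = min(table[i - 1][j] + 1,
                              table[i][j - 1] + 1,
                              table[i - 1][j - 1] + cost)
    return table


def decode_fragment(observed, design_fragments):
    """Relabel T as a specialty base only when one design fragment is uniquely nearest."""
    tables = [(edit_table(observed, sequenced_form(f)), f) for f in design_fragments]
    distances = [table[len(observed)][len(f)] for table, f in tables]
    smallest = min(distances)
    if distances.count(smallest) != 1:
        return observed  # ambiguous: leave the fragment undecoded
    table, design = tables[distances.index(smallest)]
    ref = sequenced_form(design)
    decoded = list(observed)
    i, j = len(observed), len(ref)
    while i > 0 and j > 0:  # trace one optimal alignment back from the corner
        cost = 0 if observed[i - 1] == ref[j - 1] else 1
        if table[i][j] == table[i - 1][j - 1] + cost:
            if design[j - 1] in SPECIALTY and observed[i - 1] == "T":
                decoded[i - 1] = design[j - 1]
            i, j = i - 1, j - 1
        elif table[i][j] == table[i - 1][j] + 1:
            i -= 1  # extra observed base
        else:
            j -= 1  # design base missing from the read
    return "".join(decoded)


print(decode_fragment("GCTTA", ["GCXTA", "GGATT"]))  # perfect fragment
print(decode_fragment("GCTTG", ["GCXTA", "GGATT"]))  # one sequencing error
print(decode_fragment("GCTTA", ["GCXTA", "GCTXA"]))  # tie, left undecoded
```

The same `edit_table` helper could serve the final statistics step, where close sequences are those within an edit distance of less than 3.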
Table 1

Top X-Aptamer Sequences Aligned against a Library (not each other) Based on a Markov Model

T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 all   R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
12003426028026023467   GCGTGGTGX--TTCGTG---TXGCC
0000139101000001392    XTTGG--GGATTTCGTGGTGGTGGC
1466000101001001469    XTGCGXCACATGCCGTTCCGTGGCC
7000020001150001159    XTTGGXCGGCC---GTT-----GCC
920010200097259001144  GCTX-TAACCCTTCGTGGTGGTGCG
0100001010803001085    TGTXGTGGXATTGCGTTGTG--GXG
0000002011001001004    TCTXGGTGXATTGCGTT-----GCC
00002120098200987      TCTXGGTGX--TTCGTGCCGTXGCC
21000030195900966      GCGTG--GGATTGCGTTGTGTGGXG
00000000877100878      -----TA--CCTGCGTTGAG--GXG
01000100082400826      TCGTGXCGXGTXACGTGGTGGTGCC
72200011000000774      TGGTGTGGGCCTTCGTTGXCGTGCC
Table 2

Sequence Frequencies Determined by Aptaligner and Exact Match (a)

no.  sequence                   Exact Match  Aptaligner
1    GCGTGGTGTTTCGTGTTGCC       5328         7602
2    TTTGGGGATTTCGTGGTGGTGGC    2664         3662
3    GCGTGTGGTATTGCGTTGTCTTGGC  2260         3616
4    TTTGGTCGGCCGTTGCC          2099         2686
5    TTGCGTCACATGCCGTTCCGTGGCC  2080         3245
6    GCTTTAACCCTTCGTGGTGGTGCG   1921         3793
7    TGTTGTGGTATTGCGTTGTGGTG    1837         2819
8    TCTTGGTGTATTGCGTTGCC       1785         2487
9    TTTGGTCGTATTGCGTAGTCGTGGC  1615         2718
10   GCGGGATCCGTTGTG            1615         2965
11   GCGTGGGATTGCGTTGTGTGGTG    1593         2836
12   TCTTGGTGTTTCGTGCCGTTGCC    1566         2298

(a) Confidential sequences were scrambled postanalysis. There was an increase in sequence frequency for Aptaligner compared to Exact Match. These are the same sequences listed in Table 1, but they were scrambled after alignment.

The software and test files (library design, FastQ, and output) are free to noncommercial users at www.uth.edu/nbme/Aptaligner.htm or bioinformatics.uth.tmc.edu. It requires Linux (tested on Red Hat and Fedora) and the following: Python (www.python.org), including xlrd, argparse, and openpyxl; Biopython (www.biopython.org); R,[15] including R libraries MiscPsycho,[16] stringr,[17] dendroextras,[18] and statmod;[19] numpy (http://www.numpy.org/); and C. The software was tested on a laptop (Intel Core2 Duo, CPU P8400 @2.27 GHz, 4 GB RAM, 64-bit operating system, RHEL 6.5), a Dell Precision T7500N workstation with dual quad-core Intel Xeon E5603 processors (1.6 GHz) and 12 GB RAM running Red Hat Linux (RHEL 6.5), and a server with 48 cores (Dell PowerEdge R815; 2× AMD Opteron 6174 2.2 GHz processors (4 total); 8 × 32 GB 1333 MHz dual-ranked RDIMM RAM; no OS; RAID 5; PERC H700, 512 MB cache; 6 × 300 GB 10k RPM SCSI HD; 1100 W power supply). It was also run using a Fedora 20 virtual machine. Aptaligner’s complexity is O(NM²), similar to pairwise alignments, where N is the sequence length and M is the library length. Accordingly, the software requires significant memory (3 GB) and up to 12 GB of free disk space per NGS chip. Insufficient resources elicit a run-time warning.

Review 1.  Aptamers: an emerging class of molecules that rival antibodies in diagnostics.

Authors:  S D Jayasena
Journal:  Clin Chem       Date:  1999-09       Impact factor: 8.327

2.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

3.  Construction and selection of bead-bound combinatorial oligonucleoside phosphorothioate and phosphorodithioate aptamer libraries designed for rapid PCR-based sequencing.

Authors:  Xianbin Yang; Suzanne E Bassett; Xin Li; Bruce A Luxon; Norbert K Herzog; Robert E Shope; Judy Aronson; Tarl W Prow; James F Leary; Romy Kirby; Andrew D Ellington; David G Gorenstein
Journal:  Nucleic Acids Res       Date:  2002-12-01       Impact factor: 16.971

4.  Multiple sequence alignment with the Clustal series of programs.

Authors:  Ramu Chenna; Hideaki Sugawara; Tadashi Koike; Rodrigo Lopez; Toby J Gibson; Desmond G Higgins; Julie D Thompson
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

5.  Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase.

Authors:  C Tuerk; L Gold
Journal:  Science       Date:  1990-08-03       Impact factor: 47.728

6.  X-aptamers: a bead-based selection method for random incorporation of druglike moieties onto next-generation aptamers for enhanced binding.

Authors:  Weiguo He; Miguel-Angel Elizondo-Riojas; Xin Li; Ganesh Lakshmana Rao Lokesh; Anoma Somasunderam; Varatharasa Thiviyanathan; David E Volk; Ross H Durland; Johnnie Englehardt; Claudio N Cavasotto; David G Gorenstein
Journal:  Biochemistry       Date:  2012-10-11       Impact factor: 3.162

7.  Immunofluorescence assay and flow-cytometry selection of bead-bound aptamers.

Authors:  Xianbin Yang; Xin Li; Tarl W Prow; Lisa M Reece; Suzanne E Bassett; Bruce A Luxon; Norbert K Herzog; Judy Aronson; Robert E Shope; James F Leary; David G Gorenstein
Journal:  Nucleic Acids Res       Date:  2003-05-15       Impact factor: 16.971

Review 8.  Aptamers as therapeutics.

Authors:  Anthony D Keefe; Supriya Pai; Andrew Ellington
Journal:  Nat Rev Drug Discov       Date:  2010-07       Impact factor: 84.694

9.  Rapid identification of cell-specific, internalizing RNA aptamers with bioinformatics analyses of a cell-based aptamer selection.

Authors:  William H Thiel; Thomas Bair; Andrew S Peek; Xiuying Liu; Justin Dassie; Katie R Stockdale; Mark A Behlke; Francis J Miller; Paloma H Giangrande
Journal:  PLoS One       Date:  2012-09-04       Impact factor: 3.240

10.  A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes.

Authors:  Stefan Kurtz; Apurva Narechania; Joshua C Stein; Doreen Ware
Journal:  BMC Genomics       Date:  2008-10-31       Impact factor: 3.969
