Novel Method for Structure-Activity Relationship of Aptamer Sequences for Human Prostate Cancer.

Xinliang Yu¹,², Huiqiong Yang¹, Xianwei Huang¹.

Abstract

Prostate cancer (PCa) is one of the most common malignancies in men and seriously threatens men's health. Developing aptamer probes for PCa cells is of great significance for the early diagnosis and treatment of PCa. This paper reports a classification model for SELEX-based aptamers, which were obtained with PCa cell line PCa-3M-1E8 (highly metastatic tumor cell) as target cells and PCa cell line PCa-3M-2B4 (low metastatic tumor cell) as control cells. On the basis of the SELEX principle, 100 oligonucleotide sequences from the 3rd round of SELEX were defined as low affinity and specificity aptamers, and 100 sequences from the 11th round were defined as high affinity and specificity aptamers. Seven molecular descriptors, calculated from amino acid sequences translated from the DNA aptamer sequences with DNAMAN software, were used for the classification model. The classification model, based on binary logistic regression analysis, has prediction accuracy, sensitivity, and specificity of about 80% for both the training set and the test set. Therefore, it is feasible to calculate molecular descriptors from amino acid sequences translated from DNA aptamer sequences and to develop a classification model for aptamers against PCa cell line PCa-3M-1E8.

Year: 2018 PMID: 31459128 PMCID: PMC6644987 DOI: 10.1021/acsomega.8b01464

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Aptamers are short single-stranded DNA (ssDNA) or RNA oligonucleotides. They can fold into distinct two-dimensional (2D) and three-dimensional (3D) structures and bind targets such as proteins, cells, and viruses with high affinity and specificity.[1] Compared with natural antibodies, aptamers offer several advantages in synthesis and modification. In addition, aptamers are small and bind their targets well, which allows them to penetrate tissue and tumors easily and to clear quickly from the blood.[2,3] Aptamers are promising in therapeutic applications, such as cancer cell detection and diagnostics, and as molecular tools in anti-infective, anticoagulation, anti-inflammation, antiangiogenesis, antiproliferation, and immune therapies.[4,5] DNA aptamers have been developed for virus-infected cells, mesenchymal stem cells, porcine endothelial precursor cells, and live bacterial cells.[6,7] At present, one aptamer (brand name Macugen) is used for the treatment of neovascular age-related macular degeneration, and several others are in clinical trials.[8] Aptamers can be acquired through an in vitro iterative selection process known as systematic evolution of ligands by exponential enrichment (SELEX).[1,3] Each round proceeds as follows: the starting oligonucleotide library, with approximately 10¹⁴–10¹⁵ unique sequences, is incubated with the targets of interest by an affinity binding method; the bound aptamer–target complexes are partitioned from the unbound oligonucleotides, which are discarded; the bound aptamers are eluted from the targets and amplified by polymerase chain reaction (PCR); and the amplified aptamers are incubated with the targets in the next round of SELEX. After several rounds of selection, aptamers with high affinity and specificity are obtained.
However, these selection processes are lengthy and consume relatively large amounts of reagents. The quantitative structure–activity relationship (QSAR) method (e.g., a pattern recognition model or classification model) can reduce aptamer development costs and accelerate the screening of candidate sequences. The main objective of the QSAR methodology is to find a mathematical relationship between chemical structures and the activity of interest. To accomplish this, one usually employs a mathematical model F, activity = F(molecular descriptors), that correlates experimental activity values with a set of molecular descriptors, including physicochemical and structural properties derived from the molecular structures. Once a reliable QSAR model is developed, it can be used to predict the activities of new chemicals. Thus, QSAR can be used to understand mechanisms, design new compounds, and screen candidates.[9] The most promising candidates may then be synthesized for laboratory testing. A few groups have carried out QSAR studies for aptamers. Yu et al. introduced a classification model for predicting whether streptavidin-binding aptamers are "low"- or "high"-affinity aptamers. The model has a classification accuracy of 97.98% for the training set. Furthermore, the predicted fractions of winning aptamers from the 1st round to the 10th round of SELEX are consistent with the enrichment characteristics of aptamers under SELEX selection.[10] In addition, Yu et al. developed another classification model for the binding affinity of aptamers against hepatocarcinoma cell line SMMC-7721.[11] That classification model has good statistical quality, with specificity and sensitivity greater than 80%. For these two classification models,[10,11] some of the molecular descriptors were calculated from the loop structures of the center sequences. Li et al. analyzed aptamer–target pairs by applying sequence information from DNA (or RNA) aptamers and their target proteins.
The predictive accuracies of the model for the training set and the test set are 81.34 and 77.41%, respectively.[12] Musafia et al. introduced a novel QSAR model for the binding affinity of anti-influenza aptamers. The model is accurate, with regression coefficients R² of 0.702 for the training set and 0.66 for the test set, though the descriptors are based on topological structures, which cannot describe three-dimensional information such as electronic and geometric properties.[6] Although aptamers are finding increasingly wide use, only a few QSAR studies have been reported. There are two reasons for this. One is that experimental data for aptamers are insufficient. The other is that accurately calculating molecular descriptors for aptamers is difficult because of their large molecular weight. In this work, we adopt a novel method to calculate molecular descriptors for aptamers against prostate cancer (PCa) cells that were selected by our group with the cell-SELEX method, and we develop a pattern recognition model for PCa aptamers. PCa is one of the most common cancers in men and, as the second leading cause of cancer death, seriously threatens men's health.[13] A deficiency of early clinical diagnosis of PCa can lead to an estimated 270 thousand men dying from PCa in one year in the United States alone.[14] The development of aptamer probes for PCa is urgently required.[15−17] This work will help the SELEX selection of aptamer sequences for PCa.

Results and Discussion

Table S1 in the Supporting Information shows 200 center sequences of candidate aptamers (5′-AGAAGGAAGGAGAGCGACAC-N40-TATCAGTGGTCGGTCGTCAT-3′), with PCa cell line PCa-3M-1E8 acting as target cells and PCa cell line PCa-3M-2B4 as the negative control cells in SELEX selection. Sequence nos. 1–67 and 135–167 were selected from the 3rd round of SELEX, while sequence nos. 68–134 and 168–200 were obtained from the 11th round of selection. These sequences were randomly divided into two sets: a training set and a test set. Sequence nos. 1–134, the training set, were used to build classification models. The remaining sequences (nos. 135–200), the test set, were used to evaluate the predictive power of the classification models. The candidate aptamer sequences in Table 1 were selected from the 12th round of SELEX selection. The sequences obtained from the 3rd round of SELEX have low affinity and specificity for the PCa-3M-1E8 target; their class labels were defined as 1. The sequences obtained from the 11th and 12th rounds of SELEX possess high affinity and specificity for the PCa-3M-1E8 target; their class labels were defined as 2.

Table 1

Twenty-Four Candidate Aptamer Sequences from the 12th Round of SELEX

no.	sequence (5′ → 3′)	exp. class	pre. class
1	TGCAGGGTGAGAGGTTGGCTTTAGAGGGTTAGGGGGAATT	2	1
2	GGAGGGCTAGAGTAGGGGGCTGTCAAGGGGTCGGTGGGGA	2	2
3	TGCAGGGTGAGAGGTTGGTTTTAGAGGGTTAGGGGGAATT	2	1
4	GGAAGGGGCGTGGTTGGTAGAAAGGGAAGGGGAAGTTTAG	2	2
5	AGGGGGCAAGAGGGTGGTTTTAGAGGGGCAGGGGGAGTT	2	2
6	GGGGGAGGCGGGCGGGGTGCTGACGGGGGAGTTTAGCCGT	2	2
7	GAGGAGGTCATGGGAGAGGAGGCGGAGACGGGGAGGGATG	2	2
8	GCACGGGATCAGGGTGGGGTGGAGAGGGGAATTTTAGTGG	2	2
9	GGGACACGGTTGGGAGTGGGGTTTGGTCGTCCGGGGGATG	2	2
10	GAGCATGGAGTACGGGCGGGGTGATGACGGGGGAGTTTAG	2	2
11	AGGGGGGGTTGGGTATGCGTCCGGAGAGTTGCTCGAGTTC	2	2
12	TTCGGGCGATGGGTTAGGTTGGCGGAGGTGGGAGGGCGCGG	2	2
13	CTCTCGGGAAGTACGGTAGGAAGTGGTACCACGGGGTTA	2	2
14	TGGGGGCAAGAGGGTGGTTTTAGAGGGGCAGGGGGAGTT	2	2
15	GGGGGCGAGAGGGTGGTTTTAGAGGGGATGGGAGGAGTT	2	2
16	GGGGAGGCGGGCGGGGTGCTGACGGGGGAGTTTAGCCGT	2	2
17	GGGGAGGCGGGCGGGGTGCTGACGGGGGAGTTTAGCCGT	2	2
18	GGAGGAGGCGGGCGGGGTGCTGACGGGGGAGTTTAGCCGT	2	2
19	GTACCCGGAGACCAGTGACGGGGGTGTTTCGGCTGAAGCT	2	1
20	GTACTCTGCGTGCTGGGGGTGTTTGATGTAGTTCAGGCT	2	2
21	GTACTTGCGGAGCAGGGGTGGACCGTGTATAGTCGGGACT	2	2
22	GAGAGAGTGGGGGAGTGATCGGAGCGTGGGGTGTAGGGC	2	1
23	GGATCGGCTCGGGGGGGCAAGGGCCGGCGGGGATGTCATG	2	2
24	TAGGACGGAATGGGGGTGTGGGCTGTAGGGGAGGACAAAG	2	2
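
As a quick check on Table 1, the agreement between the experimental and predicted class labels can be tallied directly. The sketch below simply transcribes the two label columns of Table 1; the variable names are illustrative and do not come from the paper.

```python
# Experimental (exp.) and predicted (pre.) class labels transcribed from
# Table 1, in the order of sequence nos. 1-24. Names are illustrative only.
exp_labels = [2] * 24
pre_labels = [1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2]

# Count sequences whose predicted label matches the experimental label.
n_correct = sum(e == p for e, p in zip(exp_labels, pre_labels))
accuracy = 100.0 * n_correct / len(exp_labels)

print(f"{n_correct}/{len(exp_labels)} correct, accuracy = {accuracy:.1f}%")
```

Twenty of the 24 sequences are assigned to class 2, which corresponds to the 83.3% prediction accuracy quoted for the 12th-round candidates.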

After 1000 molecular descriptors had been obtained for each optimized amino acid sequence translated from the DNA aptamer sequences with DNAMAN software, binary logistic regression analysis was performed to select the optimal subset of descriptors, by applying the Forward: Wald method in IBM SPSS Statistics 19 to the total data set (200 sequences) in Table S1. The classification table, the variables in the classification models, and the definitions of the molecular descriptors are listed in Tables 2, 3, and 4, respectively. Table 2 shows that only classification model 7 has accuracies above 80%. Table 3 shows that all the descriptors in classification model 7 have sig. values smaller than 0.05, which indicates that these independent variables have a meaningful effect on the dependent variable. Thus, the seven molecular descriptors in classification model 7 were taken as the optimal descriptor subset and used to develop the logistic regression equation (eq 1) from the training set in Table S1. Here, we take the calculation of the probability for sequence no. 200 in Table S1 as an example. Its descriptor values for EED, SVV, CWP, FCO, FOS, FCS, and FNN are, respectively, 1.963, −13.483, 0.139, 28, 2, 3, and 8. From eq 1, we obtain $P(Y) = 1/(1 + e^{-0.994}) = 0.730$. Because P(Y) is above 0.5, the class label of sequence no. 200 is "2".
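
The worked example for sequence no. 200 can be sketched in a few lines. This is a minimal illustration, assuming the logistic form P(Y) = 1/(1 + exp(−z)), where z is taken to be the linear combination of the seven descriptors plus the constant. The training-set coefficients are not reproduced in this section, so the sketch starts from the exponent 0.994 reported in the text; the function names are hypothetical.

```python
import math

def logistic_probability(z: float) -> float:
    """Logistic function: P(Y) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def class_label(p: float, cutoff: float = 0.5) -> int:
    """Assign class 2 when P(Y) exceeds the cutoff, otherwise class 1."""
    return 2 if p > cutoff else 1

# Value of the linear predictor for sequence no. 200, taken from the
# exponent quoted in the text (the coefficients themselves are not given here).
z_200 = 0.994
p_200 = logistic_probability(z_200)

print(f"P(Y) = {p_200:.3f}, class = {class_label(p_200)}")
```

The 0.5 cutoff matches the decision rule stated in the text; a different cutoff would trade sensitivity against specificity.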

Table 2

Classification Table Based on the Total Set

		predicted
observed		class 1	class 2	accuracy
model 1	class 1	56	44	56.0
	class 2	32	68	68.0
	overall percentage	62.0
model 2	class 1	69	31	69.0
	class 2	32	68	68.0
	overall percentage	68.5
model 3	class 1	72	28	72.0
	class 2	26	74	74.0
	overall percentage	73.0
model 4	class 1	72	28	72.0
	class 2	22	78	78.0
	overall percentage	75.0
model 5	class 1	76	24	76.0
	class 2	23	77	77.0
	overall percentage	76.5
model 6	class 1	76	24	76.0
	class 2	20	80	80.0
	overall percentage	78.0
model 7	class 1	81	19	81.0
	class 2	19	81	81.0
	overall percentage	81.0
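
The overall percentages in Table 2 follow from the four counts in each model's classification table. The sketch below recomputes them from the transcribed counts; the dictionary layout and function name are illustrative assumptions.

```python
# Counts per model from Table 2, in the order:
# (class 1 predicted as 1, class 1 predicted as 2,
#  class 2 predicted as 1, class 2 predicted as 2)
table2_counts = {
    1: (56, 44, 32, 68),
    2: (69, 31, 32, 68),
    3: (72, 28, 26, 74),
    4: (72, 28, 22, 78),
    5: (76, 24, 23, 77),
    6: (76, 24, 20, 80),
    7: (81, 19, 19, 81),
}

def overall_percentage(c1_as_1, c1_as_2, c2_as_1, c2_as_2):
    """Share of all sequences assigned to their observed class, in percent."""
    total = c1_as_1 + c1_as_2 + c2_as_1 + c2_as_2
    return 100.0 * (c1_as_1 + c2_as_2) / total

overall = {m: overall_percentage(*c) for m, c in table2_counts.items()}
for model, pct in overall.items():
    print(f"model {model}: {pct:.1f}%")
```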

Table 3

Variables in the Classification Models

model	descriptor	B	SE	Wald	df	sig.	exp(B)
model 1ᵃ	F_CO	–0.125	0.027	20.919	1	0.000	0.883
	constant	4.242	0.936	20.538	1	0.000	69.544
model 2ᵇ	S_VV	–0.526	0.117	20.319	1	0.000	0.591
	F_CO	–0.221	0.038	34.229	1	0.000	0.802
	constant	0.169	1.267	0.018	1	0.894	1.184
model 3ᶜ	S_VV	–0.555	0.123	20.356	1	0.000	0.574
	F_CO	–0.228	0.039	34.630	1	0.000	0.796
	F_CS	–0.350	0.104	11.399	1	0.001	0.705
	constant	0.442	1.334	0.110	1	0.740	1.556
model 4ᵈ	E_ED	23.734	13.191	3.237	1	0.072	2.031 × 10¹⁰
	S_VV	–0.622	0.128	23.566	1	0.000	0.537
	F_CO	–0.280	0.046	37.076	1	0.000	0.756
	F_CS	–0.426	0.115	13.750	1	0.000	0.653
	constant	–45.591	25.627	3.165	1	0.075	0.000
model 5ᵉ	E_ED	42.590	15.877	7.196	1	0.007	3.139 × 10¹⁸
	S_VV	–0.684	0.136	25.155	1	0.000	0.505
	F_CO	–0.326	0.052	39.040	1	0.000	0.722
	F_OS	1.196	0.381	9.863	1	0.002	3.306
	F_CS	–0.855	0.194	19.404	1	0.000	0.425
	constant	–82.136	30.791	7.116	1	0.008	0.000
model 6ᶠ	E_ED	36.783	16.405	5.027	1	0.025	9.431 × 10¹⁵
	S_VV	–0.853	0.157	29.673	1	0.000	0.426
	F_CO	–0.362	0.058	39.332	1	0.000	0.696
	F_OS	1.245	0.396	9.894	1	0.002	3.473
	F_CS	–0.901	0.201	20.087	1	0.000	0.406
	F_NN	–0.240	0.089	7.211	1	0.007	0.787
	constant	–70.938	31.822	4.969	1	0.026	0.000
model 7ᵍ	E_ED	36.306	16.785	4.679	1	0.031	5.852 × 10¹⁵
	S_VV	–0.838	0.161	27.062	1	0.000	0.433
	C_WP	–90.770	28.598	10.074	1	0.002	0.000
	F_CO	–0.404	0.064	40.162	1	0.000	0.667
	F_OS	1.306	0.424	9.488	1	0.002	3.691
	F_CS	–0.995	0.216	21.164	1	0.000	0.370
	F_NN	–0.329	0.098	11.388	1	0.001	0.719
	constant	–55.171	32.756	2.837	1	0.092	0.000

ᵃ Variable(s) entered on model 1: F_CO.

ᵇ Variable(s) entered on model 2: S_VV.

ᶜ Variable(s) entered on model 3: F_CS.

ᵈ Variable(s) entered on model 4: E_ED.

ᵉ Variable(s) entered on model 5: F_OS.

ᶠ Variable(s) entered on model 6: F_NN.

ᵍ Variable(s) entered on model 7: C_WP.
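
The Wald, sig., and exp(B) columns of Table 3 are linked to B and SE by standard logistic regression relations: the Wald statistic is (B/SE)², its significance is the upper tail of a chi-square distribution with one degree of freedom, and exp(B) is the odds ratio. The sketch below illustrates these relations with values from model 7. It is not the SPSS implementation, and small differences from the tabulated Wald values are expected because B and SE are rounded in the table.

```python
import math

def wald_statistic(b: float, se: float) -> float:
    """Wald chi-square statistic for one coefficient: (B / SE)^2."""
    return (b / se) ** 2

def wald_p_value(wald: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))

def odds_ratio(b: float) -> float:
    """exp(B): multiplicative change in the odds per unit of the descriptor."""
    return math.exp(b)

# F_CO in model 7 (Table 3): B = -0.404, SE = 0.064
w_fco = wald_statistic(-0.404, 0.064)
print(f"Wald ~ {w_fco:.2f}, exp(B) = {odds_ratio(-0.404):.3f}")

# Significance of the model 7 constant, from its tabulated Wald value
print(f"sig. = {wald_p_value(2.837):.3f}")
```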

Table 4

Definitions of Molecular Descriptors in Classification Models

no.	symbol	definition	class
1	E_ED	eigenvalue n (=7) from edge adjacency matrix weighted by dipole moment	edge adjacency indices
2	S_VV	signal 05/weighted by van der Waals volume	3D-MoRSE descriptors
3	C_WP	3rd component symmetry directional WHIM index/weighted by polarizability	WHIM descriptors
4	F_CO	frequency of C–O at topological distance 6	2D atom pairs
5	F_OS	frequency of O–S at topological distance 7	2D atom pairs
6	F_CS	frequency of C–S at topological distance 9	2D atom pairs
7	F_NN	frequency of N–N at topological distance 10	2D atom pairs

Subsequently, we adopted eq 1 to predict the class labels of the sequences in the test set in Table S1. The calculated class labels are listed in Table S1. Besides the accuracy (= true predictions/(true predictions + false predictions)), the sensitivity (= true positives/(true positives + false negatives)) and specificity (= true negatives/(true negatives + false positives)) were used to characterize the predictions. The statistical results from eq 1 are listed in Table 5, which shows that the accuracy, sensitivity, and specificity for the training set and test set are about 80%. Further, eq 1 was used to predict the class labels of the sequences in Table 1 from the 12th round of SELEX. The calculated class labels are listed in Table 1. The prediction accuracy is 83.3%. Therefore, the prediction results are accurate.

Table 5

Statistical Results from Logistic Regression Equation

data set	label	experiment	prediction		accuracy (%)
training set			class 1	class 2
	class 1	67	53	14	79.1
	class 2	67	12	55	82.1
			sensitivity	specificity
			81.5%	79.7%
test set			class 1	class 2
	class 1	33	27	6	81.8
	class 2	33	6	27	81.8
			sensitivity	specificity
			81.8%	81.8%
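
The three statistics defined in the text can be computed from the four cells of a confusion matrix. The sketch below applies the stated formulas to the test-set counts in Table 5, under the assumption that class 2 (high affinity) is treated as the positive class; the paper does not state which class is positive, and the function name is illustrative.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity in percent from confusion counts."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }

# Test set in Table 5. Assumption: class 2 is the positive class.
# Class 2: 27 predicted as class 2 (TP), 6 predicted as class 1 (FN).
# Class 1: 27 predicted as class 1 (TN), 6 predicted as class 2 (FP).
test_metrics = classification_metrics(tp=27, fn=6, tn=27, fp=6)

for name, value in test_metrics.items():
    print(f"{name}: {value:.1f}%")
```

Because the test-set matrix is symmetric, all three values coincide at 81.8% regardless of which class is taken as positive.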

The first descriptor, EED (eigenvalue n (=7) from the edge adjacency matrix weighted by dipole moment), is an edge adjacency index. EED is topological in nature and is calculated from the edge adjacency matrix of an H-depleted molecule. This square matrix is symmetric and has dimension nBO × nBO (where nBO is the number of bonds between nonhydrogen atoms); its entries equal one if the bonds are adjacent and zero otherwise. EED is calculated by applying the dipole moment as the edge weighting scheme. Thus, EED is related to molecular symmetry and polarity and discriminates well among isomers.[18] The second descriptor, SVV (signal 05/weighted by van der Waals volume), belongs to the 3D molecule representation of structures based on electron diffraction (3D-MoRSE) descriptors. These descriptors can be calculated with the following expression

$$\mathrm{Mor}_{sw} = \sum_{i=2}^{n_{AT}} \sum_{j=1}^{i-1} w_i w_j \frac{\sin(s\, r_{ij})}{s\, r_{ij}}$$

where Mor_sw denotes the scattered electron intensity, s is the scattering parameter, which takes values from 0 to 31, r_ij represents the distance between atoms i and j, nAT denotes the number of atoms, and w denotes an atomic property; the available schemes include the unweighted scheme and weighting by properties such as van der Waals volume and atomic polarizability. The descriptor SVV describes the scattered electron intensity weighted by van der Waals volume. It can describe molecular similarity/diversity on the basis of three-dimensional structures.[18] The third descriptor, CWP, is the 3rd component symmetry directional WHIM index/weighted by polarizability. WHIM descriptors are based on the configuration of atoms in 3D space, which is defined by the Cartesian axes (x, y, and z).
Molecular principal axes are evaluated to obtain a unique reference frame. Besides atomic projections being carried out along their respective principal axes, atomic dispersion and distribution around the geometric center are calculated. Similarly, six different weighting schemes (including unweighted schemes, atomic mass, the van der Waals volume, the Sanderson atomic electronegativity, the atomic polarizability, and the electrotopological state indices of Kier and Hall) are taken into account to obtain various WHIM descriptors. The descriptor CWP not only reflects molecular size and shape, but also describes molecular symmetry and atom distribution with respect to invariant reference frames.[18] The remaining topological molecular descriptors, FCO (frequency of C–O with the topological distance being 6), FOS (frequency of O–S with the distance being 7), FCS (frequency of C–S with the distance being 9), and FNN (frequency of N–N with the distance being 10), are binary and frequency atom pairs. In binary atom pairs, the value (1) or (0), respectively, expresses the presence or absence of a certain pair of atoms at a specific bond distance. As for the frequency of atom pairs, each molecular descriptor means the number of atom pairs that satisfy the above conditions of topological distance. The main aim of QSAR studies is to develop correlations of molecular structures and their properties. However, aptamer structures are related to the structure of amino acid sequences translated from their corresponding DNA aptamer sequences. This is the reason why the seven molecular descriptors present in eq from amino acid sequences can describe the aptamer structures. In general, the QSAR prediction strategies depend on molecular descriptors calculated directly from molecular structures. However, our classification model is related to descriptors through an indirect route. Thus, our paper is novel and provides a new approach for the QSAR investigations of aptamers. 
In particular, this paper indicates that there is a correlation between the chemical structures of DNA aptamer sequences and those of the amino acid sequences translated from them, although this subject may be complicated. The feasibility of calculating molecular descriptors from amino acid sequences translated from DNA aptamer sequences and of developing a binary logistic regression equation for aptamers against PCa cell line PCa-3M-1E8 has been demonstrated. It should be noted, however, that our method relies on the protein information encoded by the DNA sequences to make predictions for aptamer sequences. The method cannot be used to predict their specific conformations, which are correlated with aptamer binding specificity. The reason is that the specific conformations of aptamers depend on target properties. That is to say, an aptamer sequence may adopt different conformations when it is applied in different settings.
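
The 3D-MoRSE descriptor discussed above can be illustrated with a short sketch. It assumes the standard literature form of the signal, a sum over all atom pairs of w_i·w_j·sin(s·r_ij)/(s·r_ij), and uses hypothetical toy coordinates and weights. It is not the Dragon implementation. Under the usual numbering, in which the first signal corresponds to s = 0, "signal 05" would correspond to s = 4; that mapping is an assumption here.

```python
import math
from itertools import combinations

def morse_signal(coords, weights, s):
    """3D-MoRSE-style signal: sum over atom pairs of w_i*w_j*sin(s*r)/(s*r)."""
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])   # interatomic distance r_ij
        x = s * r
        # sin(x)/x tends to 1 as x tends to 0, so handle s = 0 explicitly.
        term = 1.0 if x == 0.0 else math.sin(x) / x
        total += weights[i] * weights[j] * term
    return total

# Hypothetical three-atom geometry (angstroms) and unit weights,
# i.e. the unweighted scheme.
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
toy_weights = [1.0, 1.0, 1.0]

for s in (0, 4):
    print(f"s = {s}: signal = {morse_signal(toy_coords, toy_weights, s):.4f}")
```

Replacing the unit weights with per-atom van der Waals volumes would give the weighting used for SVV; at s = 0 the unweighted signal reduces to the number of atom pairs.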

Conclusions

Although directly calculating molecular descriptors from aptamers is difficult, the seven molecular descriptors EED, SVV, FCO, CWP, FOS, FCS, and FNN, which were calculated from amino acid sequences translated from DNA aptamer sequences with DNAMAN software, can describe the aptamer structures. The classification model can be used to recognize candidate aptamers with high or low affinity and specificity to PCa cell line PCa-3M-1E8. The prediction accuracy, sensitivity, and specificity obtained for the training and test sets are about 80%. Therefore, the prediction results are accurate and acceptable. The investigation provides a novel approach to calculating descriptors for aptamer molecules.

Materials and Methods

Cell-SELEX Experimental

Candidate aptamer sequences used in this paper were selected in our laboratory with the cell-SELEX technique. The forward primer (FP) labeled with 6-carboxyfluorescein (FAM) and the biotinylated reverse primer (RP) were, respectively, 5′-FAM-AGAAGGAAGGAGAGCGACAC-3′ and 5′-biotin-ATGACGACCGACCACTGATA-3′, which were used for the synthesis of the random ssDNA library, 5′-TTGACTTGCCACTGACTACC-(randomized region with 40 ± 2 nucleotides)-GAAGTCAGTCGGTCGTCATC-3′. In addition, the unlabeled primers (FP, 5′-AGAAGGAAGGAGAGCGACAC-3′; RP, 5′-ATGACGACCGACCACTGATA-3′) were used to clone and sequence the product sequences. The primers and the initial ssDNA pool were purchased from Sangon Biotech (Shanghai, China) Co., Ltd. PCa cell lines were bought from the Shanghai Institute of Biochemistry and Cell Biology. PCa cell line PCa-3M-1E8 (highly metastatic tumor cell) was used as target cells for positive selection, and PCa cell line PCa-3M-2B4 (low metastatic tumor cell) was used as the negative control during cell-SELEX. The SELEX procedure was as follows: the ssDNA pool (10 nmol) was completely dissolved in 900 μL of binding buffer (washing buffer supplemented with 1 mg/mL bovine serum albumin and 0.1 mg/mL yeast tRNA) to form the initial library; the initial library was incubated with PCa-3M-1E8 cells (1 × 10⁶) for 1 h on ice on a rotary shaker; the cells were washed three times with washing buffer (5 mmol/L magnesium chloride and 4.5 g/L glucose in phosphate buffer saline) to remove unbound sequences; the bound DNAs were eluted in 500 mL of sterile water at 95 °C for 10 min and then amplified by PCR; and the products were isolated with streptavidin-coated sepharose beads and treated with 0.2 mol/L sodium hydroxide. The enriched ssDNA aptamer pool from each round of SELEX was used for the next round.
After the fourth round of SELEX selection, counterselection was introduced to increase the specificity of the aptamers so that they can differentiate similar PCa cells. In addition, the incubation and washing conditions were made gradually more stringent to improve aptamer performance and SELEX screening efficiency. Other SELEX procedures and materials can be found in the references.[5,15] After 12 rounds of SELEX selection, the enrichment of the selection library reached a plateau, which was monitored with a flow cytometer (FACScalibur, BD Bioscience). The ssDNA pools from the 3rd and 11th rounds were sequenced with a high-throughput sequencing device. The resulting pool from the 12th round was sequenced with the classic method of cloning and sequencing. According to the evolutionary principles of SELEX selection, the binding ability of aptamers increases exponentially during the first few rounds. Eventually, the binding affinity no longer increases appreciably or reaches a saturation point. Although the appearance of a candidate aptamer in one of the first few rounds does not by itself mean that it does not bind, the sequences from the 3rd round and the 11th round (see Table S1) are not duplicated in our experiment. Thus, it is reasonable to define the class labels of ssDNA sequences obtained from the first few rounds (e.g., from the 1st to 3rd rounds) of SELEX as 1 and the class labels of sequences from the final rounds (such as the 11th and 12th rounds) as 2 (here, class labels 1 and 2 denote aptamer sequences having low and high affinity, respectively). The experimental affinities of aptamer candidates (nos. 1, 2, 4, and 5 in Table 1) for PCa cell line PCa-3M-1E8 were tested with a flow cytometer. PCa-3M-1E8 cells (1 × 10⁶) and candidate aptamers at various concentrations were incubated in binding buffer (200 mL) on ice for 30 min.
The equilibrium dissociation constant Kd used to describe the affinities was calculated with eq 2

$$B = \frac{B_{\max}\, C}{K_d + C}$$

where the parameter C denotes the concentration of aptamer sequences, the parameter B denotes the aptamer adsorption, and the parameter Bmax expresses the saturated adsorption. The experimental Kd values are 3.24, 2.0, 46.8, and 7.1 nM, which are in the nanomolar range and acceptable.
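
To make the role of Kd concrete, the sketch below assumes the one-site saturation binding model B = Bmax·C/(Kd + C), which is consistent with the symbols C, B, and Bmax defined above. The paper does not describe its fitting procedure, so the double-reciprocal linear fit used here is a simple stand-in, and the concentrations and Bmax are hypothetical, noise-free illustration data.

```python
def bound_signal(c: float, b_max: float, k_d: float) -> float:
    """One-site saturation binding: B = Bmax * C / (Kd + C)."""
    return b_max * c / (k_d + c)

def fit_double_reciprocal(concs, signals):
    """Estimate (Bmax, Kd) from the line 1/B = (Kd/Bmax)*(1/C) + 1/Bmax."""
    xs = [1.0 / c for c in concs]
    ys = [1.0 / b for b in signals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    b_max = 1.0 / intercept
    return b_max, slope * b_max

# Hypothetical concentrations (nM) and synthetic signals generated with
# Kd = 3.24 nM, one of the reported values; Bmax = 100 is arbitrary.
concs = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
signals = [bound_signal(c, 100.0, 3.24) for c in concs]

b_max_fit, k_d_fit = fit_double_reciprocal(concs, signals)
print(f"Bmax = {b_max_fit:.2f}, Kd = {k_d_fit:.2f} nM")
```

At C = Kd the signal is half of Bmax, which is why a smaller Kd indicates tighter binding. With noisy flow cytometry data, a nonlinear least-squares fit of the same model is generally preferred over the double-reciprocal transform.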

Molecular Descriptor Derivation

An important step in classical QSAR studies is to calculate molecular descriptors. Generally, QSAR models are based on molecular descriptors calculated directly from the respective molecular structures.[19−22] However, this approach is appropriate only for small molecules, not for macromolecules or larger molecules. For instance, the commercial software Dragon,[18] which has been widely used for molecular descriptor calculation, is not suited to aptamers (typically 15–45 nucleotides). Since the nucleotide sequence of a genome segment encodes an amino acid sequence, the amino acid sequence carries structural information about the corresponding nucleotide sequence. Thus, the structure descriptors of an amino acid sequence are correlated with those of its corresponding nucleotide sequence. In this paper, DNA aptamer sequences were translated into amino acid sequences with DNAMAN software (version 6.0). Then, molecular structures of the amino acid sequences were constructed with ChemBioDraw Ultra 11.0 in ChemOffice 2008 and subsequently converted to 3D structures with ChemBio3D Ultra 11.0. The 3D structures of the amino acid sequences were optimized with the molecular mechanics (MM2 force field) method, with the convergence criterion that the minimum root-mean-square gradient reach 0.42 kJ/(mol Å). Lastly, the optimized amino acid sequences, saved as Sybyl MOL2 files (*.mol2), were used as input files for Dragon 6.0 software,[18] which can calculate 4885 descriptors for each molecule. After removing redundant and uninformative structure descriptors, namely those that are constant or that have a pairwise correlation coefficient greater than 0.95 with another descriptor, 1000 variables remained for descriptor selection.
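
The descriptor reduction step (dropping constant descriptors and one member of each highly correlated pair) can be sketched as follows. This is a minimal illustration with hypothetical toy data. The greedy keep-first rule is an assumption, since the text does not say which descriptor of a correlated pair was retained, and the function names are not from Dragon or SPSS.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def filter_descriptors(columns, r_max=0.95):
    """Return names of descriptors that are non-constant and not correlated
    above r_max (in absolute value) with any descriptor already kept."""
    kept = []
    for name, values in columns.items():
        if max(values) == min(values):          # constant descriptor
            continue
        if any(abs(pearson(values, columns[k])) > r_max for k in kept):
            continue                            # redundant with a kept one
        kept.append(name)
    return kept

# Hypothetical descriptor table: name -> values over four molecules.
toy_descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [5.0, 5.0, 5.0, 5.0],   # constant, so removed
    "d3": [2.1, 4.0, 6.2, 8.0],   # nearly collinear with d1, so removed
    "d4": [4.0, 1.0, 3.0, 2.0],   # weakly correlated with d1, so kept
}

kept_names = filter_descriptors(toy_descriptors)
print(kept_names)
```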
