Yiren Wang, Mashari Alangari, Joshua Hihath, Arindam K Das, M P Anantram.
Abstract
BACKGROUND: The all-electronic Single Molecule Break Junction (SMBJ) method is an emerging alternative to traditional polymerase chain reaction (PCR) techniques for genetic sequencing and identification. Existing work indicates that the current spectra recorded from SMBJ experiments contain unique signatures that can identify known sequences from a dataset. However, the spectra are typically extremely noisy due to the stochastic and complex interactions between the substrate, sample, environment, and measuring system, so hundreds or thousands of experiments are needed to obtain reliable and accurate results.
Keywords: All-electrical detection; Conductance probability distribution; DNA sequence identification; Machine learning; Single Molecule Break Junction
Year: 2021 PMID: 34243709 PMCID: PMC8268518 DOI: 10.1186/s12864-021-07841-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Large sample histograms of all data classes. The R² test is: ‘accept current trace if R² ≤ β = 0.95’. After the current traces are R²-filtered, one histogram is constructed per class from all available current traces, which are converted to conductance traces after low-pass filtering (see Figures S2 and S3 for details)
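The acceptance rule in the Fig. 1 caption can be illustrated with a short Python sketch. The caption gives only the rule (accept if R² ≤ β) and does not say what model R² is measured against. The sketch assumes, purely for illustration, that R² is the coefficient of determination of a straight-line fit to the logarithm of the current versus sample index, so a featureless exponential decay scores close to 1 and is rejected. The function names are hypothetical.

```python
import numpy as np

def r_squared_log_linear(current):
    """R^2 of a straight-line fit to log10(|current|) vs. sample index.

    ASSUMPTION: the fitted model is illustrative; the caption does not
    specify what the R^2 value is computed against.
    """
    y = np.log10(np.abs(current) + 1e-30)  # tiny offset avoids log10(0)
    x = np.arange(y.size)
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    # A perfectly flat trace has no variance to explain; treat it as R^2 = 0.
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

def r2_filter(traces, beta=0.95):
    """Keep only the traces that satisfy the rule 'accept if R^2 <= beta'."""
    return [t for t in traces if r_squared_log_linear(t) <= beta]

# Two synthetic traces (made-up numbers, not measured data).
n = np.arange(200)
pure_decay = 1e-6 * np.exp(-0.05 * n)        # straight line in log space
with_step = np.where(n < 100, 1e-6, 1e-9)    # abrupt drop, poor linear fit
kept = r2_filter([pure_decay, with_step], beta=0.95)
```

Under this assumed model, the rule discards traces that a simple decay explains almost perfectly and keeps traces with additional structure.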
Details of the ten datasets used in this paper
| Label | Sequence | Bias | Note | Solvent / Buffer |
|---|---|---|---|---|
| S1 | Octanedithiol | 0.30 V | Not a DNA/RNA | mesitylene |
| S2 | 5’-CCC GGG CCC GGG-3’ 3’-GGG CCC GGG CCC-5’ | 0.01 V | | 100mMP + 100uL + 30uL |
| S3 | 5’-CGA CCC CTC UUG AAC-3’ 3’-GCT GGG GAG AAC TTG-5’ | 0.05 V | E. coli O157:H7 | 10uL + 75 μm + 600MC_50BC_20RR |
| S4 | 5’-CGA CCC CTC UUG AGC-3’ 3’-GCT GGG GAG AAC TTG-5’ | 0.05 V | E. coli O175:H28 One mismatch from S3 | 30uL + 7.5 μm + Rg |
| S5 | 5’-CGA CCC CCC UUG AAC-3’ 3’-GCT GGG GAG AAC TTG-5’ | 0.30 V | E. coli ED1a One mismatch from S3 | 75 μm + 10uL |
| S6 | 5’-CCC GGG CCC GGG-3’ 3’-GGG CCC GGG CCC-5’ | 0.10 V | Same as S2 | 100mMP + 100uL + 30uL |
| S7 | 5’-CCC GGG CCC GGG-3’ 3’-GGG CCC GGG CCC-5’ | 0.20 V | Same as S2 | 100mMP + 100uL + 30uL |
| S8 | 5’-CCC GGG CCC GGG-3’ 3’-GGG CCC GGG CCC-5’ | 0.01 V | Same as S2 | 100mMP + 100uL + 20uL |
| S9 | 5’-CCC GGG CCC GGG-3’ 3’-GGG CCC GGG CCC-5’ | 0.10 V | Same as S2 | 100mMP + 100uL + 50uL |
| S10 | 5’-GGG TTT GGG-3’ | 0.01 V | G-quadruplex secondary structures | 100mMP + 100uL + 30uL |
Fig. 2 Comparison of large sample, baseline, and small sample histograms. (a), (d) Large sample, (b), (e) baseline (H = 30), and (c), (f) small sample (H = 10) conductance probability histograms for datasets S1 (a-c) and S8 (d-f)
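A conductance probability histogram built from H traces, as compared in Fig. 2, might be computed along the following lines. This is a minimal sketch, not the authors' pipeline. It uses the relation G = I / V (conductance equals current divided by bias voltage). It assumes, for illustration only, logarithmic binning over a fixed range and random selection of the H traces. The function names, the bin range, and the default bin count are hypothetical, and low-pass filtering is omitted.

```python
import numpy as np

def sample_traces(traces, h, seed=0):
    """Pick h traces at random without replacement (h = 30 or 10 in Fig. 2)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(traces), size=h, replace=False)
    return [traces[i] for i in idx]

def conductance_histogram(current_traces, bias_volts, n_bins=100,
                          log_range=(-8.0, -2.0)):
    """Probability histogram of log10(conductance in siemens).

    ASSUMPTION: log-spaced bins and the range are illustrative choices.
    Samples falling outside log_range are ignored by np.histogram, so the
    probabilities are normalized over the in-range samples only.
    """
    g = np.concatenate([np.abs(t) / bias_volts for t in current_traces])
    g = g[g > 0]  # log10 is undefined at zero
    counts, edges = np.histogram(np.log10(g), bins=n_bins, range=log_range)
    total = counts.sum()
    prob = counts / total if total > 0 else counts.astype(float)
    return prob, edges

# Synthetic example: two flat current traces recorded at 0.1 V bias.
traces = [np.full(50, 3e-7), np.full(50, 3e-6)]
prob, edges = conductance_histogram(traces, bias_volts=0.1, n_bins=6)
```

With fewer traces per histogram (smaller H), each histogram is estimated from fewer samples and is therefore noisier, which is the contrast the large sample, baseline, and small sample panels illustrate.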
Fig. 3 Confusion matrices for baseline classifiers. (a) Confusion matrices corresponding to target labeling scheme TLS-1 with 6 classes. (b) Confusion matrices corresponding to target labeling scheme TLS-2 with 8 classes
Fig. 4 Performance analysis of baseline classifiers with respect to (w.r.t.) β, Nbins, and H. (a), (b) Accuracy of baseline classifiers w.r.t. the R² test threshold parameter, β, corresponding to target labeling scheme TLS-1 with 6 classes (a) and TLS-2 with 8 classes (b). The same color but different line types are used to distinguish between datasets [S2,S8], [S6,S9], and [S7], which are of the same strand but use different bias voltages. (c), (d) Accuracy of baseline classifiers w.r.t. the number of histogram bins, Nbins, corresponding to target labeling scheme TLS-1 with 6 classes (c) and TLS-2 with 8 classes (d). The same color scheme and line types as in (a), (b) are used to distinguish between datasets [S2,S8], [S6,S9], and [S7]. (e), (f) Accuracy of baseline classifiers w.r.t. the number of traces used to compute a conductance histogram, H, corresponding to target labeling scheme TLS-1 with 6 classes (e) and TLS-2 with 8 classes (f). The same color scheme and line types as in (a), (b) are again used to distinguish between datasets [S2,S8], [S6,S9], and [S7].