Pan Du, Warren A Kibbe, Simon M Lin.
Abstract
BACKGROUND: Oligonucleotide probes that are sequence identical may have different identifiers between manufacturers and even between different versions of the same company's microarray; and sometimes the same identifier is reused and represents a completely different oligonucleotide, resulting in ambiguity and potentially misidentification of the genes hybridizing to that probe.
Year: 2007 PMID: 17540033 PMCID: PMC1891274 DOI: 10.1186/1745-6150-2-16
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Figure 1. The mapping from probes to genes and annotations. Note that both genes (for example the definition of a gene or its representative sequence) and annotations (for example, functional annotation by Gene Ontology) are dynamic; so are the mappings among them. Only the probe sequence is stable over time.
Figure 2. The encoding and decoding process of nuID. The solid arrows represent the encoding process, and the dashed arrows represent the decoding process. The bold-italic number 11 is the numeric value of the checking code "L". The "AA" at the end of the sequence is the padded nucleotides.
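The encoding and decoding steps in Figure 2 can be sketched in a few lines of Python. This is a reconstruction, not the authors' implementation: the Methods section with Equations (3) and (4) is not reproduced here, so the 64-character alphabet (`A-Z`, `a-z`, `0-9`, `_`, `.`), the 2-bit nucleotide codes (A=0, C=1, G=2, T=3), and the checking-code rule (sum of the 6-bit values modulo N = 21, times 3, plus the number of padded nucleotides) are assumptions inferred from the caption and from the example nuIDs listed in this record. The function names `seq_to_nuid` and `nuid_to_seq` are hypothetical.

```python
import string

# Assumed 64-symbol alphabet; a character's index is its 6-bit value
# (so "L" has the value 11, as in the Figure 2 caption).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_."
NUC = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3
N = 21        # checksum modulus quoted in the error-detection table


def seq_to_nuid(seq: str) -> str:
    """Encode a nucleotide sequence as a nuID-style string (sketch)."""
    seq = seq.upper()
    if set(seq) - set(NUC):
        raise ValueError("sequence may only contain A, C, G, T")
    pad = (-len(seq)) % 3            # nucleotides needed to fill the last triplet
    padded = seq + "A" * pad         # pad with "A", as in Figure 2
    # Each triplet becomes one 6-bit value (0..63).
    values = [
        16 * NUC.index(padded[i])
        + 4 * NUC.index(padded[i + 1])
        + NUC.index(padded[i + 2])
        for i in range(0, len(padded), 3)
    ]
    # Checking code: checksum residue and padding length packed in one value.
    check = (sum(values) % N) * 3 + pad
    return ALPHABET[check] + "".join(ALPHABET[v] for v in values)


def nuid_to_seq(nuid: str) -> str:
    """Decode a nuID-style string back to its nucleotide sequence (sketch)."""
    check = ALPHABET.index(nuid[0])
    pad = check % 3
    values = [ALPHABET.index(c) for c in nuid[1:]]
    if check // 3 != sum(values) % N:
        raise ValueError("checksum mismatch")
    seq = "".join(
        NUC[v >> 4] + NUC[(v >> 2) & 3] + NUC[v & 3] for v in values
    )
    return seq[: len(seq) - pad]     # drop the padded nucleotides


probe = "TGTATATGTCTGGTTTTCTTACCCC"   # Affymetrix example from the table
print(seq_to_nuid(probe))
print(nuid_to_seq(seq_to_nuid(probe)) == probe)
```

Because the identifier is computed from the sequence alone, two probes with the same sequence receive the same identifier regardless of manufacturer, and the sequence can be recovered without a lookup table.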
Examples of nuIDs
| Array Type | Manufacturer's Proprietary Identifier | Nucleotide Sequence | nuID |
| Affymetrix Human | 206064_s_at_probe1 | TGTATATGTCTGGTTTTCTTACCCC | a7M7ev98VQ |
| Illumina Human | GI_23097300-A | GCTTCACTCGCTTCCCAGGGGCTCCGTTCACCAACTACATGAGCTACACG | cn0dn1Sqdb0UHE4nEY |
| Illumina Mouse | TRBV23_AE000664_T_cell_receptor_beta_variable_23_106-S | GACCCTTCGAAGTGAAAGAACACAGTCATGTTATATGGTATAGTCATGGT | 9hX2C4CBEtO8zrMtOs |
The error detection power of the nuID checksum algorithm (N = 21)
| Probe length | 1-character | 2-character | 3-character | Random |
| 25mer | 0.97780 | 0.97918 | 0.98689 | 0.99924 |
| 50mer | 0.97724 | 0.97838 | 0.98607 | 0.99997 |
| 100mer | 0.97894 | 0.97825 | 0.98617 | 1* |
L and N are defined in Equations (3) and (4) in Methods. The "1-character" column is the error detection rate for an nuID with exactly one character mutated; the "2-character" and "3-character" columns are defined analogously. The "Random" column is the error detection rate for a random ASCII string. The optimum detection power is 1.0.
* We realize the detection of nuIDs for 100mers is not guaranteed, but in none of our simulations did we ever encounter a randomly assembled string that was a valid nuID.
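To make the "1-character" column concrete, the sketch below enumerates every single-character substitution of one nuID and counts how many are rejected. It is illustrative only and is not the authors' simulation: the validity rules used here (the checking code must match the sum of the 6-bit values modulo N = 21, and the padded positions must decode to "A") and the 64-character alphabet are assumptions inferred from the example nuIDs, and the helper names `is_valid_nuid` and `single_char_detection_rate` are hypothetical. The resulting rate for a single identifier need not equal the averaged figures in the table.

```python
import string

# Assumed 64-symbol alphabet; a character's index is its 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_."
N = 21  # checksum modulus quoted in the table caption


def is_valid_nuid(s: str) -> bool:
    """Return True if s passes the assumed nuID consistency checks."""
    if len(s) < 2 or any(c not in ALPHABET for c in s):
        return False
    # The first character packs the checksum residue and the padding length.
    res, pad = divmod(ALPHABET.index(s[0]), 3)
    values = [ALPHABET.index(c) for c in s[1:]]
    if res >= N or sum(values) % N != res:
        return False
    # Padded nucleotides are "A" (bits 00), so the low 2*pad bits of the
    # last value must be zero.
    return values[-1] & ((1 << (2 * pad)) - 1) == 0


def single_char_detection_rate(nuid: str) -> float:
    """Fraction of all single-character substitutions that are rejected."""
    detected = total = 0
    for i, original in enumerate(nuid):
        for c in ALPHABET:
            if c == original:
                continue
            mutant = nuid[:i] + c + nuid[i + 1:]
            total += 1
            detected += not is_valid_nuid(mutant)
    return detected / total


print(is_valid_nuid("a7M7ev98VQ"))               # example nuID from the table
print(single_char_detection_rate("a7M7ev98VQ"))  # exhaustive, no sampling
```

Exhaustive enumeration is used instead of random sampling so the result is deterministic; extending it to two or three simultaneous substitutions would follow the same pattern at higher cost.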