Abstract
BACKGROUND: Genome-wide association studies show that most human traits and diseases arise from a combination of environmental and genetic factors, each of which has a relatively small effect. In contrast, most therapies based on macromolecules such as antibodies, antisense oligonucleotides or peptides focus on a single gene product. Complex organisms, however, seem to have a plethora of functional molecules able to bind specifically to multiple genes or gene products on the basis of their sequences, but the mechanisms that lead organisms to recruit these multispecific regulators remain unclear.
Year: 2015 PMID: 26187740 PMCID: PMC4506634 DOI: 10.1186/s12864-015-1727-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Bias of dinucleotide frequencies observed in species of different complexity. Mutational biases CA/AC (a), AT/TA (b), AG/GA (c), GC/CG (d). Examples of sequences, and their frequencies in human cDNA, comprising a random collection of low-frequency (e) or high-frequency (f) dinucleotides are shown. Horizontal lines represent ratio 1:1, i.e., no bias. Error bars are contained in data points and represent 95 % confidence intervals (binomial distributions) in comparison with random expectations. Ec (Escherichia coli CFT073), At (Arabidopsis thaliana), Ce (Caenorhabditis elegans), Dm (Drosophila melanogaster), Mm (Mus musculus), Hs (Homo sapiens)
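The ratios plotted in Fig. 1 can be illustrated with a minimal sketch. It counts overlapping dinucleotides and reports a ratio such as CA/AC with an approximate 95 % interval. The function names are hypothetical, and the normal-approximation binomial interval is an assumption: the caption states only that the intervals come from binomial distributions.

```python
import math
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in a DNA string."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def bias_ratio(counts, pair, reverse_pair):
    """Return (ratio, lower, upper) for pair / reverse_pair.

    Assumption: the interval is a normal-approximation 95 % binomial
    interval on the share p = n_pair / (n_pair + n_reverse), mapped to
    the ratio scale p / (1 - p). A ratio of 1 means no bias.
    """
    a, b = counts[pair], counts[reverse_pair]
    if a == 0 or b == 0:
        raise ValueError("both dinucleotides must occur at least once")
    n = a + b
    p = a / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    lower_p = max(p - half_width, 0.0)
    upper_p = min(p + half_width, 1.0)

    def to_ratio(x):
        return x / (1 - x) if x < 1 else math.inf

    return a / b, to_ratio(lower_p), to_ratio(upper_p)


# Toy sequence, not taken from any genome.
counts = dinucleotide_counts("CACACAT")
print(bias_ratio(counts, "CA", "AC"))
```

With real cDNA collections the counts are large, so the interval becomes narrow, which is consistent with the caption's remark that the error bars are contained in the data points.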
Fig. 2 Increase in the variance of sequence interactivity in the human and mouse genomes. 3D plots of sequence frequency in cDNA relative to length and relative interactivity (r. interactivity), measured as the percentage of high-frequency (CA, AT, GC, AG) vs low-frequency dinucleotides (AC, TA, CG, GA), show an increase in the variance of sequence interactivity in Mus musculus and Homo sapiens (e and f) in comparison with Escherichia coli (a), Arabidopsis thaliana (b), Caenorhabditis elegans (c) and Drosophila melanogaster (d)
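The relative interactivity measure in Fig. 2 can be sketched as follows. The exact formula is not given in the caption, so this assumes it is the percentage of high-frequency dinucleotides among all dinucleotides that belong to either set; the function name is hypothetical.

```python
HIGH_FREQUENCY = {"CA", "AT", "GC", "AG"}
LOW_FREQUENCY = {"AC", "TA", "CG", "GA"}


def relative_interactivity(seq):
    """Percentage of high-frequency dinucleotides among high + low ones.

    Assumption: overlapping dinucleotides are counted, and dinucleotides
    in neither set (e.g. AA, TG) are ignored. Returns None when the
    sequence contains no dinucleotide from either set.
    """
    seq = seq.upper()
    high = low = 0
    for i in range(len(seq) - 1):
        dinucleotide = seq[i:i + 2]
        if dinucleotide in HIGH_FREQUENCY:
            high += 1
        elif dinucleotide in LOW_FREQUENCY:
            low += 1
    if high + low == 0:
        return None
    return 100.0 * high / (high + low)


# Toy sequences, not taken from the paper.
for toy in ("CATG", "ACGA", "CAAC"):
    print(toy, relative_interactivity(toy))
```

Computing this value for every sequence of a given length in a cDNA collection, then taking the variance, gives the kind of spread that the 3D plots compare across species.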
Fig. 3 High variance in the interactivity of sequences facilitates the recruitment of multispecific regulators. (a) Average ratio of dinucleotide frequencies with the same base composition observed in species of different complexities. (b) Human sequences in the 3’UTR have a nucleotide composition much closer to that of the rest of the genome than sequences in the 5’UTR do, increasing the likelihood of interactions with other sequences by complementarity. (c) Relative proportion per nucleotide of the common nucleotide sequences targeting genes of therapeutic interest identified in this work (taking the average length of the 5’UTR, coding sequence and 3’UTR as 200, 1340 and 800 bp, respectively). Error bars are contained in data points in (a) and (b) and represent 95 % confidence intervals (binomial distributions) in comparison with random expectations. Horizontal lines represent ratio 1:1, i.e., no bias. Ec (Escherichia coli CFT073), At (Arabidopsis thaliana), Ce (Caenorhabditis elegans), Dm (Drosophila melanogaster), Mm (Mus musculus), Hs (Homo sapiens)
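The per-nucleotide normalization in panel (c) can be illustrated with a short sketch. The region lengths are those stated in the caption; the function name and the input counts are hypothetical, and dividing by length and then rescaling to proportions is an assumed reading of "relative proportion per nucleotide".

```python
# Average region lengths in bp, as stated in the Fig. 3c caption.
REGION_LENGTH_BP = {"5'UTR": 200, "CDS": 1340, "3'UTR": 800}


def proportion_per_nucleotide(site_counts):
    """Convert raw target-site counts per region into relative proportions.

    Each count is divided by the average region length (sites per bp),
    then the densities are rescaled so that they sum to 1.
    """
    densities = {
        region: site_counts[region] / length
        for region, length in REGION_LENGTH_BP.items()
    }
    total = sum(densities.values())
    if total == 0:
        raise ValueError("no target sites in any region")
    return {region: d / total for region, d in densities.items()}


# Hypothetical counts, chosen only to show the arithmetic.
print(proportion_per_nucleotide({"5'UTR": 2, "CDS": 134, "3'UTR": 80}))
```

Without this correction the coding sequence would dominate simply because it is the longest region.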
Table 1 Antisense gapmer oligonucleotides that are reverse-complementary exclusively to multiple cDNAs related to particular disorders. All nucleotides are joined by phosphorothioate linkages (*), and conformationally restricted nucleotide monomers, such as tricyclo-DNA, LNAs and MOEs, are preceded by +. Additional sequences can be found in Table S1 and will be updated at www.wikisequences.org
| Antisense oligonucleotides | Targets |
|---|---|
| Cancer | |
| +T*+T*+G*A*T*G*G*G*G*A*A*C*T*+T*+G*+G | |
| +G*+C*+C*A*A*G*C*C*A*A*A*+G*+T*+C | |
| +A*+G*+G*T*C*C*A*G*T*T*T*+C*+T*+G | |
| +T*+G*+T*C*A*G*C*T*G*T*C*+A*+T*+T | |
| +T*+T*+G*G*T*T*T*C*C*T*T*+T*+G*+C | |
| +G*+G*+C*C*A*G*G*C*C*A*A*A*+G*+T*+C | |
| | |
| +A*+C*+C*A*G*C*T*G*C*T*T*G*+A*+A*+G | |
| +G*+G*+C*C*A*G*G*C*C*A*A*A*G*+T*+C*+A | |
| +G*+C*+C*A*T*C*C*A*C*T*T*+C*+A*+C | |
| | |
| +T*+T*+G*C*G*G*G*C*A*G*C*C*+A*+G*+G | |
| +G*+T*+T*A*C*A*A*G*C*A*T*+C*+A*+T | |
| +A*+G*+C*C*A*C*T*G*G*A*T*+G*+T*+G | |
| +T*+G*+T*G*A*T*A*C*T*T*T*+C*+T*+G | |
| +A*+C*+A*T*C*A*C*T*C*T*G*G*T*G*+G*+G*+T | |
| +C*+A*+C*C*T*G*G*T*A*G*G*C*+G*+C*+A | |
| +T*+C*+A*C*T*G*T*A*C*A*C*+C*+T*+T | |
| +C*+A*+C*C*T*G*G*T*A*G*G*C*+G*+C*+A | |
| Immunological diseases | |
| +C*+C*+A*A*C*C*T*T*C*A*+C*+A*+C | |
| +T*+C*+T*C*C*T*T*C*C*T*C*T*G*+C*+T*+T | |
| +C*+C*+G*T*G*G*G*T*C*C*C*T*G*+G*+C*+A | |
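The gapmer notation used in the table above (`*` for a phosphorothioate linkage, `+` before a modified monomer) can be parsed with a few lines of code. The sketch below, with hypothetical function names, recovers the plain base sequence, flags which positions are modified, and derives the sense-strand site that the oligonucleotide would pair with by reverse complementarity. It assumes the sequences are written 5' to 3' and contain only A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_gapmer(notation):
    """Split gapmer notation into (bases, modified_flags).

    '*' separates nucleotides (phosphorothioate linkage) and a leading
    '+' marks a conformationally restricted monomer.
    """
    bases, modified = [], []
    for token in notation.strip().split("*"):
        token = token.strip()
        base = token.lstrip("+")
        if len(base) != 1 or base not in "ACGT":
            raise ValueError(f"unexpected token: {token!r}")
        bases.append(base)
        modified.append(token.startswith("+"))
    return "".join(bases), modified


def target_site(notation):
    """Return the sense-strand sequence complementary to the oligonucleotide."""
    bases, _ = parse_gapmer(notation)
    return bases.translate(COMPLEMENT)[::-1]


# First Cancer entry from the table.
oligo = "+T*+T*+G*A*T*G*G*G*G*A*A*C*T*+T*+G*+G"
bases, modified = parse_gapmer(oligo)
print(bases, sum(modified), target_site(oligo))
```

The modified flags show the usual gapmer layout: modified wings at both ends flanking an unmodified central gap. Searching a cDNA collection for the string returned by `target_site` is one simple way to list candidate targets for an entry.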
Table 2 Peptide sequences present in multiple proteins involved in particular human diseases. These peptides could be used as decoys, as antigens to raise antibodies or, when aimed at intracellular targets, delivered directly as stapled peptides, incorporated as loops into naturally occurring cyclic peptides, or conjugated to molecules that aid cellular uptake. Additional sequences can be found in Table S2 and will be updated at www.wikisequences.org
| Sequence | Targets |
|---|---|
| Cancer | |
| ILLLDEATSALDTESE. | |
| KVLGSGAFGTVYKG. | |
| KVAVKMLKS. | |
| VHRDLAARNVLV. | |
| | |
| RIY | |
| VWELMTFG. | |
| YQLYSRTSGKH. | |
| PSQRPTFKQLVEDLDR. | |
| ERSPHRPILQAGLPAN. | |
| MEKKLHAVPA. | |
| SEMEMMKMIGKHKNII | |
| NLLGACTQ. | |
| KCIHRDLAARNVLVT | |
| EDNVMKIADFGLAR. | |
| FSVLYTVPAT. | |
| | |
| YPERPIIFLS. | |
| | |
| FLALDLGGTNFRVL. | |
| QLELPVKYA. | |
| LLCDKVQKDDIEVRF. | |
| | |
| LCLEERDWLPG. | |
| GLFWANLRAAIN. | |
| IEKSYKSIFVL. | |
e: extracellular; i: internal; t: transmembrane site; bold: repeated motif
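The idea of a peptide shared by several proteins can be illustrated with a plain substring search. This is only a sketch: the function names are hypothetical, the protein names and sequences are toy examples rather than real entries, and exact matching is an assumption about how shared peptides are identified.

```python
def proteins_containing(peptide, proteins):
    """Return the sorted names of proteins whose sequence contains the peptide."""
    peptide = peptide.upper()
    return sorted(
        name for name, seq in proteins.items() if peptide in seq.upper()
    )


def shared_peptides(peptides, proteins, min_hits=2):
    """Keep only peptides found in at least min_hits proteins."""
    hits = {p: proteins_containing(p, proteins) for p in peptides}
    return {p: names for p, names in hits.items() if len(names) >= min_hits}


# Toy sequences, invented for illustration only.
toy_proteins = {
    "toy_kinase_A": "MKVHRDLAARNVLVGG",
    "toy_kinase_B": "AAVHRDLAARNVLVKK",
    "toy_other": "MSTNPKPQRK",
}
print(shared_peptides(["VHRDLAARNVLV", "KVAVKMLKS"], toy_proteins))
```

In practice the `proteins` mapping would be loaded from a sequence database restricted to the proteins relevant to one disease group.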
Fig. 4 Prediction of peptide structures. Further characterization of the peptides included in Table 2. Red: helical; green: extended; blue: coil. Peptide structure was predicted using PEP-FOLD, which is based on hidden Markov models. The structure of these peptides helps determine their usefulness for a given application (e.g., to raise antibodies or to act as decoys)