Sébastien Terrat, Eric Peyretaillade, Olivier Gonçalves, Eric Dugat-Bony, Fabrice Gravelat, Anne Moné, Corinne Biderre-Petit, Delphine Boucher, Julien Troquet, Pierre Peyret.
Abstract
BACKGROUND: Microorganisms display vast diversity, and each one has its own set of genes, cell components and metabolic reactions. To assess their huge unexploited metabolic potential in different ecosystems, we need high-throughput tools, such as functional microarrays, that allow the simultaneous analysis of thousands of genes. However, most classical functional microarrays use specific probes that monitor only known sequences, and so fail to cover the full microbial gene diversity present in complex environments. We have thus developed an algorithm, implemented in the user-friendly program Metabolic Design, to design efficient explorative probes.
Year: 2010 PMID: 20860850 PMCID: PMC2955052 DOI: 10.1186/1471-2105-11-478
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Results window produced by Metabolic Design. Metabolites are represented by small yellow squares, called nodes, and enzymes by edges between nodes. Inner windows show the parsed BLASTp results obtained with each reference protein as query, ordered by increasing Expect value. For each extracted homologous protein sequence, the sequence in EMBL format (f button), the sequence in FASTA format (s button) and the split BLASTp alignment results (a button) are directly available through the inner window toolbar buttons. The w button runs a ClustalW alignment on pre-selected protein sequences. These sequences can also be saved in a single file in FASTA format (s+ button) and/or used to launch the probe design module (o button). Additional functions have also been implemented: the user can automatically highlight the potential metabolic capacities of a given organism (species name) by using the 'catch' and then the 'view' buttons at the top of the window.
Figure 2. Strategy used in Metabolic Design to design explorative probes for functional microarrays. Potential candidate sequences are first extracted by BLASTp, using the reference protein as query against the concatenated Swiss-Prot and TrEMBL databases. A multiple alignment of the selected protein sequences and the reference protein is then performed. The next step has two parts: (i) at each alignment position, amino acids are backtranslated, taking into account the full redundancy of the genetic code, to determine a degenerate nucleic acid consensus sequence; (ii) probes are then extracted from this consensus sequence according to user-defined parameters. The program then searches, by tBLASTn, for all potential cross-hybridizations of each selected probe against the 'Cross-hybridization database'. Kane's criteria are then checked for all positive results by BLASTn. If Kane's criteria indicate a potential cross-hybridization, the program also checks whether the sequence is a potential member of the targeted enzyme family, using a BLASTx comparison against the reference protein. Cross-hybridization results are then clustered by BLASTn, stored and displayed in an output file.
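Steps (i) and (ii) of this strategy can be illustrated with a minimal Python sketch. It is not the Metabolic Design implementation: it assumes a gap-free protein alignment, the standard genetic code, and that a position where all four bases are possible is written as inosine (I), as in the probe nomenclature used in this article. The function names and the input peptide are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering (assumption).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append("".join(codon))

# Set of possible bases -> one-letter degenerate code; four bases -> I (assumption).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G",
    frozenset("T"): "T", frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("AC"): "M", frozenset("GT"): "K", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("ACT"): "H", frozenset("CGT"): "B",
    frozenset("ACG"): "V", frozenset("AGT"): "D", frozenset("ACGT"): "I",
}


def backtranslate_column(residues):
    """Degenerate codon covering every codon of every residue in one column."""
    codons = [c for aa in set(residues) for c in CODONS[aa]]
    return "".join(IUPAC[frozenset(c[i] for c in codons)] for i in range(3))


def degenerate_consensus(aligned_proteins):
    """Step (i): backtranslate a gap-free alignment column by column."""
    return "".join(backtranslate_column(col) for col in zip(*aligned_proteins))


def candidate_probes(consensus, size=23):
    """Step (ii): every window of `size` nucleotides along the consensus."""
    return [consensus[i:i + size] for i in range(len(consensus) - size + 1)]


# Hypothetical single-sequence "alignment", chosen for illustration only.
consensus = degenerate_consensus(["VCNYHGWV"])
print(consensus)                       # GTITGYAAYTAYCAYGGITGGGTI
print(candidate_probes(consensus)[0])  # GTITGYAAYTAYCAYGGITGGGT
```

With a column containing both F and Y, `backtranslate_column` returns `TWY`, which shows how sequence variation between aligned proteins adds degeneracy on top of codon redundancy. A real implementation would also need to handle alignment gaps and apply the user-defined limits on degeneracy and inosine content.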
Figure 3. Probe determination with Metabolic Design. A: Inner window of Metabolic Design showing the probe design results together with the user-defined parameters (such as probe size, degeneracy or inosine composition). The user's chosen parameters are visible on the left, and potential probes are listed on the right. Alongside the probe list, the program also displays all potential peptide combinations (named oligopeptides) for each degenerate probe. B: Example of the probe design approach, degeneracy calculation and inosine percentage determination for the third probe in the inner window. Note that inosine residues are not taken into account in the degeneracy calculation.
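The degeneracy and inosine percentage described in panel B can be sketched as follows. This is an illustrative calculation, not the program's code: it assumes degeneracy is the product of the number of bases allowed at each non-inosine position, and that the inosine percentage is the share of probe positions occupied by I. The function names are hypothetical, and the example uses probe phnA1a_MD_A from the selected-probe table rather than the probe shown in the figure.

```python
from math import prod

# Number of bases represented by each degenerate code (inosine handled separately).
N_BASES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "S": 2, "W": 2,
    "H": 3, "B": 3, "V": 3, "D": 3,
}


def degeneracy(probe):
    """Product of per-position base counts, ignoring inosine positions."""
    return prod(N_BASES[base] for base in probe if base != "I")


def inosine_percentage(probe):
    """Percentage of probe positions that are inosine."""
    return 100.0 * probe.count("I") / len(probe)


probe = "GTITGYAAYTAYCAYGGITGGGT"  # phnA1a_MD_A
print(degeneracy(probe))                     # 16 (four Y positions: 2**4)
print(round(inosine_percentage(probe), 2))   # 8.7 (2 of 23 positions)
```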
Reference enzyme information.
| Reference protein | Accession number | Reference | BLASTp E-value threshold | Sequences selected for the multiple alignment |
|---|---|---|---|---|
| Putative alpha subunit of ring-hydroxylating dioxygenase | Q65AT1 | [ | 1e-40 | B2Z3Z2, A2TC87, A4XDY3, O85843, A9Y004, B5L7S0, B5L7R9, Q1HCP6, Q7WUA0 |
| Putative beta subunit of ring-hydroxylating dioxygenase | Q65AT0 | [ | 1e-30 | A2TC88, A4XDY2, O85842, B5L7R8 |
| Putative large subunit of oxygenase | Q83VL2 | [ | 1e-40 | A2TC29, A9XZZ2, Q65AS5 |
| Putative small subunit of oxygenase | Q83VL1 | [ | 1e-30 | A9XZZ3, Q65AS6, A4XDV1, O85992, A2TC30, Q9Z4T6 |
| Putative 1,2-dihydrodiol-1,2-dihydroxy-dehydrogenase | Q9X9Q9 | [ | 1e-40 | Q14RW3, O85972 |
| Putative biphenyl-2,3-diol 1,2-dioxygenase | P74836 | [ | 1e-40 | P11122, Q6LCU9, Q7DG81, A4XDU9, O85990, A9XZZ5, Q65AS8, Q9KWI2 |
| Putative ferredoxin component of dioxygenase | A2TC31 | [ | 1e-20 | O34128, Q65AS7, A9XZZ4, A4XDV0, O85991, Q83VL0 |
| Putative ferredoxin reductase component of dioxygenase | A2TC59 | [ | 1e-40 | Q83VI9, A4XDS3, O85962 |
Organism name and source, accession number and bibliographic reference for each reference protein. The BLASTp Expect value thresholds used and the sequences selected for the multiple alignments used in probe design are also given.
Selected probe information.
| Targeted gene | Probe name | Sequence | Number of unique DNA sequences used for the probe design | Number of specific probes | Positions on the reference gene sequence |
|---|---|---|---|---|---|
| phnA1a | phnA1a_MD_A | GTITGYAAYTAYCAYGGITGGGT | 5 | 256 | 294 - 316 |
| phnA1a | phnA1a_MD_B | CAYGARATHGARGTITGGACITA | 4 | 384 | 957 - 979 |
| phnA2a | phnA2a_MD_A | GARGAYATHCAYTAYTGGATGCC | 2 | 48 | 123 - 145 |
| phnA2a | phnA2a_MD_B | GGICARGTITGGATGGARGAYCC | 3 | 128 | 261 - 284 |
| ahdA1c | ahdA1c_MD_A | GARTGYGTITAYCAYCARTGGGC | 3 | 128 | 318 - 340 |
| ahdA1c | ahdA1c_MD_B | GAYGCIGCIGAYAARCARGCITA | 2 | 1024 | 771 - 793 |
| ahdA2c | ahdA2c_MD_A | GAYGAYMGIYTIGARGARTGGCC | 3 | 1024 | 081 - 103 |
| ahdA2c | ahdA2c_MD_B | ATHGAYACIATGATGGTIMGICC | 3 | 768 | 459 - 481 |
| bphB | bphB_MD_A | AAYGTIGGIATHTGGGAYTWYAT | 3 | 768 | 261 - 283 |
| bphB | bphB_MD_B | AAYBTIAARGGITAYTTYTTYGG | 3 | 384 | 348 - 370 |
| bphC | bphC_MD_A | CCITAYTTYATGCAYTGYAAYGA | 5 | 128 | 558 - 580 |
| bphC | bphC_MD_B | TGGYTITGGGARTTYGGITGGGG | 4 | 128 | 777 - 799 |
| bphA3 | bphA3_MD_A | ATHATHGARTGYCCITTYCAYGG | 2 | 576 | 180 - 202 |
| bphA3 | bphA3_MD_B | ATHGAIGAYGGITGGGTITGYAT | 3 | 768 | 279 - 302 |
| ahdA4 | ahdA4_MD_A | GCIAAYGTICCIGAYAAYTTYTT | 2 | 1024 | 159 - 181 |
| ahdA4 | ahdA4_MD_B | CARGARACITAYCARAAYGCIGC | 2 | 512 | 867 - 889 |
For each targeted gene, the total number of specific probes derived from the degenerate probe sequence and their relative positions on the reference gene sequence are given. The numbers of unique DNA sequences coding for the studied enzymes are also given, to highlight that our probes target not only known genes but also unknown ones. Nomenclature: M: A and C; R: A and G; W: A and T; S: G and C; Y: C and T; H: A, C and T; D: A, G and T; B: G, T and C; I: A, C, G and T.
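The relationship between a degenerate sequence and its number of specific probes can be checked with a short Python sketch. It assumes that each specific probe is one concrete A/C/G/T sequence obtained by expanding every degenerate position according to the nomenclature above, with I expanded to all four bases. The function name is hypothetical.

```python
from itertools import product

# Expansion of each code, following the nomenclature given in the table footnote.
EXPANSION = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "GC", "Y": "CT",
    "H": "ACT", "D": "AGT", "B": "GTC", "I": "ACGT",
}


def specific_probes(degenerate):
    """List every non-degenerate sequence covered by a degenerate probe."""
    choices = (EXPANSION[base] for base in degenerate)
    return ["".join(bases) for bases in product(*choices)]


print(len(specific_probes("GTITGYAAYTAYCAYGGITGGGT")))  # 256 (phnA1a_MD_A)
print(len(specific_probes("GARGAYATHCAYTAYTGGATGCC")))  # 48  (phnA2a_MD_A)
```

Enumerating with `itertools.product` is practical here because the largest expansion in the table is 1024 sequences; for much higher degeneracy, counting the product of the per-position choices without building the list would be preferable.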
Results obtained with designed probes for a mixture of phenanthrene and fluoranthene.
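| Probe | phnA1a_MD_A | phnA1a_MD_B | phnA2a_MD_A | phnA2a_MD_B | ahdA1c_MD_A | ahdA1c_MD_B | ahdA2c_MD_A | ahdA2c_MD_B | bphB_MD_A | bphB_MD_B | bphC_MD_A | bphC_MD_B | bphA3_MD_A | bphA3_MD_B | ahdA4_MD_A | ahdA4_MD_B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of specific probes | 256 | 384 | 48 | 128 | 128 | 1024 | 1024 | 768 | 768 | 384 | 128 | 128 | 576 | 768 | 1024 | 512 |
| Number of specific probes with a 'positive' signal (SNR' > 3) | 1 | 2 | 0 | 1 | 3 | 1 | 2 | 1 | 4 | 1 | 1 | 1 | 3 | 1 | 0 | 0 |
| Highest median SNR' | 18.32 | 6.62 | X | 22.64 | 8.61 | 9.93 | 8.92 | 16.26 | 5.79 | 4.09 | 4.47 | 4.54 | 36.87 | 9.79 | X | X |
| EPA505-specific probe gives the highest median SNR' | Yes | No | No | Yes | No | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes | No | No |
| | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |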
For each degenerate probe, targeting one of two different regions (A and B) of the genes phnA1a, phnA2a, ahdA1c, ahdA2c, bphB, bphC, bphA3 and ahdA4, the table gives the total number of specific probes derived from the degenerate sequence, the number of specific probes giving a 'positive' signal (SNR' > 3), the highest median SNR' observed for each targeted region of each gene, and whether the probe specific to the strain EPA505 gene gives this highest median SNR'.
Results obtained with designed probes with total DNA extracted from the contaminated soil S3.
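| Gene name | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Number of specific probes | 256 | 128 | 1024 | 1024 | 768 | 768 | 128 | 768 | 1024 |
| Number of specific probes with a 'positive' signal (SNR' > 3) | 0 | 37 | 204 | 18 | 1 | 36 | 16 | 44 | 2 |
| Percentage of specific probes with a 'positive' signal | 0 | 28.90 | 19.92 | 1.75 | 0.13 | 4.68 | 12.50 | 5.72 | 0.19 |
| Highest median SNR' | 0 | 9.47 | 42.85 | 7.05 | 4.29 | 6.33 | 4.43 | 8.84 | 3.48 |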
For each degenerate probe, targeting one particular region (A or B) of the genes phnA1a, phnA2a, ahdA1c, ahdA2c, bphB, bphC, bphA3 and ahdA4, the table gives the total number of specific probes derived from the degenerate sequence, the number of specific probes giving a 'positive' signal (SNR' > 3), the percentage of probes giving a 'positive' signal, and the highest median SNR' observed for each targeted region of each gene.
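The per-probe summary reported in this table can be sketched in a few lines of Python. This is only an illustration of the arithmetic: it assumes one median SNR' value per specific probe and the 'positive' threshold of SNR' > 3 stated above. The function name and the input values are hypothetical, not measurements from the study.

```python
def summarize_probe_set(median_snr, threshold=3.0):
    """Summarize the specific probes derived from one degenerate probe."""
    if not median_snr:
        raise ValueError("at least one median SNR' value is required")
    positive = sum(1 for value in median_snr if value > threshold)
    return {
        "specific_probes": len(median_snr),
        "positive_probes": positive,
        "percent_positive": 100.0 * positive / len(median_snr),
        "highest_median_snr": max(median_snr),
    }


# Hypothetical median SNR' values for four specific probes.
summary = summarize_probe_set([0.5, 3.0, 4.2, 9.47])
print(summary["positive_probes"])    # 2 (a value of exactly 3.0 is not positive)
print(summary["percent_positive"])   # 50.0
```

Applied to the counts in the table, 37 positive probes out of 128 correspond to 100 × 37 / 128 ≈ 28.9%.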
Figure 4. Median SNR' for the contaminated soil with 128 specific probes targeting the phnA2a gene. This graph shows the median SNR' detected for each specific probe (ordered by sequence) derived from the degenerate probe phnA2a_MD_B, which targets one particular region of the phnA2a gene. Black squares: signals obtained with the model strain EPA505 with a mix of both pollutants (the highest signal is given by the specific probe targeting the strain EPA505 gene). Gray diamonds: signals obtained with total DNA extracted from the soil S3 (clearly showing a particular probe signature). The dotted line represents the threshold defined for SNR' values.