Joan Hérisson, Thomas Duigou, Melchior du Lac, Kenza Bazi-Kabbaj, Mahnaz Sabeti Azad, Gizem Buldum, Olivier Telle, Yorgo El Moubayed, Pablo Carbonell, Neil Swainston, Valentin Zulkower, Manish Kushwaha, Geoff S Baldwin, Jean-Loup Faulon.
Abstract
Here we introduce the Galaxy-SynBioCAD portal, a toolshed for synthetic biology, metabolic engineering, and industrial biotechnology. The tools and workflows currently shared on the portal enable one to build libraries of strains producing desired chemical targets, covering an end-to-end metabolic pathway design and engineering process: from the selection of strains and targets, through the design of DNA parts to be assembled, to the generation of scripts driving liquid handlers for plasmid assembly and strain transformation. Standard formats such as SBML and SBOL are used throughout to enforce compatibility between the tools. In a study carried out at four different sites, we illustrate the link between pathway design and engineering by building a library of E. coli lycopene-producing strains. We also benchmark our workflows on literature- and expert-validated pathways. Overall, we find an 83% success rate in retrieving the validated pathways among the top 10 pathways generated by the workflows.
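The 83% figure can be read as a top-k hit rate: the share of validated pathways whose rank among the generated pathways is 10 or better. The sketch below shows that computation under the assumption that each validated pathway has already been given a 1-based rank (or `None` if the workflows did not produce it); the function name and the sample ranks are hypothetical, not data from the study.

```python
from typing import Optional, Sequence


def top_k_success_rate(ranks: Sequence[Optional[int]], k: int = 10) -> float:
    """Fraction of validated pathways retrieved within the top k.

    ranks[i] is the 1-based rank of the i-th validated pathway among the
    pathways generated for its target, or None if it was not generated.
    """
    if not ranks:
        raise ValueError("need at least one validated pathway")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks for six validated pathways (not results from the paper).
example_ranks = [1, 3, None, 12, 7, 2]
print(f"{top_k_success_rate(example_ranks):.2f}")  # 0.67 (4 of 6 within the top 10)
```

Counting a missing pathway as a miss, rather than dropping it, keeps the denominator equal to the full set of validated pathways.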
Year: 2022 PMID: 36038542 PMCID: PMC9424320 DOI: 10.1038/s41467-022-32661-x
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1 Automated construction of 88 distinct plasmids coding for lycopene pathway operons containing genes in different orders, with varying promoters and RBSes.
a Plasmids coding for lycopene genes were assembled using the BASIC method (with DNA linkers). Genes in the lycopene pathway, crtE, crtB, and crtI (parts 3, 4, and 5), were assembled in an operon with UTR-RBS linkers containing different RBSes. The promoter (part 2) and the 3-gene operon were assembled into a backbone with a p15A origin of replication (ORI) and a chloramphenicol resistance gene (Cmp-R). The assembled parts were flanked by methylated linkers that recapitulate the BASIC prefix and suffix (LMP and LMS). b Number of constructs yielding transformant colonies, and the colony colors, as reported by the Micalis Institute (Paris) and Imperial College (London). Constructs common to both laboratories are counted in the intersection. Data are from spotting 10 µL of transformation reactions with the Opentrons (left) and from spotting 100 µL manually or 40 µL with the Opentrons (right) on LB plates. c Count plots show the number of constructs with successful colonies, grouped by position and gene (details in Supplementary file ‘Dataset 1’). Results are from Paris (left), from London (right), and in common (middle). Constructs have a weaker promoter (top) or a stronger promoter (bottom). The RBSes are differentiated by colors. The genes’ positions in the operon are indicated on the x-axis. Dashed lines show the mean number of constructs for each promoter. d Lycopene content (mg of lycopene per g dry cell weight, gDCW) of different constructs from both laboratories. RBS and promoter types and gene orders are indicated. E: crtE, B: crtB, I: crtI. Promoters and terminators are shown at the ends of each construct. e Examples of red and white colonies (top), pellet preparation (middle), and acetone-extracted lycopene (bottom). Source data are provided in the ‘Source Data’ file.
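The library varies three factors: the promoter, the order of crtE, crtB, and crtI in the operon, and the RBS carried by each UTR-RBS linker. The sketch below enumerates such a combinatorial design space. The part names and the number of RBSes are hypothetical placeholders (the actual parts are in Supplementary ‘Dataset 1’), and the sketch does not reproduce the 88 constructs of the study.

```python
from itertools import permutations, product

GENES = ("crtE", "crtB", "crtI")
# Hypothetical part names: the caption mentions a weaker and a stronger
# promoter and several RBSes, but not their identities or number.
PROMOTERS = ("P_weak", "P_strong")
RBSES = ("RBS1", "RBS2", "RBS3")


def enumerate_designs():
    """Yield (promoter, ((rbs, gene), ...)) for every gene order and RBS choice."""
    for promoter in PROMOTERS:
        for order in permutations(GENES):
            # One RBS per gene, chosen independently for each position.
            for rbs_choice in product(RBSES, repeat=len(order)):
                yield promoter, tuple(zip(rbs_choice, order))


designs = list(enumerate_designs())
print(len(designs))  # 324 = 2 promoters * 6 gene orders * 3**3 RBS combinations
```

A full factorial grows quickly (here 324 designs from 2 promoters and 3 RBSes), which is one reason a smaller, selected set of constructs is typically built in practice.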
Fig. 2 Scoring Galaxy-SynBioCAD predicted pathways with literature pathways and expert validation data.
a Pathways for different targets and different hosts are extracted from the literature (cf. Literature data benchmarking subsection); this is illustrated here for the production of phenol in E. coli. b Galaxy-SynBioCAD workflows are run on the literature targets and hosts. c A collection of Galaxy-SynBioCAD-generated pathways is compiled. Pathway ‘A’, producing phenol in E. coli from tyrosine, is highlighted. d The Galaxy-SynBioCAD-generated pathways are compared with the literature pathways using a matching algorithm (cf. ‘Supplementary_Text’ file). The plot shows, for each literature pathway, the best-matching pathways among all Galaxy-SynBioCAD-generated pathways. Pathways with a matching score above 0.5 are identical (similarity of 1) to the literature pathways as far as the main substrate and products are concerned. The raw data can be found in Supplementary file ‘Dataset 2’, tab ‘literature_matching_score’. e Galaxy-SynBioCAD-generated pathways are evaluated by expert metabolic engineers, whose task is to select the valid pathways within batches of 5 generated pathways (cf. Expert validation trial benchmarking subsection). f Pathways judged valid by the experts and pathways matching the literature are added to a training set of labeled pathways. g The set of labeled pathways is used to train a classifier that outputs a machine learning score assessing whether a given pathway is valid (cf. Machine Learning Global Scoring in Methods section). The figure plots the results obtained for all pathways generated by Galaxy-SynBioCAD. The raw data, including the training set, can be found in Supplementary file ‘Dataset 3’. Using a machine learning global score threshold of 0.5, the accuracy in retrieving literature or expert-labeled pathways is 0.91, with a false positive rate of 0.10, in 4-fold cross-validation (cf. Supplementary file ‘Dataset 3’, tab ‘Pathway_PredictedScore’). Source data are provided in the ‘Source Data’ file.
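The accuracy (0.91) and false positive rate (0.10) quoted for panel g come from thresholding the classifier score at 0.5 and comparing predictions with the labels. The sketch below shows that comparison on one set of held-out predictions; it assumes that a score at or above the threshold counts as "valid", and the scores and labels are hypothetical. In the study the figures are obtained across 4-fold cross-validation, which this sketch does not perform.

```python
from typing import Sequence, Tuple


def accuracy_and_fpr(scores: Sequence[float], labels: Sequence[int],
                     threshold: float = 0.5) -> Tuple[float, float]:
    """Binarize scores at `threshold` and compare them with 0/1 labels.

    Returns (accuracy, false positive rate), where the false positive rate is
    FP / (FP + TN): the share of invalid pathways predicted to be valid.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0  # assumption: >= counts as valid
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and not label:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(labels)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Hypothetical scores and labels (1 = literature- or expert-validated pathway).
scores = [0.92, 0.81, 0.40, 0.65, 0.10, 0.30, 0.55, 0.05]
labels = [1, 1, 1, 0, 0, 0, 1, 0]
print(accuracy_and_fpr(scores, labels))  # (0.75, 0.25)
```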
Fig. 3 Ranking predicted pathways with the machine learning global score.
The color code on the right side shows the machine learning global score (from 1 at the top to 0). The black boxes show the location of the literature or expert-selected pathways for a set of 60 literature targets engineered in E. coli (*), S. cerevisiae (**), or P. putida (***). If a row does not contain a black box, the literature or expert-selected pathway is not found within the first 50 scored pathways. The numbers listed on the right side are the total numbers of pathways generated by the Galaxy-SynBioCAD workflows. The data used to generate the figure can be found in Supplementary file ‘Dataset 3’, tab ‘Lit_Pathway_Rank_ML’. Source data are provided in the ‘Source Data’ file.
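Each row of Fig. 3 corresponds to one target: the generated pathways are sorted by descending machine learning global score, and the black box marks where the literature or expert-selected pathway falls, with no box if it lies beyond the first 50. A minimal sketch of that lookup, assuming a mapping from pathway ID to score; the IDs, scores, and the tie-breaking rule (by pathway ID) are assumptions of this example.

```python
from typing import Dict, Optional


def rank_of(pathway_id: str, scores: Dict[str, float],
            cutoff: int = 50) -> Optional[int]:
    """Return the 1-based rank of `pathway_id` after sorting by descending score.

    Returns None when the pathway is missing or ranks below the first `cutoff`
    pathways (an empty row in the figure). Ties are broken by pathway ID, an
    assumption of this sketch.
    """
    ordered = sorted(scores, key=lambda pid: (-scores[pid], pid))
    top = ordered[:cutoff]
    return top.index(pathway_id) + 1 if pathway_id in top else None


# Hypothetical machine learning global scores for one target.
scores = {"p1": 0.20, "p2": 0.90, "p3": 0.70, "lit": 0.80}
print(rank_of("lit", scores))            # 2
print(rank_of("lit", scores, cutoff=1))  # None
```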
Pathway and reaction features used by the XGBoost classifier
| Level | Item | Format | Comment |
|---|---|---|---|
| Pathway level | Chassis organism | Integer | Taxonomy ID of the organism |
| | Gibbs free energy | Float | Computed using the thermodynamics calculations described in the Thermodynamics section |
| | Fraction of reaction FBA | Float | Target flux computed by FBA (cf. Flux Balance Analysis with fraction of reaction section) |
| Reaction level | Reaction | 4096-bit vector | A reaction is represented by its Morgan fingerprint: Fingerprint(reaction) = Fingerprint(substrate) + Fingerprint(product). Morgan fingerprints are computed using the RDKit library[ |
| | Enzyme availability score | Float | Confidence level of finding an enzyme sequence that catalyzes the reaction (cf. Delépine et al.[ |
| | Gibbs free energy | Float | Computed using the thermodynamics calculations above, for the given reaction only |
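The reaction-level feature combines the fingerprint of the substrate with that of the product. The sketch below assumes the per-compound fingerprints are already available as 4096-element 0/1 lists (with RDKit, `AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=4096)` can produce such vectors; the radius used in the study is not stated here). It also assumes that "+" means element-wise addition and that every substrate and product on a multi-compound reaction is summed. Both readings are assumptions, and `sparse_fp` is a hypothetical helper for toy data.

```python
from typing import List, Sequence

N_BITS = 4096  # fingerprint length given in the feature table


def reaction_fingerprint(substrate_fps: Sequence[Sequence[int]],
                         product_fps: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-compound bit vectors into one reaction vector.

    Assumption: '+' is element-wise addition over every substrate and product
    fingerprint, so a position can exceed 1 when several compounds set it.
    """
    combined = [0] * N_BITS
    for fp in list(substrate_fps) + list(product_fps):
        if len(fp) != N_BITS:
            raise ValueError(f"expected {N_BITS} bits, got {len(fp)}")
        for i, bit in enumerate(fp):
            combined[i] += bit
    return combined


def sparse_fp(on_bits: Sequence[int]) -> List[int]:
    """Hypothetical helper: a toy 4096-bit vector with the given positions set."""
    fp = [0] * N_BITS
    for i in on_bits:
        fp[i] = 1
    return fp


rxn = reaction_fingerprint([sparse_fp([1, 7])], [sparse_fp([7, 4095])])
print([(i, v) for i, v in enumerate(rxn) if v])  # [(1, 1), (7, 2), (4095, 1)]
```

With element-wise addition, a substructure present on both sides of the reaction (bit 7 here) is counted twice; a bitwise OR would instead collapse it to 1, which is a different encoding.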
Labware IDs used at Imperial College (London) and Micalis Institute (Paris) laboratories
| Description | London | Paris | Used in |
|---|---|---|---|
| P20 single-channel pipette | p20_single_gen2 | p20_single_gen2 | Steps 1, 3, and 4 |
| P300 multi-channel pipette | p300_multi_gen2 | p300_multi_gen2 | Steps 2 and 4 |
| Opentrons 4-in-1 tube rack for 1.5 mL Eppendorf tubes | e14151500starlab_24_tuberack_1500ul | opentrons_24_tuberack_eppendorf_1.5ml_safelock_snapcap | Steps 1, 3, and 4 |
| Opentrons 10 μL tip rack | opentrons_96_tiprack_20ul | tipone_3dprinted_96_tiprack_20ul | Steps 1, 3, and 4 |
| Opentrons 300 μL tip rack | opentrons_96_tiprack_300ul | tipone_yellow_3dprinted_96_tiprack_300ul | Steps 2 and 4 |
| 96-well rigid PCR plate (clip reaction and transformation steps) | 4ti0960rig_96_wellplate_200ul | green_96_wellplate_200ul_pcr | Steps 1 and 4 |
| 96-well rigid PCR plate (purification and assembly steps) | 4ti0960rig_96_wellplate_200ul | black_96_wellplate_200ul_pcr | Steps 2 and 3 |
| Agar plate (transformation step) | nuncomnitraysingle_1_wellplate_35000ul; corning_12_wellplate_6.9ml_flat | thermoomnitrayfor96spots_96_wellplate_50ul | Step 4 |
| 12-channel reservoir plate, 21 mL | 4ti0131_12_reservoir_21000ul | citadel_12_wellplate_22000ul | Step 2 |
| 96-deep-well plate, 2 mL wells | 4ti0136_96_wellplate_2200ul | transparent_96_wellplate_2ml_deep | Step 2 |
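The labware table is effectively a per-site configuration: the same role maps to a different load name in London and in Paris, and each role is used only in certain steps. The sketch below encodes a subset of the table and selects the load names a given step needs at a given site. The role keys and the `load_names` function are hypothetical; only the load names and step numbers come from the table.

```python
from typing import Dict, FrozenSet, NamedTuple


class Labware(NamedTuple):
    london: str
    paris: str
    steps: FrozenSet[int]


# Subset of the labware table above; the role keys are hypothetical labels.
LABWARE: Dict[str, Labware] = {
    "p20_tips": Labware("opentrons_96_tiprack_20ul",
                        "tipone_3dprinted_96_tiprack_20ul", frozenset({1, 3, 4})),
    "p300_tips": Labware("opentrons_96_tiprack_300ul",
                         "tipone_yellow_3dprinted_96_tiprack_300ul", frozenset({2, 4})),
    "reservoir": Labware("4ti0131_12_reservoir_21000ul",
                         "citadel_12_wellplate_22000ul", frozenset({2})),
    "deep_well": Labware("4ti0136_96_wellplate_2200ul",
                         "transparent_96_wellplate_2ml_deep", frozenset({2})),
}


def load_names(site: str, step: int) -> Dict[str, str]:
    """Map each labware role used in `step` to its load name at `site`."""
    if site not in ("London", "Paris"):
        raise ValueError(f"unknown site: {site}")
    return {role: (lw.london if site == "London" else lw.paris)
            for role, lw in LABWARE.items() if step in lw.steps}


# In an Opentrons protocol, each name would be passed to
# protocol.load_labware(name, slot); names outside the Opentrons standard
# labware library require a custom labware definition.
print(load_names("Paris", 3))  # {'p20_tips': 'tipone_3dprinted_96_tiprack_20ul'}
```

Keeping site differences in data rather than in protocol logic means the same generated script can be reused at both sites by switching one argument.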
DNA-Bot parameters that differ between Imperial College (London) and Micalis Institute (Paris) laboratories
| Step | Parameter | London | Paris |
|---|---|---|---|
| Purification step | magdeck_id | magdeck | magnetic module gen2 |
| | magdeck_height | 20 | 10.8 |
| | settling_time | 2 | 6 |
| | drying_time | 5 | 15 |
| | elution_time | 2 | 5 |
| | wash_time | 0.5 | 0.5 |
| | bead_ratio | 1.8 | 1.8 |
| | incubation_time | 5 | 5 |
| Transformation step | incubation_temp | 4 | 8 |
| | incubation_time | 20 | 30 |
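The parameter table can be treated as two site configurations keyed by (step, parameter), since `incubation_time` appears in both steps. The sketch below transcribes the table, lists which parameters actually differ between sites, and computes a bead volume, assuming that `bead_ratio` is the bead-to-sample volume ratio, as is conventional for magnetic-bead cleanups. Units are not given in the table, and the 40 µL sample volume in the example is hypothetical.

```python
from typing import Dict, List, Tuple, Union

Key = Tuple[str, str]  # (step, parameter): incubation_time occurs in both steps
Value = Union[int, float, str]

# Values transcribed from the table above; units are not stated there.
LONDON: Dict[Key, Value] = {
    ("purification", "magdeck_id"): "magdeck",
    ("purification", "magdeck_height"): 20,
    ("purification", "settling_time"): 2,
    ("purification", "drying_time"): 5,
    ("purification", "elution_time"): 2,
    ("purification", "wash_time"): 0.5,
    ("purification", "bead_ratio"): 1.8,
    ("purification", "incubation_time"): 5,
    ("transformation", "incubation_temp"): 4,
    ("transformation", "incubation_time"): 20,
}
# Paris shares the London values except where overridden below.
PARIS: Dict[Key, Value] = {
    **LONDON,
    ("purification", "magdeck_id"): "magnetic module gen2",
    ("purification", "magdeck_height"): 10.8,
    ("purification", "settling_time"): 6,
    ("purification", "drying_time"): 15,
    ("purification", "elution_time"): 5,
    ("transformation", "incubation_temp"): 8,
    ("transformation", "incubation_time"): 30,
}


def differing(a: Dict[Key, Value], b: Dict[Key, Value]) -> List[Key]:
    """Keys whose values differ between two parameter sets (or are missing in b)."""
    return sorted(k for k in a if k not in b or a[k] != b[k])


def bead_volume(sample_volume: float, bead_ratio: float) -> float:
    """Bead volume for a magnetic-bead cleanup, assuming bead_ratio = beads/sample."""
    return sample_volume * bead_ratio


print(len(differing(LONDON, PARIS)))  # 7
print(bead_volume(40, PARIS[("purification", "bead_ratio")]))  # hypothetical 40 µL sample
```

Only 7 of the 10 listed values differ: `wash_time`, `bead_ratio`, and the purification `incubation_time` are identical at both sites.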