Yunjia Chen, Shihong Qiu, Chi-Hao Luan, Ming Luo.
Abstract
BACKGROUND: Expressing higher eukaryotic genes as soluble, stable recombinant proteins remains a bottleneck in biochemical and structural studies of novel proteins. Correctly identifying stable domains/fragments within the open reading frame (ORF), combined with proper cloning strategies, can greatly improve the success rate when higher eukaryotic proteins are expressed as these domains/fragments. Furthermore, a high-throughput (HTP) cloning pipeline that incorporates bioinformatics methods for domain/fragment selection will benefit structural and functional genomics/proteomics studies.
Year: 2007 PMID: 17663785 PMCID: PMC1950093 DOI: 10.1186/1472-6750-7-45
Source DB: PubMed Journal: BMC Biotechnol ISSN: 1472-6750 Impact factor: 2.563
Figure 1. The primer design strategy using two pairs of primers. Primers F2 and R2 contained attB sites and no gene-specific region, so they could be synthesized in bulk; primers F1 and R1 contained gene-specific sequences and a region overlapping primers F2 and R2. CDS stands for coding sequence; a protease cleavage site was engineered after the attB1 site.
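The F1/R1 layout described in the caption (an overlap region followed by a gene-specific region) can be sketched as below. This is only an illustration: the overlap sequences, the toy CDS, and the length `n` of the gene-specific region are hypothetical placeholders, not the sequences used in the paper.

```python
# Sketch of gene-specific primer assembly (F1/R1). All sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gene_specific_primers(cds, fwd_overlap, rev_overlap, n=18):
    """Build (F1, R1): each is an overlap region plus n gene-specific bases.

    F1 anneals to the start of the CDS; R1 anneals to the end of the CDS,
    so its gene-specific part is the reverse complement of the last n bases.
    """
    if len(cds) < n:
        raise ValueError("CDS is shorter than the gene-specific region")
    f1 = fwd_overlap + cds[:n]
    r1 = rev_overlap + reverse_complement(cds[-n:])
    return f1, r1


# Hypothetical overlap regions shared with the universal primers F2 and R2.
FWD_OVERLAP = "CTGGTTCCGCGT"
REV_OVERLAP = "GAAAGCTGGGTC"

toy_cds = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTTAA"
f1, r1 = gene_specific_primers(toy_cds, FWD_OVERLAP, REV_OVERLAP, n=12)
print(f1)
print(r1)
```

Because F2 and R2 carry no gene-specific bases, only F1 and R1 need to be designed per target, which is what makes the second primer pair reusable across a plate.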
Figure 3. A schematic representation of the HTP cloning and expression pipeline, aided by bioinformatics tools. Steps marked with a star were not performed on the BiomekFX robot. ExtractCDS and BatchPrimer were two PERL programs used, respectively, to extract the DNA coding sequence from a full-length sequence (ORF) and to design gene-specific primers.
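As a rough illustration of the extraction step, the sketch below maps a selected amino acid range onto nucleotide coordinates of the ORF and slices out the corresponding coding sequence. It is not the ExtractCDS program itself; the function names and the toy ORF are assumptions made for the example, and the ORF is assumed to start at the first codon.

```python
# Sketch: map a 1-based inclusive residue range onto ORF nucleotide coordinates.
# Function names are hypothetical; this is not the ExtractCDS implementation.

def fragment_to_nt_range(aa_start, aa_end):
    """Return the 1-based inclusive nucleotide range coding for residues aa_start..aa_end."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("invalid residue range")
    return 3 * (aa_start - 1) + 1, 3 * aa_end


def extract_fragment_cds(orf, aa_start, aa_end):
    """Slice the coding sequence of a protein fragment out of the ORF."""
    nt_start, nt_end = fragment_to_nt_range(aa_start, aa_end)
    if nt_end > len(orf):
        raise ValueError("fragment extends past the end of the ORF")
    return orf[nt_start - 1:nt_end]


# Residues 128-244 (one of the fragments selected for 3-H6) map to these bases.
print(fragment_to_nt_range(128, 244))

# Toy ORF: Met-Ala-Ser-Lys-Gly-stop; residues 2-3 are Ala-Ser.
toy_orf = "ATGGCTAGCAAAGGATAA"
print(extract_fragment_cds(toy_orf, 2, 3))
```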
Figure 4. E-Gel test result for entry clones from the second plate of 94 human genes. A 2% E-Gel® 96 Agarose gel with the E-Gel® Low Range Quantitative DNA Ladder was used in the test.
Statistics of PCR and entry clone success rates of HTP cloning
| | All | PCR (success rate) | Entry clone (success rate) |
| | 90 | 85 (94.4%) | 83 (92.2%) |
| Human | 188 | 184 (97.9%) | 183 (97.3%) |
| Brucella | 88 | 88 (100%) | 88 (100%) |
| Total | 366 | 357 (97.5%) | 354 (96.7%) |
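The percentages in the table follow directly from the counts, which can be checked with a few lines. The counts are copied from the table; the label of the first row did not survive in this copy, so the key `"(unlabeled row)"` is a placeholder.

```python
# Recompute the success rates from the counts: (all, PCR successes, entry clones).
counts = {
    "(unlabeled row)": (90, 85, 83),  # row label missing in this copy
    "Human": (188, 184, 183),
    "Brucella": (88, 88, 88),
    "Total": (366, 357, 354),
}


def success_rate(successes, attempted):
    """Percentage of successes, rounded to one decimal place."""
    return round(100.0 * successes / attempted, 1)


rates = {
    name: (success_rate(pcr, total), success_rate(clones, total))
    for name, (total, pcr, clones) in counts.items()
}
for name, (pcr_rate, clone_rate) in rates.items():
    print(f"{name}: PCR {pcr_rate}%, entry clone {clone_rate}%")

# The Total row should equal the column sums of the three groups.
groups = [v for k, v in counts.items() if k != "Total"]
column_sums = tuple(sum(col) for col in zip(*groups))
print(column_sums)
```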
Figure 5. Two examples illustrating the DDBP (domain/domain boundary prediction) method. A: According to InterPro/InterProScan, 3-H6 (NP_508026), a 431-amino-acid protein with no TM region or signal peptide, possibly contains three domains: Domain 1 (24–118), Domain 2 (141–234), and Domain 3 (254–370). The labels a, b, c, and d to the right of the horizontal lines mark four separate alignments between protein 3-H6 and the Protein Data Bank (PDB) database. a: region 4–131 of 3-H6 is homologous to region 20–147 of 1ROU with 60% identity; b: region 4–244 of 3-H6 is similar to region 41–280 of 1Q1C with 48% identity; c: region 7–408 of 3-H6 is similar to region 24–422 of 1KTO/A with 40% identity; d: region 128–428 of 3-H6 is similar to region 22–330 of 1P5Q/A with 35% identity. By combining the InterPro/InterProScan results with the alignments, three protein fragments (1–131, 128–244, and 245–431) were selected for 3-H6 as stable domains/fragments. B: 11020-H6 (corresponding to region 299–792 of protein NP_493412), a 494-amino-acid protein with no TM region or signal peptide, was predicted by InterPro/InterProScan to have three possible domains/fragments (Fragment 1: 53–225; Fragment 2: 236–494; Fragment 3: 337–475) (shown on top). Domain Linker Finder (DLF) results showed that protein 11020-H6 may contain five possible domain linkers (DL1: 19–52; DL2: 106–145; DL3: 215–241; DL4: 325–330; DL5: 373–383) (shown at the bottom). The stable domains/fragments of 11020-H6 were predicted by the DDBP method to be 53–225, 236–494, and 331–494 (shown as the conclusion in the box at right).
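To make example B easier to follow, the sketch below lists the stretches of 11020-H6 that lie between the predicted domain linkers. This is not the DDBP procedure, whose decision rules are not spelled out in the caption; it only shows one way linker coordinates can suggest candidate boundaries. The function name is hypothetical.

```python
# Sketch: residues not covered by any predicted domain linker.
# Illustration only; the DDBP method combines several prediction sources.

def non_linker_segments(length, linkers):
    """Return 1-based inclusive segments of a protein lying outside all linkers."""
    segments = []
    pos = 1  # first residue not yet assigned to a linker or segment
    for start, end in sorted(linkers):
        if start > pos:
            segments.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos <= length:
        segments.append((pos, length))
    return segments


# Linkers DL1-DL5 of 11020-H6 (494 amino acids), taken from the caption.
linkers_11020_h6 = [(19, 52), (106, 145), (215, 241), (325, 330), (373, 383)]
segments = non_linker_segments(494, linkers_11020_h6)
print(segments)
```

Two of the resulting segment starts, 53 and 331, coincide with the start positions of the DDBP fragments 53–225 and 331–494 reported in the caption.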
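Comparisons between experimental and DDBP prediction results*
| A: Plate-ID | B: Length (aa) | C: Signal peptide (SignalP) | D: TM region (TMHMM) | E: InterPro/InterProScan domains/fragments | F: Domain linkers (Domain Linker Finder) | G: PDB alignment | H: Experimental result | I: DDBP prediction | J: Accession number | K: Definition or ACEID |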
| 11058-C7 | 249 | no | no | 4–190; 1–220; 80–190 | no | 24%, (7–240/8–268, 288); 26%, (4–240/7–225, 251) | (1–249)# | 1–249 | NP_506406 | F20G2.1 | |
| 11048-D3 | 199 | no | no | 9–199 | 76–98, 181–181 | 26%, (5–160/3–164, 208) | (1–199)$ | 1–199 | NP_502315 | F35G2.2 | |
| 11011-D8 | 190 | no | no | 8–36, 47–75, 83–111; 2–107; | 148–172, 108–120, 82–92, 49–51 | 27%, (37–105/1–69, 146) | (45–190)& | 1–107, 52–190 | NP_493641 | F23F1.2 | |
| 18-A2 | 210 | no | no | 29–79; 108–194 | no | 30%, (75–193/4–115, 135) | (74–210)& | 75–210 | NP_491893 | BAG1 (human) homolog family member (bag-1) | |
| 11033-F3 | 208 | no | no | 6–74; 128–194; 1–97; 80–207 | 39–56 | 31%, (2–207/1–197, 198) | (1–208)# | 1–208 | NP_496863 | Glutathione S-Transferase family member (gst-16) | |
| 11-D11 | 346 | no | no | 80–317; 55–320; 219–317 | 19–34, 116–126 | 31%, (76–334/36–291, 298) | (56–346)& | 55–346 | NP_491872 | C55B7.3 | |
| 11104-F4 | 370 | no | no | 2–221, 1–370 | 126–143, 348–352, 19–25, 71–82, 292–297, 233–237 | 34%, (128–347/15–231, 265) | (125–370, 1–124)& | 1–125, 128–347 | NP_001040820 | Cell Division Cycle related family member (cdc-37) | |
| 79-D4 | 401 | no | no | 65–395; 37–400 | 66–108, 27–45, 108–125, 217–236, 138–146 | 35%, (212–395/2–185, 185) | (206–401)# | 212–401 | NP_491735 | C06A5.7b | |
| 9-H3 | 212 | no | no | no | 19–52, 162–192, 79–89 | 35%, (86–136/84–131, 217) | (59–212)$ | 53–212 | NP_493365 | Y40B1B.5 | |
| 76-D4 | 254 | no | no | 2–171; 3–250; 139–167 | 139–147 | 36%, (3–251/8–265, 278) | (1–254)# | 1–254 | NP_001021765 | Y47G6A.22 | |
| 8-C1 | 142 | no | no | 4–140 | no | 46%, (5–141/9–149, 150) | (1–142)& | 1–142 | NP_499813 | T12D8.6 | |
| 11-F6 | 327 | no | 209–231, 246–268 | 207–227, 246–266 | 56–95, 19–56, 95–136, 136–161 | 50%, (141–167/1–28, 163) | (1–182, 1–145)& | 1–135 | NP_491774 | T09B4.5a | |
| 1-F11 | 229 | no | no | 148–217; 170–201; 170–212 | 19–21, 129–134 | 59%, (135–220/20–107, 113) | (135–229)# | 135–229 | NP_506367 | F53F4.3 | |
| 3-H6 | 431 | no | no | 24–118, 141–234; 254–370; 261–370 | 397–413, 214–225, 100–107, 123–128 | 60%, (4–131/20–147, 149); 48%, (4–244/41–280, 280); 40%, (7–408/24–422, 457); 35%, (128–428/22–330, 336); | (1–135)# | 1–131, 128–244, 245–431 | NP_508026 | FK506-Binding protein family member (fkb-6) | |
| 20-H6 | 496 | no | no | 38–496; 186–261, 293–363, 422–483; 183–272, 275–374, 422–487 | 120–151, 265–279 | 66%, (394–496/1–103, 104); 75%, (183–269/1–87, 90); 75%, (290–366/1–78, 85) | (169–385, 386–496)& | 183–272, 290–366, 394–496 | NP_001022967 | U2AF splicing factor family member (uaf-1) | |
| 1-D10 | 206 | 1–21 | no | 41–198; 23–77, 85–141, 143–196 | no | no | (23–206)# | 22–206 | NP_491320 | R12E2.13 | |
| 11020-H6** | 494 | no | no | 53–225; 337–475; 336–453; 236–494 | 19–52, 106–145, 215–241, 373–383, 325–330 | no | (1–237, 238–494) &$ | 53–225, 236–494, 331–494 | NP_493412 | Y37H9A.3 | |
| 70-H8 | 130 | no | 107–129 | 109–129 | 19–104 | no | (1–130)# | 1–130 | NP_491052 | W03D8.3 | |
| 8-C9 | 183 | no | no | 125–159; | no | no | (1–183)# ; (28–183, 23–183)$ | 1–183 | NP_510277 | BMP receptor Associated protein family member (bra-1) | |
| 11005-B8 | 245 | no | no | no | 19–20, 129–136, 41–46 | no | (9–245)# | 1–245 | NP_740981 | R05F9.1b | |
| 18-F7 | 288 | no | no | 34–286 | 19–30, 72–78, 196–202, 66–70 | 45%, (35–286/21–274, 276) | (32–266)$ | 34–286 | NP_001021584 | EXOnuclease family member (exo-3) | |
| 4-F5 | 592 | no | no | 189–269 | 485–509, 125–142, 19–37, 369–382, 99–109, 311–317 | 35%, (210–269/15–74, 76) | (1–144)& | 1–124, 143–269 | NP_494544 | C16C8.16 | |
| 11011-C6 | 162 | no | no | 4–162; 5–54, 71–132 | 51–68 | no | (1–148, 1–124)& | 1–162 | NP_500324 | F42A6.6 | |
| 11058-H2 | 249 | no | no | 4–190; 4–230; 4–209 | no | 40%, (5–245/23–263, 267) | (1–222)$ | 1–249 | NP_506407 | F20G2.2 | |
| 76-F10 | 263 | no | no | 23–130; 33–109 | 118–126, 20–20 | 27%, (27–120/6–98, 108) | (1–129)& | 21–130 | T26031 | hypothetical protein W01A8.2 | |
| 79-H11 | 245 | no | no | 2–156, 1–241 | 144–181, 181–204, 121–144 | 51%, (2–154/2–154, 155) | (1–185)$ | 1–156 | NP_492567 | C03D6.5 | |
| 25-B11 | 302 | no | no | 23–127; 45–114 | 203–236, 19–24, 236–248, 146–161, 187–195 | no | (1–153)& | 23–145 | NP_492781 | B0511.7 | |
| 11-D3 | 313 | no | no | 28–214; 9–302 | 151–174, 19–22 | no | (1–313)# ; (213–313)$ | 23–313 | NP_001021333 | Suppressor of PResenilin defect family member (spr-2) | |
| 11058-F12 | 272 | no | no | 40–63; 40–68, 90–124 | no | 32%, (67–119/3–54, 60); 26%, (40–119/5–84, 87); 31%, (41–114/36–113, 124) | (1–147)& | 1–124 | NP_503566 | F36F12.8 | |
| 20-D7 | 500 | no | no | 311–395; 41–278; 283–436 | 387–417, 288–308, 453–469, 481–482 | 28%, (342–432/68–153, 289) | (1–500)# ; (298–500, 388–500, 407–500)$ | 1–287, 283–452 | NP_491868 | lariat DeBRanching enzyme related family member (dbr-1) | |
| 37-G9 | 245 | no | no | no | 206–227, 19–21 | no | (1–102, 103–245)& | 1–245 | NP_507040 | F14H3.6 | |
| 70-D2 | 265 | 1–25 | 15–37 | 69–243 | 223–247, 195–199 | 22%, (128–264/13–121, 135) | (1–130, 1–174)& | 1–265 | AAC25860 | Hypothetical protein C37C3.3 | |
| 2-B6 | 316 | no | no | 27–294 | 225–260, 174–191, 149–163, 296–298, 71–82, 84–84, 219–220 | 23%, (53–168/46–180, 201) | (71–294)# ; (104–316)$& | 1–295 | NP_501422 | D2096.8 | |
| 10-E5 | 274 | no | no | 1–80, 108–190 | 229–246, 193–213 | 25%, (17–161/32–166, 196) | (65–237, 1–74, 75–274)& | 1–192 | NP_502380 | C25G4.6 | |
| 3-D2 | 419 | no | no | 95–128, 133–166; 93–197; 133–166 | 196–217, 41–62, 259–275, 341–351, 19–21 | 26%, (95–239/13–144, 166); 32%, (99–197/8–95, 118) | (140–309, 290–419)& | 93–197, 93–258 | NP_495087 | C17G10.2 | |
| 113-H8 | 588 | no | no | 232–342; 241–334 | 492–515, 19–56, 81–132, 150–185 | 29%, (234–328/2–85, 105) | (1–345, 34–313)& | 232–342 | NP_740981 | R05F9.1b | |
| 76-F6 | 803 | no | 769–791 | 17–250, 558–652; 10–34, 111–148, 382–402 | 752–767, 19–24, 717–727, 280–281 | 29%, (62–257/5–200, 205); 29%, (83–257/1–175, 181) | (1–181)& | 25–257 | NP_491008 | alpha-CaTuliN (catenin/vinculin related) family member (ctn-1) | |
| 4-A4 | 569 | no | no | 164–338, 369–527; 187–200, 206–222, 263–279, 305–321, 321–335, 506–527; 263–318, 388–452, 511–568 | 65–97, 19–38, 112–139 | 32%, (154–567/18–397, 402) | (334–542, 334–501)&; (1–569)# | 154–549 | NP_495753 | associated with RAN (nuclear import/export) function family member (ran-3) | |
| 25-H8 | 339 | no | no | 24–84; 24–75 | 215–276, 111–169, 187–201, 100–109 | 41%, (22–75/7–61, 70) | (1–152)& | 1–99 | NP_495652 | T09A5.8 | |
| 2-H9 | 356 | no | no | 22–273; 1–108, 115–297; 13–284 | 192–215, 303–312, 338 | 42%, (1–284/1–289, 382) | (1–335, 1–356)$& | 1–197 | NP_497949 | T23F11.1 | |
| 18-H1 | 208 | no | no | 32–124; 36–113; 24–45, 51–68, 131–145, 164–181, 186–205 | 171–190, 114–171, 19–40 | 43%, (41–110/10–76, 90) | (1–81)&$ | 32–124 | NP_510410 | HIStone family member (his-24) | |
| 11049-D6 | 435 | no | no | 293–433; 270–433 | 110–152, 234–260, 175–186, 375–381 | no | (1–156)# ; (9–158, 1–119)$& | 261–435 | NP_001041025 | Y41E3.7a | |
| 9-G11 | 250 | no | no | 1–194 | 228–232 | no | (35–250)& | 1–227 | NP_497076 | R05H10.1 | |
| 10-E1 | 251 | no | no | no | 158–199 | no | (1–206)& | 1–251 | NP_496943 | W01G7.4 | |
| 37-F11 | 230 | no | no | 1–230 | 106–126, 168–182 | no | (45–179, 45–230)& | 1–230 | NP_507024 | T10C6.5 | |
| 75-A8 | 228 | no | no | no | 19–44, 125–135 | no | (1–189)$& | 1–228 | NP_492509 | F46A9.1 | |
| 11048-E2 | 262 | no | no | 33–178; 61–179 | 183–198 | no | (1–262)$ | 1–182 | NP_501337 | MEChanosensory abnormality family member (mec-17) |
* Column A: prediction accuracy level (I, accurate; II, basically accurate; III, wrong) and Plate-ID; users can query sequence-related information on the SGCE web site [41]. Column B: number of amino acids in the ORF. Column C: signal peptide prediction results using SignalP. Column D: transmembrane region prediction results using TMHMM. Column E: domains/fragments from InterPro/InterProScan analysis. Column F: domain linker regions predicted by Domain Linker Finder. Column G: PDB alignment results, including the percentage of sequence identity, the query/subject sequence start and end positions, and the length of the subject sequence. Column H: experimental results from protein crystal/three-dimensional structures (labeled with #), limited proteolysis (labeled with $), or spontaneous degradation (labeled with &). Column I: DDBP prediction results. Column J: accession number; users can obtain sequence-related information from the National Center for Biotechnology Information (NCBI) [40]. Column K: definition of the protein, or ACEID for proteins without known functions.
** 11020-H6 corresponds to region 299–792 of NP_493412.
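Column G packs several values into one cell, in the order given in the footnote: identity, query start–end, subject start–end, and subject length. A minimal parsing sketch follows; the function and field names are hypothetical, and the pattern tolerates both en dashes and hyphens as well as stray spaces inside ranges.

```python
import re

# Pattern for one alignment entry such as "60%, (4–131/20–147, 149)".
ALIGNMENT = re.compile(
    r"(\d+)%,\s*\((\d+)\s*[–-]\s*(\d+)\s*/\s*(\d+)\s*[–-]\s*(\d+),\s*(\d+)\)"
)


def parse_pdb_alignments(cell):
    """Parse a column G cell into a list of alignment records (empty for 'no')."""
    records = []
    for match in ALIGNMENT.finditer(cell):
        identity, q_start, q_end, s_start, s_end, s_len = map(int, match.groups())
        records.append({
            "identity_percent": identity,
            "query_start": q_start,
            "query_end": q_end,
            "subject_start": s_start,
            "subject_end": s_end,
            "subject_length": s_len,
        })
    return records


# First two alignments listed for 3-H6 in the table.
cell_3_h6 = "60%, (4–131/20–147, 149); 48%, (4–244/41–280, 280)"
alignments = parse_pdb_alignments(cell_3_h6)
for record in alignments:
    print(record)
```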
Constructs, soluble expression and purification results of the 57 proteins used for testing the DDBP method
| Well row | Well column | Accession number | Start position | End position | Length (aa) | Protein definition | Soluble expression level (18°C) | Soluble expression level (37°C) | Purified |
| A | 2 | NP_493355 | 1 | 300 | 300 | C01A2.5 | medium | high | Yes |
| A | 3 | NP_001022737 | 1 | 264 | 264 | X-box Binding Protein homolog family member (xbp-1) | medium | high | not |
| A | 7 | NP_498947 | 1 | 282 | 282 | PeRoXisome assembly factor family member (prx-19) | high | high | Yes |
| A | 9 | NP_497226 | 1 | 253 | 253 | W06E11.4 | medium | medium | not |
| A | 10 | T26925 | 1 | 195 | 195 | hypothetical protein Y45F10C.5 | medium | not soluble | not |
| A | 11 | NP_495146 | 1 | 218 | 218 | K05F1.9 | medium | high | not |
| B | 1 | NP_495062 | 1 | 210 | 210 | Helix Loop Helix family member (hlh-26) | high | high | not |
| B | 2 | NP_496422 | 1 | 225 | 225 | B0491.3 | high | low | Yes |
| B | 3 | NP_495475 | 1 | 197 | 240 | F10E7.2 | not soluble | low | not |
| B | 4 | NP_496547 | 21 | 284 | 284 | W03C9.1 | medium | high | not |
| B | 5 | NP_496156 | 29 | 184 | 184 | R53.8 | low | high | not |
| B | 8 | NP_501161 | 1 | 250 | 250 | F42C5.3 | medium | high | Yes |
| B | 9 | NP_502163 | 1 | 319 | 319 | C10C6.3 | high | high | Yes |
| B | 10 | NP_500772 | 55 | 348 | 368 | ZK354.6 | not soluble | not soluble | not |
| B | 11 | NP_501789 | 1 | 297 | 297 | F25H8.1 | high | high | Yes |
| C | 1 | NP_501895 | 1 | 294 | 294 | R09E10.1 | medium | high | Yes |
| C | 2 | NP_500890 | 1 | 243 | 243 | H32C10.2 | high | high | not |
| C | 3 | NP_501981 | 1 | 388 | 388 | R102.5a | low | low | not |
| C | 4 | NP_507039 | 1 | 196 | 196 | F14H3.5 | high | low | not |
| C | 5 | NP_506094 | 1 | 183 | 183 | F23H12.3 | medium | low | not |
| C | 6 | NP_506094 | 1 | 90 | 183 | F23H12.3 | not soluble | medium | Yes |
| C | 8 | NP_505964 | 20 | 260 | 260 | T04F3.2 | medium | medium | not |
| C | 9 | NP_506495 | 1 | 252 | 252 | D1086.4 | high | high | not |
| C | 10 | NP_741113 | 1 | 394 | 419 | C32A3.3a | not soluble | not soluble | not |
| C | 11 | NP_501199 | 1 | 299 | 299 | F55G1.9 | not soluble | low | not |
| D | 1 | NP_495021 | 1 | 197 | 197 | EEED8.12 | high | high | not |
| D | 6 | NP_502315 | 1 | 199 | 199 | F35G2.2 | high | high | Yes |
| D | 10 | NP_491869 | 1 | 232 | 232 | MeDiaTor family member (mdt-18) | medium | low | not |
| E | 1 | NP_501936 | 1 | 190 | 190 | F01D4.5b | medium | low | not |
| E | 2 | NP_506929 | 26 | 144 | 206 | F57A10.4 | not soluble | low | not |
| E | 3 | NP_506929 | 26 | 206 | 206 | F57A10.4 | not soluble | not soluble | not |
| E | 5 | NP_491210 | 1 | 249 | 249 | T12F5.1 | high | high | not |
| E | 7 | NP_506245 | 35 | 240 | 240 | R186.3 | low | not soluble | not |
| E | 8 | NP_495941 | 1 | 269 | 308 | T24H10.1 | high | medium | Yes |
| E | 10 | NP_510298 | 1 | 269 | 269 | AMP-Activated Kinase Beta subunit family member (aakb-1) | medium | low | not |
| F | 1 | NP_492285 | 1 | 239 | 239 | F02E9.5 | medium | low | not |
| F | 2 | NP_509787 | 1 | 195 | 195 | F13E6.1 | medium | high | not |
| F | 3 | AAZ82857 | 1 | 230 | 230 | Hypothetical protein C17H12.13 | not soluble | not soluble | not |
| F | 4 | NP_493382 | 1 | 210 | 210 | Y87G2A.10 | high | medium | Yes |
| F | 5 | NP_497990 | 1 | 214 | 214 | C38D4.9 | high | high | Yes |
| F | 10 | NP_498391 | 1 | 217 | 217 | C56G2.15 | high | high | Yes |
| G | 2 | NP_493230 | 1 | 183 | 183 | W02A11.2 | high | not soluble | Yes |
| G | 3 | NP_492005 | 1 | 189 | 189 | F22D6.2 | high | high | Yes |
| G | 4 | NP_492692 | 1 | 206 | 206 | Y106G6E.4 | high | medium | Yes |
| G | 5 | NP_492795 | 1 | 207 | 207 | C34B2.5 | high | medium | Yes |
| G | 6 | NP_491736 | 1 | 214 | 214 | C06A5.2 | medium | high | not |
| G | 7 | NP_491358 | 1 | 233 | 233 | ZK973.9 | low | medium | not |
| G | 8 | NP_492301 | 1 | 240 | 240 | D1081.9 | not soluble | not soluble | not |
| G | 9 | NP_492301 | 1 | 65 | 240 | D1081.9 | high | high | Yes |
| G | 11 | NP_491721 | 1 | 273 | 273 | B0207.11 | high | high | not |
| H | 2 | NP_491965 | 1 | 274 | 274 | T21G5.4 | high | low | Yes |
| H | 3 | NP_491348 | 1 | 287 | 287 | Y47D9A.2a | not soluble | not soluble | not |
| H | 4 | NP_491903 | 27 | 330 | 363 | D2092.4 | high | medium | not |
| H | 5 | NP_496803 | 1 | 183 | 183 | F15D4.2 | medium | low | not |
| H | 6 | NP_491434 | 1 | 177 | 177 | C10H11.7 | low | low | not |
| H | 8 | NP_495527 | 30 | 179 | 179 | F45E12.5b | high | high | Yes |
| H | 9 | NP_494315 | 1 | 276 | 276 | F22E5.8 | not soluble | not soluble | not |
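The Figure 6 caption reports seven proteins as not soluble after comparing the 18°C and 37°C results. Reading that as "not soluble at both temperatures" (an interpretation, not a rule stated explicitly in the source), the same seven wells can be recovered from the table. The sketch uses a subset of rows copied from the table, including every row with at least one "not soluble" entry in the chosen wells.

```python
# (well, level at 18°C, level at 37°C) for a subset of rows copied from the table.
rows = [
    ("A10", "medium", "not soluble"),
    ("B3", "not soluble", "low"),
    ("B10", "not soluble", "not soluble"),
    ("C6", "not soluble", "medium"),
    ("C10", "not soluble", "not soluble"),
    ("E3", "not soluble", "not soluble"),
    ("F3", "not soluble", "not soluble"),
    ("G2", "high", "not soluble"),
    ("G8", "not soluble", "not soluble"),
    ("H3", "not soluble", "not soluble"),
    ("H9", "not soluble", "not soluble"),
]


def insoluble_wells(table_rows):
    """Wells whose protein was not soluble at either temperature (assumed rule)."""
    return [
        well
        for well, level_18, level_37 in table_rows
        if level_18 == "not soluble" and level_37 == "not soluble"
    ]


not_soluble = insoluble_wells(rows)
print(not_soluble)
```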
Figure 6. Soluble expression results of the 57 proteins used for testing the DDBP method. ELISA results for soluble expression at 18°C and 37°C. Shading in the panels indicates the expression level: dark gray for high, gray for medium, white for low, and black for not expressed, as determined by comparison with the positive controls (A12 and B12, each containing one soluble protein). If the ELISA reading of OD (optical density) at 405 nm was equal to or higher than the lower of the two positive-control values, the protein in that well was considered expressed. Wells C12 and D12 are negative controls, and blank wells (white with no numbers) are empty. After comparing the results at 18°C and 37°C, seven proteins (wells B10, C10, E3, F3, G8, H3, and H9) were considered not soluble.
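The expression call described in the caption reduces to a comparison against the lower of the two positive-control readings. A minimal sketch, with hypothetical OD values because the actual readings are not given here:

```python
# Sketch of the expression call: a well counts as expressed when its OD405
# reading is at least the lower of the positive-control readings (A12, B12).

def is_expressed(od405, positive_controls):
    """Return True if the reading reaches the lower positive-control value."""
    return od405 >= min(positive_controls)


# Hypothetical readings, for illustration only.
controls = (0.50, 0.90)  # wells A12 and B12
readings = {"well_x": 0.72, "well_y": 0.50, "well_z": 0.31}
calls = {well: is_expressed(od, controls) for well, od in readings.items()}
print(calls)
```

The thresholds separating high, medium, and low expression are not stated in the caption, so the sketch covers only the expressed/not-expressed decision.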
Figure 7. Purification results of the 57 proteins used for testing the DDBP method. Purification results are shown for 15 of the 20 purified proteins. The name of each SDS-PAGE gel has two parts; for example, in B2 (NP_496422), B2 is the well shown in Figure 6 and Table 3, and NP_496422 is the accession number of the protein in the public database [40]. Bands labeled "Cut" correspond to samples after thrombin cleavage, and those labeled "Uncut" to samples before cleavage. "Aa" stands for the amino acid range of the purified protein.
Figure 2. A multi-step laddered PCR protocol. With this protocol, template DNA was amplified for 34 cycles: 5 minutes at 95°C for initial denaturation, then cycles of 20 seconds at 94°C for denaturation, 30 seconds for annealing, and 140 seconds at 68°C for extension, followed by 10 minutes at 68°C for final extension. The annealing temperature varied: it started at a relatively high temperature (55°C) and decreased by 1–2 degrees each time until it reached 46°C. It then increased by 5 degrees and stabilized at 51°C.
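The annealing ladder can be written out as a per-cycle schedule. The caption does not say how many cycles are run at each temperature or exactly where 1-degree versus 2-degree steps are used, so the sketch below assumes one cycle per temperature and a uniform 1-degree step; both are assumptions made for illustration. Hold times are taken from the caption and exclude ramping.

```python
# Sketch of the laddered annealing schedule under stated assumptions:
# one cycle per ladder temperature and a uniform 1 °C step (both assumed).

def annealing_schedule(total_cycles=34, start=55, floor=46, step=1, plateau=51):
    """Return the annealing temperature (°C) for each cycle."""
    temps = []
    temp = start
    while temp >= floor and len(temps) < total_cycles:
        temps.append(temp)
        temp -= step
    # Remaining cycles run at the plateau temperature (floor + 5 = 51 °C).
    temps.extend([plateau] * (total_cycles - len(temps)))
    return temps


schedule = annealing_schedule()
print(schedule)

# Total hold time from the caption: 5 min initial denaturation, 34 cycles of
# 20 s + 30 s + 140 s, and 10 min final extension (ramp times not included).
hold_seconds = 5 * 60 + len(schedule) * (20 + 30 + 140) + 10 * 60
print(hold_seconds, "seconds")
```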