Pablo A Lee, Jessica S Dymond, Lisa Z Scheifele, Sarah M Richardson, Katrina J Foelber, Jef D Boeke, Joel S Bader.
Abstract
Synthetic biology projects aim to produce physical DNA that matches a designed target sequence. Chemically synthesized oligomers are generally used as the starting point for building larger and larger sequences. Due to the error rate of chemical synthesis, these oligomers can have many differences from the target sequence. As oligomers are joined together to make larger and larger synthetic intermediates, it becomes essential to perform quality control to eliminate intermediates with errors and retain only those DNA molecules that are error free with respect to the target. This step is often performed by transforming bacteria with synthetic DNA and sequencing colonies until a clone with a perfect sequence is identified. Here we present CloneQC, a lightweight software pipeline available as a free web server and as source code that performs quality control on sequenced clones. Input to the server is a list of desired sequences and forward and reverse reads for each clone. The server generates summary statistics (error rates and success rates target-by-target) and a detailed report of perfect clones. This software will be useful to laboratories conducting in-house DNA synthesis and is available at http://cloneqc.thruhere.net/ and as Berkeley Software Distribution (BSD) licensed source.
Year: 2010 PMID: 20211841 PMCID: PMC2860120 DOI: 10.1093/nar/gkq093
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flowchart of the CloneQC sequence validation pipeline. See ‘Materials and Methods’ section for details.
Figure 2. Summary results for CloneQC run as a web application. ‘STATISTICS’ provides summary statistics for each synthetic target matched by at least one clone. Following are summary tables for each target sequence (only the first two shown), giving the identities of clones that contain perfect physical DNA for the target (Passing Clones), have discrepancies between reads but may have perfect physical DNA (Check Clones) or have errors that are fixable by reamplification (Fixable Clones).
Column descriptions for detailed results spreadsheet with one record per clone
| Column | Description |
|---|---|
| key1 | Clone unique identifier (primary key). |
| bb_id | Target sequence unique identifier, taken from the best match among target sequences provided in the fasta file of targets. |
| length | Length of the target sequence. |
| overallqc | Overall QC for the clone: PASS, FAIL, CHECK, FIXABLE or NA if Reverse Complement QC or Match QC failed. |
| mutnqc | Mutation QC: PASS if no mutations, FAIL if 1 or more mutations (some of which may be fixable), or NA if Reverse Complement QC or Match QC failed. |
| revcomqc | Reverse complement QC: PASS if exactly one read must be reverse complemented; FAIL otherwise. This QC is specific for a workflow in which each clone has a forward and reverse read; it can be modified for workflows that provide different numbers of reads per clone. |
| matchqc | Match QC for all reads matching the same target sequence: PASS if all reads match the same target; FAIL otherwise. |
| pctid | Percent identity of the reads to the target sequence, taken as the number of matches in the three-way sequence alignment of the target with the two reads relative to the target length. |
| PF | Percent identity for the matching region of the forward read and the target. |
| PR | Percent identity for the matching region of the reverse read and the target. |
| LF | Length of the forward read. |
| LR | Length of the reverse read. |
| read1 | Filename of the forward read. For the workflow described, this file is |
| read2 | Filename of the reverse read, either |
| read_extra | Filenames of extra reads provided for the clone. |
| n_ins | Number of insertion synthesis errors (a multi-base insertion counts as 1 error). |
| n_del | Number of deletion synthesis errors (a multi-base deletion counts as 1 error). |
| n_sub | Number of substitution synthesis errors (multi-base substitutions count as 1 error). |
| n_chk | Number of regions to check for possible errors where individual reads disagree, with one matching the target sequence (multi-base regions count as 1 error). |
| n_tot | Sum of n_ins, n_del, n_sub, n_chk. |
| mutnstr | Space-delimited list of the errors. Each list item has the form |
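The column definitions above imply a simple decision order for each clone record. The following is a minimal Python sketch of that logic, not the CloneQC source: the names `CloneRecord`, `revcom_qc`, `overall_qc` and `reverse_complement` are hypothetical, and the `fixable` flag is supplied by the caller because the table does not state how fixable errors are identified. The precedence (NA, then PASS, then CHECK, then FIXABLE or FAIL) is likewise an assumption drawn from the descriptions.

```python
from dataclasses import dataclass

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class CloneRecord:
    """Error counts for one clone, named after the spreadsheet columns."""
    n_ins: int = 0
    n_del: int = 0
    n_sub: int = 0
    n_chk: int = 0

    @property
    def n_tot(self) -> int:
        # n_tot is defined as the sum of the four counts.
        return self.n_ins + self.n_del + self.n_sub + self.n_chk


def revcom_qc(n_reads_reversed: int) -> str:
    """PASS if exactly one read had to be reverse complemented."""
    return "PASS" if n_reads_reversed == 1 else "FAIL"


def overall_qc(revcomqc: str, matchqc: str, rec: CloneRecord,
               fixable: bool = False) -> str:
    """Combine the per-clone checks into a single label (assumed precedence)."""
    if revcomqc != "PASS" or matchqc != "PASS":
        return "NA"  # upstream QC failed, so mutation calls are not meaningful
    n_mut = rec.n_ins + rec.n_del + rec.n_sub
    if n_mut == 0 and rec.n_chk == 0:
        return "PASS"
    if n_mut == 0:
        return "CHECK"  # reads disagree, but one read matches the target
    # Assumption: the caller decides whether the errors are fixable.
    return "FIXABLE" if fixable else "FAIL"
```

For example, `overall_qc(revcom_qc(1), "PASS", CloneRecord(n_chk=1))` yields `CHECK` under these assumptions, since the only discrepancies are regions where the two reads disagree.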
Comparison of human and computer quality control assessments for sequences from 133 different clones
| Computer (rows) / Human (columns) | PASS | FIXABLE | FAIL | UNCLEAR |
|---|---|---|---|---|
| PASS | 21 | 0 | 0 | 0 |
| CHECK | 7 | 0 | 1 | 0 |
| FIXABLE | 0 | 4 | 0 | 0 |
| FAIL | 0 | 0 | 97 | 3 |
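The counts in this comparison can be tallied directly. The short Python sketch below is illustrative arithmetic only: it assumes rows are computer calls and columns are human calls, and the variable names are hypothetical. It sums the table, counts the clones where both assessments use the same label, and totals each row and column.

```python
# Rows: computer call; columns: human call (assumed orientation).
counts = {
    "PASS":    {"PASS": 21, "FIXABLE": 0, "FAIL": 0,  "UNCLEAR": 0},
    "CHECK":   {"PASS": 7,  "FIXABLE": 0, "FAIL": 1,  "UNCLEAR": 0},
    "FIXABLE": {"PASS": 0,  "FIXABLE": 4, "FAIL": 0,  "UNCLEAR": 0},
    "FAIL":    {"PASS": 0,  "FIXABLE": 0, "FAIL": 97, "UNCLEAR": 3},
}

# Total number of clones in the comparison.
total = sum(sum(row.values()) for row in counts.values())

# Clones where computer and human assigned the identical label.
# CHECK has no human counterpart, so it contributes 0 here.
same_label = sum(row.get(label, 0) for label, row in counts.items())

# Per-label totals for the computer (rows) and the human (columns).
computer_totals = {label: sum(row.values()) for label, row in counts.items()}
human_totals = {}
for row in counts.values():
    for label, n in row.items():
        human_totals[label] = human_totals.get(label, 0) + n

print(total)                            # 133
print(same_label)                       # 122
print(f"{same_label / total:.1%}")      # 91.7%
print(computer_totals["CHECK"])         # 8
```

The total of 133 matches the number of clones in the table title. Of the 11 clones without an identical label, 8 are computer CHECK calls (7 judged PASS and 1 judged FAIL by the human) and 3 are computer FAIL calls that the human marked UNCLEAR.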