Matthew J McGuffie, Jeffrey E Barrick.
Abstract
Engineered plasmids are widely used in the biological sciences. Since many plasmids contain DNA sequences that have been reused and remixed by researchers for decades, annotation of their functional elements is often incomplete. Missing information about the presence, location, or precise identity of a plasmid feature can lead to unintended consequences or failed experiments. Many engineered plasmids contain sequences (such as recombinant DNA from all domains of life, wholly synthetic DNA sequences, and engineered gene expression elements) that are not predicted by microbial genome annotation pipelines. Existing plasmid annotation tools have limited feature libraries and do not detect incomplete fragments of features that are present in many plasmids for historical reasons and may impact their newly designed functions. We created the open-source pLannotate web server so users can quickly and comprehensively annotate plasmid features. pLannotate is powered by large databases of genetic parts and proteins. It employs a filtering algorithm to display only the most relevant feature matches and also reports feature fragments. Finally, pLannotate displays a graphical map of the annotated plasmid, explains the provenance of each feature prediction, and allows results to be downloaded in a variety of formats. The web server for pLannotate is accessible at: http://plannotate.barricklab.org/.
Year: 2021 PMID: 34019636 PMCID: PMC8262757 DOI: 10.1093/nar/gkab374
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. pLannotate web server workflow. Users input a plasmid sequence as a FASTA file, a GenBank file, or raw text. This sequence is queried against various feature databases. Hits are compiled and filtered to ensure that the most informative matches are reported. Annotation results are displayed as a graphical plasmid map for users to interact with and can be downloaded in different formats.
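The first step of the workflow in Figure 1, accepting FASTA, GenBank, or raw text input, can be illustrated with a minimal sketch. This is not pLannotate's own code: the function names `detect_format` and `extract_sequence` are hypothetical, only the first FASTA record is read, and the allowed alphabet (IUPAC nucleotide codes) is an assumption.

```python
import re


def detect_format(text: str) -> str:
    """Guess the input format from the first non-blank characters."""
    stripped = text.lstrip()
    if stripped.startswith(">"):
        return "fasta"
    if stripped.startswith("LOCUS"):
        return "genbank"
    return "raw"


def extract_sequence(text: str) -> str:
    """Return an uppercase nucleotide string from FASTA, GenBank, or raw text."""
    fmt = detect_format(text)
    lines = text.strip().splitlines()
    if fmt == "fasta":
        # Skip the header line and stop at the next record, if any.
        body = []
        for line in lines[1:]:
            if line.startswith(">"):
                break
            body.append(line)
    elif fmt == "genbank":
        # Sequence lines sit between the ORIGIN line and the closing "//".
        body = []
        in_origin = False
        for line in lines:
            if line.startswith("ORIGIN"):
                in_origin = True
                continue
            if line.startswith("//"):
                break
            if in_origin:
                body.append(line)
    else:
        body = lines
    # Drop position numbers and whitespace, then normalize case.
    seq = re.sub(r"[\d\s]", "", "".join(body)).upper()
    # Assumption: accept IUPAC nucleotide codes only.
    if not seq or re.search(r"[^ACGTURYKMSWBDHVN]", seq):
        raise ValueError("input does not look like a nucleotide sequence")
    return seq
```

All three input styles reduce to the same plain string, so the later search steps only need to handle one representation.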
Databases and search programs used by pLannotate to identify plasmid features
| Database | Number of features | Search program | | Percent identity cutoff |
|---|---|---|---|---|
| | 13,240 | BLASTN | 1 | 98% |
| | 762 | DIAMOND | 0.001 | 98% |
| | 547,899 | DIAMOND | 0.001 | 10% |
| | 3,940 | Infernal | 1 | N/A |
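As an illustration of how per-database cutoffs like those in the table above might be applied to search hits, here is a minimal sketch. Several things are assumptions rather than facts from the source: the numeric column before the percent identity cutoff is treated as an E-value threshold, both comparisons are treated as inclusive, and the row keys (`row1_blastn` and so on) are hypothetical placeholders because the database names are not shown in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cutoff:
    max_evalue: float                      # assumed meaning of the unlabeled column
    min_percent_identity: Optional[float]  # None where the table lists N/A


# One entry per table row, in table order. Keys are hypothetical placeholders.
CUTOFFS = {
    "row1_blastn": Cutoff(1.0, 98.0),
    "row2_diamond": Cutoff(0.001, 98.0),
    "row3_diamond": Cutoff(0.001, 10.0),
    "row4_infernal": Cutoff(1.0, None),
}


@dataclass(frozen=True)
class Hit:
    source: str                        # key into CUTOFFS
    evalue: float
    percent_identity: Optional[float]  # None if the search program reports none


def passes(hit: Hit) -> bool:
    """Return True if the hit meets the cutoffs for its source row."""
    cutoff = CUTOFFS[hit.source]
    if hit.evalue > cutoff.max_evalue:
        return False
    if cutoff.min_percent_identity is None:
        return True
    return (
        hit.percent_identity is not None
        and hit.percent_identity >= cutoff.min_percent_identity
    )
```

Keeping the thresholds in one lookup table means a change to a cutoff touches a single line rather than the filtering logic.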
Figure 2. Annotation results for three engineered plasmids. Each panel compares output from the pLannotate web server, the SnapGene viewer desktop program, and the PlasMapper web server. SnapGene and PlasMapper both have options to display generic open reading frames above a length cutoff that do not match specific features in their databases. These ORFs are displayed as thin arrows for SnapGene results and pink arrows for PlasMapper results. pLannotate displays features in different colors based on their type if they are matches to the GenoLIB feature database and in tan if they are matches to proteins in Swiss-Prot. Fragmentary matches are shown as unfilled arrows. Labels were manually edited to improve legibility for all tools. (A) Plasmid 7M8 is used for adeno-associated virus (AAV) vector production. (B) Plasmid pAM5505 is a helper plasmid for engineering cyanobacteria via conjugal transfer of DNA. (C) Plasmid pTECH-chPyIRS(IPYE) contains an engineered aminoacyl-tRNA synthetase enzyme. Note the partial TcR open reading frame downstream of the tet promoter that is discussed in the text.
Figure 3. Incomplete fragments of genes and other genetic parts are common in engineered plasmids. We used pLannotate to annotate the sequences of 10,000 plasmids that are available from Addgene and analyzed predictions of putative feature fragments. (A) Overall abundance of fragments derived from different types of features. Protein coding regions are divided into the CDS category for matches to the GenoLIB or FPbase databases and the swissprot category for matches to the Swiss-Prot database. (B) Identities of the most common feature fragments. (C) Histogram of the percentage of the bases in each plasmid that are derived from predicted feature fragments. The orange line at 1.7% represents the median.
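The per-plasmid quantity plotted in panel C, the percentage of bases derived from predicted feature fragments, can be computed along the following lines. This is a sketch under stated assumptions, not the authors' analysis code: fragments are given as half-open `(start, end)` intervals, overlapping fragments are counted once, wrap-around across the origin of a circular plasmid is not handled, and the function names are hypothetical.

```python
from statistics import median


def covered_bases(intervals):
    """Count bases covered by half-open (start, end) intervals, merging overlaps."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged interval before starting a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def fragment_percentage(plasmid_length, fragment_intervals):
    """Percentage of plasmid bases that fall inside at least one fragment."""
    return 100.0 * covered_bases(fragment_intervals) / plasmid_length
```

Applying `fragment_percentage` to every plasmid in a collection and passing the resulting list to `statistics.median` gives the kind of summary value marked by the line in the histogram.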