Alex Orlek, Hang Phan, Anna E Sheppard, Michel Doumith, Matthew Ellington, Tim Peto, Derrick Crook, A Sarah Walker, Neil Woodford, Muna F Anjum, Nicole Stoesser.
Abstract
Thousands of plasmid sequences are now publicly available in the NCBI nucleotide database, but they are not reliably annotated to distinguish complete plasmids from plasmid fragments, such as gene or contig sequences; retrieving complete plasmids for downstream analyses is therefore challenging. Here we present a curated dataset of complete bacterial plasmids from the clinically relevant Enterobacteriaceae family. The dataset was compiled from the NCBI nucleotide database using curation steps designed to exclude incomplete plasmid sequences and chromosomal sequences misannotated as plasmids. Over 2000 complete plasmid sequences are included in the curated plasmid dataset. Protein sequences produced by translating each complete plasmid nucleotide sequence in all six reading frames are also provided. Further analysis and discussion of the dataset are presented in an accompanying research article: "Ordering the mob: insights into replicon and MOB typing…" (Orlek et al., 2017) [1]. The curated plasmid sequences are publicly available in the Figshare repository.
Keywords: Complete genomes; Enterobacteriaceae family; Plasmids; Sequence data curation
Year: 2017 PMID: 28516137 PMCID: PMC5426034 DOI: 10.1016/j.dib.2017.04.024
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
| Subject area | Microbiology, Bioinformatics |
| More specific subject area | Plasmids |
| Type of data | Sequence data |
| How data was acquired | Plasmid nucleotide sequences were compiled from GenBank and RefSeq accessions contained within the NCBI nucleotide database. Corresponding protein sequences were generated by translating each plasmid nucleotide sequence in all six reading frames. |
| Data format | FASTA files, GenBank files (zipped) |
| Experimental factors | N/A |
| Experimental features | N/A |
| Data source location | Sequences were retrieved from the NCBI nucleotide database |
| Data accessibility | Data is publicly available in the Figshare repository. |
| DOI: |
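The six-frame translation mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline, and the tool they used is not stated here. The sketch assumes the standard codon assignments (which the bacterial translation table shares for amino acids), treats the sequence as linear rather than circular, and writes any codon containing an ambiguous base as `X`. The function names and the frame labels (`+1` to `+3`, `-1` to `-3`) are hypothetical.

```python
from itertools import product

# Codon table built in TCAG order (assumption: standard amino acid assignments).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate a linear DNA sequence in three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Stop codons are kept as `*` so that each frame's translation covers the full sequence. Open reading frames that span the origin of a circular plasmid would not be captured by this linear treatment.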