Anish K Pandey1, David W Cleary1, Jay R Laver1, Martin C J Maiden2, Xavier Didelot3, Andrew Gorringe4, Robert C Read1.
Abstract
We present the high-quality, complete genome assembly of Neisseria lactamica Y92-1009, a member of the genus Neisseria and the strain used to manufacture an outer membrane vesicle (OMV)-based vaccine. The strain is available on request from the Public Health England Meningococcal Reference Unit. This Gram-negative, diplococcoid bacterium is of worldwide clinical interest because its carriage in the human nasopharynx is inversely related to the incidence of meningococcal disease, which is caused by Neisseria meningitidis. The organism sequenced was isolated during a school carriage survey in Northern Ireland in 1992 and has been the subject of a variety of laboratory and clinical studies. Four SMRT cells on an RSII machine from Pacific Biosciences were used to produce a complete, closed genome assembly. Sequence data were obtained for a total of 30,180,391 bases from 2621 reads and assembled using the HGAP algorithm. The assembly was corrected using short reads obtained from an Illumina HiSeq 2000 instrument. This resulted in a 2,146,723 bp assembly with approximately 460-fold mean coverage depth and a GC content of 52.3%.
Keywords: Bacteria; Commensal; Genome assembly; Nasopharyngeal microflora; Neisseria; SMRT cell sequencing; Short read sequencing
Year: 2017 PMID: 28770026 PMCID: PMC5525351 DOI: 10.1186/s40793-017-0250-6
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Classification and general features of Neisseria lactamica strain Y92–1009 according to MIGS specification [36]
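| MIGS ID | Property | Term | Evidence code (a) |
|---|---|---|---|
| | Classification | Domain Bacteria | TAS [ |
| | | Phylum Proteobacteria | TAS [ |
| | | Class Betaproteobacteria | TAS [ |
| | | Order Neisseriales | TAS [ |
| | | Family Neisseriaceae | TAS [ |
| | | Genus Neisseria | TAS [ |
| | | Species Neisseria lactamica | TAS [ |
| | | Strain: Y92–1009 | |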
| | Gram stain | Negative | IDA |
| | Cell shape | Diplococcus | IDA |
| | Motility | Non-motile but piliated | TAS [ |
| | Sporulation | Not reported | NAS |
| | Temperature range | 32–39 °C | IDA |
| | Optimum temperature | 37.0 °C | IDA |
| | pH range; Optimum | 3.5–6.5; 5 | TAS |
| | Carbon source | Glucose, Maltose, Lactose | TAS [ |
| MIGS-6 | Habitat | Human Nasopharynx | TAS [ |
| MIGS-6.3 | Salinity | 0.9% | TAS [ |
| MIGS-22 | Oxygen requirement | Aerobe | TAS [ |
| MIGS-15 | Biotic relationship | commensal | TAS [ |
| MIGS-14 | Pathogenicity | Non-pathogen | TAS [ |
| MIGS-4 | Geographic location | Londonderry, Northern Ireland | TAS [ |
| MIGS-5 | Sample collection | 1992 | TAS [ |
| MIGS-4.1 | Latitude | 54.9966 N | NAS |
| MIGS-4.2 | Longitude | 7.3086 W | NAS |
| MIGS-4.4 | Altitude | 128 m | NAS |
(a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [2]
(b) Data for isolate geographic location and sample collection were acquired by searching for N. lactamica Y92–1009 (ID number: 4945) on pubMLST Neisseria
Fig. 1 Photomicrograph of N. lactamica Y92–1009. This image was obtained by transmission electron microscopy and displays the diplococcoid nature of the N. lactamica Y92–1009 cell. The size of the cell is indicated in micrometres (μm)
Fig. 2 Phylogenetic tree indicating the position of N. lactamica Y92–1009 amongst other pathogenic and commensal Neisseria. This tree was constructed based on a core genome comparison of a collection of 32 Neisseria assemblies generated using the genome comparator tool available on pubmlst.org/neisseria. The reconstructed evolutionary relationships among N. meningitidis (Red, n = 2), N. gonorrhoeae (Red, n = 1), N. cinerea (Yellow, n = 1) and N. lactamica (Black, n = 27) are shown. The genome sequenced here, N. lactamica Y92–1009, is outlined in cyan (Nla_PacBio|ST_3493). This analysis included the sequenced genome and the best representative assembly for every identified sequence type (ST) of this species. The tree was generated using FastTree v.2 [13] and edited using FigTree v.1.4.3 [14]. The tree is drawn to scale, with branch lengths expressed as an overall proportion of divergence based on the comparison of 1629 genes
Project information
| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | Complete |
| MIGS-28 | Libraries used | SMRTbell Template prep kit |
| MIGS-29 | Sequencing platforms | Pacific Biosciences RSII |
| MIGS-31.2 | Fold coverage | 470× |
| MIGS-30 | Assemblers | HGAP |
| MIGS-32 | Gene calling method | Prokka, Blast2GO |
| | Locus Tag | |
| | GenBank ID | CP019894 |
| | GenBank Date of Release | 17/02/2017 |
| | GOLD ID | - |
| | BIOPROJECT | PRJNA331097 |
| MIGS-13 | Source Material Identifier | - |
| | Project relevance | Medical, Biotechnological |
Genome statistics
| Attribute | Value | % of Total |
|---|---|---|
| Genome size (bp) | 2,146,723 | 100 |
| DNA coding (bp) | 1,831,541 | 85.3 |
| DNA G + C (bp) | 1,123,594 | 52.3 |
| DNA scaffolds | 1 | 100 |
| Total genes | 2053 | 100 |
| Protein coding genes | 1980 | 96.4 |
| RNA genes | 72 | 3.5 |
| Pseudo genes | 16 | 0.8 |
| Genes in internal clusters | 16 | 0.8 |
| Genes with function prediction | 1918 | 93.4 |
| Genes assigned to COGs | 1527 | 74.3 |
| Genes with Pfam domains | 5 | 0.2 |
| Genes with signal peptides | 0 | 0 |
| Genes with transmembrane helices | 0 | 0 |
| CRISPR repeats | 3 | 0.1 |
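
The "% of Total" column can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that base-pair attributes are divided by the genome size and gene counts by the total gene count, which is what the tabulated values suggest but is not stated explicitly in the source. The variable and function names are made up for this example.

```python
# Values copied from the genome statistics table.
GENOME_SIZE_BP = 2_146_723
TOTAL_GENES = 2053

# Attributes measured in base pairs (assumed denominator: genome size).
bp_attributes = {
    "DNA coding (bp)": 1_831_541,
    "DNA G + C (bp)": 1_123_594,
}

# Attributes measured in gene counts (assumed denominator: total genes).
gene_attributes = {
    "Protein coding genes": 1980,
    "RNA genes": 72,
    "Pseudo genes": 16,
    "Genes with function prediction": 1918,
}


def percent_of(value, total):
    """Return value as a percentage of total, rounded to one decimal place."""
    return round(100.0 * value / total, 1)


bp_percent = {name: percent_of(v, GENOME_SIZE_BP) for name, v in bp_attributes.items()}
gene_percent = {name: percent_of(v, TOTAL_GENES) for name, v in gene_attributes.items()}

for name, pct in {**bp_percent, **gene_percent}.items():
    print(f"{name}: {pct}%")
```

Under these assumptions, the rows included in the sketch reproduce the percentages listed in the table.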
Number of predicted genes associated with general COG functional categories
| Code | Value | %age | Description |
|---|---|---|---|
| J | 148 | 7.21 | Translation, ribosomal structure and biogenesis |
| A | 1 | 0.05 | RNA processing and modification |
| K | 56 | 2.73 | Transcription |
| L | 137 | 6.67 | Replication, recombination and repair |
| B | 1 | 0.05 | Chromatin structure and dynamics |
| D | 24 | 1.17 | Cell cycle control, Cell division, chromosome partitioning |
| V | 23 | 1.12 | Defense mechanisms |
| T | 25 | 1.22 | Signal transduction mechanisms |
| M | 130 | 6.33 | Cell wall/membrane biogenesis |
| N | 20 | 0.97 | Cell motility |
| U | 42 | 2.05 | Intracellular trafficking and secretion |
| O | 75 | 3.65 | Posttranslational modification, protein turnover, chaperones |
| C | 109 | 5.31 | Energy production and conversion |
| G | 48 | 2.34 | Carbohydrate transport and metabolism |
| E | 129 | 6.28 | Amino acid transport and metabolism |
| F | 45 | 2.19 | Nucleotide transport and metabolism |
| H | 76 | 3.70 | Coenzyme transport and metabolism |
| I | 51 | 2.48 | Lipid transport and metabolism |
| P | 77 | 3.75 | Inorganic ion transport and metabolism |
| Q | 10 | 0.49 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 137 | 6.67 | General function prediction only |
| S | 163 | 7.94 | Function unknown |
| - | 526 | 25.62 | Not in COGs |
The total is based on the total number of protein coding genes (1980) putatively discovered in the genome
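
As a quick consistency check, the percentages in the COG table can be recomputed from the counts. The following is a minimal sketch, not the authors' procedure. One observation from the arithmetic, offered tentatively: the counts sum to 2053, and the tabulated percentages are reproduced when 2053 (the total gene count in the genome statistics table) is used as the denominator, whereas 1980 gives different values.

```python
# Counts copied from the COG table; "-" is the "Not in COGs" row.
cog_counts = {
    "J": 148, "A": 1, "K": 56, "L": 137, "B": 1, "D": 24, "V": 23, "T": 25,
    "M": 130, "N": 20, "U": 42, "O": 75, "C": 109, "G": 48, "E": 129,
    "F": 45, "H": 76, "I": 51, "P": 77, "Q": 10, "R": 137, "S": 163,
    "-": 526,
}


def cog_percent(count, denominator):
    """Return count as a percentage of denominator, rounded to two decimals."""
    return round(100.0 * count / denominator, 2)


total_all_rows = sum(cog_counts.values())
total_in_cogs = total_all_rows - cog_counts["-"]

print("Sum of all rows:", total_all_rows)
print("Sum of rows assigned to COGs:", total_in_cogs)

# Compare the two candidate denominators for one category.
print("J with 2053 as denominator:", cog_percent(cog_counts["J"], 2053))
print("J with 1980 as denominator:", cog_percent(cog_counts["J"], 1980))
```

The sum of the COG-assigned rows also matches the "Genes assigned to COGs" value of 1527 in the genome statistics table.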
Fig. 3 Circular map of the N. lactamica Y92–1009 genome and its features, generated with the CGView Comparison Tool. The arrows in the outermost ring indicate putative genes identified and assigned to Clusters of Orthologous Groups (COGs), with each arrow pointing in the 5′ to 3′ direction on the positive strand. The 20 COG categories are indicated by different colours from Red to Grey according to the colour key. The second and third rings indicate regions containing coding sequence (Blue), tRNA (Orange), and other RNAs (Grey), with the second ring running 5′ to 3′ (positive strand) and the third ring running 3′ to 5′ (negative strand). The fourth ring indicates open reading frames assigned as COGs and encoded on the negative strand. The fifth ring (black graph) displays GC content, while the last ring shows positive (Green) and negative (Purple) GC skew
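
For readers unfamiliar with the GC content and GC skew rings, the sketch below shows one common way to compute both quantities in fixed windows, using the usual definition skew = (G − C) / (G + C). It is an illustration only: the window and step sizes are hypothetical, the function name is made up for this example, and the parameters actually used by the CGView Comparison Tool for this figure are not given in the source.

```python
def gc_content_and_skew(seq, window=1000, step=1000):
    """Return (start, gc_fraction, gc_skew) for each full window of seq.

    gc_fraction = (G + C) / window length
    gc_skew     = (G - C) / (G + C), defined as 0.0 when the window has no G or C
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_fraction = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_fraction, gc_skew))
    return results


# Toy sequence with a G-rich window, a C-rich window and an AT-only window.
toy_windows = gc_content_and_skew("GGGCCCCGATAT", window=4, step=4)
for start, gc_fraction, gc_skew in toy_windows:
    print(start, gc_fraction, gc_skew)
```

Positive skew marks windows with more G than C on the analysed strand, and negative skew marks the opposite, which corresponds to the green and purple colouring described in the caption.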
Frequency of repeat sequences in N. lactamica Y92–1009 genome
| Repeat type | Repeat sequence | Value |
|---|---|---|
| AT-DUS | ‘ATGCCGTCTGAA’ | 1718 |
| AG-DUS | ‘AGGCCGTCTGAA’ | 262 |
| AG-mucDUS | ‘AGGTCGTCTGAA’ | 45 |
| DSR3 | ‘ATTCCCNNNNNNNNGGGAAT’ | 454 |
| Correia | ‘ATAG[CT]GGATTAACAAAAATCAGGAC’ | 50 |
| | ‘TATAG[CT]GGATTAAATTTAAACCGGTAC’ | 1 |
| | ‘TATAG[CT]GGATTAACAAAAACCGGTAC’ | 17 |
| | ‘TATAG[CT]GGATTAAATTTAAATCAGGAC’ | 17 |
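
Repeat frequencies of this kind can be obtained by pattern matching against the assembled sequence. The sketch below is one possible approach, not the method used in the source, which does not describe how the counts were produced. It assumes that `N` stands for any base, that bracketed classes such as `[CT]` can be passed directly to a regular expression, and that occurrences are counted on both strands; the function names are hypothetical.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Convert a motif to a regex: N becomes [ACGT]; classes like [CT] pass through."""
    return motif.replace("N", "[ACGT]")


def count_motif(genome, motif, both_strands=True):
    """Count occurrences of motif in genome, including overlapping matches.

    Caveat: a motif that equals its own reverse complement is counted twice
    per site when both_strands is True.
    """
    genome = genome.upper()
    # A lookahead makes finditer report overlapping occurrences as well.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    count = sum(1 for _ in pattern.finditer(genome))
    if both_strands:
        count += sum(1 for _ in pattern.finditer(reverse_complement(genome)))
    return count


# Toy sequence: one AT-DUS on the forward strand and one on the reverse strand.
toy = "TTATGCCGTCTGAATT" + "TTCAGACGGCAT"
print(count_motif(toy, "ATGCCGTCTGAA"))
print(count_motif(toy, "ATGCCGTCTGAA", both_strands=False))
```

Whether overlapping matches and both strands should be included depends on the convention being followed, so counts from a sketch like this may differ from the values in the table.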
Fig. 4 Graphic showing phage-related proteins identified in the intact prophage by PHAST. The diagram was annotated by and imported from the PHAST prophage database. Twenty-six hypothetical proteins were removed from the schematic to increase the clarity of the phage-related proteins. The proteins above the black line indicating genome position are encoded 5′ to 3′, while the proteins under it are encoded 3′ to 5′. The abbreviations in the diagram are as follows: Att (phage attachment site), Coa (phage coat protein), fib (phage tail fiber), Int (phage integrase), Pla (phage plate protein), PLP (phage-like protein), Por (portal protein), sha (phage tail shaft protein), Ter (terminase)