Steven Shave, Stefan Mann, Joanna Koszela, Alastair Kerr, Manfred Auer.
Abstract
The design of highly diverse phage display libraries is based on the assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing the randomness, and therefore the diversity, of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences derived from base-to-residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage at both the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats: a C++ source code package for compilation and integration into existing bioinformatics pipelines, and precompiled binaries for ease of use.
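The idea of normalizing observed residue counts against 'expected' occurrences can be illustrated with a minimal Python sketch. It is not PuLSE's code (PuLSE is written in C++), and it rests on assumptions not stated in the abstract: fully randomized NNN codons, the standard genetic code, and independent base incorporation. The function names `expected_residue_freqs` and `observed_over_expected` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expected_residue_freqs(base_freqs=None):
    """Expected residue frequencies for a random NNN codon (assumption).

    With equal base frequencies this reduces to (number of codons) / 64.
    """
    if base_freqs is None:
        base_freqs = {b: 0.25 for b in BASES}
    expected = Counter()
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for base in codon:
            p *= base_freqs[base]
        expected[aa] += p
    return dict(expected)


def observed_over_expected(peptides, position, expected):
    """Ratio of observed to expected residue frequency at one position."""
    counts = Counter(p[position] for p in peptides if len(p) > position)
    total = sum(counts.values())
    return {aa: (counts[aa] / total) / expected[aa] for aa in counts}


if __name__ == "__main__":
    expected = expected_residue_freqs()
    # Toy peptides, invented purely for illustration.
    ratios = observed_over_expected(["FA", "FK", "LA", "MA"], 0, expected)
    for aa, ratio in sorted(ratios.items()):
        print(aa, round(ratio, 2))
```

A ratio of 1 means a residue occurs exactly as often as the codon degeneracy predicts; values above or below 1 indicate enrichment or underrepresentation.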
Year: 2018 PMID: 29474422 PMCID: PMC5825087 DOI: 10.1371/journal.pone.0193332
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Library definition format.
Example of the library definition format, which allows robust identification of randomized positions within a sequence from both forward and reverse-complement strand reads.
Fig 2. Example protein residue occurrence heatmap.
Protein residue occurrence heatmap for the exemplar dataset accompanying the PuLSE software distribution. Phenylalanine is slightly enriched over its expected occurrence rate at each position within the library, and lysine is underrepresented at each position. However, neither the enrichment nor the underrepresentation is pronounced: observed occurrences range from 0.44 to 3.27 times the expected value.
Fig 3. Example DNA base occurrence heatmap.
DNA base occurrence heatmap for the exemplar dataset accompanying the PuLSE software distribution. Enrichment and underrepresentation are not pronounced, suggesting that the profiled phage library possesses a high degree of randomness and therefore the expected diversity.
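The kind of table that could sit behind a DNA base occurrence heatmap can be sketched as follows. This is a simplified Python illustration, not PuLSE's implementation: it assumes plain four-line fastq records, that the sequences passed in are already trimmed to the randomized region, and that each base is expected at a rate of 0.25. The function names `read_fastq_sequences` and `base_occurrence_ratios` are hypothetical.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Return the sequence line of each four-line fastq record (assumption:
    no multi-line sequences, no blank lines between records)."""
    return [line.strip().upper() for i, line in enumerate(lines) if i % 4 == 1]


def base_occurrence_ratios(sequences, expected=0.25):
    """Per-position observed/expected ratio for each of A, C, G and T.

    Positions are compared only up to the length of the shortest sequence.
    """
    if not sequences:
        return []
    length = min(len(s) for s in sequences)
    ratios = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        total = sum(counts[b] for b in "ACGT")  # ignores N and other symbols
        ratios.append(
            {b: (counts[b] / total) / expected if total else 0.0 for b in "ACGT"}
        )
    return ratios


if __name__ == "__main__":
    # Toy fastq content, invented purely for illustration.
    fastq = [
        "@read1", "AC", "+", "II",
        "@read2", "AG", "+", "II",
        "@read3", "TC", "+", "II",
        "@read4", "GC", "+", "II",
    ]
    for position, row in enumerate(base_occurrence_ratios(read_fastq_sequences(fastq))):
        print(position, row)
```

Each row corresponds to one column of such a heatmap: ratios close to 1 for all four bases are consistent with near-uniform base incorporation at that position.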