Luca Barbon1, Victoria Offord1, Elizabeth J Radford2,3, Adam P Butler1, Sebastian S Gerety2, David J Adams1, Hong Kee Tan2, Andrew J Waters1.
Abstract
MOTIVATION: CRISPR/Cas9-based technology allows for the functional analysis of genetic variants at single-nucleotide resolution whilst maintaining genomic context (Findlay et al., 2018). This approach, known as saturation genome editing (SGE), is a form of deep mutational scanning (DMS) that systematically alters each position in a target region to explore its function. SGE experiments require the design and synthesis of oligonucleotide variant libraries, which are introduced into the genome. This technology is applicable to diverse fields such as disease variant identification, drug development, structure-function studies, synthetic biology, evolutionary genetics and host-pathogen interactions. Here we present the Variant Library Annotation Tool (VaLiAnT), which can be used to generate variant libraries from user-defined genomic coordinates and standard input files. The software can accommodate user-specified species, reference sequences and transcript annotations.
Year: 2021 PMID: 34791067 PMCID: PMC8796380 DOI: 10.1093/bioinformatics/btab776
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Information flow diagram for VaLiAnT: input files are shown as grey document boxes with file types shown in brackets. Processes are shown as green boxes and output files are shown as blue document boxes with file extensions in brackets. Process data are shown in slanted white boxes. The quality control (QC) report contains sequences in a comma-separated value (CSV) file and is execution-specific. All other output files are targeton-specific; see the user manual: https://github.com/cancerit/VaLiAnT/wiki. Abbreviations: Variant Call Format (VCF), Browser Extensible Data (BED), General Transfer Format (GTF), General Feature Format 2 (GFF2) and Fast-all (FASTA).
Fig. 2. Mutator function descriptions: (a) Basic mutation actions that do not require CDS reading frame information. An example six base-pair targeton is shown for illustration; (a-i) ‘snv’ results in all possible nucleotide substitutions at each position in the targeton, (a-ii) ‘1del’ deletes single nucleotides at each position, (a-iii) ‘2del0’ deletes nucleotides in tandem starting at position 0, (a-iv) ‘2del1’ deletes nucleotides in tandem starting at position 1. (b) Mutation actions that require CDS reading frame information. An example nine base-pair targeton is shown in which the first and last codons are split between hypothetical upstream and downstream exons. Amino acids encoded by codons are displayed in capital italics above the DNA sequence; (b-i) the ‘snvre’ mutation function runs the ‘snv’ function, resulting in ‘SNV oligo’ output. CDS frame information is computed and, where a missense change occurs as a result of ‘snv’ (shown in bold under ‘Peptide’), a ‘Redundant oligo’ is generated. This redundant oligo encodes the same missense change as that generated by ‘snv’ at the peptide level, but with an alternative triplet sequence. The redundant triplet sequence chosen is the most frequent according to the codon frequency table (or the next most frequent if ‘snv’ generates the most frequent).
Instances in which redundant oligos are not generated are represented by ‘–’ in the redundant oligo column. These include: when synonymous changes are created by ‘snv’ (these are included in ‘snvre’ outputs); when ‘snv’ alone creates an additional missense change that results in an appropriate redundant oligo (as is the case for peptides GLIS and GIIS in the example); when non-degenerate missense codons are produced (ATG in the example); and when partial codons are targeted (denoted by ‘NA’; codons cannot be replaced, but SNVs are still introduced); (b-ii) ‘inframe’ results in codon (triplet) deletions for the longest in-frame coding sequence within the targeton; (b-iii) ‘ala’ results in in-frame substitutions to alanine based on the top-ranking alanine codon from the codon usage table, resulting in an alanine scan through the coding sequence; (b-iv) ‘stop’ results in in-frame substitutions to a stop codon based on the top-ranking stop codon from the codon frequency table, giving systematic truncating mutations throughout the coding sequence.
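The basic mutators in panel (a) can be illustrated with a minimal Python sketch. This is not VaLiAnT's implementation: the function names `snv`, `one_del` and `two_del` are hypothetical, and the six-bp sequence is an arbitrary stand-in for the example targeton in the figure.

```python
NUCLEOTIDES = "ACGT"


def snv(seq):
    """Return every single-nucleotide substitution at every position."""
    return [
        seq[:i] + alt + seq[i + 1:]
        for i, ref in enumerate(seq)
        for alt in NUCLEOTIDES
        if alt != ref
    ]


def one_del(seq):
    """Return the sequences obtained by deleting one nucleotide per position."""
    return [seq[:i] + seq[i + 1:] for i in range(len(seq))]


def two_del(seq, offset):
    """Return tandem (2-bp) deletions stepping by two from `offset`.

    offset=0 mimics '2del0', offset=1 mimics '2del1' (assumed semantics).
    """
    return [seq[:i] + seq[i + 2:] for i in range(offset, len(seq) - 1, 2)]


# Arbitrary six-bp example (hypothetical, not the sequence shown in the figure).
targeton = "ACGTAC"
print(len(snv(targeton)))      # 3 substitutions x 6 positions
print(one_del(targeton))
print(two_del(targeton, 0))    # deletions at positions 0-1, 2-3, 4-5
print(two_del(targeton, 1))    # deletions at positions 1-2, 3-4
```

The sketch keeps duplicate sequences (for example, deleting either base of a homopolymer run gives the same oligo), which is why a total count and a unique count can differ.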
Fig. 3. SGE library generation workflow for BRCA1: an example workflow for BRCA1 exon 2 (Homo sapiens, GRCh38, ENSE00003510592) is shown; the corresponding input and output files are available from https://github.com/cancerit/VaLiAnT. (a) Overview of the library design, with sequence information modified from a Geneious Prime® (version 2019.04) visualization. Targeton, genomic region, chromosome and genome build are displayed, together with GRCh38 coordinates both for the complete targeton and for BRCA1 exon 2. The selected sgRNA binding site is shown in yellow; the exon is on the negative strand and the sgRNA is antisense. Within the nucleotide sequence, red nucleotides are positions selected for PAM/protospacer protection edits, beneath which the translated peptide sequence is shown by coloured rectangles. Below this, the targeton range is represented by a grey rectangle with condensed sequences represented by a double slash. At either end of the targeton are the 5′-AD and 3′-AD (light blue rectangles), which represent appended P5 and P7 adapter sequences, enabling generic amplification of the generated library pool. Annealing sites for targeton-specific amplification and cloning adapters are shown as white arrows. Dark blue rectangles represent region 1 (r1) and region 3 (r3), which are 25 bp extensions from region 2 (r2); r2 is shown as a red rectangle with black lines indicating the location of the PAM/protospacer protection edits. Regions in which mutator functions are actioned are described through annotated black demarcation lines. Variants ingested through custom VCF files are incorporated throughout the entire range of the targeton. As shown, r1 and r3 are modified with basic mutation type functions.
To ensure deletion of the dinucleotide splice-acceptor or splice-donor intronic sequences immediately flanking exon 2, sequential deletion through r1 (25 bp) is offset using 2del1 (as the r1 length is odd), enabling deletion of the exon-flanking tandem nucleotides at the distal (3′) end of r1. As dinucleotide deletion proceeds from the 5′ end, 2del0 is used for r3, ignoring the final distal nucleotide at the 3′ end of r3. r2 is modified by CDS-specific mutator functions. (b) Schematic of the input and output files used for computation. The ‘targetons’ file contains targeton and r2 genomic ranges, r1 and r3 extension values and additional information, including the required mutator functions for each region and sgRNA identifiers that correspond to the identifiers given in the ‘PAM edits’ VCF file. Reference files include the ‘REF’ FASTA chromosome sequence, the ‘GTF’ specific transcript annotation and a custom variant file, ‘VCF’. VaLiAnT is run from the command line to generate output files, including ‘meta’ (full library metadata), a ‘VCF’ of all variants generated and a ‘unique’ CSV file for easy ordering of sequence synthesis. (c) Downstream processes. Interrogation of the output files using sequence alignment is used for library validation. A library composition graph is shown, delineating each mutator function output for the entire targeton range of BRCA1 exon 2, including total sequences and total unique sequences. Downstream synthesis, experimentation and/or analysis processes are possible after validation.
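The choice of 2del1 for r1 and 2del0 for r3 can be checked with a short sketch that lists which positions each tandem deletion covers in a 25 bp region. It assumes 0-based positions counted from the 5′ end of each region and the stepping-by-two behaviour described in the caption; `tandem_deletion_windows` is a hypothetical helper, not part of VaLiAnT.

```python
def tandem_deletion_windows(region_length, offset):
    """Return 0-based (first, second) positions removed by each 2-bp deletion."""
    return [(i, i + 1) for i in range(offset, region_length - 1, 2)]


r1_windows = tandem_deletion_windows(25, offset=1)  # '2del1' on r1
r3_windows = tandem_deletion_windows(25, offset=0)  # '2del0' on r3

# With an odd length, offset 1 reaches the last two positions (3' end of r1),
# whereas offset 0 starts at the first two positions (5' end of r3) and
# leaves the final position untouched.
print(r1_windows[-1], len(r1_windows))
print(r3_windows[0], r3_windows[-1], len(r3_windows))
```

Under these assumptions each region yields 12 dinucleotide deletions, with r1 covering its exon-proximal 3′ dinucleotide and r3 covering its exon-proximal 5′ dinucleotide.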
Summary of input parameters for BRCA1 exon 2 SGE library generation
| Input parameter | Value |
|---|---|
| chromosome | 17 |
| strand | - |
| targeton range | 43115634 : 43115878 |
| r2 range | 43115726 : 43115779 |
| r1 length | 25 |
| r3 length | 25 |
| sgRNA identifier | sgRNA_ex2 |
| mutators for r1 | 2del1, snv, 1del |
| mutators for r2 | snvre, inframe, ala, stop, 1del |
| mutators for r3 | 2del0, snv, 1del |
Note: Summary of values included in targeton parameter input file for exon 2. Values correspond to Figure 3a schematic.
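As a quick consistency check on these parameters, the region lengths can be derived from the coordinates. The sketch below assumes that the ranges are inclusive at both ends, which agrees with the 245 bp targeton and 54 bp r2 lengths reported for this library; `inclusive_length` is a hypothetical helper.

```python
def inclusive_length(start, end):
    """Length of a genomic range whose start and end are both included."""
    return end - start + 1


targeton_len = inclusive_length(43115634, 43115878)
r2_len = inclusive_length(43115726, 43115779)
r1_len, r3_len = 25, 25

# Whatever is not covered by r1, r2 or r3 is the constant (unedited) region.
constant_len = targeton_len - (r1_len + r2_len + r3_len)

print(targeton_len, r2_len, constant_len)
```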
Summary of output sequence categories for BRCA1 exon 2 SGE library generation
| Category | Attribute | r1 | r2 | r3 | Constant | Complete |
|---|---|---|---|---|---|---|
| Length (bp) | | 25 | 54 | 25 | 141 | 245 |
| Mutator functions | 2del1 | 12 | – | – | – | 12 |
| | 2del0 | – | – | 12 | – | 12 |
| | 1del | 25 | 54 | 25 | – | 104 |
| | inframe | – | 17 | – | – | 17 |
| | snv | 75 | 162 | 75 | – | 312 |
| | snvre | – | 140 | – | – | 140 |
| | ala | – | 17 | – | – | 17 |
| | stop | – | 17 | – | – | 17 |
| Custom variants | gnomAD | 2 | 8 | 10 | 22 | 42 |
| | ClinVar | 74 | 176 | 76 | 1 | 327 |
| Total | | 188 | 591 | 198 | 23 | 1000 |
| Excluded | | 0 | 1 | 0 | 0 | 1 |
| Total unique | | 109 | 351 | 101 | 22 | |
Note: Values for each attribute are shown for regions 1–3 (r1–3) and the constant regions (unedited, except for custom variants), together with the summed values for the entire targeton (complete). One custom variant in r2 results in an oligonucleotide longer than 300 bp and is excluded from the final library. The total number of unique oligonucleotides, which represents library complexity for SGE experiments, is shown in bold. Values relate to targeton ‘chr17_43115634_43115878_minus_sgRNA_ex2’ (https://github.com/cancerit/VaLiAnT/tree/develop/examples/sge/brca1_nuc_output_exp).
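The totals in this summary can be re-derived from the per-row counts. The sketch below copies the values from the table, treats ‘–’ as zero, and also checks two relationships that follow from the mutator definitions (three substitutions per position for ‘snv’, one deletion per position for ‘1del’). The variable names are illustrative only.

```python
# Region lengths (bp) and per-region counts copied from the summary table.
LENGTHS = {"r1": 25, "r2": 54, "r3": 25}

# Tuples are (r1, r2, r3, constant); '–' entries are treated as 0.
COUNTS = {
    "2del1": (12, 0, 0, 0),
    "2del0": (0, 0, 12, 0),
    "1del": (25, 54, 25, 0),
    "inframe": (0, 17, 0, 0),
    "snv": (75, 162, 75, 0),
    "snvre": (0, 140, 0, 0),
    "ala": (0, 17, 0, 0),
    "stop": (0, 17, 0, 0),
    "gnomAD": (2, 8, 10, 22),
    "ClinVar": (74, 176, 76, 1),
}

# Sum each column across all rows, then sum the columns for the grand total.
column_totals = tuple(sum(col) for col in zip(*COUNTS.values()))
grand_total = sum(column_totals)

# 'snv' gives 3 alternatives per position; '1del' gives 1 deletion per position.
snv_matches = all(
    COUNTS["snv"][i] == 3 * LENGTHS[r] for i, r in enumerate(("r1", "r2", "r3"))
)
one_del_matches = all(
    COUNTS["1del"][i] == LENGTHS[r] for i, r in enumerate(("r1", "r2", "r3"))
)

print(column_totals, grand_total, snv_matches, one_del_matches)
```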