Abstract
DpbCasX, also called Cas12e, is an RNA-guided DNA endonuclease isolated from Deltaproteobacteria. In this paper I characterized the CasX-compatible genome editing sites in the reference genomes of yeast (Saccharomyces cerevisiae), roundworms (Caenorhabditis elegans), flies (Drosophila melanogaster), zebrafish (Danio rerio), mice (Mus musculus), rats (Rattus norvegicus), and humans (Homo sapiens). Across those genomes there were > 27,000 CasX sites per megabase on average. More than 90% of genes in each genome had at least one unique site overlapping an exon, with a median of 6 to 45 unique sites per gene. I also annotated sites in the GRCm38 reference and 15 additional mouse strain genomes. The presence of specific guide sequences varied amongst the strains, with CAST/EiJ and PWK/PhJ showing the greatest divergence from the reference strain. The high density of CasX sites and the number of exon-overlapping sites suggest that CasX has the potential to be used as a common genome editor.
Keywords: Caenorhabditis elegans; Cas12e; CasX; Danio rerio; Drosophila melanogaster; Genome editing; Homo sapiens; Mus musculus; Rattus norvegicus; Saccharomyces cerevisiae
Year: 2019 PMID: 31248380 PMCID: PMC6598274 DOI: 10.1186/s12864-019-5924-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
CasX site genomic distribution in 7 model organisms
| Organism | Total sites | Unique sites | Unique (%) | Total sites / Mbp [median] | Unique sites / Mbp [median] |
|---|---|---|---|---|---|
| Saccharomyces cerevisiae | 367,810 | 345,376 | 93.90 | 30,254.74 | 28,409.4 |
| Caenorhabditis elegans | 3,315,259 | 2,988,557 | 90.15 | 33,057.91 | 29,800.2 |
| Drosophila melanogaster | 3,681,483 | 3,011,422 | 81.80 | 25,614.59 | 20,952.5 |
| Danio rerio | 33,296,705 | 22,500,532 | 67.58 | 24,242.74 | 16,382.2 |
| Mus musculus | 70,817,235 | 54,464,834 | 76.91 | 25,932.10 | 19,944.1 |
| Rattus norvegicus | 73,427,248 | 55,157,865 | 75.12 | 25,582.77 | 19,217.5 |
| Homo sapiens | 78,288,233 | 62,089,586 | 79.31 | 25,256.30 | 20,030.5 |
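
The summary figures quoted in the abstract can be checked against the table above. The following is a minimal sketch, not part of the original analysis: it copies the table values in row order, takes the plain mean of the per-genome "Total sites / Mbp" values (assuming that is what "on average" refers to), and recomputes the "Unique (%)" column from the two count columns.

```python
# Values copied from the table above, in row order.
# Assumption: "on average" in the abstract means the plain mean of these
# per-genome "Total sites / Mbp [median]" values.
sites_per_mbp = [30254.74, 33057.91, 25614.59, 24242.74,
                 25932.10, 25582.77, 25256.30]

mean_density = sum(sites_per_mbp) / len(sites_per_mbp)

# (total sites, unique sites) for each row of the table.
counts = [
    (367_810, 345_376),
    (3_315_259, 2_988_557),
    (3_681_483, 3_011_422),
    (33_296_705, 22_500_532),
    (70_817_235, 54_464_834),
    (73_427_248, 55_157_865),
    (78_288_233, 62_089_586),
]

# Percentage of sites that are unique, rounded like the table column.
unique_pct = [round(100 * unique / total, 2) for total, unique in counts]

print(f"mean sites per Mbp: {mean_density:.2f}")
print("unique (%):", unique_pct)
```

The mean comes out just above 27,000 sites per Mbp, consistent with the abstract, and the recomputed percentages agree with the "Unique (%)" column.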
CasX sites overlapping known gene exons
| Organism | Genes | Cut (%) | Unique cut (%) | Sites / gene [median] | Unique sites / gene [median] |
|---|---|---|---|---|---|
| Saccharomyces cerevisiae | 7,036 | 99.97 | 96.93 | 31 | 29 |
| Caenorhabditis elegans | 46,778 | 96.46 | 94.73 | 7 | 6 |
| Drosophila melanogaster | 17,737 | 99.86 | 97.30 | 38 | 37 |
| Danio rerio | 32,520 | 99.94 | 95.43 | 52 | 45 |
| Mus musculus | 54,838 | 99.58 | 93.14 | 36 | 26 |
| Rattus norvegicus | 32,883 | 99.40 | 90.44 | 33 | 25 |
| Homo sapiens | 58,735 | 99.60 | 96.98 | 28 | 22 |
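
To illustrate what the "Cut (%)" and "Sites / gene" columns measure, here is a minimal sketch of an exon-overlap count on made-up data. The gene names, coordinates, single-chromosome layout, and half-open interval convention are all hypothetical, and the source does not state whether genes with zero sites enter the median, so this sketch includes them as an assumption.

```python
from statistics import median


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals [start, end) share a base."""
    return a_start < b_end and b_start < a_end


# Hypothetical toy data on a single chromosome.
exons = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1100)],
    "geneC": [(5000, 5050)],
}
sites = [(150, 174), (190, 214), (390, 414), (1090, 1114), (2000, 2024)]

# Count, per gene, the sites that overlap at least one of its exons.
sites_per_gene = {
    gene: sum(
        any(overlaps(s, e, xs, xe) for xs, xe in gene_exons)
        for s, e in sites
    )
    for gene, gene_exons in exons.items()
}

# Share of genes with at least one exon-overlapping site ("Cut (%)").
cut_pct = 100 * sum(n > 0 for n in sites_per_gene.values()) / len(exons)

# Assumption: genes without any site are kept when taking the median.
median_sites = median(sites_per_gene.values())

print(sites_per_gene, round(cut_pct, 2), median_sites)
```

A genome-scale version would normally use an interval index rather than this all-pairs comparison, which is only practical for toy inputs.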
Fig. 1 CasX PAM site usage. The x-axis shows the 7 species (abbreviated as the first letter of the genus and species), and the y-axis shows fractional PAM site usage as a stacked bar chart. The plot is divided into two subplots: one for unique cutters only and one for all sites. PAMs ending in A and T (TTCA and TTCT) are generally the most used. The TTCC and TTCG sites are used much less often in zebrafish, mouse, rat, and human. The TTCG site, which contains a CpG dinucleotide, is particularly rare in those four species.
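
The PAM usage shown in Fig. 1 comes down to tallying sites by their four-base PAM. The sketch below shows one way such a tally could be made for a single sequence; it is not the author's pipeline. It assumes a 5'-TTCN PAM immediately upstream of the spacer, a 20-nt spacer, and scanning of both strands. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_pams(seq, spacer_len=20):
    """Count TTCN PAMs followed by a full spacer, on both strands.

    Assumptions: the PAM sits 5' of the spacer and the spacer is
    spacer_len bases long; sites with non-ACGT bases are skipped.
    """
    counts = Counter()
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        last_start = len(strand) - 4 - spacer_len
        for i in range(last_start + 1):
            pam = strand[i:i + 4]
            spacer = strand[i + 4:i + 4 + spacer_len]
            if (pam.startswith("TTC") and pam[3] in "ACGT"
                    and set(spacer) <= set("ACGT")):
                counts[pam] += 1
    return counts


# Toy sequence: two sites on the forward strand, one on the reverse strand.
toy = "TTCA" + "G" * 20 + "TTCG" + "A" * 20
pam_counts = count_pams(toy)
total = sum(pam_counts.values())
pam_fractions = {pam: n / total for pam, n in pam_counts.items()}
print(pam_counts, pam_fractions)
```

Dividing each count by the total gives the fractional usage that a stacked bar chart like the one described would display.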
Fig. 2 Hamming distance for editing targets amongst Ensembl mouse strains. Shown is the Hamming distance between sites for the GRCm38 reference (denoted Mus musculus) and the genomes of 15 strains available via Ensembl. The matrix was ordered by hierarchical clustering of the distances with complete linkage. CAST/EiJ and PWK/PhJ were the furthest from the GRCm38 reference among the tested strains, though all strains differed from it to some degree. Those two strains in particular might require site annotations for their specific genomes rather than site selection from the general mouse reference.
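
The caption does not say how sites were encoded before the Hamming distance was taken. A minimal sketch under one plausible assumption, that each genome is represented by the presence or absence of every guide sequence seen in any genome, could look like the following. The strain labels and guide identifiers are made up.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: genome label -> set of guide sequences found in it.
strain_sites = {
    "reference": {"g1", "g2", "g3", "g4"},
    "strain_a": {"g1", "g2", "g3"},
    "strain_b": {"g1", "g4", "g5", "g6"},
}

# Assumption: one presence/absence entry per guide seen in any genome.
all_guides = sorted(set().union(*strain_sites.values()))
profiles = {
    name: [guide in guides for guide in all_guides]
    for name, guides in strain_sites.items()
}

# Full pairwise distance matrix, stored as a dict keyed by label pairs.
distances = {
    (a, b): hamming(profiles[a], profiles[b])
    for a in profiles
    for b in profiles
}

for name in profiles:
    print(name, [distances[(name, other)] for other in profiles])
```

The ordering step is not shown here. One option for it would be SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` applied to the condensed form of this matrix.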