Dewan Shrestha, Aishee Bag, Ruiqiong Wu, Yeting Zhang, Xing Tang, Qian Qi, Jinchuan Xing, Yong Cheng.
Abstract
BACKGROUND: Genomic safe harbors are regions of the genome that can maintain transgene expression without disrupting the function of host cells. Genomic safe harbors play an increasingly important role in improving the efficiency and safety of genome engineering. However, only a limited number of safe harbors have been identified.
Keywords: Chromatin organization; Epigenome; Gene therapy; Genetic engineering; Genomic safe harbor; Mobile genetic elements
Year: 2022 PMID: 36131352 PMCID: PMC9490961 DOI: 10.1186/s13059-022-02770-3
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 17.906
Fig. 1 A schematic representation of the overall genomic safe harbor identification strategy. a Selection of common pMEIs from healthy individuals with AF > 0.1. b Removing pMEIs significantly associated with gene expression (FDR < 0.1 in eQTL mapping). c Removing pMEIs showing spatial proximity with oncogenes, tumor suppressor genes, and dosage-sensitive genes based on TADs and chromatin interaction mapping. d Removing pMEIs overlapping repressive chromatin regions
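The four filtering steps in Fig. 1 can be read as a chain of predicates applied to each candidate pMEI. The following is a minimal sketch of that logic, not the authors' pipeline. The `Candidate` record, its field names, and the example sites are hypothetical; only the two thresholds (AF > 0.1 and FDR < 0.1) come from the legend. The sketch assumes the spatial-proximity and chromatin-state checks have already been computed upstream and reduced to booleans.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one pMEI; field names are illustrative only."""
    site_id: str
    allele_frequency: float
    eqtl_fdr: Optional[float]       # None = no significant eQTL association reported
    near_sensitive_gene: bool       # proximity to oncogene / tumor suppressor / dosage-sensitive gene
    in_repressive_chromatin: bool   # overlaps a repressive chromatin region


def passes_filters(c: Candidate, min_af: float = 0.1, fdr_cutoff: float = 0.1) -> bool:
    """Apply steps a-d of the legend in order; return True if the site survives all four."""
    if not c.allele_frequency > min_af:                        # step a: keep common pMEIs (AF > 0.1)
        return False
    if c.eqtl_fdr is not None and c.eqtl_fdr < fdr_cutoff:     # step b: drop eQTL-associated pMEIs
        return False
    if c.near_sensitive_gene:                                  # step c: drop sites near sensitive genes
        return False
    if c.in_repressive_chromatin:                              # step d: drop repressive chromatin
        return False
    return True


# Made-up candidates, one failing each step, to exercise the filters.
candidates = [
    Candidate("site_a", 0.14, 0.84, False, False),   # survives all steps
    Candidate("site_b", 0.05, 1.00, False, False),   # too rare (step a)
    Candidate("site_c", 0.30, 0.01, False, False),   # eQTL-associated (step b)
    Candidate("site_d", 0.20, None, True, False),    # near a sensitive gene (step c)
    Candidate("site_e", 0.20, None, False, True),    # repressive chromatin (step d)
]
kept = [c.site_id for c in candidates if passes_filters(c)]
print(kept)
```

Because the allele frequencies reported in the tables are rounded, applying a strict `> 0.1` comparison to those rounded values may not reproduce the published site list exactly.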
Identified GSH sites in blood from the 1000 Genomes project and the GTEx data
| GSH_ID | Position | FDR | AF | TAD gene density | Active regions | Gene | Location | Dataset |
|---|---|---|---|---|---|---|---|---|
| BLD_GSH_1 | chr1:150200138-150200139 | 0.84 | 0.14 | NA | 4_Tx,5_TxWk | ANP32E | Intron | 1KG |
| BLD_GSH_2 | chr1:198243300-198243301 | 1 | 0.11 | 4.94 | 5_TxWk | NEK7 | Intron | 1KG, GTEx |
| BLD_GSH_3 | chr11:129759556-129759557 | 1 | 0.25 | NA | 4_Tx,5_TxWk | NFRKB | Intron | 1KG |
| BLD_GSH_4 | chr12:122722288-122722289 | 1 | 0.23 | NA | 4_Tx,5_TxWk | VPS33A | Intron | 1KG |
| BLD_GSH_5 | chr13:111559414-111559652 | 0.87 | 0.2 | NA | 4_Tx,5_TxWk | ANKRD10 | Intron | 1KG |
| BLD_GSH_6 | chr15:49609604-49609605 | 1 | 0.17 | 10.7 | 5_TxWk | GALK2 | Intron | 1KG, GTEx |
| BLD_GSH_7 | chr15:59169388-59169389 | 0.47 | 0.16 | NA | 4_Tx,5_TxWk | - | Intergenic | 1KG |
| BLD_GSH_8 | chr2:39071477-39071819 | 0.16 | 0.32 | NA | 4_Tx,5_TxWk | DHX57 | Intron | 1KG |
| BLD_GSH_9 | chr2:223481690-223481979 | 0.93 | 0.28 | NA | 4_Tx,5_TxWk | FARSB | Intron | 1KG |
| BLD_GSH_10 | chr3:37361602-37361603 | 1 | 0.1 | 25 | 4_Tx,5_TxWk | GOLGA4 | Intron | 1KG, GTEx |
| BLD_GSH_11 | chr3:45542662-45542663 | 0.13 | 0.36 | NA | 4_Tx,5_TxWk | LARS2 | Intron | 1KG, GTEx |
| BLD_GSH_12 | chr3:45768351-45768676 | 0.26 | 0.38 | NA | 4_Tx,5_TxWk | SACM1L | Intron | 1KG |
| BLD_GSH_13 | chr4:88032137-88032469 | 0.76 | 0.58 | NA | 4_Tx,5_TxWk | AFF1 | Intron | 1KG, GTEx |
| BLD_GSH_14 | chr6:157397700-157397701 | 1 | 0.13 | NA | 2_TssAFlnk,5_TxWk | ARID1B | Intron | 1KG, GTEx |
| BLD_GSH_15 | chr8:120800792-120800793 | 0.89 | 0.21 | NA | 4_Tx,5_TxWk | TAF2 | Intron | 1KG |
| BLD_GSH_16 | chr9:100675550-100675551 | n.s. | 0.16 | NA | 4_Tx,5_TxWk | TRMO | Intron | GTEx |
| BLD_GSH_17 | chr9:115937084-115937379 | 0.34 | 0.35 | NA | 4_Tx,5_TxWk | FKBP15 | Intron | 1KG |
| BLD_GSH_18 | chr1:1654012-1654013 | n.s. | 0.15 | NA | 5_TxWk | CDK11A | Intron | GTEx |
| BLD_GSH_19 | chr15:79167169-79167170 | n.s. | 0.12 | NA | 1_TssA,2_TssAFlnk | MORF4L1 | Intron | GTEx |
GSH_ID: unique GSH ID, Position: genomic coordinates for the GSH (hg19), FDR: eQTL FDR value, n.s. non-significant, AF: pMEI allele frequency, TAD gene density: gene density of GSH TAD, NA GSH not assigned to a TAD, Active regions: active transcription region based on ChromHMM states, Gene: GSH overlapping gene, Location: position of GSH (intron, exon, intergenic), Dataset: data source, 1KG the 1000 Genomes Project, GTEx the Genotype-Tissue Expression Project
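To work with the table programmatically, the Position, FDR, and Dataset columns need light parsing. The sketch below shows one way to do this under the column conventions described in the footnote. The helper names (`parse_position`, `parse_fdr`, `parse_datasets`) are hypothetical. The source does not state whether the coordinates are 0-based or 1-based, so the code only splits the string and makes no assumption about the convention.

```python
import re
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


# Matches strings such as "chr3:37361602-37361603".
_POSITION = re.compile(r"^(chr[0-9A-Za-z_]+):(\d+)-(\d+)$")


def parse_position(text: str) -> Interval:
    """Split a 'chrom:start-end' string into its parts (hg19 coordinates in the table)."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    return Interval(match.group(1), int(match.group(2)), int(match.group(3)))


def parse_fdr(text: str) -> Optional[float]:
    """Return the eQTL FDR as a float, or None when the table reports 'n.s.'."""
    cleaned = text.strip().lower().rstrip(".")
    if cleaned == "n.s":
        return None
    return float(cleaned)


def parse_datasets(text: str) -> List[str]:
    """Split a Dataset cell such as '1KG, GTEx' into individual source names."""
    return [name.strip() for name in text.split(",") if name.strip()]


# Values taken from the BLD_GSH_10 and BLD_GSH_16 rows of the table.
site = parse_position("chr3:37361602-37361603")
print(site.chrom, site.end - site.start)
print(parse_fdr("1"), parse_fdr("n.s."))
print(parse_datasets("1KG, GTEx"))
```

Returning `None` for non-significant entries keeps them distinct from a numeric FDR of 1, which the table also uses.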
Fig. 2 Epigenetic and chromatin interactions near the candidate GSH sites in blood and brain. a Genome browser screenshot of a representative GSH in blood. From top to bottom: Interaction heatmap and TADs from Hi-C in GM12878 cells. Chromatin interaction loops from promoter capture Hi-C in blood cells (see Methods section for details). The coordinate of the GSH. Active and repressive genomic regions defined by 15-state ChromHMM from blood cells in the Roadmap project (Additional file 9: Table S8), and reference genes. b Genome browser screenshot of a representative GSH in brain. From top to bottom: Interaction heatmap and TADs from Hi-C in brain hippocampus. Chromatin interaction loops from promoter capture Hi-C in brain cells (dorsolateral prefrontal cortex, hippocampus, and neural progenitor cells). The coordinate of the GSH. Active and repressive genomic regions defined by 15-state ChromHMM from brain cells in the Roadmap project (Additional file 9: Table S8), and reference genes. Regions surrounding the GSH sites are highlighted with blue shade
Identified GSH sites in brain from the GTEx data
| GSH_ID | Position | FDR | AF | TAD gene density | Active regions | Gene | Location | Dataset |
|---|---|---|---|---|---|---|---|---|
| BRN_GSH_1 | chr1:247027889-247027890 | n.s. | 0.46 | NA | 4_Tx,5_TxWk | AHCTF1 | Intron | GTEx |
| BRN_GSH_2 | chr12:987960-987961 | n.s. | 0.14 | NA | 4_Tx,5_TxWk | WNK1 | Intron | GTEx |
| BRN_GSH_3 | chr22:18283915-18283916 | n.s. | 0.32 | NA | 5_TxWk | MICAL3 | Intron | GTEx |
| BRN_GSH_4 | chr4:5834890-5834891 | n.s. | 0.22 | NA | 4_Tx,5_TxWk | CRMP1 | Intron | GTEx |
| BRN_GSH_5 | chr7:5553845-5553846 | n.s. | 0.15 | NA | 1_TssA,2_TssAFlnk | LOC221946 | Intron | GTEx |
GSH_ID: unique GSH ID, Position: genomic coordinates for the GSH (hg19), FDR: eQTL FDR value, n.s. non-significant, AF: pMEI allele frequency, TAD gene density: gene density of GSH TAD, NA GSH not assigned to a TAD, Active regions: active transcription region based on ChromHMM states, Gene: GSH overlapping gene, Location: position of GSH (intron, exon, intergenic), Dataset: data source, GTEx the Genotype-Tissue Expression Project
Fig. 3 Experimental validation of GSHs in HUDEP2 cells. a PCA plot showing the RNA-seq data for all tested cell lines. b–d Volcano plots showing differentially expressed genes (DEGs) in a blood GSH (BLD_GSH_10), non-GSH MEI (MEI_chr3_3707_INS) and AAVS1. Common: DEGs shared by more than two cell lines. Same TAD: genes within the same TAD as the GFP integration site; +/− TAD: genes in the TADs flanking the GFP integration site
Fig. 4 Long-term validation of BLD_GSH_10 clones. a Representative distribution of GFP fluorescence signals in HUDEP2 WT cells (gray) and cells from a HUDEP2 clone with a GFP reporter transgene integrated in the GSH site (blue) on day 1 and day 90, respectively. b Bar plots showing the normalized GFP fluorescence signals of five independent clones and WT HUDEP2 cells. c Representative immuno-flow cytometry results comparing cell differentiation between WT cells and cells from one GFP clone. The Y-axis is the signal for the red blood cell maturation marker Band3. The X-axis is the signal for GFP. The mature red blood cell compartment is highlighted in red. d Bar plots showing the percentage of Band3-high cell populations before and after differentiation for five GFP clones and WT cells