Christa Geeke Toenhake, Sabine Anne-Kristin Fraschka, Mahalingam Shanmugiah Vijayabaskar, David Robert Westhead, Simon Jan van Heeringen, Richárd Bártfai.
Abstract
Underlying the development of malaria parasites within erythrocytes and the resulting pathogenicity is a hardwired program that secures proper timing of gene transcription and production of functionally relevant proteins. How stage-specific gene expression is orchestrated in vivo remains unclear. Here, using the assay for transposase accessible chromatin sequencing (ATAC-seq), we identified ∼4,000 regulatory regions in P. falciparum intraerythrocytic stages. The vast majority of these sites are located within 2 kb upstream of transcribed genes and their chromatin accessibility pattern correlates positively with abundance of the respective mRNA transcript. Importantly, these regions are sufficient to drive stage-specific reporter gene expression and DNA motifs enriched in stage-specific sets of regulatory regions interact with members of the P. falciparum AP2 transcription factor family. Collectively, this study provides initial insights into the in vivo gene regulatory network of P. falciparum intraerythrocytic stages and should serve as a valuable resource for future studies.
Keywords: ATAC-seq; Plasmodium falciparum; RNA-seq; chromatin; malaria; regulatory sequences; transcription factor
Year: 2018 PMID: 29649445 PMCID: PMC5899830 DOI: 10.1016/j.chom.2018.03.007
Source DB: PubMed Journal: Cell Host Microbe ISSN: 1931-3128 Impact factor: 21.023
Figure 1. ATAC-Seq Detects Chromatin Accessibility during Intraerythrocytic Development of P. falciparum
(A) UCSC genome browser screenshot of a 66,700 bp region on chromosome 7 displaying chromatin accessibility profiles from eight time points (t05–t40 hr post-invasion [hpi]) as a ratio of normalized ATAC-seq tag count over background transposition in naked, genomic DNA (ATAC-seq/gDNA). Black bars below the t40 track depict the peak regions identified by MACS2 peak calling. Coding sequences are indicated as blue (positive strand) or red (minus strand) bars. GC%, the mean percentage of GC nucleotides, smoothed over 5 bp windows.
(B) Heatmap depicting the Pearson correlation of RPKM (reads per kb per million mapped reads) values in peak regions from the above dataset (REPLICATE1) and an independent eight time point ATAC-seq dataset (REPLICATE2) demonstrating a high degree of reproducibility.
(C) The distribution of accessible regions among different genomic regions. Peaks were called on accessibility profiles of individual time points (t05–t40) and merged, yielding 4,035 accessible regions. Intergenic regions were categorized based on the direction of transcription of the flanking coding regions (divergent, green; convergent, orange; tandem, pink).
(D) Peaks located in divergent and tandem intergenic regions were assigned to the closest downstream gene and the distance between the ATAC peak summit and the ATG or TSS was calculated. The line depicts the smoothed distribution of distances (kernel density estimate).
See also Figure S1 and Table S1.
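The categories in (C) and the distance in (D) can be illustrated with a minimal Python sketch. The function names, the `(start, end, strand)` tuple layout, and the coordinates are hypothetical and are not taken from the study's pipeline. The sketch also assumes that the ATG sits at the strand-aware start of the annotated coding sequence and that the peak summit lies between the two flanking genes.

```python
def classify_intergenic(left_strand, right_strand):
    """Categorize an intergenic region by the strands of its flanking genes."""
    if left_strand == "-" and right_strand == "+":
        return "divergent"   # both genes are transcribed away from the region
    if left_strand == "+" and right_strand == "-":
        return "convergent"  # both genes are transcribed toward the region
    return "tandem"          # both genes lie on the same strand


def summit_to_atg(summit, left_gene, right_gene):
    """Return (side, distance) for the closest gene whose ATG faces the summit.

    Genes are hypothetical (start, end, strand) tuples with start < end.
    Returns None for convergent regions, which have no downstream gene.
    """
    candidates = []
    _, left_end, left_strand = left_gene
    right_start, _, right_strand = right_gene
    if left_strand == "-":   # ATG of a minus-strand gene is at its right end
        candidates.append(("left", summit - left_end))
    if right_strand == "+":  # ATG of a plus-strand gene is at its left end
        candidates.append(("right", right_start - summit))
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[1])
```

For a divergent region, both flanking genes qualify and the nearer ATG is chosen; for a tandem region, only the gene downstream of the peak qualifies.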
Figure 2. ATAC-Seq Predicts TF-Binding Events Detected Earlier by ChIP-Seq
(A) UCSC genome browser screenshots of three genomic regions with AP2-I TF-binding sites. Normalized ATAC-seq coverage is plotted as ratio over coverage in gDNA control. Data of the three AP2-I ChIP-seq replicates from Santos et al. (2017), generated in the P. falciparum Dd2 line, were mapped against the P. falciparum 3D7 genome. Turquoise bars are AP2-I peaks from Santos et al. (2017).
(B) Overlap between AP2-I peaks and ATAC peaks from t40 time point (165 out of 2,771 peaks overlap with 169 out of 177 AP2-I peaks) plotted as Venn diagram.
(C) Heatmap of non-gDNA corrected ATAC-seq accessibility profiles over AP2-I peaks (midpoint ± 5 kb) that overlap with merged ATAC peaks (n = 169). Profiles were clustered using Pearson correlation calculated for the middle 100 bp window into 2 clusters by k-means clustering.
(D) Median accessibility profiles for the two clusters of ATAC peaks defined in (C), with the 50th and 90th percentiles shown as dark- and light-colored shades. Accessibility is calculated as proportion of sum of quantile normalized RPKM values over the time points.
See also Table S2.
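The two counts reported in (B) need not be equal, because one peak in one set can overlap several peaks in the other. The following minimal Python sketch shows this kind of two-sided overlap count. It assumes half-open `(start, end)` intervals on a single chromosome; the function names and example coordinates are hypothetical and do not reproduce the tool used in the study.

```python
def intervals_overlap(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(peaks_a, peaks_b):
    """Count peaks in each set that overlap at least one peak in the other."""
    a_hits = sum(any(intervals_overlap(a, b) for b in peaks_b) for a in peaks_a)
    b_hits = sum(any(intervals_overlap(a, b) for a in peaks_a) for b in peaks_b)
    return a_hits, b_hits


# Hypothetical example: one broad peak in the first set spans two narrow
# peaks in the second set, so the two counts differ.
broad_peaks = [(100, 900), (2000, 2300)]
narrow_peaks = [(150, 250), (700, 800), (5000, 5100)]
print(count_overlapping(broad_peaks, narrow_peaks))
```

The quadratic scan is adequate for a few thousand peaks; sorted sweeps or an interval tree would be the usual choice for larger sets.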
Figure 3. Chromatin Accessibility Positively Correlates with mRNA Abundance of the Downstream Gene
(A) UCSC genome browser screenshot (chr3: 110,000–125,500) of ATAC-seq chromatin accessibility profiles (top) and corresponding RNA-seq profiles (bottom; only reads mapping to the sense strand are displayed).
(B) Heatmaps displaying relative chromatin accessibility and mRNA abundance profiles of the downstream gene through eight stages of intraerythrocytic development. Relative accessibility in ATAC-seq peaks located in divergent or tandem intergenic regions was calculated as a proportion of sum of quantile normalized RPKM values over the time points and clustered by k-means using the 1-Pearson correlation distance metric. The relative transcript abundances of the downstream located genes, expressed as proportion of sum of sense strand RPKM values over the time points, were plotted in the same order. Color scales range from the 20th to the 80th percentile for both datasets.
(C) Observed (red line) and 1,000 randomized distributions (1,000 gray lines with mean of all randomizations as black line) of Pearson correlation coefficients between relative chromatin accessibility and mRNA abundance of the downstream gene (as in B) plotted as smoothed kernel density distribution.
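The quantities in (B) and (C) can be sketched in a few lines of Python. The legend does not state how the randomizations were generated, so shuffling the peak-to-gene pairing is an assumption made here for illustration; the function names and the seed are likewise hypothetical. The sketch assumes every profile has a nonzero sum and is not constant across time points.

```python
import math
import random


def proportion_of_sum(values):
    """Express each time point as a fraction of the profile's total."""
    total = sum(values)
    return [v / total for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def observed_and_randomized(accessibility, expression, n_random=1000, seed=0):
    """Correlate paired profiles, then repeat with shuffled pairings.

    ASSUMPTION: randomization permutes which gene is paired with which peak.
    """
    rng = random.Random(seed)
    acc = [proportion_of_sum(p) for p in accessibility]
    expr = [proportion_of_sum(p) for p in expression]
    observed = [pearson(a, e) for a, e in zip(acc, expr)]
    randomized = []
    for _ in range(n_random):
        shuffled = expr[:]
        rng.shuffle(shuffled)
        randomized.append([pearson(a, e) for a, e in zip(acc, shuffled)])
    return observed, randomized
```

Each list of coefficients could then be passed to a kernel density estimator to draw one line per distribution. The 1-Pearson distance named in (B) is simply `1 - pearson(x, y)`.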
Figure 4. Accessible Regions Are Sufficient in Driving Stage-Specific Reporter Gene Expression
(A) Schematic representation of the DNA constructs inserted into the cg6 locus of attB(+) 3D7 P. falciparum parasites using the site-specific integration system (Nkrumah et al., 2006). Accessible regions detected by ATAC-seq (light blue) were cloned upstream of the minimal kahrp promoter (yellow) followed by the gfp-luc fusion gene (green).
(B) Bar plots showing the relative gfp-luc transcript abundance determined by qRT-PCR for ring, trophozoite, and schizont stages from parasites carrying the reporter construct without accessible region (only the kahrp minimal promoter, yellow), with the accessible region of PF3D7_1372200 (hrpIII, positive control), or with selected accessible region located upstream of PF3D7_0719000, PF3D7_1200700, or PF3D7_1222700. For the latter three constructs with novel putative regulatory sequences, reporter gene expression was measured in biological duplicate. Relative gfp-luc transcript abundance was determined based on a standard reference dilution series (STAR Methods). The relative abundances in replicate 2 were scaled to the average of replicate 1. The raw data are depicted in Figure S2. Transcript abundance (RPKM) of the respective gene as determined by RNA-seq is indicated as a blue line for eight time points (t05–t40).
See also Figure S2.
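The replicate scaling mentioned in (B) can be read in more than one way. The minimal Python sketch below shows one plausible interpretation, in which replicate 2 is multiplied by a single factor so that its mean matches the mean of replicate 1. The function name and the example values are hypothetical, and the exact procedure used in the study is described in the STAR Methods, not here.

```python
def scale_to_reference_mean(reference, values):
    """Rescale `values` so that their mean equals the mean of `reference`.

    ASSUMPTION: one global factor is applied across all stages.
    """
    ref_mean = sum(reference) / len(reference)
    val_mean = sum(values) / len(values)
    factor = ref_mean / val_mean
    return [v * factor for v in values]


# Hypothetical relative abundances for ring, trophozoite, and schizont stages.
replicate1 = [1.0, 4.0, 7.0]
replicate2 = [2.0, 8.0, 14.0]
print(scale_to_reference_mean(replicate1, replicate2))
```

A single global factor preserves the stage-to-stage ratios within replicate 2, which is the property that matters when comparing stage-specific expression patterns.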
Figure 5. DNA Motifs Enriched in Stage-Specific ATAC Regions
(A) Co-clustering of ATAC-seq and RNA-seq data, limited to peak-to-gene pairs with Pearson correlation > 0.6 (n = 2,118; Table S1). Accessibility and transcript abundance are expressed as proportion of sum of (qn)RPKM values over the time points and clustered by k-means using the 1-Pearson correlation distance metric into eight clusters. Color scales range from the 20th to 80th percentile per dataset.
(B) Heatmap of significance estimates for differential motif enrichment (expressed as −log10(p value)). Each column relates to a cluster generated by co-clustering the ATAC-seq and RNA-seq data by k-means cluster (see A). Each row refers to a motif or a “motif group,” with logo and name listed on the right. Asterisks indicate that the motif is a representative from a group of similar motifs (Figure S3B; Table S3). Predicted binding sites for P. falciparum AP2 TFs are reported in blue font with PlasmoDB geneID (name, if known, is in brackets). When a cluster contains one or more predicted P. falciparum AP2 motifs, these are reported in brackets behind the representative motif.
See also Figure S3 and Tables S1 and S3.
Figure 6. DNA Pull-Downs Identify Potential cis-trans Regulatory Interactors
Scatterplots displaying the quantitative proteomic analysis of duplicate DNA pull-downs with label swap using (A) 58–60 bp DNA probes with the CA-repetitive motif (ACACACAT) and three de novo predicted motifs (B, de_novo_motif_031, TTATTACAC; C, de_novo_motif_028, GCACWWTNNKTGCW; and D, de_novo_motif_050, GAGCTCAA). The same probes with a scrambled motif were used as controls. The statistically significant outliers (black diamond, intensity-based FDR < 5%) are the potential interactors of the motif. Red font indicates that the interaction was confirmed using a probe from a different genomic region, but containing the same motif. Green dots are candidate DNA-binding factors derived from Table 4 of Bischoff and Vaquero (2010). Earlier predicted binding sites for the identified AP2 factors are shown below each plot in relation to the motif used in the pull-down.
See also Figure S4 and Table S4.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| STBL3_pDC2 | ( | N/A |
| STBL3_pOM1 | This manuscript | N/A |
| STBL3_pOM2 | This manuscript | N/A |
| XL10-Gold_ | This manuscript | N/A |
| XL10-Gold_ | This manuscript | N/A |
| XL10-Gold_ | This manuscript | N/A |
| XL10-Gold_ | This manuscript | N/A |
| XL10-Gold_ | This manuscript | N/A |
| XL10-Gold_ | This manuscript | N/A |
| XL10-Gold_ | This manuscript | N/A |
| DH5α_pINT | This manuscript | N/A |
| STBL3 | Thermo Fisher Scientific | Cat#C7373-03 |
| XL10-Gold | Stratagene | Cat#200314 |
| DH5α | New England Biolabs | Cat#C29871 |
| Blasticidin-S-HCl | Thermo Fisher Scientific | Cat#R210-01 |
| WR99210 | Jacobus Pharmaceuticals | N/A |
| Geneticin | Thermo Fisher Scientific | Cat#11811-031 |
| Proteinase K | Sigma-Aldrich | Cat#P6556 |
| KAPA HiFi HotStart ReadyMix | KAPA Biosystems | Cat#KK2602 |
| TURBO DNase | Ambion (Thermo Fisher Scientific) | Cat#AM2238 |
| NextFlex adapters | Bio Scientific | Cat#514122 |
| Actinomycin D | Thermo Fisher Scientific | Cat#11805017 |
| SuperScript III Reverse Transcriptase | Invitrogen (Thermo Fisher Scientific) | Cat#18080044 |
| RNasin Plus RNase Inhibitor | Promega | Cat#N261B |
| 2x iQ SYBR Green Supermix | BioRad | Cat#170-8887 |
| cOmplete, EDTA-free Protease Inhibitor Cocktail | Roche (Sigma-Aldrich) | Cat#04693132001 |
| Ribonucleic acid, transfer from baker’s yeast ( | Sigma-Aldrich | Cat#R5636 |
| Poly(deoxyinosinic-deoxycytidylic) acid sodium salt | Sigma-Aldrich | Cat#P4929 |
| Poly(deoxyadenylic-thymidylic) acid sodium salt | Sigma-Aldrich | Cat#P0883 |
| TCEP | Sigma-Aldrich | Cat#C4706-2G |
| MMTS | Thermo Fisher Scientific | Cat#23011 |
| Trypsin/Lys-C Mix, Mass Spec Grade | Promega | Cat#V5072 |
| NaBH3CN | Merck | Cat#818053 |
| NaBD3CN | Sigma-Aldrich | Cat#190020-1G |
| Trifluoroacetic acid ULC/MS | Biosolve BV | Cat#20234131 |
| Wizard Plus SV Minipreps DNA Purification Systems | Promega | Cat#A1460 |
| QIAamp DNA Blood Mini Kit | QIAGEN | Cat#51106 |
| QIAquick PCR Purification Kit | QIAGEN | Cat#28106 |
| MinElute PCR Purification Kit | QIAGEN | Cat#28006 |
| QIAGEN RNeasy Mini Kit | QIAGEN | Cat#74106 |
| Oligotex mRNA Mini Kit | QIAGEN | Cat#70022 |
| Qubit dsDNA HS Assay Kit | Thermo Fisher Scientific | Cat#Q32854 |
| Qubit RNA HS Assay Kit | Thermo Fisher Scientific | Cat#Q32852 |
| Qubit Protein Assay Kit | Thermo Fisher Scientific | Cat#Q33212 |
| Nextera DNA Library Prep Kit | Illumina | Cat#FC-121-1030 |
| Nextera DNA Sample Preparation Index Kit | Illumina | Cat#FC-121-1012 |
| Agilent High Sensitivity DNA Kit | Agilent | Cat#5067-4626 |
| KAPA Library Quantification Kit | KAPA Biosystems | Cat#KR0405 |
| NextSeq500/550 HighOutput kit V2 (75 cycles) | Illumina | Cat#FC-404-2005 |
| ATAC-seq data in | This manuscript | GEO: |
| RNA-seq data in | This manuscript | GEO: |
| AP2-I-GFP ChIP-seq data in | ( | GEO: |
| PlasmoDB and GeneDB ( | ||
| PlasmoDB and GeneDB ( | ||
| PlasmoDB and GeneDB ( | ||
| Motifs from plants, vertebrates and invertebrates reported in CISBP | ( | |
| Parasite strain: | ( | Alan Cowman, WEHI, Melbourne, Australia |
| Parasite strain: | ( | David A. Fidock, Columbia Uni., US |
| Parasite strain: | This manuscript | N/A |
| Parasite strain: | This manuscript | N/A |
| Parasite strain: | This manuscript | N/A |
| Parasite strain: | This manuscript | N/A |
| Parasite strain: | This manuscript | N/A |
| Parasite strain: | This manuscript | N/A |
| See | Biolegio B.V. | N/A |
| See | Integrated DNA Technologies | N/A |
| Random hexamer primers | Roche (Sigma-Aldrich) | Cat#11034731001 |
| OligodT12-18 | Invitrogen (Thermo Fisher Scientific) | Cat#18418012 |
| MV163 plasmid | ( | Robert Sauerwein, Radboud UMC, NL |
| pDC2 ( | ( | David A. Fidock, Columbia Uni., US |
| pINT | ( | David A. Fidock, Columbia Uni., US |
| pOM1 | This manuscript | N/A |
| pOM2 | This manuscript | N/A |
| | This manuscript | N/A |
| | This manuscript | N/A |
| | This manuscript | N/A |
| | This manuscript | N/A |
| | This manuscript | N/A |
| | This manuscript | N/A |
| | This manuscript | N/A |
| | This manuscript | N/A |
| FastQC v0.11.2 | ( | RRID: SCR_014583; |
| BWA samse (version 0.7.12-r1039) | ( | RRID: SCR_010910; |
| BWA-mem (version 0.7.10) | ( | RRID: SCR_010910; |
| Picard tools (version 1.139) | Broad Institute | RRID: SCR_006525; |
| Samtools (version 1.2 and 1.3.1) | ( | RRID: SCR_002105; |
| Bedtools suite (version 2.20.1) | ( | RRID: SCR_006646; |
| MACS2 (release 2.7) | ( | RRID: SCR_013291; |
| R package preprocessCore (version 1.36.0) | ( | |
| UCSC Genome Browser | ( | RRID: SCR_005780; |
| Morpheus tool | Broad Institute | |
| Trimmomatic (version 0.36) | ( | RRID: SCR_011848; |
| Fluff | ( | |
| MaxQuant (version 1.5.3.30) | ( | RRID: SCR_014485; |
| Perseus software package (version 1.4.0.20) | ( | RRID: SCR_015753; |
| R package SeqGL (version 1.1.3) | ( | |
| GimmeMotifs package (v0.11.0) | ( | RRID: SCR_001146; |
| Plasmodipur filters | EuroProxima | Cat#8011Filter25u |
| Agencourt AMPure XP beads | Beckman Coulter | Cat#A63882 |
| Streptavidin Sepharose High Performance | GE Healthcare | Cat#17511301 |
| 2% E-Gel Size Select agarose gels | Invitrogen (Thermo Fisher Scientific) | Cat#G6610-02 |