Elizabeth Tseng, William J Rowell, Omolara-Chinue Glenn, Ting Hon, Julio Barrera, Steve Kujawa, Ornit Chiba-Falek.
Abstract
Dysregulation of alpha-synuclein expression has been implicated in the pathogenesis of synucleinopathies, in particular Parkinson's Disease (PD) and Dementia with Lewy bodies (DLB). Previous studies have shown that the alternatively spliced isoforms of the SNCA gene are differentially expressed in different parts of the brain in PD and DLB patients. Similarly, SNCA isoforms with skipped exons can have a functional impact on the protein domains. The large intronic region of the SNCA gene was also shown to harbor structural variants that affect transcriptional levels. Here, we present the first study to use long-read sequencing with targeted capture of both the gDNA and cDNA of the SNCA gene in brain tissues of PD, DLB, and control samples, using the PacBio Sequel system. The targeted full-length cDNA (Iso-Seq) data confirmed complex usage of known alternative start sites and variable 3' UTR lengths, as well as novel 5' starts and 3' ends not previously described. The targeted gDNA data allowed phasing of up to 81% of the ~114 kb SNCA region, with the longest phased block exceeding 54 kb. We demonstrate that long gDNA and cDNA reads have the potential to reveal long-range information not previously accessible using traditional sequencing methods. This approach has a potential impact in studying disease risk genes such as SNCA, providing new insights into the genetic etiologies of human complex diseases such as synucleinopathies, including perturbations to the landscape of the gene transcripts.
Keywords: Iso-Seq; PacBio; Parkinson’s Disease; alternative splicing; isoforms; long read sequencing; targeted sequencing
Year: 2019 PMID: 31338105 PMCID: PMC6629766 DOI: 10.3389/fgene.2019.00584
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Schematic presentation of the study design. DNA and RNA were extracted from postmortem brain tissues of patients with Parkinson’s disease or Dementia with Lewy bodies, and from controls. gDNA and cDNA libraries were made using probe hybridization and sequenced on the PacBio Sequel system. Analysis was performed using PacBio software and other existing tools.
Figure 2. Targeted gDNA capture and phasing. An example showing one sample from each condition. The top track shows one of the SNCA isoforms, followed by the gDNA coverage for the three samples. The variant track shows each SNP, color-coded as heterozygous (purple), homozygous alternative (orange), or homozygous reference (gray). Phased blocks are shown in light blue. The bottom track shows capture probe locations. The dropout region in the probe design is due to two LINE elements in the middle of intron 4. For the gDNA coverage and phasing information of all 12 samples, see Supplementary Figures.
A novel triplet tandem repeat in intron 4 (chr4: 90713442).
| Sample | |
|---|---|
| PD-1 | |
| PD-2 | |
| PD-3 | |
| PD-4 | |
| N-1 | |
| N-2 | |
| N-3 | |
| N-4 | |
| DLB-1 | |
| DLB-2 | |
| DLB-3 | |
| DLB-4 | |
PD-4 is incorrectly genotyped by GATK4HC but can be genotyped by visual inspection.
The reference has 16 repeats. The table shows the repeat number of both haplotypes for each sample.
Figure 3. SNCA isoforms captured using targeted Iso-Seq reveal novel start and end sites. The majority of the isoform complexity comes from combinatorial usage of alternate 3′ UTR lengths and exon 1, with a few rare alternative splice sites found in exon 1 (green), 2 (red), and 4 (blue). All junctions have canonical splice sites. We identified five isoforms that skipped exon 5 and two isoforms that skipped exon 3. We also identified novel start (orange) and end sites (purple) in intron 4. Called SNPs are marked in purple.
SNCA isoform abundance for each sample, aggregated by splicing patterns.
| GROUP | PD-1 (%) | PD-2 (%) | PD-3 (%) | PD-4 (%) | N-1 (%) | N-2 (%) | N-3 (%) | N-4 (%) | DLB-1 (%) | DLB-2 (%) | DLB-3 (%) | DLB-4 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AllExons | 95.92 | 97.06 | 95.82 | 95.76 | 98.10 | 97.73 | 96.63 | 96.68 | 96.00 | 94.59 | 94.58 | 96.21 |
| Skip3 | 0.00 | 0.00 | 0.03 | 0.08 | 0.03 | 0.09 | 0.02 | 0.25 | 0.08 | 0.13 | 0.24 | 0.00 |
| Skip5 | 2.50 | 1.47 | 1.68 | 1.10 | 0.79 | 0.93 | 1.00 | 1.32 | 2.04 | 1.42 | 1.56 | 1.89 |
| Alt5 | 1.02 | 0.00 | 1.61 | 2.76 | 0.64 | 0.77 | 1.85 | 0.08 | 0.94 | 2.83 | 3.16 | 0.79 |
| Alt3 | 0.56 | 1.47 | 0.85 | 0.30 | 0.44 | 0.49 | 0.51 | 1.67 | 0.94 | 1.03 | 0.47 | 1.10 |
The abundance for each isoform is the fraction of on-target, full-length reads associated with that isoform.
Isoforms are grouped by their splice patterns (see Figure 3).
AllExons, isoforms expressing all 6 exons; Skip3, isoforms skipping exon 3; Skip5, isoforms skipping exon 5; Alt5, isoforms with the alternative start site in intron 4; Alt3, isoforms with the alternative end site in intron 4.
The full abundance for each isoform is shown in Supplementary Table and Supplementary Data.
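The abundance definition in the notes above (fraction of on-target, full-length reads per splice-pattern group) can be illustrated with a minimal Python sketch. The function name `group_abundance`, the input layout (one group label per on-target full-length read), and the read counts in the usage example are all assumptions made for illustration; they are not the authors' pipeline or data.

```python
from collections import Counter


def group_abundance(read_groups):
    """Return the percentage of reads assigned to each splice-pattern group.

    `read_groups` is a hypothetical input: one group label (e.g. "AllExons",
    "Skip5") per on-target, full-length read of a single sample.
    """
    counts = Counter(read_groups)
    total = sum(counts.values())
    if total == 0:
        # No on-target full-length reads: no abundance can be reported.
        return {}
    # Percentages rounded to two decimals, matching the table's precision.
    return {group: round(100.0 * n / total, 2) for group, n in counts.items()}


# Made-up read counts for one sample (illustration only).
example_reads = ["AllExons"] * 96 + ["Skip5"] * 3 + ["Alt5"] * 1
example_abundance = group_abundance(example_reads)
```

Because every read is counted in exactly one group, the percentages for a sample sum to 100 up to rounding, as the columns of the table do.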
cDNA SNP information.
| # | Coord | dbSNP | Annotation | Ref | Alt | Homo_Ref | Homo_Alt | Het | Inconclusive |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 90758389 | rs2301135 | exon1 | G | C | PD-1, | PD-3, | PD-4, | PD-2, |
| 2 | 90757312 | rs2870027 | exon1 | C | T | PD-3, | DLB-3 | PD-1, | |
| 3 | 90743331 | rs10005233 | intron4 | C | T | PD-1, | PD-3, | N-1, | PD-2 |
| 4 | 90646886 | rs356165 | exon6 | G | A | N-4, | PD-1, | PD-3, | PD-2 |
SNPs were called using full-length (FL) reads from the Iso-Seq data. For each sample, the number of FL reads supporting either the reference or the alternative base was tabulated. If both alleles have 5+ FL read support, the SNP is called heterozygous; if one allele has 5+ reads and the other has <5, homozygous; otherwise, inconclusive.
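The read-support rule described above can be written as a small decision function. This is a sketch under stated assumptions: the function name `call_genotype`, the `min_support` parameter, and the returned labels (chosen to match the table's column headers) are illustrative, and the read counts in the usage lines are made up.

```python
def call_genotype(ref_reads, alt_reads, min_support=5):
    """Classify one SNP in one sample from full-length read support.

    `ref_reads` and `alt_reads` are the numbers of FL reads supporting the
    reference and alternative base. `min_support` is the 5-read threshold
    stated in the table note.
    """
    ref_supported = ref_reads >= min_support
    alt_supported = alt_reads >= min_support
    if ref_supported and alt_supported:
        return "Het"           # both alleles have enough support
    if ref_supported:
        return "Homo_Ref"      # only the reference allele is supported
    if alt_supported:
        return "Homo_Alt"      # only the alternative allele is supported
    return "Inconclusive"      # neither allele reaches the threshold


# Made-up read counts (illustration only).
calls = [
    call_genotype(12, 9),
    call_genotype(30, 2),
    call_genotype(0, 7),
    call_genotype(4, 4),
]
```

Note that under this rule a site with 5 reference reads and 4 alternative reads is called homozygous reference, so calls near the threshold are sensitive to coverage.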