Gloria C Ferreira, Jenna Oberstaller, Renée Fonseca, Thomas E Keller, Swamy Rakesh Adapa, Justin Gibbons, Chengqi Wang, Xiaoming Liu, Chang Li, Minh Pham, Guy W Dayhoff II, Ben Busby, Rays H Y Jiang, Linh M Duong, Luis Tañón Reyes, Luciano Enrique Laratelli, Douglas Franz, Segun Fatumo, Atm Golam Bari, Audrey Freischel, Lindsey Fiedler, Omkar Dokur, Krishna Sharma, Deborah Cragun.
Abstract
Background: Basic and clinical scientific research at the University of South Florida (USF) has intersected to support a multi-faceted approach built around a common focus on rare iron-related diseases. We proposed a modified version of the National Center for Biotechnology Information's (NCBI) hackathon model to take full advantage of local expertise in building "Iron Hack", a rare-disease-focused hackathon. Because the collaborative, problem-solving nature of hackathons tends to attract participants from highly diverse backgrounds, the organizers held a symposium on rare iron-related diseases, specifically porphyrias and Friedreich's ataxia, pitched at general audiences.
Keywords: Ataxia; Bioinformatics; Clinical Informatics; Data Science; Friedreich’s Ataxia; Hackathon; Porphyria; Rare Diseases
Year: 2019 PMID: 31824661 PMCID: PMC6894363 DOI: 10.12688/f1000research.19140.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. UPWARD: Uniting People Working Against Rare Disease.
UPWARD opens with a web interface designed to communicate research and advocacy goals clearly to the public, request consent, and gather data in a HIPAA-compliant manner.
List of porphyria-related pathogenic SNPs.
UPWARD includes a tool built to map highly pathogenic and likely pathogenic porphyria-associated variants.
| Name | Gene | RSID | Chip |
|---|---|---|---|
| NM_000374.4(UROD):c.603A>G (p.Pro201=) | UROD | rs2228084 | GSA |
| NM_000374.4(UROD):c.842G>A (p.Gly281Glu) | UROD | rs121918057 | GSA |
| NM_000374.4(UROD):c.842G>T (p.Gly281Val) | UROD | rs121918057 | GSA |
| NM_000374.4(UROD):c.874C>G (p.Arg292Gly) | UROD | rs121918059 | GSA |
| NM_000374.4(UROD):c.912C>A (p.Asn304Lys) | UROD | rs121918065 | GSA |
| NM_000374.4(UROD):c.932A>G (p.Tyr311Cys) | UROD | rs121918061 | GSA |
| NM_000374.4(UROD):c.995G>A (p.Arg332His) | UROD | rs121918066 | GSA |
| NM_000309.4(PPOX):c.-90G>T | PPOX | rs115158839 | GSA |
| NM_001122764.1(PPOX):c.199delC (p.Leu67Terfs) | PPOX | rs786204784 | GSA |
| NM_001122764.3(PPOX):c.502C>T (p.Arg168Cys) | PPOX | rs121918325 | GSA |
| NM_000097.5(CPOX):c.814A>C (p.Asn272His) | CPOX | rs1131857 | GSA |
| NM_000410.3(HFE):c.187C>G (p.His63Asp) | HFE\|LOC108783645 | rs1799945 | GSA |
| NM_000410.3(HFE):c.193A>T (p.Ser65Cys) | HFE\|LOC108783645 | rs1800730 | GSA |
| NM_000410.3(HFE):c.845G>A (p.Cys282Tyr) | HFE | rs1800562 | GSA |
| NM_000031.5(ALAD):c.823G>A (p.Val275Met) | ALAD | rs121912981 | GSA |
| NM_000031.5(ALAD):c.718C>T (p.Arg240Trp) | ALAD | rs121912982 | GSA |
| NM_000031.5(ALAD):c.397G>A (p.Gly133Arg) | ALAD | rs121912980 | GSA |
| NM_000031.5(ALAD):c.36C>G (p.Phe12Leu) | ALAD | rs121912984 | GSA |
| NM_000375.2(UROS):c.683C>T (p.Thr228Met) | UROS | rs121908014 | GSA |
| NM_000375.2(UROS):c.673G>A (p.Gly225Ser) | UROS | rs121908020 | GSA |
| NM_000375.2(UROS):c.244G>T (p.Val82Phe) | UROS | rs121908016 | GSA |
| NM_000375.2(UROS):c.217T>C (p.Cys73Arg) | UROS | rs121908012 | GSA |
| NM_000375.2(UROS):c.184A>G (p.Thr62Ala) | UROS | rs28941775 | GSA |
| NM_000375.2(UROS):c.10C>T (p.Leu4Phe) | UROS | rs121908015 | GSA |
| NM_000190.4(HMBS):c.445C>T (p.Arg149Ter) | HMBS | rs118204120 | GSA |
| NM_000190.4(HMBS):c.499C>T (p.Arg167Trp) | HMBS | rs118204101 | GSA |
| NM_000190.4(HMBS):c.500G>T (p.Arg167Leu) | HMBS | rs118204095 | GSA |
| NM_000190.4(HMBS):c.500G>A (p.Arg167Gln) | HMBS | rs118204095 | GSA |
| NM_000190.4(HMBS):c.601C>T (p.Arg201Trp) | HMBS | rs118204109 | GSA |
| NM_000190.4(HMBS):c.606G>T (p.Val202=) | DPAGT1\|HMBS | rs1131488 | GSA |
| NM_000190.4(HMBS):c.1075G>A (p.Asp359Asn) | HMBS | rs144949995 | GSA |
| NM_001382.3(DPAGT1):c.1177A>G (p.Ile393Val) | DPAGT1\|HMBS | rs643788 | GSA |
| NM_001382.3(DPAGT1):c.994T>G (p.Phe332Val) | DPAGT1\|HMBS | rs138544311 | GSA |
| NM_000374.4(UROD):c.603A>G (p.Pro201=) | UROD | rs2228084 | OmniExpress |
| NM_000309.4(PPOX):c.-186C>A | PPOX | rs2301286 | OmniExpress |
| NM_000410.3(HFE):c.187C>G (p.His63Asp) | HFE\|LOC108783645 | rs1799945 | OmniExpress |
| NM_000190.3(HMBS):c.-65C>T | HMBS | rs589925 | OmniExpress |
| NM_000190.4(HMBS):c.88-14G>A | HMBS | rs17075 | OmniExpress |
| NM_000190.4(HMBS):c.613-19C>A | HMBS | rs1784304 | OmniExpress |
| NM_001382.3(DPAGT1):c.*427T>G | DPAGT1\|HMBS | rs28990975 | OmniExpress |
| NM_001382.3(DPAGT1):c.*417T>C | DPAGT1\|HMBS | rs7759 | OmniExpress |
| NM_001382.3(DPAGT1):c.*265A>G | DPAGT1\|HMBS | rs28990974 | OmniExpress |
| NM_001382.3(DPAGT1):c.1177A>G (p.Ile393Val) | DPAGT1\|HMBS | rs643788 | OmniExpress |
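The source does not describe how the UPWARD mapping tool works internally. A minimal sketch, assuming users upload a 23andMe-style raw genotype file (tab-separated rsid, chromosome, position, genotype, with `#` comment lines), is to look up each rsID from the table above in that file. The rsID dictionary below is a small hypothetical subset of the table, and the file layout is an assumption.

```python
import csv
import io

# Hypothetical subset of the rsIDs listed in the table above (rsID -> gene).
PORPHYRIA_RSIDS = {
    "rs1800562": "HFE",     # p.Cys282Tyr
    "rs1799945": "HFE",     # p.His63Asp
    "rs118204095": "HMBS",  # p.Arg167Leu / p.Arg167Gln
    "rs2228084": "UROD",    # p.Pro201=
}


def find_listed_variants(raw_text, rsid_to_gene):
    """Return (rsid, gene, genotype) for each listed rsID genotyped in the file.

    Assumes a 23andMe-style layout: rsid, chromosome, position, genotype,
    tab-separated, with header/comment lines starting with '#'.
    """
    hits = []
    lines = (
        line for line in io.StringIO(raw_text)
        if line.strip() and not line.startswith("#")
    )
    for row in csv.reader(lines, delimiter="\t"):
        rsid, _chrom, _pos, genotype = row[:4]
        # "--" marks a no-call in this assumed format; skip it.
        if rsid in rsid_to_gene and genotype != "--":
            hits.append((rsid, rsid_to_gene[rsid], genotype))
    return hits
```

The sketch reports only the observed genotype; deciding whether it carries the listed alternative allele would require parsing the alleles from the HGVS names in the table, which it does not attempt.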
Figure 2. Overview of the Variants Discovery pipeline for reporting possibly pathogenic variants associated with Mendelian diseases.
Abbreviations: dbNSFP, database for nonsynonymous SNPs’ functional predictions; WGSA, whole genome sequencing annotator; HGMD, Human Gene Mutation Database; eQTL, expression quantitative trait loci.
Figure 3. Flowchart of the Massiveseq methodology.
The pipeline takes metadata from the Sequence Read Archive (SRA) and parses it for quality control (QC). The main processing runs in a custom Snakemake workflow that aligns reads with HISAT2 and then quantifies transcripts with StringTie, parallelized across the available machines and cores.
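The original pipeline is a Snakemake workflow; as a stand-in, the plain-Python sketch below shows the same per-run logic. The QC criteria (`LibraryStrategy` equal to RNA-Seq and a minimum `spots` count in an SRA RunInfo table), the output file names, the intermediate samtools sort, and the use of HISAT2's `--sra-acc` option (available only in SRA-enabled HISAT2 builds) are all assumptions rather than details from the source. Unlike the described pipeline, it parallelizes only across local worker threads, not across machines.

```python
import csv
import io
import subprocess
from concurrent.futures import ThreadPoolExecutor


def select_runs(runinfo_csv, min_spots=1_000_000):
    """QC on SRA RunInfo metadata: keep RNA-Seq runs with enough reads (assumed criteria)."""
    reader = csv.DictReader(io.StringIO(runinfo_csv))
    return [
        row["Run"]
        for row in reader
        if row.get("LibraryStrategy") == "RNA-Seq"
        and int(row.get("spots") or 0) >= min_spots
    ]


def commands_for_run(run, index, annotation, threads=4):
    """Build the align -> sort -> quantify command lines for one SRA run."""
    return [
        ["hisat2", "-p", str(threads), "-x", index, "--sra-acc", run, "-S", f"{run}.sam"],
        ["samtools", "sort", "-@", str(threads), "-o", f"{run}.bam", f"{run}.sam"],
        ["stringtie", f"{run}.bam", "-p", str(threads), "-G", annotation, "-o", f"{run}.gtf"],
    ]


def run_all(runs, index, annotation, workers=2, dry_run=True):
    """Process runs in parallel; with dry_run=True, only return the planned commands."""
    plans = {run: commands_for_run(run, index, annotation) for run in runs}
    if dry_run:
        return plans

    def execute(run):
        # Steps within one run are sequential; different runs proceed in parallel.
        for cmd in plans[run]:
            subprocess.run(cmd, check=True)
        return run

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(execute, runs))  # re-raises the first failed command, if any
    return plans
```

Keeping `dry_run=True` as the default lets the plan be inspected before any tool is invoked; a workflow manager such as Snakemake adds what this sketch lacks, notably resuming from completed outputs and distributing jobs across machines.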
Figure 4. Workflow for identifying abnormal genes based on RNA-Seq.
After RNA-Seq is performed on a patient sample, the program retrieves RNA-Seq data from the Genotype-Tissue Expression Project (GTEx) database for the tissue potentially associated with the disease. Three RNA-Seq normalization methods are applied: fragments per kilobase of transcript per million mapped reads (FPKM), transcripts per million (TPM), and the normalization used by DESeq (differential gene expression analysis based on the negative binomial distribution). The data are then fitted to a Gaussian mixture model to remove noise within samples. Finally, differentially expressed genes in the patient sample are identified with the R package DESeq.
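The three normalizations differ in what they divide by. A minimal NumPy sketch for a genes x samples count matrix: FPKM scales by gene length in kilobases and by millions of mapped fragments; TPM length-normalizes first and then rescales so that each sample sums to one million; DESeq computes one size factor per sample as the median ratio of each gene's count to that gene's geometric mean across samples. Using the column sum of the count matrix as "total mapped fragments" is a simplifying assumption.

```python
import numpy as np


def fpkm(counts, lengths_bp):
    """FPKM for a genes x samples count matrix and per-gene lengths in bp."""
    per_million = counts.sum(axis=0) / 1e6  # assumed: mapped fragments = column sum
    per_kb = lengths_bp[:, None] / 1e3      # gene length in kilobases
    return counts / per_kb / per_million


def tpm(counts, lengths_bp):
    """TPM: length-normalize first, then scale each sample to one million."""
    rate = counts / (lengths_bp[:, None] / 1e3)
    return rate / rate.sum(axis=0) * 1e6


def deseq_size_factors(counts):
    """DESeq median-of-ratios size factors, one per sample."""
    expressed = np.all(counts > 0, axis=1)  # use genes with no zero counts
    log_counts = np.log(counts[expressed])
    log_geo_mean = log_counts.mean(axis=1)  # per-gene geometric mean, log scale
    return np.exp(np.median(log_counts - log_geo_mean[:, None], axis=0))
```

Dividing each column of the count matrix by its size factor gives DESeq-normalized counts; unlike FPKM and TPM, this correction accounts for sequencing depth and composition but not for gene length.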
Figure 5. Phenotype-to-Genotype Mapping: general workflow for assessing the combinatorial contribution of variants to disease phenotypes.
Input data are variant call files (.vcf) collected from patient samples. The feature-selection module gathers all available annotation information for each identified variant, then narrows the set to the variants most likely to be associated with the phenotype, based on user-specified parameters. These selected variants are then analyzed for their combinatorial contribution to the disease with the tools in the analysis module, which outputs tables and graphs summarizing the results.
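The source does not name the annotation fields, thresholds, or statistics used. A minimal sketch, assuming each variant record carries a population allele frequency and a deleteriousness score (both field names and both cutoffs are hypothetical), illustrates the two stages: feature selection by user-specified thresholds, then counting which pairs of selected variants co-occur more often in cases than in controls. Raw counts stand in for the statistical tests a real analysis module would apply.

```python
from collections import Counter
from itertools import combinations


def select_features(variants, max_af=0.01, min_score=20.0):
    """Keep IDs of rare, predicted-deleterious variants (hypothetical fields and cutoffs)."""
    return {
        v["id"]
        for v in variants
        if v["allele_freq"] <= max_af and v["score"] >= min_score
    }


def combination_counts(samples, selected, k=2):
    """Count each k-combination of selected variants co-occurring within a sample."""
    counts = Counter()
    for sample_variants in samples:
        present = sorted(selected & set(sample_variants))
        counts.update(combinations(present, k))
    return counts


def enriched_combinations(cases, controls, selected, k=2):
    """Map combinations seen more often in cases to (case count, control count)."""
    case_counts = combination_counts(cases, selected, k)
    control_counts = combination_counts(controls, selected, k)
    return {
        combo: (n, control_counts[combo])  # Counter returns 0 for unseen combos
        for combo, n in case_counts.items()
        if n > control_counts[combo]
    }
```

Enumerating all k-combinations grows quickly with the number of selected variants, which is one reason the feature-selection step comes first.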
Figure 6. Expression change of the UROS gene associated with eQTL SNP No. 1 across all tissue types in the Genotype-Tissue Expression Project (GTEx).
This variant is associated with significant down-regulation of UROS in all tissues except ovary. NES: normalized effect size.
Figure 7. Expression change of the UROS gene associated with eQTL SNP No. 2 across all tissue types in the Genotype-Tissue Expression Project (GTEx).
This variant is associated with significant down-regulation of UROS in all tissues except ovary. NES: normalized effect size.
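GTEx defines the normalized effect size as the slope of a linear regression of normalized expression on alternative-allele dosage, so a negative NES, as in these figures, means the ALT allele is associated with lower UROS expression. A minimal sketch of that slope computation, assuming expression values that are already normalized and covariate-corrected (GTEx's own preprocessing is not reproduced here), is:

```python
import numpy as np


def normalized_effect_size(dosage, expression):
    """Slope of normalized expression regressed on ALT-allele dosage (0, 1 or 2).

    Assumes `expression` has already been normalized; the sign gives the
    direction of the ALT allele's effect relative to REF.
    """
    dosage = np.asarray(dosage, dtype=float)
    expression = np.asarray(expression, dtype=float)
    slope, _intercept = np.polyfit(dosage, expression, 1)  # degree-1 least squares
    return slope
```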
Figure 8. Significance of up-regulated genes from the metaseq analysis; the red bar marks the 0.05 significance cutoff.
Distribution of significance for down-regulated genes from the metaseq analysis; no genes were significant at the 0.05 threshold.
List of novel-isoform transcripts within 1 kb of the FXN gene.
| Novel transcript | Chr. | Strand | Start (bp) | End (bp) | FPKM | TPM | Disease status |
|---|---|---|---|---|---|---|---|
| SRR8038380_chr.30572 | 9 | + | 69035259 | 69100178 | 1.787076 | 3.825795 | Friedreich's ataxia |
| SRR8038380_chr.30573 | 9 | - | 69107926 | 69108217 | 0.136139 | 0.291447 | Friedreich's ataxia |
| SRR8038387_chr.17699 | 9 | + | 69035259 | 69079076 | 0.274068 | 0.490055 | Carrier |
| SRR8038389_chr.19844 | 9 | + | 69035751 | 69074850 | 1.070571 | 1.139182 | Unaffected |
| SRR8038390_chr.21253 | 9 | + | 69035259 | 69100178 | 1.033484 | 1.298192 | Unaffected |
| SRR8038399_chr.26427 | 9 | + | 69035259 | 69079076 | 3.126959 | 7.802162 | Unaffected |
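The caption does not define "within 1 kb". One reading, assumed here, is that a transcript qualifies if it lies on the same chromosome and overlaps the gene interval extended by 1,000 bp on each side. The record layout (`id`, `chrom`, `start`, `end` keys) is hypothetical, and real FXN coordinates should come from the same annotation and genome build used for the transcript assembly.

```python
def within_window(tx_start, tx_end, gene_start, gene_end, window=1000):
    """True if [tx_start, tx_end] overlaps the gene interval padded by `window` bp."""
    return tx_start <= gene_end + window and tx_end >= gene_start - window


def transcripts_near_gene(transcripts, chrom, gene_start, gene_end, window=1000):
    """IDs of transcripts on `chrom` within `window` bp of the gene (hypothetical record keys)."""
    return [
        t["id"]
        for t in transcripts
        if t["chrom"] == chrom
        and within_window(t["start"], t["end"], gene_start, gene_end, window)
    ]
```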
Number of RNA-Seq samples available in the Genotype-Tissue Expression Project (GTEx) for each tissue.
| Tissue | Number of samples |
|---|---|
| Adipose | 797 |
| Adrenal | 190 |
| Bladder | 11 |
| Blood | 536 |
| Blood | 913 |
| Brain | 1671 |
| Breast | 290 |
| Cervix | 11 |
| Colon | 507 |
| Esophagus | 1021 |
| Fallopian | 7 |
| Heart | 600 |
| Kidney | 45 |
| Liver | 175 |
| Lung | 427 |
| Muscle | 564 |
| Nerve | 414 |
| Ovary | 133 |
| Pancreas | 248 |
| Pituitary | 183 |
| Prostate | 152 |
| Salivary | 97 |
| Skin | 1203 |
| Small | 137 |
| Spleen | 162 |
| Stomach | 262 |
| Testis | 259 |
| Thyroid | 446 |
| Uterus | 111 |
| Vagina | 115 |
Figure 9. Comparison of normalization methods on a sample of n = 20 genes: fragments per kilobase of transcript per million mapped reads (FPKM), transcripts per million (TPM), and DESeq normalization (differential gene expression analysis based on the negative binomial distribution).
Figure 10. A Gaussian mixture model is used to filter out noise.
The histogram shows the distribution of expression levels for the gene CELSR2 across 1671 brain RNA-Seq samples. The Gaussian mixture model is fitted with the expectation-maximization (EM) algorithm, and samples whose posterior probability of belonging to the noise component exceeds 0.5 are filtered out.
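The caption does not state how many components were used or which one counts as noise. A minimal sketch, assuming a two-component mixture on one gene's expression values across samples, with the lower-mean component treated as noise, fits the mixture with EM in NumPy and drops samples whose noise posterior exceeds 0.5. scikit-learn's `GaussianMixture` could do the fitting instead; the E- and M-steps are written out here to show the algorithm.

```python
import numpy as np


def _responsibilities(x, weights, means, variances):
    """E-step: posterior probability of each component for every value."""
    dens = (weights * np.exp(-(x[:, None] - means) ** 2 / (2 * variances))
            / np.sqrt(2 * np.pi * variances))
    total = dens.sum(axis=1, keepdims=True) + 1e-300  # guard against underflow to 0
    return dens / total, total


def fit_gmm_1d(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture with EM; return (weights, means, variances)."""
    x = np.asarray(x, dtype=float)
    means = np.percentile(x, [25, 75])  # start one component low, one high
    variances = np.full(2, x.var() + 1e-6)
    weights = np.full(2, 0.5)
    prev_ll = -np.inf
    for _ in range(n_iter):
        resp, total = _responsibilities(x, weights, means, variances)
        # M-step: re-estimate mixture weights, means and variances.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
        ll = np.log(total).sum()  # log-likelihood under the previous parameters
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances


def filter_noise(x, threshold=0.5):
    """Drop values whose posterior for the lower-mean ('noise') component exceeds threshold."""
    x = np.asarray(x, dtype=float)
    weights, means, variances = fit_gmm_1d(x)
    resp, _ = _responsibilities(x, weights, means, variances)
    noise = int(np.argmin(means))
    return x[resp[:, noise] <= threshold]
```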
Figure 11. Differential gene expression analysis based on the negative binomial distribution (DESeq) is used to identify differentially expressed genes between the patient sample and the database samples.
(A) Scatter plot of significantly differentially expressed genes (green dots, p-adj < 0.01). (B) Box plot of the top 10 abnormal genes in the simulation compared with data from the database.
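The "p-adj" threshold in panel A refers to adjusted p-values; DESeq and DESeq2 report Benjamini-Hochberg adjusted p-values by default. A minimal sketch of that adjustment, applied to a vector of raw p-values before the p-adj < 0.01 cut, is:

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # Enforce monotonicity: cumulative minimum from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted
```

Genes with `benjamini_hochberg(p) < 0.01` would then correspond to the significant (green) points, given the raw p-values from the negative binomial test.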