Modupeore O Adetunji, Susan J Lamont, Carl J Schmidt.
Abstract
Database URL: https://modupeore.github.io/TransAtlasDB/
Year: 2018 PMID: 29688361 PMCID: PMC5824778 DOI: 10.1093/database/bay014
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. The architecture of TransAtlasDB.
Required and optional column headers for the sample information tab-delimited file
| Header | Status | Description |
|---|---|---|
| Sample name | Required | Sample identification number |
| Sample description | Optional | Sample description |
| Derived from | Required | Animal identification number |
| Organism | Required | Organism name |
| Organism part | Required | Tissue name |
| First name | Optional | Person’s first name |
| Middle initial | Optional | Person’s middle initial |
| Last name | Optional | Person’s last name |
| Organization | Optional | Organization |
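To make the header requirements above concrete, the sketch below checks a tab-delimited sample file for the four required columns and rejects rows that leave a required field empty. It is a minimal illustration assuming the file's first line holds the column headers exactly as listed in the table; the functions `check_sample_header` and `read_samples` are hypothetical and are not part of the TransAtlasDB toolkit, whose own import scripts are written in Perl.

```python
import csv
import io

# Column headers taken from the sample information table above.
REQUIRED = {"Sample name", "Derived from", "Organism", "Organism part"}
OPTIONAL = {"Sample description", "First name", "Middle initial",
            "Last name", "Organization"}


def check_sample_header(tsv_text: str) -> dict:
    """Report missing required columns and unrecognised columns (hypothetical helper)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    present = {col.strip() for col in next(reader)}
    return {
        "missing_required": sorted(REQUIRED - present),
        "unknown": sorted(present - REQUIRED - OPTIONAL),
    }


def read_samples(tsv_text: str) -> list:
    """Parse data rows, raising ValueError if a required field is empty."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        empty = sorted(c for c in REQUIRED if not (row.get(c) or "").strip())
        if empty:
            raise ValueError(f"line {line_no}: empty required field(s) {empty}")
    return rows
```

Checking the header first gives one clear message about structural problems before any per-row validation runs.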
List of programs accepted in TransAtlasDB.
| Information | Programs |
|---|---|
| Alignment information | TopHat2, HISAT2, STAR |
| Expression information | Cufflinks, StringTie, Kallisto, Salmon |
| Read count information | htseq-count, featureCounts, STAR quantMode |
| Variant information | GATK, SAMtools |
| Variant annotation information | VEP, ANNOVAR |
| Sequence file details (optional) | FastQC |
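The program list above amounts to a mapping from each accepted program to the kind of data it produces. A minimal sketch of that mapping as a case-insensitive lookup is shown below; the category keys (`"alignment"`, `"read_count"` and so on) and the function `data_type_of` are our own illustrative names, not identifiers used by TransAtlasDB.

```python
# Program lists copied from the table above; the category keys are hypothetical labels.
PROGRAMS = {
    "alignment": ["TopHat2", "HISAT2", "STAR"],
    "expression": ["Cufflinks", "StringTie", "Kallisto", "Salmon"],
    "read_count": ["htseq-count", "featureCounts", "STAR quantMode"],
    "variant": ["GATK", "SAMtools"],
    "variant_annotation": ["VEP", "ANNOVAR"],
    "sequence_qc": ["FastQC"],
}

# Invert into a case-insensitive program -> category lookup.
DATA_TYPE = {p.lower(): kind for kind, progs in PROGRAMS.items() for p in progs}


def data_type_of(program: str) -> str:
    """Return the data category for an accepted program, else raise ValueError."""
    try:
        return DATA_TYPE[program.strip().lower()]
    except KeyError:
        raise ValueError(f"{program!r} is not an accepted program") from None
```

Note that "STAR" and "STAR quantMode" are separate keys, so STAR alignments and STAR read counts land in different categories, matching the table.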
Figure 2. Directory structure layout for each sample, showing the output files (suffixes) required from each supported program for the different RNA-seq analysis data types.
Figure 3. Schema of the TransAtlasDB relational database (RDB) system. The MySQL tables are grouped by the type of data stored (i.e. Sample Information, Alignment Information, Expression Information and Variants Information).
Description of the MySQL database schema. The schema consists of (A) 23 tables, (B) 6 views and (C) 4 stored procedures that organize the different data sets generated by transcriptome analysis.
| Attributes | Description |
|---|---|
| A. TABLES | |
| Animal | Animal information |
| AnimalStats | Additional information on Animal |
| Breed | Organism Breed |
| CommandSyntax | Analysis data commands |
| DevelopmentalStage | Organism developmental stage |
| GeneStats | Expression information summary |
| HealthStatus | Organism health status |
| MapStats | Alignment information and statistics |
| Material | Type of Sample |
| Metadata | Alignment information summary |
| Organism | Organism information |
| Organization | Organization of scientist |
| Person | Scientist information |
| ReadCounts | Raw counts details |
| Sample | Sample information |
| SampleOrganization | Cross-reference of Sample and Organization |
| SamplePerson | Cross-reference of Sample and Person |
| SampleStats | Additional information on Sample |
| Sex | Sex of Organism |
| Tissue | Organism part |
| VarAnnotation | Variant annotation information |
| VarResult | Variant information |
| VarSummary | Variant information summary |
| B. VIEWS | |
| vw_nosql | Sample details |
| vw_nosql | Prototype of NoSQL template |
| vw_sampleinfo | Summary analysis and statistics of each sample |
| vw_seqstats | Sequencing metadata of all RNA-seq analyses |
| vw_vanno | Variant annotation details |
| vw_vvcf | Prototype of VCF template |
| C. PROCEDURES | |
| usp_vall | Variant information for an organism |
| usp_vchrom | Variant information for a chromosome |
| usp_vchrposition | Variant information for a chromosomal region |
| usp_vgene | Variant information for a gene |
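The SampleOrganization and SamplePerson tables are cross-reference (junction) tables: they let one sample be linked to several people and one person to several samples. The sketch below illustrates that pattern with Python's built-in `sqlite3` rather than MySQL. All column names and rows are hypothetical and heavily simplified; the real TransAtlasDB schema has more columns and may name them differently.

```python
import sqlite3

# Hypothetical, simplified columns; the actual TransAtlasDB MySQL tables differ.
SCHEMA = """
CREATE TABLE Sample (sampleid TEXT PRIMARY KEY, description TEXT);
CREATE TABLE Person (personid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE SamplePerson (
    sampleid TEXT REFERENCES Sample(sampleid),
    personid INTEGER REFERENCES Person(personid),
    PRIMARY KEY (sampleid, personid)
);
"""


def samples_by_person(conn, lastname):
    """Resolve the many-to-many link Person -> SamplePerson -> Sample."""
    cur = conn.execute(
        """SELECT s.sampleid FROM Sample s
           JOIN SamplePerson sp ON sp.sampleid = s.sampleid
           JOIN Person p ON p.personid = sp.personid
           WHERE p.lastname = ?
           ORDER BY s.sampleid""",
        (lastname,),
    )
    return [row[0] for row in cur]


# Toy data: sample S2 is linked to two people.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Sample VALUES (?, ?)",
                 [("S1", "liver"), ("S2", "heart"), ("S3", "brain")])
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)",
                 [(1, "Ann", "Lee"), (2, "Bo", "Kim")])
conn.executemany("INSERT INTO SamplePerson VALUES (?, ?)",
                 [("S1", 1), ("S2", 1), ("S2", 2), ("S3", 2)])
```

The composite primary key on the junction table prevents the same sample/person pair from being recorded twice.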
Fields available for querying in the FastBit system
| Fields | Type | Description |
|---|---|---|
| A. VARIANT INFORMATION | | |
| sampleid | text | Sample Id |
| chrom | key | Reference chromosome |
| position | int | Reference position |
| refallele | char | Reference allele |
| altallele | char | Alternate allele |
| quality | double | Variant quality |
| dbsnpvariant | text | dbSNP membership number |
| variantclass | key | Type of variant |
| zygosity | key | Genotype |
| source | text | Source of annotation |
| consequence | text | Variant consequence |
| geneid | text | Gene Id (from NCBI or Ensembl) |
| genename | text | Gene short name |
| transcript | text | Transcript Id (if provided) |
| feature | text | Feature annotation |
| genetype | text | Location of variant |
| proteinposition | int | Relative position of amino acid in protein |
| aachange | text | Amino acid change |
| codonchange | text | Alternative codon with the variant |
| organism | text | Organism name |
| tissue | text | Tissue |
| B. EXPRESSION INFORMATION | | |
| sampleid | text | Sample Id |
| chrom | key | Gene/Feature chromosome |
| start | int | Gene/Feature start position |
| stop | int | Gene/Feature end position |
| genename | text | Gene short name (if available) |
| geneid | text | Gene Id(s) associated with the gene/feature |
| coverage | double | Estimated absolute depth of read coverage for the gene/feature |
| tpm | double | TPM of the gene/feature |
| fpkm | double | FPKM of the gene/feature |
| fpkmconflow | double | Lower bound of the 95% confidence interval on the FPKM of the gene/feature |
| fpkmconfhigh | double | Upper bound of the 95% confidence interval on the FPKM of the gene/feature |
| fpkmstatus | char | Quantification status for the gene/feature |
| genename | text | Gene short name |
| tissue | text | Tissue |
| C. GENE COUNTS INFORMATION | | |
| sampleid | text | Sample Id |
| genename | text | Gene short name (if available) |
| readcount | int | Read counts per gene |
| organism | text | Organism name |
| tissue | text | Tissue |
The FastBit fields mirror the (A) variant information tables, (B) expression information tables and (C) gene count information in the RDB, so that equivalent queries can be run against both systems.
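FastBit answers queries such as "variants on chromosome 1 between positions 500 and 3000" with bitmap indexes: each distinct value of a column gets a bit vector marking the rows that hold it, and a compound condition becomes bitwise AND/OR over those vectors. The sketch below shows that idea with uncompressed, equality-encoded bitmaps stored in Python integers. It is an illustration of the technique only, not FastBit's API; FastBit itself compresses its bitmaps (WAH) and can bin high-cardinality columns, both of which this sketch omits. The sample column values are made up.

```python
from collections import defaultdict


class BitmapIndex:
    """Equality-encoded bitmap index: one bit vector per distinct column value.

    Python ints act as arbitrary-length bit vectors; bit i is set when
    row i holds the value.
    """

    def __init__(self, column):
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(column):
            self.bitmaps[value] |= 1 << row

    def eq(self, value):
        """Bitmap of rows where column == value."""
        return self.bitmaps.get(value, 0)

    def in_range(self, lo, hi):
        """OR together the bitmaps of every value with lo <= value <= hi."""
        mask = 0
        for value, bits in self.bitmaps.items():
            if lo <= value <= hi:
                mask |= bits
        return mask


def rows(mask):
    """List the row numbers whose bits are set in mask."""
    return [i for i in range(mask.bit_length()) if mask >> i & 1]


# Toy columns shaped like the 'chrom' and 'position' fields in table A.
chrom = ["1", "1", "2", "1", "Z"]
position = [100, 2500, 300, 900, 100]
chrom_idx = BitmapIndex(chrom)
pos_idx = BitmapIndex(position)

# chrom = '1' AND position BETWEEN 500 AND 3000
hits = chrom_idx.eq("1") & pos_idx.in_range(500, 3000)
```

Here `rows(hits)` evaluates to `[1, 3]`: rows 1 and 3 are the only ones on chromosome 1 with positions inside the range.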
Scripts within the TransAtlasDB toolkit
| File name | Module | Description |
|---|---|---|
| | | Database installation module |
| | | MySQL & FastBit re-connection application |
| | metadata | Database sample import module |
| | data2db | Database import module |
| | delete | Database sample delete module |
| | | Database interactive module with pre-configured database queries |
| | query | User database queries |
| | db2data | Database retrieval module |
| | | Folder with sample files |
Figure 4. Data import procedure using tad-import.pl and its available options for (A) sample metadata and (B) RNA-seq data.
Figure 5. Data export procedure using tad-export.pl, with options either to execute a MySQL query or to choose one of the four predefined queries (avgfpkm, genexp, chrvar and varanno).
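The predefined avgfpkm export presumably summarizes FPKM values per gene across samples; its exact output columns are not described here. The sketch below shows one such aggregation over records shaped like the FastBit expression fields (`sampleid`, `genename`, `fpkm`). The function name `fpkm_summary` and the choice to also report the sample count, minimum and maximum are our own assumptions, not the tool's documented behaviour.

```python
from collections import defaultdict
from statistics import mean


def fpkm_summary(records, genes=None):
    """Hypothetical per-gene FPKM summary across samples.

    records: iterable of dicts with 'sampleid', 'genename' and 'fpkm' keys.
    genes:   optional set of gene names to restrict the summary to.
    """
    by_gene = defaultdict(list)
    for rec in records:
        if genes is None or rec["genename"] in genes:
            by_gene[rec["genename"]].append(float(rec["fpkm"]))
    # Sort by gene name so the output order is deterministic.
    return {
        gene: {"samples": len(v), "mean": mean(v), "min": min(v), "max": max(v)}
        for gene, v in sorted(by_gene.items())
    }
```

Converting `fpkm` with `float()` lets the same function accept values read from a text export as well as numeric values.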
Figure 6. Running SQL data query language (DQL) statements through the web interface against (A) the relational database and (B) the non-relational (FastBit) database.
Figure 7. Summary tables of database content displayed on the About page.
Figure 8. Status of previously archived analysis data.
Figure 9. Use cases via the web interface. (A) Summary of gene expression levels across all samples. (B) Gene FPKM expression levels for each sample. (C) Variants found in the OPTN gene. (D) Variants found around the chromosomal region of the OPTN gene.
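Use cases (C) and (D) differ only in the window searched: variants inside a gene's coordinates versus variants in a wider region around it. The sketch below expresses both as one region query over variants grouped by chromosome and sorted by position, using binary search to find the window boundaries. The tuple layout `(chrom, position, ref, alt)`, the `flank` parameter and the example coordinates are hypothetical; they are not the real OPTN coordinates or the signature of the `usp_vchrposition` procedure.

```python
import bisect


def build_index(variants):
    """Group (chrom, position, ref, alt) tuples by chromosome, sorted by position."""
    by_chrom = {}
    for v in sorted(variants, key=lambda v: (v[0], v[1])):
        by_chrom.setdefault(v[0], []).append(v)
    # Keep a parallel list of positions so each query can bisect directly.
    return {c: ([v[1] for v in vs], vs) for c, vs in by_chrom.items()}


def variants_in_region(index, chrom, start, stop, flank=0):
    """Variants on chrom with start - flank <= position <= stop + flank."""
    positions, vs = index.get(chrom, ([], []))
    lo = bisect.bisect_left(positions, start - flank)
    hi = bisect.bisect_right(positions, stop + flank)
    return vs[lo:hi]


# Toy variants; a hypothetical gene is assumed to span positions 1000-1200 on chrom "2".
variants = [("2", 150, "A", "G"), ("2", 1000, "C", "T"), ("2", 1200, "G", "A"),
            ("2", 5000, "T", "C"), ("3", 1100, "A", "C")]
idx = build_index(variants)
in_gene = variants_in_region(idx, "2", 1000, 1200)                # like panel (C)
around_gene = variants_in_region(idx, "2", 1000, 1200, flank=4000)  # like panel (D)
```

Sorting once at load time keeps each lookup logarithmic in the number of variants on the chromosome, which is the same reason a database would index the chromosome and position columns.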