Kate Megquier, Diane P Genereux, Jessica Hekman, Ross Swofford, Jason Turner-Maier, Jeremy Johnson, Jacob Alonso, Xue Li, Kathleen Morrill, Lynne J Anguish, Michele Koltookian, Brittney Logan, Claire R Sharp, Lluis Ferrer, Kerstin Lindblad-Toh, Vicki N Meyers-Wallen, Andrew Hoffman, Elinor K Karlsson.
Abstract
Dogs are an unparalleled natural model for investigating the genetics of health and disease, particularly for complex diseases like cancer. Comprehensive genomic annotation of regulatory elements active in healthy canine tissues is crucial both for identifying candidate causal variants and for designing functional studies needed to translate genetic associations into disease insight. Currently, canine geneticists rely primarily on annotations of the human or mouse genome that have been remapped to dog, an approach that misses dog-specific features. Here, we describe BarkBase, a canine epigenomic resource available at barkbase.org. BarkBase hosts data for 27 adult tissue types, with biological replicates, and for one sample of up to five tissues sampled at each of four carefully staged embryonic time points. RNA sequencing is complemented with whole genome sequencing and with assay for transposase-accessible chromatin using sequencing (ATAC-seq), which identifies open chromatin regions. By including replicates, we can more confidently discern tissue-specific transcripts and assess differential gene expression between tissues and timepoints. By offering data in easy-to-use file formats, through a visual browser modeled on similar genomic resources for human, BarkBase introduces a powerful new resource to support comparative studies in dogs and humans.
Keywords: ATAC-seq; RNA-seq; annotation; canine; comparative; dog; epigenomic; expression; genome
Year: 2019 PMID: 31181663 PMCID: PMC6627511 DOI: 10.3390/genes10060433
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. BarkBase sample collection and data production. Samples were collected from a total of six embryos and six adult dogs. BarkBase currently contains RNA-seq data from up to five tissues in d33, d36, d39, and d44 embryos, and from up to 27 tissues sampled from each of five adult dogs diverse in age and in breed ancestry. ATAC-seq data are currently available for eight tissues from a subset of individuals. Additional data sets will be posted as they become available.
Figure 2. The BarkBase web portal. The BarkBase web portal enables download of whole genome sequence (WGS) data, RNA-seq data, and assay for transposase-accessible chromatin using sequencing (ATAC-seq) data for (A) up to 27 tissues from each of the five adult dogs; and (B) up to five tissues from canine embryos collected at each of the four staged gestational timepoints. Reads preprocessed and aligned to CanFam3.1 are available at BarkBase.org. From the BarkBase interface (C), users can readily select specific tissues and samples. Raw read data from RNA-seq and ATAC-seq are available through the Sequence Read Archive (SRA) (Table S1).
Figure 3. BarkBase captures novel transcripts. Overlapping the transcriptome from BarkBase and Ensembl shows most bases are captured in both datasets. BarkBase contains 84 Mb of transcribed sequence not included in the existing annotation, highlighting its utility to improve the annotation of the canine genome.
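The base-level comparison behind a figure like this can be sketched as interval arithmetic. The snippet below is only an illustration, not the authors' pipeline: it assumes half-open `(start, end)` intervals on a single chromosome, and the function names and toy coordinates are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(intervals):
    """Count distinct bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def novel_bases(new, reference):
    """Bases covered by `new` but not by `reference`.

    Uses the identity |new minus ref| = |new union ref| - |ref|.
    """
    return covered_bases(list(new) + list(reference)) - covered_bases(reference)


# Toy transcribed regions (hypothetical coordinates, one chromosome).
new_transcripts = [(100, 200), (150, 300), (500, 600)]
existing_annotation = [(250, 550)]

novel = novel_bases(new_transcripts, existing_annotation)
shared = covered_bases(new_transcripts) - novel
print(novel, shared)
```

In a genome-wide setting the same calculation would be run per chromosome and summed.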
Figure 4. Cumulative transcriptome expression is matched to tissue type. Cumulative sum of fraction of tissue-specific transcriptomes represented by individual genes in (A) canine embryos at four gestational time points; and (B) up to five individual adult dogs. Single-gene counts per million (CPM) values were divided by sample-sum CPM, sorted in increasing order, and the cumulative sum was calculated. Cumulative values are shown for the 1000 top-expressed genes in each sample. Data sampled from a given embryonic tissue at different gestational time points are very similar, perhaps reflecting the fairly narrow time window of sampling. Combining data from adult and embryonic samples (C) reveals strong similarity of data from given tissue types across individuals and developmental stages.
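The cumulative calculation described in this caption can be sketched in a few lines. This is a minimal illustration under stated assumptions: the caption does not say how the restriction to the 1000 top-expressed genes interacts with the sort, so the sketch simply keeps the last `top_n` points of the ascending cumulative curve. The function name and toy values are hypothetical.

```python
from itertools import accumulate


def cumulative_fraction(cpm_by_gene, top_n=1000):
    """Cumulative transcriptome fraction for the top_n most expressed genes.

    Follows the caption literally: divide each gene's CPM by the sample-sum
    CPM, sort in increasing order, then take the running sum.
    """
    total = sum(cpm_by_gene.values())
    if total <= 0:
        raise ValueError("sample has no expression signal")
    fractions = sorted(cpm / total for cpm in cpm_by_gene.values())
    running = list(accumulate(fractions))
    # Assumption: the top-expressed genes are the last entries of the
    # ascending sort, so their cumulative values are the tail of the curve.
    return running[-top_n:]


# Toy sample with four genes (hypothetical CPM values).
toy_sample = {"geneA": 50.0, "geneB": 30.0, "geneC": 15.0, "geneD": 5.0}
curve = cumulative_fraction(toy_sample, top_n=3)
print(curve)
```

A sample dominated by a few genes produces a curve that jumps sharply over its last few points, whereas a sample with more evenly spread expression rises gradually.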
Figure 5. Transcriptome data from five individuals cluster primarily by tissue type. Hierarchical clustering of RNA-seq data from (A) single tissues of five adult dogs; (B) five adult dogs, based on data concatenated across 21 tissues; and (C) embryonic tissues sampled at four gestational time points. Clustering is based on Euclidean distances among samples. Overall, in data from both adults and embryos, samples of a given tissue cluster across individuals. As observed in cumulative analysis, embryonic samples of a given tissue type cluster despite variation in gestational time points, perhaps reflecting the fairly narrow time window of sampling.
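To make the clustering step concrete, the following is a small pure-Python sketch of agglomerative clustering on Euclidean distances. The caption does not state the linkage method, so average linkage is an assumption here, and the sample names and expression vectors are invented for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles):
    """Average-linkage agglomerative clustering.

    profiles: dict mapping sample name -> list of expression values.
    Returns the merge history as (left, right, distance) tuples.
    """
    clusters = {name: [name] for name in profiles}  # label -> member samples
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            left, right = pair
            dists = [euclidean(profiles[m], profiles[n])
                     for m in clusters[left] for n in clusters[right]]
            return sum(dists) / len(dists)

        left, right = min(combinations(sorted(clusters), 2), key=linkage)
        merges.append((left, right, linkage((left, right))))
        clusters[f"({left},{right})"] = clusters.pop(left) + clusters.pop(right)
    return merges


# Toy expression profiles (hypothetical values for three genes).
toy_profiles = {
    "heart_dog1": [10.0, 1.0, 0.0],
    "heart_dog2": [9.0, 2.0, 0.0],
    "liver_dog1": [0.0, 1.0, 12.0],
    "liver_dog2": [1.0, 0.0, 11.0],
}
history = agglomerate(toy_profiles)
for left, right, dist in history:
    print(left, right, round(dist, 3))
```

In this toy input the two heart samples merge first and the two liver samples second, which is the tissue-first pattern the caption describes.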
Figure 6. The relationship between samples within a single tissue type is highly variable. Clustering is based on Euclidean distances among samples, with no consistent clustering by age or breed observed. Outlines group tissues of a given class.
Figure 7. Gene expression levels correlate between dog and human tissues. Heatmap showing Spearman correlation between the genes expressed in canine and human tissues, after filtering for minimum expression (median CPM > 1) and unique orthology mapping between species. In all cases except one (dog thyroid), comparison of dog tissue to the corresponding human tissue had the highest Spearman coefficient, suggesting broad conservation of the transcriptome in these tissues across species.
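A cross-species comparison of this kind can be sketched as follows. Two details are assumptions rather than statements from the caption: the median CPM > 1 filter is applied in both species, and the correlation is computed on per-gene median CPM. Keying both dictionaries by a shared ortholog identifier stands in for the unique orthology mapping; all names and values are hypothetical.

```python
from statistics import median


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def tissue_correlation(dog_cpm, human_cpm, min_median=1.0):
    """dog_cpm / human_cpm: dict ortholog id -> CPM values across samples."""
    shared = [g for g in dog_cpm
              if g in human_cpm
              and median(dog_cpm[g]) > min_median
              and median(human_cpm[g]) > min_median]
    dog = [median(dog_cpm[g]) for g in shared]
    human = [median(human_cpm[g]) for g in shared]
    return spearman(dog, human)


# Toy data for one tissue (hypothetical ortholog ids and CPM values).
dog = {"g1": [5, 7], "g2": [20, 22], "g3": [100, 90], "g4": [0.2, 0.4], "g5": [3, 3]}
human = {"g1": [6, 6], "g2": [30, 28], "g3": [80, 85], "g4": [50, 60], "g5": [2, 4]}
rho = tissue_correlation(dog, human)
print(rho)
```

Repeating this for every dog-human tissue pair yields the matrix that a heatmap like the one in the figure would display.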
Functional enrichment among genes differentially expressed in embryonic tissues at d36 and d44 reflects organ-specific roles. Genes differentially expressed in each of the five individual tissues sampled at embryonic d36 as compared with d44 (FDR < 0.1) were analyzed for functional enrichment using IPA.
| Tissue | Category | Diseases or Functions | p-Value | No. of Genes |
|---|---|---|---|---|
| | Organismal Injury and Abnormalities | Hypertrophy | 9.8 × 10⁻¹⁰ | 15 |
| | | Visceromegaly | 1.6 × 10⁻⁹ | 18 |
| | Cardiovascular Disease, Cardiovascular System Development and Function, Organ Morphology, Organismal Development, Organismal Injury and Abnormalities | Enlargement of heart | 3.7 × 10⁻⁹ | 15 |
| | | Abnormal morphology of heart | 5.0 × 10⁻⁹ | 17 |
| | | Muscular hypertrophy | 5.7 × 10⁻⁹ | 10 |
| | | Hypertrophy of heart | 1.6 × 10⁻⁷ | 11 |
| | Cardiovascular System Development and Function | Morphology of cardiovascular system | 6.4 × 10⁻⁹ | 19 |
| | Cardiovascular Disease, Cardiovascular System Development and Function | Abnormal morphology of cardiovascular system | 8.2 × 10⁻⁹ | 18 |
| | Organismal Development, Organismal Injury and Abnormalities | Abnormal morphology of thoracic cavity | 2.9 × 10⁻⁸ | 18 |
| | Organismal Development | Abnormal morphology of body cavity | 8.0 × 10⁻⁸ | 22 |
| | | | | |
| | Skeletal and Muscular System Development and Function | Morphogenesis of embryonic skeleton | 4.4 × 10⁻¹² | 7 |
| | | Morphology of axial skeleton | 1.1 × 10⁻⁸ | 8 |
| | | Fusion of bone | 1.6 × 10⁻⁸ | 6 |
| | | Morphology of skeleton | 1.8 × 10⁻⁸ | 9 |
| | Embryonic Development, Organismal Development | Patterning of rostrocaudal axis | 1.2 × 10⁻¹¹ | 8 |
| | Organismal Development | Abnormal morphology of body cavity | 9.2 × 10⁻⁹ | 17 |
| | Cardiovascular System Development and Function, Organ Development, Organ Morphology, Skeletal and Muscular System Development and Function | Contraction of cardiac muscle | 3.9 × 10⁻⁸ | 6 |
| | Organ Morphology, Skeletal and Muscular System Development and Function | Quantity of rib | 6.0 × 10⁻⁸ | 5 |
| | Cancer, Skeletal and Muscular Disorders, Tissue Morphology | Transformation of vertebrae | 7.1 × 10⁻⁸ | 5 |
| | Organismal Development, Organismal Injury and Abnormalities | Abnormal morphology of thoracic cavity | 8.8 × 10⁻⁸ | 13 |
| | | | | |
| | Cell Cycle | Cell division of neural stem cells | 9.1 × 10⁻⁵ | 1 |
| | Embryonic Development, Nervous System Development and Function, Organ Development, Organismal Development, Tissue Development | Development of hippocampal fissure | 9.1 × 10⁻⁵ | 1 |
| | Nervous System Development and Function, Organ Morphology, Organismal Development | Size of primary visual cortex | 9.1 × 10⁻⁵ | 1 |
| | Nervous System Development and Function, Neurological Disease, Organ Morphology, Organismal Development, Organismal Injury and Abnormalities | Abnormal morphology of medial ganglionic eminences | 1.8 × 10⁻⁴ | 1 |
| | Developmental Disorder, Embryonic Development, Tissue Morphology | Degeneration of Wolffian duct | 1.8 × 10⁻⁴ | 1 |
| | | | | |
| | Cancer, Gastrointestinal Disease, Hepatic System Disease, Organismal Injury and Abnormalities | Hepatitis B virus-related hepatocellular carcinoma | 9.2 × 10⁻⁵ | 3 |
| | Cell-To-Cell Signaling and Interaction, Renal and Urological System Development and Function | Activation of kidney cells | 9.8 × 10⁻⁵ | 2 |
| | Organismal Injury and Abnormalities | Organ Degeneration | 9.8 × 10⁻⁵ | 8 |
| | Developmental Disorder, Hereditary Disorder, Metabolic Disease, Organismal Injury and Abnormalities | Hyperphenylalaninemia | 2.3 × 10⁻⁴ | 2 |
| | Lipid Metabolism, Small Molecule Biochemistry | Metabolism of acylglycerol | 3.2 × 10⁻⁴ | 4 |
| | | | | |
| lung | Cancer, Organismal Injury and Abnormalities | Epithelial neoplasm | 0.0 | 6961 |
| | | Non-hematological solid tumor | 0.0 | 7039 |
| | | Nonhematologic malignant neoplasm | 0.0 | 7021 |
| | | Carcinoma | 0.0 | 6949 |
| | | Tumorigenesis of tissue | 0.0 | 6969 |
Figure 8. ATAC-seq maps transposase-accessible chromatin in canine tissues. Analysis of the two tissue types with ATAC-seq data for five individuals, pancreas (A) and salivary gland (B), reveals strong enrichment of peaks around known transcription start sites. This enrichment is consistent across individuals. Annotating the ATAC-seq peaks with ChIPseeker, using the Ensembl annotation of dog, shows, as expected, an overlap with known promoters in both (C) the pancreas (n ≅ 10,000) and (D) salivary gland (n ≅ 12,000), but there are more peaks in distal/intergenic regions, potentially marking novel promoters or distal regulatory elements. (E) Across all tissues, ATAC-seq peaks are most likely to be in annotated promoters, but a large proportion are far from genes. (F) In all tissues, the enrichment for ATAC-seq peaks falls off rapidly with increasing distance from a TSS.
Figure 9. Integrating ATAC-seq with RNA-seq data can help validate novel genes. (A) Of the 44 novel genes expressed in the pancreas, most are less than 25 kb from a pancreas ATAC-seq peak. For those closest to ATAC-seq peaks, integrating RNA-seq and ATAC-seq provides additional evidence that they are real genes. (B) 58 novel genes expressed in the salivary gland (including 15 also expressed in pancreas) do not cluster as closely to pancreas ATAC-seq peaks, suggesting tissue specificity. (C) 491 novel genes not expressed in the pancreas are much more dispersed relative to the ATAC-seq peaks in the pancreas.
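The distance test underlying this comparison can be sketched with a sorted list of peak positions and a binary search. The caption does not say which gene coordinate or peak coordinate was used, so a single representative position per gene and per peak is assumed here. Only the 25 kb threshold comes from the caption; every name and coordinate below is hypothetical.

```python
from bisect import bisect_left


def nearest_peak_distance(position, peak_positions):
    """Distance to the closest peak; peak_positions must be sorted.

    Returns None when the chromosome has no peaks.
    """
    if not peak_positions:
        return None
    idx = bisect_left(peak_positions, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = peak_positions[max(idx - 1, 0): idx + 1]
    return min(abs(position - p) for p in candidates)


def genes_near_peaks(genes, peaks, max_distance=25_000):
    """Names of genes lying less than max_distance from a peak.

    genes: dict name -> (chromosome, position)
    peaks: dict chromosome -> sorted list of peak positions
    """
    near = []
    for name, (chrom, position) in genes.items():
        dist = nearest_peak_distance(position, peaks.get(chrom, []))
        if dist is not None and dist < max_distance:
            near.append(name)
    return near


# Toy inputs (hypothetical gene names and coordinates).
toy_peaks = {"chr1": [10_000, 200_000], "chr2": [50_000]}
toy_genes = {
    "novel1": ("chr1", 30_000),
    "novel2": ("chr1", 100_000),
    "novel3": ("chr2", 60_000),
    "novel4": ("chr3", 5_000),
}
supported = genes_near_peaks(toy_genes, toy_peaks)
print(supported)
```

Running the same function against peak sets from different tissues would give the kind of tissue-by-tissue contrast shown in panels A to C.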