Marc P Hoeppner1, Andrew Lundquist2, Mono Pirun3, Jennifer R S Meadows1, Neda Zamani1, Jeremy Johnson4, Görel Sundström1, April Cook4, Michael G FitzGerald4, Ross Swofford4, Evan Mauceli5, Behrooz Torabi Moghadam6, Anna Greka4, Jessica Alföldi4, Amr Abouelleil4, Lynne Aftuck4, Daniel Bessette4, Aaron Berlin4, Adam Brown4, Gary Gearin4, Annie Lui4, J Pendexter Macdonald4, Margaret Priest4, Terrance Shea4, Jason Turner-Maier4, Andrew Zimmer4, Eric S Lander4, Federica di Palma7, Kerstin Lindblad-Toh8, Manfred G Grabherr8.
Abstract
The domestic dog, Canis familiaris, is a well-established model system for mapping trait and disease loci. While the original draft sequence was of good quality, gaps were abundant, particularly in promoter regions of the genome, which hampered the annotation and study of candidate genes. Here, we present an improved genome build, canFam3.1, which includes 85 Mb of novel sequence and now covers 99.8% of the euchromatic portion of the genome. We also present multiple RNA-Sequencing data sets from 10 different canine tissues to catalog ∼175,000 expressed loci. While about 90% of the coding genes previously annotated by EnsEMBL have measurable expression in at least one sample, the number of transcript isoforms detected by our data expands the EnsEMBL annotations by a factor of four. Syntenic comparison with the human genome revealed an additional ∼3,000 loci that are characterized as protein coding in human and were also expressed in the dog, suggesting that those were previously not annotated in the EnsEMBL canine gene set. In addition to ∼20,700 high-confidence protein coding loci, we found ∼4,600 antisense transcripts overlapping exons of protein coding genes, ∼7,200 intergenic multi-exon transcripts without coding potential, which are likely candidates for long intergenic non-coding RNAs (lincRNAs), and ∼11,000 transcripts that were reported by two different library construction methods but did not fit any of the above categories. Of the lincRNAs, about 6,000 have no annotated orthologs in human or mouse. Knockdown of two novel transcripts with shRNA in a mouse kidney cell line altered cell morphology and motility. Overall, we provide a much-improved annotation of the canine genome and suggest regulatory functions for several of the novel non-coding transcripts.
Year: 2014 PMID: 24625832 PMCID: PMC3953330 DOI: 10.1371/journal.pone.0091172
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of canFam2.0 and canFam3.1.
| Property | canFam2.0 | canFam3.1 |
| Coverage of euchromatic portion of genome (%) | 99.2 | 99.6 |
| Portion of assembly in “certified regions” (%) | 99.5 | 99.8 |
| Contiguity: gaps per Mb | 12 | 6 |
| ENCODE regions | High-quality draft | 98% Finished |
Transcribed loci per tissue and library preparation.
| Brain | Blood | Heart | Kidney | Liver | Lung | Muscle | Ovary | Skin | Testis | Total |
| 30,325 | 33,486 | 23,807 | 28,420 | 25,431 | 29,493 | 39,976 | 22,221 | 34,335 | 41,070 | 65,314 |
| 69,030 | 64,657 | 43,659 | 38,059 | 96,231 | 67,842 | 31,006 | 83,665 | N/A | 33,857 | 194,878 |
N/A, due to poor alignment performance, this library was excluded from subsequent analyses.
Figure 1. Location of lincRNAs and single-exon intergenic non-coding transcripts.
(a) We show the mapping of lincRNAs, broken down by sample across chromosome 1. (b) Histogram of distance from intergenic transcripts to the next known transcribed element in both the 5′ (left) and 3′ (right) direction. The average distance is around 150,000 nucleotides, suggesting that these loci are not closely associated with known genes.
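The distances summarized in panel (b) can be illustrated with a minimal sketch, assuming each transcript and each known transcribed element is given as a `(start, end)` interval on the same chromosome. The function name `flanking_distances` and the input layout are hypothetical and are not taken from the paper's pipeline. The sketch reports gaps by coordinate (left and right); mapping these to 5′ and 3′ depends on the strand of the transcript, which is ignored here.

```python
from bisect import bisect_left, bisect_right


def flanking_distances(transcript, elements):
    """Return (left_gap, right_gap) in nucleotides for one transcript.

    transcript: (start, end) with start <= end.
    elements:   list of (start, end) intervals for known transcribed
                elements on the same chromosome (hypothetical input).
    A gap is None when no element lies on that side.
    """
    start, end = transcript
    ends = sorted(e for _, e in elements)
    starts = sorted(s for s, _ in elements)

    # Closest element that ends before the transcript begins.
    i = bisect_left(ends, start)
    left_gap = start - ends[i - 1] if i > 0 else None

    # Closest element that starts after the transcript ends.
    j = bisect_right(starts, end)
    right_gap = starts[j] - end if j < len(starts) else None

    return left_gap, right_gap


# Made-up coordinates, for illustration only.
known = [(100, 200), (1000, 1200)]
print(flanking_distances((500, 600), known))
```

Collecting the two gaps over all intergenic transcripts would give the values that a histogram like the one in panel (b) bins.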
Figure 2. Distance trees of expression profiles.
We constructed neighbor-joining trees from the correlation of expression values (FPKM>1.0) between samples, with 1 minus Spearman's rho defining the distance. Colors denote library construction methods (poly-A: blue, DSN: red). We divided transcribed loci into (a) protein coding genes with RNA-Seq support, annotated by EnsEMBL either in dog or in the orthologous human regions. Replicates cluster together, as do the library construction methods poly-A and DSN, as well as related tissues, such as heart and muscle; (b) antisense transcripts that overlap at least one exon of a protein coding gene, as defined in (a). With the exception of testis, poly-A and DSN separate the samples, with both the poly-A and DSN sub-trees maintaining closer relationships between the related tissues heart and muscle; (c) spliced intergenic loci, excluding sequences that have coding potential. Similar to protein coding genes, the poly-A and DSN samples group by tissue first, with the exception of kidney DSN; and (d) intergenic and uncharacterized single-exon transcript loci. In this set, as for antisense loci, the library construction method (DSN or poly-A) is the dominant factor when grouping samples.
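The distance used for these trees, 1 minus Spearman's rho, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the choice to keep a locus only when both samples exceed the FPKM threshold is an assumption, since the caption does not say how the filter was applied. The neighbor-joining step itself is not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_distance(fpkm_a, fpkm_b, min_fpkm=1.0):
    """1 - Spearman's rho over loci above min_fpkm in both samples (assumed filter)."""
    pairs = [(a, b) for a, b in zip(fpkm_a, fpkm_b)
             if a > min_fpkm and b > min_fpkm]
    xs, ys = zip(*pairs)
    # Spearman's rho is the Pearson correlation of the ranks.
    return 1.0 - pearson(average_ranks(xs), average_ranks(ys))


def distance_matrix(samples):
    """Pairwise distances for a dict mapping sample name -> list of FPKM values."""
    names = list(samples)
    return {(p, q): spearman_distance(samples[p], samples[q])
            for p in names for q in names}
```

A matrix built this way ranges from 0 (identical rank order) to 2 (fully reversed rank order) and could be passed to any neighbor-joining implementation.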
Figure 3. Modulation of podocyte motility through shRNAs to non-coding RNAs at the BAIAP2 locus.
(A) The number of cells (average +/− SEM) that migrated into the scratch after 48 hours for each of the four conditions: scrambled shRNAs (negative control), shRNAs directed at coding sequence (positive control), antisense shRNAs, and lincRNA shRNAs. As previously shown, shRNA to Baiap2 (coding shRNA) inhibited podocyte migration (p<0.01). shRNAs to the lincRNA further reduced podocyte migration (p<0.001 compared to scrambled shRNA, p<0.01 compared to coding shRNA), while shRNAs to the antisense transcript resulted in increased podocyte motility (p<0.01). (B–D) Phalloidin (actin) stained podocytes treated with (B) scrambled shRNAs (normal podocyte appearance), (C) antisense shRNAs and (D) lincRNA shRNAs. Treatment with shRNAs to the antisense transcript (C) resulted in increased filopodia formation (white arrows), consistent with increased BAIAP2 activity. Treatment with lincRNA shRNAs (D) resulted in an abnormal appearance of the podocytes and the actin cytoskeleton.
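The "average +/− SEM" summary in panel (A) can be computed as in the minimal sketch below, assuming the usual definition of the standard error of the mean (sample standard deviation divided by the square root of the number of replicates). The function name and the replicate counts are hypothetical and do not come from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(counts):
    """Return (mean, standard error of the mean) for replicate cell counts."""
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(counts), stdev(counts) / sqrt(len(counts))


# Made-up replicate counts, for illustration only.
avg, sem = mean_and_sem([10, 12, 14])
print(f"{avg:.1f} +/- {sem:.2f}")
```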