Lindsay A Holden1, Meharji Arumilli2,3,4, Marjo K Hytönen2,3,4, Sruthi Hundi2,3,4, Jarkko Salojärvi5,6, Kim H Brown7, Hannes Lohi8,9,10.
Abstract
Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR), which was assembled using a single purebred Boxer. Although building a reference from a single individual is common practice, it leaves many high-quality reads unmapped. To address this, whole-genome sequence data from three breeds, Border Collie (n = 26), Bearded Collie (n = 7), and Entlebucher Sennenhund (n = 8), were analyzed to identify novel, non-CGR genomic contigs using the previously validated pseudo-de novo assembly pipeline. We identified 256,957 novel contigs, and paired-end relationships together with BLAT scores provided 126,555 (49%) high-quality contigs with genomic coordinates, containing 4.6 Mb of novel sequence absent from the CGR. These contigs close 12,503 known gaps, including 2.4 Mb containing partially missing sequences for 11.5% of Ensembl, 16.4% of RefSeq and 12.2% of canFam3.1+ CGR annotated genes and 1,748 unmapped contigs containing 2,366 novel gene variants. Examples for six disease-associated genes (SCARF2, RD3, COL9A3, FAM161A, RASGRP1 and DLX6) containing gaps or alternate splice variants missing from the CGR are also presented. These findings from non-reference breeds support the need to improve the current Boxer-only CGR to avoid missing important biological information. Including the missing gene sequences in the CGR will facilitate identification of putative disease mutations across diverse breeds and phenotypes.
Year: 2018 PMID: 30022108 PMCID: PMC6052005 DOI: 10.1038/s41598-018-29190-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Primary and secondary data.
| Breed | Assembly | Contigs | Consensus (bp) | Largest Contig (bp) | N50 |
|---|---|---|---|---|---|
| Border | Primary (avg) | 654,517 | 220,364,358 | 16,305 | 308 |
| Border | Secondary | 256,957 | 312,472,893 | 32,04 | 1,374 |
| Bearded | Primary (avg) | 120,393 | 47,478,915 | 62,905 | 478 |
| Bearded | Secondary | 7,476 | 15,967,624 | 27,27 | 3,147 |
| Entlebucher | Primary (avg) | 174,309 | 59,493,588 | 16,863 | 386 |
| Entlebucher | Secondary | 15,455 | 23,200,436 | 20,32 | 2,156 |
Unaligned reads from the three breeds were assembled into primary (per-individual) contig assemblies using MIRA 4.0; the primary assemblies were then combined and assembled into secondary assemblies.
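The N50 column in the table above is a standard assembly summary statistic: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembled bases. The following is a minimal sketch of that definition, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative sum first reaches at least
    half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy lengths, made up for illustration (not data from the study).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

A higher N50 in the secondary assemblies than in the primary averages is consistent with pooling reads across individuals producing longer contigs.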
Figure 1. Infographic representation of the de novo assembly across the three breeds. Unmapped reads from each breed were assembled into a primary assembly for each individual and pooled by breed to generate secondary assemblies. Boxer reads were mapped to the secondary assemblies to determine novel and shared genetic information across the assemblies. Images for Entlebucher Sennenhund, Bearded Collie and Boxer were obtained from Anne Teijula, Lotta Paakkanen and Outi Toni, respectively.
Figure 2. BLASTn hits for contigs from three domestic dog breeds. We queried all de novo contigs from Border Collie, Bearded Collie, and Entlebucher Sennenhund using BLASTn for nucleotide sequence similarity against the full NCBI nr/nt non-redundant nucleotide database. Single top-hit results are binned by species or identified as not aligning (unknown) to the nr/nt database. To the right of the BLASTn pie charts are the GC content, DUST complexity scores, and Entropy complexity scores of the secondary de novo contig assemblies for Border Collie, Bearded Collie, and Entlebucher Sennenhund. For reference, GC content of CanFam3.1 is 41.3%. Low DUST scores (<7) and high Entropy scores (>70) indicate high complexity (fewer short repeats), as seen in all three breeds.
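As a point of reference for the GC content values mentioned in the Figure 2 caption, GC content can be computed per contig as the percentage of G and C among unambiguous bases. This is a minimal sketch under that assumption; the function name `gc_content` and the example sequences are hypothetical. DUST and Entropy scores depend on the scoring tool's windowing and scaling, so they are not reproduced here.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored, which is one common
    convention and an assumption of this sketch.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Made-up sequences for illustration only.
print(gc_content("GGCCAATT"))
print(gc_content("acgtNN"))
```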
Figure 3. Polymerase chain reaction (PCR) examples of the contigs. (a) Cropped gel image showing the amplification of the contig bc_c107501 in all Border Collies (BC) and no Boxers (B). (b) Cropped gel image showing variable amplification of the contig bc_c222950 in Border Collies and Boxers. Full-length gels are presented in Supplementary Fig. S2.
Total count of contigs with mapped genomic coordinates using paired-end relationships in all three breeds.
| Breed | Total Contigs | Predicted contigs | High Quality Predictions | Single loci | Two loci | More than two loci |
|---|---|---|---|---|---|---|
| Border | 216240 | 194621 | 158271 | 138361 | 16947 | 2963 |
| Bearded | 6569 | 6541 | 6055 | 1961 | 1661 | 2433 |
| Entlebucher | 14779 | 14540 | 13119 | 7399 | 3008 | 2712 |
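The three locus columns in the table above partition the high-quality predictions, so they should sum to that column for each breed. The short check below is a sketch that uses only the values printed in the table; the dictionary layout and the helper name `single_locus_share` are illustrative assumptions.

```python
# Values copied from the table:
# (high-quality predictions, single loci, two loci, more than two loci)
rows = {
    "Border": (158271, 138361, 16947, 2963),
    "Bearded": (6055, 1961, 1661, 2433),
    "Entlebucher": (13119, 7399, 3008, 2712),
}


def single_locus_share(high_quality, single):
    """Fraction of high-quality predictions placed at exactly one locus."""
    return single / high_quality


for breed, (high_quality, single, two, more) in rows.items():
    # The locus categories should add up to the high-quality total.
    assert single + two + more == high_quality, breed
    share = single_locus_share(high_quality, single)
    print(f"{breed}: {share:.1%} of high-quality contigs map to a single locus")
```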
Summary of genes in CGR gaps spanned by BAC clone-inserts and assembled contigs.
| | Ensembl | RefSeq | canFam3.1+ |
|---|---|---|---|
| Gaps in CGR (11800) | 9261 | 8061 | 8508 |
| BAC clone inserts (11681) | 5434 | 4966 | 7255 |
| Contigs in assembly (7,711) | 3757 | 3405 | 4970 |
| Novel contigs in assembly (53) | 25 | 21 | 32 |
Gaps overlapping an exon, splice site, UTR5/UTR3, intron, or ncRNA are classified as genic gaps.
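The genic-gap rule stated above amounts to an interval-overlap test between each gap and the annotated features. The sketch below illustrates that idea under stated assumptions: half-open, 0-based coordinates, and hypothetical names (`Interval`, `overlaps`, `is_genic`) and coordinates that do not come from the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_genic(gap, features):
    """A gap is genic if it overlaps any annotated feature
    (exon, splice site, UTR, intron, or ncRNA)."""
    return any(overlaps(gap, feature) for feature in features)


# Hypothetical coordinates for illustration only.
features = [Interval("chr1", 1000, 1200), Interval("chr1", 5000, 5400)]
print(is_genic(Interval("chr1", 1150, 1300), features))  # overlaps the first feature
print(is_genic(Interval("chr2", 1150, 1300), features))  # different chromosome
```

For genome-scale annotation sets, a sorted or tree-based interval index would be a more practical choice than this linear scan.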
Figure 4. Annotation of Border Collie de novo contigs. (a) Flow chart of contig annotation. Number of contigs is indicated in red, contigs preferentially aligning to dog, human, or mouse protein Ensembl databases are indicated by colored boxes. Note: Some contigs aligned with equal weight to two databases. (b) Venn diagram of the number of protein coding genes to which novel contigs aligned. Numbers in the overlapping regions indicate the number of BLASTx hits aligning to dog (red), human (yellow), and mouse (blue).
Figure 5. Known disease genes overlapping novel contigs. Overlap of non-dog genes aligning to contigs, genes associated with hereditary dog diseases (from OMIA), and genes associated with Mendelian human diseases (OMIM) in (a) Border Collies, (b) Bearded Collies, and (c) Entlebucher Sennenhunds. (d) There are 6 total genes represented in Border and Bearded Collie contigs that associate with diseases in humans and animal models. These genes are implicated in 54 human and 4 animal model disease phenotypes.
Examples of novel variants in six genes found in the Border and Bearded Collie contigs.
| Contig ID | Gene ID | Contig Seq | Accession ID | Dog Disease | Affected Breeds | Human Disease |
|---|---|---|---|---|---|---|
| bc_87005 | COL9A3 | partial CDS in last exon due to gap in CGR | NP_001184100.1 | Oculoskeletal dysplasia 1 | Samoyed, Labrador Retriever | Stickler Syndrome, Type 1; Marshall Syndrome |
| bc_134555 | DLX6 | putative splice variant | NP_005213.3 | Cleft palate 1 | Nova Scotia Duck Tolling Retriever | Pierre Robin Syndrome |
| bc_156488 | FAM161A | putative splice variant | XP_005626198.1 | Progressive retinal atrophy, type 3 | Tibetan Spaniel, Tibetan Terrier | Retinitis pigmentosa 28 |
| bc_162349 | RASGRP1 | putative splice variant | AAX76907.1 | Thrombopathia | Eskimo Spitz, Basset Hound, Landseer Newfoundland | Storage pool platelet disease |
| bc_85943 | RD3 | putative splice variant | NP_898882.1 | Rod-cone dysplasia 2 | Rough Collie, Smooth Collie | Leber congenital amaurosis 12 |
| bc_110123 & bc_92297 | SCARF2 | partial CDS in exons 4, 5, 12 due to gap in CGR | AAH00584.2 | van den Ende-Gupta syndrome | Wirehaired Fox Terrier | van den Ende-Gupta syndrome |
Table includes a description of the novel content within the contig, associated dog or human disease, and the affected breeds.