Si Lok1,2, Tara A Paton3,2, Zhuozhi Wang3,2, Gaganjot Kaur3,2, Susan Walker3,2, Ryan K C Yuen3,2, Wilson W L Sung3,2, Joseph Whitney3,2, Janet A Buchanan3,2, Brett Trost3,2, Naina Singh3,2, Beverly Apresto3,2, Nan Chen3,2, Matthew Coole3,2, Travis J Dawson3,2, Karen Ho3,2, Zhizhou Hu3,2, Sanjeev Pullenayegum3,2, Kozue Samler3,2, Arun Shipstone3,2, Fiona Tsoi3,2, Ting Wang3,2, Sergio L Pereira3,2, Pirooz Rostami3,2, Carol Ann Ryan3,2, Amy Hin Yan Tong4, Karen Ng5, Yogi Sundaravadanam5, Jared T Simpson5,6, Burton K Lim7, Mark D Engstrom7, Christopher J Dutton8, Kevin C R Kerr8, Maria Franke8, William Rapley8, Richard F Wintle3,2, Stephen W Scherer1,2,9,10.
Abstract
The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected and moderate coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size is 2.7 Gb estimated by k-mer analysis. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruency against an independent short read assembly. We scaffolded the assembly using the exon-gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with an N50 of 278,680 bp and an N50-scaffold of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly were demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Well-represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well-endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology.
Keywords: Canadian beaver; genome annotation; genome assembly; rodent; whole-genome sequencing
Year: 2017 PMID: 28087693 PMCID: PMC5295618 DOI: 10.1534/g3.116.038208
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Canadian Beaver Genome Project. Schematic diagram of genome and transcriptome assembly. FL-ORF, full-length open reading frame; PacBio, Pacific Biosciences; RNA-seq, RNA sequencing.
Figure 2. Information on sequenced C. canadensis. (A) Map of the Parry Sound region of Upper Canada (Ontario) and the Saguenay-Lac-Saint-Jean region of Lower Canada (Quebec), where the subjects originated. Beaver specimen catalog number ROM 106,880 = red diamond. “Ward” beaver, the Toronto Zoo = red star (Map: Fensett circa 1812, Wikimedia Foundation/public domain). (B) Image of “Ward” at the Toronto Zoo (Photo: courtesy of the Toronto Zoo). (C) Size estimation of beaver genome by k-mer analysis. The circled region denotes faint signals from repetitive sequences. (D) PacBio read length distribution from 80 flow cells; 5.6 million reads representing ∼30-fold coverage of beaver genome. The histogram is further divided into fold-coverage of the genome in the 10 kb read length intervals as indicated. PacBio, Pacific Biosciences; ROM, Royal Ontario Museum.
Figure 3. C. canadensis transcriptome analysis. (A) Size distribution of the assembled Trinity components from muscle and leukocyte RNA-seq reads. (B) Length distribution of assembled FL-ORFs from leukocyte and muscle transcriptomes. (C) Distribution of FL-ORFs in leukocyte and muscle. FL-ORF, full-length open reading frame; RNA-seq, RNA sequencing; ROM, Royal Ontario Museum.
Figure 4. Analysis of C. canadensis mitochondria. (A) Tiling depth of mitochondrial RNA sequencing (RNA-seq) reads from archival muscle tissue. (B) Tiling depth of mitochondrial RNA-seq reads from leukocytes. (C) Diagrammatic depiction of C. canadensis mitochondrial genome. ND, NADH dehydrogenase; CO, Cytochrome c oxidase; ATP, ATPase; Cyt b, Cytochrome b; and CR, Control Region (including D-loop). Transfer RNA (tRNA) genes are denoted by the one-letter amino acid code. Variants between the leukocyte assembly and the Horn assembly are shown with gold rectangles (see also Table 4). (D) Alignment of 500 bp of D-loop sequence downloaded from GenBank with C. canadensis samples from this study, with neighbor-joining tree. Branch lengths have been formatted (proportional transformation) for space considerations. Samples Ward (KY311838) and ROM (Royal Ontario Museum) (KY321562) are as indicated. Horn sample, C. canadensis harvested in Finland (Horn ) (FR691685). Pelz Serrano samples (Pelz Serrano 2011) (JQ663937-70): AK, Alaska; AL, Alabama; Can, Alberta; GA, Georgia; ID, Idaho; KY, Kentucky; ND, North Dakota; NM, New Mexico; ME, Maine; MN, Minnesota; MS, Mississippi; OK, Oklahoma; SC, South Carolina; TN, Tennessee; TX, Texas; VA, Virginia; WA, Washington; WI, Wisconsin; and WY, Wyoming.
Variants observed between Ward and Horn beaver mitochondrial assemblies
| Variant Start | Variant End | Length | Amino Acid Change | CDS Position | Change | Codon Change | Polymorphism Type |
|---|---|---|---|---|---|---|---|
| 1,680 | 1,680 | 1 | | | C→T | | SNP (transition) |
| 2,209 | 2,208 | 0 | | | -TTC | | Deletion |
| 2,630 | 2,630 | 1 | | | A→T | | SNP (transversion) |
| 2,786 | 2,786 | 1 | | 30 | C→T | ACC→ACT | SNP (transition) |
| 3,200 | 3,200 | 1 | | 444 | C→T | ATC→ATT | SNP (transition) |
| 3,296 | 3,296 | 1 | | 540 | G→A | CCG→CCA | SNP (transition) |
| 3,323 | 3,323 | 1 | | 567 | A→G | ACA→ACG | SNP (transition) |
| 3,668 | 3,668 | 1 | | 912 | T→C | TAT→TAC | SNP (transition) |
| 3,688 | 3,688 | 1 | T→M | 932 | C→T | ACA→ATA | SNP (transition) |
| 4,605 | 4,605 | 1 | | 684 | T→C | CTT→CTC | SNP (transition) |
| 4,674 | 4,674 | 1 | | 753 | C→T | CTC→CTT | SNP (transition) |
| 4,770 | 4,770 | 1 | | 849 | C→T | GCC→GCT | SNP (transition) |
| 5,178 | 5,178 | 1 | | | T→C | | SNP (transition) |
| 5,409 | 5,409 | 1 | | 66 | T→C | TTT→TTC | SNP (transition) |
| 5,541 | 5,541 | 1 | | 198 | T→C | ATT→ATC | SNP (transition) |
| 5,652 | 5,652 | 1 | | 309 | A→G | TGA→TGG | SNP (transition) |
| 6,312 | 6,312 | 1 | | 969 | G→A | TGG→TGA | SNP (transition) |
| 6,360 | 6,360 | 1 | | 1017 | C→T | CTC→CTT | SNP (transition) |
| 6,522 | 6,522 | 1 | | 1179 | T→C | TTT→TTC | SNP (transition) |
| 7,043 | 7,043 | 1 | | 13 | C→T | CTA→TTA | SNP (transition) |
| 7,453 | 7,453 | 1 | | 423 | A→G | CGA→CGG | SNP (transition) |
| 7,489 | 7,489 | 1 | | 459 | A→G | CTA→CTG | SNP (transition) |
| 7,834 | 7,834 | 1 | | 51 | T→C | ATT→ATC | SNP (transition) |
| 7,909 | 7,909 | 1 | | 126 | T→C | GTT→GTC | SNP (transition) |
| 8,013 | 8,013 | 1 | | 69 | T→C | ATT→ATC | SNP (transition) |
| 8,284 | 8,284 | 1 | I→V | 340 | A→G | ATT→GTT | SNP (transition) |
| 8,651 | 8,651 | 1 | | 27 | T→C | CAT→CAC | SNP (transition) |
| 8,702 | 8,702 | 1 | | 78 | C→T | CTC→CTT | SNP (transition) |
| 8,942 | 8,942 | 1 | | 318 | G→A | CTG→CTA | SNP (transition) |
| 9,356 | 9,356 | 1 | | 732 | C→T | TTC→TTT | SNP (transition) |
| 10,013 | 10,013 | 1 | | 120 | T→C | ATT→ATC | SNP (transition) |
| 10,019 | 10,019 | 1 | | 126 | T→C | ATT→ATC | SNP (transition) |
| 10,036 | 10,036 | 1 | V→A | 143 | T→C | GTC→GCC | SNP (transition) |
| 10,082 | 10,082 | 1 | | 189 | T→C | ATT→ATC | SNP (transition) |
| 10,112 | 10,112 | 1 | | 219 | A→G | GTA→GTG | SNP (transition) |
| 10,519 | 10,519 | 1 | | 336 | C→T | GCC→GCT | SNP (transition) |
| 10,591 | 10,591 | 1 | | 408 | A→G | TGA→TGG | SNP (transition) |
| 10,690 | 10,690 | 1 | N→K | 507 | C→A | AAC→AAA | SNP (transversion) |
| 11,528 | 11,528 | 1 | | 1345 | T→C | TTA→CTA | SNP (transition) |
| 11,734 | 11,734 | 1 | | | (A)5→(A)6 | | Insertion (tandem repeat) |
| 12,517 | 12,517 | 1 | | 759 | A→G | GTA→GTG | SNP (transition) |
| 13,366 | 13,366 | 1 | | 1608 | G→A | TCG→TCA | SNP (transition) |
| 13,945 | 13,945 | 1 | | | G→A | | SNP (transition) |
| 14,005 | 14,005 | 1 | | | A→G | | SNP (transition) |
| 14,466 | 14,466 | 1 | I→V | 304 | A→G | ATC→GTC | SNP (transition) |
| 14,637 | 14,637 | 1 | D→N | 475 | G→A | GAC→AAC | SNP (transition) |
| 14,703 | 14,703 | 1 | F→L | 541 | T→C | TTC→CTC | SNP (transition) |
| 14,837 | 14,837 | 1 | | 675 | C→T | ACC→ACT | SNP (transition) |
| 14,954 | 14,954 | 1 | | 792 | T→C | ACT→ACC | SNP (transition) |
| 15,356 | 15,356 | 1 | | | (A)5→(A)6 | | Insertion (tandem repeat) |
| 15,515 | 15,515 | 1 | | | T→C | | SNP (transition) |
| 15,544 | 15,544 | 1 | | | T→C | | SNP (transition) |
| 15,632 | 15,632 | 1 | | | T→A | | SNP (transversion) |
| 15,634 | 15,634 | 1 | | | G→A | | SNP (transition) |
| 15,674 | 15,674 | 1 | | | T→C | | SNP (transition) |
| 15,685 | 15,685 | 1 | | | G→A | | SNP (transition) |
| 15,692 | 15,692 | 1 | | | C→T | | SNP (transition) |
| 15,755 | 15,755 | 1 | | | C→T | | SNP (transition) |
| 15,766 | 15,766 | 1 | | | C→T | | SNP (transition) |
| 15,813 | 15,813 | 1 | | | G→A | | SNP (transition) |
| 15,815 | 15,815 | 1 | | | A→G | | SNP (transition) |
| 15,817 | > 15817 | > 1 | | | GT→A | | Deletion |
| 15,907 | 15,907 | 1 | | | +A | | Insertion |
| 16,376 | 16,439 | 64 | | | (ACACGTATACACGTATACACGTATACACGTAT)8→(ACACGTATACACGTATACACGTATACACGTAT)10 | | Insertion (tandem repeat) |
| 16,504 | 16,504 | 1 | | | A→C | | SNP (transversion) |
| 16,507 | 16,509 | 3 | | | +ACC | | Insertion |
| 16,527 | 16,528 | 2 | | | GC→CG | | Substitution |
| 16,689 | 16,689 | 1 | | | A→G | | SNP (transition) |
CDS, coding sequence; SNP, single nucleotide polymorphism.
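The "Polymorphism Type" labels for single-base changes in the table above follow the usual definition: a transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), and any other single-base change is a transversion. The sketch below illustrates that rule in Python; the function name `classify_snp` is hypothetical and is not taken from the study's own tooling.

```python
# Illustrative sketch only: classify a single-base change as a transition
# or a transversion. The name classify_snp is hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a ref -> alt base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt must differ")
    # Both bases in the same chemical class means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    # Changes taken from rows of the table above.
    for ref, alt in [("C", "T"), ("A", "T"), ("C", "A"), ("A", "G")]:
        print(f"{ref}→{alt}: {classify_snp(ref, alt)}")
```

Indels, tandem-repeat insertions, and the two-base substitution in the table fall outside this rule and are not handled by the sketch.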
Variants observed between Ward and ROM beaver mitochondrial assemblies
| Variant Start | Variant End | Length | Amino Acid Change | CDS Position | Change | Codon Change | Polymorphism Type |
|---|---|---|---|---|---|---|---|
| 1,033 | 1,033 | 1 | | | A→G | | SNP (transition) |
| 3,107 | 3,107 | 1 | | 351 | G→A | CTG→CTA | SNP (transition) |
| 3,296 | 3,296 | 1 | | 540 | A→G | CCA→CCG | SNP (transition) |
| 3,347 | 3,347 | 1 | | 591 | A→C | CCA→CCC | SNP (transversion) |
| 4,047 | 4,047 | 1 | | 126 | C→A | CCC→CCA | SNP (transversion) |
| 4,770 | 4,770 | 1 | | 849 | T→C | GCT→GCC | SNP (transition) |
| 4,810 | 4,810 | 1 | V→I | 889 | G→A | GTC→ATC | SNP (transition) |
| 5,541 | 5,541 | 1 | | 198 | C→T | ATC→ATT | SNP (transition) |
| 6,522 | 6,522 | 1 | | 1179 | C→T | TTC→TTT | SNP (transition) |
| 6,804 | 6,804 | 1 | | 1461 | T→C | CTT→CTC | SNP (transition) |
| 7,043 | 7,043 | 1 | | 13 | T→C | TTA→CTA | SNP (transition) |
| 7,834 | 7,834 | 1 | | 51 | C→T | ATC→ATT | SNP (transition) |
| 7,909 | 7,909 | 1 | | 126 | C→T | GTC→GTT | SNP (transition) |
| 8,284 | 8,284 | 1 | V→I | 340 | G→A | GTT→ATT | SNP (transition) |
| 8,702 | 8,702 | 1 | | 78 | T→C | CTT→CTC | SNP (transition) |
| 9,959 | 9,959 | 1 | | 66 | T→C | TAT→TAC | SNP (transition) |
| 10,013 | 10,013 | 1 | | 120 | C→T | ATC→ATT | SNP (transition) |
| 10,019 | 10,019 | 1 | | 126 | C→T | ATC→ATT | SNP (transition) |
| 10,112 | 10,112 | 1 | | 219 | G→A | GTG→GTA | SNP (transition) |
| 10,288 | 10,288 | 1 | | 105 | T→C | AGT→AGC | SNP (transition) |
| 11,528 | 11,528 | 1 | | 1345 | C→T | CTA→TTA | SNP (transition) |
| 12,517 | 12,517 | 1 | | 759 | G→A | GTG→GTA | SNP (transition) |
| 13,764 | 13,764 | 1 | N→Y | 325 | T→A | AAT→TAT | SNP (transversion) |
| 14,687 | 14,687 | 1 | | 525 | A→G | CTA→CTG | SNP (transition) |
| 14,703 | 14,703 | 1 | L→F | 541 | C→T | CTC→TTC | SNP (transition) |
| 14,837 | 14,837 | 1 | | 675 | T→C | ACT→ACC | SNP (transition) |
| 14,954 | 14,954 | 1 | | 792 | C→T | ACC→ACT | SNP (transition) |
| 15,544 | 15,544 | 1 | | | C→T | | SNP (transition) |
| 15,561 | 15,561 | 1 | | | T→C | | SNP (transition) |
| 15,618 | 15,618 | 1 | | | C→T | | SNP (transition) |
| 15,634 | 15,634 | 1 | | | A→G | | SNP (transition) |
| 15,692 | 15,692 | 1 | | | T→C | | SNP (transition) |
| 15,702 | 15,702 | 1 | | | A→G | | SNP (transition) |
| 16,136 | 16,136 | 1 | | | T→C | | SNP (transition) |
CDS, coding sequence; SNP, single nucleotide polymorphism.
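In both variant tables, the "Amino Acid Change" column is filled only when the codon change alters the encoded residue. Because these are mitochondrial genes, the relevant code is the vertebrate mitochondrial one (NCBI translation table 2), in which TGA encodes tryptophan and ATA encodes methionine; that is why TGA→TGG appears without an amino acid change and ACA→ATA is reported as T→M. The following is a minimal sketch of that check, assuming translation table 2 applies; the helper name `amino_acid_change` is hypothetical and this is not the annotation software used in the study.

```python
# Illustrative sketch only: decide whether a codon change is synonymous
# under the vertebrate mitochondrial genetic code (NCBI translation table 2).
from itertools import product
from typing import Optional

BASES = "TCAG"
# Amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
# Differences from the standard code: TGA = W, ATA = M, AGA/AGG = stop (*).
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)
}


def amino_acid_change(ref_codon: str, alt_codon: str) -> Optional[str]:
    """Return e.g. 'T→M' for a non-synonymous change, None if synonymous."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return None if ref_aa == alt_aa else f"{ref_aa}→{alt_aa}"


if __name__ == "__main__":
    # Codon changes taken from the two variant tables.
    for ref, alt in [("ACA", "ATA"), ("TGA", "TGG"), ("AAC", "AAA"), ("AAT", "TAT")]:
        print(f"{ref}→{alt}: {amino_acid_change(ref, alt) or 'synonymous'}")
```

Under the standard nuclear code the same inputs would give different answers (TGA would be a stop codon), so the choice of translation table matters when re-checking these rows.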
Assembly and scaffolding statistics for the C. canadensis genome: Canu v1.2 – PacBio preassembly read usage
| | PacBio Reads (Gb) | Gb Lost (%) |
|---|---|---|
| Input PacBio reads | 5,646,491 (80) | |
| Reads after error correction step | 5,568,093 (78) | 2.5 |
| Reads after trimming step | 4,546,175 (48) | 40.0 |
PacBio, Pacific Biosciences.
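The "Gb Lost (%)" column in the table above appears to be cumulative relative to the 80 Gb of input reads, and the fold coverage quoted elsewhere in this record follows from dividing read bases by the 2.7 Gb genome size estimate. A short Python sketch of that arithmetic, using the rounded Gb values from the table (so results are approximate); the function names are hypothetical:

```python
# Illustrative arithmetic only, using rounded values from the table above.
GENOME_SIZE_GB = 2.7  # k-mer based estimate quoted in the abstract


def percent_lost(input_gb: float, remaining_gb: float) -> float:
    """Percentage of input bases no longer present after a processing step."""
    return 100.0 * (input_gb - remaining_gb) / input_gb


def fold_coverage(read_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Average depth: total read bases divided by genome size."""
    return read_gb / genome_gb


if __name__ == "__main__":
    input_gb, corrected_gb, trimmed_gb = 80.0, 78.0, 48.0
    print(f"lost after correction: {percent_lost(input_gb, corrected_gb):.1f}%")
    print(f"lost after trimming:   {percent_lost(input_gb, trimmed_gb):.1f}%")
    print(f"input coverage:        {fold_coverage(input_gb):.1f}x")
    print(f"trimmed coverage:      {fold_coverage(trimmed_gb):.1f}x")
```

With these inputs, the input reads correspond to roughly 30-fold coverage, in line with the figure given in the abstract and in Figure 2D.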
Assembly and scaffolding statistics for C. canadensis genome: Canu v1.2 assembly metrics from 4.546 M input PacBio reads (48 Gb)
| | Version Name | Number of Contigs | N50 (bp) | Total Bases (Gb) | Max Contig Length (Mb) |
|---|---|---|---|---|---|
| Primary Canu assembly | CP | 11,982 | 538,502 | 2.484 | 4.666 |
| First Pilon polishing | CP1 | 11,982 | 539,399 | 2.488 | 4.678 |
| Second Pilon polishing | CP2 | 11,981 | 539,125 | 2.487 | 4.677 |
| Assembly reconciled with Abyss assembly | CP2-2 | 22,515 | 278,680 | 2.487 | 3.330 |
| Assembly scaffolded with FL- and PL-ORFs | CP2-3 | 21,170 | 317,558 | 2.518 | 4.235 |
| Unused PacBio reads (Singletons) | Unused | 188,572 | 14,524 | 1.803 | 0.074 |
PacBio, Pacific Biosciences; ORFs, open reading frames.
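The N50 column summarizes contiguity: it is the contig length at which contigs of that length or longer account for at least half of the assembled bases. The record does not state which tool or exact convention produced the values above, so the following is only a sketch of the common definition, with a hypothetical function name and toy input lengths.

```python
# Illustrative sketch only: the common definition of N50.
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that contigs >= L cover at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    toy_contigs = [100, 80, 50, 30, 20, 20]  # toy lengths, not real data
    print(n50(toy_contigs))
```

This definition explains why breaking contigs at regions of discordance (CP2 to CP2-2) roughly halves the N50 while leaving total bases unchanged: the same bases are spread over nearly twice as many contigs.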
Assembly and scaffolding statistics for C. canadensis genome: Abyss assembly metrics from 80 M input Illumina reads (24 Gb) (PE 150 bp, 300-400 bp Insert)
| Abyss Assembly | Version Name | Number of Contigs | N50 (bp) | Total Bases (Gb) | Max Contig Length (Mb) |
|---|---|---|---|---|---|
| Primary assembly | AB | 312,881 | 17,834 | 2.431 | 0.206 |
| Pilon correction | AB1 | 312,881 | 17,834 | 2.431 | 0.261 |
| REAPR correction | AB2 | 316,195 | 17,517 | 2.431 | 0.261 |
Figure 5. Contig length distributions. (A) CP2 assembly: Canu assembly and Pilon polishing. (B) CP2-2 assembly: breakage of regions of discordance with Abyss assembly (AB2). (C) CP2-3 assembly: scaffolding with FL- and PL-ORFs, and the BUSCO gene set. BUSCO, Benchmarking Universal Single-Copy Orthologs; FL-ORF, full-length open reading frame; PL-ORF, partial-length open reading frame.
Figure 6. Reconciliation of genome scaffolded assembly CP2-3 with the exon–gene models of 9805 FL-ORFs. The exon–gene models of 8930 FL-ORFs (91.1%) were congruent with the CP2-3 assembly. The exons of 96 FL-ORFs (1%) were not found in the assembly. The remaining 779 FL-ORFs (7.9%) were missing one or more exons or had one or more exons in an incorrect orientation. FL-ORF, full-length open reading frame.
Figure 7. Reconciliation of genome assembly CP2-3 with the exon–gene models for each of the 3013 members of the BUSCO gene set. The exon–gene models of 2504 BUSCO genes (83.1%) were congruent with the CP2-3 assembly. The exons of 14 BUSCO genes (0.5%) were not found in the assembly. The remaining 495 BUSCO genes (16%) were missing one or more exons or had one or more exons in the incorrect orientation. BUSCO, Benchmarking Universal Single-Copy Orthologs.
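The percentages in the two reconciliation captions can be re-derived from the counts they quote. The sketch below does so in Python; the category labels and the function name `shares` are shorthand introduced here, not terms from the study.

```python
# Illustrative arithmetic only: re-derive caption percentages from counts.
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the total for each category, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Counts quoted in the FL-ORF and BUSCO reconciliation captions.
FL_ORF_COUNTS = {"congruent": 8930, "not_found": 96, "partial_or_misoriented": 779}
BUSCO_COUNTS = {"congruent": 2504, "not_found": 14, "partial_or_misoriented": 495}

if __name__ == "__main__":
    print("FL-ORFs:", sum(FL_ORF_COUNTS.values()), shares(FL_ORF_COUNTS))
    print("BUSCO:  ", sum(BUSCO_COUNTS.values()), shares(BUSCO_COUNTS))
```

The counts sum to 9805 and 3013 respectively, matching the stated set sizes; the last BUSCO category comes out near 16.4%, which the caption rounds to 16%.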
Figure 8. Diagrammatic depiction of C. canadensis Cntnap2, Ddm, and Ttn loci in assembly CP2-3. Vertical bars represent exons. Horizontal lines represent assembled contigs ordered in accordance with the exon–gene models for the indicated loci, providing scaffolding information.
Statistics for C. canadensis genome scaffolded assemblies CP2-3 and AB2-3 using FL- and PL-ORFs
| | Assembly | Scaffolded Assembly |
|---|---|---|
| PacBio-Canu assembly CP2-3 | | |
| Number of contigs/scaffolds | 22,502 | 21,157 |
| Longest contig/scaffold | 3,330,706 bp | 4,235,261 bp |
| Span (total bp) | 2.486 Gb | 2.515 Gb |
| Mean contig/scaffold length | 110,503 bp | 119,017 bp |
| N50 contig/N50 scaffold | 278,680 bp | 317,558 bp |
| Abyss assembly AB2-3 | | |
| Number of contigs/scaffolds | 316,144 | 300,561 |
| Longest contig/scaffold | 206,011 bp | 787,257 bp |
| Span (total bp) | 2.431 Gb | 2.432 Gb |
| Mean contig/scaffold length | 7,688 bp | 8,091 bp |
| N50 contig/N50 scaffold | 17,517 bp | 19,662 bp |
RepeatMasker results for C. canadensis assembly CP2-3
| | Number of Elements | Length Occupied (bp) | Percentage of Sequence (%) |
|---|---|---|---|
| SINEs | 1,526,194 | 167,113,901 | 6.64 |
| Alu/B1 | 320,643 | 31,356,476 | 1.25 |
| B2-B4 | 124,373 | 7,917,276 | 0.31 |
| IDs | 441,613 | 37,889,991 | 1.50 |
| MIRs | 232,410 | 32,110,048 | 1.28 |
| LINEs | 584,641 | 379,985,492 | 15.09 |
| LINE1 | 420,271 | 340,564,285 | 13.52 |
| LINE2 | 143,179 | 35,333,421 | 1.40 |
| L3/CR1 | 16,864 | 3,272,497 | 0.13 |
| LTR elements | 358,396 | 123,574,660 | 4.91 |
| ERVL | 72,861 | 26,961,457 | 1.07 |
| ERVL-MaLRs | 200,863 | 68,820,344 | 2.73 |
| ERV_classI | 43,815 | 20,130,270 | 0.80 |
| ERV_classII | 28,700 | 4,316,563 | 0.17 |
| DNA elements | 221,280 | 48,541,417 | 1.93 |
| hAT-Charlie | 123,347 | 24,927,384 | 0.99 |
| TcMar-Tigger | 45,279 | 12,145,600 | 0.48 |
| Unclassified | 6,070 | 1,176,346 | 0.05 |
| Total interspersed repeats | | 720,391,816 | 28.61 |
| Small RNA | 59,721 | 4,937,299 | 0.20 |
| Satellites | 5,814 | 820,000 | 0.03 |
| Simple repeats | 655,839 | 31,533,733 | 1.25 |
| Low complexity | 159,452 | 9,064,948 | 0.36 |
The query species was assumed to be rodentia. RepeatMasker version open-4.0.5, rushjob mode run with rmblastn version 2.2.27+, RepBase Update 20140131, RM database version 20140131. File name: cp3.scaff.assembly.fasta; sequences, 21,170; total length, 2,518,060,007 bp (2,517,076,473 bp excluding N/X-runs); GC level, 39.72%; and bases masked, 765,530,353 bp (30.40%).
Most repeats fragmented by insertions or deletions have been counted as one element.
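The "Percentage of Sequence" values in the RepeatMasker table are consistent with dividing each length by the total assembly length given in the footnote (2,518,060,007 bp), and the "Total interspersed repeats" row equals the sum of the five top-level classes. A short Python check of that arithmetic follows; the choice of denominator is an inference from the numbers rather than something the record states, and the names used are hypothetical.

```python
# Illustrative arithmetic only: re-derive RepeatMasker percentages.
TOTAL_LENGTH_BP = 2_518_060_007  # total length from the table footnote
BASES_MASKED_BP = 765_530_353    # bases masked, from the table footnote

# Top-level interspersed repeat classes and their lengths (bp).
CLASS_LENGTHS_BP = {
    "SINEs": 167_113_901,
    "LINEs": 379_985_492,
    "LTR elements": 123_574_660,
    "DNA elements": 48_541_417,
    "Unclassified": 1_176_346,
}


def percent_of_sequence(length_bp: int, total_bp: int = TOTAL_LENGTH_BP) -> float:
    """Length as a percentage of total assembly length, two decimals."""
    return round(100.0 * length_bp / total_bp, 2)


INTERSPERSED_TOTAL_BP = sum(CLASS_LENGTHS_BP.values())

if __name__ == "__main__":
    for name, length in CLASS_LENGTHS_BP.items():
        print(f"{name:14s} {percent_of_sequence(length):6.2f}%")
    print(f"{'Interspersed':14s} {percent_of_sequence(INTERSPERSED_TOTAL_BP):6.2f}%")
    print(f"{'All masked':14s} {percent_of_sequence(BASES_MASKED_BP):6.2f}%")
```

The gap between total interspersed repeats and all masked bases is accounted for by the small RNA, satellite, simple repeat, and low complexity rows.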
Beaver loci involved in dentition
| Gene | Gene Name | Reference | Accession |
|---|---|---|---|
| Enzyme | | | |
| | Enamel Matrix Protease | | KY286078 |
| | Matrix Metallopeptidase 14 | | KY286080 |
| | Enamelysin/Matrix Metallopeptidase 20 | | KY286081 |
| Extracellular Matrix Protein | | | |
| | Ameloblastin | | KY286056 |
| | Amelogenin X Chromosome | | KY286057 |
| | Amelotin | | KY286058 |
| | Dentin Matrix Protein 1 | | KY286066 |
| | Dentin Sialophosphoprotein | | KY286067 |
| | Enamelin | | KY286070 |
| Signal Transduction | | | |
| | Bone Morphogenic Protein 4 | | KY286062 |
| | Bone Morphogenic Protein 7 | | KY286063 |
| | Dickkopf-Related Protein 1 | | KY286065 |
| | Ectodysplasin A | | KY286068 |
| | Ectodysplasin A Receptor | | KY286069 |
| | Follistatin | | KY286071 |
| | Gremlin 2 | | KY286073 |
| | Inhibin A | | KY286077 |
| | Noggin | | KY286086 |
| | Int/Wingless 10a | Yamashiro | KY286096 |
| | Int/Wingless 6 | Yamashiro | KY286095 |
| Structural Protein | | | |
| | Integrin Binding Sialoprotein | | KY286076 |
| | Tuftelin-Interacting Protein 11 | | KY286092 |
| | Tuftelin | | KY286094 |
| Transcription Factor | | | |
| | BarH-like Homeobox 1 | | KY286060 |
| | Polycomb Complex Protein BMi1 | | KY286061 |
| | Heart and Neural Crest Derivatives 1 | | KY286074 |
| | Heart and Neural Crest Derivatives 2 | | KY286075 |
| | Lymphoid Enhancer-Binding Factor 1 | | KY286079 |
| | Muscle Segment Homeobox 1 | | KY286082 |
| | Muscle Segment Homeobox 2 | | KY286083 |
| | Nuclear Factor 1 C-Type | | KY286085 |
| | Nuclear Factor (Erythroid-Derived 2)-like 2 | | KY286084 |
| | Paired Box Protein Pax 6 | | KY286087 |
| | Paired Box Protein Pax 9 | | KY286088 |
| | Paired-like Homeodomain Transcription Factor 1 | | KY286089 |
| | Paired-like Homeodomain Transcription Factor 2 | | KY286090 |
| | T-Box Transcription Factor TBX1 | Caton | KY286091 |
| Other | | | |
| | ATPase, H+ Transporting, lysosomal V0 subunit A1 | | KY286059 |
| | Ferritin Heavy Chain | | KY286072 |
| | Tissue Inhibitor of Metallopeptidase 2 | | KY286093 |
Figure 9. Tree based on 507 amino acid alignment of C. canadensis Dmp-1 to 13 other mammalian Dmp1 protein sequences: XP_012885281 Ord’s kangaroo rat (D. ordii); NP_058059 house mouse (M. musculus); XP_008768217.1 Norway brown rat (R. norvegicus); XP_017202907.1 European rabbit (Oryctolagus cuniculus); XP_004590692.1 American pika (Ochotona princeps); XP_005341957.1 13 lined ground squirrel (I. tridecemlineatus); XP_003469412.1 guinea pig (Cavia porcellus); NP_004398.1 man (H. sapiens); XP_004039140.1 Western lowland gorilla (Gorilla gorilla gorilla); XP_014994254.1 rhesus macaque (Macaca mulatta); XP_002745661.1 common marmoset (Callithrix jacchus); XP_005608699.1 horse (Equus caballus); and NP_776463.2 domestic cow (Bos taurus). Tree was constructed using PhyML 3.0 (Guindon ) with the JTT model of amino acid substitution. Color legend: red = Rodentia; blue = Primate; green = Lagomorpha.