Julia V Halo, Amanda L Pendleton, Feichen Shen, Aurélien J Doucet, Thomas Derrien, Christophe Hitte, Laura E Kirby, Bridget Myers, Elzbieta Sliwerska, Sarah Emery, John V Moran, Adam R Boyko, Jeffrey M Kidd.
Abstract
Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3' end of LINE-1_Cfs (i.e., LINE-1_Cf 3'-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.
Keywords: Canis familiaris; long-read assembly; mobile elements; structural variation
Year: 2021 PMID: 33836575 PMCID: PMC7980453 DOI: 10.1073/pnas.2016274118
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Alignment of assembled scaffolds to the CanFam3.1 genome. Each of the assembled scaffolds was aligned to the CanFam3.1 reference genome. Results are shown for four chromosomes. The colored bars below each line indicate the corresponding position of each scaffold, colored based on their indicated length. Above each line, regions of segmental duplications based on read depth in the Zoey Illumina data are indicated by red boxes.
Comparison of the Boxer and Great Dane assemblies
| Statistic | CanFam3.1 autosomes + X | Zoey autosomes + X |
| Total length (bp) | 2,327,633,984 | 2,326,329,672 |
| Non-N (bp) | 2,317,593,971 | 2,320,292,846 |
| Number of gaps | 19,553 | 997 |
| Longest contiguous segment (bp) | 2,428,071 | 28,813,894 |
| Mean contiguous segment length (bp) | 118,523 | 2,239,665 |
| Median contiguous segment length (bp) | 54,641 | 1,107,836 |
| N50 segment length (bp) | 277,468 | 4,765,928 |
| Segmental duplications, genome alignment, >1 kbp, >90% ID (count) | 6,250 | 6,371 |
| Segmental duplications, genome alignment, >1 kbp, >90% ID (total bp) | 49,339,683 | 45,425,166 |
| Segmental duplications, Penelope read depth (count) | 459 | 468 |
| Segmental duplications, Penelope read depth (total bp) | 47,757,534 | 40,836,807 |
Presented are general assembly statistics for the primary autosomal and X chromosome sequence of the CanFam3.1 and Zoey assemblies. Contiguous segment refers to the length of sequence uninterrupted by an “N” nucleotide. Segmental duplications were identified in each assembly based on an assembly self-alignment and by the depth of coverage of Illumina sequencing reads from Penelope, an Iberian Wolf. See for additional details.
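The contiguity statistics in the table can be illustrated with a minimal Python sketch. It assumes the definition given in the note above (a contiguous segment is a run of sequence uninterrupted by an "N") and the conventional definition of N50 (the segment length at which segments of that length or longer cover at least half of the non-N bases). The function names are hypothetical and this is not the authors' pipeline.

```python
import re
from statistics import mean, median


def contiguous_segment_lengths(sequence):
    """Lengths of maximal runs that contain no 'N' (case-insensitive)."""
    return [len(run) for run in re.split(r"[Nn]+", sequence) if run]


def n50(lengths):
    """Length L such that segments of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no segments supplied")


def summarize(sequence):
    """Summary statistics for one sequence, in the spirit of the table rows."""
    lengths = contiguous_segment_lengths(sequence)
    return {
        "total_length": len(sequence),
        "non_n": sum(lengths),
        "n_runs": len(re.findall(r"[Nn]+", sequence)),  # runs of N, i.e. gaps
        "longest": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
    }
```

For a whole assembly, the per-chromosome segment lengths would be pooled before computing the mean, median, and N50. How gaps at chromosome ends are counted is a choice this sketch does not try to match to the paper.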
Fig. 2. Annotation of genes missing from the CanFam3.1 assembly. A genome browser view of chr20 on the Zoey assembly is shown. The top track summarizes a comparison between the Zoey and CanFam3.1 assemblies using the UCSC liftOver tool. Black segments show alignment to the corresponding chromosome on the CanFam3.1 assembly. Purple segments match to an unlocalized contig (chrUn_JH374124) in the CanFam3.1 assembly. The large region in the middle between the purple and black segments is absent from the CanFam3.1 assembly. The track below shows the position of four genes in this region annotated using RNA-Seq data: GNA15, GNA11, AES, and TLE2. The colored bars below each gene model show the expression levels across different tissues, as indicated by the color key at the left. See for additional details.
Fig. 3. CanFam3.1 assembly gaps are enriched for sequence with extreme GC content. Depicted is the distribution of GC content for 12,806 resolved assembly gaps. A subset consisting of 5,553 of the 12,806 segments have a GC content greater than that found in 99% of randomly selected segments. See for additional details.
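The comparison described in this caption (gap segments whose GC content exceeds that of 99% of randomly selected segments) can be sketched as follows. This is an illustration under assumptions: the caption does not say how the random segments were drawn or length-matched, so the background GC values are simply taken as an input list, and all function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous (A/C/G/T) bases; None if there are none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return (seq.count("G") + seq.count("C")) / called


def empirical_threshold(background_gc, percent=99):
    """Smallest background value that is >= `percent`% of the background values."""
    ordered = sorted(background_gc)
    # Integer ceiling of percent * n / 100, converted to a zero-based index.
    index = -(-percent * len(ordered) // 100) - 1
    return ordered[max(index, 0)]


def count_extreme(gap_gc, background_gc, percent=99):
    """Number of gap segments with GC content above the background threshold."""
    threshold = empirical_threshold(background_gc, percent)
    return sum(1 for value in gap_gc if value > threshold)
```

Applied to the resolved gaps, `count_extreme` would return the analogue of the 5,553 segments reported in the caption, provided the background segments are sampled the way the authors sampled theirs.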
Fig. 4. Size of structural variants identified between the CanFam3.1 and Zoey assemblies. Shown are histograms depicting the size distribution of (A) 16,834 deletions and (B) 15,621 insertions between the Zoey and CanFam3.1 assemblies. Variant size is plotted on a logarithmic scale such that the bins in the histogram are of equal size in the log scale. Large increases at ∼200 bp and ∼6 kbp indicate the disproportionate contribution of dimorphic SINEC and LINE-1 sequences, respectively, to the genetic differences between the two assemblies. See for additional details.
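Bins that are "of equal size in the log scale" have edges spaced at a constant ratio rather than a constant difference. The sketch below shows one way to build such bins and count variant sizes into them. The number of bins per decade and the function names are assumptions for illustration, since the caption does not state the binning actually used.

```python
import bisect
import math


def log_bin_edges(min_size, max_size, bins_per_decade=10):
    """Edges 10**(k / bins_per_decade) covering the range [min_size, max_size]."""
    low = math.floor(math.log10(min_size) * bins_per_decade)
    high = math.ceil(math.log10(max_size) * bins_per_decade)
    return [10 ** (k / bins_per_decade) for k in range(low, high + 1)]


def log_histogram(sizes, edges):
    """Count sizes into half-open bins [edges[i], edges[i+1]); the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for size in sizes:
        if size < edges[0] or size > edges[-1]:
            continue  # outside the plotted range
        index = min(bisect.bisect_right(edges, size) - 1, len(counts) - 1)
        counts[index] += 1
    return counts
```

With this binning, a cluster of roughly 200 bp variants and a cluster of roughly 6 kbp variants each fall into one or two adjacent bins, which is why peaks at those sizes stand out on the log-scaled axis.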
Fig. 5. Identification of canine LINE-1 and SINEC elements capable of retrotransposition. (A) (Top) A full-length L1_Cf equipped with a retrotransposition indicator cassette (mneoI) was assayed for retrotransposition in human HeLa-HA cells. TSD indicates a target site duplication generated upon retrotransposition. (Bottom) Results of the retrotransposition assay. JM101/L1.3 (positive control) contains an active human LINE-1. JM105/L1.3 (negative control) contains a human LINE-1 that harbors an inactivating missense mutation in the reverse transcriptase domain of ORF2p (99). ADL1Cf-104_5 contains the full-length canine LINE-1 identified in this study. (B) (Top) A consensus SINEC_Cf element equipped with an indicator cassette to monitor the retrotransposition of RNA pol III transcripts (neotet) (75) was assayed for retrotransposition in human HeLa-HA cells in the presence of either an active human LINE-1 or the newly cloned L1_Cf-104_5 sequence that lacks a retrotransposition indicator cassette (JM101/L1.3Δneo or ADL1Cf-104_5Δneo, respectively). (Bottom) Results of the retrotransposition assay. JM101/L1.3Δneo (positive control) contains an active human LINE-1. JM105/L1.3Δneo (negative control) contains a human LINE-1 that harbors an inactivating missense mutation in the reverse transcriptase domain of ORF2p (99). ADL1Cf-104_5Δneo contains an active canine LINE-1 (see A). The expression of either JM101/L1.3Δneo or ADL1Cf-104_5Δneo could drive human Alu and canine SINEC_Cf retrotransposition. In both assays, the blue-stained foci represent G418-resistant foci containing a presumptive retrotransposition event. See for additional details.