Sanjit Singh Batra, Michal Levy-Sakin, Jacqueline Robinson, Joseph Guillory, Steffen Durinck, Tauras P Vilgalys, Pui-Yan Kwok, Laura A Cox, Somasekar Seshagiri, Yun S Song, Jeffrey D Wall.
Abstract
BACKGROUND: Baboons are a widely used nonhuman primate model for biomedical, evolutionary, and basic genetics research. Despite this importance, the genomic resources for baboons are limited. In particular, the current baboon reference genome Panu_3.0 is a highly fragmented, reference-guided (i.e., not fully de novo) assembly, and its poor quality inhibits our ability to conduct downstream genomic analyses.
Year: 2020 PMID: 33283855 PMCID: PMC7719865 DOI: 10.1093/gigascience/giaa134
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Illustration of our genome assembly strategy.
Assembly statistics for each step of the adopted assembly strategy
| Assembly | 10x | 10x contigs | 10x contigs + Nanopore scaffolding | 10x contigs + Nanopore scaffolding + Nanopore gap filling | 10x contigs + Nanopore scaffolding + Nanopore gap filling + Illumina polishing | Panubis1.0 | Panu_3.0 |
|---|---|---|---|---|---|---|---|
| Total length of scaffolds | 2,892,554,220 | 2,809,352,255 | 2,871,292,557 | 2,871,210,925 | 2,870,847,162 | 2,869,821,163 | 2,959,373,024 |
| No. of scaffolds | 24,513 | 87,632 | 15,803 | 15,803 | 15,803 | 11,145 | 63,235 |
| Scaffold N50 | 15,720,195 | 84,258 | 1,695,573 | 1,695,772 | 1,695,642 | 140,274,886 | 585,721 |
| Total gap length | 83,203,960 | 0 | 50,344,034 | 2,030,908 | 2,030,908 | 2,321,983 | 22,434,732 |
| Total length of contigs | 2,809,350,260 | 2,809,352,255 | 2,820,948,523 | 2,869,180,017 | 2,868,816,254 | 2,867,510,325 | 2,937,001,527 |
| No. of contigs | 87,347 | 87,632 | 62,252 | 17,004 | 17,004 | 15,243 | 122,216 |
| Contig N50 | 84,258 | 84,258 | 134,222 | 1,469,760 | 1,469,602 | 1,455,705 | 138,819 |
Total length of scaffolds is the sum of the lengths of all scaffolds, counting every A, C, G, T, and N. Total gap length is the total number of N's in the assembly. Contigs are constructed by splitting the assembly at every stretch of ≥1 N. The total length of contigs is the number of sequenced base pairs (A, C, G, and T only) summed over all scaffolds.
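The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used to produce the table: the function names `n50` and `assembly_stats` are hypothetical, scaffolds are assumed to be plain strings over A, C, G, T, and N, and N50 is computed with the conventional definition (the largest length L such that sequences of length ≥ L cover at least half of the total), which the note above does not spell out.

```python
import re


def n50(lengths):
    """Conventional N50: largest L such that sequences of length >= L
    together cover at least half of the summed length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_stats(scaffolds):
    """Compute the table's statistics from scaffold sequences (hypothetical helper).

    scaffolds: iterable of strings over the alphabet A, C, G, T, N.
    """
    scaffolds = [s.upper() for s in scaffolds]
    # Contigs: split every scaffold at each stretch of one or more N.
    contigs = [c for s in scaffolds for c in re.split(r"N+", s) if c]
    scaffold_lengths = [len(s) for s in scaffolds]
    contig_lengths = [len(c) for c in contigs]
    return {
        "total_scaffold_length": sum(scaffold_lengths),
        "num_scaffolds": len(scaffolds),
        "scaffold_n50": n50(scaffold_lengths),
        "total_gap_length": sum(s.count("N") for s in scaffolds),
        "total_contig_length": sum(contig_lengths),
        "num_contigs": len(contigs),
        "contig_n50": n50(contig_lengths),
    }


if __name__ == "__main__":
    toy = ["ACGTNNACG", "ACNGT", "AAAA"]  # toy data, not from the paper
    for name, value in assembly_stats(toy).items():
        print(f"{name}: {value}")
```

For a real assembly the sequences would be read from a FASTA file rather than held in a list of literals; that parsing step is omitted here.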
Figure 2: Hi-C map of our Panubis1.0 genome. The map was obtained by aligning Hi-C paired-end reads to the Panubis1.0 genome assembly, which is laid out on both the X-axis and the Y-axis. Because each read-pair consists of 2 reads, a position (i, j) on this map represents the number of read-pairs where one read aligned to position i and the other read aligned to position j on the Panubis1.0 genome. The intensity of each pixel represents the number of read-pairs aligning within that bin. The Hi-C map has been drawn at a resolution of 1.25 Mb. Each blue square on the diagonal represents a chromosome-length scaffold. Autosomes are listed first, ordered by size, and the last square corresponds to the X chromosome. The axes are labeled in units of megabases.
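The binning described in this caption can be illustrated with a minimal sketch. It assumes, hypothetically, that each read-pair has already been reduced to a pair of genome-wide coordinates (scaffolds concatenated in plotting order) and that the map is stored symmetrically. The function name `contact_matrix` and the input format are illustrative and do not come from the tools used in the study.

```python
def contact_matrix(pairs, genome_length, bin_size=1_250_000):
    """Count read-pairs per (bin_i, bin_j) cell at the given resolution.

    pairs: iterable of (pos_i, pos_j) genome-wide coordinates in bp,
           one tuple per aligned read-pair (assumed input format).
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_i, pos_j in pairs:
        bin_i, bin_j = pos_i // bin_size, pos_j // bin_size
        matrix[bin_i][bin_j] += 1
        if bin_i != bin_j:
            # Keep the map symmetric: the pair (i, j) is also the pair (j, i).
            matrix[bin_j][bin_i] += 1
    return matrix


if __name__ == "__main__":
    # Toy read-pairs on a 3 Mb "genome" (not real data).
    toy_pairs = [(0, 100), (0, 1_300_000), (2_600_000, 1_250_000)]
    for row in contact_matrix(toy_pairs, genome_length=3_000_000):
        print(row)
```

In such a matrix, read-pairs falling within one scaffold concentrate in square blocks along the diagonal, which is the pattern the caption describes for the chromosome-length scaffolds.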
Figure 3: Dot plots of chromosome Y synteny suggest that the putative chromosome Y in Panubis1.0 captures at least part of the true chromosome Y. A dot plot between rhesus chromosome Y and the Panubis1.0 putative chromosome Y is shown on the left, while a dot plot between chimpanzee chromosome Y and human chromosome Y is shown on the right. Each dot represents an aligned block, with purple representing an alignment on the positive strand and cyan an alignment on the negative strand. The axis labels are in units of megabases. The phylogenetic distance between baboon and rhesus macaque is similar to that between human and chimpanzee. Hence, the broadly conserved synteny between the rhesus chromosome Y and the baboon putative chromosome Y, which is comparable to the synteny between the chimpanzee and human chromosome Y, suggests that the scaffold representing the putative chromosome Y in the Panubis1.0 assembly is indeed capturing at least a large part of chromosome Y.
Figure 4: Dot plots showing alignment of Panu_3.0 reference-assisted chromosomes vs Panubis1.0 chromosome-length scaffolds. The Panu_3.0 assembly is shown on the Y-axis and the Panubis1.0 assembly is shown on the X-axis. Each dot represents the position of a syntenic block between the 2 assemblies as determined by the nucmer alignment. The color of the dot reflects the orientation of the individual alignments (purple indicates consistent orientation and blue indicates inconsistent orientation). The dot plots illustrate that there are chromosomes containing large inversions and translocations in the Panu_3.0 assembly with respect to the Panubis1.0 assembly.
Likely large (>100 kb) assembly errors in Panu_3.0, ordered by size
| Panu_3.0 chromosome | Panu_3.0 start (Mb) | Panu_3.0 end (Mb) | Panu_2.0 start (Mb) | Panu_2.0 end (Mb) | Type | Linkage support | BNG support | LDhelmet support |
|---|---|---|---|---|---|---|---|---|
| NC_018164.2 | 88.05 | 104.99 | 87.61 | 104.98 | Inv | Start | Yes | Unknown[ |
| NC_018167.2 | 29.38 | 44.71 | 29.25 | 44.53 | Inv | Start + end | Yes | Start + end |
| NC_018156.2 | 4.04 | 8.67 | 4.18 | 8.63 | Inv | No | Yes[ | No |
| NC_018162.2 | 82.42 | 86.47 | 81.91 | 84.01 | Trans | Start + end | No[ | No |
| NC_018166.2 | 104.28 | 108.05 | 103.66 | 107.44 | Inv | No | Yes | No |
| NC_018165.2 | 15.93 | 19.48 | 15.85 | 19.40 | Inv | No | No | No |
| NC_018166.2 | 96.94 | 100.12 | 96.39 | 99.54 | Trans | Start + end | Yes[ | Start + end |
| NC_018160.2 | 36.05 | 36.75 | 35.88 | 36.55 | Trans | No | Yes[ | Start |
| NC_018163.2 | 23.19 | 23.66 | 0 | 0.47 | Trans | No | Yes[ | No |
| NC_018164.2 | 4.05 | 4.49 | 3.99 | 4.45 | Trans | No[ | Yes | No |
| NC_018165.2 | 100.91 | 101.18 | 100.31 | 100.59 | Trans | No | Yes | No |
| NC_018152.2 | 166.73 | 166.89 | 169.86 | 170.10 | Trans | Start + end | Yes | End |
Note that a “no” in the “Linkage support” or “LDhelmet support” columns is inconclusive and should not be interpreted as support for the Panu_3.0 assembly being correct.
Unable to determine whether linkage and LDhelmet provide support at the end breakpoint due to a lack of synteny between Panu_2.0 and Panu_3.0.
Panu_2.0 assembly seems to be correct.
Bionano Genomics (BNG) maps do not support a translocation with these breakpoints. However, they do support a potential large structural variant at the starting breakpoint.
BNG maps support the presence of a large structural variant, which may be a translocation.
Linkage data suggest a potential polymorphic inversion (in 16,413) partially overlapping with this interval.
Figure 5: Evidence for misassembly on chromosome NC_018167.2 in Panu_3.0. (a) Bionano optical map alignment to the Panu_3.0 assembly demonstrates an inversion on chromosome NC_018167.2 beginning at ∼29.38 Mb and ending at ∼44.71 Mb. (b) Estimates of the population recombination rate ρ near the potential synteny breaks of the inversion identified on chromosome NC_018167.2. (c) The x-axis shows positions along chromosome NC_018167.2 in Panu_3.0, where each row represents 1 of the 9 offspring of sire 10,173. Switches between red and blue within a row represent a recombination event. The 2 vertical black lines represent locations where ≥3 recombinations occur at the same locus, indicating a potential misassembly.
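The criterion in panel (c) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' linkage analysis: each offspring is assumed to be summarized as a string of inherited paternal haplotype labels (for example "R" and "B" for red and blue) at an ordered set of markers, "the same locus" is interpreted as the same interval between adjacent markers, and the names `switch_intervals` and `candidate_breakpoints` are hypothetical.

```python
from collections import Counter


def switch_intervals(phase):
    """Indices i where the inherited haplotype changes between marker i and i + 1."""
    return [i for i in range(len(phase) - 1) if phase[i] != phase[i + 1]]


def candidate_breakpoints(offspring_phases, positions, min_switches=3):
    """Flag marker intervals where many offspring appear to recombine.

    offspring_phases: dict mapping offspring ID -> string of haplotype labels,
                      one label per marker (assumed input format).
    positions: marker positions (Mb), in the same order as the labels.
    Returns (start, end, count) for each interval with >= min_switches switches.
    """
    counts = Counter()
    for phase in offspring_phases.values():
        if len(phase) != len(positions):
            raise ValueError("each phase string needs one label per marker")
        counts.update(switch_intervals(phase))
    return [
        (positions[i], positions[i + 1], n)
        for i, n in sorted(counts.items())
        if n >= min_switches
    ]


if __name__ == "__main__":
    # Toy data: 5 offspring, 4 markers (not from the paper).
    positions = [1.0, 2.0, 3.0, 4.0]
    phases = {"o1": "RRBB", "o2": "RRBB", "o3": "BBRR", "o4": "RBBB", "o5": "RRRR"}
    print(candidate_breakpoints(phases, positions))
```

The threshold of 3 follows the caption. The underlying idea is that independent meioses rarely recombine in the same short interval, so a pile-up of apparent switches is more plausibly explained by markers being misordered in the assembly.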
Additional large (>100 kb) inversion differences between Panubis1.0 and Panu_3.0, ordered by size
| Panubis1.0 chromosome | Panubis1.0 start (Mb) | Panubis1.0 end (Mb) | Panu_3.0 chromosome (GenBank) | Panu_3.0 chromosome (RefSeq) | Panu_3.0 start (Mb) | Panu_3.0 end (Mb) |
|---|---|---|---|---|---|---|
| NC_044992.1 | 28.89 | 45.01 | CM001506.2 | NC_018167.2 | 29.38 | 44.79 |
| NC_044995.1 | 0.00 | 13.00 | CM001509.2 | NC_018170.2 | 0.00 | 13.31 |
| NC_044987.1 | 101.26 | 106.48 | CM001504.2 | NC_018165.2 | 101.44 | 107.53 |
| NC_044978.1 | 176.83 | 181.37 | CM001495.2 | NC_018156.2 | 175.08 | 180.09 |
| NC_044986.1 | 86.61 | 90.73 | CM001499.2 | NC_018160.2 | 85.56 | 90.30 |
| NC_044988.1 | 0.00 | 3.50 | CM001505.2 | NC_018166.2 | 0.00 | 3.78 |
| NC_044996.1 | 86.67 | 89.58 | CM001511.2 | NC_018172.2 | 86.91 | 90.23 |
| NC_044982.1 | 154.35 | 156.82 | CM001497.2 | NC_018158.2 | 155.71 | 158.53 |
| NC_044984.1 | 7.96 | 10.58 | CM001501.2 | NC_018162.2 | 8.03 | 10.83 |
| NC_044991.1 | 33.09 | 35.09 | CM001500.2 | NC_018161.2 | 32.46 | 35.05 |
| NC_044996.1 | 93.67 | 95.52 | CM001511.2 | NC_018172.2 | 94.22 | 96.59 |
| NC_044981.1 | 68.61 | 71.05 | CM001494.2 | NC_018155.2 | 69.37 | 71.65 |
| NC_044996.1 | 40.49 | 42.78 | CM001511.2 | NC_018172.2 | 41.15 | 43.34 |
| NC_044996.1 | 10.01 | 11.79 | CM001511.2 | NC_018172.2 | 10.20 | 12.06 |
| NC_044996.1 | 31.80 | 33.37 | CM001511.2 | NC_018172.2 | 32.11 | 33.97 |
| NC_044979.1 | 142.32 | 144.05 | CM001493.2 | NC_018154.2 | 141.96 | 143.71 |
| NC_044996.1 | 90.77 | 92.54 | CM001511.2 | NC_018172.2 | 91.42 | 92.99 |
| NC_044993.1 | 63.59 | 65.52 | CM001510.2 | NC_018171.2 | 62.31 | 63.73 |
| NC_044991.1 | 26.79 | 28.49 | CM001500.2 | NC_018161.2 | 26.52 | 27.82 |
| NC_044980.1 | 0.02 | 0.78 | CM001496.2 | NC_018157.2 | 0.02 | 1.26 |
| NC_044979.1 | 0.00 | 0.73 | CM001493.2 | NC_018154.2 | 0.00 | 0.75 |
We cannot definitively determine which orientation is correct for these inversions, and they should be considered provisional.
Figure 6: Pedigree of baboons used in linkage analysis. Circles represent females and squares represent males.