Mark F Richardson, Kylie Munyard, Larry J Croft, Theodore R Allnutt, Felicity Jackling, Fahad Alshanbari, Matthew Jevit, Gus A Wright, Rhys Cransberg, Ahmed Tibary, Polina Perelman, Belinda Appleton, Terje Raudsepp.
Abstract
The development of high-quality chromosomally assigned reference genomes constitutes a key feature for understanding genome architecture of a species and is critical for the discovery of the genetic blueprints of traits of biological significance. South American camelids serve people in extreme environments and are important fiber and companion animals worldwide. Despite this, the alpaca reference genome lags far behind those available for other domestic species. Here we produced a chromosome-level improved reference assembly for the alpaca genome using the DNA of the same female Huacaya alpaca as in previous assemblies. We generated 190X Illumina short-read, 8X Pacific Biosciences long-read and 60X Dovetail Chicago® chromatin interaction scaffolding data for the assembly, used testis and skin RNAseq data for annotation, and cytogenetic map data for chromosomal assignments. The new assembly VicPac3.1 contains 90% of the alpaca genome in just 103 scaffolds and 76% of all scaffolds are mapped to the 36 pairs of the alpaca autosomes and the X chromosome. Preliminary annotation of the assembly predicted 22,462 coding genes and 29,337 isoforms. Comparative analysis of selected regions of the alpaca genome, such as the major histocompatibility complex (MHC), the region involved in the Minute Chromosome Syndrome (MCS) and candidate genes for high-altitude adaptations, reveals unique features of the alpaca genome. The alpaca reference genome VicPac3.1 presents a significant improvement in completeness, contiguity and accuracy over VicPac2 and is an important tool for the advancement of genomics research in all New World camelids.
Keywords: Dovetail Chicago; MHC; Minute Chromosome Syndrome; VicPac3.1; alpaca; chromosome-level; high-altitude adaptations; reference genome
Year: 2019 PMID: 31293619 PMCID: PMC6598621 DOI: 10.3389/fgene.2019.00586
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Comparative summary statistics of alpaca genome assemblies.
| Breed | Huacaya | Huacaya | Huacaya | Huacaya | Huacaya | Huacaya | Huacaya |
| Sex | Female | Female | Female | Female | Female | Female | Female |
| Individual | n/a | ||||||
| Assembly size (Gb) | 2.12 | 2.12 | 2.12 | 2.66 | 2.17 | 2.96 | 2.01 |
| Contig N50 (kb) | 35.72 | 35.75 | 35.75 | 306.09 | 29.07 | 3.91 | 66.3 |
| Number of contigs | 204,817 | 204,577 | 205,666 | 719,860 | 412,904 | 721,292 | 75,733 |
| Scaffold N50 (Mb) | 24.02 | 9.86 | 9.06 | 5.83 | 7.26 | 0.23 | 5.1 |
| Scaffold L50 | 25 | 64 | 69 | 126 | 86 | 2,595 | – |
| Number of scaffolds | 77,390 | 78,963 | 82,481 | 678,087 | 276,726 | 298,413 | 4,322 |
| Longest scaffold (Mb) | 121.37 | 38.36 | 38.36 | 25.07 | 38.45 | 5.51 | – |
| GC % | 41.4 | 41.4 | 41.4 | 41.6 | 41.4 | 39.7 | 41.5 |
| N’s % | 4.17 | 4.09 | 3.98 | 2.44 | 4.31 | 35.09 | – |
| Repeat % | 33.48 | – | – | – | 34.74 | 32.1 | |
| Reference | This study | This study | This study | This study | NCBIa, UCSCb, Ensemblc | NCBI, UCSC, Ensembl | |
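The contiguity metrics reported in the table above (contig N50, scaffold N50, scaffold L50) follow the standard assembly definitions: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly, and L50 is the number of sequences in that set. The following is a minimal sketch of that calculation; the function name `n50_l50` and the toy lengths are illustrative assumptions and are not taken from the study.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50: length of the sequence at which the running total, taken from the
         longest sequence downwards, first reaches half of the assembly size.
    L50: how many sequences were needed to reach that point.
    """
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length, count
    raise ValueError("lengths must contain at least one sequence")


# Toy example (hypothetical lengths, not real alpaca scaffolds).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

A higher N50 together with a lower L50 indicates a more contiguous assembly, which is the pattern the first column of the table shows relative to the others.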
BUSCO analysis of genome completeness.
| Complete and single copy (%) | Complete and duplicated (%) | Fragmented (%) | Missing (%) | |
|---|---|---|---|---|
| 96.1 | 0.7 | 2.0 | 1.9 | |
| 94.7 | 0.7 | 2.4 | 2.2 | |
| 95.0 | 0.7 | 2.1 | 2.2 | |
| 94.2 | 0.8 | 2.1 | 2.9 | |
| 93.9 | 0.8 | 2.6 | 2.7 | |
| 95.0 | 0.5 | 2.5 | 2.0 | |
| 94.5 | 1.2 | 2.6 | 1.7 | |
| 95.2 | 0.5 | 2.3 | 2.0 | |
| 92.4 | 1.2 | 3.0 | 3.4 | |
| 92.1 | 1.1 | 3.4 | 3.4 | |
Chromosomal assignment of VicPac3.1 showing, for each chromosome, the assembly size, number of unique assigned scaffolds, number of annotated genes, gene density, and human homology.
| Alpaca chr. | Assembly size, bp | No. of unique mapped scaffolds | No. of genes | Genes per Mb | Human homology | Assembly size, bp | No. of unique mapped scaffolds |
|---|---|---|---|---|---|---|---|
| 1 | 101,041,233 | 5 | 1625 | 16.1 | 3q, 21q | 41,153,578 | 5 |
| 2 | 121,370,620 | 1 | 1650 | 13.6 | 4 | 36,264,523 | 4 |
| 3 | 83,363,794 | 3 | 1269 | 15.2 | 5 | 66,866,246 | 7 |
| 4 | 65,636,945 | 3 | 1166 | 17.8 | 9 | 36,674,619 | 6 |
| 5 | 96,274,254 | 1 | 1428 | 14.9 | 2q | 67,623,744 | 5 |
| 6 | 74,791,714 | 2 | 1448 | 19.6 | 14q, 15q | 34,188,095 | 5 |
| 7 | 31,168,711 | 1 | 641 | 2.1 | 7 | 9,531,993 | 3 |
| 8 | 70,270,077 | 1 | 1028 | 14.7 | 6q | 62,544,616 | 5 |
| 9 | 74,791,714 | 3 | 1081 | 14.4 | 1p, 16q, 19q | 26,984,682 | 4 |
| 10 | 39,582,034 | 1 | 794 | 19.9 | 11 | 0 | 0 |
| 11 | 77,176,758 | 6 | 1958 | 25.4 | 1q, 10q | 36,454,531 | 6 |
| 12 | 48,986,614 | 2 | 910 | 18.6 | 12q | 30,043,790 | 2 |
| 13 | 61,008,235 | 3 | 1491 | 2.4 | 1p | 55,336,466 | 6 |
| 14 | 67,111,318 | 2 | 901 | 13.4 | 13q | 19,497,535 | 4 |
| 15 | 32,418,436 | 2 | 643 | 2.0 | 2p | 23,521,627 | 3 |
| 16 | 39,074,364 | 6 | 1220 | 3.1 | 10p, 17q | 36,989,750 | 8 |
| 17 | 46,944,759 | 1 | 887 | 18.9 | 3p | 40,598,934 | 2 |
| 18 | 29,910,177 | 2 | 930 | 31.0 | 7, 16p | 24,952,824 | 2 |
| 19 | 24,022,313 | 1 | 693 | 28.9 | 20q | 12,494,946 | 1 |
| 20 | 38,741,345 | 2 | 1110 | 28.5 | 6p | 15,672,241 | 2 |
| 21 | 29,520,914 | 3 | 895 | 30.9 | 1q, 16q∗ | 15,741,292 | 4 |
| 22 | 25,522,599 | 1 | 891 | 34.3 | 5q, 19p | 26,415,928 | 2 |
| 23 | 29,440,657 | 2 | 520 | 17.9 | 1q, 13q | 32,337,675 | 3 |
| 24 | 18,346,407 | 1 | 318 | 17.7 | 18 | 15,189,407 | 2 |
| 25 | 60,195,357 | 3 | 1467 | 24.5 | 8q | 26,317,781 | 5 |
| 26 | 27,987,978 | 2 | 537 | 19.2 | 4q, 8p | 32,483,939 | 2 |
| 27 | 22,699,463 | 1 | 11 | 0.5 | 15q | 8,774,263 | 2 |
| 28 | 16,162,605 | 1 | 412 | 25.8 | 2p | 13,182,695 | 2 |
| 29 | 16,162,605 | 2 | 461 | 28.8 | 8q | 24,598,565 | 2 |
| 30 | 13,130,742 | 3 | 238 | 18.3 | 18q | 11,278,592 | 3 |
| 31 | 13,602,737 | 1 | 323 | 23.1 | 4, 8p | 12,583,175 | 1 |
| 32 | 22,732,685 | 3 | 677 | 29.4 | 12q, 22q | 8,370,595 | 3 |
| 33 | 16,261,182 | 1 | 451 | 28.2 | 11q | 12,417,884 | 2 |
| 34 | 22,097,801 | 1 | 526 | 23.9 | 12p | 16,301,232 | 2 |
| 35 | 18,484,027 | 4 | 397 | 22.1 | 10p | 10,673,185 | 4 |
| 36 | 5,377,765 | 3 | 64 | 12.8 | 7p | 3,455,596 | 4 |
| X | 36,971,808 | 8 | 687 | 17.2 | X | 16,549,574 | 6 |
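The "Genes per Mb" column can be recomputed from the "No. of genes" and "Assembly size, bp" columns as the gene count divided by the assembly size in megabases. The sketch below does this for chromosomes 1 to 3 using the values from the table above; the function name `genes_per_mb` is a hypothetical helper, and 1 Mb is assumed to mean 1,000,000 bp.

```python
def genes_per_mb(gene_count, assembly_size_bp):
    """Gene density in genes per megabase (assuming 1 Mb = 1,000,000 bp)."""
    if assembly_size_bp <= 0:
        raise ValueError("assembly size must be positive")
    return gene_count / (assembly_size_bp / 1_000_000)


# (chromosome, assembly size in bp, number of genes), copied from the table.
rows = [
    ("1", 101_041_233, 1625),
    ("2", 121_370_620, 1650),
    ("3", 83_363_794, 1269),
]

for chrom, size_bp, genes in rows:
    print(f"chr{chrom}: {genes_per_mb(genes, size_bp):.1f} genes/Mb")
```

The same function can be applied to any other row to cross-check a tabulated density against its gene count and assembly size.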
FIGURE 1. Predicted peptide length distributions. (A) Histogram of the VicPac3.1 peptide annotation showing the number of full-length peptides, defined as those with 90% or better overlap with the closest human peptide match. (B) Histogram of predicted peptide lengths (log10) for VicPac3.1.
FIGURE 2. Features of alpaca chromosomes in VicPac3.1. (A) y-axis: the number of predicted genes per chromosome; x-axis: total length of chromosomally assigned scaffolds. (B) Gene density per chromosome; magenta: chromosomes with gene density < 5 genes/Mb; dark blue: chromosomes with gene density > 25 genes/Mb.
FIGURE 3. Comparison of MHC in alpaca, dromedary and Bactrian camel. (A) Schematic of camelid chr20 showing FISH map positions of MHC and CRISP2 and corresponding scaffolds in VicPac3.1; (B) comparative organization of MHC Class II; (C) comparative organization of MHC Class I. Gene symbols in red denote loci that show differences between species; relative positions of genes and distances between loci correspond to the sequence map.
Chromosome 36 scaffolds and predicted genes with orthologs in HSA7.
| Scaffold | Scaffold size, bp | Gene symbol | HSA7 sequence position |
|---|---|---|---|
| ScfyRBE_2631 | 352,287 | | chr7:45,888,357-45,893,668 |
| | | | chr7:45,912,598-45,921,274 |
| ScfyRBE_77331 | 2,828,351 | | chr7:47,275,154-47,582,144 |
| | | | chr7:47,963,288-47,979,543 |
| | | | chr7:47,987,151-48,029,119 |
| | | | chr7:48,035,520-48,061,297 |
| | | | chr7:48,171,460-48,647,495 |
| | | | chr7:49,773,661-49,921,950 |
| | | | chr7:49,937,441-50,093,264 |
| | | | chr7:50,096,036-50,159,830 |
| | | | chr7:50,304,669-50,405,101 |
| | | | chr7:50,444,129-50,542,535 |
| | | | chr7:50,531,759-50,543,463 |
| | | | chr7:50,592,580-50,782,567 |
| | | | chr7:51,016,212-51,316,799 |
| ScfyRBE_77323 | 2,197,127 | | chr7:54,542,325-54,571,080 |
| | | | chr7:54,752,253-54,759,974 |
| | | | chr7:55,365,448-55,433,742 |
| | | | chr7:55,470,613-55,572,525 |
| | | | chr7:44,062,727-44,065,587 |
| | | | chr7:44,044,717-44,061,716 |
| | | | chr7:43,875,913-43,906,626 |
| | | | chr7:43,866,558-43,869,557 |
Candidate genes for high altitude adaptations and signatures of selection in the alpaca.
| Gene symbol | Signature of selection | Species where the gene is under positive selection | References |
|---|---|---|---|
| ACAA1A | – | Deer mouse | |
| ADAM17 | Negative | Yak | |
| ARG2 | Negative | Yak | |
| ATF6 | – | Pig | |
| CKMT1 | – | Deer mouse | |
| EFEMP1 | Negative | Pig | |
| EGLN1 | Negative | Yak, dog, human | |
| EHHADH | Positive | Deer mouse | |
| EPAS1 | Negative | Dog, human, snakes | |
| ERP44 | – | Camels (oxidative stress) | |
| HOXB6 | – | Pig | |
| IKBKG | – | Pig | |
| KLF6 | Negative | Pig | |
| MGST2 | – | Camels (oxidative stress) | |
| MMP3 | – | Yak | |
| NFE2L2 | Negative | Camels (oxidative stress) | |
| NOTCH4 | Negative and positive | Deer mouse | |
| PPARA | Positive | Deer mouse, human | |
| RBPJ | Negative and positive | Pig | |
| SF3B1 | Negative | Pig | |
Alpaca chr21 scaffolds and predicted genes (MC1R is highlighted) with known human orthologs.
| Scaffold | Gene symbol | Human location; chr: sequence position |
|---|---|---|
| ScfyRBE_283 | | chr1:162,069,774-162,368,451 |
| | | chr1:162,632,465-162,787,400 |
| | | chr1:179,081,377-179,095,996 |
| ScfyRBE_77299 | | chr1:180,632,004-180,890,251 |
| | | chr1:183,186,288-183,244,900 |
| | | chr1:184,690,231-184,754,913 |
| ScfyRBE_77374 | | chr16:88,715,338-88,785,220 |
| | | chr16:88,453,317-88,537,016 |
| | | chr16:87,951,434-88,077,318 |
| | | chr16:85,898,803-85,922,609 |
| | | chr16:85,613,216-85,676,204 |
| | | chr1:147,186,259-147,225,638 |
| | | chr1:150,796,208-150,808,323 |
| | | chr1:161,118,101-161,121,067 |
| ScfyRBE_14 | | chr16:90,022,600-90,044,971 |
| | | chr16:89,914,847-89,920,951 |
| | | chr16:89,490,917-89,557,748 |
| | | chr16:89,285,175-89,490,318 |
| | | chr16:89,195,761-89,201,664 |
Select mammalian coat color genes in VicPac3.1.
| Gene symbol | Scaffold | Alpaca chr. | FISH; | ||
|---|---|---|---|---|---|
| 1 | 3 | ScfyRBE_14 | 21 | n/a | |
| 8 | 7 | ScfyRBE_2524 | 4 | 4q21-q22 | |
| 8 | 7 | ScfyRBE_4179 | 14 | n/a | |
| 12 | 7 | ScfyRBE_77320 | 16 | n/a | |
| 22 | 19 | ScfyRBE_26 | 2 | 2q24 | |
| 10 | 11 | ScfyRBE_77306 | 12 | 12q22 | |
| 14 | 10 | ScfyRBE_5827 | 3 | 3q12 | |
Clusters of keratin and keratin-associated protein genes in VicPac3.1.
| Gene symbol | Chromosome | |
|---|---|---|
| ScfyRBE_77306 | 12 | |
| ScfyRBE_77388 | 16 | |
| ScfyRBE_2857 | n/a | |
| ScfyRBE_4 | 1 | |