Ran Li¹, Weiwei Fu¹, Rui Su², Xiaomeng Tian¹, Duo Du¹, Yue Zhao¹, Zhuqing Zheng¹, Qiuming Chen¹, Shan Gao¹, Yudong Cai¹, Xihong Wang¹, Jinquan Li², Yu Jiang¹.
Abstract
It is broadly expected that next-generation sequencing will ultimately generate complete genomes, as exemplified by the latest goat reference genome (ARS1), which is considered one of the most continuous assemblies in livestock. However, the rich diversity of goat breeds worldwide indicates that a genome from a single individual is insufficient to represent the full genomic content of goats. By comparing nine de novo assemblies from seven sibling species of domestic goat with ARS1, and using resequencing and transcriptome data from goats for verification, we identified a total of 38.3 Mb of sequences that were absent from ARS1. These pan-sequences contain genic fractions with considerable expression. Using the pan-genome (ARS1 together with the pan-sequences) as a reference genome appreciably improves variant calling efficacy. A total of 56,657 spurious SNPs per individual were suppressed and, on average, 24,414 novel SNPs per individual were recovered as a result of better read mapping quality. The transcriptomic mapping rate also increased by ∼1.15%. Our study demonstrates that comparing de novo assemblies from closely related species is an efficient and reliable strategy for finding sequences missing from a reference genome, and that it could be applicable to other species. A pan-genome can serve as an improved reference genome in animals for better exploration of the underlying genomic variation, and could increase the probability of finding genotype-phenotype associations by providing a more comprehensive variation database that captures many more differences between individuals. We have constructed a goat pan-genome web interface for data visualization (http://animal.nwsuaf.edu.cn/panGoat).
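The abstract defines the pan-genome as ARS1 together with the pan-sequences. A minimal sketch of that combination step is given here, assuming both sets of sequences are available as FASTA text. The function names (`read_fasta`, `build_pan_reference`) and the toy records are hypothetical illustrations, not the authors' pipeline or real goat sequence.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the record ID.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def build_pan_reference(reference_lines: Iterable[str],
                        pan_lines: Iterable[str]) -> Dict[str, str]:
    """Merge reference and pan-sequence records, rejecting duplicate IDs."""
    combined: Dict[str, str] = {}
    for source in (reference_lines, pan_lines):
        for record_id, seq in read_fasta(source):
            if record_id in combined:
                raise ValueError(f"duplicate sequence ID: {record_id}")
            combined[record_id] = seq
    return combined


# Toy stand-ins (hypothetical, not real goat sequence).
toy_reference = [">chr1", "ACGTACGT", "ACGT", ">chr2", "GGGCCC"]
toy_pan = [">3_213322000_213345586-Ovis.aries", "TTTTAAAA"]

pan_genome = build_pan_reference(toy_reference, toy_pan)
total_bp = sum(len(seq) for seq in pan_genome.values())
print(list(pan_genome), total_bp)
```

Rejecting duplicate IDs is a design choice of this sketch: read mappers generally expect each reference sequence name to be unique, so a clash between a reference record and a pan-sequence record is treated as an error rather than silently overwritten.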
Keywords: de novo assembly; goats; pan-genome; pan-sequences; reference genome
Year: 2019 PMID: 31803240 PMCID: PMC6874019 DOI: 10.3389/fgene.2019.01169
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Phylogenetic relationship of Caprini species (A) and their genomic divergence (B). Each of the representative genomes of the other Caprini species was compared with the goat reference genome to estimate the genomic divergence.
Figure 2. Characteristics of pan-sequences. (A) Length distribution of pan-sequences. (B) Homolog identification of pan-sequences within seven non-Caprini bovid species. (C) Frequency distribution of pan-sequences in domestic goats. (D) The cumulative size of pan-sequences obtained by sequentially adding de novo assemblies of eight Caprini species (blue line), compared with the simulated sequence length obtained by adding goat individuals (red line). The simulated sequence length was calculated using the formula described in the Methods.
Figure 3. The source of pan-sequences. (A) An example of pan-sequences resulting from insertions. By comparing Oar4.0 with ARS1, a region of 18.8 kb was found to be present in goat; its presence was supported by read mapping information. (B) An example of pan-sequences resulting from assembly errors in ARS1. The dot plots show a region of 148 kb identified from chr20 of Oar4.0 that was missing in chr23 of ARS1. The presence of this region was supported by synteny with chr23 of CHIR2.0 and by read mapping information.
Examples of pan-sequences and their annotated genes.
| Pan-sequence ID | Number of samples with NRD ≥ 0.4 | Length (bp) | Identity (coverage) with ARS1ᵃ | Paralog copies in goat pan-genomeᵇ | Annotated genes | Gene description |
|---|---|---|---|---|---|---|
| 3_213322000_213345586-Ovis.aries | 107 | 23586 | No hit | 0 | LOC101119130 | TRIO and F-actin-binding protein |
| 5_12421945_12429000-Ovis.aries | 11 | 7055 | 76 (31) | 1 | LOC101112563 | Intercellular adhesion molecule 1-like |
| 6_115990993_115994660-Ovis.aries | 107 | 3667 | 70 (37) | 1 | FGFR3 | Fibroblast growth factor receptor 3 |
| 8_73765000_73783000-Ovis.aries | 107 | 18000 | 84 (11) | 0 | LRP11 | Low density lipoprotein receptor-related protein 11 |
| 12_71555499_71581526-Capra.hircus | 20 | 26027 | 75 (17) | 6 | MRP4 | Multidrug Resistance-Associated Protein 4 |
| 18_19266000_19273036-Ovis.aries | 102 | 7036 | 77 (38) | 4 | LOC101102268 | myeloid-associated differentiation marker-like |
| 20_25417856_25432000-Ovis.aries | 7 | 14144 | 84 (16) | 1 | LOC101109220 | SLA class II histocompatibility antigen, DQ haplotype D alpha chain-like |
| 20_34142966_34162204-Ovis.aries | 107 | 19238 | 77 (19) | 4 | PRL | Prolactin |
| AJPT02077673.1_24914_38585-Capra.hircus | 15 | 13671 | 82 (13) | 1 | DQA1 | SLA class II histocompatibility antigen, class II, DQ alpha 1 |
| AJPT02103288.1_12703_24009-Capra.hircus | 66 | 11306 | 78 (22) | 2 | GBP7 | Guanylate binding protein 7 |
ᵃ The identity (coverage) of each pan-sequence was determined using BLASTN. ᵇ The paralog copies were determined using tblastx (e-value < 1e-20).
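The pan-sequence IDs in the table appear to encode their own coordinates. The sketch below assumes the pattern `<source sequence>_<start>_<end>-<species>`, which is inferred from the table rows rather than stated in the source, and checks that `end - start` reproduces the reported length. The helper names (`PanSeqId`, `parse_pan_id`) are hypothetical.

```python
from typing import NamedTuple


class PanSeqId(NamedTuple):
    source_seq: str  # chromosome number or scaffold accession
    start: int
    end: int
    species: str

    @property
    def length(self) -> int:
        # Assumption: length is end minus start, as the table values suggest.
        return self.end - self.start


def parse_pan_id(pan_id: str) -> PanSeqId:
    """Split an ID of the assumed form <source>_<start>_<end>-<species>."""
    coords, species = pan_id.rsplit("-", 1)
    # rsplit keeps scaffold accessions such as AJPT02077673.1 intact.
    source_seq, start, end = coords.rsplit("_", 2)
    return PanSeqId(source_seq, int(start), int(end), species)


# (ID, reported length in bp) pairs copied from the table.
rows = [
    ("3_213322000_213345586-Ovis.aries", 23586),
    ("5_12421945_12429000-Ovis.aries", 7055),
    ("6_115990993_115994660-Ovis.aries", 3667),
    ("8_73765000_73783000-Ovis.aries", 18000),
    ("12_71555499_71581526-Capra.hircus", 26027),
    ("18_19266000_19273036-Ovis.aries", 7036),
    ("20_25417856_25432000-Ovis.aries", 14144),
    ("20_34142966_34162204-Ovis.aries", 19238),
    ("AJPT02077673.1_24914_38585-Capra.hircus", 13671),
    ("AJPT02103288.1_12703_24009-Capra.hircus", 11306),
]

# IDs whose encoded coordinates disagree with the reported length.
mismatches = [pid for pid, reported in rows
              if parse_pan_id(pid).length != reported]
print(mismatches)
```

For the ten rows shown, the encoded coordinates and the Length (bp) column agree, so `mismatches` is empty. This is consistent with the assumed ID pattern but does not confirm it for pan-sequences outside the table.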
Figure 4. Improvement of read mapping for resequencing data using the pan-genome versus ARS1. (A) Comparison of the mapping ratio of resequencing data using the pan-genome versus ARS1. (B) The mapping quality of reads from pan-sequences compared with their original mapping quality on ARS1. (C) The number of SNPs identified for the 10 goat samples using the pan-genome versus ARS1. (D) Read mapping quality was improved within the red rectangle, accompanied by suppression of false SNPs and removal of the low-quality mapped reads. "Pan-base" refers specifically to the ARS1 portion of the pan-genome when the pan-genome is used as the mapping reference, whereas "ARS1" refers to using ARS1 as the mapping reference. A t-test was used for the comparison. ** P < 0.01.
Figure 5. Improvement of read mapping for transcriptomic data using the pan-genome versus ARS1. (A) Comparison of the mapping ratio of resequencing data using the pan-genome versus ARS1. (B) The mapping quality of reads from pan-sequences compared with their original mapping quality on ARS1. (C) The expression of pan-sequences across nine tissues. A t-test was used for the comparison. ** P < 0.01.
Figure 6. Overview of goat pan-genome database features.