Lu Wang1, Jinwei Wu1, Xiaomei Liu1, Dandan Di1, Yuhong Liang1, Yifei Feng1, Suyun Zhang1, Baoguo Li1,2, Xiao-Guang Qi1.
Abstract
BACKGROUND: The golden snub-nosed monkey (Rhinopithecus roxellana) is an endangered colobine species endemic to China with several distinct traits, including a unique social structure. Although a genome assembly for R. roxellana is available, it is incomplete and fragmented because it was constructed using short-read sequencing technology. Thus, important information such as genome structural variation and repeat sequences may be absent.
Keywords: Rhinopithecus roxellana; BioNano optical maps; annotation; genome assembly; high-quality
Year: 2019 PMID: 31437279 PMCID: PMC6705546 DOI: 10.1093/gigascience/giz098
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Image of R. roxellana, taken on Qinling Mountain, China.
Reads generated by the 5 sequencing methods
| Paired-end libraries | Insert size (bp) | Total clean data (Gb) | Read length (bp) | Sequence coverage (×) |
|---|---|---|---|---|
| Illumina | 350 | 422.00 | 150 | 133.12 |
| PacBio | 20 k | 304.84 | n/a | 95.86 |
| 10X Genomics | 500−700 | 340.90 | 150 | 109.56 |
| BioNano | n/a | 463.75 | n/a | n/a |
| Hi-C | 350 | 307.90 | n/a | 97.77 |
| Total | n/a | 1,839.39 | n/a | 582.15 |
Note: The sequence coverage was calculated based on an estimated genome size of 3.18 Gb. n/a: not applicable.
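The coverage calculation described in the note can be sketched as follows. This is a minimal illustration, assuming coverage is simply total clean data divided by the estimated genome size; the function name `sequence_coverage` is hypothetical, and the results need not match the published column exactly, since the rounding or genome-size estimate applied to each library may differ.

```python
GENOME_SIZE_GB = 3.18  # estimated genome size stated in the table note


def sequence_coverage(clean_data_gb, genome_size_gb=GENOME_SIZE_GB):
    """Return fold coverage: total clean bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return clean_data_gb / genome_size_gb


# Total clean data (Gb) per library, copied from the table above.
clean_data = {
    "Illumina": 422.00,
    "PacBio": 304.84,
    "10X Genomics": 340.90,
    "Hi-C": 307.90,
}

for library, gigabases in clean_data.items():
    print(f"{library}: {sequence_coverage(gigabases):.2f}x")
```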
Summary of the final R. roxellana genome assembly
| Category | Contig length (bp) | Contig number | Scaffold length (bp) | Scaffold number |
|---|---|---|---|---|
| Total | 3,038,184,325 | 6,099 | 3,038,467,325 | 3,269 |
| Maximum | 30,757,641 | n/a | 206,558,726 | n/a |
| ≥2000 bp | n/a | 5,708 | n/a | 2,879 |
| N50 | 5,723,610 | 151 | 144,559,847 | 9 |
| N60 | 4,241,389 | 211 | 141,075,955 | 11 |
| N70 | 3,173,235 | 292 | 135,203,321 | 14 |
| N80 | 2,063,823 | 408 | 118,350,466 | 16 |
| N90 | 896,517 | 622 | 83,045,532 | 19 |
Note: The “Number” column represents the number of contigs/scaffolds longer than the value of the corresponding category. n/a: not applicable.
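For readers unfamiliar with the N50 to N90 rows, the statistic can be sketched as below. This is an illustrative implementation of the conventional Nx definition, not the authors' pipeline: sequences are sorted from longest to shortest, and Nx is the length of the sequence at which the running total first reaches x% of the assembly. The function name `nx_stat` and the toy lengths are hypothetical, and the returned count includes the sequence at which the threshold is reached.

```python
def nx_stat(lengths, x=50):
    """Return (Nx length, number of sequences needed to reach x% of the total)."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Reached only if x > 100; fall back to the shortest sequence.
    return ordered[-1], len(ordered)


# Toy contig lengths (hypothetical, not taken from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50_length, n50_count = nx_stat(toy_lengths, 50)
n90_length, n90_count = nx_stat(toy_lengths, 90)
print(n50_length, n50_count)
print(n90_length, n90_count)
```

A higher N50 with a lower count means that fewer, longer sequences cover half the assembly, which is how the scaffold columns in the table should be read against the contig columns.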
Figure 2: Hi-C heat map of interactions between pairs of chromosomal loci throughout the genome. Interactions within and among R. roxellana chromosomes (Chr 1–Chr 22) were drawn based on the chromatin interaction frequencies between pairs of 100-kb genomic regions, as determined by Hi-C. Darker red cells indicate stronger and more frequent interactions, which in turn imply that the 2 sequences are spatially close.
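The 100-kb binning behind a heat map of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the software used for the figure: the input format (pairs of chromosome and position), the function name `bin_contacts`, and the toy coordinates are all hypothetical.

```python
from collections import Counter

BIN_SIZE = 100_000  # 100-kb genomic regions, as in the figure caption


def bin_contacts(pairs, bin_size=BIN_SIZE):
    """Count contacts between pairs of fixed-size genomic bins.

    pairs: iterable of ((chrom_a, pos_a), (chrom_b, pos_b)) read-pair positions.
    Returns a Counter keyed by an order-independent pair of (chrom, bin_index).
    """
    counts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in pairs:
        bin_a = (chrom_a, pos_a // bin_size)
        bin_b = (chrom_b, pos_b // bin_size)
        # Sort so that (a, b) and (b, a) land in the same heat-map cell.
        key = tuple(sorted((bin_a, bin_b)))
        counts[key] += 1
    return counts


# Toy read pairs (hypothetical coordinates).
toy_pairs = [
    (("Chr1", 50_000), ("Chr1", 250_000)),
    (("Chr1", 260_000), ("Chr1", 10_000)),
    (("Chr1", 120_000), ("Chr2", 5_000)),
]
contacts = bin_contacts(toy_pairs)
for cell, frequency in sorted(contacts.items()):
    print(cell, frequency)
```

Each count corresponds to one cell of the heat map; cells with higher counts would be drawn in darker red.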
Summary and characteristics of the predicted protein-coding genes
| Gene set | | Number | Mean transcript length (bp) | Mean coding sequence length (bp) | Mean exon length (bp) | Mean intron length (bp) | Mean exons per gene |
|---|---|---|---|---|---|---|---|
| De novo | Augustus | 32,928 | 23,441 | 1,052 | 196 | 5,112 | 5.38 |
| | GlimmerHMM | 618,957 | 4,204 | 404 | 166 | 2,654 | 2.43 |
| | SNAP | 97,298 | 49,851 | 755 | 144 | 11,597 | 5.23 |
| | Geneid | 36,863 | 35,242 | 1,035 | 188 | 7,615 | 5.49 |
| | Genscan | 50,419 | 40,635 | 1,137 | 167 | 6,800 | 6.81 |
| Homology | Ggo | 25,281 | 19,893 | 1,055 | 184 | 3,971 | 5.74 |
| | Hsa | 38,444 | 14,763 | 826 | 182 | 3,942 | 4.54 |
| | Mmu | 21,959 | 29,709 | 1,470 | 187 | 4,123 | 7.85 |
| | Rbi | 25,320 | 25,685 | 1,387 | 196 | 3,991 | 7.09 |
| | Rro | 24,121 | 28,439 | 1,420 | 185 | 4,043 | 7.68 |
| RNASeq | PASA | 66,620 | 28,449 | 1,219 | 164 | 4,247 | 7.41 |
| | Cufflinks | 73,199 | 31,497 | 2,737 | 409 | 5,052 | 6.69 |
| EVM | | 30,102 | 22,298 | 1,098 | 182 | 4,199 | 6.05 |
| Pasa-update* | | 29,403 | 27,638 | 1,180 | 181 | 4,782 | 6.53 |
| Final set* | | 22,497 | 34,153 | 1,369 | 178 | 4,885 | 7.71 |
Note: Pasa-update* includes only the untranslated regions; other regions were not included. Final set* represents the results after the Pasa filtering process, where the longest isoform was chosen in the case of multiple splicing isoforms; redundant single exons were also discarded. The “Number” column gives the number of protein-coding genes predicted by each method.
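The longest-isoform rule mentioned in the note can be sketched as below. This is a minimal illustration, assuming each transcript is described by a gene identifier, a transcript identifier, and a length; the function name `longest_isoforms` and the toy records are hypothetical, and the note does not say how ties were handled (here the first transcript seen is kept).

```python
def longest_isoforms(transcripts):
    """Keep one transcript per gene: the longest isoform.

    transcripts: iterable of (gene_id, transcript_id, length_bp).
    Returns a dict mapping gene_id to the chosen transcript_id.
    """
    best = {}
    for gene_id, transcript_id, length_bp in transcripts:
        current = best.get(gene_id)
        # Strict comparison: on a tie, the first transcript seen is kept.
        if current is None or length_bp > current[1]:
            best[gene_id] = (transcript_id, length_bp)
    return {gene: chosen[0] for gene, chosen in best.items()}


# Toy annotation records (hypothetical identifiers and lengths).
toy_transcripts = [
    ("geneA", "geneA.t1", 1200),
    ("geneA", "geneA.t2", 1500),
    ("geneB", "geneB.t1", 900),
]
print(longest_isoforms(toy_transcripts))
```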
Figure 3: Gene predictions. (a) Number of genes estimated by various prediction approaches: de novo (blue), homology (pink), and RNA-sequencing data (green). (b) Number of genes predicted based on de novo, homology, and RNA-sequencing approaches, in addition to expression level (in reads per kilobase of transcript per million mapped reads [rpkm]). In both panels, the labels rna_0.5, denove_0.5, and homology_0.5 indicate the genes predicted by each method with an overlap >50%; in (b), rpkm > 1 indicates genes with a relative expression level >1.
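The two thresholds used in the caption (overlap >50% and rpkm > 1) can be illustrated with the sketch below. The RPKM formula is the standard one (reads per kilobase of transcript per million mapped reads). How the overlap was measured is not stated in the caption, so the definition used here (the fraction of a gene model covered by a supporting interval, with half-open coordinates) is an assumption, as are the function names and toy numbers.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def overlap_fraction(model, evidence):
    """Fraction of the `model` interval covered by `evidence`.

    Both arguments are (start, end) tuples with half-open coordinates.
    """
    model_start, model_end = model
    evidence_start, evidence_end = evidence
    shared = max(0, min(model_end, evidence_end) - max(model_start, evidence_start))
    return shared / (model_end - model_start)


# Toy values (hypothetical): 500 reads on a 2-kb transcript, 10 million mapped reads.
expression = rpkm(500, 2000, 10_000_000)
coverage = overlap_fraction((100, 200), (120, 300))
supported = coverage > 0.5 and expression > 1
print(expression, coverage, supported)
```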
Figure 4: R. roxellana phylogenetic relationships and gene families. Phylogenetic relationships were inferred from 5,418 single-copy gene families in R. roxellana and other mammals. All nodes had support values of 100%. Estimated divergence times are given in blue near each node. Numbers under each species indicate the number of gene families that have expanded (green) and contracted (brown) since the split from the most recent common ancestor (MRCA). Monkey images are copyright 2013 Stephen D. Nash of the International Union for Conservation of Nature Species Survival Commission Primate Specialist Group and are used with permission. MYA: million years ago.