Tao Tan, Lin Xia, Kailing Tu, Jie Tang, Senlin Yin, Lunzhi Dai, Peng Lei, Biao Dong, Hongbo Hu, Yong Fan, Yang Yu, Dan Xie.
Abstract
BACKGROUND: Macaca fascicularis (M. fascicularis) is a primate model organism that has played an important role in studies of human health. It is vital to better understand the similarities and differences in gene regulation between M. fascicularis and humans. Current comparative studies of gene regulation between the two species are limited by the low quality of gene annotation and the lack of regulatory element data for the M. fascicularis genome.
Keywords: Crab-eating macaque; Evolution; Gene regulation
Year: 2018 PMID: 30382841 PMCID: PMC6211470 DOI: 10.1186/s12864-018-5183-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Workflow of the annotation procedure. Overview of the data and of the workflow used for computational and manual annotation.
Fig. 2 Quantitative comparison of different annotation versions. a Cumulative congruence distributions at the gene level for the Ensembl Macaca fascicularis 5.0.91 annotation, NCBI Macaca fascicularis release 101 and the new (manual) annotation, each compared with cDNA, EST and RNA-seq alignments. b Cumulative congruence distributions at the exon level for the Ensembl Macaca fascicularis 5.0.91 annotation, NCBI Macaca fascicularis release 101 and the new annotation, compared with the same cDNA, EST and RNA-seq alignments. In both panels, higher congruence indicates stronger consistency with the evidence.
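The caption does not define how a gene's or exon's congruence with the evidence is scored. As a minimal sketch, assuming a hypothetical score equal to the fraction of a feature's bases covered by evidence alignments, the per-feature scores and the cumulative distribution plotted in each panel could be tabulated as below. The function names, the coverage-based definition and the toy intervals are all illustrative assumptions, not the authors' method or data.

```python
from bisect import bisect_right


def congruence(feature, evidence):
    """Hypothetical congruence score: fraction of a feature's bases covered by evidence.

    feature: half-open (start, end) interval of an annotated gene or exon.
    evidence: list of half-open (start, end) alignment intervals (cDNA, EST or
    RNA-seq) on the same chromosome and strand. This definition is an
    illustrative assumption, not the measure used in the paper.
    """
    start, end = feature
    # Clip overlapping evidence to the feature and sort by start coordinate.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in evidence if s < end and e > start
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged block: bank it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (end - start)


def cumulative_distribution(scores, thresholds):
    """Fraction of features whose congruence is <= each threshold (empirical CDF)."""
    ordered = sorted(scores)
    return [bisect_right(ordered, t) / len(ordered) for t in thresholds]


features = [(0, 100), (200, 300), (400, 500)]            # toy annotated exons
evidence = [(0, 50), (40, 100), (250, 260), (600, 700)]  # toy evidence alignments
scores = [congruence(f, evidence) for f in features]
cdf = cumulative_distribution(scores, [0.0, 0.5, 1.0])
print(scores)  # [1.0, 0.1, 0.0]
print(cdf)
```

Merging overlapping evidence before summing keeps bases covered by several alignments from being counted twice. If the published curves instead accumulate from high to low congruence, the complementary form (fraction of features scoring above each threshold) is 1 minus this curve.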
Comparison of different annotations
| Feature | Human GENCODE version 19 | RefSeq | NCBI release 101 (2016.02) | Ensembl version1 (2017.12) | Computational annotation | Manual annotation |
|---|---|---|---|---|---|---|
| Genes | 57,820 (main chromosomes without chrY) | - | 33,368 (31,950 on main chromosomes without chrY) | 28,592 (27,966 on main chromosomes without chrY) | 23,282 (main chromosomes without chrY) | 13,413 (main chromosomes without chrY) |
| Protein-coding genes | 20,345 | - | 20,627 | 21,404 | 17,774 | 13,196 |
| Average gene length | 29,907 bp | - | 41,827 bp | 38,598 bp | 43,579 bp | 62,503 bp |
| Transcripts | 196,520 | 2,037 | 76,559 | 53,156 | 27,029 | - |
| Average transcript length | 34,243 bp | 40,127 bp | 80,039 bp | 49,987 bp | 44,243 bp | - |
| Transcripts per gene | 3.4 | 1 | 2.29 | 1.86 | 1.16 | - |
| CDSs | 269,043 | 15,307 | 62,934 | 481,660 | 27,628 | - |
| Average CDS length | 159 bp | 147 bp | 709 bp | 155 bp | 1,410 bp | - |
| Exons | 562,673 | 16,597 | 289,695 | 500,876 | 201,670 | 771,632 |
| Average exon length | 330 bp | 217 bp | 403 bp | 218 bp | 195 bp | 615 bp |
| Exons per transcript | 2.86 | 8.15 | 3.78 | 9.42 | 7.46 | 13.93 |
| Coverage rate, main chromosomes (min) | 12.08% | 1.08% (without chrY, chrM) | 33.87% (without chrY) | 27.45% (without chrY) | 23.95% (without chrY, chrM) | 18.14% (without chrY, chrM) |
| Coverage rate, main chromosomes (median) | 51.71% | 3.07% (without chrY, chrM) | 48.96% (without chrY) | 39.94% (without chrY) | 35.17% (without chrY, chrM) | 29.96% (without chrY, chrM) |
| Coverage rate, main chromosomes (mean) | 51.17% | 2.91% (without chrY, chrM) | 48.14% (without chrY) | 40.79% (without chrY) | 35.92% (without chrY, chrM) | 29.22% (without chrY, chrM) |
| Coverage rate, main chromosomes (max) | 92.50% | 4.59% (without chrY, chrM) | 68.36% (without chrY) | 92.42% (without chrY) | 50.57% (without chrY, chrM) | 39.45% (without chrY, chrM) |
| Coverage rate, genome (all main chromosomes) | 50.39% | 2.81% (without chrY, chrM) | 46.8% (without chrY) | 37.98% (without chrY) | 33.94% (without chrY, chrM) | 28.09% (without chrY, chrM) |
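Two rows of the annotation comparison are derived quantities: transcripts per gene is the transcript count divided by the gene count, and exons per transcript is the exon count divided by the transcript count. The consistency check below recomputes both from the published counts, assuming the totals (not the main-chromosome subsets) were used as denominators; cells reported as "-" are carried as `None` and skipped. The dictionary keys and the helper `ratio` are illustrative names, not part of the authors' pipeline.

```python
# Counts copied from the comparison table; None marks a "-" (not reported) cell.
COUNTS = {
    "Human GENCODE 19": {"genes": 57_820, "transcripts": 196_520, "exons": 562_673},
    "RefSeq": {"genes": None, "transcripts": 2_037, "exons": 16_597},
    "NCBI 101": {"genes": 33_368, "transcripts": 76_559, "exons": 289_695},
    "Ensembl": {"genes": 28_592, "transcripts": 53_156, "exons": 500_876},
    "Computational": {"genes": 23_282, "transcripts": 27_029, "exons": 201_670},
    "Manual": {"genes": 13_413, "transcripts": None, "exons": 771_632},
}


def ratio(numerator, denominator):
    """Return numerator / denominator rounded to 2 decimals, or None if either is missing."""
    if numerator is None or denominator is None:
        return None
    return round(numerator / denominator, 2)


derived = {
    name: {
        "transcripts_per_gene": ratio(c["transcripts"], c["genes"]),
        "exons_per_transcript": ratio(c["exons"], c["transcripts"]),
    }
    for name, c in COUNTS.items()
}

for name, d in derived.items():
    print(f"{name:16} {d['transcripts_per_gene']!s:>6} {d['exons_per_transcript']!s:>6}")
```

Every ratio that can be recomputed this way matches the table to two decimals. The manual-annotation value of 13.93 exons per transcript cannot be checked because its transcript count is not reported.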
Fig. 3 Complexity of tissue transcriptomes and comparison of tissue expression profiles between human and M. fascicularis. a Cumulative distribution of the fraction of total orthologous transcription contributed by genes, ranked in order of decreasing expression in each tissue (x axis). The left panel shows the complexity of tissue transcriptomes in human; the right panel shows M. fascicularis. b Heat map of the all-versus-all Pearson correlation matrix between 13 tissues in human and M. fascicularis over all 11,446 orthologous genes. Red boxes mark tissue-specific expression patterns in M. fascicularis; black boxes mark tissue-specific expression patterns in human; combined black and red boxes mark similar expression patterns of the same tissue in human and M. fascicularis. Orange indicates the highest correlation coefficient and blue the lowest. Samples from M. fascicularis pituitary are colored red.
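Both panels of Fig. 3 rest on simple computations that the caption names but does not spell out. The sketch below assumes a tissues-by-genes expression matrix per species, restricted to orthologs and with columns aligned on the same ortholog pairs; the function names and toy values are hypothetical. Panel a's curve is built by ranking one tissue's genes in decreasing expression and accumulating their share of the total, and panel b's matrix by applying NumPy's `np.corrcoef` to the stacked tissue profiles of both species. The caption does not say whether expression was normalized or log-transformed before correlation, so the sketch correlates values as given.

```python
import numpy as np


def complexity_curve(expression):
    """Cumulative fraction of total expression contributed by the top-k genes.

    expression: 1-D array of non-negative expression values for one tissue
    over the orthologous genes. Element k-1 of the result is the fraction of
    total expression accounted for by the k most highly expressed genes.
    """
    values = np.sort(np.asarray(expression, dtype=float))[::-1]  # decreasing order
    return np.cumsum(values) / values.sum()


def tissue_correlation(human, macaque):
    """All-versus-all Pearson correlation matrix between tissue profiles.

    human, macaque: arrays of shape (n_tissues, n_orthologs) whose columns are
    aligned on the same ortholog pairs. Rows and columns of the result list
    the human tissues first, then the macaque tissues.
    """
    profiles = np.vstack([human, macaque])
    # np.corrcoef treats each row (tissue) as a variable and each column (gene) as an observation.
    return np.corrcoef(profiles)


# Toy data: 2 tissues x 4 orthologous genes per species.
human = np.array([[10.0, 5.0, 1.0, 0.0], [1.0, 2.0, 8.0, 9.0]])
macaque = np.array([[9.0, 6.0, 2.0, 1.0], [0.0, 3.0, 7.0, 10.0]])

curve = complexity_curve(human[0])  # cumulative fractions 0.625, 0.9375, 1.0, 1.0
corr = tissue_correlation(human, macaque)
print(curve)
print(np.round(corr, 2))
```

A steep curve in panel a means a few highly expressed genes dominate that tissue's transcriptome. If both species contribute 13 tissues over 11,446 orthologs, `human` and `macaque` would each have shape (13, 11446) and `corr` would be a 26 × 26 matrix like the one drawn in panel b.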