Aleksey V Zimin, Adam S Cornish, Mnirnal D Maudhoo, Robert M Gibbs, Xiongfei Zhang, Sanjit Pandey, Daniel T Meehan, Kristin Wipfler, Steven E Bosinger, Zachary P Johnson, Gregory K Tharp, Guillaume Marçais, Michael Roberts, Betsy Ferguson, Howard S Fox, Todd Treangen, Steven L Salzberg, James A Yorke, Robert B Norgren.
Abstract
BACKGROUND: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2, with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High-quality assembly and annotation files are required for a wide range of studies, including expression, genetic and evolutionary analyses.
Year: 2014 PMID: 25319552 PMCID: PMC4214606 DOI: 10.1186/1745-6150-9-20
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Figure 1. Flowchart illustrating procedures for assembly and annotation of the MacaM rhesus macaque genome.
Rhesus chromosome nomenclature
| H | C | G | O | M | W | R |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2p+ | 2a | 2a | 2a | 2a | 15 | 13 |
| 2q- | 2b | 2b | 2b | 2b | 9 | 12 |
| 3 | 3 | 3 | 3 | 3 | 3 | 2 |
| 4 | 4 | 4 | 4 | 4 | 4 | 5 |
| 5 | 5 | 5 | 5 | 5 | 5 | 6 |
| 6 | 6 | 6 | 6 | 6 | 6 | 4 |
| 7/21 | 7/21 | 7/21 | 7/21 | 7 | 2 | 3 |
| 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| 9 | 9 | 9 | 9 | 9 | 14 | 15 |
| 10 | 10 | 10 | 10 | 10 | 10 | 9 |
| 11 | 11 | 11 | 11 | 11 | 11 | 14 |
| 12 | 12 | 12 | 12 | 12 | 12 | 11 |
| 13 | 13 | 13 | 13 | 13 | 16 | 17 |
| 14/15 | 14/15 | 14/15 | 14/15 | 14 | 7 | 7 |
| 20/22 | 20/22 | 20/22 | 20/22 | 15 | 13 | 10 |
| 16 | 16 | 16 | 16 | 16 | 20 | 20 |
| 17 | 17 | 17 | 17 | 17 | 17 | 16 |
| 18 | 18 | 18 | 18 | 18 | 18 | 18 |
| 19 | 19 | 19 | 19 | 19 | 19 | 19 |
| X | X | X | X | X | X | X |
| Y | Y | Y | Y | Y | Y | Y |
H = Human; C = Chimpanzee; G = Gorilla; O = Orangutan; M = MacaM; W = Wienberg et al. [16]; R = Rogers et al. [17].
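Because the three rhesus numbering schemes disagree for most autosomes, translating a chromosome name between them is a common first step when comparing coordinates across studies. The following is a minimal sketch of such a lookup. It assumes the table columns follow the order given in the legend (H, C, G, O, M, W, R) and copies only the M, W and R columns; the names `ROWS`, `SCHEMES` and `convert` are hypothetical helpers, not part of any published tool.

```python
# Rows copied from the nomenclature table, assuming the column order
# given in the legend: (MacaM, Wienberg et al., Rogers et al.).
ROWS = [
    ("1", "1", "1"), ("2a", "15", "13"), ("2b", "9", "12"),
    ("3", "3", "2"), ("4", "4", "5"), ("5", "5", "6"),
    ("6", "6", "4"), ("7", "2", "3"), ("8", "8", "8"),
    ("9", "14", "15"), ("10", "10", "9"), ("11", "11", "14"),
    ("12", "12", "11"), ("13", "16", "17"), ("14", "7", "7"),
    ("15", "13", "10"), ("16", "20", "20"), ("17", "17", "16"),
    ("18", "18", "18"), ("19", "19", "19"), ("X", "X", "X"),
    ("Y", "Y", "Y"),
]

# Position of each naming scheme inside a row tuple (hypothetical labels).
SCHEMES = {"MacaM": 0, "Wienberg": 1, "Rogers": 2}


def convert(name, source, target):
    """Translate a chromosome name from one scheme to another."""
    src, dst = SCHEMES[source], SCHEMES[target]
    for row in ROWS:
        if row[src] == name:
            return row[dst]
    raise KeyError(f"{name!r} is not a chromosome in the {source} scheme")


if __name__ == "__main__":
    # Chromosome 15 in the Wienberg scheme, expressed in MacaM naming.
    print(convert("15", "Wienberg", "MacaM"))
```

Each scheme uses every name exactly once, so the lookup is unambiguous in both directions.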
Assembly statistics for de novo rhesus transcripts
| SRX099247 | Cerebral cortex | 76 | Single | 1578 | 547 | 643 |
| SRX101205 | Cerebral cortex | 76 | Single | 1552 | 530 | 624 |
| SRX101272 | Cerebral cortex | 76 | Single | 1659 | 496 | 626 |
| SRX101273 | Cerebral cortex | 76 | Single | 1656 | 484 | 596 |
| SRX101274 | Cerebral cortex | 76 | Single | 2054 | 590 | 722 |
| SRX101275 | Cerebral cortex | 76 | Single | 1646 | 466 | 585 |
| SRX103458 | Caudate nucleus | 76 | Paired | 2651 | 824 | 952 |
| SRX101672 | Caudate nucleus | 76 | Paired | 2831 | 1079 | 1128 |
| SRX092157 | Cerebral cortex | 100 | Paired | 2322 | 829 | 924 |
| SRX092159 | Thymus | 100 | Paired | 2731 | 1060 | 1121 |
| SRX092158 | Testis | 100 | Paired | 1970 | 641 | 753 |
Mutations in the reference rhesus macaque which interfere with annotation
| Chromosome | Position | Gene | Mutation |
| --- | --- | --- | --- |
| 1 | 135383631 | KLHDC9 | stop-gain |
| 1 | 169148554 | ZBTB41 | start-loss |
| 3 | 50555959 | ZXDC | stop-gain |
| 9 | 18304595 | PRUNE2 | deletion |
| 10 | 23470221 | C10orf67 | start-loss |
| 15 | 274407 | PRPF6 | stop-gain |
All mutations were in the heterozygous state.
Chromosome assembly statistics
| rheMac2 | 172351 | 2646263223 | 219335 | 15354 | 28 | 6760 |
| CR_1.0 | 399581 | 2562947788 | 205919 | 6414 | 13 | 1707 |
| MacaM | 93402 | 2721371100 | 560771 | 29136 | 64 | 3534 |
Contig statistics based on contigs placed on chromosomes.
Comparison of rhesus proteins extracted from rheMac2 and MacaM annotation files with human orthologs
| Annotation | Identities (%) | Similarities (%) | Gaps (%) |
| --- | --- | --- | --- |
| rheMac2_N | 90.93 | 92.28 | 41.86 |
| MacaM | 97.02 | 98.16 | 1.12 |
rheMac2_N: NCBI annotation of rheMac2.
MacaM: Our annotation of the new MacaM assembly.
Values are the mean percent identities, similarities and gaps obtained when rheMac2_N and MacaM proteins are compared with their human orthologs.
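To make the identity and gap columns concrete, here is a minimal sketch of how such percentages can be derived from pairwise alignments. It assumes each rhesus protein has already been globally aligned to its human ortholog, that `-` marks a gap, and that percentages are taken over the full alignment length; the record does not state which aligner or convention was used, so these are illustrative assumptions. Percent similarity is omitted because it additionally requires a substitution matrix. The function names are hypothetical.

```python
def alignment_stats(aligned_a, aligned_b):
    """Return (percent identity, percent gaps) for two aligned sequences.

    Both strings must have the same length, with '-' marking gap columns.
    Percentages are computed over the alignment length (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    length = len(aligned_a)
    if length == 0:
        raise ValueError("alignment is empty")
    columns = list(zip(aligned_a, aligned_b))
    identical = sum(1 for x, y in columns if x == y and x != "-")
    gaps = sum(1 for x, y in columns if x == "-" or y == "-")
    return 100.0 * identical / length, 100.0 * gaps / length


def mean_stats(alignments):
    """Average identity and gap percentages over many ortholog pairs."""
    stats = [alignment_stats(a, b) for a, b in alignments]
    n = len(stats)
    return (sum(s[0] for s in stats) / n, sum(s[1] for s in stats) / n)


if __name__ == "__main__":
    # Toy alignments, not real SHE or ortholog sequences.
    pairs = [("MKT-LV", "MKTALV"), ("ACDE", "ACDE")]
    print(mean_stats(pairs))
```

Under this convention, a rhesus protein that is missing an exon shows up as a long run of gap columns, which is how a misassembled gene can inflate the mean gap percentage for an annotation.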
Figure 2. Correction of rheMac2 SHE gene misassembly in MacaM. A. rheMac2 genome. Exons 1, 2, 4, 5 and 6 of the Src homology 2 domain containing E (SHE) gene are contained within scaffold NW_001108937.1. Exon 3 of this gene was assigned to scaffold NW_001218118.1. Scaffold NW_001108937.1 was correctly assigned to chromosome 1. However, scaffold NW_001218118.1 was mistakenly assigned to chromosome X. This resulted in an annotation of the rhesus SHE gene with missing sequence (corresponding to exon 3). Additional details on the misassembly of this gene in rheMac2 can be found in [3]. B. MacaM genome. All 6 exons of the SHE gene were found on scaffold 2317188291 of the MacaM assembly.
Figure 3. Alignment of rhesus macaque SHE proteins from different annotations with human protein. Human SHE protein accession: NP_001010846.1. MacaM: Protein derived from the MacaM rhesus macaque genome. rheMac2_N: Protein obtained from the NCBI annotation of rheMac2, accession. rheMac2_E: Protein obtained from the Ensembl annotation of rheMac2, accession ENSMMUT00000032345. CR_1.0: Protein obtained from the Chinese rhesus macaque genome produced by BGI [8]. Yellow highlighting indicates sequence that is identical in human and the alternative rhesus macaque annotations; sequences shared only by rheMac2_E and CR_1.0 are highlighted in green. Exon boundaries are indicated by lines separating amino acids.
mRNA-seq expression comparison between rheMac2 and MacaM in four tissues
| Testis | 5176 | 7587 | 45 | 47 |
| Thymus | 5177 | 7116 | 56 | 55 |
| Caudate nucleus | 4831 | 6726 | 51 | 52 |
| Cerebral cortex | 4963 | 6574 | 44 | 44 |
rheMac2_N: NCBI annotation of rheMac2.
MacaM: Our annotation of the new MacaM assembly.
All statistics were based on genes with an FPKM ≥ 10.
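The expression threshold above can be illustrated with the basic definition of FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below uses that textbook formula and a simple cutoff filter; tools such as Cufflinks estimate FPKM with a statistical model rather than this direct calculation, so this shows the unit, not the authors' pipeline. The function names and gene labels are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Basic FPKM: fragments per kilobase per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return 10**9 * fragments / (transcript_length_bp * total_mapped_fragments)


def expressed_genes(fpkm_by_gene, threshold=10.0):
    """Return the sorted names of genes whose FPKM meets the threshold."""
    return sorted(g for g, value in fpkm_by_gene.items() if value >= threshold)


if __name__ == "__main__":
    library_size = 25_000_000  # total mapped fragments (toy value)
    # Toy counts and lengths for hypothetical genes, not data from the study.
    values = {
        "geneA": fpkm(500, 2000, library_size),
        "geneB": fpkm(100, 4000, library_size),
        "geneC": fpkm(3000, 1500, library_size),
    }
    print(expressed_genes(values))
```

With a fixed cutoff, the number of genes counted as expressed depends on how many reads map to each annotated gene, which is why the same RNA-seq data can yield different gene counts against different assemblies and annotations.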
Figure 4. mRNA expression validation. We sequenced RNA from 60 rhesus macaque PBMC samples from animals of differing ranks using Illumina paired-end sequencing. After filtering, we mapped reads to either the MacaM (green symbols) or rheMac2 (blue symbols) assemblies using the STAR algorithm; we used CUFFLINKS to assign transcripts and determine differentially expressed genes (DEGs). (A) Number of uniquely mapping reads in individual RNA samples mapped using the MacaM and rheMac2 assemblies. Points for the same sample mapped to the two assemblies are joined by lines. (B) Percentage of total filtered reads that uniquely mapped to each assembly. (C) Number of DEGs that were identified using CUFFDIFF2.1 for dominant animals at two time points using the MacaM and rheMac2 genomes.
Figure 5. Number of DEGs identified in an experiment analyzing social anxiety in rhesus macaques. CUFFDIFF2.1 was used to identify DEGs across two ranks (R1 = dominant; R2 = subordinate) and three time points (T1 = baseline; T2 = T1 + 20 minutes; T3 = T1 + 260 minutes). The human intruder intervention occurred after T1, immediately before T2.