Literature DB >> 26125617

Complete Taiwanese Macaque (Macaca cyclopis) Mitochondrial Genome: Reference-Assisted de novo Assembly with Multiple k-mer Strategy.

Yu-Feng Huang1, Mohit Midha2, Tzu-Han Chen1, Yu-Tai Wang3, David Glenn Smith4, Kurtis Jai-Chyi Pei5, Kuo Ping Chiu6.   

Abstract

The Taiwanese (Formosan) macaque (Macaca cyclopis) is the only nonhuman primate endemic to Taiwan. This primate species is valuable for evolutionary studies and as subjects in medical research. However, only partial fragments of the mitochondrial genome (mitogenome) of this primate species have been sequenced, not mentioning its nuclear genome. We employed next-generation sequencing to generate 2 x 90 bp paired-end reads, followed by reference-assisted de novo assembly with multiple k-mer strategy to characterize the M. cyclopis mitogenome. We compared the assembled mitogenome with that of other macaque species for phylogenetic analysis. Our results show that, the M. cyclopis mitogenome consists of 16,563 nucleotides encoding for 13 protein-coding genes, 2 ribosomal RNAs and 22 transfer RNAs. Phylogenetic analysis indicates that M. cyclopis is most closely related to M. mulatta lasiota (Chinese rhesus macaque), supporting the notion of Asia-continental origin of M. cyclopis proposed in previous studies based on partial mitochondrial sequences. Our work presents a novel approach for assembling a mitogenome that utilizes the capabilities of de novo genome assembly with assistance of a reference genome. The availability of the complete Taiwanese macaque mitogenome will facilitate the study of primate evolution and the characterization of genetic variations for the potential usage of this species as a non-human primate model for medical research.

Entities:  

Mesh:

Year:  2015        PMID: 26125617      PMCID: PMC4488429          DOI: 10.1371/journal.pone.0130673

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Genome assembly has been an area of interest for research groups worldwide since the initiation of the Human Genome Project in 1990. Genome assembly is critically important in the sense that it provides genomic maps to show the locations of protein-coding genes, noncoding genes, regulatory elements, as well as sequence sometimes regarded as “junk DNA” for the future study of gene expression (e.g., transcriptome analysis) and regulation (e.g., miRNA expression, and mapping of transcription factor binding sites and epigenetic modifications). The complete mitochondrial genomes (mitogenomes) of about one-third Asian macaques [1-3] have been sequenced and assembled up-to-date. These include the mitogenomes of the Indian rhesus macaque (Macaca mulatta) [4-6], Chinese rhesus macaque (M. mulatta lasiota) [7], longtail/crab-eating macaque (M. fascicularis) from Mauritius, Malaysia and Indochina [6, 8], Tibetan macaque (M. thibetana) [6, 9], Assam macaque (M. assamensis) [10], lion-tailed macaque (M. silenus) [6], stump-tailed macaque (M. arctoides) [6, 11], Tonkean macaque (M. tonkeana) [6] and Japanese macaque (M. fuscata) [12], leaving only the mitogenome of Taiwanese (Formosan) macaque (M. cyclopis) remain unsequenced. The mitogenome of Taiwanese macaque could be an important piece in solving the puzzle of evolution of the fascicularis group. The Taiwanese macaque is the only non-human primate species endemic to Taiwan and they are commonly found in forest habitat in lowland to mid-elevation (up to approximately 2,000 meter above sea level). Together with the Indian rhesus macaque, Chinese rhesus macaque, crab-eating macaque, and Japanese macaque, it belongs to the fascicularis group of macaque species [2, 13–16]. However, this classification is based on phenotypic characteristics (e.g., the morphology of male sex organ) that may be discordant with phylogeny based on nuclear loci, such as short tandem repeats (STRs) and partial mitochondrial DNA (mtDNA) sequences. Mitogenome phylogenetics has emerged as a superior approach for the study of primate relationships and evolution [17], and the level of genetic differences among macaques justifies their usage as animal models for the study of different human diseases [18]. Therefore, completion of a whole mitogenome sequence for M. cyclopis is of immense interest and will facilitate the phylogenetic study of primate evolution and medical research. Previous mitochondrial genome sequencing employed three types of approaches: capture-enrichment, next-generation sequencing (NGS) or a mixture of the two [19]. Initially, the most common approach used multiplex PCR to capture overlapping fragments that could be assembled into longer sequence [4, 7, 9, 10]. Later, amplification of longer fragments by long-range PCR or multiplex PCR [8] and NGS [20-22] was employed for assembly. We constructed a M. cyclopis whole genome shotgun library by NGS. Taking advantage of the availability of M. cyclopis mitochondrial sequence reads in the library and using the existing M. mulatta mitogenome as a reference, we adopted a novel procedure of reference-assisted de novo assembly with multiple k-mer strategy to complete the M. cyclopis mitogenome. As expected, our results show that its organization and gene order are very similar to those of other macaques and phylogeny based on mitogenomes of several macaque species supports a close relationship of M. cyclopis to M. mulatta mulatta and M. mulatta lasiota in the fascicularis group.

Materials and Methods

Ethics statement

The macaque used in this study was rescued and adopted in Rescue Center for Endangered Wild Animals, National Pingtung University of Science and Technology (NPUST). This study was approved by Institutional Animal Care and Use Committee of National Pingtung University of Science and Technology Laboratory Animal Center. (Approval Number: NPUST-IACUC-101-082). The blood samples were collected from 25 Formosan macaques individually which have been kept in the same troop in Pingtung Rescue Center of Wild Endangered Animals for more than 5 years. Only one female macaque was selected for this study after comprehensive examination. This troop that contained all the 25 Formosan macaques were kept in the 30 x 30 m outdoor enclosure which filled with branches, ropes and platforms and were provided with 30kg fresh fruit and 40kg fresh vegetables twice daily. Samples were obtained as part of a comprehensive health screening effort conducted biannually by Pingtung Rescue Center of Wild Endangered Animals and was handled by veterinarians with professional personnel of Pingtung Rescue Center of Wild Endangered Animals. During blood sample collection, macaques were trapped in a portable cage measuring 80 × 80 × 120 cm separately and sedated with 5 mg/kg ketamine combined with 0.25 mg/kg xylazine intramuscularly. After induction of anesthesia, macaques were incubated and the anesthesia was then maintained by isoflurane inhalation and monitored by SpO2, heart rate and respiration rate continuously. After sample collection, the anesthesia was reversed by 0.2 mg/kg atipamezole and animals were placed in the portable cage separately for fully recovery from anesthesia before being released back into the outdoor enclosure. All anesthetized macaques were given a complete physical examination, and using universal precautions and sterile technique by veterinarians. The veterinarians and personnel of Pingtung Rescue Center of Wild Endangered Animals monitored and handled all anesthetized macaques during procedure. None of the macaques was sacrificed at the end of the study.

Sample collection, DNA extraction and sequencing

Out of 25 macaques, one female M. cyclopis (NPUST ID: 12060805) from Institute of Wildlife Conservation, National Pingtung University of Science and Technology was used for this study. A total of 10 mL blood was collected by venipuncture of the femoral vein out of which 7 mL blood was aliquotted into Vacutainer vials containing EDTA whereas the remaining blood was centrifuged to extract serum. Serum and whole blood were stored at −20°C and 4°C and later used in further analysis. Genomic DNA was isolated from the peripheral blood cells, sonicated, electrophoresed, and the ~500 bp fragments were isolated with gel excision and used for whole genome sequencing library construction. The library was subjected to 2 x 90 bp paired-end (PE) sequencing by Illumina HiSeq2000 instrument. Followed by reference-assisted de novo mitochondrial genome assembly pipeline is shown in Fig 1.
Fig 1

The simple de novo mitochondrial genome assembly pipeline for the Taiwanese macaque mitochondrial genome.

Pre-assembly processing of reads

To separate the mitochondrial reads of M. cyclopis we used mitogenome of Indian rhesus macaque (M. mulatta) [GenBank: NC_005943] as reference and mapped raw reads to reference with Bowtie2 [23] using default parameters. Our method includes reads selection using reference mitogenome, hence we named it as reference-assisted. Mapped read IDs were retrieved from BAM file even if only one mate of paired-end read was mapped. Paired-end (PE) reads were extracted by seqtk (https://github.com/lh3/seqtk) software according to the list extracted from the BAM file. NGS QC Toolkit [24] was used to filter PE reads with QV ≥ 20 for all bases to construct a read pool. This read pool might contain SE reads as the pair might have broken because of lower QV than threshold. Similarly two more read pools were generated with QV ≥ 25 and QV ≥ 30 for all bases. These three read pools were each separated into two pools having 1) only paired-end reads (PE), and 2) both paired-end reads and single-end reads (PE+SE) thereby making six pools in total. This process of constructing six read pools is designed to increase variability. All six read pools were used for de novo assembly of the mitogenome. Coverage was estimated using formula: C = (N * L * 2) / G for PE reads, and C = (N * L) / G for SE reads, where C stands for coverage, N is the number of reads in the sequencing library, L is the read length, and G is the size of the target genome.

De novo mitochondrial genome assembly with multiple k-mer strategy

The idea to assemble the mitochondrial genome of Taiwanese macaque was to utilize all possible scaffolds generated by assembler with 15 different k-mer combinations (starting from minimum of 61 to maximum of 89 by increment of 2). In addition, we generated different read pools with corresponding QV threshold for NGS QC Toolkit described in the previous section. The multiple k-mer strategy combines the advantages of the capability of large k-mers to resolve repetitive sequences and the capability of small k-mers to assemble low coverage bases and overcome sequencing errors [25, 26]. This pipeline was used to produce scaffolds from 90 different combinations (15 k-mers on 6 read pools). We used SOAPdenovo v1.05 [27] to obtain scaffolds. Further to eliminate gaps on scaffolds we used GapFiller [28] with three options: bowtie, bwasw and bwa, with a maximum of 50 iterations for sequence convergence.

Finishing the draft

Circular mitochondrial DNA was sequenced in the shotgun library and assembled into a linear sequence, therefore, a repeat (overlapping) region at both the 5’- and 3’-ends should be present [29]. We used BLASTN from NCBI BLAST+ [30] to determine the overlapped region for each scaffold and circularize the scaffold by merging the overlapped regions (Fig 2). Based on the size of existing mitogenomes of macaque species, scaffolds with sizes ranging between 16,500 bp and 16,700 bp were retained as mitogenome candidates.
Fig 2

Illustration of checking for boundary overlap to finish the mitochondrial genome.

Circular mitochondrial DNA was assembled to scaffolds in linear form each of which exhibited overlapped regions at the ends.

Illustration of checking for boundary overlap to finish the mitochondrial genome.

Circular mitochondrial DNA was assembled to scaffolds in linear form each of which exhibited overlapped regions at the ends. All mitogenome candidates assembled by de novo mitogenome assembly pipeline, were collected. The orientation of candidate mitogenomes was unified by CSA (cyclic DNA sequence aligner) [31] and then compared with each other using CD-HIT-EST in CD-HIT package [32, 33] with the consideration of forward and reversed strands. Sequence represented by the cluster with the largest number of identical sequences is concluded as mitochondrial genome of M. cyclopis.

Mitochondrial genome annotation and sequence comparison

To identify the boundaries of protein-coding genes, tRNA genes, and rRNA genes for annotation of the M. cyclopis mitogenome, MITOS [34] and DOGMA [35] were used. Mitogenome and gene boundaries, both were validated using BLASTN by comparing gene annotations of Indian rhesus macaque and with related mtDNA fragments available in GenBank. For protein-coding genes, sequences were translated by Sequin software and verified by aligning the translated sequence against the UniProt/TrEMBL database. The circular map of the mitochondrial genome was generated by CGView [36]. Repeat regions were detected by RepeatMasker (http://www.repeatmasker.org/) with rmblast. Clustal Omega [37] was employed for multiple sequence alignment on the mitogenome and DNA fragments respectively.

Comparative phylogenetic analysis

To study comparative phylogenetics of macaques, each mitochondrial genome was aligned by unifying starting position with respect to mitogenome of Indian rhesus macaque [GenBank: NC_005943]. Phylogenetic relationships among the macaques based on the complete mitogenome were estimated with Clustal Omega. Ambiguously aligned positions were removed by using Gblocks v0.91b [38] under default settings. We used MrBayes v3.2.3 [39], one of hierarchical Bayesian inference (BI) tools, to estimate phylogeny. Each search was run simultaneous Markov chains for 40000 generations, sampling every 1000 generations. Generations sampled before the chain reached stationarity (“burn-in”) were discarded. Phylogenetic analysis has been demonstrated on the complete mitochondrial genome using the Homo sapiens mitogenome [GenBank: NC_012920] as outgroup. Tree was visualized with FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/). In addition, pairwise sequence identity was determined by LAGAN [40] to analyse sequence-level variation.

Results and Discussion

The completed M. cyclopis mitochondrial genome

A total of 574,495,990 2 x 90 bp paired-end (PE) reads were generated by an Illumina HiSeq2000 sequencer from the whole genome shotgun sequencing library of M. cyclopis. Among those, 222,251 raw reads were mapped to the Indian rhesus reference mitogenome (S1 Dataset). The overall alignment rate is 0.04% and the coverage is 2,265X±578X with median of 2,304X. Coverage was estimated assuming mitogenome to be around 16,500bp. Read pools generated after QV based selection are summarised in Table 1. Six read pools were then processed to retrieve draft mitogenome candidates with our assembly pipeline. No ambiguous bases were found in these draft mitogenomes.
Table 1

Library statistics of quality reads and coverage information.

Quality score cut-offPaired-End (PE) readsSingle-End (SE) readsTotal basesCoverage(average)
QV ≥ 2064,99876,64418,597,6001,127
QV ≥ 2539,25478,26114,109,210855
QV ≥ 3011,24157,0627,158,960434
By comparing these candidates it was revealed that largest cluster contained 7 identical sequences (4 forward strand forms and 3 reversed strand forms), while other two clusters contained only 1 sequence each. Thus, the sequence of the largest cluster was recognized as the complete sequence of M. cyclopis mitogenome. The M. cyclopis mitogenome (GenBank accession number KM023192) is a double-stranded nucleotide sequence of 16,563 bp, falling within the size range of other published macaque mitogenome sequences (Table 2) including Asian and Northern African macaques. The sizes of macaque mitogenomes sequenced to date range from 16,540 bp in M. thibetana to 16,586 bp in M. sylvanus; the size of the Taiwanese macaque mitogenome (16,563) is closest to that of M. mulatta (16,564), M. fuscata (16,565) and M. mulatta lasiota (16,561).
Table 2

Comparison of mitochondrial size between macaque species.

SpeciesSize (bp)GenBank accession numberReference
M. mulatta 16,555KJ567051[6]
M. mulattalasiota 16,561KF830702[7]
M. fuscata 16,565NC_025513 / KM401548[12]
M. mulatta 16,564JQ821843[5]
M. mulatta 16,564NC_005943[4]
M. mulatta 16,564KJ567053[6]
M. arctoides 16,559KM360179[11]
M. arctoides 16,558NC_025201 / KJ567055[6]
M. fascicularis 16,569KJ567052[6]
M. fascicularis 16,571KF305937[8]
M. fascicularis 16,575NC_012670Yi, et al. 2009, unpublished
M. thibetana 16,540NC_011519[9]
M. thibetana 16,539KJ567056[6]
M. assamensis 16,542NC_023795 / KF990122[10]
M. tonkeana 16,560NC_025222 / KJ567058[6]
M. silenus 16,541NC_025221 / KJ567057[6]
M. sylvanus 16,586NC_002764[49]
M. sylvanus 16,585KJ567054[6]
M. cyclopis 16,563KM023192This report

Genome organization and gene arrangement

The complete organization of the mitochondrial genome of M. cyclopis is shown in Fig 3. It contains 13 protein-coding genes (ND1-6, ND4L, COX 1–3, ATP6, ATP8, CYTB), 2 ribosomal RNAs (12S and 16S), 22 transfer RNAs and a control region (Table 3). The heavy strand (H-strand) contains 2 ribosomal RNAs, 14 transfer RNAs and 12 protein-coding genes while the light strand (L-strand) contains 8 transfer RNAs and one protein-coding gene. The origin of light strand replication, OL, located between tRNA-Asn and tRNA-Cys, is 32 bp in length. In addition, there are five overlaps between adjacent genes out of which three are same strand overlap. Among same strand overlaps there is a 46 bp overlap between ATP8 and ATP6, a 1 bp overlap between ATP6 and COX3, and a 7 bp overlap between ND4 and ND4L. We identified a 3 bp overlap between tRNA-Ile and tRNA-Gln and a 28 bp overlap between COX1 and tRNA-Ser, both of gene pairs lie on opposite strands.
Fig 3

The organization of the mitogenome of M. cyclopis.

The outer and the inner circles represent the H- and L-strands, respectively. The tRNA, rRNA, protein-coding genes and control region are represented in green, red, blue and gray, respectively.

Table 3

Summary of the mitochondrial genome organization of M. cyclopis.

NamePosition (Start-End)StrandLength# of AAAnti-codonInferred initiation codonInferred termination codonIntergenic spacer 1
tRNA-PheH72GAA0
tRNA-Phe538–609H72GAA0
rrnS (12S)610–1556H9470
tRNA-Val1557–1625H69TAC0
rrnL (16S)1626–3184H15590
tRNA-Leu (L2)3185–3259H75TAA2
ND13262–4216H955318ATGT—0
tRNA-Ile4217–4285H69GAT-3
tRNA-Gln4283–4354L72TTG1
tRNA-Met4356–4423H68CAT0
ND24424–5465H1042347ATTT—0
tRNA-Trp5466–5532H67TCA7
tRNA-Ala5540–5608L69TGC1
tRNA-Asn5610–5682L73GTT32
tRNA-Cys5715–5783L69GCA0
tRNA-Tyr5784–5847L64GTA4
COX15852–7420H1569522ATGTAA-28
tRNA-Ser (S2)7393–7461L69TGA3
tRNA-Asp7465–7532H68GTC1
COX27534–8217H684227ATGTAA73
tRNA-Lys8291–8353H63TTT1
ATP88355–8561H20768ATGTAA-46
ATP68516–9196H681226ATGTAA-1
COX39196–9979H784261ATGT—0
tRNA-Gly9980–10047H68TCC0
ND310048–10393H346115ATTT—0
tRNA-Arg10394–10458H65TCG0
ND4L10459–10755H29798ATGTAA-7
ND410749–12126H1378459ATGT—0
tRNA-His12127–12195H69GTG0
tRNA-Ser (S1)12196–12254H59GCT0
tRNA-Leu (L1)12255–12325H71TAG0
ND512326–14137H1812603ATATAA0
ND614138–14665L528ATGAGG0
tRNA-Glu14666–14734L69TTC4
CYTB14739–15879H1141380ATGT—0
tRNA-Thr15880–15943H64TGT1
tRNA-Pro15945–16012L68TGG0

1Intergenic spacer: numbers indicate nucleotides separating two adjacent genes. Negative numbers indicate overlapping nucleotides

The organization of the mitogenome of M. cyclopis.

The outer and the inner circles represent the H- and L-strands, respectively. The tRNA, rRNA, protein-coding genes and control region are represented in green, red, blue and gray, respectively. 1Intergenic spacer: numbers indicate nucleotides separating two adjacent genes. Negative numbers indicate overlapping nucleotides

Protein coding genes (PCGs)

The 13 protein-coding genes of the M. cyclopis mitochondrial genome include 7 NADH dehydrogenase subunits (ND1-6, ND4L), 3 cytochrome c oxidase subunits (COX1-3), 2 ATP synthase subunits (ATP6, ATP8), and cytochrome b (CYTB). Ten of 13 protein-coding genes were initiated with ATG, while ND2 and ND3 shared ATT as their start codon, and ND5 utilized ATA. Six protein-coding genes (COX1, COX2, ATP8, ATP6, ND4L, ND5) ended with TAA, whereas ND6 terminated with AGG. Remaining six genes that comprise ND1, ND2, COX3, ND3, ND4 and CYTB were found to have incomplete stop codons (T—).

Ribosomal RNA and transfer RNA

Taiwanese macaque mtDNA contains the small unit rRNA (12S rRNA) and large unit rRNA (16S rRNA) genes. They are located between tRNA-Phe and tRNA-Leu (L2) genes, and separated from each other by the tRNA-Val gene. The sizes of the two ribosomal RNAs are 947 bp and 1,209 bp, respectively. There are 22 tRNA genes, with sizes ranging from 59 bp of tRNA-Ser to 75 bp of tRNA-Leu (L2), that include two more tRNAs that code for leucine and serine. The mitochondrial WANCY tRNA-gene cluster, located between ND2 and COX1, is 383 bp in length.

Control region (hypervariable region, non-coding regions)

The control region (D-loop) of the mitogenomes of macaques, size of which varies from 1,083 bp in M. mulatta lasiota to 1,100 bp in M. fascicularis [GenBank: KF305937], is located between tRNA-Pro and tRNA-Phe. This region is 1,088 bp long in mitogenome of M. cyclopis. RepeatMasker identified a (TCGTACA)n repeat within 43 bp located in nps 282–322 of the D-loop. The control region of the Taiwanese macaque exhibits a higher GC ratio than other macaque species of the fascicularis group (Table 4). Within this sequence, we identified the central domain, extended termination associated sequences (ETAS) domain and conserved sequence blocks (CSB) domain based on previous studies [9, 41] with multiple sequence alignment of the D-Loop of 19 macaques. The central domain is located in the position 364–686. CSB1, CSB2 and CSB3 are located in nps 763–787, 849–864 and 892–910, respectively. The ETAS1 and ETAS2 are near the 5’ end of the control region, at nps 57–113 and 275–334, respectively.
Table 4

Characteristics of the D-loop/control region of macaque species.

SpeciesGenBank Acc.Size (bp)ACGTNG+C (%)CpG
bp%bp%bp%bp%
M. mulatta JQ821843108532129.59%34131.43%13512.44%28826.54%043.87%58
M. fascicularis KF305937110032629.64%34431.27%14112.82%28926.27%044.09%66
M. mulatta lasiotus KF830702108332029.55%33731.12%13812.74%28826.59%043.86%58
M. mulatta KJ567051108332229.73%33230.66%13612.56%29327.05%043.21%54
M. fascicularis KJ567052109633030.11%33630.66%13312.14%29727.10%042.79%56
M. mulatta KJ567053108531929.40%33931.24%13612.53%29026.73%143.78%58
M. sylvanus KJ567054109432229.43%35732.63%14012.80%27525.14%045.43%58
M. thibetana KJ567056107032330.19%34131.87%13512.62%27125.33%044.49%56
M. arctoides KM360179108331729.27%35332.59%14413.30%26924.84%045.89%74
M. sylvanus NC_002764109532229.41%35432.33%14112.88%27825.39%045.21%60
M. mulatta NC_005943108532129.59%34131.43%13512.44%28826.54%043.87%58
M. thibetana NC_011519109132930.16%34331.44%13812.65%28125.76%044.09%56
M. fascicularis NC_012670109532930.05%33930.96%13812.60%28926.39%043.56%58
M. assamensis NC_023795109132629.88%34031.16%14012.83%28526.12%044.00%58
M. arctoides NC_025201108131829.42%34231.64%14313.23%27825.72%044.87%72
M. silenus NC_025221108831629.04%33430.70%14313.14%29527.11%043.84%54
M. tonkeana NC_025222108832029.41%34231.43%13812.68%28826.47%044.12%62
M. fuscata NC_025513108832029.41%34531.71%14112.96%28225.92%044.67%56
M. cyclopis KM023192108831629.04%34031.25%14213.05%29026.65%044.30%62
Three datasets of mtDNA fragments associated with the Taiwanese macaque have previously been deposited in GenBank. These dataset were used for validating the complete mitochondrial genome of M. cyclopis and the alignment results in terms of identity, mismatch and gap are shown in Table 5. In a study by Smith and colleagues, out of 1,053 samples, 53 samples (DQ373370~DQ373422) [18] were from Taiwanese macaques. They sequenced 835 bp long mtDNA fragment containing one seventh of CYTB, tRNA-Pro, tRNA-Thr and HVS-1(first hypervariable segment of control region) of the fascicularis group of macaque species for the analysis of mitochondrial DNA variation. Sequence identity ranges from 98.66% to 100% without gap was validated in the coding region of CYTB, tRNA-Pro, and tRNA-Thr. We compared previously sequenced 8 fragments (called 8 samples) from different studies (Table 6) with similar region in mitogenome of M. cyclopis. These include 12S rRNA, 16S rRNA, tRNA-Val, tRNA-Glu, COX1-3 and CYTB (AF424944.1, AF424945.1 [42]; AY685836.1, AY685877.1, AY685795.1, AY685786.1, AY685713.1 [16] and AJ304499.1 (unpublished)). The sequence identity ranges from 98.66% to 100% with full-length alignment of query sequences without gap. 73 control region-containing fragments (called 73 samples) that include AB261600.1 [43], AY014865.1 ~ AY014877.1, AY016337.1 [44], AY682594.1 [45], AY878873.1 ~ AY878925.1, DQ143984.1 ~ DQ143987.1 [13] were also compared with the control region of M. cyclopis and 91.62% to 99.65% sequence identity was found. In summary, variation in the control region among individuals is obvious in the alignment results with far more mismatches and gaps than occur in the coding region.
Table 5

Results of sequence alignment of the complete M. cyclopis mitochondrial genome against the three existing datasets of M. cyclopis partial mitochondrial fragments.

DatasetType 1 IdentityMismatchGap
OverallMinMaxMinMaxMinmax
53 samplesOverall96.2295.4597.49203701
Coding region99.2098.66100.000400
Control region94.5693.6896.10203301
8 samplesCoding region99.5798.66100.000700
73 samplesControl region94.8791.6299.6524504

1Overall: whole mitogenome as a subject sequence in BLASTN. Coding region: the coding region of the mitogenome, starting with tRNA-Phe and ending with tRNA-Pro, as a subject sequence in BLASTN. Control region: the control region of the mitogenome as a subject sequence in BLASTN.

Table 6

List of sequenced fragments of M. cyclopis mitochondrial genome.

GenBank accession numberDescriptionReference
AY685877.1 M. cyclopis isolate Taiwan 12S ribosomal RNA gene, partial sequence; mitochondrial[16]
AF424944.1 M. cyclopis isolate 1 12S ribosomal RNA gene, partial sequence; tRNA-Val gene, complete sequence; and 16S ribosomal RNA gene, partial sequence; mitochondrial genes for mitochondrial products[42]
AF424945.1 M. cyclopis isolate 2682 12S ribosomal RNA gene, partial sequence; tRNA-Val gene, complete sequence; and 16S ribosomal RNA gene, partial sequence; mitochondrial genes for mitochondrial products[42]
AY685713.1 M. cyclopis isolate Taiwan cytochrome oxidase subunit I gene, partial cds; mitochondrial [16]
AY685786.1 M. cyclopis isolate Taiwan cytochrome oxidase subunit II gene, partial cds; mitochondrial [16]
AY685795.1 M. cyclopis isolate Taiwan cytochrome oxidase subunit III gene, partial cds; mitochondrial [16]
AY685836.1 M. cyclopis isolate Taiwan tRNA-Glu gene, partial sequence; and cytochrome b gene, partial cds; mitochondrial[16]
AJ304499.1 M. cyclopis mitochondrial partial cytb gene for cytochrome bLee, unpublished (2000)
1Overall: whole mitogenome as a subject sequence in BLASTN. Coding region: the coding region of the mitogenome, starting with tRNA-Phe and ending with tRNA-Pro, as a subject sequence in BLASTN. Control region: the control region of the mitogenome as a subject sequence in BLASTN. We compare the mitochondrial genome of 19 different macaques including M. cyclopis and created phylogenetic trees by applying Bayesian method (Fig 4). Pair-wise sequence identity between macaque mitochondrial genomes indicate that M. cyclopis is phylogenetically closer to M. mulatta lasiota than to M. mulatta mulattaas well (Table 7). Our results resemble to those of previous reports by Balasubramaniam et al. [2], Chu et al. [13], Zhao et al. [46], and Smith et al. [47].
Fig 4

Phylogenetic analysis based on whole mitochondrial genome sequence.

The evolutionary history was inferred by MrBayes. The scale is 1.0 expected changes per site. The outgroup for evolutionary analysis is Homo sapiens mitochondrion with GenBank acc. NC_012920.

Table 7

Pair-wise sequence similarity between macaques by LAGAN.

KF830702NC_025513JQ821843NC_005943KJ567053KM360179NC_025201KJ567052KF305937NC_012670NC_011519KJ567056NC_023795NC_025222NC_025221KJ567054NC_002764KM023192
M. mulatta lasiotus M. fuscata M. mulatta M. mulatta M. mulatta M. arctoides M. arctoides M. fascicularis M. fascicularis M. fascicularis M. thibetana M. thibetana M. assamensis M. tonkeana M. silenus M. sylvanus M. sylvanus M. cyclopis
KJ567051 M. mulatta 99.26%96.41%95.72%95.72%95.71%93.03%92.95%92.90%92.88%92.58%91.55%91.57%91.41%90.39%90.21%89.66%89.60%96.71%
KF830702 M. mulatta lasiotus -96.35%95.71%95.71%95.71%92.91%92.84%92.99%92.93%92.62%91.59%91.60%91.37%90.36%90.25%89.69%89.62%96.64%
NC_025513 M. fuscata --95.57%95.57%95.59%92.95%92.90%92.75%92.65%92.40%91.28%91.29%91.18%90.20%90.16%89.57%89.47%96.46%
JQ821843 M. mulatta ---100.00%99.93%92.93%92.90%92.86%92.93%92.62%91.43%91.45%91.24%90.17%90.28%89.44%89.36%95.62%
NC_005943 M. mulatta ----99.93%92.93%92.90%92.86%92.93%92.62%91.43%91.45%91.24%90.17%90.28%89.44%89.36%95.62%
KJ567053 M. mulatta -----92.92%92.89%92.86%92.93%92.60%91.41%91.43%91.22%90.17%90.25%89.45%89.37%95.62%
KM360179 M. arctoides ------98.75%92.46%92.52%92.08%91.36%91.36%91.18%90.30%90.28%89.37%89.30%93.01%
NC_025201 M. arctoides -------92.49%92.53%92.02%91.25%91.28%91.08%90.14%90.27%89.50%89.42%93.01%
KJ567052 M. fascicularis --------97.98%97.68%91.60%91.63%91.37%90.20%90.07%89.68%89.59%92.88%
KF305937 M. fascicularis ---------96.41%91.62%91.64%91.43%90.19%90.27%89.74%89.67%92.88%
NC_012670 M. fascicularis ----------91.34%91.32%91.06%90.05%89.95%89.56%89.49%92.56%
NC_011519 M. thibetana -----------99.89%98.04%89.65%89.83%89.21%89.14%91.47%
KJ567056 M. thibetana ------------98.05%89.62%89.82%89.23%89.15%91.47%
NC_023795 M. assamensis -------------89.46%89.72%89.17%89.11%91.40%
NC_025222 M. tonkeana --------------91.88%89.36%89.29%90.33%
NC_025221 M. silenus ---------------89.51%89.44%90.23%
KJ567054 M. sylvanus ----------------99.88%89.76%
NC_002764 M. sylvanus -----------------89.67%

Phylogenetic analysis based on whole mitochondrial genome sequence.

The evolutionary history was inferred by MrBayes. The scale is 1.0 expected changes per site. The outgroup for evolutionary analysis is Homo sapiens mitochondrion with GenBank acc. NC_012920. Multiple sequence alignment for the region of tRNA-Phe and tRNA-Pro (excluding control region) of the macaques under study revealed variations in the following regions. 17 and 24 bp long different insertions after tRNA-Tyr in WANCY region in M. sylvanus and M. mulatta respectively (Fig 5). There is a 20 bp overlap region between tRNA-Tyr and COX1 identified in M. mulatta (JQ821843). The 3’-end boundary of tRNA-Tyr is the same among all macaques. In our analysis with tRNAscan-SE [48] and MITOS, we found these two insertions and concluded them as result of annotation conflict. In addition, pairwise comparison with the same species from different publications this observation was confirmed. For M. sylvanus, we compared barbary macaque of GenBank acc. NC_002764 with that of GenBank acc. KJ567054. For M. mulatta, we compared Indian rhesus macaque of GenBank JQ821843 with both of them of GenBank acc. KJ567053 and NC_005943 respectively.
Fig 5

Insertion in tRNA-Tyr of the WANCY tRNA-gene cluster.

Multiple sequence alignment of the WANCY tRNA-gene cluster identified a 17 bp insertion at the 3’-end of tRNA-Tyr specific to M. sylvanus and a 24 bp insertion at the 3’-end of tRNA-Tyr specific to M. mulatta. The red box showed the overlap region between tRNA-Tyr and COX1. The blue line showed the 3’-end boundary of tRNA-Tyr while checking with tRNAscan-SE and MITOS.

Insertion in tRNA-Tyr of the WANCY tRNA-gene cluster.

Multiple sequence alignment of the WANCY tRNA-gene cluster identified a 17 bp insertion at the 3’-end of tRNA-Tyr specific to M. sylvanus and a 24 bp insertion at the 3’-end of tRNA-Tyr specific to M. mulatta. The red box showed the overlap region between tRNA-Tyr and COX1. The blue line showed the 3’-end boundary of tRNA-Tyr while checking with tRNAscan-SE and MITOS. The intergenic region between COX2 and tRNA-Lys is the short non-coding region in the mitochondrial genome (Fig 6). This intergenic region is the second largest noncoding region in macaque mitogenomes. Comparison with the multiple sequence alignment of macaques, a short CT repeat region found in the fascicularis group but is absent in species of the sinica group.
Fig 6

COX2/tRNA-Lys intergenic region.

A short CT repeat region in red box is present in the fascicularis group but absent in species of the sinica group. The blue line represents the boundary of COX2 and tRNA-Lys respectively. In sequence alignment, * (asterisk) character indicates positions which are fully conserved.

COX2/tRNA-Lys intergenic region.

A short CT repeat region in red box is present in the fascicularis group but absent in species of the sinica group. The blue line represents the boundary of COX2 and tRNA-Lys respectively. In sequence alignment, * (asterisk) character indicates positions which are fully conserved. We have identified a less conserved region in 3’-end of tRNA-Lys by multiple sequence alignment (Fig 7). Among all tRNA genes, tRNA-Lys gene exhibits a length variation among different macaque species.
Fig 7

Length variation in tRNA-Lys.

The less conserved region (red box) in the 3’-end of tRNA-Lys was explored by multiple sequence alignment. The blue line represents the boundary of 5’-end and 3’-end of tRNA-Lys respectively. In sequence alignment, * (asterisk) character indicates positions which are fully conserved.

Length variation in tRNA-Lys.

The less conserved region (red box) in the 3’-end of tRNA-Lys was explored by multiple sequence alignment. The blue line represents the boundary of 5’-end and 3’-end of tRNA-Lys respectively. In sequence alignment, * (asterisk) character indicates positions which are fully conserved. Most macaques use ATT/ATG/ATA as the start codon for their mitogenome protein-coding genes (Table 8). Exceptions include ND1 and ATP8 of M. thibetana and ND1 of M. assamensis, which use GTG as the start codon. Comparison of the tRNAs of the Taiwanese macaque with those of its closest relatives, the Indian and Chinese rhesus macaques, revealed that only tRNA-Ala and tRNA-Arg are identical among all three macaques.
Table 8

Comparison of start and stop codons of protein-coding genes of seven species/subspecies of macaques.

Species M. mulatta M. mulatta lasiotus M. fuscata M. mulatta M. mulatta M. mulatta M. arctoides M. arctoides M. fascicularis M. fascicularis M. fascicularis M. thibetana M. thibetana M. assamensis M. tonkeana M. silenus M. sylvanus M. sylvanus M. cyclopis
GenBank acc.KJ567051KF830702NC_025513JQ821843NC_005943KJ567053KM360179NC_025201KJ567052KF305937NC_012670NC_011519KJ567056NC_023795NC_025222NC_025221KJ567054NC_002764KM023192
ND1ATG / T—ATG / T—ATG / TAAATG / T—ATG / TAGATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—GTG / T—GTG / T—GTG / T—ATG / T—ATG / T—ATG / T—ATG / TAGATG / T—
ND2ATT / T—ATT / T—ATT / TAAATT / T—ATT / TAGATT / T—ATT / T—ATT / T—ATT / T—ATT / T—ATT / T—ATT / T—ATT / T—ATT / T—ATT / T—ATT / T—ATT / T—ATT / TAGATT / T—
COX1ATG / TAAATG / TAAATG / TAAATG / TAAATG /ATG / TAAATG / TAAATG / TAAATG / TAGATG / TAGATG / T—ATG /ATG / TAGATG / TAGATG / TAAATG / TAAATG / TAGATG / G—ATG / TAA
COX2ATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAGATG / TAAATG / TAAATG / TAA
ATP8ATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / T—ATG / T—ATG / TAAATG / TAAGTG / TAAGTG / T—ATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAA
ATP6ATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAA
COX3ATG / T—ATG / T—ATG / TAAATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—
ND3ATT / T—ATT / T—ATT / TAAATT / T—ATT / T—ATT / T—ATC / T—ATC / T—ATT / T—ATT / T—ATT / T—ATC / T—ATC / T—ATC / T—ATT / T—ATT / T—ATT / T—ATT / T—ATT / T—
ND4LATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAAATG / TAA
ND4ATG / T—ATG / T—ATG / TAAATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—
ND5ATG / TAAATA / TAAATA / TAAATA / TAAATA / TAAATA / TAAATA / TAAATA / TAAATA / TAGATA / TAAATA / TAAATG / TAAATA / TAAATA / TAAATC / TAAATG / TAAATA / TAAATA / TAAATA / TAA
ND6ATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGGATG / AGAATG / AGAATG / AGGATG / AGGATG / AGG
CYTBATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—ATG / T—

Further discussion

The introduced reference-assisted de novo assembly with multiple k-mer strategy for mitogenome assembly consists of two steps: mitochondrial genome reads selection and de novo way to assemble the mitogenome. The proposed pipeline for mitogenome assembly can be used for de novo mitogenome assembly from mitochondrial genome sequencing library as well. Moreover, it is the critical issue to determine the closeness of reference genome with the sequencing target because selected mitochondrial genome reads will totally affect the outcome. But, it is still not clear how closely related reference genome will be affected because of the complexity of mitogenomes between species. The initial assumption of de novo assembly with multiple k-mer strategy on different read pools generated by using different QV threshold attempts to identify the final mitogenome with identical scaffold supports. In this study, while we used the mitochondrial genome of Indian rhesus macaque as reference genome, the largest cluster with 7 identical sequences was identified as the final complete mitogenome of Taiwanese macaque. There are 4 copies in forward strand forms and 3 copies in reverse strand forms in the CD-HIT-EST cluster. This is a good sign that the major mitochondrial genome copy could be detected by the proposed method and identical sequences support to the final complete mitogenome. We also found that the k-mer of the identical sequences ranges from 81 to 89. These results suggested that larger k-mer have capability to assemble the complete mitogenome. In summary, the proposed method provide an alternative solution for mitochondrial genome assembly with identical sequence supports.

Conclusions

In 1976, Fooden reported a provisional classification of 19 macaque species clustered into four groups based on the structure of male external genitalia: the silenus-sylvanus group, the sinica group, the fascicularis group, and the arctoides group [15]. M. cyclopis, together with M. fascicularis, M. mulatta, and M. fuscata, were assigned to the fascicularis group. This classification based on reproductive anatomy was later supported by molecular evidence. Recently, owing to advances in PCR and sequencing technologies, researchers were able to use mitochondrial DNA fragments (e.g., control region, rRNA or tRNA genes) for the phylogenetic study of M. cyclopis [13, 16, 42, 43]. However, due to the lack of a complete mitochondrial genome, only partial mitochondrial DNA sequences were analyzed for cross-species comparison. Here, for the first time, we assembled the mitochondrial genome of M. cyclopis, making it available for whole mitochondrial genome-based phylogenetic study. In accordance with most of the previous reports, our whole mitogenome-base phylogenetic study indicated a strong phylogenetic tie between M. cyclopis and other macaque species in the fascicularis group, especially M. mulatta lasiotus, a species living in Southern China. Similar results were also reported by Chu et al. [13] and Smith et al. [18], which showed the strongest phylogenetic connection of M. cyclopis with macaques living in Sichuan and Yunnan area, China. Studies of 835 bp mtDNA across species of the facicularis group by Smith et al. indicated a closer phylogenetic relationship of Chinese rhesus macaque to Taiwanese macaque, than to Indian rhesus macaque [18]. These data also suggested the migration of macaques from Asian continent to Taiwan during the ice age.

Raw paired-end reads for assembling Taiwanese macaque mitochondrial genome.

The compressed file contains paired-end reads in FASTQ format (MitoReads.PE.R1.fastq and MitoReads.PE.R2.fastq). (TGZ) Click here for additional data file.
  44 in total

1.  Molecular analyses of mtDNA deletion mutations in microdissected skeletal muscle fibers from aged rhesus monkeys.

Authors:  Nolan G Gokey; Zhengjin Cao; Jeong W Pak; Damian Lee; Susan H McKiernan; Debbie McKenzie; Richard Weindruch; Judd M Aiken
Journal:  Aging Cell       Date:  2004-10       Impact factor: 9.304

2.  Hierarchical steepness, counter-aggression, and macaque social style scale.

Authors:  Krishna N Balasubramaniam; Katharina Dittmar; Carol M Berman; Marina Butovskaya; Mathew A Cooper; Bonaventura Majolo; Hideshi Ogawa; Gabriele Schino; Bernard Thierry; Frans B M De Waal
Journal:  Am J Primatol       Date:  2012-06-11       Impact factor: 2.371

3.  Molecular estimates of primate divergences and new hypotheses for primate dispersal and the origin of modern humans.

Authors:  U Arnason; A Gullberg; A S Burguete; A Janke
Journal:  Hereditas       Date:  2000       Impact factor: 3.271

Review 4.  Ancient mitogenomics.

Authors:  Simon Y W Ho; M Thomas P Gilbert
Journal:  Mitochondrion       Date:  2009-09-27       Impact factor: 4.160

5.  tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors:  T M Lowe; S R Eddy
Journal:  Nucleic Acids Res       Date:  1997-03-01       Impact factor: 16.971

6.  Mammalian mitochondrial D-loop region structural analysis: identification of new conserved sequences and their functional and evolutionary implications.

Authors:  E Sbisà; F Tanzariello; A Reyes; G Pesole; C Saccone
Journal:  Gene       Date:  1997-12-31       Impact factor: 3.688

7.  Preliminary analysis of the mitochondrial genome evolutionary pattern in primates.

Authors:  Liang Zhao; Xingtao Zhang; Xingkui Tao; Weiwei Wang; Ming Li
Journal:  Dongwuxue Yanjiu       Date:  2012-08

8.  The complete mitochondrial genome of Assamese Macaques (Macaca assamensis).

Authors:  Juan Jiang; Peng Li; Jianqiu Yu; Guang Zhao; Yong Yi; Bisong Yue; Jing Li
Journal:  Mitochondrial DNA A DNA Mapp Seq Anal       Date:  2014-02-04       Impact factor: 1.514

9.  The whole mitochondrial genome of the Cynomolgus macaque (Macaca fascicularis).

Authors:  Ruilei Li; Huawei Wang; Liqin Yang; Baoming Zhang; Yijiang Li; Jiansheng Hu; Qingpeng Kong
Journal:  Mitochondrial DNA       Date:  2013-09-16

10.  Toward almost closed genomes with GapFiller.

Authors:  Marten Boetzer; Walter Pirovano
Journal:  Genome Biol       Date:  2012-06-25       Impact factor: 13.583

View more
  1 in total

1.  The complete mitochondrial genome of Attacus atlas formosanus Villiard, 1969 (Lepidoptera: Saturniidae).

Authors:  Yu-Feng Huang; Hakan Bozdoğan; Tzu-Han Chen; Chia-Chieh Hsieh; Yu-Shin Nai
Journal:  Mitochondrial DNA B Resour       Date:  2022-01-18       Impact factor: 0.658

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.