Kengo Sato, Yoko Kuroki, Wakako Kumita, Asao Fujiyama, Atsushi Toyoda, Jun Kawai, Atsushi Iriki, Erika Sasaki, Hideyuki Okano, Yasubumi Sakakibara.
Abstract
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium (MGSAC). The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacchus-3.2.1. However, the draft genome still contains 187,214 undetermined gap regions, as well as supercontigs and relatively short contigs that are not mapped to chromosomes. We resequenced and reassembled the common marmoset genome by deep sequencing with high-throughput sequencing technology. Several sequencing runs on Illumina platforms yielded 181 Gbp of high-quality bases, including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp, which corresponds to approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs. The improved genome sequence also helped to detect 5,288 new genes that are homologous to human cDNAs, and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
Year: 2015 PMID: 26586576 PMCID: PMC4653617 DOI: 10.1038/srep16894
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of sequence reads.
| Platform / read type | Read length (bp) | Insert size (bp) | Raw: # of reads | Raw: total bases | Raw: coverage | Quality filter: # of reads | Quality filter: total bases | Quality filter: coverage |
|---|---|---|---|---|---|---|---|---|
| Illumina GAIIx | | | | | | | | |
| SE | 115 | | 71 M | 8 G | 2.7× | 71 M | 7 G | 2.4× |
| PE | 150 | 500 | 363 M | 55 G | 18× | 337 M | 39 G | 13× |
| PE | 115 | 500 | 441 M | 51 G | 17× | 423 M | 45 G | 15× |
| PE | 115 | 700 | 460 M | 53 G | 18× | 445 M | 49 G | 16× |
| Illumina HiSeq2000 | | | | | | | | |
| PE | 100 | 3 K | 219 M | 18 G | 5.9× | | | |
| PE | 100 | 8 K | 118 M | 10 G | 3.2× | | | |
| PE | 100 | 20 K | 122 M | 10 G | 3.3× | | | |
| PE | 100 | 40 K | 41 M | 3 G | 1.1× | | | |
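
The coverage figures in the table follow from dividing the sequenced bases by the genome size. The sketch below reproduces the 181 Gbp total and the roughly 60× coverage quoted in the abstract. The genome size of 3 Gbp is an assumption inferred from those two numbers, not a value stated in the text, and the dictionary keys are illustrative labels only.

```python
# Total bases per library in Gbp, transcribed from the read summary table
# (quality-filtered values for GAIIx; the single reported value for HiSeq2000).
LIBRARY_GBP = {
    "GAIIx SE 115": 7,
    "GAIIx PE 150 / 500": 39,
    "GAIIx PE 115 / 500": 45,
    "GAIIx PE 115 / 700": 49,
    "HiSeq2000 3 K": 18,
    "HiSeq2000 8 K": 10,
    "HiSeq2000 20 K": 10,
    "HiSeq2000 40 K": 3,
}

# Assumption: approximate genome size, inferred from 181 Gbp at about 60x.
ASSUMED_GENOME_GBP = 3.0


def coverage(total_gbp, genome_gbp=ASSUMED_GENOME_GBP):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_gbp / genome_gbp


total_gbp = sum(LIBRARY_GBP.values())
print(f"total bases: {total_gbp} Gbp")
print(f"coverage:    {coverage(total_gbp):.1f}x")
```

Because the table values are rounded to whole Gbp, per-library coverages recomputed this way can differ slightly from the printed ones.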
Statistics of assembly results with Illumina SE, PE and MP reads.
| | # of contigs | N50 (bp) | # of gaps | total gap length |
|---|---|---|---|---|
| Our improved draft genome | 104 K | 61,143 | 122 K | 129,679,131 bp |
| MGSAC draft (caljac-3.2) | 201 K | 29,273 | 187 K | 162,452,744 bp |
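
For readers unfamiliar with N50: it is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of that definition follows; the function name and the toy contig lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (hypothetical lengths): the total is 300, and 80 + 70 = 150
# reaches half of it, so the N50 is 70.
toy_contigs = [10, 20, 30, 40, 50, 70, 80]
print(n50(toy_contigs))
```

A higher N50 at a similar total length means the same sequence is held in fewer, longer contigs, which is why the doubling in the table is read as an improvement.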
Figure 1. An example of an improved region.
(a) The region “chr4: 69196273–69307838” in the MGSAC draft genome consisted of 6 contigs separated by 5 gaps, one of which was longer than 10 Kbp. (b) SSPACE generated a scaffold using MP reads, in which the region corresponding to the large gap was filled with the contig ACFV01184668.1 (gray), which was part of the non-chromosomal scaffolds in the MGSAC draft genome. (c) The remaining gaps were filled by GapCloser using Illumina short sequence reads. Finally, the genome was updated by mapping the gap-filled scaffolds.
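
Gap counts and total gap lengths such as those reported above can be tallied directly from a scaffold sequence. The sketch below assumes the common convention that gaps are written as runs of `N` in the sequence; the text does not say how gaps are encoded in these assemblies, and the function name and toy sequence are hypothetical.

```python
import re


def gap_runs(sequence):
    """Return (start, length) for every maximal run of N/n in the sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"[Nn]+", sequence)]


# Toy scaffold (hypothetical): three contigs separated by two gaps.
toy_scaffold = "ACGT" + "N" * 5 + "GGCC" + "n" * 12 + "TTAA"

gaps = gap_runs(toy_scaffold)
print("number of gaps:  ", len(gaps))
print("total gap length:", sum(length for _, length in gaps))
print("longest gap:     ", max(length for _, length in gaps))
```

Counted this way, a region with 6 contigs yields 5 gaps, matching panel (a).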
Statistics of gene annotations in the improved genome sequence.
| | Ensembl annotations from MGSAC | marmoset cDNA | human cDNA | ab initio by AUGUSTUS | RNA-seq |
|---|---|---|---|---|---|
| # of transcripts | 52,754 | 45,432 | 116,826 | 32,464 | 78,227 |
| # of completed | 5,187 | 0 | 5,288 | 12,209 | 8,316 |
Figure 2. Principal component analysis based on the pairwise allele-sharing distance among the CIEA marmoset and 9 MGSAC marmosets.
The contribution rates of PC1 and PC2 are 13.95% and 13.44%, respectively.
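
As an illustration of the distance underlying this analysis, the sketch below uses one common definition of allele-sharing distance for biallelic sites: a locus contributes 0, 1, or 2 when two individuals share both, one, or no alleles, and the distance is the mean over loci called in both. The exact definition, genotype encoding, and filtering used for the figure are not given here, so all of these are assumptions, and the sample names and genotypes are invented.

```python
from itertools import combinations


def allele_sharing_distance(g1, g2):
    """Mean per-locus distance between two genotype vectors.

    Genotypes are coded as alternate-allele counts (0, 1, 2) or None when
    missing; for biallelic sites the number of unshared alleles is |a - b|.
    """
    diffs = [abs(a - b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not diffs:
        raise ValueError("no locus is called in both individuals")
    return sum(diffs) / len(diffs)


def pairwise_distances(samples):
    """Return {(name1, name2): distance} for every unordered pair of samples."""
    return {
        (a, b): allele_sharing_distance(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Hypothetical genotypes at four loci for three individuals.
toy_samples = {
    "ind_A": [0, 1, 2, None],
    "ind_B": [0, 2, 0, 1],
    "ind_C": [0, 1, 2, 1],
}
for pair, dist in pairwise_distances(toy_samples).items():
    print(pair, round(dist, 3))
```

The resulting distance matrix would then be passed to a principal component or multidimensional scaling routine to obtain coordinates like those plotted in the figure.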
For each chromosome: the total number of improved (CIEA) contigs, the total number of MGSAC contigs mapped to that chromosome in the improved genome, and the number of MGSAC contigs newly mapped to that chromosome in the improved genome that had remained unmapped in the MGSAC draft genome.
| | chr 1 | chr 2 | chr 3 | chr 4 | chr 5 | chr 6 | chr 7 | chr 8 |
|---|---|---|---|---|---|---|---|---|
| # of CIEA contigs | 5,310 | 4,406 | 3,877 | 3,866 | 5,366 | 3,581 | 4,472 | 3,685 |
| # of MGSAC contigs | 12,214 | 10,822 | 9,257 | 8,957 | 11,671 | 8,369 | 10,135 | 7,482 |
| # of MGSAC contigs newly mapped | 876 | 519 | 845 | 792 | 1,020 | 732 | 881 | 1,452 |
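
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| # of CIEA contigs | 3,928 | 2,931 | 3,571 | 3,352 | 2,459 | 2,519 | 2,242 | 2,208 |
| # of MGSAC contigs | 8,364 | 7,513 | 7,836 | 7,766 | 6,050 | 5,988 | 5,248 | 5,028 |
| # of MGSAC contigs newly mapped | 1,138 | 328 | 586 | 419 | 456 | 429 | 458 | 437 |

| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| # of CIEA contigs | 1,450 | 1,630 | 1,288 | 1,331 | 1,312 | 3,480 | 12,410 | 4,758 |
| # of MGSAC contigs | 3,562 | 3,186 | 2,961 | 3,018 | 2,781 | 6,192 | 17,389 | 5,290 |
| # of MGSAC contigs newly mapped | 262 | 564 | 246 | 193 | 293 | 282 | 7,893 | 5,042 |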
Figure 3. Gene Ontology (GO) category analysis for the transcripts of human cDNA mapped to the MGSAC draft genome and the transcripts of human cDNA newly mapped to the improved genome.