Improving the ostrich genome assembly using optical mapping data.

Jilin Zhang, Cai Li, Qi Zhou, Guojie Zhang.

Abstract

BACKGROUND: The ostrich (Struthio camelus) is the tallest and heaviest living bird. Ostrich meat is considered a healthy red meat, with an annual worldwide production ranging from 12,000 to 15,000 tons. As part of the avian phylogenomics project, we sequenced the ostrich genome for phylogenetic and comparative genomics analyses. The initial Illumina-based assembly of this genome had a scaffold N50 of 3.59 Mb and a total size of 1.23 Gb. Since longer scaffolds are critical for many genomic analyses, particularly for chromosome-level comparative analysis, we generated optical mapping (OM) data to obtain an improved assembly. The OM technique is a non-PCR-based method to generate genome-wide restriction enzyme maps, which improves the quality of de novo genome assembly.
FINDINGS: To generate OM data, we digested the ostrich genome with KpnI, which yielded 1.99 million DNA molecules (>250 kb) and covered the genome at least 500×. The restriction patterns of these molecules were then assembled and aligned to the Illumina-based assembly to extend its scaffolds. This resulted in an OM assembly with a scaffold N50 of 17.71 Mb, about 5 times that of the initial assembly. The number of scaffolds covering 90% of the genome was reduced from 414 to 75, which means an average of ~3 super-scaffolds for each chromosome. Upon integrating the OM data with previously published FISH (fluorescence in situ hybridization) markers, we recovered the full PAR (pseudoautosomal region) on the ostrich Z chromosome with 4 super-scaffolds, as well as most of the degenerated regions.
CONCLUSIONS: The OM data significantly improved the assembled scaffolds of the ostrich genome and facilitated chromosome evolution studies in birds. Similar strategies can be applied to other genome sequencing projects to obtain better assemblies.

Keywords:  Genome assembly; Optical mapping; Ostrich

Year:  2015        PMID: 25969728      PMCID: PMC4427950          DOI: 10.1186/s13742-015-0062-9

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data description

The advent of next-generation sequencing (NGS) technologies (e.g. Illumina HiSeq, SOLiD, 454 FLX) has facilitated new genome sequencing projects. However, the short reads produced by NGS make it difficult for de novo assembly to span repeat-rich or highly heterozygous regions, which limits scaffold length. Without long scaffolds, it is difficult or impossible to conduct some downstream analyses, such as chromosomal rearrangement analysis. One good method for elongating scaffolds is optical mapping (OM) [1], which estimates the gap length between scaffolds and merges them into much longer sequences without introducing new bases.
The flightless ostrich (Struthio camelus) is the tallest and heaviest living bird. It is the only member of the family Struthionidae, which is the basal extant member of Palaeognathae. Ostrich meat is considered healthy due to its high polyunsaturated fatty acid content, low saturated fatty acid content, and low cholesterol level. The worldwide production of ostrich meat is around 12,000 to 15,000 tons per year [2]. Due to this bird’s biological and agricultural importance, the avian phylogenomics project sequenced the ostrich genome for phylogenetic [3] and comparative genomics analyses [4]. Because the ostrich is an important species for avian chromosome evolution analysis [5,6], we generated OM data to help improve the assembly.
To increase scaffold lengths with OM technology, the input genome assembly must meet two requirements: (1) the scaffold N90 should be at least 200 kb and (2) N% (the percentage of N bases in the assembly) should be below 5%. Our Illumina-based assembly met both requirements. Before generating OM data, a series of restriction enzymes was evaluated based on the average DNA fragment size produced. This enabled us to check their compatibility with and coverage in the ostrich genome (Table 1).
To determine the best enzyme, numerous criteria were applied to define their feasibility, including the percentage of usable DNA fragments within a certain size range, maximum fragment size, number of fragments generated, etc. (Table 1). After evaluation, we chose KpnI as the most efficient enzyme for the ostrich genome for use in subsequent experiments.
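The kind of criteria listed above can be illustrated with a small in silico digest. The following Python sketch is not the procedure used in this study. The recognition sequences are the standard ones for the enzymes in Table 1, but the function names, the metric definitions (for example, "usable %" taken as the share of fragments within a size range) and the choice to cut at the start of each site are assumptions made for illustration.

```python
import re

# Standard recognition sequences for the enzymes listed in Table 1.
RECOGNITION_SITES = {
    "AflII": "CTTAAG", "BamHI": "GGATCC", "KpnI": "GGTACC", "NcoI": "CCATGG",
    "NheI": "GCTAGC", "BglII": "AGATCT", "SpeI": "ACTAGT", "XbaI": "TCTAGA",
}


def digest(sequence, site):
    """Return fragment lengths (bp) after cutting at the start of every site.

    Cutting at the site start is a simplification (assumption); the true cut
    position lies a few bases into the site, which is negligible at kb scale.
    """
    cuts = [m.start() for m in re.finditer(re.escape(site), sequence.upper())]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def summarize(fragments, lo=6_000, hi=12_000, long=100_000):
    """Summarize fragment lengths (bp) with Table 1-style metrics.

    The metric definitions are hypothetical: usable_pct is the percentage of
    fragments with lo <= length <= hi, n_long counts fragments above `long`.
    """
    if not fragments:
        raise ValueError("no fragments to summarize")
    n = len(fragments)
    return {
        "usable_pct": 100.0 * sum(lo <= f <= hi for f in fragments) / n,
        "n_long": sum(f > long for f in fragments),
        "avg_kb": sum(fragments) / n / 1000,
        "max_kb": max(fragments) / 1000,
    }


# Toy sequence with two KpnI sites; real input would be the assembly scaffolds.
toy = "A" * 10 + "GGTACC" + "A" * 20 + "GGTACC" + "A" * 5
toy_fragments = digest(toy, RECOGNITION_SITES["KpnI"])
toy_summary = summarize(toy_fragments, lo=10, hi=20, long=25)
```

Running `summarize` once per enzyme over the same scaffolds would give one row per enzyme, which can then be compared in the way Table 1 compares them.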
Table 1

Restriction enzymes evaluated for compatibility with the ostrich genome

Enzyme   Usable % 5-20 kb   Usable % 6-12 kb   Usable % 6-15 kb   #Frags >100 kb   Avg. frag. size (kb)   Max. frag. size (kb)
Afl II   11.46              4.78               4.60               1                3.90                   67.77
BamH I   95.20              86.52              77.07              6                8.00                   127.15
Kpn I    96.77              87.02              62.77              14               10.39                  148.23
Nco I    63.32              34.18              33.67              0                4.81                   74.89
Nhe I    88.66              63.85              63.26              0                6.19                   84.12
Bgl II   0.99               0.36               0.36               0                3.08                   36.92
Spe I    69.60              33.80              32.55              1                5.66                   105.87
Xba I    12.20              4.63               4.46               0                3.95                   59.07
All work done in this project followed the guidelines and protocols for research on animals and had the necessary permits and authorization. High molecular weight genomic DNA was extracted from a blood sample collected from a male ostrich at the Kunming Zoo, China. The DNA was then transferred to OpGen, Inc. for collection of single molecule restriction maps (SMRMs) on the Argus® Whole Genome Mapping System. The average size of the digested molecules was ~282 kb, which was determined to be sufficient. To further confirm enzyme compatibility and performance, 3 MapCards were run to examine the average fragment size, and the results were consistent with the expected outcome. In total, 32 high-density MapCards were collected, with ~136,000 molecules marked on each card.
Finally, about 1.99 million molecules (>250 kb) were analyzed using Genome-Builder (Table 2), OpGen’s analysis pipeline for restriction map comparison. Briefly, in silico restriction maps were first generated from the Illumina assembly based on the KpnI recognition site. These maps were then used as seeds to find overlaps with the SMRMs obtained from the DNA molecules by map-to-map alignment in the Genome-Builder pipeline. Overlapping maps were then assembled with the in silico maps to produce elongated maps, and low-coverage regions towards both ends were discarded so that only high-confidence extensions were retained. We performed four iterations to ensure sufficient extension; in each iteration, the extended scaffolds from the previous round were used as seeds. The extended scaffolds were then aligned pairwise, and the alignments that passed the empirical confidence threshold were considered candidates for connecting scaffolds. The relative location and orientation of each pair of connected scaffolds were used to generate super-scaffolds.
This elevated the assembly quality and achieved a scaffold N50 of 17.71 Mb, which is 5 times as large as the scaffold N50 of the initial assembly (Table 3).
Table 2

Summary of SMRM data

                        All               Maps of >250 kb
Total size              1,126,357.03 Mb   732,483.56 Mb
Number of molecules     3,925,195         1,989,698
Average molecule size   286.96 kb         368.14 kb
Minimum molecule size   150.11 kb         250 kb
Average fragment size   16.854 kb         17.598 kb
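As a rough consistency check on the SMRM summary, fold coverage can be estimated by dividing the total map length by the genome size. The short Python sketch below uses the figures for maps of >250 kb and assumes a genome size of 1.23 Gb, as quoted in the abstract; the function and variable names are illustrative only.

```python
def coverage_fold(total_map_mb, genome_size_mb):
    """Fold coverage = total length of all maps / genome size."""
    return total_map_mb / genome_size_mb


def mean_molecule_kb(total_map_mb, n_molecules):
    """Average molecule size in kb from total length (Mb) and molecule count."""
    return total_map_mb * 1000 / n_molecules


# Values for maps of >250 kb, taken from the SMRM summary table.
FILTERED_TOTAL_MB = 732_483.56
FILTERED_MOLECULES = 1_989_698
# Assumption: genome size of 1.23 Gb, as quoted in the abstract.
GENOME_SIZE_MB = 1_230.0

fold = coverage_fold(FILTERED_TOTAL_MB, GENOME_SIZE_MB)
mean_kb = mean_molecule_kb(FILTERED_TOTAL_MB, FILTERED_MOLECULES)
print(f"coverage ~{fold:.0f}x, mean molecule size ~{mean_kb:.2f} kb")
```

Under that genome-size assumption the estimate comes out above 500×, in line with the coverage stated in the abstract, and the recomputed mean molecule size agrees with the tabulated average.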
Table 3

Summary of assemblies

                   Scaffold N50   Scaffold N90   N%     Total size
Initial assembly   3.59 Mb        561 kb         3.30   1.26 Gb
OM assembly        17.71 Mb       3.41 Mb        5.56   1.23 Gb
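For readers unfamiliar with the scaffold statistics reported here, N50 and N90 are the scaffold lengths at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches 50% or 90% of the total assembly size. The number of scaffolds needed to reach that point (sometimes called L50 or L90) corresponds to the "number of scaffolds covering 90% of the genome" quoted in the abstract. A minimal Python sketch with made-up lengths follows; the function names are illustrative and are not taken from any tool used in this study.

```python
def nx(lengths, x):
    """Return the Nx value: the length of the scaffold at which the running
    total of lengths (longest first) first reaches x% of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    raise ValueError("lengths must be a non-empty list of positive numbers")


def lx(lengths, x):
    """Return the Lx value: how many of the longest scaffolds are needed to
    cover x% of the assembly size."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 100 >= total * x:
            return count
    raise ValueError("lengths must be a non-empty list of positive numbers")


# Made-up scaffold lengths (arbitrary units), total = 290.
example = [80, 70, 50, 40, 30, 20]
n50, l50 = nx(example, 50), lx(example, 50)   # 70 and 2
n90, l90 = nx(example, 90), lx(example, 90)   # 30 and 5
```

Merging scaffolds into super-scaffolds raises Nx and lowers Lx at the same time, which is the pattern seen when comparing the initial and OM assemblies.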
To demonstrate that the OM assembly can facilitate chromosome evolution research, we present the Z chromosome as an example. Together with previously published FISH (fluorescence in situ hybridization) markers [7], OM makes it possible to re-organize and anchor the scaffolds to their relevant positions on the Z chromosome. We recovered the PAR (pseudoautosomal region) by joining 4 super-scaffolds and their corresponding FISH markers (Figure 1). It is worth mentioning that upon OM integration with FISH markers, most of the sequences in the W degenerated region were properly placed (Figure 1). The longest super-scaffold anchored to the ostrich Z chromosome is 29.2 Mb. Because the gap sizes estimated by OM provided no additional information about the Z chromosome as a whole, we ignored them and inserted a constant gap of 600 Ns between scaffolds. This avoided introducing more uncertainty into the sequence and simplified the downstream analysis. The pseudo Z chromosome we constructed further extended our knowledge of evolutionary strata and their diversity in birds, making it possible to deduce the rearrangement events during different periods [8]. In addition, together with multi-genome alignments, we further examined the forces shaping Z chromosome evolution in birds [9].
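The constant-gap construction described above can be sketched in a few lines of Python. This is an illustration only, not the code used in this study: the scaffold names, the layout format (an ordered list of name and strand pairs) and the function names are hypothetical, and only the 600-N gap length comes from the text.

```python
GAP = "N" * 600  # constant gap placed between adjacent scaffolds

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence, leaving N as N."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudo_chromosome(scaffolds, layout):
    """Join scaffolds in the given order and orientation with constant gaps.

    scaffolds: dict mapping scaffold name -> sequence
    layout:    list of (name, strand) tuples, strand being "+" or "-"
               (hypothetical format, e.g. derived from OM and FISH anchoring)
    """
    parts = []
    for name, strand in layout:
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand {strand!r} for {name}")
        seq = scaffolds[name]
        parts.append(seq if strand == "+" else reverse_complement(seq))
    return GAP.join(parts)


# Toy example with two short scaffolds; the second is placed in reverse.
toy_scaffolds = {"scaffold_a": "ACGT", "scaffold_b": "AAC"}
toy_layout = [("scaffold_a", "+"), ("scaffold_b", "-")]
pseudo_z = build_pseudo_chromosome(toy_scaffolds, toy_layout)
```

Using a fixed gap means coordinates on the pseudo-chromosome reflect scaffold order and orientation but not true physical distance, which is the trade-off accepted in the text.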
Figure 1

Relationships between OM super-scaffolds and the Illumina assembly scaffolds. The upper part of the figure shows the super-scaffolds generated by OM, and the lower shows the ordered Illumina scaffolds by aligning against the chicken Z chromosome. Because we made use of the FISH markers (red triangles) to resolve the artificial rearrangements introduced by alignment with the chicken genome, the scaffold order of the lower part was not the original order from the whole genome alignment. The red and blue underlines represent the PAR and W degenerated region, respectively.

In conclusion, the OM data generated in this study improved the ostrich assembly and facilitated a comparative analysis at the chromosome level. The improved assembly can be used for future genomic studies, especially those requiring long scaffolds. Furthermore, these data can be used for future development of OM software tools.

Availability of supporting data

The data files presented in this Data Note are available in the GigaScience repository, GigaDB [10]. Raw sequencing data are also available from the SRA [SRP028745].
1.  Optical mapping of DNA: single-molecule-based methods for mapping genomes.

Authors:  Robert K Neely; Jochem Deen; Johan Hofkens
Journal:  Biopolymers       Date:  2011-01-04       Impact factor: 2.505

2.  Comparison of the Z and W sex chromosomal architectures in elegant crested tinamou (Eudromia elegans) and ostrich (Struthio camelus) and the process of sex chromosome differentiation in palaeognathous birds.

Authors:  Yayoi Tsuda; Chizuko Nishida-Umehara; Junko Ishijima; Kazuhiko Yamada; Yoichi Matsuda
Journal:  Chromosoma       Date:  2007-01-12       Impact factor: 4.316

3.  Comparative genomics reveals insights into avian genome evolution and adaptation.

Authors:  Guojie Zhang; Cai Li; Qiye Li; Bo Li; Denis M Larkin; Chul Lee; Jay F Storz; Agostinho Antunes; Matthew J Greenwold; Robert W Meredith; Anders Ödeen; Jie Cui; Qi Zhou; Luohao Xu; Hailin Pan; Zongji Wang; Lijun Jin; Pei Zhang; Haofu Hu; Wei Yang; Jiang Hu; Jin Xiao; Zhikai Yang; Yang Liu; Qiaolin Xie; Hao Yu; Jinmin Lian; Ping Wen; Fang Zhang; Hui Li; Yongli Zeng; Zijun Xiong; Shiping Liu; Long Zhou; Zhiyong Huang; Na An; Jie Wang; Qiumei Zheng; Yingqi Xiong; Guangbiao Wang; Bo Wang; Jingjing Wang; Yu Fan; Rute R da Fonseca; Alonzo Alfaro-Núñez; Mikkel Schubert; Ludovic Orlando; Tobias Mourier; Jason T Howard; Ganeshkumar Ganapathy; Andreas Pfenning; Osceola Whitney; Miriam V Rivas; Erina Hara; Julia Smith; Marta Farré; Jitendra Narayan; Gancho Slavov; Michael N Romanov; Rui Borges; João Paulo Machado; Imran Khan; Mark S Springer; John Gatesy; Federico G Hoffmann; Juan C Opazo; Olle Håstad; Roger H Sawyer; Heebal Kim; Kyu-Won Kim; Hyeon Jeong Kim; Seoae Cho; Ning Li; Yinhua Huang; Michael W Bruford; Xiangjiang Zhan; Andrew Dixon; Mads F Bertelsen; Elizabeth Derryberry; Wesley Warren; Richard K Wilson; Shengbin Li; David A Ray; Richard E Green; Stephen J O'Brien; Darren Griffin; Warren E Johnson; David Haussler; Oliver A Ryder; Eske Willerslev; Gary R Graves; Per Alström; Jon Fjeldså; David P Mindell; Scott V Edwards; Edward L Braun; Carsten Rahbek; David W Burt; Peter Houde; Yong Zhang; Huanming Yang; Jian Wang; Erich D Jarvis; M Thomas P Gilbert; Jun Wang
Journal:  Science       Date:  2014-12-11       Impact factor: 47.728

4.  Complex evolutionary trajectories of sex chromosomes across bird taxa.

Authors:  Qi Zhou; Jilin Zhang; Doris Bachtrog; Na An; Quanfei Huang; Erich D Jarvis; M Thomas P Gilbert; Guojie Zhang
Journal:  Science       Date:  2014-12-11       Impact factor: 47.728

5.  Temporal genomic evolution of bird sex chromosomes.

Authors:  Zongji Wang; Jilin Zhang; Wei Yang; Na An; Pei Zhang; Guojie Zhang; Qi Zhou
Journal:  BMC Evol Biol       Date:  2014-12-12       Impact factor: 3.260

6.  Whole-genome analyses resolve early branches in the tree of life of modern birds.

Authors:  Erich D Jarvis; Siavash Mirarab; Andre J Aberer; Bo Li; Peter Houde; Cai Li; Simon Y W Ho; Brant C Faircloth; Benoit Nabholz; Jason T Howard; Alexander Suh; Claudia C Weber; Rute R da Fonseca; Jianwen Li; Fang Zhang; Hui Li; Long Zhou; Nitish Narula; Liang Liu; Ganesh Ganapathy; Bastien Boussau; Md Shamsuzzoha Bayzid; Volodymyr Zavidovych; Sankar Subramanian; Toni Gabaldón; Salvador Capella-Gutiérrez; Jaime Huerta-Cepas; Bhanu Rekepalli; Kasper Munch; Mikkel Schierup; Bent Lindow; Wesley C Warren; David Ray; Richard E Green; Michael W Bruford; Xiangjiang Zhan; Andrew Dixon; Shengbin Li; Ning Li; Yinhua Huang; Elizabeth P Derryberry; Mads Frost Bertelsen; Frederick H Sheldon; Robb T Brumfield; Claudio V Mello; Peter V Lovell; Morgan Wirthlin; Maria Paula Cruz Schneider; Francisco Prosdocimi; José Alfredo Samaniego; Amhed Missael Vargas Velazquez; Alonzo Alfaro-Núñez; Paula F Campos; Bent Petersen; Thomas Sicheritz-Ponten; An Pas; Tom Bailey; Paul Scofield; Michael Bunce; David M Lambert; Qi Zhou; Polina Perelman; Amy C Driskell; Beth Shapiro; Zijun Xiong; Yongli Zeng; Shiping Liu; Zhenyu Li; Binghang Liu; Kui Wu; Jin Xiao; Xiong Yinqi; Qiuemei Zheng; Yong Zhang; Huanming Yang; Jian Wang; Linnea Smeds; Frank E Rheindt; Michael Braun; Jon Fjeldsa; Ludovic Orlando; F Keith Barker; Knud Andreas Jønsson; Warren Johnson; Klaus-Peter Koepfli; Stephen O'Brien; David Haussler; Oliver A Ryder; Carsten Rahbek; Eske Willerslev; Gary R Graves; Travis C Glenn; John McCormack; Dave Burt; Hans Ellegren; Per Alström; Scott V Edwards; Alexandros Stamatakis; David P Mindell; Joel Cracraft; Edward L Braun; Tandy Warnow; Wang Jun; M Thomas P Gilbert; Guojie Zhang
Journal:  Science       Date:  2014-12-12       Impact factor: 47.728

7.  The molecular basis of chromosome orthologies and sex chromosomal differentiation in palaeognathous birds.

Authors:  Chizuko Nishida-Umehara; Yayoi Tsuda; Junko Ishijima; Junko Ando; Atushi Fujiwara; Yoichi Matsuda; Darren K Griffin
Journal:  Chromosome Res       Date:  2007-07-03       Impact factor: 4.620

8.  Reconstruction of gross avian genome structure, organization and evolution suggests that the chicken lineage most closely resembles the dinosaur avian ancestor.

Authors:  Michael N Romanov; Marta Farré; Pamela E Lithgow; Katie E Fowler; Benjamin M Skinner; Rebecca O'Connor; Gothami Fonseka; Niclas Backström; Yoichi Matsuda; Chizuko Nishida; Peter Houde; Erich D Jarvis; Hans Ellegren; David W Burt; Denis M Larkin; Darren K Griffin
Journal:  BMC Genomics       Date:  2014-12-11       Impact factor: 3.969
