Literature DB >> 30540030

Complete and de novo assembly of the Leishmania braziliensis (M2904) genome.

Sandra González-de la Fuente1, Esther Camacho1, Ramón Peiró-Pastor1, Alberto Rastrojo1, Fernando Carrasco-Ramiro1, Begoña Aguado1, Jose M Requena1.   

Abstract

Leishmania braziliensis is the etiological agent of American mucosal leishmaniasis, one of the most severe clinical forms of leishmaniasis. Here, we report the assembly of the L. braziliensis (M2904) genome into 35 continuous chromosomes. Also, the annotation of 8395 genes is provided. The public availability of this information will contribute to a better knowledge of this pathogen and help in the search for vaccines and novel drug targets aimed to control the disease caused by this Leishmania species.

Entities:  

Mesh:

Substances:

Year:  2018        PMID: 30540030      PMCID: PMC6319030          DOI: 10.1590/0074-02760180438

Source DB:  PubMed          Journal:  Mem Inst Oswaldo Cruz        ISSN: 0074-0276            Impact factor:   2.743


Protists of the genus Leishmania are causative agents of a group of diseases known as leishmaniasis, which range from self-curing cutaneous lesions to mucosal disfigurations and fatal visceral outcomes. According to estimates of incidence and disease burden, leishmaniasis are among the six world’s most serious parasitic diseases. The medical relevance of this parasite led to push the project of deciphering its genome sequence that was undertaken by an international consortium formed by the Wellcome Sanger Institute and many laboratories around the world. As a result, the Leishmania (L.) major genome sequence was published in 2005, providing an invaluable framework for studies in very different areas of the field. Soon after, as L. major causes in humans a relatively benign form of cutaneous leishmaniasis, representative species of the two other forms of leishmaniasis, mucosal (Leishmania braziliensis) and visceral (Leishmania infantum) were also selected for genome sequencing projects. The genome sequence of another relevant species, Leishmania donovani, was published a few years later. Nevertheless, the genome assemblies achieved for these species were of a lower quality than that obtained for the L. major genome; many sequence gaps remained in the chromosomal scaffolds generated. Despite the advances in high-throughput sequencing methods and the relatively small genome size of this species (around 32-Mbp, per haploid set), the abundance in repeated sequences (0.4-1 kb in length) that are scattered along the different chromosomes , is a hurdle for sequence assemblers. In fact, most of the recent genomic assemblies reported for Leishmania species contain a significant number of sequence gaps and uncertainties in the structure of chromosomes. Fortunately, third-generation sequencing technologies, having the capacity of generating long sequence reads, have emerged as valuable tools for solving many of the assembly problems arising with the use of short sequence reads. In particular, single molecule real time (SMRT) sequencing, developed by Pacific Biosciences (PacBio ), is being incorporated as a necessary methodology for generating complete chromosomal assemblies. Recently, PacBio sequencing was used to generate a new reference genome for L. donovani. Similarly, a complete L. infantum genome was obtained by our group through a combination of PacBio long reads and paired-end short-reads generated by Illumina technology. In the present work, following the pipeline developed for the assembly of L. infantum, we present a complete assembly of the L. braziliensis (M2904 strain) genome into 35 chromosomes, providing the corresponding gene annotation. In this study, the M2904 strain (MHOM/BR/75/M2904) of L. braziliensis was used, as it is the common reference for genomic studies. , This strain was kindly provided by Dr J Moreno (WHO Collaborating Centre for Leishmaniasis, Centro Nacional de Microbiología, Instituto de Salud Carlos IIII, Madrid, Spain). Culturing conditions and the procedure for total DNA isolation have been described elsewhere. Firstly, Illumina sequencing of total DNA was carried out. Library construction and paired-end library sequencing were performed at the Centro Nacional de Análisis Genómico (CNAG-CRG, Spain; http://www.cnag.crg.eu/) using Illumina HiSeq2000 technology. A total of 51,049,711 paired-end, 2×126 bp sequence reads were generated with a mean insert size of 301 bp. PrinseqQuality (http://prinseq.sourceforge.net/) was applied to quality filtering/trimming of reads (cut-off value of 20, quality phred score), and only reads with length ≥ 60-nt were used (Fig. 1). Finally, 50,619,063 filtered reads were assembled using the CLC Genomics Workbench version 5.0 (CLC Bio). However, the results were not satisfactory as the genome was assembled into 2,686 scaffolds with 3,888 sequence gaps remaining.
Fig. 1:

schematic overview of the Leishmania braziliensis genome assembly workflow. Input files (Illumina and PacBio reads) are shown in yellow rhomboids. Both different assembly processes and software used are represented in blue boxes. Output files are shown in green boxes and discarded data are shown in red boxes.

To improve the assembly, PacBio sequencing was undertaken. A total of 276,308 pre-filtered reads with a median size of 16,600 nucleotides were generated on a PacBio RS II sequencing instrument. The Norwegian Sequencing Centre (www.sequencing.uio.no) provided the sequencing service. As outlined in Fig. 1, firstly, quality trimming of PacBio reads was performed by default in the HGAP pipeline (P_filter module). De novo genome assembly was carried out following a hierarchical genome-assembly process (HGAP ), using the HGAP3 (Pacific Biosciences, SMRT Analysis Software v2.3.0) and HGAP4 (Pacific Biosciences, SMRT Link 4.0.0) tools. Three different assembly strategies were used by varying the software tool and the size of the expected genome: 34 and 35 Mbp, using HGAP3, and 35 Mbp using HGAP4; as a result, 162, 151 and 90 contigs were obtained, respectively. The HGAP4-generated contigs were selected for further analysis. Forty-four out of the 90 contigs were determined to be spurious and discarded, giving their disproportionately low coverage (mean < 40×) and/or short length (< 15-Kb). Some of these contigs corresponded to maxicircle sequences, even though a complete sequence for the L. braziliensis maxicircle molecule could not be generated. The rest of the discarded contigs showed a chimeric structure, being composed by Leishmania sequences present in two or more regions of the chromosomal-size contigs, probably due to assembly artefacts. The remaining 46 contigs (see Fig. 1), having mean coverage of 96× (Table), were preliminary assigned to chromosomes by BLAST analysis using the L. major (Friedlin strain) chromosomes as reference. This analysis indicated that 27 of the contigs represented complete L. braziliensis chromosomes. The remaining 19 contigs could be ordered into continuous chromosomes without any sequence gaps (Fig. 2) using different assemblers. In particular, SSPACE-standard [version 2.0 with extension option (-x 1)] software was useful to join the contigs belonging to chromosomes 8 and 9; SSPACE-LongReads (v 1-1), a tool which uses only the longest PacBio subreads, was successful for joining the two contigs forming the chromosome 27. The two contigs of the chromosome 10 were merged with pyScaf (v1) software (https://github.com/lpryszcz/pyScaf), which aligns long reads onto the contigs and identifies the reads that connect them. Three contigs were identified for chromosome 34, and joined by the combination of two tools: contig1 and contig2 could be joined by pyScaf, and the union between contig 2 and contig 3 was attained by minimus2 (v2), which uses NUCmer (v3.1) to compute overlaps between contigs. Finally, the contigs belonging to the three remaining chromosomes (2, 20 and 22) were joined with minimus2.
TABLE Features

of the Leishmania braziliensis assembled genomes and sequencing strategies

FeaturesAssembly-2007(5) Assembly-2011 28 Assembly-2018(13) This work
Chromosomes (scaffolds)3735378235
Number of contigs10411031360135
N5057,784 bp-20,600 bp1,063,631 bp
Annotated genes8428-81618395
Annotated CDS:8153835780018244
- Functional annotation---4862
- Hypothetical proteins---3382
Annotated pseudogenes161-6533
Annotated structural genes114-95118
Number of gaps2097-33520
Number of Ns92,079-~316,7780
Haploide genome size (bp)31,996,77231,997,77330,009,65332,301,632
Coverage (mean) [sequencing method]6× [Capillary sequencing]105× [Illumina] 7× [Capillary]88× [Illumina]363× [Illumina] 96× [PacBio]

-: data not found.

Fig. 2:

read-depth analysis throughout the chromosomes formed by the union of two or more PacBio-assembled contigs. Coverage (log2 value) was appointed by sliding window analysis (bin 200 bp) with either Illumina (in blue) and PacBio (in red) reads, along chromosomes 2, 8, 9, 10, 20, 22, 27 and 34. Contigs lengths are shown by arrowheads lines. Chromosomes 2, 20, and 22 (panels A, E and F) were joined using the minimus 2 assembler. Chromosomes 8 (panel B) and 9 (panel C) were joined by the SSPACE-standard tool. Moreover, chromosome 27 (panel G) was joined by the SSPACE-LongRead tool and chromosome 10 (panel D) was joined with the pyScaf scaffolder tool. Finally, pyScaf and minimus2 were necessary for joining chromosome 34 (panel H).

-: data not found. Afterwards, the accuracy of the joining points was evaluated using the Illumina paired-end reads. The gap-size in each chromosome was calculated and closed by combining the results generated with the tools GapCloser (v1.12-r6) and Gapfiller. GapFiller (v1.10) was run with the parameters -g 3 (three allowed gaps in Bowtie alignment) and -i 100 (number of iterations to fill a gap). Additionally, the contigs generated from the Illumina sequencing reads (see above) were aligned to the de novo assembled chromosomes using LAST aligner (v.8, http://last.cbrc.jp/) with -uNEAR option. We found nine Illumina contigs that aligned at the ends of chromosomes 8, 15, 16, 20, 21, 22, 23 and 27, but having overhanging sequences. Extension of these chromosomes was performed using MAFFT multiple-aligner (v.7.313). To extend the rest of chromosomes, the SSPACE-standard software was used. Coverage analysis on the final chromosomal assemblies was performed using Illumina and PacBio reads [see Supplementary data (Figs 1-35)]. Illumina reads were aligned with Bowtie2 (v.2.34.3; option: --local), and PacBio bax.h5 reads were aligned with the pbalign (BLASR, v.0.2.0) tool. Coverage analysis was performed from each alignment along the 35 chromosomes using the GenomeCoverageBed tool (v.2.25.0, using -d option, which informs the depth at each genome position) of BEDtools2. The graphical coverage plots files were generated with GNUPLOT (v.4.6, patchlevel 2; http://www.gnuplot.info/). A continuous distribution of reads was found in all chromosomes, and the reads coverage in the eight chromosomes generated by joining of two or three contigs are shown in Fig. 2. The joining points of the chromosomes assembled initially in two or more contigs correspond to regions with tandemly repeated genes (or other repeated elements), and, according to the uneven coverage observed at these points in some of the chromosomes (Fig. 2), it is likely that the exact number of repeats has still not been determined. At this point, a detailed sequence checking was performed using PacBio-utilities (v.1; https://github.com/douglasgscofield/PacBio-utilities), designed to correct deletions and insertions introduced with some frequency, mainly in homopolymer strings, by the PacBio sequencing. Corrections were dictated by the sequence derived from Illumina reads, which have higher accuracy than the PacBio ones. Sequence insertions/deletions were corrected when they were supported by more than ten Illumina reads and the indel was present in 80% (or above) of the reads mapping the concerned position. Additionally, a second correction step was carried out using an in-house Python script, which uses the output generated by the Pilon tool (v.1.2.2, using option: --diploid). As a result, 556 positions were corrected, of which 161 corresponded to sequence deletions and 395 to insertions. The accuracy of these modifications was checked by visualisation of Illumina reads mapped to the assembled genome using the IGV (v.2.3) tool. Gene annotation of both protein-coding genes and known non-coding RNAs on the new assembled L. braziliensis genome was performed using Companion web server, using the default settings and selecting the L. major (Friedlin) annotation as a reference. Additionally, OrthoMCL web version and BLAST searches were used to complement the annotations generated by Companion. Afterwards, all data were combined into a GFF3 file by an in-house Python script. Finally, a manual curation by IGV visualisation of the annotated genes was performed. The annotation file is provided as a supplementary dataset. As summarised in Table, a total of 8,395 genes have been annotated, of which 8,277 correspond to protein-coding genes (including 33 pseudogenes) and 118 are genes coding for structural and/or functional RNAs (i.e., rRNAs, tRNAs and snoRNAs). To visually illustrate the changes in the chromosomal architecture found in the assembly reported here regarding the current reference genome (GeneDB), a comparison of chromosome 20 in both assemblies is shown in Fig. 3. In the current reference, this chromosome is assembled in two separate scaffolds, whereas a continuous chromosome was assembled in this work. Moreover, several regions were assembled in a different order. The availability of a robust genome sequence, as the one presented in this work, is a valuable resource for studies addressing whole-organism aspects following either genomics, transcriptomics or proteomics approaches. Nevertheless, to date, there is not any genome assembly that can be considered as set in stone, and the L. braziliensis genome reported here is not an exception. In particular, we are aware that the structure of the extremities for several of the assembled chromosomes has not been determined definitively, as the telomeric hexanucleotide repeats were not assembled for all the chromosomal ends. The Leishmania chromosomes contain subtelomeric regions consisting of repetitive sequences with variable length and number of repeats that are often chromosome-specific, and for these reasons dedicated methodological approaches are required for accurate determination of their structure.
Fig. 3:

schematic illustration of chromosomal fusion (20 and 34) in the chromosome 20 of Leishmania braziliensis genome. Pairwise alignments between the L. braziliensis newly assembled genome (top) and the current reference genome (bottom), which is fragmented in two scaffolds (LbrM.20.1 and LbrM.20.2), were generated using Mauve genome alignment tool with default settings. This tool uses different colors to represent synteny blocks. Sections located underneath the x-axis show inversion events. The sizes of the two contigs of the current reference genome for chromosome 20 are shown by lines with arrow-heads.

Data availability - Genomic raw data have been deposited in The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/). Also, the assembled genome and annotations files were uploaded under the Study accession number PRJEB25922 and study unique name ena-STUDY-CBMSO-09-04-2018-08:02:47:223-4008. Additionally, the new L. braziliensis genome sequence and annotations are available at the Leish-ESP web site (http://leish-esp.cbm.uam.es/).
  28 in total

1.  Scaffolding pre-assembled contigs using SSPACE.

Authors:  Marten Boetzer; Christiaan V Henkel; Hans J Jansen; Derek Butler; Walter Pirovano
Journal:  Bioinformatics       Date:  2010-12-12       Impact factor: 6.937

2.  Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance.

Authors:  Tim Downing; Hideo Imamura; Saskia Decuypere; Taane G Clark; Graham H Coombs; James A Cotton; James D Hilley; Simonne de Doncker; Ilse Maes; Jeremy C Mottram; Mike A Quail; Suman Rijal; Mandy Sanders; Gabriele Schönian; Olivia Stark; Shyam Sundar; Manu Vanaerschot; Christiane Hertz-Fowler; Jean-Claude Dujardin; Matthew Berriman
Journal:  Genome Res       Date:  2011-10-28       Impact factor: 9.043

3.  The genome of the kinetoplastid parasite, Leishmania major.

Authors:  Alasdair C Ivens; Christopher S Peacock; Elizabeth A Worthey; Lee Murphy; Gautam Aggarwal; Matthew Berriman; Ellen Sisk; Marie-Adele Rajandream; Ellen Adlem; Rita Aert; Atashi Anupama; Zina Apostolou; Philip Attipoe; Nathalie Bason; Christopher Bauser; Alfred Beck; Stephen M Beverley; Gabriella Bianchettin; Katja Borzym; Gordana Bothe; Carlo V Bruschi; Matt Collins; Eithon Cadag; Laura Ciarloni; Christine Clayton; Richard M R Coulson; Ann Cronin; Angela K Cruz; Robert M Davies; Javier De Gaudenzi; Deborah E Dobson; Andreas Duesterhoeft; Gholam Fazelina; Nigel Fosker; Alberto Carlos Frasch; Audrey Fraser; Monika Fuchs; Claudia Gabel; Arlette Goble; André Goffeau; David Harris; Christiane Hertz-Fowler; Helmut Hilbert; David Horn; Yiting Huang; Sven Klages; Andrew Knights; Michael Kube; Natasha Larke; Lyudmila Litvin; Angela Lord; Tin Louie; Marco Marra; David Masuy; Keith Matthews; Shulamit Michaeli; Jeremy C Mottram; Silke Müller-Auer; Heather Munden; Siri Nelson; Halina Norbertczak; Karen Oliver; Susan O'neil; Martin Pentony; Thomas M Pohl; Claire Price; Bénédicte Purnelle; Michael A Quail; Ester Rabbinowitsch; Richard Reinhardt; Michael Rieger; Joel Rinta; Johan Robben; Laura Robertson; Jeronimo C Ruiz; Simon Rutter; David Saunders; Melanie Schäfer; Jacquie Schein; David C Schwartz; Kathy Seeger; Amber Seyler; Sarah Sharp; Heesun Shin; Dhileep Sivam; Rob Squares; Steve Squares; Valentina Tosato; Christy Vogt; Guido Volckaert; Rolf Wambutt; Tim Warren; Holger Wedler; John Woodward; Shiguo Zhou; Wolfgang Zimmermann; Deborah F Smith; Jenefer M Blackwell; Kenneth D Stuart; Bart Barrell; Peter J Myler
Journal:  Science       Date:  2005-07-15       Impact factor: 47.728

4.  OrthoMCL: identification of ortholog groups for eukaryotic genomes.

Authors:  Li Li; Christian J Stoeckert; David S Roos
Journal:  Genome Res       Date:  2003-09       Impact factor: 9.043

5.  Minimus: a fast, lightweight genome assembler.

Authors:  Daniel D Sommer; Arthur L Delcher; Steven L Salzberg; Mihai Pop
Journal:  BMC Bioinformatics       Date:  2007-02-26       Impact factor: 3.169

6.  Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement.

Authors:  Bruce J Walker; Thomas Abeel; Terrance Shea; Margaret Priest; Amr Abouelliel; Sharadha Sakthikumar; Christina A Cuomo; Qiandong Zeng; Jennifer Wortman; Sarah K Young; Ashlee M Earl
Journal:  PLoS One       Date:  2014-11-19       Impact factor: 3.240

7.  Genome-wide stochastic adaptive DNA amplification at direct and inverted DNA repeats in the parasite Leishmania.

Authors:  Jean-Michel Ubeda; Frédéric Raymond; Angana Mukherjee; Marie Plourde; Hélène Gingras; Gaétan Roy; Andréanne Lapointe; Philippe Leprohon; Barbara Papadopoulou; Jacques Corbeil; Marc Ouellette
Journal:  PLoS Biol       Date:  2014-05-20       Impact factor: 8.029

8.  Resequencing of the Leishmania infantum (strain JPCM5) genome and de novo assembly into 36 contigs.

Authors:  Sandra González-de la Fuente; Ramón Peiró-Pastor; Alberto Rastrojo; Javier Moreno; Fernando Carrasco-Ramiro; Jose M Requena; Begoña Aguado
Journal:  Sci Rep       Date:  2017-12-22       Impact factor: 4.379

9.  SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler.

Authors:  Ruibang Luo; Binghang Liu; Yinlong Xie; Zhenyu Li; Weihua Huang; Jianying Yuan; Guangzhu He; Yanxiang Chen; Qi Pan; Yunjie Liu; Jingbo Tang; Gengxiong Wu; Hao Zhang; Yujian Shi; Yong Liu; Chang Yu; Bo Wang; Yao Lu; Changlei Han; David W Cheung; Siu-Ming Yiu; Shaoliang Peng; Zhu Xiaoqian; Guangming Liu; Xiangke Liao; Yingrui Li; Huanming Yang; Jian Wang; Tak-Wah Lam; Jun Wang
Journal:  Gigascience       Date:  2012-12-27       Impact factor: 6.524

10.  Leishmania naiffi and Leishmania guyanensis reference genomes highlight genome structure and gene evolution in the Viannia subgenus.

Authors:  Simone Coughlan; Ali Shirley Taylor; Eoghan Feane; Mandy Sanders; Gabriele Schonian; James A Cotton; Tim Downing
Journal:  R Soc Open Sci       Date:  2018-04-25       Impact factor: 2.963

View more
  7 in total

1.  Global genome diversity of the Leishmania donovani complex.

Authors:  Susanne U Franssen; Caroline Durrant; Olivia Stark; Bettina Moser; Tim Downing; Hideo Imamura; Jean-Claude Dujardin; Mandy J Sanders; Isabel Mauricio; Michael A Miles; Lionel F Schnur; Charles L Jaffe; Abdelmajeed Nasereddin; Henk Schallig; Matthew Yeo; Tapan Bhattacharyya; Mohammad Z Alam; Matthew Berriman; Thierry Wirth; Gabriele Schönian; James A Cotton
Journal:  Elife       Date:  2020-03-25       Impact factor: 8.140

2.  LPG2 Gene Duplication in Leishmania infantum: A Case for CRISPR-Cas9 Gene Editing.

Authors:  Flávio Henrique Jesus-Santos; Jéssica Lobo-Silva; Pablo Ivan Pereira Ramos; Albert Descoteaux; Jonilson Berlink Lima; Valéria Matos Borges; Leonardo Paiva Farias
Journal:  Front Cell Infect Microbiol       Date:  2020-08-13       Impact factor: 5.293

Review 3.  Leishmania tarentolae: Taxonomic classification and its application as a promising biotechnological expression host.

Authors:  Stephan Klatt; Larry Simpson; Dmitri A Maslov; Zoltán Konthur
Journal:  PLoS Negl Trop Dis       Date:  2019-07-25

4.  Leishmania Mitochondrial Genomes: Maxicircle Structure and Heterogeneity of Minicircles.

Authors:  Esther Camacho; Alberto Rastrojo; África Sanchiz; Sandra González-de la Fuente; Begoña Aguado; Jose M Requena
Journal:  Genes (Basel)       Date:  2019-09-26       Impact factor: 4.096

5.  The Experimental Proteome of Leishmania infantum Promastigote and Its Usefulness for Improving Gene Annotations.

Authors:  África Sanchiz; Esperanza Morato; Alberto Rastrojo; Esther Camacho; Sandra González-de la González-de la Fuente; Anabel Marina; Begoña Aguado; Jose M Requena
Journal:  Genes (Basel)       Date:  2020-09-02       Impact factor: 4.096

6.  Transcriptional Shift and Metabolic Adaptations during Leishmania Quiescence Using Stationary Phase and Drug Pressure as Models.

Authors:  Marlene Jara; Michael Barrett; Ilse Maes; Clement Regnault; Hideo Imamura; Malgorzata Anna Domagalska; Jean-Claude Dujardin
Journal:  Microorganisms       Date:  2022-01-03

7.  Gene Annotation and Transcriptome Delineation on a De Novo Genome Assembly for the Reference Leishmania major Friedlin Strain.

Authors:  Esther Camacho; Sandra González-de la Fuente; Jose C Solana; Alberto Rastrojo; Fernando Carrasco-Ramiro; Jose M Requena; Begoña Aguado
Journal:  Genes (Basel)       Date:  2021-08-29       Impact factor: 4.096

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.