Diogo A Tschoeke1, Gisele L Nunes2, Rodrigo Jardim1, Joana Lima2, Aline Sr Dumaresq2, Monete R Gomes2, Leandro de Mattos Pereira2, Daniel R Loureiro3, Patricia H Stoco4, Herbert Leonel de Matos Guedes5, Antonio Basilio de Miranda1, Jeronimo Ruiz6, André Pitaluga7, Floriano P Silva8, Christian M Probst9, Nicholas J Dickens10, Jeremy C Mottram10, Edmundo C Grisard4, Alberto Mr Dávila1.
Abstract
Leishmaniasis is an infectious disease caused by Leishmania species. Leishmania amazonensis is a New World Leishmania species belonging to the Mexicana complex and is able to cause all types of leishmaniasis infections. The L. amazonensis reference strain MHOM/BR/1973/M2269 was sequenced, identifying 8,802 coding sequences (CDS), most of them of hypothetical function. Comparative analysis using six Leishmania species showed a core set of 7,016 orthologs. L. amazonensis and Leishmania mexicana share the largest number of distinct orthologs, while Leishmania braziliensis presented the largest number of inparalogs. Additionally, phylogenomic analysis confirmed the taxonomic position of L. amazonensis within the "Mexicana complex", reinforcing understanding of the split between New and Old World Leishmania. Potential non-homologous isofunctional enzymes (NISE) were identified between L. amazonensis and Homo sapiens that could provide new targets for drug development.
Keywords: Leishmania amazonensis; comparative genomics; phylogenomics
Year: 2014 PMID: 25336895 PMCID: PMC4182287 DOI: 10.4137/EBO.S13759
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Summary of the Leishmania amazonensis assembly and genome.
| 10,305 | |
| Sum of consensus sequences length | 29,670,588 bases |
| Number of scaffolds >1 K nt | 4,827 (46.8%) |
| Number of scaffolds >10 K nt | 732 (7.1%) |
| Number of scaffolds >100 K nt | 2 (0.02%) |
| Coding genes: CDS | 8,802 |
| Chromosome | 34 |
| %GC content: Contigs/CDS | 59%/(61.125%) |
| Max (bases) | 141,211/(19,872) |
| Min (bases) | 96/(66) |
| Mean (bases) | 2,879/(1,637) |
| Median (bases) | 853/(1,209) |
| N50 scaffold length | 8,346 |
| Molecular function | 4,065 |
| Biological process | 4,007 |
| Cellular component | 4,054 |
| 3,075 | |
| 6,144 | |
| 5,554 | |
| 887 | |
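The N50 and the scaffold-size fractions reported in the summary above can be derived from the list of scaffold lengths alone. The following is a minimal sketch, not the authors' pipeline: it assumes the common definition of N50 (the length of the shortest scaffold in the smallest set of longest scaffolds that covers at least half of the assembly), and the function names and toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (common definition)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


def fraction_longer_than(lengths, threshold):
    """Fraction of scaffolds strictly longer than `threshold` bases."""
    return sum(1 for n in lengths if n > threshold) / len(lengths)


# Toy lengths for illustration only (not taken from the assembly).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
toy_fraction = fraction_longer_than(toy_lengths, 250)

# Percentage implied by two counts that appear in the table.
pct_over_1k = round(100 * 4827 / 10305, 1)
```

With the counts from the table, 4,827 out of 10,305 gives 46.8%, which is consistent with 10,305 being the total number of scaffolds.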
Figure 1. Average size (bases) of the putative CDS identified in the L. amazonensis genome.
Summary table of the most abundant and single-copy genes/domains found in the Leishmania amazonensis genome analysis.
| MOST ABUNDANT GENES/DOMAINS | SINGLE COPY GENES/DOMAINS |
|---|---|
| ABC transporter | rpS2 |
| Amastin surface glycoprotein | rpS5 |
| ATP-dependent RNA helicase | rpS8 |
| Calpains | rpS10 |
| Dynein heavy chain | rpS12 |
| Heat Shock Proteins (HSPs) | rpL7 |
| Kinesin | rpL12 |
| Protein kinase | rpL13 |
| WD40 | rpL19 |
| Chaperone DNAJ | rpL23 |
Notes: Most abundant genes/domains found in the initial Leishmania amazonensis genome analysis, and genes/domains found in single copy during the analysis. 40S ribosomal proteins (rpS) and 60S ribosomal proteins (rpL).
List of orphan proteins found in Leishmania amazonensis, with their respective identification, description and length (aa).
| IDENTIFICATION | DESCRIPTION | LENGTH |
|---|---|---|
| LAJMNGS001H06.b.195 | Unspecified product | 98 |
| LAJMNGS002H09.b.421 | Unspecified product | 150 |
| LAJMNGS005H02.b.1027 | Hypothetical protein, conserved | 79 |
| LAJMNGS006F03.b.1178 | Hypothetical protein, unknown function | 771 |
| LAJMNGS018E09.b.3196 | Carboxypeptidase, putative | 325 |
| LAJMNGS018H07.b.3264 | Hypothetical protein | 951 |
| LAJMNGS027A04.b.4532 | Hypothetical protein | 167 |
| LAJMNGS030G04.b.5103 | Unspecified product | 48 |
| LAJMNGS031F02.b.5255 | Unspecified product | 212 |
| LAJMNGS038C10.b.6191 | Unspecified product | 37 |
| LAJMNGS038E01.b.6205 | Unspecified product | 94 |
| LAJMNGS051A11.b.7995 | Hypothetical protein | 139 |
| NODE_5216_1 | Hypothetical protein, Unknown function | 68 |
| NODE_20256_1 | Unspecified product | 81 |
Figure 2. Protein families identified using the Pfam database.
Notes: Only the 20 most abundant families are represented in the figure. The remaining families are grouped in the green square, and uncharacterized proteins are in blue.
Figure 3. Conserved domains identified by RpsBlast against the CDD database.
Notes: Only the 20 most abundant domains are represented in the legend. The remaining families are grouped in the green square, and uncharacterized proteins are in purple.
Figure 4. Gene Ontology results for protein characterization at the level of molecular function (A), biological process (B) and cellular component (C). Only the 20 most abundant categories are listed.
Figure 5. Functional categories assigned by KOG and COG to Leishmania amazonensis proteins. INFORMATION STORAGE AND PROCESSING: [J] Translation, ribosomal structure and biogenesis, [A] RNA processing and modification, [K] Transcription, [L] Replication, recombination and repair, [B] Chromatin structure and dynamics. CELLULAR PROCESSES AND SIGNALING: [D] Cell cycle control, cell division, chromosome partitioning, [Y] Nuclear structure, [V] Defense mechanisms, [T] Signal transduction mechanisms, [M] Cell wall/membrane/envelope biogenesis, [N] Cell motility, [Z] Cytoskeleton, [W] Extracellular structures, [U] Intracellular trafficking, secretion, and vesicular transport, [O] Posttranslational modification, protein turnover, chaperones. METABOLISM: [C] Energy production and conversion, [G] Carbohydrate transport and metabolism, [E] Amino acid transport and metabolism, [F] Nucleotide transport and metabolism, [H] Coenzyme transport and metabolism, [I] Lipid transport and metabolism, [P] Inorganic ion transport and metabolism, [Q] Secondary metabolites biosynthesis, transport and catabolism. POORLY CHARACTERIZED: [R] General function prediction only, [S] Function unknown.
Figure 6. Phylogenomic tree for all six Leishmania species (in red) and 28 other protozoan species, inferred by Maximum Likelihood with 1,000 bootstrap replicates, based on thirty-one universal orthologous (UO) genes. Names and abbreviations of the 34 species: Angomonas deanei (A deanei), Strigomonas culicis (S culicis), Leishmania amazonensis (L amazonensis), Leishmania braziliensis (L braziliensis), Leishmania donovani (L donovani), Leishmania infantum (L infantum), Leishmania major (L major), Leishmania mexicana (L mexicana), Trypanosoma brucei (T brucei), Trypanosoma cruzi (T cruzi), Trypanosoma vivax (T vivax), Giardia lamblia (G lamblia), Naegleria gruberi (N gruberi); Dictyosteliida spp.: Dictyostelium discoideum and Polysphondylium pallidum; Trichomonas vaginalis (T vaginalis); Entamoeba spp.: Entamoeba dispar, Entamoeba histolytica and Entamoeba invadens; Tetrahymena thermophila (T thermophila); Plasmodium spp.: Plasmodium berghei, Plasmodium cynomolgi, Plasmodium falciparum, Plasmodium knowlesi and Plasmodium vivax; Coccids spp.: Cryptosporidium muris, Neospora caninum and Toxoplasma gondii; Piroplasmids spp.: Babesia bovis, Babesia equi, Babesia microti, Theileria annulata, Theileria orientalis and Theileria parva.
Figure 7. Comparative analysis of Leishmania species using orthologous and paralogous protein groups generated by OrthoMCL. The colors represent the number of proteins shared between the species. Blue: inparalogous groups within a single species; green: orthologous groups shared between two species (L. amazonensis and L. mexicana: 18; L. amazonensis and L. donovani: 15; L. amazonensis and L. braziliensis: 9; L. amazonensis and L. major: 4; L. amazonensis and L. infantum: 1); red: 7,026 orthologous groups shared between all six Leishmania species. Orthologous groups shared between 3, 4 and 5 species are in yellow.
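The color classes in Figure 7 amount to sorting each group by how many species it contains. The following is a minimal sketch of that bookkeeping, assuming the groups have already been parsed into lists of (species, protein ID) pairs; the group names, protein IDs and the `classify_group` helper are hypothetical and are not part of OrthoMCL.

```python
from collections import Counter

SPECIES = frozenset({
    "L. amazonensis", "L. braziliensis", "L. donovani",
    "L. infantum", "L. major", "L. mexicana",
})


def classify_group(members, all_species=SPECIES):
    """Classify one group by the set of species represented in it."""
    present = {species for species, _protein in members}
    if present == all_species:
        return "core"        # shared by all six species (red)
    if len(present) == 1:
        return "inparalog"   # members from a single species (blue)
    if len(present) == 2:
        return "pairwise"    # shared by exactly two species (green)
    return "partial"         # shared by 3, 4 or 5 species (yellow)


# Hypothetical toy groups for illustration only.
toy_groups = {
    "G1": [(sp, f"prot_{i}") for i, sp in enumerate(sorted(SPECIES))],
    "G2": [("L. amazonensis", "a1"), ("L. amazonensis", "a2")],
    "G3": [("L. amazonensis", "a3"), ("L. mexicana", "m1")],
    "G4": [("L. major", "j1"), ("L. infantum", "i1"), ("L. donovani", "d1")],
}
toy_counts = Counter(classify_group(members) for members in toy_groups.values())
```

Applied to a full set of groups, the `core` count corresponds to the red class and the `pairwise` count, split by species pair, to the green class.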
Orthologous groups shared between L. amazonensis and other Leishmania species, and inparalogous groups from L. amazonensis. Only characterized orthologs are listed.
| ORTHOMCL | PFAM ANNOTATION | CDD ANNOTATION | PROTEIN DESCRIPTION | L. | L. | L. | L. | |
|---|---|---|---|---|---|---|---|---|
| ORTHOMCL7819 | LAJMNGS015A07.b.2588 | Triacylglycerol lipase | X | |||||
| ORTHOMCL7785 | NODE_9861_1 | Kinetoplast-associated protein | X | X | ||||
| ORTHOMCL7789 | NODE_20602_1 | pfam07344, Amastin | Unspecified product | X | X | |||
| ORTHOMCL7794 | NODE_11369_4 | pfam13766, ECH_C,2-enoyl-CoA Hydratase | 3-hydroxyisobutyryl-coenzyme a hydrolase | X | X | |||
| ORTHOMCL7802 | LAJMNGS046H11.b.7373 | Viscerotropic | X | X | ||||
| ORTHOMCL7803 | LAJMNGS046G07.b.7351 | cd00051, EFh, EF-hand, calcium binding motif | Flagellar calcium-binding protein, putative | X | X | |||
| ORTHOMCL7808 | LAJMNGS033H06.b.5610 | PTZ00201, amastin surface glycoprotein | Amastin-like protein | X | X | |||
| ORTHOMCL7813 | LAJMNGS029D07.b.4914 | PF13415.1 Kelch_3 | pfam01344, Kelch_1 | Hypothetical protein | X | X | ||
| ORTHOMCL7814 | LAJMNGS029D03.b.4903 | PF00806.14PUF | cd07920, Pumilio | Unspecified product | X | X | ||
| ORTHOMCL7822 | LAJMNGS010A05.b.1767 | PTZ00428, 60S ribosomal protein L4 | Ribosomal protein L1a, putative | X | X | |||
| ORTHOMCL7717 | NODE_33600_1 | Amino acid permease | X | X | ||||
| ORTHOMCL7788 | NODE_21871_1 | PTZ00201, amastin | Amastin-like protein | X | X | |||
| ORTHOMCL7790 | NODE_20189_1 | cd03213, ABCG_EPDR | ATP-binding cassette protein subfamily G, member 1 | X | X | |||
| ORTHOMCL7792 | NODE_12712_1 | PTZ00263, protein kinase A | Protein kinase A catalytic subunit isoform 2 | X | X | |||
| ORTHOMCL7795 | NODE_10493_4 | Phosphoglycan beta 1,3 galactosyltransferase 4 | X | X | ||||
| ORTHOMCL7800 | LAJMNGS047C07.b.7405 | COG1788, Acyl CoA: acetate/3-ketoacid | Succinyl-coa:3-ketoacid-coenzyme a transferase- like protein | X | X | |||
| ORTHOMCL7801 | LAJMNGS047B12.b.7397 | PTZ00243, ABC transporter | Multidrug resistance protein, putative,p-glycoprotein, putative, ABC transporter | X | X | |||
| ORTHOMCL7809 | LAJMNGS033B09.b.5509 | Calpain-like cysteine peptidase | X | X | ||||
| ORTHOMCL7811 | LAJMNGS030F01.b.5087 | Vacuolar-type Ca2 -ATPase, putative | X | X | ||||
| ORTHOMCL7823 | LAJMNGS008G11.b.1562 | COG1621,SacC, Beta-fructosidases | Beta-fructosidase, invertase,sucrose hydrolase | X | X | |||
| ORTHOMCL7719 | NODE_11708_1 | PTZ00186, heat shock 70 kDa | Heat shock 70-related protein 1, mitochondrial precursor, putative | X | X | |||
| ORTHOMCL7793 | NODE_1256_4 | PLN00220, tubulin beta chain | Beta tubulin | X | X | |||
| ORTHOMCL7798 | LAJMNGS050C12.b.7841 | PLN02880, tyrosine decarboxylase | Tyrosine/dopa decarboxylase | X | X | |||
| ORTHOMCL7815 | LAJMNGS027C01.b.4566 | pfam00201, UDP-glucoronosyl and glucosyl transferase | Hypothetical protein, conserved | X | X | |||
| ORTHOMCL7818 | LAJMNGS015D10.b.2693 | PTZ00261, acyltransferase | Unspecified product | X | X | |||
| ORTHOMCL7825 | LAJMNGS008E08.b.1511 | PF00107.21ADH_zinc_N | cd08250, Mgc45594_like | Oxidoreductase-like protein | X | X | ||
Figure 8. In the green area, a total of 2,483 L. amazonensis proteins were identified by both the Conserved Domains Database (RpsBlast-CDD) and the Protein Families Database (HMMER-Pfam). The lateral tables show the most frequent families (Pfam) and domains (CDD). A total of 269 proteins were identified only by the Protein Families Database; the yellow area shows the 10 most frequent families found by Pfam. A total of 2,634 proteins were identified only by the Conserved Domains Database (CDD); the blue area shows the 10 most common domains assigned by CDD in L. amazonensis.
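The three regions of Figure 8 are plain set operations on the proteins annotated by each database. The following is a minimal sketch with hypothetical protein IDs and a hypothetical `venn_counts` helper; only the three counts at the end are taken from the caption.

```python
def venn_counts(pfam_hits, cdd_hits):
    """Count proteins annotated by both databases or by only one of them."""
    pfam_hits, cdd_hits = set(pfam_hits), set(cdd_hits)
    return {
        "both": len(pfam_hits & cdd_hits),
        "pfam_only": len(pfam_hits - cdd_hits),
        "cdd_only": len(cdd_hits - pfam_hits),
    }


# Hypothetical toy IDs for illustration only.
toy = venn_counts({"p1", "p2", "p3"}, {"p2", "p3", "p4", "p5"})

# Totals implied by the counts reported in the caption.
both, pfam_only, cdd_only = 2483, 269, 2634
pfam_total = both + pfam_only            # proteins with any Pfam hit
cdd_total = both + cdd_only              # proteins with any CDD hit
either = both + pfam_only + cdd_only     # proteins with a hit in at least one
```

With the caption's counts, this gives 2,752 proteins with a Pfam hit, 5,117 with a CDD hit, and 5,386 with a hit in at least one of the two databases.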
Figure 9. Alignment of DNA-directed RNA polymerase, alpha subunit, sequences from the six Leishmania species. Sites with identical residues are shown in the same color, and asterisks indicate residues conserved in all species.
Intergenomic NISEs, their official enzyme names, sequence IDs, UniProt IDs for human sequences, PDB structures and the identity for each sequence.
| EC | ENZYME NAME (OFFICIAL) | ORGANISM | SEQUENCES IDs (*) | UNIPROT ACCESS | PDB [BEST HIT] (**) | IDENTITY [PDB] |
|---|---|---|---|---|---|---|
| 1.1.1.2 | Alcohol dehydrogenase (NADP(+)) | LAJMNGS050H11.b.7960 | N/A | 1UUF | 160/332 (48%) | |
| hsa:10327 | P14550 | 2ALR | Structure solved | |||
| 1.3.1.34 | 2,4-dienoyl-CoA reductase (NADPH) | LAJMNGS010C07.b.1806 | N/A | 1PS9 | 294/730 (40%) | |
| LAJMNGS024B09.b.4107 | N/A | 198/658 (30%) | ||||
| hsa:1666 | Q16698 | 1W6U | Structure solved | |||
| hsa:26063 | Q9NUI1 | 4FC6 | Structure solved | |||
| 1.3.1.74 | 2-alkenal reductase | LAJMNGS036G08.b.6014 | N/A | 4GBY | 139/482 (29%) | |
| hsa:22949 | Q14914 | 1ZSV (+) | Structure solved | |||
| 2.7.4.2 | Phosphomevalonate kinase | LAJMNGS005E09.b.95 | N/A | N/A | N/A | |
| hsa:10654 | Q15126 | 3CH4 | Structure solved | |||
| Q6FGV9 | ||||||
| 3.1.11.2 (Predicted NISE) | Exodeoxyribonuclease III | LAJMNGS001G08.b.166 | N/A | N/A | N/A | |
| hsa:5810 | O60671 | 3G65 (+) | Structure solved | |||
| hsa:5883 | Q99638 | 3GGR (+) | Structure solved | |||
| hsa:11219 | Q9BQ50 | 1Y97 | Structure solved | |||
| hsa:11277 | Q9NSU2 | 3U6F | 178/304 (59%) | |||
| Q5TZT0 | ||||||
| 5.3.3.2 | Isopentenyl-diphosphate Delta-isomerase | LAJMNGS034G09.b.5743 | N/A | 2ZRU | 118/352 (34%) | |
| hsa:91734 | Q9BXS1 | 2PNY | Structure solved | |||
| hsa:3422 | Q13907 | 2ICJ | Structure solved |
Notes: (*) The sequence IDs from H. sapiens are from the KEGG database. (**) The (+) sign in the “PDB [Best hit]” column indicates that more solved structures exist for this sequence.
Intragenomic NISEs, their official enzyme names, sequence IDs, identified PDB structures and the identity for each sequence.
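The “IDENTITY [PDB]” entries in the intergenomic table pair a ratio with a rounded percentage, for example `160/332 (48%)`. The following is a minimal sketch for recomputing the percentage from such a cell, assuming the ratio means identical residues over aligned positions (the usual BLAST-style reading, which the table does not state explicitly); the `percent_identity` helper is hypothetical.

```python
import re

# Matches cells such as "160/332 (48%)"; the percentage part is optional.
IDENTITY_CELL = re.compile(r"\s*(\d+)/(\d+)(?:\s*\(\d+%\))?\s*")


def percent_identity(cell):
    """Return the rounded percent identity of a cell, or None if it has no ratio."""
    match = IDENTITY_CELL.fullmatch(cell)
    if match is None:
        return None  # e.g. "N/A" or "Structure solved"
    identical, aligned = int(match.group(1)), int(match.group(2))
    return round(100 * identical / aligned)


# Cells copied from the intergenomic NISE table.
cells = ["160/332 (48%)", "294/730 (40%)", "198/658 (30%)",
         "139/482 (29%)", "178/304 (59%)", "118/352 (34%)"]
recomputed = [percent_identity(cell) for cell in cells]
```

Under that reading, the recomputed values agree with the percentages printed in the table.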
| EC | ENZYME NAME (OFFICIAL) | ORIGINAL ANNOTATION | SEQUENCES IDs: | PDB [BEST HIT] | IDENTITY [PDB] |
|---|---|---|---|---|---|
| 4.2.1.1 | Carbonate dehydratase | carbonic anhydrase-like protein | LAJMNGS019E05.b.3366 | 4G7A | 53/164 (32%) |
| Carbonate dehydratase | carbonic anhydrase family protein, putative | LAJMNGS035D05.b.5816 | 1I6O | 97/229 (42%) | |
| 4.2.99.18 | DNA-(apurinic or apyrimidinic site) lyase | endonuclease III, putative | LAJMNGS002A05.b.218 (2) | 1P59 | 66/194 (34%) |
| DNA-(apurinic or apyrimidinic site) lyase | endonuclease/exonuclease protein-like protein | LAJMNGS041H02.b.6678 (2) | 2ISI | 37/106 (35%) | |
| 5.4.2.1 | Phosphoglycerate mutase | phosphoglycerate mutase protein, putative | LAJMNGS013E01.b.2299 | 4IJ5 | 45/152 (30%) |
| Phosphoglycerate mutase | 2,3-bisphosphoglycerate-independent phosphoglycerate mutase,2,3-bisphosphoglycerate-independentphosphoglyceratemutase | LAJMNGS025H05.b.4375 (2) | 3IGY | 497/552 (90%) |
Notes: (*) The numbers in parentheses in the “Sequences IDs: L. amazonensis” column represent the number of copies of this enzyme.
Figure 10. Phylogenetic relationship among argonaute-like genes in trypanosomatids, constructed by Neighbor-Joining with 1,000 bootstrap replicates.
RNAi pathway related sequences in L. amazonensis.
| LAJMNGS037G03.b.6124 |
| LAJMNGS051A10.b.7989 |
| LAJMNGS009D01.b.1653 |
| LAJMNGS023D01.b.3956 |
| LAJMNGS034E11.b.5717 |
| LAJMNGS035F02.b.5853 |
| LAJMNGS002E10.b.336 |
| LAJMNGS005H09.b.1043 |
| LAJMNGS016A07.b.2785 |
| LAJMNGS018H11.b.3270 |
| LAJMNGS021E12.b.3675 |
| LAJMNGS024C05.b.4124 |
| LAJMNGS042E06.b.6755 |
| LAJMNGS045F10.b.7200 |
| LAJMNGS046A05.b.7244 |
| LAJMNGS020H09.b.3587 |
| LAJMNGS021A10.b.3613 |
Figure 11. Structural comparison of selected intergenomic NISE cases between L. amazonensis and human. Top panel (EC 1.1.1.2): LAJMNGS050H11.b.7960 “putative NADP-dependent alcohol dehydrogenase” from L. amazonensis (A) and human aldehyde reductase (PDB 2ALR) (B). Middle panel (EC 1.3.1.34): LAJMNGS010C07.b.1806 “putative 2,4-dienoyl-coa reductase FADH1” from L. amazonensis (C) and human mitochondrial 2,4-dienoyl-CoA reductase (PDB 1W6U) (D). Bottom panel (EC 5.3.3.2): LAJMNGS034G09.b.5743 “putative isopentenyl-diphosphate delta-isomerase” from L. amazonensis (E) and human isopentenyl-diphosphate Delta-isomerase (PDB 2ICK) (F). Models for all proteins are presented as ribbons. Parasite proteins are colored by secondary structure and presented superposed on the templates (gray ribbons) used in homology modeling. Human analogs are colored by secondary structure, except for 1W6U, which is colored by chain and presented superposed on the peroxisomal isoform (PDB) shown as gray ribbons. The insets show details of the proposed catalytic residues and co-factors for each analogous enzyme. Residues colored blue belong to the parasite enzymes, while residues from human analogs are color-coded by atom type.
Figure 12. Comparison between L. mexicana and L. amazonensis. Synteny map of L. mexicana (top) compared to L. amazonensis (bottom). The red lines connect matching sequences, with color intensity proportional to sequence identity: the darker the line, the more similar the sequences. The scale and numbers represent nucleotide positions on the genome/chromosome.