Miguel M Pinheiro, Md Atique Ahmed, Scott B Millar, Theo Sanderson, Thomas D Otto, Woon Chan Lu, Sanjeev Krishna, Julian C Rayner, Janet Cox-Singh.
Abstract
Plasmodium knowlesi is a newly described zoonosis that causes malaria in the human population that can be severe and fatal. The study of P. knowlesi parasites from human clinical isolates is relatively new and, in order to obtain maximum information from patient sample collections, we explored the possibility of generating P. knowlesi genome sequences from archived clinical isolates. Our patient sample collection consisted of frozen whole blood samples that contained excessive human DNA contamination and, in that form, were not suitable for parasite genome sequencing. We developed a method to reduce the amount of human DNA in the thawed blood samples in preparation for high-throughput parasite genome sequencing using Illumina HiSeq and MiSeq sequencing platforms. Seven of fifteen samples processed had sufficiently pure P. knowlesi DNA for whole genome sequencing. The reads were mapped to the P. knowlesi H strain reference genome and an average mapping of 90% was obtained. Genes with low coverage were removed, leaving 4623 genes for subsequent analyses. Previously we identified a DNA sequence dimorphism on a small fragment of the P. knowlesi normocyte binding protein xa gene on chromosome 14. We used the genome data to assemble full-length Pknbpxa sequences and discovered that the dimorphism extended along the gene. An in-house algorithm was developed to detect SNP sites co-associating with the dimorphism. More than half of the P. knowlesi genome was dimorphic, involving genes on all chromosomes and suggesting that two distinct types of P. knowlesi infect the human population in Sarawak, Malaysian Borneo. We use P. knowlesi clinical samples to demonstrate that Plasmodium DNA from archived patient samples can produce high quality genome data. We show that analyses of even small numbers of difficult clinical malaria isolates can generate comprehensive genomic information that will improve our understanding of malaria parasite diversity and pathobiology.
Year: 2015 PMID: 25830531 PMCID: PMC4382175 DOI: 10.1371/journal.pone.0121303
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. A screen shot of the Artemis DNA view comparing six Plasmodium knowlesi genome sequences from patient isolates to the Plasmodium knowlesi H strain reference genome sequence.
The P. knowlesi normocyte binding protein xa locus on chromosome 14 is shown. The screen shot shows segregation of the sequences from patient isolates into two groups (n = 3 in each group), and the dimorphism is clearly visible.
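The in-house algorithm used to find SNP sites co-associating with the dimorphism is not described in this record. As a minimal sketch of one way such a test could work, assume a site counts as dimorphic when every isolate in one group carries the same allele and every isolate in the other group carries a single, different allele. The isolate labels (`A1` to `B3`), positions and allele calls below are hypothetical toy data, not values from the study.

```python
from typing import Dict, List, Sequence


def is_dimorphic(alleles_a: Sequence[str], alleles_b: Sequence[str]) -> bool:
    """Return True when each group is fixed for one allele and the two alleles differ."""
    set_a, set_b = set(alleles_a), set(alleles_b)
    return len(set_a) == 1 and len(set_b) == 1 and set_a != set_b


def dimorphic_sites(
    calls: Dict[int, Dict[str, str]],
    group_a: Sequence[str],
    group_b: Sequence[str],
) -> List[int]:
    """Return the sorted positions whose allele calls split cleanly between the two groups."""
    sites = []
    for position, by_isolate in sorted(calls.items()):
        alleles_a = [by_isolate[name] for name in group_a]
        alleles_b = [by_isolate[name] for name in group_b]
        if is_dimorphic(alleles_a, alleles_b):
            sites.append(position)
    return sites


# Hypothetical toy data: position -> allele call per isolate.
GROUP_A = ["A1", "A2", "A3"]
GROUP_B = ["B1", "B2", "B3"]
TOY_CALLS = {
    101: {"A1": "A", "A2": "A", "A3": "A", "B1": "G", "B2": "G", "B3": "G"},  # clean split
    180: {"A1": "C", "A2": "T", "A3": "C", "B1": "T", "B2": "T", "B3": "T"},  # group A not fixed
    250: {"A1": "T", "A2": "T", "A3": "T", "B1": "C", "B2": "C", "B3": "C"},  # clean split
    300: {"A1": "G", "A2": "G", "A3": "G", "B1": "G", "B2": "G", "B3": "G"},  # no variation
}

print(dimorphic_sites(TOY_CALLS, GROUP_A, GROUP_B))
```

A strict "fixed in both groups" rule is only one possible definition; a real pipeline would also need to handle missing calls, mixed infections and low-coverage sites.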
Clinical samples depleted of human DNA using the Whatman filtration method.
| Sample ID | Parasites/µl | Accession ID | Total hDNA (ng, qPCR) | Total Pk DNA (ng, qPCR) | % Pk DNA |
|---|---|---|---|---|---|
| 47 | 128,500 | ND | 302 | 294 | 49.3 |
| 47 (duplicate) | | ERR366425 | 131 | 238 | 64.5 |
| 87 | 47,000 | NS | 225 | 51 | 18.7 |
| 87 (duplicate) | | NS | 6 | 8 | 56 |
| 73 | 40,000 | ERR366426 | 33 | 233 | 87.6 |
| 73 (duplicate) | | ND | 106 | 162 | 60.4 |
| 91 | 29,000 | NS | 269 | 37 | 12 |
| 220 | 10,000 | NS | 735 | 24 | 3.1 |
| 48 | 86,000 | ND | 1402 | 1167 | 45.4 |
| 48 (duplicate) | | ERR274221 | 421 | 824 | 65.7 |
| 50A | 139,000 | ERR274222 | 317 | 2775 | 89.8 |
| 50A (duplicate) | | ND | 415 | 3280 | 88.8 |
| 50B | 390,000 | ERR274223 | 597 | 1239 | 67.5 |
| 55 | 186,500 | NS | 1235 | 428 | 25.7 |
| 55 (duplicate) | | NS | 3418 | 433 | 11.2 |
| 58 | 149,000 | ERR274224 | 336 | 506 | 60 |
| 62 | 139,000 | NS | 988 | 71 | 6.7 |
| 178 | 104,000 | NS | 103 | 11 | 9.4 |
| 233 | 326,000 | NS | 1015 | 313 | 23.5 |
| 258 | 58,000 | NS | 560 | 94 | 14.3 |
| 258 (duplicate) | | NS | 245 | 59 | 19.4 |
| 299 | 66,000 | ERR274225 | 626 | 984 | 61.1 |
Samples sequenced using Illumina HiSeq or MiSeq sequencing platforms are labelled. The amount of parasite DNA recovered and per cent human DNA contamination are given.
*MiSeq.
**HiSeq.
NS = Not suitable.
ND = Duplicate sample suitable but not sequenced.
§ Accession number EMBL-EBI European Nucleotide Archive (http://www.ebi.ac.uk).
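The "% Pk DNA" column appears consistent with the parasite share of total (human plus parasite) DNA measured by qPCR. The sketch below recomputes it under that assumption; the function name is illustrative, and because the published nanogram values are rounded, recomputed percentages can differ from the table in the last digit.

```python
def percent_pk_dna(h_dna_ng: float, pk_dna_ng: float) -> float:
    """Parasite DNA as a percentage of total (human + parasite) DNA."""
    total_ng = h_dna_ng + pk_dna_ng
    if total_ng <= 0:
        raise ValueError("total DNA must be positive")
    return 100.0 * pk_dna_ng / total_ng


# (human DNA ng, P. knowlesi DNA ng) for three rows of the table above.
ROWS = {
    "47": (302, 294),
    "47 duplicate": (131, 238),
    "73": (33, 233),
}

for sample, (h_dna, pk_dna) in ROWS.items():
    print(sample, round(percent_pk_dna(h_dna, pk_dna), 1))
```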
P. knowlesi clinical isolate genome sequence summary report.
| Sequence ID | ERR366426 | ERR366425 | ERR274221 | ERR274223 | ERR274222 | ERR274225 | ERR274224 |
|---|---|---|---|---|---|---|---|
| Technology | MiSeq | MiSeq | HiSeq | HiSeq | HiSeq | HiSeq | HiSeq |
| Sample Ref. | SKS-047 | SKS-073 | SKS-048 | SKS-050B | SKS-050A | SKS-299 | SKS-058 |
| Total Reads | 6003990 | 6130562 | 45217760 | 47211338 | 58756176 | 51769792 | 41469862 |
| Total Reads Mapped | 4731148 | 5049125 | 37095534 | 38967367 | 48466910 | 43203711 | 34547072 |
| Total Reads Not Mapped | 1272842 | 1081437 | 8122226 | 8243971 | 10289266 | 8566081 | 6922790 |
| % Reads Not Mapped | 21 | 17 | 17 | 17 | 17 | 16 | 16 |
| % Human DNA | 1.58 | 1.59 | 0.75 | 0.48 | 0.14 | 0.71 | 0.73 |
| Coverage | 32 | 34 | 166 | 175 | 218 | 195 | 156 |
| Number of SNPs | 267055 | 260888 | 318427 | 317051 | 317072 | 304641 | 304147 |
| Number Dimorphic SNPs | 42771 | 42771 | 42771 | 42771 | 42771 | 42771 | 42771 |
| Zero Coverage (%) | 7.8 | 7.9 | 5.5 | 5.8 | 5.8 | 5.9 | 6 |
| Coverage >1 (%) | 91.6 | 91.4 | 94.1 | 93.9 | 93.9 | 93.7 | 93.6 |
| Coverage >5 (%) | 90.1 | 89.9 | 93.5 | 93.3 | 93.4 | 93.1 | 92.9 |
| Coverage >10 (%) | 88.4 | 88.4 | 92.9 | 92.8 | 93 | 92.6 | 92.4 |
Genome size: 23,487,363 bp.
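The read counts in the summary table can be cross-checked with simple arithmetic. The sketch below confirms that mapped and unmapped reads sum to the total and recomputes the percentage of unmapped reads; the helper names are illustrative. The published percentages look as though the fractional part was dropped rather than rounded, which is an observation from the numbers and not something the source states.

```python
def percent_unmapped(total_reads: int, not_mapped: int) -> float:
    """Unmapped reads as a percentage of all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * not_mapped / total_reads


# (total reads, mapped reads, unmapped reads) copied from the summary table.
RUNS = {
    "ERR366426": (6003990, 4731148, 1272842),
    "ERR274221": (45217760, 37095534, 8122226),
    "ERR274222": (58756176, 48466910, 10289266),
}

for run, (total, mapped, unmapped) in RUNS.items():
    counts_agree = (mapped + unmapped == total)
    print(run, counts_agree, round(percent_unmapped(total, unmapped), 2))
```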
Fig 2. The number and position of SNP sites per gene co-associating with the P. knowlesi genome-wide dimorphism.
Non-synonymous polymorphisms (red) are shown above the line and synonymous polymorphisms (blue) are shown below the line. The line is drawn at zero. The chromosomes are drawn to scale and the height of the bars represents the number of SNP sites per gene per region of each chromosome. The scale is given in the boxed area and is the number of SNP sites per gene.
Summary of P. knowlesi gene ontology (GO) analysis and the proportion of genes in each group with dimorphic SNPs.
| Gene ontology (GO) group | GO-ID | GO subgroup | Total number of genes | Total genes in dimorphism | Proportion in dimorphism |
|---|---|---|---|---|---|
| Molecular function | GO:0060089 | Molecular transducer activity | 5 | 3 | 0.60 |
| Molecular function | GO:0000988 | Protein binding transcription factor activity | 8 | 4 | 0.50 |
| Molecular function | GO:0001071 | Nucleic acid binding transcription factor activity | 31 | 23 | 0.74 |
| Molecular function | GO:0030234 | Enzyme regulator activity | 41 | 24 | 0.59 |
| Molecular function | GO:0005215 | Transporter activity | 101 | 63 | 0.62 |
| Molecular function | GO:0005198 | Structural molecule activity | 151 | 67 | 0.44 |
| Molecular function | GO:0003824 | Catalytic activity | 999 | 653 | 0.65 |
| Molecular function | GO:0005488 | Binding | 1031 | 664 | 0.64 |
| Cellular component | GO:0031012 | Extracellular matrix | 1 | 0 | 0.00 |
| Cellular component | GO:0005576 | Extracellular region | 6 | 3 | 0.50 |
| Cellular component | GO:0031974 | Membrane-enclosed lumen | 55 | 34 | 0.62 |
| Cellular component | GO:0016020 | Membrane | 15 | 12 | 0.80 |
| Cellular component | GO:0005623 | Cell | 1054 | 573 | 0.54 |
| Cellular component | GO:0043226 | Organelle | 813 | 423 | 0.52 |
| Cellular component | GO:0032991 | Macromolecular complex | 447 | 229 | 0.51 |
| Biological process | GO:0051704 | Multi-organism process | 5 | 4 | 0.80 |
| Biological process | GO:0002376 | Immune system process | 3 | 1 | 0.33 |
| Biological process | GO:0022610 | Biological adhesion | 4 | 4 | 1.00 |
| Biological process | GO:0032502 | Developmental process | 8 | 2 | 0.25 |
| Biological process | GO:0040011 | Locomotion | 10 | 5 | 0.50 |
| Biological process | GO:0000003 | Reproduction | 7 | 3 | 0.43 |
| Biological process | GO:0023052 | Signaling | 72 | 40 | 0.56 |
| Biological process | GO:0065007 | Biological regulation | 109 | 59 | 0.54 |
| Biological process | GO:0071840 | Cellular component organization or biogenesis | 184 | 99 | 0.54 |
| Biological process | GO:0050896 | Response to stimulus | 157 | 93 | 0.59 |
| Biological process | GO:0051179 | Localization | 255 | 155 | 0.61 |
| Biological process | GO:0044699 | Single-organism process | 264 | 149 | 0.56 |
| Biological process | GO:0009987 | Cellular process | 1215 | 721 | 0.59 |
| Biological process | GO:0008152 | Metabolic process | 1211 | 716 | 0.59 |
Gene Ontology assigned using Blast2GO—Software for Biologists, http://www.blast2go.com.
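The "Proportion in dimorphism" column is the number of genes carrying dimorphic SNPs divided by the total number of genes in the GO subgroup. A short sketch, with an illustrative function name, recomputes it for a few rows of the table and ranks them:

```python
def proportion_in_dimorphism(total_genes: int, dimorphic_genes: int) -> float:
    """Fraction of a GO subgroup's genes that carry dimorphic SNPs."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    if not 0 <= dimorphic_genes <= total_genes:
        raise ValueError("dimorphic_genes must lie between 0 and total_genes")
    return dimorphic_genes / total_genes


# GO-ID -> (total genes, genes in dimorphism), copied from the table above.
SUBGROUPS = {
    "GO:0001071": (31, 23),
    "GO:0030234": (41, 24),
    "GO:0005198": (151, 67),
    "GO:0003824": (999, 653),
}

ranked = sorted(
    SUBGROUPS,
    key=lambda go_id: proportion_in_dimorphism(*SUBGROUPS[go_id]),
    reverse=True,
)
for go_id in ranked:
    print(go_id, round(proportion_in_dimorphism(*SUBGROUPS[go_id]), 2))
```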
Fig 3. P. knowlesi genes grouped by gene ontology (GO) terms.
The percentage of genes in each GO sub-group of a) molecular function, b) cellular component and c) biological process is shown; n = the total number of mapped annotated genes in each GO sub-group. Genes with no dimorphic SNP sites are shown in brown, genes with 1–9 dimorphic SNP sites in turquoise and genes with ≥10 dimorphic SNP sites in purple. The expected percentage of genes with 1–9 dimorphic SNPs (50%) is marked with a turquoise hatched line and the expected percentage of genes with ≥10 dimorphic SNPs (11%) is marked with a purple hatched line. Gene ontology was assigned using Blast2GO (Software for Biologists, http://www.blast2go.com).
topGO gene enrichment analysis.
| Gene Ontology (Blast2GO) | GO term | GO-ID | ≥1 dimorphic SNPs | ≥10 dimorphic SNPs |
|---|---|---|---|---|
| Molecular function | | | | |
| Binding | Ion binding | GO:0043167 | 0.000077 | 0.01011 |
| Catalytic activity | Helicase activity | GO:0004386 | 0.000095 | |
| Catalytic activity | ATPase activity | GO:0016887 | 0.0014 | 0.02964 |
| Catalytic activity | Hydrolase activity, acting on glycosyl bonds | GO:0016798 | 0.0328 | |
| Catalytic activity | DNA binding | GO:0003677 | 0.0121 | |
| Catalytic activity | GTPase activity | GO:0003924 | 0.0421 | |
| Nucleic acid binding transcription factor activity | Nucleic acid binding transcription factor activity | GO:0001071 | 0.000037 | |
| Catalytic activity | Kinase activity | GO:0016301 | 0.00067 | |
| Cellular component | | | | |
| Cell; Organelle | Microtubule organizing center | GO:0005815 | 0.021 | |
| Cell | Plasma membrane | GO:0005886 | 0.037 | |
| Cell; Organelle | Nucleus | GO:0005634 | 0.0029 | |
| Cell; Organelle | Cytoskeleton | GO:0005856 | 0.0402 | |
| Biological process | | | | |
| Metabolic process; Cellular process | tRNA metabolic process | GO:0006399 | 0.00033 | |
| Metabolic process; Cellular process | Cellular amino acid metabolic process | GO:0006520 | 0.00357 | |
| Single-organism process; Metabolic process; Cellular process | Mitosis | GO:0007067 | 0.00849 | |
| Metabolic process; Cellular process | Nucleobase-containing compound catabolic process | GO:0034655 | 0.01453 | |
| Multi-organism process | Transport | GO:0006810 | 0.03512 | |
| Metabolic process; Cellular process | Cellular protein modification process | GO:0006464 | 0.03678 | |
| Metabolic process; Cellular process | Cellular nitrogen compound metabolic process | GO:0034641 | 0.028 | |
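This record does not say which statistical test was selected in topGO. As a hedged illustration of how a GO term enrichment value can be obtained, the sketch below computes a one-sided hypergeometric tail probability, which is equivalent to a one-sided Fisher's exact test and is a swapped-in stand-in rather than the authors' stated method. The background size and the number of dimorphic background genes are hypothetical placeholders; only the term counts (31 genes, 23 dimorphic, GO:0001071) come from the GO summary table.

```python
from math import comb


def hypergeom_upper_tail(background: int, background_hits: int,
                         term_size: int, term_hits: int) -> float:
    """P(X >= term_hits) when term_size genes are drawn without replacement
    from `background` genes, of which `background_hits` are dimorphic."""
    if not 0 <= background_hits <= background:
        raise ValueError("background_hits must lie between 0 and background")
    if not 0 <= term_size <= background:
        raise ValueError("term_size must lie between 0 and background")
    upper = min(term_size, background_hits)
    favourable = sum(
        comb(background_hits, i) * comb(background - background_hits, term_size - i)
        for i in range(term_hits, upper + 1)
    )
    return favourable / comb(background, term_size)


# HYPOTHETICAL background values, chosen only to make the example runnable.
BACKGROUND_GENES = 2000
BACKGROUND_DIMORPHIC = 1200

# Term counts taken from the GO summary table (GO:0001071).
p_value = hypergeom_upper_tail(BACKGROUND_GENES, BACKGROUND_DIMORPHIC, 31, 23)
print(f"one-sided p = {p_value:.4g}")
```

Exact integer arithmetic via `math.comb` (Python 3.8+) avoids floating-point overflow for large gene counts; a production analysis would more likely call an established statistics library and correct for multiple testing.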