Pasi K Korhonen, Ross S Hall, Neil D Young, Robin B Gasser.
Abstract
BACKGROUND: Here, we created an automated pipeline for the de novo assembly of genomes from Pacific Biosciences (PacBio) long-read and Illumina short-read data using the Common Workflow Language (CWL). To evaluate the performance of this pipeline, we assembled the nuclear genomes of the eukaryotes Caenorhabditis elegans (∼100 Mb), Drosophila melanogaster (∼138 Mb), and Plasmodium falciparum (∼23 Mb) directly from publicly accessible nucleotide sequence datasets and assessed the quality of the assemblies against curated reference genomes.
Keywords: genome assembly; repeatability; workflow automation; workflow language
Year: 2019 PMID: 30821816 PMCID: PMC6451199 DOI: 10.1093/gigascience/giz014
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Diagram of an automated Common Workflow Language (CWL)-based genome assembly pipeline for PacBio long-read and Illumina short-read data. PacBio reads are first pre-processed and then used for assembly and long-read polishing. Illumina reads are cleaned and used to further polish the long-read assembly. Finally, haplotypes are merged in the repeat-masked, polished assembly. While the workflow is running, dependent software tools are automatically deployed from the Bioconda package channel and the DockerHub container repository. The code for the workflow and the Dockerfiles for the Docker containers are stored in a GitHub code repository.
Statistics for the PacBio long-read and Illumina short-read datasets and for the reference genomes of Caenorhabditis elegans, Drosophila melanogaster, and Plasmodium falciparum [a]
| Description | Caenorhabditis elegans | Drosophila melanogaster | Plasmodium falciparum |
|---|---|---|---|
| PacBio raw reads (bp) | 4,726,985,993 | 15,733,529,928 | 5,246,949,826 |
| read count; average length (bp) | 411,459; 11,488 | 1,657,183; 9,494 | 515,155; 10,185 |
| PacBio corrected reads (bp) | 3,795,130,237 | 5,258,127,473 | 653,116,132 |
| read count; average length (bp) | 256,228; 14,812 | 279,988; 18,780 | 32,211; 20,276 |
| PacBio trimmed reads (bp) | 3,644,992,500 | 5,080,646,626 | 600,631,753 |
| read count; average length (bp) | 248,954; 14,641 | 271,623; 18,705 | 30,866; 19,459 |
| PacBio contaminated reads (bp) | 36,479,366 | 20,369 | 50,389 |
| read count; average length (bp) | 2,647; 13,781 | 1; 20,369 | 4; 12,597 |
| PacBio decontaminated reads (bp) | 3,608,513,134 | 5,080,626,257 | 600,581,364 |
| read count; average length (bp) | 246,307; 14,651 | 271,622; 18,705 | 30,862; 19,460 |
| Illumina PE raw reads (bp) | 24,028,252,320 | 42,492,715,000 | 61,074,625,500 |
| read count; average length (bp) | 200,235,436; 120 | 424,927,150; 100 | 244,298,502; 250 |
| Illumina PE cleaned reads (bp) | 16,914,423,470 | 28,126,765,439 | 13,370,453,180 |
| read count; average length (bp) | 66,608,171; 112 | 312,148,126; 90 | 87,161,538; 153 |
| Sequencing depth for PacBio raw data | 47 | 109 | 225 |
| Sequencing depth for trimmed and decontaminated PacBio reads | 36 | 35 | 26 |
| Sequencing depth for Illumina raw reads | 240 | 296 | 2,625 |
| Sequencing depth for Illumina cleaned reads | 169 | 196 | 575 |
| Genome size (bp); sequence count | 100,286,401; 7 | 137,567,484; 8 | 23,292,622; 14 |
| Number of N nucleotides; gap count | 0; 0 | 490,385; 268 | 0; 0 |
| NG90 (bp); LG90 | 13,783,801; 6 | 23,513,712; 5 | 1,067,971; 12 |
| NG50 (bp); LG50 | 17,493,829; 3 | 25,286,936; 3 | 1,687,656; 5 |
| GC-content (%) | 35.44 | 42.08 | 19.34 |
| Complete BUSCO ortholog count | 968 | 1,653 | 148 |
| Complete single-copy BUSCO ortholog count | 962 | 1,641 | 148 |
| Complete duplicated BUSCO ortholog count | 6 | 12 | 0 |
| Fragmented BUSCO ortholog count | 8 | 3 | 1 |
| Missing BUSCO ortholog count | 6 | 2 | 66 |
| Expected BUSCO ortholog count | 982 | 1,658 | 215 |
| Length of coding sequences in reference (bp) | 24,681,654 | 21,683,562 | 12,552,304 |
| Length of non-coding sequences in reference (bp) | 75,604,747 | 115,883,922 | 10,740,318 |
| Number of reference coding sequences | 20,081 | 13,911 | 5,515 |
| Estimated repeat content (%); interspersed repeats (%) | 18.95;18.20 | 20.52;19.04 | 21.84;4.41 |
[a] Caenorhabditis elegans (National Center for Biotechnology Information [NCBI] accession identifier SRR2598966; URL [60]), Drosophila melanogaster [61] (NCBI Sequence Read Archive [SRA] accession identifiers SRX499318 and SRR1211256), and Plasmodium falciparum (NCBI SRA accession identifiers SRR3194817–25 and ERR862169–70) [59].
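The sequencing-depth rows in the table above can be read as total sequenced bases divided by genome size. The following is a minimal sketch of that arithmetic, checked only against the C. elegans column; the function name `sequencing_depth` is hypothetical, and the source does not state which genome-size estimate was used for each species, so values for the other columns may not be reproduced exactly by the reference sizes listed in the table.

```python
def sequencing_depth(total_bases: int, genome_size: int) -> float:
    """Average depth (fold coverage) = sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# C. elegans values copied from the table above (all in bp).
GENOME_SIZE = 100_286_401
datasets = {
    "PacBio raw": 4_726_985_993,
    "PacBio trimmed and decontaminated": 3_608_513_134,
    "Illumina raw": 24_028_252_320,
    "Illumina cleaned": 16_914_423_470,
}

# Round to whole-number fold coverage, as the table reports it.
depths = {
    name: round(sequencing_depth(bases, GENOME_SIZE))
    for name, bases in datasets.items()
}

for name, depth in depths.items():
    print(f"{name}: {depth}x")
```

With these inputs the rounded depths agree with the C. elegans column (47, 36, 240, and 169).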
Metrics for the pipeline assemblies of the Caenorhabditis elegans genome against the reference assembly for this species
| Metrics | Canu contigs | Arrow-polished contigs | Pilon-polished contigs | HaploMerger2-merged contigs |
|---|---|---|---|---|
| Genome size (bp) | 104,147,712 | 104,179,922 | 104,199,510 | 102,615,360 |
| Sequence count | 100 | 100 | 100 | 54 |
| Quast genome fraction (%) | 97.29 | 97.64 | 97.56 | 97.00 |
| Quast aligned length (bp) | 98,056,933 | 98,420,852 | 98,371,646 | 97,651,504 |
| Number of Ns (bp); gap count | 0;0 | 0;0 | 0;0 | 0;0 |
| N(G)90 (bp); L(G)90 | 973,097;34 | 973,604;34 | 973,839;34 | 1,058,765;27 |
| N(G)50 (bp); L(G)50 | 2,859,879;11 | 2,860,369;11 | 2,860,908;11 | 4,165,666;9 |
| GC content (%) | 35.44 | 35.45 | 35.45 | 35.44 |
| Repeat content (%); interspersed repeats (%) | - | - | 20.64;19.33 | 20.41;19.17 |
| Longest sequence (bp) | 7,357,248 | 7,359,834 | 7,361,197 | 11,799,614 |
| Shortest sequence (bp) | 8,435 | 8,435 | 8,429 | 16,463 |
| Quast number of translocations; relocations; inversions | 1;41;14 | 1;36;14 | 1;38;15 | 5;40;13 |
| Quast number of local mis-assemblies | 891 | 709 | 722 | 696 |
| Quast duplication ratio | 1.005 | 1.005 | 1.005 | 1.004 |
| Quast mis-matches | 15,037 | 15,355 | 14,414 | 13,869 |
| Quast indels (≤5 bp; >5 bp) | 41,302;698 | 21,859;811 | 5,397;764 | 5,325;743 |
| Quast indels length | 58,771 | 40,680 | 23,336 | 22,772 |
| Quast mis-matches; indels per 100 kbp | 15.41;43.04 | 15.68;23.15 | 14.73;6.3 | 14.26;6.24 |
| GAGE missing reference bases (nt; %) | 86,628;0.09 | 77,203;0.08 | 76,194;0.08 | 292,272;0.29 |
| GAGE missing assembly bases (nt; %) | 464,022;0.45 | 582,816;0.56 | 548,487;0.53 | 457,713;0.45 |
| GAGE duplicated reference bases | 4,962,481 | 4,775,862 | 4,834,860 | 3,510,166 |
| GAGE compressed reference bases | 596,736 | 586,626 | 595,695 | 712,344 |
| GAGE average identity (%) | 99.92 | 99.94 | 99.96 | 99.96 |
| GAGE nucleotide mis-matches | 10,407 | 9,883 | 9,921 | 9,964 |
| GAGE indels (≤5 bp; >5 bp) | 49,111;529 | 24,590;526 | 5,866;527 | 6,076;528 |
| GAGE number of translocations; relocations; inversions | 32;270;129 | 35;124;300 | 29;129;300 | 42;132;290 |
| Complete single-copy; duplicated BUSCO ortholog count | 948;6 | 963;6 | 964;7 | 964;6 |
| Fragmented; missing BUSCO ortholog count | 21;7 | 10;3 | 8;3 | 8;4 |
| Number of nucleotide mis-matches in; outside CDSs | 1,209;13,828 | 1,156;14,199 | 1,154;13,260 | 1,222;12,647 |
| Number of indels in; outside CDSs | 3,580;38,357 | 1,104;21,499 | 177;5,889 | 149;5,825 |
| Number of affected mRNAs; proteins | 2,877;2,858 | 969;948 | 154;131 | 144;121 |
| Number of non-synonymous; synonymous mutations | 483;553 | 515;590 | 443;551 | 485;579 |
| Number of in-frame indels | 101 | 49 | 48 | 61 |
| Combined accuracy of mis-matches and indels in coding regions (%) | 99.981 | 99.991 | 99.995 | 99.994 |
| Combined accuracy of mis-matches and indels in non-coding regions (%) | 99.789 | 99.855 | 99.922 | 99.925 |
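The N(G)50 and L(G)50 rows above follow the usual contiguity definitions: sort contigs from longest to shortest and accumulate their lengths until a chosen fraction of a target size is reached. The sketch below illustrates this with a toy set of contig lengths (not data from the study); the function name `ngx_lgx` and its parameters are hypothetical. Passing the reference genome size gives NG/LG values, while passing the assembly's own total length gives N/L values.

```python
from typing import Iterable, Tuple


def ngx_lgx(
    lengths: Iterable[int], target_size: int, fraction: float = 0.5
) -> Tuple[int, int]:
    """Return (NGx, LGx) for the given contig lengths.

    NGx is the length of the contig at which the cumulative length of
    contigs, sorted longest first, reaches `fraction` of `target_size`.
    LGx is the number of contigs needed to get there.
    """
    threshold = fraction * target_size
    running_total = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running_total += length
        if running_total >= threshold:
            return length, count
    raise ValueError("assembly is shorter than the requested fraction of the target")


# Toy contig lengths, for illustration only.
toy_contigs = [50, 40, 30, 20, 10]

# NG50/LG50 against a hypothetical reference of 200 bp.
ng50, lg50 = ngx_lgx(toy_contigs, target_size=200)

# N50/L50 against the assembly's own total length (150 bp).
n50, l50 = ngx_lgx(toy_contigs, target_size=sum(toy_contigs))

print(ng50, lg50, n50, l50)
```

Because the reference size is fixed, NG values stay comparable across the assembly stages even when polishing or haplotype merging changes the total assembly length.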
Metrics for pipeline assemblies of the Plasmodium falciparum genome against the reference assembly for this species
| Metrics | Canu contigs | Arrow-polished contigs | Pilon-polished contigs |
|---|---|---|---|
| Genome size (bp) (apicoplast removed) | 23,328,599 | 23,350,837 | 23,350,454 |
| Sequence count (apicoplast removed) | 14 | 14 | 14 |
| Apicoplast genome (bp) | - | - | 34,274 |
| Quast genome fraction (%) | 99.62 | 99.529 | 99.648 |
| Quast aligned length (bp) | 23,252,840 | 23,248,663 | 23,276,411 |
| Number of Ns (bp); gap count | 0;0 | 0;0 | 0;0 |
| N(G)90 (bp); L(G)90 | 1,058,353;12 | 1,059,223;12 | 1,059,208;12 |
| N(G)50 (bp); L(G)50 | 1,709,389;5 | 1,711,020;5 | 1,710,975;5 |
| GC content (%) | 19.34 | 19.33 | 19.33 |
| Repeat content (%); interspersed repeats (%) | - | - | 22.45; 6.78 |
| Longest sequence (bp) | 3,291,378 | 3,294,104 | 3,294,056 |
| Shortest sequence (bp) | 642,032 | 642,892 | 642,874 |
| Quast number of translocations; relocations; inversions | 0;2;0 | 0;2;0 | 0;2;0 |
| Quast number of local mis-assemblies | 43 | 47 | 47 |
| Quast duplication ratio | 1.002 | 1.003 | 1.003 |
| Quast mis-matches | 2,237 | 1,242 | 1,503 |
| Quast indels (≤5 bp; >5 bp) | 14,422;174 | 9,241;168 | 8,783;180 |
| Quast indels length | 21,049 | 14,430 | 13,977 |
| Quast mis-matches; indels per 100 kbp | 9.64;62.9 | 5.36;40.59 | 6.48;38.62 |
| GAGE missing reference bases (nt; %) | 15,710;0.07 | 15,198;0.07 | 15,333;0.07 |
| GAGE missing assembly bases (nt; %) | 12,584;0.05 | 12,774;0.05 | 12,658;0.05 |
| GAGE duplicated reference bases | 112,885 | 281,583 | 193,259 |
| GAGE compressed reference bases | 122,934 | 89,625 | 89,404 |
| GAGE average identity (%) | 99.88 | 99.93 | 99.93 |
| GAGE nucleotide mis-matches | 3,094 | 1,107 | 1,281 |
| GAGE indels (≤5 bp; >5 bp) | 19,815;156 | 11,923;128 | 11,450;131 |
| GAGE number of translocations; relocations; inversions | 14;12;9 | 35;12;10 | 34;12;11 |
| Complete single-copy; duplicated BUSCO ortholog count | 147;0 | 148;0 | 148;0 |
| Fragmented; missing BUSCO ortholog count | 1;67 | 1;66 | 1;66 |
| Number of nucleotide mis-matches in; outside CDSs | 420;1,817 | 356;886 | 348;1,155 |
| Number of indels in; outside CDSs | 1,009;13,577 | 573;8,826 | 486;8,466 |
| Number of affected CDSs | 732 | 430 | 369 |
| Number of affected mRNAs; proteins | 711;704 | 420;418 | 362;360 |
| Number of all anomalies | 15,394 | 9,712 | 9,621 |
| Number of non-synonymous; synonymous mutations | 233;187 | 189;167 | 179;169 |
| Number of in-frame indels | 131 | 84 | 61 |
| Combined accuracy of mis-matches and indels in coding regions (%) | 99.979 | 99.989 | 99.988 |
| Combined accuracy of mis-matches and indels in non-coding regions (%) | 99.875 | 99.921 | 99.922 |
* Circlator [67] was used to establish the size of the apicoplast genome.
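The BUSCO rows can be cross-checked with simple bookkeeping: complete orthologs are the sum of single-copy and duplicated ones, and complete, fragmented, and missing counts together should equal the expected ortholog count (215 for P. falciparum in the first table). The sketch below applies this check to the Canu and Pilon-polished columns above; the `BuscoCounts` class is a hypothetical helper, not part of BUSCO itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    """BUSCO category counts for one assembly."""

    single_copy: int
    duplicated: int
    fragmented: int
    missing: int

    @property
    def complete(self) -> int:
        # Complete = single-copy + duplicated.
        return self.single_copy + self.duplicated

    @property
    def total(self) -> int:
        # Every expected ortholog falls into exactly one category.
        return self.complete + self.fragmented + self.missing

    def percent_complete(self) -> float:
        return 100.0 * self.complete / self.total


# P. falciparum counts copied from the table above.
canu = BuscoCounts(single_copy=147, duplicated=0, fragmented=1, missing=67)
pilon = BuscoCounts(single_copy=148, duplicated=0, fragmented=1, missing=66)

EXPECTED_ORTHOLOGS = 215  # expected BUSCO count for P. falciparum (first table)

for name, counts in {"Canu": canu, "Pilon": pilon}.items():
    consistent = counts.total == EXPECTED_ORTHOLOGS
    print(f"{name}: complete={counts.complete}, total={counts.total}, consistent={consistent}")
```

Both columns sum to the expected count, so the difference between stages is a single ortholog moving from missing to complete.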
Metrics for pipeline assemblies of the Drosophila melanogaster genome against the reference assembly for this species
| Metrics | Canu contigs | Arrow-polished contigs | Pilon-polished contigs | HaploMerger2-merged contigs |
|---|---|---|---|---|
| Genome size (bp) | 157,857,743 | 157,985,917 | 157,986,071 | 129,695,906 |
| Sequence count | 439 | 439 | 439 | 61 |
| Quast genome fraction (%) | 97.907 | 98.1 | 98.095 | 91.514 |
| Quast aligned length (bp) | 138,910,049 | 139,294,859 | 139,287,556 | 126,646,721 |
| Number of Ns (bp); gap count | 0;0 | 0;0 | 0;0 | 0;0 |
| N90 (bp); L90 | 138,987;78 | 139,113;78 | 139,125;78 | 1,615,500;10 |
| N50 (bp); L50 | 10,648,637;6 | 10,656,889;6 | 10,656,888;6 | 13,348,143;4 |
| NG90 (bp); LG90 | 105,872;95 | 104,289;96 | 104,289;96 | 1,615,500;10 |
| NG50 (bp); LG50 | 8,532,606;7 | 8,534,347;7 | 8,534,351;7 | 16,059,280;3 |
| GC content (%) | 41.68 | 41.68 | 41.68 | 42.17 |
| Repeat content (%); interspersed repeats (%) | - | - | 30.15;28.84 | 16.54;14.59 |
| Longest sequence (bp) | 21,669,562 | 21,676,918 | 21,676,919 | 25,791,812 |
| Shortest sequence (bp) | 2,688 | 2,688 | 2,688 | 7,073 |
| Quast number of translocations; relocations; inversions | 74;60;2 | 74;60;2 | 74;60;2 | 39;24;0 |
| Quast number of local mis-assemblies | 610 | 652 | 645 | 313 |
| Quast duplication ratio | 1.031 | 1.032 | 1.032 | 1.006 |
| Quast mis-matches | 8,441 | 6,256 | 6,590 | 4,909 |
| Quast indels (≤5 bp; >5 bp) | 41,716;402 | 8,399;390 | 8,480;390 | 7,222;279 |
| Quast indels length | 51,453 | 16,762 | 16,911 | 12,871 |
| Quast mis-matches; indels per 100 kbp | 6.27;31.28 | 4.64;6.51 | 4.88;6.57 | 3.9;5.96 |
| GAGE missing reference bases (nt; %) | 643,319;0.47 | 644,217;0.47 | 646,300;0.47 | 4,913,341;3.57 |
| GAGE missing assembly bases (nt; %) | 3,608,718;2.29 | 3,655,639;2.31 | 3,655,348;2.31 | 522,589;0.40 |
| GAGE duplicated reference bases | 23,437,831 | 23,161,535 | 23,181,331 | 3,623,824 |
| GAGE compressed reference bases | 1,919,237 | 1,778,270 | 1,783,342 | 7,621,896 |
| GAGE average identity (%) | 99.95 | 99.98 | 99.98 | 99.98 |
| GAGE nucleotide mis-matches | 7,292 | 5,657 | 6,622 | 5,459 |
| GAGE indels (≤5 bp; >5 bp) | 49,597;273 | 9,393;245 | 9,506;245 | 8,825;213 |
| GAGE number of translocations; relocations; inversions | 14;267;73 | 15;306;75 | 15;306;69 | 96;235;96 |
| Complete single-copy; duplicated BUSCO ortholog count | 1,618;19 | 1,634;19 | 1,634;19 | 1,639;11 |
| Fragmented; missing BUSCO ortholog count | 17;4 | 2;3 | 2;3 | 2;6 |
| Number of nucleotide differences in; outside CDSs | 1,697;6,744 | 1,586;4,670 | 1,502;5,088 | 1,584;3,325 |
| Number of indels in; outside CDSs | 4,953;37,143 | 157;8,576 | 158;8,656 | 194;7,272 |
| Number of affected mRNAs; proteins | 2,660;2,640 | 123;105 | 128;109 | 133;120 |
| Number of non-synonymous; synonymous mutations | 687;650 | 586;612 | 575;539 | 590;604 |
| Number of in-frame indels | 94 | 52 | 48 | 42 |
| Combined accuracy of mis-matches and indels in coding regions (%) | 99.969 | 99.992 | 99.992 | 99.992 |
| Combined accuracy of mis-matches and indels in non-coding regions (%) | 99.798 | 99.939 | 99.937 | 99.951 |
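The source does not state how the "combined accuracy" rows are computed. One reading that reproduces the coding-region row of the table above is 100 × (1 − (mis-matches + indels in CDSs) / reference CDS length), using the D. melanogaster reference CDS length from the first table. The sketch below is an assumption-laden check of that reading only; the function name `combined_accuracy` is hypothetical, and the same formula is not claimed to hold for the non-coding rows or for the other tables.

```python
def combined_accuracy(mismatches: int, indels: int, region_length: int) -> float:
    """Percent of positions not affected by a mis-match or an indel.

    Assumed formula (not confirmed by the source): each mis-match and each
    indel counts as one error against the length of the region.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return 100.0 * (1.0 - (mismatches + indels) / region_length)


# D. melanogaster reference CDS length (bp), from the first table.
CDS_LENGTH = 21_683_562

# (mis-matches in CDSs, indels in CDSs) per stage, from the table above.
stages = {
    "Canu": (1_697, 4_953),
    "Arrow": (1_586, 157),
    "Pilon": (1_502, 158),
    "HaploMerger2": (1_584, 194),
}

# Format to three decimals, matching the precision used in the table.
accuracy = {
    name: f"{combined_accuracy(mm, indel, CDS_LENGTH):.3f}"
    for name, (mm, indel) in stages.items()
}

for name, value in accuracy.items():
    print(f"{name}: {value}%")
```

Under this assumption the four stages give 99.969, 99.992, 99.992, and 99.992, in line with the coding-region row; most of the gain comes from the drop in indels after Arrow polishing.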
Metrics for unpolished and polished Vembar assemblies of the Plasmodium falciparum genome against the reference assembly
| Metrics | Vembar assembly | Arrow-polished Vembar assembly | Pilon-polished Vembar assembly |
|---|---|---|---|
| Genome size (bp) (apicoplast removed) | 23,556,156 | 23,527,671 | 23,548,582 |
| Sequence count (apicoplast removed) | 20 | 20 | 20 |
| Quast genome fraction (%) | 98.965 | 99.214 | 98.526 |
| Quast aligned length (bp) | 23,203,419 | 23,233,198 | 23,093,770 |
| Number of Ns (bp); gap count | 0;0 | 0;0 | 0;0 |
| N(G)90 (bp); L(G)90 | 1,063,883;12 | 1,062,674;12 | 1,063,566;12 |
| N(G)50 (bp); L(G)50 | 1,712,288;5 | 1,710,421;5 | 1,711,745;5 |
| GC content (%) | 19.37 | 19.4 | 19.37 |
| Longest sequence (bp) | 3,299,835 | 3,294,973 | 3,298,759 |
| Shortest sequence (bp) | 24,138 | 24,220 | 24,138 |
| Quast number of translocations; relocations; inversions | 0;3;0 | 0;2;0 | 0;3;0 |
| Quast number of local mis-assemblies | 46 | 43 | 45 |
| Quast duplication ratio | 1.007 | 1.005 | 1.006 |
| Quast mis-matches | 1,233 | 1,396 | 1,365 |
| Quast indels (≤5 bp; >5 bp) | 31,261;546 | 9,391;213 | 23,638;533 |
| Quast indels length | 52,962 | 15,731 | 44,775 |
| Quast mis-matches; indels per 100 kbp | 5.35;137.98 | 6.04;41.56 | 5.95;105.32 |
| GAGE missing reference bases (nt; %) | 9,435;0.04 | 3,215;0.01 | 9,185;0.04 |
| GAGE missing assembly bases (nt; %) | 48,492;0.21 | 101,507;0.43 | 48,137;0.20 |
| GAGE duplicated reference bases | 239,012 | 330,347 | 219,507 |
| GAGE compressed reference bases | 146,954 | 97,331 | 172,885 |
| GAGE average identity (%) | 99.76 | 99.92 | 99.79 |
| GAGE nucleotide mis-matches | 2,502 | 1,197 | 2,010 |
| GAGE indels (≤5 bp; >5 bp) | 47,266;477 | 13,187;161 | 38,900;478 |
| GAGE number of translocations; relocations; inversions | 69;29;11 | 39;20;9 | 61;23;10 |
| Complete single-copy; duplicated BUSCO ortholog count | 141;0 | 146;0 | 146;0 |
| Fragmented; missing BUSCO ortholog count | 1;73 | 1;68 | 1;68 |
| Number of nucleotide mis-matches in; outside CDSs | 442;791 | 383;1,013 | 449;916 |
| Number of indels in; outside CDSs | 4,172;27,619 | 669;8,925 | 1,748;22,403 |
| Number of affected CDSs | 2,099 | 465 | 1,040 |
| Number of affected mRNAs; proteins | 1,949;1,947 | 457;454 | 1,001;999 |
| Number of all anomalies | 28,410 | 9,938 | 23,319 |
| Number of non-synonymous; synonymous mutations | 252;190 | 209;174 | 252;197 |
| Number of in-frame indels | 268 | 95 | 169 |
| Combined accuracy of mis-matches and indels in coding regions (%) | 99.978 | 99.988 | 99.984 |
| Combined accuracy of mis-matches and indels in non-coding regions (%) | 99.769 | 99.919 | 99.810 |
Metrics between the Vembar and pipeline assemblies of the Plasmodium falciparum genome
| Metrics | Pilon-polished contigs vs. Vembar assembly | Arrow-polished contigs vs. Vembar assembly | Arrow-polished Vembar assembly vs. Vembar assembly | Arrow-polished Vembar assembly vs. Arrow-polished contigs |
|---|---|---|---|---|
| Genome size (bp) | 23,350,454 | 23,350,837 | 23,527,671 | 23,350,837 |
| Sequence count | 14 | 14 | 20 | 14 |
| Quast genome fraction (%) | 99.196 | 99.196 | 99.638 | 99.206 |
| Quast aligned length (bp) | 23,331,625 | 23,332,007 | 23,455,145 | 23,342,276 |
| Number of Ns (bp); gap count | 0;0 | 0;0 | 0;0 | 0;0 |
| N(G)90 (bp); L(G)90 | 1,059,208;12 | 1,059,223;12 | 1,062,674;12 | 1,059,223;12 |
| N(G)50 (bp); L(G)50 | 1,710,975;5 | 1,711,020;5 | 1,710,421;5 | 1,711,020;5 |
| GC content (%) | 19.33 | 19.33 | 19.4 | 19.33 |
| Longest sequence (bp) | 3,294,056 | 3,294,104 | 3,294,973 | 3,294,104 |
| Shortest sequence (bp) | 642,874 | 642,892 | 24,220 | 642,892 |
| Quast number of translocations; relocations; inversions | 2;4;0 | 1;0;0 | 0;0;0 | 1;1;0 |
| Quast number of local mis-assemblies | 8 | 9 | 7 | 3 |
| Quast duplication ratio | 0.999 | 0.999 | 1 | 1 |
| Quast mis-matches | 443 | 458 | 645 | 368 |
| Quast indels (≤5 bp; >5 bp) | 28,490;336 | 28,437;338 | 27,555;314 | 3,901;154 |
| Quast indels length | 41,790 | 41,736 | 39,998 | 7,753 |
| Quast mis-matches; indels per 100 kbp | 2.09;122.05 | 1.96;123.15 | 2.75;118.74 | 1.58;17.37 |
| GAGE missing reference bases (nt; %) | 45,726;0.19 | 45,177;0.19 | 3,275;0.01 | 40,737;0.17 |
| GAGE missing assembly bases (nt; %) | 3,742;0.02 | 3,524;0.02 | 3,191;0.01 | 1,022;0.00 |
| GAGE duplicated reference bases | 30,238 | 29,012 | 122,706 | 41,521 |
| GAGE compressed reference bases | 213,158 | 200,144 | 120,450 | 782,798 |
| GAGE average identity (%) | 99.81 | 99.81 | 99.82 | 99.97 |
| GAGE nucleotide mis-matches | 399 | 414 | 694 | 180 |
| GAGE indels (≤5 bp; >5 bp) | 39,377;213 | 39,755;212 | 38,586;183 | 5,923;43 |
| GAGE number of translocations; relocations; inversions | 49;20;1 | 46;16;1 | 32;15;0 | 35;8;2 |
| Complete BUSCOs | 148 | 148 | 146 | 148 |
| Complete single-copy; duplicated BUSCO ortholog count | 148;0 | 148;0 | 146;0 | 148;0 |
| Fragmented; missing BUSCO ortholog count | 1;66 | 1;66 | 1;68 | 1;66 |
Figure 2: Correlation diagrams of indels for one chromosome of each reassembled reference genome. The columns represent Caenorhabditis elegans, Drosophila melanogaster, and Plasmodium falciparum, from left to right. The y-axis on the left represents the data correlated with indels (gray bars and smoothed black line), whereas the red and blue bars on the right represent positive and negative correlations, respectively. The regions around indels correlate positively with those around nucleotide differences, repeat regions, non-coding non-repeat regions, and gaps in Illumina coverage. In contrast, GC content, coding regions, and Illumina coverage correlate negatively with indels. As expected, given its lack of context bias, PacBio coverage shows no clear correlation with indels and has only a few low-coverage regions in these chromosomes. The correlation patterns for C. elegans and D. melanogaster follow those of P. falciparum, although they are less conspicuous.