Jennafer A P Hamlin, Guilherme B Dias, Casey M Bergman, Douda Bensasson.
Abstract
Although normally a harmless commensal, Candida albicans is also one of the most common causes of bloodstream infections in the U.S. Candida albicans has long been considered an obligate commensal; however, recent studies suggest it can live outside animal hosts. Here, we have generated PacBio sequences and phased genome assemblies for three C. albicans strains from oak trees (NCYC 4144, NCYC 4145, and NCYC 4146). PacBio datasets are high depth (over 400-fold coverage) and more than half of the sequencing data are contained in reads longer than 15 kb. Primary assemblies showed high contiguity, with several chromosomes for each strain recovered as single contigs, and more than half of the alternative haplotype sequence was assembled in haplotigs at least 174 kb long. Using these assemblies, we were able to identify structural polymorphisms, including a polymorphic inversion over 100 kb in length. These results show that phased de novo diploid assemblies for C. albicans can enable the study of genomic variation within and among strains of an important fungal pathogen.
Keywords: Candida albicans yeast; SMRT sequencing; haplotype phasing
Year: 2019 PMID: 31540974 PMCID: PMC6829152 DOI: 10.1534/g3.119.400486
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
PacBio sequencing statistics for C. albicans oak strains
| | NCYC 4144 | NCYC 4145 | NCYC 4146 |
|---|---|---|---|
| Yield (Gbp) | 6.4 | 6.1 | 8.3 |
| Read number | 703,252 | 689,752 | 863,237 |
| Theoretical coverage | 432x | 415x | 558x |
| Read N50 (bp) | 16,366 | 15,964 | 16,110 |
| Longest read (bp) | 68,892 | 67,764 | 100,529 |
Assuming a 14.8 Mb haploid genome size (van het Hoog ).
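The theoretical coverage row follows from dividing sequencing yield by the assumed haploid genome size. A minimal sketch of that arithmetic, using the rounded yields from the table (so the results only approximate the reported values, which were presumably computed from unrounded yields); the function and variable names are illustrative, not from the original study:

```python
# Haploid genome size assumed in the table footnote (14.8 Mb).
GENOME_SIZE_BP = 14.8e6

# Yields in Gbp, as reported (rounded) in the table above.
yields_gbp = {"NCYC 4144": 6.4, "NCYC 4145": 6.1, "NCYC 4146": 8.3}


def theoretical_coverage(yield_gbp, genome_size_bp=GENOME_SIZE_BP):
    """Fold coverage = total sequenced bases / haploid genome size."""
    return yield_gbp * 1e9 / genome_size_bp


coverage = {strain: theoretical_coverage(y) for strain, y in yields_gbp.items()}

for strain, cov in coverage.items():
    # Rounded yields reproduce the table values only to within a few fold.
    print(f"{strain}: ~{cov:.0f}x")
```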
Figure 1. Read length distribution for the PacBio data sets generated for three C. albicans oak strains. Vertical lines indicate read N50 length.
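For readers unfamiliar with the N50 statistic used for reads and contigs here, the sketch below shows one common way to compute it: sort lengths from longest to shortest and return the length at which the running total first reaches half of all bases. The function name and toy lengths are illustrative assumptions, not the tool or data used in the study.

```python
def n50(lengths):
    """Return the largest length L such that sequences of length >= L
    together contain at least half of the total bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, not real reads): total is 20 bases.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

The same function applies to read lengths (read N50) and to contig or haplotig lengths (assembly N50).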
De novo genome assembly statistics
| | NCYC 4144 | | NCYC 4145 | | NCYC 4146 | |
|---|---|---|---|---|---|---|
| | Primary contigs | Haplotigs | Primary contigs | Haplotigs | Primary contigs | Haplotigs |
| Length (bp) | 14,699,875 | 12,507,746 | 15,453,953 | 13,773,744 | 15,483,668 | 13,063,188 |
| Count | 9 | 68 | 17 | 101 | 18 | 84 |
| GC (%) | 33.6 | 33.51 | 33.68 | 33.49 | 33.59 | 33.55 |
| N50 (bp) | 1,942,916 | 254,098 | 1,652,457 | 174,215 | 1,284,452 | 248,397 |
Figure 2. Diploid assemblies of C. albicans generated in this study (y-axis) aligned to the reference strain SC5314 (x-axis). For each chromosome labeled on the x-axis, the primary contigs and haplotigs are stacked, with the primary contigs always placed first in the lower portion of each plot. Darker vertical grid lines demarcate chromosome limits, and horizontal lines show gaps between contigs. Note that the primary assemblies (lower diagonals for each chromosome) are nearly gapless. These dot plots display only alignments that are at least 20 kb in length, and aligned segments are colored by percent identity with SC5314.
BUSCO scores for the primary contigs and haplotigs of C. albicans de novo assemblies. Values represent percentages, and numbers in parentheses indicate the absolute number of genes in each category.
| BUSCO category | NCYC 4144 | | NCYC 4145 | | NCYC 4146 | |
|---|---|---|---|---|---|---|
| | Primary | Haplotigs | Primary | Haplotigs | Primary | Haplotigs |
| Complete | 97.1 (1660) | 79.2 (1355) | 96.2 (1647) | 88.2 (1510) | 96.9 (1659) | 84.3 (1442) |
| Single-copy | 96.6 (1652) | 78.8 (1348) | 92.3 (1580) | 87.8 (1503) | 93.3 (1597) | 83.9 (1435) |
| Duplicated | 0.5 (8) | 0.4 (7) | 3.9 (67) | 0.4 (7) | 3.6 (62) | 0.4 (7) |
| Fragmented | 1.3 (23) | 2.1 (36) | 1.5 (26) | 2.6 (44) | 1.0 (17) | 2.0 (35) |
| Missing | 1.6 (28) | 18.7 (320) | 2.3 (38) | 9.2 (157) | 2.1 (35) | 13.7 (234) |
The Saccharomycetales ortholog gene set v9 was used in this analysis and comprises a total of 1,711 genes.
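The gene counts in the BUSCO table can be sanity-checked against the 1,711-gene set: complete genes should equal single-copy plus duplicated, and complete plus fragmented plus missing should equal the total. The sketch below runs that check on the counts transcribed from the table; the data layout and helper names are illustrative assumptions. Percentages recomputed this way may differ from the table in the last digit because of rounding.

```python
TOTAL_BUSCOS = 1711  # size of the ortholog set stated above

# Counts transcribed from the table:
# (complete, single_copy, duplicated, fragmented, missing)
counts = {
    ("NCYC 4144", "primary"): (1660, 1652, 8, 23, 28),
    ("NCYC 4144", "haplotigs"): (1355, 1348, 7, 36, 320),
    ("NCYC 4145", "primary"): (1647, 1580, 67, 26, 38),
    ("NCYC 4145", "haplotigs"): (1510, 1503, 7, 44, 157),
    ("NCYC 4146", "primary"): (1659, 1597, 62, 17, 35),
    ("NCYC 4146", "haplotigs"): (1442, 1435, 7, 35, 234),
}


def is_consistent(complete, single, duplicated, fragmented, missing,
                  total=TOTAL_BUSCOS):
    """Check that the BUSCO categories add up as expected."""
    return (complete == single + duplicated
            and complete + fragmented + missing == total)


def percent(count, total=TOTAL_BUSCOS):
    """Share of the gene set, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


results = {key: is_consistent(*row) for key, row in counts.items()}

for (strain, assembly), ok in results.items():
    status = "consistent" if ok else "INCONSISTENT"
    print(f"{strain} {assembly}: {status}")
```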
Figure 3. Depth of contig coverage for three diploid C. albicans assemblies mapped to the haploid reference genome (strain SC5314). Black boxes below each coverage plot highlight the regions defined as loss of heterozygosity in Bensasson and generally correspond to regions of contig depth of coverage = 1.
Figure 4. Structural polymorphisms in the diploid assemblies of chromosome 3 aligned to the haploid reference genome (strain SC5314) at a previously known locus (van het Hoog ; Todd ). (A) Dot plots of both primary contigs and haplotigs showing a small homozygous inversion in strain NCYC 4144 followed by a large heterozygous inversion. In strain NCYC 4146, the small inversion is heterozygous and the large inversion is homozygous, while NCYC 4145 is homozygous for both inversions. Both the small and the large inversion are flanked by inverted repeats, which are more easily recognized on the dot plots when the intervening inversions are absent (the NCYC 4144 or NCYC 4146 haplotigs). (B) Contig alignment depth over the same region. The inverted repeats on both sides of the two polymorphic inversions show the increased contig alignment depth expected for repeats.