Lewis Stevens, Nicolas D Moya, Robyn E Tanny, Sophia B Gibson, Alan Tracey, Huimin Na, Rojin Chitrakar, Job Dekker, Albertha J M Walhout, L Ryan Baugh, Erik C Andersen.
Abstract
The publication of the Caenorhabditis briggsae reference genome in 2003 enabled the first comparative genomics studies between C. elegans and C. briggsae, shedding light on the evolution of genome content and structure in the Caenorhabditis genus. However, despite being widely used, the currently available C. briggsae reference genome is substantially less complete and structurally accurate than the C. elegans reference genome. Here, we used high-coverage Oxford Nanopore long-read and chromosome-conformation capture data to generate chromosome-level reference genomes for two C. briggsae strains: QX1410, a new reference strain closely related to the laboratory AF16 strain, and VX34, a highly divergent strain isolated in China. We also sequenced 99 recombinant inbred lines generated from reciprocal crosses between QX1410 and VX34 to create a recombination map and identify chromosomal domains. Additionally, we used both short- and long-read RNA sequencing data to generate high-quality gene annotations. By comparing these new reference genomes to the current reference, we reveal that hyper-divergent haplotypes cover large portions of the C. briggsae genome, similar to recent reports in C. elegans and C. tropicalis. We also show that the genomes of selfing Caenorhabditis species have undergone more rearrangement than their outcrossing relatives, which has biased previous estimates of rearrangement rate in Caenorhabditis. These new genomes provide a substantially improved platform for comparative genomics in Caenorhabditis and narrow the gap between the quality of genomic resources available for C. elegans and C. briggsae.
Keywords: Caenorhabditis briggsae; comparative genomics; genetic diversity; genome rearrangement; reference genomes; selfing
Year: 2022 PMID: 35348662 PMCID: PMC9011032 DOI: 10.1093/gbe/evac042
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 4.065
Reference genome metrics
| | C. briggsae QX1410 | C. briggsae VX34 | C. briggsae AF16 | C. elegans N2 |
|---|---|---|---|---|
| Accession | PRJNA784955 | PRJNA784955 | PRJNA10731 | PRJNA13758 |
| Version | v1 | v1 | WS279 | WS279 |
| Span (Mb) | 106.2 | 107.0 | 108.4 | 100.2 |
| Number of scaffolds | 6 (+MT) | 6 (+MT) | 367 | 6 (+MT) |
| Number of unassigned scaffolds | 0 | 0 | 361 | 0 |
| Percentage of assembly span in six scaffolds (%) | 100 | 100 | 97.0 | 100 |
| Scaffold N50 (Mb) | 17.1 | 17.3 | 17.5 | 17.5 |
| Number of contigs[a] | 14 | 10 | 5,074 | 6 |
| Contig N50 (Mb)[a] | 14.7 | 16.1 | 0.05 | 17.5 |
| Number of gaps | 7 | 3 | 4,707 | 0 |
| Span of Ns (kb) | 3.5 | 1.5 | 2,965.5 | 0 |
| BUSCO completeness (%)[b] | 99.4 | 99.4 | 98.8 | 99.4 |
| QV score[c] | 45.6 | 44.4 | – | – |

[a] Contig values calculated by splitting scaffolds at ≥10 consecutive Ns.
[b] Genome completeness was assessed using BUSCO (version 4.1.4) with the nematoda_odb10 dataset.
[c] QV scores were calculated by Merqury (version 1.1) using short-read Illumina data. QV scores of 45.6 and 44.4 correspond to one error every 36.2 and 27.5 kb, respectively.
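The contig and QV footnotes describe simple calculations that can be sketched in a few lines. The following is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the N50 definition is the conventional one, and the QV conversion assumes the standard Phred scaling (error probability = 10^(-QV/10)).

```python
import re


def split_contigs(scaffold: str, min_gap: int = 10) -> list[str]:
    """Split a scaffold into contigs at runs of >= min_gap consecutive Ns."""
    parts = re.split(rf"[Nn]{{{min_gap},}}", scaffold)
    return [p for p in parts if p]  # drop empty pieces from leading/trailing gaps


def n50(lengths: list[int]) -> int:
    """Largest length L such that sequences of length >= L cover
    at least half of the total span (conventional N50)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def qv_to_error_interval_kb(qv: float) -> float:
    """Expected kb of sequence per error, assuming Phred scaling:
    error probability = 10 ** (-qv / 10)."""
    return 10 ** (qv / 10) / 1000
```

Under this assumption, `qv_to_error_interval_kb(44.4)` gives roughly 27.5 kb, in line with the footnote. For 45.6 it gives about 36.3 kb, close to the quoted 36.2 kb; the small difference presumably reflects rounding of the reported QV.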
Fig. 1. High-quality reference genomes for two C. briggsae strains. (A) Comparison between the C. briggsae QX1410 and C. elegans N2 reference genomes. Repeat density and protein-coding gene density per 10 kb windows are shown. Repeats were identified de novo using RepeatModeler2. Solid lines represent LOESS smoothing functions fitted to the data. Relative positions of 10,387 one-to-one orthologs are shown as lines joining the two density plots. (B) Whole-genome alignment of AF16 to the QX1410 reference genome generated using nucmer. Alignments shorter than 1 kbp are not shown. Alignments in the reverse orientation are highlighted in red. Inset: chromosome IV showing multiple regions between AF16 and QX1410 that are in different orientations. (C) Whole-genome alignment of VX34 to the QX1410 reference genome generated using nucmer. Alignments shorter than 1 kbp are not shown. Alignments in the reverse orientation are highlighted in red. Inset: the same chromosome IV region as in (B) showing a largely collinear alignment.
Fig. 2. Recombination rates in the C. briggsae genome determined by genotyping 99 QX1410 × VX34 RILs. (A) Marey maps for each chromosome in QX1410. The genetic position of each marker is shown as a function of physical position (black dots). Fits from segmented linear regressions are shown as black lines. Changepoints in the segmented linear regressions were used to estimate chromosome domain boundaries and the rate of recombination. Dashed pink lines indicate the physical position of gaps in the QX1410 genome assembly. Asterisks were added to chromosomal ends where subtelomeric regions are unresolved. (B) Frequency of the QX1410 allele as a function of physical position across every marker in each chromosome. Allele frequency was averaged using a sliding window of 100 kb with a step size of 5 kb. The neutral expected frequency of 0.5 is shown as a dashed horizontal black line.
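The sliding-window averaging described for panel (B) can be sketched as below. This is a minimal illustration under stated assumptions: the function name and inputs are hypothetical, windows are half-open intervals starting at position 0, and windows containing no markers are skipped. The authors' actual implementation is not given in this record.

```python
def sliding_window_mean(positions, freqs, window=100_000, step=5_000):
    """Average allele frequency in windows of `window` bp, advancing `step` bp.

    positions: marker coordinates in bp; freqs: allele frequency per marker.
    Returns a list of (window_start, mean_frequency) tuples.
    Windows are [start, start + window); empty windows are skipped (assumption).
    """
    if not positions:
        return []
    markers = sorted(zip(positions, freqs))
    last = markers[-1][0]
    out = []
    start = 0
    while start <= last:
        end = start + window
        in_window = [f for p, f in markers if start <= p < end]
        if in_window:
            out.append((start, sum(in_window) / len(in_window)))
        start += step
    return out
```

With the defaults, consecutive windows overlap by 95 kb, which smooths marker-to-marker noise while keeping 5 kb resolution along the chromosome.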
Chromosomal domains
| Chr | Metric | Left tip | Left arm | Center | Right arm | Right tip |
|---|---|---|---|---|---|---|
| I | Size (kb) | 388 | 2,803 | 8,017 | 3,762 | 571 |
| | Size (%) | 2.5 | 18.0 | 51.6 | 24.2 | 3.7 |
| | Right end (kb) | 388 | 3,191 | 11,208 | 14,970 | 15,541 |
| | Rate (cM/Mb)[a] | 0 | 8.43 | 0.40 | 7.63 | 0 |
| II | Size (kb) | 566 | 2,844 | 10,578 | 2,583 | 24 |
| | Size (%) | 3.4 | 17.1 | 63.7 | 15.6 | 0.2 |
| | Right end (kb) | 566 | 3,410 | 13,988 | 16,571 | 16,595 |
| | Rate (cM/Mb)[a] | 0 | 3.68 | 1.17 | 21.75 | 0 |
| III | Size (kb) | 716 | 3,249 | 7,859 | 2,171 | 816 |
| | Size (%) | 4.8 | 21.9 | 53.1 | 14.7 | 5.5 |
| | Right end (kb) | 716 | 3,965 | 11,824 | 13,995 | 14,811 |
| | Rate (cM/Mb)[a] | 0 | 7.16 | 0.47 | 10.88 | 0 |
| IV | Size (kb) | 702 | 3,546 | 9,194 | 2,924 | 714 |
| | Size (%) | 4.1 | 20.8 | 53.8 | 17.1 | 4.2 |
| | Right end (kb) | 702 | 4,248 | 13,442 | 16,366 | 17,080 |
| | Rate (cM/Mb)[a] | 0 | 4.14 | 0.67 | 8.99 | 0 |
| V | Size (kb) | 469 | 5,593 | 9,433 | 4,052 | 386 |
| | Size (%) | 2.4 | 28.1 | 47.3 | 20.3 | 1.9 |
| | Right end (kb) | 469 | 6,062 | 15,495 | 19,547 | 19,933 |
| | Rate (cM/Mb)[a] | 0 | 4.20 | 0.44 | 6.68 | 0 |
| X | Size (kb) | 1,444 | 5,622 | 11,456 | 3,639 | 60 |
| | Size (%) | 6.5 | 25.3 | 51.5 | 16.4 | 0.3 |
| | Right end (kb) | 1,444 | 7,066 | 18,522 | 22,161 | 22,221 |
| | Rate (cM/Mb)[a] | 0 | 5.01 | 0.67 | 3.46 | 0 |
| All | Cumulative size (kb) | 4,285 | 24,385 | 55,809 | 19,131 | 2,571 |
| | Cumulative size (%) | 4.0 | 23.0 | 52.6 | 18.0 | 2.4 |

[a] Rates are estimated from the slopes of segmented linear fits with chromosome genetic length scaled to 50 cM.
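The size rows in this table follow directly from the right-end coordinates of each domain. As a small sanity check, the sketch below recomputes domain sizes and percentages from the boundaries; the function name is hypothetical and the one-decimal rounding is an assumption chosen to match the table.

```python
def domain_sizes(right_ends_kb):
    """Given the right-end coordinate (kb) of each domain, in order from the
    left tip to the right tip, return (size_kb, size_percent) per domain."""
    total = right_ends_kb[-1]  # right end of the last domain = chromosome length
    sizes = []
    prev = 0
    for end in right_ends_kb:
        size = end - prev
        sizes.append((size, round(100 * size / total, 1)))
        prev = end
    return sizes


# Right-end coordinates for chromosome I, taken from the table.
chr_I_right_ends_kb = [388, 3191, 11208, 14970, 15541]
print(domain_sizes(chr_I_right_ends_kb))
```

For chromosome I this reproduces the tabulated sizes (388, 2,803, 8,017, 3,762, and 571 kb) and percentages (2.5, 18.0, 51.6, 24.2, and 3.7).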
Fig. 3. Genome-wide divergence between three C. briggsae strains. Genomes were aligned using nucmer and aligned regions of 1 kb or longer are shown. Conserved protein sequences were identified using OrthoFinder and aligned using MAFFT; lines represent LOESS smoothing curves fitted to the amino acid identity data. Grey shading indicates chromosome arm regions defined previously. (A) Nucleotide identity between aligned regions of QX1410 and AF16 genomes, and amino acid identity of protein sequences conserved between QX1410 and AF16. (B) Nucleotide identity between aligned regions of QX1410 and VX34 genomes, and amino acid identity of protein sequences conserved between QX1410 and VX34.
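The amino acid identity values plotted in this figure are derived from pairwise alignments. The record does not say how gapped columns were treated, so the sketch below makes an explicit assumption: identity is the percentage of matching residues over columns where neither sequence has a gap. The function name is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are excluded
    from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100 * matches / len(pairs)
```

Other conventions (for example, counting gapped columns in the denominator) give lower values for the same alignment, so the choice matters when comparing identity across studies.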
Fig. 4. Selfing species have undergone more genome rearrangement than their outcrossing sister species. (A) Caenorhabditis phylogeny showing relationships within the Elegans group (Stevens et al. 2020). The species under comparison are highlighted in blue (selfers) or orange (outcrossers). Branch lengths are in substitutions per site; scale is shown. (B) Percentage of neighboring gene pairs in each chromosome with collinear orthologs between the two selfing and two outcrossing species. (C) The proportion of neighboring genes in 500 kb windows of the C. elegans genome that have collinear orthologs in the C. briggsae genome. Solid lines represent LOESS smoothing functions fitted to the data. Dotted lines represent the positions of the recombination rate domain boundaries (“arms” and “centers”) in C. elegans (Rockman and Kruglyak 2009). (D) Positions of 9,395 one-to-one orthologs in the C. elegans and C. briggsae genomes. Dotted lines represent the positions of the recombination rate domain boundaries (“arms” and “centers”) in C. elegans. (E) The proportion of neighboring genes in 500 kb windows of the C. inopinata genome that have collinear orthologs in the C. nigoni genome. Lines represent LOESS smoothing functions fitted to the data. (F) Positions of 9,395 one-to-one orthologs in the C. inopinata and C. nigoni genomes.
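The collinearity metric in panels (B), (C), and (E) can be illustrated with a short sketch. The exact definition used by the authors is not stated in this record, so the code assumes one plausible reading: a neighboring gene pair in species A is collinear if the one-to-one orthologs of both genes lie on the same chromosome in species B and are also neighbors there, considering only genes that have orthologs. All names are hypothetical.

```python
def collinear_fraction(order_a, ortholog, order_b):
    """Fraction of neighboring gene pairs in species A with collinear orthologs.

    order_a, order_b: dict mapping chromosome -> list of gene IDs in physical order.
    ortholog: dict mapping each species-A gene to its one-to-one species-B ortholog.
    Assumption: only genes with orthologs are ranked; a pair counts as collinear
    when the two orthologs share a chromosome and have adjacent ranks.
    """
    b_with_ortholog = set(ortholog.values())
    rank_b = {}
    for chrom, genes in order_b.items():
        kept = [g for g in genes if g in b_with_ortholog]
        for i, gene in enumerate(kept):
            rank_b[gene] = (chrom, i)

    collinear = total = 0
    for genes in order_a.values():
        kept = [g for g in genes if g in ortholog and ortholog[g] in rank_b]
        for g1, g2 in zip(kept, kept[1:]):
            total += 1
            chrom1, rank1 = rank_b[ortholog[g1]]
            chrom2, rank2 = rank_b[ortholog[g2]]
            if chrom1 == chrom2 and abs(rank1 - rank2) == 1:
                collinear += 1
    return collinear / total if total else 0.0
```

Using `abs(rank1 - rank2) == 1` treats a local inversion of a gene pair as still collinear; a stricter variant would also require the same relative order.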