James M Pflug, Valerie Renee Holmes, Crystal Burrus, J Spencer Johnston, David R Maddison.
Abstract
Measuring genome size across different species can yield important insights into evolution of the genome and allow for more informed decisions when designing next-generation genomic sequencing projects. New techniques for estimating genome size using shallow genomic sequence data have emerged which have the potential to augment our knowledge of genome sizes, yet these methods have only been used in a limited number of empirical studies. In this project, we compare estimation methods using next-generation sequencing (k-mer methods and average read depth of single-copy genes) to measurements from flow cytometry, a standard method for genome size measures, using ground beetles (Carabidae) and other members of the beetle suborder Adephaga as our test system. We also present a new protocol for using read-depth of single-copy genes to estimate genome size. Additionally, we report flow cytometry measurements for five previously unmeasured carabid species, as well as 21 new draft genomes and six new draft transcriptomes across eight species of adephagan beetles. No single sequence-based method performed well on all species, and all tended to underestimate the genome sizes, although only slightly in most samples. For one species, Bembidion sp. nr. transversale, most sequence-based methods yielded estimates half the size suggested by flow cytometry.
Keywords: Carabidae; Flow Cytometry; Genome Size; Insect Genomes; K-mer
Year: 2020 PMID: 32601059 PMCID: PMC7466995 DOI: 10.1534/g3.120.401028
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Read mapping coverage at the beginning and end of each of the Regier set loci using reads from Bembidion sp. nr. transversale DNA2544. The black line indicates the average relative coverage along the length of the locus, and the blue line shows 75 base positions from either end of the locus.
Average flow cytometry genome size measurements. Values given in Mb. SD indicates standard deviation; SE indicates standard error.
| Female genome size (Mb) | Female N | Female SD | Female SE | Male genome size (Mb) | Male N | Male SD | Male SE |
|---|---|---|---|---|---|---|---|
| 2193.41 | 9 | 26.26 | 8.75 | 2118.05 | 9 | 27.77 | 9.26 |
| 837.39 | 1 | — | — | 831.70 | 5 | 6.57 | 2.94 |
| 408.37 | 6 | 12.43 | 5.08 | 391.50 | 4 | 4.01 | 2.00 |
| 597.60 | 1 | — | — | 585.70 | 5 | 5.41 | 2.42 |
| 1040.65 | 4 | 18.42 | 9.21 | 1000.93 | 4 | 19.16 | 9.58 |
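The SE column in the table above appears to follow the usual definition of the standard error of the mean, SE = SD / sqrt(N). The record does not state the formula, so this is an assumption; the short Python sketch below recomputes SE from the SD and N values copied from the table and compares the result with the reported figure. Small differences in the last digit are expected because the published SD values are already rounded.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of N."""
    if n < 2:
        # With a single measurement there is no SD, matching the dashes in the table.
        raise ValueError("standard error needs at least two measurements")
    return sd / math.sqrt(n)


# (SD, N, reported SE) triples copied from the table above.
ROWS = [
    (26.26, 9, 8.75),
    (27.77, 9, 9.26),
    (6.57, 5, 2.94),
    (12.43, 6, 5.08),
    (4.01, 4, 2.00),
    (5.41, 5, 2.42),
    (18.42, 4, 9.21),
    (19.16, 4, 9.58),
]

for sd, n, reported in ROWS:
    recomputed = standard_error(sd, n)
    print(f"SD={sd} N={n} recomputed SE={recomputed:.3f} reported SE={reported}")
```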
Figure 2. Relative red fluorescence and the number of nuclei counted at each fluorescence level of representative male (A) and female (B) Bembidion sp. nr. transversale. Bars around each peak represent statistical gates that provide the total nuclei in that peak, average channel number of nuclei in the peak, and the coefficient of variation (CV). D. virilis standard 1C = 328 Mb, P. americana standard 1C = 3,338 Mb.
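A common way to turn gated fluorescence peaks into a genome size is to scale the known 1C value of a co-prepared standard by the ratio of the sample's mean peak channel to the standard's mean peak channel. The record does not spell out the calculation, so the Python sketch below is an illustration under that assumption. The function name and the channel numbers are hypothetical; only the two standard 1C values come from the caption.

```python
# 1C values of the standards, taken from the Figure 2 caption.
D_VIRILIS_1C_MB = 328.0
P_AMERICANA_1C_MB = 3338.0


def genome_size_from_peaks(sample_channel: float,
                           standard_channel: float,
                           standard_1c_mb: float) -> float:
    """Scale the standard's 1C size by the sample/standard fluorescence ratio.

    Assumes fluorescence is proportional to DNA content and that both peaks
    come from nuclei of the same ploidy level (hypothetical helper).
    """
    if sample_channel <= 0 or standard_channel <= 0:
        raise ValueError("peak channel numbers must be positive")
    return sample_channel / standard_channel * standard_1c_mb


# Hypothetical mean channel numbers, for illustration only.
estimate_mb = genome_size_from_peaks(
    sample_channel=600.0,
    standard_channel=100.0,
    standard_1c_mb=D_VIRILIS_1C_MB,
)
print(f"Estimated 1C genome size: {estimate_mb:.1f} Mb")
```

Running a sample against two standards that bracket it, as the caption suggests, gives two such estimates that can be compared for consistency.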
Summary of results for genomes assembled with CLC Genomics Workbench
| Read Length | Total Reads | Reads After Trim | N50 | Average Contig (bp) | Maximum Contig (bp) | Contig Count | Total Assembled Bases | |
|---|---|---|---|---|---|---|---|---|
| 101 | 59,590,984 | 59,182,421 | 355 | 299 | 79,486 | 905,365 | 270,465,134 | |
| 151 | 717,036,094 | 713,035,405 | 525 | 460 | 140,578 | 1,438,922 | 661,385,315 | |
| 101 | 55,816,204 | 51,442,737 | 673 | 543 | 16,894 | 280,998 | 152,578,104 | |
| 151 | 77,809,004 | 75,284,196 | 1,411 | 893 | 68,744 | 340,400 | 304,113,552 | |
| 101 | 80,214,318 | 79,962,073 | 4,114 | 1,481 | 47,683 | 124,458 | 184,268,766 | |
| 101 | 80,209,476 | 79,995,111 | 366 | 306 | 13,857 | 1,298,649 | 397,294,330 | |
| 101 | 69,486,602 | 69,476,201 | 341 | 289 | 17,107 | 1,160,880 | 335,949,421 | |
| 101 | 88,000,000 | 86,431,416 | 1,258 | 588 | 65,825 | 472,429 | 277,787,110 |
Summary of results for transcriptomes assembled with Trinity
| Read Length | Reads Examined | Transcripts | N50 | Average contig length | Total assembled bases | |
|---|---|---|---|---|---|---|
| 50 | 425,577,514 | 57,119 | 1274 | 898.66 | 35,767,428 | |
| 101 | 22,400,532 | 22,330 | 1727 | 1177.05 | 20,375,906 | |
| 101 | 22,103,790 | 24,327 | 1764 | 1146.54 | 22,566,122 | |
| 100 | 30,791,800 | 26,994 | 1963 | 1338.76 | 23,724,243 | |
| 100 | 38,061,712 | 34,153 | 2184 | 1338.91 | 30,509,665 | |
| 101 | 31,571,972 | 29,159 | 1823 | 1209.21 | 26,260,335 |
Summary of genome size estimates using flow cytometry and sequence-based methods. Values given in Mb. Flow cytometry was not performed on Trachypachus gibbsii DNA3786, Amphizoa insolens DNA3784, and Omoglymmius hamatus DNA3782. CovEST Basic, CovEST Repeat, and GenomeScope analyses were conducted using a k value of 21. Dashes in the GenomeScope column indicate that the sample failed to converge.
| Flow Cytometry | Regier Mapping | ODB Mapping | GenomeScope | CovEST Basic | CovEST Repeat | |
|---|---|---|---|---|---|---|
| — | 710.1 | 610.7 | — | 376.78 | 728.47 | |
| 2,118.1 | 1,291.1 | 1,241.0 | 1,113.67 | 932.04 | 2,140.04 | |
| 2,118.1 | 1,114.9 | 1,113.4 | 1,010.67 | 827.88 | 1,980.24 | |
| 2,118.1 | 897.9 | 866.4 | 945.51 | 813.44 | 1,924.27 | |
| 2,118.1 | 983.3 | 946.4 | 908.41 | 758.09 | 1,480.15 | |
| 831.7 | 603.1 | 645.8 | — | 359.94 | 790.81 | |
| 391.5 | 411.5 | 438.5 | 414.46 | 386.92 | 608.44 | |
| 385.8 | 390.1 | 395.7 | 374.46 | 410.91 | 751.89 | |
| 390.1 | 389.5 | 393.4 | 374.67 | 395.86 | 704.51 | |
| 396.5 | 354.5 | 393.1 | — | 409.81 | 796.39 | |
| 393.6 | 383.1 | 406.7 | 377.09 | 412.78 | 624.53 | |
| 585.7 | 769.1 | 663.1 | 545.72 | 442.32 | 659.97 | |
| 585.7 | 518.1 | 535.9 | 483.39 | 340.38 | 423.66 | |
| 585.7 | 522.1 | 513.9 | 493.63 | 346.52 | 578.96 | |
| — | 1,525.7 | 1,552.1 | 74.03 | 627.13 | 1,188.94 | |
| 1,000.9 | 1,144.9 | 984.6 | 74.58 | 475.11 | 1,221.83 | |
| 1,045.1 | 760.3 | 860.5 | — | 586.67 | 1,127.86 | |
| 1,018.3 | 855.5 | 787.4 | — | 597.29 | 1,145.04 | |
| 1,068.0 | 859.9 | 746.4 | — | 650.22 | 1,454.54 | |
| 1,031.2 | 1,103.7 | 927.6 | — | 678.61 | 1,181.38 | |
| — | 342.0 | 315.5 | 264.08 | 314.81 | 525.13 |
Flow cytometry measurements are species averages of multiple individuals (see Tables 2 and S3).
Samples made with DNA extracted from different tissues of the same individual.
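One way to read the summary table above is to divide each sequence-based estimate by the flow cytometry value for the same row: a ratio near 1 means agreement, and a ratio near 0.5 corresponds to an estimate about half the flow cytometry size. The Python sketch below does this for one row copied from the table. The helper names are hypothetical, and treating a dash as a missing value is an assumption about how the table should be parsed.

```python
from typing import Dict, Optional


def parse_mb(cell: str) -> Optional[float]:
    """Parse a table cell such as '1,291.1'; return None for a dash or blank."""
    cell = cell.strip()
    if cell in {"—", "-", ""}:
        return None
    return float(cell.replace(",", ""))


def ratios_to_flow(row: Dict[str, str]) -> Dict[str, float]:
    """Return estimate / flow-cytometry ratio for every method with a value."""
    flow = parse_mb(row["Flow Cytometry"])
    if flow is None:
        return {}  # no reference measurement for this sample
    ratios = {}
    for method, cell in row.items():
        if method == "Flow Cytometry":
            continue
        value = parse_mb(cell)
        if value is not None:
            ratios[method] = value / flow
    return ratios


# One row copied from the table above (values in Mb).
example_row = {
    "Flow Cytometry": "2,118.1",
    "Regier Mapping": "1,291.1",
    "ODB Mapping": "1,241.0",
    "GenomeScope": "1,113.67",
    "CovEST Basic": "932.04",
    "CovEST Repeat": "2,140.04",
}

for method, ratio in ratios_to_flow(example_row).items():
    print(f"{method}: {ratio:.2f} of the flow cytometry estimate")
```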
Figure 3. Boxplot of average genomic read mapping coverages for each of the Regier (left) and OrthoDB (right) genes for eight representative specimens. Red dots indicate outlier genes with coverage more than three interquartile ranges from the median.
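The outlier rule described in the Figure 3 caption can be sketched in a few lines of Python. This is an illustration, not the authors' script: it assumes that "outside three interquartile ranges from the median" means an absolute deviation from the median greater than 3 × IQR, it uses the default quartile method of `statistics.quantiles` (the caption does not say which quartile definition was used), and the coverage values are made up.

```python
import statistics
from typing import List, Sequence


def coverage_outliers(coverages: Sequence[float], k: float = 3.0) -> List[float]:
    """Return per-gene coverages lying more than k * IQR from the median."""
    q1, _, q3 = statistics.quantiles(coverages, n=4)  # quartile cut points
    iqr = q3 - q1
    median = statistics.median(coverages)
    return [c for c in coverages if abs(c - median) > k * iqr]


# Hypothetical per-gene average coverages; the last gene is far deeper,
# as might happen for a gene that is not truly single-copy.
gene_coverages = [10, 11, 12, 12, 13, 14, 15, 90]
print(coverage_outliers(gene_coverages))
```

Genes flagged this way are natural candidates to exclude before averaging coverage, since a duplicated or repeat-contaminated locus would inflate the mean depth.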
Figure 4. Summary of genome size estimates using flow cytometry and sequence-based methods for the eight adephagan species. Flow cytometry measurements are averages of multiple individuals (see Tables 2 and S3). CovEST Basic, CovEST Repeat, and GenomeScope analyses were conducted using a k value of 21. Sequence-based estimates were obtained from different individual specimens (Table 1) than those analyzed with flow cytometry.
Figure 5. Summary of genome size estimates using flow cytometry and sequence-based methods for the 13 samples. CovEST Basic, CovEST Repeat, and GenomeScope analyses were conducted using a k value of 21. a. Samples made with DNA extracted from different tissues of the same individual.
Figure 6. Meiotic first metaphase cells. (A) Three cells of male Bembidion sp. nr. transversale. (B) Two cells of male Lionepha tuulukwa. Photographs are at the same scale. Scale bar: 10 µm.