| Literature DB >> 31546893 |
Sara L Martin, Jean-Sebastien Parent, Martin Laforest, Eric Page, Julia M Kreiner, Tracey James.
Abstract
Genomic approaches are opening avenues for understanding all aspects of biological life, especially as they begin to be applied to multiple individuals and populations. However, these approaches typically depend on the availability of a sequenced genome for the species of interest. While the number of genomes being sequenced is exploding, one group that has lagged behind is weeds. Although the power of genomic approaches for weed science has been recognized, what is needed to implement these approaches is unfamiliar to many weed scientists. In this review we attempt to address this problem by providing a primer on genome sequencing and examples of how genomics can help answer key questions in weed science, such as: (1) where do agricultural weeds come from; (2) what genes underlie herbicide resistance; and, more speculatively, (3) can we alter weed populations to make them easier to control? This review is intended as an introduction to orient weed scientists who are thinking about initiating genome sequencing projects to better understand weed populations, to highlight recent publications that illustrate the potential for these methods, and to provide direction to key tools and literature that will facilitate the development and execution of weed genomic projects.
Keywords: genome scans; genomics; non-target site resistance; plant genome assembly; population genetics; population genomics; weeds
Year: 2019 PMID: 31546893 PMCID: PMC6783936 DOI: 10.3390/plants8090354
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
Metrics of continuity and completion for weed genome assemblies available from GenBank. This list was compiled by searching GenBank [34] in May 2019 for species included on one of the following five lists of weeds: (1) species with herbicide resistance, maintained at weedscience.org by Heap [35]; (2) the United States Department of Agriculture’s Federal Noxious Weed List [36]; (3) Weeds of National Significance in Australia [37]; (4) Weber and Gut’s list of weeds spreading in Europe [38]; or (5) the Canadian Weed Seed Order [39]. Year is the year the assembly was submitted to GenBank. As with assembly level, the coverage, sequencing technology, and assembly method were recorded from the assembly information page on GenBank. The number of contigs (greater than 500 bp long), assembled genome size, N50, and NG50 were determined by QUAST (v. 5.0.2). The number of BUSCOs that were (C)omplete, complete and (S)ingle-copy, complete and (D)uplicated, (F)ragmented, or (M)issing were determined using the eudicotyledons_odb10 set of 2121 conserved genes and BUSCO version 3.0.2. Where we could not locate a publication for the genome, we have reported the lead author listed as having submitted the genome to GenBank. Note that additional weed genomes may be available on CoGe, Phytozome, and the European nucleotide database.
| Common Name | Latin Name | Year | Level of Assembly | No. of Contigs | Est. Genome Size (Mbp) | Assembled Size (Mbp) | N50 (bp) | NG50 (bp) | BUSCO C (% of 2121 Genes) | BUSCO S (%) | BUSCO D (%) | BUSCO F (%) | BUSCO M (%) | Coverage | Sequencing Technology | Assembly Method | Reference or Lead Submitting Author |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | 2017 | Scaffold | 221,885 | 411¹ | 237 | 2555 | NA² | 76 | 75 | 1 | 13 | 11 | 80.4 | Illumina | Platanus | [ |
| | | 2016 | Scaffold | 7810 | 270 | 167 | 56,351 | 19,454 | 95 | 93 | 2 | 3 | 3 | 66.5 | Illumina | Celera | [ |
| | | 2018 | Contig | 11,815 | 1515¹ | 2241 | 397,058 | 654,137 | 88 | 30 | 57 | 3 | 9 | 104.8 | PacBio | FALCON-Unzip | R. Bartaula |
| | | 2018 | Chromosome | 11 | 355 | 271 | 59,130,575 | 59,130,575 | 80 | 76 | 4 | 6 | 15 | 9.4 | ABI 3739 | ARACHNE | [ |
| | | 2017 | Scaffold | 70,673 | 485 | 386 | 3,737,062 | 2,395,810 | 98 | 80 | 18 | 1 | 1 | 212 | Illumina | SOAPdenovo | [ |
| | | 2018 | Chromosome | 6653³ | 820 | 892 | 60,968,100 | 62,039,859 | 88 | 72 | 16 | 4 | 7 | 79 | PacBio | FALCON | [ |
| | | 2017 | Scaffold | 8186 | 391¹ | 268 | 627,605 | 320,701 | 96 | 13 | 83 | 2 | 3 | 40 | Illumina | Newbler | [ |
| | | 2018 | Scaffold | 2936 | 340¹ | 301 | 1,020,118 | 894,734 | 97 | 93 | 4 | 1 | 2 | 546.9 | Illumina | SOAPdenovo2 | [ |
| | | 2018 | Scaffold | 39,787 | 340 | 283 | 964,272 | 627,004 | 97 | 93 | 5 | 1 | 2 | 890 | Illumina | SOAPdenovo | [ |
| | | 2018 | Scaffold | 67,725 | 460 | 344 | 1,376,405 | 577,147 | 98 | 96 | 2 | 1 | 1 | 200 | Illumina | Platanus | [ |
| | | 2014 | Contig | 20,075 | 335 | 326 | 20,748 | 20,226 | 66 | 44 | 22 | 10 | 24 | 350 | Roche 454 | Newbler | [ |
| | | 2017 | Contig | 52,373 | 450 | 377 | 16,573 | 13,050 | 93 | 90 | 3 | 3 | 4 | 47.7 | Illumina | Newbler | [ |
| | | 2012 | Scaffold | 10,823 | 450 | 375 | 4,428,067 | 3,741,400 | 94 | 92 | 2 | 2 | 4 | 13.5 | Roche 454 | Newbler | [ |
| | | 2018 | Chromosome | 8283³ | 1084 | 725 | 25,947,084 | 173,700 | 96 | 90 | 6 | 2 | 2 | 80 | Illumina | AllPaths | [ |
| | | 2018 | Scaffold | 1,072,009 | 3327⁴ | 840 | 3242 | NA² | 76 | 72 | 4 | 8 | 16 | 50 | Illumina | SOAPdenovo | J. Li |
| | | 2016 | Chromosome | 4826³ | 473 | 422 | 36,610,139 | 36,610,139 | 94 | 88 | 6 | 2 | 5 | 186 | Roche 454 | SOAPdenovo | [ |
| | | 2017 | Chromosome | 21 | 694¹ | 457 | 25,272,979 | NA² | 83 | 78 | 5 | 3 | 14 | 100 | Illumina | Allpaths-LG | S. Natsume |
| | | 2017 | Scaffold | 4113 | 1400 | 486 | 705,200 | NA² | 89 | 26 | 63 | 2 | 10 | 170 | Illumina | SOAPdenovo2 | [ |
| | | 2019 | Chromosome | 1091³ | 333¹ | 349 | 1,429,328 | 1,517,519 | 96 | 46 | 50 | 1 | 3 | 115 | Illumina | MECAT | C.-Y. Tang |
| | | 2017 | Chromosome | 1528³ | 3600 | 3028 | 178,899,001 | 174,509,413 | 89 | 80 | 9 | 3 | 8 | 100 | PacBio | PBcR | [ |
| | | 2018 | Chromosome | 16 | 496¹ | 462 | 29,809,665 | 28,894,297 | 97 | 89 | 7 | 1 | 2 | 290 | Illumina | SOAPdenovo2 | [ |
| | | 2016 | Scaffold | 666,180 | 2621¹ | 481 | 1361 | NA² | 31 | 29 | 2 | 22 | 47 | 5 | Illumina | CLC Genomic Workbench | [ |
| | | 2016 | Scaffold | 190,876 | 400 | 353 | 3915 | 3044 | 58 | 52 | 5 | 20 | 22 | 33 | Illumina | MaSuRCA | [ |
| | | 2018 | Chromosome | 105,321³ | 2513¹ | 2075 | 37,709 | 24,189 | 49 | 41 | 8 | 17 | 33 | 60 | Illumina | ABySS | J. De Vega |
| | | 2014 | Scaffold | 9688 | 782¹ | 362 | 30,401,905 | NA² | 86 | 80 | 6 | 4 | 10 | 52.5 | Illumina | SOAPdenovo2 | C. Brian |
| | | 2014 | Chromosome | 12 | 586¹ | 394 | 31,244,610 | 28,494,620 | 81 | 74 | 7 | 6 | 13 | 130 | Roche 454 | AllPaths | R. A. Wing |
| | | 2015 | Scaffold | 3818 | 450¹ | 339 | 27,785,585 | 26,200,591 | 83 | 76 | 6 | 5 | 13 | 120 | | | Q. Zhao |
| | | 2019 | Chromosome | 367³ | 489¹ | 415 | 28,085,715 | 26,003,091 | 88 | 81 | 6 | 3 | 9 | 148 | PacBio | CANU | L. Wang |
| | | 2018 | Chromosome | 466 | 923 | 848 | 48,259,421 | 45,112,342 | 83 | 25 | 58 | 4 | 13 | 160 | Illumina | CANU | [ |
| | | 2018 | Chromosome | 34,381³ | 2870 | 2716 | 204,470,928 | 180,516,484 | 95 | 29 | 65 | 1 | 4 | 239 | Illumina | DeNovoMAGIC | [ |
| | | 2019 | Contig | 6087 | 508¹ | 707 | 248,703 | 390,844 | 95 | 52 | 43 | 1 | 3 | 130 | Illumina | SMARTdenovo | [ |
| | | 2019 | Contig | 4454 | 391¹ | 500 | 237,044 | 357,710 | 70 | 49 | 21 | 3 | 27 | 30 | PacBio | CANU | W. Kong |
| | | 2014 | Contig | 64,732 | 515 | 254 | 10,333 | NA² | 95 | 82 | 12 | 3 | 2 | 47 | Roche 454 | ABySS, Newbler, Celera Assembler, Minimus2 | [ |
| | | 2017 | Chromosome | 44,239³ | 573 | 383 | 35,166,889 | 26,198,371 | 96 | 82 | 14 | 2 | 1 | 225 | Illumina | SOAPdenovo2 | [ |
| | | 2017 | Scaffold | 83,189 | 711 | 740 | 90,830 | 95,085 | 91 | 66 | 25 | 4 | 5 | 327 | Illumina | SOAPdenovo2 GapCloser | [ |
| | | 2018 | Chromosome | 15,303³ | 1565¹ | 3133 | 91,359,291 | 109,189,819 | 78 | 20 | 58 | 5 | 17 | 90 | Illumina | CANU | [ |
| | | 2017 | Scaffold | 1,581,707 | 7900 | 1685 | 2200 | NA² | 66 | 62 | 4 | 13 | 21 | 50 | Illumina | CLC Assembly Cell, CarmA | [ |
| | | 2019 | Chromosome | 14 | 782¹ | 396 | 46,702,114 | 35,460,007 | 81 | 75 | 6 | 6 | 13 | 118 | PacBio | MECAT | P. Huang |
| | | 2018 | Scaffold | 319,506 | 2640¹ | 1185 | 11,019 | NA² | 68 | 66 | 4 | 13 | 18 | 40 | PacBio | SOAPdenovo2, CLC, PBJelly, SSPACE | [ |
| | | 2016 | Contig | 258,575 | 792¹ | 1478 | 6967 | NA² | 38 | 33 | 6 | 8 | 54 | 96 | Illumina | Celera Assembler | Y. Lv |
| | | 2017 | Chromosome | 869³ | 730 | 709 | 68,658,214 | 68,658,214 | 86 | 80 | 5 | 4 | 10 | 8 | Illumina | ARACHNE | [ |
| | | 2015 | Scaffold | 6768 | 539 | 343 | 140,815 | NA² | 98 | 97 | 2 | 1 | 1 | 80 | Illumina | CLC NGS Cell | [ |
¹ When not reported by the authors, we have estimated the genome size from the value available in Kew’s Plant DNA C-values database (see Section 2.2). In some cases, this has resulted in an estimate smaller than the assembled genome size.
² Where the assembled genome size does not reach half of the expected genome size, an NG50 cannot be calculated (see Section 2.1).
³ In some cases, chromosome-level genome assemblies have pieces left over, and these increase the number of contigs in the assembly files beyond the expected chromosome number.
⁴ Genome size estimate from DNA content analysis in Creber et al. [70].
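The relationship between N50 and NG50, and the reason some NG50 cells read NA, can be illustrated with a minimal Python sketch. It is not QUAST's implementation: the function name `nx50` and the toy contig lengths are hypothetical, and the only assumption is the standard definition, in which contigs are summed from longest to shortest until half of a reference total is reached. That total is the assembly length for N50 and the estimated genome size for NG50.

```python
from typing import Optional, Sequence


def nx50(lengths: Sequence[int], target_size: Optional[int] = None) -> Optional[int]:
    """Return N50 (target_size is None) or NG50 (target_size = estimated genome size, in bp).

    Hypothetical helper for illustration only; QUAST computes these values itself.
    """
    # N50 uses the assembly's own total length; NG50 uses the estimated genome size.
    total = sum(lengths) if target_size is None else target_size
    half = total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    # The contigs never add up to half of the target size, so no value exists.
    return None


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]  # toy contig lengths in bp, 290 bp in total
    print(nx50(contigs))                    # N50 relative to the 290 bp assembly
    print(nx50(contigs, target_size=400))   # NG50 for an assumed 400 bp genome
    print(nx50(contigs, target_size=1000))  # None: assembly is under half of 1000 bp
```

Because NG50 is anchored to the estimated genome size rather than to the assembly, it can be compared across assemblies of the same species, while N50 can look better simply because less of the genome was assembled.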
Figure 1. Plot of k-mer frequency by length for Camelina neglecta J.Brock, Mandáková, Lysak & Al-Shehbaz, produced using Jellyfish and visualized using R. The position of the peak at a k-mer length of 22 is used to calculate genome size based on the area under the curve, as represented by the light blue region. Here the estimated genome size is 248 Mbp, while flow cytometry estimates indicate a genome size of 264 (±9) Mbp [80].
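The calculation described in the caption can be sketched in a few lines of Python. This is a simplified illustration and not the authors' script. It assumes the histogram is a list of (depth, count) pairs, such as those in a two-column k-mer histogram file, and it reads the peak position as the k-mer depth at the main peak. It also assumes that bins below a hypothetical `min_depth` cutoff are sequencing errors and can be ignored. The area under the curve is then the total number of k-mers, and dividing it by the peak depth gives a rough genome size.

```python
from typing import Iterable, Tuple


def estimate_genome_size(
    histogram: Iterable[Tuple[int, int]], min_depth: int = 5
) -> Tuple[int, float]:
    """Return (peak depth, estimated genome size in bp) from (depth, count) pairs.

    Hypothetical helper: min_depth is an assumed cutoff for error k-mers.
    """
    kept = [(depth, count) for depth, count in histogram if depth >= min_depth]
    if not kept:
        raise ValueError("no histogram bins at or above min_depth")
    # The main peak is the depth at which the most distinct k-mers are observed.
    peak_depth = max(kept, key=lambda pair: pair[1])[0]
    # Area under the curve: every distinct k-mer counted once per occurrence.
    total_kmers = sum(depth * count for depth, count in kept)
    return peak_depth, total_kmers / peak_depth


if __name__ == "__main__":
    # Toy histogram: low-depth error k-mers followed by a single peak at depth 20.
    toy = [(1, 1000), (2, 100), (3, 10), (18, 50), (19, 100), (20, 200), (21, 100), (22, 50)]
    peak, size = estimate_genome_size(toy)
    print(peak, size)
```

This simple estimate ignores heterozygosity and repeat structure, which is one reason k-mer and flow cytometry estimates can differ.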
Figure 2. Blobplot generated for the Conyza canadensis (Asteraceae) draft genome assembly, showing the genera with the closest similarity to the sequenced genome (Laforest, Martin, and Page, unpublished data). The first panel (A) indicates the percentage of reads that were mapped, and the second panel (B) shows the taxonomic breakdown of hits at the taxonomic level requested. In this case, the majority of hits are from other genera of the Asteraceae. The program also generates a text file with more detailed information. The three-part third panel (C) shows histograms of the proportion of G and C bases in the sequences, which typically varies among species (top), and of coverage (right), each weighted by the cumulative length of the sequences in each bin. The main panel has circles colored by taxonomic affiliation, positioned on the x-axis by GC proportion and on the y-axis by coverage within the raw data, which gives a sense of the relative concentration of the sequences in the DNA sample.
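The length-weighted GC histogram in panel C can be illustrated with a short Python sketch. It is not the plotting program's own code: the function names, the choice of ten bins, and the toy sequences are all assumptions made for illustration. Each sequence contributes its full length to the bin that holds its GC proportion, so long contigs outweigh short ones.

```python
from collections import defaultdict
from typing import Dict, Iterable


def gc_proportion(seq: str) -> float:
    """Proportion of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def length_weighted_gc_histogram(seqs: Iterable[str], n_bins: int = 10) -> Dict[int, int]:
    """Sum sequence lengths per GC bin; bin i covers [i/n_bins, (i+1)/n_bins)."""
    bins: Dict[int, int] = defaultdict(int)
    for seq in seqs:
        # A GC proportion of exactly 1.0 is placed in the last bin.
        index = min(int(gc_proportion(seq) * n_bins), n_bins - 1)
        bins[index] += len(seq)
    return dict(bins)


if __name__ == "__main__":
    toy_contigs = ["ATAT", "ATGC", "ATGCATGC", "GGCC"]  # hypothetical sequences
    print(length_weighted_gc_histogram(toy_contigs))
```

Sequences from a contaminant with a different base composition would appear as a separate mass of length in another bin, which is the pattern the plot is designed to expose.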