Michael W Bevan, Cristobal Uauy.
Abstract
The sequencing of large and complex genomes of crop species, facilitated by new sequencing technologies and bioinformatic approaches, has provided new opportunities for crop improvement. Current challenges include understanding how genetic variation translates into phenotypic performance in the field.
Year: 2013 PMID: 23796126 PMCID: PMC3706852 DOI: 10.1186/gb-2013-14-6-206
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Progress in crop genome sequencing
| Species (common name) | Genome size | Ploidy | Sequence strategy | Publication | Assembly features | Reference |
|---|---|---|---|---|---|---|
| | 389 Mb | 2n = 2x = 24 | BAC physical map, Sanger sequencing | Aug 2005 | Essentially complete chromosome arm coverage | |
| | 550 Mb | 2n = 2x = 38 | BAC physical map, | Sep 2006 | 2,447 scaffolds<sup>c</sup> containing 410 Mb, 82% of sequence genetically anchored | |
| | 475 Mb | 2n = 2x = 36 | WGS, Sanger sequencing | Sep 2007 | 3,514 supercontigs<sup>c</sup> containing 487 Mb, 69% of sequence genetically anchored | |
| | 700 Mb | 2n = 2x = 20 | WGS, Sanger sequencing | Jan 2009 | 229 scaffolds containing 97% of the genome, 88% of sequence genetically anchored | |
| | 2,300 Mb | 2n = 2x = 20, | BAC physical map, | Nov 2009 | 2,048 Mb in 125,325 contigs<sup>b</sup> forming 61,161 scaffolds | |
| | 1,115 Mb | Two | WGS, Sanger sequencing | Jan 2010 | 397 scaffolds containing 85% of the genome, 98% of sequence genetically anchored | |
| | 750 Mb | One WGD | WGS, Sanger, Roche 454 | Oct 2010 | 1,629 metacontigs<sup>c</sup> containing 80% of the genome, 71% of sequence genetically anchored | |
| | 430 Mb | 2n = 2x = 20 | WGS, Sanger, Illumina, Roche 454 | Dec 2010 | 524 scaffolds containing 80% of the genome, 67% of sequence genetically anchored | |
| | 240 Mb | 2n = 2x = 14 | WGS, Roche 454, Illumina, SOLiD | Dec 2010 | 272 scaffolds containing 95% of the genome, 94% of sequence genetically anchored | |
| | 658 Mb | 2n = 2x = 36 | WGS, Illumina | June 2011 | 57,277 scaffolds containing 60% of the genome | |
| | 844 Mb | 2n = 4x = 48 | Double monoploid DM and diploid RH, | July 2011 | 443 superscaffolds containing 78% of the genome, 86% of the assembly genetically anchored | |
| | 485 Mb | Three | WGS, Illumina, BAC end Sanger sequencing | Aug 2011 | 288 Mb in scaffolds, 90% of the assembly genetically anchored | |
| | 375 Mb | WGD | BAC physical map, | Dec 2011 | 8 pseudomolecules containing 70% of the genome, 100% in optical map | |
| | 770 Mb | 2n = 2x = 36 | WGS, Roche 454, | Jan 2012 | 12,977 scaffolds containing 80% of the genome | |
| | 833 Mb | 2n = 2x = 22 | WGS, Illumina | Jan 2012 | 137,542 scaffolds containing 73% of the genome | |
| | 500 Mb | 2n = 2x = 18 | WGS, Sanger, Illumina, BAC end sequence | May 2012 | 597 scaffolds containing 80% of the genome, 99% of the assembly genetically anchored | |
| | 900 Mb | 2n = 2x = 24 | WGS, Roche 454, Illumina and SOLiD, | May 2012 | 91 scaffolds containing 85% of the genome, 99% of the assembly genetically anchored | |
| | 312 Mb | Three WGD | WGS, Roche 454, BAC end sequencing | July 2012 | 1,584 scaffolds containing 83% of the genome, 88% of the assembly genetically anchored | |
| | 523 Mb | 2n = 2x = 22 | WGS, Roche 454, Sanger, Illumina | Aug 2012 | 24,425 contigs containing 90% of the genome, 70% of the assembly genetically anchored | |
| | 367 Mb | 2n = 2x = 18 | Dihaploid WGS, Illumina | Jan 2013 | 4,811 scaffolds containing 82% of the genome, 73% of the assembly genetically anchored | |
| | 880 Mb | 2n = 2x = 26 | WGS, Illumina | Aug 2012 | 4,715 scaffolds containing 85% of the genome, 73% of the assembly genetically anchored | |
| | 5,100 Mb | 2n = 2x = 14 | WGS, Illumina, BAC physical map, BAC sequence (Roche 454, Illumina) | Nov 2012 | Physical map (4.98 Gb), BAC sequence (1.13 Gb), WGS assemblies (1.9 Gb); integrated by physical map and syntenic order | |
| | 17,000 Mb | 2n = 6x = 42 | WGS, Roche 454 | Nov 2012 | Orthologous group assembly, 437 Mb | |
| | 880 Mb | 2n = 2x = 26 | WGS, Sanger, Roche 454, Illumina | Dec 2012 | 1,084 scaffolds containing 86% of the genome, 98% anchored and oriented to genetic map | |
| | 738 Mb | 2n = 2x = 16 | WGS, Illumina | Jan 2013 | 7,163 scaffolds containing 64% of the genome | |
| | 2 Gb | 2n = 2x = 48 | WGS, Illumina | Apr 2013 | 80% of the 2.05 Gb assembly maps to 5,499 scaffolds of less than 62 kb | |
| | 20,000 Mb | 2n = 2x = 24 | Fosmid pools with both haploid (megagametophyte) and diploid WGS | May 2013 | Merged assembly 12.0 Gb, with 4.3 Gb in ≥10 kb scaffolds | |
| | 24,000 Mb | 2n = 2x = 24 | WGS single haploid | In progress | | |
| | 1,500 Mb | One WGD, | WGS | In progress | | |
| | 1,890 Mb | 2n = 2x = 32 | WGS, BAC physical maps | In progress | | |
| | >15,000 Mb | Diploid progenitors | WGS | In progress | | |
<sup>a</sup>WGD: alloploids have a whole-genome duplication in their recent lineage. <sup>b</sup>A contig is an unambiguous linear assembly of sequences with no physical gaps in coverage, but which can contain errors. <sup>c</sup>The terms supercontig, scaffold and metacontig are used interchangeably to describe a set of contigs that are linked by a known physical distance but that contain sequence gaps. These scaffolds are usually created using mate-pair reads and BAC end sequences. <sup>d</sup>Pseudomolecule is a term applied to a chromosome-scale assembly of contigs and scaffolds that is anchored to a long-range framework using genetic markers and other chromosome features, including cytogenetic features and deletions.
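The contig and scaffold definitions above can be illustrated with a minimal Python sketch. It assumes, as is common in assembly FASTA files but not stated in the table, that sequence gaps inside a scaffold are written as runs of `N`; the function name `split_scaffold`, the `min_gap` parameter and the toy sequence are hypothetical and for illustration only.

```python
import re
from typing import List


def split_scaffold(scaffold: str, min_gap: int = 1) -> List[str]:
    """Split a scaffold into its gap-free contigs.

    Assumption: gaps are encoded as runs of at least `min_gap` 'N' characters.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # Drop empty strings produced by leading or trailing gaps.
    return [contig for contig in gap.split(scaffold) if contig]


# Toy scaffold: three contigs joined by two gaps of known (here arbitrary) length.
scaffold = "ACGTACGT" + "N" * 10 + "GGCCTTAA" + "N" * 5 + "TTGACA"
contigs = split_scaffold(scaffold)
print(len(contigs), contigs)
```

In this picture a scaffold is an ordered list of contigs plus the gap sizes between them, which is why scaffold counts in the table can be much smaller than contig counts for the same assembly.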
Figure 1. Diverse outcomes of polyploidy in crop species. Three examples of the consequences of allopolyploidy (in which hybrids have sets of chromosomes derived from different species) in important crop species are shown. (a) Oilseed rape (canola) is derived from a recent hybridization of Brassica rapa (Chinese cabbage, turnip) and Brassica oleracea (broccoli, cauliflower, cabbage). The progenitor of these Brassica species was hexaploid (compared to Arabidopsis) after two rounds of whole-genome duplication. Extensive gene loss, possibly via deletion mechanisms [18], has occurred in these species. Upon hybridization to form allotetraploid Brassica napus, gene loss is accelerated, producing novel patterns of allelic diversity [19]. (b) Bread wheat is an allohexaploid derived from the relatively recent hybridization of allotetraploid durum (pasta) wheat and wild goat grass, Aegilops tauschii. The Ph1 locus in the B genome [37] prevents pairing between the A, B and D genomes, leading to diploid meiosis and genome stability. This maintains the extensive genetic diversity from the three progenitor Triticeae genomes that underpins wheat crop productivity. (c) Sugarcane (Saccharum sp.) is a complex and unstable polyploid that is cultivated by cuttings. Hybrids between S. officinarum, which has high sugar content, and S. spontaneum, a vigorous wild relative, have variable chromosomal content from each parent. The genomes are closely related to the ancestral diploid Sorghum [42].
Figure 2. The impact of whole genome sequencing on breeding. (a) Initial genetic maps consisted of few and sparse markers, many of which were anonymous markers (simple sequence repeats (SSR)) or markers based on restriction fragment length polymorphisms (RFLP). For example, if a phenotype of interest was affected by genetic variation within the SSR1-SSR2 interval, the complete region would be selected with little information about its gene content or allelic variation. (b) Whole genome sequencing of a closely related species enabled projection of gene content onto the target genetic map. This allowed breeders to postulate the presence of specific genes on the basis of conserved gene order across species (synteny), although this varies between species and regions. (c) Complete genome sequence in the target species provides breeders with an unprecedented wealth of information that allows them to access and identify variation that is useful for crop improvement. In addition to providing immediate access to gene content, putative gene function and precise genomic positions, the whole genome sequence facilitates the identification of both natural and induced (by TILLING) variation in germplasm collections and copy number variation between varieties. Promoter sequences allow epigenetic states to be surveyed, and expression levels can be monitored in different tissues or environments and in specific genetic backgrounds using RNAseq or microarrays. Integration of these layers of information can create gene networks, from which epistasis and target pathways can be identified. Furthermore, re-sequencing of varieties identifies a high density of SNP markers across genomic intervals, which enable genome-wide association studies (GWAS), genomic selection (GS) and more defined marker-assisted selection (MAS) strategies.
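As a rough illustration of the step from panel (a) to panel (c), the sketch below lists the annotated genes that fall inside a marker interval once physical positions are known. Everything in it is hypothetical: the `Gene` record, the `genes_in_interval` helper, the gene names and the coordinates of the SSR1 and SSR2 markers are invented for the example and do not come from any real annotation.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based start position (bp)
    end: int    # inclusive end position (bp)


def genes_in_interval(genes: List[Gene], chrom: str, left: int, right: int) -> List[str]:
    """Return names of genes overlapping the interval between two markers."""
    lo, hi = sorted((left, right))
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start <= hi and g.end >= lo
    ]


# Hypothetical annotation and marker positions, for illustration only.
genes = [
    Gene("geneA", "chr1", 1_000, 2_500),
    Gene("geneB", "chr1", 40_000, 42_000),
    Gene("geneC", "chr1", 90_000, 91_000),
    Gene("geneD", "chr2", 45_000, 46_000),
]
ssr1, ssr2 = 30_000, 60_000
candidates = genes_in_interval(genes, "chr1", ssr1, ssr2)
print(candidates)
```

With only a sparse genetic map, the whole SSR1-SSR2 region would be selected blindly; with a genome sequence, a query of this kind narrows the interval to a list of candidate genes that can then be examined for allelic variation.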