Anna Vlasova, Salvador Capella-Gutiérrez, Martha Rendón-Anaya, Miguel Hernández-Oñate, André E Minoche, Ionas Erb, Francisco Câmara, Pablo Prieto-Barja, André Corvelo, Walter Sanseverino, Gastón Westergaard, Juliane C Dohm, Georgios J Pappas, Soledad Saburido-Alvarez, Darek Kedra, Irene Gonzalez, Luca Cozzuto, Jessica Gómez-Garrido, María A Aguilar-Morón, Nuria Andreu, O Mario Aguilar, Jordi Garcia-Mas, Maik Zehnsdorf, Martín P Vázquez, Alfonso Delgado-Salinas, Luis Delaye, Ernesto Lowy, Alejandro Mentaberry, Rosana P Vianello-Brondani, José Luís García, Tyler Alioto, Federico Sánchez, Heinz Himmelbauer, Marta Santalla, Cedric Notredame, Toni Gabaldón, Alfredo Herrera-Estrella, Roderic Guigó.
Abstract
BACKGROUND: Legumes are the third largest family of angiosperms and the second most important crop class. Legume genomes have been shaped by extensive large-scale gene duplications, including an approximately 58-million-year-old whole-genome duplication shared by most crop legumes.
Year: 2016 PMID: 26911872 PMCID: PMC4766624 DOI: 10.1186/s13059-016-0883-6
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 BAT93 assembly overview. a An example of a genotype-by-sequencing (GBS) profile for the scaffold scaffold00017. The defined mis-assembly point is at the center. Colors indicate different variants between the GBS samples and the reference genome: blue, homozygous variant; light blue, heterozygous variant; grey, absence of any variant. Colors correspond to the linkage groups. b Synteny-like comparison of one-to-one orthologs between BAT93 (green) and G19833 (brown) linkage groups. Colors correspond to the linkage groups, as in (c). c Circos plot representing the gene content and transcriptome maps of the linkage groups of P. vulgaris. The outer ring represents the localization of genes across bean linkage groups. Grey regions contain annotated genes, and white regions are depleted of them. The red line shows the repeat coverage across the linkage groups. Below, squares of different colors represent different types of genes: red, small RNAs; blue, lncRNAs; yellow, legume-specific; black, resistance. The inner rings below the horizontal bar delineating the linkage groups represent RNA-Seq coverage for the different organs: axial meristem, flower, pod, seed, leaf, root and stem
Summary of P. vulgaris cv. BAT93 genome assembly
| | Whole genome | Scaffolds only |
|---|---|---|
| **Assembly** | | |
| Total length | 549,604,264 | 494,957,111 |
| Number of scaffolds/contigs | 68,379 | 9,047 |
| N50 (size / number) | 433,759 / 324 | 526,483 / 267 |
| N90 (size / number) | 2,023 / 8,894 | 35,958 / 1,484 |
| Range (min-max) | 500-3,177,954 | 2,000-3,177,954 |
| % of Ns | 34.96 % | 36.99 % |
| G + C content | 38.43 % | 36.64 % |
| **Annotation** | | |
| Number of protein coding (PC) genes | 30,491 | 29,569 |
| Number of PC transcripts | 66,634 | 65,685 |
| Number of small RNAs | 2,529 | 2,271 |
| Number of long non-coding genes | 1,033 | 870 |
| G + C content exonic (for PC genes) | 47.57 % | 47.70 % |
| Number of functionally annotated transcripts | 62,713 (94.12 %) | 62,594 (95.2 %) |
The "Whole genome" column corresponds to the entire set of scaffolds and unplaced contigs, while the "Scaffolds only" column corresponds only to the set of scaffolds. Complete annotation statistics are provided in Additional file 1: Table S15.
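The N50 and N90 rows report a size and a number for each assembly. As a minimal sketch of how such a pair is conventionally computed (the function name `nx_stat` and the toy lengths are illustrative assumptions, not taken from the paper's pipeline): sort sequences from longest to shortest and accumulate lengths until the chosen fraction of the total is covered; the length reached is the "size" and the count of sequences used is the "number".

```python
def nx_stat(lengths, fraction=0.5):
    """Return (size, number) for an Nx statistic.

    size   : length of the sequence at which the running sum of
             lengths, taken from longest to shortest, first reaches
             `fraction` of the total assembly length.
    number : how many sequences were needed to reach that point.

    fraction=0.5 gives N50, fraction=0.9 gives N90.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Unreachable for 0 < fraction <= 1, kept for completeness.
    return ordered[-1], len(ordered)


# Toy example with made-up scaffold lengths (total = 32).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
n50 = nx_stat(toy_lengths, 0.5)
n90 = nx_stat(toy_lengths, 0.9)
```

Under this definition, dropping many short unplaced contigs raises both the N50 and N90 sizes, which is consistent with the larger values in the "Scaffolds only" column.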
Fig. 2 Conservation and expression pattern of lncRNAs in P. vulgaris. Phylogenomics profiling of lncRNA transcripts in 12 plant species. Shown are 762 bean transcripts (belonging to 507 genes) conserved in at least one other plant species. Percentage of sequence identity with bean is shown as a heat map, where green denotes high similarity and grey missing transcripts. The leftmost column indicates average expression levels in bean, the rightmost column marks 56 transcripts inferred from A. thaliana homologues
Fig. 3 Phylogenomics analysis. The species phylogeny is based on maximum-likelihood analyses of a concatenated alignment of 172 widespread, single-copy orthologous genes. The two different P. vulgaris accessions used in this phylogeny are colored differently. Bars represent the total number of genes for each species (scale on the top) and are divided to indicate different types of phylogenetic profiles: green, widespread proteins which are found in at least 12 of the 14 species; grey, widespread but legume-specific proteins which are found in at least four of the six legume species; light-orange, genes without a clear phylogenetic profile; brown, species-specific genes with no (detectable) homologs in other species. The thin blue line under each bar represents the percentage of P. vulgaris G19833 genes which have homologs in a given species. Conversely, the thin orange line represents the percentage of P. vulgaris BAT93 genes which have homologs in a given species
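The four profile categories in this caption amount to a small decision rule. The sketch below is one possible reading of it, not the authors' implementation. It assumes that a gene's own species counts toward the species totals, that "legume-specific" means no homolog outside the legumes, and that rules are checked in the order shown. The species labels are placeholders.

```python
WIDESPREAD_MIN = 12  # "at least 12 of the 14 species"
LEGUME_MIN = 4       # "at least four of the six legume species"


def classify_profile(homolog_species, own_species, legumes):
    """Assign a phylogenetic profile label to one gene.

    homolog_species : species in which a homolog was detected
    own_species     : species the gene itself belongs to
    legumes         : set of all legume species in the analysis
    """
    # Assumption: the gene's own species counts as "present".
    present = set(homolog_species) | {own_species}
    if present == {own_species}:
        return "species-specific"
    if len(present) >= WIDESPREAD_MIN:
        return "widespread"
    # Assumption: legume-specific genes have no non-legume homologs.
    if present <= set(legumes) and len(present) >= LEGUME_MIN:
        return "legume-specific"
    return "no clear profile"


# Placeholder species labels (hypothetical, for illustration only).
LEGUMES = {f"legume_{i}" for i in range(1, 7)}
OTHERS = {f"other_{i}" for i in range(1, 9)}

example = classify_profile(LEGUMES, "legume_1", LEGUMES)
```

A gene found in all six placeholder legumes and nowhere else is labeled "legume-specific" under these assumptions; one found in a single other species falls into "no clear profile".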
Fig. 4 Transcriptome dynamics. a Development stages of the common bean. Modified with permission from the technical guide for bean growing by the “Instituto Interamericano de Cooperación para la Agricultura” (IICA) [33]. b Hierarchical clustering of bean samples based on expression levels of protein coding genes (PCG). The sample labels are described in Additional file 1: Table S8. c Tissue specificity of the PCGs and lncRNA genes. The bar plot represents the proportion of genes expressed in a given number of organs. d The pie charts represent the distribution of organ-specific PCGs and lncRNAs across organs. The color code for organs is specified in (b). e Differential PCG and lncRNA expression during development. Each bar corresponds to the number of genes differentially expressed in a given developmental stage compared with the previous one. Values above and below zero indicate the proportion of up-regulated and down-regulated genes, respectively; the number of regulated genes is shown at the tip of the corresponding bar
Fig. 5 Co-expression network. a Co-expression network layout; the 11 largest modules are colored differently, and labeled with their putative function. b Composition of the largest modules in the co-expression network (number of PCGs and lncRNAs, and of organ-specific genes). Colors correspond to those in the network in (a). c Gene connectivity as a function of evolutionary age. d Gene connectivity as a function of presence/absence of paralogs
Fig. 6 Analysis of dated duplicated genes. a Species list assigned to different relative evolutionary periods. Red squares represent a duplication event. b Average Pearson correlation coefficient (PCC) and tissue expression complementarity (TEC) scores computed for the proteins assigned to particular ages. The number of genes duplicated at a particular age is indicated in parentheses on the x-axis. c Relationship between gene expression variation and gene duplications. The blue color represents the mean coefficient of variation (CV) for a real set of paralogs and red for a randomly assigned one. The last class on the x-axis (8) contains eight or more paralogs
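Two of the statistics named in this caption, the Pearson correlation coefficient and the coefficient of variation, have standard definitions that can be sketched briefly. The example assumes that PCC is taken between the expression profiles of two paralogs across the same samples and that CV is the population standard deviation divided by the mean of one gene's expression values. Both are assumptions about how the figure was built, and the expression vectors are invented. TEC is not sketched because the caption does not define it.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return cov / (sx * sy)


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population SD is an assumption)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / m


# Invented expression values for two paralogs across three samples.
paralog_a = [1.0, 2.0, 3.0]
paralog_b = [2.0, 4.0, 6.0]
pcc = pearson(paralog_a, paralog_b)

# Invented expression values for one gene across eight samples.
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

A PCC near 1 indicates that two paralogs rise and fall together across samples, while a higher CV indicates more variable expression relative to the mean.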