Ke Lin, Ningwen Zhang, Edouard I Severing, Harm Nijveen, Feng Cheng, Richard G F Visser, Xiaowu Wang, Dick de Ridder, Guusje Bonnema.
Abstract
BACKGROUND: Brassica rapa is an economically important crop species. During its long breeding history, a large number of morphotypes have been generated, including leafy vegetables such as Chinese cabbage and pakchoi, turnip tuber crops and oil crops.
Year: 2014 PMID: 24684742 PMCID: PMC4230417 DOI: 10.1186/1471-2164-15-250
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Three plants. Left: the Chinese cabbage cultivar, Chiifu; middle: an oil-like rapid cycling line (RC-144); right: Japanese vegetable turnip (VT-117).
Figure 2. Definition of retained and lost genes. Illustrative examples of a retained and a lost gene in turnip. (a) A. thaliana gene A has three orthologous genes in turnip, but only two in Chiifu and rapid cycling; hence, we call A a retained gene for turnip based on the presence of A3. (b) Gene A is considered a lost gene for turnip based on the absence of A3.
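The rule illustrated in Figure 2 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' script: the function name `classify_copy_change`, the genotype labels, the copy numbers assumed for panel (b), and the use of strict comparisons against both other genotypes are all assumptions made for illustration.

```python
from typing import Mapping


def classify_copy_change(copies: Mapping[str, int], genotype: str) -> str:
    """Label one reference gene for one genotype.

    `copies` maps each genotype to its number of orthologs of the same
    A. thaliana gene. Assumption: a gene is 'retained' when the genotype
    has more copies than every other genotype, and 'lost' when it has fewer.
    """
    own = copies[genotype]
    others = [n for name, n in copies.items() if name != genotype]
    if all(own > n for n in others):
        return "retained"
    if all(own < n for n in others):
        return "lost"
    return "unchanged"


# Panel (a): three orthologs in turnip, two in Chiifu and rapid cycling.
panel_a = {"Chiifu": 2, "turnip": 3, "rapid_cycling": 2}
# Panel (b), assumed counts: the third copy (A3) is absent from turnip only.
panel_b = {"Chiifu": 3, "turnip": 2, "rapid_cycling": 3}

print(classify_copy_change(panel_a, "turnip"))
print(classify_copy_change(panel_b, "turnip"))
```

Under these assumptions the first call labels gene A as retained for turnip and the second labels it as lost.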
Comparisons of Chiifu gene models made by MAKER and obtained from BRAD
| Minimum overlap | Feature | BRAD_only | MAKER_only | Overlap_reciprocal | Overlap_split | Overlap_join | Overlap_total |
|---|---|---|---|---|---|---|---|
| 100% | Genes | 8,500 | 26,118 | 11,271 | 156 | 2,880 | 32,519 |
| | Exons | 30,121 | 42,416 | 164,479 | 26 | 895 | 176,454 |
| 75% | Genes | 6,437 | 11,715 | 25,848 | 187 | 2,815 | 34,582 |
| | Exons | 21,838 | 34,590 | 179,737 | 47 | 964 | 184,743 |
| 50% | Genes | 5,229 | 8,601 | 29,857 | 203 | 2,762 | 35,790 |
| | Exons | 19,385 | 31,639 | 184,909 | 78 | 1,007 | 187,198 |
| 25% | Genes | 4,239 | 6,544 | 33,617 | 719 | 2,719 | 36,780 |
| | Exons | 18,270 | 30,178 | 187,577 | 347 | 1,039 | 188,313 |
Four different minimum overlap requirements (expressed as a fraction of the Chiifu reference gene model) were used to compare the two sets of gene models at both the gene level and the exon level. The BRAD_only and MAKER_only columns give features found only in the reference gene models and only in the MAKER-generated gene models, respectively. Intersections between the two sets are classified as overlaps that meet the fraction requirement reciprocally for both models (Overlap_reciprocal), overlaps that split one reference feature into several MAKER features (Overlap_split), and overlaps that join several reference features into one MAKER feature (Overlap_join).
Figure 3. Coverage of published Chiifu reference gene models compared with re-annotated Chiifu gene models. Coverage of published Chiifu reference gene models based on number of genes and exons compared with those re-annotated by MAKER, considering a prediction identical when overlapping the reference gene model by at least 75%.
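The minimum overlap requirement can be illustrated with a short Python sketch. It only covers the simplest case, the fraction of one reference feature covered by one predicted feature, and it does not reproduce the reciprocal, split or join classification. The function names, the 1-based inclusive coordinates and the example intervals are assumptions for illustration; the 0.75 default mirrors the 75% threshold used for Figure 3.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive (assumed)


def overlap_fraction(ref: Interval, pred: Interval) -> float:
    """Fraction of the reference interval that is covered by the prediction."""
    ref_start, ref_end = ref
    pred_start, pred_end = pred
    shared = min(ref_end, pred_end) - max(ref_start, pred_start) + 1
    return max(shared, 0) / (ref_end - ref_start + 1)


def is_match(ref: Interval, pred: Interval, min_overlap: float = 0.75) -> bool:
    """True when the prediction covers at least `min_overlap` of the reference."""
    return overlap_fraction(ref, pred) >= min_overlap


reference_gene = (1, 100)   # hypothetical reference gene model
predicted_gene = (26, 150)  # hypothetical MAKER prediction

print(overlap_fraction(reference_gene, predicted_gene))
print(is_match(reference_gene, predicted_gene, min_overlap=1.0))
```

With these hypothetical coordinates the prediction covers 75 of the 100 reference bases, so it counts as a match at the 75% requirement but not at 100%.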
Figure 4. Genomic variations anchored to chromosomes in resequenced turnip and rapid cycling genomes. Genomic variants including insertions, deletions and SNPs between resequenced turnip, rapid cycling and reference Chiifu genome on each chromosome. On each chromosome (A01-A10), the middle row represents either common or unique variations in the Chiifu genome. Genomic variations between rapid cycling and Chiifu are presented in the top three rows, variations between turnip and Chiifu in the bottom three rows. Common variations have the same sequence composition at the same position in both rapid cycling and turnip; unique variations have different nucleotides between the three genomes at the same position.
Figure 5. Number of genes predicted to be functionally affected by genomic variants. Before annotation, genes were considered functionally affected in the rapid cycling line or in turnip when one of the following variants was found with respect to the Chiifu genome: SPLICE_SITE_ACCEPTOR, SPLICE_SITE_DONOR, START_LOST, EXON_DELETED, FRAME_SHIFT, STOP_GAINED or STOP_LOST. Genes were also considered affected if they had no orthologous gene on the same chromosome/scaffold of the Chiifu genome after its re-annotation.
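The first criterion in the Figure 5 caption amounts to a set-membership filter on variant effect labels. The sketch below assumes the variants are already available as `(gene_id, effect)` pairs; the function name, the gene identifiers and the `SYNONYMOUS_CODING` label in the example are hypothetical and only serve the illustration.

```python
from typing import Iterable, Set, Tuple

# Effect labels listed in the Figure 5 caption.
DISRUPTIVE_EFFECTS = {
    "SPLICE_SITE_ACCEPTOR",
    "SPLICE_SITE_DONOR",
    "START_LOST",
    "EXON_DELETED",
    "FRAME_SHIFT",
    "STOP_GAINED",
    "STOP_LOST",
}


def affected_genes(variants: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the genes carrying at least one variant with a listed effect."""
    return {gene for gene, effect in variants if effect in DISRUPTIVE_EFFECTS}


# Hypothetical annotated variants for one genotype.
example = [
    ("gene_1", "FRAME_SHIFT"),
    ("gene_1", "SYNONYMOUS_CODING"),
    ("gene_2", "SYNONYMOUS_CODING"),
    ("gene_3", "STOP_GAINED"),
]

print(sorted(affected_genes(example)))
```

A gene is reported once regardless of how many qualifying variants it carries, which matches counting genes rather than variants.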
Number of genomic variants located in exons, introns, UTRs and intergenic regions over three subgenomes
| Type (alphabetical order) | Turnip LF | Turnip MF1 | Turnip MF2 | Rapid cycling LF | Rapid cycling MF1 | Rapid cycling MF2 |
|---|---|---|---|---|---|---|
| EXON-Count | 80,093 | 36,695 | 44,991 | 89,042 | 42,165 | 49,868 |
| EXON-Length | 170,181 | 138,159 | 101,503 | 238,506 | 112,953 | 122,966 |
| INTERGENIC-Count | 243,148 | 129,852 | 179,159 | 280,600 | 154,959 | 192,416 |
| INTERGENIC-Length | 620,502 | 355,733 | 460,022 | 745,887 | 447,983 | 534,174 |
| INTRON-Count | 138,428 | 69,785 | 83,492 | 160,185 | 80,189 | 94,772 |
| INTRON-Length | 312,855 | 160,185 | 179,874 | 357,133 | 166,618 | 216,723 |
| UTR-Count | 8,837 | 4,816 | 5,995 | 10,340 | 5,673 | 6,619 |
| UTR-Length | 20,918 | 9,232 | 11,538 | 34,871 | 11,044 | 12,382 |
Genomic variants mapped on four different types of genome regions grouped by three subgenomes (LF, MF1, MF2) in turnip and rapid cycling. The counts indicate the number of variations in each genomic region; the length is the sum over all genomic variations. SNPs are defined as being 1 bp long.
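The count and length rows of this table can be read as two running totals per region and subgenome. The sketch below shows one way to compute them. It assumes VCF-style reference and alternative alleles and takes the length of an indel to be the difference in allele lengths, which is an assumption rather than the authors' stated definition; only the rule that a SNP counts as 1 bp comes from the caption. All names and the example records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, str, str, str]  # (region, subgenome, ref allele, alt allele)


def variant_length(ref: str, alt: str) -> int:
    """Length in bp: 1 for a SNP, allele length difference for an indel (assumed)."""
    if len(ref) == len(alt):
        return len(ref)
    return abs(len(ref) - len(alt))


def summarize(variants: Iterable[Variant]) -> Dict[Tuple[str, str], List[int]]:
    """Map (region, subgenome) to [number of variants, summed length in bp]."""
    totals = defaultdict(lambda: [0, 0])
    for region, subgenome, ref, alt in variants:
        entry = totals[(region, subgenome)]
        entry[0] += 1
        entry[1] += variant_length(ref, alt)
    return dict(totals)


# Hypothetical records: one SNP and one 3 bp insertion in LF exons,
# and one 2 bp deletion in an MF1 intron.
example = [
    ("EXON", "LF", "A", "G"),
    ("EXON", "LF", "A", "ATTG"),
    ("INTRON", "MF1", "CTT", "C"),
]

print(summarize(example))
```

Because indels contribute more than 1 bp each, the length total of a category is at least as large as its count, as in every column of the table.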
Figure 6. Pan-genome composition. There are 38,186 genes classified as common in the B. rapa pan-genome; the number of unique genes was 1,464 in Chiifu, 1,118 in turnip and 1,090 in rapid cycling.
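The pan-genome categories can be expressed as a classification on presence sets. The following sketch assumes that "common" means present in all three genotypes, "unique" means present in exactly one, and "dispensable" means present in exactly two; the last definition is an assumption, since the caption only describes common and unique genes. The function name and gene identifiers are hypothetical.

```python
from typing import Dict, Mapping, Set

GENOTYPES = {"Chiifu", "turnip", "rapid_cycling"}


def classify_pan_genome(presence: Mapping[str, Set[str]]) -> Dict[str, str]:
    """Assign each gene to 'common', 'dispensable' or 'unique'."""
    classes = {}
    for gene, found_in in presence.items():
        if not found_in or not found_in <= GENOTYPES:
            raise ValueError(f"unexpected genotype set for {gene}: {found_in}")
        if found_in == GENOTYPES:
            classes[gene] = "common"       # present in all three genotypes
        elif len(found_in) == 1:
            classes[gene] = "unique"       # present in a single genotype
        else:
            classes[gene] = "dispensable"  # present in two genotypes (assumed)
    return classes


# Hypothetical presence sets for three genes.
example = {
    "gene_a": {"Chiifu", "turnip", "rapid_cycling"},
    "gene_b": {"turnip"},
    "gene_c": {"Chiifu", "rapid_cycling"},
}

print(classify_pan_genome(example))
```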
Top ten GO biological processes with most genes assigned in Chiifu, turnip and rapid cycling
| Chiifu: biological process | Chiifu: #genes | Turnip: biological process | Turnip: #genes | Rapid cycling: biological process | Rapid cycling: #genes |
|---|---|---|---|---|---|
| Response to stress | 40 | Response to stress | 65 | Response to stress | 31 |
| Response to abiotic stimulus | 31 | Protein modification process | 35 | Protein modification process | 16 |
| Response to endogenous stimulus | 18 | Catabolic process | 32 | Response to abiotic stimulus | 15 |
| Secondary metabolic process | 16 | Transport | 30 | Signal transduction | 13 |
| Signal transduction | 16 | Response to abiotic stimulus | 29 | Cellular component organization | 13 |
| Catabolic process | 14 | Signal transduction | 27 | Response to biotic stimulus | 12 |
| Cellular component organization | 13 | Response to biotic stimulus | 21 | Transport | 12 |
| Anatomical structure morphogenesis | 11 | Carbohydrate metabolic process | 18 | Catabolic process | 11 |
| Response to biotic stimulus | 11 | Cellular component organization | 17 | Response to endogenous stimulus | 9 |
| Protein modification process | 10 | Response to endogenous stimulus | 14 | Lipid metabolic process | 9 |
Only dispensable and unique genes were included in the analysis. The term “response to stress” is the most over-represented and seven out of these ten GO terms are found in Chiifu, turnip as well as rapid cycling.
Figure 7. Subgenome composition of dispensable and unique genes in three genotypes. The subgenome composition of dispensable and unique genes in three B. rapa genotypes in terms of (a) number of genes; (b) frequency of gene changes, calculated as the number of changed genes divided by the total number of genes in the subgenome. LF: less fractionated subgenome, with the highest gene densities; MF1: more fractionated subgenome 1, with moderate gene densities; MF2: most fractionated subgenome 2, with lowest gene densities.
Orthologous genes of A. thaliana and T. halophila found in Chiifu, turnip and rapid cycling
| Compared genomes | 1 copy (at) | 1 copy (th) | 2 copies (at) | 2 copies (th) | 3 copies (at) | 3 copies (th) |
|---|---|---|---|---|---|---|
| Chiifu | 15,237 | 17,880 | 3,706 | 3,041 | 691 | 540 |
| Turnip | 15,190 | 17,812 | 3,676 | 2,907 | 713 | 503 |
| Rapid cycling | 15,225 | 17,774 | 3,595 | 2,960 | 681 | 519 |
The number of A. thaliana (“at”) and T. halophila (“th”) genes having one, two or three copies of orthologous genes in Chiifu, turnip and rapid cycling.
Retained and lost genes in Chiifu, turnip and rapid cycling
| | Chiifu | Turnip | Rapid cycling |
|---|---|---|---|
| Genes with copy number changes | 1,151 | 1,053 | 932 |
| Retained genes | 265 | 180 | 156 |
| Lost genes | 886 | 873 | 906 |
| Retained / lost genes assigned to gene families | 40 / 23 | 35 / 19 | 33 / 20 |
| Genes present in unique and dispensable gene set, without at or th orthologs* | 231 | 280 | 336 |
Dispensable and unique genes having orthologs in A. thaliana or T. halophila were included to determine the retained and lost genes. The latest curated gene family assignment of A. thaliana genes from TAIR was used.
* The number of B. rapa unique and dispensable genes without A. thaliana (“at”) or T. halophila (“th”) orthologous genes.
Gene family assignment for retained and lost genes in Chiifu, turnip and rapid cycling
| Chiifu: TAIR gene family description | Chiifu: class | Chiifu: genes in common sets | Chiifu: genes in unique and dispensable sets | Turnip: TAIR gene family description | Turnip: class | Turnip: genes in common sets | Turnip: genes in unique and dispensable sets | Rapid cycling: TAIR gene family description | Rapid cycling: class | Rapid cycling: genes in common sets | Rapid cycling: genes in unique and dispensable sets |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C2H2 transcription factor family | LOST | | | Acyl lipid metabolism family | LOST | | | C3H transcription factor family | LOST | | |
| Core DNA replication machinery | LOST | | | C2H2 transcription factor family | LOST | | | Cytochrome P450 | LOST | | |
| Cytochrome P450 | LOST | | | Class III peroxidase | LOST | | | Cytoplasmic ribosomal protein gene family | LOST | | |
| Cytoplasmic ribosomal protein gene family | LOST | | | Cytochrome P450 | LOST | | | Glutathione S-transferase family | LOST | | |
| Cytoskeleton | LOST | | | Cytoplasmic ribosomal protein gene family | LOST | | | Glycoside hydrolase gene families | LOST | | |
| EF-hand containing proteins | LOST | | | EF-hand containing proteins | LOST | | | Glycosyltransferase gene families | LOST | | |
| Expansins | LOST | | | FH2 proteins | LOST | | | Homeobox transcription factor family | LOST | | |
| Glutathione S-transferase family | LOST | | | Glycosyltransferase gene families | LOST | | | Inorganic solute cotransporters | LOST | | |
| Glycosyltransferase gene families | LOST | | | MIP family | LOST | | | Lipid metabolism gene families | LOST | | |
| Lateral organ boundaries gene family | LOST | | | Miscellaneous membrane protein families | LOST | | | MAP kinase kinase kinase kinase (MAPKKKK) family | LOST | | |
| Miscellaneous membrane protein families | LOST | | | Monosaccharide transporter-like gene family | LOST | | | Miscellaneous membrane protein families | LOST | | |
| MYB transcription factor family | LOST | | | Trihelix transcription factor family | LOST | | | Primary pumps (ATPases) gene families | LOST | | |
| Primary pumps (ATPases) gene family | LOST | | | ARF transcription factor family | RETAINED | 15 | 1 | Receptor kinase-like protein family | LOST | | |
| Acyl Lipid metabolism family | RETAINED | 587 | 3 | Carbohydrate esterase gene families | RETAINED | 75 | 1 | Acyl Lipid metabolism family | RETAINED | 587 | 4 |
| AP2-EREBP transcription factor family | RETAINED | 170 | 1 | Chloroplast and mitochondria gene families | RETAINED | 53 | 1 | BZR transcription factor family | RETAINED | 6 | 1 |
| ARF transcription factor family | RETAINED | 15 | 1 | Class III peroxidase | RETAINED | 71 | 1 | CBL-interacting serine-threonine Protein Kinases | RETAINED | 24 | 1 |
| C2H2 transcription factor family | RETAINED | 215 | 1 | Cytochrome P450 | RETAINED | 144 | 2 | Core cell cycle genes | RETAINED | 65 | 1 |
| C3H transcription factor family | RETAINED | 162 | 1 | Cytoplasmic ribosomal protein gene family | RETAINED | 221 | 1 | Cytochrome P450 | RETAINED | 144 | 3 |
| CCAAT-HAP3 transcription factor family | RETAINED | 13 | 1 | GeBP transcription factor family | RETAINED | 10 | 1 | Glycoside hydrolase gene families | RETAINED | 335 | 1 |
| Cytoplasmic ribosomal protein gene family | RETAINED | 221 | 2 | Glutathione S-transferase family | RETAINED | 37 | 2 | Glycosyltransferase gene families | RETAINED | 280 | 1 |
| EF-hand containing proteins | RETAINED | 175 | 2 | Glycosyltransferase gene families | RETAINED | 280 | 1 | Histidine phosphotransfer proteins | RETAINED | 5 | 1 |
| Eukaryotic initiation factor gene family | RETAINED | 101 | 1 | HSP70s | RETAINED | 4 | 1 | Inorganic solute cotransporters | RETAINED | 93 | 1 |
| Glutathione S-transferase family | RETAINED | 37 | 1 | Lipid metabolism gene families | RETAINED | 106 | 2 | Lipid metabolism gene families | RETAINED | 106 | 1 |
| Glycoside hydrolase gene families | RETAINED | 335 | 2 | MADS-box transcription factor family | RETAINED | 89 | 2 | Miscellaneous membrane protein families | RETAINED | 415 | 3 |
| Glycosyltransferase gene families | RETAINED | 280 | 1 | Myosin | RETAINED | 17 | 1 | NAC transcription factor family | RETAINED | 97 | 1 |
| Lateral organ boundaries gene family | RETAINED | 37 | 2 | Organic solute cotransporters | RETAINED | 272 | 1 | Nodulin-like protein family | RETAINED | 65 | 3 |
| MAP kinase kinase kinase family | RETAINED | 74 | 1 | Receptor kinase-like protein family | RETAINED | 237 | 4 | Organic solute cotransporters | RETAINED | 272 | 2 |
| MIP family | RETAINED | 37 | 1 | REM transcription factor family | RETAINED | 20 | 1 | Pollen coat proteome | RETAINED | 3 | 1 |
| Miscellaneous membrane protein families | RETAINED | 415 | 2 | RCI2 gene family | RETAINED | 6 | 2 | ||||
| MYB transcription factor family | RETAINED | 154 | 1 | SNAREs | RETAINED | 66 | 1 | ||||
| Plant defensins superfamily | RETAINED | 6 | 1 | ||||||||
| Plant U-box protein (PUB) | RETAINED | 66 | 1 | ||||||||
| Primary pumps (ATPases) gene family | RETAINED | 31 | 1 | ||||||||
| Receptor kinase-like protein family | RETAINED | 237 | 2 | ||||||||
Overview of lost and retained genes assigned to A. thaliana gene families. “LOST” gene families have no value in the dispensable and unique gene set columns because of the inconsistency of counts in the other two genotypes.
Figure 8. Network analysis of retained and lost genes in turnip. 155 A. thaliana peroxidase-related genes were selected. a) Five retained genes and four lost genes were identified in turnip, five of which were class III peroxidases. b) Summary of the functional protein interaction network found by STRING using five retained genes as input. c) Phenylpropanoid biosynthesis pathway in A. thaliana, including four retained genes and two lost genes. A. thaliana genes that encode enzymes are indicated by light green boxes; red and dark green boxes indicate genes with fewer and more copies, respectively, in rapid cycling than in Chiifu and turnip.
Software and scripts used in the project
| Name | Running time (h) | Input format | Output format | Script purpose |
|---|---|---|---|---|
| cortex_var | 24/genotype | fastq | vcf | - |
| *cortex_combiner | < 1 | vcf | fasta | Post-processing of cortex |
| *maker_pre_ws | 24 | txt | fasta | Pre-processing for MAKER |
| MAKER | 140/genotype | fasta | gff, fasta | - |
| *ortholog_assign | 20/genotype | fasta | csv, fasta | Post-processing of MAKER |
| NCBI BLAST | 200 | fasta | xml | - |
| *InterProScan_ws | 100 | fasta | xml | Pre-processing for Blast2GO |
| Blast2GO | < 1 | xml | csv | - |
| *run_metacyc | < 0.1 | txt | csv | Post-processing of Blast2GO |
| *beast_pre | < 5 | txt | nex | Pre-processing of BEAST |
| BEAST | 24 | nex | png | - |
| *choose_fasta | < 0.1 | txt | fasta | Extract sequence from fasta |
The order in the table indicates the flow of the analysis, except for the script “choose_fasta”, which can be used whenever needed. Names starting with an asterisk are scripts written specifically for this work. The script purpose column indicates whether a script should be used before or after a given program. All scripts run under Linux and provide a short usage summary when started without arguments. “txt” input format: a list of file names used by the scripts.
Figure 9. Workflow of the study. The workflow describes the methods and logic used in the study, from raw sequence reads to the annotation of the full complement of genes in a genome. Newly created scripts are marked by “Script”. Any number of genomes can be analyzed using this workflow, provided there is sufficient computational power.
Flowering time related lost genes in three genotypes
| Genotype | Ara ID | Gene name | Pathway | Gene full name | Protein function |
|---|---|---|---|---|---|
| Rapid cycling | AT2G32950 | COP1 | Photoperiod | CONSTITUTIVE PHOTOMORPHOGENIC 1 | E3 ubiquitin ligase |
| Rapid cycling | AT3G11440 | MYB65 | Gibberellin | MYB65 | MYB transcription factor |
| Rapid cycling | AT3G20740 | FIS3 | Vernalization | FERTILIZATION-INDEPENDENT ENDOSPERM | Encodes a protein similar to the transcriptional regulator of the animal Polycomb group |
| Rapid cycling | AT5G03790 | LMI1 | Flower development | LATE MERISTEM IDENTITY 1 | HD-Zip transcription factor |
| Rapid cycling | AT5G47010 | LBA1 | Metabolic process | LOW-LEVEL BETA-AMYLASE 1 | Required for nonsense-mediated mRNA decay |
| Chiifu | AT1G04440 | CKL13 | Vernalization | CASEIN KINASE LIKE 13 | Protein serine/threonine kinase activity |
| Chiifu | AT4G25470 | CBF2 | Vernalization | C-REPEAT/DRE BINDING FACTOR 2 | Encodes a member of the DREB subfamily A-1 of ERF/AP2 transcription factor family |
| Chiifu | AT5G59710 | VIP2 | Vernalization | VIRE2 INTERACTING PROTEIN 2 | Encodes a nuclear-localized NOT (negative on TATA-less) domain-containing protein |
| Turnip | AT1G53090 | SPA4 | Photoperiod | SPA1-RELATED 4 | WD-40 and protein kinase-like domain |
| Turnip | AT4G27430 | CIP7 | Photoperiod | COP1-INTERACTING PROTEIN 7 | - |
| Turnip | AT5G64813 | LIP1 | Photoperiod | LIGHT INSENSITIVE PERIOD 1 | GTPase |
Five lost genes are related to flowering time in rapid cycling, covering all five categories of flowering time genes. In each of the other two genotypes only three such genes are found, related only to photoperiod in turnip and only to vernalization in Chiifu.