Salim Bourras, Michel Meyer, Jonathan Grandaubert, Nicolas Lapalu, Isabelle Fudal, Juliette Linglin, Benedicte Ollivier, Françoise Blaise, Marie-Hélène Balesdent, Thierry Rouxel.
Abstract
The ever-increasing generation of sequence data is accompanied by unsatisfactory functional annotation, and complex genomes, such as those of plants and filamentous fungi, contain a large number of genes with no predicted or known function. For the functional annotation of unknown or hypothetical genes, producing mutant collections by Agrobacterium tumefaciens-mediated transformation (ATMT), combined with genotyping and phenotyping, has gained wide acceptance. ATMT is also widely used to identify pathogenicity determinants in pathogenic fungi. A systematic analysis of T-DNA borders was performed in an ATMT-mutagenized collection of the phytopathogenic fungus Leptosphaeria maculans to evaluate the features of T-DNA integration in its particular transposable element-rich, compartmentalized genome. A total of 318 T-DNA tags were recovered and analyzed for biases toward chromosomes and genic compartments, for CG/AT skews at the insertion site, and for microhomologies between the T-DNA left border (LB) and the target sequence. Targeted genes were functionally annotated using Gene Ontology (GO) annotation. T-DNA integration mainly targeted gene-rich, transcriptionally active regions, and it favored biological processes consistent with the physiological status of a germinating spore. T-DNA integration was strongly biased toward regulatory regions, mainly promoters. Consistent with the T-DNA intranuclear-targeting model, the density of T-DNA insertions correlated with CG skew near the transcription initiation site. The microhomologies found between promoter sequences and the T-DNA LB flanking sequence were also consistent with T-DNA integration into host DNA mediated by homologous recombination based on the microhomology-mediated end-joining pathway.
Keywords: Gene Ontology; Leptosphaeria maculans; T-DNA; genome structure; mutagenesis
Year: 2012 PMID: 22908038 PMCID: PMC3411245 DOI: 10.1534/g3.112.002048
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1 A schematic representation of the occurrence of T-DNA insertion events along four L. maculans chromosomes. For each chromosome, the upper plot shows the locations of T-DNA integration events, and the lower plot schematizes variation in GC content along the chromosome, defining AT-rich and GC-equilibrated isochores. The average GC percentage of each chromosome is indicated.
Figure 2 Correlation between the number of T-DNA integrations and chromosomal features. The features investigated for each chromosome were (A) chromosome size; (B) total size of the GC isochores; (C) total size of the transcriptional regions [defined as the sum of regulatory sequences (promoter + terminator) and gene-coding sequences (exons + introns)]; (D) total size of the regulatory regions (defined as the sum of promoter and terminator sequences); (E) total size of gene-coding regions (defined as the sum of exonic and intronic sequences); (F) total size of the exonic sequences; and (G) total size of the intronic sequences. Regression curves and their 95% confidence intervals are plotted as continuous and discontinuous lines, respectively.
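The Figure 2 panels relate per-chromosome T-DNA counts to chromosomal features. Below is a minimal sketch of one such fit (panel A, count against size). It assumes ordinary least squares, because the fitting method is not stated in this excerpt. It uses only the 15 chromosomes whose rows are complete in the chromosome table: chromosomes 5, 10 and 18 are blank there, so the result is not expected to reproduce the published panel exactly. The helper name `ols_fit` is illustrative. The 95% confidence bands would additionally need Student's t quantiles and are omitted.

```python
import math

# Chromosome size (Mb) and observed T-DNA insertion events, transcribed from the
# chromosome table for chromosomes 1-4, 6-9 and 11-17 (rows 5, 10, 18 are blank there).
SIZES_MB = [4.3, 4.1, 3.7, 3.6, 4.1, 3.3, 2.5, 1.9, 1.8, 1.8, 1.6, 1.5, 1.4, 1.4, 1.4]
OBSERVED = [24, 32, 22, 27, 23, 17, 15, 18, 14, 8, 11, 13, 5, 5, 4]


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r), r being Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy / math.sqrt(sxx * syy)


a, b, r = ols_fit(SIZES_MB, OBSERVED)
print(f"T-IE ~ {a:.2f} + {b:.2f} * size_Mb  (r = {r:.3f})")
```

The same function applies to panels B to G by swapping `SIZES_MB` for the corresponding per-chromosome feature.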
Distribution of T-DNA insertion events within L. maculans genomic regions
| Region type | Size (Mb) | % Genome | Observed T-IE | Expected T-IE | SR |
|---|---|---|---|---|---|
| Regulatory | 11.8 | 26 | 200 | 83 | 12.92 |
| 5′ promoting | 5.9 | 13 | 122 | 41 | 12.64 |
| 3′ terminating | 5.9 | 13 | 78 | 41 | 5.77 |
| Coding | 17.6 | 39 | 119 | 123 | −0.37 |
| Exons | 15.3 | 34 | 86 | 107 | −2.04 |
| Introns | 2.3 | 5 | 33 | 16 | 4.24 |
| Shared | — | — | 41 | — | — |
| Intergenic | 15.7 | 35 | 40 | 110 | −6.67 |
Expected: expected number of T-DNA integration events (T-IEs) [= (T-IE genomic density) × (genomic region size)]. Values were rounded to the nearest integer.
SR: standardized residuals. SRs were treated as normally distributed because the Kolmogorov-Smirnov test did not reject the null hypothesis of normality (P = 0.976, α = 0.05).
Region definitions: Regulatory, the sum of the promoting and terminating regions of the 12,469 predicted L. maculans genes; gene-promoting regions, 500 bp upstream of the start codon; gene-terminating regions, 500 bp downstream of the stop codon; gene-coding regions, from start to stop codon, including introns; Shared, regulatory regions common to two nearby genes arranged head to tail; Intergenic, genomic regions meeting none of the previous criteria. Because compartments can overlap, the summed counts exceed the 318 recovered insertion events.
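The compartment definitions in the footnote can be sketched as an interval classifier. The `Gene` record, its coordinates and the helper names are hypothetical. Splitting the coding compartment into exons and introns would also need exon coordinates, so that step is left out. Treating "Shared" as an insertion that falls in one gene's terminating region and a neighbouring gene's promoting region is one interpretation of "head-to-tail nearby genes", not a rule stated by the authors.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple

FLANK_BP = 500  # promoter/terminator length given in the footnote


class Gene(NamedTuple):
    """Hypothetical gene model: 1-based, inclusive start-to-stop-codon span."""
    name: str
    left: int    # smallest coordinate of the coding region
    right: int   # largest coordinate of the coding region
    strand: str  # "+" or "-"


def compartments(pos: int, genes: list[Gene]) -> set[str]:
    """Return every compartment containing `pos`; overlaps are kept."""
    hits: set[str] = set()
    for g in genes:
        if g.left <= pos <= g.right:
            hits.add("coding")  # start to stop codon, introns included
        if g.left - FLANK_BP <= pos < g.left:  # 500 bp left of the coding span
            hits.add("promoter" if g.strand == "+" else "terminator")
        if g.right < pos <= g.right + FLANK_BP:  # 500 bp right of the coding span
            hits.add("terminator" if g.strand == "+" else "promoter")
    return hits or {"intergenic"}


def tally(positions: list[int], genes: list[Gene]) -> Counter:
    """Count insertions per compartment; one insertion may count several times."""
    counts: Counter = Counter()
    for pos in positions:
        hit = compartments(pos, genes)
        counts.update(hit)
        # Assumed reading of "Shared": in one gene's terminator and a neighbour's promoter.
        if {"promoter", "terminator"} <= hit:
            counts["shared"] += 1
    return counts
```

As in the table, an insertion between two tandem genes is counted under both regulatory categories, which is why the category totals can exceed the number of insertions.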
Figure 3 Relationship between CG and AT skews in gene promoter regions and preferential T-DNA integration. (A) Density of T-DNA insertions in promoter regions (green curve) and variation in CG skew (red curve) and AT skew (blue curve) along the promoter regions of T-DNA–targeted genes, as a function of distance from the ATG. (B) Comparison of CG skew (red curve) and AT skew (blue curve) between the promoter regions of T-DNA–targeted genes (solid lines) and the promoters of all predicted L. maculans genes (dotted lines).
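Panel A plots insertion density against position relative to the ATG. Below is a minimal sketch of the binning step. It assumes strand-aware distances, the 500-bp promoter window from the genomic-region table footnote, and a hypothetical 50-bp bin. The published bin width and any smoothing are not given in this excerpt.

```python
def upstream_distance(pos: int, atg: int, strand: str) -> int:
    """Distance in bp upstream of the ATG (positive = upstream).

    `atg` is the forward-strand coordinate of the A of the start codon.
    """
    return atg - pos if strand == "+" else pos - atg


def promoter_profile(insertions, window_bp=500, bin_bp=50):
    """Histogram of insertions by distance upstream of the ATG.

    insertions: iterable of (pos, atg, strand) tuples.
    Bin 0 covers 1..bin_bp bp upstream, bin 1 the next bin_bp bp, and so on;
    insertions outside the window (or downstream of the ATG) are ignored.
    """
    counts = [0] * (window_bp // bin_bp)
    for pos, atg, strand in insertions:
        d = upstream_distance(pos, atg, strand)
        if 1 <= d <= window_bp:
            counts[(d - 1) // bin_bp] += 1
    return counts
```

Dividing each bin count by the number of targeted promoters, or by the bin width, turns the histogram into a density comparable across bin sizes.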
Figure 4 Analysis of CG (A) and AT (B) skews at T-DNA insertion sites in four targeted genome compartments. Sequences 200 bp upstream and downstream of each integration site were extracted, and CG/AT skews were calculated. The sequences were then grouped into four genome compartments: promoter (red curves), terminator (green curves), intergenic (blue curves), and protein-coding (black curves) regions.
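The skew profiles rest on base-composition ratios within ±200 bp of each insertion site. This sketch makes two labelled assumptions. First, CG skew is taken as (C − G)/(C + G) and AT skew as (A − T)/(A + T), following the order of the names; many tools define GC skew with the opposite sign, (G − C)/(G + C). Second, the non-overlapping 20-bp window is a hypothetical choice. `site` is a 0-based coordinate, with the T-DNA inserted just before that base.

```python
def skew(seq: str, first: str, second: str) -> float:
    """(n_first - n_second) / (n_first + n_second); 0.0 when neither base occurs."""
    n1, n2 = seq.count(first), seq.count(second)
    return (n1 - n2) / (n1 + n2) if n1 + n2 else 0.0


def skew_profile(genome: str, site: int, flank: int = 200, window: int = 20):
    """CG and AT skews in consecutive windows from site-flank to site+flank.

    Returns (offset, cg_skew, at_skew) tuples; offset is the window start
    relative to the insertion site (negative = 5' of the site on the strand given).
    """
    start = max(0, site - flank)
    region = genome[start:site + flank].upper()
    profile = []
    for i in range(0, len(region) - window + 1, window):
        w = region[i:i + window]
        profile.append((start + i - site, skew(w, "C", "G"), skew(w, "A", "T")))
    return profile
```

Averaging the profiles of all sites in one compartment, offset by offset, gives one curve per compartment as in panels A and B.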
Figure 5 The search for microhomology between the host DNA and the T-DNA left border. One hundred and sixty 25-bp preinsertion sites were screened, using a 1-bp sliding window, for 5-bp motifs of consecutive bases identical to motifs in the T-DNA left border. The 41 preinsertion-site sequences sharing such motifs with the left border are displayed.
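The screen can be sketched as a shared k-mer search. Every 5-bp word of a preinsertion site, read with a 1-bp sliding step, is compared with every 5-bp word of the left border. The LB string in the example is a made-up placeholder, not the border sequence used in the study. Only the strand supplied is compared, because the caption does not say how reverse-complement matches were handled.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 5) -> set[str]:
    """All k-mers of `seq`, read with a 1-bp sliding window."""
    s = seq.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def shared_kmers(site: str, left_border: str, k: int = 5) -> set[str]:
    """k-mers present in both the preinsertion site and the T-DNA left border."""
    return kmers(site, k) & kmers(left_border, k)


def sites_with_microhomology(sites, left_border, k=5):
    """Keep only the preinsertion sites sharing at least one k-mer with the LB."""
    return [s for s in sites if shared_kmers(s, left_border, k)]


# Hypothetical placeholder sequences for illustration only.
LB_PLACEHOLDER = "ACGTTGCAAGGCTT"
print(sites_with_microhomology(["TTTTGCAAGTTT", "AAAAAAAAAA"], LB_PLACEHOLDER))
```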
Figure 6 Occurrence of sequence microhomologies to eukaryotic core promoter elements (TATA box, CAT box, and Initiator) in the T-DNA LB and the 15 additional bases upstream of it.
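A sequence can be screened for core promoter elements by converting IUPAC consensus patterns into regular expressions. The consensus strings below are commonly cited forms, chosen here as assumptions: TATAWAWR for the TATA box, the CCAAT core of the CAT box, and YYANWYY for the Initiator. The caption does not state which consensus sequences or mismatch tolerance the authors used, and this sketch reports exact matches only.

```python
import re

# IUPAC nucleotide codes needed by the consensus strings below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "N": "[ACGT]"}

# Assumed consensus sequences (not taken from the paper).
CORE_ELEMENTS = {"TATA box": "TATAWAWR", "CAT box": "CCAAT", "Initiator": "YYANWYY"}


def to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regular-expression pattern."""
    return "".join(IUPAC[c] for c in consensus)


def find_core_elements(seq: str):
    """Return (element, 0-based start, matched text) for every exact match.

    A zero-width lookahead lets overlapping matches be reported too.
    """
    seq = seq.upper()
    hits = []
    for name, consensus in CORE_ELEMENTS.items():
        for m in re.finditer(f"(?=({to_regex(consensus)}))", seq):
            hits.append((name, m.start(), m.group(1)))
    return hits
```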
Figure 7 Analysis of microhomology at T-DNA preinsertion sites. The frequency with which single bases at the genomic preinsertion sites were identical to the corresponding bases of the 25-bp T-DNA left border was analyzed. The T-DNA LB sequence is shown, and the TATA box and Inr homologs in the LB sequence are boxed.
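The per-position analysis can be sketched as follows. The sketch assumes each preinsertion site is already aligned to the 25-bp left border position by position, because the excerpt does not describe the alignment. Against four equiprobable bases, the chance identity at any position would be 0.25; the uneven base composition of the genome shifts that baseline.

```python
def positional_identity(sites, left_border):
    """Fraction of preinsertion sites matching the LB base at each position.

    Assumes every site is aligned to `left_border`; positions beyond the
    shorter of the two sequences are simply not compared for that site.
    """
    lb = left_border.upper()
    matches = [0] * len(lb)
    for site in sites:
        for i, (a, b) in enumerate(zip(site.upper(), lb)):
            if a == b:
                matches[i] += 1
    return [m / len(sites) for m in matches]
```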
Gene Ontology annotation of T-DNA–targeted genes using the “biological process” vocabulary
| Biological process | Whole-genome annot. | Obs. annot. (T-DNA–targeted genes) | Exp. annot. (T-DNA–targeted genes) | SR |
|---|---|---|---|---|
| Immune system process | 1 | 0 | 0.03 | −0.16 |
| Cell proliferation | 3 | 0 | 0.08 | −0.28 |
| Death | 4 | 0 | 0.11 | −0.33 |
| Locomotion | 4 | 0 | 0.11 | −0.33 |
| Biological adhesion | 6 | 0 | 0.16 | −0.40 |
| Nitrogen utilization | 9 | 0 | 0.24 | −0.49 |
| Reproduction | 14 | 1 | 0.38 | 1.02 |
| Carbon utilization | 15 | 1 | 0.40 | 0.94 |
| Multi-organism process | 43 | 1 | 1.15 | −0.14 |
| Cellular component organization | 177 | 5 | 4.75 | 0.11 |
| Multicellular organismal process | 228 | 6 | 6.12 | −0.05 |
| Cellular component biogenesis | 232 | 8 | 6.23 | 0.71 |
| Developmental process | 257 | 8 | 6.90 | 0.42 |
| Response to stimulus | 262 | 4 | 7.03 | −1.14 |
| Biological regulation | 469 | 12 | 12.59 | −0.17 |
| Localization | 706 | 21 | 18.95 | 0.47 |
| Cellular process | 2489 | 62 | 66.79 | −0.59 |
| Metabolic process | 3070 | 84 | 82.39 | 0.18 |
| Total | 8198 | 220 | 220 | — |
Whole-genome annot.: number of annotations generated per category for the 12,469 predicted L. maculans genes.
Obs. annot.: observed number of annotations generated by the GO analysis of the T-DNA–targeted genes.
Exp. annot.: expected number of annotations [= (∑annot.) × P(functional category)], where ∑annot. is the total number of annotations generated for the T-DNA–targeted genes and P(functional category) is the whole-genome probability of the category considered. Values were rounded to two decimal places.
SR: standardized residuals. SRs were treated as normally distributed because the Kolmogorov-Smirnov test did not reject the null hypothesis of normality (P = 0.391, α = 0.05). Biased categories are indicated in bold.
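The footnote formulas can be checked against a few rows of the table. This sketch assumes the SR is the Pearson residual (O − E)/√E. The footnotes do not state this, but it reproduces the tabulated expected values and SRs for the rows used below, and the Intergenic row of the genomic-region table, (40 − 110)/√110 ≈ −6.67. The totals 8198 and 220 are taken from the Total row; the function names are illustrative.

```python
import math


def expected_count(category_genome: int, total_genome: int, total_subset: int) -> float:
    """(sum of subset annotations) x (whole-genome probability of the category)."""
    return total_subset * category_genome / total_genome


def standardized_residual(observed: int, expected: float) -> float:
    """Pearson residual (O - E) / sqrt(E)."""
    return (observed - expected) / math.sqrt(expected)


# Rows of the GO table: (whole-genome annotations, observed in T-DNA-targeted genes).
ROWS = {"Reproduction": (14, 1), "Response to stimulus": (262, 4), "Metabolic process": (3070, 84)}
for name, (genome_n, obs) in ROWS.items():
    e = expected_count(genome_n, total_genome=8198, total_subset=220)
    print(f"{name}: expected {e:.2f}, SR {standardized_residual(obs, e):.2f}")
```

The printed values match the table: 0.38 and 1.02, 7.03 and −1.14, 82.39 and 0.18.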
Distribution of T-DNA insertion events along the L. maculans chromosomes
| Chr. no. | SC | Size (Mb) | GC (%) | Gene content | GC size (Mb) | ρ (T-IE/Mb) | Observed T-IE | Expected T-IE | SR |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 4.3 | 45.7 | 1276 | 2.9 | 5.6 | 24 | 30 | −1.06 |
| 2 | 2+19 | 4.1 | 44.1 | 1206 | 2.8 | 7.8 | 32 | 29 | 0.58 |
| 3 | 6+29+11 | 3.7 | 39.7 | 810 | 1.9 | 6.0 | 22 | 26 | −0.74 |
| 4 | 8+10 | 3.6 | 43.2 | 1055 | 2.1 | 7.6 | 27 | 25 | 0.41 |
| 5 | 1 | | | | | | | | |
| 6 | 12+15+32 | 4.1 | 41.6 | 888 | 2.0 | 5.6 | 23 | 29 | −1.06 |
| 7 | 20+21+23 | 3.3 | 44.7 | 751 | 1.4 | 5.2 | 17 | 23 | −1.27 |
| 8 | 3+31 | 2.5 | 43.3 | 604 | 0.8 | 6.0 | 15 | 17 | −0.58 |
| 9 | 4 | 1.9 | 46.6 | 666 | 1.4 | 9.4 | 18 | 13 | 1.25 |
| 10 | 5 | | | | | | | | |
| 11 | 9 | 1.8 | 45.1 | 478 | 1.1 | 7.9 | 14 | 12 | 0.45 |
| 12 | 7 | 1.8 | 46.0 | 568 | 1.2 | 4.5 | 8 | 12 | −1.25 |
| 13 | 13 | 1.6 | 43.9 | 493 | 0.9 | 6.7 | 11 | 11 | −0.13 |
| 14 | 14 | 1.5 | 47.4 | 513 | 1.2 | 8.5 | 13 | 11 | 0.69 |
| 15 | 17 | 1.4 | 43.7 | 416 | 1.0 | 3.5 | 5 | 10 | −1.61 |
| 16 | 16 | 1.4 | 44.3 | 353 | 0.8 | 3.6 | 5 | 10 | −1.53 |
| 17 | 18 | 1.4 | 44.7 | 369 | 0.8 | 3.0 | 4 | 9 | −1.77 |
| 18 | 22 | | | | | | | | |
| Un. | — | 0.7 | — | — | — | — | 8 | — | — |
| Genome | — | 45.1 | 44.1 | 12469 | — | 7.0 | 318 | | |
SC: supercontigs reassembled to make up whole chromosomes.
Gene content: number of predicted genes per chromosome.
GC size: total size of GC isochores per chromosome.
ρ: T-DNA insertion event (T-IE) density [= (number of T-IEs per chromosome) / (chromosome size)], in T-IE/Mb.
Expected: based on the whole-genome density of T-DNA insertion events (7 T-IE/Mb), the expected number of T-IEs per chromosome was calculated as (chromosome size) × (whole-genome T-IE density). Values were rounded to the nearest integer.
SR: standardized residuals. SRs were treated as normally distributed because the Kolmogorov-Smirnov test did not reject the null hypothesis of normality (P = 0.475, α = 0.05). Chromosomes showing a significant bias in the number of T-DNA insertion events are indicated in bold.
Un.: unassembled genomic sequences (totaling approximately 0.7 Mb).
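The per-chromosome columns can be sketched from the footnote definitions. The significance rule is an assumption, because the criterion for bold entries is not stated: a chromosome is flagged when the two-sided normal P-value of its SR falls below α = 0.05, i.e. |SR| > 1.96. The sketch reproduces ρ and the expected counts for chromosomes 1 and 2. The recomputed SRs differ slightly from the published ones (−1.10 vs. −1.06 for chromosome 1), and the excerpt does not say which unrounded inputs were used.

```python
import math

GENOME_DENSITY = 7.0  # T-IE per Mb, whole-genome density stated in the footnote


def chromosome_bias(observed: int, size_mb: float, density: float = GENOME_DENSITY,
                    alpha: float = 0.05):
    """Return (rho, expected, SR, two-sided P, significant) for one chromosome."""
    rho = observed / size_mb                       # T-IE density on this chromosome
    expected = round(size_mb * density)            # rounded to the nearest integer
    sr = (observed - expected) / math.sqrt(expected)
    p_two_sided = math.erfc(abs(sr) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|SR|))
    return rho, expected, sr, p_two_sided, p_two_sided < alpha


for chrom, obs, size in [(1, 24, 4.3), (2, 32, 4.1)]:
    rho, exp, sr, p, sig = chromosome_bias(obs, size)
    print(f"chr{chrom}: rho={rho:.1f} expected={exp} SR={sr:.2f} P={p:.3f} significant={sig}")
```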
Gene Ontology annotation of “biological process” for chromosomes 5 and 10
| Biological process | Whole-genome annot. | Chr. 5 obs. annot. | Chr. 5 exp. annot. | Chr. 5 SR | Chr. 10 obs. annot. | Chr. 10 exp. annot. | Chr. 10 SR |
|---|---|---|---|---|---|---|---|
| Immune system process | 1 | 0 | 0.10 | −0.31 | 0 | 0.06 | −0.24 |
| Cell proliferation | 3 | 0 | 0.29 | −0.54 | 0 | 0.17 | −0.41 |
| Death | 4 | 1 | 0.38 | 1.00 | 0 | 0.22 | −0.47 |
| Locomotion | 4 | 1 | 0.38 | 1.00 | 0 | 0.22 | −0.47 |
| Biological adhesion | 6 | 1 | 0.57 | 0.56 | 0 | 0.33 | −0.58 |
| Growth | 6 | 1 | 0.57 | 0.56 | 1 | 0.33 | 1.15 |
| Nitrogen utilization | 9 | 1 | 0.86 | 0.15 | 0 | 0.50 | −0.71 |
| Reproduction | 14 | 2 | 1.34 | 0.57 | 2 | 0.78 | 1.38 |
| Multi-organism process | 43 | 4 | 4.11 | −0.05 | 0 | 2.39 | −1.55 |
| Cell wall organization or biogenesis | 49 | 5 | 4.68 | 0.15 | 2 | 2.73 | −0.44 |
| Signaling | 153 | 14 | 14.61 | −0.16 | 7 | 8.51 | −0.52 |
| Cellular component organization | 177 | 19 | 16.91 | 0.51 | 11 | 9.85 | 0.37 |
| Multicellular organismal process | 228 | 19 | 21.78 | −0.59 | 11 | 12.68 | −0.47 |
| Cellular component biogenesis | 232 | 31 | 22.16 | 1.88 | 17 | 12.90 | 1.14 |
| Developmental process | 257 | 21 | 24.55 | −0.72 | 12 | 14.30 | −0.61 |
| Response to stimulus | 262 | 16 | 25.02 | −1.80 | 22 | 14.57 | 1.95 |
| Biological regulation | 469 | 47 | 44.79 | 0.33 | 31 | 26.09 | 0.96 |
| Localization | 706 | 62 | 67.43 | −0.66 | 33 | 39.27 | −1.00 |
| Cellular process | 2489 | 240 | 237.73 | 0.15 | 149 | 138.45 | 0.90 |
| Metabolic process | 3070 | 293 | 293.22 | −0.01 | 157 | 170.76 | −1.05 |
| Total | 8198 | 783 | 783 | — | 456 | 456 | — |
Whole-genome annot.: number of annotations generated per category for the 12,469 predicted L. maculans genes.
Obs. annot.: observed number of annotations generated for all predicted genes on chromosomes 5 and 10.
Exp. annot.: expected number of annotations [= (∑annot.) × P(functional category)], where ∑annot. is the total number of annotations generated for the chromosome considered and P(functional category) is the whole-genome probability of the category considered. Values were rounded to two decimal places.
SR: standardized residuals. SRs were treated as normally distributed because the Kolmogorov-Smirnov test did not reject the null hypothesis of normality (chromosome 5: P = 0.453; chromosome 10: P = 0.188; α = 0.05). Biased categories are indicated in bold.
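The normality checks quoted in these footnotes can be approximated with a one-sample Kolmogorov-Smirnov test. This sketch assumes the SRs are compared with a standard normal N(0, 1). The footnotes do not say whether the mean and variance were estimated from the data, which would call for a Lilliefors-type correction. The P-value uses the asymptotic Kolmogorov distribution with Stephens' small-sample scaling, so it need not match the reported values. In a SciPy environment, `scipy.stats.kstest(values, "norm")` is the usual library route; it may use an exact small-sample distribution and give somewhat different P-values.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(values) -> float:
    """One-sample KS distance D between the empirical CDF and N(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue(d: float, n: int, terms: int = 100) -> float:
    """Asymptotic P(D > d) with Stephens' correction to the scaling factor."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:  # the Kolmogorov tail probability is 1 to many decimals here
        return 1.0
    tail = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                     for k in range(1, terms + 1))
    return min(1.0, max(0.0, tail))


# SR column for chromosome 5 in the table above.
SR_CHR5 = [-0.31, -0.54, 1.00, 1.00, 0.56, 0.56, 0.15, 0.57, -0.05, 0.15,
           -0.16, 0.51, -0.59, 1.88, -0.72, -1.80, 0.33, -0.66, 0.15, -0.01]
d5 = ks_statistic(SR_CHR5)
print(f"D = {d5:.3f}, asymptotic P = {ks_pvalue(d5, len(SR_CHR5)):.3f}")
```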