James K Hane, Richard P Oliver.
Abstract
BACKGROUND: Repeat-induced point mutation (RIP) is a fungal genome defence mechanism guarding against transposon invasion. RIP mutates the sequence of repeated DNA and over time renders the affected regions unrecognisable by similarity search tools such as BLAST.
Year: 2010 PMID: 21106049 PMCID: PMC3017866 DOI: 10.1186/1471-2164-11-655
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
The four potential di-nucleotide RIP mutations detected by RIPCAL.
| RIP mutation, forward (pre-RIP) | RIP mutation, forward (post-RIP) | RIP mutation, reverse complement (pre-RIP) | RIP mutation, reverse complement (post-RIP) | Counted di-nucleotides (forward) | Counted di-nucleotides (reverse complement) |
|---|---|---|---|---|---|
| CpA | TpA | TpG | TpA | CpA, TpA | TpG, TpA |
| CpC | TpC | GpG | GpA | CpC, TpC | GpG, GpA |
| CpG | TpG | CpG | CpA | CpG, TpG | CpG, CpA |
| CpT | TpT | ApG | ApA | CpT, TpT | ApG, ApA |
The deRIP process counts the occurrence of the contributing di-nucleotides incrementally across a multiple alignment of repeats and alters the consensus sequence at each position to the appropriate pre-RIP di-nucleotide sequence.
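The counting-and-reverting step described above can be sketched roughly as follows. This is a minimal illustration and not the RIPCAL implementation: the function names, the gap character `-`, and the rule "revert whenever both the pre-RIP and post-RIP di-nucleotide occur at the same pair of alignment columns" are assumptions made for the example. The reverse-complement rules are derived from the four forward mutations rather than typed by hand.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

# Forward-strand RIP mutations (pre-RIP, post-RIP); the C is the mutated base.
FORWARD_RULES = [("CA", "TA"), ("CC", "TC"), ("CG", "TG"), ("CT", "TT")]

# (pre-RIP, post-RIP, offset of the mutated base within the di-nucleotide).
# On the reverse complement the mutated base is the second one (G -> A).
RIP_RULES = [(pre, post, 0) for pre, post in FORWARD_RULES] + [
    (reverse_complement(pre), reverse_complement(post), 1)
    for pre, post in FORWARD_RULES
]

def majority_consensus(alignment):
    """Most abundant non-gap character in each alignment column."""
    consensus = []
    for column in zip(*alignment):
        counts = Counter(base for base in column if base != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)

def derip_consensus(alignment):
    """Start from the majority consensus and revert RIP-like sites.

    Assumption: a site is RIP-like when both the pre-RIP and the post-RIP
    di-nucleotide are observed at the same two alignment columns.
    """
    consensus = list(majority_consensus(alignment))
    for i in range(len(consensus) - 1):
        pairs = Counter(seq[i:i + 2] for seq in alignment)
        for pre, post, offset in RIP_RULES:
            if pairs[pre] and pairs[post]:
                consensus[i + offset] = pre[offset]  # restore the pre-RIP base
    return "".join(consensus)

# Hypothetical toy alignment: two of three copies carry a CpA -> TpA change.
toy = ["ACATG", "ATATG", "ATATG"]
print(majority_consensus(toy), derip_consensus(toy))
```

In the toy alignment the majority consensus keeps the mutated T, whereas the deRIP consensus restores the C because both CpA and TpA are seen at that position.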
Figure 1. The distribution of the Y1 family of rDNA repeats and their susceptibility to RIP mutation. (A) A multiple alignment of Y1 rDNA repeats found in the genome of S. nodorum strain SN15. Each repeat was compared for mutation with the alignment majority consensus (black = match, grey = mismatch, white = gap). The mutation of CpN di-nucleotides is color-coded according to the legend (left). In S. nodorum RIP is characterised by the mutation of CpA di-nucleotides (red). Y1 rDNA repeats are grouped according to their genomic location and length. Full length rDNA repeats scattered randomly throughout the genome are prone to RIP whereas short, incomplete copies (defined as <1 kb but generally <300 bp) are not affected. rDNA repeats located in a tandem array at the 3' end of scaffold 5 [NCBI: CH445329] are protected from RIP, excepting a single repeat. (B) The S. nodorum tandem rDNA array, also known as the nucleolus organiser region (NOR), and flanking regions. Region 1 contains gene encoding regions, region 2 contains non-rDNA repeats and regions 3 and 4 comprise the tandem rDNA array. RIP mutates repetitive DNA, hence genes in region 1 are not RIP-mutated but repeats in region 2 are RIP-mutated (indicated in red). The tandem rDNA array repeats are protected from RIP (region 4), except for a single repeat at the array terminus (region 3).
Validation of the deRIP technique comparing homology of majority- and deRIP-consensus sequences with non-RIP-affected sequences.
| Repeat class | Hit accession | Blastn e-value (majority consensus) | Blastn bitscore (majority consensus) | Blastn e-value (deRIP consensus) | Blastn bitscore (deRIP consensus) | deRIP improvement factor | Global alignment percent identity (majority consensus) | Global alignment percent identity (deRIP consensus) | deRIP improvement to percent identity |
|---|---|---|---|---|---|---|---|---|---|
| Elsa | AJ277966 | 1.00E-51 | 216 | 1.00E-121 | 381 | 1.8 X | 69.2% | 73.1% | 3.9% |
| Molly | AJ488502 | 7.00E-07 | 66 | 3.00E-86 | 329 | 5.0 X | 72.3% | 77.5% | 5.2% |
| Pixie | AJ488503 | 5.00E-07 | 66 | 2.00E-28 | 137 | 2.1 X | 72.5% | 75% | 2.5% |
| Long, non-rDNA array repeats > 1 kb | | 0 | 12800 | 0 | 17220 | 1.3 X | 89.5% | 94.0% | 4.5% |
| Short, non-rDNA array repeats < 1 kb | | 3.00E-10 | 58 | 1.00E-27 | 122 | 2.1 X | 46.2% a | 45.6% a | -0.6% |
| RIP-mutated terminal rDNA array repeat | | 0 | 8258 | -- | -- | -- | 85.8% | -- | -- |
a: Needleman-Wunsch global alignment was performed using a sub-region of long rDNA repeats corresponding to the short rDNA repeat consensus.
Blastn hits and pairwise global percent identities to non-RIP-affected sequences were compared between the majority consensus and deRIP consensus versions. (A) The transposons Elsa, Molly and Pixie of S. nodorum SN15 were compared to active copies from an alternate strain. In all 3 cases the deRIP sequences matched the active transposons best. This is indicated by the 'deRIP improvement' factor and the differences in percent identities for global alignments. DeRIP improvement is a measure of how much better the deRIP consensus matched the hit compared to the majority consensus. DeRIP improvement > 1 indicates that the repeat family was derived from the hit or a related homolog, but was subsequently mutated by RIP. (B) RIP-protected copies of the S. nodorum rDNA repeat are located within a tandem array (Figure 1). RIP-susceptible copies were grouped by size into long (> 1 kb) and short (< 1 kb) categories and compared to the RIP-protected copies. Homology between RIP-protected repeats in the rDNA array and long RIP-susceptible non-rDNA array repeats was improved by deRIP. The rDNA array also contains one RIP-affected repeat at its terminus, which shows a level of homology to the rDNA array similar to that of the majority consensus of the long non-rDNA array repeats.
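The legend does not spell out how the deRIP improvement factor is calculated. The tabulated Blastn factors are, however, consistent with the ratio of the deRIP-consensus bitscore to the majority-consensus bitscore, so the sketch below assumes that definition. The function name and the dictionary layout are illustrative only; the numbers are the bitscores listed in the table above.

```python
def derip_improvement(majority_bitscore, derip_bitscore):
    """Assumed definition: ratio of deRIP to majority-consensus bitscore."""
    return derip_bitscore / majority_bitscore

# (majority bitscore, deRIP bitscore) taken from the Blastn columns above.
bitscores = {
    "Elsa": (216, 381),
    "Molly": (66, 329),
    "Pixie": (66, 137),
    "Long non-rDNA array repeats": (12800, 17220),
    "Short non-rDNA array repeats": (58, 122),
}

factors = {
    name: round(derip_improvement(majority, derip), 1)
    for name, (majority, derip) in bitscores.items()
}

for name, factor in factors.items():
    print(f"{name}: {factor} X")
```

Under this assumption the rounded ratios reproduce the reported factors (1.8, 5.0, 2.1, 1.3 and 2.1), and a value above 1 means the deRIP consensus scored higher against the hit than the majority consensus did.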
Figure 2. Application of the deRIP process to the Molly transposon repeat family of S. nodorum. Molly is one of three S. nodorum repeats with known functionally transposable sequence available [NCBI AJ488502.1]. (A) Genomic matches to the Molly repeat family were aligned and compared for RIP-like polymorphism against a model sequence (in this case the majority consensus). RIP mutation of the form CpN ↔ TpN was color-coded as indicated in the legend. (B) The deRIP process was applied to a 51 bp sub-region of the alignment. A 'majority' consensus of the alignment represented the most abundant nucleotide at each alignment position. The deRIP consensus was derived from the majority consensus; however, where di-nucleotides were detected exhibiting RIP-like polymorphism (Table 1), they were reverted to their pre-RIP state. Changes in sequence between the majority and deRIP consensus sequences were compared to the sequence of the active transposon. (C) Phylogram showing relationships between all genomic regions, majority consensus, deRIP consensus and active copy of the Molly repeat family. The deRIP consensus resembled the functional transposon more closely than did the majority consensus, the highest G:C content sequence and the majority of matching genomic regions.
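The comparison in panel (A), in which each aligned copy is scored for RIP-like polymorphism against a model sequence, could look something like the sketch below. It is an illustration under stated assumptions rather than the RIPCAL code: the function names are hypothetical, both sequences are assumed to be aligned and of equal length, and only changes from a pre-RIP di-nucleotide in the model to a post-RIP di-nucleotide in the copy are counted. Changes seen on the reverse strand (NpG to NpA) are reported under the equivalent forward CpN class.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def count_rip_like(model, repeat):
    """Count CpN -> TpN changes (either strand) of `repeat` relative to `model`."""
    counts = Counter()
    for i in range(len(model) - 1):
        m, r = model[i:i + 2], repeat[i:i + 2]
        # Forward strand: C -> T with the 3' neighbour unchanged.
        if m[0] == "C" and r[0] == "T" and m[1] == r[1] and m[1] in "ACGT":
            counts[f"Cp{m[1]}"] += 1
        # Reverse strand: G -> A with the 5' neighbour unchanged.
        elif m[1] == "G" and r[1] == "A" and m[0] == r[0] and m[0] in "ACGT":
            forward = reverse_complement(m)  # e.g. TpG here is CpA on the other strand
            counts[f"Cp{forward[1]}"] += 1
    return counts

# Hypothetical toy sequences: one forward-strand and one reverse-strand CpA change.
print(count_rip_like("ACATG", "ATATG"))
print(count_rip_like("TTGA", "TTAA"))
```

Both toy comparisons report a single CpA-class change, which is the class that the Figure 1 legend describes as characteristic of RIP in S. nodorum.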
Summary of RIP mutation in the repeat families of S. nodorum strain SN15.
| Repeat family | | | RIP Dominance Scores | | | NCBI NR Protein Blastx | | | GIRI Repbase Tblastx | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R8 | 48 | 9143 | 2.96 | 1.95 | 2.91 | 126 | 77 | 1.96 | |||
| R10 | 59 | 1241 | 1.91 | 0.96 | 2.07 | 3 | 2 | 2.13 | 0 | 3 | |
| X0 | 76 | 3862 | 2.13 | 0.97 | 2.05 | 1 | 0 | | 3 | 1 | 1.34 |
| R9 | 72 | 4108 | 1.88 | 0.92 | 1.77 | 250 | 25 | 2.75 | 124 | 4 | 1.28 |
| Molly | 40 | 1862 | 1.21 | 0.64 | 1.73 | 250 | 161 | 3.78 | 34 | 15 | 1.92 |
| X3 | 213 | 9364 | 0.63 | 0.81 | 1.62 | 11 | 10 | 2.8 | |||
| X35 | 19 | 1157 | 1.5 | 1.34 | 1.43 | 1 | 0 | ||||
| X96 | 14 | 308 | 0.87 | 0.89 | 1.39 | ||||||
| X48 | 22 | 265 | 1.82 | 1.16 | 1.33 | ||||||
| R22 | 23 | 678 | 1.2 | 0.84 | 1.28 | 2 | 2 | 1.06 | |||
| X26 | 38 | 4628 | 1.16 | 1.08 | 1.19 | 57 | 57 | 1.38 | |||
| Pixie | 28 | 1845 | 0.77 | 0.57 | 1.06 | 250 | 190 | 1.79 | 17 | 18 | 1.25 |
| R37 | 98 | 1603 | 0.49 | 0.25 | 0.95 | 0 | 55 | | 4 | 18 | 1.16 |
| R31 | 23 | 3031 | 0.99 | 0.83 | 0.9 | 16 | 15 | 1.44 | 3 | 7 | 1.14 |
| X23 | 29 | 685 | 0.45 | 0.4 | 0.9 | 3 | 3 | 0.82 | |||
| X36 | 10 | 512 | 0.89 | 0.78 | 0.87 | 2 | 1 | 1.43 | |||
| Elsa | 17 | 5240 | 0.86 | 0.78 | 0.82 | 250 | 231 | 2.06 | 65 | 30 | 1.44 |
| R51 | 39 | 833 | 0.47 | 0.31 | 0.8 | 0 | 3 | | 0 | 3 | |
| X11 | 36 | 8555 | 0.83 | 0.71 | 0.78 | 250 | 250 | 1.35 | 250 | 228 | 2.18 |
| X12 | 29 | 2263 | 0.67 | 0.43 | 0.76 | 0 | 1 | | 10 | 10 | 1.44 |
| R39 | 29 | 2050 | 0.59 | 0.28 | 0.74 | 173 | 149 | 1.54 | 34 | 31 | 1.88 |
| X28 | 30 | 1784 | 0.83 | 0.59 | 0.73 | ||||||
| R25 | 23 | 3320 | 0.25 | 0.6 | 0.65 | 4 | 4 | 1.19 | 3 | 1 | 0.86 |
| X15 | 37 | 6231 | 0.61 | 0.46 | 0.61 | 250 | 250 | 1.45 | 243 | 217 | 1.66 |
| R38 | 25 | 358 | 0.2 | 0.14 | 0.5 | ||||||
RIP dominance, a measure of the strength of RIP mutation, is reported for all 3 RIPCAL comparison methods: versus the highest G:C content sequence, versus the alignment 'majority' consensus, and versus the deRIP consensus. Measures of how closely the predicted deRIP consensus of a repeat family resembles its original version (hit discovery scores and deRIP improvement factors) are also summarised for comparisons against NCBI NR Proteins via blastx and the GIRI Repbase database of repetitive elements via tblastx.
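The formula for RIP dominance is not given in this legend. A minimal sketch, assuming dominance is the count of the dominant mutation class (CpA ↔ TpA in S. nodorum) divided by the summed counts of the other three CpN ↔ TpN classes, is shown below. The function name, the dictionary keys and the example counts are hypothetical and are not taken from the tables.

```python
def rip_dominance(counts, target="CpA"):
    """Assumed definition: target-class count over the sum of the other classes."""
    others = sum(n for kind, n in counts.items() if kind != target)
    if others == 0:
        # No competing mutations: dominance is unbounded if any target
        # mutations were seen, otherwise there is no signal at all.
        return float("inf") if counts.get(target, 0) else 0.0
    return counts.get(target, 0) / others

# Hypothetical mutation counts for one repeat family.
example_counts = {"CpA": 30, "CpC": 5, "CpG": 5, "CpT": 5}
print(rip_dominance(example_counts))
```

Under this assumed definition, a score above 1 means the target class outnumbers the other three classes combined.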
Classification of repeat family origin in S. nodorum SN15.
| Repeat family | Previously described origin [20] | Predicted origin after deRIP | Comparison type | Informative hits | Majority consensus e-value | deRIP consensus e-value | deRIP improvement factor (maximum) |
|---|---|---|---|---|---|---|---|
| X26 | Sub-telomeric, transposon remnant | Telomere-associated RecQ helicase | blastx vs NR | EAL89306.1 telomere-associated RecQ helicase, putative | 1.00E-07 | 2.00E-12 | 1.25 |
| R25 | Transposon remnant | Histone H3 | blastx vs NR | EDU47581.1 histone H3 | 0.032 | 2.00E-04 | 1.16 |
| tblastx vs Repbase | TDD4 DNA transposon | 6.00E-04 | |||||
| R10 | Unknown | Uncharacterized endogenous gene region and DNA transposon | blastx vs NR | EAT76576.1 hypothetical protein SNOG_15997 | 2.2 | 2.00E-13 | 2.13 |
| blastx vs NR | EAT81769.1 hypothetical protein SNOG_11270 | 5.00E-11 | |||||
| blastx vs NR | EAT76052.1 hypothetical protein SNOG_16585 | 0.006 | 2.00E-08 | 1.40 | |||
| tblastx vs Repbase | CR1-3_HM CR1 | 9.00E-06 | |||||
| R31 | Unknown | DNA Transposon | blastx vs NR | CAP79587.1 Pc23g00930 | 0.013 | 1.00E-06 | 1.28 |
| tblastx vs Repbase | hAT-1_AN hAT DNA transposon | 1.00E-05 | 1.00E-06 | 1.07 | |||
| R39 | Unknown | Mariner/Tc1-like DNA transposon | blastx vs NR | EAT91063.1 hypothetical protein SNOG_01414 | 2.00E-62 | 3.00E-73 | 1.15 |
| blastx vs NR | EED11513.1 pogo transposable element, putative | 2.00E-28 | 1.00E-36 | 1.20 | |||
| tblastx vs Repbase | Mariner-9_AN Mariner/Tc1 | 8.00E-37 | 1.00E-25 | 1.01 | |||
| R51 | Unknown | Mariner/Tc1-like DNA transposon | tblastx vs Repbase | P-29_HM P | 1.00E-05 | ||
| tblastx vs Repbase | Mariner-31_HM Mariner/Tc1 | 3.00E-05 | |||||
| X23 | Unknown | LTR Retrotransposon | tblastx vs Repbase | ATCOPIA80_I Copia | 1.00E-04 | ||
| tblastx vs Repbase | CR1-3_HM CR1 | 9.00E-05 | 3.00E-04 | 0.82 | |||
| X36 | Unknown | Retrotransposon | blastx vs NR | EAS29858.1 hypothetical protein CIMG_08604 | 4.9 | 2.00E-04 | 1.43 |
| blastx vs NR | gag-pol polyprotein | 4.00E-03 | |||||
| X3X3R8 | X3: Helicase | Endogenous gene cluster containing tandem duplicated Rad5/SNF2-like helicase, Rad6/ubiquitin conjugating enzyme and uncharacterised ORFs | blastx vs NR | EAT83378.1 hypothetical protein SNOG_09186 | 1.00E-165 | 0 | 1.48 |
| blastx vs NR | EAT90556.1 hypothetical protein SNOG_02344 | 6.00E-75 | 1.00E-122 | 1.54 | |||
| blastx vs NR | EAT83381.1 hypothetical protein SNOG_09189 | 8.00E-93 | 1.00E-117 | 1.51 | |||
| blastx vs NR | EAT90553.1 hypothetical protein SNOG_02341 | 7.00E-61 | 1.00E-100 | 1.51 | |||
| blastx vs NR | EAT92620.1 hypothetical protein SNOG_16597 | 9.00E-39 | 2.00E-49 | 1.43 | |||
| blastx vs NR | EAT91018.1 hypothetical protein SNOG_01369 | 3.00E-30 | 1.00E-48 | 1.44 | |||
| blastx vs NR | EAT90555.1 hypothetical protein SNOG_02343 | 7.00E-36 | 5.00E-33 | 1.31 | |||
| blastx vs NR | EAT90554.1 hypothetical protein SNOG_02342 | 1.00E-36 | 2.00E-26 | 1.34 | |||
| blastx vs NR | EAT83379.1 hypothetical protein SNOG_09187 | 3.00E-14 | 1.00E-21 | 1.28 | |||
| blastx vs NR | EAT91020.2 hypothetical protein SNOG_01371 | 3.00E-15 | 2.00E-20 | 1.19 | |||
| blastx vs NR | EAT83294.1 hypothetical protein SNOG_09102 | 2.00E-13 | 4.00E-20 | 1.25 | |||
| blastx vs NR | EAT83377.2 hypothetical protein SNOG_09185 | 1.00E-13 | 8.00E-20 | 1.23 | |||
| blastx vs NR | EAT92618.1 hypothetical protein SNOG_16595 | 3.00E-07 | 4.00E-19 | 1.60 | |||
| blastx vs NR | EAT83380.1 hypothetical protein SNOG_09188 | 4.00E-05 | 7.00E-15 | 1.57 | |||
| blastx vs NR | EAT91021.1 hypothetical protein SNOG_01372 | 2.00E-06 | 8.00E-14 | 1.40 | |||
| blastx vs NR | EDU40406.1 ubiquitin-conjugating enzyme E2-21 kDa | 2.00E-10 | 2.00E-16 | 1.27 | |||
| blastx vs NR | EAW17873.1 ubiquitin conjugating enzyme ( | 1.00E-07 | 2.00E-13 | 1.29 | |||
| R8: Ubiquitin conjugating enzyme | blastx vs NR | EAT91013.2 hypothetical protein SNOG_01364 | 0 | 0 | 0.91 | ||
| blastx vs NR | EAT92627.2 hypothetical protein SNOG_16589 | 0 | 0 | 0.91 | |||
| blastx vs NR | EAT83373.2 hypothetical protein SNOG_09181 | 1.00E-177 | 0 | 0.95 | |||
| blastx vs NR | EAT90557.2 hypothetical protein SNOG_02345 | 2.00E-65 | 1.00E-106 | 2.80 | |||
| blastx vs NR | EAT90559.2 hypothetical protein SNOG_02347 | 1.00E-62 | 5.00E-91 | 1.38 | |||
| blastx vs NR | EAT91015.1 hypothetical protein SNOG_01366 | 3.00E-40 | 3.00E-62 | 1.35 | |||
| blastx vs NR | EAT85951.1 hypothetical protein SNOG_06120 | 2.00E-18 | 3.00E-24 | 1.19 | |||
| blastx vs NR | EAT91016.1 hypothetical protein SNOG_01367 | 1.00E-15 | 2.00E-23 | 1.28 | |||
| blastx vs NR | EAT83374.2 hypothetical protein SNOG_09182 | 5.00E-05 | 4.00E-08 | 1.23 | |||
| blastx vs NR | EAT91014.2 hypothetical protein SNOG_01365 | 1.00E-04 | 2.00E-04 | 1.15 | |||
After deRIP analysis the predicted origin of 8 repeat families has been altered from that described in Hane & Oliver (2008) [20]. Details of the blast hits which were most informative in re-classifying a repeat family are listed below. E-values are shown for matches to both the majority and deRIP consensus sequences. DeRIP improvement is a measure of how much better the deRIP consensus matched the hit compared to the majority consensus. DeRIP improvement > 1 indicates that the repeat family was derived from the hit or a related homolog, but was subsequently mutated by RIP.
Figure 3. Nine copies of the repeat X3X3R8 contain predicted gene annotations in the S. nodorum genome. Blast analysis of the deRIP consensus sequence led to the hypothesis that 3 helicase genes, a ubiquitin conjugating enzyme and 2 unknown genes originally occupied this region. The effects of RIP mutation have disrupted the open reading frames of several of these genes, resulting in multiple short gene predictions that are highly likely to be pseudogenes.
Figure 4. Comparison of the repeats of the X3X3R8 repeat family with respect to predicted gene content. The deRIP consensus (red square, top) is a prediction of the original repeat sequence prior to RIP-degradation. Nine X3X3R8 repeats contained predicted gene annotations (green circles, refer to Figure 3). All gene-annotation containing repeats were more closely related to the deRIP consensus than to the majority consensus (red square, middle).