Abstract
BACKGROUND: Efforts to gather genomic evidence for the processes of gene evolution are ongoing, and are closely coupled to improved gene annotation methods. Such annotation is complicated by the occurrence of disrupted mRNAs (dmRNAs), harbouring frameshifts and premature stop codons, which can be considered indicators of decay into pseudogenes.
Year: 2007 PMID: 17937804 PMCID: PMC2194788 DOI: 10.1186/1471-2164-8-371
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Overall statistics
| Initial frame disruption is frameshift | 346 (83%) |
| Number with compensatory frameshifts | 17 (4%) |
| Initial frame disruption is premature stop codon | 73 (17%) |
Figure 1. Three examples of dmRNAs. The translated dmRNA sequence is shown along with the corresponding nucleotide sequence; the aligning protein sequence is shown above these in each case. The examples are: (a) a multiply disrupted example homologous to a cytochrome P450; (b) a multiply disrupted example from a zinc-finger-containing transcription factor family; (c) an alternative splicing of the gene C20orf59, which appears to be a transmembrane sugar transporter.
Figure 2. Numbers of paralogs. The distribution of the number of paralogs for all genes, and for genes yielding dmRNAs. The bin labeled x contains all values N such that x-5
Figure 3. Numbers of frame disruptions. The number of frame disruptions in dmRNAs plotted versus the total occurrences of this number, on a log-log scale. This distribution is governed by a power law relationship, with the parameters for this linear relationship indicated on the plot.
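A power law y = c·x^k appears as a straight line on a log-log plot, because log y = log c + k·log x. The following is a minimal sketch of how such a line could be fitted by ordinary least squares on the log-transformed values. The function name `fit_power_law` and the counts used are hypothetical illustrations; they are not the data or the fitted parameters shown in Figure 3, and the fitting procedure actually used for the figure is not stated here.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log10(y) = intercept + slope * log10(x).

    Returns (slope, intercept); the power law is y = 10**intercept * x**slope.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical example (NOT the paper's data): occurrences that follow
# 256 * n**-2 exactly, where n is the number of frame disruptions.
disruptions = [1, 2, 4, 8]
occurrences = [256, 64, 16, 4]
slope, intercept = fit_power_law(disruptions, occurrences)
print(f"exponent = {slope:.2f}, prefactor = {10 ** intercept:.1f}")
```

With real counts the points will scatter around the line, and bins with zero occurrences have to be dropped before taking logarithms.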
Figure 4. Distribution of frame-disrupted and non-frame-disrupted exon lengths in the disrupted mRNAs. The exon lengths are in bins labelled at either end of the bin with the upper (≤) and lower (>) bounds, with occurrences in each bin on the y axis. The percentage of exons >1000 nucleotides is given for each data set. The upper left panel is for the whole set of exons; the lower left panel for 5' exons, the upper right for internal exons, and the lower right for 3' exons.
Protein structure disruptions in mammalian messenger RNA transcripts
| Disruption type | Sample | Observed (outside : inside domain) | Expected (outside : inside domain) | Significance |
|---|---|---|---|---|
| Frameshift | All cases | 293 : 54 | 272.7 : 74.3 | †† |
| | Cases with verifying alignments | 230 : 51 | 211.3 : 69.7 | ††† |
| Stop codon | All cases | 68 : 5 | 55.6 : 17.4 | ††† |
| | Cases with verifying alignments | 34 : 5 | 27.4 : 11.6 | † |
| Frameshift | All cases | 360 : 59 | 327.5 : 91.5 | ††† |
| | Cases with verifying alignments | 268 : 57 | 242.2 : 82.8 | †† |
| | Cases with verifying alignments (excluding probable UTR features) | 174 : 35 | 153.6 : 55.4 | ††† |
* For the last row, those with frameshifts and stop codons are pooled together.
** Verifying alignments are significant alignments to a rodent or non-mammalian vertebrate protein, as detailed in Methods.
*** The ratio stands for 'the number of frame disruptions not disrupting a protein structure domain assignment versus the number that do'. A margin for ascertaining overlap with a protein domain assignment of 15 nucleotides was used in the calculations. The expectations for the statistical tests (χ2) are calculated by adding up the total amount of coding sequence that can be assigned to a SCOP protein structure domain for the sample of transcripts analysed in each row of the table. † stands for P < 0.05, †† for P < 0.01 and ††† for P < 0.001. The significant results remain significant to at least P < 0.05 when margins for calculating overlap with protein domains of 0, 5, 10, 20 or 25 nucleotides are also used.
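The test described in the footnote can be sketched as a χ2 goodness-of-fit comparison between an observed pair of counts and its expected pair. The sketch below assumes one degree of freedom (two categories with a fixed total), which the table does not state explicitly; the function names `chi_square_two_categories` and `dagger` are hypothetical. The example values are the "Frameshift, All cases" ratios from the table above.

```python
import math

def chi_square_two_categories(observed, expected):
    """Chi-square goodness-of-fit statistic and P value for two categories.

    Assumes 1 degree of freedom, for which the upper-tail probability
    is P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def dagger(p_value):
    """Map a P value onto the dagger notation used in the tables."""
    if p_value < 0.001:
        return "†††"
    if p_value < 0.01:
        return "††"
    if p_value < 0.05:
        return "†"
    return "N.S."

# Observed and expected counts (outside domain : inside domain) for the
# "Frameshift, All cases" row.
stat, p_value = chi_square_two_categories((293, 54), (272.7, 74.3))
print(f"chi2 = {stat:.2f}, P = {p_value:.4f}, {dagger(p_value)}")
```

Under this assumption the row gives a statistic of about 7.06 and a P value between 0.001 and 0.01, which agrees with the †† shown in the table.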
Distribution of initial disablements relative to zinc-finger domains *
| Overlap margin (residues) | Observed ratio | Expected ratio | Significance |
|---|---|---|---|
| 0 | 21 : 20 | 26.7 : 14.3 | N.S. |
| 1 | 17 : 24 | 24.7 : 16.3 | † |
| 2 | 14 : 21 | 22.1 : 18.9 | † |
| 3 | 12 : 29 | 19.7 : 21.3 | † |
| 4 | 10 : 31 | 17.3 : 23.7 | † |
| 5 | 10 : 31 | 14.9 : 26.1 | N.S. |
* The format of this Table is as for Tables 2–3. The overlap margin is the number of residues that are ignored at either end of the zinc-finger domain (thus shortening the length of the defined protein motif).
Frame disruption placement and alternative splicing
| Disruption type | Sample | Observed (constitutive : alternative exons) | Expected (constitutive : alternative exons) | Significance |
|---|---|---|---|---|
| Frameshift | All cases | 191 : 156 | 258.8 : 88.2 | ††† |
| | Cases with verifying alignments | 156 : 125 | 209.6 : 71.4 | ††† |
| Stop codon | All cases | 33 : 40 | 54.4 : 18.6 | ††† |
| | Cases with verifying alignments | 13 : 26 | 29.1 : 9.9 | ††† |
| Frameshift | All cases | 228 : 191 | 312.5 : 106.5 | ††† |
| | Cases with verifying alignments | 174 : 151 | 242.4 : 82.7 | ††† |
| | Cases with verifying alignments (without 'probable UTR features') | 114 : 95 | 143.8 : 65.2 | ††† |
* Ratios are for numbers expected or observed in constitutive exons versus alternative ones. Expectations and tests are calculated as for Table 1.
Frame disruption placement and nonsense-mediated decay
| Disruption type | Sample | Observed (NMD : non-NMD regions) | Expected (NMD : non-NMD regions) | Significance |
|---|---|---|---|---|
| Frameshift | All cases | 159 : 188 | 294.9 : 52.1 | ††† |
| | Cases with verifying alignments | 122 : 159 | 206.6 : 74.4 | ††† |
| Stop codon | All cases | 43 : 30 | 42.9 : 30.1 | N.S. |
| | Cases with verifying alignments | 22 : 17 | 17.5 : 21.5 | N.S. |
| Frameshift | All cases | 201 : 218 | 344.5 : 74.5 | ††† |
| | Cases with verifying alignments | 141 : 184 | 232.0 : 93.0 | ††† |
| | Cases with verifying alignments (without 'probable UTR features') | 87 : 122 | 111.5 : 97.5 | ††† |
* Ratios are for numbers expected or observed in NMD regions versus non-NMD ones. Expectations and tests are performed as for Table 1.
Figure 5. Pipeline for annotating dmRNAs. The steps discussed in Methods are illustrated schematically.