Ian Armstead, Lin Huang, Julie King, Helen Ougham, Howard Thomas, Ian King.
Abstract
BACKGROUND: Various methods have been developed to explore inter-genomic relationships among plant species. Here, we present a sequence similarity analysis based upon comparison of transcript-assembly and methylation-filtered databases from five plant species and physically anchored rice coding sequences.
Year: 2007 PMID: 17708759 PMCID: PMC2041955 DOI: 10.1186/1471-2164-8-283
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Partial angiosperm taxonomy illustrating the relationship between the monocot and dicot species included in the present analysis. Numbers represent estimated times of lineage divergence (million years before the present) relative to rice, taken from (1) Bell et al. (2005) [26] and (2) Gaut (2002) [58]. Taxonomic relationships were obtained from the NCBI Taxonomy browser [59].
Spearman rank correlation coefficients between linear 10% FAexpTRL pseudomolecule segments comparing % MegaBLAST alignments and average scores for each test database.
| Database | Lp_MF | |||||||||||
| Lp_MF | a | a | Lp_MF | |||||||||
| Lp_MF | b | b | Zm_MF | |||||||||
| Zm_MF | a | a | Zm_MF | |||||||||
| Zm_MF | b | b | Zm_TA | |||||||||
| Zm_TA | a | a | Zm_TA | |||||||||
| Zm_TA | b | 0.248** | b | Hv_TA | ||||||||
| Hv_TA | a | a | Hv_TA | |||||||||
| Hv_TA | b | 0.256** | b | Gm_TA | ||||||||
| Gm_TA | a | 0.275** | a | Gm_TA | ||||||||
| Gm_TA | b | b | At_TA | |||||||||
| At_TA | a | 0.242** | 0.259** | a | ||||||||
| At_TA | b | |||||||||||
a = % alignments
b = average scores
Significance levels: p < 0.001 (bold); ** = p < 0.01; * = p < 0.05; ns = p > 0.05
1 p = 0.051
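The coefficients in the table above are Spearman rank correlations between per-segment values for pairs of test databases. As a minimal sketch of how such a coefficient can be computed (not the authors' own procedure or software), the following ranks each series, averages tied ranks, and takes the Pearson correlation of the ranks. The function names and the five example values are hypothetical and are not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical % alignment values for five segments (NOT data from the paper).
pct_alignments_db1 = [12.0, 30.5, 41.0, 38.2, 25.1]
pct_alignments_db2 = [10.3, 28.0, 44.9, 35.7, 29.4]
rho = spearman_rho(pct_alignments_db1, pct_alignments_db2)
print(f"rho = {rho:.3f}")
```

The sketch returns only the coefficient; the significance levels reported in the table would need a separate test.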
Figure 2. Distribution of differently annotated TIGR rice loci on rice pseudomolecules. Linear order of differently annotated types of TIGR rice loci (TRL) on each of the rice pseudomolecules (1–12) in relation to significant MegaBLAST sequence alignments between the Os_CD database and the test databases. For each rice pseudomolecule: column 1 (red) = combined test database significant alignments; column 2 (blue) = functionally annotated TRL or expressed protein; column 3 (blue) = hypothetical protein; column 4 (blue) = retro/transposon-related sequence. Pseudomolecules are aligned along the centromere (horizontal black bar).
Figure 3. Heat maps for % sequence alignments and average scores. Colour-coded moving windows/100 functionally annotated, expressed TIGR rice loci (MWs/FAexpTRL) for each rice pseudomolecule (1–12). For each pseudomolecule: columns 1–6 = MWs for % significant MegaBLAST alignments between Os_CD and test databases Lp_MF, Zm_MF, Zm_TA, Hv_TA, Gm_TA and At_TA, respectively; column 7 = position of MWs containing rice centromere (dark vertical bar); columns 8–13 = MWs for average score of significant MegaBLAST alignments between Os_CD and test databases Lp_MF, Zm_MF, Zm_TA, Hv_TA, Gm_TA and At_TA, respectively [see Additional file 2, Table 3 for colour code quantification]. Pseudomolecule representations are aligned along the centromeres.
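The heat maps summarise alignments in moving windows of 100 FAexpTRL along each pseudomolecule. The sketch below illustrates one way to compute the percentage of loci with a significant alignment per window. The function name, the boolean input format, and the `step` between window starts are assumptions made for illustration; the caption does not state the step used.

```python
from itertools import accumulate


def moving_window_percent(has_alignment, window=100, step=1):
    """Percent of loci with a significant alignment in each moving window.

    has_alignment: booleans in linear order along one pseudomolecule.
    window: number of loci per window (100 in the figure caption).
    step: offset between window starts (assumed; not given in the caption).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # prefix[i] = number of aligned loci among the first i loci.
    prefix = [0] + list(accumulate(int(flag) for flag in has_alignment))
    last_start = len(has_alignment) - window
    return [
        100.0 * (prefix[start + window] - prefix[start]) / window
        for start in range(0, last_start + 1, step)
    ]


# Hypothetical pseudomolecule: 150 aligned loci followed by 50 unaligned loci.
flags = [True] * 150 + [False] * 50
percents = moving_window_percent(flags, window=100, step=50)
print(percents)
```

Prefix sums keep the cost linear in the number of loci regardless of window size. An average-score track could be built the same way by accumulating scores instead of flags.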
FAexpTRL blocks associated with high MegaBLAST scores from pseudomolecules 4 and 10
| Test database MegaBLAST score | |||||||
| FAexpTRL | Annotation | Lp_MF | Zm_MF | Zm_TA | Hv_TA | Gm_TA | At_TA |
| LOC_Os04g16450 | aquaporin PIP2.8, putative, expressed | 470 | 357 | 351 | 838 | 121 | 121 |
| LOC_Os04g16680 | Sedoheptulose-1,7-bisphosphatase, chloroplast precursor, putative, expressed | 289 | 995 | 1112 | 1112 | 184 | 129 |
| LOC_Os04g16740 | ATP synthase alpha chain, putative, expressed | 1037 | 644 | 743 | 1126 | 856 | 815 |
| LOC_Os04g16750 | Photosystem I P700 chlorophyll a apoprotein A2, putative, expressed | 613 | 868 | 644 | 628 | 470 | 462 |
| LOC_Os04g16760 | Photosystem I P700 chlorophyll a apoprotein A1, putative, expressed | 503 | 1465 | 1100 | 323 | 357 | 252 |
| LOC_Os04g16770 | Photosystem Q, putative, expressed | 936 | 190 | 539 | 1891 | 848 | 987 |
| LOC_Os04g16780 | Chloroplast 30S ribosomal protein S3, putative, expressed | 129 | 287 | 474 | 498 | 97.6 | 141 |
| LOC_Os04g16790 | DNA-directed RNA polymerase alpha chain, putative, expressed | 525 | - | 1086 | 1219 | 216 | 206 |
| LOC_Os04g16819 | DNA-directed RNA polymerase beta chain, putative, expressed | 264 | 1265 | 410 | 394 | 105 | 188 |
| LOC_Os04g16820 | DNA-directed RNA polymerase beta chain, putative, expressed | 1072 | - | 1096 | 460 | 266 | - |
| LOC_Os10g21200 | Photosystem Q, putative, expressed | 936 | 1037 | 539 | 1883 | 840 | 979 |
| LOC_Os10g21270 | ATP synthase beta chain, putative, expressed | 991 | 1524 | 1376 | 527 | 573 | 793 |
| LOC_Os10g21280 | Ribulose bisphosphate carboxylase large chain precursor, putative, expressed | 936 | 591 | 1766 | 1701 | 531 | 981 |
| LOC_Os10g21310 | Photosystem II P680 chlorophyll A apoprotein, putative, expressed | 561 | 287 | 551 | 170 | 206 | - |
| LOC_Os10g21330 | DNA-directed RNA polymerase alpha chain, putative, expressed | 525 | 184 | 1070 | 1203 | 216 | 206 |
| LOC_Os10g38229 | Photosystem I P700 chlorophyll a apoprotein A1, putative, expressed | 496 | 914 | 1092 | 323 | 357 | 252 |
| LOC_Os10g38248 | Photosystem I P700 chlorophyll a apoprotein A2, putative, expressed | 1033 | 949 | 936 | 906 | 716 | 1076 |
| LOC_Os10g38270 | ATP synthase alpha chain, putative, expressed | 1029 | - | 698 | 1098 | 848 | 815 |
| LOC_Os10g38292 | Chloroplast ATP synthase a chain precursor, putative, expressed | 109 | 258 | - | 1096 | 545 | 614 |
| Mean MegaBLAST score | 214 | 260 | 504 | 473 | 155 | 151 | |
- = no significant MegaBLAST alignment with Os_CD database
Spearman rank correlation coefficients between linear 10% FAexpTRL pseudomolecule segments comparing % MegaBLAST alignments for each of the test databases and the number of segmentally duplicated FAexpTRL
| Test database | Segmentally duplicated FAexpTRL | |
| A | B | |
| Lp_MF | ||
| Zm_MF | ||
| Zm_TA | ||
| Hv_TA | ||
| Gm_TA | ||
| At_TA | 0.185* | |
A = including pseudomolecule 11 and 12 segments 1 and 2 (n = 120).
B = excluding pseudomolecule 11 and 12 segments 1 and 2 (n = 116).
Significance levels: p < 0.001 (bold); * = p < 0.05.
[See Additional file 2 Table 2 for associated 10% FAexpTRL pseudomolecule 11 and 12 segments 1 and 2 values]
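The correlations above are computed over linear segments that each hold 10% of the FAexpTRL on a pseudomolecule, giving 12 × 10 = 120 segments (116 when segments 1 and 2 of pseudomolecules 11 and 12 are excluded). A minimal sketch of such a partition follows. The function name is hypothetical, and the way leftover loci are spread when the count is not divisible by ten is an assumption, since the source does not describe it.

```python
def decile_segments(loci):
    """Split an ordered list of loci into 10 consecutive, near-equal segments."""
    n = len(loci)
    # Integer boundaries; any remainder is spread across segments (assumed).
    bounds = [(i * n) // 10 for i in range(11)]
    return [loci[bounds[i]:bounds[i + 1]] for i in range(10)]


# Hypothetical pseudomolecule with 25 ordered loci (NOT data from the paper).
segments = decile_segments(list(range(25)))
sizes = [len(segment) for segment in segments]
print(sizes)

# Segment counts matching the table notes: all segments, then with
# segments 1 and 2 of pseudomolecules 11 and 12 left out.
n_all = 12 * 10
n_excluding = n_all - 2 * 2
```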
Figure 4. Heat maps for % sequence alignments and segmentally duplicated rice loci. Colour-coded moving windows/100 functionally annotated, expressed TIGR rice loci (MWs/FAexpTRL) for each rice pseudomolecule (1–12). For each pseudomolecule: columns 1–6 = MWs for % significant MegaBLAST alignments between Os_CD and test databases Lp_MF, Zm_MF, Zm_TA, Hv_TA, Gm_TA and At_TA, respectively; column 7 = position of MWs containing rice centromere (dark vertical bar); column 8 = MWs indicating the distribution of segmentally duplicated FAexpTRL [see Additional file 2, Table 3 for colour code quantification].
Maximal expression patterns according to organ type of Arabidopsis CDS from the At_CD database significantly aligned with FAexpTRL from the 'top 10%' alignment (red zone) regions of the pseudomolecules.
| 'Top 10%' alignments | Random At_CD loci | ||||
| Plant Organ1 | No. | % (A) | No. | % (B) | A-B |
| lateral root cap | 35 | 8.0 | 53 | 3.4 | 4.6 |
| callus | 39 | 9.0 | 75 | 4.8 | 4.2 |
| cell suspension | 30 | 6.9 | 56 | 3.6 | 3.3 |
| root tip | 53 | 12.2 | 139 | 9.0 | 3.2 |
| node | 13 | 3.0 | 22 | 1.4 | 1.6 |
| elongation zone | 7 | 1.6 | 9 | 0.6 | 1.0 |
| xylem | 23 | 5.3 | 68 | 4.4 | 0.9 |
| endodermis | 13 | 3.0 | 35 | 2.3 | 0.7 |
| endodermis + cortex | 9 | 2.1 | 21 | 1.4 | 0.7 |
| stele | 8 | 1.8 | 24 | 1.5 | 0.3 |
| shoot apex | 16 | 3.7 | 54 | 3.5 | 0.2 |
| rosette | 1 | 0.2 | 1 | 0.1 | 0.1 |
| root hair zone | 23 | 5.3 | 81 | 5.2 | 0.1 |
| inflorescence | 0 | 0 | 0 | 0 | 0 |
| ovary | 0 | 0 | 1 | 0.1 | -0.1 |
| cotyledons | 9 | 2.1 | 34 | 2.2 | -0.1 |
| stem | 2 | 0.5 | 10 | 0.6 | -0.1 |
| pedicel | 7 | 1.6 | 28 | 1.8 | -0.2 |
| seedling | 0 | 0 | 5 | 0.3 | -0.3 |
| adult leaf | 3 | 0.7 | 16 | 1.0 | -0.3 |
| juvenile leaf | 0 | 0 | 6 | 0.4 | -0.4 |
| epidermis atrichoblasts | 12 | 2.8 | 49 | 3.2 | -0.4 |
| roots | 1 | 0.2 | 10 | 0.6 | -0.4 |
| cork | 7 | 1.6 | 33 | 2.1 | -0.5 |
| silique | 4 | 0.9 | 23 | 1.5 | -0.6 |
| stigma | 0 | 0 | 9 | 0.6 | -0.6 |
| stamen | 11 | 2.5 | 49 | 3.2 | -0.7 |
| flower | 0 | 0 | 10 | 0.6 | -0.6 |
| carpel | 0 | 0 | 10 | 0.6 | -0.6 |
| petiole | 1 | 0.2 | 16 | 1.0 | -0.8 |
| cauline leaf | 8 | 1.8 | 41 | 2.6 | -0.8 |
| hypocotyl | 3 | 0.7 | 25 | 1.6 | -0.9 |
| seed | 14 | 3.2 | 66 | 4.3 | -1.1 |
| lateral root | 6 | 1.4 | 38 | 2.5 | -1.1 |
| sepal | 2 | 0.5 | 25 | 1.6 | -1.1 |
| hypocotyl | 1 | 0.2 | 25 | 1.6 | -1.4 |
| petal | 8 | 1.8 | 51 | 3.3 | -1.5 |
| radicle | 0 | 0 | 25 | 1.6 | -1.6 |
| pollen | 53 | 12.2 | 224 | 14.5 | -2.3 |
| senescent leaf | 13 | 3.0 | 83 | 5.4 | -2.4 |
| Total | 435 | 1550 | |||
1 Plant organ definitions as used in the Genevestigator™ database [57].
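The A-B column in the table above is the difference between the percentage of 'top 10%' alignments and the percentage of random At_CD loci showing maximal expression in a given organ. The sketch below reproduces that arithmetic for two rows, using the counts and totals printed in the table. The function name is hypothetical, and rounding each percentage to one decimal place before subtracting is an assumption about how the published values were derived.

```python
def percent_difference(count_a, total_a, count_b, total_b, digits=1):
    """Return (% A, % B, A-B), rounding each percentage before subtracting."""
    pct_a = round(100.0 * count_a / total_a, digits)
    pct_b = round(100.0 * count_b / total_b, digits)
    return pct_a, pct_b, round(pct_a - pct_b, digits)


# Column totals from the table: 435 'top 10%' alignments, 1550 random loci.
TOTAL_TOP, TOTAL_RANDOM = 435, 1550

lateral_root_cap = percent_difference(35, TOTAL_TOP, 53, TOTAL_RANDOM)
pollen = percent_difference(53, TOTAL_TOP, 224, TOTAL_RANDOM)
print(lateral_root_cap, pollen)
```

A positive A-B value indicates over-representation among the 'top 10%' alignments relative to the random sample, and a negative value indicates under-representation.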
Maximal expression patterns according to growth stage of Arabidopsis CDS from the At_CD database significantly aligned with rice FAexpTRL from the 'top 10%' alignments (red zone) regions of the pseudomolecules.
| Growth stage1 (days) | 'Top 10%' alignments | Random At_CD loci | |||
| No. | % (A) | No. | % (B) | A-B | |
| 1.0 – 5.9 | 86 | 19.7 | 187 | 11.8 | 7.9 |
| 6.0 – 13.9 | 28 | 6.4 | 86 | 5.4 | 1.0 |
| 14.0 – 17.9 | 36 | 8.2 | 167 | 10.5 | -2.3 |
| 18.0 – 20.9 | 19 | 4.3 | 101 | 6.4 | -2.1 |
| 21.0 – 24.9 | 99 | 22.7 | 258 | 16.3 | 6.4 |
| 25.0 – 28.9 | 43 | 9.8 | 180 | 11.4 | -1.4 |
| 29.0 – 35.9 | 3 | 0.7 | 61 | 3.8 | -3.1 |
| 36.0 – 44.9 | 33 | 7.6 | 140 | 8.8 | -1.2 |
| 45.0 – 50 | 90 | 20.6 | 405 | 25.6 | -5.0 |
| Total | 437 | 1585 | |||
1 Growth stages as described in the Genevestigator™ database [57].
DNA databases
| Sequence type | Abbreviation | No. sequences | Total base pairs | Average sequence length | Source | |
| CDS | Os_CD | 62827 | 85784595 | 1365 | TIGR | |
| MF | Lp_MF | 471749 | 236911323 | 502 | ViaLactia | |
| MF | Zm_MF | 450197 | 338653263 | 752 | NCBI | |
| TA | Zm_TA | 169087 | 112449533 | 665 | TIGR | |
| TA | Hv_TA | 123351 | 83655311 | 678 | TIGR | |
| TA | Gm_TA | 114693 | 65479263 | 571 | TIGR | |
| TA | At_TA | 148368 | 92081906 | 621 | TIGR | |
| CDS | At_CD | 30690 | 38043579 | 1240 | TAIR |
MF = methylation filtered
CDS = predicted complete coding sequences
TA = transcript assemblies
ViaLactia = ViaLactia Biosciences [60].
NCBI = National Center for Biotechnology Information [61]. Download command line = '(txid4577 [ORGN] AND Quackenbush [AUTH] AND "methylation" [ALL])' obtained from Gramene [9].
TIGR = The Institute for Genomic Research [52, 62].
TAIR = The Arabidopsis Information Resource [11].
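The average sequence length column in the table above follows from the other two numeric columns (total base pairs divided by number of sequences). As a quick consistency check, the sketch below recomputes it from the tabulated values. The dictionary layout and function name are illustrative, and rounding to the nearest base pair is assumed.

```python
# (number of sequences, total base pairs) as listed in the DNA databases table.
DATABASES = {
    "Os_CD": (62827, 85784595),
    "Lp_MF": (471749, 236911323),
    "Zm_MF": (450197, 338653263),
    "Zm_TA": (169087, 112449533),
    "Hv_TA": (123351, 83655311),
    "Gm_TA": (114693, 65479263),
    "At_TA": (148368, 92081906),
    "At_CD": (30690, 38043579),
}


def average_length(n_sequences, total_bp):
    """Mean sequence length in base pairs, rounded to the nearest integer."""
    return round(total_bp / n_sequences)


average_lengths = {
    name: average_length(n_sequences, total_bp)
    for name, (n_sequences, total_bp) in DATABASES.items()
}
for name, length in average_lengths.items():
    print(f"{name}: {length} bp")
```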
Representation of FAexpTRL large gene families in the 'top' and 'bottom' 10% MWs
| FAexp annotation large gene family1 | Annotation type in all FAexpTRL | Annotation type in top2 10% MWs | Annotation type in bottom2 10% MWs | |||
| No. | % (n= 17108)3a | No. | % (n = 3057)3b | No. | % (n = 508)3c | |
| Protein kinase domain containing protein, expressed | 412 | 2.41 | 68 | 2.22 | 22 | 4.33 |
| F-box domain containing protein, expressed | 330 | 1.93 | 3 | 0.10 | 69 | 13.58 |
| Leucine Rich Repeat family protein, expressed | 318 | 1.86 | 20 | 0.65 | 50 | 9.84 |
| pentatricopeptide, putative, expressed | 264 | 1.54 | 16 | 0.52 | 1 | 0.20 |
| NB-ARC domain containing protein, expressed | 245 | 1.43 | 1 | 0.03 | 76 | 14.96 |
| Zinc finger, C3HC4 type family protein, expressed | 218 | 1.27 | 36 | 1.18 | 1 | 0.20 |
| Cytochrome P450 family protein, expressed | 184 | 1.08 | 22 | 0.72 | 8 | 1.57 |
| RNA recognition motif family protein, expressed | 117 | 0.68 | 16 | 0.52 | 0 | 0.00 |
1 Large gene family = family with > 100 FAexpTRL with identical annotations.
2 MWs which are in the top/bottom 10% for both % sequence alignments and average scores.
3 n = a) total number of FAexpTRL; b) number of FAexpTRL in top 10% alignment MWs with significant alignments identified in each of the Lp_MF, Zm_MF, Zm_TA and Hv_TA (monocot) databases; c) number of FAexpTRL in bottom 10% MWs with no significant alignments identified in any of the monocot databases.