Asymmetric distribution of gene expression in the centromeric region of rice chromosome 5.

Hiroshi Mizuno, Yoshihiro Kawahara, Jianzhong Wu, Yuichi Katayose, Hiroyuki Kanamori, Hiroshi Ikawa, Takeshi Itoh, Takuji Sasaki, Takashi Matsumoto.

Abstract

There is controversy as to whether gene expression is silenced in the functional centromere. The complete genomic sequences of the centromeric regions in higher eukaryotes have not been fully elucidated, because the presence of highly repetitive sequences complicates many aspects of genomic sequencing. We performed resequencing, assembly, and sequence finishing of two P1-derived artificial chromosome clones in the centromeric region of rice (Oryza sativa L.) chromosome 5 (Cen5). The pericentromeric region, where meiotic recombination is silenced, is located at the center of chromosome 5 and is 2.14 Mb long; a total of six restriction-fragment-length polymorphism markers (R448, C1388, S20487S, E3103S, C53260S, and R2059) genetically mapped at 54.6 cM were located in this region. In the pericentromeric region, 28 genes were annotated on the short arm and 45 genes on the long arm. To quantify all transcripts in this region, we performed massive parallel sequencing of mRNA. Transcriptional density (total length of transcribed region/length of the genomic region) and expression level (number of uniquely mapped reads/length of transcribed region) were calculated on the basis of the mapped reads on the rice genome. Transcriptional density and expression level were significantly lower in Cen5 than in the average of the other chromosomal regions. Moreover, transcriptional density in Cen5 was significantly lower on the short arm than on the long arm; the distribution of transcriptional density was asymmetric. The genomic sequence of Cen5 has been integrated into the most updated reference rice genome sequence constructed by the International Rice Genome Sequencing Project.

Keywords:  International Rice Genome Sequencing Project; P1-derived artificial chromosome; centromere; genome sequencing; mRNA-Seq

Year:  2011        PMID: 22639581      PMCID: PMC3355683          DOI: 10.3389/fpls.2011.00016

Source DB:  PubMed          Journal:  Front Plant Sci        ISSN: 1664-462X            Impact factor:   5.753


Introduction

The centromere is essential for the correct segregation of chromosomes in dividing cells. The functional centromere complex is composed of proteins binding to highly repetitive centromere-specific DNA sequences (Houben and Schubert, 2003; Dawe and Hiatt, 2004; Hall et al., 2004; Sharma and Raina, 2005; Lamb et al., 2007; Ma et al., 2007; Gill et al., 2008). Centromere-specific histone-H3-like protein (CENH3) defines the boundaries of the functional centromeric region of DNA; CENH3 replaces the canonical histone H3 to form a specific type of nucleosome that is essential for kinetochore formation (Henikoff et al., 2001; Blower et al., 2002). The kinetochore links the chromosome to microtubule polymers, which are attached to the mitotic spindle during mitosis and meiosis. However, the genomic sequences of the centromeric regions are diverse and have not yet been fully elucidated in higher eukaryotes, even in the case of the so-called “completely sequenced” genomes (Hosouchi et al., 2002; Mizuno et al., 2008b; Torras-Llort et al., 2009; Buscaino et al., 2010). Because the presence of highly repetitive sequences complicates many aspects of genomic sequencing (including cloning, mapping, chromosome walking, and computer-assisted assembly of the fragments of DNA sequences), sequencing of the centromeric regions of higher eukaryotes is extremely difficult. Nevertheless, substantial progress in sequencing of the centromere region has been made in rice (Oryza sativa L.). As some rice centromeres have exceptionally small numbers of tandem repeats (IRGSP, 2005), rice is suitable for the comprehensive analysis of centromeric sequence composition and organization in eukaryotes. From 1998 to 2004, the International Rice Genome Sequencing Project (IRGSP) succeeded in constructing a P1-derived artificial chromosome (PAC) and bacterial artificial chromosome (BAC) clone contig including the centromere regions of three chromosomes. 
Initial Sanger dideoxy sequencing of these clones revealed, for the first time, the overall structure of the centromeric regions of higher eukaryotes (IRGSP, 2005). To date, of the 12 rice chromosome centromeric regions, Cen3 (containing gaps; Yan et al., 2006), Cen4 (Zhang et al., 2004), and Cen8 (Wu et al., 2004) have been almost completely sequenced. In the case of Cen5, a PAC/BAC contig has been constructed by chromosome walking (Cheng et al., 2005); however, the contig is only partially sequenced (IRGSP, 2005). In the core region of each rice centromere is a tandem array of a key sequence, the 155-bp RCS2/CentO sequence (Dong et al., 1998). Around the RCS2/CentO array is distributed the pericentromeric region in which meiotic recombination is suppressed. Genes have been computationally predicted in pericentromeric regions (Nagaki et al., 2004; Wu et al., 2004). Twenty-seven of the predicted genes in Cen8 are conserved in the japonica rice Nipponbare and the indica rice Kasalath (Wu et al., 2009). Although the centromere has been considered to be a highly heterochromatic and transcriptionally silent chromosomal domain, active genes have been found in the 750-kb core domain of Cen8 (Nagaki et al., 2004). There is therefore controversy as to whether gene expression is silenced in the functional centromere. To assess the functional importance of the expression of these centromeric genes, it is important to characterize them and quantify their transcripts. Here, we performed sequence improvement and comprehensive expression analysis of rice Nipponbare chromosome 5 at single-nucleotide resolution. First, we used a Sanger sequencing-based finishing procedure to bridge the short and long arm chromosome 5 sequences in the public reference rice genome sequence constructed by the IRGSP. Second, we applied Illumina massive parallel sequencing technology to mRNA sequencing, revealing the distribution of gene expression in Cen5. 
We discovered that the distribution was asymmetric. We discuss the importance of gene expression in centromeric regions and the evolutionary history of the asymmetric distribution of expressed genes in Cen5.

Materials and Methods

Sequence improvement of PAC/BAC clones by using a finishing procedure

P1-derived artificial chromosome (P) and BAC (B) libraries were constructed from genomic DNA derived from the rice cultivar Nipponbare (JP 229579 in the National Institute of Agrobiological Sciences Genebank; O. sativa L. ssp. japonica) and generated by the Rice Genome Research Program of Japan. The BAC library (OSJNBa) was constructed by the Arizona Genomics Institute (Ammiraju et al., 2006). Details of the method used for Southern hybridization and PCR screening of the PAC/BAC libraries have been given previously (Wu et al., 2003). Two PAC clones (P0587F01, P0697B04) were resequenced in accordance with the IRGSP sequencing guidelines (IRGSP, 2005). Briefly, about 2000 subclone plasmid libraries from each PAC clone were end-sequenced, and these sequences were assembled with Phred–Phrap software. For the gap regions within PAC/BAC clones, bridging subclones were fully sequenced by primer walking. To resolve misassembly in the repeat regions, several subclones (~7 kb) were fully sequenced, and these continuous sequences were used as a guide for the reassembly process. Finally, the clone sequences were combined, taking into account overlaps.

Preparation of cDNA, Illumina sequencing, and mapping of short reads

Nipponbare rice was grown in a growth chamber at 28°C. After the seedlings had been grown for 7 days, total RNA was extracted from the shoots and roots by using an RNeasy Plant Kit (Qiagen, Hilden, Germany). RNA quality was calculated by using a Bioanalyzer 2100 algorithm (Agilent Technologies, USA); high-quality RNA (RNA integrity number >8) was used. Oligo(dT) magnetic beads were used to isolate poly(A) RNA from the total RNA samples. Poly(A) RNA was converted to cDNA for massive parallel sequencing in an Illumina Genome Analyzer IIx (Illumina, San Diego, CA, USA), in accordance with the protocol for the mRNA-Seq sample preparation kit (Illumina). All primary mRNA sequence read data had been previously submitted to the DNA Data Bank of Japan (DDBJ; DRA000159; Mizuno et al., 2010). Normal shoot and normal root reads that passed the filter were mapped onto the Nipponbare reference genome (Build 5.0) by using Bowtie (version 0.12.7; Langmead et al., 2009) and TopHat (version 1.2.0; Trapnell et al., 2009) software, with the default parameters. Uniquely mapped reads were used for further analysis. Differences in transcriptional density [total length of transcribed region (bp)/length of the genomic region (bp)] and expression level [number of uniquely mapped reads/length of transcribed region (bp)] were assessed statistically by Fisher's exact test. The length of the genomic region was calculated on the basis of the Nipponbare reference genomic sequence (Build 5.0). A “transcribed region” was defined as a region in which at least one read derived from mRNA was mapped.
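The two measures defined above can be illustrated with a minimal Python sketch. It assumes read alignments are available as half-open (start, end) intervals in bp; the helper names (`merge_intervals`, `transcribed_length`, and so on) and the toy reads are hypothetical and are not taken from the authors' pipeline, which relied on Bowtie and TopHat output.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates in bp


def merge_intervals(reads: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping or touching read alignments into covered blocks."""
    merged: List[Interval] = []
    for start, end in sorted(reads):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def transcribed_length(reads: Sequence[Interval]) -> int:
    """Total length (bp) covered by at least one uniquely mapped read."""
    return sum(end - start for start, end in merge_intervals(reads))


def transcriptional_density(reads: Sequence[Interval], region_length_bp: int) -> float:
    """Transcribed length divided by the length of the genomic region."""
    return transcribed_length(reads) / region_length_bp


def expression_level(reads: Sequence[Interval]) -> float:
    """Number of uniquely mapped reads per bp of transcribed region."""
    covered = transcribed_length(reads)
    return len(reads) / covered if covered else 0.0


# Hypothetical example: three 36-bp reads in a 1,000-bp region.
toy_reads = [(100, 136), (110, 146), (500, 536)]
print(transcribed_length(toy_reads))             # covered bp
print(transcriptional_density(toy_reads, 1000))  # fraction of region transcribed
print(expression_level(toy_reads))               # reads per transcribed bp
```

In this toy case the first two reads overlap and collapse into one 46-bp block, so the transcribed region is 82 bp rather than 108 bp. Counting each covered base once is what keeps transcriptional density bounded between 0 and 1.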

Results

Genomic sequencing of Cen5

P1-derived artificial chromosome/BAC clone-based sequencing was adopted for genomic sequencing of Cen5. A PAC/BAC contig was constructed by chromosome walking to cover the genetically defined centromeric region of chromosome 5 (Cheng et al., 2005). The PAC/BAC contig was mapped by using restriction-fragment length polymorphism (RFLP) markers S20487S and E3103S, located on the short and long arms, respectively, of chromosome 5 at 54.6 cM; the contig bridged the sequence between the short and long arms of chromosome 5 (Figure 1). Because a version of the sequences of two PAC clones (P0587F01, P0697B04) had already been published in draft status, these clones were divided into a number of pieces (12 in the case of AC146339 and 7 for AC137984; Table 1). To obtain more accurate information on Cen5, these PAC clones were resequenced by Sanger-based sequencing technology, reassembled, and finished (see Materials and Methods). Clone P0587F01 was reassembled into one contig and the sequence was submitted to the PLN (plant, fungal, and algal sequences) division of DDBJ (52,858 bp, AP011109; Table 1). In the case of P0697B04, all the gaps were filled, but because the center of this clone was occupied by the RCS2/CentO repeats the exact number and orientation of RCS2/CentO repeats were not determined; the sequence was submitted as an incomplete status high-throughput genomic sequence (HTGS)_PHASE2 (147,577 bp, AP011110; Table 1). Cen5 had two different-sized clusters of 155-bp RCS2/CentO satellite repeats (Figure 1). After removing redundant sequences from the regions overlapping between the neighboring PAC/BAC clones, we generated a continuous, high-quality DNA sequence covering the entire region of Cen5. The genomic sequence of Cen5 was integrated into the latest reference genomic sequence of rice constructed by the IRGSP (IRGSP Build 5.0 pseudomolecules).
Figure 1

Genetic map and PAC/BAC physical map of Cen5. Two PAC clones (P0697B04 and P0587F01; black bars) were sequenced. The PAC/BAC contig was mapped by using restriction-fragment-length polymorphism markers S20487S and E3103S, which were located on the short and long arms, respectively, of chromosome 5 at 54.6 cM. Red boxes represent RCS2/CentO clusters.

Table 1

Improvement of the sequences of PAC clones.

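| Clone | P0587F01 | P0587F01 | P0697B04 | P0697B04 |
|---|---|---|---|---|
| Accession number | AC146339 | AP011109 | AC137984 | AP011110 |
| Contigs | 12 | 1 | 7 | 1* |
| Status | HTGS_PHASE1 | PLN_PHASE3 | HTGS_PHASE2 | HTGS_PHASE2 |
| Length (bp) | 149,330 | 52,858 | 114,329 | 147,577 |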

*The number and orientation of RCS2/CentO repeats were not determined. HTGS, high-throughput genomic sequence; Phase 1: unfinished; may be unordered, unoriented contigs, with gaps. Phase 2: unfinished, ordered, oriented contigs, with or without gaps. Phase 3: finished, no gaps. PLN: plant, fungal, and algal sequences of Phase 3.


Identification of expressed regions by using mRNA-Seq

We defined pericentromeric regions as recombinational cold spots proximal to RCS2/CentO, as in a previous rice analysis (Wu et al., 2003). A total of six RFLP markers (R448, C1388, S20487S, E3103S, C53260S, and R2059) genetically mapped at 54.6 cM were located in the 2.14-Mb region defined as the pericentromeric region of chromosome 5 (Figure 2). A total of five RFLP markers (R288, S2106, C53648S, C1794, and C954) were mapped at 19.6 cM in the 2.09-Mb pericentromeric region of Cen4 (Figure A1A in Appendix); and a total of six RFLP markers (C1374, R2381, E20691S, S21882S, C1115, and R2466) were mapped at 54.3 cM in the 2.43-Mb pericentromeric region of Cen8 (Figure A1B in Appendix).
Figure 2

Distribution of expressed regions in Cen5. The positions of restriction-fragment-length polymorphism (RFLP) markers mapped at 54.3–55.4 cM are indicated. The region in which RFLP markers are mapped at 54.6 cM is shown (gray box). The distribution of 36-bp mapped reads on the rice genome was graphed in GBrowse (Stein et al., 2002). The graph indicates the average depths of reads from mRNA-Seq for samples obtained from shoots (green) or roots (red). Only depths <50 are shown (depths ≥50 are shown as 50). The level of expression is normalized to that of the shoot (standard). Gene models based on Rice Annotation Project (RAP) representative loci (RAP_rep) and RAP predicted genes (RAP_pred) are shown. Expression of an RAP predicted gene is shown (white triangle). The position of Os05g0303000, a homolog of the wheat PSR161 gene mapped on wheat Cen1B (see text), is also indicated (black triangle). Red boxes represent RCS2/CentO clusters.

Figure A1

Distribution of expressed regions proximal to Cen4 and Cen8. The distributions of reads mapped on Cen4 (A) and Cen8 (B) were graphed in GBrowse, as in Figure 2. Os04g0234600 in Cen4 is extremely highly expressed (black triangle). Os08g0319450 in Cen8 is located in the small RCS2/CentO sequence.

We compared the averages of gene density, transcriptional density, and expression level in the centromeric region with those in other chromosomal regions. The average gene density in the centromeric region was the lowest in the whole chromosomal region (Figure 3). The average transcriptional density in the centromeric region was lower than that in other chromosomal regions, but the average expression level in the centromeric region was not (Figure 3). Gene expression in the centromeric region was compared by statistical analysis, which was independent of gene annotation. First, transcriptional density was compared. The transcriptional density of Cen5 was 0.070 (shoot) and 0.065 (root), whereas that of the other regions of the same chromosome was 0.168 (shoot) and 0.170 (root); transcriptional density was significantly lower (P < 0.0001) in Cen5 than in the average of the other regions by Fisher's exact test (Table 2). 
The transcriptional densities in Cen4 and Cen8 were also significantly lower than in the averages of the other regions (Table 2). Second, expression level was compared. The expression level in Cen5 was 234.4 (shoot) and 177.5 (root), whereas that in the other regions was 264.8 (shoot) and 239.5 (root); the expression level in Cen5 was significantly lower than that in the other regions (P < 0.0001). However, in Cen4, expression of the gene Os04g0234600 (similar DNA sequence to that encoding sedoheptulose-bisphosphatase) was extremely high in the shoot (Figure A1A in Appendix), resulting in a high average expression level in Cen4 (data not shown). With the exception of the expression of Os04g0234600 in Cen4, expression levels were also significantly lower in Cen4 and Cen8 than in the other regions (Table 2). Thus, gene expression (transcriptional density and expression level) was significantly lower in the centromeric region than in the other regions.
Figure 3

Relationships of gene density, transcriptional density, and expression levels. Average scores of gene density, transcriptional density, and expression level in each 1 Mb of sliding windows on chromosomes 4, 5, and 8 are shown. Horizontal axis indicates position on the reference rice genome (Build 5.0 pseudomolecules) constructed by the International Rice Genome Sequencing Project. Lines indicate the boundaries of each pericentromeric region.

Table 2

Comparison of transcription in centromeric regions and in the whole genomic region.

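| Region | Genomic region (bp), centromere | Genomic region (bp), other | Tissue | Transcribed region (bp), centromere | Transcribed region (bp), other | No. of uniquely mapped reads, centromere | No. of uniquely mapped reads, other | Transcriptional density, centromere | Transcriptional density, other | P | Expression level, centromere | Expression level, other | P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cen4 | 2,088,655 | 33,973,212 | Shoot | 115,495 | 5,005,925 | 48,875 | 1,368,096 | 0.055 | 0.147 | <0.0001 | 214.7 | 273.3 | <0.0001 |
| Cen4 | 2,088,655 | 33,973,212 | Root | 101,453 | 5,096,940 | 15,409 | 1,084,734 | 0.049 | 0.150 | <0.0001 | 151.9 | 212.8 | <0.0001 |
| Cen5 | 2,139,098 | 27,934,342 | Shoot | 149,160 | 4,687,562 | 34,964 | 1,241,332 | 0.070 | 0.168 | <0.0001 | 234.4 | 264.8 | <0.0001 |
| Cen5 | 2,139,098 | 27,934,342 | Root | 138,618 | 4,758,232 | 24,607 | 1,139,487 | 0.065 | 0.170 | <0.0001 | 177.5 | 239.5 | <0.0001 |
| Cen8 | 2,431,594 | 26,098,435 | Shoot | 231,866 | 3,862,076 | 36,813 | 1,197,700 | 0.095 | 0.148 | <0.0001 | 158.8 | 310.1 | <0.0001 |
| Cen8 | 2,431,594 | 26,098,435 | Root | 239,917 | 3,843,450 | 35,396 | 787,908 | 0.099 | 0.147 | <0.0001 | 147.5 | 205.0 | <0.0001 |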

Statistical significance (P) was based on Fisher's exact test. Expression levels in the centromeric region of chromosome 4 were calculated without the gene Os04g0234600 (see text). The centromeric region was defined as from the start position of the short arm of the pericentromeric region to the end position of the long arm of the pericentromeric regions. Transcribed region, transcriptional density, and expression level are defined in Section “Materials and Methods.”
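The source does not state how the contingency table for Fisher's exact test was built or whether the test was one- or two-sided. The following is therefore only a sketch under explicit assumptions: each base pair is treated as one observation, the 2×2 table is (transcribed, not transcribed) × (centromere, other), and a one-sided lower-tail P is computed from the hypergeometric distribution. The function names are hypothetical; the counts are the Cen5 shoot values from Table 2.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_lower_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test, P(X <= a), for the table [[a, b], [c, d]].

    X is the hypergeometric count in the top-left cell with all margins fixed.
    """
    row1 = a + b            # e.g. all centromeric bp
    col1 = a + c            # e.g. all transcribed bp
    total = a + b + c + d
    k_min = max(0, row1 - (total - col1))  # smallest feasible top-left count
    log_denominator = log_comb(total, row1)
    p = 0.0
    for k in range(k_min, a + 1):
        log_p = log_comb(col1, k) + log_comb(total - col1, row1 - k) - log_denominator
        p += math.exp(log_p)  # terms far in the tail underflow to 0.0
    return min(p, 1.0)


# Cen5, shoot (Table 2): transcribed vs. non-transcribed bp (assumed table layout).
cen5_transcribed, cen5_genomic = 149_160, 2_139_098
other_transcribed, other_genomic = 4_687_562, 27_934_342

p_value = fisher_lower_tail(
    cen5_transcribed,
    cen5_genomic - cen5_transcribed,
    other_transcribed,
    other_genomic - other_transcribed,
)
print(p_value < 0.0001)
```

With counts in the millions the tail probability is far below floating-point range, so the computed value underflows to zero. This is consistent with the reported P < 0.0001 but does not reproduce an exact figure.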

We also compared transcription in the short and long arms in the pericentromeric regions. In Cen5, transcriptional density was 0.039 (shoot) and 0.035 (root) on the short arm and 0.110 (shoot), 0.103 (root) on the long arm. Transcriptional density was significantly (P < 0.0001) lower on the short arm than on the long arm by Fisher's exact test (Table 3); the distribution of transcriptional density was asymmetric in Cen5. The expression level of Cen5 in shoots was significantly (P < 0.0001) lower on the short arm than on the long arm, whereas the expression level of Cen5 in roots was significantly (P < 0.0001) lower on the long arm than on the short arm (Table 3). Thus, the distribution of expression level of Cen5 was asymmetric, but the tendency was in the opposite directions in the shoots and roots.
Table 3

Comparison of transcription in Cen4, Cen5, and Cen8.

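| Region | Genomic region (bp), pericent. short arm | Genomic region (bp), RCS2/CentO | Genomic region (bp), pericent. long arm | Tissue | Transcribed region (bp), pericent. short arm | Transcribed region (bp), RCS2/CentO | Transcribed region (bp), pericent. long arm | No. of uniquely mapped reads, pericent. short arm | No. of uniquely mapped reads, RCS2/CentO | No. of uniquely mapped reads, pericent. long arm | Transcriptional density, pericent. short arm | Transcriptional density, RCS2/CentO | Transcriptional density, pericent. long arm | P | Expression level, pericent. short arm | Expression level, RCS2/CentO | Expression level, pericent. long arm | P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cen4 | 1,779,938 | 124,271 | 184,446 | Shoot | 88,498 | 140 | 26,857 | 40,984 | 5 | 7,886 | 0.050 | 0.001 | 0.146 | <0.0001 | 463.1 | 35.7 | 293.6 | <0.0001 |
| Cen4 | 1,779,938 | 124,271 | 184,446 | Root | 75,230 | 288 | 25,935 | 10,869 | 10 | 4,530 | 0.042 | 0.002 | 0.141 | <0.0001 | 144.5 | 34.7 | 174.7 | <0.0001 |
| Cen5 | 1,063,874 | 97,181 | 978,043 | Shoot | 41,629 | 0 | 107,531 | 4,332 | 0 | 30,632 | 0.039 | 0.000 | 0.110 | <0.0001 | 104.1 | 0.0 | 284.9 | <0.0001 |
| Cen5 | 1,063,874 | 97,181 | 978,043 | Root | 37,507 | 0 | 101,111 | 9,725 | 0 | 14,882 | 0.035 | 0.000 | 0.103 | <0.0001 | 259.3 | 0.0 | 147.2 | <0.0001 |
| Cen8 | 935,763 | 76,165 | 1,419,666 | Shoot | 94,769 | 178 | 136,919 | 14,495 | 5 | 22,313 | 0.101 | 0.002 | 0.096 | <0.0001 | 153.0 | 28.1 | 163.0 | <0.0001 |
| Cen8 | 935,763 | 76,165 | 1,419,666 | Root | 95,763 | 176 | 143,978 | 14,069 | 5 | 21,322 | 0.102 | 0.002 | 0.101 | 0.496 | 146.9 | 28.4 | 148.1 | 0.0224 |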

Statistical significance of the difference in gene expression between the short arm and long arm (P) was based on Fisher's exact test. Transcribed region, transcriptional density, and expression level are defined in Section “Materials and Methods.”
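The asymmetry reported for Cen5 can be recomputed directly from the Table 3 counts. The sketch below is illustrative only: the dictionary layout and function names are hypothetical. Expression level is scaled to reads per kb of transcribed region, because the tabulated values appear to correspond to that scaling rather than to reads per bp.

```python
# Cen5 counts from Table 3 (pericentromeric short and long arms).
CEN5_ARMS = {
    "short": {
        "genomic_bp": 1_063_874,
        "shoot": {"transcribed_bp": 41_629, "reads": 4_332},
        "root": {"transcribed_bp": 37_507, "reads": 9_725},
    },
    "long": {
        "genomic_bp": 978_043,
        "shoot": {"transcribed_bp": 107_531, "reads": 30_632},
        "root": {"transcribed_bp": 101_111, "reads": 14_882},
    },
}


def density(arm: str, tissue: str) -> float:
    """Transcribed bp divided by genomic bp for one arm and tissue."""
    data = CEN5_ARMS[arm]
    return data[tissue]["transcribed_bp"] / data["genomic_bp"]


def expression_per_kb(arm: str, tissue: str) -> float:
    """Uniquely mapped reads per kb of transcribed region (assumed scaling)."""
    counts = CEN5_ARMS[arm][tissue]
    return 1000 * counts["reads"] / counts["transcribed_bp"]


def lower_arm(metric, tissue: str) -> str:
    """Return the arm ('short' or 'long') with the lower value of a metric."""
    return min(("short", "long"), key=lambda arm: metric(arm, tissue))


for tissue in ("shoot", "root"):
    print(
        tissue,
        "density lower on:", lower_arm(density, tissue),
        "| expression lower on:", lower_arm(expression_per_kb, tissue),
    )
```

The recomputed values agree with the text: transcriptional density is lower on the short arm in both tissues, whereas expression level is lower on the short arm in shoots and lower on the long arm in roots.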


Characterization of genes expressed in Cen5

The annotated genes in Cen5 were characterized by using the Rice Annotation Project Database (RAP-DB; Rice_Annotation_Project, 2008); 28 genes were annotated in the pericentromeric region on the short arm of Cen5 (~1.06 Mb), whereas 45 genes were annotated on the long arm (~0.978 Mb; Table A1 in Appendix; Table 3). On the short arm close to RCS2/CentO (C1388 to S20487S), most of the genes encoding hypothetical proteins were hardly expressed (Table A1 in Appendix). On the long arm, genes encoding proteins similar to transcription factor IIA large subunit (Os05g0292200), acetyl-coenzyme A carboxylase (Os05g0295300), glyoxalase I (Os05g0295800), and zinc-finger-like protein (Os05g0299700) were expressed at relatively high levels (RPKM > 20; Table A1 in Appendix) in both shoots and roots. Analysis of the mapped reads also gave evidence of the expression of genes computationally predicted by the RAP (Figure 2). A non-protein-coding transcript (Os05g0296600) was also expressed (Table A1 in Appendix). Most of the genes highly expressed on the long arm were similar to genes encoding functional – not hypothetical – proteins.
Table A1

Annotated genes in Cen5.

Gene_IDS/LStartEndLengthStrandDescriptionRPKM_shootRPKM_root
R448
Os05g0276500S11422127114239071101Expansin Os-EXPA30.240.83
Os05g0277000S1144748111448493754Similar to Expansin Os-EXPA30.2227.83
Os05g0277200S11463176114644211246+Conserved hypothetical protein2.733.63
Os05g0277300S11465333114695463295Similar to cDNA clone: 001-013-F117.596.48
Os05g0277350S1147415311475272691+Similar to leucine rich repeat family protein00
Os05g0277500S1149617311497090840+Similar to germin-like protein subfamily 2 member 4 precursor1.69.45
Os05g0278500S11638503116440341551Transferase family protein6.91162.59
Os05g0278550S1164345611644083628+Hypothetical gene5.6144.79
Os05g0278950S1167004411672821715Similar to ATP-dependent Clp protease proteolytic subunit00
Os05g0279300S11676865116856031209Similar to tRNA pseudouridine synthase A3.231.39
Os05g0279400S11689568116953863310+Zinc-finger, RING-type domain containing protein23.8321.99
Os05g0279600S11700491117099891352+Endonuclease/exonuclease/phosphatase domain containing protein57.01
Os05g0279750S11721971117261534183+Hypothetical gene00.02
Os05g0279900S11728764117317891475+Similar to Polygalacturonase A7.492.52
C1388
Os05g0280200S1175267811754728672Similar to Ras-related protein RGP252.1855.95
Os05g0280350S1175272811754718+Hypothetical gene57.6563.09
Os05g0280500S11782293117853891881Phospholipid/glycerol acyltransferase domain containing protein0.7773.36
Os05g0280700S11817709118208773169Similar to resistance protein candidate0.230
Os05g0281400S11920597119217021013+Protein of unknown function DUF810 domain containing protein5.625.75
Os05g0282500S1204112912043113600Hypothetical conserved gene0.090
Os05g0282900S12079928120817351808+Conserved hypothetical protein0.190.97
Os05g0283000S12088257120916921607+Conserved hypothetical protein0.070
Os05g0283200S12098520120995751056+Pectinesterase inhibitor domain containing protein00
Os05g0283600S12122939121313563569+Zinc-finger, CCHC-type domain containing protein00
Os05g0285900S12322935123275341162+Conserved hypothetical protein2.022.88
Os05g0286100S12337263123382991037+Similar to zinc-finger protein KNUCKLES014.26
Os05g0286200S1235385812356702772+Conserved hypothetical protein00
Os05g0287800S12482678124868011445+Conserved hypothetical protein6.817.5
S20487S
RCS2/CentO repeats
Os05g0289100L12601354126024921058+Hypothetical conserved gene00
Os05g0289400L12630181126351262682Similar to CRN (Crooked neck) protein19.6329.5
Os05g0289700L12650476126519761395+Arbuscular mycorrhizal specific marker 10. Benzyl alcohol benzoyl transferase00
Os05g0290300L12704171127057201219Hypothetical conserved gene5.1311.83
Os05g0290400L12704190127150352613+Hypothetical gene6.9212.32
Os05g0291600L1286025412860794541+Hypothetical conserved gene00.13
Os05g0291700L12862505128684321316Similar to PTAC162631.22
Os05g0291800L1287286312873488526+Similar to predicted protein00
Os05g0292200L12895006129014031630+Similar to Transcription factor IIA large subunit (TFIIA-L1)30.1829.59
E3103S
Os05g0292800L1292502712925834551+Similar to one helix protein (OHP)183.518.6
Os05g0293500L12962105129673801237Similar to Pectate lyase B00
Os05g0293600L12978536129840175482+Similar to RNA polymerase beta’ chain00
Os05g0294600L13018766130214912425Pentatricopeptide repeat domain containing protein14.972.73
Os05g0294800L13035304130391952262+Hypothetical gene10.5210.5
Os05g0295100L13056572130756972031+Hypothetical conserved gene0.992.73
Os05g0295200L13086136130892962181Conserved hypothetical protein10.321.34
Os05g0295300L1309323313094329952Similar to acetyl-coenzyme A carboxylase40.1245.31
Os05g0295700L13117580131219262251Similar to homoserine dehydrogenase-like protein10.2211.75
Os05g0295800L13123210131277861052Similar to glyoxalase I39.1236.36
C53260S
Os05g0295900L13135652131448183064Conserved hypothetical protein0.712.97
Os05g0296200L13169380131717532374+Conserved hypothetical protein00
Os05g0296600L1321692313217232310+Non-protein coding transcript23.7762.28
Os05g0296700L1322166713222206540Similar to small heat shock protein3.623.24
Os05g0296750L1322173013222352623+Hypothetical gene3.232.34
Os05g0296800L1322621113228572897Hypothetical protein0.310.32
Os05g0296900L1325900413259727508Conserved hypothetical protein00
Os05g0297001L13261758132639212164+Similar to predicted protein00
Os05g0297300L13287199132889341736+Protein of unknown function DUF1618 domain containing protein00
Os05g0297400L1328999613290998992Similar to CXIP400
Os05g0297800L13304340133077792408Conserved hypothetical protein0.770.21
Os05g0297850L1330930513309728424Hypothetical conserved gene00
Os05g0297900L13311413133151531034+Similar to signal peptidase 18 subunit9.6717.76
Os05g0298200L13337235133414542401+Ankyrin repeat containing protein14.939.83
Os05g0298600L13349202133514142213Hypothetical conserved gene3.435.1
Os05g0298700L13357011133593461220Similar to xylan endohydrolase isoenzyme X-I00
Os05g0298900L1339595513396672718+Conserved hypothetical protein6.8414.11
Os05g0299000L1340091913401654736+Hypothetical protein0.080.1
Os05g0299101L1340264713403283550Hypothetical gene0.410
Os05g0299200L13407527134124971491Hypothetical conserved gene10.042.98
Os05g0299300L13414154134200433226WD40 repeat-like domain containing protein4.485.13
Os05g0299500L13434338134398171563+Protein of unknown function DUF9146.6415.66
Os05g0299600L13440049134423372171Protein of unknown function DUF16771.180.67
Os05g0299700L13450657134530152359Similar to expressed protein (zinc-finger-like protein)38.2539.38
Os05g0300700L13504825135120702425+Cell division cycle-associated protein domain containing protein9.1916.98
Os05g0301500L13558563135633042162+Similar to ribophorin I18.8833.4
R2059

Genes located between restriction-fragment-length polymorphism (RFLP) markers R448 and R2059 on chromosome 5 are listed. Gene ID (gene_ID); mapped on short arm or long arm (S/L); start position (start); end position (end); total nucleotide length of each transcript (length); coding strand (strand); description in Rice Annotation Project Database (description); RPKM in shoot (RPKM_shoot); and RPKM in root (RPKM_root) are listed. The positions of RFLP markers and RCS2/CentO repeats are also shown in bold letters.

The distribution of transcription of each gene was identified by using Illumina mRNA-Seq technology. We adopted the RPKM (reads per kilobase of exon models per million mapped reads) method (Mortazavi et al., 2008) for transcript quantification on the basis of the number of sequence reads mapped on each gene. The RPKM and signal intensity from microarray analysis of the same RNA materials as used in this study had been compared previously; these two independent measures of transcript abundance were correlated (r = 0.75–0.77; Mizuno et al., 2010). Dot plot analysis of the RPKM and the chromosomal position of each gene suggested that gene expression was low in the centromeric regions (Figure 4).
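As a brief illustration of the RPKM normalization (Mortazavi et al., 2008), the following Python sketch computes the value for a single gene. The function name and the example numbers are hypothetical and are not taken from the data set analyzed here.

```python
def rpkm(reads_on_gene: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = reads * 10^9 / (exon length in bp * total mapped reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return reads_on_gene * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000-bp exon model, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by both exon length and library size is what allows genes of different lengths, and samples of different sequencing depths (here, shoot and root), to be compared on one scale.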
Figure 4

Relationships between location and RPKM (reads per kilobase of exon models per million mapped reads) of genes. RPKM of each gene on chromosomes 4, 5, and 8 are shown. Only genes with RPKM < 100 are shown. Horizontal axis indicates the positions of genes in the reference rice genome (Build 5.0 pseudomolecules) constructed by the International Rice Genome Sequencing Project. Red triangles indicate the positions of RCS2/CentO clusters.

A putative gene conserved in the rice centromere and wheat centromere was found: Os05g0303000 was mapped only 90 kb distal to the marker R2059 on Cen5 and was highly expressed in shoots and roots (Figure 2). Os05g0303000 had 82.6% DNA sequence identity to PSR161 (data not shown). PSR161 is the only actively transcribed gene that has been mapped on the functional centromere of wheat chromosome 1B (Francki et al., 2002), suggesting that the location of this homolog is conserved in rice Cen5 and wheat Cen1B.

Discussion

Gene expression in pericentromeric regions

To assess the functional importance of gene expression in the centromeric region, we performed genomic sequencing of Cen5 (Figure 1; Table 1) and expression analysis (Figure 2). Gene expression (transcriptional density and expression level) was significantly lower in the pericentromeric regions of Cen4, Cen5, and Cen8 than in the other regions (Table 2; Figures 3 and 4). Low transcriptional density could be partly explained by the low gene density (Figure 3), as centromeric regions contain repetitive sequences such as the centromere-specific retrotransposon RIRE7/CRR and the tandem repetitive sequence RCS2/CentO. The high expression observed only under specific conditions (e.g., of Os04g0234600 in shoots, Figure A1A in Appendix) could be explained by the occurrence of permissive transcriptional activity through pockets of DNA hypomethylation (Wong et al., 2006) and/or mosaics of histone modification in the centromeric region (Stimpson and Sullivan, 2010): the presence of methylated histone H3 at Lys9 leads to heterochromatin assembly, whereas methylated histone H3 at Lys4 leads to euchromatin assembly. Thus, gene expression was generally low in the centromeric region, but the suppression could be selectively released in specific tissues and under specific cell conditions.
The distribution of gene expression was asymmetric in Cen5: genes were rarely expressed on the short arm and highly expressed on the long arm (Figure 2; Table 3). The size of the rarely expressed region C1388 to S20487S (~700 kb; Figure 2) was almost the same as that of the kinetochore region on Cen8 (750 kb; Nagaki et al., 2004; Wu et al., 2004), suggesting that these rarely expressed gene regions are related to the formation of kinetochores in Cen5. In the 700-kb region, most of the genes were annotated as hypothetical and were hardly expressed (Table A1 in Appendix), suggesting that these genes do not have specific functions.
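The two measures of gene expression used above, transcriptional density (total length of transcribed region divided by the length of the genomic region) and expression level (number of uniquely mapped reads divided by the length of the transcribed region), can be sketched as follows. This is a minimal illustration under stated assumptions: transcribed regions are given as half-open `(start, end)` intervals, overlapping intervals are merged before lengths are summed, and all names and numbers are hypothetical. The exact procedure used to delimit transcribed regions in this study is not reproduced here.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, end), coordinates in bp


def merged_length(intervals: Iterable[Interval]) -> int:
    """Total length covered by the intervals, counting overlaps once (assumption)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block, if any, and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def transcriptional_density(transcribed: Iterable[Interval], region_length_bp: int) -> float:
    """Transcribed length / genomic region length."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return merged_length(transcribed) / region_length_bp


def expression_level(unique_reads: int, transcribed: Iterable[Interval]) -> float:
    """Uniquely mapped reads / transcribed length (0.0 if nothing is transcribed)."""
    length = merged_length(transcribed)
    return unique_reads / length if length else 0.0


# Hypothetical 1-kb region with three transcribed intervals, two of them overlapping.
regions = [(0, 100), (50, 150), (300, 400)]
print(transcriptional_density(regions, 1000))  # 0.25
print(expression_level(500, regions))          # 2.0
```

Because the two ratios have different denominators, a region can show low transcriptional density (little of it is transcribed) while the few transcribed segments still have a high expression level, which is why the two measures are reported separately.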
On the long arm of Cen5, genes with similarity to those encoding known functional proteins were highly expressed (RPKM > 20; Table A1 in Appendix); for comparison, the median RPKM of all RAP2-annotated genes was 3.399 in the shoots and 4.241 in the roots (Mizuno et al., 2010). Moreover, rice Os05g0303000 had a DNA sequence similar to that of wheat PSR161. Os05g0303000 and PSR161 have been mapped in the centromeric regions of rice Cen5 (Figure 2) and wheat Cen1B (Francki et al., 2002), respectively; their chromosomal positions are consistent with the chromosomal synteny between these two crops (Devos, 2005). A molecular–cytogenetic method has also suggested synteny between the centromeric regions of wheat and rice (Qi et al., 2009). PSR161 encodes HSP70, which is thought to function as a molecular chaperone. As HSP70 is also conserved in Pisum sativum, Cucumis sativus, Spinacia oleracea, and Chlamydomonas reinhardtii (Francki et al., 2002), HSP70 gene silencing is likely to have serious effects. Therefore, because of the existence of highly expressed regions proximal to RCS2/CentO on the long arm, including the conserved HSP70 homolog, we consider that, on an evolutionary time scale, kinetochore formation on Cen5 has been restricted to the short arm.
The RCS2/CentO sequence is tandemly arrayed in the core region of Cen5. The length of a unit of rice RCS2/CentO is 155 bp (Dong et al., 1998); this length is considered to be related to the formation of the nucleosomal unit required for kinetochore formation (Houben and Schubert, 2003; Dawe and Hiatt, 2004; Ma et al., 2007). Cen5 had two clusters of RCS2/CentO repeats (Figure A2 in Appendix). In comparison, Cen8 has three large clusters (Wu et al., 2004) and Cen4 has 18 clusters (Zhang et al., 2004); thus the amount and organization of RCS2/CentO clusters differ markedly among Cen4, Cen5, and Cen8 (Figure A2 in Appendix).
No genes were annotated (Figure A2 in Appendix), and expression was hardly detected, in the sequences separating the RCS2/CentO arrays (Table 2), suggesting that gene expression did not occur in the core of the centromeric region. The sequences separating the RCS2/CentO arrays are derived from repetitive sequences, such as the centromere-specific gypsy-like retrotransposon RIRE7 (Kumekawa et al., 2001), that are fragmented and have nucleotide substitutions (Wu et al., 2004; Zhang et al., 2004). Although Cen8 has another small RCS2/CentO sequence with the gene Os08g0319450 located within the array, Os08g0319450 was not expressed in the shoots or roots (Figure A1B in Appendix). Therefore, the regions separating the RCS2/CentO arrays had little expression activity.
Figure A2

Distributions of RCS2/CentO clusters. RCS2/CentO clusters in Cen4, Cen5, and Cen8 on rice genome sequence Build 5.0 (blue boxes) are shown. The small RCS2/CentO sequence in Cen8 (Figure A1B in Appendix) is not shown. Gene models based on Rice Annotation Project (RAP) representative genes are shown.

Remaining gap in the reference rice genome sequence

The published rice genomic sequence covers 95.3% of the estimated 390-Mb total genome sequence, and it contains 36 gaps (IRGSP, 2005). These gaps have been sequenced gradually since the completion of the IRGSP; the newly sequenced regions include telomeres, subtelomeres, and the ribosomal DNA cluster (Mizuno et al., 2008a). However, the latest rice genomic sequence contains only a portion of the centromeric regions. Here, we performed resequencing, assembly, and finishing of PAC clones in rice Cen5 (Figure 1; Table 1). In the remaining centromeric regions of rice chromosomes, interference by repetitive sequences has prevented further chromosome walking and subsequent genomic sequencing (Wu et al., 2003; IRGSP, 2005). In an in situ hybridization analysis, unsequenced centromeres had relatively large clusters of repetitive sequences (Cheng et al., 2002). Moreover, RCS2/CentO repetitive DNA inserted into PAC/BAC clones is easily deleted: 47.2% of centromeric PAC clones have inserts <60 kb in length, compared with 13.6% in the total library (Mizuno et al., 2006), suggesting that these clones are unstable in Escherichia coli. Thus, complete genomic sequencing of the remaining centromeric regions will be challenging.
Our work has primarily helped to bridge the short arm and long arm of chromosome 5 of the reference rice genome sequence constructed by the IRGSP. Using the reference genomic sequence, we generated transcript maps by massive parallel sequencing of mRNA. Recently, the massive parallel sequencing technique has also been applied to the analysis of DNA methylation, histone modification, and protein binding. Thus, high-quality reference genomic sequencing will play pivotal roles in further sequence-based functional analysis of centromeric regions in the next-generation sequencing era.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References (10 of 38 shown)

1.  The generic genome browser: a building block for a model organism system database.

Authors:  Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

2.  Plant neocentromeres: fast, focused, and driven.

Authors:  R Kelly Dawe; Evelyn N Hiatt
Journal:  Chromosome Res       Date:  2004       Impact factor: 5.239

Review 3.  The rapidly evolving field of plant centromeres.

Authors:  Anne E Hall; Kevin C Keith; Sarah E Hall; Gregory P Copenhaver; Daphne Preuss
Journal:  Curr Opin Plant Biol       Date:  2004-04       Impact factor: 7.834

Review 4.  Updating the 'crop circle'.

Authors:  Katrien M Devos
Journal:  Curr Opin Plant Biol       Date:  2005-04       Impact factor: 7.834

Review 5.  An overview of plant chromosome structure.

Authors:  N Gill; C S Hans; S Jackson
Journal:  Cytogenet Genome Res       Date:  2008-05-22       Impact factor: 1.636

6.  Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Authors:  Ali Mortazavi; Brian A Williams; Kenneth McCue; Lorian Schaeffer; Barbara Wold
Journal:  Nat Methods       Date:  2008-05-30       Impact factor: 28.547

7.  Genomic and genetic characterization of rice Cen3 reveals extensive transcription and evolutionary implications of a complex centromere.

Authors:  Huihuang Yan; Hidetaka Ito; Kan Nobuta; Shu Ouyang; Weiwei Jin; Shulan Tian; Cheng Lu; R C Venu; Guo-Liang Wang; Pamela J Green; Rod A Wing; C Robin Buell; Blake C Meyers; Jiming Jiang
Journal:  Plant Cell       Date:  2006-07-28       Impact factor: 11.277

8.  A molecular-cytogenetic method for locating genes to pericentromeric regions facilitates a genomewide comparison of synteny between the centromeric regions of wheat and rice.

Authors:  Lili Qi; Bernd Friebe; Peng Zhang; Bikram S Gill
Journal:  Genetics       Date:  2009-09-21       Impact factor: 4.562

9.  Physical maps and recombination frequency of six rice chromosomes.

Authors:  Jianzhong Wu; Hiroshi Mizuno; Mika Hayashi-Tsugane; Yukiyo Ito; Yoshino Chiden; Masaki Fujisawa; Satoshi Katagiri; Shoko Saji; Shoji Yoshiki; Wataru Karasawa; Rie Yoshihara; Akiko Hayashi; Harumi Kobayashi; Kazue Ito; Masao Hamada; Masako Okamoto; Maiko Ikeno; Yoko Ichikawa; Yuichi Katayose; Masahiro Yano; Takashi Matsumoto; Takuji Sasaki
Journal:  Plant J       Date:  2003-12       Impact factor: 6.417

10.  TopHat: discovering splice junctions with RNA-Seq.

Authors:  Cole Trapnell; Lior Pachter; Steven L Salzberg
Journal:  Bioinformatics       Date:  2009-03-16       Impact factor: 6.937

