Zhixin Zhao, Xiaohui Wu, Praveen Kumar Raj Kumar, Min Dong, Guoli Ji, Qingshun Quinn Li, Chun Liang.
Abstract
Messenger RNA 3'-end formation is an essential posttranscriptional processing step for most eukaryotic genes. Unlike plants and animals, in which AAUAAA and its variants are routinely found as the main poly(A) signal, Chlamydomonas reinhardtii uses UGUAA as its major poly(A) signal. Advances in sequencing technology have produced an enormous amount of data with which to explore the variation of poly(A) signals, alternative polyadenylation (APA), and its relationship with splicing in this algal species. Through genome-wide analysis of poly(A) sites in C. reinhardtii, we identified a large number of poly(A) sites: 21,041 from Sanger expressed sequence tags, 88,184 from 454, and 195,266 from Illumina sequence reads. Compared with previous collections, deep sequencing uncovered more new poly(A) sites in coding sequences, introns, and intergenic regions. Interestingly, G-rich signals are particularly abundant in intron and intergenic regions. The differing prevalence of poly(A) signals between coding sequences and 3'-untranslated regions implies potentially different polyadenylation mechanisms. Our data suggest that APA occurs in about 68% of C. reinhardtii genes. Gene Ontology analysis showed that most APA genes are involved in RNA regulation and metabolic processes, protein synthesis, and hydrolase and ligase activities. Moreover, intronic poly(A) sites are more abundant in constitutively spliced introns than in retained introns, suggesting an interplay between polyadenylation and splicing. Our results support the view that APA, as in higher eukaryotes, may play significant roles in increasing transcriptome diversity and regulating gene expression in this algal species. Our datasets also provide useful information for accurate annotation of transcript ends in C. reinhardtii.
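Poly(A) sites are identified from reads that carry a non-templated poly(A) tail: the last genome-templated base before the tail marks the cleavage site. The sketch below illustrates that idea for a sense-strand read aligned without gaps. The minimum tail length (`MIN_TAIL = 8`), the function names, and the exact-match tail rule are all hypothetical choices for illustration, not the study's pipeline; real pipelines also tolerate sequencing errors in the tail and filter internal priming, both omitted here.

```python
import re
from typing import Optional, Tuple

# Hypothetical minimum tail length; real pipelines pick their own cut-off
# and usually allow a few mismatches inside the tail.
MIN_TAIL = 8
TAIL_RE = re.compile(r"A{%d,}$" % MIN_TAIL)


def split_poly_a_tail(read: str) -> Optional[Tuple[str, int]]:
    """Split a sense-strand read into (templated part, tail length).

    Returns None when the read does not end in at least MIN_TAIL A's.
    Note: A's encoded by the genome just before the tail are absorbed into
    the tail, so the inferred site can be ambiguous by a few nucleotides.
    """
    match = TAIL_RE.search(read.upper())
    if match is None:
        return None
    return read[: match.start()], len(read) - match.start()


def poly_a_site(read: str, align_start: int) -> Optional[int]:
    """0-based genomic coordinate of the last templated base (putative
    poly(A) site) for a plus-strand read whose templated part aligns
    gaplessly starting at `align_start`."""
    parts = split_poly_a_tail(read)
    if parts is None:
        return None
    templated, _ = parts
    if not templated:
        return None  # read is all tail; nothing to anchor on the genome
    return align_start + len(templated) - 1


read = "CCGTTGTAC" + "A" * 10
print(split_poly_a_tail(read), poly_a_site(read, 100))
```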
Keywords: Chlamydomonas reinhardtii; alternative polyadenylation; alternative splicing; intron retention; poly(A) signals; sequencing platforms
Year: 2014 PMID: 24626288 PMCID: PMC4025486 DOI: 10.1534/g3.114.010249
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
The data sources and numbers of poly(A) sites from three sequencing platforms
| Data Types | Source | No. of Reads With Poly(A) Tails | No. of Unique Poly(A) Sites | No. of Poly(A) Site Clusters |
|---|---|---|---|---|
| Genome sequences | Phytozome v4.3 | |||
| Gene annotation | Phytozome v4.3 | |||
| Expressed sequence tag data | National Center for Biotechnology Information, Joint Genome Institute, | 338,234 | 21,041 | 11,035 |
| 454 data | Collaborator, DNAnexus | 824,565 | 88,184 | 30,086 |
| Illumina data | DNAnexus | 22,372,354 | 195,266 | 88,304 |
| Total | | 23,535,153 | 256,771 | 97,479 |
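The table distinguishes unique poly(A) sites from poly(A) site clusters (PACs): because cleavage positions are heterogeneous, nearby sites are commonly merged into one cluster before analysis. A minimal single-linkage sketch of such merging follows, assuming a fixed gap threshold of 24 nt (`MAX_GAP`, a hypothetical value, not necessarily the one used in the study) and taking the most-used site as each cluster's representative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical clustering distance in nucleotides (an assumption).
MAX_GAP = 24

Site = Tuple[str, str, int]  # (chromosome, strand, position)


def _summarize(chrom: str, strand: str, sites: List[Tuple[int, int]]) -> dict:
    """Collapse sorted (position, reads) pairs into one PAC record."""
    rep_pos, _ = max(sites, key=lambda s: s[1])  # most-used site
    return {
        "chrom": chrom,
        "strand": strand,
        "start": sites[0][0],
        "end": sites[-1][0],
        "reads": sum(r for _, r in sites),
        "representative": rep_pos,
    }


def cluster_sites(site_counts: Dict[Site, int], max_gap: int = MAX_GAP) -> List[dict]:
    """Merge unique poly(A) sites on the same chromosome and strand into
    PACs whenever neighbouring sites are at most `max_gap` nt apart."""
    by_locus: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for (chrom, strand, pos), reads in site_counts.items():
        by_locus[(chrom, strand)].append((pos, reads))

    pacs = []
    for (chrom, strand), sites in sorted(by_locus.items()):
        sites.sort()
        current = [sites[0]]
        for pos, reads in sites[1:]:
            if pos - current[-1][0] <= max_gap:
                current.append((pos, reads))
            else:
                pacs.append(_summarize(chrom, strand, current))
                current = [(pos, reads)]
        pacs.append(_summarize(chrom, strand, current))
    return pacs


counts = {("chr1", "+", 100): 3, ("chr1", "+", 110): 5,
          ("chr1", "+", 200): 1, ("chr1", "-", 105): 2}
for pac in cluster_sites(counts):
    print(pac)
```

Single linkage means a chain of closely spaced sites can grow into a wide cluster; capping the total cluster width is a common alternative design.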
Number of PACs in different gene types
| | Constitutive Introns: Protein-Coding Genes | Constitutive Introns: Noncoding Genes | Retained Introns: Protein-Coding Genes | Retained Introns: Noncoding Genes |
|---|---|---|---|---|
| Intron number | 134,708 | 2,550 | 5,151 | 116 |
| No. of introns that contain PACs | 12,031 | 153 | 120 | 5 |
| % of introns that contain PACs | 8.93% | 6.00% | 2.33% | 4.31% |
The percentage is calculated as (no. of introns that contain PACs)/(intron number) × 100. PACs, poly(A) site clusters.
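The percentages follow directly from the footnote's formula. This short check recomputes them from the tabulated counts; the dictionary keys are just labels chosen for this sketch.

```python
# Counts copied from the table: (intron number, no. of introns that contain PACs)
INTRONS = {
    "constitutive, protein-coding": (134_708, 12_031),
    "constitutive, noncoding": (2_550, 153),
    "retained, protein-coding": (5_151, 120),
    "retained, noncoding": (116, 5),
}


def pac_percentage(total: int, with_pac: int) -> float:
    """Percentage of introns that contain at least one PAC, to 2 decimals."""
    return round(100 * with_pac / total, 2)


for group, (total, with_pac) in INTRONS.items():
    print(f"{group}: {pac_percentage(total, with_pac):.2f}%")
```

In protein-coding genes, constitutive introns contain PACs at roughly 3.8 times the rate of retained introns (8.93% versus 2.33%).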
Figure 1. The distribution of poly(A) sites in genic and intergenic regions in different datasets. (A) All unique poly(A) sites from Sanger expressed sequence tags (ESTs), 454, and Illumina before 50-nt extension. (B) All unique poly(A) sites from ESTs, 454, and Illumina after 50-nt extension. (C) Poly(A) site clusters (PACs) after 50-nt extension from EST data. (D) PACs after 50-nt extension from 454 data. (E) PACs after 50-nt extension from Illumina data. (F) PACs after 50-nt extension from all three combined datasets. UTR, untranslated region; CDS, coding sequences.
Figure 2. Single-nucleotide profiles from different datasets. (A) All combined poly(A) site clusters (including ESTs, 454, and Illumina). (B) Illumina data. (C) Expressed sequence tag (EST) data. (D) 454 data.
Figure 3. Length comparison of introns and coding sequences (CDS) with and without poly(A) site clusters. (A) Introns. (B) CDS. PA: with poly(A) sites; NPA1: control group 1 without poly(A) sites; NPA2: control group 2 without poly(A) sites; NPA3: control group 3 without poly(A) sites.
Number of PACs in different categories
| Poly(A) Site Classification | No. of PACs | PAC Percentage (%) | No. of Genes | Gene Percentage (%) |
|---|---|---|---|---|
| Constitutive | 2,747 | 4.93 | 2,747 | 16.05 |
| Strong | 3,653 | 6.56 | 3,653 | 21.35 |
| Weak | 10,276 | 18.44 | 3,653 | 21.35 |
| Median | 39,044 | 70.07 | 7,946 | 46.43 |
PACs, poly(A) site clusters.
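The PAC percentages are each category's share of the 55,720 PACs obtained by summing the four categories in the table. The gene percentages, by contrast, add up to 105.18%, so they are not shares of a single partition of genes. A quick check of the PAC column:

```python
# PAC counts per category, copied from the table.
PAC_COUNTS = {"Constitutive": 2_747, "Strong": 3_653, "Weak": 10_276, "Median": 39_044}


def pac_shares(counts: dict) -> dict:
    """Each category's percentage of all PACs, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


print(sum(PAC_COUNTS.values()), pac_shares(PAC_COUNTS))
```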
Figure 4. Single-nucleotide profiles (−50 to +25) and putative poly(A) signals in the near upstream element (NUE) region (−28 to −5) for different poly(A) site categories. (A) Constitutive poly(A) sites. (B) Strong poly(A) sites. (C) Weak poly(A) sites. (D) Median poly(A) sites.
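The two analyses in this figure can be sketched as follows: a single-nucleotide profile is the per-position fraction of A, C, G, and U across sequences aligned on their poly(A) sites, and a signal search looks for a motif such as UGUAA wholly inside the NUE window (−28 to −5). This sketch assumes the upstream sequence ends exactly at the poly(A) site, so its last base is position −1; that coordinate convention and all function names are hypothetical, not taken from the study.

```python
from collections import Counter
from typing import Dict, List

# NUE window from the caption, relative to the poly(A) site.
NUE_START, NUE_END = -28, -5


def to_rna(seq: str) -> str:
    """Normalise DNA or RNA input to uppercase RNA letters."""
    return seq.upper().replace("T", "U")


def nue_motif_positions(upstream: str, motif: str = "UGUAA") -> List[int]:
    """Start coordinates of `motif` hits lying wholly inside the NUE window.

    Assumes the last base of `upstream` is position -1.
    """
    seq, m = to_rna(upstream), to_rna(motif)
    hits = []
    # The last valid start keeps the motif's final base at or before NUE_END.
    for start in range(NUE_START, NUE_END - len(m) + 2):
        i = len(seq) + start  # index of coordinate `start` in `seq`
        if i >= 0 and seq[i:i + len(m)] == m:
            hits.append(start)
    return hits


def nucleotide_profile(seqs: List[str]) -> List[Dict[str, float]]:
    """Per-position fraction of A, C, G and U across equal-length sequences."""
    if not seqs or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need one or more sequences of identical length")
    profile = []
    for column in zip(*(to_rna(s) for s in seqs)):
        counts = Counter(column)
        profile.append({base: counts[base] / len(seqs) for base in "ACGU"})
    return profile


upstream = "C" * 30 + "TGTAA" + "C" * 15  # 50 nt ending at the poly(A) site
print(nue_motif_positions(upstream))
print(nucleotide_profile(["AU", "AG"]))
```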
Figure 5. Numbers of genes with different numbers of PACs in each dataset. (A) All poly(A) data (including Sanger expressed sequence tags (ESTs), 454, and Illumina). (B) Illumina data. (C) EST data. (D) 454 data. PACs, poly(A) site clusters.
APA extent variation among the four datasets
| Dataset | No. of PACs | APA Extent (%) |
|---|---|---|
| EST | 11,035 | 7.87 |
| 454 | 30,086 | 27.49 |
| Illumina | 88,304 | 63.46 |
| Total | 97,479 | 67.78 |
APA, alternative polyadenylation; PACs, poly(A) site clusters.
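APA extent can be computed from PAC-to-gene assignments. The sketch below assumes APA extent is the percentage of PAC-bearing genes that carry two or more PACs; the denominator is an assumption, since the table does not define it. Under this definition, detecting more PACs per gene raises the extent, consistent with the increase from EST to Illumina data in the table.

```python
from collections import Counter
from typing import Iterable


def apa_extent(pac_gene_ids: Iterable[str]) -> float:
    """Percentage of genes with at least one PAC that have two or more.

    `pac_gene_ids` lists the gene assigned to each PAC (one entry per PAC).
    The denominator (genes with >=1 PAC) is an assumption of this sketch.
    """
    per_gene = Counter(pac_gene_ids)
    if not per_gene:
        return 0.0
    multi = sum(1 for n in per_gene.values() if n >= 2)
    return 100 * multi / len(per_gene)


# Hypothetical assignments: g1 has 2 PACs, g2 has 1, g3 has 3.
print(apa_extent(["g1", "g1", "g2", "g3", "g3", "g3"]))
```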
The most significant GO functions in high-quality APA genes with five or more PACs
| GO_ID | Ontology | Term | P value |
|---|---|---|---|
| GO:0004872 | molecular_function | Receptor activity | 3.24E-33 |
| GO:0034660 | biological_process | ncRNA metabolic process | 1.16E-23 |
| GO:0008236 | molecular_function | Serine-type peptidase activity | 3.95E-23 |
| GO:0017171 | molecular_function | Serine hydrolase activity | 3.95E-23 |
| GO:1901135 | biological_process | Carbohydrate derivative metabolic process | 5.48E-23 |
| GO:0006399 | biological_process | tRNA metabolic process | 1.07E-21 |
| GO:0006418 | biological_process | tRNA aminoacylation for protein translation | 1.34E-18 |
| GO:0043038 | biological_process | Amino acid activation | 1.34E-18 |
| GO:0043039 | biological_process | tRNA aminoacylation | 1.34E-18 |
| GO:0009056 | biological_process | Catabolic process | 1.43E-17 |
| GO:0004812 | molecular_function | Aminoacyl-tRNA ligase activity | 5.44E-16 |
| GO:0016875 | molecular_function | Ligase activity, forming carbon-oxygen bonds | 5.44E-16 |
| GO:0016876 | molecular_function | Ligase activity, forming aminoacyl-tRNA and related compounds | 5.44E-16 |
| GO:0003824 | molecular_function | Catalytic activity | 8.21E-16 |
| GO:0016798 | molecular_function | Hydrolase activity, acting on glycosyl bonds | 2.26E-12 |
| GO:0004553 | molecular_function | Hydrolase activity, hydrolyzing O-glycosyl compounds | 2.77E-12 |
GO, Gene Ontology; APA, alternative polyadenylation; PACs, poly(A) site clusters; ncRNA, noncoding RNA; tRNA, transfer RNA.
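The GO tool and statistical test behind these P values are not stated here. One common choice for term enrichment is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched below with exact integer arithmetic so that very small P values keep their precision. The function name and parameter letters are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: annotated background genes; K: background genes with the GO term;
    n: genes in the study set (e.g. APA genes); k: study genes with the term.
    """
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)  # exact big-int sums, one final division


print(hypergeom_enrichment_p(2, 3, 4, 10))
```

Because each table reports the most significant terms among many tested, a multiple-testing correction such as Benjamini-Hochberg would normally be applied on top of these raw P values; the sketch omits it.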
The most significant GO functions in non-PAC genes
| GO_ID | Ontology | Term | P value |
|---|---|---|---|
| GO:0043038 | biological_process | Amino acid activation | 6.580e-25 |
| GO:0043039 | biological_process | tRNA aminoacylation | 6.580e-25 |
| GO:0006418 | biological_process | tRNA aminoacylation for protein translation | 6.580e-25 |
| GO:0004812 | molecular_function | Aminoacyl-tRNA ligase activity | 7.184e-24 |
| GO:0016875 | molecular_function | Ligase activity, forming carbon-oxygen bonds | 7.184e-24 |
| GO:0016876 | molecular_function | Ligase activity, forming aminoacyl-tRNA and related compounds | 7.184e-24 |
| GO:0005044 | molecular_function | Scavenger receptor activity | 2.591e-21 |
| GO:0038024 | molecular_function | Cargo receptor activity | 2.591e-21 |
| GO:0034660 | biological_process | ncRNA metabolic process | 1.836e-20 |
| GO:0006399 | biological_process | tRNA metabolic process | 1.234e-18 |
| GO:0008236 | molecular_function | Serine-type peptidase activity | 2.101e-13 |
| GO:0017171 | molecular_function | Serine hydrolase activity | 2.101e-13 |
| GO:0016070 | biological_process | RNA metabolic process | 2.223e-13 |
| GO:0004872 | molecular_function | Receptor activity | 1.871e-12 |
GO, Gene Ontology; PACs, poly(A) site clusters; ncRNA, noncoding RNA; tRNA, transfer RNA.