Jian-Hua Yang, Xiao-Chen Zhang, Zhan-Peng Huang, Hui Zhou, Mian-Bo Huang, Shu Zhang, Yue-Qin Chen, Liang-Hu Qu.
Abstract
Small nucleolar RNAs (snoRNAs) represent an abundant group of non-coding RNAs in eukaryotes. They can be divided into guide and orphan snoRNAs according to the presence or absence of sequence antisense to rRNAs or snRNAs. Current snoRNA-searching programs, which rely essentially on sequence complementarity to rRNAs or snRNAs, can screen only for guide snoRNAs. In this study, we developed an advanced computational package, snoSeeker, which includes the CDseeker and ACAseeker programs, for highly efficient and specific screening of both guide and orphan snoRNA genes in mammalian genomes. Using these programs, we systematically scanned four human–mammal whole-genome alignment (WGA) sequences and identified 266 known snoRNA genes as well as 54 novel candidates, 26 of which are orphan candidates. Eighteen novel snoRNAs were further confirmed experimentally, four of which exhibit a tissue-specific or restricted expression pattern. The results of this study provide the most comprehensive listing to date of the two families of snoRNA genes in the human genome.
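The guide/orphan distinction above rests on sequence complementarity to a target RNA. The following is a minimal Python sketch of that idea, not the snoSeeker implementation. It assumes perfect Watson–Crick pairing (real guide duplexes tolerate mismatches and G·U pairs), and the names `find_antisense` and `classify`, as well as the 10-nt minimum (the lower end of the 10–21-nt antisense elements described for C/D snoRNAs in Figure 1), are hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Tuple

# Watson-Crick complements in the RNA alphabet; G·U wobble pairs are ignored.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written with A, C, G and U."""
    return rna.translate(COMPLEMENT)[::-1]


def find_antisense(element: str, targets: Dict[str, str],
                   min_len: int = 10) -> Optional[Tuple[str, int, int]]:
    """Return (target name, 0-based start in target, duplex length) for the
    longest perfect duplex of at least min_len nt, or None if there is none."""
    n = len(element)
    for length in range(n, min_len - 1, -1):          # try the longest duplex first
        for i in range(n - length + 1):
            probe = reverse_complement(element[i:i + length])
            for name, seq in targets.items():
                pos = seq.find(probe)
                if pos != -1:
                    return name, pos, length
    return None


def classify(element: str, targets: Dict[str, str], min_len: int = 10) -> str:
    """'guide' if the element can pair with some target RNA, else 'orphan'."""
    return "guide" if find_antisense(element, targets, min_len) else "orphan"
```

With a toy target `{"t": "AAAAGGCUAGCUAGCUAAAA"}`, the element `"AGCUAGCUAGCC"` forms a 12-bp duplex starting at target index 4 and is classified as a guide, whereas `"CCCCCCCCCCCC"` finds no partner and is classified as an orphan. Searching longest-first means the reported duplex is the most extensive perfect match, which is the one a guide-style search would care about.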
Year: 2006 PMID: 16990247 PMCID: PMC1636440 DOI: 10.1093/nar/gkl672
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. CDseeker and ACAseeker core algorithm workflow. (A) The C/D snoRNA model. Box C/D snoRNAs carry the conserved boxes C (RUGAUGA, R = purine) and D (CUGA) near their 5′ and 3′ ends, respectively. The two boxes are frequently brought together by a short (4–5 bp) terminal helix, forming a structure similar to a kink-turn. Imperfect copies of the C and D boxes, named C′ and D′, are often located internally, in the order C/D′/C′/D. 2′-O-ribose methylation of target RNAs is guided by one or two 10–21-nt antisense elements located upstream of the D and/or D′ boxes, such that the modified base pairs with the snoRNA nucleotide located precisely 5 nt upstream of the D or D′ box (17). (B) Schematic representation of the CDseeker algorithm. (C) The H/ACA snoRNA model. Box H/ACA snoRNAs consist of two hairpins and two short single-stranded regions, which contain the H box (ANANNA) and the ACA box; the ACA box is always located 3 nt upstream of the snoRNA's 3′ end. The hairpins contain bulges, or recognition loops, that form complex pseudoknots with the target RNA, in which the target uridine is the first unpaired base. The substrate uridine always lies 13–16 nt upstream of the H box (left recognition pocket) or of the ACA box (right recognition pocket) (17). (D) Schematic representation of the ACAseeker algorithm.
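The box rules in this legend translate directly into sequence tests. The Python sketch below illustrates them under stated assumptions: only exact consensus matches are accepted (real C′/D′ copies are imperfect, and real tools score deviations), the 20-nt window for the terminal boxes is a hypothetical cutoff, and "the ACA box is located 3 nt upstream of the 3′ end" is read as "ACA is followed by exactly 3 nt". The function names are illustrative and are not CDseeker or ACAseeker internals.

```python
import re
from typing import List, Optional, Tuple

# Consensus boxes from the Figure 1 legend, in the RNA alphabet.
C_BOX = re.compile(r"[AG]UGAUGA")      # box C: RUGAUGA, R = purine (A or G)
D_BOX = "CUGA"                         # box D
H_BOX = re.compile(r"(?=(A.A..A))")    # box H: ANANNA; lookahead reports overlapping hits


def cd_boxes(seq: str, window: int = 20) -> Optional[Tuple[int, int]]:
    """0-based starts of box C (within the first `window` nt) and box D
    (within the last `window` nt), or None; `window` is an assumed cutoff."""
    c = C_BOX.search(seq, 0, window)
    d = seq.rfind(D_BOX, max(0, len(seq) - window))
    if c is None or d == -1 or d <= c.start():
        return None
    return c.start(), d


def guided_index(d_start: int) -> int:
    """Index of the snoRNA nucleotide 5 nt upstream of box D; the target
    nucleotide paired with it is the one that gets 2'-O-methylated."""
    return d_start - 5


def h_boxes(seq: str) -> List[int]:
    """0-based starts of every ANANNA match (candidate H boxes)."""
    return [m.start(1) for m in H_BOX.finditer(seq)]


def aca_box_ok(seq: str) -> bool:
    """True if ACA sits 3 nt upstream of the 3' end, i.e. is followed by 3 nt."""
    return seq[-6:-3] == "ACA"
```

For a toy C/D-like sequence `"GG" + "GUGAUGA" + "A" * 30 + "CUGA" + "UU"`, `cd_boxes` returns `(2, 39)` and `guided_index(39)` gives 34, the snoRNA position that would pair with the methylated target base. The H-box pattern uses a lookahead so that overlapping ANANNA occurrences are all reported; a real scanner would then keep only those lying in the hinge between the two hairpins.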
Figure 2. (A) Conserved snoRNA features on the human UCSC Genome Browser. C/D and H/ACA snoRNAs are colored blue and green, respectively. The conservation track shows that sequences corresponding to snoRNAs are more highly conserved than their flanking sequences. (B) A computationally predicted box H/ACA RNA candidate that does not fit the conserved pattern. The UCSC conservation track shows that the candidate sequence is less conserved than its flanking sequences. Although this candidate can fold into a hairpin–hinge–hairpin–tail structure, its expression could be confirmed by neither northern blotting nor reverse transcription.
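Panel B motivates a simple filter: keep a candidate only if it is more conserved than the sequence around it. The Python sketch below assumes per-base conservation scores (for example, values exported from a browser conservation track) are already available as a list; the function names, the 50-nt flank and the 0.1 margin are arbitrary illustrative choices, not parameters reported for snoSeeker.

```python
from statistics import mean
from typing import Sequence


def conservation_contrast(scores: Sequence[float], start: int, end: int,
                          flank: int = 50) -> float:
    """Mean per-base conservation inside [start, end) minus the mean over up
    to `flank` positions on each side of the candidate."""
    inside = list(scores[start:end])
    flanks = list(scores[max(0, start - flank):start]) + list(scores[end:end + flank])
    if not inside or not flanks:
        raise ValueError("need non-empty candidate and flanking windows")
    return mean(inside) - mean(flanks)


def passes_conservation(scores: Sequence[float], start: int, end: int,
                        flank: int = 50, margin: float = 0.1) -> bool:
    """Accept a candidate only if it is clearly more conserved than its flanks."""
    return conservation_contrast(scores, start, end, flank) > margin
```

A profile of 0.2 / 0.9 / 0.2 (flank / candidate / flank) gives a contrast of about 0.7 and passes, mirroring panel A; the inverted profile gives a negative contrast and is rejected, mirroring the candidate in panel B.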
Figure 3. Computational identification of box C/D snoRNAs. (A) Distribution of CDseeker scores for known orphan C/D snoRNA genes. (B) Distribution of CDseeker scores for known guide C/D snoRNA genes. (HS, MM, RN, CF and BT are abbreviations for human, mouse, rat, dog and cow, respectively; HS-MM denotes the human–mouse WGA sequences.)
Figure 4. Flowcharts of the CDseeker and ACAseeker algorithms. (A) The CDseeker flowchart is divided into three main stages. The first stage scans the four WGA sequences with the CDseeker core program. The second stage locates each hit in the genome using the locateGenome program. The final stage intersects the four results and filters the candidate sequences by their evolutionary conservation pattern. The number of known snoRNAs found at each stage of the analysis is shown in parentheses. (B) The ACAseeker flowchart is divided into the same three main stages. The first stage scans the four WGA sequences with the ACAseeker core program. The second stage locates each hit in the genome using the locateGenome program. The final stage intersects the four results and filters the candidate sequences by their evolutionary conservation pattern. The number of known snoRNAs found at each stage of the analysis is shown in parentheses.
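The final stage combines hits from the four human–mammal scans. The legend does not say how many alignments must support a candidate, and the Homology columns of the tables below list some snoRNAs with homologs in only one or two of the four species, so this Python sketch leaves that as a hypothetical `min_support` parameter. It assumes the hits have already been placed on human coordinates (the role the legend gives locateGenome) as inclusive `(chromosome, start, end)` tuples; `merge_scans` is an illustrative name, not part of snoSeeker.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end) on the human genome, inclusive


def merge_scans(scans: Dict[str, List[Interval]],
                min_support: int = 1) -> List[Tuple[str, int, int, List[str]]]:
    """Merge overlapping hits from several WGA scans into loci and return
    (chrom, start, end, supporting scans) for loci seen in >= min_support scans."""
    # Sorting by (chrom, start, end) keeps each chromosome's hits contiguous.
    tagged = sorted((iv, name) for name, hits in scans.items() for iv in hits)
    loci: list = []
    for (chrom, start, end), name in tagged:
        if loci and loci[-1][0] == chrom and start <= loci[-1][2]:
            loci[-1][2] = max(loci[-1][2], end)   # extend the current locus
            loci[-1][3].add(name)
        else:
            loci.append([chrom, start, end, {name}])
    return [(c, s, e, sorted(n)) for c, s, e, n in loci if len(n) >= min_support]
```

With one hit each from HS-MM (`chr3:100–190`) and HS-RN (`chr3:150–200`) plus an unrelated HS-CF hit on chr5, the two chr3 hits collapse into one locus supported by two scans; raising `min_support` to 2 drops the single-scan chr5 locus. Recording which scans support each locus is what would let a table report per-species homology.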
Novel box C/D snoRNA genes
| snoRNA name | Len. (nt) | Location | Exp. | Homology | Modification | Antisense element | Host gene/comments |
|---|---|---|---|---|---|---|---|
| Novel guide | |||||||
| SNORD117 | 94 | chr3:52699794–52699886 | N.blot | MM RN CF BT | 18S-Gm683 | 15 nt (3′) | GNL3 |
| SNORD118 | 101 | chr14:44649828–44649928 | N.blot | MM RN CF BT | 18S-Gm1447 | 11 nt (3′) | PRPF39 |
| SNORD119 | 96 | chr20:2391598–2391693 | N.blot | MM RN CF BT | 28S-Am4560 | 16 nt (3′) | SNRPB |
| SNORD120 | 84 | chrX:20064093–20064185 | N.blot | MM RN BT | U2-Am30 | 13 nt (3′) | EIF1AX |
| SNORD121A | 91 | chr9:33942762–33942852 | N.blot | MM RN CF | 28S-Gm4607 | 12 nt (5′) | UBAP2 |
| SNORD121B | 93 | chr9:33924286–33924378 | N.blot | MM RN CF | 28S-Gm4607 | 12 nt (5′) | UBAP2 |
| Novel isoform | |||||||
| SNORD41B | 94 | chr19:12675401–12675494 | Iso(U41) | MM RN CF BT | 28S-Um4276 | 14 nt (3′) | TNPO2 |
| SNORD12B | 103 | chr20:47330257–47330359 | Iso(HBII-99) | MM RN CF BT | 28S-Gm3878 | 13 nt (5′) | HSUP1 |
| SNORD111B | 94 | chr16:69120906–69120999 | Iso(HBII-82) | MM CF BT | 28S-Gm3923 | 16 nt (5′) | SF3B3 |
| SNORD58C | 79 | chr18:45269603–45269692 | Iso(U58) | RN BT | 28S-Um4197 | 15 nt (5′) | RPL17 |
| SNORD11B | 112 | chr2:202864285–202864396 | Iso(HBII-95) | MM RN CF BT | 18S-Gm509 | 13 nt (3′) | NOP5/NOP58 |
| SNORD105B | 92 | chr19:10081425–10081516 | Iso(U105) | MM RN CF | 18S-Um799 | 15 nt (3′) | P2Y11 |
| Novel orphan | |||||||
| SNORD122 | 100 | chr2:29004342–29004441 | N.blot | MM RN CF BT | Unknown | Unknown | WDR43 |
| SNORD123 | 88 | chr5:9601939–9602026 | N.blot | MM RN CF BT | Unknown | Unknown | Hs.34447 |
| SNORD124 | 104 | chr17:35437321–35437424 | N.blot | MM RN CF BT | Unknown | Unknown | THRAP4 |
| SNORD125 | 96 | chr22:28059152–28059247 | N.blot | CF BT | Unknown | Unknown | AP1B1 |
| SNORD126 | 99 | chr14:19864440–19864538 | N.blot | RN CF | Unknown | Unknown | CCNB1IP |
| Novel isoform | |||||||
| SNORD116@ | 106 | chr15:22881615–22881718 | Iso(HBII-85) | MM RN CF BT | Unknown | Unknown | SNURF-SNRNP-UBE3A |
| SNORD114@ | 93 | chr14:100534548–100534640 | Iso(14q(II)) | CF BT | Unknown | Unknown | MEG8 |
‘Iso’, isoform; ‘Len’, length of the snoRNA gene (because the program extends the 5′ and 3′ stems by 15 nt, predicted snoRNAs may be 20 nt longer than the corresponding snoRNAs confirmed by northern blotting); ‘Exp’, expression evidence. ‘N.blot’ indicates that the snoRNA was identified by northern blotting in this work. In the ‘Host gene’ column, protein-coding host genes are denoted by their gene symbols. The ‘Location’ column gives genomic coordinates. In the ‘Modification’ column, a nucleotide followed by ‘m’ denotes an rRNA or snRNA 2′-O-methylation site that is conserved in mammals. HS, MM, RN, CF and BT are abbreviations for human (hg18, March 2006), mouse, rat, dog and cow, respectively.
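The Location and Len columns can be cross-checked mechanically. The Python sketch below parses a location string and computes its span, assuming 1-based inclusive coordinates as displayed by the UCSC Genome Browser; `parse_location` and `span_length` are hypothetical helper names. As the footnote explains, a predicted span includes extended stems and need not equal the size of the mature RNA seen on a northern blot.

```python
import re
from typing import Tuple

# Matches e.g. "chr3:52699794–52699886"; accepts an en dash or a plain hyphen.
LOCATION = re.compile(r"^(chr[0-9XYM]+):(\d+)[\u2013-](\d+)$")


def parse_location(text: str) -> Tuple[str, int, int]:
    """Split a table location into (chromosome, start, end)."""
    m = LOCATION.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized location: {text!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def span_length(text: str) -> int:
    """Length in nt, assuming 1-based inclusive start and end coordinates."""
    _, start, end = parse_location(text)
    return end - start + 1
```

For SNORD118, `span_length("chr14:44649828–44649928")` gives 101, matching its Len entry; running the helper over every row is a quick way to compare the two columns.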
Novel box H/ACA snoRNA genes
| snoRNA name | Len. (nt) | Location | Exp. | Homology | Modification | Antisense element | Host gene/comments |
|---|---|---|---|---|---|---|---|
| Novel guide | |||||||
| SCARNA26 | 145 | chr1:153915523–153915667 | N.blot | CF | U4-Ψ78 | 6 + 7 nt (5′) | YY1AP1 |
| SNORA82 | 123 | chr3:187986808–187986930 | N.blot | MM RN | 28S-Ψ4491 | 3 + 7 nt (5′) | EIF4A2 |
| SCARNA27 | 126 | chr6:8031640–8031765 | N.blot | CF BT | U4-Ψ4 | 7 + 3 nt (5′) | EEF1E1 |
| SNORA83A | 135 | chr7:64168351–64168485 | N.blot | RN | 18S-Ψ1367 | 5 + 7 nt (5′) | LOC441242 |
| SNORA83B | 135 | chr7:64862474–64862608 | N.blot | RN | 18S-Ψ1367 | 5 + 7 nt (5′) | LOC441241 |
| SNORA77B | 122 | chr22:18493925–18494047 | N.blot | MM RN CF | 18S-Ψ814 | 6 + 5 nt (5′) | RANBP1 |
| SNORA77A | 123 | chr1:201965332–201965454 | N.blot (Schattner, ACA63) | MM RN CF BT | 18S-Ψ814 | 6 + 5 nt (5′) | ATP2B4 |
| SNORA80B | 135 | chr7:6023034–6023168 | Schattner | CF | 18S-Ψ109-Ψ572 | 6 + 4 nt (5′); 7 + 4 nt (3′) | JTV1 |
| SNORA80A | 136 | chr21:32671367–32671502 | Schattner (ACA67) | MM BT | 18S-Ψ109-Ψ572 | 6 + 4 nt (5′); 7 + 4 nt (3′) | C21orf108 |
| SNORA79 | 140 | chr14:80738792–80738931 | Schattner (ACA65) | MM RN | U6-Ψ31-Ψ86 | 5 + 6 nt (5′); 5 + 7 nt (3′) | GTF2A1 |
| SCARNA21 | 138 | chr17:7750166–7750303 | Schattner (ACA68) | CF | U12-Ψ19 | 6 + 4 nt (5′) | CHD3 |
| SNORA76 | 132 | chr17:59577431–59577562 | Schattner (ACA62) | MM RN CF BT | 18S-Ψ34-Ψ105 | 7 + 3 nt (5′); 7 + 5 nt (3′) | EST cluster |
| SCARNA22 | 132 | chr5:82395779–82395910 | Gu (U109) | MM RN CF | U1-Ψ6 | 4 + 5 nt (3′) | MGC23909 |
| Novel isoform | |||||||
| SNORA58B | 134 | chr1:152498829–152498962 | Iso(ACA58) | RN BT | 28S-Ψ3823 | 5 + 9 nt (5′) | UBAP2L |
| Novel orphan | |||||||
| SNORA84 | 133 | chr9:94094564–94094696 | N.blot (Washietl) | MM RN CF | Unknown | Unknown | IARS |
| SNORA85 | 130 | chr15:63364852–63364981 | N.blot | MM RN CF BT | Unknown | Unknown | PARP16 |
| SNORA86 | 132 | chr7:64163814–64163945 | N.blot | MM RN CF BT | Unknown | Unknown | BX649060 |
| SNORA45 | 127 | chr11:8663564–8663690 | Gu (ACA3-2) | MM CF BT | Unknown | Unknown | RPL27A |
| SNORA12 | 144 | chr10:101986903–101987046 | Gu (U108) | CF | Unknown | Unknown | CWF19L1 |
| Novel isoform | |||||||
| SNORA36C | 129 | chr2:69600679–69600807 | Iso(ACA36) | CF | Unknown | Unknown | AAK1 |
| SNORA38B | 132 | chr17:63167248–63167379 | Iso(ACA38) | BT | Unknown | Unknown | NOL11 |
| SNORA70B | 134 | chr2:61497883–61498016 | Iso(U70) | MM BT | Unknown | Unknown | USP34 |
| SNORA70C | 134 | chr9:118983166–118983299 | Iso(U70) | BT | Unknown | Unknown | ASTN2 |
| SNORA11B | 128 | chr14:90662523–90662650 | Iso(U107) | MM BT | Unknown | Unknown | C14orf159 |
| SNORA11C | 128 | chrX:47132993–47133120 | Iso(U107) | MM CF BT | Unknown | Unknown | ZNF157 |
| SNORA11D | 127 | chrX:51823183–51823309 | Iso(U107) | BT | Unknown | Unknown | MAGED4 |
| SNORA11E | 127 | chrX:51950458–51950584 | Iso(U107) | BT | Unknown | Unknown | MAGED4 |
‘Iso’, isoform; ‘Len’, length of the snoRNA gene; ‘Exp’, expression evidence. ‘N.blot’ indicates that the snoRNA was identified by northern blotting in this work. ‘Schattner’, ‘Gu’ and ‘Washietl’ indicate that expression of the snoRNA was confirmed in other studies (refs 13, 16 and 21). In the ‘Host gene’ column, protein-coding host genes are denoted by their gene symbols. The ‘Location’ column gives genomic coordinates. In the ‘Modification’ column, ‘Ψ’ followed by a position denotes an rRNA or snRNA pseudouridylation site that is conserved in mammals. HS, MM, RN, CF and BT are abbreviations for human (hg18, March 2006), mouse, rat, dog and cow, respectively.
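The Modification and Antisense element columns in both tables use a compact notation. The small Python parser below, written as an illustration with hypothetical function names, splits them into structured values. It reads a pair such as `6 + 4 nt (5′)` as two antisense segment lengths for the recognition pocket in the 5′ or 3′ hairpin; that reading is an assumption, since the footnote does not define the notation.

```python
import re
from typing import List, Optional, Tuple

# One modified site: pseudouridine (Ψ) or a 2'-O-methylated nucleotide (Am, Cm, Gm, Um).
SITE = re.compile(r"(Ψ|[ACGU]m)(\d+)")
# One antisense pair such as "6 + 4 nt (5′)" or "7+ 4nt(3′)"; the prime is U+2032.
PAIR = re.compile(r"(\d+)\s*\+\s*(\d+)\s*nt\s*\((5′|3′)\)")


def parse_modification(text: str) -> Optional[Tuple[str, List[Tuple[str, int]]]]:
    """'18S-Ψ109-Ψ572' -> ('18S', [('Ψ', 109), ('Ψ', 572)]); None for 'Unknown'."""
    if text.strip() == "Unknown":
        return None
    target, _, sites = text.strip().partition("-")
    return target, [(kind, int(pos)) for kind, pos in SITE.findall(sites)]


def parse_antisense_pairs(text: str) -> List[Tuple[int, int, str]]:
    """'6 + 4 nt (5′); 7 + 4 nt (3′)' -> [(6, 4, '5′'), (7, 4, '3′')]."""
    return [(int(a), int(b), hairpin) for a, b, hairpin in PAIR.findall(text)]
```

The same `parse_modification` call handles C/D entries such as `28S-Gm4607`, returning `('28S', [('Gm', 4607)])`, and the pair pattern tolerates the irregular spacing found in some rows.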
Figure 5. Northern blotting analysis of the expression patterns of novel snoRNAs. Lane M, molecular weight markers (pBR322 digested with HaeIII and 5′-end-labeled with [γ-32P]ATP). Lanes containing samples from different rat tissues are labeled with the tissue names. U6 snRNA was analyzed as a control. (A) Expression patterns of novel C/D snoRNAs. (B) Expression patterns of novel H/ACA snoRNAs. (C) Expression patterns of novel snoRNAs in human cell lines, which are labeled by name.