
Genome-wide analyses of retrogenes derived from the human box H/ACA snoRNAs.

Yuping Luo, Siguang Li.

Abstract

The family of box H/ACA snoRNAs is an abundant class of non-protein-coding RNAs that play important roles in the post-transcriptional modification of rRNAs and snRNAs. Here we report the characterization in the human genome of 202 sequences derived from box H/ACA snoRNAs. Most of them are retrogenes formed using the L1 integration machinery. About 96% of the box H/ACA RNA-related sequences are found in corresponding locations on the chimpanzee and human chromosomes, while the mouse shares approximately 50% of these human sequences, suggesting that some of the H/ACA RNA-related sequences in primates arose after the rodent/primate divergence. Of the H/ACA RNA-related sequences, 49% are found in intronic regions of protein-coding genes, and 64 can be folded into the typical secondary structure of the box H/ACA snoRNA family; 30 of these were recognized as functional homologs of their corresponding, previously reported box H/ACA snoRNAs. Of the 64 sequences with the typical secondary structure, 11 were found in EST databases, and 5 of these were shown to be expressed in more than one human tissue. Notably, U107f is nested in an intron of the protein-coding gene for nudix-type motif 13 but is expressed from the opposite strand, and EST database searches revealed that it can be expressed in liver and spleen, and even in melanotic melanoma.

Year:  2006        PMID: 17175533      PMCID: PMC1802619          DOI: 10.1093/nar/gkl1086

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The family of box H/ACA RNAs is an abundant class of non-protein-coding RNAs that includes small nucleolar RNAs (snoRNAs), small Cajal body-specific RNAs (scaRNAs) (1), as well as a homologous class of RNAs in archaeal organisms (2). A typical box H/ACA RNA exhibits a common hairpin–hinge–hairpin–tail secondary structure, with the H (ANANNA) motif in the single-stranded hinge region and an ACA triplet located 3 nt upstream of the 3′ terminus (3). The majority of known box H/ACA RNAs play important roles in the post-transcriptional modification of rRNAs and snRNAs (4,5): the box H/ACA snoRNAs direct the conversion of uridine to pseudouridine (Ψ) at specific residues of eukaryotic ribosomal RNAs as well as of the Pol III-transcribed snRNA U6, whereas box H/ACA scaRNAs guide the formation of Ψs in Pol II-transcribed spliceosomal small nuclear RNAs (snRNAs) (1). However, a few H/ACA RNAs are involved in rRNA processing. For example, U17, an evolutionarily conserved H/ACA snoRNA present in vertebrates, yeasts and the unicellular protozoan Tetrahymena thermophila (6), is involved in rRNA processing at the 5′ end of 18S rRNA (7). Most likely, U17 functions as an RNA chaperone that safeguards the correct folding of 18S rRNA during pre-rRNA processing. Recently, systematic experimental approaches and computational screening programs for H/ACA RNAs have been developed, and numerous H/ACA RNAs have been detected in eukaryotes from yeast to human (8–15). In humans, ∼100 H/ACA RNAs have been identified, most of which are located within the introns of protein-encoding genes (16). Some H/ACA RNAs have several copies in different introns of the same gene (17,18) or within introns of different genes (19), suggesting that redundant H/ACA RNAs arose via duplication or transposition of existing H/ACA RNAs, but the ultimate origin of these RNAs remains an open question.
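
The two sequence motifs described above (the H box, ANANNA, and an ACA triplet 3 nt upstream of the 3′ end) can be checked on a candidate sequence with a few lines of Python. This is only a minimal sketch: the function names and the toy sequence in the usage note are hypothetical, and the check ignores the hairpin–hinge–hairpin–tail structure, so it does not show that an H-box match actually lies in the hinge region.

```python
import re

# H box consensus ANANNA from the text (N = any nucleotide).
# A lookahead is used so that overlapping matches are all reported.
_H_BOX = re.compile(r"(?=(A[ACGU]A[ACGU]{2}A))")


def _as_rna(seq: str) -> str:
    """Upper-case the sequence and convert DNA letters to RNA letters."""
    return seq.upper().replace("T", "U")


def has_aca_box(seq: str) -> bool:
    """True if an ACA triplet is followed by exactly 3 nt before the 3' end."""
    rna = _as_rna(seq)
    return len(rna) >= 6 and rna[-6:-3] == "ACA"


def h_box_positions(seq: str) -> list[int]:
    """0-based start positions of every ANANNA match in the sequence."""
    return [m.start() for m in _H_BOX.finditer(_as_rna(seq))]
```

For the made-up sequence `GGCCAGAUUACCGGACAUUU`, `h_box_positions` reports a single H-box candidate starting at index 4 and `has_aca_box` returns `True`.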
In humans, retrotransposons of the long interspersed element-1 (L1) family and their remnants account for ∼17% of the human genome (20,21). The enzymatic machinery of a retrotransposition-competent L1 predominantly transposes its own copies (22). However, L1s are capable of transposing other sequences, mostly Alu retroposons, but also cDNAs of different types of cellular RNAs (23–25), thus forming retrogenes or retropseudogenes. The existence of an H/ACA retrogene, i.e. a non-autonomously transcribed H/ACA RNA-related sequence, was reported previously in the mouse genome (15), but no H/ACA retrogene was characterized in humans. Here we have identified 202 novel box H/ACA RNA-related sequences in the human genome, most of which are retrogenes. Sequence analyses suggest the involvement of the L1 retroposition machinery in the formation of human H/ACA RNA retrogenes. In addition, we found that the previously reported genes encoding ACA14a, ACA37, ACA41, ACA58, ACA59a, ACA59b, ACA63, ACA66, ACA67, ACA71a, ACA98b and U109 all appear to have resulted from retrotransposition events of H/ACA RNAs, suggesting retrotransposition mechanisms have played a pivotal role in the mobility and diversification of H/ACA RNA genes.

MATERIALS AND METHODS

Computational search for H/ACA RNA-related genes in Homo sapiens

The sequences of human H/ACA sno/scaRNAs were taken from the snoRNA database. We used the megaBLAST tool on the NCBI website to find box H/ACA RNA-related genes or pseudogenes in the human genome (NCBI build 36.1). The BLAST hits kept for further analysis covered at least 60% of the corresponding mature H/ACA RNA. H/ACA RNA-related sequences found in H.sapiens were retrieved with a 600 nt extension at each extremity and then used to search for orthologs in the chimpanzee genome (Pan troglodytes; NCBI build 1.1), the mouse genome (NCBI build 36.1) and other animal databases. All H/ACA RNA-related genes or pseudogenes were mapped on the human genome using BLAT search.
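
The hit-selection rule (coverage of at least 60% of the mature RNA, combined with the identity cutoff of about 80% stated in the Results) and the 600 nt flank extension can be sketched as follows. The `Hit` record, its field names and the example coordinates are hypothetical and do not correspond to any BLAST output format; the inclusive identity cutoff is also an assumption, made because Table 1 lists entries at 80.0%.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one genomic hit of a mature H/ACA RNA query."""
    query: str        # name of the mature H/ACA RNA
    query_len: int    # length of the mature RNA (nt)
    aligned_len: int  # number of query nucleotides covered by the hit
    identity: float   # percent identity over the aligned region
    chrom: str
    start: int        # 1-based start on the chromosome
    end: int          # 1-based inclusive end


def keep_hit(hit: Hit, min_identity: float = 80.0, min_coverage: float = 0.60) -> bool:
    """Apply the identity and coverage thresholds (inclusive by assumption)."""
    coverage = hit.aligned_len / hit.query_len
    return hit.identity >= min_identity and coverage >= min_coverage


def flanked_interval(hit: Hit, chrom_len: int, flank: int = 600) -> tuple[int, int]:
    """Extend the hit by `flank` nt on each side, clipped to the chromosome."""
    return max(1, hit.start - flank), min(chrom_len, hit.end + flank)
```

A hit covering 100 of 135 query nucleotides at 87.7% identity would pass this filter, whereas one covering only 70 of 135 would be discarded regardless of its identity.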

Sequence identity analysis

Each H/ACA RNA-related gene or pseudogene was aligned with its corresponding H/ACA RNA gene sequence using Matcher, and the percentage identity relative to that gene was calculated.
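
As an illustration of the identity calculation, the sketch below computes percent identity from two already-aligned, gapped sequences. It assumes that identity is the number of matching columns divided by the alignment length; whether this matches the exact convention used for the values in Table 1 is not stated in the text, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Both inputs must be the two rows of one pairwise alignment, with '-' for gaps.
    Gap columns count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

For the toy alignment `ACGT-ACGTA` / `ACGTTACCTA`, 8 of 10 columns match, giving 80.0%.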

Detection of chimeric retrogenes

To detect possible chimeric retrogenes, the flanking regions of the H/ACA RNA-related sequences were aligned in turn with the sequences of a number of other small non-protein-coding RNA species (e.g. tRNAs, snRNAs, miRNAs and rRNAs) and then screened for repetitive elements with the RepeatMasker program.

Prediction of secondary structures of H/ACA RNA-related sequences

The secondary structures of all computationally identified H/ACA-related RNAs were derived using the mfold program (26).

RESULTS

Identification of 202 box H/ACA RNA-related genes

Using a computational, genome-wide search strategy to extract human sequences similar to known box H/ACA RNAs, we found 202 box H/ACA RNA-related sequences (Table 1) when requiring >80% sequence identity over at least 60% of the length of the corresponding RNA. The list of these sequences is appended as Supplementary data. We also searched the chimpanzee and mouse genomes and found that ∼96% of these human box H/ACA RNA-related genes exist in corresponding locations on the chimpanzee chromosomes, while the mouse shares ∼50% of these human box H/ACA RNA-related sequences (data not shown). The distribution of copy numbers among the different human box H/ACA RNA-related genes is strikingly skewed. U70 has the most copies at 21, ACA40 has the second-most at 13, while 13 H/ACA RNAs have only one related copy each, and no related gene was found for 28 H/ACA genes.
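
The copy-number distribution can be tallied directly from the copy names, since each related sequence is named after its parent RNA plus a lowercase letter (U70b, ACA40c and so on). The sketch below assumes clean names without the footnote markers that follow them in Table 1, and uses a short illustrative subset rather than the full list; the function name is hypothetical.

```python
import re
from collections import Counter


def parent_name(copy_name: str) -> str:
    """Strip the trailing lowercase copy letter, e.g. 'U70b' -> 'U70'."""
    return re.sub(r"[a-z]$", "", copy_name)


# Illustrative subset of copy names taken from Table 1 (not the full table).
names = ["U70b", "U70c", "U70d", "ACA40b", "ACA40c", "HBI-61c", "ACA3-2b", "U19-2c"]

# Number of related copies per parent H/ACA RNA in this subset.
counts = Counter(parent_name(n) for n in names)
```

Applied to all 202 names, the same tally should reproduce the figures quoted above (21 copies for U70 and 13 for ACA40).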
Table 1

Box H/ACA RNA-related genes in human

N | Name | Genomic placement | Chromosome | Chromosome start position | Identity (%)c | Type | GenBank accession no.
1ACA1baIntronic85697783694.6RetrogeneAC046176
2ACA1cIntronic220362525387.7RetrogenefAC023271
3ACA1dIntronic162425214583.9AC004125
4ACA2cIntronic21021265088.2RetrogeneAC007240
5ACA2dIntronic18451550991.4RetrogeneAL359273
6ACA3bIntergenic214217540091.0dRetrogeneAP001745
7ACA3-2baIntergenic16278641086.3RetrogeneAC005570
8ACA3-2cIntergenic128310123380.0RetrogeneAC090679
9ACA4baIntronicb219797768883.9RetrogenefAC010746
10ACA7cIntronic11390037490.0RetrogenefAC087441
11ACA7dIntronicb117364110788.5RetrogeneAP000577
12ACA7eIntergenicX1564425287.1RetrogeneAC112497
13ACA7fIntergenic85209013481.0RetrogenefAC090919
14ACA8bIntergenicX13201444895.0RetrogenefZ77249
15ACA8caIntergenic176269804695.0RetrogeneAC007448
16ACA8dIntronic64190857890.3dRetrogenefAL365205
17ACA8eIntronic63889806181.9dRetrogeneAC022402
18ACA9baIntronicbX9996426594.0RetrogenefZ95327
19ACA9caIntronic1212266720993.3RetrogenefAC117503
20ACA9dIntronic137205881985.0RetrogenefAL356754
21ACA10baIntronic23026380498.5RetrogenefAC016907
22ACA10caIntronic124602621585.8RetrogeneAC008083
23ACA12baIntronic12888876488.6RetrogenefAL645729
24ACA12caIntronic176832657080.0RetrogenefAC011120
25ACA15baIntronic76416835197.8AC073210
26ACA15caIntronic76486247497.8AC073107
27ACA15dIntronic221761739793.7dAC000094
28ACA16bIntronicX1697242486.7RetrogenefAL732371
29ACA17bIntergenic1011557020889.5RetrogeneAL592546
30ACA17cIntronic127031941687.5dAC078860
31ACA18bIntergenic153000779893.3RetrogeneAC079969
32ACA18caIntronicb317882476489.6RetrogeneAC026355
33ACA18daIntergenic57855252192.3RetrogeneAC016559
34ACA18eIntergenicX13881975589.0dRetrogenefAL590077
35ACA20baIntronicb73933512692.5RetrogeneAC092174
36ACA20caIntergenic88139172990.2RetrogenefAC104212
37ACA20dIntergenicX533816381.9RetrogeneAC095353
38ACA22baIntronic76416381298.5AC073210
39ACA22cIntronic123849981888.9RetrogeneAC121336
40ACA22dIntronic75609055288.5AC092579
41ACA25baIntergenic33205402387.4RetrogenefAC094019
42ACA25caIntergenic711500860584.4RetrogenefAC092590
43ACA25daIntergenic318036956781.4RetrogeneAC076966
44ACA26baIntergenic213140380386.9RetrogeneAY270787
45ACA27bIntronic513993943789.8RetrogenefAC005214
46ACA27cIntergenic161284384590.6RetrogenefAC092324
47ACA27dIntronic613721853087.3RetrogenefAL121933
48ACA30cIntronic177390761388.2RetrogenefAC061992
49ACA31bIntronic171950591289.3dRetrogeneAC005722
50ACA31cIntronic2485245987.7RetrogenefAC092169
51ACA32bIntergenic214183300382.7RetrogeneAP001741
52ACA32cIntergenic147673712180.0AC007375
53ACA33baIntergenic610413770384.8RetrogenefAL591387
54ACA33cIntronic213283220089.9RetrogeneAP000272
55ACA36caIntronic26960067997.0RetrogenefAC079121
56ACA36dIntronic107361062490.9eRetrogenefAL607035
57ACA37bIntronic92077693494.5RetrogenefAL445624
58ACA38baIntronic176316724884.3RetrogeneAC006534
59ACA38caIntergenic1211781062280.2RetrogenefAC087863
60ACA40baIntergenic213561066896.9RetrogenefAC020602
61ACA40cIntronic173834611393.1RetrogenefAC055866
62ACA40dIntergenic103676759593.1RetrogenefAL590730
63ACA40eaIntergenic611127644293.1RetrogenefZ84480
64ACA40faIntergenic202942778391.5RetrogenefAL031650
65ACA40gIntergenic21808534990.0RetrogenefAC079802
66ACA40hIntergenic111803276490.6dRetrogeneAL390877
67ACA40iIntergenic57421420291.8dRetrogenefAC010501
68ACA40jIntergenic813472602290.7RetrogenefAC090821
69ACA40kIntronic52400805292.3dRetrogeneAC026784
70ACA40lIntergenic63572758591.5dRetrogeneAL033519
71ACA40mIntergenicX12315927689.8dRetrogeneAL391241
72ACA40nIntergenic79938763892.6RetrogeneAC004522
73ACA41bIntronic154361674085.7RetrogenefAC090527
74ACA42bIntronic143725940785.6dRetrogenefAL136296
75ACA42cIntergenic111596601682.2RetrogenefAL592436
76ACA43bIntergenic116595681384.2dRetrogenefAP002748
77ACA43cIntergenic218050737280.5RetrogenefAC096587
78ACA45bIntronic64361985992.9RetrogenefAL355802
79ACA45caIntergenic204136660990.6RetrogenefAL021395
80ACA46bIntronic147700243082.6dRetrogeneAF111168
81ACA47bIntergenic110137129792.5RetrogenefAC093157
82ACA47cIntronic1906534492RetrogenefAL158048
83ACA47dIntergenic25355108989.9RetrogenefAC069157
84ACA47eIntergenic219725911785.1RetrogenefAC068544
85ACA47fIntronic111288294091.6dRetrogeneAC013549
86ACA48baIntergenicX346016091.9RetrogenefAC141001
87ACA48cIntergenic18729102185.9RetrogeneAP005270
88ACA48dIntergenic125554142687.5RetrogenefAC121758
89ACA48eIntergenic4110267286.7RetrogenefAC092535
90ACA48fIntergenic166678069185.9RetrogeneAC130462
91ACA48gIntronic222754195183.0RetrogeneAC073149
92ACA48hIntergenic152380518282.3RetrogeneAC044913
93ACA48iIntergenic710162407884.8dRetrogenefAF024533
94ACA48jIntergenic6134748787.8dRetrogenefBX322644
95ACA51bIntergenic21035362491.8RetrogenefAC007240
96ACA51cIntronic416962913590.3RetrogenefAC079926
97ACA53baIntergenic156336485893.8RetrogeneAC068213
98ACA53cIntergenic312991611188.3RetrogenefAC079945
99ACA57bIntergenic12864137284.7RetrogeneAC092184
100ACA57cIntronic1111663977187.5dRetrogenefAP000892
101ACA58bIntronic15400927890.6RetrogenefAL049745
102ACA58caIntronic115249882786.9RetrogeneAL590431
103ACA62baIntergenic56849239180RetrogeneAC022107
104ACA63baIntronic221849392592.8AC006549
105ACA63cIntronic1849444280RetrogenefAL096855
106ACA64baIntergenic115937762192.3RetrogeneAL591806
107ACA64caIntergenic414057697091.6RetrogenefAC097376
108ACA64dIntergenicX7999923183.9RetrogeneAL590031
109ACA64eaIntronicb161229855681.6RetrogeneAC092365
110ACA66bIntergenic117000383787.3RetrogenefAL135931
111ACA66cIntergenic172336995786.6RetrogenefAC090287
112ACA66dIntergenic157320093383.0RetrogenefAC113208
113ACA67dIntronic7602303488.3RetrogeneAC116348
114ACA68baIntronic11574124890.0RetrogenefAL121992
115ACA99bIntronic54082596380.0eRetrogeneAC008810
116E2bIntronic81888141787.0dRetrogenefAC009884
117E2cIntronic46829557887.7dRetrogenefAC079880
118E2dIntronic13554851488.6dRetrogenefAL160000
119E3baIntronic13665663491.1RetrogenefAC119675
120E3cIntergenic76442906789.7RetrogenefAC092685
121U17caIntronic20875983285.8RetrogenefAL031683
122U17dIntronic181765161986.4dRetrogenefAC103987
123U17eIntergenic185189764288.6RetrogeneAC016165
124U17fIntronic68948069089.9dRetrogeneAL160403
125U19bIntergenic26523929984.6RetrogeneAC007318
126U19-2bIntronic105128102988.0RetrogeneAL672187
127U19-2cIntergenic171541808686.9RetrogenefDQ480389
128U19-2dIntronic171402120186.9RetrogenefDQ075320
129U23baIntergenic41693146791.3RetrogenefAC006231
130U23cIntronic24996951080.0RetrogenefAC078994
131U23dIntronic12933053691.5dRetrogenefAC009533
132U23eIntergenic123112106891.5dRetrogenefAC008013
133U23fIntronic12948896888.7dRetrogenefAC006432
134U23gIntronic162336066786.3dRetrogenefAC008915
135U64bIntronic34025511282.1RetrogenefAC099331
136U64cIntergenic71270690780.0RetrogenefAC011891
137U67baIntergenic78791634392.3RetrogeneAC002069
138U67caIntronic117743724590.2RetrogenefAL512326
139U67dIntergenic82371968780.0RetrogenefAC012119
140U67eIntergenic61181803882.0RetrogeneAL022724
141U68baIntronic193779108391.1RetrogenefAC008474
142U68cIntergenic515858978382.2RetrogenefAC134043
143U68daIntergenicX2406122584.5RetrogeneAC079169
144U69bIntergenic17817362684.2RetrogeneAC008053
145U70baIntronic120021421193.5RetrogenefAC099676
146U70caIntronic221541991895.7RetrogenefAC016708
147U70daIntergenic26149788295.7RetrogenefAC016894
148U70eIntronic216525240787.7RetrogenefAL832824
149U70fIntergenic517072572992.8RetrogenefAC093246
150U70gIntergenic58771434586.3RetrogenefAC091826
151U70hIntergenic8885649589.9RetrogenefAC087763
152U70iaIntronic8497320987.0RetrogenefAC019176
153U70jIntergenic83351710593.7dRetrogenefAC013603
154U70kaIntronic911898320391.4RetrogenefAL355608
155U70lIntergenic118243016387.7RetrogenefAP000893
156U70maIntronic126730728288.4RetrogenefAC015550
157U70nIntronic127436919092.7dRetrogenefAC078820
158U70oIntergenic1212002922985.5RetrogeneAC079602
159U70pIntronic167028997189.1RetrogenefAC010653
160U70qIntronic164874305492.0dRetrogenefAC007610
161U70rIntronic172337348386.3RetrogenefAC090287
162U70sIntronic172512880191.7dRetrogenefAC023389
163U70tIntronic18301543294.8dRetrogenefAP005431
164U70uIntronic19979168299.0dRetrogenefAC008752
165U70vIntergenic213313604190.6RetrogenefAP000039
166U71eaIntergenic107979725482.5RetrogeneAC012560
167U72baIntergenic316189741591.1RetrogenefAC069224
168U72cIntronic120396697389.6RetrogenefAC119673
169U72dIntronic122243398293.3dRetrogenefAC092809
170U72eIntergenic213998546885.8RetrogenefAC016710
171U72fIntronic317397175883.6RetrogenefAC108667
172U72gIntergenic210471613783.1RetrogenefAC068057
173U72hIntergenic813251421587.3RetrogenefAC104040
174U87bIntergenic162150647490.1dRetrogenefAC005632
175U107baIntronicX5497046398.5eRetrogeneAL049732
176U107caIntronicX5182318385.5BX537154
177U107daIntronicX5195045785.5AL928717
178U107eIntergenic154329439692.9eAC051619
179U107faIntronicb107455584495.3eRetrogeneAC016394
180U107gaIntergenic149066252294.2eAC007374
181U107haIntronicX4713299290.7eRetrogenefAL591503
182U107iIntergenic146934068883.8eRetrogenefAL157789
183U107jIntergenic213475027881.4eRetrogenefAP000053
184U107kIntronic412071030286.1eRetrogenefAC080089
185U108baIntronic813124440382.2RetrogeneAC103725
186U108caIntronicb25564634385.8RetrogeneAC015982
187U109baIntronic119129303489.0RetrogenefAL136370
188U109caIntronicb18254535784.6RetrogenefAP005061
189U109dIntergenic166722229584.6AC126773
190U109eIntergenic27557024180.0AC007099
191HBI-6bIntergenic45330906087.7AC104066
192HBI-6cIntergenic15196303591.9AL050343
193HBI-6dIntronic121026552788.7RetrogenefAC092814
194HBI-6eIntergenic203977453388.7AL133229
195HBI-6fIntergenic715193735491.0dAC104843
196HBI-6gaIntergenic21014778385.4AC104794
197HBI-6hIntergenic20505006387.9RetrogenefAL121924
198HBI-6iIntergenic115442845989.6dRetrogeneAL135927
199HBI-6jIntergenic35339678087.7dRetrogeneAC112218
200HBI-6kIntergenic98906518583.8eRetrogenefAL136367
201HBI-61bIntronic213195850186.4dRetrogeneAC026776
202HBI-61cIntergenic181754595487.2eChimeraAC091038

a Retrogenes with common hairpin–hinge–hairpin–tail secondary structure.

b Retrogenes distributed on the antisense orientation of protein-coding genes.

c Identity to the corresponding consensus sequence.

d 5′-truncated box H/ACA RNA-related sequences.

e 3′-truncated, or 3′ sequences different from the corresponding consensus sequences.

f Retrogenes with poly(A) tails at their 3′ ends.

These box H/ACA RNA-related sequences are not uniformly distributed on human chromosomes. There are 22 and 24 copies on chromosomes 1 and 2, respectively; however, no copy was found on chromosome Y and only two copies were found on chromosome 22, while chromosomes 5, 6, 7, 12, 17, 8 and X had a relative excess density of box H/ACA RNA-related genes. Of the 202 box H/ACA RNA-related genes found in the human genome, 99 (49%) are located in intronic regions of protein-coding genes. Interestingly, eight of them lie in the antisense orientation relative to their host genes (Table 1). There were no significant differences in sequence identity or sequence length between box H/ACA RNA-related genes located in introns and those located in intergenic regions (data not shown).

Most of the box H/ACA RNA-related genes are retrogenes

Through careful analysis of the upstream and downstream regions of these H/ACA snoRNA-related sequences, we found that, of the 202 box H/ACA RNA-related genes identified in this work, 182 (90%) probably correspond to H/ACA retrogenes (Table 1). All these retrogenes were flanked by direct repeats (target site duplications, TSDs) of 7–17 nt, and most of them contained poly(A) tails at their 3′ ends (Figure 1). Figure 1A shows a characteristic retrogene with a 3′ poly(A) tail and TSDs. In some cases, the H/ACA RNAs, each along with their original 5′- or 3′-flanking sequences, retrotransposed into a new location on the same or a different chromosome (Figure 1B and C), suggesting that these H/ACA retrogenes derived from relatively stable processing intermediates of H/ACA RNA biogenesis. However, some H/ACA RNA retrogenes originated when partially processed, exon-containing hnRNAs were reverse transcribed and inserted at new locations in the genome (Figure 1D and E). For example, the ACA40 gene is hosted in the sixth intron of the hypothetical protein gene MGC5306, and a fragment of the MGC5306 gene including the host intron of ACA40 together with all 3′ exons retrotransposed independently into chromosome 2 (ACA40b), chromosome 17 (ACA40c), chromosome 10 (ACA40d), chromosome 6 (ACA40e), chromosome 5 (ACA40i), chromosome 8 (ACA40j) and chromosome 5 (ACA40k).
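
The two retrogene hallmarks used here, flanking direct repeats of 7–17 nt and a 3′ poly(A) tail, can be screened for with a simple sketch like the one below. It is not the procedure used in this work, which is not described at that level of detail: it assumes exact repeats only (real TSDs may carry mismatches), caps the reported repeat at 17 nt, and uses an arbitrary minimum poly(A) length of 8 nt. All names and test sequences are hypothetical.

```python
import re
from typing import Optional


def find_tsd(upstream: str, downstream: str,
             min_len: int = 7, max_len: int = 17) -> Optional[str]:
    """Return the longest exact repeat (min_len..max_len nt) shared by both flanks.

    `upstream` is the flank 5' of the insert, `downstream` the flank 3' of it.
    Returns None when no shared repeat of at least min_len nt exists.
    """
    up, down = upstream.upper(), downstream.upper()
    longest = min(max_len, len(up), len(down))
    for k in range(longest, min_len - 1, -1):      # try the longest k-mers first
        for i in range(len(up) - k + 1):
            kmer = up[i:i + k]
            if kmer in down:
                return kmer
    return None


def polya_tail_length(downstream: str, min_len: int = 8) -> int:
    """Length of the A-run at the very start of the 3' flank (0 if below min_len)."""
    m = re.match(r"A+", downstream.upper())
    n = len(m.group(0)) if m else 0
    return n if n >= min_len else 0
```

With a 12 nt repeat placed at the end of the 5′ flank and again after a 12 nt A-run in the 3′ flank, `find_tsd` returns the repeat and `polya_tail_length` returns 12.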
Figure 1

Schematic representation of box H/ACA RNA retrogene examples. (A) The sequence below the scheme is retrogene U64b; 55 retrogenes belong to this type. (B) The sequence below the scheme is retrogene ACA10b with a number of retroposed nucleotides on the 5′ flank; 5 retrogenes belong to this type. (C) The sequence below the scheme is retrogene ACA64c with a number of retroposed nucleotides on the 3′ flank; 24 retrogenes belong to this type. (D) The sequence below the scheme is retrogene U70m with a number of retroposed nucleotides on the 3′ flank; 25 retrogenes are similar to this case. (E) The sequence below the scheme is retrogene ACA40j with a number of retroposed nucleotides on the 3′ flank; 12 retrogenes are similar to this case. The exon-derived sequences in (D) and (E) are shown in capital letters. (F) The sequence below the scheme is retrogene ACA7d; 6 retrogenes belong to this type. (G) The sequence below the scheme is retrogene ACA53c with a number of retroposed nucleotides on the 3′ flank; 7 retrogenes belong to this type. (H) The sequence below the scheme is retrogene ACA36d with a number of retroposed nucleotides on the 3′ flank; 4 retrogenes belong to this type. (I) The sequence below the scheme is retrogene ACA18e with a number of retroposed nucleotides on the 3′ flank; 6 retrogenes belong to this type. (J) The sequence below the scheme is retrogene HBI-61c; 1 retrogene belongs to this type. In all cases, the H/ACA RNA sequences are in italics, retroposed nucleotides on the 3′ or 5′ flanks are in lower case, Alu sequences are shaded, and poly(A) tails and TSDs are in open and closed boxes, respectively. The L1 consensus recognition site (TTAAAA) is indicated at the 5′ end and overlaid by a black bar in the examples.

Most of the retrogenes harbored at their 5′ ends either a T2A4 hexanucleotide, which is preferentially recognized by the L1 nicking endonuclease, or a derivative of it with one or two single-nucleotide substitutions (Figure 1A–E). These features suggest the involvement of the L1 retroposition machinery in the formation of the H/ACA retrogenes. Notably, 39 (19%) of the H/ACA RNA-related retrogenes were shortened at their 5′ end (Table 1), presumably because of premature termination of the reverse transcription step. However, a few H/ACA RNA-related retrogenes lack a satisfactory L1 signature: they lack either a poly(A) tail (Figure 1F) or a T2A4 target site overlapping a TSD (Figure 1G). The existence of tailless retrogenes was reported recently (27), suggesting a variant mechanism for the biogenesis of retrosequences. Closer inspection of the H/ACA snoRNA-related retrogenes and their flanking sequences revealed that, in some cases, the retrogene had been disrupted by the independent integration of an Alu element (Figure 1H). In these cases, virtual removal of the Alu insertion revealed a ‘repaired’ retrogene. In other cases, an Alu sequence was inserted between the H/ACA RNA retrogene and the 3′ TSD (Figure 1I). This suggests that at these sites the H/ACA RNAs were inserted before the integration of the Alu elements. Interestingly, one chimeric retrogene, composed of an H/ACA sequence fused at its 3′ terminus to an Alu element, was found (Figure 1J); it was probably formed by template switching (28) from Alu RNA to H/ACA RNA during reverse transcription, after which the fused transcript was integrated into the human genome. A number of retrogenes have been reported to result from template switching, including those containing U6, 5S rRNA or 7SL RNA fused at their 3′ termini with Alu elements (24).
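
The criterion of a T2A4 hexanucleotide (TTAAAA) or a derivative with one or two substitutions amounts to a Hamming-distance test against the consensus. The sketch below illustrates that test only; the function names are hypothetical, and locating the hexanucleotide relative to the 5′ TSD is left out.

```python
L1_SITE = "TTAAAA"  # L1 consensus recognition site (T2A4) given in the text


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify_l1_site(hexamer: str, max_subs: int = 2) -> str:
    """Label a hexanucleotide as consensus, derivative, or not L1-like."""
    d = hamming(hexamer.upper(), L1_SITE)
    if d == 0:
        return "consensus"
    if d <= max_subs:
        return f"derivative ({d} substitution{'s' if d > 1 else ''})"
    return "no L1-like site"
```

For example, `TTAAGA` differs from the consensus at one position and is labeled a derivative, while `GCCGGC` differs at all six and is rejected.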

Some previously identified snoRNAs resulted from retrotransposition

Closer analysis of the upstream and downstream regions of previously identified snoRNAs showed that ACA14a, ACA37, ACA41, ACA58, ACA59, ACA59b, ACA63, ACA66, ACA67, U71a, ACA98b and U109 are encoded by retrogenes (Figure 2). These box H/ACA RNAs were cloned from a HeLa cell extract immunoprecipitated with an anti-GAR1 antibody (18), or their expression was verified by Northern blot and primer extension (8,13,15). Clearly, these snoRNAs were formed by retrotransposition in the course of primate evolution. For example, the data obtained in this study suggest that the ACA63 gene originated by retroposition of the ACA63b copy. First, ACA63b is found in corresponding locations in the human, chimpanzee and mouse genomes. Second, the human and chimpanzee ATP2B4 and RERE genes encode ACA63 and another retrogene, ACA63c, in their introns, respectively, while the homologous mouse genes are devoid of any ACA63-like sequence (Figure 3). Furthermore, comparison and alignment of the two loci ACA63/ACA63b from all available primate sequences revealed that the Otolemur garnettii ACA63 locus shows a clean absence of ACA63 along with its retroposed 3′- and 5′-flanking nucleotides (Supplementary Figure 1a). This evidence indicates that the human ACA63b identified in this work is an evolutionarily conserved snoRNA widely present in vertebrates, and that retrotransposition of ACA63b occurred in primates after the rodent/primate divergence. Interestingly, there are 4 ACA63c copies with obvious TSDs in the chimpanzee RERE gene, which probably resulted from a single retroposition event into this gene followed by local segmental duplications.
Figure 2

Some previously reported H/ACA snoRNA genes with retrogene hallmarks. Schematic representation of the H/ACA RNA sequences. poly(A) and TSD are in open and closed boxes, respectively. The L1 consensus recognition site (TTAAAA) is indicated at the 5′ end.

Figure 3

Amplification of ACA63b snoRNA in primate. ACA63 sequence (small arrow) is located within an intron of the orthologous host genes. Additional copies (ACA63 and ACA63c) were generated in the primate lineage. Exons are represented by boxes. The cartoon is not drawn to scale. ATP2B4: ATPase, Ca++ transporting, plasma membrane 4. HTF9C: HpaII tiny fragments locus 9C. RANBP1: RAN binding protein 1. ZDHHC8: zinc finger, DHHC-type containing 8. RERE: arginine-glutamic acid dipeptide (RE) repeats.

In vertebrates, sequences encoding H/ACA RNAs are generally located in introns of their host genes, in the same orientation. So far, in vertebrates, an intron has been found to carry only one snoRNA gene, but a host gene can carry several different snoRNA genes in different introns (16). The evolutionary analysis of H/ACA RNA genes within the introns of orthologous genes in six vertebrate species showed that a number of snoRNA genes in different introns of a host gene probably resulted from retrotransposition. For example, the H.sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus and Canis familiaris EIF4A2 gene orthologs each host three snoRNA genes, HBI-61, E3 and ACA4, in different introns, while the Gallus gallus ortholog contains only HBI-61 (Figure 4A). Similarly, the RPSA genes in all the aforementioned mammals host two H/ACA genes, E2 and ACA6, in different introns; however, the G.gallus ortholog is devoid of snoRNAs (Figure 4A). Notably, human and chimpanzee ACA4, E2 and E3 are flanked by TSDs of >10 nt (data not shown). Although these TSDs carry a few nucleotide changes, the ancestral state of one of them is present at the ACA4 locus of the tenrec, Echinops telfairi (Figure 4B), suggesting that ACA4 and E3 in EIF4A2 and E2 in RPSA arose in mammals by retroposition after the mammal/aves divergence. In addition, some host genes carry several paralogous snoRNA genes in different introns, such as the TBRG4 gene (Figure 4A). The amplification of ACA5 in this host gene most likely did not occur via retroposition, because insertions of retroposed sequences are virtually random and should not lead to accumulations in neighboring introns (11).
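
The presence/absence comparison described above can be written down as a small lookup, using only the host genes, species and snoRNAs named in the text. The sketch lists the snoRNAs shared by all five mammals but absent from the chicken ortholog. Absence in one outgroup is compatible with either gain in mammals or loss in birds, so this is a candidate list only; the retroposition argument in the text also rests on the flanking TSDs. Variable and function names are hypothetical.

```python
MAMMALS = [
    "Homo sapiens", "Pan troglodytes", "Mus musculus",
    "Rattus norvegicus", "Canis familiaris",
]
CHICKEN = "Gallus gallus"

# snoRNA genes hosted in introns of each host gene, per species (from the text).
HOSTED = {
    "EIF4A2": {**{sp: {"HBI-61", "E3", "ACA4"} for sp in MAMMALS}, CHICKEN: {"HBI-61"}},
    "RPSA": {**{sp: {"E2", "ACA6"} for sp in MAMMALS}, CHICKEN: set()},
}


def mammal_only(host: str) -> set[str]:
    """snoRNAs present in every listed mammal but absent from the chicken ortholog."""
    by_species = HOSTED[host]
    shared = set.intersection(*(by_species[sp] for sp in MAMMALS))
    return shared - by_species[CHICKEN]
```

For EIF4A2 this yields E3 and ACA4, and for RPSA it yields E2 and ACA6; the text draws a retroposition conclusion only for ACA4, E3 and E2, for which TSDs are reported.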
Figure 4

Phylogenetic analysis of some H/ACA RNA genes. (A) Presence/absence of H/ACA RNA genes within the introns of orthologous host genes in six vertebrates. Each snoRNA sequence (small arrow) is located within an intron of the indicated genes. Exons are represented by boxes. The cartoon is not drawn to scale. EIF4A2: eukaryotic translation initiation factor 4A, isoform 2. RPSA: ribosomal protein SA. TBRG4: transforming growth factor beta regulator 4. (B) Retrogene ACA4 in Echinops telfairi. H/ACA RNA sequences are in italics; poly(A) tails and TSDs are in open and closed boxes, respectively.


Structures and expression of box H/ACA-related RNAs

To date, more than 100 H/ACA RNAs have been found in H.sapiens (16). In this study, we found that at least two-thirds of these human H/ACA RNA genes have one or more related copies (Table 1). Remarkably, U70 has 21 related copies, including six truncated sequences, and another snoRNA gene, U40, has 13 related copies, six of them truncated. Alignments of these novel H/ACA RNA-related sequences with their corresponding previously reported genes revealed numerous sequence changes, including small insertions or deletions, which occurred frequently in less important regions and occasionally in conserved elements such as box H and box ACA. Despite this sequence variation, 64 of the 202 box H/ACA RNA-related sequences can be folded into the typical secondary structure of the box H/ACA RNA family, i.e. the hairpin–hinge–hairpin–tail structure (Supplementary Figure 2). Of these, 30 were recognized as functional homologs of their corresponding previously reported box H/ACA RNAs on the basis of the relationship between snoRNA structure and function, while the remainder showed no complementarity to either rRNAs or snRNAs, owing to sequence diversification, and were therefore classified as orphan H/ACA RNAs. Retroposition thus generated additional copies of most box H/ACA RNA genes, and quite a number of these copies might be functional. Because of cross-hybridization in northern blot analysis, it could not be assessed whether all 64 box H/ACA RNA-related sequences with typical features of the box H/ACA RNA family are indeed expressed in human tissues. Therefore, we performed BLAST searches of all 64 sequences against EST databases: corresponding ESTs were detected for 11 of them, and 5 were shown to be expressed in more than one human tissue (Table 2). Of course, identification of ESTs does not necessarily indicate the presence of processed and functional snoRNAs. 
Notably, U107f is located in an intron of the protein-coding gene for nudix (nucleoside diphosphate linked moiety X)-type motif 13 but is expressed from the opposite strand (Figure 5), and EST database searches revealed that it is expressed in liver and spleen, and even in melanotic melanoma (Table 2). It is not clear whether U107f has a functional role as an antisense regulator of the expression of the protein-coding gene.
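The conserved sequence elements mentioned above can be screened for with a short Python sketch. It relies only on the motif definitions given in the Introduction (an H box matching ANANNA and an ACA triplet 3 nt upstream of the 3′ end). The function names and the test sequence are hypothetical, and the sketch neither predicts the hairpin–hinge–hairpin–tail fold nor checks that the H box lies in the hinge, so it is not a substitute for the structural analysis reported here.

```python
import re

# Lookahead so that overlapping ANANNA candidates are all reported.
H_BOX = re.compile(r"(?=A[ACGU]A[ACGU]{2}A)")


def to_rna(seq: str) -> str:
    """Normalise a DNA or RNA string to upper-case RNA letters."""
    return seq.upper().replace("T", "U")


def h_box_positions(seq: str) -> list:
    """0-based start positions of every ANANNA match in the sequence."""
    return [m.start() for m in H_BOX.finditer(to_rna(seq))]


def has_aca_box(seq: str) -> bool:
    """True if an ACA triplet sits exactly 3 nt upstream of the 3' end."""
    rna = to_rna(seq)
    return len(rna) >= 6 and rna[-6:-3] == "ACA"


# Hypothetical toy sequence: H-box candidate AGAGCA at position 4, ACA box at the 3' end.
toy = "GGGCAGAGCACCCCACAUGU"
h_hits = h_box_positions(toy)
aca_ok = has_aca_box(toy)
```

For the toy sequence, `h_hits` is `[4]` and `aca_ok` is `True`. A candidate passing both checks would still need folding (for example with a tool such as Mfold) before it could be called a typical box H/ACA RNA.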
Table 2

Box H/ACA RNA-related genes expressed in human tissues detected in EST databases

| Name | GenBank accession no. | EST | Tissue |
|---|---|---|---|
| ACA12b | AL645729 | BQ708140 | Spleen |
| ACA15b | AC073107 | DB218848 | Trachea |
| ACA15c | AC073107 | DB218848 | Trachea |
| ACA58c | AL590431 | DW429803 | Liver |
| ACA63b | AC006549 | CN275435 | Embryonic stem cell, retinoic acid and mitogen-treated hES cell line H7 |
| | | AK097659 | Testis |
| | | AK094410 | Cerebellum |
| ACA64c | AC097376 | DA572426 | Whole embryo, mainly body |
| U68b | AC008474 | BQ423961 | Retinoblastoma |
| | | BE672593 | Lung carcinoid |
| U107b | AL049732 | DB287809 | Uterus |
| | | CN267974 | Embryonic stem cells, cell lines H1, H7 and H9 |
| U107c | AL928717 | H08107 | Infant brain |
| U107d | AL928717 | H08107 | Infant brain |
| | | AK094541 | Amygdala |
| | | CN389247 | Embryonic stem cells |
| U107f | AC016394 | CB162932 | Liver |
| | | BQ224195 | Melanotic melanoma |
| | | BX096147 | Liver and spleen |
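The counts quoted in the text (11 sequences with EST support, 5 of them detected in more than one tissue) can be re-derived from Table 2 with a small Python tally. The dictionary below simply transcribes the table. Treating each EST row as a separate tissue annotation is an assumption of this sketch, and the variable names are illustrative.

```python
# Table 2 transcribed as: name -> list of (EST accession, tissue annotation).
est_hits = {
    "ACA12b": [("BQ708140", "Spleen")],
    "ACA15b": [("DB218848", "Trachea")],
    "ACA15c": [("DB218848", "Trachea")],
    "ACA58c": [("DW429803", "Liver")],
    "ACA63b": [("CN275435", "Embryonic stem cell"),
               ("AK097659", "Testis"),
               ("AK094410", "Cerebellum")],
    "ACA64c": [("DA572426", "Whole embryo")],
    "U68b": [("BQ423961", "Retinoblastoma"),
             ("BE672593", "Lung carcinoid")],
    "U107b": [("DB287809", "Uterus"),
              ("CN267974", "Embryonic stem cells")],
    "U107c": [("H08107", "Infant brain")],
    "U107d": [("H08107", "Infant brain"),
              ("AK094541", "Amygdala"),
              ("CN389247", "Embryonic stem cells")],
    "U107f": [("CB162932", "Liver"),
              ("BQ224195", "Melanotic melanoma"),
              ("BX096147", "Liver and spleen")],
}

# Sequences with at least one EST hit.
n_with_est = len(est_hits)

# Sequences whose hits carry more than one distinct tissue annotation.
multi_tissue = sorted(
    name for name, hits in est_hits.items()
    if len({tissue for _, tissue in hits}) > 1
)
```

Under these assumptions `n_with_est` is 11 and `multi_tissue` lists ACA63b, U107b, U107d, U107f and U68b, in agreement with the figures given in the text.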
Figure 5

Genomic location of U107f in H.sapiens. SnoRNA genes are shown by black arrows, protein-coding genes by non-filled and gray arrows (not drawn to scale). The length of intergenic spacers is also indicated.


DISCUSSION

We have identified in the human genome databases 202 novel box H/ACA RNA-related sequences that are 0–20% diverged from their corresponding previously reported genes and belong to 61 box H/ACA RNA types (Table 1), showing that most human box H/ACA RNAs have multiple copies. In Arabidopsis and rice, many snoRNAs are found in multiple copies that result mainly from two mechanisms: large chromosomal duplications and small tandem duplications producing polycistronic genes (29). In contrast, multiple human box H/ACA copies result mainly from retroposition. Of the 202 box H/ACA RNA-related sequences identified in this work, 182 have the typical structure of a retrogene, and the number of H/ACA retrogenes is probably underestimated, inasmuch as retrogenes >20% diverged from their corresponding genes are not included in our analysis. The chimpanzee and human genomes share ∼96% of box H/ACA RNA-related sequences at identical locations; only ∼4% are thus hominin-specific, having arisen in our genome since the divergence from chimpanzee. In contrast, the mouse genome contains only ∼50% of the box H/ACA RNA-related sequences found in human, and some of these lie in different genomic regions, suggesting that most of the H/ACA RNA-related sequences in primates arose after the rodent/primate divergence. To elucidate the mechanism of H/ACA snoRNA propagation in primates, we analyzed all ape-specific events (those duplicated in human and chimpanzee but not in rhesus monkey) using presence/absence patterns. Of the nine ape-specific events (ACA1b, ACA10b, ACA40g, ACA40n, ACA43b, ACA51b, ACA57b, ACA64c and U67c), all but one originated from retroposition (Supplementary Figure 1C), suggesting that most duplications of H/ACA snoRNAs in primates are bona fide retroposition events. In addition, retroposition of different H/ACA RNAs occurred at different stages of primate evolution (Supplementary Figure 1). 
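The presence/absence reasoning described above can be sketched in Python as follows. The classification rule for ape-specific events (present in human and chimpanzee, absent from rhesus monkey) is taken from the text. The function name, the species labels, the category names and the third example pattern are assumptions introduced for illustration only, and the sketch considers just these three primates.

```python
def classify_event(present_in) -> str:
    """Classify a duplication by the primates carrying it at the orthologous locus.

    `present_in` is an iterable of species labels drawn from
    "human", "chimpanzee" and "rhesus" (labels are illustrative).
    """
    species = set(present_in)
    if "human" not in species:
        return "not in human"
    if species == {"human"}:
        return "human-specific"
    if "chimpanzee" in species and "rhesus" not in species:
        return "ape-specific"
    return "shared with rhesus or older"


# Patterns consistent with the text for ACA59b (human-specific) and ACA1b
# (ape-specific); the third entry is a hypothetical older copy.
patterns = {
    "ACA59b": {"human"},
    "ACA1b": {"human", "chimpanzee"},
    "hypothetical_old_copy": {"human", "chimpanzee", "rhesus"},
}
labels = {name: classify_event(sp) for name, sp in patterns.items()}
```

With these inputs, `labels` assigns "human-specific" to ACA59b, "ape-specific" to ACA1b and "shared with rhesus or older" to the hypothetical older copy. Deciding whether an event arose by retroposition additionally requires inspecting the locus for hallmarks such as poly(A) tails and TSDs, which this sketch does not attempt.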
Notably, the sequence of the human-specific retrogene ACA59b is identical to that of ACA59, pointing to a very recent origin of ACA59b and suggesting that retrotransposition of snoRNAs continues to the present day in the human lineage. Multiple studies have suggested a high rate of retroposition in the primate and rodent lineages (30–32), probably driven by the activity of L1 retrotransposable elements (33). Our results also show the involvement of the L1 retroposition machinery in the formation of human H/ACA retrogenes. Retroposition was commonly thought to generate nonfunctional gene copies (retropseudogenes) which, in the case of protein-coding genes, accumulate disablements such as premature stop codons and frameshift mutations (34), because the copied mRNA generally lacks regulatory elements. However, Brosius (35,36) predicted that retrogenes can insert next to resident promoter/enhancer elements and thus escape transcriptional silencing. Indeed, it has recently been shown that retroposition has generated a significant number of new functional genes (retrogenes) in mammalian genomes (37,38). Similarly, some of the retrogenes derived from H/ACA RNAs appear to be functional genes. First, nearly 50% of the H/ACA retrogenes found in this work are intronic, encoded within protein-coding genes. Like previously identified intronic snoRNAs (39–41), intronic retrogenes can be co-transcribed with their host genes and then released from excised, debranched introns by exonucleolytic trimming. Furthermore, unlike retrocopies of protein-coding genes, snoRNA retrogenes are non-coding and so are not disabled by premature stop codons or frameshift mutations. Importantly, some snoRNA retrogenes, even when located in the antisense orientation to their host gene (U107f) or in an intergenic region (ACA64c), have the typical H/ACA RNA structure and can be expressed in human tissues. 
In addition, retroposition generated multiple copies of some H/ACA genes, and the process may thereby have provided abundant raw material for the formation of new genes. It therefore appears that retroposition is one route to the formation of novel snoRNA genes. In line with this notion, some previously reported box H/ACA RNA genes apparently resulted from retrotransposition of different box H/ACA RNAs (Figures 2–4).

SUPPLEMENTARY DATA

Supplementary data are available at NAR online.