Computational identification of microRNA gene loci and precursor microRNA sequences in CHO cell lines.

Matthias Hackl¹, Vaibhav Jadhav, Tobias Jakobi, Oliver Rupp, Karina Brinkrolf, Alexander Goesmann, Alfred Pühler, Thomas Noll, Nicole Borth, Johannes Grillari.

Abstract

MicroRNAs (miRNAs) have recently entered Chinese hamster ovary (CHO) cell culture technology, due to their strong impact on the regulation of cellular phenotypes. Envisioned applications of miRNAs range from biomarkers of favorable phenotypes to cell engineering targets. These applications, however, require a profound knowledge of miRNA sequences and their genomic organization, which exceeds the currently available information of ~400 conserved mature CHO miRNA sequences. Based on these recently published sequences and two independent CHO-K1 genome assemblies, this publication describes the computational identification of CHO miRNA genomic loci. Using BLAST alignment, 415 previously reported CHO miRNAs were mapped to the reference genomes and subsequently assigned to distinct genomic miRNA loci. Sequences of the respective precursor-miRNAs were extracted from both reference genomes, folded in silico to verify correct structures, and cross-compared. In the end, 212 genomic loci and pre-miRNA sequences representing 319 expressed mature miRNAs (approximately 50% of miRNAs represented matching pairs of 5' and 3' miRNAs) were submitted to the miRBase miRNA repository. As a proof-of-principle for the usability of the published genomic loci, four likely polycistronic miRNA clusters were chosen for PCR amplification using CHO-K1 and DHFR (-) genomic DNA. Overall, these data on the genomic context of miRNA expression in CHO will simplify the development of tools employing stable overexpression or deletion of miRNAs, allow the identification of miRNA promoters, and improve detection methods such as microarrays.

Substances:
MicroRNAs
RNA Precursors

Year: 2012 PMID: 22306111 PMCID: PMC3314935 DOI: 10.1016/j.jbiotec.2012.01.019

Source DB: PubMed Journal: J Biotechnol ISSN: 0168-1656 Impact factor: 3.307

Chinese hamster ovary (CHO) cells are currently the first-choice mammalian cell line for the production of complex therapeutic proteins requiring proper folding and post-translational modifications, creating an annual revenue exceeding 100 billion USD (Mudhar, 2006). With the publication of a CHO-K1 draft genome (Xu et al., 2011), as well as thorough analysis of the CHO mRNA transcriptome (Becker et al., 2011), the basis for genomic characterization of CHO cells has been set and will allow the development of novel tools to rationally design CHO cells as bioindustrial workhorses. In this context, microRNAs (miRNAs) have been discussed as promising tools for CHO cell characterization as well as engineering (Barron et al., 2011). This family of small non-coding RNAs, which by now encompasses more than 1000 sequences for mouse and human (Griffiths-Jones, 2010), negatively regulates gene expression through post-transcriptional repression of mRNA translation (Hüttenhofer and Schattner, 2006). The ∼22 nt long mature miRNAs that mediate this repression are the result of enzymatic processing of a primary miRNA transcript produced by RNA polymerase II: in the nucleus, the RNase III Drosha together with Dgcr8 cleaves out a ∼70 nt long single-stranded RNA referred to as precursor miRNA (pre-miR) or miRNA hairpin/stem-loop due to its characteristic secondary structure (Gregory et al., 2004). Pre-miRNAs are exported into the cytoplasm, where cleavage of the loop by the RNase Dicer generates a duplex of two ∼22 nt long mature miRNAs (Takeshita et al., 2007). The partial sequence complementarity underlying the miRNA:mRNA interaction allows single miRNAs to bind up to 100 distinct mRNAs (Selbach et al., 2008), thus potentially orchestrating the expression of whole gene networks, similar to transcription factors.
This range in target regulation achieved by individual miRNAs is mirrored in their biological relevance, which includes control of cellular proliferation and energy metabolism as well as stress resistance and cell death (Müller et al., 2008). Two studies have so far addressed the identification and annotation of CHO miRNAs and independently reported the expression of 350 (Johnson et al., 2011) and 365 (Hackl et al., 2011) mature miRNAs, but neither identified the respective genomic loci or pre-miRNA sequences. This information is, however, necessary for (i) mimicking endogenous miRNA expression, since pre-miRNA secondary structures can be targets for the regulation of miRNA stability (Michlewski et al., 2008), (ii) understanding transcriptional regulation of specific miRNAs, and (iii) phylogenetic analyses. Based on the alignment of a combined set of previously reported mature miRNA sequences against two independent CHO-K1 genome reference sequences, we here report the identification of miRNA gene loci and extraction of the respective pre-miRNA sequences from both genomic references, followed by cross-comparison of the derived sequences (Fig. 1). In detail, the employed strategy used two public datasets containing sequences of mature CHO miRNAs with expression levels detectable by next-generation sequencing (Johnson et al., 2011; Hackl et al., 2011). Both datasets were downloaded, reduced by removing redundant isomiR sequences as well as recently reported non-coding RNAs in miRBase version 18.0 (Griffiths-Jones, 2010), and then merged into one dataset containing 415 miRNAs, of which 22 were putative novel miRNA sequences. These sequences were then used as the “query” (given in Supplementary Data 1) for BLAST alignment against two distinct CHO genome references using blastn with a nucleotide mismatch penalty of −2 and a nucleotide match reward of +1.
The first reference was the recently published CHO-K1 sequence (Xu et al., 2011), hereafter referred to as “K1-P” (for “public”), and the second was a low-coverage, so far unpublished CHO-K1 genome assembly by Bielefeld University and BOKU University, referred to as “K1-BB” (Table 1). In brief, the K1-BB genome (ATCC CCL-61) was sequenced on an Illumina Genome Analyzer IIx in a 2 × 125 bp sequencing run on six lanes according to the manufacturer's manuals. Sequencing resulted in 411 million reads and 51 Gbp, which corresponds to an estimated genome coverage of 17-fold assuming a genome size of 3 Gbp. The sequence data were assembled with velvet 1.0.4, resulting in 11.4 million contigs that can be downloaded at ftp://ftp.cebitec.uni-bielefeld.de/pub/supplements/2011/Hackl_JBiotech/.
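The coverage figure quoted for K1-BB can be checked with a short calculation. The sketch below uses only the numbers given in the text (411 million reads, 125 bp read length, 3 Gbp genome size); the variable names are illustrative and not taken from the original work.

```python
# Back-of-the-envelope coverage estimate for the K1-BB sequencing run.
reads = 411e6           # number of reads reported in the text
read_length_bp = 125    # read length of the 2 x 125 bp run
genome_size_bp = 3e9    # genome size assumed in the text

total_bases = reads * read_length_bp
coverage = total_bases / genome_size_bp

print(f"{total_bases / 1e9:.1f} Gbp sequenced, ~{coverage:.1f}-fold coverage")
```

The result agrees with the roughly 51 Gbp and 17-fold coverage stated above and with the 17.1-fold value listed for K1-BB in Table 1.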

Fig. 1

Strategy for identification of CHO pre-miR sequences from genomic references. (a) Schematic outline of identification strategy. (b) Flow chart illustrating the sequence identification strategy in detail: all currently published CHO mature miRNA sequences were BLAST-aligned to two independent CHO-K1 genomic reference sequence assemblies (K1-P and K1-BB). BLAST results were filtered for alignments with zero mismatches (100% identity) and alignment lengths equal to the mature miRNA length (100% length). Additionally, miRNAs mapping to genomic repeat regions were removed. From the remaining genomic loci, the respective pre-miRNA sequences were extracted independently from both genomic references and cross-checked.
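The filtering step in the flow chart (zero mismatches, alignment length equal to the mature miRNA length) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLAST tabular output with the default 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and the function name, miRNA names and example lines are hypothetical.

```python
from collections import defaultdict


def perfect_hits(blast_lines, query_lengths):
    """Keep hits with zero mismatches, zero gaps and full query length.

    blast_lines: iterable of tab-separated BLAST lines (default 12 columns).
    query_lengths: dict mapping query id -> mature miRNA length.
    Returns a dict: query id -> list of (subject id, start, end, strand).
    """
    kept = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        qid, sid = f[0], f[1]
        aln_len, mismatches, gap_opens = int(f[3]), int(f[4]), int(f[5])
        sstart, send = int(f[8]), int(f[9])
        if mismatches == 0 and gap_opens == 0 and aln_len == query_lengths[qid]:
            # BLAST reports reverse-strand hits with sstart > send.
            strand = "+" if sstart <= send else "-"
            kept[qid].append((sid, min(sstart, send), max(sstart, send), strand))
    return dict(kept)


# Hypothetical example lines, for illustration only.
example = [
    "miR-a\tcontig1\t100.000\t22\t0\t0\t1\t22\t500\t521\t1e-05\t44.1",
    "miR-a\tcontig2\t95.455\t22\t1\t0\t1\t22\t90\t69\t0.01\t36.2",
    "miR-b\tcontig3\t100.000\t20\t0\t0\t1\t20\t40\t21\t1e-04\t40.1",
]
lengths = {"miR-a": 22, "miR-b": 23}
hits = perfect_hits(example, lengths)
print(hits)
```

In this toy input only the first line survives: the second has a mismatch, and the third covers 20 of the 23 bases of its query.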

Table 1

Genome references for identification of CHO pre-miR sequences.

	K1-P	K1-BB
Genome size (Gbp)	2.40	2.98
Contigs	109,151	11,400,490
Average contig length (bp)	21,986	261
Median contig length (bp)	503	124.5
Coverage (fold)	95	17.1

Following filtering of BLAST alignments (Fig. 1), a total of 365 out of 415 distinct mature miRNAs could be mapped to at least one genomic reference. In detail, 353 distinct mature miRNAs gave a perfect BLAST hit against the K1-P reference, while 330 miRNAs could be aligned to the K1-BB reference, with an overlap of 318 miRNAs, shown as a Venn diagram in Fig. 2a (Hulsen et al., 2008). While the majority of miRNAs exhibited a single exact match in the reference genome, some miRNAs exhibited two or more exact matches (Fig. 2b). This might have biological reasons, since duplications of miRNA genes are known to result in 100% identical paralogous sequences present in other parts of the genome (Gardner et al., 2009). Alternatively, the observed multiple hits could be a consequence of incomplete assembly of the genomic references. This would explain the reduction in the fraction of aligned miRNAs with multiple perfect matches from 28% in the incompletely assembled 2.9 Gbp K1-BB genome to 16% in the almost completely assembled 2.45 Gbp K1-P genome (Fig. 2b). Nevertheless, 15 miRNAs exhibited more than 10 and up to 250 perfect matches (Supplementary Table 1), which indicates that these are repeat-derived small RNAs rather than canonical miRNAs. Hence, these miRNAs were removed from the BLAST results and considered neither for further analysis nor for submission to miRBase.
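The counts in this paragraph follow from simple set arithmetic, and the repeat filter amounts to a threshold on the number of perfect matches per miRNA. The sketch below reproduces the arithmetic from the reported numbers and shows one possible form of the filter; the function name, the toy hit counts and the miRNA names are hypothetical.

```python
# Counts reported in the text for the two genomic references.
k1_p, k1_bb, both = 353, 330, 318

mapped_to_either = k1_p + k1_bb - both   # inclusion-exclusion
only_k1_p = k1_p - both
only_k1_bb = k1_bb - both
print(mapped_to_either, only_k1_p, only_k1_bb)  # 365 35 12


def split_repeat_like(hit_counts, max_hits=10):
    """Separate miRNAs with more than max_hits perfect matches.

    hit_counts: dict mapping miRNA name -> number of perfect genomic matches.
    Returns (kept, repeat_like) as two dicts.
    """
    kept, repeat_like = {}, {}
    for name, n in hit_counts.items():
        if n > max_hits:
            repeat_like[name] = n
        else:
            kept[name] = n
    return kept, repeat_like


# Toy example with made-up names and counts.
kept, repeat_like = split_repeat_like({"miR-x": 1, "miR-y": 2, "miR-z": 250})
print(kept, repeat_like)
```

The threshold of 10 mirrors the "more than 10" criterion in the text; miRNAs with two to ten perfect matches are kept, since they may represent genuine paralogous loci.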

Fig. 2

BLAST alignment of mature miRNAs to two different genomic reference sequences. (a) Size-adjusted Venn diagram indicating that 318 mature miRNAs were aligned to both reference genomes, while 35 and 12 mature miRNAs could only be aligned to K1-P or K1-BB, respectively. (b) The cumulative fraction of BLAST-aligned miRNAs is plotted against the number of perfect genomic matches identified for each miRNA; 16% and 28% of miRNAs could be perfectly aligned to two or more genomic locations in the K1-P (black) and K1-BB (gray) genomic reference sequences, respectively.

In the next step, the genomic locations of BLAST-aligned mature miRNAs were analyzed in detail to identify the respective pre-miRNA sequences (Fig. 3a): genomic locations where two miRNAs could be aligned in close proximity indicate miRNA genes from which two mature miRNAs – corresponding to the 5′ and 3′ miRNA – are produced. Other genomic loci were mapped by only one miRNA, suggesting the expression of just one active mature miRNA, which is derived from either the 5′ or the 3′ arm of the hairpin. Approximately 50% of the genomic loci were identified by alignment of both 5′ and 3′ mature miRNAs (Fig. 3b). For these genomic loci, the pre-miRNA sequence length was estimated as the length from the 5′ miRNA start to the 3′ miRNA end. The resulting sequence lengths were plotted against the cumulative fraction of the number of pre-miRNAs, showing that the majority (>95%) of hairpins exhibited a length between 50 and 70 bases (Fig. 3c).

Fig. 3

Characterization of CHO pre-miRNA sequences. (a) Scheme representing the strategy for pre-miRNA sequence extraction from a genomic locus mapped by either one or two mature miRNAs: (i) a buffer of 10 bases up- and downstream of the mature miRNAs was taken in case both hairpin-arms were mapped. (ii) and (iii) For genomic positions aligned by a single miRNA, a total pre-miRNA of 100 bases was extracted, starting 10 bases upstream of a 5′ miRNA match or ending 10 bases downstream of a 3′ match. (b) Distribution of CHO miRNA loci identified by alignment of either 5′ or 3′ mature miRNAs or both is shown. Venn overlap of miRNA genomic loci as identified independently in each CHO-K1 genomic reference sequence. (c) For pre-miRNA genomic loci mapped at both the 5′ and 3′ miRNA hairpin-arm, the length of the pre-miRNA was calculated as the distance between the start of the 5′ miRNA alignment and the end of the 3′ miRNA alignment. Cumulative fraction of pre-miRNAs is plotted against the pre-miRNA length, showing that for most pre-miRs the length ranged between 50 and 70 bases.

Since it has been shown that Drosha cleavage is dependent on the hairpin loop rather than consensus sequences in the flanking regions, the precise start and stop sites of a pre-miRNA are difficult to determine (Zeng et al., 2005). Therefore, an arbitrary distance of 10 bases upstream of the 5′ miRNA and 10 bases downstream of the 3′ miRNA was included as “buffer” during sequence extraction from the genomic references (Fig. 3a). Based on the observation that most pre-miRNA sequences ranged between 50 and 70 bases, an arbitrary sequence length of 100 bases was defined for pre-miRNAs with only one expressed miRNA detected (i.e. only one hairpin-arm mapped by a mature miRNA), including a buffer of 10 bases upstream or downstream of the mapped miRNA (Fig. 3a). The important information whether a single match represented a 5′ or 3′ miRNA was derived from orthologous pre-miRNAs (mainly human, mouse or rat) in miRBase. In order to verify sequence correctness, all CHO pre-miRNA sequences were folded in silico using the DINAMelt webserver that is based on the mfold++ software (Markham and Zuker, 2005). Manual curation of all predicted folds resulted in the removal of 7 putative novel CHO pre-miRs that did not resemble structures of canonical miRNAs with a complementary stem-loop and 3′ overhangs, while all of the conserved CHO pre-miRs (209 sequences) as well as three novel pre-miRs passed manual curation. The respective 212 RNA secondary structures are provided as Supplementary Data 2. As an example, Table 3 gives the pre-miRNA sequences of all 6 miRNAs belonging to the miR-17-92 cluster, which were identified in close proximity on one genomic scaffold.
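The extraction rule described above (a 10-base buffer when both arms are mapped, a fixed 100-base window when only one arm is mapped) can be written as a small coordinate calculation. This is a sketch under stated assumptions rather than the original implementation: intervals are taken to be 0-based, half-open and already oriented along the strand of the hairpin, windows are clipped to the contig ends, and the function and parameter names are hypothetical.

```python
def pre_mirna_window(contig_len, five_p=None, three_p=None,
                     buffer=10, single_len=100):
    """Return (start, end) of the putative pre-miRNA, 0-based half-open.

    five_p / three_p: (start, end) of the mapped 5' and 3' mature miRNA,
    or None if that hairpin arm was not mapped.
    """
    if five_p is not None and three_p is not None:
        # Case (i): both arms mapped, add the buffer on each side.
        start, end = five_p[0] - buffer, three_p[1] + buffer
    elif five_p is not None:
        # Case (ii): only the 5' miRNA mapped, extend downstream.
        start = five_p[0] - buffer
        end = start + single_len
    elif three_p is not None:
        # Case (iii): only the 3' miRNA mapped, extend upstream.
        end = three_p[1] + buffer
        start = end - single_len
    else:
        raise ValueError("at least one mature miRNA interval is required")
    # Clip to the contig boundaries (an assumption of this sketch).
    return max(0, start), min(contig_len, end)


both_arms = pre_mirna_window(1000, five_p=(200, 222), three_p=(240, 262))
only_5p = pre_mirna_window(1000, five_p=(200, 222))
only_3p = pre_mirna_window(1000, three_p=(240, 262))
print(both_arms, only_5p, only_3p)
```

For hits on the reverse strand, the extracted sequence would additionally need to be reverse-complemented, which is left out here for brevity.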

Table 3

miR-17-92 pre-miRNA sequences.

>cgr-mir-17_scaffold_gi|344163086|gb|JH001979.1|_REV

AGGATAATGTcaaagtgcttacagtgcaggtagTGATATGCACATCTactgcagtgcaggcacttgtggCATTATGGT

>cgr-mir-18a_scaffold_gi|344163086|gb|JH001979.1|_REV

CTTTTTGTTCtaaggtgcatctagtgcagatagTGAAGTAGACTAGCATCTactgccctaagtgctccttctggCATAAGAAG

>cgr-mir-19a_scaffold_gi|344163086|gb|JH001979.1|_REV

GCAGCCCTCTGTTAGTTTTGCATACTTGCACTACAAGAAGAATGCAGTtgtgcaaatctatgcaaaactgaTGGTGGCCT

>cgr-mir-19b_scaffold_gi|344163086|gb|JH001979.1|_REV

GTCTATGGTTagttttgcaggtttgcatccagcTGTATAATACTCTGCtgtgcaaatccatgcaaaactgaCTGTGGTGG

>cgr-mir-20a_scaffold_gi|344163086|gb|JH001979.1|_REV

TCTGTGGCACtaaagtgcttatagtgcaggtagTGTCCACTCATCTACTGCATTACGAGCACTTCCAGTGCTGCCAGCTGGAGAGCCCCAGCCTCGCTCG

>cgr-mir-92a_scaffold_gi|344163086|gb|JH001979.1|_REV

CTTTCTACACaggctgggatttgttgcaatgctGTGTTTCTCGATGGtattgcacttgtcccggcctgtTGAGTTTGG

Lower case letters indicate mature miRNAs; upper case letters indicate 5′ and 3′ flanking regions as well as loop regions.
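Because Table 3 encodes the mature miRNAs by letter case, they can be recovered from a hairpin sequence with a simple pattern match. The sketch below applies this to the cgr-mir-17 entry from Table 3; the function name is hypothetical.

```python
import re


def mature_segments(hairpin):
    """Return (start, end, sequence) for each lower-case run in the hairpin.

    Coordinates are 0-based, half-open, relative to the hairpin sequence.
    """
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r"[a-z]+", hairpin)]


# cgr-mir-17 pre-miRNA sequence as given in Table 3.
mir17 = ("AGGATAATGTcaaagtgcttacagtgcaggtagTGATATGCACATCT"
         "actgcagtgcaggcacttgtggCATTATGGT")

for start, end, seq in mature_segments(mir17):
    print(start, end, seq)
```

Entries with a single lower-case run, such as cgr-mir-19a and cgr-mir-20a in Table 3, yield one segment, corresponding to hairpins with only one mapped mature miRNA.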

Comparison of pre-miRNA sequences derived from the two CHO-K1 genomic references (K1-P and K1-BB) gave four sequences with either one or two mismatches, of which only mir-486 harbored a potential single-nucleotide polymorphism (SNP) within a mature miRNA (Supplementary Table 2). In the other cases, SNPs were identified in the hairpin-loop (mir-324 and mir-486) or in regions flanking mature miRNAs at the 5′ (mir-1956) or 3′ end (mir-542). Conservation of CHO pre-miRNA sequences was estimated for mir-17-92 by calculating ClustalW alignments (Thompson et al., 1994) to the respective sequences from Mus musculus; the results indicate high conservation for mir-18a and mir-19b, while several mismatches were found between mouse and CHO hairpin-loops of mir-17, mir-20a, and mir-92a (Fig. 4a). Supplementary Data 3 gives the sequences of all 212 miRNA hairpins as they were extracted from the K1-P and K1-BB genomic references, as well as the respective genomic locations. To show that the information provided here can easily be applied to amplify and clone CHO miRNAs, four distinct clusters of miRNAs were chosen for PCR amplification using primers designed based on the K1-P genomic reference (Supplementary Table 2). Genomic DNA isolated from adherent CHO-K1 and DHFR (-) cell lines, cultivated at 37 °C and 7% atmospheric CO2, served as template for the PCR reaction, which gave specific bands at the expected size (Fig. 4b).
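The text does not state how the K1-P and K1-BB hairpins were compared. For two sequences of equal length, one simple possibility is a position-by-position comparison that also records whether a mismatch falls inside a mature miRNA. The sketch below assumes the lower-case convention of Table 3 for mature miRNAs; the function name and both sequences are made up for illustration.

```python
def hairpin_mismatches(seq_a, seq_b):
    """List mismatches between two equal-length hairpin sequences.

    Returns tuples (position, base_a, base_b, in_mature), where in_mature
    is True if the position is lower case (mature miRNA) in seq_a.
    The comparison itself ignores letter case.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length; align them first")
    result = []
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a.upper() != b.upper():
            result.append((i, a.upper(), b.upper(), a.islower()))
    return result


# Made-up toy sequences, not taken from the supplementary data.
ref_a = "ACGTacgtacgtACGT"
ref_b = "ACGTacgaacgtACGA"
diffs = hairpin_mismatches(ref_a, ref_b)
print(diffs)
```

Sequences of different length, for example when one reference carries an insertion or deletion, would need a proper alignment (as done with ClustalW for the mouse comparison) before such a comparison.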

Fig. 4

Sequence characterization of CHO pre-miRNAs. (a) Conservation of CHO (cgr) mir-17-92 pre-miRNAs with respect to Mus musculus (mmu); *, sequence matches; -, sequence deletions. (b) PCR amplification of miRNA clusters using genomic DNA from CHO-K1 and CHO dhfr (-) cells. Lanes 1–4 and 6–9 show specific amplification of miR-24-23a (1/6), miR-17-92a, miR-221-222 and miR-24-23b. Lanes 5 and 10: no-template control PCR.

Overall, these data demonstrate the successful identification of the genomic location of 365 out of 415 (88%) expressed mature miRNA sequences. After exclusion of 15 miRNAs due to multiple alignments to genomic repeat regions, 350 miRNAs remained for annotation of genomic loci based on miRNA alignment patterns. After manual verification of miRNA-like RNA secondary structures, a total of 212 miRNA loci as well as the respective pre-miRNA sequences were identified with high confidence (Table 2), cross-checked to confirm correctness of sequences, and provided as Supplementary Data to this publication. In addition, all sequences were submitted to the miRBase database (Griffiths-Jones, 2010) for assignment of miRBase accession numbers (Supplementary Data 3).

Table 2

Number of aligned miRNAs, unique genomic loci and precursor-miRNA sequences.

	K1-P	K1-BB
miRNAs mapped to genome (100% ID, 100% length)	353	330
miRNAs mapped to genomic repeat regions	14	15
miRNAs used for identification of genomic loci and pre-miRNAs	339	315
High confidence genomic miRNA loci (a)	206	196
pre-miRNA sequences submitted to miRBase (b)	206	6

(a) After removal of loci that give rise to incorrectly folded pre-miRs.

(b) In total 212 pre-miRNA sequences submitted to miRBase.

These data can now be used to establish CHO-specific tools for miRNA overexpression as an engineering strategy using endogenous pre-miRNA sequences, which do show differences in nucleotide sequence compared to mouse homologs (Fig. 4a). In addition, the development of knockout strategies to specifically reduce miRNA expression will benefit from these data. Finally, knowledge of the genomic loci also allows amplification and cloning of polycistronic miRNA clusters, which are likely to have a stronger influence on CHO cell phenotypes upon overexpression than single miRNAs.

16 in total

1. Conserved microRNAs in Chinese hamster ovary cell lines.

Authors: Kathryn C Johnson; Nitya M Jacob; Peter Morin Nissom; Matthias Hackl; Lim Hseuh Lee; Miranda Yap; Wei-Shou Hu
Journal: Biotechnol Bioeng Date: 2011-02 Impact factor: 4.530

Review 2. MicroRNAs: tiny targets for engineering CHO cell phenotypes?

Authors: Niall Barron; Noelia Sanchez; Paul Kelly; Martin Clynes
Journal: Biotechnol Lett Date: 2010-09-25 Impact factor: 2.461

3. Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha.

Authors: Yan Zeng; Rui Yi; Bryan R Cullen
Journal: EMBO J Date: 2004-11-25 Impact factor: 11.598

4. Widespread changes in protein synthesis induced by microRNAs.

Authors: Matthias Selbach; Björn Schwanhäusser; Nadine Thierfelder; Zhuo Fang; Raya Khanin; Nikolaus Rajewsky
Journal: Nature Date: 2008-07-30 Impact factor: 49.962

5. MicroRNAs as targets for engineering of CHO cell factories.

Authors: Dethardt Müller; Hermann Katinger; Johannes Grillari
Journal: Trends Biotechnol Date: 2008-05-28 Impact factor: 19.536

6. Unraveling the Chinese hamster ovary cell line transcriptome by next-generation sequencing.

Authors: Jennifer Becker; Matthias Hackl; Oliver Rupp; Tobias Jakobi; Jessica Schneider; Rafael Szczepanowski; Thomas Bekel; Nicole Borth; Alexander Goesmann; Johannes Grillari; Christian Kaltschmidt; Thomas Noll; Alfred Pühler; Andreas Tauch; Karina Brinkrolf
Journal: J Biotechnol Date: 2011-09-17 Impact factor: 3.307

7. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors: J D Thompson; D G Higgins; T J Gibson
Journal: Nucleic Acids Res Date: 1994-11-11 Impact factor: 16.971

8. Next-generation sequencing of the Chinese hamster ovary microRNA transcriptome: Identification, annotation and profiling of microRNAs as targets for cellular engineering.

Authors: Matthias Hackl; Tobias Jakobi; Jochen Blom; Daniel Doppmeier; Karina Brinkrolf; Rafael Szczepanowski; Stephan H Bernhart; Christian Höner Zu Siederdissen; Juan A Hernandez Bort; Matthias Wieser; Renate Kunert; Simon Jeffs; Ivo L Hofacker; Alexander Goesmann; Alfred Pühler; Nicole Borth; Johannes Grillari
Journal: J Biotechnol Date: 2011-03-30 Impact factor: 3.307

9. BioVenn - a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams.

Authors: Tim Hulsen; Jacob de Vlieg; Wynand Alkema
Journal: BMC Genomics Date: 2008-10-16 Impact factor: 3.969

10. DINAMelt web server for nucleic acid melting prediction.

Authors: Nicholas R Markham; Michael Zuker
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971
