Amita Kashyap, Adelaide Rhodes, Brent Kronmiller, Josie Berger, Ashley Champagne, Edward W Davis, Mitchell V Finnegan, Matthew Geniza, David A Hendrix, Christiane V Löhr, Vanessa M Petro, Thomas J Sharpton, Jackson Wells, Clinton W Epps, Pankaj Jaiswal, Brett M Tyler, Stephen A Ramsey.
Abstract
BACKGROUND: Long noncoding RNAs (lncRNAs) have roles in gene regulation, epigenetics, and molecular scaffolding, and it is hypothesized that they underlie some mammalian evolutionary adaptations. However, for many mammalian species, the absence of a genome assembly precludes the comprehensive identification of lncRNAs. The genome of the American beaver (Castor canadensis) has recently been sequenced, setting the stage for the systematic identification of beaver lncRNAs and the characterization of their expression in various tissues. The objective of this study was to discover and profile polyadenylated lncRNAs in the beaver using high-throughput short-read sequencing of RNA from sixteen beaver tissues and to annotate the resulting lncRNAs based on their potential for orthology with known lncRNAs in other species.
Keywords: Beaver; Castor canadensis; Expression atlas; Long noncoding RNA; Transcriptome; lncRNA
Year: 2020 PMID: 32050897 PMCID: PMC7014947 DOI: 10.1186/s12864-019-6432-4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Contig retention through the screening pipeline for novel lncRNAs
| Step | % Contigs Eliminated | # Contigs Eliminated | # Contigs Remaining |
|---|---|---|---|
| Orthology analysis (BLASTn) | 62.7 | 54,405 (a) | 32,309 novel |
| Probable noncoding (CPAT | 70.1 | 22,781 | 9,528 |
| High confidence noncoding (CPAT | 98.1 | 9,346 | 182 |
| Pfam annotations | 0 | 0 | 182 |
| Align to genome and compare to MAKER annotations | 19.2 | 35 | 147 |
Columns as follows: “Step”, the name of the program or step in the screening pipeline; “% Contigs Eliminated”, the percentage of contigs from Column 4 of the previous row in the table that were eliminated in this step of the analysis pipeline; “# Contigs Eliminated”, the number of contigs corresponding to the percentage in Column 2; “# Contigs Remaining”, the number of contigs remaining after the row’s filtering step was applied. The number of starting contigs before step 1 (“Orthology analysis”) was 86,714.
(a) This includes the 40 beaver contigs that we identified that are orthologs of known noncoding transcripts in other species (Fig. 9, purple rectangle). The percentage shown in column “% Contigs Eliminated” is for that specific step (row) relative to the number of contigs before that step.
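The arithmetic behind the table can be checked with a short sketch. The counts below are transcribed from the table; the shortened step labels and the function name `retention_table` are illustrative choices, not part of the original pipeline.

```python
# Contig counts transcribed from the screening table; step labels are shortened.
START_CONTIGS = 86_714
STEPS = [
    ("Orthology analysis (BLASTn)", 54_405),
    ("Probable noncoding", 22_781),
    ("High confidence noncoding", 9_346),
    ("Pfam annotations", 0),
    ("Genome alignment vs. MAKER annotations", 35),
]


def retention_table(start, steps):
    """Return (step, percent_eliminated, eliminated, remaining) for each step."""
    rows = []
    remaining = start
    for name, eliminated in steps:
        # Percentage is relative to the contigs entering this step.
        percent = 100.0 * eliminated / remaining
        remaining -= eliminated
        rows.append((name, round(percent, 1), eliminated, remaining))
    return rows


for row in retention_table(START_CONTIGS, STEPS):
    print(row)
```

The remaining counts reproduce the last column of the table (32,309; 9,528; 182; 182; 147). Recomputing the percentage for the second row from the counts gives roughly 70.5 rather than the tabulated 70.1; the other percentages match to one decimal place.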
Fig. 9. Overview of the computational pipeline for identifying beaver lncRNAs. Transcript contigs from the consensus transcriptome (“Merged Transcriptome” above) were sequentially filtered using (1) Basic Local Alignment Search Tool for nucleotide sequences (BLASTn) against the NCBI nucleotide database to eliminate probable orthologs of protein-coding genes, known lncRNAs, and other non-lncRNA transcript types; (2) CPAT to detect and eliminate contigs with protein-coding ORFs or nucleotide hexamer usage patterns that are consistent with protein-coding genes; (3) an HMMscan search against the Pfam database to identify matches to protein domain motifs; and (4) BLASTn alignment against the OSU draft beaver genome assembly to eliminate contigs that overlapped with scaffold regions annotated (by MAKER) as protein-coding genes. Contigs discovered by the annotation pipeline that are orthologs of known lncRNAs are shown in purple, and novel noncoding contigs identified by the annotation pipeline are shown in green.
Fig. 1. Noncoding transcript contigs’ model-based structural stability is inversely correlated with length. Marks indicate lncRNA contigs that have no known orthologs (“novel”; a) and that have known noncoding orthologs (“known”; b). The outlier in (b) is labeled by its known ortholog, XIST.
Fig. 2. The lncRNA contigs with known orthologs are longer than the novel lncRNA contigs. Density distributions of contig lengths for the 147 novel noncoding transcript contigs (“novel”) and the 40 noncoding transcript contigs that are orthologous to known noncoding transcripts (“known”).
Fig. 3. In the pan-tissue transcriptome assembly, known lncRNA contigs had overall higher coverage levels than novel lncRNA contigs. Density distributions of contig coverage depths for the 147 novel noncoding transcript contigs (“novel”) and the 40 noncoding transcript contigs that are orthologous to known noncoding transcripts (“known”). For both sets of noncoding transcript contigs, average depth of coverage in the assembly was not significantly correlated with contig length (Fig. 5).
Fig. 5. Contig average depth of read coverage in the assembly is not correlated with contig length. Marks indicate contigs that do not have orthologs (a, 147 contigs) or that are orthologous to known noncoding transcripts (b, 40 contigs). The outlier in (b) is labeled by its known ortholog, XIST.
Novel lncRNA contigs with strongest evidence across multiple correlates
| Contig | Length (nt) | MFE (kcal/mol) | Coverage | BLASTn Alignment Length (%) | Intronic | max (RPKM) |
|---|---|---|---|---|---|---|
| Ccan_OSU1_lncRNA_contig41254.1 | | −96.8 | 26.71 | 100.00 | no | 7.8 |
| Ccan_OSU1_lncRNA_contig46102.1 | 334 | −103.57 | 8.42 | 100.00 | no | 7.6 |
| Ccan_OSU1_lncRNA_contig46174.1 | 333 | −126.5 | 16.66 | 100.00 | no | 6.5 |
| Ccan_OSU1_lncRNA_contig43610.1 | 350 | −140.8 | 10.21 | 83.71 | no | 30.1 |
| Ccan_OSU1_lncRNA_contig44966.1 | 341 | −149.8 | 11.81 | 63.93 | no | 48.6 |
| Ccan_OSU1_lncRNA_contig45799.1 | 336 | −77 | 16.06 | 100.00 | no | 8.0 |
| Ccan_OSU1_lncRNA_contig59927.1 | 267 | −103.7 | 13.66 | 100.75 | no | 13.0 |
| Ccan_OSU1_lncRNA_contig62060.1 | 260 | −50.7 | 36.25 | 69.23 | yes | 22.8 |
Underlined text indicates that a particular contig was in the top ten, among all novel lncRNA contigs, for the given column feature (i.e., length, MFE, coverage, or alignment length). The BLASTn alignment length is computed as 100×(length of alignment)/(length of contig). The sixth column (Intronic) reflects whether the contig’s alignment to the reference genome was gapped or not; a “yes” is indicative of a potential excised intron. The last column, max (RPKM), is the maximum RPKM for the contig across all tissues and was not a criterion for inclusion in the table.
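The alignment-length formula in the footnote is simple enough to sketch directly. The function name and the alignment lengths used below are hypothetical; the lengths were chosen only because they reproduce two percentages in the table, and the actual alignment lengths are not given in this record.

```python
def alignment_length_percent(alignment_length, contig_length):
    """BLASTn alignment length expressed as a percentage of contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return 100.0 * alignment_length / contig_length


# Hypothetical alignment lengths (nt) paired with contig lengths from the table.
print(round(alignment_length_percent(180, 260), 2))  # 69.23
print(round(alignment_length_percent(269, 267), 2))  # 100.75
```

Under this formula a value above 100% is possible whenever the reported alignment is longer than the contig, for example if gap columns are counted in the alignment length. That may be why one row shows 100.75, though the record does not say so.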
Fig. 4. Tissue-specific expression of novel lncRNAs in the American beaver. Heatmap rows correspond to the 147 contigs and columns correspond to the 16 tissues that were profiled. Cells are colored by log2(1 + RPKM) expression level. Rows and columns are separately ordered by hierarchical agglomerative clustering and cut-based sub-dendrograms are colored (arbitrary color assignment to sub-clusters) as a guide for visualization. Rows are labeled with abbreviated contig names, e.g., contig4731.1 instead of Ccan_OSU1_lncRNA_contig4731.1.
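The heatmap color scale, log2(1 + RPKM), can be illustrated with a minimal sketch. The function names and the tissue values are hypothetical, and the clustering step is not reproduced here.

```python
import math


def log_expression(rpkm):
    """Transform an RPKM value to log2(1 + RPKM); an RPKM of 0 maps to 0."""
    if rpkm < 0:
        raise ValueError("RPKM cannot be negative")
    return math.log2(1.0 + rpkm)


def transform_profile(rpkm_by_tissue):
    """Apply the transform to one contig's per-tissue RPKM values."""
    return {tissue: log_expression(v) for tissue, v in rpkm_by_tissue.items()}


# Hypothetical expression profile for one contig (values are illustrative only).
profile = {"spleen": 3.0, "ovary": 1.0, "liver": 0.0}
print(transform_profile(profile))
```

Adding 1 before taking the logarithm keeps unexpressed contigs at exactly 0 instead of negative infinity, which is a common reason for choosing this transform for heatmaps.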
Beaver noncoding contigs that are probable orthologs of known lncRNAs or noncoding transcripts
| Symbol; annotation | Contig | Species with ortholog hits | Human Ensembl Gene ID | BLASTn annotation | E | %ID | nt |
|---|---|---|---|---|---|---|---|
| AC037459.2; (antisense to CCAR2) | Ccan_OSU1_lncRNA_contig74544.1 | Homo sapiens | ENSG00000253200 | CCAR2 lncRNA (cell cycle and apoptosis regulator 2) | 8.0 × 10^−46 | 89 | 155 |
| AC019068.1; antisense | Ccan_OSU1_lncRNA_contig10709.1 | Homo sapiens | ENSG00000233611 | AC079135.1 gene, antisense lncRNA (TPA - predicted) | 2.4 × 10^−12 | 77.6 | 143 |
| AC083843.1 | Ccan_OSU1_lncRNA_contig47288.1 | Homo sapiens | ENSG00000253433 | AC083843.1 gene, lincRNA (TPA - predicted) | 7.7 × 10^−13 | 88.4 | 69 |
| AC095055.1 (antisense to SH3D19) | Ccan_OSU1_lncRNA_contig41532.1 | Homo sapiens | ENSG00000270681 | SH3D19 antisense noncoding RNA (SH3 domain containing 19) | 8.1 × 10^−58 | 82.9 | 274 |
| AC116667.1; (antisense to ZFHX3) | Ccan_OSU1_lncRNA_contig71613.1 | Homo sapiens | ENSG00000271009 | ZFHX3 antisense (zinc finger homeobox 3) | 1.8 × 10^−47 | 83.6 | 231 |
| AL161747.2; (antisense to SALL2) | Ccan_OSU1_lncRNA_contig44345.1 | Homo sapiens | ENSG00000257096 | SALL2 lncRNA (spalt-like transcription factor 2) | 7.5 × 10^−68 | 84.4 | 288 |
| AP000233.2 | Ccan_OSU1_lncRNA_contig22249.1 | Homo sapiens | ENSG00000232512 | AP000233.2 gene lincRNA (TPA - predicted) | 9.0 × 10^−5 | 100 | 31 |
| AP003068.1; (antisense to VPS51) | Ccan_OSU1_lncRNA_contig24716.1 | Homo sapiens, Mus musculus, Bos taurus | ENSG00000254501 | VPS51 antisense (vacuolar protein sorting 51) | 0 | 93.2 | 438 |
| AP003068.1; (antisense to VPS51) | Ccan_OSU1_lncRNA_contig55707.1 | Mus musculus, Homo sapiens, Gallus gallus | ENSG00000254501 | VPS51 antisense/reverse strand (vacuolar protein sorting 51) | 1.7 × 10^−83 | 92 | 226 |
| CTA-204B4.6† | Ccan_OSU1_lncRNA_contig29141.1 | Homo sapiens | ENSG00000259758 | CTA-204B4.6 gene lincRNA (TPA - predicted) | 6.2 × 10^−120 | 83.5 | 491 |
| CTA-204B4.6 | Ccan_OSU1_lncRNA_contig30023.1 | Homo sapiens | ENSG00000259758 | CTA-204B4.6 gene lincRNA (TPA - predicted) | 2.1 × 10^−129 | 94.5 | 308 |
| DNM3OS; (antisense to DNM3) | Ccan_OSU1_lncRNA_contig78034.1 | Homo sapiens; various primates | ENSG00000230630 | DNM3OS (DNM3 opposite strand/antisense RNA) lncRNA | 3.4 × 10^−69 | 89.8 | 216 |
| GNB4; lncRNA isoform* | Ccan_OSU1_lncRNA_contig55083.1 | Homo sapiens | ENSG00000114450 | GNB4 (guanine nucleotide binding protein (G protein), beta polypeptide 4) | 6.4 × 10^−38 | 78.8 | 287 |
| AC007038.2; (antisense to KANSL1L) | Ccan_OSU1_lncRNA_contig54664.1 | Homo sapiens, Mus musculus | ENSG00000272807 | KANSL1L antisense transcript (KAT8 regulatory NSL complex subunit 1-like) | 1.1 × 10^−40 | 92 | 125 |
| KCNA3; noncoding isoform | Ccan_OSU1_lncRNA_contig27553.1 | Homo sapiens, Mus musculus | ENSG00000177272 | KCNA3 lncRNA (potassium voltage-gated channel, shaker-related subfamily, member 3) | 2.3 × 10^−139 | 85.5 | 502 |
| KCNA3; noncoding isoform | Ccan_OSU1_lncRNA_contig29471.1 | Homo sapiens | ENSG00000177272 | KCNA3 lncRNA (potassium voltage-gated channel, shaker-related subfamily, member 3) | 1.8 × 10^−70 | 78.7 | 475 |
| KCNA3; noncoding isoform | Ccan_OSU1_lncRNA_contig79757.1 | Homo sapiens | ENSG00000177272 | KCNA3 lncRNA (potassium voltage-gated channel, shaker-related subfamily, member 3) | 7.6 × 10^−31 | 80.2 | 197 |
| KCNA3; noncoding isoform | Ccan_OSU1_lncRNA_contig81530.1 | Homo sapiens, Mus musculus | ENSG00000177272 | KCNA3 lncRNA (potassium voltage-gated channel, shaker-related subfamily, member 3) | 7.1 × 10^−61 | 87.7 | 211 |
| LINC01355 | Ccan_OSU1_lncRNA_contig54147.1 | Homo sapiens | ENSG00000261326 | LINC01355 lncRNA | 1.0 × 10^−85 | 87.5 | 295 |
| LMLN; noncoding isoform* | Ccan_OSU1_lncRNA_contig28300.1 | Homo sapiens | ENSG00000185621 | LMLN (leishmanolysin-like (metallopeptidase M8 family)) | 3.1 × 10^−73 | 80.4 | 414 |
| MEG3 | Ccan_OSU1_lncRNA_contig11359.1 | Homo sapiens, Mus musculus, Pongo abelii | ENSG00000214548 | MEG3 lncRNA (maternally expressed 3) | 1.6 × 10^−123 | 93 | 313 |
| MEG3 | Ccan_OSU1_lncRNA_contig30419.1 | Homo sapiens, Pongo abelii | ENSG00000214548 | MEG3 lncRNA (maternally expressed 3) | 7.6 × 10^−124 | 93 | 313 |
| MEG3 | Ccan_OSU1_lncRNA_contig6442.1 | Homo sapiens, Mus musculus, Pongo abelii | ENSG00000214548 | MEG3 lncRNA (maternally expressed 3) | 2.2 × 10^−123 | 93 | 313 |
| N4BP2L2-IT2* | Ccan_OSU1_lncRNA_contig81871.1 | Homo sapiens | ENSG00000281026 | N4BP2L2-IT2 lncRNA (N4BPL2 intronic transcript 2) | 2.2 × 10^−6 | 76.2 | 130 |
| NIPBL-DT | Ccan_OSU1_lncRNA_contig25986.1 | Homo sapiens | ENSG00000285967 | NIPBL lncRNA bidirectional promoter (Nipped-B homolog) | 3.6 × 10^−38 | 80.9 | 225 |
| PDK3; noncoding isoform* | Ccan_OSU1_lncRNA_contig72478.1 | Homo sapiens | ENSG00000067992 | PDK3 (pyruvate dehydrogenase kinase, isozyme 3) | 1.8 × 10^−37 | 84.2 | 171 |
| RASSF3; noncoding isoform* | Ccan_OSU1_lncRNA_contig10200.1 | Homo sapiens | ENSG00000153179 | RASSF3 (Ras associated (RalGDS/AF-6) domain family member 3) | 0 | 83.2 | 963 |
| RASSF3; noncoding isoform* | Ccan_OSU1_lncRNA_contig10200.2 | Homo sapiens | ENSG00000153179 | RASSF3 (Ras associated (RalGDS/AF-6) domain family member 3) | 0 | 83.3 | 962 |
| AC098818.2†; (antisense to BMP2K) | Ccan_OSU1_lncRNA_contig59404.1 | Homo sapiens | ENSG00000260278 | RP11-109G23.3 gene, antisense lncRNA | 4.5 × 10^−59 | 83.3 | 275 |
| TRIM56; sense overlapping | Ccan_OSU1_lncRNA_contig18315.1 | Homo sapiens | ENSG00000169871 | RP11-395B7.7 gene, sense overlapping lncRNA (TPA - predicted) | 4.7 × 10^−28 | 72.8 | 519 |
| RP11-395B7.7 | Ccan_OSU1_lncRNA_contig47935.1 | Homo sapiens | ENSG00000260336 | RP11-395B7.7 gene, sense overlapping lncRNA (TPA - predicted) | 9.7 × 10^−22 | 73.9 | 284 |
| AC090948.1 | Ccan_OSU1_lncRNA_contig29838.1 | Homo sapiens | ENSG00000271964 | RP11-415F23.2 gene, antisense lncRNA (TPA - predicted) | 1.5 × 10^−26 | 93.3 | 89 |
| AL591848.4† | Ccan_OSU1_lncRNA_contig59344.1 | Homo sapiens | ENSG00000260855 | RP11-439E19.10 gene, antisense lncRNA (TPA - predicted) | 4.9 × 10^−4 | 96.9 | 32 |
| AC022893.2 | Ccan_OSU1_lncRNA_contig76877.1 | Homo sapiens | ENSG00000260838 | RP11-531A24.3 gene, lincRNA (TPA - predicted) | 3.6 × 10^−39 | 81.4 | 226 |
| AL355488.1 (antisense to SLC16A4) | Ccan_OSU1_lncRNA_contig17784.1 | Homo sapiens | ENSG00000273373 | RP5-1074L1.4 gene, antisense lncRNA (TPA - predicted) | 1.0 × 10^−44 | 89.9 | 149 |
| THRB-AS1; (antisense to THRB) | Ccan_OSU1_lncRNA_contig53102.1 | Homo sapiens | ENSG00000228791 | THRB antisense/reverse strand (thyroid hormone receptor, beta) | 6.8 × 10^−18 | 80.9 | 136 |
| TINCR; lncRNA isoform | Ccan_OSU1_lncRNA_contig14850.1 | Homo sapiens | ENSG00000223573 | TINCR lncRNA (tissue differentiation-inducing non-protein coding RNA) | 4.1 × 10^−44 | 82.2 | 225 |
| TUG1; lncRNA isoform | Ccan_OSU1_lncRNA_contig6874.1 | Mus musculus | ENSG00000253352 | TUG1 lncRNA (taurine upregulated gene 1) | 6.2 × 10^−79 | 79.9 | 448 |
| UBR5; lncRNA isoform* | Ccan_OSU1_lncRNA_contig10406.1 | Homo sapiens, Bos taurus | ENSG00000104517 | UBR5 (ubiquitin protein ligase E3 component n-recognin 5) | 0 | 82.9 | 977 |
| XIST | Ccan_OSU1_lncRNA_contig185.1 | Homo sapiens, Mus musculus | ENSG00000229807 | XIST lncRNA (X inactive specific transcript) | 3.1 × 10^−136 | 79.7 | 772 |
E, the E-value for the highest-scoring BLASTn match; %ID, percent identity between the contig and matching query sequence, by BLASTn; nt, length of match (nt); an E-value of “0” means that E < 2.23 × 10^−308. Columns as follows: “Symbol”, HUGO Gene Nomenclature Committee gene symbol; “annotation”, classification of the lncRNA transcript type if it is not an obligate lncRNA gene or if it is antisense to a protein-coding gene (i. entries with an asterisk after the annotation denote noncoding transcript contigs whose orthologs are potential noncoding isoforms; see Methods; ii. entries with a dagger after the annotation denote transcripts which have new BLASTn annotations for beaver, as of November 18, 2019); “Contig”, the name of the transcript contig; “Species”, the species in which orthologs of the contig were detected by sequence similarity; “Ensembl Gene ID”, the Ensembl gene identifier of the putative human ortholog; “BLASTn annotation”, the annotation of the BLASTn hit corresponding to the statistics in the last three columns (E, %ID, nt).
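The threshold quoted for an E-value of “0” is close to the smallest positive normalized IEEE 754 double. The sketch below assumes E-values are held as double-precision floats, which the threshold suggests but this record does not state; `format_e_value` is a hypothetical helper.

```python
import sys

# Smallest positive normalized double; approximately 2.225e-308.
SMALLEST_NORMAL_DOUBLE = sys.float_info.min


def format_e_value(e_value):
    """Render an E-value, flagging exact zeros as 'below double precision'."""
    if e_value == 0.0:
        return "0 (E < 2.23e-308)"
    return f"{e_value:.1e}"


print(SMALLEST_NORMAL_DOUBLE)
print(format_e_value(float("1e-400")))  # parses to 0.0 in double precision
print(format_e_value(8.0e-46))
```

Under this assumption, a tabulated “0” should be read as “too small to report” rather than as a literal probability of zero.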
Results of pathway enrichment analysis of human orthologs of beaver lncRNAs
| Pathway name | Gene set size of pathway | Enrichment score (normalized) | FDR-adjusted p-value |
|---|---|---|---|
| KEGG_RIBOSOME | 87 | 2.48 | < 10^−8 |
| KEGG_PROTEIN_EXPORT | 22 | 2.38 | < 10^−8 |
| KEGG_NEUROACTIVE_LIGAND_RECEPTOR_INTERACTION | 263 | 1.68 | < 10^−8 |
| KEGG_TASTE_TRANSDUCTION | 48 | 2.20 | < 10^−8 |
| KEGG_REGULATION_OF_ACTIN_CYTOSKELETON | 211 | 1.17 | < 10^−8 |
| KEGG_RNA_POLYMERASE | 28 | 1.91 | 0.025 |
| KEGG_CALCIUM_SIGNALING_PATHWAY | 176 | 1.86 | 0.049 |
The normalized enrichment scores are computed as described in [33]
Fig. 6. Tissue-specific expression of beaver lncRNAs that are orthologous to known noncoding transcripts. Heatmap rows correspond to the 40 contigs and columns correspond to the 16 tissues that were profiled. Cells are colored by log2(1 + RPKM) expression level. Rows and columns are separately ordered by hierarchical agglomerative clustering and cut-based sub-dendrograms are colored (arbitrary color assignment to sub-clusters) as a guide for visualization. Rows are labeled with abbreviated contig names, e.g., contig29838.1 instead of Ccan_OSU1_lncRNA_contig29838.1.
Fig. 7. Predicted minimum-free energy secondary structures of the putative beaver MEG3 lncRNA Ccan_OSU1_lncRNA_contig11359.1 (a) and the homologous sequence of human MEG3 (b). False color indicates pairing probability (see colormap in panel a).
Fig. 8. Predicted minimum-free energy secondary structure of the novel spleen- and ovary-specific lncRNA Ccan_OSU1_lncRNA_contig44966.1, showing relatively high pairing probabilities. False color indicates base pairing probability (see colormap).
Evidentiary criteria for filtering transcript contigs based on the MAKER gene annotation features
| | Annotation Tool | Annotation Call |
|---|---|---|
| Basis for Exclusion as lncRNA | blastx | protein_match |
| | genemark | match, match_part |
| | maker | CDS |
| | protein2genome | match_part, protein_match |
| | snap_masked | match, match_part |
| | tblastx | match_part, translated_nucleotide_match |
| Not basis for Exclusion as lncRNA | blastn | expressed_sequence_match; match_part |
| | blastx | match_part |
| | cdna2genome | expressed_sequence_match; match_part |
| | est2genome | expressed_sequence_match; match_part |
| | maker | exon, gene, mRNA |
| | repeatmasker | match; match_part |
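The exclusion criteria in this table amount to a lookup on (annotation tool, annotation call) pairs. The following is a minimal sketch of that rule, not the authors' code; the data structure, the function name `is_excluded`, and the idea that overlapping features arrive as a list of pairs are all assumptions made for illustration.

```python
# (tool -> calls) pairs that the table lists as a basis for exclusion.
EXCLUSION_CALLS = {
    "blastx": {"protein_match"},
    "genemark": {"match", "match_part"},
    "maker": {"CDS"},
    "protein2genome": {"match_part", "protein_match"},
    "snap_masked": {"match", "match_part"},
    "tblastx": {"match_part", "translated_nucleotide_match"},
}


def is_excluded(overlapping_features):
    """Return True if any overlapping (tool, call) pair is a basis for exclusion.

    `overlapping_features` is a hypothetical list of (tool, call) tuples for
    the annotation features that overlap a contig's genome alignment.
    """
    return any(
        call in EXCLUSION_CALLS.get(tool, ())
        for tool, call in overlapping_features
    )


# A contig overlapping only non-excluding evidence is retained.
print(is_excluded([("maker", "exon"), ("repeatmasker", "match")]))  # False
# A single coding-evidence feature is enough to exclude it.
print(is_excluded([("est2genome", "match_part"), ("maker", "CDS")]))  # True
```

Keying on the pair rather than on the call alone matters here: `match_part` is excluding for genemark, protein2genome, snap_masked, and tblastx, but not for blastx, blastn, or the transcript-alignment tools.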