Hongkai Wu, Jinwen Wang, Riqiang Deng, Ke Xing, Yuanyan Xiong, Junfeng Huang, Xionglei He, Xunzhang Wang.
Abstract
Severe acute respiratory syndrome (SARS) causes severe, multifactorial lung injury, but its pathogenesis remains largely unclear. This paper describes a detailed analysis of the transcriptome of human lung tissue infected by SARS-associated coronavirus (SARS-CoV). Random primers were used to produce ESTs from total RNA samples of the lung tissue. The results showed a high diversity of transcripts covering much of the human genome, including loci that do not contain protein-coding sequences. In total, 10,801 ESTs were generated and assembled into 267 contigs plus 7,659 singletons. Sequences matching SARS-CoV RNAs and other pneumonia-related microbes were found. The transcripts were classified by functional annotation. Among the 7,872 assembled sequences identified as originating from the human genome, 578 non-coding genes were revealed by BLAST search. Mapping the transcripts to the human genome with the restriction identity = 100% yielded a candidate pool of 448 novel transcriptional loci where no EST transcriptional signal had been found before. Among these, 13 loci had never been reported as transcribed by other detection methods such as gene chips, tiling arrays, and paired-end ditags (PETs). The results show that a random-priming cDNA library is valid for investigating transcript diversity in virus-infected tissue. The EST data could be a useful supplemental source for SARS pathology research.
Year: 2011 PMID: 21328370 PMCID: PMC7166665 DOI: 10.1002/jmv.22012
Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 2.327
Other cDNA Libraries of Human Lung Tissue Used in This Study for Comparison Analysis
| Library name | dbEST ID | Description | EST number |
|---|---|---|---|
| MGC101 | dbEST:10453 | Epidermoid carcinoma | 9,166 |
| MGC69 | dbEST:5608 | Large cell carcinoma, undifferentiated | 9,748 |
| dbEST16438 | dbEST:16438 | Fetal fibroblast | 3,339 |
| UICFEC1 | dbEST:10395 | Normal lung from adult and from fetal day 64, day 87, week 19, and week 42 | 12,971 |
| UICFDU1 | dbEST:10398 | Adult primary lung epithelial cells | 12,742 |
| dbEST249 | dbEST:249 | Male, 72 years old | 13,244 |
All these cDNA libraries were generated by oligo‐dT priming.
Figure 1. Length distributions of ESTs before and after assembly. Assembled ESTs (black bars) were longer than the ESTs before assembly (open bars), with the longest exceeding 6 kbp.
Classification of the 65 Transcripts That Did Not Match to Human Genome
| Classes | Number of assembled sequences | Number of ESTs |
|---|---|---|
| Transcripts matched to human sequences from nt database | 11 | 11 |
| SARS‐CoV RNA sequences | 10 | 162 |
| Sequences similar to pneumonia‐related microbes | 12 | 13 |
| Non‐information transcripts | 32 | 32 |
| Total | 65 | 218 |
The classification was based on the annotations of the best matching homologs during BlastN searches of the 65 transcripts against the NCBI nt database.
The class “Non‐information transcripts” includes sequences that did not match any known sequence (E‐value >0.001).
Transcripts With Significant Similarity to Pneumonia‐Related Microbes
| Accession | Sequence length (bp) | Nt accession | Species | Score | Coverage (%) | E‐Value | Identity (%) |
|---|---|---|---|---|---|---|---|
| a0_000908 | 138 | EF061771.1 | | 183 | 80 | 1E−43 | 96 |
| da0_004802 | 128 | AF443616.3 | | 226 | 100 | 1E−56 | 99 |
| b0_000821 | 135 | AF443616.3 | | 232 | 99 | 1E−57 | 98 |
| b0_000840 | 212 | AF125581.1 | | 347 | 100 | 2E−92 | 93 |
| Contig267 | 388 | AY737013.1 | | 695 | 100 | 0.0 | 99 |
| de0_010065 | 626 | AE004091.2 | | 325 | 72 | 2E−34 | 92 |
| de0_012036 | 427 | AF440524.1 | | 187 | 78 | 4E−44 | 72 |
| de0_003139 | 44 | AY956411.1 | | 50.1 | 56 | 4E−04 | 100 |
| de0_007533 | 112 | AJ746243.1 | | 141 | 91 | 5E−31 | 92 |
| de0_001693 | 217 | BX640425.1 | | 150 | 95 | 7E−35 | 76 |
| db0_001118 | 181 | CP000408.1 | | 269 | 100 | 1E−69 | 90 |
| db0_003991 | 129 | AF269487.1 | | 129 | 98 | 2E−27 | 82 |
Contig267 was assembled from 2 ESTs.
Figure 2. Statistics of BLAST searches. a: BlastP search of assembled sequences against the NCBI RefSeq human protein dataset. b: BlastN search of assembled sequences against the NCBI RefSeq human RNA dataset.
Novel Transcriptional Loci Addressed by UCSC Human Genome Browser
| Blat identity (%) | Loci without EST signal | Loci without EST + method1 signals | Loci without EST + method2 signals | Loci without EST + method3 signals | Loci without EST + method1 + method2 signals | Loci without EST + method1 + method2 + method3 signals |
|---|---|---|---|---|---|---|
| >97 | 1,364 | 443 | 69 | 15 | 37 | 3 |
| >98 | 1,318 | 425 | 64 | 15 | 35 | 3 |
| >99 | 1,179 | 378 | 56 | 11 | 31 | 2 |
| =100 | 448 | 139 | 21 | 3 | 13 | 1 |
Method1—gene chips or tiling arrays.
Method2—paired‐end ditags (PETs).
Method3—RNA‐Seq.
Transcripts With Significant Similarity to MicroRNAs
| Accession | Length (bp) | MicroRNA | MiR length (bp) | Coverage (%) | E‐Value | Identity (%) |
|---|---|---|---|---|---|---|
| a0_001869 | 342 | hsa‐mir‐1268 | 52 | 100 | 2E−11 | 90.38 |
| Contig135 | 2,438 | hsa‐mir‐1268 | 52 | 100 | 1E−10 | 90.38 |
| Contig161 | 1,589 | hsa‐mir‐1268 | 52 | 100 | 7E−11 | 90.38 |
| Contig167 | 1,344 | hsa‐mir‐1268 | 52 | 100 | 6E−11 | 90.38 |
| Contig203 | 2,565 | hsa‐mir‐1268 | 52 | 100 | 1E−10 | 90.38 |
| Contig223 | 1,208 | hsa‐mir‐1268 | 52 | 100 | 5E−11 | 90.38 |
| Contig241 | 1,696 | hsa‐mir‐1268 | 52 | 100 | 3E−13 | 92.31 |
| Contig68 | 1,673 | hsa‐mir‐1268 | 52 | 100 | 8E−11 | 90.38 |
| da0_001367 | 504 | hsa‐mir‐1268 | 52 | 100 | 9E−14 | 92.31 |
| de_178 | 570 | hsa‐mir‐1268 | 52 | 100 | 3E−11 | 90.38 |
| de0_00060 | 569 | hsa‐mir‐1268 | 52 | 100 | 3E−11 | 90.38 |
| de0_003191 | 683 | hsa‐mir‐1268 | 52 | 100 | 5E−16 | 94.23 |
| de0_003984 | 701 | hsa‐mir‐1268 | 52 | 100 | 3E−11 | 90.38 |
| de0_004380 | 601 | hsa‐mir‐1268 | 52 | 100 | 1E−13 | 92.31 |
| de0_005202 | 701 | hsa‐mir‐1268 | 52 | 100 | 3E−11 | 90.38 |
| de0_006649 | 602 | hsa‐mir‐1268 | 52 | 100 | 1E−13 | 92.31 |
| de0_012147 | 743 | hsa‐mir‐1268 | 52 | 98.08 | 1E−10 | 90.2 |
| Contig135 | 2,438 | hsa‐mir‐566 | 94 | 95.74 | 1E−28 | 92.22 |
| Contig223 | 1,208 | hsa‐mir‐566 | 94 | 95.74 | 2E−26 | 91.11 |
| Contig232 | 1,131 | hsa‐mir‐566 | 94 | 95.74 | 2E−26 | 91.11 |
| Contig251 | 814 | hsa‐mir‐566 | 94 | 95.74 | 1E−26 | 91.11 |
| de0_008873 | 649 | hsa‐mir‐566 | 94 | 95.74 | 9E−27 | 91.11 |
| Contig135 | 2,438 | ptr‐mir‐566 | 93 | 96.77 | 1E−28 | 92.22 |
| Contig223 | 1,208 | ptr‐mir‐566 | 93 | 96.77 | 2E−26 | 91.11 |
| Contig232 | 1,131 | ptr‐mir‐566 | 93 | 96.77 | 2E−26 | 91.11 |
| Contig251 | 814 | ptr‐mir‐566 | 93 | 96.77 | 1E−26 | 91.11 |
| de0_008873 | 649 | ptr‐mir‐566 | 93 | 96.77 | 9E−27 | 91.11 |
Statistics of Transcripts With Significant Similarity to Different Datasets of RNAdb
| Datasets of RNAdb | rnaz | ncrnascan | evofold | fantom3 | asoverlaps | snorna | hinv | combinedlit | pirna |
|---|---|---|---|---|---|---|---|---|---|
| Number of matched transcripts | 56 | 1 | 26 | 42 | 18 | 0 | 11 | 12 | 447 |
For the dataset “pirna”, cutoff values of coverage = 100%, identity = 100%, and E‐value < 1E−7 were used; for the other datasets, the cutoff values were coverage > 90%, identity > 90%, and E‐value < 1E−7.
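The cutoffs in this footnote amount to a simple filter over BLAST hits. The following is a minimal sketch, assuming each hit has already been reduced to a record holding coverage (%), identity (%), and E‐value; the `Hit` record, the `passes_cutoff` helper, and the example values are hypothetical and not taken from the study.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical, simplified BLAST hit record (not the study's format)."""
    query: str
    dataset: str
    coverage: float  # percent of the database sequence covered
    identity: float  # percent identity
    evalue: float


E_MAX = 1e-7  # E-value cutoff shared by all datasets


def passes_cutoff(hit: Hit) -> bool:
    """Apply the dataset-specific cutoffs given in the table footnote."""
    if hit.evalue >= E_MAX:
        return False
    if hit.dataset == "pirna":
        # Stricter rule for the pirna dataset: exact, full-length matches only.
        return hit.coverage == 100.0 and hit.identity == 100.0
    return hit.coverage > 90.0 and hit.identity > 90.0


# Illustrative values only.
hits = [
    Hit("q1", "pirna", 100.0, 100.0, 1e-9),
    Hit("q2", "pirna", 100.0, 96.0, 1e-9),
    Hit("q3", "rnaz", 95.0, 92.5, 1e-12),
    Hit("q4", "rnaz", 95.0, 92.5, 1e-3),
]
kept = [h.query for h in hits if passes_cutoff(h)]
print(kept)
```

Keeping the E‐value check first makes the shared condition explicit, so only the coverage and identity thresholds vary by dataset.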
Figure 3. Venn diagram of the transcripts with similarities to sequences in coding and non‐coding datasets, showing the coding sequence class, the non‐coding sequence class, and an overlap of 281 transcripts shared between the two classes.
Gene Ontology Comparison of the SARS‐CoV Library against Other Six Human Lung Tissue Libraries
| Categories | SARS‐CoV Inf | MGC101 (epidermoid carcinoma) | MGC69 (large cell carcinoma) | dbEST16438 (fetal fibroblast) | UICFEC1 (normal, adult + fetal) | UICFDU1 (normal, adult) | dbEST249 (normal, 72 year old) | P‐value |
|---|---|---|---|---|---|---|---|---|
| Cellular component | 261 (100%) | 2,735 (100%) | 1,962 (100%) | 962 (100%) | 1,846 (100%) | 1,376 (100%) | 951 (100%) | |
| Cell | 239 (91.57%)↓ | 2,620 (95.79%) | 1,894 (96.53%) | 917 (95.32%) | 1,721 (93.22%) | 1,306 (94.91%) | 872 (91.69%) | 0.028 |
| Virion | 18 (6.89%)↑ | 1 (0.03%) | 0 (0.00%) | 0 (0.00%) | 3 (0.16%) | 1 (0.07%) | 5 (0.52%) | 0.027 |
| Envelope | 8 (3.06%)↓ | 126 (4.60%) | 413 (21.04%) | 43 (4.46%) | 85 (4.60%) | 81 (5.88%) | 47 (4.94%) | 0.027 |
| Macromolecular complex | 39 (14.94%)↓ | 779 (28.48%) | 772 (39.34%) | 237 (24.63%) | 503 (27.24%) | 411 (29.86%) | 294 (30.91%) | 0.028 |
| Organelle | 131 (50.19%)↓ | 1,850 (67.64%) | 1,439 (73.34%) | 633 (65.80%) | 1,163 (63.00%) | 910 (66.13%) | 576 (60.56%) | 0.028 |
| Extracellular region part | 20 (7.66%)↑ | 88 (3.21%) | 54 (2.75%) | 42 (4.36%) | 96 (5.20%) | 31 (2.25%) | 50 (5.25%) | 0.028 |
| Organelle part | 68 (26.05%)↓ | 836 (30.56%) | 818 (41.69%) | 263 (27.33%) | 517 (28.00%) | 422 (30.66%) | 282 (29.65%) | 0.028 |
| Virion part | 18 (6.89%)↑ | 1 (0.03%) | 0 (0.00%) | 0 (0.00%) | 3 (0.16%) | 1 (0.07%) | 5 (0.52%) | 0.027 |
| Synapse part | 1 (0.38%)↑ | 4 (0.14%) | 1 (0.05%) | 0 (0.00%) | 1 (0.05%) | 0 (0.00%) | 3 (0.31%) | 0.027 |
| Cell part | 239 (91.57%)↓ | 2,620 (95.79%) | 1,894 (96.53%) | 917 (95.32%) | 1,721 (93.22%) | 1,306 (94.91%) | 872 (91.69%) | 0.028 |
| Synapse | 3 (1.14%)↑ | 8 (0.29%) | 3 (0.15%) | 3 (0.31%) | 7 (0.37%) | 0 (0.00%) | 4 (0.42%) | 0.028 |
| Molecular function | 381 (100%) | 3,129 (100%) | 2,058 (100%) | 1,077 (100%) | 2,106 (100%) | 1,593 (100%) | 1,121 (100%) | |
| Motor activity | 10 (2.62%)↑ | 35 (1.11%) | 16 (0.77%) | 16 (1.48%) | 19 (0.90%) | 16 (1.00%) | 11 (0.98%) | 0.028 |
| Auxiliary transport protein activity | 1 (0.26%)↑ | 0 (0.00%) | 0 (0.00%) | 2 (0.18%) | 3 (0.14%) | 2 (0.12%) | 0 (0.00%) | 0.026 |
| Chaperone regulator activity | 0 (0.00%)↓ | 2 (0.06%) | 2 (0.09%) | 0 (0.00%) | 5 (0.23%) | 4 (0.25%) | 1 (0.08%) | 0.043 |
| Enzyme regulator activity | 9 (2.36%)↓ | 148 (4.72%) | 63 (3.06%) | 60 (5.57%) | 86 (4.08%) | 60 (3.76%) | 54 (4.81%) | 0.028 |
| Translation regulator activity | 2 (0.52%)↓ | 81 (2.58%) | 55 (2.67%) | 25 (2.32%) | 43 (2.04%) | 41 (2.57%) | 30 (2.67%) | 0.027 |
| Molecular transducer activity | 42 (11.02%)↑ | 176 (5.62%) | 108 (5.24%) | 81 (7.52%) | 122 (5.79%) | 84 (5.27%) | 76 (6.77%) | 0.028 |
| Biological process | 338 (100%) | 2,746 (100%) | 1,943 (100%) | 975 (100%) | 1,901 (100%) | 1,403 (100%) | 1,020 (100%) | |
| Cell killing | 0 (0.00%)↓ | 13 (0.47%) | 3 (0.15%) | 0 (0.00%) | 5 (0.26%) | 2 (0.14%) | 1 (0.09%) | 0.043 |
| Immune system process | 8 (2.36%)↓ | 108 (3.93%) | 66 (3.39%) | 37 (3.79%) | 94 (4.94%) | 59 (4.20%) | 79 (7.74%) | 0.028 |
| Metabolic process | 220 (65.08%)↓ | 1,856 (67.58%) | 1,474 (75.86%) | 669 (68.61%) | 1,331 (70.01%) | 1,027 (73.20%) | 743 (72.84%) | 0.028 |
| Viral reproduction | 4 (1.18%)↑ | 11 (0.40%) | 12 (0.61%) | 2 (0.20%) | 5 (0.26%) | 8 (0.57%) | 6 (0.58%) | 0.028 |
| Multicellular organismal process | 58 (17.15%)↑ | 373 (13.58%) | 218 (11.21%) | 131 (13.43%) | 289 (15.20%) | 165 (11.76%) | 133 (13.03%) | 0.028 |
| Developmental process | 56 (16.56%)↓ | 678 (24.69%) | 331 (17.03%) | 237 (24.30%) | 442 (23.25%) | 294 (20.95%) | 205 (20.09%) | 0.028 |
| Growth | 4 (1.18%)↓ | 62 (2.25%) | 23 (1.18%) | 14 (1.43%) | 32 (1.68%) | 18 (1.28%) | 14 (1.37%) | 0.043 |
| Locomotion | 0 (0.00%)↓ | 21 (0.76%) | 3 (0.15%) | 7 (0.71%) | 7 (0.36%) | 2 (0.14%) | 4 (0.39%) | 0.028 |
| Rhythmic process | 0 (0.00%)↓ | 12 (0.43%) | 3 (0.15%) | 5 (0.51%) | 5 (0.26%) | 5 (0.35%) | 1 (0.09%) | 0.028 |
| Response to stimulus | 21 (6.21%)↓ | 322 (11.72%) | 225 (11.58%) | 128 (13.12%) | 247 (12.99%) | 178 (12.68%) | 151 (14.80%) | 0.028 |
| Localization | 58 (17.15%)↓ | 594 (21.63%) | 533 (27.43%) | 232 (23.79%) | 357 (18.77%) | 251 (17.89%) | 184 (18.03%) | 0.028 |
| Establishment of localization | 53 (15.68%)↓ | 514 (18.71%) | 489 (25.16%) | 202 (20.71%) | 317 (16.67%) | 232 (16.53%) | 163 (15.98%) | 0.028 |
| Maintenance of localization | 0 (0.00%)↓ | 18 (0.65%) | 14 (0.72%) | 7 (0.71%) | 6 (0.31%) | 10 (0.71%) | 8 (0.78%) | 0.027 |
| Biological regulation | 65 (19.23%)↓ | 876 (31.90%) | 431 (22.18%) | 317 (32.51%) | 581 (30.56%) | 377 (26.87%) | 277 (27.15%) | 0.028 |
Only second‐level GO categories with significant relative count differences, and in which the relative count of the SARS‐CoV library was the highest or the lowest among the seven libraries, were considered and discussed. Significance was assessed with the Wilcoxon signed‐rank test (P‐value < 0.05).
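The footnote names the test but not how it was set up. The sketch below is one possible reading, offered as an assumption rather than the authors' documented procedure: the relative count of the SARS‐CoV library is treated as a reference value, the six other libraries' relative counts are compared against it, zero differences are dropped, and a two‐sided P‐value is taken from the normal approximation with tie correction and no continuity correction. The function name `wilcoxon_signed_rank_p` is hypothetical.

```python
import math


def wilcoxon_signed_rank_p(reference, others, ndigits=6):
    """Two-sided Wilcoxon signed-rank P-value (normal approximation,
    tie-corrected, no continuity correction). Zero differences are dropped."""
    diffs = [round(x - reference, ndigits) for x in others]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


# Percentages copied from the "Motor activity" and "Growth" rows of the table.
motor = wilcoxon_signed_rank_p(2.62, [1.11, 0.77, 1.48, 0.90, 1.00, 0.98])
growth = wilcoxon_signed_rank_p(1.18, [2.25, 1.18, 1.43, 1.68, 1.28, 1.37])
print(round(motor, 3), round(growth, 3))  # 0.028 0.043
```

Under these assumptions the two rows give 0.028 and 0.043, which agrees with the last column of the table for those rows; the lower value for "Growth" arises because one library ties with the SARS‐CoV library and is dropped, leaving five differences. This agreement is suggestive only and does not confirm how the published values were computed.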
Statistical Comparison of Transcripts Matching to Annotated Sequences Among the Seven Libraries
| Libraries | EST numbers | Numbers of assembled sequences (ESTs per assembled sequence) | Numbers of sequences matched to IPI proteins (% of assembled sequences) | Numbers of matched sequences with GO assignments (% of matched sequences) |
|---|---|---|---|---|
| SARS‐CoV Inf | 10,594 | 7,872 (1.35) | 1,689 (21.45%) | 425 (25.18%) |
| MGC101 | 9,166 | 5,250 (1.75) | 4,446 (84.69%) | 3,391 (76.27%) |
| MGC69 | 9,748 | 5,366 (1.82) | 2,929 (54.58%) | 2,249 (76.78%) |
| UICFEC1 | 12,971 | 7,455 (1.74) | 3,509 (47.07%) | 2,335 (66.54%) |
| dbEST16438 | 3,339 | 1,598 (2.09) | 1,511 (94.56%) | 1,181 (78.16%) |
| UICFDU1 | 12,742 | 7,383 (1.73) | 2,861 (38.75%) | 1,766 (61.73%) |
| dbEST249 | 13,244 | 7,244 (1.83) | 2,081 (28.73%) | 1,229 (59.06%) |
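The values in parentheses in this table are ratios of the neighbouring counts. The sketch below shows that arithmetic for two rows, assuming the denominators implied by the column headers (assembled sequences for the IPI percentage, IPI‐matched sequences for the GO percentage); the helper name `derived_columns` is hypothetical.

```python
def derived_columns(ests, assembled, ipi_matched, go_assigned):
    """Return (ESTs per assembled sequence, % matched to IPI, % with GO)."""
    return (
        round(ests / assembled, 2),
        round(100 * ipi_matched / assembled, 2),
        round(100 * go_assigned / ipi_matched, 2),
    )


# Counts copied from the MGC101 and dbEST16438 rows.
mgc101 = derived_columns(9166, 5250, 4446, 3391)
dbest16438 = derived_columns(3339, 1598, 1511, 1181)
print(mgc101)      # (1.75, 84.69, 76.27)
print(dbest16438)  # (2.09, 94.56, 78.16)
```

Both rows reproduce the listed values. The SARS‐CoV Inf row differs slightly in the second decimal under this reading (for example, 425 / 1,689 gives 25.16% against the listed 25.18%), so the denominators assumed here may not match the authors' exactly.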