Maria Suntsova1, Nurshat Gaifullin2, Daria Allina3, Alexey Reshetun4, Xinmin Li5, Larisa Mendeleeva6, Vadim Surin6, Anna Sergeeva6, Pavel Spirin7, Vladimir Prassolov7, Alexander Morgan8, Andrew Garazha9,10, Maxim Sorokin11,12, Anton Buzdin9,10,13.
Abstract
Comprehensive analysis of molecular pathology requires a collection of reference samples representing normal tissues from healthy donors. The limited collections of normal tissues available from post-mortem donors suffer from data incompatibility: datasets generated on different experimental platforms often cannot be merged into a single panel. Here, we constructed and deposited a gene expression database of normal human tissues based on uniformly screened original sequencing data. In total, 142 solid tissue samples representing 20 organs were taken, no later than 36 hours after death, from healthy human donors of different ages killed in road accidents. Blood samples were taken from 17 healthy volunteers. We then compared them with 758 transcriptomic profiles taken from other databases. We found that, overall, 463 biosamples showed tissue-specific rather than platform- or database-specific clustering and could be aggregated into a single database termed the Oncobox Atlas of Normal Tissue Expression (ANTE). Our data will be useful to all those working on the analysis of human gene expression.
Year: 2019 PMID: 31015567 PMCID: PMC6478850 DOI: 10.1038/s41597-019-0043-4
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1. The hierarchical clustering dendrogram of all experimental RNA sequencing profiles of human tissues. Gene expression data were used to calculate Euclidean distances between the samples. Color indicates the sample preparation method (tissue in FFPE, RNA in ethanol, tissue in RNAlater). The lower scale indicates the number of uniquely mapped reads. QC denotes the quality control threshold of 2.5 million uniquely mapped reads.
Fig. 2. The distribution of the experimental RNA sequencing profiles with respect to the number of uniquely mapped reads. The vertical dashed line indicates the QC threshold of 2.5 million uniquely mapped reads per sample.
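The QC rule named in the captions above can be sketched as a simple read-depth filter. This is a minimal illustration, not the authors' pipeline: the sample IDs and read counts are hypothetical, and treating the 2.5 million threshold as inclusive (`>=`) is an assumption, since the source does not say how a profile exactly at the threshold is handled.

```python
# Threshold stated in the figure captions: 2.5 million uniquely mapped reads.
QC_MIN_UNIQUELY_MAPPED = 2_500_000


def passes_qc(uniquely_mapped_reads, threshold=QC_MIN_UNIQUELY_MAPPED):
    """Return True when a profile reaches the read-depth threshold.

    Assumption: the threshold is inclusive; the source does not specify this.
    """
    return uniquely_mapped_reads >= threshold


def split_by_qc(read_counts):
    """Split a {sample_id: uniquely_mapped_reads} mapping into (passed, failed)."""
    passed = [s for s, n in read_counts.items() if passes_qc(n)]
    failed = [s for s, n in read_counts.items() if not passes_qc(n)]
    return passed, failed


# Hypothetical sample IDs and read counts, for illustration only.
example_counts = {
    "sample_A": 7_800_000,
    "sample_B": 1_200_000,
    "sample_C": 2_500_000,
}
passed, failed = split_by_qc(example_counts)
```

With these made-up counts, `sample_B` is the only profile that falls below the threshold.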
Human tissue samples included in the RNA sequencing assay.
| Tissue | # of samples | # of samples passed QC |
|---|---|---|
| Adrenal gland | 6 | 5 |
| Bladder | 5 | 4 |
| Brain | 9 | 7 |
| Cervix | 4 | 4 |
| Colon | 12 | 7 |
| Esophagus | 8 | 7 |
| Kidney | 8 | 6 |
| Liver | 8 | 7 |
| Lung | 8 | 7 |
| Mammary gland | 5 | 5 |
| Normal CD138+ cells | 11 | 10 |
| Ovary | 4 | 4 |
| Pancreas | 8 | 6 |
| Prostate | 6 | 6 |
| Skeletal muscle | 6 | 6 |
| Skin | 6 | 6 |
| Small intestine | 9 | 5 |
| Stomach | 15 | 10 |
| Thyroid gland | 6 | 6 |
| Tonsil | 7 | 6 |
| Uterus (myometrium) | 2 | 2 |
| Whole blood nuclear cells | 6 | 6 |
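As a consistency check, the per-tissue counts in the table above can be tallied and compared with the abstract, which reports 142 solid tissue samples plus blood samples from 17 volunteers. The sketch below only copies the table values; the variable names are illustrative.

```python
# (tissue, samples sequenced, samples passing QC), copied from the table above.
TABLE = [
    ("Adrenal gland", 6, 5), ("Bladder", 5, 4), ("Brain", 9, 7),
    ("Cervix", 4, 4), ("Colon", 12, 7), ("Esophagus", 8, 7),
    ("Kidney", 8, 6), ("Liver", 8, 7), ("Lung", 8, 7),
    ("Mammary gland", 5, 5), ("Normal CD138+ cells", 11, 10),
    ("Ovary", 4, 4), ("Pancreas", 8, 6), ("Prostate", 6, 6),
    ("Skeletal muscle", 6, 6), ("Skin", 6, 6), ("Small intestine", 9, 5),
    ("Stomach", 15, 10), ("Thyroid gland", 6, 6), ("Tonsil", 7, 6),
    ("Uterus (myometrium)", 2, 2), ("Whole blood nuclear cells", 6, 6),
]

total_samples = sum(n for _, n, _ in TABLE)
total_passed = sum(p for _, _, p in TABLE)

# Fraction of profiles per tissue that reached the QC threshold.
pass_rate = {tissue: p / n for tissue, n, p in TABLE}
lowest_tissue = min(pass_rate, key=pass_rate.get)

print(total_samples, total_passed, lowest_tissue)
```

The table sums to 159 sequenced samples, which agrees with 142 + 17 from the abstract, and 132 of them passed QC. Small intestine has the lowest pass rate (5 of 9).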
Fig. 3. The hierarchical clustering dendrogram of QC-passed experimental RNA sequencing profiles of human tissues. Gene expression data were used to calculate Euclidean distances between the samples. The color markers indicate the tissue types. The lower scale indicates the number of uniquely mapped reads. ‘QC’ denotes the quality control threshold of 2.5 million uniquely mapped reads.
Fig. 4. RIN vs number of uniquely mapped reads per sample. Spearman’s rho = 0.344 (p-value = 9.687e-06). The horizontal dashed line indicates the QC threshold of 2.5 million uniquely mapped reads.
Fig. 5. RNA concentration vs number of uniquely mapped reads per sample. Spearman’s rho = 0.03 (p-value = 0.8). The horizontal dashed line indicates the QC threshold of 2.5 million uniquely mapped reads.
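For readers unfamiliar with the statistic quoted in the two captions above, Spearman's rho is the Pearson correlation of the rank-transformed values. The sketch below is a plain-Python illustration with hypothetical RIN and read-count values; it does not reproduce the reported coefficients and does not compute a p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical RIN values and uniquely mapped read counts, for illustration only.
rin = [2.1, 4.5, 6.0, 7.2, 8.8]
reads = [1.9e6, 1.1e6, 5.2e6, 4.0e6, 9.3e6]
rho = spearman_rho(rin, reads)
```

In practice a library routine such as `scipy.stats.spearmanr` would typically be used, since it also returns a p-value; the source does not state which software produced the reported values.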
Fig. 6. The correlation plots for four gene expression profiles in replicate RNA sequencing experiments. (a) Comparison for the esophagus, E_3 tissue biosample. (b) Comparison for the liver, ID 15_6 tissue biosample. The upper part of the diagonally split matrix shows correlation coefficients (Spearman’s rho). The lower part shows pairwise plots of gene expression values on a logarithmic scale for every pair of replicates under comparison.
Fig. 7. The dendrogram of normal samples from Oncobox (no prefix in sample names), the TCGA database (“TCGA_” prefix in sample names) and ENCODE (“ENCODE_” prefix in sample names). For the TCGA data, 10 random samples per tissue type were selected for visualization when more normal samples were available. Euclidean distance between the samples was measured using gene expression data. The dendrogram was built using the ward.D2 method in R. The color markers indicate the tissue type. The lower scales indicate the number of uniquely mapped reads.
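The idea of tissue-specific rather than database-specific grouping can be illustrated with a much simpler proxy than the Ward dendrogram: compute Euclidean distances between profiles and check whether each sample's nearest neighbor comes from the same tissue. This is a sketch only, not the clustering used in the source. The expression values are hypothetical toy numbers, the `log2(x + 1)` transform is an assumption (the caption does not say how values were scaled before distances were computed), and only the "TCGA_" naming convention is taken from the caption.

```python
import math


def log_transform(profile):
    """Assumed preprocessing: log2(x + 1) on each expression value."""
    return [math.log2(v + 1.0) for v in profile]


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(profiles):
    """Map each sample name to the name of its closest other sample."""
    names = list(profiles)
    result = {}
    for name in names:
        others = [o for o in names if o != name]
        result[name] = min(
            others, key=lambda o: euclidean(profiles[name], profiles[o])
        )
    return result


# Hypothetical toy expression values for three genes, for illustration only.
raw = {
    "liver_1": [900, 20, 5],
    "TCGA_liver_1": [850, 25, 8],
    "lung_1": [10, 700, 40],
    "TCGA_lung_1": [15, 650, 35],
}
profiles = {name: log_transform(values) for name, values in raw.items()}
neighbors = nearest_neighbor(profiles)
```

In this toy example each sample's nearest neighbor is the same tissue from the other source, which is the pattern the dendrogram is meant to show at scale.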
Overview of Oncobox Atlas of Normal Tissue Expression (ANTE) database.
| Tissue | Number of samples |
|---|---|
| Adrenal gland | 12 |
| Brain | 12 |
| Esophagus | 13 |
| Kidney | 136 |
| Liver | 60 |
| Lung | 123 |
| Ovary | 8 |
| Pancreas | 9 |
| Prostate | 7 |
| Skin | 14 |
| Thyroid gland | 69 |

| Metadata field | Value |
|---|---|
| Design Type(s) | gene expression analysis objective • data integration objective • organism part comparison design • transcription profiling design |
| Measurement Type(s) | transcription profiling assay |
| Technology Type(s) | RNA sequencing |
| Factor Type(s) | sex • age • organism subdivision |
| Sample Characteristic(s) | Homo sapiens • kidney • Kidney • Colon • Liver • brain • Lung • endometrium • Ovary • prostate gland • Esophagus • Stomach • Mammary gland • Thyroid gland • Pancreas • Tonsil • skeletal muscle tissue • small intestine • Adrenal gland • urinary bladder • skin of body • uterus • uterine cervix • Small intestine • tonsil • blood • bone marrow • esophagus • liver |

Primary characteristics of tissue biosamples
| Donor ID | Sex | Age (years) | Tissues collected |
|---|---|---|---|
| FFPE-1 | Female | 29 | Brain, Cervix, Colon, Esophagus, Ovary, Kidney, Liver, Stomach |
| FFPE-2 | Male | 27 | Brain, Esophagus, Kidney, Liver, Prostate, Stomach, Colon |
| FFPE-3 | Male | 47 | Brain, Colon, Esophagus, Kidney, Liver, Lung, Prostate, Stomach |
| FFPE-4 | Male | 33 | Brain, Colon, Esophagus, Kidney, Liver, Lung, Prostate, Stomach |
| FFPE-5 | Female | 50 | Cervix, Colon, Esophagus, Kidney, Liver, Lung, Ovary, Brain, Stomach |
| FFPE-6 | Female | 24 | Liver, Brain, Colon, Esophagus, Kidney, Stomach |
| FFPE-7 | Male | 39 | Lung |
| FFPE-8 | Male | 41 | Lung |
| Later-1 | Male | 36 | Adrenal gland, Colon, Pancreas, Skeletal muscle, Small intestine, Stomach, Thyroid gland, Tonsil |
| Later-10 | Female | 44 | Mammary gland |
| Later-11 | Female | 51 | Mammary gland |
| Later-12 | Female | 42 | Adrenal gland, Bladder, Brain, Cervix, Colon, Esophagus, Lung, Mammary gland, Ovary, Pancreas, Skeletal muscle, Skin, Small intestine, Stomach, Thyroid gland, Tonsil, Uterus (myometrium) |
| Later-13 | Male | 16 | Adrenal gland, Brain, Colon, Pancreas, Prostate, Skin, Small intestine, Stomach, Tonsil |
| Later-14 | Male | 29 | Prostate, Skin, Small intestine, Stomach |
| Later-15 | Male | 49 | Prostate, Skin, Small intestine, Stomach |
| Later-16 | Female | 51 | Cervix, Kidney, Liver, Mammary gland, Ovary, Pancreas, Skin, Small intestine, Stomach, Tonsil, Uterus (myometrium) |
| Later-2 | Male | 20 | Adrenal gland, Colon, Lung, Pancreas, Skeletal muscle, Small intestine, Stomach, Thyroid gland |
| Later-3 | Male | 45 | Adrenal gland, Bladder, Colon, Pancreas, Skeletal muscle, Small intestine, Stomach, Thyroid gland, Tonsil |
| Later-4 | Male | 42 | Bladder, Pancreas, Skeletal muscle, Thyroid gland, Tonsil |
| Later-5 | Male | 44 | Bladder |
| Later-6 | Male | 54 | Bladder |
| Later-8 | Female | 12 | Adrenal gland, Brain, Colon, Esophagus, Kidney, Liver, Lung, Pancreas, Skeletal muscle, Skin, Small intestine, Stomach, Thyroid gland, Tonsil |
| Later-9 | Female | 35 | Mammary gland |
| WB1 | Male | 25 | Peripheral blood mononuclear cells |
| WB2 | Male | 75 | Peripheral blood mononuclear cells |
| WB5 | Female | 24 | Peripheral blood mononuclear cells |
| WB6 | Male | 36 | Peripheral blood mononuclear cells |
| WB7 | Female | 23 | Peripheral blood mononuclear cells |
| WB8 | Female | 31 | Peripheral blood mononuclear cells |
| D1 | Female | 25 | Normal CD138+ cells |
| D2 | Female | 28 | Normal CD138+ cells |
| D3 | Male | 33 | Normal CD138+ cells |
| D4 | Male | 24 | Normal CD138+ cells |
| D5 | Female | 36 | Normal CD138+ cells |
| D6 | Female | 41 | Normal CD138+ cells |
| D7 | Female | 30 | Normal CD138+ cells |
| D8 | Male | 40 | Normal CD138+ cells |
| D9 | Male | 38 | Normal CD138+ cells |
| D10 | Male | 27 | Normal CD138+ cells |
| D11 | Female | 33 | Normal CD138+ cells |