Zehua Zeng, Yuzhe Xiong, Wenhuan Guo, Hongwu Du.
Abstract
In gene expression analysis, differences between samples and in experimental handling are common, and they can sometimes introduce serious errors into the results or even render them meaningless. Efficiently finding suitable internal reference genes to eliminate these errors is a challenge. Moreover, no Python package has been available for screening endogenous reference genes. Here, we introduce ERgene, a Python library for screening endogenous reference genes. It offers high computational efficiency and simple operation steps. Its principle is based on the inverse process of the internal reference method, and its robust matrix block operations make the selection of internal reference genes faster than other methods.
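To make the "internal reference method" mentioned above concrete, here is a minimal sketch of reference-gene normalization on the linear scale: every gene's value in a sample is divided by the reference gene's value in that same sample. This is not ERgene's API or algorithm; the function name `normalize_by_reference`, the gene labels, and the toy values are all hypothetical and chosen only for illustration.

```python
def normalize_by_reference(expression, reference_gene):
    """Divide each gene's per-sample values by the reference gene's values.

    expression: dict mapping gene name -> list of per-sample expression values
    reference_gene: key of the gene used as the internal reference
    (Hypothetical helper for illustration; not part of ERgene.)
    """
    ref = expression[reference_gene]
    if any(value == 0 for value in ref):
        raise ValueError("reference gene has zero expression in a sample")
    return {
        gene: [value / r for value, r in zip(values, ref)]
        for gene, values in expression.items()
    }


# Toy data: three samples whose overall signal differs by a scale factor.
toy = {
    "REF": [10.0, 20.0, 40.0],
    "GENE_A": [5.0, 10.0, 20.0],   # tracks the reference, so it is stable
    "GENE_B": [10.0, 40.0, 40.0],  # changes relative to the reference
}

normalized = normalize_by_reference(toy, "REF")
for gene, values in normalized.items():
    print(gene, values)
```

After normalization, a gene that only follows the sample-wide scale factor becomes constant across samples, while a gene with a real relative change still stands out. Screening for reference genes runs this reasoning in reverse: it looks for genes whose use as the reference would leave the remaining data consistent.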
Year: 2020 PMID: 33122769 PMCID: PMC7596506 DOI: 10.1038/s41598-020-75586-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) The boxplot of the test data before processing. (b) The density plot of the test data before processing. (c) The boxplot of the test data after processing. (d) The density plot of the test data after processing. The boxplot's abscissa is the sample, the ordinate is the gene expression, the green line is the median, the blue box line is the quartile, and the black point is the outlier. The density plot's abscissa is the length of the data, the ordinate is the data density, and the lines of different colors represent different samples.
Figure 2. (a) The boxplot of GSE4786-MC before processing. (b) The density plot of GSE4786-MC before processing. (c) The boxplot of GSE4786-MC after processing. (d) The density plot of GSE4786-MC after processing. The boxplot's abscissa is the sample, the ordinate is the gene expression, the green line is the median, the blue box line is the quartile, and the black point is the outlier. The density plot's abscissa is the length of the data, the ordinate is the data density, and the lines of different colors represent different samples.
Figure 3. The UpSet plot of the overlap among 66 sample pairs (GSE125792, 12 samples). The abscissa represents the sample pair, and the ordinate represents the appearance of the candidate internal reference genes. The height of the column in the upper bar chart represents the number of the sample pairs. In the upper bar chart, the height of the column represents the ordinal number of the sample pair. The higher the ordinal number, the higher the column.
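The figure of 66 sample pairs follows from choosing 2 of the 12 samples. The short sketch below checks that count and shows one way candidate genes could be tallied across pairs. The sample labels, gene names, and candidate sets are toy placeholders, not data from GSE125792, and the tallying is an assumption about how such an overlap might be summarized rather than ERgene's own procedure.

```python
from collections import Counter
from itertools import combinations

# Twelve placeholder sample labels (hypothetical names).
samples = [f"sample_{i}" for i in range(1, 13)]

# Every unordered pair of samples: 12 choose 2.
sample_pairs = list(combinations(samples, 2))
n_pairs = len(sample_pairs)
print("number of sample pairs:", n_pairs)

# Toy candidate reference genes for three of the pairs (made-up values).
candidates = {
    sample_pairs[0]: {"GENE_A", "GENE_B", "GENE_C"},
    sample_pairs[1]: {"GENE_A", "GENE_C"},
    sample_pairs[2]: {"GENE_A", "GENE_D"},
}

# How many pairs each candidate gene appears in.
appearances = Counter(gene for genes in candidates.values() for gene in genes)

# Genes that are candidates in every listed pair.
shared_by_all = set.intersection(*candidates.values())

print(appearances.most_common())
print("shared by all listed pairs:", shared_by_all)
```

A gene that appears as a candidate in most or all sample pairs is a more convincing internal reference than one that appears in only a few.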
Normfinder versus ERgene in computational time.
| Number of genes | Normfinder (2 samples) | ERgene (2 samples) | Normfinder (3 samples) | ERgene (3 samples) | Normfinder (4 samples) | ERgene (4 samples) |
|---|---|---|---|---|---|---|
| 100 | 0.1 s | 0.1 s | 0.5 s | 0.66 s | 1 s | 1.37 s |
| 500 | 35 s | 0.48 s | 42 s | 1.35 s | 44 s | 2.67 s |
| 1000 | 6 min | 1.05 s | 6 min 11 s | 3.20 s | 6 min 40 s | 10.89 s |
| 2000 | 55 min | 4.58 s | 55 min | 10.95 s | 56 min | 28.47 s |
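To put the timings in perspective, the sketch below converts the 2000-gene row of the table into seconds and computes the ratio of Normfinder time to ERgene time. The helper `to_seconds` is a hypothetical convenience function written for this example; only the timing strings come from the table.

```python
import re


def to_seconds(text):
    """Convert strings such as '55 min', '4.58 s' or '6 min 11 s' to seconds."""
    minutes = re.search(r"([\d.]+)\s*min", text)
    seconds = re.search(r"([\d.]+)\s*s\b", text)
    total = 0.0
    if minutes:
        total += 60 * float(minutes.group(1))
    if seconds:
        total += float(seconds.group(1))
    return total


# 2000-gene row of the table: (Normfinder, ERgene) per sample count.
row_2000 = {
    2: ("55 min", "4.58 s"),
    3: ("55 min", "10.95 s"),
    4: ("56 min", "28.47 s"),
}

speedups = {
    n_samples: to_seconds(normfinder) / to_seconds(ergene)
    for n_samples, (normfinder, ergene) in row_2000.items()
}

for n_samples, ratio in speedups.items():
    print(f"{n_samples} samples: about {ratio:.0f}x faster")
```

The ratio is largest for 2 samples and shrinks as samples are added, because ERgene's time grows with the number of samples while Normfinder's stays nearly flat. The 100-gene row shows the other end of the range, where the two tools are comparable and ERgene is slightly slower for 3 and 4 samples.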
Comparison of internal reference genes found. Genes in bold are identical; genes in italics are in the same family. The test data are not converted by a probe.
| Dataset | Normfinder & geNorm | ERgene |
|---|---|---|
| Test dataset ( | P47754, | |
| Horrison dataset | ATP5B, ACTB, B2M, | |
| McLoughlin dataset |