Shintaro Katayama, Tiina Skoog, Cilla Söderhäll, Elisabet Einarsdottir, Kaarel Krjutškov, Juha Kere.
Abstract
BACKGROUND: Standard RNAseq methods using bulk RNA and recent single-cell RNAseq methods use DNA barcodes to identify samples and cells, and the barcoded cDNAs are pooled into a library pool before high-throughput sequencing. In single-cell and low-input RNAseq methods, the library is further amplified by PCR after pooling. Preparing hundreds or more samples for a large study often requires multiple library pools. However, the correlation between expression profiles from different libraries is sometimes low, and batch-effect biases make it difficult to integrate data across library pools.
Keywords: Gene expression; Library bias correction; Next-generation sequencing; Transcriptome
Year: 2019 PMID: 31409293 PMCID: PMC6693229 DOI: 10.1186/s12859-019-3017-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The history of library preparation and sequencing for investigation of the library biases
| Project | Library ID | RNA into capture plate (date) | Synthesis started (date) | Synthesis finished (date) | Quantity (nM) | 1st sequencing (flowcell:lane) | 2nd sequencing (flowcell:lane) | 3rd sequencing (flowcell:lane) |
|---|---|---|---|---|---|---|---|---|
| BEAS | a | 2015/12/30 | 2016/01/04 | 2016/01/07 | 0.50 | A:6 | B:1 | C:1 |
| BEAS | b | 2015/12/30 | 2016/01/04 | 2016/01/07 | 0.26 | A:7 | B:2 | C:2 |
| BEAS | c | 2015/12/30 | 2016/01/04 | 2016/01/07 | 1.71 | A:8 | B:3 | C:3 |
| BEAS | d | 2015/12/30 | 2016/01/04 | 2016/01/07 | 0.82 | B:4 | C:4 | D:5 |
| BEAS | e | 2015/12/30 | 2016/01/08 | 2016/01/11 | 1.11 | B:5 | C:5 | D:6 |
| BEAS | f | 2015/12/30 | 2016/01/08 | 2016/01/11 | 1.10 | B:6 | C:6 | D:7 |
| BEAS | g | 2015/12/30 | 2016/01/08 | 2016/01/11 | 1.01 | B:7 | C:7 | D:8 |
| BEAS | h | 2015/12/30 | 2016/01/08 | 2016/01/11 | 1.40 | B:8 | C:8 | E:1 |
| THP1 | a | 2015/12/04 | 2015/12/07 | 2015/12/09 | 1.50 | A:1 | F:3 | G:2 |
| THP1 | b | 2015/12/04 | 2015/12/07 | 2015/12/09 | 2.10 | A:2 | F:4 | G:3 |
| THP1 | c | 2015/12/04 | 2015/12/08 | 2015/12/10 | 2.00 | F:5 | G:4 | G:5 |
| THP1 | d | 2015/12/04 | 2015/12/08 | 2015/12/10 | 1.90 | A:3 | F:6 | G:6 |
| THP1 | e | 2015/12/07 | 2015/12/08 | 2015/12/10 | 2.90 | A:4 | F:7 | G:7 |
| THP1 | f | 2015/12/07 | 2015/12/08 | 2015/12/10 | 2.70 | A:5 | F:8 | G:8 |
Fig. 1 Biases between BEAS-2B libraries. a Spearman correlation coefficients between the 99 technical replicates over 8 libraries and the similarities; the color scheme for representation of the libraries is common also for the following panels. b PCA on VST [10] normalized expression levels of the replicates. c The top 5 contributing genes to the dimensions. Red dashed lines correspond to the expected value if the contributions are uniform. d The normalized levels of CCDC85B and RRM2
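The replicate-to-replicate comparison in panel a can be illustrated with a minimal sketch of pairwise Spearman correlation. This is not the authors' code: the replicate names and expression values are made-up toy data, the helper functions are hypothetical, and only the Python standard library is assumed.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(profiles):
    """profiles: {replicate name: expression levels in a shared gene order}."""
    return {
        (a, b): spearman(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Toy data (assumption): two replicates from library "a", one from library "b".
toy_profiles = {
    "a_01": [5, 0, 12, 3],
    "a_02": [6, 1, 10, 2],
    "b_01": [1, 9, 2, 8],
}
correlations = pairwise_spearman(toy_profiles)
```

In this toy input the two replicates from the same library rank the genes identically, while the replicate from the other library does not, which is the kind of between-library discrepancy the panel is meant to expose.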
Fig. 2 Library bias correction to the BEAS-2B libraries. a Correlation between the sequencing depths (x-axis) and the raw read counts (y-axis) of CCDC85B (left) and RRM2 (right) before (top) and after NBGLM-LBC (bottom); solid lines are the linear regressions, and the transparent bands are the 95% prediction intervals. b Spearman correlation coefficients between the replicates after NBGLM-LBC. c PCA on VST normalized expression levels after the library bias correction. d Relation between the library quantity (x-axis of the left panel), proportions of mapped reads on protein coding genes (x-axis of the right panel), and the library redundancy (y-axis)
Fig. 3 NBGLM-LBC required consistent sample layout. Tables (top) are sample layouts of two simulation datasets. A consistent sample layout (left) is defined as the number of control samples and case samples in each library being almost the same, while in an inconsistent sample layout (right) each library has either control or case samples. Panels (bottom) are PCAs on VST normalized expression levels before (left in each sample layout) and after (right in each sample layout) NBGLM-LBC
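The layout definition in this caption can be expressed as a small check. The sketch below is illustrative only: the function names, the `tolerance` parameter (one possible reading of "almost the same"), and the toy layouts are assumptions, not part of NBGLM-LBC itself.

```python
from collections import Counter, defaultdict


def layout_table(samples):
    """Tabulate (library, group) pairs as {library: Counter of groups}."""
    table = defaultdict(Counter)
    for library, group in samples:
        table[library][group] += 1
    return dict(table)


def is_consistent(samples, groups=("control", "case"), tolerance=1):
    """Return True if every library holds all groups in near-equal numbers.

    `tolerance` is an assumed threshold for "almost the same": the largest
    allowed difference between group counts within one library.
    """
    for counts in layout_table(samples).values():
        sizes = [counts[g] for g in groups]  # Counter gives 0 if absent
        if min(sizes) == 0 or max(sizes) - min(sizes) > tolerance:
            return False
    return True


# Toy layouts (assumption): two libraries of eight samples each.
consistent_layout = (
    [("lib1", "control")] * 4 + [("lib1", "case")] * 4
    + [("lib2", "control")] * 4 + [("lib2", "case")] * 4
)
inconsistent_layout = [("lib1", "control")] * 8 + [("lib2", "case")] * 8

print(is_consistent(consistent_layout))
print(is_consistent(inconsistent_layout))
```

A check of this kind could be run on a planned sample sheet before library preparation, since in the inconsistent layout the library and the sample type are fully confounded.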
Fig. 4 Library biases and the correction in the GEWAC study. a Hierarchical clustering of GEWAC subjects using the leukocyte expression profile without the library bias correction. The upper bar below the tree represents the libraries, and the lower bar represents the sample types. b PCA of the expression profile without the library bias correction. c Relation between the library quantity (x-axis of the left panel), proportions of mapped reads on protein coding genes (x-axis of the right panel), and the library redundancy (y-axis). d Hierarchical clustering of GEWAC subjects using the leukocyte expression profile with the library bias correction. The upper bar below the tree represents the libraries, and the lower bar represents the sample types