Rishi K Gupta, Carolin T Turner, Cristina Venturini, Hanif Esmail, Molebogeng X Rangaka, Andrew Copas, Marc Lipman, Ibrahim Abubakar, Mahdad Noursadeghi.
Abstract
BACKGROUND: Multiple blood transcriptional signatures have been proposed for identification of active and incipient tuberculosis. We aimed to compare the performance of systematically identified candidate signatures for incipient tuberculosis and to benchmark these against WHO targets.
Year: 2020 PMID: 31958400 PMCID: PMC7113839 DOI: 10.1016/S2213-2600(19)30282-6
Source DB: PubMed Journal: Lancet Respir Med ISSN: 2213-2600 Impact factor: 30.700
Characteristics of the datasets included in meta-analysis of candidate whole blood transcriptional signatures for incipient tuberculosis
| London tuberculosis contacts | 324 (8 tuberculosis; 316 healthy) | Cohort | Adult tuberculosis contacts | London, UK | Negative | Baseline | Median 1·9 (IQR 1·7–2·2) years, record linkage | Culture-confirmed or clinically diagnosed | 15–20 million 41 bp paired-end reads | 7/7 | Clinical evaluation and chest x-ray |
| Adolescent Cohort Study | 287 (73 tuberculosis; 214 healthy) | Nested case-control | Adolescents with latent tuberculosis infection | South Africa | Negative | Serial (0, 6, 12, and 24 months) | 2 years, active | Intrathoracic disease with 2 positive smears, or 1 positive culture | 30 million 50 bp paired-end reads | 9/9 | Clinical evaluation; tuberculosis <6 months from enrolment excluded; chest x-ray not specified |
| Grand Challenges 6-74 | 412 (98 tuberculosis; 314 healthy) | Nested case-control | Adult household pulmonary tuberculosis contacts | South Africa, The Gambia, Ethiopia | Negative | Serial (0, 6, and 18 months) | 2 years, active | Culture-confirmed or clinically diagnosed | 60 million 50 bp paired-end reads | 9/9 | Clinical evaluation; tuberculosis <3 months from enrolment excluded; chest x-ray not specified |
| Leicester tuberculosis contacts | 103 (4 tuberculosis; 99 healthy) | Cohort | Adult tuberculosis contacts | Leicester, UK | Negative | Baseline plus serial for a subset | 2 years, active | Confirmed by culture or Xpert MTB/RIF | 25 million 75 bp paired-end reads | 7/7 | Clinical evaluation and chest x-ray |
Owing to the high frequency of serial sampling (<6-monthly) in the Leicester tuberculosis contacts dataset, only baseline samples were included.
Characteristics of candidate whole blood transcriptional signatures for incipient tuberculosis included in systematic review and meta-analysis
| Anderson38 | 42 | Disease risk score | Children | HIV positive and negative | South Africa, Malawi | Elastic net using genome-wide data | Tuberculosis | 87 | 43 | 1 |
| | 1 | NA | Adults | HIV negative | UK | SVM using genome-wide data | Tuberculosis | 46 | 31 | 1 |
| Gjoen7 | 7 | LASSO regression | Children | HIV negative | India | LASSO using 198 preselected genes | Tuberculosis | 47 | 36 | 2 |
| Gliddon3 | 3 | Disease risk score | Adults | HIV positive and negative | South Africa, Malawi | Forward Selection-Partial Least Squares using genome-wide data | Tuberculosis | 285 (tuberculosis and non-tuberculosis) | .. | 1 |
| Huang11 | 13 | SVM (linear kernel) | Adults | HIV negative | UK | Common genes from elastic net, L1/2 and LASSO models, using genome-wide data | Tuberculosis | 16 | 79 | 1 |
| Kaforou25 | 27 | Disease risk score | Adults | HIV positive and negative | South Africa, Malawi | Elastic net using genome-wide data | Tuberculosis | 285 (tuberculosis and non-tuberculosis) | .. | 1 |
| Maertzdorf4 | 4 | Random forest | Adults | HIV negative | India | Random forest using 360 selected target genes | Tuberculosis | 113 | 76 | 2 |
| NPC2 | 1 | NA | Adults | Not stated | Brazil | Differential expression using genome-wide data | Tuberculosis | 6 | 28 | 3 |
| Qian17 | 17 | Sum of standardised expression | Adults | HIV negative | UK | Differential expression of nuclear factor, erythroid 2-like 2-mediated genes | Tuberculosis | 16 | 69 | 1 |
| Rajan5 | 5 | Unsigned sums | Adults | HIV positive | Uganda | Differential expression using genome-wide data | Tuberculosis | 80 total (1:2 cases:controls) | .. | 1 |
| Roe3 | 3 | SVM (linear kernel) | Adults | HIV negative | UK | Stability selection, using genome-wide data | Incipient tuberculosis | 46 | 31 | 1 |
| Singhania20 | 20 | Modified disease risk score | Adults | HIV negative | UK, South Africa | Random forest using modular approach | Tuberculosis | Discovery set not explicitly stated | .. | 1 |
| Suliman2 | 2 | ANKRD22 – OSBPL10 | Adults | HIV negative | Gambia, South Africa, Ethiopia | Pair ratios algorithm using genome-wide data | Incipient tuberculosis | 79 | 328 | 4 |
| Suliman4 | 4 | (GAS6 + SEPT4) – (CD1C + BLK) | Adults | HIV negative | Gambia, South Africa | Pair ratios algorithm using genome-wide data | Incipient tuberculosis | 45 | 141 | 4 |
| Sweeney3 | 3 | (GBP5 + DUSP3) ÷ 2 – KLF2 | Adults | HIV positive and negative | Meta-analysis | Significance thresholding and forward search in genome-wide data | Tuberculosis | 266 | 931 | 1 |
| Walter45 | 51 | SVM (linear kernel) | Adults | HIV negative | USA | SVMs, using genome-wide data | Tuberculosis | 24 | 24 | 1 |
| Zak16 | 16 | SVM (linear kernel) | Adolescents | HIV negative | South Africa | SVM-based gene pair models using genome-wide data | Incipient tuberculosis | 37 | 77 | 1 |
Signatures are named by combining the first author's name from the corresponding publication as a prefix with the number of constituent genes as a suffix. For signatures in which not all constituent genes were identifiable in the RNA sequencing data (eg, because records had been withdrawn), the suffix indicates the number of identifiable genes included in this analysis. Log2-transformed transcripts per million (TPM) data were used to calculate all signatures unless otherwise specified. NA=not applicable. SVM=support vector machine. LASSO=least absolute shrinkage and selection operator.
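The footnote above fixes log2-transformed TPM as the default input scale. The sketch below shows only the arithmetic of that conversion for one sample. It assumes raw read counts and feature lengths in kilobases are available for each gene. It also assumes a pseudocount of 1 is added before the log2 transform; the source does not state the offset. Quantification tools usually report TPM directly.

```python
import math


def log2_tpm(counts, lengths_kb, pseudocount=1.0):
    """Convert one sample's raw read counts to log2(TPM + pseudocount).

    counts:      dict mapping gene -> raw read count
    lengths_kb:  dict mapping gene -> feature length in kilobases
    pseudocount: assumed offset to avoid log2(0); not stated in the source
    """
    # Reads per kilobase: correct each count for feature length
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    # Rescale so the length-corrected values sum to one million (TPM)
    per_million = sum(rpk.values()) / 1e6
    tpm = {gene: value / per_million for gene, value in rpk.items()}
    # Log2-transform with the assumed pseudocount
    return {gene: math.log2(value + pseudocount) for gene, value in tpm.items()}


# Usage with two hypothetical genes of different lengths
example = log2_tpm({"GENE_A": 100, "GENE_B": 300}, {"GENE_A": 1.0, "GENE_B": 3.0})
```

In this example the two hypothetical genes have the same length-corrected rate, so they receive identical TPM values.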
The final column indicates the total number of eligible signatures discovered in each study. Where multiple signatures were discovered for the same intended purpose from the same training dataset, we included the signature with the greatest accuracy, defined as the area under the receiver operating characteristic curve in the validation data. Where accuracy was equivalent, we included the most parsimonious signature.
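The selection rule in this footnote can be sketched as a two-step choice: take the highest validation AUC, then break ties by the fewest genes. The names `CandidateSignature`, `select_signature` and the `tolerance` parameter are hypothetical. The source does not say how close two AUCs must be to count as "equivalent", so the default treats only exactly equal AUCs as ties.

```python
from dataclasses import dataclass


@dataclass
class CandidateSignature:
    name: str              # hypothetical signature label
    n_genes: int           # number of constituent genes
    validation_auc: float  # area under the ROC curve in the validation data


def select_signature(candidates, tolerance=0.0):
    """Pick the most accurate signature; break ties by fewest genes.

    tolerance is a hypothetical parameter: AUCs within this distance of the
    best are treated as equivalent (the source does not define equivalence).
    """
    best_auc = max(c.validation_auc for c in candidates)
    equivalent = [c for c in candidates if best_auc - c.validation_auc <= tolerance]
    # Among equally accurate signatures, prefer the most parsimonious one
    return min(equivalent, key=lambda c: c.n_genes)
```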
In the original publications, Anderson38 included 42 genes, Huang11 13, Kaforou25 27, and Walter45 51; genes not included in the current models were either duplicates or not identifiable in the RNA sequencing data.
For disease risk scores, the sum of downregulated genes was subtracted from the sum of upregulated genes. For unsigned sums and modified disease risk scores, genes were summed, irrespective of their direction of regulation.
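The two scoring rules in this footnote differ only in whether gene direction matters. The sketch below scores a single sample, assuming its expression values are held in a dict keyed by gene. The gene names are hypothetical placeholders, not constituents of any signature in the table.

```python
def disease_risk_score(expr, up_genes, down_genes):
    """Disease risk score: sum of upregulated genes minus sum of downregulated genes."""
    return sum(expr[g] for g in up_genes) - sum(expr[g] for g in down_genes)


def unsigned_sum(expr, genes):
    """Unsigned sum (also used for modified disease risk scores): direction ignored."""
    return sum(expr[g] for g in genes)


# One sample's expression values for three hypothetical genes
sample = {"UP1": 5.0, "UP2": 3.0, "DOWN1": 2.0}
drs = disease_risk_score(sample, up_genes=["UP1", "UP2"], down_genes=["DOWN1"])
total = unsigned_sum(sample, ["UP1", "UP2", "DOWN1"])
```

Here the disease risk score is 5 + 3 − 2 = 6, whereas the unsigned sum adds all three values to give 10.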
Calculated from non-log-transformed data using model coefficients from the original publication.
Required normalisation of the training and test sets. This was done for each gene by subtracting the mean expression across all samples in the dataset and dividing by the SD.
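The per-gene normalisation described above is a z-score computed within one dataset. A minimal sketch follows, assuming each dataset is a dict of samples mapping genes to expression values. It uses the sample SD (n − 1 denominator), which is an assumption, since the source says only "SD". The function is applied separately to the training set and to the test set.

```python
import statistics


def standardise_per_gene(dataset):
    """Z-score each gene across all samples in one dataset.

    dataset: dict mapping sample id -> dict mapping gene -> expression value.
    Uses the sample SD (n - 1 denominator), which is an assumption; the source
    says only "SD". Needs at least two samples and non-constant genes.
    """
    genes = list(next(iter(dataset.values())))
    standardised = {sample: {} for sample in dataset}
    for gene in genes:
        values = [dataset[sample][gene] for sample in dataset]
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        for sample in dataset:
            standardised[sample][gene] = (dataset[sample][gene] - mean) / sd
    return standardised


# Applied separately to the training set and to the test set
training = standardise_per_gene({"s1": {"G1": 1.0}, "s2": {"G1": 3.0}})
```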
Calculated using non-log-transformed counts per million data with trimmed mean of M-values normalisation, as in the original description.
The modelling approach was not clear from the original description. We recreated it using two approaches: as a simple equation of gene pairs ((GAS6 + SEPT4) – (CD1C + BLK)) and as an SVM using the four constituent gene pairs, as previously described. Because the former approach achieved marginally better performance, closer to that reported by the authors in their test dataset, it was included in the final analysis.
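The table gives closed-form equations for three signatures: Suliman2 (ANKRD22 – OSBPL10), Suliman4 (the simple gene-pair equation retained above) and Sweeney3 ((GBP5 + DUSP3) ÷ 2 – KLF2). The sketch below transcribes them for a single sample. It assumes `expr` already holds values on whatever scale the table footnotes specify for each signature. The SVM alternative for Suliman4 is not shown, and the sample values are made up.

```python
def suliman2(expr):
    """ANKRD22 - OSBPL10."""
    return expr["ANKRD22"] - expr["OSBPL10"]


def suliman4(expr):
    """(GAS6 + SEPT4) - (CD1C + BLK): the simple gene-pair equation."""
    return (expr["GAS6"] + expr["SEPT4"]) - (expr["CD1C"] + expr["BLK"])


def sweeney3(expr):
    """(GBP5 + DUSP3) / 2 - KLF2."""
    return (expr["GBP5"] + expr["DUSP3"]) / 2 - expr["KLF2"]


# Hypothetical expression values for one sample (not real data)
sample = {"ANKRD22": 7.0, "OSBPL10": 3.0, "GAS6": 5.0, "SEPT4": 4.0,
          "CD1C": 2.0, "BLK": 1.0, "GBP5": 6.0, "DUSP3": 4.0, "KLF2": 2.0}
```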
Figure 1: Genes comprising the eight best performing blood transcriptomic signatures for incipient tuberculosis
(A) Matrix showing constituent genes for each signature. (B) Network diagram showing statistically enriched (p<0·05) upstream regulators of the 40 genes, identified by Ingenuity Pathway Analysis. Coloured nodes represent the predicted upstream regulators, grouped by function (red=cytokine, blue=transcription factor, green=other). Black nodes represent the transcriptional biomarkers downstream of these regulators. STAT1, shown as a blue node because it is a predicted upstream regulator of several genes, is also a target gene of other upstream regulators. Each node is labelled using Human Genome Organisation nomenclature. Node size is proportional to the number of downstream biomarkers associated with each regulator, and edge thickness is proportional to the –log10 p value for enrichment of each upstream regulator.
Figure 2: Scatterplots showing scores of the eight best performing transcriptional signatures for incipient tuberculosis, stratified by interval to disease
Dashed horizontal lines indicate thresholds set at a standardised score of two for each signature. The number of samples included for each signature at each timepoint is indicated in appendix 1 (p 19). Repeated-measures analysis of variance using a linear trend method showed p<0·0001 for the association between categorical interval to disease and decreasing scores for each of the eight signatures. Scatterplots of these signature scores against days to tuberculosis are shown in appendix 1 (p 18).
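The thresholds in this caption are cutoffs at two standard scores above the control mean. A minimal sketch follows, assuming the mean and sample SD are taken from the control population's signature scores. The sample SD and the strict "greater than" comparison are assumptions. The function names are hypothetical.

```python
import statistics


def standardise_to_controls(scores, control_scores):
    """Express scores as standard scores relative to the control population."""
    mean = statistics.fmean(control_scores)
    sd = statistics.stdev(control_scores)  # sample SD: an assumption
    return [(s - mean) / sd for s in scores]


def flag_positive(scores, control_scores, threshold=2.0):
    """Flag samples whose standardised score exceeds the threshold."""
    return [z > threshold for z in standardise_to_controls(scores, control_scores)]


# Hypothetical control scores (mean 2, SD 1) and three samples to classify
controls = [1.0, 2.0, 3.0]
flags = flag_positive([2.0, 4.5, 3.9], controls)
```

Only the second sample lies more than two SDs above the control mean, so it is the only one flagged.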
Figure 3: Receiver operating characteristic curves showing diagnostic accuracy of the eight best performing transcriptional signatures for incipient tuberculosis
Receiver operating characteristic curves are stratified by months from sample collection to disease. Area under the curve estimates and 95% CIs are shown in appendix 1 (p 15). The number of samples included for each signature at each timepoint is indicated in appendix 1 (p 19).
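A minimal way to obtain the area under the curve summarised in this figure uses the Mann-Whitney formulation: AUC equals the probability that a randomly chosen case outscores a randomly chosen control. The sketch also stratifies cases by months to disease. The half-open `[lo, hi)` windows are hypothetical; cumulative windows can be expressed by starting every window at 0.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    This is the Mann-Whitney formulation of the area under the ROC curve;
    tied scores count as half a win.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def auc_by_interval(cases, control_scores, windows):
    """AUC for cases sampled within each [lo, hi) window of months to disease.

    cases: list of (score, months_to_disease) pairs; windows are hypothetical.
    Windows with no cases are skipped.
    """
    result = {}
    for lo, hi in windows:
        in_window = [score for score, months in cases if lo <= months < hi]
        if in_window:
            result[(lo, hi)] = roc_auc(in_window, control_scores)
    return result


# Hypothetical data: one case sampled 2 months and one 10 months before disease
by_interval = auc_by_interval([(5.0, 2), (1.0, 10)], [2.0, 3.0], [(0, 6), (6, 12)])
```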
Figure 4: Diagnostic accuracy of the eight best performing transcriptional signatures for incipient tuberculosis shown in receiver operating characteristic space, stratified by months to disease
Dashed lines represent positive predictive values of 5%, 10%, and 15%, based on a pre-test probability of 2%. Grey shading indicates 95% CIs for each signature. Cutoffs were set at two standard scores above the mean of the control population. The number of samples included for each signature at each timepoint is indicated in appendix 1 (p 19). Point estimates and 95% CIs are also shown in appendix 1 (p 20).
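The dashed positive predictive value (PPV) lines follow from Bayes' theorem. With pre-test probability p, PPV = sens·p / (sens·p + (1 − spec)(1 − p)). Solving this for sensitivity gives a straight line through the origin of ROC space: TPR = [PPV / (1 − PPV)]·[(1 − p) / p]·FPR. The sketch below computes both quantities with the 2% pre-test probability stated in the caption. The function names are hypothetical.

```python
def ppv(sensitivity, specificity, pretest_prob=0.02):
    """Positive predictive value from Bayes' theorem (2% pre-test probability by default)."""
    true_pos = sensitivity * pretest_prob
    false_pos = (1 - specificity) * (1 - pretest_prob)
    return true_pos / (true_pos + false_pos)


def ppv_isoline_tpr(fpr, target_ppv, pretest_prob=0.02):
    """Sensitivity at which a test with false-positive rate `fpr` attains target_ppv.

    Rearranging the PPV formula gives a straight line through the origin of ROC
    space: TPR = [PPV / (1 - PPV)] * [(1 - p) / p] * FPR, capped at 1.
    """
    slope = (target_ppv / (1 - target_ppv)) * ((1 - pretest_prob) / pretest_prob)
    return min(1.0, slope * fpr)


# Points on the dashed 5%, 10% and 15% PPV lines at FPR = 0.1
lines = {target: ppv_isoline_tpr(0.1, target) for target in (0.05, 0.10, 0.15)}
```

At 2% prevalence, even the 5% PPV line needs a sensitivity about 2·6 times the false-positive rate. This shows why high specificity matters for incipient tuberculosis tests.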