
Deep single-cell RNA sequencing data of individual T cells from treatment-naïve colorectal cancer patients.

Yuanyuan Zhang1, Liangtao Zheng2, Lei Zhang2, Xueda Hu1, Xianwen Ren1, Zemin Zhang3,4.   

Abstract

T cells, a crucial compartment of the tumour microenvironment, play vital roles in cancer immunotherapy. However, basic properties of tumour-infiltrating T cells (TILs), such as their functional states, migratory capability and clonal expansion, remain elusive. Here, using the Smart-seq2 protocol, we generated an RNA sequencing dataset of 11,138 T cells isolated from peripheral blood, adjacent normal and tumour tissues of 12 colorectal cancer (CRC) patients, including 4 with microsatellite instability (MSI). After quality control, the dataset contained expression profiles of 10,805 T cells and full-length T cell receptor (TCR) sequences of 9,878 cells. To facilitate data mining of this T cell dataset, we developed a web-based application that provides systematic interrogation and customizable functionalities (http://crctcell.cancer-pku.cn/). Built on this dataset, the web tool enables the characterization of TILs based on both transcriptomes and assembled TCR sequences at single-cell resolution, which will help unlock the potential value of this CRC T cell data resource.


Year:  2019        PMID: 31341169      PMCID: PMC6656756          DOI: 10.1038/s41597-019-0131-5

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

Colorectal cancer (CRC) is among the leading causes of cancer-related mortality worldwide[1,2]. While immune checkpoint blocking antibodies (ICBs) have shown impressive clinical benefits in several cancers[3-6], their benefits are highly uneven among CRC patients. Remarkably, only CRC patients with MSI showed pronounced responses to ICBs, whereas patients with microsatellite stability (MSS) derived no benefit[7,8]. The mechanisms underlying this disparity remain elusive. T cells play vital roles in killing malignant cells and are associated with responses to ICB treatment[9,10]. It is thus important to understand the cellular underpinnings of TILs in CRC. Single-cell transcriptome analysis has become a compelling approach to decipher the properties of TILs, because it can quantify gene expression and assemble TCR sequences from the same cell simultaneously. In our recent Nature paper, we performed single-cell RNA sequencing of 11,138 T cells isolated from peripheral blood, adjacent normal and tumour tissues of 12 treatment-naïve CRC patients (Fig. 1a and Table 1), and developed STARTRAC (single T cell analysis by RNA sequencing and TCR tracking) indices to analyse the dynamic relationships among 20 identified T cell subsets[11]. Here, we provide a detailed description of this dataset and present a web server that delivers comprehensive and customizable analyses.
Fig. 1

Schematic overview of the study design and analysis pipeline. (a) The experimental flowchart of this study. (b) The bioinformatics pipeline used for data analysis. Software used in each step is labelled in blue. WES, whole exome sequencing; DEG, differentially expressed gene; dist, tissue distribution; expa, clonal expansion; migr, cross-tissue migration; tran, developmental transition.

Table 1

Clinical characteristics of 12 CRC patients.

Patient ID | Age | Gender | Histological typeᵃ | Stage | Tumour size | MSI statusᵇ | TNM classification (T,N,M) | Grade
P0701 | 68 | Female | Rectum ADC | I | 1 × 0.8 cm | MSS | 1,0,0 | Well differentiated
P1012 | 35 | Female | Colon ADC | IIIC | 7 × 6 cm | MSS | 4,2,0 | Poorly differentiated
P1207 | 66 | Female | Colon ADC | II | 6 × 6 cm | MSS | 4,0,0 | Moderately differentiated
P1212 | 42 | Female | Colon ADC | II | 6 × 4 cm | MSS | 4,0,0 | Poorly to moderately differentiated
P1228 | 77 | Female | Colon ADC | II | 4.5 × 4 cm | MSS | 4,0,0 | Poorly to moderately differentiated
P0215 | 75 | Male | Colon ADC | IV | 6.5 × 4 cm | MSS | 4,2,1 | Poorly differentiated
P0309 | 55 | Male | Rectum ADC | IIIC | 5 × 4.5 cm | MSS | 3,2,0 | Moderately differentiated
P0411 | 75 | Male | Rectum ADC | IIB | 6.5 × 3.5 cm | MSS | 4,0,0 | Moderately differentiated
P0123 | 65 | Female | Colon ADC | IIIB | 11.5 × 7 cm | MSI | 4,1,0 | Moderately differentiated
P0413 | 82 | Female | Colon ADC | IIIB | 10 × 10 cm | MSI | 4,1,0 | Moderately differentiated
P0825 | 83 | Female | Colon ADC | IIB | 9 × 4 cm | MSI | 4,0,0 | Poorly differentiated
P0909 | 45 | Male | Colon ADC | IIIB | 6 × 4 cm | MSI | 3,1,0 | Poorly differentiated

ᵃADC, adenocarcinoma.

ᵇMSS, microsatellite stability; MSI, microsatellite instability.

The dataset contained an average of 1.25 million uniquely mapped read pairs per cell, with an average mapping rate of 96.6% (Online-only Table 1). After quality control, we obtained expression profiles of 12,547 genes for 10,805 cells, with an average of 3,182 genes detected per cell (Online-only Table 1). The expression data can be used to examine the expression distributions of genes, including those currently pursued as immunotherapy targets in clinical trials (Fig. 2a), indicating which T cell populations may be modulated by different immunotherapies. Furthermore, the dataset can serve as a resource for further exploration of T cells, including the identification of novel regulatory mechanisms from the specific expression patterns of transcription factors (Fig. 2b).
Online-only Table 1

Sequencing data statistics of single T cells in CRC.

Patient | Cell typeᵃ | Average number of raw reads | Average number of raw bases | Average number of clean reads | Average number of clean bases | Average error rate of read1 (%) | Average error rate of read2 (%) | Average Q20 of read1 (%) | Average Q20 of read2 (%) | Average Q30 of read1 (%) | Average Q30 of read2 (%) | Average GC content of read1 (%) | Average GC content of read2 (%) | Average high quality rate (%) | Average uniquely mapped read pairs | Average mapping rate (%) | Average number of detected genesᵇ | Number of cells
P0123 | NP7 | 1,714,334 | 258,864,361 | 1,713,937 | 258,804,533 | 0.02 | 0.03 | 96.70 | 94.80 | 93.08 | 88.69 | 45.41 | 45.77 | 92.15 | 719,724 | 97.36 | 2,023 | 66
P0123 | NTC | 1,867,875 | 282,049,107 | 1,867,620 | 282,010,658 | 0.02 | 0.03 | 96.88 | 94.93 | 93.28 | 88.91 | 46.14 | 46.41 | 92.58 | 755,543 | 97.23 | 2,513 | 127
P0123 | NTH | 1,755,265 | 265,044,953 | 1,755,180 | 265,032,217 | 0.02 | 0.03 | 96.95 | 94.77 | 93.42 | 88.55 | 45.77 | 46.16 | 92.13 | 719,681 | 96.69 | 2,090 | 81
P0123 | NTR | 1,831,831 | 276,606,474 | 1,831,733 | 276,591,669 | 0.02 | 0.03 | 96.80 | 94.65 | 93.17 | 88.47 | 45.61 | 45.97 | 92.20 | 746,956 | 95.74 | 2,468 | 86
P0123 | PP7 | 1,541,654 | 232,789,698 | 1,541,643 | 232,788,028 | 0.02 | 0.04 | 96.35 | 94.29 | 92.35 | 87.88 | 45.46 | 45.76 | 91.54 | 604,130 | 92.15 | 2,020 | 81
P0123 | PTC | 1,904,745 | 287,616,443 | 1,904,183 | 287,531,648 | 0.02 | 0.04 | 96.36 | 93.56 | 92.54 | 86.60 | 46.21 | 46.55 | 90.36 | 736,101 | 96.63 | 2,157 | 82
P0123 | PTH | 1,666,075 | 251,577,258 | 1,665,993 | 251,564,894 | 0.02 | 0.03 | 97.23 | 95.32 | 93.80 | 89.42 | 46.30 | 46.65 | 93.23 | 647,481 | 96.89 | 2,536 | 83
P0123 | PTR | 1,833,308 | 276,829,561 | 1,832,767 | 276,747,755 | 0.02 | 0.03 | 96.65 | 94.11 | 93.02 | 87.32 | 44.86 | 45.23 | 91.08 | 752,574 | 97.86 | 2,271 | 85
P0123 | TP7 | 1,715,085 | 258,977,881 | 1,714,997 | 258,964,489 | 0.02 | 0.03 | 96.65 | 94.61 | 92.96 | 88.41 | 45.34 | 45.67 | 92.01 | 711,994 | 96.17 | 2,464 | 152
P0123 | TTC | 1,675,997 | 253,075,573 | 1,675,828 | 253,050,014 | 0.02 | 0.03 | 96.61 | 94.14 | 92.85 | 87.73 | 45.84 | 46.10 | 91.65 | 666,452 | 93.70 | 2,576 | 109
P0123 | TTH | 1,655,447 | 249,972,553 | 1,655,156 | 249,928,536 | 0.02 | 0.04 | 96.61 | 93.75 | 92.85 | 86.90 | 45.89 | 46.23 | 91.18 | 649,278 | 93.55 | 2,515 | 136
P0123 | TTR | 1,754,982 | 265,002,272 | 1,754,567 | 264,939,649 | 0.02 | 0.03 | 96.71 | 94.52 | 93.08 | 88.21 | 45.11 | 45.44 | 91.86 | 739,953 | 97.05 | 2,528 | 150
P0215 | NTC | 3,696,603 | 558,187,027 | 3,696,532 | 558,176,335 | 0.02 | 0.05 | 96.98 | 92.51 | 93.27 | 83.14 | 45.24 | 45.77 | 91.83 | 1,425,816 | 95.92 | 3,463 | 89
P0215 | NTH | 3,577,400 | 540,187,404 | 3,577,319 | 540,175,222 | 0.02 | 0.06 | 96.87 | 92.21 | 92.82 | 82.46 | 45.21 | 45.69 | 91.63 | 1,404,591 | 97.26 | 3,283 | 74
P0215 | NTR | 3,968,152 | 599,190,922 | 3,968,109 | 599,184,474 | 0.02 | 0.07 | 97.08 | 91.25 | 93.50 | 80.90 | 44.21 | 44.75 | 90.15 | 1,624,925 | 96.69 | 3,993 | 20
P0215 | PTC | 3,497,938 | 528,188,608 | 3,497,882 | 528,180,242 | 0.02 | 0.05 | 97.21 | 92.53 | 93.46 | 83.02 | 46.85 | 47.30 | 92.05 | 1,269,380 | 98.61 | 3,426 | 69
P0215 | PTH | 4,021,645 | 607,268,390 | 4,021,536 | 607,251,968 | 0.02 | 0.05 | 97.33 | 92.69 | 93.85 | 83.43 | 46.66 | 47.16 | 92.20 | 1,493,906 | 98.16 | 3,543 | 85
P0215 | PTR | 3,705,902 | 559,591,254 | 3,705,808 | 559,577,004 | 0.02 | 0.06 | 97.22 | 91.75 | 93.69 | 81.76 | 46.41 | 46.90 | 91.05 | 1,344,160 | 97.13 | 3,877 | 76
P0215 | TTC | 3,619,457 | 546,538,071 | 3,619,383 | 546,526,787 | 0.02 | 0.06 | 97.19 | 91.89 | 93.57 | 81.94 | 45.68 | 46.14 | 91.17 | 1,420,602 | 97.12 | 3,786 | 121
P0215 | TTH | 3,644,561 | 550,328,681 | 3,644,479 | 550,316,277 | 0.02 | 0.05 | 97.03 | 92.72 | 93.39 | 83.65 | 45.42 | 45.84 | 92.11 | 1,452,494 | 96.06 | 3,628 | 107
P0215 | TTR | 3,681,850 | 555,959,318 | 3,681,765 | 555,946,506 | 0.02 | 0.06 | 96.93 | 91.58 | 93.11 | 81.58 | 46.20 | 46.68 | 90.56 | 1,400,273 | 96.36 | 3,910 | 113
P0309 | PP7 | 1,688,709 | 254,995,109 | 1,688,340 | 254,939,267 | 0.02 | 0.04 | 96.72 | 94.18 | 92.62 | 87.09 | 47.92 | 48.18 | 91.76 | 526,007 | 96.07 | 1,796 | 66
P0309 | PTC | 1,931,976 | 291,728,370 | 1,931,850 | 291,709,401 | 0.02 | 0.03 | 96.92 | 95.29 | 93.23 | 89.30 | 46.52 | 46.82 | 92.81 | 758,160 | 98.59 | 2,800 | 94
P0309 | PTH | 1,618,471 | 244,389,102 | 1,618,268 | 244,358,402 | 0.02 | 0.03 | 96.49 | 94.73 | 92.54 | 88.28 | 46.63 | 46.99 | 91.79 | 616,151 | 97.60 | 2,741 | 87
P0309 | PTR | 1,810,857 | 273,439,464 | 1,810,554 | 273,393,599 | 0.02 | 0.03 | 96.91 | 94.43 | 93.04 | 87.59 | 47.05 | 47.33 | 92.00 | 696,214 | 97.07 | 2,392 | 77
P0309 | TP7 | 1,440,359 | 217,494,213 | 1,440,140 | 217,461,184 | 0.02 | 0.03 | 96.82 | 94.62 | 92.93 | 88.01 | 46.22 | 46.48 | 92.24 | 563,367 | 97.05 | 2,320 | 82
P0309 | TTC | 1,834,215 | 276,966,519 | 1,833,959 | 276,927,872 | 0.02 | 0.03 | 96.71 | 94.38 | 92.85 | 87.57 | 46.02 | 46.37 | 91.58 | 732,585 | 98.31 | 2,482 | 137
P0309 | TTH | 1,735,228 | 262,019,435 | 1,735,010 | 261,986,452 | 0.02 | 0.03 | 96.98 | 94.89 | 93.18 | 88.47 | 46.30 | 46.55 | 92.66 | 691,206 | 98.07 | 2,461 | 136
P0309 | TTR | 1,506,787 | 227,524,862 | 1,506,449 | 227,473,786 | 0.02 | 0.04 | 96.39 | 93.72 | 92.21 | 86.23 | 46.79 | 47.23 | 90.66 | 592,596 | 97.76 | 2,564 | 79
P0411 | NTC | 1,936,233 | 292,371,225 | 1,936,198 | 292,365,946 | 0.02 | 0.05 | 96.31 | 92.08 | 91.96 | 83.52 | 46.30 | 46.79 | 91.32 | 752,676 | 93.98 | 2,768 | 119
P0411 | NTH | 2,008,920 | 303,346,932 | 2,008,309 | 303,254,594 | 0.02 | 0.05 | 96.57 | 92.32 | 92.34 | 83.62 | 45.10 | 45.62 | 91.80 | 817,317 | 97.00 | 2,416 | 77
P0411 | PTC | 1,541,931 | 232,831,634 | 1,541,547 | 232,773,610 | 0.02 | 0.06 | 96.20 | 91.21 | 91.67 | 82.02 | 46.98 | 47.52 | 90.53 | 581,190 | 94.08 | 2,740 | 68
P0411 | PTH | 1,785,455 | 269,603,702 | 1,785,210 | 269,566,746 | 0.02 | 0.05 | 96.56 | 92.54 | 92.33 | 84.15 | 47.08 | 47.52 | 92.12 | 689,969 | 95.21 | 2,897 | 135
P0411 | PTR | 1,776,541 | 268,257,667 | 1,775,998 | 268,175,649 | 0.02 | 0.06 | 96.63 | 91.87 | 92.38 | 82.86 | 46.56 | 47.08 | 91.41 | 710,507 | 97.31 | 2,735 | 74
P0411 | TTC | 1,780,848 | 268,908,065 | 1,780,528 | 268,859,664 | 0.02 | 0.06 | 96.83 | 92.07 | 92.66 | 83.07 | 46.50 | 47.01 | 91.66 | 673,458 | 98.14 | 2,932 | 90
P0411 | TTH | 1,891,541 | 285,622,698 | 1,891,531 | 285,621,188 | 0.02 | 0.05 | 96.73 | 92.57 | 92.63 | 84.06 | 46.20 | 46.70 | 91.85 | 758,472 | 97.36 | 3,026 | 88
P0411 | TTR | 1,788,669 | 270,089,034 | 1,788,643 | 270,085,140 | 0.02 | 0.05 | 96.79 | 92.72 | 92.70 | 84.28 | 46.34 | 46.82 | 92.13 | 747,743 | 98.02 | 3,446 | 113
P0413 | NTC | 3,658,208 | 552,389,411 | 3,657,933 | 552,347,888 | 0.02 | 0.05 | 96.76 | 92.80 | 92.57 | 84.04 | 46.48 | 46.89 | 92.59 | 1,454,524 | 92.87 | 3,430 | 93
P0413 | NTH | 3,663,795 | 553,233,012 | 3,663,327 | 553,162,401 | 0.02 | 0.06 | 96.85 | 92.49 | 92.78 | 83.43 | 46.59 | 47.04 | 92.31 | 1,363,078 | 88.62 | 3,415 | 69
P0413 | PTC | 3,684,099 | 556,298,949 | 3,683,740 | 556,244,787 | 0.02 | 0.05 | 96.78 | 92.58 | 92.58 | 83.58 | 46.66 | 47.13 | 92.34 | 1,401,687 | 92.15 | 3,453 | 90
P0413 | PTH | 3,841,036 | 579,996,468 | 3,840,709 | 579,947,047 | 0.02 | 0.05 | 96.78 | 92.74 | 92.69 | 83.96 | 46.97 | 47.41 | 92.52 | 1,478,607 | 95.98 | 3,740 | 85
P0413 | PTR | 3,614,343 | 545,765,793 | 3,613,899 | 545,698,780 | 0.02 | 0.05 | 96.77 | 92.61 | 92.70 | 83.80 | 46.92 | 47.35 | 92.50 | 1,358,171 | 95.41 | 3,632 | 78
P0413 | TTC | 3,679,688 | 555,632,957 | 3,679,121 | 555,547,318 | 0.03 | 0.06 | 96.10 | 92.00 | 91.26 | 82.90 | 46.99 | 47.50 | 91.55 | 1,488,604 | 97.22 | 4,225 | 119
P0413 | TTH | 3,659,034 | 552,514,175 | 3,658,514 | 552,435,657 | 0.03 | 0.05 | 96.25 | 92.35 | 91.62 | 83.56 | 46.74 | 47.18 | 91.98 | 1,497,802 | 97.00 | 3,812 | 119
P0413 | TTR | 3,545,754 | 535,408,903 | 3,545,215 | 535,327,454 | 0.03 | 0.05 | 96.27 | 92.32 | 91.58 | 83.45 | 47.18 | 47.61 | 92.08 | 1,458,558 | 97.67 | 4,338 | 112
P0701 | NTC | 7,751,374 | 1,170,457,498 | 7,751,298 | 1,170,445,974 | 0.02 | 0.03 | 97.21 | 95.42 | 93.83 | 89.91 | 45.91 | 46.23 | 93.75 | 2,933,506 | 94.16 | 5,050 | 63
P0701 | NTR | 7,582,247 | 1,144,919,334 | 7,580,584 | 1,144,668,176 | 0.02 | 0.03 | 97.18 | 94.50 | 93.95 | 88.18 | 45.34 | 45.63 | 92.13 | 2,919,142 | 96.50 | 5,241 | 152
P0701 | PTC | 6,349,280 | 958,741,299 | 6,348,431 | 958,613,136 | 0.02 | 0.04 | 96.68 | 93.57 | 93.00 | 86.57 | 46.19 | 46.70 | 91.31 | 2,282,960 | 94.62 | 4,973 | 113
P0701 | PTH | 7,776,039 | 1,174,181,820 | 7,775,920 | 1,174,163,987 | 0.02 | 0.03 | 97.20 | 95.03 | 93.79 | 89.06 | 45.56 | 45.94 | 93.28 | 3,016,207 | 95.60 | 4,926 | 77
P0701 | PTR | 7,107,334 | 1,073,207,396 | 7,107,010 | 1,073,158,476 | 0.03 | 0.04 | 96.06 | 94.09 | 91.67 | 87.15 | 46.83 | 47.10 | 92.26 | 2,553,272 | 96.07 | 4,976 | 80
P0701 | TTC | 7,117,078 | 1,074,678,733 | 7,116,912 | 1,074,653,638 | 0.02 | 0.04 | 96.40 | 93.61 | 92.14 | 86.34 | 45.15 | 45.55 | 91.52 | 2,712,380 | 93.96 | 5,278 | 151
P0701 | TTH | 6,998,193 | 1,056,727,108 | 6,998,041 | 1,056,704,122 | 0.03 | 0.04 | 96.22 | 93.30 | 91.48 | 85.60 | 46.76 | 47.09 | 91.25 | 2,614,960 | 95.78 | 4,872 | 81
P0701 | TTR | 7,143,834 | 1,078,718,877 | 7,143,689 | 1,078,697,002 | 0.02 | 0.03 | 97.07 | 94.74 | 93.55 | 88.74 | 46.75 | 47.10 | 92.86 | 2,778,230 | 96.34 | 5,878 | 135
P0825 | NTC | 1,780,675 | 268,881,942 | 1,780,639 | 268,876,432 | 0.13 | 0.16 | 88.14 | 85.93 | 76.03 | 71.86 | 45.83 | 46.48 | 85.11 | 656,451 | 99.21 | 3,050 | 90
P0825 | NTH | 1,807,972 | 273,003,823 | 1,807,937 | 272,998,454 | 0.12 | 0.15 | 88.46 | 86.34 | 76.59 | 72.44 | 45.55 | 46.14 | 85.86 | 673,272 | 98.99 | 3,124 | 95
P0825 | NTY | 1,592,307 | 240,438,343 | 1,592,133 | 240,412,061 | 0.03 | 0.07 | 95.63 | 90.72 | 90.24 | 80.33 | 45.41 | 46.05 | 89.45 | 610,418 | 99.20 | 2,785 | 117
P0825 | PTC | 1,640,213 | 247,672,233 | 1,640,175 | 247,666,488 | 0.03 | 0.06 | 96.07 | 91.87 | 91.31 | 82.36 | 46.14 | 46.70 | 91.04 | 618,187 | 99.09 | 2,947 | 130
P0825 | PTH | 1,802,334 | 272,152,483 | 1,802,186 | 272,130,139 | 0.03 | 0.07 | 95.99 | 90.52 | 90.99 | 79.94 | 45.51 | 46.16 | 89.52 | 670,329 | 99.24 | 3,038 | 92
P0825 | PTR | 1,554,681 | 234,756,793 | 1,554,542 | 234,735,788 | 0.03 | 0.07 | 95.92 | 90.41 | 90.81 | 79.71 | 46.07 | 46.77 | 89.34 | 569,373 | 99.29 | 2,896 | 116
P0825 | TTC | 1,660,329 | 250,709,709 | 1,660,318 | 250,708,053 | 0.03 | 0.06 | 96.21 | 91.69 | 91.38 | 82.38 | 45.59 | 46.05 | 90.70 | 669,845 | 98.84 | 2,948 | 180
P0825 | TTH | 1,712,108 | 258,528,382 | 1,712,098 | 258,526,794 | 0.03 | 0.06 | 96.14 | 91.72 | 91.33 | 82.44 | 45.60 | 46.15 | 90.57 | 674,932 | 98.80 | 2,711 | 163
P0825 | TTR | 1,787,158 | 269,860,801 | 1,787,120 | 269,855,136 | 0.03 | 0.06 | 96.16 | 91.64 | 91.41 | 82.03 | 46.11 | 46.64 | 90.82 | 709,066 | 98.80 | 3,374 | 174
P0825 | TTY | 1,625,054 | 245,383,201 | 1,624,843 | 245,351,299 | 0.03 | 0.07 | 95.57 | 90.84 | 90.19 | 80.46 | 45.44 | 46.19 | 89.60 | 658,468 | 99.37 | 2,663 | 96
P0909 | NTC | 3,110,820 | 469,733,749 | 3,110,564 | 469,695,119 | 0.02 | 0.03 | 97.04 | 94.43 | 93.62 | 87.93 | 45.43 | 45.52 | 91.75 | 1,179,709 | 95.32 | 3,668 | 47
P0909 | NTH | 3,736,288 | 564,179,461 | 3,735,299 | 564,030,116 | 0.02 | 0.03 | 97.23 | 94.83 | 93.88 | 88.78 | 45.84 | 45.92 | 92.69 | 1,426,221 | 96.12 | 3,918 | 148
P0909 | PTC | 3,504,476 | 529,175,889 | 3,504,352 | 529,157,097 | 0.02 | 0.04 | 97.05 | 94.26 | 93.43 | 87.48 | 45.98 | 46.02 | 92.28 | 1,341,780 | 93.40 | 4,026 | 72
P0909 | PTH | 3,981,055 | 601,139,293 | 3,980,673 | 601,081,621 | 0.02 | 0.04 | 97.13 | 94.22 | 93.64 | 87.45 | 45.75 | 45.82 | 92.03 | 1,512,970 | 95.04 | 3,953 | 85
P0909 | PTR | 3,329,581 | 502,766,675 | 3,329,434 | 502,744,516 | 0.02 | 0.04 | 96.52 | 92.40 | 92.62 | 84.40 | 45.32 | 45.52 | 89.56 | 1,239,372 | 92.42 | 3,961 | 67
P0909 | PTY | 1,716,265 | 259,155,966 | 1,716,067 | 259,126,052 | 0.03 | 0.07 | 96.29 | 90.60 | 91.53 | 79.95 | 45.60 | 46.26 | 90.30 | 648,357 | 94.99 | 3,086 | 77
P0909 | TTC | 3,869,919 | 584,357,827 | 3,869,131 | 584,238,841 | 0.02 | 0.03 | 97.61 | 95.76 | 94.76 | 90.60 | 45.56 | 45.76 | 94.01 | 1,618,091 | 97.50 | 4,694 | 139
P0909 | TTH | 3,815,422 | 576,128,734 | 3,814,945 | 576,056,626 | 0.02 | 0.03 | 97.21 | 94.87 | 93.88 | 88.86 | 45.54 | 45.60 | 92.99 | 1,589,333 | 97.23 | 4,596 | 214
P0909 | TTR | 3,666,375 | 553,622,636 | 3,665,775 | 553,532,004 | 0.02 | 0.03 | 97.32 | 94.94 | 94.06 | 88.85 | 45.71 | 45.78 | 92.70 | 1,537,775 | 96.91 | 4,319 | 171
P0909 | TTY | 1,702,643 | 257,099,045 | 1,702,439 | 257,068,344 | 0.02 | 0.06 | 96.50 | 92.03 | 91.88 | 82.30 | 45.48 | 46.01 | 91.83 | 698,722 | 97.23 | 3,429 | 85
P1012 | PTC | 3,748,214 | 565,980,295 | 3,748,164 | 565,972,815 | 0.02 | 0.06 | 96.75 | 91.78 | 92.60 | 82.04 | 46.04 | 46.27 | 90.75 | 1,491,544 | 97.56 | 4,053 | 95
P1012 | PTH | 3,838,306 | 579,584,155 | 3,838,199 | 579,568,083 | 0.02 | 0.05 | 96.94 | 92.55 | 92.83 | 83.22 | 46.08 | 46.28 | 91.94 | 1,525,269 | 97.92 | 3,995 | 88
P1012 | PTR | 3,162,102 | 477,477,470 | 3,162,008 | 477,463,136 | 0.02 | 0.08 | 96.74 | 90.22 | 92.38 | 79.04 | 45.91 | 46.18 | 88.99 | 1,220,352 | 98.33 | 3,715 | 84
P1012 | PTY | 1,756,369 | 265,211,703 | 1,756,163 | 265,180,660 | 0.03 | 0.07 | 96.36 | 91.05 | 91.67 | 80.68 | 45.78 | 46.44 | 90.74 | 680,840 | 95.97 | 3,272 | 87
P1012 | TTC | 3,702,952 | 559,145,801 | 3,702,437 | 559,067,940 | 0.03 | 0.07 | 96.62 | 91.25 | 92.13 | 80.83 | 46.37 | 46.61 | 90.13 | 1,489,473 | 97.76 | 4,193 | 241
P1012 | TTH | 3,411,715 | 515,168,953 | 3,411,494 | 515,135,622 | 0.02 | 0.07 | 96.45 | 90.79 | 91.83 | 80.16 | 46.13 | 46.39 | 89.59 | 1,304,416 | 96.28 | 3,430 | 170
P1012 | TTR | 3,425,917 | 517,313,415 | 3,425,794 | 517,294,969 | 0.02 | 0.06 | 96.96 | 91.57 | 92.81 | 81.48 | 46.60 | 46.81 | 90.78 | 1,362,675 | 97.56 | 4,274 | 177
P1012 | TTY | 1,905,667 | 287,755,716 | 1,905,445 | 287,722,243 | 0.03 | 0.07 | 96.36 | 90.83 | 91.72 | 80.30 | 45.82 | 46.49 | 90.33 | 757,246 | 96.45 | 3,122 | 123
P1207 | PTC | 3,606,923 | 544,645,331 | 3,605,187 | 544,383,216 | 0.02 | 0.06 | 96.55 | 91.51 | 92.51 | 81.90 | 45.66 | 46.18 | 90.24 | 1,389,714 | 93.68 | 3,747 | 126
P1207 | TTC | 3,540,183 | 534,567,647 | 3,538,500 | 534,313,532 | 0.03 | 0.09 | 95.66 | 88.48 | 90.99 | 77.49 | 46.83 | 47.41 | 86.62 | 1,241,982 | 87.04 | 3,774 | 84
P1212 | NTC | 3,767,875 | 568,949,118 | 3,766,095 | 568,680,405 | 0.03 | 0.07 | 96.15 | 90.44 | 91.36 | 79.52 | 45.61 | 46.12 | 89.70 | 1,443,907 | 93.03 | 3,967 | 205
P1212 | NTH | 3,561,632 | 537,806,501 | 3,560,120 | 537,578,055 | 0.03 | 0.08 | 96.22 | 90.13 | 91.17 | 78.33 | 46.15 | 46.69 | 89.83 | 1,427,653 | 97.65 | 3,708 | 225
P1212 | NTY | 4,062,957 | 613,506,540 | 4,060,649 | 613,157,953 | 0.03 | 0.07 | 96.64 | 91.43 | 92.10 | 80.68 | 46.31 | 46.72 | 91.19 | 1,650,570 | 97.46 | 4,199 | 23
P1212 | PTC | 3,801,176 | 573,977,643 | 3,799,275 | 573,690,503 | 0.03 | 0.09 | 96.05 | 89.18 | 90.92 | 77.03 | 46.01 | 46.54 | 88.56 | 1,410,374 | 93.67 | 4,345 | 105
P1212 | PTH | 3,724,347 | 562,376,414 | 3,722,518 | 562,100,151 | 0.03 | 0.08 | 96.36 | 90.62 | 91.39 | 79.15 | 46.25 | 46.77 | 90.39 | 1,474,677 | 97.65 | 4,083 | 105
P1212 | PTR | 3,810,787 | 575,428,857 | 3,809,226 | 575,193,198 | 0.03 | 0.08 | 96.41 | 90.89 | 91.52 | 79.62 | 46.45 | 46.95 | 90.65 | 1,511,475 | 95.94 | 4,137 | 89
P1212 | TTC | 3,690,325 | 557,239,091 | 3,688,586 | 556,976,512 | 0.03 | 0.08 | 96.28 | 89.99 | 91.55 | 78.68 | 45.91 | 46.42 | 89.15 | 1,419,152 | 92.66 | 4,119 | 211
P1212 | TTH | 3,549,079 | 535,910,890 | 3,547,181 | 535,624,366 | 0.03 | 0.09 | 96.25 | 89.41 | 91.03 | 76.87 | 46.28 | 46.80 | 88.93 | 1,425,366 | 98.98 | 3,909 | 73
P1212 | TTR | 3,700,772 | 558,816,511 | 3,698,983 | 558,546,384 | 0.03 | 0.10 | 95.88 | 88.84 | 90.72 | 76.64 | 45.88 | 46.47 | 87.96 | 1,404,682 | 93.37 | 4,800 | 128
P1228 | NTC | 3,805,123 | 574,573,567 | 3,804,741 | 574,515,827 | 0.02 | 0.05 | 96.90 | 93.42 | 92.96 | 85.24 | 45.92 | 46.39 | 93.34 | 1,538,977 | 97.84 | 3,933 | 239
P1228 | NTH | 3,914,655 | 591,112,903 | 3,914,494 | 591,088,561 | 0.03 | 0.07 | 96.62 | 92.17 | 92.33 | 82.80 | 45.66 | 46.17 | 93.42 | 1,642,232 | 98.11 | 3,589 | 184
P1228 | NTR | 3,825,729 | 577,685,132 | 3,825,711 | 577,682,320 | 0.02 | 0.05 | 96.66 | 93.11 | 92.48 | 84.73 | 45.41 | 45.91 | 93.02 | 1,594,162 | 96.81 | 3,729 | 148
P1228 | PTC | 3,559,129 | 537,428,541 | 3,558,953 | 537,401,927 | 0.03 | 0.10 | 95.77 | 89.66 | 90.64 | 78.61 | 45.77 | 46.45 | 91.59 | 1,424,504 | 96.10 | 4,061 | 88
P1228 | PTH | 3,771,713 | 569,528,705 | 3,771,635 | 569,516,855 | 0.02 | 0.05 | 97.06 | 93.37 | 93.38 | 85.03 | 45.40 | 45.92 | 93.42 | 1,550,108 | 98.34 | 4,209 | 75
P1228 | PTR | 3,632,355 | 548,485,679 | 3,632,307 | 548,478,403 | 0.02 | 0.05 | 96.86 | 93.48 | 92.89 | 85.28 | 45.57 | 46.05 | 93.45 | 1,515,595 | 98.20 | 4,142 | 86
P1228 | TTC | 3,862,801 | 583,282,997 | 3,862,477 | 583,234,071 | 0.03 | 0.06 | 96.49 | 91.64 | 91.97 | 81.98 | 45.78 | 46.35 | 91.00 | 1,608,945 | 98.58 | 3,449 | 224
P1228 | TTH | 4,020,957 | 607,164,458 | 4,020,253 | 607,058,128 | 0.02 | 0.04 | 96.85 | 94.01 | 93.00 | 86.18 | 44.48 | 45.05 | 93.93 | 1,714,680 | 98.55 | 3,484 | 83
P1228 | TTR | 3,503,527 | 529,032,593 | 3,503,226 | 528,987,104 | 0.03 | 0.06 | 96.12 | 91.46 | 91.39 | 81.78 | 44.75 | 45.37 | 90.91 | 1,437,129 | 96.99 | 4,234 | 83

ᵃPTC, CD8+ cytotoxic T cells from peripheral blood; TTC, CD8+ cytotoxic T cells from tumour tissue; NTC, CD8+ cytotoxic T cells from adjacent normal tissue.

PTH, CD4+CD25− cells from peripheral blood; TTH, CD4+CD25− cells from tumour tissue; NTH, CD4+CD25− cells from adjacent normal tissue.

PTR, CD4+CD25hi cells from peripheral blood; TTR, CD4+CD25hi cells from tumour tissue; NTR, CD4+CD25hi cells from adjacent normal tissue.

PTY, CD4+CD25int cells from peripheral blood; TTY, CD4+CD25int cells from tumour tissue; NTY, CD4+CD25int cells from adjacent normal tissue.

PP7, CD4+ T cells from peripheral blood; TP7, CD4+ T cells from tumour tissue; NP7, CD4+ T cells from adjacent normal tissue.

ᵇA gene was defined as “detected” if the number of read pairs mapped to it was larger than 0.

Fig. 2

Expression patterns of selected genes. (a) Violin plots showing the expression distributions of known immunotherapy targets in tumour-enriched T cell clusters. (b) Bubble plots depicting the expression of transcription factors in different CD4+ T cell clusters.

TCR sequences, composed of α- and β-chains, play major roles in the selection and activation of T cells[12]. Both α- and β-chains contribute to TCR antigen specificity, and different T cells with the same TCR can be functionally distinct[13]. To uncover T cell ancestry and clonality, we obtained full-length TCR sequences with at least one pair of productive α-β chains for 91.4% (9,878/10,805) of cells, after eliminating non-productive alleles and low-abundance TCRs (Fig. 3a and Supplementary File 1). T cells with identical TCRs were then defined as belonging to the same clonotype, yielding a total of 7,274 clonotypes (Supplementary File 1). Consistent with this definition, the recurrence frequencies of α-chains were strongly correlated with those of β-chains, indicating a common ancestral cell of origin (Fig. 3b).
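The clonotype definition above can be illustrated with a short Python sketch. It is not the authors' TCR assembly pipeline: it assumes each cell has already been reduced to a single productive α-β pair, uses made-up CDR3 amino-acid strings in place of full-length chains, and `assign_clonotypes` is a hypothetical helper name.

```python
from collections import defaultdict


def assign_clonotypes(cell_tcrs):
    """Group cells that share an identical productive alpha-beta TCR pair.

    cell_tcrs maps a cell ID to an (alpha, beta) tuple of chain sequences.
    Returns a dict mapping clonotype ID -> sorted list of cell IDs.
    """
    groups = defaultdict(list)
    for cell, pair in cell_tcrs.items():
        groups[pair].append(cell)
    # Number clonotypes deterministically by sorting on the TCR pair itself.
    return {
        f"clonotype_{i + 1}": sorted(members)
        for i, (_, members) in enumerate(sorted(groups.items()))
    }


# Hypothetical toy data: CDR3 strings stand in for full-length chains.
cells = {
    "cell1": ("CAVRDSNYQLIW", "CASSLGQGAEAFF"),
    "cell2": ("CAVRDSNYQLIW", "CASSLGQGAEAFF"),
    "cell3": ("CALSEAGNQFYF", "CASSPTGGYEQYF"),
}
clonotypes = assign_clonotypes(cells)
# Clonotypes containing more than one cell indicate clonal expansion.
expanded = {k: v for k, v in clonotypes.items() if len(v) > 1}
print(len(clonotypes), expanded)
```

Cells carrying more than one productive α or β chain would need an extra matching rule, which this sketch deliberately does not model.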
Fig. 3

The TCR profile of single T cells. (a) The abundance distributions of TCR α- and β-chains. The gray lines represent the fitted values. (b) The relationship between the degree of recurrent usage of TCR α-chains and that of β-chains. Each dot represents a group of TCR α/β alleles expressed in a given number of cells. Dot size represents the proportion of such a group among all TCR chains detected. (c) TCR sharing patterns of different CD8+ T cell clusters enriched in different tissues.

The TCR sequences can be used to delineate TCR sharing patterns both within and between tissues and clusters (Fig. 3c), shedding light on T cell properties including clonal expansion, developmental transition and cross-tissue migration. Furthermore, TCR sequences, together with transcriptome data describing T cell functions, could serve as a data resource for discovering antigen specificities for therapeutic applications[14]. In our related work, we gained important insights into T cell biology based on STARTRAC indices[11]. For instance, tumour-resident CD8+ effector memory and dysfunctional T cells showed mutually exclusive developmental transition patterns, suggesting a TCR-based cell fate decision. In addition, we found that a distinct subset of CXCL13+BHLHE40+ IFNG+ TH1-like T cells was preferentially enriched in MSI tumours, which might contribute to the favourable responses of MSI patients to ICBs. Beyond these findings, this CRC T cell dataset remains a rich resource for further investigation. To facilitate data mining, we developed iSTARTRAC (the interactive platform of STARTRAC), a web server that delivers customizable functionalities for further T cell investigation. iSTARTRAC provides key functions including cluster visualization, gene expression display, differential expression analysis, TCR sharing illustration and comparison between MSI and MSS patients (Fig. 4).
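As a simple illustration of inter-tissue TCR sharing, the sketch below counts how many clonotypes are detected in each pair of tissues. It is a hypothetical helper for exploration only and is not one of the STARTRAC indices; the tissue codes P, N and T are assumed here to mirror the prefixes used in Online-only Table 1, and the input cells are made up.

```python
from collections import defaultdict
from itertools import combinations


def shared_clonotypes(cells):
    """Count clonotypes observed in each pair of tissues.

    cells is an iterable of (clonotype_id, tissue) tuples, one per cell.
    Returns {(tissue_a, tissue_b): number of clonotypes found in both}.
    """
    tissues_by_clone = defaultdict(set)
    for clone, tissue in cells:
        tissues_by_clone[clone].add(tissue)
    all_tissues = sorted({t for ts in tissues_by_clone.values() for t in ts})
    # Start every tissue pair at zero so unshared pairs are reported too.
    counts = {pair: 0 for pair in combinations(all_tissues, 2)}
    for ts in tissues_by_clone.values():
        for pair in combinations(sorted(ts), 2):
            counts[pair] += 1
    return counts


# Hypothetical cells (P, peripheral blood; N, adjacent normal; T, tumour).
tissue_cells = [
    ("c1", "T"), ("c1", "T"), ("c1", "N"),
    ("c2", "T"), ("c2", "P"),
    ("c3", "P"),
]
print(shared_clonotypes(tissue_cells))
```

A clonotype found in both blood and tumour, like `c2` here, is the kind of signal that hints at cross-tissue migration; a real analysis would also weigh clone sizes rather than presence alone.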
Fig. 4

Schema describing the key functionalities of the iSTARTRAC web server. iSTARTRAC provides six functional modules: cluster atlas, gene expression, DEG analysis, TCR-based analysis, STARTRAC indices and MSI versus MSS. Each module implements several customizable analyses for user-specified samples.

The comprehensive and customizable analyses available through simple clicks in iSTARTRAC could greatly facilitate data reuse in cancer immunology, and the accompanying scientific discussion may further expedite therapeutic discovery and improve understanding of how immunotherapies act through T cell functions.

Methods

These methods are expanded versions of the descriptions in our related work[11], which detailed the experimental procedures (human specimens, single-cell collection, cell sorting, reverse transcription, amplification and sequencing) and the computational processing (quality control, data processing, TCR assembly, unsupervised clustering and definition of STARTRAC indices)[11]. Although most of the methods described here are adapted from that report, here we specifically emphasize the samples and the methods used to generate the single-cell RNA-seq data.

Clinical human specimens

Twelve patients with CRC were enrolled and pathologically diagnosed with colorectal adenocarcinoma at Peking University People’s Hospital. All patients in this study provided written informed consent for sample collection and data analyses. This study was approved by the Research and Ethical Committee of Peking University People’s Hospital and complied with all relevant ethical regulations. The patients included eight with MSS (P0701, P1012, P1207, P1212, P1228, P0215, P0411 and P0309) and four with MSI (P0123, P0909, P0825 and P0413) status. Among the four MSI patients, three had positive lymph nodes (P0123, P0413 and P0909), two had poorly differentiated disease (P0825 and P0909), and none had distant metastasis. There were eight females and four males, and the median age at diagnosis was 67, ranging from 35 to 83. Among these 12 patients, one was diagnosed at stage I, five at stage II, five at stage III and one at stage IV, according to the 8th edition of the AJCC staging system. None of them received chemotherapy or radiation before tumour resection. The available clinical characteristics are summarized in Table 1.

Sample collection and preparation

Fresh tumour and adjacent normal tissue samples (at least 2 cm from the matched tumour tissue) were surgically resected from the patients described above. Patients P0701, P0909, P1212, P1228, P0215, P0411, P0413, P0825, P0123 and P0309 provided peripheral blood together with paired tumour and adjacent normal tissues, whereas patients P1012 and P1207 provided only fresh tumour tissue and matched peripheral blood. Tumour and adjacent normal tissues were cut into approximately 1-mm³ pieces in RPMI-1640 medium (Invitrogen) with 10% fetal bovine serum (FBS; Sciencell), and enzymatically digested with the MACS Tumour Dissociation Kit (Miltenyi Biotec) for 30 min on a rotor at 37 °C, according to the manufacturer’s instructions. The dissociated cells were then passed through a 40-µm cell strainer (BD) and centrifuged at 400 g for 10 min. After the supernatant was removed, the pelleted cells were resuspended in red blood cell lysis buffer (Solarbio) and incubated on ice for 2 min to lyse red blood cells. After two washes with PBS (Invitrogen), the cell pellets were resuspended in sorting buffer (PBS supplemented with 1% FBS). PBMCs were isolated using HISTOPAQUE-1077 (Sigma-Aldrich) solution as previously described[15]. In brief, 3 ml of fresh peripheral blood was collected before surgery in EDTA anticoagulant tubes and layered onto HISTOPAQUE-1077. After centrifugation, lymphocytes remained at the plasma–HISTOPAQUE-1077 interface; they were carefully transferred to a new tube and washed twice with PBS. Red blood cells were removed using the same procedure described above. These lymphocytes were resuspended in sorting buffer.

Single-cell sorting, reverse transcription, amplification and sequencing

Single-cell suspensions were stained with antibodies against CD3, CD4, CD8 and CD25 (anti-human CD3, UCHT1; anti-human CD4, OKT4; anti-human CD8, OKT8; anti-human CD25, BC96; eBioscience) for fluorescence-activated cell sorting (FACS), performed on a BD Aria III instrument. Single cells of different subtypes, including cytotoxic T (TC), T helper (TH) and regulatory T (Treg) cells, were enriched by gating on 7AAD−CD3+CD8+, 7AAD−CD3+CD4+CD25−/+ and 7AAD−CD3+CD4+CD25++ T cells, respectively, and sorted into 96-well plates (Axygen) chilled to 4 °C and pre-loaded with lysis buffer containing 1 µl 10 mM dNTP mix (Invitrogen), 1 µl 10 µM oligo-dT primer, 1.9 µl 1% Triton X-100 (Sigma) and 0.1 µl 40 U µl−1 RNase inhibitor (Takara). The single-cell lysates were sealed and immediately stored at −80 °C. Single-cell transcriptome amplification was performed according to the Smart-seq2 protocol[15,16]. External RNA Controls Consortium spike-ins (ERCC; Ambion; 1:4,000,000) were added to each well as exogenous controls before reverse transcription. The amplified cDNA products were purified with 1× Agencourt XP DNA beads (Beckman). Quality control was performed after this first round of purification, including detection of CD3D by qPCR (forward primer, 5′-TCATTGCCACTCTGCTCC-3′; reverse primer, 5 primer, 5′-TCATTGCCACT) and fragment analysis with an AATI fragment analyser. For single-cell samples that passed quality control (cycle threshold <30), the DNA products were further purified with 0.5× Agencourt XP DNA beads, and the concentration of each sample was quantified with Qubit HsDNA kits (Invitrogen). Multiplexed (384-plex) libraries were constructed and amplified using the TruePrep DNA Library Prep Kit V2 for Illumina (Vazyme Biotech). The libraries were then purified with Agencourt XP DNA beads and pooled, and their quality was assessed with a fragment analyser.
For all 12 patients, purified libraries were sequenced on an Illumina HiSeq 4000 sequencer with 150-bp paired-end reads. For patient P1207, only CD8+ T cells were collected because CD4 antibody was temporarily unavailable.

Bulk DNA isolation and sequencing

Genomic DNA was extracted from peripheral blood and tissue samples of patients with CRC using the QIAamp DNA Mini Kit (QIAGEN) according to the manufacturer’s instructions. DNA concentrations were quantified using Qubit HsDNA kits (Invitrogen), and DNA quality was evaluated by agarose gel electrophoresis. Exome libraries were constructed using the SureSelectXT Human All Exon V5 capture library (Agilent). Samples were sequenced on an Illumina HiSeq 4000 sequencer with 150-bp paired-end reads.

Multi-colour immunohistochemistry

Opal™ multi-colour immunohistochemistry (IHC) staining was performed with rabbit anti-human CD3 (Abcam, clone SP7, 1:400), mouse anti-human CD8 (Abcam, clone 144B, 1:500), rabbit anti-human CD4 (Abcam, clone EPR6855, 1:400) and mouse anti-human FOXP3 (Abcam, clone mAbcam22510, 1:500) antibodies to validate the presence of infiltrating TC, TH and Treg cells in tumour tissues. Formalin-fixed paraffin-embedded tissue sections were prepared from the collected specimens as previously described[15]. Antigen retrieval was performed in AR9 buffer (pH 6.0, PerkinElmer) by boiling in an oven for 15 min. After pre-incubation with blocking buffer at room temperature for 10 min, the sections were incubated at room temperature for 1 h with the antibodies listed above. A horseradish peroxidase-conjugated secondary antibody (PerkinElmer) was added and incubated at room temperature for 10 min. Signal amplification was performed using TSA working solution diluted 1:100 in 1× amplification diluent (PerkinElmer) and incubated at room temperature for 10 min. Multispectral images were acquired with a Mantra Quantitative Pathology Workstation (PerkinElmer, CLS140089) at 20× magnification and analysed with InForm Advanced Image Analysis Software (PerkinElmer) version 2.3. For each patient, a total of 8–15 high-power fields were imaged, depending on tumour size.

Microsatellite instability testing

DNA purified from tumour tissues using the QIAamp DNA Mini Kit (QIAGEN) was subjected to a multiplex fluorescent PCR-based assay (Promega) amplifying seven loci, namely five mononucleotide repeats (NR21, BAT26, BAT25, NR24 and Mono27) and two pentanucleotide repeats (PentaC and PentaD), and the results were compared with DNA extracted from matched adjacent normal tissues. Multiplex PCR products were analysed on an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems).

Quality control and preprocessing of single cell RNA-seq data

Low-quality read pairs of single-cell RNA sequencing (scRNA-seq) data were filtered out if at least one end of the read pair met any of the following criteria: (1) ‘N’ bases account for ≥10% of the read length; (2) bases with quality <5 account for ≥50% of the read length; or (3) the read contains adaptor sequence. The filtered read pairs were processed using the HTSeqGenie pipeline (R package version 4.8) to obtain the gene expression table. Specifically, read pairs were first mapped to human ribosomal RNA (rRNA) sequences (downloaded from the Rfam database), and only read pairs with both ends unmapped were kept for downstream analysis. Read pairs passing this rRNA filter were aligned to the human reference genome (hg19) using GSNAP[17] with parameters ‘--novelsplicing 1 -n 10 -i 1 -M 2’. To calculate gene expression levels, the gene model file ‘knownGene.txt’ (30 June 2013 version) downloaded from UCSC was used. The R function findOverlaps was used to count the number of uniquely mapped read pairs located in each gene, and the resulting genes-by-cells count table was used for downstream analysis.

The transcripts per million (TPM) table was derived from the count table, with the TPM value calculated as

$$\mathrm{TPM}_{ij} = \frac{C_{ij}}{\sum_{k} C_{kj}} \times 10^{6},$$

where $C_{ij}$ is the count value of gene $i$ in cell $j$ and the sum runs over all genes $k$. Note that this TPM is a simplified version, based on the assumption that all mapped reads are approximately the same length.

Low-quality cells were filtered out if the library size or the number of expressed genes (counts larger than 0) was smaller than predefined thresholds, both defined as the median across all cells minus 3× the median absolute deviation. Cells whose proportion of mitochondrial gene counts exceeded 10% were also discarded. Only cells with an average TPM of CD3D, CD3E and CD3G larger than 10 were kept for subsequent analysis.
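The simplified TPM formula and the cell-level filters described above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the HTSeqGenie-based pipeline used in the study: mitochondrial genes are assumed to be identified by an "MT-" symbol prefix, and the median absolute deviation is used unscaled (R's `mad()` applies a 1.4826 scaling constant by default, and the text does not say which form was used). The gene list and counts are toy values.

```python
import numpy as np


def simplified_tpm(counts):
    """Scale a genes x cells count matrix to one million counts per cell.

    Implements TPM_ij = C_ij / sum_k C_kj * 1e6 (no gene-length term).
    """
    return counts / counts.sum(axis=0, keepdims=True) * 1e6


def mad_lower_bound(values, n_mads=3.0):
    """Median minus n_mads times the unscaled median absolute deviation."""
    med = np.median(values)
    return med - n_mads * np.median(np.abs(values - med))


def qc_pass(counts, genes, mito_max=0.10, cd3_min_tpm=10.0):
    """Return a boolean mask over cells that pass all cell-level filters."""
    lib_size = counts.sum(axis=0)
    n_detected = (counts > 0).sum(axis=0)
    # Library size and detected-gene thresholds: median - 3 x MAD.
    keep = (lib_size >= mad_lower_bound(lib_size)) & (
        n_detected >= mad_lower_bound(n_detected)
    )
    # Assumption: mitochondrial genes carry an "MT-" prefix.
    is_mito = np.array([g.startswith("MT-") for g in genes])
    keep &= counts[is_mito].sum(axis=0) / lib_size <= mito_max
    # Keep cells whose mean TPM over CD3D, CD3E and CD3G exceeds 10.
    cd3_rows = [genes.index(g) for g in ("CD3D", "CD3E", "CD3G")]
    keep &= simplified_tpm(counts)[cd3_rows].mean(axis=0) > cd3_min_tpm
    return keep


# Toy data (hypothetical): 5 genes x 4 cells, each cell with 1,000 counts.
genes = ["CD3D", "CD3E", "CD3G", "MT-CO1", "GENE5"]
counts = np.array([
    [100, 90, 50, 0],
    [100, 110, 50, 0],
    [100, 100, 50, 0],
    [20, 30, 300, 10],
    [680, 670, 550, 990],
])
print(qc_pass(counts, genes))  # cell 2: high mito; cell 3: no CD3 signal
```

Because the thresholds are derived from the data (median minus 3× MAD), they adapt to each dataset instead of relying on fixed cut-offs, which is why the MAD form chosen (scaled or unscaled) matters.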
We further identified CD4+, CD8+, CD4−CD8− (double negative) and CD4+CD8+ (double positive) T cells based on the gene expression data. Given the average TPM of CD8A and CD8B, a cell was considered CD8 positive if the value was larger than 30 and CD8 negative if it was less than 3; likewise, given the TPM of CD4, a cell was considered CD4 positive if the value was larger than 30 and CD4 negative if it was less than 3. Hence, cells could be classified in silico as CD4+CD8−, CD4−CD8+, CD4+CD8+, CD4−CD8−, or other cells that could not be clearly defined. Although TPM is an intuitive and popular measure for standardizing the total number of transcripts between cells, it is insufficient on its own and could bias downstream analysis, because TPM values can be dominated by a handful of highly expressed genes. We therefore used TPM mainly for preliminary data processing and gene expression visualization. Methods for normalizing scRNA-seq data, including scran[18], have recently been proposed to achieve robust and effective normalization, so we used size-factor-normalized read counts for the main analyses in our study, including dimensionality reduction, clustering and identification of marker genes for each cluster. After discarding genes with average counts ≤1, the count table of the cells passing the above filters was normalized by a pooling strategy using the Bioconductor R package scran[18]. Specifically, cells were pre-clustered using the 'quickCluster' function with the parameter 'method = "hclust"', and size factors were calculated using the 'computeSumFactors' function with the parameter 'sizes = seq(20, 100, by = 20)', which specifies the numbers of cells per pool. The raw counts of each cell were divided by its size factor, and the resulting normalized counts were log2-transformed and used for batch correction.
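The in silico CD4/CD8 assignment reduces to two threshold rules applied to per-cell TPM values. A minimal sketch, assuming TPM vectors as input; the function name and the label strings (ASCII '-' standing for '−') are hypothetical.

```python
import numpy as np


def classify_cd4_cd8(cd8a_tpm, cd8b_tpm, cd4_tpm, hi=30.0, lo=3.0):
    """Label each cell CD4(+/-)CD8(+/-), or 'undefined' if either call is ambiguous."""
    cd8 = (np.asarray(cd8a_tpm, dtype=float) + np.asarray(cd8b_tpm, dtype=float)) / 2
    cd4 = np.asarray(cd4_tpm, dtype=float)

    def state(x):
        # '+' above hi, '-' below lo, empty string in the ambiguous band [lo, hi]
        return np.where(x > hi, "+", np.where(x < lo, "-", ""))

    labels = []
    for s4, s8 in zip(state(cd4), state(cd8)):
        labels.append(f"CD4{s4}CD8{s8}" if s4 and s8 else "undefined")
    return labels
```

Cells whose CD4 or CD8 value falls between 3 and 30 TPM are left unassigned, matching the "other cells that could not be clearly defined" category.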
Scran uses a pooling strategy, implemented in the 'computeSumFactors' function, in which size factors for individual cells are deconvolved from the size factors of pools. To avoid violating the assumption that most genes are not differentially expressed, hierarchical clustering based on Spearman's rank correlation was first performed with the 'quickCluster' function, and normalization was then performed separately within each resulting cluster. The size factors of each cluster were further rescaled to enable comparison between clusters. To remove possible donor effects on expression, the normalized table was further centred by patient, so that in the centred expression table the mean value of each gene over the cells of each patient was zero. A total of 12,548 genes and 10,805 cells were retained in the final expression table. Unless explicitly stated otherwise, 'normalized read count' or 'normalized expression' in this study refers to the normalized and centred count data.
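The final steps (division by size factors, log2 transformation and centring by patient) can be sketched in NumPy. This assumes the scran size factors have already been computed and adds a pseudocount of 1 before the log2 transform, which the text does not specify; the pool-based deconvolution itself is not reproduced here.

```python
import numpy as np


def log_normalize(counts, size_factors, pseudocount=1.0):
    """log2(count / size_factor + pseudocount) for a genes x cells matrix."""
    counts = np.asarray(counts, dtype=float)
    sf = np.asarray(size_factors, dtype=float)   # one size factor per cell
    return np.log2(counts / sf + pseudocount)


def center_by_patient(expr, patients):
    """Subtract, for every gene, the mean over the cells of each patient."""
    expr = np.asarray(expr, dtype=float)
    patients = np.asarray(patients)
    centred = expr.copy()
    for patient in np.unique(patients):
        cols = patients == patient
        centred[:, cols] -= expr[:, cols].mean(axis=1, keepdims=True)
    return centred
```

After centring, each gene averages to zero within every patient, which is the property the text uses to remove donor effects.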

Unsupervised clustering analysis of CRC single T cell RNA-seq dataset

The cell clusters used here were the same as those defined in our related Nature paper[11]. The expression tables of CD8+CD4− T cells and CD8−CD4+ T cells, as defined by the aforementioned in silico classification but excluding MAIT cells and iNKT cells, were fed separately into an iterative unsupervised clustering pipeline. Specifically, given an expression table, the top n genes with the largest variance were selected, and the expression data of these n genes were analysed by single-cell consensus clustering (SC3)[19], with n tested at 500, 1000, 1500, 2000, 2500 and 3000. In SC3, distance matrices were calculated based on Spearman correlation and then transformed by calculating the eigenvectors of the graph Laplacian. The k-means algorithm was then applied to the first d eigenvectors multiple times, where d ranged from 4% to 7% of the total number of input cells. Finally, hierarchical clustering with complete agglomeration was performed on the SC3 consensus matrix, and k clusters were inferred. The SC3 parameter k, used in both the k-means and the hierarchical clustering, was tested from 2 to 10. For each SC3 run, the silhouette values were calculated, the consensus matrix was plotted and cluster-specific genes were identified; this information was used to determine the optimal k and n. Once stable clusters were determined, the above procedure was applied iteratively to each of these clusters to reveal sub-clusters. After obtaining the stable clusters by SC3, we further redefined the cluster labels of indeterminate cells with silhouette values less than zero using the R package XGBoost[20]. The training dataset comprised cells with silhouette values >0, and cells with silhouette values <0 were reassigned to the cluster with the largest predicted score. The in silico classified CD8+CD4− MAIT cells had gene expression patterns distinct from those of other CD8+CD4− T cells and were defined as cluster 'CD8_C08-SLC4A10'.
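To make the silhouette-based clean-up step concrete, the sketch below selects the top-n variance genes, computes per-cell silhouette values and reassigns cells with negative silhouettes. Two substitutions are made for brevity and should be read as assumptions: Euclidean distance replaces the Spearman-based distance used in SC3, and a nearest-centroid classifier stands in for the XGBoost model used in the study.

```python
import numpy as np


def top_variable_genes(expr, n):
    """Row indices of the n genes (rows of a genes x cells matrix) with the largest variance."""
    variances = np.var(np.asarray(expr, dtype=float), axis=1)
    return np.argsort(variances)[::-1][:n]


def silhouette_values(X, labels):
    """Per-cell silhouette; X is cells x features, Euclidean distance (assumed)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: silhouette left at 0
        a = dist[i, same].mean()                        # mean intra-cluster distance
        b = min(dist[i, labels == other].mean()         # nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


def reassign_negative(X, labels, s):
    """Move cells with s < 0 to the nearest centroid of confident (s > 0) cells.

    Nearest-centroid is a stand-in for the XGBoost classifier used in the study.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()
    train = s > 0
    classes = np.unique(labels[train])
    centroids = np.array([X[train & (labels == c)].mean(axis=0) for c in classes])
    for i in np.where(s < 0)[0]:
        labels[i] = classes[np.argmin(((centroids - X[i]) ** 2).sum(axis=1))]
    return labels
```

In the study the training/prediction split is the same (silhouette > 0 trains, silhouette < 0 is predicted); only the classifier differs here.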
Once the clustering results were obtained, one-way ANOVA implemented by the R function aov was performed to identify genes differentially expressed among the clusters, and the R function TukeyHSD was used to identify which cluster pairs showed a significant difference. A gene was defined as significantly differentially expressed if it met both of the following criteria: (1) the adjusted P-value (Benjamini–Hochberg method) of the F test was less than 0.05; and (2) for at least one significant cluster pair (P-value of Tukey's 'Honest Significant Difference' method less than 0.01), the absolute difference in mean expression was larger than 1. Each significantly differentially expressed gene was assigned to the cluster in which it showed the highest expression. The t-SNE method implemented in the R package Rtsne was used for clustering visualization. To visualize cell density on the t-SNE plot, kernel density estimation was performed using the R function kde (ks package), and contour lines encompassing the top 10%, 20%, …, 90% of cells with the highest densities were shown. A total of 8,530 T cells with cluster assignments, including 3,628 CD8+CD4− and 4,902 CD8−CD4+ T cells, were used in the t-SNE projection. Other cells, such as CD8+CD4+ and CD8−CD4− T cells, were not included in this visualization.
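The two-part differential expression rule can be expressed compactly once the F-test and Tukey HSD results are available. The sketch below assumes those statistics were computed elsewhere (for example with R's aov and TukeyHSD); it implements only the Benjamini–Hochberg adjustment and the decision rule, and the function names are hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values (equivalent to R's p.adjust(method = "BH"))."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity: cumulative minimum from the largest P-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def is_de_gene(f_padj, tukey_pairs, padj_cut=0.05, tukey_cut=0.01, min_diff=1.0):
    """Apply both criteria to one gene.

    f_padj: BH-adjusted P-value of the ANOVA F test.
    tukey_pairs: iterable of (mean_difference, tukey_p) for every cluster pair.
    """
    if f_padj >= padj_cut:
        return False
    return any(abs(diff) > min_diff and p < tukey_cut for diff, p in tukey_pairs)
```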

Analysis pipelines of bulk exome sequencing data

The bulk exome sequencing data were cleaned following the same procedure used for the scRNA-seq data. The cleaned read pairs were then processed with a BWA–Picard–Genome Analysis Toolkit (GATK)–Strelka pipeline. In brief, the cleaned read pairs were aligned to the human reference genome version b37 (downloaded from ftp://ftp.broadinstitute.org:/bundle) using the BWA-MEM algorithm[21]. The alignments were then sorted and de-duplicated with Picard (Broad Institute). GATK[22] was used to realign reads around putative INDELs with the Smith–Waterman alignment algorithm and to recalibrate base quality scores. The analysis-ready BAM files were input into the GATK UnifiedGenotyper module to call SNPs/INDELs, into Strelka[23] to call somatic SNVs/INDELs, and into ADTEx[24] (version 1.0.4) to call somatic copy number alterations. The mutations were annotated with ANNOVAR[25].

TCR assembly

TraCeR[26] was used to reconstruct the TCR sequences of each cell. The outputs of TraCeR include the assembled nucleotide sequences of both α and β chains, the coding potential of the nucleotide sequences (that is, whether they are productive), the translated amino acid sequences, the CDR3 sequences and the estimated TPM values of the α and β chains. Only cells with TPM values larger than 10 for the α chain and larger than 15 for the β chain were kept. For cells with two or more α or β chains assembled, the productive α–β pair with the highest expression level was defined as the dominant α–β pair of that cell. If two cells had identical dominant α–β pairs, they were considered to share a clonal TCR. To integrate with the gene expression data, TCR-based analysis was performed only for cells that passed the aforementioned quality control pipeline (10,805 in total). Thus, 9,878 cells with TCR information were used in the integrative analysis[27] (Supplementary File 1). A cell was classified as a MAIT cell if its α chain was composed of the V segment TRAV1-2 and one of the J segments TRAJ33, TRAJ20 or TRAJ12[28], and as an invariant natural killer T (iNKT) cell if its α chain was rearranged from the V segment TRAV10 and the J segment TRAJ18[29]. Among the 9,878 cells with at least one pair of productive α and β chains, only 3 cells were identified as iNKT cells and 102 cells as MAIT cells, including 71 CD8+CD4− T cells classified in silico.
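The dominant-pair, clonotype and invariant-TCR rules can be sketched in plain Python. The chain record layout (`locus`, `v`, `j`, `seq`, `productive`, `tpm`) is a hypothetical simplification, not TraCeR's actual output format, and the TPM cut-offs are applied per chain as one reading of the filtering step.

```python
from collections import defaultdict

ALPHA_MIN_TPM = 10.0
BETA_MIN_TPM = 15.0


def dominant_pair(chains):
    """Highest-TPM productive alpha and beta chain above the cut-offs, or None."""
    best = {}
    for locus, cutoff in (("A", ALPHA_MIN_TPM), ("B", BETA_MIN_TPM)):
        candidates = [c for c in chains
                      if c["locus"] == locus and c["productive"] and c["tpm"] > cutoff]
        if not candidates:
            return None
        best[locus] = max(candidates, key=lambda c: c["tpm"])
    return best["A"], best["B"]


def clonotypes(cells):
    """Group cell IDs by identical dominant (alpha, beta) sequence pairs."""
    groups = defaultdict(list)
    for cell_id, chains in cells.items():
        pair = dominant_pair(chains)
        if pair is not None:
            alpha, beta = pair
            groups[(alpha["seq"], beta["seq"])].append(cell_id)
    return dict(groups)


def invariant_type(alpha):
    """'MAIT', 'iNKT' or None from the V/J usage of an alpha chain."""
    if alpha["v"] == "TRAV1-2" and alpha["j"] in {"TRAJ33", "TRAJ20", "TRAJ12"}:
        return "MAIT"
    if alpha["v"] == "TRAV10" and alpha["j"] == "TRAJ18":
        return "iNKT"
    return None
```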

Definition of STARTRAC indices

We present STARTRAC as a framework, defined by four indices, for analysing different aspects of T cells based on paired single-cell transcriptomes and TCR sequences. The first index, STARTRAC-dist (STARTRAC-distribution), uses the ratio of observed to expected cell numbers in tissues to measure the enrichment of T cell clusters across different tissues. Given a contingency table of T cell clusters by tissues, we first apply the chi-squared test to evaluate whether the distribution of T cell clusters across tissues deviates significantly from random expectation. We then calculate the STARTRAC-dist index for each combination of T cell cluster and tissue according to the following formula:

$$R_{o/e} = \frac{\text{Observed}}{\text{Expected}},$$

where $R_{o/e}$ is the ratio of the observed cell number to the expected cell number for a given combination of T cell cluster and tissue. The expected cell number for each combination of T cell cluster and tissue is obtained from the chi-squared test. $R_{o/e}$ indicates whether cells of a certain cluster are enriched ($R_{o/e} > 1$) or depleted ($R_{o/e} < 1$) in a specific tissue. The other three STARTRAC indices, STARTRAC-expa (STARTRAC-expansion), STARTRAC-migr (STARTRAC-migration) and STARTRAC-tran (STARTRAC-transition), are designed to measure the degree of clonal expansion, tissue migration and state transition of T cell clusters, respectively, based on TCR tracking. MAIT cells were not included in these analyses because they have distinct TCRs. For STARTRAC-expa, which uses the standard TCR clonality measurement[30] but applies it specifically to different T cell clusters, we first adopt the normalized Shannon entropy to calculate the evenness of the TCR repertoire of a given T cell cluster and then define the STARTRAC-expa index as 1 − evenness.
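Because the expected counts of a chi-squared test of independence are row total × column total / grand total, Ro/e can be computed directly from the contingency table. A NumPy sketch (the function name is illustrative; `scipy.stats.chi2_contingency` also returns the same expected table together with the test statistic and P-value):

```python
import numpy as np


def ro_e(table):
    """Observed / expected cell counts for a clusters x tissues contingency table."""
    observed = np.asarray(table, dtype=float)
    # expected counts under independence of cluster and tissue
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True)
                / observed.sum())
    return observed / expected
```

For example, with 10 of 40 cluster-A cells in blood out of 100 cells of which 40 are in blood, 16 are expected, so Ro/e = 10/16 = 0.625 (depletion).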
Mathematically, the STARTRAC-expa index of a specific cluster with $N$ clonotypes is defined as

$$\text{STARTRAC-expa} = 1 - \frac{-\sum_{i=1}^{N} p_i \log_2 p_i}{\log_2 N},$$

where $p_i$ is the cell frequency of clonotype $i$ in the cluster, and a clonotype is defined by identical, full-length, paired α and β TCR chains. STARTRAC-expa ranges from 0 to 1: 0 indicates no clonal expansion of any clonotype, whereas 1 indicates that the cluster is composed of a single clonally expanded clonotype; a higher STARTRAC-expa indicates higher clonality. T cells with identical TCR clonotypes, even if they are present in different tissues or in different developmental states, are likely derived from a single naïve T cell that expanded clonally at one location and then migrated across tissues or underwent state transitions. Based on this principle, we define STARTRAC-migr and STARTRAC-tran to evaluate the extent of tissue migration and state transition of each clonotype, respectively. For each clonotype, given its distribution across tissues (peripheral blood, adjacent normal mucosa and tumour), we define its STARTRAC-migr index as

$$\text{STARTRAC-migr}_t = -\sum_{j=1}^{3} p_{tj} \log_2 p_{tj},$$

where $p_{tj}$ is the ratio of the number of cells with TCR clonotype $t$ in tissue $j$ to the total number of cells with TCR clonotype $t$, and $\sum_{j=1}^{3} p_{tj} = 1$ (terms with $p_{tj} = 0$ contribute 0). For two T cell clusters with similar clonal expansion and clonal size, the one whose clonal cells are broadly distributed across tissues is likely to be more mobile. Similarly, the STARTRAC-tran index of a clonotype is defined as

$$\text{STARTRAC-tran}_t = -\sum_{k=1}^{K} p_{tk} \log_2 p_{tk},$$

where $p_{tk}$ is the ratio of the number of cells with TCR clonotype $t$ in cluster $k$ to the total number of cells with TCR clonotype $t$, $\sum_{k=1}^{K} p_{tk} = 1$, and $K$ is the total number of cell clusters. The input of STARTRAC-migr is the observed cell frequency of a given clonotype across tissues, whereas the input of STARTRAC-tran is the observed cell frequency of a given clonotype across cell clusters.
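The indices defined above are Shannon entropies in bits. A minimal sketch, assuming each cell is represented only by its clonotype, tissue or cluster label; treating a single-clonotype cluster as fully expanded (STARTRAC-expa = 1) is an assumption made to avoid dividing by log2(1) = 0, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy in bits; zero counts contribute nothing (0 log 0 = 0)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def startrac_expa(clonotypes_in_cluster):
    """1 - normalized Shannon entropy of clonotype frequencies within one cluster."""
    counts = list(Counter(clonotypes_in_cluster).values())
    n = len(counts)
    if n == 1:
        return 1.0  # assumption: a single clonotype means full expansion
    return 1.0 - entropy_bits(counts) / math.log2(n)


def startrac_clone_index(labels_of_clone_cells):
    """STARTRAC-migr (tissue labels) or STARTRAC-tran (cluster labels) of one clonotype."""
    return entropy_bits(list(Counter(labels_of_clone_cells).values()))
```

A clonotype with cells in blood, normal mucosa and twice in tumour has frequencies 0.25, 0.25 and 0.5, giving STARTRAC-migr = 1.5 bits.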
By contrast, the input of STARTRAC-expa is the observed cell frequency of a given cell cluster across clonotypes, and the input of the traditional TCR clonality measure is the observed sequence frequency across the TCR repertoire of a given sample. Once the extent of tissue migration of each clonotype has been quantified by STARTRAC-migr, the STARTRAC-migr index of a cluster cls containing $T$ clonotypes in total is defined as the weighted average of the migration indices of all TCR clonotypes in the cluster:

$$\text{STARTRAC-migr}_{cls} = \sum_{t=1}^{T} w_{t,cls}\, \text{STARTRAC-migr}_t,$$

where $w_{t,cls}$ is the ratio of the number of cells with clonotype $t$ in cluster cls to the total number of cells in cluster cls. Similarly, once the extent of state transition of each clonotype has been quantified by STARTRAC-tran, the STARTRAC-tran index of a cluster containing $T$ clonotypes in total is defined as the weighted average of the state-transition indices of all TCR clonotypes in the cluster:

$$\text{STARTRAC-tran}_{cls} = \sum_{t=1}^{T} w_{t,cls}\, \text{STARTRAC-tran}_t,$$

where $w_{t,cls}$ is again the ratio of the number of cells with clonotype $t$ in cluster cls to the total number of cells in cluster cls. Besides the overall evaluation of migration and state transitions by STARTRAC-migr and STARTRAC-tran, we also define pairwise STARTRAC-migr (pSTARTRAC-migr) and pairwise STARTRAC-tran (pSTARTRAC-tran) indices for more precise quantification. For example, given a clonotype $t$ and two tissue types (e.g., blood and tumour), the pSTARTRAC-migr index is calculated as

$$\text{pSTARTRAC-migr}_t = -\sum_{j=1}^{2} p_{tj} \log_2 p_{tj},$$

where $p_{tj}$ is the ratio of the number of cells with TCR clonotype $t$ in tissue $j$ to the total number of cells with TCR clonotype $t$ in tissues 1 and 2 (i.e., blood and tumour), and $\sum_{j=1}^{2} p_{tj} = 1$. In other words, pSTARTRAC-migr uses the same formula as STARTRAC-migr but restricts the calculation to two tissues, with the cell frequencies re-calculated between the two specified tissues.
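The cluster-level weighted average can be sketched as follows, assuming a hypothetical list of `(clonotype, cluster, tissue)` tuples, one per cell. Each clonotype's index is computed over all of its cells (in every cluster) before being weighted by that clonotype's share of the cluster.

```python
import math
from collections import Counter, defaultdict


def entropy_bits(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def cluster_level_index(cells, cls, field):
    """Weighted average of clonotype-level indices over the cells of cluster `cls`.

    cells: list of (clonotype, cluster, tissue) tuples (hypothetical layout).
    field: "tissue" for STARTRAC-migr, "cluster" for STARTRAC-tran.
    """
    idx = {"cluster": 1, "tissue": 2}[field]
    labels_per_clone = defaultdict(list)
    for cell in cells:
        labels_per_clone[cell[0]].append(cell[idx])
    # clonotype-level index, computed over all cells of each clonotype
    clone_index = {t: entropy_bits(list(Counter(v).values()))
                   for t, v in labels_per_clone.items()}
    # w_{t,cls}: share of cluster cls made up of clonotype t
    weights = Counter(cell[0] for cell in cells if cell[1] == cls)
    n = sum(weights.values())
    return sum(w / n * clone_index[t] for t, w in weights.items())
```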
Likewise, given a clonotype $t$ and two T cell clusters (e.g., TEM and TEX), the pSTARTRAC-tran index is calculated as

$$\text{pSTARTRAC-tran}_t = -\sum_{k=1}^{2} p_{tk} \log_2 p_{tk},$$

where $p_{tk}$ is the ratio of the number of cells with TCR clonotype $t$ in cluster $k$ to the total number of cells with TCR clonotype $t$ in clusters 1 and 2 (i.e., TEM and TEX), and $\sum_{k=1}^{2} p_{tk} = 1$. Thus, pSTARTRAC-tran uses the same formula as STARTRAC-tran but restricts the calculation to two clusters, with the cell frequencies re-calculated between the two specified clusters. Once the pairwise STARTRAC-migr and STARTRAC-tran indices of clonotypes are obtained, the corresponding cluster-level indices are calculated as weighted averages according to the clusters' clonotype compositions.
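The pairwise indices apply the same entropy to two categories only. A sketch under the same hypothetical `(clonotype, cluster, tissue)` representation; returning 0 for a clonotype with no cells in either of the two categories is an assumption, since the text does not cover that case.

```python
import math
from collections import Counter, defaultdict


def entropy_bits(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def pairwise_clone_index(labels, a, b):
    """pSTARTRAC index of one clonotype for categories a and b (tissues or clusters)."""
    counts = Counter(labels)
    sub = [counts.get(a, 0), counts.get(b, 0)]   # frequencies re-computed over a and b only
    if sum(sub) == 0:
        return 0.0  # assumption: clonotype absent from both categories
    return entropy_bits(sub)


def pairwise_cluster_index(cells, cls, field, a, b):
    """Cluster-level pSTARTRAC index: clonotype indices weighted by their share of `cls`."""
    idx = {"cluster": 1, "tissue": 2}[field]
    labels_per_clone = defaultdict(list)
    for cell in cells:
        labels_per_clone[cell[0]].append(cell[idx])
    weights = Counter(cell[0] for cell in cells if cell[1] == cls)
    n = sum(weights.values())
    return sum(w / n * pairwise_clone_index(labels_per_clone[t], a, b)
               for t, w in weights.items())
```

With two categories the index is bounded by 1 bit, reached when a clonotype is split evenly between them.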

Summary of scRNA-seq data and bioinformatics workflow used for data processing

For all 12 patients, a total of 35.5 G raw reads and 5.4 T raw bases were obtained by sequencing. After preprocessing, we obtained 32.5 G high-quality reads, with an average high-quality rate of 91.3% (Online-only Table 1). The data processing procedures and the tools used in each step, including quality control filtering, TCR assembly, expression quantification, data normalization and downstream analyses, are summarized in a flowchart (Fig. 1b).

Data Records

As described in our related research paper[11], the raw sequencing data have been deposited in the European Genome-phenome Archive (EGA) under study accession ID EGAS00001002791 and dataset accession ID EGAD00001003910[31], and are available in FASTQ format upon request and approval. The Data Access Agreement is provided at https://github.com/zhangyybio/single-T-cell-data-access. Applicants can request access to the data by downloading this agreement or by sending an email to cancerpku@pku.edu.cn. The approval process includes verification of the institution, participants and research purposes of the application, followed by authorization by the EGA, and generally takes about two weeks. In principle, any academic research institution complying with the laws and bioethics regulations of China will be approved. The publication moratorium described in the Data Access Agreement officially expires upon publication of this Data Descriptor. The processed gene expression data were deposited in the Gene Expression Omnibus (GEO) under accession ID GSE108989[32]. The available clinical characteristics of the 12 CRC patients are summarized in Table 1, and their genomic features are summarized in Table 2 and Online-only Table 2. Online-only Table 3 lists the DNA fragment sizes of short tandem repeat loci from the tested patients in the microsatellite instability testing experiment. Basic statistics of the single-cell sequencing data are provided in Online-only Table 1. The cluster information and TCR typing data are presented in Supplementary File 1, which has also been uploaded to Figshare[27].
Table 2

Statistics of somatic mutations detected by whole exome sequencing of CRC tumours.

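Patient^a | Frameshift insertion | Frameshift deletion | Frameshift substitution | Stopgain | Stoploss | Nonframeshift insertion | Nonframeshift deletion | Nonframeshift substitution | Missense SNV^b | Synonymous SNV^b | Unknown | Total
P0123 | 27 | 129 | 0 | 51 | 2 | 0 | 4 | 0 | 869 | 389 | 1 | 1,472
P0825 | 125 | 422 | 0 | 56 | 3 | 5 | 35 | 0 | 1,181 | 494 | 2 | 2,323
P0909 | 114 | 190 | 0 | 46 | 3 | 0 | 3 | 0 | 1,440 | 582 | 0 | 2,378
P0413 | 27 | 156 | 0 | 60 | 0 | 1 | 11 | 0 | 929 | 427 | 3 | 1,614
P0215 | 5 | 22 | 0 | 9 | 1 | 6 | 14 | 0 | 79 | 42 | 0 | 178
P0411 | 2 | 6 | 0 | 6 | 0 | 1 | 2 | 0 | 68 | 29 | 0 | 114
P0701 | 2 | 3 | 0 | 11 | 0 | 2 | 0 | 0 | 102 | 46 | 0 | 166
P1012 | 4 | 11 | 0 | 10 | 0 | 0 | 2 | 0 | 180 | 63 | 0 | 270
P1207 | 2 | 5 | 0 | 3 | 0 | 1 | 7 | 0 | 59 | 36 | 0 | 113
P1212 | 6 | 5 | 0 | 4 | 0 | 2 | 3 | 0 | 135 | 52 | 0 | 207
P1228 | 3 | 7 | 0 | 7 | 0 | 0 | 1 | 0 | 88 | 46 | 0 | 152
P0309 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 40 | 15 | 0 | 58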

Somatic mutations were detected by variant caller Strelka and were annotated with ANNOVAR.

^a MSI patients: P0123, P0825, P0909 and P0413.

^b SNV, single nucleotide variant.

Online-only Table 2

Selected cancer-associated somatic mutations detected in CRC tumours.

Patient^a | Genomic mutation | Exonic function^b | Gene | cDNA mutation | Protein mutation | Hot spot | Driver gene^c
P1207 | 12:25398284,C>T | missense_SNV | KRAS | c.G35A | p.G12D | Yes | Oncogene
P1207 | 17:56448303,G>GC | frameshift_insertion | RNF43 | c.343dupG | p.A115fs | No | TSG
P1207 | 17:56492719,C>A | stopgain | RNF43 | c.G220T | p.E74X | No | TSG
P1207 | 18:48591870,TGCCCTATTG>T | nonframeshift_deletion | SMAD4 | c.569_577del | p.190_193del | No | TSG
P1207 | 19:11132513,C>T | missense_SNV | SMARCA4 | c.C338T | p.T113M | Yes | TSG
P1207 | 20:57429320,G>A | missense_SNV | GNAS | c.G1000A | p.G334S | Yes | Oncogene
P1212 | 17:7578440,T>C | missense_SNV | TP53 | c.A13G | p.K5E | Yes | TSG
P1228 | 3:41278180,G>A | missense_SNV | CTNNB1 | c.G2056A | p.E686K | No | Oncogene
P1228 | 5:112175617,TC>T | frameshift_deletion | APC | c.2227delC | p.P743fs | Yes | TSG
P1228 | 22:24159001,G>T | missense_SNV | SMARCB1 | c.G673T | p.D225Y | No | TSG
P0215 | 4:153332832,G>A | stopgain | FBXW7 | c.C124T | p.Q42X | No | TSG
P0215 | 5:112174631,C>T | stopgain | APC | c.C1240T | p.R414X | Yes | TSG
P0215 | 5:112175174,G>T | stopgain | APC | c.G1783T | p.E595X | Yes | TSG
P0411 | 17:7577046,C>A | stopgain | TP53 | c.G415T | p.E139X | Yes | TSG
P0411 | 17:70119882,A>AC | frameshift_insertion | SOX9 | c.885dupC | p.D295fs | No | TSG
P0413 | 1:43804331,G>T | missense_SNV | MPL | c.G331T | p.V111L | No | Oncogene
P0413 | 3:41275757,C>T | missense_SNV | CTNNB1 | c.C1652T | p.T551M | No | Oncogene
P0413 | 3:47158201,C>T | missense_SNV | SETD2 | c.G4498A | p.E1500K | No | TSG
P0413 | 3:128205864,G>A | missense_SNV | GATA2 | c.C11T | p.A4V | No | Oncogene
P0413 | 3:138665368,G>A | missense_SNV | FOXL2 | c.C197T | p.A66V | No | Oncogene
P0413 | 3:178952088,A>G | missense_SNV | PIK3CA | c.A3143G | p.H1048R | Yes | Oncogene
P0413 | 9:110249887,A>G | missense_SNV | KLF4 | c.T638C | p.V213A | No | Oncogene
P0413 | 11:108114816,CT>C | frameshift_deletion | ATM | c.634delT | p.F212fs | Yes | TSG
P0413 | 12:46123836,TA>T | frameshift_deletion | ARID2 | c.103delA | p.K35fs | No | TSG
P0413 | 14:81422170,G>A | missense_SNV | TSHR | c.G146A | p.S49N | No | Oncogene
P0413 | 16:348044,C>T | missense_SNV | AXIN1 | c.G1462A | p.G488R | No | TSG
P0413 | 16:3801727,G>A | missense_SNV | CREBBP | c.C3665T | p.T1222M | No | TSG
P0413 | 17:7577538,C>T | missense_SNV | TP53 | c.G266A | p.R89Q | Yes | TSG
P0413 | 17:56435160,AC>A | frameshift_deletion | RNF43 | c.1853delG | p.G618fs | Yes | TSG
P0413 | 18:42531605,C>T | missense_SNV | SETBP1 | c.C2300T | p.S767L | No | Oncogene
P0413 | 19:17942557,G>A | missense_SNV | JAK3 | c.C2731T | p.R911C | No | Oncogene
P0825 | 1:27100983,C>T | missense_SNV | ARID1A | c.C803T | p.S268F | No | TSG
P0825 | 1:27105659,C>T | missense_SNV | ARID1A | c.C254T | p.A85V | No | TSG
P0825 | 1:27105930,TG>T | frameshift_deletion | ARID1A | c.268delG | p.G90fs | Yes | TSG
P0825 | 2:48026881,G>A | missense_SNV | MSH6 | c.G853A | p.A285T | No | TSG
P0825 | 3:178947836,A>G | missense_SNV | PIK3CA | c.A2711G | p.Y904C | No | Oncogene
P0825 | 4:55594093,C>T | missense_SNV | KIT | c.C226T | p.P76S | No | Oncogene
P0825 | 4:106155778,G>GA | frameshift_insertion | TET2 | c.680dupA | p.E227fs | No | TSG
P0825 | 5:56177480,G>T | missense_SNV | MAP3K1 | c.G2453T | p.R818M | No | TSG
P0825 | 5:112154771,C>T | stopgain | APC | c.C988T | p.R330X | Yes | TSG
P0825 | 7:2968322,CG>C | frameshift_deletion | CARD11 | c.1663delC | p.R555fs | Yes | Oncogene
P0825 | 7:140453136,A>T | missense_SNV | BRAF | c.T1799A | p.V600E | Yes | Oncogene
P0825 | 9:98270529,GC>G | frameshift_deletion | PTCH1 | c.114delG | p.G38fs | No | TSG
P0825 | 16:50813641,C>T | missense_SNV | CYLD | c.C1195T | p.L399F | No | TSG
P0825 | 17:29667635,T>C | missense_SNV | NF1 | c.T1598C | p.L533S | No | TSG
P0825 | 17:29676257,G>A | missense_SNV | NF1 | c.G1873A | p.A625T | No | TSG
P0825 | 17:56435160,AC>A | frameshift_deletion | RNF43 | c.1853delG | p.G618fs | Yes | TSG
P0825 | 19:42796882,G>GC | frameshift_insertion | CIC | c.3341dupC | p.A1114fs | Yes | TSG
P0825 | 20:4167411,C>T | stopgain | SMOX | c.C1552T | p.Q518X | No | Oncogene
P0825 | 20:57415505,C>T | missense_SNV | GNAS | c.C344T | p.T115I | No | Oncogene
P0825 | 21:44513265,G>A | missense_SNV | U2AF1 | c.C451T | p.R151W | No | Oncogene
P0123 | 1:65325832,CG>C | frameshift_deletion | JAK1 | c.1289delC | p.P430fs | Yes | Oncogene
P0123 | 1:120548005,C>A | missense_SNV | NOTCH2 | c.G113T | p.C38F | No | TSG
P0123 | 3:37035079,C>T | missense_SNV | MLH1 | c.C41T | p.T14I | No | TSG
P0123 | 3:178952085,A>G | missense_SNV | PIK3CA | c.A3140G | p.H1047R | Yes | Oncogene
P0123 | 4:153247303,T>C | missense_SNV | FBXW7 | c.A971G | p.H324R | Yes | TSG
P0123 | 5:112174898,G>T | stopgain | APC | c.G1507T | p.G503X | No | TSG
P0123 | 6:33286928,G>T | missense_SNV | DAXX | c.C1784A | p.P595H | No | TSG
P0123 | 6:157505442,GA>G | frameshift_deletion | ARID1B | c.3385delA | p.K1129fs | No | TSG
P0123 | 7:140453136,A>T | missense_SNV | BRAF | c.T1799A | p.V600E | Yes | Oncogene
P0123 | 9:21974705,G>A | missense_SNV | CDKN2A | c.C122T | p.P41L | No | TSG
P0123 | 13:32954022,CA>C | frameshift_deletion | BRCA2 | c.9090delA | p.T3030fs | Yes | TSG
P0123 | 19:1221306,G>A | missense_SNV | STK11 | c.G829A | p.D277N | No | TSG
P0123 | 19:42793222,G>A | missense_SNV | CIC | c.G1114A | p.A372T | No | TSG
P0123 | 20:31024242,C>T | stopgain | ASXL1 | c.C3400T | p.Q1134X | No | TSG
P0123 | X:76938647,G>A | missense_SNV | ATRX | c.C1987T | p.R663C | Yes | TSG
P0701 | 4:153244185,G>A | stopgain | FBXW7 | c.C1444T | p.R482X | Yes | TSG
P0701 | 4:153247289,G>A | missense_SNV | FBXW7 | c.C985T | p.R329C | Yes | TSG
P0701 | 5:112174094,T>TA | stopgain | APC | c.704dupA | p.Y235_N236delinsX | Yes | TSG
P0701 | 5:112175507,C>T | stopgain | APC | c.C2116T | p.Q706X | Yes | TSG
P0701 | 11:108202177,G>A | missense_SNV | ATM | c.G2593A | p.G865R | No | TSG
P0701 | 17:70119805,CT>C | frameshift_deletion | SOX9 | c.808delT | p.F270fs | No | TSG
P0909 | 1:27106105,C>T | missense_SNV | ARID1A | c.C442T | p.R148W | No | TSG
P0909 | 2:29416773,T>C | missense_SNV | ALK | c.A976G | p.N326D | No | Oncogene
P0909 | 3:52610644,T>C | missense_SNV | PBRM1 | c.A3508G | p.T1170A | No | TSG
P0909 | 4:153244155,TC>T | frameshift_deletion | FBXW7 | c.1473delG | p.G491fs | Yes | TSG
P0909 | 5:112173917,C>T | stopgain | APC | c.C526T | p.R176X | Yes | TSG
P0909 | 9:139395150,T>C | missense_SNV | NOTCH1 | c.A5788G | p.T1930A | No | TSG
P0909 | 10:123276800,GA>G | frameshift_deletion | FGFR2 | c.828delT | p.F276fs | No | Oncogene
P0909 | 11:108186742,C>T | stopgain | ATM | c.C1171T | p.R391X | Yes | TSG
P0909 | 11:119077219,A>T | missense_SNV | CBL | c.A92T | p.D31V | No | Oncogene
P0909 | 12:46245445,C>T | missense_SNV | ARID2 | c.C1541T | p.T514M | No | TSG
P0909 | 13:28589318,T>C | missense_SNV | FLT3 | c.A2606G | p.Q869R | No | Oncogene
P0909 | 15:45007681,T>C | missense_SNV | B2M | c.T128C | p.L43P | Yes | TSG
P0909 | 15:45007824,A>AC | frameshift_insertion | B2M | c.272dupC | p.T91fs | No | TSG
P0909 | 15:90631688,A>AT | frameshift_insertion | IDH2 | c.190dupA | p.M64fs | No | Oncogene
P0909 | 16:3781375,G>A | missense_SNV | CREBBP | c.C4876T | p.R1626C | Yes | TSG
P0909 | 19:11144117,T>C | missense_SNV | SMARCA4 | c.T1307C | p.M436T | No | TSG
P0909 | 19:42795608,AC>A | frameshift_deletion | CIC | c.2689delC | p.P897fs | No | TSG
P0909 | 22:41556705,A>G | missense_SNV | EP300 | c.A3650G | p.D1217G | No | TSG
P0909 | 22:41574697,T>TC | frameshift_insertion | EP300 | c.6983dupC | p.S2328fs | No | TSG
P0909 | X:63411935,T>TC | frameshift_insertion | AMER1 | c.1231dupG | p.E411fs | No | TSG
P1012 | 7:140453136,A>T | missense_SNV | BRAF | c.T1799A | p.V600E | Yes | Oncogene
P1012 | 9:139402561,C>G | missense_SNV | NOTCH1 | c.G1046C | p.G349A | No | TSG
P1012 | 17:7577082,C>T | missense_SNV | TP53 | c.G379A | p.E127K | Yes | TSG
P1012 | 18:48581243,C>T | stopgain | SMAD4 | c.C82T | p.Q28X | Yes | TSG

^a MSI patients: P0123, P0825, P0909 and P0413.

^b SNV, single nucleotide variant.

^c TSG, tumour suppressor gene.

Online-only Table 3

DNA fragment sizes of short tandem repeat loci from tested patients in microsatellite instability testing experiment.

Normal DNATumour DNA
PatientaMarkerbSize 1Size 2Size 3Size 1Size 2Size 3Size 4
P0909 NR21106.09106.0296.76
Bat26177.31166.03161.95177.4
Bat25118.5118.47111.84
NR24134.3134.2127.87
Mono27170.33172.23170.42160.49172.2
PentaC230.4230.4204.36
PentaD172.39176.99177.08172.37181.97
P0413 NR21107.09103.3598.66107.16
Bat26178.46172.26169.19178.38
Bat25119.43119.44114.73
NR24134.2134.2127.9
Mono27172.23172.17165.05
PentaC230.5251.38230.46225.27240.89251.35
PentaD181.93191.65191.65181.93
P0825 NR21105.55105.57103.6998.12
Bat26178178166.76163.62
Bat25119.71119.8114.19111.42
NR24134.06134.06128.7126.06
Mono27172.95173.05164.95163.24
PentaC230.28230.29
PentaD176.57186.2176.67186.33
P0123 NR21106.67101.93100.1106.66
Bat26178.16173.98167.87178.15
Bat25118.96119.04112.53
NR24134.03134.08130.5
Mono27173.09173.06167.69
PentaC230.26230.26
PentaD176.77196.07196.03176.76
P1212NR21106.11106.2
Bat26178.46178.37
Bat25119.51119.49
NR24133.21133.37
Mono27169.55169.43
PentaC225.37230.56225.3230.42
PentaD176.93206.02177.07206.15
P0215NR21105.97105.93
Bat26178.11178.12
Bat25119.28119.11
NR24133.66133.47
Mono27173.29173.17
PentaC219.8230.25219.8230.17
PentaD176.39190.96176.25190.78
P0701NR21106.99107.16
Bat26177.31177.26
Bat25119.5119.33
NR24133.32133.36
Mono27172.23172.27
PentaC220.01230.42219.99230.41
PentaD191.54191.64
P1012NR21106.01106.09
Bat26177.4177.44
Bat25119.32119.33
NR24133.31133.32
Mono27172.2172.17
PentaC230.52235.67230.44235.62
PentaD191.64191.62
P1207NR21105.96106.09
Bat26178.4178.4
Bat25119.33119.33
NR24133.43133.26
Mono27170.41172.33170.4172.16
PentaC230.53235.74230.42235.61
PentaD172.49191.71172.32191.71

^a MSI patients: P0123, P0825, P0909 and P0413.

^b PentaC and PentaD are two much less variable pentanucleotide repeats.

Technical Validation

Validating the presence of tumour-infiltrating lymphocytes

Opal™ multi-colour IHC staining was performed with anti-CD3, anti-CD8, anti-CD4 and anti-FOXP3 antibodies to validate the presence of infiltrating TC, TH and Treg cells in tumour tissues (Fig. 5a).
Fig. 5

Quality assessment of single cell RNA-seq data. (a) Opal™ multi-colour IHC staining validating the presence of T cells in CRC tumours (exemplified by P0215 and P1212). (b) A representative example of the cDNA size distribution derived from the tumour of P0309. (c) A representative fragmentation profile of a sequencing library after tagmentation, prepared from pooled amplicons produced by PCR amplification of cDNA from samples of P0413. (d) Densities of GC content per sequence for two representative samples from P1212 and P1228. (e) Heatmaps showing the expression levels of classic markers in each T cell subtype; the barplots on the right show the percentages of cells expressing the corresponding genes (TPM > 0). RFU, relative fluorescence unit.

Validating the genomic features of CRC patients

Exome sequencing of bulk tumours from the 12 patients showed that four patients harboured mutations in TP53 and five patients harboured mutations in APC/FBXW7. These genomic alterations are consistent with the characteristics of colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) from The Cancer Genome Atlas (TCGA)[33]. Summary tables are provided for the statistics of somatic mutations (Table 2) and for selected cancer-associated somatic mutations (Online-only Table 2) detected in these patients.

Validating the genomic alterations of MSI patients

Among the 12 CRC patients, 4 (P0123, P0909, P0825 and P0413) showed deficient DNA mismatch repair based on IHC testing of four markers (MLH1, MSH2, MSH6 and PMS2)[11], which was also supported by their much higher mutation loads (Table 2). To further confirm the MSI status of these patients, we performed microsatellite instability testing using a multiplex fluorescent PCR-based assay. Indeed, all 4 tumours from MSI patients were characterized by the MSI-H phenotype, with two or more mononucleotide loci showing instability (Online-only Table 3).
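The MSI-H call stated above can be written as a simple rule over the five mononucleotide markers. In this sketch a locus is called unstable when the tumour shows a fragment that is not within a tolerance (1 bp here, an assumed value) of any normal fragment; the MSI-L category for exactly one unstable locus follows the conventional interpretation of this marker panel and is not stated in the text. The function names and the fragment sizes in the usage example are hypothetical.

```python
MONO_MARKERS = ("NR21", "BAT26", "BAT25", "NR24", "MONO27")


def locus_unstable(normal_sizes, tumour_sizes, tol=1.0):
    """True if any tumour fragment is more than `tol` bp from every normal fragment."""
    return any(all(abs(t - n) > tol for n in normal_sizes) for t in tumour_sizes)


def msi_status(calls, tol=1.0):
    """calls: dict marker -> (normal_sizes, tumour_sizes); pentanucleotide markers are ignored."""
    unstable = sum(locus_unstable(*calls[m], tol=tol)
                   for m in calls if m.upper() in MONO_MARKERS)
    if unstable >= 2:
        return "MSI-H"
    if unstable == 1:
        return "MSI-L"   # conventional category, assumed here
    return "MSS"
```

Usage with hypothetical sizes: `msi_status({"NR21": ([106.1], [106.0, 96.8]), "Bat26": ([177.3], [166.0, 177.4])})` counts two unstable mononucleotide loci and returns `"MSI-H"`.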

Validation of RNA samples & RNA-seq libraries

A quality control procedure, including detection of CD3D by qPCR and fragment analysis, was performed after the first round of purification of the amplified cDNA products. For high-quality single-cell samples (cycle threshold <30), the DNA products were further purified and the concentration of each sample was quantified (Fig. 5b). The constructed multiplex libraries were purified and pooled for quality assessment (Fig. 5c).

Validating the quality of scRNA-seq data

Quality control analyses revealed that the raw sequence data were of high quality, with an average high-quality rate of 91.3% (Online-only Table 1). We assessed the quality of the clean data using per-sequence quality scores and per-sequence GC content. On average, 87.9% of bases had a quality score above Phred 30 (Q30) and 94.5% above Phred 20 (Q20) (Online-only Table 1). In addition, the GC content of each sample showed a similar, approximately normal distribution, with a mean of 46.2% (Fig. 5d and Online-only Table 1). These statistics indicate that high-quality RNA-seq reads were obtained for downstream analysis.
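Per-base Q20/Q30 fractions and GC content of the kind reported above can be computed directly from FASTQ quality and sequence strings. A sketch assuming Phred+33 encoding and counting Q30 as a score of at least 30 (the usual convention); the function names are illustrative.

```python
def base_quality_fractions(qual_strings, offset=33):
    """Fractions of bases with Phred score >= 20 and >= 30 (Phred+33 assumed)."""
    total = q20 = q30 = 0
    for qual in qual_strings:
        for ch in qual:
            score = ord(ch) - offset
            total += 1
            q20 += score >= 20
            q30 += score >= 30
    return q20 / total, q30 / total


def gc_fraction(seq):
    """Fraction of G and C bases in one read sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

For example, the quality string `"II5#"` encodes scores 40, 40, 20 and 2, so 75% of its bases reach Q20 and 50% reach Q30.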

Validating cell types by marker genes

To evaluate the accuracy of FACS, we examined the expression of conventional marker genes of T cell subsets, including CD3D, CD3E, CD3G, CD8A, CD8B, CD4, IL2RA and FOXP3 (Fig. 5e). Although dropout events are prevalent and challenging in single-cell RNA-seq data, the expression levels of classical T cell markers were consistent with the protein levels measured by FACS. Specifically, all T cells were characterized by high expression of the CD3 genes (CD3D, CD3E and CD3G). Most TC cells expressed high levels of CD8 (CD8A and CD8B) but low levels of CD4, whereas TH cells and Tregs exhibited the opposite pattern. Compared with TH cells, Tregs showed higher expression of IL2RA, which encodes the transmembrane protein CD25, and of the regulatory transcription factor FOXP3 (Fig. 5e). Therefore, the expression patterns of classic T cell markers confirmed the reliability of the T cell subtype assignments.

Usage Notes

To facilitate reuse of our T cell dataset and broaden its user community, we developed a web server, iSTARTRAC; the following sections describe its design and functionalities. iSTARTRAC is available at http://crctcell.cancer-pku.cn/.

Design and implementation

Although we previously provided an online portal at http://crc.cancer-pku.cn for displaying gene expression, it offered only limited functionality, hindering wider use of our data. To facilitate further exploration of our T cell data, we have now developed a much-enhanced web server, iSTARTRAC, that enables comprehensive and customizable analyses. The iSTARTRAC website is deployed on a server with 64 GB RAM and a 16-core Gold 6149 CPU, running the Ubuntu (version 16.04.4) Linux (kernel version 4.4.0) operating system. The interface is built with the Shiny web application framework (version 1.2.0) in R (version 3.5.0) and runs on Shiny Server (version 1.5.6.875). iSTARTRAC is freely available to all users without login and can be accessed from most web browsers, including Google Chrome, Mozilla Firefox, Safari and Internet Explorer. The website automatically adjusts its look and feel to different browsers and devices, but Google Chrome is recommended for the best visualization.

Sample options panel

In each module of iSTARTRAC, four categories of basic options are available for selecting the input samples of interest: Cluster, Cell Type, Tissue Type and Patient. The Cluster option comprises 20 clusters, 8 of CD8+ T cells and 12 of CD4+ T cells, and the Cell Type option comprises five cell types defined by FACS: CD8+ T cells, CD4+ T cells, CD4+ CD25− T cells, CD4+ CD25+ T cells and CD4+ CD25++ T cells. The Tissue Type option includes peripheral blood (P), adjacent normal tissue (N) and tumour tissue (T). The Patient option contains eight MSS patients and four MSI patients. In addition, iSTARTRAC provides interactive sliders for adjusting dot sizes and line widths to optimize the visualization of plots. Plots are regenerated on the fly whenever the user changes a slider or the sample selection, which makes customized analyses fully interactive.
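The four option categories act together as a conjunctive filter on the cell annotation. The sketch below shows one way to express that selection in Python. The `Cell` record, cluster labels such as `CD8_C07` and the patient IDs are hypothetical and do not reflect iSTARTRAC's internal data structures, which are implemented in R/Shiny.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Cell:
    cell_id: str
    cluster: str    # hypothetical label, e.g. "CD8_C07"
    cell_type: str  # FACS-defined, e.g. "CD8+" or "CD4+CD25++"
    tissue: str     # "P" (blood), "N" (adjacent normal) or "T" (tumour)
    patient: str    # hypothetical patient ID


def _allowed(value: str, choices: Optional[Set[str]]) -> bool:
    # None stands for "no restriction", i.e. every value is selected.
    return choices is None or value in choices


def select_cells(cells: Iterable[Cell], clusters=None, cell_types=None,
                 tissues=None, patients=None) -> list:
    """Keep the cells that satisfy all four option categories at once."""
    return [
        c for c in cells
        if _allowed(c.cluster, clusters)
        and _allowed(c.cell_type, cell_types)
        and _allowed(c.tissue, tissues)
        and _allowed(c.patient, patients)
    ]


cells = [
    Cell("c1", "CD8_C07", "CD8+", "T", "patient_A"),
    Cell("c2", "CD8_C07", "CD8+", "N", "patient_A"),
    Cell("c3", "CD4_C12", "CD4+CD25++", "T", "patient_B"),
    Cell("c4", "CD8_C01", "CD8+", "P", "patient_B"),
]
tumour_cd8 = select_cells(cells, cell_types={"CD8+"}, tissues={"T"})
print([c.cell_id for c in tumour_cd8])  # ['c1']
```

Leaving a category as `None` mirrors selecting every entry in that option.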

Functionalities

iSTARTRAC provides key interactive and customizable functions, including cluster visualization, gene expression display, differential expression analysis between clusters or cell types, illustration of TCR sharing, customizable analysis of STARTRAC indices and comparison between MSI and MSS patients (Fig. 4).

Cluster atlas

iSTARTRAC dynamically displays the tSNE plot of cell clusters for user-defined T cells, selected by cluster, tissue origin, cell type and patient (in the ‘tSNE Plot’ tab). In addition, an annotation table with basic information on the T cells is shown and can be downloaded by clicking the DOWNLOAD button (in the ‘Table’ tab).

Gene expression

In this module, iSTARTRAC interactively plots the expression distribution of a given gene across clusters according to the user-defined sample selection. The results can be presented as a tSNE plot (in the ‘tSNE Plot’ tab), violin plots (in the ‘Violin Plot’ tab) or box plots (in the ‘Box Plot’ tab).
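Violin and box plots summarize the per-cluster distribution of a single gene. As a rough illustration, assuming expression values and cluster labels are available as parallel lists, the five numbers behind a box plot and the fraction of cells with detectable expression can be computed as follows. The function names and cluster labels are hypothetical; the fraction of expressing cells is included because dropouts can push the median to zero even in clusters where a gene is expressed.

```python
from collections import defaultdict
from statistics import quantiles


def box_stats_by_cluster(expression, clusters):
    """Five-number summary (min, Q1, median, Q3, max) of one gene per cluster."""
    groups = defaultdict(list)
    for value, cluster in zip(expression, clusters):
        groups[cluster].append(value)

    summary = {}
    for cluster, values in groups.items():
        if len(values) < 2:
            # quantiles() needs at least two points; a single cell is its own summary.
            v = values[0]
            summary[cluster] = (v, v, v, v, v)
            continue
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        summary[cluster] = (min(values), q1, med, q3, max(values))
    return summary


def fraction_expressing(expression, clusters, threshold=0.0):
    """Share of cells per cluster whose expression exceeds `threshold`."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [expressing, total]
    for value, cluster in zip(expression, clusters):
        counts[cluster][0] += value > threshold
        counts[cluster][1] += 1
    return {k: expressing / total for k, (expressing, total) in counts.items()}


# Invented values for two hypothetical clusters.
expr = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
clusters = ["C1"] * 5 + ["C2"] * 2
print(box_stats_by_cluster(expr, clusters))
print(fraction_expressing(expr, clusters))
```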

Differential expression analysis

iSTARTRAC performs differential expression (DE) analyses and identifies differentially expressed genes (DEGs) between any two given clusters (in the ‘Cluster DEG’ tab) or cell types (in the ‘Cell Type DEG’ tab), illustrating the results in volcano plots. Single cell transcriptome data are well suited to dissecting intrinsic cellular heterogeneity. Besides the commonly used unsupervised clustering, the joint expression distribution of a pair of genes, a simple and effective approach analogous to FACS gating on proteins, can also be used to detect cell subpopulations. Accordingly, iSTARTRAC allows users to input a pair of genes to dynamically partition cells into subpopulations, and performs differential expression analysis between any two of the resulting populations (in the ‘in silico FACS’ tab). Users can adjust the low/high expression thresholds, as well as the significance thresholds for fold change and for p-values adjusted for multiple testing. Furthermore, summary tables of signature genes for CD8+ and CD4+ T cells are provided and can be downloaded (in the ‘Table’ tab).
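The in silico FACS workflow (gate cells on two genes, test every gene between two gates, then apply fold-change and adjusted p-value cut-offs) can be sketched in plain Python. This is not iSTARTRAC's implementation: the text does not state which DE test is used, so a two-sided Wilcoxon rank-sum test (normal approximation without tie correction, which makes it somewhat conservative when many values are zero) and Benjamini-Hochberg adjustment are substituted here. Expression is assumed to be log2-scaled, and all values, thresholds and the choice of FOXP3/IL2RA as the gating pair are invented for illustration.

```python
import math
from statistics import mean


def gate(expr_a, expr_b, thr_a, thr_b):
    """Label each cell by quadrant, e.g. 'A+B-', using user-set thresholds."""
    return [("A+" if a > thr_a else "A-") + ("B+" if b > thr_b else "B-")
            for a, b in zip(expr_a, expr_b)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    if not x or not y:
        raise ValueError("both groups need at least one cell")
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - n1 * n2 / 2) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_expression(expr, idx1, idx2, min_abs_log2fc=1.0, max_q=0.05):
    """Genes passing both volcano-plot cut-offs between two groups of cells.

    `expr` maps gene -> per-cell values, assumed already log2-scaled, so a
    difference of group means approximates a log2 fold change.
    """
    genes = list(expr)
    log2fc, pvals = [], []
    for g in genes:
        x = [expr[g][i] for i in idx1]
        y = [expr[g][i] for i in idx2]
        log2fc.append(mean(x) - mean(y))
        pvals.append(rank_sum_pvalue(x, y))
    qvals = benjamini_hochberg(pvals)
    return {g: (fc, q) for g, fc, q in zip(genes, log2fc, qvals)
            if abs(fc) >= min_abs_log2fc and q <= max_q}


# Toy example: gate on FOXP3 (gene A) and IL2RA (gene B), then compare
# double-positive with double-negative cells. All values are invented.
foxp3 = [5.0, 4.2, 6.1, 5.5, 4.8, 5.9, 0.0, 0.1, 0.0, 0.3, 0.2, 0.0]
il2ra = [4.1, 3.8, 5.0, 4.6, 3.9, 4.4, 0.2, 0.0, 0.5, 0.1, 0.0, 0.3]
labels = gate(foxp3, il2ra, thr_a=1.0, thr_b=1.0)
double_pos = [i for i, lab in enumerate(labels) if lab == "A+B+"]
double_neg = [i for i, lab in enumerate(labels) if lab == "A-B-"]
expr = {
    "CTLA4": [6.0, 5.1, 5.7, 6.3, 5.4, 5.9, 1.0, 0.8, 1.4, 0.6, 1.2, 0.9],
    "ACTB": [8.0, 8.2, 7.9, 8.1, 8.3, 7.8, 8.0, 8.1, 7.9, 8.2, 7.8, 8.3],
}
result = differential_expression(expr, double_pos, double_neg)
print(result)
```

Here `thr_a`/`thr_b` play the role of the adjustable low/high-expression cut-offs, while `min_abs_log2fc` and `max_q` correspond to the volcano-plot significance thresholds.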

TCR-based analysis

For any user-defined clonal frequency threshold, iSTARTRAC provides a tSNE plot illustrating the distribution of clonal cells in each cluster, with non-clonal cells (cells whose TCRs occur at a frequency below the defined threshold) coloured grey in the background (in the ‘tSNE Plot’ tab). Because the TCR repertoire, which is essential for recognizing foreign antigens and tumour neoantigens, is enormously diverse, a given TCR sequence is effectively unique to one clone and can therefore serve as a tag to track T cell lineages. Accordingly, iSTARTRAC plots a heatmap depicting the TCR sharing patterns among clusters enriched in different tissues (in the ‘TCR Sharing’ tab), providing clues to cross-tissue migration and state transitions. In addition, iSTARTRAC presents bar plots of clonotype statistics for user-defined samples (in the ‘Clonotype Statistics’ tab). A summary table of TCR typing, containing the TCR sequences and the corresponding samples, is displayed and can be downloaded (in the ‘Table’ tab).
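Defining clonal cells by a clone-size threshold and counting TCRs shared between groups can be sketched as below. A clonotype is represented here by an arbitrary identifier (for example, a hypothetical `CT1` standing for one paired TCR sequence), and cells without an assembled TCR are ignored. Counting shared clonotypes is one simple sharing measure chosen for illustration; the heatmap in iSTARTRAC may use a different measure or normalization.

```python
from collections import Counter, defaultdict
from itertools import combinations


def clonal_cell_ids(cells, min_clone_size=2):
    """IDs of cells whose clonotype is carried by at least `min_clone_size` cells."""
    sizes = Counter(c["clonotype"] for c in cells if c["clonotype"] is not None)
    return {c["id"] for c in cells
            if c["clonotype"] is not None and sizes[c["clonotype"]] >= min_clone_size}


def shared_clonotypes(cells):
    """Number of clonotypes shared by each pair of (cluster, tissue) groups."""
    groups = defaultdict(set)
    for c in cells:
        if c["clonotype"] is not None:  # cells without an assembled TCR are skipped
            groups[(c["cluster"], c["tissue"])].add(c["clonotype"])
    return {(g1, g2): len(groups[g1] & groups[g2])
            for g1, g2 in combinations(sorted(groups), 2)}


# Hypothetical cells, cluster labels and clonotype IDs.
cells = [
    {"id": "c1", "clonotype": "CT1", "cluster": "CD8_C07", "tissue": "T"},
    {"id": "c2", "clonotype": "CT1", "cluster": "CD8_C07", "tissue": "N"},
    {"id": "c3", "clonotype": "CT1", "cluster": "CD8_C04", "tissue": "T"},
    {"id": "c4", "clonotype": "CT2", "cluster": "CD8_C04", "tissue": "T"},
    {"id": "c5", "clonotype": None, "cluster": "CD8_C01", "tissue": "P"},
]
print(sorted(clonal_cell_ids(cells)))  # ['c1', 'c2', 'c3']
for pair, n in shared_clonotypes(cells).items():
    print(pair, n)
```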

STARTRAC indices

For given samples, iSTARTRAC dynamically illustrates the STARTRAC-dist indices, which dissect the tissue preference of T cell clusters, as a discretized enrichment table decorated with colours (in the ‘STARTRAC-dist’ tab). Users can adjust the thresholds used to discretize the enrichment levels quantified by Ro/e, the ratio of observed to expected cell numbers of a T cell cluster in a given tissue. To reveal dynamic relationships among T cell subsets with respect to clonal expansion, migration and developmental transition, iSTARTRAC plots the STARTRAC-expa/migr/tran indices for samples of interest (in the ‘STARTRAC-expa/migr/tran’ tab). Furthermore, pairwise STARTRAC-migr (in the ‘pSTARTRAC-migr’ tab) and pairwise STARTRAC-tran (in the ‘pSTARTRAC-tran’ tab) indices can also be displayed dynamically according to user-defined sample selections.
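The observed-over-expected enrichment can be illustrated with a small worked example. The sketch assumes expected counts are computed with the standard chi-square independence formula (cluster total × tissue total / total cells) and uses a hypothetical five-level symbol scheme with adjustable cut-offs; iSTARTRAC's actual symbols and default thresholds may differ.

```python
from collections import Counter


def ro_e(cell_labels):
    """Observed/expected cell counts for every (cluster, tissue) combination.

    Expected counts follow the chi-square independence formula:
    cluster total * tissue total / grand total.
    """
    observed = Counter(cell_labels)
    cluster_totals, tissue_totals = Counter(), Counter()
    for (cluster, tissue), n in observed.items():
        cluster_totals[cluster] += n
        tissue_totals[tissue] += n
    total = sum(observed.values())
    return {
        (cluster, tissue): observed[(cluster, tissue)]
        / (cluster_totals[cluster] * tissue_totals[tissue] / total)
        for cluster in cluster_totals
        for tissue in tissue_totals
    }


def discretize(value, low=0.2, mid=1.0, high=3.0):
    """Map Ro/e to an enrichment symbol; the cut-offs are hypothetical defaults."""
    if value == 0:
        return "-"
    if value < low:
        return "+/-"
    if value <= mid:
        return "+"
    if value <= high:
        return "++"
    return "+++"


# One (cluster, tissue) label per cell; clusters "C1"/"C2" are hypothetical.
labels = [("C1", "T")] * 6 + [("C1", "N")] * 2 + [("C2", "T")] * 2 + [("C2", "N")] * 6
for key, value in sorted(ro_e(labels).items()):
    print(key, round(value, 2), discretize(value))
```

With 16 cells split evenly across clusters and tissues, every expected count is 4, so cluster C1 with 6 tumour cells gets Ro/e = 1.5 in tumour and 0.5 in normal tissue.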

MSI versus MSS

With this module, users can delineate differences between MSI and MSS patients in terms of cell composition (in the ‘Cell Percentage’ tab), STARTRAC indices (in the ‘STARTRAC-expa/migr/tran’ tab) and gene expression (in the ‘DEG Analysis’ tab) for a user-specified subset of the data.
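Comparing cell composition between MSI and MSS patients involves computing, for each patient, the percentage of cells falling in a cluster and then summarizing per group. The following is a minimal sketch with hypothetical patient IDs, cluster names and invented counts. Averaging per-patient percentages, rather than pooling all cells, keeps patients with many sequenced cells from dominating the group value; a formal statistical test between the groups is left out.

```python
from collections import Counter, defaultdict
from statistics import mean


def per_patient_percentages(cells):
    """Percentage of each patient's cells that fall into each cluster."""
    counts = defaultdict(Counter)
    for c in cells:
        counts[c["patient"]][c["cluster"]] += 1
    return {
        patient: {k: 100.0 * n / sum(cnt.values()) for k, n in cnt.items()}
        for patient, cnt in counts.items()
    }


def compare_msi_mss(cells, status, cluster):
    """Mean per-patient percentage of `cluster` in the MSI and MSS groups."""
    groups = defaultdict(list)
    for patient, shares in per_patient_percentages(cells).items():
        # A patient with no cells in the cluster contributes 0%.
        groups[status[patient]].append(shares.get(cluster, 0.0))
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical patients and clusters, with invented cell counts.
status = {"pt1": "MSI", "pt2": "MSS", "pt3": "MSS"}
cells = (
    [{"patient": "pt1", "cluster": "X"}] * 2 + [{"patient": "pt1", "cluster": "Y"}] * 2
    + [{"patient": "pt2", "cluster": "X"}] + [{"patient": "pt2", "cluster": "Y"}] * 3
    + [{"patient": "pt3", "cluster": "Y"}] * 2
)
print(compare_msi_mss(cells, status, "X"))  # {'MSI': 50.0, 'MSS': 12.5}
```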

Summary of scRNA-seq data application

The compendium dataset provided here was produced primarily to illustrate the dynamic relationships of tumour-infiltrating lymphocytes in CRC, including their functional states, clonal expansion, migration and developmental transitions[11]. Because Smart-seq2 provides full-length transcript coverage, the dataset can further be used to detect transcript isoforms, non-coding transcripts and potential splice variants. Differential isoform usage among T cell subtypes will shed new light on the regulatory mechanisms underlying phenotypic differentiation, and determining the subtype-specific expression of known and novel isoforms in TILs will provide opportunities for immuno-oncology modulation. In addition, our dataset could serve as a resource for comparing library preparation methods, such as the Smart-seq2 protocol and the 10X platform, by providing data that capture the specific features of the Smart-seq2 protocol. Experimental biologists can use the interactive platform iSTARTRAC to dissect regulatory mechanisms of T cell differentiation, identify novel immunotherapy targets and compare T cell compositions, gene expression and STARTRAC indices between MSI and MSS patients. By offering comprehensive and customizable analyses through simple clicks, iSTARTRAC will facilitate data mining in the cancer immunology community and help unleash the potential value of our CRC T cell data resource.

Design Type(s): transcription profiling design • disease analysis objective
Measurement Type(s): transcription profiling assay
Technology Type(s): RNA sequencing
Factor Type(s): Microsatellite Instability • age • sex • experimental condition • tumor stage
Sample Characteristic(s): Homo sapiens • lymphocyte

References

1. Full-length RNA-seq from single cells using Smart-seq2.

Authors: Simone Picelli; Omid R Faridani; Asa K Björklund; Gösta Winberg; Sven Sagasser; Rickard Sandberg
Journal: Nat Protoc. Date: 2014-01-02

2. Linking T-cell receptor sequence to functional phenotype at the single-cell level.

Authors: Arnold Han; Jacob Glanville; Leo Hansmann; Mark M Davis
Journal: Nat Biotechnol. Date: 2014-06-22

3. The future of cancer treatment: immunomodulation, CARs and combination immunotherapy.

Authors: Danny N Khalil; Eric L Smith; Renier J Brentjens; Jedd D Wolchok
Journal: Nat Rev Clin Oncol. Date: 2016-04-26

4. CD8+ cytotoxic T lymphocytes in cancer immunotherapy: A review.

Authors: Bagher Farhood; Masoud Najafi; Keywan Mortezaee
Journal: J Cell Physiol. Date: 2018-11-22

5. PD-1 Blockade in Tumors with Mismatch-Repair Deficiency.

Authors: Dung T Le; Jennifer N Uram; Hao Wang; Bjarne R Bartlett; Holly Kemberling; Aleksandra D Eyring; Andrew D Skora; Brandon S Luber; Nilofer S Azad; Dan Laheru; Barbara Biedrzycki; Ross C Donehower; Atif Zaheer; George A Fisher; Todd S Crocenzi; James J Lee; Steven M Duffy; Richard M Goldberg; Albert de la Chapelle; Minori Koshiji; Feriyl Bhaijee; Thomas Huebner; Ralph H Hruban; Laura D Wood; Nathan Cuka; Drew M Pardoll; Nickolas Papadopoulos; Kenneth W Kinzler; Shibin Zhou; Toby C Cornish; Janis M Taube; Robert A Anders; James R Eshleman; Bert Vogelstein; Luis A Diaz
Journal: N Engl J Med. Date: 2015-05-30

6. Pembrolizumab versus Chemotherapy for PD-L1-Positive Non-Small-Cell Lung Cancer.

Authors: Martin Reck; Delvys Rodríguez-Abreu; Andrew G Robinson; Rina Hui; Tibor Csőszi; Andrea Fülöp; Maya Gottfried; Nir Peled; Ali Tafreshi; Sinead Cuffe; Mary O'Brien; Suman Rao; Katsuyuki Hotta; Melanie A Leiby; Gregory M Lubiniecki; Yue Shentu; Reshma Rangwala; Julie R Brahmer
Journal: N Engl J Med. Date: 2016-10-08

7. Fast and SNP-tolerant detection of complex variants and splicing in short reads.

Authors: Thomas D Wu; Serban Nacu
Journal: Bioinformatics. Date: 2010-02-10

8. SC3: consensus clustering of single-cell RNA-seq data.

Authors: Vladimir Yu Kiselev; Kristina Kirschner; Michael T Schaub; Tallulah Andrews; Andrew Yiu; Tamir Chandra; Kedar N Natarajan; Wolf Reik; Mauricio Barahona; Anthony R Green; Martin Hemberg
Journal: Nat Methods. Date: 2017-03-27

9. Fast and accurate long-read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics. Date: 2010-01-15

10. T cell fate and clonality inference from single-cell transcriptomes.

Authors: Michael J T Stubbington; Tapio Lönnberg; Valentina Proserpio; Simon Clare; Anneliese O Speak; Gordon Dougan; Sarah A Teichmann
Journal: Nat Methods. Date: 2016-03-07