
scHLAcount: allele-specific HLA expression from single-cell gene expression data.

Charlotte A Darby, Michael J T Stubbington, Patrick J Marks, Álvaro Martínez Barrio, Ian T Fiddes.

Abstract

SUMMARY: Bulk RNA sequencing studies have demonstrated that human leukocyte antigen (HLA) genes may be expressed in a cell type-specific and allele-specific fashion. Single-cell gene expression assays have the potential to further resolve these expression patterns, but currently available methods do not perform allele-specific quantification at the molecule level. Here, we present scHLAcount, a post-processing workflow for single-cell RNA-seq data that computes allele-specific molecule counts of the HLA genes based on a personalized reference constructed from the sample's HLA genotypes.
AVAILABILITY AND IMPLEMENTATION: scHLAcount is available under the MIT license at https://github.com/10XGenomics/scHLAcount.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.

Year:  2020        PMID: 32330223      PMCID: PMC7320622          DOI: 10.1093/bioinformatics/btaa264

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The class I and class II human leukocyte antigen (HLA) genes play an important role in antigen presentation in the immune system, and are highly variable in the human population, with hundreds of cataloged alleles (Robinson et al.). Studies using bulk RNA-seq data have shown that HLA genes are expressed at different levels among human tissues and immune cell types (Boegel et al.), and allele-specific expression (ASE) has been observed in lymphoblastoid cell lines (Aguiar et al.; Lee et al.). However, expression of these genes may be underestimated in RNA-seq experiments due to poor read mapping caused by sequence divergence between the standard reference genome and the alleles in the reads. It is particularly useful to understand ASE of HLA genes in the context of single cells and particular cell types. For example, cell type-specific HLA class I and class II expression can influence immunotherapy response in cancer (Chowell et al.; Johnson et al.). scHLAcount enables allele-specific analysis of the HLA genes in single-cell gene expression data, such as those produced by the 10× Genomics Single Cell Immune Profiling (5′ capture) and Gene Expression (GEX) (3′ capture) Solutions. Based on the genotypes of the sample, scHLAcount constructs a personalized reference and computes allele-specific molecule counts for HLA class I and class II genes. This output can be used to study ASE of HLA genes at single-cell resolution.

2 Implementation

scHLAcount is a post-processing workflow for single-cell gene expression data that produces allele-specific molecule counts for the main HLA class I and class II genes in each cell (Fig. 1). Users provide the specific HLA alleles present in their sample of interest. These can be obtained by specialized molecular tests, such as sequence-specific oligonucleotide probe PCR, sequence-specific primed PCR, or Sanger sequence-based typing (Erlich, 2012). Alternatively, algorithms for sequence-based typing from next-generation sequencing reads of the genome, exome or transcriptome that use allele databases to infer genotypes can be employed [reviewed by Bauer et al.]. Tian et al. attempted to genotype individual cells for HLA class I using scRNA-seq data, but found that most cells did not have adequate read coverage. Combining reads from many cells in an scRNA-seq experiment as a ‘pseudo-bulk’ dataset for genotyping is an interesting avenue for further research.
Fig. 1. scHLAcount takes as input an allele sequence database (e.g. IMGT/HLA), genotypes for the sample being evaluated, cell barcodes and aligned reads (e.g. BAM file from Cell Ranger). Allele sequences and relevant reads are extracted, and pseudoalignment is used to produce an allele-specific molecule count matrix. A snippet of the output matrix is shown for two cell barcodes and one gene (HLA-A) with two alleles.

Based on the genotypes provided, scHLAcount extracts the coding and genomic sequences of those alleles from the IMGT/HLA database (Robinson et al.) and builds two colored de Bruijn graphs, one containing the coding sequences (CDS) and one containing genomic sequences. In addition, scHLAcount uses the read alignments generated by scRNA-seq analysis tools such as Cell Ranger. Reads associated with valid cell barcodes and reported as aligning to the region of the genome containing the HLA genes are extracted from the alignment file and pseudoaligned to the CDS graph following the approach described by Bray et al. This yields the set of alleles in the reference graph that could have generated the read, also referred to as the equivalence class. If there is no significant alignment to the CDS graph, pseudoalignment is attempted to the genomic sequence graph. In 5′ GEX datasets, we observed that up to 12% of aligned reads aligned only to the genomic sequence graph and not the CDS graph. In 3′ GEX datasets, up to 80% of aligned reads were aligned to the genomic sequence. This genomic alignment step is intended to rescue reads that may be haplotype-specific in 3′ or 5′ untranslated regions (UTR). It also provides a mechanism to handle reads from pre-mRNA in single-nucleus RNA-seq libraries. Parameters and approaches to missing genotypes are discussed in Supplementary Material S1. Reads sharing a cell barcode and unique molecular identifier (UMI) are assumed to originate from the same RNA molecule.
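The two-stage pseudoalignment described above can be illustrated with a toy sketch. This is not the scHLAcount implementation: a plain k-mer-to-allele dictionary stands in for the colored de Bruijn graph, the sequences are invented, reverse-complement matching is ignored, and the names `build_index`, `pseudoalign`, `pseudoalign_with_fallback` and the `min_kmers` cutoff are hypothetical. It only shows how intersecting the allele sets of a read's k-mers gives an equivalence class, and how a read with no CDS match falls back to the genomic index.

```python
K = 5  # toy k-mer length for the short sequences below (assumption)


def build_index(alleles, k=K):
    """Map every k-mer to the set of allele names ("colors") that contain it."""
    index = {}
    for name, seq in alleles.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k=K, min_kmers=3):
    """Return the read's equivalence class, or None if there is no usable match.

    The class is the intersection of the allele sets of all read k-mers found
    in the index. min_kmers is a hypothetical stand-in for a significance cutoff.
    """
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    hits = [index[kmer] for kmer in kmers if kmer in index]
    if not hits or len(hits) < min_kmers:
        return None
    eq_class = frozenset(set.intersection(*hits))
    return eq_class or None  # an empty intersection counts as no alignment


def pseudoalign_with_fallback(read, cds_index, genomic_index):
    """Try the CDS index first; fall back to the genomic index on failure."""
    eq_class = pseudoalign(read, cds_index)
    if eq_class is not None:
        return "cds", eq_class
    eq_class = pseudoalign(read, genomic_index)
    if eq_class is not None:
        return "genomic", eq_class
    return None, None


# Invented toy sequences: the two alleles differ at one CDS base and one UTR base.
cds = {
    "A*02:01": "ATGGCCGTCATGGCGCCC",
    "A*03:01": "ATGGCCGTCATGGAGCCC",
}
utr = {"A*02:01": "TGATTTACAGT", "A*03:01": "TGATTTGCAGT"}
genomic = {name: cds[name] + utr[name] for name in cds}

cds_index = build_index(cds)
genomic_index = build_index(genomic)

shared_read = pseudoalign_with_fallback("GCCGTCATG", cds_index, genomic_index)
specific_read = pseudoalign_with_fallback("CATGGAGCC", cds_index, genomic_index)
utr_read = pseudoalign_with_fallback("GATTTACAG", cds_index, genomic_index)
```

In this toy setting, a read from the shared region is compatible with both alleles, a read covering the variant base resolves to a single allele, and a read from the untranslated region is only recovered through the genomic index.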
At recommended sequencing depths with modest sequence saturation, there are typically 1–3 reads per UMI. Individual reads may have different equivalence classes according to their pseudoalignment. We ignore reads whose equivalence class contains more than one gene, which accounted for 15–45% of aligned reads in 5′ GEX datasets and 10% of reads in 3′ GEX datasets. If more than half of the reads from a molecule are assigned to a particular gene, that molecule will be assigned to one of its input reference alleles (e.g. HLA-A*02:01), based on the constituent reads’ equivalence classes. In the case of ambiguity, it will be assigned to that gene (e.g. HLA-A) instead. The output is a sparse molecule count matrix where each column corresponds to a barcode in the provided cell barcode list, and each row corresponds to an allele. See Supplementary Material S2 for a more detailed comparison of 3′ and 5′ GEX data with scHLAcount.
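The molecule-level assignment rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's code: the majority is taken over the reads that remain after multi-gene reads are dropped, an allele is called only when exactly one allele is compatible with every read supporting the winning gene, and the function names and the dictionary-of-keys output are hypothetical.

```python
from collections import Counter, defaultdict


def gene_of(allele):
    """'A*02:01' -> 'A' (allele names are assumed to look like GENE*FIELDS)."""
    return allele.split("*")[0]


def assign_molecule(eq_classes):
    """Assign one molecule (reads sharing a barcode and UMI) to a label.

    eq_classes: list of sets of allele names, one set per read.
    Returns an allele name, a gene name, or None if the molecule is dropped.
    """
    # Ignore reads whose equivalence class spans more than one gene.
    kept = [eq for eq in eq_classes if len({gene_of(a) for a in eq}) == 1]
    if not kept:
        return None
    votes = Counter(gene_of(next(iter(eq))) for eq in kept)
    gene, n = votes.most_common(1)[0]
    # Require more than half of the kept reads (assumed denominator).
    if 2 * n <= len(kept):
        return None
    # Alleles compatible with every read that supports the winning gene.
    supporting = [set(eq) for eq in kept if gene_of(next(iter(eq))) == gene]
    compatible = set.intersection(*supporting)
    # Exactly one compatible allele: call it. Otherwise fall back to the gene.
    return next(iter(compatible)) if len(compatible) == 1 else gene


def count_molecules(reads):
    """reads: iterable of (barcode, umi, eq_class). Returns {(label, barcode): n}."""
    by_molecule = defaultdict(list)
    for barcode, umi, eq in reads:
        by_molecule[(barcode, umi)].append(eq)
    counts = Counter()
    for (barcode, _umi), eqs in by_molecule.items():
        label = assign_molecule(eqs)
        if label is not None:
            counts[(label, barcode)] += 1
    return counts


# Invented reads: (cell barcode, UMI, equivalence class).
reads = [
    ("AAAC", "UMI1", {"A*02:01"}),
    ("AAAC", "UMI1", {"A*02:01", "A*03:01"}),
    ("AAAC", "UMI2", {"A*02:01", "A*03:01"}),  # ambiguous: gene-level count
    ("AAAC", "UMI3", {"A*02:01", "B*07:02"}),  # spans two genes: ignored
    ("TTTG", "UMI1", {"A*03:01"}),
    ("TTTG", "UMI1", {"A*02:01"}),  # conflicting alleles: gene-level count
]
counts = count_molecules(reads)
```

Storing only the non-zero (row label, barcode) entries mirrors the sparse layout of the output matrix, with allele or gene labels as rows and cell barcodes as columns.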

3 Results

To illustrate the applications of scHLAcount, we reanalyzed two previously published datasets. First, we applied our method to five acute myeloid leukemia (AML) samples (Petti et al.) (Supplementary Material S3). Using the scHLAcount allele-specific molecule counts, we detected cell type-specific allele bias. Detailed results from one patient are shown in Supplementary Fig. S2 and Tables S1–S3. Second, we reexamined data from two Merkel cell carcinoma (MCC) patients (Paulson et al.) (Supplementary Material S4). We extend the original finding that HLA class I expression is lost in tumor cells compared with non-tumor cells and use scHLAcount allele-specific molecule counts to show that this expression loss may be allele-specific (Supplementary Fig. S3 and Tables S4–S6).
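One simple way to look for cell type-specific allele bias in such counts is to pool allele-resolved molecules per cell type and compare the fraction attributed to each allele. The sketch below is an illustration only and is not the analysis performed in the Supplementary Material: the counts, barcodes and cell-type labels are invented, the function name is hypothetical, and molecules resolved only to the gene level are left out of the fraction.

```python
from collections import defaultdict


def allele_fraction_by_cell_type(counts, cell_types, allele_a, allele_b):
    """Pool allele-resolved molecules per cell type.

    counts: {(label, barcode): molecule count}, where label is an allele or a gene.
    cell_types: {barcode: cell type label} from a separate annotation step.
    Returns {cell_type: (n_a, n_b, n_a / (n_a + n_b))}.
    """
    pooled = defaultdict(lambda: [0, 0])
    for (label, barcode), n in counts.items():
        cell_type = cell_types.get(barcode)
        if cell_type is None:
            continue  # barcode without an annotation
        if label == allele_a:
            pooled[cell_type][0] += n
        elif label == allele_b:
            pooled[cell_type][1] += n
        # gene-level labels (e.g. "A") carry no allele information and are skipped
    return {
        ct: (a, b, a / (a + b))
        for ct, (a, b) in pooled.items()
        if a + b > 0
    }


# Invented example values, for illustration only.
counts = {
    ("A*02:01", "bc1"): 6,
    ("A*03:01", "bc1"): 2,
    ("A", "bc1"): 5,
    ("A*02:01", "bc2"): 1,
    ("A*03:01", "bc2"): 3,
}
cell_types = {"bc1": "monocyte", "bc2": "T cell"}
bias = allele_fraction_by_cell_type(counts, cell_types, "A*02:01", "A*03:01")
```

A fraction far from 0.5 in one cell type but not another would be a candidate for cell type-specific bias; a real analysis would also need a statistical test and enough molecules per group.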

4 Conclusion

scHLAcount provides a simple way to assign reads from scRNA-seq experiments to HLA alleles given genotypes, and is a powerful tool for investigating ASE, loss of heterozygosity and mutational or epigenetic suppression of HLA expression in tumor immune-evasion.

References

Review 1.  HLA DNA typing: past, present, and future.

Authors:  H Erlich
Journal:  Tissue Antigens       Date:  2012-07

2.  Near-optimal probabilistic RNA-seq quantification.

Authors:  Nicolas L Bray; Harold Pimentel; Páll Melsted; Lior Pachter
Journal:  Nat Biotechnol       Date:  2016-04-04       Impact factor: 54.908

3.  Patient HLA class I genotype influences cancer response to checkpoint blockade immunotherapy.

Authors:  Diego Chowell; Luc G T Morris; Claud M Grigg; Jeffrey K Weber; Robert M Samstein; Vladimir Makarov; Fengshen Kuo; Sviatoslav M Kendall; David Requena; Nadeem Riaz; Benjamin Greenbaum; James Carroll; Edward Garon; David M Hyman; Ahmet Zehir; David Solit; Michael Berger; Ruhong Zhou; Naiyer A Rizvi; Timothy A Chan
Journal:  Science       Date:  2017-12-07       Impact factor: 47.728

4.  The IPD and IMGT/HLA database: allele variant databases.

Authors:  James Robinson; Jason A Halliwell; James D Hayhurst; Paul Flicek; Peter Parham; Steven G E Marsh
Journal:  Nucleic Acids Res       Date:  2014-11-20       Impact factor: 16.971

5.  Evaluation of computational programs to predict HLA genotypes from genomic sequencing data.

Authors:  Denis C Bauer; Armella Zadoorian; Laurence O W Wilson; Natalie P Thorne
Journal:  Brief Bioinform       Date:  2018-03-01       Impact factor: 11.622

6.  A general approach for detecting expressed mutations in AML cells using single cell RNA-sequencing.

Authors:  Allegra A Petti; Stephen R Williams; Christopher A Miller; Ian T Fiddes; Sridhar N Srivatsan; David Y Chen; Catrina C Fronick; Robert S Fulton; Deanna M Church; Timothy J Ley
Journal:  Nat Commun       Date:  2019-08-14       Impact factor: 17.694

7.  Melanoma-specific MHC-II expression represents a tumour-autonomous phenotype and predicts response to anti-PD-1/PD-L1 therapy.

Authors:  Douglas B Johnson; Monica V Estrada; Roberto Salgado; Violeta Sanchez; Deon B Doxie; Susan R Opalenik; Anna E Vilgelm; Emily Feld; Adam S Johnson; Allison R Greenplate; Melinda E Sanders; Christine M Lovly; Dennie T Frederick; Mark C Kelley; Ann Richmond; Jonathan M Irish; Yu Shyr; Ryan J Sullivan; Igor Puzanov; Jeffrey A Sosman; Justin M Balko
Journal:  Nat Commun       Date:  2016-01-29       Impact factor: 14.919

8.  HLA and proteasome expression body map.

Authors:  Sebastian Boegel; Martin Löwer; Thomas Bukur; Patrick Sorn; John C Castle; Ugur Sahin
Journal:  BMC Med Genomics       Date:  2018-03-27       Impact factor: 3.063

9.  Acquired cancer resistance to combination immunotherapy from transcriptional loss of class I HLA.

Authors:  K G Paulson; V Voillet; M S McAfee; D S Hunter; F D Wagener; M Perdicchio; W J Valente; S J Koelle; C D Church; N Vandeven; H Thomas; A G Colunga; J G Iyer; C Yee; R Kulikauskas; D M Koelle; R H Pierce; J H Bielas; P D Greenberg; S Bhatia; R Gottardo; P Nghiem; A G Chapuis
Journal:  Nat Commun       Date:  2018-09-24       Impact factor: 14.919

10.  AltHapAlignR: improved accuracy of RNA-seq analyses through the use of alternative haplotypes.

Authors:  Wanseon Lee; Katharine Plant; Peter Humburg; Julian C Knight
Journal:  Bioinformatics       Date:  2018-07-15       Impact factor: 6.937

