Agustin Ure1, Dhananjay Mukhedkar1,2, Laila Sara Arroyo Mühr3.
Abstract
In the era of cervical cancer elimination, accurate and validated pipelines for detecting human papillomavirus (HPV) are essential to understanding the association of HPV with human cancers. We aimed to provide an open-source pipeline, "HPV-meta", that detects HPV transcripts in RNA sequencing data and includes several steps to warn operators of possible viral contamination. The "HPV-meta" pipeline automatically performs several steps: quality trimming, human genome filtering, HPV detection (blastx), application of cut-offs (10 reads and 690 bp coverage to make an HPV call) and, finally, fasta sequence generation for HPV-positive samples. The fasta sequences can then be aligned to assess sequence diversity among HPV-positive samples. All RNA sequencing files (n = 10,908) present in The Cancer Genome Atlas (TCGA) were analyzed. "HPV-meta" identified 25 different HPV types in 488/10,904 specimens. Validation of the results showed 99.98% agreement (10,902/10,904). Multiple alignment of the fasta files flagged high sequence identity among several HPV 18 and HPV 38 positive samples, for which contamination had previously been reported. "HPV-meta" is a robust and validated pipeline that detects HPV in RNA sequencing data. Obtaining the fasta files enables investigation of contamination, which is not a rare occurrence in next-generation sequencing.
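The cut-off rule stated in the abstract (10 reads and 690 bp coverage to make an HPV call) can be illustrated with a minimal Python sketch. The function and variable names are hypothetical, and treating both thresholds as inclusive minimums is an assumption; the abstract does not say whether a sample with exactly 10 reads or exactly 690 bp passes.

```python
# Thresholds quoted in the abstract; inclusive comparison is an assumption.
MIN_READS = 10
MIN_COVERED_BP = 690


def make_hpv_call(read_count, covered_bp):
    """Return True when both cut-offs are met for one HPV type in one sample."""
    return read_count >= MIN_READS and covered_bp >= MIN_COVERED_BP


def called_types(per_type_stats):
    """Filter a {hpv_type: (read_count, covered_bp)} mapping to the types called positive."""
    return sorted(
        hpv_type
        for hpv_type, (reads, covered) in per_type_stats.items()
        if make_hpv_call(reads, covered)
    )


# Hypothetical per-type statistics for a single sample (not data from the study).
example_stats = {"HPV16": (250, 3100), "HPV18": (4, 900), "HPV45": (30, 410)}
positive = called_types(example_stats)
```

In this made-up example only HPV16 passes, because HPV18 has too few reads and HPV45 too little coverage; both conditions must hold at once.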
Year: 2022 PMID: 35906372 PMCID: PMC9338075 DOI: 10.1038/s41598-022-17318-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. “HPV-meta” pipeline for detecting HPV transcripts in RNA sequencing data. Flowchart describing the pipeline steps included in “HPV-meta”. The pipeline includes removal of human reads; sorting and conversion to fq files (samtools v. 1.10); quality trimming (Trimmomatic v. 0.39, https://github.com/usadellab/Trimmomatic)[22]; extra trimming, needed for specific library preparation kits (e.g., removing 3 bp of R2 from libraries prepared with the Smarter stranded total RNA-seq kit from Takara, USA), performed with Cutadapt v. 3.3[23]; re-mapping to the human reference genome (double human cleaning using Nextgenmap v. 0.5.5)[24]; filtering out of human reads; mapping of non-human reads to an HPV protein database (Diamond v. 2.0.7)[25]; coverage calculation; and, if HPV positivity is present, generation of a fasta file by mapping the reads to HPV genome references and subjecting them to variant calling with GATK v. 4.2.3.0[26]. (Image created using https://app.diagrams.net/ and Inkscape v. 1.1, https://inkscape.org/).
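The caption names a coverage calculation step without describing how it is done. One common approach, shown here only as a sketch and not as the pipeline's actual method, is to merge the reference intervals spanned by the hits and count the positions covered at least once. The half-open `(start, end)` interval convention and the function name are assumptions made for illustration.

```python
def covered_bases(intervals):
    """Count positions covered by at least one interval.

    `intervals` is an iterable of half-open (start, end) pairs on one
    reference; overlapping or touching intervals are merged before counting.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap was found: close the previous merged block, open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical hit coordinates: two overlapping hits and one separate hit.
example_hits = [(0, 100), (50, 150), (200, 300)]
example_coverage = covered_bases(example_hits)
```

Merging before counting keeps heavily stacked reads from inflating the figure, so the result reflects breadth of coverage rather than read depth.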
HPV types detected in RNA sequencing data from TCGA using “HPV-meta”.
| HPV type | Tissue or organ of origin | Total positive samples |
|---|---|---|
| HPV16 | Cervix uteri (171); Tonsil, NOS (30); Base of tongue, NOS (7); Tongue, NOS (6); Cerebrum (4); Overlapping lesion of lip, oral cavity and pharynx (4); Bladder, NOS (4); Larynx, NOS (3); Liver (3); Oropharynx, NOS (2); Endometrium (2); Breast, NOS (2); Prostate gland (2); Upper lobe, lung (2); Connective, subcutaneous and other soft tissues of lower limb and hip (2); Hypopharynx, NOS (2); Hard palate (2); Lower lobe, lung (1); Lower gum (1); Mouth, NOS (1); Floor of mouth, NOS (1); Kidney, NOS (1); Pleura, NOS (1); Posterior wall of oropharynx (1); Sigmoid colon (1); Skin, NOS (1); Head of pancreas (1); Brain, NOS (1); Upper Gum (1); Gum, NOS (1) | 261 |
| HPV18 | Cervix uteri (40); Kidney, NOS (6); Sigmoid colon (6); Bladder, NOS (4); Ovary (3); Cardia, NOS (2); Cecum (2); Colon, NOS (2); Ascending colon (1); Body of stomach (1); Descending colon (1); Gastric antrum (1); Liver (1); Rectum, NOS (1); Transverse colon (1) | 72 |
| HPV38 | Endometrium (41) | 41 |
| HPV45 | Cervix uteri (23); Endometrium (2); Liver (2); Lateral wall of bladder (1); Thyroid gland (1) | 29 |
| HPV33 | Cervix uteri (9); Tonsil, NOS (3); Overlapping lesion of lip, oral cavity and pharynx (2); Tongue, NOS (2); Base of tongue, NOS (1); Floor of mouth, NOS (1) | 18 |
| HPV35 | Cervix uteri (6); Liver (3); Base of tongue, NOS (2); Kidney, NOS (1); Tonsil, NOS (1) | 13 |
| HPV58 | Cervix uteri (8); Cerebrum (1) | 9 |
| HPV52 | Cervix uteri (8); Posterior wall of bladder (1) | 9 |
| HPV31 | Cervix uteri (7) | 7 |
| HPV39 | Cervix uteri (5) | 5 |
| HPV51 | Cervix uteri (1); Kidney, NOS (1); Retroperitoneum (1) | 3 |
| HPV59 | Cervix uteri (3) | 3 |
| HPV30 | Cervix uteri (1); Upper lobe, lung (1) | 2 |
| HPV68 | Cervix uteri (2) | 2 |
| HPV70 | Cervix uteri (2) | 2 |
| HPV56 | Cervix uteri (1); Lateral wall of bladder (1) | 2 |
| HPV73 | Cervix uteri (2) | 2 |
| HPV2 | Breast, NOS (1) | 1 |
| HPV69 | Cervix uteri (1) | 1 |
| HPV-mSK041 | Ovary (1) | 1 |
| HPV133 | Corpus uteri (1) | 1 |
| HPV26 | Cervix uteri (1) | 1 |
| HPV155 | Brain, NOS (1) | 1 |
| HPV94 | Kidney, NOS (1) | 1 |
| HPV6 | Anterior wall of bladder (1) | 1 |
HPV types detected in TCGA RNA sequencing files. Number in brackets corresponds to number of specimens for each organ/tissue of origin.
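Because each tissue entry carries its specimen count in brackets, a row can be checked against its "Total positive samples" value programmatically. The following Python sketch uses a hypothetical helper name and assumes entries are separated by semicolons, with the count as the last bracketed integer of each entry.

```python
import re


def parse_tissue_counts(cell):
    """Turn 'Cervix uteri (8); Cerebrum (1)' into {'Cervix uteri': 8, 'Cerebrum': 1}."""
    counts = {}
    for entry in cell.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        match = re.fullmatch(r"(.+?)\s*\((\d+)\)", entry)
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        tissue, number = match.group(1), int(match.group(2))
        counts[tissue] = counts.get(tissue, 0) + number
    return counts


# The HPV45 row copied from the table above.
hpv45_cell = (
    "Cervix uteri (23); Endometrium (2); Liver (2); "
    "Lateral wall of bladder (1); Thyroid gland (1)"
)
hpv45_counts = parse_tissue_counts(hpv45_cell)
hpv45_total = sum(hpv45_counts.values())
```

For the HPV45 row the parsed counts sum to 29, which matches the total given in the table.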
Figure 2. HPV pairwise comparison among HPV positive specimens detected in TCGA. Pairwise comparison performed for HPV 16 positive RNA sequences in the TCGA database (a), for HPV 38 sequences (b) and for HPV 18 sequences (c). BLCA: Bladder Urothelial Carcinoma; BRCA: Breast Invasive Carcinoma; CESC: Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma; COAD: Colon Adenocarcinoma; HNSC: Head and Neck Squamous Cell Carcinoma; KIRC: Kidney Renal Clear Cell Carcinoma; KIRP: Kidney Renal Papillary Cell Carcinoma; LGG: Brain Lower Grade Glioma; LIHC: Liver Hepatocellular Carcinoma; LUSC: Lung Squamous Cell Carcinoma; MESO: Mesothelioma; OV: Ovarian Serous Cystadenocarcinoma; PAAD: Pancreatic Adenocarcinoma; PRAD: Prostate Adenocarcinoma; READ: Rectum Adenocarcinoma; SARC: Sarcoma; SKCM: Skin Cutaneous Melanoma; STAD: Stomach Adenocarcinoma; UCEC: Uterine Corpus Endometrial Carcinoma. (Image created using Python v. 3.8.10 and Seaborn library v. 0.11.2, https://seaborn.pydata.org/).
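A pairwise comparison of this kind can be sketched in plain Python as a percent-identity matrix over sequences that have already been aligned to equal length. This is an illustration under stated assumptions, not the authors' code: the identity definition (matches divided by compared positions, skipping positions where either sequence has a gap or an `N`), the function names and the sample labels are all hypothetical.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical bases over positions where both sequences are informative."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in (gap, "N") or b in (gap, "N"):
            continue  # skip gaps and uncalled bases
        compared += 1
        matches += a == b
    return matches / compared if compared else float("nan")


def identity_matrix(aligned):
    """Build a symmetric {name: {name: identity}} mapping; the diagonal is set to 1.0."""
    names = list(aligned)
    matrix = {name: {name: 1.0} for name in names}
    for a, b in combinations(names, 2):
        value = pairwise_identity(aligned[a], aligned[b])
        matrix[a][b] = matrix[b][a] = value
    return matrix


# Toy aligned sequences with made-up labels (not TCGA data).
toy_alignment = {"sample_1": "ACGTACGT", "sample_2": "ACGTACGA", "sample_3": "ACG-ACGT"}
toy_matrix = identity_matrix(toy_alignment)
```

A nested mapping like `toy_matrix` can be passed to `pandas.DataFrame` and drawn with `seaborn.heatmap`. Groups of samples from unrelated tissues with identity at or near 1.0 would be the kind of pattern that prompts a contamination check.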