
Muscle and adipose tissue segmentations at the third cervical vertebral level in patients with head and neck cancer.

Kareem A Wahid1, Brennan Olson2,3, Rishab Jain2, Aaron J Grossberg2, Dina El-Habashy1,4, Cem Dede1, Vivian Salama1, Moamen Abobakr1, Abdallah S R Mohamed1, Renjie He1, Joel Jaskari5, Jaakko Sahlsten5, Kimmo Kaski5, Clifton D Fuller6, Mohamed A Naser7.   

Abstract

The accurate determination of sarcopenia is critical for disease management in patients with head and neck cancer (HNC). Quantitative determination of sarcopenia is currently dependent on manually-generated segmentations of skeletal muscle derived from computed tomography (CT) cross-sectional imaging. This has prompted the increasing utilization of machine learning models for automated sarcopenia determination. However, extant datasets do not provide the manually-generated skeletal muscle segmentations at the C3 vertebral level needed for building these models. In this data descriptor, 394 patients with HNC were selected from The Cancer Imaging Archive, and their skeletal muscle and adipose tissue were manually segmented at the C3 vertebral level using sliceOmatic. Subsequently, using publicly disseminated Python scripts, we generated corresponding segmentation files in Neuroimaging Informatics Technology Initiative format. Alongside the segmentation data, clinical demographic data germane to body composition analysis have been retrospectively collected for these patients. These data are a valuable resource for studying sarcopenia and body composition analysis in patients with HNC.
© 2022. The Author(s).

Year:  2022        PMID: 35918336      PMCID: PMC9346108          DOI: 10.1038/s41597-022-01587-w

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   8.501


Background & Summary

Head and neck cancer (HNC) affects more than 900,000 individuals worldwide annually[1]. Sarcopenia, a body composition status describing skeletal muscle depletion, is a well-validated negative prognostic factor in patients with HNC and has become increasingly studied in recent years[2-4]. Sarcopenia is quantitatively determined primarily using the cross-sectional estimate of skeletal muscle at a specific vertebral level. Current methods to generate cross-sectional skeletal muscle segmentations for use in sarcopenia determination are reliant on expert human-generated segmentations, which can be time-consuming to procure and subject to user variability[5]. Therefore, the dissemination of high-quality skeletal muscle segmentations is of paramount importance to develop tools for sarcopenia-related clinical decision making. Publicly disseminated HNC datasets have increased sharply in recent years. For example, several HNC imaging datasets, predominantly composed of computed tomography (CT) images, have been hosted on The Cancer Imaging Archive (TCIA)[6]. Public datasets, such as these, have been crucial towards advanced algorithmic development for clinical decision support tools[7]. However, only a handful of existing HNC datasets have provided information germane to determining sarcopenia status in patients, namely that by Grossberg et al.[8] providing body composition analysis data based on abdominal imaging. Moreover, to date, there are no existing open-source repositories for body composition analysis data based on head and neck region imaging. Increasing evidence has shown the potential utility of sarcopenia determination using skeletal muscle in the head and neck region[2,9]. 
This is driven by the fact that many patients with HNC may not have abdominal imaging acquired as part of the standard workup, but will almost certainly have head and neck region imaging acquired, particularly due to its necessity for radiotherapy treatment planning[10] and staging purposes[11]. These head and neck imaging data could be used to train models for automated sarcopenia-related clinical decision making, as shown in previous studies[12]. Therefore, the dissemination of sarcopenia-related data derived from head and neck imaging is an unmet need that may foster more rapid adoption of automated HNC clinical decision support tools. Here we present the curation and annotation of a large-scale TCIA dataset of 394 patients with HNC for use in sarcopenia-related clinical decision making and body composition analysis. The primary contribution of this dataset is high-quality skeletal muscle and adipose tissue segmentation at the third cervical (C3) vertebral level in an easily accessible and standardized imaging format, together with additional clinical demographic variables. These data can be leveraged to build models for body composition analysis and sarcopenia-related decision-making germane to HNC. Moreover, these data could form the basis for future data modeling challenges for sarcopenia-related decision-making in patients with HNC. An overview of the data descriptor is shown in Fig. 1.
Fig. 1

Data descriptor overview. The Cancer Imaging Archive (TCIA) head and neck squamous cell carcinoma (HNSCC) computed tomography dataset is used to generate muscle and adipose tissue segmentations at the third cervical (C3) vertebral level in Neuroimaging Informatics Technology Initiative (NIfTI) format. Additional demographic data (weight, height) is collected from electronic health records (EHR). The final newly distributed dataset can be used for body composition analysis, such as sarcopenia-related clinical decision-making.


Methods

Study population and image details

To develop this dataset, imaging data from the TCIA head and neck squamous cell carcinoma (HNSCC) collection, a large repository of imaging data originally collected from The University of Texas MD Anderson Cancer Center, were utilized. Specifically, 396 patients with contrast-enhanced CT scans were selected from the 495 available patients in the “Radiomics outcome prediction in Oropharyngeal cancer” dataset[13,14]. These patients were selected because their imaging included the third cervical vertebral level. To summarize the underlying data, these were patients with a histopathologically-proven diagnosis of squamous cell carcinoma of the oropharynx who were treated with curative-intent intensity-modulated radiotherapy. Imaging data were composed of high-quality CT scans of patients who were injected with intravenous contrast material. Images were acquired before the start of radiotherapy. Imaging data were provided in the Digital Imaging and Communications in Medicine (DICOM) standardized format. Additional details on the original imaging dataset are provided in the corresponding data descriptor[14] and TCIA website[13]. All DICOM images were previously de-identified, as described in previous data descriptors[8,14].

Skeletal muscle segmentation

For each CT image, the middle of the third cervical vertebra (C3) was located on a single axial slice and the skeletal muscle and adipose tissues were manually segmented. As described in previous publications[15], muscle and adipose tissue were defined in the ranges of −29 to 150 and −190 to −30 Hounsfield units, respectively, to initially guide manual segmentation; manual corrections to the initial automatically generated segmentation were necessary due to the presence of non-desired tissues (i.e., vasculature, soft tissue) in the Hounsfield unit ranges implemented. Based on these criteria, the paraspinal and sternocleidomastoid muscles were included as part of the skeletal muscle segmentation, while subcutaneous, intermuscular, and visceral adipose compartments were included as part of the adipose segmentation. Skeletal muscle and adipose tissue were segmented by trained research assistants (B.O. and R.J.) and reviewed by a radiation oncologist with 4 years of post-residency experience (A.J.G.) using a commercial image-processing platform (sliceOmatic v. 5.0, Tomovision, Magog, Canada). Examples of skeletal muscle and adipose tissue segmentations with corresponding images are shown in Fig. 2. Segmentations were exported from sliceOmatic in .tag format, with the corresponding 2D axial slice in DICOM format.
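The Hounsfield unit ranges above can be illustrated with a short thresholding sketch. This is not the sliceOmatic procedure itself, only a minimal NumPy approximation of the initial, pre-correction labelling. The function name `initial_label_map`, the inclusive treatment of the range boundaries, and the toy input values are assumptions made for illustration; the label convention (0 = background, 1 = muscle, 2 = adipose) follows the one used later in the conversion workflow.

```python
import numpy as np

# Hounsfield unit (HU) ranges quoted in the text; boundaries are treated
# as inclusive here, which is an assumption.
MUSCLE_HU = (-29, 150)
ADIPOSE_HU = (-190, -30)


def initial_label_map(ct_slice_hu):
    """Return a 2D label map: 0 = background, 1 = muscle, 2 = adipose."""
    hu = np.asarray(ct_slice_hu)
    labels = np.zeros(hu.shape, dtype=np.uint8)
    labels[(hu >= MUSCLE_HU[0]) & (hu <= MUSCLE_HU[1])] = 1
    labels[(hu >= ADIPOSE_HU[0]) & (hu <= ADIPOSE_HU[1])] = 2
    return labels


# Toy 2x3 "slice" in HU (illustrative values, not taken from the dataset).
toy = np.array([[-1000, -100, -30],
                [-29, 40, 700]])
toy_labels = initial_label_map(toy)
```

A threshold-only map like this also labels vasculature and other soft tissue that fall in the same ranges, which is why the manual correction step described above remains necessary.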
Fig. 2

Segmentation examples for a subset of 25 cases. Each image corresponds to one patient. Images are single-slice computed tomography axial views with segmentations superimposed. The red regions correspond to skeletal muscle tissue and the yellow regions correspond to adipose tissue.


NIfTI conversion

The Neuroimaging Informatics Technology Initiative (NIfTI) file format is increasingly seen as the standard for reproducible medical imaging research[16]. Therefore, we converted all our segmentation (.tag) and imaging (.dcm) data to NIfTI format, in order to increase the interoperability and widespread utilization of these data. For all file conversion processes, Python v. 3.7.9[17] was used. An overview of the NIfTI conversion workflow for segmentations and images is shown in Fig. 3. In brief, using an in-house Python script, .tag files (sliceOmatic output) were read in binary format and converted into numpy format[18], trimmed to remove header information, and then re-sized to the corresponding size of the 2D DICOM axial slice (sliceOmatic output) which was also converted to numpy format, i.e., a 2D array. The slice location was determined from the 2D DICOM axial slice in tandem with the 3D DICOM image (acquired from the TCIA) using pydicom[19]; the 3D DICOM image was necessary to determine the relative position of the 2D axial slice on the 3D volume. A 3D array that contained the segmentation information was then created by filling in all non-segmented slices with 0s, yielding a 3D segmentation mask. Each 3D segmentation mask contained separate regions of interest (e.g., 0 = background, 1 = muscle, 2 = adipose in the example in Fig. 3). A 3D representation was selected for the segmentation masks so that segmentations could be used for 3D applications (e.g., in tandem with the original 3D images), in addition to 2D applications (e.g., in tandem with single slice 2D images). 3D segmentation masks were converted to binary masks in NIfTI format (separate binary files for muscle and adipose) using SimpleITK[20]; separate binary files for each tissue type were generated for ease of use, e.g., most auto-segmentation approaches utilize binary masks[21]. 
3D CT DICOM images were loaded into Python using the DICOMRTTool[22] library, and then converted to NIfTI format using SimpleITK. Additional documentation on scripts used for conversion can be located on the corresponding GitHub repository: https://github.com/kwahid/C3_sarcopenia_data_descriptor.
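The array-handling part of this workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script (which is in the GitHub repository above): all function names are hypothetical, the slice is matched by nearest z-position, and the slice axis is taken to be axis 0. Reading the .tag file, reading DICOM headers with pydicom, and writing NIfTI files with SimpleITK are omitted so that the example runs on plain arrays.

```python
import numpy as np


def slice_index_from_position(slice_z, volume_z_positions):
    """Index of the volume slice whose z-position is closest to slice_z."""
    z = np.asarray(volume_z_positions, dtype=float)
    return int(np.argmin(np.abs(z - slice_z)))


def embed_slice(label_2d, n_slices, index):
    """Place a 2D label map into an otherwise all-zero 3D array (z, y, x)."""
    label_2d = np.asarray(label_2d, dtype=np.uint8)
    volume = np.zeros((n_slices,) + label_2d.shape, dtype=np.uint8)
    volume[index] = label_2d
    return volume


def split_binary(label_3d):
    """Split a 0/1/2 label volume into binary muscle and adipose masks."""
    muscle = (label_3d == 1).astype(np.uint8)
    adipose = (label_3d == 2).astype(np.uint8)
    return muscle, adipose


# Toy inputs (illustrative only): five slice positions and a 2x2 label map.
z_positions = [-10.0, -7.5, -5.0, -2.5, 0.0]
label_2d = [[0, 1],
            [2, 2]]

idx = slice_index_from_position(-5.1, z_positions)
label_3d = embed_slice(label_2d, len(z_positions), idx)
muscle, adipose = split_binary(label_3d)
```

In a real pipeline the z-positions would come from the DICOM headers of the 3D series and of the exported 2D slice, and each binary array would then be written out as its own NIfTI file.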
Fig. 3

File conversion workflow for segmentations and images. Outputs from sliceOmatic software, i.e., .tag segmentation and 2D Digital Imaging and Communications in Medicine (DICOM) slice, are used to generate a 2D mask array of muscle and adipose tissue. Information from 2D DICOM slice and corresponding 3D DICOM image (acquired from corresponding The Cancer Imaging Archive dataset) are used to generate a 3D array, which is then converted to Neuroimaging Informatics Technology Initiative (NIfTI) format.

Of the 396 cases converted through the workflow described above, one patient (TCIA ID 0435) had a DICOM CT file with image reconstruction errors, and another (TCIA ID 0464) could not be converted to NIfTI format. Both were removed from the final dataset, yielding 394 image/segmentation pairs in NIfTI format. Of note, 4 cases (TCIA IDs 0226, 0280, 0577, and 0607) yielded partitioned segmentation masks (mask spread over several slices) secondary to export issues in sliceOmatic when loading images with oblique image orientations; these cases have been kept in the dataset for completeness but should likely not be used for most segmentation-related applications.
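Because each segmentation is meant to occupy a single axial slice, a partitioned mask can be detected by counting the slices that contain any labelled voxel. The sketch below shows one possible check; the function names are hypothetical, and it assumes the slice axis is axis 0 of the array (the axis order may differ depending on how the NIfTI file is loaded).

```python
import numpy as np


def occupied_slices(mask_3d):
    """Indices of axial slices (axis 0) containing at least one nonzero voxel."""
    mask = np.asarray(mask_3d)
    per_slice = mask.reshape(mask.shape[0], -1).any(axis=1)
    return np.flatnonzero(per_slice).tolist()


def is_single_slice(mask_3d):
    """True if the mask is confined to exactly one axial slice."""
    return len(occupied_slices(mask_3d)) == 1


# Toy masks (illustrative only): one well-formed, one spread over two slices.
good = np.zeros((4, 3, 3), dtype=np.uint8)
good[2, 1, 1] = 1

partitioned = good.copy()
partitioned[3, 0, 0] = 1
```

A check of this kind could be run before model training to confirm that the four cases listed above, and only those, fail the single-slice condition.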

Additional patient demographic data collection

In addition to cross-sectional area derived from skeletal muscle segmentations, calculation of skeletal muscle index requires data concerning patient height and weight. In order to increase the usability of segmented regions of interest for sarcopenia-related calculations and model building, we also collected corresponding height (in m) and weight (in kg) data for all patients in our dataset. Anonymized TCIA IDs were mapped to existing patient medical record numbers to collect the corresponding data. Data were collected from the University of Texas MD Anderson Cancer Center clinical databases through the EPIC electronic medical record system by a manual review of clinical notes and paperwork. The Institutional Review Board of the University of Texas MD Anderson Cancer Center gave ethical approval for this work (RCR03–0800, waiver of informed consent). Height and weight were collected for the pre-radiotherapy visit only in accordance with the pre-radiotherapy imaging collected for this study. Clinical data collection was performed by a trained physician (D.E.).
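As a worked illustration of how these variables combine, the sketch below computes a cross-sectional area from a pixel count and an index from height. It assumes the conventional definition of skeletal muscle index as cross-sectional area divided by height squared (cm²/m²), which the text does not spell out. Weight is used here only for body mass index, a common companion variable; how weight enters a particular sarcopenia definition is study-specific and is not prescribed by this dataset. All function names and numeric inputs are hypothetical.

```python
def cross_sectional_area_cm2(n_pixels, pixel_spacing_mm):
    """Area of a 2D mask in cm^2 from its pixel count and (row, col) spacing in mm."""
    row_mm, col_mm = pixel_spacing_mm
    return n_pixels * row_mm * col_mm / 100.0  # 100 mm^2 per cm^2


def skeletal_muscle_index(csa_cm2, height_m):
    """Conventional SMI (assumed definition): area in cm^2 over height^2 in m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return csa_cm2 / height_m ** 2


def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Illustrative inputs only; these are not values from the dataset.
csa = cross_sectional_area_cm2(4000, (1.0, 1.0))
smi = skeletal_muscle_index(csa, 2.0)
bmi = body_mass_index(80.0, 2.0)
```

With real data, the pixel count would come from the muscle mask on the C3 slice and the pixel spacing from the image header.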

Data Records

Segmentation data

This data collection consists of 788 3D volumetric compressed NIfTI files (394 skeletal muscle “muscle.nii.gz” files, 394 adipose tissue “fat.nii.gz” files) derived from an original collection of 394 DICOM files of pre-therapy CT images collected from 495 TCIA cases (“Radiomics outcome prediction in Oropharyngeal cancer”)[13,14]. The skeletal muscle and adipose tissue NIfTI files are binary masks (0 = background, 1 = tissue region of interest). While we do not provide the corresponding 394 CT images in NIfTI format due to Figshare upload size constraints, we do provide all the code necessary to produce these files (see Code availability section). In addition to NIfTI format files, we also include .tag segmentation files and corresponding 2D DICOM files (sliceOmatic outputs) for interested parties to recreate our NIfTI conversion pipeline if desired. Of note, we do not include the 3D DICOM CT files as these can be acquired from existing TCIA repositories[13,14].

Clinical data

We also provide a single comma-separated value (CSV) file containing additional clinical demographic data germane to sarcopenia clinical decision-making. Within the CSV file, in addition to newly collected height and weight variables, we also include previously publicly available clinical variables in the TCIA dataset[13,14] relevant for body composition analysis (age and sex). Segmentations are organized by an anonymized TCIA patient ID number (“TCIA Radiomics ID”) and can be cross-referenced against the CSV data table using this identifier. The raw data, data records, and supplemental descriptions of the meta-data files are cited under Figshare: 10.6084/m9.figshare.18480917[23].
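The cross-referencing step can be sketched with the standard library alone. The identifier column name “TCIA Radiomics ID” and the file names muscle.nii.gz and fat.nii.gz come from the text; the other column names (`height_m`, `weight_kg`), the one-folder-per-patient layout, and the function names are assumptions for illustration and should be checked against the actual Figshare release.

```python
import csv
import io


def load_clinical(csv_text, id_column="TCIA Radiomics ID"):
    """Map each patient ID (kept as a string) to its CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column].strip(): row for row in reader}


def mask_paths(patient_id, root="segmentations"):
    """Hypothetical layout: one folder per patient holding both masks."""
    return {
        "muscle": f"{root}/{patient_id}/muscle.nii.gz",
        "adipose": f"{root}/{patient_id}/fat.nii.gz",
    }


# Toy CSV content (illustrative values and assumed column names).
toy_csv = (
    "TCIA Radiomics ID,height_m,weight_kg\n"
    "0001,1.75,80\n"
    "0435,1.60,55\n"
)

clinical = load_clinical(toy_csv)
paths = mask_paths("0001")
height = float(clinical["0001"]["height_m"])
```

Keeping the identifier as a string preserves leading zeros, which would be lost if the column were parsed as an integer.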

Technical Validation

Skeletal muscle segmentations

The segmentations provided in this data descriptor have been utilized as ground-truth segmentations in a previous study by Naser et al.[12], which yielded sarcopenia determination results (normal vs. depleted skeletal muscle) that were consistent with existing literature[9], i.e., overall survival stratification is significant in males but not females as determined by Kaplan-Meier analysis. Note: 4 patients included in the current data descriptor were excluded from the aforementioned analysis (TCIA IDs 0226, 0280, 0577, and 0607) due to the oblique image orientation mask issues previously described in Methods.

EPIC (Electronic Medical Record System)

The University of Texas MD Anderson Cancer Center adopted this system in 2017; it allows integration of research data and access to data from virtually every electronic source within the institution. https://www.clinfowiki.org/wiki/index.php/Epic_Systems.

Usage Notes

This data collection is provided in NIfTI format with the accompanying CSV file containing additional clinical information indexed by TCIA identifier. We invite all interested researchers to download this dataset to use in sarcopenia-related research and automated clinical decision support tool development. Images (reproducible through code) and segmentations are stored in NIfTI format and may be viewed and analyzed in any NIfTI viewing application, depending on the end-user’s requirements. Current open-source software for these purposes includes ImageJ[24] and 3D Slicer[25].
Measurement(s): skeletal muscle • adipose tissue
Technology Type(s): computed tomography
  17 in total

1.  The first step for neuroimaging data analysis: DICOM to NIfTI conversion.

Authors:  Xiangrui Li; Paul S Morgan; John Ashburner; Jolinda Smith; Christopher Rorden
Journal:  J Neurosci Methods       Date:  2016-03-02       Impact factor: 2.390

2.  SoftSeg: Advantages of soft versus binary training for image segmentation.

Authors:  Charley Gros; Andreanne Lemay; Julien Cohen-Adad
Journal:  Med Image Anal       Date:  2021-03-18       Impact factor: 8.545

3.  The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository.

Authors:  Kenneth Clark; Bruce Vendt; Kirk Smith; John Freymann; Justin Kirby; Paul Koppel; Stephen Moore; Stanley Phillips; David Maffitt; Michael Pringle; Lawrence Tarbox; Fred Prior
Journal:  J Digit Imaging       Date:  2013-12       Impact factor: 4.056

4.  Quantitative Imaging Network: Data Sharing and Competitive Algorithm Validation Leveraging The Cancer Imaging Archive.

Authors:  Jayashree Kalpathy-Cramer; John Blake Freymann; Justin Stephen Kirby; Paul Eugene Kinahan; Fred William Prior
Journal:  Transl Oncol       Date:  2014-02-01       Impact factor: 4.243

5.  Association of Body Composition With Survival and Locoregional Control of Radiotherapy-Treated Head and Neck Squamous Cell Carcinoma.

Authors:  Aaron J Grossberg; Sasikarn Chamchod; Clifton D Fuller; Abdallah S R Mohamed; Jolien Heukelom; Hillary Eichelberger; Michael E Kantor; Katherine A Hutcheson; G Brandon Gunn; Adam S Garden; Steven Frank; Jack Phan; Beth Beadle; Heath D Skinner; William H Morrison; David I Rosenthal
Journal:  JAMA Oncol       Date:  2016-06-01       Impact factor: 31.777

6.  The Design of SimpleITK.

Authors:  Bradley C Lowekamp; David T Chen; Luis Ibáñez; Daniel Blezek
Journal:  Front Neuroinform       Date:  2013-12-30       Impact factor: 4.081

7.  When the Loss Costs Too Much: A Systematic Review and Meta-Analysis of Sarcopenia in Head and Neck Cancer.

Authors:  Xin Hua; Shan Liu; Jun-Fang Liao; Wen Wen; Zhi-Qing Long; Zi-Jian Lu; Ling Guo; Huan-Xin Lin
Journal:  Front Oncol       Date:  2020-02-05       Impact factor: 6.244

8.  Establishment and Validation of Pre-Therapy Cervical Vertebrae Muscle Quantification as a Prognostic Marker of Sarcopenia in Patients With Head and Neck Cancer.

Authors:  Brennan Olson; Jared Edwards; Catherine Degnin; Nicole Santucci; Michelle Buncke; Jeffrey Hu; Yiyi Chen; Clifton D Fuller; Mathew Geltzeiler; Aaron J Grossberg; Daniel Clayburgh
Journal:  Front Oncol       Date:  2022-02-14       Impact factor: 5.738

9.  Is sarcopenia a predictor of prognosis for patients undergoing radiotherapy for head and neck cancer? A meta-analysis.

Authors:  Merran Findlay; Kathryn White; Natalie Stapleton; Judith Bauer
Journal:  Clin Nutr       Date:  2020-09-18       Impact factor: 7.324

10.  Matched computed tomography segmentation and demographic data for oropharyngeal cancer radiomics challenges.

Authors: 
Journal:  Sci Data       Date:  2017-07-04       Impact factor: 6.444

  1 in total

1.  Deep learning auto-segmentation of cervical skeletal muscle for sarcopenia analysis in patients with head and neck cancer.

Authors:  Mohamed A Naser; Kareem A Wahid; Aaron J Grossberg; Brennan Olson; Rishab Jain; Dina El-Habashy; Cem Dede; Vivian Salama; Moamen Abobakr; Abdallah S R Mohamed; Renjie He; Joel Jaskari; Jaakko Sahlsten; Kimmo Kaski; Clifton D Fuller
Journal:  Front Oncol       Date:  2022-07-28       Impact factor: 5.738
