Extracting radiomics features from existing segmentations

Hi @fedorov , excellent work! This NLST analysis left me wondering if IDC plans to do anything else along these lines. In particular, IDC has a lot of existing SEG/RTSTRUCT tumor and organ segmentation data in it for which there are no publicly shared radiomics data (not even tumor volume in many cases).

Since we have standardized image features that have been agreed upon by many experts in the community via the Image Biomarker Standardisation Initiative (IBSI) and implemented in tools such as Pyradiomics, would it make sense for IDC to compute these features? This would have two major benefits to the research community:

  1. Converting image data into text-based features creates a new opportunity for researchers from other domains without image processing expertise to explore correlations between images and other datatypes (e.g. genomics, proteomics, clinical).
  2. It would prevent duplicative effort by the many users who would likely run such a tool as a preliminary step in their analyses.

If it’s too much to take on by yourselves, could there be a way to crowdsource this analysis such that anyone who applies Pyradiomics to < Collection XYZ > can easily share their notebook/results in a place on IDC that everyone knows to look for them?

Justin, this is a fair point and an interesting idea to explore!

I see the situation as a bit more nuanced. Although the feature definitions are standardized, the values of many features depend heavily on the preprocessing and parameterization of the extraction process.

For the segmentations in the TotalSegmentator-CT-Segmentations collection we only extracted first-order and shape features, as those do not depend on the choices related to binning, smoothing, rescaling, etc.

I do think that computing features for the existing segmentations would add a lot of value to the data. Extraction of the first-order and shape features should not be controversial. Going beyond those may not be straightforward and may need to be customized to specific acquisition protocols.
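To make the dependence on parameterization concrete, here is a minimal pure-Python sketch on a made-up 2D ROI. It is not how Pyradiomics or any other tool computes features; the image values, the pixel spacing, and the helper names (`roi_area`, `roi_mean`, `horizontal_contrast`) are all assumptions for illustration. The ROI area and mean intensity do not involve discretization at all, while a GLCM-style contrast changes with the chosen bin width.

```python
# Toy 2D image (e.g. HU values) and binary ROI mask. All numbers are made up.
image = [
    [10, 12, 35, 40],
    [11, 30, 38, 42],
    [9, 14, 36, 90],
    [8, 13, 15, 95],
]
mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
PIXEL_AREA_MM2 = 0.5 * 0.5  # assumed 0.5 mm x 0.5 mm pixel spacing


def roi_values(image, mask):
    """Intensities of the pixels that fall inside the mask."""
    return [v for img_row, m_row in zip(image, mask)
            for v, m in zip(img_row, m_row) if m]


def roi_area(mask):
    """Shape-type feature: pixel count times pixel area (no binning involved)."""
    return sum(map(sum, mask)) * PIXEL_AREA_MM2


def roi_mean(image, mask):
    """Mean intensity inside the ROI (no binning involved)."""
    values = roi_values(image, mask)
    return sum(values) / len(values)


def horizontal_contrast(image, mask, bin_width):
    """GLCM-style contrast over horizontal neighbour pairs inside the ROI,
    after fixed-bin-width discretization of the intensities."""
    levels = [[v // bin_width for v in row] for row in image]
    squared_diffs = []
    for r, row in enumerate(levels):
        for c in range(len(row) - 1):
            if mask[r][c] and mask[r][c + 1]:
                squared_diffs.append((row[c] - row[c + 1]) ** 2)
    return sum(squared_diffs) / len(squared_diffs)


print("area (mm^2):", roi_area(mask))
print("mean:", roi_mean(image, mask))
print("contrast, bin width 5:", horizontal_contrast(image, mask, 5))
print("contrast, bin width 25:", horizontal_contrast(image, mask, 25))
```

The same ROI yields one area and one mean, but two different contrast values depending on the bin width, which is why texture-type features need an agreed and documented configuration.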

For the regions defined by RTSTRUCT contours, additional complexity may arise in converting those contours to binary segmentations. Again, this may require extra effort to investigate, and may be easier to do at the granularity of individual collections to control for collection-specific conventions.
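As a rough illustration of why contour-to-mask conversion is not trivial, the sketch below rasterizes one made-up planar contour with an even-odd ray-casting test under two different assumptions about where each pixel is sampled. This is a toy, not a description of how SlicerRT, plastimatch, or any other tool behaves; the contour coordinates, grid size, and the `sample_offset` parameter are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, rows, cols, sample_offset):
    """Binary mask: a pixel is 'in' if its sample point lies inside the polygon.
    sample_offset shifts the sample point within each pixel (hypothetical knob)."""
    return [
        [1 if point_in_polygon(c + sample_offset, r + sample_offset, polygon) else 0
         for c in range(cols)]
        for r in range(rows)
    ]


def pixel_count(mask):
    return sum(map(sum, mask))


# Made-up square contour in pixel-index coordinates (true area 2.4 * 2.4 = 5.76).
contour = [(0.8, 0.8), (3.2, 0.8), (3.2, 3.2), (0.8, 3.2)]

mask_a = rasterize(contour, 5, 5, sample_offset=0.0)  # sample at integer indices
mask_b = rasterize(contour, 5, 5, sample_offset=0.5)  # sample half a pixel later

print("pixels with offset 0.0:", pixel_count(mask_a))
print("pixels with offset 0.5:", pixel_count(mask_b))
```

With this contour the two sampling conventions give 9 and 4 pixels respectively, so even a half-pixel disagreement about coordinate conventions changes the mask, and with it any volume or shape feature derived from it. Real RTSTRUCT data adds further complications on top of this toy case.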

Crowdsourcing is great when there are huge crowds with the necessary qualifications waiting to work on a task. With radiomics feature extraction we would want to be certain that the extraction was done properly, and that the provenance of how it was done can be tracked.

Certainly, if there is a group/individual with a demonstrable track record in radiomics research willing to contribute those features, we would be very interested to discuss this. Converting RTSTRUCT into binary segmentations and quality control of the results may on its own become an exploration (e.g., see RT Structure Set conversions to binary label maps - SlicerRT vs plastimatch · Issue #105 · SlicerRt/SlicerRT · GitHub and you can easily google more examples like that).

Overall, the bottleneck is not computation, but configuring the extraction and addressing the peculiarities and heterogeneity of the data.

In any case, thank you for the suggestion! It is indeed a good one.

The MIRP radiomics tool we discussed briefly on another forum supports RTSTRUCT natively. It would be interesting to know if that solves the problem you’re raising here or falls victim to the same issues.

Anyway, thanks for getting back to me and I hope this comes to fruition at some point!

RTSTRUCT instances can contain some regions that are very difficult to rasterize correctly, so it would be really interesting to identify some of those cases and test out the various tools.

The 4 CPTAC analysis result datasets that you all are hosting on IDC contain a lot of RTSTRUCT tumor segmentations that might be useful for testing. I’m in the middle of trying to apply MIRP’s morphological feature extraction function to all of them. If someone on your side wants to compare notes at some point let me know!