Metadata information for pathology images

Hi - I have a question about extracting metadata from pathology images. Each case has multiple slides: when I query, for example, case 01BR001 in the CPTAC-BRCA dataset, there are 7 associated H&E pathology images. When I open the ‘image viewer’, the images show up and I can see the actual slide IDs (7 unique IDs).

Question: Is there a way to generate a table with the proper H&E image metadata? I need to relate the slide IDs, which are the actual file names, to the tissue they come from and to whether they are tumor or not (which is only visible in the image viewer).

Thank you


Alan Jerusalmi, PhD

Chief Innovation Officer & co-founder

Bio-AI Health

Cell#: +1 617-980-9948



Alan, thank you for reaching out!

I prepared a brief Python notebook to demonstrate how to access this information programmatically.

This notebook relies on the idc-index Python package and some experimental functionality we are working on to support searching slide microscopy images.

Please let us know if you have any follow-up questions. I apologize in advance that the documentation for this is lacking right now, but hopefully the notebook will help you.

Some of the functionality in that notebook is very fresh and will likely change in the future. If any features are missing, please let me know, and we will consider them for the future development plans of idc-index.

@ajerusalmi I realized that I did not provide enough background and detail along with the explanation.

Both the IDC Portal interface and the interface of the image viewer are populated from the metadata that is stored in the DICOM files containing the images.

First screenshot - portal: “Case ID” in the first screenshot above corresponds to DICOM PatientID. SeriesInstanceUID would typically map to a single pathology slide. StudyInstanceUID will contain one or more series that correspond to the slides that are grouped together because they were acquired using the same equipment around the same time.

Second screenshot - viewer: here the “Description” field is populated from the SeriesDescription DICOM attribute, and “Anatomical structure” is extracted from the PrimaryAnatomicStructureSequence DICOM attribute, which in turn is stored within SpecimenDescriptionSequence.

You will see that the series description in some cases indicates the type of tissue in a given slide. SeriesDescription content is free text - you cannot expect consistency of those values across different collections, which can complicate search and selection. However, there may be a separate attribute that contains coded values describing the type of tissue: PrimaryAnatomicStructureModifierSequence (a sub-sequence within PrimaryAnatomicStructureSequence).
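To make the nesting described above more concrete, here is a minimal sketch in plain Python. It is not taken from the notebook: ordinary dicts and lists stand in for DICOM datasets and sequences, the helper name `tissue_annotations` is hypothetical, and every value is a made-up placeholder rather than real IDC metadata. The keys follow the DICOM keywords SpecimenDescriptionSequence, PrimaryAnatomicStructureSequence and PrimaryAnatomicStructureModifierSequence.

```python
# Plain dicts/lists stand in for DICOM datasets/sequences.
# All values below are illustrative placeholders, not real IDC metadata.
series_metadata = {
    "SeriesInstanceUID": "1.2.3.4",  # placeholder UID
    "SeriesDescription": "free text, not consistent across collections",
    "SpecimenDescriptionSequence": [
        {
            "PrimaryAnatomicStructureSequence": [
                {
                    "CodeMeaning": "Breast",
                    "PrimaryAnatomicStructureModifierSequence": [
                        {"CodeMeaning": "Neoplasm, Primary"}
                    ],
                }
            ]
        }
    ],
}


def tissue_annotations(meta):
    """Return (anatomic structure, [modifiers]) pairs, one per coded structure.

    Missing sequences are treated as empty, since the modifier
    sequence is not guaranteed to be present.
    """
    results = []
    for specimen in meta.get("SpecimenDescriptionSequence", []):
        for structure in specimen.get("PrimaryAnatomicStructureSequence", []):
            modifiers = [
                item["CodeMeaning"]
                for item in structure.get(
                    "PrimaryAnatomicStructureModifierSequence", []
                )
            ]
            results.append((structure["CodeMeaning"], modifiers))
    return results


print(tissue_annotations(series_metadata))
```

The point of the sketch is only the shape of the data: the anatomic structure sits one level inside the specimen description, and the tissue-type modifier sits one level inside the anatomic structure.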

If the discussion above about sequences seems confusing, do not worry - you will not need to parse those DICOM files! This information is just for background. The values of those attributes are extracted to make access more convenient.

One last thing to note: if you have never worked with DICOM Slide Microscopy before, the file encoding conventions used in DICOM differ from formats such as TIFF or SVS. In DICOM, each series will typically be stored as multiple files, with each resolution level stored separately. In DICOM nomenclature, each resolution level is mapped to a separate DICOM instance.
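As a rough illustration of the series/instance relationship, the sketch below groups per-file records by SeriesInstanceUID, so that each slide ends up with its list of resolution levels. The records, the UIDs and the `group_by_series` helper are all hypothetical; TotalPixelMatrixColumns is used here only as a convenient way to order the levels from highest to lowest resolution.

```python
from collections import defaultdict

# Hypothetical per-instance records, one per DICOM file.
# UIDs and pixel sizes are placeholders, not real IDC values.
instances = [
    {"SeriesInstanceUID": "slide-A", "SOPInstanceUID": "A.2", "TotalPixelMatrixColumns": 10000},
    {"SeriesInstanceUID": "slide-A", "SOPInstanceUID": "A.1", "TotalPixelMatrixColumns": 40000},
    {"SeriesInstanceUID": "slide-B", "SOPInstanceUID": "B.1", "TotalPixelMatrixColumns": 32000},
]


def group_by_series(records):
    """Map each SeriesInstanceUID (slide) to its instances (resolution levels)."""
    by_series = defaultdict(list)
    for record in records:
        by_series[record["SeriesInstanceUID"]].append(record)
    # Within each slide, put the widest (highest-resolution) level first.
    for levels in by_series.values():
        levels.sort(key=lambda r: r["TotalPixelMatrixColumns"], reverse=True)
    return dict(by_series)


for series_uid, levels in group_by_series(instances).items():
    print(series_uid, [level["SOPInstanceUID"] for level in levels])
```

In other words, a table with one row per slide is a table keyed by SeriesInstanceUID, not by file.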

If you are interested in learning more about DICOM support for digital pathology, check out this publication:

Herrmann et al. Implementing the DICOM Standard for Digital Pathology. J Pathol Inform. 2018 Nov 2;9:37. doi: 10.4103/jpi.jpi_42_18. PMID: 30533276; PMCID: PMC6236926.

Please check out the notebook (same link as above), which I updated with more explanation and detail. I also updated it to include PatientID (which is “Case ID” in the IDC Portal).

If this is still confusing or incomplete, please let me know! It’s a complex topic!

Hi Andrey,

Thank you for the detailed response. I will test this out this week. I appreciate the fast response.
