Storing definitions of data collections as DICOM entities

hmeine · June 29, 2022, 11:23am

Hi, I am not directly affiliated with IDC (yet), but I have a discussion topic that should be relevant to this audience and was pointed here by @fedorov:

I have been wondering whether and how it makes sense to represent training / validation / test group definitions in DICOM entities. My very specific goal is to be able to use a single DICOM UID for unambiguously referencing the data that should be used for training (and possibly another UID for a validation set). My thinking is that it should suffice to give the container running my deep learning (CNN-based segmentation) training the connection information for a PACS, together with these UIDs, and the training should then be able to pull all necessary data from the PACS.

Taking a step back, I guess there are a few related use cases, potentially also within IDC, such as uniquely defining (versions and variants of) data collections.

My specific idea / impression was that Key Object Selection Documents may be usable for this purpose (disclaimer: impression based on hearsay; I have not used them before). These could potentially refer to DICOM SEGs, which could in turn refer to the original images, allowing the training to pull all intended training data from the server.
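To make the "one UID pulls everything" idea concrete, here is a minimal sketch of following references transitively from a single root UID. It assumes a hypothetical in-memory lookup table in place of a real PACS query; the UIDs and the `REFERENCES` mapping are made-up placeholders, not real objects or a real API.

```python
from collections import deque

# Hypothetical stand-in for a PACS: maps an instance UID to the UIDs it
# references. All UIDs here are made-up placeholders for illustration.
REFERENCES = {
    "1.2.3.100": ["1.2.3.200", "1.2.3.201"],  # selection document -> two SEGs
    "1.2.3.200": ["1.2.3.300", "1.2.3.301"],  # first SEG -> its source images
    "1.2.3.201": ["1.2.3.302"],               # second SEG -> its source image
}


def resolve_references(root_uid, references):
    """Return all UIDs reachable from root_uid in breadth-first order."""
    ordered = []
    seen = {root_uid}
    queue = deque([root_uid])
    while queue:
        uid = queue.popleft()
        ordered.append(uid)
        # Objects without an entry (e.g. plain images) reference nothing.
        for ref in references.get(uid, []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return ordered


if __name__ == "__main__":
    for uid in resolve_references("1.2.3.100", REFERENCES):
        print(uid)
```

In a real setup the dictionary lookup would be replaced by retrieving each object and reading its reference sequences; the traversal itself would stay the same.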

Does my idea sound feasible? Are there role models / predecessors / related endeavors or use cases I did not see yet? Any public spec or experiences?

pieper · June 29, 2022, 1:23pm

Hi @hmeine - thanks for posting. I agree that whatever we do in this area needs to leverage the DICOM metadata and allow referencing input data and processing variables using a consistent structure that reuses DICOM conventions. Whether it’s ever part of the standard doesn’t matter much at this point, but we can think of it as being in draft mode while the details are being worked through. I don’t think we have any really good examples yet, but I’m interested to hear what others have done.

fedorov · June 29, 2022, 2:57pm

@hmeine welcome to IDC forum!

I assume that you want to define your data collections in a way that spans multiple patients. If that is indeed the case, my understanding is that saving such a collection as a DICOM object would conflict with the DICOM view of the real world, since that object would need to fit into the patient->study->series->instance hierarchy. What PatientID would be specified in that object (whatever it is going to be - KO or SR)?

Of course, nothing prevents you from hacking your way through and making a container that has that list, but I am not entirely sure about the benefit. I completely agree that the definition of the collection/cohort should leverage DICOM UIDs, but it is not clear to me that DICOM is even the right mechanism to communicate collections/cohorts.

In IDC, a collection/cohort can be defined using one of two approaches, either of which uniquely and persistently resolves to the list of files corresponding to that cohort:

as a list of SOPInstanceUIDs selected from IDC dicom_all table + IDC data version number
as an SQL query against IDC dicom_all table + IDC data version number
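A minimal sketch of what these two kinds of cohort definition could look like as plain data follows. The function names, the dictionary layout, and the digest helper are illustrative assumptions, not an IDC API. The table path pattern `bigquery-public-data.idc_v<N>.dicom_all` is also an assumption about how versioned tables are named, and the query is only built as a string, never executed.

```python
import hashlib
import json


def cohort_from_uids(sop_instance_uids, idc_version):
    """Cohort as an explicit, de-duplicated, sorted UID list plus data version."""
    return {
        "idc_version": idc_version,
        "sop_instance_uids": sorted(set(sop_instance_uids)),
    }


def cohort_from_query(where_clause, idc_version):
    """Cohort as an SQL query pinned to one data version (string only)."""
    # Assumption: versioned tables are named idc_v<N>.dicom_all.
    table = f"bigquery-public-data.idc_v{idc_version}.dicom_all"
    sql = f"SELECT SOPInstanceUID FROM `{table}` WHERE {where_clause}"
    return {"idc_version": idc_version, "sql": sql}


def cohort_digest(cohort):
    """Illustrative helper: a stable SHA-256 over the canonical JSON form."""
    canonical = json.dumps(cohort, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    by_list = cohort_from_uids(["1.2.3.2", "1.2.3.1"], idc_version=10)
    by_query = cohort_from_query("Modality = 'CT'", idc_version=10)
    print(cohort_digest(by_list))
    print(by_query["sql"])
```

Pinning the data version in both forms is what makes the definition reproducible; the digest is just one possible way to notice if a passed-around list has silently lost entries.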

A related issue is that DICOM is not intended to handle versioning, whereas research applications and repositories must deal with it. This adds further complexity, and is perhaps a motivation for a repository- or solution-specific approach to cohort definition.

Of course, within an individual patient/study one can utilize various existing capabilities of the standard to select individual series, or to specify the provenance of the analysis artifacts, etc.

I would definitely love to hear the thoughts of @dclunie and @hackermd on this.

fedorov · June 29, 2022, 2:59pm

Coincidentally, I am at our regular weekly Office Hours here https://meet.google.com/xyt-vody-tvb until 11:30am Boston time - if anyone wants to drop by and discuss this topic in person!

dclunie · June 29, 2022, 3:27pm

Actually DICOM just added the capability to do something like this in Sup 223:

ftp://medical.nema.org/medical/dicom/final/sup223_ft_InventoryIODandServices.pdf

but it is brand new and not implemented yet, though there is no reason why you could not manually create and store an “Inventory Instance”, which per Sup 223 is a so-called “non-patient object”.

hmeine · June 29, 2022, 3:41pm

Thanks to both of you for the quick replies! I must admit that I did not consider the possibility that my motivation to store these as DICOM objects would be higher than yours.

So, let me take the “good cop” role and put my DICOM preaching hat on:

I am motivated to store the list permanently and immutably, and it makes a lot of sense that it is “close to the data itself”; it kind of belongs together IMO.
From the specific perspective of my task, it would be so convenient to specify one UID for the dataset, instead of a list that is not as easy to pass around via cmdline args, environment variables or so, and might also break / lose entries without notice.

I understand that this would be stretching the purpose of a regular PACS / the DICOM data model. OTOH, I would find it quite disappointing to have to set up another server with a different protocol (HTTP/S3/…) just for serving the list of cases.

hmeine · June 29, 2022, 4:01pm

Thanks David (as you probably guessed, your post arrived after I composed mine), the Inventory Information Model looks very interesting indeed! I will have to study it. When implementing it, I would probably run into the problem that the new tags and constants are not part of the dcmtk / pydicom / dicom-rs etc. dictionaries yet, right?

I am not so concerned about it being a new supplement, since I am trying to get something running (in a “tentatively standard-conformant way”) which is only my own pieces of software talking to each other (and with a PACS, which hopefully does not care about unknown objects).

pieper · June 29, 2022, 5:02pm

Thanks @dclunie, I wasn’t aware of that either.

I can’t speak for all libraries, but as a workaround you should be able to supplement the dictionaries to define the new codes, and of course you could submit PRs to add anything that’s missing.

fedorov · June 29, 2022, 5:26pm

@dclunie I was completely unaware of this - this is great, and we should discuss this in more detail and evaluate its utility in the context of IDC!

Are there any publicly available samples? I looked around the FTP server, but either there are none, or (more likely) I have no idea how to navigate those folders.

dclunie · June 29, 2022, 7:07pm

No samples AFAIK - this is brand new (was just completed last week).

fedorov · June 29, 2022, 7:33pm

@hmeine I think it is quite possible that your PACS may balk at those unknown objects, since that supplement is introducing a new SOPClassUID, and it would be quite logical for a PACS to refuse object classes not listed in the conformance statement.

hmeine · June 29, 2022, 7:39pm

I am considering Orthanc or DCM4CHEE; this is not about a clinical installation (just a research installation under our control in a hospital). Would you also expect these less clinically oriented “PACS-like systems” to have a problem with unknown SOP Class UIDs?

fedorov · June 29, 2022, 7:51pm

I don’t know, I don’t have much experience with those. But I think it should be easy to test. You can, for example, take any DICOM object, replace SOPClassUID with a value that is valid for the UI value representation but not listed in the standard, and try to send and retrieve that object with Orthanc or DCM4CHEE. This should be much faster and easier than asking the maintainers through their forum etc.
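For the "valid VR-wise, but not listed in the standard" part, a minimal sketch of generating and checking such a UID is below. It uses the UUID-derived `2.25.` root, which DICOM permits for UIDs, and checks the basic UI rules (at most 64 characters, dot-separated numeric components, no leading zeros). The function names are illustrative, and writing the value into an object and sending it to the server is not shown.

```python
import re
import uuid

# Dot-separated numeric components; a component may be "0" but must not
# have a leading zero otherwise.
_UID_PATTERN = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


def is_valid_uid(value):
    """Check the basic syntactic rules for a DICOM UID (UI value)."""
    return len(value) <= 64 and _UID_PATTERN.fullmatch(value) is not None


def new_unknown_uid():
    """Create a syntactically valid UID that is not defined by the standard."""
    # A random UUID written as a decimal integer under the 2.25 root.
    return "2.25." + str(uuid.uuid4().int)


if __name__ == "__main__":
    uid = new_unknown_uid()
    print(uid, is_valid_uid(uid))
```

The resulting string could then be written into the SOPClassUID of a copy of an existing object with whatever DICOM toolkit is at hand before attempting the store and retrieve.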

hmeine · July 1, 2022, 12:20pm

Just a quick remark while I am busy with other stuff and still could not read the full supplement: The slides on Sup 223 state that the persistent inventory object is “Managed like other DICOM non-patient objects” (slide 9). I found that interesting because of the discussion above about DICOM focusing on per-patient objects.

hmeine · July 10, 2022, 12:15pm

When reading the supplement, I noticed one area where my intended use case and the supplement’s original motivation diverge: creating a copy of the contents is not what I want; my inventory should just refer to the necessary other objects. It’s clear that the latter is somehow also supported (in any case, the original data will be fully identified), but much of the text seems to assume that one wants the inventory to contain all the data. I am not very experienced with reading the standard, even less so with supplements, so I do not feel very confident yet about how good a fit this is.

fedorov · July 23, 2022, 2:47am

Looking at the Inventory Module, it does appear that it references studies/series/instances by UIDs. I do not see where it would keep a copy of all of the content of their attributes.

kirbyju · November 15, 2023, 6:51pm

Hi all, I am listening to a SIIM webinar right now geared at spreading awareness and encouraging adoption of Sup 223. Has there been any further consideration of how you might use this for IDC? And have any of you heard of support for this being implemented or considered in open source PACS like Orthanc or DCM4CHEE?

fedorov · November 15, 2023, 7:51pm

Justin, I recently have been in contact with one of the authors/proponents of Sup 223, and we may meet at RSNA to chat about this in person.

On the IDC side, there has been no further consideration or discussion of this capability. I have communicated to the aforementioned author of this supplement that IMHO three things would be absolutely essential for this supplement to even gain consideration in the community: sharing of publicly available samples that exemplify encoding of such objects, sample python/pydicom code demonstrating how to use such objects, and support of these new objects in the (one and only dciodvfy!) DICOM validator. Those would also go a long way in “spreading awareness and encouraging adoption”.

dclunie · January 29, 2024, 4:08pm

If FHIR is in use, there is also ImagingSelection - FHIR v5.0.0

hmeine · January 30, 2024, 2:00pm

For the record, in the DICOM breakout session during the NAMIC Project Week yesterday, @dclunie also referred to classic modality worklists as a concept that relates to lists of entities. If I understood correctly, he suggested that one could consider the AI training as a “modality” / task that is instructed with the worklist. (At the same time, David stated that it’s probably not a very attractive method, because those AI people are usually not interested in the additional DICOM infrastructure and its complexity.)

https://www.dicomstandard.org/using/dicomweb/workflow-ups-rs
