Downloading datasets (e.g. DICOM stacks) from the IDC

Hi -

We have been exploring the IDC imaging repository, and I see that when we view image stacks we have the option of downloading a snapshot of a single slice. Is it also possible to download the whole stack, i.e., the DICOM files themselves?

As an example, we’d want to download the full stack of a dataset like this:
https://viewer.imaging.datacommons.cancer.gov/viewer/1.3.6.1.4.1.14519.5.2.1.6279.6001.224985459390356936417021464571?seriesInstanceUID=1.2.276.0.7230010.3.1.3.0.57823.1553343864.578877,1.3.6.1.4.1.14519.5.2.1.6279.6001.273525289046256012743471155680

Thanks,
Jonathan

Do you have any objections if I convert this support request to a public forum post, so that other users can benefit from the discussion?

Absolutely no objections!

Best,
Jonathan

Jonathan, thank you for asking the question! I understand you submitted it via support email. Please create an account on the IDC forum to participate in the conversation in the public thread: https://discourse.canceridc.dev/.

Yes, it is possible!

If you want to use the viewer URL to define what to download, note that the URL has the following structure:

https://viewer.imaging.datacommons.cancer.gov/viewer/<DICOM StudyInstanceUID>?seriesInstanceUID=<comma separated DICOM SeriesInstanceUIDs>
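If you would rather not pick the UIDs out of the URL by hand, here is a minimal sketch that splits a viewer URL according to the structure above. It uses only the Python standard library; the function name `parse_viewer_url` is just an illustrative helper, not part of any IDC tooling.

```python
from urllib.parse import urlparse, parse_qs


def parse_viewer_url(url):
    """Split an IDC viewer URL into (StudyInstanceUID, [SeriesInstanceUIDs]).

    Hypothetical helper: assumes the URL follows the
    /viewer/<StudyInstanceUID>?seriesInstanceUID=<uid>,<uid>,... structure.
    """
    parts = urlparse(url)
    # The study UID is the last segment of the path.
    study_uid = parts.path.rstrip("/").split("/")[-1]
    # The series UIDs, if present, are a single comma-separated query value.
    series_param = parse_qs(parts.query).get("seriesInstanceUID", [""])[0]
    series_uids = [uid for uid in series_param.split(",") if uid]
    return study_uid, series_uids


url = (
    "https://viewer.imaging.datacommons.cancer.gov/viewer/"
    "1.3.6.1.4.1.14519.5.2.1.6279.6001.224985459390356936417021464571"
    "?seriesInstanceUID=1.2.276.0.7230010.3.1.3.0.57823.1553343864.578877,"
    "1.3.6.1.4.1.14519.5.2.1.6279.6001.273525289046256012743471155680"
)
study_uid, series_uids = parse_viewer_url(url)
print("StudyInstanceUID:", study_uid)
print("SeriesInstanceUIDs:", series_uids)
```

A URL without the `seriesInstanceUID` parameter yields an empty series list, which you can treat as "download the entire study".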

You should check out our instructions for downloading data from IDC here: https://learn.canceridc.dev/data/downloading-data.

Specifically, in the step where you create the manifest, you would filter on StudyInstanceUID if you want to download the entire study. For your specific URL, that query is:

# Select all files for a given DICOM study
SELECT gcs_url
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE StudyInstanceUID = "1.3.6.1.4.1.14519.5.2.1.6279.6001.224985459390356936417021464571"

Or, if you want to download just the files corresponding to the two series in the URL, you would use this query:

# Select all files for the given DICOM series
SELECT gcs_url
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE SeriesInstanceUID IN ("1.2.276.0.7230010.3.1.3.0.57823.1553343864.578877", "1.3.6.1.4.1.14519.5.2.1.6279.6001.273525289046256012743471155680")
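The study and series queries differ only in the column name and the list of UIDs, so if you need to do this for many URLs, the query text can be generated. Below is a rough sketch, assuming you only want the query string to paste into the manifest step; `build_manifest_query` and `UID_PATTERN` are hypothetical names, and the code does not contact BigQuery. It rejects anything that does not look like a dotted-numeric DICOM UID before placing it in the query.

```python
import re

# Table name taken from the queries in this thread.
TABLE = "bigquery-public-data.idc_current.dicom_all"
# Loose check for a dotted-numeric DICOM UID (assumption: digits and dots only).
UID_PATTERN = re.compile(r"[0-9]+(\.[0-9]+)*")


def build_manifest_query(column, uids):
    """Return the text of a query selecting gcs_url for the given UIDs.

    Hypothetical helper: `column` must be StudyInstanceUID or SeriesInstanceUID.
    """
    if column not in ("StudyInstanceUID", "SeriesInstanceUID"):
        raise ValueError(f"unsupported column: {column}")
    if not uids:
        raise ValueError("at least one UID is required")
    for uid in uids:
        if not UID_PATTERN.fullmatch(uid):
            raise ValueError(f"not a valid DICOM UID: {uid!r}")
    quoted = ", ".join(f'"{uid}"' for uid in uids)
    return f"SELECT gcs_url\nFROM `{TABLE}`\nWHERE {column} IN ({quoted})"


print(build_manifest_query(
    "SeriesInstanceUID",
    [
        "1.2.276.0.7230010.3.1.3.0.57823.1553343864.578877",
        "1.3.6.1.4.1.14519.5.2.1.6279.6001.273525289046256012743471155680",
    ],
))
```

Passing `"StudyInstanceUID"` with a single UID produces a query equivalent to the study-level one, written with `IN` rather than `=`.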

Given these queries, you should be able to proceed with the steps to download the files corresponding to those DICOM entities.

If you have any questions or suggestions about the download instructions, please do let us know. We are continuously refining that page to make it as simple as possible!

Looking forward to your response.
