`public-datasets-idc` Google Storage bucket decommissioning

IDC data is replicated between Google Cloud Storage (GCS) and AWS buckets, and is available for free egress from either location, with the storage bucket organization documented in this article.

Up until IDC release v19, most of the IDC data in GCS was stored in the bucket public-datasets-idc.

Due to internal reorganizations of IDC Google Cloud projects, starting with IDC release v20 we use the idc-open-data GCS bucket as a replacement for public-datasets-idc. For now, public-datasets-idc is still available, and any URLs pointing to files in that bucket will resolve. At the end of Q3 2025, the public-datasets-idc bucket will be decommissioned, and its content will no longer be available from that bucket.

How does this affect users of IDC?

If you refer to the files in the public-datasets-idc bucket (e.g., from a manifest), those links will not resolve after Q3 2025.

I only rely on AWS buckets for downloading IDC data - does this change affect me?

No. If your workflows and manifests rely on AWS as the source of data, there are no changes for you.

Does this mean the files stored in the decommissioned bucket will no longer be accessible?

No. All of the files are already replicated in the GCS idc-open-data bucket (as well as in the AWS idc-open-data bucket, as discussed here). The organization of the files in the new bucket is identical to that of the old one. You can access the content by replacing public-datasets-idc with idc-open-data in the URL.

For example, if you refer to a DICOM series in s3://public-datasets-idc/87040cd1-da73-4d21-ae31-5b5d329c9fce, the same content is available in s3://idc-open-data/87040cd1-da73-4d21-ae31-5b5d329c9fce.
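A minimal Python sketch of that substitution, assuming URLs of the form `<scheme>://public-datasets-idc/...` or `https://storage.googleapis.com/public-datasets-idc/...`. The helper name `migrate_url` is hypothetical and not part of any IDC tool.

```python
import re

OLD_BUCKET = "public-datasets-idc"
NEW_BUCKET = "idc-open-data"

# Match the old bucket name only where it is the bucket component of a URL,
# so that similarly named buckets or path segments are left untouched.
_BUCKET_RE = re.compile(
    r"^(?P<prefix>[a-z][a-z0-9]*://(?:storage\.googleapis\.com/)?)"
    + re.escape(OLD_BUCKET)
    + r"(?=/|$)"
)


def migrate_url(url: str) -> str:
    """Return url with the decommissioned bucket replaced by the new one."""
    return _BUCKET_RE.sub(lambda m: m.group("prefix") + NEW_BUCKET, url, count=1)


if __name__ == "__main__":
    print(migrate_url("s3://public-datasets-idc/87040cd1-da73-4d21-ae31-5b5d329c9fce"))
```

URLs that already point to idc-open-data, or to any other bucket, are returned unchanged.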

How can I make sure my workflows are not affected by this update?

If you are using IDC Portal or BigQuery tables from IDC release v20 or later, or if you rely on idc-index v0.8.0 or later, there should be no references to the GCS public-datasets-idc bucket, and you will not be affected.

Note that BigQuery tables corresponding to IDC releases v19 and earlier will continue to refer to URLs in the public-datasets-idc bucket in the gcs_url and gcs_bucket columns! If you use those tables, you will need to check those columns and replace public-datasets-idc with the idc-open-data bucket name!
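If you have already exported rows from one of those older tables, the fix can be applied on the client side. The sketch below assumes each row is a Python dict carrying the gcs_url and gcs_bucket columns named above. The function name `migrate_row` and the sample file name are illustrative only.

```python
OLD_BUCKET = "public-datasets-idc"
NEW_BUCKET = "idc-open-data"


def migrate_row(row: dict) -> dict:
    """Return a copy of row with both bucket columns pointing at the new bucket."""
    updated = dict(row)  # leave the caller's row untouched
    if updated.get("gcs_bucket") == OLD_BUCKET:
        updated["gcs_bucket"] = NEW_BUCKET
    url = updated.get("gcs_url")
    if isinstance(url, str):
        # Replace the bucket only where it follows the URL scheme.
        updated["gcs_url"] = url.replace(
            f"://{OLD_BUCKET}/", f"://{NEW_BUCKET}/", 1
        )
    return updated


if __name__ == "__main__":
    # Hypothetical row; the file name is made up for illustration.
    sample = {
        "gcs_bucket": "public-datasets-idc",
        "gcs_url": "gs://public-datasets-idc/87040cd1-da73-4d21-ae31-5b5d329c9fce/example.dcm",
    }
    print(migrate_row(sample))
```

Rows that do not reference the old bucket pass through unchanged, so it is safe to apply the function to a mix of old and new rows.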

What should I do to prepare?

If you use idc-index for data access, upgrade it to the latest version. If you rely on BigQuery, migrate your workflows to the idc_current (idc_v21) BigQuery dataset. If you have legacy GCS manifests, replace the bucket name where applicable, as discussed earlier, and confirm that you are able to access the data.
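For legacy manifests, the bucket rename can be scripted. The following is a sketch only: it assumes the manifest is a plain-text file in which each line contains a URL such as `s3://public-datasets-idc/<series>/*`, and the function name `migrate_manifest_text` is hypothetical. It also reports how many references were rewritten, so you can tell whether a given manifest was affected at all.

```python
from typing import Tuple

OLD_BUCKET = "public-datasets-idc"
NEW_BUCKET = "idc-open-data"


def migrate_manifest_text(text: str) -> Tuple[str, int]:
    """Rewrite old-bucket URLs in manifest text.

    Returns the updated text and the number of references replaced.
    """
    old = f"://{OLD_BUCKET}/"
    new = f"://{NEW_BUCKET}/"
    replaced = text.count(old)
    return text.replace(old, new), replaced


if __name__ == "__main__":
    # Assumed manifest layout, for illustration only.
    manifest = (
        "cp s3://public-datasets-idc/87040cd1-da73-4d21-ae31-5b5d329c9fce/* .\n"
    )
    updated, replaced = migrate_manifest_text(manifest)
    print(f"{replaced} reference(s) updated")
    print(updated, end="")
```

After rewriting a manifest, download at least one series with it to confirm the data is accessible.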

I don’t understand and I need more help!

Please ask your questions in a reply to this post, or send an email to support@canceridc.dev.