What is the difference between the three AWS buckets?

Chris_Hafey · July 27, 2023, 11:41am

Hello - I see that there are three different buckets on AWS:
arn:aws:s3:::idc-open-data (58TB)
arn:aws:s3:::idc-open-data-two (235GB)
arn:aws:s3:::idc-open-data-cr (haven’t checked)

What is the difference between these three buckets (other than size)? I apologize if this is already documented, but I couldn’t find it after about 15 minutes of reading and searching.

fedorov · July 27, 2023, 1:26pm

Hey Chris, welcome to the forum!

This is not covered in the documentation, so I am glad you did not spend more than 15 minutes searching for the answer!

idc-open-data contains most of the data: collections that are covered by a non-restrictive license (CC-BY or similar) and were not labeled as possibly containing head scans.
idc-open-data-two contains collections that may contain head scans. These are the collections that TCIA labeled this way; we keep them separate in case a policy change requires us to treat such images differently in the future.
idc-open-data-cr contains data that is covered by a license that restricts commercial use (CC-NC). Note that the license information is available at the granularity of the individual files in the IDC BigQuery tables, as explained in this tutorial, so you do not need to check the bucket name to get the license information.
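
For readers who want this summary in a script, here is a minimal Python sketch that maps a bucket ARN or `s3://` URL to the descriptions above. The names `BUCKET_NOTES`, `bucket_from_location`, and `describe` are made up for illustration and are not part of any IDC tooling, and the object key in the usage example is hypothetical. The per-file license in the BigQuery tables remains the authoritative source; this only restates the bucket-level summary.

```python
# Bucket-level summary as described in this thread (illustrative only,
# not an official IDC API; per-file license data lives in BigQuery).
BUCKET_NOTES = {
    "idc-open-data": "non-restrictive license (CC-BY or similar), not labeled as possibly containing head scans",
    "idc-open-data-two": "collections labeled by TCIA as possibly containing head scans",
    "idc-open-data-cr": "license restricts commercial use (CC-NC)",
}

ARN_PREFIX = "arn:aws:s3:::"
URL_PREFIX = "s3://"


def bucket_from_location(location: str) -> str:
    """Return the bucket name from an S3 ARN or an s3:// URL."""
    for prefix in (ARN_PREFIX, URL_PREFIX):
        if location.startswith(prefix):
            # Everything up to the first "/" after the prefix is the bucket.
            return location[len(prefix):].split("/", 1)[0]
    raise ValueError(f"not an S3 ARN or s3:// URL: {location!r}")


def describe(location: str) -> str:
    """Look up the thread's description for the bucket behind a location."""
    bucket = bucket_from_location(location)
    return BUCKET_NOTES.get(bucket, "bucket not described in this thread")


if __name__ == "__main__":
    print(describe("arn:aws:s3:::idc-open-data-two"))
    # Hypothetical object key, used only to show URL parsing.
    print(describe("s3://idc-open-data-cr/example-series/example-instance.dcm"))
```

Keeping the mapping in a plain dictionary makes it easy to update if the bucket layout changes.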

Chris_Hafey · July 27, 2023, 3:31pm

Thanks for the quick response. A few more questions:

Is there any duplication of data between the buckets?
Why are you calling out “may contain head scans”?

fedorov · July 27, 2023, 3:45pm

Is there any duplication of data between the buckets?

No, at least there should not be. If you see duplication, please let us know.

Why are you calling out “may contain head scans”?

Because that’s the designation that was assigned by TCIA.

fedorov · July 21, 2025, 8:15pm

IDC documentation now summarizes the information about the cloud buckets it maintains on this page: Files and metadata | IDC User Guide.
