What is the difference between the three AWS buckets?

Hello - I see that there are three different buckets on AWS:
arn:aws:s3:::idc-open-data (58TB)
arn:aws:s3:::idc-open-data-two (235GB)
arn:aws:s3:::idc-open-data-cr (haven’t checked)

What is the difference between these three buckets (other than size)? I apologize if this is already documented, but I couldn’t find it after about 15 minutes of reading and searching.

Hey Chris, welcome to the forum!

This is not covered in the documentation, so I am glad you did not spend more than 15 minutes searching for the answer!

  • idc-open-data contains most of the data: collections that are covered by a non-restrictive license (CC-BY or similar) and that were not labeled as possibly containing head scans
  • idc-open-data-two contains collections that may contain head scans. These are the collections that TCIA labeled as such; we keep them separate in case a change in policy requires us to treat such images in a special way in the future
  • idc-open-data-cr contains data that is covered by a license that restricts commercial use (CC-NC). Note that license information is available for each individual file in the IDC BigQuery tables, as explained in this tutorial, so you do not need to check the bucket name to get the license information
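
To keep the three descriptions above handy in a script, here is a minimal Python sketch. The `BUCKET_NOTES` table and the `bucket_from_arn` and `describe` helpers are hypothetical names made up for illustration, not part of any IDC or AWS library, and the notes only paraphrase this thread. As said above, the per-file license in the BigQuery tables remains the authoritative source.

```python
# Notes paraphrased from this thread; this is not an official IDC mapping.
BUCKET_NOTES = {
    "idc-open-data": "most data; non-restrictive license (CC-BY or similar); "
                     "not labeled as possibly containing head scans",
    "idc-open-data-two": "collections labeled by TCIA as possibly containing head scans",
    "idc-open-data-cr": "license restricts commercial use (CC-NC)",
}

S3_ARN_PREFIX = "arn:aws:s3:::"


def bucket_from_arn(arn: str) -> str:
    """Extract the bucket name from an S3 ARN such as arn:aws:s3:::idc-open-data."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Drop any object key that follows the bucket name.
    return arn[len(S3_ARN_PREFIX):].split("/", 1)[0]


def describe(arn: str) -> str:
    """Return the note for the bucket named in the ARN, or a fallback string."""
    return BUCKET_NOTES.get(bucket_from_arn(arn), "unknown bucket")


print(describe("arn:aws:s3:::idc-open-data-cr"))
```

Treat the result as a hint only: it tells you which bucket a file lives in, not its license.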

Thanks for the quick response. A few more questions:

  1. Is there any duplication of data between the buckets?
  2. Why are you calling out “may contain head scans”?
  1. Is there any duplication of data between the buckets?

No, at least there should not be. If you see duplication, please let us know.
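
If you want to check this yourself, one rough approach is to compare object keys across the buckets. The sketch below assumes you have already obtained a list of keys for each bucket by whatever means you prefer, and it assumes that an identical key in two buckets indicates duplicated data, which is my assumption and not something IDC guarantees. The `find_shared_keys` helper and the sample keys are made up for illustration.

```python
from collections import defaultdict


def find_shared_keys(listings):
    """Return {key: sorted bucket names} for keys present in more than one bucket."""
    seen = defaultdict(set)
    for bucket, keys in listings.items():
        for key in keys:
            seen[key].add(bucket)
    return {key: sorted(buckets) for key, buckets in seen.items() if len(buckets) > 1}


# Hypothetical key listings; real listings would come from the buckets themselves.
listings = {
    "idc-open-data": ["series-a/file1.dcm", "series-b/file1.dcm"],
    "idc-open-data-two": ["series-c/file1.dcm"],
    "idc-open-data-cr": ["series-b/file1.dcm"],
}

for key, buckets in find_shared_keys(listings).items():
    print(key, "appears in", ", ".join(buckets))
```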

  2. Why are you calling out “may contain head scans”?

Because that’s the designation that was assigned by TCIA.
