Hello, I see that there are three different buckets on AWS:
arn:aws:s3:::idc-open-data (58TB)
arn:aws:s3:::idc-open-data-two (235GB)
arn:aws:s3:::idc-open-data-cr (haven’t checked)
What is the difference between these three buckets (other than size)? I apologize if this is already documented, but I couldn't find it after about 15 minutes of reading and searching.
Hey Chris, welcome to the forum!
This is not covered in the documentation, so I am glad you did not spend more than 15 minutes searching for the answer!
idc-open-data
contains most of the data: collections covered by a non-restrictive license (CC-BY or similar) that were not labeled as possibly containing head scans
idc-open-data-two
contains collections that TCIA labeled as possibly containing head scans. We keep them separate in case there is a change in policy and we need to treat such images in a special way in the future
idc-open-data-cr
contains data covered by a license that restricts commercial use (CC-NC). Note that license information is available for each individual file in the IDC BigQuery tables, as explained in this tutorial, so you do not need to check the bucket name to get the license information
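If you want to see how licenses and buckets line up in practice, the per-file license information can be tallied against the bucket that holds each file. This is a minimal sketch under assumptions: it assumes the IDC view `bigquery-public-data.idc_current.dicom_all` exposes `aws_url` and `license_short_name` columns (check the current schema and the tutorial before relying on this), and `LICENSE_QUERY`, `bucket_from_s3_url`, and `licenses_per_bucket` are hypothetical helper names. Running the query is left out; the sketch only shows how returned rows could be summarized.

```python
from collections import Counter
from typing import Iterable, Mapping
from urllib.parse import urlparse

# Hypothetical query: the view and column names are assumptions about
# IDC's public BigQuery dataset; verify them against the current schema.
LICENSE_QUERY = """
SELECT aws_url, license_short_name
FROM `bigquery-public-data.idc_current.dicom_all`
LIMIT 1000
"""


def bucket_from_s3_url(url: str) -> str:
    """Return the bucket name from an s3://bucket/key URL."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return parsed.netloc


def licenses_per_bucket(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count (bucket, license) pairs.

    The license is read from each row itself, not inferred from the bucket.
    """
    return Counter(
        (bucket_from_s3_url(row["aws_url"]), row["license_short_name"])
        for row in rows
    )


if __name__ == "__main__":
    # Made-up rows for illustration only; real rows would come from
    # running LICENSE_QUERY (for example with the google-cloud-bigquery client).
    sample = [
        {"aws_url": "s3://idc-open-data/series-a/instance-1.dcm",
         "license_short_name": "example-license-A"},
        {"aws_url": "s3://idc-open-data-cr/series-b/instance-2.dcm",
         "license_short_name": "example-license-B"},
    ]
    for (bucket, license_name), n in licenses_per_bucket(sample).items():
        print(f"{bucket}\t{license_name}\t{n}")
```

Reading the license from each row, rather than from the bucket name, matches the advice in the reply above.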
Thanks for the quick response. A few more questions:
- Is there any duplication of data between the buckets?
- Why are you calling out “may contain head scans”?
- Is there any duplication of data between the buckets?
No, at least there should not be any. If you see duplication, please let us know.
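If you want to check for overlap yourself, one rough approach is to compare saved object listings of the buckets, for example the output of `aws s3 ls s3://idc-open-data-two/ --recursive --no-sign-request`. This sketch assumes that output layout (date, time, size, key) and that objects sit under one top-level folder per series; both are assumptions. Matching keys only detects same-path copies, not identical content stored under different names. `key_from_ls_line`, `read_listing`, and `shared_keys` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set


def key_from_ls_line(line: str) -> str:
    """Extract the object key from one line of `aws s3 ls --recursive` output.

    Assumes the layout: date, time, size, key. Only the first three fields
    are split off, so keys containing spaces stay intact.
    """
    parts = line.rstrip("\r\n").split(maxsplit=3)
    if len(parts) != 4:
        raise ValueError(f"unexpected listing line: {line!r}")
    return parts[3]


def read_listing(path: str) -> Iterator[str]:
    """Lazily yield object keys from a saved listing file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield key_from_ls_line(line)


def shared_keys(
    listings: Dict[str, Iterable[str]], prefix_only: bool = False
) -> Dict[str, Set[str]]:
    """Map each key (or top-level prefix) seen in 2+ buckets to those buckets."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for bucket, keys in listings.items():
        for key in keys:
            # Optionally reduce the key to its first path component,
            # assumed here to be the series-level folder.
            item = key.split("/", 1)[0] if prefix_only else key
            seen[item].add(bucket)
    return {item: buckets for item, buckets in seen.items() if len(buckets) > 1}
```

For example, `shared_keys({"idc-open-data": read_listing("open.txt"), "idc-open-data-two": read_listing("two.txt")}, prefix_only=True)`, where the file names are hypothetical. With `prefix_only=True` only the top-level prefixes are stored, which keeps memory use far lower than holding every object key of a 58TB bucket in a set.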
- Why are you calling out “may contain head scans”?
Because that’s the designation that was assigned by TCIA.