As we announced a while ago, TCIA decided to move a subset of data from public access collections to limited access. At the moment, IDC still keeps the files that used to be public, and the metadata for those files is still accessible in our BigQuery tables, but you cannot download the “Limited” access files referenced by gcs_url from IDC.
As discussed in this post, the issue manifests as an error when accessing a gcs_url that corresponds to a non-public file:
AccessDeniedException: 403 <user email> does not have storage.objects.list access to the Google Cloud Storage bucket.
The bigquery-public-data.idc_current.dicom_all table has a column named access, whose values (such as Limited) indicate whether the file corresponding to the instance can be accessed. For all practical purposes, if you interact with the IDC BigQuery tables, make sure you exclude “Limited” access items using the following clause in your query:
SELECT ... FROM `bigquery-public-data.idc_current.dicom_all` WHERE access <> "Limited"
In the upcoming release of IDC, the portal will exclude limited access items from your selection by default, so the portal selection should be more intuitive. But if you access the data via BigQuery queries, you need to know that “Limited” items are not accessible and account for this in your query.
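If you build your queries from Python, the access filter above can be baked into a small helper so it is never forgotten. The following is a minimal sketch, not an official IDC utility: the function names build_public_query and fetch_public_rows are hypothetical, SOPInstanceUID is used only as an example column, and fetch_public_rows assumes the google-cloud-bigquery package is installed and that default Google Cloud credentials and a billing project are configured.

```python
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_public_query(columns, table=IDC_TABLE):
    """Return a query that selects `columns` and skips "Limited" access rows.

    Hypothetical helper for illustration; the WHERE clause is the one
    recommended in the post above.
    """
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list} "
        f"FROM `{table}` "
        'WHERE access <> "Limited"'
    )


def fetch_public_rows(columns, limit=10):
    """Run the filtered query and return the rows as a list of dicts.

    Assumes google-cloud-bigquery is installed and default credentials
    are available; imported lazily so the query builder works without it.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    sql = build_public_query(columns) + f" LIMIT {int(limit)}"
    return [dict(row) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Only prints the query text; call fetch_public_rows() to execute it.
    print(build_public_query(["SOPInstanceUID", "gcs_url"]))
```

Every gcs_url returned this way should point to a file that is not marked “Limited”, which avoids the 403 error shown above when you go on to download the files.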