Question: IDC API Vs Big Query

nss10 · December 4, 2022, 4:30am

Hi,
I’m a developer building the Biomedical Research Hub (BRH) Data Commons. I am looking for a way to fetch data about IDC studies so that the information shown in BRH always stays up to date with the data available in IDC. I have a few questions about that.

I see this endpoint, https://api.imaging.datacommons.cancer.gov/v1/collections, which has the information I need, but I wanted to confirm that the API is still being maintained.
I also see a lot of discussions about BigQuery usage, so I wanted to ask which is the best way to get the info I need: the API or BigQuery?

To fetch the info I need, I’ve been told to use this query in the BigQuery console:

Query

SELECT *
   FROM canceridc-data.idc.data_collections_metadata
   LIMIT 1000

But this query yields only 25 results, even though the API returns 128. I wasn’t sure if I was using the right query.

The subject value in the BigQuery result set is different from the subject_count value in the API. Are these different fields? If not, which one is stale?

Your help would be sincerely appreciated.

Thank you

fedorov · December 4, 2022, 6:10pm

The API should be functional. If you see problems, we will need to look into this. @bill.clifford should be able to help with this.

BigQuery can be used to access all of the metadata, using standard SQL. The API exposes only a small subset of the metadata, mostly what is available in the IDC portal. I personally use BigQuery for all my data querying needs.

Can you please let me know who told you to use that query? That query is using the wrong table, and if it is mentioned anywhere in public documentation or examples we should fix this.

You can take a look at this notebook to get started with IDC BigQuery tables: https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/getting_started/part2_searching_basics.ipynb. If you let me know what exactly you want to query, I am happy to help you put together the queries and confirm you are searching the right tables.

wlongabaugh · December 5, 2022, 9:18pm

The API at https://api.imaging.datacommons.cancer.gov/v1/collections is functional and maintained. It is primarily intended to allow programmatic interaction for cohort creation, as opposed to interactive use of the portal. But for the definitive read-only source of what is available in IDC, using the BigQuery tables now hosted in the Google Public Data program is probably the way to go. As Andrey says, the BQ dataset and table you are asking about are out-of-date.
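
For readers who want to check the two sources against each other, here is a minimal Python sketch of the comparison discussed above. It is not taken from the thread: the table name `bigquery-public-data.idc_current.dicom_all` and the columns `collection_id` and `PatientID` are assumptions about the public dataset, and `run_query` assumes the `google-cloud-bigquery` package and working Google Cloud credentials. The helper names are hypothetical.

```python
# ASSUMPTION: this table and its column names are not stated in the thread;
# verify them against the current IDC documentation before relying on them.
ASSUMED_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_subject_count_query(table: str = ASSUMED_TABLE) -> str:
    """Build standard SQL that counts distinct patients per collection."""
    return (
        "SELECT collection_id, COUNT(DISTINCT PatientID) AS subject_count\n"
        f"FROM `{table}`\n"
        "GROUP BY collection_id\n"
        "ORDER BY collection_id"
    )


def run_query(sql: str) -> dict:
    """Run the query and return {collection_id: subject_count}.

    Requires google-cloud-bigquery and default credentials, so the import
    is kept local and nothing here runs at module import time.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return {row["collection_id"]: row["subject_count"]
            for row in client.query(sql).result()}


def diff_subject_counts(api_counts: dict, bq_counts: dict) -> dict:
    """Return {collection_id: (api_value, bq_value)} wherever the two differ.

    A collection missing on one side is reported with None for that side.
    """
    ids = set(api_counts) | set(bq_counts)
    return {
        cid: (api_counts.get(cid), bq_counts.get(cid))
        for cid in sorted(ids)
        if api_counts.get(cid) != bq_counts.get(cid)
    }
```

The `api_counts` dictionary would be filled from the `subject_count` values returned by the `/v1/collections` endpoint; the exact shape of that JSON response is not shown in this thread, so parsing it is left out here. Any non-empty result from `diff_subject_counts` points at collections where one source may be stale.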
