Question: IDC API vs. BigQuery

Hi,
I’m a developer building the Biomedical Research Hub (BRH) Data Commons. I am looking for a way to fetch data on IDC studies so that the information shown in BRH always stays up to date with the data available in IDC. I have a few questions about that.

  1. I see this endpoint, https://api.imaging.datacommons.cancer.gov/v1/collections, which has the information I need, but I wanted to confirm that the API is still being maintained.

  2. I also see a lot of discussion about BigQuery usage, and I wanted to know the best way to get the information I need: the API or BigQuery?

To fetch the information I need, I’ve been told to use this query in the BigQuery console:

Query

SELECT *
   FROM canceridc-data.idc.data_collections_metadata
   LIMIT 1000

But this query yields only 25 results, whereas the API returns 128. I wasn’t sure whether I was using the right query.

  3. The subject value in the BigQuery result set is different from the subject_count value in the API. Are the fields different? If not, which one is stale?

Your help would be sincerely appreciated.

Thank you

The API should be functional. If you see problems, we will need to look into this. @bill.clifford should be able to help with this.

BigQuery can be used to access all of the metadata using standard SQL. The API exposes only a small part of the metadata, mostly what is available in the IDC portal. I personally use BigQuery for all my data querying needs.

Can you please let me know who told you to use that query? That query is using the wrong table, and if it is mentioned anywhere in public documentation or examples, we should fix this.

You can take a look at this notebook to get started with IDC BigQuery tables: https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/getting_started/part2_searching_basics.ipynb. If you let me know what exactly you want to query, I am happy to help you put together the queries and confirm you are searching the right tables.
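As a rough sketch of the BigQuery route, the Python below builds a per-collection subject count and runs it with the google-cloud-bigquery client. The table name `bigquery-public-data.idc_current.dicom_all` and the columns `collection_id` and `PatientID` are assumptions that are not confirmed anywhere in this thread, so please check them against the notebook linked above before relying on the numbers. The helper names are made up for illustration.

```python
# Assumption: this table and its columns are NOT confirmed in the thread.
ASSUMED_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_subject_count_query(table: str = ASSUMED_TABLE) -> str:
    """Return standard SQL that counts distinct patients per collection."""
    if "`" in table:
        # The table name is wrapped in backticks below, so reject embedded ones.
        raise ValueError("table name must not contain backticks")
    return (
        "SELECT collection_id, COUNT(DISTINCT PatientID) AS subject_count\n"
        f"FROM `{table}`\n"
        "GROUP BY collection_id\n"
        "ORDER BY collection_id"
    )


def run_query(sql: str) -> list:
    """Run the query and return the rows as plain dicts.

    Needs `pip install google-cloud-bigquery` and a Google Cloud project
    with credentials configured (application default credentials).
    """
    from google.cloud import bigquery  # imported here so the builder works without it

    client = bigquery.Client()
    return [dict(row) for row in client.query(sql).result()]


if __name__ == "__main__":
    for row in run_query(build_subject_count_query()):
        print(row["collection_id"], row["subject_count"])
```

Passing the table as a parameter keeps the query easy to repoint if the dataset or table name turns out to be different.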

The API at https://api.imaging.datacommons.cancer.gov/v1/collections is functional and maintained. It is primarily intended to allow programmatic interaction for cohort creation, as opposed to interactive use of the portal. But for the definitive read-only source of what is available in IDC, using the BigQuery tables now hosted in the Google Public Data program is probably the way to go. As Andrey says, the BQ dataset and table you are asking about are out-of-date.
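For comparison, here is a minimal sketch of reading the same endpoint from Python with only the standard library. The response shape is an assumption: the code accepts either a bare JSON list or an object with a `collections` key, and it assumes each entry carries a `collection_id` alongside the `subject_count` field mentioned in the question. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

COLLECTIONS_URL = "https://api.imaging.datacommons.cancer.gov/v1/collections"


def extract_collections(payload):
    """Pull the list of collection records out of a decoded JSON payload.

    Assumption: the payload is either a list of records or a dict that
    holds the list under a "collections" key.
    """
    if isinstance(payload, dict):
        return list(payload.get("collections", []))
    return list(payload)


def subject_counts(collections):
    """Map each collection's id to its subject_count (None if absent).

    Assumption: every record has a "collection_id" field.
    """
    return {c["collection_id"]: c.get("subject_count") for c in collections}


def fetch_collections(url=COLLECTIONS_URL, timeout=30):
    """Download and decode the collections listing from the API."""
    with urlopen(url, timeout=timeout) as resp:
        return extract_collections(json.load(resp))


if __name__ == "__main__":
    counts = subject_counts(fetch_collections())
    print(len(counts), "collections returned by the API")
```

The resulting dictionary can be compared key by key with BigQuery results to see which collections or subject counts disagree.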