Hi,
I am writing an application that will use API requests to NCI CRDC resources to access data file objects for particular TCGA patients, identified by their patient barcodes.
I have used the IDC exploration tool/GUI to confirm that a patient of interest exists in your collections, and there are open access files I would like to download. However, I cannot figure out how to use simple API calls from my terminal to download all open access files for this patient to my local computer.
Ideally, I would like to do the following entirely from the command line, without relying on a web browser or GUI:
- send a query to the IDC API containing a patient barcode, e.g., “TCGA-BH-A18U”, and receive a list of all files by their GUID (or DRS ID) in IDC for that patient.
- send a download request (ideally using DRS IDs) to download the open access files.
I’ve taken a look at the Swagger docs for the IDC API, but it’s unclear whether this is currently possible.
Best,
Chris
send a query to the IDC API containing a patient barcode, e.g., “TCGA-BH-A18U”, and receive a list of all files by their GUID (or DRS ID) in IDC for that patient.
To do this, you would first use the BigQuery API to query the IDC dicom_all table:
SELECT
gcs_url
FROM
`bigquery-public-data.idc_current.dicom_all`
WHERE
PatientID = "TCGA-BH-A18U"
If you want to run this query from the shell, you could use the bq command-line tool that ships with the Google Cloud SDK. This will return the GCS URLs of all files that correspond to the specified PatientID.
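If it helps, here is a minimal Python sketch of wrapping that query in a bq call so it can be scripted. It is only an illustration under a few assumptions: the bq tool is installed and authenticated, you have a Google Cloud project of your own to bill the query to, and the helper names (build_query, build_bq_command) and the max_rows value are made up for this example. Note that bq prints only 100 rows by default, so the row limit is raised explicitly.

```python
import re
import subprocess
import sys

# Hypothetical safety check: only allow characters seen in barcodes such as
# "TCGA-BH-A18U", so the value can be embedded in the SQL string safely.
PATIENT_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def build_query(patient_id):
    """Return the SQL from the post above for one PatientID."""
    if not PATIENT_ID_PATTERN.fullmatch(patient_id):
        raise ValueError(f"unexpected characters in patient ID: {patient_id!r}")
    return (
        "SELECT gcs_url "
        "FROM `bigquery-public-data.idc_current.dicom_all` "
        f'WHERE PatientID = "{patient_id}"'
    )


def build_bq_command(patient_id, billing_project, max_rows=100000):
    """Return the argv list for a bq invocation that prints CSV to stdout."""
    return [
        "bq",
        f"--project_id={billing_project}",  # assumption: your own project
        "--format=csv",
        "query",
        "--use_legacy_sql=false",
        f"--max_rows={max_rows}",  # bq shows only 100 rows unless raised
        build_query(patient_id),
    ]


if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python list_urls.py TCGA-BH-A18U my-billing-project > urls.csv
    command = build_bq_command(sys.argv[1], sys.argv[2])
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    print(result.stdout, end="")
```

The output is a one-column CSV with a gcs_url header followed by one gs:// URL per line.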
send a download request (ideally using DRS IDs) to download the open access files.
Given the list from the above, you could follow the instructions here to download the corresponding files: https://learn.canceridc.dev/data/downloading-data.
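As a rough illustration of that step, the sketch below reads the CSV produced by the query and hands the URLs to gsutil. It assumes gsutil is installed and that the files are readable with your credentials; the function names are invented for this example, and the linked instructions remain the authoritative reference and may recommend a different tool. It relies on the -I option of gsutil cp, which reads the list of source URLs from standard input.

```python
import subprocess
import sys
from pathlib import Path


def parse_gcs_urls(csv_text):
    """Extract gs:// URLs from one-column CSV text, skipping the header."""
    urls = []
    for line in csv_text.splitlines():
        candidate = line.strip().strip('"')
        if candidate.startswith("gs://"):
            urls.append(candidate)
    return urls


def download_with_gsutil(urls, dest_dir):
    """Copy every URL into dest_dir using a single gsutil invocation."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    # "-m" runs copies in parallel; "-I" reads source URLs from stdin.
    subprocess.run(
        ["gsutil", "-m", "cp", "-I", str(dest_dir)],
        input="\n".join(urls) + "\n",
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python download_files.py urls.csv ./downloads
    gcs_urls = parse_gcs_urls(Path(sys.argv[1]).read_text())
    download_with_gsutil(gcs_urls, sys.argv[2])
```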
We do not have DRS IDs for the current version of IDC data because, for reasons that are not clear to us, indexing IDC data in Gen3 takes a very long time. If you want details on this, @bill.clifford can provide a more detailed explanation.
Please let us know if this does not answer your question.