Tutorial notebook preparation

Hi, I’m in the process of exploring the IDC portal to prepare a tutorial notebook for the platform.

For now, to explore the use of BigQuery and play around with the IDC, I’ve copied one of the example Colab notebooks to my personal Google Drive (using the same address I used to register on the IDC).
I have some questions about the specifics of the tutorial creation:

  1. I see we have access to a GCP project; I wondered if it’s up to us to pick what type of VM to use or if there are guidelines to follow. Related to this point, I was thinking of using the Notebook API to play around, and then once ready, migrate the code to a Colab notebook to share on the IDC GitHub page; would this be ok?
  2. Should/Can the tutorial notebook that I’ll prepare rely on a bucket hosted in the IDC project? (For example, if I decide to convert the images to NIfTI and store them.)
  3. Should I see the $300 credits somewhere on the GCP project page, or will they appear once I start being charged?

Thanks!

@giemmecci happy to see you exploring IDC, and thanks for your very valid questions!

Yes, this would be ok. Keep in mind, though, that different VM configurations have different costs, so you should monitor your project budget and plan accordingly!

You could, but I don’t recommend it. If you do, it may be harder to ensure that your notebook is broadly applicable and persistent. I would strongly recommend that your workflow operates directly on the data maintained by IDC, and that any conversions you need are done within your notebook! It is our responsibility to ensure provenance, appropriate usage, versioning, etc. for the data maintained by IDC. You probably do not want to be on the hook to do this for the data that will result from your conversions! :wink:

Depending on your permissions within the project, you may or may not have access to the Billing section. You, or someone within your project, should be able to monitor the usage. You are also strongly encouraged to set budget alerts, as described in the Google Cloud Billing article “Set budgets and budget alerts”.

Noting that you are one of the users taking advantage of the IDC-provided cloud credits, keep in mind that procedures related to administration of the IDC cloud credits are outlined in this document (linked from IDC documentation on this page).

I can’t see the “Budgets & alerts” sub-menu under “Billing”. I guess this means that I don’t have permission to set up budget alerts in my current project? If so, given that it is an IDC-managed project, would it be possible either to be given the permissions or to have the alerts set up for us?

Thanks!

It may be that your role in the project does not allow you to set alerts, and I definitely see why you would want to have them. I may have misspoken, and I do not have the permissions on the project to check yours.

@wlongabaugh can you comment on the previous post from @giemmecci? What is our guidance to a user who wants to set a budget alert?

Unfortunately, the single billing account shared by all IDC-hosted projects is locked down, and Google’s system is designed to alert the account owners rather than the project members. I am not aware of a way for project members to set up alerts, and the way we have it set up does not actually involve “credits” that you spend down. Instead, we will let you know when your spending approaches the set budget ($600 for your multi-person project). The best way to keep an eye on costs is to have the optional billing tile displayed on the project dashboard (see step 8 of the document Andrey mentions above). It provides lots of detail on how the money is spent. When I just checked, the only expenses so far for idc-external-005 were for BigQuery and logging, and since they are below Google’s free monthly allocation, the actual charges are $0 at this point.

Thanks for the feedback, and you have a good point that setting up some infrastructure to support email alerts to project members makes sense. We will add it to the list of requested features.

Bill

Hi, I’ll be working together with @Yashbir143 on one of the tutorial notebooks, and the idea is to show how to link IDC and GDC data for a given dataset (in our case, TCGA-GBM).

My understanding is that in the future we’ll be able to do this directly from the IDC website, but right now the matching needs to be performed “manually”. Is that correct? And if so, would a notebook showing how to link imaging and genomic information be useful for a tutorial?

Thanks!

Great idea!

No, matching of data across repositories in CRDC will be done by a different component, the Cancer Data Aggregator (CDA) (some details are available here: https://datascience.cancer.gov/news-events/blog/cancer-data-aggregator-engine-could-drive-data-aggregation-whole-new-way). Yes, I think your perspective on how to link this information would be very helpful. If you are interested, we can discuss CDA in more detail with you later.
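
To make the “manual” matching discussed above concrete, here is a minimal sketch of one way it might look. It assumes that, for TCGA collections, the DICOM `PatientID` on the IDC side carries the same value as the case `submitter_id` on the GDC side; that is worth verifying for TCGA-GBM before relying on it. The function name, the `file_ids` field, and all sample values are hypothetical and only stand in for rows you would retrieve from IDC (for example via BigQuery) and from GDC.

```python
from collections import defaultdict


def link_by_patient_id(idc_rows, gdc_cases):
    """Group IDC series and GDC files under a shared patient identifier.

    Assumption (not confirmed in this thread): IDC `PatientID` and
    GDC `submitter_id` hold the same identifier for a given patient.
    """
    linked = defaultdict(lambda: {"idc_series": [], "gdc_files": []})

    # One IDC row per imaging series.
    for row in idc_rows:
        linked[row["PatientID"]]["idc_series"].append(row["SeriesInstanceUID"])

    # One GDC record per case, with a (hypothetical) list of file IDs.
    for case in gdc_cases:
        linked[case["submitter_id"]]["gdc_files"].extend(case["file_ids"])

    # Keep only patients that have both imaging and genomic data.
    return {
        pid: rec
        for pid, rec in linked.items()
        if rec["idc_series"] and rec["gdc_files"]
    }


# Made-up sample records, for illustration only.
idc_rows = [
    {"PatientID": "TCGA-00-0001", "SeriesInstanceUID": "1.2.3.1"},
    {"PatientID": "TCGA-00-0001", "SeriesInstanceUID": "1.2.3.2"},
    {"PatientID": "TCGA-00-0002", "SeriesInstanceUID": "1.2.3.3"},
]
gdc_cases = [
    {"submitter_id": "TCGA-00-0001", "file_ids": ["file-a", "file-b"]},
    {"submitter_id": "TCGA-00-0003", "file_ids": ["file-c"]},
]

linked = link_by_patient_id(idc_rows, gdc_cases)
for pid, rec in linked.items():
    print(pid, len(rec["idc_series"]), "series,", len(rec["gdc_files"]), "files")
```

Only patients present in both inputs are returned, so a tutorial could also report the unmatched ones on each side to show how complete the overlap is.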
