IDC Production Release

Thanks to the hard work of a superb team, IDC is now in production release!


Thanks to @farahank for announcing the production release!

The main highlights of the production release:

  • the amount of available data increased from ~1TB in the initial pilot release to >16TB
  • support for digital pathology was added
  • versioning was introduced to support reproducible science
  • example use cases were released to the community
  • an API was added for programmatic access to cohorts

Details on the major milestones and improvements accomplished by @IDC_team in the less than 12 months since the IDC pilot was first introduced:

We need input from YOU to guide our development!

Please give IDC a try: we have free cloud credits to help you get started. We welcome you to join our community and help us build this resource to benefit cancer research.

These look like great additions to IDC. As someone working on CDA, I have one quick question about data versioning. We currently have a script that queries the BigQuery tables directly. Am I correct in understanding that each data version will be stored in a different dataset (idc_v2, idc_v3, idc_v4, etc.)?

Thank you @Donovan_Ruth!

Yes, this is correct. In addition, we maintain an idc_current view that can be used as an alias for the latest version.
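
For a script like the one mentioned above, here is a minimal sketch of how a query could be pinned to a specific version or left pointing at the latest one. It only builds the table reference and the SQL string. The project ID and table name are placeholders I made up for illustration, and it assumes idc_current can be used in the same position as a versioned dataset name such as idc_v2; please check both against the actual BigQuery layout before relying on it.

```python
from typing import Optional

# Placeholders for illustration only: neither name comes from this thread.
PROJECT = "your-project-id"
TABLE = "your_table"


def idc_dataset(version: Optional[int] = None) -> str:
    """Return the dataset name for a pinned version, or the alias for the latest."""
    if version is None:
        # Assumption: idc_current stands in for the newest idc_vN dataset.
        return "idc_current"
    if version < 1:
        raise ValueError("version must be a positive integer")
    return f"idc_v{version}"


def table_ref(version: Optional[int] = None,
              project: str = PROJECT,
              table: str = TABLE) -> str:
    """Build a backtick-quoted, fully qualified BigQuery table reference."""
    return f"`{project}.{idc_dataset(version)}.{table}`"


def count_query(version: Optional[int] = None) -> str:
    """Example query: count the rows in the chosen version of the table."""
    return f"SELECT COUNT(*) AS n FROM {table_ref(version)}"


if __name__ == "__main__":
    print(count_query(3))   # pinned to idc_v3, for reproducible results
    print(count_query())    # follows the latest version via idc_current
```

Pinning the version number keeps results reproducible across releases, while omitting it lets the script track whatever the latest release is.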