IDC Production Release

fedorov · September 29, 2021, 6:27pm

Thanks to @farahank for announcing the production release!

The main highlights of the production release:

the amount of data available increased from ~1TB in the initial pilot release to >16TB
support for digital pathology added
introduction of versioning to support reproducible science
examples of use cases released to the community
API for programmatic access to cohorts

Details on the major milestones and improvements that were accomplished by @IDC_team in less than 12 months since the initial introduction of the IDC pilot:

the number of collections available in IDC increased from 27 (~1TB) to 113 (>16TB of data)
- the National Lung Screening Trial (NLST) collection (>26K patients, >73K studies, >11TB in size) is now available in IDC!
we added support for DICOM digital pathology
- the digital pathology components of two CPTAC collections, CPTAC-LSCC and CPTAC-LUAD, were converted into the DICOM/TIFF dual-personality format and are included in the release
- the open source SliM viewer is now integrated with IDC to support visualization of the DICOM Slide Microscopy modality
we added support for IDC data versioning: you will always be able to access the precise set of files you used in your analysis, even if the collections containing those files have since been updated. Files are identified by DICOM SOPInstanceUIDs, which are unique and resolvable within IDC, or by CRDC Globally Unique Identifiers (GUIDs)
several analysis use cases have been developed and are now available as Colab notebooks demonstrating how IDC data can be analyzed in the cloud
- DeepPrognosis use case - replication study, 2-year survival score of NSCLC patients
- Lung Nodules segmentation and prognosis use case - nodule segmentation (nnU-Net) and prognosis (DeepPrognosis) for NSCLC patients
- Thoracic Organs at Risk segmentation use case - thoracic OAR segmentation (nnU-Net) for NSCLC patients
- Tissue classification in slide microscopy images - this tutorial builds on the publication “Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning” (Coudray et al., 2018), one of the most cited pathomics publications in recent years
the IDC API is now live, enabling programmatic access to the functionality available through the IDC portal, including authenticated operations with IDC cohorts. The IDC API complements the Google BigQuery and Cloud Storage APIs, which can be used to query IDC metadata tables and retrieve files hosted in IDC
numerous features and bug fixes were implemented in the IDC Portal; most prominently, cohort support was enhanced by integrating it with IDC data versioning: no matter what version of IDC data you used to form your cohort, you will always be able to export the manifest or apply the cohort filter to the current IDC data version
we added various examples demonstrating how Google Cloud tools can be used to enable exploration and analysis of IDC data, and reproducibility in AI research. You can learn how to:
- set up a GCP Compute Engine VM with the desktop interface to 3D Slicer
- use Google DataStudio to build a highly customizable dashboard to explore metadata related to your cohort beyond what can be done with IDC portal, see live dashboard here
- use BigQuery and SQL to get quick access to all of the DICOM metadata extracted from the 36+ million (and counting) DICOM instances available in IDC
IDC implemented security controls and gained Authority to Operate at the Federal Information Security Modernization Act (FISMA) Low level
our launch of the IDC pilot cloud credit program was successful, with a growing number of community members onboarded and using their IDC credit allocation for research (you can see some highlights of this work presented by IDC users in the recording of the “Infrastructure and Standards” session at the SIIM Conference on Machine Intelligence in Medical Imaging 2021)
we gave numerous presentations and tutorials at venues such as MICCAI, RSNA, ASTRO, AAPM, SIIM CMIMI, and the NCI Imaging Community webinar
we published an open access manuscript with the overview and vision for IDC's role in the community, accompanied by demonstration videos highlighting some of the key functions of the system
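
To give a flavor of the BigQuery and SQL access mentioned above, here is a minimal sketch that builds a per-collection summary query. It is an illustration under assumptions, not an official IDC example: the table name `bigquery-public-data.idc_current.dicom_all`, the `collection_id` column, and both helper function names are assumptions made for this sketch, so check the IDC documentation for the table and columns that apply to the data version you use. Running the query also assumes you have the `google-cloud-bigquery` package installed and a GCP project of your own.

```python
# Assumed table name; verify against the IDC documentation for your data version.
DEFAULT_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_collection_summary_query(table=DEFAULT_TABLE, modality=None):
    """Build SQL that counts patients, studies, and instances per collection.

    `collection_id` is an assumed column name; PatientID, StudyInstanceUID,
    SOPInstanceUID, and Modality are standard DICOM attributes.
    """
    # Accept only plain alphanumeric modality codes (e.g. CT, MR, SM) so the
    # value can be embedded in the query text safely.
    if modality is not None and not modality.isalnum():
        raise ValueError(f"unexpected modality code: {modality!r}")

    lines = [
        "SELECT",
        "  collection_id,",
        "  COUNT(DISTINCT PatientID) AS patients,",
        "  COUNT(DISTINCT StudyInstanceUID) AS studies,",
        "  COUNT(DISTINCT SOPInstanceUID) AS instances",
        f"FROM `{table}`",
    ]
    if modality is not None:
        lines.append(f"WHERE Modality = '{modality}'")
    lines += ["GROUP BY collection_id", "ORDER BY instances DESC"]
    return "\n".join(lines)


def run_query(sql, project_id):
    """Run the query in your own GCP project and return rows as dicts."""
    # Imported here so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Print the query for slide microscopy (SM) data; paste it into the
    # BigQuery console or pass it to run_query() with your project ID.
    print(build_collection_summary_query(modality="SM"))
```

Keeping the query builder separate from the client call means you can inspect or edit the SQL before spending any query quota.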

We need input from YOU to guide our development!

Please give IDC a try: we have free cloud credits to help you get started. We welcome you to join our community and help us build this resource to benefit cancer research.
