Integrating QC tools to work with IDC

satishev · October 25, 2020, 4:27pm

I was not sure where else to post this, and figured this would be a good place to start the conversation. As some of you may be aware, we have been developing quality control tools for imaging and pathology data:

GitHub - ccipd/MRQy: MRQy is a quality assurance and checking tool for quantitative assessment of magnetic resonance imaging (MRI) data.
GitHub - choosehappy/HistoQC: HistoQC is an open-source quality control tool for digital pathology slides

I saw a recent post about image variability within a sequence (clearly a use case for QC), and I suspect it would be super useful to folks to be able to run QC tools on the data they work with from IDC. I wanted to start a discussion on this. This may be something the CCDH (@melissa?) is working on, but I wanted to reach out about any collaborative opportunities in this space.

pieper · October 25, 2020, 4:46pm

Sounds very good to me. Setting up MRQy against IDC is probably feasible now.

fedorov · October 25, 2020, 5:08pm

@satishev I do not think what you are suggesting is something that CCDH is working on, or something that is in their scope of interests.

But overall, I agree it would be interesting to apply those tools to the data in IDC. Probably the easiest approach would be to make a Colab notebook that queries the metadata in the IDC BigQuery tables and fetches individual DICOM instances from the storage buckets. I think it should be straightforward for the most part; the main chunk of work will be adapting the ingestion part of your pipeline. Let us know if you want to work on this and have questions!
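A minimal Python sketch of the approach described above (query the metadata in BigQuery, then download the matching DICOM files from the buckets). It assumes the public IDC metadata table `bigquery-public-data.idc_current.dicom_all` with `Modality`, `SeriesInstanceUID` and `gcs_url` columns, plus the `google-cloud-bigquery` (with pandas) and `google-cloud-storage` client libraries. Table and column names may differ between IDC releases, and the helper names below are hypothetical, not an official IDC API.

```python
import os
from typing import List, Tuple
from urllib.parse import urlparse

# Assumption: public IDC metadata table; the dataset name may differ by IDC release.
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_mr_query(table: str = IDC_TABLE, limit: int = 10) -> str:
    """Build SQL selecting storage URLs of MR instances (column names assumed)."""
    return (
        "SELECT SeriesInstanceUID, gcs_url\n"
        f"FROM `{table}`\n"
        "WHERE Modality = 'MR'\n"
        f"LIMIT {int(limit)}"
    )


def split_gcs_url(gcs_url: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/file.dcm' into ('bucket', 'path/to/file.dcm')."""
    parsed = urlparse(gcs_url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {gcs_url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_mr_instances(project: str, dest_dir: str, limit: int = 10) -> List[str]:
    """Hypothetical helper: query IDC metadata, then download matching DICOM files.

    Requires Google Cloud credentials and a project to bill the queries to.
    """
    # Imported here so the pure helpers above work without the GCP client libraries.
    from google.cloud import bigquery, storage

    bq = bigquery.Client(project=project)
    gcs = storage.Client(project=project)
    df = bq.query(build_mr_query(limit=limit)).to_dataframe()

    local_paths = []
    for url in df["gcs_url"]:
        bucket_name, blob_name = split_gcs_url(url)
        # Mirror the object path locally so the bucket's folder layout is preserved.
        local_path = os.path.join(dest_dir, blob_name)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        gcs.bucket(bucket_name).blob(blob_name).download_to_filename(local_path)
        local_paths.append(local_path)
    return local_paths
```

The downloaded folder could then serve as the input directory for a QC tool such as MRQy; how that tool expects files to be grouped (per subject, per series) would still need to be checked against its own documentation, which is the ingestion work mentioned above.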

pihltd · October 26, 2020, 11:40am

@satishev, I just wanted to confirm what @fedorov said about CCDH: image quality is outside their scope. CCDH is primarily focused on integrating the various data models used within CRDC and mapping across the various semantic ontologies. IDC will benefit in that it will become easier to search across CRDC and to integrate images with data from the other CRDC repositories.

satishev · October 26, 2020, 5:10pm

I think I understand. Are there any specific notebook examples or resources we should look at to start working on this?

Making this as useful and usable as possible from the outset would help ensure that folks actually use the tools, especially given that IDC will have multiple types of data available.

fedorov · October 26, 2020, 6:41pm

Yes! I think these are the relevant pointers:

Organization of data - IDC User Guide: how the DICOM files and metadata tables are organized
Colab Notebooks - IDC User Guide: sample Colab notebooks

Please let us know when you have questions!
