I was not sure where else to post this, and figured this would be a good place to start the conversation. As some of you may be aware, we have been developing quality control (QC) tools for imaging and pathology data.
I saw a recent post about image variability within a sequence (clearly a use-case for QC), and I suspect it would be super useful to folks to be able to use QC tools when they are working with data from IDC. I wanted to start a discussion on this. This may be something the CCDH (@melissa?) is working on, but I wanted to reach out for any collaborative opportunities in this space.
@satishev I do not think what you are suggesting is something that CCDH is working on, or something that is in their scope of interests.
But I agree that, overall, it would be interesting to apply those tools to the data in IDC. Probably the easiest approach would be to make a Colab notebook that queries the metadata in the IDC BigQuery tables and fetches individual DICOM instances from the buckets. I think it should be straightforward for the most part, and the main chunk of work will be adapting the ingestion part of your pipeline. Let us know if you want to work on this and have questions!
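To make the suggested workflow a bit more concrete, here is a minimal sketch of what the notebook's first cells might look like: query instance-level metadata from BigQuery, then download one DICOM file from its bucket. Several things here are assumptions rather than anything confirmed in this thread: the table name `bigquery-public-data.idc_current.dicom_all`, the columns `collection_id`, `Modality`, and `gcs_url`, and the idea that the buckets allow anonymous reads. The helper names (`build_query`, `split_gcs_url`, `query_instances`, `fetch_instance`) are hypothetical. Please check the current IDC documentation before relying on any of it.

```python
from urllib.parse import urlparse

# Assumption: table and column names below follow the public IDC BigQuery
# dataset; verify them against the current IDC documentation.
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_query(limit=100):
    """Return a parameterized SQL string for instance-level metadata."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT SeriesInstanceUID, SOPInstanceUID, gcs_url "
        f"FROM `{IDC_TABLE}` "
        "WHERE collection_id = @collection_id AND Modality = @modality "
        f"LIMIT {int(limit)}"
    )


def split_gcs_url(gcs_url):
    """Split 'gs://bucket/path/to/object' into (bucket, object_path)."""
    parsed = urlparse(gcs_url)
    if parsed.scheme != "gs" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a gs:// object URL: {gcs_url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def query_instances(project_id, collection_id, modality, limit=100):
    """Run the query; needs google-cloud-bigquery and a billing project."""
    from google.cloud import bigquery  # imported lazily: optional dependency

    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("collection_id", "STRING", collection_id),
            bigquery.ScalarQueryParameter("modality", "STRING", modality),
        ]
    )
    rows = client.query(build_query(limit), job_config=job_config).result()
    return [dict(row) for row in rows]


def fetch_instance(gcs_url):
    """Download one DICOM instance as bytes; needs google-cloud-storage.

    Assumption: the bucket permits anonymous reads. If it does not, use an
    authenticated storage.Client() instead.
    """
    from google.cloud import storage  # imported lazily: optional dependency

    bucket_name, blob_path = split_gcs_url(gcs_url)
    client = storage.Client.create_anonymous_client()
    return client.bucket(bucket_name).blob(blob_path).download_as_bytes()
```

In a notebook, you would call `query_instances(...)` with your own Google Cloud project ID, pass each returned `gcs_url` to `fetch_instance`, and hand the resulting bytes to whatever ingestion step the QC tool expects. That last step is the part that would need adapting.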
@satishev, I just wanted to confirm what @fedorov was saying about CCDH: image quality is outside of their scope. Instead, CCDH is primarily focused on integrating the various data models used within CRDC and mapping across the various semantic ontologies. IDC will benefit in that it will be easier to search across CRDC and integrate images with the data from other CRDC data repositories.
I think I understand. Are there any specific notebook examples or resources we should look at to start working on this?
Making this as useful and usable as possible from the outset would help ensure that folks actually use the tools, especially given that IDC will have multiple types of data available.