Hi, I recently participated in the IDC-hosted external project. How should we submit our prepared tutorial notebook?
Kevin, assuming you have your notebook in GitHub, you can just mention it in this thread, and I can make a new section in the documentation for user-contributed notebooks under this section Colab notebooks - IDC User Guide, and link it from there. Would this work for you?
Sounds good, thanks. I prepared and tested my notebook on GCP. Didn’t use Colab for it. But the difference between a GCP notebook and a Colab one should be minor.
I will discuss with my supervisor to finalize it, and get back to you here.
You can start by sharing the notebook in GitHub!
Here is the link to our prepared notebook:
Please also include it in your IDC examples. Thanks!
@Kevin I have not tried to run the notebook, just looked over it, and it does look very interesting, relevant and helpful! Thank you for your contribution, it is greatly appreciated!
Anything you would like to share about your experience working with IDC on this use case? Any feedback for improvement?
Do you plan to work on more use cases? We plan to have more MR collections with segmentations in the upcoming data release (in particular LGG-1p19qDeletion, contributed by your lab!). It would be great to have some use cases for the MR datasets, given your group's expertise with brain imaging!
Thank you again for your contribution! It is now linked from the IDC documentation here: Colab notebooks - IDC User Guide.
@fedorov Cool. Glad to know, and you are very welcome.
It has been a nice experience for me to work on GCP with IDC. The cloud approach of bringing researchers to the data sounds attractive, and I am personally supportive of this new platform. Regarding points to improve, I would say the filtering function on the IDC portal is currently limited, and the “derived object” for segmentation data might not be clearly explained in the “IDC user guide”; adding a specific example would be helpful. Those are minor points and can surely be improved. I plan to keep working with IDC and GCP.
One question about LIDC: The dataset downloaded from TCIA (the traditional way) contains .xml files with segmentation labeling information (e.g., the coordinates of nodules). What is the analogue of that for LIDC in IDC? In my current notebook, I use the package pylidc to load that information.
Thank you for adding our notebook to the IDC website. By the way, can you replace the link on my name with this page:
I’m working on a couple of notebooks that will use brain MRI, I’ll update you soon! Thanks
Thank you for this feedback! I agree we should improve those items.
Segmentations of volumetrically segmented nodules and the qualitative assessments of those nodules were converted from the LIDC XML representation into DICOM (see details discussed in this paper). You can look up individual segmentations stored in DICOM Segmentation objects using the convenience table canceridc-data.idc_views.segmentations (referenced from Organization of data - IDC User Guide). You can then convert them into an ITK-readable format using dcmqi or highdicom. Conceptually, using those tools is no different from using dcm2niix to create NIfTIs for the brain MR scans.
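As a starting point for exploring that table, here is a minimal sketch of how one might preview it from Python. Only the view name comes from this thread; the helper names `build_preview_query` and `run_preview` are hypothetical, and the query uses `SELECT *` so that it does not assume any particular columns. Running it requires the `google-cloud-bigquery` package and a GCP project with credentials, which are assumptions about your setup.

```python
# Sketch only: the view name is taken from the thread; the helpers are illustrative.
SEGMENTATIONS_VIEW = "canceridc-data.idc_views.segmentations"


def build_preview_query(view: str = SEGMENTATIONS_VIEW, limit: int = 10) -> str:
    """Return a SQL string that previews the first rows of a BigQuery view."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # SELECT * avoids assuming any column names in the view.
    return f"SELECT * FROM `{view}` LIMIT {int(limit)}"


def run_preview(project_id: str, limit: int = 10):
    """Run the preview query and return a DataFrame.

    Assumes google-cloud-bigquery (with pandas support) is installed and
    that credentials for `project_id` are available in the environment.
    """
    from google.cloud import bigquery  # imported lazily so the sketch loads without it

    client = bigquery.Client(project=project_id)
    return client.query(build_preview_query(limit=limit)).to_dataframe()
```

Looking at the returned columns first should make it easier to decide which ones to filter on for a given collection.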
I understand this is definitely much less convenient than using pylidc, which was custom developed for handling LIDC XML files. But once you understand how to handle standard DICOM Segmentations, you can use the same approach for any other collection that contains those objects, and there are quite a few such collections in IDC/TCIA.
It would be great if we could work together to revise your notebook to work with the segmentations stored in IDC directly.
Thank you. I took another look at “canceridc-data.idc_views.segmentations” with your “LIDC_exploration” notebook (Segmentation part). If we follow the same workflow as for DICOM images (i.e., BigQuery → prepare manifest file → download the selected DICOMs), I guess that to obtain segmentation masks we can first extract metadata from “idc_views.segmentations” to prepare a manifest file and then download the segmentation DICOMs. The segmentation DICOMs can be converted with the packages that you mentioned, and finally into numpy arrays that denote nodule labels. Is my understanding correct?
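A rough sketch of the manifest and loading steps in that workflow is below. It is an illustration under assumptions, not the IDC-recommended implementation: `write_manifest` and `load_seg_frames` are hypothetical helpers, the `gcs_url` key is an assumed name for whichever column holds the storage URL in your query result, and `load_seg_frames` assumes `pydicom` is installed.

```python
from pathlib import Path
from typing import Iterable, Mapping


def write_manifest(rows: Iterable[Mapping[str, str]], path: str,
                   url_key: str = "gcs_url") -> int:
    """Write one storage URL per line, skipping duplicates; return the count written.

    `url_key` is an assumption: use the column that holds the file URL
    in your own query result.
    """
    seen = set()
    urls = []
    for row in rows:
        url = row[url_key]
        if url not in seen:  # keep first occurrence, preserve order
            seen.add(url)
            urls.append(url)
    Path(path).write_text("".join(u + "\n" for u in urls))
    return len(urls)


def load_seg_frames(dcm_path: str):
    """Read a downloaded DICOM Segmentation file into a NumPy array of frames.

    Assumes pydicom is installed. The frames are returned in stored order,
    which is not necessarily the slice order of the referenced CT series.
    """
    import pydicom  # imported lazily so the manifest helper works without it

    return pydicom.dcmread(dcm_path).pixel_array
```

The manifest can then be passed to a download tool, for example `cat manifest.txt | gsutil -m cp -I ./seg`. Note that matching segmentation frames to image slices relies on the per-frame metadata in the Segmentation object, which is the part that dcmqi and highdicom take care of.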
I agree. If we can revise the notebook and resolve the IDC segmentations with your help, that would be great.
Exactly! Let me know how it goes. I hope it will not be too difficult to adjust your notebook to be fully driven by the data in IDC, and not use pylidc. To be clear, pylidc is a great package, and I don’t have anything against it, but for IDC we need approaches that do not rely on data versions outside IDC, that can scale, and that do not use dataset-specific tools.
Yeah, I see your point. It is reasonable that we prepare something more general that can be used for other IDC datasets. I will look into it and get back to you.