Some computational tasks do not require the entire image file, which (especially in digital pathology) can be rather large. Training a digital pathology model that needs only selected tiles at specific resolutions is a prime example!
With IDC you can use the S3 API to efficiently download the entire image pyramid, but you can also access individual tiles/frames of a digital pathology image directly, without waiting to fetch the entire pyramid or wasting disk space on large sets of files.
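For the whole-pyramid case, a minimal sketch of anonymous S3 access with `boto3` might look like the following. It assumes the series is addressed by a URL of the form `s3://<bucket>/<series-folder>/*` (the style of the `series_aws_url` column in `idc-index`); `split_s3_url` and `download_series` are hypothetical helper names, not part of any IDC library.

```python
from pathlib import Path


def split_s3_url(series_aws_url: str) -> tuple[str, str]:
    """Split an s3://bucket/prefix/* style URL into (bucket, prefix). Hypothetical helper."""
    if not series_aws_url.startswith("s3://"):
        raise ValueError(f"not an s3:// URL: {series_aws_url}")
    bucket, _, key = series_aws_url[len("s3://"):].partition("/")
    # Drop a trailing wildcard such as "<folder>/*" so the key works as a listing prefix
    prefix = key[:-1] if key.endswith("*") else key
    return bucket, prefix


def download_series(series_aws_url: str, dest: str) -> list[Path]:
    """Download every file under the series prefix from a public bucket. Hypothetical helper."""
    # Imported lazily so split_s3_url works without boto3 installed
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Public buckets accept anonymous (unsigned) requests, so no credentials are needed
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    bucket, prefix = split_s3_url(series_aws_url)
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # Paginate because list_objects_v2 returns at most 1,000 keys per call
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            target = out_dir / Path(obj["Key"]).name
            s3.download_file(bucket, obj["Key"], str(target))
            saved.append(target)
    return saved
```

For example, `download_series(url, "slide_files")` would save every file of the series into `slide_files/`; in DICOM whole-slide imaging each pyramid level is typically a separate instance of the series, so this fetches all levels.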
Our new documentation article explains how to access individual DICOM tags or image tiles directly from files stored in either Google Cloud Storage (GCS) or AWS S3 buckets:
- Given a `SeriesInstanceUID`, map the DICOM series to its list of files (you can get the identifier from the portal or by filtering IDC content using the `idc-index` python package).
- Use the native GCS/AWS library to instantiate a storage client for accessing individual files (no authentication is required!).
- Use the `highdicom` python package to fetch the selected region of the image matrix or access specific tags.
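The partial-access idea can be sketched with `pydicom` and the anonymous GCS client: read only the attributes of one pyramid level, then work out which frames overlap a region of interest. This is an illustration under stated assumptions, not the code from the documentation article: `frames_for_region` and `read_header_from_gcs` are hypothetical names, the frame arithmetic assumes `TILED_FULL` organization with a single focal plane and optical path, and decoding the selected frames is left to `highdicom` as in the steps above.

```python
import math


def frames_for_region(row_start, col_start, height, width,
                      tile_rows, tile_cols, total_cols):
    """Return 1-based frame numbers of the tiles overlapping a pixel region.

    Hypothetical helper. Assumes TILED_FULL organization with one focal
    plane and one optical path (tiles stored row by row, left to right)
    and a region that lies inside the total pixel matrix.
    """
    # Number of tiles in one row of the total pixel matrix (last tile may be partial)
    tiles_per_row = math.ceil(total_cols / tile_cols)
    # 0-based tile row/column indices touched by the region
    first_tr = row_start // tile_rows
    last_tr = (row_start + height - 1) // tile_rows
    first_tc = col_start // tile_cols
    last_tc = (col_start + width - 1) // tile_cols
    return [tr * tiles_per_row + tc + 1  # +1: DICOM frame numbers start at 1
            for tr in range(first_tr, last_tr + 1)
            for tc in range(first_tc, last_tc + 1)]


def read_header_from_gcs(bucket_name, blob_name):
    """Parse the DICOM attributes of a public GCS object, skipping pixel data."""
    # Imported lazily so frames_for_region works without these packages
    import pydicom
    from google.cloud import storage

    # Public buckets can be read with an anonymous client (no credentials)
    client = storage.Client.create_anonymous_client()
    blob = client.bucket(bucket_name).blob(blob_name)
    # blob.open("rb") streams the object; pydicom stops at the Pixel Data element
    with blob.open("rb") as reader:
        return pydicom.dcmread(reader, stop_before_pixels=True)
```

For instance, after `ds = read_header_from_gcs("idc-open-data", "<series-folder>/<instance>.dcm")`, calling `frames_for_region(0, 0, 1024, 1024, ds.Rows, ds.Columns, ds.TotalPixelMatrixColumns)` would list the frames covering the top-left 1024x1024 pixels, since `Rows`/`Columns` give the tile size in a tiled whole-slide image.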
For details check out the documentation article, accompanied by a notebook tutorial!
As always, we would be very interested in your feedback/questions in replies to this post.
P.S. You can also access individual frames of the slide images available in IDC using DICOMweb. Stay tuned for a tutorial on that topic!