Hi, and welcome to IDC!
The main thing to know upfront: IDC histopathology slides are stored in DICOM Slide Microscopy (SM) format, so each slide downloads as a folder of .dcm files rather than a single SVS or TIFF. This is expected: the files hold the different resolution levels of the image pyramid. Tools like wsidicom, TIAToolbox (DICOMWSIReader), and QuPath (v0.4+) all read this layout natively when you point them at the folder.
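As a minimal sketch of what "pointing at the folder" looks like with wsidicom, assuming a series folder has already been downloaded: the helper name `open_series_thumbnail` is hypothetical, and the wsidicom calls (`WsiDicom.open`, `read_thumbnail`, `close`) are the ones shown in its README, so please check them against the version you have installed.

```python
from pathlib import Path


def open_series_thumbnail(series_dir, max_size=(512, 512)):
    """Open one downloaded SM series folder and return a thumbnail image.

    Hypothetical helper for illustration; not part of wsidicom or idc-index.
    """
    series_dir = Path(series_dir)
    # Each slide is a folder of .dcm files; fail early if none are present.
    if not any(series_dir.glob("*.dcm")):
        raise FileNotFoundError(f"No .dcm files found in {series_dir}")

    # Imported here so the folder check above works without wsidicom installed.
    from wsidicom import WsiDicom

    # Pass the folder itself, not an individual .dcm file.
    slide = WsiDicom.open(series_dir)
    try:
        return slide.read_thumbnail(max_size)
    finally:
        slide.close()
```

For example, `open_series_thumbnail("./slides/tcga_luad/<PatientID>/<SeriesInstanceUID>").save("thumb.png")`, where the placeholders stand for the folder names produced by the download step.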
To find and download slides, the idc-index Python package is the easiest starting point (no authentication needed):
```python
from idc_index import IDCClient

client = IDCClient()

# Find slide microscopy series (here: TCGA-LUAD collection)
slides = client.sql_query("""
    SELECT i.PatientID, i.SeriesInstanceUID, i.series_size_MB
    FROM index i
    WHERE i.collection_id = 'tcga_luad' AND i.Modality = 'SM'
""")

# Download: each series lands in its own subfolder
client.download_from_selection(
    seriesInstanceUID=slides["SeriesInstanceUID"].tolist(),
    downloadDir="./slides",
    dirTemplate="%collection_id/%PatientID/%SeriesInstanceUID",
)
```
Note that slides can be several GB each, so it’s worth checking total size (SUM(series_size_MB)) before kicking off a large download.
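One way to run that check, as a sketch: the query aggregates the same index table and columns used in the search above, and the helper names `build_size_query` and `estimate_download_gb` are hypothetical, not part of idc-index.

```python
def build_size_query(collection_id, modality="SM"):
    """Build SQL that counts series and sums their size for one collection."""
    # Plain string interpolation: fine for a trusted, hard-coded collection id.
    return f"""
    SELECT COUNT(*) AS n_series,
           SUM(series_size_MB) AS total_mb
    FROM index
    WHERE collection_id = '{collection_id}' AND Modality = '{modality}'
    """


def estimate_download_gb(collection_id):
    """Return (number of series, total size in GB) for a collection's slides."""
    # Imported lazily; requires the idc-index package.
    from idc_index import IDCClient

    client = IDCClient()
    row = client.sql_query(build_size_query(collection_id)).iloc[0]
    # 1 GB is taken as 1000 MB here.
    return int(row["n_series"]), float(row["total_mb"]) / 1000
```

Calling `n_series, total_gb = estimate_download_gb("tcga_luad")` before `download_from_selection` gives a quick sense of how much disk space and time the download will need.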
For a fuller introduction to searching and working with IDC pathology data, the "Getting started with digital pathology" notebook (runnable for free on Colab) is a good next step. You may also be interested in this recent post: "Get started with TIAToolbox to analyze IDC pathology images".
Finally, the easiest way to navigate IDC and learn how to use its content is really the Claude skill we announced in this post: "Imaging Data Commons Claude skill launched!"
Feel free to follow up here with any questions!