Andres, thanks for reaching out, and sorry for the delayed reply!
I will answer your question in several parts, with increasing complexity.
IDC data is accompanied by a lot of metadata that is intended to help you select relevant data based on criteria like the ones you asked about. Most of that metadata is coming from DICOM attributes, but some is coming from other sources, such as collection-specific metadata provided in various spreadsheets.
We have so much metadata that we use Google BigQuery to manage it. Over time, though, we realized that working with BigQuery may be too hard for beginner users. Because of this, late last year we started work on idc-index, a Python package that wraps a small subset of the metadata together with basic functions for searching and downloading data from IDC.
The first part of your question (“CT images with age and gender that I can download”) can be answered completely with just idc-index, since the metadata attributes needed to select data based on these criteria (DICOM Modality, PatientAge, PatientSex) are included in idc-index.
The high-level steps to do this are the following (as demonstrated in this really short notebook here):
- Install the package:
pip install idc-index
- Select the DICOM series that have the attributes you need, with a query that looks like this:
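from idc_index import index

# Create the client that the query and download calls below rely on
client = index.IDCClient()

# Keep only CT series that have both age and sex recorded
query = """
SELECT PatientAge, PatientID, collection_id, SeriesInstanceUID
FROM
index
WHERE
Modality = 'CT' and PatientAge is not NULL and PatientSex is not NULL
"""
selection_result = client.sql_query(query)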
- Download the files using the DICOM SeriesInstanceUID identifiers (the example below takes only the first 10 series):
client.download_from_selection(seriesInstanceUID=selection_result.SeriesInstanceUID.values.tolist()[:10], downloadDir = ".")
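One thing that may help once you have the selection result: DICOM defines PatientAge as an "Age String", which is three digits followed by a unit letter (D, W, M or Y), for example 061Y. Below is a minimal sketch of converting such values to years. The helper name dicom_age_to_years and the sample values are made up for illustration, the conversion factors are approximate, and I am assuming the values follow the DICOM format; anything that does not match is returned as None rather than guessed.

```python
import re
from typing import Optional

# DICOM Age String (AS): three digits followed by D, W, M or Y, e.g. "061Y".
_AGE_PATTERN = re.compile(r"^(\d{3})([DWMY])$")

# Approximate number of each unit per year (an assumption for illustration).
_UNITS_PER_YEAR = {"D": 365.25, "W": 52.18, "M": 12.0, "Y": 1.0}


def dicom_age_to_years(age: object) -> Optional[float]:
    """Convert a DICOM AS value to years; None if missing or malformed."""
    # Missing values may arrive as None or NaN, so accept strings only.
    if not isinstance(age, str):
        return None
    match = _AGE_PATTERN.match(age.strip())
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) / _UNITS_PER_YEAR[unit]


# Hypothetical values standing in for the PatientAge column of the query result.
sample_ages = ["061Y", "006M", None, "unknown"]
ages_in_years = [dicom_age_to_years(a) for a in sample_ages]
```

If the query result is a pandas DataFrame, something like selection_result["PatientAge"].map(dicom_age_to_years) should apply the same conversion to the whole column, which makes it easier to filter by an age range before downloading.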
To make the search more comprehensive, at the moment you would need to rely on BigQuery. I will follow up with the second part a bit later (to break my response into smaller bits, and not to over-complicate it).
For now, please let me know if this works for you as a start, and if you have any follow-up questions!
It would be great to have your feedback about idc-index: how usable it is, and what you would like to see there!