CT images with basic demographics - age, gender, BMI, smoking status, etc

Hi,

I was wondering whether there are CT datasets in the Imaging Data Commons with basic demographic information such as age, gender, BMI, smoking status, etc that I can download. Could you please guide me on how to check the number of CT images with this information and also the way I can download them?

Many thanks in advance!

Andres, thanks for reaching out, and sorry for the delayed reply!

I will answer your question in several parts, with increasing complexity.

IDC data is accompanied by a lot of metadata intended to help you select relevant data based on criteria like the ones you asked about. Most of that metadata comes from DICOM attributes, but some comes from other sources, such as collection-specific metadata provided in various spreadsheets.

We have so much metadata that we use Google BigQuery to manage it. Over time, however, we realized that working with BigQuery may be too hard for beginners. For this reason, late last year we started work on idc-index, a Python package that wraps a small subset of the metadata along with basic functions for searching and downloading data from IDC.

The first part of your question (“CT images with age and gender that I can download”) can be answered completely with just idc-index, since the metadata attributes needed to select data based on these criteria (DICOM Modality, PatientAge, PatientSex) are included in idc-index.

The high-level steps are the following (as demonstrated in this really short notebook here):

  1. pip install idc-index
  2. Select the DICOM series that have the attributes you need, using a query like this:
from idc_index import index

# Create the client that wraps the idc-index metadata table
client = index.IDCClient()

query = """
SELECT PatientAge, PatientSex, PatientID, collection_id, SeriesInstanceUID
FROM
  index
WHERE
  Modality = 'CT' and PatientAge is not NULL and PatientSex is not NULL
"""

selection_result = client.sql_query(query)
  3. Download the files by their DICOM SeriesInstanceUID identifiers (here, only the first 10 series):
client.download_from_selection(seriesInstanceUID=selection_result.SeriesInstanceUID.values.tolist()[:10], downloadDir = ".")
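To check how many CT series match before downloading anything, the query result can be summarized with a few lines of plain Python. This is only a minimal sketch: `summarize_selection` is a hypothetical helper (not part of idc-index), the toy rows are made up, and it assumes each row of the result describes one DICOM series, so the counts are series and patients rather than individual image files.

```python
from collections import Counter


def summarize_selection(rows):
    """Count series, distinct patients, and rows per collection.

    `rows` is an iterable of dicts with the keys PatientID, collection_id
    and SeriesInstanceUID (hypothetical helper, not an idc-index function).
    """
    rows = list(rows)
    # Distinct series identifiers in the selection.
    series = {r["SeriesInstanceUID"] for r in rows}
    # Assumption: PatientID is only unique within a collection, so pair them.
    patients = {(r["collection_id"], r["PatientID"]) for r in rows}
    # Number of result rows per collection (one row per series is assumed).
    per_collection = Counter(r["collection_id"] for r in rows)
    return {
        "series": len(series),
        "patients": len(patients),
        "series_per_collection": dict(per_collection),
    }


# Made-up rows standing in for the real query result.
example_rows = [
    {"PatientID": "P1", "collection_id": "collection_a", "SeriesInstanceUID": "1.2.3"},
    {"PatientID": "P1", "collection_id": "collection_a", "SeriesInstanceUID": "1.2.4"},
    {"PatientID": "P2", "collection_id": "collection_b", "SeriesInstanceUID": "1.2.5"},
]
summary = summarize_selection(example_rows)
print(summary)
```

If the query result is a pandas DataFrame, something like `summarize_selection(selection_result.to_dict("records"))` should give the same summary for the real selection.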

To make the search more comprehensive, at the moment you would need to rely on BigQuery. I will follow up with the second part a bit later (to break my response into smaller pieces and not over-complicate it).

For now, please let me know if this works for you as a start, and if you have any follow up questions!

Would be great to have your feedback about idc-index - how usable it is, and what you would like to see there!

A post was split to a new topic: Access to IDC clinical data through idc-index

Many thanks for this, @fedorov.

I’ll start playing with this before continuing with the other part.

Thanks again!
