Hello, professor. I am a student at the University of Seoul. I am wondering whether I can get CT images of normal (non-cancer) subjects for a dataset. I can find CT images of lung cancer, but my teammates and I are struggling to find CT images of normal subjects, which are really hard to come by. Could you provide CT images of normal subjects or, if that is not possible, let me know how to obtain them? We need them for a simple research project.
This is a great question!
Indeed, most of the images in IDC are from cancer patients. However, the NLST collection contains a dataset of chest CTs collected in a cancer screening trial, and thus will include images for non-cancer patients.
I am not that familiar with that collection, but you could probably use some of the metadata in the bigquery-public-data.idc_v9.nlst_canc table to select patients that did not have cancer.
As an example, you can query for the counts of patients that have distinct values of clinical_n (Clinical N code for staging, AJCC 6, per dictionary here) with this query:
SELECT
  clinical_n,
  COUNT(DISTINCT pid) AS num_patients
FROM
  `bigquery-public-data.idc_v9.nlst_canc`
GROUP BY
  clinical_n
which returns the number of patients for each distinct value of clinical_n.

But I do not know whether a missing value for clinical or pathological stage, for example, can be taken as an indication that there was no cancer in that patient. @dclunie do you know?
In the “person” table supplied by the NLST folks there is a “lung_cancer” column described as:
Confirmed Lung Cancer?
Does the participant have a confirmed lung cancer diagnosis?
0=“No”
1=“Yes”
Thank you for your help once again. I have another question, about the open dataset LIDC-IDRI, which we are currently using.
Does this dataset include lung CT images from non-cancer subjects? If so, we would be very happy to know.
On Wed, May 25, 2022 at 9:01 AM, David Clunie via Imaging Data Commons <notifications@canceridc.discoursemail.com> wrote:
@imjkang7 I am sorry I have not responded earlier.
I am afraid that, for the LIDC-IDRI collection, we do not have information on whether a specific patient did or did not have cancer.
But you can select patients in the NLST collection that did not have a confirmed cancer.
The NLST collection is accompanied by several tables containing clinical data, see Files and metadata - IDC User Guide. If you follow the link, you will find the schema describing the metadata in those tables. One of those tables, prsn, contains the column can_scr, defined as “Indicates whether the cancer followed a positive, negative, or missed screen, or whether it occurred after the screening years.” with values 0=“No Cancer”, 1=“Positive Screen”, 2=“Negative Screen”, 3=“Missed Screen”, 4=“Post Screening”.
We confirmed with the curators of the NLST collection that a value of 0 (“No Cancer”) in that column can be used to identify “normal” subjects that did not have a confirmed lung cancer diagnosis.
Putting this all together, you can first identify patients that did not have cancer using the following query:
SELECT DISTINCT
  pid
FROM
  `bigquery-public-data.idc_v9.nlst_prsn`
WHERE
  can_scr = 0
You can join those patient identifiers with the DICOM metadata in the IDC dicom_all table to get CT scans for all patients that did not have cancer. The query below will give you the list of all SeriesInstanceUIDs for CT scans from those non-cancer patients, along with the URL to the viewer, so you can visually examine those series.
SELECT DISTINCT
  SeriesInstanceUID,
  CONCAT("https://viewer.imaging.datacommons.cancer.gov/viewer/", StudyInstanceUID, "?seriesInstanceUID=", SeriesInstanceUID) AS viewer_url
FROM
  `bigquery-public-data.idc_v9.nlst_prsn` AS prsn
JOIN
  `bigquery-public-data.idc_v9.dicom_all` AS dicom_all
ON
  SAFE_CAST(prsn.pid AS STRING) = dicom_all.PatientID
WHERE
  can_scr = 0
  AND Modality = "CT"
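If you want to sanity-check the logic of this join before running it in BigQuery, here is a small local sketch using Python's built-in sqlite3 module. The rows are made up purely for illustration, the table layouts are reduced to the columns used above, and SQLite's CAST(... AS TEXT) stands in for BigQuery's SAFE_CAST(... AS STRING). It assumes that pid is stored as an integer while PatientID is a string, and that dicom_all has one row per DICOM instance, which is why DISTINCT is needed to get one row per series.

```python
import sqlite3

# In-memory database with toy stand-ins for nlst_prsn and dicom_all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prsn (pid INTEGER, can_scr INTEGER);
CREATE TABLE dicom_all (
    PatientID TEXT,
    StudyInstanceUID TEXT,
    SeriesInstanceUID TEXT,
    Modality TEXT
);
-- Made-up rows, for illustration only.
INSERT INTO prsn VALUES (1, 0), (2, 1), (3, 0);
INSERT INTO dicom_all VALUES
    ('1', 'study-a', 'series-a1', 'CT'),
    ('1', 'study-a', 'series-a1', 'CT'),
    ('2', 'study-b', 'series-b1', 'CT'),
    ('3', 'study-c', 'series-c1', 'SR');
""")

# Same shape as the BigQuery query: cast the integer pid to text for the
# join, keep only can_scr = 0 patients and CT series, and deduplicate.
rows = conn.execute("""
SELECT DISTINCT d.SeriesInstanceUID
FROM prsn AS p
JOIN dicom_all AS d
  ON CAST(p.pid AS TEXT) = d.PatientID
WHERE p.can_scr = 0 AND d.Modality = 'CT'
""").fetchall()

normal_ct_series = [row[0] for row in rows]
print(normal_ct_series)  # ['series-a1']
```

Only the CT series of the toy patient with can_scr = 0 survives: patient 2 is dropped by the can_scr filter, and patient 3 has no CT series.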
To download the identified series, follow the instructions for selecting by specific SeriesInstanceUID on this documentation page: Downloading data - IDC User Guide. You can download all of them, but that will take some time, since there are ~200K series.
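Since ~200K series is a lot for a simple research project, one option is to run the query from Python and work with a random subset first. The sketch below is only one way to do that, not an official IDC recipe. It assumes the google-cloud-bigquery package is installed and that you have Google Cloud credentials and a project to bill the query to; the function names, the project_id argument, and the idea of sampling are my own additions for illustration.

```python
import random

VIEWER_BASE = "https://viewer.imaging.datacommons.cancer.gov/viewer/"

# Same join as the query above, returning the study and series UIDs.
QUERY = """
SELECT DISTINCT
  dicom_all.StudyInstanceUID,
  dicom_all.SeriesInstanceUID
FROM `bigquery-public-data.idc_v9.nlst_prsn` AS prsn
JOIN `bigquery-public-data.idc_v9.dicom_all` AS dicom_all
  ON SAFE_CAST(prsn.pid AS STRING) = dicom_all.PatientID
WHERE prsn.can_scr = 0 AND dicom_all.Modality = "CT"
"""


def viewer_url(study_uid, series_uid):
    """Build the IDC viewer URL the same way the CONCAT in the query does."""
    return f"{VIEWER_BASE}{study_uid}?seriesInstanceUID={series_uid}"


def fetch_series(project_id):
    """Run QUERY and return a list of (StudyInstanceUID, SeriesInstanceUID).

    Needs google-cloud-bigquery and valid credentials; project_id is the
    (hypothetical) Google Cloud project that the query is billed to.
    """
    from google.cloud import bigquery  # imported here so the helpers work without it

    client = bigquery.Client(project=project_id)
    return [
        (row["StudyInstanceUID"], row["SeriesInstanceUID"])
        for row in client.query(QUERY).result()
    ]


def sample_series(series, n, seed=0):
    """Pick up to n series at random; a fixed seed keeps the subset reproducible."""
    rng = random.Random(seed)
    return rng.sample(series, min(n, len(series)))
```

For example, `sample_series(fetch_series("my-project"), 100)` would give 100 series to start with (where "my-project" is a placeholder for your own project), and `viewer_url(study_uid, series_uid)` gives the link to inspect each one before you download it.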