Hello, professor. I am a student at the University of Seoul. I am wondering whether I can get CT images of normal (healthy) people for a dataset. I can find CT images of lung cancer, but my teammates and I are struggling to find CT images of normal people, which are really hard to come by. I would like to ask whether you can provide such images; if not, please let me know how to get them, because we need them for a simple research project.
This is a great question!
Indeed, most of the images in IDC are from cancer patients. However, the NLST collection contains a dataset of chest CTs collected in a cancer screening trial, and thus will include images for non-cancer patients.
I am not that familiar with that collection, but you could probably use some of the metadata in the bigquery-public-data.idc_v9.nlst_canc table to select patients that did not have cancer.
As an example, you can query for the counts of patients that have distinct values of clinical_n (Clinical N code for staging, AJCC 6, per the data dictionary) with this query:
SELECT
  clinical_n,
  COUNT(DISTINCT pid) AS num_patients
FROM `bigquery-public-data.idc_v9.nlst_canc`
GROUP BY clinical_n
which will result in the following:
But I do not know if the missing value for clinical or pathological stage, for example, can be used as the indication that there was no cancer in that patient. @dclunie do you know?
In the “person” table supplied by the NLST folks there is a “lung_cancer” column described as:
Confirmed Lung Cancer?
Does the participant have a confirmed lung cancer diagnosis?
Thank you for your help once again. I have a question about the open dataset we are using, LIDC-IDRI.
Does this dataset include non-cancer lung CT images? If so, we would be very happy to know.
On Wed, May 25, 2022 at 9:01 AM, David Clunie via Imaging Data Commons <firstname.lastname@example.org> wrote:
@imjkang7 I am sorry I have not responded earlier.
I am afraid we do not have information on whether specific patients in the LIDC-IDRI collection did not have cancer.
But you can select patients in the NLST collection that did not have a confirmed cancer.
The NLST collection is accompanied by several tables containing clinical data; see Files and metadata - IDC User Guide. If you follow the link, you will find the schema describing the metadata in those tables. One of those tables, prsn, contains the column can_scr, defined as “Indicates whether the cancer followed a positive, negative, or missed screen, or whether it occurred after the screening years.” with values 0=“No Cancer”, 1=“Positive Screen”, 2=“Negative Screen”, 3=“Missed Screen”, 4=“Post Screening”.
We confirmed with the curators of the NLST collection that the value 0 (“No Cancer”) in that column can be used to identify “normal” subjects that did not have a confirmed lung cancer diagnosis.
Putting this all together, you can first identify patients that did not have cancer using the following query:
SELECT distinct(pid) FROM `bigquery-public-data.idc_v9.nlst_prsn` WHERE can_scr = 0
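If you would rather run this from a script than from the BigQuery console, here is a minimal Python sketch. It assumes the google-cloud-bigquery package is installed and that you have your own Google Cloud project with credentials configured. The function names and the "my-gcp-project" placeholder are made up for illustration and are not part of IDC.

```python
def no_cancer_pid_query(dataset="bigquery-public-data.idc_v9"):
    """Build the query from the post: NLST participants with can_scr = 0."""
    return (
        "SELECT DISTINCT pid "
        f"FROM `{dataset}.nlst_prsn` "
        "WHERE can_scr = 0"
    )


def fetch_no_cancer_pids(project_id):
    """Run the query and return the pid values as a list.

    Assumption: google-cloud-bigquery is installed and application
    default credentials are set up for project_id (a hypothetical
    billing project of your own).
    """
    from google.cloud import bigquery  # imported lazily; needs credentials

    client = bigquery.Client(project=project_id)
    rows = client.query(no_cancer_pid_query()).result()
    return [row["pid"] for row in rows]


if __name__ == "__main__":
    print(no_cancer_pid_query())
    # pids = fetch_no_cancer_pids("my-gcp-project")  # placeholder project ID
```

Keeping the query text in its own function makes it easy to point the same code at a different IDC version by changing the dataset argument.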
You can join those patient identifiers with the DICOM metadata in the IDC dicom_all table to get CT scans for all patients that did not have cancer. The query below will give you the list of all SeriesInstanceUIDs for CT scans from those non-cancer patients, and the URL to the viewer, so you can visually examine those series.
SELECT DISTINCT
  SeriesInstanceUID,
  CONCAT("https://viewer.imaging.datacommons.cancer.gov/viewer/", StudyInstanceUID, "?seriesInstanceUID=", SeriesInstanceUID) AS viewer_url
FROM `bigquery-public-data.idc_v9.nlst_prsn` AS prsn
JOIN `bigquery-public-data.idc_v9.dicom_all` AS dicom_all
  ON SAFE_CAST(prsn.pid AS STRING) = dicom_all.PatientID
WHERE can_scr = 0 AND Modality = "CT"
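To make the two less obvious pieces of that query concrete, here is a small Python sketch with made-up rows. The join needs a cast because pid appears to be stored as a number in nlst_prsn (an assumption inferred from the SAFE_CAST in the query) while the DICOM PatientID is a string, and the viewer URL is plain string concatenation. The sample pids and UIDs below are invented placeholders, not real NLST data.

```python
VIEWER_BASE = "https://viewer.imaging.datacommons.cancer.gov/viewer/"


def viewer_url(study_uid, series_uid):
    """Mirror the CONCAT(...) expression from the query above."""
    return f"{VIEWER_BASE}{study_uid}?seriesInstanceUID={series_uid}"


def non_cancer_ct_series(prsn_rows, dicom_rows):
    """Emulate the JOIN ... WHERE can_scr = 0 AND Modality = "CT" in memory."""
    # pid is numeric in the toy prsn rows, PatientID is a string in DICOM,
    # so cast before comparing (the role SAFE_CAST plays in the SQL).
    normal_ids = {str(r["pid"]) for r in prsn_rows if r["can_scr"] == 0}
    seen = {}
    for d in dicom_rows:
        if d["PatientID"] in normal_ids and d["Modality"] == "CT":
            # Keying by series UID collapses repeated rows, like DISTINCT.
            seen[d["SeriesInstanceUID"]] = viewer_url(
                d["StudyInstanceUID"], d["SeriesInstanceUID"]
            )
    return seen


# Invented placeholder rows, for illustration only.
prsn = [{"pid": 1001, "can_scr": 0}, {"pid": 1002, "can_scr": 1}]
dicom = [
    {"PatientID": "1001", "Modality": "CT",
     "StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.4"},
    {"PatientID": "1002", "Modality": "CT",
     "StudyInstanceUID": "1.2.5", "SeriesInstanceUID": "1.2.5.6"},
]
result = non_cancer_ct_series(prsn, dicom)
print(result)
```

Only the series belonging to the participant with can_scr = 0 survives the filter, and each surviving series is paired with a link you can open in the viewer.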
You can download the identified series by following the instructions in this documentation page, selecting by specific SeriesInstanceUID: Downloading data - IDC User Guide. You can download all of them, but this will take some time for ~200K series.
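Since the full list is large, one option for a small project (a suggestion, not something taken from the IDC documentation) is to download only a reproducible random subset of the series. Here is a minimal Python sketch, assuming you have exported the SeriesInstanceUID column of the query result into a list. The function name, the seed, and the sample size are arbitrary choices for illustration.

```python
import random


def sample_series(series_uids, n, seed=0):
    """Pick up to n distinct series UIDs, reproducibly for a given seed."""
    # Deduplicate and sort so the result does not depend on input order.
    unique = sorted(set(series_uids))
    rng = random.Random(seed)
    return rng.sample(unique, min(n, len(unique)))


# Placeholder UIDs standing in for the rows returned by the query.
all_series = [f"1.2.840.{i}" for i in range(1000)]
subset = sample_series(all_series, 50)
print(len(subset))
```

Fixing the seed means your teammates can regenerate the same subset from the same query result.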