Hello, professor. I am a student at the University of Seoul. I am wondering whether I can get CT images of normal (non-cancer) subjects for a dataset. I can find CT images of lung cancer, but my teammates and I are struggling to find CT images of normal subjects. Could you help us get such images? If not, please let me know how we could obtain them; we need them for a simple research project.
This is a great question!
Indeed, most of the images in IDC are from cancer patients. However, the NLST collection contains a dataset of chest CTs collected in a cancer screening trial, and thus will include images for non-cancer patients.
I am not that familiar with that collection, but you could probably use some of the metadata in the bigquery-public-data.idc_v9.nlst_canc table to select patients that did not have cancer.
As an example, you can query for the counts of patients that have distinct values of clinical_n (Clinical N code for staging, AJCC 6, per dictionary here) with this query:
SELECT
  clinical_n,
  COUNT(DISTINCT pid) AS num_patients
FROM
  `bigquery-public-data.idc_v9.nlst_canc`
GROUP BY
  clinical_n
which will result in the following:
But I do not know if the missing value for clinical or pathological stage, for example, can be used as the indication that there was no cancer in that patient. @dclunie do you know?
In the "person" table supplied by the NLST folks there is a "lung_cancer" column described as:
Confirmed Lung Cancer?
Does the participant have a confirmed lung cancer diagnosis?
0="No"
1="Yes"
Thank you for your help once again. I have a question about the open dataset LIDC-IDRI, which we are using.
Does this dataset include lung CT images of non-cancer subjects? If so, we would be very happy to use it.
On May 25, 2022, at 9:01 AM, David Clunie via Imaging Data Commons <notifications@canceridc.discoursemail.com> wrote:
@imjkang7 I am sorry I have not responded earlier.
I am afraid we do not have information on whether specific patients in the LIDC-IDRI collection did not have cancer.
But you can select patients in the NLST collection that did not have a confirmed cancer.
The NLST collection is accompanied by several tables containing clinical data; see Files and metadata - IDC User Guide. If you follow the link, you will find the schema describing the metadata in those tables. One of those tables, prsn, contains the column can_scr, defined as "Indicates whether the cancer followed a positive, negative, or missed screen, or whether it occurred after the screening years." with values 0="No Cancer", 1="Positive Screen", 2="Negative Screen", 3="Missed Screen", 4="Post Screening".
We confirmed with the curators of the NLST collection that patients with value 0 ("No Cancer") in that column can be treated as "normal" subjects that did not have a confirmed lung cancer diagnosis.
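If you export the prsn table and filter it outside BigQuery, the code list above can be written down as a small lookup. This is only a sketch: the class and function names are hypothetical, and the codes are copied from the column description quoted above. Note that code 2 ("Negative Screen") describes a cancer that followed a negative screen, so only code 0 is treated as normal here.

```python
from enum import IntEnum


class CanScr(IntEnum):
    """Codes of the can_scr column, as quoted from the NLST schema above."""
    NO_CANCER = 0
    POSITIVE_SCREEN = 1
    NEGATIVE_SCREEN = 2   # a cancer that followed a negative screen
    MISSED_SCREEN = 3
    POST_SCREENING = 4


def is_normal_subject(can_scr):
    """Return True only for code 0; a missing value (None) is not treated as normal."""
    return can_scr == CanScr.NO_CANCER
```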
Putting this all together, you can first identify patients that did not have cancer using the following query:
SELECT
  DISTINCT pid
FROM
  `bigquery-public-data.idc_v9.nlst_prsn`
WHERE
  can_scr = 0
You can join those patient identifiers with the DICOM metadata in the IDC dicom_all table to get CT scans for all patients without a confirmed cancer. The query below will give you the list of all SeriesInstanceUIDs for CT scans from those non-cancer patients, along with the URL to the viewer, so you can visually examine those series.
SELECT
  DISTINCT SeriesInstanceUID,
  CONCAT("https://viewer.imaging.datacommons.cancer.gov/viewer/", StudyInstanceUID, "?seriesInstanceUID=", SeriesInstanceUID) AS viewer_url
FROM
  `bigquery-public-data.idc_v9.nlst_prsn` AS prsn
JOIN
  `bigquery-public-data.idc_v9.dicom_all` AS dicom_all
ON
  SAFE_CAST(prsn.pid AS STRING) = dicom_all.PatientID
WHERE
  can_scr = 0 AND Modality = "CT"
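To see what the join does without a BigQuery account, here is a rough local analogue using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the rows are made up, the tables are reduced to the columns the query touches, and SQLite's CAST and `||` stand in for BigQuery's SAFE_CAST and CONCAT. It also assumes dicom_all holds one row per DICOM instance, which is why DISTINCT is needed to get one row per series.

```python
import sqlite3


def toy_non_cancer_series():
    """Run a SQLite analogue of the join on made-up rows; return (uid, url) pairs."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE nlst_prsn (pid INTEGER, can_scr INTEGER);
        CREATE TABLE dicom_all (PatientID TEXT, StudyInstanceUID TEXT,
                                SeriesInstanceUID TEXT, Modality TEXT);
        -- Made-up rows: patient 100 has no cancer, patient 200 has a positive screen.
        INSERT INTO nlst_prsn VALUES (100, 0), (200, 1);
        INSERT INTO dicom_all VALUES
            ('100', '1.2.3', '1.2.3.1', 'CT'),
            ('100', '1.2.3', '1.2.3.1', 'CT'),
            ('100', '1.2.4', '1.2.4.1', 'SR'),
            ('200', '1.2.5', '1.2.5.1', 'CT');
    """)
    rows = con.execute("""
        SELECT DISTINCT
            SeriesInstanceUID,
            'https://viewer.imaging.datacommons.cancer.gov/viewer/'
                || StudyInstanceUID || '?seriesInstanceUID=' || SeriesInstanceUID
        FROM nlst_prsn AS prsn
        JOIN dicom_all ON CAST(prsn.pid AS TEXT) = dicom_all.PatientID
        WHERE can_scr = 0 AND Modality = 'CT'
    """).fetchall()
    con.close()
    return rows
```

Only the CT series of the patient with can_scr = 0 survives: the duplicate instance row is collapsed by DISTINCT, and the non-CT series and the cancer patient are filtered out.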
You can download the identified series by following the instructions on this documentation page, selecting by specific SeriesInstanceUID: Downloading data - IDC User Guide. You can download all of them, but this will take some time for ~200K series.
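If a simple research project does not need all ~200K series, one option is to download only a random subset. The helper below is a minimal sketch, not part of IDC tooling: the function name, the default seed, and the idea of sampling are my assumptions. It expects the SeriesInstanceUID values from the query result as a plain list.

```python
import random


def sample_series(series_uids, k, seed=0):
    """Return a reproducible random subset of up to k unique series UIDs."""
    # Sort the de-duplicated UIDs so the same seed always yields the same subset.
    unique = sorted(set(series_uids))
    rng = random.Random(seed)
    return rng.sample(unique, min(k, len(unique)))
```

Fixing the seed lets teammates reproduce the same subset from the same query result.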