Segmentation data in RIDER lung CT database

There are 9 segmentation files for each CT scan in the RIDER Lung CT dataset (RIDER Lung CT - The Cancer Imaging Archive (TCIA) Public Access - Cancer Imaging Archive Wiki).
In the metadata file, the series descriptions look like “QIN CT challenge: alg01 run1segmentation result”.

  1. What do “alg01” and “run1” indicate?
  2. How were these nine segmentations made?

Yoichi

This is a great question, @watan016, and a good example of what IDC users should do when they want to learn more about specific items in their cohort/query/collection.

To answer it, first keep in mind that the collection identifier (the collection_id column in the dicom_all table) is a label that groups together both the items released by the original contributors of what initially formed the collection and the analysis results of that data that may have been contributed later (we discuss this in part 3 of the Getting started tutorial series).

In order to understand the provenance of the individual items contained in the collection, you should check the value of the source_DOI and/or source_URL columns.

Taking the collection in your question, the query below returns all distinct combinations of source_DOI/source_URL for the files in the RIDER-Lung-CT collection:

SELECT DISTINCT
  Source_DOI,
  Source_URL
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  collection_id = "rider_lung_ct"

Here is the result:

Row  Source_DOI                     Source_URL
1    10.7937/K9/TCIA.2015.1BUVFJR7  https://doi.org/10.7937/K9/TCIA.2015.1BUVFJR7
2    10.7937/K9/TCIA.2015.U1X8A5NR  https://doi.org/10.7937/K9/TCIA.2015.U1X8A5NR
3    10.7937/tcia.2020.jit9grk8     https://doi.org/10.7937/tcia.2020.jit9grk8

Although it is one collection, it contains several contributions, and you can follow the links above to learn more about each of them.

If we narrow the query down, we can check the URLs/DOIs that correspond to just the segmentations in that collection:

SELECT DISTINCT
  Source_DOI,
  Source_URL
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  collection_id = "rider_lung_ct"
  AND Modality = "SEG"

Here is the result:

Row  Source_DOI                     Source_URL
1    10.7937/tcia.2020.jit9grk8     https://doi.org/10.7937/tcia.2020.jit9grk8
2    10.7937/K9/TCIA.2015.1BUVFJR7  https://doi.org/10.7937/K9/TCIA.2015.1BUVFJR7

You mentioned that your questions are about segmentations that include “alg01” in the SeriesDescription, so we can refine the query further:

SELECT DISTINCT
  Source_DOI,
  Source_URL
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  collection_id = "rider_lung_ct"
  AND Modality = "SEG"
  AND SeriesDescription LIKE "%alg0%"

Now we have only one URL: QIN multi-site collection of Lung CT data with Nodule Segmentations (QIN-LungCT-Seg) - TCIA DOIs - Cancer Imaging Archive Wiki.

Following the link, you can learn more about that collection and find contact information in case your question is not addressed in the documentation. IDC did not generate that dataset, so we do not know all the details, but we have the pointers to help you investigate questions like this and track the provenance of data hosted in IDC.

Please let me know if you have any further questions!

Hi Yoichi, I work with TCIA and helped with loading this data back when it was submitted. I think this will probably be clear now that Andrey has shown you how to get to the dataset description but, just to be sure, the page mentions:

  1. “3 academic institutions (Columbia, Stanford, Moffitt-USF) each ran their own segmentation algorithm” – this is represented as “alg01”, “alg02” and “alg03” in the descriptions.

  2. “Segmentations were performed 3 different times with different initial conditions, resulting in 9 segmentations formatted as DICOM Segmentation Objects (DSOs) for each tumor volume.” – this is represented as “run1”, “run2”, and “run3”.
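If it helps when sorting the nine segmentations per scan, here is a minimal Python sketch that pulls the algorithm and run numbers out of a SeriesDescription string. It assumes every description follows the “algNN runN” pattern seen in the example from the question; `SegLabel` and `parse_series_description` are hypothetical names made up for this illustration and are not part of any IDC or TCIA tooling. Which institution corresponds to which algorithm number is not stated in this thread, so the sketch does not try to map them.

```python
import re
from typing import NamedTuple, Optional


class SegLabel(NamedTuple):
    """Algorithm and run numbers parsed from a SeriesDescription."""
    algorithm: int  # 1..3, one per participating institution
    run: int        # 1..3, one per set of initial conditions


# Assumed pattern: "alg" + digits, optional whitespace, "run" + digits.
# The example in the question has no space after the run number
# ("run1segmentation"), so nothing is required to follow the digits.
_PATTERN = re.compile(r"alg(\d+)\s*run(\d+)", re.IGNORECASE)


def parse_series_description(description: str) -> Optional[SegLabel]:
    """Return the (algorithm, run) pair, or None if the pattern is absent."""
    match = _PATTERN.search(description)
    if match is None:
        return None
    return SegLabel(algorithm=int(match.group(1)), run=int(match.group(2)))


if __name__ == "__main__":
    example = "QIN CT challenge: alg01 run1segmentation result"
    print(parse_series_description(example))  # SegLabel(algorithm=1, run=1)
```

Descriptions that do not match (for example, the original CT series) return None, so the function can be applied to every series in the collection and the non-matching ones filtered out.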

Yes, it is clear to me now. Thank you very much for your help.
Yoichi