This is a great question, @watan016, and it makes a good example for discussing what IDC users can do when they want to learn more about the specific items in their cohort, query, or collection.
To answer this question, first keep in mind that the collection identifier (the collection_id column in the dicom_all table) is a label that groups together both the items released by the original contributors of the collection and any analysis results derived from that data and contributed later (we discuss this in part 3 of the Getting started tutorial series).
To understand the provenance of the individual items in a collection, check the value of the source_DOI and/or source_URL columns.
Taking the collection from your question as an example, the query below returns all distinct combinations of source_DOI/source_URL found for the files in the RIDER-Lung-CT collection:
SELECT DISTINCT
  source_DOI,
  source_URL
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  collection_id = "rider_lung_ct"
Here is the result:
Although this is a single collection, it contains several contributions, and you can follow the links in the result to learn more about each of them.
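To see what each of those contributions actually consists of, a query along these lines could summarize, for every source_DOI/source_URL pair, which modalities it contains and how many series it contributes. This is a sketch rather than a tested recipe: it assumes dicom_all also exposes the SeriesInstanceUID column, and the output names modalities and num_series are aliases chosen here for illustration.

```sql
-- Sketch: summarize each contribution within one collection.
-- Assumes dicom_all has SeriesInstanceUID and Modality columns.
SELECT
  source_DOI,
  source_URL,
  -- distinct modalities in this contribution, e.g. "CT, SEG"
  STRING_AGG(DISTINCT Modality, ", " ORDER BY Modality) AS modalities,
  -- number of distinct series contributed under this DOI/URL
  COUNT(DISTINCT SeriesInstanceUID) AS num_series
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  collection_id = "rider_lung_ct"
GROUP BY
  source_DOI,
  source_URL
ORDER BY
  num_series DESC
```

Grouping by the DOI/URL pair rather than using DISTINCT keeps one row per contribution while still letting you aggregate over its files.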
Now, if we narrow down the query a bit more, we can check the URLs/DOIs corresponding to just the segmentations in that collection:
SELECT DISTINCT
  source_DOI,
  source_URL
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  collection_id = "rider_lung_ct"
  AND Modality = "SEG"
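Before filtering on a text pattern, it can help to see which SeriesDescription values belong to which contribution. The query below is a sketch along those lines (again assuming the SeriesInstanceUID column, with num_series as an illustrative alias); it lists every segmentation description next to its source_DOI, so you can check that a pattern you plan to pass to LIKE selects only the series you mean.

```sql
-- Sketch: map segmentation series descriptions to their source DOI.
SELECT
  source_DOI,
  SeriesDescription,
  COUNT(DISTINCT SeriesInstanceUID) AS num_series
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  collection_id = "rider_lung_ct"
  AND Modality = "SEG"
GROUP BY
  source_DOI,
  SeriesDescription
ORDER BY
  source_DOI,
  SeriesDescription
```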
You mentioned that you have questions about segmentations whose SeriesDescription includes “alg01”, so we can refine the query further:
SELECT DISTINCT
  source_DOI,
  source_URL
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  collection_id = "rider_lung_ct"
  AND Modality = "SEG"
  AND SeriesDescription LIKE "%alg01%"
Now we have only one URL: QIN multi-site collection of Lung CT data with Nodule Segmentations (QIN-LungCT-Seg) - TCIA DOIs - Cancer Imaging Archive Wiki.
Following the link, you can learn more about that collection and find contact information in case your question is not addressed in the documentation. IDC did not generate that dataset, so we do not know all of its details, but IDC provides the pointers you need to investigate questions like this and to track the provenance of the data it hosts.
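The same approach also works in the other direction: if you start from a specific series rather than a collection, you could look up where it came from with a query like the sketch below. The UID in the WHERE clause is a placeholder, not a real identifier; substitute the SeriesInstanceUID of the series you are investigating (this assumes dicom_all exposes that column).

```sql
-- Sketch: trace the provenance of a single series.
SELECT DISTINCT
  collection_id,
  source_DOI,
  source_URL
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  -- hypothetical placeholder: replace with a real SeriesInstanceUID
  SeriesInstanceUID = "REPLACE_WITH_SERIES_INSTANCE_UID"
```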
Please let me know if you have any further questions!