I am working on research that needs to understand what those fields are for and how they are related to the other fields either in IDC or GDC. Is there any additional information or document that I missed to follow up for this inquiry.
Most of the fields in the dicom_all table are the DICOM metadata attributes extracted from the files in IDC. To understand that table, it is best to look at the query that is used to populate it:
With the explanation above in mind, if you want to know the descriptions of the attributes in dicom_all, you should use the following approach:
if the column is coming from the dicom_metadata table, you should refer to the DICOM standard to understand the semantics of the individual attributes. I personally find the DICOM browser by Innolitics extremely helpful in looking up DICOM attributes: All CIODs – DICOM Standard Browser.
If you want to join IDC data with the information available in GDC, you will need to use PatientID column in the dicom_all table, and join with the information in GDC.
Can you give me an example of the kind of information from GDC that you would like to join? I think it would be better if we figure this out on the example that is relevant to your interests.
Also, do you mind if I move this conversation to the public forum? I think it may be of interest to other users of IDC.
Can you give me an example of the kind of information from GDC that you would like to join? I think it would be better if we figure this out on the example that is relevant to your interests.
It’s more the description of each field in the table dicom_all. For example, in the view auxiliary_metadata, I could find the description of the field dicom_all.StudyInstanceUID according to https://learn.canceridc.dev/data/organization-of-data/files-and-metadata. Another example is I clearly knew dicom_all.PatientAge specifies the age of the patient based on the name of the field but could not find the official description for this field. It would be nice if we could have all descriptions for each field in the table dicom_all otherwise I just need to look up the other views beside the auxiliary_metadata.
Also, do you mind if I move this conversation to the public forum? I think it may be of interest to other users of IDC.
I don’t mind at all. Please move this conversation to the public if you feel it’s necessary.
Most of the fields in the dicom_all table are the DICOM metadata attributes extracted from the files in IDC. To understand that table, it is best to look at the query that is used to populate it:
WITH
pre_dicom_all AS (
SELECT
aux.tcia_api_collection_id AS tcia_api_collection_id,
aux.idc_webapp_collection_id AS idc_webapp_collection_id,
aux.idc_webapp_collection_id AS collection_id,
aux.collection_timestamp AS collection_timestamp,
aux.collection_hash as collection_hash,
aux.collection_init_idc_version AS collection_init_idc_version,
aux.collection_revised_idc_version AS collection_revised_idc_version,
aux.access AS access,
dcm.PatientID as PatientID,
aux.idc_case_id as idc_case_id,
aux.patient_hash as patient_hash,
aux.patient_init_idc_version AS patient_init_idc_version,
aux.patient_revised_idc_version AS patient_revised_idc_version,
dcm.StudyInstanceUID AS StudyInstanceUID,
aux.study_uuid as crdc_study_uuid,
aux.study_hash as study_hash,
aux.study_init_idc_version AS study_init_idc_version,
With the explanation above in mind, if you want to know the descriptions of the attributes in dicom_all, you should use the following approach:
if the column is coming from the dicom_metadata table, you should refer to the DICOM standard to understand the semantics of the individual attributes. I personally find the DICOM browser by Innolitics extremely helpful in looking up DICOM attributes: All CIODs – DICOM Standard Browser.
if the column is coming from the auxiliary_metadata table, its schema is available in the BigQuery console for the table bigquery-public-data.idc_current.auxiliary_metadata, and is also described here: etl_flow/schema.py at master · ImagingDataCommons/etl_flow · GitHub
I am working on research that needs to understand what those fields are for and how they are related to the other fields either in IDC or GDC.
This is a great question!
If you want to join IDC data with the information available in GDC, you will need to use PatientID column in the dicom_all table, and join with the information in GDC.
Can you give me an example of the kind of information from GDC that you would like to join? I think it would be better if we figure this out on the example that is relevant to your interests.
Also, do you mind if I move this conversation to the public forum? I think it may be of interest to other users of IDC.
I understand. As I discussed in the earlier response above, the answer will depend on where that field is coming from into dicom_all.
For most of the fields, they are DICOM attributes extracted from DICOM images. As mentioned earlier, you can use Innolitics DICOM browser to look up the definitions from the DICOM standard.
So, for example, if you are interested in PatientID, you will do the following:
Type “Patient ID” (note that you will need to break CamelCase notation of the column names into separate words, so “PatientID” becomes “Patient ID”, etc) in the search field and hit “return”
Select the attribute from the list.
If you want to see the authoritative definition and context within the latest version of the DICOM standard, click the “View in Standard” link.
It is not practical to replicate the description of the attributes into the BigQuery table schema, since those descriptions can be very extensive, and also the description may vary depending on the context where it is used. As the specific example for PatientID, semantics of this attribute will be a bit different for small imaging.
Thank you for asking those questions! DICOM standard is complex, the subject of medical imaging is complex, and we are here to help you navigate this information. I hope this response is helpful.