Hey there, I’m trying to build a report containing a variety of DICOM metadata about the collections from TCIA. I haven’t gotten to the part about trying to exclude the datasets that aren’t actually in TCIA yet because I’m stuck just getting the basic query to work. Any thoughts on what’s going wrong in the SQL below? The goal is to have a series-level report that contains all of these fields in it for each series/row. If you can also show me how to exclude the stuff that isn’t TCIA data that would be great too. I’m assuming it would be to add some filter on the DOI URL to ignore things in Zenodo? Thanks!
SELECT
SeriesInstanceUID,
STRING_AGG(DISTINCT Modality, ', ') AS Modality,
STRING_AGG(DISTINCT Laterality, ', ') AS Laterality,
STRING_AGG(DISTINCT ImageLaterality, ', ') AS ImageLaterality,
STRING_AGG(DISTINCT SeriesDescription, ', ') AS SeriesDescription,
ARRAY_TO_STRING(ImageType, ', ') AS ImageType,
STRING_AGG(DISTINCT AcquisitionType, ', ') AS AcquisitionType,
STRING_AGG(DISTINCT AcquisitionNumber, ', ') AS AcquisitionNumber,
STRING_AGG(DISTINCT ContrastBolusAgent, ', ') AS ContrastBolusAgent,
STRING_AGG(DISTINCT ContrastBolusTotalDose, ', ') AS ContrastBolusTotalDose,
ARRAY_TO_STRING(ConvolutionKernel, ', ') AS ConvolutionKernel,
STRING_AGG(DISTINCT StudyDescription, ', ') AS StudyDescription,
STRING_AGG(DISTINCT BodyPartExamined, ', ') AS BodyPartExamined,
STRING_AGG(DISTINCT SliceThickness, ', ') AS SliceThickness,
STRING_AGG(DISTINCT PatientPosition, ', ') AS PatientPosition,
ARRAY_TO_STRING(ARRAY(SELECT ARRAY_TO_STRING(arr, ', ') FROM UNNEST(ImageOrientationPatient) AS arr), ', ') AS ImageOrientationPatient,
STRING_AGG(DISTINCT Source_DOI, ', ') AS Source_DOI,
STRING_AGG(DISTINCT collection_name, ', ') AS collection_name
FROM
`bigquery-public-data.idc_current.dicom_all`
GROUP BY
SeriesInstanceUID