Per the conversation in the GitHub ticket, right now, the above filter is how records are counted. This is the reason for the behavior you’re seeing.
With related data, there’s a slight difference, in that related filters are only applied to related cases. In effect, the above query becomes something like:
SELECT SOPClassUID, COUNT(*) as cases
FROM (
SELECT PatientID, SOPClassUID
FROM (
SELECT PatientID, SOPClassUID
FROM `idc-dev-etl.etl_metadata.dicom_derived_all_new`
WHERE StudyInstanceUID in (
SELECT StudyInstanceUID
FROM `idc-dev-etl.etl_metadata.dicom_derived_all_new` dicom
LEFT JOIN `isb-cgc.TCGA_bioclin_v0.clinical_v1` AS clin
ON clin.case_barcode = dicom.PatientID
WHERE (has_related='True' AND clin.vital_status='Alive')
)
GROUP BY PatientID, SOPClassUID
)
)
GROUP BY SOPClassUID;
If you had an original filter as well, it becomes:
SELECT SOPClassUID, COUNT(*) as cases
FROM (
SELECT PatientID, SOPClassUID
FROM (
SELECT PatientID, SOPClassUID
FROM `idc-dev-etl.etl_metadata.dicom_derived_all_new`
WHERE StudyInstanceUID in (
SELECT StudyInstanceUID
FROM `idc-dev-etl.etl_metadata.dicom_derived_all_new` dicom
LEFT JOIN `isb-cgc.TCGA_bioclin_v0.clinical_v1` AS clin
ON clin.case_barcode = dicom.PatientID
WHERE ((has_related='True' AND clin.vital_status='Alive') OR has_related='False') AND (Modality='CT')
)
GROUP BY PatientID, SOPClassUID
)
)
GROUP BY SOPClassUID;
For derived data, we CAN do this, we’re just not doing it currently–because unlike related data, which is linked to PatientID and so relatively easy to determine a simple has/does not have relationship based on the PatientID, derived data are instance records as well, at the same level (if you will) as Original data. Furthermore, derived data is broken down into 3 separate subcategories (Segmentation, Qualitative, Quantitative). So, as for how it’s behaving right now, this is the filter for records:
SELECT SOPClassUID, COUNT(*) as cases
FROM (
SELECT PatientID, SOPClassUID
FROM (
SELECT PatientID, SOPClassUID
FROM `idc-dev-etl.etl_metadata.dicom_derived_all_new`
WHERE StudyInstanceUID in (
SELECT StudyInstanceUID
FROM `idc-dev-etl.etl_metadata.dicom_derived_all_new` dicom
WHERE (AnatomicRegionSequence IS NULL)
)
GROUP BY PatientID, SOPClassUID
)
)
GROUP BY SOPClassUID
ORDER BY cases DESC;
The filter for counts, on the other hand:
SELECT SOPClassUID, COUNT(*) as cases
FROM (
SELECT PatientID, SOPClassUID
FROM (
SELECT PatientID, SOPClassUID
FROM `idc-dev-etl.etl_metadata.dicom_derived_all_new`
WHERE StudyInstanceUID in (
SELECT StudyInstanceUID
FROM `idc-dev-etl.etl_metadata.dicom_derived_all_new` dicom
WHERE (AnatomicRegionSequence IS NULL AND has_segmentation='True')
)
GROUP BY PatientID, SOPClassUID
)
)
GROUP BY SOPClassUID
ORDER BY cases DESC;
Notice that counts are explicitly done under their derived subcategory. The record filter, on the other hand, assumes that if the user checks AnatomicRegionSequence is None, they mean anything with no AnatomicRegionSequence available.