Although we claim to include the collection "QIN multi-site collection of Lung CT data with Nodule Segmentations" (TCIA DOIs, Cancer Imaging Archive Wiki), we actually don't. This query shows what we have for its DOI:
SELECT
  SOPInstanceUID,
  Modality,
  collection_id
FROM
  `canceridc-data.idc_views.dicom_all`
WHERE
  Source_DOI = "10.7937/K9/TCIA.2015.1BUVFJR7"
I think what happened is that some instances from that collection got pulled in because of a TCIA API limitation: the API was (or maybe still is) incorrectly assigning to the LIDC collection the segmentations from that analysis results collection that were done for LIDC. The analysis results collection contains more than just those LIDC segmentations, though.
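To check that hypothesis, it may help to see how the instances carrying this DOI are spread across collections and modalities. Below is a rough sketch, not something I have run against the real table: `SUMMARY_QUERY` is a grouped variant of the query above, and `tally_instances` is a hypothetical helper that does the same tally locally on rows already fetched by the original query (each row is assumed to be a dict-like object with the three selected columns).

```python
from collections import Counter

# DOI and table name are copied from the query above.
SOURCE_DOI = "10.7937/K9/TCIA.2015.1BUVFJR7"

# Grouped variant of the original query: one row per (collection_id, Modality).
SUMMARY_QUERY = f"""
SELECT
  collection_id,
  Modality,
  COUNT(DISTINCT SOPInstanceUID) AS instance_count
FROM
  `canceridc-data.idc_views.dicom_all`
WHERE
  Source_DOI = "{SOURCE_DOI}"
GROUP BY
  collection_id, Modality
ORDER BY
  collection_id, Modality
"""


def tally_instances(rows):
    """Count distinct SOPInstanceUIDs per (collection_id, Modality).

    `rows` is any iterable of mappings with the keys SOPInstanceUID,
    Modality and collection_id (hypothetical shape, matching the
    columns selected by the original query).
    """
    seen = set()
    counts = Counter()
    for row in rows:
        uid = row["SOPInstanceUID"]
        if uid in seen:
            continue  # skip duplicate rows for the same instance
        seen.add(uid)
        counts[(row["collection_id"], row["Modality"])] += 1
    return dict(counts)


if __name__ == "__main__":
    # Made-up rows for illustration only, not real query results.
    sample = [
        {"SOPInstanceUID": "1.2.3.1", "Modality": "SEG", "collection_id": "collection_a"},
        {"SOPInstanceUID": "1.2.3.2", "Modality": "SEG", "collection_id": "collection_a"},
        {"SOPInstanceUID": "1.2.3.2", "Modality": "SEG", "collection_id": "collection_a"},
    ]
    print(tally_instances(sample))
```

If every returned instance is a SEG under the LIDC collection, that would be consistent with the explanation above. Running `SUMMARY_QUERY` itself needs a BigQuery client and credentials, which are not shown here.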
The right thing to do would be to exclude that analysis results collection completely until we include all of the original collections it corresponds to, and until it includes all of its instances rather than a subset. That is probably too much effort between now and broad release, though.
@bill.clifford what do you think? Should we just note this limitation in the documentation?