Hi there,
I’m completely new to IDC and am trying to gain some familiarity with it using the Jupyter Notebooks provided under the “Getting Started” tab. My specific interest is in accessing the clinical data for the CMB-LCA dataset. I took inspiration from the notebook on working with IDC clinical data and swapped out the collection ID from rms_mutation_prediction to cmb_lca:
# define the query that selects all rows where collection_id is 'cmb_lca'
# note that we can refer to clinical_index table in the query
query = """
SELECT *
FROM clinical_index
WHERE collection_id = 'cmb_lca'
"""
# execute the query
matching_items = c.sql_query(query)
matching_items
However, when I run this code, the returned dataframe doesn’t have anything in it. If I swap the collection ID back to rms_mutation_prediction it works just fine. Maybe I’ve got the collection ID incorrect? Please advise, and thank you for your help!
Tyler, thank you for your question!
TL;DR: unfortunately, there are no publicly available clinical data tables accompanying this collection. You can, however, get limited clinical data from the slides metadata!
Clinical data is available in IDC only for those collections where it was shared by the entity that shared the collection. In this particular case, the CMB-LCA collection in IDC contains only the digital pathology component of the dataset, converted into DICOM, as described in this Zenodo record: CMB-LCA: DICOM converted Slide Microscopy images for the Cancer Moonshot Biobank initiative Lung Cancer collection (you can find the DOI of the collection in the source_doi column in idc-index and in the collection tooltip in IDC Portal).
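To check this yourself, here is a minimal sketch of two helpers: one lists the collections that have rows in clinical_index, the other looks up the source_doi values for a collection. The helper names are hypothetical, and the sketch assumes only that the client exposes sql_query and that the main table is called index, as in the idc-index tutorials.

```python
import re


def _validated(collection_id: str) -> str:
    """Reject anything that is not a simple identifier before building SQL."""
    if not re.fullmatch(r"[a-z0-9_]+", collection_id):
        raise ValueError(f"unexpected collection_id: {collection_id!r}")
    return collection_id


def collections_with_clinical_data(client) -> list:
    """Return the distinct collection IDs that appear in clinical_index."""
    result = client.sql_query(
        "SELECT DISTINCT collection_id FROM clinical_index ORDER BY collection_id"
    )
    return list(result["collection_id"])


def source_dois(client, collection_id: str) -> list:
    """Return the distinct source_doi values recorded for one collection."""
    query = f"""
    SELECT DISTINCT source_doi
    FROM index
    WHERE collection_id = '{_validated(collection_id)}'
    """
    return list(client.sql_query(query)["source_doi"])


# Example usage with the real client (assumes `pip install idc-index`
# and that clinical_index has to be fetched before it can be queried):
# from idc_index import IDCClient
# c = IDCClient()
# c.fetch_index("clinical_index")
# print("cmb_lca" in collections_with_clinical_data(c))
# print(source_dois(c, "cmb_lca"))
```

If cmb_lca is missing from the first list, the empty dataframe is expected and the collection ID itself is not the problem.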
From the Zenodo page you can learn that this slide microscopy collection was derived from images shared in a vendor-specific format, as referenced in the “Additional details” of the descriptor.
That DOI resolves to a TCIA wiki page, and you can confirm that there is no clinical data shared on that page.
Note, however, that DICOM slide microscopy images do contain limited clinical metadata: anatomic site from where tissue was collected, and the diagnosis. Take a look at this notebook tutorial that explains how to access it: IDC-Tutorials/notebooks/pathomics/slide_microscopy_metadata_search.ipynb at master · ImagingDataCommons/IDC-Tutorials · GitHub (you can get the result below by replacing the ccdi_mci collection ID in the corresponding cell with cmb_lca).
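As a rough sketch of that kind of query outside the notebook, the helper below pulls all slide-level metadata rows for one collection. It assumes that idc-index provides an sm_index table that joins to index on SeriesInstanceUID; the helper name is hypothetical, and the notebook remains the reference for which columns hold the anatomic site and diagnosis.

```python
import re


def slide_metadata_for_collection(client, collection_id: str):
    """Return all sm_index rows for the series in one collection.

    Assumes sm_index and index share a SeriesInstanceUID column.
    """
    # Guard against malformed IDs, since the value is interpolated into SQL.
    if not re.fullmatch(r"[a-z0-9_]+", collection_id):
        raise ValueError(f"unexpected collection_id: {collection_id!r}")
    query = f"""
    SELECT sm.*
    FROM sm_index AS sm
    JOIN index AS idx
      ON sm.SeriesInstanceUID = idx.SeriesInstanceUID
    WHERE idx.collection_id = '{collection_id}'
    """
    return client.sql_query(query)


# Example usage with the real client (assumes `pip install idc-index`
# and that sm_index has to be fetched before it can be queried):
# from idc_index import IDCClient
# c = IDCClient()
# c.fetch_index("sm_index")
# slides = slide_metadata_for_collection(c, "cmb_lca")
# print(slides.columns.tolist())
```

Printing the column list first is a cheap way to see which metadata fields are actually available before filtering on any of them.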
I raised this with the folks at TCIA, and they pointed out that the Cancer Moonshot Biobank collections are accompanied by non-imaging data available from dbGaP and CTDC (Clinical and Translational Data Commons). Here are the links, which you can also find on the TCIA page for the content of origin (the same page that references the original pathology files in vendor formats):