I am working on a machine learning digital pathology project with medulloblastoma WSIs and would like to use IDC data as an external test set. I’ve already downloaded the images from the IDC SM portal; however, I was wondering whether it is possible to access any associated pathology reports that might be on file. I am primarily interested in the pathologist’s ground truth assessment of each slide/case, whether that is in the form of a report or image annotations, if available. Thank you!
Kindly,
Therry Malone | Research Assistant
Mark D. Krieger | Surgeon-in-Chief; Senior VP; Billy and Audrey L. Wilder Chair in Neurosurgery
Jennifer A. Cotter | Director, Neuropathology; Director, Center for Pathology Research Services
Children’s Hospital Los Angeles
4650 Sunset Blvd., Mailstop #43 | Los Angeles, CA 90027
Ph: 562.400.5141 | thmalone@chla.usc.edu www.CHLA.org
Unfortunately, we do not have access to the pathology reports for the images. They may be available from GDC for a subset of TCGA slides, but we have not investigated this, and do not have those in IDC at this time. I will make a note to try to investigate this, but unfortunately I cannot give you any estimate when this may be done.
I have a follow-up question. My understanding is that there is overlapping data between the IDC and KidsFirst portals, specifically between the CCDI-MCI collection in IDC and the CBTN data in KidsFirst. Is it possible to know which CBTN data was included in the CCDI-MCI dataset that is accessible through the IDC SM portal? I’ve looked through both repositories extensively, as well as the DICOM metadata of slides downloaded from IDC, but haven’t found any linking information.
Since you reached out via email, by default your message is accessible only to the forum staff. It would be easier to coordinate the response, and more helpful to other IDC users, if we could make this thread public. Please let me know.
Thank you for bringing this to our attention. We identified 68 participants that overlap between the MCI and CBTN studies. The participant_id mappings between the two datasets are included in the table below. Downloading the IDC data associated with these MCI participant IDs should provide the dataset you are looking for.
MCI participant_id | CBTN participant_id
PBBHZD | C4034646
PBBIMT | C4318161
PBBIUN | C4245345
PBBIVA | C5623929
PBBIXX | C4317792
PBBIZK | C4317915
PBBIZT | C4103649
PBBJJS | C4344360
PBBKSJ | C4344483
PBBMCX | C2698374
PBBMHS | C5491581
PBBMKI | C4653705
PBBMRT | C5491458
PBBMYD | C4830825
PBBNFJ | C5260218
PBBNWE | C4745217
PBBPPA | C4633533
PBBRDE | C5623806
PBBTBU | C4948413
PBBTFD | C4948536
PBBUTU | C7617267
PBBVUU | C5254560
PBBWUK | C5254437
PBBXMP | C5254314
PBBYTY | C5254191
PBBZAR | C5253945
PBBZNE | C5254068
PBBZNV | C5253822
PBCAEF | C5492319
PBCBCD | C5492565
PBCBJR | C5492196
PBCBKE | C6970410
PBCBSA | C5623437
PBCCLD | C5623683
PBCDLB | C5492442
PBCDVZ | C5623560
PBCFES | C6353688
PBCFPT | C6083826
PBCGFJ | C6083703
PBCHET | C6083580
PBCKHN | C6958725
PBCKLL | C6353811
PBCLUB | C6083457
PBCMJC | C6349014
PBCSYM | C6969918
PBCUIN | C7333137
PBCUNY | C6958971
PBCUWA | C6970533
PBCWDZ | C7302510
PBCWFT | C7093779
PBCWIV | C6970041
PBCXFJ | C7093902
PBCXHU | C7093656
PBCYMG | C7302387
PBCYUR | C7093533
PBDAEM | C7457490
PBDAKJ | C7302018
PBDALL | C7302633
PBDAUZ | C7302756
PBDBFH | C7302264
PBDDFI | C7457613
PBDEMM | C7424526
PBDENP | C7457367
PBDFBU | C7617021
PBDFKM | C7611978
PBDGFD | C7617144
PBDHFB | C7612101
PBDJPW | C7617390
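To act on the table, the pairs can be kept as a dictionary and used to pick the matching participants out of a download manifest. Below is a minimal sketch, assuming the manifest has been exported as CSV with a `PatientID` column holding the MCI participant_id. That file layout and column name are assumptions, not a documented IDC export format, and only a few rows of the table are reproduced.

```python
import csv
import io

# A few MCI -> CBTN pairs copied from the table above (extend with the remaining rows).
MCI_TO_CBTN = {
    "PBBHZD": "C4034646",
    "PBBIMT": "C4318161",
    "PBBIUN": "C4245345",
}

def select_overlap_rows(manifest_csv, mapping, id_column="PatientID"):
    """Return manifest rows whose participant ID appears in the mapping,
    annotated with the corresponding CBTN participant_id.

    Assumption (hypothetical): the manifest is CSV text with a column named
    `id_column` that contains the MCI participant_id.
    """
    reader = csv.DictReader(io.StringIO(manifest_csv))
    selected = []
    for row in reader:
        mci_id = row[id_column].strip()
        if mci_id in mapping:
            row["CBTN_participant_id"] = mapping[mci_id]
            selected.append(row)
    return selected

# Hypothetical manifest: two overlapping participants and one MCI-only participant.
example_manifest = """PatientID,SeriesInstanceUID
PBBHZD,1.2.3.4
PBZZZZ,1.2.3.5
PBBIUN,1.2.3.6
"""

for row in select_overlap_rows(example_manifest, MCI_TO_CBTN):
    print(row["PatientID"], row["CBTN_participant_id"], row["SeriesInstanceUID"])
```

Rows whose ID is not in the mapping (here the hypothetical `PBZZZZ`) are simply skipped, so the same function works unchanged once the full table is pasted into `MCI_TO_CBTN`.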
In addition, we are currently developing a bulk query and export feature for the CCDI Participant Index (CPI) in our CCDI Ecosystem. We can notify you when this feature becomes available. In the meantime, here is the current method for pulling study synonyms for the MCI (or any other) cohort:
1. Navigate to the Participant metadata table for the selected cases.
2. Download the JSON output file for the selected cases.
3. Within the JSON file, cases that have alternate study synonyms include an “Available CPI Mapping” data element, which is an array of synonyms for associated studies or domains. CBTN is listed with the domain description “Children’s Brain Tumor Network”, and the ID in that entry is the participant’s CBTN ID.
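The last step can be scripted rather than done by hand. Below is a minimal sketch, assuming the downloaded file is a JSON array of participant records, each with a `participant_id` field and an “Available CPI Mapping” array whose entries carry `domain_description` and `participant_id` keys. Apart from “Available CPI Mapping” and the “Children’s Brain Tumor Network” description, these key names are assumptions, so check them against an actual export and adjust the constants.

```python
import json

# Assumed key names (hypothetical except MAPPING_KEY and CBTN_DOMAIN; verify against the real export).
MAPPING_KEY = "Available CPI Mapping"
DOMAIN_KEY = "domain_description"
ID_KEY = "participant_id"
CBTN_DOMAIN = "Children's Brain Tumor Network"

def _normalize(text):
    # Treat the typographic apostrophe (U+2019) and the ASCII apostrophe as equivalent.
    return text.replace("\u2019", "'").strip().lower()

def extract_cbtn_ids(records):
    """Map each MCI participant_id to the CBTN ID(s) listed in its CPI mapping."""
    result = {}
    for record in records:
        # A missing or null mapping array is treated as "no synonyms".
        for entry in record.get(MAPPING_KEY) or []:
            if _normalize(entry.get(DOMAIN_KEY, "")) == _normalize(CBTN_DOMAIN):
                result.setdefault(record[ID_KEY], []).append(entry[ID_KEY])
    return result

# Hypothetical export: one CBTN-linked participant and one without any mapping.
example_json = """
[
  {"participant_id": "PBBHZD",
   "Available CPI Mapping": [
     {"domain_description": "Children\u2019s Brain Tumor Network",
      "participant_id": "C4034646"}
   ]},
  {"participant_id": "PBZZZZ", "Available CPI Mapping": []}
]
"""

print(extract_cbtn_ids(json.loads(example_json)))
```

Matching on a normalized domain description, rather than the exact string, guards against the apostrophe in “Children’s” being exported as either a curly or a straight quote.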
Thank you for providing this list and instructions for extracting CBTN IDs from participant metadata. I was able to replicate your mapping with your instructions.
Does this approach yield the ground truth mapping for all associated sources (i.e., are the 68 overlapping MCI/CBTN participants you identified the only participants taken from the CBTN cohort), or are there additional mappings/methods for determining which participants came from CBTN and have associated CBTN IDs?
Essentially, I have a list of CBTN Collection / Participant IDs sourced from my institution and KidsFirst, and I need to know their IDC participant ID, if it exists.
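Once an MCI-to-CBTN mapping is in hand, whether from the table or from a CPI JSON extraction, this question becomes an inverted lookup. Below is a minimal sketch with a hypothetical `find_idc_ids` helper. Note that a CBTN ID reported as missing only means it has no entry in the mapping being used, not that the participant is certainly absent from IDC.

```python
# A few MCI -> CBTN pairs from the table; in practice, use the full mapping
# or the output of a CPI JSON extraction (values may be single IDs or lists).
MCI_TO_CBTN = {
    "PBBHZD": "C4034646",
    "PBBIMT": "C4318161",
    "PBDJPW": "C7617390",
}

def find_idc_ids(cbtn_ids, mci_to_cbtn):
    """Hypothetical helper: split CBTN IDs into those with a known MCI
    participant_id and those with no entry in the given mapping."""
    # Invert the mapping; one CBTN ID could in principle link to several MCI IDs.
    cbtn_to_mci = {}
    for mci_id, cbtn in mci_to_cbtn.items():
        for cbtn_id in ([cbtn] if isinstance(cbtn, str) else cbtn):
            cbtn_to_mci.setdefault(cbtn_id, []).append(mci_id)

    found, missing = {}, []
    for raw_id in cbtn_ids:
        cbtn_id = raw_id.strip()  # tolerate stray whitespace in the input list
        if cbtn_id in cbtn_to_mci:
            found[cbtn_id] = cbtn_to_mci[cbtn_id]
        else:
            missing.append(cbtn_id)
    return found, missing

# Example query: one ID from the table and one hypothetical ID with no mapping.
found, missing = find_idc_ids(["C4318161", "C0000000"], MCI_TO_CBTN)
print("Linked:", found)
print("No link in this mapping:", missing)
```

Keeping the values as lists lets the same helper accept either the one-to-one table or a CPI export in which a participant might carry more than one CBTN synonym.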
Hi Therry,
If you have a list of participants and would like to identify alternate identifiers used across pediatric cancer studies or repositories, the CCDI Participant Index (CPI) can help map related participant IDs across multiple CCDI-supported datasets and systems. The CPI is designed to support data integration and cross-study research by maintaining mappings between de-identified participant identifiers rather than storing clinical or genomic data itself. The CPI API allows authorized users to retrieve associated participant identifiers, validate IDs, discover related domains or studies, and access metadata and statistics about the index, all while maintaining strong privacy protections. Additional technical documentation is available in the CPI API Documentation.