Creating cohort in portal from a list of PatientIDs

Hi,

I have a list of PatientIDs and I would like to load them into the portal (https://portal.imaging.datacommons.cancer.gov/explore/) to create a cohort. How can I do that?

Best,

Luca


Luca Graglia

Director of Software and Infrastructure Services

Data for the Common Good

Biological Sciences Division

University of Chicago

lgraglia@bsd.uchicago.edu

Luca, thank you for reaching out with this question.

At this time, it is not possible to use a list of PatientIDs to build a cohort in IDC Portal.

Can you tell us more about what you would like to do with the cohort? What is your ultimate goal?

If you want to be able to download the images for the specific list of patients, it is possible with a little bit of coding, and I am happy to help with that.

Hi Andrey,

Thank you for your response.
My ultimate goal is to be able to see how much data there is in the IDC given a set of PatientIDs, and download those images / request to have access to them if there is any governance layer before the download can happen.
In an ideal scenario I would also like some metadata about those images.

Thank you for your offer, I would love some guidance on downloading the images for a specific list of patients.

Best,

Luca


Luca, thank you for the clarification! I will follow up.

Also, note that I moved this discussion into a public section of the IDC forum, since I believe your question is of general interest, and does not contain any sensitive information. I see you are communicating via email, and I wanted to make sure you recognize this.

You can join the forum and continue the conversation there, in the thread "Creating cohort in portal from a list of PatientIDs", or continue communicating by email!


Sorry for the delay in replying - travels/deadlines!

Here’s the basic recipe - happy to expand based on your feedback/questions:

  1. Install the prerequisite idc-index Python package - this will give you an interface to navigate the basic metadata accompanying IDC content:
    $ pip install --upgrade idc-index
    
  2. Instantiate IDCClient, which provides the API and the metadata tables:
    from idc_index import IDCClient
    client = IDCClient()
    
  3. IDCClient provides access to a pandas dataframe documented here, corresponding to the current release of IDC data, which you can use to select items for the given patient identifiers:
    patient_ids = ["TCGA-3L-AA1B","PANLMU"]
    selection = client.index[client.index["PatientID"].isin(patient_ids)]
    
  4. If you just want to download everything for the specific patients, you can:
    client.download_from_selection(patientId=patient_ids, downloadDir=".")
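
To see how much data there is for a given list before downloading, the selection from step 3 can be summarized per patient. This is only a minimal sketch: `summarize_patients` is a hypothetical helper, and the column names `collection_id`, `SeriesInstanceUID` and `series_size_MB` are assumptions that should be checked against the index documentation for your installed version of idc-index.

```python
import pandas as pd


def summarize_patients(index: pd.DataFrame, patient_ids: list) -> tuple:
    """Return (per-patient summary, requested IDs absent from the index)."""
    # Same filter as in step 3.
    selection = index[index["PatientID"].isin(patient_ids)]

    # Requested IDs that have no matching row in the index.
    found = set(selection["PatientID"])
    missing = [pid for pid in patient_ids if pid not in found]

    # One row per patient: number of series and total size.
    # Column names other than PatientID are assumptions (see note above).
    summary = (
        selection.groupby(["collection_id", "PatientID"])
        .agg(
            series_count=("SeriesInstanceUID", "nunique"),
            total_size_MB=("series_size_MB", "sum"),
        )
        .reset_index()
    )
    return summary, missing


# Intended usage with the client from step 2:
# summary, missing = summarize_patients(client.index, patient_ids)
```

The `missing` list is useful when the PatientIDs come from another system, since it shows which of them are not present in the current IDC release.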
    

The downloaded content by default will be organized in a hierarchy collection/patient/study/series:

$ tree -d
.
├── ccdi_mci
│   └── PANLMU
│       └── 2.25.60737598245052570577932078803929433012
│           └── SM_1.3.6.1.4.1.5962.99.1.856942911.401081431.1727433828671.4.0
└── tcga_coad
    └── TCGA-3L-AA1B
        └── 2.25.173524747743997252212346304558885903341
            ├── SM_1.3.6.1.4.1.5962.99.1.3192501271.1499461926.1639575073815.2.0
            ├── SM_1.3.6.1.4.1.5962.99.1.3215887122.1825455320.1639598459666.2.0
            └── SM_1.3.6.1.4.1.5962.99.1.3233347454.2096386808.1639615919998.2.0
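
If you want to check what actually landed on disk, the hierarchy above can be walked with the standard library. This is a rough sketch that assumes the default collection/patient/study/series layout shown in the listing; `count_files_per_patient` is a hypothetical helper, not part of idc-index.

```python
from collections import Counter
from pathlib import Path


def count_files_per_patient(download_dir: str) -> Counter:
    """Count downloaded files for each (collection, patient) pair."""
    counts = Counter()
    root = Path(download_dir)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        # Expect collection/patient/study/series/file; skip shallower files.
        if len(parts) >= 5:
            counts[(parts[0], parts[1])] += 1
    return counts


# Intended usage after the download in step 4:
# for (collection, patient), n in count_files_per_patient(".").items():
#     print(collection, patient, n)
```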

Please let me know if this addresses your use case!

Thank you @fedorov this is great!


Great to hear that @Luca_Graglia! I will then mark the earlier reply as solution, but please let me know if you have any further comments or questions related to this.