I finally made the time to play with it a bit. As a disclaimer, I do not know much about ChatGPT beyond occasionally using it via the OpenAI web interface.
What I liked
First of all, I learned new things and it is useful! I understand there is very limited code that you introduce on top of ChatGPT, but what you do is very helpful. The “pretext” you added (I had no idea there is even a concept of “pretext”!) in this line https://github.com/UM2ii/text2cohort/blob/main/text2cohort.ipynb?short_path=7cd8242#L62 is very handy! If you add this text before anything in a new ChatGPT interaction, you can then use off-the-shelf ChatGPT as a convenient IDC SQL query helper:
Make sure to use regex. Please be as specific as possible and only return the final query enclosed in ```. Do not provide explanations. Using the table: bigquery-public-data.idc_current.dicom_all:
Here’s an example (for those reading this, I emphasize - this “just works” out of the box with OpenAI ChatGPT interface, no extra code needed!):
(the actual query is a bit different, but the answer is close enough:
SELECT
collection_id
FROM
`bigquery-public-data.idc_current.dicom_all`
WHERE
REGEXP_CONTAINS(modality, r'CT')
GROUP BY
collection_id
ORDER BY
COUNT(DISTINCT PatientID) DESC
)
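For anyone who would rather script this than paste it into the web interface, here is a minimal Python sketch of the same idea: prepend the pretext to a question, then pull the query out of the fenced reply. The pretext is the one quoted above; the function names `build_prompt` and `extract_query` are hypothetical, and the step that actually sends the prompt to a model is deliberately left out.

```python
import re
from typing import Optional

# Three backticks, built programmatically so this snippet stays readable
# when it is itself shown inside a fenced block.
FENCE = "`" * 3

# Pretext quoted from the post above.
PRETEXT = (
    "Make sure to use regex. Please be as specific as possible and only "
    f"return the final query enclosed in {FENCE}. Do not provide explanations. "
    "Using the table: bigquery-public-data.idc_current.dicom_all:"
)

# Matches the first fenced block in a reply, with an optional "sql" tag.
_FENCED = re.compile(
    re.escape(FENCE) + r"(?:sql)?\s*(.*?)" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)


def build_prompt(question: str) -> str:
    """Put the pretext before the user's question (hypothetical helper)."""
    return f"{PRETEXT}\n{question.strip()}"


def extract_query(reply: str) -> Optional[str]:
    """Return the text of the first fenced block, or None if there is none."""
    match = _FENCED.search(reply)
    return match.group(1).strip() if match else None
```

The extracted query still needs to be read by a person before it is run.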
Second, it was very interesting to see the queries you selected for your evaluation! I think the most unexpected one was “For each collection hosted on IDC, what is the proportion of male and female patients?”.
What I did not like
I note that I was not able to run your notebook - I do not have any OpenAI tokens, and I am not interested in buying any. You may want to add a more prominent disclaimer to your notebook and/or instructions on how to get tokens and the API key.
I was not able to get the same results as you demonstrate in Supplementary Table 1 of version 2 of your preprint. A few examples:
SELECT
collection,
COUNTIF(REGEXP_CONTAINS(body_part, r'(?i)chest') AND modality = 'CT') AS num_cases_chest_ct,
COUNT(DISTINCT case_id) AS num_cases_total,
COUNTIF(patient_sex = 'M') / COUNT(DISTINCT case_id) AS proportion_male,
COUNTIF(patient_sex = 'F') / COUNT(DISTINCT case_id) AS proportion_female
FROM
`bigquery-public-data.idc_current.dicom_all`
GROUP BY
collection
ORDER BY
num_cases_chest_ct DESC
In contrast, the result presented in your preprint (purportedly, without any expert corrections) is the following (formatted for presentation purposes):
WITH
gender_counts AS (
SELECT
collection_id,
PatientSex,
COUNT(DISTINCT PatientID) AS patient_count
FROM
`bigquery-public-data.idc_current.dicom_all`
GROUP BY
collection_id,
PatientSex ),
total_patients AS (
SELECT
collection_id,
COUNT(DISTINCT PatientID) AS total_count
FROM
`bigquery-public-data.idc_current.dicom_all`
GROUP BY
collection_id )
SELECT
gender_counts.collection_id,
total_patients.total_count,
ROUND(gender_counts.patient_count / CAST(total_patients.total_count AS numeric), 2) AS male_proportion,
ROUND((total_patients.total_count - gender_counts.patient_count) / CAST(total_patients.total_count AS numeric), 2) AS female_proportion
FROM
gender_counts
JOIN
total_patients
ON
gender_counts.collection_id = total_patients.collection_id
ORDER BY
gender_counts.collection_id;
I do not know if there is a mistake in the preprint and it was revised by the expert, or the results of ChatGPT are not expected to be reproducible, or the results via the API may be different from the web interface, or I was using a different version of ChatGPT behind the scenes … but in any case, this lack of reproducibility is a major problem if you want to present this as an academic study.
Next, and somewhat related to the above, I am curious how come the queries in your results were using proper DICOM attributes and collection_id, while in the results I was getting, that was not the case? Another unexplained observation.
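As a rough illustration of how such column-name problems could be caught before a query is ever run, here is a heuristic Python sketch that flags identifiers not present in a list of known columns. Everything in it is an assumption made for illustration: `KNOWN_COLUMNS` is only the handful of columns that appear in the queries quoted in this post (not the real dicom_all schema), `SQL_WORDS` covers only the keywords and functions used in those queries, and `unknown_identifiers` is a hypothetical helper, not a SQL parser.

```python
import re

# Illustrative subset of columns, taken from the queries quoted in this post.
# This is NOT the full dicom_all schema.
KNOWN_COLUMNS = {"collection_id", "patientid", "patientsex", "modality"}

# Keywords and functions used in the quoted queries; "r" is the raw-string
# prefix left behind once a literal such as r'CT' is stripped.
SQL_WORDS = {
    "select", "from", "where", "group", "by", "order", "having", "as",
    "and", "or", "in", "desc", "asc", "distinct", "count", "countif",
    "regexp_contains", "string_agg", "r",
}


def unknown_identifiers(query: str) -> set:
    """Return lowercased identifiers that are neither known columns,
    listed SQL words, nor aliases introduced with AS (heuristic only)."""
    # Drop backtick-quoted table names and single-quoted string literals.
    stripped = re.sub(r"`[^`]*`|'[^']*'", " ", query)
    # Names that follow AS are aliases, not table columns.
    aliases = {
        name.lower()
        for name in re.findall(r"\bAS\s+([A-Za-z_]\w*)", stripped, flags=re.IGNORECASE)
    }
    tokens = {tok.lower() for tok in re.findall(r"[A-Za-z_]\w*", stripped)}
    return tokens - KNOWN_COLUMNS - SQL_WORDS - aliases


query = """
SELECT collection, COUNT(DISTINCT case_id) AS num_cases_total
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE patient_sex = 'M' AND modality = 'CT'
GROUP BY collection
"""
suspect = sorted(unknown_identifiers(query))
print(suspect)  # ['case_id', 'collection', 'patient_sex']
```

A check like this only catches made-up column names; it says nothing about whether the logic of the query is right.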
On another occasion, I noticed that the resulting query was simply incorrect and misleading, and, quite likely, this would not be noticed by the user. Here’s the example:
Remarkably, DICOM modalities were selected correctly. The problem is, the proposed query will select all collections that have exactly 2 modalities, with one of those modalities being either CT or SM. The cleaned-up query, with an added column listing all modalities within the collection (to demonstrate that the query does not satisfy the prescribed requirements), is below, along with a snippet of the result:
SELECT
collection_id,
COUNT(DISTINCT PatientID) AS patient_count,
STRING_AGG(DISTINCT(Modality)) AS modalities
FROM
`bigquery-public-data.idc_current.dicom_all`
WHERE
modality IN ('MR', 'SM')
GROUP BY
collection_id
HAVING
COUNT(DISTINCT modality) = 2
ORDER BY
patient_count DESC
The above is consistent with my experience with ChatGPT overall - when it gives the correct answer, it is amazing. But then it will give an incorrect answer, without even a slight expression of doubt, and for a user who does not have the domain knowledge, it is impossible to detect that the answer is incorrect. Those answers should always be cross-checked, which greatly diminishes the practical value of this tool.
My takeaway
I can definitely see how ChatGPT can be handy as an aid in exploring IDC - especially with the pretext customization proposed (and I hope this thread will motivate some of the beginners!). Further, I can also see how it can help put together an initial version of a query even for those users who are somewhat familiar with SQL, but want to get the initial query automatically.
BUT - especially if you are a novice user! - never treat the results produced by this tool as truth. I am sure the models will evolve, but I think the practical value of this approach is yet to be established. I would also be very interested to see how ChatGPT, with as little effort as possible, can become more DICOM-aware (e.g., by using DICOM attributes, incorporating the knowledge of the DICOM data model).
I would also encourage those who want to continue this exploration to try to engage with the users of IDC and/or the broader community of imaging researchers and survey their needs with respect to what queries they would find interesting.