I do have one question though. Do you have any idea how to translate the ICD-O-3 codes that are in the ‘nlst_prsn’ column ‘de_type’ into something human-readable?
I may have figured this out, in part. I believe ICD-O-3 Coding Updates may be the official home for these codes, and there is an Annotated Histology List spreadsheet that appears to have the mappings. I’m still kind of confused though, because in most cases there is more than one label for a numeric value. Some of them are marked “preferred”, but in some cases there is more than one preferred label for a single numeric value. Can you just choose whatever you like best in that circumstance? Maybe the CRDC vocabulary people can weigh in?
I got the ones I use from the SEER site but I think that now redirects to the NAACCR site that you found.
I just use grep with the code to find the matching line in a CSV converted from the Excel file, then use awk or csvtool to pull out the column with the description. Be careful of duplicate lines (including synonyms), embedded quotes, other punctuation, control characters, and excessive length.
Also, not every code I encounter in real-world data sets is in the “current” list (some may have been retired/replaced).
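For anyone who would rather not grep, here is a rough Python sketch of the same lookup. It is only an illustration: the column names `code` and `term`, the helper names, and the inline sample rows are assumptions made up for the example, not the spreadsheet's real headers, so adjust them to whatever your CSV export actually contains. The csv module takes care of embedded quotes and commas, and the helper drops duplicate lines, flattens control characters, truncates long labels, and returns None for codes that are not in the list.

```python
import csv
import io
from collections import defaultdict


def load_terms(csv_file, code_col="code", term_col="term"):
    """Map each code to its distinct, cleaned-up terms, in file order.

    code_col and term_col are hypothetical header names; change them to
    match the header row of your own CSV export.
    """
    terms = defaultdict(list)
    for row in csv.DictReader(csv_file):
        code = (row.get(code_col) or "").strip()
        raw = row.get(term_col) or ""
        # Turn control characters into spaces, then collapse whitespace runs.
        cleaned = "".join(ch if ch.isprintable() else " " for ch in raw)
        term = " ".join(cleaned.split())
        # Skip blank rows and exact duplicate labels for the same code.
        if code and term and term not in terms[code]:
            terms[code].append(term)
    return dict(terms)


def describe(terms, code, max_len=80):
    """Return the first label for a code, or None if the code is not listed."""
    labels = terms.get(str(code).strip())
    if not labels:
        return None  # e.g. a retired or replaced code
    return labels[0][:max_len]


# Made-up sample rows standing in for the converted spreadsheet.
sample = io.StringIO(
    "code,term\n"
    '8140/3,"Adenocarcinoma, NOS"\n'
    '8140/3,"Adenocarcinoma, NOS"\n'
    '8070/3,"Squamous cell carcinoma,\tNOS"\n'
)
terms = load_terms(sample)
print(describe(terms, "8140/3"))
print(describe(terms, "9999/9"))
```

With a real file you would pass `open("your_export.csv", newline="", encoding="utf-8")` instead of the in-memory sample (the filename is a placeholder). Codes that come back as None are worth collecting and checking against older versions of the list.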
Yeah, this is the part I was confused about. When there are lots of synonyms and more than one is marked “preferred”, which one should I use? But maybe it doesn’t really matter which one I pick, since they’re all supposed to mean the same thing if they were given the same code.
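If the choice really is arbitrary, one option is to at least make it repeatable with a fixed tie-break rule. The sketch below is just one possible convention, not anything official: it assumes each row for a code has been reduced to a (level, term) pair where the level is a string such as "Preferred", and it picks the shortest preferred term, breaking ties alphabetically. The function name and the example labels are invented for illustration.

```python
def pick_label(rows):
    """Pick one label for a code from (level, term) pairs.

    Convention (an arbitrary choice for this sketch): use only the terms
    marked "Preferred" if there are any, otherwise consider every term;
    then take the shortest, breaking ties alphabetically.
    """
    if not rows:
        return None
    preferred = [term for level, term in rows
                 if level.strip().lower() == "preferred"]
    candidates = preferred or [term for _, term in rows]
    return min(candidates, key=lambda t: (len(t), t.casefold()))


# Placeholder rows, not real spreadsheet content.
rows = [
    ("Synonym", "Alt"),
    ("Preferred", "Example tumor, NOS"),
    ("Preferred", "Example tumor"),
]
print(pick_label(rows))
```

Because the rule does not depend on row order, re-sorting or re-exporting the spreadsheet should not change which label gets picked.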