Short Answer
- No, the TSA suffix does not always indicate a tumor slide. It indicates a physical slide type (Top Slide, variant A), not tissue origin (TCGA Barcode - GDC Docs). In TCGA-BRCA, 91 out of 635 TSA slides are from normal tissue (verified with an idc-index sm_index query; see the evidence below).
- DX1 slides are FFPE diagnostic slides (Janowczyk, 2016). In TCGA-BRCA, all 1,061 DX1 slides come from tumor samples (1,060 primary tumor + 1 metastatic), but this reflects the TCGA collection protocol — not a universal rule that DX means tumor.
Understanding the TCGA Slide Barcode
A TCGA slide barcode like TCGA-A2-A0T2-01Z-00-DX1 has two independent encodings (TCGA Barcode - GDC Docs):
| Barcode segment | Position | Encodes | Determines tissue type? |
| --- | --- | --- | --- |
| Sample type code | Positions 14-15 (e.g., 01) | Tissue origin: 01 = Primary Solid Tumor, 11 = Solid Tissue Normal (TCGA Sample Type Codes) | Yes |
| Slide suffix | Final segment (e.g., DX1) | Physical slide preparation type (TCGA Barcode - GDC Docs) | No |
The sample type code (not the slide suffix) determines whether a slide is tumor or normal.
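To make the two encodings concrete, here is a minimal sketch that splits a slide barcode into named fields. The regular expression and the helper name `split_slide_barcode` are assumptions inferred from the example barcode above, not an official parser, and unusual barcodes may need a looser pattern.

```python
import re

# Assumed layout, inferred from the example barcode in the text:
# TCGA-<site>-<participant>-<sample type><vial>-<portion>-<slide suffix>
SLIDE_BARCODE = re.compile(
    r"^TCGA-(?P<tss>[A-Z0-9]{2})-(?P<participant>[A-Z0-9]{4})"
    r"-(?P<sample_type>\d{2})(?P<vial>[A-Z])"
    r"-(?P<portion>\d{2})-(?P<slide>[A-Z]{2}[A-Z0-9]+)$"
)


def split_slide_barcode(barcode: str) -> dict:
    """Return the named fields of a TCGA slide barcode (hypothetical helper)."""
    match = SLIDE_BARCODE.match(barcode)
    if match is None:
        raise ValueError(f"not a TCGA slide barcode: {barcode!r}")
    return match.groupdict()


fields = split_slide_barcode("TCGA-A2-A0T2-01Z-00-DX1")
# The sample type code and the slide suffix are separate fields.
print(fields["sample_type"], fields["slide"])
```

For the example barcode, the sample type field is `01` (the characters at positions 14-15) and the slide field is `DX1`; only the former says anything about tissue origin.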
Slide Suffix Meanings
The slide suffix encodes the physical preparation method and position, not the tissue origin (TCGA Barcode - GDC Docs):
| Prefix | Full Name | Preparation | Description |
| --- | --- | --- | --- |
| TS | Top Slide | Frozen section | Cut from the top of a tissue portion during surgery; adjacent to tissue used for genomic analysis (TCIA TCGA Guide) |
| BS | Bottom Slide | Frozen section | Cut from the bottom of a tissue portion (TCGA Barcode - GDC Docs) |
| MS | Middle Slide | Frozen section | Cut from the middle of a tissue portion (TCGA Barcode - GDC Docs) |
| DX | Diagnostic Slide | FFPE (formalin-fixed paraffin-embedded) | Permanent diagnostic-quality slide from clinical pathology workflow (Janowczyk, 2016) |
The letter or number after the prefix (e.g., A in TSA, 1 in DX1) indicates the slide order within that type (TCGA Barcode - GDC Docs).
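As an illustration, the suffix can be decoded with a small lookup. This is only a sketch: the helper name `describe_slide_suffix` is hypothetical, and the mapping simply restates the prefixes listed above. Note that nothing in the result mentions tumor or normal.

```python
# Prefix meanings restated from the table above.
SLIDE_PREFIXES = {
    "TS": ("Top Slide", "frozen section"),
    "BS": ("Bottom Slide", "frozen section"),
    "MS": ("Middle Slide", "frozen section"),
    "DX": ("Diagnostic Slide", "FFPE"),
}


def describe_slide_suffix(suffix: str) -> dict:
    """Split a slide suffix such as 'TSA' or 'DX1' into prefix and order."""
    prefix, order = suffix[:2], suffix[2:]
    if prefix not in SLIDE_PREFIXES or not order:
        raise ValueError(f"unrecognized slide suffix: {suffix!r}")
    name, preparation = SLIDE_PREFIXES[prefix]
    return {
        "prefix": prefix,
        "name": name,
        "preparation": preparation,
        "order": order,  # letter or number giving the slide order within its type
    }


print(describe_slide_suffix("TSA"))
print(describe_slide_suffix("DX1"))
```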
Evidence from TCGA-BRCA Data in IDC
The following counts were obtained by querying TCGA-BRCA slides in IDC using sm_index.ContainerIdentifier and sm_index.primaryAnatomicStructureModifier_CodeMeaning via idc-index 0.11.10.
Note: The ContainerIdentifier column was added to sm_index in idc-index 0.11.10. If you are using an older version, upgrade first: pip install --upgrade idc-index
TSA appears on both tumor AND normal slides
Sample type 01 (Primary Solid Tumor): 542 TSA slides
Sample type 06 (Metastatic): 2 TSA slides
Sample type 11 (Solid Tissue Normal): 91 TSA slides
TSA is not a tumor indicator. It appears on 91 normal tissue slides in TCGA-BRCA.
DX slides appear only on tumor samples (in TCGA-BRCA)
Sample type 01 (Primary Solid Tumor): 1060 DX1 slides, 67 DX2, 4 DX3, 1 DX4
Sample type 06 (Metastatic): 1 DX1 slide
Sample type 11 (Normal): 0 DX slides
DX slides in TCGA-BRCA are exclusively from tumor samples. However, this reflects the TCGA collection protocol (diagnostic slides were prepared for tumor specimens), not a universal rule that DX = tumor.
Normal tissue slides are predominantly TS/BS types
Normal slides by type: TSA (91), TSB (93), TSC (54), TSD (28), TS1 (27), TS2 (19), TS3 (18), BSA (18), BSB (9), TS4 (9), TSE (8), TS5 (6), ...
Normal tissue in TCGA-BRCA was primarily submitted as frozen sections (TS/BS), not diagnostic slides (DX).
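Breakdowns like the ones above amount to counting slides by (sample type code, slide suffix) pair. The sketch below shows that tally in plain Python. It assumes the barcodes have already been pulled from sm_index.ContainerIdentifier into a list; the helper name is hypothetical, and every example barcode except the first is invented for illustration.

```python
from collections import Counter


def tally_by_sample_type_and_suffix(barcodes):
    """Count slides per (sample type code, slide suffix) pair."""
    counts = Counter()
    for barcode in barcodes:
        parts = barcode.split("-")
        if len(parts) != 6:
            continue  # skip anything that is not a six-segment slide barcode
        sample_type = parts[3][:2]  # e.g. '01' from '01Z'
        suffix = parts[5]           # e.g. 'DX1' or 'TSA'
        counts[(sample_type, suffix)] += 1
    return counts


example_barcodes = [
    "TCGA-A2-A0T2-01Z-00-DX1",  # example barcode from the text
    "TCGA-XX-0001-01A-01-TSA",  # invented: tumor sample, TSA slide
    "TCGA-XX-0001-11A-01-TSA",  # invented: normal sample, TSA slide
    "TCGA-XX-0002-11A-02-TSA",  # invented: normal sample, TSA slide
]
counts = tally_by_sample_type_and_suffix(example_barcodes)
for (sample_type, suffix), n in sorted(counts.items()):
    print(sample_type, suffix, n)
```

In this toy input the TSA suffix shows up under both sample type 01 and sample type 11, which is the same pattern reported for TCGA-BRCA.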
Correct Way to Identify Tumor vs Normal
Always use the sample type code or structured DICOM metadata — never rely on slide suffix alone.
Approach 1: Use primaryAnatomicStructureModifier_CodeMeaning from sm_index
This column contains structured tissue type from DICOM specimen metadata (idc-index indices reference).
```python
from idc_index import IDCClient

client = IDCClient()
client.fetch_index("sm_index")  # load the slide microscopy index

# Count TCGA-BRCA slides and patients per tissue type
tissue_counts = client.sql_query("""
    SELECT
        s.primaryAnatomicStructureModifier_CodeMeaning AS tissue_type,
        COUNT(*) AS slide_count,
        COUNT(DISTINCT i.PatientID) AS patient_count
    FROM sm_index s
    JOIN index i ON s.SeriesInstanceUID = i.SeriesInstanceUID
    WHERE i.collection_id = 'tcga_brca'
    GROUP BY tissue_type
    ORDER BY slide_count DESC
""")
print(tissue_counts)
# Returns: Neoplasm, Primary (2704), Normal (399), None (8 metastatic)
```
Approach 2: Parse the TCGA barcode sample type code from ContainerIdentifier
The ContainerIdentifier column in sm_index stores the TCGA slide barcode (idc-index indices reference). Sample type codes are defined in the TCGA Sample Type Codes table: 01-09 = tumor, 10-19 = normal.
```python
# Reuses the `client` created in Approach 1.
# Extract the two-digit sample type code from the fourth barcode segment
sample_type_counts = client.sql_query("""
    SELECT
        SUBSTRING(SPLIT_PART(s.ContainerIdentifier, '-', 4), 1, 2) AS sample_type_code,
        s.primaryAnatomicStructureModifier_CodeMeaning AS tissue_type,
        COUNT(*) AS slide_count
    FROM sm_index s
    JOIN index i ON s.SeriesInstanceUID = i.SeriesInstanceUID
    WHERE i.collection_id = 'tcga_brca'
    GROUP BY sample_type_code, tissue_type
    ORDER BY sample_type_code
""")
print(sample_type_counts)
# Returns: 01 → Neoplasm, Primary (2704), 06 → None (8), 11 → Normal (399)
```
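Once the two-digit code is extracted, mapping it to a tissue class is a range check. The sketch below uses only the ranges quoted above (01-09 = tumor, 10-19 = normal) and labels everything else "other". The helper name `classify_sample_type` is hypothetical.

```python
def classify_sample_type(code: str) -> str:
    """Map a two-digit TCGA sample type code to 'tumor', 'normal', or 'other'."""
    if len(code) != 2 or not code.isdigit():
        raise ValueError(f"expected a two-digit code, got {code!r}")
    value = int(code)
    if 1 <= value <= 9:
        return "tumor"
    if 10 <= value <= 19:
        return "normal"
    return "other"  # codes outside the two ranges quoted in the text


for code in ("01", "06", "11"):
    print(code, classify_sample_type(code))
```

Applied to the codes seen in TCGA-BRCA, this puts 01 and 06 in the tumor class and 11 in the normal class, regardless of the slide suffix.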
References
This response was prepared using the imaging-data-commons Claude Code skill and verified against IDC data version v23 with idc-index 0.11.10.