CIC Column Anomalies

Discovered during hospital filter development, 2026-03-10. Each HCODE links to the hospital page.

1. Duplicate Column Name Collision

The Google Sheet has two columns both named “CIC” in different sections:

CIC (staff boolean)

Column CJ · Index 87 · Section: Personal
Example values: YES, NO, Subroga, empty

❌ Lost during processing — DictReader overwrites this

CIC (collaboration year)

Column DJ · Index 113 · Section: Colaboraciones
Example values: 2018, 2019, 2026, ND

✅ Survives — second column wins in DictReader

Python’s csv.DictReader keys each row by column header. When two columns share a header, the value from the later column silently overwrites the earlier one in the row dict. As a result, the processed hospitals.csv has two identical CIC columns, both containing year data, and the original YES/NO staff data is completely lost.
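The collision can be reproduced in a few lines. This is a minimal sketch, not the project's pipeline: the four-column layout and the "Nombre" header are made up for illustration, and only the duplicated "CIC" header mirrors the real sheet.

```python
import csv
import io

# Hypothetical miniature of the sheet: two columns share the header "CIC".
raw = (
    "HCODE,CIC,Nombre,CIC\n"
    "ABC,YES,ABC Medical Center,ND\n"
    "CUL,YES,Culiacán,2018\n"
)

# DictReader builds one dict per row, so the duplicate header maps to one key.
rows = list(csv.DictReader(io.StringIO(raw)))
cic_values = [row["CIC"] for row in rows]
print(cic_values)  # ['ND', '2018'] : the later column won, YES is gone

# csv.reader works by position instead, so both values are still available.
positional = list(csv.reader(io.StringIO(raw)))
first_data_row = positional[1]
print(first_data_row[1], first_data_row[3])  # YES ND
```

Reading by position is only shown to make the loss visible; it is not a suggestion to replace the rename described in the recommended fix.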

The cleaning script (clean_data.py) has no CIC-specific logic — the corruption is purely from the DictReader duplicate-key behavior. Both source files (hospitals.csv and hospitals-2026-03-03.csv) have identical CIC data, so this isn’t a recent change.

2. Cross-Tab: Boolean vs Year

17 consistent (YES + year)  ·  15 missing years (YES + ND)  ·  6 contradictions (NO/empty + year)  ·  103 no involvement
| CIC boolean (col 87) | CIC year (col 113) | Count | Assessment |
|---|---|---|---|
| YES | year | 17 | Consistent: both columns agree |
| YES | ND | 15 | Missing year data: needs backfill |
| NO / empty | year | 6 | Contradictory: needs reconciliation |
| NO | ND | 8 | Consistent: explicitly no CIC |
| Subroga | ND | 1 | SML only |
| empty | ND | 103 | Consistent: no involvement |
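The buckets in this cross-tab can be expressed as a small classifier. The sketch below is an assumption about how the categories are defined, inferred from the table; the function name classify, the bucket labels, and the sample pairs are illustrative and do not come from clean_data.py.

```python
from collections import Counter

def classify(staff: str, year: str) -> str:
    """Bucket one hospital by its (staff boolean, collaboration year) pair."""
    staff = staff.strip().upper()
    has_year = year.strip().isdigit()  # "ND" and empty strings are not years
    if staff == "YES":
        return "consistent" if has_year else "missing year"
    if has_year:
        return "contradiction"  # NO or empty boolean, yet a year is recorded
    if staff == "SUBROGA":
        return "subroga"
    return "no CIC"  # NO or empty boolean, and no year

# Illustrative (boolean, year) pairs, one or two per bucket.
sample = [
    ("YES", "2018"), ("YES", "ND"), ("", "2024"), ("NO", "2023"),
    ("NO", "ND"), ("Subroga", "ND"), ("", "ND"),
]
counts = Counter(classify(staff, year) for staff, year in sample)
print(counts)
```

Run against values read by column position (indexes 87 and 113) rather than by header, a classifier like this should reproduce the counts in the table, assuming the raw export still holds both columns.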

3. The 6 Contradictions

These hospitals have a collaboration year recorded but the boolean column says NO or is blank:

| HCODE | Hospital | CIC boolean (col 87) | CIC year (col 113) | Issue |
|---|---|---|---|---|
| COP | Centro Oncológico Pediátrico de Baja California | empty | 2024 | Empty boolean, has year |
| INP | Instituto Nacional de Pediatría | empty | 2023 | Empty boolean, has year |
| MTY | Hospital Universitario UANL | empty | 2023 | Empty boolean, has year |
| CVJ | Hospital del Niño Morelense | NO | 2023 | Explicitly NO but has year |
| MLM | Hospital Infantil de Morelia | NO | 2023 | Explicitly NO but has year |
| TAB | Hospital del Niño Dr. Rodolfo Nieto Padrón | NO | 2023 | Explicitly NO but has year |

4. The 15 Missing Years

These hospitals say YES in the boolean but have no year recorded. The year column needs backfilling:

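| HCODE | Hospital | CIC boolean | CIC year |
|---|---|---|---|
| ABC | ABC Medical Center | YES | ND |
| CPE | Campeche | YES | ND |
| GDL | Civil nuevo | YES | ND |
| HMO | Sonora | YES | ND |
| ICM | ICM | YES | ND |
| IQR | IQR | YES | ND |
| IVA | IVA | YES | ND |
| MOC | Moctezuma | YES | ND |
| NOV | 20 de Noviembre | YES | ND |
| OAX | Oaxaca | YES | ND |
| OCC | CMNO | YES | ND |
| PUE | Puebla | YES | ND |
| SGF | SGF | YES | ND |
| XAL | Xalapa | YES | ND |
| ZCL | Zacatecas | YES | ND |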

5. Consistent Data (17 hospitals)

These hospitals have YES in the boolean AND a valid year — no issues:

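| HCODE | Hospital | CIC year |
|---|---|---|
| CUL | Culiacán | 2018 |
| ITO | HITO | 2018 |
| LAP | La Paz | 2019 |
| MID | Mérida | 2019 |
| TIJ | Tijuana | 2019 |
| CVM | Tamaulipas | 2021 |
| NIM | IMIEM | 2021 |
| HRA | HRA | 2022 |
| TPQ | Nayarit | 2022 |
| ACP | Acapulco | 2024 |
| CUU | Chihuahua | 2024 |
| CYW | Celaya | 2024 |
| TGZ | Tuxtla | 2024 |
| BJX | Bajío | 2025 |
| LEO | León | 2025 |
| PCA | Pachuca | 2025 |
| MIS | ISSEMYM | 2026 |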

6. Recommended Fix

Step 1: Rename one column in the Google Sheet — keep “CIC” for the boolean (staff), rename the collaboration column to “CIC Año” or “Año CIC”
Step 2: Reconcile the 6 contradictions with Ana
Step 3: Backfill the 15 missing years for hospitals marked YES
Step 4: Re-export and re-run clean_data.py
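Before re-running the cleaning script in Step 4, a guard along these lines could confirm that the rename in Step 1 actually removed the collision. It is a sketch under assumptions: duplicate_headers is a hypothetical helper, not part of clean_data.py, and the two header strings are shortened stand-ins for the real export.

```python
import csv
import io
from collections import Counter

def duplicate_headers(csv_text: str) -> list[str]:
    """Return the header names that appear more than once in the first row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    counts = Counter(name.strip() for name in header)
    return sorted(name for name, n in counts.items() if n > 1 and name)

# Stand-in headers: before and after renaming the collaboration column.
before_rename = "HCODE,CIC,CIC\n"
after_rename = "HCODE,CIC,CIC Año\n"

print(duplicate_headers(before_rename))  # ['CIC']
print(duplicate_headers(after_rename))   # []
```

Failing the run when the returned list is non-empty would turn the silent overwrite into a visible error for any future duplicate header, not only CIC.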

Current dashboard workaround: the filter system treats any non-falsy CIC value (including years) as “Sí” for boolean filtering.
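The workaround amounts to a truthiness check on the surviving CIC value. The sketch below is written in Python for illustration and is not the dashboard's code; in particular, which strings count as falsy is an assumption (here: empty, NO, and ND), and the name cic_filter_label is hypothetical.

```python
# Assumption: these values are treated as falsy; the real dashboard may differ.
FALSY_VALUES = {"", "NO", "ND"}

def cic_filter_label(value):
    """Map a raw CIC cell to the boolean filter label used for filtering."""
    text = (value or "").strip().upper()
    return "No" if text in FALSY_VALUES else "Sí"

for raw_value in ["2018", "YES", "ND", "", None]:
    print(repr(raw_value), "->", cic_filter_label(raw_value))
```

Under this reading, a hospital with only a year recorded is shown as "Sí", which is why the 6 contradictions listed in section 3 matter for the filter until the columns are separated.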