Healthcare analytics
Hospital Discharge Intelligence.
A healthcare data analysis case study using de-identified New York hospital discharge data to explore where operational burden becomes visible: costs, length of stay, payer mix, diagnosis groups, mortality risk, and provider-level variation. Python and Pandas were used for data preparation; Power BI was used for modeling, analysis, and dashboard delivery.
Project preview
Dashboard signals at a glance.
A visual preview of the analytical patterns. The full dashboard remains available for interactive exploration.
Why it matters
From healthcare burden to product questions.
My first goal was simple: take real healthcare data and see whether I could turn it into analytical observations that could be useful in practice, for example to healthcare companies, insurers, or medical organizations examining diagnosis, cost, and risk patterns.
The hospital discharge dataset interested me because it is based on real 2021 inpatient data rather than a synthetic training file. I wanted to test whether cost, length of stay, payer mix, and provider-level variation would reveal patterns relevant to practical healthtech or healthcare analytics questions.
Analytical questions
What the dashboard explores.
The analysis is descriptive. It is designed for exploration and hypothesis generation, not causal claims.
Cost pressure
Which diagnosis groups are associated with unusually high median costs and charges?
Length of stay
Where do long stays suggest care coordination, discharge, or aftercare challenges?
Payer and risk mix
How do cost, payer groups, severity, and mortality-risk patterns differ across segments?
Provider variation
Where do hospital-level patterns reveal operational heterogeneity worth investigating further?
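One way to frame the provider-variation question in pandas is to compare facility-level medians within a single diagnosis group. The sketch below is illustrative only: the column names (`facility`, `drg`, `total_costs`) and the sample values are assumptions, not the fields or figures used in the dashboard.

```python
import pandas as pd

# Hypothetical sample; column names and values are illustrative assumptions.
discharges = pd.DataFrame({
    "facility": ["A", "A", "B", "B", "C", "C"],
    "drg": ["Sepsis"] * 6,
    "total_costs": [10000, 12000, 20000, 22000, 15000, 17000],
})

# Median cost per facility within each diagnosis group.
facility_median = (
    discharges.groupby(["drg", "facility"])["total_costs"]
    .median()
    .rename("median_cost")
    .reset_index()
)

# Spread of the facility medians within each diagnosis group.
spread = facility_median.groupby("drg")["median_cost"].agg(
    lowest="min", highest="max", facilities="count"
)
spread["high_to_low_ratio"] = spread["highest"] / spread["lowest"]

print(spread)
```

A wide ratio for one diagnosis group is only a prompt for further investigation. Facilities with few discharges produce unstable medians, so a minimum-volume filter would be a sensible addition on real data.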
Data and method
Public real-world data, cleaned for analysis.
The dataset is the 2021 SPARCS de-identified inpatient discharge file from New York State. It contains discharge-level detail on patient characteristics, diagnoses, services, and charges without protected health information.
SPARCS 2021
Hospital Inpatient Discharges, de-identified, New York State Department of Health.
Python and Pandas
Reduced the raw file from 32 columns to 14 analysis-ready fields.
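A minimal sketch of this kind of column reduction is shown below. The `KEEP` list is a hypothetical subset, since the 14 fields actually retained are not listed here. The cleaning steps assume that length of stay can arrive as top-coded text such as "120 +" and that charge and cost values may be stored as text with thousands separators.

```python
import pandas as pd

# Hypothetical subset of fields; the 14 columns actually kept are not listed in the source.
KEEP = [
    "Facility Name",
    "Length of Stay",
    "APR DRG Description",
    "APR Risk of Mortality",
    "Payment Typology 1",
    "Total Charges",
    "Total Costs",
]

def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Select analysis fields and coerce the numeric ones."""
    df = raw[KEEP].copy()
    # Assumed top-coded text such as "120 +": keep only the leading digits.
    df["Length of Stay"] = pd.to_numeric(
        df["Length of Stay"].astype(str).str.extract(r"(\d+)")[0],
        errors="coerce",
    )
    # Strip currency symbols and thousands separators before converting.
    for col in ["Total Charges", "Total Costs"]:
        df[col] = pd.to_numeric(
            df[col].astype(str).str.replace(r"[$,]", "", regex=True),
            errors="coerce",
        )
    # Drop rows where a numeric field could not be parsed.
    return df.dropna(subset=["Length of Stay", "Total Charges", "Total Costs"])

# Tiny illustrative input with one extra column and one unparseable row.
raw = pd.DataFrame({
    "Facility Name": ["Hospital A", "Hospital B", "Hospital C"],
    "Length of Stay": ["3", "120 +", "5"],
    "APR DRG Description": ["Sepsis", "Sepsis", "Sepsis"],
    "APR Risk of Mortality": ["Minor", "Extreme", "Major"],
    "Payment Typology 1": ["Medicare", "Medicare", "Private"],
    "Total Charges": ["12,000.50", "900000", "n/a"],
    "Total Costs": ["4000.25", "300000", "2500"],
    "Unused Column": [1, 2, 3],
})

clean = prepare(raw)
print(clean)
```

Dropping unparseable rows is one design choice among several. Flagging them for review would preserve more of the data at the price of extra handling downstream.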
Power BI
Built the analytical model and measures for cost, charges, length of stay, payer comparison, and risk segmentation.
Interactive report
Designed an interactive dashboard focused on burden, provider variation, and financial signals.
Selected signals
Examples from the analysis.
These are descriptive signals from the dashboard, not medical or policy conclusions.
High-cost diagnosis signal: one diagnosis group in the dashboard showed a median cost of approximately $90K.
Long-stay signal: maltreatment and abuse-related cases showed an average stay of 37 days.
Financial signal: several service lines showed charges of roughly 3 to 3.5 times the recorded care costs.
Payer and risk signal: Medicare discharges showed higher average costs and a larger share of major or extreme mortality risk than privately insured discharges in this dataset view.
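The financial and payer signals above come from Power BI measures, but the same kind of summary can be sketched in pandas. The example below is an assumption-laden illustration: the column names and values are invented, the charge-to-cost ratio is computed as summed charges over summed costs (the dashboard may define it differently), and "high risk" is taken to mean a mortality-risk label of Major or Extreme.

```python
import pandas as pd

# Hypothetical sample; names and values are illustrative, not dashboard results.
df = pd.DataFrame({
    "payer": ["Medicare", "Medicare", "Medicare", "Private", "Private", "Private"],
    "risk_of_mortality": ["Major", "Extreme", "Minor", "Minor", "Moderate", "Major"],
    "total_charges": [60000, 90000, 30000, 20000, 35000, 45000],
    "total_costs": [20000, 30000, 10000, 8000, 10000, 15000],
})

# Flag discharges labeled with major or extreme mortality risk.
df["high_risk"] = df["risk_of_mortality"].isin(["Major", "Extreme"])

summary = df.groupby("payer").agg(
    mean_cost=("total_costs", "mean"),
    high_risk_share=("high_risk", "mean"),
    total_charges=("total_charges", "sum"),
    total_costs=("total_costs", "sum"),
)

# Assumed definition: summed charges divided by summed costs per segment.
summary["charge_to_cost"] = summary["total_charges"] / summary["total_costs"]

print(summary)
```

Grouping by a service-line or diagnosis column instead of `payer` would give the same ratio for those segments.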
Limitations
What this project does not claim.
- This is a descriptive analysis, not a causal study.
- The dataset covers New York State inpatient discharges in 2021.
- Patterns may reflect coding, case mix, hospital specialization, and other contextual factors.
- The dashboard is for exploration, not medical, reimbursement, or policy decisions.