Healthcare analytics

Hospital Discharge Intelligence.

A healthcare data analysis case study using de-identified New York hospital discharge data to explore where operational burden becomes visible: costs, length of stay, payer mix, diagnosis groups, mortality risk, and provider-level variation. Python and Pandas were used for data preparation; Power BI was used for modeling, analysis, and dashboard delivery.

Open Power BI dashboard View GitHub PDF preview

2.05M de-identified discharge records

202 New York State hospitals

14 analysis-ready fields

Power BI interactive dashboard

Project preview

Dashboard signals at a glance.

A visual preview of the analytical patterns. The full dashboard remains available for interactive exploration.


Median cost by diagnosis: selected high-cost contrast

Foreign body entering opening: ~$90K

Chronic rheumatic heart disease: ~$50K

Selected comparator group: lower

Common-care baseline: lower

Length of stay: care complexity signal

Maltreatment / abuse: 37 days

Respiratory care: lower

Common inpatient: lower

Charges vs. costs: financial pressure

3-3.5x

Selected service lines showed charges several times above actual care costs.

Payer and risk mix: higher-risk segment

Medicare: more major / extreme risk

Private insurance: lower share

Why it matters

From healthcare burden to product questions.

My first goal was simple: take real healthcare data and turn it into analytical observations that could be useful in practice, for example to healthcare companies, insurers, or medical organizations examining diagnosis, cost, and risk patterns.

The hospital discharge dataset interested me because it was based on real 2021 inpatient data, not a synthetic training file. I wanted to see whether cost, length of stay, payer mix, and provider-level variation could reveal patterns that might later be useful for practical healthtech or healthcare analytics questions.

Analytical questions

What the dashboard explores.

The analysis is descriptive. It is designed for exploration and hypothesis generation, not causal claims.

Cost pressure

Which diagnosis groups are associated with unusually high median costs and charges?

Length of stay

Where do long stays suggest care coordination, discharge, or aftercare challenges?

Payer and risk mix

How do cost, payer groups, severity, and mortality-risk patterns differ across segments?

Provider variation

Where do hospital-level patterns reveal operational heterogeneity worth investigating further?
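As an illustration of the provider-variation question, here is a minimal Pandas sketch that compares each hospital's median cost with the typical hospital-level median. The column names (`hospital`, `cost`), the `min_discharges` threshold, and the toy data are assumptions made for this example; the dashboard itself was built in Power BI, and this is not its actual logic.

```python
import pandas as pd


def provider_variation(df: pd.DataFrame, min_discharges: int = 30) -> pd.DataFrame:
    """Rank hospitals by median cost relative to the typical hospital median.

    Assumes hypothetical columns `hospital` and `cost`.
    """
    per_hospital = df.groupby("hospital").agg(
        discharges=("cost", "size"),
        median_cost=("cost", "median"),
    )
    # Drop low-volume hospitals so that a handful of cases does not dominate.
    per_hospital = per_hospital[per_hospital["discharges"] >= min_discharges].copy()
    # "Typical" here is the median of the hospital-level medians.
    typical = per_hospital["median_cost"].median()
    per_hospital["ratio_to_typical"] = per_hospital["median_cost"] / typical
    return per_hospital.sort_values("ratio_to_typical", ascending=False)


# Toy data, invented for illustration only.
toy = pd.DataFrame(
    {
        "hospital": ["H1", "H1", "H1", "H2", "H2", "H2", "H3"],
        "cost": [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 1000.0],
    }
)
variation = provider_variation(toy, min_discharges=2)
print(variation)
```

A ratio well above or below 1 is only a prompt for further investigation. Without restricting the comparison to a single diagnosis group or severity level, it largely reflects case mix and hospital specialization.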

Data and method

Public real-world data, cleaned for analysis.

The dataset is the 2021 SPARCS de-identified inpatient discharge file from New York State. It contains discharge-level detail on patient characteristics, diagnoses, services, and charges without protected health information.

Source

SPARCS 2021

Hospital Inpatient Discharges, de-identified, New York State Department of Health.

Cleaning

Python and Pandas

Reduced the raw file from 32 columns to 14 analysis-ready fields.
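A minimal sketch of what this kind of reduction could look like in Pandas. The column names follow the style of the public SPARCS file but should be treated as assumptions, and the seven fields kept here are illustrative only; they are not the 14 fields retained in the project. The handling of top-coded stays and comma-formatted amounts is likewise an assumption about how the raw values may be stored.

```python
import pandas as pd

# Hypothetical subset of fields to keep (assumed names, not the project's list).
KEEP_COLUMNS = [
    "Facility Name",
    "Length of Stay",
    "CCSR Diagnosis Description",
    "APR Risk of Mortality",
    "Payment Typology 1",
    "Total Charges",
    "Total Costs",
]


def clean_discharges(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep selected columns and coerce numeric fields stored as text."""
    df = raw[KEEP_COLUMNS].copy()
    # Length of stay may be top-coded as text such as "120 +"; keep the digits.
    df["Length of Stay"] = pd.to_numeric(
        df["Length of Stay"].astype(str).str.extract(r"(\d+)", expand=False),
        errors="coerce",
    )
    # Monetary fields may carry currency symbols or thousands separators.
    for col in ("Total Charges", "Total Costs"):
        df[col] = pd.to_numeric(
            df[col].astype(str).str.replace(r"[$,]", "", regex=True),
            errors="coerce",
        )
    # Drop rows where a numeric field could not be parsed.
    return df.dropna(subset=["Length of Stay", "Total Charges", "Total Costs"])


# Toy rows, invented for illustration only.
raw = pd.DataFrame(
    {
        "Facility Name": ["Hospital A", "Hospital B", "Hospital C"],
        "Length of Stay": ["3", "120 +", "5"],
        "CCSR Diagnosis Description": ["Dx 1", "Dx 2", "Dx 1"],
        "APR Risk of Mortality": ["Minor", "Extreme", "Moderate"],
        "Payment Typology 1": ["Medicare", "Medicaid", "Private Health Insurance"],
        "Total Charges": ["12,000.00", "450,000.50", None],
        "Total Costs": ["4,000.00", "150,000.25", "2,500.00"],
        "Unused Column": ["x", "y", "z"],
    }
)
clean = clean_discharges(raw)
print(clean.dtypes)
```

Dropping unparseable rows is the simplest choice for a sketch. On the real file, it would be worth counting how many rows are lost before accepting that rule.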

BI model

Power BI

Built the analytical model and measures for cost, charges, length of stay, payer comparison, and risk segmentation.

Dashboard

Interactive report

Designed an interactive dashboard focused on burden, provider variation, and financial signals.

Selected signals

Examples from the analysis.

These are descriptive signals from the dashboard, not medical or policy conclusions.

High-cost diagnosis signal: the "Foreign body entering opening" diagnosis group showed a median cost of approximately $90K.

Long-stay signal: maltreatment and abuse-related cases showed an average stay of 37 days.

Financial signal: several service lines showed charges around 3-3.5x actual care costs.

Payer and risk signal: Medicare discharges showed higher average costs and a larger share of major or extreme mortality risk than private insurance patients in this dataset view.
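The signals above are simple grouped summaries. As a rough Pandas equivalent, assuming hypothetical column names (`diagnosis`, `payer`, `mortality_risk`, `cost`, `charge`, `los`) and toy data, they could be computed as follows. The dashboard measures were built in Power BI, so this is a sketch of the same idea rather than the project's implementation. The charge-to-cost ratio here is total charges divided by total costs per group, which is one of several reasonable definitions.

```python
import pandas as pd


def diagnosis_signals(df: pd.DataFrame) -> pd.DataFrame:
    """Median cost, mean length of stay, and charge-to-cost ratio per diagnosis."""
    out = df.groupby("diagnosis").agg(
        discharges=("cost", "size"),
        median_cost=("cost", "median"),
        mean_los=("los", "mean"),
        total_charge=("charge", "sum"),
        total_cost=("cost", "sum"),
    )
    # Assumed definition: total charges over total costs within the group.
    out["charge_to_cost"] = out["total_charge"] / out["total_cost"]
    return out.sort_values("median_cost", ascending=False)


def high_risk_share(df: pd.DataFrame) -> pd.Series:
    """Share of discharges per payer with Major or Extreme mortality risk."""
    is_high = df["mortality_risk"].isin(["Major", "Extreme"])
    return is_high.groupby(df["payer"]).mean()


# Toy data, invented for illustration only.
toy = pd.DataFrame(
    {
        "diagnosis": ["A", "A", "A", "B", "B"],
        "payer": ["Medicare", "Private", "Medicare", "Medicare", "Private"],
        "mortality_risk": ["Major", "Minor", "Minor", "Extreme", "Moderate"],
        "cost": [100.0, 200.0, 300.0, 1000.0, 3000.0],
        "charge": [300.0, 600.0, 900.0, 3500.0, 10500.0],
        "los": [2, 4, 6, 10, 20],
    }
)
signals = diagnosis_signals(toy)
share = high_risk_share(toy)
print(signals)
print(share)
```

Medians are used for cost because a few very expensive stays can pull a mean far from the typical case. The same concern applies to length of stay if the mean is reported on its own.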

Limitations

What this project does not claim.

This is a descriptive analysis, not a causal study.
The dataset covers New York State inpatient discharges in 2021.
Patterns may reflect coding, case mix, hospital specialization, and other contextual factors.
The dashboard is for exploration, not medical, reimbursement, or policy decisions.