HMIS Data Quality Assessment Workflow

Systematic evaluation of facility-level monthly health data quality

INPUT: Monthly Facility Data

facility_id, period_id, indicator, count

Raw health facility reporting data aggregated at monthly level across all indicators

1. Data loading and preparation

  • Load raw HMIS data from CSV file
  • Convert period_id to date format for time series ordering
  • Detect available admin area columns in dataset
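
The loading step above can be sketched as follows. This is a minimal illustration, not the workflow's actual code: it assumes period_id is formatted as YYYYMM, and the admin area column names in ADMIN_AREA_CANDIDATES are hypothetical placeholders, since the source does not name them.

```python
import csv
import io
from datetime import date

# Hypothetical candidate names; the source does not list the admin area columns.
ADMIN_AREA_CANDIDATES = ["admin_area_1", "admin_area_2", "admin_area_3"]

def parse_period(period_id):
    """Convert a YYYYMM period_id (assumed format) to the first day of that month."""
    return date(int(period_id[:4]), int(period_id[4:6]), 1)

def load_hmis(handle):
    """Read rows, parse period and count, and detect which admin columns exist."""
    reader = csv.DictReader(handle)
    admin_cols = [c for c in ADMIN_AREA_CANDIDATES if c in (reader.fieldnames or [])]
    rows = []
    for raw in reader:
        row = dict(raw)
        row["period"] = parse_period(row["period_id"])
        row["count"] = float(row["count"])
        rows.append(row)
    # Order each facility-indicator series chronologically.
    rows.sort(key=lambda r: (r["facility_id"], r["indicator"], r["period"]))
    return rows, admin_cols

# In-memory sample standing in for the raw CSV file.
sample = io.StringIO(
    "facility_id,admin_area_2,period_id,indicator,count\n"
    "F001,District A,202302,anc1,41\n"
    "F001,District A,202301,anc1,38\n"
)
rows, admin_cols = load_hmis(sample)
```

With a real file, the same function would be called on an open file handle instead of the in-memory sample.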
2. Outlier detection

  • Calculate the median and median absolute deviation (MAD) for each facility-indicator combination
  • Flag MAD-based outliers: values more than 10 × MAD away from the median
  • Flag proportional outliers: single month >80% of annual volume
  • Final flag: (MAD outlier OR proportional outlier) AND count >100
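
The outlier rules above could be combined roughly as below. This sketch makes several assumptions that the source does not confirm: MAD is the unscaled median of absolute deviations, "annual volume" means the calendar-year total for the same facility and indicator, and the MAD rule is skipped when MAD is zero. The function name and record layout are illustrative.

```python
from collections import defaultdict
from statistics import median

MAD_MULTIPLIER = 10      # from the workflow: more than 10 x MAD from the median
PROPORTION_LIMIT = 0.8   # from the workflow: more than 80% of annual volume
MIN_COUNT = 100          # from the workflow: count > 100

def flag_outliers(records):
    """records: list of (facility_id, indicator, year, month, count) tuples.
    Returns a dict mapping (facility_id, indicator, year, month) to a bool flag."""
    series = defaultdict(list)   # all counts per facility-indicator
    annual = defaultdict(float)  # calendar-year totals (assumed definition)
    for fac, ind, year, month, count in records:
        series[(fac, ind)].append(count)
        annual[(fac, ind, year)] += count

    flags = {}
    for fac, ind, year, month, count in records:
        values = series[(fac, ind)]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        # Assumption: with MAD == 0 the MAD rule is not applied.
        mad_outlier = mad > 0 and abs(count - med) > MAD_MULTIPLIER * mad
        total = annual[(fac, ind, year)]
        prop_outlier = total > 0 and count / total > PROPORTION_LIMIT
        flags[(fac, ind, year, month)] = (mad_outlier or prop_outlier) and count > MIN_COUNT
    return flags

# Twelve months for one facility-indicator, with a spike in October.
counts = [50, 52, 48, 51, 49, 50, 53, 47, 50, 900, 51, 49]
records = [("F001", "penta1", 2023, m, c) for m, c in enumerate(counts, start=1)]
flags = flag_outliers(records)
```

In this sample only the October value is flagged: it lies far beyond 10 × MAD from the median and exceeds the minimum count of 100.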
3. Completeness assessment

  • Generate complete monthly time series for each facility-indicator
  • Flag months with missing reports as incomplete (flag=0)
  • Flag inactive periods: 6 or more missing months before a facility's first report or after its last report
  • Remove inactive periods from analysis (flag=2)
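
A rough sketch of the completeness flags for a single facility-indicator series follows. The source names only flag=0 and flag=2; using flag=1 for a reported month is an assumption here, as is taking the analysis window from explicit start and end months. Function names are illustrative.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    y, m = start
    while (y, m) <= end:
        yield (y, m)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

def completeness_flags(reported, start, end, inactive_run=6):
    """reported: set of (year, month) with a report for one facility-indicator.
    Returns {(year, month): flag} with 1 = reported (assumed code),
    0 = missing, 2 = inactive period."""
    months = list(month_range(start, end))
    flags = {ym: (1 if ym in reported else 0) for ym in months}
    idx = [i for i, ym in enumerate(months) if ym in reported]
    if not idx:
        return flags  # no reports in the window; left unchanged in this sketch
    leading = months[:idx[0]]        # missing months before the first report
    trailing = months[idx[-1] + 1:]  # missing months after the last report
    for gap in (leading, trailing):
        if len(gap) >= inactive_run:
            for ym in gap:
                flags[ym] = 2
    return flags

reported = {(2023, 8), (2023, 9), (2023, 11), (2023, 12),
            (2024, 1), (2024, 2), (2024, 3), (2024, 4)}
flags = completeness_flags(reported, start=(2023, 1), end=(2024, 6))
# Months carrying flag 2 would be dropped before further analysis.
active = {ym: f for ym, f in flags.items() if f != 2}
```

Here the seven months before the first report are treated as inactive, while the two missing months after the last report stay flagged as incomplete because the gap is shorter than six months.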
4. Consistency analysis

  • Aggregate to district level (accounts for patient movement)
  • Calculate indicator pair ratios and check them against thresholds: Penta1/Penta3 ≥ 0.95, ANC1/ANC4 ≥ 0.95
  • Apply BCG/Delivery benchmark: ratio between 0.7 and 1.3
  • Expand district consistency results to all facilities in that area
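
The district-level checks above might look like the sketch below. The indicator identifiers (penta1, bcg, delivery, and so on), the "district" field name, and the choice to aggregate per district and period are assumptions for illustration. Pairs with a missing or zero denominator are simply left unassessed here.

```python
from collections import defaultdict

# name: (numerator, denominator, lower bound, upper bound); None = no upper bound.
# Thresholds come from the workflow; the indicator identifiers are hypothetical.
PAIR_RULES = {
    "penta": ("penta1", "penta3", 0.95, None),
    "anc": ("anc1", "anc4", 0.95, None),
    "bcg_delivery": ("bcg", "delivery", 0.7, 1.3),
}

def district_consistency(rows):
    """rows: dicts with facility_id, district, period_id, indicator, count."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["district"], r["period_id"], r["indicator"])] += r["count"]
    results = {}
    for district, period in {(d, p) for d, p, _ in totals}:
        for name, (num, den, lo, hi) in PAIR_RULES.items():
            n = totals.get((district, period, num))
            d = totals.get((district, period, den))
            if n is None or not d:
                continue  # ratio undefined; left unassessed in this sketch
            ratio = n / d
            ok = ratio >= lo and (hi is None or ratio <= hi)
            results[(district, period, name)] = (ratio, ok)
    return results

def expand_to_facilities(rows, results):
    """Copy each district-level result to every facility in that district."""
    facilities = {(r["facility_id"], r["district"]) for r in rows}
    return {
        (fac, period, name): ok
        for (district, period, name), (_, ok) in results.items()
        for fac, fac_district in facilities
        if fac_district == district
    }

def row(fac, ind, count):
    return {"facility_id": fac, "district": "A", "period_id": "202301",
            "indicator": ind, "count": count}

rows = [row("F001", "penta1", 40), row("F002", "penta1", 60),
        row("F001", "penta3", 55), row("F002", "penta3", 40),
        row("F001", "bcg", 30), row("F002", "bcg", 20),
        row("F001", "delivery", 50), row("F002", "delivery", 50)]
results = district_consistency(rows)
facility_flags = expand_to_facilities(rows, results)
```

In the sample, facility F001 alone reports fewer Penta1 than Penta3 doses, yet the district total passes the Penta check. This is the situation that aggregating to district level is meant to handle.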
5. DQA scoring

  • Filter to core DQA indicators
  • Check: complete, no outliers, consistent
  • DQA score = 1 if all checks pass
  • DQA score = 0 if any check fails
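
The scoring rule above can be illustrated with a short sketch. The source does not list the core DQA indicators or say at which level checks are combined; the CORE_INDICATORS set, the field names, and scoring per facility and period are assumptions made for this example only.

```python
# Hypothetical core indicator set; the source does not enumerate it.
CORE_INDICATORS = {"penta1", "anc1"}

def dqa_scores(records, core=CORE_INDICATORS):
    """records: dicts with facility_id, period_id, indicator,
    completeness_flag (1 = reported), outlier_flag (bool), consistent (bool).
    Returns {(facility_id, period_id): 0 or 1}."""
    passed = {}
    for r in records:
        if r["indicator"] not in core:
            continue  # filter to core DQA indicators
        key = (r["facility_id"], r["period_id"])
        ok = (r["completeness_flag"] == 1
              and not r["outlier_flag"]
              and r["consistent"])
        # A single failed check on any core indicator sets the score to 0.
        passed[key] = passed.get(key, True) and ok
    return {key: int(ok) for key, ok in passed.items()}

records = [
    {"facility_id": "F001", "period_id": "202301", "indicator": "penta1",
     "completeness_flag": 1, "outlier_flag": False, "consistent": True},
    {"facility_id": "F001", "period_id": "202301", "indicator": "anc1",
     "completeness_flag": 1, "outlier_flag": False, "consistent": True},
    {"facility_id": "F002", "period_id": "202301", "indicator": "penta1",
     "completeness_flag": 1, "outlier_flag": True, "consistent": True},
    {"facility_id": "F002", "period_id": "202301", "indicator": "opd",
     "completeness_flag": 0, "outlier_flag": False, "consistent": True},
]
scores = dqa_scores(records)
```

F001 passes every check on both core indicators and scores 1. F002 scores 0 because of the outlier on penta1, and its non-core "opd" record is ignored.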

OUTPUTS

M1_output_outlier_list.csv (flagged outliers only)
M1_output_outliers.csv (all records with flags)
M1_output_completeness.csv (completeness status)
M1_output_consistency_geo.csv (district-level)
M1_output_consistency_facility.csv (facility-level)
M1_output_dqa.csv (final DQA scores)