Coverage estimates
Background and purpose
Objective of the module
The Coverage Estimates module quantifies health service coverage by integrating adjusted administrative service volumes from the Health Management Information System (HMIS), population projections from the United Nations World Population Prospects (UN WPP), and household survey data. While the module currently draws on Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS), it is designed to accommodate other nationally representative survey sources as they become available. The module estimates the share of the target population that received a given health service, providing a standardized measure of service reach for use in monitoring, comparison, and downstream analysis. The module is structured in two components.
Part 1 constructs target population denominators using multiple methodological approaches and evaluates their performance by comparing resulting coverage estimates with available survey reference values for each health indicator.
Part 2 allows users to review and adjust denominator selections based on programmatic considerations and to extend survey-based coverage estimates over time using trends derived from administrative data, where survey data are not available.
Together, these components convert administrative service volumes into standardized coverage estimates that can be examined over time and across geographic levels, and used in analytical and monitoring contexts.
Analytical rationale
Health service coverage is a core metric for assessing health system performance and equity. While Module 2 produces adjusted service volumes, these figures on their own do not indicate the extent to which services reach the populations they are intended to serve. Coverage estimates place service delivery in context by relating service volumes to population need.
This module addresses key challenges in estimating coverage, including:
- Multiple data sources: integrates HMIS data with household survey data
- Denominator uncertainty: different methods for estimating target populations may yield different results, so the module systematically evaluates the options
- Temporal gaps: surveys occur every 3-5 years, so the module projects estimates for intervening years using administrative trends
- Subnational analysis: enables coverage monitoring at national, provincial, and district levels
Key points
| Component | Details |
|---|---|
| Inputs | M2_adjusted_data (national and subnational) from Module 2; survey data (MICS/DHS) from the GitHub repository; population data (UN WPP) from the GitHub repository |
| Outputs | M4_denominators (national, admin2, admin3): calculated target populations; M4_combined_results (national, admin2, admin3): coverage estimates with all denominators; M5_coverage_estimation (national, admin2, admin3): final coverage with projections |
| Purpose | Estimate health service coverage by comparing service volumes to target populations, validated against survey benchmarks |
Part 1 and Part 2 explained
Part 1: Denominator calculation and selection
- Calculates target populations (denominators) using multiple approaches: HMIS-based (from ANC1, delivery, BCG, Penta1) and population-based (UN WPP)
- Compares coverage estimates from each denominator against survey data
- Automatically selects the "best" denominator for each indicator by minimizing error against the survey
- Outputs: denominator datasets and combined results showing all options
Part 2: Denominator selection and survey projection
- Allows users to override automatic selections and choose specific denominators
- Calculates year-over-year coverage trends from administrative data
- Projects survey estimates forward using HMIS trends to fill temporal gaps
- Outputs: final coverage estimates combining HMIS, survey, and projected values
Analytical workflow
Overview of analytical steps
Part 1: Denominator calculation and selection
Step 1: Load and prepare data sources. The module begins by loading three data sources and ensuring they are compatible. HMIS data is aggregated from monthly to annual totals. Survey data is harmonized (DHS prioritized over MICS) and forward-filled to create continuous time series. Population data is filtered to the target country.
Step 2: Calculate multiple denominator options. For each health indicator, the module calculates several possible target populations:
- Service-based denominators: HMIS volumes divided by survey coverage (e.g., if 10,000 women received ANC1 and the survey reports 80% ANC1 coverage, estimated pregnancies = 10,000 / 0.80 = 12,500)
- Population-based denominators: UN WPP population projections and crude birth rates
Each denominator is then adjusted for demographic factors (pregnancy loss, stillbirths, mortality rates) to match the indicator's target age group.
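The service-based calculation in Step 2 reduces to one division, and coverage is the reverse division. The base-R sketch below illustrates both, assuming survey coverage is supplied as a proportion (0 to 1); the helper names `service_based_denominator()` and `coverage_pct()` are hypothetical and are not functions of the module.

```r
# Hypothetical helpers (not part of the module's code).
# Service-based denominator: HMIS volume divided by survey coverage,
# where survey_coverage is a proportion in (0, 1].
service_based_denominator <- function(count, survey_coverage) {
  if (any(survey_coverage <= 0 | survey_coverage > 1, na.rm = TRUE)) {
    stop("survey_coverage must be a proportion in (0, 1]")
  }
  count / survey_coverage
}

# Coverage (%) of a service volume against a chosen denominator.
coverage_pct <- function(service_count, denominator) {
  100 * service_count / denominator
}

# ANC1 example from the text: 10,000 ANC1 visits, 80% survey coverage.
pregnancies <- service_based_denominator(10000, 0.80)  # 12,500 pregnancies
coverage_pct(10000, pregnancies)                       # back to 80 (%)
```

Recovering exactly 80% in the last line is the circularity that `classify_source_type()` guards against: a denominator built from an indicator's own count always reproduces that indicator's survey value.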
Step 3: Calculate coverage for each denominator. The module computes coverage by dividing the service volume by each denominator option. This produces multiple coverage estimates per indicator, each based on a different population assumption.
Step 4: Compare to survey benchmarks. Each coverage estimate is compared to survey data using the sum of squared errors. The survey serves as the benchmark because it is based on representative household sampling.
Step 5: Select the best denominator. Among the eligible options, the denominator producing the lowest error (closest match to the survey) is automatically selected as "best." HMIS-based denominators take priority over population projections, so that coverage estimates are driven by observed service delivery wherever possible.
Step 6: Generate outputs. The module saves denominator datasets for transparency and combined results files showing coverage from all denominators plus the selected best option.
Step 7: Repeat for subnational levels. If subnational data are available, the process repeats for administrative level 2 (e.g., provinces) and level 3 (e.g., districts), with fallback mechanisms to handle missing local survey data.
Part 2: Denominator selection and survey projection
Step 1: User configuration. Users review Part 1 results and configure denominator selections for each indicator, either keeping the automatic "best" selection or overriding it with a specific denominator based on programmatic knowledge.
Step 2: Filter to selected denominators. The module filters Part 1's combined results to include only the user-selected denominators, creating a focused dataset for analysis.
Step 3: Calculate coverage trends. Year-over-year changes (deltas) in HMIS-based coverage are calculated, showing whether coverage is increasing, decreasing, or stable over time.
Step 4: Identify survey baseline. For each geographic area and indicator, the most recent survey observation is identified as the baseline anchor point for projections.
Step 5: Project survey estimates forward. The module extends survey coverage estimates into years without surveys by applying HMIS trends. For a year \(t\) after the most recent survey year \(t_0\):
\[ \hat{S}_t = S_{t_0} + \left( C_t - C_{t_0} \right) \]
where \(S_{t_0}\) is the last survey value and \(C_t\) is HMIS-based coverage (using the selected denominator) in year \(t\). This preserves the survey calibration while incorporating observed trends.
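The projection rule in Step 5 can be sketched for a single area and indicator as follows. This is a base-R illustration, assuming the HMIS-based and survey coverage vectors are aligned by year (oldest first) and expressed in percent; the function name `project_survey()` is hypothetical.

```r
# Hypothetical sketch of the Part 2 projection for one area x indicator.
# hmis_coverage, survey_coverage: numeric vectors aligned by year (oldest first);
# survey_coverage is NA in years without a survey.
project_survey <- function(hmis_coverage, survey_coverage) {
  stopifnot(length(hmis_coverage) == length(survey_coverage))
  observed <- which(!is.na(survey_coverage))
  if (length(observed) == 0) stop("no survey observation to anchor on")
  anchor <- max(observed)                       # most recent survey year
  projected <- rep(NA_real_, length(hmis_coverage))
  later <- seq_along(hmis_coverage) > anchor    # years after the anchor
  # Last survey value + (HMIS coverage in year t - HMIS coverage in survey year)
  projected[later] <- survey_coverage[anchor] +
    (hmis_coverage[later] - hmis_coverage[anchor])
  projected
}

# Years 2018-2022; survey in 2019 only.
hmis   <- c(70, 72, 75, 78, 80)
survey <- c(NA, 74, NA, NA, NA)
project_survey(hmis, survey)  # NA NA 77 80 82
```

Projected values are produced only for years after the anchor; if HMIS coverage is missing in a later year, the projection for that year stays `NA`.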
Step 6: Combine all estimates. The final output merges three types of estimates:
- HMIS-based coverage: direct calculation from service volumes and selected denominators
- Original survey values: actual household survey observations
- Projected survey coverage: survey estimates extended using HMIS trends
Step 7: Save final outputs. Results are saved with standardized column structures for each administrative level, ready for visualization and reporting.
Workflow diagram
Key decision points
1. Selection of denominators
In Part 1, the module automatically selects denominator options based on their alignment with available survey reference values. In Part 2, users may review and override these selections based on programmatic knowledge or analytical priorities. The choice of denominator determines whether coverage estimates are primarily anchored to observed service delivery patterns (HMIS-based denominators) or to demographic projections (population-based denominators).
2. Treatment of gaps between surveys
Household surveys are conducted at irregular intervals, typically every three to five years. In Part 1, survey values are forward-filled between survey years, implicitly assuming constant coverage until the next survey observation. In Part 2, coverage is projected forward using trends derived from HMIS data, allowing changes in service delivery to be reflected in periods without survey data.
3. Use of national versus subnational survey data
For immunization indicators only, when subnational survey estimates are not available, the module applies national survey values to subnational units as a fallback. This approach assumes that national immunization coverage rates are broadly representative at subnational levels, an assumption that may not hold in all settings. This fallback mechanism is not applied to other indicators, such as maternal or child health services, for which subnational analysis requires locally observed survey data.
4. Adjustment of denominators for target populations
Each health indicator corresponds to a specific target population (for example, pregnant women for antenatal care or infants for childhood vaccination). The module applies sequential demographic adjustments—such as pregnancy loss, stillbirths, and mortality—to align denominators with the relevant target population for each indicator.
Data processing and outputs
Input integration
The module integrates three primary data sources: annualized HMIS service volumes aggregated by geographic unit; household survey coverage estimates harmonized across survey rounds and forward-filled to create continuous time series; and population projections filtered to extract age- and sex-specific populations relevant to each health indicator.
Denominator construction
Using the relationship between reported HMIS service volumes and survey-based coverage estimates, the module derives HMIS-implied denominators representing the population size consistent with observed service delivery and survey coverage levels. These denominators are further adjusted to reflect indicator-specific target populations through sequential demographic corrections, including pregnancy loss, stillbirths, and mortality.
Coverage calculation
Multiple coverage estimates are calculated by dividing service volumes by alternative denominator options, including population-based, HMIS-implied, and hybrid approaches. Each coverage estimate is evaluated against survey reference values to assess plausibility and to inform denominator selection for each indicator.
Temporal projection
For years beyond the most recent survey observation, coverage estimates are projected forward by combining the last observed survey value with trends derived from HMIS data.
Analysis outputs and visualization
The FASTR analysis generates coverage estimate visualizations at multiple geographic levels:
1. Coverage calculated from HMIS data (national)
National-level coverage trends comparing HMIS-derived estimates against survey benchmarks.

2. Coverage calculated from HMIS data (admin area 2)
Coverage patterns at an intermediate subnational level (admin_area_2), highlighting geographic variation in service delivery across regions.

3. Coverage calculated from HMIS data (admin area 3)
Coverage estimates at a finer subnational level (admin_area_3), supporting more localized monitoring and identification of subnational disparities.

Interpretation guide
For all coverage charts (outputs 1–3):
- Black line/points: Survey-based coverage (DHS/MICS) — the reference standard
- Grey line/points: HMIS-based coverage calculated from facility data
- Red line/points: Projected coverage extending survey estimates using HMIS trends
- Y-axis: Coverage percentage (0–100%)
- X-axis: Time period (years)
Geographic levels:
- Output 1: National-level trends
- Output 2: Admin area 2 (regional/provincial) breakdown
- Output 3: Admin area 3 (district) breakdown for local targeting
Detailed reference
Part 1: Denominator calculation (technical details)
Configuration parameters
The module begins with several configurable parameters that control the analysis:
COUNTRY_ISO3 <- "ISO3" # ISO3 country code (e.g., "RWA", "UGA", "ZMB")
SELECTED_COUNT_VARIABLE <- "count_final_both" # Which adjusted count to use
ANALYSIS_LEVEL <- "NATIONAL_PLUS_AA2" # Geographic scope
Analysis level options:
- `NATIONAL_ONLY`: national-level analysis only
- `NATIONAL_PLUS_AA2`: national + administrative area 2 (e.g., provinces)
- `NATIONAL_PLUS_AA2_AA3`: national + admin area 2 + admin area 3 (e.g., districts)
Demographic adjustment rates:
PREGNANCY_LOSS_RATE <- 0.03 # 3% pregnancy loss
TWIN_RATE <- 0.015 # 1.5% twin births
STILLBIRTH_RATE <- 0.02 # 2% stillbirths
P1_NMR <- 0.039 # Neonatal mortality rate
P2_PNMR <- 0.028 # Post-neonatal mortality rate
INFANT_MORTALITY_RATE <- 0.063 # Infant mortality rate
UNDER5_MORTALITY_RATE <- 0.103 # Under-5 mortality rate
Count variable options:
- `count_final_none`: no adjustments (raw reported data)
- `count_final_outlier`: outlier adjustment only
- `count_final_completeness`: completeness adjustment only
- `count_final_both`: both adjustments (recommended)
Input data sources
Part 1 integrates three primary data sources:
1. HMIS Adjusted Data (from Module 2)
- National: `M2_adjusted_data_national.csv`
- Subnational: `M2_adjusted_data_admin_area.csv`
- Contains service volumes by indicator, area, and time period
2. Survey Data (DHS/MICS)
- Source: GitHub repository (unified survey dataset)
- Provides coverage benchmarks for comparison
- DHS data prioritized over MICS when both available
3. Population Data (UN WPP)
- Source: GitHub repository
- Provides population-based denominators
- Includes total population, births, under-1, and under-5 populations
Additional data context:
Population projections (UN WPP): Sourced from the United Nations World Population Prospects, these estimates provide age-specific and total population figures used to calculate denominators for coverage estimates. The projections account for demographic trends, including fertility, mortality, and migration.
Survey data (MICS): MICS, implemented by national statistical offices with support from UNICEF, provide household survey-based estimates for key health indicators, including coverage of maternal and child health services.
Survey data (DHS): DHS, funded by USAID, provide survey data on health service utilization, including immunization rates and maternal care coverage.
Core functions documentation
process_hmis_adjusted_volume()
Purpose: Prepares HMIS data for denominator calculation
Input:
- Adjusted volume data from Module 2
- Selected count variable (e.g., `count_final_both`)
Processing:
- Aggregates monthly data to annual totals
- Counts number of reporting months per year
- Pivots data to wide format (one column per indicator)
Output:
- `annual_hmis`: annual service counts by area and year
- `hmis_countries`: list of countries in the dataset
- `hmis_iso3`: ISO3 code(s) present
Example structure:
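The three processing steps of `process_hmis_adjusted_volume()` (annual totals, months reported, wide pivot) can be approximated in base R as below. This is a sketch, not the module's code: the helper name `aggregate_hmis_annual()` and the long input layout (`admin_area_1`, `year`, `month`, `indicator_common_id`, plus the selected count column) are assumptions.

```r
# Hypothetical sketch: monthly long-format HMIS counts -> annual wide table.
aggregate_hmis_annual <- function(monthly, count_col = "count_final_both") {
  # 1. Annual totals per area x year x indicator
  totals <- aggregate(monthly[[count_col]],
                      by = list(admin_area_1 = monthly$admin_area_1,
                                year = monthly$year,
                                indicator_common_id = monthly$indicator_common_id),
                      FUN = sum, na.rm = TRUE)
  names(totals)[names(totals) == "x"] <- "count"

  # 2. Number of distinct reporting months per area x year
  months <- aggregate(monthly$month,
                      by = list(admin_area_1 = monthly$admin_area_1,
                                year = monthly$year),
                      FUN = function(m) length(unique(m)))
  names(months)[names(months) == "x"] <- "nummonth"

  # 3. Pivot to one column per indicator (missing combinations become NA)
  wide <- reshape(totals, idvar = c("admin_area_1", "year"),
                  timevar = "indicator_common_id", direction = "wide")
  names(wide) <- sub("^count\\.", "", names(wide))
  merge(wide, months, by = c("admin_area_1", "year"))
}

monthly <- data.frame(
  admin_area_1 = "Country",
  year = c(2020, 2020, 2020, 2020, 2021),
  month = c(1, 2, 1, 2, 1),
  indicator_common_id = c("anc1", "anc1", "penta1", "penta1", "anc1"),
  count_final_both = c(100, 110, 90, 95, 120)
)
aggregate_hmis_annual(monthly)
```

In this toy input, 2021 has a single reporting month and no Penta1 data, so its `nummonth` is 1 and its `penta1` column is `NA`; `nummonth` is what later drives the partial-year scaling of population-based denominators.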
process_survey_data()
Purpose: Harmonizes and extends survey data for use as coverage benchmarks
Input:
- Survey data (DHS/MICS)
- HMIS country names and ISO3 codes
- Optional national reference (for subnational fallback)
Key processing steps:
- Harmonization
  - Recodes indicator names (e.g., `polio1` → `opv1`, `vitamina` → `vitaminA`)
  - Normalizes source labels (`dhs`, `mics`)
  - Filters by country and date range
- Source prioritization
  - When both DHS and MICS exist for the same year, area, and indicator, DHS is selected
  - Preserves source details for transparency
- Fallback logic
  - If `sba` is missing, uses `delivery` values
  - If `pnc1_mother` is missing, uses `pnc1` values
  - Subnational areas use national values when local data are unavailable (BCG, Penta1, and Penta3 only)
- Forward-filling
  - Creates a complete time series for each area
  - Carries forward the last observed value (`na.locf`)
  - Creates "carry" columns (e.g., `anc1carry`, `bcgcarry`)

Output:
- `carried`: extended survey data with forward-filled values
- `raw`: raw survey observations (wide format)
- `raw_long`: raw survey observations (long format) with source details
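The source-prioritization step can be illustrated with a short base-R sketch. It assumes a long survey table with columns `admin_area`, `indicator`, `year`, `source`, and `value`; the function name `prioritize_survey_source()` and these column names are illustrative assumptions rather than the module's actual schema.

```r
# Hypothetical sketch: keep one survey value per area x indicator x year,
# preferring DHS over MICS (any other source ranks last).
prioritize_survey_source <- function(survey) {
  survey$source <- tolower(as.character(survey$source))   # normalize labels
  src_rank <- match(survey$source, c("dhs", "mics"), nomatch = 3L)
  survey <- survey[order(survey$admin_area, survey$indicator,
                         survey$year, src_rank), ]
  key <- paste(survey$admin_area, survey$indicator, survey$year, sep = "|")
  survey[!duplicated(key), ]   # first row per key is the preferred source
}

survey <- data.frame(
  admin_area = "Country", indicator = "anc1",
  year   = c(2018, 2018, 2021),
  source = c("MICS", "DHS", "MICS"),
  value  = c(82, 84, 88)
)
prioritize_survey_source(survey)  # 2018 keeps DHS (84); 2021 keeps MICS (88)
```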
process_national_population_data()
Purpose: Prepares UN WPP population estimates for denominator calculation
Input:
- Population estimates (UN WPP)
- HMIS country identifiers
Processing:
- Filters to national level and target country
- Extracts key population indicators:
  - `crudebr_unwpp`: crude birth rate
  - `poptot_unwpp`: total population
  - `totu1pop_unwpp`: under-1 population

Output:
- `wide`: population indicators in wide format
- `raw_long`: population data in long format with source tracking
calculate_denominators()
Purpose: Calculates all possible denominators from HMIS and population data. This is the core function that generates multiple denominator estimates.
Input:
- `hmis_data`: annual service counts
- `survey_data`: survey reference values (carried forward)
- `population_data`: UN WPP estimates (national only)
Denominator types calculated:
A. Service-Based Denominators (using HMIS numerator ÷ survey coverage):
- From ANC1:
  - `danc1_pregnancy`: estimated pregnancies
  - `danc1_delivery`: estimated deliveries
  - `danc1_birth`: estimated births (live births + stillbirths)
  - `danc1_livebirth`: estimated live births
  - `danc1_dpt`: population eligible for DPT (adjusted for neonatal mortality)
  - `danc1_measles1`: population eligible for MCV1
  - `danc1_measles2`: population eligible for MCV2
- From delivery:
  - `ddelivery_livebirth`, `ddelivery_birth`, `ddelivery_pregnancy`
  - `ddelivery_dpt`, `ddelivery_measles1`, `ddelivery_measles2`
- From SBA (skilled birth attendance), with the same structure as the delivery denominators:
  - `dsba_livebirth`, `dsba_birth`, `dsba_pregnancy`
  - `dsba_dpt`, `dsba_measles1`, `dsba_measles2`
- From BCG (national only):
  - `dbcg_pregnancy`, `dbcg_livebirth`, `dbcg_dpt`
- From Penta1:
  - `dpenta1_dpt`, `dpenta1_measles1`, `dpenta1_measles2`
B. Population-Based Denominators (national only):
- `dwpp_pregnancy`: crude birth rate × total population ÷ (1 + twin rate)
- `dwpp_livebirth`: crude birth rate × total population
- `dwpp_dpt`: under-1 population
- `dwpp_measles1`: under-1 population adjusted for neonatal mortality
- `dwpp_measles2`: further adjusted for post-neonatal mortality
C. Vitamin A and Full Immunization:
For each livebirth denominator, additional denominators are automatically created:
- `d*_vitaminA`: live births × (1 - U5MR) × 4.5 (children aged 6-59 months)
- `d*_fully_immunized`: live births × (1 - IMR)
Adjustment for Incomplete Reporting:
When `nummonth` < 12, population-based denominators are scaled to the reporting period:
\[ D^{\text{adj}} = D \times \frac{\text{nummonth}}{12} \]
Output:
Data frame with all calculated denominators plus original HMIS and survey data
classify_source_type()
Purpose: Categorizes denominators to prevent circular references
Logic:
- `reference_based`: denominator calculated from the same indicator (e.g., `danc1_pregnancy` for ANC1)
- `unwpp_based`: denominator from UN WPP population data
- `independent`: denominator from a different service indicator

Importance:
This classification ensures that, when selecting "best" denominators, the module avoids reference-based denominators. A reference-based denominator is the indicator's own count divided by its survey coverage, so the resulting coverage reproduces the survey value exactly (zero error) by construction and says nothing about how well the denominator performs.
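A minimal sketch of this classification logic, assuming denominator names follow the `d<source>_<target>` pattern shown above (e.g., `danc1_pregnancy`, `dwpp_livebirth`). The function below is a hypothetical re-implementation for illustration and may differ from the module's `classify_source_type()` in edge cases such as SBA versus delivery.

```r
# Hypothetical re-implementation, for illustration only.
# denominator: names like "danc1_pregnancy"; indicator: e.g. "anc1".
classify_source_type_sketch <- function(denominator, indicator) {
  # Source indicator is the text between the leading "d" and the first "_"
  source <- sub("^d([^_]+)_.*$", "\\1", denominator)
  ifelse(source == "wpp", "unwpp_based",
         ifelse(source == indicator, "reference_based", "independent"))
}

classify_source_type_sketch(
  c("danc1_pregnancy", "ddelivery_pregnancy", "dwpp_pregnancy"), "anc1"
)
# "reference_based" "independent" "unwpp_based"
```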
compare_coverage_to_survey()
Purpose: Selects the best-performing denominator for each indicator
Input:
- Coverage estimates from all denominators
- Survey reference values (forward-filled)
Selection algorithm:
- Calculate coverage for each denominator option
- Calculate error by comparing each coverage series with the survey benchmark
- Classify source type: label each denominator as independent, reference-based, or UNWPP
- Apply the selection hierarchy:
  - Priority 1: independent denominators (non-reference, non-UNWPP), choosing the one with the lowest error
  - Priority 2: reference-based denominators (only if no independent denominator is available)
  - Priority 3: UNWPP denominators (last-resort fallback)
- Geographic consistency: the best denominator is selected per geographic area × indicator (not per year)
Output:
Coverage data filtered to only the best-performing denominator for each indicator, with ranking
Key design decision:
- UNWPP denominators excluded from "best" selection by default
- Prevents over-reliance on population projections
- Ensures HMIS data drives coverage when available
- UNWPP used only when no HMIS-based options exist
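The selection hierarchy can be sketched as a small function that operates on one geographic area × indicator at a time. It assumes each candidate denominator already has a source type and a total squared error summed over years; the names `select_best_denominator()`, `source_type`, and `error` are hypothetical.

```r
# Hypothetical sketch of the Part 1 selection hierarchy for one area x indicator.
# options: data frame with columns denominator, source_type, error.
select_best_denominator <- function(options) {
  priority <- c(independent = 1L, reference_based = 2L, unwpp_based = 3L)
  type <- as.character(options$source_type)
  keep <- type %in% names(priority) & !is.na(options$error)
  options <- options[keep, , drop = FALSE]
  if (nrow(options) == 0) return(NA_character_)
  # Restrict to the highest-priority class present, then take the lowest error
  prio <- unname(priority[type[keep]])
  top <- options[prio == min(prio), , drop = FALSE]
  as.character(top$denominator[which.min(top$error)])
}

opts <- data.frame(
  denominator = c("danc1_pregnancy", "ddelivery_pregnancy",
                  "dbcg_pregnancy", "dwpp_pregnancy"),
  source_type = c("reference_based", "independent", "independent", "unwpp_based"),
  error       = c(0, 40, 25, 5)
)
select_best_denominator(opts)  # "dbcg_pregnancy"
```

Here `dwpp_pregnancy` has a smaller error than either independent option but is not chosen, because an independent HMIS-based denominator exists; the reference-based option, with zero error by construction, is likewise passed over.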
create_combined_results_table()
Purpose: Merges coverage estimates and survey observations into unified output
Input:
- Coverage comparison results (best denominator selected)
- Raw survey observations
- All coverage data (optional, includes all denominators)
Output structure:
| admin_area_1 | year | indicator_common_id | denominator_best_or_survey | value |
|---|---|---|---|---|
| Country_Name | 2020 | anc1 | best | 85.3 |
| Country_Name | 2020 | anc1 | survey | 84.2 |
| Country_Name | 2020 | anc1 | danc1_pregnancy | 85.3 |
| Country_Name | 2020 | anc1 | dwpp_pregnancy | 82.1 |
Denominator categories:
- `best`: selected optimal denominator
- `survey`: actual survey observation
- `d*_*`: individual denominator results (all options)
Statistical methods & algorithms
Forward-filling (last observation carried forward)
Survey data typically has gaps (e.g., DHS every 5 years). To create continuous denominators:
Example:
| Year | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|
| Raw | 85.3 | NA | NA | NA | 87.2 | NA |
| Filled | 85.3 | 85.3 | 85.3 | 85.3 | 87.2 | 87.2 |
This assumes coverage remains constant until next observation.
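The module uses `na.locf` (the zoo package function listed under `process_survey_data()`) for this step. For readers without zoo, the same last-observation-carried-forward behaviour can be sketched in base R; leading missing values stay `NA`, as with `zoo::na.locf(x, na.rm = FALSE)`. The helper name `locf()` is hypothetical.

```r
# Hypothetical base-R equivalent of last observation carried forward.
locf <- function(x) {
  # For each position, index of the most recent non-missing value (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA_integer_   # keep leading NAs as NA
  x[idx]
}

raw <- c(85.3, NA, NA, NA, 87.2, NA)   # 2015-2020 example above
locf(raw)  # 85.3 85.3 85.3 85.3 87.2 87.2
```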
Squared error minimization
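To select the best denominator, the module minimizes the sum of squared errors:
\[ E_d = \sum_{t} \left( C_{d,t} - S_t \right)^2 \]
Where: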
- \(C_{d,t}\) = Coverage using denominator \(d\) in year \(t\)
- \(S_t\) = Survey coverage in year \(t\)
- Summation is across all years with survey data
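The error criterion can be computed per denominator as below, summing only over years in which both the coverage estimate and the survey value are present. This is a sketch: whether the survey vector passed in is raw or forward-filled determines which years enter the sum, and the function name `squared_error()` and the coverage values are hypothetical.

```r
# Hypothetical sketch: E_d = sum over t of (C_{d,t} - S_t)^2.
squared_error <- function(coverage, survey) {
  has_both <- !is.na(coverage) & !is.na(survey)
  sum((coverage[has_both] - survey[has_both])^2)
}

survey <- c(82, NA, 87)                  # survey coverage by year (%)
candidates <- list(                      # illustrative values only
  ddelivery_pregnancy = c(80, 85, 90),
  dbcg_pregnancy      = c(83, 84, 86)
)
errors <- sapply(candidates, squared_error, survey = survey)
errors                    # ddelivery_pregnancy 13, dbcg_pregnancy 2
names(which.min(errors))  # "dbcg_pregnancy"
```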
Conceptual framework: Demographic cascades
Before presenting the specific formulas, it is important to understand the conceptual flow of denominator calculations. Denominators are derived through sequential demographic adjustments that reflect the biological cascade from pregnancy to specific health service target populations.
Illustrative example: From pregnancy to DPT-eligible population
Consider how an estimated 10,000 pregnancies translate to the population eligible for DPT vaccination:
Starting point (pregnancies): 10,000
→ After pregnancy loss (3%): 10,000 × (1 - 0.03) = 9,700 deliveries
→ After twin adjustment (1.5% rate): 9,700 × (1 - 0.015/2) = 9,627 births
→ After stillbirths (2%): 9,627 × (1 - 0.02) = 9,435 live births
→ After neonatal deaths (3.9%): 9,435 × (1 - 0.039) = 9,067 DPT-eligible children
This cascade demonstrates how each demographic factor sequentially reduces the population size as we move through life stages. The detailed mathematical formulas in the following sections follow this same logic, but work in both directions:
- Forward cascade: Starting from earlier indicators (ANC1, Delivery) and adjusting toward later target populations
- Backward cascade: Starting from later indicators (BCG, Penta1) and working backwards to estimate earlier populations
The specific rates and formulas for each denominator source are provided in detail below.
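The illustrative cascade can be reproduced numerically with the default rates from the configuration parameters. The function `demographic_cascade()` is hypothetical and follows the worked example's steps exactly, including its twin adjustment of (1 - twin rate / 2); it is not the module's own implementation.

```r
# Hypothetical sketch of the forward cascade from pregnancies to DPT-eligible children.
demographic_cascade <- function(pregnancies,
                                pregnancy_loss_rate = 0.03,  # PREGNANCY_LOSS_RATE
                                twin_rate = 0.015,           # TWIN_RATE
                                stillbirth_rate = 0.02,      # STILLBIRTH_RATE
                                nmr = 0.039) {               # P1_NMR
  deliveries <- pregnancies * (1 - pregnancy_loss_rate)
  births     <- deliveries * (1 - twin_rate / 2)   # as in the worked example
  livebirths <- births * (1 - stillbirth_rate)
  dpt        <- livebirths * (1 - nmr)
  c(pregnancy = pregnancies, delivery = deliveries, birth = births,
    livebirth = livebirths, dpt = dpt)
}

round(demographic_cascade(10000))
# pregnancy 10000, delivery 9700, birth 9627, livebirth 9435, dpt 9067
```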
HMIS-based Denominator Calculations
Denominators derived from ANC1
Starting from ANC1 service counts and survey coverage, we calculate:
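Estimated pregnancies (base calculation):
\[ D^{\text{anc1}}_{\text{pregnancy}} = \frac{N_{\text{anc1}}}{S_{\text{anc1}}} \]
where \(N_{\text{anc1}}\) is the annual ANC1 service count and \(S_{\text{anc1}}\) is survey ANC1 coverage expressed as a proportion.
Estimated deliveries (adjusted for pregnancy loss):
\[ D^{\text{anc1}}_{\text{delivery}} = D^{\text{anc1}}_{\text{pregnancy}} \times (1 - r_{\text{loss}}) \]
Estimated births (adjusted for twin births):
\[ D^{\text{anc1}}_{\text{birth}} = D^{\text{anc1}}_{\text{delivery}} \times \left( 1 - \frac{r_{\text{twin}}}{2} \right) \]
Estimated live births (adjusted for stillbirths):
\[ D^{\text{anc1}}_{\text{livebirth}} = D^{\text{anc1}}_{\text{birth}} \times (1 - r_{\text{stillbirth}}) \]
Population eligible for DPT/Penta vaccines (adjusted for neonatal mortality):
\[ D^{\text{anc1}}_{\text{dpt}} = D^{\text{anc1}}_{\text{livebirth}} \times (1 - \text{NMR}) \]
Here \(r_{\text{loss}}\), \(r_{\text{twin}}\), and \(r_{\text{stillbirth}}\) are PREGNANCY_LOSS_RATE, TWIN_RATE, and STILLBIRTH_RATE, and NMR is P1_NMR.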
Population eligible for MCV1 (adjusted for post-neonatal mortality):
Population eligible for MCV2 (adjusted for additional post-neonatal mortality):
Denominators derived from delivery
Starting from institutional delivery counts and survey coverage:
Estimated live births (base calculation):
Estimated births (adjusted for stillbirths):
Estimated pregnancies (adjusted for twin births and pregnancy loss):
Population eligible for DPT/Penta vaccines:
Population eligible for MCV1:
Population eligible for MCV2:
Note: Denominators derived from Skilled Birth Attendance (SBA) follow the same formulas as delivery denominators.
Denominators derived from BCG (National analysis only)
Starting from BCG vaccination counts and survey coverage:
Estimated live births (base calculation):
Estimated pregnancies (working backwards through demographic adjustments):
Population eligible for DPT/Penta vaccines:
Denominators derived from Penta1
Starting from Penta1 vaccination counts and survey coverage:
Population eligible for DPT/Penta vaccines (base calculation):
Population eligible for MCV1:
Population eligible for MCV2:
Denominators derived from live birth counts
When live birth data is directly reported in HMIS:
Estimated live births (base calculation):
Estimated pregnancies (working backwards):
Estimated deliveries:
Estimated births:
Population eligible for DPT/Penta vaccines:
Population eligible for MCV1:
Population eligible for MCV2:
UNWPP-based Denominator Calculations
Denominators derived from UN WPP (National analysis only)
Instead of using service volumes, these denominators are calculated directly from population projections and demographic rates:
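Estimated pregnancies (from crude birth rate and total population):
\[ D^{\text{wpp}}_{\text{pregnancy}} = \frac{\text{CBR} \times \text{Pop}_{\text{total}}}{1 + r_{\text{twin}}} \]
where \(r_{\text{twin}}\) is TWIN_RATE.
Estimated live births (from crude birth rate):
\[ D^{\text{wpp}}_{\text{livebirth}} = \text{CBR} \times \text{Pop}_{\text{total}} \]
Population eligible for DPT/Penta vaccines (under-1 population):
\[ D^{\text{wpp}}_{\text{dpt}} = \text{Pop}_{<1} \]
Population eligible for MCV1 (adjusted for neonatal mortality):
\[ D^{\text{wpp}}_{\text{measles1}} = \text{Pop}_{<1} \times (1 - \text{NMR}) \]
Population eligible for MCV2 (adjusted for post-neonatal mortality):
\[ D^{\text{wpp}}_{\text{measles2}} = D^{\text{wpp}}_{\text{measles1}} \times (1 - \text{PNMR}) \]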
Adjustment for Incomplete Reporting:
When HMIS data contain fewer than 12 months of reported data in a year, all UNWPP denominators are scaled to match the reporting period:
\[ D^{\text{wpp,adj}} = D^{\text{wpp}} \times \frac{\text{nummonth}}{12} \]
This adjustment ensures denominators are comparable to service volumes that may only represent partial-year reporting.
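The UN WPP denominators and the partial-year scaling can be sketched together. This assumes the crude birth rate is expressed per 1,000 population (hence the division by 1,000) and that `nummonth` is the number of months with HMIS reporting; the function name and arguments are hypothetical, and the rate defaults mirror the configuration parameters.

```r
# Hypothetical sketch of the UN WPP-based denominators (national level).
wpp_denominators <- function(crude_birth_rate, total_pop, under1_pop,
                             nummonth = 12,
                             twin_rate = 0.015,  # TWIN_RATE
                             nmr = 0.039,        # P1_NMR
                             pnmr = 0.028) {     # P2_PNMR
  livebirth <- crude_birth_rate / 1000 * total_pop   # assumes CBR per 1,000
  d <- c(
    dwpp_pregnancy = livebirth / (1 + twin_rate),
    dwpp_livebirth = livebirth,
    dwpp_dpt       = under1_pop,
    dwpp_measles1  = under1_pop * (1 - nmr),
    dwpp_measles2  = under1_pop * (1 - nmr) * (1 - pnmr)
  )
  # Scale to the reporting period when fewer than 12 months were reported
  d * min(nummonth, 12) / 12
}

# CBR 30 per 1,000, population 1 million, 29,000 infants, 6 months reported
wpp_denominators(30, 1e6, 29000, nummonth = 6)
```

With six reported months, every denominator is halved (for example, 30,000 live births become 15,000), so it stays comparable to a half-year service volume.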
Denominators derived from live birth estimates (secondary calculations)
After all primary live birth denominators are calculated (from ANC1, Delivery, BCG, Penta1, Live Birth Counts, and WPP), the module generates additional target population estimates for specific interventions by applying age-specific mortality adjustments:
Children aged 6-59 months (Vitamin A supplementation target population)
For each live birth denominator source, the estimated number of children aged 6-59 months is calculated:
\[ D^{\text{source}}_{\text{vitaminA}} = D^{\text{source}}_{\text{livebirth}} \times (1 - \text{U5MR}) \times 4.5 \]
Where:
- source represents any of: anc1, delivery, bcg, penta1, livebirths, or wpp
- The factor 4.5 represents the approximate duration (in years) of the Vitamin A target age range (6-59 months ≈ 4.5 years)
- Under-5 mortality rate adjusts for child survival to reach the 6-59 month age range
- Result: Estimated population of children aged 6-59 months eligible for Vitamin A supplementation
Infants under 12 months (fully immunized child target population)
For each live birth denominator source, the estimated number of infants under 12 months is calculated:
\[ D^{\text{source}}_{\text{fully\_immunized}} = D^{\text{source}}_{\text{livebirth}} \times (1 - \text{IMR}) \]
Where:
- source represents any of: anc1, delivery, bcg, penta1, livebirths, or wpp
- Infant mortality rate adjusts for survival to 12 months of age
- Result: Estimated population of infants under 1 year old eligible for full immunization assessment
These target population estimates are calculated automatically for all available live birth denominators, ensuring consistent methodology across different source indicators.
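Applied across a table of denominators, the two secondary calculations can be sketched as a loop over every `*_livebirth` column. The sketch assumes a wide data frame with one column per denominator named in the `d<source>_livebirth` pattern; `add_secondary_denominators()` is a hypothetical name, and the defaults mirror `UNDER5_MORTALITY_RATE` and `INFANT_MORTALITY_RATE`.

```r
# Hypothetical sketch: derive Vitamin A and fully-immunized denominators
# from every live birth denominator column in a wide data frame.
add_secondary_denominators <- function(df, u5mr = 0.103, imr = 0.063) {
  livebirth_cols <- grep("^d[a-z0-9]+_livebirth$", names(df), value = TRUE)
  for (col in livebirth_cols) {
    prefix <- sub("_livebirth$", "", col)   # e.g. "danc1"
    # Children 6-59 months: live births x (1 - U5MR) x 4.5-year age range
    df[[paste0(prefix, "_vitaminA")]] <- df[[col]] * (1 - u5mr) * 4.5
    # Infants under 12 months: live births x (1 - IMR)
    df[[paste0(prefix, "_fully_immunized")]] <- df[[col]] * (1 - imr)
  }
  df
}

denoms <- data.frame(year = 2020, danc1_livebirth = 10000, dwpp_livebirth = 12000)
add_secondary_denominators(denoms)
```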
Workflow execution steps
Part 1 executes the following workflow for each administrative level (national, admin2, admin3):
Step 1: Load and validate input data
- Load HMIS adjusted data from Module 2 (national and subnational files)
- Load survey data from GitHub repository (unified DHS/MICS dataset)
- Load UN WPP population data from GitHub repository
- Validate ISO3 codes match across datasets
- Aggregate monthly HMIS data to annual totals
- Harmonize survey data (DHS prioritized over MICS)
- Forward-fill survey values to create continuous time series
Step 2: Calculate HMIS-based denominators
- For each health indicator with survey coverage data:
  - Calculate the base denominator: `count ÷ survey_coverage`
  - Apply demographic cascades to derive related denominators
- Generate denominators from all available source indicators (ANC1, Delivery, BCG, Penta1, Live Births)
Step 3: Calculate WPP-based denominators
- Extract population projections for target country
- Calculate pregnancy estimates from crude birth rate
- Calculate live birth estimates
- Generate under-1 population denominators
- Apply mortality adjustments for vaccine-eligible populations
- Adjust for incomplete reporting periods (months reported < 12)
Step 4: Calculate secondary denominators
- For each `*_livebirth` denominator:
  - Calculate the Vitamin A denominator: `livebirth × (1 - U5MR) × 4.5`
  - Calculate the fully immunized denominator: `livebirth × (1 - IMR)`
Step 5: Calculate coverage estimates
- Divide HMIS service volume by each denominator option
- Create coverage estimates for all indicator-denominator combinations
- Preserve survey-based coverage as benchmark
Step 6: Select best denominator
- For each indicator, compare all denominator-based coverage estimates to survey data
- Calculate the squared error: `Σ(coverage_d,t - survey_t)²`
- Select the denominator with the minimum error as "best"
- Apply preference rules (HMIS-based preferred over WPP)
- Flag denominators as "reference" if they are derived from the same service
Step 7: Format and save outputs
- Save denominator files with source and target metadata
- Save combined results with all coverage estimates
- Mark best denominator for easy filtering
- Include survey values in output
- Create separate files for national, admin2, and admin3 levels
- Generate empty files with correct structure for unavailable admin levels
Output files specification
Part 1 generates seven CSV files:
Denominator files
1. M4_denominators_national.csv
2. M4_denominators_admin2.csv
3. M4_denominators_admin3.csv
Structure:
Fields:
- `denominator`: full denominator name (e.g., `danc1_livebirth`)
- `source_indicator`: service used (e.g., `source_anc1`, `source_wpp`)
- `target_population`: target group (e.g., `target_livebirth`, `target_dpt`)
- `value`: calculated denominator size
Combined results files
4. M4_combined_results_national.csv
5. M4_combined_results_admin2.csv
6. M4_combined_results_admin3.csv
Structure:
Fields:
- `indicator_common_id`: health indicator (e.g., `anc1`, `penta3`)
- `denominator_best_or_survey`: either `best`, `survey`, or a specific denominator name
- `value`: coverage percentage (0-100+)
Special "best" Entry: Duplicates the selected optimal denominator for easy filtering
7. M4_selected_denominator_per_indicator.csv
Purpose: Summary of the best-performing denominator selected for each indicator at each geographic level
Structure:
Fields:
- `indicator_common_id`: health indicator (e.g., `anc1`, `penta3`)
- `denominator_national`: best denominator for national-level coverage
- `denominator_admin2`: best denominator for admin level 2 coverage
- `denominator_admin3`: best denominator for admin level 3 coverage
Data safeguards and validation
Part 1 includes multiple validation checks:
- ISO3 validation: ensures survey and population data match the HMIS country
- Geographic matching: validates admin area names between HMIS and survey data
  - Reports the match rate (e.g., "15/20 regions match")
  - Falls back to a higher geographic level if a mismatch is detected
- Fallback mechanisms:
  - Subnational → national if no local survey data (BCG, Penta1, and Penta3 only)
  - SBA → delivery if SBA is missing
  - PNC1_mother → PNC1 if missing
- Edge case handling: detects when admin_area_3 should be used as admin_area_2 in certain country contexts
- Empty data handling: creates empty CSVs with the correct structure when data are unavailable
- Error handling: wraps survey processing in `tryCatch` to handle mismatches gracefully
Indicators supported
Part 1 processes the following health indicators:
Maternal health:
- `anc1`: antenatal care 1st visit
- `anc4`: antenatal care 4+ visits
- `delivery`: institutional delivery
- `sba`: skilled birth attendance
- `pnc1`: postnatal care (child)
- `pnc1_mother`: postnatal care (mother)

Immunization:
- `bcg`: BCG vaccine
- `penta1`, `penta2`, `penta3`: pentavalent vaccine
- `measles1`, `measles2`: measles-containing vaccine
- `rota1`, `rota2`: rotavirus vaccine
- `opv1`, `opv2`, `opv3`: oral polio vaccine
- `fully_immunized`: full immunization status

Child health:
- `nmr`: neonatal mortality rate (survey only)
- `imr`: infant mortality rate (survey only)
- `vitaminA`: vitamin A supplementation
Usage notes and best practices
When to Use Which Count Variable
- `count_final_none`: no adjustments (raw reported data)
- `count_final_outlier`: outlier adjustment only
- `count_final_completeness`: completeness adjustment only
- `count_final_both`: both adjustments (recommended)
Interpreting "best" Denominators
The "best" denominator may vary by indicator and area based on:
- Data availability (some services not universally reported)
- Reporting completeness (affects HMIS-based denominators)
- Population projection quality (affects WPP denominators)
- Survey coverage levels (extreme values reduce denominator options)
Why multiple denominators?
Different denominators serve different purposes:
- Independent denominators: Provide cross-validation between services
- Reference denominators: Show internal HMIS consistency (but excluded from "best" by default)
- WPP denominators: Offer population-based benchmarks
- Comparing multiple options reveals data quality issues
Troubleshooting common issues
Issue: No matching admin areas between HMIS and survey
- Solution: Check that the ISO3 code is correct and verify admin area naming conventions; if no areas match, the module falls back to national analysis
Issue: All denominators show >100% coverage
- Solution: May indicate under-reporting in survey or over-reporting in HMIS; check data quality from Module 2
Issue: UNWPP selected as "best" for most indicators
- Solution: May indicate poor HMIS data quality or completeness; review Module 2 adjustments
Part 2: Denominator selection and survey projection (technical details)
Purpose and objectives
Part 2 serves three key purposes:
- User-driven denominator selection: While Part 1 automatically selects the "best" denominator by minimizing error against survey data, Part 2 allows users to override this selection and choose specific denominators based on programmatic knowledge or policy priorities
- Temporal trend analysis: Computes year-over-year changes (deltas) in coverage to understand service delivery trends over time
- Survey projection: Projects survey-based coverage estimates forward in time using trends observed in administrative (HMIS) data, filling gaps where survey data are unavailable
User configuration
Users configure Part 2 through two key parameter sets:
1. Denominator selection configuration
At the top of the script, users specify which denominator to use for each indicator:
```r
DENOMINATOR_SELECTION <- list(
  # PREGNANCY-RELATED INDICATORS
  anc1 = "best", # Options: "best", "danc1_pregnancy", "ddelivery_pregnancy", "dbcg_pregnancy", "dlivebirths_pregnancy", "dwpp_pregnancy"
  anc4 = "best",
  # LIVE BIRTH-RELATED INDICATORS
  delivery = "best", # Options: "best", "danc1_livebirth", "ddelivery_livebirth", "dbcg_livebirth", "dlivebirths_livebirth", "dwpp_livebirth"
  bcg = "best",
  sba = "best",
  pnc1_mother = "best",
  pnc1 = "best",
  # DPT-ELIGIBLE AGE GROUP INDICATORS
  penta1 = "best", # Options: "best", "danc1_dpt", "ddelivery_dpt", "dpenta1_dpt", "dbcg_dpt", "dlivebirths_dpt", "dwpp_dpt"
  penta2 = "best",
  penta3 = "best",
  opv1 = "best",
  opv2 = "best",
  opv3 = "best",
  # MEASLES-ELIGIBLE AGE GROUP INDICATORS
  measles1 = "best", # Options: "best", "danc1_measles1", "ddelivery_measles1", "dpenta1_measles1", "dbcg_measles1", "dlivebirths_measles1", "dwpp_measles1"
  measles2 = "best",
  # ADDITIONAL INDICATORS
  vitaminA = "best", # Options: "best", "danc1_vitaminA", "dbcg_vitaminA", "ddelivery_vitaminA", "dwpp_vitaminA"
  fully_immunized = "best" # Options: "best", "danc1_fully_immunized", "dbcg_fully_immunized", "ddelivery_fully_immunized", "dwpp_fully_immunized"
)
```
Denominator options by indicator type:
The available denominators vary by indicator type based on the appropriate target population:
- Pregnancy-based indicators (ANC1, ANC4): Use pregnancy-adjusted denominators
- Live birth-based indicators (Delivery, BCG, SBA, PNC): Use live birth-adjusted denominators
- DPT-eligible age group (Penta1-3, OPV1-3): Use DPT-adjusted denominators (children eligible for DPT)
- Measles-eligible age group (Measles1, Measles2): Use measles-adjusted denominators (children eligible for measles vaccine)
Each denominator option combines a source (ANC1, Delivery, BCG, Penta1, live births, or WPP) with an age-adjustment factor.
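A misspelled option is easy to miss, because a denominator that does not exist for an indicator is simply omitted from the results. Below is a hypothetical pre-run check, not part of the module. The allowed_denominators list is copied from the option comments in the configuration above for two indicators only, and it assumes penta3 shares the DPT options listed for penta1.

```r
# Hypothetical check: flag selections that are neither "best" nor one of the
# options listed for that indicator. Only two indicators are shown here.
allowed_denominators <- list(
  anc1   = c("danc1_pregnancy", "ddelivery_pregnancy", "dbcg_pregnancy",
             "dlivebirths_pregnancy", "dwpp_pregnancy"),
  penta3 = c("danc1_dpt", "ddelivery_dpt", "dpenta1_dpt", "dbcg_dpt",
             "dlivebirths_dpt", "dwpp_dpt")
)

invalid_selections <- function(selection, allowed) {
  bad <- vapply(names(selection), function(ind) {
    choice <- selection[[ind]]
    # Indicators missing from `allowed` are flagged unless set to "best"
    !(identical(choice, "best") || choice %in% allowed[[ind]])
  }, logical(1))
  names(selection)[bad]
}

my_selection <- list(anc1 = "best", penta3 = "dwpp_dtp")  # note the typo
invalid_selections(my_selection, allowed_denominators)   # returns "penta3"
```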
2. Administrative level configuration
```r
RUN_NATIONAL <- TRUE # Always TRUE - national analysis is mandatory
RUN_ADMIN2 <- TRUE # Enable/disable admin level 2 analysis
RUN_ADMIN3 <- TRUE # Enable/disable admin level 3 analysis
```
The script automatically checks data availability and disables admin levels with no data.
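A minimal sketch of that availability check, assuming it amounts to switching a level off when its Part 1 results have no rows. The has_rows() helper and the stand-in data frames are hypothetical.

```r
# Hypothetical sketch: keep an admin level switched on only if its Part 1
# results actually contain rows. The data frames below are stand-ins.
has_rows <- function(df) !is.null(df) && nrow(df) > 0

admin2_results <- data.frame(admin_area_2 = "Region X", year = 2020)
admin3_results <- data.frame()  # e.g. Part 1 wrote an empty file for admin3

RUN_ADMIN2 <- TRUE
RUN_ADMIN3 <- TRUE
RUN_ADMIN2 <- RUN_ADMIN2 && has_rows(admin2_results)  # stays TRUE
RUN_ADMIN3 <- RUN_ADMIN3 && has_rows(admin3_results)  # becomes FALSE
```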
Core functions and methods
Function 1: coverage_deltas()
Purpose: Calculates year-over-year changes in coverage for each indicator-denominator-geography combination.
Process:
- Groups data by geography (admin areas), indicator, and denominator
- Optionally fills in missing years to create a complete time series
- Sorts data chronologically within each group
- Calculates delta as: \(\Delta\text{coverage}_t = \text{coverage}_t - \text{coverage}_{t-1}\)
Mathematical formulation: $$ \Delta C_{i,d,g,t} = C_{i,d,g,t} - C_{i,d,g,t-1} $$
where:
- \(C\) = coverage estimate
- \(i\) = indicator
- \(d\) = denominator
- \(g\) = geographic area
- \(t\) = time (year)
Input:
- coverage_df: Data frame with coverage estimates
- lag_n: Number of years to lag (default = 1 for year-over-year)
- complete_years: Whether to fill missing years (default = TRUE)
Output:
Data frame with original coverage values plus a delta column showing year-over-year change.
Example output:
| admin_area_1 | indicator_common_id | denominator | year | coverage | delta |
|---|---|---|---|---|---|
| Country A | penta3 | dpenta1_dpt | 2018 | 75.2 | NA |
| Country A | penta3 | dpenta1_dpt | 2019 | 78.5 | 3.3 |
| Country A | penta3 | dpenta1_dpt | 2020 | 80.1 | 1.6 |
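The process above can be sketched in base R so that it runs without dplyr. The function name coverage_deltas_sketch() and helper lag_vec() are hypothetical, and the module's coverage_deltas() may differ. Grouping here covers admin_area_1, indicator, and denominator only; a subnational version would add admin_area_2/admin_area_3 to keys.

```r
# Hypothetical base-R sketch of the delta step; not the module's code.
lag_vec <- function(x, n) {
  if (length(x) <= n) return(rep(NA_real_, length(x)))
  c(rep(NA_real_, n), x[seq_len(length(x) - n)])
}

coverage_deltas_sketch <- function(coverage_df, lag_n = 1, complete_years = TRUE) {
  keys <- c("admin_area_1", "indicator_common_id", "denominator")
  groups <- split(coverage_df, coverage_df[keys], drop = TRUE)
  out <- lapply(groups, function(g) {
    if (complete_years) {
      # Insert missing calendar years (coverage NA) so a row lag is a year lag
      full <- data.frame(year = seq(min(g$year), max(g$year)))
      g <- merge(full, g, by = "year", all.x = TRUE)
      for (k in keys) g[[k]] <- g[[k]][!is.na(g[[k]])][1]
    }
    g <- g[order(g$year), ]
    # Without complete_years, a gap makes the delta span several years
    g$delta <- g$coverage - lag_vec(g$coverage, lag_n)
    g
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

cov <- data.frame(admin_area_1 = "Country A", indicator_common_id = "penta3",
                  denominator = "dpenta1_dpt", year = 2018:2020,
                  coverage = c(75.2, 78.5, 80.1))
coverage_deltas_sketch(cov)$delta  # NA, 3.3, 1.6 (up to floating-point rounding)
```

Filling missing years with NA means a delta next to a gap is NA rather than a multi-year change, which keeps every delta a true one-year difference.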
Function 2: project_survey_from_deltas()
Purpose: Projects survey-based coverage estimates forward using administrative data trends.
Process:
- Identify baseline: For each geography-indicator combination, find the most recent survey observation:
  - Extract the last observed survey year
  - Record the baseline coverage value at that year
- Attach baseline to each denominator path: Since Part 2 operates on specific denominator selections, attach the baseline to each denominator series
- Compute cumulative deltas: For years after the baseline year, calculate the cumulative sum of deltas:
$$ \text{cumulative delta}_t = \sum_{\tau = \text{baseline year} + 1}^{t} \Delta C_\tau $$
- Calculate projection: Add the cumulative delta to the baseline value:
$$ \text{Projected coverage}_t = \text{Baseline coverage} + \text{cumulative delta}_t $$
Mathematical formulation:
For each indicator \(i\), denominator \(d\), and geography \(g\):
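- Find baseline:
$$ y_{\text{baseline}} = \max\{\, y : S_{i,g,y} \text{ observed} \,\}, \qquad \hat{S}_{i,d,g,y_{\text{baseline}}} = S_{i,g,y_{\text{baseline}}} $$
- For \(t > y_{\text{baseline}}\):
$$ \hat{S}_{i,d,g,t} = S_{i,g,y_{\text{baseline}}} + \sum_{\tau = y_{\text{baseline}} + 1}^{t} \Delta C_{i,d,g,\tau} $$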
where:
- \(S\) = survey-based coverage estimate
- \(\hat{S}\) = projected survey coverage
- \(\Delta C\) = year-over-year change in administrative coverage
Assumptions:
- Trends observed in administrative data reflect true changes in service coverage
- The baseline survey provides an accurate reference point
- Administrative data trends can be applied to survey estimates
Input:
- deltas_df: Output from coverage_deltas() containing coverage changes
- survey_raw_long: Raw survey data with years and values
Output:
Data frame with projected coverage for each year, indicator, denominator, and geography combination.
Example output:
| admin_area_1 | indicator_common_id | denominator | year | baseline_year | projected |
|---|---|---|---|---|---|
| Country A | penta3 | dpenta1_dpt | 2018 | 2018 | 75.0 |
| Country A | penta3 | dpenta1_dpt | 2019 | 2018 | 78.3 |
| Country A | penta3 | dpenta1_dpt | 2020 | 2018 | 79.9 |
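A minimal base-R sketch of the baseline-plus-cumulative-delta projection follows. The function name project_survey_sketch() is hypothetical, and the input column names (delta in deltas_df; year and survey_value in survey_raw_long) are assumptions rather than the module's confirmed schema. With the inputs below it reproduces the example output table.

```r
# Hypothetical base-R sketch of the projection step; not the module's code.
project_survey_sketch <- function(deltas_df, survey_raw_long) {
  geo_ind <- c("admin_area_1", "indicator_common_id")

  # 1. Baseline: latest non-missing survey value per geography-indicator
  #    (if two sources share that year, this keeps only one of them)
  surv <- survey_raw_long[!is.na(survey_raw_long$survey_value), ]
  surv <- surv[order(surv$year), ]
  latest <- surv[!duplicated(surv[geo_ind], fromLast = TRUE), ]
  baseline <- data.frame(latest[geo_ind],
                         baseline_year = latest$year,
                         baseline_value = latest$survey_value)

  # 2. Attach the baseline to every denominator series of that indicator
  df <- merge(deltas_df, baseline, by = geo_ind)
  df <- df[df$year >= df$baseline_year, ]
  df <- df[order(df$admin_area_1, df$indicator_common_id, df$denominator, df$year), ]

  # 3. Cumulative deltas from baseline_year + 1 (the baseline year adds 0);
  #    a missing delta makes all later years of that series NA
  grp <- interaction(df$admin_area_1, df$indicator_common_id, df$denominator, drop = TRUE)
  step <- ifelse(df$year == df$baseline_year, 0, df$delta)
  df$projected <- df$baseline_value + ave(step, grp, FUN = cumsum)

  df[, c(geo_ind, "denominator", "year", "baseline_year", "projected")]
}

deltas <- data.frame(admin_area_1 = "Country A", indicator_common_id = "penta3",
                     denominator = "dpenta1_dpt", year = 2018:2020,
                     delta = c(NA, 3.3, 1.6))
survey <- data.frame(admin_area_1 = "Country A", indicator_common_id = "penta3",
                     year = c(2014, 2018), survey_value = c(70.0, 75.0))
project_survey_sketch(deltas, survey)$projected  # 75.0, 78.3, 79.9 (up to rounding)
```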
Function 3: build_final_results()
Purpose: Combines HMIS coverage, projected survey estimates, and original survey values into a unified output dataset.
Process:
- Prepare HMIS coverage: Extract coverage estimates from administrative data
  - Rename the coverage column to coverage_cov for clarity
- Merge projections: Join projected survey estimates
  - Match by geography, year, indicator, and denominator
  - Create the coverage_avgsurveyprojection column
- Process original survey data (if available):
  - Collapse multiple survey sources by taking the mean value
  - Preserve source metadata (source, source_detail)
  - Expand survey values across all denominators for that indicator
- Calculate final projections: Use an additive projection formula anchored to the last survey value. For years after the last survey year:
$$ \text{Projected coverage}_t = \text{Last survey value} + \left( C_{\text{HMIS},t} - C_{\text{HMIS, last survey year}} \right) $$
  This additive approach:
  - Preserves the calibration to survey data
  - Applies the HMIS trend (delta) to extend the estimate forward
  - Avoids compounding errors from year-to-year deltas
- Combine results: Merge all components using a full outer join to preserve:
  - Years with only HMIS data
  - Years with only survey data
  - Years with both data sources
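Mathematical formulation:
Let:
- \(t_s\) = year of last survey
- \(S_{t_s}\) = survey coverage at year \(t_s\)
- \(C_{\text{HMIS},t}\) = HMIS-based coverage at year \(t\)
- \(\hat{S}_t\) = projected coverage at year \(t\)
For \(t > t_s\):
$$ \hat{S}_t = S_{t_s} + \left( C_{\text{HMIS},t} - C_{\text{HMIS},t_s} \right) $$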
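Writing \(\hat{S}_t\) for the projected coverage and assuming the HMIS series has a value for every year from \(t_s\) to \(t\), this anchored formula gives the same result as the cumulative-delta projection in project_survey_from_deltas(), because the yearly deltas telescope:

$$
\hat{S}_t = S_{t_s} + \sum_{\tau = t_s + 1}^{t} \Delta C_{\tau}
= S_{t_s} + \sum_{\tau = t_s + 1}^{t} \left( C_{\text{HMIS},\tau} - C_{\text{HMIS},\tau-1} \right)
= S_{t_s} + \left( C_{\text{HMIS},t} - C_{\text{HMIS},t_s} \right)
$$

With the values in the example tables for Functions 1 and 2 (\(S_{2018} = 75.0\), \(C_{\text{HMIS},2018} = 75.2\), \(C_{\text{HMIS},2020} = 80.1\)), both routes give \(75.0 + 4.9 = 79.9\) for 2020. The two forms differ only when a year is missing. A single missing delta makes a plain cumulative sum undefined from that year onward, whereas the anchored difference needs only \(C_{\text{HMIS},t}\) and \(C_{\text{HMIS},t_s}\).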
Input:
- coverage_df: HMIS-based coverage estimates from selected denominators
- proj_df: Projected survey estimates from project_survey_from_deltas()
- survey_raw_df: Original survey data (optional)
Output:
Comprehensive data frame with columns:
- Geographic identifiers (admin_area_1, admin_area_2, admin_area_3)
- year, indicator_common_id, denominator
- coverage_cov: HMIS-based coverage
- coverage_original_estimate: Original survey values
- coverage_avgsurveyprojection: Projected survey coverage
- survey_raw_source: Survey data source (e.g., "DHS", "MICS")
- survey_raw_source_detail: Detailed source information
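The anchoring step for a single indicator-denominator-geography series can be sketched as follows. The helper anchor_projection() is hypothetical. It averages survey values that share the last survey year, following the "collapse by mean" step. Leaving years before \(t_s\) as NA is an assumption of this sketch.

```r
# Hypothetical helper for one indicator-denominator-geography series.
# years and hmis_cov are parallel vectors; several survey sources in the
# last survey year are averaged. Years before t_s are left as NA here.
anchor_projection <- function(years, hmis_cov, survey_years, survey_values) {
  t_s <- max(survey_years)
  s_ts <- mean(survey_values[survey_years == t_s])
  c_ts <- hmis_cov[years == t_s]
  if (length(c_ts) != 1 || is.na(c_ts)) {
    stop("No HMIS coverage in the last survey year; cannot anchor the projection")
  }
  projected <- rep(NA_real_, length(years))
  projected[years == t_s] <- s_ts
  after <- years > t_s
  projected[after] <- s_ts + (hmis_cov[after] - c_ts)
  projected
}

anchor_projection(2018:2020, c(75.2, 78.5, 80.1), survey_years = 2018, survey_values = 75.0)
# 75.0, 78.3, 79.9 (up to rounding), matching the project_survey_from_deltas() example table
```

A missing HMIS value affects only its own year here. For example, c(75.2, NA, 80.1) yields NA for 2019 but still 79.9 for 2020.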
Helper functions
Helper function: filter_by_denominator_selection()
Purpose: Filters the combined results from Part 1 based on the user's denominator selection.
Algorithm:
- Iterate through each indicator in DENOMINATOR_SELECTION. For each indicator:
  - If the selection is "best": keep rows where denominator_best_or_survey == "best"
  - If the selection is a specific denominator: keep rows where denominator_best_or_survey == selected_denominator
  - Convert selected rows to coverage format (rename columns, filter out survey entries)
- Combine results across all indicators
Input:
- combined_results_df: Output from Part 1 with all denominator options
- selection_list: The DENOMINATOR_SELECTION configuration list
Output:
Filtered data frame containing only the user-selected denominators.
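A minimal base-R sketch of this filter is shown below. The name filter_by_selection_sketch() is hypothetical, and so is the assumption that Part 1 stores coverage in a column named value. The "rename columns" step is reduced to value → coverage and denominator_best_or_survey → denominator.

```r
# Hypothetical sketch; assumes Part 1 combined results carry
# indicator_common_id, denominator_best_or_survey, and a value column.
filter_by_selection_sketch <- function(combined_results_df, selection_list) {
  pieces <- lapply(names(selection_list), function(ind) {
    choice <- selection_list[[ind]]
    # "best" and a named denominator are matched against the same column;
    # rows labelled "survey" never match a valid choice, so they drop out
    rows <- combined_results_df[which(
      combined_results_df$indicator_common_id == ind &
        combined_results_df$denominator_best_or_survey == choice), ]
    if (nrow(rows) == 0) return(NULL)  # selection absent: indicator omitted
    names(rows)[names(rows) == "value"] <- "coverage"
    names(rows)[names(rows) == "denominator_best_or_survey"] <- "denominator"
    rows
  })
  out <- do.call(rbind, pieces)
  if (!is.null(out)) rownames(out) <- NULL
  out
}

combined <- data.frame(
  indicator_common_id = c("anc1", "anc1", "anc1", "penta3", "penta3"),
  denominator_best_or_survey = c("best", "danc1_pregnancy", "survey", "best", "dwpp_dpt"),
  year = 2020, value = c(90, 88, 85, 80, 95))
selection <- list(anc1 = "danc1_pregnancy", penta3 = "best", measles1 = "best")
filter_by_selection_sketch(combined, selection)
# two rows: anc1 via danc1_pregnancy (88) and penta3 via "best" (80); measles1 has no rows
```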
Helper function: extract_survey_from_combined()
Purpose: Extracts raw survey values from Part 1 combined results.
Algorithm:
- Filter for rows where denominator_best_or_survey == "survey"
- Rename the value column to survey_value
- Select relevant columns dynamically based on the admin levels present
Input:
Combined results data frame from Part 1
Output:
Survey data frame with columns: admin areas, year, indicator_common_id, survey_value
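These three steps map onto a short base-R sketch. The name extract_survey_sketch() is hypothetical, and the column names follow the description above.

```r
# Hypothetical sketch of extracting survey rows from Part 1 combined results.
extract_survey_sketch <- function(combined_results_df) {
  surv <- combined_results_df[
    which(combined_results_df$denominator_best_or_survey == "survey"), ]
  names(surv)[names(surv) == "value"] <- "survey_value"
  # Keep whichever admin levels this file actually has
  admin_cols <- intersect(c("admin_area_1", "admin_area_2", "admin_area_3"), names(surv))
  surv[, c(admin_cols, "year", "indicator_common_id", "survey_value"), drop = FALSE]
}

cr <- data.frame(admin_area_1 = "Country A", admin_area_2 = c("North", "North", "South"),
                 indicator_common_id = "anc1",
                 denominator_best_or_survey = c("best", "survey", "survey"),
                 year = 2019, value = c(91, 87, 83))
extract_survey_sketch(cr)  # the two "survey" rows, with survey_value 87 and 83
```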
Workflow execution steps
Part 2 executes the following workflow for each administrative level (national, admin2, admin3):
Step 1: Load data
- Load combined results from Part 1 for all admin levels
- Check which admin levels have data
- Extract survey data for use as projection baseline
- Display messages about data availability
Step 2: For each admin level
Sub-step 1: Filter by denominator selection
- Apply the user's denominator choices using filter_by_denominator_selection()
- Message: Number of records selected
Sub-step 2: Compute deltas
- Calculate year-over-year coverage changes using coverage_deltas()
- Creates a complete time series with gaps filled
Sub-step 3: Project survey values
- Use project_survey_from_deltas() to extend survey estimates
- Baseline is anchored to the most recent survey
- Projections use cumulative deltas from HMIS trends
Sub-step 4: Build final results
- Combine HMIS coverage, projections, and original surveys
- Calculate final projected estimates using additive formula
- Preserve all metadata
Step 3: Standardize and save outputs
- Define required columns for each admin level
- Ensure all required columns exist (add as NA if missing)
- Order columns correctly
- Remove inappropriate admin level columns
- Save as CSV with UTF-8 encoding
- Create empty files for admin levels with no data
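Step 3 can be sketched with base R as follows. The helper standardize_output_sketch() is hypothetical. The national column list is taken from the output specification for M5_coverage_estimation_national.csv, and writing to tempdir() is only for illustration.

```r
# Hypothetical sketch of output standardization; not the module's code.
standardize_output_sketch <- function(df, required_cols) {
  if (is.null(df) || ncol(df) == 0) {
    # No data for this level: zero-row frame that still has every column
    df <- as.data.frame(setNames(
      replicate(length(required_cols), logical(0), simplify = FALSE),
      required_cols))
  }
  for (col in setdiff(required_cols, names(df))) df[[col]] <- rep(NA, nrow(df))
  df[, required_cols, drop = FALSE]  # fixes column order and drops other columns
}

national_cols <- c("admin_area_1", "year", "indicator_common_id", "denominator",
                   "coverage_original_estimate", "coverage_avgsurveyprojection",
                   "coverage_cov", "survey_raw_source", "survey_raw_source_detail")
res <- data.frame(year = 2020, admin_area_1 = "Country A", indicator_common_id = "penta3",
                  denominator = "best", coverage_cov = 80.1, admin_area_2 = "North")
out <- standardize_output_sketch(res, national_cols)  # admin_area_2 is dropped
path <- file.path(tempdir(), "M5_coverage_estimation_national.csv")
write.csv(out, path, row.names = FALSE, fileEncoding = "UTF-8")
```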
Output specifications
Part 2 produces three output files:
1. National output: M5_coverage_estimation_national.csv
Columns:
- admin_area_1: Country name
- year: Year of estimate
- indicator_common_id: Standardized indicator code
- denominator: Selected denominator source
- coverage_original_estimate: Original survey-based coverage (NA for years without surveys)
- coverage_avgsurveyprojection: Projected survey coverage using HMIS trends
- coverage_cov: HMIS-based coverage estimate
- survey_raw_source: Survey source (e.g., "DHS 2018")
- survey_raw_source_detail: Additional source details
2. Admin level 2 output: M5_coverage_estimation_admin2.csv
Columns:
Same as national, plus:
- admin_area_2: Second-level administrative division name (e.g., province, region)
3. Admin level 3 output: M5_coverage_estimation_admin3.csv
Columns:
- admin_area_1: Country name
- admin_area_3: Third-level administrative division name (e.g., district)
- year: Year of estimate
- indicator_common_id: Standardized indicator code
- denominator: Selected denominator source
- coverage_original_estimate: Original survey coverage
- coverage_avgsurveyprojection: Projected survey coverage
- coverage_cov: HMIS-based coverage
- survey_raw_source: Survey source
- survey_raw_source_detail: Source details
Methodological considerations
1. Denominator selection strategy
When to use "best":
- Uncertain about which denominator is most appropriate
- Want to rely on data-driven selection from Part 1
- Starting point for analysis
When to specify a denominator:
- Programmatic knowledge suggests a specific denominator is most accurate
- Policy requirements dictate use of specific population estimates
- Conducting sensitivity analyses
- Known issues with certain data sources
2. Projection methodology
The projection approach in Part 2 uses an additive delta method rather than multiplicative or direct replacement:
Advantages:
- Preserves the level calibration from survey data
- Smoothly extends survey estimates using administrative trends
- Avoids compounding errors from year-to-year changes
- Maintains consistency when HMIS coverage is stable
Limitations:
- Assumes HMIS trends reflect true coverage changes
- May diverge from reality if administrative data quality declines
- Projections become less reliable further from baseline survey
- Does not account for systematic biases in HMIS data
Best practice: Projections should be validated against new survey data when available, and the baseline should be updated with the most recent survey.
3. Handling missing data
Part 2 implements several strategies for missing data:
- Complete time series: The coverage_deltas() function can fill missing years, creating a continuous series
- Survey gaps: Projections extend estimates forward, but years before the first survey remain NA
- Admin level gaps: Script automatically detects and skips admin levels with no data
- Missing denominators: If a selected denominator does not exist for an indicator, that indicator-denominator combination is omitted
4. Multi-level analysis consistency
Part 2 processes each administrative level independently:
- National: Aggregated country-level estimates
- Admin 2: Provincial/regional estimates (may not aggregate to the national estimate, because each level can use a different denominator)
- Admin 3: District-level estimates
Important: Estimates across levels may not be directly comparable if different denominators are selected or if data quality varies by level.
Validation and quality checks
Users should validate Part 2 outputs by:
- Checking projection reasonableness:
  - Are projected values within plausible ranges (0-100%)?
  - Do trends make programmatic sense?
- Comparing denominators:
  - Run Part 2 with different denominator selections
  - Assess the sensitivity of results to denominator choice
- Validating against new surveys:
  - When new survey data become available, compare projections to actual values
  - Update the baseline and re-run if necessary
- Reviewing HMIS trends:
  - Large deltas may indicate data quality issues
  - Sudden changes should be investigated
- Admin level consistency:
  - Check whether subnational trends align with national patterns
  - Investigate large discrepancies
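The range and large-delta checks can be scripted. Below is a hypothetical review helper, not part of the module. The 10-point threshold is an arbitrary example, and subnational files need their geography columns added to group_cols.

```r
# Hypothetical review helper; the 10-point threshold is an arbitrary example.
flag_for_review <- function(df, group_cols = "indicator_common_id",
                            value_col = "coverage_avgsurveyprojection",
                            max_abs_change = 10) {
  ord <- do.call(order, c(unname(as.list(df[group_cols])), list(df$year)))
  df <- df[ord, ]
  grp <- interaction(df[group_cols], drop = TRUE)
  v <- df[[value_col]]
  prev <- ave(v, grp, FUN = function(x) c(NA, head(x, -1)))  # previous year's value
  change <- v - prev
  df$out_of_range <- !is.na(v) & (v < 0 | v > 100)
  df$large_change <- !is.na(change) & abs(change) > max_abs_change
  df
}

d <- data.frame(indicator_common_id = "penta3", year = 2018:2021,
                coverage_avgsurveyprojection = c(75, 78, 95, 102))
flag_for_review(d)[, c("year", "out_of_range", "large_change")]
# 2020 is flagged as a large change (+17 points); 2021 as out of range (102)
```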
Troubleshooting common issues
Issue: "No data in admin2 combined results"
- Cause: Part 1 didn't process admin level 2, or no subnational data exists
- Solution: Set RUN_ADMIN2 <- FALSE or check Part 1 inputs
Issue: Projections show implausible values (>100% or <0%)
- Cause: Large errors in HMIS data or inappropriate denominator
- Solution: Review denominator selection, check HMIS data quality, consider different denominator
Issue: Missing denominators in output
- Cause: Selected denominator not calculated in Part 1 for that indicator
- Solution: Check Part 1 denominator options, verify indicator-denominator compatibility
Issue: Gaps in projected coverage
- Cause: Missing HMIS data for some years
- Solution: Review Module 2 outputs, check data completeness adjustments
Code examples
Example 1: Running Part 1 with default settings
```r
# Set working directory
setwd("/path/to/module/directory")
# Load required libraries
library(dplyr)
library(tidyr)
library(zoo)
library(stringr)
library(purrr)
# Configure country
COUNTRY_ISO3 <- "KEN" # Replace with your country code
# Use default analysis level (national + admin2)
ANALYSIS_LEVEL <- "NATIONAL_PLUS_AA2"
# Run Part 1
source("05_module_coverage_estimates_part1.R")
```
Part 1 generates denominator estimates and selects the best denominator for each indicator based on survey comparison.
Example 2: Adjusting mortality parameters
```r
# Use country-specific mortality rates from DHS or other sources
PREGNANCY_LOSS_RATE <- 0.04 # Default: 0.03
TWIN_RATE <- 0.02 # Default: 0.015
STILLBIRTH_RATE <- 0.025 # Default: 0.02
P1_NMR <- 0.045 # Default: 0.039
P2_PNMR <- 0.030 # Default: 0.028
INFANT_MORTALITY_RATE <- 0.070 # Default: 0.063
UNDER5_MORTALITY_RATE <- 0.110 # Default: 0.103
# These parameters affect survival-adjusted denominators
source("05_module_coverage_estimates_part1.R")
```
Sources for country-specific rates: DHS final reports, UN Inter-agency Group for Child Mortality Estimation (UN IGME), or national vital statistics.
Example 3: Running Part 2 with custom denominator selections
```r
# Override automatic "best" selection for specific indicators
DENOM_ANC1 <- "danc1_pregnancy" # Use ANC1-based denominator
DENOM_PENTA3 <- "dwpp_dpt" # Use WPP population estimate
DENOM_MEASLES1 <- "best" # Keep automatic selection
# Run Part 2
source("06_module_coverage_estimates_part2.R")
```
Use case: When programmatic knowledge suggests a specific denominator is more appropriate than the statistically selected option.
Example 4: National-only analysis for rapid assessment
```r
# Part 1: Run national level only (faster)
ANALYSIS_LEVEL <- "NATIONAL_ONLY"
source("05_module_coverage_estimates_part1.R")
# Part 2: Will automatically skip subnational levels
source("06_module_coverage_estimates_part2.R")
```
Use case: Initial exploratory analysis, or when subnational survey data is unavailable.
Example 5: Full subnational analysis
```r
# Part 1: Include admin3 level
ANALYSIS_LEVEL <- "NATIONAL_PLUS_AA2_AA3"
source("05_module_coverage_estimates_part1.R")
# Part 2: Will process all available levels
source("06_module_coverage_estimates_part2.R")
```
Use case: Detailed district-level analysis where subnational survey data exists.
Example 6: Programmatic use of outputs
```r
library(dplyr)

# Load coverage outputs (Part 2 saves them as UTF-8)
coverage_national <- read.csv("M5_coverage_estimation_national.csv", fileEncoding = "UTF-8")
coverage_admin2 <- read.csv("M5_coverage_estimation_admin2.csv", fileEncoding = "UTF-8")

# Filter to a specific indicator
penta3_national <- coverage_national %>%
  filter(indicator_common_id == "penta3")

# Compare HMIS-based and survey-projected coverage
coverage_comparison <- penta3_national %>%
  select(year, coverage_cov, coverage_avgsurveyprojection, coverage_original_estimate) %>%
  mutate(
    hmis_survey_gap = coverage_cov - coverage_avgsurveyprojection,
    data_source = case_when(
      !is.na(coverage_original_estimate) ~ "Survey",
      !is.na(coverage_avgsurveyprojection) ~ "Projected",
      TRUE ~ "HMIS only"
    )
  )

# Identify admin2 areas below the threshold in the latest penta3 year
# (filtering in two steps so max(year) is taken over penta3 rows only)
low_coverage_areas <- coverage_admin2 %>%
  filter(indicator_common_id == "penta3") %>%
  filter(year == max(year)) %>%
  filter(coverage_avgsurveyprojection < 80) %>%
  arrange(coverage_avgsurveyprojection)
```
Usage notes
Output file columns
Part 2 output files (M5_coverage_estimation_*.csv) contain:
| Column | Description |
|---|---|
| admin_area_1 | Country name |
| admin_area_2 / admin_area_3 | Subnational area (where applicable) |
| year | Calendar year |
| indicator_common_id | Health indicator code |
| denominator | Selected denominator type |
| coverage_cov | HMIS-derived coverage (numerator ÷ denominator × 100) |
| coverage_original_estimate | Survey value where available |
| coverage_avgsurveyprojection | Survey value projected using HMIS trends |
| survey_raw_source | Survey source (DHS/MICS) |
| survey_raw_source_detail | Specific survey name and year |
Reviewing denominator options
Part 1 output files (M4_combined_results_*.csv) contain coverage estimates from all denominator options. To review:
- Open the combined results file
- Filter to indicator of interest
- Compare the value column across the different denominator_best_or_survey entries
- The row marked best shows the automatically selected denominator
- Rows marked survey show actual survey observations
To override automatic selection in Part 2, set the DENOM_* parameters to a specific denominator name instead of "best".
Subnational data requirements
The module checks for subnational data availability:
- If ANALYSIS_LEVEL is set to include admin2 or admin3, the module validates that matching survey data exist
- If no matching subnational survey data are found, the module falls back to a higher geographic level
- Console messages indicate which analysis levels are being processed
Validation checks
After running both parts, review outputs for:
- Coverage values outside expected range (negative or >100%)
- Gaps in time series (missing years)
- Consistency between coverage_cov and coverage_avgsurveyprojection
- Denominator selections in Part 1 output
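The "gaps in time series" check can be automated for a national output file. The helper missing_years() is hypothetical. It lists, per indicator, the calendar years missing between the first and last non-missing value of the chosen column.

```r
# Hypothetical helper for a national output file: for each indicator, list the
# calendar years missing between its first and last non-missing value.
missing_years <- function(df, value_col = "coverage_cov") {
  have <- df[!is.na(df[[value_col]]), ]
  gaps <- lapply(split(have$year, have$indicator_common_id), function(y) {
    setdiff(seq(min(y), max(y)), y)
  })
  gaps[lengths(gaps) > 0]  # keep only indicators that actually have gaps
}

d <- data.frame(indicator_common_id = c(rep("penta3", 3), rep("anc1", 3)),
                year = c(2018, 2019, 2021, 2018, 2019, 2020),
                coverage_cov = c(75, 78, 81, 90, 91, NA))
missing_years(d)  # penta3 is missing 2020; anc1 has no interior gap
```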
Value add beyond standard DHIS2 analysis
While DHIS2 provides a robust foundation for data collection, storage, and basic visualization, FASTR builds on this foundation with additional capabilities: automatic data quality adjustment before analysis, advanced analytical methods including disruption detection and coverage projection, standardized visualizations using percent-change approaches, improved coverage estimation using survey-derived denominators, faster analytics cycles aligned with country decision-making timelines, and built-in capacity strengthening through reproducible methods.