Database Integration

The Database Integration feature allows you to append cleaned data to the unified survey database stored on GitHub.

GitHub Repository

  • Repository: FASTR-Analytics/modules
  • Branch: main
  • Files:
      • survey_data_unified.csv - Survey indicators
      • population_estimates_only.csv - Population estimates

Workflow Overview

┌─────────────────┐
│ 1. Pull from    │
│    GitHub       │
└────────┬────────┘
         ▼
┌─────────────────┐
│ 2. Validate     │
│    Names        │
└────────┬────────┘
         ▼
┌─────────────────┐
│ 3. Check        │
│    Duplicates   │
└────────┬────────┘
         ▼
┌─────────────────┐
│ 4. Append &     │
│    Push         │
└─────────────────┘

Step 1: Pull from GitHub

Click "Pull Latest from GitHub" to fetch the current database.

This ensures you are working with the most recent version of the database, which is especially important when several people contribute to it.

Success message:

Successfully pulled from GitHub!
Survey records: 19,419
Population records: 1,318
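
To inspect the same files outside the app, the pull step can be approximated with a request to GitHub's repository contents API. This is a minimal Python sketch for illustration, not the app's implementation: the helper names `build_pull_request` and `fetch_csv_text` are hypothetical, and it assumes the CSV files sit at the repository root.

```python
import urllib.request

OWNER_REPO = "FASTR-Analytics/modules"
BRANCH = "main"


def build_pull_request(filename, token=None):
    """Build (but do not send) a request for one database file.

    Assumption: the file sits at the repository root.
    """
    url = f"https://api.github.com/repos/{OWNER_REPO}/contents/{filename}?ref={BRANCH}"
    request = urllib.request.Request(url)
    # Ask GitHub for the raw file body instead of the JSON metadata wrapper.
    request.add_header("Accept", "application/vnd.github.raw+json")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def fetch_csv_text(filename, token=None):
    """Send the request and return the CSV as text (needs network access)."""
    with urllib.request.urlopen(build_pull_request(filename, token)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_csv_text("survey_data_unified.csv", token)` would return the file body as a string, which can then be parsed with any CSV reader.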


Step 2: Validate Admin Area Names

Click "Validate Admin Area Names" to check if your data's geographic areas match the existing database.

If all names match:

✓ All admin area names match! Proceed to duplicate check.

If there are unmatched names:

For each unmatched area, you'll see a dropdown to:

  • Select the correct name from the database
  • Choose "IGNORE" to skip those records

Example:

Nigeria - Lagos State
Years: 2018, 2019 | Records: 24
Select correct name or IGNORE: [Dropdown: IGNORE, la Lagos State, ...]

After making selections, click "Apply Corrections & Continue".
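
The validation and correction logic described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the app's code: the function names are hypothetical, names are assumed to be compared as (admin_area_1, admin_area_2) pairs, and the area names in the usage example are made up.

```python
IGNORE = "IGNORE"


def find_unmatched(records, known_names):
    """Return the sorted (admin_area_1, admin_area_2) pairs missing from the database."""
    pairs = {(r["admin_area_1"], r["admin_area_2"]) for r in records}
    return sorted(pairs - set(known_names))


def apply_corrections(records, corrections):
    """Rename or drop records according to the dropdown choices.

    `corrections` maps an unmatched (country, area) pair to either the
    corrected area name or IGNORE.
    """
    cleaned = []
    for record in records:
        key = (record["admin_area_1"], record["admin_area_2"])
        choice = corrections.get(key)
        if choice == IGNORE:
            continue  # skip records the user chose to ignore
        if choice is not None:
            # Copy the record so the input list is left unchanged.
            record = {**record, "admin_area_2": choice}
        cleaned.append(record)
    return cleaned


# Usage with made-up data: one area is renamed, one is ignored.
records = [
    {"admin_area_1": "Nigeria", "admin_area_2": "Lagos State", "year": 2018},
    {"admin_area_1": "Nigeria", "admin_area_2": "Kano", "year": 2018},
    {"admin_area_1": "Nigeria", "admin_area_2": "Abuja FCT", "year": 2019},
]
known = {("Nigeria", "Lagos"), ("Nigeria", "Kano")}
unmatched = find_unmatched(records, known)
cleaned = apply_corrections(
    records,
    {("Nigeria", "Lagos State"): "Lagos", ("Nigeria", "Abuja FCT"): IGNORE},
)
```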


Step 3: Check for Duplicates

Click "Check for Duplicates" to identify records that already exist in the database.

A record is considered a duplicate if it matches on:

  • admin_area_1 (country)
  • admin_area_2 (province)
  • year
  • indicator_common_id

Results Summary

After checking, you'll see a summary showing:

  • X with different values: Records that exist but have different values
  • Y with same values: Records that already match (no action needed)
  • Z new records: Records not in the database
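
The duplicate check and its three result groups can be sketched as a lookup on the four key columns. This Python sketch is illustrative only: the column name `value` and the function names are assumptions, and values are compared with exact equality, which a real implementation might relax for floating-point data.

```python
KEY_FIELDS = ("admin_area_1", "admin_area_2", "year", "indicator_common_id")


def record_key(record):
    """Return the tuple of fields that identifies a duplicate."""
    return tuple(record[field] for field in KEY_FIELDS)


def classify(new_records, existing_records, value_field="value"):
    """Split new records into 'different', 'same', and 'new' groups."""
    existing = {record_key(r): r for r in existing_records}
    groups = {"different": [], "same": [], "new": []}
    for record in new_records:
        match = existing.get(record_key(record))
        if match is None:
            groups["new"].append(record)        # not in the database
        elif match[value_field] == record[value_field]:
            groups["same"].append(record)       # already matches
        else:
            groups["different"].append(record)  # exists with another value
    return groups
```

The lengths of the three lists correspond to the X, Y, and Z counts in the summary.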

Handling Different Values

For records with different values, you have two options:

Per-row actions: Each row shows the current action and a toggle button:

  • Click "Replace" to use the newly fetched value
  • Click "Keep Instead" to keep the existing database value

Bulk actions: Use the dropdown and "Apply to All" button to set all different-value records at once.

Record Types

Section          | Description                 | Action
-----------------|-----------------------------|-----------------------
Different Values | Values don't match database | Choose keep or replace
Same Values      | Values already match        | Skipped automatically
New Records      | Not in database             | Will be added
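
How the keep and replace decisions might translate into the final dataset can be sketched as a keyed merge. This is a hypothetical Python sketch, not the app's implementation: `merge_records`, the `actions` mapping, and the `default_action` parameter (standing in for a bulk choice) are all assumptions made for illustration.

```python
KEY_FIELDS = ("admin_area_1", "admin_area_2", "year", "indicator_common_id")


def record_key(record):
    """Return the tuple of fields that identifies a duplicate."""
    return tuple(record[field] for field in KEY_FIELDS)


def merge_records(existing, different, new, actions, default_action="keep"):
    """Apply per-row actions to differing records, then add new records.

    `actions` maps a record key to "keep" or "replace"; rows without an
    entry fall back to `default_action`. Records with the same values
    need no handling because the database already holds them.
    """
    merged = {record_key(r): r for r in existing}
    for record in different:
        key = record_key(record)
        if actions.get(key, default_action) == "replace":
            merged[key] = record  # overwrite with the newly fetched value
    for record in new:
        merged[record_key(record)] = record  # new rows are always added
    return list(merged.values())
```

Because the merge is keyed on the duplicate-check columns, a replaced record keeps its position and the result never contains two rows with the same key.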

Step 4: Append & Push to GitHub

Commit Message

The app auto-generates detailed commit messages:

Add data: NGA, GIN, SEN (156 records)

Countries: NGA, GIN, SEN
Indicators: anc1, penta1, penta3, bcg, measles1
Years: 2018-2023
Source: DHS
Records: 156

Notes: Updated with latest DHS 2023 data

Timestamp: 2026-01-12 16:45 UTC

You can add custom notes in the "Commit Notes" field.
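
A message in the format shown above could be assembled as follows. This is a minimal Python sketch for illustration: the function name and its parameters are hypothetical, and the app's actual generator may differ.

```python
from datetime import datetime, timezone


def build_commit_message(countries, indicators, years, source, n_records,
                         notes="", now=None):
    """Assemble a commit message in the layout shown above."""
    now = now or datetime.now(timezone.utc)
    country_list = ", ".join(countries)
    first, last = min(years), max(years)
    year_range = str(first) if first == last else f"{first}-{last}"
    lines = [
        f"Add data: {country_list} ({n_records} records)",
        "",
        f"Countries: {country_list}",
        "Indicators: " + ", ".join(indicators),
        f"Years: {year_range}",
        f"Source: {source}",
        f"Records: {n_records}",
        "",
    ]
    if notes:
        # Custom notes from the "Commit Notes" field are optional.
        lines += [f"Notes: {notes}", ""]
    lines.append("Timestamp: " + now.strftime("%Y-%m-%d %H:%M") + " UTC")
    return "\n".join(lines)
```

Passing the countries, indicators, years, source, record count, and notes from the example reproduces that message; leaving `notes` empty omits the "Notes:" line.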

Push

  1. Ensure "Push changes to GitHub" is checked
  2. Click "Append & Push to GitHub"
  3. Wait for confirmation

Success message:

✓ Success!
Survey records added: 145
Population records added: 11
Total survey database: 19,564 records
Total population database: 1,329 records
Changes pushed to GitHub!
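
One way a push like this could be made is through GitHub's "create or update file contents" endpoint, sketched below in Python. This is an assumption for illustration, not a description of how the app pushes: the helper name `build_push_request` is hypothetical, the file is assumed to sit at the repository root, and this endpoint commits one file per request, so updating both CSV files would take two requests.

```python
import base64
import json
import urllib.request


def build_push_request(path, csv_text, commit_message, current_sha, token,
                       repo="FASTR-Analytics/modules", branch="main"):
    """Build (but do not send) a PUT request that updates one file."""
    payload = {
        "message": commit_message,
        # The endpoint expects the new file content base64-encoded.
        "content": base64.b64encode(csv_text.encode("utf-8")).decode("ascii"),
        # Blob SHA of the file version being replaced.
        "sha": current_sha,
        "branch": branch,
    }
    request = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/contents/{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
    )
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/vnd.github+json")
    return request
```

Sending the request with `urllib.request.urlopen(request)` requires a token with write access to repository contents.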


Setting Up GitHub Token

Create a Personal Access Token

  1. Go to github.com/settings/tokens
  2. Click "Generate new token"
  3. Configure:
       • Name: FASTR Survey Fetcher
       • Repository access: Select FASTR-Analytics/modules
       • Permissions: Contents → Read and write
  4. Click "Generate token"
  5. Copy the token (you won't see it again!)

Add Token to Environment

Local (.Renviron file):

GITHUB_TOKEN=ghp_your_token_here

Hugging Face Spaces:

  1. Go to Space Settings
  2. Find "Variables and secrets"
  3. Add secret: GITHUB_TOKEN = your token
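
Before pushing, it can help to confirm that the token is actually visible to the running process. The check below is a small Python sketch with a hypothetical helper name. Note that `.Renviron` is loaded by R at startup, so a script in another language would need `GITHUB_TOKEN` exported in its own environment.

```python
import os


def get_github_token(environ=None):
    """Return the GITHUB_TOKEN value, or raise if it is missing or blank."""
    environ = os.environ if environ is None else environ
    token = environ.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set; add it to your environment before pushing."
        )
    return token
```

Failing early with a clear message avoids a less obvious authentication error at push time.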


Data Routing

Records are automatically routed to the correct file based on indicator_common_id:

File                          | Indicators
------------------------------|-----------------------------------------------
survey_data_unified.csv       | anc1, penta1, bcg, measles1, u5mr, mcpr, etc.
population_estimates_only.csv | poptot, popu5, livebirth, womenrepage, totu1pop, totu5pop, popgrowth
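
The routing rule can be sketched as a set lookup on indicator_common_id. This Python sketch is illustrative, with hypothetical function names. It assumes that any indicator not in the population list belongs in the survey file, which the "etc." in the survey list suggests but does not confirm.

```python
SURVEY_FILE = "survey_data_unified.csv"
POPULATION_FILE = "population_estimates_only.csv"

# Population indicators as listed in the routing table.
POPULATION_INDICATORS = {
    "poptot", "popu5", "livebirth", "womenrepage",
    "totu1pop", "totu5pop", "popgrowth",
}


def target_file(indicator_common_id):
    """Return the file a record with this indicator would be routed to."""
    if indicator_common_id in POPULATION_INDICATORS:
        return POPULATION_FILE
    # Assumption: every other indicator is a survey indicator.
    return SURVEY_FILE


def route_records(records):
    """Group records by destination file."""
    routed = {SURVEY_FILE: [], POPULATION_FILE: []}
    for record in records:
        routed[target_file(record["indicator_common_id"])].append(record)
    return routed
```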