Python Library

Installation

pip install gri

Requires Python 3.8+. Dependencies: pandas, numpy, matplotlib, pyyaml.

Quick Start

Object-Oriented API

from gri import GRIAnalysis

# Load survey and compute full scorecard
analysis = GRIAnalysis.from_survey_file("survey_data.csv")
scorecard = analysis.calculate_scorecard(include_max_possible=True)

# Visualize and report
analysis.plot_scorecard(save_to="scorecard.png")
print(analysis.generate_report())

Functional API

from gri import calculate_gri, load_benchmark_suite, load_gd_survey

# Load data
survey = load_gd_survey(3)  # Global Dialogues wave 3
benchmarks = load_benchmark_suite()

# Calculate GRI for a single dimension
gri_score = calculate_gri(
    survey_df=survey,
    benchmark_df=benchmarks["country_gender_age"],
    strata_cols=["country", "gender", "age_group"]
)
print(f"GRI: {gri_score:.3f}")

Key Functions

Function                      Description
calculate_gri()               Core GRI calculation via total variation distance (TVD)
calculate_diversity_score()   Strata coverage measurement
calculate_gri_scorecard()     Multi-dimensional scorecard
calculate_design_effect()     Design effect, effective N, and precision retained
calculate_sri()               Strategic Representativeness Index
monte_carlo_max_scores()      Maximum achievable score simulation
calculate_efficiency_ratio()  Actual vs. theoretical maximum
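
To make the TVD-based calculation concrete, here is a minimal sketch in plain Python. It assumes the GRI is one minus the total variation distance between the sample's stratum proportions and the benchmark's. The helper name gri_from_counts and the benchmark figures are hypothetical and are not part of the library.

```python
from collections import Counter


def gri_from_counts(sample_rows, benchmark_props, strata_cols):
    """Return 1 - TVD between sample and benchmark stratum proportions.

    Illustrative only: assumes GRI = 1 - 0.5 * sum(|sample_share - benchmark_share|).
    """
    # Count respondents per stratum, where a stratum is a tuple of column values
    counts = Counter(tuple(row[col] for col in strata_cols) for row in sample_rows)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sample is empty")

    # Strata missing from either side contribute their full share to the distance
    strata = set(counts) | set(benchmark_props)
    tvd = 0.5 * sum(
        abs(counts.get(s, 0) / n - benchmark_props.get(s, 0.0)) for s in strata
    )
    return 1.0 - tvd


# Made-up respondents and benchmark shares, for illustration
rows = [
    {"country": "India", "gender": "Female"},
    {"country": "India", "gender": "Male"},
    {"country": "Brazil", "gender": "Female"},
    {"country": "Brazil", "gender": "Female"},
]
benchmark = {
    ("India", "Female"): 0.4,
    ("India", "Male"): 0.4,
    ("Brazil", "Female"): 0.1,
    ("Brazil", "Male"): 0.1,
}

score = gri_from_counts(rows, benchmark, ["country", "gender"])
print(f"GRI: {score:.3f}")  # GRI: 0.600
```

Here the sample over-represents Brazilian women (0.50 vs. 0.10) and misses Brazilian men entirely, which accounts for most of the 0.4 distance.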

Key Classes

Class         Description
GRIAnalysis   High-level analysis wrapper (load, calculate, plot, report)
GRIScorecard  Configuration-driven scorecard generator
GRIConfig     Configuration management

Data Format

Survey data should be a CSV with one row per respondent and demographic columns:

country,gender,age_group,religion,environment
India,Female,25-34,Hindu,Urban
Brazil,Male,35-44,Christian,Urban
Nigeria,Female,18-24,Muslim,Rural
...

Required columns depend on which dimensions you calculate:

Column       Values                          Used In
country      Country name                    All geographic dimensions
gender       Male, Female                    Gender-related dimensions
age_group    Age bands (e.g., 18-24, 25-34)  Age-related dimensions
religion     Major religion category         Religion dimensions
environment  Urban, Rural                    Environment dimensions

The library includes built-in loaders for Global Dialogues data (load_gd_survey()) and World Values Survey data (load_wvs_survey()).
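
For a custom CSV, it can help to check the columns before computing any dimension. The sketch below uses only the standard library and the sample rows from this section. The helper check_survey_rows is hypothetical and is not part of the library.

```python
import csv
import io


def check_survey_rows(rows, header, strata_cols):
    """Return (missing_columns, n_rows_with_blank_values) for the requested strata."""
    missing = [col for col in strata_cols if col not in header]
    present = [col for col in strata_cols if col in header]
    # A row counts as blank if any requested column is empty or absent
    blank_rows = sum(
        1 for row in rows if any(not (row[col] or "").strip() for col in present)
    )
    return missing, blank_rows


# Inline stand-in for survey_data.csv
sample_csv = """country,gender,age_group,religion,environment
India,Female,25-34,Hindu,Urban
Brazil,Male,35-44,Christian,Urban
Nigeria,Female,18-24,Muslim,Rural
"""

reader = csv.DictReader(io.StringIO(sample_csv))
rows = list(reader)
header = reader.fieldnames

missing, blank_rows = check_survey_rows(
    rows, header, ["country", "gender", "age_group"]
)
print(missing, blank_rows)  # [] 0
```

A non-empty first element means the chosen dimension cannot be calculated from this file as it stands.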

Interpreting Scores

GRI Range  Interpretation
0.90–1.00  Excellent: near-perfect demographic match
0.70–0.89  Good: strong representation with minor gaps
0.50–0.69  Moderate: noticeable demographic skew
0.30–0.49  Low: significant underrepresentation in key strata
0.00–0.29  Very low: major demographic mismatch

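Context matters: a GRI of 0.35 on Country × Gender × Age (2,699 strata) is a stronger result than 0.35 on Continent (6 strata), because a fixed-size sample cannot match thousands of strata as closely as it can match six. Always compare a score against the maximum achievable score for its dimension.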
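
The bands and the "compare against the maximum" advice can be sketched as two small helpers. Both function names are hypothetical, and the maximum achievable scores in the example are made-up numbers chosen only to show the effect. The ratio is assumed to be the actual score divided by the maximum.

```python
# Lower bound of each band from the interpretation table, highest first
BANDS = [
    (0.90, "Excellent"),
    (0.70, "Good"),
    (0.50, "Moderate"),
    (0.30, "Low"),
    (0.00, "Very low"),
]


def interpret_gri(score):
    """Map a GRI in [0, 1] to its band label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("GRI must be between 0 and 1")
    for lower, label in BANDS:
        if score >= lower:
            return label


def efficiency(actual, max_possible):
    """Share of the achievable score that was actually reached (assumed definition)."""
    if max_possible <= 0:
        raise ValueError("max_possible must be positive")
    return actual / max_possible


# Same raw score on two dimensions with hypothetical maximum achievable scores
fine_grained = efficiency(0.35, 0.50)  # e.g. Country × Gender × Age
coarse = efficiency(0.35, 0.99)        # e.g. Continent

print(interpret_gri(0.35))  # Low
print(f"{fine_grained:.2f} vs {coarse:.2f}")  # 0.70 vs 0.35
```

Both dimensions fall in the same raw band, but the fine-grained one reaches a much larger share of what its sample size allows.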

Advanced Features

Design Effect and Effective Sample Size

calculate_design_effect() quantifies the precision cost of demographic mismatch. It returns the design effect, the effective sample size, and the precision retained:

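from gri import calculate_design_effect

# Reuse `survey` and `benchmarks` from the Quick Start; the examples below share this benchmark
benchmark = benchmarks["country_gender_age"]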

result = calculate_design_effect(
    survey_df=survey,
    benchmark_df=benchmark,
    strata_cols=["country", "gender", "age_group"]
)
print(f"Design Effect: {result['design_effect']:.2f}")
print(f"Effective N: {result['effective_n']:.0f}")
print(f"Precision Retained: {result['precision_retention']:.1%}")
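
For intuition about what these three numbers mean, here is a sketch using Kish's approximation with post-stratification weights (benchmark share divided by sample share). This is an assumption about the method, not a statement of how the library computes it. The function name is hypothetical, and precision retained is assumed to equal effective N divided by N.

```python
from collections import Counter


def kish_design_effect(strata_labels, benchmark_props):
    """Kish design effect from post-stratification weights (illustrative)."""
    n = len(strata_labels)
    if n == 0:
        raise ValueError("sample is empty")
    counts = Counter(strata_labels)

    # Each respondent's weight: benchmark share / sample share of their stratum
    weights = [
        benchmark_props.get(label, 0.0) / (counts[label] / n)
        for label in strata_labels
    ]
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    if sum_w == 0:
        raise ValueError("no respondent falls in a benchmark stratum")

    deff = n * sum_w2 / sum_w**2  # Kish: n * sum(w^2) / (sum(w))^2
    return {
        "design_effect": deff,
        "effective_n": n / deff,
        "precision_retention": 1.0 / deff,
    }


# Four respondents, three in stratum A, against a 50/50 benchmark (made-up data)
result = kish_design_effect(["A", "A", "A", "B"], {"A": 0.5, "B": 0.5})
print(f"Design Effect: {result['design_effect']:.2f}")  # Design Effect: 1.33
print(f"Effective N: {result['effective_n']:.0f}")  # Effective N: 3
print(f"Precision Retained: {result['precision_retention']:.1%}")  # Precision Retained: 75.0%
```

Weighting can only correct strata that appear in the sample. A benchmark stratum with no respondents gets no weight in this sketch.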

Strategic Representativeness Index (SRI)

calculate_sri() combines the GRI with the diversity score using configurable weights:

from gri import calculate_sri

sri = calculate_sri(
    survey_df=survey,
    benchmark_df=benchmark,
    strata_cols=["country", "gender", "age_group"],
    gri_weight=0.7,
    diversity_weight=0.3
)
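
One plausible reading of "configurable weights" is a normalized weighted average of the two scores. The sketch below illustrates that reading only. The combination rule and the helper name weighted_index are assumptions, and the library's formula may differ.

```python
def weighted_index(gri, diversity, gri_weight=0.7, diversity_weight=0.3):
    """Weighted average of two scores in [0, 1] (assumed combination rule)."""
    total = gri_weight + diversity_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total keeps the result in [0, 1] even if weights don't sum to 1
    return (gri_weight * gri + diversity_weight * diversity) / total


# Made-up component scores
combined = weighted_index(gri=0.6, diversity=0.8)
print(f"{combined:.2f}")  # 0.66
```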

Visualization

from gri import plot_gri_scorecard, plot_segment_deviations

# Full scorecard heatmap
plot_gri_scorecard(scorecard, save_to="heatmap.png")

# Segment-level deviation analysis
plot_segment_deviations(analysis, dimension="Country × Gender × Age",
                        top_n=20, save_to="deviations.png")

Monte Carlo Simulation

Estimate the maximum achievable GRI for a given sample size:

from gri import monte_carlo_max_scores

max_scores = monte_carlo_max_scores(
    benchmark_df=benchmark,
    strata_cols=["country", "gender", "age_group"],
    sample_size=1000,
    n_simulations=1000
)
print(f"Max GRI at N=1000: {max_scores['max_gri']:.3f}")
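
The idea behind the simulation can be sketched with the standard library: draw many samples directly from the benchmark distribution and score each one as 1 - TVD. Even a perfectly random sample scores below 1.0 at finite N. The function name, the benchmark shares, and the choice to report both the mean and the best draw are assumptions. This section does not say which statistic the library returns as max_gri.

```python
import random
from collections import Counter


def simulate_max_gri(benchmark_props, sample_size, n_simulations, seed=0):
    """Score ideal random samples drawn from the benchmark itself (illustrative)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    strata = list(benchmark_props)
    probs = [benchmark_props[s] for s in strata]

    scores = []
    for _ in range(n_simulations):
        # One simulated survey: sample_size respondents drawn by benchmark share
        counts = Counter(rng.choices(strata, weights=probs, k=sample_size))
        tvd = 0.5 * sum(
            abs(counts.get(s, 0) / sample_size - p) for s, p in zip(strata, probs)
        )
        scores.append(1.0 - tvd)

    return {"mean_gri": sum(scores) / len(scores), "max_gri": max(scores)}


# Made-up benchmark with four strata
props = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

small = simulate_max_gri(props, sample_size=50, n_simulations=200)
large = simulate_max_gri(props, sample_size=2000, n_simulations=200)
print(f"N=50:   mean {small['mean_gri']:.3f}, best {small['max_gri']:.3f}")
print(f"N=2000: mean {large['mean_gri']:.3f}, best {large['max_gri']:.3f}")
```

The typical score rises with sample size and falls as the number of strata grows, which is why the ceiling differs so much between dimensions.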

Source Code

The full source code is available on GitHub. Issues and contributions are welcome.