Python Library

Installation

pip install gri

Requires Python 3.8+. Dependencies: pandas, numpy, matplotlib, pyyaml.

Quick Start

Object-Oriented API

from gri import GRIAnalysis

# Load survey and compute full scorecard
analysis = GRIAnalysis.from_survey_file("survey_data.csv")
scorecard = analysis.calculate_scorecard(include_max_possible=True)

# Visualize and report
analysis.plot_scorecard(save_to="scorecard.png")
print(analysis.generate_report())

Functional API

from gri import calculate_gri, load_benchmark_suite, load_gd_survey

# Load data
survey = load_gd_survey(3)  # Global Dialogues wave 3
benchmarks = load_benchmark_suite()

# Calculate GRI for a single dimension
gri_score = calculate_gri(
    survey_df=survey,
    benchmark_df=benchmarks["country_gender_age"],
    strata_cols=["country", "gender", "age_group"]
)
print(f"GRI: {gri_score:.3f}")

Key Functions

Function                      Description
calculate_gri()               Core GRI calculation via total variation distance (TVD)
calculate_diversity_score()   Strata coverage measurement
calculate_gri_scorecard()     Multi-dimensional scorecard
calculate_vwrs()              Variance-Weighted Representativeness Score
calculate_sri()               Strategic Representativeness Index
monte_carlo_max_scores()      Maximum achievable score simulation
calculate_efficiency_ratio()  Actual vs. theoretical maximum
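To make "GRI calculation via TVD" concrete, here is a minimal sketch that assumes the GRI is defined as 1 minus the total variation distance between sample and benchmark stratum proportions. The function name `gri_from_tvd` and the benchmark column `population_proportion` are assumptions made for illustration; this is not the library's implementation.

```python
import pandas as pd


def gri_from_tvd(survey_df, benchmark_df, strata_cols,
                 prop_col="population_proportion"):
    """Illustrative GRI: 1 - TVD between sample and benchmark shares.

    `prop_col` is an assumed column name, not taken from the gri package.
    """
    # Share of respondents that falls in each stratum of the sample.
    sample = survey_df.groupby(strata_cols).size() / len(survey_df)
    # Benchmark share per stratum, renormalised so the shares sum to 1.
    bench = benchmark_df.groupby(strata_cols)[prop_col].sum()
    bench = bench / bench.sum()
    # Align on the union of strata; a stratum missing on one side counts as 0.
    sample, bench = sample.align(bench, join="outer", fill_value=0.0)
    tvd = 0.5 * (sample - bench).abs().sum()
    return float(1.0 - tvd)


# Toy data: the sample is 75% Female, the benchmark is 50/50.
survey = pd.DataFrame({"gender": ["Female", "Female", "Female", "Male"]})
benchmark = pd.DataFrame(
    {"gender": ["Female", "Male"], "population_proportion": [0.5, 0.5]}
)
score = gri_from_tvd(survey, benchmark, ["gender"])
print(f"GRI: {score:.3f}")
```

Under this definition a perfectly matched sample scores 1.0, and every percentage point of the sample sitting in the wrong stratum lowers the score by 0.01.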

Key Classes

Class         Description
GRIAnalysis   High-level analysis wrapper (load, calculate, plot, report)
GRIScorecard  Configuration-driven scorecard generator
GRIConfig     Configuration management

Data Format

Survey data should be a CSV with one row per respondent and demographic columns:

country,gender,age_group,religion,environment
India,Female,25-34,Hindu,Urban
Brazil,Male,35-44,Christian,Urban
Nigeria,Female,18-24,Muslim,Rural
...

Required columns depend on which dimensions you calculate:

Column       Values                          Used In
country      Country name                    All geographic dimensions
gender       Male, Female                    Gender-related dimensions
age_group    Age bands (e.g., 18-24, 25-34)  Age-related dimensions
religion     Major religion category         Religion dimensions
environment  Urban, Rural                    Environment dimensions

The library includes built-in loaders for Global Dialogues data (load_gd_survey()) and World Values Survey data (load_wvs_survey()).
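Before calculating a dimension, it can help to confirm that the survey file contains the columns that dimension needs. The sketch below uses only pandas; the helper `missing_columns` and the `income` column are hypothetical and exist only to show a failing check.

```python
import io

import pandas as pd


def missing_columns(df, strata_cols):
    """Return the requested strata columns that the DataFrame lacks."""
    return sorted(set(strata_cols) - set(df.columns))


# The sample rows from the Data Format section, read from an in-memory CSV.
csv_text = """country,gender,age_group,religion,environment
India,Female,25-34,Hindu,Urban
Brazil,Male,35-44,Christian,Urban
Nigeria,Female,18-24,Muslim,Rural
"""
survey = pd.read_csv(io.StringIO(csv_text))

# All three columns are present, so nothing is reported as missing.
ok = missing_columns(survey, ["country", "gender", "age_group"])
# "income" is a made-up column, so it is reported as missing.
missing = missing_columns(survey, ["country", "gender", "income"])
print(ok, missing)
```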

Interpreting Scores

GRI Range   Interpretation
0.90–1.00   Excellent: near-perfect demographic match
0.70–0.89   Good: strong representation with minor gaps
0.50–0.69   Moderate: noticeable demographic skew
0.30–0.49   Low: significant underrepresentation in key strata
0.00–0.29   Very low: major demographic mismatch
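The bands above can be encoded as a small lookup. This is an illustrative helper, not part of the gri package; it treats each band's lower bound as inclusive, so a score such as 0.895 that falls between two printed ranges is assigned to the lower band.

```python
def interpret_gri(score):
    """Map a GRI score in [0, 1] to the label of its interpretation band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("GRI scores must lie between 0 and 1")
    # (inclusive lower bound, label), checked from the highest band down.
    bands = [
        (0.90, "Excellent"),
        (0.70, "Good"),
        (0.50, "Moderate"),
        (0.30, "Low"),
    ]
    for lower, label in bands:
        if score >= lower:
            return label
    return "Very low"


label = interpret_gri(0.35)
print(label)
```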

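Context matters: a GRI of 0.35 on Country × Gender × Age (2,699 strata) is a stronger result than 0.35 on Continent (6 strata), because a finite sample cannot match thousands of strata as closely as it can match six. Always compare a score against the maximum achievable score for that dimension.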
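One way to make this comparison explicit is to divide the observed score by the maximum achievable score. The sketch below assumes that "actual vs. theoretical maximum" means a simple ratio; the helper name and both maximum values are hypothetical numbers chosen only to illustrate the arithmetic.

```python
def efficiency_ratio(actual_gri, max_gri):
    """Observed GRI as a fraction of the maximum achievable GRI."""
    if max_gri <= 0:
        raise ValueError("max_gri must be positive")
    return actual_gri / max_gri


# Hypothetical maxima: a fine-grained dimension caps out far below 1.0,
# while a coarse dimension can get close to 1.0.
fine = efficiency_ratio(0.35, 0.50)    # many strata
coarse = efficiency_ratio(0.35, 0.98)  # few strata
print(f"fine-grained: {fine:.2f}, coarse: {coarse:.2f}")
```

With these made-up maxima, the same raw score of 0.35 uses 70% of the available headroom on the fine-grained dimension but only about 36% on the coarse one.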

Advanced Features

Variance-Weighted Representativeness Score (VWRS)

VWRS weights each stratum's deviation by the variance of responses within that stratum, which prioritizes representation in segments where opinions actually differ:

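from gri import calculate_vwrs

# `survey` is the DataFrame loaded earlier; `benchmark` is a benchmark
# DataFrame whose strata match strata_cols.
vwrs = calculate_vwrs(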
    survey_df=survey,
    benchmark_df=benchmark,
    response_col="ai_sentiment",
    strata_cols=["country", "gender"]
)
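The idea of variance weighting can be sketched as follows. This is one possible weighting scheme written for illustration, not the formula used by calculate_vwrs(): each stratum's absolute deviation is weighted by its share of the total within-stratum response variance. The function name and the `population_proportion` column are assumptions.

```python
import pandas as pd


def vwrs_sketch(survey_df, benchmark_df, response_col, strata_cols,
                prop_col="population_proportion"):
    """Illustrative variance-weighted score (assumed formula, see lead-in)."""
    sample = survey_df.groupby(strata_cols).size() / len(survey_df)
    bench = benchmark_df.groupby(strata_cols)[prop_col].sum()
    bench = bench / bench.sum()
    # Sample variance of responses per stratum; strata with a single
    # respondent have undefined variance and are treated as 0.
    var = survey_df.groupby(strata_cols)[response_col].var().fillna(0.0)
    sample, bench = sample.align(bench, join="outer", fill_value=0.0)
    var = var.reindex(sample.index, fill_value=0.0)
    total = var.sum()
    if total > 0:
        weights = var / total
    else:
        # No variance anywhere: fall back to equal weights.
        weights = pd.Series(1.0 / len(var), index=var.index)
    return float(1.0 - (weights * (sample - bench).abs()).sum())


# Respondents in A disagree strongly; respondents in B all answer the same.
survey = pd.DataFrame({
    "country": ["A", "A", "A", "A", "B", "B"],
    "ai_sentiment": [1, 5, 1, 5, 3, 3],
})
benchmark = pd.DataFrame(
    {"country": ["A", "B"], "population_proportion": [0.5, 0.5]}
)
vwrs = vwrs_sketch(survey, benchmark, "ai_sentiment", ["country"])
print(f"VWRS (sketch): {vwrs:.3f}")
```

In this toy data all of the weight lands on country A, because that is the only stratum where responses vary, so only A's over-representation lowers the score.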

Strategic Representativeness Index (SRI)

SRI combines the GRI with the diversity score using configurable weights:

from gri import calculate_sri

sri = calculate_sri(
    survey_df=survey,
    benchmark_df=benchmarks["country_gender_age"],  # from load_benchmark_suite()
    strata_cols=["country", "gender", "age_group"],
    gri_weight=0.7,
    diversity_weight=0.3
)
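Read literally, "combines GRI with diversity scores using configurable weights" suggests a weighted average. The sketch below assumes exactly that and normalises the weights so they need not sum to 1; the helper `combine_scores` is hypothetical, and the way calculate_sri() actually combines the two scores may differ.

```python
def combine_scores(gri, diversity, gri_weight=0.7, diversity_weight=0.3):
    """Weighted average of a GRI score and a diversity score (assumed form)."""
    total = gri_weight + diversity_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (gri_weight * gri + diversity_weight * diversity) / total


# Made-up input scores, combined with the default 0.7 / 0.3 weights.
combined = combine_scores(gri=0.6, diversity=0.8)
print(f"Combined score: {combined:.2f}")
```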

Visualization

from gri import plot_gri_scorecard, plot_segment_deviations

# Full scorecard heatmap
plot_gri_scorecard(scorecard, save_to="heatmap.png")

# Segment-level deviation analysis
plot_segment_deviations(analysis, dimension="Country × Gender × Age",
                        top_n=20, save_to="deviations.png")

Monte Carlo Simulation

Estimate the maximum achievable GRI for a given sample size:

from gri import monte_carlo_max_scores

max_scores = monte_carlo_max_scores(
    benchmark_df=benchmarks["country_gender_age"],  # from load_benchmark_suite()
    strata_cols=["country", "gender", "age_group"],
    sample_size=1000,
    n_simulations=1000
)
print(f"Max GRI at N=1000: {max_scores['max_gri']:.3f}")
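The simulation idea can be sketched with NumPy alone: draw many ideal random samples of size N from the benchmark proportions and score each one. The sketch assumes GRI = 1 - TVD and uses a hypothetical function name; the `mean_gri` key is an assumption, and the library's simulation may differ in what it samples and reports.

```python
import numpy as np


def simulate_max_gri(proportions, sample_size, n_simulations=1000, seed=0):
    """Score ideal random samples drawn from the benchmark itself."""
    rng = np.random.default_rng(seed)
    p = np.asarray(proportions, dtype=float)
    p = p / p.sum()
    # One row of stratum counts per simulated survey.
    counts = rng.multinomial(sample_size, p, size=n_simulations)
    sample_props = counts / sample_size
    # Total variation distance of each simulated sample from the benchmark.
    tvd = 0.5 * np.abs(sample_props - p).sum(axis=1)
    gri = 1.0 - tvd
    return {"mean_gri": float(gri.mean()), "max_gri": float(gri.max())}


# Toy benchmark: 200 equally sized strata, sampled with N = 1000.
result = simulate_max_gri(np.full(200, 1 / 200), sample_size=1000,
                          n_simulations=500)
print(f"Mean GRI: {result['mean_gri']:.3f}, max GRI: {result['max_gri']:.3f}")
```

Even a perfectly random sample falls short of 1.0 here, because 1,000 respondents cannot be spread exactly evenly over 200 strata by chance.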

Source Code

The full source code is available on GitHub. Issues and contributions are welcome.